I'm a machine learning / python novice, so apologies if my question is simple, but I haven't been able to find it addressed.
I'm very interested in using ML to determine the most important features predicting a binary target from a dataset. However, in addition to a feature's importance 'magnitude', I'm also interested in its 'direction' (i.e. whether it is a net 'positive' or 'negative' feature, based on its association with the target).
I've found TreeSHAP fantastic for capturing the 'magnitude' of a feature's importance:
np.abs(shap_values[1]).mean(0)
Intuitively, looking at a beeswarm plot, you can glean that certain features are 'positive' (high feature values clustering to the right of the plot, i.e. positive SHAP values). Is there a way to determine which features are 'positive' and which are 'negative', so I can convert the magnitudes into signed values?
Thus far, I have been using the sign of each feature's coefficient from a separate fit of a linear model (regularized logistic regression), though I figure I must be overthinking this and there must be an easier way!
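To make that workaround concrete, here is a rough sketch of what I mean (toy synthetic data rather than my real dataset, and the 'magnitudes' are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Tiny synthetic example: feature 0 is positively associated with the
# binary target, feature 1 negatively
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Standardize so the coefficients are comparable across features
X_scaled = StandardScaler().fit_transform(X)
logit = LogisticRegression(penalty="l2", max_iter=1000).fit(X_scaled, y)

# Keep only the sign of each coefficient as the feature's 'direction'
directions = np.sign(logit.coef_[0])

# Multiply the (here made-up) mean(|SHAP|) magnitudes by those signs
magnitudes = np.array([0.14, 0.11])
signed_importance = magnitudes * directions
print(signed_importance)
```

It works, but fitting a whole second model just to recover signs feels clumsy.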
Once again, apologies if this is a trivial question.
Reproducible example using the Titanic dataset:
from seaborn import load_dataset
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import shap
import numpy as np
import pandas as pd
#Load and curate data
titanic = load_dataset("titanic")
# Drop NAs
titanic_complete = titanic.dropna()
# Define X and y
X = titanic_complete.drop(["survived","alive","adult_male","who",'deck','alone','class','sex','embarked','embark_town'], axis=1)
y = titanic_complete["survived"]
features = X.columns
# Encode non-numeric columns as categorical codes
cat_features = []
for cat in X.select_dtypes(exclude="number"):
    cat_features.append(cat)
    X[cat] = X[cat].astype("category").cat.codes.astype("category")
clf = RandomForestClassifier()
clf.fit(X,y)
# Get SHAP feature importances
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X)
rf_resultX = pd.DataFrame(shap_values[1], columns = features)
vals = np.abs(rf_resultX.values).mean(0)
shap_importance = pd.DataFrame(list(zip(features, vals)), columns=['col_name', 'feature_importance_vals'])
shap_importance.sort_values(by=['feature_importance_vals'], ascending=False, inplace=True)
print(shap_importance)
col_name feature_importance_vals
1 age 0.141620
4 fare 0.112346
2 sibsp 0.030421
3 parch 0.029923
0 pclass 0.011663
So 'age' and 'fare' are the most important of these features (which makes sense). Based on the beeswarm plot shap.summary_plot(shap_values[1], X), we see that age is 'negative' (high age --> lower SHAP values) while fare is 'positive' (high fare --> higher SHAP values). But how can I calculate this, so as to convert each SHAP importance into a positive or negative value?
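To be clear about what I mean by 'negative' and 'positive', here is the kind of calculation I'm imagining, on toy stand-in values (not real SHAP output) -- I'm not sure this is the right approach, which is why I'm asking:

```python
import numpy as np

# Toy stand-ins: feature values and per-sample SHAP values for two features
age = np.array([5, 20, 35, 50, 65, 80], dtype=float)
age_shap = np.array([0.30, 0.15, 0.05, -0.05, -0.15, -0.30])

fare = np.array([7, 15, 30, 60, 120, 250], dtype=float)
fare_shap = np.array([-0.20, -0.10, 0.00, 0.10, 0.20, 0.30])

# The sign of the correlation between a feature's values and its SHAP
# values seems to capture the 'direction' visible in the beeswarm plot
age_sign = np.sign(np.corrcoef(age, age_shap)[0, 1])
fare_sign = np.sign(np.corrcoef(fare, fare_shap)[0, 1])
print(age_sign, fare_sign)
```

Is something along these lines sensible, or is there a built-in way to get this from shap directly?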
Many thanks again!