Questions tagged [shap]

73 questions
14 votes • 2 answers

SHAP value analysis gives different feature importance on train and test set

Should SHAP value analysis be done on the train or test set? What does it mean if the feature importance based on mean |SHAP value| is different between the train and test set of my lightgbm model? I intend to use SHAP analysis to identify how each…
pbk • 143 • 1 • 6
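A minimal sketch of the comparison described above, assuming a binary LGBMClassifier and the sklearn breast-cancer data purely as stand-ins: compute mean |SHAP| separately on the train and the test split and compare the two rankings.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = lgb.LGBMClassifier(n_estimators=200).fit(X_tr, y_tr)
explainer = shap.TreeExplainer(model)

def mean_abs_shap(X_part):
    sv = explainer.shap_values(X_part)
    sv = sv[1] if isinstance(sv, list) else sv  # some shap versions return one array per class
    return pd.Series(np.abs(sv).mean(axis=0), index=X_part.columns)

# Feature importance from mean |SHAP| on each split; SHAP explains the model's
# behaviour on whatever data it is given, so differences between the two rankings
# usually reflect sampling noise or train/test distribution shift.
print(mean_abs_shap(X_tr).sort_values(ascending=False).head())
print(mean_abs_shap(X_te).sort_values(ascending=False).head())
```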
7 votes • 1 answer

How is the "base value" of SHAP values calculated?

I'm trying to understand how the base value is calculated. So I used an example from SHAP's github notebook, Census income classification with LightGBM. Right after I trained the lightgbm model, I applied explainer.shap_values() on each row of the…
David293836 • 217 • 1 • 2 • 6
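For intuition on the base value, a hedged sketch using a LightGBM regressor rather than the notebook's classifier: when a background dataset is passed to TreeExplainer, expected_value is the mean of the model's raw output over that background, and local accuracy ties it to the per-row SHAP values.

```python
import lightgbm as lgb
import shap
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = lgb.LGBMRegressor(n_estimators=100).fit(X, y)

bg = X.sample(500, random_state=0)                 # explicit background data
explainer = shap.TreeExplainer(model, data=bg)

print(explainer.expected_value)                    # the "base value"
print(model.predict(bg).mean())                    # mean raw prediction over the background

sv = explainer.shap_values(X.iloc[:5])
# local accuracy: base value + sum of a row's SHAP values reproduces its prediction
print(explainer.expected_value + sv.sum(axis=1))
print(model.predict(X.iloc[:5]))
```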
6 votes • 1 answer

Shapley values without intercept (or without `expected_value`)

I have a model and I want to derive its interpretability by using feature contributions. In the end, I want to have some contribution per feature such that the sum of contributions equals the prediction of the model. One approach may be to use…
David Masip • 6,136 • 2 • 28 • 62
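SHAP always reports the intercept separately as expected_value, with local accuracy base + Σ contributions = prediction. A hedged sketch of one naive workaround (an assumption on my part, not something the library offers): fold the base value back into the contributions so that they alone sum to the prediction.

```python
import numpy as np
import shap
import xgboost as xgb
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = xgb.XGBRegressor(n_estimators=200).fit(X, y)

explainer = shap.TreeExplainer(model)
sv = explainer.shap_values(X)                      # (n_samples, n_features)
base = explainer.expected_value
pred = model.predict(X)

# local accuracy: base + per-feature contributions ~= prediction
print(np.abs(base + sv.sum(axis=1) - pred).max())

# naive redistribution: spread the base value across features in proportion to |SHAP|,
# so the contributions alone sum to the prediction (ad hoc, and it changes the interpretation)
w = np.abs(sv) / (np.abs(sv).sum(axis=1, keepdims=True) + 1e-12)
contrib_no_intercept = sv + w * base
print(np.abs(contrib_no_intercept.sum(axis=1) - pred).max())
```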
6 votes • 1 answer

How to obtain SHAP values for a CatBoost model in R?

I'm asked to create a SHAP analysis in R but I cannot find how to obtain it for a CatBoost model. I can get the SHAP values of an XGBoost model with shap_values <- shap.values(xgb_model = model, X_train = train_X) but not for CatBoost. Here is…
user100740 • 91 • 2
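In Python, shap.TreeExplainer accepts CatBoost models directly, and CatBoost can also emit SHAP values through its own feature-importance interface; a hedged sketch of both routes (the R catboost package exposes a similar ShapValues importance type, though R isn't shown here):

```python
import shap
from catboost import CatBoostClassifier, Pool
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = CatBoostClassifier(iterations=200, verbose=False).fit(X, y)

# Route 1: shap's TreeExplainer understands CatBoost models
sv = shap.TreeExplainer(model).shap_values(X)

# Route 2: CatBoost's built-in SHAP computation; the last column is the expected (base) value
sv_cb = model.get_feature_importance(Pool(X, y), type="ShapValues")
print(sv_cb.shape)   # (n_samples, n_features + 1)
```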
6 votes • 1 answer

Is it valid to compare SHAP values across models?

Let's say I have three models: a random forest with 100 trees, a random forest with 1000 trees, and an XGBoost model. I can rank the importance of my features on my dataset for each model using SHAP, and compare relative importance across models. What…
DKL • 88 • 1 • 5
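A hedged sketch of this comparison: mean |SHAP| lives on each model's own output scale (probabilities, log-odds, raw margins), so cross-model comparisons are usually made on rankings or normalized shares rather than raw magnitudes.

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
X = X.sample(1000, random_state=0); y = y.loc[X.index]   # keep the sketch fast

models = {
    "rf_100":  RandomForestRegressor(n_estimators=100,  random_state=0).fit(X, y),
    "rf_1000": RandomForestRegressor(n_estimators=1000, random_state=0).fit(X, y),
    "xgb":     xgb.XGBRegressor(n_estimators=300).fit(X, y),
}

shares = {}
for name, m in models.items():
    sv = shap.TreeExplainer(m).shap_values(X)
    imp = np.abs(sv).mean(axis=0)
    shares[name] = imp / imp.sum()          # normalize so models are comparable

print(pd.DataFrame(shares, index=X.columns).round(3))
```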
5 votes • 1 answer

Explanation of how DeepExplainer works to obtain SHAP values in simple terms

I have been using DeepExplainer (DE) to obtain the approximate SHAP values for my MLP model. I am following the SHAP Python library. Now I'd like to learn more about the logic behind DE. From the relevant paper it is not clear to me how SHAP values are…
mlee_jordan • 153 • 1 • 8
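As a rough illustration of the workflow rather than the internals (DeepExplainer blends DeepLIFT-style backpropagation of differences-from-reference with the Shapley framework), here is a hedged sketch with a small Keras MLP; DeepExplainer's compatibility varies across TensorFlow and shap versions, so treat it as a sketch only.

```python
import numpy as np
import shap
import tensorflow as tf
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True)
X = X.astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(X.shape[1],)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit(X, y, epochs=5, verbose=0)

# The background sample is the reference that attributions are measured against.
background = X[np.random.choice(len(X), 100, replace=False)]
explainer = shap.DeepExplainer(model, background)
shap_values = explainer.shap_values(X[:10])   # approximate SHAP values for 10 rows
```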
5 votes • 1 answer

Why are the mean SHAP values of one class 2x those of the other class?

I'm looking at the explanation of SHAP from DataCamp and at the summary plot. We have 2 classes, so I assume that the effect of a feature for one class is the opposite of its effect for the other class. But according to the…
user3668129 • 769 • 4 • 15
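A hedged sketch of the sanity check I would run first: for a genuine two-class model, many explainers return one SHAP array per class and the two arrays are exact negatives of each other, so the mean |SHAP| per feature is identical for both classes; a 2x gap usually means something else is being compared (e.g. a softmax multi-class output, or different plots/quantities).

```python
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

sv = shap.TreeExplainer(model).shap_values(X)
if isinstance(sv, list):                  # one array per class in many shap versions
    print(np.allclose(sv[0], -sv[1]))     # True: the two classes mirror each other
    print(np.abs(sv[0]).mean(), np.abs(sv[1]).mean())   # identical mean |SHAP|
```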
4 votes • 1 answer

Why do Shapley value solutions remain consistent when the value function of the empty set changes in the ML context?

Hey there data science stack exchange - a question about SHAP. In the original Shapley value formulation from Lloyd Shapley, one assumption is that the value function of the empty set equals zero, $v(\emptyset) = 0$. In other words, if no players are playing…
shay • 143 • 3
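One standard way to reconcile this: in the ML setting $v(\emptyset)$ is the expected model output rather than zero, but a uniform shift of the value function leaves every marginal contribution, and hence every Shapley value, unchanged; only the base value moves, and SHAP reports it separately as expected_value.

$$
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N|-|S|-1)!}{|N|!}\,\bigl[v(S \cup \{i\}) - v(S)\bigr]
$$

If $v'(S) = v(S) + c$ for every coalition $S$, then $v'(S \cup \{i\}) - v'(S) = v(S \cup \{i\}) - v(S)$, so $\phi_i(v') = \phi_i(v)$ for every player $i$. Only $v'(\emptyset) = c$ changes, and efficiency reads $\sum_i \phi_i = v'(N) - v'(\emptyset) = f(x) - \mathbb{E}[f(X)]$ in the SHAP setting.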
4 votes • 1 answer

Is multicollinearity a problem when interpreting SHAP values from an XGBoost model?

I'm using an XGBoost model for multi-class classification and am looking at feature importance using SHAP values. I'm curious whether multicollinearity is a problem for the interpretation of the SHAP values. As far as I know, XGB is not affected by…
hideonbush • 41 • 1
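A hedged sketch of how collinearity typically shows up in SHAP (it does not invalidate the values, but it spreads credit): with a nearly duplicated feature, the trees can split on either copy, so the per-feature attributions become somewhat arbitrary while the grouped credit stays meaningful.

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)          # nearly collinear copy of x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + x3 + rng.normal(scale=0.1, size=n)   # only x1 and x3 matter

model = xgb.XGBRegressor(n_estimators=300).fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)

print(np.abs(sv).mean(axis=0))              # credit is typically shared between x1 and x2
print(np.abs(sv[:, 0] + sv[:, 1]).mean())   # grouped credit for the collinear pair
```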
3 votes • 2 answers

Difference between feature effect and feature importance

Is there a difference between feature effect (e.g. SHAP effect) and feature importance in machine learning terminology?
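Loosely, "importance" is a single magnitude per feature (e.g. mean |SHAP|), while an "effect" is signed and per-observation (how a particular feature value pushes a particular prediction up or down). A hedged sketch of the two standard SHAP views, using an XGBoost regressor purely as a stand-in:

```python
import shap
import xgboost as xgb
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True, as_frame=True)
model = xgb.XGBRegressor(n_estimators=200).fit(X, y)
sv = shap.TreeExplainer(model).shap_values(X)

shap.summary_plot(sv, X)                    # effects: signed, per-sample, colored by feature value
shap.summary_plot(sv, X, plot_type="bar")   # importance: mean |SHAP| per feature
```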
3 votes • 1 answer

How can I interpret shap.summary_plot and its gray color concerning outliers/anomalies?

I was inspired by this notebook, and I'm experimenting with the IsolationForest algorithm using scikit-learn==0.22.2.post1 in an anomaly detection context on the SF version of the KDDCUP99 dataset, which includes 4 attributes. The data is fetched directly from sklearn and…
Mario • 571 • 1 • 6 • 24
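A hedged sketch of the setup, with synthetic stand-in columns rather than the actual KDDCUP99 fetch so it stays self-contained; the gray points in a summary plot are generally values that cannot be placed on the numeric red-blue color scale (e.g. NaNs or non-numeric columns) rather than anything specific to outliers.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "duration":  rng.exponential(1.0, 2000),
    "src_bytes": rng.exponential(100.0, 2000),
    "dst_bytes": rng.exponential(100.0, 2000),
    "service":   rng.integers(0, 5, 2000).astype(float),
})
X.loc[X.sample(50, random_state=0).index, "service"] = np.nan   # un-colorable values

iso = IsolationForest(random_state=0).fit(X.fillna(0))
sv = shap.TreeExplainer(iso).shap_values(X.fillna(0))
shap.summary_plot(sv, X)   # points whose feature value can't be color-mapped render in gray
```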
3 votes • 0 answers

Explain FastText model using SHAP values

I have trained a fastText model and a fully connected network built on its embeddings. I figured out how to use LIME on it: a complete example can be found in Natural Language Processing Is Fun Part 3: Explaining Model Predictions. The idea is clear -…
Mikhail_Sam • 131 • 4
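A hedged sketch of one way to wire SHAP in, assuming hypothetical objects from the question's setup: a trained fastText model ft (with get_sentence_vector) and the dense network's predict function clf. Explaining the dense model in embedding space with KernelExplainer is straightforward; the catch is that the attributions land on embedding dimensions, not on words.

```python
import numpy as np
import shap

# hypothetical objects from the question's setup (not defined here):
#   ft  - trained fastText model, e.g. fasttext.load_model("model.bin")
#   clf - fully connected network; clf.predict(vectors) -> class probabilities
#   background_texts, texts_to_explain - lists of raw strings

def embed(texts):
    return np.stack([ft.get_sentence_vector(t) for t in texts])

explainer = shap.KernelExplainer(lambda v: clf.predict(v), embed(background_texts))
shap_values = explainer.shap_values(embed(texts_to_explain), nsamples=200)
# These values explain embedding dimensions; mapping attributions back to words needs
# a different formulation (e.g. masking tokens and re-embedding), which is the hard part.
```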
3 votes • 0 answers

Is there an alternative to Shapley values for tree-based models where the sum of feature contributions is equal to the prediction?

I'm currently working on a project where the ultimate goal is to reduce the quantity of a bad thing, b. I've been tasked with assigning the blame between several different features that are believed to increase b. Then, we can appropriately…
Tom Adams • 31 • 2
3 votes • 0 answers

Capturing the 'direction' of feature importances using TreeSHAP?

I'm a machine learning / python novice, so apologies if my question is simple, but I haven't been able to find it addressed. I'm very interested in using ML to determine the most important features predicting a binary target from a dataset. However,…
agun7 • 31 • 1
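A hedged sketch of two common heuristics for direction: the beeswarm summary plot already encodes it via color, and for a single signed number per feature one can correlate each feature's values with its own SHAP values (an ad hoc convention, not part of the library).

```python
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = xgb.XGBClassifier(n_estimators=200).fit(X, y)

sv = shap.TreeExplainer(model).shap_values(X)
sv = sv[1] if isinstance(sv, list) else sv      # some shap versions return one array per class

importance = pd.Series(np.abs(sv).mean(axis=0), index=X.columns)
direction = pd.Series(
    [np.corrcoef(X.iloc[:, j], sv[:, j])[0, 1] for j in range(X.shape[1])],
    index=X.columns,
)   # > 0: larger feature values push toward the positive class; < 0: the opposite

summary = pd.DataFrame({"mean_abs_shap": importance, "direction": direction})
print(summary.sort_values("mean_abs_shap", ascending=False).head(10))

shap.summary_plot(sv, X)   # color shows whether high feature values raise or lower the output
```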
3 votes • 1 answer

Can you do the math for this simple treeSHAP example (decisionTree)?

[EDIT] The question has now been solved; I updated the calculations below. I've been trying to understand the math behind SHAP values. So far I understand all the concepts in SHAP but cannot get to the SHAP values in this example (coming…
Tom • 85 • 8
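For reference, the exact Shapley computation in the two-feature case, which is the easiest place to reproduce such numbers by hand; here $v(S)$ is the expected model output when only the features in $S$ are known (so $v(\emptyset)$ is the base value):

$$
\phi_1 = \tfrac{1}{2}\bigl[v(\{1\}) - v(\emptyset)\bigr] + \tfrac{1}{2}\bigl[v(\{1,2\}) - v(\{2\})\bigr],
\qquad
\phi_2 = \tfrac{1}{2}\bigl[v(\{2\}) - v(\emptyset)\bigr] + \tfrac{1}{2}\bigl[v(\{1,2\}) - v(\{1\})\bigr],
$$

and $\phi_1 + \phi_2 = v(\{1,2\}) - v(\emptyset) = f(x) - \text{base value}$.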