
The title pretty much covers my question, but to elaborate: suppose we have data (assume, for simplicity, that it is a good enough representation of the underlying distribution) for a binary classification problem (again for simplicity, and to give a 'feel' of treatment and control groups). When we train a machine learning model such as a random forest, we eventually obtain feature importances from the trained model. Assume the training has taken care of class imbalance via up- or down-sampling or some other method, and has used proper sampling such as stratification during training and validation, to mimic a randomized controlled trial. Let's also assume that all confounders are included in the feature list, i.e., there are no unmeasured confounders left. I know that an ML model can only hope to learn correlations, certainly not causality, among features. How far from or close to the actual causal structure would the feature importance plot be? Sure, there won't be any causal arrows in a feature importance plot. But would a first guess that the causal arrows run from the most important feature to the least important be too far from reality? I am genuinely trying to understand this issue rather than asking for opinions. If there is also some reference that discusses this, that would be helpful too.

dbm

1 Answer


Very far in general - they capture different things

The crucial part of causal diagrams is identifying a graph encoding the relationships between variables that agrees with their conditional independence statements. See here for an introduction to causal graphs. By contrast, a standard machine learning algorithm will simply use all other variables to predict the target variable as well as possible, with no regard for conditional independencies.

It may be the case that the feature importance carries some information about the causal dependencies between features and target (for example, if variables that are not direct causes turn out not to be helpful for predicting the target). However, this need not be the case, and there is no way of knowing in general. As for relationships among the features themselves, the feature importance plot is not a suitable indicator of causality, or of anything else really - it only measures the predictive importance of the features with respect to the target.

Illustrative examples

1. Assume that all features are direct or indirect causes of the target (e.g. genetic predispositions for a disease)

The machine learning model may correctly identify that indirect causes are not needed for prediction, discard them, and assign importance to direct causes in proportion to their causal influence. In this case, the feature importance could be a good proxy for the causal influence of the features on the target.
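A minimal sketch of this case (my own toy simulation, not from the question): two direct causes and one indirect cause of a binary target, with a random forest's impurity-based importances printed at the end. The variable names and coefficients are made up purely for illustration.

```python
# Toy DAG: x3 -> x1 -> y  and  x2 -> y  (x3 affects y only through x1)
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 5000

x3 = rng.normal(size=n)                  # indirect cause
x1 = 0.8 * x3 + rng.normal(size=n)       # strong direct cause
x2 = rng.normal(size=n)                  # weaker direct cause
logits = 2.0 * x1 + 0.5 * x2
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logits)))

X = np.column_stack([x1, x2, x3])
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
for name, imp in zip(["x1 (direct, strong)", "x2 (direct, weak)", "x3 (indirect)"],
                     rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
# In this toy setup the strong direct cause x1 tends to dominate, so the
# ranking loosely tracks causal influence -- but note that x3 still gets
# some credit simply because it is correlated with x1.
```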

2. Assume that all features are effects of the target (e.g. symptoms of a disease)

The machine learning model may predict well, but the predictive relationships it exploits all run against the actual causal direction: the target causes the features, not the other way around. In this case, the feature importance would give a very wrong idea about the causal relationships between features and target.
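And a matching sketch of the anti-causal case (again my own toy simulation, not from the answer): the binary target now causes the features, as with symptoms of a disease. The importances come out large even though every causal arrow points from the target to the features.

```python
# Toy DAG: y -> s1  and  y -> s2  (features are symptoms, i.e. effects of y)
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n = 5000

y = rng.binomial(1, 0.5, size=n)          # disease status
s1 = 2.0 * y + rng.normal(size=n)         # strong symptom
s2 = 1.0 * y + rng.normal(size=n)         # weaker symptom

X = np.column_stack([s1, s2])
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
print("importances:", rf.feature_importances_.round(3))
# The importance plot looks just like the one from the causal example above:
# it tells you which features predict y, not which way the arrows point.
```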

Scriddie