
Sklearn has a feature_importances_ attribute, but this is highly model-specific and I'm not sure how to interpret it: removing the most important feature does not necessarily decrease the model's quality the most.

Is there a model-agnostic way to tell which features are important for a prediction problem?

The only way I could see is the following:

  • Use an ensemble of different models
  • Start with a big set of features and remove them one at a time. To find a feature's "uplift", compare the ensemble's quality with the full feature set against its quality with the reduced feature set (see the sketch below).
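
A minimal sketch of that comparison (my own illustration: a single random forest stands in for the ensemble, and 5-fold cross-validated accuracy for "quality"; neither choice is prescribed above):

    import pandas as pd
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    data = load_breast_cancer(as_frame=True)
    X, y = data.data, data.target

    model = RandomForestClassifier(random_state=0)
    baseline = cross_val_score(model, X, y, cv=5).mean()

    # "Uplift" of a feature = baseline quality minus quality without that feature.
    for feature in X.columns:
        score = cross_val_score(model, X.drop(columns=feature), y, cv=5).mean()
        print(f"{feature}: uplift = {baseline - score:+.4f}")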

(What this can't do is find connected features: some features might not be exactly the same, but have a common underlying cause that is important for the prediction. Hence removing either of them doesn't change much, but removing both might change a lot. I asked another question for that.)

Martin Thoma

2 Answers


Ways to "determine feature importance" are normally called feature selection algorithms.

There are 3 types of feature selection algorithms (a short scikit-learn sketch of each follows the list):

  • Filter approaches: they choose variables without using a model at all, just by looking at the feature values. One example is scikit-learn's variance threshold selector.
  • Wrapper approaches: they use an arbitrary prediction algorithm to score different subsets of features and choose the best subset based on that. These use a model but are model-agnostic, as they don't care which model you use. One example is recursive feature elimination.
  • Embedded approaches: here the variable selection is part of a model, hence the feature selection and the model are coupled together. This is the case for feature_importances_ in the random forest algorithm.
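
A sketch of each family using scikit-learn's API (the dataset, threshold, and estimator choices below are arbitrary, for illustration only):

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE, VarianceThreshold
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)

    # Filter: keeps features whose variance exceeds a threshold; no model involved.
    X_filtered = VarianceThreshold(threshold=0.2).fit_transform(X)

    # Wrapper: recursive feature elimination; any estimator exposing
    # coef_ or feature_importances_ can be plugged in.
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=2).fit(X, y)
    print("RFE-selected mask:", rfe.support_)

    # Embedded: importances fall out of fitting the model itself.
    forest = RandomForestClassifier(random_state=0).fit(X, y)
    print("Random forest importances:", forest.feature_importances_)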

From the question, I understand that both filter and wrapper approaches are suitable for the OP's needs. A classic article that covers both very well is this one by Kohavi and John.

Here you can see an overview of scikit-learn's feature selection capabilities, which includes examples of the three aforementioned types of variable selection algorithms.

noe

Yes, there are feature importance approaches that are model-agnostic, i.e. they can be applied to any predictive model.

For example, check the permutation importance approach, described in section 15.3.2, "Variable Importance", of The Elements of Statistical Learning.
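
A minimal sketch using scikit-learn's permutation_importance helper, which implements this approach (the dataset and model are my own illustrative choices, not from the book):

    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

    model = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

    # Shuffle one feature at a time on held-out data; the score drop is that
    # feature's importance. Works with any fitted estimator and any scorer.
    result = permutation_importance(model, X_val, y_val, n_repeats=10,
                                    random_state=0)
    for i in result.importances_mean.argsort()[::-1][:5]:
        print(f"feature {i}: {result.importances_mean[i]:.4f}")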

Disclosure: this part is copied from another answer of mine.

aivanov