Feature selection in binary classification

Question

I have a dataset with two classes and am interested in learning which features are 'important' for predicting the class. There are a lot of features available and I want to find subset(s) that lead to a good classifier.

Suppose I have found two classifiers, clf1 and clf2, that are close in performance (with clf1 slightly better than clf2). Suppose also that the pipeline used for clf1 involved feature selection; call the subset of features it found feat_sub_1, and similarly feat_sub_2 for clf2. Presumably these feature subsets will be different. Is it correct to use the subset from clf1 because it performed better?

score 1 · Accepted Answer · answered Apr 14 '24 at 07:39

There are many different approaches to this problem. Some are specific to a particular model, and I don't know whether you can use them, but others are model-agnostic.

Some model-agnostic methods:

One similar to what you mentioned: train models with the same architecture on every combination of input features you are interested in, and pick the combination with the lowest error. This is sometimes also used to argue that some variables explain the target better than others, but I wouldn't put much trust in that interpretation. An example I remember.
SHAP: it is mainly about interpreting predictions, but you can also use it as a feature selection method, like here.
Others: I found a discussion on this with some helpful findings here.
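
To make the first method concrete, here is a minimal sketch of the exhaustive search using scikit-learn. The synthetic data, the choice of LogisticRegression, and 5-fold accuracy as the score are all my assumptions for illustration, not something from the question; swap in your own data, model, and metric.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real dataset (assumption): 6 features, 2 informative.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=2, n_redundant=0,
    random_state=0,
)

results = {}
n_features = X.shape[1]
for k in range(1, n_features + 1):
    for subset in combinations(range(n_features), k):
        # Same model type for every subset, so only the inputs differ.
        model = LogisticRegression(max_iter=1000)
        # Mean 5-fold cross-validated accuracy for this feature subset.
        scores = cross_val_score(model, X[:, list(subset)], y, cv=5)
        results[subset] = scores.mean()

# Highest accuracy is the same as lowest error here.
best_subset = max(results, key=results.get)
print(best_subset, round(results[best_subset], 3))
```

The number of subsets grows as 2^n - 1, so this only works for a small number of features. Also, when several subsets score almost the same, the gap between them may be within cross-validation noise, which is one reason not to read too much into which subset "won".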

However, many libraries, such as XGBoost, have built-in tools or procedures for selecting features.
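
As a rough sketch of what such a built-in procedure can look like, the example below uses scikit-learn's GradientBoostingClassifier as a stand-in for XGBoost (my substitution, so the example needs only scikit-learn). It reads the fitted model's feature_importances_ and passes the model to SelectFromModel. xgboost.XGBClassifier also exposes feature_importances_, so the same pattern should carry over. The synthetic data and the "median" threshold are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in for the real dataset (assumption): 6 features, 2 informative.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=2, n_redundant=0,
    random_state=0,
)

# Stand-in for XGBoost: a tree ensemble that exposes feature_importances_.
model = GradientBoostingClassifier(random_state=0).fit(X, y)
importances = model.feature_importances_
print(importances)

# Keep the features whose importance is at least the median importance.
selector = SelectFromModel(model, prefit=True, threshold="median")
X_selected = selector.transform(X)
print(selector.get_support(indices=True), X_selected.shape)
```

Importances computed this way belong to this particular fitted model, so a different model type may rank the features differently.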
