I have a question about heat maps and correlation among variables. I created this heat map to look at possible correlations between the variables and the target, and I got very small values. I would like to set a small threshold, e.g. 0.05, for selecting features. Does that make sense, or should I exclude all of them?
1 Answer
From the information you provide, it seems you are carrying out feature selection based on the correlation between your predictor variables and the target.
This is a valid type of feature selection (see here) in the family of univariate filter methods, although it is not the only one. It is fast and intuitive, but it is worth looking at other methods too. You might also be interested in:
- variance threshold selection (also a univariate filter method, applied per input feature): it assumes that higher variance in a feature's values could mean more predictive power
- sequential backward selection (look here): it has a higher computational cost, but features are judged in subsets (not independently as above), and it is fine if you don't have many features (as seems to be the case)
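As a rough sketch of the two alternatives above, here is how they could look with scikit-learn's `VarianceThreshold` and `SequentialFeatureSelector`. The synthetic data, the logistic regression estimator, and the choice of keeping 3 features are assumptions for illustration only, not something taken from your dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector, VarianceThreshold
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (an assumption): 200 rows, 8 features, 3 informative.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, n_redundant=0, random_state=0
)

# Append a constant column so the variance filter has something to remove.
X = np.hstack([X, np.ones((X.shape[0], 1))])

# Variance threshold: drops features whose variance does not exceed the
# threshold. It never looks at y, so it is an unsupervised filter.
vt = VarianceThreshold(threshold=0.0)
X_vt = vt.fit_transform(X)
kept_by_variance = vt.get_support()  # boolean mask over the 9 input columns

# Sequential backward selection: start from all remaining features and
# repeatedly drop the one whose removal hurts the cross-validated score least.
sbs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),  # estimator choice is an assumption
    n_features_to_select=3,             # illustrative target size
    direction="backward",
    cv=5,
)
sbs.fit(X_vt, y)
kept_by_sbs = sbs.get_support()  # boolean mask over the 8 surviving columns

print("Kept by variance filter:", kept_by_variance)
print("Kept by backward selection:", kept_by_sbs)
```

The backward selector refits the model many times, which is where the extra cost comes from; with a handful of features that is usually acceptable.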
There are many other strategies for feature selection (you might want to check this source).
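Coming back to the threshold idea in the question, a minimal sketch of a correlation filter with pandas could look like the following. The column names and the synthetic data are hypothetical; only the 0.05 cutoff comes from the question. Note that `corrwith` computes Pearson correlation by default, which only measures linear association:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one feature related to the target, one pure noise.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "x_strong": rng.normal(size=n),
    "x_noise": rng.normal(size=n),
})
df["target"] = 2.0 * df["x_strong"] + rng.normal(size=n)

# Absolute Pearson correlation of every feature with the target.
corr_with_target = df.drop(columns="target").corrwith(df["target"]).abs()

# Keep the features whose absolute correlation reaches the cutoff.
threshold = 0.05  # value taken from the question
selected = corr_with_target[corr_with_target >= threshold].index.tolist()

print(corr_with_target.sort_values(ascending=False))
print("Selected features:", selected)
```

With a cutoff this low, a pure-noise feature can pass the filter just by chance, so it may be worth comparing the result with one of the other methods before trusting it.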
German C M
