I have a sequence-to-sequence neural network, built with Keras, to solve a time series forecasting problem; it is currently trained on samples with ten features each. The model's performance is average, and I would like to investigate whether adding or removing features will improve it.

The features I have included are:

  1. The historical data
  2. Quarterly lagged series of the historical data (4 series)
  3. A series of the week-over-week change in value
  4. Four time-invariant features tiled to extend the length of the series (another 4 series)
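
For concreteness, here is a minimal sketch of how these ten series might be stacked into a single input array. Every name and value below is illustrative (it assumes weekly data and 13-week quarters), not part of my actual pipeline:

```python
import numpy as np

T = 520                                  # e.g. ten years of weekly data
history = np.random.rand(T)              # 1. the historical series

# 2. quarterly lags (13 weeks per quarter); np.roll wraps around, so a real
#    pipeline would trim the first 13*k rows instead of keeping them
lags = [np.roll(history, 13 * k) for k in (1, 2, 3, 4)]

# 3. week-over-week change
delta = np.diff(history, prepend=history[0])

# 4. four time-invariant scalars tiled along the time axis
statics = [np.full(T, v) for v in (0.3, 1.7, 42.0, 5.0)]

X = np.column_stack([history, *lags, delta, *statics])   # shape (520, 10)
```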

I am aware that I could run the model many times, changing the combination of features included each time. However, along with tuning the hyperparameters (it might be that 8 features work really well with one set of hyperparameters but not with another), this adds up to a very large number of possible combinations.

Is there a separate way I can use to gauge whether a feature is likely to add value to the model or not?

I am particularly concerned that I am feeding four time-invariant features into a model designed to work with time-varying data, and I would like a way to measure their impact and see whether they add anything.

Aesir

3 Answers


Don't remove a feature to find out its importance, but instead randomize or shuffle it.

Run the training 10 times, randomizing a different feature column each time, and then compare the performance. There is no need to re-tune the hyperparameters when it is done this way.

Here's the theory behind my suggestion: feature importance
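
A minimal sketch of this retrain-with-one-shuffled-column loop. It assumes a function `build_model()` that returns a freshly compiled Keras model, and arrays `X_train`/`X_val` of shape `(samples, timesteps, n_features)`; all of these names are placeholders, not part of the answer itself:

```python
import numpy as np

rng = np.random.default_rng(0)
val_losses = {}

for f in range(X_train.shape[-1]):
    X_shuf = X_train.copy()
    # permute feature f across samples so its information is destroyed
    X_shuf[:, :, f] = X_shuf[rng.permutation(len(X_shuf)), :, f]
    hist = build_model().fit(
        X_shuf, y_train,
        validation_data=(X_val, y_val),
        epochs=50, verbose=0,
    )
    val_losses[f] = hist.history["val_loss"][-1]

# the features whose shuffling hurt validation loss the most matter most
print(sorted(val_losses.items(), key=lambda kv: kv[1], reverse=True))
```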

scholle

Linking to the same paper as @scholle but explaining the process differently (book and paper).

  1. You do not need to train the model multiple times. The algorithm described in the links above requires a trained model to begin with.
  2. Given a trained model, compute the metric of interest on some dataset (the book discusses pros/cons of using training set vs test set).
  3. For each feature, using that same dataset, shuffle the values of the feature in question. All other features and the labels should remain unchanged for each observation.
  4. Perform inference on the model with this shuffled dataset (one shuffled feature at a time), and compute the desired metric for each pass.
  5. Now compute the difference between the original metric (on the unchanged dataset) and the metric obtained for each feature pass (the book also mentions dividing the permuted score by the original score).

Voila! The list of feature importances is the output of step 5, sorted in descending order: a higher value means the feature is more important to the model in question.
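
A minimal sketch of steps 2-5 for a trained Keras model, assuming inputs of shape `(samples, timesteps, n_features)` and MSE as the metric of interest; `model`, `X`, and `y` are placeholders assumed to already exist:

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    # step 2: metric on the unchanged dataset
    base = np.mean((model.predict(X, verbose=0) - y) ** 2)
    importances = np.zeros(X.shape[-1])
    for f in range(X.shape[-1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # step 3: shuffle only feature f, leave everything else unchanged
            X_perm[..., f] = X_perm[rng.permutation(len(X)), :, f]
            # step 4: inference on the shuffled copy
            score = np.mean((model.predict(X_perm, verbose=0) - y) ** 2)
            # step 5: difference from the original metric
            drops.append(score - base)
        importances[f] = np.mean(drops)
    return importances   # sort descending: higher = more important

# importances = permutation_importance(model, X_val, y_val)
# ranking = np.argsort(importances)[::-1]
```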

Edit: should I use the training set or the test/dev set for permutation feature importance?

The book linked above addresses this question. A more concise answer can be found in scikit-learn's docs:

Permutation importances can be computed either on the training set or on a held-out testing or validation set. Using a held-out set makes it possible to highlight which features contribute the most to the generalization power of the inspected model. Features that are important on the training set but not on the held-out set might cause the model to overfit.
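
For reference, scikit-learn ships this procedure as `sklearn.inspection.permutation_importance`. It expects an sklearn-style fitted estimator and 2-D inputs, so this runnable sketch uses a stand-in regressor and synthetic data rather than the Keras model from the question:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

est = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
# computed on the held-out set, per the advice quoted above
result = permutation_importance(est, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)   # one mean importance per feature
```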


You can do this sort of thing using SHAP; it looks at permutation importance as well.
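
A hedged sketch of how that might look with SHAP's permutation explainer; here `model` is assumed to be a fitted predictor over 2-D tabular inputs and `X` a NumPy array, neither of which comes from this answer:

```python
import shap  # pip install shap

# wrap the model's predict function and use the data itself as the masker
explainer = shap.explainers.Permutation(model.predict, X)
shap_values = explainer(X[:100])   # explain the first 100 rows
shap.plots.bar(shap_values)        # global feature-importance summary
```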