
I understand how to make predictions with a trained neural network model that uses loss='binary_crossentropy' and a single-node sigmoid output layer for binary classification.

But how can I determine the strength of association between a feature and a label? I'm trying to make a neural network that competes with generalized linear models that show p values for each feature.

The predictions are not the important part. I need to provide insight about the features so that we can learn about the biology of a disease (DNA variant is the feature, disease is the label). I know that there is such a thing as feature importance, but isn't that just a rank-ordered list of features?

from keras.models import Sequential
from keras.layers import Dense

def create_baseline():
    # 60 input features -> one hidden ReLU layer -> single sigmoid output for binary classification
    model = Sequential()
    model.add(Dense(60, input_dim=60, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
LayneSadler

1 Answer


I think a NN is a little problematic here, because you run into the bias-variance trade-off. NNs are really good at producing estimates with low variance. However, if you look for the causal influence of $x$ on some $y$, you would rather choose an unbiased estimator (with higher variance). Options are OLS or Lasso/Ridge estimators. Especially with DNA data, the Lasso is a good choice, since you may face a problem of (relatively) high dimensionality.
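As a minimal sketch of the penalized-regression route (not your exact setup): an L1-penalized logistic regression shrinks uninformative coefficients to exactly zero, and the surviving coefficients give a sign and rough strength of association per variant. The data, the variable names, and the regularization strength C below are all placeholders.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: rows are samples, columns are the 60 DNA-variant features.
X = np.random.rand(200, 60)
y = np.random.randint(0, 2, size=200)

# L1-penalized (Lasso) logistic regression; smaller C means stronger shrinkage.
lasso_logit = LogisticRegression(penalty='l1', solver='liblinear', C=1.0)
lasso_logit.fit(X, y)

# Non-zero coefficients are the variants the model keeps.
for idx, coef in enumerate(lasso_logit.coef_.ravel()):
    if coef != 0:
        print("feature %d: coefficient %+.3f" % (idx, coef))

If you also need p-values per feature, a plain (unpenalized) logistic regression fitted with a statistics package will report them directly, at the cost of handling high dimensionality yourself.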

If you really want to explore feature importance in a NN setting, I guess you need to resort to stepwise forward/backward techniques, similar to those used for feature selection. See here and here for some background. To the best of my knowledge, Keras does not come with a predefined function for this.
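A rough sketch of the backward idea, in a single leave-one-feature-out pass: retrain the model without each feature and record how much validation accuracy drops; larger drops suggest a stronger association. This is only an illustration, assuming features in X and labels in y; make_model is a hypothetical helper that mirrors your architecture but accepts a variable input size.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split

def make_model(n_features):
    # Same architecture as create_baseline(), but with a flexible input size.
    model = Sequential()
    model.add(Dense(60, input_dim=n_features, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# Hypothetical data: X has one column per DNA variant, y is the disease label.
X = np.random.rand(200, 60)
y = np.random.randint(0, 2, size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Baseline accuracy with all features.
full = make_model(X_train.shape[1])
full.fit(X_train, y_train, epochs=20, batch_size=16, verbose=0)
baseline_acc = full.evaluate(X_val, y_val, verbose=0)[1]

# Leave one feature out at a time, retrain, and record the accuracy drop.
for j in range(X_train.shape[1]):
    keep = [k for k in range(X_train.shape[1]) if k != j]
    reduced = make_model(len(keep))
    reduced.fit(X_train[:, keep], y_train, epochs=20, batch_size=16, verbose=0)
    acc = reduced.evaluate(X_val[:, keep], y_val, verbose=0)[1]
    print("feature %d: accuracy drop %+.3f" % (j, baseline_acc - acc))

Note that this retrains the network once per feature, so it is expensive, and the results are noisy unless you cross-validate or average over several random seeds.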

Peter