13

Background: While fitting neural networks with ReLU activations, I found that the predictions sometimes become nearly constant. I believe this is due to the ReLU neurons dying during training, as stated here. (What is the "dying ReLU" problem in neural networks?)

Question: What I'm hoping to do is implement a check in the code itself to detect whether the neurons are dead. After that, the code could refit the network if needed.

As such, what is a good criterion for detecting dead neurons? Currently, I'm thinking of using low variance in the predictions as the criterion (roughly sketched below).

If it helps, I'm using Keras.
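For reference, a rough sketch of the variance check I have in mind (the 1e-6 tolerance and the names model, x_train, y_train, and build_model are placeholders, not a worked-out solution):

    import numpy as np

    def looks_collapsed(model, x_train, tol=1e-6):
        """Heuristic: near-zero variance of the predictions suggests the
        network has collapsed to a (near-)constant output."""
        preds = model.predict(x_train, verbose=0)
        return float(np.var(preds)) < tol

    # Intended use: refit from a fresh initialization if the check fires.
    # if looks_collapsed(model, x_train):
    #     model = build_model()                  # hypothetical model factory
    #     model.fit(x_train, y_train, epochs=50)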

Aveiur

3 Answers

7

A dead ReLU pretty much just means that its argument is negative for every input, so the gradient stays at 0 no matter how you train it from that point on. You can simply look at the gradients during training to see whether a ReLU is dead or not.

In practice, you may simply want to use leaky ReLUs: instead of f(x) = max(0, x), you set f(x) = x if x > 0 and f(x) = 0.01x if x <= 0. This way there is always a small non-zero gradient, and the unit should no longer get fully stuck during training.
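For example, in Keras a leaky ReLU can be added as its own activation layer; a minimal sketch (the layer sizes are arbitrary, and depending on the Keras version the slope argument is called alpha or negative_slope):

    import tensorflow as tf

    # A small regression network using leaky ReLU, so the gradient is
    # 0.01 instead of 0 for negative pre-activations.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(10,)),
        tf.keras.layers.Dense(32),
        tf.keras.layers.LeakyReLU(alpha=0.01),  # f(x) = x if x > 0 else 0.01 * x
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")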

3

A dead neuron is a neuron that does not update during training, i.e. its gradient is 0.

Keras allows gradient extraction directly for a given row of data. (Another nice example)

Or you can extract the neuron weights and calculate the gradient yourself (e.g. for ReLU, a negative argument gives a 0 gradient).
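For example, a sketch of that manual check for the first Dense layer of a Keras model (model, the input batch x_batch, and the layer index 0 are assumptions; for deeper layers you would feed in that layer's actual inputs rather than the raw features):

    import numpy as np

    # Kernel W and bias b of the first Dense layer.
    W, b = model.layers[0].get_weights()

    # Pre-activations, i.e. the arguments handed to ReLU, for one batch of data.
    pre_act = x_batch @ W + b                   # shape: (batch_size, n_units)

    # A unit whose pre-activation is <= 0 for every row of this batch
    # receives zero gradient from this batch.
    dead_in_batch = np.where((pre_act <= 0).all(axis=0))[0]
    print("units with zero gradient on this batch:", dead_in_batch)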

Unfortunately, the gradient is data-point specific. Only if the gradient is 0 for every row of training data can you be sure that the neuron will not update for any minibatch during a training epoch.
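A sketch of such a whole-training-set check using TensorFlow's GradientTape (it assumes a Keras model built on TensorFlow with Dense layers, training arrays x_train/y_train, and a mean-squared-error loss; per-row gradients could in principle cancel each other out, so treat a zero total gradient as a strong indicator rather than a proof):

    import numpy as np
    import tensorflow as tf

    def dead_unit_indices(model, x_train, y_train):
        """For every Dense kernel in the model, return the indices of units
        (columns) whose gradient is exactly zero over the whole training set."""
        loss_fn = tf.keras.losses.MeanSquaredError()    # assumed loss
        with tf.GradientTape() as tape:
            loss = loss_fn(y_train, model(x_train, training=True))
        grads = tape.gradient(loss, model.trainable_variables)

        dead = {}
        for i, (var, grad) in enumerate(zip(model.trainable_variables, grads)):
            if "kernel" in var.name and grad is not None:
                col_norms = tf.norm(grad, axis=0).numpy()   # one column per unit
                dead[(i, var.name)] = np.where(col_norms == 0)[0]
        return dead

    print(dead_unit_indices(model, x_train, y_train))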

Leaky ReLU can be a helpful strategy, since there is no input value for which the leaky ReLU gradient equals 0.

D Bolta
0

The dying (dead) ReLU problem can be detected (and monitored automatically, as sketched below) when:

  • convergence is slow or has stopped completely;
  • the loss function returns NaN.
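Both symptoms can be watched for during training with built-in Keras callbacks; a minimal sketch (the model, the training arrays, and the patience / min_delta values are placeholders):

    import tensorflow as tf

    callbacks = [
        # Symptom 2: abort training as soon as the loss becomes NaN.
        tf.keras.callbacks.TerminateOnNaN(),
        # Symptom 1: stop when the loss has effectively stopped improving.
        tf.keras.callbacks.EarlyStopping(monitor="loss", min_delta=1e-4,
                                         patience=5, restore_best_weights=True),
    ]

    history = model.fit(x_train, y_train, epochs=100, callbacks=callbacks)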