
When using a multi-layer LSTM with dropout, is it advisable to put dropout on all hidden layers as well as on the output Dense layers? In Hinton's paper (which proposed dropout), he only applied dropout to the Dense layers, but that was because the hidden inner layers were convolutional.

Obviously, I can test this for my specific model, but I wondered whether there is a consensus on this.

BigBadMe

2 Answers


I prefer not to add dropout inside LSTM cells, for one specific and clear reason. LSTMs are good at retaining information over long terms, but an important caveat is that they are not very good at memorising multiple things simultaneously. The logic of dropout is to add noise to the neurons so that the network does not become dependent on any specific neuron. By adding dropout to LSTM cells, there is a chance of forgetting something that should not be forgotten. Consequently, as with CNNs, I always prefer to use dropout in the dense layers after the LSTM layers.
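
As a minimal sketch of this layout in Keras (assuming TensorFlow 2.x; the layer sizes, dropout rates, and the (timesteps, features) input shape of (100, 8) are arbitrary placeholders, not values from the answer):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        # Stacked LSTM layers with no dropout, leaving the recurrent
        # state untouched
        layers.LSTM(64, return_sequences=True, input_shape=(100, 8)),
        layers.LSTM(64),
        # Dropout only on the dense head, as with a CNN classifier
        layers.Dropout(0.5),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")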

Green Falcon

There is no consensus that holds across all model types.

Thinking of dropout as a form of regularisation, how much of it to apply (and where) will inherently depend on the type and size of the dataset, as well as on the complexity of the model you have built (how big it is).
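
For example, Keras exposes dropout on the LSTM inputs (dropout=) and on the recurrent connections (recurrent_dropout=) as separate arguments, so both the amount and the placement can be treated as hyperparameters to tune per dataset and model (a minimal sketch; all rates, sizes, and the input shape are placeholders):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        # dropout= masks the layer inputs; recurrent_dropout= masks the
        # recurrent state between timesteps
        layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2,
                    input_shape=(100, 8)),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),
    ])

Note that in TensorFlow, a non-zero recurrent_dropout typically disables the fused cuDNN kernel, so the layer trains more slowly on GPU.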

n1k31t4