
When using a multi-layer LSTM with dropout, is it advisable to put dropout on all hidden layers as well as on the output Dense layers? In Hinton's paper (which proposed dropout), he only applied dropout to the Dense layers, but that was because the hidden inner layers were convolutional.

Obviously, I can test this for my specific model, but I wondered whether there is a consensus on this.

BigBadMe

2 Answers


I prefer not to add dropout inside LSTM cells, for one specific and clear reason. LSTMs are good at retaining information over long terms, but an important caveat is that they are not very good at memorising multiple things simultaneously. The logic of dropout is to add noise to the neurons so that the network does not become dependent on any specific neuron. By adding dropout to LSTM cells, there is a chance of forgetting something that should not be forgotten. Consequently, as with CNNs, I always prefer to use dropout in the dense layers after the LSTM layers.
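
As a minimal sketch of this layout in Keras (assuming TensorFlow 2.x; the layer sizes, dropout rates, and the (timesteps, features) input shape of (100, 8) are arbitrary placeholders, not values from the answer):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        # Stacked LSTM layers with no dropout, leaving the recurrent
        # state untouched
        layers.LSTM(64, return_sequences=True, input_shape=(100, 8)),
        layers.LSTM(64),
        # Dropout only on the dense head, as with a CNN classifier
        layers.Dropout(0.5),
        layers.Dense(32, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")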

Green Falcon

There is no consensus that holds across all model types.

Thinking of dropout as a form of regularisation, how much of it to apply (and where) will inherently depend on the type and size of the dataset, as well as on the complexity of the model you have built (how big it is).
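
For example, Keras exposes dropout on the LSTM inputs (dropout=) and on the recurrent connections (recurrent_dropout=) as separate arguments, so both the amount and the placement can be treated as hyperparameters to tune per dataset and model (a minimal sketch; all rates, sizes, and the input shape are placeholders):

    import tensorflow as tf
    from tensorflow.keras import layers, models

    model = models.Sequential([
        # dropout= masks the layer inputs; recurrent_dropout= masks the
        # recurrent state between timesteps
        layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2,
                    input_shape=(100, 8)),
        layers.Dropout(0.3),
        layers.Dense(1, activation="sigmoid"),
    ])

Note that in TensorFlow, a non-zero recurrent_dropout typically disables the fused cuDNN kernel, so the layer trains more slowly on GPU.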

n1k31t4