
Is there a way to reduce the noisiness and stochasticity of Rprop (and, for that matter, iRprop+)?

Specifically, in deep networks (8+ layers) this effect becomes apparent as the earliest layers are adjusted. Those adjustments have an outsized effect on the output, and the error jumps around.

The noise also occurs in a network with just 2-3 layers, but there it only becomes apparent once the error has reached 0.00 and iRprop+ keeps running. In some cases there is a sudden, very abrupt change, and the cross-entropy cost function produces an error larger than 1,000,000.
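
For concreteness, here is a minimal per-weight sketch of the iRprop+ update as I understand it from the Igel & Hüsken formulation (the struct and parameter names are mine, and `errorIncreased` is the comparison E(t) > E(t-1) computed outside the update):

```cpp
#include <algorithm>

// Per-weight iRprop+ state and update (simplified sketch).
// etaPlus/etaMinus grow/shrink the step size; deltaMin/deltaMax clamp it.
struct IRpropPlus {
    double delta    = 0.1;  // current step size (Delta_0 initially)
    double prevGrad = 0.0;  // gradient from the previous iteration
    double prevStep = 0.0;  // weight change applied in the previous iteration

    static constexpr double etaPlus  = 1.2;
    static constexpr double etaMinus = 0.5;
    static constexpr double deltaMin = 1e-6;
    static constexpr double deltaMax = 50.0;

    void update(double& w, double grad, bool errorIncreased) {
        const double s = grad * prevGrad;
        if (s > 0.0) {
            // Same sign: accelerate. This branch keeps multiplying the
            // step size up toward deltaMax, even when the error is ~0.
            delta = std::min(delta * etaPlus, deltaMax);
            prevStep = -sign(grad) * delta;
            w += prevStep;
        } else if (s < 0.0) {
            // Sign flip: shrink the step and, if the error got worse,
            // revert the previous weight change (the "+" in iRprop+).
            delta = std::max(delta * etaMinus, deltaMin);
            if (errorIncreased) w -= prevStep;
            grad = 0.0;  // suppress adaptation on the next iteration
        } else {
            prevStep = -sign(grad) * delta;
            w += prevStep;
        }
        prevGrad = grad;
    }

    static double sign(double x) { return (x > 0.0) - (x < 0.0); }
};
```

Lowering deltaMax (and possibly etaPlus) is the most direct knob I can see for taming those abrupt jumps: near zero error the gradients stay tiny but keep the same sign, so the same-sign branch grows the step all the way to deltaMax before a single overshoot blows the cost up.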

I've built a custom LSTM in C++ and I'm experiencing this noise while debugging overfitting.

Perhaps the initial step sizes (learning rates) for each layer should be set to different magnitudes, depending on the layer's depth? Something like the sketch below.
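
For illustration, a hypothetical per-layer Δ₀ initialization might look like this (the direction of the scaling, the base value, and the decay factor are all untested guesses on my part):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical depth-dependent initialization of the Rprop step size
// Delta_0: input-side layers start with smaller steps than output-side
// ones. Both base and decay are placeholders, not tuned values.
std::vector<double> initialStepSizes(std::size_t numLayers,
                                     double base = 0.1,
                                     double decay = 0.5) {
    std::vector<double> delta0(numLayers);
    for (std::size_t l = 0; l < numLayers; ++l) {
        // Layer 0 (closest to the input) gets the smallest step size.
        delta0[l] = base * std::pow(decay, double(numLayers - 1 - l));
    }
    return delta0;
}
```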

