
It seems that the Adaptive Moment Estimation (Adam) optimizer nearly always works better (converging faster and more reliably toward a global minimum) when minimising the cost function while training neural nets.

Why not always use Adam? Why even bother using RMSProp or momentum optimizers?
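For concreteness, here is a minimal sketch of the three update rules I mean, written in plain NumPy just to fix notation (the variable names and the commonly quoted default hyperparameters are mine, not taken from any particular library):

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Momentum: accumulate an exponentially decaying sum of past gradients.
    v = beta * v + grad
    return w - lr * v, v

def rmsprop_step(w, grad, s, lr=0.001, beta=0.9, eps=1e-8):
    # RMSProp: scale each step by a running average of squared gradients.
    s = beta * s + (1 - beta) * grad**2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: combine both ideas, with bias correction for the running averages.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)   # t is the 1-based step count
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```

Written side by side like this, Adam is essentially RMSProp's per-parameter scaling applied to a momentum-style smoothed gradient, so the question is really whether that combination is always the right choice.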

PyRsquared

2 Answers


Here’s a blog post reviewing a paper which argues that SGD generalizes better than Adam.

There is often value in using more than one method (an ensemble), because every method has a weakness.
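If you want to test the generalization claim (and the value of an ensemble) on your own data, a minimal sketch, assuming a toy synthetic dataset and arbitrary hyperparameters of my own choosing, would be to train identical models with SGD and with Adam, compare their held-out loss, and then average their predictions:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(1000, 20)
y = (X[:, :5].sum(dim=1, keepdim=True) > 0).float()   # synthetic binary labels
X_tr, y_tr, X_va, y_va = X[:800], y[:800], X[800:], y[800:]
loss_fn = nn.BCEWithLogitsLoss()

def train(make_optimizer, epochs=200):
    model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1))
    opt = make_optimizer(model.parameters())
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(X_tr), y_tr).backward()
        opt.step()
    return model

sgd_model = train(lambda p: torch.optim.SGD(p, lr=0.1, momentum=0.9))
adam_model = train(lambda p: torch.optim.Adam(p, lr=1e-3))

with torch.no_grad():
    for name, m in [("SGD", sgd_model), ("Adam", adam_model)]:
        print(name, "validation loss:", loss_fn(m(X_va), y_va).item())
    # Simple two-member ensemble: average the two models' logits.
    ens_logits = (sgd_model(X_va) + adam_model(X_va)) / 2
    print("Ensemble validation loss:", loss_fn(ens_logits, y_va).item())
```

On a real problem you would of course repeat this over several seeds and tune the learning rates before drawing any conclusion.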

Zephyr

You should also take a look at this post comparing different gradient descent optimizers. As the animations there show, Adam is clearly not the best optimizer for every task; on some loss surfaces several of the other methods converge faster.
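If the figures from that post aren't enough, a quick way to run the same kind of comparison yourself is the sketch below; the Rosenbrock surface, learning rates, and step counts are arbitrary choices of mine, just for illustration:

```python
import torch

def rosenbrock(p):
    # Toy 2-D test surface with a global minimum at (1, 1).
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def run(opt_cls, steps=2000, **kwargs):
    p = torch.tensor([-1.5, 2.0], requires_grad=True)
    opt = opt_cls([p], **kwargs)
    for _ in range(steps):
        opt.zero_grad()
        rosenbrock(p).backward()
        opt.step()
    return p.detach(), rosenbrock(p.detach()).item()

for name, cls, kw in [
    ("SGD+momentum", torch.optim.SGD, dict(lr=1e-4, momentum=0.9)),
    ("RMSprop", torch.optim.RMSprop, dict(lr=1e-2)),
    ("Adam", torch.optim.Adam, dict(lr=1e-2)),
]:
    point, loss = run(cls, **kw)
    print(f"{name:14s} final point {point.numpy()}, loss {loss:.4f}")
```

Depending on the surface and the learning rates, the ranking changes, which is exactly why it is worth trying more than one optimizer rather than defaulting to Adam.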