1

Assume a function similar to a saddle, like this: [image of a saddle surface]

At the origin, the gradient will be 0. However, assume you have a function similar to this one where the gradient of the saddle at the origin is not 0.

Then obviously, the direction of maximum increase is either the x direction or the y direction. However, if you add up those two vectors, which is what taking the gradient of this function does, you get a vector going sideways, which is not in the direction of maximum increase.

So, this is confusing me. The vector sum of the rate of increase in the x direction and the rate of increase in the y direction does not equal the direction of highest increase.

JobHunter69
  • 3,613

1 Answer

2

The gradient is not the direction of maximum rate of increase. It is the direction of maximum infinitesimal rate of increase, to first order. Said another way, it is the direction of maximum rate of increase of the first-order approximation.

At a point where the gradient is zero or close to zero, the first-order approximation may stop being a good approximation, and so the gradient may stop having anything to do with the actual direction of maximum rate of increase, which may now be mostly controlled e.g. by the second-order approximation (so by the Hessian). This is a known practical issue when applying gradient descent to optimize a function; sometimes you can get stuck in "plateaus."
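A minimal numerical sketch of this, under stated assumptions: the tilted saddle $x^2 - y^2 + 0.1(x + y)$ and the names `f`, `best_direction`, and `tilt` are illustrative choices, not taken from the question. The gradient at the origin points along $(1, 1)$, i.e. at 45 degrees. The code brute-forces the best direction for several step lengths from the origin.

```python
import math

def f(x, y, tilt=0.1):
    # Saddle x^2 - y^2 plus a small linear tilt (assumed for illustration),
    # so the gradient at the origin is (tilt, tilt), i.e. 45 degrees.
    return x * x - y * y + tilt * (x + y)

def best_direction(step, samples=3600):
    """Angle in degrees of the sampled direction that increases f the most
    when taking a step of length `step` away from the origin."""
    best_angle, best_gain = 0.0, -math.inf
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        gain = f(step * math.cos(angle), step * math.sin(angle)) - f(0.0, 0.0)
        if gain > best_gain:
            best_angle, best_gain = angle, gain
    return math.degrees(best_angle)

if __name__ == "__main__":
    for step in (1.0, 0.1, 0.01, 0.001):
        print(f"step={step:<6} best angle = {best_direction(step):.1f} degrees")
```

For the large step the best direction should sit near the x axis (the second-order term dominates), and as the step shrinks it should approach the 45 degree gradient direction.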

Qiaochu Yuan
  • 468,795
  • I was assuming that the function was a variation of the saddle point which had the saddle point's general shape and increased the most infinitesimally at the origin in the y direction. – JobHunter69 Mar 17 '17 at 18:37
  • The gradient in this case isn't going to point in the maximum infinitesimal rate of increase. – JobHunter69 Mar 17 '17 at 18:37
  • @Goldname: at the saddle point itself, the actual best direction to move in is either in the x or the y direction, as you say. This is because at the saddle point the gradient is zero and so one must work with the Hessian, and then everything is controlled by the eigenvectors and eigenvalues of the Hessian. At a slight tilt of the saddle point so that the gradient is not zero, I claim that if you zoom in far enough it stops looking like a saddle point, and the gradient dominates the shape of the saddle point. If you don't zoom in far enough then the Hessian is still important. – Qiaochu Yuan Mar 17 '17 at 18:40
  • I know that it is 0 at the origin. Take a point slightly to the right of the origin. Are you saying that the highest rate of change when infinitesimally close to the origin is no longer the x and y direction? – JobHunter69 Mar 18 '17 at 00:06
  • @Goldname: I'm not sure what direction "right" is, but if you mean right on your image, the answer is yes, I am saying that. Here is an explicit calculation: take $f(x, y) = x^2 - y^2$ and suppose we are considering the point $(\epsilon, \epsilon)$. At this point the partial derivatives are $\frac{\partial f}{\partial x} = 2 \epsilon, \frac{\partial f}{\partial y} = - 2 \epsilon$, which reveals that the direction of maximum infinitesimal rate of increase is in the direction of the vector $(1, -1)$. You can verify this for yourself experimentally by taking $\epsilon$ to be small and then taking the... – Qiaochu Yuan Mar 18 '17 at 01:35
  • ...step size to be even smaller. – Qiaochu Yuan Mar 18 '17 at 01:35
  • Ok, I see what you mean when it is infinitesimally close, but take instead a point that is (on the image) 1/2 to the right of the origin, (0.5, 0). It is in the middle of the curve sloping upwards. In this case, how is it not that the direction of maximum rate of increase is directly upwards, to the right? – JobHunter69 Mar 18 '17 at 02:07
  • @Goldname: That is the direction of maximum rate of increase (for sufficiently small step sizes), and it's also where the gradient is pointing. You can now consider the point $(\epsilon, 0)$ instead of $(\epsilon, \epsilon)$ in the computation above. You get that the partial derivatives are $2 \epsilon$ and $0$, so the gradient points in the $(1, 0)$ direction. – Qiaochu Yuan Mar 18 '17 at 03:06
  • Yes, I see. I was visualizing the slope incorrectly, thanks very much – JobHunter69 Mar 18 '17 at 05:01
  • So this https://math.stackexchange.com/questions/223252/why-is-gradient-the-direction-of-steepest-ascent is complete nonsense? – user123124 Jul 11 '18 at 15:33
  • 1
    @user1: "complete nonsense" is too strong, but I would say the top answer is being imprecise. Halfway through it replaces the function with its first-order approximation without saying so. – Qiaochu Yuan Jul 11 '18 at 18:35
  • @Qiaochu Yuan I got confused by the comments; it looks like you are confirming that the greatest rate of increase of the function is indeed in the direction of the gradient. Or did I misinterpret your answer? It looks like we are coping with the case of almost zero gradient as well. –  Aug 01 '18 at 13:30
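
The explicit calculation from the comments can be checked experimentally, as suggested there: take $\epsilon$ small and the step size even smaller. The sketch below is an illustration, not part of the original thread. The helper names `steepest_angle` and `gradient_angle`, and the values $\epsilon = 10^{-3}$ and step $10^{-6}$, are assumptions.

```python
import math

def f(x, y):
    # The saddle used in the comments: f(x, y) = x^2 - y^2.
    return x * x - y * y

def grad_f(x, y):
    # Analytic gradient of f.
    return (2 * x, -2 * y)

def gradient_angle(x0, y0):
    """Angle in degrees, in [0, 360), of the gradient at (x0, y0)."""
    gx, gy = grad_f(x0, y0)
    return math.degrees(math.atan2(gy, gx)) % 360.0

def steepest_angle(x0, y0, step, samples=3600):
    """Angle in degrees of the sampled direction giving the largest increase
    of f for a step of length `step` starting at (x0, y0)."""
    base = f(x0, y0)
    gains = []
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        gain = f(x0 + step * math.cos(angle), y0 + step * math.sin(angle)) - base
        gains.append((gain, angle))
    _, best = max(gains)
    return math.degrees(best)

if __name__ == "__main__":
    eps, step = 1e-3, 1e-6  # step chosen much smaller than eps
    for point in ((eps, eps), (eps, 0.0)):
        print(point, gradient_angle(*point), steepest_angle(*point, step))
```

With these values the brute-force direction should agree with the gradient direction to within the angular sampling resolution at both $(\epsilon, \epsilon)$ and $(\epsilon, 0)$, matching the $(1, -1)$ and $(1, 0)$ directions computed in the comments.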