I imagine it as if one is going up a physical hill. It doesn't seem like there's a guarantee that going in the opposite direction of greatest increase in height will necessarily be the direction of greatest decrease in height.
-
The crucial point here is that a physical hill doesn't have to be differentiable. Note that a differentiable function can be approximated by a hyperplane near the point you are looking at. That the statement is true for hyperplanes, at least in $\mathbb R^3$, should again be intuitive. – Lukas Betz May 20 '15 at 18:37
-
There are many similar questions already on stackexchange. Compare Why gradient descent works?. – David K May 20 '15 at 18:38
-
Lebtz is right: the key is to think about what "greatest ascent/descent" actually means. How is it measured? In this case, to measure the rate of ascent up a hill in a direction $v$ means to first draw the tangent plane to the hill at your current location, then measure the slope of the plane in the $v$ direction. And now it should be intuitive that the slope in the $-v$ direction is opposite that in the $v$ direction. – user7530 May 20 '15 at 18:45
-
This comes down to smoothness. If a function is differentiable at a point, then it is continuous there, but more importantly, it is smooth there. So we can draw a tangent plane to the surface of z = f(x, y), for example, at that point. Then imagine drawing a tangent line in the tangent plane, at the point of tangency. If somehow the slope of the line were greater going down in a different direction, then it would be greater going up in the opposite direction, contradicting the direction of maximal increase. – KenM Feb 01 '21 at 21:25
1 Answer
The question is a tad old, but I wanted to leave some info in case others run into it. There is a wonderful explanation of the why in this Khan Academy video, "Why the gradient is the direction of steepest ascent" (or descent, if you throw in a minus sign): https://www.youtube.com/watch?v=TEB2z7ZlRAw
The gradient is the direction of greatest ascent (descent, if we have a minus sign) because it is the "thing" that maximizes the value of the directional derivative: the maximum value of the directional derivative at some point in space occurs when we look along a unit vector that has the same direction as the gradient vector.
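To spell that step out, here is a sketch of the standard argument, assuming $f$ is differentiable at the point $p$ and $\nabla f(p) \neq 0$. The symbols $p$, $u$ and $\theta$ are notation introduced here for illustration, not taken from the question. For a unit vector $u$ that makes an angle $\theta$ with the gradient,

$$D_u f(p) = \nabla f(p) \cdot u = \|\nabla f(p)\| \, \|u\| \cos\theta = \|\nabla f(p)\| \cos\theta .$$

Since $-1 \le \cos\theta \le 1$, the directional derivative is largest at $\theta = 0$, where $u = \nabla f(p) / \|\nabla f(p)\|$, and smallest at $\theta = \pi$, where $u = -\nabla f(p) / \|\nabla f(p)\|$:

$$\max_{\|u\|=1} D_u f(p) = \|\nabla f(p)\|, \qquad \min_{\|u\|=1} D_u f(p) = -\|\nabla f(p)\| .$$

The same formula gives $D_{-u} f(p) = -D_u f(p)$ for every $u$, which is why the direction of steepest descent is exactly opposite the direction of steepest ascent. A physical hill with a kink or a ridge at $p$ is not differentiable there, so this argument does not apply to it.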
Addendum: this may seem like circular logic. But the way I think about it is that a gradient by itself is a vector field. So in order to find a "direction" we ought to use a dot product somewhere. And what easier and more reliable way is there to get the optimal direction of change than to take the dot product of the gradient with the unit vector pointing in the same direction as the gradient?
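For anyone who wants to check this numerically without trusting the dot-product formula, here is a minimal sketch in Python. The function `f`, the sample point `(0.3, -0.7)` and the helper names are all made up for illustration. The slopes are measured by finite differences, so the gradient is only used at the end for comparison.

```python
import math

def f(x, y):
    # Hypothetical smooth "hill", chosen only for illustration.
    return math.sin(x) + x * y - 0.5 * y ** 2

def grad_f(x, y):
    # Analytic gradient of the f above, used only for the final comparison.
    return (math.cos(x) + y, x - y)

def slope(x, y, theta, h=1e-5):
    # Central finite-difference slope of f at (x, y) along the unit
    # vector (cos theta, sin theta). No gradient formula is used here.
    dx, dy = h * math.cos(theta), h * math.sin(theta)
    return (f(x + dx, y + dy) - f(x - dx, y - dy)) / (2 * h)

def extreme_directions(x, y, steps=3600):
    # Brute-force scan over evenly spaced directions; returns the angles
    # (in radians) of the steepest measured ascent and descent.
    angles = [2 * math.pi * k / steps for k in range(steps)]
    slopes = [slope(x, y, t) for t in angles]
    up = angles[slopes.index(max(slopes))]
    down = angles[slopes.index(min(slopes))]
    return up, down

x0, y0 = 0.3, -0.7
up, down = extreme_directions(x0, y0)
gx, gy = grad_f(x0, y0)
grad_angle = math.atan2(gy, gx) % (2 * math.pi)

print("steepest ascent angle :", up)
print("gradient angle        :", grad_angle)
print("descent minus ascent  :", (down - up) % (2 * math.pi), "vs pi =", math.pi)
```

Up to the resolution of the scan, the steepest-ascent angle should agree with the gradient's angle, and the steepest-descent angle should sit half a turn away from it.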