2

I want to understand intuitively why it is that the gradient gives the direction of steepest ascent. (I will consider the case of $f:\mathbb{R}^2\to\mathbb{R}$)

The standard proof is to note that, for a unit vector $v$ making angle $\theta$ with $\nabla f$, the directional derivative is $$D_vf=v\cdot \nabla f=|\nabla f|\,\cos\theta,$$ which is maximized at $\theta=0$. This is a good verification, but it doesn't really help me understand the result.

  • http://math.stackexchange.com/a/176782/2900 – M.B. May 03 '14 at 19:30
  • @M.B. That's basically the proof of the directional derivative formula. $D_tf(x(t))=\nabla f(x(t))\cdot x'(t)$ at $t=0$ is $\nabla f(x,y)\cdot v$. –  May 03 '14 at 19:36
  • Yes, sure. Then I guess you want this: http://math.stackexchange.com/questions/686538/how-to-explain-lagrange-multipliers-to-a-lay-audience – M.B. May 03 '14 at 19:42
  • @M.B. That seems to presuppose that gradients are normal to level surfaces (why is that true intuitively?)... sorry for being stubborn :) –  May 03 '14 at 20:03
  • This result is not very intuitive, is it? – cactus314 May 09 '14 at 22:05
  • Just trying to clarify: If you don't view the standard proof as intuitive, what sort of understanding are you seeking? I can only see three conceptual steps: The first-order variation of $f$ is linear in $v$; $\nabla f$ is dual to $Df$; and $Df(v) = v \cdot \nabla f = |\nabla f| \cos\theta$ for every unit vector $v$.... – Andrew D. Hwang May 10 '14 at 17:37
  • @user86418 A visual/geometric understanding would be best. The proof outline(s) you give are very simple and easy to understand, but I wouldn't call them "intuitive" any more than I would call the quadratic formula "intuitive." –  May 10 '14 at 20:03

2 Answers

1

Maybe the following helps to understand the intuition behind the object $\langle \nabla f,v\rangle$ occurring in the standard proof: $\nabla f(x)$ is the vector composed of the directional derivatives of $f$ in the directions of the $n$ standard basis vectors $e_1,\ldots,e_n$. Now consider a unit vector $v$ in the 1-norm, i.e. $\sum |v_i|=1$. For simplicity let's think of the case $v_i\geq 0$.

Therefore $\langle \nabla f(x),v\rangle = \sum \frac{\partial f}{\partial x_i}(x) v_i$ is a convex combination of directional derivatives which is the directional derivative in the convex combination of the different directions. (Remember that derivatives are, intuitively, linear approximations to the function.) This is the equation $D_v f(x) = \langle \nabla f(x),v\rangle$. Thus, if we want to find the $v$ with the maximal value of $D_v f(x)$, we have to maximize $\langle \nabla f(x),v\rangle$.

Now the intuition behind $\langle u,v\rangle$ comes from thinking in terms of orthogonal projections: after rescaling $v$ to Euclidean length 1, the scalar product equals the (signed) length of the projection of $u$ onto the line given by the direction $v$. This length can only be maximal if nothing is lost during the projection, i.e. if $u$ has no component orthogonal to $v$. Therefore $v$ must be a multiple of $u$, and a positive multiple because we want a maximum.

Putting everything together: $D_vf(x)$ is maximal iff $v$ is the direction of $\nabla f(x)$.
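The conclusion can be checked numerically. The following is only a minimal sketch: the function `f`, the point `(x0, y0)` and the helper names are hypothetical choices made for illustration, not part of the argument above. It sweeps Euclidean unit vectors, estimates the directional derivative along each one by a central difference, and compares the best direction found with the direction of the gradient.

```python
import math

def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x * x + 3.0 * y

def grad_f(x, y):
    # Analytic gradient of the sample f above.
    return (2.0 * x, 3.0)

def directional_derivative(x, y, v, h=1e-6):
    # Central finite difference of f at (x, y) along the direction v.
    forward = f(x + h * v[0], y + h * v[1])
    backward = f(x - h * v[0], y - h * v[1])
    return (forward - backward) / (2.0 * h)

def steepest_direction(x, y, samples=3600):
    # Sweep unit vectors (cos a, sin a) and keep the angle with the largest slope.
    best_angle, best_slope = 0.0, -math.inf
    for k in range(samples):
        a = 2.0 * math.pi * k / samples
        slope = directional_derivative(x, y, (math.cos(a), math.sin(a)))
        if slope > best_slope:
            best_angle, best_slope = a, slope
    return best_angle, best_slope

x0, y0 = 1.0, 2.0
gx, gy = grad_f(x0, y0)
best_angle, best_slope = steepest_direction(x0, y0)
grad_angle = math.atan2(gy, gx) % (2.0 * math.pi)

print("best sampled angle:", best_angle, "gradient angle:", grad_angle)
print("best sampled slope:", best_slope, "gradient length:", math.hypot(gx, gy))
```

Up to the resolution of the sweep, the best sampled angle should agree with the angle of $\nabla f$, and the best slope with $\|\nabla f\|$.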

  • "Therefore [sum] is a convex combination of directional derivatives which is the directional derivative in the convex combination of the different directions." I don't entirely understand this sentence. Why does it matter that they are convex? And the formula would only give the directional derivative of the vector $v$ if $v$ was an orthogonal combination of different directions, correct? I like this explanation though, I just want to make sure that I understand it correctly. Cheers! –  May 10 '14 at 22:33
  • The "convex" just means that $v_i\geq 0$ and $\sum v_i=1$. Convex combinations are something very intuitive for me, that's why I put it in although it's not strictly necessary for the argument. And of course $v$ is a linear combination of orthogonal directions: $v=\sum v_i e_i$. – Johannes Hahn May 10 '14 at 22:37
0

$\newcommand{\R}{\mathbf{R}}$Let $U$ be an open set in $\R^{2}$ and $f:U \to \R$ a differentiable function. If $x_{0} \in U$, then by definition there exists a linear function $Df(x_{0}):\R^{2} \to \R$ such that $$ \lim_{x \to x_{0}} \frac{|f(x) - f(x_{0}) - Df(x_{0})(x - x_{0})|}{\|x - x_{0}\|} = 0. $$

If $e_{1}$ and $e_{2}$ denote the standard basis of $\R^{2}$, then the partial derivatives of $f$ at $x_{0}$ are defined to be the components of $Df(x_{0})$: $$ f_{1}(x_{0}) = Df(x_{0})(e_{1}),\qquad f_{2}(x_{0}) = Df(x_{0})(e_{2}). $$ That is, $[\begin{matrix} f_{1}(x_{0}) & f_{2}(x_{0})\end{matrix}]$ is the standard matrix of $Df(x_{0})$.

The gradient vector $\nabla f(x_{0})$ is defined to be the transpose, $$ \nabla f(x_{0}) = \left[\begin{matrix} f_{1}(x_{0}) \\ f_{2}(x_{0}) \end{matrix}\right]. $$

Rearranging the definition of the derivative gives the linear approximation formula $$ f(x) = f(x_{0}) + Df(x_{0})(x - x_{0}) + o\bigl(\|x - x_{0}\|\bigr). $$ In particular, if $v = \left[\begin{matrix} v_{1} \\ v_{2} \end{matrix}\right]$ is an arbitrary vector, then \begin{align*} f(x_{0} + tv) &= f(x_{0}) + Df(x_{0})(tv) + o(t) \\ &= f(x_{0}) + t\, Df(x_{0})(v) + o(t) \\ &= f(x_{0}) + t\bigl(f_{1}(x_{0})v_{1} + f_{2}(x_{0})v_{2}\bigr) + o(t) \\ &= f(x_{0}) + t\, \nabla f(x_{0})\cdot v + o(t). \end{align*} (The first equality is the linear approximation formula with $x = x_{0} + tv$; the second follows from linearity of $Df(x_{0})$; the third comes from multiplying matrices; the fourth is the formula for the dot product.)

Introducing the function $g_{v}(t) = f(x_{0} + tv)$, the preceding equation becomes $$ g_{v}'(0) = \nabla f(x_{0})\cdot v. $$ This derivative is the rate of change of $f$ at $x_{0}$ in the direction $v$.
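As a rough numerical illustration of the $o(t)$ term, the sketch below evaluates the quotient $\bigl(f(x_{0} + tv) - f(x_{0}) - t\,\nabla f(x_{0})\cdot v\bigr)/t$ for shrinking $t$. The function `f`, the point `x0` and the vector `v` are hypothetical choices made only for this example; any differentiable function would serve.

```python
import math

def f(x, y):
    # Hypothetical smooth sample function, for illustration only.
    return math.sin(x) * math.exp(y)

def grad_f(x, y):
    # Analytic partial derivatives (f_1, f_2) of the sample f.
    return (math.cos(x) * math.exp(y), math.sin(x) * math.exp(y))

def remainder_ratio(x0, v, t):
    # (f(x0 + t v) - f(x0) - t * grad f(x0) . v) / t
    gx, gy = grad_f(x0[0], x0[1])
    linear_part = t * (gx * v[0] + gy * v[1])
    moved = f(x0[0] + t * v[0], x0[1] + t * v[1])
    return (moved - f(x0[0], x0[1]) - linear_part) / t

x0 = (0.5, -0.3)
v = (2.0, -1.0)  # arbitrary vector, not necessarily of unit length

for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, remainder_ratio(x0, v, t))
```

The printed quotients should shrink roughly in proportion to $t$, which is what $g_{v}'(0) = \nabla f(x_{0})\cdot v$ asserts.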

If $\nabla f(x_{0}) \neq (0, 0)$, and if $v$ is a unit vector making angle $\theta$ with $\nabla f(x_{0})$, then $$ f(x_{0} + tv) = f(x_{0}) + t\|\nabla f(x_{0})\|\cos\theta + o(t). $$ That is, $g_{v}'(0) = \|\nabla f(x_{0})\|\cos\theta$.

It follows immediately that

  1. If $\theta = 0$, i.e., if $v = \dfrac{\nabla f(x_{0})}{\|\nabla f(x_{0})\|}$, then $g_{v}'(0)$ is maximized over all unit vectors.

  2. If $\theta = \pi$, i.e., if $v = -\dfrac{\nabla f(x_{0})}{\|\nabla f(x_{0})\|}$, then $g_{v}'(0)$ is minimized over all unit vectors.

  3. If $\theta = \pi/2$, i.e., if $v \cdot \nabla f(x_{0}) = 0$, then $g_{v}'(0) = 0$, signifying that $f$ is constant to first order at $x_{0}$ in the direction $v$, namely that $v$ is tangent to the level curve of $f$ through $x_{0}$.
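The three cases can be illustrated with a small numerical sketch. The function `f`, the point `x0` and the helpers `rotate` and `slope` are hypothetical, introduced only for this example; `slope` approximates $g_{v}'(0)$ by a central difference.

```python
import math

def f(x, y):
    # Hypothetical sample function, for illustration only.
    return x * x * y + y

def grad_f(x, y):
    # Analytic gradient of the sample f.
    return (2.0 * x * y, x * x + 1.0)

def rotate(v, theta):
    # Rotate the vector v counterclockwise by the angle theta.
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def slope(x0, v, h=1e-6):
    # Central difference approximation to g_v'(0), where g_v(t) = f(x0 + t v).
    forward = f(x0[0] + h * v[0], x0[1] + h * v[1])
    backward = f(x0[0] - h * v[0], x0[1] - h * v[1])
    return (forward - backward) / (2.0 * h)

x0 = (1.0, 2.0)
g = grad_f(x0[0], x0[1])
norm = math.hypot(g[0], g[1])
u = (g[0] / norm, g[1] / norm)  # unit vector in the gradient direction

cases = (("theta = 0", 0.0), ("theta = pi", math.pi), ("theta = pi/2", math.pi / 2.0))
slopes = {name: slope(x0, rotate(u, theta)) for name, theta in cases}

for name, value in slopes.items():
    print(name, value)
print("gradient length:", norm)
```

Up to finite-difference error, the first slope should match $\|\nabla f(x_{0})\|$, the second its negative, and the third should be close to zero.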