I'm currently taking multivariable calculus, and I'm familiar with the second partial derivative test. That is, the formula $D(a, b) = f_{xx}(a,b)f_{yy}(a, b) - (f_{xy}(a, b))^2$ to determine the behavior of $f(x,y)$ at the point $(a, b, f(a,b))$.
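To make sure I'm applying the formula correctly, here's a quick numerical sanity check I wrote (the function $f(x,y) = x^2 - y^2$ and the finite-difference step size are my own choices, not from class):

```python
# Numerically check the second partial derivative test for
# f(x, y) = x^2 - y^2 at its critical point (0, 0).
# The second partials are approximated by central finite differences.

def f(x, y):
    return x**2 - y**2

def second_partials(f, a, b, h=1e-4):
    fxx = (f(a + h, b) - 2*f(a, b) + f(a - h, b)) / h**2
    fyy = (f(a, b + h) - 2*f(a, b) + f(a, b - h)) / h**2
    fxy = (f(a + h, b + h) - f(a + h, b - h)
           - f(a - h, b + h) + f(a - h, b - h)) / (4 * h**2)
    return fxx, fyy, fxy

fxx, fyy, fxy = second_partials(f, 0.0, 0.0)
D = fxx * fyy - fxy**2
print(D)  # D is negative here, so (0, 0) is a saddle point
```

By hand: $f_{xx} = 2$, $f_{yy} = -2$, $f_{xy} = 0$, so $D = (2)(-2) - 0^2 = -4 < 0$, and the test says saddle point, which matches the numerical result.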
However, my professor simply "spat" this formula at us and provided almost no explanation of where it comes from. After researching a bit on my own, I now know that it's the determinant of the Hessian matrix of $f(x,y)$, and I see how the formula follows easily from that matrix. Wikipedia just says:

> The following test can be applied at a non-degenerate critical point $x$. If the Hessian is positive definite at $x$, then $f$ attains a local minimum at $x$. If the Hessian is negative definite at $x$, then $f$ attains a local maximum at $x$. If the Hessian has both positive and negative eigenvalues, then $x$ is a saddle point for $f$ (this is true even if $x$ is degenerate). Otherwise the test is inconclusive.
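Playing with a toy example of my own (again $f(x,y) = x^2 - y^2$, whose Hessian is the constant matrix $\begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}$), I can at least see that the determinant equals the product of the eigenvalues, so $D < 0$ forces the two eigenvalues to have opposite signs:

```python
import math

# Eigenvalues of a symmetric 2x2 Hessian [[fxx, fxy], [fxy, fyy]],
# via the quadratic formula on the characteristic polynomial.
def eigvals_2x2(fxx, fxy, fyy):
    tr = fxx + fyy                # trace = sum of eigenvalues
    det = fxx * fyy - fxy**2      # determinant = product of eigenvalues (this is D)
    disc = math.sqrt(tr**2 - 4 * det)  # real for a symmetric matrix
    return (tr - disc) / 2, (tr + disc) / 2

# Hessian of x^2 - y^2 is [[2, 0], [0, -2]] everywhere:
low, high = eigvals_2x2(2, 0, -2)
print(low, high)  # one negative, one positive eigenvalue -> saddle point
```

So in this example the determinant being negative really does correspond to eigenvalues of mixed signs, but I don't see why that's the right thing to look at in general.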
I understand the statement, but I still don't understand why the determinant of this matrix captures the behavior of $f$ in this way. Why does it? And if the test fails, what steps should then be taken to determine the nature of $f(x,y)$ at $(a, b, f(a,b))$?