
For a $2$-variable function $f(x,y)$, the Hessian matrix is the matrix of second-order partial derivatives that appears in the quadratic term of the Taylor expansion,

$$H = \left[\begin{array}{cc}f_{xx} & f_{xy}\\ f_{xy} & f_{yy}\end{array}\right]$$

and, according to the Taylor expansion around $(0,0)$,

$$f(x,y) \approx f(0,0) + [f_x, f_y]\left[\begin{array}{c}x\\y\end{array}\right] +\frac{1}{2}[x,y]H\left[\begin{array}{c}x\\y\end{array}\right]$$
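
For example, taking $f(x,y) = x^2 + 3xy + 2y^2$ gives $f_{xx} = 2$, $f_{xy} = 3$, $f_{yy} = 4$, so

$$H = \left[\begin{array}{cc}2 & 3\\ 3 & 4\end{array}\right]$$

and, since $f(0,0) = 0$ and $(f_x, f_y) = (0,0)$ at the origin, the expansion above reproduces $f$ exactly: $\frac{1}{2}(2x^2 + 6xy + 4y^2) = x^2 + 3xy + 2y^2$.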

My intuitive understanding of the Hessian matrix is that each entry is just a second-order partial derivative, and a second-order derivative indicates how fast the first-order derivative changes, so I can see how the second-order derivatives describe the concavity/convexity of $f(x,y)$.

But there are many people out there saying that the eigenvalues/eigenvectors of the Hessian can be used to determine or show this and that (classifying critical points, for example). Why? How?

Furthermore, the second partial derivative test uses the Hessian matrix, but the strangest part is that it only covers the case $(f_x,f_y) = (0,0)$. What about otherwise?

avocado

1 Answer


As for the last question: otherwise you don't have a critical point and there is nothing to test. :-) Think of the one-variable case: would you look for a maximum or minimum at $x_0$ if $f'(x_0) \neq 0$?

Your intuitive understanding of the Hessian points in the right direction. The question is: how do we "sum up" all the data $f_{xx}, f_{xy} = f_{yx}, f_{yy}$ in a single fact?

Well, think about the quadratic form that the Hessian defines. Namely,

$$ q(x,y) = \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = f_{xx}x^2 + 2 f_{xy}xy + f_{yy}y^2 \ . $$

If this quadratic form is positive-definite, that is $q(x,y) > 0$ for all $(x,y) \neq (0,0)$, then $f$ has a local minimum at the point where this happens (just as in the one-variable case, $f''(x_0) > 0$ implies $f$ has a local minimum at $x_0$).
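
For instance, $f(x,y) = x^2 - xy + y^2$ has its only critical point at the origin, and there

$$ q(x,y) = 2x^2 - 2xy + 2y^2 = x^2 + (x-y)^2 + y^2 > 0 \quad \text{for } (x,y) \neq (0,0) \ , $$

so the origin is a local minimum.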

It's more or less obvious that whether $q(x,y)$ is positive at all points $(x,y) \neq (0,0)$ doesn't depend on the coordinate system you're using, isn't it?
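
One way to see it: if $\begin{pmatrix} x \\ y \end{pmatrix} = P \begin{pmatrix} \overline{x} \\ \overline{y} \end{pmatrix}$ for some invertible matrix $P$, then

$$ q(x,y) = \begin{pmatrix} x & y \end{pmatrix} H \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \overline{x} & \overline{y} \end{pmatrix} P^{t} H P \begin{pmatrix} \overline{x} \\ \overline{y} \end{pmatrix} \ , $$

so in the new coordinates the same $q$ is represented by the matrix $P^{t}HP$: the matrix changes, but the values that $q$ takes do not, and $(x,y) \neq (0,0)$ exactly when $(\overline{x}, \overline{y}) \neq (0,0)$.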

Right, then do the following experiment: you have a nice quadratic form like

$$ q(x,y) = x^2 + y^2 $$

which is not ashamed to show clearly that she is positive-definite, is she?

Then, do to her the following linear change of coordinates:

$$ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \overline{x} \\ \overline{y} \end{pmatrix} $$

and you'll get

$$ q(\overline{x}, \overline{y}) = 2\overline{x}^2 + 2 \overline{x}\overline{y} + \overline{y}^2 \ . $$

Is it now also clear that $q(\overline{x}, \overline{y}) > 0$ for all $ (\overline{x}, \overline{y}) \neq (0,0)$?
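
Probably not at a single glance; but completing the square shows the positivity is still there, only hidden:

$$ 2\overline{x}^2 + 2\overline{x}\overline{y} + \overline{y}^2 = \overline{x}^2 + (\overline{x} + \overline{y})^2 \ , $$

which vanishes only when $\overline{x} = 0$ and $\overline{x} + \overline{y} = 0$, that is, at $(\overline{x}, \overline{y}) = (0,0)$.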

So we need some device that allows us to tell when a symmetric matrix like $H$ defines a positive-definite quadratic form $q(x,y)$, even when the fact is disguised because we are using the "wrong" coordinate system.

One such device is the set of eigenvalues of $H$: if all of them are positive, we know that, maybe after a change of coordinate system, our $q(x,y)$ will have an associated matrix like

$$ \begin{pmatrix} \lambda & 0 \\ 0 & \mu \end{pmatrix} $$

with $\lambda, \mu > 0$. Hence, in some coordinate system (and hence, in all of them), our $q > 0$.
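
For instance, the disguised form $q(\overline{x}, \overline{y}) = 2\overline{x}^2 + 2\overline{x}\overline{y} + \overline{y}^2$ from before has associated matrix

$$ \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \ , $$

whose characteristic polynomial $\lambda^2 - 3\lambda + 1$ has roots $\lambda = \frac{3 \pm \sqrt{5}}{2}$, both positive: the eigenvalue test detects the positive-definiteness that the coefficients were hiding.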

Agustí Roig
  • Thank you so much for this detailed answer, but one thing I don't understand is why whether $q(x,y)$ is positive at all points doesn't depend on the coordinate system you're using. If the $x,y$ axes change, then $f_{xx},f_{xy},f_{yy}$ change, which means the Hessian matrix might not be positive-definite any more, right? – avocado Sep 07 '13 at 03:33
  • Do you mean that if $q(x,y) > 0$ in one coordinate system $(x,y,z)$, and I later switch to another coordinate system $(x',y',z)$ in which the $x,y$ axes' directions change but not the $z$ axis, then $q$ will still be positive in the new system, right? – avocado Sep 07 '13 at 03:37
  • Yes, but forget about the $z$ axis: we (you included) were talking about two variable functions. :-) – Agustí Roig Sep 07 '13 at 07:43
  • And, of course, for a function ($q$ in our case) to take positive values or not on a point doesn't depend on the coordinate system you're using to locate the point. – Agustí Roig Sep 07 '13 at 08:51
  • Well, I seem to be stuck on the coordinate system. In the answer, you mentioned the change of coordinate system from $q(x,y)$ to $q(\bar x,\bar y)$, and I actually envision $z=q(x,y)$ as a surface in $R^3$. Here comes my problem: if I change the $x,y$-axes to $x',y'$-axes, and the $x'$ axis is actually the $z$-axis of the old system, then what? Or do you mean I can't change the $x,y$ axes this way, because it's not linear, right? – avocado Sep 07 '13 at 09:04
  • I mean: there is no $z$ at all in your problem. Forget about it. – Agustí Roig Sep 07 '13 at 09:07
  • All right ;-), so 2 questions: the change of coordinate system is just rotation and translation, right? Why do we need to change coordinate system? – avocado Sep 07 '13 at 09:12
  • We want, or rather need, to change the coordinate system in order to see whether our quadratic form $X^tHX$ is positive-definite: if it's written as a sum of squares, like $x^2 + y^2$, this is clear. But if it's written like $2x^2 + 2xy + y^2$, this might not be so clear. – Agustí Roig Sep 07 '13 at 09:24
  • And nope: the changes of coordinates systems I'm talking about in $\mathbb{R}^2$ are just invertible $2\times 2$ matrices. – Agustí Roig Sep 07 '13 at 09:27
  • The changes of coordinate systems by invertible $2\times 2$ matrices are linear maps, aren't they? Just like change the basis of vector spaces, right? – avocado Sep 07 '13 at 10:06
  • 1
    Thanks for your patient explanation, Agusti. ;-) – avocado Sep 07 '13 at 10:20