Let $U$ be an open convex subset of $\mathbb R^n$ and $f:U\to\mathbb R$ a convex function on it.

  • It is a well-known fact that if the second partial derivatives exist everywhere on $U$ and are all continuous (i.e., if $f\in\mathcal C^2$), then the Hessian of $f$ is symmetric, that is, $\partial^2 f/(\partial x_i\partial x_j)=\partial^2 f/(\partial x_j\partial x_i)$ for any $i,j\in\{1,\ldots,n\}$. (Actually, $f$ needn't even be convex for this result.)
  • In fact, Alexandroff's theorem states that the Hessian exists and is symmetric almost everywhere with respect to the $n$-dimensional Lebesgue measure, without any additional assumptions beyond convexity.

Question: Is it possible for $f$ to be twice differentiable everywhere on $U$ (and thus to have second-order partial derivatives everywhere, though not necessarily continuous ones) and yet have a Hessian that is not symmetric at some $x\in U$?


Update: Dudley (1977) gives an example of a convex function whose Hessian exists and is asymmetric at the origin. This counterexample doesn't settle my question, however, because Dudley's function is not twice (Fréchet) differentiable at the origin, even though its second-order partial derivatives exist there. I would like to see a convex function that is twice Fréchet differentiable at some point and has an asymmetric Hessian at that point (which would necessarily imply that some of the second-order partial derivatives are discontinuous at that point).

triple_sec

1 Answer

It turns out that twice-differentiability implies that the Hessian is symmetric even without convexity and with no reference to whether the second-order partial derivatives are continuous! The proof below is based on Theorem 8.12.2 in the book Foundations of Modern Analysis by Dieudonné (1969, p. 180).

Claim: Let $U\subseteq\mathbb R^n$ be an open set and $f:U\to\mathbb R$ a function. Suppose that $f$ is (Fréchet) differentiable on $U$ and that it is twice (Fréchet) differentiable at $\mathbf x_0\in U$. Then, the Hessian matrix $\mathbf H(\mathbf x_0)$ at $\mathbf x_0$ is symmetric.

Proof: Let $\mathbf D:U\to\mathbb R^n$ denote the gradient function of $f$. Fix $\varepsilon>0$. Since $\mathbf D$ is Fréchet differentiable at $\mathbf x_0$ by assumption, it follows that there exists some $\delta>0$ such that $\|\mathbf v\|<4\delta$ implies that $$\left\|\mathbf D(\mathbf x_0+\mathbf v)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot\mathbf v\right\|\leq\varepsilon\|\mathbf v\|.$$ There is no loss of generality in taking $\delta$ to be so small that the open ball $B(4\delta,\mathbf x_0)$ is contained in the open set $U$.
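For reference, the assumption being used here can also be written as a limit. This is the standard definition of Fréchet differentiability applied to the gradient map $\mathbf D$ at $\mathbf x_0$, with the matrix $\mathbf H(\mathbf x_0)$ representing its derivative: $$\lim_{\mathbf v\to\mathbf 0}\frac{\left\|\mathbf D(\mathbf x_0+\mathbf v)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot\mathbf v\right\|}{\|\mathbf v\|}=0.$$ The $\varepsilon$–$\delta$ inequality above is this limit unpacked, with the threshold written as $4\delta$ rather than $\delta$ purely for later convenience.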

For any $i,j\in\{1,\ldots,n\}$, let $\mathbf e_i$ and $\mathbf e_j$ be the corresponding standard basis vectors of unit length. Let $\mathbf s\equiv\delta\mathbf e_i$ and $\mathbf t\equiv\delta\mathbf e_j$. It is clear that $\mathbf x_0+\xi\mathbf s+\mathbf t$ and $\mathbf x_0+\xi\mathbf s$ are both in $U$ whenever $\xi\in[0,1]$; this is because $\|\xi\mathbf s+\mathbf t\|<4\delta$ and $\|\xi\mathbf s\|<4\delta$. Define the following function $g:[0,1]\to\mathbb R$: $$g(\xi)\equiv f(\mathbf x_0+\xi\mathbf s+\mathbf t)-f(\mathbf x_0+\xi\mathbf s)\quad\forall\xi\in[0,1].$$

Clearly, $g$ is continuous on $[0,1]$ and differentiable on $(0,1)$. Lagrange's mean-value theorem, in turn, implies that there exists some $\xi\in(0,1)$ such that $$g(1)-g(0)=g'(\xi)=\mathbf s\cdot\left[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0+\xi\mathbf s)\right],$$ using the chain rule.
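To spell out the two computations used in this step: differentiating each term of $g$ along the direction $\mathbf s$ gives $$g'(\xi)=\frac{\mathrm d}{\mathrm d\xi}f(\mathbf x_0+\xi\mathbf s+\mathbf t)-\frac{\mathrm d}{\mathrm d\xi}f(\mathbf x_0+\xi\mathbf s)=\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)\cdot\mathbf s-\mathbf D(\mathbf x_0+\xi\mathbf s)\cdot\mathbf s,$$ which is valid because $f$ is differentiable on all of $U$. Moreover, directly from the definition of $g$, $$g(1)-g(0)=f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0),$$ an expression that is unchanged when $\mathbf s$ and $\mathbf t$ are swapped.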

Next, one can derive the following chain of inequalities (the first one uses the Cauchy–Schwarz inequality): \begin{align*} &\left|g(1)-g(0)-\mathbf s\cdot\mathbf H(\mathbf x_0)\cdot\mathbf t\right|\leq\underbrace{\|\mathbf s\|}_{=\delta}\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)]-\mathbf H(\mathbf x_0)\cdot\mathbf t\right\|\\ =&\,\delta\left\|[\mathbf D(\mathbf x_0+\xi\mathbf s+\mathbf t)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s+\mathbf t)]-[\mathbf D(\mathbf x_0+\xi\mathbf s)-\mathbf D(\mathbf x_0)-\mathbf H(\mathbf x_0)\cdot(\xi\mathbf s)]\right\|\\ \leq&\,\delta\varepsilon\left(\|\xi\mathbf s+\mathbf t\|+\|\xi\mathbf s\|\right)<8\delta^2\varepsilon. \end{align*} That is, one has that $$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j|<8\delta^2\varepsilon,$$ and, by a completely analogous and symmetric reasoning in which $\mathbf s$ and $\mathbf t$ are interchanged, $$|f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)-\delta^2\mathbf e_j\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_i|<8\delta^2\varepsilon.$$ Given that $\mathbf e_i\cdot\mathbf H(\mathbf x_0)\cdot\mathbf e_j=h_{ij}(\mathbf x_0)\equiv\partial^2 f/(\partial x_i\partial x_j)(\mathbf x_0)$, the preceding two inequalities imply that $$\left|h_{ij}(\mathbf x_0)-h_{ji}(\mathbf x_0)\right|<16\varepsilon.$$ Taking $\varepsilon$ to be arbitrarily small, one sees that $h_{ij}(\mathbf x_0)=h_{ji}(\mathbf x_0)$. $\blacksquare$
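The quantity that both inequalities control is the second difference $f(\mathbf x_0+\mathbf s+\mathbf t)-f(\mathbf x_0+\mathbf s)-f(\mathbf x_0+\mathbf t)+f(\mathbf x_0)$: divided by $\delta^2$, it lies within $8\varepsilon$ of both $h_{ij}(\mathbf x_0)$ and $h_{ji}(\mathbf x_0)$, and the triangle inequality then gives the bound $16\varepsilon$. As a purely numerical illustration (not part of the proof), the sketch below evaluates that quotient for a smooth test function. The names `second_difference` and `example_f`, the base point, and the step sizes are all assumptions chosen for this example.

```python
import math


def second_difference(f, x0, i, j, delta):
    """Return [f(x0+s+t) - f(x0+s) - f(x0+t) + f(x0)] / delta**2,
    where s = delta*e_i and t = delta*e_j (the quotient from the proof)."""

    def shifted(di, dj):
        x = list(x0)
        x[i] += di  # move along e_i
        x[j] += dj  # move along e_j
        return f(x)

    num = shifted(delta, delta) - shifted(delta, 0.0) - shifted(0.0, delta) + f(list(x0))
    return num / delta**2


def example_f(x):
    # Hypothetical smooth test function f(x, y) = exp(x*y) + x^2 * y.
    return math.exp(x[0] * x[1]) + x[0] ** 2 * x[1]


if __name__ == "__main__":
    x0 = [0.5, -0.3]
    # Closed-form mixed partial of example_f: (1 + x*y) * exp(x*y) + 2*x.
    exact = (1 + x0[0] * x0[1]) * math.exp(x0[0] * x0[1]) + 2 * x0[0]
    for delta in (1e-1, 1e-2, 1e-3):
        q_ij = second_difference(example_f, x0, 0, 1, delta)
        q_ji = second_difference(example_f, x0, 1, 0, delta)
        print(delta, q_ij, q_ji, exact)
```

Because the second difference is symmetric in $\mathbf s$ and $\mathbf t$, the two quotients agree up to rounding for every step size; the content of the proof is that twice differentiability at $\mathbf x_0$ forces this common value to approach both mixed partials.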

triple_sec
  • Thanks for this excellent exposition of Dieudonne's theorem. This is quite a slick proof. I didn't understand the proof in Dieudonne's real analysis book at first glance: this answer made everything quite clear. +1. – Balarka Sen Feb 12 '16 at 20:55
  • Great answer. FYI Theorem 12.12 in Mathematical Analysis by Apostol gives a result that (I think) is slightly stronger. The upshot is that Frechet differentiability of the first order partials is sufficient. – JasonJones Nov 16 '20 at 23:58
  • Thanks a lot for this clear answer! May I ask a silly question as to why one would need Schwarz's Theorem then? – oliver Sep 23 '22 at 09:20
  • @oliver My understanding (correct me if I’m wrong) is that the two theorems are not nested: the premise of Schwarz’s theorem requires (i) only that the second-order partial derivatives exist (without the original function necessarily being twice Fréchet differentiable), a weaker condition; but (ii) also that the second-order partial derivatives be continuous, a stronger condition. Schwarz’s theorem is more relevant because its premises are more naturally satisfied. Dieudonné’s theorem covers the somewhat pathological cases, which are more interesting from a theoretical rather than a practical point of view. – triple_sec Sep 23 '22 at 17:38
  • @triple_sec: doesn't existence of second partials imply continuity of first partials and therefore Fréchet differentiability? and then continuity of second partials, which are first partials of the Fréchet derivative, twice Fréchet differentiability? so isn't Schwarz strictly a weaker form of Dieudonné (i.e., implied by the latter)? – peter May 26 '24 at 18:52
  • @peter The existence of partial derivatives of a function doesn’t imply continuity of the function. – triple_sec May 27 '24 at 03:38
  • @triple_sec you're right, i at least organized the argument wrong. is there a function thats twice continuously partially differentiable on a neighborhood (as in schwarz), but not twice differentiable at the point (as in dieudonne)? that sounds strange and i'd like an example. – peter May 27 '24 at 07:10
  • @peter That’s a fair point. I don’t know whether such a counterexample exists, but I concede that even if one does exist, it is likely pathological and uninteresting. – triple_sec May 30 '24 at 22:24
  • @triple_sec i think i really only organized my argument wrong: not the existence of the second partials is what makes the first continuous. the continuity of the second partials makes the first partials differentiable and therefore continuous, which makes the original function differentiable. the second partials are the first partials of the derivative, making f twice differentiable. – peter May 31 '24 at 04:38