4

Let $S^{n}_{+}$ and $S^{n}_{++}$ denote the sets of positive semidefinite and positive definite (symmetric) $n \times n$ matrices, respectively. Let the function $f : S^{n}_{++} \to \mathbb R$ be defined by

$$f(X) := \text{tr} \left( X^{-1} A \right)$$

where $A \in S^{n}_{+}$. Given that $f$ is differentiable, with gradient

$$\nabla_X f = - X^{-1}AX^{-1}$$

can we show that $f$ is convex by using the monotonicity characterization of convexity,

$$\langle\nabla f(X)-\nabla f(Y),\ X-Y\rangle \ge 0$$

where $\langle \cdot, \cdot\rangle$ denotes the trace inner product $\langle G, H\rangle = \operatorname{tr}(GH)$ on symmetric matrices? Or can the convexity of the function be proven in a simpler way?
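For what it is worth, the monotonicity inequality survives a quick numerical sanity check (a sketch assuming NumPy; the helper names, matrix size, and sample count are illustrative, and this of course checks rather than proves the claim):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    # random symmetric positive definite matrix: M M^T + I
    M = rng.standard_normal((n, n))
    return M @ M.T + np.eye(n)

def grad_f(X, A):
    # gradient of f(X) = tr(X^{-1} A), namely -X^{-1} A X^{-1}
    Xi = np.linalg.inv(X)
    return -Xi @ A @ Xi

n = 5
A = random_spd(n)  # a positive definite instance of A
for _ in range(1000):
    X, Y = random_spd(n), random_spd(n)
    # trace inner product <G, H> = tr(G H) on symmetric matrices
    val = np.trace((grad_f(X, A) - grad_f(Y, A)) @ (X - Y))
    assert val >= -1e-8
```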

W.J
  • 305
  • 1
    In the case $n=1$ we deal with the function $f=f_a:(0,\infty)\to\Bbb R$ given by $f(x) = a/x$, and the second derivative has a sign depending on $a$, which seems to be arbitrary. (We get different shapes for instance for the particular values $-1,0,+1$ of $a$.) – dan_fulea Sep 06 '19 at 00:27
  • Thanks a lot for the reminder! I have modified the previous description as above. I look forward to your reply if it is convenient for you. – W.J Sep 06 '19 at 05:33

2 Answers

1

This is a formal proof; I am trying to follow the lines suggested in the OP.

We may assume that $A$ is positive definite, since any positive semidefinite $A$ can be approximated by a sequence of positive definite matrices. We can then precompose with the linear, invertible map $g:Y\mapsto A^{1/2}YA^{1/2}$ and show equivalently that the composition $$ \begin{aligned} Y\to g(Y) &\to f(g(Y)) \\ &= \operatorname{Trace}\Big(\ g(Y)^{-1}A\ \Big) \\ &= \operatorname{Trace}\Big(\ (A^{1/2}YA^{1/2})^{-1}A\ \Big) \\ &= \operatorname{Trace}\Big(\ A^{-1/2}Y^{-1}A^{-1/2}\; A\ \Big) \\ &= \operatorname{Trace}\Big(\ Y^{-1}\;A^{-1/2}AA^{-1/2}\ \Big) \\ &= \operatorname{Trace}\Big(\ Y^{-1}\ \Big) \end{aligned} $$ is convex. This substitution merely makes the typing easier in what follows: we may assume without loss of generality that $A=I$, the identity matrix.

Now fix a point $X$ in the space of $n\times n$ positive definite matrices, and let us exhibit a formal Taylor expansion of $f:X\mapsto \operatorname{Trace}\big(X^{-1}\big)$ around it. To this end, consider a small symmetric perturbation $H$ of $X$ and expand $f(X+H)-f(X)$: $$ \begin{aligned} f(X+H)-f(X) &= \operatorname{Trace}\Big(\ (X+H)^{-1}\ \Big) - \operatorname{Trace}\Big(\ X^{-1}\ \Big) \\ &= \operatorname{Trace}\Big(\ (X+H)^{-1} - X^{-1}\ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2}(I+X^{-1/2}HX^{-1/2})^{-1}X^{-1/2} - X^{-1}\ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2} \Big[\ (I+\underbrace{X^{-1/2}HX^{-1/2}}_{=:U})^{-1} -I\ \Big] X^{-1/2} \ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2} \Big[\ (I-U+U^2-U^3+U^4-U^5+\dots) -I\ \Big] X^{-1/2} \ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2} \Big[\ -U+U^2+O(\|H\|^3)\ \Big] X^{-1/2} \ \Big) \\ &= \operatorname{Trace}\Big(\ X^{-1/2} \Big[\ -X^{-1/2}HX^{-1/2} +X^{-1/2}HX^{-1}HX^{-1/2} +O(\|H\|^3)\ \Big] X^{-1/2} \ \Big) \\ &= \operatorname{Trace}\Big(\ -X^{-1}HX^{-1} +X^{-1}HX^{-1}HX^{-1} +O(\|H\|^3) \ \Big) \\ &= \operatorname{Trace}\Big(\ -X^{-1}HX^{-1} \ \Big) + \operatorname{Trace}\Big(\ X^{-1}HX^{-1}HX^{-1} \ \Big) +O(\|H\|^3) \ . \end{aligned} $$ So the first derivative of $f$ at the point $X$ is the linear map $H\mapsto f'(X)(H):=\operatorname{Trace}\Big(\ -X^{-1}HX^{-1} \ \Big)$, seen as a map from the tangent space of $S_{++}^n$ at $X$ to $\Bbb R$.

The convexity is governed by the next term in the expansion, which is nonnegative: $$ \operatorname{Trace}\Big(\ X^{-1}HX^{-1}HX^{-1} \ \Big) = \operatorname{Trace}\Big(\ X^{-1/2}\; \underbrace{(X^{-1/2}HX^{-1/2})^2}_{=U^2\in S_+^n}\; X^{-1/2} \ \Big) \ge 0 \ . $$ Strictly speaking we have not finished the proof, because the expansion above is only formal, and the discarded higher-order terms of the Taylor series could in principle gain the upper hand. But the argument can be ironed out rigorously, perhaps most simply by considering the full tail $U^2-U^3+U^4-\dots=U(I+U)^{-1}U$ (which is positive semidefinite, since $I+U = X^{-1/2}(X+H)X^{-1/2}\succ 0$) instead of just $U^2$.
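Both the reduction to $A=I$ and the sign of the second-order term are easy to check numerically; a minimal sketch assuming NumPy (the SPD generator, matrix size, and tolerances are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def random_spd(n):
    # random symmetric positive definite matrix: M M^T + I
    M = rng.standard_normal((n, n))
    return M @ M.T + np.eye(n)

def sqrtm_spd(A):
    # symmetric square root of an SPD matrix via its eigendecomposition
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(w)) @ V.T

n = 4
A, Y = random_spd(n), random_spd(n)
Ah = sqrtm_spd(A)

# reduction step: tr((A^{1/2} Y A^{1/2})^{-1} A) = tr(Y^{-1})
lhs = np.trace(np.linalg.inv(Ah @ Y @ Ah) @ A)
rhs = np.trace(np.linalg.inv(Y))
assert abs(lhs - rhs) < 1e-8

# second-order term tr(X^{-1} H X^{-1} H X^{-1}) >= 0 for symmetric H
X = random_spd(n)
H = rng.standard_normal((n, n)); H = (H + H.T) / 2
Xi = np.linalg.inv(X)
assert np.trace(Xi @ H @ Xi @ H @ Xi) >= -1e-10
```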

dan_fulea
  • 37,952
  • Firstly, thank you very much! Your answer has given me a lot of new insights. Furthermore, I think we can show the convexity of $f$ by considering the convexity of a new function $g(t)$ with respect to $t$, $$g(t):=f(Z+tV)$$ for any $Z, V\in S^{n}_{++}$ and $t\in \{t \mid Z+tV\in S^{n}_{++}\}$. – W.J Sep 07 '19 at 11:24
0

Some thoughts:

Consider the case $A=I$. Then $\nabla f(X) = -X^{-2}$. Let $X, Y\in \mathbb S^n_{++}$ be arbitrary. By the spectral theorem they have eigen-decompositions $X=U^T\Lambda U$ and $Y=V^T\Sigma V$. First note that

$$f(X) = tr(X^{-1}) = tr( (U^T \Lambda U)^{-1} ) = tr(\Lambda^{-1}) = f(\Lambda)$$

then

$$\begin{aligned} \langle\nabla f(X)-\nabla f(Y),\ X-Y\rangle & = \langle -X^{-2} + Y^{-2}, X-Y\rangle \\ & = -tr(X^{-1}) + tr(X^{-2}Y) + tr(Y^{-2}X) - tr(Y^{-1}) \\ &= -tr(\Lambda^{-1}) + tr(\Lambda^{-1} \underbrace{ UYU^T}_{=:Y'}\Lambda^{-1}) + tr(\Sigma^{-1}\underbrace{VXV^T}_{=:X'}\Sigma^{-1}) -tr(\Sigma^{-1}) \\ &= \sum_i\Big[-\frac{1}{\lambda_i} + \frac{Y'_{ii}}{\lambda_i^2} +\frac{X'_{ii}}{\sigma_i^2} -\frac{1}{\sigma_i} \Big] \\ &= \sum_i\frac{1}{\lambda_i^2\sigma_i^2}\big(X'_{ii}\lambda_i^2 - \sigma_i\lambda_i^2-\sigma_i^2\lambda_i + Y'_{ii}\sigma_i^2 \big) \end{aligned}$$

In the case where $X=\Lambda$ and $Y=\Sigma$ are already diagonal themselves the latter simplifies to

$$\begin{aligned} \langle\nabla f(\Lambda)-\nabla f(\Sigma),\ \Lambda-\Sigma\rangle &= \sum_i\frac{1}{\lambda_i^2\sigma_i^2}\big(\lambda_i^3 - \sigma_i\lambda_i^2-\sigma_i^2\lambda_i + \sigma_i^3 \big) \\ &= \sum_i\frac{1}{\lambda_i^2\sigma_i^2}(\lambda_i - \sigma_i) ^2(\lambda_i+\sigma_i) \ge0\\ \end{aligned}$$
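The factorization in the last step can be verified directly; a small numerical check assuming NumPy (the sampled eigenvalue range is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
lam = rng.uniform(0.5, 5.0, size=8)  # positive eigenvalues λ_i of Λ
sig = rng.uniform(0.5, 5.0, size=8)  # positive eigenvalues σ_i of Σ

# each summand (λ³ - σλ² - σ²λ + σ³)/(λ²σ²) factors as (λ-σ)²(λ+σ)/(λ²σ²) ≥ 0
raw = (lam**3 - sig*lam**2 - sig**2*lam + sig**3) / (lam**2 * sig**2)
fac = (lam - sig)**2 * (lam + sig) / (lam**2 * sig**2)
assert np.allclose(raw, fac)
assert (raw >= 0).all()
```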

However, the general case is more difficult. We know that $\sum_i X'_{ii} =tr(X')=tr(X)= \sum_i \lambda_i$ and $\lambda_{\min} \le X'_{ii}\le\lambda_{\max}$, but I do not currently see how to finish the proof from here.

What we do see, however, is that individual terms can quite possibly become negative (e.g. when $X'_{ii}<\sigma_i$ and $Y'_{ii}<\lambda_i$), so any solution attempt needs to take this into account by arguing that the terms are only "$\ge 0$ on average", not termwise.
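This observation is easy to reproduce numerically. A sketch assuming NumPy, with $A=I$ as above (note that `np.linalg.eigh` returns eigenvectors as columns, so the conjugations below mirror the $Y'$, $X'$ definitions in that convention; this samples the claim, it does not prove it):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_spd(n):
    # random symmetric positive definite matrix: M M^T + I
    M = rng.standard_normal((n, n))
    return M @ M.T + np.eye(n)

n = 5
found_negative_term = False
for _ in range(500):
    X, Y = random_spd(n), random_spd(n)
    lam, U = np.linalg.eigh(X)   # X = U diag(lam) U^T
    sig, V = np.linalg.eigh(Y)   # Y = V diag(sig) V^T
    Yp = U.T @ Y @ U             # Y' : Y conjugated into X's eigenbasis
    Xp = V.T @ X @ V             # X' : X conjugated into Y's eigenbasis
    terms = -1/lam + np.diag(Yp)/lam**2 + np.diag(Xp)/sig**2 - 1/sig
    if (terms < 0).any():
        found_negative_term = True   # an individual summand went negative
    # the full sum equals <∇f(X)-∇f(Y), X-Y> and should be ≥ 0
    assert terms.sum() >= -1e-8
print("some individual terms negative:", found_negative_term)
```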

Hyperplane
  • 12,204
  • 1
  • 22
  • 52