
Consider the matrix-valued function $A: \mathbb{R}^n \rightarrow \mathbb{R}^{n \times n}$ defined as

$$ A( x ) := \left[ \begin{matrix} x_1 & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & x_2 & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} & \cdots & x_n \end{matrix} \right] $$

where $a_{i,j} \geq 0$ for all $i,j$. Define the matrix-valued function $F: \mathbb{R}^{ n } \rightarrow \mathbb{R}^{ n \times n }$ as

$$ F( x ) := \exp( A(x) ) $$

where $\exp(\cdot)$ denotes the matrix exponential, i.e., $$\exp(A) := \sum_{k=0}^{\infty} \frac{A^k}{k!} = I + A + \frac{1}{2!} A^2 + \frac{1}{3!} A^3 + \cdots$$

Notice that $F$ is element-wise nonnegative, because it is the exponential of a Metzler matrix (a matrix whose off-diagonal entries are nonnegative).

Let $f_{i,j} : \mathbb{R}^{ n } \rightarrow \mathbb{R}$ denote the $(i,j)$ entry of $F$, i.e. $f_{i,j}(x) = [F(x)]_{i,j}$.

Prove that, for all $i,j$, the function $f_{i,j}$ is convex.

Comment: I am trying to show that the Hessian of $f_{i,j}$ is positive semidefinite. Equivalently, I am trying to show that the second derivative of $\mathbb{R} \ni t \mapsto \exp( A( x + t y ) )$ is entrywise nonnegative for all $x,y \in \mathbb{R}^n$.
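
For a quick numerical sanity check (not a proof), one can sample random data and test midpoint convexity of every entry of $F$. This is a minimal sketch assuming NumPy and SciPy are available; `A_off` is a hypothetical placeholder for the fixed off-diagonal data $a_{i,j}$.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
n = 4
A_off = rng.uniform(0.0, 1.0, (n, n))   # nonnegative data a_{i,j}
np.fill_diagonal(A_off, 0.0)            # the diagonal is supplied by x

def F(x):
    """F(x) = exp(A(x)), where A(x) has x on the diagonal and a_{i,j} off it."""
    return expm(A_off + np.diag(x))

# Midpoint convexity, entrywise: F((x+y)/2) <= (F(x) + F(y))/2, up to round-off.
for _ in range(1000):
    x, y = rng.normal(size=n), rng.normal(size=n)
    gap = 0.5 * (F(x) + F(y)) - F(0.5 * (x + y))
    assert gap.min() >= -1e-10, gap.min()
print("midpoint convexity held entrywise for all sampled pairs")
```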

user693
  • What motivates this question? Why would you think it true? Have you been able to make any progress yourself towards a solution? – Jonas Dahlbæk Sep 26 '14 at 13:08
  • Why was the previous answer removed? – user693 Sep 27 '14 at 17:37
  • Doesn't convexity just follow from midpoint convexity (which is trivial, since we are considering the exponential of a linear map) and continuity? – Jack D'Aurizio Sep 30 '14 at 10:09
  • Mumble, maybe we need to know something about the eigenvectors of $A$. If they are non-negative, convexity follows. Otherwise, it might be non-trivial, or even false. – Jack D'Aurizio Sep 30 '14 at 10:13
  • Thanks a lot for your comments. Do you have a proof of mid-point convexity? – user693 Sep 30 '14 at 10:28
  • As for the eigenvectors, it is known that $A$ has an eigenvector in the non-negative orthant. Not sure that all of them are non-negative. – user693 Sep 30 '14 at 10:34
  • Do you know that the assertion of convexity is true? In other words, are you asking for a proof of a known property, or asking whether or not $f_{i,j}$ is convex? – Michael Grant Oct 01 '14 at 13:45
  • I do know that it is true and I do have an indirect proof. Therefore I would like to get a direct proof by looking at the second derivative of the function. – user693 Oct 01 '14 at 14:03
  • Could you give the outline of that indirect proof? – TZakrevskiy Nov 28 '14 at 13:05

2 Answers


It suffices to show convexity on the domain $\Omega=\{x : x_i>0,\ i=1,\dots,n\}$. Once this is done, to check that the usual definition of convexity holds for a generic pair of points $x$, $y$, choose $M>0$ large enough that, writing $\vec{M}=(M,\dots,M)$, both $x'=x+\vec{M}\in\Omega$ and $y'=y+\vec{M}\in\Omega$;
then $F(z+\vec{M})=\exp\left(A(z+\vec{M})\right)=\exp(A(z)+MI)=e^M F(z)$ for any $z$ on the segment joining $x$ and $y$ (since $MI$ commutes with $A(z)$), and the desired inequality follows from the convexity inequality applied to $x',y'$.
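
A quick numerical illustration of the shift identity $F(z+\vec{M})=e^M F(z)$ used in this reduction (a sketch assuming SciPy's `expm`; `A_off` is a hypothetical placeholder for the fixed off-diagonal entries):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n, M = 4, 7.0
A_off = rng.uniform(0.0, 1.0, (n, n))
np.fill_diagonal(A_off, 0.0)

z = rng.normal(size=n)                              # a point possibly outside Omega
F_shifted = expm(A_off + np.diag(z + M))            # F(z + (M,...,M))
F_scaled = np.exp(M) * expm(A_off + np.diag(z))     # e^M * F(z)
print(np.allclose(F_shifted, F_scaled))             # True: shifting x only rescales F
```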

So let's show that $f_{ij}$ is convex in $\Omega$. It suffices to show that if a matrix $A$ has nonnegative coefficients and $D$ is a diagonal matrix, then $\exp(A+tD)+\exp(A-tD)-2\exp(A)=t^2P+O(t^3)$, where $P$ has nonnegative coefficients. Indeed, applying this with $A=A(x)$ (which has nonnegative entries when $x\in\Omega$) and $D=\operatorname{diag}(h)$, we deduce $\nabla^2 f_{ij}(x)[h,h]=\lim_{t\to 0}\frac{f_{ij}(x+th)+f_{ij}(x-th)-2f_{ij}(x)}{t^2}\ge 0$ for any vector $h$, so $\nabla^2 f_{ij}(x)$ is positive semidefinite on $\Omega$ and $f_{ij}$ is convex there.
By the definition of the exponential, it is enough to prove the stronger statement that, for every $k\ge 0$, $(A+tD)^k+(A-tD)^k-2A^k=t^2P+O(t^3)$, where again $P$ is some matrix with nonnegative coefficients; summing over $k$ with the weights $\frac{1}{k!}$ then gives the claim for the exponential.
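
Before the combinatorial argument, here is a numerical sketch of the claim for the exponential itself (assuming SciPy; the symmetric second difference divided by $t^2$ approximates the coefficient matrix $P$, which should come out entrywise nonnegative):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n, t = 4, 1e-2
for _ in range(200):
    A = rng.uniform(0.0, 1.0, (n, n))      # entrywise nonnegative A
    D = np.diag(rng.normal(size=n))        # real diagonal D, entries of any sign
    # (exp(A+tD) + exp(A-tD) - 2 exp(A)) / t^2 approximates P for small t
    P_approx = (expm(A + t * D) + expm(A - t * D) - 2.0 * expm(A)) / t**2
    assert P_approx.min() >= -1e-6, P_approx.min()   # nonnegative up to numerical error
print("approximate second-order coefficient was entrywise nonnegative in all trials")
```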

Now notice that $(A+tD)^k+(A-tD)^k-2A^k=2t^2\sum_{0\le r<s\le k-1} A^rDA^{s-r-1}DA^{k-s-1}+O(t^3)$: the first-order terms cancel out, while the second-order term in the expansion of $(A\pm tD)^k$ in powers of $t$ is $t^2\sum_{0\le r<s\le k-1}\Pi_{rs}$, where $\Pi_{rs}$ is the product of $k$ factors which are all equal to $A$, except for the $r$-th and $s$-th ones, which are equal to $D$ (by the $0$-th factor we mean the first one, and so on). So $P=2\sum_{0\le r<s\le k-1} A^rDA^{s-r-1}DA^{k-s-1}$.
Calling $i_0:=i$, $i_k:=j$, we then find that the $(i,j)$ component of $\sum_{0\le r<s\le k-1} A^rDA^{s-r-1}DA^{k-s-1}$ is $$\sum_{0\le r<s\le k-1}\sum_{i_1,\dots,i_{k-1}}a_{i_0i_1}\cdots a_{i_{r-1}i_r}\,d_{i_ri_{r+1}}\,a_{i_{r+1}i_{r+2}}\cdots a_{i_{s-1}i_s}\,d_{i_si_{s+1}}\,a_{i_{s+1}i_{s+2}}\cdots a_{i_{k-1}i_k}.$$ But $D$ is diagonal, so it has the effect of "freezing the indices" for one step: only the terms with $i_{r+1}=i_r$ and $i_{s+1}=i_s$ survive. Thus, after shifting all the indices, the big sum becomes $$\sum_{0\le r\le s\le k-2}\sum_{j_1,\dots,j_{k-3}}a_{j_0j_1}\cdots a_{j_{r-1}j_r}\,d_{j_r}\,a_{j_rj_{r+1}}\cdots a_{j_{s-1}j_s}\,d_{j_s}\,a_{j_sj_{s+1}}\cdots a_{j_{k-3}j_{k-2}},$$ where we have put $d_j:=d_{jj}$ and the extremal indices are $j_0:=i$, $j_{k-2}:=j$.
Now we exchange the two sums, and we are left to show that $$\sum_{0\le r\le s\le k-2}d_{j_r}d_{j_s}\,a_{j_0j_1}\cdots a_{j_{r-1}j_r}a_{j_rj_{r+1}}\cdots a_{j_{s-1}j_s}a_{j_sj_{s+1}}\cdots a_{j_{k-3}j_{k-2}}\ge 0$$ for any fixed choice of indices $j_0,\dots,j_{k-2}$. The product of the coefficients of $A$ is nonnegative and independent of $(r,s)$, so it suffices to show that $$\sum_{0\le r\le s\le k-2}d_{j_r}d_{j_s}\ge 0,$$ which is clear since $$2\sum_{0\le r\le s\le k-2}d_{j_r}d_{j_s}=2\sum_{0\le r\le k-2}d_{j_r}^2+2\sum_{0\le r<s\le k-2}d_{j_r}d_{j_s}=\sum_{0\le r\le k-2}d_{j_r}^2+\left(\sum_{0\le r\le k-2}d_{j_r}\right)^2\ge 0.$$
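
The fixed-$k$ claim can also be checked numerically. The sketch below (assuming NumPy) builds $P/2=\sum_{0\le r<s\le k-1}A^rDA^{s-r-1}DA^{k-s-1}$ for random nonnegative $A$ and real diagonal $D$, confirms that it is entrywise nonnegative, and compares it with the symmetric second difference of $(A+tD)^k$:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 4, 5
A = rng.uniform(0.0, 1.0, (n, n))            # entrywise nonnegative A
D = np.diag(rng.normal(size=n))              # real diagonal D

mp = np.linalg.matrix_power
# P/2 = sum over the D-positions 0 <= r < s <= k-1 in a product of k factors
half_P = sum(
    mp(A, r) @ D @ mp(A, s - r - 1) @ D @ mp(A, k - s - 1)
    for r in range(k) for s in range(r + 1, k)
)
print(half_P.min() >= -1e-12)                # entrywise nonnegative (up to round-off)

# Consistency check: (A+tD)^k + (A-tD)^k - 2A^k = 2 t^2 (P/2) + O(t^4)
t = 1e-4
second_diff = (mp(A + t * D, k) + mp(A - t * D, k) - 2.0 * mp(A, k)) / t**2
print(np.allclose(second_diff, 2.0 * half_P, atol=1e-4))
```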

Mizar
  • @Mizar, your second part does not work. Indeed, the Hessian $P''(x)=[p_{i,j}]$ is a symmetric matrix s.t., for every $i,j$, $p_{i,j}\geq 0$. Yet, that does not imply that $P''(x)$ is positive definite or even nonnegative definite. –  Dec 09 '14 at 00:13
  • Of course you are right. I will try to see if this can be fixed, otherwise I will delete the answer. – Mizar Dec 09 '14 at 07:35
  • Just replaced the bad argument with a more serious attempt. Let me know if it makes sense – Mizar Dec 20 '14 at 17:06
  • @Mizar, the middle of your post is unclear (to me). I agree with the first part: it suffices (cf. also my last remark) to prove that $P=\sum_{r+s+t\leq k-2}A^rDA^sDA^t$ is $\geq 0$. In the second part you take $s\leq k-2$ (?) and, in line -4, you say that the product does not depend on $(r,s)$; yet I see, in line -6, some indices of the $(a_{i,j})$ that are equal to $j_r$ or $j_s$. In fact I think that the difficulty is in the factors of your $d_{j_r}d_{j_s}$. A last remark: for a complete proof, you "must" prove that $P>0$. –  Dec 20 '14 at 19:58
  • Well, the meaning of $r$ and $s$ changes at some point. At the beginning $r$ and $s$ are the positions of the two factors $D$, so we always have $r<s$; after the shift of the indices, i.e. when we use the $j_m$'s, $s$ decreases by $1$ (so the new $s$ satisfies $s\le k-2$). If you look carefully, in line -6 I wrote the indices $j_r$ and $j_s$ just to be a little clearer about what happens after the shift. That product could be written without mentioning $r$ and $s$. – Mizar Dec 20 '14 at 20:18
  • @Mizar, I am not convinced, and I explain why. (Correction): let $P=\sum_{r+s+t=k-2}A^rDA^sDA^t$; then we consider $p_{i,q}=\sum_{r+s+t=k-2}\sum_{j,p}(A^r)_{i,j}d_j(A^s)_{j,p}d_p(A^t)_{p,q}=\sum_{j,p}d_jd_p\sum_{r+s+t=k-2}(A^r)_{i,j}(A^s)_{j,p}(A^t)_{p,q}=\sum_{j,p}d_jd_p\alpha_{j,p}$, where the $(\alpha_{j,p})$ are $\geq 0$. Yet the required conclusion $p_{i,q}\geq 0$ is not obvious. –  Dec 20 '14 at 20:45
  • Thank you for the reply. Expand completely $A^r$, $A^s$, $A^t$ and sum over $j,p$ (before exchanging the sums!) so that you get a huge sum over indices $m_1,\dots,m_r=j,m_{r+1},\dots,m_{r+s}=p,\dots,m_{k-3}$; so now only the first and last indices are fixed, namely $m_0=i$ and $m_{k-2}=q$. Now $j,p$ have disappeared (i.e. they are replaced by the indices $m_r$ and $m_{r+s}$). Then you exchange the sum over $r,s,t$ and the huge sum and, once a choice of the indices $m_1,\dots,m_{k-3}$ is made, the product of the coefficients of $A$ is independent of $r,s,t$. – Mizar Dec 20 '14 at 21:04
  • I leave it in the hands of another reader to decide the matter... –  Dec 20 '14 at 21:38
  • @Mizar, thanks, but now I'd like to move on to another subject. –  Dec 21 '14 at 10:30

You will find the proof in

"Convexity of the cost functional in an optimal control problem for a class of positive switched systems", Patrizio Colaneri, Richard H. Middleton, Zhiyong Chen, Danilo Caporale, Franco Blanchini, Automatica, 2014.

  • Please give us a synopsis of the proof. This is not a reference-request question. – Marconius Oct 21 '15 at 00:37
  • So the proof goes along these lines: by the Trotter product formula, $$ f_{i,j}(x) = e_i^\top\lim_{k\to\infty}\left(e^{\frac{A}{k}}e^{\frac{\operatorname{diag}(x)}{k}}\right)^k e_j. $$ – mcolombi Oct 21 '15 at 13:59
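
Presumably the argument then continues by noting that every entry of $e^{\operatorname{diag}(x)/k}$ is log-convex in $x$ (it is either $e^{x_i/k}$ or $0$), every entry of $e^{A/k}$ is a nonnegative constant, and sums and products of log-convex functions are log-convex, so each entry of the finite product is log-convex; passing to the limit then yields convexity of $f_{i,j}$. Below is a small numerical illustration of the Trotter limit (a sketch, not the paper's construction, assuming SciPy; `A_off` is my reading of the $A$ in the formula, namely the constant off-diagonal part, so that $A(x)=A_{\text{off}}+\operatorname{diag}(x)$):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(4)
n = 4
A_off = rng.uniform(0.0, 1.0, (n, n))        # hypothetical nonnegative data a_{i,j}
np.fill_diagonal(A_off, 0.0)
x = rng.normal(size=n)

exact = expm(A_off + np.diag(x))             # F(x) = exp(A(x))
for k in (1, 10, 100, 1000):
    step = expm(A_off / k) @ np.diag(np.exp(x / k))
    trotter = np.linalg.matrix_power(step, k)
    print(k, np.abs(trotter - exact).max())  # error shrinks roughly like O(1/k)
```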