
When I learned about Lieb's inequality, I met this problem. In $\operatorname{Tr}( \exp{(H+\log{X})})$, $X$ is a square matrix, and the simplest case is a diagonal one. $H$ is a Hermitian matrix, but I think it has no effect on the gradient calculation.

First, I tried one kind of calculation.

The general formula for the gradient of the trace of such a function of a matrix argument $X$ is [cf. Section 2.5 of The Matrix Cookbook]

$\frac{\partial \operatorname{Tr} (F(X))}{\partial X} = f(X^{T})$, where $f(\cdot)$ is the scalar derivative of $F(\cdot)$.

I am not sure what "scalar derivative" means. I understand it as replacing the matrix argument $X$ with a scalar $x$; that is, in my calculation,

$F(X)= \exp{(\log{X}+H)}$, thus $f(X^{T})= \exp{(\log{X^{T}}+H)}\,(X^{T})^{-1}$.
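
To check my reading of the rule on a simpler case (my own sketch with numpy, not from the Cookbook): for $F(X)=X^{3}$ the scalar derivative is $f(x)=3x^{2}$, and the rule's answer $3(X^{T})^{2}$ does agree with a finite-difference gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
X = rng.standard_normal((n, n))

def trace_F(M):
    # Tr(F(M)) with F(M) = M^3
    return np.trace(np.linalg.matrix_power(M, 3))

# Cookbook rule: gradient = f(X^T) with scalar derivative f(x) = 3 x^2
G_rule = 3 * np.linalg.matrix_power(X.T, 2)

# central finite differences, entry by entry
eps = 1e-6
G_fd = np.zeros((n, n))
for p in range(n):
    for q in range(n):
        E = np.zeros((n, n)); E[p, q] = eps
        G_fd[p, q] = (trace_F(X + E) - trace_F(X - E)) / (2 * eps)

print(np.max(np.abs(G_rule - G_fd)))   # small: the rule holds for a plain matrix function of X
```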

I am very puzzled by this result. If $X$ is a diagonal matrix and $H$ is a non-diagonal matrix, then the gradient w.r.t. $X$ derived from the above equation is non-diagonal. But $\frac{\partial \operatorname{Tr} (F(X))}{\partial X}$ should be diagonal when $X$ is restricted to diagonal matrices.
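
A quick numerical sanity check (again my own sketch, using `expm`/`logm` from `scipy.linalg`) also suggests the rule simply does not apply here: the candidate gradient disagrees with a finite-difference gradient of $\operatorname{Tr}(\exp(H+\log X))$ once $H$ and $X$ do not commute.

```python
import numpy as np
from scipy.linalg import expm, logm, inv

rng = np.random.default_rng(0)
n = 3
X = np.diag(rng.uniform(0.5, 2.0, size=n))                 # diagonal, positive
Hm = rng.standard_normal((n, n)); Hm = 0.5 * (Hm + Hm.T)   # Hermitian (real symmetric), non-diagonal

def phi(M):
    # Tr(exp(H + log M))
    return np.trace(expm(Hm + logm(M))).real

# candidate gradient from the Cookbook-style rule
G_rule = expm(logm(X.T) + Hm) @ inv(X.T)

# central finite differences, entry by entry
eps = 1e-6
G_fd = np.zeros((n, n))
for p in range(n):
    for q in range(n):
        E = np.zeros((n, n)); E[p, q] = eps
        G_fd[p, q] = (phi(X + E) - phi(X - E)) / (2 * eps)

print(np.max(np.abs(G_rule - G_fd)))   # not small in general: the rule fails when H and X do not commute
```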

Next, I tried a second kind of calculation:

$d \operatorname{Tr}( \exp{(\log{X}+H)})= \operatorname{Tr} [d ( \exp{(\log{X}+H)})]$

$= \operatorname{Tr}[\int_{0}^{1} \exp{(\alpha(\log{X}+H))} d(\log{X}+H) \exp{((1-\alpha)(\log{X}+H))} d \alpha] $

$= \operatorname{Tr}[\int_{0}^{1} \exp{(\alpha(\log{X}+H))} \exp{((1-\alpha)(\log{X}+H))} d \alpha d(\log{X}+H)] $

$= \operatorname{Tr} [\exp{(\log{X}+H)} d(\log{X}+H) ] $

$= \operatorname{Tr} [\exp{(\log{X}+H)} ]\operatorname{Tr} [d(\log{X}+H) ] $

$= \operatorname{Tr} [\exp{(\log{X}+H)} ] [d \operatorname{Tr}(\log{X}+H) ] $

$= \operatorname{Tr} [\exp{(\log{X}+H)} ] (X^{T})^{-1} dX$
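
Numerically this result does not check out either: in the same kind of sketch as above, $\operatorname{Tr}[\exp(\log X + H)]\,(X^{T})^{-1}$ disagrees with the finite-difference gradient, so at least one of the steps must be invalid.

```python
import numpy as np
from scipy.linalg import expm, logm, inv

rng = np.random.default_rng(0)
n = 3
X = np.diag(rng.uniform(0.5, 2.0, size=n))
Hm = rng.standard_normal((n, n)); Hm = 0.5 * (Hm + Hm.T)

def phi(M):
    # Tr(exp(H + log M))
    return np.trace(expm(Hm + logm(M))).real

# result of the second calculation: Tr(exp(log X + H)) * (X^T)^{-1}
G_candidate = phi(X) * inv(X.T)

eps = 1e-6
G_fd = np.zeros((n, n))
for p in range(n):
    for q in range(n):
        E = np.zeros((n, n)); E[p, q] = eps
        G_fd[p, q] = (phi(X + E) - phi(X - E)) / (2 * eps)

print(np.max(np.abs(G_candidate - G_fd)))   # not small, so at least one step above is invalid
```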

Can someone help me with the right calculation?

Curry
  • This should be helpful? – Eman Yalpsid Oct 06 '21 at 09:29
  • Thanks, I have looked through almost all of this website about matrix gradients, including what you mentioned, but it still did not solve this problem. The formula inside it is what I used in my first kind of calculation. – Curry Oct 06 '21 at 10:26
  • From the other question I understand F to be a scalar-to-scalar function that is applied to all the coordinates of a matrix. If this is right, then the rule can be applied when H is a scalar. – Eman Yalpsid Oct 06 '21 at 10:30
  • Your formula from the matrix cookbook is valid only when $F$ is a real-valued function on the real line and $F(X)$ is understood in the sense of functional calculus (which is not the same as application of $F$ to all entries, as @Andrew suggested). In particular, this formula is not true for expressions that involve any matrix $Y$ inside the trace that does not commute with $X$. – MaoWao Oct 07 '21 at 12:28

1 Answer


$ \def\o{{\tt1}}\def\p{\partial} \def\L{\left}\def\R{\right} \def\LR#1{\L(#1\R)} \def\BR#1{\Bigg(#1\Bigg)} \def\fracLR#1#2{\L(\frac{#1}{#2}\R)} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\c#1{\color{red}{#1}} $Defining the matrix variable and its differential
$$\eqalign{
W &= \LR{X+I}^{-1}\LR{X-I} \\
dW &= \LR{X+I}^{-1}\,dX - \LR{X+I}^{-1}\,dX\,\LR{X+I}^{-1}\LR{X-I} \\
&= \LR{X+I}^{-1}\,dX - \LR{X+I}^{-1}\,dX\;W \\
&= \LR{X+I}^{-1}\,dX\LR{I-W} \\
\\
X &= \LR{I-W}^{-1}\LR{I+W} \\
I &= \LR{I-W}^{-1}\LR{I-W} \\
\LR{X+I} &= 2\LR{I-W}^{-1} \qiq \LR{X+I}^{-1} = \tfrac 12\LR{I-W} \\
}$$
then extending this post to a matrix argument yields formulas for the logarithm and its differential
$$\eqalign{
\log(X) &= \sum_{k=0}^\infty \LR{\frac{2}{2k+1}}W^{2k+1} \\
d\log(X) &= \sum_{k=0}^\infty \LR{\frac{2}{2k+1}} \sum_{j=\o}^{2k+1} W^{j-\o}\,dW\;W^{2k+\o-j} \\
&= \sum_{k=0}^\infty \LR{\frac{2}{2k+1}} \sum_{j=\o}^{2k+1} W^{j-\o}\LR{X+I}^{-1}\,dX\LR{I-W}W^{2k+\o-j} \\
&= \sum_{k=0}^\infty \sum_{j=\o}^{2k+1} \fracLR{W^{j-\o}\LR{I-W}\,dX\LR{I-W}W^{2k+\o-j}}{2k+1} \\
}$$
Now define the matrix variable
$$A=H+\log(X) \qiq dA = d\log(X)$$
and apply the formula from the Cookbook
$$\eqalign{
\phi &= \trace{e^A} \\
d\phi &= \LR{e^A}^T:dA \\
&= \sum_{k=0}^\infty \sum_{j=\o}^{2k+1} \fracLR{ W^{j-\o}\LR{I-W}\,e^A\LR{I-W}W^{2k+\o-j} }{2k+1}^T\!:dX \\
\grad{\phi}{X} &= \sum_{k=0}^\infty \sum_{j=\o}^{2k+1} \fracLR{ W^{j-\o}\LR{I-W}\,e^A\LR{I-W}W^{2k+\o-j} }{2k+1}^T \\
}$$
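
As a numerical sanity check of the final formula (an illustrative sketch, not part of the derivation: it truncates the series at $k<K$ and assumes a symmetric positive definite $X$ with spectrum near $1$, so that the $W$-series converges quickly), the truncated gradient agrees with a finite-difference gradient of $\phi$.

```python
import numpy as np
from scipy.linalg import expm, logm, inv

rng = np.random.default_rng(0)
n = 4
B = 0.1 * rng.standard_normal((n, n))
X = np.eye(n) + 0.5 * (B + B.T)                            # symmetric positive definite, eigenvalues near 1
Hm = rng.standard_normal((n, n)); Hm = 0.5 * (Hm + Hm.T)   # Hermitian (real symmetric)

I = np.eye(n)
W = inv(X + I) @ (X - I)
eA = expm(Hm + logm(X))                                    # e^A with A = H + log X
mp = np.linalg.matrix_power

# truncated double series for the gradient
K = 40
G = np.zeros((n, n))
for k in range(K):
    for j in range(1, 2 * k + 2):
        term = mp(W, j - 1) @ (I - W) @ eA @ (I - W) @ mp(W, 2 * k + 1 - j)
        G += term.T / (2 * k + 1)

# finite-difference gradient of phi(X) = Tr(exp(H + log X))
def phi(M):
    return np.trace(expm(Hm + logm(M))).real

eps = 1e-6
G_fd = np.zeros((n, n))
for p in range(n):
    for q in range(n):
        E = np.zeros((n, n)); E[p, q] = eps
        G_fd[p, q] = (phi(X + E) - phi(X - E)) / (2 * eps)

print(np.max(np.abs(G - G_fd)))                            # small: the series gradient matches finite differences
```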


In the above derivation, a colon is used as a convenient product notation for the trace
$$\eqalign{
A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\
A:A &= \big\|A\big\|^2_F \\
}$$
The properties of the underlying trace function allow the terms in a colon product to be rearranged in many different ways, e.g.
$$\eqalign{
A:B &= B:A \\
A:B &= A^T:B^T \\
C:AB &= CB^T:A = A^TC:B \\
}$$
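
These identities are easy to confirm numerically; a small numpy sketch with random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

colon = lambda P, Q: np.trace(P.T @ Q)                    # A:B = Tr(A^T B), the Frobenius inner product

print(np.isclose(colon(A, B), colon(B, A)))               # A:B = B:A
print(np.isclose(colon(A, B), colon(A.T, B.T)))           # A:B = A^T:B^T
print(np.isclose(colon(C, A @ B), colon(C @ B.T, A)))     # C:AB = CB^T:A
print(np.isclose(colon(C, A @ B), colon(A.T @ C, B)))     # C:AB = A^T C:B
```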

greg
  • Thank you for your reply. I am also a little puzzled about why you define $W$ instead of using $X$ directly. I guess you have thought about it. In that case, $\log{(X)}= \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}(X-I)^{n}$. – Curry Oct 07 '21 at 11:31
  • Another problem is that I do not understand the step between the two equations after $d \phi$. Could you provide a detailed explanation? – Curry Oct 07 '21 at 11:45
  • The radius and rate of convergence are superior with the $W$-expansion. Now that you have a template to work from, you can easily calculate a gradient expression for any series expansion that you like. I updated my post with details of rearranging the colon product. – greg Oct 07 '21 at 11:45
  • I understand that these are properties of the trace. But the result I got is $\sum_{k=0}^{\infty} \sum_{j=1}^{2 k+1}\left(\frac{(I-W) W^{2 k+1-j}e^{A}W^{j-1}(I-W)} {2 k+1} \right)^{T}: d X $, using $A^{T}:BCDEF=\operatorname{Tr}(ABCDEF)=\operatorname{Tr}(EFABCD)=(EFABC)^{T}:D$. Where did I go wrong? – Curry Oct 07 '21 at 12:29
  • $W^j$ and $(I-W)^k$ commute. For that matter, they also commute with $(I-X)^\ell$ since they are all, ultimately, functions of $X.\quad$ – greg Oct 07 '21 at 12:47
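
For completeness, a small numerical check (a numpy/scipy sketch with a random symmetric $X$ near the identity and a symmetric $H$) that the two rearrangements coincide once summed over $j$:

```python
import numpy as np
from scipy.linalg import expm, logm, inv

rng = np.random.default_rng(0)
n = 4
B = 0.1 * rng.standard_normal((n, n))
X = np.eye(n) + 0.5 * (B + B.T)                            # symmetric, eigenvalues near 1
Hm = rng.standard_normal((n, n)); Hm = 0.5 * (Hm + Hm.T)

I = np.eye(n)
W = inv(X + I) @ (X - I)
eA = expm(Hm + logm(X))
mp = np.linalg.matrix_power

k = 3   # any fixed k: compare the two rearrangements summed over j = 1, ..., 2k+1
S_answer  = sum(mp(W, j - 1) @ (I - W) @ eA @ (I - W) @ mp(W, 2 * k + 1 - j) for j in range(1, 2 * k + 2))
S_comment = sum((I - W) @ mp(W, 2 * k + 1 - j) @ eA @ mp(W, j - 1) @ (I - W) for j in range(1, 2 * k + 2))
print(np.max(np.abs(S_answer - S_comment)))                # ~0: the two forms agree after summing over j
```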