
I am a little confused about the chain rule for matrix derivatives. For example, let

$$ f(X) := \operatorname{tr} \left( \left[\log \left( W X W^\top + B \right) \right]^2 \right) $$

where $\log(\cdot)$ denotes the matrix logarithm, $X$ is an $m \times m$ symmetric positive definite (SPD) matrix, $B$ is an $n \times n$ SPD matrix ($n>m$), and $W\in \mathbb{R}^{n\times m}$ is a rectangular matrix. If I use the chain rule, I get

$$\frac{\partial f}{\partial X} = 2\log(W X W^\top + B) W^\top(W X W^\top + B)^{-1}W$$

However, $\log(W X W^\top + B)$ is $n \times n$ while $W^\top(W X W^\top + B)^{-1}W$ is $m \times m$, so the product is not even defined. There must be something wrong with my derivation, but I don't know where it is. Any comments?


*** Addition ***

Let $Z=(W X W^\top + B)S$.

What if $f(X) = \text{tr}((\log(Z))^\top\log(Z))$, where $S$ is also an SPD matrix? Do we have

$\frac{\partial f}{\partial X} = 2W^\top \log[(W X W^\top + B)S] (W X W^\top + B)^{-1}W$,

or

$\frac{\partial f}{\partial X} = 2W^\top(W X W^\top + B)^{-1}S^{-1} \log[(W X W^\top + B)S] SW$?

Using the notations in this post, my solution is:

Define $Z=(W X W^\top + B)S$, and $\phi=\text{tr}([\log(Z)]^2)$. Then we have

$d\phi = 2\log(Z)\cdot Z^{-\top}:dZ=2\log(Z)\cdot Z^{-\top}: WdXW^\top S = 2W^\top\log(Z)\cdot Z^{-\top}SW: dX$.

Therefore, we have

$\frac{\partial \phi}{\partial X} = 2W^\top\log(Z)\cdot Z^{-\top}SW=2W^\top \log[(W X W^\top + B)S] (W X W^\top + B)^{-1}W$.

Is it correct?



$ \def\T{{\sf T}} \def\l{\lambda} \def\z{\zeta} \def\h{\odot} \def\bR#1{\Big[#1\Big]} \def\LR#1{\left(#1\right)} \def\CR#1{\left\lbrace #1 \right\rbrace} \def\op#1{\operatorname{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\q{\quad} \def\qq{\qquad} \def\qif{\q\iff\q} \def\qiq{\q\implies\q} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\rr#1{\color{red}{#1}} \def\bb#1{\color{blue}{#1}} \def\gg#1{\color{green}{#1}} \def\RLR#1{\rr{\LR{#1}}} \def\GLR#1{\gg{\LR{#1}}} $To answer your second question, consider this calculation $$\eqalign{ Z &= WXW^{\T}S+BS \qiq \gg{dZ=W\,dX\,W^{\T}S} \\ L &= \log(Z) \\ \phi &= \trace{L^2} \\ d\phi &= \LR{2LZ^{-1}}^{\T}:\:\gg{dZ} \\ &= 2\LR{L^{\T}Z^{-\T}}:\:\GLR{W\,dX\,W^{\T}S} \\ &= 2\LR{W^{\T}L^{\T}Z^{-\T}S^{\T}W}:dX \\ \grad{\phi}{X} &= 2W^{\T}L^{\T}Z^{-\T}S^{\T}W \\ &= 2\LR{S {Z^{-1}L} W}^{\T}W \\ }$$ Your first question is a special case of your second, in which $S$ equals the identity matrix. This choice of $S$ makes $(L,Z)$ symmetric, so you can omit the transposes on those terms.
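This gradient can be sanity-checked numerically. The following is a minimal pure-Python sketch, not part of the original answer: the matrices $W$, $X$, $B$, $S$ and the direction $E$ are made-up test data, and a small cyclic Jacobi eigensolver stands in for a linear-algebra library. It assumes $X$ stays symmetric, so the central difference is taken only along a symmetric direction $E$, and it evaluates $\log(Z)$ through the SPD matrix $C=S^{1/2}(WXW^T+B)S^{1/2}$, which is similar to $Z$ (so $\log(Z)=S^{-1/2}\log(C)S^{1/2}$).

```python
import math

def jacobi_eigh(A, sweeps=50):
    """Cyclic Jacobi eigendecomposition of a small symmetric matrix: A = V diag(lam) V^T."""
    n = len(A)
    a = [[float(x) for x in row] for row in A]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-28 * sum(x * x for row in a for x in row):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def T(A):
    return [list(r) for r in zip(*A)]

def add(A, B, t=1.0):
    return [[x + t * y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def sym_fun(A, f):
    """f(A) = V diag(f(lam)) V^T for a symmetric matrix A."""
    lam, V = jacobi_eigh(A)
    fl = [f(x) for x in lam]
    n = len(A)
    return [[sum(V[i][k] * fl[k] * V[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pieces(X, W, B, S):
    """Return S^{1/2}, S^{-1/2} and C = S^{1/2} (W X W^T + B) S^{1/2}, which is similar to Z."""
    A = add(matmul(matmul(W, X), T(W)), B)
    Sh = sym_fun(S, math.sqrt)
    Shi = sym_fun(S, lambda x: 1.0 / math.sqrt(x))
    return Sh, Shi, matmul(matmul(Sh, A), Sh)

def phi(X, W, B, S):
    """phi = tr(log(Z)^2) = sum_i log(lambda_i)^2, since Z and C share eigenvalues."""
    _, _, C = pieces(X, W, B, S)
    lam, _ = jacobi_eigh(C)
    return sum(math.log(x) ** 2 for x in lam)

def grad_phi(X, W, B, S):
    """The formula 2 W^T L^T Z^{-T} S^T W with L = log(Z), Z = S^{-1/2} C S^{1/2}."""
    Sh, Shi, C = pieces(X, W, B, S)
    L = matmul(matmul(Shi, sym_fun(C, math.log)), Sh)              # log(Z)
    Zinv = matmul(matmul(Shi, sym_fun(C, lambda x: 1.0 / x)), Sh)  # Z^{-1}
    G = matmul(matmul(matmul(matmul(T(W), T(L)), T(Zinv)), T(S)), W)
    return [[2.0 * g for g in row] for row in G]

W = [[1.0, 0.2], [0.3, 1.0], [0.5, -0.4]]                 # n = 3, m = 2
X = [[1.0, 0.3], [0.3, 2.0]]                              # SPD
B = [[2.0, 0.1, 0.0], [0.1, 1.5, 0.2], [0.0, 0.2, 1.0]]   # SPD
S = [[1.0, 0.4, 0.1], [0.4, 2.0, 0.3], [0.1, 0.3, 1.5]]   # SPD, not the identity
E = [[0.7, -0.2], [-0.2, 0.5]]                            # symmetric direction dX
h = 1e-5

G = grad_phi(X, W, B, S)
fd = (phi(add(X, E, h), W, B, S) - phi(add(X, E, -h), W, B, S)) / (2 * h)
an = sum(G[i][j] * E[i][j] for i in range(2) for j in range(2))  # G : E
print(fd, an)
```

Setting `S` to the $3\times3$ identity runs the same check for the first question, where the formula reduces to $2W^TZ^{-1}\log(Z)W$.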

Update

In the comments, there is a discussion about a related function $$ \psi = \trace{L^{\T}L} \;\equiv\; L:L $$ This is a seemingly trivial change, but calculating its gradient requires either an infinite series expansion or the $\sf Daleckii-Krein\;Theorem.\;$ Let's use the latter.

Start with the Eigenvalue Decomposition of the $Z$ matrix $$\eqalign{ \def\Q{Q^{-1}} Z &= QD\Q \qiq D=\Diag{\l_k}=\CR{{\sf eigenvalues}} \\ L &= \log(Z) \\ \rr{dL} &\rr{= Q\bR{R\h\LR{\Q\,dZ\:Q}}\Q} \\ \\ R_{jk} &= \begin{cases} {\large\frac{\log(\l_j)\,-\,\log(\l_k)}{\l_j\,-\,\l_k}} \qq{\rm if}\;\l_j\ne\l_k \\ \\ \qq\l_k^{-1}\q\qq\qq{\rm otherwise} \\ \end{cases} }$$ This can be substituted into the differential of the new function $$\eqalign{ d\psi &= 2L:\rr{dL} \\ &= 2L:\RLR{Q\bR{R\h\LR{\Q\,dZ\:Q}}\Q} \\ &= 2\LR{Q\bR{R\h\LR{\Q LQ}}\Q}^{\T}:\gg{dZ} \\ &= 2\LR{Q\bR{R\h\LR{\Q LQ}}\Q}^{\T}:\GLR{W\,dX\,W^{\T}S}\qq \\ &= 2\LR{W^{\T}SQ\bR{R\h\LR{\Q LQ}}\Q W}^{\T}:dX \\ \grad{\psi}{X} &= 2\LR{S {Q\bR{R\h\LR{\Q LQ}}\Q} W}^{\T}W \\ }$$ Eliminating $\CR{L,Z}$ in favor of $\CR{D,Q}$ yields $$\eqalign{ \grad{\phi}{X} &= 2\LR{SQ\;\bb{\bR{D^{-1}\log(D)}}\;\Q W}^{\T}W \\ \grad{\psi}{X} &= 2\LR{SQ\;\bb{\bR{R\h\log(D)}}\;\Q W}^{\T}W \\ }$$ However, $\log(D)$ is a diagonal matrix and the diagonal part of $R$ is equal to $D^{-1}$, therefore the gradients are in fact identical.

The above derivation assumes that $Z$ is diagonalizable. The $\sf DK\ Theorem$ can be extended to defective matrices, but the result is much more complicated and not usually worth the effort.
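The Daleckii-Krein expression for $dL$ can be spot-checked against a central finite difference of $\log(Z)$. This is a minimal sketch under extra assumptions: $Z$ is a symmetric positive definite $2\times2$ matrix (so $Q$ is orthogonal and $Q^{-1}=Q^T$), the eigendecomposition is done in closed form by a rotation, and `Z` and `E` are arbitrary test values rather than anything from the question.

```python
import math

def eigh2(A):
    """Symmetric 2x2 eigendecomposition A = Q diag(lam) Q^T, with Q a rotation matrix."""
    (a, b), (_, d) = A
    th = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(th), math.sin(th)
    lam = [a * c * c + 2.0 * b * s * c + d * s * s,
           a * s * s - 2.0 * b * s * c + d * c * c]
    return lam, [[c, -s], [s, c]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def T(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

def logm2(A):
    """Principal matrix logarithm of a symmetric positive definite 2x2 matrix."""
    lam, Q = eigh2(A)
    D = [[math.log(lam[0]), 0.0], [0.0, math.log(lam[1])]]
    return matmul(matmul(Q, D), T(Q))

def dlog_dk(Z, dZ):
    """Daleckii-Krein: dL = Q [R o (Q^{-1} dZ Q)] Q^{-1}, here with Q^{-1} = Q^T."""
    lam, Q = eigh2(Z)
    # R holds the divided differences of log; its diagonal is 1/lambda_k.
    R = [[(math.log(lam[j]) - math.log(lam[k])) / (lam[j] - lam[k])
          if lam[j] != lam[k] else 1.0 / lam[k]
          for k in range(2)] for j in range(2)]
    M = matmul(matmul(T(Q), dZ), Q)
    H = [[R[j][k] * M[j][k] for k in range(2)] for j in range(2)]  # Hadamard product
    return matmul(matmul(Q, H), T(Q))

Z = [[3.0, 1.0], [1.0, 2.0]]    # symmetric positive definite test matrix
E = [[0.5, -0.3], [-0.3, 1.0]]  # symmetric direction dZ
h = 1e-5

Lp = logm2([[Z[i][j] + h * E[i][j] for j in range(2)] for i in range(2)])
Lm = logm2([[Z[i][j] - h * E[i][j] for j in range(2)] for i in range(2)])
fd = [[(Lp[i][j] - Lm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]
dk = dlog_dk(Z, E)
err = max(abs(fd[i][j] - dk[i][j]) for i in range(2) for j in range(2))
print(err)
```

The symmetric case keeps the sketch short; a non-symmetric but diagonalizable $Z$ would need a general eigensolver and an explicit $Q^{-1}$.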

greg
  • Thanks a lot for the comments! However, I don't quite understand how you get $d\phi = (2LZ^{-1})^T:dZ $. Could you please show a more detailed derivation? – user3138073 Oct 09 '18 at 23:23
  • @user3138073 Consider the scalar function $$f(\lambda)=(\log\lambda)^2$$ whose derivative is $$f'(\lambda)=\frac{2\log\lambda}{\lambda}$$ Apply this to the trace of the corresponding matrix function via the formula $$d\,{\rm tr}f(Z)=f'(Z^T):dZ$$ It's also worth pointing out that the matrices $(L,Z)$ commute, since $L$ is a function of $Z$. – greg Oct 09 '18 at 23:36
  • It appears that this answer was altered to match a slight change in the question. Specifically, the function changed as $\,{\rm tr}(L^2) \implies {\rm tr}(L^TL).\,$ Despite the seemingly minor nature of this change, there is no simple closed-form solution to the new question. In particular, the expressions for $d\phi$ given in my answer are no longer valid. – greg Nov 22 '18 at 16:44
  • Hi @greg, thanks a lot for the feedback! But it seems that your answer is based on $\phi = \text{tr} L^\top L$, not $\phi = \text{tr} L^2$. Is my understanding correct? – user3138073 Nov 26 '18 at 17:01
  • No. My original answer dealt with the function $\phi={\rm tr}(L^2)$. Five months later that single line was edited to read $\phi={\rm tr}(L^TL)$. But everything after that line is no longer true for the new function. – greg Nov 26 '18 at 19:42
  • Yes. The original question was $\text{tr}(L^2)$, and I changed it to $\text{tr}(L^\top L)$. So in this case there is no analytical solution? I thought your answer applied to the second case, because it was edited on Oct 10 at 20:56, after I modified my question. – user3138073 Nov 28 '18 at 00:23
  • Also, it seems that $d\,\text{tr}f(Z) = f'(Z^\top) : dZ$ is a very general rule, but I couldn't find any reference for it. The definition of $f'(Z^\top)$ is also not clear to me, since in my case $f(Z) = \log(Z)^\top \log(Z)$ is a matrix, not a scalar. Could you please tell me where it comes from? Thanks! – user3138073 Dec 03 '18 at 05:51

Let $Z=WXW^T+B$; it's a symmetric $>0$ matrix. Since $\log(Z)$ and $Z^{-1}$ commute, the derivative is

$Df_X:K\in M_{m,m}\rightarrow 2tr(\log(Z)Z^{-1}WKW^T)=2tr(W^T\log(Z)Z^{-1}WK)$.

Then the gradient is

$\nabla(f)(X)=2W^TZ^{-1}\log(Z)W\in M_{m,m}$.
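Written out with the size of each factor (a short worked step in the notation of this answer, with $dX$ in place of $K$), the step that resolves the dimension mismatch in the question is:

$$
\begin{aligned}
df &= 2\operatorname{tr}\big(\log(Z)\,Z^{-1}\,dZ\big), \qquad dZ = W\,dX\,W^T,\\
   &= 2\operatorname{tr}\Big(\underbrace{W^T}_{m\times n}\,\underbrace{\log(Z)\,Z^{-1}}_{n\times n}\,\underbrace{W}_{n\times m}\,\underbrace{dX}_{m\times m}\Big).
\end{aligned}
$$

The cyclic property of the trace moves both copies of $W$ around the whole $n\times n$ factor $\log(Z)\,Z^{-1}$, so $\log(Z)$ never multiplies an $m\times m$ matrix directly.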

While writing this, I see that greg obtained the same result.

EDIT. Comment on the addition by @user3138073: the answer is no, but you will have trouble understanding why...

Assume that $U(t)$ is a function of $t\in \mathbb{R}$ and let $f(t)=tr((\log(U))^2)$; then $f'(t)=2tr((\log(U))'\log(U))=2tr(\log(U)U^{-1}U')$. Indeed, behind this there is a series: thanks to the trace, and because $\log(U)$ is a polynomial in $U$ (true when $U$ has no eigenvalues $<0$), we can move $U'$ to the right side of the trace, and the resulting series sums to $U^{-1}$ (this is absolutely not obvious!).

Next, you choose $g(t)=tr((\log(U)S)^2)$. Then $g'(t)=2tr((\log(U))'S\log(U)S)$. Unfortunately, $(\log(U))'=U^{-1}U'$ is false in general (the true derivative is much more complicated). If you move $U'$ to the right side of the trace, you break the series (cf. above), because $S$ and $U$ don't commute.

In other words, $tr(U^2U'U^3\log(U))=tr(U^5U'\log(U))$ but $tr(U^2U'U^3S\log(U)S)\not= tr(U^5U'S\log(U)S)$.
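These claims are easy to observe numerically. The sketch below is an illustration, not part of the original answer: it assumes the hypothetical path $U(t)=U_0+tU_1$ with symmetric $U_0,U_1$ that do not commute, plus a hypothetical SPD matrix $S$ that does not commute with $U_0$; all values are made up. It compares central differences at $t=0$ with the formulas above: the trace formula for $f'$ holds, while $(\log U)'=U^{-1}U'$ and the analogous shortcut for $g'$ fail.

```python
import math

def sym2_fun(A, f):
    """f(A) for a symmetric 2x2 matrix A, via A = V diag(l1, l2) V^T with V a rotation."""
    (a, b), (_, d) = A
    th = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(th), math.sin(th)
    f1 = f(a * c * c + 2.0 * b * s * c + d * s * s)
    f2 = f(a * s * s - 2.0 * b * s * c + d * c * c)
    return [[f1 * c * c + f2 * s * s, (f1 - f2) * c * s],
            [(f1 - f2) * c * s, f1 * s * s + f2 * c * c]]

def matmul(*Ms):
    """Product of several 2x2 matrices, left to right."""
    P = Ms[0]
    for M in Ms[1:]:
        P = [[sum(P[i][k] * M[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return P

def tr(A):
    return A[0][0] + A[1][1]

U0 = [[1.0, 0.0], [0.0, 4.0]]   # U(0)
U1 = [[1.0, 1.0], [1.0, 0.5]]   # U'(t); does not commute with U0
S = [[2.0, 1.0], [1.0, 1.0]]    # SPD; does not commute with U0
h = 1e-6

def U(t):
    return [[U0[i][j] + t * U1[i][j] for j in range(2)] for i in range(2)]

logU, Uinv = sym2_fun(U0, math.log), sym2_fun(U0, lambda x: 1.0 / x)
Lp, Lm = sym2_fun(U(h), math.log), sym2_fun(U(-h), math.log)
dlog = [[(Lp[i][j] - Lm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]  # (log U)'
naive = matmul(Uinv, U1)                                                       # U^{-1} U'

# 1) The matrix identity (log U)' = U^{-1} U' fails here.
matrix_gap = max(abs(dlog[i][j] - naive[i][j]) for i in range(2) for j in range(2))

# 2) The trace formula f'(t) = 2 tr(log(U) U^{-1} U') still holds.
def f(t):
    L = sym2_fun(U(t), math.log)
    return tr(matmul(L, L))

f_fd = (f(h) - f(-h)) / (2 * h)
f_formula = 2 * tr(matmul(logU, Uinv, U1))

# 3) With S inside the trace, g'(t) = 2 tr((log U)' S log(U) S) is right,
#    but replacing (log U)' by U^{-1} U' gives a different number.
def g(t):
    L = sym2_fun(U(t), math.log)
    return tr(matmul(L, S, L, S))

g_fd = (g(h) - g(-h)) / (2 * h)
g_exact = 2 * tr(matmul(dlog, S, logU, S))
g_shortcut = 2 * tr(matmul(naive, S, logU, S))
print(matrix_gap, f_fd, f_formula, g_fd, g_exact, g_shortcut)
```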

  • Thanks a lot for the help! Could you tell me if there is any textbook/tutorial about all these matrix derivative things? – user3138073 May 24 '18 at 03:19
  • @user3138073, you can read any book on differential calculus. greg does not use the same notation as I do. I prefer to consider the derivative as a linear function and to deduce the gradient using the equality $Df_X(K)=\langle\nabla (f)(X),K\rangle=tr((\nabla(f)(X))^TK)$. – May 25 '18 at 13:37
  • Thanks for the feedback! I need to spend some time on matrix calculus, since I am not very familiar with the notation you used. Here you use the fact that $\log(Z)$ and $Z^{-1}$ commute. But what if $Z$ is not symmetric? I've updated my question. Any comments? Thanks! – user3138073 May 25 '18 at 21:05
  • Thanks for the comments! I made a mistake (now corrected): it is not $(\log(U))^2$, but $(\log(U))^\top \log(U)$. I thought $U$ was symmetric, so I ignored the transpose. – user3138073 Oct 10 '18 at 20:22