
I am trying to differentiate a scalar expression $u$ with respect to a matrix $X$, which has applications to material strength models. I've gotten through most of the work here, but am stuck at the end. If you see anything wrong, please let me know! This is related to the question "Derivative of matrix logarithm" but involves some extra steps.

We have the expression $$ u = tr(H^2) + tr(H)^2 $$ where $H=\log(X^{-1})=-\log(X)$. So then $$ \frac{\partial u}{\partial X} = \frac{\partial }{\partial X} tr(H^2) + \frac{\partial }{\partial X} tr(H)^2 $$

Outermost expressions

First term

Let's handle the first term. Using the chain rule (sec. 2.8.1) from The Matrix Cookbook, I let $U = H^2$ and $g(U)=tr(U)$, so that $ \frac{\partial g(U)}{\partial U} = I$, and we must compute, for each entry $X_{ij}$, $$\frac{\partial U}{\partial X_{ij}} = \frac{\partial H^2}{\partial X_{ij}} = \frac{\partial H}{\partial X_{ij}}H + H\frac{\partial H}{\partial X_{ij}} $$

Using the chain rule formula, I have $$\frac{\partial }{\partial X_{ij}} tr(H^2) = tr \left( I^T \frac{\partial U}{\partial X_{ij}} \right ) = tr \left( \frac{\partial H}{\partial X_{ij}}H + H\frac{\partial H}{\partial X_{ij}} \right ) = 2tr \left( H\frac{\partial H}{\partial X_{ij}} \right ) $$

Second term

For the second term, I assume I can write $$ \frac{\partial }{\partial X_{ij}} tr(H)^2 = 2 tr(H)\frac{\partial tr(H) }{\partial X_{ij}} = 2 tr(H)tr\left ( I^T \frac{\partial H}{\partial X_{ij}} \right) = 2 tr(H)tr\left (\frac{\partial H}{\partial X_{ij}} \right) $$

Putting this all together, we have $$ \frac{\partial u}{\partial X_{ij}} = 2tr \left( H\frac{\partial H}{\partial X_{ij}} \right ) + 2 tr(H)tr\left (\frac{\partial H}{\partial X_{ij}} \right) $$

Innermost expressions

Trace of Derivative of matrix logarithm

So then, what is $\frac{\partial H}{\partial X_{ij}} $ ? We can use the power series

$$ \frac{\partial H}{\partial X_{ij}} = -\frac{\partial }{\partial X_{ij}}\sum\limits_{k=1}^\infty (-1)^{k+1} k^{-1} (X-I)^{k} = \sum\limits_{k=1}^\infty \sum\limits_{\ell=1}^k (-1)^{k} k^{-1} (X-I)...\frac{\partial (X-I)}{\partial X_{ij}} ...(X-I) $$

where, in the last expression, the $\ell$th of the $k$ factors of $(X-I)$ is replaced by its derivative. By the cyclic property of the trace, each of the $k$ terms in the inner sum contributes the same amount, so the trace of this equation simplifies nicely:

$$ tr\left( \frac{\partial H}{\partial X_{ij}} \right) = \sum\limits_{k=1}^\infty \sum\limits_{\ell=1}^k (-1)^{k} k^{-1} tr\left( (X-I)^{k-1} \frac{\partial X}{\partial X_{ij}} \right ) = \sum\limits_{k=1}^\infty (-1)^{k} tr\left( (X-I)^{k-1} \frac{\partial X}{\partial X_{ij}} \right ) $$

We note that $\frac{\partial X}{\partial X_{ij}}$ is a single-entry matrix, so $tr\left( A \frac{\partial X}{\partial X_{ij}} \right) = (A^T)_{ij}$ for any matrix $A$, and

$$ tr\left( \frac{\partial H}{\partial X_{ij}} \right) = \sum\limits_{k=1}^\infty (-1)^{k} \left( (X^T-I)^{k-1} \right ) _{ij} = - \sum\limits_{k=1}^\infty \left( (I-X^T)^{k-1} \right ) _{ij} = - \sum\limits_{m=0}^\infty \left( (I-X^T)^{m} \right ) _{ij} = - \left ( X^{-T} \right ) _{ij} $$

where the last step uses the matrix Neumann series (valid when the series converges). This means we currently have

$$ \frac{\partial u}{\partial X_{ij}} = 2tr \left( H\frac{\partial H}{\partial X_{ij}} \right ) - 2 tr(H)\left ( X^{-T} \right ) _{ij} $$
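The result $tr\left( \frac{\partial H}{\partial X_{ij}} \right) = -\left( X^{-T} \right)_{ij}$ can be sanity-checked numerically. The sketch below is not part of the original derivation: it assumes the identity $tr(\log X) = \log\det X$ to evaluate $tr(H)$, picks an arbitrary test matrix close to the identity, and compares a central finite difference against $-X^{-T}$. The helper name `trace_H` and the step size `eps` are illustrative choices.

```python
import numpy as np

def trace_H(X):
    # Assumption: tr(H) = -tr(log X) = -log det X, valid for X near the identity.
    sign, logabsdet = np.linalg.slogdet(X)
    return -logabsdet

# Arbitrary non-symmetric test matrix close to the identity.
rng = np.random.default_rng(1)
X = np.eye(3) + 0.05 * rng.standard_normal((3, 3))

# Central finite difference of tr(H) with respect to each entry X_ij.
eps = 1e-6
fd = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        E = np.zeros((3, 3))
        E[i, j] = eps
        fd[i, j] = (trace_H(X + E) - trace_H(X - E)) / (2 * eps)

# Closed form from the Neumann-series argument: -(X^{-T})_ij.
closed = -np.linalg.inv(X).T
err = np.max(np.abs(fd - closed))
print(err)  # should be small if the formula holds
```

If the two agree to within finite-difference error, that supports both the sign and the transpose in the formula.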

Trace of product of matrix and its derivative

This leaves us with the term $2tr \left( H\frac{\partial H}{\partial X_{ij}} \right )$. If we use previous work we see $$ H\frac{\partial H}{\partial X_{ij}} = \left ( -\sum\limits_{p=1}^\infty (-1)^{p+1} p^{-1} (X-I)^{p} \right) \left ( \sum\limits_{k=1}^\infty \sum\limits_{\ell=1}^k (-1)^{k} k^{-1} (X-I)...\frac{\partial (X-I)}{\partial X_{ij}} ...(X-I) \right ) $$

Expanding this, we get $$ H\frac{\partial H}{\partial X_{ij}} = - \sum\limits_{p=1}^\infty \sum\limits_{k=1}^\infty \sum\limits_{\ell=1}^k (-1)^{k+p+1} k^{-1}p^{-1} (X-I)^{p} (X-I)...\frac{\partial (X-I)}{\partial X_{ij}} ...(X-I) $$

Taking the trace $$ tr\left( H\frac{\partial H}{\partial X_{ij}} \right ) = -\sum\limits_{p=1}^\infty \sum\limits_{k=1}^\infty \sum\limits_{\ell=1}^k (-1)^{k+p+1} k^{-1}p^{-1} tr\left((X-I)^{p+k-1}\frac{\partial (X-I)}{\partial X_{ij}} \right ) $$

As before $$ tr\left( H\frac{\partial H}{\partial X_{ij}} \right ) = -\sum\limits_{p=1}^\infty \sum\limits_{k=1}^\infty (-1)^{k+p+1}p^{-1} \left( (X^T-I)^{p+k-1} \right ) _{ij} $$

However, I am not sure how to continue. Any ideas?

1 Answer

Applying a function to a scalar argument $x$ yields a scalar value. Consider the following functions and their derivatives $$\eqalign{ h(x) &= -\log(x) \quad&\implies\quad h' &= -x^{-1} \\ g(x) &= h^2 \quad&\implies\quad g' &= 2hh' = -2hx^{-1} \\ }$$

Applying these functions to a matrix argument $X$ will yield the analogous matrix values $$\eqalign{ H &= h(X),\qquad H' = h'(X) = -X^{-1} \\ G &= g(X),\qquad G' = g'(X) = -2HX^{-1}\\ }$$

The differential of the trace of any analytic matrix function (like $G$) can also be found in the Matrix Cookbook, and is given by $$\eqalign{ d\,{\rm tr}(G) &= (G')^T:dX }$$ where the colon is a convenient product notation for the trace, i.e. $$\eqalign{ A:B = {\rm tr}(A^TB) = {\rm tr}(B^TA) = B:A }$$

Now we are ready to tackle the function in question. $$\eqalign{ u &= {\rm tr}(G) + {\rm tr}(H)^2 \\ du &= d\,{\rm tr}(G) + 2\,{\rm tr}(H)\;d\,{\rm tr}(H) \\ &= \Big(G'+2\,{\rm tr}(H)\,H'\Big)^T:dX \\ &= -\Big(2HX^{-1}+2\,{\rm tr}(H)\,X^{-1}\Big)^T:dX \\ \frac{\partial u}{\partial X} &= -2\Big(HX^{-1}+{\rm tr}(H)\,X^{-1}\Big)^T \\ }$$

Depending on your preferred layout convention, you may want the transpose of this result.
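As a rough numerical check of the final gradient (a sketch added for illustration, not part of the original answer), the closed form can be compared against central finite differences. The code assumes $\|X - I\| < 1$ so that the matrix logarithm can be evaluated with the power series from the question. The function names, the number of series terms, and the test matrix are all arbitrary choices.

```python
import numpy as np

def log_series(X, terms=60):
    # Matrix logarithm via log(X) = sum_k (-1)^(k+1) (X - I)^k / k.
    # Assumption: ||X - I|| < 1, so the truncated series is accurate.
    n = X.shape[0]
    A = X - np.eye(n)
    power = np.eye(n)
    out = np.zeros((n, n))
    for k in range(1, terms + 1):
        power = power @ A
        out += ((-1) ** (k + 1) / k) * power
    return out

def u(X):
    # u = tr(H^2) + tr(H)^2 with H = -log(X).
    H = -log_series(X)
    return np.trace(H @ H) + np.trace(H) ** 2

def grad_closed_form(X):
    # du/dX = -2 (H X^{-1} + tr(H) X^{-1})^T, the result derived above.
    H = -log_series(X)
    Xinv = np.linalg.inv(X)
    return -2.0 * (H @ Xinv + np.trace(H) * Xinv).T

def grad_finite_diff(X, eps=1e-6):
    # Entry-by-entry central differences of u.
    n = X.shape[0]
    G = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n))
            E[i, j] = eps
            G[i, j] = (u(X + E) - u(X - E)) / (2 * eps)
    return G

# Arbitrary non-symmetric test matrix close to the identity.
rng = np.random.default_rng(0)
X = np.eye(3) + 0.05 * rng.standard_normal((3, 3))
max_err = np.max(np.abs(grad_closed_form(X) - grad_finite_diff(X)))
print(max_err)  # should be small if the closed form is right
```

Using a non-symmetric $X$ matters here: for symmetric $X$ the transpose in the result would be invisible, so the check could not distinguish the two layout conventions.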

greg
  • Your statement "Applying these functions to a matrix argument $X$, will yield the analogous matrix value" is only true if the matrix and its derivative commute. Is this problem negated by the fact that everything is enveloped by a trace? – InfiniteElementMethod Oct 29 '20 at 18:52
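
One way to see that the trace does remove the commutation issue is to finish the double series from the question. The following is only a sketch and assumes $\|X-I\|<1$, so that both series converge absolutely and the double sum can be factored. Inside the trace, cyclicity has already collapsed the sum over $\ell$, and what remains is a product of two familiar series in $X^T - I$:

$$\eqalign{ {\rm tr}\left( H\frac{\partial H}{\partial X_{ij}} \right) &= -\sum\limits_{p=1}^\infty \sum\limits_{k=1}^\infty (-1)^{k+p+1}p^{-1} \left( (X^T-I)^{p+k-1} \right)_{ij} \\ &= \left[ \left( \sum\limits_{p=1}^\infty (-1)^{p} p^{-1} (X^T-I)^{p} \right) \left( \sum\limits_{k=1}^\infty (-1)^{k} (X^T-I)^{k-1} \right) \right]_{ij} \\ &= \left[ \left( -\log X^T \right) \left( -X^{-T} \right) \right]_{ij} = -\left( H^T X^{-T} \right)_{ij} = -\left( (HX^{-1})^T \right)_{ij} \\ }$$

The first factor is the logarithm series for $-\log(X^T) = H^T$, the second is the Neumann series for $-X^{-T}$, and the last equality uses the fact that $H$ and $X^{-1}$ commute because both are functions of $X$. Combined with ${\rm tr}\left( \frac{\partial H}{\partial X_{ij}} \right) = -\left( X^{-T} \right)_{ij}$, this gives $\frac{\partial u}{\partial X_{ij}} = -2\left( (HX^{-1})^T \right)_{ij} - 2\,{\rm tr}(H)\left( X^{-T} \right)_{ij}$, which agrees with the answer above.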