
I don't see how to differentiate $ABA^T$ with respect to $A$ where $A$ and $B$ are $n\times n$ matrices. I know it's going to be a rank-4 tensor, but what exactly will it be?

The inspiration for this comes from having to find the derivative of the covariance matrix $\operatorname{Cov}(TX)$ with respect to $T$.

Here is what I've done so far.

I was working with the squared Bures distance $d_H^2(Cov(TX),\Sigma_v) = tr\big(Cov(TX) + \Sigma_v - 2(Cov(TX)^{1/2}\Sigma_v Cov(TX)^{1/2})^{1/2}\big)$.

First I computed the derivative of $d_H^2(A,B)$ for positive matrices $A$ and $B$, which turned out to be $tr(I-A_{\#}B^{-1})$. Here we define $A_{\#}B=(AB^{-1})^{1/2}B.$

So now I was using the chain rule to compute the derivative of $d_H^2(Cov(TX),\Sigma_v)$. But in order to do that, I need to differentiate $Cov(TX)$ w.r.t. $T$. That's where I'm stuck.

=========

Ultimately, I'm looking to find the gradient with respect to $T$ of $$ \lambda \left\|TX-X\right\|^2 + \left\|T\right\|_{HS} + d_H^2(Cov(TX),\Sigma_v) $$ and calculate its roots.

Assuming I didn't make any mistakes, the derivatives of the first two terms are $2\lambda(TX-X)X^T$ and $T/\left\|T\right\|_{HS}$ respectively -- feel free to correct me if I'm wrong here. So the last term is the one I can't differentiate.

Yannik

3 Answers


In Einstein notation,$$\begin{align}\frac{\partial(ABA^T)_{ij}}{\partial A_{kl}}&=\frac{\partial}{\partial A_{kl}}A_{im}B_{mn}A_{jn}\\&=\delta_{ik}\delta_{lm}B_{mn}A_{jn}+A_{im}B_{mn}\delta_{jk}\delta_{ln}\\&=\delta_{ik}(AB^T)_{jl}+\delta_{jk}(AB)_{il}.\end{align}$$
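As a sanity check, the index formula above can be compared against central finite differences. The sketch below is plain Python; the helper names (`matmul`, `analytic`, `numeric`, `max_abs_diff`) are made up here for illustration. Because $ABA^T$ is quadratic in $A$, the central difference should agree with the derivative up to rounding error.

```python
import random

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def transpose(P):
    return [list(col) for col in zip(*P)]

def f(A, B):
    """The map A -> A B A^T."""
    return matmul(matmul(A, B), transpose(A))

def analytic(A, B):
    """D[i][j][k][l] = delta_ik (A B^T)_jl + delta_jk (A B)_il."""
    n = len(A)
    AB, ABt = matmul(A, B), matmul(A, transpose(B))
    return [[[[(i == k) * ABt[j][l] + (j == k) * AB[i][l]
               for l in range(n)] for k in range(n)]
             for j in range(n)] for i in range(n)]

def numeric(A, B, h=1e-3):
    """Central differences of (A B A^T)_ij with respect to A_kl."""
    n = len(A)
    D = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for l in range(n):
            Ap = [row[:] for row in A]
            Am = [row[:] for row in A]
            Ap[k][l] += h
            Am[k][l] -= h
            Fp, Fm = f(Ap, B), f(Am, B)
            for i in range(n):
                for j in range(n):
                    D[i][j][k][l] = (Fp[i][j] - Fm[i][j]) / (2 * h)
    return D

def max_abs_diff(D1, D2):
    n = len(D1)
    return max(abs(D1[i][j][k][l] - D2[i][j][k][l])
               for i in range(n) for j in range(n)
               for k in range(n) for l in range(n))

random.seed(0)
n = 3
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
err = max_abs_diff(analytic(A, B), numeric(A, B))
print(err)  # expected to be at rounding-error level
```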

J.G.

The derivative of the expression in the title can be computed using vectorization.

For a product of three matrices we have:

\begin{equation} \begin{split} M & = XYZ \\ \implies \text{vec}(M) & = \text{vec}(XYZ) \\ & = (Z^TY^T \otimes I)\text{vec}(X) \\ & = (Z^T \otimes X)\text{vec}(Y) \\ & = (I \otimes XY)\text{vec}(Z) \end{split} \end{equation}

Then, for our expression we have:

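\begin{equation} \begin{split} \text{Let} \quad C & = ABA^T \\ \implies dC & = dA\,BA^T + AB\,dA^T \\ \implies d(\text{vec}(C)) & = (AB^T \otimes I)\,d(\text{vec}(A)) + (I \otimes AB)\,d(\text{vec}(A^T)) \\ & = \big((AB^T \otimes I) + (I \otimes AB)K\big)\,d(\text{vec}(A)) \\ \implies \frac{d(\text{vec}(C))}{d(\text{vec}(A))} & = (AB^T \otimes I) + (I \otimes AB)K \end{split} \end{equation} Here $K$ is the commutation matrix, defined by $K\,\text{vec}(A) = \text{vec}(A^T)$; the second term accounts for the dependence of $A^T$ on $A$. Similarly, we can differentiate w.r.t. $B$: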

$$ \frac{d(\text{vec}(C))}{d(\text{vec}(B))} = (A \otimes A)$$
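The Kronecker-product Jacobians can be checked numerically. The following is a minimal sketch in plain Python, assuming column-major `vec`; all helper names, and the commutation matrix `K` with $K\,\text{vec}(M)=\text{vec}(M^T)$, are introduced here for illustration only. It checks $\text{vec}(ABA^T) = (A\otimes A)\,\text{vec}(B)$ and the Jacobian with respect to $\text{vec}(A)$ when the dependence of $A^T$ on $A$ is included, namely $(AB^T\otimes I) + (I\otimes AB)K$.

```python
import random

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(P):
    return [list(col) for col in zip(*P)]

def add(P, Q, s=1.0):
    """Entrywise P + s * Q."""
    return [[p + s * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def vec(P):
    """Stack the columns of P into one flat list (column-major order)."""
    return [x for col in zip(*P) for x in col]

def kron(P, Q):
    """Kronecker product: block (i, j) of the result is P[i][j] * Q."""
    return [[p * q for p in rp for q in rq] for rp in P for rq in Q]

def commutation(n):
    """K such that K vec(M) = vec(M^T) for an n x n matrix M."""
    K = [[0.0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            # vec(M^T)[i + j*n] = M[j][i] = vec(M)[j + i*n]
            K[i + j * n][j + i * n] = 1.0
    return K

def f(A, B):
    """The map (A, B) -> A B A^T."""
    return matmul(matmul(A, B), transpose(A))

def max_err(u, v):
    return max(abs(x - y) for x, y in zip(u, v))

def rand_matrix(n):
    return [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]

random.seed(1)
n = 3
A, B, E = rand_matrix(n), rand_matrix(n), rand_matrix(n)

# C is linear in B, so vec(C) = (A kron A) vec(B) holds exactly.
err_B = max_err(vec(f(A, B)), matvec(kron(A, A), vec(B)))

# Jacobian with respect to vec(A), including the A^T factor via K.
I = identity(n)
J_A = add(kron(matmul(A, transpose(B)), I),
          matmul(kron(I, matmul(A, B)), commutation(n)))

# f is quadratic in A, so (f(A+E) - f(A-E)) / 2 = E B A^T + A B E^T exactly.
diff = [(p - m) / 2
        for p, m in zip(vec(f(add(A, E), B)), vec(f(add(A, E, -1.0), B)))]
err_A = max_err(diff, matvec(J_A, vec(E)))
print(err_B, err_A)  # both expected to be at rounding-error level
```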


Let $J$ be the all-ones matrix and $$\eqalign{ C &=(I-\tfrac 1nJ) = C^T \qquad\qquad\big({\rm Centering\,Matrix}\big) \\ B &= \Sigma_v \\ A &= {\rm Cov}(TX) \\ &= \left(\tfrac 1{n-1}\right)(TX)^TC\,(TX) \\ }$$ From this post, the Bures distance function and its differential can be simplified to $$\eqalign{ \beta(A,B) &= {\rm Tr}\Big(A+B - 2(BA)^{1/2} \Big) \\ d\beta &=\Big(I - (BA)^{-1/2}B\Big):dA \\ }$$ Now change the differentiation variable from $\;dA\to dT$. $$\eqalign{ d\beta &= \Big(I - (BA)^{-1/2}B\Big):\left(\tfrac 2{n-1}\right){\rm Sym}(X^TT^TC\,dT\,X) \\ &= \left(\tfrac 2{n-1}\right)\Big(I - (BA)^{-1/2}B\Big):(X^TT^TC\,dT\,X) \\ &= \left(\tfrac 2{n-1}\right)CTX\Big(I - (BA)^{-1/2}B\Big)X^T:dT \\ \frac{\partial\beta}{\partial T} &= \left(\tfrac 2{n-1}\right)CTX\Big(I - (BA)^{-1/2}B\Big)X^T \\ }$$


In the above derivation, the function $$\eqalign{ {\rm Sym}(M) = \tfrac 12(M+M^T) \\ }$$ was utilized, as well as the trace/Frobenius product $$\eqalign{ P:M = {\rm Tr}(P^TM) = {\rm Tr}(M^TP) = M:P \\ }$$ These have the following interaction $$\eqalign{ P:{\rm Sym}(M) = {\rm Sym}(P):M \\ }$$
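For readers who want the intermediate step, the change of variable $dA\to dT$ can be spelled out as follows. This is a sketch that assumes $X$ is held constant and uses only the definitions of $A$, $C$ and ${\rm Sym}$ given in this answer, together with $C=C^T$: $$\eqalign{ dA &= \left(\tfrac 1{n-1}\right)\Big((dT\,X)^TC\,(TX) + (TX)^TC\,(dT\,X)\Big) \\ &= \left(\tfrac 1{n-1}\right)\Big((X^TT^TC\,dT\,X)^T + X^TT^TC\,dT\,X\Big) \\ &= \left(\tfrac 2{n-1}\right){\rm Sym}(X^TT^TC\,dT\,X) \\ }$$ The final rearrangement of the Frobenius product follows from the cyclic property of the trace: for any $P$ and $M$ of compatible sizes, $$\eqalign{ P:(M\,dT\,X) = {\rm Tr}(P^TM\,dT\,X) = {\rm Tr}(XP^TM\,dT) = (M^TPX^T):dT \\ }$$ applied with $M = X^TT^TC$, so that $M^T = CTX$.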

greg
  • How come you don't have $Sym$ in the second line when calculating $d\beta$? – Yannik Sep 18 '20 at 04:33
  • Because the matrix on the LHS is symmetric, i.e. ${\rm Sym}(M)=M$. – greg Sep 18 '20 at 05:49
  • Thank you. This helps a lot. Out of curiosity, how would I get this result using the answer from @MathLearner where he has $A\otimes A$? – Yannik Sep 21 '20 at 22:04