
In the paper *Log-Euclidean metrics for fast and simple calculus on diffusion tensors*, the geodesic distance between SPD matrices $A,B$ is defined as $$d(A,B)=\|\log A- \log B\|_F,$$ where $\|\cdot\|_F$ is the Frobenius norm. I did not find more information on the other aspects of the Riemannian structure.

I am interested in the gradient (on the Riemannian manifold) of the distance function $$ f(X) = d(X,A_0). $$ I am not familiar with Riemannian geometry beyond the basics, e.g. the defining property $\langle \operatorname{grad} f,X\rangle = Xf$ for all smooth vector fields $X$.

How can I derive $\operatorname{grad} f$?

2 Answers


$ \def\op#1{\operatorname{#1}} \def\bbR#1{{\mathbb R}^{#1}} \def\e{\varepsilon} \def\o{{\tt1}} \def\f{\frac{\tt1}{f}} \def\p{\partial} \def\A{{\cal A}}\def\L{{\cal L}} \def\LR#1{\left(#1\right)} \def\BR#1{\Big(#1\Big)} \def\trace#1{\op{Tr}\LR{#1}} \def\Diag#1{\op{Diag}\LR{#1}} \def\diag#1{\op{diag}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} \def\mb#1{\left[\begin{array}{c|c}#1\end{array}\right]} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\fracLR#1#2{\LR{\frac{#1}{#2}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} \def\z#1{\op{\zeta}\!\LR{#1}} \def\smA{{\small A}} \def\smB{{\small B}} $Introduce the matrix variables $$\eqalign{ A &= X,\quad &L = \log(A), \quad &L_0 = \log(A_0) \\ }$$ Since $A$ is positive definite it can be diagonalized $$A=QBQ^T,\quad B=\Diag{b},\quad Q^TQ=I$$

First calculate the differential of the distance function $f$ $$\eqalign{ f^2 &= \|L-L_0\|^2_F \\ &= \LR{L-L_0}:\LR{L-L_0} \\ 2f\;df &= 2\LR{L-L_0}:dL \\ df &= \f\LR{L-L_0}:dL \\ df &= G:\c{dL} \qiq G \doteq \fracLR{L-L_0}{f} \\ }$$ Now invoke the Daleckii-Krein theorem $\:\big(\odot$ denotes the Hadamard product$\big)$ $$\eqalign{ \c{dL} &= Q\BR{R\odot\LR{Q^TdA\,Q}}Q^T \\ }$$ Substituting this into the previous differential leads to the desired gradient $$\eqalign{ df &= G:\c{Q\BR{R\odot\LR{Q^TdA\,Q}}Q^T} \\ &= Q\BR{R\odot\LR{Q^TG\,Q}}Q^T:dA \\ \grad{f}{A} &= Q\BR{R\odot\LR{Q^TG\,Q}}Q^T \\ }$$ The final task is to evaluate the $R$ matrix which lies at the heart of the theorem. This can be done using the log function and its derivative $\left\{\log(x),\frac{1}{x}\right\}$ evaluated at $B$, an all-ones matrix $J$, and $\z{X}$ which is an elementwise $\c{\sf zero\:indicator}$ function $$\eqalign{ \z{X}_{ij} &= \begin{cases} \o\qquad {\rm if}\;X_{ij}=0 \\ 0\qquad {\rm otherwise} \\ \end{cases} \\ Z &= \z{BJ-JB},\qquad L_\smB = \log(B),\qquad L_\smB' = B^{-1} \\ R &= {\frac{L_\smB J-JL_\smB+ZL_\smB'}{BJ-JB+Z}} \qquad \big({\rm Hadamard\;division}\big) \\\\ }$$ or in terms of the components of the $b$ vector $$\eqalign{ R_{jk} &= \begin{cases} {\Large\frac{\log(b_j)\,-\,\log(b_k)}{b_j\,-\,b_k}} \quad{\rm if}\; b_j\ne b_k \\ \\ \qquad\quad {\Large\frac{\o}{b_k}} \qquad\qquad {\rm otherwise} \\ \end{cases}\\ \\ }$$
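As a sanity check on the formula above, here is a minimal numerical sketch (the matrices `A`, `A0` and the perturbation `H` are illustrative random instances) that builds the $R$ matrix from the eigendecomposition and compares the resulting gradient against a finite-difference directional derivative:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(0)
n = 4

def rand_spd():
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)        # random SPD matrix

A, A0 = rand_spd(), rand_spd()
L, L0 = logm(A), logm(A0)
f = np.linalg.norm(L - L0, 'fro')
G = (L - L0) / f

b, Q = np.linalg.eigh(A)                  # A = Q Diag(b) Q^T
# Loewner matrix R of the log function at the eigenvalues b (Daleckii-Krein)
R = np.empty((n, n))
for j in range(n):
    for k in range(n):
        R[j, k] = (1.0 / b[k] if np.isclose(b[j], b[k])
                   else (np.log(b[j]) - np.log(b[k])) / (b[j] - b[k]))

grad = Q @ (R * (Q.T @ G @ Q)) @ Q.T      # df/dA from the derivation above

# finite-difference directional derivative along a random symmetric H
S = rng.standard_normal((n, n)); H = (S + S.T) / 2
eps = 1e-6
fp = np.linalg.norm(logm(A + eps * H) - L0, 'fro')
fm = np.linalg.norm(logm(A - eps * H) - L0, 'fro')
print(abs((fp - fm) / (2 * eps) - np.sum(grad * H)))  # close to zero
```

The `np.sum(grad * H)` term is the Frobenius product $\grad{f}{A}:H$, so agreement with the centered difference confirms both the Daleckii-Krein step and the final gradient.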


Note that some of the steps above use the Frobenius product $(A:B)$, which is an extremely useful product notation for the trace function $$\eqalign{ A:B &= \trace{A^TB} \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \\ A:A &= \|A\|^2_F \qquad \big({\rm hence\;the\;name}\big) \\ }$$ Note that the Frobenius and Hadamard products commute $$\eqalign{ A:(B\odot C) \;=\; (A\odot B):C \;=\; \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij}C_{ij} \\ }$$ There are also easily derived rules for rearranging product terms, e.g. $$\eqalign{ A\odot B &= B\odot A \\ A:B &= B:A \\ (AC^T):B &= A:(BC) = (B^TA):C \\ }$$
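These product identities are easy to confirm numerically; a quick sketch with random matrices (shapes chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 4)) for _ in range(3))
D = rng.standard_normal((4, 4))
frob = lambda X, Y: np.trace(X.T @ Y)          # Frobenius product X:Y

assert np.isclose(frob(A, A), np.linalg.norm(A, 'fro') ** 2)  # A:A = ||A||_F^2
assert np.isclose(frob(A, B * C), frob(A * B, C))             # A:(B o C) = (A o B):C
assert np.isclose(frob(A @ D.T, B), frob(A, B @ D))           # (AC^T):B = A:(BC)
print("all identities hold")
```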

greg

This can be derived step by step using the chain rule.

  1. Frobenius norm. Let's deal with the squared norm first; taking the square root afterwards is easy. $$ d(A,B)^2=\|A-B\|^2_F = tr((A-B)^T(A-B)) = \sum_{ij} (A_{ij}-B_{ij})^2 $$ So if one of the matrices $A$ is a function of a scalar $t$ $$ \frac{\partial d(A(t),B)^2}{\partial t} = 2\sum_{ij} (A_{ij}-B_{ij})\frac{\partial A_{ij}}{\partial t}\\ =2tr((A-B)^T\frac{\partial A}{\partial t}) $$ One specific case is $$ \frac{\partial d(A,B)^2}{\partial A_{ij}} = 2(A_{ij}-B_{ij}) $$ So we can write $$\frac{\partial d(A,B)^2}{\partial A}=2(A-B)$$
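A quick finite-difference sketch checking $\frac{\partial d(A,B)^2}{\partial A}=2(A-B)$ entry by entry (random $3\times3$ matrices, names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
d2 = lambda X: np.sum((X - B) ** 2)       # squared Frobenius distance

grad = 2 * (A - B)                        # claimed gradient
eps, num = 1e-6, np.zeros_like(A)
for i in range(3):
    for j in range(3):
        E = np.zeros((3, 3)); E[i, j] = eps   # perturb a single entry
        num[i, j] = (d2(A + E) - d2(A - E)) / (2 * eps)
print(np.max(np.abs(num - grad)))         # close to zero
```

Since $d^2$ is quadratic in the entries of $A$, the centered difference is exact up to floating-point roundoff.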

Getting rid of the square: since $d = (d^2)^{1/2}$, $$ \frac{\partial d(A(t),B)}{\partial t} = \frac{1}{2}\,(d^2)^{-1/2}\cdot 2\,tr\left((A-B)^T\frac{\partial A}{\partial t}\right) = d^{-1}\,tr\left((A-B)^T\frac{\partial A}{\partial t}\right) $$

  2. Matrix logarithm (see Derivative of matrix logarithm) $$ \frac{\partial \log(X(t))}{\partial t} = X(t)^{-1} \frac{\partial X(t)}{\partial t} $$ (as noted in the comments, this identity only holds when $X$ commutes with $\frac{\partial X}{\partial t}$). We can think of the entry $X_{ij}$ as a parameter of $X$; then we get $$\frac{\partial X(t)}{\partial X_{ij}}=E_{ij},$$ where $E_{ij}$ is the matrix whose $(i,j)$ entry is $1$ and all other entries are $0$.

  3. Using the chain rule

$$ \frac{\partial d(X,A_0)^2}{\partial X_{ij}} =\partial_{X_{ij}} \|\log X -\log A_0\|^2_F \\ = 2tr\left((\log X -\log A_0)^T \frac{\partial \log X}{\partial X}\frac{\partial X} {\partial X_{ij}}\right)\\ =2 tr\left((\log X -\log A_0)^T (X^{-1}E_{ij})\right) $$

Note that the effect of $E_{ij}$ inside the trace operator is quite special. Since $$ tr(A^TB) = \sum_{ij} A_{ij}B_{ij}\\ tr(M^TE_{kl}) = \sum_{ij} M_{ij}(E_{kl})_{ij} = M_{kl}, $$ $E_{ij}$ works as a delta function, picking out a single entry.
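A tiny sketch of this delta property (the matrix `M` is an arbitrary example):

```python
import numpy as np

M = np.arange(12.0).reshape(3, 4)
k, l = 1, 2
E_kl = np.zeros((3, 4)); E_kl[k, l] = 1.0
# tr(M^T E_kl) picks out the single entry M[k, l]
print(np.trace(M.T @ E_kl), M[k, l])   # both equal 6.0
```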

Thus, collecting the elementwise derivatives into a matrix, $$ \left(\frac{\partial d(X,A_0)^2}{\partial X}\right)_{ij} = 2\, tr((\log X -\log A_0)^T (X^{-1}E_{ij}))\\ \frac{\partial d(X,A_0)^2}{\partial X} = 2\, X^{-1}(\log X -\log A_0)\\ \frac{\partial d(X,A_0)}{\partial X} = d^{-1}\, X^{-1}(\log X -\log A_0) $$

    The first formula in Step #2 is not correct, unless X commutes with $\left(\frac{dX}{dt}\right);$ Have a look at the first part of this post for the derivative of the logarithm when the matrices don't commute. – greg May 15 '22 at 19:56
  • Thanks for your suggestion! I'll integrate that. – Binxu Wang 王彬旭 May 15 '22 at 20:45
  • In case you want to double-check your algebra, I got $$\eqalign{ f &= d(X,A_0) \\ W &= (X+I)^{-1}(X-I) \\ \frac{\partial f}{\partial X} &= \frac1f\Bigg[X^{-1}\log(X) - \sum_{k=0}^\infty\sum_{j=1}^{2k+1} \left(\frac{W^{j-1}(I-W)\,\log(A_0)\,(I-W)W^{2k+1-j}}{2k+1}\right)\Bigg] \\ }$$

    – greg May 16 '22 at 15:47