
How can one calculate the gradient $\nabla_{\bf A} \left( {\bf x}^\top {\bf A}^{1/2} {\bf x} \right)$, where $\bf x$ is an $N \times 1$ column vector and $\bf A$ is an $N \times N$ symmetric positive definite matrix?

The difficulty is that the quadratic form involves ${\bf x}^\top {\bf A}^{1/2} {\bf x}$ rather than ${\bf x}^\top {\bf A} {\bf x}$.
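For concreteness, here is a minimal numerical sketch (assuming NumPy and SciPy, with `scipy.linalg.sqrtm` for the principal matrix square root) that evaluates $\phi(\mathbf{A}) = \mathbf{x}^\top\mathbf{A}^{1/2}\mathbf{x}$ and builds an entry-wise finite-difference gradient; any closed-form expression for $\nabla_{\mathbf{A}}\phi$ can be checked against it. The helper names (`phi`, `fd_gradient`) are made up for illustration.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(0)
N = 4
x = rng.standard_normal((N, 1))
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)          # symmetric positive definite by construction

def phi(A):
    """phi(A) = x^T A^{1/2} x, using the principal matrix square root."""
    return (x.T @ sqrtm(A).real @ x).item()

def fd_gradient(A, h=1e-6):
    """Entry-wise central-difference approximation of d phi / d A_ij."""
    G = np.zeros_like(A)
    for i in range(N):
        for j in range(N):
            E = np.zeros((N, N))
            E[i, j] = h
            G[i, j] = (phi(A + E) - phi(A - E)) / (2 * h)
    return G

print(phi(A))
print(fd_gradient(A))
```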


Motivation

I want to calculate the gradient of the vector Gaussian density $\mathcal{N} \left( \mathbf{y} \mid \mathbf{A}^{\frac{1}{2}}\mathbf{x}, \mathbf{I} \right)$ with respect to $\mathbf{A}$, where $\mathbf{I}$ is the identity matrix and

$$ \mathcal{N} \left( \mathbf{y} \mid \mathbf{A}^{\frac{1}{2}}\mathbf{x},\mathbf{I} \right) = (2\pi)^{-\frac{N}{2}} \exp \left( -\tfrac{1}{2}\left\|\mathbf{y}-\mathbf{A}^{\frac{1}{2}}\mathbf{x} \right\|_2^2 \right) $$

where $\|\cdot\|_2$ denotes the $\ell_2$ norm. The difficult part is computing the term

$$\nabla_{\mathbf{A}} \left( \mathbf{y}^T\mathbf{A}^{\frac{1}{2}}\mathbf{x} \right)$$

This term can be handled in the same way as $\nabla_{\mathbf{A}}\left( \mathbf{x}^T\mathbf{A}^{\frac{1}{2}}\mathbf{x} \right)$, and I believe the gradient exists.

1 Answer


$ \def\o{{\tt1}}\def\p{\partial} \def\LR#1{\left(#1\right)} \def\BR#1{\Big(#1\Big)} \def\bR#1{\big(#1\big)} \def\vc#1{\operatorname{vec}\LR{#1}} \def\dvc#1{\operatorname{unvec}\LR{#1}} \def\trace#1{\operatorname{Tr}\LR{#1}} \def\qiq{\quad\implies\quad} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\m#1{\left[\begin{array}{r}#1\end{array}\right]} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\gradLR#1#2{\LR{\grad{#1}{#2}}} $For typing convenience, define the variables
$$\eqalign{
Y &= xx^T &\qiq y = \vc{Y} \,\doteq\, \LR{x\otimes I}x \\
B &= A^{1/2} &\qiq b = \vc{B} \\
}$$
and the Frobenius product, which is a concise notation for the trace
$$\eqalign{
Y:Z &= \sum_{i=1}^n\sum_{j=1}^n Y_{ij}Z_{ij} \;=\; \trace{Y^TZ} \\
Y:Y &= \|Y\|^2_F \\
}$$
The properties of the underlying trace function allow the terms in a Frobenius product to be rearranged in many different ways, e.g.
$$\eqalign{
X:Y &= Y:X \\
X:Y &= X^T:Y^T \\
Z:\LR{XY} &= \LR{ZY^T}:X \\
&= \LR{X^TZ}:Y \\
}$$

Exploring the relationship between the symmetric matrices $A\,{\rm and}\,B$:
$$\eqalign{
A &= B^2 \\
dA &= B\,dB + dB\,B \\
&= B\,dB\,I + I\,dB\,B \\
da &= \LR{I\otimes B+B\otimes I} db \\
&= \LR{B\oplus B} db \qquad \big\{{\rm Kronecker\:Sum}\big\} \\
db &= \LR{B\oplus B}^{-1}da \\
}$$

Use the above notation to write the objective function. Then calculate its differential and gradient.
$$\eqalign{
\phi &= Y:B \\
d\phi &= Y:dB \\
&= y:db \\
&= y:\LR{B\oplus B}^{-1}da \\
&= \LR{B\oplus B}^{-1}y:da \\
\grad{\phi}{a} &= \LR{B\oplus B}^{-1}y \\
}$$

This is the vectorized form of the gradient, but there's a clever formula involving the identity matrix and Kronecker products which recovers the matrix form
$$\eqalign{
\grad{\phi}{A} &= \LR{\vc{I}^T\otimes I}\LR{I\otimes\gradLR{\phi}{a}} \\
}$$
which can be written entirely in terms of $(I,A,x)$ as
$$\eqalign{
\grad{\phi}{A} &= \LR{\vc{I}^T\otimes I}\LR{I\otimes\LR{\LR{I\otimes A^{1/2}+A^{1/2}\otimes I}^{-1}\LR{x\otimes I}x}} \\
}$$
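As a sanity check, here is a short numerical sketch (again assuming NumPy/SciPy; variable names mirror the notation above) that forms $B=A^{1/2}$, solves $(B\oplus B)\,g = \operatorname{vec}(xx^T)$, reshapes $g$ back into matrix form, and compares $\langle\nabla_A\phi,\,D\rangle$ against a directional finite difference of $\phi$ along a random symmetric perturbation $D$.

```python
import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
N = 5
x = rng.standard_normal((N, 1))
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)                   # symmetric positive definite

B = sqrtm(A).real                             # B = A^{1/2}
I = np.eye(N)
kron_sum = np.kron(I, B) + np.kron(B, I)      # B (+) B, the Kronecker sum

y = (x @ x.T).reshape(-1, order="F")          # y = vec(x x^T), column-major vec
grad_vec = np.linalg.solve(kron_sum, y)       # (B (+) B)^{-1} y
grad_A = grad_vec.reshape((N, N), order="F")  # unvec: matrix form of the gradient

def phi(A):
    """phi(A) = x^T A^{1/2} x."""
    return (x.T @ sqrtm(A).real @ x).item()

# Directional check: phi(A + h D) - phi(A - h D) ~ 2 h <grad_A, D>
D = rng.standard_normal((N, N))
D = D + D.T                                   # symmetric perturbation keeps A + hD SPD
h = 1e-6
fd = (phi(A + h * D) - phi(A - h * D)) / (2 * h)
print(fd, np.sum(grad_A * D))                 # the two numbers should agree closely
```

The column-major (`order="F"`) reshapes are what make the NumPy code match the usual $\operatorname{vec}$ convention, under which $\operatorname{vec}(AXB) = (B^T\otimes A)\operatorname{vec}(X)$ and hence $da = (B\oplus B)\,db$ as derived above.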

greg