Consider
$$\mbox{Tr} (X^TP^TPX)$$
where $X$ and $P$ are real matrices. What is the best way to approach the calculation of its derivative with respect to $P$?
Using Einstein's summation convention, we have, in component form
$$\mbox{Tr} (X^T P^T P X) = \mbox{Tr} ((PX)^T P X) = (PX)_{ij} (PX)_{ij} = P_{ik}X_{kj}P_{il}X_{lj}$$
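Therefore, since $\partial P_{ik} / \partial P_{ab} = \delta_{ai}\delta_{bk}$, the product rule gives
$$\frac{\partial}{\partial P_{ab}} \mbox{Tr} (X^T P^T P X) = \delta_{ai}\delta_{bk}X_{kj}P_{il}X_{lj} + P_{ik}X_{kj}\delta_{ai}\delta_{bl}X_{lj} = X_{bj}P_{al}X_{lj} + P_{ak}X_{kj}X_{bj} = 2(PXX^T)_{ab}$$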
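As a sanity check on the index computation above, here is a minimal NumPy sketch that evaluates the two product-rule terms with `einsum` and compares them against a central finite difference. The helper names (`phi`, `grad_components`, `grad_finite_diff`) and the matrix sizes are illustrative choices, not part of the original answer.

```python
import numpy as np

def phi(P, X):
    """Tr(X^T P^T P X), i.e. the squared Frobenius norm of P X."""
    PX = P @ X
    return np.sum(PX * PX)

def grad_components(P, X):
    """Sum of the two product-rule terms, written index by index."""
    term1 = np.einsum("bj,al,lj->ab", X, P, X)  # X_bj P_al X_lj
    term2 = np.einsum("ak,kj,bj->ab", P, X, X)  # P_ak X_kj X_bj
    return term1 + term2

def grad_finite_diff(P, X, eps=1e-6):
    """Central finite difference with respect to each entry P_ab."""
    G = np.zeros_like(P)
    for a in range(P.shape[0]):
        for b in range(P.shape[1]):
            E = np.zeros_like(P)
            E[a, b] = eps
            G[a, b] = (phi(P + E, X) - phi(P - E, X)) / (2 * eps)
    return G

# Arbitrary test sizes (assumption): P is 3x4, X is 4x5.
rng = np.random.default_rng(0)
P = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))

print(np.allclose(grad_components(P, X), 2 * P @ X @ X.T))
print(np.allclose(grad_finite_diff(P, X), 2 * P @ X @ X.T, atol=1e-5))
```

Because the function is quadratic in $P$, the central difference has no truncation error, so any mismatch would come from rounding only.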
Let $X \in \mathcal{M}_{n}(\mathbb{R})$ and let $\varphi \, : \, \mathcal{M}_{n}(\mathbb{R}) \, \longrightarrow \, \mathbb{R}$ be the function such that:
$$ \forall P \in \mathcal{M}_{n}(\mathbb{R}), \, \varphi(P) = \mathrm{Tr}\big( X^{\top} P^{\top} P X \big).$$
The space $\mathcal{M}_{n}(\mathbb{R})$ can be equipped with the following (Frobenius) inner product:
$$ \forall (A,B) \in \mathcal{M}_{n}(\mathbb{R})^{2}, \, \left\langle A,B \right\rangle = \mathrm{Tr}(A^{\top}B) $$
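It follows that $\varphi(P) = \left\langle PX,PX \right\rangle$ for all $P \in \mathcal{M}_{n}(\mathbb{R})$. Expanding by bilinearity and symmetry of the inner product,
$$ \varphi(P+H) = \varphi(P) + 2\left\langle HX,PX \right\rangle + \left\langle HX,HX \right\rangle, $$
and the last term is $O\big( \Vert H \Vert^{2} \big)$. As a consequence, if $\mathrm{D}_{P}\varphi \cdot H$ denotes the differential of $\varphi$ at $P$ applied to the increment $H$, we have: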
$$ \mathrm{D}_{P}\varphi \cdot H = 2\left\langle HX,PX \right\rangle $$
Note that, by cyclicity of the trace: $$ \begin{align*} 2\left\langle HX,PX \right\rangle &= 2\mathrm{Tr}\big( X^{\top}H^{\top}PX \big) \\[2mm] &= 2\mathrm{Tr}\big( PXX^{\top}H^{\top} \big) \\[2mm] &= 2\mathrm{Tr}\big( H^{\top} P X X^{\top} \big) \\[2mm] &= \left\langle H, 2PXX^{\top} \right\rangle \end{align*} $$
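Therefore, since the gradient is the unique matrix satisfying $\mathrm{D}_{P}\varphi \cdot H = \left\langle \nabla \varphi (P), H \right\rangle$ for all $H$,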
$$ \nabla \varphi (P) = 2 P X X^{\top} $$
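The inner-product argument can also be checked numerically. The sketch below (NumPy, with hypothetical helper names `inner`, `phi` and `differential`, and an arbitrary size $n = 4$) verifies that $2\langle HX, PX\rangle = \langle H, 2PXX^{\top}\rangle$ and that what is left of $\varphi(P+H) - \varphi(P)$ after subtracting the differential is exactly $\langle HX, HX\rangle$, which is second order in $H$.

```python
import numpy as np

def inner(A, B):
    """Frobenius inner product <A, B> = Tr(A^T B)."""
    return np.trace(A.T @ B)

def phi(P, X):
    """phi(P) = <PX, PX> = Tr(X^T P^T P X)."""
    return inner(P @ X, P @ X)

def differential(P, X, H):
    """D_P phi . H = 2 <HX, PX>."""
    return 2.0 * inner(H @ X, P @ X)

# Arbitrary square test matrices (assumption: n = 4).
rng = np.random.default_rng(1)
n = 4
X = rng.standard_normal((n, n))
P = rng.standard_normal((n, n))
H = rng.standard_normal((n, n))

grad = 2.0 * P @ X @ X.T
remainder = phi(P + H, X) - phi(P, X) - differential(P, X, H)

# The differential is the inner product of H with the gradient.
print(np.isclose(differential(P, X, H), inner(H, grad)))
# The remainder is the purely quadratic term <HX, HX>.
print(np.isclose(remainder, inner(H @ X, H @ X)))
```

Note that `H` does not need to be small here: for a quadratic function the expansion is exact, not just asymptotic.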
Recall that the differential $\mathrm{D}_{P}\varphi$ is the linear map characterized by
$$ \varphi(P+H) = \varphi(P) + \mathrm{D}_{P}\varphi \cdot H + \mathop{o}\limits_{\Vert H \Vert \to 0}\big( \Vert H \Vert \big) $$
– pitchounet Jul 30 '14 at 21:14
Let $f : \mathbb R^{m \times n} \to \mathbb R$ be defined by
$$f (\mathrm X) = \mbox{tr} (\mathrm A^T \mathrm X^T \mathrm X \mathrm A)$$
where $\mathrm A$ is a fixed real matrix with $n$ rows. The directional derivative of $f$ in the direction of $\mathrm V$ at $\mathrm X$ is
$$\begin{array}{rl} D_{\mathrm V} f (\mathrm X) &= \mbox{tr} (\mathrm A^T \mathrm V^T \mathrm X \mathrm A) + \mbox{tr} (\mathrm A^T \mathrm X^T \mathrm V \mathrm A)\\ &= \mbox{tr} (\mathrm V^T \mathrm X \mathrm A \mathrm A^T ) + \mbox{tr} (\mathrm A\mathrm A^T \mathrm X^T \mathrm V)\\ &= \mbox{tr} (\mathrm V^T \mathrm X \mathrm A \mathrm A^T ) + \mbox{tr} ((\mathrm X \mathrm A \mathrm A^T)^T \mathrm V)\\ &= \langle \mathrm V, \mathrm X \mathrm A \mathrm A^T \rangle + \langle \mathrm X \mathrm A \mathrm A^T, \mathrm V \rangle\\ &= \langle 2 \mathrm X \mathrm A \mathrm A^T, \mathrm V \rangle\end{array}$$
Hence,
$$\nabla f (\mathrm X) = 2 \mathrm X \mathrm A \mathrm A^T$$
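The directional-derivative formula does not require square matrices. As a rough check, the NumPy sketch below uses a rectangular $\mathrm X$ of shape $(m, n)$ and an $\mathrm A$ of shape $(n, p)$, and compares a central-difference estimate of $\frac{d}{dt} f(\mathrm X + t \mathrm V)$ at $t = 0$ with $\langle 2 \mathrm X \mathrm A \mathrm A^T, \mathrm V \rangle$. The sizes, the step `t` and the function names are assumptions made for illustration.

```python
import numpy as np

def f(X, A):
    """tr(A^T X^T X A) for X of shape (m, n) and A of shape (n, p)."""
    return np.trace(A.T @ X.T @ X @ A)

def directional_derivative(X, A, V, t=1e-6):
    """Central-difference estimate of d/dt f(X + t V) at t = 0."""
    return (f(X + t * V, A) - f(X - t * V, A)) / (2 * t)

# Illustrative rectangular sizes (assumption).
rng = np.random.default_rng(2)
m, n, p = 3, 5, 2
X = rng.standard_normal((m, n))
A = rng.standard_normal((n, p))
V = rng.standard_normal((m, n))

grad = 2.0 * X @ A @ A.T        # has the same shape as X
numeric = directional_derivative(X, A, V)
analytic = np.sum(grad * V)     # Frobenius inner product <grad, V>

print(grad.shape == X.shape)
print(np.isclose(numeric, analytic, atol=1e-5))
```

The gradient always has the shape of the variable being differentiated, here $m \times n$, regardless of the number of columns of $\mathrm A$.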