3

Consider

$$\mbox{Tr} (X^TP^TPX)$$

where $X$ and $P$ are real matrices. What is the best way to approach the calculation of its derivative with respect to $P$?

trembik

3 Answers

5

Using Einstein's summation convention, we have, in component form

$$\mbox{Tr} (X^T P^T P X) = \mbox{Tr} ((PX)^T P X) = (PX)_{ij} (PX)_{ij} = P_{ik}X_{kj}P_{il}X_{lj}$$

Therefore, $$\frac{\partial}{\partial P_{ab}} \mbox{Tr} (X^T P^T P X) = \delta_{ai}\delta_{bk}X_{kj}P_{il}X_{lj} + P_{ik}X_{kj}\delta_{ai}\delta_{bl}X_{lj} = 2(PXX^T)_{ab}$$
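The component-wise result can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and arbitrary test shapes; the function names (`trace_objective`, `analytic_gradient`, `finite_difference_gradient`) are illustrative and not part of the answer. It compares $2PXX^T$ against central finite differences taken entry by entry in $P_{ab}$.

```python
import numpy as np

def trace_objective(P, X):
    """Return Tr(X^T P^T P X), i.e. the sum of squared entries of P X."""
    PX = P @ X
    return np.sum(PX * PX)

def analytic_gradient(P, X):
    """Gradient claimed by the derivation: 2 P X X^T."""
    return 2.0 * P @ X @ X.T

def finite_difference_gradient(P, X, eps=1e-6):
    """Central differences in each entry P_ab (hypothetical helper)."""
    G = np.zeros_like(P)
    for a in range(P.shape[0]):
        for b in range(P.shape[1]):
            E = np.zeros_like(P)
            E[a, b] = eps  # perturb only the (a, b) entry
            G[a, b] = (trace_objective(P + E, X)
                       - trace_objective(P - E, X)) / (2 * eps)
    return G

# Arbitrary test data; the shapes only need to make P X well defined.
rng = np.random.default_rng(0)
P = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 5))

max_abs_error = np.max(np.abs(analytic_gradient(P, X)
                              - finite_difference_gradient(P, X)))
print(max_abs_error)
```

Because the objective is quadratic in $P$, the central difference has no truncation error, so the printed discrepancy should be at the level of floating-point rounding.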

  • At the very last step, you swapped the indices on X to match the dimensions and hence got X^T. Is it rigorous to do it this way, just interchanging them? Or is there a more formal way to do this? – trembik Jul 30 '14 at 18:34
  • I swapped the indices on X so that I could interpret the sum as a matrix product. It is indeed rigorous. This is the definition of matrix transpose. –  Jul 31 '14 at 09:19
  • Using the above, if I try to differentiate Tr(P^TX^TXP), I will get 2X^TXP, correct? – trembik Aug 01 '14 at 18:43
2

Let $X \in \mathcal{M}_{n}(\mathbb{R})$ and let $\varphi \, : \, \mathcal{M}_{n}(\mathbb{R}) \, \longrightarrow \, \mathbb{R}$ be such that :

$$ \forall P \in \mathcal{M}_{n}(\mathbb{R}), \, \varphi(P) = \mathrm{Tr}\big( X^{\top} P^{\top} P X \big).$$

The space $\mathcal{M}_{n}(\mathbb{R})$ can be equipped with the following inner product :

$$ \forall (A,B) \in \mathcal{M}_{n}(\mathbb{R})^{2}, \, \left\langle A,B \right\rangle = \mathrm{Tr}(A^{\top}B) $$

It follows that : $\forall P \in \mathcal{M}_{n}(\mathbb{R}), \, \varphi(P) = \left\langle PX,PX \right\rangle$. As a consequence, if $\mathrm{D}_{P}\varphi \cdot H$ denotes the differential of $\varphi$ at $P$ evaluated at the point $H$, we have :

$$ \mathrm{D}_{P}\varphi \cdot H = 2\left\langle HX,PX \right\rangle $$
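To see where this differential comes from, one can expand $\varphi(P+H)$ using the bilinearity and symmetry of the inner product. The following is a short sketch of that step, where $\Vert \cdot \Vert$ is assumed to be the norm induced by $\left\langle \cdot,\cdot \right\rangle$ (a notation not introduced in the answer itself) :

$$ \begin{align*} \varphi(P+H) &= \left\langle (P+H)X,(P+H)X \right\rangle \\[2mm] &= \left\langle PX,PX \right\rangle + 2\left\langle HX,PX \right\rangle + \left\langle HX,HX \right\rangle \\[2mm] &= \varphi(P) + 2\left\langle HX,PX \right\rangle + \Vert HX \Vert^{2} \end{align*} $$

Since $\Vert HX \Vert^{2} \leq \Vert H \Vert^{2} \, \Vert X \Vert^{2}$, the last term is negligible compared to $\Vert H \Vert$ as $H \to 0$, so the part that is linear in $H$ is $2\left\langle HX,PX \right\rangle$.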

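Note that :

$$ \begin{align*} 2\left\langle HX,PX \right\rangle &= 2\mathrm{Tr}\big( X^{\top}H^{\top}PX \big) \\[2mm] &= 2\mathrm{Tr}\big( PXX^{\top}H^{\top} \big) \\[2mm] &= 2\mathrm{Tr}\big( H^{\top} P X X^{\top} \big) \\[2mm] &= \left\langle H, 2 P X X^{\top} \right\rangle \end{align*} $$

Therefore, since the gradient is the matrix $\nabla \varphi (P)$ such that $\mathrm{D}_{P}\varphi \cdot H = \left\langle \nabla \varphi (P), H \right\rangle$ for all $H$,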

$$ \nabla \varphi (P) = 2 P X X^{\top} $$

pitchounet
  • Could you please explain in a bit more detail how you define the differential of phi at P evaluated at H? – trembik Jul 30 '14 at 17:27
  • By definition, $\mathrm{D}_{P}\varphi$ is the linear form on $\mathcal{M}_{n}(\mathbb{R})$ such that :

    $$ \varphi(P+H) = \varphi(P) + \mathrm{D}_{P}\varphi \cdot H + \mathop{o} \limits_{\Vert H \Vert \to 0}\big( \Vert H \Vert \big) $$

    – pitchounet Jul 30 '14 at 21:14
0

Let $f : \mathbb R^{m \times n} \to \mathbb R$ be defined by

$$f (\mathrm X) = \mbox{tr} (\mathrm A^T \mathrm X^T \mathrm X \mathrm A)$$

The directional derivative of $f$ in the direction of $\mathrm V$ at $\mathrm X$ is

$$\begin{array}{rl} D_{\mathrm V} f (\mathrm X) &= \mbox{tr} (\mathrm A^T \mathrm V^T \mathrm X \mathrm A) + \mbox{tr} (\mathrm A^T \mathrm X^T \mathrm V \mathrm A)\\ &= \mbox{tr} (\mathrm V^T \mathrm X \mathrm A \mathrm A^T ) + \mbox{tr} (\mathrm A\mathrm A^T \mathrm X^T \mathrm V)\\ &= \mbox{tr} (\mathrm V^T \mathrm X \mathrm A \mathrm A^T ) + \mbox{tr} ((\mathrm X \mathrm A \mathrm A^T)^T \mathrm V)\\ &= \langle \mathrm V, \mathrm X \mathrm A \mathrm A^T \rangle + \langle \mathrm X \mathrm A \mathrm A^T, \mathrm V \rangle\\ &= \langle 2 \mathrm X \mathrm A \mathrm A^T, \mathrm V \rangle\end{array}$$

Hence,

$$\nabla f (\mathrm X) = 2 \mathrm X \mathrm A \mathrm A^T$$
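The directional-derivative formula can also be checked numerically for rectangular matrices. This is a minimal sketch, assuming NumPy, with arbitrary sizes $m = 3$, $n = 4$ and an $n \times 2$ matrix $\mathrm A$; the helper names are illustrative only. It compares $\langle 2 \mathrm X \mathrm A \mathrm A^T, \mathrm V \rangle$ with a central difference of $f$ along $\mathrm V$.

```python
import numpy as np

def f(X, A):
    """Return tr(A^T X^T X A)."""
    XA = X @ A
    return np.trace(XA.T @ XA)

def directional_derivative(X, A, V):
    """Frobenius inner product <2 X A A^T, V> (hypothetical helper)."""
    return np.sum((2.0 * X @ A @ A.T) * V)

# Arbitrary test data: X and V are m x n, A is n x p.
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4))
A = rng.standard_normal((4, 2))
V = rng.standard_normal((3, 4))

# Central difference of t -> f(X + t V) at t = 0.
t = 1e-6
numeric = (f(X + t * V, A) - f(X - t * V, A)) / (2 * t)
analytic = directional_derivative(X, A, V)
print(numeric, analytic)
```

The two printed values should agree up to floating-point rounding, since $f$ is quadratic in $\mathrm X$.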