4

I am trying to derive the gradient of the function $f(X) = AXZ + XZX^TXZ$, where $A, X, Z \in \mathbb{R}^{n \times n}$, with respect to the matrix $X$. I read the post "Matrix-by-matrix derivative formula" about matrix derivatives, but I am not able to follow it. In my case $\frac{\partial f(X)}{\partial X}$ should be a tensor, but if I use the formula given in that post I get a matrix. How should I proceed to get the partial derivative?

newbie
  • 81
  • Yes, you are right that you would end up with a fourth-order tensor. You can instead vectorize and use the Kronecker product. – user550103 Apr 02 '19 at 18:29
  • Just out of curiosity, where does this second term come from? And in what field does it make sense? – P. Quinton Apr 02 '19 at 18:30
  • @P.Quinton Actually, the above function is the gradient of some other complex function. – newbie Apr 02 '19 at 20:20

1 Answer

6

Let \begin{align} F := f(X) = AXZ + XZX^TXZ \ . \end{align}

Now take the differential, vectorize it, and read off the gradient. Using the identity ${\rm vec}\left(B \, C \, D\right) = \left(D^T \otimes B\right) {\rm vec}\left(C\right)$ on each term:
\begin{align}
dF &= A \ dX \ Z + dX \ Z \ X^T \ X \ Z + X \ Z \ dX^T \ X \ Z + X \ Z \ X^T \ dX \ Z \\
\Longleftrightarrow \ {\rm vec}\left( dF \right) &= {\rm vec}\left(A \ dX \ Z\right) + {\rm vec}\left(dX \ Z \ X^T \ X \ Z \right) \\
&+ {\rm vec}\left(X \ Z \ dX^T \ X \ Z \right) + {\rm vec}\left(X \ Z \ X^T \ dX \ Z \right) \\
\Longleftrightarrow \ {\rm vec}\left( dF \right) &= \left(Z^T \otimes A \right) {\rm vec}\left( dX \right) + \left(\left(Z X^T X Z\right)^T \otimes I\right) {\rm vec}\left( dX\right) \\
&+ \left(\left(X Z\right)^T \otimes \left( XZ\right)\right) \underbrace{{\rm vec} \left( dX^T\right)}_{= K {\rm vec}\left( dX\right)} + \left(Z^T \otimes \left( XZX^T\right)\right) {\rm vec}\left( dX\right) \\
\Rightarrow \ \frac{\partial \, {\rm vec}\left( F\right)}{\partial \, {\rm vec}\left( X\right)^T} &= \left(Z^T \otimes A \right) + \left(\left(Z X^T X Z\right)^T \otimes I\right) \\
&+ \left(\left(X Z\right)^T \otimes \left( XZ\right)\right) K + \left(Z^T \otimes \left( XZX^T\right)\right) \ ,
\end{align}
where $I$ is the $n \times n$ identity and $K$ is the commutation matrix, defined by ${\rm vec}\left(dX^T\right) = K \, {\rm vec}\left(dX\right)$. The result is an $n^2 \times n^2$ matrix: it is the fourth-order tensor $\frac{\partial f(X)}{\partial X}$ flattened by the column-stacking ${\rm vec}$ operator.
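If you want to convince yourself of the result, a quick numerical check is possible. The following is only a sketch, assuming NumPy and a column-major (column-stacking) ${\rm vec}$; the helper names `vec`, `commutation_matrix`, `jacobian_formula` and `jacobian_fd` are made up for this illustration and are not from any library. It compares the Kronecker-product expression against a central finite-difference Jacobian for random $3 \times 3$ matrices.

```python
import numpy as np


def vec(M):
    # Column-stacking vec operator (column-major order).
    return M.reshape(-1, order="F")


def commutation_matrix(n):
    # K such that K @ vec(M) == vec(M.T) for any n x n matrix M.
    K = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            K[i + j * n, j + i * n] = 1.0
    return K


def f(A, X, Z):
    return A @ X @ Z + X @ Z @ X.T @ X @ Z


def jacobian_formula(A, X, Z):
    # d vec(F) / d vec(X)^T from the Kronecker-product derivation.
    n = X.shape[0]
    I = np.eye(n)
    K = commutation_matrix(n)
    XZ = X @ Z
    return (
        np.kron(Z.T, A)
        + np.kron((Z @ X.T @ X @ Z).T, I)
        + np.kron(XZ.T, XZ) @ K
        + np.kron(Z.T, XZ @ X.T)
    )


def jacobian_fd(A, X, Z, eps=1e-6):
    # Central finite differences, one entry of vec(X) at a time.
    n = X.shape[0]
    J = np.zeros((n * n, n * n))
    for k in range(n * n):
        e = np.zeros(n * n)
        e[k] = 1.0
        dX = e.reshape(n, n, order="F")
        diff = f(A, X + eps * dX, Z) - f(A, X - eps * dX, Z)
        J[:, k] = vec(diff) / (2 * eps)
    return J


rng = np.random.default_rng(0)
n = 3
A, X, Z = (rng.standard_normal((n, n)) for _ in range(3))

J_formula = jacobian_formula(A, X, Z)
J_fd = jacobian_fd(A, X, Z)
max_abs_err = np.max(np.abs(J_formula - J_fd))
print("max abs difference:", max_abs_err)
```

The two Jacobians should agree up to finite-difference error. Note that the check depends on the ${\rm vec}$ convention: with row-major flattening (NumPy's default), the Kronecker factors would have to be swapped.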

user550103
  • 2,773