
Let $A \in \mathbb{R}^{n \times n}$ and $X, Y \in \mathbb{R}^{n \times r}$. Consider the function \begin{equation} H \left( X , Y \right) := \dfrac{1}{2} \left\lVert A - XY^{T} \right\rVert _{F}^{2} , \end{equation} where $\left\lVert \cdot \right\rVert _{F}$ denotes the Frobenius norm.

I want to compute the gradients $\nabla_{X} H \left( X , Y \right)$ and $\nabla_{Y} H \left( X , Y \right)$. The first one seems easy. I used the chain rule to get \begin{equation} \nabla_{X} H \left( X , Y \right) = \left( A - XY^{T} \right) \nabla_{X} \left( - XY^{T} \right) = - \left( A - XY^{T} \right) Y^{T} . \end{equation}

For the second one, as $A - XY^{T} = A - \left( X^{T}Y \right) ^{T}$, we have \begin{align} \nabla_{Y} H \left( X , Y \right) = \left( \left( A - XY^{T} \right) \nabla_{Y} \left( - \left( X^{T}Y \right) ^{T} \right) \right) ^{T} & = \left( - \left( A - XY^{T} \right) X^{T} \right) ^{T} \\ & = - X \left( A - XY^{T} \right) ^{T} . \end{align}

Is my $\nabla_{Y} H \left( X , Y \right)$ formula correct?

Also, are there other approaches to computing the gradient? I guess we could expand $H \left( X + \delta X , Y \right)$ and then read off the gradient from the part of the difference $H \left( X + \delta X , Y \right) - H \left( X , Y \right)$ that is linear in $\delta X$.

JKay
  • 647

1 Answer


Let $$ B = XY^T-A \implies dB = dX\,Y^T+X\,dY^T $$ Write the function in terms of $B$, then find its differential and gradients. $$\eqalign{ H &= \tfrac{1}{2}B:B \cr dH &= B:dB \cr &= B:dX\,Y^T + B:X\,dY^T \cr &= B:dX\,Y^T + B^T:dY\,X^T \cr &= BY:dX + B^TX:dY \cr }$$ $$ \frac{\partial H}{\partial X} = BY, \qquad \frac{\partial H}{\partial Y} = B^TX $$ where the colon product is a convenient notation for the trace, i.e. $$ M:N = {\rm Tr}(M^TN) $$ Depending on your preferred layout convention, these gradients might need to be transposed.
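As a sanity check, the two gradients above can be compared against central finite differences of $H$. The following is only a minimal sketch, assuming NumPy; the helper names `H`, `grads` and `numeric_grad`, the sizes `n = 5`, `r = 3`, and the random test matrices are all illustrative choices, not part of the original problem.

```python
import numpy as np

def H(A, X, Y):
    # H(X, Y) = 1/2 * ||A - X Y^T||_F^2
    return 0.5 * np.linalg.norm(A - X @ Y.T, "fro") ** 2

def grads(A, X, Y):
    # Closed-form gradients from the differential: dH/dX = B Y, dH/dY = B^T X
    B = X @ Y.T - A
    return B @ Y, B.T @ X

def numeric_grad(f, M, eps=1e-6):
    # Central finite difference, one matrix entry at a time
    G = np.zeros_like(M)
    for idx in np.ndindex(M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

# Arbitrary test data (hypothetical sizes, fixed seed for reproducibility)
rng = np.random.default_rng(0)
n, r = 5, 3
A = rng.standard_normal((n, n))
X = rng.standard_normal((n, r))
Y = rng.standard_normal((n, r))

gX, gY = grads(A, X, Y)
gX_num = numeric_grad(lambda M: H(A, M, Y), X)
gY_num = numeric_grad(lambda M: H(A, X, M), Y)

# Largest entrywise discrepancy for each gradient; both should be tiny
print(np.max(np.abs(gX - gX_num)), np.max(np.abs(gY - gY_num)))
```

Because $H$ is quadratic in $X$ for fixed $Y$ (and vice versa), the central difference agrees with the closed form up to rounding error, so any transpose mistake shows up immediately as a large discrepancy or a shape error.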


Your first solution has a problem (shown in red): $$\eqalign{ \frac{\partial H}{\partial X} &= BY^T = X\color{red}{Y^TY^T} - AY^T \cr }$$ In matrix calculus, terms involving the transpose are invariably of the form $Y^TY$ or $YY^T$.

Your second solution is $XB^T$ but it should be $B^TX$.

To get a better handle on these transpose issues, I recommend that you work through the problem in which all of the matrices are rectangular, i.e. $$ A \in {\mathbb R}^{m\times n}, \quad X \in {\mathbb R}^{m\times r}, \quad Y \in {\mathbb R}^{n\times r} $$ Then the solution to the current problem can be recovered by setting $m=n$.
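The dimension bookkeeping in the rectangular case can also be checked mechanically. This is a rough sketch, assuming NumPy; the sizes `m = 4`, `n = 6`, `r = 2` and the helper `conformable` are hypothetical, chosen only so that all three dimensions differ.

```python
import numpy as np

# Hypothetical sizes with m, n, r all distinct
m, n, r = 4, 6, 2
rng = np.random.default_rng(1)
A = rng.standard_normal((m, n))
X = rng.standard_normal((m, r))
Y = rng.standard_normal((n, r))

B = X @ Y.T - A          # (m x r)(r x n) - (m x n) = m x n

grad_X = B @ Y           # (m x n)(n x r) = m x r, same shape as X
grad_Y = B.T @ X         # (n x m)(m x r) = n x r, same shape as Y

def conformable(P, Q):
    # The product P @ Q is defined only if the inner dimensions match
    return P.shape[1] == Q.shape[0]

# The two expressions from the question fail the shape test
by_t_ok = conformable(B, Y.T)    # (m x n)(r x n): requires n == r
xb_t_ok = conformable(X, B.T)    # (m x r)(n x m): requires r == n

print(grad_X.shape, grad_Y.shape, by_t_ok, xb_t_ok)
```

With distinct dimensions, only $BY$ and $B^TX$ are well-defined products with the shapes of $X$ and $Y$; the products $BY^T$ and $XB^T$ are not even conformable, which is exactly the kind of error that square matrices with $r = n$ can hide.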

greg
  • 40,033
  • Thanks for your recommendation. Now it makes more sense: at some point, when I considered $H(Z,Z)$ and tried to compute its gradient, my solution caused some trouble, but your solution gives $\left( B + B^{T} \right) X$, which I think should be correct. – JKay Apr 04 '19 at 08:28