
Given the matrix $A \in {\Bbb R}^{n \times m}$, let the scalar field $f : {\Bbb R}^n \times {\Bbb R}^m \to {\Bbb R}_0^+$ be defined by

$$ f (u, v) : = \frac12 \left\| A - u v^T \right\|_{\text{F}}^2 $$

where $\| \cdot \|_{\text{F}}$ denotes the Frobenius norm. Find the gradients $\nabla_u f$ and $\nabla_v f$.


Solution:

Let $R=A-uv^T$. The gradients are $\nabla_u f = - R v$ and $\nabla_v f = - R^T u $.


I am struggling. I have read about the differentials method and I have tried to apply rules from The Matrix Cookbook, but I always have problems with transposed matrices in the result. Is there a systematic way to solve these problems without making use of summations?

If you can work out the derivative, that would be great, but it would also be helpful if you could just point me to a place where this is explained properly.

ek_q_t

1 Answer


I assume that $ \| { - } \| _ F $ is the Frobenius norm, so that $ \| M \| _ F ^ 2 = \operatorname { Tr } ( M ^ \top M ) $. Since the trace is linear in the components of a matrix, it's easy to take the differential of a trace: $ \mathrm d ( \operatorname { Tr } M ) = \operatorname { Tr } \mathrm d M $. Similarly, $ \mathrm d ( M ^ \top ) = \mathrm d M ^ \top $. And since we're working with matrices, we need to be careful with the order of the Product Rule: $ \mathrm d ( M N ) = \mathrm d M \, N + M \, \mathrm d N $. Now we have the needed ingredients to calculate the differential of $ f $:

$$ \mathrm d f = \mathrm d \Big ( \frac 1 2 \, \operatorname { Tr } ( R ^ \top R ) \Big ) = \frac 1 2 \operatorname { Tr } ( \mathrm d R ^ \top \, R + R ^ \top \, \mathrm d R ) \text . $$

I left this in terms of $ R $ since that's how your expected answer is written, and $ R $ clearly plays a prominent role; but we can calculate $ \mathrm d R $ too:

$$ \mathrm d R = \mathrm d ( A - u v ^ \top ) = \mathrm d A - \mathrm d u \, v ^ \top - u \, \mathrm d v ^ \top \text , $$

whose transpose is

$$ \mathrm d R ^ \top = \mathrm d A ^ \top - v \, \mathrm d u ^ \top - \mathrm d v \, u ^ \top \text . $$

Since you're only interested in partial derivatives with respect to $ u $ and $ v $, I'll treat $ A $ as constant and so drop $ \mathrm d A $ in what follows.
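If you want to convince yourself of these ingredients numerically, here is a minimal NumPy sketch. The sizes, the random seed, and the perturbation scale `eps` are arbitrary choices of mine, not part of the problem. It checks that $ \| R \| _ F ^ 2 = \operatorname { Tr } ( R ^ \top R ) $ and that $ \frac 1 2 \operatorname { Tr } ( \mathrm d R ^ \top \, R + R ^ \top \, \mathrm d R ) $ matches the actual change in $ f $ up to second-order terms.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3  # arbitrary small sizes, chosen only for illustration
A = rng.standard_normal((n, m))
u = rng.standard_normal(n)
v = rng.standard_normal(m)


def f(u, v):
    """f(u, v) = 1/2 * ||A - u v^T||_F^2, with A held fixed."""
    return 0.5 * np.linalg.norm(A - np.outer(u, v), "fro") ** 2


R = A - np.outer(u, v)

# Identity ||M||_F^2 = Tr(M^T M), checked on M = R.
frob_sq = np.linalg.norm(R, "fro") ** 2
trace_form = np.trace(R.T @ R)

# Small perturbations standing in for the differentials du and dv.
eps = 1e-6
du = eps * rng.standard_normal(n)
dv = eps * rng.standard_normal(m)

# dR = -du v^T - u dv^T, since A is treated as constant.
dR = -np.outer(du, v) - np.outer(u, dv)

# df = 1/2 Tr(dR^T R + R^T dR), valid to first order in (du, dv).
df_formula = 0.5 * np.trace(dR.T @ R + R.T @ dR)
df_actual = f(u + du, v + dv) - f(u, v)

print(np.isclose(frob_sq, trace_form))
print(abs(df_actual - df_formula))  # should be of second order in eps
```

The two values of `df` agree only to first order, so the printed difference should shrink roughly like `eps ** 2` as the perturbation gets smaller.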

So the final expression for $ \mathrm d f $, keeping $ R $ only where it's not differentiated, seems to be

$$ \begin{aligned} \mathrm d f &= \frac 1 2 \operatorname { Tr } \big ( ( - v \, \mathrm d u ^ \top - \mathrm d v \, u ^ \top ) R + R ^ \top ( - \mathrm d u \, v ^ \top - u \, \mathrm d v ^ \top ) \big ) \\ &= - \frac 1 2 \operatorname { Tr } ( v \, \mathrm d u ^ \top \, R + \mathrm d v \, u ^ \top R + R ^ \top \, \mathrm d u \, v ^ \top + R ^ \top u \, \mathrm d v ^ \top ) \text . \end{aligned} $$

This is about as far as you can go by applying rules for differentiation, but it's always worth considering whether you can simplify something involving traces and transposes. The important properties of the trace here are its linearity, that taking the transpose doesn't affect the trace, the cyclic property of the trace, and that the trace of a $ 1 $-by-$ 1 $ matrix is just its entry. So we can look at each term separately, take its transpose if necessary so that the differential factor has no transpose, cycle its factors so that the term is a $ 1 $-by-$ 1 $ matrix, and then get rid of the trace entirely. So we get

$$ \begin{aligned} \mathrm d f &= - \frac 1 2 \operatorname { Tr } ( R ^ \top \, \mathrm d u \, v ^ \top + \mathrm d v \, u ^ \top R + R ^ \top \, \mathrm d u \, v ^ \top + \mathrm d v \, u ^ \top R ) \\ &= - \frac 1 2 \operatorname { Tr } ( v ^ \top R ^ \top \, \mathrm d u + u ^ \top R \, \mathrm d v + v ^ \top R ^ \top \, \mathrm d u + u ^ \top R \, \mathrm d v ) \\ &= - \frac 1 2 ( v ^ \top R ^ \top \, \mathrm d u + u ^ \top R \, \mathrm d v + v ^ \top R ^ \top \, \mathrm d u + u ^ \top R \, \mathrm d v ) \text . \end{aligned} $$

Finally, notice that the terms are repeated, so you can cancel the half and get the true final answer for the differential of $ f $:

$$ \mathrm d f = - v ^ \top R ^ \top \, \mathrm d u - u ^ \top R \, \mathrm d v \text . $$

By fitting this to the pattern $ \mathrm d f = \nabla _ u f ^ \top \, \mathrm d u + \nabla _ v f ^ \top \, \mathrm d v $, you can read off your partial derivatives.

Specifically, $ \nabla _ u f = ( - v ^ \top R ^ \top ) ^ \top = - R v $, and $ \nabla _ v f = ( - u ^ \top R ) ^ \top = - R ^ \top u $. So the answers that you expected seem to be missing some minus signs. It's easy to check that the minus signs appear in the $ 1 $-by-$ 1 $ case where all of the matrix stuff is trivial, so they really should be there.
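If you'd like a sanity check that doesn't rely on the algebra, here is a short NumPy sketch that compares $ - R v $ and $ - R ^ \top u $ against central finite differences, and then evaluates the $ 1 $-by-$ 1 $ case. The helper names (`f`, `grads`, `numerical_grad`), the sizes, the step `h`, and the sample numbers are my own illustrative choices.

```python
import numpy as np


def f(A, u, v):
    """f(u, v) = 1/2 * ||A - u v^T||_F^2."""
    return 0.5 * np.linalg.norm(A - np.outer(u, v), "fro") ** 2


def grads(A, u, v):
    """Gradients from the derivation: (-R v, -R^T u) with R = A - u v^T."""
    R = A - np.outer(u, v)
    return -R @ v, -R.T @ u


def numerical_grad(g, x, h=1e-6):
    """Central finite-difference gradient of the scalar function g at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (g(x + e) - g(x - e)) / (2 * h)
    return grad


rng = np.random.default_rng(1)
n, m = 5, 3  # arbitrary sizes, for illustration only
A = rng.standard_normal((n, m))
u = rng.standard_normal(n)
v = rng.standard_normal(m)

gu, gv = grads(A, u, v)
gu_num = numerical_grad(lambda x: f(A, x, v), u)
gv_num = numerical_grad(lambda x: f(A, u, x), v)

print(np.allclose(gu, gu_num, atol=1e-6))
print(np.allclose(gv, gv_num, atol=1e-6))

# 1-by-1 case: f = 1/2 (a - s t)^2, so df/ds = -(a - s t) t
# and df/dt = -(a - s t) s, minus signs included.
a, s, t = 2.0, 0.5, 3.0
gs, gt = grads(np.array([[a]]), np.array([s]), np.array([t]))
print(gs[0], -(a - s * t) * t)
print(gt[0], -(a - s * t) * s)
```

Because $ f $ is quadratic in $ u $ for fixed $ v $ (and vice versa), the central differences should agree with the formulas up to rounding error, so a loose tolerance is enough here.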

Toby Bartels
  • Thank you so much for your answer! It is the most detailed explanation I could hope for! You are right, I missed the minus sign in the text of my question, I will edit it now! – ek_q_t Jul 02 '23 at 15:49
  • You're welcome! There are other ways to do it, but I love differentials, so since you said that you'd tried that way, I wanted to make sure that you had a solution that used them. – Toby Bartels Jul 03 '23 at 02:01