
Does gradient descent converge to a minimum-norm solution in least-squares problems?

In this wonderful answer, the author gives a proof showing which value gradient descent converges to.

I'm trying to understand a simple detail.

It is implied that if $A = U\Sigma V^T$ and $y = V^Tx$, then $(I-A^TA)^kx = (I-\Sigma^T\Sigma)^ky$, and I am struggling more than I should to understand why that is.

Shouldn't it be $ (I-V\Sigma^T\Sigma V^T)^kVy$?
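For what it's worth, here is a quick numerical sanity check I put together (not from the linked answer; just a NumPy sketch with a random $A$, the full SVD so $V$ is square and orthogonal, and an arbitrary power $k$):

```python
# Sketch: compare (I - A^T A)^k x against the two candidate expressions.
# The scaling 0.2 just keeps the singular values of A small so powers stay tame.
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 6, 4, 5
A = 0.2 * rng.standard_normal((m, n))
x = rng.standard_normal(n)

U, s, Vt = np.linalg.svd(A, full_matrices=True)   # V = Vt.T is n x n orthogonal
V = Vt.T
StS = np.diag(s**2)                               # Sigma^T Sigma (n x n)
y = Vt @ x                                        # y = V^T x

lhs = np.linalg.matrix_power(np.eye(n) - A.T @ A, k) @ x
claimed = np.linalg.matrix_power(np.eye(n) - StS, k) @ y              # (I - S^T S)^k y
corrected = np.linalg.matrix_power(np.eye(n) - V @ StS @ Vt, k) @ (V @ y)

print(np.allclose(lhs, claimed))     # False in general
print(np.allclose(lhs, corrected))   # True: the extra V is needed
```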

Oria Gruber

1 Answer


What you are trying to show is false as stated, but something close is true, and you then recover the desired matrix $(I-\Sigma^T\Sigma)^k$ by considering the iteration this quantity is used to construct.

$$ \begin{aligned} (I-A^TA)^kx &= (I-V\Sigma^T\Sigma V^T)^kx \\ &= (VV^T-V\Sigma^T\Sigma V^T)^kx \\ &= (VDV^T)^kx, \end{aligned} $$ where $D := I-\Sigma^T\Sigma$ and we have used that $V$ is orthogonal, so $VV^T = I$. Since also $V^TV = I$, the inner factors telescope when the product is expanded, giving $$ (VDV^T)^kx = VD^kV^Tx = VD^ky. $$

I believe this is enough to proceed with the proof in the answer. Since the proof forms an iteration in the $y$'s, you multiply both sides of the $x$-iteration by $V^T$; this cancels the leading $V$ and yields the desired result, $V^T(I-A^TA)^kx = D^kV^Tx = (I-\Sigma^T\Sigma)^ky$.
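To make that last point concrete, here is a small sketch (my own, not part of the original proof): run the $x$-iteration $x_{j+1} = (I-A^TA)x_j$ and the diagonal $y$-iteration $y_{j+1} = (I-\Sigma^T\Sigma)y_j$ with $y_0 = V^Tx_0$, and check that $y_j = V^Tx_j$ at every step.

```python
# Sketch: the y-iteration tracks V^T times the x-iteration exactly.
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 6, 4, 8
A = 0.2 * rng.standard_normal((m, n))
x = rng.standard_normal(n)

_, s, Vt = np.linalg.svd(A, full_matrices=True)
D = np.eye(n) - np.diag(s**2)          # D = I - Sigma^T Sigma
y = Vt @ x                             # y_0 = V^T x_0

for _ in range(k):
    x = (np.eye(n) - A.T @ A) @ x      # x-iteration
    y = D @ y                          # y-iteration
    assert np.allclose(Vt @ x, y)      # V^T x_j stays equal to y_j

print("y_j = V^T x_j held at every step")
```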

whpowell96