
Consider the ridge regression problem with the objective function

$$f (W) = \|XW - Y\|_F^2 + \lambda \|W\|_F^2$$

Since $f$ is convex and twice differentiable, it can be minimized by setting its gradient to zero:

$$\nabla f (W) = 2 \lambda W + 2X^T(XW - Y)$$

My question is: why is the gradient of $\|XW - Y\|_F^2$ equal to $2X^T(XW - Y)$?


1 Answer


Rewrite (treating $W$ as a single column, so the Frobenius norm coincides with the squared 2-norm; the matrix case follows by summing over the columns of $W$):

$$\|XW-Y\|^2_2 = \left[XW-Y\right]^T\left[XW-Y\right]=\left[W^TX^T-Y^T\right]\left[XW-Y\right]=W^TX^TXW-Y^TXW-W^TX^TY+Y^TY$$

Hence, taking the total differential of the right-hand side gives

$$dW^TX^TXW+W^TX^TXdW-2dW^TX^TY=2dW^T\left[X^TXW-X^TY \right]=dW^T\left[2X^T(XW-Y) \right].$$

The expression in the bracket is the gradient of the initial expression with respect to $W$. Note that I used the fact that a scalar equals its own transpose, so $W^TX^TX\,dW = dW^TX^TXW$ and $Y^TX\,dW = dW^TX^TY$, which lets the three differential terms be collected into one.
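The formula is easy to sanity-check numerically. Below is a minimal sketch (with arbitrary shapes and a made-up $\lambda$) that compares the analytic gradient $2\lambda W + 2X^T(XW-Y)$ against a central finite-difference approximation of $f$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 5, 3          # arbitrary problem sizes for the check
X = rng.standard_normal((n, d))
Y = rng.standard_normal((n, k))
W = rng.standard_normal((d, k))
lam = 0.7                  # arbitrary ridge penalty

def f(W):
    """Ridge objective: ||XW - Y||_F^2 + lam * ||W||_F^2."""
    return np.sum((X @ W - Y) ** 2) + lam * np.sum(W ** 2)

# Analytic gradient from the derivation above
analytic = 2 * lam * W + 2 * X.T @ (X @ W - Y)

# Central finite differences, one entry of W at a time
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(d):
    for j in range(k):
        E = np.zeros_like(W)
        E[i, j] = eps
        numeric[i, j] = (f(W + E) - f(W - E)) / (2 * eps)

print(np.max(np.abs(analytic - numeric)))  # max discrepancy, should be tiny
```

Because $f$ is quadratic in $W$, the central difference is exact up to floating-point rounding, so the two gradients agree to high precision.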