Consider the ridge regression problem with the objective function:
$$f (W) = \|XW - Y\|_F^2 + \lambda \|W\|_F^2$$
Since $f$ is convex and twice differentiable, it can be minimized by computing the gradient
$$\nabla f (W) = 2 \lambda W + 2X^T(XW - Y)$$
and finding its roots, i.e. solving $\nabla f(W) = 0$. My question is: why is the gradient of $\|XW - Y\|_F^2$ with respect to $W$ equal to $2X^T(XW - Y)$?
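
As a sanity check before looking for a derivation, the formula can be tested numerically. The sketch below is only an illustration: the sizes `n`, `d`, `k`, the random data, and the helper names (`rand_matrix`, `residual`, `g`) are all assumptions of mine, not part of the problem. It compares $2X^T(XW - Y)$ against central finite differences of $g(W) = \|XW - Y\|_F^2$, taken one entry of $W$ at a time.

```python
import random

random.seed(0)
n, d, k = 5, 3, 2  # hypothetical sizes: X is n x d, W is d x k, Y is n x k

def rand_matrix(rows, cols):
    # Random Gaussian test data (an assumption, any real matrices would do)
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

X, Y, W = rand_matrix(n, d), rand_matrix(n, k), rand_matrix(d, k)

def residual(W):
    # R = XW - Y, an n x k matrix
    return [[sum(X[i][p] * W[p][j] for p in range(d)) - Y[i][j]
             for j in range(k)] for i in range(n)]

def g(W):
    # g(W) = ||XW - Y||_F^2, the sum of squared residual entries
    return sum(r * r for row in residual(W) for r in row)

# Claimed gradient: 2 X^T (XW - Y), a d x k matrix
R = residual(W)
analytic = [[2.0 * sum(X[i][p] * R[i][j] for i in range(n))
             for j in range(k)] for p in range(d)]

# Central finite differences, perturbing one entry of W at a time
eps = 1e-6
numeric = [[0.0] * k for _ in range(d)]
for p in range(d):
    for j in range(k):
        W_plus = [row[:] for row in W]
        W_minus = [row[:] for row in W]
        W_plus[p][j] += eps
        W_minus[p][j] -= eps
        numeric[p][j] = (g(W_plus) - g(W_minus)) / (2 * eps)

max_abs_diff = max(abs(analytic[p][j] - numeric[p][j])
                   for p in range(d) for j in range(k))
print(max_abs_diff)  # expected to be tiny, limited only by rounding error
```

The entrywise view used in the check also hints at where the formula comes from: with $R = XW - Y$, each $R_{ij}$ depends on $W_{pj}$ through the coefficient $X_{ip}$, so $\partial g / \partial W_{pj} = 2\sum_i X_{ip} R_{ij} = 2\,(X^T R)_{pj}$.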