0

$$\frac{d}{dw} [w^TX^TXw - 2w^TX^Ty+y^Ty] = 2(X^TXw-X^Ty)$$

I do not understand how the RHS was obtained -- are there certain matrix differentiation properties which can be used to show this? Why does differentiating w.r.t. $w$ get rid of the $w^T$ (and not $w$) from each of the terms?

  • 1
    Expand the matrix products and compute the partial derivatives $\frac{\partial}{\partial w_i}$ – Miguel Feb 23 '22 at 19:44

2 Answers

3

Notice that $$w^TX^TXw-2w^TX^Ty+y^Ty=\langle Xw-y,Xw-y\rangle =\|Xw-y\|^2,$$ where $\langle \cdot,\cdot \rangle $ is the inner (dot) product on $\mathbb{R}^n$.

If you denote $$f(w)=\langle Xw-y,Xw-y\rangle,$$ then bilinearity and symmetry of the inner product give $$f(w+h)=f(w)+2\langle Xw-y,Xh\rangle+\langle Xh,Xh\rangle=f(w)+2\langle X^T(Xw-y),h\rangle+\langle Xh,Xh\rangle,$$ where the last step uses $\langle u,Xh\rangle=\langle X^Tu,h\rangle$.

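By the definition of the derivative, $\nabla f(w)$ is the vector for which $$f(w+h)=f(w)+\langle \nabla f(w),h\rangle + o(\|h\|).$$ Here the remainder is $\langle Xh,Xh\rangle=\|Xh\|^2\le\|X\|^2\|h\|^2$, which is $o(\|h\|)$, so comparing the two expansions gives $$\frac{d f}{d w}=\nabla f(w)=2X^T(Xw-y).$$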
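If you want to convince yourself numerically, here is a minimal sketch with NumPy. The sizes, the random data, and the helper names `f`, `grad_f`, and `numerical_grad` are my own choices for illustration, not part of the question. It compares the formula $2X^T(Xw-y)$ with a central finite difference, and also checks that the remainder in the expansion of $f(w+h)$ equals $\langle Xh,Xh\rangle$.

```python
import numpy as np

# Illustrative random data; shapes and seed are arbitrary assumptions.
rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)
h = rng.normal(size=d)


def f(w):
    """Squared residual norm ||Xw - y||^2."""
    r = X @ w - y
    return r @ r


def grad_f(w):
    """Closed-form gradient 2 X^T (Xw - y) from the derivation."""
    return 2 * X.T @ (X @ w - y)


def numerical_grad(func, w, eps=1e-6):
    """Central finite-difference approximation of the gradient."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (func(w + e) - func(w - e)) / (2 * eps)
    return g


analytic = grad_f(w)
numeric = numerical_grad(f, w)
max_abs_diff = np.max(np.abs(analytic - numeric))

# Remainder of the first-order expansion; it should equal <Xh, Xh>.
remainder = f(w + h) - f(w) - analytic @ h
quadratic_term = (X @ h) @ (X @ h)

print("max |analytic - numeric| =", max_abs_diff)
print("remainder - <Xh, Xh>     =", remainder - quadratic_term)
```

Both printed numbers should be close to zero, up to floating-point rounding.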

You can find more comments on this thread, and in Matrix Calculus. Also try searching for ''\|Xw-y\|^2 derivative'' on SearchOnMath.

1

Since $$ \frac{\partial}{\partial x}(x^TBx)=(B+B^T)x, $$ the first term in your problem, with $B=X^TX$ (which is symmetric), gives $$ \left(X^TX+(X^TX)^T\right)w=2X^TX w $$

The last term, $y^Ty$, does not depend on $w$, so its derivative is $\boldsymbol{0}$.

By noting that $$ \frac{\partial x^T a}{\partial x}=a $$ and applying this with the vector $a=X^Ty$, the middle term gives: $$ \frac{\partial}{\partial w}{(-2w^T X^T y)}=-2X^Ty $$
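To see where the two identities used above come from, one can write them out in components. This is only a sketch, assuming the common convention that the gradient collects the partial derivatives $\frac{\partial}{\partial x_i}$ into a column vector; the indices $i,j,k$ are introduced here for illustration.

$$ x^TBx=\sum_{j,k}x_jB_{jk}x_k \quad\Longrightarrow\quad \frac{\partial}{\partial x_i}(x^TBx)=\sum_{k}B_{ik}x_k+\sum_{j}x_jB_{ji}=\left[(B+B^T)x\right]_i $$

$$ x^Ta=\sum_{j}x_ja_j \quad\Longrightarrow\quad \frac{\partial}{\partial x_i}(x^Ta)=a_i $$

Under that convention neither $w$ nor $w^T$ is simply "removed": $w^TX^TXw$ is a scalar, each partial derivative picks up one contribution from the left factor and one from the right factor, and the two contributions coincide because $X^TX$ is symmetric, which is where the factor $2$ comes from.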

wd violet
  • 1,360