
$$f(W)=(Ax-b)^TW(Ax-b)=x^TA^TWAx-2b^TWAx+b^TWb$$

Here $f$ is a scalar function of the matrix $W$, $A$ is a known matrix, and $x$ and $b$ are vectors ($b$ is known). How do I compute $\frac{\partial f}{\partial W}$?

James LT

2 Answers


Define the vector $$y=Ax-b$$ and write the function in terms of this new variable and the double-dot (aka Frobenius) product, $P:Q=\mathrm{tr}(P^TQ)=\sum_{i,j}P_{ij}Q_{ij}$.

In this form, the differential & gradient are easy to calculate $$\eqalign{ f &= yy^T:W \cr df &= yy^T:dW \cr \frac{\partial f}{\partial W} &= yy^T \cr }$$
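To see why the result is the outer product $yy^T$ rather than a scalar, it may help to write the same function in components. This is only a sketch: the indices $i,j$ are introduced here for illustration, and the entries of $W$ are assumed to be independent (no symmetry constraint is imposed on $W$).

$$\eqalign{ f &= y^TWy = \sum_{i,j} y_i\,W_{ij}\,y_j \cr \frac{\partial f}{\partial W_{ij}} &= y_i\,y_j = \left(yy^T\right)_{ij} \cr }$$

Each entry $W_{ij}$ gets its own partial derivative, so the gradient has the same shape as $W$.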

hans
  • Should that be $y^Ty$? – James LT Feb 12 '17 at 22:54
  • @James No, $y^Ty$ is a scalar, and the gradient with respect to a matrix is a matrix, in the same way that the gradient of a scalar function with respect to a vector argument is a vector and not a scalar. – hans Feb 12 '17 at 23:22

$$f (\mathrm W) := (\mathrm A \mathrm x - \mathrm b)^{\top} \mathrm W (\mathrm A \mathrm x - \mathrm b) = \mbox{tr} \left( (\mathrm A \mathrm x - \mathrm b)(\mathrm A \mathrm x - \mathrm b)^{\top} \mathrm W \right) = \langle (\mathrm A \mathrm x - \mathrm b)(\mathrm A \mathrm x - \mathrm b)^{\top}, \mathrm W \rangle$$

Hence,

$$f ' (\mathrm W) = \color{blue}{(\mathrm A \mathrm x - \mathrm b)(\mathrm A \mathrm x - \mathrm b)^{\top}}$$
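The formula can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy is available; the sizes `m`, `n`, the random seed, and the step `eps` are arbitrary illustrative choices, not part of the original question. It compares the outer product $(Ax-b)(Ax-b)^{\top}$ with a central finite-difference approximation of each partial derivative, treating all entries of $W$ as independent.

```python
import numpy as np

# Arbitrary test data (assumed for illustration only).
rng = np.random.default_rng(0)
m, n = 4, 3
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)
b = rng.standard_normal(m)
W = rng.standard_normal((m, m))

def f(W):
    """Scalar function f(W) = (Ax - b)^T W (Ax - b)."""
    y = A @ x - b
    return y @ W @ y

# Analytic gradient from the answers: the outer product y y^T.
y = A @ x - b
analytic = np.outer(y, y)

# Central finite differences, perturbing one entry of W at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(m):
    for j in range(m):
        E = np.zeros_like(W)
        E[i, j] = eps
        numeric[i, j] = (f(W + E) - f(W - E)) / (2 * eps)

max_err = np.max(np.abs(numeric - analytic))
print("max abs difference:", max_err)
```

Because $f$ is linear in $W$, the finite-difference estimate should agree with the analytic gradient up to floating-point rounding.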