I am reading this paper: http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf
I am having difficulty with the derivation of equation (6) on page 4, which takes the gradient of a quadratic form.
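For context, here is a small numerical sanity check. It assumes that equation (6) is the gradient $f'(x) = \frac{1}{2}A^Tx + \frac{1}{2}Ax - b$ of the quadratic form $f(x) = \frac{1}{2}x^TAx - b^Tx + c$. The check compares that closed form against central finite differences. The helper names and the values of `A`, `b`, `c`, `x` are hypothetical and chosen only for illustration. `A` is deliberately non-symmetric so the $A^Tx$ and $Ax$ terms actually differ.

```python
# Sketch: check the closed-form gradient of a quadratic form numerically.
# Assumes f(x) = 1/2 x^T A x - b^T x + c with gradient 1/2 A^T x + 1/2 A x - b.
# All names and values here are hypothetical, for illustration only.

def matvec(M, v):
    """Return M v, where M is a list of rows."""
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def dot(u, v):
    """Inner product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def f(A, b, c, x):
    """Quadratic form 1/2 x^T A x - b^T x + c."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x) + c

def grad_closed_form(A, b, x):
    """Closed form 1/2 A^T x + 1/2 A x - b, entry by entry."""
    At_x = matvec(transpose(A), x)
    A_x = matvec(A, x)
    return [0.5 * p + 0.5 * q - bi for p, q, bi in zip(At_x, A_x, b)]

def grad_numeric(A, b, c, x, h=1e-6):
    """Central finite differences along each coordinate direction."""
    g = []
    for k in range(len(x)):
        xp = list(x)
        xp[k] += h
        xm = list(x)
        xm[k] -= h
        g.append((f(A, b, c, xp) - f(A, b, c, xm)) / (2 * h))
    return g

A = [[3.0, 2.0], [-1.0, 6.0]]  # non-symmetric on purpose
b = [2.0, -8.0]
c = 0.0
x = [1.5, -0.5]

print(grad_closed_form(A, b, x))  # [2.25, 5.75]
print(grad_numeric(A, b, c, x))   # agrees with the line above up to rounding
```

If `A` is made symmetric, $A^T = A$ and the same check reduces to comparing against $Ax - b$.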
I searched around and found this: How to take the gradient of the quadratic form?
I can follow most of the answer at the above link, but:
- Why does the $y$ in the second part of the chain rule need to be transposed?
- Neither the original paper nor the above Q/A explains how to take the derivative of a vector-valued function ($\mathbb{R}^n \rightarrow \mathbb{R}^n$). I think this is used implicitly in the derivation of $\dfrac{\partial (x^TA^T)}{\partial x}$. It may also not be rigorous to apply $$\dfrac{\partial (b^Tx)}{\partial x} = \dfrac{\partial (x^Tb)}{\partial x} = b$$ directly to $\dfrac{\partial (x^TA^T)}{\partial x}$ to get $A^T$.
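For comparison, here is a componentwise version of the computation that avoids matrix-calculus identities entirely. It is a sketch under one stated convention and is not the derivation used in the paper or in the linked answer. The gradient of a scalar function is written as a column vector, and the Jacobian of $g:\mathbb{R}^n \rightarrow \mathbb{R}^m$ has entries $J_{ij} = \partial g_i / \partial x_j$ (numerator layout). The symbols $k$, $\delta_{ik}$ and $J$ are introduced here only for illustration.

$$
x^TAx = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i A_{ij} x_j,
\qquad
\frac{\partial x_i}{\partial x_k} = \delta_{ik}
$$

$$
\frac{\partial}{\partial x_k}\left(x^TAx\right)
= \sum_{j=1}^{n} A_{kj} x_j + \sum_{i=1}^{n} x_i A_{ik}
= (Ax)_k + (A^Tx)_k,
\qquad
\frac{\partial}{\partial x_k}\left(b^Tx\right) = b_k
$$

$$
\Rightarrow\quad
\nabla\left(\tfrac{1}{2}x^TAx - b^Tx + c\right) = \tfrac{1}{2}A^Tx + \tfrac{1}{2}Ax - b
$$

For the vector-valued map $g(x) = Ax$, with $g_i = \sum_{j} A_{ij} x_j$:

$$
J_{ij} = \frac{\partial g_i}{\partial x_j} = A_{ij}
\quad\Rightarrow\quad
J = A \quad \text{(numerator layout)},
\qquad
J^T = A^T \quad \text{(denominator layout)}
$$

Under this view, whether a factor appears as $A$ or $A^T$ depends on the layout convention chosen for derivatives of vector-valued functions.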