
I am reading this paper: http://www.cs.cmu.edu/~quake-papers/painless-conjugate-gradient.pdf

I am having difficulty with the derivation of equation (6) on page 4, which takes the gradient of a quadratic form.

I searched around and found this: How to take the gradient of the quadratic form?

I can understand most of the answer at the link above, but:

  • Why does the $y$ in the second part of the chain rule need to be transposed?
  • Neither the original paper nor the Q/A above tells me how to take the derivative of a vector-valued function ($R^n \rightarrow R^n$). I think that was used implicitly in the derivation of $\dfrac{\partial (x^TA^T)}{\partial x}$. It may also not be rigorous to apply $$\dfrac{\partial (b^Tx)}{\partial x} = \dfrac{\partial (x^Tb)}{\partial x} = b$$ directly to $\dfrac{\partial (x^TA^T)}{\partial x}$ to get $A^T$.

1 Answer


Well, never mind. Your matrix $A$ is not necessarily symmetric, which is annoying. But then you take $$ f(x) = \frac{1}{2} x^T A x - b^T x + c, $$ where $b$ is a constant column vector, $c$ is a constant scalar, and $x$ is the column vector with entries $x_j,$ the $n$ coordinate functions. For your purposes, perhaps the choices are $n=1,2,3.$

So $$ f(x) = \frac{1}{2} \left( \sum_{i=1}^n \sum_{j=1}^n a_{ij} x_i x_j \right) - \left( \sum_{k=1}^n b_k x_k \right) + c. $$

I want to fix an index $m$ and find $\frac{\partial f}{\partial x_m}.$ The terms in $f$ that involve $x_m$ are exactly $$ \frac{1}{2} \left( a_{mm} x_m^2 + \sum_{i \neq m}^n a_{im} x_i x_m + \sum_{j \neq m}^n a_{mj} x_m x_j \right) - b_m x_m . $$

As a result, $$ \frac{\partial f}{\partial x_m} = a_{mm} x_m + \frac{1}{2} \sum_{i \neq m}^n a_{im} x_i + \frac{1}{2} \sum_{j \neq m}^n a_{mj} x_j - b_m $$

Next, separate $$ a_{mm} x_m = \frac{1}{2} a_{mm} x_m + \frac{1}{2}a_{mm} x_m $$ and stick the two pieces back into the sums, arriving at $$ \frac{\partial f}{\partial x_m} = \frac{1}{2} \sum_{i =1}^n a_{im} x_i + \frac{1}{2} \sum_{j = 1}^n a_{mj} x_j - b_m $$ The traditional thing is to rename the index $i$ to $j$ in the first sum and write it using the matrix $A^T,$ whose $(m,j)$ entry is $(A^T)_{mj} = a_{jm};$ so $$ \frac{\partial f}{\partial x_m} = \frac{1}{2} \sum_{j =1}^n (A^T)_{mj} x_j + \frac{1}{2} \sum_{j = 1}^n a_{mj} x_j - b_m $$

So, we get the gradient of $f,$ written as a column vector, as $$ \nabla f = \frac{1}{2} \left( A^T + A \right) x - b. $$
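If you want to convince yourself numerically, here is a minimal sketch in Python with NumPy that compares the formula above against central finite differences. The function names (`f`, `grad_formula`, `grad_fd`), the random test data, and the step size `h` are my own choices for illustration, not anything from the paper.

```python
import numpy as np

def f(x, A, b, c):
    # Quadratic form f(x) = 1/2 x^T A x - b^T x + c
    return 0.5 * x @ A @ x - b @ x + c

def grad_formula(x, A, b):
    # Gradient derived above: 1/2 (A^T + A) x - b
    return 0.5 * (A.T + A) @ x - b

def grad_fd(x, A, b, c, h=1e-6):
    # Central finite differences, one coordinate x_m at a time
    g = np.zeros_like(x)
    for m in range(x.size):
        e = np.zeros_like(x)
        e[m] = h
        g[m] = (f(x + e, A, b, c) - f(x - e, A, b, c)) / (2 * h)
    return g

# Arbitrary test data (an assumption for illustration); A is NOT symmetric here
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
c = 1.5
x = rng.standard_normal(n)

# Largest componentwise disagreement between the two gradients
max_err = np.max(np.abs(grad_formula(x, A, b) - grad_fd(x, A, b, c)))
print(max_err)
```

The printed value should be tiny (rounding error only), since central differences are exact for a quadratic up to floating-point effects.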

That's about it. The new matrix $$ \frac{1}{2} \left( A^T + A \right) $$ is called the symmetric part of $A,$ and it is equal to $A$ itself if $A$ is already symmetric.
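As a small check with $n=2$ (the matrix and vector here are my own choice for illustration, not from the paper), take $$ A = \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, $$ so that $$ f(x) = \frac{1}{2} \left( x_1^2 + 2 x_1 x_2 + 3 x_2^2 \right) - x_1 - x_2 + c. $$ Differentiating directly gives $$ \frac{\partial f}{\partial x_1} = x_1 + x_2 - 1, \qquad \frac{\partial f}{\partial x_2} = x_1 + 3 x_2 - 1, $$ which agrees with $$ \frac{1}{2} \left( A^T + A \right) x - b = \begin{pmatrix} 1 & 1 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \end{pmatrix}. $$ Note that $Ax - b$ alone would give $x_1 + 2 x_2 - 1$ in the first component, which is not the correct partial derivative because this $A$ is not symmetric.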

Will Jagy
  • It's nice to avoid the use of components, as in Did's derivation here. – littleO Oct 27 '13 at 03:14
  • @littleO, I don't know the background of the person asking, or what sort of peculiar notation might be familiar. I will leave this here. If you teach a different way, go ahead and answer. – Will Jagy Oct 27 '13 at 03:18
  • I really like this flavor of derivation; it is more straightforward and avoids the problems I encountered. Please review the minor fix to one equation in your answer. Thank you. @littleO Thank you very much for the other solutions. I am interested in your answer in that Q/A (which seems to resolve my first sub-question). I left a comment on your answer asking about some details. – craftsman.don Oct 27 '13 at 13:29