Hi, I know how to derive the following results:
\begin{eqnarray}
\frac{\partial}{\partial X} Tr (X^\top A X B)= A^\top X B^\top +AXB\\
\frac{\partial}{\partial X^\top} Tr (X^\top A X B)=\left( A^\top X B^\top +AXB\right)^\top
\end{eqnarray}
where $X^\top$ is the real transpose of $X$.
However, I am trying to derive the following:
\begin{eqnarray}
\frac{\partial}{\partial X^H} Tr ( X X^H X X ^H)=?
\end{eqnarray}
where $X^H$ is the complex (hermitian) transpose of the matrix $X$, and the real transpose is given by $X^\top$. How can we approach this? Thanks!
-
@A.MONNET No, the answers for the first two equations do not have a trace in them. They are correct as written. Thanks – Jeff Faraci Dec 18 '16 at 20:58
-
Ok. Does $\frac{\partial}{\partial X^H} Tr(X)$ exist? I was confused with another derivative in the first comment, sorry. – A. PI Dec 18 '16 at 21:17
-
@A.MONNET Yes, $\frac{\partial}{\partial X^H} Tr (X X^H)= X$ – Jeff Faraci Dec 19 '16 at 13:35
-
And $\frac{\partial}{\partial X^H} Tr(X)?$ – A. PI Dec 19 '16 at 13:48
2 Answers
Consider the function $$\eqalign{ f(Y) &= \operatorname{tr}(AYAY) = A^T:YAY \cr }$$where colon denotes the Frobenius (aka double-dot) product.
Let's find the differential and gradient of this function
$$\eqalign{
df &= A^T:(dY\,AY+YA\,dY) \cr
&= 2\,A^TY^TA^T:dY \cr
\cr
\frac{\partial f}{\partial Y} &= 2\,A^TY^TA^T \cr
\cr
}$$
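The second line of $df$ comes from moving factors across the colon. As a sketch, using only the definition $A:B=\operatorname{tr}(A^TB)$ and the cyclic invariance of the trace, the two terms can be expanded separately:

$$\eqalign{
A^T:dY\,AY &= \operatorname{tr}(A\,dY\,AY) = \operatorname{tr}(AYA\,dY) = (AYA)^T:dY = A^TY^TA^T:dY \cr
A^T:YA\,dY &= \operatorname{tr}(AYA\,dY) = (AYA)^T:dY = A^TY^TA^T:dY \cr
}$$

Both terms reduce to the same form, which is where the factor of 2 comes from.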
To apply this to your question, set $\,Y\!=\!X^H\,\,$ and $\,\,A\!=\!X,\,\,$ yielding
$$\eqalign{
2\,X^TX^*X^T \cr
}$$
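As a sanity check (a minimal sketch, not part of the derivation), the gradient formula can be compared against finite differences in NumPy. The helper names `f`, `grad_Y` and `numerical_grad_Y` are made up for this illustration. $A=X$ is held fixed while $Y=X^H$ is perturbed entry by entry, i.e. $A$ and $Y$ are treated as independent variables, which is the assumption the formula rests on.

```python
import numpy as np

def f(A, Y):
    """f(Y) = tr(A Y A Y), with A held fixed."""
    return np.trace(A @ Y @ A @ Y)

def grad_Y(A, Y):
    """Claimed gradient 2 A^T Y^T A^T (plain transposes, no conjugation)."""
    return 2 * A.T @ Y.T @ A.T

def numerical_grad_Y(A, Y, h=1e-4):
    """Entrywise central differences: G[i, j] approximates df/dY[i, j]."""
    G = np.zeros_like(Y)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            E = np.zeros_like(Y)
            E[i, j] = 1.0
            G[i, j] = (f(A, Y + h * E) - f(A, Y - h * E)) / (2 * h)
    return G

rng = np.random.default_rng(0)
n = 3
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# Substitution: A = X, Y = X^H
A, Y = X, X.conj().T
max_err = np.max(np.abs(numerical_grad_Y(A, Y) - grad_Y(A, Y)))

# After the substitution the gradient should equal 2 X^T X^* X^T
closed_form = 2 * X.T @ X.conj() @ X.T
print(max_err)
```

If the formula is right, `max_err` should be at the level of floating-point noise, and `grad_Y(A, Y)` should agree with `closed_form`.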
Update
(To address some of the questions in the comments)
Since
$$\eqalign{ \operatorname{tr}(AYAY) &= \operatorname{tr}(YAYA) \cr }$$
the differential and gradient wrt $A$ can be written down by interchanging $Y\leftrightarrow A$ in the previous result, i.e.
$$\eqalign{ df &= 2\,Y^TA^TY^T:dA \cr \cr \frac{\partial f}{\partial A} &= 2\,Y^TA^TY^T \cr }$$
If you consider the function $f(A,Y)$, then its full differential is
$$\eqalign{ df &= 2\,Y^TA^TY^T:dA \,\,+\,\, 2\,A^TY^TA^T:dY \cr }$$
Rearrangements of the Frobenius product follow from its equivalence to the trace
$$A:BC=\operatorname{tr}(A^TBC)$$
and the properties of the trace wrt transposing and/or cyclically shuffling its arguments.
So, for example, the following are all equal $$\eqalign{ A:BC &= AC^T:B \cr &= B^TA:C \cr &= A^T:(BC)^T \cr }$$
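These rearrangement rules are easy to spot-check numerically. The following is a small sketch on random real matrices; `frob` is a hypothetical helper name implementing $A:B=\operatorname{tr}(A^TB)$ with a plain transpose.

```python
import numpy as np

def frob(A, B):
    """Frobenius (double-dot) product A:B = tr(A^T B)."""
    return np.trace(A.T @ B)

rng = np.random.default_rng(1)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))

lhs = frob(A, B @ C)              # A : BC
variants = [
    frob(A @ C.T, B),             # AC^T : B
    frob(B.T @ A, C),             # B^T A : C
    frob(A.T, (B @ C).T),         # A^T : (BC)^T
]
print(lhs, variants)
```

All four numbers should agree up to rounding.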
-
Thanks a lot! I just have a few questions for you. I am trying to understand how you went from the first line to the second here: $$ \eqalign{ df &= A^T:(dY\,AY+YA\,dY) \cr &= 2\,A^TY^TA^T:dY \cr } $$ When you took the total differential, it seems you didn't differentiate $A$. However, shouldn't $A$ be differentiated in the total differential, since $A=X$ and I am differentiating wrt $X$?
Also, would this work to derive the result $$ \frac{\partial}{\partial X} Tr( X X^\top X X^\top) = 4 X X^\top X? $$ I do not see how it would, since it seems to be off by a factor of 2.
– Jeff Faraci Dec 19 '16 at 13:46
Thanks for the update. This is very helpful. One last question, I am very unfamiliar with all of this matrix analysis. I see a lot of your answers are dedicated to this area. Do you have any good references for these kinds of calculations and such? This is all very new to me. For example, $$ \frac{\partial}{\partial X^H} Tr( X X^H (X X^H)^*)=? $$ is also something I want to compute, and I lack any intuition how to approach these. It's very similar to what I asked but still very unsure. Thanks. – Jeff Faraci Dec 19 '16 at 15:45
-
For references, I'd recommend: Hjorungnes "Complex Valued Matrix Derivatives", Magnus and Neudecker "Matrix Differential Calculus", Petersen and Pedersen "Matrix Cookbook" – greg Dec 20 '16 at 00:49
-
For your latest function $$\eqalign{ f &= XX^H:XX^H \cr\cr df &= 2\,XX^H:X\,dX^H \cr &= 2\,X^TXX^H:dX^H \cr\cr \frac{\partial f}{\partial X^H} &= 2\,X^TXX^H \cr \frac{\partial f}{\partial X} &= \Big(\frac{\partial f}{\partial X^H}\Big)^H\cr }$$ – greg Dec 20 '16 at 00:56
-
Thanks!! I got all those references now on PDF. I appreciate the help with the other derivative. Intuitively I know with functions that derivatives work like$$ \frac{\partial}{\partial x} f(y) =0 $$ (if $f(y)$ only depends on $y$ and not $x$ obviously). Is this true for matrix derivatives too? What I mean is, is $$ \frac{\partial}{\partial X^H} Tr[f( X, X^\top)] =0? $$ where $f(X,X^\top)$ is a function depending on $X,X^\top$. Thanks for the discussion and help, I appreciate it. – Jeff Faraci Dec 20 '16 at 15:38
-
Yes, it works the same way for matrix derivatives. Generally, operations like $(X^T, \operatorname{vec}(X))$ simply rearrange the elements of $X$. But things like $(X^*, X^H, \operatorname{vec}(X^*))$ replace the elements by their conjugates, in addition to rearranging them. It all boils down to how you handle ${\partial z^*}/{\partial z}$. Google the term "Wirtinger derivative". – greg Dec 21 '16 at 01:11
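For the scalar case, the Wirtinger convention can be illustrated numerically. This is a minimal sketch with a hypothetical helper `wirtinger`, assuming the standard definitions $\partial/\partial z = \tfrac12(\partial/\partial x - i\,\partial/\partial y)$ and $\partial/\partial z^* = \tfrac12(\partial/\partial x + i\,\partial/\partial y)$ for $z = x + iy$. Under them, $z^*$ has zero derivative wrt $z$, and $|z|^2 = z z^*$ has derivative $z$ wrt $z^*$.

```python
def wirtinger(f, z, h=1e-6):
    """Central-difference estimates of (df/dz, df/dz*) at the point z."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)            # along the real axis
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # along the imaginary axis
    d_dz = 0.5 * (dfdx - 1j * dfdy)
    d_dzbar = 0.5 * (dfdx + 1j * dfdy)
    return d_dz, d_dzbar

z0 = 0.7 - 1.3j

# f(z) = z*: expect df/dz close to 0 and df/dz* close to 1
conj_dz, conj_dzbar = wirtinger(lambda z: z.conjugate(), z0)

# f(z) = |z|^2 = z z*: expect df/dz close to z0* and df/dz* close to z0
abs2_dz, abs2_dzbar = wirtinger(lambda z: z * z.conjugate(), z0)

print(conj_dz, conj_dzbar, abs2_dz, abs2_dzbar)
```

The matrix results in this thread apply the same rule entry by entry, treating $X$ and $X^*$ as independent variables.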
-
Hi Greg, can you solve this problem? It is a 500 point bounty. I would really like to award it to you; there is still not a complete solution to it, just a partial solution. Thanks so much! http://math.stackexchange.com/questions/2089105/matrix-derivative-of-gradients/2092681?noredirect=1#comment4302844_2092681 – Jeff Faraci Jan 11 '17 at 16:10
-
@Integrals Took a look at your new question, but I don't understand how you can take a partial derivative wrt X after taking the gradient wrt the coordinates. As a simple example $$\eqalign{X&=q_1q_2^2\cr G_1&=\nabla_1X=q_2^2\cr G_2&=\nabla_2X=2q_1q_2 \cr}$$ but now the $G_k$ gradients do not appear to be functions of $X$, so how would you go about finding $$\frac{\partial G_k}{\partial X}$$? – greg Jan 12 '17 at 05:10
-
@greg I misspoke about the derivative I needed; what I want is identical to this SIMPLER example (gradients are both wrt $i$ and $X$ is a scalar): $$ I=\int (\nabla_i X \cdot \nabla_i X) $$ $$ \delta I=\int \delta(\nabla_i X \cdot \nabla_i X)= \int (2 \nabla_i X \cdot \delta(\nabla_i X))=\int (2\nabla_i X\cdot \nabla_i (\delta X) )=-\int (2\nabla_i (\nabla_i X)(\delta X)) $$ In the question I posted, I have $$ I=\int (\nabla_j X_{\alpha j} \cdot \nabla_k \bar{X}_{\alpha k}) $$ and I now need: $$ \delta I=\int \delta ((\nabla_j X_{\alpha j} \cdot \nabla_k \bar{X}_{\alpha k})) $$ – Jeff Faraci Jan 12 '17 at 11:16
-
Also, just in reply to your comment/simple example: if $$ G_2=2q_1 q_2 $$ this is actually a function of $X$. Using your $X=q_1 q^2_2$, we can write this as $$ G_2=2X/q_2 $$ (Sorry if I'm wrong, I'm just saying what I think, so don't take it too seriously.) I really appreciate your help. Anyway, taking the differential of the integral I have above should be fine to do; I just get confused with the different indices on the gradients and the matrix indices. I see you're really great with tensor analysis and I learn by going through all of your posts, so let me know. Thanks again. – Jeff Faraci Jan 12 '17 at 11:26
-
The bounty is here (a partial answer to the problem has also been posted): http://math.stackexchange.com/questions/2089105/matrix-derivative-of-gradients
but it only does the 'simpler' example (which is the only case I understand).
– Jeff Faraci Jan 12 '17 at 11:33
@Integrals But if you allow the mixing of $X$'s and $q$'s then I could write $$\eqalign{G_2&=\frac{2X^\beta}{q_2(q_1q_2^2)^{\beta-1}}\cr\frac{\partial G_2}{\partial X}&=\frac{2\beta}{q_2} }$$for arbitrary values of $\beta$ -- large or small, positive or negative. And that's the thing that has me stumped. How do I take the partial derivative (wrt $X$) of a gradient (wrt $q$)? – greg Jan 12 '17 at 18:52
-
@greg In regard to this solution (this actual problem), you wrote $$ \eqalign{ df &= A^T:(dY\,AY+YA\,dY) \cr &= 2\,A^TY^TA^T:dY \cr }$$ How did you go from the first to the second line? I just cannot see it. It seems you did more than one step beyond what's posted here. What I mean is, how did you actually get this? $$ df = A^\top : (dYAY+YAdY)=A^\top:(2YA dY)? $$ Thanks! – Jeff Faraci Jan 19 '17 at 20:17
-
@Integrals Since $(A^T:dY\,AY=A^TY^TA^T:dY)$ and $(A^T:YA\,dY=A^TY^TA^T:dY)$ the two terms were re-arranged to the same form. I just combined them and put a "2" in front. – greg Jan 20 '17 at 02:31
-
Thanks. That actually cleared up a lot. I didn't realize you were acting AY on the right for one case, then acting YA on the left for the other case. Now it all makes sense. – Jeff Faraci Jan 21 '17 at 09:14
-
Do you know if $\frac{\partial}{\partial X^H} |Tr(X X^\top)|^2=0?$ I posted it as a question since I figure you would know. I thought it was zero at first, but then I thought more about it: when I write it out, $|Tr(X X^\top)|^2=Tr(XX^\top)\overline{Tr(XX^\top)}$, does the complex conjugate of that trace now affect the derivative? I thought of just a simple complex number $z$, $|z|^2=z\overline{z}$. Thanks so much. Feel free to answer the question and I'll accept it. – Jeff Faraci Jan 31 '17 at 02:00
Writing it out in terms of components, you get $$\frac{\partial}{\partial \bar{X}_{ij}} \mathop{\rm tr}(X X^H X X^H) =\frac{\partial}{\partial \bar{X}_{ij}} ( X_{kl} \bar{X}_{ml} X_{mn} \bar{X}_{kn}), $$ where for some components you need to compute $\frac{d\bar{z}}{dz}$, which doesn't exist, since the function $z\mapsto \bar{z}$ is not complex differentiable (that is, not holomorphic).
-
Thanks. The derivative $$\frac{\partial}{\partial X^H} Tr(X X^H) = X$$ exists. How come this result does not exist? I do not see why the simpler one does but this one doesn't, since this can just be written as $$\frac{\partial}{\partial X^H} Tr ((X X^H)^2)$$ Also note, $$ \frac{\partial}{\partial X} Tr (X X^\top X X^\top)= 4 X X^\top X $$ so I am still not convinced this result does not exist. Can you perhaps provide a complete proof? If so I will accept it as an answer; otherwise this is not complete enough to accept. Thanks! – Jeff Faraci Dec 19 '16 at 13:39
-
Note, $X$ is a complex square matrix and $X^H$ is the hermitian transpose of this matrix. Will this link help you? http://math.stackexchange.com/questions/258521/how-to-do-frac-partial-mathrmtrxxtxxt-partial-x?rq=1 – Jeff Faraci Dec 19 '16 at 13:50
-
It seems that we misunderstand the notation $\frac{\partial \phi(X)}{\partial X} ,$ where $\phi : \mathcal{M}_n(\mathbb{C}) \to \mathbb{C}.$ – A. PI Dec 19 '16 at 13:54
-
Well, I do not understand that notation, so I am not sure what you're talking about. I am trying to calculate exactly what's in that link above, except the difference is that the transpose is now the complex transpose. Thanks – Jeff Faraci Dec 19 '16 at 13:55
-
That's why I have doubts. You might have a look at this definition: https://en.wikipedia.org/wiki/Matrix_calculus#Derivatives_with_matrices (the paragraph "Derivatives with matrices"). Further, I don't think that by $\frac{\partial}{ \partial X}$ you refer to the "Fréchet derivative," since in the first two cases in your original post, the derivatives were computed in the sense of the link above. – A. PI Dec 19 '16 at 14:06
-
You cannot even explain what you mean by this notation, and yet you are looking for the answer! This really surprises me! (you can tell me) is not a mathematical proof! Thank you again. Finally, you might keep the tag (complex analysis). – A. PI Dec 19 '16 at 14:16
-
I answered regarding what I understood from your OP. BUT you don't understand YOUR question, so sure you won't understand the answer. – A. PI Dec 19 '16 at 14:20
-
Please see the above answer by Greg for a correct solution. Also, please let's avoid extended discussions here, as this is the policy on this site. Thanks. – Jeff Faraci Dec 19 '16 at 15:42