
Let $f: \mathbb{C}^{N\times M}\rightarrow \mathbb{R}$ and $g: \mathbb{R}^{N\times M}\rightarrow \mathbb{C}^{N \times M}$ with $N\geq M$, and let $F = f \circ g$. I am trying to compute the gradient of $F$ with respect to $\mathbf{X} \in \mathbb{R}^{N\times M}$, i.e., $\nabla_\mathbf{X} f(g(\mathbf{X}))$, but I am struggling with the chain rule because the intermediate variable is complex. What are the dimensions of the final gradient matrix?

As an example, take $g(\mathbf{X})=e^{i\mathbf{X}}$ and $f(\mathbf{Y})=\| \mathbf{A}-\mathbf{YB}\|_F^2$, where $\mathbf{A}$ and $\mathbf{B}$ are complex as well.

Thank you in advance.

GPope

1 Answer


Let $E=\exp(iX)$ and $M=A-EB$. Then your example concerns the function $$\eqalign{ \def\LR#1{\left(#1\right)} \def\c#1{\color{red}{#1}} \def\CLR#1{\c{\LR{#1}}} \def\op#1{\operatorname{#1}} \def\trace#1{\op{Tr}\LR{#1}} \phi(X) &= \|A-EB\|_F^2 \cr &= \LR{A-EB}^*:\CLR{A-EB} \cr &= M^*:\c{M} \cr }$$ where $M^*$ is the complex conjugate of $M$ (no transpose) and a colon denotes the trace/Frobenius product, i.e. $A:B=\trace{A^TB}$.
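As a quick sanity check on the notation, the following NumPy sketch compares the squared Frobenius norm with the trace form $M^*:M$ and with the plain elementwise sum. The helper name `frob`, the random seed, and the $4\times 3$ test size are illustrative assumptions, not part of the derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

def frob(A, B):
    """Frobenius (trace) product A:B = Tr(A^T B); note: no conjugation."""
    return np.trace(A.T @ B)

# Hypothetical complex test matrix standing in for M = A - EB.
M = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))

norm_sq = np.linalg.norm(M, "fro") ** 2   # ||M||_F^2
trace_form = frob(M.conj(), M)            # M^* : M
sum_form = np.sum(M.conj() * M)           # sum of conj(M_ij) * M_ij

print(np.isclose(norm_sq, trace_form), np.isclose(trace_form, sum_form))
```

Because the colon product itself does not conjugate, the conjugate has to be written explicitly, which is why the norm appears as $M^*:M$ rather than $M:M$.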

Calculate the Wirtinger differential of this function $$\eqalign{ d\phi \;=\; M^*:dM + M:dM^* \;=\; 2\,{\mathcal Re}(M^*:dM)\cr }$$ Continuing $$\eqalign{ M^*:dM &= -M^*:dE\,B \cr &= \c{-M^*B^T}:d\exp(iX) \cr &= \c{C}:d\exp(iX) \cr &= C:d\LR{\sum_{k=0}^\infty q_kX^k} \\ &= C:\sum_{k=1}^\infty q_k\sum_{j=1}^kX^{j-1}\,dX\,X^{k-j} \cr &= \CLR{\sum_{k=1}^\infty q_k\sum_{j=1}^k\:X^{k-j}C^TX^{j-1}}^{\c T}:dX \cr &= \c{G}:dX \cr }$$ where, in addition to the Taylor series for the exponential $\LR{{\rm with\,\,} q_k=\frac{i^k}{k!}},\;$ I have introduced the matrices $(C,G)$ to hide some messy expressions.

Now we are in a position to write (recalling that $X$ is real) $$\eqalign{ d\phi &= (G+G^*):dX \cr \frac{\partial\phi}{\partial X} &= (G+G^*) \;=\; 2\,{\mathcal Re}(G) \cr }$$
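The series formula can be checked numerically. The sketch below assumes a square real $X$ (so that the matrix exponential is defined), truncates both Taylor series at a fixed number of terms, and compares $2\,{\rm Re}(G)$ with a central finite difference. The function names, the sizes, and the truncation length are all illustrative choices.

```python
import numpy as np

def expm_series(Z, terms=40):
    """Truncated Taylor series for the matrix exponential of a square Z."""
    term = np.eye(Z.shape[0], dtype=complex)
    out = term.copy()
    for k in range(1, terms):
        term = term @ Z / k
        out = out + term
    return out

def phi_expm(X, A, B):
    M = A - expm_series(1j * X) @ B
    return np.linalg.norm(M, "fro") ** 2

def grad_phi_expm(X, A, B, terms=40):
    """Gradient 2*Re(G), with G built from the truncated double sum."""
    n = X.shape[0]
    M = A - expm_series(1j * X, terms) @ B
    C = -M.conj() @ B.T
    powers = [np.eye(n)]
    for _ in range(terms):
        powers.append(powers[-1] @ X)
    S = np.zeros((n, n), dtype=complex)
    q = 1.0 + 0j                       # q_0 = 1
    for k in range(1, terms):
        q = q * 1j / k                 # q_k = i^k / k!
        for j in range(1, k + 1):
            S += q * (powers[k - j] @ C.T @ powers[j - 1])
    G = S.T
    return 2 * G.real

def numeric_grad(f, X, h=1e-6):
    """Central finite differences, one entry of X at a time."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = h
        G[idx] = (f(X + step) - f(X - step)) / (2 * h)
    return G

rng = np.random.default_rng(1)
n, p = 3, 2                            # hypothetical sizes
X = rng.standard_normal((n, n))
A = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
B = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))

G_analytic = grad_phi_expm(X, A, B)
G_numeric = numeric_grad(lambda Z: phi_expm(Z, A, B), X)
print(np.max(np.abs(G_analytic - G_numeric)))
```

The printed value is the largest entrywise gap between the two gradients; it should be small relative to the gradient entries if the truncation is long enough for the chosen $X$.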

Update

After writing the above, I noticed that your matrices are rectangular, which means you are applying the exponential function element-wise.

This makes the Taylor series unnecessary (and the result much simpler) because $$dE = iE\odot dX \qquad\quad $$
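A small numerical check of this elementwise differential, assuming NumPy and an arbitrary $5\times 3$ test size: for a small real perturbation $t\,dX$, the change in $E$ should match $iE\odot(t\,dX)$ up to second-order terms.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))    # hypothetical real N x M input
dX = rng.standard_normal((5, 3))   # arbitrary real direction
t = 1e-6                           # small step size

E = np.exp(1j * X)                 # elementwise exponential
dE_formula = 1j * E * (t * dX)     # iE (Hadamard) dX, scaled by t
dE_actual = np.exp(1j * (X + t * dX)) - E

max_err = np.max(np.abs(dE_actual - dE_formula))
print(max_err)                     # expected to be of order t**2
```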

Picking up midway through the previous derivation, $$\eqalign{ M^*:dM &= C:dE \\ &= C:(iE\odot dX) \\&= \CLR{iE\odot C}:dX \\&= \c{H}:dX \\ \frac{\partial\phi}{\partial X} &= (H+H^*) \;=\; 2\,{\mathcal Re}(H) \\ }$$ where $\odot$ denotes the elementwise/Hadamard product.
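The elementwise result is easy to verify with finite differences. In the sketch below, $P$ (the number of columns of $A$ and $B$) and all function names are assumptions introduced for illustration. Since $E$ is $N\times M$, $M^*$ is $N\times P$, and $B^T$ is $P\times M$, the gradient $2\,{\rm Re}(iE\odot C)$ comes out as a real $N\times M$ matrix, the same shape as $X$.

```python
import numpy as np

def phi_elementwise(X, A, B):
    E = np.exp(1j * X)                   # elementwise exponential
    return np.linalg.norm(A - E @ B, "fro") ** 2

def grad_phi_elementwise(X, A, B):
    """Gradient 2*Re(H) with H = iE (Hadamard) C and C = -M^* B^T."""
    E = np.exp(1j * X)
    M = A - E @ B
    C = -M.conj() @ B.T
    H = 1j * E * C
    return 2 * H.real

def numeric_grad(f, X, h=1e-6):
    """Central finite differences, one entry of X at a time."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = h
        G[idx] = (f(X + step) - f(X - step)) / (2 * h)
    return G

rng = np.random.default_rng(3)
N, M_cols, P = 5, 3, 4                   # hypothetical sizes with N >= M
X = rng.standard_normal((N, M_cols))
A = rng.standard_normal((N, P)) + 1j * rng.standard_normal((N, P))
B = rng.standard_normal((M_cols, P)) + 1j * rng.standard_normal((M_cols, P))

grad_analytic = grad_phi_elementwise(X, A, B)
grad_numeric = numeric_grad(lambda Z: phi_elementwise(Z, A, B), X)
print(grad_analytic.shape, np.max(np.abs(grad_analytic - grad_numeric)))
```

A real-valued gradient of this form can be passed directly to a standard real optimizer such as gradient descent on $X$.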

greg
  • Thanks for the answer, Greg. My problem seemed to be with the use of the element-wise (Hadamard) product. Anyway, I took a different path and came to the same result, but with less elegant calculations, to say the least. By the way, could you point me to a reference (book) on complex-valued differentials using the product (trace or inner) formulation? Thanks again for the answer. – GPope Oct 22 '18 at 22:59
  • @G.Papas If you're using element-wise functions, then you must use an element-wise product to express the gradient. There's no way to avoid it. A book recommendation is Complex-Valued Matrix Derivatives by Are Hjorungnes; he's also authored some IEEE Journal articles with a similar name. – greg Oct 23 '18 at 15:26