
I'm calculating the gradient of a loss function with respect to some complex matrices. The loss function is defined as $\mathcal{L}=||I(x)-\widetilde{I}(x)||_2^2$, where $I(x)$ is the output from a mathematical model, and $\widetilde{I}(x)$ is the observation data.

Specifically,

$I(x)=w_1|E_1(x)|^2 + w_2|E_2(x)|^2 + w_3[E_1^*(x)E_2(x)+E_1(x)E_2^*(x)]$,

where $^*$ denotes the complex conjugate, and $w_1$, $w_2$, and $w_3$ are real scalars.

In addition,

$E_1(x)=\mathcal{F}[P_1(X)e^{i\phi(X)}]$ and $E_2(x)=\mathcal{F}[P_2(X)e^{i\phi(X)}]$,

where $\mathcal{F}$ denotes the Fourier transform.

In the above formulas, $I$ and $\phi$ are real matrices, while $E_1$, $E_2$, $P_1$, and $P_2$ are complex matrices.
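To make the setup concrete, here is a minimal NumPy sketch of the forward model and loss. It rests on assumptions that are not stated in the question: every product is taken elementwise (pixel by pixel), $\mathcal{F}$ is the unnormalized 2-D FFT `np.fft.fft2`, and the squared norm is the sum of squared residuals over all pixels. The function names are hypothetical.

```python
import numpy as np

def forward_intensity(P1, P2, phi, w1, w2, w3):
    """I = w1|E1|^2 + w2|E2|^2 + w3(E1* E2 + E1 E2*), all products elementwise (assumed)."""
    M = np.exp(1j * phi)              # shared phase factor e^{i phi}
    E1 = np.fft.fft2(P1 * M)          # E1 = F[P1 e^{i phi}]
    E2 = np.fft.fft2(P2 * M)          # E2 = F[P2 e^{i phi}]
    # E1* E2 + E1 E2* equals 2 Re(E1* E2), so the intensity is real
    cross = 2.0 * np.real(np.conj(E1) * E2)
    return w1 * np.abs(E1) ** 2 + w2 * np.abs(E2) ** 2 + w3 * cross

def loss(I, I_obs):
    """Sum of squared residuals between model and observation."""
    return np.sum((I - I_obs) ** 2)
```

Because the cross term is twice a real part, `forward_intensity` returns a real array even though `P1` and `P2` are complex.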

So, how can I calculate the gradient of the loss function $\mathcal{L}$ with respect to $P_1(X)$ and $P_2(X)$, respectively?

I derived the gradient as

$\frac{\partial\mathcal{L}}{\partial P_1(X)} = e^{-i\phi(X)}\mathcal{F}^{-1}\{2[2w_1E_1(x)+w_3E_2(x)][I(x)-\widetilde{I}(x)]\} + e^{i\phi(X)}\mathcal{F}\{2w_3E_2^*(x)[I(x)-\widetilde{I}(x)]\}$

Is this correct? Moreover, I'm unsure whether I should include the gradient with respect to the complex conjugate $E_1^*(x)$, which corresponds to the second term in the above formula.

Xin Liu

1 Answer


Here's how you can calculate the Wirtinger derivative of the loss function, assuming that you are using a (Fast) Fourier Transform.

$ \def\s{\odot} \def\h{\odot} \def\VV#1{\left\vert #1\right\vert} {\sf NB\!:}\ $ I interpret your $\VV{X}^2$ operation elementwise, i.e. $$\eqalign{ &\VV{X}^2 = X^*\h X }$$ where $\h$ denotes the Hadamard (elementwise) product.


$ \def\L{{\cal L}} \def\o{{\tt1}} \def\bR#1{\Big(#1\Big)} \def\BR#1{\Big[#1\Big]} \def\KR#1{\left\{#1\right\}} \def\LR#1{\left(#1\right)} \def\op#1{\operatorname{#1}} \def\F{{\large\mathbb F}} \def\Fi{{\large\mathbb F}^{-1}} \def\trace#1{\op{Tr}\LR{#1}} \def\frob#1{\left\| #1 \right\|_F} \def\q{\quad} \def\qq{\qquad} \def\qif{\q\iff\q} \def\qiq{\q\implies\q} \def\p{\partial} \def\grad#1#2{\frac{\p #1}{\p #2}} \def\red#1{\color{red}{#1}} \def\blue#1{\color{blue}{#1}} \def\CLR#1{\red{\LR{#1}}} \def\fft#1{\op{\blue{FFT}}\KR{#1}} \def\ifft#1{\op{\blue{iFFT}}\LR{#1}} $First, define the Frobenius product (denoted by a colon) $$\eqalign{ A:B &= \sum_{i=1}^m\sum_{j=1}^n A_{ij}B_{ij} \;=\; \trace{A^TB} \\ B^*\!:B &= \frob{B}^2 \qquad \{ {\rm Frobenius\;norm} \}\\ }$$

The properties of the underlying trace function allow the terms in such products to be rearranged in many useful ways, e.g. $$\eqalign{ &A:B = B:A \;=\; B^T:A^T \\ &A:\LR{B\h C} = \LR{A\h B}:C \\ &A:\LR{XY} = \LR{AY^T}:X \;=\; \LR{X^TA}:Y \\ &A:\fft B = \fft A:B \\\\ }$$
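These rearrangement rules are easy to sanity-check numerically. The sketch below is only an illustration, assuming NumPy's unnormalized `np.fft.fft2` as the FFT and small random complex matrices; `frob` and `crandn` are hypothetical helper names. Note that the Frobenius product defined above involves no complex conjugation.

```python
import numpy as np

def frob(A, B):
    """Frobenius product A:B = sum_ij A_ij B_ij (no conjugation)."""
    return np.sum(A * B)

rng = np.random.default_rng(0)

def crandn(m, n):
    """Random complex m-by-n matrix."""
    return rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

n = 4
A, B, C, X, Y = (crandn(n, n) for _ in range(5))

checks = {
    "norm":      np.isclose(frob(B.conj(), B), np.linalg.norm(B, "fro") ** 2),
    "transpose": np.isclose(frob(A, B), frob(B.T, A.T)),
    "hadamard":  np.isclose(frob(A, B * C), frob(A * B, C)),
    "matmul_X":  np.isclose(frob(A, X @ Y), frob(A @ Y.T, X)),
    "matmul_Y":  np.isclose(frob(A, X @ Y), frob(X.T @ A, Y)),
    # holds because the DFT matrix is symmetric
    "fft":       np.isclose(frob(A, np.fft.fft2(B)), frob(np.fft.fft2(A), B)),
}
print(checks)
```

The FFT rule relies on the DFT matrix being symmetric, which is why the transform can be moved from one factor to the other without a conjugate or an inverse appearing.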


Next, calculate the differential of the $E_2(x)$ function $$\eqalign{ \def\e{e^{i\phi}} M &= \e, \q E_2 = \fft{P_2M} \qiq dE_2 = \fft{dP_2\:M} \\\\ }$$


Then take the differential of $I(x)$ with respect to $E_2$, holding $E_1$ and the conjugate $E_2^*$ fixed (the Wirtinger convention) $$\eqalign{ I &= w_1E_1^*\s E_1 + w_2E_2^*\s\red{E_2} + w_3E_1^*\red{E_2} + w_3E_1 E_2^* \\ dI &= w_2E_2^*\s\red{dE_2} + w_3E_1^*\ \red{dE_2} \\ }$$


Then the differential of the loss function with respect to $P_2$ $$\eqalign{ \def\I{\widetilde{I}} J &= (I-\I) \qiq dJ=dI \\ \L &= J:J \\ d\L &= 2J:\red{dJ} \\ &= 2J:\bR{ w_2E_2^*\s\red{dE_2} + w_3E_1^*\ \red{dE_2} } \\ &= 2\bR{J\h w_2E_2^* + w_3E_1^HJ}:\red{dE_2} \\ &= 2\bR{J\h w_2E_2^* + w_3E_1^HJ}:\fft{dP_2\:M} \\ &= 2\,\fft{J\h w_2E_2^* + w_3E_1^HJ}:\LR{dP_2\:M} \\ &= 2\,\fft{J\h w_2E_2^* + w_3E_1^HJ}\,M^T:dP_2 \\ }$$ where $\,E_1^H$ denotes the Hermitian conjugate (conjugate transpose) of $E_1$.


Now the gradient (in the Wirtinger sense) can be identified as $$\eqalign{ \grad\L{P_2} &= 2\,\fft{(I-\I)\h w_2E_2^* + w_3E_1^H(I-\I)}\,\LR{\e}^T \\ }$$
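One way to gain confidence in this expression is a finite-difference check. The sketch below follows the derivation literally, which means it assumes that $P_2M$ and the cross terms $E_1^*E_2$, $E_1E_2^*$ are ordinary matrix products, that only the $|\cdot|^2$ terms are Hadamard products, and that the FFT is NumPy's unnormalized `np.fft.fft2`. All names and test values are hypothetical. Since $\mathcal{L}$ is real, its directional derivative along a complex perturbation $D$ of $P_2$ should equal $2\,\mathrm{Re}\,(G:D)$, where $G$ is the Wirtinger gradient.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

def crandn():
    """Random complex n-by-n matrix."""
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

w1, w2, w3 = 0.7, 1.3, 0.4               # arbitrary real weights
M = np.exp(1j * rng.standard_normal((n, n)))   # M = e^{i phi}
P1, P2 = crandn(), crandn()
I_obs = rng.standard_normal((n, n))      # stand-in for the observed data

def fields(P1, P2):
    return np.fft.fft2(P1 @ M), np.fft.fft2(P2 @ M)

def residual(P1, P2):
    E1, E2 = fields(P1, P2)
    I = (w1 * E1.conj() * E1 + w2 * E2.conj() * E2
         + w3 * E1.conj() @ E2 + w3 * E1 @ E2.conj())
    return I.real - I_obs                # imaginary part vanishes up to rounding

def loss(P1, P2):
    J = residual(P1, P2)
    return np.sum(J * J)

def grad_P2(P1, P2):
    """Wirtinger gradient: 2 FFT{J o w2 E2* + w3 E1^H J} M^T."""
    E1, E2 = fields(P1, P2)
    J = residual(P1, P2)
    K = J * w2 * E2.conj() + w3 * E1.conj().T @ J
    return 2 * np.fft.fft2(K) @ M.T

D = crandn()                             # random complex perturbation direction
eps = 1e-6
fd = (loss(P1, P2 + eps * D) - loss(P1, P2 - eps * D)) / (2 * eps)
analytic = 2 * np.real(np.sum(grad_P2(P1, P2) * D))
print(fd, analytic)
```

If the two printed numbers agree to several significant digits, the gradient formula is consistent with the loss as implemented here. A steepest-descent update would then move $P_2$ along the conjugate of this gradient, since for a real loss $\partial\mathcal{L}/\partial P_2^* = (\partial\mathcal{L}/\partial P_2)^*$.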

The calculation for ${\grad\L{P_1}}$ is similar.
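For reference, repeating the same steps with $E_1$ in place of $E_2$ (a sketch under the same conventions as above, namely holding $E_1^*$ and $E_2$ fixed and reading the cross terms as matrix products) would give $$\frac{\partial\mathcal{L}}{\partial P_1} = 2\,\operatorname{FFT}\left\{(I-\widetilde{I})\odot w_1E_1^* + w_3\,(I-\widetilde{I})\,E_2^H\right\}\left(e^{i\phi}\right)^T$$

The only structural change is that $E_1$ enters the cross term $w_3E_1E_2^*$ from the left, so the rule $A:(XY)=(AY^T):X$ places $E_2^H$ to the right of $(I-\widetilde{I})$ instead of $E_1^H$ to its left.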

greg