I'm trying to understand the following excerpt from a paper:
Subproblem 1: computing $S$. The $S$ estimation subproblem corresponds to minimizing $$ \sum_{p}\left(S_p - I_p\right)^2 + \beta\left(\left(\partial_x S_p - h_p\right)^2 + \left(\partial_y S_p - v_p\right)^2\right) \tag{7}\label{7} $$ obtained by omitting the terms not involving $S$ in Eq. (6). The function is quadratic and thus has a global minimum that can be reached even by gradient descent. **Alternatively, we diagonalize derivative operators after Fast Fourier Transform (FFT) for speedup.** [Emphasis added] This yields the solution $$ S = \mathscr{F}^{-1}\left( \frac{\mathscr{F}(I) + \beta\left(\mathscr{F}(\partial_x)^*\mathscr{F}(h) + \mathscr{F}(\partial_y)^*\mathscr{F}(v)\right)}{\mathscr{F}(1) + \beta\left(\mathscr{F}(\partial_x)^*\mathscr{F}(\partial_x) + \mathscr{F}(\partial_y)^*\mathscr{F}(\partial_y)\right)} \right) \tag{8}\label{8} $$ where $\mathscr{F}$ is the FFT operator and $\mathscr{F}(\cdot)^*$ denotes the complex conjugate. $\mathscr{F}(1)$ is the Fourier transform of the delta function. The plus, multiplication, and division are all component-wise operators. Compared to minimizing Eq. $\eqref{7}$ directly in the image space, which involves a very large matrix inversion, computation in the Fourier domain is much faster due to the simple component-wise division.
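Here is how far I got on my own. I am assuming, although the excerpt doesn't say so explicitly, that $\partial_x$ and $\partial_y$ are discrete difference operators implemented as convolutions with periodic boundary conditions, so that the FFT diagonalizes them. Setting the gradient of the quadratic in Eq. $\eqref{7}$ with respect to $S$ to zero gives the normal equations $$ \left(1 + \beta\left(\partial_x^\top\partial_x + \partial_y^\top\partial_y\right)\right)S = I + \beta\left(\partial_x^\top h + \partial_y^\top v\right), $$ where $\partial^\top$ is the adjoint of the corresponding convolution. Because the FFT turns circular convolution into component-wise multiplication, and the transform of the adjoint of a real convolution kernel is the complex conjugate $\mathscr{F}(\partial)^*$, applying $\mathscr{F}$ to both sides would give $$ \left(\mathscr{F}(1) + \beta\left(\mathscr{F}(\partial_x)^*\mathscr{F}(\partial_x) + \mathscr{F}(\partial_y)^*\mathscr{F}(\partial_y)\right)\right)\mathscr{F}(S) = \mathscr{F}(I) + \beta\left(\mathscr{F}(\partial_x)^*\mathscr{F}(h) + \mathscr{F}(\partial_y)^*\mathscr{F}(v)\right), $$ and component-wise division followed by $\mathscr{F}^{-1}$ then matches Eq. $\eqref{8}$.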
I don't fully understand how Eq. $\eqref{8}$ was derived; the attempt above is my best guess. Also, how can the FFT speed up gradient descent? Or is the Fourier-domain solve meant to replace gradient descent entirely?
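To make the question concrete, here is a minimal numpy sketch of what I believe Eq. $\eqref{8}$ computes. The forward-difference stencils, the periodic boundary handling, and all function names are my own assumptions, not anything from the paper:

```python
import numpy as np

def solve_S(I, h, v, beta):
    """Closed-form S-subproblem solve in the Fourier domain (my reading of Eq. (8))."""
    H, W = I.shape
    # Forward-difference kernels with periodic wrap-around (my assumption):
    # circular convolution with dx gives S[y, x+1] - S[y, x], and with dy
    # gives S[y+1, x] - S[y, x].
    dx = np.zeros((H, W)); dx[0, 0] = -1.0; dx[0, -1] = 1.0
    dy = np.zeros((H, W)); dy[0, 0] = -1.0; dy[-1, 0] = 1.0

    Fdx, Fdy = np.fft.fft2(dx), np.fft.fft2(dy)
    numer = (np.fft.fft2(I)
             + beta * (np.conj(Fdx) * np.fft.fft2(h)
                       + np.conj(Fdy) * np.fft.fft2(v)))
    # F(1), the transform of the delta kernel, is identically 1.
    denom = 1.0 + beta * (np.abs(Fdx) ** 2 + np.abs(Fdy) ** 2)
    return np.real(np.fft.ifft2(numer / denom))

def energy(S, I, h, v, beta):
    """Objective of Eq. (7) with the same periodic forward differences."""
    gx = np.roll(S, -1, axis=1) - S
    gy = np.roll(S, -1, axis=0) - S
    return np.sum((S - I) ** 2 + beta * ((gx - h) ** 2 + (gy - v) ** 2))

# Sanity check: perturbing the closed-form solution should never lower
# the (convex quadratic) objective.
rng = np.random.default_rng(0)
I, h, v = rng.standard_normal((3, 32, 32))
S = solve_S(I, h, v, beta=2.0)
for _ in range(5):
    S_pert = S + 1e-3 * rng.standard_normal(S.shape)
    assert energy(S, I, h, v, 2.0) <= energy(S_pert, I, h, v, 2.0)
```

If this sketch is right, the whole solve costs a handful of FFTs plus one component-wise division, with no iteration at all, which makes me suspect "speedup" means faster than an iterative or direct image-domain solver rather than a faster gradient descent.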
Source:
Li Xu, Cewu Lu, Yi Xu, Jiaya Jia, "Image Smoothing via L0 Gradient Minimization," ACM Transactions on Graphics, Vol. 30, No. 5 (SIGGRAPH Asia 2011), Dec. 2011. The quoted passage is on page 4 of the PDF.