Is gradient descent nothing other than discretized gradient flow?

Question

I can find definitions of the gradient flow of a scalar field $f$ as $$\frac{d \xi}{dt} = - \nabla_\xi f$$ in here and here.

Gradient descent can be used to find a minimum in $f$ and can be written as $$\xi_{i+1} = \xi_{i} - \lambda \nabla_\xi f,$$ where $\lambda$ is a constant scalar.

As far as I understand, gradient descent is nothing other than discretized gradient flow. So I set $\lambda = 1$ to obtain $$\frac{d \xi}{dt} = \lim_{\Delta t \rightarrow 0} \frac{\xi_{i+1} - \xi_i}{\Delta t} = - \lim_{\Delta t \rightarrow 0} \frac{\nabla_\xi f}{\Delta t} = - \frac{d}{dt} \nabla_\xi f \stackrel{???}{=} - \nabla_\xi f = \frac{d \xi}{dt}.$$

This does not make much sense to me... Shouldn’t both $\frac{d \xi}{dt}$ give the same? What mistake am I making? Could anyone help me, please?

I'm a bit confused about the question -- do you want to show the equality $\stackrel{???}{=}$? — blue_egg, Mar 24 '21 at 17:00
@RodrigodeAzevedo Thank you for this excellent book recommendation! If someone looks for more info on gradient flow, I suggest having a look at appendix C.10 Riemannian Metrics and Gradient Flows, pp. 360 (or pp. 371 in the PDF), which provides a good introduction to gradient flow. — yavagi, Mar 25 '21 at 00:01

score 6 · Accepted Answer · answered Mar 24 '21 at 17:02

6

What you are missing is that $\lambda$ "is" the time scale $\Delta t$. So you should rather interpret the first formula as $$\xi_{t+\Delta t}:=\xi_t - (\Delta t)\nabla_{\xi_t}f,$$ and with this you get what you want.

answered Mar 24 '21 at 17:02

Daniel Robert-Nicoud

30,663

2

Thank you for your answer! We could even go so far as to state that $\lambda$ "is" not only $\Delta t$ but $\lambda' \Delta t$, where $\lambda' > 0$ governs the "speed of flow" of $\xi$ through the gradient field $\nabla_\xi f$. For $\Delta t → 0$ and $\lambda' = 1$, gradient descent and gradient flow converge. – yavagi Mar 24 '21 at 23:49

score 4 · Answer 2 · answered Mar 24 '21 at 17:01

I think the confusion might stem from the ambiguity in notation. The definition of gradient flow is $\frac{d \xi}{dt}(t_0) = - \nabla f(\xi(t_0))$, where the right-hand side is the gradient of $f$ evaluated at $\xi(t_0)$. (This is closer to the notation used in the second link.)

Further, the discrete step is $\xi_{i+1} = \xi_i - \lambda \nabla f(\xi_i)$. This is simply a rearrangement of $$-\nabla f(\xi(t_i)) = \frac{d\xi}{dt}(t_i) \approx \frac{\xi(t_i + \lambda) - \xi(t_i)}{\lambda},$$ where $t_i$ is such that $\xi_i = \xi(t_i)$. In particular, $\lambda$ is the parameter that should be driven to zero to see the asymptotic correspondence with gradient flow; you cannot just set it at $\lambda=1$.

Is gradient descent nothing other than discretized gradient flow?

2 Answers2