In the paper Score-Based Generative Modeling through Stochastic Differential Equations (Song et al.), it is explained that we need to solve the reverse-time SDE to obtain samples from the image distribution $p_{0}$:
$$ \text{d} \mathbf{x} = \left[\mathbf{f}(\mathbf{x},t) - g(t)^{2} \nabla_{\mathbf{x}}\log p_{t}(\mathbf{x})\right]\text{d}t + g(t)\,\text{d} \mathbf{\overline{w}} $$
Thus, we need to estimate the score $\nabla_{\mathbf{x}}\log p_{t}(\mathbf{x})$ to solve this equation, and we train a neural network to predict it (with score matching, sliced score matching, etc.).
However, in practice, I've seen many codebases (even recent ones) train their neural networks to predict the noise $\varepsilon$ given $\mathbf{x}_{t}$ and $t$ (as in DDPM).
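To make concrete what I mean, here is a minimal sketch (PyTorch, with hypothetical names such as `model` and `alpha_bar`) of the DDPM-style training step these codebases typically use, where the target is the noise rather than the score:

```python
import torch

def ddpm_training_step(model, x0, alpha_bar, optimizer):
    """One DDPM-style step: the network is trained to predict the noise eps.
    `model` (takes x_t and t) and `alpha_bar` (cumulative noise schedule,
    1-D tensor) are hypothetical names used for illustration."""
    b = x0.shape[0]
    t = torch.randint(0, len(alpha_bar), (b,), device=x0.device)
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(b, *([1] * (x0.dim() - 1)))
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps    # Gaussian forward perturbation
    loss = ((model(x_t, t) - eps) ** 2).mean()    # regress the noise, not the score
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```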
So I'm trying to understand the connection between the noise $\varepsilon$ and the score $\nabla_{\mathbf{x}}\log p_{t}(\mathbf{x})$.
I know that for Gaussian transition kernels the training objective is the same (up to a coefficient, see the first paper), but this seems very restrictive, since the Gaussian transition kernel assumption is only valid for an affine drift coefficient $\mathbf{f}(\mathbf{x},t)$.
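For context, my understanding of that connection under the Gaussian-kernel assumption is: with $\mathbf{x}_t = \alpha_t \mathbf{x}_0 + \sigma_t \varepsilon$, $\varepsilon \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, one has $\nabla_{\mathbf{x}_t}\log p_t(\mathbf{x}_t \mid \mathbf{x}_0) = -(\mathbf{x}_t - \alpha_t \mathbf{x}_0)/\sigma_t^{2} = -\varepsilon/\sigma_t$, so a noise-prediction network can be turned into a score estimate roughly like this (a sketch with hypothetical names, valid only under that assumption):

```python
def score_from_eps(eps_model, x_t, t, sigma_t):
    """Recover a score estimate from a noise-prediction network, assuming a
    Gaussian perturbation kernel x_t = alpha_t * x_0 + sigma_t * eps.
    `eps_model` and `sigma_t` are hypothetical names for illustration."""
    return -eps_model(x_t, t) / sigma_t
```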