
What is the true equation for the reparameterization trick for a Gaussian distribution? I see many forms:

$z = \mu + \sigma \times \epsilon$ http://kvfrans.com/variational-autoencoders-explained/

$z = \mu + \sqrt{ \exp(\sigma) } \times \epsilon$ https://jmetzen.github.io/2015-11-27/vae.html

$z = \mu + \exp(\frac{\sigma}{2}) \times \epsilon $ https://github.com/fchollet/keras/blob/master/examples/variational_autoencoder.py

How do I derive the one true equation? Or can I do it any way I want, as long as I add $\mu$ and a function of $\sigma$?

Kong

2 Answers


There are a few different quantities to distinguish here:

  • $\sigma,$ the standard deviation
  • $\sigma^2,$ the variance
  • $\log(\sigma^2),$ the log-variance

These can all be easily converted to one another; they are just different ways of measuring the same thing. In a VAE, the encoder will be trained to generate one of these parameters, and you need to interpret the output accordingly. If the encoder was trained to produce $\sigma,$ you should (presumably) use its output as $\sigma.$ If the encoder was trained to produce $\log(\sigma^2),$ you should use its output as $\log(\sigma^2)$ and apply the transformation $x\mapsto \exp(x/2)$ to recover $\sigma.$ This explains the forms in the question: the second and third are the same function, since $\sqrt{\exp(v)} = \exp(v/2),$ applied to an output $v$ interpreted as $\log(\sigma^2)$ rather than as $\sigma.$
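As a concrete sketch of those conversions (assuming, hypothetically, that the encoder head emits a vector named `log_var` holding $\log(\sigma^2)$):

```python
import numpy as np

# Hypothetical encoder output, interpreted as log-variance log(sigma^2).
log_var = np.array([-0.5, 0.1, 1.2])

var = np.exp(log_var)        # sigma^2 = exp(log(sigma^2))
sigma = np.exp(log_var / 2)  # sigma   = exp(log(sigma^2) / 2)

assert np.allclose(sigma ** 2, var)  # the two routes agree
```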

To generate a normally distributed sample $X$ with mean $\mu$ and standard deviation $\sigma$ given a standard normal $Z\sim N(0,1),$ use the expression

$$X=\mu+\sigma Z.$$
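A minimal numerical sketch of this sampling step (the values of $\mu$ and $\sigma$ below are arbitrary illustrations, not the output of a trained encoder):

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 1.5, 0.7              # arbitrary illustrative parameters
z = rng.standard_normal(100_000)  # Z ~ N(0, 1)
x = mu + sigma * z                # X = mu + sigma * Z, so X ~ N(mu, sigma^2)

print(x.mean(), x.std())          # approximately 1.5 and 0.7
```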

Dap
  • Readers interested in why this works can look at my answer here: https://math.stackexchange.com/a/3470378/497474 – Marine Galantin Mar 12 '24 at 19:22
  • Also note that $\log(\sigma^2) = 2 \log(\sigma)$, so it is also fine for the model to output $\log(\sigma)$; the square in the OP's question only changes things by a factor of 2. – Marine Galantin Mar 12 '24 at 19:24

Assume we have a normal distribution $q$ that is parameterized by $\theta$, specifically $q_{\theta}(x) = N(\theta,1)$. We want to solve the following problem $$ \text{min}_{\theta} \quad E_q[x^2] $$ We want to understand how the reparameterization trick helps in calculating the gradient of this objective $E_q[x^2]$.

One way to calculate $\nabla_{\theta} E_q[x^2]$ is the log-derivative (score-function) trick: $$ \nabla_{\theta} E_q[x^2] = \nabla_{\theta} \int q_{\theta}(x) x^2 dx = \int x^2 \nabla_{\theta} q_{\theta}(x) \frac{q_{\theta}(x)}{q_{\theta}(x)} dx = \int q_{\theta}(x) \nabla_{\theta} \log q_{\theta}(x) x^2 dx = E_q[x^2 \nabla_{\theta} \log q_{\theta}(x)] $$

For our example where $q_{\theta}(x) = N(\theta,1)$, we have $\nabla_{\theta} \log q_{\theta}(x) = x - \theta$, so this method gives $$ \nabla_{\theta} E_q[x^2] = E_q[x^2 (x-\theta)] $$
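A quick Monte Carlo check of this estimator; since $E_q[x^2] = \theta^2 + 1$ for $N(\theta,1)$, the true gradient is $2\theta$ (the value of $\theta$ below is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.0
x = rng.normal(theta, 1.0, size=1_000_000)  # x ~ q_theta = N(theta, 1)

score_grad = np.mean(x ** 2 * (x - theta))  # Monte Carlo E_q[x^2 (x - theta)]
print(score_grad)                           # approaches the true gradient 2 * theta = 2.0
```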

The reparameterization trick is a way to rewrite the expectation so that the distribution with respect to which we take the expectation is independent of the parameter $\theta$. To achieve this, we need to make the stochastic element in $q$ independent of $\theta$. Hence, we write $x$ as $$ x = \theta + \epsilon, \quad \epsilon \sim N(0,1) $$

Then, we can write $$ E_q[x^2] = E_p[(\theta+\epsilon)^2] $$ where $p$ is the distribution of $\epsilon$, i.e., $N(0,1)$. Now we can write the derivative of $E_q[x^2]$ as follows $$ \nabla_{\theta} E_q[x^2] = \nabla_{\theta} E_p[(\theta+\epsilon)^2] = E_p[2(\theta+\epsilon)] $$
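The same check for the reparameterized estimator, under the same hypothetical setup as above:

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.0
eps = rng.standard_normal(1_000_000)       # epsilon ~ N(0, 1), independent of theta

reparam_grad = np.mean(2 * (theta + eps))  # Monte Carlo E_p[2 (theta + eps)]
print(reparam_grad)                        # also approaches 2 * theta = 2.0
```

In practice the reparameterized estimator tends to have much lower variance than the score-function one, which is a key practical reason VAEs use it.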