Hello everyone and happy new year! May all your hopes and aspirations come true and the forces of evil be confused and disoriented on the way to your house.
With that out of the way...
I am trying to write a program that takes a vector $\mu \in \mathbb R^n$ and a matrix $\Sigma \in \mathbb R^{n \times n}$ and generates random samples from the multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$.
The problem: I am only allowed to use a routine that samples from the univariate normal distribution with mean $0$ and variance $1$, i.e. $N(0, 1)$.
The proposed solution: Initialize a vector $v \in \mathbb R^n$ and, for each $i$ from $1$ to $n$, draw its $i$-th entry from the univariate standard normal distribution: $v_i \sim N(0, 1)$.
Now compute a Cholesky decomposition of $\Sigma$: $\Sigma = LL^T$, where $L$ is lower triangular.
Finally, the random vector $Lv + \mu$ is the sample we want: it is distributed according to the desired multivariate Gaussian $N(\mu, \Sigma)$.
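For concreteness, here is a minimal sketch of that recipe in Python/NumPy, under the assumption that NumPy's `standard_normal` stands in for the allowed $N(0, 1)$ sampler; the 2-D `mu` and `Sigma` at the bottom are just illustrative values, not part of the original problem:

    import numpy as np

    def sample_mvn(mu, Sigma, rng):
        """Draw one sample from N(mu, Sigma) using only N(0, 1) draws."""
        n = len(mu)
        v = rng.standard_normal(n)     # v_i ~ N(0, 1), i.i.d.
        L = np.linalg.cholesky(Sigma)  # Sigma = L @ L.T, L lower triangular
        return L @ v + mu              # the claimed sample from N(mu, Sigma)

    # Illustrative 2-D example with correlated components.
    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0])
    Sigma = np.array([[2.0, 0.6],
                      [0.6, 1.0]])
    samples = np.array([sample_mvn(mu, Sigma, rng) for _ in range(10_000)])
    print(samples.mean(axis=0))           # should be close to mu
    print(np.cov(samples, rowvar=False))  # should be close to Sigma

Empirically the sample mean and sample covariance come out close to $\mu$ and $\Sigma$, which is what the procedure claims.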
My question is: why? I don't understand the intuition. If it were a one-dimensional distribution $N(\mu, \sigma^2)$, then I would understand why $\sigma^2 v + \mu$ is a good idea, so why Cholesky? Wouldn't we want $\Sigma v + \mu$?