
According to this paper on factor analysis, a $p$-dimensional random vector $\textbf{x}$ can be modeled with a $k$-dimensional vector of factors $\textbf{z}$, where $k \ll p$, via the generative model:

$$ \textbf{x} = \Lambda \textbf{z} + \textbf{u} $$

where $\Lambda$ is a $p \times k$ matrix, $\textbf{z} \sim \mathcal{N}(0, I_k)$, and $\textbf{u} \sim \mathcal{N}(0, \Psi)$. The authors then claim the following:

> According to this model, $\textbf{x}$ is therefore distributed with zero mean and covariance $\Lambda \Lambda^{\top} + \Psi$.
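
For concreteness, here is a quick simulation of the model as I understand it (a minimal NumPy sketch; the dimensions, $\Lambda$, and $\Psi$ are arbitrary values I picked, not from the paper). The empirical covariance of the simulated $\textbf{x}$ does come out close to $\Lambda \Lambda^{\top} + \Psi$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, k, n = 5, 2, 200_000                    # observed dim, latent dim, sample count

Lambda = rng.normal(size=(p, k))           # arbitrary loading matrix
Psi = np.diag(rng.uniform(0.5, 1.5, p))    # arbitrary diagonal noise covariance

z = rng.standard_normal((n, k))                           # z ~ N(0, I_k)
u = rng.multivariate_normal(np.zeros(p), Psi, size=n)     # u ~ N(0, Psi)
x = z @ Lambda.T + u                       # x = Lambda z + u, one sample per row

# Maximum absolute gap between the empirical covariance of x
# and the claimed covariance Lambda Lambda^T + Psi (should be small)
print(np.abs(np.cov(x, rowvar=False) - (Lambda @ Lambda.T + Psi)).max())
```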

My question is: what is the relationship between the above equation relating random variables and the various probability distributions $p(\textbf{x})$, $p(\textbf{x} \mid \textbf{z})$, and $p(\textbf{x}, \textbf{z})$?

For example, how would I compute $p(\textbf{x})$? This is my attempt, but it seems like an abuse of notation:

$$ \begin{align} \textbf{x} &= \Lambda \textbf{z} + \textbf{u} \\ &= \Lambda \mathcal{N}(\textbf{0}, I_k) + \mathcal{N}(\textbf{0}, \Psi) \\ &= \mathcal{N}(\textbf{0}, \Lambda \Lambda^{\top} + \Psi) \end{align} $$

Does that work? Can I just replace random variables in that equation with distributions and compute?

Also, I've seen other resources (for example) claim that the factor analysis model is:

$$ p(\textbf{x} \mid \textbf{z}, \theta) = \mathcal{N}(\textbf{x} \mid \Lambda \textbf{z} + \textbf{u}, \Psi) $$

But I don't know how to go from the equation of random variables to this conditional density. Why is one presentation an equation of random variables and another a conditional density? How do I move between these two formulations?

jds

1 Answer


It's not entirely correct to replace random variables with their distributions. It is usually considered incorrect notation to write $\mathbf{z} = \mathcal{N}\left(0, I_{k}\right)$; however, the notation $p\left(\mathbf{z}\right) = \mathcal{N}_{\mathbf{z}}\left(0, I_{k}\right)$ is sometimes accepted (I use the subscript $\mathbf{z}$ in $\mathcal{N}_{\mathbf{z}}$ to indicate that it is a normal density in $\mathbf{z}$).

If $\mathbf{z}\sim\mathcal{N}\left(0, I_{k}\right)$, then by the properties of affine transformations of normal random vectors, $\Lambda\mathbf{z}\sim\mathcal{N}\left(0, \Lambda I_{k}\Lambda^{\top}\right) = \mathcal{N}\left(0, \Lambda\Lambda^{\top}\right)$. Then, using the property of sums of independent normal random vectors (assuming $\mathbf{z}$ and $\mathbf{u}$ are independent), $\Lambda\mathbf{z} + \mathbf{u} \sim \mathcal{N}\left(0, \Lambda\Lambda^{\top} + \Psi\right)$. Hence $$p\left(\mathbf{x}\right) = \mathcal{N}_{\mathbf{x}}\left(0, \Lambda\Lambda^{\top} + \Psi\right)$$

The conditional distribution $p\left(\mathbf{x}\middle|\mathbf{z}\right)$ is given by $$p\left(\mathbf{x}\middle|\mathbf{z}\right) = \mathcal{N}_{\mathbf{x}}\left(\Lambda\mathbf{z}, \Psi\right)$$ since $\mathbb{E}\left[\mathbf{x}\middle|\mathbf{z}\right] = \mathbb{E}\left[\Lambda\mathbf{z} + \mathbf{u}\middle|\mathbf{z}\right] = \Lambda\mathbf{z} + \mathbb{E}\left[\mathbf{u}\middle|\mathbf{z}\right] = \Lambda\mathbf{z} + \mathbb{E}\left[\mathbf{u}\right] = \Lambda\mathbf{z}$ and similarly $\operatorname{Cov}\left(\mathbf{x}\middle|\mathbf{z}\right) = \operatorname{Cov}\left(\mathbf{u}\right) = \Psi$.
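
If a numerical sanity check helps, here is a minimal sketch (assuming NumPy/SciPy, with arbitrary $\Lambda$ and $\Psi$ of my own choosing) showing that marginalising the conditional over the prior, $p\left(\mathbf{x}\right) = \int p\left(\mathbf{x}\middle|\mathbf{z}\right)p\left(\mathbf{z}\right)\mathrm{d}\mathbf{z}$, agrees with the closed-form $\mathcal{N}_{\mathbf{x}}\left(0, \Lambda\Lambda^{\top} + \Psi\right)$:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
p, k = 3, 2
Lambda = rng.normal(size=(p, k))            # arbitrary loading matrix
Psi = np.diag([0.5, 1.0, 1.5])              # arbitrary diagonal noise covariance
x0 = rng.normal(size=p)                     # an arbitrary test point

# Closed-form marginal: p(x) = N(x | 0, Lambda Lambda^T + Psi)
closed_form = multivariate_normal(np.zeros(p), Lambda @ Lambda.T + Psi).pdf(x0)

# Monte Carlo marginal: p(x) = E_z[ p(x | z) ] with z ~ N(0, I_k),
# using p(x | z) = N(x | Lambda z, Psi) = N(x - Lambda z | 0, Psi)
S = 200_000
zs = rng.standard_normal((S, k))
monte_carlo = multivariate_normal(np.zeros(p), Psi).pdf(x0 - zs @ Lambda.T).mean()

print(closed_form, monte_carlo)             # the two values should be close
```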

The joint distribution $p\left(\mathbf{x}, \mathbf{z}\right)$ can be obtained via the chain rule of probability: $$p\left(\mathbf{x}, \mathbf{z}\right) = p\left(\mathbf{x}\middle|\mathbf{z}\right)p\left(\mathbf{z}\right) = \mathcal{N}_{\mathbf{x}}\left(\Lambda\mathbf{z}, \Psi\right)\mathcal{N}_{\mathbf{z}}\left(0, I_{k}\right)$$
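
For completeness: because $\left(\mathbf{x}, \mathbf{z}\right)$ is an affine function of the jointly Gaussian pair $\left(\mathbf{z}, \mathbf{u}\right)$, the joint distribution is itself Gaussian, which gives an equivalent (and often convenient) way to write $p\left(\mathbf{x}, \mathbf{z}\right)$:
$$\begin{pmatrix}\mathbf{x}\\ \mathbf{z}\end{pmatrix} \sim \mathcal{N}\left(\mathbf{0},\begin{pmatrix}\Lambda\Lambda^{\top} + \Psi & \Lambda\\ \Lambda^{\top} & I_{k}\end{pmatrix}\right)$$
where the off-diagonal block follows from $\operatorname{Cov}\left(\mathbf{x}, \mathbf{z}\right) = \operatorname{Cov}\left(\Lambda\mathbf{z} + \mathbf{u}, \mathbf{z}\right) = \Lambda\operatorname{Cov}\left(\mathbf{z}\right) = \Lambda$.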

rzch