
I want to understand how to derive the update formula for Gibbs sampling for a Hidden Markov Model, for example this one:

$$p(z_t | \mathbf{x}, \mathbf{z}_{\setminus t}, \boldsymbol\alpha, \boldsymbol\beta) \propto \dfrac{C_{x_t, z_t}^{-t} + \beta}{\sum_x C_{x, z_t}^{-t} + W\beta} \cdot \dfrac{C_{z_t, z_{t-1}}^{-t} + \alpha}{\sum_z C_{z, z_{t-1}}^{-t} + K\alpha} \cdot \dfrac{C_{z_{t+1}, z_t}^{-t} + \alpha + \delta(z_{t-1} = z_t = z_{t+1})}{\sum_{z} C_{z, z_t}^{-t} + K\alpha + \delta(z_{t-1} = z_t)},$$

where

  • $z_t \in \{1, \cdots, K\}$ is a hidden state,
  • $x_t \in \{1, \cdots, W\}$ is an observed variable,
  • $p(x_t | z_t, \Phi) = p(x_t | \phi_{z_t}) = \phi_{x_t, z_t}$ - emission probability;
  • $p(z_t | z_{t-1}, \Xi) = p(z_t | \xi_{z_{t-1}}) = \xi_{z_t, z_{t-1}}$ transition probability;
  • $\phi_z \sim Dir(\phi_z; \boldsymbol\beta)$ - the prior distribution for the emission probability is Dirichlet with the symmetric hyper-parameter $\boldsymbol\beta$;
  • $\xi_z \sim Dir(\xi_z; \boldsymbol\alpha)$ - the prior distribution for the transition probability is Dirichlet with the symmetric hyper-parameter $\boldsymbol\alpha$;
  • $C_{x, z}^{-t}$ - the number of times the observed variable $x$ was associated with the state $z$, excluding the pair at the time moment $t$;
  • $C_{z', z}^{-t}$ - the number of times the state $z$ was followed by the state $z'$, excluding the transition at the time moment $t$.
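
For concreteness, here is a minimal NumPy sketch of one such collapsed Gibbs update as I read the formula. The count arrays `C_emit`, `C_trans` and the boundary handling are my own assumptions, not something given in the formula itself:

```python
import numpy as np

def gibbs_step(t, z, x, C_emit, C_trans, alpha, beta, K, W, rng):
    """One collapsed Gibbs update for z[t] (interior positions only).

    Hypothetical count layout, mirroring the definitions above:
      C_emit[w, k]  -- times observation w was emitted from state k
      C_trans[j, k] -- times state k was followed by state j
    """
    prev_z, next_z, w = z[t - 1], z[t + 1], x[t]

    # Form the "-t" counts by removing the pairs that involve z[t].
    C_emit[w, z[t]] -= 1
    C_trans[z[t], prev_z] -= 1
    C_trans[next_z, z[t]] -= 1

    probs = np.empty(K)
    for k in range(K):
        emission = (C_emit[w, k] + beta) / (C_emit[:, k].sum() + W * beta)
        trans_in = (C_trans[k, prev_z] + alpha) / (C_trans[:, prev_z].sum() + K * alpha)
        trans_out = ((C_trans[next_z, k] + alpha + (prev_z == k == next_z))
                     / (C_trans[:, k].sum() + K * alpha + (prev_z == k)))
        probs[k] = emission * trans_in * trans_out
    probs /= probs.sum()

    # Draw the new state and put the counts back.
    z[t] = rng.choice(K, p=probs)
    C_emit[w, z[t]] += 1
    C_trans[z[t], prev_z] += 1
    C_trans[next_z, z[t]] += 1
```

Positions $t = 0$ and $t = T - 1$ would need the initial-state distribution and a missing-neighbor variant of the same formula.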

The question is how to derive the second and third terms of the formula.

I think I can derive them from this relation: $$\int p(z_{t+1} | z_t, \Xi) p(z_t | z_{t-1}, \Xi) p(\Xi | \mathbf{z}_{\setminus t}) \mathrm{d}\Xi$$ (similar to the derivation of the first term, considering the different cases $z_{t-1} = z_t = z_{t+1}$, $z_t \neq z_{t-1}$, and so on).
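
For example, if $p(\Xi | \mathbf{z}_{\setminus t})$ really factorized into row-wise Dirichlet posteriors, the middle factor of the formula (ignoring the $\delta$ corrections for now) would simply be a Dirichlet mean: $$\int \xi_{z_t, z_{t-1}} \, \mathrm{Dir}\!\left(\xi_{z_{t-1}};\, \boldsymbol\alpha + C^{-t}_{\cdot, z_{t-1}}\right) \mathrm{d}\xi_{z_{t-1}} = \frac{C_{z_t, z_{t-1}}^{-t} + \alpha}{\sum_z C_{z, z_{t-1}}^{-t} + K\alpha}.$$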

This derivation is possible if $p(\Xi | \mathbf{z}_{\setminus t})$ is Dirichlet, which is not obvious to me: there is a time dependency in the sequence of $\mathbf{z}$, and we can't simply ignore $z_t$, since $z_{t+1}$ depends on it. Is it still true that the posterior of $\Xi$ is Dirichlet even though we throw $z_t$ away?

And how can we arrive at $\int p(z_{t+1} | z_t, \Xi) p(z_t | z_{t-1}, \Xi) p(\Xi | \mathbf{z}_{\setminus t}) \mathrm{d}\Xi$?

Or is my train of thought wrong and there is another way to get the formula?

Any idea and suggestions will be very much appreciated.


Context: The first term of the formula is easy to get: $$p(z_t | \mathbf{x}, \mathbf{z}_{\setminus t}, \boldsymbol\alpha, \boldsymbol\beta) \propto p(x_t | z_t, \mathbf{z}_{\setminus t}, \mathbf{x}_{\setminus t}, \boldsymbol\alpha, \boldsymbol\beta) p(z_t | \mathbf{z}_{\setminus t}, \mathbf{x}_{\setminus t}, \boldsymbol\alpha, \boldsymbol\beta)$$

The first factor of this product is $$p(x_t | z_t, \mathbf{z}_{\setminus t}, \mathbf{x}_{\setminus t}, \boldsymbol\alpha, \boldsymbol\beta) = \int p(x_t | \Phi, z_t) p(\Phi | \mathbf{z}_{\setminus t}, \mathbf{x}_{\setminus t}, \boldsymbol\beta) \mathrm{d}\Phi$$

As the Dirichlet distribution is conjugate to the multinomial, the second factor under the integral is again a Dirichlet distribution, and the integral is the posterior expectation of $\phi_{x_t, z_t}$, which equals the first term of the formula.
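
Written out (this is just the mean of the Dirichlet posterior, stated here for completeness): $$\int \phi_{x_t, z_t} \, \mathrm{Dir}\!\left(\phi_{z_t};\, \boldsymbol\beta + C^{-t}_{\cdot, z_t}\right) \mathrm{d}\phi_{z_t} = \frac{C_{x_t, z_t}^{-t} + \beta}{\sum_x C_{x, z_t}^{-t} + W\beta}.$$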

