
I understand that a Discrete Time Markov Chain (https://en.wikipedia.org/wiki/Markov_chain) is closely related to a Multinomial Distribution (https://en.wikipedia.org/wiki/Multinomial_distribution).

As I understand:

  • A Multinomial Distribution is a Discrete Probability Distribution characterized by a number "k" of possible outcomes and "k" corresponding probabilities, one for each outcome. As an example, the outcome of a dice roll can be characterized by a Multinomial Distribution (with a single trial and k = 6 outcomes).

On the other hand:

  • A Discrete Time Markov Chain has a set of discrete states (say, "k" states) - e.g. State "S1", State "S2", State "S3"
  • A Discrete Time Markov Chain has a corresponding set of probabilities which describe the probability of transitioning to each of these "k" states, conditional on the Markov Chain currently being in any given one of these "k" states - these are referred to as "Transition Probabilities"

Suppose we consider a Discrete Time Markov Chain with k=3 states - this means that there will be 3 x 3 = 9 transition probabilities: p11, p12, p13, p21, p22, p23, p31, p32, p33. Together, these 9 transition probabilities can be placed into a 3x3 "Transition Matrix". As such, (p11, p12, p13) form the first row of this transition matrix, (p21, p22, p23) form the second row, and (p31, p32, p33) form the third row.
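As a concrete illustration (the numbers here are made up for the example), such a transition matrix can be written down directly; each row is a multinomial parameter vector and must sum to 1:

```python
import numpy as np

# Hypothetical 3x3 transition matrix for states S1, S2, S3.
# Row i holds (p_i1, p_i2, p_i3): the probabilities of moving
# from state i to each of the three states, so each row sums to 1.
P = np.array([
    [0.70, 0.20, 0.10],   # transitions out of S1
    [0.30, 0.40, 0.30],   # transitions out of S2
    [0.50, 0.25, 0.25],   # transitions out of S3
])

# Each row is a valid multinomial parameter vector.
assert np.allclose(P.sum(axis=1), 1.0)
```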

  • Now, consider just the first row of this transition matrix. These three probabilities (p11, p12, p13) describe the "fate" of the Markov Chain given that it is currently in State "S1".
  • As such, at this point we can consider the 3 transition probabilities as parameters of a Multinomial Probability Distribution.

Now, this leads me to my question.

  • The Wikipedia page on the Multinomial Distribution shows the properties of a Random Variable "X" following a Multinomial Distribution - for example, if a Random Variable "X" follows a Multinomial Distribution, we can find out the Mean and the Variance of "X"

However, given some data (e.g. the Markov Chain was observed over 10 days to be in S1, S1, S2, S1, S2, S3, S1, S1, S1, S2), it is not immediately clear how we can estimate the transition probabilities or the variance of these estimates. In other words, how can I estimate the mean and variance of the parameters of a Multinomial Distribution - NOT just the mean and variance of a Random Variable that follows a Multinomial Distribution?

After doing some reading online, I came across the following references (e.g. https://www.stat.cmu.edu/~cshalizi/462/lectures/06/markov-mle.pdf, Maximum Likelihood Estimator of parameters of multinomial distribution), which show how Maximum Likelihood Estimation can be used to derive formulas for the transition probabilities (i.e. the parameters of a Multinomial Distribution) of a Markov Chain. Essentially, this is done by differentiating the log-likelihood function of the Multinomial Distribution with respect to each of the parameters (subject to the constraint that each row of probabilities sums to 1), setting the derivatives to 0, and then solving the resulting system of equations. Doing this yields the ("obvious") estimates for these transition probabilities, e.g. p11 = (Number of Times the Markov Chain was in State 1 and Transitioned to State 1) / (Number of Times the Markov Chain was in State 1 and Transitioned to State 1, State 2, or State 3)
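A minimal sketch of this count-based MLE, applied to the 10-day sequence from the question (the state labels and the use of NumPy are my own choices here):

```python
import numpy as np

# Observed sequence from the question (10 days).
states = ["S1", "S1", "S2", "S1", "S2", "S3", "S1", "S1", "S1", "S2"]
labels = ["S1", "S2", "S3"]
idx = {s: i for i, s in enumerate(labels)}

# Count transitions i -> j over consecutive pairs of observations.
counts = np.zeros((3, 3))
for a, b in zip(states[:-1], states[1:]):
    counts[idx[a], idx[b]] += 1

# MLE: divide each row by its total visits.
# Rows with no observed visits are left as zeros.
row_totals = counts.sum(axis=1, keepdims=True)
P_hat = np.divide(counts, row_totals, out=np.zeros_like(counts),
                  where=row_totals > 0)
```

For this sequence, state S1 is left 6 times (3 times to S1, 3 times to S2), so the first row of `P_hat` is (0.5, 0.5, 0).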

But I am still confused as to how the variance estimates for the transition probabilities can be calculated. Currently, several ideas come to mind:

  • We can use "bootstrap sampling" (https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) to repeatedly resample consecutive sub-sequences of the historical states (so that the chronological order within each sub-sequence is not interrupted) and recalculate the transition probabilities on each resample. Repeating this many times gives, for each transition probability, a collection of bootstrap estimates (e.g. p11_1st_sample, p11_2nd_sample, ..., p11_nth_sample, ..., p33_nth_sample). If we rank these from smallest to largest and take the 5th and 95th percentiles, we construct a "crude" range of values that each transition probability could assume, and their relative deviation from the mean value.

  • The second way could be through the "Delta Method" (https://en.wikipedia.org/wiki/Delta_method), in which the estimator is first approximated by a 2nd Order Taylor Expansion, and the variance of this expansion is then evaluated using the Laws of Expectation. But this looks very complicated and I don't know how to do it.

  • I remember hearing that there is an inverse relationship between the Fisher Information of a Probability Distribution and the Variance of the Parameters belonging to this Probability Distribution - supposedly some algebraic manipulation might lead to a formula for the variance of these parameters ... but I am not sure about this, and it seems even more complicated!

  • Finally, some classic methods involving the Method of Moments and the identity Var(X) = E(X^2) - (E(X))^2 might also be possible?
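The bootstrap idea from the first bullet above can be sketched as follows. For simplicity this version resamples the consecutive transition pairs with replacement (a plain "pairs" bootstrap; resampling longer consecutive blocks, as described above, would preserve more of the serial structure):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = ["S1", "S2", "S3"]
idx = {s: i for i, s in enumerate(labels)}
states = ["S1", "S1", "S2", "S1", "S2", "S3", "S1", "S1", "S1", "S2"]

# Treat the consecutive (from, to) pairs as the resampling units.
pairs = list(zip(states[:-1], states[1:]))

def estimate(pair_sample):
    """Transition-probability MLE from a sample of (from, to) pairs."""
    counts = np.zeros((3, 3))
    for a, b in pair_sample:
        counts[idx[a], idx[b]] += 1
    totals = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, totals, out=np.zeros_like(counts),
                     where=totals > 0)

# Bootstrap: resample the pairs, re-estimate, collect p11.
B = 2000
p11_samples = np.empty(B)
for b in range(B):
    sample = [pairs[k] for k in rng.integers(0, len(pairs), len(pairs))]
    p11_samples[b] = estimate(sample)[0, 0]

# Crude 5th-95th percentile range for p11.
lo, hi = np.percentile(p11_samples, [5, 95])
```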

I am interested in seeing a theoretical formulation for the variance of the Transition Probabilities of a Discrete Time Markov Chain (or, equivalently, of the parameters of a Multinomial Distribution). I have spent a lot of time searching for a reference that clearly explains this, but I have not been able to find one.

Can someone please help me in deriving these variance estimates alongside with providing an explanation as to how you approached and solved this problem?

Thanks!

Note: Throughout this whole question, I have assumed that the mean estimates for the parameters of a Multinomial Distribution are equal to the mean estimates for the Transition Probabilities of a Discrete Time Markov Chain, and likewise that the variance estimates for the two are equal. In general, is this even true?


1 Answer


Let's establish some notation here first.

Let $1, \dotsc, k$ be the states of the Markov chain and $\{p_{ij}\}_{i,j=1}^k$ the corresponding transition probabilities.

Now, suppose that we run the Markov chain for $n$ steps, and let $N_i$ be the number of times that the chain was in state $i$ and $X_{ij}$ the number of times that the chain transitioned from state $i$ to state $j$.

The MLE that you propose amounts to $$ \hat{p}_{ij} = \frac{X_{ij}}{N_i}. $$

There's a problem with this though - namely, $\Pr(N_i = 0) > 0$, so we need to decide what to do in the case that $N_i = 0$. One easy solution to this is to just set $\hat{p}_{ij}$ to $0$ in this case, in which case we can write the estimator as $$ \hat{p}_{ij} = \frac{X_{ij}}{N_i} \mathbf{1}_{N_i > 0}. $$ Now, as long as $N_i > 0$, we have that $X_{ij}|N_i \sim \mathrm{Binomial}(N_i, p_{ij})$, which allows us to find the conditional mean and variance of $\hat{p}_{ij}$, namely $$ \mathbf{E}(\hat{p}_{ij} | N_i) = p_{ij} \mathbf{1}_{N_i > 0} \quad \mathrm{Var}(\hat{p}_{ij} | N_i) = \frac{\mathbf{1}_{N_i > 0}}{N_i}p_{ij}(1 - p_{ij}). $$ Using the formula $\mathrm{Var}(X) = \mathrm{Var}(\mathbf{E}(X|Y)) + \mathbf{E} \mathrm{Var}(X|Y)$ for computing variance from conditional variance, we have that \begin{align*} \mathrm{Var}(\hat{p}_{ij}) &= \mathrm{Var}(p_{ij} \mathbf{1}_{N_i > 0}) + \mathbf{E} \Bigl(\frac{\mathbf{1}_{N_i > 0}}{N_i}p_{ij}(1 - p_{ij})\Bigr) \\ &= p_{ij}^2 q_i(1 - q_i) + p_{ij}(1 - p_{ij}) \mathbf{E}\Bigl(\frac{\mathbf{1}_{N_i > 0}}{N_i}\Bigr), \end{align*} where $q_i = \Pr(N_i = 0)$.

Now, you can separately estimate $q_i$, $p_{ij}$ and $\mathbf{E}(\mathbf{1}_{N_i > 0}/N_i)$ and plug them into the above. $p_{ij}$ is readily estimated by $\hat{p}_{ij}$. $q_i$ and $\mathbf{E}(\mathbf{1}_{N_i > 0}/N_i)$ are harder to estimate, but one way to do this is to notice that for large $n$, $N_i$ has an approximate Poisson distribution, and you can estimate the Poisson parameter by counting the steps between successive visits to state $i$.
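A sketch of the plug-in version of this conditional variance, $\mathrm{Var}(\hat{p}_{ij} | N_i) \approx \hat{p}_{ij}(1 - \hat{p}_{ij})/N_i$, dropping the $q_i$ correction term (which is negligible once every state has been visited). The counts are taken from the 10-day sequence in the question:

```python
import numpy as np

# X[i, j] = number of transitions i -> j in the observed sequence
# S1, S1, S2, S1, S2, S3, S1, S1, S1, S2.
X = np.array([[3., 3., 0.],
              [1., 0., 1.],
              [1., 0., 0.]])
# N[i] = number of times the chain was in state i (with a successor).
N = X.sum(axis=1)

# MLE of the transition probabilities.
p_hat = X / N[:, None]

# Plug-in estimate of Var(p_hat_ij | N_i) = p_ij (1 - p_ij) / N_i,
# ignoring the q_i term from the answer above.
var_hat = p_hat * (1 - p_hat) / N[:, None]
```

For example, the estimated conditional variance of $\hat{p}_{11}$ here is $0.5 \cdot 0.5 / 6 \approx 0.042$.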

  • @ Damian Pavlyshyn: Thank you for your answer! I will spend some time reading it and trying to understand it. I will probably come back to you with some questions! :) – stats_noob Dec 14 '22 at 05:50
  • I identified several possible methods which might be applicable for finding out a closed form solution for the transition probabilities - do you think that any of these methods might be applicable? – stats_noob Dec 14 '22 at 05:51