
Let a finite state space $\Omega\equiv\{\omega_1,\ldots,\omega_n\}$ be given, and let $\{X_t\}$ be a Markov chain on $\Omega$ with $n\times n$ transition matrix $A$. I would like to know if there are standard results out there I can cite that give the following:

  1. There always exists some (not necessarily unique) stationary distribution $\pi$ such that $\pi A = \pi$.
  2. Let any initial distribution $\pi^0\equiv (\pi^0_1,\ldots,\pi^0_n)$ (where $\pi^0_i$ is the initial probability of being in state $\omega_i$) be given, and define $\pi^{t}=\pi^0 A^t$. The "time average limiting distribution" $x$ (is there a more standard term for this out there?) exists: $$x_i \equiv \lim_{t\to\infty}\frac{1}{t} \sum_{s=0}^{t-1} \pi^s_i.$$
  3. The time average limiting distribution $x$ given above is stationary: $x = xA$.

I know this is all implied by the Ergodic Theorem for irreducible Markov chains, but I would like to see that it is also true for all (finite) Markov chains. I don't need convergence to a unique stationary distribution, which is the concern of most of the stuff out there.
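
For concreteness, here is a small numerical sketch of statements 2 and 3 (NumPy assumed; the $4\times 4$ matrix and the initial distribution are arbitrary made-up choices):

```python
import numpy as np

# Made-up reducible example: {0,1} and {2,3} are two closed communicating
# classes, and the chain restricted to {2,3} is a period-2 oscillator,
# so pi^t itself does not converge but its time average does.
A = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
pi0 = np.array([0.1, 0.2, 0.3, 0.4])     # initial distribution as a row vector

T = 10_000
pi_t = pi0.copy()
cesaro_sum = np.zeros_like(pi0)
for _ in range(T):
    cesaro_sum += pi_t
    pi_t = pi_t @ A                      # pi^{s+1} = pi^s A

x = cesaro_sum / T                       # (1/T) * sum_{s=0}^{T-1} pi^s
print(x)                                 # the "time average limiting distribution"
print(np.allclose(x @ A, x, atol=1e-3))  # statement 3: x is (numerically) stationary
```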

fltfan

1 Answer


All of your statements are true. For the first, you can use a purely linear algebraic fact: a (row-)stochastic matrix always has eigenvalue $1$ with a left eigenvector whose entries are nonnegative, and normalizing that eigenvector to sum to $1$ gives a stationary distribution.
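
For what it's worth, here is a minimal numerical sketch of that fact, assuming NumPy and an irreducible example matrix (the helper name `stationary_distribution` is mine; for a reducible chain the eigenvalue $1$ is repeated and the numerical eigenvector may mix classes, so there you would restrict to one closed class first):

```python
import numpy as np

def stationary_distribution(A):
    """One stationary distribution of a row-stochastic matrix A:
    a nonnegative row vector pi with pi @ A = pi and pi.sum() == 1."""
    # Left eigenvectors of A are ordinary eigenvectors of A.T.
    eigvals, eigvecs = np.linalg.eig(A.T)
    v = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))]  # eigenvector for eigenvalue ~1
    v = np.abs(np.real(v))                            # fix the arbitrary overall sign
    return v / v.sum()

A = np.array([[0.9, 0.1],
              [0.4, 0.6]])
pi = stationary_distribution(A)
print(pi, np.allclose(pi @ A, pi))                    # [0.8 0.2] True
```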

For the second, let's say the chain has $m$ communicating classes $C_j$. If the initial condition is entirely in a single (closed) communicating class $C_j$ (i.e. $\sum_{i \in C_j} \pi^0_i = 1$), then the time average of the distributions converges to the unique stationary distribution of the Markov chain restricted to $C_j$; this follows from the ergodic theorem. So if you find the time average limit of each subchain, then you can handle the whole chain by writing

$$\begin{aligned} \mathbb{P}\left( X_k = i \right) &= \sum_{j=1}^m \mathbb{P}\left( X_k = i \,\middle|\, X_0 \in C_{j} \right) \mathbb{P}\left( X_0 \in C_{j} \right) \\ &= \mathbb{P}\left( X_k = i \,\middle|\, X_0 \in C_{j(i)} \right) \mathbb{P}\left( X_0 \in C_{j(i)} \right) \end{aligned}$$

where $C_{j(i)}$ is the communicating class of state $i$. Then the second term is constant and the time average of the first term is convergent, so the time average of the whole thing is also convergent.
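
As a sanity check, here is a sketch of that decomposition on the same made-up chain used in the question's sketch above (two closed classes $\{0,1\}$ and $\{2,3\}$; the helper `stationary_of_block` is mine, and NumPy is assumed): the class-by-class stationary distributions, weighted by $\mathbb{P}(X_0 \in C_j)$, reproduce the time average limit.

```python
import numpy as np

A = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.2, 0.8, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
pi0 = np.array([0.1, 0.2, 0.3, 0.4])
classes = [[0, 1], [2, 3]]            # the two closed communicating classes

def stationary_of_block(B):
    """Unique stationary distribution of an irreducible block (left eigenvector for 1)."""
    w, V = np.linalg.eig(B.T)
    v = np.abs(np.real(V[:, np.argmin(np.abs(w - 1.0))]))
    return v / v.sum()

# Weight each class's stationary distribution by P(X_0 in C_j).
x_pred = np.zeros(4)
for C in classes:
    x_pred[C] = pi0[C].sum() * stationary_of_block(A[np.ix_(C, C)])

# Compare with the directly computed time average of pi^s.
T, pi_t, s = 10_000, pi0.copy(), np.zeros(4)
for _ in range(T):
    s += pi_t
    pi_t = pi_t @ A
print(x_pred)          # ~ [0.086, 0.214, 0.35, 0.35]
print(s / T)           # agrees with x_pred to within O(1/T)
```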

For the third, you can use linearity:

$$\left ( \frac{\sum_{k=1}^t \pi^k}{t} \right ) A = \frac{\sum_{k=1}^t \pi^k A}{t} = \frac{\left ( \sum_{k=1}^t \pi^k \right ) + \pi^{t+1} - \pi^1}{t}.$$

Taking $t \to \infty$ and exploiting the continuity of multiplication by $A$, we get

$$\left ( \lim_{t \to \infty} \frac{\sum_{k=1}^t \pi^k}{t} \right ) A = \lim_{t \to \infty} \frac{\sum_{k=1}^t \pi^k}{t}.$$
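
(A tiny numerical illustration of this step, assuming NumPy, using the two-state deterministic oscillator that comes up in the comments below: $\pi^t$ itself never converges, but the identity above holds exactly and the correction term $(\pi^{t+1}-\pi^1)/t$ shrinks like $1/t$.)

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])             # deterministic two-state oscillator (period 2)
pi0 = np.array([1.0, 0.0])

for t in (11, 101, 1001):
    # pi^1, ..., pi^{t+1}
    pis = [pi0 @ np.linalg.matrix_power(A, k) for k in range(1, t + 2)]
    avg = sum(pis[:t]) / t             # (1/t) * sum_{k=1}^t pi^k
    lhs = avg @ A - avg
    rhs = (pis[-1] - pis[0]) / t       # (pi^{t+1} - pi^1) / t
    print(t, np.allclose(lhs, rhs), np.linalg.norm(lhs))  # identity holds; gap ~ 1/t
```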

Ian
  • Re your last sentence, I am not sure that Cesaro means are relevant (they would be if $\pi^t$ was convergent but this is not always so). – Did Aug 15 '14 at 15:55
  • Ah, right. That mistake actually changes problems earlier in the discussion as well. Thank you. – Ian Aug 15 '14 at 15:56
  • Sorry but I must also disagree with the new last sentence since the OP is interested in the time average of the distributions $\pi^t$, not in the distributions $\pi^t$ themselves. – Did Aug 15 '14 at 16:10
  • @Did The third statement in the OP is that the time average limit is stationary. A deterministic oscillator between two states has a time average limit, namely a uniform distribution on the two states, but it has no stationary distribution. Is that what you are disagreeing with? (Please excuse my many edits). – Ian Aug 15 '14 at 16:13
  • Yes the uniform distribution is stationary for the deterministic oscillator between two states. – Did Aug 15 '14 at 16:16
  • I found this article (via this question), so I think that suffices to show that there does exist some (not necessarily unique!) stationary distribution for any finite Markov chain, unless I'm missing something? – fltfan Aug 15 '14 at 16:18
  • @Did ...Perhaps I shouldn't play on math.SE before I've had my coffee. You are right of course. Is the time average limit necessarily stationary then? How would you show it? – Ian Aug 15 '14 at 16:18
  • Let $t\to\infty$ in the identity $$\left(\frac1t\sum_{n=0}^{t-1}\pi^n\right)A=\frac1t\sum_{n=0}^{t-1}\pi^{n}+\frac{\pi^{t}-\pi^0}t.$$ – Did Aug 15 '14 at 16:22
  • Thanks for the answer, but in "part 2" does it technically account for transient communicating classes? Specifically, the right-hand side seems too low, because it only accounts for the part of the initial distribution that starts in the recurrent class $j(i)$, i.e. $\mathbb{P}(X_0 \in C_{j(i)})$. – fltfan Sep 08 '14 at 07:45
  • If you have a "finitely" transient communicating class (that is, a $C \subset \Omega$ such that for any $X_0$ there exists $m$ such that $\mathbb{P}(X_k \in C) = 0$ for $k \geq m$), then you can fix this problem by moving the distribution forward in time up until the transient communicating class has died off and then performing the same argument. If it is merely asymptotically transient (that is, $\lim_{n \to \infty} (P^n)_{ii} = 0$), then I don't think there is any problem here. – Ian Sep 08 '14 at 13:03
  • I definitely agree for the finitely transient case. If they are asymptotically transient, I'm wondering if you have to add a term for $F(\tilde i,i)\mathbb{P}(X_0= \tilde{i})$ where $\tilde{i}$ is a transient state and $F(\tilde i,i')$ is the probability of "first passage" from $\tilde i$ to $i$, i.e. the probability that starting in state $\tilde i$ you ever reach $i$. – fltfan Sep 08 '14 at 20:44
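
To illustrate the point raised in the last two comments, here is a small sketch (made-up 3-state chain, NumPy assumed) where all the initial mass sits on a transient state: the time average limit is still stationary, but it is governed by first-passage/absorption probabilities rather than by $\mathbb{P}(X_0 \in C_{j(i)})$, which is $0$ here.

```python
import numpy as np

# State 0 is transient; states 1 and 2 are absorbing (singleton recurrent classes).
A = np.array([
    [0.5, 0.25, 0.25],
    [0.0, 1.0,  0.0 ],
    [0.0, 0.0,  1.0 ],
])
pi0 = np.array([1.0, 0.0, 0.0])           # all initial mass on the transient state

T, pi_t, s = 20_000, pi0.copy(), np.zeros(3)
for _ in range(T):
    s += pi_t
    pi_t = pi_t @ A
x = s / T

print(x)                                  # ~ [0, 0.5, 0.5]: set by absorption probabilities
print(np.allclose(x @ A, x, atol=1e-3))   # the time average limit is still stationary
print(pi0[1], pi0[2])                     # P(X_0 in C_{j(i)}) = 0 for i = 1, 2, yet x_i > 0
```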