
I was following a proof in Gilbert Strang's book "Introduction to Linear Algebra", and I am confused by one step of the proof.

Suppose we have an $n$ by $n$ stochastic matrix $A$: all entries are nonnegative and the entries in each column sum to $1$ (so that $A$ maps probability vectors to probability vectors).

There is a proof that the largest eigenvalue of $A$ equals $1$ and that all other eigenvalues are smaller than $1$ in absolute value. I found it here: Proof that the largest eigenvalue of a stochastic matrix is 1

Now I want a proof that the Markov chain has a steady state that is not affected by the initial probability distribution:

$$ u_0 = \left( \begin{smallmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{smallmatrix} \right) $$

Let's apply diagonalization to our matrix $A$:

$$ A = S \Lambda S^{-1} $$

where $S$ has the eigenvectors of $A$ as its columns and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues.

Suppose we want to represent our initial distribution as a linear combination of the eigenvectors of $A$:

$$ u_0 = c_1 x_1 + c_2x_2 + \ldots + c_nx_n $$

In matrix form:

$$u_0 = SC$$

We can get $C$ from:

$$C = S^{-1}u_0$$
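The two steps above (diagonalize $A$, then solve $SC = u_0$ for the coefficients) can be sketched numerically. This is a minimal numpy illustration; the $3 \times 3$ column-stochastic matrix below is a made-up example, not one from the book:

```python
import numpy as np

# Hypothetical 3x3 column-stochastic matrix (each column sums to 1).
A = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

# Diagonalize: the columns of S are eigenvectors, Lam holds the eigenvalues.
eigvals, S = np.linalg.eig(A)
Lam = np.diag(eigvals)

# Check the factorization A = S Lam S^{-1}.
assert np.allclose(A, S @ Lam @ np.linalg.inv(S))

# Express an initial distribution u0 in the eigenvector basis: C = S^{-1} u0.
u0 = np.array([1.0, 0.0, 0.0])
C = np.linalg.solve(S, u0)
assert np.allclose(S @ C, u0)  # u0 = S C
```

Here `np.linalg.solve(S, u0)` computes $S^{-1}u_0$ without forming the inverse explicitly.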

So when we apply $A$ to $u_0$ repeatedly ($k$ times):

$$ u_k = A \cdots Au_0 = S \Lambda S^{-1} \cdots S \Lambda S^{-1} u_0 = S \Lambda S^{-1} \cdots S \Lambda S^{-1} SC = S \Lambda^{k} C$$

Written out, we get:

$$ u_k = c_1(\lambda_1)^kx_1 + c_2(\lambda_2)^kx_2 + \ldots + c_n(\lambda_n)^kx_n $$

But then author writes this in the book:

$$ u_k = x_1 + c_2(\lambda_2)^kx_2 + \ldots + c_n(\lambda_n)^kx_n $$

I understand that the author omits $\lambda_1$ because it is equal to 1. Why does the author omit $c_1$?

EDIT: I found out that $c_1$ is equal to $1$, but I don't know why; that is why the author omits it.

Later in his proof the author shows that:

$$ \lim_{k\rightarrow \infty } u_k = x_1 $$

So the author concludes that the steady state is equal to the eigenvector with corresponding eigenvalue $1$.
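The claim that the limit does not depend on $u_0$ can be checked numerically. A minimal numpy sketch, using a hypothetical column-stochastic matrix (not the book's example) and two different starting distributions:

```python
import numpy as np

# Hypothetical column-stochastic matrix (columns sum to 1).
A = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])

# Two different initial probability distributions.
u = np.array([1.0, 0.0, 0.0])
v = np.array([0.2, 0.3, 0.5])

# u_k = A^k u_0 for large k.
Ak = np.linalg.matrix_power(A, 100)
u_inf, v_inf = Ak @ u, Ak @ v

# Both starting points reach the same limit ...
assert np.allclose(u_inf, v_inf)
# ... and that limit is a fixed point of A, i.e. an eigenvector for lambda = 1.
assert np.allclose(A @ u_inf, u_inf)
```

The second eigenvalue of this matrix is below $1$ in absolute value, so the terms $c_j\lambda_j^k x_j$ for $j \geq 2$ have died out long before $k = 100$.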

  • .. to make sure the initial distribution contains some share of the first eigenvector? – daw May 31 '14 at 15:11
  • @daw, Thank you for your response.) No, the author does it because c_1 is equal to 1. And the question is why c_1 is equal to 1. – warmspringwinds May 31 '14 at 15:25

3 Answers


There is something wrong in your question: $c_1=1$ is a special case for the example matrix $\begin{bmatrix}.80&.05\\.20&.95\end{bmatrix}$ in Strang's book. But assuming $\lambda_1=1$, the corresponding $c_1$ must be the same CONSTANT for any initial state $u_0$, and that constant happens to be $1$ for $\begin{bmatrix}.80&.05\\.20&.95\end{bmatrix}$.

Proof

The answer of 'user940' is almost right, but it makes a small mistake in the last step, perhaps misled by the question.

The correct last step is $$ \begin{aligned} 1^T u_0 &= 1^T c_1 x_1 + 1^T c_2 x_2 + \dots + 1^T c_n x_n \\&=c_1 1^Tx_1 \quad (\text{since } 1^T x_j=0 \text{ for } j\neq1) \end{aligned} $$ so $$ \begin{aligned} c_1&=\frac{1^Tu_0}{1^Tx_1} \\ &=\frac{1}{1^Tx_1} \quad (\text{since } u_0 \text{ is a probability distribution}) \end{aligned} $$

So, for any initial state $u_0$, the steady state is $$ \lim_{k \to \infty}{u_k}= c_1 x_1 = \frac{x_1}{1^T x_1} $$

One more remark: for the example matrix $\begin{bmatrix}.80&.05\\.20&.95\end{bmatrix}$ in Strang's book, $x_1=\begin{bmatrix} .2 \\ .8 \end{bmatrix}$, so $c_1=1$.

zhengchl

This proof seems a little confusing to me. First of all, consider the stochastic matrix \begin{equation} \begin{bmatrix} 0&1 \\ 1&0 \end{bmatrix},\end{equation} which is diagonalizable but has no steady state: its powers oscillate, so $A^ku_0$ does not converge for a general $u_0$.

So I think it must be specified that $A$ is regular, i.e., some power of $A$ has only positive entries.

So now, assuming this is true, with regard to your question: since the columns of $S$ are the eigenvectors $x_1,\ldots,x_n$, the matrix $S$ takes coordinates with respect to the eigenvector basis to coordinates with respect to the standard basis. Hence if $u_0=\left( \begin{smallmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{smallmatrix} \right)$ in the standard basis, the coefficients in \begin{equation} u_0=\sum_{i=1}^n c_ix_i\end{equation} are given by $C=S^{-1}u_0$, and for an arbitrary choice of the eigenvector $x_1$ there is definitely no reason why we must have $c_1=1$.

A much better approach (imho) is to use the property that for a regular stochastic matrix $A$, $\lim_{m \rightarrow \infty}A^m=L$ is a matrix with every column equal to $x_1$, the eigenvector associated with eigenvalue $1$, normalized so its entries sum to $1$. The existence of $L$ follows in part from the Perron-Frobenius theorem, or equivalently from the link you have posted together with some other results; I am not going to prove that here. So if we accept that $L$ indeed exists, we need two things. First: \begin{equation} AL=A\lim_{m \rightarrow \infty}A^m=\lim_{m \rightarrow \infty}AA^m=\lim_{m \rightarrow \infty}A^{m+1}=L,\end{equation} and second, since $AL=L$, every column of $L$ is an eigenvector of $A$ associated with eigenvalue $1$.

So having established the above, and using $x_1$ as in your notation, we have \begin{equation}(\lim_{m \rightarrow \infty}A^m)u_0=Lu_0=u_1x_1+u_2x_1+\cdots+u_nx_1=(u_1+u_2+\cdots+u_n)x_1=x_1,\end{equation} since $u_0$ is a probability vector and its entries sum to $1$.
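This limit-matrix argument can be sketched numerically with the example matrix from this thread; `A**200` stands in for the limit, since the second eigenvalue is $0.75$ and $0.75^{200}$ is negligible:

```python
import numpy as np

# Strang's regular stochastic matrix from this thread.
A = np.array([[0.80, 0.05],
              [0.20, 0.95]])

# A^200 approximates L = lim A^m to machine precision.
L = np.linalg.matrix_power(A, 200)

# Every column of L equals x_1 = (.2, .8).
assert np.allclose(L[:, 0], L[:, 1])
assert np.allclose(L[:, 0], [0.2, 0.8])

# L u0 = x_1 for any probability vector u0 (an arbitrary choice here).
u0 = np.array([0.35, 0.65])
assert np.allclose(L @ u0, [0.2, 0.8])
```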


Start with the equation $Ax_j=\lambda_j x_j$ and left-multiply by the row vector $1^T$ of all ones. This gives $1^T A x_j = \lambda_j 1^T x_j$, and since the columns of $A$ sum to $1$ we have $1^TA=1^T$, so $1^T x_j =\lambda_j 1^T x_j$. For $j\geq 2$, we have $\lambda_j\neq 1$ and deduce that $1^T x_j=0$.

Now left multiply the equation $u_0 = c_1 x_1 + c_2x_2 + \cdots + c_nx_n$ by $1^T$ to deduce that $c_1=1$. At least this is true if we choose the eigenvector $x_1$ so that $1^T x_1=1$.
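The key fact $1^Tx_j=0$ for the eigenvectors with $\lambda_j\neq 1$ is easy to check numerically. A minimal numpy sketch with the example matrix from this thread:

```python
import numpy as np

# Strang's example matrix (columns sum to 1, so 1^T A = 1^T).
A = np.array([[0.80, 0.05],
              [0.20, 0.95]])

eigvals, S = np.linalg.eig(A)
ones = np.ones(2)

# 1^T x_j = lambda_j 1^T x_j, so any eigenvector with lambda_j != 1
# must have entries summing to zero.
for j in range(2):
    if not np.isclose(eigvals[j], 1.0):
        assert np.isclose(ones @ S[:, j], 0.0)
```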