
Setting: Consider a continuous-time Markov chain $(X_t)_{t\ge 0}$ on a finite state space $\mathcal Z$ with irreducible (constant) generator matrix $L\in\mathbb R^{|\mathcal Z| \times |\mathcal Z|}$. Let $\rho_t\in\mathcal P(\mathcal Z)$ be the probability law of $X_t$ (here the subscript denotes dependence on time and $\mathcal P(\cdot)$ is the space of probability measures); since $\mathcal Z$ is finite, we identify $\rho_t$ with a vector in $\mathbb R^{|\mathcal Z|}$.

This time-$t$ distribution evolves according to \begin{equation} \partial_t\rho_t = L^T \rho_t, \end{equation}
with some initial distribution $\rho_0$. Since $L$ is irreducible, there exists a unique stationary measure $\mu\in \mathcal P(\mathcal Z)$, i.e. $L^T\mu = 0$.
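For a concrete check, $\mu$ can be computed by solving $L^T\mu=0$ together with the normalization $\sum_z\mu_z=1$. The sketch below assumes NumPy; the 3-state generator (a one-directional cycle, irreducible but not reversible) and the helper `stationary_distribution` are hypothetical illustrations, not part of the question.

```python
import numpy as np

# Hypothetical generator (rows sum to 0, off-diagonal rates >= 0):
# one-directional cycle 0 -> 1 (rate 2), 1 -> 2 (rate 1), 2 -> 0 (rate 1).
# It is irreducible but not reversible (no jump ever goes "backwards").
L = np.array([[-2.0, 2.0, 0.0],
              [0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0]])

def stationary_distribution(L):
    """Solve L^T mu = 0 together with sum(mu) = 1 (hypothetical helper)."""
    n = L.shape[0]
    # Append the normalization row below L^T. For irreducible L the kernel of
    # L^T is one-dimensional, so this consistent 4x3 system has a unique solution.
    A = np.vstack([L.T, np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

mu = stationary_distribution(L)
print(mu)  # approximately [0.2, 0.4, 0.4]
```

For this particular $L$ one can also verify by hand that $\mu=(1,2,2)/5$ satisfies $L^T\mu=0$.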

Question: Are there results in this setting stating that the time-$t$ distribution $\rho_t$ converges exponentially fast to $\mu$, i.e. \begin{equation} \|\rho_t - \mu\|_{TV} \leq Ce^{-D t} \end{equation} for some positive constants $C,D$? Here $\|\cdot\|_{TV}$ is the total-variation norm.

Remarks: When I search online, typical results require the Markov chain to be reversible and then provide estimates on the rate of convergence; however, I do not want to make any such assumption. There is also often discussion of the spectral gap, but I don't yet understand what I need to assume about $L$ to get a spectral gap.

UPS

1 Answer


The solution $P(t)=P(0)e^{Lt}$ is valid for a finite state space (see for example, here). As explained for example here, $e^{Lt}$ is stochastic for each $t$. If $L$ is irreducible, then $e^{L}$ is primitive (actually positive, but I'm blanking out how to show this; in any case, the discrete-time Markov chain corresponding to $e^L$ is irreducible and cannot have any period other than 1, so it is aperiodic, hence primitive, which is enough). Then by Perron-Frobenius there is a spectral gap, i.e. the maximal eigenvalue is 1, the corresponding generalized eigenspace is one-dimensional and contains a unique eigenvector $\mu$ with positive entries that sum to 1, and all other eigenvalues have modulus strictly $<1$. Since there are finitely many of them, we can fix some $\lambda<1$ strictly larger than all of their moduli. It then follows that $L$ has the same generalized eigenspaces, one of eigenvalue $0$ (spanned by $\mu$) and all others with eigenvalues of real part $<\ln \lambda<0$. Then $e^{tL}$ again has the same generalized eigenspaces, one of eigenvalue $1$ spanned by $\mu$ and all others with eigenvalues of modulus $<\lambda^t$. Writing a vector $\rho$ with entries summing to 1 using this generalized eigenspace decomposition, we have

$$\rho=\mu+\sum c_i \rho_i$$

and then

$$e^{Lt}\rho=\mu+\sum c_i e^{Lt}\rho_i$$

But on each generalized eigenspace, $e^{Lt}$ is the product of the scalar $\lambda_i^t$ and a matrix polynomial in $t$ (as in here), so each entry of $e^{Lt}\rho_i$ is of the form $\lambda_i^t P(t)$ for some fixed polynomial $P$. In particular, they are all bounded by $C\lambda^t$ for some $C$: since $|\lambda_i|<\lambda$, the quantity $|\lambda_i/\lambda|^t\,|P(t)|$ tends to $0$ as $t\to\infty$ and is therefore bounded on $[0,\infty)$. Since this is true for all pieces $\rho_i$, each entry of $e^{Lt}\rho-\mu$ is bounded in absolute value by $C\lambda^t$. Taking $D=-\ln \lambda$, this bound becomes $Ce^{-Dt}$ (with $C$, but not $D$, dependent on $\rho$).
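The polynomial factor is the reason $\lambda$ has to be chosen strictly larger than every $|\lambda_i|$. A minimal sketch, assuming NumPy/SciPy, with a hypothetical eigenvalue `nu` of $L$ (so $\lambda_i=e^{\nu}$) and a hypothetical helper `expm_jordan_2x2`: it checks the closed form $e^{tJ}=e^{\nu t}(I+tN)$ for a $2\times 2$ Jordan block $J=\nu I+N$, and shows that $t\,e^{\mathrm{Re}(\nu)t}/e^{-Dt}$ stays bounded for $D<-\mathrm{Re}\,\nu$ but grows like $t$ for $D=-\mathrm{Re}\,\nu$.

```python
import numpy as np
from scipy.linalg import expm

nu = -0.5 + 0.3j                       # hypothetical eigenvalue with Re(nu) < 0
J = np.array([[nu, 1.0], [0.0, nu]])   # 2x2 Jordan block: nu*I + N

def expm_jordan_2x2(nu, t):
    """Closed form exp(t J) = e^{nu t} (I + t N), N the nilpotent part."""
    return np.exp(nu * t) * np.array([[1.0, t], [0.0, 1.0]])

# The closed form agrees with SciPy's general matrix exponential.
for t in [0.0, 1.0, 5.0, 20.0]:
    assert np.allclose(expm(t * J), expm_jordan_2x2(nu, t))

# |t e^{nu t}| relative to e^{-D t}: bounded for D = 0.4 < 0.5 = -Re(nu),
# but equal to t (unbounded) for D = 0.5.
ts = np.linspace(0.0, 200.0, 2001)
ratio_strict = ts * np.exp(nu.real * ts) / np.exp(-0.4 * ts)
ratio_equal = ts * np.exp(nu.real * ts) / np.exp(-0.5 * ts)
print(ratio_strict.max(), ratio_equal[-1])
```

The first ratio equals $t e^{-0.1t}$, which peaks at $t=10$; the second is just $t$.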

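In this finite-dimensional setting all norms are equivalent (the total-variation norm is, up to a factor of 2 depending on convention, the $\ell^1$ norm), so $\|e^{Lt}\rho-\mu\|_{TV}\le\hat{C}e^{-Dt}$ for some $\hat{C}$.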
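A numerical sanity check of the whole argument, assuming NumPy/SciPy (`scipy.linalg.expm`) and a hypothetical non-reversible 3-state generator: it confirms that $e^L$ is positive and stochastic, reads off the gap $D$ from the spectrum of $L$, checks that the second-largest eigenvalue modulus of $e^L$ is $e^{-D}$, and tracks $\|\rho_t-\mu\|_{TV}$ with $\rho_t=e^{L^Tt}\rho_0$ as in the question. Total variation is taken as half the $\ell^1$ norm here (an assumed convention; it only changes the constant).

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical generator (rows sum to 0): one-directional cycle
# 0 -> 1 (rate 2), 1 -> 2 (rate 1), 2 -> 0 (rate 1). Irreducible, not reversible.
L = np.array([[-2.0, 2.0, 0.0],
              [0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0]])
mu = np.array([1.0, 2.0, 2.0]) / 5.0   # stationary law of this L
assert np.allclose(L.T @ mu, 0.0)

# Perron-Frobenius side: e^L is entrywise positive and row-stochastic.
P1 = expm(L)
assert np.all(P1 > 0) and np.allclose(P1.sum(axis=1), 1.0)

# Spectral gap: D = -(largest real part among the nonzero eigenvalues of L).
nu = np.linalg.eigvals(L)              # here 0 and -2 +/- 1j
D = -np.max(nu[np.abs(nu) > 1e-9].real)
# The second-largest eigenvalue modulus of e^L equals e^{-D}.
mods = np.sort(np.abs(np.linalg.eigvals(P1)))
assert np.isclose(mods[-1], 1.0) and np.isclose(mods[-2], np.exp(-D))

def tv_distance(p, q):
    """Total-variation distance, taken here as half the l1 norm."""
    return 0.5 * np.abs(p - q).sum()

rho0 = np.array([1.0, 0.0, 0.0])       # hypothetical initial law: start in state 0
ts = np.linspace(0.0, 8.0, 81)
tv = np.array([tv_distance(expm(L.T * t) @ rho0, mu) for t in ts])
ratio = tv / np.exp(-D * ts)           # stays bounded, consistent with TV <= C e^{-D t}
print(D, ratio.max())
```

Because this particular $L$ has distinct eigenvalues (no nontrivial Jordan blocks), the ratio stays bounded even with the rate $D$ itself; with a Jordan block at the gap eigenvalue one would need a rate slightly below $D$, as in the answer.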


Why is the coefficient of $\mu$ equal to $1$?

Lemma: If $x$ is a left generalized eigenvector of $A$ with eigenvalue $a$, and $y$ is a right eigenvector of $A$ with eigenvalue $b\neq a$, then $x \cdot y=0$.

Proof: Since $x^T(A-a\,\mathrm{Id})^n=\vec{0}^{\,T}$ for some $n$, we have $0=x^T(A-a\,\mathrm{Id})^n y=(b-a)^n(x^Ty)$, and since $b\neq a$, $x^Ty=0$.

Now, since $P(t)$ is stochastic, $\mathbb{1}=(1,\dots,1)$ is a right eigenvector of $P(t)$ with eigenvalue $1$, so by the lemma all other left generalized eigenspaces of $P(t)$ are orthogonal to it. Hence the sum of entries of $\rho$, i.e. $\rho\cdot\mathbb{1}$, is the coefficient of $\mu$:

$$1=\rho\cdot\mathbb{1}=(c\mu +\sum c_i \rho_i)\cdot \mathbb{1}=c (\mu\cdot \mathbb{1})+0=c$$
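The coefficient argument can be checked numerically in the diagonalizable case (an assumption; the answer covers generalized eigenspaces in general). The sketch assumes NumPy and uses a hypothetical generator and probability vector. In the question's column-vector convention, `np.linalg.eig(L.T)` gives the right eigenvectors of $L^T$, i.e. the left eigenvectors of $L$ and of $e^{Lt}$.

```python
import numpy as np

# Hypothetical irreducible generator (rows sum to 0) with distinct eigenvalues
# 0 and -2 +/- 1j, hence diagonalizable.
L = np.array([[-2.0, 2.0, 0.0],
              [0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0]])
rho = np.array([0.5, 0.3, 0.2])        # hypothetical probability vector

w, V = np.linalg.eig(L.T)              # columns of V: eigenvectors of L^T
k = np.argmin(np.abs(w))               # index of the eigenvalue 0
V[:, k] = V[:, k] / V[:, k].sum()      # rescale that column to mu (entries sum to 1)

c = np.linalg.solve(V, rho)            # coordinates of rho in the eigenbasis
col_sums = V.sum(axis=0)               # other eigenvectors are orthogonal to (1,...,1)
print(c[k], col_sums)
```

The coefficient `c[k]` of $\mu$ comes out as $\rho\cdot\mathbb 1=1$, and every other eigenvector has entries summing to $0$, which is the lemma applied with the right eigenvector $\mathbb 1$ of $L$.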

Max
  • Thank you so much. This looks very nice, although I have a few questions. (1) Here you have written $\rho(t)=e^{Lt}\rho(0)$, but shouldn't this be $\rho(t)=e^{Lt}\rho(0)$. If so, doesn't the $L^T$ instead of $L$ pose problems? (2) At the beginning of the proof you switch from continuous-time to discrete time by working with $e^{L}$. Is that always allowed? (3) Maybe a naive question, but doesn't eigenspace decomposition require additional properties such as diagonalizability? Also why is the coefficient for $\mu$ exactly 1. – UPS Sep 15 '21 at 09:11
  • I'm a bit busy today, so only quick answers, which may be revised later: 1) I think you wrote the same thing twice, but I think I know what you meant to write. I'm 90% sure this is simply a difference in conventions; some people write transition matrices so that $P_{ij}(t)=P(X(t)=x_i | X(0)=x_j)$ and some the transposed way, and since $e^{A^T}=(e^A)^T$ it should not matter much, but I may have been inconsistent. 2) It is allowed by the fact that $P(t)=e^{Lt}P(0)$ (page 158 in the reference). In fancy language, in finite dimensions any linear flow has a time-1 map. – Max Sep 15 '21 at 15:52
  • 3) That's why I talk about generalized eigenspaces (Jordan canonical form), technically over the complex numbers. – Max Sep 15 '21 at 15:59
  • I added an explanation for the coefficient. – Max Sep 15 '21 at 16:19
  • Regarding (1), I am still having issues. If the transition matrix is defined the traditional way, i.e. $P_{ij}(t)=P(X(t)=j|X(0)=i)$, then the corresponding (backward) Kolmogorov equation is $dP/dt = L P$, with the explicit solution $P(t)=e^{Lt}$, as you pointed out. However, the distribution evolves according to $\rho(t)=P^T(t)\rho(0)$, which is seen by just applying the law of total probability. Therefore, using the explicit solution, it follows that $\rho(t)=e^{L^T t}\rho(0)$. Isn't this an issue, since $e^{Lt}$ is a stochastic matrix but $e^{L^Tt}$ is not? – UPS Sep 16 '21 at 09:13
  • Regarding (2), thanks for pointing out this way of looking at things. However, I still don't see why the discrete-time Markov chain with transition matrix $e^{L}$ cannot have period other than 1. If I take a three-state Markov chain with only three possible transitions $x_1 \rightarrow x_2,\ x_2\rightarrow x_3,\ x_3 \rightarrow x_1$ (i.e. a chain where the system just moves in a loop), isn't the period of each vertex 3? – UPS Sep 16 '21 at 09:16
  • Informally, this is because it has to be "infinitely divisible", so it cannot move in a loop. For example, the permutation (2 3 1) does not have a stochastic cubic root. Formally, see Lemma 8 in https://arxiv.org/pdf/1001.1693.pdf (no root of unity as an eigenvalue, no non-trivial periodicity). – Max Sep 20 '21 at 16:47
  • We are planning to include parts of your argument in an article we are writing, and we would like to acknowledge you for it. At the moment we have added a citation to this post using the cite function. Would you like to be acknowledged differently? For instance, this post (https://academia.stackexchange.com/questions/107963/how-to-acknowledge-a-mathoverflow-user) suggests that we acknowledge you by your real name if possible. Would you prefer that? I am sorry, I am not sure of the protocol in these situations since it's a first for me. – UPS Jan 19 '22 at 10:25
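The two points raised in this comment thread, (1) the transpose $e^{L^Tt}$ and (2) the period of the loop chain, can be checked numerically. A minimal sketch assuming NumPy/SciPy, using the loop $x_1\to x_2\to x_3\to x_1$ from the comments with unit jump rates (the rates are an assumption; any positive rates behave the same way).

```python
import numpy as np
from scipy.linalg import expm

# Jump matrix of the loop x1 -> x2 -> x3 -> x1 (a 3-cycle permutation).
Pi = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0]])
L = Pi - np.eye(3)          # generator with unit jump rates (assumption)

# Point (1): e^{L^T t} is the transpose of the row-stochastic e^{L t}, so its
# columns sum to 1 and it maps probability vectors to probability vectors.
t = 0.7
Pt = expm(L * t)
assert np.allclose(expm(L.T * t), Pt.T)
assert np.allclose(expm(L.T * t).sum(axis=0), 1.0)

# Point (2): the jump chain is periodic (its eigenvalues are the cube roots of
# unity, all of modulus 1), but the time-1 matrix e^L is entrywise positive,
# and its only eigenvalue of modulus 1 is the eigenvalue 1 itself.
print(np.abs(np.linalg.eigvals(Pi)))     # all equal to 1
print(np.all(expm(L) > 0))               # True
mods = np.sort(np.abs(np.linalg.eigvals(expm(L))))
print(mods)                              # e^{-3/2}, e^{-3/2}, 1
```

The non-unit moduli are $e^{-3/2}$ because the nonzero eigenvalues of $L$ are $\omega-1$ and $\omega^2-1$ with $\omega=e^{2\pi i/3}$, both of real part $-3/2$.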