
I am trying to solve the following problem, taken from the book Markov Chains and Mixing Times, 2nd edition (Exercise 12.2):

Let $P$ be an irreducible transition matrix, and suppose that $A$ is a matrix with $0 \le A(i,j) \le P(i,j)$ and $A \ne P$. Show that any eigenvalue $\lambda$ of $A$ satisfies $|\lambda| < 1$.

Now I know that $P$'s biggest eigenvalue is $1$. I have tried several approaches, but I am not sure how to use irreducibility in this context. Maybe if the chain has some period, it is possible to look at the chain with the transition matrix raised to the power of the period to get an aperiodic chain?
Help will be appreciated.

Link to the book: https://pages.uoregon.edu/dlevin/MARKOV/markovmixing.pdf

  • This follows from the fact that the dominant eigenvalue of $P$ is $1$ and the fact that $A$ has to have its eigenvalues smaller than those of $P$. – Salcio Mar 27 '24 at 17:20
  • Showing that $\sigma(A) \leq \sigma(P)$ is easy, but showing a strict inequality requires some more work since the elementwise inequality is not strict. You need the irreducibility assumption to strengthen the result to a strict inequality. For example, $P = I$, $A = I - \epsilon e_1e_1^T$ shows why irreducibility is needed (see the numerical sketch after these comments). – whpowell96 Mar 27 '24 at 17:36
  • @user8675309 Well that's an interesting suggestion will look into it later. Regarding the prior knowledge - all chapters of the book up until 12 (including), intermediate linear algebra knowledge and I am also familiar with Perron-Frobenius theorem (I looked it up on Wikipedia). Generally the book didn't mention it so I don't think the solution they are aiming for relies on it. – codeplay Mar 28 '24 at 13:29
  • @user8675309 I wrote up my solution to the problem; I think this works. Mind taking a look and posting what you had in mind? – codeplay Mar 29 '24 at 13:48
  • sure. I think proof #1 will be the most familiar as it uses a maximum principle type argument. By the way, I think a more common way of dealing with periodicity is to look at a lazy chain $\frac{1}{2}\big(I+A\big)\leq \frac{1}{2}\big(I+P\big)$ but I prefer the average above. – user8675309 Mar 29 '24 at 16:44
  • More generally, if $0\le A\le B$ entrywise, $A\ne B$ and $B$ is irreducible, then $\rho(A)<\rho(B)$. See https://math.stackexchange.com/a/924257 for a proof. – user1551 Nov 18 '24 at 22:18
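A quick numerical sketch of whpowell96's example above (my own, assuming NumPy; the cycle chain used for the irreducible case is an arbitrary illustrative choice, not from the thread):

```python
import numpy as np

# whpowell96's example: P = I is stochastic but reducible, and
# A = I - eps * e1 e1^T satisfies 0 <= A <= P, A != P, yet rho(A) = 1.
n, eps = 4, 0.1
P = np.eye(n)
A = P.copy()
A[0, 0] -= eps
print(abs(np.linalg.eigvals(A)).max())   # 1.0 -- the claim fails without irreducibility

# With an irreducible P (a cycle with self-loops), shrinking one entry
# pushes every eigenvalue of A strictly inside the unit disc.
P = 0.5 * np.eye(n) + 0.5 * np.roll(np.eye(n), 1, axis=1)
A = P.copy()
A[0, 0] -= eps
print(abs(np.linalg.eigvals(A)).max())   # strictly less than 1
```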

2 Answers


Use the average instead:
$S_A =\frac{1}{n}\sum_{k=0}^{n-1} A^k \leq \frac{1}{n}\sum_{k=0}^{n-1} P^k = S_P$
where the inequality is element-wise and strict in at least one component. Note that $S_P\mathbf 1 = \mathbf 1$ and, since $\rho(S_A)=\frac{1}{n}\sum_{k=0}^{n-1}\rho(A)^k$ for the non-negative matrix $A$, $S_A$ has a (Perron) eigenvalue $\geq 1$ iff $A$ does. Since $P$ is $n\times n$ and irreducible, $S_P$ is a positive matrix.
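As a sanity check on these claims, here is a small numerical sketch (my own, assuming NumPy; the random $P$ and the shrunken entry of $A$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Random irreducible stochastic P: positive entries guarantee irreducibility.
P = rng.random((n, n)) + 0.1
P /= P.sum(axis=1, keepdims=True)

# A: elementwise 0 <= A <= P with A != P (shrink a single entry).
A = P.copy()
A[0, 0] *= 0.5

avg = lambda M: sum(np.linalg.matrix_power(M, k) for k in range(n)) / n
S_P, S_A = avg(P), avg(A)

print(np.all(S_P > 0))                          # S_P is a positive matrix
print(np.all(S_A <= S_P + 1e-12))               # S_A <= S_P elementwise
print(abs(np.linalg.eigvals(S_A)).max() < 1,    # Perron root of S_A below 1 ...
      abs(np.linalg.eigvals(A)).max() < 1)      # ... iff the same holds for A
```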

proof 0 (w/ analysis)
Construct the monotone sequence of matrices $S_k := \frac{1}{k}\cdot S_P + \frac{k-1}{k}\cdot S_A$ for $k \in \mathbb N$
$\implies S_P = S_1 \geq S_2 \geq S_3\geq \dots$
where the inequality holds point-wise and is strict in at least one component for each $k$ and all $S_k$ are positive matrices. Perron Theory then tells us that the Perron roots form a strictly monotone decreasing sequence
$1=\lambda_1^{(S_1)} \gt \lambda_1^{(S_2)}\gt \lambda_1^{(S_3)} \gt \dots$
$\implies 1 \gt \lambda_1^{(S_A)}$
e.g., by continuity of the coefficients of the characteristic polynomial and Hurwitz's theorem from complex analysis. Alternatively, there is a real-analysis argument for taking the limit of the sequence in Meyer's "Continuity of the Perron Root" that gives the result as well (ref http://carlmeyer.com/pdfFiles/ContinuityOfThePerronRoot.pdf or http://arxiv.org/abs/1407.7564 )
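For illustration, a short numerical sketch of this monotone-decreasing sequence of Perron roots (my own, assuming NumPy; it recreates the same $S_P$, $S_A$ as the sketch above):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
P = rng.random((n, n)) + 0.1
P /= P.sum(axis=1, keepdims=True)       # irreducible (positive) stochastic P
A = P.copy()
A[0, 0] *= 0.5                          # 0 <= A <= P, A != P

S = lambda M: sum(np.linalg.matrix_power(M, k) for k in range(n)) / n
S_P, S_A = S(P), S(A)

# Perron roots of the interpolants S_k = (1/k) S_P + ((k-1)/k) S_A:
# they start at 1 and decrease strictly towards the Perron root of S_A.
perron = lambda M: abs(np.linalg.eigvals(M)).max()
print([round(perron(S_P / k + (k - 1) / k * S_A), 6) for k in range(1, 8)])
print(perron(S_A))                      # the limit point, strictly below 1
```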

proof 1 (no analysis)
$A$ decomposes into $r$ irreducibles $\implies S_A$ decomposes into a collection of $r$ irreducibles (i.e. it is permutation similar to a block lower triangular matrix).

Each irreducible of $S_A$ is either nilpotent [hence its eigenvalues are all $0$] or a positive matrix [each element in the irreducible of $A$ communicates with all others]. In the latter case we are dealing with positive matrices whose row sums are $\leq 1$, and this is strict in at least one row [i.e. if $r\geq 2$ the irreducible is at most $(n-1)\times (n-1)$ and its row sums are bounded above by the row sums of the corresponding submatrix of $S_P$, which are $\lt 1$ since $S_P$ is a positive stochastic matrix; if $r=1$ then $S_A \leq S_P$ where the inequality is strict in at least one component]. Each irreducible [submatrix] $M$ of $S_A$ has a single Perron root $\lambda$ and $M \mathbf x = \lambda \mathbf x$ for some positive vector $\mathbf x$, but each element of $M\mathbf x$ is a 'complete' (sub)convex combination (i.e. all weights positive, summing to at most $1$) of elements of $\mathbf x$. Hence $\lambda \gt 1$ is impossible, and if $\lambda =1$ then $M\mathbf x = \mathbf x \implies \mathbf x \propto \mathbf 1$, since $\mathbf x$ cannot have a strict maximum [i.e. if $\max_i x_i \gt \min_k x_k \gt 0$ then that max cannot be written as a complete (sub)convex combination of elements of $\mathbf x$]. But since at least one row of $M$ sums to $\lt 1$, we know $M\mathbf 1 \neq \mathbf 1\implies \lambda \lt 1$. This is a maximum-principle-type finish; another way to finish would be to stick $M$ into the leading principal submatrix of a matrix of zeros that has one extra column and row, then turn that matrix into an absorbing-state chain by incrementing the zeros in the relevant column so it is a stochastic matrix with a single absorbing state and a transient chain given by $M$.
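Here is a small numerical sketch of this block decomposition (my own, assuming NumPy and SciPy; the particular reducible $A$ is an arbitrary illustrative choice): the strongly connected classes of $A$ give the irreducible diagonal blocks of $S_A$, each with row sums at most $1$, at least one of them strictly smaller, and Perron root below $1$.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

n = 6
# Irreducible aperiodic P: a cycle mixed with the uniform chain.
P = 0.5 * np.roll(np.eye(n), 1, axis=1) + 0.5 * np.full((n, n), 1.0 / n)
# A reducible A <= P, A != P: cut all transitions from {0,1,2} to {3,4,5}.
A = P.copy()
A[:3, 3:] = 0.0

S = lambda M: sum(np.linalg.matrix_power(M, k) for k in range(n)) / n
S_A = S(A)

# Strongly connected classes of A correspond to the irreducible diagonal blocks of S_A.
r, labels = connected_components(csr_matrix(A), directed=True, connection='strong')
for c in range(r):
    idx = np.where(labels == c)[0]
    block = S_A[np.ix_(idx, idx)]
    rho = abs(np.linalg.eigvals(block)).max()
    print(idx, block.sum(axis=1).round(3), round(float(rho), 3))
# each block: row sums <= 1 with at least one strictly below 1, and Perron root < 1
```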

proof 2 (no analysis and no consideration of reducibility of $A$)
First use a diagonal similarity to make the left and right Perron vectors of $S_P$ coincide: $S_P^T$ has a (unique up to re-scaling) Perron vector $\mathbf w$; define $\mathbf v_1:= \mathbf w^\frac{1}{2}\cdot \frac{1}{\big \Vert \mathbf w^\frac{1}{2}\big \Vert_2}$
(where the square root is understood to be taken component-wise)

$\Gamma:= \text{diag}\big(\mathbf v_1\big)$
$C_A:= \Gamma S_A \Gamma^{-1}$
$C_P:= \Gamma S_P \Gamma^{-1}$
Checking that $C_P\mathbf v_1 = \Gamma S_P\mathbf 1 = \Gamma\mathbf 1=\mathbf v_1$ and, using $\mathbf v_1^T\Gamma = \tfrac{1}{\sum_i w_i}\mathbf w^T$, that $\mathbf v_1^T C_P =\tfrac{1}{\sum_i w_i}\mathbf w^TS_P\Gamma^{-1} = \tfrac{1}{\sum_i w_i}\mathbf w^T\Gamma^{-1}=\mathbf v_1^T$

Left (and right) multiplication by a positive diagonal matrix preserves point-wise inequalities between non-negative matrices, hence $C_A \leq C_P$
with the inequality strict in at least one component. Moreover $C_P$ is a positive matrix with a single dominant singular value $\sigma_1 = 1=\lambda_1$, the Perron root, with Perron vector $\mathbf v_1$ (to check: apply Perron theory to $\big(C_P^T C_P\big)\mathbf v_1=C_P^T\mathbf v_1=\mathbf v_1$). Finally

$\lambda_{\max \text{modulus}}\big(C_A\big)^2 \leq \Big \Vert C_A\Big \Vert_2^2 = \max_{\Vert\mathbf x\Vert_2=1} \mathbf x^TC_A^TC_A\mathbf x\leq \mathbf x^TC_P^TC_P\mathbf x\leq \Big \Vert C_P\Big \Vert_2^2 =1 $
Observe that by Perron–Frobenius theory the maximal eigenvalue of $C_A^TC_A$ is real and non-negative, with a real non-negative associated eigenvector $\mathbf x$; selecting that maximizing $\mathbf x$ yields $\max_{\Vert\mathbf x\Vert_2=1} \mathbf x^TC_A^TC_A\mathbf x\leq \mathbf x^TC_P^TC_P\mathbf x$. But $\mathbf x^TC_P^TC_P\mathbf x\leq \big \Vert C_P\big \Vert_2^2$ being met with equality implies $\mathbf x=\mathbf v_1$, a (particular) positive vector, and that implies the prior inequality is strict, i.e. it would imply $(\lambda_{\max \text{modulus}})^2 \leq \max_{\Vert\mathbf x\Vert_2=1} \mathbf x^TC_A^TC_A\mathbf x=\mathbf v_1^TC_A^TC_A\mathbf v_1\lt \mathbf v_1^TC_P^TC_P\mathbf v_1 =1$. Thus the spectral radius of $S_A$ is $\lt 1$.
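A numerical sketch of this similarity-transform argument (my own, assuming NumPy; the random $P$ and the shrunken entry of $A$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
P = rng.random((n, n)) + 0.1
P /= P.sum(axis=1, keepdims=True)       # irreducible (positive) stochastic P
A = P.copy()
A[2, 3] *= 0.25                         # 0 <= A <= P, A != P

S = lambda M: sum(np.linalg.matrix_power(M, k) for k in range(n)) / n
S_P, S_A = S(P), S(A)

# Left Perron vector w of S_P, i.e. the Perron vector of S_P^T.
vals, vecs = np.linalg.eig(S_P.T)
w = np.abs(np.real(vecs[:, np.argmax(np.real(vals))]))
v1 = np.sqrt(w) / np.linalg.norm(np.sqrt(w))

G = np.diag(v1)
C_P = G @ S_P @ np.linalg.inv(G)
C_A = G @ S_A @ np.linalg.inv(G)

print(np.allclose(C_P @ v1, v1), np.allclose(v1 @ C_P, v1))   # v1 is both right and left Perron vector of C_P
print(np.linalg.norm(C_P, 2), np.linalg.norm(C_A, 2))         # ||C_P||_2 = 1 > ||C_A||_2
print(abs(np.linalg.eigvals(S_A)).max())                      # spectral radius of S_A < 1
```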

user8675309
  • note that proof 2 works directly with lazy chains as well, i.e. $S_A:=\frac{1}{2}\big(I+A\big)$ and $S_P:=\frac{1}{2}\big(I+P\big)$. Then $S_P^T S_P$ is irreducible as it is a simple average of non-negative matrices which includes irreducible $P$ hence the sum is a chain that is irreducible. This means $C_P$ is irreducible as well with a single dominant singular value $\sigma_1 = 1=\lambda_1$ the Perron root. – user8675309 Jul 01 '24 at 16:09

I think I figured out the solution:
Assume $P(i,k) > A(i,k)$ for some $i,k \in \mathcal{X}$, where $\mathcal{X}$ is the state space, so $P$ and $A$ are of size $|\mathcal{X}| \times |\mathcal{X}|$. Also assume, for contradiction, that $Av = \lambda v$ for some eigenvector $v \neq 0$ and some $|\lambda| \ge 1$.
Then we have:
$|Av(i)| = \Big|\sum_{j \in \mathcal{X}} A(i,j)v(j)\Big| \le \sum_{j \in \mathcal{X}} A(i,j)|v(j)| \le \Big(\sum_{j \in \mathcal{X}} A(i,j)\Big)\max\limits_{l \in \mathcal{X}} |v(l)| < \max\limits_{l \in \mathcal{X}} |v(l)|$
where the last inequality is strict because row $i$ of $A$ sums to at most $1 - \big(P(i,k) - A(i,k)\big) < 1$ while $\max\limits_{l \in \mathcal{X}} |v(l)| > 0$.
In addition:
 
$|Av(i)| = |\lambda|\, |v(i)| \ge |v(i)|$, since $|\lambda| \ge 1$.

Hence: $|v(i)| < \max\limits_{l \in \mathcal{X}} |v(l)|$
Meaning not all of the vector's entries have the same absolute value. Now assume $P$ is aperiodic (otherwise use @user8675309's suggestion for handling periodicity); then there exists a time $t$ such that all entries of $P^t$ are strictly positive, and the element-wise inequality $A^t \le P^t$ still holds.
Denote $A' = A^t$, $P' = P^t$, and let $s \in \mathcal{X}$.

$|A'v(s)| = \Big|\sum_{j \in \mathcal{X}} A'(s,j)v(j)\Big| \le \sum_{j \in \mathcal{X}} A'(s,j)|v(j)| \le \sum_{j \in \mathcal{X}} P'(s,j) |v(j)| < \max\limits_{l \in \mathcal{X}} |v(l)|$

The last inequality holds because the entries of $P^t$ are strictly positive, its rows sum to $1$, and, as shown earlier, at least one entry of $v$ has absolute value strictly below the maximum. This holds for every state, and in particular for a state $s'$ achieving the maximum absolute value:
$|\lambda|^t\, |v(s')| = |A'v(s')| < |v(s')|$

Since $|v(s')| > 0$, this contradicts $|\lambda| \ge 1$, and we conclude the proof.
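A quick numerical sanity check of this argument (my own sketch, assuming NumPy; the random $P$ and the shrunken entry of $A$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
# Irreducible and aperiodic P: all entries positive, so t = 1 already works.
P = rng.random((n, n)) + 0.05
P /= P.sum(axis=1, keepdims=True)
A = P.copy()
A[1, 4] *= 0.3                           # P(i,k) > A(i,k) at (i,k) = (1, 4)

print(A[1].sum())                        # row i sums to strictly less than 1
print(abs(np.linalg.eigvals(A)).max())   # every eigenvalue of A has modulus < 1
```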

  • it may be a notation issue that I'm not understanding, but isn't $ \max\limits_{l \in \mathcal{X}} |v(l)|$ a fixed scalar? If so, then shouldn't $ \sum_{j \in \mathcal{X}} P'(s,j)\max\limits_{l \in \mathcal{X}} |v(l)| = \max\limits_{l \in \mathcal{X}} |v(l)|\cdot \sum_{j \in \mathcal{X}} P'(s,j) = \max\limits_{l \in \mathcal{X}} |v(l)|$? since $P$ is (row) stochastic? – user8675309 Mar 29 '24 at 16:47
  • You are absolutely right, I got confused mid-writing. So instead of the max value there, it's supposed to be a summation over $v$'s entries in absolute value. – codeplay Mar 29 '24 at 17:43