
Let $A$ be the generator matrix of a continuous-time Markov chain. This means that $A$ has positive off-diagonal elements $A_{ij} > 0$, $i \ne j$, and row sums $\sum_j A_{ij}$ equal to $0$. For example, $A$ could be $$ A = \left( \begin{matrix} -7 & 4 & 3 \\ 1 & -2 & 1 \\ 3 & 5 & -8 \end{matrix} \right). $$ I am interested in proving the following claim about the matrix $B = x(x I - A)^{-1}$ for some $x > 0$.

Claim. For the matrix $B = x(x I - A)^{-1}$, it holds that the diagonal elements of $B - B^2$ are non-negative.

Using numerical simulations I have convinced myself that this claim is likely true; however, I have not been able to make much progress toward proving it.
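For reference, a minimal version of such a simulation (a sketch; the rate ranges, sizes, and seed are arbitrary choices) looks like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_generator(n):
    """Random CTMC generator: positive off-diagonals, rows summing to zero."""
    A = rng.uniform(0.1, 5.0, size=(n, n))
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(A, -A.sum(axis=1))
    return A

worst = np.inf
for _ in range(1000):
    n = int(rng.integers(2, 8))
    x = rng.uniform(0.01, 10.0)
    A = random_generator(n)
    B = x * np.linalg.inv(x * np.eye(n) - A)
    worst = min(worst, np.diag(B - B @ B).min())

print(worst)  # stays (numerically) non-negative in every trial
```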

It is straightforward to show that the matrix $B$ is stochastic. However, the claim above is not true for all stochastic matrices $B$; there is something special about stochastic matrices of this particular form.

Any ideas?

user133281
  • $x$ is an arbitrary positive real number. Thanks, edited – user133281 Mar 01 '24 at 18:04
  • I suggest you just write $B = (I - A)^{-1}$, as the $x$ doesn't contribute anything. I.e. you are asking whether the claim for $B = x(x I - A)^{-1}$ holds for arbitrary $x \gt 0$ and arbitrary generator matrix $A$, which is equivalent to asking about $B = (I - x^{-1}A)^{-1}$; but this is equivalent to asking about $B = (I - A)^{-1}$, since rescaling $A$ by a positive number changes neither the rows summing to $0$ nor the off-diagonals of $A$ being $\gt 0$ [i.e. rescaling by $x^{-1}$ maps one arbitrary generator to another]. – user8675309 Mar 02 '24 at 02:18
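The rescaling argument in the comment above is easy to confirm numerically; a quick sketch (the generator and the value of $x$ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, x = 4, 2.5
A = rng.uniform(0.1, 3.0, size=(n, n))   # off-diagonal rates
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))      # rows sum to zero
I = np.eye(n)

B_x = x * np.linalg.inv(x * I - A)       # B for the pair (x, A)
B_1 = np.linalg.inv(I - A / x)           # B for (1, A/x); A/x is again a generator
print(np.allclose(B_x, B_1))             # True
```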

4 Answers


Let me continue from @user1551's idea and show that indeed $\mathbf{B} - \mathbf{B}^2$ has non-negative diagonals.


Step 1. We recap @user1551's reduction.

Let $\mathbf{A}$ be a transition-rate matrix as in OP, and let $\mathbf{B} = ( \mathbf{I} - \mathbf{A})^{-1}$. (Here, we assume $x=1$ without loss of generality.) Choose $d > 0$ sufficiently large so that

$$ \mathbf{P} = \mathbf{I} + \frac{1}{d} \mathbf{A} $$

is a stochastic matrix. Solving this for $\mathbf{A}$ gives $\mathbf{A} = d(\mathbf{P} - \mathbf{I})$, hence

\begin{align*} \mathbf{B} - \mathbf{B}^2 &= -\mathbf{A}(\mathbf{I} - \mathbf{A})^{-2} \\ &= \frac{d}{(d+1)^2} (\mathbf{I} - \mathbf{P})(\mathbf{I} - c \mathbf{P})^{-2} \end{align*}

where $c = \frac{d}{d+1} \in (0, 1)$. In light of this, it suffices to prove:

Claim. Let $\mathbf{P}$ be a stochastic matrix. Then for any $c \in (0, 1)$, the diagonal entries of $$ (\mathbf{I} - \mathbf{P})(\mathbf{I} - c\mathbf{P})^{-2} $$ are non-negative.
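This reduction can be sanity-checked numerically before proceeding (a sketch; the random generator and the choice of $d$ are arbitrary within the stated constraints):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.uniform(0.1, 3.0, size=(n, n))
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))      # generator: rows sum to zero
I = np.eye(n)

B = np.linalg.inv(I - A)                 # x = 1 without loss of generality
d = 1.1 * np.max(-np.diag(A))            # large enough that P is stochastic
P = I + A / d
c = d / (d + 1)

lhs = B - B @ B
M = np.linalg.inv(I - c * P)
rhs = d / (d + 1) ** 2 * (I - P) @ M @ M
print(np.allclose(lhs, rhs), np.allclose(P.sum(axis=1), 1.0))
```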

Step 2. Let $X = (X_n)_{n\geq 0}$ denote a Markov chain with the transition matrix $\mathbf{P}$. Fix a state $j$, and define $(\tau_k)_{k\geq 0}$ as the sequence of return times to state $j$. More precisely,

$$ \tau_0 = 0 \qquad \text{and} \qquad \tau_{k+1} = \inf\{ n > \tau_k : X_n = j \}. $$

If $\mathbb{P}_j$ denotes the law of $X$ started at $j$, then

\begin{align*} \mathbf{e}_j^{\top} (\mathbf{I} - c \mathbf{P})^{-1} \mathbf{e}_j &= \sum_{n=0}^{\infty} c^n (\mathbf{e}_j^{\top} \mathbf{P}^n \mathbf{e}_j) = \sum_{n=0}^{\infty} c^n \mathbb{P}_j(X_n = j) \\ &= \sum_{n=0}^{\infty} \sum_{k=0}^{\infty} c^n \mathbb{P}_j(\tau_k = n) = \sum_{k=0}^{\infty} \mathbb{E}_j[c^{\tau_k}] = \sum_{k=0}^{\infty} \mathbb{E}_j[c^{\tau_1}]^k \\ &= \frac{1}{1 - \mathbb{E}_j[c^{\tau_1}]}. \end{align*}

This computation is related to the original problem as follows:

\begin{align*} \frac{\mathrm{d}}{\mathrm{d}c} \frac{1 - c}{1 - \mathbb{E}_j[c^{\tau_1}]} &= \frac{\mathrm{d}}{\mathrm{d}c} \mathbf{e}_j^{\top} (1 - c)(\mathbf{I} - c \mathbf{P})^{-1} \mathbf{e}_j \\ &= - \mathbf{e}_j^{\top} (\mathbf{I} - \mathbf{P})(\mathbf{I} - c\mathbf{P})^{-2} \mathbf{e}_j. \end{align*}
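The first-return identity above can be checked numerically on a small example. The sketch below recovers the first-return probabilities $\mathbb{P}_j(\tau_1 = n)$ from the return probabilities by deconvolution; the transition matrix is an arbitrary choice:

```python
import numpy as np

P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])          # an arbitrary stochastic matrix
j, c, N = 0, 0.7, 200

# p[n] = P_j(X_n = j); first-return probs f[n] solve p_n = sum_{m=1}^n f_m p_{n-m}
p = [np.linalg.matrix_power(P, n)[j, j] for n in range(N + 1)]
f = [0.0] * (N + 1)
for n in range(1, N + 1):
    f[n] = p[n] - sum(f[m] * p[n - m] for m in range(1, n))

E_c_tau1 = sum(f[n] * c ** n for n in range(1, N + 1))  # E_j[c^{tau_1}], truncated
resolvent = np.linalg.inv(np.eye(3) - c * P)[j, j]      # e_j^T (I - cP)^{-1} e_j
print(abs(resolvent - 1 / (1 - E_c_tau1)) < 1e-8)       # True
```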

Step 3. The above relation shows that the claim is equivalent to showing:

Claim 2. The function $$ f(c) = \frac{1 - c}{1 - \mathbb{E}_j[c^{\tau_1}]} $$ is non-increasing for $c \in (0, 1)$.

This is the same as showing that $1/f(1-x)$ is non-increasing for $x \in (0, 1)$, since taking reciprocals and substituting $c = 1 - x$ each reverse monotonicity. However,

\begin{align*} \frac{1}{f(1-x)} &= \frac{1 - \mathbb{E}_j[(1-x)^{\tau_1}]}{x} = \int_{0}^{1} \mathbb{E}_j[ \tau_1 (1 - xt)^{\tau_1 - 1} ] \, \mathrm{d}t. \end{align*}

It is clear that $x \mapsto \tau_1 (1 - xt)^{\tau_1 - 1}$ is non-increasing for each $t \in [0, 1]$, hence the claim is proved. $\square$
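As a final sanity check, Claim 2, in its matrix form $f(c) = \mathbf{e}_j^{\top} (1-c)(\mathbf{I} - c\mathbf{P})^{-1} \mathbf{e}_j$, can be tested on a grid (the stochastic matrix here is an arbitrary choice):

```python
import numpy as np

P = np.array([[0.2, 0.5, 0.3],
              [0.4, 0.1, 0.5],
              [0.3, 0.3, 0.4]])          # arbitrary stochastic matrix

def f(c, j=0):
    # f(c) = (1 - c) e_j^T (I - cP)^{-1} e_j = (1 - c)/(1 - E_j[c^{tau_1}])
    return (1 - c) * np.linalg.inv(np.eye(3) - c * P)[j, j]

cs = np.linspace(0.01, 0.99, 99)
vals = [f(c) for c in cs]
non_increasing = all(a >= b - 1e-12 for a, b in zip(vals, vals[1:]))
print(non_increasing)                    # True
```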

Digitallis
Sangchul Lee

Not an answer, but some observations. Let $d$ be the maximum diagonal entry of $-\frac{1}{x}A$. Then $S=I+\frac{1}{xd}A$ is a stochastic matrix whose diagonal elements are less than $1$. Conversely, given any $d>0$ and any stochastic matrix $S$ whose diagonal elements are less than $1$, $A=-xd(I-S)$ is a generator of a CTMC. Therefore, while the stochastic matrix $B$ in the OP takes a very special form and the statement that $B-B^2$ has a nonnegative diagonal is not true for a general stochastic matrix $B$, if we express $B$ in terms of $S$, we may reformulate the statement in question as one about an almost general stochastic matrix $S$.

More specifically, let $c=\frac{d}{d+1}$. Then $0<c<1$ and \begin{align*} B-B^2 &=(B^{-1}-I)B^2\\ &=-\frac{A}{x}\left(I-\frac{A}{x}\right)^{-2}\\ &=d(I-S)\big(I+d(I-S)\big)^{-2}\\ &=d(d+1)^{-2}(I-S)(I-cS)^{-2}.\\ \end{align*} Hence the OP is essentially asking whether $(I-S)(I-cS)^{-2}$ has a nonnegative diagonal for any $c\in(0,1)$ and any stochastic matrix $S$ whose diagonal elements are less than $1$.

When $c$ is small, the answer is clearly affirmative, because $(I-cS)^{-2}$ is close to $I$ and $e_j^T(I-S)(I-cS)^{-2}e_j\approx e_j^T(I-S)e_j=(1-s_{jj})\ge0$. In fact, for any $i\ne j$, we have \begin{align*} \left((I-cS)^{-2}\right)_{jj} &=\left(I+2cS+3c^2S^2+4c^3S^3+\cdots\right)_{jj}\ge1,\\ \left((I-cS)^{-2}\right)_{ij} &=\left(I+2cS+3c^2S^2+4c^3S^3+\cdots\right)_{ij}\\ &\le2c+3c^2+4c^3+\cdots\\ &=(1-c)^{-2}-1. \end{align*} Therefore, when $(1-c)^{-2}-1\le1$, i.e., when $c\le1-\frac{1}{\sqrt{2}}\approx0.2929$ or $d\le\sqrt{2}-1$, the largest element in the nonnegative vector $v=(v_1,v_2,\ldots,v_n)^T=(I-cS)^{-2}e_j$ will be $v_j$. Consequently, \begin{align*} e_j^T(I-S)(I-cS)^{-2}e_j &=e_j^T(I-S)v\\ &=(1-s_{jj})v_j-\sum_{k\ne j}s_{jk}v_k\\ &\ge(1-s_{jj})v_j-\sum_{k\ne j}s_{jk}v_j\\ &=\left(1-\sum_{k=1}^ns_{jk}\right)v_j\\ &=0. \end{align*} This settles the question for small $c$. However, numerical simulation suggests that the answer is actually affirmative for any $c\in(0,1)$.
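A quick numerical illustration of the bound, with a randomly chosen $S$ and $c$ exactly at the threshold $1-\frac{1}{\sqrt{2}}$ (a sketch; the size and rate ranges are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
S = rng.uniform(0.05, 1.0, size=(n, n))
S /= S.sum(axis=1, keepdims=True)        # stochastic, diagonal < 1
I = np.eye(n)

c = 1 - 1 / np.sqrt(2)                   # threshold where the bound is tight
M2 = np.linalg.inv(I - c * S)
M2 = M2 @ M2                             # (I - cS)^{-2}

v = M2[:, 0]                             # v = (I - cS)^{-2} e_j for j = 0
print(int(np.argmax(v)))                 # the column max sits on the diagonal: 0
print(bool(np.all(np.diag((I - S) @ M2) >= -1e-12)))
```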

user1551

Although a solution has been posted using analytic means, I note that it is possible to see the nonnegativity of the diagonal elements of $B - B^2$ purely algebraically, via a combinatorial interpretation of the entries of the matrix $B$.

Fix $n$ and regard $A$ as a function of its off-diagonal entries $a_{ij}$, $i \neq j$, with its diagonal entries determined by the requirement that the row sums of $A$ are all zero. A monomial in the variables $a_{ij}$, $i \neq j$, can be thought of as a directed graph on the vertices $\{1,2,\dots,n\}$ without "self-loops": the power of $a_{ij}$ in the monomial records how many directed edges go from $i$ to $j$ in the graph (the distinctness of the indices $i$ and $j$ in the variables $a_{ij}$ means that no vertex is ever connected to itself by a single edge). When $G$ is such a directed graph I will write $a_G = \prod_{(i,j) \in G} a_{ij}$ for the corresponding monomial (this product being interpreted "with multiplicity" whenever $G$ has more than one edge from $i$ to $j$ - although such $G$ will not arise in the cases of interest below). A forest on the vertices $\{1,2,\dots,n\}$ is an undirected graph on these vertices whose connected components contain no loops (i.e., are trees), and a rooted forest is a forest whose connected components each have a distinguished vertex. A rooted forest can be regarded as a directed graph by orienting each edge toward the root of its component. I will write $\mathcal{F}_k$ for the set of all rooted forests on $\{1,\dots,n\}$ having exactly $k$ components (equivalently, having exactly $n-k$ edges), and for $1 \leq j \leq n$ I will write $\mathcal{F}_k(j)$ for the set of rooted forests in $\mathcal{F}_k$ in which $j$ is a root. Note that there is exactly one element $F$ of $\mathcal{F}_n$ (the graph having vertices $\{1,2,\dots,n\}$ and no edges), and in this case the corresponding monomial $a_F$ is $1$ and the sets $\mathcal{F}_n(j)$ are all equal to $\mathcal{F}_n$.

Letting $B := x(xI - A)^{-1}$ and denoting the row $i$, column $j$ entry of $B$ by $B_{ij}$, and letting $d(xI - A) := \det(xI - A)/x$ (note that it is clear a priori that $x$ divides $\det(xI - A)$), the combinatorial interpretations of the entries of $B$ promised above are that $$ d(xI-A) = \sum_{k=1}^n \left(\sum_{F \in \mathcal{F}_k} a_F\right) x^{k-1}, $$ and that $$ d(xI-A) B_{ij} = \sum_{k=1}^n \left(\sum_{F \in \mathcal{F}_k(j), i \sim_F j} a_F\right) x^{k-1}, \qquad 1 \leq i, j \leq n, $$ where $i \sim_F j$ means that $i$ and $j$ belong to the same connected component of $F$ (a condition that is vacuous when $i=j$). In this last formula note that when $i \neq j$ the inside sum corresponding to the index value $k=n$ is empty (because the one element of $\mathcal{F}_n$ has no edges) and does not contribute to the sum.

As the OP observed, it is straightforward to show that $B$ is (right) stochastic, i.e., that it has all its row sums equal to $1$. The formulas above are a refinement of that observation. That the rows of $B$ sum to $1$, or equivalently that the rows of $d(xI-A) B$ sum to $d(xI - A)$, reflects the fact that for any row index $i$, the rooted forests on the vertices $\{1,\dots,n\}$ can be partitioned according to the root of the component that contains $i$, with $d(xI - A) B_{ij}$ accounting for those forests in which $i$ lies in a component in which $j$ is a root, and with $d(xI-A)$ accounting for all of these forests.

To see how these formulas imply the nonnegativity of the diagonal of $B - B^2$, note that for any stochastic matrix $B$ (i.e. not just $B = x(xI - A)^{-1}$ with $A$ as above) we have $$ B_{ii} = \left(\sum_{j=1}^n B_{ij}\right) B_{ii} = \sum_{j=1}^n B_{ij} (B_{ji} + B_{ii} - B_{ji}) = [B^2]_{ii} + \sum_{j \neq i} B_{ij} (B_{ii} - B_{ji}). $$ For $B$ of the above specific type (but not for all stochastic matrices), our formulas will show that $B_{ii} - B_{ji} \geq 0$ for all $j \neq i$, which then (by the formula displayed immediately above) implies $B_{ii} \geq [B^2]_{ii}$ for all $i$.

Indeed, while $d(xI - A) B_{ii}$ is a sum over all rooted forests in which $i$ is a root, for $i \neq j$ the term $d(xI-A) B_{ji}$ accounts for only those forests in which $i$ is a root and $j$ is in the same component as $i$. For $i \neq j$ the difference $p_{ji} := d(xI - A) B_{ii} - d(xI - A) B_{ji}$ is thus the sum $$ p_{ji} = \sum_{k=1}^n \left(\sum_{F \in \mathcal{F}_k(i), j \not \sim_F i} a_F\right) x^{k-1}, $$ which is plainly nonnegative whenever $x$ and the $a_{ij}$ are, because it is a sum of monomials in these variables with nonnegative integer coefficients (indeed, any positive coefficient is $1$). Since $d(xI - A)$ is also such a sum (and is positive for positive $x$ and positive values of the $a_{ij}$), the nonnegativity of $B_{ii} - B_{ji}$ for $i \neq j$, and hence the nonnegativity of $B_{ii} - [B^2]_{ii}$ for all $1 \leq i \leq n$, follows.

The formula for $d(xI-A) = \det(xI-A)/x$ and the formulas for the matrix entries $B_{ij}$ (which are themselves determinant formulas by e.g. Cramer's rule) are slightly nontrivial to verify. Note, for example, that while it is straightforward to see that $d(xI - A)$ is a sum of monomials in $x$ and $a_{ij}$ of degree $n-1$, it is perhaps less clear which monomials of this form occur in the expansion and which do not. When $n=4$, for example, the formula immediately shows that the monomial $a_{12} a_{24} a_{41}$ does not appear with nonzero coefficient in $\det(xI - A)/x$ (the corresponding directed graph has a loop), although the monomial $a_{12} a_{24} a_{41} x$ does indeed appear with nonzero coefficient in the product of the diagonal entries of $xI - A$ (as do all degree $4$ products of $x$ with three "$a$" variables taken from distinct rows). So the formula above for $d(xI-A)$ implies that a substantial amount of cancellation would take place were one to simply evaluate the determinant via Laplace expansion (for example).

Some general remarks on the formula for $d(xI - A)$:

  • It is a polynomial of degree $n-1$ in $x$ with leading term $x^{n-1}$.
  • The coefficient of $x^{n-2}$ in $d(xI-A)$ is the sum of all variables $a_{ij}$, $i \neq j$.
  • The coefficient of $x^{n-3}$ in $d(xI-A)$ is the sum of all products of pairs of variables $a_{ij} a_{kl}$, $i \neq j$, $k \neq l$, where we additionally require that $a_{ij}$ and $a_{kl}$ not occupy positions that are reflections of one another across the diagonal, or in the same row - or more suggestively for the general case, where the directed edges from $i$ to $j$ and from $k$ to $l$ do not together form a loop (preventing the monomial from representing part of a tree) or point in two "directions" from the same point (preventing the monomial from representing part of a rooted tree).
  • In general, the coefficient of $x^{n-(d+1)}$ in $d(xI-A)$ is a sum of degree $d$ monomials $a_{i_1 j_1} a_{i_2 j_2} \cdots a_{i_d j_d}$ in the variables $a_{ij}$, where a given monomial appears if and only if the directed graph with vertices $\{1,\dots,n\}$ and edges $\{(i_k, j_k): 1 \leq k \leq d\}$ is a rooted forest. When $d = n-1$ this is the requirement that $\{(i_k, j_k): 1 \leq k \leq d\}$ define a rooted tree. There are famously $n^{n-1}$ such trees (and hence such monomials) by Cayley's formula (which tells us there are $n^{n-2}$ labeled trees on $\{1,2,\dots,n\}$, each of which can be rooted in $n$ different ways).
  • The coefficient of $x^{k-1}$ in $d(xI - A)$ is in general a sum of $|\mathcal{F}_k|$ monomials in the variables $a_{ij}$ (each monomial having degree $n-k$). As just observed, it is well known that $|\mathcal{F}_1| = n^{n-1}$; I do not know of a simple citation for where the numbers $|\mathcal{F}_k|$ (equivalently, the number of rooted forests on $\{1,2,\dots,n\}$ involving $n-k$ edges) have been computed in print, but this MO post provides express (if somewhat ugly) formulas for the number of labeled forests with $k$ components on $\{1,\dots,n\}$ (that is, counting the objects in $\mathcal{F}_k$ up to differences in choices of roots).
  • One can use computer software to expressly compute and check all of the formulas above, although they involve large numbers of terms even when $n$ is as small as $3$ or $4$, and the large amount of symmetry in the problem makes it difficult (at least for me) to visually process, let alone independently evaluate, the resulting output.
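Following up on the last bullet: for $n = 3$ the check is still small enough to write out explicitly. The $9$ rooted trees are the $3$ labeled paths on $\{1,2,3\}$, each rooted at one of its $3$ vertices, with edges oriented toward the root. A sketch with random rates (indices $0$-based in the code):

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(4)
a = rng.uniform(0.1, 2.0, size=(3, 3))   # off-diagonal rates a[i, j], i != j
A = a.copy()
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))      # the generator built from these rates
x = 1.7

# d(xI - A) should equal  x^2                  (empty forest, k = 3)
#                       + x * (sum of all a_ij) (one-edge forests, k = 2)
#                       + sum over the 9 rooted trees (k = 1)
c1 = sum(a[i, j] for i in range(3) for j in range(3) if i != j)
c0 = 0.0
for m, p, q in permutations(range(3)):   # m = middle vertex of the path
    if p < q:                            # count each unordered leaf pair once
        c0 += a[p, m] * a[q, m]          # rooted at m: edges p->m, q->m
        c0 += a[q, m] * a[m, p]          # rooted at p: edges q->m, m->p
        c0 += a[p, m] * a[m, q]          # rooted at q: edges p->m, m->q
forest_poly = x ** 2 + c1 * x + c0
det_over_x = np.linalg.det(x * np.eye(3) - A) / x
print(np.isclose(forest_poly, det_over_x))  # True
```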
  • Like a commenter mentioned, the $x$ can be dropped. Does this make your solution easier? – Benjamin Wang Mar 08 '24 at 17:35
  • @BenjaminWang I suppose it depends on what one means by "easier"; $x$ does not appear in a particularly complicated way in these formulas, and depending on how you look at it, is arguably helpful in keeping track of where monomials of different degree in the $a_{ij}$ appear in the relevant determinants. The main formulas do simplify (there is certainly no need to parameterize sums over rooted forests by the number "$k$" of components), but this simplification is primarily visual. For purposes of computing anything nontrivial, IMVHO it is not any easier to take $x=1$ – leslie townes Mar 08 '24 at 17:45
  • The above also shows directly e.g. that if $x$ and the $a_{ij}$ lie in some subset of your field (or commutative ring) that is closed under $+$ and $\times$, then so do the ingredients in the relevant formulas (of course division is necessary to get the matrix entries themselves). The "rescale by $x^{-1}$" reduction makes one think for a second about determinants and how this plays out in a field of quotients before seeing this, while the above has this on the surface. (For purposes of e.g. integer computation on computers, it may also be helpful *not* to "just" rescale by $x^{-1}$ at the outset) – leslie townes Mar 08 '24 at 18:14

The missing part of this post is a proof that $B$ is a stochastic matrix. Such a proof can quickly lead to a proof that OP's claim is true. I suggest using Inverse of strictly diagonally dominant matrix for reference.

Mimicking the link, we have that $A' := I-A$ is a strictly diagonally dominant matrix with positive diagonal entries and negative off-diagonal entries, and that $A'\mathbf 1 = \mathbf 1$. Now $\delta A'$ has all diagonal entries $\lt \frac{1}{n}$ for some $\delta \gt 0$ small enough. Write $Q = I- \delta A'$, which is a positive matrix with Perron vector $\mathbf 1$ satisfying $Q\mathbf 1 = (1-\delta) \mathbf 1$.

$$\delta^{-1}\big(I-A\big)^{-1}=\big(\delta A'\big)^{-1}= \big(I-Q\big)^{-1}=I + Q + Q^2 + Q^3+\dots,$$
where the spectral radius of $Q$ is $1-\delta$, hence the series converges.

(i.) This proves $\big(I-A\big)^{-1}$ is a positive matrix and $\big(I-A\big)^{-1}\mathbf 1 = \mathbf 1$, hence it is a stochastic matrix.
(ii.) $Q$, being strictly substochastic in each row, is a transition matrix for a transient chain hence it also proves OP's claim. I.e. $N=\big(I-Q\big)^{-1}= I+QN$ is the fundamental matrix for some absorbing-state Markov chain and $n_{k,j}$ is the expected number of times the process is in transient state $j$ given a start in transient state $k$.

For $j\neq k$, we may look at $N = I + QN$ to get $n_{k,j} = \sum_{r=1}^n q_{k,r}n_{r,j}$, i.e. it is a sub-convex combination of elements in $\mathbf n_j$. Compare against $n_{k,k} = 1+\sum_{r=1}^n q_{k,r}n_{r,k}$, which is not a sub-convex combination. [This is equivalent to "first step analysis" for expected visits, though the probability interpretation is not needed.] There are finitely many values in each column of $N$, all of which are positive, hence a column max exists; but said maximum cannot be a sub-convex combination of other values in that column $\implies n_{k,k}$ is maximal for each column.

$\implies \delta \cdot N =\big(I-A\big)^{-1} = B$ has diagonal elements strictly larger than other elements in their respective column $\implies B-B^2$ has a positive diagonal since the $k$th diagonal of $B^2=BB$ is a non-trivial average of elements in $\mathbf b_k$ hence is strictly less than the column max, $b_{k,k}$.
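The construction above is easy to verify numerically (a sketch; the rates and the particular $\delta$ are arbitrary choices within the stated constraints):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.uniform(0.1, 2.0, size=(n, n))
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))             # generator, taking x = 1
I = np.eye(n)

Ap = I - A                                      # A' = I - A
delta = 1.0 / (1.01 * n * np.max(np.diag(Ap)))  # diag(delta * A') < 1/n
Q = I - delta * Ap                              # positive, rows sum to 1 - delta
N = np.linalg.inv(I - Q)                        # fundamental matrix
B = delta * N                                   # equals (I - A)^{-1}

print(np.allclose(B, np.linalg.inv(I - A)))
print(np.allclose(Q.sum(axis=1), 1 - delta))
print(all(np.argmax(N[:, j]) == j for j in range(n)))  # column max on the diagonal
```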

remark
If $A$ were symmetric, then we can get to the result via spectral theory since $\big(I-A\big)$ has all eigenvalues in $\mathbb R_{\geq 1}$. So all eigenvalues of $\big(I-A\big)^{-1}$ are $\in (0,1]$ and $\lt 1$ except for the Perron root which implies all eigenvalues of $B-B^2$ are $\lambda - \lambda^2 \geq 0$ with equality only for the Perron Root. Since the diagonal of a real symmetric matrix is in the convex hull of the eigenvalues of said matrix [majorization], the result follows. [With minor modification this gives the result for normal $A$ as well.] Corollary: If the chain is reversible then $B$ is diagonally similar to a symmetric matrix and this does not change eigenvalues or the diagonals of $B-B^2$ so the same spectral argument gives the result.
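The spectral statements in the remark can be confirmed directly for random symmetric rates (a sketch under the symmetric-generator assumption):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
W = rng.uniform(0.1, 2.0, size=(n, n))
W = (W + W.T) / 2                        # symmetric off-diagonal rates
A = W.copy()
np.fill_diagonal(A, 0.0)
np.fill_diagonal(A, -A.sum(axis=1))      # symmetric generator

eig = np.linalg.eigvalsh(np.eye(n) - A)  # eigenvalues of I - A, all >= 1
lam = 1.0 / eig                          # eigenvalues of B = (I - A)^{-1}, in (0, 1]
print(bool(np.all(eig >= 1 - 1e-10)))
print(bool(np.all(lam - lam ** 2 >= -1e-12)))  # eigenvalues of B - B^2 nonnegative
```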

user8675309