
The Problem. If $A$ is a complex $n \times n$ matrix and $b$ is a complex column vector and if $\text{rank }\begin{bmatrix} A - \lambda I & b \end{bmatrix} = n$ for every eigenvalue $\lambda$ of $A$, then $b$ is cyclic.

My Attempt. I know that $\text{rank}\begin{bmatrix} A - \lambda I \end{bmatrix} < n$ because the null space of $A - \lambda I$ is nontrivial when $\lambda$ is an eigenvalue. I know that $\text{rank}\begin{bmatrix} A - \lambda I & b \end{bmatrix} = n$ then implies that

  1. $b$ is not in $\text{range}(A - \lambda I)$, and
  2. $\text{rank}\begin{bmatrix} A - \lambda I \end{bmatrix} = n - 1$.

I know that point 2 implies that the geometric multiplicity of each eigenvalue is 1, so the minimal polynomial of $A$ equals the characteristic polynomial. I know that equality of these two polynomials occurs if and only if $A$ has a cyclic vector. What I don't know is: Why is $b$ specifically a cyclic vector?

My Research. I have already posted about this here, where I gave the problem from which this statement comes and where a kind person provided a solution. But I still don't understand. I have reread chapters 6 and 7 of Hoffman and Kunze, which deal with eigendecomposition (chapter 6) and cyclic decomposition (chapter 7), and I have consulted the internet. So, please, can you provide some intuition about all of this? Specifically, how do I know that a cyclic vector not only exists, but that $b$ is that cyclic vector?

3 Answers


As you observed, ${\rm rank}(A-\lambda I)=n-1\Rightarrow \dim{\rm ker}(A-\lambda I)=1$, which implies that there is only one Jordan block for each eigenvalue, and hence the characteristic polynomial agrees with the minimal polynomial.
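A quick numerical sanity check of this rank statement, on a hypothetical single $3\times 3$ Jordan block (a minimal numpy sketch, not part of the proof):

```python
import numpy as np

# A single 3x3 Jordan block with eigenvalue 2: the only eigenvalue is 2,
# and rank(A - 2I) = n - 1 = 2, so dim ker(A - 2I) = 1.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])
n = A.shape[0]
lam = 2.0
r = np.linalg.matrix_rank(A - lam * np.eye(n))
print(r)  # 2, i.e. n - 1: the geometric multiplicity of lambda is 1
```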

(0) Basic setup: Let the characteristic polynomial be $$f(x)=\prod_{i=1}^r(x-\lambda_i)^{d_i},\sum d_i=n.$$ Let $W_i={\rm ker}(A-\lambda_i I)^{d_i}.$ Then $$V={\mathbb C}^n=\bigoplus_{i=1}^r W_i,\dim W_i=d_i.$$ Note that each $W_i$ is invariant under $A$.

(1) A cyclic vector exists: For each $i$, choose $b_i\in {\rm ker}(A-\lambda_i I)^{d_i}\setminus {\rm ker}(A-\lambda_i I)^{d_i-1}$ (such $b_i$ exists; otherwise the minimal polynomial would be a proper factor of $f(x)$). Necessarily $b_i\notin {\rm Im}(A-\lambda_i I).$ Using the definition, it is easy to show that $$b_i,(A-\lambda_i I)b_i,\cdots,(A-\lambda_i I)^{d_i-1}b_i$$ form a basis for $W_i$, hence $W_i$ is spanned by $b_i,Ab_i,\cdots,A^{d_i-1}b_i.$ Now let $b=b_1+\cdots+b_r$ and $U:=\langle b,Ab,\cdots,A^{n-1}b\rangle_{\mathbb C}$ (note $U$ is $A$-invariant by Cayley–Hamilton). One needs to show that $$U=V.\qquad (1)$$ It suffices to show that $W_i\subset U$ for each $i$; without loss of generality, consider $i=1$. Note that $$\prod_{i\geq 2}(A-\lambda_i I)^{d_i}b=\prod_{i\geq 2}(A-\lambda_i I)^{d_i}b_1=[q(A)(A-\lambda_1 I)+\beta I]b_1,$$ where $q$ is some polynomial and $\beta=\prod_{i\geq 2}(\lambda_1-\lambda_i)^{d_i}\neq 0.$ Now it is easy to check that $$b_1':=[q(A)(A-\lambda_1 I)+\beta I]b_1\in {\rm ker}(A-\lambda_1 I)^{d_1}\setminus {\rm ker}(A-\lambda_1I)^{d_1-1}.$$ Since $b_1'=\prod_{i\geq 2}(A-\lambda_i I)^{d_i}b\in U$, exactly the same argument as for $b_i$ above shows that $$W_1=\langle b_1',Ab_1',\cdots,A^{d_1-1}b_1'\rangle_{\mathbb C}\subset U.$$ Similarly $W_i\subset U$ for $i=2,\cdots,r$; hence $(1)$ holds and thus $b$ is a cyclic vector.
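The construction above can be checked numerically on a small hypothetical example (a sketch, with $A=J_2(1)\oplus J_1(3)$ chosen for illustration): pick $b_i$ as described and verify that $b=b_1+b_2$ generates all of $V$.

```python
import numpy as np

# Hypothetical example: A = J_2(1) ⊕ J_1(3), so d_1 = 2, d_2 = 1, n = 3.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
n = A.shape[0]
# b_1 = e_2 lies in ker(A-I)^2 but not in ker(A-I); b_2 = e_3 spans W_2.
b = np.array([0.0, 1.0, 0.0]) + np.array([0.0, 0.0, 1.0])
# Krylov matrix [b, Ab, A^2 b]: full rank means b is a cyclic vector.
K = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(n)])
print(np.linalg.matrix_rank(K))  # 3
```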

(2) The given $b$ in the question is a cyclic vector: To see this, write $b=b_1+\cdots+b_r$ in the decomposition $W_1\oplus\cdots\oplus W_r.$ One can show that each $$b_i\in {\rm ker}(A-\lambda_i I)^{d_i}\setminus{\rm ker}(A-\lambda_i I)^{d_i-1},$$ hence by part (1), $b$ is a cyclic vector. Without loss of generality, one proves the case $i=1$, the other cases being similar. By the given assumptions $$B:=[A-\lambda_1I\qquad b_1+\cdots+b_r]$$ has rank $n$. Clearly $${\rm Im}(A-\lambda_1 I)\subset {\rm ker}(A-\lambda_1 I)^{d_1-1}\oplus W_2\oplus\cdots\oplus W_r,\qquad (2)$$ since $$(A-\lambda_1I)^{d_1-1}\big((A-\lambda_1I)(v_1+\cdots+v_r)\big)=0+\sum_{i\geq 2}(A-\lambda_1I)^{d_1}v_i$$ for any $v=v_1+\cdots+v_r$ in the decomposition $W_1\oplus\cdots\oplus W_r$, and the $W_i$ are invariant subspaces. Now, aiming for a contradiction, assume that $b_1\in {\rm ker}(A-\lambda_1I)^{d_1-1}.$ Then $$b_1+\cdots+b_r\in {\rm ker}(A-\lambda_1I)^{d_1-1}\oplus W_2\oplus\cdots\oplus W_r.\qquad (3)$$ Combining $(2)$ and $(3)$, the column space of $B$ lies in a subspace of dimension $(d_1-1)+d_2+\cdots+d_r=n-1$, so the rank of $B$ would be at most $n-1$. This is a contradiction. QED
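Part (2) can also be illustrated numerically (a hedged sketch on the same hypothetical $A=J_2(1)\oplus J_1(3)$): a $b$ whose $W_1$-component avoids ${\rm ker}(A-\lambda_1 I)^{d_1-1}$ passes the rank test, while one whose $W_1$-component lies in it fails.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
n = A.shape[0]

def hautus_ok(b):
    # rank [A - lambda I | b] = n at every eigenvalue of A
    return all(
        np.linalg.matrix_rank(np.hstack([A - lam * np.eye(n),
                                         b.reshape(-1, 1)])) == n
        for lam in np.linalg.eigvals(A)
    )

good = np.array([0.0, 1.0, 1.0])  # W_1-component e_2 is NOT in ker(A - I)
bad = np.array([1.0, 0.0, 1.0])   # W_1-component e_1 IS in ker(A - I)
print(hautus_ok(good), hautus_ok(bad))  # True False
```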

Acknowledgement: I would like to thank Chris Sanders for pointing out a problem in my earlier post, which naively assumed that $A$ is diagonalizable.

Pythagoras
  • Thank you! The point about $b_i \in \text{ker}(A - \lambda_i I)^{d_i}$ and $b_i \notin \text{ker}(A - \lambda_i I)^{d_i - 1}$ is a very useful way to think about this. I appreciate the time and effort that went into your response. – 1Teaches2Learn Mar 29 '22 at 18:03

Main prerequisites: Jordan normal form, basic properties of the ring of complex polynomials

The key fact is that in the Jordan normal form of $A$,

if there is more than one block, then these blocks must have distinct eigenvalues.

Two blocks $B,C$ cannot have the same eigenvalue $\lambda$, or else $\text{Ker}(A-\lambda I)$ would have at least two linearly independent vectors, one coming from $B$ and one coming from $C$.

By the way, any block that is $k$-by-$k$ for $k>1$ must have $1$'s above the diagonal. In other words, you can't have $\begin{bmatrix}\lambda &0\\0&\lambda\end{bmatrix}$ as one of the blocks, because this really is just two blocks, each of size $1$-by-$1$, with the same eigenvalue.
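A quick numerical check of this point, on a hypothetical repeated-eigenvalue example (a sketch, not part of the argument):

```python
import numpy as np

# diag(5, 5) looks like one "block", but ker(A - 5I) is 2-dimensional,
# so it is really two 1x1 Jordan blocks with the same eigenvalue --
# and then no single column b can restore full rank in [A - 5I | b].
A = np.diag([5.0, 5.0])
n = A.shape[0]
print(n - np.linalg.matrix_rank(A - 5.0 * np.eye(n)))  # 2 = dim ker
b = np.array([[1.0], [1.0]])  # any single column
print(np.linalg.matrix_rank(np.hstack([A - 5.0 * np.eye(n), b])))  # 1 < n
```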

$\\$

So let's say we've written out the Jordan normal form according to a suitable basis of the vector space $V$. We have a block $C_i$, of size $c_i$ and of eigenvalue $\mu_i$, and let the subspace corresponding to $C_i$ be $S_i$. Now, $C_i$ is a linear transformation on $S_i$. Also, the vector space is a direct sum of $S_i$, so according to that direct sum let $b_i$ be the $S_i$-component of $b$.

$\\$

First case: $c_i>1$. In other words, $C_i$ is not a $1$-by-$1$ block.

When you write out $b$ as a linear combination of the basis vectors of $V$, the coefficient attached to the bottom vector for $S_i$ (which is one of the basis vectors) cannot be $0$. Indeed, if that coefficient were $0$, then in $[A-\mu_i I\;|\;b]$ the row corresponding to that bottom vector would have all its entries equal to $0$ (the bottom row of $C_i-\mu_i I$ is zero), so the rank of $[A-\mu_i I\;|\;b]$ would be at most $n-1$.

If you're confused about what the bottom vector means, write out an example of a Jordan block with $1$'s above the diagonal.
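To make the bottom-row argument concrete, here is a small hypothetical check (assuming $A=J_2(1)\oplus J_1(3)$, chosen only for illustration): a $b$ with zero coefficient on the bottom vector of the $2\times 2$ block produces an all-zero row in $[A-\mu_i I\;|\;b]$.

```python
import numpy as np

# Jordan form: a 2x2 block with eigenvalue 1 and a 1x1 block with eigenvalue 3.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
n = A.shape[0]
# The "bottom vector" of the 2x2 block is e_2.  Give b a zero coefficient there:
b = np.array([[1.0], [0.0], [1.0]])
B = np.hstack([A - 1.0 * np.eye(n), b])
print(np.linalg.matrix_rank(B))  # 2: the e_2 row of B is entirely zero
```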

Now we talk about polynomials. For a complex polynomial $p$, if $p(A)b=0$, then $p(C_i)b_i=0$.

Now, what polynomials $p(z)\in\mathbb{C}[z]$ satisfy $p(C_i)b_i=0$? Call the set of such polynomials $P_i$. Now, $P_i$ is an ideal in $\mathbb{C}[z]$ and therefore a principal ideal. There is a unique $t_i(z)\in\mathbb{C}[z]$ such that $t_i(z)$ has leading coefficient $1$ and $P_i$ is precisely the set of multiples of $t_i(z)$.

By the definition of a Jordan block, the polynomial $(z-\mu_i)^{c_i}$ is in $P_i\subset \mathbb{C}[z]$. Therefore, $t_i(z)$ is a factor of $(z-\mu_i)^{c_i}$. The only possibilities consist of $t_i(z)=(z-\mu_i)^{d}$, where $d\leq c_i$.

However, for $d$ less than $c_i$, because of the non-zero bottom-vector coefficient mentioned earlier, when you calculate $(C_i-\mu_i I)^{d}b_i$ you will still be left with a non-zero coefficient for one of the basis vectors.

Therefore, $t_i(z)=(z-\mu_i)^{c_i}$.
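One can see the degree claim numerically on a single hypothetical block (a sketch with $c_i=3$, $\mu_i=2$): as long as the bottom coefficient of $b_i$ is non-zero, $(C_i-\mu_i I)^d b_i\neq 0$ for every $d<c_i$.

```python
import numpy as np

c, mu = 3, 2.0
C = mu * np.eye(c) + np.diag(np.ones(c - 1), 1)  # Jordan block J_3(2)
b = np.array([0.5, -1.0, 4.0])  # bottom coefficient 4.0 is non-zero
N = C - mu * np.eye(c)          # nilpotent shift matrix
for d in range(c):
    # each power d < c shifts the non-zero bottom entry up, never killing b
    print(d, np.linalg.norm(np.linalg.matrix_power(N, d) @ b))
# every d < c leaves a non-zero vector, so t_i(z) = (z - mu)^c
```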

$\\$

Second case: $C_i$ is a $1$-by-$1$ block. In other words, $c_i=1$.

Again, for similar reasons, when you write out $b$ as a linear combination of the basis vectors of $V$, the coefficient corresponding to the vector for $S_i$ cannot be $0$. Furthermore, $t_i(z)=z-\mu_i$.

$\\$

Now, we go back to the point that for a complex polynomial $p$, if $p(A)b=0$, then $p(C_i)b_i=0$.

We can now deduce that if $p(A)b=0$, then $p(z)$ is a multiple of $(z-\mu_i)^{c_i}$ for each $i$. Remember that these $\mu_i$ are all distinct.

So $\displaystyle\prod_i (z-\mu_i)^{c_i}$ is the characteristic polynomial of $A$, which we denote as $\text{char}_A(z)$.

So if $p(A)b=0$, then $\text{char}_A(z)$ divides $p(z)$.

But equally, if $\text{char}_A(z)$ divides $p(z)$, then we know (by the Cayley–Hamilton theorem) that $p(A)b=0$.

So $p(A)b=0$ if and only if $\text{char}_A(z)$ divides $p(z)$.

Therefore, there is no non-zero polynomial $r(z)$ of degree less than $n$, the dimension of the vector space $V$, such that $r(A)b=0$. (Assume $n>1$; the case $n=1$ is trivial.)

This implies that $b,Ab,\ldots,A^{n-1}b$ are linearly independent, hence a basis of $V$; that is, $b$ is a cyclic vector.
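The chain of deductions can be sketched numerically (a hypothetical $3\times 3$ example, again with $A=J_2(1)\oplus J_1(3)$): the monic polynomial of least degree annihilating $b$ is recovered from the Krylov matrix and agrees with $\text{char}_A(z)$.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 3.0]])
b = np.array([0.0, 1.0, 1.0])
n = A.shape[0]
# Krylov matrix [b, Ab, A^2 b]; full rank means no polynomial of
# degree < n annihilates b.
K = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(n)])
assert np.linalg.matrix_rank(K) == n
# The monic annihilator of b: solve K c = A^n b, so A^n b = sum_k c_k A^k b,
# i.e. p(z) = z^n - c_{n-1} z^{n-1} - ... - c_0 kills b.
c = np.linalg.solve(K, np.linalg.matrix_power(A, n) @ b)
p = np.concatenate([[1.0], -c[::-1]])
print(np.round(p, 6))           # matches the characteristic polynomial
print(np.round(np.poly(A), 6))  # coefficients of char_A(z)
```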

  • Ah! So in short, the generalized eigenspaces are each cyclic. The given $b$ must contain a nonzero component of each generator of each generalized eigenspace. So the only polynomial that can annihilate $b$ is the full characteristic polynomial. I see now. Thank you so much! – 1Teaches2Learn Mar 29 '22 at 18:00

This is essentially the Hautus lemma from control theory, a basic result about the controllability of a linear system.

The following statements are equivalent:

  1. The linear system $\dot{x}=Ax+bu$ is controllable.
  2. $\mathrm{rank}\begin{bmatrix}b & Ab & \ldots& A^{n-1}b \end{bmatrix}=n$.
  3. $\mathrm{rank}\begin{bmatrix}A-\lambda I & b\end{bmatrix}=n$ for all $\lambda\in\mathbb{C}$.
  4. For all left-eigenvectors $v^*\in\mathbb{C}^n$ associated with the eigenvalue $\lambda^*$ (i.e. $v^*A=\lambda^*v^*$), we have that $v^*b\ne 0$.
  5. The matrix $W$, called the controllability Gramian, defined as $$W:=\int_0^\infty \exp(As)bb^*\exp(A^*s)\,ds$$ is full-rank. (For this infinite-horizon form, $A$ is assumed Hurwitz so that the integral converges; otherwise one uses the finite-horizon Gramian.)

For more details, see the Wikipedia page https://en.wikipedia.org/wiki/Controllability_Gramian or the book by E. Sontag, "Mathematical Control Theory".

Generalizations of those results exist for time-varying linear systems and for nonlinear systems.

KBS