
Let $n \in \mathbb{N}$. For $f \in K[x]$ monic of degree $n$, let $C_f$ denote the companion matrix of $f$. In this paper, the author says that it is easily seen that, when $K$ is infinite, every $n \times n$ matrix $A$ over $K$ is similar to a matrix of the form $g(C_f)$ for some $f$ and some polynomial $g \in K[x]$. How is this proved?

The only observation I have made is that the characteristic polynomial of $g(C_f)$ is $\prod_i (t - g(\alpha_i))$, where the $\alpha_i$ are the roots of $f$, so the eigenvalues of $A$ would have to be the $g(\alpha_i)$. I don't think this helps very much.
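
In case it's useful, this observation is easy to confirm with a computer algebra system. Here is a minimal sympy sketch; the particular $f$ and $g$, and the helper `poly_at_matrix`, are arbitrary illustrative choices of mine:

```python
# Check: the characteristic polynomial of g(C_f) is prod_i (t - g(alpha_i)),
# where the alpha_i are the roots of f.  f and g below are arbitrary choices.
import sympy as sp

x, t = sp.symbols('x t')
alphas = [1, 2, 4]                       # roots of f
f = sp.Poly((x - 1)*(x - 2)*(x - 4), x)  # monic of degree 3
g = x**2 + x + 1                         # an arbitrary g

n = f.degree()
a = f.all_coeffs()                       # [1, a_{n-1}, ..., a_0]
C = sp.zeros(n, n)
for i in range(1, n):
    C[i, i - 1] = 1                      # ones on the subdiagonal
for i in range(n):
    C[i, n - 1] = -a[n - i]              # last column holds -a_0, ..., -a_{n-1}

def poly_at_matrix(p, M):
    """Evaluate the polynomial p (in x) at the square matrix M via Horner."""
    out = sp.zeros(M.rows, M.cols)
    for c in sp.Poly(p, x).all_coeffs():
        out = out*M + c*sp.eye(M.rows)
    return out

gC = poly_at_matrix(g, C)                # g(C_f)
lhs = gC.charpoly(t).as_expr()
rhs = sp.expand(sp.Mul(*[t - g.subs(x, al) for al in alphas]))
assert sp.expand(lhs - rhs) == 0
```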

J. S.

2 Answers


$\def\ed{\stackrel{\text{def}}{=}}$ As Ben Grossmann surmises, the ideas in his proof can be used to obtain more general results. Here's a generalisation to the case when the characteristic polynomial of $\ A\ $ merely splits in $\ K\ ,$ rather than being restricted to an integral power of the indeterminate, as it is in his proof. It certainly doesn't make the result "easily seen", and I haven't been able to extend the proof to the case where not all of the roots of $\ A\text{'s}\ $ characteristic polynomial lie in $\ K\ .$ I'm guessing the ability to see the result "easily" might depend on some special knowledge of the properties of Frobenius normal forms and companion matrices which I don't possess.

My proof is based on the following observations:

  • Matrices with the same Jordan normal form are similar.
  • If $\ f(x)=\prod\limits_{i=1}^r\big(x-\rho_i\big)^{m_i}\ ,$ with $\ \rho_i\ne \rho_j\ $ for $\ i\ne j\ ,$ then a Jordan normal form for $\ C_f\ $ is $$ \mathfrak{J}_{C_f}\ed\pmatrix{J_{m_1}\big(\rho_1\big)&0_{m_1\times m_1}&\dots&0_{m_1\times m_1}\\ 0_{m_2\times m_2}&J_{m_2}\big(\rho_2\big)&\dots&0_{m_2\times m_2}\\ \vdots&&\ddots&\vdots\\ 0_{m_r\times m_r}&0_{m_r\times m_r}&\dots&J_{m_r}\big(\rho_r\big)}\tag{1}\label{e1} $$ where $$ J_k(\lambda)\ed\pmatrix{\lambda&1&0&&\dots&0&0\\ 0&\lambda&1&&\dots&0&0\\ 0&0&\lambda&&\dots&0&0\\ \vdots&\vdots&&\ddots&\ddots&\vdots&\vdots\\ 0&0&0&\dots&\dots&\lambda&1\\ 0&0&0&\dots&\dots&0&\lambda} $$ is a $\ k\times k\ $ Jordan block with main diagonal entries $\ \lambda\ .$
  • If $\ p\ $ is a polynomial, then\begin{align}p\big(J_k(\lambda)\big)&=\pmatrix{p(\lambda)&p'(\lambda)&\frac{p''(\lambda)}{2!}&\dots&\frac{p^{(k-2)}(\lambda)}{(k-2)!}&\frac{p^{(k-1)}(\lambda)}{(k-1)!}\\ 0&p(\lambda)&p'(\lambda)&\dots&\frac{p^{(k-3)}(\lambda)}{(k-3)!}&\frac{p^{(k-2)}(\lambda)}{(k-2)!}\\ 0&0&p(\lambda)&\dots&\frac{p^{(k-4)}(\lambda)}{(k-4)!}&\frac{p^{(k-3)}(\lambda)}{(k-3)!}\\ \vdots&\vdots&&\ddots&&\vdots\\ 0&0&0&\dots&p(\lambda)&p'(\lambda)\\ 0&0&0&\dots&0&p(\lambda)}\\ &=p(\lambda)I_{k\times k}+\ \sum_{i=1}^{k-1}\sum_{j=i+1}^{k}\frac{p^{(j-i)}(\lambda)}{(j-i)!}e_ie_j^T ,\tag{2}\label{e2}\end{align}where $\ e_i\ $ is the $\ i^\text{th}\ $ column of the $\ k\times k\ $ identity matrix $\ I_{k\times k}\ .$ That is, $\ p\big(J_k(\lambda)\big)\ $ is a $\ k\times k\ $ upper triangular matrix with entry $\ \frac{p^{(j-i)}(\lambda)}{(j-i)!}\ $ in row $\ i\ $ and column $\ j\ $ for $\ 1\le i\le j\le k\ .$ (The factorial divisors are harmless here: all the argument below uses are the diagonal entries $\ p(\lambda)\ $ and the first-superdiagonal entries $\ p'(\lambda)\ .$)
  • If \begin{align} B&=\pmatrix{b_0&b_1&b_2&\dots&b_{k-2}&b_{k-1}\\ 0&b_0&b_1&\dots&b_{k-3}&b_{k-2}\\ 0&0&b_0&\dots&b_{k-4}&b_{k-3}\\ \vdots&\vdots&&\ddots&&\vdots\\ 0&0&0&\dots&b_0&b_1\\ 0&0&0&\dots&0&b_0}\\ &=b_0I_{k\times k}+\ \sum_{i=1}^{k-1}\sum_{j=i+1}^{k}b_{j-i}e_ie_j^T\tag{3}\label{e3}\end{align}—that is, $\ B\ $ is a $\ k\times k\ $ upper triangular matrix with entry $\ b_{j-i}\ $ in row $\ i\ $ and column $\ j\ $ for $\ 1\le i\le j\le k\ $—and $\ b_1\ne0\ ,$ then $\ B\ $ has Jordan normal form $\ J_k\big(b_0\big)\ .$

The first two of the above observations are well known. For all I know, the other two might also qualify for that description, but if I've ever come across them before I'd long forgotten that encounter, and I had to (re?)discover and prove them for myself.
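
Observation \eqref{e2} is easy to sanity-check symbolically. Here's a minimal sympy sketch; the block size $\ k=4\ $ and the test polynomial are arbitrary choices of mine:

```python
# Symbolic check of observation (2): the (i, j) entry of p(J_k(lambda)) is
# p^{(j-i)}(lambda) / (j-i)!  for j >= i, and 0 below the diagonal.
import sympy as sp

x, lam = sp.symbols('x lambda')
k = 4
J = lam*sp.eye(k) + sp.Matrix(k, k, lambda i, j: 1 if j == i + 1 else 0)
p = x**5 + 3*x**2 + 7                    # an arbitrary test polynomial

pJ = sp.zeros(k, k)
for c in sp.Poly(p, x).all_coeffs():     # Horner's scheme for p(J)
    pJ = pJ*J + c*sp.eye(k)

for i in range(k):
    for j in range(i, k):
        entry = sp.diff(p, x, j - i).subs(x, lam) / sp.factorial(j - i)
        assert sp.expand(pJ[i, j] - entry) == 0
    for j in range(i):                   # below the diagonal everything is 0
        assert pJ[i, j] == 0
```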

The idea of the overall proof is to take $\ f=\prod\limits_{i=1}^r\big(x-\rho_i\big)^{m_i}\ ,$ where the Jordan normal form of $\ A\ $ has $\ r\ $ blocks, $\ J_{m_1}(\lambda_1), J_{m_2}(\lambda_2), \dots, J_{m_r}(\lambda_r)\ ,$ and $\ \rho_1,\rho_2,\dots,\rho_r\ $ are any $\ r\ $ distinct elements of $\ K\ .$ This is the only part of the proof where the infinite cardinality of $\ K\ $ is invoked, and it would obviously be sufficient for $\ K\ $ to have cardinality at least $\ r\ .$ Next, take $\ g\ $ to be a polynomial which satisfies the conditions \begin{align} g\big(\rho_i\big)&=\lambda_i\ \ \text{ and }\\ g'\big(\rho_i\big)&=1\tag{4}\label{e4} \end{align} for $\ 1\le i\le r\ .$ I give an explicit construction of such a polynomial below. The Jordan normal form of $\ C_f\ $ is given by the expression \eqref{e1}. Therefore, there exists an invertible matrix $\ W\ $ such that $$ W^{-1}C_fW=\mathfrak{J}_{C_f}\ , $$ and then \begin{align} W^{-1}g\big(C_f\big)W&=g\big(W^{-1}C_fW\big)\\ &=g\big(\mathfrak{J}_{C_f}\big)\\ &=\pmatrix{g\big(J_{m_1}\big(\rho_1\big)\big)&0_{m_1\times m_1}&\dots&0_{m_1\times m_1}\\ 0_{m_2\times m_2}&g\big(J_{m_2}\big(\rho_2\big)\big)&\dots&0_{m_2\times m_2}\\ \vdots&&\ddots&\vdots\\ 0_{m_r\times m_r}&0_{m_r\times m_r}&\dots&g\big(J_{m_r}\big(\rho_r\big)\big)}\\ &=\pmatrix{G_1&0_{m_1\times m_1}&\dots&0_{m_1\times m_1}\\ 0_{m_2\times m_2}&G_2&\dots&0_{m_2\times m_2}\\ \vdots&&\ddots&\vdots\\ 0_{m_r\times m_r}&0_{m_r\times m_r}&\dots&G_r}\ , \end{align} where \begin{align} G_i&\ed\pmatrix{g\big(\rho_i\big)&g'\big(\rho_i\big)&\frac{g''(\rho_i)}{2!}&\dots&\frac{g^{(m_i-2)}(\rho_i)}{(m_i-2)!}&\frac{g^{(m_i-1)}(\rho_i)}{(m_i-1)!}\\ 0&g\big(\rho_i\big)&g'\big(\rho_i\big)&\dots&\frac{g^{(m_i-3)}(\rho_i)}{(m_i-3)!}&\frac{g^{(m_i-2)}(\rho_i)}{(m_i-2)!}\\ 0&0&g\big(\rho_i\big)&\dots&\frac{g^{(m_i-4)}(\rho_i)}{(m_i-4)!}&\frac{g^{(m_i-3)}(\rho_i)}{(m_i-3)!}\\ \vdots&\vdots&&\ddots&&\vdots\\ 0&0&0&\dots&g\big(\rho_i\big)&g'\big(\rho_i\big)\\ 0&0&0&\dots&0&g\big(\rho_i\big)}\\ &=\pmatrix{\lambda_i&1&\frac{g''(\rho_i)}{2!}&\dots&\frac{g^{(m_i-2)}(\rho_i)}{(m_i-2)!}&\frac{g^{(m_i-1)}(\rho_i)}{(m_i-1)!}\\ 0&\lambda_i&1&\dots&\frac{g^{(m_i-3)}(\rho_i)}{(m_i-3)!}&\frac{g^{(m_i-2)}(\rho_i)}{(m_i-2)!}\\ 0&0&\lambda_i&\dots&\frac{g^{(m_i-4)}(\rho_i)}{(m_i-4)!}&\frac{g^{(m_i-3)}(\rho_i)}{(m_i-3)!}\\ \vdots&\vdots&&\ddots&&\vdots\\ 0&0&0&\dots&\lambda_i&1\\ 0&0&0&\dots&0&\lambda_i}\ , \end{align} by virtue of observation \eqref{e2} and the properties \eqref{e4} of $\ g\ .$ Now by observation \eqref{e3}, $\ G_i\ $ has Jordan normal form $\ J_{m_i}\big(\lambda_i\big)\ ,$ so there exists an invertible $\ m_i\times m_i\ $ matrix $\ U_i\ $ such that $\ U_i^{-1}G_iU_i=J_{m_i}\big(\lambda_i\big)\ .$ If we now define $$ U\ed\pmatrix{U_1&0_{m_1\times m_1}&\dots&0_{m_1\times m_1}\\ 0_{m_2\times m_2}&U_2&\dots&0_{m_2\times m_2}\\ \vdots&&\ddots&\vdots\\ 0_{m_r\times m_r}&0_{m_r\times m_r}&\dots&U_r} $$ then we have \begin{align} U^{-1}W^{-1}g\big(C_f\big)WU&=\pmatrix{U_1^{-1}G_1U_1&0_{m_1\times m_1}&\dots&0_{m_1\times m_1}\\ 0_{m_2\times m_2}& U_2^{-1}G_2U_2&\dots&0_{m_2\times m_2}\\ \vdots&&\ddots&\vdots\\ 0_{m_r\times m_r}&0_{m_r\times m_r}&\dots& U_r^{-1}G_rU_r}\\ &=\pmatrix{J_{m_1}\big(\lambda_1\big)&0_{m_1\times m_1}&\dots&0_{m_1\times m_1}\\ 0_{m_2\times m_2}&J_{m_2}\big(\lambda_2\big)&\dots&0_{m_2\times m_2}\\ \vdots&&\ddots&\vdots\\ 0_{m_r\times m_r}&0_{m_r\times m_r}&\dots&J_{m_r}\big(\lambda_r\big)} \end{align} which is the Jordan normal form of $\ A\ .$ Thus, it is also a Jordan normal form for $\ g\big(C_f\big)\ ,$ which must therefore be similar to $\ A\ .$

Here's the construction of $\ g\ $ promised above. Define \begin{align} h_1(x)&\ed\sum_{i=1}^r\frac{\lambda_i\prod\limits_{j=1,j\ne i}^r\big(x-\rho_j\big)}{\prod\limits_{j=1,j\ne i}^r\big(\rho_i-\rho_j\big)}\\ h_2(x)&\ed\sum_{i=1}^r\frac{\big(h_1'\big(\rho_i\big)-1\big)\prod\limits_{j=1,j\ne i}^r\big(x-\rho_j\big)}{\left(\prod\limits_{j=1,j\ne i}^r\big(\rho_i-\rho_j\big)\right)^2}\\ h_3(x)&\ed\prod_{j=1}^r\big(x-\rho_j\big)\ \ \text{ and}\\ g(x)&\ed h_1(x)-h_2(x)h_3(x)\ . \end{align} The first two definitions use the Lagrange interpolation formula to obtain polynomials satisfying \begin{align} h_1\big(\rho_i\big)&=\lambda_i\ \ \ \text{ and}\\ h_2\big(\rho_i\big)&=\frac{h_1'\big(\rho_i\big)-1}{\prod\limits_{j=1,j\ne i}^r\big(\rho_i-\rho_j\big)} \end{align} for $\ 1\le i\le r\ ,$ and $\ h_3\ $ has the properties \begin{align} h_3\big(\rho_i\big)&=0\ \ \ \text{ and}\\ h_3'\big(\rho_i\big)&=\prod\limits_{j=1,j\ne i}^r\big(\rho_i-\rho_j\big) \end{align} for $\ 1\le i\le r\ .$ Therefore \begin{align} g\big(\rho_i\big)&=h_1\big(\rho_i\big)-h_2\big(\rho_i\big)h_3\big(\rho_i\big)\\ &=h_1\big(\rho_i\big)\\ &=\lambda_i\ \ \ \text{ and}\\ g'\big(\rho_i\big)&=h_1'\big(\rho_i\big)-h_2'\big(\rho_i\big)h_3\big(\rho_i\big)-h_2\big(\rho_i\big)h_3'\big(\rho_i\big)\\ &=h_1'\big(\rho_i\big)-h_2\big(\rho_i\big)h_3'\big(\rho_i\big)\\ &=h_1'\big(\rho_i\big)-\left(\frac{h_1'\big(\rho_i\big)-1}{\prod\limits_{j=1,j\ne i}^r\big(\rho_i-\rho_j\big)}\right)\prod\limits_{j=1,j\ne i}^r\big(\rho_i-\rho_j\big)\\ &=1 \end{align} for $\ 1\le i\le r\ ,$ as required.
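
To see the whole argument in action, here is a sympy sketch that carries out the construction end-to-end on a small example of my own choosing: $\ A\ $ with Jordan blocks $\ J_2(2)\ $ and $\ J_1(5)\ ,$ and nodes $\ \rho_1=0,\ \rho_2=1\ .$ It builds $\ h_1, h_2, h_3, g\ $ as above, verifies the conditions \eqref{e4}, and confirms that $\ g\big(C_f\big)\ $ has the same Jordan normal form as $\ A\ :$

```python
# End-to-end check on my own example: A has Jordan blocks J_2(2) and J_1(5),
# so r = 2, (m_1, m_2) = (2, 1), (lambda_1, lambda_2) = (2, 5); rho = 0, 1.
import sympy as sp
from functools import reduce
from operator import mul

x = sp.symbols('x')
product = lambda seq: reduce(mul, seq, sp.Integer(1))

rho, lam, m = [0, 1], [2, 5], [2, 1]
r = len(rho)

# Lagrange basis polynomials at the nodes rho_i:
basis = [product(x - rho[j] for j in range(r) if j != i)
         / product(rho[i] - rho[j] for j in range(r) if j != i)
         for i in range(r)]
h1 = sum(lam[i]*basis[i] for i in range(r))
h1p = sp.diff(h1, x)
h2 = sum((h1p.subs(x, rho[i]) - 1)
         / product(rho[i] - rho[j] for j in range(r) if j != i) * basis[i]
         for i in range(r))
h3 = product(x - rho[j] for j in range(r))
g = sp.expand(h1 - h2*h3)

for i in range(r):                       # the conditions (4)
    assert g.subs(x, rho[i]) == lam[i]
    assert sp.diff(g, x).subs(x, rho[i]) == 1

# f = prod_i (x - rho_i)^{m_i}; companion matrix C_f, then g(C_f):
f = sp.Poly(product((x - rho[i])**m[i] for i in range(r)), x)
n, a = f.degree(), f.all_coeffs()        # a = [1, a_{n-1}, ..., a_0]
Cf = sp.zeros(n, n)
for i in range(1, n):
    Cf[i, i - 1] = 1                     # ones on the subdiagonal
for i in range(n):
    Cf[i, n - 1] = -a[n - i]             # last column holds -a_0, ..., -a_{n-1}

gCf = sp.zeros(n, n)
for c in sp.Poly(g, x).all_coeffs():     # Horner's scheme for g(C_f)
    gCf = gCf*Cf + c*sp.eye(n)

A = sp.diag(sp.Matrix([[2, 1], [0, 2]]), sp.Matrix([[5]]))
assert gCf.jordan_form()[1] == A.jordan_form()[1]   # same Jordan normal form
```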

Here's a proof of observation \eqref{e2}. First, note that if $\ \pi_r(x)=x^r\ $ then \begin{align} \pi_r\big(J_k(\lambda)\big)&=\left(\lambda I_{k\times k}+\sum_{i=1}^{k-1}e_ie_{i+1}^T\right)^r\\ &=\sum_{\ell=0}^r {r\choose\ell}\left(\sum_{i=1}^{k-1}e_ie_{i+1}^T\right)^\ell\big(\lambda I_{k\times k}\big)^{r-\ell}\\ &=\lambda^rI_{k\times k}+\sum_{\ell=1}^{\min(r,k-1)} {r\choose\ell}\lambda^{r-\ell}\sum_{i=1}^{k-\ell}e_ie_{i+\ell}^T\,^\color{red}{\dagger}\\ &=\lambda^rI_{k\times k}+\sum_{i=1}^{k-1} \sum_{\ell=1}^{\min(r,k-i)}{r\choose\ell}\lambda^{r-\ell}e_ie_{i+\ell}^T\\ &=\lambda^rI_{k\times k}+\sum_{i=1}^{k-1} \sum_{\ell=1}^{k-i}\pi_r^{(\ell)}(\lambda)e_ie_{i+\ell}^T\\ &=\pi_r(\lambda) I_{k\times k}+\sum_{i=1}^{k-1} \sum_{j=i+1}^k\pi_r^{(j-i)}(\lambda)e_ie_j^T\ , \end{align} which establishes the observation for $\ p=\pi_r\ .$ For an arbitrary polynomial, $\ p(\lambda)=\sum_\limits{r=0}^dp_r\lambda^r=\sum_\limits{r=0}^dp_r\pi_r(\lambda)\ ,$ we therefore have \begin{align} p(J_k(\lambda)\big)&=\sum_{r=0}^dp_rJ_k(\lambda)^r\\ &=\sum_{r=0}^dp_r\pi_r\big(J_k(\lambda)\big)\\ &=\sum_{r=0}^dp_r\left(\pi_r(\lambda) I_{k\times k}+\sum_{i=1}^{k-1} \sum_{j=i+1}^k\pi_r^{(j-i)}(\lambda)e_ie_j^T\right)\\ &=\left(\sum_{r=0}^dp_r\pi_r(\lambda)\right) I_{k\times k}+\sum_{i=1}^{k-1} \sum_{j=i+1}^k\left(\sum_{r=0}^dp_r\pi_r^{(j-i)}(\lambda)\right)e_ie_j^T\\ &=p(\lambda) I_{k\times k}+\sum_{i=1}^{k-1} \sum_{j=i+1}^kp^{(j-i)}(\lambda)e_ie_j^T\ , \end{align} which establishes the observation for $\ p\ .$

Observation \eqref{e3} follows from the fact that both the characteristic and minimal polynomials of $\ B\ $ are $\ \big(x-b_0\big)^k\ .$ Obviously, $\ \det\big(xI_{k\times k}-B\big)=\big(x-b_0\big)^k\ ,$ and since the minimal polynomial must be a factor of this, it must have the form $\ \big(x-b_0\big)^j\ $ for some $\ j\le k\ .$ But since $\ \big(B-b_0I_{k\times k}\big)^{k-1}=b_1^{k-1}e_1e_k^T\ne0\ ,$ it follows that the minimal polynomial of $\ B\ $ must also be $\ \big(x-b_0\big)^k\ .$ Since the multiplicity of $\ x-b_0\ $ in the minimal polynomial equals the size of the largest Jordan block belonging to $\ b_0\ ,$ that block must be all of $\ B\ ;$ that is, the Jordan normal form of $\ B\ $ is the single block $\ J_k\big(b_0\big)\ .$
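
For what it's worth, the key identity $\ \big(B-b_0I_{k\times k}\big)^{k-1}=b_1^{k-1}e_1e_k^T\ $ can also be confirmed symbolically; a minimal sympy sketch with $\ k=4\ $ (an arbitrary test size) and fully symbolic entries:

```python
# Symbolic check that (B - b0 I)^(k-1) = b1^(k-1) e_1 e_k^T for an
# upper-triangular Toeplitz B, and that (B - b0 I)^k = 0.
import sympy as sp

k = 4
b = sp.symbols('b0:4')                   # b0, b1, b2, b3
B = sp.Matrix(k, k, lambda i, j: b[j - i] if j >= i else 0)
N = (B - b[0]*sp.eye(k))**(k - 1)
expected = sp.zeros(k, k)
expected[0, k - 1] = b[1]**(k - 1)       # b1^(k-1) in the top-right corner
assert N.applyfunc(sp.expand) == expected
assert ((B - b[0]*sp.eye(k))**k).applyfunc(sp.expand) == sp.zeros(k, k)
```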

If the characteristic polynomial $\ c_A\ $ of $\ A\ $ doesn't split in $\ K\ ,$ the proof will still obviously work with $\ K\ $ replaced by the splitting field, $\ K_{c_A}\ ,$ of $\ c_A\ ,$ but without some extra work, you'll only get similarity over $\ K_{c_A}\ ,$ not necessarily over $\ K\ ,$ and it's not clear that you can choose both polynomials $\ f\ $ and $\ g\ $ to lie in $\ K[x]\ $ rather than $\ K_{c_A}[x]\ .$ It's not hard to choose $\ f\ $ to be in $\ K[x]\ ,$ but I haven't been able to show that you can do the same with $\ g\ .$ Doing that would be sufficient to prove the general case, because if $\ f,g\in K[x]\ ,$ then the entries in the matrices $\ C_f,g\big(C_f\big)\ $ and the Frobenius normal form of $\ g\big(C_f\big)\ $ would all lie in $\ K\ .$ Since $\ A\ $ and $\ g\big(C_f\big)\ $ (hypothetically) have the same Jordan normal form, they must also have the same Frobenius normal form, and since they're similar over $\ K\ $ to their Frobenius normal forms, they would also have to be similar to each other over $\ K\ .$

$\,^\color{red}{\dagger}$ Using the identity $$ \left(\sum_{i=1}^{k-1}e_ie_{i+1}^T\right)^\ell=\cases{\sum\limits_{i=1}^{k-\ell}e_ie_{i+\ell}^T&if $\ \ell\le k-1$\\ 0_{k\times k}&if $\ \ell\ge k$} $$ which is easily established by induction.
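
A minimal sympy check of this identity (the size $\ k=5\ $ is an arbitrary test choice):

```python
# Check of the dagger identity: the ell-th power of the shift matrix moves the
# superdiagonal of ones out to distance ell, and vanishes once ell >= k.
import sympy as sp

k = 5
S = sp.Matrix(k, k, lambda i, j: 1 if j == i + 1 else 0)
for ell in range(1, k + 2):
    expected = sp.Matrix(k, k, lambda i, j: 1 if j == i + ell else 0)
    assert S**ell == expected            # the zero matrix once ell >= k
```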

  • Thank you for the answer. It looks quite intimidating at first with all of the matrices displayed in full, but you make the derivations look effortless, especially that one for $g$! – J. S. Sep 22 '24 at 15:17

The following is incomplete but potentially helpful. I suspect that there is some nice way to generalize this to arbitrary matrices $A$, but the below addresses only the case where $A$ is nilpotent.


Here's an argument that works over $\Bbb C$; I suspect it could be generalized to work over other infinite fields.

Let $A \sim B$ mean "$A$ is similar to $B$". First of all, note that $g(PCP^{-1}) = Pg(C)P^{-1}$ (see the sketch after this list for a quick symbolic check), so the following statements are equivalent:

  • Every $A \sim g(C_f)$ for some $f$ and $g$
  • Every $A = g(M_f)$ for some $g$ and some $M_f\sim C_f$
  • Every $A \sim g(M_f)$ for some $g$ and some $M_f \sim C_f$
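
Here's the promised check that evaluating a polynomial commutes with conjugation; the matrices and $g$ are arbitrary choices of mine:

```python
# Check that g(P C P^{-1}) = P g(C) P^{-1} on an arbitrary example.
import sympy as sp

C = sp.Matrix([[0, 0, 2], [1, 0, -5], [0, 1, 4]])
P = sp.Matrix([[1, 2, 0], [0, 1, 3], [1, 0, 1]])   # det = 7, so invertible
g = lambda M: M**2 - 3*M + 2*sp.eye(3)             # g(x) = x^2 - 3x + 2
diff = g(P*C*P.inv()) - P*g(C)*P.inv()
assert diff.applyfunc(sp.simplify) == sp.zeros(3, 3)
```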

As I note here, a matrix $M$ is similar to a companion matrix if and only if the Jordan form of $M$ has one Jordan block associated with each eigenvalue. I will call matrices that are similar to a companion matrix "non-derogatory".

Now, we begin with the case where $A$ has zero as its only eigenvalue, and assume without loss of generality that $A$ is in Jordan form. Let $J_k(\lambda)$ denote the size-$k$ Jordan block associated with eigenvalue $\lambda$. I will make use of the following facts.

Claim: If $g'(\lambda) \neq 0$, then $g(J_k(\lambda))$ is similar to $J_k(g(\lambda))$.

Corollary: If $A$ is a matrix with zero as its only eigenvalue (i.e. $A$ is nilpotent) and $g'(0) \neq 0$, then $g(A) \sim A + g(0)I$.
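
Here is a minimal sympy illustration of the Claim; $k = 3$, $\lambda = 2$, and $g(x) = x^2 + 1$ (so $g(2) = 5$ and $g'(2) = 4 \neq 0$) are arbitrary test choices:

```python
# Illustration of the Claim: g(J_k(lambda)) ~ J_k(g(lambda)) when g'(lambda) != 0.
import sympy as sp

k = 3
J = 2*sp.eye(k) + sp.Matrix(k, k, lambda i, j: 1 if j == i + 1 else 0)
gJ = J**2 + sp.eye(k)                    # g(J_3(2)) for g(x) = x^2 + 1
_, Jform = gJ.jordan_form()
expected = 5*sp.eye(k) + sp.Matrix(k, k, lambda i, j: 1 if j == i + 1 else 0)
assert Jform == expected                 # a single Jordan block J_3(5)
```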

We proceed inductively: if $A$ consists of only one Jordan block, then it is similar to a companion matrix, and we are done. Otherwise, it is of the form $$ A = \pmatrix{J_k(0) & 0\\0 & A_0}, $$ where $A_0$ has $0$ as its only eigenvalue. Let $M_0, g_0$ be such that $M_0$ is non-derogatory and $A_0 \sim g_0(M_0)$; these exist by the inductive hypothesis.

Let $M_1$ be the matrix $$ M_1 = \pmatrix{J_k(\lambda) & 0\\0 & M_0}, $$ where $\lambda$ is non-zero, $g_0(\lambda) \neq 0$, $g_0'(\lambda) \neq 0$, and $\lambda$ is not an eigenvalue of $M_0$, so that $M_1$ is again non-derogatory (such a $\lambda$ exists, since these conditions exclude only finitely many values). We have $$ g_0(M_1) = \pmatrix{g_0(J_k(\lambda)) & 0\\0 & g_0(M_0)}, $$ where we note that $g_0(J_k(\lambda)) \sim J_k(g_0(\lambda))$ and $g_0(M_0) \sim A_0$. If $n$ denotes the size of $A_0$, we have $$ (g_0(M_1))^n = \pmatrix{[g_0(J_k(\lambda))]^n & 0\\0 & 0}, $$ since $g_0(M_0) \sim A_0$ is nilpotent of size $n$. Note that $[g_0(J_k(\lambda))]^n \sim J_k([g_0(\lambda)]^n)$ (by the Claim applied to $[g_0(x)]^n$, whose derivative $n[g_0(\lambda)]^{n-1}g_0'(\lambda)$ at $\lambda$ is non-zero), and $[g_0(\lambda)]^n \neq 0$. Now, construct a polynomial $h$ such that $h(0) = 0$, $h([g_0(\lambda)]^n) = 0$, and $h'([g_0(\lambda)]^n) \neq 0$. We have $$ h((g_0(M_1))^n) = \pmatrix{h([g_0(J_k(\lambda))]^n) & 0\\0 & 0}, $$ where we note that $h([g_0(J_k(\lambda))]^n) \sim J_k(h([g_0(\lambda)]^n)) = J_k(0)$.

On the other hand, if we denote $p(x) = (x - g_0(\lambda))^k - (-g_0(\lambda))^k$, then we have $$ p(g_0(M_1)) = \pmatrix{- (-g_0(\lambda))^k I & 0\\0 & p(g_0(M_0))}, $$ since $g_0(J_k(\lambda)) - g_0(\lambda)I$ is strictly upper triangular of size $k$, so its $k$th power vanishes. Moreover, $p(0) = 0$ and $p'(0) = k(-g_0(\lambda))^{k-1} \neq 0$, so (using the Corollary) $p(g_0(M_0)) \sim A_0$.

Now, construct a polynomial $q$ such that $q(-(-g_0(\lambda))^k) = q(0) = 0$, but $q'(0) \neq 0$. Conclude that $$ q(p(g_0(M_1))) = \pmatrix{0 & 0\\0 & q(p(g_0(M_0)))}, $$ where we note that $q(p(g_0(M_0))) \sim p(g_0(M_0)) \sim A_0$.

Finally, denote $$ g_1(x) = h((g_0(x))^n) + q(p(g_0(x))). $$ We have $$ g_1(M_1) = \pmatrix{h([g_0(J_k(\lambda))]^n) & 0\\ 0 & q(p(g_0(M_0)))} \sim \pmatrix{J_k(0) & 0\\0 & A_0} = A. $$ Since $M_1$ is non-derogatory, this completes the inductive step.
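
To make the induction step concrete, here is a sympy run of the smallest nontrivial instance, with all specific values my own choices: $A = J_2(0)\oplus J_1(0)$, so $A_0 = M_0 = [0]$, $g_0(x) = x$, $k = 2$, $n = 1$, $\lambda = 1$. Then $h(x) = x(x-1)$, $p(x) = (x-1)^2 - 1$, and $q(x) = x(x+1)$ satisfy the stated conditions, and $g_1(M_1)$ comes out equal to $A$ on the nose:

```python
# One concrete run of the induction step: A = J_2(0) (+) J_1(0), A_0 = M_0 = [0],
# g_0 = t, k = 2, n = 1, lambda = 1.  All specific values are my own choices.
import sympy as sp

t = sp.symbols('t')
M1 = sp.diag(sp.Matrix([[1, 1], [0, 1]]), sp.Matrix([[0]]))  # J_2(1) (+) M_0

h = t*(t - 1)          # h(0) = 0, h(g0(lambda)^n) = h(1) = 0, h'(1) = 1 != 0
p = (t - 1)**2 - 1     # p(x) = (x - g0(lambda))^k - (-g0(lambda))^k
q = t*(t + 1)          # q(-(-g0(lambda))^k) = q(-1) = 0, q(0) = 0, q'(0) != 0
g1 = h + q.subs(t, p)  # g_1(x) = h(g0(x)^n) + q(p(g0(x))) with g0(x) = x, n = 1

def poly_at(poly, M):
    """Evaluate the polynomial poly (in t) at the square matrix M (Horner)."""
    out = sp.zeros(M.rows, M.cols)
    for c in sp.Poly(poly, t).all_coeffs():
        out = out*M + c*sp.eye(M.rows)
    return out

A = sp.diag(sp.Matrix([[0, 1], [0, 0]]), sp.Matrix([[0]]))
assert poly_at(g1, M1) == A              # exactly A in this small instance
```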

Ben Grossmann
  • Thank you for the answer! I'll try to read this as soon as possible. – J. S. Aug 23 '24 at 01:19
  • Hi Ben, apologies for the late reply, I ended up on hiatus from mathematics. I noted two minor misprints -- $p(g_0(M_0)) \sim A$ instead of $p(g_0(M)) \sim A$ halfway through and $g_1(M_0)$ instead of $g_1(x)$ on LHS of the last line. I followed everything except the computation of $p(g_0(J_k(\lambda)))$, but perhaps I need a few more minutes. – J. S. Sep 22 '24 at 14:50
  • I think I see: the constant term [coeff. of $I$] in $(g_0(J_k(\lambda)) - g_0(g_0(J_k(\lambda)))$ cancels out. Then, every term in the polynomial expansion has a factor of the $k$-nilpotent matrix $J_k(\lambda)$. Thanks again! – J. S. Sep 22 '24 at 14:57