43

In the comments of Martin Brandenburg's answer to this old MO question, Victor Protsak offers the following "1-line proof" of the Cayley–Hamilton theorem. Here $p_A(\lambda)$ is the characteristic polynomial.

Let $X = A - \lambda I_n$, then $p_A(\lambda) I_n = (\det X) I_n = X \operatorname{adj}(X)$ in the $n \times n$ matrix polynomials in $\lambda$, now specialize $\lambda \to A$, get $p_A(A) = 0$.

I think this proof is not quite complete as written and requires at least one more line. The "specialize $\lambda \to A$" step, as written, looks a lot like the standard "tempting but incorrect" proof of Cayley–Hamilton. The issue is that in this calculation we are working with, equivalently, either matrices with polynomial entries $M_n(K[\lambda])$, or polynomials with matrix coefficients $M_n(K)[\lambda]$. Naïvely, "specialize $\lambda \to A$" means applying some kind of evaluation homomorphism $M_n(K)[\lambda] \to M_n(K)$ sending $\lambda$ to $A$. But this is not a homomorphism in general, and in particular is not multiplicative, due to lack of commutativity. So, very explicitly, if $f(\lambda) = \sum F_i \lambda^i \in M_n(K)[\lambda]$ and $g(\lambda) = \sum G_i \lambda^i \in M_n(K)[\lambda]$ are two matrix polynomials, and we interpret "specialize $\lambda \to A$" to mean $f(A) = \sum F_i A^i \in M_n(K)$ and $g(A) = \sum G_i A^i \in M_n(K)$, then $f(A) g(A) \neq fg(A)$ in general, where $fg$ refers to the product of matrix polynomials (which involves treating $\lambda$ as central).
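For a concrete illustration of this failure, take $f(\lambda) = F_1 \lambda$ and $g(\lambda) = G_0$ a constant polynomial, so that $(fg)(\lambda) = F_1 G_0 \lambda$; then $(fg)(A) = F_1 G_0 A$ while $f(A) g(A) = F_1 A G_0$, and these differ whenever $A$ and $G_0$ don't commute. A quick numpy check with arbitrarily chosen matrices:

```python
import numpy as np

# f(lam) = F1 * lam and g(lam) = G0 (a constant polynomial), with matrix coefficients.
# As polynomials (lam treated as central), (f*g)(lam) = (F1 @ G0) * lam,
# so (f*g)(A) = F1 @ G0 @ A, while f(A) @ g(A) = (F1 @ A) @ G0.
A = np.array([[0., 1.],
              [0., 0.]])
F1 = np.eye(2)
G0 = np.array([[1., 0.],
               [0., 2.]])

fg_at_A = F1 @ G0 @ A        # evaluate the product polynomial at A
fA_gA = F1 @ A @ G0          # product of the separate evaluations

print(np.allclose(fg_at_A, fA_gA))   # False: evaluation at A is not multiplicative
```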

A different way to interpret this specialization is to instead consider the (commutative) subalgebra $K[A] \subset M_n(K)$ generated by $A$, and to interpret the specialization as applying the evaluation homomorphism $K[\lambda] \to K[A]$ entrywise to a matrix with polynomial entries, giving a matrix in $M_n(K[A])$. This specialization is a homomorphism, but it doesn't send $X$ to $0$! This is clarified if we write $M_n(K[\lambda]) \cong M_n(K)[\lambda]$ explicitly as a tensor product $M_n(K) \otimes K[\lambda]$, in which case

$$X(\lambda) = A \otimes 1 - I_n \otimes \lambda \in M_n(K) \otimes K[\lambda]$$

is getting specialized to

$$X(A) = A \otimes 1 - I_n \otimes A \in M_n(K) \otimes K[A].$$
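To see concretely that this need not be zero, one can represent $M_n(K) \otimes K[A]$ inside $M_{n^2}(K)$ via Kronecker products ($B \otimes C \mapsto \operatorname{kron}(B, C)$); a quick numpy check with an arbitrary non-scalar $A$:

```python
import numpy as np

# Model M_n(K) tensor K[A] inside M_{n^2}(K) using Kronecker products:
# A tensor 1 - I_n tensor A corresponds to kron(A, I) - kron(I, A),
# which is nonzero unless A is a scalar multiple of the identity.
A = np.array([[1., 2.],
              [3., 4.]])     # any non-scalar matrix will do
n = A.shape[0]

X_at_A = np.kron(A, np.eye(n)) - np.kron(np.eye(n), A)
print(np.allclose(X_at_A, 0))   # False: this specialization of X is not the zero matrix
```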

Q1: Am I correct that this proof is incomplete or at least ambiguous as written?

Wikipedia appears to explain a way to complete this proof, which I'd describe as follows:

The point is that we actually do have an evaluation homomorphism for the matrices which appear in this argument. Because $X = A - \lambda I_n$ commutes with its adjugate $\operatorname{adj}(X) = \text{adj}(A - \lambda I_n)$, $A$ commutes with all the coefficients of the matrix polynomial $\operatorname{adj}(X)$ when expanded out in powers of $\lambda$. That means this computation isn't happening in the full $M_n$ but in the smaller centralizer $Z_{M_n}(A) \subset M_n(K)$. So, we can interpret the identity $p_A(\lambda) I_n = X \operatorname{adj}(X)$ as an identity in $Z_{M_n}(A)[\lambda]$, and now we really do have an evaluation homomorphism

$$Z_{M_n}(A)[\lambda] \ni f(\lambda) \mapsto f(A) \in Z_{M_n}(A)$$

because $A$ commutes with all the coefficients of the matrix polynomials involved. Applying this evaluation homomorphism gives us an identity

$$p_A(A) = (A - A) \operatorname{adj}(X(A)) = 0 \in Z_{M_n}(A)$$

as desired (the notation $\operatorname{adj}(X(A))$ is a little unfortunate but I couldn't think of anything better; this means taking the matrix polynomial $\operatorname{adj}(X) \in Z_{M_n}(A)[\lambda]$, then evaluating it at $A$). Note that the identity matrix on the LHS has disappeared; we evaluated the matrix polynomial $p_A(\lambda) I_n \in Z_{M_n}(A)[\lambda]$ at $\lambda = A$ and we get the ordinary product $p_A(A) I_n = p_A(A) \in Z_{M_n}(A)$, rather than the tensor product above. Similarly, this is why the identity matrix in $A - \lambda I_n$ has disappeared.
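As a sanity check of the two facts this uses, namely that the coefficients of $\operatorname{adj}(X)$ all commute with $A$ and that the evaluated identity really does give $p_A(A) = 0$, here is a short sympy computation for an arbitrarily chosen $2 \times 2$ matrix:

```python
import sympy as sp

lam = sp.symbols('lambda')
A = sp.Matrix([[1, 2], [3, 4]])     # an arbitrary concrete example
n = A.shape[0]

X = A - lam * sp.eye(n)             # X = A - lambda*I_n, as above
adjX = X.adjugate()

# matrix coefficients of adj(X) as a polynomial in lambda (degree <= n-1)
C = [adjX.applyfunc(lambda e, k=k: sp.expand(e).coeff(lam, k)) for k in range(n)]

# check: each coefficient commutes with A, which is what makes evaluation at lambda = A legitimate
print(all(A * Ck == Ck * A for Ck in C))        # True

# evaluate the identity p_A(lambda)*I_n = X*adj(X) at lambda = A
adjX_at_A = sum((Ck * A**k for k, Ck in enumerate(C)), sp.zeros(n))
pA = sp.expand(X.det())                         # p_A(lambda) = det(A - lambda*I_n)
pA_at_A = sum((pA.coeff(lam, k) * A**k for k in range(n + 1)), sp.zeros(n))

print(pA_at_A == (A - A) * adjX_at_A)           # True: both sides are the zero matrix
print(pA_at_A == sp.zeros(n))                   # True: Cayley-Hamilton for this A
```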

Q2: Is this a correct completion of Victor Protsak's argument or have I misunderstood something? Have I overcomplicated the situation or is it really necessary to say all this?

To be clear, I think this completed proof is quite a nice proof of Cayley–Hamilton, probably my new favorite. It also seems to me unusually confusing and fraught with both notational and conceptual issues (I gather that I'm not alone in this based on the comments in that MO discussion) so I want to be sure I've understood what's going on carefully, and in particular I want to be clear on exactly where each of the expressions in the proof live.

Qiaochu Yuan
  • 468,795
  • 1
    Why is $\operatorname{adj}(X(A))$ more natural than $\operatorname{adj}(X)(A)$, if you're specialising $\operatorname{adj}(X) \in Z_{M_n}(A)[\lambda]$ at $\lambda = A$? – LSpice Jul 28 '24 at 00:05
  • 1
    @LSpice: I guess that also works, but as written it could also look like a product rather than an evaluation. Perhaps we need to stop writing so many operations as concatenation... – Qiaochu Yuan Jul 28 '24 at 00:07
  • Re, $\operatorname{adj}(X)\rvert_{\lambda = A}$? – LSpice Jul 28 '24 at 00:08
  • 1
    That is pretty unambiguous but I don't like how it looks in this context! We are proving a very nice identity here. – Qiaochu Yuan Jul 28 '24 at 00:47
  • 1
    I don't know if this answers your question, but I recently wrote down the details of a proof of Cayley-Hamilton that seems similar to the one you talk about here. – Joe Aug 01 '24 at 12:44

5 Answers

30

This one-line proof was well known to authors of algebra and linear algebra texts by the mid-twentieth century. See Modern Higher Algebra by Abraham Adrian Albert (1937), Vectors and Matrices by Cyrus Colton MacDuffee (1943) or The Theory of Matrices by Felix Gantmacher (1959) for examples. Its subtleties, however, are sometimes not well explained. MacDuffee’s book, in my opinion, offers the clearest explanation that is accessible to beginners.

Let $M$ be a (possibly non-commutative) ring with unity. Given $h(x)=\sum_{k=0}^n c_kx^k\in M[x]$ and $\alpha\in M$, the values of $\sum_{k=0}^n c_k\alpha^k$ and $\sum_{k=0}^n \alpha^kc_k$ are in general different. Therefore, care must be taken when we speak of “evaluation of $h$ at $\alpha$”. Let us write \begin{align*} h(\alpha \text{ from the left})&=\sum_{k=0}^n \alpha^kc_k,\\ h(\alpha \text{ from the right})&=\sum_{k=0}^n c_k\alpha^k.\\ \end{align*}

Now suppose $f,g,h\in M[x]$ are such that $fg=h$. When $M$ is not commutative, given an arbitrary element $\alpha\in M$, we in general do not have $f(\alpha)g(\alpha)=h(\alpha)$. That is, polynomial factorisation does not survive evaluation. However, we do have the following one-sided version of the factor theorem: \begin{cases} f(\alpha \text{ from the left})=0 \implies h(\alpha \text{ from the left})=0;\\ g(\alpha \text{ from the right})=0 \implies h(\alpha \text{ from the right})=0.\\ \end{cases} The one-line proof of Cayley-Hamilton is just a direct consequence of this.
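As a quick numerical sanity check of the right-handed version, one can take random matrix coefficients for $f$ and set $g(x)=Ix-\alpha$ for a random $\alpha$; a short numpy sketch (sizes, seed and degrees chosen arbitrarily) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, deg_f = 3, 2                                   # arbitrary small sizes
alpha = rng.standard_normal((n, n))

F = [rng.standard_normal((n, n)) for _ in range(deg_f + 1)]   # f(x) = sum_j F[j] x^j
G = [-alpha, np.eye(n)]                                       # g(x) = I x - alpha

# coefficients of h = f*g, with x treated as a central (commuting) variable
H = [np.zeros((n, n)) for _ in range(deg_f + 2)]
for j, Fj in enumerate(F):
    for k, Gk in enumerate(G):
        H[j + k] += Fj @ Gk

def eval_from_right(coeffs, a):
    # "substitution from the right": sum_k coeffs[k] @ a^k
    return sum(c @ np.linalg.matrix_power(a, k) for k, c in enumerate(coeffs))

print(np.allclose(eval_from_right(G, alpha), 0))   # True: g(alpha from the right) = 0
print(np.allclose(eval_from_right(H, alpha), 0))   # True: hence h(alpha from the right) = 0
```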

More specifically, we begin with $$ \operatorname{adj}(xI-A)(xI-A)=\det(xI-A)I,\tag{1} $$ where $A$ is an $n\times n$ matrix over a commutative ring $R$. Here $\operatorname{adj}(xI-A),\,xI-A$ and $\det(xI-A)I$ are members of $M_n(R[x])$, i.e., they are matrices whose entries are polynomials. Now, every matrix with polynomial entries can be identified with a polynomial with matrix coefficients (i.e., $M_n(R[x])$ can be identified with $M_n(R)[x]$) in an obvious way. If we denote the $(i,j)$-th entry of $\operatorname{adj}(xI-A)$ by $\sum_{k=0}^n f_k^{(ij)}x^k$, then $\operatorname{adj}(xI-A)$ can be identified with the polynomial $f(x)=\sum_{k=0}^n F_kx^k$ with matrix coefficients $F_0,F_1,\ldots,F_n$, where $F_k=\big(f_k^{(ij)}\big)\in M_n(R)$. Likewise, if we write $\det(xI-A)=\sum_{k=0}^nc_kx^k$, then $\det(xI-A)I$ can be identified with the polynomial $h(x)= \sum_{k=0}^n(c_k\color{red}{I})x^k$ (with matrix coefficients $c_0I,\,c_1I,\,\ldots,c_nI$). The identity $(1)$ now becomes $f(x)g(x)=h(x)$, where $g(x)=Ix-A$.

Clearly, we have $g(A \text{ from the right})=IA-A=0$. Therefore, by the factor theorem (with $M=M_n(R)$), we also have $h(A \text{ from the right})=0$, and this is the Cayley-Hamilton theorem. (Since the coefficients of $h$ are scalar multiples of the identity matrix, we actually have $h(A \text{ from the right})=h(A \text{ from the left})$, so we may simply write $h(A)=0$.)

Note that in MacDuffee’s rendering of the proof, the specialisation of $x$ to $A$ in $f,g$ or $h$ is completely unambiguous. We always mean substitution from the right (although this convention is unimportant for $h$). It does not depend on the existence of any evaluation homomorphism! The evaluation map is plain and simple ‘substitution’ in the naive sense. His argument also does not depend on the commutativity of $A$ with $\operatorname{adj}(xI-A)$. In contrast, this commutativity is used in Linear Algebra by Ichiro Satake (1975) (see q4343699 for a copy of his proof) to implicitly confine the factorisation of $h$ to a subring that centralises $A$, so that the factorisation survives evaluation at $A$ and the usual form of the factor theorem can be applied. Whether MacDuffee’s approach is better than Satake’s is a matter of taste. I like MacDuffee’s approach more because I think it leaves almost no room for confusion.

At any rate, the merit of this proof is obvious. It offers a very simple reason why the theorem holds — it is all because $Ix-A$ is a factor (from the left or the right) of $h$.

user1551
  • 149,263
  • Thanks, this is helpful! Those one-sided factor theorems are very mysterious to me although I agree that given them the proof of CH is totally clear; are they obvious? Should I just multiply everything out? – Qiaochu Yuan Jul 28 '24 at 19:30
  • 7
    @QiaochuYuan Yes. Just multiply everything out. For substitution from the right, when $f(x)=ax^j$ and $g(x)=bx^k$, we have $h(x)=f(x)g(x)=abx^{j+k}$ and hence $h(\alpha)=ab\alpha^{j+k}=ag(\alpha)\alpha^j$. More generally, if $f(x)=\sum_ja_jx^j$ and $g(x)=\sum_kb_kx^k$, then $h(\alpha)=\sum_ja_jg(\alpha)\alpha^j$. Therefore $g(\alpha)=0$ implies $h(\alpha)=0$. – user1551 Jul 28 '24 at 19:45
  • Whoops. Well, that's what I get for refusing to actually do the multiplication. Thanks for clarifying, this is great stuff. – Qiaochu Yuan Jul 28 '24 at 20:03
13

I see that I just missed a comment in the MO discussion where Victor clarifies:

Let $S$ be the commutant of $A$ in $M_n$, then $S[\lambda] \subset M_n[\lambda]$ contains $X = A - \lambda I_n$ and $\operatorname{adj}(X)$ and spec'n is a unique ring hom $\phi : S[\lambda] \to M_n$ that is identity on $S$ and $\phi(\lambda) = A$.

Sorry about that, Victor! So it looks like my understanding of the situation is accurate, and one has to note the commutativity of $A$ with $\operatorname{adj}(X)$ to finish.

I'd like to leave this question up as a searchable reference regarding this issue; hope that's fine. I'm also interested in this other claim Victor makes regarding Zariski density but I think that should be a separate question (edit: I've asked about this here).

Other points of view and further clarifications are also still welcome!

LSpice
  • 2,891
Qiaochu Yuan
  • 468,795
  • 2
    Since you mentioned you like this type of proof, you might also take a look at the latter part of my video here which uses similar ideas. The book by McDonald listed in the references is also good. – blargoner Jul 27 '24 at 23:11
  • 1
    @blargoner: thanks, that's a nice video! – Qiaochu Yuan Jul 27 '24 at 23:35
3

Let me try this:

If $X{\rm adj}\, X=(\det X)1\!\!1$ is valid for every square matrix, then for $X=t1\!\!1-A$ we have $$(t 1\!\!1-A){\rm adj}(t1\!\!1-A)=\det (t1\!\!1-A)\,1\!\!1.$$ Since ${\rm adj}(t1\!\!1-A)$ is a matrix polynomial of degree $\le n-1$, we can write $${\rm adj}(t1\!\!1-A)=B_0+B_1t+B_2t^2+\cdots+B_{n-1}t^{n-1}$$ for some square matrices $B_i$. Then we get $$(t1\!\!1-A)(B_0+B_1t+B_2t^2+\cdots+B_{n-1}t^{n-1})=(a_0+a_1t+\cdots+a_nt^n)1\!\!1,$$ or equivalently $$-AB_0+(B_0-AB_1)t+\cdots+(B_{n-2}-AB_{n-1})t^{n-1}+B_{n-1}t^n=a_01\!\!1+a_1t\,1\!\!1+\cdots+a_nt^n1\!\!1.$$ Comparing coefficients of the powers of $t$ gives \begin{eqnarray*} a_01\!\!1&=&-AB_0,\\ a_11\!\!1&=&B_0-AB_1,\\ a_21\!\!1&=&B_1-AB_2,\\ &\vdots& \\ a_{n-1}1\!\!1&=&B_{n-2}-AB_{n-1},\\ a_n1\!\!1 &=&B_{n-1}. \end{eqnarray*} Now multiply the 1st equation on the left by $1\!\!1$, the 2nd by $A$, the 3rd by $A^2$, and so on, to get \begin{eqnarray*} a_01\!\!1&=&-AB_0,\\ a_1A&=&AB_0-A^2B_1,\\ a_2A^2&=&A^2B_1-A^3B_2,\\ &\vdots& \\ a_{n-1}A^{n-1}&=&A^{n-1}B_{n-2}-A^nB_{n-1},\\ a_nA^n&=&A^nB_{n-1}. \end{eqnarray*} Summing both columns, the right-hand side telescopes and we obtain $$a_01\!\!1+a_1A+a_2A^2+\cdots+a_nA^n={\bf 0}.$$ $\Box$
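For a concrete sanity check, a short sympy computation along these lines (with an arbitrarily chosen $3\times 3$ matrix $A$) verifies the coefficient identities and the telescoping sum:

```python
import sympy as sp

t = sp.symbols('t')
A = sp.Matrix([[2, 1, 0],
               [1, 3, 1],
               [0, 1, 2]])          # an arbitrary 3x3 example
n = A.shape[0]

adjX = (t * sp.eye(n) - A).adjugate()
B = [adjX.applyfunc(lambda e, k=k: sp.expand(e).coeff(t, k)) for k in range(n)]
a = [sp.expand((t * sp.eye(n) - A).det()).coeff(t, k) for k in range(n + 1)]

# compare coefficients of t^k in (t*I - A) * adj(t*I - A) = det(t*I - A) * I
checks = [a[0] * sp.eye(n) == -A * B[0]]
checks += [a[k] * sp.eye(n) == B[k - 1] - A * B[k] for k in range(1, n)]
checks += [a[n] * sp.eye(n) == B[n - 1]]
print(all(checks))                                            # True

# multiplying the k-th identity on the left by A^k and summing telescopes to zero
print(sum((a[k] * A**k for k in range(n + 1)), sp.zeros(n)) == sp.zeros(n))  # True
```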

janmarqz
  • 10,891
  • Where is commutativity needed in this argument? – janmarqz Jul 28 '24 at 01:50
  • 2
    I guess this works but it strikes me as a little inelegant. The result comes out of a cancellation that one just has to verify. In Victor's argument something very intuitive is happening but the question is how to justify it. – Qiaochu Yuan Jul 28 '24 at 03:30
  • The elegance is in the crystal clear logic my friend – janmarqz Jul 28 '24 at 03:39
  • 5
    It is a nice exercise to see how this proof corresponds to @user1551's, with the proof of the factor theorem (or rather its special case with $\deg h = 1$) inlined and all abstract algebra removed. – darij grinberg Jul 28 '24 at 14:10
2

The fact that, for a commutative ring $R$, the matrix ring $\mathrm{Mat}_n(R)$ is noncommutative seems to me to be a red herring here (at least as far as it relates to the question of why the Cayley-Hamilton theorem is useful). If $\mathsf k$ is a field, let $R = \mathsf k[t]$ be the ring of polynomials in one variable with coefficients in $\mathsf k$. If $V$ is an $n$-dimensional $\mathsf k$-vector space and $\alpha \colon V \to V$ is a linear map, then as usual, we can give $(V,\alpha)$ the structure of an $R$-module via $f(t)(v) = f(\alpha)(v)$ for $f \in \mathsf k[t]$, $v \in V$.

The characteristic polynomial $\chi_{\alpha}(t) = \det(t1_V-\alpha)$ for $\alpha$ arises naturally from considering the question of finding eigenvalues of $\alpha$, but the map "$t-\alpha$", properly understood, is itself completely natural also:

Let $F=R\otimes_\mathsf k V$ and let $b\colon F \to V$ be the map given by $f(t)\otimes v \mapsto f(\alpha)(v)$. The action of $\alpha$ on $V$ clearly extends to an $R$-module homomorphism $\alpha_R:=1_R\otimes \alpha$ of $F$, and it is not hard to show that $\ker(b)= \text{im}(a)$, where $a = t\otimes 1_V - \alpha_R$, so that one obtains a short exact sequence $$ 0 \to F \xrightarrow{\ a\ } F \xrightarrow{\ b\ } V \to 0. $$ Equivalently, $a=t-\alpha_R\colon F \to F$ gives a free resolution of $V$. Hilbert's syzygy theorem of course tells you that such a resolution exists, but it still seems useful to see that it can be made explicit in this one-dimensional case, and that it gives some sort of meaning to the object $t-\alpha$.

Now, since $a\,\mathrm{adj}(a) = \mathrm{adj}(a)\,a = \det(a)\,1_F$, we have $\det(a)F = a(\mathrm{adj}(a)F) \subseteq a(F)$, so $\det(a)=0$ on $F/a(F)\cong V$, giving the Cayley-Hamilton theorem. (Thus one does not need the fact that $\mathrm{adj}(a)$ commutes with $a$.)

Alternatively, since $R$ is a PID, the map $a$ has a Smith normal form; that is, there are bases $\{v_1,\ldots,v_n\}$ and $\{w_1,\ldots,w_n\}$ of $F$ such that $a(v_i) = d_i w_i$, where $d_1\mid d_2\mid\cdots\mid d_n$. In particular, $\wedge^n a = \det(a) = \prod_{i=1}^n d_i$ (up to a unit), and since $F/a(F)\cong \bigoplus_{i=1}^n R/(d_i)$ with each $d_i$ dividing $\det(a)$, clearly $\det(a)=0$ on $F/a(F)\cong V$, again proving Cayley-Hamilton.

The argument trivially extends to show that if $S$ is a ring, $R=S[a]$, and there exists a faithful $R$-module $M$ which is finitely generated as an $S$-module, then $a$ is integral over $S$, i.e., there is a monic polynomial $m(t) \in S[t]$ such that $m(a)M=0$.

krm2233
  • 7,230
  • This is very nice! – Qiaochu Yuan Aug 07 '24 at 22:27
  • 1
    I like the fact that the presentation really just comes from the fact that $R\otimes_R V=V$ is obtained from $R\otimes_{\mathsf k}V$ by quotienting by $t\otimes 1 - 1\otimes \alpha$, i.e. the action of $t$ on the $R$-module $V$. I think a lecturer I had once made an off-hand comment about Cayley-Hamilton being the simplest example of a syzygy. I had no idea what he meant at the time, but this is the best I can think of. – krm2233 Aug 07 '24 at 22:53
0

The currently accepted answer is almost perfect, except that it is not written coordinate-freely. I will do that now.

Let $R$ be a commutative unital ring, let $M\cong R^n$, and let $\phi \in \DeclareMathOperator{\End}{End}\End_RM$. We want to show that $\chi_\phi(\phi)=0$.

Note that $(\End_RM)[x]=(\End_RM) \otimes_\mathbb{Z}\mathbb{Z}[x]$ is a polynomial ring over a (maybe non-commutative) ring, and we have a ring isomorphism $(\End_RM)[x]\cong \End_{R[x]}(M\otimes_R R[x]),\ \sum_i\phi_ix^i \mapsto \sum_i\phi_i\otimes x^i$ (this holds more generally when $M$ is finitely generated); both rings contain $R[x]$ as a subring.

We have $\det:\End_{R[x]}(M\otimes_R R[x])\to R[x], x-\phi =\mathrm{id}\otimes x-\phi\otimes \mathrm{id}\mapsto \det(x-\phi)=:\chi_\phi(x)$.

Viewing $\chi_\phi(x)\in \End_{R[x]}(M\otimes_R R[x])$, we have $\chi_\phi(x)=\mathrm{adj}(x-\phi)\cdot (x-\phi )$. Viewing this factorisation in $(\End_RM)[x]$ and substituting $\phi$ (evaluated from the right), we see that $\chi_\phi(\phi)= 0$ by the right factor theorem.

Z Wu
  • 2,099