17

I have seen many proofs of the Cayley-Hamilton Theorem in textbooks and on the net, so I want to know: how many proofs are there of this important and widely applicable theorem?

  • 7
    Avoid demanding that answers have a certain form. If you don't allow for references or links, you'll miss out on other proofs. Also, there is no reason for demanding that an answer contain only one proof. What's the goal of this? – Pedro Apr 23 '16 at 14:57
  • 7
    That seems like an arbitrary thing to ask for. – Pedro Apr 23 '16 at 15:05
  • 4
    At any rate, do not expect people to comply with this demand. – Pedro Apr 23 '16 at 15:08
  • 3
    @PedroTamaroff you can delete your comments. –  May 04 '16 at 22:16
  • 3
    Also, I would avoid asking moderators to delete their comments :p – fonini May 04 '16 at 23:10
  • @NNN They are useful for future reference. – Pedro May 05 '16 at 00:42
  • 1
    In case someone wants more proofs, the linked wikipedia page itself has a number of proofs. e.g. the shortest one there is simply by checking the result for any matrix in Jordan normal form. – Calvin Khor Nov 23 '18 at 12:13
  • https://arxiv.org/pdf/2105.09285.pdf – Alexey Mar 26 '23 at 22:08

10 Answers

24

My favorite: let $k$ be your ground field, and let $A = k[X_{ij}]_{1\leqslant i,j\leqslant n}$ be the ring of polynomials in $n^2$ indeterminates over $k$, and $K = \operatorname{Frac}(A)$.

Then put $M = (X_{ij})_{ij}\in M_n(A)$ the "generic matrix".

For any $N=(a_{ij})_{ij}\in M_n(k)$, there is a unique $k$-algebra morphism $\varphi_N:A\to k$ defined by $\varphi_N(X_{ij}) = a_{ij}$, and it satisfies $\varphi_N(M)=N$ (applying $\varphi_N$ to matrices entrywise).

Then the characteristic polynomial of $M$ is separable (i.e. $M$ has $n$ distinct eigenvalues in an algebraic closure $\widetilde{K}$ of $K$). Indeed, otherwise its discriminant $\operatorname{disc}(\chi_M)$ (that is, the resultant of $\chi_M$ with $\chi_M'$) would be zero, so for any $N\in M_n(k)$, $\operatorname{disc}(\chi_N) = \operatorname{disc}(\chi_{\varphi_N(M)})= \varphi_N(\operatorname{disc}(\chi_M)) = 0$, so no matrix $N\in M_n(k)$ would have distinct eigenvalues (but obviously some do, just take a diagonal matrix with distinct diagonal entries).

It's easy to show that matrices with separable characteristic polynomial satisfy Cayley-Hamilton (because they are diagonalizable in an algebraic closure), so $M$ satisfies Cayley-Hamilton.

Now for any $N\in M_n(k)$, $\chi_N(N) = \varphi_N(\chi_M(M)) = \varphi_N(0) = 0$.
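
To see the separability step concretely, here is a minimal sympy sketch (my choice of $n=2$; the symbols $a,b,c,d$ play the role of the $X_{ij}$) computing the discriminant of the generic characteristic polynomial:

```python
import sympy as sp

a, b, c, d, t = sp.symbols('a b c d t')
M = sp.Matrix([[a, b], [c, d]])    # the generic 2 x 2 matrix
chi = M.charpoly(t).as_expr()      # t**2 - (a + d)*t + (a*d - b*c)

# The discriminant equals (a - d)**2 + 4*b*c, which is not the zero
# polynomial, so the generic characteristic polynomial is separable.
print(sp.expand(sp.discriminant(chi, t)))
```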

Captain Lama
  • 27,658
  • What's a $k$-algebra morphism? Can you elaborate on that step? – littleO Apr 23 '16 at 23:28
  • 1
    @littleO A morphism that preserves operations: it is linear (where $A$ and $k$ are considered as vector spaces over $k$) and preserves multiplication and unit (in other words, it is a unital ring homomorphism between $A$ and $k$ seen as rings) – yago Aug 05 '16 at 13:42
  • for future visitors: in case anyone doesn't know about the resultant, this post might be useful/of interest: https://math.stackexchange.com/questions/3597480/understanding-resultant. – D.R. Mar 16 '22 at 23:22
  • on the other hand, is there a proof of this without referring to the resultant? – D.R. Mar 16 '22 at 23:23
  • 2
    Resultant is the wrong word. It should be the discriminant, although this is indeed the resultant of the polynomial with its derivative. – Abdelmalek Abdesselam Mar 08 '23 at 17:43
23

Here is a neat proof from Qiaochu Yuan's answer to this question:

If $L$ is diagonalizable with eigenvalues $\lambda_1, \dots \lambda_n$, then it's clear that $(L - \lambda_1) \dots (L - \lambda_n) = 0$, which is the Cayley-Hamilton theorem for $L$. But the Cayley-Hamilton theorem is a "continuous" fact: for an $n \times n$ matrix it asserts that $n^2$ polynomial functions of the $n^2$ entries of $L$ vanish. And the diagonalizable matrices are dense (over $\mathbb{C}$). Hence we get Cayley-Hamilton in general.
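
As a numerical sketch of this density argument (with an arbitrarily chosen Jordan block as the non-diagonalizable example):

```python
import numpy as np

n = 4
J = np.eye(n, k=1)          # nilpotent Jordan block: not diagonalizable

def product_over_eigs(M):
    """(M - lam_1 I)···(M - lam_n I) over the computed eigenvalues of M."""
    P = np.eye(len(M), dtype=complex)
    for lam in np.linalg.eigvals(M):
        P = P @ (M - lam * np.eye(len(M)))
    return P

# A small diagonal perturbation gives distinct eigenvalues, hence a
# diagonalizable matrix, where the product vanishes for the "easy" reason;
# letting the perturbation go to 0, continuity gives the identity for J too.
J_eps = J + 1e-3 * np.diag(np.arange(1, n + 1))
print(np.max(np.abs(product_over_eigs(J_eps))))   # ~0 up to roundoff
print(np.max(np.abs(product_over_eigs(J))))       # ~0 up to roundoff
```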

  • 10
    This proof also works over any integral domain by going up to the algebraic closure of the field of fractions and using that the diagonalizable matrices are dense in the Zariski topology (which is regular which is sufficient to emulate the Hausdorffness to make morphisms uniquely determined by their value on a dense subset) – Tobias Kildetoft Apr 23 '16 at 18:54
  • 3
    @Tobias Kildetoft In fact the argument works over any ring, as one can reduce to proving this in the universal case (as an identity in the coordinate ring of M_n over Spec(ZZ), an algebra which embeds into C). – Sempliner Feb 11 '21 at 03:15
  • Some discussion of this proof here: https://mathoverflow.net/questions/475892/is-the-zariski-density-proof-of-cayley-hamilton-circular – Qiaochu Yuan Jul 28 '24 at 20:15
15

One can prove this theorem using the fact that every linear map on a finite-dimensional complex vector space is triangularizable with respect to some basis $\{v_1,...,v_n\}$.

So if $T$ is a linear map, there are scalars $\{\lambda_1,...,\lambda_n\}$ such that
$$T(v_1)=\lambda_1 v_1 $$ $$T(v_2)=a_{21} v_1+\lambda_2 v_2 $$ $$\vdots$$ $$T(v_n)=a_{n1}v_1+a_{n2}v_2+...+\lambda_n v_n $$

And by computation we can find that the matrix $S=(T-\lambda_1)(T-\lambda_2)\cdots(T-\lambda_n)$ kills every $v_i$, and so $S\equiv 0$.
For more details you can see Herstein's Topics in Algebra.
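
In floating point, the triangularizing basis is provided by the complex Schur form; here is a minimal numpy/scipy sketch of the same computation for a random matrix:

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))

# Complex Schur form: A = Q U Q*, with U upper triangular and the
# eigenvalues of A on the diagonal of U.
U, Q = schur(A, output='complex')

S = np.eye(n, dtype=complex)
for lam in np.diag(U):
    S = S @ (U - lam * np.eye(n))   # the successive factors (T - lambda_i)
print(np.max(np.abs(S)))            # ~0: the product kills every basis vector
```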

Sh.R
  • 181
14

Here is a proof which doesn't assume that our field is algebraically closed (or involve passing to a field extension). For any square matrix $M$, let $c_M(t)$ be the characteristic polynomial of $M$. What we want to show is that $c_M(M)=0$.

Lemma 1: Let $A$ be an $n \times n$ matrix and suppose that there is a vector $v$ such that $v$, $Av$, $A^2 v$, ..., $A^{n-1} v$ is a basis of $k^n$. Then $c_A(A)=0$.

Proof: Since $v$, $Av$, $A^2 v$, ..., $A^{n-1} v$ is a basis, we have some linear relationship $A^n v + \sum_{j=0}^{n-1} f_j A^j v=0$. We set $f_n=1$ so we can write $$\sum_{j=0}^n f_j A^j v = 0 \quad (\ast).$$ Then, in the basis $v$, $Av$, $A^2 v$, ..., $A^{n-1} v$, the linear operator $A$ has matrix $$ \begin{bmatrix} 0&0&0&\cdots&0&-f_0 \\ 1&0&0&\cdots&0&-f_{1} \\ 0&1&0&\cdots&0&-f_{2} \\ 0&0&1&\cdots&0&-f_{3} \\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots \\ 0&0&0&\cdots&1&-f_{n-1} \\ \end{bmatrix}. $$ Using this basis, we compute that $c_A(t) = \sum f_j t^j$. Now, multiplying $(\ast)$ by $A^k$ on the left, we deduce that $$A^k \left( \sum f_j A^j \right) v = \left( \sum f_j A^{k+j} \right) v = \left( \sum f_j A^j \right) A^k v = c_A(A) A^k v = 0.$$ Thus, $c_A(A)$ kills each of the basis elements $v$, $Av$, $A^2 v$, ..., $A^{n-1} v$, and we deduce that $c_A(A)=0$. $\square$

Lemma 2: Suppose that $A$ is a matrix with block form $\left[ \begin{smallmatrix} X&Y \\ 0&Z \end{smallmatrix} \right]$ and that $c_X(X)=0$ and $c_Z(Z)=0$. Then $c_A(A)=0$.

Proof: The determinant of a block matrix is the product of the determinants of the diagonal blocks, so $c_A(t) = c_X(t) c_Z(t)$, so we have $$c_A(A) = c_X(A) c_Z(A) = \begin{bmatrix} c_X(X) & \ast \\ 0 & c_X(Z) \end{bmatrix} \begin{bmatrix} c_Z(X) & \ast \\ 0 & c_Z(Z) \end{bmatrix}$$ $$=\begin{bmatrix} 0 & \ast \\ 0 & c_X(Z) \end{bmatrix} \begin{bmatrix} c_Z(X) & \ast \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 0&0 \\ 0&0 \\ \end{bmatrix}. \qquad \square$$

Proof of the Cayley-Hamilton theorem: We induct on $n$; if $n=0$, the result is vacuously true.

Now, suppose $n>0$ and choose a nonzero $v \in k^n$. Find the minimal $r$ such that there is a linear relation among $v$, $Av$, $A^2 v$, ..., $A^{r-1} v$, $A^r v$. Since $v \neq 0$, we have $r \geq 1$. If $r=n$, we are done by Lemma 1.

If not, complete $v$, $Av$, $A^2 v$, ..., $A^{r-1} v$ to a basis $v$, $Av$, $A^2 v$, ..., $A^{r-1} v$, $w_{r+1}$, $w_{r+2}$, $\dots$, $w_n$ of $k^n$. In this basis, $A$ has block form $\left[ \begin{smallmatrix} X&Y \\ 0&Z \end{smallmatrix} \right]$, with the $X$-block being $r \times r$ and the $Z$-block being $(n-r) \times (n-r)$. By induction, we have $c_X(X) = 0$ and $c_Z(Z)=0$, and we conclude by Lemma 2. $\square$
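
Here is a small sympy sketch of the Krylov construction in this proof (the matrix and vector are an arbitrary illustrative choice with $r < n$):

```python
import sympy as sp

A = sp.Matrix([[1, 0, 0], [0, 2, 0], [0, 0, 3]])
v = sp.Matrix([1, 1, 0])
n = A.rows

# Krylov sequence v, Av, A^2 v, ...; stop at the first linear relation.
vecs = [v]
while sp.Matrix.hstack(*(vecs + [A * vecs[-1]])).rank() > len(vecs):
    vecs.append(A * vecs[-1])
r = len(vecs)                       # here r = 2 < n = 3

# Complete to a basis of k^n using standard basis vectors.
basis = list(vecs)
for i in range(n):
    e = sp.eye(n).col(i)
    if sp.Matrix.hstack(*(basis + [e])).rank() > len(basis):
        basis.append(e)

P = sp.Matrix.hstack(*basis)
B = P.inv() * A * P
print(r, B)                         # lower-left (n - r) x r block of B is zero
```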

  • I am currently researching proofs of CH in preparation for a course next term. I'll add ones I like to this question, if they aren't already here. – David E Speyer Dec 13 '21 at 17:13
  • Here is a really slick complex analysis approach: https://math.stackexchange.com/questions/20677/cauchys-integral-formula-for-cayley-hamilton-theorem – David E Speyer Dec 13 '21 at 17:14
  • While this proof is accessible to beginners, I like its generic version more, as that version makes evident the point that the reason why Cayley-Hamilton theorem holds is because the characteristic polynomial of a generic matrix is the minimal polynomial. – user1551 Dec 13 '21 at 20:21
  • @DavidESpeyer Shouldn't the A matrix you wrote w.r.t those bases be a square matrix? I might seem stupid but I think you made the matrix n×(n+1). – Asmit Karmakar Mar 20 '25 at 03:19
  • 1
    @AsmitKarmakar You're correct. See if I have fixed it now. – David E Speyer Mar 20 '25 at 03:56
6

A few years ago I gave an "ugly" and long proof (more than 4 pages), which is purely computational.

Basically, I took an $n\times n$ matrix $A=(a_{i,j})_{i,j}$, where the $a_{i,j}$ are variables. Then I wrote the characteristic polynomial as $P_A(X)=c_nX^n+\cdots +c_0$, where each $c_k$ is an explicit polynomial in the variables $a_{i,j}$. Then I wrote explicitly the $n^2$ entries of each $A^k$ with $0\leq k\leq n$ as polynomials in the variables $a_{i,j}$. Finally, I proved that each of the $n^2$ entries of $P_A(A)=c_nA^n+\cdots +c_0I_n$ is $0$. It's just playing with sums of monomials and proving that they all cancel in the end.

You can find the proof in the 3-4/2014 issue of Gazeta Matematica, Seria A, pages 32-36. Available online at this address: https://ssmr.ro/gazeta/gma/2014/gma3-4-2014-continut.pdf
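
For small $n$, the same brute-force computation can be handed to a computer algebra system; a minimal sympy sketch, evaluating $P_A(A)$ for the matrix of indeterminates by Horner's scheme:

```python
import sympy as sp

n = 2                                  # n = 3 works too, just slower
X = sp.symbols(f'x0:{n*n}')
A = sp.Matrix(n, n, X)                 # matrix of n^2 independent variables

t = sp.Symbol('t')
chi = A.charpoly(t)                    # coefficients are polynomials in the x's

# Evaluate the characteristic polynomial at A itself (Horner's scheme).
P = sp.zeros(n, n)
for c in chi.all_coeffs():             # leading coefficient first
    P = P * A + c * sp.eye(n)
print(P.applyfunc(sp.expand))          # the zero matrix: every monomial cancels
```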

5

A more general version of the Cayley-Hamilton Theorem is given in Eisenbud's Commutative Algebra with a View Toward Algebraic Geometry (1st Edition, Theorem 4.3, p.120). I give a slightly expanded version of the proof here.

Theorem. Suppose $A$ is a commutative ring and $M$ is a finitely generated $A$-module. If $\varphi:M\to M$ is an endomorphism, then there is a monic polynomial $p\in A[t]$ such that $p(\varphi)$ is the zero endomorphism of $M$.

Proof. We first endow $M$ with an $A[t]$-module structure by extending the scalar multiplication $A\times M\to M$ to a map $A[t]\times M\to M$. We let $t$ "act as $\varphi$", so that $$ (a_0+a_1t+\dots +a_nt^n)m=a_0m+a_1\varphi(m)+a_2\varphi^2(m)+\dots+a_n\varphi^n(m) \, , $$ where $\varphi^i$ denotes the $i$-fold composition of $\varphi$. (We can think of the scalar multiplication $A[t]\times M\to M$ as being induced by the ring homomorphism $A[t]\to\operatorname{End}_{\mathsf{Ab}}(M), p\mapsto p(\varphi)$.)

Now suppose $x_1,\dots,x_n$ generate $M$ as an $A$-module. The action of $A[t]$ on $M$ gives rise to an action of $\mathcal M_n(A[t])$ on $M^n$; explicitly, we define $$ \begin{pmatrix} p_{11} & \dots & p_{1n} \\ \vdots & & \vdots \\ p_{n1} & \dots & p_{nn} \end{pmatrix} \begin{pmatrix} m_1 \\ \vdots \\ m_n \end{pmatrix} = \begin{pmatrix} p_{11}m_1+\dots+p_{1n}m_n \\ \vdots \\ p_{n1}m_1+\dots+p_{nn}m_n \end{pmatrix} \, . $$ We can write $\varphi(x_i)$ as a linear combination of the $x_j$, say $\varphi(x_i)=\sum_jb_{ij}x_j$ with $b_{ij}\in A$. In terms of the action of $\mathcal M_n(A[t])$ on $M^n$, these equations tell us that $$ (tI-B)x=0 \, , $$ where $B=(b_{ij})$ and $x=(x_1,\dots,x_n)$. We claim that $p=\det(tI-B)\in A[t]$ is the desired polynomial. By inspecting the Leibniz formula for the determinant, we see that $p$ is monic; it remains to be shown that $p(\varphi)$ is the zero endomorphism.

Recall that if $C$ is a matrix with entries in a commutative ring, then the adjugate $\DeclareMathOperator{\adj}{adj}\adj(C)$ satisfies $\adj(C)C=\det(C)I$. If we multiply the equation $(tI-B)x=0$ on the left by the adjugate of $tI-B$, then the equation becomes $[\det(tI-B)I]x=0$, hence $\det(tI-B)x_i=0$ for each $i$. Since $t$ acts as $\varphi$, the scalar $\det(tI-B)\in A[t]$ acts on $M$ as $p(\varphi)$; as the $x_i$ generate $M$, it follows that $p(\varphi)$ is identically zero, completing the proof.

Remark. In the case where $M=A^n$ for some $n\in\mathbb N$, and the $x_i$ are the canonical basis vectors $e_i$, the matrix $B$ constructed in the above proof is simply the matrix representing $\varphi$ with respect to the canonical basis of $A^n$, and $p$ is the characteristic polynomial of $B$. Since $p(B)$ represents $p(\varphi)$, and $p(\varphi)$ is the zero operator, it follows that $p(B)$ is the zero matrix, from which we recover the "classical" Cayley-Hamilton Theorem.
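
Here is a small sympy sketch of the theorem in a case where the generators are deliberately not a basis (an illustrative choice of mine, not from Eisenbud): the construction produces a monic polynomial of degree $3$ killing an endomorphism of $\mathbb Z^2$.

```python
import sympy as sp

t = sp.Symbol('t')
phi = sp.Matrix([[0, 1], [1, 0]])     # the swap endomorphism of M = Z^2

# Generators x1 = (1,0), x2 = (0,1), x3 = (1,1) -- deliberately not a basis.
# From phi(x1) = x2, phi(x2) = x1, phi(x3) = x3, one valid choice of B is:
B = sp.Matrix([[0, 1, 0], [1, 0, 0], [0, 0, 1]])

p = (t * sp.eye(3) - B).det()         # monic of degree 3, not the 2 x 2 charpoly
print(sp.factor(p))                   # (t - 1)**2*(t + 1)

# Yet p(phi) is the zero endomorphism of Z^2:
print((phi - sp.eye(2)) ** 2 * (phi + sp.eye(2)))
```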

Joe
  • 22,603
4

The Cayley-Hamilton Theorem says [birdtrack diagram not reproduced here] where the dark box represents an antisymmetrizer of size $n+1$.

The proof of the theorem is a trivial application of the pigeonhole principle. What takes a bit more time and effort is to see that this is indeed the familiar Cayley-Hamilton Theorem. For more details, see Section 6.5 of the book by P. Cvitanović, Group theory. Birdtracks, Lie's, and exceptional groups. Princeton University Press, Princeton, NJ, 2008. In Lectures 4 and 5 of my course https://mabdesselam.github.io/MATH8450Spring2023.html you can see the $n=2$ case worked out.

This point of view was used by V. Lafforgue in his recent work https://arxiv.org/abs/1209.5352
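
One ingredient of this picture is easy to check numerically: the coefficients of the characteristic polynomial are traces of antisymmetrized powers, $\operatorname{tr}(\Lambda^k M)$, i.e. sums of $k \times k$ principal minors. A minimal numpy sketch:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))

def trace_exterior_power(M, k):
    """tr(Lambda^k M): the sum of all k x k principal minors of M."""
    return sum(np.linalg.det(M[np.ix_(S, S)])
               for S in combinations(range(len(M)), k))

# det(tI - M) = t^n - e_1 t^(n-1) + e_2 t^(n-2) - ..., e_k = tr(Lambda^k M).
coeffs = [1.0] + [(-1) ** k * trace_exterior_power(M, k) for k in range(1, n + 1)]
print(np.allclose(coeffs, np.poly(M)))   # True: matches numpy's charpoly
```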

  • I took a look at Cvitanovic's book and wow, this proof is wild, it's totally unfamiliar to me. It gives the characteristic polynomial not in terms of the determinant but in terms of traces of the exterior powers. I'd love to understand what's going on here... – Qiaochu Yuan Jul 28 '24 at 19:53
  • For those of us who do not have access to the book, it would be nice if you would write a — shall we say — more detailed and enlightening answer. – Ted Shifrin Jul 28 '24 at 20:15
  • 3
    @Ted: Cvitanović's book is freely available online here: https://birdtracks.eu/ The proof of CH is in 6.5. It depends strongly on computations using his "birdtrack" notation, I have no idea how one would render it conventionally. – Qiaochu Yuan Jul 28 '24 at 20:18
  • 2
    To at least indicate what is being computed: let $V$ be an $n$-dimensional vector space and $M : V \to V$ a linear map. Let $A_k : V^{\otimes k} \to V^{\otimes k}$ be the antisymmetrization, normalized so that $A_k^2 = A_k$. Then the coefficients of the characteristic polynomial are given by the traces of $A_k M^{\otimes k}$. This is so far standard. Cvitanović claims that if you compute $A_{n+1} M^{\otimes (n+1)}$, then on the one hand this is clearly zero (because the $(n+1)$-th exterior power is zero), and on the other hand if you compute the partial trace of this operator over... – Qiaochu Yuan Jul 28 '24 at 20:22
  • 2
    ...every copy of $V$ except one (that's what the diagram is describing), you get an identity in $\text{End}(V)$ which exactly reproduces CH. This partial trace is what I have no idea how to compute conventionally; in Cvitanović's birdtrack notation it's done by using various lemmas to do the calculation inductively on the number of factors being antisymmetrized over. It's quite nice! – Qiaochu Yuan Jul 28 '24 at 20:24
  • @Qiaochu Thanks for the additional details. I guess there is magical representation theory behind the birdtracks. – Ted Shifrin Jul 28 '24 at 21:01
  • 1
    @QiaochuYuan: by the way this "exercise" is taken from a course on QFT I taught https://mabdesselam.github.io/MATH8450Spring2023.html see in particular Lectures 4 and 5. – Abdelmalek Abdesselam Jul 28 '24 at 21:04
3

For the sake of completeness the $1$-line proof should be mentioned here. The proof uses the adjugate identity

$$\det(tI - A)\, I = (tI - A) \text{adj}(tI - A)$$

and then "just plugs in $t = A$"! See the link for a few different ways to justify this; one has to say some things about polynomials with matrix coefficients.

I really think this proof deserves to be better known; it is both more elementary than other proofs (you don't have to know what an eigenvalue or eigenvector is, you don't have to pass to an algebraic extension of the ground field, you don't have to use a density argument to reduce to the diagonalizable case) and proves a stronger result (this works over any commutative ring). It really clarifies that Cayley-Hamilton is "pure algebra" in some sense.
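
The adjugate identity itself is easy to check symbolically; a minimal sympy sketch with an arbitrary $2 \times 2$ example:

```python
import sympy as sp

t = sp.Symbol('t')
A = sp.Matrix([[1, 2], [3, 4]])
C = t * sp.eye(2) - A

# (tI - A) adj(tI - A) = det(tI - A) I as matrices of polynomials in t.
lhs = C * C.adjugate()
rhs = C.det() * sp.eye(2)
print((lhs - rhs).applyfunc(sp.expand))   # the zero matrix
```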

Qiaochu Yuan
  • 468,795
  • This is the proof I was shown first year in my undergraduate studies many moons ago. – Ted Shifrin Jul 28 '24 at 20:16
  • @Ted: wow, really? I don't remember if I saw a proof of CH as an undergraduate at all. The first proof I felt like I understood was the Zariski density argument. – Qiaochu Yuan Jul 28 '24 at 20:34
  • Yeah, I saw that in a linear algebra and differential equations course I took first year at MIT from a sublime applied mathematician, Louis Howard. I saw the continuity proof in Mike Artin's algebra course the next year. – Ted Shifrin Jul 28 '24 at 21:00
  • @Ted: as it happens, I also took algebra from Artin! But I skipped the first semester and went straight into the second semester so I missed some things. – Qiaochu Yuan Jul 28 '24 at 21:18
3

I think I can add a peculiar proof for complex-valued matrices, based on purely analytic considerations.

Let $A \in \mathbb C^{n\times n}$ and let $R(z) = (zI - A)^{-1}$ for $z \in \mathbb C$ such that $z$ is not an eigenvalue of $A$. Clearly, for large $|z|$ we have the expansion $R(z) = \sum_{m = 1}^\infty z^{-m} A^{m-1}$. Thus, for a contour $\gamma$ that encloses all eigenvalues of $A$ we obtain $\int_\gamma z^kR(z)\,dz = 2\pi iA^k$ for $k = 0,1,\dots$. As a consequence we get $\int_\gamma p(z)R(z)\,dz = 2\pi i\, p(A)$ for all polynomials $p \in \mathbb C[z]$.

Now the proof of the Cayley-Hamilton theorem finishes as follows. Let $f(z) = \det(zI - A)$ be the characteristic polynomial of $A$ and observe that $R(z)f(z)$ is an entire matrix-valued function (indeed, $R(z)f(z) = \operatorname{adj}(zI - A)$, which is polynomial in $z$). Thus, $2\pi i f(A) = \int_\gamma f(z)R(z)\,dz = 0$ by Cauchy's theorem.
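
A quick numerical check of this contour identity (discretizing a circle that encloses the spectrum; the result should vanish up to roundoff):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))

r = np.abs(np.linalg.eigvals(A)).max() + 1.0   # circle enclosing all eigenvalues
f = np.poly(A)                                 # coefficients of det(zI - A)

# (1 / 2 pi i) * contour integral of f(z) (zI - A)^{-1} dz over |z| = r,
# using dz = i z d(theta) and the trapezoidal rule on a uniform grid.
m = 2000
z = r * np.exp(1j * np.linspace(0.0, 2 * np.pi, m, endpoint=False))
total = sum(np.polyval(f, zk) * np.linalg.inv(zk * np.eye(n) - A) * zk
            for zk in z) / m
print(np.max(np.abs(total)))                   # ~0, i.e. f(A) = 0
```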

Matsmir
  • 4,159
1

[This is just a version of Sh.R’s answer above]

Consider ${ A \in \mathbb{C} ^{n \times n } }.$ By upper triangularization, there exists a basis ${ [P _1, \ldots, P _n] }$ such that $${ A [P _1, \ldots, P _n] = [P _1, \ldots, P _n] U }$$ and $${ U = \begin{pmatrix} \lambda _1 &* &* &\ldots &* \\ &\lambda _2 &* &\ldots &* \\ & &\ddots & &\vdots \\ & & &\lambda _{n-1} &* \\ & & & & \lambda _n \end{pmatrix} }.$$

Writing ${ P = [P _1, \ldots, P _n ] },$ since $${ \det(tI - A) = \det(P ^{-1} (tI - A) P) = \det(tI - P ^{-1} A P) }$$ we have the polynomial $${ f(t) = \det(tI - A) = \det(tI - U) = (t - \lambda _1) \cdots (t - \lambda _n) }.$$

We are interested in the matrix $${ f(A) = P f(P ^{-1} A P) P ^{-1} = P f(U) P ^{-1} . }$$

We can show matrix ${ f(U) = 0 }$ (and hence ${ f(A) = 0 }$).

Thm: Let ${ U }$ be an upper triangular matrix $${ U = \begin{pmatrix} \lambda _1 &* &* &\ldots &* \\ &\lambda _2 &* &\ldots &* \\ & &\ddots & &\vdots \\ & & &\lambda _{n-1} &* \\ & & & & \lambda _n \end{pmatrix} }.$$ Then ${ f(U) = (U - \lambda _1 I) \ldots (U - \lambda _n I) = 0 }.$

Pf: We have ${ U = \begin{pmatrix} \lambda _1 & * \\ 0 & \tilde{U} \end{pmatrix} }$ with submatrix ${ \tilde{U} = \begin{pmatrix} \lambda _2 &* &* &\ldots &* \\ &\ddots & & &\vdots \\ & & &\lambda _{n-1} &* \\ & & & & \lambda _n \end{pmatrix} }.$
By the induction hypothesis, $${ (\tilde{U} - \lambda _2 I) \cdots (\tilde{U} - \lambda _n I) = 0 }.$$ Hence $${ (U - \lambda _2 I) \cdots (U - \lambda _n I) = \begin{pmatrix} * & * \\ 0 & \tilde{U} - \lambda _2 I \end{pmatrix} \cdots \begin{pmatrix} * & * \\ 0 & \tilde{U} - \lambda _n I \end{pmatrix} = \begin{pmatrix} * & * \\ 0 & (\tilde{U} - \lambda _2 I) \cdots (\tilde{U} - \lambda _n I) \end{pmatrix} = \begin{pmatrix} * & * \\ 0 & 0 \end{pmatrix}. }$$ Hence $${ (U - \lambda _1 I) (U - \lambda _2 I) \cdots (U - \lambda _n I) = \begin{pmatrix} 0 & * \\ 0 & \tilde{U} - \lambda _1 I \end{pmatrix} \begin{pmatrix} * & * \\ 0 & 0 \end{pmatrix} = 0 , }$$ as needed.
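
The theorem above can also be checked symbolically in exact arithmetic; a minimal sympy sketch for $n = 3$ with generic entries:

```python
import sympy as sp

# Generic 3 x 3 upper triangular matrix with symbolic entries.
l1, l2, l3, a, b, c = sp.symbols('l1 l2 l3 a b c')
U = sp.Matrix([[l1, a, b], [0, l2, c], [0, 0, l3]])

S = (U - l1 * sp.eye(3)) * (U - l2 * sp.eye(3)) * (U - l3 * sp.eye(3))
print(S.applyfunc(sp.expand))   # the zero matrix, identically in all six symbols
```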