Lecture notes on the Cayley-Hamilton theorem
1 Overview
You should all know about the characteristic polynomial of a square matrix $A$. To calculate the characteristic polynomial of $A$, one subtracts a variable, say $\lambda$, from the diagonal entries of $A$, and then takes the determinant of the result. In other words, letting $p_A(\lambda)$ denote the characteristic polynomial of $A$, one has
$$p_A(\lambda) = \det(A - \lambda I),$$
where, as usual, $I$ denotes the identity matrix. For example, set
Evaluating the determinant of $A - \lambda I$, one gets
Now the interesting thing about square matrices is that one can do algebra with them. So if $A$ is an $n\times n$ matrix then $A^2$, $A^3$, indeed every power of $A$, will also be an $n\times n$ matrix. Indeed, one can take any polynomial $p(\lambda)$ and happily plug $A$ into it; the result, $p(A)$, will be some other $n\times n$ matrix. The obvious question now is: what will happen when one plugs a square matrix $A$ into its own characteristic polynomial? Let's see what happens for the sample matrix $A$ above. Straightforward calculations show that
Next, adding the various powers of $A$ weighted by the coefficients of the characteristic polynomial (note that one uses the identity matrix in place of the constant term), one gets
Zero! One gets zero. This seemingly miraculous answer is not a coincidence. Indeed, one gets zero regardless of what matrix one starts with. I encourage you to try this with a few examples of your own.
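This is easy to check numerically. Below is a quick sketch using numpy; the $3\times 3$ matrix is an arbitrary choice of our own, not necessarily the sample matrix above.

```python
import numpy as np

# Numerical sketch of the Cayley-Hamilton theorem. The matrix A is
# an arbitrary 3x3 example (our own choice).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])
n = A.shape[0]

# np.poly(A) returns the coefficients of det(t*I - A), highest
# degree first; this differs from det(A - t*I) by at most a sign,
# which does not affect whether the result vanishes at A.
coeffs = np.poly(A)

# Plug the matrix A into its own characteristic polynomial,
# using the identity matrix A^0 = I for the constant term.
p_of_A = sum(c * np.linalg.matrix_power(A, n - k)
             for k, c in enumerate(coeffs))

print(np.allclose(p_of_A, np.zeros((n, n))))  # True
```

Up to floating-point round-off, the sum of the weighted powers is the zero matrix, exactly as the hand calculation above suggests.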
Theorem 1 (Cayley-Hamilton).
Let $A$ be an $n\times n$ matrix, and let $p_A(\lambda)$ be the corresponding characteristic polynomial. Then $p_A(A) = 0$.
The goal of these notes is to explain and prove the above theorem. There are various hidden reasons that make the Cayley-Hamilton theorem work. It is the purpose of these notes to bring these reasons into the open.
2 The Gist of the Matter.
Indeed, there are two factors that make the Cayley-Hamilton theorem such a striking and interesting result. Recall that if $V$ is an $m$-dimensional vector space, and $\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_m$ are any $m+1$ vectors in $V$, then there will be some kind of a linear relation between the $\mathbf{v}_i$'s, i.e. for some choice of scalars $c_0, c_1, \ldots, c_m$, not all zero, one will have
$$c_0\mathbf{v}_0 + c_1\mathbf{v}_1 + \cdots + c_m\mathbf{v}_m = 0.$$
Now the space of $n\times n$ matrices is $n^2$-dimensional. Therefore for every $n\times n$ matrix $A$ there must be a linear relationship between the $n^2+1$ different matrix powers
$$A^0, A^1, A^2, \ldots, A^{n^2}.$$
The "miracle" of the Cayley-Hamilton theorem is twofold. First, a linear relation arises already for the powers $A^0, A^1, \ldots, A^n$. Second, the coefficients of this linear relation are precisely the coefficients of the characteristic polynomial of $A$.
Let's put it another way. Look at the first column vectors of the matrices $A^0$, $A^1$, $A^2$, $A^3$, i.e. the vectors
Now $\mathbb{R}^3$ is a $3$-dimensional vector space, and so there should be a linear relation between the above four vectors. Indeed there is: multiplying the first vector by the first coefficient of the relation, the second vector by the second, the third by the third, and the fourth by the fourth, the sum is equal to zero (try it yourself!). What about the second column vectors of $A^0$, $A^1$, $A^2$, $A^3$? Now the vectors in question are
Again, we have here four vectors from a $3$-dimensional vector space, and therefore there should be a linear relation between the vectors. However, by some miracle, the coefficients of the linear relation for the second column vectors are the same as the coefficients of the linear relation between the first column vectors. Furthermore, these coefficients are precisely the coefficients of the characteristic polynomial. Needless to say, the third column vectors are joined in a linear relation with the same coefficients. Why is this happening?
3 The Cyclic Basis
Let's look again at the first column vectors of the matrices $A^0$, $A^1$, $A^2$ (recall that $A^0$ is just the identity matrix):
and let's take these vectors as a new basis, $B = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$. A basis obtained in this fashion, i.e. by starting with a vector and successively applying a matrix to it, is called a cyclic basis. What will be the representation of the matrix $A$ relative to this cyclic basis? Now $\mathbf{b}_1$ is just the first elementary vector, $\mathbf{e}_1$. Furthermore, note that $A\mathbf{b}_1$ is nothing but $\mathbf{b}_2$, and that $A\mathbf{b}_2 = \mathbf{b}_3$. Now $A\mathbf{b}_3$ is the first column vector of $A^3$, and we already determined the linear relation between the first column vectors of $A^0$, $A^1$, $A^2$, $A^3$. The bottom line is that
and consequently $A$ will have the following appearance relative to the basis $B$:
The transition matrix $P$ from $B$ to the standard basis is given by
Of course $P$ is relevant to our discussion precisely because $M = P^{-1}AP$.
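The cyclic-basis construction can be sketched numerically. The sample matrix $A$ below is our own choice, assumed to be one for which $\mathbf{e}_1$, $A\mathbf{e}_1$, $A^2\mathbf{e}_1$ are linearly independent:

```python
import numpy as np

# Sketch: build a cyclic basis from e1 and represent A in it.
# The matrix A is our own example, chosen so that e1, A e1, A^2 e1
# are linearly independent.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])
e1 = np.array([1.0, 0.0, 0.0])

# Columns of P are the cyclic basis vectors e1, A e1, A^2 e1,
# i.e. P is the transition matrix from the cyclic basis to the
# standard basis.
P = np.column_stack([e1, A @ e1, A @ A @ e1])

# Representation of A relative to the cyclic basis.
M = np.linalg.inv(P) @ A @ P

# The first two columns of M are the elementary vectors e2 and e3:
# A maps each cyclic basis vector to the next one. The last column
# holds the coefficients of the linear relation.
print(np.round(M, 10))
```

Only the last column of $M$ carries any real information; the rest of the matrix simply shifts each basis vector to the next.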
Proposition 1.
Let $A$ be an $n\times n$ matrix, $P$ a non-singular $n\times n$ matrix, and set $M = P^{-1}AP$. The matrices $A$ and $M$ have the same characteristic polynomial.
Proof.
The characteristic polynomial of $M$ is given as
$$p_M(\lambda) = \det(M - \lambda I) = \det(P^{-1}AP - \lambda I) = \det\bigl(P^{-1}(A - \lambda I)P\bigr).$$
Recall that the determinant of a product is the product of the determinants, and that the determinant of an inverse is the inverse of the determinant. Therefore
$$p_M(\lambda) = \det(P)^{-1}\det(A - \lambda I)\det(P) = \det(A - \lambda I) = p_A(\lambda).$$
∎
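Proposition 1 is easy to spot-check numerically with a random matrix and a random (generically non-singular) conjugating matrix:

```python
import numpy as np

# Numerical check of Proposition 1: A and P^{-1} A P have the same
# characteristic polynomial. Both A and P are arbitrary random
# choices; a random P is non-singular with probability one.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))

M = np.linalg.inv(P) @ A @ P

# np.poly returns the characteristic polynomial's coefficients,
# highest degree first.
print(np.allclose(np.poly(A), np.poly(M)))  # True
```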
In other words, according to the above proposition we should expect the characteristic polynomial of $M$ to be equal to the characteristic polynomial of $A$. Let's check this using a co-factor expansion.
Also note that the last column of $M$ contains all but one of the coefficients of the characteristic polynomial. This too is not a coincidence.
Proposition 2.
Consider an $n\times n$ matrix $M$ such that for $i = 1, \ldots, n-1$ the $i$-th column vector of $M$ is the basic vector $\mathbf{e}_{i+1}$, while the last column of $M$ is the vector with entries $c_1, c_2, \ldots, c_n$. In other words, $M$ has the following form:
$$M = \begin{pmatrix}
0 & 0 & \cdots & 0 & c_1\\
1 & 0 & \cdots & 0 & c_2\\
0 & 1 & \cdots & 0 & c_3\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & c_n
\end{pmatrix}.$$
Then the characteristic polynomial of $M$ is given by
$$p_M(\lambda) = (-1)^n\bigl(\lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1\bigr).$$
Proof.
We will calculate the determinant of $M - \lambda I$ by doing a co-factor expansion along the first row. Let $M'$ be the matrix obtained by deleting the first row and the first column from $M - \lambda I$, and let $M''$ be the matrix obtained by deleting the first row and the last column from $M - \lambda I$. Doing a co-factor expansion along the top row, it is easy to see that
$$\det(M - \lambda I) = -\lambda\det(M') + (-1)^{n+1}c_1\det(M'').$$
Now $M''$ is an upper triangular matrix with ones on the diagonal, and therefore $\det(M'') = 1$. The matrix $M'$, on the other hand, has the same structure as $M - \lambda I$, only it is one size smaller. To that end, let $M'''$ be the matrix obtained by deleting the first two rows and columns from $M - \lambda I$. By the same reasoning as above it is easy to see that
$$\det(M') = -\lambda\det(M''') + (-1)^{n}c_2,$$
and therefore
$$\det(M - \lambda I) = \lambda^2\det(M''') + (-1)^{n+1}(c_2\lambda + c_1).$$
Continuing inductively we see that for even $n$, the determinant of $M - \lambda I$ will have the form
$$\det(M - \lambda I) = \lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1.$$
For odd $n$, $\det(M - \lambda I)$ will be just like the formula above, but multiplied through by a negative sign.∎
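Here is a small numerical spot-check of Proposition 2, with arbitrary coefficients $c_1, \ldots, c_4$ of our own choosing:

```python
import numpy as np

# Check Proposition 2 numerically: for the companion-type matrix M
# with ones below the diagonal and last column (c1, ..., cn), the
# characteristic polynomial det(t*I - M) has the coefficients
# 1, -c_n, ..., -c_2, -c_1 (highest degree first).
c = np.array([5.0, -2.0, 3.0, 1.0])   # c1..c4, arbitrary choices
n = len(c)

M = np.zeros((n, n))
M[1:, :-1] = np.eye(n - 1)            # ones below the diagonal
M[:, -1] = c                          # last column carries c1..cn

expected = np.concatenate(([1.0], -c[::-1]))
print(np.allclose(np.poly(M), expected))  # True
```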
4 Putting it all together
Thanks to Propositions 1 and 2 we are now in a position to understand and to prove the Cayley-Hamilton Theorem. Let $A$ be an $n\times n$ matrix. Start by setting $\mathbf{v}_0 = \mathbf{e}_1$, and then create a sequence
$$\mathbf{v}_0, \mathbf{v}_1, \mathbf{v}_2, \ldots$$
of vectors by successively applying $A$, i.e. $\mathbf{v}_1 = A\mathbf{v}_0$, $\mathbf{v}_2 = A\mathbf{v}_1$, etc. Notice that $\mathbf{v}_k = A^k\mathbf{e}_1$; in other words, $\mathbf{v}_k$ is the first column of the matrix $A^k$.
Next, suppose that the vectors $\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_{n-1}$ form a basis, $B$, of $\mathbb{R}^n$. (There are matrices for which this doesn't happen, but we'll consider that possibility later.) There will therefore exist scalars $c_1, c_2, \ldots, c_n$ such that
$$\mathbf{v}_n = c_1\mathbf{v}_0 + c_2\mathbf{v}_1 + \cdots + c_n\mathbf{v}_{n-1}.$$
Now the representation $M$ of $A$ relative to the cyclic basis $B$ will have the form
$$M = \begin{pmatrix}
0 & 0 & \cdots & 0 & c_1\\
1 & 0 & \cdots & 0 & c_2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & c_n
\end{pmatrix}.$$
By Proposition 1, the characteristic polynomial of $A$ is equal to the characteristic polynomial of $M$. Furthermore, by Proposition 2, the characteristic polynomial of $M$ is equal to
$$(-1)^n\bigl(\lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1\bigr).$$
Only one conclusion is possible: the scalars $c_1, \ldots, c_n$ must be precisely the coefficients of the characteristic polynomial of $A$. Let us summarize these findings.
Proposition 3.
Let $A$ be an $n\times n$ matrix, with characteristic polynomial
$$p_A(\lambda) = (-1)^n\bigl(\lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1\bigr).$$
Fix a number $j$ between $1$ and $n$, and let $\mathbf{u}_k$ be the $j$-th column of the matrix $A^k$. If the vectors $\mathbf{u}_0, \mathbf{u}_1, \ldots, \mathbf{u}_{n-1}$ form a basis of $\mathbb{R}^n$, then the vectors satisfy the linear relation:
$$\mathbf{u}_n = c_1\mathbf{u}_0 + c_2\mathbf{u}_1 + \cdots + c_n\mathbf{u}_{n-1}.$$
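Proposition 3 can be spot-checked numerically; the matrix below is an arbitrary choice of ours for which the first columns of $A^0$, $A^1$, $A^2$ happen to form a basis:

```python
import numpy as np

# Check of Proposition 3 for j = 1: the first columns of
# A^0, ..., A^n obey a linear relation whose coefficients come
# from the characteristic polynomial of A.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])
n = A.shape[0]

# u[k] is the first column of A^k.
u = [np.linalg.matrix_power(A, k)[:, 0] for k in range(n + 1)]

# np.poly(A) gives the coefficients [1, -c_n, ..., -c_1] in the
# notation of the text, so the claimed relation
#   u_n = c_1 u_0 + ... + c_n u_{n-1}
# is the same as saying the sum below is the zero vector.
relation = sum(coef * u[n - k] for k, coef in enumerate(np.poly(A)))
print(np.allclose(relation, np.zeros(n)))  # True
```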
5 A Complication
We are almost done with the proof of the Cayley-Hamilton Theorem. First, however, we must deal with the possibility that the square matrix $A$ is such that the column vectors of $A^0, A^1, \ldots, A^{n-1}$ do not form a basis. Consider, for example,
An easy calculation shows that the characteristic polynomial is given by
Writing down the sequence of powers of $A$:
we notice that the first columns do, in fact, obey a linear relation with the coefficients of the characteristic polynomial:
(1)
However, these first column vectors do not form a basis of $\mathbb{R}^3$, and therefore Proposition 3 is not enough to explain why these vectors obey the above linear relation.
In order to find an explanation, let us proceed as follows. Just as before, start by setting $\mathbf{b}_1 = \mathbf{e}_1$, and $\mathbf{b}_2 = A\mathbf{b}_1$. If we take $\mathbf{b}_3 = A\mathbf{b}_2$, then $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$ will not form a basis, so instead let us choose a $\mathbf{b}_3$ that is linearly independent from $\mathbf{b}_1$ and $\mathbf{b}_2$, thereby ensuring that $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$ is a basis. There are many, many possible such choices for $\mathbf{b}_3$. To keep the discussion concrete, let us fix one. Note that
Therefore, representing $A$ relative to the basis $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$, we obtain
By Proposition 1, we know that the characteristic polynomial of $M$ is equal to the characteristic polynomial of $A$. However, we know much more.
Proposition 4.
Let $M$ be an $n\times n$ matrix of the form
$$M = \begin{pmatrix} X & Y \\ 0 & Z \end{pmatrix},$$
where $X$ is a $k\times k$ matrix, $Z$ is an $(n-k)\times(n-k)$ matrix, and $Y$ is a $k\times(n-k)$ matrix. Then the characteristic polynomial of $M$ is the product of the characteristic polynomials of $X$ and $Z$, i.e. $p_M(\lambda) = p_X(\lambda)\,p_Z(\lambda)$.
Proof.
Note that
$$M - \lambda I = \begin{pmatrix} X - \lambda I_k & Y \\ 0 & Z - \lambda I_{n-k} \end{pmatrix},$$
where $I_k$ is the $k\times k$ identity matrix, and $I_{n-k}$ is the $(n-k)\times(n-k)$ identity matrix. The Proposition now follows from the fact that the determinant of a block upper triangular matrix, such as $M - \lambda I$, is the determinant of the upper-left block times the determinant of the lower-right block.∎
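A quick numerical sanity check of Proposition 4, with randomly chosen blocks:

```python
import numpy as np

# Check Proposition 4: for a block upper-triangular matrix
# M = [[X, Y], [0, Z]], the characteristic polynomial of M is the
# product of those of X and Z. All blocks are arbitrary choices.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 2))
Y = rng.standard_normal((2, 3))
Z = rng.standard_normal((3, 3))

M = np.block([[X, Y], [np.zeros((3, 2)), Z]])

p_M = np.poly(M)                         # degree-5 coefficients
p_XZ = np.polymul(np.poly(X), np.poly(Z))
print(np.allclose(p_M, p_XZ))            # True
```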
Thanks to Proposition 4 we know that the characteristic polynomial of $M$ is a product of the characteristic polynomial of the upper-left $2\times 2$ block $X$
and the characteristic polynomial of the $1\times 1$ block $Z$. In other words,
Furthermore, by Proposition 2 we know that the first column vectors of $A^0$, $A^1$, $A^2$ obey a linear relation with the coefficients of the polynomial $p_X(\lambda)$:
(2)
Multiplying this relation through by $A$, we deduce that the first column vectors of $A^1$, $A^2$, $A^3$ obey the same linear relation:
(3)
Next, think about what it means to multiply a polynomial such as $p_X(\lambda)$ by another polynomial such as $p_Z(\lambda)$. Indeed, one can structure the multiplication by multiplying the first polynomial through by $\lambda$, then multiplying it through by a constant, and then adding the two terms:
The bottom line is, of course, just the characteristic polynomial of $A$, and the whole idea behind the above calculation is that $p_A(\lambda)$ can be "formed out of" the polynomial $p_X(\lambda)$. This shows that we can combine relations (2) and (3) and produce in the end the desired relation (1). All we have to do is take relation (3), and add to it a suitable multiple of relation (2). This explains why the first column vectors of $A^0, A^1, A^2, A^3$ obey a linear relation whose coefficients come from the characteristic polynomial of $A$.
Proposition 5.
Let $A$ be an $n\times n$ matrix, with characteristic polynomial
$$p_A(\lambda) = (-1)^n\bigl(\lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1\bigr).$$
Fix a number $j$ between $1$ and $n$, and let $\mathbf{u}_k$ be the $j$-th column of the matrix $A^k$. The vectors satisfy the linear relation
$$\mathbf{u}_n = c_1\mathbf{u}_0 + c_2\mathbf{u}_1 + \cdots + c_n\mathbf{u}_{n-1},$$
even if the vectors $\mathbf{u}_0, \mathbf{u}_1, \ldots, \mathbf{u}_{n-1}$ do not form a basis of $\mathbb{R}^n$.
Proof.
Suppose that there is a number $k$ such that $\mathbf{u}_k$ can be given as a linear combination of $\mathbf{u}_0, \ldots, \mathbf{u}_{k-1}$; let's say
$$\mathbf{u}_k = a_1\mathbf{u}_0 + a_2\mathbf{u}_1 + \cdots + a_k\mathbf{u}_{k-1}.$$
Choose vectors $\mathbf{w}_{k+1}, \ldots, \mathbf{w}_n$ so that the list
$$\mathbf{u}_0, \ldots, \mathbf{u}_{k-1}, \mathbf{w}_{k+1}, \ldots, \mathbf{w}_n$$
forms a basis of $\mathbb{R}^n$. Relative to this basis, $A$ will have the form
$$M = \begin{pmatrix} X & Y \\ 0 & Z \end{pmatrix},$$
where the upper-left block $X$ is the $k\times k$ matrix
$$X = \begin{pmatrix}
0 & 0 & \cdots & 0 & a_1\\
1 & 0 & \cdots & 0 & a_2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & a_k
\end{pmatrix}.$$
By Proposition 2,
$$p_X(\lambda) = (-1)^k\bigl(\lambda^k - a_k\lambda^{k-1} - \cdots - a_2\lambda - a_1\bigr).$$
Let $q(\lambda)$ denote the characteristic polynomial of $Z$. By Proposition 4, the characteristic polynomial of $M$ (which is equal to the characteristic polynomial of $A$) is the product $p_X(\lambda)\,q(\lambda)$, and therefore $p_A(\lambda)$ can be obtained by taking linear combinations of $p_X(\lambda)$ times various powers of $\lambda$:
Corresponding to the above polynomials are the relations
Adding these relations in the same way as the polynomials yields the desired relation:
∎
Now we really are finished. Thanks to Propositions 3 and 5 we know that for every $j$, the $j$-th column vectors of the matrices
$$A^0, A^1, A^2, \ldots, A^n$$
obey a linear relation with the coefficients of the characteristic polynomial of $A$. Since this is true for every column of the above matrices, it is true for the full matrices as well, and that is precisely the conclusion of the Cayley-Hamilton theorem: $p_A(A) = 0$.
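As a final sanity check, the theorem can be verified numerically even for a matrix whose cyclic vectors fail to form a basis. The matrix below is our own degenerate example: $A\mathbf{e}_1$ is a multiple of $\mathbf{e}_1$, so Proposition 3 alone would not apply, yet $p_A(A) = 0$ all the same.

```python
import numpy as np

# A degenerate example: A e1 = 2 e1, so the cyclic vectors
# e1, A e1, A^2 e1 do NOT span R^3. Cayley-Hamilton still holds.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
n = A.shape[0]

# Coefficients of the characteristic polynomial, highest degree
# first, then evaluate the polynomial at the matrix A.
coeffs = np.poly(A)
p_of_A = sum(c * np.linalg.matrix_power(A, n - k)
             for k, c in enumerate(coeffs))

print(np.allclose(p_of_A, np.zeros((n, n))))  # True
```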