Lecture notes on the Cayley-Hamilton theorem
1 Overview
You should all know about the characteristic polynomial of a square matrix $A$. To calculate the characteristic polynomial of $A$, one subtracts a variable, say $\lambda$, from the diagonal entries of $A$, and then takes the determinant of the result. In other words, letting $p_A(\lambda)$ denote the characteristic polynomial of $A$, one has
$$p_A(\lambda) = \det(A - \lambda I),$$
where, as usual, $I$ denotes the identity matrix. For example, set
Evaluating the determinant of $A - \lambda I$, one gets
Now the interesting thing about square matrices is that one can do algebra with them. So if $A$ is an $n\times n$ matrix then $A^2$, $A^3$, indeed every power of $A$, will also be an $n\times n$ matrix. Indeed, one can take any polynomial $p(\lambda)$ and happily plug $A$ into it; the result, $p(A)$, will be some other $n\times n$ matrix. The obvious question now is: what will happen when one plugs a square matrix $A$ into its own characteristic polynomial? Let's see what happens for the sample matrix $A$ above. Straightforward calculations show that
Next, adding the various powers of $A$ weighted by the coefficients of the characteristic polynomial (note that one uses the identity matrix in place of the constant term), one gets
Zero! One gets zero. This seemingly miraculous answer is not a coincidence. Indeed, one gets zero regardless of what matrix one starts with. I encourage you to try this with a few examples of your own.
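This is easy to check numerically. Below is a quick sketch using numpy; the $3\times 3$ matrix is an arbitrary choice of our own, not necessarily the sample matrix above.

```python
import numpy as np

# Numerical sketch of the Cayley-Hamilton theorem. The matrix A is
# an arbitrary 3x3 example (our own choice).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])
n = A.shape[0]

# np.poly(A) returns the coefficients of det(t*I - A), highest
# degree first; this differs from det(A - t*I) by at most a sign,
# which does not affect whether the result vanishes at A.
coeffs = np.poly(A)

# Plug the matrix A into its own characteristic polynomial,
# using the identity matrix A^0 = I for the constant term.
p_of_A = sum(c * np.linalg.matrix_power(A, n - k)
             for k, c in enumerate(coeffs))

print(np.allclose(p_of_A, np.zeros((n, n))))  # True
```

Up to floating-point round-off, the sum of the weighted powers is the zero matrix, exactly as the hand calculation above suggests.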
Theorem 1 (Cayley-Hamilton).
Let $A$ be an $n\times n$ matrix, and let $p_A(\lambda)$ be the corresponding characteristic polynomial. Then $p_A(A) = 0$.
The goal of these notes is to explain and prove the above theorem. There are various hidden reasons that make the Cayley-Hamilton theorem work. It is the purpose of these notes to bring these reasons into the open.
2 The Gist of the Matter.
Indeed, there are two factors that make the Cayley-Hamilton theorem such a striking and interesting result. Recall that if $V$ is an $m$-dimensional vector space, and $\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_m$ are any $m+1$ vectors in $V$, then there will be some kind of a linear relation between the $\mathbf{v}_i$'s, i.e. for some choice of scalars $c_0, c_1, \ldots, c_m$, not all zero, one will have
$$c_0\mathbf{v}_0 + c_1\mathbf{v}_1 + \cdots + c_m\mathbf{v}_m = 0.$$
Now the space of $n\times n$ matrices is $n^2$-dimensional. Therefore for every $n\times n$ matrix $A$ there must be a linear relationship between the $n^2+1$ different matrix powers
$$A^0, A^1, A^2, \ldots, A^{n^2}.$$
The "miracle" of the Cayley-Hamilton theorem is twofold. First, a linear relation arises already for the powers $A^0, A^1, \ldots, A^n$. Second, the coefficients of this linear relation are precisely the coefficients of the characteristic polynomial of $A$.
Let's put it another way. Look at the first column vectors of the matrices $A^0$, $A^1$, $A^2$, $A^3$, i.e. the vectors
Now $\mathbb{R}^3$ is a $3$-dimensional vector space, and so there should be a linear relation between the above four vectors. Indeed there is: multiplying the first vector by the first coefficient of the relation, the second vector by the second, the third by the third, and the fourth by the fourth, the sum is equal to zero (try it yourself!). What about the second column vectors of $A^0$, $A^1$, $A^2$, $A^3$? Now the vectors in question are
Again, we have here four vectors from a $3$-dimensional vector space, and therefore there should be a linear relation between the vectors. However, by some miracle, the coefficients of the linear relation for the second column vectors are the same as the coefficients of the linear relation between the first column vectors. Furthermore, these coefficients are precisely the coefficients of the characteristic polynomial. Needless to say, the third column vectors are joined in a linear relation with the same coefficients. Why is this happening?
3 The Cyclic Basis
Let's look again at the first column vectors of the matrices $A^0$, $A^1$, $A^2$ (recall that $A^0$ is just the identity matrix):
and let's take these vectors as a new basis, $B = \{\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3\}$. A basis obtained in this fashion, i.e. by starting with a vector and successively applying a matrix to it, is called a cyclic basis. What will be the representation of the matrix $A$ relative to this cyclic basis? Now $\mathbf{b}_1$ is just the first elementary vector, $\mathbf{e}_1$. Furthermore, note that $A\mathbf{b}_1$ is nothing but $\mathbf{b}_2$, and that $A\mathbf{b}_2 = \mathbf{b}_3$. Now $A\mathbf{b}_3$ is the first column vector of $A^3$, and we already determined the linear relation between the first column vectors of $A^0$, $A^1$, $A^2$, $A^3$. The bottom line is that
and consequently $A$ will have the following appearance relative to the basis $B$:
The transition matrix $P$ from $B$ to the standard basis is given by
Of course $P$ is relevant to our discussion precisely because $M = P^{-1}AP$.
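The cyclic-basis construction can be sketched numerically. The sample matrix $A$ below is our own choice, assumed to be one for which $\mathbf{e}_1$, $A\mathbf{e}_1$, $A^2\mathbf{e}_1$ are linearly independent:

```python
import numpy as np

# Sketch: build a cyclic basis from e1 and represent A in it.
# The matrix A is our own example, chosen so that e1, A e1, A^2 e1
# are linearly independent.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])
e1 = np.array([1.0, 0.0, 0.0])

# Columns of P are the cyclic basis vectors e1, A e1, A^2 e1,
# i.e. P is the transition matrix from the cyclic basis to the
# standard basis.
P = np.column_stack([e1, A @ e1, A @ A @ e1])

# Representation of A relative to the cyclic basis.
M = np.linalg.inv(P) @ A @ P

# The first two columns of M are the elementary vectors e2 and e3:
# A maps each cyclic basis vector to the next one. The last column
# holds the coefficients of the linear relation.
print(np.round(M, 10))
```

Only the last column of $M$ carries any real information; the rest of the matrix simply shifts each basis vector to the next.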
Proposition 1.
Let $A$ be an $n\times n$ matrix, $P$ a non-singular $n\times n$ matrix, and set $M = P^{-1}AP$. The matrices $A$ and $M$ have the same characteristic polynomial.
Proof.
The characteristic polynomial of $M$ is given as
$$p_M(\lambda) = \det(M - \lambda I) = \det(P^{-1}AP - \lambda I) = \det\bigl(P^{-1}(A - \lambda I)P\bigr).$$
Recall that the determinant of a product is the product of the determinants, and that the determinant of an inverse is the inverse of the determinant. Therefore
$$p_M(\lambda) = \det(P)^{-1}\det(A - \lambda I)\det(P) = \det(A - \lambda I) = p_A(\lambda).$$
∎
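Proposition 1 is easy to spot-check numerically with a random matrix and a random (generically non-singular) conjugating matrix:

```python
import numpy as np

# Numerical check of Proposition 1: A and P^{-1} A P have the same
# characteristic polynomial. Both A and P are arbitrary random
# choices; a random P is non-singular with probability one.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
P = rng.standard_normal((4, 4))

M = np.linalg.inv(P) @ A @ P

# np.poly returns the characteristic polynomial's coefficients,
# highest degree first.
print(np.allclose(np.poly(A), np.poly(M)))  # True
```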
In other words, according to the above proposition we should expect the characteristic polynomial of $M$ to be equal to the characteristic polynomial of $A$. Let's check this using a co-factor expansion.
Also note that the last column of $M$ contains all but one of the coefficients of the characteristic polynomial. This too is not a coincidence.
Proposition 2.
Consider an $n\times n$ matrix $M$ such that for $i = 1, \ldots, n-1$ the $i$-th column vector of $M$ is the basic vector $\mathbf{e}_{i+1}$, while the last column of $M$ is the vector with entries $c_1, c_2, \ldots, c_n$. In other words, $M$ has the following form:
$$M = \begin{pmatrix}
0 & 0 & \cdots & 0 & c_1\\
1 & 0 & \cdots & 0 & c_2\\
0 & 1 & \cdots & 0 & c_3\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & c_n
\end{pmatrix}.$$
Then the characteristic polynomial of $M$ is given by
$$p_M(\lambda) = (-1)^n\bigl(\lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1\bigr).$$
Proof.
We will calculate the determinant of $M - \lambda I$ by doing a co-factor expansion along the first row. Let $M'$ be the matrix obtained by deleting the first row and the first column from $M - \lambda I$, and let $M''$ be the matrix obtained by deleting the first row and the last column from $M - \lambda I$. Doing a co-factor expansion along the top row, it is easy to see that
$$\det(M - \lambda I) = -\lambda\det(M') + (-1)^{n+1}c_1\det(M'').$$
Now $M''$ is an upper triangular matrix with ones on the diagonal, and therefore $\det(M'') = 1$. The matrix $M'$, on the other hand, has the same structure as $M - \lambda I$, only it is one size smaller. To that end, let $M'''$ be the matrix obtained by deleting the first two rows and columns from $M - \lambda I$. By the same reasoning as above it is easy to see that
$$\det(M') = -\lambda\det(M''') + (-1)^{n}c_2,$$
and therefore
$$\det(M - \lambda I) = \lambda^2\det(M''') + (-1)^{n+1}(c_2\lambda + c_1).$$
Continuing inductively we see that for even $n$, the determinant of $M - \lambda I$ will have the form
$$\det(M - \lambda I) = \lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1.$$
For odd $n$, $\det(M - \lambda I)$ will be just like the formula above, but multiplied through by a negative sign.∎
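Here is a small numerical spot-check of Proposition 2, with arbitrary coefficients $c_1, \ldots, c_4$ of our own choosing:

```python
import numpy as np

# Check Proposition 2 numerically: for the companion-type matrix M
# with ones below the diagonal and last column (c1, ..., cn), the
# characteristic polynomial det(t*I - M) has the coefficients
# 1, -c_n, ..., -c_2, -c_1 (highest degree first).
c = np.array([5.0, -2.0, 3.0, 1.0])   # c1..c4, arbitrary choices
n = len(c)

M = np.zeros((n, n))
M[1:, :-1] = np.eye(n - 1)            # ones below the diagonal
M[:, -1] = c                          # last column carries c1..cn

expected = np.concatenate(([1.0], -c[::-1]))
print(np.allclose(np.poly(M), expected))  # True
```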
4 Putting it all together
Thanks to Propositions 1 and 2 we are now in a position to understand and to prove the Cayley-Hamilton Theorem. Let $A$ be an $n\times n$ matrix. Start by setting $\mathbf{v}_0 = \mathbf{e}_1$, and then create a sequence
$$\mathbf{v}_0, \mathbf{v}_1, \mathbf{v}_2, \ldots$$
of vectors by successively applying $A$, i.e. $\mathbf{v}_1 = A\mathbf{v}_0$, $\mathbf{v}_2 = A\mathbf{v}_1$, etc. Notice that $\mathbf{v}_k = A^k\mathbf{e}_1$; in other words, $\mathbf{v}_k$ is the first column of the matrix $A^k$.
Next, suppose that the vectors $\mathbf{v}_0, \mathbf{v}_1, \ldots, \mathbf{v}_{n-1}$ form a basis, $B$, of $\mathbb{R}^n$. (There are matrices for which this doesn't happen, but we'll consider that possibility later.) There will therefore exist scalars $c_1, c_2, \ldots, c_n$ such that
$$\mathbf{v}_n = c_1\mathbf{v}_0 + c_2\mathbf{v}_1 + \cdots + c_n\mathbf{v}_{n-1}.$$
Now the representation $M$ of $A$ relative to the cyclic basis $B$ will have the form
$$M = \begin{pmatrix}
0 & 0 & \cdots & 0 & c_1\\
1 & 0 & \cdots & 0 & c_2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & c_n
\end{pmatrix}.$$
By Proposition 1, the characteristic polynomial of $A$ is equal to the characteristic polynomial of $M$. Furthermore, by Proposition 2, the characteristic polynomial of $M$ is equal to
$$(-1)^n\bigl(\lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1\bigr).$$
Only one conclusion is possible: the scalars $c_1, \ldots, c_n$ must be precisely the coefficients of the characteristic polynomial of $A$. Let us summarize these findings.
Proposition 3.
Let $A$ be an $n\times n$ matrix, with characteristic polynomial
$$p_A(\lambda) = (-1)^n\bigl(\lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1\bigr).$$
Fix a number $j$ between $1$ and $n$, and let $\mathbf{u}_k$ be the $j$-th column of the matrix $A^k$. If the vectors $\mathbf{u}_0, \mathbf{u}_1, \ldots, \mathbf{u}_{n-1}$ form a basis of $\mathbb{R}^n$, then the vectors satisfy the linear relation:
$$\mathbf{u}_n = c_1\mathbf{u}_0 + c_2\mathbf{u}_1 + \cdots + c_n\mathbf{u}_{n-1}.$$
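Proposition 3 can be spot-checked numerically; the matrix below is an arbitrary choice of ours for which the first columns of $A^0$, $A^1$, $A^2$ happen to form a basis:

```python
import numpy as np

# Check of Proposition 3 for j = 1: the first columns of
# A^0, ..., A^n obey a linear relation whose coefficients come
# from the characteristic polynomial of A.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, -1.0],
              [3.0, 0.0, 1.0]])
n = A.shape[0]

# u[k] is the first column of A^k.
u = [np.linalg.matrix_power(A, k)[:, 0] for k in range(n + 1)]

# np.poly(A) gives the coefficients [1, -c_n, ..., -c_1] in the
# notation of the text, so the claimed relation
#   u_n = c_1 u_0 + ... + c_n u_{n-1}
# is the same as saying the sum below is the zero vector.
relation = sum(coef * u[n - k] for k, coef in enumerate(np.poly(A)))
print(np.allclose(relation, np.zeros(n)))  # True
```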
5 A Complication
We are almost done with the proof of the Cayley-Hamilton Theorem. First, however, we must deal with the possibility that the square matrix $A$ is such that the column vectors of $A^0, A^1, \ldots, A^{n-1}$ do not form a basis. Consider, for example,
An easy calculation shows that the characteristic polynomial is given by
Writing down the sequence of powers of $A$:
we notice that the first columns do, in fact, obey a linear relation with the coefficients of the characteristic polynomial:
(1)
However, these first column vectors do not form a basis of $\mathbb{R}^3$, and therefore Proposition 3 is not enough to explain why these vectors obey the above linear relation.
In order to find an explanation, let us proceed as follows. Just as before, start by setting $\mathbf{b}_1 = \mathbf{e}_1$, and $\mathbf{b}_2 = A\mathbf{b}_1$. If we take $\mathbf{b}_3 = A\mathbf{b}_2$, then $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$ will not form a basis, so instead let us choose a $\mathbf{b}_3$ that is linearly independent from $\mathbf{b}_1$ and $\mathbf{b}_2$, thereby ensuring that $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$ is a basis. There are many, many possible such choices for $\mathbf{b}_3$. To keep the discussion concrete, let us fix one. Note that
Therefore, representing $A$ relative to the basis $\mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3$, we obtain
By Proposition 1, we know that the characteristic polynomial of $M$ is equal to the characteristic polynomial of $A$. However, we know much more.
Proposition 4.
Let $M$ be an $n\times n$ matrix of the form
$$M = \begin{pmatrix} X & Y \\ 0 & Z \end{pmatrix},$$
where $X$ is a $k\times k$ matrix, $Z$ is an $(n-k)\times(n-k)$ matrix, and $Y$ is a $k\times(n-k)$ matrix. Then the characteristic polynomial of $M$ is the product of the characteristic polynomials of $X$ and $Z$, i.e. $p_M(\lambda) = p_X(\lambda)\,p_Z(\lambda)$.
Proof.
Note that
$$M - \lambda I = \begin{pmatrix} X - \lambda I_k & Y \\ 0 & Z - \lambda I_{n-k} \end{pmatrix},$$
where $I_k$ is the $k\times k$ identity matrix, and $I_{n-k}$ is the $(n-k)\times(n-k)$ identity matrix. The Proposition now follows from the fact that the determinant of a block upper triangular matrix, such as $M - \lambda I$, is the determinant of the upper-left block times the determinant of the lower-right block.∎
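A quick numerical sanity check of Proposition 4, with randomly chosen blocks:

```python
import numpy as np

# Check Proposition 4: for a block upper-triangular matrix
# M = [[X, Y], [0, Z]], the characteristic polynomial of M is the
# product of those of X and Z. All blocks are arbitrary choices.
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 2))
Y = rng.standard_normal((2, 3))
Z = rng.standard_normal((3, 3))

M = np.block([[X, Y], [np.zeros((3, 2)), Z]])

p_M = np.poly(M)                         # degree-5 coefficients
p_XZ = np.polymul(np.poly(X), np.poly(Z))
print(np.allclose(p_M, p_XZ))            # True
```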
Thanks to Proposition 4 we know that the characteristic polynomial of $M$ is a product of the characteristic polynomial of the upper-left $2\times 2$ block $X$
and the characteristic polynomial of the $1\times 1$ block $Z$. In other words,
Furthermore, by Proposition 2 we know that the first column vectors of $A^0$, $A^1$, $A^2$ obey a linear relation with the coefficients of the polynomial $p_X(\lambda)$:
(2)
Multiplying this relation through by $A$, we deduce that the first column vectors of $A^1$, $A^2$, $A^3$ obey the same linear relation:
(3)
Next, think about what it means to multiply a polynomial such as $p_X(\lambda)$ by another polynomial such as $p_Z(\lambda)$. Indeed, one can structure the multiplication by multiplying the first polynomial through by $\lambda$, then multiplying it through by a constant, and then adding the two terms:
The bottom line is, of course, just the characteristic polynomial of $A$, and the whole idea behind the above calculation is that $p_A(\lambda)$ can be "formed out of" the polynomial $p_X(\lambda)$. This shows that we can combine relations (2) and (3) and produce in the end the desired relation (1). All we have to do is take relation (3), and add to it a suitable multiple of relation (2). This explains why the first column vectors of $A^0, A^1, A^2, A^3$ obey a linear relation whose coefficients come from the characteristic polynomial of $A$.
Proposition 5.
Let $A$ be an $n\times n$ matrix, with characteristic polynomial
$$p_A(\lambda) = (-1)^n\bigl(\lambda^n - c_n\lambda^{n-1} - \cdots - c_2\lambda - c_1\bigr).$$
Fix a number $j$ between $1$ and $n$, and let $\mathbf{u}_k$ be the $j$-th column of the matrix $A^k$. The vectors satisfy the linear relation
$$\mathbf{u}_n = c_1\mathbf{u}_0 + c_2\mathbf{u}_1 + \cdots + c_n\mathbf{u}_{n-1},$$
even if the vectors $\mathbf{u}_0, \mathbf{u}_1, \ldots, \mathbf{u}_{n-1}$ do not form a basis of $\mathbb{R}^n$.
Proof.
Suppose that there is a number $k$ such that $\mathbf{u}_k$ can be given as a linear combination of $\mathbf{u}_0, \ldots, \mathbf{u}_{k-1}$; let's say
$$\mathbf{u}_k = a_1\mathbf{u}_0 + a_2\mathbf{u}_1 + \cdots + a_k\mathbf{u}_{k-1}.$$
Choose vectors $\mathbf{w}_{k+1}, \ldots, \mathbf{w}_n$ so that the list
$$\mathbf{u}_0, \ldots, \mathbf{u}_{k-1}, \mathbf{w}_{k+1}, \ldots, \mathbf{w}_n$$
forms a basis of $\mathbb{R}^n$. Relative to this basis, $A$ will have the form
$$M = \begin{pmatrix} X & Y \\ 0 & Z \end{pmatrix},$$
where the upper-left block $X$ is the $k\times k$ matrix
$$X = \begin{pmatrix}
0 & 0 & \cdots & 0 & a_1\\
1 & 0 & \cdots & 0 & a_2\\
\vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & \cdots & 1 & a_k
\end{pmatrix}.$$
By Proposition 2,
$$p_X(\lambda) = (-1)^k\bigl(\lambda^k - a_k\lambda^{k-1} - \cdots - a_2\lambda - a_1\bigr).$$
Let $q(\lambda)$ denote the characteristic polynomial of $Z$. By Proposition 4, the characteristic polynomial of $M$ (which is equal to the characteristic polynomial of $A$) is the product $p_X(\lambda)\,q(\lambda)$, and therefore $p_A(\lambda)$ can be obtained by taking linear combinations of $p_X(\lambda)$ times various powers of $\lambda$:
Corresponding to the above polynomials are the relations
Adding these relations in the same way as the polynomials yields the desired relation:
∎
Now we really are finished. Thanks to Propositions 3 and 5 we know that for every $j$, the $j$-th column vectors of the matrices
$$A^0, A^1, A^2, \ldots, A^n$$
obey a linear relation with the coefficients of the characteristic polynomial of $A$. Since this is true for every column of the above matrices, it is true for the full matrices as well, and that is precisely the conclusion of the Cayley-Hamilton theorem: $p_A(A) = 0$.
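As a final sanity check, the theorem can be verified numerically even for a matrix whose cyclic vectors fail to form a basis. The matrix below is our own degenerate example: $A\mathbf{e}_1$ is a multiple of $\mathbf{e}_1$, so Proposition 3 alone would not apply, yet $p_A(A) = 0$ all the same.

```python
import numpy as np

# A degenerate example: A e1 = 2 e1, so the cyclic vectors
# e1, A e1, A^2 e1 do NOT span R^3. Cayley-Hamilton still holds.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])
n = A.shape[0]

# Coefficients of the characteristic polynomial, highest degree
# first, then evaluate the polynomial at the matrix A.
coeffs = np.poly(A)
p_of_A = sum(c * np.linalg.matrix_power(A, n - k)
             for k, c in enumerate(coeffs))

print(np.allclose(p_of_A, np.zeros((n, n))))  # True
```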