simultaneous triangularisation of commuting matrices over any field

Let $\\mathbf{e}_{i}$ denote the (column) vector whose $i$th entry is $1$ and whose other entries are all $0$. Denote by $[n]$ the set $\\{1,\\ldots,n\\}$. Denote by $\\mathrm{M}_{n}(\\mathcal{K})$ the set of all $n\\times n$ matrices over $\\mathcal{K}$, and by $\\mathrm{GL}_{n}(\\mathcal{K})$ the set of all invertible elements of $\\mathrm{M}_{n}(\\mathcal{K})$. Let $d_{i}$ be the function which extracts the $i$th diagonal element of a matrix, i.e., $d_{i}(A)=\\mathbf{e}_{i}^{\\mathrm{T}}\\!A\\mathbf{e}_{i}$.

Theorem.

Let $\\mathcal{K}$ be a field, let $A_{1},\\ldots,A_{r}\\in\\mathrm{M}_{n}(\\mathcal{K})$ be pairwise commuting matrices, and let $\\mathcal{L}$ be a field extension of $\\mathcal{K}$ in which the characteristic polynomials of all $A_{k}$ split (http://planetmath.org/SplittingField). Then there exists some $P\\in\\mathrm{GL}_{n}(\\mathcal{L})$ such that

1.
$P^{-1}A_{k}P$ is upper triangular for all $k=1,\\ldots,r$, and
2.
if $i,j,l\\in[n]$ are such that $i\\leqslant l\\leqslant j$ and $d_{i}(P^{-1}A_{k}P)=d_{j}(P^{-1}A_{k}P)$ for all $k=1,\\ldots,r$ , then $d_{l}(P^{-1}A_{k}P)=d_{j}(P^{-1}A_{k}P)$ for all $k=1,\\ldots,r$ as well.
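
As a small illustration of what condition 2 adds (a hypothetical example, not taken from the source), take $r=1$, $n=3$ and the diagonal matrix $A_{1}$ below. With $P=I$ condition 1 holds but condition 2 fails, since $d_{1}(A_{1})=d_{3}(A_{1})=1$ while $d_{2}(A_{1})=2$. Choosing instead the permutation matrix $P$ that exchanges $\\mathbf{e}_{2}$ and $\\mathbf{e}_{3}$ gathers the equal diagonal entries into consecutive positions:

$$A_{1}=\\begin{pmatrix}1&0&0\\\\0&2&0\\\\0&0&1\\end{pmatrix},\\qquad P=\\begin{pmatrix}1&0&0\\\\0&0&1\\\\0&1&0\\end{pmatrix},\\qquad P^{-1}A_{1}P=\\begin{pmatrix}1&0&0\\\\0&1&0\\\\0&0&2\\end{pmatrix}.$$

In general, condition 2 says that the positions carrying the same tuple $(d_{i}(P^{-1}A_{1}P),\\ldots,d_{i}(P^{-1}A_{r}P))$ of diagonal entries form a contiguous block.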

The proof relies on two lemmas.

Lemma 1.

Let $\\mathcal{K}$ be a field, let $A_{1},\\ldots,A_{r}\\in\\mathrm{M}_{n}(\\mathcal{K})$ be pairwise commuting matrices, and let $\\mathcal{L}$ be a field extension of $\\mathcal{K}$ in which the characteristic polynomials of all $A_{k}$ split. Then there exists some nonzero $\\mathbf{u}\\in\\mathcal{L}^{n}$ which is an eigenvector of $A_{k}$ for all $k=1,\\ldots,r$.
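
Both hypotheses of Lemma 1 matter, as the following two standard examples suggest (they are illustrations added here, not part of the source). The matrices $A$ and $B$ below do not commute, and they have no common eigenvector: the eigenvectors of $A$ are the nonzero multiples of $\\mathbf{e}_{1}$, while those of $B$ are the nonzero multiples of $\\mathbf{e}_{2}$.

$$A=\\begin{pmatrix}1&1\\\\0&1\\end{pmatrix},\\qquad B=\\begin{pmatrix}1&0\\\\1&1\\end{pmatrix},\\qquad AB=\\begin{pmatrix}2&1\\\\1&1\\end{pmatrix}\\neq\\begin{pmatrix}1&1\\\\1&2\\end{pmatrix}=BA.$$

Likewise, over $\\mathcal{K}=\\mathbb{R}$ the rotation matrix $J$ below has characteristic polynomial $x^{2}+1$, which does not split in $\\mathbb{R}$, and $J$ has no eigenvector in $\\mathbb{R}^{2}$. In the extension $\\mathcal{L}=\\mathbb{C}$, writing $\\mathrm{i}$ for the imaginary unit, it does:

$$J=\\begin{pmatrix}0&-1\\\\1&0\\end{pmatrix},\\qquad J\\begin{pmatrix}1\\\\-\\mathrm{i}\\end{pmatrix}=\\begin{pmatrix}\\mathrm{i}\\\\1\\end{pmatrix}=\\mathrm{i}\\begin{pmatrix}1\\\\-\\mathrm{i}\\end{pmatrix}.$$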

Lemma 2.

For any sequence $R_{1},\\ldots,R_{r}\\in\\mathrm{M}_{n}(\\mathcal{L})$ of upper triangular pairwise commuting matrices and every row index $i\\in[n]$, there exists $\\mathbf{v}\\in\\mathcal{L}^{n}\\setminus\\{\\mathbf{0}\\}$ such that

$$R_{k}\\mathbf{v}=d_{i}(R_{k})\\mathbf{v}\\quad\\text{for all \\(k\\in[r]\\).}$$
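
A minimal illustration of Lemma 2, with matrices chosen here for the purpose (they are not from the source): take $n=2$, $r=2$, $R_{2}=R_{1}^{2}$ and $i=2$. It shows that the vector $\\mathbf{v}$ is in general not $\\mathbf{e}_{i}$ itself.

$$R_{1}=\\begin{pmatrix}1&1\\\\0&2\\end{pmatrix},\\quad R_{2}=\\begin{pmatrix}1&3\\\\0&4\\end{pmatrix},\\quad\\mathbf{v}=\\begin{pmatrix}1\\\\1\\end{pmatrix}:\\qquad R_{1}\\mathbf{v}=\\begin{pmatrix}2\\\\2\\end{pmatrix}=d_{2}(R_{1})\\mathbf{v},\\quad R_{2}\\mathbf{v}=\\begin{pmatrix}4\\\\4\\end{pmatrix}=d_{2}(R_{2})\\mathbf{v}.$$

For $i=1$ one can always take $\\mathbf{v}=\\mathbf{e}_{1}$, since the first column of an upper triangular matrix $R$ is $d_{1}(R)\\mathbf{e}_{1}$.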

Proof of the theorem.

This is by induction on $n$. The induction hypothesis is that, given pairwise commuting matrices $A_{1},\\ldots,A_{r}\\in\\mathrm{M}_{n}(\\mathcal{L})$ whose characteristic polynomials all split in $\\mathcal{L}$, and a sequence of arbitrary scalars $\\mu_{1},\\ldots,\\mu_{r}\\in\\mathcal{L}$, there exists some $P\\in\\mathrm{GL}_{n}(\\mathcal{L})$ such that:

1.
$P^{-1}A_{k}P$ is upper triangular for all $k=1,\\ldots,r$ .
2.
If some $i,j\\in[n]$ are such that $i<j$ and $d_{j}(P^{-1}A_{k}P)=d_{i}(P^{-1}A_{k}P)$ for all $k\\in[r]$, then $d_{i+1}(P^{-1}A_{k}P)=d_{i}(P^{-1}A_{k}P)$ for all $k\\in[r]$.
3.
If some $j\\in[n]$ is such that $d_{j}(P^{-1}A_{k}P)=\\mu_{k}$ for all $k\\in[r]$ , then $d_{1}(P^{-1}A_{k}P)=\\mu_{k}$ for all $k\\in[r]$ .

For $n=1$ this hypothesis is trivially fulfilled: all $1\\times 1$ matrices are upper triangular, condition 2 is vacuous, and condition 3 can only be invoked with $j=1$. Assume that it holds for $n=m$ and consider the case $n=m+1$.

It is easy to see that condition 1 implies that $P\\mathbf{e}_{1}$ must be an eigenvector that is common to all the matrices. If there exists a nonzero vector $\\mathbf{u}_{1}\\in\\mathcal{L}^{n}$ such that $A_{k}\\mathbf{u}_{1}=\\mu_{k}\\mathbf{u}_{1}$ for all $k=1,\\ldots,r$, then this is such a common eigenvector, and in that case let $\\lambda_{k}=\\mu_{k}$ for all $k=1,\\ldots,r$. Otherwise, by Lemma 1, there exists a vector $\\mathbf{u}_{1}\\in\\mathcal{L}^{n}\\setminus\\{\\mathbf{0}\\}$ such that $A_{k}\\mathbf{u}_{1}=\\lambda_{k}\\mathbf{u}_{1}$ for all $k\\in[r]$, for some $\\{\\lambda_{k}\\}_{k=1}^{r}\\subseteq\\mathcal{L}$. Either way, one gets a suitable candidate $\\mathbf{u}_{1}$ for $P\\mathbf{e}_{1}$ and eigenvalues $\\lambda_{1},\\ldots,\\lambda_{r}$ that incidentally will satisfy $d_{1}(P^{-1}A_{k}P)=\\lambda_{k}$ for all $k\\in[r]$.

Let $\\mathbf{u}_{2},\\ldots,\\mathbf{u}_{n}\\in\\mathcal{L}^{n}$ be arbitrary vectors such that $\\{\\mathbf{u}_{i}\\}_{i=1}^{n}$ is a basis of $\\mathcal{L}^{n}$. Let $U$ be the $n\\times n$ matrix whose $i$th column is $\\mathbf{u}_{i}$ for $1\\leqslant i\\leqslant n$. (By imposing extra conditions on the choice of the basis $\\{\\mathbf{u}_{i}\\}_{i=1}^{n}$ at this point, such as requiring that it be orthonormal, one can often prove a stronger claim where the choice of $P$ is restricted to some smaller group of matrices, for example the group of orthogonal matrices, but this requires assuming additional things about the fields $\\mathcal{K}$ and $\\mathcal{L}$.) Then $U$ is invertible and for each $k$ the first column of $B_{k}=U^{-1}A_{k}U$ is

$$U^{-1}A_{k}U\\mathbf{e}_{1}=U^{-1}A_{k}\\mathbf{u}_{1}=\\lambda_{k}U^{-1}\\mathbf{u}_{1}=\\lambda_{k}\\mathbf{e}_{1}\\text{.}$$

Furthermore

$$B_{j}B_{k}=U^{-1}A_{j}UU^{-1}A_{k}U=U^{-1}A_{j}A_{k}U=U^{-1}A_{k}A_{j}U=U^{-1}A_{k}UU^{-1}A_{j}U=B_{k}B_{j}$$

for all $j$ and $k$ .

Now let $A_{k}^{\\prime}$ be the matrix formed from rows and columns $2$ through $n$ of $B_{k}$. Since $\\det(A_{k}-xI)=\\det(B_{k}-xI)=(\\lambda_{k}-x)\\det(A_{k}^{\\prime}-xI)$ by expansion (http://planetmath.org/LaplaceExpansion) along the first column, it follows that the characteristic polynomial of $A_{k}^{\\prime}$ splits in $\\mathcal{L}$. Furthermore, all the $A_{k}^{\\prime}$ have side $m=n-1$ and commute pairwise with each other, whence by the induction hypothesis there exists some $P^{\\prime}\\in\\mathrm{GL}_{n-1}(\\mathcal{L})$ such that every $P^{\\prime-1}A_{k}^{\\prime}P^{\\prime}$ is upper triangular. Let $P=U\\left(\\begin{smallmatrix}1&0\\\\0&P^{\\prime}\\end{smallmatrix}\\right)$. Then the submatrix consisting of rows and columns $2$ through $n$ of $P^{-1}A_{k}P$ is equal to $P^{\\prime-1}A_{k}^{\\prime}P^{\\prime}$ and hence contains no nonzero subdiagonal elements. Furthermore, the first column of $P^{-1}A_{k}P$ is equal to the first column of $B_{k}$, and thus the $P^{-1}A_{k}P$ are all upper triangular, as claimed.
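
The block structure behind this step can be written out as follows. This is a sketch: the name $\\mathbf{b}_{k}^{\\mathrm{T}}$ for the last $n-1$ entries of the first row of $B_{k}$ is introduced here and does not appear in the source. Since the first column of $B_{k}$ is $\\lambda_{k}\\mathbf{e}_{1}$,

$$B_{k}=\\begin{pmatrix}\\lambda_{k}&\\mathbf{b}_{k}^{\\mathrm{T}}\\\\\\mathbf{0}&A_{k}^{\\prime}\\end{pmatrix},\\qquad B_{j}B_{k}=\\begin{pmatrix}\\lambda_{j}\\lambda_{k}&\\lambda_{j}\\mathbf{b}_{k}^{\\mathrm{T}}+\\mathbf{b}_{j}^{\\mathrm{T}}A_{k}^{\\prime}\\\\\\mathbf{0}&A_{j}^{\\prime}A_{k}^{\\prime}\\end{pmatrix},$$

so comparing the lower right blocks of $B_{j}B_{k}$ and $B_{k}B_{j}$ gives $A_{j}^{\\prime}A_{k}^{\\prime}=A_{k}^{\\prime}A_{j}^{\\prime}$. Conjugating by the block diagonal factor of $P$ then gives

$$P^{-1}A_{k}P=\\begin{pmatrix}1&\\mathbf{0}^{\\mathrm{T}}\\\\\\mathbf{0}&P^{\\prime-1}\\end{pmatrix}\\begin{pmatrix}\\lambda_{k}&\\mathbf{b}_{k}^{\\mathrm{T}}\\\\\\mathbf{0}&A_{k}^{\\prime}\\end{pmatrix}\\begin{pmatrix}1&\\mathbf{0}^{\\mathrm{T}}\\\\\\mathbf{0}&P^{\\prime}\\end{pmatrix}=\\begin{pmatrix}\\lambda_{k}&\\mathbf{b}_{k}^{\\mathrm{T}}P^{\\prime}\\\\\\mathbf{0}&P^{\\prime-1}A_{k}^{\\prime}P^{\\prime}\\end{pmatrix},$$

which is upper triangular with first diagonal entry $\\lambda_{k}$.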

It also follows from the induction hypothesis, applied to the $A_{k}^{\\prime}$ with the scalars $\\lambda_{1},\\ldots,\\lambda_{r}$ in the role of $\\mu_{1},\\ldots,\\mu_{r}$, that $P$ can be chosen such that $d_{2}(P^{-1}A_{k}P)=d_{1}(P^{\\prime-1}A_{k}^{\\prime}P^{\\prime})=\\lambda_{k}=d_{1}(P^{-1}A_{k}P)$ for all $k\\in[r]$ if there is any $j\\geqslant 2$ for which $d_{j}(P^{-1}A_{k}P)=d_{j-1}(P^{\\prime-1}A_{k}^{\\prime}P^{\\prime})=\\lambda_{k}=d_{1}(P^{-1}A_{k}P)$ for all $k\\in[r]$. More generally, if $2\\leqslant i<j$ are such that $d_{j}(P^{-1}A_{k}P)=d_{i}(P^{-1}A_{k}P)$ for all $k\\in[r]$, then similarly $d_{i+1}(P^{-1}A_{k}P)=d_{i}(P^{-1}A_{k}P)$ for all $k\\in[r]$. This verifies condition 2 of the induction hypothesis.

For the remaining condition 3, one may first observe that if there is some $i\\in[n]$ such that $d_{i}(P^{-1}A_{k}P)=\\mu_{k}$ for all $k\\in[r]$, then by Lemma 2 there exists a nonzero $\\mathbf{v}\\in\\mathcal{L}^{n}$ such that $P^{-1}A_{k}P\\mathbf{v}=\\mu_{k}\\mathbf{v}$ for all $k\\in[r]$, that is, $A_{k}(P\\mathbf{v})=\\mu_{k}(P\\mathbf{v})$. This means $P\\mathbf{v}$ fulfills the condition for the choice of $\\mathbf{u}_{1}$ in the first case above, and hence $d_{1}(P^{-1}A_{k}P)=\\lambda_{k}=\\mu_{k}$, as claimed.

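By the principle of induction, the induction hypothesis holds for every $n$. The theorem now follows: its condition 1 is condition 1 of the hypothesis, and its condition 2 follows by applying condition 2 of the hypothesis repeatedly, to the index pairs $(i,j),(i+1,j),\\ldots,(l-1,j)$.∎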
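
To see the construction in the proof at work, here is a small worked example over $\\mathcal{K}=\\mathcal{L}=\\mathbb{Q}$. The matrices are chosen for illustration and are not from the source. Let

$$A_{1}=\\begin{pmatrix}3&-1\\\\1&1\\end{pmatrix},\\qquad A_{2}=A_{1}-2I=\\begin{pmatrix}1&-1\\\\1&-1\\end{pmatrix}.$$

These commute, and their characteristic polynomials $(x-2)^{2}$ and $x^{2}$ split in $\\mathbb{Q}$. A common eigenvector is $\\mathbf{u}_{1}=(1,1)^{\\mathrm{T}}$, with $\\lambda_{1}=2$ and $\\lambda_{2}=0$. Completing it to a basis with $\\mathbf{u}_{2}=\\mathbf{e}_{2}$ gives

$$U=\\begin{pmatrix}1&0\\\\1&1\\end{pmatrix},\\qquad U^{-1}A_{1}U=\\begin{pmatrix}2&-1\\\\0&2\\end{pmatrix},\\qquad U^{-1}A_{2}U=\\begin{pmatrix}0&-1\\\\0&0\\end{pmatrix}.$$

Since $n-1=1$, one may take $P^{\\prime}=(1)$ and $P=U$. Neither matrix is diagonalisable, yet both are brought to upper triangular form by the same $P$.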