Section 4.3 Diagonalization, similarity, and powers of a matrix

The first example we considered in this chapter was the matrix \(A=\left[\begin{array}{rr}
1 \amp 2 \\
2 \amp 1 \\
\end{array}\right]
\text{,}\) which has eigenvectors \(\vvec_1=\twovec{1}{1}\) and \(\vvec_2 = \twovec{-1}{1}\) and associated eigenvalues \(\lambda_1=3\) and \(\lambda_2=-1\text{.}\) In Subsection 4.1.2, we described how \(A\) is, in some sense, equivalent to the diagonal matrix \(D = \left[\begin{array}{rr}
3 \amp 0 \\
0 \amp -1\\
\end{array}\right]
\text{.}\)

This equivalence is summarized by Figure 4.3.1. The diagonal matrix \(D\) has the geometric effect of stretching vectors horizontally by a factor of \(3\) and flipping vectors vertically. The matrix \(A\) has the geometric effect of stretching vectors by a factor of \(3\) in the \(\vvec_1\) direction and flipping them in the \(\vvec_2\) direction. That is, the geometric effect of \(A\) is the same as that of \(D\) when viewed in a basis of eigenvectors of \(A\text{.}\)
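The eigenvector relationships quoted above can be checked numerically. The following is a minimal sketch in plain Python; the helper name `matvec` is ours, introduced only for illustration, and is not part of the text.

```python
# Check the eigenpairs of A = [[1, 2], [2, 1]] quoted in the text.
A = [[1, 2], [2, 1]]

def matvec(M, v):
    """Multiply a 2x2 matrix (list of rows) by a vector (list)."""
    return [sum(M[i][j] * v[j] for j in range(2)) for i in range(2)]

v1, lam1 = [1, 1], 3     # eigenvector and eigenvalue from the text
v2, lam2 = [-1, 1], -1

Av1 = matvec(A, v1)      # should equal 3 * v1 (a stretch along v1)
Av2 = matvec(A, v2)      # should equal -1 * v2 (a flip along v2)
print(Av1, Av2)
```

Comparing `Av1` with `3 * v1` and `Av2` with `-1 * v2` confirms that \(A\) stretches by a factor of \(3\) along \(\vvec_1\) and flips along \(\vvec_2\text{.}\)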

Our goal in this section is to express this geometric observation in algebraic terms. In doing so, we will make precise the sense in which \(A\) and \(D\) are equivalent.

Preview Activity 4.3.1.

In this preview activity, we will review some familiar properties about matrix multiplication that appear in this section.

Remember that matrix-vector multiplication constructs linear combinations of the columns of the matrix. For instance, if \(A = \begin{bmatrix}
\avec_1 \amp \avec_2 \end{bmatrix}\text{,}\) express the product \(A\twovec2{-3}\) in terms of \(\avec_1\) and \(\avec_2\text{.}\)

What is the product \(A\twovec40\) in terms of \(\avec_1\) and \(\avec_2\text{?}\)

Next, remember how matrix-matrix multiplication is defined. Suppose that we have matrices \(A\) and \(B\) and that \(B = \begin{bmatrix}
\bvec_1 \amp \bvec_2 \end{bmatrix}\text{.}\) How can we express the matrix product \(AB\) in terms of the columns of \(B\text{?}\)

Suppose that \(A\) is a matrix having eigenvectors \(\vvec_1\) and \(\vvec_2\) with associated eigenvalues \(\lambda_1 = 4\) and \(\lambda_2 =
-1\text{.}\) Express the product \(A(2\vvec_1+3\vvec_2)\) in terms of \(\vvec_1\) and \(\vvec_2\text{.}\)

Suppose that \(A\) is the matrix from the previous part and that \(P=\begin{bmatrix} \vvec_1 \amp \vvec_2
\end{bmatrix}\text{.}\) What is the matrix product

\begin{equation*}
AP = A\begin{bmatrix}
\vvec_1 \amp \vvec_2
\end{bmatrix}?
\end{equation*}

Subsection 4.3.1 Diagonalization of matrices

When working with an \(n\times n\) matrix \(A\text{,}\) Subsection 4.1.2 demonstrated the value of having a basis of \(\real^n\) consisting of eigenvectors of \(A\text{.}\) In fact, Proposition 4.2.9 tells us that if the eigenvalues of \(A\) are real and distinct, then there is such a basis. As we’ll see later, there are other conditions on \(A\) that guarantee a basis of eigenvectors. For now, suffice it to say that we can find a basis of eigenvectors for many matrices. With this assumption, we will see how the matrix \(A\) is equivalent to a diagonal matrix \(D\text{.}\)

Activity 4.3.2.

Suppose that \(A\) is a \(2\times2\) matrix having eigenvectors \(\vvec_1\) and \(\vvec_2\) with associated eigenvalues \(\lambda_1=3\) and \(\lambda_2 = -6\text{.}\) Because the eigenvalues are real and distinct, we know by Proposition 4.2.9 that these eigenvectors form a basis of \(\real^2\text{.}\)

What are the products \(A\vvec_1\) and \(A\vvec_2\) in terms of \(\vvec_1\) and \(\vvec_2\text{?}\)

If we form the matrix \(P = \begin{bmatrix}
\vvec_1 \amp \vvec_2
\end{bmatrix}
\text{,}\) what is the product \(AP\) in terms of \(\vvec_1\) and \(\vvec_2\text{?}\)

Use the eigenvalues to form the diagonal matrix \(D
= \begin{bmatrix}
3 \amp 0 \\
0 \amp -6
\end{bmatrix}\) and determine the product \(PD\) in terms of \(\vvec_1\) and \(\vvec_2\text{.}\)

The results from the previous two parts of this activity demonstrate that \(AP=PD\text{.}\) Using the fact that the eigenvectors \(\vvec_1\) and \(\vvec_2\) form a basis of \(\real^2\text{,}\) explain why \(P\) is invertible and why we must have \(A=PDP^{-1}\text{.}\)

Suppose that \(A=\begin{bmatrix}
-3 \amp 6 \\
3 \amp 0 \\
\end{bmatrix}\text{.}\) Verify that \(\vvec_1=\twovec11\) and \(\vvec_2=\twovec2{-1}\) are eigenvectors of \(A\) with eigenvalues \(\lambda_1 = 3\) and \(\lambda_2=-6\text{.}\)

Use the Sage cell below to define the matrices \(P\) and \(D\) and then verify that \(A=PDP^{-1}\text{.}\)

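More generally, suppose that we have an \(n\times n\) matrix \(A\) and that there is a basis of \(\real^n\) consisting of eigenvectors \(\vvec_1,\vvec_2,\ldots,\vvec_n\) of \(A\) with associated eigenvalues \(\lambda_1,\lambda_2,\ldots,\lambda_n\text{.}\) If we use the eigenvectors to form the matrix

\begin{equation*}
P = \begin{bmatrix}
\vvec_1 \amp \vvec_2 \amp \ldots \amp \vvec_n
\end{bmatrix},
\end{equation*}

use the eigenvalues to form the diagonal matrix

\begin{equation*}
D = \begin{bmatrix}
\lambda_1 \amp 0 \amp \ldots \amp 0 \\
0 \amp \lambda_2 \amp \ldots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
0 \amp 0 \amp \ldots \amp \lambda_n \\
\end{bmatrix},
\end{equation*}

and apply the same reasoning demonstrated in the activity, we find that \(AP = PD\) and hence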

\begin{equation*}
A=PDP^{-1}.
\end{equation*}

We have now seen the following proposition.

Proposition 4.3.2.

If \(A\) is an \(n\times n\) matrix and there is a basis \(\{\vvec_1,\vvec_2,\ldots,\vvec_n\}\) of \(\real^n\) consisting of eigenvectors of \(A\) having associated eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_n\text{,}\) then we can write \(A=PDP^{-1}\) where \(D\) is the diagonal matrix whose diagonal entries are the eigenvalues of \(A\)

\begin{equation*}
D = \begin{bmatrix}
\lambda_1 \amp 0 \amp \ldots \amp 0 \\
0 \amp \lambda_2 \amp \ldots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
0 \amp 0 \amp \ldots \amp \lambda_n \\
\end{bmatrix}
\end{equation*}

and the matrix \(P = \left[\begin{array}{cccc}
\vvec_1 \amp \vvec_2 \amp \ldots \amp \vvec_n
\end{array}\right]
\text{.}\)
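As an illustration of the proposition, the following sketch checks \(AP=PD\) and \(A=PDP^{-1}\) for the matrix \(A\) and eigenvectors given in Activity 4.3.2. It uses exact rational arithmetic from Python's standard library rather than Sage, and the helper names `matmul` and `inv2` are our own and only hypothetical stand-ins for what a Sage cell would provide.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[-3, 6], [3, 0]]
P = [[1, 2], [1, -1]]    # columns are v1 = (1, 1) and v2 = (2, -1)
D = [[3, 0], [0, -6]]    # eigenvalues in the same order as the columns of P

AP = matmul(A, P)
PD = matmul(P, D)
PDPinv = matmul(PD, inv2(P))
print(AP == PD, PDPinv == A)
```

The order matters: the \(j\)th diagonal entry of \(D\) must be the eigenvalue associated to the \(j\)th column of \(P\text{.}\)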

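Example 4.3.3.

We have seen that \(A = \begin{bmatrix}
1 \amp 2 \\
2 \amp 1 \\
\end{bmatrix}\) has eigenvectors \(\vvec_1 = \twovec11\) and \(\vvec_2=\twovec{-1}1\) with associated eigenvalues \(\lambda_1 = 3\) and \(\lambda_2 = -1\text{.}\) Forming the matrices

\begin{equation*}
P = \begin{bmatrix}
1 \amp -1 \\
1 \amp 1 \\
\end{bmatrix}, \qquad
D = \begin{bmatrix}
3 \amp 0 \\
0 \amp -1 \\
\end{bmatrix},
\end{equation*}

we see that \(A=PDP^{-1}\text{.}\)
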
This is the sense in which we mean that \(A\) is equivalent to a diagonal matrix \(D\text{.}\) The expression \(A=PDP^{-1}\) says that \(A\text{,}\) expressed in the basis defined by the columns of \(P\text{,}\) has the same geometric effect as \(D\text{,}\) expressed in the standard basis \(\evec_1, \evec_2,\ldots,\evec_n\text{.}\)

Definition 4.3.4.

We say that the matrix \(A\) is diagonalizable if there is a diagonal matrix \(D\) and invertible matrix \(P\) such that

\begin{equation*}
A = PDP^{-1}.
\end{equation*}

Example 4.3.5.

We will try to find a diagonalization of \(A =
\left[\begin{array}{rr}
-5 \amp 6 \\
-3 \amp 4 \\
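\end{array}\right]\) whose characteristic equation is

\begin{equation*}
\det(A-\lambda I) = (-5-\lambda)(4-\lambda) + 18 = \lambda^2 + \lambda - 2 = (\lambda+2)(\lambda-1) = 0.
\end{equation*}
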
This shows that the eigenvalues of \(A\) are \(\lambda_1 =
-2\) and \(\lambda_2 = 1\text{.}\)

By constructing \(\nul(A-(-2)I)\text{,}\) we find a basis for \(E_{-2}\) consisting of the vector \(\vvec_1 =
\twovec{2}{1}\text{.}\) Similarly, a basis for \(E_1\) consists of the vector \(\vvec_2 = \twovec{1}{1}\text{.}\) This shows that we can construct a basis \(\{\vvec_1,\vvec_2\}\) of \(\real^2\) consisting of eigenvectors of \(A\text{.}\)
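As a worked check of this example, we can place the eigenvectors \(\vvec_1\) and \(\vvec_2\) in the columns of \(P\) and the eigenvalues, in the same order, on the diagonal of \(D\text{:}\)

\begin{equation*}
P = \begin{bmatrix}
2 \amp 1 \\
1 \amp 1 \\
\end{bmatrix}, \qquad
D = \begin{bmatrix}
-2 \amp 0 \\
0 \amp 1 \\
\end{bmatrix}, \qquad
PDP^{-1} =
\begin{bmatrix}
-4 \amp 1 \\
-2 \amp 1 \\
\end{bmatrix}
\begin{bmatrix}
1 \amp -1 \\
-1 \amp 2 \\
\end{bmatrix}
=
\begin{bmatrix}
-5 \amp 6 \\
-3 \amp 4 \\
\end{bmatrix}
= A.
\end{equation*}

Listing the eigenvalues and eigenvectors in the opposite order gives a second diagonalization of the same matrix:

\begin{equation*}
P = \begin{bmatrix}
1 \amp 2 \\
1 \amp 1 \\
\end{bmatrix}, \qquad
D = \begin{bmatrix}
1 \amp 0 \\
0 \amp -2 \\
\end{bmatrix}, \qquad
PDP^{-1} =
\begin{bmatrix}
1 \amp -4 \\
1 \amp -2 \\
\end{bmatrix}
\begin{bmatrix}
-1 \amp 2 \\
1 \amp -1 \\
\end{bmatrix}
=
\begin{bmatrix}
-5 \amp 6 \\
-3 \amp 4 \\
\end{bmatrix}
= A.
\end{equation*}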

If we choose a different basis for the eigenspaces, we will also find a different matrix \(P\) that diagonalizes \(A\text{.}\) The point is that there are many ways in which \(A\) can be written in the form \(A=PDP^{-1}\text{.}\)

Example 4.3.6.

We will try to find a diagonalization of \(A =
\left[\begin{array}{rr}
0 \amp 4 \\
-1 \amp 4 \\
\end{array}\right]
\text{.}\)

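Once again, we find the eigenvalues by solving the characteristic equation:

\begin{equation*}
\det(A-\lambda I) = -\lambda(4-\lambda) + 4 = \lambda^2 - 4\lambda + 4 = (\lambda-2)^2 = 0.
\end{equation*}

In this case, there is a single eigenvalue \(\lambda=2\text{.}\) We find a basis for the eigenspace \(E_2\) by describing \(\nul(A-2I)\text{:}\)

\begin{equation*}
A-2I = \begin{bmatrix}
-2 \amp 4 \\
-1 \amp 2 \\
\end{bmatrix}
\sim
\begin{bmatrix}
1 \amp -2 \\
0 \amp 0 \\
\end{bmatrix}.
\end{equation*}

This shows that the eigenspace \(E_2\) is one-dimensional with \(\vvec_1=\twovec{2}{1}\) forming a basis.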

In this case, there is not a basis of \(\real^2\) consisting of eigenvectors of \(A\text{,}\) which tells us that \(A\) is not diagonalizable.
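The reasoning in this example can be mirrored in a few lines of code. The sketch below assumes a \(2\times2\) matrix with a repeated real eigenvalue, and the helper `rank2` is a hypothetical name of ours. It computes the discriminant of the characteristic polynomial and the dimension of the eigenspace as \(2\) minus the rank of \(A-\lambda I\text{.}\)

```python
from fractions import Fraction

A = [[0, 4], [-1, 4]]
trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]

# The characteristic polynomial is lambda^2 - trace*lambda + det.
discriminant = trace ** 2 - 4 * det
lam = Fraction(trace, 2)   # the repeated root when the discriminant is zero

# Form A - lam*I and find its rank.
M = [[A[0][0] - lam, A[0][1]], [A[1][0], A[1][1] - lam]]

def rank2(M):
    """Rank of a 2x2 matrix: 0, 1, or 2."""
    if all(entry == 0 for row in M for entry in row):
        return 0
    if M[0][0] * M[1][1] - M[0][1] * M[1][0] == 0:
        return 1
    return 2

eigenspace_dim = 2 - rank2(M)   # dimension of the null space of A - lam*I
print(discriminant, lam, eigenspace_dim)
```

An eigenspace dimension smaller than \(2\) for the only eigenvalue means there cannot be two linearly independent eigenvectors, in agreement with the conclusion above.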

In fact, if we only know that \(A = PDP^{-1}\text{,}\) we can say that the columns of \(P\) are eigenvectors of \(A\) and that the diagonal entries of \(D\) are the associated eigenvalues.

Proposition 4.3.7.

An \(n\times n\) matrix \(A\) is diagonalizable if and only if there is a basis of \(\real^n\) consisting of eigenvectors of \(A\text{.}\)

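For instance, suppose we know that \(A=PDP^{-1}\) where \(P = \begin{bmatrix}
1 \amp 1 \\
1 \amp 2 \\
\end{bmatrix}\) and \(D = \begin{bmatrix}
2 \amp 0 \\
0 \amp -2 \\
\end{bmatrix}\text{.}\) The columns of \(P\) are eigenvectors of \(A\) so that \(\vvec_1 =
\twovec{1}{1}\) is an eigenvector of \(A\) with eigenvalue \(\lambda_1 = 2\) and \(\vvec_2 =
\twovec{1}{2}\) is an eigenvector with eigenvalue \(\lambda_2=-2\text{.}\)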

In several earlier examples, we have been interested in computing powers of a given matrix. For instance, in Activity 4.1.3, we had the matrix \(A = \left[\begin{array}{rr}
0.8 \amp 0.6 \\
0.2 \amp 0.4 \\
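\end{array}\right]\) and an initial vector \(\xvec_0=\ctwovec{1000}{0}\text{,}\) and we wanted to compute

\begin{equation*}
\begin{aligned}
\xvec_1 \amp {}={} A\xvec_0 \\
\xvec_2 \amp {}={} A\xvec_1 = A^2\xvec_0 \\
\xvec_3 \amp {}={} A\xvec_2 = A^3\xvec_0\text{.} \\
\end{aligned}
\end{equation*}
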
In particular, we wanted to find \(\xvec_k=A^k\xvec_0\) and determine what happens as \(k\) becomes very large. If a matrix \(A\) is diagonalizable, writing \(A=PDP^{-1}\) can help us understand powers of \(A\) more easily.

Find the powers \(D^2\text{,}\) \(D^3\text{,}\) and \(D^4\text{.}\) What is \(D^k\) for a general value of \(k\text{?}\)

Suppose that \(A\) is a matrix with eigenvector \(\vvec\) and associated eigenvalue \(\lambda\text{;}\) that is, \(A\vvec = \lambda\vvec\text{.}\) By considering \(A^2\vvec\text{,}\) explain why \(\vvec\) is also an eigenvector of \(A^2\) with eigenvalue \(\lambda^2\text{.}\)

Remembering that the columns of \(P\) are eigenvectors of \(A\text{,}\) explain why \(A^2\) is diagonalizable and find a diagonalization in terms of \(P\) and \(D\text{.}\)

Give another explanation of the diagonalizability of \(A^2\) by writing

\begin{equation*}
A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1}.
\end{equation*}

In the same way, find a diagonalization of \(A^3\text{,}\) \(A^4\text{,}\) and \(A^k\text{.}\)

Suppose that \(A\) is a diagonalizable \(2\times2\) matrix with eigenvalues \(\lambda_1 =
0.5\) and \(\lambda_2=0.1\text{.}\) What happens to \(A^k\) as \(k\) becomes very large?

If \(A\) is diagonalizable, the activity demonstrates that any power of \(A\) is as well.

Proposition 4.3.9.

If \(A=PDP^{-1}\text{,}\) then \(A^k = PD^kP^{-1}\text{.}\) When \(A\) is invertible, we also have \(A^{-1} =
PD^{-1}P^{-1}\text{.}\)
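The proposition can be tested on the matrix \(A=\begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \end{bmatrix}\) from the start of this section. The sketch below is an illustration only, with helper names `matmul` and `inv2` of our own choosing. It compares \(PD^kP^{-1}\) with the result of multiplying \(A\) by itself \(k\) times and also forms \(A^{-1}=PD^{-1}P^{-1}\text{.}\)

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1, 2], [2, 1]]
P = [[1, -1], [1, 1]]    # eigenvectors (1, 1) and (-1, 1) as columns
D = [[3, 0], [0, -1]]    # associated eigenvalues
k = 5

# Powers of a diagonal matrix: raise each diagonal entry to the k-th power.
Dk = [[D[0][0] ** k, 0], [0, D[1][1] ** k]]
Ak_from_diag = matmul(matmul(P, Dk), inv2(P))

# Direct computation by repeated multiplication, for comparison.
Ak_direct = [[1, 0], [0, 1]]
for _ in range(k):
    Ak_direct = matmul(Ak_direct, A)

# Inverse via the diagonalization (valid because no eigenvalue is zero).
Dinv = [[Fraction(1, D[0][0]), 0], [0, Fraction(1, D[1][1])]]
Ainv = matmul(matmul(P, Dinv), inv2(P))
print(Ak_from_diag == Ak_direct, matmul(A, Ainv))
```

The diagonal route needs only scalar powers of the eigenvalues, which is why it remains practical when \(k\) is large.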

Example 4.3.10.

Let’s revisit Activity 4.1.3 where we had the matrix \(A = \begin{bmatrix}
0.8 \amp 0.6 \\
0.2 \amp 0.4 \\
\end{bmatrix}\) and the initial vector \(\xvec_0 = \ctwovec{1000}0\text{.}\) We were interested in understanding the sequence of vectors \(\xvec_{k+1} = A\xvec_k\text{,}\) which means that \(\xvec_k =
A^k\xvec_0\text{.}\)

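We can verify that \(\vvec_1 = \twovec31\) and \(\vvec_2 =
\twovec{-1}1\) are eigenvectors of \(A\) having associated eigenvalues \(\lambda_1=1\) and \(\lambda_2 =
0.2\text{.}\) This means that \(A = PDP^{-1}\) where

\begin{equation*}
P = \begin{bmatrix}
3 \amp -1 \\
1 \amp 1 \\
\end{bmatrix}, \qquad
D = \begin{bmatrix}
1 \amp 0 \\
0 \amp 0.2 \\
\end{bmatrix}.
\end{equation*}
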
Therefore, the powers of \(A\) have the form \(A^k =
PD^kP^{-1}\text{.}\)

Notice that \(D^k = \begin{bmatrix}
1^k \amp 0 \\
0 \amp 0.2^k \\
\end{bmatrix}
= \begin{bmatrix}
1 \amp 0 \\
0 \amp 0.2^k
\end{bmatrix}
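\text{.}\) As \(k\) increases, \(0.2^k\) becomes closer and closer to zero. This means that for very large powers \(k\text{,}\) we have

\begin{equation*}
A^k = PD^kP^{-1} \approx
\begin{bmatrix}
3 \amp -1 \\
1 \amp 1 \\
\end{bmatrix}
\begin{bmatrix}
1 \amp 0 \\
0 \amp 0 \\
\end{bmatrix}
\frac14\begin{bmatrix}
1 \amp 1 \\
-1 \amp 3 \\
\end{bmatrix}
=
\frac14\begin{bmatrix}
3 \amp 3 \\
1 \amp 1 \\
\end{bmatrix}.
\end{equation*}
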
Beginning with the vector \(\xvec_0 = \ctwovec{1000}{0}\text{,}\) we find that \(\xvec_k = A^k\xvec_0\approx
\twovec{750}{250}\) when \(k\) is very large.
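The limiting behavior can also be observed by iterating \(\xvec_{k+1}=A\xvec_k\) directly. The following sketch uses exact fractions so that no rounding is involved. The choice of \(20\) steps is arbitrary and made only for illustration.

```python
from fractions import Fraction

# A = [[0.8, 0.6], [0.2, 0.4]] written with exact fractions.
A = [[Fraction(4, 5), Fraction(3, 5)],
     [Fraction(1, 5), Fraction(2, 5)]]
x = [Fraction(1000), Fraction(0)]   # the initial vector x_0

# Apply x_{k+1} = A x_k twenty times.
for k in range(20):
    x = [A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1]]

print(float(x[0]), float(x[1]))
```

The entries of every \(\xvec_k\) sum to \(1000\) because each column of \(A\) sums to \(1\text{,}\) and the remaining gap to \(\twovec{750}{250}\) shrinks by a factor of \(0.2\) at each step.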

Subsection 4.3.3 Similarity and complex eigenvalues

We have been interested in diagonalizing a matrix \(A\) because doing so relates a matrix \(A\) to a simpler diagonal matrix \(D\text{.}\) In particular, the effect of multiplying a vector by \(A=PDP^{-1}\text{,}\) viewed in the basis defined by the columns of \(P\text{,}\) is the same as the effect of multiplying by \(D\) in the standard basis.

While many matrices are diagonalizable, there are some that are not. For example, if a matrix has complex eigenvalues, it is not possible to find a basis of \(\real^n\) consisting of eigenvectors, which means that the matrix is not diagonalizable. In this case, however, we can still relate the matrix to a simpler form that explains the geometric effect this matrix has on vectors.

Definition 4.3.11.

We say that \(A\) is similar to \(B\) if there is an invertible matrix \(P\) such that \(A = PBP^{-1}\text{.}\)

Notice that a matrix is diagonalizable if and only if it is similar to a diagonal matrix. In case a matrix \(A\) has complex eigenvalues, we will find a simpler matrix \(C\) that is similar to \(A\) and note that \(A=PCP^{-1}\) has the same effect, when viewed in the basis defined by the columns of \(P\text{,}\) as \(C\text{,}\) when viewed in the standard basis.

To begin, suppose that \(A\) is a \(2\times2\) matrix having a complex eigenvalue \(\lambda = a+bi\text{.}\) It turns out that \(A\) is similar to \(C=\begin{bmatrix}
a \amp -b \\
b \amp a \\
\end{bmatrix}
\text{.}\)

The next activity shows that \(C\) has a simple geometric effect on \(\real^2\text{.}\) First, however, we will use polar coordinates to rewrite \(C\text{.}\) As shown in the figure, the point \((a,b)\) defines \(r\text{,}\) the distance from the origin, and \(\theta\text{,}\) the angle formed with the positive horizontal axis. We then have

\begin{equation*}
\begin{aligned}
a \amp {}={} r\cos\theta \\
b \amp {}={} r\sin\theta\text{.} \\
\end{aligned}
\end{equation*}

Notice that the Pythagorean theorem says that \(r=\sqrt{a^2+b^2}\text{.}\)
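The conversion to polar coordinates is easy to carry out numerically. The sketch below takes \(a=1\) and \(b=1\text{,}\) values chosen here as an assumption for illustration. It computes \(r\) and \(\theta\) with the standard library and compares \(C\) with the scaled rotation matrix having entries \(r\cos\theta\) and \(\pm r\sin\theta\text{.}\)

```python
import math

a, b = 1.0, 1.0               # illustrative values, as for lambda = 1 + i
r = math.hypot(a, b)          # distance from the origin, sqrt(a^2 + b^2)
theta = math.atan2(b, a)      # angle with the positive horizontal axis

C = [[a, -b], [b, a]]
scaled_rotation = [[r * math.cos(theta), -r * math.sin(theta)],
                   [r * math.sin(theta),  r * math.cos(theta)]]

print(r, theta)
print(C, scaled_rotation)
```

Using `atan2` rather than the arctangent of \(b/a\) keeps the angle in the correct quadrant, including the case \(a=0\text{.}\)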

Activity 4.3.5.

We begin by rewriting \(C\) in terms of \(r\) and \(\theta\) and noting that

\begin{equation*}
C =
\left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array}\right]
=
\left[\begin{array}{rr}
r\cos\theta \amp -r\sin\theta \\
r\sin\theta \amp r\cos\theta \\
\end{array}\right]
=
\left[\begin{array}{rr}
r \amp 0 \\
0 \amp r \\
\end{array}\right]
\left[\begin{array}{rr}
\cos\theta \amp -\sin\theta \\
\sin\theta \amp \cos\theta \\
\end{array}\right].
\end{equation*}

Explain why \(C\) has the geometric effect of rotating vectors by \(\theta\) and scaling them by a factor of \(r\text{.}\)

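We now consider the matrix \(A = \left[\begin{array}{rr}
-2 \amp 2 \\
-5 \amp 4 \\
\end{array}\right]
\text{,}\) whose eigenvalues are \(\lambda_1 = 1+i\) and \(\lambda_2 =
1-i\text{.}\) We will choose to focus on one of the eigenvalues \(\lambda_1 = a+bi= 1+i\text{.}\)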

Form the matrix \(C\) using these values of \(a\) and \(b\text{.}\) Then rewrite the point \((a,b)\) in polar coordinates by identifying the values of \(r\) and \(\theta\text{.}\) Explain the geometric effect of multiplying vectors by \(C\text{.}\)

Suppose that \(P=\left[\begin{array}{rr}
1 \amp 1 \\
2 \amp 1 \\
\end{array}\right]
\text{.}\) Verify that \(A = PCP^{-1}\text{.}\)

Explain why \(A^k = PC^kP^{-1}\text{.}\)

We formed the matrix \(C\) by choosing the eigenvalue \(\lambda_1=1+i\text{.}\) Suppose we had instead chosen \(\lambda_2 = 1-i\text{.}\) Form the matrix \(C'\) and use polar coordinates to describe the geometric effect of \(C'\text{.}\)

Using the matrix \(P' = \left[\begin{array}{rr}
1 \amp -1 \\
2 \amp -1 \\
\end{array}\right]
\text{,}\) show that \(A = P'C'P'^{-1}\text{.}\)

If the \(2\times2\) matrix \(A\) has a complex eigenvalue \(\lambda = a + bi\text{,}\) it turns out that \(A\) is always similar to the matrix \(C = \left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array}\right],\) whose geometric effect on vectors can be described in terms of a rotation and a scaling. There is, in fact, a method for finding the matrix \(P\) so that \(A=PCP^{-1}\) that we’ll see in Exercise 4.3.5.8. For now, we note that \(A\) has the same geometric effect as \(C\text{,}\) when viewed in the basis provided by the columns of \(P\text{.}\) We will put this fact to use in the next section to understand certain dynamical systems.

Proposition 4.3.12.

If \(A\) is a \(2\times2\) matrix with a complex eigenvalue \(\lambda = a + bi\text{,}\) then \(A\) is similar to \(C = \begin{bmatrix}
a \amp -b \\
b \amp a \\
\end{bmatrix}\text{;}\) that is, there is a matrix \(P\) such that \(A= PCP^{-1}\text{.}\)
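To see the proposition at work, the sketch below starts from the matrices \(P\) and \(C\) used in the activity above, with \(a=b=1\text{,}\) and builds \(PCP^{-1}\text{.}\) It then checks that the result has trace \(2a\) and determinant \(a^2+b^2\text{,}\) as a matrix with eigenvalues \(a\pm bi\) must. It also computes a fourth power two ways. The helper names are our own, and the code is an illustration rather than a general method for finding \(P\text{.}\)

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two 2x2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def matpow(M, k):
    """k-th power of a 2x2 matrix by repeated multiplication."""
    result = [[1, 0], [0, 1]]
    for _ in range(k):
        result = matmul(result, M)
    return result

a, b = 1, 1
C = [[a, -b], [b, a]]
P = [[1, 1], [2, 1]]

A = matmul(matmul(P, C), inv2(P))          # the matrix similar to C
trace_A = A[0][0] + A[1][1]                # should equal 2a
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # should equal a^2 + b^2

# Powers can be computed on either side of the similarity.
A4 = matpow(A, 4)
PC4Pinv = matmul(matmul(P, matpow(C, 4)), inv2(P))
print(A, trace_A, det_A, A4)
```

Here \(C\) rotates by \(45^\circ\) and scales by \(\sqrt2\text{,}\) so \(C^4\) rotates by \(180^\circ\) and scales by \(4\text{;}\) that is, \(C^4=-4I\text{,}\) and the similar matrix satisfies \(A^4=-4I\) as well.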

Subsection 4.3.4 Summary

Our goal in this section has been to use the eigenvalues and eigenvectors of a matrix \(A\) to relate \(A\) to a simpler matrix.

We said that \(A\) is diagonalizable if we can write \(A = PDP^{-1}\) where \(D\) is a diagonal matrix. The columns of \(P\) consist of eigenvectors of \(A\) and the diagonal entries of \(D\) are the associated eigenvalues.

An \(n\times n\) matrix \(A\) is diagonalizable if and only if there is a basis of \(\real^n\) consisting of eigenvectors of \(A\text{.}\)

We said that \(A\) and \(B\) are similar if there is an invertible matrix \(P\) such that \(A=PBP^{-1}\text{.}\) In this case, \(A^k = PB^kP^{-1}\text{.}\)

If \(A\) is a \(2\times2\) matrix with complex eigenvalue \(\lambda = a+bi\text{,}\) then \(A\) is similar to \(C = \left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array} \right]
\text{.}\) Writing the point \((a,b)\) in polar coordinates \(r\) and \(\theta\text{,}\) we see that \(C\) rotates vectors through an angle \(\theta\) and scales them by a factor of \(r=\sqrt{a^2+b^2}\text{.}\)

Exercises 4.3.5 Exercises

1.

Determine whether the following matrices are diagonalizable. If so, find matrices \(D\) and \(P\) such that \(A=PDP^{-1}\text{.}\)

We say that \(A\) is similar to \(B\) if there is an invertible matrix \(P\) such that \(A = PBP^{-1}\text{.}\)

If \(A\) is similar to \(B\text{,}\) explain why \(B\) is similar to \(A\text{.}\)

If \(A\) is similar to \(B\) and \(B\) is similar to \(C\text{,}\) explain why \(A\) is similar to \(C\text{.}\)

If \(A\) is similar to \(B\) and \(B\) is diagonalizable, explain why \(A\) is diagonalizable.

If \(A\) and \(B\) are similar, explain why \(A\) and \(B\) have the same characteristic polynomial; that is, explain why \(\det(A-\lambda I) =
\det(B-\lambda I)\text{.}\)

If \(A\) and \(B\) are similar, explain why \(A\) and \(B\) have the same eigenvalues.

Explain the geometric effect that \(D\) has on vectors in \(\real^2\text{.}\)

Explain the geometric effect that \(A\) has on vectors in \(\real^2\text{.}\)

What can you say about \(A^2\) and other powers of \(A\text{?}\)

Is \(A\) invertible?

8.

When \(A\) is a \(2\times2\) matrix with a complex eigenvalue \(\lambda = a+bi\text{,}\) we have said that there is a matrix \(P\) such that \(A=PCP^{-1}\) where \(C=\left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array}\right]
\text{.}\) In this exercise, we will learn how to find the matrix \(P\text{.}\) As an example, we will consider the matrix \(A = \left[\begin{array}{rr}
2 \amp 2 \\
-1 \amp 4 \\
\end{array}\right]
\text{.}\)

Show that the eigenvalues of \(A\) are complex.

Choose one of the complex eigenvalues \(\lambda=a+bi\) and construct the usual matrix \(C\text{.}\)

Using the same eigenvalue, we will find an eigenvector \(\vvec\) where the entries of \(\vvec\) are complex numbers. As always, we will describe \(\nul(A-\lambda I)\) by constructing the matrix \(A-\lambda I\) and finding its reduced row echelon form. In doing so, we will necessarily need to use complex arithmetic.

We have now found a complex eigenvector \(\vvec\text{.}\) Write \(\vvec = \vvec_1 - i \vvec_2\) to identify vectors \(\vvec_1\) and \(\vvec_2\) having real entries.

Construct the matrix \(P = \left[\begin{array}{rr}
\vvec_1 \amp \vvec_2
\end{array}\right]\) and verify that \(A=PCP^{-1}\text{.}\)

9.

For each of the following matrices, sketch the vector \(\xvec = \twovec{1}{0}\) and powers \(A^k\xvec\) for \(k=1,2,3,4\text{.}\)

Consider a matrix of the form \(C=\left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array}\right]\) with \(r=\sqrt{a^2+b^2}\text{.}\) What happens to \(C^k\xvec\) as \(k\) becomes very large when

\(r \lt 1\text{.}\)

\(r = 1\text{.}\)

\(r \gt 1\text{.}\)

10.

For each of the following matrices and vectors, sketch the vector \(\xvec\) along with \(A^k\xvec\) for \(k=1,2,3,4\text{.}\)

Find the eigenvalues and eigenvectors of \(A\) to create your sketch.

If \(A\) is a \(2\times2\) matrix with eigenvalues \(\lambda_1=0.7\) and \(\lambda_2=0.5\) and \(\xvec\) is any vector, what happens to \(A^k\xvec\) when \(k\) becomes very large?