Section 4.3 Diagonalization, similarity, and powers of a matrix
The first example we considered in this chapter was the matrix \(A=\left[\begin{array}{rr}
1 \amp 2 \\
2 \amp 1 \\
\end{array}\right]
\text{,}\) which has eigenvectors \(\vvec_1=\twovec{1}{1}\) and \(\vvec_2 = \twovec{-1}{1}\) and associated eigenvalues \(\lambda_1=3\) and \(\lambda_2=-1\text{.}\) In Subsection 4.1.2, we described how \(A\) is, in some sense, equivalent to the diagonal matrix \(D = \left[\begin{array}{rr}
3 \amp 0 \\
0 \amp -1\\
\end{array}\right]
\text{.}\)
This equivalence is summarized by Figure 4.3.1. The diagonal matrix \(D\) has the geometric effect of stretching vectors horizontally by a factor of \(3\) and flipping vectors vertically. The matrix \(A\) has the geometric effect of stretching vectors by a factor of \(3\) in the \(\vvec_1\) direction and flipping them in the \(\vvec_2\) direction. That is, the geometric effect of \(A\) is the same as that of \(D\) when viewed in a basis of eigenvectors of \(A\text{.}\)
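Though not part of the text's development, this example is easy to check numerically; the following NumPy sketch confirms the two eigenpairs claimed above.

```python
# A quick numerical check (a sketch, not part of the text's development):
# confirm that v1 and v2 are eigenvectors of A with eigenvalues 3 and -1.
import numpy as np

A = np.array([[1, 2],
              [2, 1]])
v1 = np.array([1, 1])
v2 = np.array([-1, 1])

print(A @ v1)  # [3 3], which is 3 * v1
print(A @ v2)  # [1 -1], which is -1 * v2
```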
Our goal in this section is to express this geometric observation in algebraic terms. In doing so, we will make precise the sense in which \(A\) and \(D\) are equivalent.
Remember that matrix-vector multiplication constructs linear combinations of the columns of the matrix. For instance, if \(A = \begin{bmatrix}
\avec_1 \amp \avec_2 \end{bmatrix}\text{,}\) express the product \(A\twovec2{-3}\) in terms of \(\avec_1\) and \(\avec_2\text{.}\)
Next, remember how matrix-matrix multiplication is defined. Suppose that we have matrices \(A\) and \(B\) and that \(B = \begin{bmatrix}
\bvec_1 \amp \bvec_2 \end{bmatrix}\text{.}\) How can we express the matrix product \(AB\) in terms of the columns of \(B\text{?}\) (A short numerical sketch of this fact appears after the activity.)
Suppose that \(A\) is a matrix having eigenvectors \(\vvec_1\) and \(\vvec_2\) with associated eigenvalues \(\lambda_1 = 4\) and \(\lambda_2 =
-1\text{.}\) Express the product \(A(2\vvec_1+3\vvec_2)\) in terms of \(\vvec_1\) and \(\vvec_2\text{.}\)
Suppose that \(A\) is the matrix from the previous part and that \(P=\begin{bmatrix} \vvec_1 \amp \vvec_2
\end{bmatrix}\text{.}\) What is the matrix product
\begin{equation*}
AP = A\begin{bmatrix}
\vvec_1 \amp \vvec_2
\end{bmatrix}?
\end{equation*}
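As promised, here is a minimal NumPy sketch of the column-by-column description of matrix-matrix multiplication recalled in this activity; the matrices \(A\) and \(B\) are arbitrary choices for illustration.

```python
# Sketch: the columns of AB are A applied to the columns of B.
# The matrices here are arbitrary illustrative choices.
import numpy as np

A = np.array([[1, 2],
              [2, 1]])
B = np.array([[1, -1],
              [1,  1]])

AB = A @ B
print(AB[:, 0], A @ B[:, 0])  # the first columns agree
print(AB[:, 1], A @ B[:, 1])  # the second columns agree
```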
When working with an \(n\times n\) matrix \(A\text{,}\) Subsection 4.1.2 demonstrated the value of having a basis of \(\real^n\) consisting of eigenvectors of \(A\text{.}\) In fact, Proposition 4.2.9 tells us that if the eigenvalues of \(A\) are real and distinct, then there is such a basis. As we'll see later, there are other conditions on \(A\) that guarantee a basis of eigenvectors. For now, suffice it to say that we can find a basis of eigenvectors for many matrices. With this assumption, we will see how the matrix \(A\) is equivalent to a diagonal matrix \(D\text{.}\)
Suppose that \(A\) is a \(2\times2\) matrix having eigenvectors \(\vvec_1\) and \(\vvec_2\) with associated eigenvalues \(\lambda_1=3\) and \(\lambda_2 = -6\text{.}\) Because the eigenvalues are real and distinct, we know by Proposition 4.2.9 that these eigenvectors form a basis of \(\real^2\text{.}\)
What are the products \(A\vvec_1\) and \(A\vvec_2\) in terms of \(\vvec_1\) and \(\vvec_2\text{?}\)
If we form the matrix \(P = \begin{bmatrix}
\vvec_1 \amp \vvec_2
\end{bmatrix}
\text{,}\) what is the product \(AP\) in terms of \(\vvec_1\) and \(\vvec_2\text{?}\)
Use the eigenvalues to form the diagonal matrix \(D
= \begin{bmatrix}
3 \amp 0 \\
0 \amp -6
\end{bmatrix}\) and determine the product \(PD\) in terms of \(\vvec_1\) and \(\vvec_2\text{.}\)
The results from the previous two parts of this activity demonstrate that \(AP=PD\text{.}\) Using the fact that the eigenvectors \(\vvec_1\) and \(\vvec_2\) form a basis of \(\real^2\text{,}\) explain why \(P\) is invertible and why we must have \(A=PDP^{-1}\text{.}\)
Suppose that \(A=\begin{bmatrix}
-3 \amp 6 \\
3 \amp 0 \\
\end{bmatrix}\text{.}\) Verify that \(\vvec_1=\twovec11\) and \(\vvec_2=\twovec2{-1}\) are eigenvectors of \(A\) with eigenvalues \(\lambda_1 = 3\) and \(\lambda_2=-6\text{.}\)
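Here is a short NumPy sketch, offered as an optional check, that verifies this diagonalization numerically.

```python
# Optional check: verify that AP = PD and that A = P D P^{-1}.
import numpy as np

A = np.array([[-3, 6],
              [ 3, 0]])
P = np.array([[1,  2],
              [1, -1]])          # columns are the eigenvectors v1 and v2
D = np.diag([3, -6])             # the associated eigenvalues, in order

print(A @ P)                     # equals P @ D
print(P @ D @ np.linalg.inv(P))  # reconstructs A
```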
More generally, suppose that we have an \(n\times n\) matrix \(A\) and that there is a basis of \(\real^n\) consisting of eigenvectors \(\vvec_1,\vvec_2,\ldots,\vvec_n\) of \(A\) with associated eigenvalues \(\lambda_1,\lambda_2,\ldots,\lambda_n\text{.}\) If we use the eigenvectors to form the matrix
\begin{equation*}
P = \begin{bmatrix}
\vvec_1 \amp \vvec_2 \amp \cdots \amp \vvec_n
\end{bmatrix}
\end{equation*}
and the eigenvalues to form the diagonal matrix
\begin{equation*}
D = \begin{bmatrix}
\lambda_1 \amp 0 \amp \cdots \amp 0 \\
0 \amp \lambda_2 \amp \cdots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
0 \amp 0 \amp \cdots \amp \lambda_n \\
\end{bmatrix}\text{,}
\end{equation*}
then the same reasoning shows that \(AP = PD\) and hence that \(A = PDP^{-1}\text{.}\) We say that \(A\) is diagonalizable in this case.
If \(A\) is an \(n\times n\) matrix and there is a basis \(\{\vvec_1,\vvec_2,\ldots,\vvec_n\}\) of \(\real^n\) consisting of eigenvectors of \(A\) having associated eigenvalues \(\lambda_1, \lambda_2, \ldots, \lambda_n\text{,}\) then we can write \(A=PDP^{-1}\text{,}\) where \(D\) is the diagonal matrix whose diagonal entries are the eigenvalues of \(A\) and \(P\) is the matrix whose columns are the eigenvectors \(\vvec_1,\vvec_2,\ldots,\vvec_n\text{.}\)
We have seen that \(A = \begin{bmatrix}
1 \amp 2 \\
2 \amp 1 \\
\end{bmatrix}\) has eigenvectors \(\vvec_1 = \twovec11\) and \(\vvec_2=\twovec{-1}1\) with associated eigenvalues \(\lambda_1 = 3\) and \(\lambda_2 = -1\text{.}\) Forming the matrices
\begin{equation*}
P = \begin{bmatrix}
1 \amp -1 \\
1 \amp 1 \\
\end{bmatrix},
\qquad
D = \begin{bmatrix}
3 \amp 0 \\
0 \amp -1 \\
\end{bmatrix},
\end{equation*}
we therefore have \(A = PDP^{-1}\text{.}\)
This is the sense in which we mean that \(A\) is equivalent to a diagonal matrix \(D\text{.}\) The expression \(A=PDP^{-1}\) says that \(A\text{,}\) expressed in the basis defined by the columns of \(P\text{,}\) has the same geometric effect as \(D\text{,}\) expressed in the standard basis \(\evec_1, \evec_2,\ldots,\evec_n\text{.}\)
By constructing \(\nul(A-(-2)I)\text{,}\) we find a basis for \(E_{-2}\) consisting of the vector \(\vvec_1 =
\twovec{2}{1}\text{.}\) Similarly, a basis for \(E_1\) consists of the vector \(\vvec_2 = \twovec{1}{1}\text{.}\) This shows that we can construct a basis \(\{\vvec_1,\vvec_2\}\) of \(\real^2\) consisting of eigenvectors of \(A\text{.}\)
If we choose a different basis for the eigenspaces, we will also find a different matrix \(P\) that diagonalizes \(A\text{.}\) The point is that there are many ways in which \(A\) can be written in the form \(A=PDP^{-1}\text{.}\)
In fact, if we only know that \(A = PDP^{-1}\text{,}\) we can say that the columns of \(P\) are eigenvectors of \(A\) and that the diagonal entries of \(D\) are the associated eigenvalues.
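To see why, rewrite \(A=PDP^{-1}\) as \(AP = PD\) and compare columns:
\begin{equation*}
AP = \begin{bmatrix}
A\vvec_1 \amp A\vvec_2 \amp \cdots \amp A\vvec_n
\end{bmatrix},
\qquad
PD = \begin{bmatrix}
\lambda_1\vvec_1 \amp \lambda_2\vvec_2 \amp \cdots \amp \lambda_n\vvec_n
\end{bmatrix},
\end{equation*}
so each column \(\vvec_i\) of \(P\) satisfies \(A\vvec_i = \lambda_i\vvec_i\text{.}\)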
The columns of \(P\) form eigenvectors of \(A\) so that \(\vvec_1 =
\twovec{1}{1}\) is an eigenvector of \(A\) with eigenvalue \(\lambda_1 = 2\) and \(\vvec_2 =
\twovec{1}{2}\) is an eigenvector with eigenvalue \(\lambda_2=-2\text{.}\)
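The matrix \(A\) itself is not written out here, but we can rebuild it from this eigen-data and confirm the claim; the NumPy sketch below is an illustration under that reconstruction.

```python
# Rebuild a matrix with the stated eigen-data and check the eigenpairs.
import numpy as np

P = np.array([[1, 1],
              [1, 2]])       # columns are v1 and v2
D = np.diag([2, -2])         # the associated eigenvalues

A = P @ D @ np.linalg.inv(P)
print(A @ np.array([1, 1]))  # [ 2  2], which is  2 * v1
print(A @ np.array([1, 2]))  # [-2 -4], which is -2 * v2
```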
In several earlier examples, we have been interested in computing powers of a given matrix. For instance, in Activity 4.1.3, we had the matrix \(A = \left[\begin{array}{rr}
0.8 \amp 0.6 \\
0.2 \amp 0.4 \\
\end{array}\right]\) and an initial vector \(\xvec_0=\ctwovec{1000}{0}\text{,}\) and we wanted to compute the vectors \(\xvec_k=A^k\xvec_0\) and determine what happens as \(k\) becomes very large. If a matrix \(A\) is diagonalizable, writing \(A=PDP^{-1}\) helps us understand its powers more easily, as we will now see.
Suppose that \(A\) is a matrix with eigenvector \(\vvec\) and associated eigenvalue \(\lambda\text{;}\) that is, \(A\vvec = \lambda\vvec\text{.}\) By considering \(A^2\vvec\text{,}\) explain why \(\vvec\) is also an eigenvector of \(A\) with eigenvalue \(\lambda^2\text{.}\)
Now suppose that \(A = PDP^{-1}\text{.}\) Remembering that the columns of \(P\) are eigenvectors of \(A\text{,}\) explain why \(A^2\) is diagonalizable and find a diagonalization in terms of \(P\) and \(D\text{.}\)
Suppose that \(A\) is a diagonalizable \(2\times2\) matrix with eigenvalues \(\lambda_1 =
0.5\) and \(\lambda_2=0.1\text{.}\) What happens to \(A^k\) as \(k\) becomes very large?
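More generally, if \(A = PDP^{-1}\text{,}\) the intermediate factors cancel when we compute powers:
\begin{equation*}
A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}\text{,}
\end{equation*}
and the same cancellation shows that \(A^k = PD^kP^{-1}\) for every positive integer \(k\text{.}\) The following NumPy sketch illustrates the last part of the activity; the matrix \(P\) is an arbitrary invertible choice for illustration.

```python
# Sketch: with eigenvalues 0.5 and 0.1, the powers A^k = P D^k P^{-1}
# approach the zero matrix, since D^k = diag(0.5^k, 0.1^k) -> 0.
import numpy as np

P = np.array([[1.0, 1.0],
              [0.0, 1.0]])     # an arbitrary invertible matrix
D = np.diag([0.5, 0.1])
A = P @ D @ np.linalg.inv(P)

print(np.linalg.matrix_power(A, 20))  # entries near 1e-6 or smaller
```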
Let's revisit Activity 4.1.3 where we had the matrix \(A = \begin{bmatrix}
0.8 \amp 0.6 \\
0.2 \amp 0.4 \\
\end{bmatrix}\) and the initial vector \(\xvec_0 = \ctwovec{1000}0\text{.}\) We were interested in understanding the sequence of vectors \(\xvec_{k+1} = A\xvec_k\text{,}\) which means that \(\xvec_k =
A^k\xvec_0\text{.}\)
We can verify that \(\vvec_1 = \twovec31\) and \(\vvec_2 =
\twovec{-1}1\) are eigenvectors of \(A\) having associated eigenvalues \(\lambda_1=1\) and \(\lambda_2 =
0.2\text{.}\) This means that \(A = PDP^{-1}\) where
\begin{equation*}
P = \begin{bmatrix}
3 \amp -1 \\
1 \amp 1 \\
\end{bmatrix},
\qquad
D = \begin{bmatrix}
1 \amp 0 \\
0 \amp 0.2 \\
\end{bmatrix}\text{.}
\end{equation*}
Notice that \(D^k = \begin{bmatrix}
1^k \amp 0 \\
0 \amp 0.2^k \\
\end{bmatrix}
= \begin{bmatrix}
1 \amp 0 \\
0 \amp 0.2^k
\end{bmatrix}
\text{.}\) As \(k\) increases, \(0.2^k\) approaches zero. This means that for very large powers \(k\text{,}\) we have
\begin{equation*}
A^k = PD^kP^{-1} \approx P\begin{bmatrix}
1 \amp 0 \\
0 \amp 0 \\
\end{bmatrix}P^{-1}\text{.}
\end{equation*}
Beginning with the vector \(\xvec_0 = \ctwovec{1000}{0}\text{,}\) we find that \(\xvec_k = A^k\xvec_0\approx
\twovec{750}{250}\) when \(k\) is very large.
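A direct computation confirms this limiting behavior; here is a brief NumPy check.

```python
# Check: A^k x0 approaches [750, 250] for large k.
import numpy as np

A = np.array([[0.8, 0.6],
              [0.2, 0.4]])
x0 = np.array([1000, 0])

print(np.linalg.matrix_power(A, 50) @ x0)  # approximately [750. 250.]
```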
We have been interested in diagonalizing a matrix \(A\) because doing so relates a matrix \(A\) to a simpler diagonal matrix \(D\text{.}\) In particular, the effect of multiplying a vector by \(A=PDP^{-1}\text{,}\) viewed in the basis defined by the columns of \(P\text{,}\) is the same as the effect of multiplying by \(D\) in the standard basis.
While many matrices are diagonalizable, there are some that are not. For example, if a matrix has complex eigenvalues, it is not possible to find a basis of \(\real^n\) consisting of eigenvectors, which means that the matrix is not diagonalizable. In this case, however, we can still relate the matrix to a simpler form that explains the geometric effect this matrix has on vectors.
Recall that we say two matrices \(A\) and \(B\) are similar if there is an invertible matrix \(P\) such that \(A=PBP^{-1}\text{.}\) Notice that a matrix is then diagonalizable if and only if it is similar to a diagonal matrix. When a matrix \(A\) has complex eigenvalues, we will find a simpler matrix \(C\) that is similar to \(A\) and note that \(A=PCP^{-1}\) has the same effect, when viewed in the basis defined by the columns of \(P\text{,}\) as \(C\text{,}\) when viewed in the standard basis.
To begin, suppose that \(A\) is a \(2\times2\) matrix having a complex eigenvalue \(\lambda = a+bi\text{.}\) It turns out that \(A\) is similar to \(C=\begin{bmatrix}
a \amp -b \\
b \amp a \\
\end{bmatrix}
\text{.}\)
The next activity shows that \(C\) has a simple geometric effect on \(\real^2\text{.}\) First, however, we will use polar coordinates to rewrite \(C\text{.}\) As shown in the figure, the point \((a,b)\) defines \(r\text{,}\) the distance from the origin, and \(\theta\text{,}\) the angle formed with the positive horizontal axis. We then have
\begin{equation*}
\begin{aligned}
a \amp {}={} r\cos\theta \\
b \amp {}={} r\sin\theta\text{.} \\
\end{aligned}
\end{equation*}
Notice that the Pythagorean theorem says that \(r=\sqrt{a^2+b^2}\text{.}\)
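Substituting these expressions into \(C\) shows that \(C\) factors into a scaling followed by a rotation:
\begin{equation*}
C = \begin{bmatrix}
a \amp -b \\
b \amp a \\
\end{bmatrix}
= \begin{bmatrix}
r\cos\theta \amp -r\sin\theta \\
r\sin\theta \amp r\cos\theta \\
\end{bmatrix}
= \begin{bmatrix}
r \amp 0 \\
0 \amp r \\
\end{bmatrix}
\begin{bmatrix}
\cos\theta \amp -\sin\theta \\
\sin\theta \amp \cos\theta \\
\end{bmatrix}\text{.}
\end{equation*}
That is, multiplying by \(C\) rotates vectors through the angle \(\theta\) and scales them by the factor \(r\text{.}\)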
Consider now a matrix \(A\) whose eigenvalues are \(\lambda_1 = 1+i\) and \(\lambda_2 = 1-i\text{.}\) We will choose to focus on one of the eigenvalues, \(\lambda_1 = a+bi = 1+i\text{.}\)
Form the matrix \(C\) using these values of \(a\) and \(b\text{.}\) Then rewrite the point \((a,b)\) in polar coordinates by identifying the values of \(r\) and \(\theta\text{.}\) Explain the geometric effect of multiplying vectors by \(C\text{.}\)
We formed the matrix \(C\) by choosing the eigenvalue \(\lambda_1=1+i\text{.}\) Suppose we had instead chosen \(\lambda_2 = 1-i\text{.}\) Form the matrix \(C'\) and use polar coordinates to describe the geometric effect of \(C'\text{.}\)
If the \(2\times2\) matrix \(A\) has a complex eigenvalue \(\lambda = a + bi\text{,}\) it turns out that \(A\) is always similar to the matrix \(C = \left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array}\right],\) whose geometric effect on vectors can be described in terms of a rotation and a scaling. There is, in fact, a method for finding the matrix \(P\) so that \(A=PCP^{-1}\text{,}\) which we'll see in Exercise 4.3.5.8. For now, we note that \(A\) has the same geometric effect as \(C\text{,}\) when viewed in the basis provided by the columns of \(P\text{.}\) We will put this fact to use in the next section to understand certain dynamical systems.
If \(A\) is a \(2\times2\) matrix with a complex eigenvalue \(\lambda = a + bi\text{,}\) then \(A\) is similar to \(C = \begin{bmatrix}
a \amp -b \\
b \amp a \\
\end{bmatrix}\text{;}\) that is, there is a matrix \(P\) such that \(A= PCP^{-1}\text{.}\)
Our goal in this section has been to use the eigenvalues and eigenvectors of a matrix \(A\) to relate \(A\) to a simpler matrix.
We said that \(A\) is diagonalizable if we can write \(A = PDP^{-1}\) where \(D\) is a diagonal matrix. The columns of \(P\) consist of eigenvectors of \(A\) and the diagonal entries of \(D\) are the associated eigenvalues.
We said that \(A\) and \(B\) are similar if there is an invertible matrix \(P\) such that \(A=PBP^{-1}\text{.}\) In this case, \(A^k = PB^kP^{-1}\text{.}\)
If \(A\) is a \(2\times2\) matrix with complex eigenvalue \(\lambda = a+bi\text{,}\) then \(A\) is similar to \(C = \left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array} \right]
\text{.}\) Writing the point \((a,b)\) in polar coordinates \(r\) and \(\theta\text{,}\) we see that \(C\) rotates vectors through an angle \(\theta\) and scales them by a factor of \(r=\sqrt{a^2+b^2}\text{.}\)
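As a concrete check of this summary, take the eigenvalue \(\lambda = 1+i\) considered earlier, so that \(r=\sqrt{2}\) and \(\theta=\pi/4\text{;}\) the NumPy sketch below confirms the rotation-scaling description.

```python
# Check: C built from lambda = 1 + i rotates by pi/4 and scales by sqrt(2),
# so C^8 rotates through a full turn and scales by (sqrt(2))^8 = 16.
import numpy as np

a, b = 1, 1
C = np.array([[a, -b],
              [b,  a]])

print(np.hypot(a, b), np.arctan2(b, a))  # r = 1.414..., theta = 0.785...
print(np.linalg.matrix_power(C, 8))      # equals 16 times the identity
```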
If \(A\) and \(B\) are similar, explain why \(A\) and \(B\) have the same characteristic polynomial; that is, explain why \(\det(A-\lambda I) =
\det(B-\lambda I)\text{.}\)
When \(A\) is a \(2\times2\) matrix with a complex eigenvalue \(\lambda = a+bi\text{,}\) we have said that there is a matrix \(P\) such that \(A=PCP^{-1}\) where \(C=\left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array}\right]
\text{.}\) In this exercise, we will learn how to find the matrix \(P\text{.}\) As an example, we will consider the matrix \(A = \left[\begin{array}{rr}
2 \amp 2 \\
-1 \amp 4 \\
\end{array}\right]
\text{.}\)
Using the same eigenvalue, we will find an eigenvector \(\vvec\) where the entries of \(\vvec\) are complex numbers. As always, we will describe \(\nul(A-\lambda I)\) by constructing the matrix \(A-\lambda I\) and finding its reduced row echelon form. In doing so, we will necessarily need to use complex arithmetic.
We have now found a complex eigenvector \(\vvec\text{.}\) Write \(\vvec = \vvec_1 - i \vvec_2\) to identify vectors \(\vvec_1\) and \(\vvec_2\) having real entries.
Consider a matrix of the form \(C=\left[\begin{array}{rr}
a \amp -b \\
b \amp a \\
\end{array}\right]\) with \(r=\sqrt{a^2+b^2}\text{.}\) What happens to \(C^k\) when \(k\) becomes very large if \(r\lt 1\text{?}\) If \(r=1\text{?}\) If \(r\gt 1\text{?}\)
If \(A\) is a \(2\times2\) matrix with eigenvalues \(\lambda_1=0.7\) and \(\lambda_2=0.5\) and \(\xvec\) is any vector, what happens to \(A^k\xvec\) when \(k\) becomes very large?