In this section, we will revisit the theory of eigenvalues and eigenvectors for the special class of matrices that are symmetric, meaning that the matrix equals its transpose. This understanding of symmetric matrices will enable us to form singular value decompositions later in the chapter. We’ll also begin studying variance in this section as it provides an important context that motivates some of our later work.
To begin, remember that if $A$ is a square matrix, we say that $\mathbf v$ is an eigenvector of $A$ with associated eigenvalue $\lambda$ if $A\mathbf v = \lambda\mathbf v$. In other words, for these special vectors, the operation of matrix multiplication simplifies to scalar multiplication.
The preview activity asks us to compare the matrix transformations defined by two matrices, a diagonal matrix $D$ and a matrix $A$ whose eigenvectors are given to us. The transformation defined by $D$ stretches horizontally by a factor of 3 and reflects in the horizontal axis, as shown in Figure 7.1.2.
By contrast, the transformation defined by $A$ stretches the plane by a factor of 3 in the direction of $\mathbf v_1$ and reflects in the line defined by $\mathbf v_1$, as seen in Figure 7.1.3.
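In this way, we see that the matrix transformations defined by these two matrices are equivalent after a rotation. This notion of equivalence is what we called similarity in Section 4.3. There we considered a square $m\times m$ matrix $A$ that provided enough eigenvectors to form a basis of $\mathbb R^m$. For example, suppose we can construct a basis for $\mathbb R^m$ using eigenvectors $\mathbf v_1, \mathbf v_2, \ldots, \mathbf v_m$ having associated eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. Forming the matrices,
$$P = \begin{bmatrix} \mathbf v_1 & \mathbf v_2 & \cdots & \mathbf v_m \end{bmatrix}, \qquad D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_m \end{bmatrix},$$
enables us to write $A = PDP^{-1}$.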
Notice that the matrix $A$ has eigenvectors $\mathbf v_1$ and $\mathbf v_2$ that not only form a basis for $\mathbb R^2$ but, in fact, form an orthogonal basis for $\mathbb R^2$. Given the prominent role played by orthogonal bases in the last chapter, we would like to understand what conditions on a matrix enable us to form an orthogonal basis of eigenvectors.
Remember that the Sage command A.right_eigenmatrix() attempts to find a basis for $\mathbb R^m$ consisting of eigenvectors of $A$. In particular, the assignment D, P = A.right_eigenmatrix() provides a diagonal matrix $D$ constructed from the eigenvalues of $A$ with the columns of $P$ containing the associated eigenvectors.
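The relationship among $A$, $P$, and $D$ can also be checked by hand. The following is a minimal sketch in plain Python rather than Sage; the matrix `A`, its eigenvectors `v1` and `v2`, and the helper functions `matmul` and `inverse_2x2` are hypothetical choices made for illustration and do not come from the text.

```python
def matmul(X, Y):
    """Multiply two matrices stored as nested lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse_2x2(M):
    """Invert a 2x2 matrix using the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical example matrix (not one of the matrices from the text).
A = [[2, 1],
     [0, 3]]

# Eigenvectors found by hand: A v1 = 2 v1 and A v2 = 3 v2.
v1, v2 = (1, 0), (1, 1)

P = [[v1[0], v2[0]],
     [v1[1], v2[1]]]   # eigenvectors as columns
D = [[2, 0],
     [0, 3]]           # matching eigenvalues on the diagonal

# Rebuild A from its eigenvectors and eigenvalues.
reconstructed = matmul(matmul(P, D), inverse_2x2(P))
print(reconstructed)

# A nonzero dot product means this eigenvector basis is not orthogonal.
eigenvector_dot = v1[0] * v2[0] + v1[1] * v2[1]
print(eigenvector_dot)
```

In this example the product $PDP^{-1}$ recovers the matrix, yet the two eigenvectors are not orthogonal, so a basis of eigenvectors need not be an orthogonal basis.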
For each of the following matrices, determine whether there is a basis for $\mathbb R^m$ consisting of eigenvectors of that matrix. When there is such a basis, form the matrices $P$ and $D$ and verify that the matrix equals $PDP^{-1}$.
.
.
.
.
For which of these examples is it possible to form an orthogonal basis for $\mathbb R^m$ consisting of eigenvectors?
For any such matrix $A$, find an orthonormal basis of eigenvectors and explain why $A = QDQ^T$ where $Q$ is an orthogonal matrix.
The examples in this activity illustrate a range of possibilities. First, a matrix may have complex eigenvalues, in which case it will not be diagonalizable using real matrices. Second, even if all the eigenvalues are real, there may not be a basis of eigenvectors if the dimension of one of the eigenspaces is less than the algebraic multiplicity of the associated eigenvalue.
We are interested in matrices for which there is an orthogonal basis of eigenvectors. When this happens, we can create an orthonormal basis of eigenvectors by scaling each eigenvector in the basis so that its length is 1. Putting these orthonormal vectors into a matrix $Q$ produces an orthogonal matrix, which means that $Q^TQ = I$ and hence $Q^{-1} = Q^T$. We then have
$$A = QDQ^{-1} = QDQ^T.$$
If there is an orthonormal basis of $\mathbb R^m$ consisting of eigenvectors of the matrix $A$, we say that $A$ is orthogonally diagonalizable. In particular, we can write $A = QDQ^T$ where $Q$ is an orthogonal matrix.
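To make the definition concrete, here is a minimal sketch in plain Python that builds an orthogonal diagonalization of a symmetric $2\times2$ matrix from the roots of its characteristic polynomial. It is limited to the $2\times2$ case; the matrix `A` and the function name `orthogonally_diagonalize_2x2` are hypothetical choices made for illustration, not something taken from the text or from Sage.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as nested lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    """Return the transpose of a matrix stored as nested lists of rows."""
    return [list(column) for column in zip(*X)]

def orthogonally_diagonalize_2x2(A):
    """Return (Q, D) with A = Q D Q^T for a symmetric 2x2 matrix A."""
    (a, b), (c, d) = A
    if b != c:
        raise ValueError("A must be symmetric")
    if b == 0:
        # Already diagonal: the standard basis vectors are eigenvectors.
        return [[1.0, 0.0], [0.0, 1.0]], [[float(a), 0.0], [0.0, float(d)]]
    # Roots of the characteristic polynomial of [[a, b], [b, d]].
    center = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    eigenvalues = (center + radius, center - radius)
    # When b != 0, the vector (b, lam - a) satisfies (A - lam I) v = 0.
    unit_vectors = []
    for lam in eigenvalues:
        v = (b, lam - a)
        length = math.hypot(v[0], v[1])
        unit_vectors.append((v[0] / length, v[1] / length))
    (q11, q21), (q12, q22) = unit_vectors
    Q = [[q11, q12], [q21, q22]]   # unit eigenvectors as columns
    D = [[eigenvalues[0], 0.0], [0.0, eigenvalues[1]]]
    return Q, D

# Hypothetical symmetric matrix chosen for illustration.
A = [[1.0, 2.0], [2.0, 1.0]]
Q, D = orthogonally_diagonalize_2x2(A)

QtQ = matmul(transpose(Q), Q)               # should be close to the identity
QDQt = matmul(matmul(Q, D), transpose(Q))   # should be close to A
print(QtQ)
print(QDQt)
```

Up to rounding, `QtQ` is the identity matrix, which confirms that the columns of `Q` are orthonormal, and `QDQt` agrees with `A`.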
Consider the matrix , which has eigenvectors , with associated eigenvalue , and , with associated eigenvalue . Notice that and are orthogonal so we can form an orthonormal basis of eigenvectors:
If , then there is an orthogonal basis of eigenvectors and with eigenvalues and . Using these eigenvectors, we form the orthogonal matrix consisting of eigenvectors and the diagonal matrix , where
Notice that the matrix transformation represented by $Q$ is a rotation while that represented by $Q^T = Q^{-1}$ is the rotation by the same angle in the opposite direction. Therefore, if we multiply a vector $\mathbf x$ by $A$, we can decompose the multiplication as
$$A\mathbf x = Q(D(Q^T\mathbf x)).$$
That is, we first apply the rotation $Q^T$, then apply the diagonal matrix $D$, which stretches and reflects, and finally apply the rotation $Q$. We may visualize this factorization as in Figure 7.1.8.
Figure 7.1.8. The transformation defined by $A = QDQ^T$ can be interpreted as a sequence of geometric transformations: $Q^T$ rotates, $D$ stretches and reflects, and $Q$ rotates back.
In fact, a similar picture holds any time the matrix is orthogonally diagonalizable.
We have seen that a matrix that is orthogonally diagonalizable must be symmetric. In fact, it turns out that any symmetric matrix is orthogonally diagonalizable. We record this fact in the next theorem.
Each of the following matrices is symmetric so the Spectral Theorem tells us that each is orthogonally diagonalizable. The point of this activity is to find an orthogonal diagonalization for each matrix.
To begin, find a basis for each eigenspace. Use this basis to find an orthogonal basis for each eigenspace and put these bases together to find an orthogonal basis for $\mathbb R^m$ consisting of eigenvectors. Use this basis to write an orthogonal diagonalization of the matrix.
We won’t justify the first two facts here since that would take us rather far afield. However, it will be helpful to explain the third fact. To begin, notice the following:
Many of the ideas we’ll encounter in this chapter, such as orthogonal diagonalizations, can be applied to the study of data. In fact, it can be useful to understand these applications because they provide an important context in which mathematical ideas have a more concrete meaning and their motivation appears more clearly. For that reason, we will now introduce the statistical concept of variance as a way to gain insight into the significance of orthogonal diagonalizations.
Find the centroid, or mean, $\overline{\mathbf d} = \frac1N\sum_j \mathbf d_j$, where $\mathbf d_1, \mathbf d_2, \ldots, \mathbf d_N$ are the data points. Then plot the data points and their centroid in Figure 7.1.12.
Figure 7.1.12. Plot the data points and their centroid here.
Notice that the centroid lies in the center of the data so the spread of the data will be measured by how far away the points are from the centroid. To simplify our calculations, find the demeaned data points
$$\widetilde{\mathbf d}_j = \mathbf d_j - \overline{\mathbf d}.$$
Now that the data has been demeaned, we will define the total variance as the average of the squares of the distances from the origin; that is, the total variance is
$$V = \frac1N\sum_j \left|\widetilde{\mathbf d}_j\right|^2.$$
Find the total variance for our set of three points.
Now plot the projections of the demeaned data onto the $x$ and $y$ axes using Figure 7.1.14 and find the variances $V_x$ and $V_y$ of the projected points.
Figure 7.1.14. Plot the projections of the demeaned data onto the $x$ and $y$ axes.
Which of the variances, $V_x$ and $V_y$, is larger and how does the plot of the projected points explain your response?
What do you notice about the relationship between $V$, $V_x$, and $V_y$? How does the Pythagorean theorem explain this relationship?
Plot the projections of the demeaned data points onto the lines defined by vectors $\mathbf v_1$ and $\mathbf v_2$ using Figure 7.1.15 and find the variances $V_{\mathbf v_1}$ and $V_{\mathbf v_2}$ of these projected points.
Figure 7.1.15. Plot the projections of the demeaned data onto the lines defined by $\mathbf v_1$ and $\mathbf v_2$.
What is the relationship between the total variance $V$ and $V_{\mathbf v_1}$ and $V_{\mathbf v_2}$? How does the Pythagorean theorem explain your response?
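Notice that variance enjoys an additivity property. Consider, for instance, the situation where our data points are two-dimensional and suppose that the demeaned points are $\widetilde{\mathbf d}_j = (\widetilde x_j, \widetilde y_j)$. We have
$$\left|\widetilde{\mathbf d}_j\right|^2 = \widetilde x_j^2 + \widetilde y_j^2.$$
Averaging over all the data points shows that the total variance is the sum of the variances in the $x$ and $y$ directions: $V = V_x + V_y$.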
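More generally, suppose that we have an orthonormal basis $\mathbf u_1$ and $\mathbf u_2$. If we project the demeaned points onto the line defined by $\mathbf u_1$, we obtain the points $(\widetilde{\mathbf d}_j\cdot\mathbf u_1)\mathbf u_1$ so that
$$V_{\mathbf u_1} = \frac1N\sum_j \left|(\widetilde{\mathbf d}_j\cdot\mathbf u_1)\mathbf u_1\right|^2 = \frac1N\sum_j (\widetilde{\mathbf d}_j\cdot\mathbf u_1)^2.$$
Because $\widetilde{\mathbf d}_j = (\widetilde{\mathbf d}_j\cdot\mathbf u_1)\mathbf u_1 + (\widetilde{\mathbf d}_j\cdot\mathbf u_2)\mathbf u_2$, we also have
$$\left|\widetilde{\mathbf d}_j\right|^2 = (\widetilde{\mathbf d}_j\cdot\mathbf u_1)^2 + (\widetilde{\mathbf d}_j\cdot\mathbf u_2)^2$$
since $\mathbf u_1\cdot\mathbf u_2 = 0$ and $|\mathbf u_1| = |\mathbf u_2| = 1$. When we average over all the data points, we find that the total variance $V$ is the sum of the variances in the $\mathbf u_1$ and $\mathbf u_2$ directions. This leads to the following proposition, in which this observation is expressed more generally.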
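The additivity of variance can be checked numerically. The following is a minimal sketch in plain Python; the three data points and the orthonormal vectors `u1` and `u2` are hypothetical values chosen for illustration, and the helper names `dot` and `variance_along` are not from the text.

```python
import math

# Hypothetical two-dimensional data points (not the dataset from the text).
points = [(0.0, 0.0), (2.0, 1.0), (4.0, 5.0)]
N = len(points)

# Centroid and demeaned points.
centroid = tuple(sum(p[i] for p in points) / N for i in range(2))
demeaned = [(p[0] - centroid[0], p[1] - centroid[1]) for p in points]

def dot(v, w):
    """Dot product of two vectors stored as tuples."""
    return sum(a * b for a, b in zip(v, w))

def variance_along(u, pts):
    """Variance of demeaned points projected onto the line of the unit vector u."""
    return sum(dot(p, u) ** 2 for p in pts) / len(pts)

# Total variance: average squared distance from the origin.
total_variance = sum(dot(p, p) for p in demeaned) / N

# A hypothetical orthonormal basis of the plane.
u1 = (3 / 5, 4 / 5)
u2 = (-4 / 5, 3 / 5)

V_u1 = variance_along(u1, demeaned)
V_u2 = variance_along(u2, demeaned)

print(total_variance, V_u1 + V_u2)
```

The two printed numbers agree up to rounding, and the same holds for any other orthonormal pair substituted for `u1` and `u2`.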
The next activity demonstrates a more efficient way to find the variance in a particular direction and connects our discussion of variance with symmetric matrices.
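In general, the matrix $C = \frac1N AA^T$, where $A$ is the matrix whose columns are the demeaned data points, is called the covariance matrix of the dataset, and it is useful because the variance $V_{\mathbf u} = \mathbf u\cdot(C\mathbf u)$, as we have just seen. Find the matrix $C$ for our dataset with three points.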
Use the covariance matrix to find the variance when .
Use the covariance matrix to find the variance when . Since and are orthogonal, verify that the sum of and gives the total variance.
Explain why the covariance matrix is a symmetric matrix.
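Writing $C = \frac1N AA^T$, where $A$ is the matrix whose columns are the demeaned data points, we have
$$C^T = \frac1N\left(AA^T\right)^T = \frac1N\left(A^T\right)^TA^T = \frac1N AA^T = C,$$
which tells us that $C$ is symmetric. In particular, we know that it is orthogonally diagonalizable, an observation that will play an important role in the future.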
If $C$ is the covariance matrix associated to a demeaned dataset and $\mathbf u$ is a unit vector, then the variance of the demeaned points projected onto the line defined by $\mathbf u$ is $V_{\mathbf u} = \mathbf u\cdot(C\mathbf u)$.
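As a quick check of this formula, the sketch below computes the variance in a direction in two ways: directly from the projected points and through the covariance matrix. It is written in plain Python and assumes the convention that the covariance matrix is $\frac1N AA^T$ with the demeaned points as the columns of $A$; the demeaned points, the unit vector `u`, and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical demeaned data points (their coordinates sum to zero).
demeaned = [(-2.0, -2.0), (0.0, -1.0), (2.0, 3.0)]

def covariance_matrix(pts):
    """Return C = (1/N) A A^T, where the columns of A are the demeaned points."""
    n = len(pts)
    dim = len(pts[0])
    return [[sum(p[i] * p[j] for p in pts) / n for j in range(dim)]
            for i in range(dim)]

def variance_from_covariance(C, u):
    """Compute u . (C u) for a unit vector u."""
    Cu = [sum(C[i][j] * u[j] for j in range(len(u))) for i in range(len(u))]
    return sum(u[i] * Cu[i] for i in range(len(u)))

def variance_from_projections(pts, u):
    """Average the squared lengths of the projections onto the line of u."""
    return sum(sum(a * b for a, b in zip(p, u)) ** 2 for p in pts) / len(pts)

C = covariance_matrix(demeaned)
u = (3 / 5, 4 / 5)   # hypothetical unit vector

via_covariance = variance_from_covariance(C, u)
via_projections = variance_from_projections(demeaned, u)

print(C)
print(via_covariance, via_projections)
```

The advantage of the covariance matrix is that it is computed once from the data; after that, the variance in any direction requires only a small matrix-vector product rather than another pass over all the points.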
Our goal in the future will be to find directions where the variance is as large as possible and directions where it is as small as possible. The next activity demonstrates why this is useful.
Evaluating the following Sage cell loads a dataset consisting of 100 demeaned data points and provides a plot of them. It also provides the demeaned data matrix $A$.
This activity illustrates how variance can identify a line along which the data are concentrated. When the data primarily lie along a line defined by a vector $\mathbf u$, then the variance in that direction will be large while the variance in an orthogonal direction will be small.
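One way to see this numerically is to scan over directions and record the variance in each. The sketch below does this by brute force in plain Python, stepping through whole-degree angles; it is only an illustration under stated assumptions, not the method developed in the text. The five data points are hypothetical and were chosen to lie close to the line through the origin in the direction $(2, 1)$.

```python
import math

# Hypothetical data lying close to the line in the direction (2, 1).
points = [(-4.1, -1.8), (-1.8, -1.4), (-0.2, 0.4), (2.2, 0.6), (3.9, 2.2)]
N = len(points)

# Demean the data.
centroid = tuple(sum(p[i] for p in points) / N for i in range(2))
demeaned = [(p[0] - centroid[0], p[1] - centroid[1]) for p in points]

def variance_at_angle(degrees):
    """Variance of the demeaned points along the unit vector at this angle."""
    theta = math.radians(degrees)
    u = (math.cos(theta), math.sin(theta))
    return sum((p[0] * u[0] + p[1] * u[1]) ** 2 for p in demeaned) / N

# Scan directions in one-degree steps; angles 0..179 cover every line once.
angles = range(180)
best_angle = max(angles, key=variance_at_angle)
worst_angle = min(angles, key=variance_at_angle)

total_variance = sum(p[0] ** 2 + p[1] ** 2 for p in demeaned) / N

print(best_angle, variance_at_angle(best_angle))
print(worst_angle, variance_at_angle(worst_angle))
print(total_variance)
```

For this dataset the direction of largest variance lies within a degree of the direction $(2, 1)$, the direction of smallest variance is perpendicular to it, and the two variances add up to the total variance.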
In the next section, we will use an orthogonal diagonalization of the covariance matrix to find the directions having the greatest and smallest variances. In this way, we will be able to determine when data are concentrated along a line or subspace.
A matrix $A$ is orthogonally diagonalizable if there is an orthonormal basis of eigenvectors. In particular, we can write $A = QDQ^T$, where $D$ is a diagonal matrix of eigenvalues and $Q$ is an orthogonal matrix of eigenvectors.
The Spectral Theorem tells us that a matrix $A$ is orthogonally diagonalizable if and only if it is symmetric; that is, $A = A^T$.
The variance of a dataset can be computed using the covariance matrix $C = \frac1N AA^T$, where $A$ is the matrix whose columns are the demeaned data points. In particular, the variance of the demeaned data points projected onto the line defined by the unit vector $\mathbf u$ is $V_{\mathbf u} = \mathbf u\cdot(C\mathbf u)$.
Variance is additive so that if $W$ is a subspace with orthonormal basis $\mathbf u_1, \mathbf u_2, \ldots, \mathbf u_n$, then
$$V_W = V_{\mathbf u_1} + V_{\mathbf u_2} + \cdots + V_{\mathbf u_n}.$$
For each of the following matrices, find the eigenvalues and a basis for each eigenspace. Determine whether the matrix is diagonalizable and, if so, find a diagonalization. Determine whether the matrix is orthogonally diagonalizable and, if so, find an orthogonal diagonalization.