In this section, we will revisit the theory of eigenvalues and eigenvectors for the special class of matrices that are symmetric, meaning that the matrix equals its transpose. This understanding of symmetric matrices will enable us to form singular value decompositions later in the chapter. We'll also begin studying variance in this section as it provides an important context that motivates some of our later work.
To begin, remember that if \(A\) is a square matrix, we say that \(\vvec\) is an eigenvector of \(A\) with associated eigenvalue \(\lambda\) if \(A\vvec=\lambda\vvec\text{.}\) In other words, for these special vectors, the operation of matrix multiplication simplifies to scalar multiplication.
The preview activity asks us to compare the matrix transformations defined by two matrices, a diagonal matrix \(D\) and a matrix \(A\) whose eigenvectors are given to us. The transformation defined by \(D\) stretches horizontally by a factor of 3 and reflects in the horizontal axis, as shown in Figure 7.1.2.
By contrast, the transformation defined by \(A\) stretches the plane by a factor of 3 in the direction of \(\vvec_1\) and reflects in the line defined by \(\vvec_1\text{,}\) as seen in Figure 7.1.3.
In this way, we see that the matrix transformations defined by these two matrices are equivalent after a \(45^\circ\) rotation. This notion of equivalence is what we called similarity in Section 4.3. There we considered a square \(m\times m\) matrix \(A\) that has enough eigenvectors to form a basis of \(\real^m\text{.}\) For example, suppose we can construct a basis for \(\real^m\) using eigenvectors \(\vvec_1,\vvec_2,\ldots,\vvec_m\) having associated eigenvalues \(\lambda_1,\lambda_2,\ldots,\lambda_m\text{.}\) Forming the matrices
\begin{equation*}
P = \begin{bmatrix} \vvec_1 \amp \vvec_2 \amp \cdots \amp \vvec_m \end{bmatrix},
\qquad
D = \begin{bmatrix}
\lambda_1 \amp 0 \amp \cdots \amp 0 \\
0 \amp \lambda_2 \amp \cdots \amp 0 \\
\vdots \amp \vdots \amp \ddots \amp \vdots \\
0 \amp 0 \amp \cdots \amp \lambda_m
\end{bmatrix},
\end{equation*}
we then have \(A = PDP^{-1}\text{.}\)
Notice that the matrix \(A\) has eigenvectors \(\vvec_1\) and \(\vvec_2\) that not only form a basis for \(\real^2\) but, in fact, form an orthogonal basis for \(\real^2\text{.}\) Given the prominent role played by orthogonal bases in the last chapter, we would like to understand what conditions on a matrix enable us to form an orthogonal basis of eigenvectors.
Remember that the Sage command A.right_eigenmatrix() attempts to find a basis for \(\real^m\) consisting of eigenvectors of \(A\text{.}\) In particular, the assignment D, P = A.right_eigenmatrix() provides a diagonal matrix \(D\) constructed from the eigenvalues of \(A\) with the columns of \(P\) containing the associated eigenvectors.
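For instance, here is a minimal Sage sketch of how this command can be used, applied to the symmetric matrix that appears in an example later in this section:

A = matrix(QQ, [[1, 2], [2, 1]])
D, P = A.right_eigenmatrix()     # D is diagonal, the columns of P are eigenvectors
print(D)
print(P)
print(A == P*D*P.inverse())      # verifies that A = P D P^(-1)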
For each of the following matrices, determine whether there is a basis for \(\real^2\) consisting of eigenvectors of that matrix. When there is such a basis, form the matrices \(P\) and \(D\) and verify that the matrix equals \(PDP^{-1}\text{.}\)
The examples in this activity illustrate a range of possibilities. First, a matrix may have complex eigenvalues, in which case it will not be diagonalizable. Second, even if all the eigenvalues are real, there may not be a basis of eigenvectors if the dimension of one of the eigenspaces is less than the algebraic multiplicity of the associated eigenvalue.
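For instance, evaluating the following in a Sage cell (with a matrix chosen for illustration rather than taken from the activity) shows what the second possibility looks like, since the eigenvalue \(\lambda=1\) has multiplicity two but only a one-dimensional eigenspace:

A = matrix(QQ, [[1, 1], [0, 1]])   # eigenvalue 1 has algebraic multiplicity 2
D, P = A.right_eigenmatrix()
print(P)                           # only one independent eigenvector is available
print(P.is_invertible())           # False: there is no basis of eigenvectors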
We are interested in matrices for which there is an orthogonal basis of eigenvectors. When this happens, we can create an orthonormal basis of eigenvectors by scaling each eigenvector in the basis so that its length is 1. Putting these orthonormal vectors into a matrix \(Q\) produces an orthogonal matrix, which means that \(Q^T=Q^{-1}\text{.}\) We then have
\begin{equation*}
A = QDQ^{-1} = QDQ^T.
\end{equation*}
In this case, we say that \(A\) is orthogonally diagonalizable.
If there is an orthonormal basis of \(\real^n\) consisting of eigenvectors of the matrix \(A\text{,}\) we say that \(A\) is orthogonally diagonalizable. In particular, we can write \(A=QDQ^T\) where \(Q\) is an orthogonal matrix.
Consider the matrix \(A =
\begin{bmatrix}
-2 \amp 36 \\
36 \amp -23
\end{bmatrix}
\text{,}\) which has eigenvectors \(\vvec_1 = \twovec43\text{,}\) with associated eigenvalue \(\lambda_1=25\text{,}\) and \(\vvec_2=\twovec{3}{-4}\text{,}\) with associated eigenvalue \(\lambda_2=-50\text{.}\) Notice that \(\vvec_1\) and \(\vvec_2\) are orthogonal so we can form an orthonormal basis of eigenvectors by dividing each by its length:
\begin{equation*}
\uvec_1 = \twovec{4/5}{3/5},
\qquad
\uvec_2 = \twovec{3/5}{-4/5}.
\end{equation*}
This gives the orthogonal diagonalization \(A=QDQ^T\) where
\begin{equation*}
Q = \begin{bmatrix}
4/5 \amp 3/5 \\
3/5 \amp -4/5
\end{bmatrix},
\qquad
D = \begin{bmatrix}
25 \amp 0 \\
0 \amp -50
\end{bmatrix}.
\end{equation*}
If \(A = \begin{bmatrix}
1 \amp 2 \\
2 \amp 1 \\
\end{bmatrix}
\text{,}\) then there is an orthogonal basis of eigenvectors \(\vvec_1 = \twovec11\) and \(\vvec_2 =
\twovec{-1}1\) with eigenvalues \(\lambda_1=3\) and \(\lambda_2=-1\text{.}\) Using these eigenvectors, we form the orthogonal matrix \(Q\) consisting of eigenvectors and the diagonal matrix \(D\text{,}\) where
\begin{equation*}
Q = \begin{bmatrix}
1/\sqrt{2} \amp -1/\sqrt{2} \\
1/\sqrt{2} \amp 1/\sqrt{2}
\end{bmatrix},
\qquad
D = \begin{bmatrix}
3 \amp 0 \\
0 \amp -1
\end{bmatrix}.
\end{equation*}
Notice that the matrix transformation represented by \(Q\) is a \(45^\circ\) rotation while that represented by \(Q^T=Q^{-1}\) is a \(-45^\circ\) rotation. Therefore, if we multiply a vector \(\xvec\) by \(A\text{,}\) we can decompose the multiplication as
\begin{equation*}
A\xvec = QDQ^T\xvec = Q(D(Q^T\xvec)).
\end{equation*}
That is, we first rotate \(\xvec\) by \(-45^\circ\text{,}\) then apply the diagonal matrix \(D\text{,}\) which stretches and reflects, and finally rotate by \(45^\circ\text{.}\) We may visualize this factorization as in Figure 7.1.8.
Figure 7.1.8. The transformation defined by \(A=QDQ^T\) can be interpreted as a sequence of geometric transformations: \(Q^T\) rotates by \(-45^\circ\text{,}\) \(D\) stretches and reflects, and \(Q\) rotates by \(45^\circ\text{.}\)
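To make the factorization concrete, consider what it does to the vector \(\xvec=\twovec10\text{,}\) a vector chosen simply for illustration:
\begin{equation*}
Q^T\xvec = \twovec{1/\sqrt{2}}{-1/\sqrt{2}},
\qquad
D(Q^T\xvec) = \twovec{3/\sqrt{2}}{1/\sqrt{2}},
\qquad
Q(D(Q^T\xvec)) = \twovec{1}{2},
\end{equation*}
which agrees with multiplying directly: \(A\xvec = \twovec{1}{2}\text{.}\)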
We have seen that a matrix that is orthogonally diagonalizable must be symmetric; indeed, if \(A=QDQ^T\text{,}\) then \(A^T = (QDQ^T)^T = QD^TQ^T = QDQ^T = A\text{.}\) In fact, it turns out that any symmetric matrix is orthogonally diagonalizable. We record this fact in the next theorem.
Each of the following matrices is symmetric so the Spectral Theorem tells us that each is orthogonally diagonalizable. The point of this activity is to find an orthogonal diagonalization for each matrix.
To begin, find a basis for each eigenspace. Use this basis to find an orthogonal basis for each eigenspace and put these bases together to find an orthogonal basis for \(\real^m\) consisting of eigenvectors. Use this basis to write an orthogonal diagonalization of the matrix.
Consider the matrix \(A = B^TB\) where \(B = \begin{bmatrix}
0 \amp 1 \amp 2 \\
2 \amp 0 \amp 1
\end{bmatrix}
\text{.}\) Explain how we know that \(A\) is symmetric and then find an orthogonal diagonalization of \(A\text{.}\)
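One possible way to check this computation with Sage is sketched below; this is only a check, and if an eigenspace had dimension greater than one, its basis vectors would still need to be orthogonalized, say with the Gram-Schmidt algorithm.

B = matrix(QQ, [[0, 1, 2], [2, 0, 1]])
A = B.transpose()*B
print(A == A.transpose())        # A = B^T B equals its transpose, so A is symmetric
D, P = A.right_eigenmatrix()
print(D)                         # the eigenvalues of A appear on the diagonal
print(P)                         # the columns of P are associated eigenvectors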
As the examples in Activity 7.1.3 illustrate, the Spectral Theorem implies a number of things. Namely, if \(A\) is a symmetric \(m\times m\) matrix, then
all of the eigenvalues of \(A\) are real;
the dimension of each eigenspace equals the multiplicity of the associated eigenvalue;
eigenvectors of \(A\) associated to different eigenvalues are orthogonal.
We won't justify the first two facts here since that would take us rather far afield. However, it will be helpful to explain the third fact. To begin, notice the following:
Suppose a symmetric matrix \(A\) has eigenvectors \(\vvec_1\text{,}\) with associated eigenvalue \(\lambda_1=3\text{,}\) and \(\vvec_2\text{,}\) with associated eigenvalue \(\lambda_2 = 10\text{.}\) Notice that
\begin{equation*}
3\vvec_1\cdot\vvec_2 = (A\vvec_1)\cdot\vvec_2 = (A\vvec_1)^T\vvec_2 = \vvec_1^TA^T\vvec_2 = \vvec_1^TA\vvec_2 = \vvec_1\cdot(A\vvec_2) = 10\vvec_1\cdot\vvec_2,
\end{equation*}
which can only happen if \(\vvec_1\cdot\vvec_2 = 0\text{.}\) That is, eigenvectors of a symmetric matrix associated to different eigenvalues must be orthogonal.
Many of the ideas we'll encounter in this chapter, such as orthogonal diagonalizations, can be applied to the study of data. These applications provide an important context in which the mathematical ideas take on a more concrete meaning and their motivation appears more clearly. For that reason, we will now introduce the statistical concept of variance as a way to gain insight into the significance of orthogonal diagonalizations.
Notice that the centroid lies in the center of the data so the spread of the data will be measured by how far away the points are from the centroid. To simplify our calculations, find the demeaned data points
Now that the data has been demeaned, we will define the total variance as the average of the squares of the distances from the origin; that is, the total variance is
\begin{equation*}
V = \frac 1N\sum_j~|\dtil_j|^2.
\end{equation*}
Find the total variance \(V\) for our set of three points.
Now plot the projections of the demeaned data onto the \(x\) and \(y\) axes using Figure 7.1.14 and find the variances \(V_x\) and \(V_y\) of the projected points.
What do you notice about the relationship between \(V\text{,}\) \(V_x\text{,}\) and \(V_y\text{?}\) How does the Pythagorean theorem explain this relationship?
Plot the projections of the demeaned data points onto the lines defined by vectors \(\vvec_1=\twovec11\) and \(\vvec_2=\twovec{-1}1\) using Figure 7.1.15 and find the variances \(V_{\vvec_1}\) and \(V_{\vvec_2}\) of these projected points.
What is the relationship between the total variance \(V\) and the variances \(V_{\vvec_1}\) and \(V_{\vvec_2}\text{?}\) How does the Pythagorean theorem explain your response?
Notice that variance enjoys an additivity property. Consider, for instance, the situation where our data points are two-dimensional and suppose that the demeaned points are \(\dtil_j=\twovec{\widetilde{x}_j}{\widetilde{y}_j}\text{.}\) We have
\begin{equation*}
V = \frac1N\sum_j~|\dtil_j|^2 = \frac1N\sum_j~\widetilde{x}_j^2 + \frac1N\sum_j~\widetilde{y}_j^2 = V_x + V_y.
\end{equation*}
More generally, suppose that we have an orthonormal basis \(\uvec_1\) and \(\uvec_2\text{.}\) If we project the demeaned points onto the line defined by \(\uvec_1\text{,}\) we obtain the points \((\dtil_j\cdot\uvec_1)\uvec_1\) so that
\begin{equation*}
V_{\uvec_1} = \frac1N\sum_j~(\dtil_j\cdot\uvec_1)^2.
\end{equation*}
Moreover, writing \(\dtil_j = (\dtil_j\cdot\uvec_1)\uvec_1 + (\dtil_j\cdot\uvec_2)\uvec_2\text{,}\) we have
\begin{equation*}
|\dtil_j|^2 = (\dtil_j\cdot\uvec_1)^2 + (\dtil_j\cdot\uvec_2)^2
\end{equation*}
since \(\uvec_1\cdot\uvec_2 = 0\text{.}\) When we average over all the data points, we find that the total variance \(V\) is the sum of the variances in the \(\uvec_1\) and \(\uvec_2\) directions. This leads to the following proposition, in which this observation is expressed more generally.
If \(W\) is a subspace with orthonormal basis \(\uvec_1\text{,}\) \(\uvec_2\text{,}\) \(\ldots\text{,}\) \(\uvec_n\text{,}\) then the variance of the points projected onto \(W\) is the sum of the variances in the \(\uvec_j\) directions:
\begin{equation*}
V_W = V_{\uvec_1} + V_{\uvec_2} + \cdots + V_{\uvec_n}.
\end{equation*}
The next activity demonstrates a more efficient way to find the variance \(V_{\uvec}\) in a particular direction and connects our discussion of variance with symmetric matrices.
In general, the matrix \(C=\frac1N~AA^T\) is called the covariance matrix of the dataset, and it is useful because the variance \(V_{\uvec} =
\uvec\cdot(C\uvec)\text{,}\) as we have just seen. Find the matrix \(C\) for our dataset with three points.
Use the covariance matrix to find the variance \(V_{\uvec_2}\) when \(\uvec_2=\twovec{-2/\sqrt{5}}{1/\sqrt{5}}\text{.}\) Since \(\uvec_1\) and \(\uvec_2\) are orthogonal, verify that the sum of \(V_{\uvec_1}\) and \(V_{\uvec_2}\) gives the total variance.
This activity introduced the covariance matrix of a dataset, which is defined to be \(C=\frac1N~AA^T\) where \(A\) is the matrix of demeaned data points. Notice that
\begin{equation*}
C^T = \left(\frac1N~AA^T\right)^T = \frac1N~(A^T)^TA^T = \frac1N~AA^T = C,
\end{equation*}
which tells us that \(C\) is symmetric. In particular, we know that it is orthogonally diagonalizable, an observation that will play an important role in the future.
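It is also worth recording why the covariance matrix computes variance in a given direction. If the columns of \(A\) are the demeaned points \(\dtil_j\text{,}\) then the components of \(A^T\uvec\) are the dot products \(\dtil_j\cdot\uvec\text{,}\) so
\begin{equation*}
V_{\uvec} = \frac1N\sum_j~(\dtil_j\cdot\uvec)^2 = \frac1N~|A^T\uvec|^2 = \frac1N~(A^T\uvec)\cdot(A^T\uvec) = \uvec\cdot\left(\frac1N~AA^T\uvec\right) = \uvec\cdot(C\uvec).
\end{equation*}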
If \(C\) is the covariance matrix associated to a demeaned dataset and \(\uvec\) is a unit vector, then the variance of the demeaned points projected onto the line defined by \(\uvec\) is
\begin{equation*}
V_{\uvec} = \uvec\cdot(C\uvec).
\end{equation*}
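As a small Sage sketch of these formulas, the three demeaned points below are made up for illustration and are not the points from the earlier activity:

A = matrix(QQ, [[1, -2, 1],
                [2,  0, -2]])            # columns are (made-up) demeaned data points
N = 3
C = A*A.transpose()/N                    # covariance matrix C = (1/N) A A^T
u = vector([1, 1])/sqrt(2)               # a unit vector
print(C)
print(u*(C*u))                           # variance in the direction of u
print(sum(col*col for col in A.columns())/N)   # total variance, the trace of C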
Our goal in the future will be to find directions \(\uvec\) where the variance is as large as possible and directions where it is as small as possible. The next activity demonstrates why this is useful.
Evaluating the following Sage cell loads a dataset consisting of 100 demeaned data points and provides a plot of them. It also provides the demeaned data matrix \(A\text{.}\)
In approximately what direction is the variance greatest? Choose a reasonable vector \(\uvec\) that points in approximately that direction and find \(V_{\uvec}\text{.}\)
In approximately what direction is the variance smallest? Choose a reasonable vector \(\wvec\) that points in approximately that direction and find \(V_{\wvec}\text{.}\)
This activity illustrates how variance can identify a line along which the data are concentrated. When the data primarily lie along a line defined by a vector \(\uvec_1\text{,}\) then the variance in that direction will be large while the variance in an orthogonal direction \(\uvec_2\) will be small.
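In an activity like the one above, one rough numerical approach is to scan unit vectors around the circle and compare the variances they produce. The sketch below assumes, as in the definition of the covariance matrix, that the 100 demeaned points form the columns of \(A\text{:}\)

C = A*A.transpose()/100                          # covariance matrix of the demeaned points
directions = [vector([cos(t), sin(t)]) for t in srange(0, 3.14, 0.01)]
variances = [u*(C*u) for u in directions]
print(max(variances), min(variances))            # largest and smallest directional variances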
Remember that variance is additive, according to Proposition 7.1.16, so that if \(\uvec_1\) and \(\uvec_2\) are orthogonal unit vectors, then the total variance is
\begin{equation*}
V = V_{\uvec_1} + V_{\uvec_2}.
\end{equation*}
Therefore, if we choose \(\uvec_1\) to be the direction where \(V_{\uvec_1}\) is a maximum, then \(V_{\uvec_2}\) will be a minimum.
In the next section, we will use an orthogonal diagonalization of the covariance matrix \(C\) to find the directions having the greatest and smallest variances. In this way, we will be able to determine when data are concentrated along a line or subspace.
This section explored both symmetric matrices and variance. In particular, we saw that
A matrix \(A\) is orthogonally diagonalizable if there is an orthonormal basis of eigenvectors. In particular, we can write \(A=QDQ^T\text{,}\) where \(D\) is a diagonal matrix of eigenvalues and \(Q\) is an orthogonal matrix of eigenvectors.
The variance of a dataset can be computed using the covariance matrix \(C=\frac1N~AA^T\text{,}\) where \(A\) is the matrix of demeaned data points. In particular, the variance of the demeaned data points projected onto the line defined by the unit vector \(\uvec\) is \(V_{\uvec} = \uvec\cdot C\uvec\text{.}\)
For each of the following matrices, find the eigenvalues and a basis for each eigenspace. Determine whether the matrix is diagonalizable and, if so, find a diagonalization. Determine whether the matrix is orthogonally diagonalizable and, if so, find an orthogonal diagonalization.
Suppose that \(\uvec\) is an eigenvector of \(B\) with associated eigenvalue \(\lambda\) and that \(\uvec\) has unit length. Explain why \(\lambda =
\len{A\uvec}^2\text{.}\)
Suppose that \(C\) is the covariance matrix of a demeaned dataset.
Suppose that \(\uvec\) is an eigenvector of \(C\) with associated eigenvalue \(\lambda\) and that \(\uvec\) has unit length. Explain why \(V_{\uvec} = \lambda\text{.}\)