Skip to main content
Logo image

Section 7.4 Singular Value Decompositions

The Spectral Theorem has motivated the past few sections. In particular, we applied the fact that symmetric matrices can be orthogonally diagonalized to simplify quadratic forms, which enabled us to use principal component analysis to reduce the dimension of a dataset.
But what can we do with matrices that are not symmetric or even square? For instance, the following matrices are not diagonalizable, much less orthogonally so:
\begin{equation*} \begin{bmatrix} 2 \amp 1 \\ 0 \amp 2 \end{bmatrix}, \hspace{24pt} \begin{bmatrix} 1 \amp 1 \amp 0 \\ -1 \amp 0 \amp 1 \end{bmatrix}. \end{equation*}
In this section, we will develop a description of matrices called the singular value decomposition that is, in many ways, analogous to an orthogonal diagonalization. For example, we have seen that any symmetric matrix can be written in the form \(QDQ^T\) where \(Q\) is an orthogonal matrix and \(D\) is diagonal. A singular value decomposition will have the form \(U\Sigma V^T\) where \(U\) and \(V\) are orthogonal and \(\Sigma\) is diagonal. Most notably, we will see that every matrix has a singular value decomposition whether it’s symmetric or not.

Preview Activity 7.4.1.

Let’s review orthogonal diagonalizations and quadratic forms as our understanding of singular value decompositions will rely on them.
  1. Suppose that \(A\) is any matrix. Explain why the matrix \(G = A^TA\) is symmetric.
  2. Suppose that \(A = \begin{bmatrix} 1 \amp 2 \\ -2 \amp -1 \\ \end{bmatrix}\text{.}\) Find the matrix \(G=A^TA\) and write out the quadratic form \(q_G\left(\twovec{x_1}{x_2}\right)\) as a function of \(x_1\) and \(x_2\text{.}\)
  3. What is the maximum value of \(q_G(\xvec)\) over all unit vectors and in which direction does it occur?
  4. What is the minimum value of \(q_G(\xvec)\) over all unit vectors and in which direction does it occur?
  5. What is the geometric relationship between the directions in which the maximum and minimum values occur?

Subsection 7.4.1 Finding singular value decompositions

We will begin by explaining what a singular value decomposition is and how we can find one for a given matrix \(A\text{.}\)
Recall how the orthogonal diagonalization of a symmetric matrix is formed: if \(A\) is symmetric, we write \(A = QDQ^T\) where the diagonal entries of \(D\) are the eigenvalues of \(A\) and the columns of \(Q\) are the associated eigenvectors. Moreover, the eigenvalues are related to the maximum and minimum values of the associated quadratic form \(q_A(\uvec)\) among all unit vectors.
A general matrix, particularly a matrix that is not square, may not have eigenvalues and eigenvectors, but we can discover analogous features, called singular values and singular vectors, by studying a function somewhat similar to a quadratic form. More specifically, any matrix \(A\) defines a function
\begin{equation*} l_A(\xvec) = |A\xvec|, \end{equation*}
which measures the length of \(A\xvec\text{.}\) For example, the diagonal matrix \(D=\begin{bmatrix} 3 \amp 0 \\ 0 \amp -2 \\ \end{bmatrix}\) gives the function \(l_D(\xvec) = \sqrt{9x_1^2 + 4x_2^2}\text{.}\) The presence of the square root means that this function is not a quadratic form. We can, however, define the singular values and vectors by looking for the maximum and minimum of this function \(l_A(\uvec)\) among all unit vectors \(\uvec\text{.}\)
While \(l_A(\xvec)\) is not itself a quadratic form, it becomes one if we square it:
\begin{equation*} \left(l_A(\xvec)\right)^2 = |A\xvec|^2 = (A\xvec)\cdot(A\xvec) = \xvec\cdot(A^TA\xvec)=q_{A^TA}(\xvec)\text{.} \end{equation*}
We call \(G=A^TA\text{,}\) the Gram matrix associated to \(A\) and note that
\begin{equation*} l_A(\xvec) = \sqrt{q_G(\xvec)}\text{.} \end{equation*}
This is important in the next activity, which introduces singular values and singular vectors.

Activity 7.4.2.

The following interactive figure will help us explore singular values and vectors geometrically before we begin a more algebraic approach.
Instructions.
You may choose a \(2\times2\) matrix \(A=\begin{bmatrix} a \amp b \\ c \amp d \\ \end{bmatrix}\) using the four sliders at the top of the diagram. On the left, you may vary the red unit vector \(\xvec\) by clicking in the head of the vector.
Figure 7.4.1. Singular values, right singular vectors and left singular vectors
Select the matrix \(A=\begin{bmatrix} 1 \amp 2 \\ -2 \amp - 1 \\ \end{bmatrix}\text{.}\) As we vary the vector \(\xvec\text{,}\) we see the vector \(A\xvec\) on the right in gray while the height of the blue bar to the right tells us \(l_A(\xvec) = |A\xvec|\text{.}\)
  1. The first singular value \(\sigma_1\) is the maximum value of \(l_A(\xvec)\) over all unit vectors and an associated right singular vector \(\vvec_1\) is a unit vector describing a direction in which this maximum occurs.
    Use the diagram to find the first singular value \(\sigma_1\) and an associated right singular vector \(\vvec_1\text{.}\)
  2. The second singular value \(\sigma_2\) is the minimum value of \(l_A(\xvec)\) over all unit vectors and an associated right singular vector \(\vvec_2\) is a unit vector describing a direction in which this minimum occurs.
    Use the diagram to find the second singular value \(\sigma_2\) and an associated right singular vector \(\vvec_2\text{.}\)
  3. Here’s how we can find the right singular values and vectors without using the diagram. Remember that \(l_A(\xvec) = \sqrt{q_G(\xvec)}\) where \(G=A^TA\) is the Gram matrix associated to \(A\text{.}\) Since \(G\) is symmetric, it is orthogonally diagonalizable. Find \(G\) and an orthogonal diagonalization of it.
    What is the maximum value of the quadratic form \(q_G(\xvec)\) among all unit vectors and in which direction does it occur? What is the minimum value of \(q_G(\xvec)\) and in which direction does it occur?
  4. Because \(l_A(\xvec) = \sqrt{q_G(\xvec)}\text{,}\) the first singular value \(\sigma_1\) will be the square root of the maximum value of \(q_G(\xvec)\) and \(\sigma_2\) the square root of the minimum. Verify that the singular values that you found from the diagram are the square roots of the maximum and minimum values of \(q_G(\xvec)\text{.}\)
  5. Verify that the right singular vectors \(\vvec_1\) and \(\vvec_2\) that you found from the diagram are the directions in which the maximum and minimum values occur.
  6. Finally, we introduce the left singular vectors \(\uvec_1\) and \(\uvec_2\) by requiring that \(A\vvec_1 = \sigma_1\uvec_1\) and \(A\vvec_2=\sigma_2\uvec_2\text{.}\) Find the two left singular vectors.
  7. Form the matrices
    \begin{equation*} U = \begin{bmatrix}\uvec_1 \amp \uvec_2 \end{bmatrix}, \hspace{24pt} \Sigma = \begin{bmatrix} \sigma_1 \amp 0 \\ 0 \amp \sigma_2 \\ \end{bmatrix}, \hspace{24pt} V = \begin{bmatrix}\vvec_1 \amp \vvec_2 \end{bmatrix} \end{equation*}
    and explain why \(AV = U\Sigma\text{.}\)
  8. Finally, explain why \(A=U\Sigma V^T\) and verify that this relationship holds for this specific example.
As this activity shows, the singular values of \(A\) are the maximum and minimum values of \(l_A(\xvec)=|A\xvec|\) among all unit vectors and the right singular vectors \(\vvec_1\) and \(\vvec_2\) are the directions in which they occur. The key to finding the singular values and vectors is to utilize the Gram matrix \(G\) and its associated quadratic form \(q_G(\xvec)\text{.}\) We will illustrate with some more examples.

Example 7.4.2.

We will find a singular value decomposition of the matrix \(A=\begin{bmatrix} 1 \amp 2 \\ -1 \amp 2 \end{bmatrix} \text{.}\) Notice that this matrix is not symmetric so it cannot be orthogonally diagonalized.
We begin by constructing the Gram matrix \(G = A^TA = \begin{bmatrix} 2 \amp 0 \\ 0 \amp 8 \\ \end{bmatrix}\text{.}\) Since \(G\) is symmetric, it can be orthogonally diagonalized with
\begin{equation*} D = \begin{bmatrix} 8 \amp 0 \\ 0 \amp 2 \\ \end{bmatrix},\hspace{24pt} Q = \begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \\ \end{bmatrix}\text{.} \end{equation*}
We now know that the maximum value of the quadratic form \(q_G(\xvec)\) is 8, which occurs in the direction \(\twovec01\text{.}\) Since \(l_A(\xvec) = \sqrt{q_G(\xvec)}\text{,}\) this tells us that the maximum value of \(l_A(\xvec)\text{,}\) the first singular value, is \(\sigma_1=\sqrt{8}\) and that this occurs in the direction of the first right singular vector \(\vvec_1=\twovec01\text{.}\)
In the same way, we also know that the second singular value \(\sigma_2=\sqrt{2}\) with associated right singular vector \(\vvec_2=\twovec10\text{.}\)
The first left singular vector \(\uvec_1\) is defined by \(A\vvec_1 = \twovec22 = \sigma_1\uvec_1\text{.}\) Because \(\sigma_1 = \sqrt{8}\text{,}\) we have \(\uvec_1 = \twovec{1/\sqrt{2}}{1/\sqrt{2}}\text{.}\) Notice that \(\uvec_1\) is a unit vector because \(\sigma_1 = |A\vvec_1|\text{.}\)
In the same way, the second left singular vector is defined by \(A\vvec_2 = \twovec1{-1} = \sigma_2\uvec_2\text{,}\) which gives us \(\uvec_2 = \twovec{1/\sqrt{2}}{-1/\sqrt{2}}\text{.}\)
We then construct
\begin{align*} U \amp {}={} \begin{bmatrix} \uvec_1 \amp \uvec_2 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} \amp 1/\sqrt{2} \\ 1/\sqrt{2} \amp -1/\sqrt{2} \\ \end{bmatrix}\\ \Sigma \amp {}={} \begin{bmatrix} \sigma_1 \amp 0 \\ 0 \amp \sigma_2 \\ \end{bmatrix} = \begin{bmatrix} \sqrt{8} \amp 0 \\ 0 \amp \sqrt{2} \\ \end{bmatrix}\\ V \amp {}={} \begin{bmatrix} \vvec_1 \amp \vvec_2 \end{bmatrix} = \begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \\ \end{bmatrix} \end{align*}
We now have \(AV=U\Sigma\) because
\begin{equation*} AV = \begin{bmatrix} A\vvec_1 \amp A\vvec_2 \end{bmatrix} = \begin{bmatrix} \sigma_1\uvec_1 \amp \sigma_2\uvec_2 \end{bmatrix} = \Sigma U\text{.} \end{equation*}
Because the right singular vectors, the columns of \(V\text{,}\) are eigenvectors of the symmetric matrix \(G\text{,}\) they form an orthonormal basis, which means that \(V\) is orthogonal. Therefore, we have \((AV)V^T = A = U\Sigma V^T\text{.}\) This gives the singular value decomposition
\begin{equation*} A = \begin{bmatrix} 1 \amp 2 \\ -1 \amp 2 \\ \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} \amp 1/\sqrt{2} \\ 1/\sqrt{2} \amp -1/\sqrt{2} \\ \end{bmatrix} \begin{bmatrix} \sqrt{8} \amp 0 \\ 0 \amp \sqrt{2} \\ \end{bmatrix} \begin{bmatrix} 0 \amp 1 \\ 1 \amp 0 \\ \end{bmatrix}^T = U\Sigma V^T\text{.} \end{equation*}
To summarize, we find a singular value decomposition of a matrix \(A\) in the following way:
  • Construct the Gram matrix \(G=A^TA\) and find an orthogonal diagonalization to obtain eigenvalues \(\lambda_i\) and an orthonormal basis of eigenvectors.
  • The singular values of \(A\) are the squares roots of eigenvalues \(\lambda_i\) of \(G\text{;}\) that is, \(\sigma_i = \sqrt{\lambda_i}\text{.}\) By convention, the singular values are listed in decreasing order: \(\sigma_1 \geq \sigma_2 \geq \ldots \text{.}\) The right singular vectors \(\vvec_i\) are the associated eigenvectors of \(G\text{.}\)
  • The left singular vectors \(\uvec_i\) are found by \(A\vvec_i = \sigma_i\uvec_i\text{.}\) Because \(\sigma_i=|A\vvec_i|\text{,}\) we know that \(\uvec_i\) will be a unit vector.
    In fact, the left singular vectors will also form an orthonormal basis. To see this, suppose that the associated singular values are nonzero. We then have:
    \begin{align*} \sigma_i\sigma_j(\uvec_i\cdot\uvec_j) \amp {}={} (\sigma_i\uvec_i)\cdot(\sigma_j\uvec_j) = (A\vvec_i)\cdot(A\vvec_j)\\ \amp {}={} \vvec_i\cdot(A^TA\vvec_j) \\ \amp {}={} \vvec_i\cdot(G\vvec_j) = \lambda_j\vvec_i\cdot\vvec_j = 0 \end{align*}
    since the right singular vectors are orthogonal.

Example 7.4.3.

Let’s find a singular value decomposition for the symmetric matrix \(A=\begin{bmatrix} 1 \amp 2 \\ 2 \amp 1 \end{bmatrix}\text{.}\) The associated Gram matrix is
\begin{equation*} G = A^TA = \begin{bmatrix} 5 \amp 4 \\ 4 \amp 5 \\ \end{bmatrix}\text{,} \end{equation*}
which has an orthogonal diagonalization with
\begin{equation*} D = \begin{bmatrix} 9 \amp 0 \\ 0 \amp 1 \\ \end{bmatrix},\hspace{24pt} Q = \begin{bmatrix} 1/\sqrt{2} \amp 1/\sqrt{2} \\ 1/\sqrt{2} \amp -1/\sqrt{2} \\ \end{bmatrix}\text{.} \end{equation*}
This gives singular values and vectors
\begin{align*} \sigma_1 = 3, \hspace{24pt}\amp \vvec_1 = \twovec{1/\sqrt{2}}{1/\sqrt{2}}, \amp \uvec_1 = \twovec{1/\sqrt{2}}{1/\sqrt{2}}\\ \sigma_2 = 1, \hspace{24pt}\amp \vvec_2 = \twovec{1/\sqrt{2}}{-1/\sqrt{2}}, \amp \uvec_2 = \twovec{-1/\sqrt{2}}{1/\sqrt{2}} \end{align*}
and the singular value decomposition \(A=U\Sigma V^T\) where
\begin{equation*} U = \begin{bmatrix} 1/\sqrt{2} \amp -1/\sqrt{2} \\ 1/\sqrt{2} \amp 1/\sqrt{2} \end{bmatrix},\hspace{24pt} \Sigma = \begin{bmatrix} 3 \amp 0 \\ 0 \amp 1 \end{bmatrix},\hspace{24pt} V = \begin{bmatrix} 1/\sqrt{2} \amp 1/\sqrt{2} \\ 1/\sqrt{2} \amp -1/\sqrt{2} \end{bmatrix}. \end{equation*}
This example is special because \(A\) is symmetric. With a little thought, it’s possible to relate this singular value decomposition to an orthogonal diagonalization of \(A\) using the fact that \(G=A^TA = A^2\text{.}\)

Activity 7.4.3.

In this activity, we will construct the singular value decomposition of \(A=\begin{bmatrix} 1 \amp 0 \amp -1 \\ 1 \amp 1 \amp 1 \end{bmatrix}\text{.}\) Notice that this matrix is not square so there are no eigenvalues and eigenvectors associated to it.
  1. Construct the Gram matrix \(G=A^TA\) and find an orthogonal diagonalization of it.
  2. Identify the singular values of \(A\) and the right singular vectors \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\text{.}\) What is the dimension of these vectors? How many nonzero singular values are there?
  3. Find the left singular vectors \(\uvec_1\) and \(\uvec_2\) using the fact that \(A\vvec_i = \sigma_i\uvec_i\text{.}\) What is the dimension of these vectors? What happens if you try to find a third left singular vector \(\uvec_3\) in this way?
  4. As before, form the orthogonal matrices \(U\) and \(V\) from the left and right singular vectors. What are the shapes of \(U\) and \(V\text{?}\) How do these shapes relate to the number of rows and columns of \(A\text{?}\)
  5. Now form \(\Sigma\) so that it has the same shape as \(A\text{:}\)
    \begin{equation*} \Sigma = \begin{bmatrix} \sigma_1 \amp 0 \amp 0 \\ 0 \amp \sigma_2 \amp 0 \end{bmatrix} \end{equation*}
    and verify that \(A = U\Sigma V^T\text{.}\)
  6. How can you use this singular value decomposition of \(A=U\Sigma V^T\) to easily find a singular value decomposition of \(A^T=\begin{bmatrix} 1 \amp 1 \\ 0 \amp 1 \\ -1 \amp 1 \\ \end{bmatrix}\text{?}\)

Example 7.4.4.

We will find a singular value decomposition of the matrix \(A=\begin{bmatrix} 2 \amp -2 \amp 1 \\ -4 \amp -8 \amp -8 \\ \end{bmatrix}\text{.}\)
Finding an orthogonal diagonalization of \(G=A^TA\) gives
\begin{equation*} D=\begin{bmatrix} 144 \amp 0 \amp 0 \\ 0 \amp 9 \amp 0 \\ 0 \amp 0 \amp 0 \\ \end{bmatrix},\hspace{24pt} Q=\begin{bmatrix} 1/3 \amp 2/3 \amp 2/3 \\ 2/3 \amp -2/3 \amp 1/3 \\ 2/3 \amp 1/3 \amp -2/3 \\ \end{bmatrix}\text{,} \end{equation*}
which gives singular values \(\sigma_1=\sqrt{144}=12\text{,}\) \(\sigma_2 = \sqrt{9}= 3\text{,}\) and \(\sigma_3 = 0\text{.}\) The right singular vectors \(\vvec_i\) appear as the columns of \(Q\) so that \(V = Q\text{.}\)
We now find
\begin{align*} A\vvec_1 = \twovec{0}{-12} = 12\uvec_1, \hspace{24pt} \amp \uvec_1 = \twovec{0}{-1}\\ A\vvec_2 = \twovec{3}{0} = 3\uvec_1, \hspace{24pt} \amp \uvec_1 = \twovec10\\ A\vvec_3 = \twovec{0}{0} \end{align*}
Notice that it’s not possible to find a third left singular vector since \(A\vvec_3=\zerovec\text{.}\) We therefore form the matrices
\begin{equation*} U = \begin{bmatrix} 0 \amp 1 \\ -1 \amp 0 \\ \end{bmatrix},\hspace{24pt} \Sigma = \begin{bmatrix} 12 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ \end{bmatrix},\hspace{24pt} V=\begin{bmatrix} 1/3 \amp 2/3 \amp 2/3 \\ 2/3 \amp -2/3 \amp 1/3 \\ 2/3 \amp 1/3 \amp -2/3 \\ \end{bmatrix}\text{,} \end{equation*}
which gives the singular value decomposition \(A=U\Sigma V^T\text{.}\)
Notice that \(U\) is a \(2\times2\) orthogonal matrix because \(A\) has two rows, and \(V\) is a \(3\times3\) orthogonal matrix because \(A\) has three columns.
As we’ll see in the next section, some additional work may be needed to construct the left singular vectors \(\uvec_j\) if more of the singular values are zero, but we won’t worry about that now. For the time being, let’s record our work in the following theorem.
Notice that a singular value decomposition of \(A\) gives us a singular value decomposition of \(A^T\text{.}\) More specifically, if \(A=U\Sigma V^T\text{,}\) then
\begin{equation*} A^T = (U\Sigma V^T)^T = V\Sigma^T U^T. \end{equation*}
As we said earlier, a singular value decomposition should be thought of a generalization of an orthogonal diagonalization. For instance, the Spectral Theorem tells us that a symmetric matrix can be written as \(QDQ^T\text{.}\) Many matrices, however, are not symmetric and so they are not orthogonally diagonalizable. However, every matrix has a singular value decomposition \(U\Sigma V^T\text{.}\) The price of this generalization is that we usually have two sets of singular vectors that form the orthogonal matrices \(U\) and \(V\) whereas a symmetric matrix has a single set of eignevectors that form the orthogonal matrix \(Q\text{.}\)

Subsection 7.4.2 The structure of singular value decompositions

Now that we have an understanding of what a singular value decomposition is and how to construct it, let’s explore the ways in which a singular value decomposition reveals the underlying structure of the matrix. As we’ll see, the matrices \(U\) and \(V\) in a singular value decomposition provide convenient bases for some important subspaces, such as the column and null spaces of the matrix. This observation will provide the key to some of our uses of these decompositions in the next section.

Activity 7.4.4.

Let’s suppose that a matrix \(A\) has a singular value decomposition \(A=U\Sigma V^T\) where
\begin{equation*} U=\begin{bmatrix} \uvec_1 \amp \uvec_2 \amp \uvec_3 \amp \uvec_4 \end{bmatrix},\hspace{10pt} \Sigma = \begin{bmatrix} 20 \amp 0 \amp 0 \\ 0 \amp 5 \amp 0 \\ 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \end{bmatrix},\hspace{10pt} V=\begin{bmatrix} \vvec_1 \amp \vvec_2 \amp \vvec_3 \end{bmatrix}. \end{equation*}
  1. What is the shape of \(A\text{;}\) that is, how many rows and columns does \(A\) have?
  2. Suppose we write a three-dimensional vector \(\xvec\) as a linear combination of right singular vectors:
    \begin{equation*} \xvec = c_1\vvec_1 + c_2\vvec_2 + c_3\vvec_3\text{.} \end{equation*}
    We would like to find an expression for \(A\xvec\text{.}\)
    To begin, \(V^T\xvec = \threevec{\vvec_1\cdot\xvec} {\vvec_2\cdot\xvec} {\vvec_3\cdot\xvec} = \threevec{c_1}{c_2}{c_3} \text{.}\)
    Now \(\Sigma V^T \xvec = \begin{bmatrix} 20 \amp 0 \amp 0 \\ 0 \amp 5 \amp 0 \\ 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \end{bmatrix}\threevec{c_1}{c_2}{c_3} = \cfourvec{20c_1}{5c_2}00\text{.}\)
    And finally, \(A\xvec = U\Sigma V^T\xvec = \begin{bmatrix} \uvec_1 \amp \uvec_2 \amp \uvec_3 \amp \uvec_4 \end{bmatrix} \cfourvec{20c_1}{5c_2}00 = 20c_1\uvec_1 + 5c_2\uvec_2\text{.}\)
    To summarize, we have \(A\xvec = 20c_1\uvec_1 + 5c_2\uvec_2\text{.}\)
    What condition on \(c_1\text{,}\) \(c_2\text{,}\) and \(c_3\) must be satisfied if \(\xvec\) is a solution to the equation \(A\xvec=40\uvec_1 + 20\uvec_2\text{?}\) Is there a unique solution or infinitely many?
  3. Remembering that \(\uvec_1\) and \(\uvec_2\) are linearly independent, what condition on \(c_1\text{,}\) \(c_2\text{,}\) and \(c_3\) must be satisfied if \(A\xvec = \zerovec\text{?}\)
  4. How do the right singular vectors \(\vvec_i\) provide a basis for \(\nul(A)\text{,}\) the subspace of solutions to the equation \(A\xvec = \zerovec\text{?}\)
  5. Remember that \(\bvec\) is in \(\col(A)\) if the equation \(A\xvec = \bvec\) is consistent, which means that
    \begin{equation*} A\xvec = 20c_1\uvec_1 + 5c_2\uvec_2 = \bvec \end{equation*}
    for some coefficients \(c_1\) and \(c_2\text{.}\) How do the left singular vectors \(\uvec_i\) provide an orthonormal basis for \(\col(A)\text{?}\)
  6. Remember that \(\rank(A)\) is the dimension of the column space. What is \(\rank(A)\) and how do the number of nonzero singular values determine \(\rank(A)\text{?}\)
This activity shows how a singular value decomposition of a matrix encodes important information about its null and column spaces. More specifically, the left and right singular vectors provide orthonormal bases for \(\nul(A)\) and \(\col(A)\text{.}\) This is one of the reasons that singular value decompositions are so useful.

Example 7.4.7.

Suppose we have a singular value decomposition \(A=U\Sigma V^T\) where \(\Sigma = \begin{bmatrix} \sigma_1 \amp 0 \amp 0 \amp 0 \amp 0 \\ 0 \amp \sigma_2 \amp 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp \sigma_3 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \amp 0 \amp 0 \\ \end{bmatrix} \text{.}\) This means that \(A\) has four rows and five columns just as \(\Sigma\) does.
As in the activity, if \(\xvec = c_1 \vvec_1 + c_2\vvec_2 + \ldots + c_5\vvec_5\text{,}\) we have
\begin{equation*} A\xvec = \sigma_1c_1\uvec_1 + \sigma_2c_2\uvec_2 + \sigma_3c_3\uvec_3\text{.} \end{equation*}
If \(\bvec\) is in \(\col(A)\text{,}\) then \(\bvec\) must have the form
\begin{equation*} \bvec = \sigma_1c_1\uvec_1 + \sigma_2c_2\uvec_2 + \sigma_3c_3\uvec_3\text{,} \end{equation*}
which says that \(\bvec\) is a linear combination of \(\uvec_1\text{,}\) \(\uvec_2\text{,}\) and \(\uvec_3\text{.}\) These three vectors therefore form a basis for \(\col(A)\text{.}\) In fact, since they are columns in the orthogonal matrix \(U\text{,}\) they form an orthonormal basis for \(\col(A)\text{.}\)
Remembering that \(\rank(A)=\dim\col(A)\text{,}\) we see that \(\rank(A) = 3\text{,}\) which results from the three nonzero singular values. In general, the rank \(r\) of a matrix \(A\) equals the number of nonzero singular values, and \(\uvec_1, \uvec_2, \ldots,\uvec_r\) form an orthonormal basis for \(\col(A)\text{.}\)
Moreover, if \(\xvec = c_1 \vvec_1 + c_2\vvec_2 + \ldots + c_5\vvec_5\) satisfies \(A\xvec = \zerovec\text{,}\) then
\begin{equation*} A\xvec = \sigma_1c_1\uvec_1 + \sigma_2c_2\uvec_2 + \sigma_3c_3\uvec_3=\zerovec\text{,} \end{equation*}
which implies that \(c_1=0\text{,}\) \(c_2=0\text{,}\) and \(c_3=0\text{.}\) Therefore, \(\xvec = c_4\vvec_4+c_5\vvec_5\) so \(\vvec_4\) and \(\vvec_5\) form an orthonormal basis for \(\nul(A)\text{.}\)
More generally, if \(A\) is an \(m\times n\) matrix and if \(\rank(A) = r\text{,}\) the last \(n-r\) right singular vectors form an orthonormal basis for \(\nul(A)\text{.}\)
Generally speaking, if the rank of an \(m\times n\) matrix \(A\) is \(r\text{,}\) then there are \(r\) nonzero singular values and \(\Sigma\) has the form
\begin{equation*} \begin{bmatrix} \sigma_1 \amp \ldots \amp 0 \amp \ldots \amp 0 \\ 0 \amp \ldots \amp 0 \amp \ldots \amp 0 \\ 0 \amp \ldots \amp \sigma_r \amp \ldots \amp 0 \\ 0 \amp \ldots \amp 0 \amp \ldots \amp 0 \\ \vdots \amp \vdots \amp \vdots \amp \ddots \amp \vdots \\ 0 \amp \ldots \amp 0 \amp \ldots \amp 0 \\ \end{bmatrix}, \end{equation*}
The first \(r\) columns of \(U\) form an orthonormal basis for \(\col(A)\text{:}\)
\begin{equation*} U = \left[ \underbrace{\uvec_1 ~ \ldots ~ \uvec_r}_{\col(A)} \hspace{3pt} \uvec_{r+1} ~ \ldots ~ \uvec_m \right] \end{equation*}
and the last \(n-r\) columns of \(V\) form an orthonormal basis for \(\nul(A)\text{:}\)
\begin{equation*} V = \left[ \vvec_1 ~ \ldots ~ \vvec_r\hspace{3pt} \underbrace{\vvec_{r+1} ~ \ldots ~ \vvec_n}_{\nul(A)} \right] \end{equation*}
Remember that Proposition 7.4.6 says that \(A\) and its transpose \(A^T\) share the same singular values. Since the rank of a matrix equals its number of nonzero singular values, this means that \(\rank(A)=\rank(A^T)\text{,}\) a fact that we cited back in Section 6.2.
If we have a singular value decomposition of an \(m\times n\) matrix \(A=U\Sigma V^T\text{,}\) Proposition 7.4.6 also tells us that the left singular vectors of \(A\) are the right singular vectors of \(A^T\text{.}\) Therefore, \(U\) is the \(m\times m\) matrix whose columns are the right singular vectors of \(A^T\text{.}\) This means that the last \(m-r\) vectors form an orthonormal basis for \(\nul(A^T)\text{.}\) Therefore, the columns of \(U\) provide orthonormal bases for \(\col(A)\) and \(\nul(A^T)\text{:}\)
\begin{equation*} U = \left[ \underbrace{\uvec_1 ~ \ldots ~ \uvec_r}_{\col(A)} \hspace{3pt} \underbrace{\uvec_{r+1} ~ \ldots ~ \uvec_m}_{\nul(A^T)} \right]\text{.} \end{equation*}
This reflects the familiar fact that \(\nul(A^T)\) is the orthogonal complement of \(\col(A)\text{.}\)
In the same way, \(V\) is the \(n\times n\) matrix whose columns are the left singular vectors of \(A^T\text{,}\) which means that the first \(r\) vectors form an orthonormal basis for \(\col(A^T)\text{.}\) Because the columns of \(A^T\) are the rows of \(A\text{,}\) this subspace is sometimes called the row space of \(A\) and denoted \(\row(A)\text{.}\) While we have yet to have an occasion to use \(\row(A)\text{,}\) there are times when it is important to have an orthonormal basis for it, and a singular value decomposition provides just that. To summarize, the columns of \(V\) provide orthonormal bases for \(\col(A^T)\) and \(\nul(A)\text{:}\)
\begin{equation*} V = \left[ \underbrace{\vvec_1 ~ \ldots ~ \vvec_r}_{\col(A^T)}\hspace{3pt} \underbrace{\vvec_{r+1} ~ \ldots ~ \vvec_m}_{\nul(A)} \right] \end{equation*}
Considered altogether, the subspaces \(\col(A)\text{,}\) \(\nul(A)\text{,}\) \(\col(A^T)\text{,}\) and \(\nul(A^T)\) are called the four fundamental subspaces associated to \(A\text{.}\) In addition to telling us the rank of a matrix, a singular value decomposition gives us orthonormal bases for all four fundamental subspaces.
When we previously outlined a procedure for finding a singular decomposition of an \(m\times n\) matrix \(A\text{,}\) we found the left singular vectors \(\uvec_j\) using the expression \(A\vvec_j = \sigma_j\uvec_j\text{.}\) This produces left singular vectors \(\uvec_1, \uvec_2,\ldots,\uvec_r\text{,}\) where \(r=\rank(A)\text{.}\) If \(r\lt m\text{,}\) however, we still need to find the left singular vectors \(\uvec_{r+1},\ldots,\uvec_m\text{.}\) Theorem 7.4.9 tells us how to do that: because those vectors form an orthonormal basis for \(\nul(A^T)\text{,}\) we can find them by solving \(A^T\xvec = \zerovec\) to obtain a basis for \(\nul(A^T)\) and applying the Gram-Schmidt algorithm.
We won’t worry about this issue too much, however, as we will frequently use software to find singular value decompositions for us.

Subsection 7.4.3 Reduced singular value decompositions

As we’ll see in the next section, there are times when it is helpful to express a singular value decomposition in a slightly different form.

Activity 7.4.5.

Suppose we have a singular value decomposition \(A = U\Sigma V^T\) where
\begin{equation*} U = \begin{bmatrix} \uvec_1 \amp \uvec_2 \amp \uvec_3 \amp \uvec_4 \end{bmatrix},\hspace{24pt} \Sigma = \begin{bmatrix} 18 \amp 0 \amp 0 \\ 0 \amp 4 \amp 0 \\ 0 \amp 0 \amp 0 \\ 0 \amp 0 \amp 0 \\ \end{bmatrix},\hspace{24pt} V = \begin{bmatrix} \vvec_1 \amp \vvec_2 \amp \vvec_3 \end{bmatrix}\text{.} \end{equation*}
  1. What is the shape of \(A\text{?}\) What is \(\rank(A)\text{?}\)
  2. Identify bases for \(\col(A)\) and \(\col(A^T)\text{.}\)
  3. Explain why
    \begin{equation*} U\Sigma = \begin{bmatrix} \uvec_1 \amp \uvec_2 \end{bmatrix} \begin{bmatrix} 18 \amp 0 \amp 0 \\ 0 \amp 4 \amp 0 \\ \end{bmatrix}\text{.} \end{equation*}
  4. Explain why
    \begin{equation*} \begin{bmatrix} 18 \amp 0 \amp 0 \\ 0 \amp 4 \amp 0 \\ \end{bmatrix}V^T = \begin{bmatrix} 18 \amp 0 \\ 0 \amp 4 \\ \end{bmatrix} \begin{bmatrix} \vvec_1 \amp \vvec_2 \end{bmatrix}^T\text{.} \end{equation*}
  5. If \(A = U\Sigma V^T\text{,}\) explain why \(A=U_r\Sigma_rV_r^T\) where the columns of \(U_r\) are an orthonormal basis for \(\col(A)\text{,}\) \(\Sigma_r\) is a square, diagonal, invertible matrix, and the columns of \(V_r\) form an orthonormal basis for \(\col(A^T)\text{.}\)
We call this a reduced singular value decomposition.

Example 7.4.11.

In Example 7.4.4, we found the singular value decomposition
\begin{equation*} A=\begin{bmatrix} 2 \amp -2 \amp 1 \\ -4 \amp -8 \amp -8 \\ \end{bmatrix} = \begin{bmatrix} 0 \amp 1 \\ -1 \amp 0 \\ \end{bmatrix} \begin{bmatrix} 12 \amp 0 \amp 0 \\ 0 \amp 3 \amp 0 \\ \end{bmatrix} \begin{bmatrix} 1/3 \amp 2/3 \amp 2/3 \\ 2/3 \amp -2/3 \amp 1/3 \\ 2/3 \amp 1/3 \amp -2/3 \\ \end{bmatrix}^T\text{.} \end{equation*}
Since there are two nonzero singular values, \(\rank(A) =2\) so that the reduced singular value decomposition is
\begin{equation*} A=\begin{bmatrix} 2 \amp -2 \amp 1 \\ -4 \amp -8 \amp -8 \\ \end{bmatrix} = \begin{bmatrix} 0 \amp 1 \\ -1 \amp 0 \\ \end{bmatrix} \begin{bmatrix} 12 \amp 0 \\ 0 \amp 3 \\ \end{bmatrix} \begin{bmatrix} 1/3 \amp 2/3 \\ 2/3 \amp -2/3 \\ 2/3 \amp 1/3 \\ \end{bmatrix}^T\text{.} \end{equation*}

Subsection 7.4.4 Summary

This section has explored singular value decompositions, how to find them, and how they organize important information about a matrix.
  • A singular value decomposition of a matrix \(A\) is a factorization where \(A=U\Sigma V^T\text{.}\) The matrix \(\Sigma\) has the same shape as \(A\text{,}\) and its only nonzero entries are the singular values of \(A\text{,}\) which appear in decreasing order on the diagonal. The matrices \(U\) and \(V\) are orthogonal and contain the left and right singular vectors, respectively, as their columns.
  • To find a singular value decomposition of a matrix, we construct the Gram matrix \(G=A^TA\text{,}\) which is symmetric. The singular values of \(A\) are the square roots of the eigenvalues of \(G\text{,}\) and the right singular vectors \(\vvec_j\) are the associated eigenvectors of \(G\text{.}\) The left singular vectors \(\uvec_j\) are determined from the relationship \(A\vvec_j=\sigma_j\uvec_j\text{.}\)
  • A singular value decomposition reveals fundamental information about a matrix. For instance, the number of nonzero singular values is the rank \(r\) of the matrix. The first \(r\) left singular vectors form an orthonormal basis for \(\col(A)\) with the remaining left singular vectors forming an orthonormal basis of \(\nul(A^T)\text{.}\) The first \(r\) right singular vectors form an orthonormal basis for \(\col(A^T)\) while the remaining right singular vectors form an orthonormal basis of \(\nul(A)\text{.}\)
  • If \(A\) is a rank \(r\) matrix, we can write a reduced singular value decomposition as \(A=U_r\Sigma_rV_r^T\) where the columns of \(U_r\) form an orthonormal basis for \(\col(A)\text{,}\) the columns of \(V_r\) form an orthonormal basis for \(\col(A^T)\text{,}\) and \(\Sigma_r\) is an \(r\times r\) diagonal, invertible matrix.

Exercises 7.4.5 Exercises

1.

Consider the matrix \(A = \begin{bmatrix} 1 \amp 2 \amp 1 \\ 0 \amp -1 \amp 2 \\ \end{bmatrix} \text{.}\)
  1. Find the Gram matrix \(G=A^TA\) and use it to find the singular values and right singular vectors of \(A\text{.}\)
  2. Find the left singular vectors.
  3. Form the matrices \(U\text{,}\) \(\Sigma\text{,}\) and \(V\) and verify that \(A=U\Sigma V^T\text{.}\)
  4. What is \(\rank(A)\) and what does this say about \(\col(A)\text{?}\)
  5. Determine an orthonormal basis for \(\nul(A)\text{.}\)

2.

Find singular value decompositions for the following matrices:
  1. \(\begin{bmatrix} 0 \amp 0 \\ 0 \amp -8 \end{bmatrix}\text{.}\)
  2. \(\begin{bmatrix} 2 \amp 3 \\ 0 \amp 2 \end{bmatrix}\text{.}\)
  3. \(\displaystyle \begin{bmatrix} 4 \amp 0 \amp 0 \\ 0 \amp 0 \amp 2 \end{bmatrix}\)
  4. \(\displaystyle \begin{bmatrix} 4 \amp 0 \\ 0 \amp 0 \\ 0 \amp 2 \end{bmatrix}\)

3.

Consider the matrix \(A = \begin{bmatrix} 2 \amp 1 \\ 1 \amp 2 \end{bmatrix} \text{.}\)
  1. Find a singular value decomposition of \(A\) and verify that it is also an orthogonal diagonalization of \(A\text{.}\)
  2. If \(A\) is a symmetric, positive semidefinite matrix, explain why a singular value decomposition of \(A\) is an orthogonal diagonalization of \(A\text{.}\)

4.

Suppose that the matrix \(A\) has the singular value decomposition
\begin{equation*} \begin{bmatrix} -0.46 \amp 0.52 \amp 0.46 \amp 0.55 \\ -0.82 \amp 0.00 \amp -0.14 \amp -0.55 \\ -0.04 \amp 0.44 \amp -0.85 \amp 0.28 \\ -0.34 \amp -0.73 \amp -0.18 \amp 0.55 \end{bmatrix} \begin{bmatrix} 6.2 \amp 0.0 \amp 0.0 \\ 0.0 \amp 4.1 \amp 0.0 \\ 0.0 \amp 0.0 \amp 0.0 \\ 0.0 \amp 0.0 \amp 0.0 \end{bmatrix} \begin{bmatrix} -0.74 \amp 0.62 \amp -0.24 \\ 0.28 \amp 0.62 \amp 0.73 \\ -0.61 \amp -0.48 \amp 0.64 \end{bmatrix}. \end{equation*}
  1. What are the dimensions of \(A\text{?}\)
  2. What is \(\rank(A)\text{?}\)
  3. Find orthonormal bases for \(\col(A)\text{,}\) \(\nul(A)\text{,}\) \(\col(A^T)\text{,}\) and \(\nul(A^T)\text{.}\)
  4. Find the orthogonal projection of \(\bvec=\fourvec102{-1}\) onto \(\col(A)\text{.}\)

5.

Consider the matrix \(A = \begin{bmatrix} 1 \amp 0 \amp -1 \\ 2 \amp 2 \amp 0 \\ -1 \amp 1 \amp 2\\ \end{bmatrix} \text{.}\)
  1. Construct the Gram matrix \(G\) and use it to find the singular values and right singular vectors \(\vvec_1\text{,}\) \(\vvec_2\text{,}\) and \(\vvec_3\) of \(A\text{.}\) What are the matrices \(\Sigma\) and \(V\) in a singular value decomposition?
  2. What is \(\rank(A)\text{?}\)
  3. Find as many left singular vectors \(\uvec_j\) as you can using the relationship \(A\vvec_j=\sigma_j\uvec_j\text{.}\)
  4. Find an orthonormal basis for \(\nul(A^T)\) and use it to construct the matrix \(U\) so that \(A=U\Sigma V^T\text{.}\)
  5. State an orthonormal basis for \(\nul(A)\) and an orthonormal basis for \(\col(A)\text{.}\)

6.

Consider the matrix \(B=\begin{bmatrix} 1 \amp 0 \\ 2 \amp -1 \\ 1 \amp 2 \end{bmatrix}\) and notice that \(B=A^T\) where \(A\) is the matrix in Exercise 7.4.5.1.
  1. Use your result from Exercise 7.4.5.1 to find a singular value decomposition of \(B=U\Sigma V^T\text{.}\)
  2. What is \(\rank(B)\text{?}\) Determine a basis for \(\col(B)\) and \(\col(B)^\perp\text{.}\)
  3. Suppose that \(\bvec=\threevec{-3}47\text{.}\) Use the bases you found in the previous part of this exercise to write \(\bvec=\bhat+\bvec^\perp\text{,}\) where \(\bhat\) is in \(\col(B)\) and \(\bvec^\perp\) is in \(\col(B)^\perp\text{.}\)
  4. Find the least-squares approximate solution to the equation \(B\xvec=\bvec\text{.}\)

7.

Suppose that \(A\) is a square \(m\times m\) matrix with singular value decomposition \(A=U\Sigma V^T\text{.}\)
  1. If \(A\) is invertible, find a singular value decomposition of \(A^{-1}\text{.}\)
  2. What condition on the singular values must hold for \(A\) to be invertible?
  3. How are the singular values of \(A\) and the singular values of \(A^{-1}\) related to one another?
  4. How are the right and left singular vectors of \(A\) related to the right and left singular vectors of \(A^{-1}\text{?}\)

8.

  1. If \(Q\) is an orthogonal matrix, remember that \(Q^TQ=I\text{.}\) Explain why \(\det Q = \pm 1\text{.}\)
  2. If \(A=U\Sigma V^T\) is a singular value decomposition of a square matrix \(A\text{,}\) explain why \(|\det A|\) is the product of the singular values of \(A\text{.}\)
  3. What does this say about the singular values of \(A\) if \(A\) is invertible?

9.

If \(A\) is a matrix and \(G=A^TA\) its Gram matrix, remember that
\begin{equation*} \xvec\cdot(G\xvec) = \xvec\cdot(A^TA\xvec) = (A\xvec)\cdot(A\xvec) = \len{A\xvec}^2. \end{equation*}
  1. For a general matrix \(A\text{,}\) explain why the eigenvalues of \(G\) are nonnegative.
  2. Given a symmetric matrix \(A\) having an eigenvalue \(\lambda\text{,}\) explain why \(\lambda^2\) is an eigenvalue of \(G\text{.}\)
  3. If \(A\) is symmetric, explain why the singular values of \(A\) equal the absolute value of its eigenvalues: \(\sigma_j = |\lambda_j|\text{.}\)

10.

Determine whether the following statements are true or false and explain your reasoning.
  1. If \(A=U\Sigma V^T\) is a singular value decomposition of \(A\text{,}\) then \(G=V(\Sigma^T\Sigma)V^T\) is an orthogonal diagonalization of its Gram matrix.
  2. If \(A=U\Sigma V^T\) is a singular value decomposition of a rank 2 matrix \(A\text{,}\) then \(\vvec_1\) and \(\vvec_2\) form an orthonormal basis for the column space \(\col(A)\text{.}\)
  3. If \(A\) is a symmetric matrix, then its set of singular values is the same as its set of eigenvalues.
  4. If \(A\) is a \(10\times7\) matrix and \(\sigma_7 = 4\text{,}\) then the columns of \(A\) are linearly independent.
  5. The Gram matrix is always orthogonally diagonalizable.

11.

Suppose that \(A=U\Sigma V^T\) is a singular value decomposition of the \(m\times n\) matrix \(A\text{.}\) If \(\sigma_1,\ldots,\sigma_r\) are the nonzero singular values, the general form of the matrix \(\Sigma\) is
\begin{equation*} \Sigma = \begin{bmatrix} \sigma_1 \amp \ldots \amp 0 \amp \ldots \amp 0 \\ 0 \amp \ldots \amp 0 \amp \ldots \amp 0 \\ 0 \amp \ldots \amp \sigma_r \amp \ldots \amp 0 \\ 0 \amp \ldots \amp 0 \amp \ldots \amp 0 \\ 0 \amp \vdots \amp 0 \amp \vdots \amp 0 \\ 0 \amp \ldots \amp 0 \amp \ldots \amp 0 \\ \end{bmatrix}. \end{equation*}
  1. If you know that the columns of \(A\) are linearly independent, what more can you say about the form of \(\Sigma\text{?}\)
  2. If you know that the columns of \(A\) span \(\real^m\text{,}\) what more can you say about the form of \(\Sigma\text{?}\)
  3. If you know that the columns of \(A\) are linearly independent and span \(\real^m\text{,}\) what more can you say about the form of \(\Sigma\text{?}\)