Up to this point, we have used the Gaussian elimination algorithm to find solutions to linear systems. We now investigate another way to find solutions to the equation \(A\xvec=\bvec\) when the matrix \(A\) has the same number of rows and columns. To get started, let's look at some familiar examples.
The preview activity began with a familiar type of equation, \(3x = 5\text{,}\) and asked for a strategy to solve it. One possible response is to divide both sides by 3. Instead, let's rephrase this as multiplying by \(3^{-1} = \frac{1}{3}\text{,}\) the multiplicative inverse of 3.
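Written this way, solving the equation amounts to a single multiplication: multiplying both sides by the inverse gives
\begin{equation*}
3^{-1}(3x) = 3^{-1}\cdot 5\text{,} \qquad \text{so that} \qquad x = \frac{5}{3}\text{.}
\end{equation*}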
Now that we are interested in solving equations of the form \(A\xvec = \bvec\text{,}\) we might try to find a similar approach. Is there a matrix \(A^{-1}\) that plays the role of the multiplicative inverse of \(A\text{?}\) Of course, the real number \(0\) does not have a multiplicative inverse, so we probably shouldn't expect every matrix to have a multiplicative inverse. We will see, however, that many do.
An \(n\times n\) matrix \(A\) is called invertible if there is a matrix \(B\) such that \(AB = I_n\text{,}\) where \(I_n\) is the \(n\times n\) identity matrix. The matrix \(B\) is called the inverse of \(A\) and denoted \(A^{-1}\text{.}\)
Suppose that \(A\) is the matrix that rotates two-dimensional vectors counterclockwise by \(90^\circ\) and that \(B\) rotates vectors by \(-90^\circ\text{.}\) We have
\begin{equation*}
AB =
\begin{bmatrix}
0 \amp -1 \\
1 \amp 0 \\
\end{bmatrix}
\begin{bmatrix}
0 \amp 1 \\
-1 \amp 0 \\
\end{bmatrix}
=
\begin{bmatrix}
1 \amp 0 \\
0 \amp 1 \\
\end{bmatrix}
= I\text{,}
\end{equation*}
which says that \(A\) is invertible and that \(B = A^{-1}\text{.}\)
Notice that if we multiply the matrices in the opposite order, we find that \(BA=I\text{,}\) which says that \(B\) is also invertible and that \(B^{-1} = A\text{.}\) In other words, \(A\) and \(B\) are inverses of each other.
This activity demonstrates a procedure for finding the inverse of a matrix \(A\text{.}\)
Suppose that \(A = \begin{bmatrix}
3 \amp -2 \\
1 \amp -1 \\
\end{bmatrix}
\text{.}\) To find an inverse \(B\text{,}\) we write its columns as \(B = \begin{bmatrix}\bvec_1 \amp \bvec_2
\end{bmatrix}\) and require that
\begin{equation*}
A\bvec_1 = \twovec10, \qquad A\bvec_2 = \twovec01\text{.}
\end{equation*}
Solve these equations to find \(\bvec_1\) and \(\bvec_2\text{.}\) Then write the matrix \(B\) and verify that \(AB=I\text{.}\) This is enough for us to conclude that \(B\) is the inverse of \(A\text{.}\)
We now develop a condition that must be satisfied by an invertible matrix. Suppose that \(A\) is an invertible \(n\times n\) matrix with inverse \(B\) and suppose that \(\bvec\) is any \(n\)-dimensional vector. Since \(AB=I\text{,}\) we have
\begin{equation*}
A(B\bvec) = (AB)\bvec = I\bvec = \bvec\text{,}
\end{equation*}
which says that the equation \(A\xvec = \bvec\) is consistent and that \(\xvec = B\bvec\) is a solution.
We can reformulate this procedure for finding the inverse of a matrix. For the sake of convenience, suppose that \(A\) is a \(2\times2\) invertible matrix with inverse \(B=\begin{bmatrix} \bvec_1 \amp \bvec_2 \end{bmatrix}\text{.}\) Rather than solving the equations
\begin{equation*}
A\bvec_1 = \twovec10, \qquad A\bvec_2 = \twovec01
\end{equation*}
separately, we can solve them at the same time by augmenting \(A\) by both vectors \(\twovec10\) and \(\twovec01\) and finding the reduced row echelon form.
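For instance, with the matrix \(A\) from the activity above, we find
\begin{equation*}
\left[\begin{array}{rr|rr}
3 \amp -2 \amp 1 \amp 0 \\
1 \amp -1 \amp 0 \amp 1 \\
\end{array}\right]
\sim
\left[\begin{array}{rr|rr}
1 \amp 0 \amp 1 \amp -2 \\
0 \amp 1 \amp 1 \amp -3 \\
\end{array}\right]\text{,}
\end{equation*}
and the two columns to the right of the vertical bar form the inverse, so that
\(B = \begin{bmatrix}
1 \amp -2 \\
1 \amp -3 \\
\end{bmatrix}\text{.}\)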
In fact, this reformulation will always work. Suppose that \(A\) is an invertible \(n\times n\) matrix with inverse \(B\text{.}\) Suppose furthermore that \(\bvec\) is any \(n\)-dimensional vector and consider the equation \(A\xvec = \bvec\text{.}\) We know that \(x=B\bvec\) is a solution because \(A(B\bvec) = (AB)\bvec = I\bvec = \bvec.\)
If \(A\) is an invertible matrix with inverse \(B\text{,}\) then any equation \(A\xvec = \bvec\) is consistent and \(\xvec = B\bvec\) is a solution. In other words, the solution to \(A\xvec = \bvec\) is \(\xvec =
A^{-1}\bvec\text{.}\)
Now since \(A\xvec=\bvec\) is consistent for every vector \(\bvec\text{,}\) the columns of \(A\) must span \(\real^n\) so there is a pivot position in every row. Since \(A\) is also square, this means that the reduced row echelon form of \(A\) is the identity matrix.
The matrix \(A\) is invertible if and only if the reduced row echelon form of \(A\) is the identity matrix: \(A\sim I\text{.}\) In addition, we can find the inverse by augmenting \(A\) by the identity and finding the reduced row echelon form:
\begin{equation*}
\left[
\begin{array}{r|r}
A \amp I \\
\end{array}
\right]
\sim
\left[
\begin{array}{r|r}
I \amp A^{-1} \\
\end{array}
\right].
\end{equation*}
You may have noticed that Proposition 3.1.4 says that the solution to the equation \(A\xvec = \bvec\) is \(\xvec =
A^{-1}\bvec\text{.}\) Indeed, we know that this equation has a unique solution because \(A\) has a pivot position in every column.
It is important to remember that the product of two matrices depends on the order in which they are multiplied. That is, if \(C\) and \(D\) are matrices, then it sometimes happens that \(CD \neq DC\text{.}\) However, something fortunate happens when we consider invertibility. It turns out that if \(A\) is an \(n\times n\) matrix and \(AB=I\text{,}\) then it is also true that \(BA=I\text{.}\) We have verified this in a few examples so far, and Exercise 3.1.5.12 explains why it always happens. This leads to the following proposition.
If \(A\) is an \(n\times n\) invertible matrix with inverse \(B\text{,}\) then \(BA=I\text{,}\) which tells us that \(B\) is invertible with inverse \(A\text{.}\) In other words,
\begin{equation*}
(A^{-1})^{-1} = A.
\end{equation*}
If \(A\) is an invertible matrix, then Proposition 3.1.4 shows us how to use \(A^{-1}\) to solve equations involving \(A\text{.}\) In particular, the solution to \(A\xvec = \bvec\) is \(\xvec = A^{-1}\bvec\text{.}\)
We can find \(A^{-1}\) by finding the reduced row echelon form of \(\left[\begin{array}{r|r} A \amp I
\end{array}\right]\text{;}\) namely,
\begin{equation*}
\left[\begin{array}{r|r} A \amp I \end{array}\right]
\sim
\left[\begin{array}{r|r} I \amp A^{-1} \end{array}\right]\text{.}
\end{equation*}
There is a simple formula for finding the inverse of a \(2\times2\) matrix:
\begin{equation*}
\left[\begin{array}{rr}
a \amp b \\
c \amp d \\
\end{array}\right]^{-1}
=
\frac{1}{ad-bc}
\left[\begin{array}{rr}
d \amp -b \\
-c \amp a \\
\end{array}\right]\text{,}
\end{equation*}
which can be easily checked. The condition that \(A\) be invertible is, in this case, reduced to the condition that \(ad-bc\neq 0\text{.}\) We will understand this condition better once we have explored determinants in Section 3.4. There is a similar formula for the inverse of a \(3\times3\) matrix, but there is no good reason to write it here.
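Returning to the \(2\times2\) formula, the check is a single matrix multiplication:
\begin{equation*}
\left[\begin{array}{rr}
a \amp b \\
c \amp d \\
\end{array}\right]
\cdot
\frac{1}{ad-bc}
\left[\begin{array}{rr}
d \amp -b \\
-c \amp a \\
\end{array}\right]
=
\frac{1}{ad-bc}
\left[\begin{array}{rr}
ad-bc \amp 0 \\
0 \amp ad-bc \\
\end{array}\right]
= I\text{.}
\end{equation*}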
Subsection 3.1.3 Triangular matrices and Gaussian elimination
With some of the ideas we've developed, we can recast the Gaussian elimination algorithm in terms of matrix multiplication and invertibility. This will be especially helpful later when we consider determinants and LU factorizations. Triangular matrices will play an important role.
We say that a matrix \(A\) is lower triangular if all its entries above the diagonal are zero. Similarly, \(A\) is upper triangular if all the entries below the diagonal are zero.
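For instance, with entries chosen here only for illustration, the matrices
\begin{equation*}
L = \begin{bmatrix}
2 \amp 0 \amp 0 \\
-1 \amp 3 \amp 0 \\
4 \amp 1 \amp -2 \\
\end{bmatrix}, \qquad
U = \begin{bmatrix}
2 \amp -1 \amp 4 \\
0 \amp 3 \amp 1 \\
0 \amp 0 \amp -2 \\
\end{bmatrix}
\end{equation*}
are lower and upper triangular, respectively.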
We can develop a simple test to determine whether an \(n\times n\) lower triangular matrix is invertible. Let's use Gaussian elimination to find the reduced row echelon form of the lower triangular matrix \(L\) above.
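Scaling each row by the reciprocal of its diagonal entry and then eliminating the entries below the diagonal with replacement operations gives
\begin{equation*}
L = \begin{bmatrix}
2 \amp 0 \amp 0 \\
-1 \amp 3 \amp 0 \\
4 \amp 1 \amp -2 \\
\end{bmatrix}
\sim
\begin{bmatrix}
1 \amp 0 \amp 0 \\
0 \amp 1 \amp 0 \\
0 \amp 0 \amp 1 \\
\end{bmatrix}
= I\text{.}
\end{equation*}
Because every diagonal entry of \(L\) is nonzero, each row can be scaled to produce a pivot, so \(L\sim I\) and \(L\) is invertible.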
If, however, there is a zero entry on the diagonal, the matrix cannot be invertible. Considering the matrix below, we see that having a zero on the diagonal leads to a row without a pivot position.
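For instance, with a zero in the second diagonal position (the other entries are again chosen only for illustration), we find
\begin{equation*}
\begin{bmatrix}
2 \amp 0 \amp 0 \\
-1 \amp 0 \amp 0 \\
4 \amp 1 \amp -2 \\
\end{bmatrix}
\sim
\begin{bmatrix}
1 \amp 0 \amp 0 \\
0 \amp 1 \amp -2 \\
0 \amp 0 \amp 0 \\
\end{bmatrix}\text{.}
\end{equation*}
The final row has no pivot position, so the matrix is not row equivalent to the identity and therefore is not invertible. In other words, a triangular matrix is invertible exactly when all of its diagonal entries are nonzero.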
Suppose, for instance, that we have a \(3\times3\) matrix \(A\) and apply a replacement operation that multiplies the first row by \(-2\) and adds it to the second row. Rather than performing this operation in the usual way, we construct a new matrix by applying the desired replacement operation to the identity matrix. To illustrate, we begin with the identity matrix
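\begin{equation*}
I = \begin{bmatrix}
1 \amp 0 \amp 0 \\
0 \amp 1 \amp 0 \\
0 \amp 0 \amp 1 \\
\end{bmatrix}
\end{equation*}
and apply the replacement operation to it, which produces the matrix
\begin{equation*}
R = \begin{bmatrix}
1 \amp 0 \amp 0 \\
-2 \amp 1 \amp 0 \\
0 \amp 0 \amp 1 \\
\end{bmatrix}
\end{equation*}
(we call it \(R\) here since it performs a replacement operation). You can check, one row at a time, that the product \(RA\) is exactly the matrix obtained by applying the replacement operation to \(A\text{.}\) Moreover, the operation can be undone by multiplying the first row by \(2\) and adding it to the second row, and the matrix that performs this inverse operation is \(R^{-1}\text{.}\)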
Other row operations can be performed using a similar procedure. For instance, suppose we want to scale the second row of \(A\) by \(4\text{.}\) Find a matrix \(S\) so that \(SA\) is the matrix obtained by applying this scaling operation to \(A\text{.}\) Why is \(S\) invertible, and what is \(S^{-1}\text{?}\)
Finally, suppose we want to interchange the first and third rows of \(A\text{.}\) Find a matrix \(P\text{,}\) usually called a permutation matrix, that performs this operation. What is \(P^{-1}\text{?}\)
The original matrix \(A\) is seen to be row equivalent to the upper triangular matrix \(U\) by performing three replacement operations on \(A\text{:}\)
\begin{equation*}
L_3L_2L_1A = U\text{,}
\end{equation*}
where \(L_1\text{,}\) \(L_2\text{,}\) and \(L_3\) are the lower triangular matrices that perform the three replacement operations.
Explain why the matrix product \(L_3L_2L_1\) is invertible and use this fact to write \(A = LU\text{.}\) What is the matrix \(L\) that you find? Why do you think we denote it by \(L\text{?}\)
For the forward elimination phase of Gaussian elimination, we perform a sequence of three replacement operations. The first replacement operation multiplies the first row by \(3\) and adds the result to the second row. We can perform this operation by multiplying \(A\) by the lower triangular matrix \(L_1\) where, assuming as above that \(A\) has three rows,
\begin{equation*}
L_1 = \begin{bmatrix}
1 \amp 0 \amp 0 \\
3 \amp 1 \amp 0 \\
0 \amp 0 \amp 1 \\
\end{bmatrix}\text{.}
\end{equation*}
The inverse of \(L_1\) is
\begin{equation*}
L_1^{-1} = \begin{bmatrix}
1 \amp 0 \amp 0 \\
-3 \amp 1 \amp 0 \\
0 \amp 0 \amp 1 \\
\end{bmatrix}\text{.}
\end{equation*}
This says that if we want to undo the operation of multiplying the first row by \(3\) and adding to the second row, we should multiply the first row by \(-3\) and add it to the second row. That is the effect of \(L_1^{-1}\text{.}\)
This way of writing \(A=LU\) as the product of a lower and an upper triangular matrix is known as an \(LU\) factorization of \(A\text{,}\) and its usefulness will be explored in Section 5.1.
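For instance, here is a small \(LU\) factorization, with matrices chosen only for illustration, which you can verify by multiplying out the right-hand side:
\begin{equation*}
\left[\begin{array}{rr}
2 \amp 1 \\
4 \amp 5 \\
\end{array}\right]
=
\left[\begin{array}{rr}
1 \amp 0 \\
2 \amp 1 \\
\end{array}\right]
\left[\begin{array}{rr}
2 \amp 1 \\
0 \amp 3 \\
\end{array}\right]\text{.}
\end{equation*}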
In this section, we found conditions guaranteeing that a matrix has an inverse. When these conditions hold, we also found an algorithm for finding the inverse.
A square matrix \(A\) is invertible if there is a matrix \(B\text{,}\) known as the inverse of \(A\text{,}\) such that \(AB =
I\text{.}\) We usually write \(A^{-1} = B\text{.}\)
If a matrix \(A\) is invertible, we can use Gaussian elimination to find its inverse:
\begin{equation*}
\left[\begin{array}{r|r} A \amp I \end{array}\right] \sim
\left[\begin{array}{r|r} I \amp A^{-1} \end{array}\right]\text{.}
\end{equation*}
In this exercise, we will consider \(2\times 2\) matrices as defining matrix transformations.
Write the matrix \(A\) that performs a \(45^\circ\) rotation. What geometric operation undoes this rotation? Find the matrix that performs this operation and verify that it is \(A^{-1}\text{.}\)
Write the matrix \(A\) that performs a \(180^\circ\) rotation. Verify that \(A^2 = I\) so that \(A^{-1} = A\text{,}\) and explain geometrically why this is the case.
Inverses for certain types of matrices can be found in a relatively straightforward fashion.
The matrix \(D=\begin{bmatrix}
2 \amp 0 \amp 0 \\
0 \amp -1 \amp 0 \\
0 \amp 0 \amp -4 \\
\end{bmatrix}\) is called diagonal since the only nonzero entries are on the diagonal of the matrix.
Explain why the inverse of a diagonal matrix is also diagonal and explain the relationship between the diagonal entries in \(D\) and \(D^{-1}\text{.}\)
Our definition of an invertible matrix requires that \(A\) be a square \(n\times n\) matrix. Let's examine what happens when \(A\) is not square. In that case, \(A\) may still have a left inverse: a matrix \(B\) such that \(BA = I\text{.}\)
If \(A\) has a left inverse \(B\text{,}\) we can still use it to find solutions to linear equations. If we know there is a solution to the equation \(A\xvec = \bvec\text{,}\) we can multiply both sides of the equation by \(B\) to find \(BA\xvec = B\bvec\text{,}\) so that \(\xvec = B\bvec\text{.}\)
Suppose you know there is a solution to the equation \(A\xvec = \threevec{-1}{-3}{6}\text{.}\) Use the left inverse \(B\) to find \(\xvec\) and verify that it is a solution.
If a matrix \(A\) is invertible, there is a sequence of row operations that transforms \(A\) into the identity matrix \(I\text{.}\) We have seen that every row operation can be performed by matrix multiplication. If the \(j^{th}\) step in the Gaussian elimination process is performed by multiplying by \(E_j\text{,}\) then we have
\begin{equation*}
E_p\cdots E_2E_1 A = I\text{,}
\end{equation*}
which means that \(A^{-1} = E_p\cdots E_2E_1\text{.}\)
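As a small illustration (with a matrix chosen only for this purpose), the matrix \(A = \begin{bmatrix}
1 \amp 2 \\
0 \amp 1 \\
\end{bmatrix}\) is transformed to the identity by the single replacement operation that multiplies the second row by \(-2\) and adds the result to the first row, so that
\begin{equation*}
E_1 = \begin{bmatrix}
1 \amp -2 \\
0 \amp 1 \\
\end{bmatrix}, \qquad
E_1A = I, \qquad
A^{-1} = E_1\text{.}
\end{equation*}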
For each of the following matrices, find a sequence of row operations that transforms the matrix to the identity \(I\text{.}\) Write the matrices \(E_j\) that perform the steps and use them to find \(A^{-1}\text{.}\)
Suppose that \(A^2 = AA\) is invertible with inverse \(B\text{.}\) This means that \(A^2B = AAB = I\text{.}\) Explain why \(A\) must be invertible with inverse \(AB\text{.}\)
Suppose that \(A^{100}\) is invertible with inverse \(B\text{.}\) Explain why \(A\) is invertible. What is \(A^{-1}\) in terms of \(A\) and \(B\text{?}\)
We say that two square matrices \(A\) and \(B\) are similar if there is an invertible matrix \(P\) such that \(B = PAP^{-1}\text{.}\)
If \(A\) and \(B\) are similar, explain why \(A^2\) and \(B^2\) are similar as well. In particular, if \(B = PAP^{-1}\text{,}\) explain why \(B^2 =
PA^2P^{-1}\text{.}\)
If \(A\) is similar to \(B\) and \(B\) is similar to \(C\text{,}\) explain why \(A\) is similar to \(C\text{.}\) To begin, you may wish to assume that \(B =
PAP^{-1}\) and \(C = QBQ^{-1}\text{.}\)
Suppose that \(A\) and \(B\) are two \(n\times n\) matrices and that \(AB\) is invertible. We would like to explain why both \(A\) and \(B\) are invertible.
We first explain why \(B\) is invertible.
Since \(AB\) is invertible, explain why any solution to the homogeneous equation \(AB\xvec = \zerovec\) is \(\xvec=\zerovec\text{.}\)
Now, using the fact that the equation \(AB\xvec = A(B\xvec) =
\bvec\) is consistent for every \(\bvec\text{,}\) explain why every equation \(A\xvec = \bvec\) is consistent.
We defined an \(n\times n\) matrix \(A\) to be invertible if there is a matrix \(B\) such that \(AB=I_n\text{.}\) In this exercise, we will explain why it is also true that \(BA = I\text{,}\) which is the statement of Proposition 3.1.6. This means that, if \(B=A^{-1}\text{,}\) then \(A = B^{-1}\text{.}\)
Suppose that \(\xvec\) is an \(n\)-dimensional vector. Since \(AB=I\text{,}\) explain why \(AB\xvec =
\xvec\) and use this to explain why the only vector for which \(B\xvec = \zerovec\) is \(\xvec = \zerovec\text{.}\)