Scientific Computing Using Python - PHYS:4905

Lecture #10 - Prof. Kaaret

Matrices, their inverses, and determinants

These notes borrow from Linear Algebra by Cherney, Denton, Thomas, and Waldron.


Transpose of a matrix

The transpose T = M^T of an r × k matrix M is the k × r matrix T = (m_i^j), where i = 1, 2, ..., r and j = 1, 2, ..., k.  The elements are swapped across the diagonal.  If the matrix M is not square, then the transpose will have a different shape.  For example, the transpose of a column vector is a row vector.

Taking the transpose twice gets you back to the original matrix.

The transpose of the product of matrices equals the product of the transposes with the order swapped,

(MN)^T = N^T M^T
If the transpose equals the matrix, M = M^T, then the matrix is symmetric.  Of course, only square matrices can be symmetric (just from their geometry).

The inverse of the transpose is the transpose of the inverse, (M^{-1})^T = (M^T)^{-1}.  I'll let you work that one out for yourself.
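You can also check these transpose rules numerically.  A minimal NumPy sketch (the example matrices here are arbitrary, not from the notes):

    import numpy as np

    # Arbitrary invertible example matrices.
    M = np.array([[1.0, 2.0], [3.0, 4.0]])
    N = np.array([[0.0, 1.0], [5.0, 2.0]])

    print(np.allclose((M @ N).T, N.T @ M.T))                     # (MN)^T = N^T M^T
    print(np.allclose(np.linalg.inv(M).T, np.linalg.inv(M.T)))   # (M^-1)^T = (M^T)^-1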

Raising a matrix to a power

We can define powers of matrices, e.g. M^2 = MM, but only for square matrices.  Why?

In general, to find M^n, you simply multiply M times itself n times.

Similar to the fact that x^0 = 1 for all real numbers x, we define M^0 = I, where I is the identity matrix.
This allows us to evaluate any polynomial on any square matrix.
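For example, here is a minimal NumPy sketch of evaluating a polynomial on a matrix (an arbitrary polynomial and matrix, not the exercise below):

    import numpy as np

    M = np.array([[1.0, 2.0],
                  [0.0, 3.0]])

    # np.linalg.matrix_power handles M^0 = I, M^1 = M, M^2 = M M, ...
    print(np.linalg.matrix_power(M, 0))          # the identity matrix

    # Evaluate g(x) = 5 + 4x + x^2 at the matrix M: g(M) = 5 I + 4 M + M^2
    gM = 5 * np.eye(2) + 4 * M + np.linalg.matrix_power(M, 2)
    print(gM)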

Exercise: Let f(x) = 3x + 2x^3 and M = \begin{pmatrix} 2 & 1 \\ 2 & 3 \end{pmatrix}.  What is f(M)?


Trace

The trace of a square matrix is the sum of its diagonal entries,

tr(M) = \sum_{i=1}^{n} m_i^i
Taking the transpose does not affect the trace, since the transpose doesn't change any of the diagonal elements, tr(M) = tr(M^T).

Even though matrix multiplication does not commute, the trace of a product of matrices does not depend on the order of multiplication, tr(MN) = tr(NM).

The trace is a linear operation that maps square matrices to the real numbers.
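Both trace facts are easy to check numerically.  A quick NumPy sketch with arbitrary random matrices:

    import numpy as np

    M = np.random.rand(3, 3)
    N = np.random.rand(3, 3)

    print(np.isclose(np.trace(M), np.trace(M.T)))        # tr(M) = tr(M^T)
    print(np.isclose(np.trace(M @ N), np.trace(N @ M)))  # tr(MN) = tr(NM)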

Block matrices

Sometimes it is efficient to break a matrix into blocks.  For example,



You can then work out matrix multiplication using the smaller matrices in the blocks. 
For example, we can find the square of M using these blocks.  First, we do the matrix math on the blocks in schematic form,


Then, we can calculate each of these expressions (which are polynomials in A, B, C, and D) using the matrices for each block as defined above,



Then, we substitute these results back into the schematic form of our matrix multiplication and get the answer,



This is exactly what we would have gotten if we did the full matrix multiplication.

This example doesn't actually save much work.  Block matrices are much more useful if some of the blocks are zero or an identity matrix.
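Here is a minimal NumPy sketch of block-matrix multiplication, using np.block to assemble a matrix from arbitrary blocks A, B, C, and D and checking the block formula for M^2 against the full multiplication:

    import numpy as np

    # Arbitrary blocks with compatible shapes.
    A = np.array([[1.0, 2.0], [3.0, 4.0]])   # 2x2
    B = np.array([[5.0], [6.0]])             # 2x1
    C = np.array([[7.0, 8.0]])               # 1x2
    D = np.array([[9.0]])                    # 1x1

    M = np.block([[A, B],
                  [C, D]])

    # M^2 computed block by block, then reassembled.
    M2_blocks = np.block([[A @ A + B @ C, A @ B + B @ D],
                          [C @ A + D @ C, C @ B + D @ D]])

    print(np.allclose(M @ M, M2_blocks))     # same answer as the full multiplication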

Exercise: The matrix \begin{pmatrix} \cos θ & \sin θ \\ -\sin θ & \cos θ \end{pmatrix} rotates a vector in a two-dimensional space through an angle θ.  We can use a block matrix to do a rotation in three dimensions.  The matrix

M(θ) = \begin{pmatrix} \cos θ & \sin θ & 0 \\ -\sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{pmatrix}

will rotate a 3-dimensional vector (x, y, z) around the z axis (or in the xy plane) through an angle θ.

Using block matrices, find the matrix product M(α) M(β).
What does this matrix do when applied to a 3-dimensional vector (x, y, z)?
What matrix would rotate a 3-dimensional vector (x, y, z) around the x axis?
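To experiment with these rotations numerically, here is a minimal NumPy sketch that assembles M(θ) as a block matrix and applies it to a vector (the vector is an arbitrary example):

    import numpy as np

    def rotation_z(theta):
        # 2x2 rotation block, padded with a zero column, a zero row, and a 1x1 identity block.
        R = np.array([[np.cos(theta),  np.sin(theta)],
                      [-np.sin(theta), np.cos(theta)]])
        return np.block([[R, np.zeros((2, 1))],
                         [np.zeros((1, 2)), np.eye(1)]])

    v = np.array([1.0, 0.0, 2.0])
    print(rotation_z(np.pi / 2) @ v)   # the z component is unchanged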


Solving linear equations with the inverse

The process of Gaussian elimination to find the solution of a system of linear equations is equivalent to finding an inverse matrix.  Let's look at this in matrix form.

We start with a system of linear equations that we can write in matrix form as MX = V, where M is a matrix and X and V are column vectors.  The problem is specified by M and V.  The solution is X.

We write the system of equations in augmented matrix form as (M | V).  We apply a sequence of elementary row operations to M that reduce it to the identity matrix I.  Taken together, these operations are equivalent to multiplying by M^{-1}.  We apply the same operations to V to get the solution.  This is the same as multiplying V by M^{-1}, so the solution of the system of linear equations is
X = M^{-1} V.

If we want to solve another system of linear equations that has the same M, but a different V that we'll call W, we have already done most of the work.  We just need to find M^{-1} W.
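A minimal NumPy sketch of this reuse (the matrix and right-hand sides are arbitrary examples):

    import numpy as np

    M = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    V = np.array([3.0, 5.0])
    W = np.array([1.0, 0.0])

    Minv = np.linalg.inv(M)     # all of the elimination work happens here, once
    print(Minv @ V)             # solution of M X = V
    print(Minv @ W)             # solution of M X = W, reusing the same inverse

    # In numerical practice, np.linalg.solve(M, V) is usually preferred to forming
    # the inverse explicitly, but the idea of reusing the work is the same.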


When does the inverse exist?

A square matrix M is invertible if and only if the homogeneous system of equations Mx = 0 has no non-zero solutions.

If M^{-1} exists, then we can multiply both sides of Mx = 0 by the inverse.  We get

    M^{-1} M x = M^{-1} 0 → I x = 0 → x = 0.

So the only solution of Mx = 0 must be x = 0.
If M^{-1} does not exist, then we cannot multiply both sides of Mx = 0 by the inverse.

This condition must be satisfied for a matrix to be invertible, but is it sufficient to ensure the existence of the inverse?


Inverse of a 2×2 matrix

If we have a 2×2 matrix M = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, then its inverse is M^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.

Some of you may already know this formula.  Or you can verify that it is correct by doing the matrix multiplication.
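You can also let NumPy do the check.  A minimal sketch with arbitrary entries a, b, c, d:

    import numpy as np

    a, b, c, d = 2.0, 1.0, 5.0, 3.0
    M = np.array([[a, b], [c, d]])

    Minv_formula = np.array([[d, -b], [-c, a]]) / (a * d - b * c)
    print(np.allclose(Minv_formula, np.linalg.inv(M)))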

Here, we write it down because it is useful for figuring out when a 2×2 matrix has an inverse.  The matrix will have an inverse as long as the expression in the denominator of the fraction does not equal zero, specifically ad - bc ≠ 0.  We call that expression the determinant.  (Mostly because it determines whether or not the matrix has an inverse.)


Determinants

If we write the determinant for a 2×2 matrix using the index notation for the matrix elements, we have

\det(M) = m_1^1 \, m_2^2 - m_2^1 \, m_1^2
The inverse of M exists as long as \det(M) ≠ 0.

How do we generalize this to more dimensions?

First, we need a little background thinking on permutations, or ways to rearrange things.

Let's consider an ordered set of 3 objects, that we'll call (1, 2, 3).  How many ways can we shuffle those 3 objects?

We can build up all the different arrangements by swapping pairs of objects.
Start with (1, 2, 3). 
Swap the last two to get (1, 3, 2) - odd.
Swap the first two to get (3, 1, 2) - even.
Swap the last two to get (3, 2, 1) - odd.
Swap the first two to get (2, 3, 1) - even.
Swap the last two to get (2, 1, 3) - odd.

Can we get any more permutations?  How about swapping the last two? How about swapping the first two? How about swapping the first and the last?

So, we have 6 permutations of 3 objects.  In general, there are n! permutations of n objects since there are n choices for the first object, n-1 choices for the second once the first has been chosen, and so on.

We also define the sign sgn(σ) for the different permutations σ.  The sign is +1 if the permutation is even, meaning it takes an even number of swaps to reach it from the original ordering, and -1 if the permutation is odd.  Above, we recorded the parity (whether odd or even) of the swaps.

Now we can write down an equation for the determinant,

\det(M) = \sum_{σ} \, sgn(σ) \, m_{σ(1)}^1 \, m_{σ(2)}^2 \, ⋯ \, m_{σ(n)}^n
The sum is over all permutations of the set (1, 2, ... n) where n is the dimension of the matrix.  Each term in the sum for a determinant is a product of n entries, each from a different row of the matrix.  In the different terms, the column indexes are shuffled by the different permutations σ.

Does this work for a 2×2 matrix?

We start with the set (1, 2).  The only other permutation is (2, 1), which is odd.  Note 2! = 2.  We have to sum over σ = (1, 2) and (2, 1), with the latter coming in with a minus sign.  So, we get

\det(M) = +m_1^1 \, m_2^2 - m_2^1 \, m_1^2,
which matches what we found above.

How many terms are in the sum for the determinant of a 3×3 matrix?  How about a 5×5 matrix?

Exercise: what is the determinant of a 3×3 matrix?
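For larger matrices it is easier to let the computer do the bookkeeping.  A minimal Python sketch of the permutation-sum formula (only practical for small n, since the number of terms grows as n!; checked against np.linalg.det):

    import numpy as np
    from itertools import permutations

    def sign(perm):
        # Parity by counting inversions: +1 for an even permutation, -1 for odd.
        n = len(perm)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        return 1 if inversions % 2 == 0 else -1

    def det_by_permutations(M):
        n = M.shape[0]
        # Each term is a product of n entries, one from each row, columns shuffled by sigma.
        return sum(sign(p) * np.prod([M[i, p[i]] for i in range(n)])
                   for p in permutations(range(n)))

    M = np.random.rand(3, 3)
    print(np.isclose(det_by_permutations(M), np.linalg.det(M)))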

Fun facts about determinants

If the matrix has a row consisting entirely of zeros, then det(M) = 0.  You can see this by going back to the description of the terms above.  Each term in the sum for a determinant is a product of n entries, each from a different row of the matrix.  If there is a row with all zeros, then every term will have at least one factor that equals zero.  Therefore, the product will be zero.  Since this is true for every term, the sum will also be zero.

If M is diagonal, then the determinant is the product of the diagonal entries, or \det(M) = m_1^1 \, m_2^2 \, ⋯ \, m_n^n.  This is true since all of the summands involving off-diagonal entries are zero.

The determinant of the identity matrix is 1.  All of the diagonal entries are 1, so their product is 1.  This is true regardless of the dimension of the matrix.

If we swap two rows of the matrix, then the determinant changes sign.  This one is best considered in a quiet room.  However, you can easily check that it is true for a 2×2 matrix.

This implies that if two rows of the matrix are identical, then the determinant is zero: swapping the two identical rows leaves the matrix unchanged but flips the sign of the determinant, and zero is the only number that equals its negative.

The determinant of the product of two matrices is the product of the determinants,

\det(MN) = \det(M) \, \det(N)
This turns out to be very useful.  For a proof, see section 8.2.4 of the Linear Algebra textbook.

If M is a square matrix, then \det(M^T) = \det(M).

If the inverse of M exists, then \det(M^{-1}) = \frac{1}{\det(M)}.

Since we are fond of elementary row operations, we can figure out the determinants of the corresponding matrices.

Row swap - we did this one above. The determinant of a row swap is -1.

Row multiplication by a scalar - The matrix for this is the identity matrix with one element on the diagonal replaced by a number, λ, not equal to one.  The matrix is diagonal, so the determinant is the product of the diagonal entries, which equals λ.

Row addition - The matrix for this is the identity matrix with one element off the diagonal replaced by a number, λ, not equal to zero.  The determinant is 1.  This is worked out in section 8.2.3 of the Linear Algebra textbook.
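A quick numerical check of all three cases (3×3 examples with an arbitrary λ):

    import numpy as np

    lam = 4.0

    E_swap = np.array([[0.0, 1.0, 0.0],      # swap rows 1 and 2
                       [1.0, 0.0, 0.0],
                       [0.0, 0.0, 1.0]])
    E_scale = np.diag([1.0, lam, 1.0])       # multiply row 2 by lam
    E_add = np.eye(3)
    E_add[0, 2] = lam                        # add lam times row 3 to row 1

    print(np.linalg.det(E_swap))             # -1
    print(np.linalg.det(E_scale))            # lam
    print(np.linalg.det(E_add))              # 1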

Minors, adjoints, and inverses

A minor of an n×n matrix M is the determinant of any square matrix obtained from M by deleting one row and one column.  If we pick entry m_j^i, we then delete row i and column j of M, and the determinant of the resulting matrix is the minor associated with that entry.

The cofactor of M corresponding to the entry m_j^i is the product of the minor associated with m_j^i and (-1)^{i+j}.  This is written as cofactor(m_j^i).

The adjoint matrix of M is found by taking the cofactor of each entry of M and then taking the transpose,

adj(M) = (cofactor(m_j^i))^T

where i and j run from 1 to n.

An example is


Let i and j run from 1 to 3 and check each entry.

All of this is useful because we can use the adjoint along with the determinant to find the inverse,

M^{-1} = \frac{1}{\det(M)} \, adj(M)
This gives an explicit formula for the inverse.  This is sometimes called Cramer's rule.  Cramer's rule is the basis of the formula given at the beginning of the lecture for the inverse of a 2×2 matrix.  That formula is quicker than doing Gaussian elimination.  Finding the inverse of a 3×3 matrix using Cramer's rule is also a reasonable task.
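A minimal NumPy sketch of this formula, building the adjoint entry by entry from cofactors and checking it against np.linalg.inv (the example matrix is arbitrary):

    import numpy as np

    def adjoint(M):
        n = M.shape[0]
        cof = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                # Minor: delete row i and column j, then take the determinant.
                minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
                cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return cof.T                         # adjoint = transpose of the cofactor matrix

    M = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
    print(np.allclose(adjoint(M) / np.linalg.det(M), np.linalg.inv(M)))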

However, let's consider a 5×5 matrix.  The first bit of calculation would be to find the determinant in the denominator of the fraction.  How many terms are there in the sum for the determinant of a 5×5 matrix?  The answer is 120.  Then one would have to find determinants for 25 different 4×4 matrices, each with 24 terms.  Along with the first determinant, we're looking at 720 terms, each with 4 or 5 factors.  Cramer's rule rapidly becomes computationally tedious for large matrices.  Gaussian elimination also becomes tedious, but less so.  The number of operations needed for Gaussian elimination is usually equal to the number of operations needed to calculate the first determinant for Cramer's rule.

When does the inverse exist? (the remake)

A matrix has an inverse if its determinant is non-zero.  If the determinant is zero, the inverse does not exist and the matrix is  singular.

The determinant can also be used to determine if a set of vectors is linearly dependent or independent.  Recall from lecture #9 that a set of vectors v1, v2, ..., vn is linearly dependent if there exists a set of scalars c1, c2, ..., cn that are not all zero such that

  c1v1 + c2v2 + ... + cnvn = 0

Let's rewrite this with the c's on the right and write the vectors as column vectors,

\begin{pmatrix} (v_1)_1 \\ (v_1)_2 \\ ⋮ \\ (v_1)_n \end{pmatrix} c^1 + \begin{pmatrix} (v_2)_1 \\ (v_2)_2 \\ ⋮ \\ (v_2)_n \end{pmatrix} c^2 + ⋯ + \begin{pmatrix} (v_n)_1 \\ (v_n)_2 \\ ⋮ \\ (v_n)_n \end{pmatrix} c^n = 0
We can think of this as matrix multiplication V c = 0 with

\begin{pmatrix} (v_1)_1 & (v_2)_1 & ⋯ & (v_n)_1 \\ (v_1)_2 & (v_2)_2 & ⋯ & (v_n)_2 \\ ⋮ & ⋮ & & ⋮ \\ (v_1)_n & (v_2)_n & ⋯ & (v_n)_n \end{pmatrix} \begin{pmatrix} c^1 \\ c^2 \\ ⋮ \\ c^n \end{pmatrix} = 0
This is a homogeneous system of equations and has a non-zero solution only if V^{-1} does not exist, which occurs when the matrix V is singular and its determinant is zero.  Therefore, the set of vectors is linearly dependent only if det(V) = 0.
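A minimal NumPy sketch of this test, with a deliberately dependent set of example vectors:

    import numpy as np

    v1 = np.array([1.0, 0.0, 2.0])
    v2 = np.array([0.0, 1.0, 1.0])
    v3 = v1 + 2 * v2                        # built to be dependent on v1 and v2

    V = np.column_stack([v1, v2, v3])       # the vectors are the columns of V
    print(np.isclose(np.linalg.det(V), 0.0))   # True: the set is linearly dependent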

Assignment

The assignment for today's class involves matrices and their inverses.

HW #10 is due on 10/3.