Scientific Computing Using Python - PHYS:4905

Lecture #05 - Prof. Kaaret

Matrices and Gaussian elimination

These notes borrow from Linear Algebra by Cherney, Denton, Thomas, and Waldron and the WikiPedia pages on Augmented matrix and Gaussian elimination.  You need to use the FireFox browser to view the web page with matrices.

Matrices


An  r × k matrix is a rectangular array of numbers M=(mji)M = ( m_j^i ) where i = 1, 2, ..., r and j = 1, 2, ..., k.  Each number m is an element of the matrix.

We can write the matrix in the form

M=(m11m21mk1m12m22mk2m1rm2rmkr)M = \begin{pmatrix} m_1^1 & m_2^1 & ⋯ & m_k^1 \\ m_1^2 & m_2^2 & ⋯ & m_k^2 \\ ⋮ & ⋮ & & ⋮ \\ m_1^r & m_2^r & ⋯ & m_k^r \\ \end{pmatrix}

An r × k matrix has r rows and k columns.  A matrix is called square if its two dimensions are equal.  Makes sense, since you then have a square pattern of numbers when you write it down.

An  r × 1 matrix is a column vector,
 
v=(v1v2vr)v = \begin{pmatrix} v^1 \\ v^2 \\ ⋮ \\ v^r \end{pmatrix}
A 1 × r  matrix is a row vector,
vT=(v1v2vr)v^T = \begin{pmatrix} v^1 & v^2 & ⋯ & v^r \end{pmatrix}

Some people like to add commas to separate the elements of row vectors.

Vectors written in this form are just like vectors written in terms of x, y, z, components as you did in Physics I.  You write a vector specifying the position of an object in 3D space by writing the components as a column vector or as the sum of the components multiplied by unit vectors along each axis,

v=(xyz)=xi^+yj^+zk^v = \begin{pmatrix} x \\ y \\ z \end{pmatrix} = x \, \hat{i} + y \, \hat{j} + z \, \hat{k}
The set of unit vectors {i^,j^,k^}\{ \hat{i}, \hat{j}, \hat{k} \} is called a basis and we'll talk a lot about bases later.

Operations on Matrices

Scalar multiplication: We can multiply a matrix by a scalar, which means that we multiply each element by the scalar.

rM=(rmji)rM = ( rm_j^i )
Addition: We can add matrices, which means that we add each elements of each matrix.  Note that the two matrices must have the same dimensions for addition to be defined.

M+N=(mji+nji)M + N = (m_j^i + n_j^i)
Matrix multiplication: We can multiply matrices,

MN=p=1kmpinjpM N = {∑}_{p=1}^{k} m_p^i n_j^p
Matrix multiplication is not simply multiplying each element of M by the corresponding element of N.  Rather, it is (exactly) like the dot product - you multiply several elements in M by elements in N and then sum the products. 

The simplest example is multiplying a  1 × r  matrix (a row vector) times a r × 1  matrix (column vector). For example,

(123)(456)=1×4+2×5+3×6=32\begin{pmatrix} 1 & 2& 3 \end{pmatrix} \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} = 1×4 + 2×5 + 3×6 = 32

This is the same as the dot product - you multiply each pair of corresponding components and then sum the products.  Note that the row and column vectors need to have the same length.  This is equivalent to the second dimension of the first matrix equaling the first dimension of the second matrix.

Let's look at multiplying a 2×2 matrix times a 2-component column vector.  To do this graphically, we take the column vector and rotate it by 90° counterclockwise (which basically makes it into a row vector).  To get the top line of the resultant column vector, we line up the rotated vector with the top line of the matrix and take the product of each component of the vector with the corresponding component in the matrix.  In this case a×e and b×g.  Finally, we sum the products.  To get the bottom line of the resultant column vector, we line up the rotated vector with the bottom line of the matrix and follow the same procedure.  In the end, we find

(abcd)(eg)=(ae+bgce+dg)\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} e \\ g \end{pmatrix} = \begin{pmatrix} ae + bg \\ ce + dg \end{pmatrix}
Now let's try multiplying a 2×2 matrix by another multiplying a 2×2 matrix.  You can think of the second 2×2 matrix as a pair of column vector and use the same procedure as above.  Try it.  The answer is

(abcd)(efgh)=(ae+bgaf+bhce+dgcf+dh)\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}
In general, doing this for each of the m columns of the second matrix on each of the r rows on the first matrix gives you r × m numbers that form your r × m  output matrix.  Note that the dot product (multiply and add operation) doesn't make sense unless the number of columns of the first matrix equals the number of rows of the second matrix.  So, to multiply two matrices, they do not need to have the same dimensions, but the second dimension of the first matrix must equal the first dimension of the second matrix.

What do you get if you multiply a 3×3 matrix times a column vector?

What do you get if you multiply a 3×3 matrix times a row vector?

What do you get if you multiply a 3×3 matrix times a 3×3 matrix?

What do you get if you multiply a 3×1 matrix times a 1×2 matrix?

In general, multiplying a matrix of the form (r × k) times a matrix of the form (k × m) makes a matrix of the form (r × m).  For matrix multiplication to be define, the number of columns of the first matrix must equal the number of rows of the second matrix.

A Special Matrix

The number 1 is called the 'identity' in multiplication because any number multiplied by the identity is the same number.  Is there a similar identity for matrices?

A square matrix that is zero for all elements off the diagonal is called a diagonal matrix.  A diagonal matrix with all diagonal entries equal to 1 is called the identity matrix.  The identity matrix (I) is special because IM = MI = M.  The identity matrix works just like the multiplicative identity in real numbers (1).  The identity matrix is square and there are an infinite number of identity matrices with different sizes.

Try multiplying the 2×2 identity matrix times a 2×2 matrix.

Linear equations

Linear equations are equations in which all of the terms are either constants or linear in the variables, i.e. they include terms like 2x but none like x2 or x3.  Linear equations can depend on several variables, but the individual terms have to be linear in the variables and cannot have products of the variables, i.e. 3x + 4y is allowed, but 3xy is not.

A system of linear equations is a set of linear equations all involving the same set of variables.  For example,

x+2y+3z=03x+4y+7z=26x+5y+9z=11\begin{matrix} x + 2y + 3z = 0 \\ 3x + 4y + 7z = 2 \\ 6x + 5y +9z = 11 \\ \end{matrix}

Solving linear equations

Now let's try to find the solution of our set of linear equations, i.e. find values for x, y, z that satisfy all of the equations.  First, let's think about what sorts of operations we can do on the equations that keep the solutions of the equations the same.  We'll call those elementary operations.  Then by applying the elementary operations, we can attempt to simplify the equations to the point where we can read off the solutions.

Will the solutions be changed by swapping the order of two equations?

No, they won't.  So, we can define our first elementary operation as swapping two equations.  This seems a bit trivial when thinking of the set as equations, but will be useful when we translate them to matrices.

What other operations can you do that leave the solutions unchanged?

We can multiply any equation by a non-zero number. Since all the terms get multiplied by the same factor, this leaves the solutions unchanged.

Also, we can add any two equations.  Since we apply the summation to both sides of the equation, again the solutions will be unchanged.  More generally, we can add a multiple of any equation to any other equation, replacing the original equation.

Let's try to solve this set of linear equations.
x+y=272x-y=0\begin{matrix} x + y = 27 \\ 2x - y = 0\\ \end{matrix}
Replace the first equation by the sum of the two equations.  We get
3x+0=272x-y=0\begin{matrix} 3x + 0 = 27 \\ 2x - y = 0 \\ \end{matrix}
Divide the first equation by 3.
x+0=92x-y=0\begin{matrix} x + 0 = 9 \\ 2x - y = 0 \\ \end{matrix}
Now we can read off the solution for x as x = 9. 
Now, replace the second equation by the second equation minus 2 times the first equation.
x+0=90-y=-18\begin{matrix} x + 0 = 9 \\ 0 - y = -18 \\ \end{matrix}
Multiply the second equation by -1.
x+0=90+y=18\begin{matrix} x + 0 = 9 \\ 0 + y = 18 \\ \end{matrix}
We can read off the solution for y as y = 18.
Let's check that these values are actually a solution of our original equations.

x+y=9+18=272x-y=2×9-18=0\begin{matrix} x + y = 9 + 18 = 27 \\ 2x - y = 2×9 - 18 = 0 \\ \end{matrix}

Augmented matrices

We can write our system of linear equation in matrix form.
x+y=272x-y=0(112-1)(xy)=(270)\begin{matrix} x + y = 27 \\ 2x - y = 0\\ \end{matrix} \;\;\; ⟺ \;\;\; \begin{pmatrix} 1 & 1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 27 \\ 0 \end{pmatrix}
We could do all of the manipulations above in matrix form.  Note that we would never touch the column vector with x and y, all of the operations would affect only the numbers in the matrix and in the column vector on the right hand side of the equation.  So, we can write this in an even more compact form, leaving out the column vector with x and y.

(11|272-1|0)\left(\begin{array}{cc|l} 1 & 1 & | \, 27 \\ 2 & -1 & | \, 0 \\ \end{array}\right)

This is called an augmented matrix.  Note that we've removed all of the variables, so this form is very amenable to use with numerical matrix operations.

Our final set of equations, where we could read off the solutions, in augmented matrix form is

(10|901|18)\left(\begin{array}{cc|l} 1 & 0 & | \, 9 \\ 0 & 1 & | \, 18 \\ \end{array}\right)

The left hand side of the augmented matrix (the original matrix) is the Identity Matrix which has ones along the diagonal and zeros elsewhere.  Finding a solution of our linear equations is equivalent to changing the left hand side of the augmented matrix to the identity matrix.  The only changes allowed are those that keep the solutions of the system of equations unchanged, specifically the elementary operations.

Our three elementary operations translated to the augmented matrix representation are

These are called elementary row operations or EROs.

Echelon form

We can derive the solutions to a set of equations if we can change the augmented matrix so it is in Row Echelon Form.  This is an example of a matrix in row echelon form:

(231|6011|2003|3)\left(\begin{array}{ccc|l} 2 & 3 & 1 | \, 6 \\ 0 & 1 & 1 | \, 2 \\ 0 & 0 & 3 | \, 3 \\ \end{array}\right)

All entries to the left of the diagonal are zero.  The leftmost nonzero entry in each row is called the leading coefficient or more commonly the pivot of that row.  Echelon means "level or rank in an organization".  In this context, the first row is the upper echelon because it has the most non-zero entries, while the bottom row is the bottom echelon because it has the fewest non-zero entries.

You can't directly read off the solution to the equation from this form, but you can derive it by "back substitution".  The bottom row is equivalent to the equation 3z = 3, which has the solution z = 1.  The middle row is equivalent to the equation y + z = 2.  Since we know z = 1, we can substitute z into this equation and find y + 1 = 2 or y = 1.  Substituting y and z into the equation derived from the top row, we find x = 0.

To make the back substitution easier, it is preferable to get the augmented matrix a form that has and ones along the diagonal (as much as possible) and the minimum number of non-zero entries off the diagonal.  Some people call this the Reduced Row Echelon Form and define it with the following rules.
If the system of equations has one unique solution, then it will be possible to change the left hand part of the augmented matrix to be the identity matrix.  Once you've done that, you'll be able to read off the solutions from the right hand part of the augmented matrix (the column vector part).  If it is not possible to change the left hand part into the identity matrix, that means that the set of linear equations is redundant, so there are many solutions, or inconsistent, so there are no solutions.

Gaussian elimination

The process of applying elementary row operations to simplify the augmented matrix to Reduced Row Echelon Form is called 'Gaussian elimination'.  We can write down an algorithm for Gaussian elimination.
  1. Check that the first entry of the first row is non-zero.  If not, swap that row with a row that does have a non-zero first entry. (ERO - Row swap)
  2. Multiply the first row by a scalar to make the first entry equal 1. (ERO - scalar multiplication)
  3. Use that 1 as a pivot to zero out the first entry of all of the other rows.  More explicitly, for each other row, multiply the top row by the negative of the first entry in the row and add that to the row. (ERO - row addition)  Note that the first row does not change during this step.
  4. In the first column, you should now have a one in the first row and zeros in the other rows.
  5. Move to the second row.
  6. Check that the second entry of the the current row is non-zero.  If not, swap that row with a row (other than the first row) that does have a non-zero second entry. (ERO - Row swap)
  7. Multiply the second row by a scalar to make the second entry equal 1. (ERO - scalar multiplication)
  8. Use that 1 as a pivot to zero out the second entry of all of the other rows.  More explicitly, for each other row, multiply the second row by the negative of the second entry in the row and add that to the row. (ERO - row addition)  Note that the second row does not change during this step, but the first row probably will.
  9. In the second column, you should now have a one in the second row and zeros in the other rows.
  10. Repeat the process in lines 5 through 9 for the remaining rows.  Note that for the third row, you work on the third entry, etc...
You should practice Gaussian elimination as we will be using it repeatedly in this class.

Practice

Write each following systems of linear equations as an augmented matrix and follow the algorithm above to perform Gaussian elimination and find the solution.

x+5y=7-2x-7y=-5\begin{matrix} x +5y = 7 \\ -2x - 7y = -5 \\ \end{matrix
and
2y+z=-8x-2y-3z=0-x+y+2z=3\begin{matrix} 2y + z = -8 \\ x -2y -3z = 0 \\ -x + y + 2z = 3 \\ \end{matrix}


Assignment

The assignment for today's class is HW #5 and involves doing Gaussian elimination and matrix operations with no programming.  The assignment is due at the beginning of the next class.  It will be very useful to do it before class, since that class will be devoted to doing Gaussian elimination in Python.