Scientific Computing Using Python - PHYS:4905
Lecture Notes #19 - Prof. Kaaret
These notes borrow from Linear Algebra
by Cherney, Denton, Thomas, and Waldron.
String Theory
If we tie down both ends of a string, we can excite standing waves
on the string. The ends of the string must be at nodes (zero
amplitude of displacement), since they are tied down. Standing
waves must have wavelength λ = 2L/n, where L is the
length of the string and n = 1, 2, 3, .... The different allowed wavelengths are
“harmonics”.
The waves on the string obey the wave equation
∂²y/∂t² = c² ∂²y/∂x²,
where c is the speed of the wave.
The solutions to the wave equation are functions of the form
y(x, t) = a sin(kx) sin(ωt).
To check, we need to insert this into the wave equation.
Taking derivatives, we find
∂²y/∂t² = −ω² a sin(kx) sin(ωt)
and
∂²y/∂x² = −k² a sin(kx) sin(ωt).
Putting these into the wave equation, we find
−ω² a sin(kx) sin(ωt) = −c² k² a sin(kx) sin(ωt).
The a and the sine functions cancel, so we have a solution
if ω = ck.
For our string theory problem, the boundary condition at x =
0 is automatically satisfied, since sin(0) = 0.
To satisfy the boundary condition at x = L, we
require kL = nπ with n = 1, 2, 3, ..., since sine has
zeros at 0, π, 2π, 3π, ...
The solutions to our string theory problem are then any linear
combination of functions of the form
y_n(x, t) = a_n sin(k_n x) sin(ω_n t),
where k_n = nπ/L and ω_n = c k_n = nπc/L.
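Since this is a Python course, we can let SymPy check the algebra for us. The snippet below is just a verification sketch (the symbol names are mine): it confirms that y_n satisfies the wave equation and vanishes at both ends of the string.

```python
import sympy as sp

# Symbols: position x, time t, string length L, wave speed c, amplitude a, harmonic number n
x, t, L, c, a = sp.symbols('x t L c a', positive=True)
n = sp.symbols('n', integer=True, positive=True)

# Standing-wave trial solution with k_n = n*pi/L and omega_n = c*k_n
k = n * sp.pi / L
omega = c * k
y = a * sp.sin(k * x) * sp.sin(omega * t)

# Wave equation residual: d^2y/dt^2 - c^2 * d^2y/dx^2 should be identically zero
residual = sp.diff(y, t, 2) - c**2 * sp.diff(y, x, 2)
print(sp.simplify(residual))      # 0

# Boundary conditions: the string is tied down at x = 0 and x = L
print(y.subs(x, 0))               # 0
print(y.subs(x, L))               # 0, since sin(n*pi) = 0 for integer n
```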
A Linear Operator for String Theory
Now we'll translate this to linear functions and vector spaces.
We can define a linear function
W = ∂²/∂t² − c² ∂²/∂x²
that acts on the vector space V of functions y(x, t) for which all
partial derivatives of the form ∂^(k+m) y / ∂x^k ∂t^m,
where k and m are any positive integers, exist.
The function W is linear if
W(a y1 + b y2) = a W(y1) + b W(y2)
for all y1, y2 in V and all scalars a and b. Is
that true of W acting on any y that is an element of
V?
Note that vector spaces need to have closure under addition and
scalar multiplication and that a linear operator defined on the
space also needs to have closure - the result of applying the linear
operator must also be in the vector space. Do you see why the
condition on partial derivatives is needed?
Now our wave equation can be written as W y = 0.
It is called a homogeneous equation because one side of the equation
is zero. The solutions of this equation are a "sub-space" of
the vector space V. This particular sub-space, the set
of all elements y in the vector space V for which W y
= 0, is called the "null space" of W. It is also
called the "nullspace" (nulling out the space ;) and the "kernel"
(which unfortunately has nothing to do with Colonel Sanders).
It is sometimes written as ker(W) or Null(W).
The nullspace is also a vector space. Most of the properties
needed to make the nullspace a vector space transfer automatically
from V. The only one that you might worry about is
closure. (People often have a hard time getting closure.)
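The same idea shows up in finite dimensions. As a small aside (the example matrix here is my own, not from the lecture), here is one way to compute the null space of a matrix numerically with NumPy, using the singular value decomposition: the null space is spanned by the right singular vectors whose singular values are (numerically) zero.

```python
import numpy as np

def null_space(A, tol=1e-12):
    """Return a matrix whose columns form an orthonormal basis of the null space of A."""
    U, s, Vh = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vh[rank:].T   # right singular vectors belonging to (numerically) zero singular values

# Example matrix with a one-dimensional null space spanned by (1, 1, -1)
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
N = null_space(A)
print(N)                       # one column, proportional to (1, 1, -1)
print(np.allclose(A @ N, 0))   # True: A sends every null-space vector to zero
```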
Eigenvalues and Eigenvectors
Let's look at just the spatial parts of our functions. We can
introduce a vector space U that is the set of functions f(x)
for which all derivatives exist. (The derivatives are, of
course, with respect to x and are total derivatives since
the functions depend only on x.) We can introduce a
linear operator L on this vector space that is the spatial
part of W,
L = −c² d²/dx².
If we take the spatial part of any of the solutions to our string
theory problem,
f_n(x) = a_n sin(nπx/L),
then we find that
L f_n = c² (nπ/L)² f_n,
or
L f_n = λ_n f_n,
where λ_n = (nπc/L)². The equation just above is called the
eigenvalue-eigenvector equation or just the eigenvector
equation. "Eigen" is a German adjective meaning own, as
in "my own house", and characteristic,
as in "he answered with his
characteristic cynicism". The eigenvector of a linear
operator are the vectors whose direction is unchanged when acted
on by the linear operator, i.e. the result of L acting
on fn
is a scalar multiplied by fn. The
scalar is the eigenvalue of that eigenvector, which in our
equation above is λn. The eigenvectors provide
a means to characterize a linear operator. For this
reason, the textbook says that "This is perhaps one of the most
important equations in all of linear algebra!".
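To see the eigenvalue equation in action, we can apply the spatial operator to sin(nπx/L) with SymPy. This is a quick verification sketch, assuming the form L = −c² d²/dx² reconstructed above; the expected eigenvalue is λ_n = (nπc/L)².

```python
import sympy as sp

x, L, c = sp.symbols('x L c', positive=True)
n = sp.symbols('n', integer=True, positive=True)

f = sp.sin(n * sp.pi * x / L)          # spatial part of the n-th standing wave
Lf = -c**2 * sp.diff(f, x, 2)          # apply the spatial operator L = -c^2 d^2/dx^2
lam = (n * sp.pi * c / L)**2           # expected eigenvalue lambda_n

print(sp.simplify(Lf - lam * f))       # 0, so L f_n = lambda_n f_n
```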
Invariant Directions
The eigenvectors are directions where the output of the linear
operator is parallel to the input. These are called 'invariant
directions', since they are not changed by the linear
operator. Finding the eigenvectors, or the invariant
directions, and then changing to a basis defined by the eigenvectors
greatly simplifies calculations with the linear operator.
Let's try an example. Consider a linear operator L acting
on ℝ², defined by the transformations of the two standard basis
vectors, L(e1) and L(e2). The matrix for L in the standard basis
then has these two output vectors as its columns.
Let's make an inspired guess at an eigenvector and apply L to it.
The output vector is the same as the input vector. This means
that the vector is along an invariant direction and is an
eigenvector. In this case, the output vector has the same
magnitude as the input vector, so the eigenvalue = 1.
Note that any vector in the same direction as our guess
can be written as a scalar, c, multiplied by it. We
then have
L(c v) = c L(v) = c v.
So the new vector is also an
eigenvector of L with the same eigenvalue of 1.
Now let's make another inspired guess. This vector is another
eigenvector of L, but this time the magnitude of the output is twice
the magnitude of the input, so the eigenvalue is 2. Any vector
in the same direction gets stretched by L by a factor of 2.
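We can check inspired guesses like these numerically. The matrix below is a stand-in chosen for illustration (it also has eigenvalues 1 and 2, with invariant directions (3, 5) and (1, 2)); it is not necessarily the matrix used in lecture.

```python
import numpy as np

# Stand-in matrix with eigenvalues 1 and 2 (an illustrative assumption, not the lecture's matrix)
M = np.array([[-4.0, 3.0],
              [-10.0, 7.0]])

v1 = np.array([3.0, 5.0])   # first inspired guess
v2 = np.array([1.0, 2.0])   # second inspired guess

print(M @ v1)               # [3. 5.]  -> unchanged, so eigenvalue 1
print(M @ v2)               # [2. 4.]  -> stretched by 2, so eigenvalue 2

# Any multiple of an eigenvector is an eigenvector with the same eigenvalue
c = 7.0
print(np.allclose(M @ (c * v1), c * v1))   # True
```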
Diagonalization
We can write any vector w in ℝ² as a linear
combination of the two eigenvectors, which we will call v1
(eigenvalue 1) and v2 (eigenvalue 2):
w = c1 v1 + c2 v2.
Applying L, we find
L(w) = c1 L(v1) + c2 L(v2) = c1 v1 + 2 c2 v2.
Transforming to a basis using v1 and v2 as basis
vectors, the vector w would be written as (c1, c2)
and the matrix representation of L would be
the 2×2 diagonal matrix with 1 and 2 on the diagonal and zeros elsewhere. The
matrix is diagonal and the diagonal entries are the
eigenvalues. As you might imagine, this greatly simplifies
calculations involving L.
The process of transforming to a basis consisting of the
eigenvectors is called diagonalization. To do this, we
can't always rely on inspired guesses, but instead need an
algorithm.
Let's try finding the eigenvectors for the linear transformation L
described in the standard basis by a 2×2 matrix M.
We want to find invariant directions. That means that L
will transform the column vector (x, y)
to a multiple of that vector. In matrix notation,
M (x, y) = λ (x, y).
Rewriting the right hand side as a matrix expression,
λ (x, y) = λI (x, y), where I is the identity matrix,
and subtracting the same matrix from both sides, we find
(M − λI) (x, y) = 0.
A system of equations where the column vector part is zero is called
a homogeneous system. Homogeneous systems have non-zero
solutions only if the inverse to the matrix part does not
exist. To see this, think about the situation if that inverse
did exist: then we could multiply both sides from the left by the inverse
and we would find that x = y = 0. The inverse of the matrix
exists if its determinant is not zero, so we need λ to make the
determinant of M − λI equal to zero in order to get the non-zero
solutions that let us diagonalize the linear
operator (diagonalizable is a fun word to say).
In equations, we need
det(M − λI) = 0.
Solving this, we find that L has two eigenvalues, which for our example are
λ = 1 and λ = 2.
The left hand side, det(M − λI), is a polynomial in λ
that is, up to an overall sign, the so-called characteristic polynomial.
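In Python, this step amounts to building the polynomial det(M − λI) and finding its roots. The sketch below uses the same stand-in matrix as above (an illustrative choice, not necessarily the lecture's), for which the polynomial is λ² − 3λ + 2.

```python
import numpy as np

M = np.array([[-4.0, 3.0],
              [-10.0, 7.0]])    # illustrative stand-in matrix

# For a 2x2 matrix, det(M - lambda*I) = lambda^2 - tr(M)*lambda + det(M).
coeffs = [1.0, -np.trace(M), np.linalg.det(M)]
print(np.sort(np.roots(coeffs)))        # [1. 2.]

# For larger matrices, np.poly(M) builds the characteristic polynomial coefficients.
print(np.sort(np.roots(np.poly(M))))    # [1. 2.]
```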
Now we need to find the corresponding eigenvectors. The
eigenvectors must satisfy the equation
M (x, y) = λ (x, y)
with the appropriate value of λ. As we did above, we can
write this as a homogeneous system,
(M − λI) (x, y) = 0.
Plugging in the first eigenvalue, λ = 1, we
get two equations for x and y.
The top row gives us one relation between x and y.
The bottom row gives the same relation. We are free to choose
any length for the vector as long as it points in the invariant
direction, so we can choose x = 1 and get the first
eigenvector.
Doing the same for the second eigenvalue, λ = 2, we
find the second eigenvector.
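Numerically, we can find each eigenvector by computing the null space of M − λI (using the SVD as before) and then rescaling so that x = 1. Again, the matrix is the illustrative stand-in, not necessarily the one from lecture.

```python
import numpy as np

M = np.array([[-4.0, 3.0],
              [-10.0, 7.0]])    # illustrative stand-in matrix

def eigenvector_for(M, lam):
    """Solve the homogeneous system (M - lam*I) v = 0 and scale so the first component is 1."""
    A = M - lam * np.eye(M.shape[0])
    U, s, Vh = np.linalg.svd(A)
    v = Vh[-1]                  # right singular vector for the (numerically) zero singular value
    return v / v[0]             # the "choose x = 1" normalization

for lam in (1.0, 2.0):
    v = eigenvector_for(M, lam)
    print(lam, v, np.allclose(M @ v, lam * v))
# 1.0 [1.         1.66666667] True   (proportional to (3, 5))
# 2.0 [1. 2.] True
```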
Algorithm for diagonalization
Let's review our algorithm for finding eigenvalues and eigenvectors
so that we can put a matrix into diagonal form.
First, we found the characteristic polynomial of the matrix M
representing the linear operator L. In equation form the
characteristic polynomial is det(λI − M).
Second, we found the roots of the characteristic polynomial.
Actually, we found the roots of det(M − λI), which differs from
the characteristic polynomial by at most an overall sign.
The roots are the same, but writing it this way tends to save on
minus signs.
Third, we plugged each eigenvalue into the equation (M − λI) v = 0
to find the associated eigenvector. Note that only the
direction of the eigenvector is important. You are free to
choose the magnitude.
Doesn't this newfound knowledge make you want to go out and
diagonalize some matrices?
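In practice, NumPy bundles all three steps into a single call, np.linalg.eig. Here is a quick comparison, again using the stand-in matrix from above.

```python
import numpy as np

M = np.array([[-4.0, 3.0],
              [-10.0, 7.0]])    # illustrative stand-in matrix

# np.linalg.eig returns the eigenvalues and a matrix whose columns are the eigenvectors.
evals, evecs = np.linalg.eig(M)
print(evals)                    # eigenvalues 1 and 2 (the order is not guaranteed)
for lam, v in zip(evals, evecs.T):
    print(np.allclose(M @ v, lam * v))    # True for every eigenvalue/eigenvector pair
```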
Having issues
One issue that might come up is that the roots of the characteristic
polynomial may include complex values. The Fundamental Theorem
of Algebra states that any polynomial can be factored into a product
of first order polynomials, but only if you allow complex numbers
in the first order polynomials and therefore complex roots. In
physics, we usually only consider operators that have real
eigenvalues, and we'll discuss conditions that ensure this later.
Another issue that might come up is that some of the roots of the
characteristic polynomial may be the same. The number of times
that any given root appears in the collection of eigenvalues is
called its multiplicity. If you have a root with a
multiplicity greater than one, then your equation in step
three for that root will have multiple,
linearly-independent solutions. In fact, for a diagonalizable
operator, the number of linearly independent vectors satisfying the
equation will equal the multiplicity. In that
case, you are free to choose as your eigenvectors any
set of linearly independent vectors satisfying the equation,
with the number of vectors in the set equal
to the multiplicity. The
set of eigenvectors with the same eigenvalue λ_n spans
a sub-space that is called the eigenspace of λ_n.
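Here is a small illustration of multiplicity (the matrix is my own example, not from the lecture): the eigenvalue 4 appears twice, and the dimension of its eigenspace, which is the rank deficiency of A − 4I, equals that multiplicity.

```python
import numpy as np

# Symmetric matrix with eigenvalues 2, 4, and 4 (illustrative example)
A = np.array([[3.0, 1.0, 0.0],
              [1.0, 3.0, 0.0],
              [0.0, 0.0, 4.0]])

evals, evecs = np.linalg.eig(A)
print(np.sort(evals))                        # [2. 4. 4.] -> the root 4 has multiplicity 2

# The eigenspace of lambda = 4 is the null space of A - 4I; its dimension equals the multiplicity.
B = A - 4.0 * np.eye(3)
print(3 - np.linalg.matrix_rank(B))          # 2
```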
Change of basis
Another condition that must hold for a linear operator to be
diagonalizable (still fun to say) is that it must be possible to
write the operator as a matrix in a basis consisting of the
eigenvectors of the operator. As we saw above in our 2×2
matrix example, the matrix for the linear operator will be diagonal
in this basis with the diagonal entries equal to the eigenvalues.
Let's call the set of basis vectors for the standard basis
B = (e1, e2, ..., en). In the
standard basis, we have e1 = (1, 0, ..., 0), e2 = (0, 1, ..., 0), and so on.
Our basis vectors for the new basis, where the linear operator is
diagonal, are V = (v1, v2, ..., vn), where the
vk are the eigenvectors.
If our linear operator is diagonalizable, then we can transform from
the basis B to the basis V by writing each vk as a linear
combination of the ej. In equations,
vk = Σj ej Pjk,
or, collecting the basis vectors into rows,
(v1 v2 ... vn) = (e1 e2 ... en) P.
The Pjk are constants and we can think of them as forming a square
matrix P, as shown above. The matrix P is called a change
of basis matrix. Note that the values Pjk are
numbers, but the ej and the vk are
vectors.
Let's look at the first vector in our new basis, v1. We can
write this vector in terms of the B basis vectors as
v1 = e1 P11 + e2 P21 + ... + en Pn1.
This is the same as the equation on the left with k = 1, or
doing the multiplication in the equation on the right for only the
first column. From this, we can see that the components of the
first column of the matrix P are the same as the components of the
vector v1 in the
basis B. Therefore, the columns of the change of basis
matrix are the components of the new basis vectors in terms of the
old basis vectors.
The change of basis matrix lets us take a vector u written
in components in the basis V, which we will call u_V, and
write it in components in the basis B, which we will call u_B, by doing a
matrix multiplication,
u_B = P u_V.
To see that this is correct, think about the first
eigenvector. In the basis V, it is (1, 0, ..., 0). In the
basis B, it is the first column of P.
Multiplying (1, 0, ..., 0) by P gives you exactly the first column of P.
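A quick numerical check of u_B = P u_V, using as P the eigenvector matrix of the stand-in example (so the specific numbers are illustrative assumptions):

```python
import numpy as np

# Columns of P are the eigenvectors of the stand-in matrix, written in the standard basis B.
P = np.array([[3.0, 1.0],
              [5.0, 2.0]])

# The first eigenvector is (1, 0) in the V basis; in the B basis it is the first column of P.
uV = np.array([1.0, 0.0])
print(P @ uV)                  # [3. 5.]

# Any other vector transforms the same way: u_B = P u_V.
uV = np.array([-2.0, 4.0])
print(P @ uV)                  # [-2. -2.]
```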
If we change from the basis V to the basis B, we
must also be able to change from basis B to basis V.
Let's call the matrix that does that Q, so
u_V = Q u_B.
Clearly, if we change u from the basis B to the basis V and then
back to the basis B, we should get the same vector that we started
with, so
P Q u_B = u_B for every u_B.
This means that the matrix Q must be the inverse of the matrix P,
Q = P⁻¹.
It also means that P must be invertible for us to actually be able to
do the transformation of basis.
So, to transform a vector u from the basis B
to the basis V, we do the matrix multiplication
u_V = P⁻¹ u_B.
To transform the matrix M representing the linear operator L
from the basis B to the basis V, we use the
following equation; since the new matrix is diagonal, we call
it D:
D = P⁻¹ M P.
We conclude that a matrix M is diagonalizable if there
exists an invertible matrix P and a diagonal matrix D
that satisfy the equation above.
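Putting this together for the stand-in example (again, the specific numbers are illustrative assumptions), D = P⁻¹ M P comes out diagonal with the eigenvalues on the diagonal:

```python
import numpy as np

M = np.array([[-4.0, 3.0],
              [-10.0, 7.0]])   # illustrative stand-in matrix
P = np.array([[3.0, 1.0],
              [5.0, 2.0]])     # columns are the eigenvectors of M

D = np.linalg.inv(P) @ M @ P
print(np.round(D, 10))
# [[1. 0.]
#  [0. 2.]]
```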
A worked example
Let's take the linear transformation L from the
Diagonalization section above. In the standard basis, the
linear transformation is described by the matrix M.
We found its two eigenvectors, and we found that the corresponding
eigenvalues are 1 and 2.
We can construct the change of basis matrix P using the
eigenvectors: its columns are the eigenvectors written in the
standard basis.
To change the matrix M into the diagonal matrix D,
we need the inverse of P. Reviewing from lecture #11,
we can either do this directly from the equation for the inverse of
a 2x2 matrix given at the beginning of that lecture, or use the
formula at the end, which gives the inverse in terms of the adjoint
and determinant of the matrix. Note that the latter
generalizes to larger matrices. We find the inverse P⁻¹,
and then compute D = P⁻¹ M P.
This matches our prescription above that the new matrix for the
linear transformation L should be diagonal in this basis
with the diagonal entries equal to the eigenvalues.
Note that changing basis changes the matrix representing a linear
transformation but does not change the linear transformation
itself. The linear transformation maps input vectors to output
vectors with a one-to-one correspondence of input vector to output
vector. This mapping stays the same no matter which basis we
use. Linear transformations are the actual objects of study of
linear algebra, not matrices. Matrices are merely a convenient
way of doing computations.
To test this, let's calculate the action of L on the vector
u in both bases. We choose u to have
both components equal to 1 in the standard basis, u_B = (1, 1). Then
we can compute the output vector M u_B directly in the standard basis.
Transforming u into the eigenvector basis V, we find
u_V = P⁻¹ u_B.
Doing the matrix multiplication in the V basis, we find D u_V, and
transforming back to the standard basis gives
P D u_V = P D P⁻¹ u_B = M u_B,
which is the same output vector.
The linear transformation describes the same mapping from the input
to the output vector, regardless of the basis that we use.
Isn't that nice?
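Here is the whole worked example carried out in NumPy with the stand-in matrix (the specific numbers are illustrative, not necessarily the lecture's): acting with M in the standard basis and acting with D in the eigenvector basis give the same output vector.

```python
import numpy as np

M = np.array([[-4.0, 3.0],
              [-10.0, 7.0]])   # illustrative stand-in matrix
P = np.array([[3.0, 1.0],
              [5.0, 2.0]])     # change of basis matrix: columns are the eigenvectors
Pinv = np.linalg.inv(P)
D = Pinv @ M @ P               # diagonal matrix of eigenvalues

uB = np.array([1.0, 1.0])      # u with both components equal to 1 in the standard basis

out_standard = M @ uB          # route 1: act with M in the standard basis
uV = Pinv @ uB                 # route 2: change to the eigenvector basis...
out_eigen = P @ (D @ uV)       # ...act with D there, then change back

print(out_standard)                          # [-1. -3.]
print(uV)                                    # [ 1. -2.]
print(np.allclose(out_standard, out_eigen))  # True: the same mapping in either basis
```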
To solidify your understanding, try finding the eigenvalues and
eigenvectors of a linear operator L written in the
standard basis as a 2×2 matrix of your choice. Then find
the change of basis matrix and calculate the action of L on
a vector u that you specify in the standard basis.