Scientific Computing Using Python - PHYS:4905

Lecture #13 - Prof. Kaaret

Numerical analysis and finding your roots

Numerical analysis is the use of numerical techniques to solve mathematical problems.  Today, we'll discuss how to find the roots of functions numerically, but first we'll look at some of the issues that arise in numerical analysis.

Floating point numbers and round-off error

Floating point numbers are represented by a finite number of bits in the computer's memory.  The number format most often used in numerical computations in Python is the double precision floating point number, or "double", in the IEEE 754 binary64 format.  A double uses 64 bits and consists of a mantissa (or significand), an exponent, and a sign bit s.  The mantissa is a binary number b with 52 bits.  The exponent is also a binary number p, but has 11 bits.  The number represented by the bits is

x = (-1)^s \, (1.b_{51} b_{50} \ldots b_{0})_2 \times 2^{p-1023} = (-1)^s \left( 1 + \sum_{i=1}^{52} b_{52-i} \, 2^{-i} \right) \times 2^{p-1023}
The exponent is in the 'biased' or 'offset binary' form.  That means that p is an unsigned binary number and an offset is subtracted before calculating the exponent.  With 11 bits, we can go from 0 to 2047.  The two end values are reserved for special numbers, so the allowed exponents, after subtracting the offset, run from -1022 to +1023.  The mantissa is a binary number with a leading 1, hence it can represent numbers between 1 and almost 2.  The smallest non-zero number that can be represented is 2^{-1022} ≈ 2×10^{-308}.  The largest number that can be represented is almost 2×2^{1023} ≈ 2×10^{308}.  You never actually want to get anywhere close to these limits.
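You can check these limits on your own machine.  Here is a minimal sketch using the standard library's sys.float_info; the approximate values in the comments assume the usual IEEE 754 doubles, which is what CPython uses on essentially all platforms:

import sys

print(sys.float_info.max)   # largest double, about 1.8e308
print(sys.float_info.min)   # smallest positive normalized double, about 2.2e-308
print(sys.float_info.dig)   # roughly 15 decimal digits of precision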

It's not possible to represent all real numbers with a finite number of bits.  "Round-off errors" are the inaccuracies that arise when we try to represent an arbitrary real number with a finite number of bits.  Going back to the double, if we take p = 1023, then the exponent will be zero.  In this case, the difference between two successive values of the mantissa will be 2^{-52} ≈ 2×10^{-16}.  The maximum rounding error is half of this value.  If you use a different exponent, then the rounding error will be scaled by the corresponding power of two.  However, the fractional rounding error (error/value) will remain the same.
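A quick way to see round-off error in action is the following minimal sketch.  The value 2^{-52} ≈ 2.22×10^{-16} is the spacing of doubles just above 1.0, often called machine epsilon:

import sys
import numpy as np

print(sys.float_info.epsilon)     # 2.220446049250313e-16, spacing of doubles near 1.0
print(np.finfo(np.float64).eps)   # same value, as reported by NumPy
print(0.1 + 0.2 == 0.3)           # False: none of these values is exact in binary
print(1.0 + 1e-17 == 1.0)         # True: 1e-17 is lost when added to 1.0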

Discretization error

If we have a function f(x), its derivative is defined as

f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}
Numerically, we can't let h go to zero.  From the discussion above, we could try letting h get close to 10^{-16}, but this would deprive us of numerical accuracy in calculating the difference between f(x) and f(x+h), because the two values might differ by only a few bits.  To get better accuracy, a larger value of h is preferred.  When evaluating the derivative at a single point, one would typically choose h to be some fraction of x, in the range 10^{-6} to 10^{-9}.  However, in that case we won't be evaluating the true derivative, but rather an approximation to the derivative.  This is called discretization error.  Note that discretization error is different from round-off error.

Discretization error becomes much more significant if we wish to evaluate the derivative at a large number of points.  Imagine we have some function f(x) defined in the interval 0 to 1 and we wish to obtain the derivative over that entire range.  To do this, we can evaluate f at a set of points between 0 and 1 with a step size of Δx, and find the derivative at each point by applying the discrete derivative formula to each pair of successive points,

f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}
In this case, we have a trade-off between discretization error and the computation required.  If we make the step size small, then we get accurate derivatives but need to do a lot of computations.  If we make the step size large, our program will run faster, but the derivatives won't be as accurate.  This process of performing calculations on a discrete set of points is sometimes called computing on a lattice or a grid (although grid has other meanings in computing).  Performing calculations on grids of 3 or more dimensions with small enough spacing to limit discretization error can require a very large number of operations.  Much of the demand for supercomputers is driven by people wanting to do calculations on finer grids, which require more points and therefore more computations.
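Here is a minimal sketch of the discrete derivative formula applied on a grid.  The test function (sin, whose exact derivative is cos) and the grid sizes are chosen only for illustration; the point is that the error shrinks roughly in proportion to Δx, while the number of function evaluations grows:

import numpy as np

def discrete_derivative(f, x):
    # forward-difference derivative on the grid x; returns one fewer point than x
    return (f(x[1:]) - f(x[:-1])) / (x[1:] - x[:-1])

for n in (11, 101, 1001):
    x = np.linspace(0.0, 1.0, n)
    approx = discrete_derivative(np.sin, x)
    exact = np.cos(x[:-1])
    print(n - 1, "steps, maximum error =", np.max(np.abs(approx - exact)))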

Truncation error

Computers are good at doing addition and multiplication, but don't have built-in hardware to evaluate transcendental functions like sine and cosine.  To numerically evaluate such functions, one typically rewrites them as a series of polynomial terms, a Taylor series.  However, Taylor series usually have an infinite number of terms.  To limit the number of computations needed, one must truncate the series to a finite number of terms.  Of course, this introduces an inaccuracy in the calculation, since you're not adding in the contribution of the dropped terms.  This inaccuracy is called truncation error.  Often one can bound the truncation error by looking at the magnitude of the dropped terms.  Series expansions come up in the solution of a variety of equations in physics, so truncation error applies to more than just the evaluation of transcendental functions.
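As an illustration (a sketch only, not how library routines actually compute cosine), here is a truncated Taylor series for cos(x); adding more terms shrinks the truncation error:

import math

def cos_taylor(x, nterms):
    # truncated Taylor series: sum over k of (-1)^k x^(2k) / (2k)!
    return sum((-1)**k * x**(2*k) / math.factorial(2*k) for k in range(nterms))

x = 1.0
for n in (2, 4, 6):
    print(n, "terms:", cos_taylor(x, n), " truncation error =", abs(cos_taylor(x, n) - math.cos(x)))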

Finding roots of non-linear functions

Finding the roots of functions, meaning the input values that produce zero output, is a common numerical task.  For example, solving a non-linear equation can always be re-written as finding the root of a function equal to the difference between the two sides of the equation.

Often you'll want to start root finding by plotting the function over the interval of interest.  By inspecting the plot you can easily see approximately where the roots lie, and it will also quickly show you how many roots there are in the interval.  Python makes this easy.
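For example, here is a minimal sketch; the function and the interval are just placeholders for whatever problem you are working on:

import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return x**3 - 11

x = np.linspace(0, 10, 500)     # interval of interest
plt.plot(x, f(x))
plt.axhline(0, color='gray')    # the roots are where the curve crosses this line
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show()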

The bisection method is the simplest method for finding the root of a function, so we'll look at it first.  The bisection method (and essentially all numerical root finding methods) works for piece-wise continuous functions.  If the value of a function is positive when evaluated at some point a, f(a) > 0, negative when evaluated at some point b, f(b) < 0, and continuous over the interval from a to b, then the function must be zero when evaluated at some point between a and b.  This is a special case of the "intermediate value theorem".  You can prove it to yourself graphically.  Try drawing a function that is positive at a and negative at b.  Whatever function you draw must cross zero somewhere between a and b.  Therefore, the root must lie between a and b and we say that a and b bound the root.

In the bisection method, we try to decrease the distance between the two points that bound the root by cutting the interval in half, or bisecting it.  One takes the halfway point, c, between a and b, c = (a+b)/2.  If f(c) < 0, then there must be a root of f between c and a.  If, instead, f(c) > 0, then there must be a root of f between c and b.  We can see this because the intermediate value theorem will apply to [a, c] in the former case and [c, b] in the latter case.  So, we use c to make a new pair of values.  If f(c) has the same sign as f(a), then we replace a with c.  If f(c) has the same sign as f(b), then we replace b with c.

Once we have our new pair of values that bound the root, we can apply the same procedure again.  Each time we find a new halfway point, we reduce the interval bounding the root by a factor of 2.  If we take the average of the two values as an estimate of the root, then the error on our estimate decreases by a factor of 2 each time we iterate.  See https://en.wikipedia.org/wiki/Bisection_method for further reading.
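Since the interval bounding the root is halved at each step, you can estimate ahead of time how many iterations are needed.  Taking the midpoint as the root estimate, the error after n iterations is at most (b-a)/2^{n+1}, so reaching a tolerance xtol requires roughly

n \approx \log_2 \left( \frac{b-a}{x_{\mathrm{tol}}} \right)

iterations; for example, starting from the interval [0, 10] with xtol = 10^{-3} takes about 13 iterations.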

On to Python

Let's write some Python code to implement the bisection method.  The input for the code is a continuous function, two values that define an interval where the function contains a root, and a tolerance xtol that specifies how accurately we need to pin down the position of the root before we say that we have found it.  We will assume that the function is defined with code similar to

def f(x): return x**3 - 11

We will then define a function using

def bisection_method(f, a, b, xtol):

Note that we can use a function as an argument to another function.  This is part of the versatility of Python that grows out of everything being defined as an object.  bisection_method should return a tuple with two elements.  The first element should be the value in the interval [a, b] at which the function is (close to) zero, or None to indicate there is no root in the interval [a, b].  The second element should be the number of iterations done.

Your routine should first check f(a) and f(b) to see if there is a root in the interval [a, b].  If there is, then you need to iterate to find a root that is good to the accuracy xtol.  Each iteration performs these steps:

1. Compute the halfway point c = (a+b)/2 and evaluate f(c).
2. If f(c) has the same sign as f(a), replace a with c; otherwise replace b with c.
3. Stop once abs(a-b)/2 < xtol and return the midpoint of the final interval.

Note that you cannot make any assumptions about whether f(a) is positive or negative or whether f(b) is positive or negative.  Your code needs to handle all of the cases.  A sketch of one possible implementation follows this paragraph.
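Here is a minimal sketch of one way to structure the routine.  The variable names and the trick of testing signs with a product are choices, not requirements; your own version may look different:

def bisection_method(f, a, b, xtol):
    fa, fb = f(a), f(b)
    if fa * fb > 0:                  # same sign at both ends: no root is bracketed
        return (None, 0)
    niter = 0
    while abs(b - a) / 2 > xtol:
        c = (a + b) / 2              # halfway point
        fc = f(c)
        if fc * fa > 0:              # f(c) has the same sign as f(a): replace a
            a, fa = c, fc
        else:                        # otherwise replace b
            b, fb = c, fc
        niter += 1
    return ((a + b) / 2, niter)

# example call: root, niter = bisection_method(f, 0, 10, 0.001)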

After you write your code, test it using the function above with the intervals [0, 10], [10, 20], and [-10, 0] with xtol = 0.001.  Do you get the results you expect?  How does the number of iterations change if you change from xtol = 10^{-3} to xtol = 10^{-6}?

Once you have all of that working, write a Python program that takes a function and interval [a, b] as input.  The program should plot the function in the interval.  It should then try values for xtol of 10^n where n = -2, -3, ..., -10 and plot the value of the root versus the value of xtol.  Since it will be hard to see changes in the estimate of the root as xtol gets small, you should use the estimate of the root for xtol = 10^{-10} as your best guess for the root and plot the absolute value of the difference between the best guess and your other estimates of the root versus xtol.  Your program should also plot the number of iterations versus the value of xtol.  You may want to use a log scale on one or both axes.  Test your code using the function

def g(x): return 1 + 5*x + x**2 * np.cos(x)

and the intervals [-2, 2], [2, 6], and [6, 10].  Note that g uses np.cos, so your program will need to import numpy as np.

Regula falsi (false position) method

Much of numerical analysis is the search for algorithms that produce a given result, like finding the root of a function, with fewer computations.  The false position method is a root finding algorithm that generally, but not always, converges to a given accuracy faster than the bisection method.

A secant of a function is a line that intersects the function at two (or more) distinct points.  The basic idea of the false position method is that we draw a line between two points on our function.  We see where that line intersects the x-axis and use the intersection point as an improved estimate of the position of the root.  The figure below is from Wikipedia.

[Figure: the secant line through (a, f(a)) and (b, f(b)); its crossing of the x-axis, c, is the new root estimate.]

The equation for the line between the two points (x, y) = (a, f(a)) and (b, f(b)) is

y = \frac{f(b) - f(a)}{b - a} (x - a) + f(a)
Setting y = 0 and solving for x = c, we find that the intercept is at

c = a - f(a) \frac{b - a}{f(b) - f(a)}
This is then the best estimate of the root.  For the next iteration, we use c to replace either a or b exactly as in the bisection method.  If f(c) has the same sign as f(a), then we replace a with c, otherwise we replace b with c.

If our function is linear, then the false position method will give us a very good estimate of the root after one iteration.  In general, the false position method gives better guesses for the root than the bisection method, so it converges faster.  However, if the function has unfavorable curvature, the false position method can converge quite slowly.

It is important to note that the false position method does not necessarily improve both estimates of the root, a and b.  Depending on the shape of the function, every improved estimate c may give an f(c) with the same sign, in which case only one side of the interval bounding the root will change.  Therefore, we need to change our convergence criterion from abs(a-b)/2 < xtol to abs(f(c)) < ytol.  Note that we are changing from a tolerance on the accuracy of the position of the root to a tolerance on the accuracy to which the function is zero.

Now write Python code to implement the false position method.  The input to the function should be the same as for the bisection method, except that the tolerance is called ytol instead of xtol, and the function should be called false_position_method.  The function definition should be

def false_position_method(f, a, b, ytol):
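As with bisection, here is a minimal sketch of one possible implementation; it mirrors the bisection sketch above and is only one way to organize the code:

def false_position_method(f, a, b, ytol):
    fa, fb = f(a), f(b)
    if fa * fb > 0:                        # no sign change: no root is bracketed
        return (None, 0)
    niter = 0
    while True:
        c = a - fa * (b - a) / (fb - fa)   # x-intercept of the secant line
        fc = f(c)
        niter += 1
        if abs(fc) < ytol:                 # f(c) is close enough to zero
            return (c, niter)
        if fc * fa > 0:                    # f(c) has the same sign as f(a): replace a
            a, fa = c, fc
        else:                              # otherwise replace b
            b, fb = c, fc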

After you write your code, test it using the function f(x) above with the intervals [0, 10], [10, 20], and [-10, 0] with ytol = 0.001.  Do you get the results you expect?  How does the number of iterations change if you change from ytol = 10^{-3} to ytol = 10^{-6}?

Once you have done that, insert your new function for the false position method in your code that plots the value of the root and the number of iterations versus ytol, and try it out.  How does its performance compare to the bisection method?  That's the topic of the assignment for this lecture.


Root finding in the wild

When you start doing root finding in your own code, you'll probably want to use more optimized algorithms than the ones that you wrote today (or maybe not, if you like knowing exactly what's going on in your code).  The SciPy library for Python has a module called optimize (it really should have been called Optimus Prime) that includes a bunch of functions for root finding, see https://docs.scipy.org/doc/scipy/reference/optimize.html.  Several of the functions use much more complicated algorithms than those discussed here (the package does include bisect, which does the bisection method, but you are not allowed to use it for this assignment).  Most of these routines come from MINPACK, which is a numerical library for function minimization and least-squares solutions that was written in the 1970s and 1980s in the ancient and venerable language Fortran, see http://www.math.utah.edu/software/minpack.html.  Beyond functions to find roots of scalar functions of a single variable, there are functions that can find roots of functions defined on vectors, which is a more complex issue that we'll discuss when we talk about the closely related topic of minimization.
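For later reference (not for this assignment), here is a minimal sketch of using scipy.optimize on the same test function; brentq is one of the bracketing root finders in the module and, like our methods, it needs an interval over which the function changes sign:

import numpy as np
from scipy import optimize

def f(x):
    return x**3 - 11

root = optimize.brentq(f, 0, 10)   # bracketing interval, just like bisection
print(root, 11**(1/3))             # both should be about 2.2240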

Assignment

You will use the root finding functions that you wrote in class in HW #11 which is due on 10/17/2019.