Week 12: Prepare to Build a Distance Measurement Tool (Floating Point Numbers and Numerical Differentiation)#

Laboratory 5a
Last updated April 5, 2023

00. Content #

Mathematics

  • Approximating derivatives using finite difference schemes

  • Approximation bias

Programming Skills

  • Number types: int, float

  • Use of underscore as a numerical separator

  • Vectorized computation

Embedded Systems

  • N/A

0. Required Hardware #

  • N/A

Copy the file `student_info.py` into the folder you are working in, then run this cell to put your name and email into this document.

from student_info import show_info
show_info()

1. Two Types of Numbers in Python #

Integers#

One of the benefits of using Python is that it has great support for numbers. This makes it much friendlier for doing math than, for example, JavaScript or C. Whole numbers in Python have the type int, which is short for integer. You can check this yourself:

type(5)

Python ints can be as big as you like, positive or negative. Big integers can get cumbersome to write and read, and since the comma has a special meaning in Python, you can't use it to separate the digits for legibility:

123,456,789

That gives us a tuple of three integers, not one big number. If you want to refer to the number 123,456,789 you can write it without separators, but it is easier to read if you use the underscore symbol where there would be a comma:

123_456_789 == 123456789

You can put underscores between any digits you like; Python ignores them. (They cannot come at the very start or end of a number, though.) You could group the digits of a credit card number, for example:

3782_8224_6310_005
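
Underscores work in float literals too, which will come in handy later in this lab. A quick check (this is generic Python behavior, nothing course-specific):

```python
# Underscores are purely cosmetic: these literals denote the same numbers.
print(123_456_789 == 123456789)              # True
print(1_000_000.000_001 == 1000000.000001)   # True
```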

Floats#

The other type of number was introduced in Lab 4e: floating point numbers. As explained previously, floats are used to represent non-whole numbers like decimals. However, they carry some error, since Python can only store a certain number of digits. For almost all purposes this relative error is no trouble at all, and if it does pose a problem, the tool introduced last time, SymPy, can be used to avoid it.
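
A minimal illustration of this roundoff error, using nothing beyond built-in floats:

```python
# 0.1 and 0.2 are not exactly representable in binary floating point,
# so their sum picks up a tiny roundoff error.
print(0.1 + 0.2)         # 0.30000000000000004
print(0.1 + 0.2 == 0.3)  # False
```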

Exercise 1 #

For each of the following constants, look them up on Wikipedia or WolframAlpha and report the relative error in their known values, using SI units. If an error is not given, you can assume the absolute error is \(\pm 1\) in the last digit given. Note that sometimes the absolute error is given in a concise notation, as described here: https://physics.nist.gov/cgi-bin/cuu/Info/Constants/definitions.html. Compare your answers to the relative error in a floating point number, known as machine epsilon (reminder: we are using single precision floats). Make sure you are using relative error, not absolute error.

  • Mass of an electron

  • Orbital period of the moon (the Sidereal period, specifically)

  • Distance from New York to Los Angeles

  • Freezing point of water at atmospheric pressure
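
As a hint for the comparison step (this is not one of the constants above), machine epsilon can be read directly from NumPy:

```python
import numpy as np

# Machine epsilon: the relative spacing between floating point numbers near 1.
print(np.finfo(np.float32).eps)  # about 1.19e-07 (single precision)
print(np.finfo(np.float64).eps)  # about 2.22e-16 (double precision)
```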

Write Answers for Exercise 1 Below:

2. Troubles with Floating Point Numbers #

Precision is usually not a problem with floating point numbers, with one major exception which we will address later. The main issue you will face comes up when comparing numbers. Check this out:

for x in range(1, 10):
    if x/10 == 0.1*x:
        print(x)

Exercise 2 #

Explain why a few numbers are missing from the printout above. Include the Python values of x/10 and 0.1*x for some values of x in your explanation (make a new code cell printing the relevant values).

Write Answers for Exercise 2 Below

There are two main ways to deal with this:

  1. Whenever possible, use >= or <= instead of == when comparing floating point numbers. This is usually the best thing to do. For example, if a number is decreasing and you want to know when it hits zero, check whether it is still positive instead of checking whether it equals zero. Or, better yet, check whether it is less than 0.0001, keeping as much precision as you need for your purpose and no more.

  2. Instead of checking if floats are equal, check if they are close to each other. For example, the built-in function math.isclose can do this for you, or you can use the version in numpy if working with arrays. Here is how that might look:

import math
for x in range(1, 10):
    if math.isclose(x/10, 0.1*x):
        print(x)
import numpy as np
x = np.arange(1, 10)
print(np.isclose(x/10, 0.1*x))

3. Catastrophic Cancellation #

There is one situation where roundoff error is a really big problem: when you subtract two numbers which are very close together. This isn't anything special about floating point numbers; it is just a fact about rounding. Here, have a look:

\[\begin{split} \begin{array}{r} 3.14159265358???\\ -\; 3.14159265332???\\\hline 0.00000000026??? \end{array} \end{split}\]

We have used question marks to represent unknown digits, i.e. digits which were lost due to rounding. See how the relative error in the output is much bigger than in either of the inputs: each input has 12 significant figures, but the output has only two. We can make this more precise by writing the absolute errors explicitly:

\[\begin{align*} 3.14159265358??? &\rightarrow 3.14159265358 \pm 5\times 10^{-12}\\ 3.14159265332??? &\rightarrow 3.14159265332 \pm 5\times 10^{-12}\\ 0.00000000026??? &\rightarrow 0.00000000026 \pm \text{(you figure this one out)}\\ \end{align*}\]
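
The same loss of digits is easy to reproduce in Python. Here is a quick sketch (the exact trailing digits you see depend on rounding):

```python
# Two numbers that agree in their first several digits...
a = 3.14159265358
b = 3.14159265332
# ...leave only a couple of trustworthy digits after subtraction.
print(a - b)  # roughly 2.6e-10; the trailing digits are roundoff noise
```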

Exercise 3 #

What is the absolute error in the output? What are the relative errors of each of the inputs, and what is the relative error of the output?

Write Answers for Exercise 3 Below

Exercise 4 #

A practical example of catastrophic cancellation#

Before continuing, run the following block. It downloads the code we will use to check whether your answers work properly.

%%sh
wget -q -N https://raw.githubusercontent.com/TheDataScienceLabs/DSLab_Calculus/main/book/labs/5_measure_speeds/a_derivatives/autograder.py
wget -q -N https://raw.githubusercontent.com/TheDataScienceLabs/DSLab_Calculus/main/book/labs/5_measure_speeds/a_derivatives/lab_5a_checker.py

Here we walk you through coding an example where catastrophic cancellation makes a pretty big difference, and something clever you can do to fix it.

  1. Write a function which takes the coefficients a, b, and c of a polynomial \(ax^2+bx+c\) and gives you a list of the roots. Just apply the standard quadratic formula. For now, you can assume that \(b^2-4ac>0\) so you can take the square root, and you can assume a is not zero. Have it give the smaller root first (you can use the built-in function sorted for this).

Write Answers for Exercise 4 Part 1 Below

import lab_5a_checker

def naive_roots(a, b, c):
    # fill in the formula here
    
    return sorted([x1, x2])

lab_5a_checker.check_naive_roots(naive_roots)

  2. Factor the polynomial \(x^2-1{,}000{,}000.000{,}001\,x+1\) exactly, on paper (note the decimal point in b). What should the roots be?

Write Answers for Exercise 4 Part 2 Below

  3. Apply your simple root-finding function to this polynomial. What are the relative errors in your computed roots, compared to the true values? Explain why one of the roots has a larger relative error.

Write Answers for Exercise 4 Part 3 Below

# call your root finding function

Here is a simple technique we can apply to reduce the error. Let's assume for now that b is a negative number, so that the cancellation occurs in the numerator of the "-" root, when \(\sqrt{b^2-4ac}\) is subtracted from \(-b\). In this case, just multiply by one in a clever way:

\[\begin{align*} \frac{-b-\sqrt{b^2-4ac}}{2a} &= \frac{-b-\sqrt{b^2-4ac}}{2a} \times \frac{-b+\sqrt{b^2-4ac}}{-b+\sqrt{b^2-4ac}}\\ &= \frac{b^2-(b^2-4ac)}{2a(-b+\sqrt{b^2-4ac})}\\ &= \frac{2c}{-b+\sqrt{b^2-4ac}} \end{align*}\]

Since b is negative and -b is positive, there is no cancellation here! We can use this formula to compute the “-” root, while the original formula will continue to compute the “+” root correctly.
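
The same "multiply by one" trick shows up anywhere two nearby quantities are subtracted. As a simpler illustration (separate from the exercise), compare two ways of computing \(\sqrt{x+1}-\sqrt{x}\) for large \(x\):

```python
import math

x = 1e12
# Naive form: subtracts two nearly equal square roots (cancellation).
naive = math.sqrt(x + 1) - math.sqrt(x)
# Rationalized form: multiply by (sqrt(x+1)+sqrt(x))/(sqrt(x+1)+sqrt(x)).
stable = 1 / (math.sqrt(x + 1) + math.sqrt(x))
print(naive)   # loses accuracy to cancellation
print(stable)  # accurate to nearly full precision, about 5.0e-07
```

Both expressions are algebraically equal, but only the rationalized one avoids the subtraction of nearby numbers.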

  4. Assuming b is always negative, rewrite your function to use this formula for one root and the standard formula for the other root. Compute the relative errors on the example above, using your new formula.

Write Answers for Exercise 4 Part 4 Below

# rewrite your function
import lab_5a_checker

def naive_roots(a, b, c):
    # fill in the formula here
    return sorted([x1, x2])

lab_5a_checker.check_improved_naive_roots(naive_roots)

  5. Figure out a similar formula which works when b is always positive. Use both formulas to write a function which computes the roots accurately and does not suffer from catastrophic cancellation.

Write Answers for Exercise 4 Part 5 Below

# write a better root finding function
import lab_5a_checker

def naive_roots(a, b, c):
    # fill in the formula here
    return sorted([x1, x2])

lab_5a_checker.check_improved_naive_roots(naive_roots)

  6. Finish off your function by adding a special case for when c is exactly zero (we don't want to divide by zero, after all!). We have provided a test you can use to check your work.

Write Answers for Exercise 4 Part 6 Below

# add special cases for a and c
def roots(a, b, c):
    # fill in the formula here
    return sorted([x1, x2])

lab_5a_checker.check_roots(roots)

This example shows how catastrophic cancellation can occur even in friendly, well-known functions. It also shows that we can usually deal with it by doing some clever algebra. This is such a common issue that there are now tools available to identify and fix catastrophic cancellation automatically. If you are curious, check out Herbie, which a friend of mine helped to create.

4. Numerical Differentiation #

Now you are prepared to see the major problem we are up against when trying to compute derivatives using floating-point arithmetic. Remember that by definition, the derivative of a function \(f(x)\) is

\[ f'(x) = \lim_{h\to 0} \frac{f(x+h)-f(x)}{h} \]

This suggests a simple way to compute a derivative: pick a really small number \(h\), and compute \(\frac{f(x+h)-f(x)}{h}\). Since \(h\) is just a small but finite number, this is called the finite difference method.
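
As a sketch of the idea (using \(\sin\) rather than the function in the exercise below), the recipe is a one-liner:

```python
import math

def forward_difference(f, x, h):
    """Estimate f'(x) with a single forward step of size h."""
    return (f(x + h) - f(x)) / h

# The true derivative of sin at 1.0 is cos(1.0), about 0.5403.
print(forward_difference(math.sin, 1.0, 1e-6))
```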

Exercise 5 #

Let’s see what happens when we apply this procedure blindly. We will apply this formula for \(f(x)=x^2\) at the point \(x=4\). The correct derivative should be 8. We will compute the relative error in using the finite difference method, for various values of \(h\). It is easy to get a good spread of values for \(h\), in a geometric sequence:

h = np.geomspace(1e-15, 1, 200)

Part 1

Make an array giving approximations to \(f'(4)\) for each value of h, using the finite difference method. Then, compute the relative error in this approximation using the relative error formula. Tools to check your work are available. You must use arrays and broadcasting to get full credit here.

Write Answers for Exercise 5 Part 1 Below

approximation = 
relative_error = 
lab_5a_checker.check_finite_difference(approximation, relative_error)

Part 2

Now that we have computed the relative error, let’s inspect it in a plot. We will use a logarithmic scale for the x and y axes, which will help us to compare very different values. Since we are taking a logarithm, we must look at the absolute value of the relative error.

import matplotlib.pyplot as plt
plt.loglog(h, [abs(err) for err in relative_error])
plt.xlabel('step size $h$')
plt.ylabel('relative error in computing the finite difference')
plt.show()

In light of what you have learned today about catastrophic cancellation, explain the shape of the curve you get. Explain why it is not a good idea to just make \(h\) as small as possible. What is the smallest relative error we can achieve using this method?

Write Answers for Exercise 5 Part 2 Below

5. Finite Difference Schemes #

This shows the basic struggle of computing numerical derivatives: the definition of the derivative requires \(h\) to be infinitely small, but we can't practically make it that small. This isn't only because of catastrophic cancellation, by the way: when working with real data, you are often limited in how fast you can collect it. Hence, our goal will be to find ways to get more accurate estimates of the derivative without simply making the step size \(h\) tiny. We will consider \(h\) to be fixed, and try to make our estimate as good as possible.

Bias in the forward difference#

Let’s examine the finite difference method, applied to the function \(f(x) = x^3\) at \(x=2\). For ease of illustration, let’s choose \(h=0.5\). Our estimate of \(f'(2)\) is

\[ f'(2)\approx \frac{(2+0.5)^3-2^3}{0.5}. \]

Is this an over-estimate or an under-estimate? Let’s draw a picture to help us see.

Exercise 6 #

Part 1

Find an equation for the tangent line to \(y=f(x)\) at \(x=2\).

Part 2

Find an equation for the secant line intersecting \(y=f(x)\) at \(x=2\) and \(x=2.5\).

Write Answers for Exercise 6 Below

Fill in the formulas you found for the secant and tangent lines below.

x = np.linspace(1,3)
cube = x**3
tangent = #fill in here
secant = #fill in here
plt.plot(x, cube, label="$x^3$")
plt.plot(x, tangent, '--', label="tangent")
plt.plot(x, secant, '--', label="secant")
plt.ylim(0,27)
plt.xlim(1, 3)
plt.legend()
plt.show()

That shows it visually: the slope of the secant line is bigger than the slope of the tangent line. We can see it algebraically too:

\[\begin{align*} \frac{(2+0.5)^3-2^3}{0.5} &=\frac{12(0.5)+6(0.5)^2+(0.5)^3}{0.5}\\ &=12+6(0.5)+(0.5)^2 \end{align*}\]

The correct derivative is 12, and the other terms (most of all the term \(6(0.5)\)) are making it too big.

When we choose \(h\) to be a positive number, we call that a forward difference. What if we try making \(h\) negative, doing a so-called backward difference? Try that now.

Exercise 7 #

Part 1

Create a similar plot to show what happens if \(h=-0.5\).

Part 2

Is the slope of the secant line too big, or too small, in this case? Support your claim using your figure. How is this related to the curvature of the function \(f(x)=x^3\)?

Write Answers for Exercise 7 Part 1 Below

x = np.linspace(1,3)
cube = x**3
tangent = #fill in here
secant = #fill in here
plt.plot(x, cube, label="$x^3$")
plt.plot(x, tangent, '--', label="tangent")
plt.plot(x, secant, '--', label="secant")
plt.ylim(0,27)
plt.xlim(1, 3)
plt.legend()
plt.show()

Write Answers for Exercise 7 Part 2 Below

Correcting the bias#

As you have noticed, the forward difference gave us an over-estimate while the backward difference gave an under-estimate. This suggests a clear way to do better: average the two answers, which should (hopefully) give us an unbiased answer. Algebraically, this looks like the following:

\[\begin{align*} \frac{1}{2}\left[ \frac{f(x+h) - f(x)}{h} + \frac{f(x-h)-f(x)}{-h} \right] &= \frac{f(x+h)-f(x)-f(x-h) + f(x)}{2h}\\ &= \frac{f(x+h)-f(x-h)}{2h} \end{align*}\]

This is the symmetric difference scheme. You can think of it as the slope of a secant line which doesn't pass through the point itself, but rather through its two neighbors.
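
In code, the scheme is as simple as the forward difference. For the running example \(f(x)=x^3\) at \(x=2\) with \(h=0.5\) it gives:

```python
def symmetric_difference_at(f, x, h):
    """Estimate f'(x) from the two neighbors x - h and x + h."""
    return (f(x + h) - f(x - h)) / (2 * h)

print(symmetric_difference_at(lambda t: t**3, 2.0, 0.5))  # 12.25
```

(The name symmetric_difference_at is ours, chosen to avoid clashing with the symmetric_difference function you will write later in this lab.)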

x = np.linspace(1,3)
cube = x**3
tangent = 12*(x-2)+8
secant = (2.5**3-1.5**3)*(x-1.5)+1.5**3
plt.plot(x, cube, label="$x^3$")
plt.plot(x, tangent, '--', label="tangent")
plt.plot(x, secant, '--', label="secant")
plt.ylim(0,27)
plt.xlim(1, 3)
plt.legend()
plt.show()

No obvious bias! We can still work out algebraically that it’s not exact:

\[\begin{align*} \frac{(2+0.5)^3 - (2-0.5)^3}{2(0.5)} = 12.25 \end{align*}\]

This is still a slight overestimate, but it is clearly much better for any given step size \(h\): here the error is only \(h^2 = 0.25\), with no \(6h\) term. The error also shrinks quickly as \(h\) shrinks: if we instead had \(h=10^{-9}\), the error would be just \((10^{-9})^2 = 10^{-18}\), which is smaller than the floating-point error. Symmetric difference is better than forward difference!

Exercise 8 #

Work out the symmetric difference exactly on paper for the function \(f(x)=5x^2+3x-7\), for a variable step size \(h\), at the point \(x=2\). Find the relative error algebraically, and comment on it.

Write Answers for Exercise 8 Below

6. Unequal Step Sizes #

In Exercise 8 you should have found that the symmetric difference scheme gave exactly the right answer. This is not just a coincidence. Whenever your function is a quadratic polynomial, the symmetric finite difference scheme gives exactly the correct answer. A scheme with this property is called second-order accurate.

The symmetric finite difference scheme is a very powerful way to estimate derivatives. It’s a second-order accurate scheme, and it only uses two points to estimate the derivative. Unfortunately, it only works when the step sizes are equal, that is, when the samples are evenly spaced. Frequently we are not so lucky as to have evenly spaced samples. Data often arrive whenever they are available, and we have to deal with that. Fortunately, we can still get a second-order accurate scheme, even if the step sizes are unequal. In this section, we will develop a second-order accurate scheme to estimate the derivative of a function at a point \(x=b\) using nearby points \(x=a\) to the left and \(x=c\) to the right, so \(a<b<c\). That is, we will find numbers \(\alpha\), \(\beta\), and \(\gamma\) such that

\[ f'(b) = \alpha f(a) + \beta f(b) + \gamma f(c) \tag{$*$} \]

whenever \(f(x)\) is a quadratic polynomial. These numbers should depend on \(a\), \(b\), and \(c\), but they shouldn’t depend on \(f\). There are many ways we could figure out the formulas for \(\alpha\), \(\beta\), and \(\gamma\). Next we will walk you through one of the simplest ways.

A quick way to figure out \(\alpha\), \(\beta\), and \(\gamma\).#

We know that \(\alpha\), \(\beta\), and \(\gamma\) will not depend on \(f\). For some clever choices of \(f\), equation (\(*\)) will hand us one of these numbers directly. For example, let's choose \(f(x)=(x-b)(x-c)\). We can just evaluate each term in (\(*\)) directly.

\[\begin{align*} f'(x) &= 2x-b-c\\ f'(b) &= b-c\\ f(a) &= (a-b)(a-c)\\ f(b) &= (b-b)(b-c)=0\\ f(c) &= (c-b)(c-c)=0 \end{align*}\]

By substituting these values, equation (\(*\)) becomes

\[\begin{align*} b-c &= \alpha (a-b)(a-c)\\ \alpha &= \frac{b-c}{(a-b)(a-c)}\\ &= \frac{1}{c-a}-\frac{1}{b-a} \end{align*}\]

By choosing \(f\) carefully, we were able to get the value of \(\alpha\) directly. The trick was to choose \(f\) so that the terms \(f(a)\) and \(f(c)\) worked out to be zero.

We will work out how to find \(\beta\) together as another example. Then, you will be asked to choose \(f\) to give you \(\gamma\). Since I want \(f(a)=0\) and \(f(c)=0\), I will choose \(f(x)=(x-a)(x-c)\). This quadratic has zeros at \(x=a\) and \(x=c\), which is just what I want. Then I can fill out the values as before.

\[\begin{align*} f'(x) &= 2x-a-c\\ f'(b) &= 2b-a-c\\ f(a) &= (a-a)(a-c)=0\\ f(b) &= (b-a)(b-c)\\ f(c) &= (c-a)(c-c)=0 \end{align*}\]

By substituting these values, equation (\(*\)) becomes

\[\begin{align*} 2b-a-c &= \beta(b-a)(b-c)\\ \beta &= \frac{2b-a-c}{(b-a)(b-c)}\\ &= \frac{1}{b-a}-\frac{1}{c-b} \end{align*}\]

Exercise 9 #

Choose a symbolic quadratic polynomial \(f(x)\) so that when you substitute \(f(x)\) into (\(*\)), you find the value of \(\gamma\) using the method demonstrated above. For your choice of \(f(x)\), copy the following table into your answer and fill it out.

\begin{align*}
f(x)  &=     \\
f'(x) &=     \\
f'(b) &=     \\
f(a)  &=     \\
f(b)  &=     \\
f(c)  &= 
\end{align*}

Then, substitute the values into (\(*\)) and write that equation in your answer as well. Solve the equation for \(\gamma\) in terms of \(a\), \(b\), and \(c\). You should find that

\[\begin{align*} \gamma = \frac{1}{c-b}-\frac{1}{c-a}. \end{align*}\]

Write Answers for Exercise 9 Below

Exercise 10 #

Substitute \(f(x)=1\) in (\(*\)). What relationship do you find between \(\alpha\), \(\beta\), and \(\gamma\)? Substitute the formulas we found for \(\alpha\), \(\beta\), and \(\gamma\) into that expression to verify that our answers are correct.

Write Answers for Exercise 10 Below

The symmetric finite difference method with unequal step sizes#

Suppose we have three x-values \(a\), \(b\), and \(c\) with \(a<b<c\). Suppose further that we have measured \(f(a)\), \(f(b)\), and \(f(c)\). We now have a good way to approximate the derivative of \(f\) at \(b\). We compute

\[\begin{align*} \alpha &=\frac{1}{c-a}-\frac{1}{b-a}\\ \beta &=\frac{1}{b-a}-\frac{1}{c-b}\\ \gamma &=\frac{1}{c-b}-\frac{1}{c-a}\\ f'(b) &\approx \alpha f(a) +\beta f(b) + \gamma f(c) \end{align*}\]
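
As a quick sanity check (separate from the exercise below), we can verify on a small example that these weights recover the derivative of a quadratic exactly, even with unequal steps:

```python
# Check the three-point weights on f(x) = x**2 at b = 2,
# with deliberately unequal steps (a = 1 to the left, c = 5 to the right).
a, b, c = 1.0, 2.0, 5.0
f = lambda x: x**2

alpha = 1/(c - a) - 1/(b - a)
beta = 1/(b - a) - 1/(c - b)
gamma = 1/(c - b) - 1/(c - a)

estimate = alpha*f(a) + beta*f(b) + gamma*f(c)
print(estimate)  # about 4.0 (up to roundoff), matching f'(2) = 4
```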

The final exercise for today walks you through the process of creating a vectorized version of this formula, which computes the symmetric difference scheme for unequal step sizes in a way that is

  1. Efficient (very fast to perform)

  2. Easy to read (easy to see that it does what it should).

Exercise 11 #

Our function will take two arguments, x and y, each of which is an array. The idea is that \(y\) holds the \(f(x)\) values corresponding to each \(x\) value. The function will produce an array of the same shape, holding the first derivative as estimated by the symmetric finite difference scheme.

When working out a procedure, it is a good idea to have an example at hand so you can follow through the steps. Let’s pick some examples for \(x\) and \(y\).

x = np.array([1.0, 2, 5, 7, 8])
y = np.array([50.0, 49, 42, 35, 34])
plt.plot(x, y)
plt.show()

We can see that the derivative is steeply negative, leveling off at the end. That gives us an idea of what we are looking for: our answer will be an array of negative numbers.

In order to make the function we are writing more legible, we will use the same symbols that are in the derivation we did above. Since we want \(f'(x)\) and our approximation gives us \(f'(b)\), we can just choose \(b=x\).

b = x

We want \(a\) to be the number to the left of \(b\). So, let’s build an array which has all the same numbers as \(x\) but shifted one to the right. Since the leftmost \(x\) has no neighbor to the left, we should instead say that value is missing using the special value nan. Here is a quick way to do that:

a = np.full_like(x, np.nan)
a[1:] = x[:-1]
print(b)
print(a)

Now we can see that where b has the value 7, for example, a has the value 5 and 5 is the number to the left of 7. Each column corresponds to the values of a and b at one place in our answer.

Step by step, what we did here was to make an array the same shape as x but full of the special value nan. Then, we took the spots of a starting after the first one and filled them with values from x, stopping before the last value of x.

We will do the same thing to write c, but shifting the values in the other direction.

c = np.full_like(x, np.nan)
c[:-1] = x[1:]
print(c)

Now that we have those variables set up, we can compute alpha, beta, and gamma without needing any loops. Numpy will automatically compute each column for us using broadcasting.

alpha = 1/(c-a)-1/(b-a)
beta = 1/(b-a)-1/(c-b)
print(alpha)
print(beta)

NumPy even automatically filled in nan at the ends, since any arithmetic involving nan produces nan; we don't have neighbors there to compute the symmetric difference anyway.

Part 1

Compute gamma as well, using broadcasting.

Write Answers for Exercise 11 Part 1 Below

We also need the values of \(f(a)\), \(f(b)\), and \(f(c)\). We will build those just like we built a, b, and c.

fa = np.full_like(y, np.nan)
fa[1:] = y[:-1]

fb = y

Part 2

Build an array fc which holds the values of \(f(c)\).

Write Answers for Exercise 11 Part 2 Below

Now that we have all of our variables set up, computing the derivative is as simple as applying the formula (\(*\)) from above.

derivative = alpha*fa+beta*fb+gamma*fc
print(derivative)

What should we do about the missing values? Since we can't do a symmetric difference at the endpoints, the next best thing is a one-sided difference. Since there are just two of these, we can fill them in manually.

derivative[0] = (y[1]-y[0])/(x[1]-x[0])
print(derivative)

Part 3

Set the last entry of derivative similarly. Use negative indices, so that you don’t need to know how many elements are in the array.

Write Answers for Exercise 11 Part 3 Below

Part 4

Write a function called symmetric_difference which computes the symmetric difference for unequal step sizes. You can do this by copying code line-by-line from above: all the code we wrote should work just fine for an arbitrary x and y.

Write Answers for Exercise 11 Part 4 Below

def symmetric_difference(x, y):
    #write a function here!
    
    return derivative.tolist()

lab_5a_checker.check_symmetric_difference(symmetric_difference)

Part 5

In a language like Python (or Julia, or MATLAB, for that matter), a so-called "vectorized" procedure runs much faster than looping through the values, because all of the difficult computation is passed off to highly optimized code (in our case, the code that ships with the numpy package). The difference in speed is small for arrays with few entries, but once our arrays have hundreds or thousands of entries it makes a huge difference. That is one reason people often prefer to write functions which treat the data as arrays, rather than looping through lists as you would in languages like C, Java, or JavaScript.

Another reason is that many people find the code easier to read, and easier to understand, if it is written in a vectorized way.
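
If you would like to see the speed difference yourself, here is a quick, unscientific timing sketch (summing an array, with the exact numbers varying from machine to machine):

```python
import numpy as np
import timeit

x = np.arange(100_000, dtype=float)

def loop_sum(arr):
    """Sum the entries one at a time with a Python loop."""
    total = 0.0
    for value in arr:
        total += value
    return total

# Time 10 repetitions of each approach.
t_loop = timeit.timeit(lambda: loop_sum(x), number=10)
t_vec = timeit.timeit(lambda: np.sum(x), number=10)
print(f"loop: {t_loop:.4f} s, vectorized: {t_vec:.4f} s")
```

On a typical machine the vectorized version wins by a wide margin at this array size.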

Write a few sentences of your thoughts on the matter. Do you find the function symmetric_difference you wrote to be easier to read than if it were done with a loop? Which parts are harder to read, which parts easier? Give your honest opinion.

Write Answers for Exercise 11 Part 5 Below