Week 4: Estimate Blood Flow from PPG Data#

Laboratory 3: Intro to Functions, Polynomial Fitting
Last updated Sep 14, 2023

00. Content #

Mathematics

  • Empirical mean

  • Empirical standard deviation

  • Slope

Programming Skills

  • Functions

Embedded Systems

  • Thonny and Micropython

0. Required Hardware #

  • Microcontroller: Raspberry Pi Pico

  • Pulse sensor

  • Breadboard

  • USB connector

  • Capacitor

1. Functions in Python #

Write your name and email below:

First Name, Last Name, email

The most useful way to organize code is the function#

Up to now, we have been relying on functions other people have written to achieve our goals. One of the most important roles of a computer programmer is writing good functions.

For this lab, we are going to learn how to write functions. As a refresher, last lab we discussed how functions are groups of code that always run together. When you call a function, you execute the code that is associated within that function. Functions can take inputs, manipulate the inputs, and return an output.

Why use functions? Functions help reduce repetitive code. Functions also make your code easier to read.

We define/create a function by using the key word def, the name of the function, parameters enclosed within (), and : signifying the end of the header. A function also has the ability to return a value via the command return.

Here are two examples of a simple function that prints something.

def name():  # Function Header
    print("Data Science is cool!")  # Function Body


name()  # Function Call
def add12():
    a = 1
    b = 2
    c = a + b
    print("1 + 2 = {}".format(c))


add12()
def add_int(a, b):
    c = a + b
    return c

add_int(3, 100)

You can also pass information into functions so that you can manipulate the parameters of the function within the function. Parameters are enclosed within the () part of the function header. You can pass as many parameters as you would like within the function header. The order of the parameters are significant and should match the order in the function’s header.

import numpy as np


def subtract(a, b):  # Function Header passes in parameters a and b
    c = a - b
    print("a - b = {}".format(c))


subtract(2, 1)  # Function call passes 2 as 'a' and 1 as 'b', 2 - 1 = 1
subtract(25, 4)  # 25 - 4 = 21
subtract(4, 25)  # 4 - 25 = -21

a = 1
b = -1
subtract(b, a)  # Remember that order of the inputs into the parameters matter!

d = np.array([1, 2, 3, 4, 5])
e = np.array([18, 19, 20, 21, 22])
subtract(e, d)

Parameters can be strings as well.

def info(name, email):
    print("Name: {}".format(name))
    print("Email: {}".format(email))


info("Purdue Pete", "boilerup@purdue.edu")

The neat thing about functions is that you could also return a value from a function utilizing the key word return. You must remember to store the returned value from the function into a variable if you want to save it. Remember that when you call a function, it’s like the function call is replaced by the code within the body of the function.

def return5():
    return 5


def multiply(a, b):
    c = a * b
    return c  # return the value of a * b


e = 9
f = 4
d = multiply(5, 6)  # d = 5 * 6
g = multiply(e, f) + return5()  # d = (9 * 4) + (5)

print(return5())
print("The values of d and g are {} and {}, respectively.".format(d, g))

Let’s look at why functions are good at reducing repetition in code.

print("Repetitive code when functions can be used are unprofessional and take space.")
for i in range(1, 6):
    if i == 3:
        print(i)
for i in range(1, 6):
    if i == 3:
        print(i)
for i in range(1, 6):
    if i == 3:
        print(i)

Instead of having repeating lines of code to create the output you want, you can use functions to shorten and organize them.

def print3():
    for i in range(1, 6):
        if i == 3:
            print(i)


print("Functions can help keep your code neat, concise, and easy to read.")
print3()
print3()
print3()

Exercise 1 #

Part 1

Create the following functions.

  1. Function named trapezoid_area that has three parameters named base1, base2, and height, and returns the area of a trapezoid.

  2. Function named slope that calculates the slope of the straight line passing through the two points in the plane. The parameters are x1, y1, x2, and y2 and the function returns the slope.

Part 2

Create the following functions for the following formulas. Do not use any of python’s NumPy functions, with the exception of the sum() function.

  1. Arithmetic Mean

\[\begin{align*} \mu &= \frac{1}{n}\sum_{i=1}^n a_i\\ &= \frac{a_1+a_2+\dots+a_n}{n} \end{align*}\]
  • Name: mean

  • Parameters: x (an array of values of variable length)

  1. Standard Deviation (You must utilize your mean function from Question 1)

\[\begin{align*} \sigma &= \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i-\mu)^2}\\ &= \sqrt{\frac{(x_1-\mu)^2+(x_2-\mu)^2+\dots+(x_n-\mu)^2}{n-1}} \end{align*}\]
  • Name: std

  • Parameters: x (an array of values of variable length)

Write Answers for Exercise 1 Below

Functions make code reusable#

One of the most important things about functions is that they can allow us to reproduce a task we did before, very reliably.

At the start of each lab, you must include your first name, last name, and email address. We can easily automate that process with functions. Run the following cell in order to download student_info.py. Afterwards, open it and make sure it is saved inside the same directory. Once you have read that, you should not be surprised by what the following cell does.

%%sh
wget -q -N https://raw.githubusercontent.com/TheDataScienceLabs/DSLab_Calculus/main/book/labs/3_estimate_blood_flow/student_info.py
# this makes it so that changes to the imported module are automatically updated in this document.
# usually not necessary, but if we are making changes, this is helpful.
%load_ext autoreload
%autoreload 2

from student_info import show_info

show_info()

Exercise 2 #

Modify student_info.py to give your own contact information. You will use this file in the future to put a heading on your lab submissions.

Check to make sure it is working by running the cell below. It should show your contact information.

show_info()

Functions make it easy to check your work#

One of the great things about Python, which is not something every language can do, is that you can pass functions around as variables. You can put a function into another function, like so:

def do_three_times(f):
    f()
    f()
    f()
    
def greet():
    print('hey')

do_three_times(greet)

In the above block, we are not calling the greet function directly – instead we are passing it as a variable into the function do_three_times, which in turn calls the greet function.

A really common way to take advantage of this pattern is to pass a function into another one, which checks to see if it is working correctly. Let’s do that now! We have written a couple functions which check to see if the functions trapezoid_area and slope which you wrote are working correctly. Run the following cells to download lab3_grader.py and autograder.py (which will be used in later labs!). Afterwards, run the cell that checks your functions and see if you have all of the tests passed correctly! If your functions are not working how they should, go back and change them to make sure you get full points.

%%sh
wget -q -N https://raw.githubusercontent.com/TheDataScienceLabs/DSLab_Calculus/main/book/labs/3_estimate_blood_flow/autograder.py
wget -q -N https://raw.githubusercontent.com/TheDataScienceLabs/DSLab_Calculus/main/book/labs/3_estimate_blood_flow/lab3_grader.py
from lab3_grader import check_trapezoid_area, check_slope

check_trapezoid_area(trapezoid_area)
check_slope(slope)

Reusing code from last week#

We wish to explore the heartbeat data we collected in the previous lab further. To do so, let’s copy that code into a new function. In the following cell, fill in the heartbeat and time variables with your heart beat array (and corresponding time) from last week. Then, fill out the missing parts of the function get_period with the steps you performed last week (it’s okay to copy and paste). If your heartbeat measurements from last lab were not up to par or something went wrong while measuring, run the following cell in order to load the example_heartbeat.txt that you can use in order to correctly fill out the next few exercises.

%%sh
wget -q -N https://raw.githubusercontent.com/TheDataScienceLabs/DSLab_Calculus/main/book/labs/3_estimate_blood_flow/example_heartbeat.txt
heartbeat = 
time = 


def get_period(heartbeat, time):
    """
    Given a hearbeat signal, what is the period of the heart beat?

    Arguments
    ---------
    heartbeat: array - the measured heartbeat signal
    time: array - the times when the signal was measured. Should be the same
        shape as heartbeat.

    Returns
    -------
    period: float - the average period of the heartbeat signal.
    error: float - an estimate of the error in the measurement of the period.
    """
    
    # put here the code which averages the function and finds where the function is increasing and decreasing.
    # As always, ask for help if you get stuck.

    maxima = np.zeros(averaged.shape, dtype=bool)
    maxima[:-1] = increasing[:-1] & decreasing[1:]  
    peak_indices = maxima & (averaged > averaged[maxima].mean())
    peak_times = time[peak_indices]
    gaps = peak_times[1:] - peak_times[:-1]
    return gaps.mean(), gaps.std(ddof=1) / np.sqrt(len(gaps))
period = get_period(heartbeat, time)
print(period)

2. Using Functions with Data #

Looking deeper with polynomials#

The averaging technique we introduced in the previous lab works very well for identifying the locations of large features. Unfortunately, it blurs out some of the fine detail, brings peaks down, and lifts bottoms up. That’s no good if you want to know information like the amplitude or shape of the signal. We can do much better.

We will explore one way of finding a function which “interpolates” between the observed values. A function which interpolates is called an “interpolant”.

Since you have the period of your heartbeat signal, you can extract just one cycle of the initial signal. In the following cell, fill in a start time which gives you a good window capturing one full heart beat.

start_time =
mask = (time >= start_time) & (time < start_time + period[0])
cycle = heartbeat[mask]
cycle_t = time[mask]

import matplotlib.pyplot as plt

plt.plot(cycle_t, cycle)
plt.show()

A parabola has one local maximum or minimum. A cubic polynomial has at most two. A fourth degree polynomial has at most three local maxima and minima. Fit a polynomial of degree one more than the number of local maxima and minima. Numpy has a built in tool which finds a polynomial which matches observed values closely. Here is how it works:

from numpy.polynomial import Polynomial

interpolant = Polynomial.fit(cycle_t, cycle, deg= )
print(interpolant)

The object interpolant represents a list of coefficients of a polynomial. We can also call it as a function. Let’s use that to show how well it lines up with the data we measured:

plt.plot(cycle_t, cycle)
plt.plot(cycle_t, interpolant(cycle_t))
plt.show()

What other values of the degree could we try? What kind of interpolant do we get with a smaller degree, or a larger degree than 7?

Exercise 3 #

Use a for loop to make plots showing the interpolants of degrees 3 through 15. Which ones fit best? Try larger values too. What if you have an interpolant of degree 30? 100?

Write Answers for Exercise 3 Below

Quantifying the fit#

Beyond a certain point, the polynomial can snake through each point we give it, capturing all the noise. That defeats the purpose of the fit! It is good practice to choose as low a degree as possible while still matching the data well. This is called the parsimony principle. We can quantify this with the so-called “mean square error” of a fit: the average value of the difference between the interpolant and the measured value. If the measured values are \(y_i\) and we are measuring how well an interpolant \(f(x_i)\) agrees, then the mean square error is given by the formula \( \frac{1}{n}\sum_{i=0}^n \left(f(x_i)-y_i\right)^2 \)

Exercise 4 #

Write a function which takes an interpolant and returns the mean square error. Then, read and run the cell below it, which will use the code you wrote to make a plot of how the polynomial degree relates to the mean square error.

Write Answers for Exercise 4 Below

def mean_square_error(x, y, interpolant):
    # put your answer here!
degrees = list(range(1, 20))
errors = []
for deg in degrees:
    interpolant = Polynomial.fit(cycle_t, cycle, deg=deg)
    errors.append(mean_square_error(cycle_t, cycle, interpolant))
    
plt.plot(degrees, errors)
plt.xlabel("polynomial degree")
plt.ylabel("mean square error")
plt.show()

Exercise 5 #

Interpret the plot above. After what point do we see diminishing returns on the model’s fit? What is the smallest degree of polynomial which has a reasonably low mean square error, in your opinion? Write in complete sentences here, as if you were describing the plot in a professional report.

Write Answers for Exercise 5 Below

Exploring the data using the polynomial model#

Now that we have a good model for the behavior of each cycle, we can start asking questions of our model. We can ask things like, “What is the largest value of the function on the interval?” Here is an example of how we might do that. Suppose we have the following interpolant:

interpolant = np.polynomial.Polynomial(
    [
        43497.94586794,
        -5266.09794496,
        -14940.13657245,
        57262.21812427,
        48591.17239832,
        -146342.3668821,
        -31204.60140348,
        96269.6657132,
    ],
    domain=[0.0, 0.78],
)

t = np.linspace(*interpolant.domain, 200)
plt.plot(t, interpolant(t))
plt.show()

Since we have a function which represents our data accurately, we can use standard calculus tools to find its local extrema. The Polynomial object which NumPy creates is very versatile. If you have a polynomial, you can find things like its roots easily, like so:

print(interpolant.roots())

We can also find derivatives, which are given back to us as Polynomial objects which we can do all those same things to.

print(interpolant.deriv())

We can ask it for its domain, which it also stores for us as an array:

interpolant.domain

Exercise 6 #

Remember the closed interval method from calculus 1, where you find all the critical numbers of a function within its interval, and plug those into the function to find the maximum and minimum values. Write a function which does exactly that, to a polynomial interpolant.

Hint: critical points are the roots of the derivative of a certain function.

def closed_interval(interpolant):
    #your code goes here.
    
    return minimum, maximum
    
print(closed_interval(interpolant))

Interpretation and code reuse#

This article published in Current Cardiology News gives several commonly used statistics which can be read from a photoplethysmogram. We will focus on just two of them: the “systolic amplitude” which is roughly proportional to blood flow, and the “ratio b/a” which is connected to arterial stiffness and can indicate exposure to lead and risk of heart disease.

First, let’s make a function which computes the systolic amplitude. We have already done most of the work, when we wrote the function closed_interval. Let’s take advantage of that:

def systolic_amplitude(interpolant):
    minimum, maximum = closed_interval(interpolant)
    return maximum - minimum

systolic_amplitude(interpolant)

The ratio b/a is based off the maximum and minimum values of the second derivative. Specifically, it is their ratio. Since the photoplethysmogram is concave up in some places and concave down in others, the maximum will be positive and the minimum will be negative. Since we are more familiar working with positive numbers, people usually compute \(-b/a\), where \(b\) is the minimum value of the second derivative and \(a\) is the maximum value of the second derivative.

Exercise 7 #

Write a function to compute the ratio b/a. You should still be able to reuse the function closed_interval (ask for help if you need a hint for how to do this).

def ratio_b_a(interpolant):
    # your code here

ratio_b_a(interpolant)

Let’s look at where we are. Today, you have learned about writing functions. You have written functions which identify the period of a heartbeat signal, interpolate it with a polynomial, and compute two medically interesting values from it. We saw that by reusing old code, we can save ourselves a lot of time. That is a lot! Well done.

Optional Exercise 8 #

Measure yourself!#

In the case that you have remaining time within your lab, you should put the tools you have used to work. Make a new Jupyter notebook, copying code from lab 2, to capture some new pulse data (remember, it is good practice to separate data acquisition from data analysis!). Then, analyze it here using the functions you have written. You can use the prompts below for scenarios for your analysis:

  1. If you jog up and down the hallway briefly, how does your pulse rate change?

  2. If you measure your systolic amplitude with your finger above your shoulders versus below your waist, can you detect a difference? How big is the difference?

  3. Is your ratio b/a different from your classmates’?

If you have time, try to at least answer two of the three questions. Write markdown cells explaining what you are doing and why. Interpret your results.

Write Answers for Exercise 8 Below