Week 11: Optical Flow#

Laboratory 8
Last updated August 17, 2022

00. Content #

Mathematics

  • Taylor series

  • Vector fields

  • Polar coordinates

Programming Skills

  • Vectorization

Embedded Systems

  • Thonny and MicroPython

0. Required Hardware #

  • Microcontroller: Raspberry Pi Pico

  • Breadboard

  • USB connector

  • Camera

Write your name and email below:

Name: me

Email: me @purdue.edu

import numpy as np
import matplotlib.pyplot as plt
import cv2

1. Optical Flow and Motion Tracking #

Optical flow is the apparent motion of brightness patterns observed when objects and the camera move relative to each other. The velocity of objects in a video can be estimated using optical flow. This is especially relevant in computer vision tasks such as motion-based object detection and robot navigation.

Let \(f(x,y,t)\) be a continuous grayscale video mapping \(\mathbb{R}^3\) to \(\mathbb{R}\). The variables \(x\) and \(y\) represent pixel coordinates and \(t\) is the frame number. The output is the pixel intensity, an integer between \(0\) and \(255\), at location \((x,y)\) in frame \(t\). When we read an image into Python, we observe a sampling of points of \(f(x,y,t)\), which we will call \(f_d(n,m,t)\). So \( f_d(n,m,t) = f(n\Delta i, m \Delta j,t),\) where \(\Delta i\) and \(\Delta j\) are the sampling distances along the x- and y-axes, respectively.
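As a minimal sketch of this sampling (using the test_vid.mov file that appears in the cells below), a frame read with OpenCV is exactly the sampled signal \(f_d(n,m,t)\): a 2-D array of intensities indexed by integer pixel coordinates.

vid = cv2.VideoCapture('test_vid.mov')
success, frame = vid.read()  # frame t = 0
vid.release()
if success:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    print(gray.shape)       # (rows, columns): the number of m and n samples
    print(gray[100, 200])   # f_d(n=200, m=100, t=0), an integer in 0..255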

Let’s assume there is an object at location \((x,y)\) at time \(t\), and after a small step \(\Delta t\) forward in time, the object moves from \((x,y)\) to \((x+\Delta x,y+\Delta y)\) in the video. Using a Taylor series to expand \(f(x,y,t)\) and truncating after the first-degree terms, we can estimate the velocity of the object.

\[\begin{align*} f(x+\Delta x,y+\Delta y,t+\Delta t) \approx f(x,y,t) + \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \frac{\partial f}{\partial t}\Delta t \end{align*}\]

Assuming the object’s brightness does not change as it moves, \(f(x+\Delta x,y+\Delta y,t+\Delta t)=f(x,y,t)\), so

\[\begin{align*} \frac{\partial f}{\partial x}\Delta x + \frac{\partial f}{\partial y}\Delta y + \frac{\partial f}{\partial t}\Delta t = 0 \end{align*}\]

Divide both sides by \(\Delta t\) to get

\[\begin{align*} \frac{\partial f}{\partial x} \frac{\Delta x}{\Delta t} + \frac{\partial f}{\partial y} \frac{\Delta y}{\Delta t} + \frac{\partial f}{\partial t} = 0 \end{align*}\]

Notice that \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) are the image gradients we calculated in the lab on edge detection, and \(\frac{\partial f}{\partial t}\) is the pixel-wise difference between frame \(t+\Delta t\) and frame \(t\). So in the last equation the velocity components \(\frac{\Delta x}{\Delta t}\) and \(\frac{\Delta y}{\Delta t}\) are unknown, while the other three terms \(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\) and \(\frac{\partial f}{\partial t}\) can be computed. But we can't solve for two unknowns with only one equation. We're going to need some more assumptions.
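As a concrete sketch of the known terms, the snippet below builds two synthetic frames (the second a shifted copy of the first, standing in for real video frames) and approximates the three partial derivatives with finite differences.

rng = np.random.default_rng(0)
f0 = rng.integers(0, 256, size=(120, 160)).astype(float)  # stand-in for frame t
f1 = np.roll(f0, shift=1, axis=1)  # frame t + dt: everything moved 1 pixel right

# spatial gradients: np.gradient returns the derivative along axis 0
# (rows, the y direction) first, then axis 1 (columns, the x direction)
df_dy, df_dx = np.gradient(f0)

# temporal derivative: pixel-wise difference between frames (dt = 1 frame)
df_dt = f1 - f0

# at each pixel this gives one equation,
#   df_dx * (dx/dt) + df_dy * (dy/dt) + df_dt = 0,
# in the two unknown velocity components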

Exercise 1 #

Look up “the aperture problem” as it relates to optical flow. In a few sentences, describe what it is and how it affects the types of objects we are able to track with optical flow.

Write Answers for Exercise 1 Below

Farneback Optical Flow#

The assumption that \(f(x+\Delta x,y+\Delta y,t+\Delta t)=f(x,y,t)\) is reasonable for most situations. However, if there is a lot of change from frame to frame, for example a helicopter-mounted camera undergoing strong vibrations, then our results will not be as good. In 2003, Gunnar Farneback proposed a solution to this problem. To fully explain his method, a good knowledge of linear algebra and least squares minimization is necessary, but we will give a general overview.

The idea behind Farneback’s algorithm for calculating optical flow is:

  • Starting with the previous frame, reduce the resolution of the image.

  • Estimate the values of a small patch of pixels with a quadratic polynomial (similar to the polynomial interpolation we did in one dimension, but now in two dimensions).

  • In the next frame, do the same quadratic estimation.

  • With the two simplified quadratic functions, it’s “easy” to calculate the displacement of the patch of pixels (this is where the linear algebra and least squares minimization come into play).

  • Increase the resolution of the frame and repeat the process.

By repeating on higher- and higher-resolution images, we are able to capture both big and small displacements. After refining the displacement estimate between the two frames, the result is \(\frac{\Delta x}{\Delta t}\) and \(\frac{\Delta y}{\Delta t}\) for each pixel in the image, so we call this type of process dense optical flow.
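A rough sketch of this resolution schedule, using cv2.pyrDown (which blurs and halves each dimension; cv2.calcOpticalFlowFarneback performs this internally via its levels and pyr_scale parameters):

# build a 3-level image pyramid from a grayscale frame; the algorithm
# estimates displacement at the coarsest level first, then refines it
# at each finer level
gray = np.zeros((240, 320), dtype=np.uint8)  # stand-in for a real frame
pyramid = [gray]
for _ in range(2):
    pyramid.append(cv2.pyrDown(pyramid[-1]))

for level, img in enumerate(reversed(pyramid)):
    print(f"level {level} (coarse to fine): {img.shape}")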

Exercise 2 #

Read through the following cell. In a few sentences, describe what the code does.

Write Answers for Exercise 2 Below


vid = cv2.VideoCapture('test_vid.mov')
height = vid.get(cv2.CAP_PROP_FRAME_HEIGHT)
width  = vid.get(cv2.CAP_PROP_FRAME_WIDTH) 
scale = 0.25
new_size = (int(width*scale),int(height*scale))

success, frame1 = vid.read()
frame1 = cv2.resize(frame1,dsize=new_size)
previous = cv2.cvtColor(frame1,cv2.COLOR_BGR2GRAY)

while True:
    success, frame2 = vid.read()
    if not success:
        print("Unable to read frame. Exiting ...")
        break
    next = cv2.cvtColor(frame2,cv2.COLOR_BGR2GRAY)
    next = cv2.resize(next,dsize=new_size)
    
    # flow is a numpy 2 dimensional array and 
    # flow[i,j] = [dx/dt,dy/dt] for pixel (i,j)
    flow = cv2.calcOpticalFlowFarneback(previous,next, flow = None, 
                    # don't worry about these parameters for now
                    pyr_scale = 0.5, 
                    levels = 3, 
                    winsize = 10, 
                    iterations = 3, 
                    poly_n = 5, 
                    poly_sigma = 1.1, flags = 0)
    cv2.imshow('next_frame',next)
    k = cv2.waitKey(30) & 0xff
    if k == ord('q'): # press Q on keyboard to stop
        break
    previous = next

vid.release()
cv2.destroyAllWindows()

At each step, flow defines a vector field: at pixel location \([i,j]\), we have a vector \(\langle dx/dt, dy/dt\rangle\), which has a magnitude and a direction.

Exercise 3 #

  1. What are the formulas to calculate the magnitude and direction of a vector \(\langle x,y \rangle\)?

  2. Why would computing the magnitude and direction of \(\langle x,y \rangle\) be the same as converting the Cartesian ordered pair \((x,y)\) to polar coordinates?

Write Answers for Exercise 3 Below

Exercise 4 #

Pick two frames to calculate flow. Use the matplotlib function quiver to plot the vector field using arrows.

Write Answers for Exercise 4 Below

# example of plotting a constant vector field where each vector is <1,1>
x,y = np.meshgrid(np.linspace(start=0, stop=10, num=10, endpoint=False), 
                   np.linspace(start=0, stop=10, num=10 , endpoint=False))
dxdt = 1
dydt = 1
plt.quiver(x,y,dxdt,dydt,color='b', units='xy', scale=1)
plt.show()
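For an actual flow field, one arrow per pixel is unreadable, so a common trick is to subsample before calling quiver. A sketch, assuming flow was computed with cv2.calcOpticalFlowFarneback as in the loop above:

# plot one arrow per 10x10 block of pixels
step = 10
h, w = flow.shape[:2]
y, x = np.mgrid[0:h:step, 0:w:step]
plt.quiver(x, y, flow[::step, ::step, 0], flow[::step, ::step, 1], color='b')
plt.gca().invert_yaxis()  # image row 0 is at the top
plt.show()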

Another way to visualize the vector field is to use color to represent angle and brightness to represent magnitude. The HSV (Hue, Saturation, Value) color space is a perfect candidate: we will map the hue to the angle of the vector. The image below shows each hue at full saturation for different angles.

[Figure: color wheel showing each hue at full saturation, one hue per angle from 0 to 360 degrees]
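If you want to reproduce a strip of hues yourself, a minimal sketch (note that OpenCV stores hue in the range 0 to 179, which is why the code in Exercise 5 divides the angle in degrees by 2):

# a strip of all 180 OpenCV hues at full saturation and full value
strip = np.zeros((20, 180, 3), dtype=np.uint8)
strip[..., 0] = np.arange(180, dtype=np.uint8)  # hue varies along x
strip[..., 1] = 255                             # full saturation
strip[..., 2] = 255                             # full value
plt.imshow(cv2.cvtColor(strip, cv2.COLOR_HSV2RGB))
plt.axis('off')
plt.show()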

Exercise 5 #

Fill in the code below to compute the magnitude and direction of the velocity vector at each pixel location; one possible completion is sketched after the cell. With this in place, we can visualize how the vector field changes from frame to frame and in what direction the objects in the frame are moving.

Write Answers for Exercise 5 Below

cap = cv2.VideoCapture('test_vid.mov')
height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)
width  = cap.get(cv2.CAP_PROP_FRAME_WIDTH) 
scale = 0.25

ret, frame1 = cap.read()
frame1 = cv2.resize(frame1,dsize=(int(width*scale),int(height*scale)))
previous = cv2.cvtColor(frame1,cv2.COLOR_BGR2GRAY)
hsv = np.zeros_like(frame1)
hsv[...,1] = 255 # always at full saturation

while True:
    ret, frame2 = cap.read()
    if not ret:
        print("Unable to read frame. Exiting ...")
        break
    next = cv2.cvtColor(frame2,cv2.COLOR_BGR2GRAY)
    next = cv2.resize(next,dsize=(int(width*scale),int(height*scale)))
    flow = cv2.calcOpticalFlowFarneback(previous,next, flow = None, 
                    pyr_scale = 0.5, 
                    levels = 3, 
                    winsize = 10, 
                    iterations = 3, 
                    poly_n = 5, 
                    poly_sigma = 1.1, flags = 0)

    # compute magnitude and angle for each vector in flow
    # mag and ang should have the same dimensions as the variable next 
    mag = # fill in
    ang = # fill in
    # ang needs to be between 0 and 2pi

    hsv[...,0] = ang*180/np.pi / 2 # convert radians to degrees then divide by 2
    hsv[...,2] = cv2.normalize(mag,None,0,255,cv2.NORM_MINMAX)
    rgb = cv2.cvtColor(hsv,cv2.COLOR_HSV2BGR)

    cv2.imshow('frame2',rgb)
    k = cv2.waitKey(30) & 0xff
    if k == ord('q'): # press Q on keyboard to stop
        break
    previous = next

cap.release()
cv2.destroyAllWindows()
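One possible completion of the two fill-in lines (a sketch, not the only way): cv2.cartToPolar converts the two flow components to magnitude and angle in one call, returning the angle in radians in \([0, 2\pi)\) when angleInDegrees is left as False.

mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

# equivalent with plain NumPy: np.arctan2 returns angles in (-pi, pi],
# so shift them into [0, 2*pi)
mag = np.sqrt(flow[..., 0]**2 + flow[..., 1]**2)
ang = np.mod(np.arctan2(flow[..., 1], flow[..., 0]), 2*np.pi)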

2. Repeat with Your Own Video #

As you did in the previous lab, you will repeat some of the exercises with your own recorded video. When recording your video, take into account what is being done in this lab and consider whether the video you have will produce good results.

Exercise 6 #

Save a few seconds’ worth of frames from the video feed. There needs to be at least one moving object. Pick two frames to calculate flow and plot the associated vector field.

Write Answers for Exercise 6 Below

Exercise 7 #

Use the completed code from Exercise 5 to show the optical flow of your own video.

Write Answers for Exercise 7 Below

Reflection #

1. What parts of the lab, if any, do you feel you did well?
2. What are some things you learned today?
3. Are there any topics that could use more clarification?
4. Do you have any suggestions on parts of the lab to improve?

Write Answers for the Reflection Below