# Week 10: Evaluating Random Number Generators

<font size="6"> Laboratory 7 </font> <br>
<font size="3"> Last updated July 28, 2023 </font>

## <span style="color:orange;"> 00. Content </span>

<font size="5"> Mathematics </font>
- N/A
     
<font size="5"> Programming Skills </font>
- Functions
- Arrays
    
<font size="5"> Embedded Systems </font>
- Thonny & MicroPython

## <span style="color:orange;"> 0. Required Hardware </span>

- Raspberry Pi Pico
- MAX30102 pulse sensor
- Camera (Arducam HM01B0)
- Ceramic capacitor
- Breadboard
- USB connector
- At least 8 wires

<h3 style="background-color:lightblue"> Write your name and email below: </h3>

**Name:** 

**Email:** 

## <span style="color:orange;">1. Mathematical Tests for RNGs </span>

Random number generators (RNGs) are central to applications such as cryptography, where predictable output can compromise security, so we need ways to check how random an RNG's output really is. The National Institute of Standards and Technology (NIST) has developed a suite of statistical tests for evaluating the randomness of a binary sequence of 0s and 1s. In this lab, we will implement several of these NIST tests and use them to determine which of the RNGs from the previous lab performs best.

### <span style="color:red"> Exercise 1</span>

Pick a method from the previous lab and use it to create a text file, `rng_test_data.txt`, with at least 100 lines of bytes. The more random the data is, the more likely it is to pass the NIST RNG tests. Hypothesize whether or not your data will pass the NIST RNG tests.

<h3 style="background-color:lightblue"> Write Answer for Exercise 1 Below </h3>

```python
import numpy as np

# Load the first 3 lines of the file as strings and
# concatenate them into a single bit string
x = np.loadtxt("rng_test_data.txt", dtype=str, max_rows=3)
seq = ''.join(x)
print(seq)

# Load every line, interpret each binary string as an integer,
# and store the values as floating point numbers
y = np.loadtxt("rng_test_data.txt", dtype=str)
y = np.array([float(int(i, 2)) for i in y])
print(y[:3])  # first 3 values of the array
```
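
The NIST tests in this lab operate on one long sequence of bits rather than on byte values. Here is a minimal sketch of turning the file into a NumPy array of 0s and 1s, assuming (as the cell above does) that each line of `rng_test_data.txt` is a binary string such as `01101001`. The helper names `bits_from_lines` and `load_bits` are hypothetical, not part of NumPy or any NIST code.

```python
import numpy as np

def bits_from_lines(lines):
    """Concatenate binary strings (e.g. '01101001') into one 0/1 integer array."""
    bit_string = "".join(line.strip() for line in lines)
    if set(bit_string) - {"0", "1"}:
        raise ValueError("lines must contain only the characters 0 and 1")
    return np.array([int(b) for b in bit_string], dtype=int)

def load_bits(path):
    """Read a text file with one binary string per line and return all its bits."""
    with open(path) as f:
        return bits_from_lines(f.readlines())

# Usage with in-memory lines, so no file is needed for this demo
bits = bits_from_lines(["01101001", "11110000"])
print(bits)
print(len(bits))
```

With real data you would call `load_bits("rng_test_data.txt")` and pass the resulting array to each test.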



### Monobit Frequency Test

An ideal binary random number generator would produce approximately the same number of ones and zeros in a sequence. The Monobit Frequency Test quantifies how closely the fraction of ones in the sequence matches $\frac{1}{2}$. The steps of the test are as follows:

1. For each bit within the given binary sequence $\varepsilon_1,\varepsilon_2,\dots,\varepsilon_{n}$, where $\varepsilon_i$ is a binary bit, apply the transformation $X_i = 2\varepsilon_i - 1$ for $i=1,2,\dots,n$. This transformation turns $0$s and $1$s into $\pm 1$.
2. Compute $S_n$, the sum of the transformed sequence: $S_n = \sum_{i=1}^{n} X_i$.
3. Compute the value of $s$ using the formula: $s = \frac{|S_n|}{\sqrt{n}}$.
4. If $s > 2.57583$, then conclude that the sequence $(\varepsilon_1,\varepsilon_2,\dots,\varepsilon_n)$ is non-random. The cutoff $2.57583$ is chosen so that $P(|Z| > 2.57583) = 0.01$ for a standard normal random variable $Z$, which matches the $0.01$ significance level used by the other tests in this lab. The statistic $s$ equals $0$ exactly when the sequence contains an equal number of $1$s and $0$s.

According to NIST, the parameter $n$ (the length of the binary sequence) should be at least 100.
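
A minimal sketch of the four steps above, offered as a reference to check your own implementation against. It assumes the input is a 0/1 sequence (for example the output of a bit-loading helper); the name `monobit_test` is illustrative. It also returns the p-value $\operatorname{erfc}(s/\sqrt{2})$, which is how NIST SP 800-22 reports this test; that p-value drops below $0.01$ at essentially the same point where $s$ exceeds $2.57583$.

```python
import math
import numpy as np

def monobit_test(bits):
    """Monobit Frequency Test on a 0/1 sequence: return (s, p_value, passed)."""
    bits = np.asarray(bits, dtype=int)
    n = len(bits)
    x = 2 * bits - 1                       # Step 1: X_i = 2*eps_i - 1
    s_n = int(np.sum(x))                   # Step 2: S_n = X_1 + ... + X_n
    s = abs(s_n) / math.sqrt(n)            # Step 3: s = |S_n| / sqrt(n)
    p_value = math.erfc(s / math.sqrt(2))  # equivalent p-value form used by NIST
    passed = s <= 2.57583                  # Step 4: non-random if s > 2.57583
    return s, p_value, passed

# A 10-bit toy sequence, for illustration only (NIST recommends n >= 100)
s, p, passed = monobit_test([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
print(round(s, 4), round(p, 4), passed)
```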

### <span style="color:red"> Exercise 2</span>

__Part 1:__ In Step 1, describe how the transformation affects the ones and zeros in the original random sequence.

<h3 style="background-color:lightblue"> Write Answers for Exercise 2 Part 1 Below </h3>

__Part 2:__ If $|X_1+X_2+\cdots+X_n|$ is large, what does that imply about the original sequence $(\varepsilon_1,\varepsilon_2,\dots,\varepsilon_{n})$?

<h3 style="background-color:lightblue"> Write Answers for Exercise 2 Part 2 Below </h3>

__Part 3:__ Implement the Monobit Frequency Test on the test data.

<h3 style="background-color:lightblue"> Write Answers for Exercise 2 Part 3 Below </h3>


### Monobit Block Frequency Test

This test examines the proportion of ones and zeros block by block, rather than over the entire sequence at once. Here are the steps:

1. Divide the binary data values $X_1,X_2,\dots,X_{M\cdot N}$ into $N$ non-overlapping blocks of length $M$.
1. For each block, calculate the proportion of ones among its $M$ bits. Denote the proportion of ones in block $i$ as $\pi_i$ for $i=1,2,\dots,N$.
1. Compute the value of $t$ using the formula: $t = 4M \sum_{i=1}^N \left(\pi_i-\frac12 \right)^2$
1. If $G(N/2,t/2) < 0.01$, then conclude that the sequence $(X_1,X_2,\dots,X_{M\cdot N})$ is non-random. Here $G$ is the regularized upper incomplete gamma function, $G(a,x) = \frac{1}{\Gamma(a)}\int_x^{\infty} e^{-v}v^{a-1}\,dv$.

NIST recommends choosing $M \geq 20$ with $M > 0.01n$ and $N < 100$, where $n$ is the length of the sequence. Any bits left over after forming the $N$ blocks are discarded.
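
A minimal sketch of the block test, assuming a 0/1 input sequence and SciPy's `gammaincc` (the same function the lab imports as `G`). The function name `block_frequency_test` is illustrative. Following NIST's convention, bits that do not fill a complete block are dropped before the blocks are formed.

```python
import numpy as np
from scipy.special import gammaincc

def block_frequency_test(bits, M):
    """Monobit Block Frequency Test: return (t, p_value, passed)."""
    bits = np.asarray(bits, dtype=int)
    N = len(bits) // M                      # number of complete blocks
    blocks = bits[: N * M].reshape(N, M)    # leftover bits are discarded
    pi = blocks.mean(axis=1)                # proportion of ones in each block
    t = 4 * M * np.sum((pi - 0.5) ** 2)     # t = 4M * sum (pi_i - 1/2)^2
    p_value = gammaincc(N / 2, t / 2)       # G(N/2, t/2)
    return float(t), float(p_value), bool(p_value >= 0.01)

# Toy example: 10 bits with M = 3 gives N = 3 blocks (the last bit is dropped)
t, p, passed = block_frequency_test([0, 1, 1, 0, 0, 1, 1, 0, 1, 0], M=3)
print(round(t, 6), round(p, 6), passed)
```

The toy sizes violate the NIST recommendations above and are only there so the arithmetic is easy to follow by hand: the blocks are `011`, `001`, `101`, so $t = 12\left(\tfrac{1}{36}+\tfrac{1}{36}+\tfrac{1}{36}\right) = 1$.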

### <span style="color:red"> Exercise 3</span>

__Part 1:__ If $t$ is large, what does that imply about the original sequence $(X_1,X_2,\dots,X_{M\cdot N})$?

__Part 2:__ Choose appropriate values for $M$ and $N$. Implement the Monobit Block Frequency Test on the test data.

<h3 style="background-color:lightblue"> Write Answers for Exercise 3 Below </h3>

```python
# Import the regularized upper incomplete gamma function as G
from scipy.special import gammaincc as G

N = 10    # placeholder: replace with your number of blocks
t = 12.0  # placeholder: replace with your computed statistic
print(G(N / 2, t / 2))
```

### Cumulative Sum Test

This test determines whether the cumulative sums of the partial sequences (after mapping each bit to $\pm 1$) are too large or too small compared with what a random sequence would produce. For many non-random sequences, the cumulative sums drift far from zero. Here are the steps:

1. Apply the transformation $Y_i = 2X_i - 1$ to the binary data values $X_1,X_2,\dots,X_{n}$ for $i=1,2,\dots,n$.
1. Compute the forward partial sums
\begin{align*}
    S_1 &= Y_1 \\
    S_2 &= Y_1+Y_2 \\
    S_3 &= Y_1+Y_2+Y_3 \\
    &\vdots \\
    S_n &= Y_1 + Y_2 + \cdots + Y_n
\end{align*} 
and the backward partial sums
\begin{align*}
    T_1 &= Y_n \\
    T_2 &= Y_n+Y_{n-1} \\
    &\vdots \\
    T_n &= Y_n + Y_{n-1} + \cdots + Y_1
\end{align*}
3. Find the maximum values: 

$$z_{\text{forward}} = \max\{|S_1|,|S_2|,\dots,|S_n|\}, \quad z_{\text{backward}} = \max\{|T_1|,|T_2|,\dots,|T_n|\}$$

4. Compute the values of $\rho$ using the formula:
\begin{equation*}
    \rho = 1 -
        \sum_{k=\lfloor \frac{-n/z+1}{4} \rfloor}^{\lfloor \frac{n/z-1}{4} \rfloor} 
            \left[ \Phi\left(\frac{(4k+1)z}{\sqrt{n}}\right) - 
                    \Phi\left(\frac{(4k-1)z}{\sqrt{n}}\right) \right]  +
        \sum_{k=\lfloor \frac{-n/z-3}{4} \rfloor}^{\lfloor \frac{n/z-1}{4} \rfloor} 
            \left[ \Phi\left(\frac{(4k+3)z}{\sqrt{n}}\right) - 
                    \Phi\left(\frac{(4k+1)z}{\sqrt{n}}\right) \right]
\end{equation*} 
Compute $\rho$ twice: once with $z=z_{\text{forward}}$ and once with $z=z_{\text{backward}}$. Here, $\Phi$ represents the cumulative distribution function (cdf) of the standard normal distribution, and $\lfloor \cdot \rfloor$ denotes the floor function, which returns the largest integer less than or equal to its input.
1. If $\rho < 0.01$, then conclude that the sequence $(X_1,X_2,\dots,X_{n})$ is non-random.
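
A minimal sketch of the steps above, assuming a 0/1 input sequence and SciPy's `scipy.stats.norm` (which the lab already uses). The function name `cusum_test` and its `forward` flag are illustrative choices, not NIST's reference code. The backward sums $T_k$ are obtained by reversing the $\pm 1$ sequence before taking the cumulative sum, and `np.floor` is used for the summation limits because it rounds negative numbers down, as the floor function requires.

```python
import numpy as np
from scipy.stats import norm

def cusum_test(bits, forward=True):
    """Cumulative Sum Test in one direction: return (z, rho)."""
    y = 2 * np.asarray(bits, dtype=int) - 1       # Step 1: Y_i = 2X_i - 1
    if not forward:
        y = y[::-1]                               # backward sums start from Y_n
    n = len(y)
    z = np.max(np.abs(np.cumsum(y)))              # Steps 2-3: max |partial sum|
    sqrt_n = np.sqrt(n)
    upper = int(np.floor((n / z - 1) / 4))        # shared upper summation limit
    k1 = np.arange(int(np.floor((-n / z + 1) / 4)), upper + 1)
    k2 = np.arange(int(np.floor((-n / z - 3) / 4)), upper + 1)
    # Step 4: rho = 1 - (first sum) + (second sum)
    sum1 = np.sum(norm.cdf((4 * k1 + 1) * z / sqrt_n) - norm.cdf((4 * k1 - 1) * z / sqrt_n))
    sum2 = np.sum(norm.cdf((4 * k2 + 3) * z / sqrt_n) - norm.cdf((4 * k2 + 1) * z / sqrt_n))
    rho = 1 - sum1 + sum2
    return int(z), float(rho)

# A 10-bit toy sequence, for illustration only (NIST suggests n >= 100)
toy = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
print(cusum_test(toy, forward=True))
print(cusum_test(toy, forward=False))
```

For this toy sequence both directions give $z = 4$ and $\rho \approx 0.41$, so neither direction flags the sequence as non-random.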


### <span style="color:red"> Exercise 4</span>

Choose a value for $n$. NIST suggests $n \geq 100$. Implement the Cumulative Sum Test forwards and backwards on the test data.

<h3 style="background-color:lightblue"> Write Answers for Exercise 4 Below </h3>

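```python
# Import the cdf of the standard normal distribution
from scipy.stats import norm
import numpy as np

x = 1.2649  # placeholder: replace with your own argument
print(norm.cdf(x))  # Phi(x)

# np.floor rounds toward negative infinity, which matters for the negative
# lower summation limits: np.floor(-1.2) is -2.0, whereas int(-1.2) is -1
print(np.floor(1.2))
print(np.floor(-1.2))
```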

### Runs Test

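The runs test measures how often a sequence oscillates between zeros and ones. It formalizes the following idea: in a truly random sequence, each bit differs from the one before it with probability $\frac{1}{2}$, independently of the other bits, so the number of changes between consecutive bits follows a binomial distribution. Too many runs means the sequence oscillates too quickly; too few means it oscillates too slowly.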

A run is a maximal block of consecutive identical bits. For example, the sequence $0011010000100111$ has 8 runs: $00$, $11$, $0$, $1$, $0000$, $1$, $00$, $111$. Here are the steps of the test:

1. Count the number of runs $R$ in the sequence $(X_1,X_2,\dots,X_n)$.
1. Let $n_0$ be the number of zeros and $n_1$ the number of ones in the sequence, so that $n = n_0 + n_1$. Compute the value of $q$ using the formula $q = \frac{|R - \overline{R}|}{u}$, where $\overline{R} = \frac{2n_0n_1}{n}+1$ and $u^2=\frac{2n_0n_1(2n_0n_1-n_0-n_1)}{n^2(n-1)}$. Here, $\overline{R}$ is the expected number of runs and $u$ is the standard deviation of the number of runs.
1. If $q > 2.57583$, then conclude that the sequence $(X_1,X_2,\dots,X_n)$ is non-random.
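
A minimal sketch of the runs test, assuming a 0/1 input sequence; the function name `runs_test` is illustrative. It uses the standard Wald–Wolfowitz variance with $n-1$ in the denominator, and it counts runs as one more than the number of positions where consecutive bits differ. A constant sequence (all zeros or all ones) has $u = 0$, so the sketch treats it as an automatic failure instead of dividing by zero.

```python
import math
import numpy as np

def runs_test(bits):
    """Runs Test on a 0/1 sequence: return (R, R_bar, q, passed)."""
    bits = np.asarray(bits, dtype=int)
    n = len(bits)
    if n < 2:
        raise ValueError("need at least 2 bits")
    n1 = int(np.sum(bits))
    n0 = n - n1
    R = 1 + int(np.count_nonzero(np.diff(bits)))  # a new run starts at each change
    R_bar = 2 * n0 * n1 / n + 1                    # expected number of runs
    u2 = 2 * n0 * n1 * (2 * n0 * n1 - n0 - n1) / (n ** 2 * (n - 1))
    if u2 <= 0:                                    # e.g. all zeros or all ones
        return R, R_bar, math.inf, False
    q = abs(R - R_bar) / math.sqrt(u2)
    return R, R_bar, q, q <= 2.57583

# The 16-bit example from the text, for illustration only (NIST suggests n >= 100)
R, R_bar, q, passed = runs_test([int(c) for c in "0011010000100111"])
print(R, R_bar)  # 8 8.875
```

Here $n_0 = 9$ and $n_1 = 7$, so $\overline{R} = \frac{2\cdot 9\cdot 7}{16} + 1 = 8.875$, close to the observed $R = 8$.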

### <span style="color:red"> Exercise 5</span>

Choose a value for $n$. NIST suggests $n \geq 100$. Implement the Runs test on the test data.

<h3 style="background-color:lightblue"> Write Answers for Exercise 5 Below </h3>

### <span style="color:red"> Exercise 6</span>

We will say that a random sequence passes if it passes all 5 tests (Monobit Frequency, Monobit Block Frequency, Forward Cumulative Sum, Backward Cumulative Sum, and Runs).

Compare the pass rates of the random number generators from Methods A, B, C, and D, using at least 10 runs of each method, for a total of at least 40 random sequences from the collected data.

In two paragraphs, summarize the results, including the details of the parameters chosen for the tests and RNGs. Which generators appear to be performing well? Is there a test that frequently yields failures? Do you believe further testing is necessary? Why or why not?

<h3 style="background-color:lightblue"> Write Answers for Exercise 6 Below </h3>

```python
import numpy as np

# To help manage your files, create a folder called folder_of_data
# (or any name you like) in the same directory as this lab's .ipynb file,
# move rng_test_data.txt into it, and load the data using the updated path
x = np.loadtxt("folder_of_data/rng_test_data.txt", dtype=str, max_rows=3)
seq = ''.join(x)
print(seq)

# Another tip: consistent file names make files easy to load in a loop
for run_number in range(1, 10 + 1):
    print(f"method_A/run_{run_number}.txt")
```

## <span style="color:green;"> Reflection </span>

1. Was your hypothesis about the original `rng_test_data.txt` correct? Explain.
2. Which part of the lab did you find the most challenging?
3. Which part of the lab was the easiest?

<h3 style="background-color:lightblue"> Write Answers for the Reflection Below </h3>

## References

1. Rukhin, Andrew, et al. *A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications*. NIST Special Publication 800-22 Rev. 1a, National Institute of Standards and Technology, Apr. 2010, https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-22r1a.pdf.