Homework 6: Correlations
Instructions: Please answer the following questions and submit your work by editing this Jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should always explain your work: usually this means including a cell with at least a few sentences describing what you are doing.
Also, please be sure to always specify units of any quantities that have units, and label axes of plots (again, with units when appropriate).
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
1. Ever Upwards
You are part of a team aiming to predict future costs for a coffee shop, and are given the following model. Let $X_0 = \$1.50$ be the price (to the shop) of a cup of coffee today, and model the price $n$ weeks from now as $X_n = X_{n-1} + Z_n$, where each $Z_n$ has a Normal distribution with mean \$0.10 and standard deviation \$0.10 and is independent of all the other $Z_m$. We want to see how well we can predict prices for the next 10 weeks under this model.
(a) If we define $Z = (Z_1, Z_2, \ldots, Z_{10})$, and $X = (X_1, X_2, \ldots, X_{10})$, then (taking $X$ and $Z$ to be column vectors) we can write $X = X_0 + AZ$ for some matrix $A$. What is that matrix?
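Here is a minimal numerical sketch. It assumes the answer comes from unrolling the recursion to $X_n = X_0 + Z_1 + \cdots + Z_n$. Because that sum picks out $Z_1$ through $Z_n$, row $n$ of $A$ should have ones in columns $1$ through $n$, which makes $A$ the $10 \times 10$ lower-triangular matrix of ones. The code builds that candidate with `np.tril` and checks it against the recursion on one simulated path. The variable names and the seed are illustrative choices, not part of the assignment.

```python
import numpy as np

n_weeks = 10
X0 = 1.50  # today's price, dollars

# Unrolling X_n = X_{n-1} + Z_n gives X_n = X0 + Z_1 + ... + Z_n,
# so row n of A has ones in columns 1..n: a lower-triangular matrix of ones.
A = np.tril(np.ones((n_weeks, n_weeks)))

rng = np.random.default_rng(1)
Z = rng.normal(loc=0.10, scale=0.10, size=n_weeks)  # weekly price changes, dollars

# Compare the matrix form with the week-by-week recursion.
X_recursive = X0 + np.cumsum(Z)  # X_1, ..., X_10
X_matrix = X0 + A @ Z
print(np.allclose(X_recursive, X_matrix))  # True
```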
(b) What are the mean vector and covariance matrix of $X$? Explain your reasoning, and check your answer by simulation.
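One way to organize the explanation, sketched under the model above and assuming the lower-triangular $A$ from part (a). Since $X = X_0 + AZ$ is linear in $Z$, we have $E[X] = X_0 + A\,E[Z]$ and $\text{Cov}[X] = A\,\text{Cov}[Z]\,A^T$. With $E[Z_k] = \$0.10$ and $\text{Cov}[Z] = (0.10)^2 I$, this gives $E[X_n] = 1.50 + 0.10\,n$ dollars and $\text{Cov}[X_m, X_n] = 0.01 \min(m, n)$ dollars², because $(AA^T)_{mn}$ counts the indices $k \le \min(m, n)$. The code compares these formulas with 100,000 simulated 10-week paths. The sample size and seed are arbitrary.

```python
import numpy as np

n_weeks, n_sims = 10, 100_000
X0, mu, sigma = 1.50, 0.10, 0.10                  # dollars
A = np.tril(np.ones((n_weeks, n_weeks)))

# Theory: E[X] = X0 + A E[Z],  Cov[X] = A Cov[Z] A^T = sigma^2 A A^T.
mean_theory = X0 + A @ np.full(n_weeks, mu)       # 1.60, 1.70, ..., 2.50 dollars
cov_theory = sigma**2 * A @ A.T                   # entry (m, n) = 0.01 * min(m, n) dollars^2

rng = np.random.default_rng(2)
Z = rng.normal(mu, sigma, size=(n_sims, n_weeks)) # each row: one simulated path of increments
X = X0 + Z @ A.T                                  # row-wise version of X = X0 + A Z
mean_sim = X.mean(axis=0)
cov_sim = np.cov(X, rowvar=False)                 # columns are weeks 1..10

mean_err = np.max(np.abs(mean_sim - mean_theory))
cov_err = np.max(np.abs(cov_sim - cov_theory))
print(mean_err, cov_err)                          # both should be small
```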
2. Books by a different name
In class, we did PCA on word-count data from passages taken from three books. The passages are in the file data/passages.txt, and the source of each passage is in data/passage_sources.tsv. Repeat the analysis. You may use the same code from class to read in and process the data, but you should use scikit-learn to do the PCA. Your results should be similar to, but not the same as, those from class, since scikit-learn's implementation differs somewhat. You don't need to reproduce everything we did in class (use your judgement), but we encourage you to explore.
Note: part of this question is figuring out how the outputs of a different implementation map onto the quantities we discussed in class. The sizes (shapes) of the various outputs are a big clue.
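Here is a minimal sketch of how scikit-learn's `PCA` outputs might line up with the pieces from class. It uses a hypothetical random count matrix (`counts`) in place of the real passages-by-words data, so reading data/passages.txt is left to the code from class. As the note suggests, the shapes are the clue. `fit_transform` returns one row per passage, giving each passage's coordinates along the principal components. `components_` has one row per component and one column per word, and each row is a unit-length principal direction. The code also checks that, after subtracting `mean_`, the scores are just the data projected onto those directions. We pass `svd_solver="full"` so this check is exact; the number of components and the fake data are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the passages-by-words count matrix built in class
# (rows = passages, columns = distinct words); replace with the real one.
rng = np.random.default_rng(3)
counts = rng.poisson(lam=2.0, size=(60, 500)).astype(float)

pca = PCA(n_components=3, svd_solver="full")
scores = pca.fit_transform(counts)    # shape (n_passages, 3): each passage's PC coordinates
loadings = pca.components_            # shape (3, n_words): one principal direction per row

print(scores.shape, loadings.shape)   # (60, 3) (3, 500)
print(pca.explained_variance_ratio_)  # fraction of total variance captured by each PC

# scikit-learn centers each column before projecting, so the scores equal
# the centered data times the transposed components.
centered = counts - pca.mean_
print(np.allclose(scores, centered @ loadings.T))  # True
```

To label points by book, you could color a scatter plot of `scores[:, 0]` against `scores[:, 1]` using the sources from data/passage_sources.tsv.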
3. The Matrix
The secret vault can only be unlocked by a stream of numbers satisfying certain statistical properties. You can pass in 5 floating-point numbers at a time, and each set of 5 must be related to each other in the following way: they should be Normally distributed with mean zero and the ($5 \times 5$) covariance matrix $$ M_{ij} = (1+i+j) \times 2^{-|i-j|} \qquad \text{for } 1 \le i \le 5, \quad 1 \le j \le 5 . $$ Write a function that produces a random set of 5 such numbers, and test the result by verifying that, for $X = (X_1, \ldots, X_5)$, (a) $\text{var}[X_2] = 5$ and (b) $\text{cov}[X_3,X_5] = 2.25$.
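Here is a minimal sketch of one standard approach, not necessarily the intended one. If $M = LL^T$ is the Cholesky factorization and $Z$ is a vector of 5 independent standard Normals, then $X = LZ$ has mean zero and covariance $L\,I\,L^T = M$. The indices $i$ and $j$ start at 1 in the formula, so in NumPy's 0-based indexing $\text{var}[X_2]$ is entry `[1, 1]` and $\text{cov}[X_3, X_5]$ is entry `[2, 4]`. The function names `vault_covariance` and `vault_numbers` are hypothetical, and the sample size and seed are arbitrary.

```python
import numpy as np

def vault_covariance():
    """The 5x5 matrix M_ij = (1 + i + j) * 2^(-|i - j|), with i, j = 1..5."""
    idx = np.arange(1, 6)                        # 1-based indices, as in the formula
    i, j = np.meshgrid(idx, idx, indexing="ij")
    return (1 + i + j) * 2.0 ** (-np.abs(i - j))

def vault_numbers(rng, size=None):
    """Return one set of 5 numbers (or `size` sets, one per row) ~ Normal(0, M)."""
    L = np.linalg.cholesky(vault_covariance())   # lower-triangular L with M = L @ L.T
    if size is None:
        return L @ rng.standard_normal(5)
    Z = rng.standard_normal((size, 5))           # independent standard Normals
    return Z @ L.T                               # each row equals L @ z for its own z

rng = np.random.default_rng(6)
print(vault_numbers(rng))                        # one set of 5 numbers for the vault

samples = vault_numbers(rng, size=200_000)
C = np.cov(samples, rowvar=False)                # sample covariance of the 5 columns
print(C[1, 1], C[2, 4])                          # should be near 5 and 2.25
```

NumPy's `Generator.multivariate_normal` would draw from the same distribution. The Cholesky version just makes the construction explicit.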