Homework 6: Correlations
Instructions: Please answer the following questions and submit your work by editing this Jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should make sure to explain your work: i.e., you should usually have a cell with at least a few sentences explaining what you are doing.
Also, please be sure to always specify units of any quantities that have units, and label axes of plots (again, with units when appropriate).
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
1. Y given X
Let $X \sim \text{Normal}(\text{mean}=0, \text{sd}=1)$, and given $X$, let $$Y = a X + \epsilon,$$ where $\epsilon \sim \text{Normal}(\text{mean}=0, \text{sd}=\sigma)$ is independent of $X$.
(a) Find $\text{cov}[X,Y]$ and $\text{cor}[X,Y]$ by doing math.
(b) Find $\text{cov}[X,Y]$ and $\text{cor}[X,Y]$ by simulation with $a=2$ and $\sigma=0.5$, and include a scatter plot of their joint distribution.
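One way the simulation in (b) might be set up is sketched below. The sample size `n` and the fixed seed are assumptions made here for reproducibility, not part of the assignment; `np.cov` and `np.corrcoef` return 2×2 matrices, so the off-diagonal entry is the quantity of interest.

```python
import numpy as np

rng = np.random.default_rng(123)  # seed is an assumption, used only for reproducibility

a, sigma = 2.0, 0.5  # values given in part (b)
n = 100_000          # number of simulated (X, Y) pairs (an assumption)

x = rng.normal(loc=0.0, scale=1.0, size=n)
eps = rng.normal(loc=0.0, scale=sigma, size=n)  # noise, drawn independently of x
y = a * x + eps

# Off-diagonal entries of the 2x2 covariance and correlation matrices
sample_cov = np.cov(x, y)[0, 1]
sample_cor = np.corrcoef(x, y)[0, 1]
print(f"cov[X,Y] is approximately {sample_cov:.3f}")
print(f"cor[X,Y] is approximately {sample_cor:.3f}")
```

The scatter plot can then be drawn from `x` and `y` with `plt.scatter`, with both axes labeled, and the two estimates compared with the values derived in (a).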
(c) Since $Y - aX = \epsilon$, clearly $\text{cov}[X, Y - aX] = 0$. Show that for any two random variables $X$ and $Y$ with nonzero variance, if $a = \text{cov}[X,Y] / \text{var}[X]$ then $X$ and $Z = Y - aX$ are uncorrelated (i.e., $\text{cov}[X,Z] = 0$).
2. Random walk
Consider the matrix $$ A = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix} . $$ Let $Z$ be a vector of five independent draws from the Normal(0, 1) distribution. What is the covariance matrix of $$ X = A Z ? $$ Explain, and check by simulation.
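A possible shape for the simulation check is sketched here. The number of replicates `n` and the seed are assumptions; each row of `Z` is one draw of the five-dimensional vector, so multiplying by `A.T` on the right applies $A$ to every draw at once.

```python
import numpy as np

rng = np.random.default_rng(123)  # seed is an assumption, used only for reproducibility

A = np.tril(np.ones((5, 5)))  # lower-triangular matrix of ones, as in the problem
n = 200_000                   # number of simulated vectors (an assumption)

Z = rng.normal(size=(n, 5))   # each row is one draw of Z
X = Z @ A.T                   # row i of X equals A @ Z[i]

# rowvar=False: columns are the variables, rows are the observations
sample_cov = np.cov(X, rowvar=False)
print(np.round(sample_cov, 2))
```

The printed matrix can be compared entry by entry with the covariance matrix your explanation predicts.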
3. Two clouds
Consider the following model: $$\begin{aligned} U &= \begin{cases} 0 \qquad &\text{with probability 1/2} \\ 1 \qquad &\text{with probability 1/2} \end{cases} \\ X_j &\sim \text{Normal}\left( \text{mean}= U, \text{sd}=7/5 \right) \qquad \text{for}\quad 1 \le j \le 50 . \end{aligned}$$ (In words: $X$ is a 50-dimensional vector of independent draws from a Normal distribution; these all have the same mean, $U$; and this mean is either 0 or 1, with probability 1/2 each.)
(a) Simulate 1,000 independent samples from this model; the result should be an array of shape (1000, 50).
(Note: each row should have its own, independent, simulated value for $U$.)
Treat this as a matrix of data with 1000 observations of 50 variables.
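As a rough sketch of how one value of $U$ per row can be obtained, the per-row means can be passed to `rng.normal` as a column vector so that they broadcast across the 50 columns. The variable names and the seed below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(123)  # seed is an assumption, used only for reproducibility

n_obs, n_vars = 1000, 50

# One value of U per observation: 0 or 1, each with probability 1/2
U = rng.integers(0, 2, size=n_obs)

# U[:, None] has shape (1000, 1), so every entry in row i is drawn with mean U[i]
X = rng.normal(loc=U[:, None], scale=7 / 5, size=(n_obs, n_vars))
print(X.shape)
```

Keeping `U` alongside `X` makes it available later for coloring plots.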
(b) Plot some of these "variables" against each other, colored by the value of $U$.
(c) Carry out principal components analysis for these data, and show the scree plot, the positions of the 1,000 data points on the first two PCs, and the loadings of the 50 variables on these two PCs.
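One common way to carry out PCA with NumPy alone is through the singular value decomposition of the column-centered data matrix; this is a sketch under that choice, not the only acceptable approach. The data are re-simulated here so the block runs on its own, and the variable names and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(123)  # seed is an assumption, used only for reproducibility

# Re-simulate the data so this block is self-contained
U = rng.integers(0, 2, size=1000)
X = rng.normal(loc=U[:, None], scale=7 / 5, size=(1000, 50))

# Center each variable (column), then take the SVD of the centered matrix
Xc = X - X.mean(axis=0)
u, s, vt = np.linalg.svd(Xc, full_matrices=False)

# Proportion of total variance carried by each PC (heights for the scree plot)
var_explained = s**2 / np.sum(s**2)

scores = Xc @ vt[:2].T  # positions of the 1,000 points on PC1 and PC2, shape (1000, 2)
loadings = vt[:2].T     # loadings of the 50 variables on PC1 and PC2, shape (50, 2)
print(np.round(var_explained[:5], 3))
```

From these arrays, the scree plot is `var_explained` against component number, the PC plot is a scatter of the two columns of `scores` (which can be colored by `U`), and the loadings plot shows the two columns of `loadings` against variable index.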