Homework 2: Random variables
Instructions: Please answer the following questions and submit your work by editing this Jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should make sure to explain your work: i.e., you should usually have a cell with at least a few sentences explaining what you are doing.
Several times below I ask you to compare the results of a simulation to a theoretical distribution. Here is a function that makes this easy:
import numpy as np
import pandas as pd
import math
rng = np.random.default_rng()
def comparison_table(x, expected, *sims):
    """
    Returns a pandas DataFrame with columns corresponding to:
      x: the possible values
      expected: the expected frequencies these should happen at
      sim0, sim1, ...: one column per vector of simulated values passed in;
        each column holds the proportion of that simulation equal to
        each value of x.
    """
    df = pd.DataFrame(data={"x": x})
    df["expected"] = expected
    for k, sim in enumerate(sims):
        total = len(sim)
        # proportion of the simulated values equal to each possible value y
        n = [np.sum(sim == y) / total for y in x]
        df[f"sim{k}"] = n
    return df
For instance, here I'm simulating 1000 draws, twice, from a fair die (i.e., uniform draws from 1, 2, 3, 4, 5, 6) and using the function to see if the results are close to the expected proportions of 1/6. Note that `expected` is a vector of length 6, but `sim1` and `sim2` are vectors of length 1000; they show up in the table as the columns sim0 and sim1.
A good way to check that your numbers are "close enough" is to do the simulations two or three times (like I do here), and check that the differences between simulations are similar to their differences from the expected column.
n = 1000
x = np.arange(1, 7)
expected = np.repeat(1/6, 6)
sim1 = rng.choice(x, size=n, replace=True)
sim2 = rng.choice(x, size=n, replace=True)
comparison_table(x, expected, sim1, sim2)
| | x | expected | sim0 | sim1 |
|---|---|---|---|---|
| 0 | 1 | 0.166667 | 0.180 | 0.166 |
| 1 | 2 | 0.166667 | 0.183 | 0.173 |
| 2 | 3 | 0.166667 | 0.137 | 0.156 |
| 3 | 4 | 0.166667 | 0.151 | 0.172 |
| 4 | 5 | 0.166667 | 0.166 | 0.170 |
| 5 | 6 | 0.166667 | 0.183 | 0.163 |
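One rough way to put a number on "close enough", assuming the draws are independent: an observed proportion from $n$ draws, each with probability $p$, has standard error $\sqrt{p(1-p)/n}$. The sketch below just evaluates that formula for the die example; the variable names are illustrative.

```python
import math

n = 1000      # number of draws per simulation, as in the die example
p = 1 / 6     # expected proportion for each face of a fair die

# Standard error of an observed proportion from n independent draws
se = math.sqrt(p * (1 - p) / n)
print(f"standard error: {se:.4f}")
```

This comes out to a little over 0.01, so simulated proportions that differ from 1/6 by one or two hundredths are not surprising at this sample size.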
1. The Binomial distribution
A random variable $X$ has a Binomial($n$, $p$) distribution if $\mathbb{P}\{ X = k \} = \binom{n}{k} p^k (1-p)^{n-k}$, where $\binom{n}{k} = n! / (k! (n-k)!)$, for $0 \le k \le n$.
(a) This has the following interpretation: suppose you try to do something $n$ times, and each time the chance you succeed is $p$, independently of everything else. $X$ is the total number of successes. Write a function to simulate a random number in this way.
(b) Check your function by simulating at least 10,000 random draws with $n=20$ and $p=0.3$, and making a table comparing the observed and expected proportions of these draws that are $k$ for each $0 \le k \le 20$.
(c) Make up a story in which you'd get a Binomial distribution, being clear exactly which are the numbers that could be plotted to show the distribution.
Note: you can compute the factorial, $n!$, with `math.factorial()`.
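As a sketch of how the probability formula at the top of this section turns into code, here is one way to evaluate $\mathbb{P}\{X = k\}$ with `math.factorial`. The function name `binomial_pmf` and the values $n = 4$, $p = 0.5$ are made up for illustration; this computes theoretical probabilities only, not simulated draws.

```python
import math

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ Binomial(n, p), straight from the definition."""
    # n! / (k! (n-k)!) is always a whole number, so integer division is exact
    n_choose_k = math.factorial(n) // (math.factorial(k) * math.factorial(n - k))
    return n_choose_k * p**k * (1 - p)**(n - k)

# Illustrative values: four tries, each succeeding with probability 1/2
probs = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(probs)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]
```

A quick sanity check on any such function is that the probabilities over $0 \le k \le n$ sum to 1.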
2. The Poisson distribution
A random variable $X$ has a Poisson($\lambda$) distribution if $\mathbb{P}\{ X = k \} = \frac{\lambda^k}{k!} e^{-\lambda}$, for $k \ge 0$.
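The definition above can be evaluated directly, which is handy for filling in an `expected` column. This is a minimal sketch: the name `poisson_pmf` and the value $\lambda = 2$ are illustrative choices, and since $k$ has no upper limit the table has to be cut off at some point where the remaining probability is negligible.

```python
import math

def poisson_pmf(k, lam):
    """P{X = k} for X ~ Poisson(lam), straight from the definition."""
    return lam**k / math.factorial(k) * math.exp(-lam)

lam = 2.0               # illustrative rate
ks = range(15)          # cutoff chosen so the leftover probability is tiny
probs = [poisson_pmf(k, lam) for k in ks]

# The probabilities up to the cutoff should add up to very nearly 1
print(f"total probability for k < 15: {sum(probs):.6f}")
```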
(a) The Poisson distribution is a good approximation for "the number of rare events", i.e., for the Binomial when $n$ is large but $p$ is small. Simulate at least 10,000 draws from the Poisson($5$) distribution and compare their distribution to the same number of draws from the Binomial(10, 0.5), the Binomial(100, 0.05), and the Binomial(1000, 0.005). (Note: you may use `rng.binomial()` instead of the function you wrote above.)
(b) Make up a story in which you might get the Poisson distribution, being clear exactly which are the numbers that could be plotted to show the distribution.
(c) Suppose that $X \sim \text{Poisson}(5)$. Write down a mathematical expression for $\mathbb{E}[X (X-1)]$ using the definition of expectation, and evaluate it either with math or simulation.
3. The Normal distribution
The Normal distribution is additive, meaning that if $X_1$ is Normal with mean $\mu_1$ and variance $\sigma^2_1$ and $X_2$ is Normal with mean $\mu_2$ and variance $\sigma^2_2$, independent of $X_1$, then $X_1 + X_2$ is again Normal, with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.
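One practical point when simulating: the statement above is written in terms of variances, but NumPy's `rng.normal(loc, scale, size)` takes the standard deviation as `scale`, so a variance needs a square root first. Here is a small sketch with made-up values (mean 1, variance 4) and an arbitrary seed, checking that the sample variance comes out near the intended one.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=1)   # arbitrary seed, only for repeatability

mean = 1.0                 # illustrative mean
variance = 4.0             # illustrative variance
sd = math.sqrt(variance)   # rng.normal expects a standard deviation

draws = rng.normal(loc=mean, scale=sd, size=100_000)
print(f"sample mean: {draws.mean():.2f}, sample variance: {draws.var():.2f}")
```

Passing the variance directly as `scale` would silently give draws with variance 16 instead of 4.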
(a) Simulate a large number of draws of $X_1$ and $X_2$ above with $\mu_1 = 0$, $\sigma^2_1 = 2$, $\mu_2 = 3$, and $\sigma^2_2 = 1.5$, and compare the distribution of $X_1 + X_2$ to that of $X_3 \sim \text{Normal}(3, 3.5)$ (mean $3$, variance $3.5 = 2 + 1.5$) by plotting the histogram of $X_1 + X_2$ and the histogram of draws of $X_3$.
(b) One way the Normal distribution arises is by adding together lots of independent things (e.g., the cumulative effect of lots of small, additive errors). Make up a story in which you might get a Normal distribution, being clear exactly which are the numbers that could be plotted to show the distribution.