Homework 2: Random variables

Instructions: Please answer the following questions and submit your work by editing this Jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should always explain your work: you should usually have a cell with at least a few sentences explaining what you are doing.

Several times below I ask you to compare the results of a simulation to a theoretical distribution. Here is a function that makes this easy:

In [1]:

import numpy as np
import pandas as pd
rng = np.random.default_rng()

def comparison_table(x, expected, *sims):
    """
    Returns a pandas DataFrame with columns corresponding to:
    x: the possible values
    expected: the expected proportion of draws equal to each value of x
    sim0, sim1, ...: one column per vector of simulated values passed
        in `sims`; each column holds the observed proportion of that
        vector's entries equal to each value of x.
    """
    df = pd.DataFrame(data={"x" : x})
    df['expected'] = expected
    for k, sim in enumerate(sims):
        total = len(sim)
        n = [np.sum(sim == y)/total for y in x]
        df[f"sim{k}"] = n
    return df

For instance, here I'm simulating 1000 draws, twice, from a fair die (i.e., uniform draws from 1, 2, 3, 4, 5, 6) and using the function to see if the observed proportions are close to the expected proportions of 1/6. Note that expected is a vector of length 6, but sim1 and sim2 are vectors of length 1000; they appear in the output as the columns sim0 and sim1. A good way to check that your numbers are "close enough" is to do the simulation two or three times (like I do here), and check that the differences between simulations are similar in size to the differences from the expected column.

In [2]:

n = 1000
x = np.arange(1, 7)
expected = np.repeat(1/6, 6)
sim1 = rng.choice(x, size=n, replace=True)
sim2 = rng.choice(x, size=n, replace=True)
comparison_table(x, expected, sim1, sim2)

Out[2]:

	x	expected	sim0	sim1
0	1	0.166667	0.157	0.174
1	2	0.166667	0.180	0.178
2	3	0.166667	0.159	0.155
3	4	0.166667	0.152	0.155
4	5	0.166667	0.175	0.159
5	6	0.166667	0.177	0.179
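
The "close enough" check described above can also be put into numbers. The following is a minimal sketch, not part of the assignment: the helper name `proportions` and the seed are assumptions made for illustration. It compares the largest gap between two simulations with the largest gap between one simulation and the expected proportions.

```python
import numpy as np

rng = np.random.default_rng(123)  # seed is an assumption, used only for reproducibility
n = 1000
x = np.arange(1, 7)
expected = np.repeat(1/6, 6)

def proportions(sim, values):
    """Proportion of entries of sim equal to each element of values."""
    return np.array([np.mean(sim == v) for v in values])

prop1 = proportions(rng.choice(x, size=n, replace=True), x)
prop2 = proportions(rng.choice(x, size=n, replace=True), x)

# If the simulation matches the theory, these two gaps should be of similar size.
sim_vs_sim = np.max(np.abs(prop1 - prop2))
sim_vs_expected = np.max(np.abs(prop1 - expected))
print(f"largest sim-to-sim gap:      {sim_vs_sim:.3f}")
print(f"largest sim-to-expected gap: {sim_vs_expected:.3f}")
```

If the second number were many times larger than the first, that would suggest the simulation does not follow the expected distribution.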

1. The geometric distribution

A random variable $X$ has a geometric distribution with parameter $p$ if $\mathbb{P}\{ X = k \} = (1-p)^{k-1} p$, for $k \ge 1$ (at least, that's the version that np.random simulates from, as we learned in HW1).
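
As a small illustration of the formula above, the probabilities $(1-p)^{k-1} p$ can be evaluated for a range of $k$ with vectorized numpy arithmetic. This is only a sketch; the variable names are chosen here for illustration.

```python
import numpy as np

p = 0.2                  # success probability
k = np.arange(1, 11)     # k = 1, ..., 10

# P{X = k} = (1 - p)^(k - 1) * p, evaluated for every k at once
geom_pmf = (1 - p) ** (k - 1) * p

print(geom_pmf)
# The sum is less than 1 because values k > 10 carry the remaining probability.
print(geom_pmf.sum())
```

A vector like `geom_pmf` has the right shape to be passed as the `expected` argument of `comparison_table`, with `k` as `x`.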

(a) This has the following interpretation: suppose you are trying to do something; on each attempt the chance of success is $p$, independently of how many times you've tried; then $X$ is the number of attempts up to and including the first success. Write a function to simulate a random number in this way (by explicitly simulating the failures and successes).

(b) Check your answer in (a) by comparing the distribution you get to the Geometric distribution. You may do this by simulating at least 10,000 draws using your function in (a) with $p=0.2$ and comparing the proportion of these that are $k$ to $(1-p)^{k-1} p$, for $k$ between 1 and 10.

(c) Make up a story about a situation in which you'd get a Geometric distribution.

2. The Binomial distribution

A random variable $X$ has a Binomial($n$, $p$) distribution if $\mathbb{P}\{ X = k \} = \binom{n}{k} p^k (1-p)^{n-k}$, where $\binom{n}{k} = n! / (k! (n-k)!)$, for $0 \le k \le n$.

(a) This has the following interpretation: suppose you try to do something $n$ times, and each time the chance you succeed is $p$, independently of everything else. $X$ is the total number of successes. Write a function to simulate a random number in this way.

(b) Check your function by simulating at least 10,000 random draws with $n=20$ and $p=0.3$, and making a table comparing the observed and expected proportions of these draws that are $k$ for each $0 \le k \le 20$.

(c) Make up a story in which you'd get a Binomial distribution.

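Note: you can compute the factorial, $n!$, with math.factorial( ) from Python's standard library (after import math). The older alias np.math.factorial( ) is not available in recent versions of numpy.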
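
The Binomial formula above can be evaluated with the standard library alone. The sketch below uses `math.comb` (available in Python 3.8 and later) for $\binom{n}{k}$. The function name `binomial_pmf` and the small example values $n=4$, $p=0.5$ are assumptions chosen for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ Binomial(n, p), straight from the formula."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Illustrative values only: four fair coin flips.
n, p = 4, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(pmf)

# math.comb agrees with the factorial definition n! / (k! (n-k)!).
via_factorials = math.factorial(20) // (math.factorial(3) * math.factorial(17))
print(math.comb(20, 3) == via_factorials)
```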

3. The Poisson distribution

A random variable $X$ has a Poisson($\lambda$) distribution if $\mathbb{P}\{ X = k \} = \frac{\lambda^k}{k!} e^{-\lambda}$, for $k \ge 0$.
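
The Poisson probabilities can be evaluated directly from this formula. The sketch below is illustrative only: the function name `poisson_pmf` and the cutoff of 40 are assumptions. Because $k$ is unbounded, any table has to stop at some largest $k$, and the cutoff should be large enough that the omitted probability is negligible.

```python
import math

def poisson_pmf(k, lam):
    """P{X = k} for X ~ Poisson(lam), straight from the formula."""
    return lam**k / math.factorial(k) * math.exp(-lam)

lam = 5
kmax = 40  # assumed cutoff; for lam = 5 the mass beyond 40 is negligible
pmf = [poisson_pmf(k, lam) for k in range(kmax + 1)]

print(pmf[:8])
print(sum(pmf))  # very close to 1 if kmax is large enough
```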

(a) The Poisson distribution is a good approximation for "the number of rare events", i.e., for the Binomial when $n$ is large but $p$ is small. Simulate at least 10,000 draws from the Poisson($5$) and compare their distribution to the same number of draws from the Binomial(10, 0.5), the Binomial(100, 0.05), and the Binomial(1000, 0.005). (Note: you may use rng.binomial( ) instead of the function you wrote above.) Explain at least one difference in the distributions you see between the Poisson and Binomial that gets smaller as $n$ gets bigger.

(b) Make up a story in which you might get the Poisson distribution.

(c) Suppose that $X \sim \text{Poisson}(5)$. Write down a mathematical expression for $\mathbb{E}[X (X-1)]$ using the definition of expectation, and evaluate it either with math or simulation.

4. The Normal distribution

The Normal distribution is additive, meaning that if $X_1$ is Normal with mean $\mu_1$ and variance $\sigma^2_1$ and $X_2$ is Normal with mean $\mu_2$ and variance $\sigma^2_2$, independent of $X_1$, then $X_1 + X_2$ is again Normal, with mean $\mu_1 + \mu_2$ and variance $\sigma_1^2 + \sigma_2^2$.
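
One practical detail when simulating from the Normal: `rng.normal` takes the standard deviation (its `scale` argument), not the variance, so a variance $\sigma^2$ has to be passed as `np.sqrt(sigma2)`. The sketch below checks the additivity of means and variances numerically. The parameter values and the seed are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is reproducible
n = 200_000

# Hypothetical parameters, chosen only for illustration.
mu1, var1 = 1.0, 4.0
mu2, var2 = -2.0, 9.0

# scale is the standard deviation, so pass the square root of the variance
x1 = rng.normal(loc=mu1, scale=np.sqrt(var1), size=n)
x2 = rng.normal(loc=mu2, scale=np.sqrt(var2), size=n)
total = x1 + x2

print(total.mean())  # should be near mu1 + mu2 = -1
print(total.var())   # should be near var1 + var2 = 13
```

Matching mean and variance does not by itself show that the sum is Normal; comparing histograms addresses the shape of the distribution.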

(a) Simulate a large number of draws of $X_1$ and $X_2$ above with $\mu_1 = 0$, $\sigma^2_1 = 2$, $\mu_2 = 3$, and $\sigma^2_2 = 1.5$, and compare the distribution of $X_1 + X_2$ to the Normal with mean 3 and variance 3.5 by plotting the histogram of $X_1 + X_2$ alongside a histogram of the same number of draws from Normal(3, 3.5).

(b) The Exponential distribution is not additive in this way; show this by plotting a histogram of $Y_1 + Y_2$, where $Y_1$ and $Y_2$ are independent Exponential(1).