Homework 4: Statistics!
Instructions: Please answer the following questions and submit your work by editing this Jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should make sure to explain your work: i.e., you should usually have a cell with at least a few sentences explaining what you are doing.
Also, please be sure to always specify units of any quantities that have units, and label axes of plots (again, with units when appropriate).
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng()
1. Mosquito bites
My kid's class all went camping, and came home with mosquito bites. Thanks to a post-trip poll, we know how many mosquito bites each of the 27 kids had. Here are the numbers:
bites = np.array([4, 5, 4, 2, 4, 8, 4, 6, 7, 5, 4, 0, 5, 7, 5, 3, 2, 0, 3, 4, 5, 3, 6, 1, 2, 3, 5])
Use maximum likelihood to fit a Poisson distribution to these data. To do this, you should
(a) make a plot of the Poisson likelihood as a function of $\lambda$, the mean of the Poisson, and
(b) use an optimization function (like `scipy.optimize.minimize()`) to find the value of $\lambda$ that maximizes the log-likelihood.
(Note: the `minimize` function works better if you use the log-likelihood instead of the likelihood!)
(c) check that your answer is sensible by comparing the distribution of the data to that expected under the model you've fit.
(d) Under this model, what proportion of kids do we expect to have zero mosquito bites? Answer this question with math, and check it with simulation.
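As a rough illustration of the optimization pattern in part (b), here is a minimal sketch on a small made-up data set (`toy_counts` is hypothetical and is not the homework data). Since `minimize` looks for a minimum, the sketch hands it the negative log-likelihood; the helper name `neg_log_lik`, the starting value, and the lower bound on $\lambda$ are all assumptions chosen for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson

# Hypothetical toy counts for illustration only (NOT the homework data).
toy_counts = np.array([2, 0, 3, 1, 4, 2, 1])

def neg_log_lik(params, counts):
    """Negative Poisson log-likelihood; params is a length-1 array holding lambda."""
    lam = params[0]
    return -np.sum(poisson.logpmf(counts, lam))

# The bound keeps lambda strictly positive, where the log-likelihood is defined.
# The starting value x0 = 1.0 is an arbitrary assumption.
result = minimize(neg_log_lik, x0=[1.0], args=(toy_counts,), bounds=[(1e-6, None)])
lam_hat = result.x[0]

print(f"estimated lambda: {lam_hat:.3f}")
print(f"sample mean:      {toy_counts.mean():.3f}")
```

For a Poisson model the maximum likelihood estimate of $\lambda$ coincides with the sample mean, so comparing the two printed numbers is a quick sanity check on the optimizer.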
2. Modeling proportions
The Beta distribution
can be used to model proportions:
it gives random numbers between 0 and 1,
and has two parameters: $\alpha$ and $\beta$.
If $X \sim \text{Beta}(\alpha, \beta)$ then
$$ \begin{aligned}
\mathbb{E}[X] &= \frac{\alpha}{\alpha + \beta} \\
\mathbb{E}[X^2] &= \frac{\alpha (\alpha+1)}{(\alpha + \beta)(\alpha + \beta + 1)} ,
\end{aligned}$$
and
$X$ has probability density
$$ f_X(u) =
\frac{ u^{\alpha - 1}(1 - u)^{\beta - 1} }{ B(\alpha, \beta) }.
$$
This density can be computed (as usual) with `scipy.stats.beta.pdf`,
or by hand; in the latter case, $B(\alpha, \beta)$ can be computed with `scipy.special.beta`.
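The two ways of computing the density can be compared directly. The sketch below uses arbitrary illustrative parameter values ($\alpha = 2$, $\beta = 3$, not fitted to anything) and also checks the formula for $\mathbb{E}[X]$ by simulation; the variable names and the seed are assumptions made for the example.

```python
import numpy as np
from scipy.special import beta as beta_fn    # the Beta function B(alpha, beta)
from scipy.stats import beta as beta_dist    # the Beta distribution

# Arbitrary illustrative parameters (an assumption, not estimates from data).
a, b = 2.0, 3.0
u = np.linspace(0.05, 0.95, 10)

# Density "by hand", following the formula for f_X(u).
by_hand = u ** (a - 1) * (1 - u) ** (b - 1) / beta_fn(a, b)

# Density from scipy.stats.
from_scipy = beta_dist.pdf(u, a, b)

densities_agree = np.allclose(by_hand, from_scipy)
print("densities agree:", densities_agree)

# Simulation check of the mean, alpha / (alpha + beta).
rng = np.random.default_rng(123)
x = rng.beta(a, b, size=200_000)
print(f"simulated mean: {x.mean():.4f}   formula: {a / (a + b):.4f}")
```

Note that `scipy.special.beta` is the normalizing function $B(\alpha, \beta)$, while `scipy.stats.beta` is the distribution itself; importing them under different names avoids mixing them up.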
Suppose we have data from many different tracts of forest on the proportion of trees in each tract that have burned, and we'd like to fit a Beta distribution to these data. The proportions are:
burned = np.array([
0.04, 0.55, 0.91, 0.64, 0.83, 0.62, 0.98, 0.7, 0.36, 0.73, 0.74, 0.28, 0.35, 0.65, 0.85, 0.9, 0.94,
0.11, 0.74, 0.48, 0.62, 0.66, 0.51, 0.79, 0.61, 0.66, 0.75, 0.86, 0.52, 0.84, 0.43, 0.61, 0.99, 0.85,
0.97, 0.46, 0.75, 0.61, 0.95, 0.76, 0.78, 0.89, 0.79, 0.92, 0.83, 0.84, 0.61, 0.52, 0.82, 0.87, 0.9,
0.58, 0.67, 0.42, 0.9, 0.4, 0.95, 0.98, 0.56, 0.94, 0.5, 0.84, 0.58, 0.91, 0.21, 0.54, 0.9, 0.64, 0.48,
0.82, 0.77, 0.63, 0.84, 0.97, 0.77, 0.96, 0.83, 0.9, 0.96, 0.52, 0.24, 0.92, 0.11, 0.96, 0.85, 0.62,
0.96, 0.67, 0.87, 0.78, 0.85, 0.88, 0.88, 0.68, 0.13, 0.9, 0.94, 0.49, 0.74, 0.99
])
(a) Make a plot of the likelihood surface for these data over the range $0.5 \le \alpha, \beta \le 4$.
(b) Estimate the values of $\alpha$ and $\beta$ that best fit the data by maximum likelihood.
(c) Using this model, in what proportion of forest tracts do you estimate less than 20% of the trees have burned?
3. Statistics statistics
We have given 25 statistics students a standardized test on statistics concepts both before and after taking a statistics class. Across the class, the mean improvement was 32 points (out of 100), with an SD of 30 points. Assuming that the $t$ test is appropriate, what can we conclude from this study? Please interpret the results, including degree of uncertainty.