Instructions: Please answer the following questions and submit your work by editing this jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should make sure to explain your work: i.e., you should usually have a cell with at least a few sentences explaining what you are doing.
Also, please be sure to always specify units of any quantities that have units, and label axes of plots (again, with units when appropriate).
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(123)
Make up a situation in which you might have data like: $$ Y_i \sim \text{Binomial}\left(1, \frac{1}{1 + e^{-a + b X_i}} \right) . $$ (In other words, $Y_i = 0$ or 1 with logistic probabilities.)
(a) Describe the situation in words, including choosing values for $a$ and $b$. Make a plot of $ \frac{1}{1 + e^{-a + b x}}$ against $x$.
(b) Simulate 1000 observations from this model (with values for $X$ drawn from some reasonable distribution).
(c) Fit a logistic linear model to your simulated data. Identify the estimates of $a$ and $b$ (they should be close to the real values!).
In the file data/whales.csv (direct link: github) are the (average) body mass (in kg) and length (in m) of 43 modern cetacean species, from this dataset. Suppose that someone has found fossils of a new species of whale that is about 8m long, and they'd like to estimate how much it weighed, based on the length-weight relationship of modern species.
(a) Read in the data and make a plot of mass against length. Also make a plot of log(mass) against log(length).
(b) Fit a linear model to predict log(mass) with log(length), i.e., find $a$ and $b$ so that $$ \log(\text{mass}) \approx a + b \log(\text{length}) , $$ and add the resulting best-fit line to the plot of log(mass) against log(length).
(c) The predicted mass of a whale of length $\ell$ is $e^{a + b \log(\ell)}$. Add this line to the original plot of mass against length.
(d) What is the predicted weight of the 8m long whale? How far off do you estimate this prediction to be? (You can get the margin of error either by looking at the plot or computing the standard deviations of the residuals on a log scale, and transforming.)