Instructions: Please answer the following questions and submit your work by editing this Jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should make sure to explain your work: i.e., you should usually have a cell with at least a few sentences explaining what you are doing.
Also, please be sure to always specify units of any quantities that have units, and label axes of plots (again, with units when appropriate).
import numpy as np
import matplotlib.pyplot as plt
rng = np.random.default_rng(123)
Suppose that the probability that someone comes down with the flu depends on two things: their total exposure $E$ (in person-minutes) and their antibody binding $B$ (in BAU/mL).
(a) Suppose that the probability that someone with total exposure $E$ person-minutes and antibody binding $B$ BAU/mL catches the flu is $$ p(E, B) = \frac{1}{1 + e^{-(a E + b B + c)}}, $$ with $a=1/100$, $b=-1/300$, and $c=-3$. Write a function that, given arrays of $E$ and $B$ values of the same length, returns an array of 0's and 1's, one for each $(E, B)$ pair, so that the $i^\text{th}$ entry is 1 with probability $p(E_i, B_i)$. (A "1" in the $i^\text{th}$ entry will indicate that the $i^\text{th}$ person caught the flu.)
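One possible way to structure such a simulation is sketched below. It is only an illustration, not a reference solution: the function names `flu_probability` and `simulate_flu`, the small three-person example, and the separate generator `demo_rng` are all hypothetical choices made for this sketch. The idea is to compute $p(E_i, B_i)$ for every person and then compare it to an independent Uniform(0, 1) draw, which gives a 1 with exactly that probability.

```python
import numpy as np

def flu_probability(E, B, a=1/100, b=-1/300, c=-3):
    """Logistic probability p(E, B) of catching the flu.

    E is in person-minutes and B is in BAU/mL; the defaults are the
    parameter values stated in part (a).
    """
    E = np.asarray(E, dtype=float)
    B = np.asarray(B, dtype=float)
    return 1 / (1 + np.exp(-(a * E + b * B + c)))

def simulate_flu(E, B, rng, a=1/100, b=-1/300, c=-3):
    """Return an array of 0's and 1's; entry i is 1 with probability p(E_i, B_i)."""
    p = flu_probability(E, B, a, b, c)
    # A Uniform(0, 1) draw falls below p with probability p.
    return (rng.random(p.shape) < p).astype(int)

# Hypothetical three-person example (values chosen only for illustration).
demo_rng = np.random.default_rng(123)
example = simulate_flu(np.array([100, 500, 900]),
                       np.array([1500, 600, 100]),
                       demo_rng)
```

Passing the random number generator in as an argument, rather than relying on a global, makes it easier to reproduce a particular simulated dataset.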
(b) Create one simulated dataset using the following values of $E$ and $B$:
E = np.array([136, 537, 447, 176, 218, 283, 513, 466, 603, 219, 44, 606, 410,
536, 336, 212, 521, 211, 433, 464, 404, 575, 171, 257, 843, 272,
271, 196, 149, 304, 187, 218, 94, 345, 318, 234, 455, 653, 193,
288, 178, 635, 174, 135, 342, 523, 353, 544, 220, 426, 191, 221,
223, 230, 432, 563, 210, 174, 223, 176, 417, 227, 310, 321, 310,
220, 571, 658, 279, 518, 235, 328, 175, 464, 612, 242, 185, 352,
212, 335, 276, 234, 249, 421, 358, 300, 167, 209, 492, 584, 765,
277, 162, 156, 217, 599, 399, 144, 292, 125])
B = np.array([ 330, 581, 1381, 2013, 1144, 1571, 1151, 1293, 983, 1101, 279,
714, 1880, 676, 464, 1514, 117, 584, 1015, 420, 202, 1605,
1540, 989, 962, 1407, 1333, 675, 300, 379, 711, 925, 1219,
490, 702, 1086, 950, 1126, 263, 713, 1343, 309, 630, 1074,
1305, 1468, 970, 1422, 754, 508, 872, 1137, 1648, 1217, 1731,
1077, 1353, 742, 331, 1263, 962, 1116, 248, 971, 1929, 261,
1367, 779, 1814, 295, 594, 421, 671, 1408, 1076, 1613, 2142,
1127, 596, 813, 497, 1219, 2021, 1546, 558, 884, 307, 778,
378, 1473, 386, 1162, 365, 387, 2805, 1058, 1367, 1822, 1194,
1307])
(c) Fit a logistic model to the data, inferring the values of $a$, $b$, and $c$. Make a plot showing the predicted chance of getting the flu, using these estimated parameters, as a function of exposure, both at $B=200$ BAU/mL and $B=2000$ BAU/mL. Compare these curves to the true curves obtained with the parameters used to simulate the data. (You can fit the model with scikit-learn's sklearn.linear_model.LogisticRegression, as we did in class, or by implementing the likelihood.)
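For the "implementing the likelihood" route, a minimal sketch is given below. It maximizes the Bernoulli log-likelihood $\sum_i \left[ y_i \eta_i - \log(1 + e^{\eta_i}) \right]$, with $\eta_i = a E_i + b B_i + c$, using Newton's method with step halving. This is one reasonable optimizer among several, not necessarily the approach used in class. The function name `fit_logistic_newton` and the synthetic demonstration data (`E_demo`, `B_demo`, `y_demo`, drawn from uniform ranges chosen only for illustration) are assumptions of this sketch. If you use scikit-learn instead, keep in mind that `LogisticRegression` applies L2 regularization by default, so its estimates are shrunk relative to the maximum-likelihood values unless the penalty is weakened or turned off.

```python
import numpy as np

def fit_logistic_newton(E, B, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood estimates (a, b, c) for
    P(y = 1) = 1 / (1 + exp(-(a*E + b*B + c))), found by Newton's method."""
    X = np.column_stack([E, B, np.ones(len(E))]).astype(float)
    y = np.asarray(y, dtype=float)

    def log_lik(beta):
        eta = X @ beta
        # log(1 + exp(eta)) computed stably with logaddexp
        return np.sum(y * eta - np.logaddexp(0.0, eta))

    beta = np.zeros(3)
    for _ in range(max_iter):
        eta = X @ beta
        p = np.exp(-np.logaddexp(0.0, -eta))        # stable logistic function
        grad = X.T @ (y - p)                        # gradient of the log-likelihood
        info = X.T @ (X * (p * (1 - p))[:, None])   # negative Hessian
        step = np.linalg.solve(info, grad)
        # Halve the step until the log-likelihood does not decrease.
        t = 1.0
        while log_lik(beta + t * step) < log_lik(beta) and t > 1e-8:
            t /= 2
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

# Synthetic check (hypothetical data, not the assignment's E and B):
# simulate with the true parameters from part (a) and see if they are recovered.
demo_rng = np.random.default_rng(0)
E_demo = demo_rng.uniform(0, 800, size=20000)    # person-minutes
B_demo = demo_rng.uniform(0, 2000, size=20000)   # BAU/mL
p_demo = 1 / (1 + np.exp(-(E_demo / 100 - B_demo / 300 - 3)))
y_demo = (demo_rng.random(20000) < p_demo).astype(int)
a_hat, b_hat, c_hat = fit_logistic_newton(E_demo, B_demo, y_demo)
```

With only 100 observations, as in part (b), the estimates can be expected to be noticeably noisier than in this large synthetic check, which is worth keeping in mind when comparing the fitted and true curves.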