Homework 2: Random variables

Instructions: Please answer the following questions and submit your work by editing this Jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should always explain your work: usually this means including a cell with at least a few sentences explaining what you are doing.

1. Modeling some counts

Consider the following model: $$\begin{aligned} N &\sim \text{Poisson}(\lambda) \\ K &\sim \text{Binomial}(N, p) , \end{aligned}$$ with $\lambda=20$ and $p=0.1$. In words, $N$ has a Poisson distribution with mean $\lambda=20$, and given the value of $N$, $K$ has a Binomial distribution with parameters $N$ and $p=0.1$ (i.e., $N$ independent trials, each succeeding with probability $p$).

(a) What values can $N$ take? What values can $K$ take? What is the expected value of $K$?
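One route to the expected value is to condition on $N$ and apply the law of total expectation. This is a sketch of that route, and other derivations work too:

$$\mathbb{E}[K] = \mathbb{E}\big[\mathbb{E}[K \mid N]\big] = \mathbb{E}[pN] = p\,\mathbb{E}[N] = p\lambda = 0.1 \times 20 = 2 .$$

A standard "Poisson thinning" result goes further and says that $K$ is itself $\text{Poisson}(p\lambda)$. That makes it a handy check on any simulation of this model.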

(b) Make up a story about a situation in which a random quantity might be modeled using the distribution of $K$. Make explicit in your story what $N$ is, and how the (random) value of $K$ is obtained from it.

(c) Simulate at least 1,000 draws from the distribution of $K$, and describe the result using a table or a histogram.
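Below is a minimal sketch of the simulation, assuming NumPy's `Generator` API. The seed and the number of draws (10,000) are arbitrary choices, not part of the assignment. The essential point is the two-stage structure: draw $N$ first, then draw $K$ given that $N$.

```python
import numpy as np

# Parameters from the model above.
lam = 20.0      # Poisson mean for N
p = 0.1         # per-trial success probability for K given N
n_draws = 10_000

rng = np.random.default_rng(12345)  # arbitrary seed, for reproducibility

# Two-stage draw: first N for each replicate, then K given that N.
N = rng.poisson(lam, size=n_draws)
K = rng.binomial(N, p)  # n may be 0, in which case K is 0

# Tabulate how often each value of K appeared.
values, counts = np.unique(K, return_counts=True)
for v, c in zip(values, counts):
    print(f"K = {v:2d}: {c:5d} draws ({c / n_draws:.3f})")

print("sample mean of K:", K.mean())
```

Passing the whole array `N` to `rng.binomial` draws one $K$ for each $N$ in a single vectorized call, so no Python loop is needed.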

2. Tulips

You are working with a tulip farmer to improve the color of a new variety of blue tulip. After many measurements, in which you've summarized the color of each tulip flower by a single wavelength, you've determined that

  • color values range from about 450 to 500 nm (possibly with occasional tulips outside this range)
  • the average tulip color is fairly blue: about 475 nm
  • the standard deviation of color across flowers from a single plant tends to be around 5 nm
  • however, some tulip plants produce more variable colors than others: most plants have standard deviations below 10 nm, but some (roughly 10%) have larger standard deviations.

(a) Develop a model for $C$, the color value (measured as a wavelength, in nm) of a randomly chosen tulip in the field. Use at least two different distributions, combined hierarchically as in Question 1. Make sure all parameters are specified, and explain your choice of distributions.
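Here is one illustration of the kind of model part (a) asks for. Using a lognormal for the plant-level standard deviation, with the parameters below, is an assumption made for this example and not the only reasonable answer. Let $S$ be the color standard deviation of the plant a flower comes from:

$$\begin{aligned} S &\sim \text{Lognormal}(\mu = \ln 5, \sigma = 0.54) \\ C &\sim \text{Normal}(\text{mean}=475, \text{sd}=S) . \end{aligned}$$

Here $\mu$ and $\sigma$ are the mean and standard deviation of $\ln S$. That makes the median of $S$ equal to 5 nm, and gives $P(S > 10) = P(Z > \ln 2 / 0.54) \approx P(Z > 1.28) \approx 0.10$, where $Z$ is standard Normal.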

(b) Simulate 100 tulip colors, and verify that the resulting values agree with the verbal description above. You do not have to explicitly simulate the separate plants (i.e., you may assume that each flower comes from a separate plant).
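Below is a sketch of the simulation under a hypothetical model, chosen only for illustration. Each flower's plant gets a lognormal color standard deviation $S$ (median 5 nm, log-scale sd 0.54, so roughly 10% of plants exceed 10 nm), and the color is Normal with mean 475 nm and sd $S$. The seed is arbitrary. The printed summaries are the quantities to compare against the verbal description.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed
n_flowers = 100

# Hypothetical model: one plant per flower; the plant's color SD is lognormal
# with median exp(log(5)) = 5 nm and log-scale sd 0.54 (about 10% above 10 nm).
S = rng.lognormal(mean=np.log(5), sigma=0.54, size=n_flowers)

# Each flower's color is Normal around 475 nm with its own plant's SD.
C = rng.normal(loc=475, scale=S)

print(f"mean color: {C.mean():.1f} nm")
print(f"min / max color: {C.min():.1f} / {C.max():.1f} nm")
print("fraction outside 450-500 nm:", np.mean((C < 450) | (C > 500)))
print("fraction of plant SDs above 10 nm:", np.mean(S > 10))
print(f"median plant SD: {np.median(S):.2f} nm")
```

With only 100 flowers, the fraction of SDs above 10 nm will vary noticeably from run to run around 0.10.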

3. Your stochastic day

Give examples from your day of quantities that might be reasonably modeled as random draws from the following distributions:

(a) Binomial, (b) Normal, (c) Poisson, (d) Exponential

In each case, give example parameter values (i.e., for (a), say what $n$ and $p$ are in your example).

Example: Every day I tie my shoes twice. Each time I tie a shoe, there is a probability of about 5% that I'll need to untie it to adjust. The number of times in a day that I fail to tie a shoe correctly on the first try is Binomial($n=4$, $p=0.05$), with $n=4$ because that is two shoes, tied twice each.
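Here is a quick numerical check of the shoe-tying example. It assumes SciPy is available and goes a little beyond what the question asks. It shows how to read off probabilities and the mean once $n$ and $p$ are fixed.

```python
from scipy import stats

# Shoe-tying example: n = 4 ties per day, each needing a redo with p = 0.05.
n, p = 4, 0.05
redos = stats.binom(n, p)

print("P(no redos in a day):", redos.pmf(0))       # equals 0.95**4
print("P(at least one redo):", 1 - redos.pmf(0))
print("expected redos per day:", redos.mean())     # equals n * p
```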

Example: My eight-year-old kid sometimes takes a loooong time to put on her shoes (indeed, seemingly unboundedly long), but is more often quick. The time she takes to put on her shoes in the morning is perhaps Exponential with a mean of 2 minutes.
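Below is a small sketch of the shoe-time example, assuming NumPy. The seed, the sample size, and the 5-minute threshold are arbitrary illustrations. It draws many mornings and compares the simulated chance of a slow morning with the exact Exponential tail probability $e^{-5/2}$.

```python
import numpy as np

rng = np.random.default_rng(7)  # arbitrary seed
mean_minutes = 2.0

# NumPy parameterizes the exponential by its scale, which equals the mean.
times = rng.exponential(scale=mean_minutes, size=10_000)

print("sample mean (min):", times.mean())
print("simulated P(T > 5 min):", np.mean(times > 5))
print("exact P(T > 5 min):", np.exp(-5 / mean_minutes))
```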

4. Integrals

Suppose that the amount of time it takes me to write a page of text is $R$ minutes, and I will make $X$ errors during that time. Suppose that $R$ is Gamma distributed with shape $\alpha=5$ and scale $\theta=4$. Also, I make more errors the longer I type: the number of errors is Poisson, with mean $R/5$. In symbols: $$\begin{aligned} \text{total time: } R &\sim \text{Gamma}(\text{scale}=4, \text{shape}=5) \\ \text{number of errors: } X &\sim \text{Poisson}(\text{mean}=R/5) . \end{aligned}$$

(a) Suppose that for a given page I took $r$ minutes (i.e., I had $R=r$). What is the probability that I made no errors? Write down the expression, and evaluate it for $r=20$.
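As a worked sketch: for a Poisson random variable with mean $m$, the probability of zero events is $e^{-m}$. Here $m = r/5$, so

$$P(X = 0 \mid R = r) = \frac{(r/5)^0 e^{-r/5}}{0!} = e^{-r/5}, \qquad P(X = 0 \mid R = 20) = e^{-4} \approx 0.0183 .$$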

(b) Now, what is the probability I made no errors on a randomly chosen page? To answer this, write down the integral that averages your expression from (a) over the distribution of $R$.
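Written out with the Gamma density (shape $\alpha = 5$, scale $\theta = 4$), the integral takes the form below. Because the integrand is again of Gamma form, it can also be evaluated by hand. This sketch shows that route, which gives a value to check the numerical and simulated answers against:

$$\begin{aligned} P(X = 0) &= \int_0^\infty e^{-r/5} \, \frac{r^{\alpha-1} e^{-r/\theta}}{\Gamma(\alpha)\,\theta^{\alpha}} \, dr \\ &= \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}} \int_0^\infty r^{\alpha-1} e^{-r(1/5 + 1/\theta)} \, dr \\ &= \frac{\Gamma(\alpha)\,(1/5 + 1/\theta)^{-\alpha}}{\Gamma(\alpha)\,\theta^{\alpha}} = \left(\frac{1}{1 + \theta/5}\right)^{\alpha} = \left(\frac{5}{9}\right)^{5} \approx 0.0529 . \end{aligned}$$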

(c) Now use software (for instance, np.trapezoid(), which is named np.trapz() in NumPy versions before 2.0, or some symbolic algebra software that can "do" integrals) to find a value for the integral. Note: scipy.stats.gamma.pdf gives the probability density function of the Gamma distribution; pass the shape as a=5 and the scale as scale=4.
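Below is a minimal sketch of the numerical integration, assuming NumPy and SciPy. The upper limit of 200 and the grid size are choices made here. The Gamma(shape 5, scale 4) density is negligible beyond that limit, so truncating there loses essentially nothing. The result is compared with the closed form $(5/9)^5$.

```python
import numpy as np
from scipy import stats

alpha, theta = 5, 4  # Gamma shape and scale for R

# Grid over r; the Gamma(5, scale=4) density is negligible beyond r = 200.
r = np.linspace(0, 200, 20_001)
integrand = np.exp(-r / 5) * stats.gamma.pdf(r, a=alpha, scale=theta)

# np.trapezoid exists in NumPy >= 2.0; older versions call it np.trapz.
trapezoid = getattr(np, "trapezoid", None) or np.trapz
p_no_errors = trapezoid(integrand, r)

print("numerical P(X = 0):", p_no_errors)
print("closed form (5/9)**5:", (5 / 9) ** 5)
```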

(d) Check your answer to (b) by simulating at least 10000 draws from the distribution.
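Below is a sketch of the Monte Carlo check, assuming NumPy's `Generator` API. The seed and the 100,000 draws are arbitrary. It mirrors the model's two stages: draw a typing time for each page, then draw an error count given that time. The binomial standard error gives a rough sense of how closely the estimate should match $(5/9)^5$.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed
n_pages = 100_000

# Draw a typing time for each page, then an error count given that time.
R = rng.gamma(shape=5, scale=4, size=n_pages)
X = rng.poisson(R / 5)

p_hat = np.mean(X == 0)
se = np.sqrt(p_hat * (1 - p_hat) / n_pages)  # Monte Carlo standard error

print(f"simulated P(X = 0): {p_hat:.4f} (+/- {2 * se:.4f})")
print(f"closed form (5/9)**5: {(5 / 9) ** 5:.4f}")
```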