Homework 1: simulation and probability

Instructions: Please answer the following questions and submit your work by editing this Jupyter notebook and submitting it on Canvas. Questions may involve math, programming, or neither, but you should make sure to explain your work; that is, you should usually have a cell with at least a few sentences explaining what you are doing.

Note: The writing is at least as important as anything else in the assignment. Please be clear and explain yourself in your own words, and do this part on your own, even if you've collaborated with others on the code, etc.

Pay especially close attention to how long this assignment takes you, and how much work it is! You need enough fluency in the Python language that it's a tool for you, not an obstacle.

In particular: this homework has a Problem 0. To be blunt, Problem 0 is not supposed to be difficult; it is there to let you decide, for yourself, whether you have the programming skills to continue in the class. If you have any trouble with it, then please come and speak to me as soon as possible; Math 345 may not be the best way for you to learn statistics this term.

0. Goat, please

I have two chickens and one goat, and am going to give one to you, but you have to pick randomly. We have three barn stalls that you can't see in, and I put one animal in each stall, in a randomly chosen order. Let us suppose that you would like to get a chicken (the goat won't fit in your apartment). You then pick a stall. Then, I open one of the other stalls that has a chicken in it (I can always do this), and remove that chicken. Now, you have the choice of either taking what's in the stall you originally picked, or taking what's in the other (as yet unopened) stall.

(a) First, decide whether you'd like to switch stalls in the last step or not. Then, write Python code to simulate this procedure. The code should explicitly represent what happens (e.g., which animal is in which stall), and produce either "chicken" or "goat", corresponding to which animal you get in the end.

(b) Use your code to simulate at least 10,000 times. Report how often you get a chicken.

Here's one way to choose random numbers:

In [5]:
import numpy as np
rng = np.random  #or, you may need to say: rng = np.random.default_rng()
u = rng.uniform(size=1)
k = rng.choice([0, 1, 2])
print(f"Here is a random number between 0 and 1: {u} and a random integer in (0, 1, 2): {k}")
Here is a random number between 0 and 1: [0.56675405] and a random integer in (0, 1, 2): 2
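
A few other generator methods may also be handy. The following is only a minimal sketch: the seed value 123 and the variable names (`animals`, `stalls`, `pick`) are illustrative assumptions, not something the assignment requires.

```python
import numpy as np

# A seeded Generator gives the same stream of random numbers on every run;
# the seed 123 is an arbitrary choice for illustration.
rng = np.random.default_rng(123)

animals = ["chicken", "chicken", "goat"]
# permutation() returns a shuffled copy and leaves the input unchanged.
stalls = rng.permutation(animals)
# integers(0, 3) draws uniformly from {0, 1, 2}; the upper bound is excluded.
pick = rng.integers(0, 3)
print(f"stalls: {list(stalls)}, picked stall {pick}")
```

Seeding is optional, but it makes a simulation reproducible while you are debugging it.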

1. The geometric distribution

Wikipedia tells us that the Geometric distribution is "the number $X$ of Bernoulli trials needed to get one success", and so if $X$ has the Geometric distribution with parameter $p$, then $$ \mathbb{P}\{ X = k \} = (1 - p)^{k-1} p, $$ for $k \in \{1, 2, 3, \ldots\}$. (There's another nearly-the-same definition, but this is the version numpy.random provides.) A "Bernoulli trial" with probability $p$ is just something that is 1 with probability $p$ and 0 with probability $1-p$.
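
As a quick sanity check on the formula above, one can evaluate it numerically. This is a sketch under stated assumptions: the value $p = 1/3$ and the cutoff at $k = 200$ are choices made here for illustration (the probability beyond the cutoff is negligible).

```python
import numpy as np

p = 1 / 3                     # success probability, chosen for illustration
k = np.arange(1, 201)         # k = 1, 2, ..., 200; the tail beyond 200 is negligible
pmf = (1 - p) ** (k - 1) * p  # P{X = k} from the formula above

print(pmf[:3])                # P{X=1}, P{X=2}, P{X=3}
print(pmf.sum())              # should be very close to 1
print((k * pmf).sum())        # should be very close to the mean, 1/p
```

The probabilities sum to (very nearly) one, and the weighted sum approximates the mean of this version of the Geometric distribution, $1/p$.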

Check this:

(a) write a function that takes $p$ as an argument, simulates Bernoulli trials until the first success, and returns the number of trials;

(b) use this function to simulate many (at least 1,000) draws from the Geometric(1/3); and

(c) compare the result to the same number of draws from the numpy.random implementation by making a table of the number of draws that take the value $k$ for $0 \le k \le 20$. (If they are not similar, go back and fix your function. To get an idea of what is "similar", re-run the code and see how much the counts change.)

You may want to use these methods:

In [1]:
import numpy as np
rng = np.random.default_rng()
# rng.uniform() < p   # this is True with probability p
# x = rng.geometric( ... )  # numpy.random's implementation
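
For the table in part (c), one possible approach (an assumption on my part, not the only way) is `np.bincount`, which counts how many times each non-negative integer appears. The sketch below tabulates only numpy's own draws; the names `n`, `x`, and `counts` are illustrative.

```python
import numpy as np

rng = np.random.default_rng()
p, n = 1 / 3, 1000

x = rng.geometric(p, size=n)          # n draws from numpy's Geometric(p)
# bincount(x)[k] is the number of draws equal to k; minlength pads with zeros
# so that indices 0..20 always exist, and [:21] drops any values above 20.
counts = np.bincount(x, minlength=21)[:21]

for k, c in enumerate(counts):
    print(f"{k:2d}  {c}")
```

The count at $k = 0$ is always zero here, since this version of the distribution takes values in $\{1, 2, 3, \ldots\}$.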

2. Ascending sums

For each $i \ge 1$, let $D_i$ be a random number drawn independently and uniformly from $\{1, 2, 3, 4, 5, 6\}$. Let $$ K = \min\{ k \ge 1 \;:\; D_{k+1} < D_k \} , $$ i.e., $K$ is defined by the fact that $D_{K+1}$ is the first number that is smaller than the one before it. Finally, let $$ X = \sum_{i=1}^K D_i . $$
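
To make the definition concrete, here is a small worked illustration on a fixed (not random) sequence. The helper name `ascending_sum` and the example sequence are hypothetical, chosen only to show how the indices line up; this is not a simulation of $X$.

```python
def ascending_sum(rolls):
    """Return (K, X) for a fixed sequence of rolls D_1, D_2, ...

    Assumes the sequence contains at least one roll that is smaller
    than the one before it; raises ValueError otherwise.
    """
    for k in range(1, len(rolls)):
        # rolls[k] is D_{k+1} and rolls[k - 1] is D_k (Python indexes from 0)
        if rolls[k] < rolls[k - 1]:
            return k, sum(rolls[:k])
    raise ValueError("no roll is smaller than the one before it")

K, X = ascending_sum([2, 5, 5, 3, 6])
print(K, X)  # prints "3 12"
```

With $D = (2, 5, 5, 3, 6)$, the first decrease is $D_4 = 3 < D_3 = 5$, so $K = 3$ and $X = 2 + 5 + 5 = 12$. Note that a repeated value (here $5, 5$) does not count as a decrease.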

(a) Describe in words what the above sum means, and explain how to simulate $X$ using fair dice.

(b) Write a function to simulate $X$ (in Python). The function should have one argument, size, that determines the number of independent samples of $X$ that are returned.

(c) Make a plot describing the distribution of $X$, and estimate its mean (by simulating at least $10^5$ values).

In [ ]: