Main points: the Poisson distribution shows up when you are counting rare events, and the Exponential distribution shows up as the waiting time between them.
Suppose that we're running a solar panel manufacturing plant. Each panel is made up of many modules, each of which may be defective. Usually only a few are: the average number of defects per panel is 1. A few defects are okay, but if a panel has more than 2 defects, the panel will not work. What proportion of panels will work?
What do we need to know? Well, let's suppose that a panel has $N$ modules, and each module is broken independently of the others. If the probability of a particular module being broken is $p$, then we must have $p = 1/N$. (Why?)
Let's call the number of defects $X$. So, $X$ is Binomial($N$, $\lambda/N$) with $\lambda=1$. Binomial probabilities say that $$\begin{aligned} \P\{ X = k \} &= \frac{N (N-1) \cdots (N-k+1)}{k!} \left(\frac{\lambda}{N}\right)^k \left(1-\frac{\lambda}{N}\right)^{N-k} . \end{aligned}$$ However, $N$ was arbitrary. For large $N$ and fixed $k$, $$\begin{aligned} N (N-1) \cdots (N-k+1) &\approx N^k, \\ \text{and} \qquad \left(1 - \frac{\lambda}{N}\right)^{N-k} &\approx e^{-\lambda}, \end{aligned}$$ so $$\begin{aligned} \P\{ X = k \} &\approx \frac{1}{k!} \lambda^k e^{-\lambda} , \end{aligned}$$ i.e., $X$ is approximately Poisson with mean $\lambda$.
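As a quick numerical check of this approximation, here is a minimal sketch that compares the Binomial($N$, $\lambda/N$) and Poisson($\lambda$) probabilities for a few values of $N$. The helper names `binomial_pmf` and `poisson_pmf` are made up for this illustration, and the choices of $N$ and of $k \le 5$ are arbitrary.

```python
import math

def binomial_pmf(k, n, p):
    """P{X = k} for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P{X = k} for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 1.0
for n in [10, 100, 10_000]:
    # Largest gap between the two sets of probabilities over k = 0, ..., 5.
    max_gap = max(
        abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in range(6)
    )
    print(f"N = {n:>6}: largest difference for k <= 5 is {max_gap:.2e}")
```

The printed differences should shrink as $N$ grows, which is what the approximation above predicts.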
Suppose that a few defects are okay, but if a panel has more than 2 defects, the panel will not work. What proportion of panels will work?
Solution: Let $X$ denote the number of defective modules on a randomly chosen panel. Then $X$ has a Poisson($\lambda = 1$) distribution, so $$\begin{aligned} \P\{X \le 2\} &= \P\{X = 0\} + \P\{X = 1\} + \P\{X = 2\} \\ &= e^{-\lambda} \left( 1 + \lambda + \lambda^2 / 2 \right) , \end{aligned}$$ which is here
import numpy as np

lam = 1  # mean number of defects per panel
# P{X <= 2} for X ~ Poisson(lam)
p_good = np.exp(-lam) * (1 + lam + lam**2 / 2)
print(f"Proportion that are good: {p_good:.3f}")
Proportion that are good: 0.920
Exercise: What proportion of good panels still have some defects? Answer with math and check it by simulation.
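rng = np.random.default_rng()  # random number generator, if not already defined
num_broken = rng.poisson(1, size=100000)  # defects on each simulated panel
good_panels = np.sum(num_broken <= 2)
no_defect_panels = np.sum(num_broken == 0)
print(f"Proportion of good panels with some defects: {(good_panels - no_defect_panels)/good_panels}")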
Proportion of good panels with some defects: 0.5985120017378082
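For the "math" half of the exercise, one way to write the answer (a sketch, using the same Poisson($\lambda = 1$) model as above) is as a conditional probability: $$\begin{aligned} \P\{X \ge 1 \;|\; X \le 2\} &= \frac{\P\{X = 1\} + \P\{X = 2\}}{\P\{X \le 2\}} \\ &= \frac{\lambda + \lambda^2/2}{1 + \lambda + \lambda^2/2} \\ &= \frac{1.5}{2.5} = 0.6 , \end{aligned}$$ where the factor of $e^{-\lambda}$ cancels between numerator and denominator. This agrees with the simulated value up to sampling noise.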
Here's a different example. Cosmic rays from outside the solar system pass through us pretty often. Let's say we have a cosmic ray detector that plays a sound whenever a strong enough cosmic ray passes through it, which turns out to be about once every 1.2 minutes. How many do we expect in 6 minutes? How long do we have to wait between sounds?
How many in 6 minutes? By the same argument as above, the number of cosmic rays we get in $t$ minutes is Poisson with mean $t / 1.2$. So, we expect about $6 / 1.2 = 5$ cosmic rays in 6 minutes, with a standard deviation of $\sqrt{5} \approx 2.24$ cosmic rays. The probability we see zero cosmic rays is $e^{-5} \approx 0.0067$.
How long between sounds? Well, the probability there are no rays in $t$ minutes is $\exp(-t/1.2)$, i.e., if $T$ is the time we have to wait for the next ray, then $$\P\{T > t\} = e^{-t/1.2}.$$
So: $T$ is Exponentially distributed with rate $1/1.2 \approx 0.83$ rays per minute, and so $\E[T] = 1.2$ minutes and $\sd[T] = 1.2$ minutes.
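Here is a small simulation sketch of these waiting times, assuming a mean wait of 1.2 minutes (i.e., a rate of $1/1.2$ per minute). Note that NumPy's `exponential` takes the *mean* as its `scale` argument, not the rate. The seed, the sample size, and the values of $t$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, for reproducibility

mean_wait = 1.2  # minutes between rays; `scale` is the mean, i.e. 1/rate
T = rng.exponential(scale=mean_wait, size=100_000)

print(f"mean of T: {T.mean():.3f} (theory: {mean_wait})")
print(f"sd of T:   {T.std():.3f} (theory: {mean_wait})")
for t in [1, 2, 5]:
    # Compare the simulated tail probability with exp(-t / 1.2).
    print(f"P(T > {t}): simulated {np.mean(T > t):.4f}, "
          f"theory {np.exp(-t / mean_wait):.4f}")
```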
Fact: The Exponential distribution is memoryless, i.e., if $T$ is Exponential, then $$ \P\{ T > t + s | T > t \} = \P\{ T > s \} . $$
Proof:
$$\begin{aligned} \P\{ T > t + s | T > t \} &= \frac{ \P\{ T > t + s \} }{ \P\{ T > t \} } \\ &= \frac{ e^{-\lambda(t+s)} }{ e^{-\lambda t} } \\ &= e^{-\lambda s} \\ &= \P\{ T > s \} . \end{aligned}$$
Exercise: This implies that cosmic rays don't care how long I've already been waiting for one to arrive: the probability that $T > 2$ minutes given that $T > 1$ minute is equal to the probability that $T > 1$ minute. Demonstrate this by simulation.
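One possible way to set up such a simulation is sketched below, assuming the mean waiting time of 1.2 minutes from the cosmic ray example; the seed and sample size are arbitrary. It estimates $\P\{T > 2 \;|\; T > 1\}$ by keeping only the simulated waits longer than 1 minute, and compares this to the estimate of $\P\{T > 1\}$.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed, for reproducibility
T = rng.exponential(scale=1.2, size=1_000_000)  # waiting times, in minutes

p_uncond = np.mean(T > 1)     # estimate of P{T > 1}
waited = T[T > 1]             # keep only the waits that exceeded 1 minute
p_cond = np.mean(waited > 2)  # estimate of P{T > 2 | T > 1}

print(f"P(T > 1)         is about {p_uncond:.4f}")
print(f"P(T > 2 | T > 1) is about {p_cond:.4f}")
print(f"theory, exp(-1/1.2):      {np.exp(-1 / 1.2):.4f}")
```

The two estimates should agree up to sampling noise.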
If something happens at rate $\lambda$ per unit time, and whether it happens in each bit of time is independent of other bits of time, then:
- The number of times it happens in $t$ units of time, $N(t)$, is Poisson with mean $\lambda t$; so $\E[N(t)] = \var[N(t)] = \lambda t$ and $$ \P\{ N(t) = n \} = e^{-\lambda t} \frac{(\lambda t)^n}{n!} .$$
- The time between subsequent events, $T$, is Exponential with rate $\lambda$, so $\E[T] = \sd[T] = 1/\lambda$ and $$ \P\{ T > t \} = \exp(-\lambda t) . $$
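The two facts can be tied together in a simulation sketch: generate events by adding up independent Exponential gaps, then count how many land in a window of length $t$. The counts should then look Poisson with mean $\lambda t$. The values of `lam`, `t`, `n_windows`, and `max_events` below are arbitrary choices for illustration, with `max_events` assumed to be much larger than the number of events that will fit in a window.

```python
import numpy as np

rng = np.random.default_rng(seed=3)  # arbitrary seed, for reproducibility

lam = 1 / 1.2       # events per unit time (the cosmic ray rate, per minute)
t = 6.0             # length of each observation window
n_windows = 20_000  # number of independent windows to simulate
max_events = 40     # gaps drawn per window (assumed to be more than enough)

# Each row holds the gaps between successive events in one window.
gaps = rng.exponential(scale=1 / lam, size=(n_windows, max_events))
arrival_times = np.cumsum(gaps, axis=1)
# Make sure no window ran out of gaps before reaching time t.
assert (arrival_times[:, -1] > t).all(), "increase max_events"

counts = np.sum(arrival_times <= t, axis=1)  # N(t) for each window

print(f"mean of N(t):     {counts.mean():.3f} (theory: {lam * t:.3f})")
print(f"variance of N(t): {counts.var():.3f} (theory: {lam * t:.3f})")
print(f"P(N(t) = 0):      {np.mean(counts == 0):.4f} "
      f"(theory: {np.exp(-lam * t):.4f})")
```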
Let's go back to the solar panels: suppose that the number of defects per panel turns out to not be Poisson, and in fact the distribution of number of defects per panel looks like this:
defects | expected | observed
---|---|---
0 | 3678.794412 | 4941
1 | 3678.794412 | 2484
2 | 1839.397206 | 1259
3 | 613.132402 | 675
4 | 153.283100 | 319
5 | 30.656620 | 169
6 | 5.109437 | 78
7 | 0.729920 | 35
8 | 0.091240 | 14
9 | 0.010138 | 15
10 | 0.001014 | 7
What's going on? With some more investigation, we find that some panels are more error-prone than others: a better model for the number of defects per panel is that the "quality" of a panel, $R$, is drawn from an Exponential distribution, and given this quality, the number of defects is Poisson with mean $R$: $$\begin{aligned} \text{error rate: } R &\sim \text{Exponential}(1) \\ \text{number of defects: } X &\sim \text{Poisson}(R) . \end{aligned}$$
Question: What is $\E[X]$?
Well, given $R$, the mean is, well, $R$, i.e., $\E[X|R] = R$.
So, it would make sense if $\E[X] = \E[R] = 1$.
This is true; here is the "proof" from first principles (written with sums, as if $R$ took discrete values; since $R$ is continuous, the sums over $r$ should really be integrals against its density): $$\begin{aligned} \E[X] &= \sum_x x \P\{X = x\} \\ &= \sum_x x \sum_r \P\{X = x, R = r\} \\ &= \sum_x x \sum_r \P\{R = r\} \P\{X = x \;|\; R = r\} \\ &= \sum_r \P\{R = r\} \sum_x x \P\{X = x \;|\; R = r\} \\ &= \sum_r \P\{R = r\} \E[X \;|\; R = r] \\ &= \sum_r \P\{R = r\} r \\ &= \E[R] . \end{aligned}$$
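The identity $\E[X] = \E[R]$ can also be checked numerically. The sketch below draws a rate $R$ for each panel and then a Poisson number of defects with that mean, following the model above; the seed and sample size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=4)  # arbitrary seed, for reproducibility
n = 200_000

R = rng.exponential(scale=1, size=n)  # per-panel error rate
X = rng.poisson(lam=R)                # defects, given each panel's own rate

print(f"mean of R: {R.mean():.3f}")
print(f"mean of X: {X.mean():.3f}")
```

Both sample means should be close to 1.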
Estimate the proportion of broken panels from this model by simulation.
n = 100000
R = rng.exponential(1, size=n)
X = rng.poisson(R, size=n)
print(f"Proportion of broken panels: {np.mean(X > 2)}")
Proportion of broken panels: 0.12579