Poisson counting¶

Fall 2022: Peter Ralph

https://uodsci.github.io/dsci345

Motivation, and the Poisson¶

Main points: the Poisson distribution shows up when you are counting rare events. So do Exponentials, as waiting times.

Suppose that we're running a solar panel manufacturing plant. Each panel is made up of many modules, each of which may be defective. Usually only a few are: the average number of defects per panel is 1. A few defects are okay, but if a panel has more then 2 defects, the panel will not work. What proportion of panels will work?

What do we need to know? Well, let's suppose that a panel has $N$ modules, and each module is broken independently of the others. If the probability of a particular module being broken is $p$, then we must have $p = 1/N$. (Why?)

Poisson approximation¶

Let's call the number of defects $X$. So, $X$ is Binomial($N$, $\lambda/N$) with $\lambda=1$. Binomial probabilities say that $$\begin{aligned} \P\{ X = k \} &= \frac{N (N-1) \cdots (N-k+1)}{k!} \left(\frac{\lambda}{N}\right)^k \left(1-\frac{\lambda}{N}\right)^{N-k} . \end{aligned}$$ However, $N$ was arbitrary. For large $N$, $$\begin{aligned} N (N-1) \cdots (N-k+1) &\approx N^k, \\ \text{and} \qquad \left(1 - \frac{\lambda}{N}\right) &\approx e^{-\lambda}, \end{aligned}$$ so $$\begin{aligned} \P\{ X = k \} &\approx \frac{1}{k!} \lambda^k e^{-\lambda} , \end{aligned}$$ i.e., $X$ is approximately Poisson with mean $\lambda$.

Example: defects¶

Suppose that a few defects are okay, but if a panel has more then 2 defects, the panel will not work. What proportion of panels will work?

Solution: Let $X$ denote the number of defective modules on a randomly chosen panel. Then $X$ has a Poisson($\lambda = 1$) distribution, so $$\begin{aligned} \P\{X \le 2\} &= \P\{X = 0\} + \P\{X = 1\} + \P\{X = 2\} \\ &= e^{-\lambda} \left( 1 + \lambda + \lambda^2 / 2 \right) , \end{aligned}$$ which is here

In [2]:

lam = 1
p_good = np.exp(-lam) * (1 + lam + lam**2 / 2)
print(f"Proportion that are good: {p_good:.3f}")

Proportion that are good: 0.920

Exercise: What proportion of good panels still have some defects? Answer with math and check it by simulation.

$$ \P\{ X > 0 | X \le 2 \} = \frac{ \P\{ X > 0 \text{ and } X \le 2 \} }{ \P\{X \le 2\} } = \frac{ e^{-1} (1 + 1/2) }{ e^{-1} (1 + 1 + 1/2) } = \frac{3}{5} $$

In [7]:

num_broken = rng.poisson(1, size=100000)
good_panels = np.sum(num_broken <= 2)
no_defect_panels = np.sum(num_broken == 0)
print(f"Proportion of good panels with some defects: {(good_panels - no_defect_panels)/good_panels}")

Proportion of good panels with some defects: 0.5985120017378082

Waiting times¶

Here's a different example. Cosmic rays from outside the solar system pass through us pretty often. Let's say we have a cosmic ray detector that plays a sound whenever a strong enough cosmic ray passes through it, which turns out to be about once every 1.2 minutes. How many do we expect in 6 minutes? How long do we have to wait between sounds?

How many in 6 minutes? By the same argument as above, the number of cosmic rays we get in $t$ minutes is Poisson with mean $t / 1.2$. So, we expect about $6 / 1.2 = 5$ cosmic rays in 6 minutes, with a standard deviation of $\sqrt{5} \approx 2.23$ cosmic rays. The probability we see zero cosmic rays is $e^{-5} = 0.0067$

How long between sounds? Well, the probability there's no rays in $t$ minutes is $\exp(-t/1.2)$, i.e., if $T$ is the time we have to wait for the next ray, then $$\P\{T > t\} = e^{-t/1.2}.$$

So: $T$ is Exponentially distributed with rate 1.2 rays/second, and so $\E[T] = 1/1.2$ and $\sd[T] = 1/1.2$.

Memorylessness¶

Fact: The Exponential distribution is memoryless, i.e., if $T$ is Exponential, then $$ \P\{ T > t + s | T > t \} = \P\{ T > s \} . $$

Proof:

$$\begin{aligned} \P\{ T > t + s | T > t \} &= \frac{ \P\{ T > t + s \} }{ \P\{ T > t \} } \\ &= \frac{ e^{-\lambda(t+s)} }{ e^{-\lambda t} } \\ &= \P\{ T > s \} . \end{aligned}$$

Exercise: This implies that cosmic rays don't care how long I've already been waiting for one to arrive: the probability that $T > 2$ minutes given that $T > 1$ minute is equal to the probability that $T > 1$ minute. Demonstrate this, by simulation.

Distributions: recap¶

If something happens at rate $\lambda$ per unit time, and whether it happens in each bit of time is independent of other bits of time, then:

The number of times it happens in $t$ units of time, $N(t)$, is Poisson with mean $\lambda t$; so $\E[N(t)] = \var[N(t)] = \lambda t$ and $$ \P\{ N(t) = n \} = e^{-\lambda t} \frac{(\lambda t)^n}{n!} ,$$

The time between subsequent events, $T$, is Exponential with rate $\lambda$, so $\E[T] = \sd[T] = 1/\lambda$ and $$ \P\{ T > t \} = \exp(-\lambda t) . $$

Complications: overdispersion¶

Let's go back to the solar panels: suppose that the number of defects per panel turns out to not be Poisson, and in fact the distribution of number of defects per panel looks like this:

In [4]:

defects

Out[4]:

	value	expected	observed
0	0	3678.794412	4941
1	1	3678.794412	2484
2	2	1839.397206	1259
3	3	613.132402	675
4	4	153.283100	319
5	5	30.656620	169
6	6	5.109437	78
7	7	0.729920	35
8	8	0.091240	14
9	9	0.010138	15
10	10	0.001014	7

Overdispersion¶

What's going on? With some more investigation, we find that some panels are more error-prone than others: a better model for the number of defects per panel is that the "quality" of a panel, $R$, is drawn from an Exponential distribution, and given this quality, the number of defects is Poisson with mean $R$: $$\begin{aligned} \text{error rate: } R &\sim \text{Exponential}(1) \\ \text{number of defects: } X &\sim \text{Poisson}(R) . \end{aligned}$$

Question: What is $\E[X]$?

Well, given $R$, the mean is, well, $R$, i.e., $\E[X|R] = R$.

So, it would make sense if $\E[X] = \E[R] = 1$.

This is true; here is the "proof" from first principles: $$\begin{aligned} \E[X] &= \sum_x x \P\{X = x\} \\ &= \sum_x x \sum_r \P\{X = x, R = r\} \\ &= \sum_x x \sum_r \P\{R = r\} \P\{X = x \;|\; R = r\} \\ &= \sum_r \P\{R = r\} \sum_x x \P\{X = x \;|\; R = r\} \\ &= \sum_r \P\{R = r\} \E[X = x \;|\; R = r] \\ &= \sum_r \P\{R = r\} r \\ &= \E[R] . \end{aligned}$$

Challenge¶

Estimate the proportion of broken panels from this model by simulation.

In [8]:

n = 100000
R = rng.exponential(1, size=n)
X = rng.poisson(R, size=n)
print(f"Proportion of broken panels: {np.mean(X > 2)}")

Proportion of broken panels: 0.12579