# Coupon Collector's Problem

**Problem statement:** There are 5 possible coupons.
Everyday, you are given a random coupon.
How many days, on average, would you need to buy to collect all 5 coupons?

## Solving the problem

Let $X_{total}$ be the number of days to collect all 5 coupons.

Then, $X_{total} = X_1 + X_2 + X_3 + X_4 + X_5$ where $X_1$ is the number of days to collect the first coupon, $X_2$ is the number of days to collect the second coupon, etc. Less formally:

**First coupon** is trivial. $X_1$ is always one day because it takes one day to get our first unique coupon (can be any).

**Second coupon** is less obvious.
How do we find the number of days for the second coupon?
Note, that this coupon can be *any* coupon as long as *it is not the card we got as our first coupon*.

So the chance that we get a coupon that *isn't* the first coupon is $4/5$ since there are 4 possible *uncollected* coupons out of a choice of 5.

**However, there is that chance we keep getting the coupon we've already collected, again and again....then we'd never get all 5 coupons?**

The days for the second coupon ($X_2$) follows a geometric distribution. This takes into account the probability of things happening. It is random, but random in the sense it follows these probabilities that our problem establishes. This would consider the small likelihood that we keep getting the coupon we've already collected.

Formally, $X_2 \sim Geom(4/5)$. The days for our second coupon are distributed among probabilities.

#### What does geometric distribution mean in our example?

The chance that it takes one day to collect our next coupon is $\frac{4}{5}$; two days is $\frac{1}{5}\cdot\frac{4}{5}$; three days is $\frac{1}{5} \cdot \frac{1}{5} \cdot \frac{4}{5}$ ; four days is $\frac{1}{5} \cdot \frac{1}{5} \cdot \frac{1}{5} \cdot \frac{4}{5}$ and so on...

This makes sense because you have to get the coupon you've already collected for the first few days before getting a different coupon.

Geometric distributions have the special property of $X \sim Geom(p) \Rightarrow E[X] = 1/p$. For our example $E[X_2] = 5/4$. This means that on average, it'll take $5/4$ days for us to collect the second coupon. You can look at the proof for this expected value for geometric distributions in the references. More formally:

**Third coupon** is exactly the same reasoning as the second coupon.
It is also a geometric distribution.
Except, there is a $\frac{3}{5}$ chance that it is a new coupon and a $\frac{2}{5}$ chance that it is a coupon we've already collected.
Hence, $X_3 \sim Geom(3/5)$

Exactly like the second coupon above.

### We can extrapolate:

To generalize for $N$ coupons:

$E[X_{total}] = N \cdot (1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + ... + \frac{1}{N}) = N \cdot \sum_{n=1}^N \frac{1}{n}$

## References

- Fifty Challenging Problems in Probability with Solutions by Frederick Mostreller
- CS70 Notes (https://www.eecs70.org/assets/pdf/notes/n19.pdf)