Coupon Collector's Problem

Problem statement: There are 5 possible coupons. Everyday, you are given a random coupon. How many days, on average, would you need to buy to collect all 5 coupons?

Solving the problem

Let XtotalX_{total} be the number of days to collect all 5 coupons.

Then, Xtotal=X1+X2+X3+X4+X5X_{total} = X_1 + X_2 + X_3 + X_4 + X_5 where X1X_1 is the number of days to collect the first coupon, X2X_2 is the number of days to collect the second coupon, etc. Less formally:

First coupon is trivial. X1X_1 is always one day because it takes one day to get our first unique coupon (can be any).

Second coupon is less obvious. How do we find the number of days for the second coupon? Note, that this coupon can be any coupon as long as it is not the card we got as our first coupon.

So the chance that we get a coupon that isn't the first coupon is 4/54/5 since there are 4 possible uncollected coupons out of a choice of 5.

However, there is that chance we keep getting the coupon we've already collected, again and again....then we'd never get all 5 coupons?

The days for the second coupon (X2X_2) follows a geometric distribution. This takes into account the probability of things happening. It is random, but random in the sense it follows these probabilities that our problem establishes. This would consider the small likelihood that we keep getting the coupon we've already collected.

Formally, X2Geom(4/5)X_2 \sim Geom(4/5). The days for our second coupon are distributed among probabilities.

What does geometric distribution mean in our example?

The chance that it takes one day to collect our next coupon is 45\frac{4}{5}; two days is 1545\frac{1}{5}\cdot\frac{4}{5}; three days is 151545\frac{1}{5} \cdot \frac{1}{5} \cdot \frac{4}{5} ; four days is 15151545\frac{1}{5} \cdot \frac{1}{5} \cdot \frac{1}{5} \cdot \frac{4}{5} and so on...

This makes sense because you have to get the coupon you've already collected for the first few days before getting a different coupon.

Geometric distributions have the special property of XGeom(p)E[X]=1/pX \sim Geom(p) \Rightarrow E[X] = 1/p. For our example E[X2]=5/4E[X_2] = 5/4. This means that on average, it'll take 5/45/4 days for us to collect the second coupon. You can look at the proof for this expected value for geometric distributions in the references. More formally:

X2Geom(4/5)E[X2]=1(4/5)=5/4 \begin{aligned} X_2 &\sim Geom(4/5) \\ E[X_2] &= \frac{1}{(4/5)} = 5/4 \\ \end{aligned}

Third coupon is exactly the same reasoning as the second coupon. It is also a geometric distribution. Except, there is a 35\frac{3}{5} chance that it is a new coupon and a 25\frac{2}{5} chance that it is a coupon we've already collected. Hence, X3Geom(3/5)X_3 \sim Geom(3/5)

Exactly like the second coupon above.

X3Geom(3/5)E[X3]=1/pE[X3]=1(3/5)=5/3 \begin{aligned} X_3 &\sim Geom(3/5) \\ E[X_3] &= 1/p \\ E[X_3] &= \frac{1}{(3/5)} = 5/3 \\ \end{aligned}

We can extrapolate:

Xtotal=X1+X2+X3+X4+X5Our established definition aboveXtotal=1+X2+X3+X4+X5based on our conclusion about the first couponE[Xtotal]=1+E[X2]+E[X3]+E[X4]+E[X5]taking expectation of both sidesE[Xtotal]=1+54+53+52+51Expectation of geometric distributions defined aboveE[Xtotal]=5(1+12+13+14+15)11.41 daysSimplifying \begin{aligned} X_{total} &= X_1 + X_2 + X_3 + X_4 + X_5 && \text{Our established definition above} \\ X_{total} &= 1 + X_2 + X_3 + X_4 + X_5 && \text{based on our conclusion about the first coupon} \\ E[X_{total}] &= 1 + E[X_2] + E[X_3] + E[X_4] + E[X_5] && \text{taking expectation of both sides} \\ E[X_{total}] &= 1 + \frac{5}{4} + \frac{5}{3} + \frac{5}{2} + \frac{5}{1} && \text{Expectation of geometric distributions defined above} \\ E[X_{total}] &= 5 \cdot (1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5}) \approx 11.41 \text{ days} && \text{Simplifying} \\ \end{aligned}

To generalize for NN coupons:

E[Xtotal]=N(1+12+13+14+...+1N)=Nn=1N1nE[X_{total}] = N \cdot (1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + ... + \frac{1}{N}) = N \cdot \sum_{n=1}^N \frac{1}{n}

References

  • Fifty Challenging Problems in Probability with Solutions by Frederick Mostreller
  • CS70 Notes (https://www.eecs70.org/assets/pdf/notes/n19.pdf)