
The Coupon Collector problem from Wikipedia:

Suppose that there is an urn of $n$ different coupons, from which coupons are being collected, equally likely, with replacement. How many coupons do you expect you need to draw with replacement before having drawn each coupon at least once?

The standard method of solution divides the time taken to collect all coupons (a random variable $T$) into the times taken to collect each new coupon (random variables $T_i$). My question is: why do we get to assume that every outcome yields all $n$ coupons?

I ask because the random variable $T$ is not well-defined on sequences of draws in which some coupon never appears, so the step $T=T_1 + T_2+ \dots +T_n$ is not well-defined for those outcomes in the sample space.

I'm thinking about defining $T(\omega) = \infty$ for outcomes $\omega$ in which some coupon never appears, and likewise $T_i(\omega) = \infty$ for outcomes $\omega$ containing fewer than $i$ distinct coupons. Is this definition legal? One problem I see: when calculating the expectation, which is $E(T_i) = \sum_{u=1}^\infty u\,P(T_i = u) + \infty\cdot P(T_i = \infty)$, the last term seems ill-defined when $P(T_i = \infty) = 0$.

The next part is calculating $P(T_i = k)$. How do we prove, rigorously, that this is indeed $\left(\frac{i-1}{n}\right)^{k-1}\frac{n-(i-1)}{n}$, in accordance with the geometric distribution? By "prove rigorously" I mean based on defining a probability space formally (e.g. one whose sample space consists of infinite sequences of coupon draws).
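For intuition (my own addition, not part of the original question), the decomposition $T = T_1 + \dots + T_n$ and the expectation $E(T) = n H_n$ can be checked by simulation. A minimal Monte Carlo sketch, with all function and variable names my own:

```python
import random

def collect(n, rng):
    """Draw coupons uniformly with replacement until all n are seen.
    Returns (T, inter) where inter[i-1] is T_i, the number of draws
    spent waiting for the i-th distinct coupon."""
    seen = set()
    inter = []
    wait = 0
    while len(seen) < n:
        wait += 1
        c = rng.randrange(n)
        if c not in seen:
            seen.add(c)
            inter.append(wait)  # T_i for the coupon just found
            wait = 0
    return sum(inter), inter

rng = random.Random(0)
n = 10
trials = [collect(n, rng) for _ in range(20000)]
assert all(T == sum(ts) for T, ts in trials)  # T = T_1 + ... + T_n by construction
mean_T = sum(T for T, _ in trials) / len(trials)
harmonic = n * sum(1 / k for k in range(1, n + 1))  # n * H_n, about 29.29 for n = 10
print(mean_T, harmonic)
```

With 20000 trials the sample mean of $T$ lands close to $n H_n$, and $T_1 = 1$ on every run, as the (corrected) pmf with $i=1$ predicts.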

Wakaka

1 Answer


It is conceivable that $T$ is infinite for some realizations of the collector problem. But this event has zero probability. You can see this from $$\{T>u\}\subset \bigcup_{c=1}^n\{\mbox{Coupon $c$ not seen in $u$ draws}\},$$ which implies $$P(T>u)\le \sum_{c=1}^n\left(1-\frac1n\right)^u = n\left(1-\frac1n\right)^u.\tag1$$ Since $\{T=\infty\}\subset\{T>u\}$ for every $u$, we get $P(T=\infty) \le n\left(1-\frac1n\right)^u$ for every $u$; letting $u\to\infty$ gives $P(T=\infty)=0$. Inequality (1) also shows that $E(T)<\infty$, using the tail-sum formula $$E(T)=\sum_{u=0}^\infty P(T>u).\tag2$$
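As a numerical sanity check (my own addition, using the standard inclusion–exclusion expression for $P(T>u)$, which the answer does not state), both formula (2) and the union bound (1) can be verified directly:

```python
from math import comb

def tail(n, u):
    """P(T > u): at least one of the n coupons is missing after u draws,
    by inclusion-exclusion over the set of missing coupons."""
    return sum((-1) ** (c + 1) * comb(n, c) * ((n - c) / n) ** u
               for c in range(1, n + 1))

n = 10
# E(T) = sum_{u >= 0} P(T > u); by (1) the tail decays geometrically,
# so a finite truncation of the series suffices.
expect = sum(tail(n, u) for u in range(2000))
harmonic = n * sum(1 / k for k in range(1, n + 1))  # n * H_n
print(expect, harmonic)
# Check the union bound (1): P(T > u) <= n (1 - 1/n)^u
assert all(tail(n, u) <= n * (1 - 1 / n) ** u + 1e-12 for u in range(200))
```

The truncated series agrees with $n H_n$ to many decimal places, which is exactly what (2) predicts for the coupon collector.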

It is legal to set $T(\omega)=\infty$ on realizations $\omega$ where not every coupon is drawn. If you allow $T$ to take the value infinity, there is no indeterminacy in calculating the expectation of $T$ if you use formula (2). You can also use the formula $E(T)=\sum_u u\,P(T=u)$ under the convention that $\infty\cdot P(T=\infty)=0$ when $P(T=\infty)=0$. Alternatively, there is no loss in restricting your analysis to the event where $T$ is finite, since you are then ignoring only a set of probability zero.

As for calculating $P(T_{i+1}=k)$: Let $X_1,X_2,\ldots$ be the infinite sequence of coupon draws. One way to rigorously calculate the distribution of $T_{i+1}$ is to condition on $N_i$, the number of draws needed to see $i$ distinct coupons, and on ${\mathbf Y}_i:=(Y_1,\ldots,Y_i)$, the distinct coupons drawn so far, in the order seen: $$ P(T_{i+1}=k)=\sum P(T_{i+1}=k\mid N_i=u, {\mathbf Y}_i={\mathbf y}_i)\,P(N_i=u, {\mathbf Y}_i={\mathbf y}_i),$$ where the sum runs over all choices of $u$ and ${\mathbf y}_i$. Conditional on $N_i=u$ and ${\mathbf Y}_i={\mathbf y}_i$, the event $\{T_{i+1}=k\}$ is the event that the $k-1$ draws $X_{u+1},\ldots,X_{u+k-1}$ all take values in $\{y_1,\ldots,y_i\}$ and the final draw $X_{u+k}$ takes one of the other $n-i$ possible values not yet seen. By symmetry, the conditional probability $P(T_{i+1}=k\mid N_i=u, {\mathbf Y}_i={\mathbf y}_i)$ must be $(\frac{i}{n})^{k-1}(1-\frac in)$, the same value for every choice of $u$ and ${\mathbf y}_i$. The unconditional probability $P(T_{i+1}=k)$ is therefore also this value, and we conclude that $T_{i+1}$ has the geometric distribution with mean $1/(1-\frac in)=n/(n-i)$.
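A quick simulation sketch supporting this conclusion (function and variable names are mine): sample the waiting time from $i$ distinct coupons to $i+1$, and compare the empirical distribution with the geometric pmf $(\frac in)^{k-1}(1-\frac in)$ and the mean $n/(n-i)$.

```python
import random

def wait_for_next(n, i, rng):
    """Simulate draws until i distinct coupons are seen, then return the
    number of further draws until the (i+1)-th distinct coupon appears."""
    seen = set()
    while len(seen) < i:
        seen.add(rng.randrange(n))
    k = 0
    while True:
        k += 1
        if rng.randrange(n) not in seen:
            return k  # this draw produced a new coupon

rng = random.Random(1)
n, i = 10, 6
samples = [wait_for_next(n, i, rng) for _ in range(50000)]
mean = sum(samples) / len(samples)
print(mean, n / (n - i))  # sample mean vs. geometric mean n/(n-i) = 2.5
# empirical P(T_{i+1} = k) vs. (i/n)^(k-1) * (1 - i/n)
for k in range(1, 5):
    emp = samples.count(k) / len(samples)
    theo = (i / n) ** (k - 1) * (1 - i / n)
    print(k, round(emp, 3), round(theo, 3))
```

Note that the sampler does not need to track which coupons were seen or how long they took, mirroring the symmetry argument above: only the count $i$ matters.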

grand_chat
  • This is perfect! I hadn't thought about bounding the probability that $T$ is infinite, so I thought it would be very difficult to check that $P(\text{$T$ finite}) = 1$. – Wakaka Aug 19 '15 at 14:15