
A computer program generates a number between $1$ and $n$ every second (each number has probability $1/n$). It continues until all $n$ possible numbers have been generated.

Let $X$ be the number of seconds it takes for the program to generate all $n$ different numbers. Is there a name for the distribution of $X$? I couldn't find it anywhere.

Also, I found that $ \displaystyle{\,\mathrm{P}\left(X = k\right) = \sum_{i = 0}^{n}\left(-1\right)^{i + 1}{n - 1 \choose i - 1} \left(1 - \frac{i}{n}\right)^{k - 1}} $. Could someone verify that?

Thank you!

Felix Marin
35T41
  • See the Coupon Collector problem: http://math.stackexchange.com/questions/1401561/coupon-collector-problem-doubts – Matthew Conroy Dec 17 '16 at 21:22
  • See this question for what can be said about this. The random variable $X$ is a sum $X=T_1+\cdots+T_n$ of independent geometric random variables with $E(T_k)=\frac{n}k$. – Did Dec 17 '16 at 21:30
  • The term in the sum for $i=0$ is $0$ since $\binom{n}{-1}=0$ for all $n$, so the sum can be started at $i=1$. – robjohn Dec 17 '16 at 23:16
  • For earlier questions which state the distribution, see http://math.stackexchange.com/questions/669685/what-is-the-probability-of-rolling-n-dice-until-each-side-appears-at-least-onc and http://math.stackexchange.com/questions/963077/cdf-of-probability-distribution-with-replacement and http://math.stackexchange.com/questions/379525/probability-distribution-in-the-coupon-collectors-problem which mention Stirling numbers of the second kind – Henry Dec 18 '16 at 09:18

1 Answer


As Matthew Conroy mentions in a comment, this is related to the Coupon Collector's Problem. I am not sure if this distribution has a name, but here is a derivation of the probability given in the question.


Probability of Completion on the $\boldsymbol{m^\text{th}}$ Trial

Let $S_j$ be the set of arrangements where $j$ has not been chosen after $m$ trials. The sum of the probabilities of all intersections of $k$ of the $S_j$'s is $$ \overbrace{\ \ \ \binom{n}{k}\ \ \ }^{\substack{\text{number of ways}\\\text{to choose $k$}\\\text{particular numbers}}}\overbrace{\left(1-\frac kn\right)^m}^{\substack{\text{probability of}\\\text{$k$ particular}\\\text{numbers not}\\\text{being chosen}\\\text{after $m$ trials}}} $$ Inclusion-Exclusion says that the probability that at least one number is still missing after $m$ trials is $$ \sum_{k=1}^n(-1)^{k-1}\binom{n}{k}\left(1-\frac kn\right)^m $$ The last number arrives on the $m^\text{th}$ trial exactly when some number is still missing after $m-1$ trials but none is missing after $m$ trials. Thus, the probability of getting the last number on the $m^\text{th}$ trial is the difference of the two corresponding probabilities: $$ \begin{align} &\sum_{k=1}^n(-1)^{k-1}\binom{n}{k}\left[\left(1-\frac kn\right)^{m-1}-\left(1-\frac kn\right)^m\right]\\ &=\sum_{k=1}^n(-1)^{k-1}\binom{n}{k}\frac kn\left(1-\frac kn\right)^{m-1}\\ &=\bbox[5px,border:2px solid #C0A000]{\sum_{k=1}^n(-1)^{k-1}\binom{n-1}{k-1}\left(1-\frac kn\right)^{m-1}} \end{align} $$ The last step uses $\binom{n}{k}\frac kn=\binom{n-1}{k-1}$.
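
For anyone who wants to check the boxed formula numerically, here is a minimal Python sketch (not part of the derivation). It evaluates the formula in exact rational arithmetic and compares it with an independent computation that tracks how many distinct numbers have been seen so far. The function names `completion_pmf` and `completion_pmf_dp` are my own, chosen only for illustration.

```python
from fractions import Fraction
from math import comb


def completion_pmf(n: int, m: int) -> Fraction:
    """P(last number arrives on trial m), from the boxed inclusion-exclusion sum."""
    return sum(
        (-1) ** (k - 1) * comb(n - 1, k - 1) * Fraction(n - k, n) ** (m - 1)
        for k in range(1, n + 1)
    )


def completion_pmf_dp(n: int, m: int) -> Fraction:
    """Same probability, computed by stepping through the trials one at a time."""
    # dist[d] = probability of having seen exactly d distinct numbers so far
    dist = [Fraction(0)] * (n + 1)
    dist[0] = Fraction(1)
    for _ in range(m - 1):
        new = [Fraction(0)] * (n + 1)
        for d in range(n):  # mass already at d == n is dropped: it finished earlier
            new[d] += dist[d] * Fraction(d, n)          # repeat of a seen number
            new[d + 1] += dist[d] * Fraction(n - d, n)  # a new number
        dist = new
    # finish on trial m: exactly n-1 distinct numbers seen, then draw the missing one
    return dist[n - 1] * Fraction(1, n)


if __name__ == "__main__":
    n = 4
    for m in range(1, 10):
        print(m, completion_pmf(n, m), completion_pmf_dp(n, m))
```

For example, `completion_pmf(3, 3)` gives `Fraction(2, 9)`, which matches the direct count $3!/3^3$, and the values for $m<n$ come out as $0$, as they should.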


Expected Duration (Using the Formula Above)

We can compute the expected duration using the formula above $$ \begin{align} &\sum_{m=1}^\infty\sum_{k=1}^n(-1)^{k-1}\binom{n-1}{k-1}m\left(1-\frac kn\right)^{m-1}\\ &=\sum_{k=1}^n(-1)^{k-1}\binom{n-1}{k-1}\frac1{\left(1-\left(1-\frac kn\right)\right)^2}\\ &=n\sum_{k=1}^n(-1)^{k-1}\binom{n}{k}\frac1k\\ &=n\sum_{k=1}^n(-1)^{k-1}\sum_{j=k}^n\binom{j-1}{k-1}\frac1k\\ &=n\sum_{j=1}^n\sum_{k=1}^j(-1)^{k-1}\binom{j}{k}\frac1j\\ &=n\sum_{j=1}^n\frac1j\\[6pt] &=\bbox[5px,border:2px solid #C0A000]{nH_n} \end{align} $$
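
The key identity in the middle of this computation, $\sum_{k=1}^n(-1)^{k-1}\binom{n}{k}\frac1k=H_n$, is easy to sanity-check by machine. The sketch below is only a numerical check, not a proof: it verifies the identity exactly with rationals and also sums $m\,P(X=m)$ over a finite range in floating point. The helper names and the cutoff `max_m` are assumptions of mine.

```python
from fractions import Fraction
from math import comb


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, j) for j in range(1, n + 1)), Fraction(0))


def alternating_sum(n: int) -> Fraction:
    """sum_{k=1}^n (-1)^(k-1) C(n,k) / k, which the derivation says equals H_n."""
    return sum(
        ((-1) ** (k - 1) * Fraction(comb(n, k), k) for k in range(1, n + 1)),
        Fraction(0),
    )


def truncated_mean(n: int, max_m: int) -> float:
    """sum_{m=1}^{max_m} m * P(X = m), using the boxed formula in floating point."""
    total = 0.0
    for m in range(1, max_m + 1):
        p = sum(
            (-1) ** (k - 1) * comb(n - 1, k - 1) * (1 - k / n) ** (m - 1)
            for k in range(1, n + 1)
        )
        total += m * p
    return total


if __name__ == "__main__":
    n = 6
    print(alternating_sum(n), harmonic(n))  # both are 49/20
    print(truncated_mean(n, 500), float(n * harmonic(n)))
```

For $n=6$ both exact sums are $\frac{49}{20}$, so $nH_n=14.7$, and the truncated series agrees with that to within floating-point error because the tail beyond $m=500$ is negligible.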


Expected Duration (Summing Expected Durations)

If a stream of independent events occurs, each with probability of success $p$, the expected number of events until a success is $\frac1p$. The probability of picking a new number after we have picked $k$ distinct numbers is $\frac{n-k}{n}$. Therefore, the expected duration after we've picked $k$ distinct numbers until we pick the $(k+1)^\text{st}$ number is $\frac{n}{n-k}$. Thus, by linearity of expectation, the expected duration until we pick all numbers is $$ \begin{align} \sum_{k=0}^{n-1}\frac{n}{n-k} &=\sum_{k=1}^n\frac nk\\ &=\bbox[5px,border:2px solid #C0A000]{nH_n} \end{align} $$
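
A quick simulation is another way to see that $nH_n$ is plausible. The following is a rough Monte Carlo sketch using Python's standard `random` module. The function names, the seed, and the number of trials are arbitrary choices of mine, and the simulated mean only approximates the exact value.

```python
import random


def simulate_duration(n: int, rng: random.Random) -> int:
    """Draw uniformly from 1..n until every number has appeared; return the draw count."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randint(1, n))  # randint includes both endpoints
        draws += 1
    return draws


def estimate_mean(n: int, trials: int, seed: int = 0) -> float:
    """Average duration over many independent simulated runs."""
    rng = random.Random(seed)
    return sum(simulate_duration(n, rng) for _ in range(trials)) / trials


def expected_duration(n: int) -> float:
    """Sum of the expected waiting times n/(n-k) for k = 0, ..., n-1."""
    return sum(n / (n - k) for k in range(n))


if __name__ == "__main__":
    n = 6
    print("exact    :", expected_duration(n))  # n * H_n, which is 14.7 for n = 6
    print("simulated:", estimate_mean(n, 20000))
```

With a few tens of thousands of runs the simulated average for $n=6$ should land close to $14.7$; the gap shrinks like one over the square root of the number of runs.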

robjohn
  • So my formula is true then? Thanks! – 35T41 Dec 18 '16 at 05:45
  • Indeed, it is. An exercise is to show that the probability of getting the last number on some trial is $1$; that is, show that the distribution given above is a probability distribution. – robjohn Dec 18 '16 at 06:16
  • The expected time until completion is $nH_n$ where $H_n$ is the $n^\text{th}$ Harmonic Number. This can be verified with this formula, but also by a much simpler argument. – robjohn Dec 18 '16 at 06:59