
Given the set of numbers from 1 to n: { 1, 2, 3, ..., n }. We draw n numbers uniformly at random from this set, with replacement. What is the expected number of distinct values that we would draw?

My Approach:

Let $X(k)$ denote the expected number of distinct values in a sample of size $k$. Then,

$X(k) = \frac{n - X(k-1)}{n}*(1 + X(k-1)) + \frac{X(k-1)}{n}*X(k-1)$

$X(k) = 1 + \frac{n-1}{n}*X(k-1)$

Since $X(1) = 1$, solving the recursive relation, we get

$X(k) = 1 + (\frac{n-1}{n}) + (\frac{n-1}{n})^2 + (\frac{n-1}{n})^3 + ... + (\frac{n-1}{n})^{k-1}$

$X(k) = \frac{1-(\frac{n-1}{n})^k}{\frac{1}{n}} = n*(1-(1-\frac{1}{n})^k)$

Hence,

$X(n) = n*(1-(1-\frac{1}{n})^n)$

The answer is correct, but I am not sure whether my approach is. The idea behind the first equation is: after the $(k-1)$th draw, the probability of getting a new value on the $k$th draw is $\frac{n - X(k-1)}{n}$. Since $X(k-1)$ is not necessarily an integer, I doubt whether this probability is valid. So my question is: is my approach correct or not? Please provide a convincing explanation of why it is or is not correct.
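As a sanity check on the closed form (not on the reasoning itself), a small Monte Carlo sketch can compare simulated averages against $n(1-(1-\frac{1}{n})^n)$. The function names `simulate_distinct` and `closed_form`, the trial count, and the seed are illustrative assumptions, not part of the original derivation.

```python
import random


def simulate_distinct(n, trials=20000, seed=0):
    """Estimate the expected number of distinct values in n uniform draws from 1..n.

    Hypothetical helper: trials and seed are arbitrary illustrative defaults.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Draw n values with replacement and count how many different ones appear.
        total += len({rng.randint(1, n) for _ in range(n)})
    return total / trials


def closed_form(n, k=None):
    """Evaluate n * (1 - (1 - 1/n)**k); k defaults to n as in the question."""
    if k is None:
        k = n
    return n * (1 - (1 - 1 / n) ** k)


if __name__ == "__main__":
    for n in (5, 10, 50):
        # The two columns should agree up to sampling noise.
        print(n, simulate_distinct(n), closed_form(n))
```

Agreement here only supports the final formula; whether the recurrence was derived rigorously is a separate matter.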

  • Your approach is correct due to the linearity of the recurrence. Essentially, the $\frac{n - X(k-1)}{n}$ you wrote is $\frac{n - \mathbb{E}(X(k-1))}{n}$, which can be expanded to $\sum_{x} P(X(k-1)=x) \frac{n - x}{n}$ without any problem since the form is linear. – Vezen BU Jul 13 '22 at 05:39
  • To make the proof more rigorous, you may want to define $X(k)$ as the NUMBER of distinct values in a sample of size $k$ and explicitly use $\mathbb{E}(X(k))$ as the EXPECTED number. – Vezen BU Jul 13 '22 at 05:42
  • There is a more elegant route: linearity of expectation combined with symmetry. Write $Y=Y_1+\cdots+Y_n$ where $Y_i=1$ if number $i$ is drawn and $Y_i=0$ otherwise. Take expectation on both sides. – drhab Jul 13 '22 at 06:03
  • @drhab I know but I was wondering if what I tried is correct or not – dumbguywithmathsmajor Jul 13 '22 at 06:11
  • @VezenBU Thanks! I understand your idea. If you can write an answer using the same idea, I'll accept it – dumbguywithmathsmajor Jul 13 '22 at 06:27
  • Oh, sorry. I was misled by your alias of course. You are certainly not "dumb". – drhab Jul 13 '22 at 06:27
  • @drhab no problem, dumb is a relative word – dumbguywithmathsmajor Jul 13 '22 at 06:29
  • Someone should mention that this is an example of the "coupon collector's problem", for which there are plenty of resources online. – Greg Martin Jul 13 '22 at 19:43
  • @GregMartin coupon collector problem is a bit different, there we ask the expected number of coupons that need to be drawn in order to have at least one coupon of each type – dumbguywithmathsmajor Jul 13 '22 at 19:48
  • That is the main but not the only variant considered, and the analysis for that variant is the same as for this variant. – Greg Martin Jul 13 '22 at 22:25

5 Answers

8

The easier route is through using linearity of expectation.

Modifying the notation slightly, let $\{1, \dots, m\}$ denote the set composed of $m$ numbers and $n$ denote the number of draws we take from the set.

Let $Y_i$ be an indicator variable that is equal to $1$ if number $i \in \{1, \dots, m\}$ is not collected in $n$ draws, and $0$ otherwise.

Then $P(Y_i = 1) = {(1- \frac{1}{m})}^n$

Now the expectation of an indicator variable is just the probability of the event it indicates. Writing $Y = \sum_{i=1}^{m} Y_i$ for the number of values that are never drawn, linearity of expectation, which holds even when the variables are not independent, gives

$\Bbb E[Y] = \sum_{i=1}^{m} \Bbb E[Y_i] = m(1-\frac{1}{m})^n$

and $\Bbb E[X]$, the expected number of distinct values collected in $n$ draws, is $m - \Bbb E[Y] = m[1- (1-\frac{1}{m})^n]$
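For small cases the indicator argument can be checked exactly by brute force: enumerate all $m^n$ equally likely draw sequences and average the number of distinct values. This is only a sketch; the names `exact_expected_distinct` and `indicator_formula` are hypothetical, and the enumeration is feasible only for small $m$ and $n$.

```python
from fractions import Fraction
from itertools import product


def exact_expected_distinct(m, n):
    """Average the distinct-value count over all m**n equally likely sequences.

    Hypothetical helper for illustration; cost grows as m**n.
    """
    total = sum(len(set(seq)) for seq in product(range(1, m + 1), repeat=n))
    return Fraction(total, m ** n)


def indicator_formula(m, n):
    """Evaluate m * [1 - (1 - 1/m)**n] exactly with rational arithmetic."""
    return m * (1 - (1 - Fraction(1, m)) ** n)


if __name__ == "__main__":
    for m in range(1, 5):
        for n in range(1, 6):
            # Both columns are exact rationals and should be identical.
            print(m, n, exact_expected_distinct(m, n), indicator_formula(m, n))
```

Using `Fraction` avoids floating-point noise, so the comparison is an exact equality rather than an approximate one.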

Quoding
  • 103
  • please read my question again. – dumbguywithmathsmajor Jul 13 '22 at 06:13
  • Oh... I see that Vezen BU has already addressed your concern. I am just leaving my answer as a simpler approach. – true blue anil Jul 13 '22 at 06:31
  • Dear tba (I so much respect you, and not only for your age). It appeared that the question was posted by a smartguywithmathsmajor. See his comment on my comment. – drhab Jul 13 '22 at 06:34
  • @drhab "So my question is: is my approach correct or not? please provide some convincing explanation as to why or why not is it correct." – dumbguywithmathsmajor Jul 13 '22 at 06:38
  • @drhab: Thanks, I respect you equally, one among half a dozen or so. What to do? I am the guy without math major getting dumber by the day! – true blue anil Jul 13 '22 at 06:47
  • @dumbguywithmathsmajor Haven't looked at it yet. Only checked the answer, which was okay. Also I noticed that you are sloppy in distinguishing between random variables and their expectations. Maybe I will have a closer look later (no promises). Maybe true blue anil will edit his answer. – drhab Jul 13 '22 at 06:50
  • @drhab It's generally better to read the full question before commenting/answering. – dumbguywithmathsmajor Jul 13 '22 at 06:56
  • @dumbguywithmathsmajor I also noticed that the more natural route was not mentioned in your answer. IMV that's reason enough on its own to mention it. Someone not familiar with it yet would have been helped. – drhab Jul 13 '22 at 07:03
  • @drhab thanks for mentioning it!! – dumbguywithmathsmajor Jul 13 '22 at 07:06
5

In your answer $X(k)$ is defined to be the expectation of the number of distinct results if $k$ are drawn. That is wrong.

Let $X_k$ denote the number of distinct results if $k$ are drawn.

Then:$$\mathbb E[X_k|X_{k-1}=r]=\frac{r}{n}\cdot r+\left(1-\frac{r}{n}\right)\cdot(r+1)=\left(1-\frac1n\right)r+1$$ From this we conclude that:$$\mathbb E[X_k|X_{k-1}]=\left(1-\frac1n\right)X_{k-1}+1$$and consequently:$$\mathbb EX_k=\mathbb E[\mathbb E[X_k|X_{k-1}]]=\left(1-\frac1n\right)\mathbb EX_{k-1}+1\tag1$$

This can be further exploited as you do in your question, resulting in:$$\mathbb EX_k=n\left(1-\left(1-\frac1n\right)^k\right)$$ Substitution $k=n$ gives your final result.
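To see relation $(1)$ at work on the random variables themselves, one can propagate the exact distribution of $X_k$ step by step and then take expectations. The sketch below assumes $k \ge 1$; the names `distinct_count_distribution` and `expectation` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction


def distinct_count_distribution(n, k):
    """Return {r: P(X_k = r)} for k >= 1 draws with replacement from n values.

    Hypothetical helper: X_k is treated as a Markov chain on the count r.
    """
    dist = {1: Fraction(1)}  # after one draw there is exactly one distinct value
    for _ in range(k - 1):
        nxt = {}
        for r, p in dist.items():
            # With probability r/n the new draw repeats an already seen value.
            nxt[r] = nxt.get(r, 0) + p * Fraction(r, n)
            # With probability 1 - r/n the new draw is a new value.
            if r < n:
                nxt[r + 1] = nxt.get(r + 1, 0) + p * Fraction(n - r, n)
        dist = nxt
    return dist


def expectation(dist):
    """Expected value of a distribution given as {value: probability}."""
    return sum(r * p for r, p in dist.items())


if __name__ == "__main__":
    n = 6
    for k in range(1, 9):
        # X_k takes several values with positive probability once k >= 2.
        print(k, dict(distinct_count_distribution(n, k)))
```

For $k \ge 2$ the printed distributions have several possible values, which illustrates that $X_k$ is genuinely random given $X_{k-1}$; only the expectations satisfy the deterministic recurrence.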


Of course it is much more elegant to make use of linearity of expectation and symmetry (as in the answer of @true blue anil). I am aware (now) that you already know this, but IMV the method simply deserves to be mentioned in this context, also for the benefit of readers who are not yet familiar with it.

drhab
  • 153,781
  • thanks! now I have even more clarity. I had skipped conditional expectation, guess I'll have to have a look at it. Suppose I define X(k) as you have done, then is my recurrence relation still true? – dumbguywithmathsmajor Jul 13 '22 at 07:55
  • Your recurrence relation holds (for expectations) and agrees with statement $(1)$ in my answer. – drhab Jul 13 '22 at 08:12
3

The idea is correct, although it would be better to state the proof more rigorously. In short, the current, not fully rigorous proof works because the recurrence is linear. More specifically, the $\frac{n - X(k-1)}{n}$ you wrote can be better written as $\frac{n - \mathbb{E}(X(k-1))}{n}$, which can be expanded to $\sum_x \mathbb{P}(X(k-1)=x) \frac{n - x}{n}$ without any problem since the form is linear.


Revised Proof

Let $X(k)$ be the random variable indicating the number of distinct values in a sample of size k.

Then we have the following recurrence: $$ X(k) = \frac{n - X(k-1)}{n}*(1 + X(k-1)) + \frac{X(k-1)}{n}*X(k-1) = 1 + \frac{n-1}{n}*X(k-1), $$ which gives \begin{align} \mathbb{E}(X(k)) & = \sum_x \mathbb{P}(X(k) = x) x \\ &= \sum_{x'} \mathbb{P}(X(k-1) = x') (1 + \frac{n-1}{n}*x') \\ &= 1 + \frac{n-1}{n}*\mathbb{E}(X(k-1)). \end{align} The above equations with and without expectation have the same form due to the linearity of the recurrence.

Now, starting from $\mathbb{E}(X(1)) = X(1)=1$, we solve the above recursive relation and get \begin{align} \mathbb{E}(X(k)) &= 1 + (\frac{n-1}{n}) + (\frac{n-1}{n})^2 + (\frac{n-1}{n})^3 + ... + (\frac{n-1}{n})^{k-1} \\ &= \frac{1-(\frac{n-1}{n})^k}{\frac{1}{n}} = n*(1-(1-\frac{1}{n})^k) \end{align}

Hence, $$ \mathbb{E}(X(n)) = n*(1-(1-\frac{1}{n})^n). $$
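The step from the recurrence to the geometric-sum closed form can be double-checked by iterating the recurrence with exact rationals. This is a minimal sketch assuming $k \ge 1$; `expected_by_recurrence` and `expected_closed_form` are hypothetical names chosen for illustration.

```python
from fractions import Fraction


def expected_by_recurrence(n, k):
    """Iterate E(X(k)) = 1 + ((n-1)/n) * E(X(k-1)) starting from E(X(1)) = 1."""
    e = Fraction(1)
    for _ in range(k - 1):
        e = 1 + Fraction(n - 1, n) * e
    return e


def expected_closed_form(n, k):
    """Evaluate n * (1 - ((n-1)/n)**k) exactly."""
    return n * (1 - Fraction(n - 1, n) ** k)


if __name__ == "__main__":
    for n in (2, 5, 10):
        # The two values agree exactly for every k >= 1 that was tried.
        print(n, expected_by_recurrence(n, n), expected_closed_form(n, n))
```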

Vezen BU
  • 2,320
  • Thanks!! Can't we just take the expectation on both sides of the recursive relation to obtain the relation between expectations? – dumbguywithmathsmajor Jul 13 '22 at 07:28
  • @dumbguywithmathsmajor You are welcome! Do you mean $\mathbb{E}(X(k)) = \mathbb{E}(1 + \frac{n-1}{n}X(k-1)) = 1 + \frac{n-1}{n}\mathbb{E}(X(k-1))$? Sure! I just wanted to add more details. – Vezen BU Jul 13 '22 at 07:35
  • Yeah, that's what I meant. Also, in your recurrence relation isn't the RHS just the expectation of X(k) given X(k-1)? – dumbguywithmathsmajor Jul 13 '22 at 07:51
  • The recurrence works for expectations. Not for random variables. Notice that $X(k)$ is not determined by $X(k-1)$. – drhab Jul 13 '22 at 08:25
0

As @VezenBU suggested.

Let X(k) denote the number of distinct values in a sample of size k.

$\mathbb{E}[X(k)] = \sum_x P(X(k-1) = x) \{(\frac{n-x}{n})(1+x) + \frac{x}{n}x\} $

$\mathbb{E}[X(k)] = \sum_x P(X(k-1) = x) \{(1-\frac{x}{n})(1+x) + \frac{x^2}{n}\} $

$\mathbb{E}[X(k)] = \sum_x P(X(k-1) = x) \{1 - \frac{x}{n} + x\} $

$\mathbb{E}[X(k)] = \sum_x \{P(X(k-1) = x) - \frac{x}{n} * P(X(k-1) = x) + x *P(X(k-1) = x) \} $

$\mathbb{E}[X(k)] = 1 - \frac{\mathbb{E}[X(k-1)]}{n} + \mathbb{E}[X(k-1)]$

$\mathbb{E}[X(k)] = 1 + (\frac{n-1}{n})*\mathbb{E}[X(k-1)]$

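Solving this recurrence with $\mathbb{E}[X(1)] = 1$ gives $\mathbb{E}[X(k)] = n*(1-(1-\frac{1}{n})^k)$, and in particular $\mathbb{E}[X(n)] = n*(1-(1-\frac{1}{n})^n)$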

0

This can also be done using Stirling numbers. From first principles, the expectation is

\begin{align} &\frac{1}{n^n} \sum_{k=1}^n k {n\choose k} k! {n\brace k} \\ &= \frac{1}{n^n} n! [z^n] \sum_{k=1}^n k {n\choose k} (\exp(z)-1)^k \\ &= \frac{n}{n^n} n! [z^n] \sum_{k=1}^n {n-1\choose k-1} (\exp(z)-1)^k \\ &= \frac{n}{n^n} n! [z^n] (\exp(z)-1) \sum_{k=0}^{n-1} {n-1\choose k} (\exp(z)-1)^k \\ &= \frac{n}{n^n} n! [z^n] (\exp(z)-1) \exp((n-1)z) \\ &= \frac{n}{n^n} n! [z^n] (\exp(nz)-\exp((n-1)z)) \\ &= \frac{n}{n^n} (n^n - (n-1)^n) \\ &= n \left(1-\left(1-\frac{1}{n}\right)^n\right). \end{align}
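The starting sum can be evaluated directly for small $n$ and compared with the final expression. The sketch below computes Stirling numbers of the second kind from the standard recurrence ${n\brace k} = k{n-1\brace k} + {n-1\brace k-1}$; the function names `stirling2` and `expected_distinct_stirling` are hypothetical, and `math.comb` assumes Python 3.8 or later.

```python
from fractions import Fraction
from math import comb, factorial


def stirling2(n, k):
    """Stirling number of the second kind via S(n,k) = k*S(n-1,k) + S(n-1,k-1)."""
    table = [[0] * (k + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            table[i][j] = j * table[i - 1][j] + table[i - 1][j - 1]
    return table[n][k]


def expected_distinct_stirling(n):
    """Evaluate (1/n**n) * sum_k k * C(n,k) * k! * S(n,k) exactly."""
    total = sum(
        k * comb(n, k) * factorial(k) * stirling2(n, k) for k in range(1, n + 1)
    )
    return Fraction(total, n ** n)


if __name__ == "__main__":
    for n in range(1, 9):
        # Left: the Stirling-number sum. Right: n * (1 - (1 - 1/n)**n).
        print(n, expected_distinct_stirling(n), n * (1 - Fraction(n - 1, n) ** n))
```

Here ${n\choose k} k! {n\brace k}$ counts the sequences of $n$ draws that use exactly $k$ distinct values, which is why dividing the weighted sum by $n^n$ gives the expectation.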

Marko Riedel
  • 64,728