Consider a length-$m$ array of unique elements $A=\{1,2,\dots, m\}$. Independently draw $n$ samples from $A$ (with replacement, so duplicate elements are possible) and place them into array $B$. Then de-duplicate $B$ to produce array $C$ with $k$ unique elements (of course, $k\le n$).

Question: what is the probability $P(k\ge i)$ for each $i$?

1 Answer

The probability $\mathsf P(k=i)$ is the probability that exactly $i$ different elements appeared. Assuming each sample is drawn uniformly from $A$, this is $\left\{n\atop i\right\}\frac{m!}{(m-i)!}m^{-n}$, where $\left\{n\atop i\right\}$ is a Stirling number of the second kind that counts the ways to partition the $n$ samples into $i$ non-empty groups, $\frac{m!}{(m-i)!}$ counts the ways to assign $i$ out of the $m$ elements to these groups, and $m^n$ is the total number of outcomes. Then $\mathsf P(k\ge i)=\sum_{j=i}^{\min(m,n)}\mathsf P(k=j)$.
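For anyone who wants numbers rather than symbols, here is a minimal Python sketch of the formula above. It assumes uniform, independent draws; the function names (`stirling2`, `prob_exactly`, `prob_at_least`) are made up for illustration, and the Stirling number is computed with the standard inclusion-exclusion sum rather than anything specific to this answer.

```python
from fractions import Fraction
from math import comb, factorial, perm  # math.perm/comb need Python 3.8+


def stirling2(n, k):
    """Stirling number of the second kind, via the inclusion-exclusion sum."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)  # the sum is always divisible by k!


def prob_exactly(m, n, i):
    """P(k = i): exactly i distinct values among n uniform draws from m elements."""
    if i < 0 or i > min(m, n):
        return Fraction(0)
    # perm(m, i) equals m! / (m - i)!
    return Fraction(stirling2(n, i) * perm(m, i), m ** n)


def prob_at_least(m, n, i):
    """P(k >= i): sum of P(k = j) for j from i up to min(m, n)."""
    terms = (prob_exactly(m, n, j) for j in range(max(i, 0), min(m, n) + 1))
    return sum(terms, Fraction(0))


# Example: m = 6 values, n = 3 draws.
print(prob_exactly(6, 3, 3))   # all three draws differ
print(prob_at_least(6, 3, 2))  # at least two distinct values
```

Using `Fraction` keeps the results exact; convert with `float(...)` if a decimal is enough.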

joriki
  • Thank you much, it's gonna take me a hot minute to actually grok what this answer means and how to best apply it 'cause I'm out of practice, maths-brainwise, and Stirling numbers aren't something I ever learnt to begin with.

    But it definitely reads as a correct answer. So, thank you.

    – AnDrew the Awesome Mar 03 '24 at 16:26
  • This is a very interesting problem, and I would also be very interested to understand how the partitioning of $n$ samples into $k$ groups helps us in this case. I understand that $m^n$ is the number of permutations for B (with repetition) and that $\frac{m!}{(m-k)!}$ is the number of permutations for C (without repetition). Also, it is clear that there are multiple B vectors that can result in a C vector (e.g. AAABC, ABBCC etc. will result in ABC). But I am failing to see how the partitioning method helps us calculate these different ways. – tps Mar 03 '24 at 17:13
  • Furthermore: As an example, the B vector ABACA can result (after de-duplication) in C vector ABC. But we can also consider BAC or BCA. Are these C vectors distinct? (i.e. should we use permutation or combination in order to calculate C?). And if they are distinct, which is the correct one in the above example? – tps Mar 03 '24 at 17:20
  • @tps: If $k$ distinct elements are chosen, then grouping together all slots in the array $B$ that contain the same element partitions these $n$ slots into $k$ groups. This can be done in $\left\{n\atop k\right\}$ ways. There are then $m$ ways to choose a value for the first group, $m-1$ for the second, etc., and thus $\frac{m!}{(m-k)!}$ ways to choose elements for all $k$ groups. Each of these $\left\{n\atop k\right\}\frac{m!}{(m-k)!}$ choices yields one of the ways in which the slots can be filled such that there are $k$ distinct elements. – joriki Mar 03 '24 at 17:45
  • @joriki Thanks a lot! It makes sense now, and I learned something today! – tps Mar 03 '24 at 19:07
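
The counting argument from the comments can also be checked by brute force for small $m$ and $n$. The sketch below is only an illustration with made-up helper names: it enumerates every possible array $B$, counts how many have exactly $k$ distinct values, and compares that with $\left\{n\atop k\right\}\frac{m!}{(m-k)!}$. Only the number of distinct values is counted, so the order in which $C$ lists them plays no role.

```python
from collections import Counter
from itertools import product
from math import comb, factorial, perm  # math.perm/comb need Python 3.8+


def stirling2(n, k):
    """Stirling number of the second kind, via the inclusion-exclusion sum."""
    total = sum((-1) ** j * comb(k, j) * (k - j) ** n for j in range(k + 1))
    return total // factorial(k)


def brute_force_counts(m, n):
    """For each k, count the arrays B of length n over {1..m} with exactly k distinct values."""
    return Counter(len(set(b)) for b in product(range(1, m + 1), repeat=n))


def formula_counts(m, n):
    """Same counts from the formula: (partitions into k groups) * (values for the groups)."""
    return {k: stirling2(n, k) * perm(m, k) for k in range(1, min(m, n) + 1)}


m, n = 4, 5  # small enough to enumerate all m**n = 1024 arrays
assert dict(brute_force_counts(m, n)) == formula_counts(m, n)
assert sum(formula_counts(m, n).values()) == m ** n
```

The enumeration grows as $m^n$, so this is a sanity check for small cases only, not a way to compute the probabilities in general.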