
This question is a follow-up to a previous question I posted: Probability of Seeing "X" % of Balls in "Y" Turns?

Setup:

  • Suppose we have integers 1,2,3...99, 100

  • Each integer has an equal probability of being selected

Game:

  • In round=1, we pick 5 numbers randomly without replacement and then put them back

  • In round=2 we again pick 5 numbers randomly without replacement and then put them back

  • We do this until round = 100
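When the total $N$ is known, the number of distinct values seen evolves as a Markov chain: if $j$ numbers have been seen so far, the number of new values among the next round's 5 picks is hypergeometric. The Python sketch below computes that exact distribution under this known-$N$ assumption; the function name `unique_count_distribution` is a hypothetical helper, not part of the original R program.

```python
from math import comb

def unique_count_distribution(N, per_round, rounds):
    """Hypothetical helper: dist[j] = P(exactly j distinct numbers seen after
    `rounds` rounds). Each round draws `per_round` distinct numbers uniformly
    from 1..N; they are put back before the next round."""
    total = comb(N, per_round)
    dist = [0.0] * (N + 1)
    dist[0] = 1.0  # before the first round nothing has been seen
    for _ in range(rounds):
        nxt = [0.0] * (N + 1)
        for j, pj in enumerate(dist):
            if pj == 0.0:
                continue
            # i of this round's picks are unseen: hypergeometric weight
            for i in range(per_round + 1):
                ways = comb(N - j, i) * comb(j, per_round - i)
                if ways:
                    nxt[j + i] += pj * ways / total
        dist = nxt
    return dist

dist = unique_count_distribution(N=100, per_round=5, rounds=100)
print("P(at least 99 of 100 seen after 100 rounds) =", sum(dist[99:]))
```

As a sanity check, a fixed number is missed in one round with probability $(N-5)/N$, so the mean of this distribution after $r$ rounds should be $N\left(1-\left(\frac{N-5}{N}\right)^r\right)$.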

I wrote an R program to simulate this situation. Here are the first 15 rounds of its output:

 round   numbers_picked         cumulative_unique_numbers_seen     percent_of_new_numbers
     1  31, 79, 51, 14, 67                              5                    100
     2  42, 50, 43, 14, 25                              9                     80
     3  90, 91, 69, 99, 57                             14                    100
     4   92, 9, 93, 72, 26                             19                    100
     5    7, 42, 9, 83, 36                             22                     60
     6  78, 81, 43, 76, 15                             26                     80
     7    32, 7, 9, 41, 74                             29                     60
     8   23, 27, 60, 53, 7                             33                     80
     9  53, 27, 96, 38, 89                             36                     60
    10  34, 93, 69, 72, 76                             37                     20
    11  63, 13, 82, 97, 91                             41                     80
    12  25, 38, 21, 79, 41                             42                     20
    13  47, 90, 60, 95, 16                             45                     60
    14   94, 6, 72, 86, 97                             48                     60
    15  39, 31, 81, 50, 34                             49                     20
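A comparable simulation in Python that produces the same four columns is sketched below. It is not the asker's R code, and `simulate` with its parameters is a hypothetical name; different seeds will give picks different from the table above.

```python
import random

def simulate(N=100, per_round=5, rounds=100, seed=None):
    """Hypothetical helper: one run of the game. Returns rows of
    (round, numbers_picked, cumulative_unique_numbers_seen, percent_of_new_numbers)."""
    rng = random.Random(seed)
    seen = set()
    rows = []
    for r in range(1, rounds + 1):
        picked = rng.sample(range(1, N + 1), per_round)  # distinct within a round
        new = [x for x in picked if x not in seen]
        seen.update(picked)  # numbers are put back; we only record what was seen
        rows.append((r, picked, len(seen), 100 * len(new) / per_round))
    return rows

for r, picked, cum, pct in simulate(seed=1)[:15]:
    print(f"{r:>5}  {', '.join(map(str, picked)):<20} {cum:>4} {pct:>6.0f}")
```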

I am wondering whether there is a probability distribution that can be used to answer the following question:

Suppose we are currently at round = n and have seen "m" unique numbers. If we DO NOT know that there are 100 numbers in total, what is the probability that we will have seen 99% of all numbers by round = k (with k > n)?

I have begun learning about mathematical biology models and their use in these kinds of problems. For example:

Can someone please suggest whether any mathematical biology models can be used for this problem?

konofoso

1 Answer


Answering your first question, let us consider the following natural and simplified model.

Suppose that for each natural number $N\ge m$, $A_N$ is the hypothesis that there are $N$ numbers in total and $p_N$ is the a priori probability of $A_N$. To simplify the model, suppose that at each round we pick only one number and all choices have equal probabilities. I expect that the required probability depends on the prior probabilities $(p_N)_{N\ge m}$, even for $k=n$.

Indeed, assuming $A_N$, for each natural $r$ let $q_{N,r}$ be the vector of dimension $N$ whose $k$th entry $q_{N,r,k}$, for each natural $k\le N$, is the probability that after $r$ rounds we have seen exactly $k$ unique numbers. Then $q_{N,r,1}=\frac 1{N^{r-1}}$ (every later pick repeats the first number), $q_{N,1,k}=0$ for any natural $k$ with $2\le k\le N$, and for each natural $r$ and each natural $k$ with $2\le k\le N$ we have $$q_{N,r+1,k}=q_{N,r,k}\cdot\frac kN+q_{N,r,k-1}\cdot \frac {N-k+1}N,$$ because after round $r$ either we have already seen $k$ numbers and pick one of them again, or we have seen $k-1$ numbers and pick one of the $N-k+1$ numbers not yet seen. Let $M_N=\|m_{N,i,j}\|$ be the $N\times N$ matrix such that $m_{N,1,1}=\frac 1N$ and for each natural $k$ with $2\le k\le N$ we have $m_{N,k,k}=\frac kN$ and $m_{N,k,k-1}=\frac {N-k+1}N$, and all other entries of $M_N$ are zeroes. By the recurrence, $q_{N,r}=M_N^{r-1}q_{N,1}$.
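The recurrence can be iterated directly. The Python sketch below (the name `q_vector` is a hypothetical helper) uses exact fractions so that the closed form $q_{N,r,1}=1/N^{r-1}$ can be checked exactly. Moving from $k-1$ to $k$ seen numbers requires picking one of the $N-k+1$ numbers not yet seen, so the sketch uses the factor $(N-k+1)/N$; with it, every vector $q_{N,r}$ sums to 1.

```python
from fractions import Fraction

def q_vector(N, r):
    """Hypothetical helper: [q_{N,r,0}, q_{N,r,1}, ..., q_{N,r,N}], the probabilities
    of having seen exactly k distinct numbers after r single-number rounds, given A_N.
    Index 0 is a placeholder that stays 0 for r >= 1."""
    q = [Fraction(0)] * (N + 1)
    q[1] = Fraction(1)  # q_{N,1}: after round 1 exactly one number has been seen
    for _ in range(r - 1):
        nxt = [Fraction(0)] * (N + 1)
        for k in range(1, N + 1):
            repeat = q[k] * Fraction(k, N)             # picked an already-seen number
            fresh = q[k - 1] * Fraction(N - k + 1, N)  # picked one of the N-k+1 unseen
            nxt[k] = repeat + fresh
        q = nxt
    return q

q = q_vector(N=5, r=4)
print(q[1], sum(q))
```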

Then, given that after $n$ rounds we have seen exactly $m$ unique numbers, Bayes' formula suggests that for each natural $N\ge m$ the a posteriori probability $p_N'$ of $A_N$ is $$p_N'=\frac {p_N\cdot q_{N,n,m}}{\sum_{S=m}^\infty p_S\cdot q_{S,n,m}}.$$
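Since the sum over $S$ is infinite, any numerical evaluation has to truncate the prior. The sketch below assumes a prior with finite support, passed as a dict `{N: p_N}`; both this interface and the helper names `q_entry` and `posterior` are hypothetical. It uses floats rather than exact fractions for speed.

```python
def q_entry(N, n, m):
    """Hypothetical helper: q_{N,n,m} = P(exactly m distinct numbers after n
    single-number rounds | A_N), computed with the recurrence."""
    q = [0.0] * (N + 1)
    q[1] = 1.0
    for _ in range(n - 1):
        q = [0.0] + [q[k] * k / N + q[k - 1] * (N - k + 1) / N for k in range(1, N + 1)]
    return q[m]

def posterior(prior, n, m):
    """Hypothetical helper: p'_N proportional to p_N * q_{N,n,m}, renormalized.
    `prior` is a finite dict {N: p_N}; only N >= m can produce m distinct numbers."""
    weights = {N: p * q_entry(N, n, m) for N, p in prior.items() if N >= m}
    total = sum(weights.values())
    return {N: w / total for N, w in weights.items()}

# Hypothetical prior: uniform on N = 15..100; data: m = 15 distinct after n = 20 rounds
prior = {N: 1 / 86 for N in range(15, 101)}
post = posterior(prior, n=20, m=15)
```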

Let $\bar m=\left\lfloor\frac m{0.99}\right\rfloor$. Having seen $m$ numbers means having seen at least $99\%$ of all $N$ numbers exactly when $m\ge 0.99N$, that is, when $N\le\bar m$. So the probability that by round $n$ we have seen at least $99\%$ of all numbers should be equal to $$\sum_{S=m}^{\bar m} p'_S=\frac {\sum_{S=m}^{\bar m} p_S\cdot q_{S,n,m}}{\sum_{S=m}^\infty p_S\cdot q_{S,n,m}}.$$
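A sketch of the whole formula follows. It again assumes a finite-support prior given as a dict, and the names `q_entry` and `prob_seen_99_percent` are hypothetical. Here $\bar m$ is computed as `(100 * m) // 99`, which equals $\lfloor m/0.99\rfloor$ because $m/0.99=100m/99$, but avoids floating-point rounding.

```python
def q_entry(N, n, m):
    """Hypothetical helper: q_{N,n,m} = P(exactly m distinct numbers after n
    single-number rounds | A_N), computed with the recurrence."""
    q = [0.0] * (N + 1)
    q[1] = 1.0
    for _ in range(n - 1):
        q = [0.0] + [q[k] * k / N + q[k - 1] * (N - k + 1) / N for k in range(1, N + 1)]
    return q[m]

def prob_seen_99_percent(prior, n, m):
    """Hypothetical helper: sum_{S=m}^{m_bar} p_S q_{S,n,m} / sum_{S>=m} p_S q_{S,n,m}
    for a finite prior {S: p_S}."""
    m_bar = (100 * m) // 99  # floor(m / 0.99) in exact integer arithmetic
    weights = {S: p * q_entry(S, n, m) for S, p in prior.items() if S >= m}
    return sum(w for S, w in weights.items() if S <= m_bar) / sum(weights.values())

# Hypothetical prior: uniform on N = 15..100; data: m = 15 distinct after n = 20 rounds
print(prob_seen_99_percent({N: 1 / 86 for N in range(15, 101)}, n=20, m=15))
```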

This value depends on the probabilities $(p_N)_{N\ge m}$ provided there exist natural numbers $S,T>\bar m$ such that $q_{S,n,m}\ne q_{T,n,m}$. This clearly holds, for instance, when $n>1$ and $m=1$ because then for distinct natural $S,T$ we have $$q_{S,n,m}=\frac 1{S^{n-1}} \ne \frac 1{T^{n-1}} =q_{T,n,m}.$$
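Here is a small exact check of this claim for $m=1$ and $n=2$, where $\bar m=\lfloor 1/0.99\rfloor=1$. The two hypothetical priors below assign the same $p_1=\tfrac12$ but put the remaining mass on $S=2$ or on $T=3$, and the resulting probabilities differ. The helper name `prob_seen_99_percent_m1` is hypothetical.

```python
from fractions import Fraction

def prob_seen_99_percent_m1(prior, n):
    """Hypothetical helper for the case m = 1: q_{S,n,1} = 1/S^(n-1) and
    m_bar = floor(1/0.99) = 1, so only the hypothesis N = 1 enters the numerator."""
    weights = {S: p * Fraction(1, S ** (n - 1)) for S, p in prior.items()}
    return weights.get(1, Fraction(0)) / sum(weights.values())

half = Fraction(1, 2)
print(prob_seen_99_percent_m1({1: half, 2: half}, n=2))  # 2/3
print(prob_seen_99_percent_m1({1: half, 3: half}, n=2))  # 3/4
```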

Alex Ravsky
  • @Alex: thank you! I am learning about recursion and am having trouble here: https://math.stackexchange.com/questions/4955881/i-have-never-written-a-recursion-formula-before-can-someone-help-please Can you please help me out if you have time? – konofoso Aug 08 '24 at 12:19
  • @konofoso Since you asked, two answers have been posted to that question. Do you still need my help? – Alex Ravsky Aug 08 '24 at 14:51
  • @Alex Ravsky: thank you for your reply. You are an extremely talented and kind individual with a great gift for explaining complex subjects to a layman like myself. I really appreciate your efforts and explanations as you teach me. If you have time, I would really appreciate a third answer from someone respected like yourself. Thank you so much – konofoso Aug 08 '24 at 18:43
  • Do you know whether martingales are useful for this problem? https://math.stackexchange.com/questions/4953261/relationship-between-martingales-and-picking-balls-from-a-hat – konofoso Aug 09 '24 at 04:57
  • @konofoso Unfortunately, I am not acquainted with martingales. To tell the truth, I have only basic knowledge of probability theory. – Alex Ravsky Aug 09 '24 at 06:58
  • Thank you for your reply ... I appreciate everything – konofoso Aug 09 '24 at 17:11