
Problem

Given a set of unknown size, X elements are sampled from it with replacement. Of these sampled elements, M are unique, meaning that (X - M) of the draws repeated an element that had already been drawn. Knowing this, what is the probability that the set has size N?
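The generative process behind the question can be simulated directly when the set size is known. The sketch below assumes each draw is uniform over the set (modelled as the integers `0..n-1`); the function names `count_unique` and `empirical_pmf` are hypothetical helpers introduced for illustration. The tallies estimate the likelihood Pr(M = m | N = n) that any answer has to work with.

```python
import random
from collections import Counter


def count_unique(n: int, x: int, rng: random.Random) -> int:
    """Draw x elements uniformly with replacement from a set of size n
    (modelled as {0, ..., n-1}) and return the number of distinct values M."""
    return len({rng.randrange(n) for _ in range(x)})


def empirical_pmf(n: int, x: int, trials: int = 100_000, seed: int = 0) -> dict[int, float]:
    """Monte Carlo estimate of Pr(M = m | N = n) for every m that was observed."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    counts = Counter(count_unique(n, x, rng) for _ in range(trials))
    return {m: c / trials for m, c in sorted(counts.items())}


# Example: X = 10 draws from a set of N = 20 elements; X - M counts the repeat draws.
for m, p in empirical_pmf(20, 10).items():
    print(f"M = {m:2d} (X - M = {10 - m}): {p:.4f}")
```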

Related question on SE

If a set has n elements and x elements are selected with replacement, what is the probability that m unique elements are selected? (answer)
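For reference, the usual counting argument for that related question gives the following likelihood (written here with the sample size as $X$, and derived from the standard argument rather than quoted from the linked answer), assuming every draw is uniform over the $n$ elements: $$ \operatorname{Pr}\left( M = m\middle|N = n\right) = \binom{n}{m}\, m!\, S(X, m)\, \frac{1}{n^{X}} = \frac{n(n-1)\cdots(n-m+1)\, S(X, m)}{n^{X}} $$ Here $\binom{n}{m}$ chooses which elements appear, $S(X, m)$ (a Stirling number of the second kind) counts the ways to split the $X$ labelled draws into $m$ non-empty groups, and $m!$ assigns those groups to the chosen elements, out of $n^{X}$ equally likely draw sequences.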


We require the probability $\operatorname{Pr}\left(N = n\middle| M = m\right)$ of the set size given that $m$ unique elements were observed in a sample of size $X$. By Bayes' theorem (with the dependence on $X$ suppressed in the notation): $$ \operatorname{Pr}\left(N = n\middle| M = m\right) = \dfrac{\operatorname{Pr}\left( M = m\middle|N = n\right)\operatorname{Pr}\left(N = n\right)}{\operatorname{Pr}\left( M = m\right)}$$

This shows that the problem is ill-posed without a prior distribution $\operatorname{Pr}\left(N = n\right)$. However, if we proceed with the mindset that '$N$ is equally likely to be anything' (in the positive integers), we can choose the improper prior $\operatorname{Pr}\left(N = n\right) \propto 1$. This gives $$ \operatorname{Pr}\left(N = n\middle| M = m\right) \propto \operatorname{Pr}\left( M = m\middle|N = n\right) $$ where the right-hand side is the probability computed in the related question. The desired probability is therefore proportional to the likelihood, and we obtain a valid probability by normalising over all feasible set sizes (the sum starts at $i = m$ because a set with fewer than $m$ elements cannot produce $m$ unique draws): $$ \operatorname{Pr}\left(N = n\middle| M = m\right) = \dfrac{\operatorname{Pr}\left( M = m\middle|N = n\right)}{\sum_{i = m}^{\infty}\operatorname{Pr}\left( M = m\middle|N = i\right)} $$

This normalisation only produces a proper distribution when the sum in the denominator converges. Since $\operatorname{Pr}\left( M = m\middle|N = n\right)$ behaves like a constant times $n^{m - X}$ as $n \to \infty$, this requires $X - m \geq 2$, i.e. at least two of the draws must be repeats. For example, if all $X$ draws are distinct ($m = X$), the likelihood tends to $1$ as $n$ grows, so no finite normalisation exists.
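The normalised posterior can be approximated numerically. This is a sketch under stated assumptions: it uses the standard closed form of the likelihood, $\binom{n}{m}\, m!\, S(X, m) / n^{X}$ with $S$ the Stirling numbers of the second kind (uniform draws assumed), and it truncates the infinite sum at a hypothetical cut-off `n_max`, so the result is approximate. The names `stirling2`, `likelihood` and `posterior` are illustrative, not from the answer.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def stirling2(x: int, m: int) -> int:
    """Stirling number of the second kind: ways to split x labelled draws into m non-empty groups."""
    if x == m:
        return 1
    if m == 0 or m > x:
        return 0
    return m * stirling2(x - 1, m) + stirling2(x - 1, m - 1)


def likelihood(n: int, x: int, m: int) -> float:
    """Pr(M = m | N = n): probability that x uniform draws with replacement
    from a set of size n contain exactly m distinct elements."""
    if n < m:
        return 0.0  # cannot see more distinct elements than the set contains
    # choose the m elements, split the draws into m groups, assign groups to elements
    favourable = comb(n, m) * factorial(m) * stirling2(x, m)
    return favourable / n**x  # exact big-int division, rounded to float


def posterior(x: int, m: int, n_max: int = 100_000) -> dict[int, float]:
    """Approximate Pr(N = n | M = m) for n = m, ..., n_max under the flat
    improper prior, truncating the normalising sum at n_max."""
    if not 1 <= m <= x:
        raise ValueError("need 1 <= m <= x")
    if x - m < 2:
        # the likelihood decays like n**(m - x), so the normalising sum diverges
        raise ValueError("posterior is improper unless x - m >= 2")
    weights = {n: likelihood(n, x, m) for n in range(m, n_max + 1)}
    total = sum(weights.values())
    return {n: w / total for n, w in weights.items()}


# Example: X = 10 draws with m = 8 unique elements (two repeats).
post = posterior(10, 8)
print("most probable N:", max(post, key=post.get))
```

Because the likelihood's tail decays only like $n^{m - X}$, the truncation error shrinks slowly when $X - m = 2$; comparing the output for two different values of `n_max` is a simple check that the cut-off is large enough.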

rzch