
Suppose I am given a vector $x$ of length $\ell\geq n$ whose entries are random draws from $n$ numbers, and I know that all $n$ numbers appear at least once in the vector. How many numbers appear exactly once in this vector, in expectation?

To be more precise: $x$ is drawn with replacement, but I know that the vector contains at least one copy of each of the $n$ numbers.

The question is related to two existing questions:

"Probability distribution of the number of unique elements when picking $m$ items from $n$ with replacement"

"Expected number of unique items when drawing with replacement"

but it is not the same. I wonder whether the solution using Stirling numbers of the second kind is also useful in this case.

Marko Riedel
fox
  • One needs to specify the random model more precisely. Is $\ell$ fixed in advance or does it depend on the draws that occur? Do you keep sampling and discarding attempts that don't contain all $n$ numbers until you get one that does? – Greg Martin Jun 05 '24 at 16:38
  • You probably want to specify explicitly that there are $\ell$ independent draws from a uniform distribution on the set $\{1,\ldots, n\}$ and that $\ell>n.$ Then the question can be reformulated in terms of conditional distribution. – Lieven Jun 05 '24 at 16:45
  • I provided some clarifications above. I saw the suggestion of reformulating in terms of conditional distributions but I must be very thick because I didn't get it. – fox Jun 05 '24 at 16:47
  • The number is a random variable. Would you be satisfied with the mean of that variable, or do you need the full distribution before you can approve an answer? – Lieven Jun 05 '24 at 17:29
  • The mean of that variable would be more than enough – fox Jun 05 '24 at 20:08

2 Answers


Using combinatorial classes as in Analytic Combinatorics by Flajolet and Sedgewick, we have the following class $\mathcal{V}$ of vectors drawn from $[n]$ where each value occurs at least once

$$\def\textsc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \mathcal{V} = \textsc{SEQ}_{=n}(\textsc{SET}_{\ge 1}(\mathcal{Z})).$$

This gives the EGF

$$F(z) = (\exp(z)-1)^n.$$

In particular the count of these vectors is

$$\ell! [z^\ell] (\exp(z)-1)^n = n! {\ell \brace n}.$$
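As a small worked check (the values $n=2$, $\ell=3$ are chosen here purely for illustration), the formula gives

$$3! [z^3] (\exp(z)-1)^2 = 2! {3 \brace 2} = 2\cdot 3 = 6,$$

which matches a direct count: of the $2^3=8$ vectors over $\{1,2\}$, only $(1,1,1)$ and $(2,2,2)$ miss a value.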

Now mark singletons to get

$$\mathcal{V} = \textsc{SEQ}_{=n}(\mathcal{U}\times \textsc{SET}_{=1}(\mathcal{Z}) + \textsc{SET}_{\ge 2}(\mathcal{Z})).$$

We get the bivariate GF

$$Q(z,u) = (\exp(z)-z-1+uz)^n.$$

For the count of singletons we find the EGF

$$\left. \frac{\partial}{\partial u} Q(z,u) \right|_{u=1} = nz (\exp(z)-1)^{n-1}.$$

Extract the coefficient of $z^\ell$:

$$\ell! [z^\ell] n z (\exp(z)-1)^{n-1} = \ell (\ell-1)! [z^{\ell-1}] n! \frac{1}{(n-1)!} (\exp(z)-1)^{n-1} \\ = \ell \times n! {\ell-1\brace n-1}.$$

It follows that the expectation is

$$\bbox[5px,border:2px solid #00A000]{ \mathrm{E}[S] = \ell {\ell-1\brace n-1} {\ell\brace n}^{-1}.}$$
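The boxed formula can be compared against exhaustive enumeration for small parameters. The following is a minimal sketch in Python, not part of the original answer; the helper names `stirling2`, `expected_singletons` and `brute_force_expectation` are hypothetical, and the uniform model with conditioning on all $n$ values being present is assumed as in the question.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def stirling2(m, k):
    """Stirling set number {m brace k}; zero outside 0 <= k <= m."""
    if k < 0 or k > m:
        return 0
    if m == k:
        return 1  # includes {0 brace 0} = 1
    if k == 0:
        return 0
    # Standard recurrence: the new element joins one of k blocks or starts its own.
    return k * stirling2(m - 1, k) + stirling2(m - 1, k - 1)


def expected_singletons(n, ell):
    """E[S] = ell * {ell-1 brace n-1} / {ell brace n}, assuming ell >= n >= 1."""
    return Fraction(ell * stirling2(ell - 1, n - 1), stirling2(ell, n))


def brute_force_expectation(n, ell):
    """Average singleton count over all length-ell vectors containing every value."""
    total = 0
    count = 0
    for v in product(range(n), repeat=ell):
        if len(set(v)) == n:  # keep only vectors in which all n values occur
            count += 1
            total += sum(1 for a in range(n) if v.count(a) == 1)
    return Fraction(total, count)


if __name__ == "__main__":
    print(expected_singletons(3, 5), brute_force_expectation(3, 5))
```

For $n=3$, $\ell=5$ both functions return $7/5$, since ${4\brace 2}=7$ and ${5\brace 3}=25$. Exact rational arithmetic is used so that the comparison is an equality rather than a floating-point approximation.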

Using the asymptotic

$${\ell\brace n} \underset{\ell\rightarrow\infty}{\sim} \frac{n^\ell}{n!}$$

we get with the pair of Stirling numbers

$$\ell \frac{(n-1)^{\ell-1}}{(n-1)!} \frac{n!}{n^\ell}$$

which means that

$$\bbox[5px,border:2px solid #00A000]{ \mathrm{E}[S] \sim \ell \left(1-\frac{1}{n}\right)^{\ell-1}.}$$
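To see how quickly the asymptotic takes hold for fixed $n$, one can compare it numerically with the exact closed form. This is a rough sketch only; the function names are hypothetical and the parameter values are arbitrary illustrations.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(m, k):
    """Stirling set number {m brace k}; zero outside 0 <= k <= m."""
    if k < 0 or k > m:
        return 0
    if m == k:
        return 1
    if k == 0:
        return 0
    return k * stirling2(m - 1, k) + stirling2(m - 1, k - 1)


def exact_expectation(n, ell):
    """Exact E[S] = ell * {ell-1 brace n-1} / {ell brace n}."""
    return Fraction(ell * stirling2(ell - 1, n - 1), stirling2(ell, n))


def asymptotic_expectation(n, ell):
    """Asymptotic form ell * (1 - 1/n)^(ell - 1) for fixed n and large ell."""
    return ell * (1 - 1 / n) ** (ell - 1)


def ratio(n, ell):
    """Exact value divided by the asymptotic estimate; tends to 1 as ell grows."""
    return float(exact_expectation(n, ell)) / asymptotic_expectation(n, ell)


if __name__ == "__main__":
    for ell in (5, 10, 20, 40, 60):
        print(ell, ratio(3, ell))
```

With $n=3$ the ratio is already within a few percent of $1$ at $\ell=10$ and is numerically indistinguishable from $1$ by $\ell=60$.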

We can in fact compute the $r$th factorial moment by differentiating $Q(z,u)$ $r$ times. We get

$$\ell! [z^\ell] n(n-1)\cdots(n-(r-1)) z^r (\exp(z)-1)^{n-r} \\ = \ell(\ell-1)\cdots (\ell-(r-1)) (\ell-r)! n! [z^{\ell-r}] \frac{(\exp(z)-1)^{n-r}}{(n-r)!} \\ = r! {\ell\choose r} n! {\ell-r\brace n-r}.$$

This means that

$$\bbox[5px,border:2px solid #00A000]{ \mathrm{E}[S(S-1)\cdots (S-(r-1))] = r! {\ell\choose r} {\ell-r\brace n-r} {\ell\brace n}^{-1}.}$$

As a sanity check, $S$ is of course at most $n$. When $r\gt n$ the formula gives zero for the moment because the first Stirling number ${\ell-r\brace n-r}$ vanishes, as it should. When $r=n$ and $\ell\gt n$ we also get zero, and indeed the number of singletons is at most $n-1$ in that case, so the falling factorial vanishes. When $r=n$ and $\ell=n$ there are $n!$ contributing vectors (the only contribution is from $S=n$, where we permute the $n$ unique values), each with the statistic equal to $n!$, for a total of $n!\cdot n!$, which is indeed $n!{n\choose n}n!{0\brace 0},$ giving $n!$ as the value of the moment.
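The factorial-moment formula and the edge cases just discussed can also be checked by enumeration. The sketch below is an illustration under the same uniform model; `factorial_moment` and `brute_force_factorial_moment` are hypothetical helper names.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb, factorial, prod


@lru_cache(maxsize=None)
def stirling2(m, k):
    """Stirling set number {m brace k}; zero outside 0 <= k <= m."""
    if k < 0 or k > m:
        return 0
    if m == k:
        return 1
    if k == 0:
        return 0
    return k * stirling2(m - 1, k) + stirling2(m - 1, k - 1)


def factorial_moment(n, ell, r):
    """r! * C(ell, r) * {ell-r brace n-r} / {ell brace n}, assuming ell >= n >= 1."""
    numerator = factorial(r) * comb(ell, r) * stirling2(ell - r, n - r)
    return Fraction(numerator, stirling2(ell, n))


def brute_force_factorial_moment(n, ell, r):
    """Average of S(S-1)...(S-r+1) over vectors containing all n values."""
    total = 0
    count = 0
    for v in product(range(n), repeat=ell):
        if len(set(v)) == n:
            count += 1
            s = sum(1 for a in range(n) if v.count(a) == 1)
            total += prod(s - i for i in range(r))  # falling factorial of s
    return Fraction(total, count)


if __name__ == "__main__":
    print(factorial_moment(3, 3, 3), factorial_moment(3, 5, 3), factorial_moment(3, 5, 4))
```

The three printed values correspond to the cases $r=n=\ell$, $r=n\lt\ell$ and $r\gt n$ from the sanity check.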

Marko Riedel
  • Thank you, I just checked with simulations and it is very accurate. Would it be possible to use the same approach when the vector of length $\ell$ has at least one singleton? – fox Jun 10 '24 at 12:15
  • @fox I have added an answer where I hope I have understood your question correctly. – Marko Riedel Jun 10 '24 at 21:19

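Additional statistics as per request. We ask about the conditional expectation of the number of singletons given that we know the vector has at least one singleton. Here the values are no longer required to all appear, so there are $n^\ell$ vectors in total. Counting the vectors with no singletons (each value occurs zero times or at least twice) we get the EGF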

$$A(z) = (\exp(z)-z)^n.$$

Extracting coefficients we find with $\ell\ge n$

$$\ell! [z^\ell] \sum_{q=0}^n {n\choose q} (-1)^{n-q} z^{n-q} \exp(qz) = \ell! \sum_{q=0}^n {n\choose q} (-1)^{n-q} [z^{\ell+q-n}] \exp(qz) \\ = \ell! \sum_{q=0}^n {n\choose q} (-1)^{n-q} \frac{q^{\ell+q-n}}{(\ell+q-n)!}.$$

Therefore the count of the admissible vectors is

$$n^\ell - \ell! \sum_{q=0}^n {n\choose q} (-1)^{n-q} \frac{q^{\ell+q-n}}{(\ell+q-n)!} = \ell! \sum_{q=0}^{n-1} {n\choose q} (-1)^{n-1-q} \frac{q^{\ell+q-n}}{(\ell+q-n)!}.$$
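A small worked instance may help here (the values $n=\ell=3$ are an illustrative choice, and the convention $0^0=1$ is used for the $q=0$ term). The right-hand side becomes

$$3! \left[ \frac{0^0}{0!} - 3\cdot\frac{1^1}{1!} + 3\cdot\frac{2^2}{2!} \right] = 6\,(1-3+6) = 24,$$

which agrees with a direct count: of the $3^3=27$ vectors, only the three constant ones contain no singleton.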

The contribution from the number of singletons is found by differentiating the marked mixed GF with respect to $u$ (this gives a zero contribution from the vectors containing no singletons). We have

$$B(z) = (\exp(z)-z+uz)^n.$$

Differentiate with respect to $u$ and evaluate at $u=1$ to get

$$\ell! [z^\ell] n z \exp(z(n-1)) = n \ell! [z^{\ell-1}] \exp(z(n-1)) = n \ell (n-1)^{\ell-1}.$$

We thus have for the desired expectation

$$\bbox[5px,border:2px solid #00A000]{ \mathrm{E}[S] = \frac{n (n-1)^{\ell-1}}{(\ell-1)!} \left[ \sum_{q=0}^{n-1} {n\choose q} (-1)^{n-1-q} \frac{q^{\ell+q-n}}{(\ell+q-n)!} \right]^{-1}.}$$
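The conditional formula can be checked exhaustively for small parameters. What follows is a sketch rather than the answer's own verification (which used Maple); the names `conditional_expectation` and `brute_force_conditional` are hypothetical, and $\ell\ge n\ge 2$ is assumed so that admissible vectors exist and all factorial arguments are nonnegative.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def conditional_expectation(n, ell):
    """Boxed formula for E[S | S >= 1], assuming ell >= n >= 2."""
    numerator = Fraction(n * (n - 1) ** (ell - 1), factorial(ell - 1))
    # Python evaluates 0 ** 0 as 1, matching the convention needed for q = 0.
    bracket = sum(
        Fraction((-1) ** (n - 1 - q) * comb(n, q) * q ** (ell + q - n),
                 factorial(ell + q - n))
        for q in range(n)
    )
    return numerator / bracket


def brute_force_conditional(n, ell):
    """Average singleton count over all n**ell vectors with at least one singleton."""
    total = 0
    count = 0
    for v in product(range(n), repeat=ell):
        s = sum(1 for a in range(n) if v.count(a) == 1)
        if s >= 1:
            count += 1
            total += s
    return Fraction(total, count)


if __name__ == "__main__":
    print(conditional_expectation(3, 3), brute_force_conditional(3, 3))
```

For $n=\ell=3$ both give $3/2$: there are $24$ admissible vectors, $6$ with three singletons and $18$ with one.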

Using the dominant term of the sum we find for $\ell\to\infty$

$$\mathrm{E}[S] \sim \frac{n (n-1)^{\ell-1}}{(\ell-1)!} \left[n \frac{(n-1)^{\ell-1}}{(\ell-1)!}\right]^{-1}$$

This says that

$$\bbox[5px,border:2px solid #00A000]{\mathrm{E}[S] \sim 1.}$$

For $n=2$ the value $1$ is exact as soon as $\ell\ge 3.$

We can also evaluate the count of vectors with no singletons in terms of Stirling set numbers. Writing $\exp(z)-z = (\exp(z)-1) - (z-1)$ and expanding the $n$th power, we get

$$\ell! [z^\ell] \sum_{q=0}^n {n\choose q} (-1)^{n-q} (z-1)^{n-q} (\exp(z)-1)^q \\ = \ell! [z^\ell] \sum_{q=0}^n {n\choose q} q! (-1)^{n-q} \sum_{p=0}^{n-q} (-1)^{n-q-p} {n-q\choose p} z^p (\exp(z)-1)^q/q! \\ = \ell! \sum_{q=0}^n {n\choose q} q! \sum_{p=0}^{n-q} (-1)^p {n-q\choose p} \frac{1}{(\ell-p)!} {\ell-p\brace q} \\ = n! \sum_{q=0}^n \sum_{p=0}^{n-q} (-1)^p \frac{1}{(n-q-p)!} {\ell\choose p} {\ell-p\brace q}.$$
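The two closed forms for the number of vectors without singletons can be compared with each other and with a direct count. This is a minimal sketch assuming $\ell\ge n\ge 1$; all three function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb, factorial


@lru_cache(maxsize=None)
def stirling2(m, k):
    """Stirling set number {m brace k}; zero outside 0 <= k <= m."""
    if k < 0 or k > m:
        return 0
    if m == k:
        return 1
    if k == 0:
        return 0
    return k * stirling2(m - 1, k) + stirling2(m - 1, k - 1)


def no_singletons_powers(n, ell):
    """ell! * sum_q C(n,q) (-1)^(n-q) q^(ell+q-n) / (ell+q-n)!  (power form)."""
    total = sum(
        Fraction((-1) ** (n - q) * comb(n, q) * q ** (ell + q - n),
                 factorial(ell + q - n))
        for q in range(n + 1)
    )
    return factorial(ell) * total


def no_singletons_stirling(n, ell):
    """n! * sum_q sum_p (-1)^p C(ell,p) {ell-p brace q} / (n-q-p)!  (Stirling form)."""
    return sum(
        (-1) ** p * (factorial(n) // factorial(n - q - p))
        * comb(ell, p) * stirling2(ell - p, q)
        for q in range(n + 1)
        for p in range(n - q + 1)
    )


def no_singletons_brute(n, ell):
    """Direct count of vectors in which no value occurs exactly once."""
    return sum(
        1
        for v in product(range(n), repeat=ell)
        if all(v.count(a) != 1 for a in range(n))
    )


if __name__ == "__main__":
    print(no_singletons_powers(3, 3), no_singletons_stirling(3, 3), no_singletons_brute(3, 3))
```

For $n=\ell=3$ all three return $3$, the number of constant vectors.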

Marko Riedel
  • Thanks for this! I do not think it makes sense that the expected number of singletons is roughly one because we know there is at least one singleton, but maybe I am misunderstanding. – fox Jun 11 '24 at 13:25
  • Try to run some simulations. I believe it is true. I also computed the exact expectation for small values of the parameters with Maple and they matched. With $\ell$ large, the majority of vectors have the instance count of every other value close to $(\ell-1)/(n-1)$, leaving just the one singleton. – Marko Riedel Jun 11 '24 at 17:47
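A simulation along the lines suggested in the last comment might look like the following sketch. It assumes the rejection model (draw uniformly, keep only vectors with at least one singleton); the function names, seed, and parameter values are illustrative choices and not taken from the thread.

```python
import random
from collections import Counter
from fractions import Fraction
from math import comb, factorial


def simulate_conditional(n, ell, trials, seed=0):
    """Monte Carlo estimate of E[S | S >= 1] by rejection sampling."""
    rng = random.Random(seed)
    accepted = 0
    total = 0
    for _ in range(trials):
        counts = Counter(rng.randrange(n) for _ in range(ell))
        s = sum(1 for c in counts.values() if c == 1)
        if s >= 1:  # discard vectors without any singleton
            accepted += 1
            total += s
    return total / accepted  # assumes at least one vector was accepted


def exact_conditional(n, ell):
    """Closed form from the answer above, assuming ell >= n >= 2."""
    numerator = Fraction(n * (n - 1) ** (ell - 1), factorial(ell - 1))
    bracket = sum(
        Fraction((-1) ** (n - 1 - q) * comb(n, q) * q ** (ell + q - n),
                 factorial(ell + q - n))
        for q in range(n)
    )
    return numerator / bracket


if __name__ == "__main__":
    for ell in (6, 12, 24):
        print(ell, simulate_conditional(4, ell, 20000), float(exact_conditional(4, ell)))
```

Rejection sampling becomes slow for large $\ell$, because vectors with any singleton grow rare; that rarity is also why a second singleton is so unlikely and the conditional mean approaches $1$.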