
I am looking for an elementary proof of the fact that the expected time for finding a collision with $n$ bins is $\sqrt{\frac{\pi n}{2}} + O(1)$. The proof that I know relies on the asymptotic expansion of the Ramanujan $Q$-function. Additional information on the parallel algorithm (discussed below) is welcome too.

Background information below

Suppose we sample uniformly random elements from a set of cardinality $n$, and save them in a table. We continue doing this process (each sampling is one step) until we get a collision. What is the expected number of steps until we find the first collision?
This is a common problem, also known as the birthday paradox, since the answer is $\Theta(\sqrt{n})$, which is rather unintuitive.

Suppose we sample $k$ times. Using $1-x\leq e^{-x}$, the probability of having no collisions after $k$ steps is $$ \prod_{i=0}^{k-1}\left(\frac{n-i}{n}\right) = \prod_{i=0}^{k-1}\left(1-\frac{i}{n}\right) \leq \prod_{i=0}^{k-1}e^{\frac{-i}{n}} = e^{-\sum_{i=0}^{k-1}\frac{i}{n}}=e^{\frac{-k(k-1)}{2n}}\approx e^{-\frac{k^2}{2n}}, $$ and so the probability of a collision within $k$ steps is $$ 1 - \prod_{i=0}^{k-1}\left(\frac{n-i}{n}\right) \geq 1 - e^{-\frac{k(k-1)}{2n}}, $$ which is $\Omega(1)$ when $k=\Theta(\sqrt{n})$. The probability of no collision after $k$ steps can also be written as $\frac{n!}{(n-k)!\, n^k}$.
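As a quick sanity check of the bound above, here is a minimal Python sketch that compares the exact no-collision probability with $e^{-k(k-1)/(2n)}$. The function names and the choice $n = 365$ are illustrative assumptions, not part of the original argument.

```python
import math


def no_collision_probability(n: int, k: int) -> float:
    """Exact probability that k uniform samples from n bins are all distinct."""
    p = 1.0
    for i in range(k):
        p *= (n - i) / n
    return p


def exponential_bound(n: int, k: int) -> float:
    """Upper bound exp(-k(k-1)/(2n)) obtained from 1 - x <= exp(-x)."""
    return math.exp(-k * (k - 1) / (2 * n))


if __name__ == "__main__":
    n = 365  # illustrative choice: the classical birthday setting
    for k in (10, 23, 40, 60):
        exact = no_collision_probability(n, k)
        bound = exponential_bound(n, k)
        print(f"k={k:3d}  exact={exact:.6f}  bound={bound:.6f}")
```

The exact value should never exceed the bound, and the two stay close as long as $k$ is small compared with $n^{2/3}$.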

Now consider the following question

What is the expected number of steps until the first collision?

I haven't seen an easy solution to this problem. This question and the corresponding Wikipedia page treat it without a proof.
Let $X$ be the random variable "index of step of first collision". Since $\mathbb{P}[X \geq k]$ is the probability of no collision in the first $k-1$ steps, $$ \mathbb{E}[X] = \sum_{k=1}^{\infty} \mathbb{P}[X \geq k] = 1 + \sum_{k=1}^n \frac{n!}{n^k (n-k)!}, $$ which is easy to prove from the above formula.
Now comes the non-trivial part. The function $$ Q(n) = \sum_{k=1}^n \frac{n!}{n^k (n-k)!} $$ is known as the Ramanujan $Q$-function, and has the asymptotic expansion (in $\sqrt{n}$) $$ Q(n) = \sqrt{\frac{\pi n}{2}} - \frac{1}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2n}} + O\left(\frac{1}{n}\right), $$ and therefore the expected number of steps until first collision is $\sqrt{\frac{\pi n}{2}} + O(1)$.
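The expansion can be checked numerically. The sketch below (function names are my own, chosen for illustration) evaluates $Q(n)$ directly from its definition and compares it with the three-term expansion quoted above.

```python
import math


def ramanujan_q(n: int) -> float:
    """Q(n) = sum_{k=1}^n n! / (n^k (n-k)!), accumulated term by term."""
    total, term = 0.0, 1.0
    for k in range(1, n + 1):
        term *= (n - k + 1) / n  # term is now n! / (n^k (n-k)!)
        total += term
    return total


def q_expansion(n: int) -> float:
    """Three-term asymptotic expansion of Q(n) quoted in the text."""
    return math.sqrt(math.pi * n / 2) - 1 / 3 + math.sqrt(math.pi / (2 * n)) / 12


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        q = ramanujan_q(n)
        print(f"n={n:6d}  Q(n)={q:.6f}  expansion={q_expansion(n):.6f}  "
              f"n*(Q - expansion)={n * (q - q_expansion(n)):.4f}")
```

If the $O(1/n)$ error term is right, the last printed column should stay bounded as $n$ grows.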

  • Is there a simpler proof that the expected number of steps is $\Theta(\sqrt{n})$?
  • What is the variance of $X$?
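For the variance question, a numerical experiment may at least indicate what to aim for. This sketch assumes the tail-sum identities $\mathbb{E}[X]=\sum_{k\geq 1}\mathbb{P}[X\geq k]$ and $\mathbb{E}[X^2]=\sum_{k\geq 1}(2k-1)\,\mathbb{P}[X\geq k]$; the function name is hypothetical.

```python
import math


def first_collision_moments(n: int) -> tuple[float, float]:
    """Return (E[X], Var[X]) for the step index X of the first collision."""
    mean = second = 0.0
    survive = 1.0  # P[X >= 1]
    for k in range(1, n + 2):
        mean += survive
        second += (2 * k - 1) * survive
        survive *= (n - (k - 1)) / n  # now P[X >= k + 1] = P[first k samples distinct]
    return mean, second - mean * mean


if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        mean, var = first_collision_moments(n)
        print(f"n={n:7d}  E[X]={mean:.4f}  Var[X]={var:.2f}  Var[X]/n={var / n:.4f}")
```

The printed ratio $\mathrm{Var}[X]/n$ appears to settle slowly near $2-\pi/2\approx 0.43$, which suggests (but does not prove) that $\mathrm{Var}[X]=\Theta(n)$.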

Now suppose we parallelise, i.e., we run the same algorithm on $m$ different machines which do not communicate with each other.

  • What is the expected run time until we find the first collision?

The interesting thing here is that parallelisation with $m$ machines only gives a $\sqrt{m}$ improvement in the run-time. A similar argument shows that after $k$ steps the probability that no machine has found a collision is $$ \prod_{i=0}^{k-1}\left(\frac{n-i}{n}\right)^m = \cdots \approx e^{-\frac{k^2m}{2n}}, $$ so we will have an $\Omega(1)$ probability of collision when $k^2m \approx n$. Since the run-time is $k$, we need $k \approx \sqrt{\frac{n}{m}}$, so increasing $m$ only gives a square-root improvement in the run-time $k$.
However, I read here that the expected number of steps until first collision is $\sqrt{\frac{\pi n}{2m}} + O(1)$, which is a statement that I can't prove, so my final question is

  • How do I prove that for the parallel algorithm the expected number of steps until first collision is $\sqrt{\frac{\pi n}{2m}} + O(1)$?
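The claimed formula for the parallel setting can be tested numerically. The sketch below assumes that the run time $T$ is the first step at which any of the $m$ independent machines sees a collision, so that $\mathbb{P}[T>k]=\prod_{i=0}^{k-1}\left(\frac{n-i}{n}\right)^m$ and $\mathbb{E}[T]=\sum_{k\geq 0}\mathbb{P}[T>k]$; the function name is illustrative.

```python
import math


def expected_parallel_steps(n: int, m: int) -> float:
    """E[T] = sum_{k>=0} P[T > k] for m independent machines over n bins."""
    total = 0.0
    survive = 1.0  # P[a single machine has no collision after k samples]
    for k in range(n + 1):
        total += survive ** m  # all m machines are still collision-free
        survive *= (n - k) / n
    return total


if __name__ == "__main__":
    n = 100000
    for m in (1, 4, 16, 64):
        exact = expected_parallel_steps(n, m)
        leading = math.sqrt(math.pi * n / (2 * m))
        print(f"m={m:3d}  E[T]={exact:.4f}  sqrt(pi n/(2m))={leading:.4f}  "
              f"difference={exact - leading:.4f}")
```

For $m=1$ this reduces to $1+Q(n)$. In the runs I would expect the difference column to stay bounded, consistent with an $O(1)$ error term, though this is evidence rather than a proof.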

But I'd also like to ask

  • Is there a simple proof that this expected number of steps is $\Theta(\sqrt{\frac{n}{m}})$?
  • What is the variance of the number of steps?

I feel like the parallelisation questions should be easily provable if one knows the variance of $X$.

Kolja
  • "I haven't seen an easy solution to this problem." I think the MSE question you linked to has answers that give two simple proofs. Granted, that's only the prelude of your question and not something you were actually asking. – David K Dec 08 '22 at 03:52
  • @DavidK Actually it was me that wrote the simple answer to the linked MSE question. However the expansion of $Q(n)$ relies on some more advanced methods, and I am not aware of a simpler proof of that expansion. For $\Theta({\sqrt{n}})$ I feel like that should be much easier, but I couldn't show it. – Kolja Dec 08 '22 at 09:09
  • A proof can be found here. – Kolja Jan 31 '24 at 13:33

1 Answer


The probability $p_{n,k}$ that the first collision (among $n$ bins) happens at time $k$, meaning that the first $k$ samples are distinct and the next one repeats one of them, is given by $p_{n,k}=\frac{k}{n}\prod_{j=0}^{k-1}\left(1-\frac{j}{n}\right)$ for $1\leq k\leq n$.

Since $1-x\leq e^{-x}$, we can easily estimate from above $$\frac{n}{k}p_{n,k}\leq \prod_{j=0}^{k-1}e^{\frac{-j}{n}}=e^{-\frac{k(k-1)}{2n}}\leq e^{-\frac{(k-1)^2}{2n}}.$$

For a lower bound, use $1-x\geq e^{-x}-\frac{x^2}{2}=e^{-x}\left(1-\frac{x^2e^{x}}{2}\right)$ for $0\leq x\leq 1$, together with $\prod_j(1-a_j)\geq 1-\sum_j a_j$ for $a_j\in[0,1]$. Here $a_j=\frac{j^2e^{j/n}}{2n^2}\leq\frac{e\,j^2}{2n^2}$ and $\sum_{j=0}^{k-1}j^2\leq\frac{k^3}{3}$, so for $1\leq k\leq n^{5/8}$ (a range in which every $a_j\leq 1$) $$\frac{n}{k}p_{n,k}\geq \prod_{j=0}^{k-1}e^{\frac{-j}{n}}\left(1-a_j\right)\geq \left(1-\frac{e\,k^3}{6n^2}\right)e^{-\frac{k(k-1)}{2n}}\geq \left(1-\frac{e}{6}n^{-1/8}\right)e^{-\frac{k^2}{2n}}.$$

The function $x \mapsto f_n(x):=\frac{x^2}{n} e^{-x^2/2n}$ is increasing on the interval $(0,(2n)^{1/2})$ and decreasing on the interval $((2n)^{1/2},+\infty)$, with $f_n((2n)^{1/2})=2/e$ and $\int_0^\infty f_n(x)\,dx=(\pi n/2)^{1/2}$. So for the expected time we have on the one hand, using $k-2\leq (k-1)^2/k$ for $k\geq 1$ and comparing the sum with the integral (the at most two terms next to the maximum are each bounded by $2/e$), $$\mathbb{E}_n[k]-2\leq \mathbb{E}_n\left[\frac{(k-1)^2}{k}\right]= \sum_{k=1}^{n}\frac{(k-1)^2}{k}p_{n,k}\leq \sum_{k=1}^{n}f_n(k-1) \leq \frac{4}{e} +\int_{0}^{\infty}f_n(x)\,dx = \frac{4}{e} +(\pi n/2)^{1/2}.$$

On the other hand, writing $\varepsilon_n=\frac{e}{6}n^{-1/8}$, we have $$\left(1-\varepsilon_n\right)^{-1}\mathbb{E}_n[k]=\left(1-\varepsilon_n\right)^{-1}\sum_{k=1}^{n} kp_{n,k} \geq \sum_{k=1}^{\lfloor n^{5/8} \rfloor} f_n(k)\geq \int_{0}^{n^{5/8}}f_n(x)\,dx-\frac{2}{e}= (\pi n/2)^{1/2} -\frac{2}{e} - \int_{n^{5/8}}^\infty f_n(x)\,dx.$$ The last integral equals $n^{1/2}\int_{n^{1/8}}^\infty u^2e^{-u^2/2}\,du$, which tends to $0$ as $n\to\infty$.

From here, it's easy to finish the proof that $\lim_{n \to \infty}\mathbb{E}_n[k]n^{-1/2} = (\pi/2)^{1/2}$.
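A small numerical check of the conclusion, as a sketch only: it computes $\mathbb{E}_n[k]=\sum_k k\,p_{n,k}$ from the formula for $p_{n,k}$ and compares it with the upper bound $(\pi n/2)^{1/2}+2+4/e$ and with the limiting ratio $(\pi/2)^{1/2}$. The function name is an illustrative assumption.

```python
import math


def expected_time(n: int) -> float:
    """E_n[k] = sum_k k * p_{n,k}, with p_{n,k} = (k/n) * prod_{j<k} (1 - j/n)."""
    total = 0.0
    prod = 1.0  # prod_{j<k} (1 - j/n), starting with the empty product at k = 0
    for k in range(n + 1):
        total += k * (k / n) * prod
        prod *= 1 - k / n
    return total


if __name__ == "__main__":
    for n in (100, 1000, 10000, 100000):
        e_n = expected_time(n)
        upper = math.sqrt(math.pi * n / 2) + 2 + 4 / math.e
        print(f"n={n:7d}  E_n[k]={e_n:.4f}  upper bound={upper:.4f}  "
              f"E_n[k]/sqrt(n)={e_n / math.sqrt(n):.5f}")
    print(f"sqrt(pi/2) = {math.sqrt(math.pi / 2):.5f}")
```

With this convention $\mathbb{E}_n[k]$ is one less than the expected index of the colliding sample, which does not affect the limit.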