
Theorem: Choose $Q$ natural numbers independently and uniformly at random from the set $\{1,2, \dots, M\}.$ The probability of getting at least one collision is

$$P_C(Q) = 1 - \frac{M - (Q - 1)}{M} P_{\neg C}(Q-1).$$

Notation: By $P_C$, I mean the probability of getting a collision. By $P_{\neg C}$ I mean the probability of not getting a collision.

Remark: This is the birthday problem.

Remark: $P_C(Q)$ is simply computed via its complement. I state the theorem this way because the induction proof follows directly from this form.
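As a minimal sketch of how the recurrence in the theorem can be evaluated exactly, the following unrolls it from a base case. The base case $P_{\neg C}(0) = 1$ and the function name `collision_probability` are my own assumptions for illustration; they are not part of the statement above.

```python
from fractions import Fraction


def collision_probability(Q: int, M: int) -> Fraction:
    """Exact P_C(Q) via P_notC(q) = (M - (q - 1)) / M * P_notC(q - 1)."""
    # Assumed base case: with zero draws there is certainly no collision.
    p_no_collision = Fraction(1)
    for q in range(1, Q + 1):
        # The q-th draw must avoid the q - 1 distinct values already chosen.
        p_no_collision *= Fraction(M - (q - 1), M)
    return 1 - p_no_collision


if __name__ == "__main__":
    # Classic birthday setting: M = 365 days, Q = 23 people.
    print(float(collision_probability(23, 365)))
```

Using `Fraction` keeps the result exact, which is convenient as a reference value when judging an approximation later.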

Theorem: $$P_C(Q) \approx 1 - e^{-\tfrac{(Q-1)Q}{2M}}.$$

Proof: We know $$e^{-x} = 1 - x + \frac{x^2}{2!} - \frac{x^3}{3!} + \frac{x^4}{4!} - \ldots$$

If we take only the first two terms of this expansion, we get $e^{-x} \approx 1 - x$. Then

\begin{align} P_{\neg C}(Q)&= \prod_{i=1}^{Q-1} \left(1 - \dfrac{i}{M}\right)\\ &\approx \prod_{i=1}^{Q-1} e^{-i/M} \\ &= e^{-1/M} e^{-2/M} \dotsc\ e^{-(Q - 1)/M} \\ &= e^{-\sum_{i=1}^{Q-1} i/M} \\ &= e^{-\dfrac{1}{M} (Q-1)Q/2}\\ &= e^{-\dfrac{(Q-1)Q}{2M}}, \end{align}

so $P_C(Q) = 1 - P_{\neg C}(Q) \approx 1 - e^{-\tfrac{(Q-1)Q}{2M}},$ as desired.
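To see the approximation next to the exact product in a concrete case, here is a small sketch. The function names and the choice $M = 365$, $Q = 23$ are illustrative assumptions only.

```python
import math


def p_no_collision_exact(Q: int, M: int) -> float:
    # Product of (1 - i/M) for i = 1, ..., Q - 1.
    return math.prod(1 - i / M for i in range(1, Q))


def p_no_collision_approx(Q: int, M: int) -> float:
    # Exponential approximation e^{-(Q-1)Q/(2M)}.
    return math.exp(-(Q - 1) * Q / (2 * M))


if __name__ == "__main__":
    M, Q = 365, 23  # illustrative values, not from the question
    exact = 1 - p_no_collision_exact(Q, M)
    approx = 1 - p_no_collision_approx(Q, M)
    print("exact P_C :", exact)
    print("approx P_C:", approx)
    print("difference:", exact - approx)
```

For these values the exact collision probability is roughly 0.507 while the approximation gives roughly 0.500. The approximation never exceeds the exact value, because $1 - x \leq e^{-x}$ makes the estimated no-collision probability too large.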

Question: How can I get (at least) some notion of how far off from the true value I am if I use this estimate to compute the probability of a collision in a concrete case?

Squeamish Ossifrage
user45491

2 Answers


For $x\in [0,1)$, which is our case with $x = i/M$, $$ 0 \leq e^{-x}-(1-x) \leq x^2/2, $$ so each factor satisfies $1 \leq e^{-x}/(1-x) \leq 1 + \frac{x^2}{2(1-x)}$. Writing $\widehat{P}_{\neg C}(Q) = e^{-(Q-1)Q/(2M)}$ for the estimate, the relative multiplicative error between the estimate and the actual answer satisfies, for $Q \leq M$, $$ 1 \leq \frac{\widehat{P}_{\neg C}(Q)}{P_{\neg C}(Q)} \leq \prod_{i=1}^{Q-1} \left(1 + \frac{(i/M)^2}{2(1-(i/M))}\right) \leq \exp\left(\frac{(Q-1)Q(2Q-1)}{12\, M\, (M-Q+1)}\right), $$ by using $1 + y \leq e^y$, the sum of the first $Q-1$ squares in the numerator, and $1 - i/M \geq 1 - (Q-1)/M$ in the denominator. So, when this exponent is small, $$ \frac{Q^3}{6 M (M-Q)} $$ is a good approximation to the relative error in $P_{\neg C}(Q)$.
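As a numerical sanity check on the size of the relative error, the sketch below computes $\log\bigl(\widehat{P}_{\neg C}(Q)/P_{\neg C}(Q)\bigr)$ directly and compares it with the sum of the second-order Taylor terms $x_i^2/2$, where $x_i = i/M$. The function names and the example values are my own assumptions. It relies only on the identity $-x - \log(1-x) = x^2/2 + x^3/3 + \dots$ for $0 \leq x < 1$.

```python
import math


def log_ratio(Q: int, M: int) -> float:
    # log(estimate / exact) = sum over i of [ -i/M - log(1 - i/M) ].
    return sum(-i / M - math.log1p(-i / M) for i in range(1, Q))


def second_order_term(Q: int, M: int) -> float:
    # Sum of x_i^2 / 2 with x_i = i/M, the leading term of the log-ratio.
    return sum((i / M) ** 2 / 2 for i in range(1, Q))


if __name__ == "__main__":
    M, Q = 365, 23  # illustrative values
    print("log(estimate / exact):", log_ratio(Q, M))
    print("sum of x_i^2 / 2     :", second_order_term(Q, M))
```

The second-order sum is a lower bound on the log-ratio, and the two are close whenever $Q$ is small compared with $M$.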

kodlu

kodlu provided an approximation to the error term, but you might also be interested in firm bounds on the collision probability, which you can get without diving into the higher-order terms of the Taylor expansion of $e^{-x}$. How?

  1. You are guaranteed that $1 - x \leq e^{-x}$, for all $x$.

    Proof. Let $f(x) = e^{-x} + x - 1$; the claim is that $f(x)$ is nonnegative everywhere. $f'(x) = 1 - e^{-x}$ is zero only at $x = 0$, so the only possible extreme point is $x = 0$ where $f$ is zero; at, e.g., $x = 1$ and $x = -1$, $f(x)$ is positive, so $f$ is positive on both sides of $x = 0$ and nonnegative everywhere.

Consequently, taking $x = i/M$, you can bound each factor by $1 - i/M \leq e^{-i/M}$ and thus $P_{\lnot C}(Q) \leq e^{-Q (Q - 1)/(2M)}$.

  2. You are also guaranteed that $e^{-2x} \leq 1 - x$, as long as $0 \leq x \leq 1/2$.

    Proof. Let $g(x) = 1 - x - e^{-2x}$; the claim is that $g(x)$ is nonnegative for all $x \in [0, 1/2]$. $g'(x) = 2 e^{-2x} - 1$ is zero only at $x = \frac 1 2 \log 2 \in [0, 1/2]$, so $g$ has only one critical point, where it is positive, since $g(\frac 1 2 \log 2) = 1 - \frac 1 2 \log 2 - 1/2 > 0$. At the endpoints, $g(0) = 0$ and $g(1/2) = 1/2 - 1/e > 0$, so $g$ is nonnegative on the whole interval.

Consequently, if $Q < M/2$, so that $i/M \leq 1/2$ for every $i < Q$, you can bound each factor by $1 - i/M \geq e^{-2i/M}$ and thus $P_{\lnot C}(Q) \geq e^{-Q(Q - 1)/M}$.

Putting the inequalities together, if $Q (Q - 1) < M/8$, we have $$1 - \frac{Q (Q - 1)}{M} \leq e^{-Q(Q - 1)/M} \leq P_{\lnot C}(Q) \leq e^{-Q(Q - 1)/(2M)} \leq 1 - \frac{Q (Q - 1)}{4M},$$ or equivalently $$\frac{Q (Q - 1)}{4M} \leq P_C(Q) \leq \frac{Q (Q - 1)}{M}.$$
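A small sketch of how these firm bounds can be checked numerically in a concrete case. The function names and the example values $M = 2^{32}$, $Q = 1000$ are assumptions for illustration; the precondition is the one stated above, $Q(Q-1) < M/8$.

```python
import math


def collision_bounds(Q: int, M: int) -> tuple[float, float]:
    """Return (lower, upper) bounds on P_C(Q), valid when Q(Q-1) < M/8."""
    if not Q * (Q - 1) < M / 8:
        raise ValueError("bounds require Q(Q - 1) < M/8")
    lower = Q * (Q - 1) / (4 * M)
    upper = Q * (Q - 1) / M
    return lower, upper


def collision_probability_float(Q: int, M: int) -> float:
    # Sum logs of the factors (1 - i/M) and use expm1 so that very small
    # collision probabilities are not lost to floating-point cancellation.
    log_p_no_collision = sum(math.log1p(-i / M) for i in range(1, Q))
    return -math.expm1(log_p_no_collision)


if __name__ == "__main__":
    M, Q = 2**32, 1000  # illustrative: 1000 draws from a 32-bit space
    lower, upper = collision_bounds(Q, M)
    p = collision_probability_float(Q, M)
    print(lower, "<=", p, "<=", upper)
```

The two bounds differ by a factor of 4, so they pin down the order of magnitude of the collision probability rather than its precise value.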

Squeamish Ossifrage