
Say we have a hash function that produces $n$-bit outputs. From the birthday problem, we know that after around $\sqrt{2^n} = 2^{n/2}$ different inputs to the hash function, we can expect a collision.

Say instead that we have $m$ outputs. How many collisions can we expect in those $m$ outputs?

If there are $k$ inputs that all have the same output, we say there are $\binom{k}{2}$ collisions.
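As a minimal sketch of this counting convention (the function name `count_collisions` is an illustrative choice, not something from the question), the outputs can be grouped by value and each group of size $k$ charged $\binom{k}{2}$ collisions:

```python
from collections import Counter
from math import comb


def count_collisions(outputs):
    """Count colliding pairs among a list of hash outputs.

    Each group of k equal outputs contributes C(k, 2) collisions,
    following the convention stated above.
    """
    group_sizes = Counter(outputs).values()
    return sum(comb(k, 2) for k in group_sizes)


# Three inputs map to "a" (3 pairs) and two map to "b" (1 pair).
print(count_collisions(["a", "b", "a", "a", "c", "b"]))  # 4
```

Under this convention, three inputs sharing one output count as three collisions, not one.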

mikeazo

1 Answer


The expected number of collisions (assuming that the hash function can be modeled as a random function) is precisely $2^{-n}\binom{m}{2}$; that is, the expected number of pairs of values $x \ne y$ with $H(x) = H(y)$ (and so, to answer Ricky's question, $H(x) = H(y) = H(z)$ would count as three collisions).

The reasoning is the obvious one: there are $\binom{m}{2}$ distinct pairs, and each pair collides with probability $2^{-n}$ (and hence contributes an expected $2^{-n}$ collisions). By linearity of expectation, the expected value of a sum of random variables is the sum of their individual expectations. The collision events are not mutually independent (for example, if $H(x)=H(y)$ and $H(y)\ne H(z)$, we know that $H(x) \ne H(z)$), but linearity of expectation doesn't require independence.
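The formula can be sanity-checked numerically. The following is a rough Monte Carlo sketch, assuming the hash is modeled by drawing uniform random $n$-bit values; the function names, the trial count, and the parameters $n = 16$, $m = 500$ are arbitrary illustrative choices:

```python
import random
from collections import Counter
from math import comb


def expected_collisions(n_bits, m):
    """Closed-form expectation: C(m, 2) / 2^n."""
    return comb(m, 2) / 2**n_bits


def simulate_collisions(n_bits, m, trials=2000, seed=0):
    """Average number of colliding pairs over many random trials.

    Assumption: each hash output is an independent uniform n-bit value,
    i.e. the hash behaves like a random function on distinct inputs.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        outputs = [rng.getrandbits(n_bits) for _ in range(m)]
        # A group of k equal outputs counts as C(k, 2) collisions.
        total += sum(comb(k, 2) for k in Counter(outputs).values())
    return total / trials


if __name__ == "__main__":
    n_bits, m = 16, 500
    print("formula:  ", expected_collisions(n_bits, m))
    print("simulated:", simulate_collisions(n_bits, m))
```

With enough trials the simulated average should land close to the closed-form value, even though the individual pair events are not mutually independent.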

poncho