
A lot of questions about the birthday problem can be found here, but none seems to address my problem:

Background

I am designing a hash-type data structure that tolerates a certain number of collisions. Collisions are detected and handled in a second data structure with a substantially lower collision probability. The number I'm interested in is the number of datasets that end up in the second data structure.

Question

I do not care whether there are 4 'children' sharing the same 'day' or 2 pairs of children where each pair shares its own birthday. Both cases count as 4 children being involved in collisions. I also do not need exact results; approximations would be fine.

My Question is:

Given $n$ persons and $m$ possible days for birthdays, how can I calculate the probability that $k = 2, 3, 4, 5, \ldots$ persons are involved in collisions?

Clarification

Apparently, the main problem is that I need to handle pretty big values. My dimensions are about:

"Let's say a year had 100,000 days (alternatively 1,000,000 days). Then think of a class of 50,000 kids. What's the probability that everyone has a unique birthday? What's the probability of 1-20, 20-50, 50-100 kids not having a unique birthday?"

As I said, the results need not be perfectly exact.

philipp
  • See #15 of http://www.randomservices.org/random/urn/Birthday.html#General. – vadim123 Sep 29 '14 at 13:50
  • @vadim: Thanks. This is exactly what I am looking for. However, the magnitude of the values I need to calculate seems to exceed the capabilities even of Python's BigFloat package. I clarified the question; maybe someone has an idea how to get estimates for numbers of that magnitude. – philipp Sep 29 '14 at 17:26
  • You could approximate your summands with Stirling's formula. – vadim123 Sep 29 '14 at 18:15

1 Answer


I found a sensible approximation using the Poisson distribution:

The idea is taken from Wikipedia.

Simplification:

With $m$ possible days, a random pair of children shares a birthday with probability $\frac{1}{m}$.

$n$ children form $\frac{n(n-1)}{2}$ pairs that can be tested for identical birthdays.

This leads to an expected number of colliding pairs $\lambda$ (the parameter of the Poisson distribution) of $$ \lambda = \frac{n(n-1)}{2m} $$
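As a worked example, plugging in the numbers from the question ($n = 50{,}000$ children and $m = 100{,}000$ or $m = 1{,}000{,}000$ days) gives:

$$ \lambda = \frac{50{,}000 \cdot 49{,}999}{2 \cdot 100{,}000} = 12{,}499.75 \qquad \text{and} \qquad \lambda = \frac{50{,}000 \cdot 49{,}999}{2 \cdot 1{,}000{,}000} = 1{,}249.975 $$

For both values $e^{-\lambda}$ is too small to be represented as a double-precision float (it underflows to zero), so at this scale it seems safer to work with logarithms of the probabilities.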

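Then we can use the Poisson formula to get the probability of exactly $k$ colliding pairs. As long as collisions are rare, each such pair consists of two otherwise unique children, so $k$ colliding pairs correspond to roughly $2k$ children with a non-unique birthday: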

$$ P(k) = \frac{\lambda^k}{k!}e^{-\lambda} $$
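For large $\lambda$ the terms $\lambda^k$, $k!$ and $e^{-\lambda}$ overflow or underflow long before their product does. A minimal sketch of one way around this, assuming plain Python with only the standard `math` module, is to evaluate $\ln P(k)$ with `math.lgamma` and to sum ranges of $k$ relative to the largest term. The function names `poisson_log_pmf` and `poisson_range_prob` are made up for this illustration.

```python
import math


def poisson_log_pmf(k: int, lam: float) -> float:
    """Natural log of lam**k / k! * exp(-lam), for lam > 0 and k >= 0."""
    # lgamma(k + 1) equals ln(k!), so no huge factorial is ever formed.
    return k * math.log(lam) - math.lgamma(k + 1) - lam


def poisson_range_prob(k_lo: int, k_hi: int, lam: float) -> float:
    """Sum of P(k) for k_lo <= k <= k_hi, computed relative to the largest term."""
    logs = [poisson_log_pmf(k, lam) for k in range(k_lo, k_hi + 1)]
    top = max(logs)
    # Each exp(v - top) lies in (0, 1]; exp(top) may still underflow to 0.0
    # when the whole range is astronomically unlikely.
    return math.exp(top) * sum(math.exp(v - top) for v in logs)


if __name__ == "__main__":
    n, m = 50_000, 1_000_000
    lam = n * (n - 1) / (2 * m)
    print("lambda =", lam)
    print("ln P(0) =", poisson_log_pmf(0, lam))
    print("P(1200 <= k <= 1300) =", poisson_range_prob(1200, 1300, lam))
```

The logarithm stays meaningful even where the probability itself is reported as `0.0`, which is why the sketch exposes `poisson_log_pmf` separately.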

An over-simplified check of the accuracy: in a class of 20 kids with the usual $m = 365$ days, the chance that everyone has a unique birthday is 0.5886 according to Wolfram Alpha. The Poisson approximation $P(0) = e^{-\lambda}$ gives 0.5942.
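The same check can be repeated for other values of $n$ and $m$. The following is a rough sketch, assuming 365 days for the class of 20; it uses the exact product $\prod_{i=0}^{n-1} (1 - i/m)$ evaluated as a sum of logarithms, so it also works for very large inputs. The helper names are hypothetical.

```python
import math


def log_prob_all_unique(n: int, m: int) -> float:
    """Exact ln P(all n birthdays distinct) = sum of ln(1 - i/m), i = 0..n-1."""
    if n > m:
        return -math.inf  # pigeonhole: a collision is certain
    return sum(math.log1p(-i / m) for i in range(n))


def log_prob_all_unique_poisson(n: int, m: int) -> float:
    """Poisson estimate ln P(0) = -lambda with lambda = n(n-1)/(2m)."""
    return -n * (n - 1) / (2 * m)


# Assumed setting for the class of 20: m = 365 days.
n, m = 20, 365
exact = math.exp(log_prob_all_unique(n, m))
approx = math.exp(log_prob_all_unique_poisson(n, m))
print(f"exact = {exact:.4f}, Poisson = {approx:.4f}")
```

Because $\ln(1 - x) \le -x$, the exact log-probability can never exceed the Poisson estimate, and the gap widens as $n/m$ grows. For a class of 50,000 in a year of 100,000 days the two logarithms should therefore be compared directly before relying on the approximation.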

philipp