
I asked a somewhat related question recently and then became interested in this one: how many people are required, on average, until 3 share a birthday? More generally, if we have $M$ bins, what is the expected number of balls we must toss before some bin contains exactly 3 balls?
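A quick Monte Carlo estimate makes the question concrete. This is a minimal sketch, not from the original post. It assumes tosses are independent and uniform over the $M$ bins. The helper names `balls_until_triple` and `simulate`, the seed, and the trial count are illustrative choices.

```python
import random
from statistics import mean


def balls_until_triple(M, rng):
    """Hypothetical helper: toss balls uniformly into M bins and return the
    number of tosses at which some bin first holds 3 balls."""
    counts = [0] * M
    tosses = 0
    while True:
        tosses += 1
        b = rng.randrange(M)      # uniform, independent toss (assumption)
        counts[b] += 1
        if counts[b] == 3:
            return tosses


def simulate(M, trials=10_000, seed=1):
    """Hypothetical helper: average of `trials` independent runs."""
    rng = random.Random(seed)
    return mean(balls_until_triple(M, rng) for _ in range(trials))


print(simulate(365))  # a noisy estimate of E(T) for M = 365
```

With $10^4$ trials the standard error is a few tenths, so this only gives a ballpark figure to compare exact answers against.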

The straightforward techniques used in the question cited above seem hard to apply here, because counting the configurations in which every bin holds at most 2 balls is quite a bit messier.

Question: what is the expected number of balls needed to obtain a bin with 3 balls in it?


To expand a bit on the "messy" comment above: the EGF for the number of ways to toss $n$ balls into $M$ bins with no bin containing more than 2 balls is $$(1+z+z^2/2)^M$$

So, for example, if we have $M=3$ bins, the number of ways to arrange $n$ balls (with no bin having more than 2 balls) is the coefficient of $z^n/n!$ (that is, $n!$ times the coefficient of $z^n$) in $$(1+z+z^2/2)^3 = 1/0! + 3 z/1! + 9z^2/2! + 24z^3/3! + 54z^4/4! + 90z^5/5! + 90z^6/6!$$
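The expansion above can be reproduced mechanically. The following is a minimal sketch, not from the post, that multiplies out $(1+z+z^2/2)^M$ with exact rational arithmetic and scales the coefficient of $z^n$ by $n!$. The function name `egf_counts` is hypothetical.

```python
from fractions import Fraction
from math import factorial


def egf_counts(M):
    """Hypothetical helper: return a_n = n! [z^n] (1 + z + z^2/2)^M for n = 0..2M,
    i.e. the number of ways to put n labeled balls into M bins with at most 2 per bin."""
    base = [Fraction(1), Fraction(1), Fraction(1, 2)]   # coefficients of 1 + z + z^2/2
    poly = [Fraction(1)]                                 # running product, starts at 1
    for _ in range(M):
        nxt = [Fraction(0)] * (len(poly) + 2)
        for i, c in enumerate(poly):
            for j, d in enumerate(base):
                nxt[i + j] += c * d                      # polynomial multiplication
        poly = nxt
    # Multiply each ordinary coefficient by n! to read off the EGF counts.
    return [int(c * factorial(n)) for n, c in enumerate(poly)]


print(egf_counts(3))  # [1, 3, 9, 24, 54, 90, 90]
```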

So if I want the number of 4-letter words over the alphabet $\{a,b,c\}$ in which no letter occurs more than twice, it is the coefficient of ${z^4/4!}$, which is 54 according to this formula. We can verify this directly. Of the $3^4=81$ 4-letter words, three have all characters the same (aaaa, bbbb, and cccc). For words with three letters the same and one different, there are 4 ways to choose the position of the different letter, 3 choices for the letter that is repeated, and then 2 choices for the different letter. So we have $81-3-4\cdot 3\cdot 2 = 54$. Of course, this gets complicated as $M$ increases.
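The hand count can also be confirmed by brute-force enumeration. A minimal sketch (the name `count_words` is hypothetical) that lists all $3^n$ words and keeps those where no letter appears more than twice:

```python
from itertools import product


def count_words(alphabet, length, max_repeat=2):
    """Hypothetical helper: count words of the given length over `alphabet`
    in which no letter appears more than `max_repeat` times."""
    return sum(
        1
        for word in product(alphabet, repeat=length)
        if max(word.count(ch) for ch in alphabet) <= max_repeat
    )


print(count_words("abc", 4))  # 54
```

Enumeration only works for tiny cases (it visits all $M^n$ words), which is exactly why the EGF is useful.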

How you get from this EGF to an asymptotic estimate of the expectation is beyond me.
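One standard way to turn the EGF into a number (an addition, not from the post): $T>n$ exactly when the first $n$ tosses leave every bin with at most 2 balls, so $P(T>n) = a_n/M^n$ with $a_n = n![z^n](1+z+z^2/2)^M$, and $\mathbb{E}(T)=\sum_{n\ge0}P(T>n)$. Expanding by the multinomial theorem, with $a$ bins holding one ball and $b$ bins holding two, gives $a_n=\sum_{a+2b=n}\frac{M!}{a!\,b!\,(M-a-b)!}\cdot\frac{n!}{2^b}$. The sketch below evaluates each term in log space with `math.lgamma` to avoid overflow; the function name is hypothetical.

```python
from math import exp, lgamma, log


def expected_triple_time(M):
    """Hypothetical helper: E(T) = sum_{n>=0} a_n / M^n, where
    a_n = sum over (a singles, b doubles), a + 2b = n, of
          M! / (a! b! (M-a-b)!) * n! / 2^b.
    Each (a, b) pair contributes to exactly one n, so summing over all pairs
    gives the whole series. Terms are computed in log space."""
    total = 0.0
    for b in range(M + 1):              # bins holding exactly 2 balls
        for a in range(M - b + 1):      # bins holding exactly 1 ball
            n = a + 2 * b               # balls tossed
            log_term = (
                lgamma(M + 1) - lgamma(a + 1) - lgamma(b + 1) - lgamma(M - a - b + 1)
                + lgamma(n + 1) - b * log(2) - n * log(M)
            )
            total += exp(log_term)
    return total


print(expected_triple_time(3))
print(expected_triple_time(365))
```

For $M=3$ the counts $1,3,9,24,54,90,90$ give $\mathbb{E}(T)=409/81\approx5.0494$. For $M=365$ the sum should agree with the value of about 88.73891 quoted in the note.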


Note: Byron's answer below settles this. For anyone interested, the result generalizes to "k-wise collisions" (the first time some bin holds $k$ balls) as

$$E(M,k) \approx \sqrt[k]{k!}\ \Gamma(1 + 1/k)\ M^{1-1/k}$$

where setting $k=3$ yields the result in Byron's answer below. Of course, this is an asymptotic result that improves as $M$ increases. For $M=365$ the formula gives about 82.87, whereas the exact answer is about 88.73891.
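The leading-order formula is easy to evaluate directly. A minimal sketch (the name `asymptotic_E` is hypothetical) using only the standard `math` module:

```python
from math import factorial, gamma


def asymptotic_E(M, k):
    """Hypothetical helper: leading-order approximation
    E(M, k) ~ (k!)^(1/k) * Gamma(1 + 1/k) * M^(1 - 1/k)."""
    return factorial(k) ** (1 / k) * gamma(1 + 1 / k) * M ** (1 - 1 / k)


print(round(asymptotic_E(365, 3), 2))  # 82.87
print(round(asymptotic_E(365, 2), 2))  # 23.94
```

For $k=2$ this reduces to $\sqrt{\pi M/2}$, the familiar leading term for the ordinary birthday problem.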

Asked by Fixee; edited by Henry.
  • Do you want the expected 'time' of a third collision, or the point at which the probability that there is a bin with three balls is >50%? (It's the latter that's classically quoted for the birthday problem). The two are likely to be different, and I think the latter is a bit easier to solve... – Steven Stadnicki Jun 15 '13 at 00:50
  • I am asking for the expected number of balls at which a 3-way collision occurs. But I would be happy to learn the "median" value, which is the number of balls at which a 3-way collision has probability $\approx$ 1/2. – Fixee Jun 15 '13 at 00:53
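The median discussed in these comments can be computed exactly from the same EGF. $P(T>n)$ is the chance that $n$ balls leave every bin with at most 2, namely $a_n/M^n$ with $a_n=n![z^n](1+z+z^2/2)^M$. The median is the smallest $n$ with $P(T>n)\le 1/2$. This is a minimal sketch, not from the thread; the helper names are hypothetical, and terms are evaluated in log space to avoid overflow.

```python
from math import exp, lgamma, log


def prob_no_triple(M, n):
    """Hypothetical helper: P(T > n), the probability that n balls in M bins
    leave no bin with 3 or more balls (sum over b bins with 2 balls, a with 1)."""
    total = 0.0
    for b in range(n // 2 + 1):          # bins with exactly 2 balls
        a = n - 2 * b                    # bins with exactly 1 ball
        if a + b > M:
            continue                     # not enough bins for this configuration
        total += exp(
            lgamma(M + 1) - lgamma(a + 1) - lgamma(b + 1) - lgamma(M - a - b + 1)
            + lgamma(n + 1) - b * log(2) - n * log(M)
        )
    return total


def median_triple_time(M):
    """Hypothetical helper: smallest n with P(T <= n) >= 1/2."""
    n = 0
    while prob_no_triple(M, n) > 0.5:
        n += 1
    return n


print(median_triple_time(365))  # 88
```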

1 Answer


The expected time to the first triple collision is $$\mathbb{E}(T) = \int_0^\infty \left(1+{x\over M}+{x^2\over2M^2}\right)^M \,e^{-x}\,dx.\tag1$$

In my answer to the earlier question, I derived the formula for double collisions, and the extension to triple collisions is straightforward. From equation (1), we see that $\mathbb{E}(T)$ grows like a multiple of $M^{2/3}$.
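The $M^{2/3}$ growth can be made explicit with a Laplace-type expansion. The following is a heuristic sketch (lower-order terms are dropped without proof), not a step taken from the answer. Writing $u=x/M$ and using $\log\left(1+u+\tfrac{u^2}{2}\right)=u-\tfrac{u^3}{6}+O(u^4)$,

$$\left(1+{x\over M}+{x^2\over2M^2}\right)^M e^{-x} = \exp\!\left(M\log\left(1+{x\over M}+{x^2\over2M^2}\right)-x\right)\approx \exp\!\left(-{x^3\over 6M^2}\right),$$

so, substituting $x=(6M^2)^{1/3}s$ and using $\int_0^\infty e^{-s^k}\,ds=\Gamma(1+1/k)$,

$$\mathbb{E}(T)\approx\int_0^\infty e^{-x^3/(6M^2)}\,dx = 6^{1/3}M^{2/3}\int_0^\infty e^{-s^3}\,ds = 6^{1/3}\,\Gamma(4/3)\,M^{2/3}.$$

Replacing $1+u+u^2/2$ by $\sum_{j<k}u^j/j!$ gives the exponent $-x^k/(k!\,M^{k-1})$ and hence the general $\sqrt[k]{k!}\,\Gamma(1+1/k)\,M^{1-1/k}$ form.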


I turned the page and noticed that in section 15.3 of Problems and Snapshots from the World of Probability by Blom, Holst, and Sandell, the authors give the precise asymptotic result $$\mathbb{E}(T)\approx 6^{1/3}\,\Gamma(4/3)\, M^{2/3}\approx 1.6226\,M^{2/3}.$$
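Equation (1) can also be evaluated numerically and set against the asymptotic constant. This is a minimal sketch, not from the answer. It assumes composite Simpson's rule on a truncated range is accurate enough; the cutoff $40\,M^{2/3}$, the step count, and the function name are illustrative choices.

```python
from math import exp, gamma, log1p


def expected_T_integral(M, n_steps=20_000):
    """Hypothetical helper: evaluate
    E(T) = int_0^inf (1 + x/M + x^2/(2M^2))^M e^{-x} dx
    by composite Simpson's rule. The range is truncated at 40 * M^(2/3),
    where the integrand is negligible (for x much smaller than M it behaves
    like exp(-x^3 / (6 M^2))). The integrand is computed in log space."""
    upper = 40 * M ** (2 / 3)
    h = upper / n_steps                  # n_steps must be even for Simpson's rule

    def f(x):
        return exp(M * log1p(x / M + x * x / (2 * M * M)) - x)

    s = f(0.0) + f(upper)
    for i in range(1, n_steps):
        s += (4 if i % 2 else 2) * f(i * h)   # Simpson weights 4, 2, 4, ...
    return s * h / 3


c = 6 ** (1 / 3) * gamma(4 / 3)          # asymptotic constant, about 1.6226
for M in (365, 3650, 36500):
    print(M, expected_T_integral(M) / M ** (2 / 3), c)
```

Printing $\mathbb{E}(T)/M^{2/3}$ for several $M$ alongside the limiting constant shows how large the lower-order terms are at practical sizes.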

  • Thanks, Byron. I had guessed that $E(T) \approx c M^{2/3}$, but simulations I ran showed $c$ growing slightly with $M$ so I thought the $2/3$ exponent was a tad low. It could instead be the effect of lower-order terms in the asymptotics. – Fixee Jun 15 '13 at 01:12
  • Following the link to your other question, and then to the Sedgewick/Flajolet book, and then a reference from there, I found a paper that gives the derivation for this result: http://www.sciencedirect.com/science/article/pii/S0021980067800759 – Fixee Jun 15 '13 at 01:40
  • @Fixee That was some good hunting! –  Jun 15 '13 at 01:44