
This question is inspired by the Birthday paradox.

Suppose we have a sample space $S$ of $n$ elements. Is there a probability distribution $\mu$ on $S$ such that $$P(\text{you pick $k$ elements of $S$, according to $\mu$, without repeating}) > P(\text{you pick $k$ elements of $S$ uniformly, without repeating})$$

To avoid triviality, assume $1 < k < n$. (I assume the answer is no; I tried checking by hand but got stuck.)

The reason I ask is that I am going to teach the Birthday Paradox in class on Friday and, of course, I'm going to assume that a person's birthday is uniformly distributed throughout the year. This is empirically false, and I guess that the bias towards certain days increases the likelihood of a shared birthday. I don't want to claim this in class without proof, though.

Henry
James
  • A constant $\mu$ would always result in $k=1$, hence the probability for any $k>1$ is $0$. Could it be you want to compare the cumulative probabilities? – Hagen von Eitzen Sep 01 '16 at 15:29
  • See http://math.stackexchange.com/questions/177692/birthday-paradox-for-non-uniform-distributions -- in particular, the comment by Byron Schmuland. – Barry Cipra Sep 01 '16 at 15:33
  • @HagenvonEitzen I think you misunderstood. $k$ is supposed to be fixed. Certainly the probability is not $0$ for $k>1$; for instance, picking two elements from $\{1,2,3\}$ uniformly gives a $2/3$ probability of not repeating, which is $>0$. – James Sep 01 '16 at 15:43
  • @BarryCipra Thank you, I can boldly make the claim now. – James Sep 01 '16 at 15:43
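
For the $k=2$ case raised in the comments, the comparison can be done by hand. This is only a sketch for $k=2$, writing $p_i$ for the probability that $\mu$ assigns to the $i$-th element of $S$ (notation introduced here, not in the question):

$$P(\text{no repeat}) = \sum_{i \ne j} p_i p_j = 1 - \sum_{i=1}^{n} p_i^2 \le 1 - \frac{1}{n},$$

since $\sum_i p_i^2 \ge \frac{1}{n}\left(\sum_i p_i\right)^2 = \frac{1}{n}$ by the Cauchy-Schwarz inequality, with equality exactly when $\mu$ is uniform. For $n=3$ the bound is $2/3$, matching the value in the comment above.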

1 Answer


Comment: Perhaps something to show your class.

US Monthly Birthrates: Jan'97--Dec'99. Source: National Center for Health Statistics. [Ack: This is Fig. 1.4 in Suess and Trumbo (2010).]

[Figure: US monthly birthrates, Jan'97--Dec'99]

A simulation with the actual monthly distribution of birthdays shows that US birthrates are close enough to uniform that the probability of no match among about 20-30 people is changed by perhaps one digit in the 2nd decimal place. (There is, of course, a breakdown point. You wouldn't want to apply the results to a meeting of the Sagittarian Society.)


Simulation in R statistical software of $P(\text{No match}) = 0.434 \pm 0.004$ for uniform birthdays with $n = 25$ people:

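n = 25;  m = 10^5;  x = numeric(m)      # n people per room, m simulated rooms
for (i in 1:m)  {
  b = sample(1:365, n, replace=TRUE)    # n birthdays, uniform on 365 days
  x[i] = n - length(unique(b))  }       # number of repeated birthdays
mean(x==0)                              # proportion of rooms with no match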
## 0.43438

Exact computation:

prod(1 - (0:24)/365)
## 0.4313003

Simulation for birthdays slightly farther from uniform than actual US birthdays:

n = 25;  m = 10^5;  x = numeric(m)
w = c(rep(4,200), rep(5,165), 1)  # 366 birthrate weights: 200 days at 4, 165 at 5, 1 at 1
for (i in 1:m)  {
  b = sample(1:366, n, replace=TRUE, prob=w)  # birthdays drawn in proportion to w
  x[i] = n - length(unique(b))  }             # number of repeated birthdays
mean(x==0)                                    # proportion of rooms with no match
## 0.4253
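
As a cross-check on the simulated value, the no-match probability for unequal birthrates can also be computed exactly. The following is a minimal sketch, not part of the original answer: the function name `p_no_match` is hypothetical, and it relies on the identity $P(\text{No match}) = n!\, e_n(p_1, \dots, p_d)$, where $e_n$ is the degree-$n$ elementary symmetric polynomial of the day probabilities.

```r
# Exact P(no shared birthday) among n people (n >= 1) when day i has
# probability proportional to w[i].  Hypothetical helper, for illustration.
p_no_match = function(n, w) {
  p = w / sum(w)            # normalize weights to probabilities
  e = c(1, numeric(n))      # e[j+1] holds e_j over the days processed so far
  for (q in p) {
    # e_j(new) = e_j(old) + q * e_{j-1}(old); the right side is evaluated
    # in full before assignment, so only old values are used
    e[2:(n+1)] = e[2:(n+1)] + q * e[1:n]
  }
  factorial(n) * e[n+1]     # n! orderings of each set of n distinct days
}

p_no_match(25, rep(1, 365))                       # uniform on 365 days
p_no_match(25, c(rep(4,200), rep(5,165), 1))      # weights used in the simulation
```

With equal weights this should reproduce `prod(1 - (0:24)/365)` up to rounding, which gives a quick sanity check before trusting it on unequal weights.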

In practice, probability modeling involves making assumptions, usually of two types: (a) Assumptions we hope are true, (b) Assumptions we know are false, but hope won't produce serious errors.

In the birthday problem, an example of (a) would be that the 25 subjects are randomly chosen, and an example of (b) would be that there are 365 equally likely birthdays in the population. We have shown by simulation that our hopes for (b) are realistic.

BruceET