
I ran across the following problem in Casella and Berger's Statistical Inference (Q1.20, 2nd ed):

My telephone rings 12 times each week, the calls being randomly distributed among the 7 days. What is the probability that I get at least one call each day? (Answer: .2285)

This seems to be equivalent to putting 12 balls into 7 boxes so that there is at least 1 ball in each box. In that case, this should be a fairly straightforward selection-with-repetition problem: since we have at least 1 ball in each box, we must actually count the number of ways to put 5 balls into 7 boxes, and divide by the number of ways to put 12 balls into 7 boxes, which would give $\binom{11}{6}/\binom{18}{6}$, or about 0.0249.
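The stars-and-bars arithmetic in the question can be reproduced in a few lines of Python. This is only a sketch that checks the arithmetic (`stars_and_bars` is a hypothetical helper name); it says nothing about whether the counted patterns are equally likely, which is exactly what the comments below question.

```python
from math import comb

def stars_and_bars(k: int, n: int) -> int:
    """Number of ways to put k identical balls into n distinct boxes."""
    return comb(k + n - 1, n - 1)

# The approach described above: pre-place one ball per box, leaving 5 balls free
favorable = stars_and_bars(12 - 7, 7)  # C(11, 6)
total = stars_and_bars(12, 7)          # C(18, 6)
ratio = favorable / total

print(favorable, total, round(ratio, 4))  # 462 18564 0.0249
```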

I see an answer given here, "Statistical Inference Question", which approaches it from the bottom up rather than the top down. This method seems reasonable, and I've verified that it gives the authors' desired answer, but what's different about my approach? In both cases, the calls are unordered and identical, the days are ordered and distinct, and repetition is allowed.

mim
  • If each call chooses its day uniformly at random, which I guess is what the question means, not every distribution is equally likely. (12, 0, 0, 0, 0, 0, 0) can only happen one way, while (2, 2, 2, 2, 2, 1, 1) can happen many ways – M T Dec 23 '21 at 00:11
  • I think you're saying that the calls are distinct, not identical; is that it? Sorry, I'm approaching this from a background with some combinatorics and not a lot of statistics, so perhaps I've misread your comment. – mim Dec 23 '21 at 16:38
  • I don't get what you mean by distinct not identical. There are twelve different calls so they are definitely distinct. The point is that your calculation assumes every way of dividing those calls between days is equally likely to occur, and that's wrong. Do the case of two calls and two days for enlightenment. – M T Dec 23 '21 at 21:49
  • Simulation in R: a million iterations give about three-place accuracy. Code: `set.seed(2021); u = replicate(10^6, length(unique(sample(1:7, 12, rep=T)))); mean(u==7)` returns $0.228986$, and the 95% margin of simulation error, obtained from `2*sd(u==7)/1000`, is $0.0008403608.$ So the answer is $0.2290\pm 0.0008$, consistent with the answer $0.2285$ in the text. – BruceET Dec 24 '21 at 08:09
  • @BruceET thanks for the simulation! That's a great way to verify the answer. – mim Dec 27 '21 at 16:40
  • @M T, yep, I think you're right. Looking at it from the point of view of 2 calls in 2 days, Casella and Berger would say the calls are distinguishable, so there are 4 distinct ways they could occur: (12, ∅), (1, 2), (2, 1), and (∅, 12). I was treating them as indistinguishable, so there were only 3 scenarios, (11, ∅), (1, 1), (∅, 11), though of course the second of these occurs twice as often. I was thrown off by the superficial similarity to some probability problems in combinatorics where this doesn't occur (see my answer). – mim Dec 27 '21 at 18:21
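For readers without R, here is a rough Python analogue of BruceET's simulation. It is a sketch, not his code: the function name `estimate_all_days`, the 10^5 trial count, and the use of the binomial formula for the standard deviation are my own choices, and Python's random generator differs from R's, so the seed 2021 will not reproduce his exact figure.

```python
import math
import random

def estimate_all_days(trials, calls=12, days=7, seed=2021):
    """Monte Carlo estimate of P(every day gets at least one call)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each call independently picks a day uniformly (like sample(1:7, 12, rep=T) in R)
        days_hit = {rng.randrange(days) for _ in range(calls)}
        if len(days_hit) == days:
            hits += 1
    p_hat = hits / trials
    # Approximate 95% margin of simulation error: 2 * sd / sqrt(trials),
    # with sd = sqrt(p(1 - p)) for the 0/1 outcomes
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / trials)
    return p_hat, margin

p_hat, margin = estimate_all_days(10**5)
print(f"{p_hat:.4f} +/- {margin:.4f}")
```

With fewer trials than the R version, the margin is correspondingly wider (roughly 0.003 rather than 0.0008).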

2 Answers


Inclusion/exclusion works well.

Take $A_i$ as the event that at least one call is received on day $i$. You're looking to compute the probability of $A_1\cap \dots \cap A_7$. Then $$\begin{eqnarray*}\mathbb{P}\left(A_1 \cap \dots \cap A_7\right) &=& 1- \mathbb{P}\left(\bigcup_{i=1}^7A_i^C\right) \\ &=& 1-\sum_{i=1}^7(-1)^{i-1}{ 7 \choose i}\mathbb{P}\left(A_1^C \cap \dots \cap A_i^C\right) \\ &=& 1-\sum_{i=1}^7(-1)^{i-1}{ 7 \choose i}\left(\frac{7-i}{7}\right)^{12} \\ &\approx & 0.2285\end{eqnarray*}$$
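To check the arithmetic, the sum above can be evaluated exactly in Python. This is a sketch (`prob_every_day` and `surjections` are hypothetical helper names). The second function independently counts the assignments of 12 distinguishable calls to 7 days that leave no day empty, using a small dynamic program over "number of days used so far", and divides by the $7^{12}$ equally likely assignments.

```python
from fractions import Fraction
from math import comb

def prob_every_day(calls: int = 12, days: int = 7) -> Fraction:
    """P(A_1 ∩ ... ∩ A_days) via inclusion/exclusion, as an exact fraction."""
    # P(no calls on a given set of i days) = ((days - i) / days) ** calls
    union_of_complements = sum(
        (-1) ** (i - 1) * comb(days, i) * Fraction(days - i, days) ** calls
        for i in range(1, days + 1)
    )
    return 1 - union_of_complements

def surjections(calls: int, days: int) -> int:
    """Count assignments of labelled calls to days that use every day."""
    ways = [1] + [0] * days  # ways[j]: assignments so far that use exactly j days
    for _ in range(calls):
        new = [0] * (days + 1)
        for j, w in enumerate(ways):
            new[j] += w * j                   # next call lands on an already-used day
            if j < days:
                new[j + 1] += w * (days - j)  # next call lands on a new day
        ways = new
    return ways[days]

exact = prob_every_day()
assert exact == Fraction(surjections(12, 7), 7**12)
print(exact, round(float(exact), 4))  # ... 0.2285
```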

  • Thanks a lot for the answer (and I do prefer your phrasing to the one from https://math.stackexchange.com/questions/13219/statistical-inference-question), but my question is: why is this not equivalent to the solution I suggested, using selections with repetition? – mim Dec 27 '21 at 16:36
  • You need to be super careful when using stars and bars to formulate probabilities. (I learned this from @lulu). The patterns counted by stars and bars are not equiprobable. Consider placing two balls into two bins. The probability that the first bin contains two balls is $1/4$, the probability the second bin contains two balls is also $1/4$, but the probability each bin contains exactly one ball is $1/2$. –  Dec 27 '21 at 16:54
  • Indeed, this does boil down to a problem of distinct vs identical items. I think I've sorted out both why the correct answer should consider the calls to be distinct as you did as well as why I was confused about it; see my answer. There's a class of problems in combinatorics (and, ironically, it's the probability ones!) which is basically the only place it doesn't matter whether the items are identical or distinct. I've always taken that to be a statement about probability being blind to order, but I think it's not, it's just a coincidence that those are the problems where some $r!$ cancels – mim Dec 27 '21 at 18:20
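The two-balls-in-two-bins point can be checked by brute force. The sketch below assumes, as in the problem, that each ball (call) independently picks a bin uniformly at random; it enumerates every labelled assignment and tallies the stars-and-bars pattern it produces (`occupancy_distribution` is a hypothetical name).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def occupancy_distribution(balls: int, bins: int) -> dict:
    """Probability of each occupancy pattern (a stars-and-bars outcome)
    when every labelled ball independently picks a bin uniformly."""
    counts = Counter()
    # All bins**balls assignments of labelled balls are equally likely
    for assignment in product(range(bins), repeat=balls):
        pattern = tuple(assignment.count(b) for b in range(bins))
        counts[pattern] += 1
    total = bins ** balls
    return {pattern: Fraction(n, total) for pattern, n in sorted(counts.items())}

for pattern, p in occupancy_distribution(2, 2).items():
    print(pattern, p)
# (0, 2) 1/4
# (1, 1) 1/2
# (2, 0) 1/4
```

The patterns are exactly the ones stars and bars counts, but they are not equally likely, which is why dividing one stars-and-bars count by another does not give the probability.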

Thanks to all who commented; you helped me think in the right direction. It turns out this is a confusion due to the difference in emphasis between combinatorics and statistics, so I'll answer it in depth for anyone else who runs into it:

When calculating the "probability that an event happens" in statistics, we assume that order matters: the events happen in time, in an order, and are therefore distinct by that fact alone, even if they are otherwise identical. This is in sharp contrast to some probability problems you'd run into in, say, a combinatorics text, where ordered vs. unordered is irrelevant.

What I mean by "distinct" and "identical" is the difference between, e.g.,

How many ways can we place 15 identical balls into the 6 pockets on a billiards table?

and

How many ways can we place 15 numbered billiard balls into the 6 pockets?

The answer to the second is simple: the 1 ball can be put into any of the 6 pockets, then the 2 ball into any of the same 6 pockets, and so on, so there are $6^{15}$ ways to do it. This is an arrangement or permutation or ordered outcome or distribution of distinct objects or list, depending on your choice of terminology. The first question is trickier: you can use the formula $\binom{15+6-1}{15} = \binom{20}{15}$, or you can visualize this as lining up 15 Os with 6-1 Is (each O represents a ball; the Os to the left of the first I are in the first pocket, the ones between the first and second Is are in the second pocket, and so on). This is variously called a selection or combination or unordered outcome or distribution of identical objects or multiset. Of course, there are other combinatorial details implicit here: the pockets themselves are considered distinct (else we'd want a partition, not a distribution), and we're not allowing any repetition (the question was "place 15 balls," not "sink a ball 15 times"), etc.
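The two billiard-ball counts can be computed directly, and stars and bars can be sanity-checked on a small case by listing labelled placements and then forgetting the labels. A sketch; the small case (4 balls, 3 pockets) is an arbitrary choice for illustration.

```python
from itertools import product
from math import comb

balls, pockets = 15, 6

numbered = pockets ** balls                   # each numbered ball picks a pocket: 6^15
identical = comb(balls + pockets - 1, balls)  # stars and bars: C(20, 15)
print(numbered, identical)  # 470184984576 15504

# Small brute-force check (4 balls, 3 pockets)
small = list(product(range(3), repeat=4))                        # labelled placements
patterns = {tuple(p.count(k) for k in range(3)) for p in small}  # labels forgotten
assert len(small) == 3 ** 4
assert len(patterns) == comb(4 + 3 - 1, 4)
```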

In stats, since events occur in time, they're assumed to be distinct, whether or not they are otherwise identical. Casella and Berger actually reference this in passing at the beginning of Ch. 1, when they comment that the statistical events they're studying all occur in an order.

There are certainly situations where ordered and unordered methods give the same answer. Just to complicate the situation, this usually occurs in a combinatorics textbook in the context of a probability question, e.g.:

During a billiard game, 8 of the 15 balls are sunk. What is the probability that they are the solid balls and the 8-ball (so balls numbered 1-8)?

This can be answered by considering the order in which the balls are sunk, in which case there are $15P8 = 15\times 14 \times \cdots \times 8$ total ways to sink any 8 balls into the pockets in order, whereas there are $8P8 = 8!$ ways to sink the desired balls (1-8) in order. An equivalent formulation is to consider the balls which have been sunk as a set without considering their order of sinking; in this case, there are $15C8 = \frac{15!}{8!7!}$ total ways to have any subset of 8 balls in the pockets and then $8C8 = 1$ way to have the desired balls in the pockets. Both methods give the same answer, $\frac{8!7!}{15!} = \frac{1}{6435}$.
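Both formulations can be checked exactly with Python's `math.perm`/`math.comb` (Python 3.8+). A minimal sketch:

```python
from fractions import Fraction
from math import comb, factorial, perm

# Ordered view: orderings of balls 1-8 among all ordered sequences of 8 of the 15 balls
p_ordered = Fraction(perm(8, 8), perm(15, 8))
# Unordered view: exactly one of the 8-element subsets of the 15 balls is {1, ..., 8}
p_unordered = Fraction(comb(8, 8), comb(15, 8))

assert p_ordered == p_unordered == Fraction(factorial(8) * factorial(7), factorial(15))
print(p_ordered)  # 1/6435
```

The $8!$ orderings of the favourable balls cancel against the $8!$ orderings hidden inside $15P8$, which is why the order can be ignored here.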

The difference between the combinatorics homework problem (where order doesn't matter because we're calculating a probability) and the statistics homework problem (where order does matter because we're calculating a probability) is that the combinatorics problem assumed that the outcome took a given form (8 balls were known to be in the pockets) which rendered the ordered and unordered formulations equivalent, whereas the stats problem asked about getting to that given form. I suppose, in some sense, these are complementary problems: one absorbs all of the order, and the other is completely blind to it.

FYI, a problem similar to the one I originally asked actually appears in Tucker's Applied Combinatorics (6th ed.), 5.4.16:

Consider the problem of distributing 10 distinct books among three different people with each person getting at least one book. Explain why the following solution strategy is wrong: first select a book to give to the first person in 10 ways; then select a book to give to the second person in nine ways; then select a book to give to the third person in eight ways; and finally distribute the remaining seven books in $3^7$ ways.
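A brute-force check over all $3^{10}$ assignments shows both the correct count and how the proposed strategy overcounts: it produces each valid assignment once for every way of choosing one "first" book from each person's pile, i.e. $n_1 n_2 n_3$ times if the people end up with $n_1, n_2, n_3$ books. A sketch; variable names are illustrative.

```python
import math
from collections import Counter
from itertools import product

BOOKS, PEOPLE = 10, 3

correct = 0   # assignments where everyone gets at least one book
weighted = 0  # how many times the flawed strategy generates those assignments
for owners in product(range(PEOPLE), repeat=BOOKS):
    sizes = Counter(owners)
    if len(sizes) == PEOPLE:
        correct += 1
        # one copy per choice of each person's "first" book
        weighted += math.prod(sizes.values())

# Inclusion/exclusion: 3^10 - 3*2^10 + 3*1^10
by_formula = PEOPLE**BOOKS - 3 * 2**BOOKS + 3 * 1**BOOKS
# The flawed strategy: 10 * 9 * 8 "first" books, then 3^7 for the remaining seven
flawed = 10 * 9 * 8 * PEOPLE ** (BOOKS - 3)

print(correct, by_formula, flawed, weighted)  # 55980 55980 1574640 1574640
```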

mim