Distinguishability in probability

Question

You are making chocolate chip cookies. You add N chips randomly to the cookie dough, and you randomly split the dough into 100 equal cookies. How many chips should go into the dough to give a probability of at least 90% that every cookie has at least one chip?

My initial approach to this problem was to count the number of ways of putting N chips (socks) in 100 cookies (boxes) using stars and bars: $$ \binom{100+N-1}{N}.$$

Similarly, I calculated the number of ways of doing the same after first putting 1 chip in each cookie. Applying stars and bars with $N-100$ socks and 100 boxes yields $$ \binom{100+(N-100)-1}{N-100} = \binom{N-1}{N-100}.$$

To compute the probability that every cookie has at least one chip, I would simply take the ratio of the two counts above and solve for the smallest $N$ such that the ratio is at least 90%.

However, I think my solution is flawed because I’m treating the chips as indistinguishable.

Is my reasoning correct?

EDIT: we can make the (unphysical) assumption that each chip lands in a uniformly chosen cookie, independently of all other chips.

Well, there are a number of problems here. The biggest one is that Stars and Bars is rarely useful in computing probabilities. That's because the patterns they count are generally not equi-probable. Also, the whole problem seems ill-posed. Are we meant to assume that the location of each chip follows a uniform distribution independent of all other chips? I wouldn't have thought that was an obvious assumption. Fine if you want to assume it, of course, but in that case it should be stated explicitly. — lulu, Oct 06 '24 at 14:10
Why do you think treating the chips as indistinguishable is wrong? — Stefan, Oct 06 '24 at 14:28
@lulu, yes, that was my concern with stars and bars. The other solution I considered (which others have suggested) was inclusion-exclusion. Alternatively, I think a recursion formula can be set up. Thanks — TheorVHP, Oct 08 '24 at 13:48

score 3 · Answer 1 · answered Oct 06 '24 at 17:40

Here are three more approaches based on the chips being allocated independently of the other chips, uniformly chosen from the cookies.

Method 1: With $n$ chips, the probability that all $100$ cookies have at least one chip is $S_2(n,100) \dfrac{100!}{100^n}$, where $S_2(n,100)$ is a Stirling number of the second kind. You want the smallest $n$ which makes this at least $90\%$. Unfortunately this does not make things much easier, as calculating $S_2(n,100)$ is not easy: you might use recursion or inclusion-exclusion, but then you save no time compared to applying those methods to the probability directly.
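For illustration, here is a minimal Python sketch of Method 1, assuming the standard recurrence $S_2(n,k) = k\,S_2(n-1,k) + S_2(n-1,k-1)$ and exact big-integer arithmetic. The function names `stirling2` and `prob_all_chipped` are made up for this example.

```python
from fractions import Fraction
from math import factorial


def stirling2(n, k):
    """Stirling number of the second kind S2(n, k), computed with the
    recurrence S2(n, k) = k * S2(n-1, k) + S2(n-1, k-1)."""
    row = [1] + [0] * k  # S2(0, 0) = 1 and S2(0, j) = 0 for j > 0
    for _ in range(n):
        new = [0] * (k + 1)
        for j in range(1, k + 1):
            new[j] = j * row[j] + row[j - 1]
        row = new
    return row[k]


def prob_all_chipped(n_chips, n_cookies=100):
    """Exact probability that every cookie gets at least one chip,
    assuming each chip lands in a uniformly random cookie independently."""
    favourable = stirling2(n_chips, n_cookies) * factorial(n_cookies)
    return Fraction(favourable, n_cookies ** n_chips)
```

Evaluating `float(prob_all_chipped(n))` for a few values of `n` and narrowing down by hand is one way to locate the threshold; as noted above, this is no faster than the recursion in Method 2.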

Method 2: Suppose $p(n,k)$ is the probability that exactly $k$ cookies have at least one chip after distributing $n$ chips among the $100$ cookies. Then, by thinking about the probability that the next chip goes to an already chipped cookie or to an unchipped cookie, you have the recursion $$p(n,k)= \frac{k}{100}p(n-1,k)+ \frac{100-k+1}{100}p(n-1,k-1)$$ starting with $p(0,0)=1$ and $p(0,k)=0$ for $k\not=0$.

If you put this into a spreadsheet, and drag the formula far enough, you will find $p(682,100)=0.899499$ and $p(683,100)=0.900456$, so you need $n \ge 683$.
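The same recursion can be run outside a spreadsheet. The following is a rough Python sketch using ordinary floating point; the names `step`, `coverage_probabilities` and `min_chips` are invented for this example.

```python
def step(p, n_cookies):
    """Apply one step of the recursion: add one more chip."""
    new = [0.0] * (n_cookies + 1)
    for k in range(1, n_cookies + 1):
        # The new chip either lands in one of the k already chipped cookies,
        # or in one of the (n_cookies - k + 1) cookies that were unchipped.
        new[k] = (k / n_cookies) * p[k] + ((n_cookies - k + 1) / n_cookies) * p[k - 1]
    return new


def coverage_probabilities(n_chips, n_cookies=100):
    """Return a list p with p[k] = p(n_chips, k)."""
    p = [1.0] + [0.0] * n_cookies  # p(0, 0) = 1, p(0, k) = 0 otherwise
    for _ in range(n_chips):
        p = step(p, n_cookies)
    return p


def min_chips(target=0.9, n_cookies=100):
    """Smallest n with p(n, n_cookies) >= target."""
    p = [1.0] + [0.0] * n_cookies
    n = 0
    while p[n_cookies] < target:
        p = step(p, n_cookies)
        n += 1
    return n
```

Here `coverage_probabilities(683)[100]` corresponds to $p(683,100)$ and `min_chips()` to the smallest acceptable $n$.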

Method 3: As an approximation, if you falsely assume that the number of chips on each cookie is independent of the numbers on the other cookies, then with $n$ chips the probability a particular cookie has at least one chip is $1- (1-\frac1{100})^n$, so the probability that all the cookies have at least one chip is about $\left(1- (1-\frac1{100})^n\right)^{100}$.

If you want this to be at least $90\%$, then this suggests you need $n$ to be at least about $\dfrac{\log(1-0.9^{1/100})}{\log (1-\frac1{100})}\approx 682.17$ so at least about $683$. This approximation suggests a probability for $n=683$ of $0.900786$ so close to the more precise result of method $2$.
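A short Python sketch of this approximation, in case it helps to reproduce the numbers; the function names are hypothetical, and the formulas are simply the two expressions from Method 3.

```python
import math


def approx_prob_all_chipped(n_chips, n_cookies=100):
    """Approximate probability that every cookie has a chip, (falsely)
    treating the chip counts on different cookies as independent."""
    return (1 - (1 - 1 / n_cookies) ** n_chips) ** n_cookies


def approx_min_chips(target=0.9, n_cookies=100):
    """Real-valued n at which the approximation reaches the target."""
    return math.log(1 - target ** (1 / n_cookies)) / math.log(1 - 1 / n_cookies)
```

Rounding `approx_min_chips()` up with `math.ceil` gives the suggested number of chips, and `approx_prob_all_chipped(683)` gives the approximate probability quoted above.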

score 1 · Answer 2 · answered Oct 08 '24 at 09:47

Since each chip is placed independently into one of the 100 cookies, the probability that a given chip does not land in a specific cookie is:

$P(\text{chip avoids cookie}) = \frac{99}{100}$

If there are N chips, the probability that all of the chips avoid this specific cookie is:

$ P(\text{cookie is empty}) = \left( \frac{99}{100} \right)^N $

Then, by the union bound over the 100 cookies, the probability that every cookie has at least one chip satisfies the first inequality below, and we require this lower bound to be at least 90%:

$ P(\text{all cookies have chips}) \geq 1 - 100 \cdot \left( \frac{99}{100} \right)^N \geq 0.9 $

This inequality can be solved as follows. Rearranging gives $\left( \frac{99}{100} \right)^N \leq 0.001$, and taking logarithms:

$ N \cdot \ln\left( \frac{99}{100} \right) \leq \ln(0.001) $

Since $\ln\left( \frac{99}{100} \right) = \ln(99) - \ln(100) \approx -0.010050 $ and $\ln(0.001) \approx -6.9078$, then $ N \cdot (-0.010050) \leq -6.9078 $

So that $N \geq \frac{6.9078}{0.010050} \approx 687.3$ ($N \geq 688$)
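The arithmetic can be checked with a few lines of Python. This is only a sketch of the bound used in this answer, and the function name `union_bound_min_chips` is made up for the example.

```python
import math


def union_bound_min_chips(target=0.9, n_cookies=100):
    """Smallest N for which the lower bound
    1 - n_cookies * (1 - 1/n_cookies)**N is at least `target`."""
    # Requirement: n_cookies * (1 - 1/n_cookies)**N <= 1 - target
    bound = math.log((1 - target) / n_cookies) / math.log(1 - 1 / n_cookies)
    return math.ceil(bound)
```

Because this works from a lower bound on the probability, the resulting $N$ is sufficient but can be larger than the exact minimum.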

I think that’s a valid approximation but the step in which you transition from one cookie being empty to all cookies being empty isn’t valid because those events aren’t independent (if I’m not mistaken). — TheorVHP, Oct 08 '24 at 13:47
@TheorVHP In fact the problem here is that the events of particular cookies being unchipped are not mutually exclusive though they are close to being so, and thus summing them, i.e. multiplying by $100$, introduces an approximation error; they are not independent either but the potential error from assuming independence (my method 3) is usually smaller than the error from assuming mutual exclusiveness. — Henry, Oct 08 '24 at 15:05

user2661923 · Answer 3 · 2024-10-06T16:28:56.120

lulu's comment, which focuses on the fact that the configurations counted by Stars and Bars are not equally probable, directly answers the original poster's question.
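A tiny brute-force check makes the point concrete. The sketch below (Python, with an invented function name) enumerates every placement of distinguishable chips, assuming each chip is equally likely to land in each cookie, and tallies the resulting chip counts per cookie.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def occupancy_distribution(n_chips, n_cookies):
    """Map each occupancy pattern (chips per cookie) to its probability."""
    counts = Counter()
    # Each placement assigns a cookie index to every (distinguishable) chip.
    for placement in product(range(n_cookies), repeat=n_chips):
        occupancy = tuple(placement.count(c) for c in range(n_cookies))
        counts[occupancy] += 1
    total = n_cookies ** n_chips
    return {occ: Fraction(c, total) for occ, c in counts.items()}
```

With 2 chips and 2 cookies, Stars and Bars counts three patterns, $(2,0)$, $(1,1)$ and $(0,2)$, but `occupancy_distribution(2, 2)` assigns them probabilities $\frac14$, $\frac12$ and $\frac14$, so a ratio of Stars and Bars counts is not a probability here.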

How then should the original math problem be attacked?

To the best of my knowledge, there are (in general) only two means of attack:

Statistical Inference
I don't really know anything about statistics, other than that it was designed to quickly resolve problems of this nature, and yield a reasonable approximation for $~N.~$ I think that someone knowledgeable in statistics should provide a separate answer.

Inclusion-Exclusion
This approach, while somewhat convoluted, may be routinely implemented with computer assistance, so that the exact value of $~N~$ may be found. The procedure is laid out below.

Let $~f(N)~$ denote the probability, as a function of $~N,~$ that each of the cookies has at least one chip. Clearly, for $~N < 100, ~f(N) = 0.$

Then, you are looking for the smallest positive integer $~N,~$ such that $~f(N) \geq 0.90.$

So, the entire problem reduces to finding a closed form formula for $~f(N).$

See this article for an introduction to Inclusion-Exclusion. Then, see this answer for an explanation of and justification for the Inclusion-Exclusion formula.

Following the syntax in the second link, let $~S~$ denote the collection of all distributions of the $~N~$ chips among the $~100~$ cookies, without any regard for whether each of the cookies has at least one chip.

For $~k \in \{1,2,\cdots,100\},~$ let $~S_k~$ denote the subset of $~S~$ where Cookie-$k$ is in violation, because Cookie-$k$ has no chip.

To clarify, $~S_1~$ denotes the subset of all distributions of the $~N~$ chips among the $~100~$ cookies, where:

Cookie-$1$ has no chip.
Any of the other cookies may or may not also have no chip.

Then, with $~N~$ presumed to be a fixed constant, you have that

$$f(N) = \frac{| ~S ~| - | ~S_1 ~\cup ~S_2 ~\cup ~\cdots ~\cup S_{100} ~|}{| ~S ~|}. \tag1 $$

Let $~T_0~$ denote $~| ~S ~|.~$

Let $~T_1~$ denote $~\displaystyle \sum_{1 \leq i_1 \leq 100} | ~S_{i_1} ~|.$
That is $~T_1~$ denotes the sum of $~\displaystyle \binom{100}{1}~$ terms.

Similarly, for $~r \in \{2,3,\cdots,100\},~$
let $~T_r~$ denote $~\displaystyle \sum_{1 \leq i_1 < i_2 < \cdots < i_r \leq 100} | ~S_{i_1} ~\cap ~S_{i_2} ~\cap ~\cdots ~\cap ~S_{i_r} ~|.$
That is $~T_r~$ denotes the sum of $~\displaystyle \binom{100}{r}~$ terms.

Then, by Inclusion-Exclusion theory, $~f(N),~$ which is expressed in (1) above, is equivalent to:

$$f(N) = \frac{1}{T_0} \sum_{r=0}^{100} (-1)^r T_r. \tag2 $$

So, the entire problem is reduced to finding a closed form formula for $~T_r,~$ as a function of both $~N~$ and $~r.$

Considerations of symmetry greatly simplify the computations.

$$| ~S ~| = T_0 = 100^N.$$

$$T_1 = \binom{100}{1} (100 - 1)^N.$$

Similarly, for $~r \in \{2,3,\cdots,100\}:$

$$T_r = \binom{100}{r} (100 - r)^N.$$

In summary, you are looking for the smallest positive integer $~N,~$ such that the following expression is $~\geq 0.90$:

$$f(N) = \frac{1}{100^N} \times \left\{ ~\sum_{r=0}^{100} ~\left[(-1)^r ~\binom{100}{r} ~(100-r)^N ~\right] ~\right\}. \tag3$$

It becomes a simple matter to write a computer program that will compute $~f(N),~$ and then use the computer program to manually do a numeric search. For example, compute $~f(150).$

Assuming that $~f(150) \geq 0.90,~$ you would then compute $~f(125),~$ and so forth.
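As a rough illustration, formula (3) might be coded in Python as follows, using exact rational arithmetic. The function names are invented for this sketch, and the search relies on the assumption that $~f(N)~$ never decreases as $~N~$ grows (an extra chip cannot empty a cookie), which permits bisection instead of trial and error.

```python
from fractions import Fraction
from math import comb


def f(n_chips, n_cookies=100):
    """Formula (3): exact probability that every cookie has a chip."""
    total = sum(
        (-1) ** r * comb(n_cookies, r) * (n_cookies - r) ** n_chips
        for r in range(n_cookies + 1)
    )
    return Fraction(total, n_cookies ** n_chips)


def smallest_n(target=Fraction(9, 10), n_cookies=100):
    """Smallest N with f(N) >= target, found by doubling then bisecting."""
    lo, hi = n_cookies - 1, n_cookies  # f(lo) = 0, since lo < n_cookies
    while f(hi, n_cookies) < target:   # double hi until the target is met
        lo, hi = hi, hi * 2
    while hi - lo > 1:                 # invariant: f(lo) < target <= f(hi)
        mid = (lo + hi) // 2
        if f(mid, n_cookies) >= target:
            hi = mid
        else:
            lo = mid
    return hi
```

Calling `smallest_n()` performs the numeric search described above, and `float(f(N))` shows the probability for any particular $~N.$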

$\underline{\text{Addendum}}$

Defects in the above approach:

First see the comment of Henry, immediately following my answer.

Beyond that, my approach assumes that the assignments of the $~N~$ chips to the cookies are independent events.

In physical reality, this can't be correct, because the volume of the chip is not trivial versus the volume of the cookie. So, in the baking process, as the number of chips assigned to one unit of cookie dough increases, it becomes more difficult than normal to assign an additional chip to that cookie dough unit.

In effect, the chips are (potentially) colliding with each other and crowding each other.

So, one would expect that in an actual baking process, the true value of $~N~$ will be less than the value of $~N~$ computed by my algorithm.

As a different complicating factor, assigning chips to a cookie is different from assigning chips to a unit of cookie dough. That is, since the volume of the cookie is fixed, the more chips that are assigned to a specific cookie, the less cookie dough will be assigned to that cookie.

There are at least two other approaches, one using recursion (easy on a spreadsheet) and another using Stirling numbers of the second kind (requiring very big integers). There is also a quick approximate method, falsely assuming that the number of chips on each cookie is independent of the numbers on the other cookies. — Henry, Oct 06 '24 at 16:02
@Henry Agreed. I am dimly aware of the existence of Stirling numbers, but refrained from mentioning them because I have never really studied them. Also, recursion didn't occur to me, and even now, it is unclear to me how to set the recursion up. As for your quick approximate method, I am assuming that you are referring to a statistical inference, which I briefly discussed at the start of my answer. As for the issue of independent events, see my Addendum, which I have just added to the end of my answer. — user2661923, Oct 06 '24 at 16:10
Thanks to the both of you (@Henry). I had considered inclusion-exclusion and recursion, but not Stirling numbers. — TheorVHP, Oct 08 '24 at 13:52
I don’t think inclusion-exclusion ever requires any approximations, if we can assume that the probability of a chip being in a cookie is uniformly independent of all other cookies. The method that does require an assumption is Method 3 in @Henry’s answer. — TheorVHP, Oct 08 '24 at 13:56
@TheorVHP However, as discussed in my Addendum, uniform independence can't be right. There, the computed exact value of $~N~$ will be an over-estimation. In practice, I don't believe that math PhDs are hired to work in the kitchen. Instead, bakers simply experiment with various values of $~N~$ until the baked cookies seem right. — user2661923, Oct 08 '24 at 14:00
