Why is a simulation of a probability experiment off by a factor of 10?

Question

From a university homework assignment:

There are $8$ numbered cells and $12$ indistinct balls. All $12$ balls are randomly divided between all of the $8$ cells. What is the probability that there is not a single empty cell (i.e., each cell has at least $1$ ball)?

The answer is $\large\frac{\binom{11}{7}}{\binom{19}{7}}$ which is about $0.0065$. I reached this result independently, and it was confirmed by the official homework solution of the university.
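For reference, the arithmetic behind this value can be checked in a few lines. This is only a sketch under the same assumption the official solution appears to make, namely that every distribution of ball counts over the cells is equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n_balls, n_cells = 12, 8

# All ways to split 12 indistinct balls among 8 numbered cells (stars and bars).
total = comb(n_balls + n_cells - 1, n_cells - 1)   # C(19, 7)
# Ways with no empty cell: place one ball in each cell, then distribute the other 4.
favorable = comb(n_balls - 1, n_cells - 1)         # C(11, 7)

p_uniform_counts = Fraction(favorable, total)
print(p_uniform_counts, float(p_uniform_counts))
```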

A friend of mine and I independently wrote Python simulations that run the experiment many times (tested up to $1,000,000$ runs). We used both Python's random generator and several randomly generated lists from www.random.org. The results were similar, consistently hovering around $0.09$, which is off from the expected theoretical result by a factor of $10$ or even a bit more.

Have we made some wrong assumptions? Any ideas for this discrepancy?

P.S.: Here is the Python code that I wrote; maybe there is some faulty logic in it.

import random


def run_test():
    global count, N

    def run_experiment(n_balls, n_cells, offset):
        cells = [0] * n_cells
        # toss balls randomly to cells:
        for j in range(n_balls):
            cells[random.randrange(0, n_cells)] += 1
            # cells[int(lines[offset + j])] += 1
        cells = sorted(cells)
        # print(cells)

        # check if there is an empty cell. if so return 0, otherwise 1:
        if cells[0] == 0:
            return 0
        return 1

    count = 0
    N = 1000000
    offset = 0
    N_CELLS = 8
    N_BALLS = 12
    # iterate experiment
    for i in range(N):
        result = run_experiment(N_BALLS, N_CELLS, offset=offset)
        count += result
        offset += N_BALLS  # the pre-generated-list variant consumes one number per ball

    print("probability:", count, "/", N, "(~", count / N, ")")

I suppose the issue must be in "randomly divided." What does it mean exactly? What is the distribution of probabilities between different outcomes? Was the same distribution simulated in the experiment and used in the formula? — Alexey, Oct 29 '18 at 09:32
If you suppose that all possible distributions of the numbers of balls (not of balls themselves) between cells have the same probability, I wonder how you could simulate this in an experiment... — Alexey, Oct 29 '18 at 09:34
Besides the distinguishability of the balls is how they are distributed between the cells. You can either assign a cell to each ball, uniformly at random, independently of all the other balls, or you can look at all the different distinguishable ball distributions and you pick one of those uniformly at random. That's the difference between your two approaches. — Arthur, Oct 29 '18 at 09:57
@Alexey I think this approach would work, but I don't have time to check the math: instead of assigning a cell to each ball, you sample the number of balls per cell. So, choose a number of balls for cell 1, uniformly. Then choose for cell 2 from the fewer available balls, and so on. I think that should give a uniform distribution over the #balls/cell. — kutschkem, Oct 29 '18 at 10:25
"Have we made some wrong assumptions?" Hard to tell, since you have shown neither the code that you have written nor specified the assumptions that you have made. Without mind-reading, this question is impossible to answer. If your question involves your code, you should show the relevant code in the question itself. — John Coleman, Oct 29 '18 at 10:51
It seems the book solution assumes that all the ways to distribute the balls into the cells are equally likely; for example, a distribution of (12,0,0,0,0,0,0,0) is just as likely as (2,2,2,2,2,2,0,0). This is not consistent with a model that says each ball is dropped into a cell chosen independently and randomly, with each cell equally likely to be chosen, which I think is more realistic. — awkward, Oct 29 '18 at 13:30
@JohnColeman It looks like the code was mistakenly removed by an edit conflict with another user, a few minutes after the question was asked. I've edited the question just now to restore the link. — Chris Culter, Oct 29 '18 at 19:22
@ChrisCulter I see. If the code is relevant (and it is), please put it in the question itself rather than link to it (since links decay over time). — John Coleman, Oct 29 '18 at 20:30
@JohnColeman the code originally appeared in the post but disappeared for a while for some reason. It is now there. The code is commented to describe the method which we simulate. Regardless, I've figured out the flaw in the design of the simulation. Thanks — Shmuel Levinson, Oct 29 '18 at 20:54
I find it fascinating that such "simple" problems in statistics can be so tricky to unravel. — MaxW, Oct 31 '18 at 21:13

Henry · Answer 1 · 2018-10-29T09:12:23.057

20

In reality, you will find it very difficult to put the balls in the cells without distinguishing between the balls, especially if you want equal probabilities so as to use counting methods for simulation. Suppose you wanted to consider the probability that all the balls went into the first cell: with distinguishable balls this probability is $\frac1{8^{12}}$ and is easily simulated, though it is a rare occurrence; with indistinguishable balls it is $\frac1{19 \choose 7}$, over a million times more likely but difficult to simulate.

If the balls are distinguishable, the probability all eight boxes are full is $$\frac{8! \, S_2(12,8)}{8^{12}}$$ where $S_2(n,k)$ is a Stirling number of the second kind and $S_2(12,8)=159027$. That gives a probability that each cell has at least one ball of about $0.0933$. Is this similar to your simulation?
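The value above can be reproduced exactly without a table of Stirling numbers. The sketch below counts the onto assignments of labelled balls to cells by inclusion-exclusion (a standard identity, equal to $8!\,S_2(12,8)$); the function name `surjection_count` is just an illustrative choice.

```python
from fractions import Fraction
from math import comb, factorial

def surjection_count(n_balls: int, n_cells: int) -> int:
    """Ways to put labelled balls into labelled cells with no cell left empty."""
    # Inclusion-exclusion over the number k of cells forced to stay empty.
    return sum((-1) ** k * comb(n_cells, k) * (n_cells - k) ** n_balls
               for k in range(n_cells + 1))

onto = surjection_count(12, 8)
stirling = onto // factorial(8)               # S_2(12, 8)
p_distinguishable = Fraction(onto, 8 ** 12)   # each of the 8^12 assignments equally likely
print(stirling, float(p_distinguishable))
```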

If you really want to simulate the indistinguishable balls case, despite it not being realistic physically outside a Bose–Einstein condensate at temperatures close to absolute zero, you could use a stars and bars analogy. Choose $7$ distinct positions for the cell walls from the $19$ possible positions $\{0,1,2,3,\ldots,18\}$ for the balls and cell walls; a success is when none of the cell walls is at position $0$ or $18$ and no two of them are consecutive.
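A minimal sketch of that stars and bars simulation might look as follows. The helper names (`no_empty_cell`, `simulate`) and the use of a seeded `random.Random` are assumptions made for illustration; the exhaustive loop at the end simply counts the successful wall placements among all $\binom{19}{7}$ of them as a cross-check.

```python
import random
from itertools import combinations

N_BALLS, N_CELLS = 12, 8
N_POSITIONS = N_BALLS + N_CELLS - 1   # 19 slots, numbered 0..18
N_WALLS = N_CELLS - 1                 # 7 walls separate 8 cells

def no_empty_cell(walls) -> bool:
    """True if the wall positions leave at least one ball in every cell."""
    walls = sorted(walls)
    if walls[0] == 0 or walls[-1] == N_POSITIONS - 1:
        return False   # the first or the last cell would be empty
    # Two walls in consecutive positions enclose an empty cell.
    return all(b - a > 1 for a, b in zip(walls, walls[1:]))

def simulate(n_trials: int, rng: random.Random) -> float:
    """Estimate the success probability by sampling wall placements uniformly."""
    hits = sum(no_empty_cell(rng.sample(range(N_POSITIONS), N_WALLS))
               for _ in range(n_trials))
    return hits / n_trials

# Exhaustive cross-check over every possible wall placement.
exact_hits = sum(no_empty_cell(w)
                 for w in combinations(range(N_POSITIONS), N_WALLS))

print(simulate(100_000, random.Random(0)), exact_hits)
```

With this sampling scheme the estimate should settle near the textbook value rather than near $0.09$.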

edited Oct 29 '18 at 09:12

answered Oct 29 '18 at 08:57

1

This is indeed similar to my simulation. Thanks for the detailed answer! – Shmuel Levinson Oct 29 '18 at 09:29
4

Nitpicking from a physicist: any collection of bosons will show this effect. It's just that B-E condensates are one of the simplest for us to demonstrate. Messing with, say, photons is more difficult – Carl Witthoft Oct 29 '18 at 15:28
2

I'm clearly missing something here. Why would the probability of them all ending up in the same cell have anything to do with whether you can distinguish between them? Why would being able to tell which ball is which change a ball's chance of ending up in the first cell? – JimmyJames supports Canada Oct 29 '18 at 20:27
3

@JimmyJames: that is why it is physically difficult to get the indistinguishable case. Try a simpler thought experiment: take two indistinguishable balls and two distinguishable cells; what are the possible patterns and their probabilities? – Henry Oct 29 '18 at 20:46
2

It's your assertion that this is relevant which I'm not understanding. There's nothing in the question that states or implies that being able to tell one ball from another changes the probability of landing in the first cell, which would seem to be 1/8 assuming each event is independent. Your thought experiment has the same distribution as flipping a fair coin twice: 1/2 one head one tail, 1/4 two heads, 1/4 two tails. If I flip a fair penny and then a fair dime, it has no effect on the distribution of heads and tails. – JimmyJames supports Canada Oct 29 '18 at 20:57
1

I have a backgammon set with a set of blue dice and a set of white dice. Each die is (for practical purposes) indistinguishable from its partner, i.e. I can't tell which one is which. Are you saying that when I roll with one blue and one white, the distribution is different than when I roll with matched dice? – JimmyJames supports Canada Oct 29 '18 at 21:16
4

@JimmyJames You could argue that the question is vague. It says the balls "are randomly divided" but doesn't describe how. Depending on the procedure, one gets different distributions. And the procedures you've described are plausible. Despite all that, I believe it's the convention for textbook problems like this one that all possible patterns are equally probable unless stated otherwise. And the textbook problem specifies "indistinct balls" for the precise purpose of constraining what those possible patterns are. We're meant to follow that, even if we think it's unmotivated and silly. – Chris Culter Oct 29 '18 at 21:52
1

@JimmyJames Think of two photons that can go through a half-reflecting mirror or bounce. If they were distinguishable you would get 4 results: both go through, neither goes through, the first goes through and the second doesn't, and the second goes through and the first doesn't. This gives a sample space of (T,T),(B,B),(T,B),(B,T) with each outcome equally likely. And so seeing one photon at each detector should be twice as likely as seeing both at either one. If they are indistinguishable, though, those two states (T,B),(B,T) collapse into one and you get 1/3 probability for each. – DRF Oct 30 '18 at 05:36
@ChrisCulter I'm with you. The original question is vague. What I'm trying to understand is the simpler problem stated in this answer, which suggests that, depending on whether the balls are distinguishable, it is "over a million times more likely" for them all to end up in the same cell. Unless the identity of each ball changes how the cell is selected, it doesn't matter whether they are distinguishable or not. It's just counting. No such mechanism is described here or in the original question. – JimmyJames supports Canada Oct 30 '18 at 13:33
@DRF I'm really not sure what quantum mechanics has to do with this. It's not mentioned anywhere in the question. But your example is wrong. There are 4 possible outcomes: 2 go through, 0 go through, or 1 goes through. The case that one goes through has two ways of happening, so its chance is 50%. The other two possibilities are 25%. Whether the photons are distinguishable or not doesn't change anything. The idea that the three outcomes are equally likely is incorrect based on how you have stated this. – JimmyJames supports Canada Oct 30 '18 at 13:39
@ChrisCulter "I believe it's the convention for textbook problems like this one that all possible patterns are equally probable unless stated otherwise." Maybe I haven't done enough textbook problems (not recently anyway) but if that's really the case, we need different textbooks and/or textbook writers. That's a terrible way to train people to think about probability. – JimmyJames supports Canada Oct 30 '18 at 14:11
@JimmyJames Consider asking a separate question of your own, either here or even better, on https://physics.stackexchange.com/ . – Chris Culter Oct 30 '18 at 22:05
@JimmyJames That's exactly the point. If the photons are indistinguishable it doesn't have 4 outcomes, it only has three. I'm not a physicist but from what I remember this is exactly what actually happens and how you show that indistinguishability is a thing. If you perform the experiment and measure the probabilities you get thirds instead of 1/4, 1/4 and 1/2. I agree that it's worth writing a new question here for the math side; physics for why we would care about indistinguishable items. – DRF Oct 31 '18 at 06:06
@DRF No. You are describing both the physics and the probability incorrectly. There are now three outcomes but the distribution between them is not even. – JimmyJames supports Canada Oct 31 '18 at 13:20
@JimmyJames I'm certainly describing the mathematics correctly. As to the physics I won't quibble since as I said I'm not a physicist. You keep insisting that the fact that the items are indistinguishable is of no consequence. That is just not true for the way we do the mathematical modelling. If we say there are 2 slots and 2 balls and the balls are indistinguishable and are assigned to the slots randomly with equal probability what we mean is that the probability space consists of three outcomes which are equally likely. – DRF Oct 31 '18 at 13:26
@JimmyJames As to the physics side behind it the one example that's easy and I could find are bosons and states. See the table at the bottom of the statistical properties of bosons and fermions on https://en.wikipedia.org/wiki/Identical_particles . – DRF Oct 31 '18 at 13:28
@DRF "That is just not true for the way we do the mathematical modelling. If we say there are 2 slots and 2 balls and the balls are indistinguishable and are assigned to the slots randomly with equal probability what we mean is that the probability space consists of three outcomes which are equally likely." Maybe that's the way it's done but it's silly. Why mention 2 balls then? Just say there are three outcomes that are equally likely. Maybe this is what it means when people talk about statistics not being mathematics. – JimmyJames supports Canada Oct 31 '18 at 13:36
@DRF "As to the physics side behind it the one example that's easy and I could find are bosons and states." As far as I can tell, the experiment you described did not result in any sort of entanglement. Maybe a more precise explanation of the actual procedure or reference to some sort of actual experiment would resolve that. – JimmyJames supports Canada Oct 31 '18 at 13:39
Let us continue this discussion in chat. – DRF Oct 31 '18 at 14:13
@ChrisCulter Regarding "the convention for textbook problems like this," if I roll two fair six-sided dice simultaneously after shaking them together in a box, what is the probability the sum is $3,$ and does the answer to that question change if I insert the word "indistinguishable" before the word "dice"? – David K Nov 03 '18 at 21:01
I don't think the equiprobable distribution over the states of "indistinguishable balls" is at all difficult to simulate; I can even design a machine that literally sorts balls into bins with this distribution. On the other hand, I do not consider this to be the most "natural" interpretation of the homework question, which I think is ill-posed. – David K Nov 03 '18 at 21:19
1

@DavidK There will be no physical difference if you call them indistinguishable. But if you model them as indistinguishable there will be a difference. – DRF Nov 04 '18 at 12:06
@DRF While QM is interesting, my comment concerned how one ought to interpret the word "indistinguishable" in the main question, in the context of a word problem in a math course at a university. I think the homework was ill-phrased and one should not write questions that way, but if forced to choose a model for the question I would choose a classical model; in your terms, the balls are called indistinguishable but are not truly indistinguishable in physics. This particular homework is so badly worded that it is still ambiguous even then, but I think OP's interpretation is as good as any. – David K Nov 04 '18 at 12:52
1

@DavidK The word "indistinguishable" means you are not supposed to choose a classical model. The OP uses "indistinct"; even in that case I would assume, as part of any math assignment in probability, that we are not using a classical model. To be perfectly honest I don't quite see why QM is important. The goal is to choose the correct model and that is the one where the permutations of the balls are not distinct states, and the probability is just a counting probability on the set of all states. – DRF Nov 04 '18 at 21:00

score 10 · Answer 2 · answered Oct 29 '18 at 09:15

Consider the set $D$ of ways to distribute $12$ balls labelled [abcdefghijkl] among $8$ cells numbered [01234567]. This set has $8^{12}\approx7\times10^{10}$ elements.

Now consider the set $I$ of distinguishable ways to populate those same $8$ cells [01234567] with $12$ indistinct balls. This set has ${19\choose7}\approx 5\times10^4$ elements.

The assignment asks you to compute a probability of an event over the uniform distribution on $I$, if not in so many words. In principle, you could approximate this probability by sampling from the uniform distribution on $I$. But your strategy is to sample from the uniform distribution on $D$, and then map each sample to $I$! That's not the same.

Instead of taking the average of all the results, you need to take a weighted average, such that the weight compensates for the number of elements in $D$ that map to the same element of $I$. Hint, it's something like this:

import math

weight = 1
for cell_population in cells:  # cells: per-cell ball counts from one simulated run
    weight *= math.factorial(cell_population)

At least, that gets the right answer. Rigorously justifying that formula as a consequence of the mapping between $D$ and $I$ is left as an exercise to the reader.
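To make the hint concrete, here is one way the weighted average could be wired into a simulation. It is only a sketch: the function names, the seeded `random.Random`, and the trial count are illustrative assumptions, not the asker's code. It relies on the fact that a count vector $(n_1,\ldots,n_8)$ is the image of $12!/(n_1!\cdots n_8!)$ elements of $D$, so a weight of $n_1!\cdots n_8!$ is proportional to the reciprocal of that multiplicity.

```python
import math
import random

def toss(n_balls: int, n_cells: int, rng: random.Random) -> list[int]:
    """Drop each ball into a uniformly chosen cell (a uniform sample from D)."""
    cells = [0] * n_cells
    for _ in range(n_balls):
        cells[rng.randrange(n_cells)] += 1
    return cells

def weight(cells: list[int]) -> int:
    """Product of factorials of the cell counts, as in the hint above."""
    w = 1
    for cell_population in cells:
        w *= math.factorial(cell_population)
    return w

def weighted_estimate(n_trials: int, rng: random.Random,
                      n_balls: int = 12, n_cells: int = 8) -> float:
    """Weighted fraction of trials in which no cell is empty."""
    total_weight = 0
    success_weight = 0
    for _ in range(n_trials):
        cells = toss(n_balls, n_cells, rng)
        w = weight(cells)
        total_weight += w
        if min(cells) > 0:
            success_weight += w
    return success_weight / total_weight

print(weighted_estimate(100_000, random.Random(0)))
```

Because the weights vary over many orders of magnitude, this estimator is noticeably noisier than an unweighted one, so it may need many trials before it settles near the textbook value.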

score 2 · Answer 3 · answered Oct 29 '18 at 15:26

The original problem is posed, so far as I can tell, to show the difference between combinations and permutations. In nature, there is no such thing as indistinguishable balls. Semi-infinite tests (e.g. Las Vegas) have shown this to be true.

Now, if the problem really wants you to use "indistinguishable" balls for the purposes of solving the problem, then yes, you need to use combinations and not permutations when calculating all the ways the indistinguishable balls are placed into the containers. And of course, you need to use permutations for the numbered balls, as they are distinguishable from each other and from the collection of indistinguishable balls.

Now, I believe that Chris Culter's calculations reflect this difference. Whether your Python code does this correctly we can't say until we see the code.

If we want to show the difference between combinations and permutations, we can ask for the number of ways to distribute $12$ indistinct balls among $8$ numbered cells; then ask for the number of ways to distribute the same balls in the same cells so that no cell is empty. This has the solver do all the calculations in the "solved" problem except for the final division, without requiring any assumptions about how the word "indistinct" affects probabilities. — David K, Nov 05 '18 at 00:02
