
This post states the same problem I am looking at, with different variable names.
It does not look like the answer was accepted, and I have trouble following it, as it uses technical terms I am not familiar with (Hasse diagram, poset...).
I am trying to solve this using more elementary combinatorial principles, as a non-expert.

In essence, you have $N$ balls, $n$ of which are black, $N-n$ white.
You want to partition them into $m$ bins of size $N/m$, where $N/m$ is an integer.
The question is: what fraction of all possible partitions have at least one black ball in all bins?
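For small cases the fraction can be checked by brute force. The following is only a minimal sketch under stated assumptions: the balls are laid out in a sequence, bin $j$ takes the $j$-th block of $N/m$ consecutive positions, and every placement of the $n$ black balls is equally likely. The function name `frac_all_bins_black` is made up for this illustration.

```r
# Fraction of arrangements in which every bin holds at least one black ball.
# Assumption: bin j holds positions ((j - 1) * N / m + 1) .. (j * N / m).
frac_all_bins_black <- function(N, n, m) {
  stopifnot(N %% m == 0, n >= 1, n <= N)
  bin_size <- N / m
  black_pos <- combn(N, n)  # one column per placement of the n black balls
  all_hit <- apply(black_pos, 2, function(pos) {
    # bins touched by the black balls; all m bins must appear
    length(unique(ceiling(pos / bin_size))) == m
  })
  mean(all_hit)
}

print(frac_all_bins_black(6, 3, 3))  # 0.4 (8 of the 20 sequences)
print(frac_all_bins_black(6, 3, 2))  # 0.9 (18 of the 20 sequences)
```

The enumeration grows as $\binom{N}{n}$, so this is only practical as a sanity check on small inputs.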

Despite thinking about this quite hard, and sketching out several solutions based on my own reasoning and on people's feedback, e.g. by looking at occupancy theory and restricted partitioning, I consistently got stuck with summations of terms that I could not figure out how to obtain.

Would anyone be able to point me to literature or posts that describe this problem, if any exist, or suggest how to approach it?

Another question that would be useful to answer is: can this be resolved by fairly elementary methods, or does it require particularly advanced ones?

Here are some pictures from my previous attempts.

All (20) possible sequences made from 3 black + 3 white balls:

(image not included)

Partitioned into 3 bins:

(image not included)

Partitioned into 2 bins:

(image not included)

I wonder if I am approaching this the wrong way and hence missing something obvious that would make it much easier to solve.


EDIT writing out the explicit formula resulting from @JMoravitz's answer, for the record

$$P(N,n,m) = \sum_{i=1}^m {(-1)^{i-1} \cdot \binom m i \cdot \frac {\binom {N-n} {\frac {i \cdot N} {m} }} { \binom {N} {\frac {i \cdot N} {m} } } }$$

$P(N,n,m)$ = probability that, by randomly partitioning $N$ balls, $n$ of which are black, $N-n$ white, into $m$ bins of equal integer sizes $N/m$, at least one bin contains only white balls.

And, if I am not mistaken, the rightmost factor:

$\frac {\binom {N-n} {\frac {i \cdot N} {m} }} { \binom {N} {\frac {i \cdot N} {m} } } = H(\frac {i \cdot N} {m}, N-n, n, \frac {i \cdot N} {m})$

i.e. the hypergeometric probability to draw $\frac {i \cdot N} {m}$ white balls from a set containing $N-n$ white balls and $n$ black balls, by randomly drawing $\frac {i \cdot N} {m}$ balls.
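As a quick numerical check of this identity, the ratio of binomial coefficients can be compared with R's `dhyper`. The values $N=12$, $n=5$, $m=4$ are arbitrary and chosen only for illustration.

```r
N <- 12; n <- 5; m <- 4                   # illustrative values, not from the post
draws <- (1:m) * N / m                    # i * N / m for i = 1..m
ratio <- choose(N - n, draws) / choose(N, draws)
hyper <- dhyper(draws, N - n, n, draws)   # x = k: every drawn ball is white
print(isTRUE(all.equal(ratio, hyper)))    # TRUE
```

Both forms return $0$ once $\frac{i \cdot N}{m}$ exceeds the number of white balls $N-n$.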

For full clarity, the answer to the original question of this post is $1-P(N,n,m)$, i.e. the probability that, by randomly partitioning $N$ balls, $n$ of which are black, $N-n$ white, into $m$ bins of equal integer sizes $N/m$, all bins contain at least one black ball.

Note that when $n > 0$ it is sufficient to sum up to $m-1$, because the probability that all $m$ bins contain only white balls is zero.
In fact, the upper limit of the summation can be lowered even further when $n > N/m$: every term with $\frac {i \cdot N} {m} > N-n$ is a hypergeometric probability equal to $0$.
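The truncation can be checked numerically. In the sketch below, `P_terms` and `i_max` are names introduced only for this illustration, and the values $N=12$, $n=5$, $m=4$ are arbitrary; `i_max` is the largest $i$ for which $\frac{i \cdot N}{m} \le N-n$.

```r
# Partial inclusion-exclusion sum with a configurable upper limit.
P_terms <- function(N, n, m, upper) {
  i <- seq_len(upper)
  sum((-1)^(i - 1) * choose(m, i) *
        choose(N - n, i * N / m) / choose(N, i * N / m))
}

N <- 12; n <- 5; m <- 4              # illustrative values, not from the post
i_max <- (m * (N - n)) %/% N         # largest i with i * N / m <= N - n
full  <- P_terms(N, n, m, m)         # sum over i = 1..m
short <- P_terms(N, n, m, i_max)     # sum over i = 1..i_max
print(isTRUE(all.equal(full, short)))
```

This relies on `choose(a, b)` returning `0` in R when `b > a`, which is what makes the dropped terms vanish.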


EDIT 2 adding R script, for convenience

N <- 6 # total size of set
n <- 3 # number of black balls in set
m <- 3 # number of equal-sized bins to make (must be a divisor of N)

P2 <- sum(sapply(1:m, function(i) {
  (-1)^(i - 1) * choose(m, i) * dhyper(i * N / m, N - n, n, i * N / m)
}))
print(P2)


EDIT 3 adding a more generic R script, handling any arbitrary set sizes

N <- 1000 # total size of set
n <- 10 # number of black balls in set
sizes <- c(100, 100, 800) # sizes of subsets to make (all >= 1, and must sum up to N)

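P <- function(N, n, sizes) {
  if (sum(sizes) != N) stop("The sum of sizes must be N.")
  if (any(sizes <= 0)) stop("No size can be <= 0.")
  if (any(floor(sizes) != sizes)) stop("Sizes must be integers.")
  M <- length(sizes)
  # Inclusion-exclusion over all i-subsets of bins. The bins are indexed rather
  # than passing sizes directly, because combn(x, i) expands a single number x to 1:x.
  sum(sapply(1:M, function(i) {
    (-1)^(i - 1) * sum(combn(seq_along(sizes), i, function(idx) {
      s <- sum(sizes[idx])  # total size of the chosen bins
      dhyper(s, N - n, n, s)
    }))
  }))
}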

print(P(N, n, sizes))

Function that calculates P for M subsets of sizes as close as possible to N/M

M <- 5 # number of subsets to make (must be <= N)

PE <- function(N, n, M) {
  if (M > N) stop("M cannot be greater than N.")
  if (M <= 0) stop("M must be positive.")
  if (floor(M) != M) stop("M must be an integer.")
  MC <- N - floor(N / M) * M  # number of subsets that get one extra element
  sizes <- c(rep(ceiling(N / M), MC), rep(floor(N / M), M - MC))
  P(N, n, sizes)
}

print(PE(N, n, M))
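A result like this can be cross-checked by simulation. The following is only a rough Monte Carlo sketch, not part of the scripts above: `simulate_P` and its `reps` parameter are hypothetical names, and it assumes that choosing the slots of the black balls uniformly at random is equivalent to a random partition.

```r
# Estimate the probability that at least one subset contains no black ball.
simulate_P <- function(N, n, sizes, reps = 20000) {
  stopifnot(sum(sizes) == N)
  bin_id <- rep(seq_along(sizes), times = sizes)  # bin label of each slot
  hits <- replicate(reps, {
    black_slots <- sample.int(N, n)               # slots that receive black balls
    # TRUE if the black balls miss at least one bin
    length(unique(bin_id[black_slots])) < length(sizes)
  })
  mean(hits)
}

set.seed(1)
print(simulate_P(6, 3, c(2, 2, 2)))  # should be close to the exact value 0.6
```

The estimate fluctuates with the seed; with 20000 repetitions the standard error for this example is roughly 0.0035.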

  • Consider answering the opposite question... what the probability is that at least one bin has no black balls. Consider approaching that by inclusion-exclusion. – JMoravitz Apr 11 '24 at 17:04
  • :D That is actually the original problem I am trying to solve, and people advised to do the reverse, i.e. calculate 1 - probability that all bins have at least 1 black ball. So :/ – user6376297 Apr 11 '24 at 17:06
  • "and people advised" Bad advice here since the original problem is the easy one in comparison. – JMoravitz Apr 11 '24 at 17:06
  • Then I confirm that I am not managing to figure this one out, because I was also convinced that the 'at least one with no black' = 'at least one with only white' was the hard one to solve. See https://stats.stackexchange.com/q/644617/148856 to get an idea of the struggle and attempts I already made. I keep getting 'hints', but if this problem has actually been resolved, I am happy to just be pointed to the solution. – user6376297 Apr 11 '24 at 17:09

1 Answer


Answer the complementary question instead: find the probability that at least one bin has no black balls. (According to the comments, this is apparently the "original question".)

Let $N$ be the total number of balls, $n$ the number of black balls, and $k$ the number of bins the balls are evenly split into, so that $N$ is a multiple of $k$ and each bin holds $N/k$ balls.

Let $A_i$ be the event that the $i$'th bin has no black balls for $1\leq i\leq k$. We are trying to calculate $\Pr(\bigcup\limits_{i=1}^k A_i)$

This expands by inclusion-exclusion as

$\Pr(A_1)+\Pr(A_2)+\dots+\Pr(A_k)-\Pr(A_1\cap A_2)-\Pr(A_1\cap A_3)-\dots + \Pr(A_1\cap A_2\cap A_3)\pm\dots \pm \Pr(A_1\cap \dots \cap A_k)$

Looking at each of these terms individually: the intersection of, say, $m$ of these events is the event that none of those $m$ bins contains a black ball. This occurs with probability $\dfrac{\binom{N-n}{mN/k}}{\binom{N}{mN/k}}$. The denominator counts the ways to choose the $mN/k$ balls that fill those bins from all $N$ balls, and the numerator counts the ways to choose them from the $N-n$ white balls only. (Here $m$ counts events, not bins; the number of bins is $k$.)

By symmetry, all terms in the expansion that involve the same number of events are equal, and there are $\binom{k}{m}$ terms involving exactly $m$ events.
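Grouping the terms of the expansion by the number $m$ of intersected events, and using the fact that there are $\binom{k}{m}$ ways to pick $m$ of the $k$ events, the whole sum can be written compactly as follows (same notation as above, with $k$ bins):

$$\Pr\left(\bigcup_{i=1}^k A_i\right) = \sum_{m=1}^{k} (-1)^{m-1} \binom{k}{m} \dfrac{\binom{N-n}{mN/k}}{\binom{N}{mN/k}}$$

Any term with $mN/k > N-n$ is zero, since there are not enough white balls to fill $m$ bins.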

For your example of $N=6,n=3,k=3$ we get the calculation:

$$3\cdot \dfrac{\binom{3}{2}}{\binom{6}{2}} - 0+0 = \dfrac{3}{5}=0.6$$

This means the answer to the question asked in your post is $1-0.6=0.4$, exactly as expected.
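The same steps can be applied to the other case pictured in the question, $N=6, n=3, k=2$ (two bins of three balls each), as a further check:

$$2\cdot \dfrac{\binom{3}{3}}{\binom{6}{3}} - \dfrac{\binom{3}{6}}{\binom{6}{6}} = \dfrac{2}{20} - 0 = 0.1$$

so the probability that both bins contain at least one black ball is $1-0.1=0.9$, i.e. 18 of the 20 sequences.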

JMoravitz
  • Excellent, thank you very much! I will write out the general formula as an edit to my post and then accept your answer. – user6376297 Apr 11 '24 at 17:43
  • The fact that I could use the hypergeometric probability by 'merging' different bins had briefly crossed my mind, but I dismissed it, thinking that it could not be correct. But indeed, as all balls are white in that case, putting them into 2 bins of size 3 or 1 bin of size 6 is the same. And the inclusion-exclusion principle... I had used it in other contexts, arriving at it from a completely different angle (I was tackling the 'dollar bill problem'). That is one powerful principle. I should remember it in cases like this. Thanks again for your kind help! – user6376297 Apr 11 '24 at 18:10