
I'm looking for help with a stochastics/combinatorics problem. We want to distribute $N$ balls of $C$ different colours into $K$ boxes, where usually $K<N$. The number of balls of a specific colour $c$ is denoted $b_{c}$. The boxes are of unrestricted size; for the same problem with boxes of an exact size, see HERE. Apart from their colours, the balls are indistinguishable. No two balls of the same colour are allowed to be in the same box (see BGM's comment below for notation), so $b_c \le K$ for every colour. What I am interested in is the probability of $x$ co-occurrences between balls of certain colours, i.e. of $x$ boxes that each contain a ball of every one of those colours. I know the solution for balls of 2 colours, where the probability follows a hypergeometric distribution (SOURCE): $$P(X=x)=\frac{\binom{K}{b_1}\binom{b_1}{x}\binom{K-b_1}{b_2-x}}{\binom{K}{b_1}\binom{K}{b_2}}=\frac{\binom{b_1}{x}\binom{K-b_1}{b_2-x}}{\binom{K}{b_2}}$$ The intuition for 2 colours is that we place the $b_1$ balls of colour 1 into $b_1$ of the $K$ boxes, choose $x$ of those boxes and add a ball of colour 2 to each of them. Then we distribute the remaining $b_2-x$ balls of colour 2 into the $K-b_1$ boxes that don't already contain a ball of colour 1. This is nice because it is a standard distribution with a known expected value and variance, which I need for my application.
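As a quick sanity check of the 2-colour formula, the sketch below evaluates the hypergeometric probabilities with exact fractions and compares the mean and variance with the standard hypergeometric moments $E[X]=b_1b_2/K$ and $\operatorname{Var}(X)=\frac{b_1b_2(K-b_1)(K-b_2)}{K^2(K-1)}$. The function names are illustrative only. For $K=5$, $b_1=4$, $b_2=3$ it gives $P(X=2)=3/5$, $P(X=3)=2/5$, mean $12/5$ and variance $6/25$.

```python
from fractions import Fraction
from math import comb


def two_colour_pmf(K, b1, b2):
    """Hypergeometric pmf of the number of boxes holding both colours."""
    denom = comb(K, b2)
    lo, hi = max(0, b1 + b2 - K), min(b1, b2)  # support of X
    return {x: Fraction(comb(b1, x) * comb(K - b1, b2 - x), denom)
            for x in range(lo, hi + 1)}


def mean_var(pmf):
    """Mean and variance of a pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    var = sum(x * x * p for x, p in pmf.items()) - mean ** 2
    return mean, var


pmf = two_colour_pmf(5, 4, 3)
print(pmf)
print(mean_var(pmf))
```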

Now I want to extend this to more than 2 colours. My current thinking is that this is not a straightforward extension to the multivariate hypergeometric distribution, but I'm happy to be convinced otherwise. The reason is that the distribution of the remaining balls after the $x$ co-occurrences are fixed is more complicated than in the case with 2 colours. With 3 colours, for example, the remaining balls of the third colour may go into boxes that contain one of the other colours, but not into boxes where both other colours are present (that would create an additional co-occurrence). Enumerating that should, in my understanding, be harder than the simple 2-colour case, where a ball of colour 1 is either in a box or not.
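One way to sketch the multi-colour case, assuming each colour $c$ independently occupies a uniformly random set of $b_c$ distinct boxes (the model implied by the denominator $\binom{K}{b_1}\binom{K}{b_2}$ in the 2-colour formula), is to add the colours one at a time. If $m$ boxes currently contain every colour placed so far, the next colour with $b$ balls hits $j$ of them with the hypergeometric probability $\binom{m}{j}\binom{K-m}{b-j}/\binom{K}{b}$. The count of boxes holding all colours is then a chain of hypergeometric steps rather than a single multivariate hypergeometric draw. The function `all_colour_pmf` is a hypothetical name, not from any library.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb


def all_colour_pmf(K, counts):
    """Pmf of the number of boxes that contain every colour in `counts`.

    Assumption: each colour independently occupies a uniformly random set of
    counts[c] distinct boxes out of K (at most one ball of a colour per box).
    """
    if any(b < 0 or b > K for b in counts):
        raise ValueError("each count must satisfy 0 <= b <= K")
    dist = {K: Fraction(1)}  # before any colour is placed, all K boxes qualify
    for b in counts:
        new = defaultdict(Fraction)
        for m, p in dist.items():
            # hypergeometric step: j of the b new balls land in the m-box intersection
            for j in range(max(0, b - (K - m)), min(b, m) + 1):
                new[j] += p * Fraction(comb(m, j) * comb(K - m, b - j), comb(K, b))
        dist = dict(new)
    return dist


dist = all_colour_pmf(5, [4, 3, 2])
print(dist)
```

With two colours this reduces to the hypergeometric formula above, and the result does not depend on the order in which the colours are added.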

Example: We have 9 balls of 3 colours and 5 boxes, with $(b_1,b_2,b_3)=(4,3,2)$. Is there an explicit probability distribution for how likely it is that balls of colours 1, 2 and 3 appear together in the same box 0 times, 1 time or 2 times?
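A possible closed form for this example, again assuming independent uniform placement of each colour, uses factorial moments. A fixed set of $s$ boxes contains every colour with probability $\prod_c \binom{K-s}{b_c-s}/\binom{K}{b_c}$, so
$$F_s = E\left[\binom{X}{s}\right] = \binom{K}{s}\prod_{c}\frac{\binom{K-s}{b_c-s}}{\binom{K}{b_c}},$$
and binomial inversion gives
$$P(X=x)=\sum_{s=x}^{\min_c b_c}(-1)^{s-x}\binom{s}{x}F_s .$$
In particular $E[X]=F_1=K\prod_c \frac{b_c}{K}$ and $\operatorname{Var}(X)=2F_2+F_1-F_1^2$. For $K=5$ and $(b_1,b_2,b_3)=(4,3,2)$ this gives $P(X=0)=11/50$, $P(X=1)=3/5$, $P(X=2)=9/50$, mean $24/25$ and variance $249/625$. The sketch below evaluates these formulas with exact fractions; the function names are illustrative.

```python
from fractions import Fraction
from math import comb


def factorial_moment(K, counts, s):
    """F_s = E[C(X, s)]: expected number of s-box sets holding every colour."""
    if any(b < s for b in counts):
        return Fraction(0)
    f = Fraction(comb(K, s))
    for b in counts:
        f *= Fraction(comb(K - s, b - s), comb(K, b))
    return f


def pmf_inclusion_exclusion(K, counts):
    """P(X = x) for x = 0..min(counts) via binomial inversion of F_s."""
    smax = min(counts)
    F = [factorial_moment(K, counts, s) for s in range(smax + 1)]
    return {x: sum((-1) ** (s - x) * comb(s, x) * F[s] for s in range(x, smax + 1))
            for x in range(smax + 1)}


def mean_and_variance(K, counts):
    """E[X] = F_1 and Var(X) = 2*F_2 + F_1 - F_1**2."""
    f1 = factorial_moment(K, counts, 1)
    f2 = factorial_moment(K, counts, 2)
    return f1, 2 * f2 + f1 - f1 ** 2


pmf3 = pmf_inclusion_exclusion(5, [4, 3, 2])
mean, var = mean_and_variance(5, [4, 3, 2])
print(pmf3, mean, var)
```

Values of $x$ below the minimum possible overlap simply come out with probability 0.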

(1) Does this experiment have an explicit probability distribution, and if so, what is it? (2) If it does not, how can we enumerate the combinatorial possibilities for balls of multiple colours?
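For question (2), under the same independence assumption each colour picks one of $\binom{K}{b_c}$ box sets, so there are $\prod_c\binom{K}{b_c}$ equally likely configurations. For small $K$ they can simply be enumerated. In the example there are $5\cdot 10\cdot 10 = 500$ configurations, of which 110, 300 and 90 have 0, 1 and 2 boxes holding all three colours, i.e. probabilities $11/50$, $3/5$ and $9/50$. The helper name is hypothetical, and because the cost grows like the product of binomial coefficients this is only practical as a cross-check.

```python
from collections import Counter
from itertools import combinations, product


def enumerate_configurations(K, counts):
    """Tally all configurations by the number of boxes holding every colour.

    Each colour chooses a set of counts[c] distinct boxes out of range(K).
    """
    box_sets = [list(combinations(range(K), b)) for b in counts]
    tally = Counter()
    for choice in product(*box_sets):
        common = set(range(K))
        for boxes in choice:
            common &= set(boxes)  # boxes that still contain every colour so far
        tally[len(common)] += 1
    return tally


tally = enumerate_configurations(5, [4, 3, 2])
total = sum(tally.values())
print(tally, total)
```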

Thank you in advance for your help!

Nils R
  • Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. – Community Jan 25 '25 at 21:39
  • I am confused about your question and what counts as a co-occurrence. If you have $3$ red balls in a box and $2$ blue balls in the same box, does that count as $1$ co-occurrence between red and blue or $6$? Either way, how are the balls put in the boxes if both are indistinguishable? I doubt it addresses your question, but you may be talking about the multivariate hypergeometric distribution – Henry Jan 25 '25 at 22:15
  • No two balls of the same colour are allowed to be in the same box, I've edited the question to address that. As for the multivariate hypergeometric distribution, I've mentioned it in the question: I don't think it appropriately enumerates the possibilities of distributing the "remaining" balls after the x co-occurrences are set up, because there could be potentially more co-occurrences between those remaining balls. – Nils R Jan 25 '25 at 22:20
  • What is the source of this problem? – user2661923 Jan 25 '25 at 22:41
  • A problem from my research, I need the distribution for specifying a null model – Nils R Jan 25 '25 at 22:43
  • So in this setup, some boxes may be empty at the end while some boxes can contain up to $C$ balls of different colors, $1$ ball from each color. And now are you asking: given a certain set of colors $\{c_1, c_2, \ldots, c_m\}$, with numbers of balls $\{b_1, b_2, \ldots, b_m\}$ for each of the colors, what is the distribution (probability mass function) of the number of boxes containing all these $m$ colors of balls? – BGM Jan 26 '25 at 12:08
  • Correct @BGM, I'll edit the question to make that clearer – Nils R Jan 26 '25 at 15:53
  • If I understand correctly, the problem is equivalent to finding the probability of an intersection of size $x$ among $C$ random sets of sizes $b_1, \ldots, b_C$ with elements in $\{1,\ldots,K\}$. Maybe the generating function approach here could be extended to $C \gt 2$. – Fabius Wiesner Jan 27 '25 at 11:22
