Drawing balls from an urn or counting certain posets

Question

A colleague of mine was curious about the number of possible start-configurations in a game. The game itself is not known to me, but the question which he formulated as urn problem was interesting.

Urn problem:

Assume we have an urn containing 100 balls. The balls are colored, 25 are red, 25 blue, 25 green and 25 are black. We pick four balls without replacement and repeat this step until the urn is empty.

We so obtain 25 groups of four balls each and the question is: how many configurations of this type are possible? Thereby we may assume the order of the $4$ balls in each group is not relevant as well as the order of the $25$ groups is not relevant.

A reformulation:

Given an alphabet $V=\{1,2,3,4\}$ we consider $4$-letter words $x_1x_2x_3x_4$ with $1\leq x_1\leq x_2\leq x_3\leq x_4\leq 4$ when considered as numbers.

These $4$-letter words are building blocks of words of length $100$. We take $25$ blocks of this kind to form a $100$-letter word $w=b_1b_2\ldots b_{25}$ with the property that $b_j\leq b_{j+1}, 1\leq j\leq 25$ when considered as numbers.

Question: How many words of this type contain exactly $25$ characters of each of the characters $j\in V, 1\leq j\leq 4$.

In general we are given an alphabet $V=\{1,2,\ldots,q\}$ with size $|V|=q$.

(a) We consider words of length $N$ and building blocks $x_1x_2\ldots x_M$ of size $M$ with $1\leq x_1\leq x_2\leq \cdots \leq x_M\leq q$ and $M|N$, i.e. $N$ being an integer multiple of $N$.

(b) The words are of the form $w=b_1b_2\ldots b_{N/M}$ with $b_j\leq b_{j+1}, 1\leq j \leq N/M-1$.

(c) We are looking for the number $\color{blue}{A_q(N,M)}$, the number of words as specified in (a) and (b) which contain $N/q$ characters of each of the characters $j\in V, 1\leq j\leq q$ implying that $N$ is an integer multiple of $q$ as well.

In this setting the urn problem is asking for $\color{blue}{A_4(100,4)}$.

The number of building blocks of $A_q(N,M)$ can be easily determined. It is \begin{align*} \sum_{1\leq x_1\leq x_2\leq\cdots\leq x_M\leq q}1=\binom{M+q-1}{M}\tag{1} \end{align*}

A generating function of (1) can be easily derived. We have \begin{align*} \sum_{M=0}^\infty\sum_{q=0}^\infty x^My^q\binom{M+q-1}{M}&=\frac{1-x}{1-x-y}\\ &=1+y\left(1+x+x^2+x^3+x^4+\cdots\right)\\ &\qquad+y^2\left(1+2x+3x^2+4x^3+5x^4+\cdots\right)\\ &\qquad+y^3\left(1+3x+6x^2+10x^3+\color{blue}{15}x^4+\cdots\right)\\ &\qquad\cdots \end{align*} Denoting with $[x^M]$ the coefficient of $x^M$ in a series we see for instance $[x^4y^3]\frac{1-x}{1-x-y}=\binom{6}{2}=15$ which is the number of valid building blocks of size $4$ when given a three letter alphabet $V=\{1,2,3\}$. These $15$ building blocks are \begin{align*} 1111\quad1122\quad1222\quad1333\quad2233\\ 1112\quad1123\quad1223\quad2222\quad2333\\ 1113\quad1133\quad1233\quad2223\quad3333\\ \end{align*}

The difficult part (at least for me) is to determine the number of valid words $A_q(N,M)$ which can be generated from these building blocks. I've tried to derive a generating function which describes this scenario, but I wasn't successful up to now.

Posets: Another approach could be using posets based upon the approach:

Start with an empty word and append $N/M$ times a building block respecting the ordering given in (b).

Derive a generating function for the number of valid posets.

In order to better see the situation, here is a manageable example. We are looking for $A_2(12,M)$ the number of words of length $12$ from a two-letter alphabet with different block-sizes $M$ following (a) - (c) from above. The Hasse-diagrams for $M=2,3,4,6$ are:

We see graded posets of length $N/M$ with $A_2(12,2)=A_2(12,6)=4$ and $A_2(12,3)=A_2(12,4)=5$ indicating the symmetry \begin{align*} A_q(N,M)=A_q(N,N/M) \end{align*}

Here is a list of small values of $A_2(N,M)$: $$ \begin{array}{r|rrrrrr} M&1&2\\ A_2(2,M)&1&1\\ \hline M&1&2&4\\ A_2(4,M)&1&2&1\\ \hline M&1&2&3&6\\ A_2(6,M)&1&2&2&1\\ \hline M&1&2&4&8\\ A_2(8,M)&1&3&3&1\\ \hline M&1&2&5&10\\ A_2(10,M)&1&3&3&1\\ \hline M&1&2&3&4&6&12\\ A_2(12,M)&1&4&5&5&4&1\\ \end{array} $$

Summary: The question is how to find a formula for $A_q(N,M)$ or how to derive a generating function for these numbers. Alternatively is there an appropriate technique to count the number of posets corresponding to $A_q(N,M)$?

@MarkoRiedel: I've added a list $A_2(N,M)$ of small values of $N$ and $M$. — Markus Scheuer, Jan 20 '19 at 18:18
Yes I see, thank you very much, these values match what I have, and the posets too. — Marko Riedel, Jan 20 '19 at 18:19

Marko Riedel · Accepted Answer · 2019-01-25T22:53:34.973

It would appear that these are multisets of multisets which may be enumerated using the Polya Enumeration Theorem (PET), same as what was posted at the following MSE link.

The answer is a special case of what was presented there and is given by

$$\left[\prod_{k=1}^q A_k^{N/q}\right] Z\left(S_{N/M}; Z\left(S_M; \sum_{k=1}^q A_{k}\right)\right).$$

In terms of combinatorial classes we have made use of the unlabeled class

$$\def\textsc#1{\dosc#1\csod} \def\dosc#1#2\csod{{\rm #1{\small #2}}} \textsc{MSET}_{=N/M} \left(\textsc{MSET}_{=M} \left(\sum_{k=1}^q \mathcal{A}_{k}\right)\right).$$

The linked post documents how to compute the cycle index of the symmetric group and how to substitute $Z(S_M)$ into $Z(S_{N/M}).$

The question is, why can we use multisets here? The answer is that there is a one-to-one mapping between those multisets and what OP calls building blocks. Every block obviously corresponds to exactly one multiset and every multiset to one block, namely by ordering its constituents. The same thing happens with multisets of blocks.

This was implemented in Maple and here are a few sample results that a potential contributor may compare to their work and / or use to verify that we have the correct understanding of the problem.

> seq(A(2,12,M), M in divisors(12));
                              1, 4, 5, 5, 4, 1

> seq(A(3,12,M), M in divisors(12));
                            1, 15, 25, 23, 10, 1

> seq(A(4,12,M), M in divisors(12));
                            1, 47, 93, 70, 22, 1

> seq(A(4,20,M), M in divisors(20));
                          1, 306, 2505, 1746, 73, 1

> seq(A(5,20,M), M in divisors(20));
                        1, 2021, 19834, 11131, 191, 1

The Maple code for the above goes as follows. The reader is invited to present their results in a language of their choice for a Rosetta stone effect.

with(combinat);
with(numtheory);

pet_cycleind_symm :=
proc(n)
option remember;

    if n=0 then return 1; fi;

    expand(1/n*
           add(a[l]*pet_cycleind_symm(n-l), l=1..n));
end;

pet_varinto_cind :=
proc(poly, ind)
local subs1, subsl, polyvars, indvars, v, pot;

    polyvars := indets(poly);
    indvars := indets(ind);

    subsl := [];

    for v in indvars do
        pot := op(1, v);

        subs1 :=
        [seq(polyvars[k]=polyvars[k]^pot,
             k=1..nops(polyvars))];

        subsl := [op(subsl), v=subs(subs1, poly)];
    od;

    subs(subsl, ind);
end;

pet_cycleind_comp :=
proc(idxtrg, idx)
local varstrg, vars, vt, sbstrg, sbs, len;

    varstrg := indets(idxtrg);
    vars := indets(idx);

    sbstrg := [];

    for vt in varstrg do
        len := op(1, vt);

        sbs :=
        [seq(v = a[op(1, v)*len], v in vars)];

        sbstrg :=
        [op(sbstrg),
         a[len] = subs(sbs, idx)];
    od;

    expand(subs(sbstrg, idxtrg));
end;

A :=
proc(q, N, M)
option remember;
local cind_block, cind_word, cind_comp,
    vars, gf, vidx;

    if N mod q > 0 or N mod M > 0 then
        return FAIL;
    fi;

    cind_block := pet_cycleind_symm(M);
    cind_word := pet_cycleind_symm(N/M);

    cind_comp := pet_cycleind_comp(cind_word, cind_block);

    vars := add(A[p], p=1..q);
    gf := expand(pet_varinto_cind(vars, cind_comp));

    for vidx to q do
        gf := coeff(gf, A[vidx], N/q);
    od;

    gf;
end;

Addendum. With the above answer we compute the compound cycle index of the operator

$$\textsc{MSET}_{=N/M}(\textsc{MSET}_{=M}(\cdot))$$

and then substitute the variables into this cycle index. M. Scheuer in his answer suggests a different approach where we substitute the variables into the first cycle index, obtaining the multisets, and then substitute into $Z_{N/M}.$ Experimental data indicate that this approach is superior. The same effect can be achieved by not expanding the compound cycle index into its constitutents. This yields the following Maple code (duplicate code omitted).

pet_cycleind_comp :=
proc(idxtrg, idx)
local varstrg, vars, vt, sbstrg, sbs, len;

    varstrg := indets(idxtrg);
    vars := indets(idx);

    sbstrg := [];

    for vt in varstrg do
        len := op(1, vt);

        sbs :=
        [seq(v = a[op(1, v)*len], v in vars)];

        sbstrg :=
        [op(sbstrg),
         a[len] = subs(sbs, idx)];
    od;

    subs(sbstrg, idxtrg);
end;


A :=
proc(q, N, M)
option remember;
local cind_block, cind_word, cind_comp,
    vars, gf, vidx;

    if N mod q > 0 or N mod M > 0 then
        return FAIL;
    fi;

    cind_block := pet_cycleind_symm(M);
    cind_word := pet_cycleind_symm(N/M);

    cind_comp := pet_cycleind_comp(cind_word, cind_block);

    vars := add(A[p], p=1..q);
    gf := expand(pet_varinto_cind(vars, cind_comp));

    for vidx to q do
        gf := coeff(gf, A[vidx], N/q);
    od;

    gf;
end;

AX :=
proc(q, N, M)
option remember;
local cind_block, cind_word, cind_comp,
    vars, blocks, gf, vidx;

    if N mod q > 0 or N mod M > 0 then
        return FAIL;
    fi;

    cind_block := pet_cycleind_symm(M);
    vars := add(A[p], p=1..q);
    blocks := pet_varinto_cind(vars, cind_block);

    cind_word := pet_cycleind_symm(N/M);
    gf := expand(pet_varinto_cind(blocks, cind_word));

    for vidx to q do
        gf := coeff(gf, A[vidx], N/q);
    od;

    gf;
end;

Many thanks for providing this great answer so quickly. Of course, I'll have to go through this and your other answer more thoroughly in order to fully grasp it. One question: Would you mind to calculate $A_4(100,4)$ if possible and add it to your answer? — Markus Scheuer, Jan 20 '19 at 19:08
Many thanks, Marko. It is not that important to put additional efforts into it. I was just curious and I'm already at my computation limits (using R) at much smaller values (and of course with a rather suboptimal implementation). Nevertheless many thanks for your support! :-) — Markus Scheuer, Jan 20 '19 at 19:30

Markus Scheuer · Answer 2 · 2019-01-22T19:06:14.607

This answer is a supplement to the great answer of @MarkoRiedel and a kind of reflection to better see the mechanisms working. The original problem was to determine $A_4(100,4)$ which can be written according to Markos answer as \begin{align*} A_4(100,4)=[a^{25}b^{25}c^{25}d^{25}]Z\left(S_{25}(Z\left(S_4;a+b+c+d\right)\right) \end{align*} It is a hopeless task to tackle this manually. But we can see all important aspects by taking a smaller parameter $N=8$ instead of $N=100$. Leaving the other parameter $M=4,q=4$ unchanged we have $N/M=N/q=2$ and we calculate \begin{align*} \color{blue}{A_2(8,4)=[a^{2}b^{2}c^{2}d^{2}]Z\left(S_{2}(Z\left(S_4;a+b+c+d\right)\right)}\tag{1} \end{align*}

Calculation of $Z(S_4)$:

We start calculating $Z(S_4)$ by using the recurrence relation: \begin{align*} Z(S_n)=\frac{1}{n}\sum_{j=1}^n a_j Z(S_{n-j})\qquad\textrm{where}\qquad Z(S_0)=1 \end{align*} We obtain \begin{align*} Z(S_0)&=1\\ Z(S_1)&=\frac{1}{1}\sum_{j=1}^1a_jZ(S_{1-j})=a_1Z(S_0)\\ &=a_1\\ Z(S_2)&=\frac{1}{2}\sum_{j=1}^2 a_jZ(S_{2-j})=\frac{1}{2}\left(a_1Z(S_1)+a_2Z(S_0)\right)\\ &=\frac{1}{2}\left(a_1^2+a_2\right)\tag{2}\\ Z(S_3)&=\frac{1}{3}\sum_{j=1}^3 a_jZ(S_{3-j})=\frac{1}{3}\left(a_1\frac{1}{2}\left(a_1^2+a^2\right)+a_2a_1+a_3\right)\\ &=\frac{1}{6}\left(a_1^3+3a_1a_2+2a_3\right)\\ \color{blue}{Z(S_4)}&=\frac{1}{4}\sum_{j=1}^4a_jZ(S_{4-j})\\ &=\frac{1}{4}\left(a_1\frac{1}{6}\left(a_1^3+3a_1a_2+2a_3\right)+a_2\frac{1}{2}\left(a_1^2+a_2\right)+a_3a_1+a_4\right)\\ &\,\,\color{blue}{=\frac{1}{24}\left(a_1^4+6a_1^2a_2+8a_1a_3+3a_2^2+6a_4\right)}\tag{3} \end{align*}

An ordinary generating function of the cycle index $Z(S(n))$ is \begin{align*} \sum_{n=0}^\infty Z(S_n)) z^n=\exp\left(a_1z+\frac{a_2}{2}z^2+\frac{a_3}{3}z^3+\cdots\right) \end{align*} We can use this function to make a plausibility check of (3) via Wolfram Alpha by typing

SeriesCoefficient[Exp[a_1*z+a_2*z^2/2+a_3*z^3/3+a_4*z^4/4],{z,0,4}]

which gives the expected result.

Calculation of $Z(S_4;a+b+c+d)$

We substitute $a+b+c+d$ in (2)

\begin{align*} Z(S_4)=\frac{1}{24}\left(a_1^4+6a_1^2a_2+8a_1a_3+3a_2^2+6a_4\right) \end{align*}

by replacing $a_j$ with $a^j+b^j+c^j+d^j$ and obtain \begin{align*} \color{blue}{Z(S_4;}&\color{blue}{a+b+c+d)}\\ &=\frac{1}{24}\left((a+b+c+d)^4+6(a+b+c+d)^2(a^2+b^2+c^2+d^2)\right.\\ &\qquad\qquad \left.+8(a+b+c+d)(a^3+b^3+c^3+d^3)\right.\\ &\qquad\qquad \left.+3(a^2+b^2+c^2+d^2)^2+6(a^4+b^4+c^4+d^4)\right)\\ &=\cdots\\ &\,\,\color{blue}{=a^4+a^3b+a^3c+a^3d+a^2b^2+a^2bc+a^2bd+a^2c^2+a^2cd+a^2d^2}\\ &\qquad\color{blue}{+ab^3+ab^2c+ab^2d+abc^2+abcd+abd^2+ac^3+ac^2d+acd^2+ad^3}\\ &\qquad\color{blue}{+b^4+b^3c+b^3d+b^2c^2+b^2cd+b^2d^2+bc^3+bc^2d+bcd^2+bd^3}\\ &\qquad\color{blue}{+c^4+c^3d+c^2d^2+cd^3+d^4}\tag{4} \end{align*}

Note that (4) has $35$ summands each with coefficient $1$, so that $Z(S_4;a+b+c+d)|_{a=b=c=d=1}=35$. This corresponds to (1) in the original post which is, since $M=4$ and $q=4$ \begin{align*} \sum_{1\leq x_1\leq x_2\leq x_3 \leq x_4\leq 4}1=\binom{4+4-1}{4}=\binom{7}{3}=35 \end{align*}

Calculation of $[a^2b^2c^2d^2]Z(S_2;Z(S_4;a+b+c+d))$:

We substitute (4) in $Z(S_2)=\frac{1}{2}\left(a_1^2+a_2\right)$ according to (2) and we obtain \begin{align*} \color{blue}{[a^2}&\color{blue}{b^2c^2d^2]Z(S_2;Z(S_4;a+b+c+d))}\\ &=[a^2b^2c^2d^2]\frac{1}{2}\left(a^4+a^3b+a^3c+a^3d+a^2b^2+a^2bc+a^2bd\right.\\ &\qquad\qquad\qquad\quad\left.+a^2c^2+a^2cd+a^2d^2+ab^3+ab^2c+ab^2d\right.\\ &\qquad\qquad\qquad\quad\left.+abc^2+abcd+abd^2+ac^3+ac^2d+acd^2\right.\\ &\qquad\qquad\qquad\quad\left.+ad^3+b^4+b^3c+b^3d+b^2c^2+b^2cd +b^2d^2 \right.\\ &\qquad\qquad\qquad\quad\left.+bc^3+bc^2d+bcd^2+bd^3+c^4+c^3d+c^2d^2\right.\\ &\qquad\qquad\qquad\quad\left.+cd^3+d^4\right)^2\\ &\quad+[a^2b^2c^2d^2]\frac{1}{2}\left(a^8+a^6b^2+a^6c^2+a^6d^2+a^4b^4+a^4b^2c^2+a^4b^2d^2\right.\\ &\quad\qquad\qquad\qquad\quad\left.+a^4c^4+a^4c^2d^2+a^4d^4 +a^2b^6+a^2b^4c^2+a^2b^4d^2\right.\\ &\quad\qquad\qquad\qquad\quad\left.+a^2b^2c^4+(abcd)^2+a^2b^2d^4+a^2c^6 +a^2c^4d^2+a^2c^2d^4 \right.\\ &\quad\qquad\qquad\qquad\quad\left.+a^2d^6+b^8+b^6c^2+b^6d^2+b^4c^4+b^4c^2d^2+b^4d^4\right.\\ &\quad\qquad\qquad\qquad\quad\left.+b^2c^6+b^2c^4d^2+b^2c^2d^4+b^2d^6 +c^8+c^6d^2+c^4d^4\right.\\ &\quad\qquad\qquad\qquad\quad\left.+c^2d^6+d^8\right)\\ &=[a^2b^2c^2d^2]\frac{1}{2}\left(a^2b^2+a^2bc+a^2bd+a^2c^2+a^2cd+a^2d^2\right.\\ &\qquad\qquad\qquad\quad\left.+ab^2c+ab^2d+abc^2+abcd+abd^2+ac^2d+acd^2\right.\\ &\qquad\qquad\qquad\quad\left.+b^2c^2+b^2cd+b^2d^2+bc^2d+bcd^2+c^2d^2\right)^2\\ &\quad+[a^2b^2c^2d^2]\frac{1}{2}(abcd)^2\tag{5}\\ &=[a^2b^2c^2d^2]\frac{1}{2}\left(2\left(a^2b^2\right)\left(c^2d^2\right) +2\left(a^2bc\right)\left(bcd^2\right) +2\left(a^2bd\right)\left(bc^2d\right)\right.\\ &\qquad\qquad\qquad\quad\left.+2\left(a^2c^2\right)\left(b^2d^2\right) +2\left(a^2cd\right)\left(b^2cd\right) +2\left(a^2d^2\right)\left(b^2c^2\right)\right.\\ &\qquad\qquad\qquad\quad\left.+2\left(ab^2c\right)\left(acd^2\right) +2\left(ab^2d\right)\left(ac^2d\right) +2\left(abc^2\right)\left(abd^2\right)\right.\\ &\qquad\qquad\qquad\quad\left.+(abcd)^2\right)\\ &\quad+[a^2b^2c^2d^2]\frac{1}{2}(abcd)^2\tag{6}\\ &\,\,\color{blue}{=10}\tag{7} \end{align*}

Comment:

In (5) we keep only terms from the line above which have linear or square factors, since other terms do not contribute to $[a^2b^2c^2d^2]$.
In (6) we multiply out and indicate the factors by keeping them in parenthesis.

We finally got the result (7): \begin{align*} \color{blue}{A_4(8,4)}&=[a^2b^2c^2d^2]Z(S_2;Z(S_4;a+b+c+d))\\ &\,\,\color{blue}{=10} \end{align*}

We can verify the result by listing the $35$ valid configurations according to $Z(S_4;a+b+c+d)$ $$ \begin{array}{cccc} 1111&1222&2222&3333\\ 1112&\color{blue}{1223}&2223&3334\\ 1113&\color{blue}{1224}&2224&\color{blue}{3344}\\ 1114&\color{blue}{1233}&\color{blue}{2233}&3444\\ \color{blue}{1122}&\color{blue}{1234}&\color{blue}{2234}&4444\\ \color{blue}{1123}&\color{blue}{1244}&\color{blue}{2244}&\\ \color{blue}{1124}&1333&2333&\\ \color{blue}{1133}&\color{blue}{1334}&\color{blue}{2334}&\\ \color{blue}{1134}&\color{blue}{1344}&\color{blue}{2344}&\\ \color{blue}{1144}&1444&2444&\\ \end{array} $$

The valid strings which have no more than two occurrences of each of the characters are marked $\mathrm{\color{blue}{blue}}$. Corresponding with $[a^2b^2c^2d^2]Z(S_2(Z(S_4;a+b+c+d))$ we concatenate the valid $\mathrm{\color{blue}{blue}}$ strings and find the $10$ resulting strings \begin{align*} 1122.3344\qquad1144.2233\\ 1123.2344\qquad1223.1344\\ 1124.2334\qquad1224.1334\\ 1133.2244\qquad1233.1244\\ 1134.2234\qquad1234.1234\\ \end{align*}

I have added some commentary regarding the above. BTW in your problem statement, shouldn't the number of multisets be ${M+q-1\choose q-1}.$ Also, in your generating function example you seem to have the meaning of $x^3$ and $y^4$ reversed. — Marko Riedel, Jan 25 '19 at 22:41
@MarkoRiedel: Many thanks for the clarifying addendum in your answer and for pointing at the mistakes in my post. Corrected. — Markus Scheuer, Jan 26 '19 at 13:07

Drawing balls from an urn or counting certain posets

2 Answers2

Linked