Original subset sum problem[1]: given a multiset $S$ of integers, find a subset $T$ whose values sum to $a$.
Variation of the subset sum problem: given a multiset $S$ of positive integers, find a subset $T$ of size $m$ whose values sum to $a$.
Related unanswered post: Subset sum - probability of solution existence
Full definition of the problem
- Pick $a,b,n,m \in \mathbb{Z}^+$
- Construct a random multiset $S$ of size $n$ by doing the following:
a. Start with $S=\emptyset$
b. Pick a random integer in the range $[1,b]$ and add it to $S$
c. Repeat the above step until you have $n$ elements
- Pick a random subset $T$ of $S$ that has size $m$
- Find $$\text{Pr} \left[\sum_{t \in T}t = a \right] = \frac{N}{D}$$
Attempt at a solution
The denominator $D$ is simply
$$D(b,n,m)=\left(\!\!{b \choose n}\!\!\right)\binom{n}{m}$$
where the double-braced binomial terms are called multiset coefficients.
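As a quick check of the denominator, here is a minimal Python sketch. The helper names `multiset_coeff` and `denominator` are made up for illustration. It uses the standard identity $\left(\!\!{n \choose k}\!\!\right)=\binom{n+k-1}{k}$ and confirms the multiset count by direct enumeration for one small case. Note the assumption built into this count: every multiset $S$ is weighted equally.

```python
from itertools import combinations_with_replacement
from math import comb


def multiset_coeff(n: int, k: int) -> int:
    """Number of size-k multisets drawn from n distinct values."""
    if k == 0:
        return 1
    if n <= 0:
        return 0
    return comb(n + k - 1, k)


def denominator(b: int, n: int, m: int) -> int:
    """D(b, n, m): multisets S of size n from [1, b], times m-subsets of positions in S."""
    return multiset_coeff(b, n) * comb(n, m)


# Sanity check: enumerate all multisets of size n from [1, b] for one small case.
b, n, m = 3, 4, 2
enumerated = sum(1 for _ in combinations_with_replacement(range(1, b + 1), n))
assert enumerated == multiset_coeff(b, n)
print(denominator(b, n, m))
```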
The numerator is much harder to decipher. After a lot of counting this is the equation I found:
$$N(a,b,n,m)=\Lambda \left[ \left(\!\!{n-m+1 \choose m}\!\!\right) + \sum_{k=1}^{n-m} \left(\!\!{k \choose m}\!\!\right) \left[ \left(\!\!{b \choose n-m-k+1}\!\!\right) - \sum_{l=0}^{n-m-k} \left(\!\!{b-1 \choose l}\!\!\right) \right] \right]$$
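The bracketed factor can be evaluated mechanically. The sketch below is a literal, term-by-term transcription of the bracket in the equation above into Python; the function name `gamma` and the helper `multiset_coeff` are illustrative choices, and nothing here checks that the formula itself is right beyond a few tiny cases.

```python
from math import comb


def multiset_coeff(n: int, k: int) -> int:
    """Multiset coefficient ((n choose k)) = C(n + k - 1, k)."""
    if k == 0:
        return 1
    if n <= 0:
        return 0
    return comb(n + k - 1, k)


def gamma(b: int, n: int, m: int) -> int:
    """Bracketed factor of N(a, b, n, m), transcribed from the formula as written."""
    total = multiset_coeff(n - m + 1, m)
    for k in range(1, n - m + 1):  # k = 1 .. n-m
        inner = multiset_coeff(b, n - m - k + 1) - sum(
            multiset_coeff(b - 1, l) for l in range(0, n - m - k + 1)  # l = 0 .. n-m-k
        )
        total += multiset_coeff(k, m) * inner
    return total


if __name__ == "__main__":
    for b, n, m in [(2, 2, 2), (2, 3, 2), (3, 3, 2), (2, 4, 2)]:
        print((b, n, m), gamma(b, n, m))
```

Comparing these values against a brute-force count for small $b,n,m$ is a cheap way to test the bracket separately from $\Lambda$.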
The methodology I used was the following:
1. Fix $m$ (starting at $m=2$), giving us $T=\{x_1,x_2,\ldots,x_m\}$
2. Fix $a$, giving us $\sum x_i = a$ (the number of options for the $x_i$ is $\Lambda$)
3. Fix $b,S$
4. Count, note any patterns
5. Go back to #2 and repeat until you get an equation for this particular fixed $m$
6. Go back to #1 and increase $m$ by 1
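The counting step can be automated for small parameters. The following is a brute-force Python sketch, not the hand counting described above. It assumes each multiset $S$ counts once and each choice of $m$ positions in $S$ counts once, which matches the denominator $D$. The function name `count_numerators` is hypothetical.

```python
from collections import Counter
from itertools import combinations, combinations_with_replacement


def count_numerators(b: int, n: int, m: int) -> Counter:
    """Return {a: N(a)} by enumerating every multiset S and every m-subset T of S."""
    counts = Counter()
    for S in combinations_with_replacement(range(1, b + 1), n):
        # combinations() picks by position, so repeated values in S stay distinct.
        for T in combinations(S, m):
            counts[sum(T)] += 1
    return counts


if __name__ == "__main__":
    table = count_numerators(3, 3, 2)
    for a in sorted(table):
        print(a, table[a])
```

The enumeration grows like $\left(\!\!{b \choose n}\!\!\right)\binom{n}{m}$, so it is only practical for small $b$ and $n$, but that is enough to spot patterns or to test a candidate formula.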
I got up to $m=4$ and the pattern seemed to emerge that the numerator was
$$N(a,b,n,m)=\Lambda(a,b,m) \Gamma(b,n,m)$$
$\Gamma$ wasn't too difficult to figure out, and I'm fairly sure the answer I got is correct. I haven't figured out $\Lambda$ fully, but a good approximation for small $m$ seems to be
$$ \Lambda(a,b,m) = \begin{cases} p_m(a) &: a < b+m-1 \\ p_m(a) - \sum_{k=m-1}^{\alpha} &: b+m-1 \le a \le \beta \\ p_m(2\beta -a) - \sum_{k=m-1}^{\bar{\alpha}} &: \beta < a \le 2\beta -b-m+1 \\ p_m(2\beta -a) &: 2\beta-b-m+1 < a \le mb \end{cases} $$
where
$$ \begin{aligned} \alpha &= a-b-1 \\ \bar{\alpha} &= 2\beta -a-b-1 \\ \beta &= m\left( 1+\frac{b-1}{2}\right) \end{aligned} $$
Here $p_m$ is the partition function. The different cases arise because the counts appear to be symmetric about the center $\beta$.
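For comparison with the piecewise approximation, exact values can be computed with a small dynamic program. This sketch assumes that $\Lambda(a,b,m)$ is meant to be the number of multisets of $m$ integers in $[1,b]$ that sum to $a$, which is one reading of "the number of options for the $x_i$". The function name `bounded_partitions` is made up for illustration.

```python
def bounded_partitions(a: int, b: int, m: int) -> int:
    """Count multisets of exactly m integers in [1, b] whose sum is a."""
    if a < 0:
        return 0
    # dp[j][s] = number of multisets of j parts with sum s, using the values seen so far
    dp = [[0] * (a + 1) for _ in range(m + 1)]
    dp[0][0] = 1
    for v in range(1, b + 1):
        # Ascending j lets the same value v be reused several times in one multiset.
        for j in range(1, m + 1):
            for s in range(v, a + 1):
                dp[j][s] += dp[j - 1][s - v]
    return dp[m][a]


if __name__ == "__main__":
    b, m = 5, 3
    print([bounded_partitions(a, b, m) for a in range(m, m * b + 1)])
```

Under that reading the symmetry is exact: replacing each $x_i$ by $b+1-x_i$ maps a sum $a$ to $m(b+1)-a = 2\beta-a$, so the count at $a$ equals the count at $2\beta-a$.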
Verifying the equation
The equation can be checked by adding the probabilities and checking if they equal 1:
$$1=\sum_{a=m}^{bm}\frac{N(a,b,n,m)}{D(b,n,m)}$$
Surprisingly, the sum holds for most combinations in the ranges $b \in [5,40]$, $n \in [5,40]$, $m \in [2,\min(n,8)]$. Increasing $m$ beyond $8$ makes the summation go haywire and tend toward negative values (which I suspect is due to the $\Lambda$ part).
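The check can be split into a $\Lambda$ part and a $\Gamma$ part. This is a sketch under one assumption: that the exact $\Lambda(a,b,m)$ counts the multisets of $m$ values in $[1,b]$ summing to $a$. Summing over all $a$ then counts every size-$m$ multiset exactly once:

$$\sum_{a=m}^{bm}\Lambda(a,b,m)=\left(\!\!{b \choose m}\!\!\right)$$

Since $\Gamma$ does not depend on $a$, the normalization condition reduces to

$$\Gamma(b,n,m)\left(\!\!{b \choose m}\!\!\right)=\left(\!\!{b \choose n}\!\!\right)\binom{n}{m} \quad\Longleftrightarrow\quad \Gamma(b,n,m)=\frac{(b+n-1)!}{(n-m)!\,(b+m-1)!}=\binom{b+n-1}{n-m}$$

So the two factors can be tested independently: $\Gamma$ against the right-hand side, and the approximate $\Lambda$ against the first identity. If $\Gamma$ passes and the total still drifts away from 1 for larger $m$, that would point at the approximation of $\Lambda$.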
Where help is needed
Any ideas for $\Lambda$? Does anyone know of any papers that have solved this problem or a similar one?