Probability of finding a solution to variation of subset sum problem (with positive integers and fixed subset size)

Question

Original subset sum problem: given a multiset $S$ of integers, find a subset $T$ whose values sum to $a$.

Variation of subset sum problem: given a multiset $S$ of positive integers, find a subset $T$ of size $m$ whose values sum to $a$.

Related unanswered post: Subset sum - probability of solution existence

Full definition of the problem

1. Pick $a,b,n,m \in \mathbb{Z}^+$
2. Construct a random multiset $S$ of size $n$ by doing the following:
   a. Start with $S=\emptyset$
   b. Pick random element in the range $[1,b]$ and add it to $S$
   c. Repeat the above step until you have $n$ elements
3. Pick a random subset $T$ of $S$ that has size $m$
4. Find $$\text{Pr} \left[\sum_{t \in T}t = a \right] = \frac{N}{D}$$
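
The procedure above can be estimated numerically before looking for a closed form. The following is a minimal Monte Carlo sketch, assuming that "random" means uniform and independent draws from $[1,b]$ and that $T$ is a uniformly random choice of $m$ of the $n$ elements. The function name `simulate_probability` and the `trials` and `seed` parameters are illustrative and not part of the original post.

```python
import random

def simulate_probability(a, b, n, m, trials=100_000, seed=0):
    """Estimate Pr[sum(T) = a] by simulating the two-stage procedure.

    Assumption: every draw is uniform on {1, ..., b} and independent,
    and T is a uniformly random choice of m of the n drawn elements.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Steps a-c: build S by drawing n elements from [1, b], repeats allowed.
        S = [rng.randint(1, b) for _ in range(n)]
        # Pick m of the n elements without replacement.
        T = rng.sample(S, m)
        if sum(T) == a:
            hits += 1
    return hits / trials
```

An estimate from this sketch is only accurate to roughly $1/\sqrt{\text{trials}}$, so it is useful as a sanity check on a candidate formula rather than as a proof.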

Attempt at a solution

The denominator $D$ is simply

$$D(b,n,m)=\left(\!\!{b \choose n}\!\!\right)\binom{n}{m}$$

where the double-braced binomial terms are called multiset coefficients.
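
For reference, a multiset coefficient can be evaluated with an ordinary binomial coefficient, since $\left(\!\!{n \choose k}\!\!\right) = \binom{n+k-1}{k}$. The sketch below computes $D(b,n,m)$ exactly as it is written above; the helper names `multiset_coefficient` and `denominator` are illustrative.

```python
from math import comb

def multiset_coefficient(n_types, k):
    """Number of multisets of size k over n_types kinds: C(n_types + k - 1, k)."""
    return comb(n_types + k - 1, k)

def denominator(b, n, m):
    """D(b, n, m) as written in the question: multisets of size n over [1, b],
    times the number of ways to pick m of the n elements."""
    return multiset_coefficient(b, n) * comb(n, m)
```

For example, `denominator(5, 5, 2)` evaluates to $\binom{9}{5}\binom{5}{2} = 126 \cdot 10 = 1260$.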

The numerator is much harder to decipher. After a lot of counting this is the equation I found:

$$N(a,b,n,m)=\Lambda \left[ \left(\!\!{n-m+1 \choose m}\!\!\right) + \sum_{k=1}^{n-m} \left(\!\!{k \choose m}\!\!\right) \left[ \left(\!\!{b \choose n-m-k+1}\!\!\right) - \sum_{l=0}^{n-m-k} \left(\!\!{b-1 \choose l}\!\!\right) \right] \right]$$

The methodology I used was the following:

1. Fix $m$ (starting at $m=2$), giving us $T=\{x_1,x_2,...,x_m\}$
2. Fix $a$, giving us $\sum x_i = a$ (the number of options for the $x_i$ is $\Lambda$)
3. Fix $b,S$
4. Count, note any patterns
5. Go back to #2 and repeat until you get an equation for this particular fixed $m$
6. Go back to #1 and increase $m$ by 1

I got up to $m=4$ and the pattern seemed to emerge that the numerator was

$$N(a,b,n,m)=\Lambda(a,b,m) \Gamma(b,n,m)$$

$\Gamma$ wasn't too difficult to figure out and I'm fairly sure the answer I got is correct. I haven't figured out $\Lambda$ fully, but a good approximation for small $m$ seems to be

$$ \Lambda(a,b,m) = \begin{cases} p_m(a) &: a < b+m-1 \\ p_m(a) - \sum_{k=m-1}^{\alpha} &: b+m-1 \le a \le \beta \\ p_m(2\beta -a) - \sum_{k=m-1}^{\bar{\alpha}} &: \beta < a \le 2\beta -b-m+1 \\ p_m(2\beta -a) &: 2\beta-b-m+1 < a \le mb \end{cases} $$

where

$$ \begin{aligned} \alpha &= a-b-1 \\ \bar{\alpha} &= 2\beta -a-b-1 \\ \beta &= m\left( 1+\frac{b-1}{2}\right) \end{aligned} $$

$p$ is the partition function. The different cases are due to the fact that the numbers seem to have a symmetry around the center $\beta$.

Verifying the equation

The equation can be checked by adding the probabilities and checking if they equal 1:

$$1=\sum_{a=m}^{bm}\frac{N(a,b,n,m)}{D(b,n,m)}$$

Surprisingly the sum holds for most combinations in the ranges $b \in [5,40]$, $n \in [5,40]$, $m \in [2,\min(n,8)]$. Increasing $m$ further than $8$ makes the summation go haywire and tend toward negative values (which I suspect is due to the $\Lambda$ bit).

Where help is needed

Any ideas for $\Lambda$? Anyone know of any papers that have solved this problem or a similar one?

Mike Earnest · Accepted Answer · 2023-07-24T15:06:18.137


We can simplify the problem somewhat. For the problem as you described it, you start with $n$ random numbers in the range $[1,b]$, then you randomly sample $m$ of them without replacement, and compute their sum. This is exactly the same as just choosing $m$ random numbers in the range $[1,b]$ in the first place.
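
This equivalence can be checked exhaustively for small parameters. The sketch below assumes that $S$ is built from independent uniform draws (so it is enumerated as an ordered sequence) and that $T$ is a uniformly random set of $m$ positions in that sequence. The function names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

def two_stage_distribution(b, n, m):
    """Exact distribution of sum(T): enumerate every sequence S of n draws
    from [1, b] and every choice of m positions within it."""
    counts = Counter()
    for S in product(range(1, b + 1), repeat=n):
        for positions in combinations(range(n), m):
            counts[sum(S[i] for i in positions)] += 1
    total = sum(counts.values())
    return {s: Fraction(c, total) for s, c in counts.items()}

def direct_distribution(b, m):
    """Exact distribution of the sum of m independent draws from [1, b]."""
    counts = Counter(sum(roll) for roll in product(range(1, b + 1), repeat=m))
    total = b ** m
    return {s: Fraction(c, total) for s, c in counts.items()}
```

Comparing `two_stage_distribution(3, 4, 2)` with `direct_distribution(3, 2)` gives identical dictionaries, which illustrates that $n$ drops out of the answer. The enumeration grows as $b^n\binom{n}{m}$, so it is only practical for very small inputs.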

The question becomes, when you generate $m$ random numbers in the range $[1,b]$, what is the probability their sum is $a$? This is exactly the question of the probability of rolling $a$ when $m$ fair dice numbered $1$ to $b$ are rolled, which was answered in this previous MSE question:

https://math.stackexchange.com/a/972067/177399

The formula there only works for six-sided dice, but you get a formula for any $b$-sided dice by replacing $6$ with $b$. That is, $$ P(\sum_{t\in T}t=a)=b^{-m}\sum_{i=0}^{\lfloor (a-m)/b\rfloor} (-1)^i\binom mi \binom{a-bi-1}{m-1} $$
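
A short sketch of this formula with exact rational arithmetic is given below. The function name `dice_sum_probability` is illustrative, and returning zero for $a$ outside $[m, bm]$ is an added convention rather than something stated in the formula.

```python
from fractions import Fraction
from math import comb

def dice_sum_probability(a, b, m):
    """Probability that m fair b-sided dice (faces 1..b) sum to a,
    using the inclusion-exclusion formula above."""
    # Added convention: sums outside [m, b*m] are impossible.
    if a < m or a > b * m:
        return Fraction(0)
    total = 0
    # The upper limit floor((a - m) / b) keeps a - b*i - 1 >= m - 1,
    # so no binomial with a negative top argument is ever evaluated.
    for i in range((a - m) // b + 1):
        total += (-1) ** i * comb(m, i) * comb(a - b * i - 1, m - 1)
    return Fraction(total, b ** m)
```

As a check, `dice_sum_probability(7, 6, 2)` gives `Fraction(1, 6)` and `dice_sum_probability(10, 6, 3)` gives `Fraction(1, 8)`, matching the familiar values for two and three ordinary dice. Summing over $a$ from $m$ to $bm$ returns exactly 1.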

edited Jul 24 '23 at 15:06

answered Jul 21 '23 at 14:56


It's possible that $a-bi-1 < 0$ so should those negative terms be thrown away or should one use the gamma function for the negative factorials? – Stent Jul 24 '23 at 08:52
Those terms should be thrown away. I edited to clarify. – Mike Earnest Jul 24 '23 at 15:07
Thanks! I checked by summing the probabilities from $a=m$ to $a=bm$ and the answer is 1 for the few values of $m$ & $b$ I used. – Stent Jul 24 '23 at 20:22
