1

Suppose $\mathcal{S} = \{k\in\mathbb{N} \ | \ 1 \leq k \leq N\}$ and we have $K$ sets $S_i \subset \mathcal{S}$ with $|S_i|=n_i$. It is guaranteed that $\bigcap\limits_{i=1}^K S_i \neq \emptyset$. A measure of overlap $r$ is defined as $$r := \sum_{1 \leq i < j \leq K} |S_i \cap S_j|$$

Given $N$, $K$, and all $n_i$, the value of $r$ is of course not determined, so there is no formula for $r$ itself. However, is there a formula for the minimum of $r$ for the given parameters? And if all $S_i$ are randomly drawn from $\mathcal{S}$ (with the same distribution), can we say anything about $E[r]$?

Asaf Karagila
  • 405,794
Alex P
  • 113
  • 1
    Presumably you mean $\bigcap S_i,$ since the union is always non-empty unless all $n_i= 0.$ – Thomas Andrews Jan 31 '25 at 17:39
  • 1
    The expected value is probably pretty easy, since it is an additive function, so you just need to calculate $E(|A_i\cap A_j|)$ in terms of $n_i,n_j.$ – Thomas Andrews Jan 31 '25 at 17:43
  • @ThomasAndrews indeed, I edited the question. Also I just realized that, as you say, the expected value is easy. But for the minimum I still can't figure it out – Alex P Jan 31 '25 at 17:46
  • I wouldn't be surprised if finding the minimum was NP-complete. The question seems related to the Knapsack Problem. https://en.wikipedia.org/wiki/Knapsack_problem?wprov=sfti1 – Thomas Andrews Jan 31 '25 at 19:10

2 Answers

1

Partial answer for the random case.

For the random case, we can use that expectation is additive to compute the expected value for each $i,j,$ then add.

This answer says that the expected value would be $\frac{n_in_j}N.$

This means your expected number is $$\frac1N\sum_{1\leq i<j\leq K}n_in_j=\frac1{2N}\left(\left(\sum_{i=1}^K n_i\right)^2-\sum_{i=1}^K n_i^2\right).$$

The second form takes $O(K)$ operations to evaluate, while the sum over pairs $i<j$ takes $O(K^2),$ so the right side is faster to calculate.
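As a quick sanity check, here is a minimal Python sketch of both forms. The function names `expected_r_pairwise` and `expected_r_closed` are made up for illustration, and the sketch assumes the sets are drawn independently and uniformly among subsets of the given sizes (it does not condition on the common intersection being non-empty). Exact rational arithmetic is used so the two results can be compared directly.

```python
from fractions import Fraction
from itertools import combinations

def expected_r_pairwise(sizes, N):
    """O(K^2) form: sum of n_i * n_j over all pairs i < j, divided by N."""
    return Fraction(sum(a * b for a, b in combinations(sizes, 2)), N)

def expected_r_closed(sizes, N):
    """O(K) form: ((sum n_i)^2 - sum n_i^2) / (2N)."""
    total = sum(sizes)
    squares = sum(n * n for n in sizes)
    return Fraction(total * total - squares, 2 * N)

sizes = [3, 4, 5]   # hypothetical sizes n_1, n_2, n_3
N = 10              # hypothetical ground-set size
print(expected_r_pairwise(sizes, N), expected_r_closed(sizes, N))  # 47/10 47/10
```

Both forms agree here: the pairwise products are $12+15+20=47$, and $(12^2-50)/2=47$ as well.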

Thomas Andrews
  • 186,215
  • In the sum, shouldn't the multinomial use $n$ instead of $n-1$ and $k$ instead of $k-1$? [Update: nope, that's how you got the $k$ out of $kp_k$] – Alex P Jan 31 '25 at 18:32
  • Also, assuming uniform distribution, can't we use the expected size of the intersection of two sets as in https://math.stackexchange.com/questions/4577724/expected-size-of-intersection-of-sets – Alex P Jan 31 '25 at 18:33
  • 1
    @AlexP I used: $$k\binom{n}{k,\dots} =n\binom{n-1}{k-1,\dots}$$ which is a basic multinomial rule. That's how I get rid of the $k$ and end up with an $n$ in the numerator. – Thomas Andrews Jan 31 '25 at 18:36
  • @AlexP That's true if we can pick uniformly from two sets of no specific size, but here, they need to be the sizes $n_i$ and $n_j.$ Your expected size of the intersection of two sets of size $1$ is not the same as the expected size of the intersection of two sets of size $n-1.$ – Thomas Andrews Jan 31 '25 at 18:39
  • @ThomasAndrews The answer Alex linked implies the simple formula $e_{ij}=\frac{n_in_j}{N}$. – Mike Earnest Jan 31 '25 at 19:02
  • Ah, he actually linked to the question, so I read the question. @MikeEarnest – Thomas Andrews Jan 31 '25 at 19:17
1

Minimum value of $r$

For each $s\in \mathcal S$, let $w(s)$ denote the number of subsets in the list $S_1,\dots,S_K$ which contain $s$. Note that $r$ can be computed as the number of triples $(i,j,s)$ such that $1\le i<j\le K$, $1\le s\le N$, and $s\in S_i\cap S_j$. Counting $r$ in a different way, by summing over $s\in \{1,\dots,N\}$ the number of pairs $\{i,j\}$ such that $s\in S_i\cap S_j$, we get $$ r=\sum_{s=1}^N \binom{w(s)}2. $$ We want to minimize this sum, over all choices of subset lists $S_1,\dots,S_K$ for which $|S_i|=n_i$ ($1\le i\le K$) and $\bigcap_{i=1}^K S_i \neq \varnothing$. The intersection condition means there exists $s\in \{1,\dots,N\}$ for which $w(s)=K$. Without loss of generality, assume that $w(N)=K$, so $$ r=\binom{K}2+\sum_{s=1}^{N-1}\binom{w(s)}{2}.\tag1 $$
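The double-counting identity can be checked on a small example. The following is only an illustrative sketch; the helper names `overlap_pairwise` and `overlap_by_element` and the sample sets are made up for this purpose.

```python
from itertools import combinations
from math import comb

def overlap_pairwise(sets):
    """r computed directly as the sum of |S_i ∩ S_j| over pairs i < j."""
    return sum(len(a & b) for a, b in combinations(sets, 2))

def overlap_by_element(sets, N):
    """r computed as the sum over s of C(w(s), 2), where w(s) counts the sets containing s."""
    return sum(comb(sum(s in S for S in sets), 2) for s in range(1, N + 1))

sets = [{1, 2, 5}, {2, 3, 5}, {4, 5}]  # hypothetical example; 5 lies in every set
N = 5
print(overlap_pairwise(sets), overlap_by_element(sets, N))  # 4 4
```

Here $w(2)=2$ and $w(5)=3$ contribute $\binom22+\binom32=4$, while every other element lies in a single set and contributes nothing.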

The numbers $w(1),\dots,w(N)$ must sum to $\sum_{i=1}^K n_i$, by a double-counting argument. Since $w(N)=K$, this means $$ w(1)+\dots+w(N-1)=\sum_{i=1}^K(n_i-1).\tag2 $$ Using the fact that the function $x\mapsto \binom{x}2$ is convex, you can show that $(1)$, when subjected to constraint $(2)$, is minimized when the numbers $w(1),\dots,w(N-1)$ are as close to equal to each other as possible.

Let $T=\sum_{i=1}^{K}(n_i-1)$. For the optimal distribution which minimizes $(1)$, the value of each $w(s)$ will be either $\lfloor T/(N-1)\rfloor$ or $\lfloor T/(N-1)\rfloor+1$. The number of $s$ such that $w(s)=\lfloor T/(N-1)\rfloor+1$ will be $T\,\%\,(N-1)$, the remainder upon integer division of $T$ by $(N-1)$. Each such $s$ contributes $\binom{\lfloor T/(N-1)\rfloor+1}{2}-\binom{\lfloor T/(N-1)\rfloor}{2}=\lfloor T/(N-1)\rfloor$ more than the others. This all proves that the minimum value of $r$ is $$ \boxed{\binom K2 +(N-1)\cdot \binom{\lfloor T/(N-1)\rfloor}{2} +(T\,\%\,(N-1))\cdot \lfloor T/(N-1)\rfloor.} $$

To be complete, I should say exactly how we choose the subsets $S_1,\dots,S_K$ such that the even distribution of the numbers $w(1),\dots,w(N-1)$ is obtained. Place the numbers $1,\dots,N-1$ in a circle. Starting with $1$, walk around the circle, placing the first $n_1-1$ numbers you visit in $S_1$. After placing the last number in $S_1,$ continue walking, placing the next $n_2-1$ numbers you visit in $S_2$. Continue in this fashion until all of $S_1,\dots,S_K$ are chosen. Finally, remember to add the number $N$ to all of the sets $S_1,\dots,S_K$.
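The boxed formula and the circular construction can be sketched in Python as follows. The names `min_overlap`, `build_sets`, and `overlap` are illustrative, and the sketch assumes $N\ge 2$ and $1\le n_i\le N$ (for $N=1$ the division by $N-1$ is undefined).

```python
from itertools import combinations
from math import comb

def min_overlap(sizes, N):
    """Closed-form minimum of r; assumes N >= 2 and 1 <= n_i <= N."""
    K = len(sizes)
    T = sum(n - 1 for n in sizes)
    q, rem = divmod(T, N - 1)  # q = floor(T/(N-1)), rem = T % (N-1)
    return comb(K, 2) + (N - 1) * comb(q, 2) + rem * q

def build_sets(sizes, N):
    """Walk around the circle 1..N-1, handing n_i - 1 consecutive numbers to S_i,
    then add the common element N to every set."""
    sets = []
    pos = 0  # 0-based position on the circle; position p holds the number p + 1
    for n in sizes:
        chosen = {(pos + t) % (N - 1) + 1 for t in range(n - 1)}
        pos = (pos + n - 1) % (N - 1)
        sets.append(chosen | {N})
    return sets

def overlap(sets):
    """r as the sum of pairwise intersection sizes."""
    return sum(len(a & b) for a, b in combinations(sets, 2))

sizes, N = [3, 4, 5], 6  # hypothetical parameters
sets = build_sets(sizes, N)
print(sets)
print(overlap(sets), min_overlap(sizes, N))  # 7 7
```

For these parameters $T=9$, so $\lfloor T/5\rfloor=1$ with remainder $4$, and the formula gives $3+0+4=7$. The construction yields $\{1,2,6\}$, $\{3,4,5,6\}$, $\{1,2,3,4,6\}$, whose pairwise intersections have sizes $1$, $3$, $3$.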

Expected value of $r$

Using this answer, plus linearity of expectation, we get that when the sets $S_1,\dots,S_K$ are chosen independently such that $S_i$ is equally likely to be any subset with size $n_i$, then $$ \mathbb E[r]=\frac1N\sum_{1\le i<j\le K}n_in_j. $$
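The pairwise expectation $n_in_j/N$ that feeds into this sum can be verified by brute force for small parameters. This is a minimal sketch, with the hypothetical helper `exact_expected_intersection` enumerating every pair of subsets of the given sizes.

```python
from fractions import Fraction
from itertools import combinations

def exact_expected_intersection(a, b, N):
    """Average |A ∩ B| over all size-a subsets A and size-b subsets B of {1..N}."""
    ground = range(1, N + 1)
    total = 0
    count = 0
    for A in combinations(ground, a):
        A = set(A)
        for B in combinations(ground, b):
            total += len(A.intersection(B))
            count += 1
    return Fraction(total, count)

print(exact_expected_intersection(2, 3, 5))  # 6/5
```

The enumeration gives $6/5 = 2\cdot 3/5$, matching $n_in_j/N$. It is only feasible for small $N$, since the number of subset pairs grows quickly.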

Mike Earnest
  • 84,902