This is a curiosity question.

Recently I stumbled across the following problem:

Given three integers $k, m, n$ such that $m+k\leq n$, a friend chooses a subset $S\subseteq\lbrace1,\ldots,n\rbrace$ with $k$ elements, and you have to guess what it is. You can ask him questions of the following form: you choose a subset $G \subseteq\lbrace1,\ldots,n\rbrace$ with $m$ elements and ask whether $S$ has any elements in common with $G$, and you get the answer "Yes" or "No". How many questions do you need to find the subset?


Attempt

I worked on this question for some time without any breakthrough. Let $f(n,m,k)$ be the minimal number of questions needed. I was particularly interested in $f(8,4,4)$. I managed to find formulas only for very specific cases, for example:

  1. Obviously $f(n,m,0)=0$
  2. $f(n,n-1,1)=n-1$ and $f(n,1,k)=n-1$
  3. $f(n,n-2,1)=f(n,2,1)=\lfloor \frac n 2 \rfloor+1$
  4. Some complicated formulas for $k=2$, but I am not sure if they are correct; nothing for $k\geq 3$.
  5. It seems that $f(n,m,k)=f(n,n-m,k)$ but I could not prove it.

I added the condition $m+k\leq n$ because without it, it is sometimes impossible to find the subset (I think this condition is sufficient to ensure the existence of a solution, but I am not sure if it is necessary).
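For small parameters, $f(n,m,k)$ as defined above can be computed by exhaustive minimax search over the sets of candidates still consistent with the answers so far. The following is only a minimal sketch under that reading of the problem (questions may depend on earlier answers); the name `min_questions` is made up for illustration, and the search is exponential, so it is only practical for very small $n$. It returns infinity when no sequence of questions can separate the candidates.

```python
from functools import lru_cache
from itertools import combinations


def min_questions(n, m, k):
    """Worst-case number of adaptive questions needed (brute force)."""
    universe = range(1, n + 1)
    # Every k-subset is a possible secret; every m-subset is a possible question.
    candidates = frozenset(frozenset(c) for c in combinations(universe, k))
    queries = [frozenset(g) for g in combinations(universe, m)]

    @lru_cache(maxsize=None)
    def solve(state):
        # state = set of secrets still consistent with all answers so far
        if len(state) <= 1:
            return 0
        best = float("inf")
        for g in queries:
            yes = frozenset(s for s in state if s & g)  # answer "Yes"
            no = state - yes                            # answer "No"
            if not yes or not no:
                continue  # this question gives no information here
            best = min(best, 1 + max(solve(yes), solve(no)))
        return best

    return solve(candidates)


if __name__ == "__main__":
    print(min_questions(4, 3, 1))  # the case f(n, n-1, 1) with n = 4
    print(min_questions(4, 2, 2))  # the case m + k = n with n = 4
```

Memoizing on the candidate set keeps repeated positions from being re-explored, but the number of distinct positions still grows very quickly with $n$.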

Question: Is there an algorithm to solve the problem, or to compute $f(n,m,k)$? Or just any formulas for $k=3,4$?

Elaqqad
  • This related question might interest you: Guessing a subset of $\{1,...,N\}$. – Peter Kagey Apr 11 '19 at 00:53
  • That's another very nice question; the problems are very similar but not quite the same. I have already walked through those papers. I found a very good paper at the time (the first comment). – Elaqqad Apr 11 '19 at 06:56
  • There's another easy special case: if $m+k=n$ there's only one query which gets the answer "No", so $f(n, m, n-m) = \binom n m - 1$ – Peter Taylor Apr 11 '19 at 13:21
  • Re: point #$5$: It is certainly true that $f(n,m,1) = f(n,n-m,1)$, because the subset $G$ and its complement $G^c$ are guaranteed to get opposite answers when there is only $k=1$ chosen number. However, for $k>1$, I would be surprised if $f(n,m,k) = f(n,n-m,k)$... unless I'm missing something? – antkam Apr 11 '19 at 19:16
  • [cont'd] Oh, in fact, $f(n,n-2,2) = {n \choose 2} -1$ as @PeterTaylor pointed out, but $f(n,2,2) \le 2+ \lceil n/2 \rceil$, as follows: partition $[n]$ into pairs, ask them one by one. At most two of the pairs answer Yes and you need just two more tests to find out which one of each pair is chosen. – antkam Apr 11 '19 at 19:20
  • Is $m$ fixed, or can you change $m$ based on previous responses? – Obinna Okechukwu Apr 11 '19 at 21:36
  • @antkam, don't you need 3 more tests in the worst case? Oh, no, I see: you use a known false to test one of each pair. Special case when $n\le 4$ – Peter Taylor Apr 11 '19 at 21:45
  • @PeterTaylor - haha, sorry, i had "large $n$" as an implicit assumption in mind. :) as you already pointed out, $f(4,2,2) = {4 \choose 2} - 1 = 5.$ – antkam Apr 11 '19 at 22:17
  • @obinnaOkechukwu: $m,n,k$ are fixed in advance, and you can't change them from one question to another. – Elaqqad Apr 12 '19 at 08:22
  • @antkam, you have a valid point on #5, I was misled by the case $k=1$. For $k=2,m=n-2$ I don't see why we need to test all the subsets of $n-2$ elements, as we can adapt our questions to the previous answers? – Elaqqad Apr 12 '19 at 08:56
  • For $f(n, n-k, k)$, as @PeterTaylor pointed out, every question gets the answer YES unless your set is exactly the complement of the chosen set. so until you get that NO answer, none of the previous answers help you. indeed this also proves (as you suspected) $k+m \le n$ is necessary: if $k+m > n$ then every answer will be YES since your $m$-subset and your friend's $k$-subset must overlap. – antkam Apr 12 '19 at 12:45
  • @antkam It's the same argument, got it, thanks. – Elaqqad Apr 14 '19 at 11:22

1 Answer

We can get asymptotic estimates. For any $m$ and $k$, if $n$ is sufficiently large (say $n > km$), we have $$ \left\lfloor\frac{n-k}m\right\rfloor \le f(n,m,k) \le \left\lfloor \frac nm \right\rfloor + f(km,m,k) $$ so in particular $f(n,m,k) = \frac nm + O(1)$ as $n\to \infty$.

The lower bound is just due to the fact that even if you guess disjoint sets each time, the first $\lfloor\frac{n-k}m\rfloor-1$ answers could be "no" and yet not narrow things down to a single option.

For the upper bound, begin by guessing disjoint intervals of length $m$ until either $k$ answers are "yes", or else $k-1$ answers are "yes" and there are at most $m$ elements remaining. (This takes at most $\left\lfloor \frac nm \right\rfloor$ steps.) Then the union of the intervals with answers of "yes", together with the remaining elements in the second case, is a set of size at most $km$ that contains all elements of $S$, so we can use the $f(km,m,k)$ strategy on it, which takes a number of guesses independent of $n$. (We might be able to do better here, since we can take advantage of lots of elements we know aren't in $S$, but this is just an upper bound.)
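The first phase of this strategy can be simulated directly. The sketch below is one possible reading of it, not a definitive implementation: the function name `narrow_down` and the choice of blocks $\{1,\ldots,m\}, \{m+1,\ldots,2m\},\ldots$ are assumptions made for illustration. It returns the number of questions asked and a set of "suspects" that still contains $S$; if the blocks run out before either stopping condition is met, the fewer than $m$ leftover elements are kept as suspects too.

```python
def narrow_down(n, m, k, secret):
    """Ask about disjoint blocks of length m; return (questions asked, suspects)."""
    secret = set(secret)
    yes_blocks = []
    asked = 0
    start = 1
    while start + m - 1 <= n:          # a full block of length m still fits
        remaining = n - start + 1      # elements not yet covered by a question
        if len(yes_blocks) == k:
            break                      # k "yes" blocks already contain all of S
        if len(yes_blocks) == k - 1 and remaining <= m:
            break                      # k-1 "yes" answers and at most m elements left
        block = set(range(start, start + m))
        asked += 1
        if block & secret:             # the friend answers "Yes"
            yes_blocks.append(block)
        start += m
    suspects = set().union(*yes_blocks)
    if len(yes_blocks) < k:
        # Some element of S may still lie among the uncovered elements.
        suspects |= set(range(start, n + 1))
    return asked, suspects


if __name__ == "__main__":
    print(narrow_down(10, 3, 2, {2, 9}))
```

For example, with $n=10$, $m=3$, $k=2$ and $S=\{2,9\}$, the blocks $\{1,2,3\}$, $\{4,5,6\}$, $\{7,8,9\}$ are asked in turn, and the suspects are $\{1,2,3,7,8,9\}$, a set of size $km = 6$.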

Misha Lavrov
  • haha, love how $f(km,m,k) = O(1)$ :D one of those instances where the correct perspective really simplifies things! – antkam Apr 12 '19 at 12:41