I'm given a bit string of length $n$ and want to approximate the ratio of zeros to ones in this string. If one chooses $k$ of the $n$ bits at random, what is the probability that the ratio of zeros to ones in this $k$-bit sample lies in the interval $[\text{actual ratio}-\epsilon, \text{actual ratio}+\epsilon]$? How does this scale when the process is repeated several times?

Sebastian
  • What is the source of this problem? I note that this question appeared earlier. – lulu May 01 '21 at 13:19
  • @Lulu I have the question from a probability and algorithms book, but it doesn't look like the one you referred to (although the ideas are very similar, I have to agree). Do you have any advice for me how I could start? – Sebastian May 01 '21 at 13:26

1 Answer

Fix $n$ and $k$. Suppose there are $m$ $0$s in the $n$-bit string (hence $n - m$ $1$s), so that the actual ratio of $0$s to $1$s is $\frac{m}{n - m}$.

Let's calculate $P_j$, the probability of there being exactly $j$ $0$s in your $k$-bit selection (and hence exactly $k - j$ $1$s in your selection). Note that such a selection may not actually be possible (for instance if $j > m$ or $k - j > n - m$), in which case $P_j = 0$. One checks that the probability of first selecting exactly $j$ $0$s and then exactly $k - j$ $1$s, in that order, is $$ \frac{m}{n} \cdot \frac{m - 1}{n - 1} \cdots \frac{m - j + 1}{n - j + 1} \cdot \frac{n - m}{n - j} \cdot \frac{n - m - 1}{n - j - 1} \cdots \frac{n - m - (k - j) + 1}{n - k + 1}, $$ which we can rewrite as $$ \frac{m!}{(m - j)!} \cdot \frac{(n - j)!}{n!} \cdot \frac{(n - m)!}{(n - m - (k -j))!} \cdot \frac{(n - k)!}{(n - j)!} = \frac{m!(n - m)! (n - k)!}{n! (m - j)!(n - m - (k - j))!}. $$ But any ordering of a selection with exactly $j$ $0$s has the same probability. There are $\frac{k!}{j! (k - j)!}$ such orderings, which gives $$ P_j = \frac{k!m!(n - m)! (n - k)!}{j! (k - j)! n! (m - j)!(n - m - (k - j))!}. $$ We can clean this up a bit to get $$ P_j = \frac{{m\choose j} {n -m \choose k - j}}{{n \choose k}}, $$ which you might have guessed from the start (the number of ways to select $j$ $0$s from the $m$ $0$s, multiplied by the number of ways to select $k - j$ $1$s from the $n - m$ $1$s, divided by the number of ways to select $k$ bits). This is the hypergeometric distribution.
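As a sanity check on the derivation, here is a minimal Python sketch that computes $P_j$ both from the closed form and from the ordered-draw product multiplied by $\frac{k!}{j!(k-j)!}$. The function names `p_exact` and `p_sequential` and the values $n = 20$, $m = 8$, $k = 5$ are illustrative choices, not part of the original problem.

```python
from fractions import Fraction
from math import comb


def p_exact(n, m, k, j):
    """Closed form: C(m, j) * C(n - m, k - j) / C(n, k), as an exact fraction."""
    if j < 0 or j > k:
        return Fraction(0)
    # math.comb returns 0 when asked for more items than are available,
    # so impossible selections automatically get probability 0.
    return Fraction(comb(m, j) * comb(n - m, k - j), comb(n, k))


def p_sequential(n, m, k, j):
    """Ordered-draw product (j zeros, then k - j ones) times the number of orderings."""
    prob = Fraction(1)
    for i in range(j):          # draw the j zeros first
        prob *= Fraction(m - i, n - i)
    for i in range(k - j):      # then draw the k - j ones
        prob *= Fraction(n - m - i, n - j - i)
    return comb(k, j) * prob


# Illustrative values (assumed for this example only).
n, m, k = 20, 8, 5
probs = [p_exact(n, m, k, j) for j in range(k + 1)]
for j, p in enumerate(probs):
    print(j, p, float(p))
```

Using `Fraction` keeps the arithmetic exact, so the two routes can be compared with `==` rather than with a floating-point tolerance.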


For completeness, we can rewrite $P_j$ in terms of the ratio of $0$s to $1$s in the original string $\rho = \frac{m}{n - m}$ and the ratio of $0$s to $1$s in the sample string $r = \frac{j}{k - j}$. We get $$ P_r = \frac{{(n\rho)/(\rho + 1) \choose (kr)/(r + 1)} {n/(\rho + 1) \choose k/(r + 1)}}{n \choose k} $$ This is, if anything, less illuminating. However, it points clearly to the fact that we'll need to make some assumptions about $n, k, \rho$, and $r$ before we can say anything intelligent about the probability of the sample ratio being within some error of the true ratio.
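For small inputs the probability asked about in the question can be computed exactly by summing $P_j$ over every $j$ whose sample ratio $r = \frac{j}{k - j}$ lies within $\epsilon$ of $\rho$. The sketch below does this by brute force. The function name `prob_ratio_within` is hypothetical, and it assumes $m < n$ (so $\rho$ is defined) and treats a sample with no $1$s (where $r$ is undefined) as falling outside the interval.

```python
from fractions import Fraction
from math import comb


def prob_ratio_within(n, m, k, eps):
    """Exact P(|r - rho| <= eps) for a k-bit sample drawn without replacement.

    Assumptions of this sketch: m < n, and a sample containing no 1s
    (j == k, ratio undefined) counts as a miss.
    """
    rho = Fraction(m, n - m)
    eps = Fraction(eps)
    total = Fraction(0)
    for j in range(k):  # j == k is skipped because r would divide by zero
        r = Fraction(j, k - j)
        if abs(r - rho) <= eps:
            total += Fraction(comb(m, j) * comb(n - m, k - j), comb(n, k))
    return total


# Illustrative values: a balanced 20-bit string, sample of 4 bits.
print(prob_ratio_within(20, 10, 4, Fraction(1, 2)))
```

The loop costs $O(k)$ big-integer operations, so this is only a reference computation for checking approximations, not a way to answer the question at scale.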

A common simplification is to assume that $n$ is large enough compared to $k$ that the number of $0$s in the sample is approximately binomial with $k$ trials and success probability $\frac{m}{n}$, or even approximately normal with mean $k\frac{m}{n}$ and standard deviation $\sqrt{k \frac{m}{n}\left(1 - \frac{m}{n}\right)}$, at which point it becomes quite easy to calculate confidence intervals.
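To get a feel for how good that approximation is, the following sketch compares the exact probabilities $P_j$ with the binomial ones for a string much longer than the sample, and then computes the half-width of a normal-approximation interval for the fraction of $0$s in the sample. The values of $n$, $m$, $k$ and the $99\%$ level are assumptions chosen for illustration, and the interval is for the fraction $\frac{j}{k}$ rather than the ratio $\frac{j}{k - j}$.

```python
from math import comb, sqrt
from statistics import NormalDist


def hypergeom_pmf(n, m, k, j):
    """Exact P_j for sampling k bits without replacement."""
    return comb(m, j) * comb(n - m, k - j) / comb(n, k)


def binom_pmf(k, p, j):
    """Binomial approximation: k independent draws, each a 0 with probability p."""
    return comb(k, j) * p**j * (1 - p) ** (k - j)


# Illustrative values with n much larger than k.
n, m, k = 100_000, 40_000, 50
p = m / n

# Largest pointwise difference between the exact and approximate probabilities.
max_gap = max(
    abs(hypergeom_pmf(n, m, k, j) - binom_pmf(k, p, j)) for j in range(k + 1)
)

# Normal approximation: two-sided 99% interval for the sampled fraction of 0s.
z = NormalDist().inv_cdf(0.995)
half_width = z * sqrt(p * (1 - p) / k)

print(max_gap, z, half_width)
```

The half-width shrinks like $1/\sqrt{k}$, which is why quadrupling the sample size only halves the error.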

  • Wow, thanks for the nice answer. It doesn't, however, really hit the point of my question. My further task is to give an algorithm that determines whether there are more $0$s than $1$s in the bit string in $O(1)$ time, but considering your answer I doubt that this is even possible... – Sebastian May 01 '21 at 14:34
  • Are you grabbing the bits "out of a bag" or do you know their positions in the original string? In the latter case, it is very easy to do $O(n)$. – Charles Hudgins May 01 '21 at 14:39
  • I know the original string. However, I don't see how this is trivial... Doing it in $O(n)$ is easy, but the time constraint is $O(1)$. I only have to give the right answer with probability $0.99$, which should make it possible. – Sebastian May 01 '21 at 14:40
  • My background is not computer science, but I can't see a way to know for sure whether there are more $0$s than $1$s in better than $O(n)$ time. Are you being asked to check whether there are more $0$s than $1$s to some confidence level? – Charles Hudgins May 01 '21 at 14:43
  • I updated my comment, the confidence level is $0.99$ – Sebastian May 01 '21 at 14:44
  • I would wait for a more sophisticated answer to come along. Everything I did was as elementary as possible. I'll edit my answer to write the probability in terms of the ratio of $0$s to $1$s, but then my mathematical knowledge can't take the problem any further. All I'll say is that the algorithm is obvious (sample enough times to beat the confidence interval). The trick is proving that it works. – Charles Hudgins May 01 '21 at 14:48
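
The sampling idea from the comments can be sketched as follows. This is not taken from the thread: it uses Hoeffding's inequality to pick the sample size, and it relies on an extra assumption that the fraction of $0$s differs from $\frac{1}{2}$ by at least some known `gap`. Without such a gap no fixed sample size can reach $0.99$ confidence for every input. The names `sample_size` and `more_zeros_than_ones` are hypothetical.

```python
import math
import random


def sample_size(gap, failure_prob):
    """Hoeffding bound: with this many samples, the sampled fraction of 0s is
    off by gap or more with probability at most failure_prob."""
    return math.ceil(math.log(2 / failure_prob) / (2 * gap**2))


def more_zeros_than_ones(bits, gap=0.1, failure_prob=0.01, rng=random):
    """Guess whether 0s outnumber 1s from a random sample (with replacement).

    Assumes the true fraction of 0s is at least `gap` away from 1/2.
    """
    k = sample_size(gap, failure_prob)
    zeros = sum(1 for _ in range(k) if bits[rng.randrange(len(bits))] == "0")
    return zeros > k / 2


# Illustrative input: 70% zeros, so the assumed gap of 0.1 holds.
bits = "0" * 700 + "1" * 300
print(sample_size(0.1, 0.01), more_zeros_than_ones(bits))
```

The sample size depends only on `gap` and `failure_prob`, not on $n$, which is what would make the running time constant under the stated assumption.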