5

Suppose I have $n$ balls. Among them, there are $m \leq n$ black balls and the other $n - m$ balls are white. Fix a random permutation $\pi$ over these balls and denote by $Y_i$ the number of black balls in the first $i$ positions of our permutation $\pi$. I would like to show that $Y_i$ is sharply concentrated around its mean.

The expected value

It is not hard to compute the expected value of $Y_i$: the probability that a black ball appears somewhere along the first $i$ positions is $i/n$; moreover, there are $m$ black balls, hence $$\mathbb{E}[Y_i] = \frac{m\cdot i}{n}.$$

But can we show that $Y_i$ is actually very close to its mean?

A failed attempt

For every black ball $b$, one can define an indicator variable $X_b$ which equals 1 if $b$ appears in the first $i$ positions of $\pi$ and 0 otherwise. Denoting the set of all black balls by $B$, we have

$$ Y_i = \sum_{b \in B} X_b.$$

I was hoping to be able to use a Chernoff bound type of a concentration bound, but unfortunately for two black balls $b$ and $b'$, $X_b$ and $X_{b'}$ are not completely independent. If $X_b = 1$, then there are only $i-1$ spots left for $b'$ to appear in the first $i$ locations and hence the probability of $X_{b'}$ being 1 gets smaller.

2 Answers2

4

Chernoff's bound applies to negatively correlated random variables, such as your hypergeometric distribution. You can find a full treatment in Dubhashi and Panconesi's very useful monograph Concentration of measure for the analysis of algorithms, as well as in many lecture notes, such as this one by Hariharan Ramesh.

Yuval Filmus
  • 280,205
  • 27
  • 317
  • 514
2

Following Yuval's answer, I found the following inequality in Dubhashi and Paconesi's book:

$$ \Pr \Big[\big|Y_i - \mathbb{E}[Y_i]\big| \geq t \Big] \leq \exp\Big(\frac{-(n-1)t^2}{2(n-i)(i-1)} \Big)$$