
The Wikipedia page for the Binomial Distribution states the following lower bound, which I suppose can also be generalized to a general Chernoff lower bound.

$$\Pr(X \ge k) \geq \frac{1}{(n+1)^2} \exp\left(-nD\left(\frac{k}{n}\,\Big\|\,p\right)\right) \quad\quad\mbox{if }p<\frac{k}{n}<1$$

Clearly this is tight up to the $(n+1)^{-2}$ factor, since the Chernoff upper bound gives $\Pr(X \ge k) \le \exp\left(-nD\left(\frac{k}{n}\,\|\,p\right)\right)$ in the same regime.

However, computationally it seems that $(n+1)^{-1}$ would be tight as well. Even $(n+1)^{-0.7}$ seems to be fine.
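For concreteness, here is a minimal Python sketch of such a check (the helper names `kl` and `upper_tail` are just for illustration): it tracks the worst case of $(n+1)\cdot\Pr(X\ge k)/\exp(-nD(\frac kn\|p))$ over a small grid, which stays at least $1$ exactly when the conjectured $\frac{1}{n+1}$ factor suffices on that grid.

```python
import math

def kl(a, p):
    """Kullback-Leibler divergence D(a || p) between Bernoulli(a) and Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def upper_tail(n, k, p):
    """Exact Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Worst case of (n+1) * Pr(X >= k) / exp(-n D(k/n || p)) over a grid with p < k/n < 1.
# The conjectured 1/(n+1) factor suffices on this grid iff the result stays >= 1.
worst = float("inf")
for n in range(2, 100):
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        for k in range(1, n):
            if k / n <= p:
                continue
            ratio = upper_tail(n, k, p) / math.exp(-n * kl(k / n, p))
            worst = min(worst, (n + 1) * ratio)
print("worst case of (n+1) * tail / exp(-nD):", worst)
```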

It's not as easy to find lower bounds for tails as it is to find upper bounds, but for the Normal Distribution there is a standard bound:

$$\int_x^\infty e^{-t^2/2}\,dt \ge \left(\frac{1}{x}-\frac{1}{x^3}\right)e^{-x^2/2} \quad\mbox{for } x>0$$
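As a quick sanity check, the integral equals $\sqrt{\pi/2}\,\operatorname{erfc}(x/\sqrt2)$, so both sides can be compared numerically (note the bound is only informative for $x>1$, where the right-hand side is positive); a small sketch:

```python
import math

def gauss_tail(x):
    """Exact value of the integral: sqrt(pi/2) * erfc(x / sqrt(2))."""
    return math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))

for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    lower = (1 / x - 1 / x**3) * math.exp(-x**2 / 2)
    print(f"x={x}: integral={gauss_tail(x):.6g}, lower bound={lower:.6g}")
```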

My question is thus: is the $\frac{1}{(n+1)^2}$ factor the best known, or can $\frac{1}{n+1}$ be shown to be sufficient?

Update: Here is the region in which the conjecture holds numerically, according to Mathematica: [figure: the region where the conjecture holds]

Thomas Ahle
  • Asymptotically, $P(X\ge an)\sim \frac 1{1-r}\frac 1{\sqrt{2\pi a(1-a)n}}e^{-nD(a||p)}$ for $r$ ration of odds of $Bern(p)$ and $Bern(a)$. – A.S. Dec 08 '15 at 19:50
  • @A.S. Can you expand your reasoning a bit? What is the "ration of odds"? – Thomas Ahle Dec 08 '15 at 20:20
  • I mistyped. Ratio of odds, where odds of $Bern(p)$ is classical odds $\frac p{1-p}$. See the survey I linked in the formula for a beautiful Cramer's proof that relies on local CLT. – A.S. Dec 08 '15 at 20:25
  • Thank you for the reference, looks like we may state it as $P(X\ge an)= \frac {a(1-p)}{a-p}\frac 1{\sqrt{2\pi a(1-a)n}}e^{-nD(a||p)}(1+O(1/n))$. Cramer's proof is indeed very nice. Feel free to add it as a real answer to the question. – Thomas Ahle Dec 14 '15 at 11:26
  • That looks about right. It's also worth noting that $D(p+\epsilon||p)\sim\frac {\epsilon^2}{2p(1-p)}$, so that $n=\omega(\epsilon^{-2})$ is needed for the bound to be meaningful. It also matches standard gaussian tail asymptotics as expected. – A.S. Dec 14 '15 at 17:02

2 Answers


Update: I wrote a note surveying different proofs of varying precision. This gets all the way down to $1+o(1)$ sharpness.

I can at least show that $(n+1)^2$ can be improved to $\sqrt{2n}$:

$$\begin{align} \sum_{i=k}^n {n \choose i} p^i (1-p)^{n-i} &\ge {n \choose k} p^k (1-p)^{n-k}\\ &= {n \choose k} \exp\left(-n\left(\tfrac{k}{n} \log\tfrac{1}{p}+\left(1-\tfrac{k}{n}\right)\log\tfrac{1}{1-p}\right)\right)\\ &\ge \frac{\exp(n\text{H}(k/n))}{\sqrt{8k(1-k/n)}}\, \exp(-n(\text{D}(k/n||p) + \text{H}(k/n)))\\ &= \frac{1}{\sqrt{8k(1-k/n)}}\exp(-n\text{D}(k/n||p))\\ &\ge \frac{1}{\sqrt{2n}}\exp(-n\text{D}(k/n||p)) \end{align}$$

Here I've used the lower bound ${n\choose an}\ge\frac1{\sqrt{8na(1-a)}}\exp(n\text{H}(a))$ for the binomial coefficient, from http://www.lkozma.net/inequalities_cheat_sheet/ineq.pdf. I'd be happy if anyone can provide a better reference.

We see that it is sharp in the sense that ${2\choose1} = \frac1{\sqrt{2\cdot2}}\exp(2\text{H}(1/2))$. Also, by A.S.'s comments we see that the bound is asymptotically sharp, up to a constant depending on $p$ and $k/n$.
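To illustrate both points, here is a small Python sketch (the helper names are for illustration, and the asymptotic constant is taken from the formula in the comments above) comparing the exact tail, the $\frac1{\sqrt{2n}}e^{-n\text{D}}$ bound, and the asymptotic $\frac{a(1-p)}{a-p}\frac1{\sqrt{2\pi a(1-a)n}}e^{-n\text{D}}$ for fixed $a=k/n$:

```python
import math

def kl(a, p):
    """D(a || p) between Bernoulli(a) and Bernoulli(p)."""
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

def upper_tail(n, k, p):
    """Exact Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p, a = 0.3, 0.5  # fixed ratio a = k/n > p, growing n
for n in (10, 100, 1000):
    k = int(a * n)
    exact = upper_tail(n, k, p)
    bound = math.exp(-n * kl(a, p)) / math.sqrt(2 * n)
    asym = (a * (1 - p) / (a - p)) * math.exp(-n * kl(a, p)) \
        / math.sqrt(2 * math.pi * a * (1 - a) * n)
    print(f"n={n}: exact={exact:.4g}  sqrt(2n)-bound={bound:.4g}  asymptotic={asym:.4g}")
```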

Update: R.B. Ash, Information Theory, is a reference for the binomial coefficient bound, and in fact it derives exactly the same bound for the tail of the distribution.

Thomas Ahle
  • Hi Thomas, can you show the details about how you derive the inequality ${n \choose an} \ge \frac{1}{\sqrt{8na(1-a)}}e^{nH(a)}$? Thanks! – Rafer Aug 22 '16 at 06:02
  • @Rafer It's just Stirling's approximation on the three factors, and then some crude bounds to get the 8. – Thomas Ahle Sep 12 '16 at 09:05
  • @ThomasAhle I know this post is not active anymore; but do we really need the condition $p< k/n$ here? –  Mar 23 '21 at 04:45
  • Also, R.B. Ash has used the notation $\lambda$ (in this case it's defined as $k = \lambda n$). They need $\lambda > p$ (or $p<k/n$) to make it work for the Chernoff bound. It seems that your proof doesn't require it. –  Mar 23 '21 at 05:12

I came across this question because in my recent work (Statistical Games, arXiv:2402.15892) I also needed a finer-than-usual bound for the binomial tail distribution (Appendix D, Lemma D.2, eq. (387), where there is actually a typo, but I will give the correct formula here).

While searching for known bounds I found the 1964 paper of Peter J. Brockwell, "An asymptotic expansion for the tail of a binomial distribution and its application in queueing theory".

From this paper I was able to obtain the following expression, connecting the probability mass function and the tail distribution:

$$ \frac{\sum_{k=n}^{r n} p_k(r n,p)}{p_n(r n,p)} = \frac{p^{-1}-1}{p^{-1}-r} + \mathcal{O}(1/n) $$

($p_k(N,p)$ being the probability mass function of the binomial distribution.) The paper by Brockwell gives much more detailed results, including an approximation to any finite order in $n$ and explicit error terms, resulting in exact upper and lower bounds.
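A minimal sketch of how one might check the leading term numerically (the helper `pmf` is mine, and I assume $rp<1$ so that $k=n$ lies above the mean of the Binomial$(rn,p)$ distribution):

```python
import math

def pmf(k, N, p):
    """Binomial(N, p) probability mass function at k."""
    return math.comb(N, k) * p**k * (1 - p)**(N - k)

p, r = 0.2, 3  # rp < 1, so the tail from k = n is in the large-deviation regime
for n in (10, 50, 250):
    N = r * n
    ratio = sum(pmf(k, N, p) for k in range(n, N + 1)) / pmf(n, N, p)
    limit = (1 / p - 1) / (1 / p - r)  # leading term from the displayed formula
    print(f"n={n}: ratio={ratio:.5f}, leading term={limit:.5f}")
```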

These bounds, coupled with exact bounds for the Stirling formula (for example from Impens' "Stirling's Series Made Easy"), can yield exact upper and lower bounds for the tail distribution to any order.