3

I have been trying to establish a meaningful upper bound on the sum of squares of probabilities from the binomial distribution, $\displaystyle\sum_{k=0}^n {n\choose k}^2p^{2k}(1-p)^{2(n-k)}$. It is easy to see that this sum is bounded by 1. From the post "Sum of squares of Binom(n,p) values", I found that it can be written in terms of a hypergeometric function as

$$\sum_{k=0}^n {n\choose k}^2p^{2k}(1-p)^{2(n-k)}=(1-p)^{2n} {}_2F_1\left(\left.{-n,-n\atop 1}\right|\frac{p^2}{(1-p)^2}\right).$$

I did not know about hypergeometric functions, and this form is probably not useful for me. I am trying to bound it by some order of $n$, say $O(1/\sqrt{n})$, as studied in the post mentioned. Thanks for your help.

Math Attack
  • 5,343
slowowl
  • 133

2 Answers

3

The asymptotic result I got was $S_n(p,q)\approx ( 4 n \pi p q)^{-1/2}$.

Derivation

Let $q=1-p$ and $f(x)= q+p e^{ix}$. It has the property that $(f(x))^n= \sum_k c_k e^{i kx}$, where $c_k=\binom{n}{k} q^{n-k} p^k$.

Using the orthogonality relations for the Fourier components $e^{i kx}$, one deduces the Parseval formula

$\frac{1}{2\pi} \int_{-\pi}^{\pi} f^n(x) \overline {f^n(x)} dx= \sum_k c_k^2=S_n$. This is the sum of squared binomials you wish to evaluate.

A trigonometric identity simplifies the integrand: $f(x)\overline {f(x)}= p^2+q^2+2pq\cos x = 1 - 4pq \sin^2(x/2)$, using $p+q=1$ and $1-\cos x = 2\sin^2(x/2)$.

Thus we seek an asymptotic estimate for the behavior of the exact expression

$$S_n= \frac{1}{2\pi} \int_{-\pi}^{\pi} ( 1- 4p q \sin^2(x/2))^n dx$$
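As a numerical sanity check of this identity, here is a minimal Python sketch that compares the direct sum with the integral. The function names and the grid size `m` are my own choices, not part of the answer. Since the integrand is a trigonometric polynomial of degree $n$, an equally spaced rule with more than $n$ points should reproduce the integral up to rounding.

```python
import math

def sum_of_squares(n, p):
    """Direct sum S_n = sum_k [C(n,k) p^k q^(n-k)]^2 with q = 1 - p."""
    q = 1.0 - p
    return sum((math.comb(n, k) * p**k * q**(n - k)) ** 2 for k in range(n + 1))

def parseval_integral(n, p, m=2000):
    """(1/2pi) * integral over [-pi, pi] of (1 - 4 p q sin^2(x/2))^n dx,
    approximated on m equally spaced points (assumes m > n)."""
    q = 1.0 - p
    h = 2.0 * math.pi / m
    total = 0.0
    for j in range(m):
        x = -math.pi + j * h
        total += (1.0 - 4.0 * p * q * math.sin(x / 2.0) ** 2) ** n
    return total * h / (2.0 * math.pi)

# Example: the two values should agree to many digits.
print(sum_of_squares(16, 0.8), parseval_integral(16, 0.8))
```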

The nonnegative expression $( 1- 4p q \sin^2(x/2))$ has a unique maximum located at $ x=0$ where it peaks at height $1$. Its minima are at the endpoints of integration, where it takes the value $1-4pq<1$. Note it is roughly bell-shaped. Raising it to high powers preserves these features. The maximum height remains 1, the minimum height drops lower as we increase the powers.

The standard trick is to make a change of variables near the origin so that $ 1- 4p q \sin^2(x/2) =e^{-u^2} $. (Then one can treat this as essentially a Gaussian integral.)

The change of variables is $- u^2 = \ln ( 1- 4p q \sin^2(x/2)) \approx -4p q \sin^2(x/2)$, so $u\approx 2\sqrt{pq} \sin (x/2) \approx \sqrt{pq}\, x$. That allows us to write $ du \approx \sqrt{pq}\, dx$.

Thus $(1- 4 pq \sin^2(x/2))^n dx =e^{-n u^2} dx \approx \frac{1}{\sqrt{pq}} e^{-n u^2} du$.

Integrating in $u$ we get the result

$$ S_n \approx \frac{1}{2\pi} \frac{1}{\sqrt{pq}} \int e^{- nu^2} du $$ The interval of integration is approximately $|u|< \sqrt{pq}$. The change of variable $ u= v/\sqrt{n}$ converts the last integral to $\int e^{- n u^2} du =\frac{1}{\sqrt{n}}\int _{|v|\leq \sqrt{npq}} e^{-v^2} dv =\frac{1}{\sqrt{n}} I_n$

where $I_n\to \int_{-\infty}^{\infty} e^{- v^2} dv = {\sqrt{\pi}}$.

Concluding, $S_n\approx \frac{1}{2 \pi} \frac{\sqrt{\pi}}{\sqrt{ npq}} =\frac{1}{\sqrt{4 \pi n pq}}$.
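To see how quickly the leading term takes over, one can compare it with the exact sum for a few values of $n$. The sketch below is only an illustration: the helper names are hypothetical, and the probabilities are computed in log space with `math.lgamma` (my choice, to avoid overflow for large $n$); it assumes $0<p<1$.

```python
import math

def log_pmf(n, k, p):
    """log of C(n,k) p^k (1-p)^(n-k), computed with lgamma; needs 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def exact_sum(n, p):
    """S_n = sum over k of the squared binomial probabilities."""
    return sum(math.exp(2.0 * log_pmf(n, k, p)) for k in range(n + 1))

def leading_term(n, p):
    """Asymptotic estimate 1 / sqrt(4 pi n p q)."""
    return 1.0 / math.sqrt(4.0 * math.pi * n * p * (1.0 - p))

# Print n, exact sum, leading term, and their ratio.
for n in (16, 100, 1000):
    s, a = exact_sum(n, 0.8), leading_term(n, 0.8)
    print(n, s, a, s / a)
```

The ratio in the last column should drift toward 1 as $n$ grows.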

P.S. Should you desire more accuracy you can expand the change of variables to higher order as $u= \sqrt{p q } x+\frac{x^3 (6 p q-1) \sqrt{p q }}{24 }+O\left(x^5\right)$.
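A quick way to check this expansion is to compare it numerically with the exact change of variables for a small $x$. This is a rough sketch with hypothetical function names; it uses `math.log1p` for accuracy near $x=0$ and assumes $x\ge 0$.

```python
import math

def u_exact(x, p):
    """u >= 0 defined by exp(-u^2) = 1 - 4 p q sin^2(x/2), for x >= 0."""
    q = 1.0 - p
    return math.sqrt(-math.log1p(-4.0 * p * q * math.sin(x / 2.0) ** 2))

def u_series(x, p, order=3):
    """Truncated expansion: sqrt(pq) x, plus the cubic term when order >= 3."""
    q = 1.0 - p
    r = math.sqrt(p * q)
    u = r * x
    if order >= 3:
        u += x**3 * (6.0 * p * q - 1.0) * r / 24.0
    return u

x, p = 0.1, 0.3
err_linear = abs(u_exact(x, p) - u_series(x, p, order=1))
err_cubic = abs(u_exact(x, p) - u_series(x, p, order=3))
print(err_linear, err_cubic)  # the cubic truncation should be much closer
```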

P.P.S. When $n=16$ and $(p,q)=(.8,.2)$ the approximation is accurate to 3 decimal places: .1766 exact vs. .1763 approx.

MathFont
  • 6,000
  • Really, a nice approach that allows one to get a full asymptotic expansion. Using your method, I got for the second term $S_n=\frac1{\sqrt{4\pi npq}}\big(1+\frac{1-6pq}{16npq}+O(1/n^2)\big)$ – Svyatoslav Apr 25 '23 at 02:44
  • 1
    Thanks! Glad that we got matching results too. It's easy to lose track of a constant here and there. – MathFont Apr 25 '23 at 14:14
2

The answer can certainly be obtained from the asymptotics of this specific hypergeometric function, but below we use a heuristic approach to get the desired asymptotics.

First of all, it is useful to apply the method, as $n\to\infty$, to the sum $$S_0(p)=\sum_{k=0}^n\binom{n}{k}p^k(1-p)^{n-k}=1$$ just to check that the method works. We have $$S_0(p)=n!(1-p)^n\sum_{k=0}^n\frac{\left(\frac p{1-p}\right)^k}{\Gamma(k+1)\Gamma(n-k+1)}$$ $$=n!(1-p)^n\sum_{k=0}^ne^{k\ln\frac p{1-p}-\ln\Gamma(k+1)-\ln\Gamma(n-k+1)}$$

To find the main asymptotic term we switch from summation to integration: $$S_0(p)\sim n!(1-p)^n\int_0^ne^{k\ln\frac p{1-p}-\ln\Gamma(k+1)-\ln\Gamma(n-k+1)}dk=n!(1-p)^n\int_0^ne^{g(k)}dk$$

Next, we look for the maximum of $g(k)$: $$g'(k)=\ln\frac p{1-p}-\psi(k+1)+\psi(n-k+1)=0$$ Supposing that the maximum point satisfies $k_0\gg1$ and $n-k_0\gg1$, we use the asymptotics $\displaystyle\psi(l)\sim\ln l$: $$\ln\frac p{1-p}-\ln k+\ln(n-k)=0\,\,\Rightarrow\,\,k_0=pn$$

Expanding $g(k)$ near this point and using the asymptotics $\psi^{(1)}(l)\sim\frac1l$ for $l\gg1$, so that $g''(k_0)\approx-\frac1{k_0}-\frac1{n-k_0}=-\frac{n}{pn(n-pn)}$, we get $$g(k)\approx g(k_0)+\frac12g''(k_0)(k-k_0)^2$$ $$=pn\ln\frac p{1-p}-\ln\Gamma(pn+1)-\ln\Gamma(n-pn+1)-\frac{n(k-pn)^2}{2pn(n-pn)}$$

Extending the integration to $\pm\infty$, we get: $$S_0(p)\sim \frac{(1-p)^nn!}{\Gamma(pn+1)\Gamma(n-pn+1)}\left(\frac p{1-p}\right)^{pn}\int_{-\infty}^\infty e^{-\frac{(k-pn)^2}{2pn(1-p)}}dk$$

Integrating and using Stirling's approximation of the gamma function, $$S_0\sim(1-p)^n\sqrt{2\pi n}\left(\frac ne\right)^ne^n\left(\frac p{1-p}\right)^{pn}\left(\frac 1{pn}\right)^{pn}\left(\frac 1{n(1-p)}\right)^{n(1-p)}\frac{\sqrt{2\pi np(1-p)}}{2\pi n\sqrt{p(1-p)}}=1$$ as expected.
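The check on $S_0$ can also be done numerically. The sketch below (hypothetical helper names; `math.lgamma` stands in for $\ln\Gamma$, and Stirling's formula is deliberately not applied) locates the maximum of $g$ over the integers and evaluates the Gaussian approximation $n!(1-p)^n e^{g(k_0)}\sqrt{2\pi np(1-p)}$ with $k_0=pn$, which should be close to 1.

```python
import math

def g(k, n, p):
    """Exponent g(k) = k ln(p/(1-p)) - ln Gamma(k+1) - ln Gamma(n-k+1)."""
    return (k * math.log(p / (1.0 - p))
            - math.lgamma(k + 1.0) - math.lgamma(n - k + 1.0))

def argmax_g(n, p):
    """Integer k in [0, n] that maximizes g; expected to sit near p*n."""
    return max(range(n + 1), key=lambda k: g(k, n, p))

def laplace_s0(n, p):
    """n! (1-p)^n exp(g(k0)) sqrt(2 pi n p (1-p)) at k0 = p*n, in log space."""
    k0 = p * n
    log_val = (math.lgamma(n + 1.0) + n * math.log(1.0 - p) + g(k0, n, p)
               + 0.5 * math.log(2.0 * math.pi * n * p * (1.0 - p)))
    return math.exp(log_val)

n, p = 1000, 0.3
print(argmax_g(n, p), p * n, laplace_s0(n, p))
```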

Now, applying exactly the same approach to the sum $$S=\sum_{k=0}^n {n\choose k}^2p^{2k}(1-p)^{2(n-k)}$$ after exactly the same manipulations, we get $$\boxed{\,\,S\sim\frac1{\sqrt{4\pi n\,p\,(1-p)}};\,\,p\in(0,1)\,\,}$$ (we have to bear in mind that we used the condition $pn, (1-p)n\gg1$, so the approximation does not work if $p$ is too close to 0 or 1).

Quick check: at $p=\frac12$ we should recover the known result (for example, here) $$\frac1{2^{2n}}\sum_{k=0}^n {n\choose k}^2=\frac1{2^{2n}}\binom{2n}{n}\sim\frac1{\sqrt{\pi n}}$$
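This special case is easy to confirm with exact integer arithmetic. A minimal sketch (the function name is my own) that checks the identity $\sum_k\binom nk^2=\binom{2n}{n}$ and prints the ratio to $1/\sqrt{\pi n}$:

```python
import math

def half_case(n):
    """Value of 4^(-n) * sum_k C(n,k)^2, with the sum done in exact integers."""
    total = sum(math.comb(n, k) ** 2 for k in range(n + 1))
    # The sum of squared binomial coefficients equals the central coefficient.
    assert total == math.comb(2 * n, n)
    return total / 4**n

# Ratio of the exact value to 1/sqrt(pi n); it should approach 1.
for n in (10, 100, 1000):
    print(n, half_case(n) * math.sqrt(math.pi * n))
```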

Generalisation

We can also consider in the same way a general sum $$S_s=\sum_{k=0}^n {n\choose k}^sp^{sk}(1-p)^{s(n-k)};\,s>0$$ and get $$\boxed{\,\,S_s\sim\Big(2\pi np(1-p)\Big)^{\frac{1-s}2}\frac1{\sqrt s}\,\,}$$ $$S_1=1;\quad S_2\sim\frac1{\sqrt{4\pi np(1-p)}};\quad S_3\sim\frac1{2\sqrt3\,\pi np(1-p)},\,\,\text{etc.}$$
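The general formula can be spot-checked in the same spirit. The following is a sketch under my own naming, with probabilities again computed in log space via `math.lgamma`; it assumes $0<p<1$ and $n$ large enough that $np(1-p)\gg1$.

```python
import math

def log_pmf(n, k, p):
    """log of C(n,k) p^k (1-p)^(n-k), computed with lgamma; needs 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))

def power_sum(n, p, s):
    """S_s = sum over k of the binomial probabilities raised to the power s."""
    return sum(math.exp(s * log_pmf(n, k, p)) for k in range(n + 1))

def power_sum_asymptotic(n, p, s):
    """Boxed estimate (2 pi n p (1-p))^((1-s)/2) / sqrt(s)."""
    return (2.0 * math.pi * n * p * (1.0 - p)) ** ((1.0 - s) / 2.0) / math.sqrt(s)

# Print s and the ratio exact / asymptotic for a moderately large n.
n, p = 2000, 0.3
for s in (1, 2, 3):
    print(s, power_sum(n, p, s) / power_sum_asymptotic(n, p, s))
```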

Svyatoslav
  • 20,502
  • 1
    Thank you so much! I would need to work on how small the gap between $S$ and $\frac{1}{\sqrt{4n\pi p(1-p)}}$ is, which will require tracing back to all the approximations that have been used in your derivation. But I am convinced that this is a very good approximation, and my simulations back that up as well. – slowowl Apr 24 '23 at 23:20
  • The solution by @MathWonk gives a full asymptotic expansion in a rigorous way. In my opinion, it is the most appropriate (and the best). Using it, I evaluated the second-order correction: $S_n=\frac1{\sqrt{4\pi np(1-p)}}\big(1+\frac{1-6p(1-p)}{16np(1-p)}+O(\frac1{n^2})\big)$. – Svyatoslav Apr 25 '23 at 02:52
  • Also, added some generalisation – Svyatoslav Apr 25 '23 at 03:27