Let $X_1, \dots, X_n \sim B(1, p)$ be i.i.d. random variables. Then the variance $p(1 - p)$ of $X_i$ can be estimated by $Y_n = \bar{X}_n(1 - \bar{X}_n)$. What is the limit distribution of $Y_n$ as $n \to \infty$?

I tried to solve this as follows. Let $q = 1 - p$; then by the CLT,

$$\sqrt{\frac{n}{pq}}(\bar{X}_n - p) \to Z \sim \text{N}(0, 1) \text{ as } n \to \infty$$

Through the delta method with $g(x) = x(1 - x)$, we can conclude that

$$ \sqrt{\frac{n}{pq}}(g(\bar{X}_n) - g(p)) \to g'(p) \cdot Z $$

or equivalently

$$\sqrt{n}(Y_n - pq) \to \sqrt{pq}(q - p) \cdot Z \sim \text{N}(0, pq(q - p)^2) \text{ as } n \to \infty$$
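As a sanity check on this delta-method result, here is a minimal simulation sketch. The function name `simulate_scaled_estimator`, the choice $p = 0.3$, and the sample sizes are illustrative assumptions, not part of the problem. It compares the empirical standard deviation of $\sqrt{n}(Y_n - pq)$ with $\sqrt{pq}\,|q - p|$.

```python
import numpy as np

def simulate_scaled_estimator(n, p, reps=200_000, seed=0):
    """Draw `reps` copies of sqrt(n) * (Y_n - p*q), where Y_n = Xbar_n * (1 - Xbar_n)."""
    rng = np.random.default_rng(seed)
    # The sum of n Bernoulli(p) draws is Binomial(n, p), so Xbar_n = Binomial(n, p) / n.
    xbar = rng.binomial(n, p, size=reps) / n
    y = xbar * (1.0 - xbar)
    return np.sqrt(n) * (y - p * (1.0 - p))

p = 0.3  # illustrative value; any p other than 1/2 gives a non-degenerate limit
q = 1.0 - p
samples = simulate_scaled_estimator(n=10_000, p=p)

print("empirical mean: ", samples.mean())
print("empirical sd:   ", samples.std())
print("delta-method sd:", np.sqrt(p * q) * abs(q - p))
```

If the derivation is right, the two standard deviations should be close for large $n$, and the empirical mean should be near zero.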

At this point I was unsure how to proceed to find the limit distribution of $Y_n$ itself. The main problem, at least to me, seems to be how to eliminate the $\sqrt{n}$ on the left. I had a somewhat ad-hoc idea, but I'm not sure whether it is justifiable. It goes as follows:

Let $F_n : \mathbb{R} \to [0, 1]$ be the cdf of $Y_n$, let $P_n$ be the underlying probability measure, and let $\phi : \mathbb{R} \to [0, 1]$ be the cdf of $Z \sim N(0, 1)$. Then, using the definition of convergence in distribution, the previous result can be rewritten as follows.

$$ \lim_{n \to \infty} P_n(\sqrt{n}(Y_n - pq) \le x) = \phi\left(\frac{x}{\sqrt{pq}(q-p)}\right)$$

or equivalently

$$ \lim_{n \to \infty} P_n(Y_n \le \frac{x}{\sqrt{n}} + pq) = \phi\left(\frac{x}{\sqrt{pq}(q-p)}\right)$$

Letting $y = \frac{x}{\sqrt{n}} + pq$ we can conclude that

\begin{align} \lim_{n \to \infty} F_n(y) &= \lim_{n \to \infty} P_n(Y_n \le y) \\ &= \lim_{n \to \infty}\phi\left(\frac{\sqrt{n}(y - pq)}{\sqrt{pq}(q-p)}\right) \end{align}

If we denote the unknown limit distribution by $F : \mathbb{R} \to \mathbb{R}$, then this suggests defining it as

$$ F(y) = \left\{\begin{array}{ll} 0 \quad &\text{ if } \frac{y - pq}{q - p} < 0 \\ \phi(0) = \frac{1}{2} \quad &\text{ if } y = pq \\ 1 \quad &\text{ if } \frac{y - pq}{q - p} > 0 \end{array}\right. $$
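To probe numerically where $F_n(y)$ is heading for a fixed $y$, here is a small Monte Carlo sketch. The function name `empirical_cdf_of_Y`, the value $p = 0.3$, and the offsets $\pm 0.02$ around $pq$ are illustrative assumptions. It estimates $F_n(y)$ at two fixed points for increasing $n$; it is only a numerical check, not a justification of the argument.

```python
import numpy as np

def empirical_cdf_of_Y(n, p, y_values, reps=100_000, seed=0):
    """Monte Carlo estimate of F_n(y) = P(Y_n <= y) at each y in y_values."""
    rng = np.random.default_rng(seed)
    # Xbar_n is Binomial(n, p) / n, and Y_n = Xbar_n * (1 - Xbar_n).
    xbar = rng.binomial(n, p, size=reps) / n
    y = xbar * (1.0 - xbar)
    return np.array([(y <= v).mean() for v in y_values])

p = 0.3                          # illustrative value, p != 1/2
pq = p * (1.0 - p)
y_values = [pq - 0.02, pq + 0.02]  # one fixed point on each side of pq

for n in (100, 1_000, 10_000, 100_000):
    print(n, empirical_cdf_of_Y(n, p, y_values))
```

Since $Y_n$ concentrates around $pq$ as $n$ grows, the first estimate should drift toward 0 and the second toward 1.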

I can already tell this argument fails specifically in the case when $p = 1/2$, but disregarding that, does this argument make any sense?

  • $\sqrt n$ is the appropriate scaling for $Y_n$ to converge to a non-degenerate distribution, so why 'eliminate' it? The statement $\sqrt n(Y_n-pq)\stackrel{D}\longrightarrow N(0,pq(q-p)^2)$ is enough to indicate the limiting distribution of $Y_n$. – StubbornAtom Aug 11 '19 at 13:56
  • I don't understand what you mean by "appropriate scaling". If it's enough to indicate the limiting distribution of $Y_n$, then what is the limiting distribution of $Y_n$? – WafflesTasty Aug 11 '19 at 13:58
  • That itself is the limiting distribution. You are showing that it is asymptotically normal, that's it. If I can show that $\frac{Y_n-a_n}{b_n}$ converges to a non-degenerate distribution for some $\{a_n\}$ and $\{b_n\}$, I am done. – StubbornAtom Aug 11 '19 at 14:00
  • I can see that the Wikipedia page on asymptotic distributions does mention the same definition you are using. I haven't seen asymptotic distributions defined that way before. The only definition I have is $Y_n \stackrel{D}\longrightarrow Y$ iff the corresponding cdfs satisfy $F_n(x) \longrightarrow F(x)$ at every point $x$ where $F$ is continuous, which leaves me wondering how the two are connected. – WafflesTasty Aug 11 '19 at 14:22
