
If the $X_i$ are i.i.d. with variance $\sigma^2$, I want to prove that $S_n^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i -\bar X_n )^2$ is an unbiased estimator of the variance $\sigma^2$. So here I go:

\begin{equation} \begin{aligned} \mathbb{E}(S_n ^2)&= \frac{1}{n-1}\sum_{i=1} ^{n} \mathbb{E}(X_i -\bar X_n )^2\\ &= \frac{1}{n-1}\sum_{i=1} ^{n} \mathbb{E}(X_i^2 -2X_i\bar X_n + \bar X_n^2) \\ &= \frac{1}{n-1}\sum_{i=1} ^{n} \mathbb{E}(X_i^2 -\frac{2}{n}X_i ^2 -\frac{2}{n}\sum_{j\neq i} X_i X_j + \bar X_n^2)\\ &= \frac{1}{n-1}\left\{ (n-2)\mathbb{E}(X_1 ^2) -\frac{2}{n}\sum_{i=1} ^{n}\sum_{j\neq i}\mathbb{E}(X_i)\mathbb{E}(X_j) + \sigma ^2 + \mathbb{E}(X_1)^2 \right\} \end{aligned} \end{equation} where I used the fact that for $X_i$, $X_j$ independent we have $\mathbb{E}(X_i X_j) = \mathbb{E}(X_i)\mathbb{E}(X_j)$ and that $\mathbb{E}(\bar X_n ^2) = \frac{\sigma^2 + \mathbb{E}(X_1)^2}{n}$. Finally, after rearranging the first and last terms:

\begin{equation} \begin{aligned} \mathbb{E}(S_n ^2) &= \frac{1}{n-1}\left\{ (n-1)\mathbb{E}(X_1 ^2) -\frac{2}{n}n(n-1)\mathbb{E}(X_1)^2 \right\}\\ &= \mathbb{E}(X_1 ^2) -2\mathbb{E}(X_1)^2\\ &\neq \mathbb{E}(X_1 ^2) -\mathbb{E}(X_1)^2 \end{aligned} \end{equation}

I'm off by a factor of $2$. Can someone point out my mistake?
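For what it's worth, a quick Monte Carlo sanity check (a simulation sketch assuming NumPy; the distribution and parameters are arbitrary) gives a mean of roughly $\sigma^2$ for $S_n^2$, so the estimator itself is unbiased and the slip must be somewhere in the algebra above:

```python
# Simulation sketch: the 1/(n-1) sample variance should average to sigma^2.
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, trials = 5, 4.0, 200_000

# i.i.d. draws with variance sigma2 (normal here, but any distribution works).
x = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=(trials, n))

# ddof=1 gives S_n^2 = (1/(n-1)) * sum_i (X_i - Xbar_n)^2.
s2 = x.var(axis=1, ddof=1)

print(s2.mean())  # close to sigma2 = 4.0
```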

3 Answers


Where you have $\displaystyle\sum_{i=1}^n \sum_{j\ne i}$, should you be summing over all ordered pairs $(i,j)$ with $j\ne i$, or over all unordered pairs? Whether the $2$ belongs there depends on how you answer that. You can do it one way with the $2$ and the other way without the $2$.
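To make the distinction concrete: for i.i.d. $X_i$,

\begin{equation} \sum_{i=1}^{n}\sum_{j\neq i}\mathbb{E}(X_i)\mathbb{E}(X_j) = n(n-1)\,\mathbb{E}(X_1)^2, \qquad \sum_{1\le i<j\le n}\mathbb{E}(X_i)\mathbb{E}(X_j) = \frac{n(n-1)}{2}\,\mathbb{E}(X_1)^2, \end{equation}

so the ordered-pair sum already counts each product twice; it is the unordered sum that needs the extra factor of $2$ to match it.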

  • I added an extra step: $-2X_i\bar X_n = -\frac{2}{n}X_i ^2 -\frac{2}{n}\sum_{j\neq i} X_i X_j$, which is completely unnecessary. But if you go this way and use Landon Carter's comment, then you end up with the correct answer. – fricadelle Jan 02 '17 at 11:29

Probably the easiest way to show this, when the $X_i$ are normally distributed, is to use Fisher's Lemma: $\frac{S_{n}^2 (n-1)}{\sigma^2} \sim \chi^2_{n-1}$.

Since $\mathbb{E}(\chi^2_{k}) = k$, we have $\frac{n-1}{\sigma^2} \cdot \mathbb{E}(S^2_{n}) = \mathbb{E} \left( \frac{S_{n}^2 (n-1)}{\sigma^2} \right) = \mathbb{E}(\chi^2_{n-1}) = n-1$, and therefore $\mathbb{E}(S^2_{n}) = \sigma^2$.
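As a quick numerical check of the lemma's consequence (a simulation sketch assuming NumPy, with normal data as the lemma requires, and arbitrary choices of $n$ and $\sigma^2$), the empirical mean and variance of $\frac{S_{n}^2 (n-1)}{\sigma^2}$ should come out near the $\chi^2_{n-1}$ values $n-1$ and $2(n-1)$:

```python
# Simulation sketch: check that (n-1) * S_n^2 / sigma^2 has the mean and
# variance of a chi-squared distribution with n-1 degrees of freedom.
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, trials = 8, 2.5, 200_000

# Normal data: Fisher's Lemma needs normality.
x = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(trials, n))

# S_n^2 is the 1/(n-1) sample variance (ddof=1).
q = (n - 1) * x.var(axis=1, ddof=1) / sigma2

print(q.mean(), q.var())  # close to n-1 = 7 and 2*(n-1) = 14
```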

– Joitandr

To get an unbiased estimate, one may start from the variance of the sample mean.

Suppose that a random variable $X$ is distributed in a population with mean $\mu_{X}$ and variance $\sigma_{X}^{2}$. If independent samples of size $N$ are drawn from the population, the means $\bar{X}$ of these samples have variance $\sigma_{X}^{2}/N$ (by the central limit theorem they are also approximately normally distributed, but normality is not needed in what follows).
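Spelled out, the variance statement uses only the independence of the draws $x_{1},\dots,x_{N}$:

\begin{align} \operatorname{Var}\left(\bar{X}\right) = \operatorname{Var}\left(\frac{1}{N}\sum_{i}^{N}x_{i}\right) = \frac{1}{N^{2}}\sum_{i}^{N}\operatorname{Var}\left(x_{i}\right) = \frac{\sigma_{X}^{2}}{N}. \end{align}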

A biased estimate of the variance may be obtained from the calculation for the sample variance $s_{X}^{2}$

\begin{equation} s_{X}^{2} = \frac{1}{N}\sum_{i}^{N}\left(x_{i} - \bar{X}\right)^{2}, \tag{1} \end{equation}

where the $x_{i}$ are the sampled values and $\bar{X}$ is the sample mean. Now, a better estimate of the variance will require the true mean $\mu_{X}$,

\begin{align} \sigma_{X}^{2} &\approx \frac{1}{N}\sum_{i}^{N}\left(x_{i} - \mu_{X}\right)^{2}, \\ &= \frac{1}{N}\sum_{i}^{N}\left[\left(x_{i} - \bar{X}\right) -\left(\mu_{X}- \bar{X}\right)\right]^{2}, \\ &= \frac{1}{N}\sum_{i}^{N}\left[\left(x_{i} - \bar{X}\right)^{2} - 2\left(x_{i} - \bar{X}\right)\left(\mu_{X}- \bar{X}\right) + \left(\mu_{X}- \bar{X}\right)^{2}\right]. \end{align}

Since $\left(\mu_{X}- \bar{X}\right)$ is a constant, we may take it outside the summation to evaluate the middle terms of the expression above:

\begin{align} \frac{2\left(\mu_{X}- \bar{X}\right)}{N}\sum_{i}^{N}\left(x_{i} - \bar{X}\right) &= 2\left(\mu_{X}- \bar{X}\right)\left(\bar{X} - \bar{X}\right) = 0. \end{align}

Hence, using (1) for $s_{X}^{2}$, we are left with

\begin{align} \sigma_{X}^{2} &\approx s_{X}^{2} + \left(\mu_{X}- \bar{X}\right)^{2}. \end{align}

Now, the average value of the term $\left(\mu_{X}- \bar{X}\right)^{2}$ is just the variance of the sample mean, which, as noted above, is $\sigma_{X}^{2}/N$. Hence, we have

\begin{align} \sigma_{X}^{2} &\approx s_{X}^{2} + \frac{\sigma_{X}^{2}}{N}. \end{align}

Rearranging, we then obtain

\begin{align} \sigma_{X}^{2} &\approx \frac{N}{N-1}s_{X}^{2}. \end{align}

Substituting (1) for $s_{X}^{2}$, we then find the estimate for the population variance

\begin{equation} \sigma_{X}^{2} \approx \frac{1}{N-1}\sum_{i}^{N}\left(x_{i} - \bar{X}\right)^{2}. \nonumber \end{equation}

We see, then, that the unbiased estimate of the population variance is obtained by substituting $N-1$ for $N$ in the expression for the biased estimate. The reason is that we are using the sample mean rather than the true mean, and the sample mean itself varies around the true mean with variance $\sigma_{X}^{2}/N$.
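A small simulation sketch of this correction (assuming NumPy; the exponential distribution and the parameter values are arbitrary choices, since nothing above requires normality): the $1/N$ estimator averages to $\frac{N-1}{N}\sigma_{X}^{2}$, while the $1/(N-1)$ version recovers $\sigma_{X}^{2}$.

```python
# Simulation sketch: compare the biased (1/N) and unbiased (1/(N-1)) estimators.
import numpy as np

rng = np.random.default_rng(2)
N, sigma2, trials = 4, 9.0, 200_000

# Exponential draws with scale sqrt(sigma2), so the true variance is sigma2.
x = rng.exponential(scale=np.sqrt(sigma2), size=(trials, N))

biased = x.var(axis=1, ddof=0)    # (1/N)     * sum (x_i - Xbar)^2
unbiased = x.var(axis=1, ddof=1)  # (1/(N-1)) * sum (x_i - Xbar)^2

print(biased.mean())    # close to (N-1)/N * sigma2 = 6.75
print(unbiased.mean())  # close to sigma2 = 9.0
```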