
Let $X_1,X_2,\dots,X_n$ be independent normally distributed random variables with a common variance $\sigma^2$. The usual definition of the sample variance is $S^2:=\frac{\sum_{i=1}^{n} (X_i-\bar{X})^2}{n-1}$.

I want to show that $E[S^2]=\sigma^2$, i.e. the population variance.

My attempt at proof:

$\frac{n-1}{\sigma^2}E[S^2] = E\left[\frac{(n-1)S^2}{\sigma^2}\right] = E\left[\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{\sigma^2}\right]=n.$

This is because for each $i$, $\frac{X_i-\bar{X}}{\sigma}$ has a $N(0,1)$ distribution, so $\frac{(X_i-\bar{X})^2}{\sigma^2}$ has a chi-squared distribution, and each has an expected value of $1$.

But then $E[S^2]=\frac{n}{n-1}\sigma^2\neq \sigma^2$.

drhab
Sid Caroline

4 Answers


You don’t need the Gaussian assumption for this to hold.

In any case, the problem is that you miscalculated the variance of $X_i - \bar X$: it is $\frac{n-1}{n}\sigma^2$, not $\sigma^2$, so those squared and rescaled variables are not chi-squared. To compute the expected value of the squared difference (and hence the variance), expand the square. The only hard term is then $E[X_i \bar X]$, which you can compute by substituting the definition of $\bar X$ and using the independence of the $X_i$.
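A quick numerical sanity check of this point (my own sketch, not part of the answer; the values $n=5$, $\sigma=2$ are arbitrary): simulate many samples and compare the empirical variance of $X_1-\bar X$ with $\frac{n-1}{n}\sigma^2$.

```python
# Monte Carlo check that Var(X_i - Xbar) = (n-1)/n * sigma^2, not sigma^2.
# n = 5 and sigma = 2 are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 5, 2.0, 200_000

X = rng.normal(0.0, sigma, size=(trials, n))
Xbar = X.mean(axis=1)
D = X[:, 0] - Xbar                 # X_1 - Xbar, one value per trial

print(D.var())                     # empirical Var(X_1 - Xbar)
print((n - 1) / n * sigma**2)      # theoretical value 3.2, not sigma^2 = 4
```

The empirical variance comes out near $3.2$, visibly below $\sigma^2=4$, confirming that $\frac{(X_i-\bar X)^2}{\sigma^2}$ cannot have mean $1$.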

Ant

It is enough to prove that $\mathsf E\sum_{i=1}^n(Y_i-\bar Y)^2=n-1$ where $X_i=\mu+\sigma Y_i$, so that the $Y_i$ are iid with standard normal distribution.

$$\sum_{i=1}^n(Y_i-\bar Y)^2=\sum_{i=1}^n[Y_i^2-2Y_i\bar Y+\bar Y^2]=\sum_{i=1}^nY_i^2-2\bar Y\sum_{i=1}^nY_i+n\bar Y^2=\sum_{i=1}^nY_i^2-n\bar Y^2$$

so that: $$\mathsf E\sum_{i=1}^n(Y_i-\bar Y)^2=n-n\,\mathsf E\bar Y^2.$$ Combine this with: $$\mathsf E\bar Y^2=\frac1{n^2}\mathsf E\sum_{i=1}^n\sum_{j=1}^nY_iY_j=\frac{1}{n^2}\cdot n=\frac1n,$$ where only the $n$ diagonal terms survive because $\mathsf E Y_iY_j=0$ for $i\neq j$,

so that: $$\mathsf E\sum_{i=1}^n(Y_i-\bar Y)^2=n-1$$


edit:

As remarked by Henry in a comment on this answer, this also works if the $X_i$ have another distribution (with finite variance).
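The identity $\mathsf E\sum(Y_i-\bar Y)^2=n-1$, including its distribution-free character, can be illustrated by simulation (a sketch of my own; $n=6$ is an arbitrary choice):

```python
# Simulate E[sum_i (Y_i - Ybar)^2] for iid standard normal Y_i and for
# standardized uniforms, to show the value n - 1 does not need normality.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 6, 200_000

def mean_ss(Y):
    """Average of sum_i (Y_i - Ybar)^2 over the rows of Y."""
    return ((Y - Y.mean(axis=1, keepdims=True))**2).sum(axis=1).mean()

normal = rng.normal(size=(trials, n))
uniform = (rng.uniform(size=(trials, n)) - 0.5) * np.sqrt(12)  # mean 0, var 1

print(mean_ss(normal), mean_ss(uniform))   # both close to n - 1 = 5
```

Both averages land near $n-1=5$, matching the derivation and Henry's remark.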

drhab

In the computation of the sample variance, the deviations are taken not from the population mean but from its estimate, the sample mean. This is the source of the bias.

WLOG, $\mu=0$ (if not, you can center the $X_i$), and let us write $$T:=\sum (X_i-\overline X)^2=\sum (X_i^2+{\overline X}^2-2X_i\overline X)=\sum (X_i^2-\overline X^2),$$ where we used $\sum X_i=n\overline X=\sum \overline X$.

But we know that

$$Var(\overline X)=\frac1n Var(X),$$ so taking expectations (with $\mu=0$, $E[X_i^2]=Var(X)$ and $E[\overline X^2]=Var(\overline X)$):

$$E[T]=n\,Var(X)-n\cdot\frac nn\cdot\frac1n Var(X)=(n-1)Var(X).$$

The dependence between the $X_i$ and $\overline X$ slightly reduces the variance of the deviations, and the $n-1$ denominator compensates for this.
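This variance-reduction effect and its correction can be seen directly by comparing the $1/n$ and $1/(n-1)$ estimators in simulation (my own sketch; $n=4$, $\sigma=3$ are arbitrary):

```python
# The uncorrected estimator (1/n) * sum (X_i - Xbar)^2 is biased low by
# the factor (n-1)/n; dividing by n - 1 instead removes the bias.
# n = 4 and sigma = 3 are arbitrary choices.
import numpy as np

rng = np.random.default_rng(2)
n, sigma, trials = 4, 3.0, 200_000

X = rng.normal(0.0, sigma, size=(trials, n))
biased = np.var(X, axis=1, ddof=0).mean()    # divides by n
unbiased = np.var(X, axis=1, ddof=1).mean()  # divides by n - 1

print(biased)    # close to (n-1)/n * sigma^2 = 6.75
print(unbiased)  # close to sigma^2 = 9
```

NumPy's `ddof` parameter makes exactly this distinction: `ddof=0` gives the biased maximum-likelihood form, `ddof=1` the unbiased sample variance.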


If you use the fact that $E[(X_i-\mu)(X_j-\mu)] =\sigma^2$ when $i=j$ and $0$ otherwise (by independence), then you can show that the $n-1$ factor leads to an unbiased result:

$$E\left[\frac{1}{n-1}\sum_i(X_i-\bar{X})^2\right] \\= E\left[\frac{1}{n-1}\sum_i((X_i-\mu)-(\bar{X}-\mu))^2 \right]\\= E\left[\frac{1}{n-1}\sum_i \left((X_i-\mu)-\frac1n \sum_j(X_j-\mu)\right)^2 \right]\\= E\left[\frac{1}{n-1}\sum_i \left((X_i-\mu)^2-\frac2n \sum_j(X_i-\mu)(X_j-\mu) +\frac1{n^2} \left(\sum_j(X_j-\mu)\right)^2\right)\right]\\= \frac{1}{n-1}\left(n\sigma^2 - \frac{2}{n}n\sigma^2 + \frac{1}{n^2}n (n\sigma^2)\right) \\= \sigma^2$$
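The cross-moment fact this computation rests on can itself be checked empirically (my own sketch; $n=3$, $\sigma=1.5$ are arbitrary): average $(X_i-\mu)(X_j-\mu)$ over many trials and compare with $\sigma^2\,\delta_{ij}$.

```python
# Empirical check that E[(X_i - mu)(X_j - mu)] = sigma^2 if i = j and
# 0 otherwise, via the averaged outer products of iid draws (mu = 0).
import numpy as np

rng = np.random.default_rng(3)
n, sigma, trials = 3, 1.5, 200_000

X = rng.normal(0.0, sigma, size=(trials, n))
C = (X.T @ X) / trials      # C[i, j] averages X_i * X_j over trials

print(np.round(C, 2))       # near sigma^2 = 2.25 on the diagonal, 0 off it
```

The diagonal entries come out near $\sigma^2=2.25$ and the off-diagonal entries near $0$, which is precisely what collapses the double sums above to $n\sigma^2$ terms.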

Henry