
Let $X_1,X_2,\dots,X_n$ be independent normally distributed random variables with a common variance $\sigma^2$. The usual definition of the sample variance is $S^2:=\frac{\sum_{i=1}^{n} (X_i-\bar{X})^2}{n-1}$.

I want to show that $E[S^2]=\sigma^2$, i.e. the population variance.

My attempt at proof:

$\frac{n-1}{\sigma^2}E[S^2] = E\left[\frac{(n-1)S^2}{\sigma^2}\right] = E\left[\frac{\sum_{i=1}^{n}(X_i-\bar{X})^2}{\sigma^2}\right]=n.$

This is because for each $i$, $\frac{X_i-\bar{X}}{\sigma}$ has a $N(0,1)$ distribution, so $\frac{(X_i-\bar{X})^2}{\sigma^2}$ has a chi-squared distribution, and each has an expected value of $1$.

But then $E[S^2]=\frac{n}{n-1}\sigma^2\neq \sigma^2$.

drhab
Sid Caroline

4 Answers


You don’t need the Gaussian assumption for this to hold.

In any case, the problem is that you miscalculated the variance of $X_i - \bar X$: it is $\frac{n-1}{n}\sigma^2$, not $\sigma^2$, so those squared and rescaled variables are not chi-squared. To compute the expected value of the squared difference (and hence the variance), expand the square. The only hard term is then $E[X_i \bar X]$, which you can compute by substituting the definition of $\bar X$ and using the independence of the $X_i$.
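A quick numerical sanity check of this point (my own sketch, not part of the answer; the values $n=5$, $\sigma=2$ are arbitrary): simulate many samples and compare the empirical variance of $X_1-\bar X$ with $\frac{n-1}{n}\sigma^2$.

```python
# Monte Carlo check that Var(X_i - Xbar) = (n-1)/n * sigma^2, not sigma^2.
# n = 5 and sigma = 2 are arbitrary choices for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, trials = 5, 2.0, 200_000

X = rng.normal(0.0, sigma, size=(trials, n))
Xbar = X.mean(axis=1)
D = X[:, 0] - Xbar                 # X_1 - Xbar, one value per trial

print(D.var())                     # empirical Var(X_1 - Xbar)
print((n - 1) / n * sigma**2)      # theoretical value 3.2, not sigma^2 = 4
```

The empirical variance comes out near $3.2$, visibly below $\sigma^2=4$, confirming that $\frac{(X_i-\bar X)^2}{\sigma^2}$ cannot have mean $1$.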

Ant

It is enough to prove that $\mathsf E\sum_{i=1}^n(Y_i-\bar Y)^2=n-1$ where $X_i=\mu+\sigma Y_i$, so that the $Y_i$ are iid with standard normal distribution.

$$\sum_{i=1}^n(Y_i-\bar Y)^2=\sum_{i=1}^n[Y_i^2-2Y_i\bar Y+\bar Y^2]=\sum_{i=1}^nY_i^2-2\bar Y\sum_{i=1}^nY_i+n\bar Y^2=\sum_{i=1}^nY_i^2-n\bar Y^2$$

so that: $$\mathsf E\sum_{i=1}^n(Y_i-\bar Y)^2=n-n\,\mathsf E\bar Y^2.$$ Combine this with: $$\mathsf E\bar Y^2=\frac1{n^2}\mathsf E\sum_{i=1}^n\sum_{j=1}^nY_iY_j=\frac{1}{n^2}\cdot n=\frac1n,$$ where only the $n$ diagonal terms survive because $\mathsf E Y_iY_j=0$ for $i\neq j$,

so that: $$\mathsf E\sum_{i=1}^n(Y_i-\bar Y)^2=n-1$$


edit:

As remarked by Henry in a comment on this answer, this also works if the $X_i$ have another distribution (with finite variance).
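The identity $\mathsf E\sum(Y_i-\bar Y)^2=n-1$, including its distribution-free character, can be illustrated by simulation (a sketch of my own; $n=6$ is an arbitrary choice):

```python
# Simulate E[sum_i (Y_i - Ybar)^2] for iid standard normal Y_i and for
# standardized uniforms, to show the value n - 1 does not need normality.
import numpy as np

rng = np.random.default_rng(1)
n, trials = 6, 200_000

def mean_ss(Y):
    """Average of sum_i (Y_i - Ybar)^2 over the rows of Y."""
    return ((Y - Y.mean(axis=1, keepdims=True))**2).sum(axis=1).mean()

normal = rng.normal(size=(trials, n))
uniform = (rng.uniform(size=(trials, n)) - 0.5) * np.sqrt(12)  # mean 0, var 1

print(mean_ss(normal), mean_ss(uniform))   # both close to n - 1 = 5
```

Both averages land near $n-1=5$, matching the derivation and Henry's remark.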

drhab

In the computation of the sample variance, the deviations are taken not from the population mean but from its estimate, the sample mean. This is the source of the bias.

WLOG, $\mu=0$ (if not, you can center the $X_i$), and let us write $$T:=\sum (X_i-\overline X)^2=\sum (X_i^2+{\overline X}^2-2X_i\overline X)=\sum (X_i^2-\overline X^2),$$ where we used $\sum X_i=n\overline X=\sum \overline X$.

But we know that

$$Var(\overline X)=\frac1n Var(X),$$ so taking expectations (with $\mu=0$, $E[X_i^2]=Var(X)$ and $E[\overline X^2]=Var(\overline X)$):

$$E[T]=n\,Var(X)-n\cdot\frac nn\cdot\frac1n Var(X)=(n-1)Var(X).$$

The dependence between the $X_i$ and $\overline X$ slightly reduces the variance of the deviations, and the $n-1$ denominator compensates for this.
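This variance-reduction effect and its correction can be seen directly by comparing the $1/n$ and $1/(n-1)$ estimators in simulation (my own sketch; $n=4$, $\sigma=3$ are arbitrary):

```python
# The uncorrected estimator (1/n) * sum (X_i - Xbar)^2 is biased low by
# the factor (n-1)/n; dividing by n - 1 instead removes the bias.
# n = 4 and sigma = 3 are arbitrary choices.
import numpy as np

rng = np.random.default_rng(2)
n, sigma, trials = 4, 3.0, 200_000

X = rng.normal(0.0, sigma, size=(trials, n))
biased = np.var(X, axis=1, ddof=0).mean()    # divides by n
unbiased = np.var(X, axis=1, ddof=1).mean()  # divides by n - 1

print(biased)    # close to (n-1)/n * sigma^2 = 6.75
print(unbiased)  # close to sigma^2 = 9
```

NumPy's `ddof` parameter makes exactly this distinction: `ddof=0` gives the biased maximum-likelihood form, `ddof=1` the unbiased sample variance.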


If you use the fact that $E[(X_i-\mu)(X_j-\mu)] =\sigma^2$ when $i=j$ and $0$ otherwise (by independence), then you can show that the $n-1$ factor leads to an unbiased result:

$$E\left[\frac{1}{n-1}\sum_i(X_i-\bar{X})^2\right] \\= E\left[\frac{1}{n-1}\sum_i((X_i-\mu)-(\bar{X}-\mu))^2 \right]\\= E\left[\frac{1}{n-1}\sum_i \left((X_i-\mu)-\frac1n \sum_j(X_j-\mu)\right)^2 \right]\\= E\left[\frac{1}{n-1}\sum_i \left((X_i-\mu)^2-\frac2n \sum_j(X_i-\mu)(X_j-\mu) +\frac1{n^2} \left(\sum_j(X_j-\mu)\right)^2\right)\right]\\= \frac{1}{n-1}\left(n\sigma^2 - \frac{2}{n}n\sigma^2 + \frac{1}{n^2}n (n\sigma^2)\right) \\= \sigma^2$$
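The cross-moment fact this computation rests on can itself be checked empirically (my own sketch; $n=3$, $\sigma=1.5$ are arbitrary): average $(X_i-\mu)(X_j-\mu)$ over many trials and compare with $\sigma^2\,\delta_{ij}$.

```python
# Empirical check that E[(X_i - mu)(X_j - mu)] = sigma^2 if i = j and
# 0 otherwise, via the averaged outer products of iid draws (mu = 0).
import numpy as np

rng = np.random.default_rng(3)
n, sigma, trials = 3, 1.5, 200_000

X = rng.normal(0.0, sigma, size=(trials, n))
C = (X.T @ X) / trials      # C[i, j] averages X_i * X_j over trials

print(np.round(C, 2))       # near sigma^2 = 2.25 on the diagonal, 0 off it
```

The diagonal entries come out near $\sigma^2=2.25$ and the off-diagonal entries near $0$, which is precisely what collapses the double sums above to $n\sigma^2$ terms.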

Henry