I am looking into a question about an inductive (incremental) update of the variance as a dataset grows by one element.

To begin with, the dataset $D_{n-1}$ contains the elements $\{x_1, ..., x_{n-1}\}$, and we already know its:

  • mean $\bar{x}_{n-1}$
  • variance $\sigma^2_{n-1}$

If we add a new element $x_n$ to get a new dataset $D_{n}$ containing $\{x_1, ..., x_{n-1}, x_n\}$, and assume we have already computed its:

  • mean $\bar{x}_n$ (e.g. by the formula $\bar{x}_n = \frac{n-1}{n}\bar{x}_{n-1} + \frac{1}{n}x_n$)

Then which one of the options is the variance $\sigma^2_n$? ...

Using a Python test script, I have ruled out all the other options and verified numerically that the correct answer is:

$\sigma^2_n = \frac{n-1}{n}\sigma^2_{n-1} + \frac{1}{n}(x_n-\bar{x}_{n-1})(x_n - \bar{x}_n)$
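
A numerical check along these lines can be sketched as follows. This is not the original test script, only a minimal illustration: the function name `update_variance` is hypothetical, and it assumes $\sigma^2$ denotes the population variance (dividing by $n$, not $n-1$), which is the convention under which the formula above matches Python's `statistics.pvariance`.

```python
import random
import statistics


def update_variance(n, mean_prev, var_prev, x_new):
    """Return (mean_n, var_n) from the mean and population variance
    of the first n-1 elements plus the new element x_new."""
    mean_new = (n - 1) / n * mean_prev + x_new / n
    var_new = (n - 1) / n * var_prev + (x_new - mean_prev) * (x_new - mean_new) / n
    return mean_new, var_new


random.seed(0)
data = [random.uniform(-10, 10) for _ in range(50)]

# Base case: a single element has mean x_1 and population variance 0.
mean, var = data[0], 0.0
for n in range(2, len(data) + 1):
    mean, var = update_variance(n, mean, var, data[n - 1])
    # Compare against a direct recomputation over the first n elements.
    assert abs(mean - statistics.mean(data[:n])) < 1e-9
    assert abs(var - statistics.pvariance(data[:n])) < 1e-9
```

Such a check only confirms the recurrence on sampled data; it does not replace the analytic proof.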

However, I need a little help to prove it analytically.

Let me know if you need more details. I would greatly appreciate your help.

James