4

Is there a way to update a normal distribution when given new data points without knowing the original data points? What is the minimum information that would need to be known? For example, if I know the mean, standard deviation, and the number of original data points, but not the values of those points themselves, is it possible?

Colin
  • 205
  • Do you mean updating using Bayes' Theorem? – Simon Hayward Dec 04 '12 at 20:26
  • 1
    Maybe? I'm asking the question because I don't know. Let's say I had a list of 100 integers between 1 and 10, and they have a mean of 7.3 and a stddev of 1.1. Now I'm given a new number, say 8. Of course, if I still have the 100 integers, I now have 101 and can recalculate the mean and stddev, but what I'm wondering is whether I can do it without knowing the 100 integers, just knowing the mean and stddev and that there were 100. Can Bayes' Theorem do this? – Colin Dec 04 '12 at 21:12
  • 3
    The minimal information is just what you've listed: you need the zero-th, first, and second moments of the original data points; e.g., knowing their number, mean, and standard deviation is sufficient. – mjqxxxx Dec 04 '12 at 21:21

1 Answer

6

It is certainly possible. The best way, avoiding some numerical precision issues, is to track the following two values, updating both each time a new $n$th observation $a_n$ arrives:

$$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}$$

$$s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)$$

starting with $m_0=s_0=0$. Here $s_n$ is the running sum of squared deviations from the mean. The mean of the first $n$ values is then $m_n$, while the standard deviation is $\sqrt{\frac{s_n}{n}}$ or $\sqrt{\frac{s_n}{n-1}}$, depending on which denominator you usually use to calculate the standard deviation (population or sample). If you would prefer to store the standard deviation rather than $s$, you can recover $s_{n-1}=(n-1)\sigma_{n-1}^2 \text{ or } (n-2)\sigma_{n-1}^2$, matching the same choice of denominator, before each update.
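As a rough illustration of the update above, here is a minimal Python sketch that starts from only the count, mean, and standard deviation and folds in one new value. The function name `update_stats` and the `ddof` parameter (0 for the $n$ denominator, 1 for the $n-1$ denominator) are my own choices for illustration, not part of the formulas above.

```python
import math


def update_stats(n, mean, std, new_value, ddof=0):
    """Fold one new observation into (n, mean, std) without the original data.

    Hypothetical helper: ddof=0 assumes std was computed with denominator n,
    ddof=1 assumes denominator n - 1.
    """
    # Recover the sum of squared deviations s from the stored std.
    s = max(n - ddof, 0) * std ** 2

    n_new = n + 1
    # m_n = m_{n-1} + (a_n - m_{n-1}) / n
    mean_new = mean + (new_value - mean) / n_new
    # s_n = s_{n-1} + (a_n - m_{n-1}) * (a_n - m_n)
    s_new = s + (new_value - mean) * (new_value - mean_new)

    denom = n_new - ddof
    # With too few points for the chosen denominator, report 0.0.
    std_new = math.sqrt(s_new / denom) if denom > 0 else 0.0
    return n_new, mean_new, std_new


# The numbers from the question: 100 values, mean 7.3, stddev 1.1, new value 8.
print(update_stats(100, 7.3, 1.1, 8))
```

With the question's numbers, and assuming the 1.1 was computed with denominator $n$, the mean moves to about 7.307 and the standard deviation to roughly 1.097.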

Henry
  • 169,616
  • Did you mean $a_n-m_n$ in the expression for $s_n$? – Alex Dec 05 '12 at 00:33
  • @Alex - yes - thanks – Henry Dec 05 '12 at 00:45
  • @Henry how did you come up with that? Is this a Kalman filter? Do you have some (biblio) reference to refer to? – daruma Aug 26 '24 at 07:04
  • @daruma - It was based on https://math.stackexchange.com/a/106720/6460 and some personal calculation. It could be adapted to track the variance rather than the sum of squared differences, but that would need a decision on the choice of denominator. – Henry Aug 27 '24 at 09:16
  • Ok, this is Welford's online algorithm; cf. https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance. – daruma Aug 29 '24 at 01:27