Is there a way to update a normal distribution when given new data points without knowing the original data points? What is the minimum information that would need to be known? For example, if I know the mean, standard deviation, and the number of original data points, but not the values of those points themselves, is it possible?
Do you mean updating using Bayes' Theorem? – Simon Hayward Dec 04 '12 at 20:26
Maybe? I'm asking the question because I don't know. Let's say I had a list of 100 integers between 1 and 10, and they have a mean of 7.3 and a stddev of 1.1. Now I'm given a new number, say 8. Of course, if I still have the 100 integers, I now have 101 and can recalculate the mean and stddev, but what I'm wondering is whether I can do it without knowing the 100 integers, just knowing the mean and stddev and that there were 100. Can Bayes' Theorem do this? – Colin Dec 04 '12 at 21:12
The minimal information is just what you've listed: you need the zero-th, first, and second moments of the original data points; e.g., knowing their number, mean, and standard deviation is sufficient. – mjqxxxx Dec 04 '12 at 21:21
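A minimal Python sketch of the moments approach in the comment above. It assumes the stored standard deviation was computed with denominator $n$ (`ddof=0`) or $n-1$ (`ddof=1`); the function name `add_point_raw` is hypothetical. It rebuilds the count, the sum, and the sum of squares from $(n, \text{mean}, \text{sd})$, folds in the new value, and recomputes. Subtracting two large sums of squares can lose precision, which is the issue the answer below avoids.

```python
import math

def add_point_raw(n, mean, sd, x, ddof=0):
    """Hypothetical helper: update (n, mean, sd) with one new value x.

    ddof=0 assumes sd used denominator n; ddof=1 assumes denominator n-1.
    """
    # Recover the zeroth, first and second raw power sums of the old data.
    total = n * mean
    sum_sq = (n - ddof) * sd**2 + n * mean**2
    # Fold in the new observation.
    n += 1
    total += x
    sum_sq += x * x
    new_mean = total / n
    # Sum of squared deviations = sum of squares - n * mean^2.
    new_var = (sum_sq - n * new_mean**2) / (n - ddof)
    return n, new_mean, math.sqrt(new_var)

# Question's example, assuming the 1.1 used denominator n.
print(add_point_raw(100, 7.3, 1.1, 8))
```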
1 Answer
It is certainly possible. The best way, since it avoids some numerical precision issues, is to track the following two values and update them with each new ($n$th) observation $a_n$:
$$m_n = m_{n-1} + \frac{a_{n}-m_{n-1}}{n}$$
$$s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)$$
starting with $m_0=s_0=0$. Then the mean of the first $n$ values is $m_n$, while the standard deviation is $\sqrt{\frac{s_n}{n}}$ or $\sqrt{\frac{s_n}{n-1}}$, depending on which denominator you usually use to calculate the standard deviation. If you would prefer to store only the standard deviation (as in the question), you can recover $s_{n-1}$ from it each time as $s_{n-1}=(n-1)\sigma_{n-1}^2$ with denominator $n$, or $s_{n-1}=(n-2)\sigma_{n-1}^2$ with denominator $n-1$, where $\sigma_{n-1}$ is the standard deviation of the first $n-1$ values.
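A direct Python transcription of the two recurrences, offered as a sketch; the class name `RunningStats` and its methods are hypothetical. `from_summary` uses the relation just given to rebuild $s$ from a stored standard deviation, so it covers the question's case where only the count, mean, and standard deviation were kept.

```python
import math

class RunningStats:
    """Hypothetical class tracking m_n (running mean) and s_n (sum of squared deviations)."""

    def __init__(self, n=0, m=0.0, s=0.0):
        self.n = n  # number of observations so far
        self.m = m  # m_n, starts at m_0 = 0
        self.s = s  # s_n, starts at s_0 = 0

    @classmethod
    def from_summary(cls, n, mean, sd, ddof=0):
        # s = n * sd^2 if sd used denominator n (ddof=0),
        # s = (n - 1) * sd^2 if it used denominator n - 1 (ddof=1).
        return cls(n, mean, (n - ddof) * sd**2)

    def push(self, a):
        self.n += 1
        old_m = self.m
        self.m = old_m + (a - old_m) / self.n         # m_n = m_{n-1} + (a_n - m_{n-1}) / n
        self.s = self.s + (a - old_m) * (a - self.m)  # s_n = s_{n-1} + (a_n - m_{n-1})(a_n - m_n)

    def mean(self):
        return self.m

    def sd(self, ddof=0):
        # ddof=0 divides by n, ddof=1 divides by n - 1.
        return math.sqrt(self.s / (self.n - ddof))

# Question's example: 100 values, mean 7.3, sd 1.1 (assumed denominator n), new value 8.
stats = RunningStats.from_summary(100, 7.3, 1.1)
stats.push(8)
print(stats.n, round(stats.mean(), 3), round(stats.sd(), 3))  # 101 7.307 1.097
```

Starting from `RunningStats()` and pushing every value reproduces the recurrence from $m_0=s_0=0$. The `ddof` passed to `from_summary` must match how the stored standard deviation was originally computed; the question does not say, so `ddof=0` in the example is an assumption.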
Henry
@Henry how did you come up with that? Is this a Kalman filter? Do you have a (bibliographic) reference for it? – daruma Aug 26 '24 at 07:04
@daruma - It was based on https://math.stackexchange.com/a/106720/6460 and some personal calculation. It could be adapted to track the variance rather than the sum of squared differences, but that would need a decision on the choice of denominator. – Henry Aug 27 '24 at 09:16
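As a sketch of that adaptation, derived directly from the two recurrences in the answer rather than taken from the linked post: write $d_n = (a_n - m_{n-1})(a_n - m_n)$, so that $s_n = s_{n-1} + d_n$. With denominator $n$ and $v_n = s_n/n$:
$$v_n = \frac{(n-1)\,v_{n-1} + d_n}{n} = v_{n-1} + \frac{d_n - v_{n-1}}{n}$$
With denominator $n-1$ and $w_n = s_n/(n-1)$ for $n \ge 2$ (taking $w_1 = 0$):
$$w_n = \frac{(n-2)\,w_{n-1} + d_n}{n-1} = w_{n-1} + \frac{d_n - w_{n-1}}{n-1}$$
Either way the denominator has to be fixed up front, which is the decision mentioned above.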
Ok, this is Welford's online algorithm; cf. https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance. – daruma Aug 29 '24 at 01:27