In some places the variance is defined as the sum of the squared differences of each data point from the mean, divided by N - 1, and in other places it's divided by N:

Here

  1. The variance is the average number of these squared differences:

(2061.16+1128.96+3672.36+2440.36+338.56+0.16+384.16) / 7 = 1432.2

Shouldn't it be / 6 for the variance?

Also here: with N in the denominator?
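To make the two candidates concrete, here is a small sketch that divides the same sum by N and by N - 1. The seven squared differences are copied from the quoted calculation above; the variable names are only illustrative.

```python
# Squared differences from the mean, copied from the quoted calculation.
squared_diffs = [2061.16, 1128.96, 3672.36, 2440.36, 338.56, 0.16, 384.16]

n = len(squared_diffs)        # N = 7 data points
total = sum(squared_diffs)    # sum of squared differences

divided_by_n = total / n                # the tutorial's version
divided_by_n_minus_1 = total / (n - 1)  # the version with N - 1

print(round(divided_by_n, 1))          # roughly 1432.2
print(round(divided_by_n_minus_1, 1))  # roughly 1671.0
```

So the choice of denominator changes the answer from about 1432.2 to about 1671.0 for these seven values.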

Zebrafish
  • Population vs sample – David P Oct 04 '23 at 12:33
  • https://en.wikipedia.org/wiki/Bessel%27s_correction – J.G. Oct 04 '23 at 12:41
  • The punchline is whether you know ahead of time that your list of data is complete and you are trying to describe exactly that set of data and nothing more, or whether you are using the data to extrapolate to a larger set, with the understanding that the data you have is incomplete and so you should include a bit of leeway. – JMoravitz Oct 04 '23 at 12:42
  • The key aspect of "incomplete" in @JMoravitz's comment is whether we know the population mean or have to estimate it from the sample. – J.G. Oct 04 '23 at 12:49
  • @JMoravitz "Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value." How does whether your data is inclusive of the entire population or only a sample of that affect how spread out the numbers are? – Zebrafish Oct 04 '23 at 12:51
  • @JMoravitz Ohhhh, you mean there's variance for the list of numbers, and a variance that was discovered to be more suitable or accurate for a larger population for which that sample is representative? Something like that? – Zebrafish Oct 04 '23 at 12:53
  • Sort of. If I asked you for the mean of men's heights, you'd work out a sample mean. If I asked you for the variance of men's heights, you'd look at your sample again, using what you think is the mean. But unsurprisingly, the sample values cluster a bit tighter around the sample mean they define than around the true population mean. – J.G. Oct 04 '23 at 12:58
  • @J.G. So the distinction between sample variance and population variance is when you're trying to get a number that applies to both sample and population and people realised that whenever you take a sample of a population the variance always ends up being LESS than if you did the calculation on the population that sample was taken from, right? So variance isn't just something that describes a list of numbers, like the mean or the mode, or the count. Variance is defined with relation to a larger population? – Zebrafish Oct 04 '23 at 13:06
  • Yes, but noting that the same word "Variance" is used in both settings. We differentiate between the two by emphasizing "population variance" vs "sample variance." If the extra word is ever left out, it should be able to be assumed which is intended based on context. – JMoravitz Oct 04 '23 at 13:09
  • Another way to think of it: there's only one variance, the population variance, but if all you have is the sample then you can only estimate the population variance. It turns out that just taking the variance of the sample as if it were the whole population (so that the sample mean would be the population mean) is biased, and changing the denominator corrects that bias. See the Bessel correction – lulu Oct 04 '23 at 13:27
  • @lulu Thanks, it helps to think of it that way. In a Python tutorial they took the variance without Bessel's correction, as if the sample variance didn't exist. They had an array of numbers, summed the squared differences, and divided by N. – Zebrafish Oct 04 '23 at 13:29
  • Of course, if the sample is large, the correction is mostly cosmetic. And if the sample is small... well, these estimates aren't going to be very good anyway. But, still, it's always better to use unbiased estimators if they are available. – lulu Oct 04 '23 at 13:30
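The bias described in the comments can be seen in a small simulation. This is only a rough sketch under assumptions that are not part of the thread: the population is taken to be standard normal (true variance 1), each sample has 5 values, and the function name `average_estimates` is made up for illustration.

```python
import random


def average_estimates(sample_size, trials, seed=0):
    """Average the /N and /(N - 1) variance estimates over many samples.

    Assumption: samples are drawn from a normal distribution with
    mean 0 and variance 1, so the true population variance is 1.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sum_div_n = 0.0
    sum_div_n_minus_1 = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
        sample_mean = sum(sample) / sample_size
        # Squared differences are taken from the SAMPLE mean,
        # because the population mean is treated as unknown.
        ss = sum((x - sample_mean) ** 2 for x in sample)
        sum_div_n += ss / sample_size
        sum_div_n_minus_1 += ss / (sample_size - 1)
    return sum_div_n / trials, sum_div_n_minus_1 / trials


biased, unbiased = average_estimates(sample_size=5, trials=20000)
print("average of /N estimates:      ", biased)
print("average of /(N - 1) estimates:", unbiased)
```

Under these assumptions the /N average should come out near (N - 1)/N = 0.8 and the /(N - 1) average near the true variance of 1. Any single sample can still land above or below the true value; the claim is only about the average over many samples.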
