
Hello, I need to calculate the estimated variance of 3 samples combined.

Here is an example:

      SP1  SP2   SP3
ni    20   30    46
Meani 67   56    90
Si^2  12.3 23.2  11.2

I know how to calculate the weighted average:

(67*20+56*30+90*46)/(20+30+46)
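A quick Python check of that size-weighted mean, using the sample sizes and means from the table (this only illustrates the arithmetic; the variable names are my own):

```python
# Sample sizes and sample means for SP1, SP2, SP3 (from the table above)
n = [20, 30, 46]
means = [67, 56, 90]

# Weighted mean: each group mean is weighted by its sample size
weighted_mean = sum(ni * mi for ni, mi in zip(n, means)) / sum(n)
print(round(weighted_mean, 4))  # 74.5833
```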

But how do I calculate the estimated weighted variance?

I guess we should first calculate the corrected variance, S^2 * n/(n-1):

12.3*(20/19)=12.94737
23.2*(30/29)=24
11.2*(46/45)=11.44889

and then take the weighted average, as for the mean:

(12.94737*20+24*30+11.44889*46)/(20+30+46)

But I'm not sure it is correct.

Grendel
  • Not really. They talk about the formula ((n1-1)s1^2 + (n2-1)s2^2 + (n3-1)s3^2)/(n1+n2+n3-3), but it seems wrong – Grendel Apr 30 '20 at 08:49
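The formula quoted in this comment is the pooled within-group variance. A minimal Python sketch, assuming the table's $s_i^2$ values are unbiased sample variances, shows what it computes: it averages only the spread inside each sample and ignores how far apart the sample means are, so it is not the variance of the combined data.

```python
n = [20, 30, 46]
s2 = [12.3, 23.2, 11.2]  # assumed to be unbiased sample variances s_i^2

# Pooled within-group variance: sum((n_i - 1) * s_i^2) / (N - k)
pooled = sum((ni - 1) * si for ni, si in zip(n, s2)) / (sum(n) - len(n))
print(round(pooled, 4))  # 15.1667
```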

1 Answer


First of all, it is an unfortunate consequence of the way Math.SE has changed the wording of duplicate flags that askers are led to think the flag is only a suggestion; i.e., the phrase

Does this answer your question?

is, in this case, not an opportunity for you to disagree. It is a canned reply that I have no control over; the reality is that the linked answer absolutely and certainly answers your question, and if you do not think it does, you have not studied what it says. I am the author of that answer. I know what it says because I wrote it. And I am telling you that it answers your question, which I will now demonstrate.

Your table of summary statistics is $$\begin{array}{c|ccc} & SP1 & SP2 & SP3 \\ \hline n_i & 20 & 30 & 46 \\ \bar x_i & 67 & 56 & 90 \\ s_i^2 & 12.3 & 23.2 & 11.2 \\ \end{array}$$

Using the formula $$\bar z = \frac{n \bar x + m \bar y}{n + m}$$ adapted to your notation for $i = 1, 2$, we have $$\bar x_{1,2} = \frac{n_1 \bar x_1 + n_2 \bar x_2}{n_1 + n_2} = \frac{20(67)+30(56)}{20+30} = \frac{302}{5}.$$

Using the formula $$s_z^2 = \frac{(n-1) s_x^2 + (m-1) s_y^2}{n+m-1} + \frac{nm(\bar x - \bar y)^2}{(n+m)(n+m-1)},$$ also adapted to your notation, we have $$\begin{align*} s_{1,2}^2 &= \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 1} + \frac{n_1 n_2(\bar x_1 - \bar x_2)^2}{(n_1 + n_2)(n_1 + n_2 - 1)} \\ &= \frac{19(12.3) + 29(23.2)}{20 + 30 - 1} + \frac{20(30)(67 - 56)^2}{(20 + 30)(20 + 30 - 1)} \\ &= 48.1327. \end{align*}$$ Therefore, we have combined the mean and variance for $SP1$ and $SP2$. Now we repeat the process to combine $SP(1,2)$ with $SP3$. This I leave as an exercise.
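The process is easy to script. Below is a minimal Python sketch, assuming the table's $s_i^2$ are unbiased sample variances; `combine` is a hypothetical helper name, not something from the linked answer. It encodes the two formulas above and applies them twice, first to $SP1$ and $SP2$ and then to that result and $SP3$, so it also carries out the final step.

```python
def combine(n, mean_x, var_x, m, mean_y, var_y):
    """Merge two groups given as (size, mean, sample variance)."""
    size = n + m
    mean = (n * mean_x + m * mean_y) / size
    # Pooled within-group part plus the between-group part
    var = ((n - 1) * var_x + (m - 1) * var_y) / (size - 1) + (
        n * m * (mean_x - mean_y) ** 2 / (size * (size - 1))
    )
    return size, mean, var


# SP1 with SP2, using the table values
n12, mean12, var12 = combine(20, 67, 12.3, 30, 56, 23.2)
print(n12, mean12, round(var12, 4))  # 50 60.4 48.1327

# (SP1, SP2) with SP3
n_all, mean_all, var_all = combine(n12, mean12, var12, 46, 90, 11.2)
print(n_all, round(mean_all, 4), round(var_all, 4))  # 96 74.5833 251.093
```

Because each call returns a triple of the same shape it takes in, the merge can be chained over any number of groups.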

heropup
  • But how can the global variance of the 3 samples be higher than the highest variance of any one of them? – Grendel Apr 30 '20 at 09:27
  • @Grendel In your case, the overall variance across all three groups is obviously larger than the individual group variances because the within-group means are quite far apart from each other. If I have two sets of tightly clustered data with $100$ observations each, and a variance of $1$ within each group, but the mean of one set is $10$ and the other is $1000$, the variance of the combined data will be much larger than $1$ because there is variance BETWEEN groups. – heropup Apr 30 '20 at 09:30
  • Okay, thank you, it helped a lot – Grendel Apr 30 '20 at 09:51
  • Isn't there a simpler formula to calculate the global variance directly for the 3 populations, instead of combining P1 with P2 and then (P1, P2) with P3? – Grendel Apr 30 '20 at 10:07
  • @Grendel If you read the solution I linked to in my answer, you will find a formula that can be generalized to more than two groups. – heropup Apr 30 '20 at 19:03
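One standard way to handle $k$ groups at once, sketched here under the assumption that each $s_i^2$ is an unbiased sample variance (I have not reproduced the linked answer's own notation), is to split the total sum of squares into a within-group part and a between-group part: $s^2 = \left[\sum_i (n_i - 1)s_i^2 + \sum_i n_i(\bar x_i - \bar x)^2\right]/(N - 1)$, where $N = \sum_i n_i$ and $\bar x$ is the size-weighted mean. The function name `combined_variance` is illustrative. The second call reproduces heropup's two-group illustration and shows the between-group term dominating.

```python
def combined_variance(ns, means, variances):
    """Sample variance of the pooled data from per-group (n_i, mean_i, s_i^2)."""
    N = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / N
    # Spread inside each group
    within = sum((n - 1) * v for n, v in zip(ns, variances))
    # Spread of the group means around the grand mean
    between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    return (within + between) / (N - 1)


print(round(combined_variance([20, 30, 46], [67, 56, 90], [12.3, 23.2, 11.2]), 4))  # 251.093

# Two groups of 100 observations, variance 1 each, means 10 and 1000
print(round(combined_variance([100, 100], [10, 1000], [1, 1]), 2))  # 246257.28
```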