I understand that the variance of the sum of two independent normally distributed random variables is the sum of the variances, but how does this change when the two random variables are correlated?
You have to add twice the covariance. – David Mitra Mar 02 '12 at 01:55
There is also a good (and simple) explanation on Insight Things. – Jan Rothkegel Mar 07 '16 at 14:17
4 Answers
For any two random variables: $$\text{Var}(X+Y) =\text{Var}(X)+\text{Var}(Y)+2\text{Cov}(X,Y).$$ If the variables are uncorrelated (that is, $\text{Cov}(X,Y)=0$), then
$$\tag{1}\text{Var}(X+Y) =\text{Var}(X)+\text{Var}(Y).$$ In particular, if $X$ and $Y$ are independent, then equation $(1)$ holds.
In general $$ \text{Var}\Bigl(\,\sum_{i=1}^n X_i\,\Bigr)= \sum_{i=1}^n\text{Var}( X_i)+ 2\sum_{i< j} \text{Cov}(X_i,X_j). $$ If for each $i\ne j$, $X_i$ and $X_j$ are uncorrelated, in particular if the $X_i$ are pairwise independent (that is, $X_i$ and $X_j$ are independent whenever $i\ne j$), then $$ \text{Var}\Bigl(\,\sum_{i=1}^n X_i\,\Bigr)= \sum_{i=1}^n\text{Var}( X_i) . $$
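As a quick numerical sanity check of the identity above (a minimal sketch, not part of the original answer; the specific means, variances, and covariance below are assumed values chosen for illustration):

```python
# Simulate correlated normal variables and compare Var(X+Y) with
# Var(X) + Var(Y) + 2*Cov(X, Y). All moments use the divide-by-N convention
# (ddof=0) so the identity holds exactly on the samples.
import numpy as np

rng = np.random.default_rng(0)
mean = [0.0, 0.0]
cov = [[2.0, 0.8],    # Var(X) = 2,  Cov(X, Y) = 0.8
       [0.8, 1.5]]    # Cov(Y, X) = 0.8,  Var(Y) = 1.5
samples = rng.multivariate_normal(mean, cov, size=1_000_000)
x, y = samples[:, 0], samples[:, 1]

lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * np.cov(x, y, ddof=0)[0, 1]
print(lhs, rhs)   # equal up to floating-point error, and close to 2 + 1.5 + 2*0.8 = 5.1
```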
I am unfamiliar with the summation $\sum_{i<j}$. Can you explain what this notation means? – Soo Mar 02 '12 at 02:13
@soo You calculate all covariances $\text{Cov}(X_i,X_j)$ with $i<j$ and sum them up. Another way to write $2\sum_{i<j}$ in this case is to write $\sum_{i\ne j}$. (The 2 is there in the first sum because in the second sum you calculate, e.g., $\text{Cov}(X_1,X_2)$ and $\text{Cov}(X_2,X_1)$, but these are equal.) – David Mitra Mar 02 '12 at 02:17
David, excellent explanation; the 2 in the $2\,\text{Cov}(\cdot,\cdot)$ term makes more sense now. Also, can you explain why you don't define an upper limit $n$ in the summation $\sum_{i<j}$? Just for my personal curiosity. Thanks – Soo Mar 02 '12 at 02:21
@soo For your first comment, that's correct. I'll just let your comment be the addendum, if that's ok. – David Mitra Mar 02 '12 at 02:23
@soo To be rigorous, I should have written $\sum\limits_{i<j\atop 1\le j\le n }$ or something like that. No upper limit though. The lower limit perfectly describes what the index set is. – David Mitra Mar 02 '12 at 02:26
Are the statements "The covariance is zero" and "The events are independent" equivalent? – Ian Haggerty Nov 14 '15 at 16:53
@DavidMitra, thank you for this nice explanation, I have a follow up question to the discussion. I am dealing with an example where instead of having just one random variable $X$ in the sum, I have a product: $\text{Var}\left(\sum_{i=1}^{N} I_i X_i\right)$. When I look at the book I get the double summation in the covariance term: $$ \text{Var}\Bigl(\,\sum_{i=1}^N I_i X_i\,\Bigr)= \sum_{i=1}^N\text{Var}(I_i X_i)+ \sum_{i=1}^N \sum_{j = 1, j \neq i}^N \text{Cov}(X_i I_i,X_j I_j). $$ Could you please explain why that may be the case? Thank you. – Mark Dec 27 '21 at 21:46
Let's work this out from the definitions, using the empirical (divide-by-$N$) variances and covariance. Say we have two random variables $x$ and $y$ with means $\mu_x$ and $\mu_y$. Then the variances of $x$ and $y$ are:
$${\sigma_x}^2 = \frac{\sum_i(\mu_x-x_i)(\mu_x-x_i)}{N}$$ $${\sigma_y}^2 = \frac{\sum_i(\mu_y-y_i)(\mu_y-y_i)}{N}$$
The covariance of $x$ and $y$ is:
$${\sigma_{xy}} = \frac{\sum_i(\mu_x-x_i)(\mu_y-y_i)}{N}$$
Now, let us consider the weighted sum $p$ of $x$ and $y$:
$$\mu_p = w_x\mu_x + w_y\mu_y$$
$${\sigma_p}^2 = \frac{\sum_i(\mu_p-p_i)^2}{N} = \frac{\sum_i(w_x\mu_x + w_y\mu_y - w_xx_i - w_yy_i)^2}{N} = \frac{\sum_i(w_x(\mu_x - x_i) + w_y(\mu_y - y_i))^2}{N} = \frac{\sum_i(w^2_x(\mu_x - x_i)^2 + w^2_y(\mu_y - y_i)^2 + 2w_xw_y(\mu_x - x_i)(\mu_y - y_i))}{N} \\ = w^2_x\frac{\sum_i(\mu_x-x_i)^2}{N} + w^2_y\frac{\sum_i(\mu_y-y_i)^2}{N} + 2w_xw_y\frac{\sum_i(\mu_x-x_i)(\mu_y-y_i)}{N} \\ = w^2_x\sigma^2_x + w^2_y\sigma^2_y + 2w_xw_y\sigma_{xy}$$
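As a sanity check of the last line (a minimal sketch with assumed data and weights), the identity holds exactly on a sample when every moment is computed with the same divide-by-$N$ convention used in the derivation:

```python
# Verify Var(w_x*x + w_y*y) = w_x^2*Var(x) + w_y^2*Var(y) + 2*w_x*w_y*Cov(x, y)
# using population (divide-by-N) moments, matching the derivation above.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # correlated with x by construction
w_x, w_y = 0.3, 0.7                       # arbitrary illustrative weights

p = w_x * x + w_y * y
var_p = np.var(p)                                        # divide-by-N variance
sigma_xy = np.mean((x - x.mean()) * (y - y.mean()))      # divide-by-N covariance
rhs = w_x**2 * np.var(x) + w_y**2 * np.var(y) + 2 * w_x * w_y * sigma_xy
print(var_p, rhs)   # identical up to floating-point error
```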
You can also think in vector form:
$$\text{Var}(a^T X) = a^T \text{Var}(X) a$$
where $a$ could be a vector or a matrix, $X = (X_1, X_2, \dots, X_n)^T$ is a vector of random variables. $\text{Var}(X)$ is the covariance matrix.
If $a = (1, 1, \dots, 1)^T$, then $a^T X$ is the sum of all the $X_i$'s.
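A small sketch of the quadratic form (the covariance matrix below is an assumed example): with $a$ the all-ones vector, $a^T \Sigma a$ is just the sum of all entries of $\Sigma$, i.e. the variances plus twice the off-diagonal covariances.

```python
# Var(a^T X) = a^T Sigma a; with a = (1, ..., 1)^T this reproduces
# Var(X_1 + ... + X_n) = sum of variances + 2 * sum of covariances.
import numpy as np

sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.5],
                  [0.1, 0.5, 1.5]])   # assumed covariance matrix
a = np.ones(3)                        # weights: plain sum of the X_i

var_of_sum = a @ sigma @ a            # a^T Var(X) a
print(var_of_sum, sigma.sum())        # both 6.3 = (1 + 2 + 1.5) + 2*(0.3 + 0.1 + 0.5)
```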
Consider a function of two variables, $ z = f(x, y) $. Then the variation of z, $\delta z$, is $$\tag{1} \delta z = \frac{df}{dx} \ \delta x $$ where $$ \frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y} \frac{ dy}{dx}. $$ Squaring equation (1) we get $$ (\delta z)^2 = \Big[ \left( \frac{\partial f}{\partial x} \right)^2 + 2 \frac{\partial f}{\partial x} \frac{\partial f}{\partial y} \frac{dy}{dx} + \left( \frac{\partial f}{\partial y}\right)^2 \left( \frac{dy}{dx} \right)^2 \Big] (\delta x)^2. $$ Multiplying this out we get $$ (\delta z)^2 = \left( \frac{\partial f}{\partial x} \right)^2 (\delta x)^2+ 2 \frac{\partial f}{\partial x} \frac{\partial f}{\partial y} \delta x \delta y + \left( \frac{\partial f}{\partial y}\right)^2 (\delta y)^2, $$ where we have used that $\delta y = \frac{dy}{dx} \delta x$. Now we can identify the quadratic variation terms with the variances and covariance of random variables: $$ \text{Var}(z) = \left( \frac{\partial f}{\partial x} \right)^2 \text{Var}(x) + 2 \frac{\partial f}{\partial x} \frac{\partial f}{\partial y} \text{Cov}(x,y) + \left( \frac{\partial f}{\partial y}\right)^2 \text{Var}(y). $$ When the function $f$ is just a sum of $x$ and $y$ then the partial derivative terms are all equal to one, giving $$\text{Var}(z) = \text{Var}(x) + 2\ \text{Cov}(x,y) + \text{Var}(y). $$
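For a linear $f$ the partial derivatives are constants and the propagation formula above is exact, which makes it easy to check numerically (a minimal sketch; the choice $f(x,y)=3x-2y$ and the simulated data are assumptions made for illustration):

```python
# Check Var(z) = (df/dx)^2 Var(x) + 2 (df/dx)(df/dy) Cov(x, y) + (df/dy)^2 Var(y)
# for the linear case z = f(x, y) = 3x - 2y, where df/dx = 3 and df/dy = -2.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=500_000)
y = 0.4 * x + rng.normal(size=500_000)   # correlated with x by construction

z = 3 * x - 2 * y
cov_xy = np.cov(x, y, ddof=0)[0, 1]      # divide-by-N covariance
rhs = 9 * np.var(x) + 2 * 3 * (-2) * cov_xy + 4 * np.var(y)
print(np.var(z), rhs)    # equal up to floating-point error
```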