
For two variables to have zero covariance, there must be no linear dependence between them. Independence is a stronger requirement than zero covariance, because independence also excludes nonlinear relationships. It is possible for two variables to be dependent but have zero covariance.

Goodfellow, Ian; Bengio, Yoshua; Courville, Aaron. *Deep Learning*. Adaptive Computation and Machine Learning. Cambridge, MA: MIT Press, 2016, p. 59. ISBN 978-0-262-03561-3 (hardcover); 978-0-262-33743-4 (ebook).

I understand why stochastic independence implies zero covariance: $\mathbf{E}[XY]\triangleq\iint_{\mathbb{R}^2} xy\,p_{X,Y}(x,y)\,dx\,dy = \iint_{\mathbb{R}^2} xy\,p_X(x)\,p_Y(y)\,dx\,dy = \int_{\mathbb{R}}x\,p_X(x)\,dx\int_{\mathbb{R}}y\,p_Y(y)\,dy = \mathbf{E}[X]\mathbf{E}[Y]$, where the second equality holds as long as the independence condition $p_{X,Y}(x,y) = p_X(x)\,p_Y(y)$ holds. Then, since $\mathbf{Cov}[X,Y]\triangleq\mathbf{E}[XY]-\mathbf{E}[X]\mathbf{E}[Y]$, substitution yields $\mathbf{Cov}[X,Y] = 0$.
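To convince myself numerically, I wrote a minimal sketch of this identity (the particular distributions and the sample size are my own illustrative choices, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent draws: X ~ N(0, 1), Y ~ Uniform(0, 1).
x = rng.normal(size=100_000)
y = rng.uniform(size=100_000)

# Independence implies E[XY] = E[X]E[Y], so the sample covariance
# should be near zero, up to Monte Carlo noise.
print(np.cov(x, y)[0, 1])
```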

I apologize for the digression; now comes my question: I understand what linear and non-linear dependence entail, but I fail to see why the type of dependence between $X$ and $Y$ would affect stochastic independence. I apologize if my question isn't worded in the best way possible. Thanks in advance!

  • "Stronger" simply means the last sentence: "It is possible for two variables to be dependent but have zero covariance." Or in other words, if you know $X,Y$ are independent, you know more about $X,Y$. – antkam Sep 25 '19 at 15:05
  • Take a look here – RRL Sep 25 '19 at 15:09
  • I'm puzzled by the phrase "for zero covariance to imply stochastic independence" in your question. It's the other way around. "Stronger" means that stochastic independence implies zero covariance. I'm not sure if this is the source of your confusion, or merely awkward phrasing. – saulspatz Sep 25 '19 at 15:13
  • @saulspatz definitely awkward phrasing. I'll edit that out. Thanks! – NTAuthority Sep 25 '19 at 15:17
  • @RRL took a look at it. Makes more sense now, thanks. – NTAuthority Sep 25 '19 at 15:26
  • More or less a duplicate of https://math.stackexchange.com/q/444408 – Jean Marie Sep 25 '19 at 16:34

1 Answer


Correlation only captures linear dependence. E.g., take $Y = X^2$ with $X$ symmetric about zero (say, standard normal). These two random variables $X,Y$ are definitely dependent on each other, i.e. given the value of one, we can say something about the value of the other. However, the correlation between $X$ and $Y$ is zero, since $\mathbf{Cov}[X,Y] = \mathbf{E}[X^3] - \mathbf{E}[X]\,\mathbf{E}[X^2] = 0$ by symmetry. But surely, $X,Y$ are NOT independent.
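To make this concrete, here is a minimal numerical sketch of the $Y = X^2$ example (assuming NumPy; the standard normal and the sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# X symmetric about zero, so E[X] = 0 and E[X^3] = 0.
x = rng.normal(size=100_000)
y = x ** 2  # Y is a deterministic function of X: as dependent as it gets

# Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0 for symmetric X.
print(np.cov(x, y)[0, 1])       # ~0
print(np.corrcoef(x, y)[0, 1])  # ~0, despite the perfect dependence
```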

UPDATE:

Intuition behind why correlation captures only linear dependence:

Say two rvs $X,Y$ have a linear relationship, i.e. $Y = aX + \epsilon$, where $\epsilon$ is independent of $X$ and has variance $\sigma^2_{\epsilon}$.

\begin{align} \rho_{XY} &= \frac{\Bbb E(XY) - \Bbb E(X) \Bbb E(Y)}{\sigma_X \sigma_Y} \\ &= \frac{\Bbb E(X(aX+\epsilon)) - \Bbb E(X) \Bbb E(aX+\epsilon)}{\sigma_X \sqrt{a^2 \sigma^2_X + \sigma^2_{\epsilon}}} \\ &= \frac{a\Bbb E(X^2) - a\Bbb E(X)^2}{\sigma_X \sqrt{a^2 \sigma^2_X + \sigma^2_{\epsilon}}} = \frac{a\sigma^2_X}{\sigma_X \sqrt{a^2 \sigma^2_X + \sigma^2_{\epsilon}}} = \frac{a\sigma_X}{\sqrt{a^2 \sigma^2_X + \sigma^2_{\epsilon}}} \end{align}

If $\sigma_{\epsilon} = 0$, i.e. the error term is not stochastic, then the correlation is $\pm 1$ (taking the sign of $a$).

On the other hand, a large $\sigma_{\epsilon}$ drives the correlation toward zero.
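Here is a minimal sketch checking this formula empirically (assuming NumPy; the values of $a$, $\sigma_X$, and $\sigma_{\epsilon}$ are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
a, sigma_x = 2.0, 1.0

for sigma_eps in (0.0, 1.0, 10.0):
    x = rng.normal(0.0, sigma_x, size=100_000)
    eps = rng.normal(0.0, sigma_eps, size=100_000)
    y = a * x + eps

    empirical = np.corrcoef(x, y)[0, 1]
    theoretical = a * sigma_x / np.sqrt(a**2 * sigma_x**2 + sigma_eps**2)
    print(f"sigma_eps={sigma_eps:5.1f}  "
          f"empirical={empirical:+.4f}  theory={theoretical:+.4f}")
```

With $\sigma_{\epsilon} = 0$ the empirical correlation is exactly $1$ (here $a > 0$), and it decays toward zero as $\sigma_{\epsilon}$ grows, matching the formula.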

honeybadger
  • That definitely makes sense. But why does correlation only capture linear dependence? I guess the reason lies in the definition? – NTAuthority Sep 25 '19 at 15:24
  • @NTAuthority: Yeah, it is kind of definitional. I updated my answer to add more details on how linear dependence is well captured by correlation. However, this is not an answer to why other sorts of dependencies are not well captured by correlation. – honeybadger Sep 25 '19 at 15:45
  • @NTAuthority - there is another (more advanced) quantity called mutual information which $=0$ iff the two r.v.s are independent. if they have any kind of dependence at all (linear, non-linear, etc) then MI $>0$. Meanwhile, the +/- sign of covariance tells you how they linearly relate, i.e. if one increases, does the other "typically" increase or decrease? That is often useful to know. MI does not give you that information at all. (Indeed in MI the "r.v.s" don't even need to be numbers so there is no notion of increase/decrease.) – antkam Sep 25 '19 at 19:33
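As an illustration of the mutual-information point in the last comment, here is a sketch using scikit-learn's k-nearest-neighbor MI estimator (my own choice of tool; no library is named in the thread):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = x ** 2  # nonlinear dependence with (near-)zero correlation

# The estimator expects a 2-D feature array.
mi = mutual_info_regression(x.reshape(-1, 1), y, random_state=0)
print(mi[0])                    # clearly positive: MI detects the dependence
print(np.corrcoef(x, y)[0, 1])  # ~0: correlation misses it entirely
```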