3

This has been bugging me for some time. The famous result in probability is like

$E[Y] = E[E[Y|X]]$

Can someone write an intuitive explanation of the above?

  • $\mathbb{E}[\mathbb{E}[Z]] = \mathbb{E}[Z]$ since $\mathbb{E}[Z]$ is a constant and $\mathbb{E}[cste] = cste$. so what did you mean ? :) – reuns Apr 18 '16 at 15:13
  • 3
    @user1952009: Uh...what? – MPW Apr 18 '16 at 15:16
  • that's very clear : your double expectation, each one is on a different variable, you have to precise that – reuns Apr 18 '16 at 15:19
  • 2
    In the first expectation, you're computing the expected value of $Y$ given $X=x$. In the second expectation, you're finding the expected value $Y$, given that you have $X$. One way to think of it is you first integrate with respect to $y$, holding $X=x$ constant, i.e. you have a specific value of $X$ in mind. Then you integrate with respect to $x$, and it's ranging over all possible values of $X=x$. Then you're left with $E[Y]$ since you've integrated out all "information" about $X$. I hope that helps at least a bit. – smingerson Apr 18 '16 at 15:46
  • @MPW : when the context is unclear, you have to specify on what variable you are making the expectation, see here https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm#Description for an example of why it is really important – reuns Apr 18 '16 at 16:32

1 Answers1

1

The object $\mathbb{E} \left[Y \, \middle| \, X \right]$ is actually a random variable. There are very good attempts to explain how it works, e.g. this answer, but the idea is to make $Y$ "coarser" in the sense that it only varies over certain subsets of the underlying set $\Omega$, or, more specifically, over a coarser $\sigma$-algebra (this picture from Wikipedia is what I'm getting at. Note how the conditional expectations of $X$ only vary over certain subsets of the original range).

Now, the random variable $\mathbb{E} \left[Y \, \middle| \, X \right]$ should be consistent with the original random variable $Y$. This is in the sense that if $\mathbb{E} \left[Y \, \middle| \, X \right]$ remains constant in $U \subset \Omega$, it's value should be the "average" value of $Y$ in this set (see, again the picture). Now taking the expectation, i.e. integrating over $\Omega$, the average value of the random variable over the sets $U$ is what matters. Since these are the same for $\mathbb{E} \left[Y \, \middle| \, X \right]$ and $Y$, we have your identity $\mathbb{E}[\mathbb{E} \left[Y \, \middle| \, X \right]] = \mathbb{E}[Y].$

This is all very informal, but the rigorous explanation is a lot easier to find than answer like this. I hope this helps.

  • 1
    Oh, the picture is from here: https://en.wikipedia.org/wiki/Conditional_expectation#Formal_definition You also have a bit of explanation what is going on with it. – Matias Heikkilä Apr 18 '16 at 16:05
  • 1
    I would write $\mathbb{E}{Y | X}[Y]$ and $\mathbb{E}_X[\mathbb{E}{Y | X}[Y]]$ for disambiguation (if it is what you meant), we can even write $\mathbb{E}X[\mathbb{E}{Y | X}[Y]] = \mathbb{E}{(X, Y)}[Y] = \mathbb{E}{Y}[Y]$ – reuns Apr 18 '16 at 16:37