
This question follows my previous one here, which is about the optimal classifier $g^*$ in the case where $X$ follows a normal distribution.


Let $X, Y$ be random variables such that

  • $X$ follows a normal distribution.

  • $Y$ takes values in $\{-1,1\}$.

In measure-theoretic probability theory, $\mathbb P [ Y = 1 | X = x] := \mathbb E[\mathbf{1}_{\{Y=1\}} | \mathbf{1}_{\{X=x\}}]$ and $\mathbb P [ X = x] := \mathbb E[\mathbf{1}_{\{X=x\}}]$. Here $\mathbf{1}_{\{Y=1\}}$ and $\mathbf{1}_{\{X=x\}}$ are both integrable random variables. Then $\mathbb P [ Y = 1 | X = x]$ is well-defined even if $\mathbb P [ X = x] = 0$.


I would like to ask: is it possible that $\mathbb P [ Y = 1 \mid X = x] > 0$ while $\mathbb P [ X = x] = 0$?

Thank you so much for your clarification!

Akira
  • Isn't $\mathbb{P}[X=x]=0$ for every $x$ if $X$ is normally distributed? – uniquesolution Sep 10 '20 at 21:30
  • @uniquesolution I think $\mathbb{P}[X=x]=0$. That's why I chose an $X$ that follows a normal distribution. – Akira Sep 10 '20 at 21:33
  • If $X$ and $Y$ are independent, then $P(Y=1|X=x)=P(Y=1)$. So if $Y$ is supported on $\{-1,1\}$ then $P(Y=1|X=x)>0$. –  Sep 10 '20 at 21:38
  • Your definition of $P(Y = 1 | X = x)$ is incorrect. Conditioning on the sigma algebra generated by the measure zero event $\{X = x\}$ is the same as taking expectation. Your best guess for $Y$ given that a measure zero event happened is just the expectation. – Elle Najt Sep 10 '20 at 22:29
  • @LorenzoNajt If $\mathbb P [ Y = 1 | X = x] := \mathbb E[\mathbf{1}_{\{Y=1\}} \mid \mathbf{1}_{\{X=x\}}]$ is not correct, please elaborate on the correct definition of $\mathbb P [ Y = 1 | X = x]$. – Akira Sep 11 '20 at 05:26
  • @LAD I did that pretty extensively in your other question... if there's a specific question about that I'm happy to try to answer. Do you know how to condition on a sigma algebra? – Elle Najt Sep 11 '20 at 05:35
  • @LorenzoNajt I'm sorry, but I seem to be overloaded :( – Akira Sep 11 '20 at 05:36
  • @LAD No problem. Maybe this will help: https://en.wikipedia.org/wiki/Regular_conditional_probability – Elle Najt Sep 11 '20 at 05:38
  • Thank you so much for your dedicated help @LorenzoNajt. – Akira Sep 11 '20 at 05:38
  • @LAD No problem. That Wikipedia page seems like it might be more confusing than helpful. I think it would be easiest to follow bullet 1 in my answer on the other page, and come back to this question when you've learned how to condition on a sigma algebra. – Elle Najt Sep 11 '20 at 06:38
  • The correct interpretation of "$\mathsf{P}(Y=1\mid X=x)$" is a (Borel) function $f(x)$ s.t. $$ f(X(\omega))=\mathsf{P}(Y=1\mid X)(\omega). $$ –  Sep 11 '20 at 09:48
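
To make the interpretation in the last comment concrete, here is a small worked instance (my own illustration, not from the thread). Suppose $Y$ is determined by $X$ via $Y = 1$ if $X > 0$ and $Y = -1$ otherwise. Then $f(x) = \mathbf{1}_{(0,\infty)}(x)$ is a version of $\mathbb P[Y = 1 \mid X = x]$: indeed $f(X) = \mathbf{1}_{\{X > 0\}} = \mathbf{1}_{\{Y = 1\}}$ is $\sigma(X)$-measurable and trivially satisfies the defining property of conditional expectation. So $\mathbb P[Y = 1 \mid X = x] = 1 > 0$ for every $x > 0$, even though $\mathbb P[X = x] = 0$.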

2 Answers


Here's an example to help you understand what's going on.

Suppose that $10$ people enter an elevator with a capacity of $2000$ lbs. Let $X$ denote the combined weight of all $10$ people in the elevator. (I chose this example because weight is a random variable that is classically modeled as normally distributed.) Now define an indicator random variable $Y$ such that $Y=1 \iff$ the capacity is exceeded and $Y=0$ otherwise. Then $$P(Y=1|X=2200)=1.$$
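
One way to see this numerically is to condition on a narrow window around $x$ rather than on the null event $\{X = x\}$ itself. Below is a minimal Monte Carlo sketch of the elevator example; the per-person weight distribution $N(200, 40^2)$ is a made-up assumption, chosen only so that totals near $2200$ lbs are sampled often enough:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model: each of the 10 riders weighs N(200, 40^2) lbs,
# so the combined weight X is approximately N(2000, 126.5^2).
n_sims, n_people, capacity = 1_000_000, 10, 2000.0
weights = rng.normal(200.0, 40.0, size=(n_sims, n_people))
X = weights.sum(axis=1)
Y = (X > capacity).astype(int)  # Y = 1 iff the capacity is exceeded

# P(X = 2200) = 0, so estimate P(Y = 1 | X ~ 2200) on a narrow window.
window = np.abs(X - 2200.0) < 10.0
print(Y[window].mean())  # prints 1.0: every total near 2200 exceeds 2000
```

The window average is exactly $1$ here because $Y$ is a deterministic function of $X$: every combined weight near $2200$ lbs exceeds the $2000$ lbs capacity.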

  • This result is impossible and very counter-intuitive in classical probability in which we define $P(A \mid B)=\frac{P(A \cap B)}{P(B)}$. – Akira Sep 10 '20 at 21:49
  • @LAD, I think the more fundamental relationship is $P(A \cap B) = P(A|B) \times P(B)$, i.e., when you divide by $P(B)$ you've already made the assumption that $P(B) \neq 0$. However, $P(A \cap B) = P(A|B) \times P(B)$ would still hold true in Matthew Holder's case, wouldn't it? – Kartik Sep 10 '20 at 21:57
  • That's the formula you use if you assume $P(B)\neq 0$. That formula simply doesn't work if $P(B)=0$.

    If you have a discrete random variable $Y$ and a continuous random variable $X$ and you seek to compute $P(Y=1|X=x)$, you need some information on the joint density of $(X,Y)$, which is usually denoted by $f_{XY}$. Then $$P(Y=1|X=x)=f_{Y|X=x}(1|x)=\frac{f_{XY}(x,1)}{\sum_{y}f_{XY}(x,y)}$$ (a numerical sketch of this formula appears just after these comments).

    –  Sep 10 '20 at 22:00
  • It's kind of like this. We all know and love the fact that $\int x^ndx=\frac{x^{n+1}}{n+1}+C$. But this formula doesn't work if $n=-1$ so you need to look elsewhere to get $\int x^{-1}dx$. This doesn't mean $\int x^{-1}dx$ can't be computed; it simply means you need to use some other tools in your toolbox. –  Sep 10 '20 at 22:03
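
As a hedged numerical sketch of the $f_{XY}$ formula in the comment above: the mixture model $P(Y=1) = 0.3$, $X \mid Y = 1 \sim N(1,1)$, $X \mid Y = 0 \sim N(0,1)$ is my own made-up assumption, chosen only to make the computation concrete.

```python
from scipy.stats import norm

# Made-up mixture model: P(Y=1) = 0.3, X|Y=1 ~ N(1,1), X|Y=0 ~ N(0,1).
p = 0.3

def p_y1_given_x(x):
    """f_XY(x, 1) / sum_y f_XY(x, y): the conditional probability as a
    function of x, well-defined even though P(X = x) = 0 for every x."""
    joint1 = p * norm.pdf(x, loc=1.0)        # f_XY(x, 1)
    joint0 = (1 - p) * norm.pdf(x, loc=0.0)  # f_XY(x, 0)
    return joint1 / (joint1 + joint0)

print(p_y1_given_x(0.5))  # strictly positive, yet P(X = 0.5) = 0
```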

Yes. Let $X,Y$ be independent. Using the fact that conditioning on a sigma-field independent of the random variable leaves its expectation unchanged, applied here to the sigma-field generated by $X$, we get: $$E[\mathbf{1}_{\{Y=1\}}\mid\sigma(X)] = E[\mathbf{1}_{\{Y=1\}}] = P(Y=1)$$
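
A quick simulation sketch of this (the value $P(Y = 1) = 0.7$ and the window width are my own illustrative choices): when $X$ and $Y$ are independent, a window estimate of $P(Y = 1 \mid X \approx x)$ does not depend on $x$ and matches $P(Y = 1)$, even though $P(X = x) = 0$ at every point.

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ N(0,1) independent of Y, with P(Y = 1) = 0.7 (illustrative numbers).
n = 1_000_000
X = rng.normal(size=n)
Y = rng.choice([-1, 1], size=n, p=[0.3, 0.7])

# Estimate P(Y = 1 | X ~ x) by averaging on narrow windows around x.
for x in (-1.0, 0.0, 2.0):
    window = np.abs(X - x) < 0.05
    print(x, (Y[window] == 1).mean())  # each ~ 0.7 = P(Y = 1)
```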

Winger 14
  • I'm sorry for this duplicate comment. This result is impossible and very counter-intuitive in classical probability in which we define $P(A \mid B)=\frac{P(A \cap B)}{P(B)}$. – Akira Sep 10 '20 at 21:53
  • I am not sure I understand what is counter-intuitive... Your formula would yield $\frac{0}{0}$, which is undefined. Using sigma-fields helps us give meaning to that expression. In general, we need the joint density to compute $P(Y|X)$ anyway! – Winger 14 Sep 10 '20 at 21:59
  • And in this specific example, we get $P(A \cap B) = P(A) P(B)$ by independence and can then simplify the formula – Winger 14 Sep 10 '20 at 22:01