
I understand the probability of the dart hitting the center point is 0, and I think this is used as an example of how a probability of 0 doesn't mean something is impossible.

Now imagine we define 3 arbitrary points on the dart board: say, the center point C, and two other points, one between the center and the top edge and one between the center and the bottom edge, called T and B respectively. Also suppose the player throws the dart in such a way that it's equally likely to hit any point on the board. What's the probability of the dart having landed on C, given that we know it landed on one of the 3 points?

Intuitively the answer seems to be 1/3. But working out the math gives an undefined result if I try to compute it as P(C|T∪C∪B) = P(C)/P(T∪C∪B) = 0/0. Is there a way to solve this properly? Or is the only way to just assign equal probabilities to all points and ignore the fact that the probability of hitting any one point is zero?


EDIT: I think I made some progress thanks to your comments, but I hope someone smarter than myself can comment on my attempted solution. I think the zero probability comes from the division 1/∞, which as far as I know is not well defined on its own. A more careful expression would be

$\lim_{x \to \infty} \frac{1}{x}$

Intuitively $x$ is the number of points on the board, so the probability of the dart landing on any particular point is $1/x$ and the probability of landing on one of the 3 points is $3/x$, so the conditional probability becomes:

$\lim_{x \to \infty} \frac{1/x}{3/x}$

which, if I'm not mistaken, is just 1/3.
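The same answer also falls out of the $\varepsilon$-neighborhood idea from the comments: replace each of the three points by a small disk of radius $\varepsilon$ around it (small enough that the disks are disjoint), condition on hitting one of those disks, and let $\varepsilon \to 0$. Under the uniform distribution on a board of radius $r$,

$P\big(B_\varepsilon(C) \mid B_\varepsilon(C)\cup B_\varepsilon(T)\cup B_\varepsilon(B)\big) = \frac{\pi\varepsilon^2/\pi r^2}{3\pi\varepsilon^2/\pi r^2} = \frac{1}{3}$

for every $\varepsilon > 0$, so the limit as $\varepsilon \to 0$ is also 1/3, with no division by zero along the way.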


EDIT 2: I get the impression that most comments and answers here suggest that it's impossible to calculate these probabilities unless we come up with some ad hoc definitions. I just want to clarify that the dart board is a metaphor for a random point in a circle which, in my example above, is uniformly distributed.

Since that example is so trivial that it provides little motivation to actually solve it, here is a slightly less trivial one, based on @Vincent's comment.

Imagine a random number generator R that produces a real number between -1 and +1 according to a probability density function D. Also imagine that I wrap R in another function F that returns the absolute value of the number produced by R, i.e. F = ABS(R()).

So, let's say we run F and it outputs $n$. What's the probability that the number generated by R was actually $n$ (as opposed to $-n$), given that we know the density function D?

If I'm not mistaken, the probability is just $\frac{D(n)}{D(-n)+D(n)}$, which I can't prove but intuitively seems right.

Applying the same logic to the original dart problem would again give 1/3 without having to deal with divisions by zero (at least not explicitly) and without the need for any ad hoc definition.
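Since the event F = n itself has probability zero, it can't be tested directly, but the formula can be sanity-checked numerically by conditioning on F landing in a small window around n instead. Below is a minimal Monte Carlo sketch; the particular density D (and the values of n and the window width) are just hypothetical choices for illustration, not part of the problem.

```python
# Hypothetical example density on [-1, 1]: D(x) = (x + 2) / 4 (integrates to 1).
import numpy as np

rng = np.random.default_rng(0)

def D(x):
    return (x + 2) / 4.0

def sample_R(size):
    """Draw `size` samples from density D by rejection sampling."""
    out = np.empty(0)
    while out.size < size:
        x = rng.uniform(-1, 1, size)       # proposal: uniform on [-1, 1]
        u = rng.uniform(0, D(1.0), size)   # D(1) = 3/4 is the maximum of D
        out = np.concatenate([out, x[u < D(x)]])
    return out[:size]

n, eps = 0.5, 0.01                         # hypothetical output of F, window half-width
R = sample_R(2_000_000)
F = np.abs(R)                              # F = ABS(R())

window = np.abs(F - n) < eps               # condition on F being close to n
estimate = np.mean(R[window] > 0)          # fraction where R was +n rather than -n

print("simulated :", estimate)
print("formula   :", D(n) / (D(-n) + D(n)))
```

For this D both numbers come out near 0.625, matching $\frac{D(n)}{D(-n)+D(n)}$ for $n=0.5$; shrinking the window (and increasing the sample size) moves the estimate closer to the formula.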

Juan
  • Conditioning on events of probability $0$ is, at best, very problematic. See this related question – lulu Aug 12 '23 at 13:21
  • "the probability of the dart hitting the center point is 0" because no matter how close you hit the dart it could still be closer to the selected point ? – farnood gholampoor Aug 12 '23 at 13:27
  • @farnoodgholampoor I don't think that's why, more like because there are infinitely many points on the board, so the probability of hitting any one point is 1/∞=0. But I guess if you put it that way it could be a step towards a solution if you use 1/∞ instead of 0 and cancel out all the infinities, which would give 1/3. Not sure if you can toss around infinity like that though. – Juan Aug 12 '23 at 13:37
  • You might be better off reasoning with very small (and of course disjoint) $\varepsilon$-neighborhoods of $C,T,B$. You will see that the answer is $1/3$ for any $\varepsilon >0$. – Snoop Aug 12 '23 at 13:52
  • Often when people use dartboard (or other geometric) examples in probability, there is an implicit assumption that every point is equally likely to be hit. This also underlies more well-defined computations where the probability of hitting some region is proportional to the area of that region. To answer the current question you should make the implicit assumption (and the reasoning behind it) explicit, so you can see if this symmetry is preserved under taking conditional probability. – Vincent Aug 12 '23 at 20:16
  • An easier example is throwing a die. We first look at how symmetric it is and conclude that all six probabilities should be equal, and only after that do we compute, from that fact and the fact that they must sum to 1, that each of these probabilities must be 1/6. Now if someone claims that you threw an odd number of pips in one throw, the symmetry is still there and we can easily see that the conditional probability of having thrown 5 is then 1/3. – Vincent Aug 12 '23 at 20:19
  • The dartboard example is a bit unfortunate in this sense because it is not obvious at all that every point is equally likely. If someone is aiming for the center you would expect the probability of hitting a given point to go down with the distance from the center while keeping radial symmetry. But then of course people should specify exactly HOW this probability goes down before you can compute anything. – Vincent Aug 12 '23 at 20:20
  • @Vincent I was trying to say that it was uniformly distributed when I wrote the player is equally likely to hit every point on the board. I made a new edit that is less "metaphoric" and also uses an actual probability distribution function. – Juan Aug 13 '23 at 12:06
  • I agree with your intuition, but there's sort of a philosophical issue with even the example with an absolute value of a random number. Probability is sometimes defined as the limiting ratio of events if essentially the same conditions are repeated many times. But the probability the generator will give exactly $n$ or $-n$ ever again is zero, meaning it's not practical to measure or confirm the claimed conditional probability. – aschepler Aug 13 '23 at 12:19
  • On the other hand, there are plenty of practical and useful probability calculations that don't really fit well with that repeated conditions definition. Now this is getting more into philosophy of mathematics... – aschepler Aug 13 '23 at 12:20
  • @aschepler Maybe we could still turn my second example into repeated cases, for instance if we want to define a function that guesses the original random number produced by R given the output of F and the density function D. – Juan Aug 13 '23 at 14:00
  • And then we want the function to guess the right value most of the time; in fact, I don't think it'd be too difficult to test this in an actual program. – Juan Aug 13 '23 at 15:48
  • Yes, @Karl, in particular the E. T. Jaynes quote "the term 'great circle' is ambiguous until we specify what limiting operation is to produce it." The answer $1/3$ is intuitive because one implicitly assumes a limiting process that involves three congruent disks at each step. – Jim Ferry Aug 27 '23 at 18:08
  • @Karl Wow that was mind blowing actually, had no idea about that paradox. I may be completely wrong here but I think the paradox arises because there are two steps at which you implicitly divide by zero, going from area to line and then to point. I don't think you would get a paradox if you go from line to point, or even from area straight to point as in my example. – Juan Aug 29 '23 at 12:47

2 Answers


There is no general definition for conditioning on a particular event of probability zero. You can make your own definition if you like, such as in the comment of Snoop, although its usefulness is limited. Conditional probabilities are useful because they can be summed or integrated such as $$ P[A]=\sum_{i=1}^{\infty} P[A|B_i]P[B_i] \quad, \quad E[X]=E[E[X|Y]]$$ and single events of probability zero do not affect the sum or integral.

Suppose $Y$ is a continuous random variable with a well defined PDF (in particular, $P[Y=y]=0$ for all $y \in \mathbb{R}$). Suppose $A$ is an event. Then there are many versions of $E[1_A|Y]$. All versions have the form $h(Y)$ for some measurable function $h:\mathbb{R}\rightarrow\mathbb{R}$, and any two versions $h(Y)$ and $\tilde{h}(Y)$ satisfy $P[h(Y)=\tilde{h}(Y)]=1$. You can take any particular version $h(Y)$ and "define" $$P[A|Y=y]=h(y) \quad \forall y \in\mathbb{R} \quad (Eq. *)$$ The understanding is that this definition is only meaningful "in the aggregate." It is useful for the "vast majority" of $y\in \mathbb{R}$, but it need not make any sense for a particular finite or countably infinite set of $y$ values in $\mathbb{R}$. You can change the value of $h(0.4)$ to anything you like and it will not change $\int_{-\infty}^{\infty} h(y)f_Y(y)dy$.


You can see what happens when you cast your problem into $E[X|Y]$ notation. Let $(U,V)$ denote the random location of the dart, assumed to be uniform over a ball of radius 1. Fix points $(a_1, b_1), (a_2,b_2), (a_3,b_3)$ in the ball and define random variables $X$ and $Y$ by indicator functions: \begin{align} X &= 1_{\{(U,V)=(a_1,b_1)\}}\\ Y&=1_{\{(U,V)\in \{(a_1,b_1), (a_2, b_2), (a_3,b_3)\}\}} \end{align} Then $E[X|Y]$ has infinitely many versions. Since $P[Y=1]=0$, a random variable $h(Y)$ is a version of the conditional expectation of $X$ given $Y$ if and only if the function $h:\{0,1\}\rightarrow\mathbb{R}$ satisfies $h(0)=0$. That means $h(1)$ is allowed to be any number you like. All such functions $h$ satisfy $P[h(Y)=0]=1$.

So if we define $$P[(U,V)=(a_1,b_1)| (U,V)\in\{(a_1, b_1), (a_2,b_2),(a_3,b_3)\}]=h(1)$$ we see that this value $h(1)$ can take any real number (even negative numbers, or numbers larger than 1). It does not affect anything since $P[(U,V)\in\{(a_1, b_1), (a_2,b_2),(a_3,b_3)\}]=0$.


Weird example: If we assume the above random vector $(U,V)$ can take any value in the set $B=\{(u,v):u^2+v^2\leq 1\}$, and is uniformly distributed over $B$, we can define a random vector $(R,S)$ by $$(R,S) = \left\{\begin{array}{cc} (U,V) & \mbox{if $(U,V) \notin \{(a_2, b_2), (a_3,b_3)\}$}\\ (a_1,b_1) &\mbox{if $(U,V) \in \{(a_2,b_2), (a_3, b_3)\}$} \end{array}\right.$$ Then $P[(U,V)=(R,S)]=1$, and so $(R,S)$ is also uniformly distributed over $B$. However, if we are told that $(R,S) \in \{(a_1, b_1), (a_2, b_2), (a_3,b_3)\}$ then we know for sure that $(R,S)=(a_1,b_1)$.


Towards your new example, suppose for simplicity that $R \sim Unif[-1,1]$ and define $F=|R|$. Since you are now conditioning on a continuous random variable $F$, there is more justification in saying that $P[R\geq 0|F=f]=1/2$ "for almost all $f \in [0,1]$" because we can use the above equation (*) in the aggregate.

Here is how: Define $A=\{R\geq 0\}$. Then $1_A$ is a 0/1 valued random variable that is 1 if and only if $R\geq 0$. Then $E[1_A|F]$ exists and has infinitely many versions, each version has the form $h(F)$ for some function $h:\mathbb{R}\rightarrow\mathbb{R}$. The most basic version is: $$h(f) = \left\{\begin{array}{cc} 1/2 & \mbox{if $f \in [0,1]$} \\ 0 & \mbox{else} \end{array}\right.$$ Then, as in (*), we can interpret $$ P[R\geq 0|F=f] = h(f) \quad \forall f \in [0,1]$$ and so, using this particular $h$ function, we can define $$ P[R\geq 0|F=f]=1/2 \quad \forall f \in [0,1]$$ However, we can define $\tilde{h}:\mathbb{R}\rightarrow\mathbb{R}$ by changing the value $h(0.3)$ to any value we like: $$ \tilde{h}(f) = \left\{\begin{array}{cc} 1/2 & \mbox{if $f \in [0,1]$ and $f\neq 0.3$} \\ 0.9 & \mbox{if $f=0.3$} \\ 0 & \mbox{else} \end{array}\right.$$ and $\tilde{h}(F)$ is also a valid version of $E[1_A|F]$. You may actually prefer to change $h(0)$ to the value 1, but the point is it does not really matter if you change it at particular points. It turns out that any other valid version must correspond to a function, call it $h_{other}(f)$, that satisfies $h_{other}(f)=1/2$ for almost all $f \in [0,1]$. So "in the aggregate" it makes sense to say the answer is really $1/2$.
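For concreteness, in the $R \sim Unif[-1,1]$ case the aggregate formula can be evaluated explicitly. Given $R\geq 0$, $F=R$ is uniform on $[0,1]$, and unconditionally $F=|R|$ is also uniform on $[0,1]$, so for $f \in (0,1]$ $$P[R\geq 0 \mid F=f] = \frac{f_{F|R\geq 0}(f)\,P[R\geq 0]}{f_F(f)} = \frac{1\cdot\frac{1}{2}}{1}=\frac{1}{2},$$ which agrees with the version $h(f)=1/2$ above.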

Michael
  • I really appreciate your elaborate answer, but I'm unable to parse most of it simply because I have no idea what E[X] means in this context. But it seems that you're implying that there's no way to arrive at the 1/3 probability unless we come up with some ad hoc definitions? Can you please take a look at my second edit, which has another version of the same problem? – Juan Aug 13 '23 at 12:03
  • $E[X]$ denotes the expectation of $X$. If $X$ takes values $\{0, 0.2, 8\}$ with probabilities $1/2, 1/4, 1/4$, then $E[X]=(1/2)(0) + (1/4)(0.2) + (1/4)(8)=2.05$. If $R \sim Uniform[-1,1]$ then $E[R]=\frac{1}{2}\int_{-1}^{1}r\,dr = 0$. For any random variables $X,Y$ (where $E[X^2]$ is finite) there is a well defined "version" of conditional expectation $E[X|Y]$ and it is this concept of conditional expectation where modern probability resolves its "conditioning on a prob-0 event" issues. – Michael Aug 14 '23 at 00:06
  • I have added to my answer to treat your new example with $R$ and $F$. Since you are now conditioning on a continuous random variable with a valid PDF it makes sense to use the "aggregate formula" $$P[A|F=f]=\frac{f_{F|A}(f)P[A]}{f_F(f)}$$ which in your case, if $A=\{R\geq 0\}$, reduces to $\frac{f_R(f)}{f_R(-f)+f_R(f)}$ for "relevant" $f \in [0,1]$ (the denominator is nonzero on relevant $f$). – Michael Aug 14 '23 at 01:02

Since the tag was included as one of the tags under this question, I would point out the following. Let's assume for simplicity that we are dealing with a 1-dimensional situation, such as the interval $[0,1]$ (instead of a 2-dimensional situation, where a similar analysis can be performed).

It turns out that the probability of hitting a point does not have to be zero. Bernstein and Wattenberg in their article

Bernstein, Allen R.; Wattenberg, Frank. Nonstandard measure theory. Applications of Model Theory to Algebra, Analysis, and Probability (Internat. Sympos., Pasadena, Calif., 1967), pp. 171–185, Holt, Rinehart and Winston, New York-Montreal, Que.-London, 1969

developed the following approach. One includes the real points of the interval $[0,1]$ in a hyperfinite set $S$ (using a suitable embedding $\mathbb R \to {}^\ast\mathbb R$) of internal cardinality $H$, where $H$ is a nonstandard integer. Then one assigns a probability of $\frac1H$ to each point in $S$. In particular, each real number is assigned a nonzero probability. Then the calculation you mentioned with Bayes' formula goes through (since the denominator is nonzero), giving the expected answer $\frac13$.
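In this hyperfinite model the conditional probability from the question is then computed exactly as in the finite case: $$P\big(C \mid \{C,T,B\}\big) = \frac{P(C)}{P(\{C,T,B\})} = \frac{1/H}{3/H} = \frac13 .$$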

Mikhail Katz