28

I have two red points, $r_1$ and $r_2$, and two blue points, $b_1$ and $b_2$. They are all placed randomly and uniformly in $[0,1]^2$.

Each dot points to the closest dot from another colour; closest is defined wrt the Euclidean distance. We use $x \to y$ to indicate dot $x$ points to dot $y$.

If $r_1 \to b_1$, what is the probability that $r_2 \to b_1$ too?

NOTE it must be larger than 1/2 because $r_1 \to b_1$ tells us in a way that $b_1$ is likely to have a centric location, and thus is likely than it is closer to $r_2$ too than $b_2$.

joriki
  • 242,601
fox
  • 521
  • 6
    Quick simulation suggests somewhere between $0.568$ and $0.57$. – Arthur Jul 19 '17 at 13:29
  • @Arthur: Of course a simulation is the way to go (obtaining an exact answer seems unlikely), but intuitively, I would have guessed that the closest point to $r_2$ is least likely to be $r_1$ and most likely to be $b_2$, which, if correct, would imply that $r_2 \to b_1$ has probability less than $1/2$. – quasi Jul 19 '17 at 13:53
  • Oh, I misread the problem. I didn't see that the arrows must go from a point of one color to a point of the other one. I think my intuition was right if the arrows are distance conscious, but color indifferent. – quasi Jul 19 '17 at 14:01
  • Ok, I just wrote a simulation, and my results (with $10^4$ trials) yield a frequency of $.5681$ for $r_2 \to b_1$ confirming Arthur's simulation results. I think the intuition is that when $r_1$ is the actual closest point to $r_2$, then most likely, we also get $r_2 \to b_1$, so that explains the bias. – quasi Jul 19 '17 at 14:18
  • 1
    For those interested in a visual demonstration: https://www.desmos.com/calculator/tqzegmzjii. Written loosely, the probability is $$p=\int^{(1,1)}{(0,0)}\int^{(1,1)}{(0,0)}\int^{(1,1)}_{(0,0)}\begin{cases}\text{Green area inside unit square}&\text{$b_2$ outside circle}\0&\text{$b_2$ inside circle}&\end{cases},,,dr_1,db_1,db_2$$ ($b_2$ is the blue dot). The trouble is finding a formula for the green area because of edge cases. Maybe this can be avoided if the integral was split up into cases. – Shuri2060 Jul 19 '17 at 14:42
  • @Shuri2060: your simulation is correct only if the yellow/orange point is closer to the red point than to the blue point, isn't it? – Jack D'Aurizio Jul 19 '17 at 14:48
  • @JackD'Aurizio Yes - so the blue point has to lie outside the circle. – Shuri2060 Jul 19 '17 at 14:49
  • $5/9$ for the $1D$ analogue of the problem. – Dhruv Kohli Jul 19 '17 at 14:50
  • @expiTTp1z0 : I think it is actually $7/12$ for the analogue in one dimension (confirmed both by simulation and computation). – tristan Jul 19 '17 at 15:30
  • 7
    If the (normal through the segment connecting the) blue points divide the square into areas $p, 1-p$, then the event happens with (conditional) probability $p^2+(1-p)^2$. So $P=\int_0^{1/2} (p^2+(1-p)^2) f(p), dp$ where $f$ is the density of such a division happening. This can certainly be worked out explicitly, but it's messy since various cases will have to be distinguished, depending on where the border between the two areas hits the boundary of the square. –  Jul 19 '17 at 20:01
  • How did you get the 7/12, @tristan? – fox Jul 19 '17 at 21:39
  • @JosuéOrtega Consider the Voronoï diagram generated by $b_1$ and $b_2$ inside the square. The probability we're looking for is $2\mathbb{E}(A^2)$, where $A$ is the area of the Voronoï cell corresponding to $b_1$. In one dimension, $A$ is easy to compute : it is (assuming $b_1 < b_2$) $A=(b_2+b_1)/2$. So by symmetry our probability is $4\int_0^1\int_y^1 ((x+y)/2)^2 \mathrm{d}x\mathrm{d}y$. It is now straightforward : https://www.wolframalpha.com/input/?i=4*int(int(((x%2By)%2F2)%5E2,x,y,1),y,0,1) – tristan Jul 19 '17 at 22:05
  • In theory, this method could apply in two dimensions, but if there is a formula for the area of the Voronoï cell, I guess it would be quite messy (with many cases), so it may be too hard to write, let alone integrate. Maybe it is doable with the right software. – tristan Jul 19 '17 at 22:09
  • 1
    Can someone confirm that the requested probability is the same as $2*P(r_1 \to b_1$ and $ r_2 \to b_1)$? – Varun Vejalla Apr 22 '19 at 23:46
  • @automaticallyGenerated Yes, since $P(r_1 \to b_1) = 1/2$ by symmetry. – ronno Jan 13 '20 at 18:00

2 Answers2

3

Starting off in the same manner as Yanior Weg, assume $b_1$ and $b_2$ are fixed. Then $P(r_2 \to b_1|r_1 \to b_1) = \frac{P(r_2 \to b_1 \bigcap r_1 \to b_1)}{P(r_1 \to b_1)} = \frac{P(r_2 \to b_1 \bigcap r_1 \to b_1)}{\frac{1}{2}} = 2 \cdot P(r_2 \to b_1 \bigcap r_1 \to b_1) = 2(P(r_1 \to b_1))^2$

In this Desmos plot, the depiction of this can be seen (I'll be referring to this later, so it may be useful to have it open). $P(r_1 \to b_1)$ is the area of the shaded region inside the square. By limiting $x_2, y_2$ such that $x_1 < x_2 < 1$, $y_1 < y_2 < 1$, later calculations can be simplified. The final answer is then $$\int_0^1\int_0^1\int_{x_1}^1\int_{y_1}^14\cdot2A^2 dy_2dx_2dy_1dx_1 = 8\int_0^1\int_0^1\int_{x_1}^1\int_{y_1}^1A^2 dy_2dx_2dy_1dx_1$$, where $A$ is the area of the shaded region inside the square. $2A^2$ is multiplied by $4$ to account for $x_2<x_1$ and $y_2<y_1$.

The line separating the shaded region has equation $$y = f\left(x\right)=-\frac{x_{1}-x_{2}}{y_{1}-y_{2}}x+\frac{\left(x_{1}-x_{2}\right)\left(x_{1}+x_{2}\right)}{2\left(y_{1}-y_{2}\right)}+\frac{y_{1}+y_{2}}{2}$$

This intersects $y = 1$ at $$x = I_1 = \frac{\left(y_{1}-y_{2}\right)\left(y_{1}+y_{2}-2\right)}{2\left(x_{1}-x_{2}\right)}+\frac{x_{1}+x_{2}}{2}$$

and $y = 0$ at $$x = I_2 = \frac{\left(y_{1}-y_{2}\right)\left(y_{2}+y_{1}\right)}{2\left(x_{1}-x_{2}\right)}+\frac{x_{1}+x_{2}}{2}$$

There are now four cases: $(1)\ I_1 < 0, I_2 < 1, \ (2)\ I_1 < 0, I_2 > 1, \ (3)\ I_1 > 0, I_2 < 1$, and $(4)\ I_1 > 0, I_2 > 1$.

Letting $$F(x) = \int_0^x f(t)dt = -\frac{x_{1}-x_{2}}{2\left(y_{1}-y_{2}\right)}x^{2}+\left(\frac{\left(x_{1}-x_{2}\right)\left(x_{1}+x_{2}\right)}{2\left(y_{1}-y_{2}\right)}+\frac{y_{1}+y_{2}}{2}\right)x$$ here $A_i$ represents the area of case $i$:

$A_1 = F(I_2)$

$A_2 = F(1)$

$A_3 = I_1 - F(I_1) + F(I_2)$

$A_4 = I_1 - F(I_1) + F(1)$

From here, $$J = \underbrace{\int_{(1)}A_1^2dy_2dx_2dy_1dx_1}_{J_1}+\underbrace{\int_{(2)}A_2^2dy_2dx_2dy_1dx_1}_{J_2}+\underbrace{\int_{(3)}A_3^2dy_2dx_2dy_1dx_1}_{J_3}+\underbrace{\int_{(4)}A_4^2dy_2dx_2dy_1dx_1}_{J_4}$$, where $(1)$ is the region in $x_1, y_1, x_2, y_2$ such that case 1 happens (restricted to $0 < x1 < x2 < 1$ and $0 < y1 < y2 < 1$), etc.

To simplify, making the substitution $x_s = x_2 + x_1, x_d = x_2 - x_1$ and $y_s = y_2 + y_1, y_d = y_2 - y_1$ helps a lot. The integral would then need to be multiplied by the Jacobian of $\frac{1}{4}$.

For case $1$, the integral can be written out as $$J_1 = \frac{1}{4}\int_0^1 \int_{x_d}^{2-x_d} \left(\int_{1-\sqrt{1-x_{d}x_{s}}}^{x_{d}}\int_{0}^{-\frac{x_{d}x_{s}}{y_{d}}+2}A_1^2dy_{s}dy_{d}+\int_{x_{d}}^{\sqrt{2x_{d}-x_{d}x_{s}}}\int_{0}^{\frac{2x_{d}-x_{d}x_{s}}{y_{d}}}A_1^{2}dy_{s}dy_{d}-\int_{1-\sqrt{1-x_{d}x_{s}}}^{\sqrt{2x_{d}-x_{d}x_{s}}}\int_{0}^{y_{d}}A_1^{2}dy_{s}dy_{d}\right)dx_s dx_d = \frac{1}{32}-\frac{1}{24}\ln(2)$$

For case $2$: $$J_2 = \frac{1}{4}\int_0^1 \int_{x_d}^{2-x_d} \left(\int_{x_{d}}^{\sqrt{x_{s}x_{d}}}\int_{0}^{2-\frac{x_{s}x_{d}}{y_{d}}}A_2^{2}dy_{s}dy_{d}+\int_{\sqrt{x_{s}x_{d}}}^{1}\int_{0}^{2-y_{d}}A_2^{2}dy_{s}dy_{d}-\int_{x_{d}}^{\sqrt{2x_{d}-x_{d}x_{s}}}\int_{0}^{\frac{2x_{d}-x_{d}x_{s}}{y_{d}}}A_2^{2}dy_{s}dy_{d}-\int_{\sqrt{2x_{d}-x_{d}x_{s}}}^{1}\int_{0}^{y_{d}}A_2^{2}dy_{s}dy_{d}\right) dx_s dx_d = \frac{2501}{14400}-\frac{3}{80}\pi-\frac{2}{45}\ln\left(2\right)$$

For case $3$: $$J_3 = \frac{1}{4}\int_0^1 \int_{y_d}^{2-y_d} \left(\int_{y_{d}}^{\sqrt{y_{d}y_{s}}}\int_{0}^{2-\frac{y_{d}y_{s}}{x_{d}}}A_{3}^{2}dx_{s}dx_{d}+\int_{\sqrt{y_{d}y_{s}}}^{1}\int_{0}^{2-x_{d}}A_{3}^{2}dx_{s}dx_{d}-\int_{y_{d}}^{\sqrt{y_{d}\left(2-y_{s}\right)}}\int_{0}^{\frac{y_{d}\left(2-y_{s}\right)}{x_{d}}}A_{3}^{2}dx_{s}dx_{d}-\int_{\sqrt{y_{d}\left(2-y_{s}\right)}}^{1}\int_{0}^{x_{d}}A_{3}^{2}dx_{s}dx_{d}\right) dy_s dy_d = \frac{2501}{14400}-\frac{3}{80}\pi-\frac{2}{45}\ln\left(2\right)$$

For case $4$: $$J_4 = \frac{1}{4}\int_{0}^{1}\int_{1-\sqrt{1-x_{d}^{2}}}^{x_{d}}\int_{2+\frac{y_{d}^{2}-2y_{d}}{x_{d}}}^{2-x_{d}}\int_{\frac{2x_{d}-x_{d}x_{s}}{y_{d}}}^{2-y_{d}}A_{4}^{2}dy_{s}dx_{s}dy_{d}dx_{d}+\frac{1}{4}\int_{0}^{1}\int_{x_{d}}^{\sqrt{2x_{d}-x_{d}^{2}}}\int_{\frac{y_{d}^{2}}{x_{d}}}^{2-x_{d}}\int_{2-\frac{x_{d}x_{s}}{y_{d}}}^{2-y_{d}}A_{4}^{2}dy_{s}dx_{s}dy_{d}dx_{d} = -\frac{95}{288}+\frac{1}{12}\pi+\frac{1}{8}\ln(2)$$

Adding these up yields that $J = \frac{39}{800}+\frac{\pi}{120}-\frac{\ln(2)}{180}$. Multiplying by $8$ gives the final answer as $$\frac{39}{100}+\frac{\pi}{15}-\frac{2\ln(2)}{45} \approx 0.569$$, which is in the simulated range Arthur mentioned in the comments.

2

From what you have written, J assume that all random dots in the question are distributed independently.

Now, suppose $b_1 = (x_1, y_1)$ and $b_2 = (x_2, y_2)$ are fixed. Thus for a random dot $r_1$, uniformly distributed on $[0; 1]^2$, the probability, that it lies closer to $b_1$, than to $b_2$ is $\mu(\{(x, y) \in [0;1]^2| (x_1 + x_2 - 2x)(x_2 -x_1) + (y_1 + y_2 - 2y)(y_2 -y_1)>0\})$, where $\mu$ stands for Lebesgue measure. Now, as $r_1$ and $r_2$ are independent, then the probability, that both $r_1$ and $r_2$ lie closer to $b_1$, than to $b_2$ is $(\mu(\{(x, y) \in [0;1]^2| (x_1 + x_2 - 2x)(x_2 -x_1) + (y_1 + y_2 - 2y)(y_2 -y_1)>0\}))^2$.

Now as $b_1$ and $b_2$ are also independent and uniformly distributed, we can conclude, that in our initial problem $$P(r_2 \to b_1|r_1 \to b_1) = \frac{\int_0^1 \int_0^1 \int_0^1 \int_0^1 (\mu(\{(x, y) \in [0;1]^2| (x_1 + x_2 - 2x)(x_2 -x_1) + (y_1 + y_2 - 2y)(y_2 -y_1)>0\}))^2dx_1dy_1dx_2dy_2}{\int_0^1 \int_0^1 \int_0^1 \int_0^1 \mu(\{(x, y) \in [0;1]^2| (x_1 + x_2 - 2x)(x_2 -x_1) + (y_1 + y_2 - 2y)(y_2 -y_1)>0\}))dx_1dy_1dx_2dy_2}$$

Chain Markov
  • 16,012