
Let $P=\{p_1,\dots,p_n\}$ be $n$ points on the unit square with each of their coordinates drawn from a uniform distribution on $[0,1]$.

Then we add an extra point $q$ by the same distribution and we define $d_i=\|q-p_i\|$.

I would like to know the average and the minimum of the $d_i$'s. I assume that the $d_i$'s are independent, so they are identically distributed. Am I right?

For the average distance we have $E[\text{Mean}(d_1,\dots,d_n)] = E[d_1]$ by linearity of expectation, since the $d_i$'s are identically distributed, and the expectation can be found by the integral $$\int_0^1\int_0^1\int_0^1\int_0^1\sqrt{(x_1-x_2)^2+(x_3-x_4)^2}\,dx_1\,dx_2\,dx_3\,dx_4,$$ which does not have a nice closed form but whose value is approximately $0.5214$.
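A quick Monte Carlo estimate reproduces this value (a sketch; the sample size and seed are arbitrary choices):

```python
import math
import random

def mean_distance(samples=200_000, seed=1):
    """Monte Carlo estimate of E||p - q|| for p, q uniform on the unit square."""
    rng = random.Random(seed)
    total = sum(math.hypot(rng.random() - rng.random(),
                           rng.random() - rng.random())
                for _ in range(samples))
    return total / samples

print(mean_distance())  # close to 0.5214
```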

However, I am not sure how to figure out how $Z=E[\text{Min}(d_1,\dots,d_n)]$ depends on $n$.

I would expect that asymptotically it behaves like $c/n$. I say this because the probability that $Z<r$ can be bounded from above (unless I am wrong) by the total area of $n$ disks of radius $r$, i.e. by $n\pi r^2$.

When I generated random numbers to test this, the asymptotic behaviour seemed closer to $c/n^{1/2}$ than to $c/n$. Is there a straightforward way to get this exponent?
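For reference, here is the kind of simulation that suggests the $n^{-1/2}$ scaling (a sketch; the trial counts and seed are arbitrary). If $E[Z]\sim c\,n^{-1/2}$, the rescaled estimate $\sqrt{n}\cdot\hat E[Z]$ should be roughly constant across $n$:

```python
import math
import random

def mean_min_distance(n, trials=2000, seed=0):
    """Monte Carlo estimate of E[min_i ||q - p_i||] for n uniform points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        qx, qy = rng.random(), rng.random()
        # Track the minimum squared distance; take the root once at the end.
        best = min((qx - rng.random()) ** 2 + (qy - rng.random()) ** 2
                   for _ in range(n))
        total += math.sqrt(best)
    return total / trials

for n in (100, 400, 1600):
    # Roughly constant across n if the exponent is -1/2.
    print(n, round(mean_min_distance(n) * math.sqrt(n), 3))
```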

rom

3 Answers


What follows is really a heuristic rather than rigorous proof, but hopefully provides some intuition.

Consider first the case where $q$ is fixed and is given to be $q_0 = (1/2,1/2)$, so is in the middle of the box. For a fixed radius $r > 0$ let $D_r^{(q_0)} = \left\{x \, \colon \, |x-q_0| < r\right\}$ denote the disk of radius $r$, and define

$$N_r^{(q_0)} = \# \left\{ x_i \, \colon \, x_i \in D_r^{(q_0)}\right\},$$

the number of the (random) points that fall inside the disk. The probability that any given point falls inside the disk is equal to

$$\mathbf P[ x_i \in D_r^{(q_0)}] = \frac{ \text{Area}( D_r^{(q_0)} \cap [0,1]^2 )}{\text{Area}([0,1]^2)}.$$

For small enough values of $r$ we have $D_r^{(q_0)} \subset [0,1]^2$, so the formula above becomes

$$ \mathbf P[x_i \in D_r^{(q_0)}] = \pi r^2.$$

In particular it follows that for sufficiently small $r$, $N_r^{(q_0)}$ follows a binomial distribution with $n$ trials and success probability $\pi r^2$. This success probability stays at most $1$ so long as $r \le 1/\sqrt{\pi}$.

Now we turn to the random variable you were interested in (with the caveat that we have fixed $q = q_0$), which we denote $M_n^{(q_0)}$:

$$M_n^{(q_0)} = \min_{i = 1,\ldots,n} |x_i - q_0|.$$

We can relate this to the random variable $N_r^{(q_0)}$ by the formula

$$ \mathbf P[ M_n^{(q_0)} \leq r] = \mathbf P[ N_r^{(q_0)} > 0]$$

That is: there is a point within distance $r$ of $q_0$ (i.e. $M_n^{(q_0)} \leq r$) if and only if there is at least one point inside the disk $D_r^{(q_0)}$, which is to say $N_r^{(q_0)} > 0$.

So (under the assumption that $r$ is small) we have

$$ \begin{aligned} \mathbf P[ M_n^{(q_0)} \leq r] &= \mathbf P[\text{Bin}(n, \pi r^2) > 0] \\ & = 1 - (1-\pi r^2)^n \end{aligned} $$ That is, we have the CDF for the variable $M_n^{(q_0)}$, from which we can derive the PDF by differentiation

$$ \begin{aligned} f_n^{(q_0)}(r)& = \frac{d}{dr}\mathbf P[ M_n^{(q_0)} \leq r] \\ &= 2 \pi n r(1 - \pi r^2)^{n-1} \end{aligned} $$ which we define on the range $[0, 1/\sqrt{\pi}]$. Then we can calculate the expected value

$$ \mathbf E[M_n^{(q_0)}] = \int_0^{1/\sqrt{\pi}} r f_n^{(q_0)}(r)\,dr = \frac{n}{2} \frac{\Gamma(n)}{\Gamma(n+3/2)}$$

(I admit that I used WolframAlpha to derive this.) Expanding asymptotically as $n \to \infty$ shows that this is indeed $O(n^{-\frac12})$; in fact $\mathbf E[M_n^{(q_0)}] \sim \frac{1}{2\sqrt{n}}$ (again, I relied on WolframAlpha here).

So we have a heuristic for why, given a fixed $q$ away from the boundary of the box, we might expect the minimum distance to be of order $n^{-\frac12}$.
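A quick numerical check of the closed form and its asymptote, using only the standard library (a sketch; `expected_min` is my name for the formula, and `lgamma` is used to avoid overflow for large $n$):

```python
import math

def expected_min(n):
    """E[M_n] = (n/2) * Gamma(n) / Gamma(n + 3/2), computed via log-gamma."""
    return (n / 2) * math.exp(math.lgamma(n) - math.lgamma(n + 1.5))

for n in (10, 100, 1000, 10000):
    exact = expected_min(n)
    asymptote = 1 / (2 * math.sqrt(n))
    # The ratio approaches 1, confirming the n^(-1/2) rate.
    print(n, exact, asymptote, exact / asymptote)
```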

Extending this to the case where $q$ is not fixed is now quite simple (heuristically!) by conditioning on $q$:

$$ \begin{aligned} \mathbf E[ M_n] & = \int_{[0,1]^2} \mathbf E[ M_n^{(\rho)} ] f_q(\rho)\, d \rho \\ & = \int_{[0,1]^2} \mathbf E[ M_n^{(\rho)} ]\,d\rho, \end{aligned} $$ where $f_q \equiv 1$ is the (uniform) density of $q$. For points $\rho$ away from the boundary this expectation is essentially independent of $\rho$, and since the vast majority of the square is away from the boundary we have

$$ \begin{aligned} \mathbf E[ M_n] &\sim \int_{[0,1]^2} \mathbf E[ M_n^{(q_0)} ]d\rho \\ & = \mathbf E[ M_n^{(q_0)} ] \\ & = \frac{n}{2} \frac{\Gamma(n)}{\Gamma(n+3/2)} \end{aligned} $$

Again, this is all very heuristic, but hopefully gives you some intuition!

owen88

An exact formula for finite $n$ is probably too difficult.

An asymptotic for large $n$ can be obtained by noticing that the problem becomes equivalent to a 2-d Poisson point process with intensity $I=n/A=n$. In that case, the number of points in a (small) circle of radius $r$ follows a Poisson distribution with $\lambda = I \pi r^2 = n \pi r^2$.

Then, for an additional point placed in the square (disregarding border effects, which should be asymptotically negligible), calling $D$ its distance from the nearest point, we have

$$P(D\ge d) = \exp (- n \pi d^2) \tag{1}$$

(the Poisson probability of zero points in the disk of radius $d$).

Hence the density is $$f_D(d) = 2 \pi \, n \, d \, \exp (- n \pi d^2) \tag{2}$$

which is a Rayleigh distribution. Its mean is

$$E[D]=\frac{1}{2\sqrt{n}} \tag{3}$$
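As a quick check (a sketch; the quadrature step count and upper cutoff are arbitrary choices), numerically integrating the density in $(2)$ recovers the mean in $(3)$:

```python
import math

def rayleigh_mean(n, steps=100_000, upper=1.0):
    """Midpoint-rule integral of d * f_D(d), with f_D(d) = 2*pi*n*d*exp(-n*pi*d^2).
    The tail beyond `upper` is negligible for the values of n used here."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        d = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += d * (2 * math.pi * n * d * math.exp(-n * math.pi * d * d)) * h
    return total

for n in (10, 100, 1000):
    # Numerical mean vs. the closed form 1/(2*sqrt(n)).
    print(n, rayleigh_mean(n), 1 / (2 * math.sqrt(n)))
```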

An alternative (perhaps slightly more precise here, given that $n$ is fixed, but also a little clumsier, and asymptotically equivalent) is to use a binomial instead of a Poisson, so that $(1)$ turns into $P(D\ge d) = (1- \pi d^2)^n$, recovering owen88's answer. Then, because $\frac{\Gamma(n)}{\Gamma(n+a)} \sim n^{-a}$, we get the same asymptotic mean as in $(3)$.

leonbloy

See D. Moltchanov, Distance distributions in random networks, Ad Hoc Networks 10(6), 2012.

They give the distribution of the distance to the $n^{\text{th}}$ nearest neighbour of $q$ as a Gamma distribution.

apg