How to efficiently compute the most isolated point?

Question

Given a finite set $S$ of points in $\mathbb R^d$, how can we efficiently compute a "most isolated point" $x\in S$?

We define a "most isolated point" $x$ by

$$x = \arg\max_{p \in S} \min_{q \in S \setminus \{p\}} d(p,q)$$

(I used the $x=\arg\max$ notation even though the maximizer is not necessarily unique. Here $d$ denotes the Euclidean distance.) In other words, we're looking for a point with the largest distance to its closest neighbour.

A naive algorithm would compute all pairwise distances, find the nearest neighbour of every point, and then take the maximum of these nearest-neighbour distances. This takes $O(n^2)$ operations, where $n = |S|$, but can we do better than that?
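For reference, the naive approach can be sketched in a few lines of Python. This is only an illustration of the $O(n^2)$ baseline; the function name `most_isolated_naive` and the sample points are made up for the example, and it assumes at least two points given as coordinate tuples.

```python
import math

def most_isolated_naive(points):
    """Return (index, distance) of a point whose nearest neighbour is farthest away."""
    best_index, best_dist = None, -1.0
    for i, p in enumerate(points):
        # Distance from p to its closest other point (O(n) work per point).
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        if nearest > best_dist:
            best_index, best_dist = i, nearest
    return best_index, best_dist

# Hypothetical sample data: three clustered points and one outlier.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
index, distance = most_isolated_naive(points)
print(index, distance)
```

On ties, this sketch returns the first maximizer it encounters, which matches the non-uniqueness noted above.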

score 1 · Answer 1 · answered Mar 18 '18 at 17:31

Use any algorithm for all nearest neighbors; then you can trivially solve your problem. Such an algorithm finds, for each data point, its nearest neighbor. The most isolated point is the one whose nearest neighbor is farthest away, so once you've solved all nearest neighbors, you can find the most isolated point by a simple linear scan.

Apparently all nearest neighbors can be found in $O(n \log n)$ time (for fixed dimension $d$); see the references on Wikipedia. Or, if you want something to implement, take any data structure for nearest neighbors, and for each point $p$, find its nearest neighbor.
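As a rough sketch of the "something to implement" route, here is a small k-d tree in plain Python, followed by one nearest-neighbor query per point and a linear scan. The k-d tree is just one possible choice of data structure, not the $O(n \log n)$ all-nearest-neighbors algorithm from the literature; all names (`build_kdtree`, `nearest_other`, `most_isolated_kdtree`) are invented for this example. The build step sorts at every level for simplicity, and the queries are only fast on average in low dimensions, not in the worst case.

```python
import math

def build_kdtree(indices, points, depth=0):
    """Build a k-d tree over point indices; a node is (index, axis, left, right)."""
    if not indices:
        return None
    axis = depth % len(points[0])
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return (indices[mid], axis,
            build_kdtree(indices[:mid], points, depth + 1),
            build_kdtree(indices[mid + 1:], points, depth + 1))

def nearest_other(node, points, query, best=(math.inf, None)):
    """Return (distance, index) of the point closest to points[query], excluding itself."""
    if node is None:
        return best
    index, axis, left, right = node
    if index != query:
        d = math.dist(points[index], points[query])
        if d < best[0]:
            best = (d, index)
    diff = points[query][axis] - points[index][axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_other(near, points, query, best)
    # The far side can only hold a closer point if the splitting plane is within reach.
    if abs(diff) < best[0]:
        best = nearest_other(far, points, query, best)
    return best

def most_isolated_kdtree(points):
    """Solve all nearest neighbors with the tree, then scan for the largest distance."""
    tree = build_kdtree(list(range(len(points))), points)
    dists = [nearest_other(tree, points, i)[0] for i in range(len(points))]
    best = max(range(len(points)), key=dists.__getitem__)
    return best, dists[best]

# Hypothetical sample data: three clustered points and one outlier.
print(most_isolated_kdtree([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]))
```

In practice a library index would likely be a better choice than a hand-rolled tree; the point here is only the structure "query every point, then take the maximum".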

score 0 · Answer 2 · answered Mar 18 '18 at 15:32

As suggested in the comments, I would look into nearest neighbor queries.

With a spatial index, doing one NN query per point should take on the order of $O(n \log n)$ time in total, so it is already better than the naive solution.

You can further improve that by adding a parameter to the NN query that contains the nearest-neighbor distance $d_{max}$ of the most isolated point that you have found so far. You can then abort any NN query as soon as it finds a point that is closer than $d_{max}$, because the query point then cannot be more isolated than the current best. This should speed up your search quite a bit.
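A minimal sketch of this early-abort idea, in Python. To keep it short, the query below is a plain linear scan standing in for a tree-based NN query; the pruning logic would be the same inside a real index. The names `nn_distance_bounded` and `most_isolated_pruned` are made up for the example, and the sketch also aborts on ties (distance equal to $d_{max}$), which still yields a valid most isolated point.

```python
import math

def nn_distance_bounded(points, i, d_max):
    """Nearest-neighbor distance of points[i], or None once it is known to be <= d_max."""
    nearest = math.inf
    for j, q in enumerate(points):
        if j == i:
            continue
        d = math.dist(points[i], q)
        if d <= d_max:
            return None  # points[i] cannot beat the current best: abort the query
        nearest = min(nearest, d)
    return nearest

def most_isolated_pruned(points):
    """Scan all points, keeping the best nearest-neighbor distance seen so far as d_max."""
    best_index, d_max = None, -1.0  # -1.0 guarantees the first query never aborts
    for i in range(len(points)):
        nearest = nn_distance_bounded(points, i, d_max)
        if nearest is not None:
            best_index, d_max = i, nearest
    return best_index, d_max

# Hypothetical sample data: three clustered points and one outlier.
print(most_isolated_pruned([(5.0, 5.0), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))
```

The pruning helps most when a fairly isolated point is visited early, since $d_{max}$ is then large and most later queries abort quickly; the order in which points are visited is therefore worth some thought.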

Btw, people often suggest KD-Trees for NN search. KD-Trees are very easy to implement, but in my experience they consistently scale less well with higher dimensions than other trees. For $d > 10$ or so I would recommend using an R-Tree, such as an R*Tree (R-Star-Tree), X-Tree or STR-loaded R-Tree, or a PH-Tree (which is more like a bitwise quadtree).
