I'm looking for the most efficient spatial-indexing data-structure for storing and querying bounding boxes which contain individual points. The points represent 2D coordinates on a grid, while the bounding boxes represent regions of the grid. The bounding boxes may vary greatly in size, and multiple bounding boxes may overlap a single point. Both points and bounding boxes are stored as signed integers.

For example, in the diagram below, if I were to query points $B$ and $C$, I'd expect a single bounding box in return. However, if I query point $A$, I'd expect an array containing both bounding boxes in return.

--------     
| B  ============
|    |A|        |
-----|--     C  |
     ============

I'm not concerned with insert/remove efficiency, as all bounding boxes are added to the structure during a one-time initialization. My main concern is efficient look-ups for finding which bounding boxes contain a point, as such queries will be made frequently.

My initial thought is to use a quadtree and, for each query, to test every object stored in the relevant node to see whether it contains the queried point. However, I'm wondering: is there a better data structure for implementing this behavior?

jocopa3

1 Answer

Use a 2-d segment tree. With $n$ bounding boxes, construction takes $O(n \cdot \log^2(n))$ time and each query takes $O(\log^2(n))$ time, plus time proportional to the number of boxes reported. These become $O(n \cdot \log(n))$ and $O(\log(n))$, respectively, if we use fractional cascading and an interval tree at the lowest level. These are good bounds unless the problem has additional structure to exploit.
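
To make the idea concrete, here is a minimal Python sketch of a plain 2-d segment tree (no fractional cascading, so it corresponds to the $\log^2$ bounds). The names `SegTree1D`, `SegTree2D`, `path` and `query` are my own, and boxes are assumed to be `(x1, y1, x2, y2)` tuples with inclusive integer corners; treat it as an illustration rather than a tuned implementation.

```python
from bisect import bisect_left, bisect_right

class SegTree1D:
    """Segment tree over closed integer intervals given as (lo, hi, payload)."""
    def __init__(self, intervals):
        # Elementary slabs are the half-open ranges between sorted endpoints.
        self.coords = sorted({c for lo, hi, _ in intervals for c in (lo, hi + 1)})
        self.size = max(1, len(self.coords) - 1)
        self.nodes = [[] for _ in range(4 * self.size)]
        for lo, hi, payload in intervals:
            l = bisect_left(self.coords, lo)
            r = bisect_left(self.coords, hi + 1)
            self._insert(1, 0, self.size, l, r, payload)

    def _insert(self, node, nlo, nhi, l, r, payload):
        # Store the payload at every maximal node whose slab range lies in [l, r).
        if l <= nlo and nhi <= r:
            self.nodes[node].append(payload)
            return
        mid = (nlo + nhi) // 2
        if l < mid:
            self._insert(2 * node, nlo, mid, l, r, payload)
        if r > mid:
            self._insert(2 * node + 1, mid, nhi, l, r, payload)

    def path(self, p):
        """Node indices on the root-to-leaf path for coordinate p ([] if outside)."""
        i = bisect_right(self.coords, p) - 1
        if i < 0 or i >= len(self.coords) - 1:
            return []
        out, node, lo, hi = [], 1, 0, self.size
        while True:
            out.append(node)
            if hi - lo == 1:
                return out
            mid = (lo + hi) // 2
            if i < mid:
                node, hi = 2 * node, mid
            else:
                node, lo = 2 * node + 1, mid

class SegTree2D:
    """Outer tree on x; every non-empty outer node owns an inner tree on y."""
    def __init__(self, boxes):
        self.outer = SegTree1D(
            [(x1, x2, (y1, y2, i)) for i, (x1, y1, x2, y2) in enumerate(boxes)])
        self.inner = [SegTree1D(items) if items else None
                      for items in self.outer.nodes]

    def query(self, px, py):
        """Return sorted indices of all boxes containing the point (px, py)."""
        hits = []
        for node in self.outer.path(px):
            inner = self.inner[node]
            if inner is not None:
                for n2 in inner.path(py):
                    hits.extend(inner.nodes[n2])
        return sorted(hits)

# Two overlapping boxes, loosely modelled on the diagram in the question.
tree = SegTree2D([(0, 0, 7, 3), (5, 1, 16, 4)])
print(tree.query(6, 2))   # point inside both boxes
print(tree.query(14, 3))  # point inside the second box only
```

Each box is stored at $O(\log n)$ outer nodes and, within each of those, at $O(\log n)$ inner nodes, which is where the extra logarithmic factor in space and construction time comes from.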


This kind of query is called a "multi-dimensional stabbing query" or a "point enclosure query".

Do not confuse this with a range tree: a range tree finds the points that lie in a query range, whereas a segment tree finds the rectangles that contain a query point.

As an alternative, one might wish to use an R-tree with sort-tile-recursive (STR) bulk-loading. This leads to almost no overlap between the bounding boxes of a node's children, and the structure is balanced. If we are lucky (an R-tree relies on heuristics, and we wish to avoid ties in each coordinate), the structure is good for a moderate number of dimensions $d$, because a time factor of $d \cdot \log(n)$ is lower than $\log^{\max(d - 1, 1)}(n)$ (noting that $d \geq 1$). We also take advantage of the fact that an R-tree does not clone primitives: each box is stored once. In addition, heuristic nearest-neighbor search can perform quite well with an R-tree, which seems to be something it excels at compared with a segment tree or a range tree. For a dynamic structure, one might wish to use an R*-tree; for slightly more dimensions, an X-tree. The more dimensions one has, the more likely it is that a linear scan becomes the more affordable option, a kind of "curse of dimensionality".
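
As a rough illustration of STR bulk-loading, the sketch below packs boxes into an R-tree bottom-up and answers point queries by descending only into children whose bounding box contains the point. The function names (`build_rtree`, `point_query`), the node capacity `cap=4` and the tuple-based node layout are my own choices for the example, not taken from any particular library.

```python
import math

def _mbr(boxes):
    """Minimum bounding rectangle of a list of (x1, y1, x2, y2) boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def _str_pack(entries, cap):
    """One STR pass: tile (box, child) entries into nodes of at most cap entries."""
    n_nodes = math.ceil(len(entries) / cap)
    n_slices = math.ceil(math.sqrt(n_nodes))
    per_slice = n_slices * cap
    # Sort by x-center, cut into vertical slices, then sort each slice by y-center.
    by_x = sorted(entries, key=lambda e: e[0][0] + e[0][2])
    nodes = []
    for s in range(0, len(by_x), per_slice):
        strip = sorted(by_x[s:s + per_slice], key=lambda e: e[0][1] + e[0][3])
        for t in range(0, len(strip), cap):
            group = strip[t:t + cap]
            nodes.append((_mbr([e[0] for e in group]), group))
    return nodes

def build_rtree(boxes, cap=4):
    """Return the root entry list; leaf entries carry the box index as child."""
    level = [(b, i) for i, b in enumerate(boxes)]
    while len(level) > cap:
        level = _str_pack(level, cap)
    return level

def point_query(root, px, py):
    """Return sorted indices of all boxes containing the point (px, py)."""
    hits, stack = [], [root]
    while stack:
        for (x1, y1, x2, y2), child in stack.pop():
            if x1 <= px <= x2 and y1 <= py <= y2:
                if isinstance(child, list):
                    stack.append(child)   # internal entry: descend
                else:
                    hits.append(child)    # leaf entry: report the box index
    return sorted(hits)

root = build_rtree([(0, 0, 7, 3), (5, 1, 16, 4), (20, 20, 25, 25),
                    (-9, -9, -1, -1), (3, 2, 4, 6)])
print(point_query(root, 6, 2))
```

Unlike the segment tree, every box appears exactly once here, but the query time depends on how much the node rectangles overlap, so there is no worst-case guarantee.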

Further, if one has only distances and no absolute locations, a metric tree will prove useful. Structures that assume rectangle primitives can also store points, since a point is a degenerate rectangle. Conversely, a pair of points that act as opposite corners of a rectangle can be turned into a single higher-dimensional point via the "corner transformation" described by Pagel et al. (1993).
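
To show what the corner transformation buys us, here is a small sketch, with hypothetical helper names (`corner_transform`, `enclosure_range`, `in_range`), in which a 2-d rectangle becomes a 4-d point and a point enclosure query becomes an axis-aligned 4-d range query. The linear filter at the end only stands in for whatever point index (range tree, k-d tree, ...) one would actually use.

```python
import math

def corner_transform(p, q):
    """Map a rectangle given by two opposite corners p and q to the
    4-d point (x_min, y_min, x_max, y_max)."""
    (px, py), (qx, qy) = p, q
    return (min(px, qx), min(py, qy), max(px, qx), max(py, qy))

def enclosure_range(x, y):
    """4-d range holding exactly the transformed rectangles that contain
    (x, y): x_min <= x, y_min <= y, x_max >= x, y_max >= y."""
    lows = (-math.inf, -math.inf, x, y)
    highs = (x, y, math.inf, math.inf)
    return lows, highs

def in_range(point, lows, highs):
    """True if the 4-d point lies inside the closed range [lows, highs]."""
    return all(lo <= c <= hi for lo, c, hi in zip(lows, point, highs))

# The corners may be given in either order.
boxes = [corner_transform((0, 0), (7, 3)), corner_transform((16, 4), (5, 1))]
lows, highs = enclosure_range(6, 2)
hits = [i for i, b in enumerate(boxes) if in_range(b, lows, highs)]
print(hits)
```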

One strategy that may be used with an R-tree is to augment it with "look-ahead" and edge checks, which gives a guaranteed, theoretically acceptable time for queries such as the point enclosure query. This matters even though an R-tree seems to be designed specifically for this kind of query, because a plain R-tree offers no such worst-case guarantee.

References

  • Pagel et al. - The transformation technique for spatial objects revisited (1993)
bzliu94