3

I'm quite new to understanding persistent data structures, so bear with me.

Consider some database with data of the form $(\text{id}, d_1, d_2)$. The last two could e.g. mean production data and sales date, respectively. The database is constant, i.e. we will not be adding or removing data points.

Let $n$ be the number of items in the database. I want to be able to do the following query:

Query$(a,b,s)$: Return the number of objects that have $a\leq d_1 \leq b$ and $d_2 \leq s$.

I want to be able to perform the query in $\mathcal O(\log n)$ time, with the best possible space and preprocessing time complexities. My question is how could one construct a persistent data structure that solves this problem?

I think that I need to use timestamps that relate to the $d_1$ and/or $d_2$ in some way. However, I'm not quite sure how.

Raphael
  • 73,212
  • 30
  • 182
  • 400
Eff
  • 219
  • 1
  • 9

2 Answers2

3

Yes, you can do this in $O(\log n)$ time. Use a sweep-line algorithm together with a persistent tree data structure.

Imagine the point $(d_1,d_2)$ as being on a grid, with $d_1$ as the value of the $x$-coordinate and $d_2$ as the value of the $y$-coordinate. Your sweep line is a horizontal line, namely, at time $t$, the sweep line is the line $d_2=t$.

You will build a data structure that stores all the points $(d_1,d_t)$ such that $d_2 \le t$. You can use a balanced binary search tree, keyed on the value of $d_1$. This data structure will let you answer the query Query$(a,b,t)$ in $O(\log n)$ time.

Now you're going to gradually increase the value of $t$ (sweeping the line upward), and adjusting the data structure as you go. You'll save a copy of the data structure for each value of $t$ that occurs in the $d_2$-database.

This sounds like it requires $O(n^2)$ space. However, there's a clever trick. Use a persistent data structure to store the trees. As you move the sweep line upward, you insert more items into the tree. A persistent data structure will let you pick any time $t$ and do a lookup into the state of the tree at time $t$. This lets you answer Query$(a,b,t)$ in time $O(\lg n)$.

D.W.
  • 167,959
  • 22
  • 232
  • 500
2

If your data are constant, most data structures are persistent. (Not self-adjusting ones like splay trees or the union-find structure of Tarjan)

You can do "three-sided range queries" with a priority search tree in $O(k + \log n)$ time, where $k$ is the number of items returned.

jbapple
  • 3,390
  • 18
  • 21