
I need a data structure that can hold millions of elements. The minimum and maximum must be accessible in constant time, and the time complexity of inserting and erasing an element must be better than linear.

Raphael
Janci Kralovic

4 Answers


Balanced binary search trees are a basic data structure that allows insertion and deletion in time $\Theta(\log n)$. Their memory overhead is reasonable (in the case of AVL trees, two pointers and three bits per entry), so millions of entries are no problem at all on modern machines.

Note that in a search tree, finding the minimum (or maximum) is conceptually easy: always descend left (right), starting at the root. This takes time $\Theta(\log n)$, too, which is too slow for you.

However, we can certainly store pointers to these two tree nodes, similar to the front and end pointers of a doubly linked list. But what happens when one of these elements is deleted? In that case, we have to find its in-order successor (predecessor) and update the pointer to the minimum (maximum). Finding this node takes time $O(\log n)$, so it does not hurt deletion time asymptotically.
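The cached-pointer idea can be sketched as follows. This is only an illustration: to stay short, the tree below is a plain, unbalanced binary search tree, so its insert and delete costs are $O(h)$ in the tree height $h$ rather than $O(\log n)$; a real implementation would add AVL or red-black rebalancing. The names `Node`, `MinMaxBST`, `find_min`, `delete_min` and so on are made up for this example, and the delete methods assume a non-empty tree.

```python
class Node:
    __slots__ = ("key", "left", "right", "parent")

    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.parent = parent


class MinMaxBST:
    """Unbalanced BST (for brevity) that caches its leftmost and rightmost nodes."""

    def __init__(self):
        self.root = self.min_node = self.max_node = None

    def insert(self, key):
        if self.root is None:
            self.root = self.min_node = self.max_node = Node(key)
            return
        node = self.root
        while True:  # ordinary BST descent; equal keys go to the right
            if key < node.key:
                if node.left is None:
                    node.left = new = Node(key, node)
                    break
                node = node.left
            else:
                if node.right is None:
                    node.right = new = Node(key, node)
                    break
                node = node.right
        # Only a new smallest (largest) key can change the cached pointers.
        if key < self.min_node.key:
            self.min_node = new
        if key >= self.max_node.key:
            self.max_node = new

    def find_min(self):  # O(1): no descent needed
        return self.min_node.key

    def find_max(self):  # O(1)
        return self.max_node.key

    def delete_min(self):
        node = self.min_node  # leftmost node, so it has no left child
        child, parent = node.right, node.parent
        if parent is None:
            self.root = child
        else:
            parent.left = child
        if child is not None:
            child.parent = parent
            while child.left is not None:  # successor: leftmost node of the right subtree
                child = child.left
            self.min_node = child
        else:
            self.min_node = parent  # successor is the parent (None if the tree is now empty)
        if self.min_node is None:
            self.max_node = None
        return node.key

    def delete_max(self):
        node = self.max_node  # rightmost node, so it has no right child
        child, parent = node.left, node.parent
        if parent is None:
            self.root = child
        else:
            parent.right = child
        if child is not None:
            child.parent = parent
            while child.right is not None:  # predecessor: rightmost node of the left subtree
                child = child.right
            self.max_node = child
        else:
            self.max_node = parent
        if self.max_node is None:
            self.min_node = None
        return node.key
```

Deleting an arbitrary node works as in any search tree; the cached pointers only need attention when the deleted node is the current minimum or maximum.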

You can, however, enable time $O(1)$ deletion of minimum and maximum by threading the tree, that is, by maintaining (in addition to the binary search tree) a doubly linked list of the nodes in in-order. Then, finding the new minimum/maximum is possible in time $O(1)$. This list requires additional space (two pointers per entry) and has to be maintained during insertions and deletions; this does not make the asymptotics worse but certainly slows down every such operation (I leave the details to you). So you have to trade off the options given your application, that is, which operations occur more often and which you want to be fastest.

Note that trees, as all linked structures, tend to be bad for memory hierarchies since they don't necessarily preserve data locality. If your sets are so large that they don't fit into cache completely, you should check out B-trees which are designed to minimise page loads. The above works with them, too.

Raphael

The name for the abstract data structure that you're interested in is a "double-ended priority queue" or sometimes "priority deque".

A min-priority queue, as you probably know, is an abstract data structure which supports the following set of operations:

  • insert (add an item)
  • findMin (find the item with the smallest value)
  • deleteMin (remove the item with the smallest value)

This is the minimal set; other typical operations may include:

  • delete (remove any item)
  • decreaseKey (alter an item so that its key is smaller)
  • merge (merge two priority queues into one)

For the purpose of time analysis, it is usually assumed that all you have to compare keys is a binary comparison operator. You can also dually define a max-priority queue, where you're interested in the largest value rather than the smallest, by simply inverting the sense of the comparison operator.

A double-ended priority queue is one that supports querying and efficiently removing the minimum or maximum value.

If I'm reading you correctly, this is the set of operations that you definitely want, along with their time complexities:

  • insert - better than O(n)
  • findMin - O(1)
  • findMax - O(1)
  • deleteMin - better than O(n)
  • deleteMax - better than O(n)

and there is one operation that you possibly want:

  • delete - better than O(n)

I'm going to ignore this operation because it complicates things. To delete an arbitrary item, you must locate an arbitrary item. Some priority queue data structures (e.g. Fibonacci heaps) support the concept of a "location" (like an iterator in C++) which stays valid no matter what modifications you do to the queue (apart from deleting the item in question, obviously), but many do not, because items can move around in the data structure. If you really need this operation, then a variant of binary search trees which supports findMin and findMax in constant time is probably what you need. This turns out to be a very simple and pleasant exercise in algebra; see [1] for details, including Haskell source code.

If you already have a priority queue data structure available, there are a few obvious ways to do this: maintain a min-queue and a max-queue, and maintain correspondences between them. See [2] for some details on how you might go about this.
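As a rough illustration of the two-queue idea, here is a sketch built on Python's `heapq`. It is not the scheme from [2]: instead of explicit correspondence pointers, it gives every item an id and removes stale entries lazily, which is a simpler stand-in. The class name `TwoHeapDEPQ` and its method names are invented for this example, and the max-queue is simulated by negating keys, so keys are assumed to be numbers.

```python
import heapq
import itertools


class TwoHeapDEPQ:
    """Double-ended priority queue from a min-heap, a max-heap and lazy deletion."""

    def __init__(self):
        self._min = []                 # entries (key, item_id)
        self._max = []                 # entries (-key, item_id); assumes numeric keys
        self._alive = set()            # ids of items that have not been deleted
        self._ids = itertools.count()  # the id links an item's two heap entries

    def __len__(self):
        return len(self._alive)

    def insert(self, key):
        item_id = next(self._ids)
        self._alive.add(item_id)
        heapq.heappush(self._min, (key, item_id))
        heapq.heappush(self._max, (-key, item_id))

    def _prune(self):
        # Drop entries at the top of either heap whose item was already
        # deleted through the other heap, so both tops are always live.
        for heap in (self._min, self._max):
            while heap and heap[0][1] not in self._alive:
                heapq.heappop(heap)

    def find_min(self):  # O(1): the top entry is always live
        return self._min[0][0]

    def find_max(self):  # O(1)
        return -self._max[0][0]

    def delete_min(self):
        key, item_id = heapq.heappop(self._min)
        self._alive.remove(item_id)
        self._prune()
        return key

    def delete_max(self):
        neg_key, item_id = heapq.heappop(self._max)
        self._alive.remove(item_id)
        self._prune()
        return -neg_key
```

Each entry is pushed once and popped at most once per heap, so the deletions cost $O(\log n)$ amortized. The price is memory: stale entries stay in a heap until they reach its top.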

Most of the other interesting options are based on binary heaps, but combine min-heaps and max-heaps in one data structure, such as min-max heaps [3] and interval heaps [4].
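To make the first of these concrete, here is a compact array-based sketch of a min-max heap along the lines of [3]: even levels obey the min-heap order with respect to their descendants, odd levels the max-heap order. The class and method names are my own, error handling for the empty heap is omitted, and the code has not been tuned in any way.

```python
class MinMaxHeap:
    """Array-based min-max heap: even levels are min levels, odd levels max levels."""

    def __init__(self):
        self._h = []

    def __len__(self):
        return len(self._h)

    @staticmethod
    def _on_min_level(i):
        # Index i sits on level floor(log2(i + 1)); even levels are min levels.
        return (i + 1).bit_length() % 2 == 1

    def _swap(self, i, j):
        self._h[i], self._h[j] = self._h[j], self._h[i]

    def _bubble_up(self, i, is_min):
        # Move h[i] up via grandparents, which lie on levels of the same kind.
        h = self._h
        while i >= 3:
            g = (i - 3) // 4  # grandparent index
            if (h[i] < h[g]) if is_min else (h[i] > h[g]):
                self._swap(i, g)
                i = g
            else:
                break

    def insert(self, key):
        h = self._h
        h.append(key)
        i = len(h) - 1
        if i == 0:
            return
        p = (i - 1) // 2
        is_min = self._on_min_level(i)
        # A key that is out of order with its parent belongs on the other kind of level.
        if (h[i] > h[p]) if is_min else (h[i] < h[p]):
            self._swap(i, p)
            self._bubble_up(p, not is_min)
        else:
            self._bubble_up(i, is_min)

    def _trickle_down(self, i, is_min):
        h, n = self._h, len(self._h)
        better = (lambda a, b: a < b) if is_min else (lambda a, b: a > b)
        while True:
            # Existing children and grandchildren of i.
            cand = [j for j in (2 * i + 1, 2 * i + 2,
                                4 * i + 3, 4 * i + 4, 4 * i + 5, 4 * i + 6) if j < n]
            if not cand:
                return
            m = cand[0]
            for j in cand[1:]:
                if better(h[j], h[m]):
                    m = j
            if not better(h[m], h[i]):
                return
            self._swap(i, m)
            if m < 4 * i + 3:  # m was a child, nothing below it to fix
                return
            p = (m - 1) // 2
            if better(h[p], h[m]):  # restore order with the parent on the other kind of level
                self._swap(m, p)
            i = m

    def _max_index(self):
        h = self._h
        if len(h) == 1:
            return 0
        if len(h) == 2 or h[1] >= h[2]:
            return 1
        return 2

    def find_min(self):  # O(1): the root
        return self._h[0]

    def find_max(self):  # O(1): the root or one of its two children
        return self._h[self._max_index()]

    def _delete_at(self, i):
        h = self._h
        key = h[i]
        last = h.pop()
        if i < len(h):  # fill the hole with the last element and repair downwards
            h[i] = last
            self._trickle_down(i, self._on_min_level(i))
        return key

    def delete_min(self):
        return self._delete_at(0)

    def delete_max(self):
        return self._delete_at(self._max_index())
```

Everything lives in one array with no per-item pointers, which is the main attraction over the two-queue approach; the cost is that arbitrary delete is not supported without extra bookkeeping.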

By the way, if your keys are integers (not just binary-comparable blobs) then you can probably do better. vEB trees, for example, generalise to double-ended priority queues in a straightforward manner.


  1. A fresh look at binary search trees by R. Hinze (2002)
  2. Correspondence based data structures for double ended priority queues by K.-R. Chong and S. Sahni (1998)
  3. Min-max heaps and generalized priority queues by M. D. Atkinson et al. (1986)
  4. Data Structures, Algorithms, and Applications in C++ (Chapter 9.7) by S. Sahni (1998)
Pseudonym

You should look into van Emde Boas trees: https://en.wikipedia.org/wiki/Van_Emde_Boas_tree. They come with some compromises: mostly, your elements need to be integers, and memory consumption may be high (but may be way lower than for binary trees if the keys are dense). Min and max take constant time; insert/delete/successor take O(log log M), where M is the size of the key space. A careful implementation may outperform a binary tree by a factor of 10 for millions of keys (mostly if they are dense).

virco

One of the best heaps to use for that purpose is the Fibonacci heap. It has O(1) insert and O(1) findMin, together with amortized O(1) decreaseKey, if you need it.

If you really need deleteMin and findMin consecutively (meaning you extract several minimums in a row) then I would not recommend using a heap. The QuickSelect algorithm (which runs in expected O(n) time) for finding all of those minimums at once has worked faster for me.

http://en.wikipedia.org/wiki/Quickselect
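For illustration, a minimal QuickSelect sketch that returns the k smallest items in one pass over a copy of the data. The function name `k_smallest` is made up for this example; the pivot is chosen at random, so the O(n) bound holds in expectation only, and the result comes back in no particular order.

```python
import random


def k_smallest(items, k):
    """Return the k smallest items, in arbitrary order, in expected O(n) time."""
    items = list(items)  # work on a copy
    if k <= 0:
        return []
    if k >= len(items):
        return items
    lo, hi = 0, len(items) - 1
    while True:
        pivot = items[random.randint(lo, hi)]
        # Three-way partition of items[lo..hi] around the pivot.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if items[i] < pivot:
                items[lt], items[i] = items[i], items[lt]
                lt += 1
                i += 1
            elif items[i] > pivot:
                items[i], items[gt] = items[gt], items[i]
                gt -= 1
            else:
                i += 1
        # Now items[lo:lt] < pivot, items[lt:gt + 1] == pivot, items[gt + 1:hi + 1] > pivot.
        if k < lt:
            hi = lt - 1  # the boundary lies among the smaller items
        elif k > gt + 1:
            lo = gt + 1  # the boundary lies among the larger items
        else:
            return items[:k]  # the first k positions hold the k smallest
```

This only pays off when the minimums are needed in a batch from a fixed set; with insertions interleaved, each batch costs a fresh pass over all n items.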

Tolga Birdal