
I am struggling with the material on hashing and binary search trees. I read that instead of using lists to store entries with the same hash value, it is also possible to use binary search trees. I am trying to understand the worst-case and average-case running times of the operations

  1. insert,
  2. find and
  3. delete

Do they improve with respect to lists?


2 Answers


For lists, insert, find and delete are in $O(1)$, $O(n)$ and $O(n)$ respectively. Sorted lists are worse. Binary search itself is for sorted arrays, where these operations are in $O(n)$, $O(\log n)$ and $O(n)$. If you want "insert" and "delete" operations, then you need more than just binary search.
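For concreteness, here is a minimal sketch (my illustration, not part of the original answer) of the three list operations with exactly those costs, using an unsorted linked list:

```java
import java.util.LinkedList;

// Minimal sketch: insert, find and delete on an unsorted linked list.
class UnsortedListOps {
    // Insert at the head: O(1), duplicates allowed.
    static <T> void insert(LinkedList<T> list, T x) {
        list.addFirst(x);
    }

    // Find by linear scan: O(n) worst case.
    static <T> boolean find(LinkedList<T> list, T x) {
        return list.contains(x);
    }

    // Delete: scan to locate the element, then unlink it: O(n) worst case.
    static <T> boolean delete(LinkedList<T> list, T x) {
        return list.remove(x);
    }
}
```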

You probably want something like binary search trees. It is a lot easier to find references once you have the proper terminology. These operations run in $O(\log n)$ worst-case time, for example in implementations based on AVL trees or red-black trees.
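As a quick illustration (my sketch, not from the answer), Java's standard `TreeSet` is backed by a red-black tree, so it already provides all three operations with these bounds:

```java
import java.util.TreeSet;

// java.util.TreeSet is backed by a red-black tree, so insert, find and
// delete are all O(log n) in the worst case.
public class BalancedTreeDemo {
    public static void main(String[] args) {
        TreeSet<Integer> tree = new TreeSet<>();
        tree.add(42);                        // insert: O(log n)
        boolean found = tree.contains(42);   // find: O(log n)
        tree.remove(42);                     // delete: O(log n)
        System.out.println(found);           // prints "true"
    }
}
```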

jmad

In the worst case, if you happen to store only elements with the same hash value, a hash table puts every element in the same bucket. If you use lists to store the elements of a bucket, then lookup is $O(n)$ in the worst case (where $n$ is the number of elements in the table; more generally, $n$ is the number of elements in the largest bucket), because you need to traverse the whole list when looking up an element that isn't in the table. Positive lookup (where you know the element is present) has the same complexity: you need $n-1 = \Theta(n)$ comparisons if you're looking up the last element of the list. Deletion also has the same complexity: it takes $n-1$ traversal steps if you happen to be deleting the last element. Insertion is $O(n)$ as well if you need to check for an existing element, or $O(1)$ if you allow duplicates (in which case you can insert the element at the beginning of the list).
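Here is a sketch of such a table with list buckets (the class and method names are my own, for illustration); the comments mark where the $O(n)$ worst case comes from:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hash table with linked-list buckets. If every key lands in the same
// bucket, lookup degrades to a linear scan of all n entries.
class ListBucketTable<K> {
    private final List<LinkedList<K>> buckets = new ArrayList<>();

    ListBucketTable(int capacity) {
        for (int i = 0; i < capacity; i++) buckets.add(new LinkedList<>());
    }

    private LinkedList<K> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void insert(K key) {                      // O(n) with the membership check,
        LinkedList<K> b = bucketFor(key);     // O(1) if duplicates were allowed
        if (!b.contains(key)) b.addFirst(key);
    }

    boolean find(K key) {                     // O(n) worst case: whole bucket scanned
        return bucketFor(key).contains(key);
    }

    boolean delete(K key) {                   // O(n) worst case, same scan
        return bucketFor(key).remove(key);
    }
}
```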

With balanced binary search trees, the worst-case complexity is reduced to $O(\log n)$, because the depth of a balanced search tree grows logarithmically in the size of the tree by definition of balancing.
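A sketch of the same table with tree buckets (again my own illustration) only changes the bucket type; each operation now costs $O(\log n)$ even when every key collides:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hash table with balanced-tree buckets: each bucket is a red-black tree
// via TreeSet, so even an all-collisions workload costs O(log n) per operation.
class TreeBucketTable<K extends Comparable<K>> {
    private final List<TreeSet<K>> buckets = new ArrayList<>();

    TreeBucketTable(int capacity) {
        for (int i = 0; i < capacity; i++) buckets.add(new TreeSet<>());
    }

    private TreeSet<K> bucketFor(K key) {
        return buckets.get(Math.floorMod(key.hashCode(), buckets.size()));
    }

    void insert(K key)    { bucketFor(key).add(key); }              // O(log n)
    boolean find(K key)   { return bucketFor(key).contains(key); }  // O(log n)
    boolean delete(K key) { return bucketFor(key).remove(key); }    // O(log n)
}
```

Java's own `HashMap` has used this idea since Java 8: when a bucket's chain grows past a small threshold, it is converted into a red-black tree.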

With a typical distribution of data and a reasonable hash function, the elements are spread across different buckets and there are few collisions, so the complexity is close to $O(1)$ regardless of the data structure used in case of collisions.
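To make "close to $O(1)$" precise, the standard textbook bound (not stated above, but consistent with it) is: under simple uniform hashing, a table with $m$ buckets and $n$ elements has load factor $\alpha = n/m$, and an unsuccessful lookup takes expected time $\Theta(1 + \alpha)$, which is $\Theta(1)$ as long as $m$ is kept proportional to $n$.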

With random lookups in an adversarially-chosen data distribution in which all $n$ elements are in the same bucket, the average length of list that must be traversed is $n/2$, so the average lookup complexity in this situation is $\Theta(n)$. With a tree, the average is $\Theta(\log n)$, like the worst case.
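To see how such an adversarial distribution arises in practice, here is a hypothetical key class (my example) whose constant `hashCode` forces every instance into the same bucket, so a list-bucket table degenerates into one chain of length $n$:

```java
// Hypothetical adversarial key: every instance hashes to the same bucket.
class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }

    @Override public int hashCode() { return 0; }   // all keys collide
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey other) {  // enables tree buckets
        return Integer.compare(this.id, other.id);
    }
}
```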

Gilles 'SO- stop being evil'