
The time complexity of binary search is O(log n) and that of hashing is O(1), or so I've read. I have also read that hashing outperforms binary search when the input is large, for example in the millions. But I see that log n, when n is around 30 million, is roughly 25.

If for such a large value of n, log n is just 25, then isn't O(log n) roughly the same as O(1)? O(1) only means that the running time is bounded by some constant, and that constant could well be bigger than or equal to 25. (Here I am assuming that the coefficient of log n, in the actual running-time function of binary search, is at most 1.)
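
For concreteness, here is that arithmetic as a minimal Python check (binary search halves the search range each step, hence log base 2):

```python
import math

n = 30_000_000
steps = math.log2(n)
print(steps)  # ~24.8, so binary search needs about 25 comparisons
```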

A related question I have is regarding the calculation of the time complexity of a hash algorithm. In all the literature, I barely find any mention of the time complexity of the hash function itself, or of the division required afterwards (to compute the index of the bucket).

So, exactly how do memory reads/accesses and a fairly complex mathematical computation (such as a hash function) compare against each other, w.r.t. time consumption? Roughly what kind of basic mathematical operation does one memory read/access equate to, in terms of performance?

amsquareb

2 Answers


You are asking excellent questions!

The answer to your first question, regarding the difference between $O(\log n)$ and $O(1)$, is that whereas binary search scales with the size of the array, searching for an item in a hash table scales with the size of the data item, which we often think of as constant. In practice, calculating the hash also takes time (as you also mention), and so to know which data structure has better performance, we need to empirically compare implementations of both data structures.
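
In that spirit, here is a minimal sketch of such an empirical comparison in Python, looking up the same keys in a sorted list (binary search via `bisect`) and in a `dict` (hash table). The sizes and probe counts are arbitrary choices; the question's 30 million items would work too, just slower to build:

```python
import bisect
import random
import timeit

n = 1_000_000
keys = list(range(n))
sorted_arr = keys[:]            # already sorted
table = {k: k for k in keys}    # hash table

probes = random.sample(keys, 1000)

def binary_search_lookups():
    for k in probes:
        i = bisect.bisect_left(sorted_arr, k)  # O(log n) per lookup
        assert sorted_arr[i] == k

def hash_lookups():
    for k in probes:
        assert table[k] == k                   # expected O(1) per lookup

print("binary search:", timeit.timeit(binary_search_lookups, number=100))
print("hash table:   ", timeit.timeit(hash_lookups, number=100))
```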

The answer to your second question, about the time complexity of computing the hash function, is that it takes time linear in the size of the data item. Most hash functions used in this context are rolling hashes, in which a small hash value is updated as the data item is read. This ensures that the time complexity is indeed linear. Hash functions for such uses are designed with speed in mind, and so usually memory access is not an issue. The division operation at the end, if there is one at all, is not too costly on modern CPUs, and can be avoided entirely if the size of the hash table is a power of 2.
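
For illustration, a minimal sketch of a polynomial rolling hash (the base 257, the 64-bit mask, and the power-of-2 table size are arbitrary choices, not a prescribed scheme):

```python
def rolling_hash(key: str, table_size: int) -> int:
    # The key is read one character at a time, so the cost is
    # linear in the key length.
    assert table_size & (table_size - 1) == 0  # power of 2
    h = 0
    for ch in key:
        h = (h * 257 + ord(ch)) & 0xFFFFFFFFFFFFFFFF
    # With a power-of-2 table size, the final "division" (modulo)
    # reduces to a cheap bitwise AND.
    return h & (table_size - 1)
```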

Yuval Filmus

Speed of calculating the hash function of large keys: you don't actually have to process the whole key. For example, if the key is a string, you might hash only the first and last 40 characters.
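
A hypothetical sketch of such a partial-key hash (the 40-character sample and the use of Python's built-in `hash` are illustrative choices):

```python
def partial_key_hash(key: str, table_size: int) -> int:
    # Hash only a fixed-size sample of a long key, which bounds the
    # work for very long keys at the cost of more collisions among
    # keys that agree on the sampled characters.
    sample = key[:40] + key[-40:] if len(key) > 80 else key
    return hash(sample) % table_size
```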

The biggest advantage of hashing over binary search is that it is much cheaper to add or remove an item in a hash table than to add or remove an item in a sorted array while keeping it sorted. (Binary search trees work a bit better in that respect.)
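
A minimal sketch of the contrast, using Python's `bisect` for the sorted array and a `dict` as the hash table:

```python
import bisect

sorted_arr = [1, 3, 7, 9]
# bisect.insort finds the slot with binary search in O(log n), but
# shifting elements to keep the array sorted makes the insert O(n).
bisect.insort(sorted_arr, 5)
print(sorted_arr)  # [1, 3, 5, 7, 9]

table = {1: "a", 3: "b"}
table[5] = "c"     # expected O(1) amortized, no shifting
```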

gnasher729