Why can't hash tables provide O(n) sorting?

Question

Since a sufficiently large hash table takes constant time to both insert and retrieve data, should it not be possible to sort an array by simply inserting each element into the hash table, and then retrieving them in order?

You just insert each number into the hash table, and remember the lowest and highest number inserted. Then for each number in that range, in order, test if it is present in the hash table.

If the array being sorted contains no gaps between values (i.e. it can be [1,3,2] but NOT [1,3,4]), this should give you O(N) time complexity.
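To make the proposal concrete, here is a minimal Python sketch of the procedure described above. The function name `hash_range_sort` is made up for illustration, and a built-in `set` stands in for the hash table. Because a set stores each value once, duplicates in the input are collapsed.

```python
def hash_range_sort(values):
    """Sort integers by inserting them into a hash set, then probing every
    integer from the smallest to the largest value seen."""
    if not values:
        return []
    seen = set()
    lo = hi = values[0]
    for v in values:
        seen.add(v)          # expected O(1) insertion
        lo = min(lo, v)      # remember the lowest number inserted
        hi = max(hi, v)      # remember the highest number inserted
    # One membership test per integer in [lo, hi], not per input element.
    return [x for x in range(lo, hi + 1) if x in seen]


print(hash_range_sort([1, 3, 2]))
print(hash_range_sort([5, 1, 9]))
```

The final loop runs `hi - lo + 1` times, so the total work is proportional to the input length only when the value range is no larger than the number of elements.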

Is this correct? I don't think I've ever heard of hash tables being used this way - am I missing something? Or are the restrictions (numeric array with no gaps) too much for it to be practically useful?

score 7 · Answer 1 · answered Jun 22 '15 at 15:14

The reason you've never heard of hash tables being used like this is that hash tables are either "too much" or "not enough" in this situation.

If the range of elements being sorted is small, then you can use counting sort, or something similar. But for counting sort, you would almost certainly want to use a simple array rather than a hash table. If you know the max and min values of the numbers, then the array can be of size max-min+1, and value x would be associated with index x-min. By using an array, you avoid the extra complications of hash tables. Those extra complications are buying you nothing in this application, so hash tables are "too much".
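A minimal sketch of the array-based counting sort described above, assuming integer inputs. The function name `counting_sort` is just illustrative. The array has size max-min+1 and value x is counted at index x-min.

```python
def counting_sort(values):
    """Counting sort for integers using a plain array of size max - min + 1."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)      # one slot per possible value
    for x in values:
        counts[x - lo] += 1           # value x lives at index x - lo
    out = []
    for offset, count in enumerate(counts):
        out.extend([lo + offset] * count)   # emit each value as often as it occurred
    return out


print(counting_sort([3, 1, 2, 3]))
```

Unlike the hash-table version, this handles duplicates for free and needs no hashing at all, but its time and memory still scale with the size of the range.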

Notice that your "no gaps" restriction ensures that the range of numbers is small (no larger than the number of elements in your original input).
However, if gaps can be present then the range of elements is potentially MUCH larger than the number of elements in your original input. Then your running time is dominated by the size of that range, NOT by the size of your original input. Hash tables do not help you deal with those gaps efficiently, so in this case hash tables are "not enough".

score 5 · Accepted Answer · answered Jun 18 '15 at 10:09

The algorithm you give is exponential time, not linear. If you're given $n$ $b$-bit entries, the size of your input is $nb$ bits, but in the worst case the algorithm takes time $\Theta(2^b)$, which is exponential in the input length. In particular, your algorithm takes about $2^k$ steps to sort the roughly $2k$-bit input $\{0, 2^k\}$.
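As a rough illustration of that gap, the sketch below compares the input size in bits with the number of hash-table lookups the range scan would make for the two-element input $\{0, 2^k\}$. The helper name `cost_of_range_scan` is hypothetical, and "probes" here just counts one lookup per integer between the minimum and the maximum.

```python
def cost_of_range_scan(k):
    """Return (input_bits, probes) for the two-element input [0, 2**k]."""
    values = [0, 2**k]
    b = max(v.bit_length() for v in values)   # bits per entry: k + 1
    input_bits = len(values) * b              # n * b
    probes = max(values) - min(values) + 1    # one lookup per value in [min, max]
    return input_bits, probes


for k in (4, 8, 16, 24):
    bits, probes = cost_of_range_scan(k)
    print(f"k={k:2d}  input bits={bits:3d}  probes={probes}")
```

The input grows by two bits each time k increases by one, while the number of probes roughly doubles.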

score 0 · Answer 3 · answered Jan 17 '22 at 13:49

Kudos: taken on its own, the title of the question describes an ambitious idea, and there is a related research paper that sorts in linear time given two constraints, no duplicates and a known input range (gaps are allowed): Hash sort: A linear time complexity multiple-dimensional sort algorithm

However, the steps described in the paper are not as simple as those in the question.

Coming to your last sub-question: "Or are the restrictions (numeric array with no gaps) too much for it to be practically useful?"

Given that we know the range (lowest and highest) and there are no gaps in the numbers, why go to the trouble of sorting at all? The same result can be represented by the interval [low, high], and the numbers, if needed in order, can be generated simply by incrementing from low to high, which takes no more than linear time.

The sole purpose of sorting is to obtain an ordered collection. If the resulting ordered collection is already known from the constraints (range and no gaps), sorting is of practically no use.
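A small sketch of that point, under the assumptions that the input has no gaps and also no duplicates (the function name `sorted_from_bounds` is made up for illustration): the sorted result follows from the two bounds alone.

```python
def sorted_from_bounds(values):
    """Assuming no gaps and no duplicates, the sorted output is just the
    integers from the minimum to the maximum, generated by incrementing."""
    if not values:
        return []
    return list(range(min(values), max(values) + 1))


print(sorted_from_bounds([1, 3, 2]))
```

If duplicates are possible, the bounds no longer determine the output and a counting step is needed again.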
