
Suppose I have a hash table with 1000 slots and an array of $n$ numbers, and I want to check whether the array contains any repeats. The best way I can think of is to insert the numbers into the hash table, using a hash function that satisfies the simple uniform hashing assumption, and before every insert to search that element's chain for a duplicate. This keeps collisions spread out and makes the average chain length $\alpha = \frac{n}{m} = \frac{n}{1000}$.
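
A minimal sketch of this approach in Python (the function name `has_repeat` is just for illustration, and Python's built-in `hash` stands in for a hash function satisfying simple uniform hashing):

```python
def has_repeat(numbers, m=1000):
    """Return True if any value occurs more than once, using chaining
    into m slots and searching the chain before each insert."""
    table = [[] for _ in range(m)]   # m chains, initially empty
    for x in numbers:
        chain = table[hash(x) % m]   # chain for x's slot
        if x in chain:               # search the chain before inserting
            return True              # repeat found
        chain.append(x)              # no repeat so far; insert x
    return False
```

For example, `has_repeat([3, 1, 4, 1, 5])` returns `True` on the second occurrence of `1`.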

I am trying to work out the expected running time. From what I understand, we perform up to $n$ insert operations, and the expected time of a search in a chain is $\Theta(1+\alpha)$. Doesn't this make the expected running time $O(n+n\alpha) = O(n+\frac{n^2}{1000}) = O(n^2)$? This seems too much for an expected running time. Am I making a mistake here?
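
To show my work: the $i$-th insert searches a chain whose expected length is $i/m$, so summing over all inserts gives the same bound,

$$\sum_{i=0}^{n-1}\Theta\!\left(1+\frac{i}{m}\right) = \Theta\!\left(n+\frac{n(n-1)}{2m}\right) = \Theta\!\left(n+\frac{n^2}{1000}\right),$$

which for fixed $m = 1000$ is $\Theta(n^2)$.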

omega

1 Answer


If you think of $1000$ as a constant, then yes, the running time is terrible. The idea of hash tables is that if the hash table is moderately bigger than the amount of data stored in it, then operations are very fast. Suppose for example that the hash table has size $m = 2n$. Then each operation takes constant time in expectation.
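
Concretely, with $m = 2n$ the load factor is $\alpha = n/m = 1/2$, so each search-then-insert costs $\Theta(1+\alpha) = \Theta(1)$ in expectation, and checking all $n$ numbers for a repeat takes expected time

$$\Theta\bigl(n(1+\alpha)\bigr) = \Theta(n).$$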

In implementations of hash tables, the hash table expands as more entries are inserted, to ensure that the ratio $\alpha$ is reasonable. Amortized analysis shows that this does not cause a performance hit in terms of the total running time, though individual operations might be slower (this is only a problem if you're writing a real-time application).
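
Here is a minimal sketch of that idea in Python; the class name `ChainedHashSet`, the doubling threshold $\alpha \le 1$, and the use of Python's built-in `hash` are illustrative choices, not a prescription:

```python
class ChainedHashSet:
    """Hash set with chaining that doubles its table whenever the load
    factor n/m would exceed 1, keeping the expected cost per operation O(1)."""

    def __init__(self, initial_slots=8):
        self.m = initial_slots
        self.n = 0
        self.table = [[] for _ in range(self.m)]

    def contains(self, x):
        return x in self.table[hash(x) % self.m]

    def insert(self, x):
        """Insert x; return False if x was already present (a repeat)."""
        if self.contains(x):
            return False
        if self.n + 1 > self.m:              # load factor would exceed 1
            self._resize(2 * self.m)         # double the table
        self.table[hash(x) % self.m].append(x)
        self.n += 1
        return True

    def _resize(self, new_m):
        old_items = [x for chain in self.table for x in chain]
        self.m = new_m
        self.table = [[] for _ in range(self.m)]
        for x in old_items:                  # rehash every stored item
            self.table[hash(x) % self.m].append(x)
```

With this, the duplicate check is `any(not s.insert(x) for x in numbers)` for a fresh `s = ChainedHashSet()`. The total work spent on resizing over $n$ inserts is $O(n)$, since the rehashing costs form a geometric series ($\cdots + n/4 + n/2 + n$); that is the amortized argument referred to above.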

Yuval Filmus