In FKS hashing, I wonder whether the size of the table $G[1..n]$ (which records, one entry per bucket, the randomly chosen functions $g_i$) is really strictly $O(n)$. Given that a randomly chosen hash function $g_i$ works for its bucket with probability higher than $0.5$, I can see that the table will mostly hold small values (assuming we start with $g_0$ and, if that doesn't work, try $g_1$, and so on). But I think that as $n \to \infty$ there will be larger values as well, so that the number of bits per entry is $\log^*(n)$ and not fixed. Right?
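
To make this concrete, here is a toy simulation of the mechanism I have in mind (a sketch only, not the real FKS construction; `pick_index` and `success_prob` are names made up for illustration): if each candidate function works independently with probability at least $0.5$, the recorded indices are geometrically distributed, and their maximum over $n$ buckets keeps growing with $n$.

```python
import random

def pick_index(success_prob=0.5, rng=random):
    """Index of the first 'working' function g_0, g_1, g_2, ...,
    assuming each draw succeeds independently with probability success_prob."""
    i = 0
    while rng.random() >= success_prob:  # this g_i failed, try g_{i+1}
        i += 1
    return i

n = 1_000_000
G = [pick_index() for _ in range(n)]  # one entry per bucket, as in the table G[1..n]
print(max(G))  # typically around log2(n) = 20: small, but not bounded by a constant
```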

The reason I ask is that many minimal perfect hash constructions, such as the CHD algorithm, rely on a similar mechanism, and those algorithms claim to use strictly $O(n)$ space.

Thomas Mueller

1 Answer

Space is counted in machine words rather than in bits. A machine word is allowed to contain a member of the universe $U$, and so is $O(\log |U|)$ bits wide. The array $G$ stores hash functions via their index into some collection of universal hash functions. The lecture notes don't mention this, but there are collections of universal hash functions of size $O(|U|^2)$. If you use one of these collections, each entry of $G$ takes $O(\log (|U|^2)) = O(\log |U|)$ bits to store, and so fits in $O(1)$ machine words.
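
For concreteness, here is a sketch using the classic Carter–Wegman family $h_{a,b}(x) = ((ax + b) \bmod p) \bmod m$ with prime $p > |U|$ (my own illustration, not something from the lecture notes): the family has roughly $p^2 = O(|U|^2)$ members, and storing one of them just means storing the pair $(a, b)$.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2^31 - 1; assumed larger than every key in U

def random_hash():
    """Draw a function h_{a,b}(x) = ((a*x + b) % P) % m from the family.
    There are about P^2 = O(|U|^2) choices of (a, b), so the index of the
    chosen function takes O(log |U|) bits: O(1) machine words."""
    a = random.randrange(1, P)
    b = random.randrange(0, P)
    return a, b

def evaluate(ab, x, m):
    a, b = ab
    return ((a * x + b) % P) % m

g = random_hash()           # "storing g" = storing two machine words
print(evaluate(g, 42, 10))  # hash of key 42 into a table of size 10
```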

Why is it meaningful to count space in machine words? First of all, even if you store your set as an array (with logarithmic access time, using binary search), the number of bits is $O(n \log |U|)$ rather than $O(n)$. If we want such an array to be considered taking up linear space, we need to "divide" by $\log |U|$, and the simplest way to accomplish this is to allow each memory cell to contain $O(\log |U|)$ bits (the exact constant doesn't matter).

Another reason we would like to allow cells to contain more than a constant number of bits is so that they can store indices. When you implement binary search, or any other algorithm, you have variables that store indices into the array. These variables take $O(\log n)$ bits each. When discussing algorithms, however, we usually think of an index as taking $O(1)$ space. Again, the way to accomplish this is to allow each memory cell to contain at least $\log n$ bits, as the sketch below illustrates.
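
Here is a minimal sketch of both points at once (`member` is just an illustrative name): the sorted array itself costs $n$ machine words, and the search needs only a constant number of index variables.

```python
def member(S, x):
    """Membership in a sorted array S via binary search.
    S holds n machine words of O(log |U|) bits each: O(n) words,
    but O(n log |U|) bits. The indices lo, hi, mid hold values below n,
    i.e. O(log n) bits: one machine word apiece, so O(1) extra space."""
    lo, hi = 0, len(S) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if S[mid] == x:
            return True
        elif S[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return False

print(member([3, 14, 15, 35, 65, 92], 15))  # True
print(member([3, 14, 15, 35, 65, 92], 16))  # False
```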

On the other hand, if we allow memory cells to be of arbitrary size, then the space measure becomes trivial, since anything can be done in $O(1)$ space (though depending on the exact computation model, you might have to pay in terms of running time). The usual convention in algorithms is to allow each memory cell to contain $O(\log n)$ bits, where $n$ is the input size (in bits). In this particular case, it is more meaningful to allow a machine word to contain $O(\max(\log n, \log |U|))$ bits, and this is the way we measure the space consumption of the algorithm.

Yuval Filmus