14

I'd like to know if there is a function $f$ from n-bit numbers to n-bit numbers that has the following characteristics:

  • $f$ should be bijective
  • Both $f$ and $f^{-1}$ should be calculable pretty fast
  • $f$ should return a number that has no significant correlation to its input.

The rationale is this:

I want to write a program that operates on data. Some information of the data is stored in a binary search tree where the search key is a symbol of an alphabet. With time, I add further symbols to the alphabet. New symbols simply get the next free number available. Hence, the tree will always have a small bias to smaller keys which causes more rebalancing than I think should be needed.

My idea is to mangle the symbol numbers with $f$ such that they are widely spread over the whole range of $[0,2^{64}-1]$. Since the symbol numbers only matter during input and output which happens only once, applying such a function should not be too expensive.

I thought about one iteration of the Xorshift random number generator, but I don't really know a way to undo it, although it should theoretically be possible.

Does anybody know such a function?
Is this a good idea?

Paresh
  • 3,368
  • 1
  • 21
  • 33
fuz
  • 913
  • 6
  • 20

2 Answers2

6

You can use Fibonacci hashing, namely

$\qquad h_F(k) = k \cdot \frac{\sqrt{5} - 1}{2} - \left\lfloor k \cdot \frac{\sqrt{5} - 1}{2} \right\rfloor$.

For $k=1,\dots,n$ you get $n$ pairwise-distinct numbers (about) evenly spread in $[0,1]$. By scaling to $[1..M]$ and rounding (down), you get about evenly spread numbers in that interval.

For example, these are $h_F(1), \dots, h_F(200)$ scaled to $[0..10000]$ (left original sequence, right sorted):

enter image description here

This is an instance of what Knuth calls multiplicative hashing. For $w$ the computer's word size, $A$ some integer relatively prime to $w$ and $M$ the number of addresses needed, we use

$\qquad h(k) = \left\lfloor M \left( \bigl( k \cdot \frac{A}{w}\bigr) \mod 1 \right) \right\rfloor$

as hashing function. The above follows with $A/w = \phi^{-1} = \frac{\sqrt{5}-1}{2}$ (make sure you can compute it with a sufficient precision). While this also works with any other irrational number besides $\phi^{-1}$, it is one of only two numbers that lead to the "most uniformly distributed" numbers.

Find more in The Art of Computer Programming, Volume 3 by Donald Knuth (chapter 6.4 from page 513 in the second edition). In particular you'll find why the resulting numbers are pairwise distinct (at least if $n \ll M$) and how to compute the inverse function if you use natural $A$ and $w$ instead of $\phi^{-1}$.

Raphael
  • 73,212
  • 30
  • 182
  • 400
1

For $k$-bit inputs, this function works:

$\mathrm{hash}(n) = (n \bmod 2^{\lceil\frac{k}{2}\rceil})\cdot 2^{\lceil\frac{k}{2}\rceil} + n \,\mathrm{div}\, 2^{\lceil\frac{k}{2}\rceil}$

This is reversible, in that $\mathrm{hash}(\mathrm{hash}(n)) = n$, and has non-sequential pairs $\{n,m\}, n < m$, where $\mathrm{hash}(m) < \mathrm{hash}(n)$. Beware that output and input may correlate, especially if your input is in $\{1,\dots,2^{\lceil\frac{k}{2}\rceil}-1\}$.

Ref: Reversible hash function

Reza
  • 2,296
  • 16
  • 17