
I would like to hash numbers less than 1000000000, so each one fits in 30 bits. The outputs must not be reversible, so my initial plan was to use SHA-256 (with some salt). However, I would like to keep the output as short as possible while being sure that there are no collisions. Hash functions do not guarantee that, especially when their output is truncated. I could probably check my numbers for collisions, but maybe there is a better way?

Is it possible to reduce the size of the AES output and keep uniqueness, assuming that I don't need to decrypt and that being unable to decrypt is even desirable? For example, could I divide each output into four parts and XOR them together to get a 32-bit output?

Patriot
Cob

1 Answer


For the parameters you gave, the birthday paradox will kick in and you will get collisions, since the set of integers you want to process is much larger than $\sqrt{2^{30}} = 2^{15}$.
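To put rough numbers on this, here is a minimal sketch that evaluates the standard birthday approximation $1 - e^{-n(n-1)/(2N)}$ with $N = 2^{30}$. The function name `collision_probability` and the sample values of $n$ are illustrative choices, and the formula assumes the truncated outputs behave like uniform random 30-bit values.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one collision among n uniform values of `bits` bits."""
    space = 2 ** bits
    # Birthday approximation: 1 - exp(-n(n-1) / (2 * space)).
    # expm1 keeps precision when the exponent is close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


if __name__ == "__main__":
    # Sample input counts (illustrative only).
    for n in (1_000, 2 ** 15, 100_000, 1_000_000):
        print(f"n = {n:>9}: P(collision) ~ {collision_probability(n, 30):.6f}")
```

Around $n = 2^{15}$ inputs the probability is already substantial, and it is close to 1 well before the input set reaches millions of values.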

A very simple countermeasure would be, for input $x_k$, to use bits $(y_{k,1},\ldots,y_{k,30})$ of the AES output $$ (y_{k,1},\ldots,y_{k,128}) := \mathrm{AES}_K(x_k) $$ (for a fixed, randomly chosen secret key $K$; you can use some fixed padding on $x_k$ if you wish).

Say you have processed $x_1,x_2,\ldots,x_m$ with no collision and placed the outputs $$ (y_{1,1},\ldots,y_{1,30}),\ldots,(y_{m,1},\ldots,y_{m,30}) $$ in a sorted list. Say $(y_{m+1,1},\ldots,y_{m+1,30})$ collides with a previously chosen output vector. Then keep checking $$ (y_{m+1,j},\ldots,y_{m+1,j+29}),\quad 2\leq j\leq 99, $$ until you get a 30-bit pattern that doesn't collide with any vector already in the list. Due to the good randomness properties of AES, this will succeed within a few iterations after each collision, with high probability.

You can then store the coordinate index $j_x$ used to generate this vector together with $x_{m+1}$ if reproducibility is required.
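The procedure above can be sketched in Python as follows. To keep the example runnable with only the standard library, the 128-bit block is produced by HMAC-SHA256 truncated to 16 bytes, which is a stand-in for $\mathrm{AES}_K(x)$ and not AES itself. The names `prf_block`, `window` and `assign` are hypothetical, and a `set` replaces the sorted list for the membership check.

```python
import hashlib
import hmac

BLOCK_BITS = 128  # size of one AES block
OUT_BITS = 30     # size of the shortened output


def prf_block(key: bytes, x: int) -> int:
    """Stand-in for AES_K(x): a keyed 128-bit value (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, x.to_bytes(16, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:16], "big")


def window(block: int, j: int) -> int:
    """Bits y_j .. y_{j+29} of the block, taking y_1 as the most significant bit."""
    shift = BLOCK_BITS - (j - 1) - OUT_BITS
    return (block >> shift) & ((1 << OUT_BITS) - 1)


def assign(key: bytes, xs) -> dict:
    """Map each x to (30-bit output, window index j), sliding the window on collision."""
    used = set()
    table = {}
    for x in xs:
        block = prf_block(key, x)
        for j in range(1, BLOCK_BITS - OUT_BITS + 2):  # j = 1 .. 99
            y = window(block, j)
            if y not in used:
                used.add(y)
                table[x] = (y, j)  # keep j so the output can be reproduced
                break
        else:
            raise RuntimeError(f"all 99 windows collided for input {x}")
    return table


if __name__ == "__main__":
    demo_key = b"\x00" * 16  # placeholder; use a random secret key in practice
    table = assign(demo_key, range(10))
    for x, (y, j) in table.items():
        print(x, f"{y:08x}", j)
```

Note that the result depends on the order in which inputs are processed, which is why the index $j$ is stored alongside each input.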

kodlu