
What would be an easy way to use the hexadecimal output of a hash function (like MD5) to generate n unique numbers from, say, 0 to 15? Of course I could generate n numbers by using each digit of the digest, but I need those numbers to be unique.

2 Answers


You should use the combinatorial number system to bijectively map an integer in the range 0 to $\binom{16}{n}-1$ to an $n$-element subset of the first 16 numbers. I wrote a related answer on mapping messages to error positions in the Niederreiter cryptography system.
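As a rough illustration of that mapping, here is a minimal sketch of greedy unranking in the combinatorial number system. The names `unrank_subset`, `subset_from_digest` and `universe` are made up for this example, and reducing the digest modulo $\binom{16}{n}$ is an assumption about how to get the integer into range (it adds a tiny bias, since $2^{128}$ is not a multiple of $\binom{16}{n}$).

```python
from hashlib import md5
from math import comb


def unrank_subset(rank: int, n: int, universe: int = 16) -> list[int]:
    """Map rank in [0, C(universe, n)) to an n-element subset of range(universe)."""
    if not 0 <= rank < comb(universe, n):
        raise ValueError("rank out of range for this subset size")
    subset = []
    candidate = universe - 1
    for k in range(n, 0, -1):
        # Greedy step: the next-largest element is the largest c with C(c, k) <= rank.
        while comb(candidate, k) > rank:
            candidate -= 1
        subset.append(candidate)
        rank -= comb(candidate, k)
        candidate -= 1
    return subset[::-1]  # ascending order


def subset_from_digest(message: bytes, n: int, universe: int = 16) -> list[int]:
    """Hypothetical helper: derive the rank from an MD5 digest (assumed reduction by modulo)."""
    digest = int.from_bytes(md5(message).digest(), "big")
    return unrank_subset(digest % comb(universe, n), n, universe)


print(subset_from_digest(b"Hello World", 5))
```

Every rank gives a different subset, so the result is a set of $n$ unique numbers; if an ordering is also needed, it has to come from elsewhere.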

Daniel S

TL;DR: represent the range as a list and apply a Fisher-Yates shuffle to it, using the hash value as the seed.


First of all, MD5 generates 16 bytes that are generally well distributed. It doesn't generate hexadecimals; those are just used to represent the bytes in a textual form.

You can generate values in a range using the Fisher-Yates shuffle. However, this requires you to first draw a number in the range [0..n), then [0..n - 1), and so on down to [0..2). Generally, getting random numbers in a range from bits is tricky because most algorithms consume a non-deterministic number of bits.
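To make the "non-deterministic number of bits" point concrete, here is a small sketch of rejection sampling from a bit stream. `draw_below` and the bit iterator are illustrative assumptions, not part of any library; with a fixed 128-bit digest as the source, the iterator could simply run dry.

```python
from typing import Iterator


def draw_below(bound: int, bits: Iterator[int]) -> int:
    """Draw a uniform integer in [0, bound) by rejection sampling from a stream of bits."""
    width = (bound - 1).bit_length()
    while True:
        candidate = 0
        for _ in range(width):
            candidate = (candidate << 1) | next(bits)
        if candidate < bound:
            return candidate
        # Otherwise reject and read another `width` bits; how often this
        # happens depends on the bits themselves.


# Drawing below 3 needs 2 bits per attempt; the first attempt (0b11) is rejected.
stream = iter([1, 1, 1, 0, 1])
print(draw_below(3, stream))
```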

In this case you can instead scale the seed to a single value in [0, n!) and then derive the required number for each range from that value by repeated modulo and division.
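One way to write this down, using notation that is only an assumption for this sketch ($s$ for the 128-bit seed, $v_i$ for the running value, $j_i$ for the swap index at position $i$):

$$v_{n-1} = \left\lfloor \frac{s \cdot n!}{2^{128}} \right\rfloor, \qquad j_i = v_i \bmod (i+1), \qquad v_{i-1} = \left\lfloor \frac{v_i}{i+1} \right\rfloor \quad \text{for } i = n-1, \dots, 1.$$

Since $n! = \prod_{i=1}^{n-1} (i+1)$, each $j_i$ lies in $\{0, \dots, i\}$ and every value in $[0, n!)$ yields a different sequence of indices, hence a different permutation.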

So you'll get something like the following Python code, which implements the shuffle based on a 128-bit seed:

from decimal import Decimal, getcontext
from hashlib import md5
import math

# Set the precision for decimal operations
getcontext().prec = 50


def fisher_yates_shuffle(n, seed):
    # Guard clause: the seed must be an integer that fits in 128 bits
    # (checking bit_length() == 128 would reject every hash whose top bit is 0)
    if not isinstance(seed, int) or not 0 <= seed < 2**128:
        raise ValueError("Seed must be a 128-bit integer.")

    # Initialize the array from 0 to n-1
    arr = list(range(n))

    # Convert the seed to a decimal in [0, 1)
    random_decimal = Decimal(seed) / Decimal(2**128)

    # Calculate x = n!, the number of permutations of n elements
    x = math.factorial(n)

    # Pick a starting point within [0, x)
    current_vector = random_decimal * Decimal(x)

    for i in range(n - 1, 0, -1):
        # Generate a value for i-th index using current_vector
        i_value = int(current_vector % Decimal(i + 1))

        # Perform the shuffle step
        arr[i], arr[i_value] = arr[i_value], arr[i]

        # Update the current_vector for the next iteration
        current_vector //= Decimal(i + 1)

    return arr


# Generate a 128-bit seed from the MD5 hash of "Hello World"
seed_str = "Hello World"
md5_hash = md5(seed_str.encode('utf-8')).digest()
seed = int.from_bytes(md5_hash, byteorder='big')

# Number of elements in the array (max)
n = 16

# Perform the Fisher-Yates shuffle
result = fisher_yates_shuffle(n, seed)

print("Shuffled array:", result)

Written with the help of ChatGPT, but altered & logically validated.
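The question asks for n unique numbers from 0 to 15 rather than a full permutation. A minimal sketch of one way to get there, assuming it is acceptable to shuffle all 16 values and keep the first few: the names `unique_numbers_from_hash`, `count` and `universe` are invented for this example, and the scaling uses plain integer arithmetic instead of `Decimal`, which is a substitution rather than the code above.

```python
from hashlib import md5
from math import factorial


def unique_numbers_from_hash(message: bytes, count: int, universe: int = 16) -> list[int]:
    """Return `count` distinct values from range(universe), derived from MD5(message)."""
    if not 0 <= count <= universe:
        raise ValueError("count must be between 0 and universe")
    seed = int.from_bytes(md5(message).digest(), "big")  # 0 <= seed < 2**128
    # Scale the seed into [0, universe!) with exact integer arithmetic.
    vector = (seed * factorial(universe)) >> 128
    arr = list(range(universe))
    for i in range(universe - 1, 0, -1):
        # Peel off the swap index for position i, then shrink the vector.
        vector, j = divmod(vector, i + 1)
        arr[i], arr[j] = arr[j], arr[i]
    return arr[:count]


print(unique_numbers_from_hash(b"Hello World", 5))
```

This only makes sense while `universe!` stays well below $2^{128}$, which is comfortably the case for 16 values.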

Note that some bias is introduced, somewhere around $\frac{1}{2^{88}}$ for 16 values. Usually that's considered negligible, but it pays to be careful when it comes to cryptography.
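For a rough feel of the magnitude, here is a sketch that computes one possible bound: if a 128-bit seed is scaled exactly into [0, n!), each permutation receives either the floor or the ceiling of $2^{128}/n!$ seeds, so its probability deviates from $1/n!$ by a relative factor of at most $n!/2^{128}$. This is an assumption about how to measure bias and need not be the measure behind the figure quoted above; `relative_bias_bound_log2` is a made-up name.

```python
from math import factorial, log2


def relative_bias_bound_log2(n: int, seed_bits: int = 128) -> float:
    """log2 of n! / 2**seed_bits, a bound on the relative deviation per permutation."""
    return log2(factorial(n)) - seed_bits


print(f"log2(16!) = {log2(factorial(16)):.2f}")
print(f"log2(bound) for n = 16: {relative_bias_bound_log2(16):.2f}")
```

Either way the deviation is far below anything observable in practice for 16 values.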

Maarten Bodewes