
I wrote a toy pseudo-random permutation built from a Feistel network that uses blake2b. However, when I look at the distribution of permutations for a small $n$ ($n = 6$), it's clearly not uniform unless many rounds are performed. I was under the impression that 3 or 4 rounds were sufficient. What am I missing?

To generate a random permutation of $n$ elements, the code below works as follows:

  • We find the smallest integer $m$ such that $n \leq 2^{2m}$.
  • We use blake2b as the round function of a Feistel network. Blake2b is keyed with a seed, which determines the random permutation, and is given a salt, which is the round number.
  • We compute a permutation of $2m$-bit integers using this Feistel network.
  • We turn that permutation of $2^{2m}$ elements into one that acts on $n$ elements by following its cycles, that is, by iterating it until it produces a value $< n$.
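The last two steps can be illustrated in isolation. This is a minimal sketch, not the question's code: the helper names `smallest_half_bits` and `cycle_walk` are hypothetical, and a seeded `random.shuffle` of the $2^{2m}$-element domain stands in for the Feistel permutation. It computes $m$ with exact integer arithmetic and shows that cycle walking always yields a permutation of $\{0, \dots, n-1\}$.

```python
import random

def smallest_half_bits(n):
    """Smallest integer m with n <= 2**(2*m), using exact integer arithmetic."""
    return ((n - 1).bit_length() + 1) // 2

def cycle_walk(big_perm, n, i):
    """Restrict a permutation of a larger domain to range(n) by cycle walking.

    Starting from i < n, keep applying big_perm until the value falls below n.
    Because big_perm is a bijection, the cycle through i eventually returns to a
    value < n (at worst i itself), so the loop always terminates.
    """
    j = big_perm[i]
    while j >= n:
        j = big_perm[j]
    return j

n = 6
m = smallest_half_bits(n)           # 2, so the Feistel domain has 2**4 = 16 elements
rng = random.Random(1)              # fixed seed for reproducibility
big_perm = list(range(2 ** (2 * m)))
rng.shuffle(big_perm)               # hypothetical stand-in for the Feistel permutation
small_perm = [cycle_walk(big_perm, n, i) for i in range(n)]
print(m, small_perm)
```

The restriction is a bijection on $\{0, \dots, n-1\}$ whatever the larger permutation is, so any non-uniformity has to come from the distribution of the underlying $2^{2m}$-element permutation.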

To test this code, we draw $100~n!$ pseudo-random permutations and perform a $\chi^2$ test for an increasing number of rounds in the Feistel network. It's clear that with only 3-4 rounds, the generated permutations are not uniformly distributed.

```python
import hashlib
import math
from collections import Counter
from scipy.stats import chi2

class Permutation:

    def __init__(self, n, seed, rounds=3):
        self.n = n
        self.rounds = rounds
        # n_bits is the least integer (at least 1) such that n <= 2**(2*n_bits);
        # exact integer arithmetic avoids floating-point errors in math.log
        self.n_bits = max(1, ((n - 1).bit_length() + 1) // 2)
        self.seed = seed
        self.low_mask = (1 << self.n_bits) - 1
        self.high_mask = self.low_mask << self.n_bits
        self.digest_size = math.ceil(self.n_bits / 8)

    def __hash(self, msg, salt):
        # Keyed blake2b is the round function; the round number is the salt
        h = hashlib.blake2b(msg, digest_size=self.digest_size, key=self.seed, salt=salt)
        return int.from_bytes(h.digest(), byteorder='big') & self.low_mask

    def __round(self, i, r):

        def to_bytes(m):
            # Minimal little-endian encoding, at least one byte
            b = max(1, (m.bit_length() + 7) // 8)
            return m.to_bytes(b, byteorder='little')

        low = self.low_mask & i
        high = (self.high_mask & i) >> self.n_bits
        # One Feistel round: (high, low) -> (low, high ^ F(low))
        low, high = high ^ self.__hash(to_bytes(low), salt=to_bytes(r)), low << self.n_bits
        return high | low

    def __p(self, i):
        # Apply all Feistel rounds to a 2*n_bits-bit integer
        result = i
        for r in range(self.rounds):
            result = self.__round(result, r)
        return result

    def __call__(self, i):
        # Cycle walking: re-apply the permutation until the output lands in [0, n)
        j = self.__p(i)
        while j >= self.n:
            j = self.__p(j)
        return j

n = 6
fact = math.factorial(n)

for rounds in range(3, 10):
    cnt = Counter()
    for w in range(100 * fact):
        p = Permutation(n, seed=bytes('w=%d' % w, encoding='ascii'), rounds=rounds)
        ss = ''.join(str(p(i)) for i in range(n))
        cnt[ss] += 1

    # Expected count is 100 per permutation; each unseen permutation
    # contributes (0 - 100)**2 / 100 = 100 to the statistic
    x2 = sum((x - 100.0)**2 / 100.0 for x in cnt.values()) + 100.0 * (fact - len(cnt))
    print("n = %d,\trounds = %d,\tx2 = %f,\tchi2-cdf = %f" % (n, rounds, x2, chi2.cdf(x2, fact - 1)))
```
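As a control for the $\chi^2$ test above, the same statistic can be computed for a sampler that is assumed to be uniform. This minimal sketch uses Python's `random.shuffle` as that reference sampler (an assumption, not part of the original code), and the helper name `chi2_statistic` is hypothetical. With $n = 6$ there are $n! - 1 = 719$ degrees of freedom, so a uniform sampler should give a statistic near 719, within a few multiples of $\sqrt{2 \cdot 719} \approx 38$.

```python
import math
import random
from collections import Counter

def chi2_statistic(counts, num_cells, expected):
    """Pearson chi-squared statistic against a uniform distribution.

    Cells never observed are absent from `counts`; each of them contributes
    (0 - expected)**2 / expected = expected to the sum.
    """
    observed = sum((c - expected) ** 2 / expected for c in counts.values())
    return observed + expected * (num_cells - len(counts))

n = 6
fact = math.factorial(n)
rng = random.Random(12345)          # fixed seed so the run is reproducible
counts = Counter()
for _ in range(100 * fact):
    perm = list(range(n))
    rng.shuffle(perm)               # reference sampler assumed to be uniform
    counts[tuple(perm)] += 1

x2 = chi2_statistic(counts, fact, 100.0)
dof = fact - 1
print("x2 = %.1f, dof = %d" % (x2, dof))
```

The resulting `x2` can be passed to `scipy.stats.chi2.cdf(x2, dof)` exactly as in the question's loop, which makes it easy to compare the Feistel output against this baseline.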

Edit: as a sanity check, I replaced blake2b with an actual random oracle:

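```python
import os

class RandomOracle:
    """Lazily sampled random function: each new (msg, digest_size, key, salt)
    input gets fresh random bytes, and a repeated input gets the same answer."""

    def __init__(self):
        self.known = {}

    def __call__(self, msg, digest_size, key, salt):
        entry = (msg, digest_size, key, salt)
        if entry not in self.known:
            self.known[entry] = os.urandom(digest_size)
        return self.known[entry]

oracle = RandomOracle()
```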

and this still produces non-uniform results.

Arthur B

2 Answers


The Luby-Rackoff theorem says that a Feistel network with 3 or 4 rounds and pseudorandom round functions is a pseudorandom permutation, but only when the block size is large enough relative to the number of queries. As this paper by Patarin on Feistel networks with 5 or more rounds puts it:

We will denote by $k$ the number of rounds and by $n$ the integer such that the Feistel cipher is a permutation of $2^n$ bits → $2^n$ bits. In [3] it was proved that when $k ≥ 3$ these Feistel ciphers are secure against all adaptative chosen plaintext attacks (CPA-2) when the number of queries (i.e. plaintext/ciphertext pairs obtained) is $m \ll 2^{n/2}$. Moreover when $k ≥ 4$ they are secure against all adaptative chosen plaintext and chosen ciphertext attacks (CPCA-2) when the number of queries is $m \ll 2^{n/2}$ (a proof of this second result is given in [9]).

If your domain size is very small, then your number of queries $m$ can easily exceed the bound. If I understand your code correctly, you're doing cycle-walking on a Feistel network with a 4-bit block, so by the time you reach $\sqrt{2^4} = 4$ queries you've already hit that bound.
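The arithmetic for the question's parameters ($n = 6$) can be spelled out in a few lines. This sketch follows the answer's reading, in which the query bound is $\sqrt{2^{\text{block bits}}}$ with the full block size; the function name `query_bound` is hypothetical, and `half_bits` is the question's $m$ (renamed to avoid a clash with Patarin's $m$, the number of queries).

```python
import math

def query_bound(n):
    """Return (half_bits, block_bits, domain_size, bound) for the question's
    construction, where bound = sqrt(2**block_bits) as in the answer."""
    half_bits = ((n - 1).bit_length() + 1) // 2   # smallest m with n <= 2**(2*m)
    block_bits = 2 * half_bits
    domain_size = 2 ** block_bits
    bound = math.isqrt(domain_size)                # exactly 2**half_bits
    return half_bits, block_bits, domain_size, bound

half_bits, block_bits, domain_size, bound = query_bound(6)
print(block_bits, domain_size, bound)              # 4 16 4
```

Writing out one full permutation of 6 elements already takes at least 6 evaluations of the Feistel network, more than the bound of 4, before any extra cycle-walking steps are counted.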

Incidentally, this is why real-life format-preserving encryption modes, like those in NIST SP 800-38G, use Feistel networks of 8 rounds (FF3) or 10 rounds (FF1). Note that even then, an attack was found against FF3 that required a revision of the mode.

Luis Casillas

The Luby-Rackoff result that 3 rounds of a Feistel structure are enough depends on the round function $f$ being a pseudorandom function. This is a theoretical, idealized model, and since you are using a single, concrete function, the result won't apply.
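To separate the Feistel structure from the round function, here is a minimal sketch of a balanced Feistel network with a pluggable $f$ and its inverse. The names `feistel_encrypt`, `feistel_decrypt` and `weak_f` are hypothetical, and `weak_f` is a deliberately non-pseudorandom round function. The structure yields a permutation for any $f$; the Luby-Rackoff guarantee concerns how close that permutation is to random, which is where the properties of $f$ come in.

```python
def feistel_encrypt(x, round_keys, f, half_bits):
    """Balanced Feistel network on 2*half_bits-bit integers.

    Each round maps (L, R) -> (R, L ^ f(k, R)). This is a bijection for *any*
    round function f; how close it is to a random permutation depends on f.
    """
    mask = (1 << half_bits) - 1
    left, right = x >> half_bits, x & mask
    for k in round_keys:
        left, right = right, left ^ (f(k, right) & mask)
    return (left << half_bits) | right

def feistel_decrypt(y, round_keys, f, half_bits):
    """Inverse network: undo the rounds in reverse order."""
    mask = (1 << half_bits) - 1
    left, right = y >> half_bits, y & mask
    for k in reversed(round_keys):
        left, right = right ^ (f(k, left) & mask), left
    return (left << half_bits) | right

def weak_f(k, r):
    # Hypothetical, deliberately non-pseudorandom round function
    return 3 * r + k

keys = [1, 2, 3]
outputs = [feistel_encrypt(x, keys, weak_f, 2) for x in range(16)]
print(sorted(outputs) == list(range(16)))   # True: still a permutation of 16 elements
```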

kodlu