6

Is there a well know (cryptographically secure) algorithm to shuffle a vector of values, deterministically, without using any randomness, such that it is hard (or as hard as possible) to guess its initial configuration?

The goal is to achieve an effect similar to what a hash function achieves, but instead of producing a digest, it should simply shuffle the data.

I know I can hash the data and use the hash value as seed for a pseudo-random number generator and then apply a standard shuffling algorithm using that. But I wonder if there are more established ways of doing it instead of just adopting an ad-hoc solution.

raugfer
  • 171
  • 4

2 Answers2

5

What about sorting the vector? It's then demonstrably impossible to figure out anything about the original order.

If for some reason an ordered vector is undesirable, we can sort the vector according to the hash of each vector value. The order will look random for a party without knowledge of the hash, and it's still demonstrably impossible to figure out anything about the original order.

fgrieu
  • 149,326
  • 13
  • 324
  • 622
3

I think the generic solution you describe (hash the input vector, seed an RNG with the hash, use the RNG to shuffle the array) is basically the best you can do.

Assuming I've understood your requirements correctly, the idealized version of your shuffling algorithm would be a random oracle defined to take as its input a vector $v$ and to return a uniformly chosen random element of the set of all permutations of $v$.

Now, the defining property of a random oracle is that it returns a random result for newly queried inputs, but will always return the same result if queried with the same input again. To implement something that behaves like that in practice, we need to make sure that all the "random" choices made by the oracle are actually determined based on the input (and possible some constant key permanently associated with a particular instance of the oracle).

In particular, an algorithm such as the Fisher–Yates shuffle can be used to deterministically shuffle an $n$-element vector into a uniformly random permutation of itself, given a random integer $0 ≤ r < n!$ (or, equivalently, an $n$-tuple of random integers $(r_1, \dots, r_n)$ where $0 ≤ r_i < i$ for each $i$) as an additional input. Meanwhile, rejection sampling can be used to obtain (with probability exponentially approaching 1 over time) a uniformly distributed integer $r$ (or each of the $r_i$ separately, if you prefer) from a stream of random bits of unbounded length. And a CSPRNG can be used to turn a random seed of finite length an unbounded bitstream that, while not truly random, is (presumed to be) computationally indistinguishable from a truly random bitstream. And, finally, a key derivation function (typically constructed based on some cryptographic hash function) can be used to convert an arbitrary input bitstring, such as a canonical binary encoding of the input vector $v$, into a pseudorandom seed suitable for initializing the CSPRNG.

Of course, you can also choose to combine some of these steps, e.g. by feeding the input vector (canonically encoded into a bitstring) into a cryptographic sponge such as SHAKE128 and then squeezing the arbitrary-length bitstream directly out of it, but the general principle remains the same: generate a pseudorandom bitstream determined by the input vector and then use it to deterministically shuffle the same vector. And since the shuffling part can be done perfectly (the Fisher–Yates shuffle algorithm implements a one-to-one map from an $n!$ element set of integers (or tuples of integers) to permutations of $n$ element vectors), the only part where we need to rely on cryptographic assumptions about computational indistinguishability is generating the pseudorandom input to the shuffle.

Ilmari Karonen
  • 46,700
  • 5
  • 112
  • 189