PRF collision search for input smaller than output

Question

Assume a given pseudo-random function $H:\{0,1\}^a\mapsto\{0,1\}^b$ with $b\in[104,256]$ and $b/2<a<b$. We want to exhibit a collision if there is one, which has probability $>63\%$.

We are ready to perform $2^{b/2+1}$ evaluations of $H$ or slightly more, distributed among several search units. But we don't have $2^{b/2}b$ bits of memory, especially accessible by each search unit. How can we practically organize a search?

Because $a<b$, I fail to adapt techniques based on iterating $H$ and cycle detection, like in Paul C. van Oorschot and Michael J. Wiener, Parallel Collision Search with Cryptanalytic Applications, in Journal of Cryptology, 1999.

Update: Samuel Neves points in comment that the problem has been studied, and can make use of §4.2 in the above reference. Still that warrants explanation: how that's done, expected cost as a function of $a$, $b$ and other parameters, like available RAM and number of search units; and reasonable parameterization.

Daniel S · Answer 1 · 2023-04-30T08:23:11.147

It's possible to go down some deep rabbit holes involving rainbow tables on this one, so I'll give a broad brush overview and hope that you will prod me with questions.

As noted in the comments, the basic idea is to apply a conversion $H'=f\circ H$ to make this a $\{0,1\}^a\top\{0,1\}^a$ problem. If $f:\{0,1\}^b\to\{0,1\}^a$ is relatively even then each $a$ bit output has $2^{b-a}$ corresponding inputs, and a collision $H'(x_1)=H'(x_2)$ has $f\circ H(x_1)=f\circ H(x_2)$ and $H(x_1)=H(x_2)$ with probability $2^{a-b}$. Thus if we generate $2^{b-a}$ collisions on $H'$ (or on a family of $H'$ functions), then we expect to encounter a collision on $H$.

One neat way to generate $f$ with uniform outputs is to use a linear function. We can generate a random binary $a\times b$ matrix $M$ with rank $a$ and define $f$ to be the map $\mathbf t\mapsto M\mathbf t$. Note that gives a large family of possible $H'$ functions. This family includes the obvious truncation function and all of the "selection of a subsets of $a$ out of $b$ bits" functions.

The simplest case is the single processor $O(1)$ memory version, where we use Pollard rho or a basic Pollard lambda/kangaroo to generate a single collision in $H'$ using $\approx \sqrt\pi2^{a/2-1/2})$ evaluations of $H'$. On failure, we can try another $H'$ and our expected work is $2^{b-a}$ versions of Pollard each of which takes $\approx\sqrt\pi 2^{a/2-1/2})$ evaluations of $H'$ for total work $\approx \sqrt\pi 2^{b-a/2-1/2}$ function evaluations.

With more memory we can produce more collisions per $H'$ with less work (with increasing memory, we can perhaps produce $k$ collisions with $O(\sqrt k2^{a/2})$ work). In this case we can probably find a collision in $2^{b-a}/k$ versions of your $H'$ search for total work $2^{b-a/2}/\sqrt k$ evaluations of $H'$. In particular, following the (flawed) heuristic analysis of section 4.2 of the van Oorschot and Wiener paper, using $O(w)$ memory we require $2^{b-a}$ collisions which might be producible using $\approx \sqrt{8\times2^a/w}$ function evaluations for work $\approx 2^{b-a/2+3/2}/\sqrt w$.

Note that the "rigorous optimisation" of section 4.2 is an empirical observation rather than a mathematical derivation.

PRF collision search for input smaller than output

1 Answers1

Linked