Strength of key derived from a hash function considering the birthday attack

Question

When a hash function is used to derive a key from a shared secret (either by simply hashing the shared secret or by using a more robust construction like HKDF), what is the strength of the derived material? If, for example, the shared secret is 256 bits, is the security of the derived result also 256 bits, or is it $2^{n/2}$ (that is, 128 bits in this case), since by the birthday problem it "only" takes $2^{n/2}$ guesses to generate a collision? Here a collision would mean getting the same output, and therefore the same key material, from the KDF.
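For concreteness, the two derivations the question mentions might look like the following minimal Python sketch, which assumes SHA-256 as the hash function. `hkdf_sha256` follows the extract-then-expand structure of HKDF (RFC 5869) using only the standard library; the function names, the empty salt, and the `info` string in the usage lines are illustrative choices, not part of the question.

```python
import hashlib
import hmac
import os


def derive_by_hashing(shared_secret: bytes) -> bytes:
    """Option 1: simply hash the shared secret (SHA-256 assumed)."""
    return hashlib.sha256(shared_secret).digest()


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """Option 2: HKDF-SHA256, following the extract-then-expand steps of RFC 5869."""
    hash_len = hashlib.sha256().digest_size  # 32 bytes
    if length > 255 * hash_len:
        raise ValueError("HKDF cannot output more than 255 * HashLen bytes")
    if not salt:
        salt = bytes(hash_len)  # RFC 5869: a missing salt means HashLen zero bytes
    # Extract: condense the input keying material into a pseudorandom key (PRK).
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until `length` bytes.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


if __name__ == "__main__":
    secret = os.urandom(32)  # a hypothetical 256-bit shared secret
    print(derive_by_hashing(secret).hex())
    print(hkdf_sha256(secret, salt=b"", info=b"example context", length=32).hex())
```

Either way, the derived key is a fixed, publicly known function of `secret`, so an attacker after the derived key is effectively searching for the secret itself.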

Squeamish Ossifrage · Answer 1 · 2019-11-15T06:30:39.170

You, the adversary, have a way to test whether a candidate key $k_i$ might be the true secret key $k^*$. Since $k^*$ is uniformly distributed among all 256-bit keys, each candidate $k_i$ has probability $\Pr[k_i = k^*] = 1/2^{256}$ of being correct. No matter what order you try things in (as long as you don't repeat a guess), the expected number of guesses is $$\sum_{i=1}^{2^{256}} i \cdot \Pr[k_i = k^*] = \sum_{i=1}^{2^{256}} i \cdot \frac{1}{2^{256}} = \frac{2^{256} (2^{256} + 1)/2}{2^{256}} = 2^{255} + {\textstyle\frac12}.$$ (You may, of course, encounter false positives, which you will have to deal with; they do not affect this expected number of guesses before finding the true key, only the probability of falsely thinking you have found the true key when you haven't.)
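The expectation above can be checked at toy scale. This sketch (an illustration, not part of the original answer) shrinks the key to `bits` bits so exhaustive search finishes, assumes the test for a candidate is comparing its SHA-256 hash against the known $H(k^*)$, and guesses keys in numerical order. The exact sum comes out to $(2^{b}+1)/2$ for $b$-bit keys, and the simulated average lands near it.

```python
import hashlib
import random
from fractions import Fraction


def expected_guesses_exact(bits: int) -> Fraction:
    """Exact sum of i * Pr[k_i = k*] for a key uniform over 2**bits values."""
    n = 2 ** bits
    return sum(Fraction(i, n) for i in range(1, n + 1))


def brute_force(bits: int, rng: random.Random) -> int:
    """Guess keys 0, 1, 2, ... until H(candidate) == H(k*); return how many guesses it took."""
    nbytes = (bits + 7) // 8
    secret = rng.randrange(2 ** bits)
    target = hashlib.sha256(secret.to_bytes(nbytes, "big")).digest()  # the known h = H(k*)
    for guesses, candidate in enumerate(range(2 ** bits), start=1):
        if hashlib.sha256(candidate.to_bytes(nbytes, "big")).digest() == target:
            return guesses
    raise AssertionError("unreachable: the secret lies in the search space")


if __name__ == "__main__":
    bits = 12
    print(expected_guesses_exact(bits))  # 4097/2
    rng = random.Random(1)
    runs = [brute_force(bits, rng) for _ in range(200)]
    print(sum(runs) / len(runs))  # near 2048.5; the exact value depends on the seed
```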

Why don't collisions and the birthday paradox appear in this analysis? Collisions are relevant when you're looking for any $k_i \ne k_j$ such that $H(k_i) = H(k_j)$, but you don't care what either $k_i$ or $k_j$ is. Recovering the key is different: you need a match with the one specific value $H(k^*)$, and each guess achieves that with probability only $1/2^{256}$. As you try $k_1, k_2, \dots$ searching for a collision, each new key could potentially collide with every previous key, so the probability of a collision among some pair of $n$ keys (here $n$ counts keys tried, not output bits) grows quadratically: specifically, for $n \ll 2^{128}$, it is $$1 - \biggl(1 - \frac{1}{2^{256}}\biggr) \biggl(1 - \frac{2}{2^{256}}\biggr) \cdots \biggl(1 - \frac{n-1}{2^{256}}\biggr) \approx \frac{n(n-1)}{2 \cdot 2^{256}} \approx \frac{n^2}{2^{257}}.$$ (proof; more on birthday paradox)
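To see the difference between a collision and a key recovery concretely, here is a toy sketch that uses SHA-256 truncated to a few bits as a stand-in for a small hash (the truncation and all names are assumptions for illustration). `collision_probability` evaluates the exact birthday product $1 - \prod_{i=1}^{n-1}(1 - i/N)$ for an output space of size $N$, and `find_collision` finds some colliding pair after roughly $\sqrt{N}$ tries. The attacker has no control over which pair collides, so the result says nothing about any particular $k^*$.

```python
import hashlib


def collision_probability(n: int, space: int) -> float:
    """Exact chance that n uniform draws from `space` values contain at least one repeat."""
    p_distinct = 1.0
    for i in range(1, n):
        p_distinct *= 1 - i / space  # draw i+1 must avoid the i values already seen
    return 1 - p_distinct


def truncated_hash(key: int, out_bits: int) -> int:
    """SHA-256 of an 8-byte key, keeping only the top out_bits bits (a toy small hash)."""
    digest = hashlib.sha256(key.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - out_bits)


def find_collision(out_bits: int):
    """Try keys 0, 1, 2, ... until two distinct keys share a truncated hash."""
    seen = {}
    key = 0
    while True:
        h = truncated_hash(key, out_bits)
        if h in seen:
            return seen[h], key, key + 1  # colliding pair and number of keys hashed
        seen[h] = key
        key += 1


if __name__ == "__main__":
    space = 2 ** 24
    n = 2 ** 10
    print(collision_probability(n, space), n * (n - 1) / (2 * space))
    a, b, tries = find_collision(24)
    print(a, b, tries)  # tries is on the order of sqrt(2**24) = 4096, far below 2**24
```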

Note: There may be a batch advantage in the multi-user setting. If the way you test a candidate key $k_i$ is by checking whether $H(k_i) = h$, where you know $h = H(k^*)$, and you actually have many target keys $k^*_j$ with hashes $h_j = H(k^*_j)$, you can save cost with a batch attack such as computing Oechslin's rainbow tables in parallel, for a total expected cost of about $2^{256}\!/t$ trials to find the first of $t$ targets, in a total expected time of as little as about $2^{256}\!/t^3$ sequential evaluations of $H$ if you parallelize it at least about $t^2$ ways.
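The multi-target saving can be seen in a simplified form. The sketch below is not a rainbow-table attack (it skips the time-memory tradeoff and the parallelism); it just keeps the $t$ known unsalted hashes in a set, so each evaluation of $H$ tests one candidate against every target at once. With $t$ distinct targets among $2^{b}$ keys, the expected number of evaluations to the first hit is $(2^{b}+1)/(t+1)$, which matches the $2^{256}\!/t$ scaling at full size. Key sizes, SHA-256 as $H$, and the function names are assumptions for illustration.

```python
import hashlib
import random


def H(key: int, nbytes: int) -> bytes:
    """The unsalted hash used as the key test (SHA-256 assumed)."""
    return hashlib.sha256(key.to_bytes(nbytes, "big")).digest()


def first_hit_unsalted(bits: int, t: int, rng: random.Random) -> int:
    """Evaluations of H until any one of t unsalted target keys is recovered."""
    nbytes = (bits + 7) // 8
    user_keys = rng.sample(range(2 ** bits), t)   # t distinct user keys k*_j
    targets = {H(k, nbytes) for k in user_keys}   # the attacker knows every h_j = H(k*_j)
    for candidate in range(2 ** bits):
        # A single evaluation of H is checked against all t targets at once.
        if H(candidate, nbytes) in targets:
            return candidate + 1
    raise AssertionError("unreachable: every target lies in the search space")


if __name__ == "__main__":
    bits, t = 12, 15
    rng = random.Random(0)
    runs = [first_hit_unsalted(bits, t, rng) for _ in range(300)]
    print(sum(runs) / len(runs))  # near (2**12 + 1) / 16, about 256
```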

However, if each user had used a different function, that is, if you have $h_j = H_{s_j}(k^*_j)$ with a unique salt $s_j$ per user, then the multi-target advantage vanishes, and you're back to an expected cost of about $2^{256}$.
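With a per-user salt, the shortcut of checking one hash against every target fails, because a candidate has to be hashed separately under each user's $H_{s_j}$. In this toy sketch, $H_s(k)$ is taken to be SHA-256 of the salt concatenated with the key (an assumption; the answer does not fix a construction), and the counter tallies evaluations of $H$. Hitting the first target still takes about $2^{b}/(t+1)$ candidates, but each candidate now costs $t$ evaluations, so the total is back to roughly $2^{b}$.

```python
import hashlib
import os
import random


def H_salted(salt: bytes, key: int, nbytes: int) -> bytes:
    """Hypothetical per-user function H_s(k) = SHA-256(salt || k)."""
    return hashlib.sha256(salt + key.to_bytes(nbytes, "big")).digest()


def evaluations_to_first_hit_salted(bits: int, users: list) -> int:
    """users holds (s_j, h_j) pairs; count evaluations of H until some h_j is matched."""
    nbytes = (bits + 7) // 8
    evaluations = 0
    for candidate in range(2 ** bits):
        for salt, h in users:
            # The salts differ, so one hash evaluation cannot be reused across users.
            evaluations += 1
            if H_salted(salt, candidate, nbytes) == h:
                return evaluations
    raise AssertionError("unreachable: every target lies in the search space")


if __name__ == "__main__":
    bits, t = 12, 15
    nbytes = (bits + 7) // 8
    rng = random.Random(0)
    users = []
    for _ in range(t):
        salt = os.urandom(16)            # unique random salt s_j
        key = rng.randrange(2 ** bits)   # user key k*_j
        users.append((salt, H_salted(salt, key, nbytes)))
    print(evaluations_to_first_hit_salted(bits, users))
```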
