Do you really need a KDF when you have a PRF?

Question

My understanding is that a KDF is like a PRF, except that it has a preliminary step that "extracts" entropy. It is thus needed when the secret is not uniformly distributed (for example, the output of ECDH is reduced modulo a number that is not a power of 2, and is thus non-uniform when represented as a bit string).

Yet, I am not sure there are modern symmetric algorithms that require the secret to be uniformly distributed. Sure their security might be defined with a uniform key, but using a KDF feels a bit like a cheat since there is no new entropy added, and it's not slow either. Am I missing something?

Squeamish Ossifrage · Answer 1 · 2019-06-11T23:40:11.697

Do you really need a KDF when you have a PRF?

Maybe.

The security contract of a PRF requires that the key be a uniform random bit string. If you have a DH secret or a diceware phrase, then what you have is not uniform random as a bit string, and so your contractual obligations are not satisfied and the PRF may give no security. You need something a little stronger to hash a DH secret or diceware pass phrase into a uniform random bit string, which is what the ‘extract’ step does.

Here's an example. Let $F_k$ be a PRF and let $H$ be a random oracle. Define $$F'_{x,y}(s) := F_{H(u)}(s), \quad u = (y^2 - x^3 - 7) \bmod p,$$ where $p = 2^{256} - 2^{32} - 977$. $F'$ is also a PRF, because when $(x, y)$ is uniformly distributed among 512-bit strings, the probability of collisions in $u$ is small. But if $(x, y)$ is the outcome of a DH key agreement on secp256k1, then $y^2 \equiv x^3 + 7 \pmod p$ so $u$ is always zero and hence $F'_{x,y}(s) = F_{H(0)}(s)$ for all $x$ and $y$—that is, the DH key agreement doesn't affect the output at all, so the system gives no security whatsoever!
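The contrived example can be checked numerically. The following is a minimal sketch, assuming SHA-256 as a stand-in for the random oracle $H$ and HMAC-SHA256 as a stand-in for the PRF $F$; the helper names `curve_point` and `f_prime` are made up for illustration. It finds two distinct points on secp256k1 and shows that $F'$ gives the same output for both, while a pair $(x, y)$ that is not on the curve gives a different one.

```python
import hashlib
import hmac

# secp256k1 field prime, as given in the text.
P = 2**256 - 2**32 - 977

def curve_point(x_start):
    """Return an affine point (x, y) on y^2 = x^3 + 7 (mod P), searching upward from x_start."""
    x = x_start
    while True:
        rhs = (pow(x, 3, P) + 7) % P
        # P % 4 == 3, so if rhs is a square, rhs^((P+1)/4) mod P is a square root of it.
        y = pow(rhs, (P + 1) // 4, P)
        if (y * y) % P == rhs:
            return x, y
        x += 1

def f_prime(x, y, s):
    """Contrived F'_{x,y}(s) = F_{H(u)}(s); SHA-256 and HMAC-SHA256 are assumed stand-ins."""
    u = (y * y - pow(x, 3, P) - 7) % P
    key = hashlib.sha256(u.to_bytes(32, "big")).digest()
    return hmac.new(key, s, hashlib.sha256).digest()

p1 = curve_point(1)
p2 = curve_point(p1[0] + 1)   # a second, different curve point
msg = b"message"

out_curve_1 = f_prime(p1[0], p1[1], msg)
out_curve_2 = f_prime(p2[0], p2[1], msg)
out_generic = f_prime(12345, 67890, msg)   # arbitrary pair that is not on the curve

print(out_curve_1 == out_curve_2)   # both curve points collapse to the key H(0)
print(out_curve_1 == out_generic)
```

Any point on the curve forces $u = 0$, so the "key" carries no information about which point was agreed upon.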

If what you have is a uniform random bit string as a master key already, and you just need to derive subkeys from it with distinct labels, then a PRF is all you need, and the ‘expand’ step is, in fact, a PRF. If you look closely at HKDF, for example, you will see that $$\operatorname{HKDF-Expand}_k(L) = \operatorname{HMAC-}\!H_k(L \mathbin\| \mathtt{0x01}),$$ if the desired output size matches the output size of $H$; if more bytes are requested, call $\operatorname{HMAC-}\!H$ repeatedly, each time on the previous output chunk followed by $L$ and a one-byte counter that increments from $\mathtt{0x01}$, and concatenate the outputs. You could, of course, substitute for $\operatorname{HMAC-}\!H$ your favorite PRF like keyed BLAKE2b, KMAC128, KangarooTwelve, Kravatte, prefix-keyed Gimli-Hash, or ChaCha $\circ$ Poly1305, and while it wouldn't be HKDF it would be just as good for deriving subkeys.
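As a minimal sketch of the expand step just described, assuming HMAC-SHA256 and following the construction in RFC 5869 (the function name `hkdf_expand` and the sample key and label are illustrative, not a library API):

```python
import hashlib
import hmac

def hkdf_expand(prk, info, length, hash_name="sha256"):
    """Expand a uniform key `prk` into `length` bytes bound to the label `info`."""
    hash_len = hashlib.new(hash_name).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for a one-byte counter")
    okm = b""
    block = b""      # the chunk before the first one is the empty string
    counter = 1
    while len(okm) < length:
        # Each chunk is HMAC(prk, previous chunk || label || counter byte).
        block = hmac.new(prk, block + info + bytes([counter]), hash_name).digest()
        okm += block
        counter += 1
    return okm[:length]

# Illustrative inputs: a placeholder 32-byte master key and a label.
prk = bytes(range(32))
single = hkdf_expand(prk, b"label", 32)
longer = hkdf_expand(prk, b"label", 80)
```

With a 32-byte request this is exactly one HMAC call on `label || 0x01`, which is the formula above; longer requests only chain further calls.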

My understanding is that a KDF is like a PRF, except that it has a preliminary step that "extracts" entropy. It is thus needed when the secret is not uniformly distributed (for example, the output of ECDH is reduced modulo a number that is not a power of 2, and is thus non-uniform when represented as a bit string).

Correct: The purpose of collecting the concepts of extract-with-salt and expand-with-label into a single term KDF is that often the two steps are close by—you have a DH secret, or a master diceware phrase, which is not uniform in bit strings but which has high entropy, and you want to derive many secret keys from it for different labeled purposes.

Yet, I am not sure there are modern symmetric algorithms that require the secret to be uniformly distributed. Sure their security might be defined with a uniform key, but using a KDF feels a bit like a cheat since there is no new entropy added, and it's not slow either. Am I missing something?

The security contract for many cryptographic primitives requires the input to be uniform random. Violate the contract, and the security may evaporate. Obviously you can choose the input by a pseudorandom function under a uniform random key—if that broke the composition, merely using the composition would then serve as a distinguisher for the pseudorandom function. But it's not a priori clear whether highly structured keys like the bit encodings of points on particular curves might be exploitable in downstream cryptosystems. Maybe you'll be bitten by a bad interaction like in the contrived secp256k1 example above; maybe you won't be. Using HKDF-Extract, or otherwise hashing the input, renders these concerns moot so you don't even have to think about them.
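For reference, the extract step is a single keyed hash of the structured secret. A minimal sketch, assuming HMAC-SHA256 as in RFC 5869; the function name `hkdf_extract`, the salt, and the placeholder secret are illustrative only:

```python
import hashlib
import hmac

def hkdf_extract(salt, ikm, hash_name="sha256"):
    """Hash non-uniform input keying material `ikm` into a fixed-length pseudorandom key."""
    hash_len = hashlib.new(hash_name).digest_size
    if not salt:
        # RFC 5869 uses a string of zero bytes when no salt is supplied.
        salt = bytes(hash_len)
    # The salt is the HMAC key; the structured secret is the message.
    return hmac.new(salt, ikm, hash_name).digest()

# Placeholder standing in for the encoding of a DH shared secret (hypothetical value).
dh_secret = (123456789).to_bytes(32, "big")

prk = hkdf_extract(b"example-salt", dh_secret)
prk_no_salt = hkdf_extract(b"", dh_secret)
```

The output `prk` is what you would then feed to the expand step or to any PRF as its key.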

Luis Casillas · Answer 2 · 2019-06-10T22:07:32.390

Yet, I am not sure there are modern symmetric algorithms that require the secret to be uniformly distributed. Sure their security might be defined with a uniform key, [...]

While people normally say that the key of a symmetric algorithm must be chosen uniformly at random, that's strictly speaking false. What the math actually says is that if the key is chosen uniformly at random and kept secret, then the cryptographic scheme has such-and-such security properties. This doesn't preclude that the system could be secure with some other, non-uniform method of choosing keys. ("If P then Q" doesn't imply "If Q then P.")

The reason we want a key derivation step is that sampling keys from a pseudorandom distribution (i.e., a distribution that cannot be efficiently distinguished from the uniform distribution) is one such method. These keys are, strictly speaking, not uniform random, but as Squeamish Ossifrage notes in passing, if an adversary can exploit this fact to attack the scheme when we use the KDF to generate keys for it, then that can be translated into an attack on the KDF itself: one that shows the generated keys can in fact be distinguished from uniform random.

Or in simpler words, pseudorandom keys aren't uniform random keys but can substitute for them, and that's why we use them—they make the "if the key is uniform random, then [security property]" statements carry over indirectly. In very inelegant English:

If it is the case that:
- if the key is uniform random
- then [security property],
Then:
- if the key is pseudorandom
- then [security property].
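
Read as a hybrid argument, this implication can be made quantitative. A sketch, in notation assumed here rather than taken from the answer: let $\Pi$ be the scheme, $A$ an adversary against it, $K$ the KDF's output on a secret input with enough entropy, and $U$ a uniform random key of the same length. For security games in which success can be checked efficiently, there is a distinguisher $B$ that runs $A$ against $\Pi$ keyed with its challenge string and outputs 1 exactly when $A$ succeeds, so that

$$\Pr[A \text{ breaks } \Pi_K] - \Pr[A \text{ breaks } \Pi_U] = \Pr[B(K) = 1] - \Pr[B(U) = 1] \le \mathrm{Adv}_{\mathrm{KDF}}(B).$$

If breaking $\Pi$ under uniform keys is hard and the KDF output is indistinguishable from uniform, then breaking $\Pi$ under derived keys is hard as well.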

[...] but using a KDF feels a bit like a cheat since there is no new entropy added, and it's not slow either. Am I missing something?

ECDH is a scenario where we've got a shared secret value that has plenty of entropy. That means there's no reason to "add entropy" to it or to slow down the key derivation. Those concerns suggest you're mixing up password-based KDFs (where they are common themes) with non-password-based ones.

But as you've said, the ECDH secret is not pseudorandom in the sense of being indistinguishable from a uniform distribution over bit strings, which means that if you use it directly as a symmetric key, you can't avail yourself of security arguments like the above.
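To make the non-uniformity concrete, here is a small sketch, assuming the secp256k1 curve $y^2 = x^3 + 7$ over the prime $p = 2^{256} - 2^{32} - 977$ and a shared secret encoded as the point's x-coordinate (a common convention, assumed here). It estimates what fraction of candidate values are x-coordinates of curve points at all; the helper name `is_x_coordinate` and the sample range are illustrative.

```python
P = 2**256 - 2**32 - 977

def is_x_coordinate(x):
    """True if some y satisfies y^2 = x^3 + 7 (mod P), i.e. x is the x-coordinate of a curve point."""
    rhs = (pow(x, 3, P) + 7) % P
    # Euler's criterion: a nonzero rhs is a square mod P iff rhs^((P-1)/2) == 1.
    return rhs == 0 or pow(rhs, (P - 1) // 2, P) == 1

SAMPLES = 2000
valid = sum(is_x_coordinate(x) for x in range(1, SAMPLES + 1))
fraction = valid / SAMPLES

# Bit strings encoding values of P or above can never occur either.
unused_high_values = 2**256 - P

print(fraction)
print(unused_high_values)
```

Roughly half of all candidate values are never the x-coordinate of any point, so a distinguisher only needs to test whether a 256-bit string passes this check to tell raw ECDH outputs from uniform strings with noticeable advantage.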
