How can an attacker find a collision of a keyed digest without knowing the key? Are collisions no longer an issue if we apply a keyed digest?

Question

Same as the title.

How can an attacker find a collision in a keyed digest without knowing the key? Does using a keyed digest eliminate the problem of collisions?

Marc Ilunga · Answer 1 · 2024-06-03T08:13:26.627

Without further restrictions on the definition of “collision,” we can find several examples of easy collisions in keyed hashes. It all comes down to what capabilities the adversary is allowed to have. But first, let's fix a scope.

Definition of a keyed hash: It produces digests $t = h(k,m)$ for a key $k$ and input $m$. All values come from a defined value space (e.g., fixed-length bit strings). In general, the key is hidden from the collision adversary, but the adversary is allowed to get evaluations on chosen inputs. With this out of the way, let's see examples of easy collisions.

The trivial keyed hash: Let $h(k,m) = 0$ for all $m$. This function matches the definition of a keyed hash, but finding collisions is trivial. This example is somewhat useless and only here to illustrate the need for a narrower scope to meaningfully define security.
Polynomial hashes: Universal hash functions (UHFs) are a type of keyed hash used pervasively in cryptography, particularly in AEAD schemes. The best-known examples are Poly1305 and GHASH (GCM). As a simple example of a polynomial hash defined over some field $\mathbb{F}$, consider $m = (m_1,\ldots,m_n)$ with each $m_i \in \mathbb{F}$; then $$h(k,m) = k^n + m_1k^{n-1} + \ldots + m_n.$$ It is well known that finding a blind collision is hard, even for computationally unbounded adversaries. Here, blind means that the adversary has to output a colliding pair $(m, m')$ without any other information. However, collisions are trivial if the adversary can get evaluations over chosen inputs. In particular, evaluation over the single-block input $m = (m_1)$ returns $k + m_1$, which exposes the hash key and allows computing a collision with another input $m'$. This is why, in most uses of UHFs, the output is never exposed; in other words, UHFs achieve a weaker notion of collision resistance.
Swap-PRF of HMAC: This is a more subtle example. HMAC is well known as a PRF and is therefore a keyed hash for which finding a collision is hard. However, there is also the Swap-PRF of HMAC, where the key and input to HMAC are “exchanged.” In other words, $$\text{Swap[HMAC]}(k,m) = \text{HMAC}(m,k).$$ This usage is fairly common (although implicit), especially to combine multiple secrets (i.e., as a key combiner). The Swap-PRF of HMAC is secure as well, but only with appropriate restrictions on the message space. Indeed, in the standardized HMAC, the key input is processed via the Pad-or-Hash method: depending on its length, the key input is either padded with $0$'s or hashed first. In the Swap-PRF case, the HMAC key input is the message $m$, which the attacker controls. Therefore, collisions are trivial: for instance, a short $m$ and the same $m$ with a zero byte appended are padded to the same block, and an $m$ longer than the hash's block size collides with its own hash $H(m)$.
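
To make the last two examples concrete, here is a minimal Python sketch of both collisions. The prime `P`, the helper names (`poly_hash`, `recover_key`, `collide`, `swap_hmac`), and the choice of SHA-256 are assumptions made for illustration; real polynomial MACs such as Poly1305 and GHASH use different fields and encodings.

```python
import hashlib
import hmac
import secrets

# Toy prime field for the polynomial hash (an assumption for this sketch).
P = 2**61 - 1

def poly_hash(k: int, m: list[int]) -> int:
    """h(k, m) = k^n + m_1 k^(n-1) + ... + m_n (mod P), via Horner's rule."""
    acc = 1  # leading coefficient of k^n
    for block in m:
        acc = (acc * k + block) % P
    return acc

def recover_key(oracle) -> int:
    """One chosen-input query on a single block m_1 returns k + m_1."""
    probe = 7
    return (oracle([probe]) - probe) % P

def collide(k: int, m: list[int]) -> list[int]:
    """Given k and a two-block message (a, b), return (a + 1, b - k).

    k^2 + (a + 1)k + (b - k) = k^2 + a k + b, so both messages hash equally.
    """
    a, b = m
    return [(a + 1) % P, (b - k) % P]

def swap_hmac(k: bytes, m: bytes) -> bytes:
    """Swap[HMAC](k, m) = HMAC(m, k): the attacker-chosen m is the HMAC key."""
    return hmac.new(m, k, hashlib.sha256).digest()

# Polynomial hash: the adversary only sees the evaluation oracle.
secret_k = secrets.randbelow(P)
oracle = lambda msg: poly_hash(secret_k, msg)
k_guess = recover_key(oracle)
m = [10, 20]
m_prime = collide(k_guess, m)
assert m_prime != m and oracle(m) == oracle(m_prime)

# Swap[HMAC]: collisions that need no knowledge of the secret k at all.
secret = secrets.token_bytes(32)
short = b"attacker input"
assert swap_hmac(secret, short) == swap_hmac(secret, short + b"\x00")  # zero padding
long_m = b"A" * 100  # longer than SHA-256's 64-byte block, so it is hashed first
assert swap_hmac(secret, long_m) == swap_hmac(secret, hashlib.sha256(long_m).digest())
```

Note the difference between the two cases: the polynomial-hash collision needs one oracle query to learn the key, while the Swap[HMAC] collisions hold for every secret and need no query at all.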

Stronger notions of collision resistance. Going outside the previously defined scope, some functions are not guaranteed to be collision-resistant when the hash key is known. For example, a PRF obtained by applying the CBC construction to a block cipher (i.e., CBC-MAC) admits collisions once the key is known, because the block cipher can be inverted. Finding collisions might be harder when CBC is built on top of a PRF, though.
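
As a rough illustration of the known-key case, the sketch below builds CBC-MAC over a toy block cipher and then computes a colliding message using the decryption direction. The cipher here is a hypothetical 4-round Feistel network with SHA-256 as the round function, chosen only so the example runs with the standard library; it is not AES and is not meant to be secure. All function names are assumptions for this sketch.

```python
import hashlib

BLOCK = 16          # block size in bytes
HALF = BLOCK // 2
ROUNDS = 4

def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Toy round function: truncated SHA-256 of key, round index, and half-block.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:HALF]

def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def cbc_mac(key: bytes, blocks: list[bytes]) -> bytes:
    """Plain CBC-MAC with a zero IV over a list of full blocks."""
    state = bytes(BLOCK)
    for b in blocks:
        state = encrypt_block(key, _xor(state, b))
    return state

def known_key_collision(key: bytes, blocks: list[bytes], new_first: bytes) -> list[bytes]:
    """Return a two-block message starting with new_first that has the same tag.

    With the key known, pick the first block freely, then solve for the second:
    second = D_k(tag) XOR E_k(new_first).
    """
    tag = cbc_mac(key, blocks)
    c1 = encrypt_block(key, new_first)
    return [new_first, _xor(decrypt_block(key, tag), c1)]

key = b"k" * 16
msg = [b"A" * BLOCK, b"B" * BLOCK]
forged = known_key_collision(key, msg, b"C" * BLOCK)
assert forged != msg and cbc_mac(key, forged) == cbc_mac(key, msg)
```

The second block is solved for by running the cipher backwards from the target tag, which is exactly the step that is unavailable when the underlying primitive is a PRF without an efficient inverse.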

On a more positive note, several constructions of keyed hashes are expected to provide collision resistance in the scope of this answer. In particular, collisions should be hard to find in secure PRFs.

kelalaka · Answer 2 · 2024-06-02T17:52:21.300

A generic collision search requires evaluating the keyed hash, and that requires the key. This is not possible since the attackers don't have the key. If they do have the key, collisions are your least concern.
Another way is a total break of the keyed hash, i.e., recovering the key. This is not feasible for HMAC, KMAC, etc.
Collisions, in general, are not important for keyed hashes, since those are usually used as Message Authentication Codes (MACs). The second-preimage attack is what matters: having seen an authenticated message, the attackers need to find another message that produces the same tag. To achieve this, they again need the key; otherwise they are left with random guessing.

a. HMAC is secure even if collisions can be found in the underlying hash function; see Mihir Bellare, “New Proofs for NMAC and HMAC: Security without Collision-Resistance,” 2014.

b. This doesn't mean that one should construct a MAC from an insecure hash function (we expect collision resistance to be the first property to break, as with MD5 and SHA-1).
Even a malicious entity that publishes the keyed hash $HMAC(k,m_1)$ as a commitment to a message $m_1$ and later tries to reveal a different message $m_2$ such that $$HMAC(k,m_1) = HMAC(k,m_2)$$ cannot succeed: finding such a pair is not feasible for HMAC (instantiated with a collision-resistant hash) and KMAC.

HMAC has held up well since 1997, and it is usually used for mutual authentication between two entities that have established keys with some key establishment mechanism.
