Hash function collision importance

Question

Suppose a collision has been found in a certain hash function, such that $H(x_1) = H(x_2)$.

However, $x_1$ and $x_2$ are both a seemingly 'random' collection of bits which do not convey a coherent message and cannot be interpreted in a coherent way.

Does this collision make the hash function $H$ insecure? If so, how can it be exploited, even though the known collision doesn't convey a coherent message? Thanks.

Mikero · Answer 1 · 2021-10-08T18:01:17.560

I prefer to think of cryptography as infrastructure. We should strive to develop infrastructure that minimizes the number of usage caveats. It's not cryptography's job to define what are "meaningful" messages in your application. Can you look at a specific collision $m_1, m_2$ and say with certainty that no application of a hash function will ever assign meaning to these $m_1$ and $m_2$?

Which hash function would you rather use, one with a guarantee "it's hard to find any collisions" or one with a guarantee "it's hard to find a collision, except sometimes among strings which are JPGs of gerbils and gzip files of Shakespeare"? I would not want to drive a car with a warning sticker that said "car may explode if you are driving 88.1 mph with the left turn signal on and the radio tuned to 88.1 FM", even if that warning was correctly narrow and I would never do those 3 things at the same time.

That's why cryptographic security definitions make no exceptions for particular inputs: a collision is any two distinct messages that satisfy $H(x_1) = H(x_2)$, a signature forgery on any message counts as an attack, encryptions of any plaintexts should be indistinguishable, and so on. If you want your cryptography to be used, make sure you strive for a security guarantee that puts everyone at ease.

A second, practical reason to be concerned about an "unstructured" collision in $H$ is that once one is found, it is often only a matter of time before the techniques are extended to find "structured" collisions. For example, finding structured collisions in a random function (using the classic Yuval collision attack) is approximately as difficult as finding unstructured collisions (using standard brute force and the birthday bound).
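The comparison between the two searches can be tried on a toy scale. The following is a minimal sketch, not the answerer's code: it assumes a stand-in hash `h` (SHA-256 truncated to 24 bits so that collisions are cheap), and the helper names `birthday_collision`, `variants`, and `yuval_collision`, as well as the two example sentences, are hypothetical. The structured search follows Yuval's idea of generating many harmless-looking variants of two different messages, here by using one or two spaces between words.

```python
import hashlib
from itertools import count, product

def h(data: bytes) -> bytes:
    """Toy hash (assumption): SHA-256 truncated to 24 bits so collisions are cheap."""
    return hashlib.sha256(data).digest()[:3]

def birthday_collision():
    """Unstructured search: hash counter strings until two digests match."""
    seen = {}
    for i in count():
        m = str(i).encode()
        d = h(m)
        if d in seen:
            return seen[d], m, i + 1  # the two messages and the number of hash calls
        seen[d] = m

def variants(words):
    """Every spelling of a sentence with one or two spaces between adjacent words."""
    for seps in product((" ", "  "), repeat=len(words) - 1):
        yield (words[0] + "".join(s + w for s, w in zip(seps, words[1:]))).encode()

def yuval_collision(benign_words, evil_words):
    """Structured search: one benign variant and one evil variant with equal digests."""
    benign = list(variants(benign_words))
    table = {h(m): m for m in benign}
    for calls, m in enumerate(variants(evil_words), start=len(benign) + 1):
        d = h(m)
        if d in table:
            return table[d], m, calls  # benign text, evil text, number of hash calls
    return None

BENIGN = "I agree to pay Alice the sum of ten dollars on the first day of May".split()
EVIL = "I agree to pay Alice the sum of nine thousand dollars on the first day of May".split()

print(birthday_collision())
print(yuval_collision(BENIGN, EVIL))
```

The benign table here is deliberately larger than the roughly $2^{12}$ entries a 24-bit hash needs, so that the search succeeds reliably. Both searches scale as about $2^{n/2}$ hash calls for an $n$-bit output, differing only by a constant factor.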

score 2 · Answer 2 · answered Oct 09 '21 at 11:32

Even if the collision is completely unstructured and arose from two completely independent evaluations, it would still be a significant concern for a modern hash function. In cryptographic constructions, we regularly assume that the output of a hash function is distributed uniformly at random. For hash functions such as SHA256 there is no proof of this, and we do not even know, for example, whether every output is possible.

Now consider that even for SHA256 (probably the most evaluated hash function of all time), I'd still estimate that fewer than $2^{90}$ outputs have been calculated (and most of those values discarded). For a hash function that produces 256-bit outputs uniformly at random, a collision among $2^{90}$ outputs is about a $2^{-77}$ event. We would have to conclude either that SHA256 does not behave as we hoped or that we are incredibly unlucky (and no one is that unlucky).
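The $2^{-77}$ figure follows from the usual birthday estimate, under the idealized assumption that the $n = 2^{90}$ outputs are independent and uniform over 256-bit strings. There are about $n^2/2$ pairs of outputs, and each pair collides with probability $2^{-256}$, so by the union bound

$$\Pr[\text{collision}] \le \binom{n}{2} \cdot 2^{-256} \approx \frac{2^{180}}{2} \cdot 2^{-256} = 2^{-77}.$$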

Such an event, if it occurred, should be taken as evidence that SHA256 has considerably less collision entropy than we have been assuming. It would suggest that, with significantly less computational power than expected, people could collide interesting data values simply by appending random blocks. In such a hypothetical event, we would know that something is badly wrong with the hash function without knowing why.

score 0 · Answer 3 · answered Oct 09 '24 at 22:44

Yes, a hash function with even a single known collision is insecure. Say I wrote a program that was signed using a Merkle tree (or similar) and was tested to always produce the same result. Suppose an adversary submits such a program, in which $x_1$ produces the correct result and $x_2$ produces an incorrect one. The adversary could then switch between the correct and incorrect results by swapping $x_1$ and $x_2$, and the signature would still verify.

In other words, it would make the attack shown in this answer from Squeamish work. This may not be as impractical as it sounds: software is often signed by creating a meta file containing hashes of all the other files and then signing only the meta file. This is, for instance, how Java .jar files are usually signed.
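The meta-file scheme can be illustrated with a small sketch. Everything below is an assumption made for illustration: `toy_hash` is SHA-256 truncated to 24 bits (so a collision can actually be found), HMAC stands in for a real public-key signature, and the file names and helper functions are hypothetical. It is not the .jar signing format.

```python
import hashlib
import hmac
from itertools import count

def toy_hash(data: bytes) -> str:
    """Deliberately weak stand-in hash (assumption): SHA-256 truncated to 24 bits."""
    return hashlib.sha256(data).hexdigest()[:6]

def build_manifest(files: dict) -> bytes:
    """Meta file: one 'name digest' line per file, in sorted order."""
    lines = (f"{name} {toy_hash(body)}" for name, body in sorted(files.items()))
    return "\n".join(lines).encode()

def sign(manifest: bytes, key: bytes) -> bytes:
    # HMAC is only a stand-in here for a real public-key signature.
    return hmac.new(key, manifest, hashlib.sha256).digest()

def verify(files: dict, manifest: bytes, tag: bytes, key: bytes) -> bool:
    """Check the signature on the meta file, then check every file against it."""
    tag_ok = hmac.compare_digest(sign(manifest, key), tag)
    return tag_ok and build_manifest(files) == manifest

def find_collision():
    """Birthday search for two different file bodies with the same toy hash."""
    seen = {}
    for i in count():
        body = b"payload-%d" % i
        d = toy_hash(body)
        if d in seen:
            return seen[d], body
        seen[d] = body

key = b"demo key"
x1, x2 = find_collision()
signed_files = {"Main.class": x1, "README.txt": b"hello"}
manifest = build_manifest(signed_files)
tag = sign(manifest, key)

# Swap in the colliding body: the meta file and its signature are unchanged.
swapped_files = {**signed_files, "Main.class": x2}
print(verify(signed_files, manifest, tag, key), verify(swapped_files, manifest, tag, key))
```

Only the meta file is covered by the signature, so any two file bodies with the same digest are interchangeable as far as verification is concerned.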

If the hash allows for length extension, then there may be messages $x_1'$ and $x_2'$ that extend $x_1$ and $x_2$ while still producing the same hash. In that case $x_1'$ and $x_2'$ would not consist of just "random" bits: the messages could be extended to contain meaningful content, even though the extension would need to be identical for $x_1$ and $x_2$. SHA-1, SHA-256, and SHA-512 all allow length extension.
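A toy Merkle–Damgård construction shows why a collision survives a common suffix. This is a sketch under stated assumptions, not SHA-256 itself: `toy_md`, `compress`, `pad`, and the tiny 24-bit chaining value are all hypothetical choices made so that a collision is cheap to find. The colliding messages have equal length, and the extended messages include the original padding before the suffix, as in a standard length-extension attack.

```python
import hashlib
from itertools import count

BLOCK = 8          # block size in bytes (toy value)
STATE = 3          # chaining-value size in bytes (toy value, 24 bits)
IV = bytes(STATE)  # all-zero initial chaining value

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function built from truncated SHA-256."""
    return hashlib.sha256(state + block).digest()[:STATE]

def pad(msg: bytes) -> bytes:
    """MD strengthening: 0x80, zero fill to a block boundary, then a length block."""
    padded = msg + b"\x80"
    padded += bytes(-len(padded) % BLOCK)
    return padded + len(msg).to_bytes(BLOCK, "big")

def toy_md(msg: bytes) -> bytes:
    """Iterate the compression function over the padded message."""
    state = IV
    data = pad(msg)
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state

def find_collision():
    """Birthday search over equal-length (one-block) messages."""
    seen = {}
    for i in count():
        m = i.to_bytes(BLOCK, "big")
        d = toy_md(m)
        if d in seen:
            return seen[d], m
        seen[d] = m

x1, x2 = find_collision()
suffix = b"; pay Mallory 1000 dollars"
x1_ext = pad(x1) + suffix  # original message, its padding, then the chosen suffix
x2_ext = pad(x2) + suffix
print(toy_md(x1) == toy_md(x2), toy_md(x1_ext) == toy_md(x2_ext))
```

After the padded originals are processed, both computations hold the same chaining value, and everything that follows (suffix and final padding) is identical, so the extended messages collide as well.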

Regardless of all this, the probability of finding a collision by chance is astronomically small (assuming a large internal state and output size, of course). Moreover, we would never be able to argue that a collision was actually generated by chance. So if a hash collision is published, we should assume that the hash function has been broken somehow. It would point to a weakness in the hash function that could be exploited by further attacks.

score 0 · Answer 4 · answered Oct 11 '24 at 21:10

If your crypto is secure, you shouldn't be able to find collisions. Conversely, if you can find collisions, then we must assume that the crypto is less secure than believed, and the consequences would be costly.

gnasher729

score -1 · Answer 5 · answered Oct 08 '21 at 15:48

$H$ would be insecure iff, given $x_1$, it were possible to algorithmically derive a different $x_2$ with the same hash. As the hash space is much smaller than the data space, there will always be potential collisions; the question is whether we can find them.
