2

By my math, if you feed in every possible 17-byte value and the output is 16 bytes long, then there must always be 256 possible inputs that produce each output hash. However, looking at some rainbow tables, it seems that either this isn't the case or (more likely) the rainbow tables were incomplete. But some of the other research I've done says that the smallest known MD5 collision happens well beyond 17 bytes in length.

How can this be possible?

coder543

1 Answer

5

Your math is wrong: not the numerical calculation, but your interpretation of it. There are $256^{17}$ possible inputs and $256^{16}$ possible outputs, so on average there are $256$ inputs for each output. But there is no guarantee that every output has exactly $256$: it is in fact overwhelmingly likely that some outputs have more and others have fewer. For example, we don't even know whether every possible output is in fact the MD5 of some message, of any length.
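A scaled-down experiment can make the difference between "on average" and "for every output" concrete. The sketch below is only an illustration under an assumption of my own: it uses the first byte of MD5 as a toy 1-byte hash (the helper `toy_hash` is hypothetical, not part of any library) and feeds it every 2-byte input, which gives the same 256-to-1 ratio of inputs to outputs as in the question.

```python
import hashlib
from collections import Counter


def toy_hash(message: bytes) -> int:
    """Toy stand-in (an assumption for illustration): first byte of the MD5 digest."""
    return hashlib.md5(message).digest()[0]


# Hash all 256**2 two-byte inputs into the 256 possible one-byte outputs.
counts = Counter(toy_hash(i.to_bytes(2, "big")) for i in range(256 ** 2))

# Number of inputs landing on each output value, including outputs never hit.
per_output = [counts.get(h, 0) for h in range(256)]

total_inputs = sum(per_output)
average = total_inputs / 256

print("average inputs per output:", average)
print("fewest inputs on one output:", min(per_output))
print("most inputs on one output:", max(per_output))
```

The average is fixed by counting alone, while the smallest and largest bucket sizes depend on how the hash happens to distribute the inputs. Only the largest bucket is guaranteed to hold at least 256 inputs.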

What you can state for sure is that there exists at least one 128-bit string $H$ and at least 256 distinct 17-byte messages $M_{1}, \ldots, M_{256}$ such that for all $i$, $\mathrm{MD5}(M_i) = H$. Since we don't know any MD5 collision between 17-byte inputs, we are unable to come up with an actual value for any $M_i$ or for $H$; we just know that they exist by the pigeonhole principle.
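The pigeonhole step can be written out as a short counting argument. The symbol $n_H$, the number of 17-byte messages whose MD5 is the 16-byte string $H$, is introduced here only for illustration. Every 17-byte message hashes to exactly one $H$, so

$$\sum_{H} n_H = 256^{17}, \qquad \text{whereas } n_H \le 255 \text{ for every } H \text{ would give } \sum_{H} n_H \le 255 \cdot 256^{16} < 256^{17}.$$

The two statements contradict each other, so at least one $H$ must have $n_H \ge 256$. The argument says nothing about which $H$ that is, nor about how many preimages any other output has.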
