How strong if I combine two hash functions, such as MD5(SHA256(input))?

Question

If I try to do MD5(SHA256(input)), what is the strength of this so-called double hashing approach?

Is it as strong as SHA256, or as strong as MD5, or as strong as SHA256 + MD5?

This is not a homework question btw, I am asking because of a real issue in my project. By right, I only need to do SHA256(input) on the input, and store it in a column in one MySQL table. But my practical concern is: SHA256 is of length 64 when being base64 encoded, but MD5 is of length 32 when being base64 encoded. I need an index on this column. Would MySQL perform better when I use MD5(SHA256(input)) because it is shorter (either because of a more condensed B+ tree or basically smaller means less storage)?

Edit by myself: try to think about what do we mean by "strong" here? Is it pre-image resistance, or second-preimage resistance?

fgrieu · Answer 1 · 2021-11-20T08:18:16.200

We study a hash $H$ defined by $H(m):=H_2(H_1(m))$, with SHA-256 for $H_1$ and MD5 for $H_2$.

The collision-resistance of MD5 is badly broken, therefore the generic argument¹ that collision resistance of $H$ is at least the weakest of the collision resistance of $H_1$ and $H_2$ leads us nowhere.

The generic argument² that first preimage resistance of $H$ is at least that of $H_2$ works inasmuch as we trust the preimage resistance of MD5; which is questioned (see this), but not broken in practice.

The only thing I can think of about second preimage resistance of $H$ is that it can't³ be better than second preimage resistance of $H_1$. That gives no security assurance, and since second preimage resistance of SHA-256 is believed good that does not give an attack either.

From an applied standpoint (which is what the question seems to be about since it mentions a concrete application), I don't see that any of the existing cryptanalytic attacks on MD5 or SHA-256 are relevant.

MD5's collision resistance is broken. But the short MD5 input size (the 256-bit output size of SHA-256 is only half an input block of MD5), and the heavy constraint that this input is a hash, are each enough to make existing MD5 collision attacks inapplicable (restricting to those much better than brute force).
Both SHA-256 and MD5 have an undesirable length extension property, but the fixed length of the MD5 input blocks that for MD5's property. And the re-hashing of the SHA-256 output to half size blocks that for SHA-256's.

Thus the only cryptographically relevant attacks I see are the ones there remains for ideal hashes $H_1$ and $H_2$ with their respective output width. Since $H_2$ is much narrower than $H_1$, brute force attacks are limited by the width of $H_2$ and speed considerations. In particular:

The question's compound hash is only 128-bit wide, thus collisions can be exhibited with roughly $2^{66}$ evaluations (of both SHA-256 and MD5) using a standard distributed collision search. I can imagine situations where that low collision resistance is an issue (and can't tell here, that's application-dependent).
The compound hash is not usable for key stretching, as would be needed for passwords, for it's much too fast and not memory hard. There's no indication key stretching is a requirement, though.

That compound hash lacks a clear security argument, is not very fast especially for short input, and is bad from at least a public relations perspective because MD5 is broken. As far as I can tell, it's in practice about as good as any mildly fast 128-bit hash can be on all technical standpoints, including unsatisfactory on collision-resistance (but there's no certainty that collision-resistance is a requirement in the use case). I see no reason to use that compound hash rather than something more modern and safer than MD5, including some (possibly truncated) SHA-2, or SHA-3's SHAKE128 with it's tunable output size, or Blake2b (or Blake3 after the dust settles).

¹ Hypothesize a collision for $H$, that is $(m,m')$ with $m\ne m'$ and $H(m)=H(m')$. We can compute $w=H_1(m)$ and $w'=H_1(m')$. If $w=w'$, we have a collision for $H_1$. Otherwise, we have $H_2(w)=H_2(w')$ with $w\ne w'$, thus a collision for $H_2$. Thus we have broken collision resistance for $H_1$ or $H_2$.

² Hypothesize an algorithm finding first preimage for $H$, that is given $h$ finding $m$ such that $H(m)=h$ with non-negligible probability and feasible effort. We can compute $w=H_1(m)$ and it's such that $h=H_2(w)$. Thus we have built a method finding first preimage for $H_2$ with the same non-negligible probability of success and essentially the same effort as the hypothesized algorithm.

³ An algorithm finding second preimage for $H_1$, that is given $m$ finding $m'\ne m$ such that $H_1(m)=H_1(m')$, also finds second preimage for $H$.

How strong if I combine two hash functions, such as MD5(SHA256(input))?

1 Answers1