Dropbox uses CRC32 and MD5 to create a checksum of each 4 MB "block" as part of their compression/file system.

I suspect that Dropbox uses an additional cryptographic hash to check if there's a collision, but let's assume that doesn't happen.

Is using CRC32 and MD5 in combination cryptographically secure?

user3201068

2 Answers

Disclaimer: I have no first-hand knowledge of which hash (or MAC, or other method) Dropbox uses for de-duplication, nor of whether knowing it (and its key, for a MAC) is enough to download something from Dropbox; I have seen slightly diverging opinions on these points.

If we consider the problem of finding a collision, the 160-bit hash defined by $$H(M)=\text{CRC32}(M)||\text{MD5}(M)$$ is NOT cryptographically secure, for it is only marginally stronger than $\text{MD5}$ is.
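As a concrete illustration of the construction $H$ above, here is a minimal Python sketch using the standard library's `zlib.crc32` and `hashlib.md5`. The function name `crc32_md5` and the big-endian encoding of the CRC are my own choices for illustration, not something Dropbox is known to use.

```python
import hashlib
import zlib


def crc32_md5(message: bytes) -> bytes:
    """Return CRC32(M) || MD5(M): 4 + 16 = 20 bytes, i.e. 160 bits."""
    crc = zlib.crc32(message) & 0xFFFFFFFF  # unsigned 32-bit CRC
    # Big-endian byte order for the CRC is an arbitrary choice here.
    return crc.to_bytes(4, "big") + hashlib.md5(message).digest()


digest = crc32_md5(b"hello world")
print(len(digest) * 8)  # 160
print(digest.hex())
```

The output length is the sum of the two lengths, but the collision resistance is not the sum of the two strengths, as discussed next.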

$\text{MD5}$ no longer provides good protection against collisions: with this Fast Collision Attack on $\text{MD5}$ we can find a collision (for messages of at least 128 bytes) at a cost of about $2^{18}$ compression function evaluations (finding collisions for $\text{CRC32}$ is totally trivial). Finding a collision for $H$ is harder than for $\text{MD5}$, but with a dumb method (finding $\text{MD5}$ collisions until one is also a collision for $\text{CRC32}$) it is only about $2^{32}$ times harder, and $2^{50}$ compression function evaluations is non-trivial but feasible. Update: as pointed out by poncho, we can (for messages of at least 4 kiB) make that only about $32$ times harder, or about $2^{23}$ compression function evaluations, which is nothing. As the saying attributed to the NSA goes, attacks only get better, they never get worse.
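The cost figures above are simple exponent arithmetic. The following sketch just redoes it, taking the $2^{18}$ figure for one $\text{MD5}$ collision as given; the variable names are illustrative only.

```python
from math import log2

MD5_COLLISION_LOG2 = 18  # log2 of the cost quoted above for one MD5 collision
CRC_BITS = 32            # output size of CRC32

# Dumb method: repeat MD5 collisions until the CRC32 values also match,
# which takes about 2^32 attempts on average.
naive_log2 = MD5_COLLISION_LOG2 + CRC_BITS

# Improved method: only about 32 times the cost of one MD5 collision.
improved_log2 = MD5_COLLISION_LOG2 + log2(32)

print(naive_log2)     # 50
print(improved_log2)  # 23.0
```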


If we consider the problem of finding a second-preimage, $\text{MD5}$ remains impractical to attack as far as we know, and $H$ is at least as strong as $\text{MD5}$ is.

If we consider the problem of finding a first-preimage, $H$ can't be more than $2^{32}$ times easier to attack than $\text{MD5}$ is. That bound can likely only be approached for short messages, where brute force is the best attack; for long messages, the attack is likely (at least) about as hard as for $\text{MD5}$, which remains impractical to attack as far as we know.

fgrieu

Neither CRC32 nor MD5 is cryptographically secure. MD5 has known collision weaknesses and is therefore no longer to be considered cryptographically secure. And CRC32 isn't even a hash… it's a “cyclic redundancy check” algorithm, which produces an “error-detecting code”. Cyclic redundancy checks are not, and were never meant to be, cryptographically secure.
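One way to see why a CRC offers no resistance to a deliberate attacker is that it is affine over XOR: for three equal-length messages, the CRC32 of their XOR equals the XOR of their CRC32 values. A small sketch with Python's `zlib.crc32` (the helper `xor_bytes` is mine, written for this illustration):

```python
import os
import zlib


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR together byte strings of equal length."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


# Three random messages of the same length.
a, b, c = (os.urandom(64) for _ in range(3))

lhs = zlib.crc32(xor_bytes(a, b, c))
rhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
print(lhs == rhs)  # True
```

Because of this structure, forcing a chosen CRC32 value reduces to linear algebra over bits, which is nothing like the work needed to attack a cryptographic hash.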

Even if they were, Dropbox doesn't base its file storage on a checksum and/or a colliding hash. It's not as if they simply take your upload, cut it up into 4 MB parts and throw each into MD5 to prevent duplicates. They would have drowned in chaos had they done so. The way they handle file storage involves smarter things like de-duplication (with 256-bit block checksums) etc.

Rumours suggest that Dropbox may be using raw SHA256 hashes to “uniquely” identify data, and some articles explain how this can be exploited in a number of ways. Also, SHA256, SHA1 and MD5 checksums have been spotted along with download links, which rules out that they might be relying on CRC32 and/or MD5 alone. Practical analysis of Dropbox came to the same conclusion. But without being able to peek inside the box, it's hard to tell exactly what we're looking at. All we know is what Dropbox has published… which isn't much when it comes to the technologies/means they use to handle such an amount of data in a (somewhat) optimal way. But it's not hard to see that it's stronger than your CRC32 + MD5 assumption.
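To make the idea of block-level de-duplication concrete, here is a minimal sketch that assumes blocks are identified by their SHA-256 digest. It is purely hypothetical: the function `store`, the in-memory `storage` dictionary and the use of raw SHA-256 as the key are assumptions for illustration, not Dropbox's actual implementation.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MB blocks, as in the question


def store(data: bytes, storage: dict[str, bytes],
          block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into blocks, keep each distinct block once,
    and return the list of block identifiers describing the file."""
    keys = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = hashlib.sha256(block).hexdigest()  # assumed block identifier
        storage.setdefault(key, block)           # skip blocks already stored
        keys.append(key)
    return keys


storage: dict[str, bytes] = {}
keys = store(b"aaaabbbbaaaa", storage, block_size=4)
print(len(keys), len(storage))  # 3 2
```

In a scheme like this, two different blocks with the same identifier would be silently treated as one, which is why the collision resistance of the identifying hash matters.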

Anyway… setting aside the Dropbox-related speculation and getting back to the more important part of your question: when it comes to file integrity and checking for data collisions, companies like Tripwire explicitly perform both an MD5 and a CRC32 check on a file to detect a change, because it is hard to find collisions that match under two different algorithms. From that point of view it might be practical to use, as it enhances collision hardness. Yet, in my personal view, it lowers the chance of collisions only minimally. Therefore, I wouldn't prefer a CRC32 & MD5 combination over (for example) SHA3. But if I needed to increase collision resistance, and combining a CRC with a hash were the only available option, I would most probably agree to opt for a CRC32 and MD5 combination, as it will surely be able to detect more collisions than MD5 or CRC32 on their own.

Mike Edward Moras