
Currently, every torrent that is created is assigned a magnet link containing a 40-digit (hexadecimal) SHA-1 hash value. This hash should therefore be unique, so that it identifies a torrent and the right bytes (pieces) are sent to the right people. That gives $16^{40}$ (1461501637330902918203684832716283019655932542976) possible hashes to uniquely identify a torrent. What happens after this number has been reached, or if two torrents with different content have the same hash? (Although this practically never happens, there is a chance of $\frac{x+1}{16^{40}}$, where $x$ is the number of public torrents, i.e. assigned hashes, that your hashes collide.)
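As a quick sanity check of that count (a Python sketch; the constant name is only for illustration), 40 hexadecimal digits give exactly as many values as a 160-bit SHA-1 digest:

```python
# 40 hexadecimal digits, each with 16 possible values.
HEX_DIGITS = 40

possible_hashes = 16 ** HEX_DIGITS

# A SHA-1 digest is 160 bits, and 16**40 == (2**4)**40 == 2**160.
print(possible_hashes)              # 1461501637330902918203684832716283019655932542976
print(possible_hashes == 2 ** 160)  # True
```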

To make a long story short: What happens if two magnet link hashes are equal for different content? What happens after every possible hash has been used?

wythagoras
MechMK1

3 Answers


$16^{40}$ is a huge number. For instance, suppose each torrent consists of a single byte (so they are quite uninteresting torrents), and you pack them all onto 10 TB hard disks (for a torrent to exist, it must exist on at least one hard disk on the planet). If each such disk weighs about 100 g, then the total weight of the disks is about 2.4 billion times the mass of the whole Earth.
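The estimate can be reproduced in a few lines of Python. The one-byte torrents, 10 TB disks and 100 g per disk come from the answer; taking 10 TB as $10^{13}$ bytes and the Earth's mass as about $5.97 \times 10^{24}$ kg are assumptions added for this sketch. Under them the result lands in the billions of Earth masses.

```python
# Back-of-the-envelope check of the "disks outweigh the Earth" argument.
# Assumed constants: 10 TB = 10**13 bytes, Earth mass ~5.97e24 kg.
NUM_TORRENTS = 16 ** 40           # one torrent per possible hash
BYTES_PER_TORRENT = 1             # deliberately tiny torrents
DISK_CAPACITY_BYTES = 10 * 10**12
DISK_MASS_KG = 0.1                # about 100 g per disk
EARTH_MASS_KG = 5.97e24


def disks_to_earth_masses() -> float:
    """Total mass of all disks needed, expressed in Earth masses."""
    total_bytes = NUM_TORRENTS * BYTES_PER_TORRENT
    disks = total_bytes / DISK_CAPACITY_BYTES
    return disks * DISK_MASS_KG / EARTH_MASS_KG


print(f"{disks_to_earth_masses():.2e} Earth masses")
```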

So I am quite confident in stating that this number will not be reached.

The risk of collisions is actually higher. If you accumulate 160-bit hash values (outputs of SHA-1), then, on average, the first collision should appear after about $2^{80}$ values (this is known as the birthday paradox). So the situation you describe may occur much earlier than after exhaustion of the $16^{40}$ space. But $2^{80}$ is still huge and cannot be practically achieved ($2^{80}$ torrents means more than one hundred thousand billion torrents per human being on Earth).
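The birthday bound can be made concrete with the usual approximation $p \approx 1 - e^{-n(n-1)/(2N)}$ for $n$ random values drawn from $N = 2^{160}$ possibilities. The Python sketch below (function names are hypothetical) evaluates it, and also runs a small experiment with SHA-1 truncated to a few bits, where the first repeat shows up after roughly the square root of the space size. The world population of about 8 billion used for the per-person figure is an assumption.

```python
import hashlib
import math


def collision_probability(n: int, bits: int = 160) -> float:
    """Birthday approximation: chance that n uniformly random `bits`-bit
    hashes contain at least one repeat, 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = n * (n - 1) / (2 * 2 ** bits)
    return -math.expm1(-exponent)  # expm1 keeps precision for tiny values


def first_collision(bits: int) -> int:
    """Hash "0", "1", "2", ... with SHA-1 truncated to `bits` bits and
    return how many hashes were computed when the first repeat appeared."""
    seen = set()
    count = 0
    while True:
        digest = hashlib.sha1(str(count).encode()).digest()
        value = int.from_bytes(digest, "big") >> (160 - bits)
        count += 1
        if value in seen:
            return count
        seen.add(value)


print(collision_probability(2 ** 80))  # ~0.39 once n reaches 2^80
print(collision_probability(10 ** 9))  # ~3.4e-31 for a billion torrents
print(2 ** 80 / 8e9)                   # ~1.5e14 torrents per person (assumed ~8 billion people)
print(first_collision(32))             # typically on the order of 2^16 = 65536
```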

If a hash collision occurred, downloading either of the two colliding torrents would most probably cease to work, since downloaders would obtain a mixture of both files. Other torrents would remain totally unaffected.

An interesting question is whether you could, given an existing torrent $T$, handcraft another torrent $T'$ that hashes to the same value; $T'$ could then be used as a weapon to prevent usage of $T$. To do that, you would have to break the second-preimage resistance of the hash function. SHA-1 has a few known weaknesses, but for now there is no known method for obtaining a second preimage except luck (trying random values until one matches). Luck (aka "brute force") has an average cost of $2^{160}$, which is completely out of reach of today's (and tomorrow's, and next decade's) technology.
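To see why brute force scales with the full output size, here is a toy Python sketch (hypothetical helper names) that searches for a second preimage against SHA-1 truncated to a handful of bits. Each added bit doubles the expected number of attempts, so at the full 160 bits the expected cost is the $2^{160}$ mentioned above. The truncation is purely illustrative and says nothing about SHA-1's internal structure.

```python
import hashlib
from typing import Tuple


def truncated_sha1(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-1(data): a toy stand-in for all 160 bits."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") >> (160 - bits)


def brute_force_second_preimage(original: bytes, bits: int) -> Tuple[bytes, int]:
    """Try candidate inputs until one that differs from `original` has the
    same truncated hash. Returns (candidate, attempts); the expected number
    of attempts is about 2**bits."""
    target = truncated_sha1(original, bits)
    attempts = 0
    while True:
        attempts += 1
        candidate = b"forged-%d" % attempts
        if candidate != original and truncated_sha1(candidate, bits) == target:
            return candidate, attempts


forged, tries = brute_force_second_preimage(b"torrent T info dict", bits=16)
print(forged, tries)  # tries is typically in the tens of thousands for 16 bits
```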

wythagoras
Thomas Pornin

From my understanding, the hash is computed from the torrent's metadata, which includes the file names and sizes and a hash of each piece of the file. There is a chance (so small as to be effectively nil, but not actually zero) that two different inputs could be fed into the formula and come out with the same hash.
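For reference, the BitTorrent specification (BEP 3) defines the v1 infohash as the SHA-1 of the bencoded `info` dictionary. For a single-file torrent that dictionary holds the file name, its length, the piece length, and the concatenated SHA-1 hashes of all pieces, so every byte of content feeds into the infohash indirectly. The Python sketch below assumes a single-file torrent and ignores optional keys (such as `private`) and multi-file layouts; `bencode` and `info_hash` are hypothetical helpers, not a library API.

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoding: ints, byte strings, lists, and dicts with byte-string keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings, emitted in sorted (raw byte) order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_hash(name: bytes, data: bytes, piece_length: int = 16384) -> str:
    """Hex infohash of a hypothetical single-file torrent: SHA-1 of the
    bencoded info dictionary, which embeds a SHA-1 hash of every piece."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    info = {
        b"name": name,
        b"length": len(data),
        b"piece length": piece_length,
        b"pieces": pieces,
    }
    return hashlib.sha1(bencode(info)).hexdigest()


print(info_hash(b"hello.txt", b"hello world\n" * 10000))  # a 40-digit hex string
```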

You could fake a file if you knew how the hash is formulated, so that it appears to be the same file but actually has different content. However, there are also error checks in the transfer process: assuming you choose a good file as the primary seed, the bogus file parts would be discarded by the error checking when the file is reassembled, and the torrent client would request those parts again. Theoretically, you might be able to break that as well, but the effort required to break both makes that attack vector unusable.

An accidental collision is likewise handled by the error checking. You may end up with some garbage packets, but they will not prevent the torrent client from completing the file.
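In BitTorrent, the error checking described here is a SHA-1 hash per piece, stored in the torrent metadata. A minimal Python sketch (hypothetical function names and sample data) of how a client could accept or reject a downloaded piece:

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """SHA-1 of each fixed-size piece, as listed in a torrent's metadata."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]


def accept_piece(index: int, piece: bytes, expected: List[bytes]) -> bool:
    """Keep a downloaded piece only if its SHA-1 matches the expected hash;
    a failing piece would be discarded and requested again."""
    return hashlib.sha1(piece).digest() == expected[index]


original = b"A" * 40000
expected = piece_hashes(original, piece_length=16384)  # 3 pieces

good = original[16384:32768]
bogus = b"B" * 16384
print(accept_piece(1, good, expected))   # True
print(accept_piece(1, bogus, expected))  # False
```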

Chad

What happens if the computer processing the torrent file is struck by a cosmic ray and processes it incorrectly? What happens if an asteroid strikes the Earth and destroys every computer before the torrent file finishes downloading?

If the probability of a failure occurring is many orders of magnitude lower than the probability of another failure that causes the same result, then there is no point worrying about that failure. This is a failure that is many billions of times less likely than thousands of other failures that cause much worse results.

The probability that one computer will make an error calculating a single SHA-1 hash is much, much larger than the probability that any computer will encounter two sets of data that produce the same SHA-1 hash.

David Schwartz