15

When creating a Bitcoin account, you need to generate a private/public ECDSA key pair. You then derive your account address by taking a 160-bit hash of the public key (SHA-256 followed by RIPEMD-160) and converting it to an alphanumeric string with a custom Base58 encoding (see the picture below, which I took from this page).

[Figure: Computing Bitcoin Keys]
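The last step of the derivation described above (turning the 160-bit hash into an alphanumeric address) can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the function names `base58check_encode` and `p2pkh_address` are hypothetical, the version byte `0x00` is the one used for classic mainnet addresses, and the sketch takes the 20-byte hash as input rather than computing it, because RIPEMD-160 is not guaranteed to be available through Python's `hashlib` on every system.

```python
import hashlib

# Base58 alphabet used by Bitcoin: no 0, O, I or l, to avoid look-alike characters.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_encode(version: bytes, payload: bytes) -> str:
    """Encode version + payload with a 4-byte double-SHA-256 checksum (hypothetical helper)."""
    data = version + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum

    # Treat the bytes as one big integer and repeatedly divide by 58.
    num = int.from_bytes(raw, "big")
    encoded = ""
    while num > 0:
        num, rem = divmod(num, 58)
        encoded = BASE58_ALPHABET[rem] + encoded

    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def p2pkh_address(pubkey_hash160: bytes) -> str:
    """Build an address from the 20-byte hash of a public key (hypothetical helper)."""
    if len(pubkey_hash160) != 20:
        raise ValueError("expected a 20-byte (160-bit) hash")
    return base58check_encode(b"\x00", pubkey_hash160)


if __name__ == "__main__":
    # Stand-in 20 bytes for illustration only: this is NOT the real
    # RIPEMD-160(SHA-256(pubkey)) of any key.
    demo_hash = hashlib.sha256(b"not a real public key").digest()[:20]
    print(p2pkh_address(demo_hash))
```

Because the address is only an encoding of the 160-bit hash plus a checksum, two different public keys that hash to the same 160 bits would yield the same address, which is the collision the question is about.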

My concern is that collisions might happen (not necessarily malicious ones, but accidental ones). I would like to know whether the Bitcoin protocol has a specific mechanism to resolve such collisions and ensure that a payment goes to the right place.

In fact, I looked through the protocol details but could not find anything dealing with this problem. The only reference I found is on the Bitcoin wiki, which states:

Collisions (lack thereof)

Since Bitcoin addresses are basically random numbers, it is possible, although extremely unlikely, for two people to independently generate the same address. This is called a collision. If this happens, then both the original owner of the address and the colliding owner could spend money sent to that address. It would not be possible for the colliding person to spend the original owner's entire wallet (or vice versa). If you were to intentionally try to make a collision, it would currently take 2^107 times longer to generate a colliding Bitcoin address than to generate a block. As long as the signing and hashing algorithms remain cryptographically strong, it will likely always be more profitable to collect generations and transaction fees than to try to create collisions.

It is more likely that the Earth will be destroyed in the next 5 seconds, than that a collision would occur sometime in the next millennium.

I find this explanation extremely unsatisfactory. If any of you has a better explanation, or knows of a prevention mechanism built into the Bitcoin protocol, I would be delighted to read about it.

Squeamish Ossifrage
perror

4 Answers

13

Cryptography (and real security in general) offers quantitative analysis of the security provided; that is, real security products will describe how long they will resist a certain class of attack.

With cryptography, we select our parameters such that the time required to perform the best attack would exceed what is considered practical, realistic, or feasible.

With a larger input space than output space, collisions must exist; therefore the only solution is to:

  • Accept that collisions must exist
  • Make collisions so rare as to not be a problem

This is more or less exactly what the Bitcoin protocol has done. Granted, $2^{80}$ (the birthday bound for a 160-bit hash) could arguably be a little bigger for more comfort; Bitcoin could have used bigger hash functions, but the designers may have been limited by what was available at the time. If you were going to redesign it, I'm sure you would use larger sizes now. But that's always easy to say in retrospect...

Ella Rose
7

In this application we don't care about attackers generating collisions with themselves. What we care about is:

  1. Two legitimate users inadvertently generating the same address.
  2. An attacker deliberately trying to generate collisions with the addresses of existing unspent outputs.

It's not possible to reduce these risks to zero, but it is possible to reduce them to negligible levels through the use of sufficiently large addresses.

Let's consider case 1 first. I don't know a good way to find the total number of addresses that have ever been used, but we can get an upper bound by taking the size of the blockchain and dividing it by the size of an address. The blockchain is now about 150 GB; dividing that by 20 bytes (the size of a 160-bit address hash) gives an upper bound of about 7.5 billion addresses. 7.5 billion is approximately $2^{33}$. In reality the blockchain contains a lot more than just addresses, so I expect the real number to be lower than this.

The probability of an accidental collision can now be approximated by the birthday bound, where $n$ is the number of addresses in use and $m = 2^{160}$ is the number of possible addresses:

$$p \approx \frac{n^2}{2m}$$

If we assume there are $2^{33}$ addresses in use and that the hash function is uniform, then the chance of an inadvertent collision is:

$$p \approx \frac{2^{66}}{2^{161}} = \frac{1}{2^{95}}$$

I would consider that a negligible probability.
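The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch under the same assumptions as the answer (a 150 GB chain, 20 bytes per address hash, a uniform 160-bit hash); the helper names `address_upper_bound` and `birthday_log2` are made up for illustration, and the result is kept as a base-2 exponent because the probability itself is too small to be a convenient floating-point value.

```python
import math


def address_upper_bound(chain_bytes: int, hash_bytes: int = 20) -> int:
    """Crude upper bound: pretend the whole chain is nothing but address hashes."""
    return chain_bytes // hash_bytes


def birthday_log2(n_log2: int, space_bits: int) -> int:
    """log2 of the birthday approximation p ~ n^2 / (2m), with n = 2^n_log2, m = 2^space_bits."""
    return 2 * n_log2 - 1 - space_bits


if __name__ == "__main__":
    n = address_upper_bound(150 * 10**9)       # about 7.5 billion addresses
    n_log2 = math.ceil(math.log2(n))           # round up to the next power of two
    p_log2 = birthday_log2(n_log2, 160)        # exponent of the collision probability
    print(f"addresses <= {n} ~ 2^{n_log2}")
    print(f"accidental collision probability ~ 2^{p_log2}")
```

Note that the approximation is only meaningful while $n^2$ is much smaller than $2m$, which is comfortably the case here.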


Now onto the malicious side. The malicious actor is attempting to find an address collision with an "unspent" output. Finding a collision with an output that has already been spent doesn't help him.

According to https://blockchain.info/charts/utxo-count there are about 67 million unspent outputs. 67 million is approximately $2^{26}$, so the probability of a single attacker-generated hash matching an existing unspent output is about $\frac{2^{26}}{2^{160}} = \frac{1}{2^{134}}$.

Now of course the attacker can try many times. Let's assume that generating an address takes the same effort as attempting to hash a block (in reality it takes more). Let's also assume that the attacker has as much hashing power as the whole Bitcoin network put together and runs the attack for a century. The total Bitcoin network hash rate is about $2^{64}$ hashes per second and there are about $2^{32}$ seconds in a century.

$$p \approx \frac{1}{2^{134}} \times 2^{64} \times 2^{32} = \frac{1}{2^{38}}$$
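The same estimate can be recomputed directly from the raw inputs quoted in this answer (about $2^{26}$ unspent outputs, 160-bit addresses, about $2^{64}$ hashes per second, about $2^{32}$ seconds in a century). The sketch below works in base-2 exponents; the function name `attack_success_log2` is hypothetical, and multiplying the per-try probability by the number of tries is a union-bound approximation that holds only while the result is small.

```python
import math


def attack_success_log2(utxo_log2: int = 26,
                        address_bits: int = 160,
                        hashrate_log2: int = 64,
                        seconds_log2: int = 32) -> int:
    """log2 of the approximate chance that any attempt hits an unspent output."""
    per_try_log2 = utxo_log2 - address_bits      # 2^26 / 2^160 = 2^-134 per attempt
    tries_log2 = hashrate_log2 + seconds_log2    # 2^64 * 2^32 = 2^96 attempts
    return per_try_log2 + tries_log2


if __name__ == "__main__":
    # Sanity-check the rounded powers of two used in the answer.
    print(f"67 million ~ 2^{math.log2(67e6):.2f}")
    print(f"seconds per century ~ 2^{math.log2(100 * 365.25 * 86400):.2f}")
    print(f"century-long attack succeeds with probability ~ 2^{attack_success_log2()}")
```

The number of seconds in a century is closer to $2^{31.6}$ than $2^{32}$, so rounding up slightly favours the attacker, and the conclusion does not change.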

Bottom line: it's far, far easier to mine bitcoin legitimately than to steal it through hash collisions.

Peter Green
0

I've always had a problem with probability: given enough time, the event will happen for sure. So even if something is statistically expected only once in a millennium, nothing stops it from happening today at, say, 12:01 pm and then not again for another millennium, or from happening today, tomorrow, and the next day and then not for another 3,000 years. Granted, in BTC terms the likelihood is even smaller, but it remains a possibility. In other words, you could lose all the BTC at your address at any time on any day, and although that is less likely than me sprouting wings and flying to the moon, it still somehow makes me uncomfortable. Greed versus fear. What may be even more troubling is not being able to know for sure whether it has already happened, and how often. As far as I know, we just have the math.

0

Although this is, statistically speaking, indescribably unlikely to happen, the source of randomness used to produce the keys is important. In theory we assume by default that keys are generated from a proper source of randomness; if that is not the case, the theoretical guarantees no longer apply.

But people start asking "what if ...?" simply because it is possible for it to occur.

This is a cognitive error, I guess, even once you have calculated the odds.

Mehdi Raash