Earlier today I was answering a question on the Ethereum SE site that asked whether more than one private key on the curve secp256k1 (each mapping to a distinct public key) could control the same Ethereum address. Addresses are derived by hashing the public key, as a byte array, with Keccak-256, so the question is whether the rightmost 160 bits of two distinct hash digests could collide, so that two different private keys control the same address.

In other words, what is the probability that any two Keccak-derived hash digests share the same trailing 160 bits (the last 20 bytes of the digest)?

And secondly, does that probability change (and to what degree) if both pre-images must be valid secp256k1 public keys? There are roughly $2^{256}$ of them, or more precisely $n - 1$, where $n$ is the group order given in this curve's parameters: 0xfffffffffffffffffffffffffffffffebaaedce6af48a03bbfd25e8cd0364141.

Again, Ethereum addresses are simply the last 40 hex characters of the 64-hex-character (256-bit) Keccak-256 hash of the secp256k1 public key derived from the user's private key (assumed to be less than $n$).
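To make the truncation step concrete, here is a minimal sketch scaled down so that a tail collision can actually be found. It rests on loudly stated assumptions: Python's `hashlib.sha3_256` stands in for Keccak-256 (the two differ in padding, so the values are not real Ethereum addresses), 64-byte counters stand in for public keys, and the names `toy_address` and `find_tail_collision` are hypothetical helpers made up for this illustration.

```python
import hashlib


def toy_address(pubkey: bytes, bits: int = 160) -> int:
    """Return the trailing `bits` bits of a 256-bit digest of `pubkey`."""
    # ASSUMPTION: SHA3-256 is a stand-in for Keccak-256 (different padding),
    # so these values are NOT real Ethereum addresses.
    digest = hashlib.sha3_256(pubkey).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def find_tail_collision(bits: int):
    """Birthday search: return (a, b, tries) with a != b and equal tails."""
    seen = {}  # tail -> first input that produced it
    counter = 0
    while True:
        counter += 1
        # ASSUMPTION: 64-byte counters play the role of public keys here.
        candidate = counter.to_bytes(64, "big")
        tail = toy_address(candidate, bits)
        if tail in seen:
            return seen[tail], candidate, counter
        seen[tail] = candidate


if __name__ == "__main__":
    a, b, tries = find_tail_collision(24)
    print(f"two inputs share 24 trailing bits after {tries} tries")
```

With only 24 trailing bits, the pigeonhole principle guarantees a collision within $2^{24} + 1$ tries, and a search like this typically needs on the order of $2^{12}$. The same reasoning applied to 160 bits is what the question is about.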

If the digest space is $2^{256}$, and there exist even two digests that share the same trailing 160 bits, what are the chances that the pre-images of those two digests are valid 512-bit public keys? And moreover, that those public keys were actually derived from private keys known to a user (and are not just random 512-bit strings that were brute-forced)?

Given the multiple layers of conditions (compared to being able to try any arbitrary pre-image regardless of length, etc.), I was thinking that it could be possible (by chance perhaps) that no two private keys derive the same Ethereum address, even if there exists more than one Keccak-256 hash digest that partially collides on the last 160 bits.

Although, if there is a non-negligible number of such 160-trailing-bit collisions, perhaps there is a real chance that one Ethereum address is controlled by two different public keys (even if it is not feasible to find them)?

Note: My estimate/guess is that one would have to brute-force search at least $2^{160}$ valid private keys on the elliptic curve, including steps to derive the public key and resulting Keccak256 hash digest, to try to find even one such partial collision of the last 160-bits.

Can such a scenario even be computed or is there no way to know?

Steven Hatzakis

1 Answer

There are about $2^{256}$ distinct inputs (secp256k1 private keys) and about $2^{160}$ distinct outputs (Ethereum addresses).

By the pigeonhole principle, there is at least one address which is shared by many private keys. In principle, there could be $2^{160} - 1$ addresses each with exactly one private key, and another address with $2^{256} - (2^{160} - 1)$ different private keys. But that is essentially guaranteed not to be the case in practice.

The map from secp256k1 private keys to Ethereum addresses can reasonably be modeled as a uniform random function. In this model, the expected number of secp256k1 private keys sharing a single address is about $2^{96} = 2^{256}\!/2^{160}$.
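The two counts above can be checked with exact integer arithmetic. This is only a sketch: the constant `N` is the secp256k1 group order quoted in the question, and the variable names are made up for illustration.

```python
import math

# Order of the secp256k1 group, as quoted in the question.
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
PRIVATE_KEYS = N - 1   # valid private keys are 1 .. N-1
ADDRESSES = 2**160     # possible 160-bit addresses

# Pigeonhole: some address must be shared by at least
# ceil(PRIVATE_KEYS / ADDRESSES) private keys.
pigeonhole_min = -(-PRIVATE_KEYS // ADDRESSES)

# Uniform-random-function model: each address expects
# PRIVATE_KEYS / ADDRESSES keys; report that ratio as a power of two.
expected_log2 = math.log2(PRIVATE_KEYS) - math.log2(ADDRESSES)

if __name__ == "__main__":
    print(f"some address has at least 2^{math.log2(pigeonhole_min):.0f} keys")
    print(f"expected keys per address: about 2^{expected_log2:.1f}")
```

Because the group order is only slightly below $2^{256}$, both figures land on $2^{96}$ at this precision.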

What can you do with this apparently staggering number of colliding private keys?

  • Suppose the world has generated $n$ private keys and published their addresses. The probability that there is a pair of private keys $k \ne k'$ sharing a common address $H(k) = H(k')$ is at most about $n^2\!/2^{160}$ by the birthday paradox[1].

    If the world has generated a trillion keys, or about $2^{40}$, this probability is below $1/2^{80}$. In other words, you are essentially guaranteed it hasn't happened.

  • Suppose you want to find some pair of private keys $k \ne k'$ sharing a common address $H(k) = H(k')$. The cheapest known method is the van Oorschot–Wiener parallel collision search machine[2], whose expected cost is $\sqrt{2^{160}} = 2^{80}$; of course, if parallelized $p$ ways, it runs in time $2^{80}\!/p$.

    This cost is spent by the Bitcoin network in about a day. But what would you do with this pair of keys? You could give one of the keys to someone and do something nefarious with the other key, but…if you can already choose someone's key, it's hard to imagine what nefarious things you could do with a colliding key that you couldn't already do with the original in the first place.

  • Suppose you have a specific address $h = H(k)$ for a key $k$ that you don't know, and you want to find a key $k'$ that matches $h$. You are essentially guaranteed that $k' \ne k$, but that doesn't matter—$k$ can spend the money of $k'$ and vice versa. Perhaps you actually have $t$ different addresses, $h_1 = H(k_1)$, $h_2 = H(k_2)$, …, $h_t = H(k_t)$, and you will be happy if you find one matching $k'_i$. The cheapest known method is an adaptation of Oechslin's rainbow tables[3] to a $p$-way parallel machine[4], which costs $2^{160}\!/t$ as long as $t < p^2$, and runs in time $2^{160}\!/pt$.

    Even if you had a quadrillion target keys (about $2^{50}$), which you don't, and you had a $2^{100}$-way parallel computer, which you don't, it would still cost $2^{110}$ to power that computer to find the first matching private key, which you won't.
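The exponent bookkeeping in the three scenarios above can be sketched as follows. Everything is kept as a base-2 logarithm to avoid huge numbers, and the function names are hypothetical helpers, not part of any library. The formulas are the estimates quoted above, not independent derivations.

```python
ADDRESS_BITS = 160


def birthday_log2(keys_log2):
    """log2 of the estimate n^2 / 2^160 for n = 2^keys_log2 published keys."""
    return 2 * keys_log2 - ADDRESS_BITS


def collision_search_log2(parallel_log2=0):
    """(log2 cost, log2 time) of a generic collision search: 2^80, 2^80 / p."""
    cost = ADDRESS_BITS / 2
    return cost, cost - parallel_log2


def multi_target_log2(targets_log2, parallel_log2):
    """(log2 cost, log2 time) for t targets, p-way: 2^160 / t, 2^160 / (p t)."""
    # The estimate is only quoted for t < p^2, so refuse anything else.
    if targets_log2 >= 2 * parallel_log2:
        raise ValueError("estimate is only quoted for t < p^2")
    cost = ADDRESS_BITS - targets_log2
    return cost, cost - parallel_log2


if __name__ == "__main__":
    print("birthday, 2^40 keys: 2^%d" % birthday_log2(40))
    print("collision search (cost, time): 2^%d, 2^%d" % collision_search_log2())
    print("2^50 targets, 2^100-way (cost, time): 2^%d, 2^%d"
          % multi_target_log2(50, 100))
```

For example, `multi_target_log2(50, 100)` gives a cost exponent of 110, matching the figure above. The time exponent of 10 simply follows from dividing that cost by the $2^{100}$-way parallelism.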


P.S. Public keys may be encoded with 512 bits but there are only about $2^{256}$ of them—specifically, there are exactly as many distinct public keys as there are distinct private keys.

Squeamish Ossifrage