
Given enough RSA ciphertext, is it possible to determine which public key was used to generate the ciphertext?

Presume that an unknown public key was used to generate RSA ciphertext, and that we have quite a lot of that ciphertext. Would it be possible to determine which key was used to generate it?

It seems to me that any ciphertext would be in the range $[0, N)$, where $N$ is the modulus. So if you have enough ciphertext, you should be able to estimate a value for $N$. I don't think this would be enough to establish $N$ exactly, even given a lot of ciphertext. But I wonder if you could tell, with some certainty, which key from a known set of public keys was used.

Could there be enough information to establish which key is used? Is there a formula that would allow me to calculate the certainty for each key out of a set? Would that formula only consider basic probability over the range or is there anything in the RSA scheme to be more precise?

Only padded RSA with PKCS#1 v1.5 padding or OAEP padding may be considered, not raw/textbook RSA.

Maarten Bodewes

2 Answers


Yes. Given enough ciphertexts all produced with the same key, it is possible to determine with some certainty which public key in a known set was used, provided that neither the public moduli nor the random padding were chosen with the intent to hide which public key was used.

Assume the $k$ public keys in the set have public moduli $N_i$, sorted in increasing order, and that we initially have no clue which one was used. If the random padding is chosen as in either of the two PKCS#1 encryption paddings (randomly and independently of the RSA key), each of the $r$ ciphertexts is essentially uniform on the interval $[0,N_j-1]$, where $N_j$ is the public modulus actually used to produce the ciphertexts. The obvious strategy for guessing the public key is to find the largest of the ciphertexts, $C$; the most likely public key in the set is then the one with the smallest $N_i$ greater than $C$, and $i\le j$ necessarily holds.

The probability that our guess $N_i$ of the public modulus is right is $1/\sum_{j\ge i}(N_i/N_j)^r$, which depends only on which two public moduli $C$ falls between and on the values of the moduli. Justification: only $j\ge i$ can hold; for each such $j$, the probability of getting a largest ciphertext no higher than the $C$ we actually got was $p(C,N_j)=(C/N_j)^r$ before we knew $C$; all these $j$ are assumed equally likely a priori; therefore the probability that $i=j$ is the ratio of $p(C,N_i)$ to the sum of the $p(C,N_j)$ for $j\ge i$, which simplifies to the expression given.
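As a rough illustration of the guessing strategy and the formula above, here is a minimal Python sketch. The function name `guess_modulus` and the tiny moduli in the usage lines are hypothetical, chosen only for illustration; the sketch assumes a uniform prior over the candidate keys and ciphertexts that are uniform below the true modulus, as in the argument above.

```python
def guess_modulus(moduli, ciphertexts):
    """Return (most likely modulus, probability that this guess is right).

    Assumes every modulus in `moduli` is equally likely a priori and that
    the ciphertexts are uniform on [0, N) for the modulus actually used.
    """
    r = len(ciphertexts)
    c_max = max(ciphertexts)
    # Only moduli strictly greater than the largest ciphertext are possible.
    candidates = sorted(n for n in moduli if n > c_max)
    if not candidates:
        raise ValueError("largest ciphertext is not below any known modulus")
    n_i = candidates[0]  # smallest modulus above the largest ciphertext
    # Probability of a correct guess: 1 / sum_{j >= i} (N_i / N_j)^r
    total = sum((n_i / n_j) ** r for n_j in candidates)
    return n_i, 1.0 / total


# Hypothetical toy moduli and ciphertexts, far too small for real RSA.
toy_moduli = [1000, 1100, 1500]
toy_ciphertexts = [12, 950, 431, 777, 5, 640, 903, 88, 219, 512]
best, prob = guess_modulus(toy_moduli, toy_ciphertexts)
```

Python's integer true division handles 2048-bit moduli without overflow, but for ratios very close to 1 the float result loses precision, so exact rational arithmetic may be preferable in that case.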

Note: it is easy to defeat this guessing. For example, when using a 2048-bit $N$, re-encrypt a given plaintext (with fresh random padding each time) until the resulting ciphertext is 2047 bits or fewer, and only then send it.
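The countermeasure in the note can be sketched as a retry loop. This is only an illustration: `encrypt_until_short` is a hypothetical helper, and `toy_encrypt` is a stand-in that returns a uniform random integer below a made-up modulus rather than performing real padded RSA encryption.

```python
import secrets


def encrypt_until_short(encrypt, plaintext, max_bits):
    """Call the randomized `encrypt` until the ciphertext has at most max_bits bits.

    `encrypt` is assumed to be a randomized encryption function returning an
    integer (e.g. padded RSA); it is a parameter, not a real library call.
    """
    tries = 0
    while True:
        tries += 1
        c = encrypt(plaintext)
        if c.bit_length() <= max_bits:
            return c, tries


# Stand-in for randomized padded RSA: uniform below a made-up 64-bit "modulus".
TOY_N = (1 << 64) - 59


def toy_encrypt(_plaintext):
    return secrets.randbelow(TOY_N)


ciphertext, attempts = encrypt_until_short(toy_encrypt, b"hello", 63)
```

Since roughly half of the values below a modulus of full bit length have the top bit clear, about two attempts are expected on average.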

fgrieu

Could there be enough information to establish which key is used?

If you know the messages being signed, and if the signature method is deterministic (e.g. PKCS#1 v1.5 signature padding), then you can do it with 2 signatures (!).

Here's how that would work. First, you guess $e$ (which is most often 65537). Then, you observe that $S^e \bmod N = \operatorname{Pad}(M)$ implies that $S^e - \operatorname{Pad}(M)$ is a multiple of $N$. Hence, if we have two messages $M, M'$ and the two signatures $S, S'$, we can compute $\gcd(S^e - \operatorname{Pad}(M),\ S'^e - \operatorname{Pad}(M'))$, and that will be a small multiple of $N$.
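A toy version of this GCD computation might look like the Python sketch below. Everything in it is an assumption made for illustration: the primes are tiny, $e = 17$ is used instead of 65537 to keep the powers small, and the integers `m1` and `m2` simply stand in for the padded messages rather than being real PKCS#1 v1.5 encodings.

```python
from math import gcd


def modulus_multiple(e, pairs):
    """Return the gcd of S^e - Pad(M) over all (padded_message, signature) pairs.

    Each difference is a multiple of N, so the gcd is a multiple of N too.
    """
    g = 0
    for padded, sig in pairs:
        g = gcd(g, sig ** e - padded)
    return g


# Hypothetical toy key, far too small for real use.
p, q = 1009, 1013
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (Python 3.8+)

# Stand-ins for Pad(M) and Pad(M'); real padding would produce values near N.
m1, m2 = 123456, 654321
s1, s2 = pow(m1, d, N), pow(m2, d, N)  # deterministic "signatures"

g = modulus_multiple(e, [(m1, s1), (m2, s2)])
```

The attacker only needs `e`, the padded messages and the signatures; `d` appears here solely to generate the toy signatures. With real parameters the integers $S^e$ are very large (on the order of $e$ times the bit length of $N$), so the computation is slow but still practical, and any remaining small factors of `g` can be removed by trial division.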

poncho