83

As I understand it, the RSA algorithm is based on finding two large primes (p and q) and multiplying them. The security is based on the fact that it's difficult to factor the product back into p and q. Now, since RSA keys are so large (often 1024 bits and above), the primes have to be at least half that size (at least 512 bits then). Such large primes would be difficult to generate (you'd have to check many, many numbers and try to factor each of them), so I understand that the typical approach is to use pre-generated lists of large primes.

But doesn't that make the key very easy to crack? Even if the list contained 1,000,000 primes (which I find unlikely), checking all the combinations would only take a couple of hours on a typical desktop computer.

Which part have I misunderstood?

Vilx-

7 Answers

72

You don't use a pre-generated list of primes. That would make it easy to crack as you note. The algorithm you want to use would be something like this (see note 4.51 in HAC, see also an answer on crypto.SE):

  1. Generate a random $512$-bit odd number, say $p$
  2. Test whether $p$ is prime; if it is, return $p$; this is expected to succeed after testing about $\ln(p)/2 \approx 177$ candidates
  3. Otherwise set $p = p+2$ and go to step 2

There are other methods to generate primes (e.g., do step 1, if step 2 fails goto step 1). The method I outlined is what OpenSSL uses though.
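The three steps above can be sketched in Python roughly as follows. This is only an illustration, not OpenSSL's actual code: the function names, the list of small primes, the 40 Miller-Rabin rounds, and the choice to force the top bit (so the result has exactly the requested length) are all assumptions made for the example.

```python
import secrets

# A few small odd primes for a cheap trial-division filter (illustrative choice).
SMALL_PRIMES = [3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]


def is_probable_prime(n: int, rounds: int = 40) -> bool:
    """Miller-Rabin test with `rounds` randomly chosen bases."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for p in SMALL_PRIMES:
        if n == p:
            return True
        if n % p == 0:
            return False
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = secrets.randbelow(n - 3) + 2  # random base in [2, n - 2]
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # this base proves that n is composite
    return True


def generate_prime(bits: int = 512) -> int:
    """Step 1: random odd start. Steps 2 and 3: test, then add 2 and retry."""
    while True:
        # Force the top bit (full length) and the bottom bit (odd).
        p = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        while p.bit_length() == bits:
            if is_probable_prime(p):
                return p
            p += 2
        # Stepped past the largest value of this length: draw a fresh start.
```

Calling `generate_prime(512)` twice gives the two factors for a 1024-bit modulus. Real key generators add further checks (see the FIPS reference below), which this sketch omits.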

For an official, public standard on RSA key generation, see FIPS 186-4, section 5.1 and Appendix B.3.1.

mikeazo
14

"I understand that the typical approach is to use pre-generated lists of large primes."

This is what I also thought, but I had not considered how many primes there are to choose from. As it turns out, you choose from about 2.8x10^147 primes for a 1024-bit RSA key and from about 7.0x10^613 for a 4096-bit RSA key, which in the latter case gives up to 4.9x10^1227 possible pairs of primes. These numbers are so enormous that nobody could simply skim through a list of candidates.

The original answer, by David Robinson on Stack Overflow:

As for whether collisions are possible: modern key sizes (depending on your desired security) range from 1024 to 4096 bits, which means the prime numbers range from 512 to 2048 bits. That means that your prime numbers are on the order of 2^512: over 150 digits long.

We can very roughly estimate the density of primes using 1 / ln(n) (see here). That means that among these 10^150 numbers, there are approximately 10^150/ln(10^150) primes, which works out to 2.8x10^147 primes to choose from, certainly more than you could fit into any list!

So yes, the number of primes in that range is staggeringly enormous, and collisions are effectively impossible. (Even if you generated a trillion possible prime numbers, forming a septillion combinations, the chance of any two of them being the same prime number would be 10^-123.)
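The quoted figures use 10^150 as a round stand-in for 2^512, so they understate the count somewhat. As a rough cross-check, here is a small sketch that applies the same x / ln(x) estimate directly to x = 2^bits, working in logarithms because 2^2048 does not fit in a float. The function name is made up for this example.

```python
import math


def log10_prime_count_below(bits: int) -> float:
    """log10 of the estimate x / ln(x) for the number of primes below x = 2**bits."""
    ln_x = bits * math.log(2)  # ln(2**bits)
    return bits * math.log10(2) - math.log10(ln_x)


for bits in (512, 2048):
    exponent = log10_prime_count_below(bits)
    print(f"primes below 2^{bits}: roughly 10^{exponent:.1f}")
```

Either way the conclusion is the same: the pool of candidate primes is far too large to enumerate or store.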

king_julien
10

The key is that the test used by crypto libraries to determine whether a number is prime is probabilistic. That is, the test uses a randomly chosen value (the "witness") which serves as the basis for the test. If the test passes, then the number is probably prime, but possibly not. We can repeat the same test with a new "witness", and if the test passes again then we have increased our certainty. We can continue to re-test as many times as we want until we've reached the level of certainty that we need.

It is possible, therefore, that the primes used are not actually prime. But it's unlikely enough that it doesn't significantly affect the security of the key.

tylerl
2

You can use the mpz_nextprime function available in the GMP library, after generating a random large number. Here's the link: https://gmplib.org/manual/Number-Theoretic-Functions.html

rt_mn
1

The basic method is: You pick a very large number n. Then you perform a test which either tells you “n is definitely composite” or “n is quite likely not composite, but it might be”. If n is definitely composite then you pick a different n. Otherwise you modify the test slightly to give a result independent of the first one. After running these tests say 50 or 100 times you say “it is so very unlikely that a composite n would pass 50 or 100 tests that I can assume n is a prime”.

If you use the Miller-Rabin test with a random parameter a, then the chance that a composite n passes one round of the test is less than 1/4. Suppose you ran the test on 1000 composite numbers and one prime: at most about 250 composites and the prime would pass the first round, about 62.5 composites and the prime would pass the second, and about one composite and the prime would pass five rounds. The prime will pass ten rounds, while the one composite left has only about a one in 1,000 chance of passing all ten. So after ten rounds it is very likely, but not certain, that n is a prime.
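To make the "less than 1/4" bound concrete, the following sketch runs one Miller-Rabin round for every possible parameter a and reports the fraction for which n passes. It is only practical for tiny n, and the function names are invented for this illustration.

```python
def passes_round(n: int, a: int) -> bool:
    """One Miller-Rabin round: True if odd n > 2 passes for the base a."""
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(a, d, n)
    if x in (1, n - 1):
        return True
    for _ in range(s - 1):
        x = pow(x, 2, n)
        if x == n - 1:
            return True
    return False


def passing_fraction(n: int) -> float:
    """Fraction of bases a in [2, n - 2] for which n passes one round."""
    bases = range(2, n - 1)
    return sum(passes_round(n, a) for a in bases) / len(bases)


print(passing_fraction(561))  # composite (3 * 11 * 17): a small fraction
print(passing_fraction(97))   # prime: every base passes
```

For a prime, every base passes. For an odd composite, the fraction stays below one quarter, which is why each extra round cuts the error probability by at least a factor of four.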

First testing whether n is divisible by 2, 3, 5, 7, 11 or 13 filters out many composite numbers cheaply, and in practice the chance of a composite number passing a round is usually well below a quarter. So after 64 rounds, the chance that a 1024-bit composite has passed them all is less than the chance of buying five winning lottery tickets in a row.
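As a quick piece of arithmetic on how much that small-prime filter removes, the sketch below computes the fraction of integers that are divisible by none of 2, 3, 5, 7, 11, 13. The function name is made up for the example.

```python
from fractions import Fraction


def surviving_fraction(primes):
    """Fraction of integers not divisible by any of the given primes."""
    result = Fraction(1)
    for p in primes:
        result *= Fraction(p - 1, p)  # a share 1/p is divisible by p
    return result


frac = surviving_fraction([2, 3, 5, 7, 11, 13])
print(frac, float(frac))
```

Only about a fifth of all candidates survive these six divisions, so the more expensive Miller-Rabin rounds are needed far less often.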

An interesting fact: if you find p and q which you believe to be prime, set up RSA with these two numbers, and encrypt and decrypt a random message, you hope to get the original message back. Getting the original message back is equivalent to both p and q passing one round of the slightly less strict Fermat test. So you will most likely detect very quickly that your RSA key doesn't work because it's not based on two primes: it has a less than 25% chance of decrypting a random encrypted message correctly.
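A toy illustration of that round-trip check, using deliberately tiny numbers (so the exact failure rate here says nothing about real key sizes). The helper name and the exponent e = 3 are assumptions for the example; `pow(e, -1, phi)` needs Python 3.8 or later.

```python
def count_round_trips(p: int, q: int, e: int = 3) -> int:
    """Count messages m in [0, n) for which decrypt(encrypt(m)) == m,
    building the key as if both p and q were prime."""
    n = p * q
    phi = (p - 1) * (q - 1)  # only the right value if p and q are really prime
    d = pow(e, -1, phi)      # private exponent: inverse of e modulo phi
    return sum(pow(pow(m, e, n), d, n) == m for m in range(n))


good = count_round_trips(23, 11)  # both prime
bad = count_round_trips(21, 11)   # 21 = 3 * 7 is not prime
print(f"prime p:     {good} of {23 * 11} messages survive")
print(f"composite p: {bad} of {21 * 11} messages survive")
```

With two real primes every message survives the round trip. With the composite "prime" many do not, so a handful of random test messages exposes the bad key almost immediately.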

gnasher729
0

I have implemented Maurer's algorithm for generating random provable primes (see Menezes et al., Handbook of Applied Cryptography) in Python. A comparison showed that, for the prime sizes currently used for RSA in practice, it is only moderately slower than using the Miller-Rabin test (which can only deliver numbers that are prime with high probability). Python is interpreted and hence not very fast, but that doesn't matter much in the present context for ordinary users who only need to generate a few large primes for RSA etc. My code PROVABLEPRIME is available at: http://s13.zetaboards.com/Crypto/topic/7234475/1/

Mok-Kong Shen
-3

I'd like to weigh in here: I recently started working in security and had the same thought. You correctly noticed that finding primes is very hard and computationally intensive, and for the same reason that it offers security: finding primes is in fact about as difficult as finding the prime factors of a composite number, for any size of number. If it were easy to find prime factors, it would also be easy to find primes (that is, to prove that a random number is prime), since finding a factor of a number disproves its primality. As far as security goes, that's not very good, since it must be much, much more difficult to break a key than to generate it; otherwise you can forget about security (it will either be too easy to break or too difficult to generate).

However, as @tylerl already explained, that only applies if you want the prime to be a (100%) proven prime. You can instead work with pseudoprimes that are probably prime. The reason is that even if the pseudoprime turns out to be composite after all, anyone is going to have a very hard time finding its factors either way (or, for that matter, the factors of a composite number with that composite pseudoprime as a factor, which would be your public key). The same reasoning applies here: if it were easy, you would have easily checked and discarded the number. Moreover, an attacker has no way of knowing which keys are built from a composite pseudoprime, so he would have to try them all (again the same reasoning: if there were a way, the generator of the key would have been able to use it as well).

You can understand this intuitively: if a number is highly composite, you are more likely to find factors quickly, so it is easy to disprove primality. Conversely, the fewer factors a number has, the more difficult it is to find them, and thus the more likely it is that the number is indeed prime if you have tried to find factors for a while without success. To be complete, there are of course algorithms that are better than simple trial division, but they are likely to be more efficient only to some extent (some orders of magnitude).

In fact, all you need to make something secure (apart from efficiently generating candidates) is a way to narrow down the probability that a number is prime that is much easier and faster than finding the factors of that number if it turns out not to be prime. As it turns out, there exist ways to do that, and that is the fact that security is built on, in contrast with the sometimes (incorrectly) stated claim that asymmetric encryption works with primes. It doesn't: it works with pseudoprimes. However, that is for all practical purposes "good enough": a security designer can set the bar of primality certainty as high as he deems warranted by the value of the information being protected, weighed against the cost of running the primality check.

Also correctly noted: the predictability of the random numbers generated as input for the key is one of the major aspects that differentiate the security levels of systems. Randomly picking primes from a small and known list of primes would indeed indicate a low level of security, but can be acceptable given what is to be protected: the knowledge of which kind of pizza someone orders probably doesn't have to be protected as well as the credit card number you use to order that pizza online (unless maybe you are the White House and your order may be seen as a declaration of war, see "The Pentagon Pizza Meter theory", but that's a story for another question). This is why you mostly get redirected to a third-party payment provider when ordering something online: even though both use the exact same security algorithms, they aren't as securely implemented.

Botteries