
I was reading about the Blum-Blum-Shub random number generator, and its security depends on the hardness of factoring very large numbers (like many things in crypto do).
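For context, here is a minimal sketch of the generator as I understand it (toy parameters that are far too small to be secure; the names are mine):

```python
# Toy Blum-Blum-Shub sketch -- illustrative only, parameters far too small.
# Security rests on the hardness of factoring n = p*q with p, q ≡ 3 (mod 4).

p, q = 11, 23            # toy Blum primes; real ones are hundreds of digits long
n = p * q                # public modulus whose factorization must remain hard
seed = 3                 # seed, coprime to n

def bbs_bits(x, n, k):
    """Emit k pseudorandom bits: square the state mod n, output the low bit."""
    bits = []
    for _ in range(k):
        x = (x * x) % n
        bits.append(x & 1)
    return bits

print(bbs_bits(seed, n, 16))
```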

I'm just wondering, if I have 10 computers, can I break Blum-Blum-Shub 10 times faster? Or is it impossible to factor numbers more quickly using parallel computation?

Maestro

2 Answers


Unfortunately, no! Factorization is a hard problem on which many cryptosystems and crypto-protocols are built; it is known as the IFP (Integer Factorization Problem) and is comparable in intractability to the DLP (Discrete Logarithm Problem).

Even if $M = 10^{10}$ computers were available, you could not speed up solving the IFP by a factor of $M$ unless you invented a new, clever algorithm that is highly parallelisable. A brute-force attack, even on a modest 512-bit modulus, is out of reach and would take billions of years (a back-of-the-envelope estimate follows the list below). The best algorithm known today for this task (it has factored moduli of up to 768 bits on a large network of computers over a few months) is the Number Field Sieve, or NFS for short. Roughly speaking, it is based on two main steps and requires a huge amount of memory storage.

  • Sieving step: can be performed in parallel on a large network of computers, but unfortunately needs a large amount of memory storage.
  • Reduction step: a linear-algebra reduction such as Gaussian elimination, or a refinement such as the Block Lanczos algorithm, which requires solving a huge matrix and cannot easily be parallelised.
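To put the brute-force remark in perspective, here is the back-of-the-envelope estimate promised above (illustrative figures of my own: $10^{10}$ machines, each testing $10^9$ candidate divisors per second):

```python
# Rough cost of brute force (trial division) on a 512-bit modulus N = p*q
# with balanced factors: divisors must be tried up to about 2**256.

candidates = 2 ** 256        # divisors up to sqrt(N), roughly 2**256
machines   = 10 ** 10        # the M = 10^10 computers considered above
rate       = 10 ** 9         # assumed trials per second per machine
seconds    = candidates / (machines * rate)
years      = seconds / (3600 * 24 * 365)
print(f"{years:.3e} years")  # on the order of 10^50 years
```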

NB: I conclude that factorization algorithms are a central and vital question for cryptography. Don't ignore it! Take a look at the work done in this field on the web to measure the importance of this kind of question.

Robert NACIRI

Summary: to a considerable degree, more computers do speed up factorization of a given integer; but the expected time decreases significantly more slowly than the inverse of the number of computers used: we are in the realm of sub-linear speedup.


Some high-performance (but not the best) factorization algorithms, in particular ECM, enjoy near-linear speedup with the number of computers used; in practice, only the initial distribution of parameters and the announcement of success by the lucky computer require communication between computers. Failure of one computer is not a serious problem: its work can be ignored or redone.
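As a toy illustration of that "lucky computer wins" structure, here is a sketch of my own that uses Pollard's rho with independent random parameters as a stand-in for ECM (both are randomized and need essentially no coordination between workers):

```python
import random
from math import gcd
from multiprocessing import Pool

def rho_attempt(args):
    """One independent randomized factoring attempt (Pollard's rho).
    Each worker draws its own random parameters, so no communication is needed."""
    n, seed = args
    rng = random.Random(seed)
    x = rng.randrange(2, n)
    c = rng.randrange(1, n)
    y, d = x, 1
    while d == 1:
        x = (x * x + c) % n          # tortoise: one step
        y = (y * y + c) % n          # hare: two steps
        y = (y * y + c) % n
        d = gcd(abs(x - y), n)
    return d if d != n else None     # None = unlucky attempt, simply retried

if __name__ == "__main__":
    n = 1000003 * 999983             # toy semiprime; real ECM targets are far larger
    with Pool(4) as pool:            # 4 "computers"; only the lucky one's result is used
        for d in pool.imap_unordered(rho_attempt, [(n, s) for s in range(32)]):
            if d:
                print("factor found:", d, "cofactor:", n // d)
                break
```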

Problem is, the best algorithm we know for integer factorization is GNFS, which is asymptotically faster than ECM for factoring $N=p\cdot q$ with two factors $p$ and $q$ of about equal size, as often used in RSA (and in the theoretical setup in which BBS is typically discussed); the crossover, though dependent on many things, is perhaps $N$ of 300 bits give or take a factor of 2. So what's really needed to attack that mainstream variant of RSA is efficiently running GNFS on multiple computers.
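For concreteness, the usual heuristic running-time estimates behind that comparison, written in standard $L$-notation $L_x[a,c]=\exp\!\big((c+o(1))\,(\ln x)^a(\ln\ln x)^{1-a}\big)$ and with $p$ the smallest prime factor of $N$, are:

$$\text{GNFS}:\ L_N\!\left[\tfrac13,\ \sqrt[3]{64/9}\,\right]\approx L_N\!\left[\tfrac13,\ 1.923\right],\qquad \text{ECM}:\ L_p\!\left[\tfrac12,\ \sqrt{2}\,\right].$$

This is why ECM wins when the smallest factor is small, while GNFS wins for RSA-style moduli with two balanced factors.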

Simplifying greatly, GNFS has two main bottlenecks:

  • The relation collection step (also known as the sieving step), which is efficiently run on ordinary computers that send the relations they find over an ordinary broadband network (the internet) for use in the next step; not much of a parallelization problem here.
  • The matrix step, which can be and routinely is distributed among multiple CPUs, but requires massive communication between them (ideally high-speed shared memory, which is not achieved beyond a few CPUs or cores even on enterprise-grade servers, where 8 heavily interconnected CPUs, each with 18 cores and 2 threads per core, is the current high end, see SEJPM's comment), and is very expensive even on high-performance computers, which implement the highest level of interconnection between many nodes using specialized technology (such as InfiniBand); a toy sketch of what this linear algebra actually computes follows this list.
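To make the matrix step concrete, here is a toy Dixon-style sketch of my own (not GNFS itself, but the goal of the linear algebra is the same): find a subset of collected relations whose exponent vectors sum to zero mod 2, which yields $x^2 \equiv y^2 \pmod N$ and a factor via $\gcd(x-y,N)$.

```python
from itertools import combinations
from math import gcd, isqrt, prod

# Toy Dixon-style factorization: textbook-sized numbers only.
N = 84923                       # toy modulus (secretly 163 * 521)
FACTOR_BASE = [2, 3, 5, 7]

def smooth_exponents(m):
    """Exponent vector of m over FACTOR_BASE, or None if m is not smooth."""
    exps = []
    for p in FACTOR_BASE:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        exps.append(e)
    return exps if m == 1 else None

# Relation collection: the embarrassingly parallel part of the computation.
relations = []
for a in range(isqrt(N) + 1, isqrt(N) + 300):
    exps = smooth_exponents(a * a % N)
    if exps:
        relations.append((a, exps))

def find_factor(relations):
    """Miniature 'matrix step': brute-force search (fine at toy scale) for a
    subset of relations whose exponent vectors sum to zero mod 2."""
    for k in range(1, len(relations) + 1):
        for subset in combinations(relations, k):
            total = [sum(col) for col in zip(*(e for _, e in subset))]
            if all(t % 2 == 0 for t in total):
                x = prod(a for a, _ in subset) % N
                y = prod(p ** (t // 2) for p, t in zip(FACTOR_BASE, total)) % N
                d = gcd(x - y, N)
                if 1 < d < N:
                    return d
    return None

d = find_factor(relations)
if d:
    print("factor:", d, "cofactor:", N // d)   # expected: 163 and 521
```

In the real computation the matrix has on the order of $10^8$ extremely sparse rows over $\mathrm{GF}(2)$, and iterative solvers such as Block Lanczos or Block Wiedemann replace the brute-force subset search above; their repeated matrix-vector products are what force the heavy inter-node communication.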

However, there are parameters in GNFS that allow a trade-off between doing more work in the relation collection step and manipulating considerably less data in the matrix step (again simplifying: only the best relations are kept for use in the matrix step, the others are discarded). The issue is complex, well beyond my full grasp, but I'll give two meta-arguments that parallelization helps a lot:

  • In the record Factorization of a 768-bit RSA modulus (collective, in proceedings of Crypto 2010), "In total 64 334 489 730 relations were collected, each requiring about 150 bytes", "for an average of about four relations every three seconds", "scaled to a 2.2 GHz Opteron core with 2 GB RAM"; that's about 500 000 core⋅days of said core. For the hard part of the matrix step, "doing the entire first and third stage would have taken 98 days on 48 nodes (576 cores) of the 56-node EPFL cluster" of "3.0GHz Pentium-D processors on a Gb Ethernet"; that's 60 000 core⋅days of pragmatically coupled (rather than coupled-to-the-max) CPUs. Even if the cores do not quite compare, significantly more work was done on relation collection than in the matrix step. I conclude that the matrix step was not the limiting bottleneck by any metric other than capital investment (and the expensive cluster was not purpose-built, and was only used for months versus years for the computers collecting relations).
  • Arjen K. Lenstra, Adi Shamir, Jim Tomlinson, and Eran Tromer's Analysis of Bernstein's Factorization Circuit (in proceedings of Asiacrypt 2002) concludes that, with a custom hardware "device that costs a few thousand dollars" proposed by Daniel J. Bernstein to perform the matrix step efficiently, "from a practical standpoint, the security of RSA relies exclusively on the hardness of the relation collection step of (GNFS)".
fgrieu