
In the Quadratic Sieve algorithm, we first decide on a smoothness bound B and then look for B-smooth values by sieving a quadratic polynomial.

I can find a few formulas that help figure out how to decide on B.

To factor a number N, we can use the following:

$L = e^{\sqrt{\ln(N)\ln(\ln(N))}}$

$B = L^{\frac {1}{\sqrt 2}}$

This gives a rough estimate of the smoothness bound to use.
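For concreteness, here is a quick computation of those two quantities in Python (a sketch; `smoothness_bound` is a name of my own, and real implementations tune $B$ empirically around this value):

```python
import math

def smoothness_bound(N):
    """Heuristic B = L^(1/sqrt 2), with L = exp(sqrt(ln N * ln ln N))."""
    lnN = math.log(N)
    L = math.exp(math.sqrt(lnN * math.log(lnN)))
    return L ** (1 / math.sqrt(2))

print(smoothness_bound(15347))   # ~27 for the N of Wikipedia's example
print(smoothness_bound(10**60))  # ~1e8 for a 60-digit N
```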

However, I am unable to find any formula or rule of thumb for deciding how many numbers to sieve using the quadratic polynomial.

If it's not clear what I am talking about, let me explain using the Wikipedia article on the QS.

In the Data Collection part of the "Example of Basic Sieve", it says the following:

> begin the sieving process for each prime in the basis, choosing to sieve the first 0 ≤ X < 100 of Y(X).

So here they choose to generate a list of 100 Y(X)'s to sieve. Is there any rule of thumb or formula for arriving at this number (100 in this case)?


1 Answer


In the pure Quadratic Sieve, we need to find a few more relations (equivalently, smooths) than there are primes in the factor base. Here we count the small primes $p_i$ modulo which $N$ is a quadratic residue, not their powers. This count is also the number of columns of the (sparse) matrix of relations, whose rows are the relations, that serves as input to the linear algebra phase. Typically, 5 more rows than columns will be enough. Each further extra relation roughly halves the probability of failure.
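As a concrete illustration of that count (a minimal Python sketch with helper names of my own; Euler's criterion stands in for a proper Legendre-symbol routine):

```python
def is_qr(n, p):
    """Euler's criterion: is n a quadratic residue modulo the odd prime p?"""
    return pow(n % p, (p - 1) // 2, p) == 1

def factor_base(N, B):
    """Odd primes p <= B making N a QR mod p, plus 2 (for odd N):
    these are the columns of the matrix of relations."""
    fb = [2]
    for p in range(3, B + 1, 2):
        if all(p % q for q in range(3, int(p**0.5) + 1, 2)) and is_qr(N, p):
            fb.append(p)
    return fb

fb = factor_base(15347, 30)  # -> [2, 17, 23, 29] for Wikipedia's N
target = len(fb) + 5         # stop sieving once we have this many smooths
```

With those 5 extra relations, a nontrivial dependency modulo 2 among the rows exists with overwhelming probability, per the failure-halving rule above.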

This gives a working criterion to stop sieving and start the linear algebra, but it does not directly tell how many integers we will need to sieve. The rule of thumb for that, and for other parameters, depends on many things, including

  • Critically, whether we sieve one polynomial (QS) or multiple polynomials (MPQS/SIQS). With plain QS, it all too often happens that the polynomial grows to overly large values, so that smooths become a rarity; then, enlarging the sieve or reusing it to sieve farther on the same polynomial won't help much.
  • The strategy w.r.t. polynomial values for which summing the logs of the factors found by sieving leaves us unsure (rather than highly confident) that they are smooth enough for our purposes. We can
    • Ignore them, even though they could still be $B$-smooth.
    • Try small prime factors up to the highest prime sieved $B$, and their powers even if those exceed $B$ (in other words, we restrict to $B$-smooth values of the polynomial that passed the sieving step); that's easy, and typical for basic (MP/SI)QS.
    • Try small factors up to some higher limit $B'$ (in other words, we sieve primes and prime powers up to $B<B'$, and restrict to $B'$-smooths that passed the sieving step).
    • Additionally allow one large prime factor (PMPQS) up to some higher limit, in the hope that a collision among these large primes will let two otherwise unusable relations yield a new usable one (that is, one without a large prime, so that it fits the limited number of columns in the matrix).
    • Additionally allow two large prime factors (PPMPQS).
  • The sieving threshold (for the sum of the logs of the primes accumulated in a sieve entry), which is a compromise between wrongly rejecting $B$-smooths and spending too much time trying to factor candidates that won't yield usable relations; see the sketch after this list.
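To make that compromise concrete, here is one common way to phrase the accept test (a sketch; the name `passes` and the slack value are illustrative and must be tuned):

```python
import math

def passes(log_sum, y, slack=5.0):
    """Accept a sieve position as a smooth candidate when the logs of the
    sieved primes nearly account for |Y(x)|. A larger slack misses fewer
    true B-smooths (e.g. it forgives prime powers that were not sieved)
    but sends more non-smooth candidates to the factoring step."""
    return log_sum >= math.log2(abs(y)) - slack
```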

Recommendations to get QS working:

  • First make something that handles modest $N$ without a sieve array!
    • Use a factor base of primes $p_i$ making $N$ a quadratic residue, up to some limit $B$. It's safer to first err on the large side of the optimum $B$.
    • Find $B$-smooths among the values of the polynomial $t^2-N$ with $t\gtrsim\left\lceil\sqrt N\,\right\rceil$ in the most basic way (e.g. trial division). Find a few more smooths than there are primes.
    • Get the linear algebra working to factor $N$.
  • Optimize most (or all) trial divisions of $t^2-N$ into a test that $t\bmod p_i^k$ falls into a set of two pre-computed values.
  • At that stage, the bottleneck should be finding smooths by attempting to fully factor many values of $t^2-N$ using the (optimized) trial division, with most of them ending up not $B$-smooth. Introduce the sieve array, whose purpose is to (dramatically) reduce the number of smooth candidates, at the cost of wrongly rejecting a few (sieving for prime powers helps make false rejections rare); see the sketch after this list. This allows increasing $N$ (thus raising the optimal $B$).
  • Introduce further optimizations one by one:
    • having $-1$ in the factor base, that is, also sieving $N-t^2$ for $t\lesssim\left\lfloor\sqrt N\,\right\rfloor$
    • the easy "multiplier" improvement, which factors $m\,N$ for some selected small integer $m$, such that the factor base is improved (has relatively many small primes)
    • using multiple polynomials, which allows sieving and factoring much smaller candidates, thus more likely to be $B$-smooth
    • the bottleneck may still be the attempted factorization of candidates that passed the sieve test (depending on the threshold in sieving, and the size of the factor base); some savings are possible by using the (optimized) trial division only for small primes, then something better, perhaps Pollard's rho with a primality test, when it does not yield quickly; such a test will be necessary for the large prime(s) improvements.
    • one then two large prime improvements (PMPQS and PPMPQS).
  • Optimize whatever the bottleneck is, and tune the many parameters. Ultimately, the bottleneck should be sieving. Optimized implementations make a lot of effort there, with techniques and code specialized according to the size of the primes sieved, and interaction with the CPU cache(s).
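Tying the "two pre-computed values" test and the sieve array together, here is a minimal single-polynomial sketch (Python; the brute-force modular square roots, sieving first prime powers only, and the slack value are simplifications of mine):

```python
import math

def sqrt_mod(n, p):
    """Roots of x^2 = n (mod p), by brute force (use Tonelli-Shanks for real)."""
    return [x for x in range(p) if (x * x - n) % p == 0]

def sieve_candidates(N, fb, M, slack=5.0):
    """Sieve Y(x) = (t0 + x)^2 - N for 0 <= x < M over factor base fb.
    p divides Y(x) exactly when (t0 + x) mod p is one of the (at most two)
    roots of t^2 = N (mod p) -- the pre-computed-values test, turned into
    strides through the array. Candidates still need trial division to
    confirm B-smoothness, since prime powers are not sieved here."""
    t0 = math.isqrt(N) + 1
    logs = [0.0] * M
    for p in fb:
        for r in sqrt_mod(N % p, p):
            for x in range((r - t0) % p, M, p):
                logs[x] += math.log2(p)
    return [(t0 + x, (t0 + x) ** 2 - N) for x in range(M)
            if logs[x] >= math.log2((t0 + x) ** 2 - N) - slack]

fb = [2, 17, 23, 29]                           # factor base for N = 15347
for t, y in sieve_candidates(15347, fb, 100):  # the 0 <= X < 100 of the question
    print(t, y)
```

On Wikipedia's N = 15347 this reports a short list of candidate positions out of the 100, including true smooths such as t = 124 with Y = 29; the expensive factoring step then runs on those few, not on all 100.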