10

What is a fast algorithm for computing integer square roots on machines that doesn't support floating-point arithmetic?

I'm looking for a fast algorithm for computing the integer square root of an integer $N \ge 0$, i.e., the smallest integer $m \ge 0$ such that $m^2 \le N < (m+1)^2$.

The reason is that I'm operating on a virtual machine that doesn't support floating-point arithmetic (real-numbers), so algorithms like Newton's cannot be implemented, right?

The naive solution for finding the square root of $N$ is to check for $m=0,1,\ldots$ whether $m^2 \le N < (m+1)^2$. Is this the best algorithm?

Shuzheng
  • 5,821
  • Apart from the integer version of Newton's algorithm, a binary search is also faster than the naive method (except for very small $N$, but then all methods that aren't insane are fast). – Daniel Fischer Oct 12 '17 at 18:10
  • The optimal solution highly depends on the amount of RAM and machine commands available, and whether are you going to perform arbitrary precision operations or just use a fixed-size (which one) integer. – g.kov Jun 24 '20 at 10:19

5 Answers5

13

I think the easiest method will Babylonians sqrt method and it works well with machine supporting only integer division.

def babylonian(n):
x = n
y = 1
while(x > y):
    x = (x+y)//2
    y = n//x
return x

This algorithm runs in approximately $O(\log n)$ time. More details can be found on StackOverflow.

Lakshman
  • 677
  • Does your algorithm end too early as written? When the algorithm achieves x=y, you have found your integer square root, but in the step before that it seems it can occur that x<y, so if you stop at that iteration you will be off.

    For example with n=50 the iterations for (x,y) will be (50,1), (25,2), (13,3), (8,5), (6,8), (7,7) and of course the answer is 7.

    – POJ Oct 18 '24 at 03:16
  • I think you can avoid this problem if instead you update y before you update x in the while loop. That seems more like the traditional Babylonian for real numbers. – POJ Oct 18 '24 at 03:45
10

A fast ($O(\log n)$) way to calculate the integer square root is to use a digit-by-digit algorithm in base2:

$$\text{isqrt(n)} = \begin{cases} n & \text{if $n < 2$} \\ 2\cdot\text{isqrt(n/4)} & \text{if $(2\cdot\text{isqrt(n/4)} + 1)^2 > n$} \\ 2\cdot\text{isqrt(n/4)+1} & \text{otherwise} \end{cases}$$

Making sure to calculate $n/4$ using bitshifts, and doing the above iteratively. An example for 16-bit integers can be found on Wikipedia.

orlp
  • 10,724
9

Assuming integer log2 is available, you can get a pretty good starting estimate by rewriting $\sqrt{x}$ like this: $$\sqrt{x}=x^{\frac{1}{2}}=2^{\frac{\log_{2}{x}}{2}}$$ Integer $\log_{2}$ is often one or two cheap instructions. You just have to refine the result a couple times since everything is floored. This O(1) implementation in C++ finds the square root of every integer $[0, 2^{32}-1]$ in 15 seconds on my PC - about 3.5 ns per iteration.

#include <bit>
constexpr uint16_t Sqrt(const uint32_t x) {
    // Avoid divide by zero
    if (x < 2) {
        return static_cast<uint16_t>(x);
    }
    // This code is based on the fact that
    // sqrt(x) == x^1/2 == 2^(log2(x)/2)
    // Unfortunately it's a little more tricky
    // when fast log2 is floored.
    const uint32_t log2x     = std::bit_width(x) - 1;
    const uint32_t log2y     = log2x / 2u;
    uint32_t       y         = 1 << log2y;
    uint32_t       y_squared = 1 << (2u * log2y);
    int32_t        sqr_diff  = x - y_squared;
    // Perform lerp between powers of four
    y += (sqr_diff / 3u) >> log2y;
    // The estimate is probably too low, refine it upward
    y_squared = y * y;
    sqr_diff  = x - y_squared;
    y += sqr_diff / (2 * y);
    // The estimate may be too high. If so, refine it downward
    y_squared = y * y;
    sqr_diff  = x - y_squared;
    if (sqr_diff >= 0) {
        return static_cast<uint16_t>(y);
    }
    y -= (-sqr_diff / (2 * y)) + 1;
    // The estimate may still be 1 too high
    y_squared = y * y;
    sqr_diff  = x - y_squared;
    if (sqr_diff < 0) {
        --y;
    }
    return static_cast<uint16_t>(y);
}

The two variable integer divisions in the refinement step aren't cheap, but this was still faster than any of the looping solutions I came up with.

4

For $1 \leq N \leq 15,$ suggest table lookup. $1 \leq N \leq 3,$ output $1.$ For $4 \leq N \leq 8,$ output $2.$ For $9 \leq N \leq 15,$ output $3.$

Otherwise, as far as finding a good starting point, I suppose you can begin with $1,4,16,..., 4^k$ until $4^k \geq N > 4^{k-1}.$ Then let the starting point be $$ x_0 = 2^k $$

Newton's method with integers and the floor function, but at the very end prevent loops: $$ x_{j+1} = \left\lfloor \frac{x_j^2 + N}{2 x_j} \right\rfloor $$ As soon as $$ |x_{j+1} - x_j| < 5, $$ switch to your one-at-a-time idea between those endpoints. Otherwise, an infinite loop is possible.

When I programmed this, I let Newton go until consecutive terms were very close together ($<5$). Then I set $m$ to the smaller one, and increased $m$ by one until the square exceeded $N.$ Then I decreased $m$ by one until the square is no larger than $N.$

Will Jagy
  • 146,052
  • You have found the integer square root - namely $x_j$ - when $x_{j+1} \geqslant x_j$ with $j \geqslant 1$ (that is in case the initial guess $x_0$ was too small, then $x_1 > x_0$ doesn't imply that $x_0$ is the integer square root). – Daniel Fischer Oct 12 '17 at 18:06
  • This is slow because it uses integer division. – orlp Oct 12 '17 at 18:09
  • @DanielFischer I did not put in everything. When I programmed this, I let Newton go until consecutive terms were very close together. Then I set $m$ to the smaller one, and increased $m$ by one until the square exceeded $N.$ Then I decreased $m$ by one until the square is no larger than $N.$ – Will Jagy Oct 12 '17 at 18:17
  • 1
    I'm just saying that there is a quite simple stopping rule. Iterate until $x_{j+1} \geqslant x_j$, then $x_j$ is it (if $j > 0$). – Daniel Fischer Oct 12 '17 at 18:20
  • @DanielFischer I see what you mean now; that seems better than what I did. At the time, I was dismayed to find an infinite loop I did not understand, until I started putting in cerr commands and so on. The thing I described was my cure, first thing to come to mind. – Will Jagy Oct 12 '17 at 18:23
0

In addition to the Babylonian method (the same as Newton's method), there is another log(n) complexity algorithm, namely the bitwise algorithm, which in practice is slightly faster than the Babylonian algorithm as it uses only binary operations. Here is the Python implementation of it:

def integer_square_root_bitwise(n):
    orig_n = n
    result = 0
    bit = 1 << (n.bit_length() // 2 * 2)
    while bit > 0:
        if n >= result + bit:
            n -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    is_square = (result * result == orig_n)
    return is_square, result

However, if you are working with huge numbers (say over 100 digits), there is a faster algorithm (it is not in the Wikipedia list https://en.wikipedia.org/wiki/Methods_of_computing_square_roots, so here I refer to it as Bo's method):

For each iteration, we are trying to get the best estimation of $\Delta = x - x_k$

  1. find the upper limit of $\Delta_{max} = \Bigl \lfloor \frac{y-x_k^2}{2 x_k} \Bigr \rfloor$
  2. find the best estimation of $\Delta$ as $\Delta = \Bigl \lceil \dfrac{y - x_k^2}{2 x_k + \delta_{max}} \Bigr \rceil$

then we have a update $x_{k+1} = x_k + \Delta$

The details of the algorithm can be found at https://botian.replnotes.com/posts/fast_integer_square_root

Here is an example output of Bo's method of computing $\sqrt{5438224}$

Finding the square root of 5438224 using Bo's method
iteration                x            range y - x^2               
        0             2304                - 129808          
        1             2332                0 0               
The square root of 5438224 is 2332

The step 0 is an initial estimation. It finds the solution in one iteration. By comparison, here is the output of the bitwise method:

Finding the square root of 5438224 using the bitwise method
iteration       result          bit            n
        1            0      4194304      5438224
        2      4194304      1048576      1243920
        3      2097152       262144      1243920
        4      1048576        65536      1243920
        5       589824        16384       129808
        6       294912         4096       129808
        7       147456         1024       129808
        8        73728          256       129808
        9        37120           64        55824
       10        18624           16        18640
       11         9328            4            0
       12         4664            1            0
The square root of 5438224 is 2332

It takes 12 iterations for the bitwise method to complete the processing.

We have to use a bigger number to demonstrate the iterations of Bo's method. Here is the process of computing $\sqrt{2758815150486084950425754176}$

Finding the square root of 2758815150486084950425754176 using Bo's method
iteration                x            range y - x^2               
        0   48378511622144                - 418334763712162868194597440
        1   52517137969970     184933324488 765369929220257803953276
        2   52524424323189           505498 3676709702624455
        3   52524424323224                0 0               
The square root of 2758815150486084950425754176 is 52524424323224

The same number requires 47 iterations for the bitwise algorithm to complete the computation.

Bo Tian
  • 161
  • Interesting approach! Noting that your code is in Python, did you try benchmarking this against the standard library math.isqrt function? – Mark Dickinson May 28 '25 at 16:07