I'm going to assume that your computer uses IEEE 754 double precision to store numbers. First, write a pair of functions for converting between a floating-point number and its bit representation. In Python, you can do:
import struct

def get_bits_from_double(x):
    # Reinterpret the 8 bytes of a double as a 64-bit integer
    return struct.unpack('=q', struct.pack('=d', x))[0]

def get_double_from_bits(n):
    # Reinterpret a 64-bit integer's bytes as a double
    return struct.unpack('=d', struct.pack('=q', n))[0]
Of course, if you want fast calculations, you'll probably be using C instead of an interpreted language like Python. In C, you can use a union to store two numbers at the same memory location.
#include <stdint.h>

/* Type-pun between a double and its bit pattern by storing
   both at the same memory location. */
typedef union
{
    uint64_t bits;
    double value;
} DOUBLE_un;

inline uint64_t get_bits_from_double(double x)
{
    DOUBLE_un un;
    un.value = x;
    return un.bits;
}

inline double get_double_from_bits(uint64_t bits)
{
    DOUBLE_un un;
    un.bits = bits;
    return un.value;
}
Or something like that; I don't have a C compiler handy, so I haven't actually tested it.
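One sanity check you can run once it compiles (again untested, and it assumes the two functions above are in scope): 1.0 has sign 0, a biased exponent of 0x3FF, and an all-zero significand, so its bit pattern should come out as 0x3FF0000000000000.

#include <stdio.h>

int main(void)
{
    /* Round-trip a known value through both conversions */
    uint64_t bits = get_bits_from_double(1.0);
    printf("%016llx\n", (unsigned long long)bits);   /* expect 3ff0000000000000 */
    printf("%g\n", get_double_from_bits(bits));      /* expect 1 */
    return 0;
}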
Now, recall that a double is stored as a three-part bitfield, in the following order:
- sign (1 bit, 0 = positive / 1 = negative)
- exponent (11 bits)
- significand (52 bits)
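If you want to pull those fields apart yourself, shifts and masks will do it. Here's a sketch using the conversion function from above; the widths and order come straight from the list:

#include <stdint.h>
#include <stdio.h>

void print_double_fields(double x)
{
    uint64_t bits = get_bits_from_double(x);
    uint64_t sign        = bits >> 63;                /* top bit */
    uint64_t exponent    = (bits >> 52) & 0x7FF;      /* next 11 bits, biased by 1023 */
    uint64_t significand = bits & 0xFFFFFFFFFFFFFULL; /* low 52 bits */
    printf("sign=%llu exponent=%llu significand=%013llx\n",
           (unsigned long long)sign,
           (unsigned long long)exponent,
           (unsigned long long)significand);
}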
Since the square root of a negative number isn't real, I'm going to assume that you're only going to pass positive numbers to the function. Thus, the sign bit will always be zero, and the exponent field will dominate the integer value of the bit pattern. In other words, the bit pattern behaves like a scaled, biased logarithm of the number. We want to approximate a square root, so let's see what happens when we cut the bit pattern in half.
double halve_bits(double x)
{
    uint64_t bits = get_bits_from_double(x);
    bits = bits / 2;
    return get_double_from_bits(bits);
}
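To see how close this gets, divide its output by the real square root. A quick probe (a sketch, assuming sqrt from <math.h> and the helpers above):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double samples[] = { 2.0, 10.0, 12345.0, 1e9 };
    for (int i = 0; i < 4; i++)
    {
        double x = samples[i];
        /* Compare the bit-halving trick against the real square root */
        printf("x = %g: halve_bits(x) / sqrt(x) = %g\n", x, halve_bits(x) / sqrt(x));
    }
    return 0;
}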
If you evaluate the expression halve_bits(x) / sqrt(x) for various numbers, you'll get a very tiny number on the order of $10^{-154}$. This is because we neglected to account for the bias in the exponent field: halving the bit pattern also halves the built-in bias of 1023, which leaves the result a factor of roughly $2^{511.5} \approx 10^{154}$ too small. We could adjust the numbers by multiplying them by 1e154. But you wanted a fast function, so let's apply strength reduction and replace that floating-point multiplication with an integer addition, working on the bit pattern directly. And of course, write it in C for speed.
double approx_sqrt(double x)
{
    uint64_t bits = get_bits_from_double(x);
    bits = bits / 2 + 2303426388484757850;  /* magic constant; see below */
    return get_double_from_bits(bits);
}
The magic number added to the halved bits is the bit representation of the number $1.0914553763271334 \times 10^{-154}$; since the bit pattern behaves like a scaled logarithm, adding a constant to it approximates multiplying the value by a constant. YMMV depending on exactly how you're calculating the error, but AFAICT this function has a maximum relative error of about 3.5%. If you need a more accurate square root, you can use the output of approx_sqrt as the initial guess for an iterative algorithm like Newton's method. But since you explicitly want a rough approximation, this will be fine.
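For the record, here's what that refinement would look like. One Newton-Raphson iteration roughly squares the relative error, so 3.5% should drop to something around 0.06% (a sketch building on approx_sqrt above, not something I've benchmarked):

double refined_sqrt(double x)
{
    double y = approx_sqrt(x);
    /* One Newton-Raphson step for f(y) = y*y - x */
    return 0.5 * (y + x / y);
}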
So there you go: A fast approximate square root algorithm done entirely with integer math instead of floating-point. But you'll want to do some timing tests to make sure it's actually faster than the standard sqrt function.
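A rough harness for that timing test might look like the following (just a sketch: clock() is coarse, and an optimizing compiler can easily distort a micro-benchmark like this, so treat the numbers with suspicion):

#include <math.h>
#include <stdio.h>
#include <time.h>

int main(void)
{
    const int N = 10000000;
    volatile double sink = 0.0;  /* keep the compiler from optimizing the loops away */

    clock_t start = clock();
    for (int i = 1; i <= N; i++)
        sink += approx_sqrt((double)i);
    double t_approx = (double)(clock() - start) / CLOCKS_PER_SEC;

    start = clock();
    for (int i = 1; i <= N; i++)
        sink += sqrt((double)i);
    double t_sqrt = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("approx_sqrt: %f s, sqrt: %f s\n", t_approx, t_sqrt);
    return 0;
}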
Comments:
- "…sqrt call is a significant cost. How many points are you generating? How much time do you have to generate these points?" – Cris Luengo, Jul 11 '23 at 15:32
- "…sqrt function is slow, the bottleneck might not be caused by it at all." – Filip Milovanović, Jul 11 '23 at 18:06
- "…sunflower_on_sphere() algorithm only uses trig and inverse trig." – uhoh, Jul 11 '23 at 22:26
- "…sqrt() by extracting the floating point exponent (with e.g. ilogb()), which gives you a fast approximation to the base-2 log." – Tavian Barnes, Jul 12 '23 at 18:59
- "sqrt() will unlikely hold you back, see this simple JS animation with 720k sqrt() calls per frame: https://stackoverflow.com/a/58485681/7916438 - and also see the other one, which is slow as hell because of the change in drawing, not the calculation. If you have performance issues, don't rule out that they're somewhere else." – tevemadar, Jul 13 '23 at 12:51