
I recently answered no to the question *Is there a floating point CSPRNG?* My reasoning was unpredictable rounding errors, especially across dissimilar hardware.

In its most basic form, a CSPRNG can be constructed by securely hashing a randomly initialized counter. This is what happens inside Java's SecureRandom. So if we can find a hash that operates on floating point numbers, we'd have a floating point CSPRNG as originally asked for.
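As a sketch of that construction (illustrative only; Java's actual SecureRandom internals differ in detail), hashing a secret seed concatenated with an incrementing counter gives a deterministic pseudorandom stream:

```python
import hashlib

def csprng_blocks(seed: bytes, n_blocks: int):
    """Yield pseudorandom 32-byte blocks by hashing seed || counter."""
    for counter in range(n_blocks):
        yield hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()

# Deterministic: the same seed always yields the same stream.
stream1 = b"".join(csprng_blocks(b"\x01" * 32, 4))
stream2 = b"".join(csprng_blocks(b"\x01" * 32, 4))
assert stream1 == stream2 and len(stream1) == 128
```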

The following abbreviated abstract is from this paper:

> This paper shows how one of these systems can hash messages at extremely high speed—much more quickly than previous systems at the same security level—using IEEE floating-point arithmetic.

My problem is that I can't understand the working; it's all Greek to me. But I can spot multipliers of 0.98, 2.2 and 1.01, which are clearly floating point numbers.

Was I wrong, and are floating point CSPRNGs actually possible, if not common?

Clarification: Some comments are appearing that are only relevant to a non-deterministic RNG, or to generating floating point output. The original question asked for a natively floating point deterministic generator. So internally it should only work with floating point numbers, as (I think) my linked paper does. It's clarification of this last point regarding the linked paper that I'm looking for, as that would pave the way for a native floating point CSPRNG.

Addendum

I've now come across Uni64(), a random number generator by George Marsaglia that uses double precision variables. This is an extract of the initialisation:

```c
const double r=9007199254740881.0/9007199254740992.;
const double d=362436069876.0/9007199254740992.0;
static double c=0.; static int i=97,j=33; double x;
```

Without doubt, r, d and c are double precision variables. So for example, r = 999.999999999987676524e-3 and d = 40.2384869730987304592e-6. To be honest, I don't know what this means. Is r meant to be 1 and I've rounded, or is this correct? I'm not sure what d could be rounded to.
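For the record, both constants can be checked exactly, since their common denominator is $2^{53}$ and both quotients are exactly representable in binary64 (a quick check in Python, whose floats are IEEE binary64):

```python
from fractions import Fraction

r = 9007199254740881.0 / 9007199254740992.0
d = 362436069876.0 / 9007199254740992.0

# Both denominators are 2**53, and both exact quotients are
# representable in binary64, so the divisions introduce no error:
assert Fraction(r) == Fraction(9007199254740881, 9007199254740992)
assert Fraction(d) == Fraction(362436069876, 9007199254740992)
# r is not 1: it is exactly 111 units in the last place below 1.0.
assert r != 1.0
assert 1.0 - r == 111 / 2.0**53
```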

"the uni64() doubles have been multiplied by 2^53"

Is this the secret? Rounding/accumulation errors do occur, as expected. But by shifting left 53 bits, they effectively become part of the randomness in the lower-significance bits: what seem like random errors in precision become part of the randomness itself. If so, this would appear to be a floating point random number generator...
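One detail worth checking here: multiplying a binary64 value by $2^{53}$ is itself exact, since it only adjusts the exponent and leaves the significand bits untouched (a quick check in Python, whose floats are binary64):

```python
# Scaling by 2**53 only changes the exponent, not the significand,
# so no bits are created or destroyed by the multiplication itself.
u = 362436069876.0 / 9007199254740992.0   # the constant d above
m = u * 2.0**53
assert m == 362436069876.0
assert m.is_integer()
```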

Paul Uszak

2 Answers


Floating-point arithmetic is just rounded arithmetic on a simple integer-based representation of a subset of real numbers. Floating-point arithmetic is not unpredictable—on the contrary, it is extremely well-standardized in IEEE 754 (paywall-free), unlike, e.g., signed integer arithmetic which sometimes traps or wraps or behaves unpredictably in cases of overflow, shift of negatives, etc., depending on hardware, programming language, compiler, and operating system.

Every computer you are likely to encounter supports IEEE 754 arithmetic, and most of them will by default behave in exactly the same way (32-bit x86 with x87 FPU excepted, though configuring it to do IEEE 754 binary64 arithmetic is easy), giving bit-for-bit identical answers, for the most common arithmetic operations: addition, subtraction, multiplication, division, and a few others. Floating-point arithmetic is typically very fast, is vectorized by many CPUs, and even usually turns out to be constant-time on finite normal numbers.

Specifically, a binary64 (or double-precision) floating-point finite normal number is a real number of the form $$\pm 2^e\cdot(1 + f/2^{52})$$ where $f$ is an integer in $\{0,1,2,\dots,2^{52} - 1\}$ and $e$ is an integer with $-1022 \leq e \leq 1023$. There are a few other binary64 floating-point numbers, $\pm\infty$, $\pm 0$, and subnormals, and there are also floating-point values that are not numbers, namely NaN values, but they're not likely to matter for cryptographic applications.
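The decomposition can be checked directly by pulling the sign, exponent, and fraction fields out of the bit pattern (a Python sketch; Python floats are binary64):

```python
import struct

x = -3.75   # = -1.875 * 2**1, a finite normal binary64 number
bits = struct.unpack("<Q", struct.pack("<d", x))[0]
sign = bits >> 63
e = ((bits >> 52) & 0x7FF) - 1023      # unbiased exponent
f = bits & ((1 << 52) - 1)             # 52-bit fraction field
# For a finite normal number: x = (-1)^sign * 2^e * (1 + f/2^52).
assert (sign, e, f) == (1, 1, 7 * 2**49)
assert x == (-1.0)**sign * 2.0**e * (1 + f / 2.0**52)
```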

The basic arithmetic operations are defined so that for any floating-point numbers $x$ and $y$, $x \oplus y = \operatorname{round}(x + y)$, where $\oplus$ is floating-point addition and $\operatorname{round}$ yields the floating-point number nearest to the real number $x + y$, or the floating-point number with least-significant significand digit zero in the case of a tie. (Yes, you can configure other rounding modes, but by default this is the one you will get.)
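The ties-to-even behaviour of the default rounding mode is easy to observe at the edge of the 53-bit significand (Python floats are binary64):

```python
# 2**53 + 1 is not representable in binary64; it lies exactly halfway
# between two neighbors, and the tie goes to the one whose
# least-significant significand bit is zero (even).
assert float(2**53 + 1) == float(2**53)      # tie rounds down to even
assert float(2**53 + 3) == float(2**53 + 4)  # tie rounds up to even
assert float(2**53 + 2) == 2.0**53 + 2.0     # exactly representable
```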

There are subsets of floating-point numbers on which $x \oplus y = \operatorname{round}(x + y) = x + y$. For example, the set of integers in $[-2^{52}, 2^{52}]$ has this property. The hash127 paper you cited uses this property to reliably compute integer arithmetic modulo the large prime $2^{127} - 1$ using floating-point arithmetic on an eight-digit representation $t_7 2^{112} + t_6 2^{96} + \cdots + t_1 2^{16} + t_0$, where each term $t_i 2^{16i}$ is represented by a floating-point number.
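A quick illustration of exactness inside that range, and of rounding outside it (Python floats are binary64):

```python
a = float(2**52 - 1)
b = 1.0
# Sums of integers in [-2**52, 2**52] stay in [-2**53, 2**53], where
# every integer is representable, so the addition is exact:
assert a + b == 2.0**52
assert (a + b) - b == a            # no information was lost
# Just outside that range, rounding kicks in and information is lost:
assert (2.0**53 + 1.0) - 2.0**53 == 0.0
```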

Floating-point rounding is reliable enough that you can use it to split off the top or bottom bits of an integer, e.g. to get the limbs $t_i$ in the above form. For example, let $x = \text{0x896da38f}$, or 2305663887 in decimal. You can add a sufficiently large constant, so that the bottom sixteen bits are rounded away, and then subtract it: $(x \oplus 3 \cdot 2^{16 + 53 - 2}) \ominus 3 \cdot 2^{16 + 53 - 2}$ gives 2305687552 in decimal, or $\text{0x896e0000}$, which is $x$ rounded to the nearest multiple of $2^{16}$. Here $3 \cdot 2^{16 + 53 - 2} = 3 \cdot 2^{67}$ is a binary64 floating-point number already, so it can be represented exactly, and $\oplus$ and $\ominus$ are floating-point addition and subtraction in the default rounding mode. Note that because the default mode rounds to nearest rather than truncating, the bottom bits $\text{0xa38f}$ exceed $\text{0x8000}$ and carry upward, so the result is $\text{0x896e0000}$ rather than the truncation $x \mathbin\& \text{0xffff0000} = \text{0x896d0000}$; the remainder $x \ominus \text{0x896e0000} = -\text{0x5c71}$ is therefore signed, and limb-splitting carry chains built this way simply work with signed limbs.
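The add-and-subtract trick can be checked in any language with binary64 arithmetic in the default round-to-nearest-even mode; note the round-up of the low bits (0xa38f exceeds 0x8000) and the signed remainder (a Python sketch):

```python
x = float(0x896da38f)            # 2305663887
C = 3.0 * 2.0**67                # exactly representable constant
hi = (x + C) - C                 # x rounded to a multiple of 2**16
lo = x - hi                      # exact: the difference is representable
assert hi == float(0x896e0000)   # low bits 0xa38f rounded *up*, not off
assert lo == -float(0x5c71)      # signed remainder: -23665
assert hi + lo == x              # the limb split loses nothing
```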

However, while floating-point arithmetic can be reliably exploited for fast constant-time vectorized integer arithmetic, what floating-point arithmetic does not provide is bitwise operations or Galois field arithmetic. For example, there's no easy fast way with floating-point arithmetic to compute $f + g$ for polynomials $f, g \in (\mathbb{Z}/2\mathbb{Z})[x]$ of degree below 32, or for two elements $f, g \in \operatorname{GF}(2^{32})$, operations also known as 32-bit xor.

So while you can do integer arithmetic and mask off chunks of an integer at a time with floating-point arithmetic, manipulating individual bits is tricky, and manipulating them in parallel as vectors is even costlier, whereas integer units in CPUs do it nearly effortlessly. Thus it would be hard to compute, say, the Keccak or Salsa20 permutations fast with floating-point arithmetic—Keccak probably even harder than Salsa20, because at least Salsa20 has some 32-bit integer addition that could take advantage of a floating-point adder.
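By contrast, the part of Salsa20 that is plain integer arithmetic maps comfortably onto binary64. Here is a sketch of 32-bit addition modulo $2^{32}$ using only floating-point operations and one comparison (illustrative only, not an actual Salsa20 implementation):

```python
def add32(a: float, b: float) -> float:
    """Add two 32-bit values mod 2**32 using binary64 arithmetic."""
    s = a + b                     # exact: s < 2**33, far below 2**53
    return s - 2.0**32 if s >= 2.0**32 else s

assert add32(float(0xFFFFFFFF), 1.0) == 0.0
assert add32(float(0x80000000), float(0x80000000)) == 0.0
assert add32(3.0, 4.0) == 7.0
```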

Could you use floating-point arithmetic to build a CSPRNG? Sure—you could even implement Keccak or Salsa20 using floating-point arithmetic, and you might even conceivably want to do that in some contexts such as Python in which integer arithmetic is notoriously leaky due partly to the small integer cache (if for some reason you couldn't just write a C extension).

Would it beat speed records for existing CSPRNGs at a comparable standard security level? Probably not, and any cryptanalysis of it would no doubt be based on the bit-string-to-bit-string functions just like any integer- or bit-based CSPRNG rather than any numerical analysis.

(I'd guess the floating-point representation would probably just make the analysis more irritating, so it would get less scrutiny.)

Note that the ‘hash’ of hash127 does not mean a preimage-resistant hash function like MD5 or SHA3-256. Rather, like Poly1305, hash127 is a universal hash family (paywall-free; Wikipedia)—specifically, hash127 is a family of functions $H_r$ indexed by $r \in \mathbb{Z}/(2^{127} - 1)\mathbb{Z}$ with difference probability $\Pr[H_r(m_0) - H_r(m_1) = \delta]$ over uniform random $r$ bounded by $\ell/2^{128}$ for every difference $\delta$ and all distinct messages $m_0$ and $m_1$ up to $\ell$ bytes long.
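To make the universal-hash-family notion concrete, here is a toy polynomial-evaluation hash modulo $2^{127} - 1$ in the spirit of hash127 and Poly1305; the chunking and padding here are made up for illustration and are not hash127's actual encoding:

```python
# Toy universal hash: interpret the message as coefficients of a
# polynomial and evaluate it at a secret point r modulo a prime.
# This mirrors the *structure* of hash127/Poly1305 only.
P = 2**127 - 1   # the Mersenne prime hash127 works modulo

def toy_hash(r: int, message: bytes) -> int:
    chunks = [int.from_bytes(message[i:i + 2], "big") | 0x10000
              for i in range(0, len(message), 2)]  # pad marker bit
    h = 0
    for c in chunks:
        h = (h * r + c) % P      # Horner evaluation at the point r
    return h

# Same key and message give the same tag; these two distinct messages
# differ by exactly 1 in the last chunk, so their tags must differ.
assert toy_hash(12345, b"hello world") == toy_hash(12345, b"hello world")
assert toy_hash(12345, b"hello world") != toy_hash(12345, b"hello worle")
```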

Squeamish Ossifrage

I think this is a question where the ambiguity of common terms like "CSPRNG" gets in the way. The problem is that when talking about "CSPRNGs" in different contexts, people variously refer to either:

  1. Deterministic algorithms that must produce the same results when given the same inputs. Example: stream ciphers, where Alice and Bob must produce the same keystream if they are to communicate successfully.
  2. Nondeterministic algorithms where there is no such requirement, and ideally could be replaced by a true random generator. Example: operating system random generators, where the practical solution has been to compose a deterministic pseudorandom generator with a physical noise source, to yield output that is both nondeterministic and pseudorandom.

Unpredictable rounding errors are a problem for #1, but perhaps not so for #2. And if the latter is so, then floating point numbers are just bit-strings and floating point operations are just bit-string operations; the question about building a counter-mode "CSPRNG" is then whether there are ways of composing these operations into a scrambling function that yields something resistant to cryptanalysts' best efforts.

Luis Casillas