
To show some colleague programmers exactly how broken C's rand() is (at least on Windows) I decided to break it. So everyone knows the exact parameters, MSVC's implementation is as follows:

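#include <stdint.h>

uint32_t state;  /* seeded by srand() */

int rand() {
    state = state * 214013 + 2531011;  /* wraps modulo 2^32 */
    return (state >> 16) & 0x7fff;     /* bits 16..30 of the new state */
}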

Because the top bit never affects anything but the top bit of the next state, and is never returned, you could see this as the LCG $X_{n+1} \equiv 214013X_n + 2531011 \pmod{2^{31}}$, where at each iteration we only get to see the top 15 bits.
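The reduction to a 31-bit state can be checked empirically. The following is a minimal sketch, not part of the original post: the helper name `outputs_agree` is made up for illustration, and it simply runs the 32-bit generator and its 31-bit reduction side by side.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the 32-bit generator and its 31-bit
   reduction produce the same first n outputs from the same seed. */
static int outputs_agree(uint32_t seed, int n) {
    uint32_t s32 = seed;                /* full 32-bit state, as in the question */
    uint32_t s31 = seed & 0x7fffffffu;  /* same state with the top bit dropped */
    for (int i = 0; i < n; i++) {
        s32 = s32 * 214013u + 2531011u;                  /* modulo 2^32 */
        s31 = (s31 * 214013u + 2531011u) & 0x7fffffffu;  /* modulo 2^31 */
        if (((s32 >> 16) & 0x7fffu) != (s31 >> 16))
            return 0;
    }
    return 1;
}
```

Calling `outputs_agree(seed, 1000)` for a handful of seeds should always return 1, since multiplication and addition modulo $2^{32}$ never carry information from bit 31 downwards.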

The problem is, I have not found any references that attack fixed LCGs with partial output.

Of course it requires only 2-3 calls of the rand() function to enable a brute-force attack with at most $2^{16}$ operations on the secret part of the state, but that's hardly a break. Something similar would not be feasible with larger constants and a larger modulus.
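For reference, the brute-force attack mentioned above could look roughly like this. It is a sketch under the 31-bit model of the generator; the names `lcg_next` and `brute_force_state` are invented here, and the caller is assumed to supply consecutive outputs starting with the one that exposes the top bits of the state being recovered.

```c
#include <stddef.h>
#include <stdint.h>

/* One step of the generator on the 31-bit state. */
static uint32_t lcg_next(uint32_t x) {
    return (x * 214013u + 2531011u) & 0x7fffffffu;
}

/* Hypothetical helper: r[0..n-1] are n consecutive outputs, where r[0] is
   the top 15 bits of the state X_0 we want. Tries all 2^16 hidden low
   halves and stores up to max_out matching states in out.
   Returns the total number of matching states. */
static size_t brute_force_state(const uint16_t *r, size_t n,
                                uint32_t *out, size_t max_out) {
    size_t found = 0;
    for (uint32_t s = 0; s < 0x10000u; s++) {
        uint32_t x0 = ((uint32_t)r[0] << 16) | s;
        uint32_t x = x0;
        size_t i;
        for (i = 1; i < n; i++) {       /* replay and compare each output */
            x = lcg_next(x);
            if ((x >> 16) != r[i])
                break;
        }
        if (i == n) {
            if (found < max_out)
                out[found] = x0;
            found++;
        }
    }
    return found;
}
```

With the outputs 41, 18467, 6334 (the first three values MSVC's rand() returns after srand(1)), the state 2745024 is among the candidates returned.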

So far I've been unable to find a break that shows this generator to be severely broken. I'm interested in results on similar LCGs that can be transformed into an attack on this one.

P.S.: If an attack exists that could be done on a graphing calculator without any programming, it increases how convincingly I can show the generator to be broken. But this is just bonus points.

orlp

3 Answers


Notation:

  • $v=u\bmod m$ means $m$ divides $u-v$ and $0\le v<m$, including if $u<0$.
  • All variables are non-negative integers (except for the above).
  • $a=214013$, $b=2531011$ are the LCG parameters.
  • $X_n$ is the 31-bit state with $X_{n+1}=(a\cdot X_n+b)\bmod 2^{31}$.
  • $R_n=\lfloor X_n/2^{16}\rfloor$ is the 15-bit output.
  • $S_n=X_n-2^{16}\cdot R_n$ is the hidden 16-bit portion of the state.

Assume we have the first few outputs $R_0$, $R_1$, ... and want to find $X_0$.

We derive:
$$\big((2^{16}\cdot R_1+S_1)\bmod 2^{31}\big)=\big((a\cdot (2^{16}\cdot R_0+S_0)+b)\bmod 2^{31}\big)$$
$$2^{16}\cdot R_1-a\cdot2^{16}\cdot R_0-b+S_1\equiv a\cdot S_0\pmod{2^{31}}$$
Using that $S_1<2^{16}$ and defining $r=2^{16}-1-S_1<2^{16}$, that becomes:
$$2^{16}\cdot R_1-a\cdot2^{16}\cdot R_0-b+2^{16}-1\equiv a\cdot S_0+r\pmod{2^{31}}$$
Defining $K$ as the quotient of the Euclidean division of $a\cdot S_0+r$ by $2^{31}$, we get:
$$\big((2^{16}\cdot R_1-a\cdot2^{16}\cdot R_0-b+2^{16}-1)\bmod2^{31}\big)+2^{31}\cdot K=a\cdot S_0+r$$
Using that $r<2^{16}\le a$, we find that $S_0$ must be the quotient of the Euclidean division of the new left side of the equation by $a$, and $r$ the remainder; we derive:
$$S_0=\Big\lfloor{{\big((2^{16}\cdot R_1-a\cdot2^{16}\cdot R_0-b+2^{16}-1)\bmod 2^{31}\big)+2^{31}\cdot K}\over a}\Big\rfloor$$


Therefore the following outputs the possible values for $X_0$ given the first two outputs $R_0$ and $R_1$, by trying all possible values of $K$:

  • Compute $T=(2^{16}\cdot R_1-a\cdot2^{16}\cdot R_0-b+2^{16}-1)\bmod 2^{31}$
    Notice that $T+2^{31}\cdot K=a\cdot S_0+r$ with $S_0<2^{16}$ and $r<2^{16}$
  • For $K$ from $0$ to $\big\lfloor\big((2^{16}-1)\cdot(a+1)-T\big)/2^{31}\big\rfloor$
    • If $\big((T+2^{31}\cdot K)\bmod a\big)<2^{16}$
      • Output $\big\lfloor(T+2^{31}\cdot K)/a\big\rfloor+2^{16}\cdot R_0$

The quantity $(T+2^{31}\cdot K)\bmod a$ tested in the loop is $r$, hence the test's bound. $\big\lfloor(T+2^{31}\cdot K)/a\big\rfloor$ is $S_0$. What's output is a candidate $X_0$, since $S_n=X_n-2^{16}\cdot R_n$. It is easy to filter these candidates using $R_2$.

This works for any $a\ge2^{16}$. With the parameters at hand, the loop is performed at most $7$ times, with typically $2\pm1$ outputs. I can run the algorithm with pen and paper, and some would be able to perform it mentally.
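The loop over $K$ described above can be sketched in C as follows. This is an illustrative transcription rather than code from the answer: the function names `recover_candidates` and `filter_with_r2` are invented here, and unsigned 32-bit wraparound followed by a 31-bit mask stands in for the $\bmod\ 2^{31}$ of the derivation.

```c
#include <stddef.h>
#include <stdint.h>

#define LCG_A  214013u
#define LCG_B  2531011u
#define MASK31 0x7fffffffu

/* Hypothetical helper: candidates for the 31-bit state X_0 given the
   outputs r0 and r1. out needs room for 8 entries (at most 7 values of K). */
static size_t recover_candidates(uint32_t r0, uint32_t r1, uint32_t out[8]) {
    /* T = (2^16*R1 - a*2^16*R0 - b + 2^16 - 1) mod 2^31 */
    uint32_t t = ((r1 << 16) - LCG_A * (r0 << 16) - LCG_B + 0xffffu) & MASK31;
    /* a*S0 + r is at most (2^16 - 1) * (a + 1) */
    uint64_t limit = (uint64_t)0xffffu * (LCG_A + 1u);
    size_t n = 0;
    for (uint64_t v = t; v <= limit; v += (uint64_t)1 << 31) { /* v = T + 2^31*K */
        if (v % LCG_A < 0x10000u)       /* the remainder is r, which is < 2^16 */
            out[n++] = (uint32_t)(v / LCG_A) + (r0 << 16);  /* S0 + 2^16*R0 */
    }
    return n;
}

/* Hypothetical helper: keep only the candidates that also reproduce r2. */
static size_t filter_with_r2(uint32_t *cand, size_t n, uint32_t r2) {
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t x1 = (cand[i] * LCG_A + LCG_B) & MASK31;
        uint32_t x2 = (x1 * LCG_A + LCG_B) & MASK31;
        if ((x2 >> 16) == r2)
            cand[kept++] = cand[i];
    }
    return kept;
}
```

For $R_0=41$, $R_1=18467$ the loop reaches $T=1685610812$, and at $K=5$ it yields $S_0=58048$, that is $X_0=2745024$; filtering with $R_2=6334$ keeps that candidate.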

The method works regardless of the bit size $x$ of $X$, provided the output $R$ consists of the $x/2$ high bits (give or take a constant) and $a$ is big enough for the attack to apply: the cost is $O(x^2\cdot a/2^{x/2})$ using standard arithmetic algorithms. I think we could skip most values of $K$ that do not pass the test, bringing the cost to $O(x^2)$ even for huge $a$ (to quote Bruce Schneier, who attributes the saying to the NSA: "Attacks always get better; they never get worse").

When $a$ is too small to carry out that attack, a trivial attack applies, recovering the state $X_0$ from most to least significant bits, at a rate of about $\log_2a$ additional bits for each additional $R_n$.

Improvement: if $a$ is small or otherwise unfavorable, we can use $R_j$ rather than $R_1$, replacing $a$ with $a_j=a^j\bmod2^{31}$ for some $j>1$ giving a favorable $a_j$ (a small maximum $K$), and replacing $b$ with $b_j=b\cdot(1+a+\dots+a^{j-1})\bmod2^{31}$. That makes it possible to efficiently tackle a much larger modulus!
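The parameters $a_j$ and $b_j$ can be obtained by composing the LCG step with itself $j$ times. A minimal sketch follows; the function name `lcg_skip_params` is invented for illustration, and the modulus is fixed at $2^{31}$ as in the question.

```c
#include <stdint.h>

#define MASK31 0x7fffffffu

/* Hypothetical helper: computes a_j = a^j mod 2^31 and
   b_j = b*(1 + a + ... + a^(j-1)) mod 2^31, so that
   X_{n+j} = (a_j * X_n + b_j) mod 2^31. */
static void lcg_skip_params(uint32_t a, uint32_t b, unsigned j,
                            uint32_t *aj, uint32_t *bj) {
    uint32_t acc_a = 1, acc_b = 0;      /* identity map: x -> 1*x + 0 */
    for (unsigned i = 0; i < j; i++) {
        acc_b = (a * acc_b + b) & MASK31;  /* now b*(1 + a + ... + a^i) */
        acc_a = (a * acc_a) & MASK31;      /* now a^(i+1) */
    }
    *aj = acc_a;
    *bj = acc_b;
}
```

The loop takes $j$ steps, which is fine for small $j$; picking a $j$ that actually gives a favorable $a_j$ is left to the caller.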


Generally speaking, Linear Congruential Generators are very poor from a cryptographic standpoint, even when some post-processing is applied; see this answer for an example.

More often than not, Linear Congruential Generators are not even suitable for simulation purposes. To demonstrate this to the incredulous, generate a million values using the LCG in the question; count how many times each of the 32768 values is reached; and either compute the standard deviation of that, or graph the counts sorted by increasing values; repeat with a sound RNG, and compare.
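The counting experiment can be set up along these lines. This is only a sketch with invented names (`tally_lcg`, `count_variance`); it reports the variance of the counts, whose square root is the standard deviation, and it leaves the comparison run with a sound RNG to the reader.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_VALUES 32768u

/* Hypothetical helper: draw `draws` outputs from the LCG in the question,
   starting from state `seed`, and count how often each value occurs. */
static void tally_lcg(uint32_t seed, size_t draws, uint32_t counts[NUM_VALUES]) {
    uint32_t state = seed;
    for (size_t i = 0; i < NUM_VALUES; i++)
        counts[i] = 0;
    for (size_t i = 0; i < draws; i++) {
        state = state * 214013u + 2531011u;
        counts[(state >> 16) & 0x7fffu]++;
    }
}

/* Population variance of the counts (standard deviation squared). */
static double count_variance(const uint32_t counts[NUM_VALUES]) {
    double sum = 0.0, sum_sq = 0.0;
    for (size_t i = 0; i < NUM_VALUES; i++) {
        sum += counts[i];
        sum_sq += (double)counts[i] * counts[i];
    }
    double mean = sum / NUM_VALUES;
    return sum_sq / NUM_VALUES - mean * mean;
}
```

A typical run would call `tally_lcg(seed, 1000000, counts)` and then `count_variance(counts)`, and repeat the tally with counts filled from another generator for comparison.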

fgrieu

You already answered your own question: given the output from 2 calls to the rand() function and $2^{16}$ steps of computation, you can recover the internal state completely, simply by brute-forcing the parts of the state that you are not aware of. That is a break.

You say it's "hardly a break", but that is too-narrow thinking. A break is a break; anything that suffices to predict future outputs of the generator is a break. It does break Windows' rand() implementation.

Maybe you are wondering whether some other generator that looks sorta like Windows' rand(), but is not Windows' rand(), would be breakable. Well, that's a different question. The answer to that will depend upon what specifically you have in mind -- but regardless, at that point we're not talking about Windows' rand() any longer. The break you've got is already good enough to show that Windows' rand() should not be used for cryptographic purposes. It's plenty adequate to show how broken Windows' rand() is.

If this doesn't count as "good enough" of an attack, then probably your question isn't really a question about cryptography, but something else -- because for cryptographic purposes, this is a valid break.


That said, there is a slight improvement on the attack. I believe you can build a meet-in-the-middle attack that needs about $2 \times 2^8$ steps of computation and 2 outputs from rand(), and that infers its internal state.

Here's the idea. Suppose you observe the top 15 bits of the 31-bit state $X_0$. Now write $X_0 = c + 2^8 d + e$, where $0 \le d,e < 2^8$ and $c$ is a known constant. Notice that this means

$$X_1 \equiv (214013 c + 2531011) + 2^8 \cdot 214013 d + 214013 e \pmod{2^{31}}.$$

The first part ($214013 c + 2531011$) is a known constant, so we have

$$X_1 \equiv \text{known} + \text{known} \times d + \text{known} \times e \pmod{2^{31}},$$

or in other words,

$$X_1 - \text{known} - \text{known} \times d \equiv \text{known} \times e \pmod{2^{31}}.$$

Now we're set up to do a meet-in-the-middle attack. We know the top 15 bits of $X_1$. We can enumerate all $2^8$ possibilities for $e$, calculate the resulting top 15 bits of the right-hand side, and store them in a table. We can then enumerate all $2^8$ possibilities for $d$, calculate the resulting top 15 bits of the left-hand side, and look them up in the table to check for a match. Because the low 16 bits of $X_1$ are unknown, the left-hand side is only known up to a carry into its top bits, so the lookup has to accept a value that is off by one. When we guess $d$ and $e$ right, we find a match; the few matching pairs can then be checked directly against the observed outputs.
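One way to flesh out this meet-in-the-middle idea is sketched below. It is an interpretation, not the answer's own code: the function name `mitm_candidates` is invented, the table is a simple chained bucket array indexed by the top bits of $214013\,e$, and two details are added as assumptions of this sketch, namely a lookup of both the computed bucket and the next one (to absorb a possible carry from the unknown low bits of $X_1$) and a final direct check of each matching pair.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: candidates for the 31-bit state X_0 given the
   outputs r0 and r1. Stores up to max_out of them, returns the total. */
static size_t mitm_candidates(uint32_t r0, uint32_t r1,
                              uint32_t *out, size_t max_out) {
    static int16_t head[1 << 15];       /* bucket heads, -1 means empty */
    int16_t next[256];                  /* chain links between values of e */
    const uint32_t a = 214013u, b = 2531011u, mask = 0x7fffffffu;
    const uint32_t c = r0 << 16;        /* known part of X_0 */
    size_t found = 0;

    /* Table side: top 15 bits of a*e mod 2^31 for every e. */
    memset(head, -1, sizeof head);
    for (int e = 0; e < 256; e++) {
        uint32_t top = ((a * (uint32_t)e) & mask) >> 16;
        next[e] = head[top];
        head[top] = (int16_t)e;
    }

    /* Lookup side: 2^16*R1 - (a*c + b) - 2^8*a*d, unknown low bits as zero. */
    uint32_t base = ((r1 << 16) - (a * c + b)) & mask;
    for (uint32_t d = 0; d < 256; d++) {
        uint32_t top = ((base - a * (d << 8)) & mask) >> 16;
        for (uint32_t carry = 0; carry < 2; carry++) {
            for (int e = head[(top + carry) & 0x7fffu]; e >= 0; e = next[e]) {
                uint32_t x0 = c | (d << 8) | (uint32_t)e;
                if ((((x0 * a + b) & mask) >> 16) == r1) {  /* direct check */
                    if (found < max_out)
                        out[found] = x0;
                    found++;
                }
            }
        }
    }
    return found;
}
```

The table is built in $2^8$ steps and queried $2 \times 2^8$ times. For the outputs 41 and 18467 the returned candidates include 2745024, and a third output can be used to discard the rest.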

D.W.

Related to "show some colleague programmers"

You could code and use a smaller 8-bit RNG to show them a "mini version" of C's rand. The period will be shorter and therefore more obvious… you can print the complete period (256 bytes) on screen. Seeding the RNG in +1 steps and printing the next 256 bytes will show them they're actually only rotating a sequence of 256 predictable bytes. After that, it shouldn't be a huge step to convince them that bigger numbers don't fix the predictability problem. Also, the 8-bit version provides a playground to build upon towards attacking MSVC's implementation.
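Such a mini version might look like the sketch below. The multiplier 141 and increment 3 are arbitrary choices made for this illustration (an odd increment and a multiplier congruent to 1 modulo 4 give the full period of 256); they are not taken from any real library, and the whole 8-bit state is output directly.

```c
#include <stdint.h>

/* Hypothetical 8-bit LCG: s -> (141*s + 3) mod 256. */
static uint8_t mini_next(uint8_t s) {
    return (uint8_t)(s * 141u + 3u);
}

/* Fill out[] with the 256 values that follow the given seed. */
static void mini_sequence(uint8_t seed, uint8_t out[256]) {
    uint8_t s = seed;
    for (int i = 0; i < 256; i++) {
        s = mini_next(s);
        out[i] = s;
    }
}
```

Printing `mini_sequence(0, ...)` next to `mini_sequence(1, ...)` makes the rotation visible: both arrays hold the same cycle of 256 bytes, only entered at different points.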

Related to "references that attack fixed LCG"

There is a paper that might be interesting for you to take a look at:

It includes a C code example which allows the calculation of the seed used in an LCG.

Related to "graphical calculator"

Well, it is probably not what you were thinking of, as it doesn't attack any LCG, but there is a piece of software that lets you experiment with the parameters of an LCG and visualizes the subtle correlations between the pseudo-random numbers successively created by an LCG.

[Screenshot of the software]

You can grab either the English or the German version at
http://www.vias.org/simulations/simusoft_lincong.html

Mike Edward Moras