17

I'm trying to figure out the best way to generate a cryptographically secure random number between 0 and 200 (inclusive) from a cryptographically secure random string of bytes (i.e., read from /dev/urandom or some such).

I could do random[0] % 201 but then numbers 0-54 would be more likely than 55-200 which would make the resulting number more predictable than it should be.

How can I use the random octet string to create a number in a range 0...k-1 (where k is 201)?

neubert

5 Answers

20

The method given in this other answer is correct: to choose a uniformly random integer in range $[0,k-1]$ given a string of uniformly random integers in range $[0,n-1]$ with $1<k≤n$, get one integer $x$ from the string until $x<⌊n/k⌋⋅k$, then output $y=x\bmod k$. If we have $k≤n<2k$ (as in the question where $k=201$, $n=256$), that simplifies to: get one integer $x$ from the string until $x<k$, then output $x$.

However, that's not the best method, as asked: it consumes more from the input string than strictly needed. Put differently, when the input string is of limited length, the odds that the algorithm fails are higher than needed. If there are $m$ values in the input string, the odds of failure are $(1-⌊n/k⌋⋅k/n)^m$. For $k=201$, $n=256$, $m=3$, that's nearly $1\%$. It can get much worse: for $k=129$, $n=256$, $m=3$, the odds of failure are over $12\%$.
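As a quick check of these figures, here is a minimal sketch that evaluates the failure formula exactly. The function name `simple_discard_failure` is made up for illustration; only the formula itself comes from the text above.

```python
from fractions import Fraction

def simple_discard_failure(k: int, n: int, m: int) -> Fraction:
    """Probability that all m input symbols are rejected by the simple method."""
    accepted = (n // k) * k  # symbols x < floor(n/k)*k are accepted
    return (1 - Fraction(accepted, n)) ** m

for k in (201, 129):
    p = simple_discard_failure(k, 256, 3)
    print(k, float(p))
```

For $k=201$ this is $(55/256)^3$, and for $k=129$ it is $(127/256)^3$.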

We can modify the algorithm to improve on that; and also remove the $k≤n$ requirement.

  1. Set $r←0$, $s←1$.
  2. Get one new $x$ from the random string, set $r←r⋅n+x$ and $s←s⋅n$.
  3. If $r≥⌊s/k⌋⋅k$, set $r←r-⌊s/k⌋\cdot k$ and $s←s-⌊s/k⌋⋅k$, and proceed to step 2.
  4. Output $y=r\bmod k$, and stop.

Sketch of proof of correctness: by induction, before step 2 and before step 3, $r$ is a uniformly random integer with $0≤r<s$.

The integers $r$ and $s$ never exceed $(k-1)⋅n$, and thus the algorithm does not require arbitrary-precision arithmetic. It minimizes the odds of failure for a given input string of length $m$, down to the bare minimum possible for exactly uniform output: $(n^m\bmod k)/n^m$; that is $≈0.0009\%$ (rather than $≈1\%$) for $k=201$, $n=256$, $m=3$; or $≈0.0007\%$ (rather than $≈12\%$) for $k=129$, $n=256$, $m=3$.
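Steps 1 to 4 above can be sketched in Python roughly as follows. The names `uniform_below` and `outcome_counts` are invented for this illustration, and the code assumes every input symbol is an integer in $[0,n-1]$. Returning `None` stands for "the input string ran out", i.e. failure.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Optional

def uniform_below(k: int, symbols: Iterable[int], n: int = 256) -> Optional[int]:
    """Uniform integer in [0, k-1] from symbols uniform in [0, n-1], or None on failure."""
    r, s = 0, 1                      # step 1
    for x in symbols:
        r = r * n + x                # step 2: absorb one more symbol
        s = s * n
        limit = (s // k) * k         # largest multiple of k not exceeding s
        if r >= limit:               # step 3: keep the leftover, fetch another symbol
            r -= limit
            s -= limit
            continue
        return r % k                 # step 4
    return None                      # input exhausted before an output was produced

def outcome_counts(k: int, n: int, m: int) -> Counter:
    """Tally the output over every input string of length m (None counts failures)."""
    return Counter(uniform_below(k, word, n) for word in product(range(n), repeat=m))

counts = outcome_counts(201, 256, 2)
print(counts[None])  # 10, which equals 256**2 % 201
```

Enumerating all two-byte inputs this way shows each of the 201 outputs occurring equally often, with only $n^m \bmod k$ inputs failing, in line with the stated failure odds.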

The algorithm works if the input string contains bits, or dice rolls with any number of faces, including variable (just change $n$ at step 2 according to the $x$ extracted from the input string).

If we need to generate more than one output $y$, there are several options:

  • The easiest is to start over; but that's clearly sub-optimal, especially when $n≫k$.
  • We can change "stop" at step 4 to: "set $r←⌊r/k⌋$, set $s←⌊s/k⌋$ (then, if desired, change $k$ for the next output); and proceed to step 3". The resulting algorithm still produces uniformly distributed output (the recurrence in the proof sketch holds) and drastically reduces the consumption from the input string when $n≫k$, but it is not optimal (that's blatant for $k=2$, $n=3$).
  • If we know the number $j$ of desired outputs $y$ in advance, we can set $\hat k=k^j$, generate one $\hat y$ uniform in $[0,\hat k-1]$ using the optimal algorithm, then split $\hat y$ into $j$ outputs by expressing $\hat y$ as a $j$-digit number in base $k$. That's back to optimal, but is not a sequential algorithm, and requires arbitrary precision arithmetic when $j$ grows, with $O(j)$ extra memory.
  • We can generate any number of outputs using several batches of $j$ as in the previous tweak, but using the suboptimal sequential algorithm described above. That allows using bounded extra memory, and becomes close to optimality as $\hat k=k^j$ grows. However this is not a sequential algorithm.
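The batch idea in the third option relies on splitting $\hat y$ into $j$ base-$k$ digits. A minimal sketch of just that splitting step is below. The name `split_base_k` is hypothetical, and $\hat y$ is assumed to have already been drawn uniformly from $[0,k^j-1]$ by some exact method.

```python
from typing import List

def split_base_k(y_hat: int, k: int, j: int) -> List[int]:
    """Express y_hat as j base-k digits, least significant digit first."""
    if not 0 <= y_hat < k ** j:
        raise ValueError("y_hat must lie in [0, k**j - 1]")
    digits = []
    for _ in range(j):
        y_hat, digit = divmod(y_hat, k)  # peel off one base-k digit
        digits.append(digit)
    return digits

print(split_base_k(5 + 7 * 201 + 9 * 201 ** 2, 201, 3))  # [5, 7, 9]
```

Because the map from $\hat y$ to its digit tuple is a bijection, a uniform $\hat y$ gives $j$ independent uniform digits.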

I now lean towards the belief that any optimal algorithm is bound to require $O(\log m)$ extra memory in some worst case. However, it is possible to make a sequential algorithm using bounded extra memory, much less than above, and closer to optimality. I plan to detail such an algorithm (if someone else finds a reference, or otherwise has a suitable algorithm, please tell us!).

fgrieu
10

A trivial way to achieve this is to use the byte if it's smaller than 201 and discard it (move on to the next byte) if it's not.

The derived numbers will be truly random provided that the bytes are:

Since the sequence is random, every possible value of every number in the original sequence has probability $\frac{1}{256}$. Thus, at any given point, the probability that the next byte equals a given $k\leq200$ is also $p_0=\frac{1}{256}$.

However, the probability that the byte will be discarded is $\frac{55}{256}$. Because of this and the fact that the numbers are independent, the probability that the second byte provides us with the number $k$ is $p_1=\frac{55}{256}\cdot\frac{1}{256}$.

Continuing this reasoning, the probability that the first $j$ bytes will be discarded and that the next byte is $k$ is $p_j=\left(\frac{55}{256}\right)^j\cdot\frac{1}{256}$. Thus, the probability of the next derived number being $k$ is $$p=p_0 + p_1 + p_2 + \cdots = \sum_{j=0}^{+\infty}\left(\frac{55}{256}\right)^j\cdot\frac{1}{256}.$$

This is a geometric series, and the probability is $$p = \frac{1}{1-\frac{55}{256}}\cdot\frac{1}{256} = \frac{1}{\frac{201}{256}}\cdot\frac{1}{256} = \frac{256}{201}\cdot\frac{1}{256} = \frac{1}{201}.$$

Since this holds for each of the 201 possible values of $k$, the derived sequence is uniformly random.

In general, if your RNG outputs numbers $x\in\{0,\cdots,n - 1\}$ and you need to derive a random sequence of numbers $y\in\{0,\cdots,k - 1\}$, you can take $y = x\bmod k$ if $x < \left\lfloor\frac{n}{k}\right\rfloor\cdot k$ and discard $x$ otherwise.

Since $\left\lfloor\frac{n}{k}\right\rfloor\cdot k$ is an integer multiple of $k$, there is no bias.
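For byte input ($n=256$), the general rule above might look like this in Python. This is only a sketch: `random_below` and its `read_bytes` parameter are names chosen for illustration, with `os.urandom` as the default byte source.

```python
import os
from typing import Callable

def random_below(k: int, read_bytes: Callable[[int], bytes] = os.urandom) -> int:
    """Uniform integer in [0, k-1] for 1 <= k <= 256, by discarding out-of-range bytes."""
    if not 1 <= k <= 256:
        raise ValueError("k must be between 1 and 256 for single-byte input")
    limit = (256 // k) * k           # largest multiple of k that fits in a byte's range
    while True:
        x = read_bytes(1)[0]         # one fresh byte per attempt
        if x < limit:
            return x % k             # accepted: each residue is equally likely
        # otherwise discard x and try the next byte

print(random_below(201))
```

For everyday use, Python's standard library already offers `secrets.randbelow(201)`, which returns an integer in the same range.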

Dennis
6

The answer by fgrieu is excellent as always. However, I just wanted to document an alternative approach.

If the bias is small enough, it is practically the same as if there were no bias at all. For this reason, some standards, like NIST's public key standard FIPS 186-4: Digital Signature Standard, allow a random number generation method in which some extra octets are generated in order to reduce (but not completely eliminate) the bias. According to that document, formula B.1.2, for generation of keys to be used in the DSA algorithm (a common public key standard), a bias of less than $1 / 2^{64}$ is considered acceptable.

In other words, instead of generating just eight (8) bits, generate at least 76, and then use the modulo operation. The bias will be small enough that for many purposes it cannot be seen (at least if the number of generated random numbers is not very large).

The benefit of accepting some bias is that the number of input bits needed is deterministic, so it is possible to avoid worst cases for the number of random bits needed before output can be produced.

So in the end you could do:

random[0...8] % 201
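As a hedged Python sketch of that one-liner: the function below reads enough bytes to cover the range plus 64 extra bits (nine bytes for 201, matching `random[0...8]`) and reduces modulo $k$. The names `random_below_modular`, `extra_bits` and `read_bytes` are assumptions made for this example, not anything taken from FIPS 186-4.

```python
import os
from typing import Callable

def random_below_modular(
    k: int,
    extra_bits: int = 64,
    read_bytes: Callable[[int], bytes] = os.urandom,
) -> int:
    """Slightly biased integer in [0, k-1] using a fixed number of input bytes."""
    if k < 1:
        raise ValueError("k must be positive")
    nbits = (k - 1).bit_length() + extra_bits  # bits to cover the range, plus slack
    nbytes = (nbits + 7) // 8                  # round up to whole bytes
    x = int.from_bytes(read_bytes(nbytes), "big")
    return x % k                               # bias shrinks as extra_bits grows

print(random_below_modular(201))
```

Unlike the discard methods, this always consumes the same number of bytes, at the cost of a tiny residual bias.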

user4982
4

Various tested methods to convert bit/octet strings into random numbers are proposed in NIST SP 800-90: Recommendation for Random Number Generation Using Deterministic Random Bit Generators. They are defined in Appendix A.5:

  • A.5.1, The Simple Discard Method: generate values of the minimum of $m$ bits until you get one that is smaller than $r$ - the exclusive maximum value (i.e. the method described by Dennis in the answer here);
  • A.5.2, The Complex Discard Method: a more complex method of generating a larger number of random numbers at once, with a smaller chance of having to regenerate random bits;
  • A.5.3, The Simple Modular Method: generate at least 64 bits more than the minimum number of bits $m$, then return that value modulo $r$ (some skew will be present; you can also make it unbiased but non-deterministic by combining it with the simple discard method).

Obviously you can add / subtract a constant value $x$ if you want a range of $[x..n + x)$ instead of $[0..n)$.

Maarten Bodewes
-4

Try to use bitmasks. For example - M is your desired high limit, so use r ?< M/n as a bit mask constructing your number. And after all XOR it with one extra number from RNG.

Alexey Vesnin