For a project using RC4, the output is to be mapped to values of 0 to 35, giving only 36 unique output values that represent the letters A to Z and the digits 0 to 9.

To avoid bias, each RC4 output byte (range 0 to 255) is discarded if its value is greater than 251. Each remaining byte is then reduced mod 36, which maps it into the range 0 to 35.

Does this mapping step effectively reduce the strength of RC4, or does the security remain unchanged? How does one measure the decrease in security, if any? Does it relate to the loss of the values 252, 253, 254 and 255? Or would the mod 36 step also be a factor, since it reduces the number of possible output values to about 1/7th of the original?

Thanks!

Deskguy

2 Answers

By discarding values 252 to 255, you effectively avoid introducing any new bias; the generic method is described in many places, e.g. this article (page 3). To generate random values between $0$ and $d-1$ (inclusive) from a PRNG which produces bits, you do the following:

  1. Choose an integer $r$ such that $2^r \geq d$.
  2. Obtain an $r$-bit word $x$ from the PRNG.
  3. Compute $t = \lfloor \frac{x}{d} \rfloor$.
  4. If $d+td \gt 2^r$, go back to step 2.
  5. Output $x - td$.

Your proposal is equivalent to the algorithm above with $d = 36$ and $r = 8$ (you work with 8-bit words, colloquially known as "bytes").
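
The five steps above can be sketched in Python as follows. This is only an illustration: the function name `sample_below`, the `ALPHABET` ordering (letters first, then digits) and the hard-coded byte list standing in for RC4 output are assumptions, not part of the question.

```python
from typing import Iterator

def sample_below(d: int, words: Iterator[int], r: int = 8) -> int:
    """Return a value in [0, d-1], drawn without bias from r-bit words."""
    if not 0 < d <= 2 ** r:
        raise ValueError("need 0 < d <= 2**r (step 1)")
    for x in words:                 # step 2: obtain an r-bit word
        t = x // d                  # step 3: t = floor(x / d)
        if d + t * d > 2 ** r:      # step 4: x lies in the incomplete last block
            continue                #         so discard it and draw again
        return x - t * d            # step 5: same as x mod d
    raise RuntimeError("word source exhausted")

# Assumed symbol order: index 0..25 -> A..Z, index 26..35 -> 0..9.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

# Hypothetical stand-in for the RC4 output bytes; 255 and 252 get discarded.
stream = iter([255, 252, 251, 36, 0, 71])
symbols = [ALPHABET[sample_below(36, stream)] for _ in range(4)]
print(symbols)
```

With $d = 36$ and $r = 8$, step 4 rejects exactly the bytes 252 to 255, since $36 + 7 \cdot 36 = 288 > 256$ while $36 + 6 \cdot 36 = 252 \leq 256$.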

There are trade-offs in the choice of $r$, which depend on what the hardware and software can do. On an 8-bit CPU with very little power, using bytes is certainly efficient, but involves either doing a division (which will be expensive in CPU time) or using a lookup table with 256 entries (so 256 bytes of ROM -- probably tolerable, since RC4 itself requires 256 bytes of RAM, and RAM is much more expensive than ROM). Speaking of which, RC4 is not necessarily the best choice, performance-wise: there are other stream ciphers worth considering.
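
The lookup-table variant might look roughly like the sketch below. The names `TABLE`, `REJECT` and `map_byte` are made up for illustration, and the sentinel value 0xFF is an arbitrary choice; on a real 8-bit target the table would be precomputed and stored in ROM rather than built at start-up.

```python
REJECT = 0xFF  # sentinel outside 0..35, marks bytes that must be discarded

# 256-entry table: byte value -> symbol index, or REJECT for 252..255.
TABLE = bytes(b % 36 if b < 252 else REJECT for b in range(256))

def map_byte(b: int):
    """Return the symbol index for byte b, or None if b must be discarded."""
    v = TABLE[b]          # a single table read replaces the division
    return None if v == REJECT else v
```

The division only happens once, when the table is generated; each keystream byte then costs one indexed read and one comparison.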

Note that even though your method does not introduce any bias, you still have the shortcomings of RC4: some detectable biases in the raw output (nothing lethal in practice, but enough to give the willies to academic cryptographers), and the absence of any IV, which means that you need a new key for every generated stream. If your messages are consecutive and serialized, you can consider them all as parts of a single big stream, but each new session or reboot MUST imply a new key; otherwise that's the infamous "two-time pad".
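
To see why key reuse is fatal, here is a small sketch. It assumes the 0 to 35 keystream values are combined with the message by symbol-wise addition mod 36, which the question does not state; the keystream values and both messages are invented for the example.

```python
from typing import List, Sequence

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def encrypt(plaintext: str, keystream: Sequence[int]) -> List[int]:
    """Assumed scheme: add each keystream value to the symbol index, mod 36."""
    return [(ALPHABET.index(ch) + k) % 36 for ch, k in zip(plaintext, keystream)]

keystream = [17, 3, 35, 8, 22, 0]       # hypothetical keystream, used twice
c1 = encrypt("ATTACK", keystream)
c2 = encrypt("DEFEND", keystream)

# Subtracting the ciphertexts cancels the keystream entirely.
diff_cipher = [(a - b) % 36 for a, b in zip(c1, c2)]
diff_plain = [(ALPHABET.index(a) - ALPHABET.index(b)) % 36
              for a, b in zip("ATTACK", "DEFEND")]
assert diff_cipher == diff_plain
```

An eavesdropper who sees both ciphertexts learns the symbol-wise difference of the two plaintexts without knowing anything about the key.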

36 is for letters and digits; for general messaging you might want to add a few signs for spaces and basic punctuation (at least a word separator).

Thomas Pornin

At first glance, this base-36 key stream looks at least as secure as RC4 itself: you are simply discarding some of the output, and not introducing any bias.
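
The "no bias" claim can be checked by counting how many byte values map to each of the 36 outputs. The snippet below is just that count, with a naive "mod 36 on every byte" mapping shown for comparison; the variable names are illustrative.

```python
from collections import Counter

# Naive mapping: every byte 0..255 reduced mod 36.
naive = Counter(b % 36 for b in range(256))
# Mapping from the question: bytes 252..255 are discarded first.
filtered = Counter(b % 36 for b in range(252))

print(sorted(set(naive.values())))     # distinct preimage counts, naive
print(sorted(set(filtered.values())))  # distinct preimage counts, filtered
```

Since $256 = 7 \cdot 36 + 4$, the naive mapping gives the outputs 0 to 3 eight preimages each and all other outputs seven, while discarding 252 to 255 leaves exactly seven preimages for every output.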

Note that there are some general weaknesses in the start of the RC4 output, which is why it is normally recommended to discard the first 1000 or so bytes after initialization (I would have to look the details up, but right now I'm on a quite slow computer). And of course, never reuse one key for more than one message stream.
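
Discarding the start of the keystream could be sketched as below, using a plain textbook RC4. The function name `rc4_keystream`, the `drop` parameter, the example key and the value 1024 are illustrative assumptions; the answer only says "1000 or so".

```python
from itertools import islice
from typing import Iterator

def rc4_keystream(key: bytes, drop: int = 0) -> Iterator[int]:
    """Return an iterator over RC4 keystream bytes, skipping the first `drop`."""
    if not 1 <= len(key) <= 256:
        raise ValueError("RC4 key must be 1 to 256 bytes long")

    # Key-scheduling algorithm: permute S under control of the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]

    def prga() -> Iterator[int]:
        # Pseudo-random generation algorithm: one output byte per step.
        i = j = 0
        while True:
            i = (i + 1) % 256
            j = (j + S[i]) % 256
            S[i], S[j] = S[j], S[i]
            yield S[(S[i] + S[j]) % 256]

    stream = prga()
    for _ in range(drop):      # throw away the weak initial bytes
        next(stream)
    return stream

# Hypothetical usage: skip 1024 bytes, then take the next 8 keystream bytes.
first_used = list(islice(rc4_keystream(b"example key", drop=1024), 8))
```

The skipped bytes still cost CPU time at every key setup, which matters on a small micro-controller.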

Also, the discarded bytes might be traceable with a side-channel attack (timing and/or power traces), if your micro-controller lives in a hostile environment.

Paŭlo Ebermann