6

First of all, I don't recommend doing this. This was something I created when I didn't know better and didn't have a solution available to me.

Long ago I created my own entropy gatherer for a cryptography project. I used a window and had the user type on the keyboard while moving and clicking the mouse. I kept the milliseconds of the timestamp on each message, as well as the X, Y of the mouse and the values of the keystrokes.

Basically, in designing the method I examined the messages and only kept the values that changed significantly (unpredictably) from message to message. For example, the date, hour and minute of each message were discarded, since all the messages were collected at close to the same time.

I then created an algorithm to "stir" the pool. This just used the built-in simple random number generator to shuffle the values around a user-defined number of times. After the fact I wondered if this was a good idea or not.

After that I would extract numbers from the pool (removing them so they would not be used again) each time I needed a random value as a seed. I would warn the user if the pool got too small and allow them to add more entropy. I would stir the pool before each use and after gathering.

I know that today there are cryptographically secure random number generators built into most operating systems and development libraries, so I would discourage anyone from rolling their own solution today, but I am curious if the methodology I described was effective and if there was anything I did or didn't do that could improve it.

Jim McKeeth

2 Answers

11

You should not remove any part of the pool, or do some more-or-less random selection out of it. Instead, just hash the whole thing with SHA-256. This will get you all the entropy there is to get out of the data, up to 256 bits, which is more than enough.
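As a minimal sketch of that advice, assuming the collected events are available as a list of `(millis, x, y, keycode)` tuples (a hypothetical layout, not something the question specifies), the whole pool can be fed through SHA-256 in one pass:

```python
import hashlib
import struct

def condense_pool(samples):
    """Hash every collected sample into a single 256-bit value.

    `samples` is a hypothetical list of (millis, x, y, keycode) tuples.
    """
    h = hashlib.sha256()
    for millis, x, y, keycode in samples:
        # Fixed-width encoding keeps the serialization unambiguous.
        h.update(struct.pack("<IiiI", millis, x, y, keycode))
    return h.digest()

# Made-up sample events, for illustration only.
samples = [(137, 402, 118, 65), (512, 398, 131, 0), (903, 377, 160, 83)]
seed = condense_pool(samples)
print(len(seed))  # 32 bytes = 256 bits
```

Nothing is filtered, shuffled, or removed here; the hash output carries no more entropy than the samples contained, so the gathering still has to run long enough.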

Once you have accumulated physical measurements that together amount to at least 256 bits of entropy, and you have hashed all of them into a single 256-bit value, you can use that value to seed a good PRNG, which will produce as many random bits as you could wish for (i.e. bits computationally indistinguishable from true randomness, which is the best you can practically hope for or actually need).

Now that's exactly what is happening behind /dev/urandom, CryptGenRandom(), java.security.SecureRandom, or whatever name your OS or platform gives to its strong random number generator.
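For comparison, here is roughly what relying on the built-in generator looks like in Python, where the standard `secrets` module draws from the operating system's generator. The variable names are illustrative only:

```python
import secrets

# 256 bits of key material straight from the OS generator.
key = secrets.token_bytes(32)

# A uniformly distributed integer in the range [0, 100).
n = secrets.randbelow(100)

print(len(key))
```

No user interaction, pool management, or stirring is needed on the application side.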

Thomas Pornin
5

The high-level architecture is alright. You implement a sponge-like structure that absorbs randomness from any source, and when you need randomness back, you squeeze it out. (Note: there actually is something called a cryptographic sponge, used in some new hash functions, which I am not referring to, although it is related.)

There are a few improvements you can make. You don't need to filter for entropy first. If you update the random pool with a well-designed update function, you can update with any source. If the source isn't actually random, it won't make the pool any less random. But if it is random, then the pool's randomness will increase. Similarly, an update should touch every bit of the pool, negating the need to stir it.

These are just high-level points, the real details are in the exact primitives you use and how you use them. A very nice model for an entropy pool is given in the paper “A model and Architecture for Pseudo-Random Generation and Applications to /dev/random” (PDF). It is relatively simple yet provably secure.

It uses a PRG and an extractor. An extractor is sort of like a hash, and is often mis-implemented as a hash, but could be a block cipher in CBC-MAC mode, for example, with a non-secret but randomly generated key. Importantly, it can take an arbitrary-sized input and reduce it down to a fixed size (m bits in this case). If the input has at least 2m bits of min-entropy over the whole string, it will condense it to m bits that are indistinguishable from m truly random bits.

Call the pool $p_i$ at time $i$ and say you want to add newly harvested randomness $r$. The function is:

$p_{i+1}=\mathsf{PRG}(p_i \oplus \mathsf{Ext}(r))$

In this case the PRG is taking m bits and returning m bits (it's not actually expanding the size). Now to get a random value $x$ out of the pool, you do:

$\langle x,p_{i+1} \rangle=\mathsf{PRG}(p_i)$

In this case the PRG is taking m bits and returning 2m bits. The first m bits are used as your value $x$ and the second m bits are the new value for the pool.
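The two operations can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `ext` uses HMAC-SHA256 with a non-secret random key as a stand-in extractor, `prg` uses SHA-256 in counter mode as a stand-in PRG, and $m = 256$. None of these choices come from the paper, all function names are hypothetical, and the sketch does not inherit the paper's security proof.

```python
import hashlib
import hmac
import secrets

M_BYTES = 32  # m = 256 bits (assumed for this sketch)

# Non-secret but randomly generated extractor key (assumption: chosen once).
EXT_KEY = secrets.token_bytes(M_BYTES)

def ext(r: bytes) -> bytes:
    """Stand-in extractor: arbitrary-length input down to m bits."""
    return hmac.new(EXT_KEY, r, hashlib.sha256).digest()

def prg(seed: bytes, out_len: int) -> bytes:
    """Stand-in PRG: SHA-256 over seed || counter, truncated to out_len."""
    out = b""
    counter = 0
    while len(out) < out_len:
        out += hashlib.sha256(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:out_len]

def refresh(pool: bytes, r: bytes) -> bytes:
    """p_{i+1} = PRG(p_i XOR Ext(r)), with m bits in and m bits out."""
    mixed = bytes(a ^ b for a, b in zip(pool, ext(r)))
    return prg(mixed, M_BYTES)

def extract(pool: bytes):
    """<x, p_{i+1}> = PRG(p_i), with m bits in and 2m bits out."""
    out = prg(pool, 2 * M_BYTES)
    return out[:M_BYTES], out[M_BYTES:]

pool = bytes(M_BYTES)  # all-zero start; it holds no entropy yet
pool = refresh(pool, b"ms=137;x=402;y=118;key=65")  # made-up event data
x, pool = extract(pool)
```

Every refresh rewrites the entire pool, so no separate stirring step is needed, and every extraction replaces the pool so the same state is never used to produce output twice.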

cHiMp
PulpSpy