
I need an algorithm for "key generator". It should be initialized with some secret data, such as a (hashed) passphrase, or something generated once by some 3rd-party code.

Given some small input it should be able to generate N bits in a deterministic way, which'd be used as a material for some key. The input consists of a needed key index, and, perhaps, some other flags and parameters.

Naturally the generated keys should not reveal other keys or the generator secret.

What is the preferred way to do this? Is it ok to just hash all the input? I mean:

K = H(input | secret)

where H is a common hash function (such as SHA-256)?

To be concrete, I need a sort of DRBG, where:

  • I don't really care about preimage attacks etc. It's not intended for message authentication, so I believe there's no need for HMAC.
  • The generator must be immutable, i.e. it should not mutate after each generation (like auto-hashing).
  • It should work in O(1) with respect to the key index. Put simply, generating the k-th key should be no harder than generating the first one.
Maarten Bodewes
valdo

2 Answers

2

I need an algorithm for "key generator". It should be initialized with some secret data, such as a (hashed) passphrase, or something generated once by some 3rd-party code.

This is called a Key Derivation Function or KDF. Since your question indicates a pre-established secret with enough randomness / entropy, a Key-Based Key Derivation Function (KBKDF) is what is required. KDFs that are used over passwords (which generally require strengthening) are called Password-Based Key Derivation Functions (PBKDFs) or password hashes.

Given some small input it should be able to generate N bits in a deterministic way, which'd be used as a material for some key. The input consists of a needed key index, and, perhaps, some other flags and parameters.

The secret data is called the input key material (IKM or $K_I$); N is generally called $L$, the length of the output key material (OKM or $K_O$).

The small input (key index, flags and parameters) is generally called Info, OtherInfo or, for NIST, FixedInfo or Context, depending on the standard used. There are a lot of impromptu KDFs, so the terminology can change from one protocol / standard to another.

Naturally the generated keys should not reveal other keys or the generator secret.

That's the general idea of KDFs. They are commonly built from one-way functions such as hashes or MACs.

What is the preferred way to do this? Is it ok to just hash all the input?

It is important to be careful with regard to the input. One thing to make sure of, even if a standardized function is used, is that the input data cannot overlap: two different sets of input values must never be encoded to the same byte string, i.e. there should be a canonical representation of the Info data and the key.

For instance, if variable $x$ can consist of "A" or "AB" and variable $y$ can consist of "BC" or just "C" then two different combinations of $x$ and $y$ will be able to produce "ABC" if you just concatenate the two variables.
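One common way to get such a canonical representation is to length-prefix every field before hashing. The following is only a minimal sketch: the helper name `encode_fields` and the 4-byte big-endian length prefix are my own assumptions for illustration, not something taken from a particular standard.

```python
import hashlib
import struct

def encode_fields(*fields: bytes) -> bytes:
    """Concatenate fields unambiguously by prefixing each with its length.

    Hypothetical helper: a 4-byte big-endian length precedes every field.
    """
    return b"".join(struct.pack(">I", len(f)) + f for f in fields)

# Plain concatenation collides: both pairs produce b"ABC".
assert b"A" + b"BC" == b"AB" + b"C"

# The length-prefixed encodings differ, so the hash inputs differ too.
enc1 = encode_fields(b"A", b"BC")
enc2 = encode_fields(b"AB", b"C")
assert enc1 != enc2
assert hashlib.sha256(enc1).digest() != hashlib.sha256(enc2).digest()
```

Fixed-width fields (for example an 8-byte key index) achieve the same thing without a prefix, as long as at most one field has variable length.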

Furthermore, if an attacker can influence the input, then you may also need to protect against length extension attacks or even attacks on the hash function itself. Otherwise, yes, it is fine to just hash the Info (or input, as you call it) together with the secret.

To be concrete, I need a sort of DRBG, where:

  • I don't really care about preimage attacks etc. It's not intended for message authentication, so I believe there's no need for HMAC.
  • The generator must be immutable, i.e. it should not mutate after each generation (like auto-hashing).
  • It should work in O(1) with respect to the key index. Put simply, generating the k-th key should be no harder than generating the first one.

You don't want a DRBG. DRBGs are defined to generate pseudorandom bits from seed information. However, many implementations are seeded by the system and may be reseeded at specific times. They also may not return the same random data if you ask for, say, 16 bytes twice rather than 32 bytes once. They generally do not have a good means of supplying other info. If you are unlucky you may generate keys that cannot be recalculated, and you lose all data; see for instance here.

KBKDFs do fulfill all of these requirements, except that quite a few are based on HMAC, as it is immune to some attacks, such as those that rely on collisions and length extension techniques.

Available KDFs

Currently one of the most advanced KBKDFs is HKDF. It is based on HMAC with a configurable hash function underneath. It consists of two functions, HKDF-Extract and HKDF-Expand. The first extracts (or compresses) the entropy in the input data into a fixed-length pseudorandom key. HKDF-Expand then uses this fixed-length key to derive one or more keys of practically any length (up to 255 times the hash output size), using the info (counter etc.) provided. It is also NIST approved.
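To make the two steps concrete, here is a minimal sketch of HKDF with SHA-256 using only Python's standard `hmac` and `hashlib` modules. It follows the Extract / Expand structure of RFC 5869; the wrapper `derive_key` and the way it encodes the key index into the info field are my own assumptions, chosen to match the question. In practice a vetted library implementation is preferable.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: PRK = HMAC(salt, IKM); an empty salt means HASH_LEN zero bytes."""
    if not salt:
        salt = bytes(HASH_LEN)
    return hmac.new(salt, ikm, HASH).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]

def derive_key(secret: bytes, index: int, length: int = 32) -> bytes:
    """Hypothetical wrapper: derive the key for `index` from the generator secret."""
    prk = hkdf_extract(b"", secret)
    # Illustrative Info encoding: fixed label plus a fixed-width (8-byte) index.
    info = b"key-index:" + index.to_bytes(8, "big")
    return hkdf_expand(prk, info, length)
```

The generator keeps no state between calls, and deriving the k-th key costs the same as deriving the first, since the index only enters through the info field. If the secret is already a uniformly random key, the extract step can be computed once and the PRK reused.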

A simpler hash-based KDF is KDF1 (or MGF1 in the simple form with just the counter), along with KDF2 and KDF3 (which differ only minimally). These are defined in ISO/IEC 18033-2 and IEEE Std 1363-2000, with similar functions in ANSI X9.62, which are unfortunately all payware. They use a hash function, a counter and possibly other input to derive keys in a pre-defined way (although the encoding of the "Info" needs to be canonical, as already explained). I asked a question about it here and described the simplest form.

An alternative to KDF1 called KDM has been defined in NIST SP 800-56C Rev 1, which can be used with either a hash or a MAC; it includes options for SHA-3 and KMAC. Basically it is a re-ordered KDF1, which could be well suited to your purpose, and it has a publicly available document describing it. NIST SP 800-56A has some interesting information on what to put in FixedInfo (a renamed OtherInfo).
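As a rough illustration of the hash-based variant of this one-step KDM, each output block is H(counter || Z || FixedInfo), with a 32-bit big-endian counter starting at 1. The sketch below assumes SHA-256 as H; the function name `one_step_kdm` is my own, and it skips the input-length checks that the specification requires.

```python
import hashlib

def one_step_kdm(z: bytes, fixed_info: bytes, length: int) -> bytes:
    """Hash-based one-step KDF sketch: K(i) = SHA-256(counter || Z || FixedInfo)."""
    h_len = hashlib.sha256().digest_size
    reps = -(-length // h_len)  # ceiling division: number of hash blocks needed
    out = b""
    for counter in range(1, reps + 1):
        # 32-bit big-endian counter, then the secret Z, then the context data.
        out += hashlib.sha256(counter.to_bytes(4, "big") + z + fixed_info).digest()
    return out[:length]
```

Here `z` plays the role of the generator secret and `fixed_info` would carry the canonically encoded key index and flags. With a plain Merkle-Damgard hash such as SHA-256, the length extension caveat mentioned earlier still applies if an attacker can choose FixedInfo, which is one reason the specification also offers HMAC and KMAC as the underlying function.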

Otherwise, you could use any of the key derivation functions of NIST SP 800-108, which is descriptively titled "Recommendation for Key Derivation Using Pseudorandom Functions". It uses MACs and includes some alternatives that are based on block ciphers instead of hashes.
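For comparison, the counter mode KDF of SP 800-108 computes each block as PRF(K_I, [i] || Label || 0x00 || Context || [L]), where L is the requested output length in bits. The following is a minimal sketch with HMAC-SHA-256 as the PRF; the 32-bit widths chosen for the counter and for L, and the function name `kbkdf_counter`, are assumptions on my part (the standard leaves these encodings to the protocol).

```python
import hashlib
import hmac

def kbkdf_counter(key: bytes, label: bytes, context: bytes, length: int) -> bytes:
    """Counter-mode KDF sketch: K(i) = HMAC(key, [i] || Label || 0x00 || Context || [L])."""
    h_len = hashlib.sha256().digest_size
    reps = -(-length // h_len)               # number of PRF invocations needed
    l_bits = (length * 8).to_bytes(4, "big")  # output length in bits (assumed 32-bit field)
    out = b""
    for i in range(1, reps + 1):
        data = i.to_bytes(4, "big") + label + b"\x00" + context + l_bits
        out += hmac.new(key, data, hashlib.sha256).digest()
    return out[:length]
```

Because the output length is part of the PRF input, a 16-byte key is not simply a prefix of the 32-byte key derived from the same label and context.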

Maarten Bodewes
1

DRBG (Deterministic Random Bit Generator) and PRNG (Pseudo-Random Number Generator) both imply a generator that is not immutable. What you want is a pseudorandom function (PRF).

  • Use HMAC if you have an implementation available. If you combine multiple values or use a single variable-length input, make sure that no two unequal inputs are encoded to the same hash input (i.e. don't encode $(x, y)$ as a plain concatenation unless $x$ has constant length). Your key must be secret.
  • SHA-3 finalists and related hashes can be used with secret keys without HMAC. Other than that, they're the same as, say, HMAC-SHA-512.
  • SipHash is an acceptable PRF to use with caveats in mind, but I don't recommend it over HMAC for your purposes. It must also use a secret key.
  • ChaCha and Salsa generate outputs in blocks and are random access. I still recommend HMAC, again because HMAC will probably be easier to use.
  • If instead of keys you want to generate one time passwords there is HOTP and TOTP. I think TOTP is less likely to be implemented erroneously by people. (Hopefully implementations use a hard-coded choice of T0, T1, and the hash algorithm.)

Conclusion: Use HMAC
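As a minimal sketch of what that looks like with Python's standard library: the function name `prf_key`, the 8-byte big-endian index and the optional `flags` suffix are assumptions for illustration, not a prescribed format.

```python
import hashlib
import hmac

def prf_key(secret: bytes, index: int, flags: bytes = b"") -> bytes:
    """Return HMAC-SHA-256(secret, index || flags) as 32 bytes of key material."""
    # The fixed-width index comes first, so (index, flags) pairs encode unambiguously.
    message = index.to_bytes(8, "big") + flags
    return hmac.new(secret, message, hashlib.sha256).digest()

# Stateless and O(1) in the index: key 1,000,000 costs the same as key 0.
k0 = prf_key(b"generator secret with enough entropy", 0)
k_big = prf_key(b"generator secret with enough entropy", 1_000_000)
assert k0 != k_big and len(k0) == len(k_big) == 32
```

If more than 32 bytes are needed, HMAC-SHA-512 gives 64 bytes per call; beyond that, an expand step in the style of HKDF is the usual route.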

Edit: Remember that you need to use a secret key with a sufficient amount of entropy. If the entropy is too low, the key can be brute-forced and all keys (past, present and future) are compromised. An attacker can check whether a guessed key is correct by looking at just one generated key, or by checking whether a generated key successfully decrypts data. These are problems with any construct that meets your requirements, so they don't make this approach any weaker than the alternatives. See this post for the explanation.
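To illustrate why a low-entropy secret is fatal, here is a toy sketch in which the generator secret is a hypothetical 4-digit PIN. The names `derive` and `guess_secret` are made up for this example, and `derive` is simply the HMAC construction recommended above with an assumed 8-byte index encoding.

```python
import hashlib
import hmac

def derive(secret: bytes, index: int) -> bytes:
    """Derive key number `index` from the generator secret using HMAC-SHA-256."""
    return hmac.new(secret, index.to_bytes(8, "big"), hashlib.sha256).digest()

def guess_secret(observed_key: bytes, index: int, candidates):
    """Return the first candidate secret that reproduces the observed key, else None."""
    for cand in candidates:
        if hmac.compare_digest(derive(cand, index), observed_key):
            return cand
    return None

weak_secret = b"1234"            # hypothetical PIN: only 10,000 possibilities
leaked = derive(weak_secret, 7)  # a single derived key that became known
pins = (f"{n:04d}".encode() for n in range(10_000))
assert guess_secret(leaked, 7, pins) == weak_secret
```

Once the secret is recovered this way, every other key can be recomputed, which is why the secret itself needs to come from a high-entropy source or a password hash with appropriate strengthening.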

Future Security