How to prevent accidental nonce reuse with AEAD cipher

Question

I'm trying to correctly implement ChaCha20-Poly1305 in my node.js app (...crypto is hard). I understand that the cipher requires a unique key/nonce pair for each message or the security breaks down.

The nonce doesn't have to be random or unpredictable, so a simple counter is sufficient. My problem is that I can't figure out how to implement an efficient counter that will work with multiple instances of my app running simultaneously on the same machine and guarantee that no nonce will ever be reused.

Basically, I'm wondering how to avoid this footgun.

Some ideas I had:

- Use a timestamp. Apparently not a great idea.
- Use Redis INCR. This complicates the development environment (I now need Redis installed on every computer I develop on), and I'm still left without strong guarantees that I won't get a duplicate value in case of an unexpected shutdown (even with Redis AOF).
- Write to a file or database table every time a nonce is created. This should work, but the performance would not be great, and I'm probably still stuck in the event of an unexpected shutdown.
- Use a UUIDv1 or v4 type thing (from this comment). Initially v1 seemed perfect (guaranteed uniqueness), but it's tied to the system MAC address, so when I spin up multiple copies of the app, I'm probably killing any type of guarantee. Then there's v4, which should be good, but because the nonce is only 96 bits, I would need to truncate the ID and increase my collision probability (same goes for v1, actually). If I went this route, I was looking at nanoid.
- Regularly rotate keys. I realize that this needs to be done anyway, but I want to minimize how often it needs to happen.
- Something like Twitter's Snowflake. flake-idgen seems well suited for this (guaranteed uniqueness, partitioned by machine/worker or instance id, doesn't require coordination beyond determining instance ids, generates short enough ids).
- Use XChaCha20-Poly1305. It's recommended in the libsodium docs specifically for this reason (the nonce is big enough that you can just throw random values at it and not worry). But it's not a standard (yet?), and it would require bringing libsodium in for crypto instead of relying on Node's built-in OpenSSL wrapper. This complicates development (dealing with native bindings) and/or undermines security (constant-time issues running crypto in JS/WebAssembly... way over my pay grade).

Am I missing something obvious? How do people usually solve this kind of problem? Right now, I'm liking flake-idgen, but it doesn't seem like it's gotten a lot of scrutiny, so it seems possible that there are corner cases that could cause a duplicate id (leap seconds, leap year, accidental server clock reset, malicious NTP server, etc.).

Update: Ok, so the consensus seems to be to just use a random nonce and not worry too much about it. For my scale, I think that will be fine.
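For reference, here is a minimal sketch of the "just use random" approach with Node's built-in crypto module. The helper names `seal` and `open` are hypothetical (they are not part of any library), and the sketch assumes a 32-byte key that is already shared. By the birthday bound, the chance of any nonce collision among n messages under one key is roughly n²/2⁹⁷ for 96-bit random nonces, so this relies on rotating the key well before n gets large.

```javascript
const crypto = require("node:crypto");

const NONCE_LEN = 12; // 96-bit nonce, as used by IETF ChaCha20-Poly1305
const TAG_LEN = 16;   // Poly1305 authentication tag

// Hypothetical helper: returns nonce || ciphertext || tag as one Buffer.
function seal(key, plaintext) {
  const nonce = crypto.randomBytes(NONCE_LEN); // fresh random nonce per message
  const cipher = crypto.createCipheriv("chacha20-poly1305", key, nonce, {
    authTagLength: TAG_LEN,
  });
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([nonce, ciphertext, cipher.getAuthTag()]);
}

// Hypothetical helper: throws if the tag does not verify.
function open(key, sealed) {
  const nonce = sealed.subarray(0, NONCE_LEN);
  const tag = sealed.subarray(sealed.length - TAG_LEN);
  const ciphertext = sealed.subarray(NONCE_LEN, sealed.length - TAG_LEN);
  const decipher = crypto.createDecipheriv("chacha20-poly1305", key, nonce, {
    authTagLength: TAG_LEN,
  });
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}
```

Storing the nonce next to the ciphertext is fine because the nonce is not secret; it only has to be unique per key.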

I found some issues in testing flake-idgen, so I don't think I would actually use that (although it still looks like a cool project).

My alternative plan would be to write a counter implementation that generates a 64-bit BigInt counter (support added in Node 12) + a 16-bit instance ID + maybe 16 random bits for fun. I would persist the counter to disk after every X increments so that I'm not constantly writing to the disk. In the event of an unexpected server shutdown, the system would read from the disk file and add X + 1 to the current counter to ensure no collisions.

Unsolved problems with this implementation: How do I handle wraparound? What happens if the server is reset and there is no disk file to read from?
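A rough sketch of that counter plan, under some loudly stated assumptions: `NonceCounter`, `PERSIST_EVERY` (the "X" above) and the plain-text state file are all hypothetical names and formats, every running instance gets its own state file and its own 16-bit instance ID, and the disk honours `fsync`. It does not solve the two open problems: wraparound simply throws, and a missing state file is treated as a fresh start, which is only safe if the key is fresh too.

```javascript
const fs = require("node:fs");
const crypto = require("node:crypto");

const PERSIST_EVERY = 1000n; // "X": hypothetical value, tune to taste

class NonceCounter {
  constructor(stateFile, instanceId) {
    this.stateFile = stateFile;
    this.instanceId = instanceId; // assumed unique per running instance, 0..65535
    // After a crash, up to PERSIST_EVERY values past the saved one may have
    // been handed out, so skip ahead by X + 1. A missing file restarts at 0.
    this.counter = fs.existsSync(stateFile)
      ? BigInt(fs.readFileSync(stateFile, "utf8").trim()) + PERSIST_EVERY + 1n
      : 0n;
    this.persist();
  }

  persist() {
    // Write to a temp file, fsync, then rename over the old state.
    const tmp = `${this.stateFile}.tmp`;
    const fd = fs.openSync(tmp, "w");
    fs.writeSync(fd, this.counter.toString());
    fs.fsyncSync(fd);
    fs.closeSync(fd);
    fs.renameSync(tmp, this.stateFile);
  }

  next() {
    if (this.counter >= 2n ** 64n - 1n) {
      throw new Error("counter exhausted: rotate the key");
    }
    this.counter += 1n;
    // Persist before the value is returned, every X increments.
    if (this.counter % PERSIST_EVERY === 0n) this.persist();

    const nonce = Buffer.alloc(12);
    nonce.writeBigUInt64BE(this.counter, 0);  // 64-bit counter
    nonce.writeUInt16BE(this.instanceId, 8);  // 16-bit instance ID
    crypto.randomBytes(2).copy(nonce, 10);    // 16 random bits "for fun"
    return nonce;
  }
}
```

Uniqueness here comes entirely from the counter and the instance ID; the random bits add nothing to the guarantee.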

Thanks for your help on this, and if someone wants to post "just use random" as an answer, I'd be happy to accept it.

Squeamish Ossifrage · Answer 1 · 2019-05-24T13:39:03.277

If you are having a sequential conversation, like TLS where each record is sent in order before sending the next record, you should use message sequence numbers for the nonce. These will never overflow in ChaCha even if you managed to send one message per nanosecond for four centuries. As a bonus, you can reject replays and force the attacker to keep up with your conversation, and you can immediately detect incorrect implementations that use nonces insecurely, e.g. because they randomized it or rolled it back.
If, for some reason, you are using a single key for (say) thousands of conversations in parallel, you can carve up the nonce into smaller pieces and assign to each conversation a unique (say) 16-bit number and then use 48-bit message sequence numbers for the rest, which is still a lot of messages, or 32-bit/32-bit, etc.
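To make the sequence-number idea concrete, here is a small sketch assuming the 96-bit nonce of IETF ChaCha20-Poly1305, split as a 32-bit conversation ID followed by a 64-bit message sequence number (the split, and the names `makeNonce`, `SendingEnd` and `ReceivingEnd`, are illustrative choices, not anything standard). Neither side ever transmits the nonce: each derives it from its own counter, so a replayed or reordered record simply fails authentication.

```javascript
const crypto = require("node:crypto");

// Hypothetical layout: 32-bit conversation ID || 64-bit sequence number.
function makeNonce(conversationId, seq) {
  const nonce = Buffer.alloc(12);
  nonce.writeUInt32BE(conversationId, 0);
  nonce.writeBigUInt64BE(seq, 4); // throws once seq no longer fits in 64 bits
  return nonce;
}

class SendingEnd {
  constructor(key, conversationId) {
    this.key = key;
    this.conversationId = conversationId;
    this.seq = 0n;
  }

  // Returns ciphertext || tag; the nonce is implicit.
  seal(plaintext) {
    const nonce = makeNonce(this.conversationId, this.seq);
    this.seq += 1n;
    const cipher = crypto.createCipheriv("chacha20-poly1305", this.key, nonce, {
      authTagLength: 16,
    });
    const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
    return Buffer.concat([body, cipher.getAuthTag()]);
  }
}

class ReceivingEnd {
  constructor(key, conversationId) {
    this.key = key;
    this.conversationId = conversationId;
    this.seq = 0n;
  }

  // Throws on forged, replayed, or out-of-order records.
  open(record) {
    const nonce = makeNonce(this.conversationId, this.seq);
    const decipher = crypto.createDecipheriv("chacha20-poly1305", this.key, nonce, {
      authTagLength: 16,
    });
    decipher.setAuthTag(record.subarray(record.length - 16));
    const plaintext = Buffer.concat([
      decipher.update(record.subarray(0, record.length - 16)),
      decipher.final(),
    ]);
    this.seq += 1n; // advance only after the record verified
    return plaintext;
  }
}
```

If the two directions of a conversation share one key, they need distinct conversation IDs (or distinct keys) so that both ends never use the same nonce.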
But if you're not restricted to using the same key—and it seems unlikely that you are, since you get to dictate how the key is used—then you can derive subkeys from a master key with a key derivation function, KDF, such as HKDF-SHA256, in what is sometimes called a cascade. In fact, this is exactly how XSalsa20 and XChaCha work, using Salsa20/ChaCha itself as a KDF! (Note: KDF does not just mean password hash; KDFs can be extremely cheap to compute—no costlier than generating a single block of output with ChaCha, for instance!)
- For example, if you have a master key $k$ for your application, and many uniquely labelled instances of it, you can use $k_i := \operatorname{HKDF-Expand}_k(L_i)$, where $L_i$ is a unique label called the ‘info’ for the $i^{\mathit{th}}$ instance or purpose. If you don't have HKDF logic handy, but you do have HMAC, it's easy to implement HKDF in terms of HMAC.
  
  In a pinch, if you need a randomized system, you could just choose a sufficiently long info label at random, say 256 bits, and there will be no danger of collision (unless there's a VM rollback, etc.). You could also include a time stamp, counter, favorite color, process id, phase of the moon, or anything else you want in the HKDF info string, as long as you can reason yourself into confidence that it will be unique.
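As an illustration of how little code this takes, here is a sketch of HKDF (RFC 5869) built from nothing but HMAC-SHA256 in Node. `hkdfExtract` and `hkdfExpand` follow the RFC's two steps; `deriveInstanceKey` is a hypothetical wrapper for the $k_i$ derivation above and assumes the master key is already a uniformly random 32-byte string, so it can be used directly as the HKDF pseudorandom key.

```javascript
const crypto = require("node:crypto");

// HKDF-Extract: PRK = HMAC-SHA256(salt, IKM).
function hkdfExtract(salt, ikm) {
  return crypto.createHmac("sha256", salt).update(ikm).digest();
}

// HKDF-Expand: T(i) = HMAC-SHA256(PRK, T(i-1) || info || i), output T(1) || T(2) || ...
function hkdfExpand(prk, info, length) {
  if (length > 255 * 32) {
    throw new RangeError("HKDF-SHA256 can produce at most 8160 bytes");
  }
  let okm = Buffer.alloc(0);
  let block = Buffer.alloc(0);
  for (let i = 1; okm.length < length; i++) {
    block = crypto
      .createHmac("sha256", prk)
      .update(Buffer.concat([block, info, Buffer.from([i])]))
      .digest();
    okm = Buffer.concat([okm, block]);
  }
  return okm.subarray(0, length);
}

// Hypothetical wrapper: k_i = HKDF-Expand_k(L_i), one 256-bit subkey per label.
function deriveInstanceKey(masterKey, label) {
  return hkdfExpand(masterKey, Buffer.from(label), 32);
}

// Randomized variant: a fresh 256-bit label per process, stored with the
// ciphertexts so that readers can re-derive the same subkey.
const masterKey = crypto.randomBytes(32);
const label = crypto.randomBytes(32);
const instanceKey = deriveInstanceKey(masterKey, label);
```

With a subkey that is unique to the process, the per-message nonce can be a plain in-memory counter starting at zero. Recent Node versions (15 and later) also ship `crypto.hkdfSync`, which performs both the extract and expand steps.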
Caveats: In some cryptosystems like AES, there's a high cost to changing keys, and for 128-bit keys, subkey derivation is safe only for a modest number of keys, which is part of why I advise against AES and particularly against AES-128. But there is zero key setup cost in Salsa20 and ChaCha, and the danger of collision in 256-bit keys is negligible no matter how many subkeys you derive within humanity's capacity for computation.
You may also have the option of generating and storing additional keys, possibly encrypted with a master key, or with a master key pair to separate concerns, or using a deterministic authenticated cipher. But this is getting a bit far afield of the original already broad question.

(Technical details ahead, not necessary for the principal conclusion above.)

A note on XSalsa20 and XChaCha: If you have a library that computes Salsa20 or ChaCha for arbitrary nonce and block counter, you can easily compute XSalsa20 or XChaCha!

XSalsa20 and XChaCha are defined as follows: for 128-bit $n$ and 64-bit $n'$, the stream $\operatorname{XSalsa20}_k(n \mathbin\| n')$ with 192-bit nonce $n \mathbin\| n'$ is the stream $\operatorname{Salsa20}_{k'}(n')$ where $k' = \operatorname{HSalsa20}_k(n)$, and $\operatorname{HSalsa20}_k(n)$ is defined as a function of the Salsa20 core which is also used to generate the Salsa20 stream.

Specifically, the Salsa20 core $\operatorname{Salsa20}(k, i, \sigma)$ combines a 256-bit/8-word key $k = (k_0, k_1, \dots, k_7)$, a 128-bit/4-word input $i = (i_0, i_1, i_2, i_3)$ (which, for the stream cipher, is composed of a nonce and a block counter), and a 128-bit/4-word constant $\sigma = (\sigma_0, \sigma_1, \sigma_2, \sigma_3)$ into a matrix

$$x = \begin{pmatrix} \sigma_0 & k_0 & k_1 & k_2 \\ k_3 & \sigma_1 & i_0 & i_1 \\ i_2 & i_3 & \sigma_2 & k_4 \\ k_5 & k_6 & k_7 & \sigma_3 \end{pmatrix},$$

and returns the 512-bit/16-word string

$$\operatorname{Salsa20}(k, i, \sigma) = \pi(x) + x$$

where $\pi$ is a permutation.

HSalsa20, in contrast, computes

$$x' = \pi(x) + x - \begin{pmatrix} \sigma_0 & 0 & 0 & 0 \\ 0 & \sigma_1 & i_0 & i_1 \\ i_2 & i_3 & \sigma_2 & 0 \\ 0 & 0 & 0 & \sigma_3 \end{pmatrix},$$

and returns only the 256-bit/8-word string $(x'_{00}, x'_{11}, x'_{22}, x'_{33}, x'_{12}, x'_{13}, x'_{20}, x'_{21})$, that is, the four diagonal words followed by the four words at the positions of the input $i$. (The indexing is a little different for ChaCha, as is the permutation $\pi$, but that's all.)

Effectively, this is simply returning half the words of just the permutation $\pi(x)$, skipping the additions that Salsa20 does at the end. The difference between those words of HSalsa20 and those words of Salsa20 depends only on the public nonce $n$ and the public constant $\sigma$, so breaking HSalsa20 can't be easier than breaking Salsa20.

What this means is if you have a way to generate a single block of output with the Salsa20 or ChaCha stream cipher using a prescribed nonce and block counter, then you can compute HSalsa20 or HChaCha by just subtracting off the nonce/counter and constants to derive an XSalsa20/XChaCha subkey, and from there use your Salsa20/ChaCha code to compute the rest of XSalsa20/XChaCha.
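Here is a sketch of that trick for the ChaCha variant, assuming your Node build exposes OpenSSL's raw `chacha20` cipher, whose 16-byte IV is a 32-bit little-endian block counter followed by a 96-bit nonce. Those 16 bytes fill state words 12 to 15, which is exactly where HChaCha20 places its 128-bit input, and in ChaCha the constants sit in words 0 to 3. The function name `hchacha20FromBlock` is made up for this example.

```javascript
const crypto = require("node:crypto");

// The ChaCha constant "expand 32-byte k" as four little-endian 32-bit words.
const SIGMA = [0x61707865, 0x3320646e, 0x79622d32, 0x6b206574];

// Derives the HChaCha20 subkey from one ordinary ChaCha20 keystream block.
// key: 32 bytes; input16: the first 16 bytes of an XChaCha20 nonce.
function hchacha20FromBlock(key, input16) {
  // Encrypting 64 zero bytes yields one raw keystream block, pi(x) + x.
  const cipher = crypto.createCipheriv("chacha20", key, input16);
  const block = cipher.update(Buffer.alloc(64));

  const subkey = Buffer.alloc(32);
  for (let w = 0; w < 4; w++) {
    // Words 0..3: subtract the public constants (mod 2^32).
    const top = (block.readUInt32LE(4 * w) - SIGMA[w]) >>> 0;
    subkey.writeUInt32LE(top, 4 * w);
    // Words 12..15: subtract the public counter/nonce words (mod 2^32).
    const bottom =
      (block.readUInt32LE(4 * (12 + w)) - input16.readUInt32LE(4 * w)) >>> 0;
    subkey.writeUInt32LE(bottom, 16 + 4 * w);
  }
  return subkey;
}
```

Feeding the resulting subkey and the remaining nonce bytes back into ordinary ChaCha20 gives the rest of XChaCha20; how the leftover nonce bytes are padded to the cipher's nonce size depends on which XChaCha20 specification you follow.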

score 3 · Accepted Answer · edited May 24 '19 at 10:14

Hard to provide a definitive answer without more information about your application (which would render the question off-topic for Crypto SE). But one relevant technique that many systems use is a hierarchy with two types of key:

- Short-term data encryption keys (a.k.a. DEKs), which are generated randomly in narrow scopes (e.g., individual executions of your application) and used to encrypt the actual data;
- A long-term key encryption key (a.k.a. KEK or master key), which is used to encrypt DEKs.

The idea is that, along with each message's ciphertext, you include a copy of, or a reference to, the encrypted DEK that was used to encrypt the message. A reader, to decrypt the message, first decrypts its DEK and then the ciphertext. This is often called envelope encryption, with the metaphor that each ciphertext is enclosed in an "envelope" that includes its encrypted DEK.

Envelope encryption is commonly coupled with a key management system (KMS), a hardened service that "owns" the master key and performs encryption/decryption operations with it on behalf of authenticated and authorized clients. The idea is that your application instances don't actually get a copy of the master key; instead they must authenticate to the KMS and request it to perform master key operations on their behalf. (Of course you need to think carefully about the performance implications of this—you ideally want a small ratio of KMS encryption/decryption requests to volume of data encrypted/decrypted with the DEKs.)

One reason for the popularity of this architecture is that there are cloud services and software products that implement the KMS functionality.

The way this ties into your question about nonce management is that if you have this infrastructure (which you might independently want), the nonce management problem is much simplified, because in any context where you're about to encrypt but can't guarantee that you can produce a unique nonce, you can just spawn a new DEK. For example, each instance of your application might just do so at startup, and keep a counter that's guaranteed to be unique within the process.
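A toy sketch of that per-process DEK pattern, with a local 32-byte key standing in for the KMS-held master key (in a real deployment the wrap and unwrap steps would be calls to the KMS, and the application would never see the KEK). `EnvelopeEncryptor`, `decryptEnvelope` and the envelope layout are hypothetical names invented for this example.

```javascript
const crypto = require("node:crypto");

function aeadSeal(key, nonce, plaintext) {
  const c = crypto.createCipheriv("chacha20-poly1305", key, nonce, { authTagLength: 16 });
  const body = Buffer.concat([c.update(plaintext), c.final()]);
  return Buffer.concat([body, c.getAuthTag()]);
}

function aeadOpen(key, nonce, sealed) {
  const d = crypto.createDecipheriv("chacha20-poly1305", key, nonce, { authTagLength: 16 });
  d.setAuthTag(sealed.subarray(sealed.length - 16));
  return Buffer.concat([d.update(sealed.subarray(0, sealed.length - 16)), d.final()]);
}

class EnvelopeEncryptor {
  // kek: stand-in for the master key that a KMS would normally hold.
  constructor(kek) {
    this.dek = crypto.randomBytes(32); // fresh DEK for this process
    this.counter = 0n;                 // nonce counter, unique within the process
    // The KEK only wraps DEKs (one per startup), so a random nonce is used here.
    const wrapNonce = crypto.randomBytes(12);
    this.wrappedDek = Buffer.concat([wrapNonce, aeadSeal(kek, wrapNonce, this.dek)]);
  }

  encrypt(plaintext) {
    const nonce = Buffer.alloc(12);
    nonce.writeBigUInt64BE(this.counter, 4);
    this.counter += 1n;
    return {
      wrappedDek: this.wrappedDek,
      nonce,
      ciphertext: aeadSeal(this.dek, nonce, plaintext),
    };
  }
}

function decryptEnvelope(kek, envelope) {
  const wrapNonce = envelope.wrappedDek.subarray(0, 12);
  const dek = aeadOpen(kek, wrapNonce, envelope.wrappedDek.subarray(12));
  return aeadOpen(dek, envelope.nonce, envelope.ciphertext);
}
```

Two instances started at the same moment both begin their counters at zero, and that is fine: they hold different DEKs, so the key/nonce pairs never coincide.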

The cloud vendors have useful conceptual documentation on this if you're interested.

score 0 · Answer 3 · answered May 23 '19 at 20:17

IMO, the timestamp idea gets a worse rep than it deserves. It doesn't take 64 bits: 42 bits of timestamp give you about 139 years' worth of milliseconds. That leaves you with quite a few bits for a UUID, randomness, or a counter of your choosing.
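The arithmetic, plus one possible layout, as a sketch: a 42-bit millisecond timestamp followed by 54 random bits fills the 96-bit nonce exactly. The split and the name `timestampNonce` are assumptions made for this example. Note that two nonces generated in the same millisecond, or after a clock rollback, are only kept apart by the random bits.

```javascript
const crypto = require("node:crypto");

const MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365.25;
// 2^42 milliseconds is a little over 139 years.
const yearsCovered = 2 ** 42 / MS_PER_YEAR;

// Hypothetical layout: 42-bit ms timestamp || 54 random bits = 96 bits.
function timestampNonce(nowMs = Date.now()) {
  const ts = BigInt(nowMs);
  if (ts >= 1n << 42n) {
    throw new RangeError("timestamp no longer fits in 42 bits");
  }
  // Take 56 random bits and mask them down to 54.
  const randomBits =
    BigInt("0x" + crypto.randomBytes(7).toString("hex")) & ((1n << 54n) - 1n);
  const value = (ts << 54n) | randomBits;
  // 96 bits is 24 hex digits, i.e. 12 bytes.
  return Buffer.from(value.toString(16).padStart(24, "0"), "hex");
}
```

Milliseconds since the Unix epoch fit in 42 bits until early in the next century; counting from a later custom epoch would buy the full range.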

That being said, there is still the possibility of someone tampering with the timestamp, and depending on your scope and overall setup this can be a serious issue. But there are issues with PRNGs and counters as well (remember the KRACK attack on WPA2?). Personally, I like my nonces with a little timestamp on the side.
