Is it dangerous to encrypt lots of small files with the same key?

Question

I want to encrypt 1000s of old files on old hard drives before uploading them to the cloud. I'm planning to use AES-GCM with only one or a few keys. (I'm using Apple's CryptoKit library which provides this algorithm and a few others.) It would be convenient to encrypt and store the files individually, so I could download and decrypt them individually. Is this bad practice? I'm wondering if having many small examples of ciphertext could give an attacker an advantage.

A bit of background on my goals: I travel a lot and work with a laptop. Where some people would use an external drive, I want to use the cloud because 1) I don't want to be carrying around external drives, and 2) I want online storage in case of theft or damage to my devices.

I said "one or a few keys" because I don't want to store (in Keychain) or remember tons of keys. But if there is another way to generate multiple keys from a password or master key - appropriate to this situation - then I'd like to know about that.

I should break this into two sub-questions about two similar situations:

Situation 1: Lots of individual files, but no names or directory structures visible.

So an attacker sees a drive full of stuff like:

file0000001.enc             20 KB
file0000002.enc            123 KB
file0000003.enc         600 bytes
file0000004.enc             1.1MB
...

Situation 2: The directory structure, or maybe even file names, are preserved. So if something standard like an OS config directory were encrypted, the attacker would know some or all of the plaintext for some example files.

/personal-files/foo.txt   500K
...
/etc/http/httpd.conf       21K    -- a common file, could guess some or all of the plaintext
...

So, are either of these a problem? Or are they safe with AES-GCM (or a similar recommended algorithm)?

score 3 · Accepted Answer · edited Oct 07 '21 at 07:59

I'm assuming that the OP wants to access all files individually, and also wants to update them or roll back to old versions.

CryptoKit has only AES-GCM and ChaCha20-Poly1305.

Using the same key for all of the files is a dangerous path. Both AES-GCM and ChaCha20-Poly1305 are vulnerable in the case of (key,IV) reuse. For example:

Suppose you encrypt the file $f$ with AES-GCM under the key $k$ and the IV $IV_f$ to get $$(ciphertext,tag) = AES\text{-}GCM(k,IV_f,f)$$ and store the file as $$IV_f \mathbin\|ciphertext\mathbin\|tag.$$ The IVs must be unique for each file; otherwise you are already in the pit of $(key,IV)$ pair reuse.

Later, you open the file, update it, and encrypt it again with the same $k$ and $IV_f$. You have now given an observer, not necessarily an active attacker, two files encrypted under the same $(key,IV)$ pair: the old version of the file and the updated one. Confidentiality is lost, and worse, an active attacker can make forgeries, too.
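To make the confidentiality loss concrete, here is a minimal sketch of what an observer can do with two ciphertexts made under the same $(key,IV)$ pair. It deliberately does not use AES-GCM: the keystream below is a toy built from SHA-256 in counter mode, standing in for the counter-mode keystream that GCM uses internally. The names `toy_keystream`, `toy_encrypt`, and the sample plaintexts are all made up for illustration.

```python
import hashlib


def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256.

    Assumption: this is a stand-in for illustration only, NOT AES-GCM.
    """
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings (truncates to the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Encrypt by XORing the plaintext with the keystream."""
    return xor_bytes(plaintext, toy_keystream(key, iv, len(plaintext)))


key = bytes(32)  # hypothetical fixed key
iv = bytes(12)   # the same 96-bit IV used for both versions: the mistake

old_version = b"balance: 100 EUR"
new_version = b"balance: 999 EUR"

c_old = toy_encrypt(key, iv, old_version)
c_new = toy_encrypt(key, iv, new_version)

# The keystream cancels out, so the observer learns old XOR new
# without ever touching the key.
leak = xor_bytes(c_old, c_new)
assert leak == xor_bytes(old_version, new_version)

# If one version is known or guessable, the other follows directly.
recovered = xor_bytes(leak, old_version)
assert recovered == new_version
```

The forgery side of the attack (recovering GCM's authentication key from a repeated IV) is not shown here; the sketch only covers the keystream cancellation.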

What are the options?

Continue to use the same key with a new IV.

You need to keep track of all of the used IVs and pick a new, unique IV for every new file and for every file update. The better option is:
Use a new key and a new IV for each file and each update.

Generate a new key and IV for every new file and for every update of a file, so that each encryption is fresh. How?

Get a master key $K_m$ and store it with a password manager like KeePass.
- Using the file name and the update date and time of the file, as in $$HKDF(K_{m}, fileName \|update)$$ can be a solution. However, there are traps here, too. If the file is renamed or moved to another location, or the update information changes, you can no longer derive the key and you lose everything!
- A better approach is to use random information for each file, such as a 256-bit random salt, together with HKDF. HKDF can be used to derive multiple keys and even the IV. First, extract a pseudorandom key (PRK) from the Initial Key Material (IKM), which here is $K_m$:
  
  $$PRK = \text{HKDF-Extract}(salt, IKM)$$
  
  Then use Expand with the additional info to obtain the Output Key Material (OKM) of the desired length $L$; prefer AES-256.
  
  $$OKM = \text{HKDF-Expand}(PRK, info, L)$$
  
  Call it twice,
  
  $$File_{key} = \text{HKDF-Expand}(PRK, \text{"KEY"}, 256)$$
  
  $$File_{IV} = \text{HKDF-Expand}(PRK, \text{"InitialValue"}, 96)$$ to derive the key and the IV.
  
  The IV size is 96 bits since that is GCM's recommended IV size. When a file is updated, get a new random salt and run HKDF again. The random salt can be generated from /dev/urandom where available, or with SecRandomCopyBytes on iOS.
  
  Store the file as:

$$Salt \mathbin\|IV \mathbin\|ciphertext\mathbin\|tag.$$
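The derivation and storage layout above can be sketched with nothing but Python's standard library. This is only an illustration under stated assumptions, not a vetted implementation: HKDF is instantiated with SHA-256, lengths are given in bytes (so 32 and 12 instead of the 256 and 96 bits in the formulas), the tag is assumed to be GCM's usual 16 bytes, and the helper names `derive_file_key_iv`, `pack_file`, and `unpack_file` are made up here. The actual AES-GCM encryption step is left out because it is not in the standard library.

```python
import hashlib
import hmac
import secrets

HASH_LEN = 32  # SHA-256 output size in bytes
SALT_LEN = 32  # 256-bit random salt per file version
IV_LEN = 12    # 96-bit GCM IV
TAG_LEN = 16   # assumption: 128-bit GCM tag


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract: PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: derive `length` bytes of output key material."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        # T(i) = HMAC(PRK, T(i-1) || info || i)
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_file_key_iv(master_key: bytes, salt: bytes):
    """Derive a per-file AES-256 key and 96-bit IV from K_m and a salt."""
    prk = hkdf_extract(salt, master_key)
    file_key = hkdf_expand(prk, b"KEY", 32)
    file_iv = hkdf_expand(prk, b"InitialValue", IV_LEN)
    return file_key, file_iv


def pack_file(salt: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bytes:
    """Lay out the stored file as Salt || IV || ciphertext || tag."""
    return salt + iv + ciphertext + tag


def unpack_file(blob: bytes):
    """Split a stored file back into (salt, iv, ciphertext, tag)."""
    salt = blob[:SALT_LEN]
    iv = blob[SALT_LEN:SALT_LEN + IV_LEN]
    ciphertext = blob[SALT_LEN + IV_LEN:-TAG_LEN]
    tag = blob[-TAG_LEN:]
    return salt, iv, ciphertext, tag


master_key = secrets.token_bytes(32)   # stand-in for K_m
salt = secrets.token_bytes(SALT_LEN)   # fresh for every file and every update
file_key, file_iv = derive_file_key_iv(master_key, salt)
```

To decrypt, read the salt from the front of the stored file and run `derive_file_key_iv` again with the master key; a new salt on every update gives a new key and IV, so no $(key,IV)$ pair is ever repeated by design.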

HKDF: the HMAC-based Extract-and-Expand Key Derivation Function (HKDF) was proposed by H. Krawczyk and P. Eronen and is defined in RFC 5869 (May 2010).

The age library

What is written above is for developing a tool/library yourself. This has lots of pitfalls, too. Just consider OpenSSL's evolution over time: it has become almost unmanageable.

SAI-peregrinus made a comment about the age library. I've looked over the library and saw that it is actively maintained, and that the developers are aware of recent attacks and problems and apply patches for them.

So instead of building your own library, consider one that is well maintained and fits your needs.
