For a cloud storage upload client I want to encrypt files during upload, so that they cannot be read by the cloud service provider or anyone who gains access to the account. I would like to use some standard format, instead of rolling my own crypto, but I couldn't find anything that does what I want.

The main problem is that I want to split large files and upload them in parallel to maximize bandwidth. If uploading of a part fails, I might have to go back and upload the same part again. Therefore, I need an encryption scheme which allows encrypting only some parts within a file. Also, a full authentication pass before uploading should be avoided.

After a lot of reading in the past two days I came up with the following scheme: Split the file into fixed-size blocks (a few KB each). Calculate the SHA-1 hash of each individual block and append it to the plaintext. Encrypt the plaintext and the appended hash using AES in CBC mode with a random IV for every block. This is similar to what RFC 4880 specifies for symmetric encryption, just with blocks. This provides authentication for every individual block. To protect the file as a whole, an additional last block with a concatenation of all block hashes is processed using the same technique.

RFC 4880 itself makes clear that this only provides modest security against modifications, so I'm thinking about further improvements. The simple hash function could be replaced with an HMAC. From this document it's clear that authenticate-then-encrypt is not generically secure. However, for CBC mode this doesn't apply, so I would like to keep authenticate-then-encrypt, since it allows easy checking of the encryption key without decrypting the whole file. I'm also not very happy about the last block, which repeats all the previous hashes. Maybe an HMAC of all previous hashes would suffice? I hope someone can shed some light on the practical security of this technique.

I also stumbled across GCM, which seems to be exactly what I want. However, I understand that there is a limit of ~68GB that can be encrypted for a given IV/key pair. Is it safe to approach this limit, or should one stay far below it? Also, I didn't find any good Python library that actually allows parallel encryption using GCM.

sebi707

2 Answers

To address your three ideas:

  • "I use hash-then-encrypt" with CBC + SHA-1
  • "I use MAC-then-encrypt" with CBC + HMAC
  • "I won't use GCM because of unclear usage guidance"

First, hash-then-encrypt is a really bad idea. You really should avoid it if possible. Details are explained in this related question on Crypto.SE.

Second, MAC-then-encrypt is also a construction you should avoid. Not only do the generic composition security proofs fail to hold in general, it also allows padding oracle attacks in some circumstances. Encrypt-then-MAC is better, and there's no significant advantage to using authenticate-then-encrypt over the more generically secure encrypt-then-authenticate. Also see our canonical question on this subject.
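To make the ordering concrete, here is a minimal sketch of the MAC half of encrypt-then-MAC using only the Python standard library. It assumes the block has already been encrypted (for example with AES-CBC) and that `mac_key` is a separate key from the encryption key; the names `seal` and `open_sealed` and the choice of HMAC-SHA-256 are mine for illustration, not something from the question.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA-256 output size in bytes


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Return IV || ciphertext || tag, with the tag computed over IV || ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes, iv_len: int = 16):
    """Verify the tag first; return (iv, ciphertext) only if it is valid."""
    if len(blob) < iv_len + TAG_LEN:
        raise ValueError("blob too short")
    body, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    # Constant-time comparison; nothing is decrypted unless this passes.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return body[:iv_len], body[iv_len:]
```

Because the tag is checked before any decryption or padding removal happens, a tampered block is rejected without the decryption code ever touching it, which is what closes off the padding oracle.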

Third: GCM. You should indeed encrypt less than about 64 GiB of data under a given key-nonce pair. However, you could assign ascending nonces to your data chunks (of a few KB each) and apply authenticated encryption to each chunk under a different nonce. This way you stay far below the limit, and you generate your nonces deterministically (by incrementing a counter), which also quite reliably protects you against nonce reuse caused by nonce collisions. Upon retrieval of the data you can then check whether all chunks are valid and whether a chunk is missing (because there's a gap in the nonces). You may also want to encode the number of chunks into each chunk to allow proper verification.
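A rough sketch of the per-chunk idea, assuming the third-party `cryptography` package and, importantly, a fresh key for every file (otherwise chunk 0 of two different files would share a key-nonce pair). The helper names and the choice to carry the chunk count in the associated data rather than in the chunk body are my own assumptions.

```python
import struct

NONCE_LEN = 12  # 96-bit nonce, the size GCM is designed around


def chunk_nonce(index: int) -> bytes:
    """Deterministic nonce: the chunk index as a 96-bit big-endian counter."""
    return index.to_bytes(NONCE_LEN, "big")


def chunk_aad(index: int, total: int) -> bytes:
    """Associated data binding the chunk position and the total chunk count."""
    return struct.pack(">QQ", index, total)


def encrypt_chunk(key: bytes, index: int, total: int, plaintext: bytes) -> bytes:
    # Third-party dependency (pip install cryptography), imported lazily so
    # the helpers above stay usable without it.
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Returns ciphertext with the 16-byte GCM tag appended.
    return AESGCM(key).encrypt(chunk_nonce(index), plaintext, chunk_aad(index, total))


def decrypt_chunk(key: bytes, index: int, total: int, ciphertext: bytes) -> bytes:
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Raises cryptography.exceptions.InvalidTag if the chunk was modified,
    # moved to another position, or the claimed total is wrong.
    return AESGCM(key).decrypt(chunk_nonce(index), ciphertext, chunk_aad(index, total))
```

Each chunk is independent, so chunks can be encrypted and uploaded in parallel. Re-uploading a failed part simply re-encrypts the same plaintext under the same nonce, which produces the same ciphertext and so reveals nothing new.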

SEJPM

You can use CCM or EAX mode. EAX mode is not standardized by NIST, but it uses AES + AES-CMAC as its underlying primitives, implementing a secure authenticated mode. You could use AES-CMAC (or EAX mode without any ciphertext and just Additional Authenticated Data) to calculate an authentication tag over the authentication tags of the blocks. The advantage of EAX mode is that it doesn't really have the tricky requirements that GCM has.
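As an illustration of the tag-over-tags idea, here is a small sketch using AES-CMAC from the third-party `cryptography` package. It assumes 16-byte per-block tags and a dedicated `mac_key`; the function name `file_tag` and the decision to prefix the block count are assumptions of mine.

```python
TAG_LEN = 16  # size of each per-block authentication tag in bytes


def file_tag(mac_key: bytes, block_tags: list) -> bytes:
    """AES-CMAC over the block count followed by all block tags, in order."""
    if any(len(tag) != TAG_LEN for tag in block_tags):
        raise ValueError("every block tag must be 16 bytes")

    # Third-party dependency (pip install cryptography), imported lazily.
    from cryptography.hazmat.primitives import cmac
    from cryptography.hazmat.primitives.ciphers import algorithms

    mac = cmac.CMAC(algorithms.AES(mac_key))
    # Prefixing the count makes a truncated tag list produce a different input.
    mac.update(len(block_tags).to_bytes(8, "big"))
    for tag in block_tags:
        mac.update(tag)
    return mac.finalize()
```

The resulting 16-byte value can be stored next to the file, so reordering, dropping, or replacing any block changes the input to the CMAC and the final check fails.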

GCM may be faster, though, if implemented correctly. CCM is also an option (packet-based encryption, where you define the packets, kind of). However, CCM is also not that easy to use (actually, I'd go a bit further than that and call it a major PITA).

Maarten Bodewes