10

I have to encrypt big files. Their sizes range from 500 MB to several gigabytes.

I would like to use AES/GCM/NoPadding as provided by Java 1.8 since that gives me automatic authentication and encryption.

I would like to use the handy CipherInputStream/CipherOutputStream classes because I can chain them with the GZIP input/output streams to compress the data a little before encryption.

However, I have read that the Java implementation appends the authentication tag at the end of the stream. That means that for a long file, if I were using CipherInputStream to decrypt it, it won't be able to tell whether the contents have been tampered with until it reaches the end of the stream, right?

If that is the case, wouldn't it be problematic to use that mode of operation for what I'm trying to accomplish? Since the file can't be decrypted in memory, it will have to be decrypted to somewhere in the filesystem before the failure is detected, leaving some time for an attacker to see the plaintext.

Is this a potential threat and a real concern that I should be worrying about, or is there something about the algorithm or the cipher streams that I'm missing and that prevents this from occurring?

alejo

3 Answers

12

There is nothing in the GCM cipher that prevents its use in streaming mode. You should, however, not use the plaintext that results from decryption for anything that requires security before you have verified the authentication tag.

The authentication tag is not to prevent you from decrypting the ciphertext. It is there to provide for integrity and authenticity. You should never decrypt where an attacker can see the plaintext. If possible, you should even try and make it hard for an attacker to perform side channel analysis.
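To make the role of the tag concrete, here is a minimal sketch using the standard `javax.crypto` API. The class and method names (`GcmTagDemo`, `seal`, `open`) are made up for illustration. It shows that `doFinal` appends the 16-byte tag to the ciphertext on encryption, and that on decryption a modified ciphertext is only rejected when `doFinal` is called.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

// Hypothetical demo class; the names are illustrative, not part of any library.
public final class GcmTagDemo {
    private static final int TAG_BITS = 128;

    // Encrypts and returns ciphertext || tag (the tag is the last 16 bytes).
    static byte[] seal(byte[] key, byte[] iv, byte[] plain) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        return c.doFinal(plain);
    }

    // Decrypts; doFinal throws AEADBadTagException if the data or tag was modified.
    static byte[] open(byte[] key, byte[] iv, byte[] sealed) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new GCMParameterSpec(TAG_BITS, iv));
        return c.doFinal(sealed);
    }
}
```

The caller must never reuse a key and IV pair for `seal`; the fixed values in a test are for demonstration only.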

Note that GCM is limited to encrypting about 68 GB ($2^{39} - 256$ bits) of data for a single IV. The number of invocations is limited to $2^{32}$, but you are well advised to stay far away from those limits. Note that repeating the IV for two separate encryption invocations with the same key is catastrophic for GCM.
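As a quick check of the per-IV size limit in bytes, the arithmetic can be written down as follows. The class and method names (`GcmLimits`, `fitsSingleInvocation`) are hypothetical and only serve the illustration.

```java
// Hypothetical helper; it only encodes the 2^39 - 256 bit limit per invocation.
public final class GcmLimits {
    /** Maximum plaintext size for one key and IV pair, in bits. */
    public static final long MAX_PLAINTEXT_BITS = (1L << 39) - 256;

    /** The same limit in bytes: 2^36 - 32 = 68,719,476,704. */
    public static final long MAX_PLAINTEXT_BYTES = MAX_PLAINTEXT_BITS / 8;

    /** Returns true if a payload of the given size fits in a single GCM invocation. */
    public static boolean fitsSingleInvocation(long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= MAX_PLAINTEXT_BYTES;
    }
}
```

Files of a few gigabytes are well below this bound, but a check like this is cheap insurance if input sizes are not under your control.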


CipherInputStream in general is horrible. I would suggest programming this yourself using Cipher directly, together with memory-mapped files and ByteBuffer. The Java implementation (where the tag is automatically put at the end) combined with CipherInputStream makes for a horrible buffering mess.
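A rough sketch of the encryption side using `Cipher` with `ByteBuffer` directly is given below. It is an assumption-laden illustration: the class name `GcmFileEncryptor` and the 64 KiB buffer size are made up, and for simplicity it uses ordinary `FileChannel` reads rather than memory mapping. The caller is assumed to supply a fresh 12-byte IV for every file and to store it alongside the output.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.GeneralSecurityException;

// Hypothetical helper; encrypts src to dst as ciphertext || tag without CipherOutputStream.
public final class GcmFileEncryptor {
    public static void encrypt(Path src, Path dst, SecretKey key, byte[] iv)
            throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            ByteBuffer plain = ByteBuffer.allocate(64 * 1024);
            // getOutputSize accounts for buffered bytes and the tag.
            ByteBuffer sealed = ByteBuffer.allocate(cipher.getOutputSize(plain.capacity()));
            while (in.read(plain) != -1) {
                plain.flip();
                cipher.update(plain, sealed);
                drain(sealed, out);
                plain.clear();
            }
            plain.flip(); // no input left; doFinal emits any buffered bytes plus the tag
            cipher.doFinal(plain, sealed);
            drain(sealed, out);
        }
    }

    // Writes the buffer's content to the channel and resets it for reuse.
    private static void drain(ByteBuffer buf, FileChannel out) throws IOException {
        buf.flip();
        while (buf.hasRemaining()) {
            out.write(buf);
        }
        buf.clear();
    }
}
```

Decryption would mirror this structure, with the same caveat as before: nothing derived from the plaintext should be trusted until `doFinal` has verified the tag.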

I'm rewriting the Bouncy Castle implementation and I see a reduction in code and complexity of about 30% when I separate the tag from the decryption. It also makes it possible to decrypt each byte separately. In other words, it restores the online properties of the underlying CTR cipher.

With Java 8 however you may want to stick to the Java 8 implementation as GCM may be sped up using intrinsics (for the server VM on the latest Intel processors). Note that according to archie below this functionality is not yet present.

Maarten Bodewes
1

This is a common problem which most software doesn't handle, surprisingly. Yes, decrypting to the filesystem is problematic as you guessed.

A better approach would be to split the file into "packets" and encrypt/authenticate each packet separately. However, you will have to take care of many details. For example: you can write only the first IV and implicitly increment it for the following packets; you will need to write the tag for each packet; and you will need to mark the last packet somehow, e.g. using an additional data field, in order to prevent an attacker from extending or truncating the ciphertext. There may be other steps I'm not aware of. Sadly, I don't know of any common or standard way to do this that could guide implementors.
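The packet scheme described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a standard or reviewed format: the class name `ChunkedGcm`, the framing (a 1-byte final flag and a 4-byte length before each packet), and the IV layout (an 8-byte random per-file prefix followed by a 4-byte packet counter) are all choices made for this example. The final flag is bound to each packet as additional authenticated data, so flipping it, reordering packets, or cutting the file short should all cause decryption to fail.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.*;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical packet format: prefix(8) then repeated [last(1) | length(4) | ciphertext+tag].
public final class ChunkedGcm {
    private static final int PREFIX_LEN = 8, TAG_LEN = 16;

    // Builds a cipher for one packet: IV = prefix || counter, AAD = final-packet flag.
    private static Cipher cipher(int mode, SecretKey key, byte[] prefix, int counter, boolean last)
            throws GeneralSecurityException {
        byte[] iv = ByteBuffer.allocate(12).put(prefix).putInt(counter).array();
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, key, new GCMParameterSpec(TAG_LEN * 8, iv));
        c.updateAAD(new byte[] { (byte) (last ? 1 : 0) });
        return c;
    }

    // Reads until the buffer is full or the stream ends; returns the number of bytes read.
    private static int fill(InputStream in, byte[] buf) throws IOException {
        int n = 0, r;
        while (n < buf.length && (r = in.read(buf, n, buf.length - n)) != -1) n += r;
        return n;
    }

    public static void encrypt(InputStream in, OutputStream rawOut, SecretKey key, int chunkSize)
            throws IOException, GeneralSecurityException {
        DataOutputStream out = new DataOutputStream(rawOut);
        byte[] prefix = new byte[PREFIX_LEN];
        new SecureRandom().nextBytes(prefix);
        out.write(prefix);
        byte[] cur = new byte[chunkSize], next = new byte[chunkSize];
        int curLen = fill(in, cur);
        // incrementExact throws instead of silently wrapping the counter (and reusing an IV).
        for (int counter = 0; ; counter = Math.incrementExact(counter)) {
            int nextLen = fill(in, next); // read ahead to learn whether cur is the last packet
            boolean last = nextLen == 0;
            byte[] sealed = cipher(Cipher.ENCRYPT_MODE, key, prefix, counter, last)
                    .doFinal(cur, 0, curLen);
            out.writeBoolean(last);
            out.writeInt(sealed.length);
            out.write(sealed);
            if (last) break;
            byte[] tmp = cur; cur = next; next = tmp;
            curLen = nextLen;
        }
        out.flush();
    }

    public static void decrypt(InputStream rawIn, OutputStream out, SecretKey key, int maxChunk)
            throws IOException, GeneralSecurityException {
        DataInputStream in = new DataInputStream(rawIn);
        byte[] prefix = new byte[PREFIX_LEN];
        in.readFully(prefix);
        boolean last = false;
        for (int counter = 0; !last; counter = Math.incrementExact(counter)) {
            last = in.readBoolean(); // EOFException here means the final packet is missing
            int len = in.readInt();
            if (len < TAG_LEN || len > maxChunk + TAG_LEN) throw new IOException("bad packet length");
            byte[] sealed = new byte[len];
            in.readFully(sealed);
            // Plaintext is written only after this packet's tag has been verified.
            out.write(cipher(Cipher.DECRYPT_MODE, key, prefix, counter, last).doFinal(sealed));
        }
        if (in.read() != -1) throw new IOException("trailing data after final packet");
    }
}
```

With this layout each packet is verified before its plaintext is released, so only one packet needs to be held in memory at a time. Truncation of the whole file is still only detected once the end is reached, so the consumer should treat the output as incomplete until `decrypt` returns normally.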

Conrado
0

Very late to answer this, but a simple solution is to just use cipher.doFinal. I tested this and it is very fast: I was able to decrypt 1 GB in just 30 seconds.

    Cipher cipher = Cipher.getInstance(ENCRYPT_ALGO);
    cipher.init(Cipher.DECRYPT_MODE, this.secretKey, new GCMParameterSpec(TAG_LENGTH_BIT, iv));
    // Reads the whole file into memory, then decrypts and verifies the tag in one call.
    byte[] plainText = cipher.doFinal(fileIn.readAllBytes());

Express