14

I have implemented a simple encryption/decryption program based on AES-256 in CBC mode [1].

Actually, it is more precise to describe it as a compression+encryption / decryption+decompression program.

If one provides the wrong key to the decryption+decompression function, the decompression stage will explicitly fail, as expected, since the decrypted content (being based on the wrong key) will be just noise.

I would like to modify the decryption phase of the scheme so that it detects on its own that the wrong key has been used, before it proceeds to the decompression phase. I am looking for a scheme to support this functionality that does not weaken the overall cryptographic strength of the framework.

A naive approach would be to generate the encrypted content as [2]

AES256_CBC(key, iv, SENTINEL_STRING + plaintext)

where SENTINEL_STRING is a string that the decryption phase can know in advance. While I am being naive about it, I could make the SENTINEL_STRING be equal to the key, for example.

I imagine there are pretty standard ways to solve this problem. In fact, for all I know, the design of AES-256-CBC already has built into it a way to check that decryption succeeded.

I hope someone can enlighten me on these matters.

Also, if this problem is common enough to have a generally accepted name (suitable as search engine fodder) please let me know.


[1] For what it's worth, my current implementation of this uses Python's pycrypto module, but an earlier implementation used Perl's Crypt::CBC package. Each version can decrypt+decompress files compressed+encrypted by the other. I mention all this to stress that this question is primarily about AES-256-CBC in general, and not about any specific implementation of it.

[2] I hope my ad-hoc notation here is not too stupid. I mean it as a shorthand for "encrypt the string SENTINEL_STRING + plaintext with AES-256 in CBC mode, using the key key and initialization vector iv".

kjo

5 Answers

34

You should use an authenticated encryption mode. There are several reasons for that, but one (relatively minor) reason is that it automatically gives you the ability to detect incorrect keys, since the authentication will fail.

If you insist on using a traditional non-authenticated encryption mode, or if you'd like to have some way of distinguishing "incorrect key" from "corrupted ciphertext", you can include a key check value together with the ciphertext, and verify it before attempting decryption. There are several possible ways to implement one.

A traditional method is to encrypt an all-zero block using the raw block cipher (i.e. "ECB mode") and use the resulting ciphertext as the key check value, but note that this makes it possible for an attacker to tell whether two messages have been encrypted using the same key by comparing the key check values. Alternatively, assuming that you're using an authenticated encryption mode anyway, you could just generate the key check value by encrypting an empty message using the same mode. Assuming that you're using a unique nonce / IV for each encryption, as you should be, this should eliminate the information leak by making every key check value also unique.
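As a rough illustration of a per-message key check value, here is a minimal sketch that uses only Python's standard library. Note the substitution: instead of the block cipher or authenticated mode described above, it computes the check value with HMAC-SHA256 over a fixed label plus the message's nonce. The names key_check_value and key_matches and the label b"key-check" are made up for this example, and a real design would probably derive a separate key for the check rather than reuse the encryption key directly.

```python
import hashlib
import hmac
import os

def key_check_value(key: bytes, nonce: bytes) -> bytes:
    """Per-message check value: truncated HMAC-SHA256 over a label plus the nonce."""
    return hmac.new(key, b"key-check" + nonce, hashlib.sha256).digest()[:16]

def key_matches(key: bytes, nonce: bytes, stored_kcv: bytes) -> bool:
    """Recompute the check value and compare it in constant time."""
    return hmac.compare_digest(key_check_value(key, nonce), stored_kcv)

key = os.urandom(32)     # stands in for the AES-256 key
nonce = os.urandom(16)   # unique per message, stored with the ciphertext
kcv = key_check_value(key, nonce)   # stored next to the ciphertext

print(key_matches(key, nonce, kcv))             # check with the correct key
print(key_matches(os.urandom(32), nonce, kcv))  # check with an unrelated key
```

Because the nonce is mixed in, two messages encrypted under the same key get different check values, which is the property the paragraph above asks for.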


(BTW, note that compressing data before encryption can leak (extra) information about the plaintext. Basically, this happens because all general-purpose encryption schemes necessarily leak information about the length of the plaintext, and compression makes the plaintext length depend on its content. Padding reduces this leak slightly, but does not eliminate it. This has been used in actual attacks.)
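The length dependence is easy to see with a small sketch (standard library only; the two sample inputs are arbitrary choices for illustration). Both inputs are 1000 bytes long, yet their compressed sizes, and hence the ciphertext sizes an observer would see, differ widely.

```python
import os
import zlib

repetitive = b"A" * 1000         # highly compressible content
random_like = os.urandom(1000)   # essentially incompressible content

len_repetitive = len(zlib.compress(repetitive))
len_random = len(zlib.compress(random_like))

# Same plaintext length, very different compressed lengths.
print(len_repetitive, len_random)
```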

Ilmari Karonen
3

Ilmari Karonen already mentioned using an authenticated encryption mode, which would solve the concerns, but should you not go that way, please note the flaw in your premise:

"If one provides the wrong key to the decryption+decompression function, the decompression stage will explicitly fail, as expected, since the decrypted content (being based on the wrong key) will be just noise."

Pretty much any wrong key or change in the ciphertext will decrypt to something that is not valid input for the decompression algorithm you used, which will detect it (and hopefully your program will notice that error). But that isn't guaranteed.

A trivial solution would be to include an HMAC/hash of the plaintext/compressed content, which would allow you to validate that you decrypted the right content.
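A minimal sketch of that idea, using only Python's standard library, might look as follows. It covers just the compression and HMAC steps: the output of seal is what would be handed to the cipher, and unseal takes the cipher's decrypted output. The names seal, unseal and mac_key are hypothetical, and mac_key is assumed to be a key kept separate from the encryption key.

```python
import hashlib
import hmac
import zlib

TAG_LEN = 32  # SHA-256 output size in bytes

def seal(mac_key: bytes, plaintext: bytes) -> bytes:
    """Compress, then prepend an HMAC-SHA256 tag over the compressed bytes."""
    compressed = zlib.compress(plaintext)
    tag = hmac.new(mac_key, compressed, hashlib.sha256).digest()
    return tag + compressed  # this is what would be encrypted

def unseal(mac_key: bytes, decrypted: bytes) -> bytes:
    """Verify the tag first; decompress only if it matches."""
    tag, compressed = decrypted[:TAG_LEN], decrypted[TAG_LEN:]
    expected = hmac.new(mac_key, compressed, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted/truncated data")
    return zlib.decompress(compressed)
```

Since the tag covers the whole compressed payload, a truncated file fails verification as well, before the decompressor ever sees it.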

This is even more important taking into account that your use case is long-term storage. I wouldn't be surprised if it were possible to truncate such a file at just the right boundary so that it goes unnoticed not only by AES-CBC but also by the underlying compression function. Just adding a key check value won't detect that issue.

Since you are aiming for storage, it probably won't matter for you, but note that if this were used in, e.g., an online protocol, detecting decompression errors would provide an oracle to an attacker who sends your decrypter multiple corrupted blocks, based on differences in the time needed to process them.

Ángel
3

This question specifically asks about AES-256-CBC, so this answer shows how to determine programmatically whether or not the correct key is provided for decrypting ciphertext generated by AES-256-CBC. It turns out that with a little knowledge of the padding used during encryption, this is possible by focusing on the last block of ciphertext. openssl can be used to do the heavy lifting.

To make things a little more interesting, meet Paul. Paul used an encryption program to encrypt his bitcoin address information (including his private key!), using AES-256-CBC. The program uses a very simple (and very weak) key derivation function to derive a key and an iv from a password provided by the user, based on just a single round of SHA384 hashing of the password. The first 256 bits of the SHA384 output is the key, the trailing 128 bits of the SHA384 output is the iv. The encryption program uses PKCS#7 padding. Paul has the file containing the ciphertext, but he doesn’t remember the password that he used to encrypt the plaintext. However, he thinks the password was a date in the form mmddyyyy, because he creates all of his passwords this way.
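For reference, the key derivation described above can be sketched in a few lines of Python. The function name derive_key_iv is made up for this illustration, and the password is assumed to be plain ASCII, as the mmddyyyy format implies.

```python
import hashlib
from typing import Tuple

def derive_key_iv(password: str) -> Tuple[bytes, bytes]:
    """Single round of SHA384: first 32 bytes are the key, last 16 the iv."""
    digest = hashlib.sha384(password.encode("ascii")).digest()  # 48 bytes
    return digest[:32], digest[32:]

key, iv = derive_key_iv("03261985")
print(key.hex())
print(iv.hex())
```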

Paul runs the ciphertext file through xxd to see the underlying ciphertext bytes:

xxd -c 16 bitcoin.enc 

This produces:

00000000: cb2e 9d66 38c4 8dd7 344b 04cd d4ab 7023  ...f8...4K....p#
00000010: b5ff ae4c 6a76 388c 5c80 2e56 12b3 b482  ...Ljv8.\..V....
00000020: 2442 ae3e 29a7 9f17 3bb3 95fc bfac bec8  $B.>)...;.......
00000030: 79ad d118 dac9 685b 1e49 74b6 9b9c 2d16  y.....h[.It...-.
00000040: cef9 faf1 17e0 7829 d5eb c966 bdb6 6500  ......x)...f..e.
00000050: 40b2 b89f d1b0 1b96 2107 2b79 9e9e 2b56  @.......!.+y..+V
00000060: 3dd8 6294 09c6 6637 fbe8 268c db64 d9a0  =.b...f7..&..d..
00000070: 38a4 2700 1e2f 724c c015 c778 2413 274e  8.'../rL...x$.'N
00000080: 2a3a 38da 2b0c 0d83 45c5 72dd 70bc f52d  *:8.+...E.r.p..-
00000090: fb4a 19be fce9 99e6 2079 ffb7 61f3 0740  .J...... y..a..@
000000a0: 3fef aca0 2602 a51d 0652 d4f7 3a8f 6068  ?...&....R..:.`h
000000b0: b37d ef35 e35f 455d 1cc6 c7d2 a33e 1e3d  .}.5._E].....>.=
000000c0: 4633 73f4 44fb 4ae3 4e3a 3972 7b5f 3f50  F3s.D.J.N:9r{_?P
000000d0: b1c5 05b2 912d 6971 0a12 2646 9afa b6ec  .....-iq..&F....
000000e0: c1a1 9216 67ba 4922 8408 8cfc 7642 79c1  ....g.I"....vBy.
000000f0: 02ea 6450 44e2 898d f486 1ce3 182d b475  ..dPD........-.u
00000100: 617a d397 a264 d850 a1e2 2bae e0d5 ad98  az...d.P..+.....
00000110: 6c7e e875 db83 59d3 141f 0791 5a26 af27  l~.u..Y.....Z&.'
00000120: 3c83 e455 47ba e1f8 66fa bb65 32a6 ddca  <..UG...f..e2...
00000130: d564 1b9a 7d9b 7e3f 1e22 a399 f573 a7ef  .d..}.~?."...s..
00000140: 4645 160c cbe6 4bfb e0d8 cb18 c0f4 7a73  FE....K.......zs
00000150: 60cf 5e5c 03ff 6365 1c61 11d7 db01 c79e  `.^\..ce.a......
00000160: c109 e9c6 7298 67d1 7a2a cb83 98e4 e1e8  ....r.g.z*......
00000170: ec86 1ea7 c5dd d520 a9c8 e213 71ec a2a0  ....... ....q...
00000180: 3b23 64d1 d04a 35c8 081b bc6f deac bd86  ;#d..J5....o....
00000190: 5307 f7af ffa3 798f 386e 7c6c 144c 6a9c  S.....y.8n|l.Lj.

The output above is formatted with 16 bytes in each row, so that each row of 16 bytes represents one block of ciphertext.

See this diagram at Wikipedia, which shows how AES-CBC chaining works. In the decryption process, producing each block of plaintext requires the ciphertext for that block as well as the ciphertext of the previous block. For the first block, there is no previous block of ciphertext, so the iv is used instead.

Now, in Paul’s case, consider the inputs to the last block of the decryption process. The ciphertext for the last block is 5307f7afffa3798f386e7c6c144c6a9c, and the ciphertext for the previous block is 3b2364d1d04a35c8081bbc6fdeacbd86. This is equivalent to decrypting one block of ciphertext 5307f7afffa3798f386e7c6c144c6a9c, using an iv of 3b2364d1d04a35c8081bbc6fdeacbd86.
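Picking out those two inputs from the ciphertext can be sketched as below. The helper name last_block_inputs is hypothetical. It returns the block to use as the iv and the block to decrypt, and falls back to the real iv when the ciphertext is only one block long.

```python
BLOCK = 16  # AES block size in bytes

def last_block_inputs(ciphertext: bytes, iv: bytes):
    """Return (block_used_as_iv, last_block) for decrypting only the last block."""
    if len(ciphertext) == 0 or len(ciphertext) % BLOCK != 0:
        raise ValueError("ciphertext length must be a positive multiple of 16")
    last = ciphertext[-BLOCK:]
    if len(ciphertext) >= 2 * BLOCK:
        prev = ciphertext[-2 * BLOCK:-BLOCK]  # second to last block
    else:
        prev = iv  # single-block message: the real iv is the "previous block"
    return prev, last
```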

Now, consider how PKCS#7 padding works. AES requires blocks of 16 bytes in length. If the last block of plaintext is less than 16 bytes, bytes are appended to make the length of this block 16 bytes, where the value of the appended bytes is the number of bytes appended (e.g. if 5 bytes are appended, the value of these bytes is 0x05). If the last block of plaintext is 16 bytes, then an entire block of 16 bytes is appended, where the value of these bytes is 0x10 (0x10 is hexadecimal for 16). So, a computer program can easily evaluate the last block of plaintext to determine whether or not the trailing bytes in this block comply with the PKCS#7 standard.
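Such a padding check might be sketched like this in Python. The function name pkcs7_pad_length is hypothetical. It returns the number of padding bytes if the block ends in valid PKCS#7 padding, and 0 otherwise.

```python
def pkcs7_pad_length(block: bytes, block_size: int = 16) -> int:
    """Return the PKCS#7 padding length of a final block, or 0 if invalid."""
    if len(block) != block_size:
        return 0
    n = block[-1]                    # the last byte announces the padding length
    if n < 1 or n > block_size:
        return 0
    if block[-n:] != bytes([n]) * n:  # all n trailing bytes must equal n
        return 0
    return n

# A block ending in thirteen 0x0d bytes is valid padding of length 13.
print(pkcs7_pad_length(bytes.fromhex("0a0a0a" + "0d" * 13)))  # 13
# A block whose last byte is 0xbe (190) cannot be valid padding.
print(pkcs7_pad_length(bytes.fromhex("7926e22dac6241dad1339f40346638be")))  # 0
```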

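Knowing this, Paul can easily test whether a key is plausible, by using it to decrypt the last block of ciphertext, using the second to last block as the iv, and checking if the plaintext produced contains valid PKCS#7 padding. (A wrong key passes this check by chance roughly once in 256 tries, namely when the last decrypted byte happens to be 0x01, so a match with very short padding should be confirmed by decrypting the whole file.)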

Paul thinks his password might have been his birthday: ‘03261985’. He runs this through the SHA384 key derivation function:

echo -n '03261985' | sha384sum

This produces:

dba50aff3f87d7d41429f9b59380ac539cc62a89adfdefcd5157015e0e768382a27e591a544e7b824ab002b502fb44fa

The first 32 bytes are the key, so the key is dba50aff3f87d7d41429f9b59380ac539cc62a89adfdefcd5157015e0e768382.

Using openssl, Paul tries decrypting the last block of ciphertext, with this key, using the second to last block of ciphertext as the iv. He runs the plaintext output of the openssl command through xxd so that he can see the plaintext bytes:

echo -n '5307f7afffa3798f386e7c6c144c6a9c' | xxd -p -r | openssl aes-256-cbc -d -nopad -K dba50aff3f87d7d41429f9b59380ac539cc62a89adfdefcd5157015e0e768382 -iv 3b2364d1d04a35c8081bbc6fdeacbd86 | xxd -c 16

This produces:

00000000: 7926 e22d ac62 41da d133 9f40 3466 38be  y&.-.bA..3.@4f8.

Clearly, the trailing bytes are not PKCS#7 padding. No luck.

Paul tries his wife’s birthday. No dice. He tries each of his three kids’ birthdays. Still, no love.

Finally, Paul decides to write a program to crack his own password. His program loops through each date since Jan 1, 1800 to present. For each date, his program applies the above process. When his program reaches ‘07072014’, it hits paydirt!
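The source does not show Paul's program, so the following is only a sketch of how such a loop could look. The names date_passwords and find_password are made up, and the actual test (derive the key, decrypt the last block, check the padding) is passed in as a callable, is_correct, because the AES step needs a crypto library or openssl.

```python
from datetime import date, timedelta
from typing import Callable, Iterator, Optional

def date_passwords(start: date, end: date) -> Iterator[str]:
    """Yield every date from start to end (inclusive) formatted as mmddyyyy."""
    d = start
    while d <= end:
        yield f"{d.month:02d}{d.day:02d}{d.year:04d}"
        d += timedelta(days=1)

def find_password(is_correct: Callable[[str], bool],
                  start: date, end: date) -> Optional[str]:
    """Return the first candidate accepted by is_correct, or None."""
    for candidate in date_passwords(start, end):
        if is_correct(candidate):
            return candidate
    return None

# Example with a stand-in predicate instead of the real padding check:
print(find_password(lambda p: p == "07072014", date(1800, 1, 1), date(2020, 1, 1)))
```

Since a padding check alone can accept a wrong key by chance, a real search would confirm any hit by decrypting the full file.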

To be sure, Paul verifies this using the process above:

echo -n '07072014' | sha384sum 

produces:

3985f3b3a10bc487988629a0533750d44898c1bf18a9ffe4e92cc27e21b33a7dd204d2f29a1f23e9737b39c4b02397d4

The first 32 bytes are the key: 3985f3b3a10bc487988629a0533750d44898c1bf18a9ffe4e92cc27e21b33a7d.

Again, decrypting the last block, using this key, and the ciphertext of the second to last block as the iv:

echo -n '5307f7afffa3798f386e7c6c144c6a9c' | xxd -p -r | openssl aes-256-cbc -d -nopad -K 3985f3b3a10bc487988629a0533750d44898c1bf18a9ffe4e92cc27e21b33a7d -iv 3b2364d1d04a35c8081bbc6fdeacbd86 | xxd -c 16

produces:

00000000: 0a0a 0a0d 0d0d 0d0d 0d0d 0d0d 0d0d 0d0d  ................

The trailing 13 bytes are 0x0d, and 0x0d is hexadecimal for 13. So, that’s PKCS#7 padding. Indeed, this must be the correct key! It immediately dawns on Paul that 07072014 is his dog’s birthday. He kicks himself for not thinking of this sooner!

Now that Paul knows his password, he can decrypt the ciphertext file. The iv is the trailing 16 bytes of the SHA384 output above, so the iv is d204d2f29a1f23e9737b39c4b02397d4.

openssl aes-256-cbc -d -K 3985f3b3a10bc487988629a0533750d44898c1bf18a9ffe4e92cc27e21b33a7d -iv d204d2f29a1f23e9737b39c4b02397d4 -in bitcoin.enc

This produces:

bitcoin address info
--------------------
private key: 61a794c172e53593c6aba712c6732ffe9de89ebd86fcb2e4102cd1ce5cf2608
public key: 1c48274b9431e5971ef1be633e71e4108d5d601dc4f2ba1653816b965c401f0,ec39ca6cb0ee8cd6fca703e13f2ac257444cc90c04061efbe5b7130a66d95f0
public key compressed: 021c48274b9431e5971ef1be633e71e4108d5d601dc4f2ba1653816b965c401f02
bitcoinaddress: 14iY4jPDTujMFYVTV7dbFFdf3e6iofSLM8

Sadly, after all of that work, Paul has no bitcoins. But, Paul lives happily ever after anyway.

mti2935
1

There is no way to detect the wrong key before decrypting: I can create a message M and a totally valid and legitimate key K, and then encrypt the message with a different key K'. Any information that I give you about the key K will be totally legitimate and fine; it just won't be about the key I used to encrypt the message.

You can try to include additional information, but whatever you ask for, I can supply you with information that would be correct with the key K, and then use the key K'.

I can even start encrypting using K, encrypt 90% of the message that way, and use K' for the remaining 10%. So there is no way to detect an incorrect key other than decrypting the complete message, and checking that the decrypted message is valid.

gnasher729
0

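I'm no crypto expert, but I remember that PGP added a random number (32-bit, I think) twice at the start of the data to encrypt. So when decrypting you can check whether the first two 32-bit values are identical (without knowing the actual values, which makes known-plaintext attacks (more?) difficult). Maybe you could try that, too. (For the record, the quick check specified for OpenPGP in RFC 4880 is a block of random bytes whose last two bytes are repeated, so it is a 16-bit check.)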
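A sketch of the scheme as described in this answer (a random 32-bit value written twice in front of the plaintext) could look like this. The function names and the 4-byte length are assumptions for illustration, and this is not a reimplementation of PGP. The prefix is added before encryption and checked right after decryption.

```python
import os
from typing import Optional

CHECK_LEN = 4  # 32 bits, as in the description above

def add_quick_check(plaintext: bytes) -> bytes:
    """Prefix the plaintext with a random value repeated twice."""
    r = os.urandom(CHECK_LEN)
    return r + r + plaintext  # this is what would be encrypted

def strip_quick_check(decrypted: bytes) -> Optional[bytes]:
    """Return the payload if the two prefix copies match, else None."""
    if len(decrypted) < 2 * CHECK_LEN:
        return None
    first = decrypted[:CHECK_LEN]
    second = decrypted[CHECK_LEN:2 * CHECK_LEN]
    if first != second:
        return None  # most likely the wrong key
    return decrypted[2 * CHECK_LEN:]
```

With a 32-bit value, a wrong key passes this check by chance about once in 2^32 attempts. The check does not detect corruption or truncation later in the file.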

I know the answer is much too late, but I found the question when looking for something else.

U. Windl