
To start, I'm by no means an expert in cryptography, or anything near that. I know only the very basics: enough to more or less choose a method to implement and then read about it so I know what I'm implementing. So please excuse any supposedly dumb questions haha.

With that in mind, I've created an AES-256/CBC/PKCS#7 + HMAC-SHA512 encryption/decryption class in an Android assistant app I'm making (it's supposed to store (very?) personal things locally, and I might publish it on the Play Store, so it's not only for me). That combination is supposed to be highly secure for the next ??(?) years. It might be a bit slower, but I don't mind, at least for now (very little data). However, I also read that there's one problem with it, which arises when the initialization vector gets reused. With CBC, it seems that IV reuse doesn't make it possible to get the data back (right?), as it does with other modes, and that's why I chose this one. [If there are any other problems with this method, I'm happy to know about them.]

But after some time it is possible to detect patterns and see where messages are equal (equal blocks of 16 bytes). Someone who knows what such a block means could tell, for example, that it hasn't changed across several encrypted versions.

So I had an idea: all the data I encrypt must be encoded using UTF-7. The remaining byte values (128-255) are then used as random values, with one inserted into every 16 bytes at a random position. For example, a 154 byte is added at index 4 and a 234 byte is added at index 19. This way the data is always randomized, and blocks that are equal in the plaintext will come out different even if the same IV is reused ("random" can repeat values, and I can't check whether I've already used an IV in this case, so I thought of this to prevent problems).

Is this a good approach? Might it mitigate the problem? Might it even solve it completely, so that the method would be completely safe, or at least much safer, for the foreseeable future?

Also, if anything I said is wrong, I'm happy to be corrected! Thanks!

Edw590

1 Answer


The only reason to be afraid of IV reuse is a random number generator that is off.

Assuming a fully random IV, you could encrypt $2^{32}$ blocks with AES-CBC and still have only about a one in $2^{65}$ chance of a collision (by the birthday bound, roughly $n^2 / 2^{129}$ for $n$ blocks of 128 bits). Note that the repeated-input problem isn't limited to the IV; each ciphertext block is used as the "vector" for the next AES block encryption, after all.
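To get a feeling for these numbers, the birthday approximation can be evaluated directly. The following is only a rough sketch: it assumes that every 128-bit input to the block cipher behaves like an independent uniform random value, and the function name `collision_probability` is made up for this illustration.

```python
import math

BLOCK_BITS = 128  # AES block size in bits


def collision_probability(num_blocks: int, bits: int = BLOCK_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_blocks * (num_blocks - 1) / 2 ** (bits + 1)
    # expm1 keeps precision when the exponent is extremely small.
    return -math.expm1(-exponent)


for log2_n in (32, 48, 64):
    p = collision_probability(2 ** log2_n)
    print(f"2^{log2_n} blocks: p ~= {p:.3g}")
```

Under that assumption the probability is about $2^{-65}$ for $2^{32}$ blocks, and it only climbs to about 0.39 once you reach $2^{64}$ blocks.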

Your idea is to randomize the plaintext blocks somewhat to help against IV collisions, which are less frequent than collisions in the output of AES (assuming that messages are larger than one block). But that won't really help: a collision can still take place, and if the attacker knows most of the plaintext, it will be easy to guess what is in the other block.

There is, however, yet another problem. Your scheme needs random data to perform the randomization of the blocks. If you just used that random data to generate a secure random IV, you would likely not run into the problem in the first place.
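Generating such an IV is a one-liner in most environments. As a minimal sketch in Python (the helper name `new_iv` is invented here; on Android the platform counterpart would be `java.security.SecureRandom`):

```python
import secrets

AES_BLOCK_SIZE = 16  # bytes; a CBC IV is exactly one AES block


def new_iv() -> bytes:
    """Return a fresh IV drawn from the OS cryptographic random source."""
    return secrets.token_bytes(AES_BLOCK_SIZE)


iv = new_iv()  # call this once per message; store the IV next to the ciphertext
```

The IV does not need to be secret, only unpredictable and fresh for every encryption.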


If you want to protect your data, you are better off deriving or encapsulating message-specific keys from your "master key".
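One common way to derive such keys is HKDF (RFC 5869). The sketch below is an assumption about how it could look, not a description of your app: it uses HMAC-SHA512, and the names `derive_message_keys`, `message_id` and the `b"msg-keys|"` label are hypothetical.

```python
import hashlib
import hmac

HASH = hashlib.sha512
HASH_LEN = HASH().digest_size  # 64 bytes for SHA-512


def hkdf_extract(salt: bytes, master_key: bytes) -> bytes:
    """HKDF-Extract: condense the input key material into a pseudorandom key."""
    return hmac.new(salt or bytes(HASH_LEN), master_key, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: stretch the pseudorandom key into `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_message_keys(master_key: bytes, message_id: bytes) -> tuple[bytes, bytes]:
    """Derive a 32-byte AES key and a 64-byte HMAC key bound to one message."""
    prk = hkdf_extract(b"", master_key)
    okm = hkdf_expand(prk, b"msg-keys|" + message_id, 32 + 64)
    return okm[:32], okm[32:]
```

With per-message keys, an IV that repeats under two different message identifiers no longer means the same key and IV pair was used twice, provided the identifiers themselves never repeat.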

If your source of randomness is not cryptographically secure, you're probably in trouble anyway. You could take a look at AES-SIV mode to mitigate the problem somewhat. In that mode the IV depends on the plaintext message.

What you should definitely not do is to bugger up your protocol to try and hide the shortcomings of the algorithms. That's unlikely to succeed, and it adds all kinds of unnecessary complexity. AES encryption should not require any tapdancing, if the right constructs are used.


Security notes:

  • To protect against future changes, I would recommend also including the IV in the MAC calculation. Computing the MAC over the IV as well adds only 16 bytes to the calculation, which is relatively insignificant. Currently your IV cannot be changed by an attacker, but a change in protocol could make the IV, and therefore your message, vulnerable to change.
  • I would also like to warn you that even if the data is MAC'ed, you are vulnerable to substitution attacks: replacing the data of one file with that of another. You'd need something verifiable / unique in the header / metadata, and you'd need to MAC it together with your ciphertext (we have AEAD schemes to help with that).
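Both notes can be sketched in a few lines. This is only an illustration under assumptions: the `header` (for example a file name or record ID), the 8-byte length prefixes, and the function names `compute_tag` and `verify_tag` are choices made for this example, not part of your current format.

```python
import hashlib
import hmac
import struct


def compute_tag(mac_key: bytes, header: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA512 over header, IV and ciphertext, each with a length prefix."""
    mac = hmac.new(mac_key, digestmod=hashlib.sha512)
    for field in (header, iv, ciphertext):
        # The length prefix keeps the boundaries between fields unambiguous.
        mac.update(struct.pack(">Q", len(field)))
        mac.update(field)
    return mac.digest()


def verify_tag(mac_key: bytes, header: bytes, iv: bytes,
               ciphertext: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; only decrypt if this returns True."""
    expected = compute_tag(mac_key, header, iv, ciphertext)
    return hmac.compare_digest(expected, tag)
```

Because the header is part of the MAC input, ciphertext copied from one record into another fails verification as long as the two headers differ.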
Maarten Bodewes