Why AEAD instead of encrypting with a simple hash?

Question

I hear a lot about AEAD encryption (GCM, EAX, ...). Why is the following insecure (which it seems to be, since AEAD exists)? Take a block of data, hash it, append the hash to the data, then encrypt the result (data + hash). Intuitively, I would think it is not possible to corrupt the data and then adjust the hash to hide the changes, since both are encrypted.

EDIT: I found something that made me realise the utility of the header in AEAD: it stores information such as the algorithm used. However, I don't need a header in my case, so is it OK to use encrypt(data + hash(data))?

hakoja · Accepted Answer · 2020-06-21T12:18:20.597

So suppose we define our encryption scheme as follows:

$E(K, M) = \operatorname{CTR}(K, M || H(M))$,

where $H$ is a hash function (e.g., SHA2-256), and $\text{CTR}$ is the counter mode-of-operation of some underlying blockcipher (e.g., AES-128). Now suppose we observe the ciphertext $C = C_M || C_T $ of a known message $M$ and want to modify some bits in $C$ so that it decrypts to some other message $M'$. Here $C_M$ denotes the part of the ciphertext which contains the encrypted part of the message itself, while $C_T$ contains the encrypted part of the hash of the message. In more detail:

$C = \overbrace{10000110100011}^{C_M} || \overbrace{1100010}^{C_T}\\ \phantom{C} = \overbrace{00100011110010}^{M} || \overbrace{0010100}^{T = H(M)} \\ \hspace{3.5cm} \oplus \\ \phantom{C =}\ \underbrace{10100101010001 || 1110110}_{\text{CTR keystream}}$
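The toy numbers above can be checked mechanically. The following is a minimal sketch that treats the bit strings from the example as Python strings; the helper name `xor_bits` is made up for illustration.

```python
def xor_bits(a: str, b: str) -> str:
    """XOR two equal-length bit strings such as '0101'."""
    assert len(a) == len(b)
    return "".join("1" if x != y else "0" for x, y in zip(a, b))

# Values copied from the worked example.
M = "00100011110010"                        # message
T = "0010100"                               # toy "hash" of M
keystream = "10100101010001" + "1110110"    # CTR keystream

C = xor_bits(M + T, keystream)
C_M, C_T = C[:len(M)], C[len(M):]
print(C_M, C_T)  # 10000110100011 1100010
```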

For simplicity, suppose we want to create $M'$ by flipping bits 1, 3, and 13 in the original message $M$. First we start by simply flipping bits 1, 3, and 13 in $C_M$. This gives

$C' = \overbrace{\color{red}{0}0\color{red}{1}001101000\color{red}{0}1}^{C_{M'}} || \overbrace{1100010}^{C_T}$

When the $C_{M'}$ part is decrypted, it yields $M'$ due to the properties of the CTR mode of operation:

$\overbrace{\color{red}{0}0\color{red}{1}001101000\color{red}{0}1}^{C_{M'}}\\ \hspace{1.5cm} \oplus \\ 10100101010001 \dots \quad (\text{CTR keystream})\\ \color{red}{1}0\color{red}{0}000111100\color{red}{0}0 \quad = M'$

However, now the hash won't match anymore. So we also need to modify $C_T$ into $C_{T'}$ such that when $C_{T'}$ is decrypted it yields $T' = H(M')$, i.e., the correct hash of our modified message $M'$. But this is easy since we know $M$ and $C_T$: first compute $T' = H(M')$ and suppose $T$ and $T'$ differ in bits, say, 2, 3, and 7, i.e., $T' = H(M') = 0\color{red}{10}010\color{red}{1}$. Now we simply flip bits 2, 3, and 7 in $C_T$ to get $C_{T'}$, and this will decrypt to $T'$. Thus our full ciphertext is:

$C' = C_{M'} || C_{T'}$,

which when decrypted yields:

$C' \oplus \text{CTR keystream} = M' || T' = M' || H(M')$.
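The whole attack can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real implementation of the scheme: the standard library has no AES, so `keystream` below is a stand-in that hashes key, nonce, and a counter with SHA-256. Only the XOR structure of CTR matters for the attack. All function names and the example messages are made up for illustration.

```python
import hashlib
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for a CTR keystream (assumption: NOT AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    """E(K, M) = CTR(K, M || H(M)) with H = SHA-256."""
    plaintext = message + hashlib.sha256(message).digest()
    return xor(plaintext, keystream(key, nonce, len(plaintext)))

def decrypt(key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    plaintext = xor(ciphertext, keystream(key, nonce, len(ciphertext)))
    message, tag = plaintext[:-32], plaintext[-32:]
    if hashlib.sha256(message).digest() != tag:
        raise ValueError("hash mismatch")
    return message

def forge(ciphertext: bytes, known: bytes, wanted: bytes) -> bytes:
    """Attacker side: uses only C and the known M, never the key."""
    assert len(known) == len(wanted)
    old_plain = known + hashlib.sha256(known).digest()
    new_plain = wanted + hashlib.sha256(wanted).digest()
    # Flip exactly the bits in which M || H(M) and M' || H(M') differ.
    return xor(ciphertext, xor(old_plain, new_plain))

key, nonce = os.urandom(16), os.urandom(8)
ciphertext = encrypt(key, nonce, b"pay Bob   10 USD")
forged = forge(ciphertext, b"pay Bob   10 USD", b"pay Eve 9999 USD")
print(decrypt(key, nonce, forged))  # b'pay Eve 9999 USD'
```

The forged ciphertext passes the hash check because anyone can compute $H(M')$: the hash involves no secret.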

Note that this exact attack won't work as-is against other modes of operation that lack integrity protection. However, analogous attacks are usually easy to come up with. For example, see here for the analogous attack on CBC-mode.

In conclusion: your suggested scheme, albeit natural, fails to provide integrity. That is why modes like GCM, CCM, and EAX exist.
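To illustrate what the unkeyed hash is missing, here is a hedged sketch in which the tag is computed with a secret key. GCM, CCM, and EAX are not in the Python standard library, so this uses encrypt-then-MAC with HMAC-SHA256 as a stand-in. That is a different construction from those modes, but it shares the relevant property: the attacker cannot recompute the tag. As an assumption for illustration, `keystream` is again a SHA-256-based substitute for AES-CTR, and all names are hypothetical.

```python
import hashlib
import hmac
import os

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Stand-in for a CTR keystream (assumption: NOT AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, message: bytes) -> bytes:
    body = xor(message, keystream(enc_key, nonce, len(message)))
    # The tag is keyed: computing it requires mac_key.
    tag = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    return body + tag

def open_sealed(enc_key: bytes, mac_key: bytes, nonce: bytes, sealed: bytes) -> bytes:
    body, tag = sealed[:-32], sealed[-32:]
    expected = hmac.new(mac_key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return xor(body, keystream(enc_key, nonce, len(body)))

enc_key, mac_key, nonce = os.urandom(16), os.urandom(32), os.urandom(8)
known = b"pay Bob   10 USD"
wanted = b"pay Eve 9999 USD"
sealed = seal(enc_key, mac_key, nonce, known)

# Bit-flip the encrypted message as in the attack on the hash-based scheme.
# The tag is left unchanged because the attacker has no way to update it.
tampered = xor(sealed[:len(known)], xor(known, wanted)) + sealed[len(known):]

try:
    open_sealed(enc_key, mac_key, nonce, tampered)
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```

In practice one would use a vetted AEAD implementation rather than assembling these pieces by hand.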
