Is it safe to use the hash of the data as the key to encrypt them?

Question

That is, if I give someone some encrypted data and tell them that the key used to encrypt it was the hash digest of the data before encryption, will that make it easier for them to decrypt it?

Edit, as I cannot reply to comments: the recipient will have the keys; that is not a problem.

Maarten Bodewes · Answer 1 · 2017-09-27T21:15:41.153

It's kind of pointless to use the hash over the data even if you only hand over the key to the receiver of the ciphertext.

First of all, it's insecure. Consider an empty or very simple message. In that case an attacker can guess the message, create the hash, and verify correctness by decrypting.
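A minimal sketch of that guessing attack, assuming SHA-256 as the hash. The `toy_encrypt` function is a hypothetical stand-in for a real cipher (a SHA-256 based keystream XORed with the data), used only so the example runs with the standard library; it is not something to deploy.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || counter). Illustration only, not a real cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the data."""
    return bytes(d ^ s for d, s in zip(data, keystream(key, len(data))))


toy_decrypt = toy_encrypt  # XOR is its own inverse


def encrypt_with_own_hash(message: bytes) -> bytes:
    """The scheme from the question: key = H(message)."""
    key = hashlib.sha256(message).digest()
    return toy_encrypt(key, message)


def guess_message(ciphertext: bytes, candidates):
    """Attacker: hash each guess, decrypt, and check whether the guess comes back."""
    for guess in candidates:
        key = hashlib.sha256(guess).digest()
        if toy_decrypt(key, ciphertext) == guess:
            return guess
    return None


ciphertext = encrypt_with_own_hash(b"yes")
recovered = guess_message(ciphertext, [b"", b"no", b"yes", b"maybe"])
```

The ciphertext itself acts as a check value for every guess, so a message drawn from a small set of possibilities is recovered without the attacker ever being told the key.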

Second, it binds the key to the data of the message. You will have to refresh the key each time you change the message. The idea of symmetric ciphers is that you can reuse the key for different messages. This scheme is not as bad as a one-time pad where the key has to have the size of the message as well, but it is still very inefficient.

Creating a secure key for most symmetric ciphers isn't hard; you just take 128 to 256 bits of secure random data and use that as the key. There is no need to make the key dependent on the plaintext message. If you communicate with another party, key agreement (DH or ECDH) is often performed to agree on a key instead. There are of course countless other methods of key establishment.

If the key is reused, you have to use a different IV for each message. The IV can, however, be included with the ciphertext; it doesn't need to remain secret.
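As a rough sketch of that pattern: one random 256-bit key from the OS random source, reused across messages, with a fresh random IV prepended to each ciphertext. The `encrypt` and `decrypt` helpers are hypothetical and use a toy HMAC-SHA-256 keystream so that the example needs only the standard library; in practice you would use an authenticated mode such as AES-GCM from a vetted library.

```python
import hashlib
import hmac
import secrets

KEY_BYTES = 32  # 256-bit key
IV_BYTES = 16   # IV size chosen for this sketch


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA-256(key, iv || counter). Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Pick a fresh IV per message and send it in the clear before the ciphertext."""
    iv = secrets.token_bytes(IV_BYTES)
    body = bytes(p ^ s for p, s in zip(plaintext, _keystream(key, iv, len(plaintext))))
    return iv + body


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the IV, regenerate the keystream, and undo the XOR."""
    iv, body = blob[:IV_BYTES], blob[IV_BYTES:]
    return bytes(c ^ s for c, s in zip(body, _keystream(key, iv, len(body))))


# The key is independent of any message and can be reused.
key = secrets.token_bytes(KEY_BYTES)
first = encrypt(key, b"same message")
second = encrypt(key, b"same message")
```

Because each call draws a new IV, encrypting the same message twice under the same key gives two different ciphertexts. Note that this sketch provides no integrity protection.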

Note that this answer assumes a cryptographically secure hash such as SHA-2 or SHA-3. It doesn't consider a keyed hash or PRF such as HMAC-SHA-2 or KMAC-SHA-3.

Squeamish Ossifrage · Answer 2 · 2017-09-27T21:08:55.467

Giving an adversary $E_{H(m)}(m)$ for uniform random $H$ doesn't help them to guess $m$ any better than giving someone $E_k(m)$ for uniform random $k$: their only way to guess the key $H(m)$ is to guess $m$ in the first place!

Why might you want to do this? It provides a deterministic way to pick an encryption key for a content-addressed encrypted storage scheme, such as Tahoe-LAFS.

This doesn't work if $H$ is known to the attacker, e.g. $H = \operatorname{SHA-256}$. But you could use $H(m) = \operatorname{HMAC-SHA256}_k(m)$ for some long-term deduplication key $k$. Revealing $k$ to the attacker lets them distinguish between two possible messages $m_0$ and $m_1$, but doesn't help to decrypt unknown messages better than guessing them.
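A small sketch of the keyed variant $H(m) = \operatorname{HMAC-SHA256}_k(m)$, using Python's standard `hmac` module. The function name `content_key` and the sample messages are made up for illustration; the long-term deduplication key $k$ is assumed to be 32 random bytes.

```python
import hashlib
import hmac
import secrets


def content_key(dedup_key: bytes, message: bytes) -> bytes:
    """Derive the per-message encryption key as HMAC-SHA256_k(m)."""
    return hmac.new(dedup_key, message, hashlib.sha256).digest()


# Long-term deduplication key k, kept secret from the attacker.
dedup_key = secrets.token_bytes(32)
other_dedup_key = secrets.token_bytes(32)

message = b"contents of some stored file"

# Same k and same message: same key, so identical content deduplicates.
key_a = content_key(dedup_key, message)
key_b = content_key(dedup_key, message)

# A different k gives an unrelated key for the same message.
key_c = content_key(other_dedup_key, message)

# An unkeyed hash, by contrast, can be computed by anyone who guesses the message.
public_hash = hashlib.sha256(message).digest()
```

Without `dedup_key`, guessing the message is no longer enough to compute the encryption key, which is what the plain SHA-256 version lacks. With `dedup_key`, a guess can again be confirmed, which is the distinguishing caveat mentioned above.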

score 0 · Answer 3 · answered Sep 27 '17 at 14:24

In addition to Maarten Bodewes's answer, this scheme also invalidates the whole point of using a hash.

Hash values are meant to be public. You can safely show the hash of a message to an attacker and be confident that they gain no information about the plaintext message. In your scheme you are giving the attacker a copy of the message encrypted with the hash of the message. Now the main premise of a hash is no longer true because the attacker learning the hash is equivalent to the attacker learning the plaintext.

Moreover, it's not clear what you even gain from this scheme. You plan to hash the message, use that as a key to encrypt the message, then send the ciphertext over an insecure channel and deliver the hash of the message to the decryptor by some out-of-band method? Why not just create a random encryption key? What do you gain by using the hash?
