4

Assume that Alice has a file $F$ that she wants to send to Bob in encrypted form. Alice possesses $F$ and an encryption key $K$. She sends Bob the encryption of $F$ under $K$, $E(F,K)$, together with a compact message authentication code, which could be the hash of the file, $H = {\rm hash}(F)$, used as a unique blinded identifier of the file. Multiple users may have the same file encrypted with different keys, and we wish to detect that the files are the same by virtue of $H$ being the same. Bob, however, needs to know that $H$ matches $F$ without having $F$ unencrypted.

Does there exist any kind of proof that the hash $H$ (or other compact MAC) corresponds to the file $F$? The hashing/MAC algorithm and encryption algorithm (symmetric, asymmetric, homomorphic) are secondary to being able to prove this.

Squeamish Ossifrage
Bob McElrath

3 Answers

3

If you use a one-time pad as your encryption function then this simplifies to a proof that Alice knows some $F$ that hashes to $H$. $K$ can be trivially derived from $F$ and $E(K, F)$ by xor-ing.
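As a minimal sketch of why the key is recoverable in the one-time-pad case, the following Python snippet XORs the plaintext with the ciphertext. The file contents and the helper name `xor_bytes` are illustrative assumptions, not part of the original answer.

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

file_f = b"example file contents"          # hypothetical F
key_k = secrets.token_bytes(len(file_f))   # one-time pad K, as long as F
ciphertext = xor_bytes(file_f, key_k)      # E(K, F)

# Anyone holding both F and E(K, F) recovers K with a single XOR.
recovered_key = xor_bytes(file_f, ciphertext)
```

So a party who can prove knowledge of $F$ implicitly knows $K$ as well, which is why the problem reduces to proving knowledge of a pre-image.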

An interactive zero-knowledge proof of this simpler problem - Alice proving she knows a pre-image of $H(F)$ - would go something like this:

You need to use a partially homomorphic hash such that $H(A+B) = H(A) \oplus H(B)$ for some composition operation on each domain. An example might be to add/xor the pre-images and use scalar multiplication of an elliptic curve base point as your hashing function, though note that for an arbitrary-length message this is going to be orders of magnitude slower than a traditional cryptographic hash function.

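In a single trial in the proof Alice generates a one-time pad - $P$ - and commits to $H(P)$. Bob calculates $H(F+P) = H(F) \oplus H(P)$. He can challenge Alice to reveal either $P$ or $F+P$. If she reveals $P$, Bob checks that it hashes to the committed $H(P)$; if she reveals $F+P$, he checks that it hashes to $H(F) \oplus H(P)$. A cheating Alice can prepare for only one of the two challenges, so each trial halves the probability that she escapes detection.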
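A single trial can be sketched in Python as follows. This is a toy under stated assumptions: instead of elliptic-curve scalar multiplication it uses modular exponentiation, $H(x) = g^x \bmod p$, so the composition on pre-images is addition modulo $p-1$ and the composition on hashes is multiplication modulo $p$. The modulus, base, and function names are all hypothetical choices for illustration, not vetted parameters.

```python
import secrets

P_MOD = 2**127 - 1   # toy prime modulus (assumption, not a vetted parameter)
G = 3                # toy base element (assumption)
ORDER = P_MOD - 1    # pre-images are added modulo this value

def hom_hash(x: int) -> int:
    """Toy homomorphic hash: hom_hash(a + b) == hom_hash(a) * hom_hash(b) mod P_MOD."""
    return pow(G, x % ORDER, P_MOD)

def alice_commit() -> tuple[int, int]:
    """Alice picks a fresh pad P and commits to H(P)."""
    pad = secrets.randbelow(ORDER)
    return pad, hom_hash(pad)

def alice_respond(f: int, pad: int, challenge: int) -> int:
    """Alice reveals P (challenge 0) or F + P (challenge 1)."""
    return pad if challenge == 0 else (f + pad) % ORDER

def bob_verify(h_f: int, commitment: int, challenge: int, response: int) -> bool:
    """Bob checks the revealed value against H(P) or against H(F) * H(P)."""
    if challenge == 0:
        return hom_hash(response) == commitment
    return hom_hash(response) == (h_f * commitment) % P_MOD

def run_trial(f: int, h_f: int) -> bool:
    """One honest trial with a random challenge bit."""
    pad, commitment = alice_commit()
    challenge = secrets.randbits(1)
    return bob_verify(h_f, commitment, challenge, alice_respond(f, pad, challenge))

file_as_int = int.from_bytes(b"example file contents", "big")  # hypothetical F
public_hash = hom_hash(file_as_int)                             # H(F), known to Bob
all_passed = all(run_trial(file_as_int, public_hash) for _ in range(40))
```

Because the pad is uniform, the revealed value $F+P$ is itself uniform modulo $p-1$ and tells Bob nothing about $F$. Note that this toy hash reduces the file modulo $p-1$, so it is not collision-resistant for long files; it only illustrates the homomorphic check.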

You can turn this into a non-interactive proof by Fiat-Shamir. It's not very succinct, however: the proof ends up being many times longer than $F$.
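The Fiat-Shamir step replaces Bob's coin flips with bits derived from a hash of the public transcript. A rough sketch, assuming SHA-256 as the transcript hash and at most 256 trials (both are choices made here for illustration, and the function name is hypothetical):

```python
import hashlib

def fiat_shamir_challenges(statement: bytes, commitments: list[bytes]) -> list[int]:
    """Derive one challenge bit per commitment from a hash of the transcript."""
    if len(commitments) > 256:
        raise ValueError("this sketch derives at most 256 challenge bits")
    h = hashlib.sha256()
    # Length-prefix every field so different transcripts cannot
    # produce the same byte stream by concatenation.
    for field in [statement, *commitments]:
        h.update(len(field).to_bytes(8, "big"))
        h.update(field)
    digest = h.digest()
    # Bit i of the digest is the challenge for trial i.
    return [(digest[i // 8] >> (i % 8)) & 1 for i in range(len(commitments))]

# Placeholder values standing in for H(F) and the per-trial commitments H(P_i).
example_commitments = [bytes([i]) * 16 for i in range(8)]
example_bits = fiat_shamir_challenges(b"H(F) placeholder", example_commitments)
```

Alice must fix all commitments before the challenges are computed, so she cannot tailor a commitment to its challenge without searching for a favourable hash output. The proof then consists of every commitment plus every response, which is where the size blow-up comes from.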

I guess with a homomorphic DRBG you could extend the proof to show that Alice knows a stream cipher key $K$ that defines the transformation $F \rightarrow E(K, F)$. This would improve the succinctness of the proof as well as being a closer match for what you asked for.

geoff_h
2

Would you allow Bob to ask all the users who have $F$ to encrypt and upload the file with the same secret key, derived from $F$?

The principle of the tweaked protocol would be this: all users must encrypt the file with the same key, which depends on the file (calculated via another hash). The key would be a kind of shared key, but one that can only decrypt the file $F$. Bob would calculate the hash of the first encrypted file he receives, store it, and then calculate the hash of each newly received encrypted file to check whether it matches the stored hash.

Example of a derived symmetric key: $H(F | 1)$ (a single "one" bit is appended to the end of the file). In this construction, the unique identifier of the file would be the hash of the (unique) encrypted file.
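This idea is commonly known as convergent encryption. A minimal Python sketch follows, with several assumptions that are not in the answer: SHA-256 is used for both hashes, a whole byte `0x01` is appended rather than a single bit, and a SHA-256 counter-mode keystream stands in for a real cipher. The encryption must be deterministic (no random IV), otherwise two users would produce different ciphertexts for the same file.

```python
import hashlib

def derive_key(file_bytes: bytes) -> bytes:
    """Key derived from the file itself: hash(F || 0x01)."""
    return hashlib.sha256(file_bytes + b"\x01").digest()

def keystream(key: bytes, length: int) -> bytes:
    """Toy deterministic keystream (a stand-in for a real stream cipher)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

def encrypt(file_bytes: bytes) -> bytes:
    """Deterministic encryption under the file-derived key."""
    return xor_with_keystream(file_bytes, derive_key(file_bytes))

def identifier(ciphertext: bytes) -> str:
    """Bob's identifier: the hash of the encrypted file."""
    return hashlib.sha256(ciphertext).hexdigest()

alice_upload = encrypt(b"the same file")
carol_upload = encrypt(b"the same file")
other_upload = encrypt(b"a different file")
same_file_detected = identifier(alice_upload) == identifier(carol_upload)
```

Bob can compute `identifier` directly on what he receives, so no separate proof is needed. The trade-off is that anyone who can guess a candidate file can confirm the guess by encrypting it and comparing identifiers.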

Fraktal
1

You could use non-interactive zero-knowledge proofs along with functional encryption (FE). With functional encryption, you could give out a secret key for the hash function, and you would encrypt the file with the FE scheme. However, FE schemes for general functions are not practical. You can also solve the problem with functional encryption for inner products: you use the scheme to compute the function $Ax$, where $x$ is a plaintext vector (encoded to have short entries). Now $Ax$ is the GGH hash function, and it can be shown to be one-way and collision-resistant based on the hardness of the shortest vector problem in a certain type of lattice. This is an interesting problem.
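For reference, the hash $Ax$ itself (without the functional-encryption layer) can be sketched in a few lines of Python. The dimensions, modulus, and seeded non-cryptographic generator for the public matrix are toy assumptions chosen for readability; they offer no security.

```python
import random

N_ROWS, M_COLS, Q = 4, 64, 257   # toy parameters (assumption); M_COLS > N_ROWS * log2(Q)

def make_matrix(seed: int) -> list[list[int]]:
    """Public random matrix A with entries modulo Q (toy, non-cryptographic RNG)."""
    rng = random.Random(seed)
    return [[rng.randrange(Q) for _ in range(M_COLS)] for _ in range(N_ROWS)]

def lattice_hash(matrix: list[list[int]], x: list[int]) -> list[int]:
    """Compute A x mod Q for a short (here: binary) input vector x."""
    if len(x) != M_COLS or any(bit not in (0, 1) for bit in x):
        raise ValueError("x must be a binary vector of length M_COLS")
    return [sum(a * b for a, b in zip(row, x)) % Q for row in matrix]

matrix_a = make_matrix(seed=1)
message_bits = [(i * 7 + 3) % 2 for i in range(M_COLS)]   # hypothetical input
digest = lattice_hash(matrix_a, message_bits)
```

Each output coordinate is an inner product of a row of $A$ with $x$, which is why an inner-product FE scheme with keys for the rows of $A$ would let Bob learn $Ax$ and nothing more about $x$.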

user39668