How to use hashes for proof of retrievability (PoR)?

Question

Suppose an owner has a file $F$ stored on a server and wants proof that the server holds the full file. The owner does not necessarily keep a copy of the full file. I was thinking about the following simple challenge-response scheme that uses only hashes:

Preparation

The owner generates a random key $k$.
The owner computes and stores the commitment $c=hash(hash(k||F))$ ("$||$" stands for concatenation).

Verification

The owner sends the key $k$ to the server.
The server computes $p=hash(k||F)$ and sends $p$ to the owner.
The owner checks whether $hash(p) = c$.
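The preparation and verification steps above can be sketched in Python as follows. This is only an illustration of the protocol as described, not a vetted implementation: SHA-256 stands in for $hash$, the 32-byte key length is an assumption, and the function names `prepare`, `respond`, and `verify` are made up for this example.

```python
import hashlib
import hmac
import secrets
from typing import Tuple


def h(data: bytes) -> bytes:
    """SHA-256 stands in for the abstract function hash (an assumption)."""
    return hashlib.sha256(data).digest()


def prepare(file_bytes: bytes) -> Tuple[bytes, bytes]:
    """Owner side: pick a random key k and store the commitment c = hash(hash(k || F))."""
    k = secrets.token_bytes(32)  # key length chosen for illustration
    c = h(h(k + file_bytes))
    return k, c


def respond(k: bytes, file_bytes: bytes) -> bytes:
    """Server side: compute p = hash(k || F) from the challenge key and the stored file."""
    return h(k + file_bytes)


def verify(p: bytes, c: bytes) -> bool:
    """Owner side: accept if hash(p) equals the stored commitment c."""
    return hmac.compare_digest(h(p), c)
```

After `prepare`, the owner only needs to keep `k` and `c`. Once `k` has been revealed, the same pair cannot be reused for a later check.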

I was researching challenge-response PoR and could not find this simple scheme; related work usually uses more sophisticated cryptographic functions. The protocol of Deswarte et al. seems to be a general case of this scheme. I would appreciate any reference to previous work on this scheme.

Suppose $hash$ is a cryptographic hash function that is collision resistant and one-way (preimage resistant). Is this PoR scheme secure? Are there any attacks?

Edit

The scheme was fixed by replacing $p=hash(hash(F)||hash(k))$ with $p=hash(k||F)$, as pointed out by Titanlord and Paŭlo Ebermann.

score 2 · Accepted Answer · answered Jan 21 '25 at 14:35

In your case, the owner can check only once whether the server stores the file. And I think the owner cannot even do that.

My dishonest server obtains file $F$, computes $t = H(F)$ with hash function $H$, and stores $t$ but not $F$. On challenge $k$, my server computes $t' = H(k)$ and $c' = H(t || t')$, and sends $c'$ back. The owner checks $c' = c$ and is happy, but my server never (really) stored $F$.
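A minimal sketch of this cheating server, assuming the pre-edit response had the form $H(H(F) || H(k))$ and using SHA-256 for $H$. The class name `DishonestServer` and the helper `owner_expected` are hypothetical and exist only for this illustration.

```python
import hashlib


def H(data: bytes) -> bytes:
    """SHA-256 stands in for H (an assumption)."""
    return hashlib.sha256(data).digest()


def owner_expected(file_bytes: bytes, k: bytes) -> bytes:
    """Value the owner expects, computed from the full file: H(H(F) || H(k))."""
    return H(H(file_bytes) + H(k))


class DishonestServer:
    """Keeps only t = H(F) and throws the file away."""

    def __init__(self, file_bytes: bytes) -> None:
        self.t = H(file_bytes)  # 32 bytes, regardless of the file size

    def respond(self, k: bytes) -> bytes:
        t_prime = H(k)              # t' = H(k)
        return H(self.t + t_prime)  # c' = H(t || t')
```

The server's answer matches the owner's expected value for every challenge $k$, although it retains only a 32-byte digest of $F$.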

I think one fix would be to create $n$ checks $c_1, \dots, c_n$ with $c_i = H(F || k_i)$, where each $k_i$ is a secret random nonce that is revealed for exactly one verification.
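As a rough sketch of the bookkeeping this implies, the owner could precompute the pairs $(k_i, c_i)$ while still holding $F$ and consume one pair per verification. SHA-256 stands in for $H$, and the names `prepare_checks` and `run_check` are assumptions made for this example.

```python
import hashlib
import hmac
import secrets
from typing import Callable, List, Tuple


def H(data: bytes) -> bytes:
    """SHA-256 stands in for H (an assumption)."""
    return hashlib.sha256(data).digest()


def prepare_checks(file_bytes: bytes, n: int) -> List[Tuple[bytes, bytes]]:
    """Owner side: precompute n pairs (k_i, c_i) with c_i = H(F || k_i)."""
    checks = []
    for _ in range(n):
        k_i = secrets.token_bytes(32)  # secret until it is used as a challenge
        checks.append((k_i, H(file_bytes + k_i)))
    return checks


def run_check(checks: List[Tuple[bytes, bytes]],
              server_respond: Callable[[bytes], bytes]) -> bool:
    """Consume one unused pair: send k_i and compare the reply with c_i."""
    k_i, c_i = checks.pop()  # each nonce is revealed only once
    return hmac.compare_digest(server_respond(k_i), c_i)
```

The owner's storage grows linearly with $n$, and no further checks are possible once the list is empty.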

If you think about it the other way around, where the owner has the complete file (e.g., a password) but the server does not, you are entering the field of authentication schemes. Additionally, exploring Merkle trees might be valuable for your case.

score 2 · Answer 2 · answered Jan 22 '25 at 01:24

No, even the "fixed" scheme is not safe when instantiated with most hash functions in common use today. That's because hash functions tend to be designed sequentially: they process the input in blocks one after another, while keeping only a small internal state.

This means the server could just partially evaluate the hash function on $F$, stopping before any finalization step happens, keeping note of the final internal state $S$ (if the file length is not divisible by the hash function's block length, a few bytes $F'$ corresponding to the last partial block also need to be saved separately). It can then discard $F$ itself. The check can still be passed by performing a final update to the hash function state $S$ with the bytes $F'\|k$ and then the finalization step, resulting in the desired output $p$.

Note that this is applicable even when the hash function is not susceptible to length extension attacks, because we don't need to extend a finalized hash value, but just the internal state of the hash function, which we can always do by design as long as the hash function works sequentially.
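The idea can be illustrated with Python's `hashlib`, whose hash objects support incremental `update()` calls and `copy()`. This is a sketch under stated assumptions: SHA-256 is the hash, the challenge is appended after the file as in $H(F || k)$, and the class name `MidstateServer` is hypothetical. The hash object buffers the last partial block internally, so the bytes $F'$ do not have to be tracked by hand here. The state lives only in memory in this sketch; `hashlib` does not offer a documented way to serialize it.

```python
import hashlib


def honest_response(file_bytes: bytes, k: bytes) -> bytes:
    """What a server holding the full file would compute: H(F || k)."""
    return hashlib.sha256(file_bytes + k).digest()


class MidstateServer:
    """Absorbs F once, then keeps only the un-finalized hash state."""

    def __init__(self, file_bytes: bytes) -> None:
        self._state = hashlib.sha256()
        self._state.update(file_bytes)  # after this, F itself is not needed

    def respond(self, k: bytes) -> bytes:
        state = self._state.copy()  # leave the saved state reusable
        state.update(k)             # final update with the challenge
        return state.digest()       # finalization step
```

The response equals the honest one for any challenge, while the server's storage is independent of the size of $F$.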
