
Alice has bought a brand new hard disk, $K$ bytes in size (with $K \sim 10^{12}$). She is very happy about her purchase and tells Bob about it. Bob claims he also bought a $K$-byte hard disk. Alice doesn't really trust Bob on this, so she asks him to prove his claim:

  • Alice sends Bob an $O(\ln K)$-sized problem she has generated.
  • The problem can only be solved by someone who has $K$ bytes of memory.
  • Bob sends back an $O(\ln K)$-sized solution.
  • Alice verifies the solution.

In particular, there can be two distinct cases:

  • Alice verifies the solution without having to carry out the same operations as Bob (in this case, she doesn't even need to have a $K$-byte hard disk herself)
  • Alice computes the same solution as Bob on her side, using her own hard disk.

What problem could we use? It is perfectly okay if the nature of the communication between Alice and Bob changes a bit (for example, requiring more than two message flows is fine), but the communication should never grow to something like $\Theta(K)$ in size.

Solutions I tried

(Feel free to skip this section if you don't need inspiration!)

  • Using $\Theta(K)$ communication, Alice could obviously generate $K$ random bytes, send them to Bob, and ask him to compute the hash of some subset of them, which she specifies only after sending the whole payload. (We need the hash of a subset of the data; otherwise Bob could compute the hash on the fly as he receives the bytes.) But this solution is obviously discarded because it requires too much communication.
  • Alice sends a seed, Bob generates random numbers with the seed to fill the $K$ bytes, then he sorts the bytes and computes the hash. However, an $O(K^2)$-time, low-memory solution exists that allows Bob to compute the hash by extracting the first, second, third... item, repeatedly re-generating the random numbers from the seed (see the first sketch after this list). So this doesn't actually prove that Bob has a $K$-byte hard disk.
  • Alice sends a seed, Bob generates random numbers with the seed to fill the $K$ bytes, then a shuffling procedure is put in place that repeatedly takes small blocks and hashes them together, XORs them, or anything in that fashion, so that after a few iterations each byte depends on every other one and it is impossible to compute a partial solution on the fly (see the second sketch after this list). Alice then asks for the hash of a subset of the total result (or even the total hash). This could work, but it has the issue of producing a huge amount of random I/O on the disk, which could actually damage a real-life, non-SSD hard disk.
  • Bob generates a sequence of proofs of work (as in Bitcoin mining). Bob then builds a Merkle tree over the sequence and sends the root to Alice. Alice asks for $N$ random proofs of work, and Bob sends back the whole path in the Merkle tree from the root to each leaf; Alice verifies everything. However, Bob could just save the first $L$ layers of the Merkle tree and re-compute on the fly all the leaves under a stored node (see the third sketch after this list). If he stores $10^9$ values, for example, he will just have to recompute $10^3$ leaves, which is feasible on the fly.
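
To make the second pitfall concrete, here is a minimal Python sketch (toy sizes, hypothetical PRG) of how Bob can reproduce the hash of the sorted data without ever storing it. It uses per-value counting rather than rank-by-rank extraction, but the principle is the same: trading extra passes over the seeded generator for memory.

```python
import hashlib
import random

SEED, K = 1234, 10**5  # toy size; the question has K ~ 10**12

def stream(seed, k):
    """Hypothetical PRG: the byte stream Bob is supposed to materialize."""
    rng = random.Random(seed)
    for _ in range(k):
        yield rng.randrange(256)

# Honest strategy: materialize and sort all K bytes (needs K bytes of memory).
h_honest = hashlib.sha256(bytes(sorted(stream(SEED, K)))).hexdigest()

# Cheating strategy: never store the data. For each byte value v, re-stream
# the generator and count how often v occurs, then feed v that many times
# into the hash. Tiny memory footprint, at the cost of 256 extra passes
# (the rank-by-rank variant described above costs O(K^2) time instead).
h_cheat = hashlib.sha256()
for v in range(256):
    count = sum(1 for x in stream(SEED, K) if x == v)
    h_cheat.update(bytes([v]) * count)

assert h_cheat.hexdigest() == h_honest
```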
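
A minimal sketch of the third idea, the block-mixing pass (the block size, round count, and mixing rule below are arbitrary choices, not a vetted construction):

```python
import hashlib
import random

BLOCK = 32  # one SHA-256 digest per block

def fill_and_mix(seed: int, size: int, rounds: int = 3) -> bytes:
    """Fill `size` bytes from a seeded PRG, then mix random block pairs in place."""
    rng = random.Random(seed)
    buf = bytearray(rng.getrandbits(8) for _ in range(size))
    nblocks = size // BLOCK
    for _ in range(rounds * nblocks):
        i, j = rng.randrange(nblocks), rng.randrange(nblocks)
        # Overwrite block i with H(block i || block j): after a few rounds each
        # block depends on many others, so no streaming shortcut applies. Note
        # that i and j jump all over the buffer, hence the random I/O concern.
        buf[i*BLOCK:(i+1)*BLOCK] = hashlib.sha256(
            bytes(buf[i*BLOCK:(i+1)*BLOCK]) + bytes(buf[j*BLOCK:(j+1)*BLOCK])
        ).digest()
    return bytes(buf)

# Bob would answer with the hash of the mixed buffer (or of a subset Alice names).
digest = hashlib.sha256(fill_and_mix(seed=42, size=1 << 20)).hexdigest()
```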
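
And a sketch of why the Merkle-tree variant fails: a cheating Bob keeps only the top of the tree and regenerates everything below on demand (toy parameters; `leaf` is a stand-in for an index-recomputable proof of work):

```python
import hashlib

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def leaf(i: int) -> bytes:
    # Stand-in for the i-th proof of work; the point is only that it can be
    # recomputed from its index instead of being stored.
    return H(b"pow", i.to_bytes(8, "big"))

def node(depth: int, idx: int, height: int) -> bytes:
    """Merkle node at `depth` in a tree with 2**height leaves."""
    if depth == height:
        return leaf(idx)
    return H(node(depth + 1, 2*idx, height), node(depth + 1, 2*idx + 1, height))

HEIGHT, STORED = 16, 8
# A cheating Bob stores only the 2**STORED nodes at depth STORED...
top = [node(STORED, i, HEIGHT) for i in range(2**STORED)]
# ...and when Alice audits leaf q, he recomputes just the 2**(HEIGHT - STORED)
# leaves under the relevant stored node, instead of keeping all 2**HEIGHT.
q = 12345
assert top[q >> (HEIGHT - STORED)] == node(STORED, q >> (HEIGHT - STORED), HEIGHT)
```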
Matteo Monti

2 Answers


There has been a huge amount of work on related questions in recent years. As Thomas Prest mentions, this problem has been considered in the context of memory-hard functions (MHFs), which provably (in some idealized models) require a certain amount of space to be evaluated.

However, MHFs alone are only the weakest primitive of this kind; many primitives have been designed that enhance MHFs with other desirable properties, such as efficient verification. What you are looking for is probably best captured by proofs of transient space, in which the prover provably uses a given amount of space during a computation, yet the proof can be verified in time and space considerably smaller (polylogarithmic) than the space used by the prover.

I recommend going through the introduction of this article, which does address your problem and furthermore points to many interesting references if you want to dig into the subject a bit. Most constructions use graphs with specific combinatorial properties (such as expander graphs and depth-robust graphs), which imply lower bounds on the hardness of playing a certain game (the pebbling game) on the graph. These lower bounds then translate into bounds on memory usage in the random oracle model.
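
For a feel of how these constructions work, here is a minimal, hypothetical sketch of the graph-labeling idea in the random oracle model (the parent function below is an arbitrary stand-in, not an actual depth-robust graph):

```python
import hashlib

def H(*parts: bytes) -> bytes:
    return hashlib.sha256(b"".join(parts)).digest()

def labels(n, parents):
    """Label each node with the hash of its index and its parents' labels.
    On a suitably depth-robust graph, any prover that stores few labels and
    recomputes the rest pays a provable time penalty (a pebbling lower bound)."""
    lab = []
    for v in range(n):
        lab.append(H(v.to_bytes(8, "big"), *(lab[u] for u in parents(v))))
    return lab

# Arbitrary parent function for illustration; real constructions choose the
# edges carefully so that the graph is depth-robust or expanding.
lab = labels(1 << 16, lambda v: [v - 1, v // 2] if v > 0 else [])
```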

Geoffroy Couteau

I see at least one way of doing what you want to do: memory-hard functions.

Alice just needs to store a value $m$ and its hash $H(m)$, where $H$ is a memory-hard function and where the parameters are scaled so that $H(m)$ cannot be computed unless one is above a certain memory threshold. See e.g. this article, which provides a provably memory-hard hash function. Alice only needs to store the short values $m$ and $H(m)$, but someone (possibly Alice herself) will need to compute $H(m)$ beforehand and send it to her.
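
As a concrete, toy-sized sketch of this setup using `hashlib.scrypt` (a standard memory-hard KDF whose memory usage is roughly $128 \cdot r \cdot n$ bytes), with parameters far too small for a disk-sized proof:

```python
import hashlib
import os

# Setup, done once by Alice or a trusted party: pick a short value m and
# compute its memory-hard hash. n=2**14, r=8 needs roughly 16 MiB of memory;
# these are toy values chosen to stay under scrypt's default memory limit.
salt = os.urandom(16)
m = os.urandom(32)  # the short value Alice stores
tag = hashlib.scrypt(m, salt=salt, n=2**14, r=8, p=1, dklen=32)

# Alice keeps only (m, salt, tag). To answer the challenge m, Bob must
# recompute the memory-hard hash, which requires the target amount of memory.
def verify(response: bytes) -> bool:
    return response == tag
```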

Thomas Prest