15

Assume a user $U$ and a server $S$. $U$ uploads its data to $S$ and later wants to check its authenticity. $U$ also sends a Merkle tree over the data to the server. Suppose $U$ asks for a specific element in the tree. $S$ then returns the leaf node and the path to the root, which allows $U$ to verify the element. This acts as a proof of membership.

If an element is not part of the Merkle tree, how can the server prove its non-membership?

Mike Edward Moras
curious

4 Answers

9

If you are not worried about revealing information, then you can commit to the set of data items in one Merkle tree, and to the frontier of that set in another Merkle tree. A frontier is the set of ancestors to all values that are not in the tree (note that this is about the same size as the size of the set itself). Then, in order to prove that a value is inside the set you work with the first tree, and in order to prove that it is not you work with the second tree.
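As a rough illustration of the frontier idea, here is a minimal sketch that assumes the data items are fixed-length bit strings and takes the frontier to be the shortest prefixes that are not a prefix of any member. The names `tree_nodes`, `frontier` and `covering_prefixes` are made up for this example, and the Merkle commitments themselves are left out. Under these assumptions every non-member is covered by exactly one frontier prefix, so a membership proof for that prefix in the second tree would serve as the non-membership proof.

```python
def tree_nodes(members):
    """All prefixes (including the empty one) of every member key."""
    nodes = set()
    for key in members:
        for i in range(len(key) + 1):
            nodes.add(key[:i])
    return nodes


def frontier(members, key_bits):
    """Prefixes that are absent from the tree but whose parent is present.

    Assumption: every key is a string of exactly `key_bits` '0'/'1' characters.
    """
    nodes = tree_nodes(members)
    result = set()
    for node in nodes:
        if len(node) == key_bits:
            continue  # full-length keys are leaves and have no children
        for bit in "01":
            child = node + bit
            if child not in nodes:
                result.add(child)
    return result


def covering_prefixes(front, key):
    """Frontier prefixes that are a prefix of `key` (empty for members)."""
    return sorted(p for p in front if key.startswith(p))


members = {"000", "010", "011", "110"}
front = frontier(members, key_bits=3)

print(sorted(front))                    # frontier prefixes for this toy set
print(covering_prefixes(front, "101"))  # the single prefix covering a non-member
print(covering_prefixes(front, "010"))  # nothing covers a member
```

In this toy set the four members produce three frontier prefixes, which matches the remark that the frontier is roughly the same size as the set.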

However, I strongly recommend that you read the paper on Zero-Knowledge Sets by Micali, Rabin and Kilian. First, it defines the notion of a frontier mentioned above. Second, it provides a formal definition and analysis. Third, it achieves something much stronger: nothing is revealed beyond the fact that the element is or is not in the set (not even the size of the set). There has been some follow-up work on zero-knowledge sets as well, which you can find easily by Googling.

Yehuda Lindell
6

If your data is a set $S$ of key-value pairs, such that $S = \{(k,v) \mathrel\vert k \in K, v \in V\}$, you can have non-membership proofs for your data by using a sorted Merkle tree (sorted by the keys in $K$).

This tree can be a binary search tree, a trie, or a sparse Merkle tree (similar to the one in Micali's zero-knowledge sets paper, and later reintroduced by Google).

You can learn more about this by reading some of the research literature on authenticated dictionaries. This paper by Scott Crosby could be a good starting point.

Later edit: If you have unkeyed data, you might be able to key it by computing $k = \mathsf{SHA256}(v)$ for every value $v$. Depending on your use case, this may or may not be viable.
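A small sketch of that keying step, using Python's standard `hashlib`. The sample values and the helper name `key_for` are placeholders for illustration only. Once every value has a key, the leaves can be laid out in key order, and a missing value falls between two neighbouring keys.

```python
import hashlib
from bisect import bisect_left


def key_for(value: bytes) -> bytes:
    """Derive a sortable key for an unkeyed value: k = SHA256(v)."""
    return hashlib.sha256(value).digest()


values = [b"pear", b"apple", b"fig"]

# Sort the (key, value) pairs by key; this is the leaf order of the sorted tree.
keyed = sorted((key_for(v), v) for v in values)
keys = [k for k, _ in keyed]

# A value that was never inserted gets a key that is absent from the list.
# Its insertion position identifies the neighbouring leaves, if both exist.
missing = key_for(b"kiwi")
position = bisect_left(keys, missing)
is_present = position < len(keys) and keys[position] == missing
print(is_present)
```

Note that the leaf order is then the order of the digests, not of the original values, so range queries over the values themselves are no longer meaningful.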

Squeamish Ossifrage
Alin Tomescu
5

This is a less formal answer, but describes the same thing as Alin's answer above.

Standard binary Merkle tree:

                   R
               /       \
             N           N
           /   \       /   \
          N     N     N     N
         / \   / \   / \   / \
        6   3 9   0 8   4 7   2

The verifier knows only R. To prove membership, the prover has to supply the sibling hashes along the path from the given member up to R. So far so good.
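The membership check can be sketched as follows. This is only an illustration under assumptions that the answer does not fix: SHA-256 as the hash, a `0x00` prefix for leaves and a `0x01` prefix for inner nodes, and a leaf count that is a power of two. The function names are invented for the example.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves):
    """Return every level of the tree, leaf hashes first and [root] last."""
    level = [h(b"\x00" + leaf) for leaf in leaves]  # assumed leaf prefix
    levels = [level]
    while len(level) > 1:
        level = [h(b"\x01" + level[i] + level[i + 1])  # assumed node prefix
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels, index):
    """Collect the sibling hash at each level below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # flip the lowest bit to get the sibling
        index //= 2
    return path


def verify(root, leaf, index, path):
    """Recompute the root from the leaf and its sibling hashes."""
    node = h(b"\x00" + leaf)
    for sibling in path:
        if index % 2 == 0:
            node = h(b"\x01" + node + sibling)
        else:
            node = h(b"\x01" + sibling + node)
        index //= 2
    return node == root


leaves = [bytes([v]) for v in (6, 3, 9, 0, 8, 4, 7, 2)]
levels = build_levels(leaves)
root = levels[-1][0]

proof = prove(levels, 2)                 # the leaf holding the value 9
print(verify(root, bytes([9]), 2, proof))
print(verify(root, bytes([5]), 2, proof))
```

The verifier only needs `root`; the leaf, its index and the three sibling hashes come from the prover.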

To prove non-membership, what you can do is to have a sorted Merkle tree:

                   R
               /       \
             N           N
           /   \       /   \
          N     N     N     N
         / \   / \   / \   / \
        0   2 3   4 6   7 8   9   <-- values
        0   1 2   3 4   5 6   7   <-- binary index

To prove that 5 is not in the set, you supply proofs of membership for 4 and 6, which sit at consecutive binary indices (3 and 4 in this case) and whose values enclose the range the query value falls into. Since we assume the leaves are ordered, 5 cannot appear anywhere else.
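The neighbour-based proof can be sketched like this. It is a minimal example under stated assumptions, not a reference implementation: SHA-256, `0x00`/`0x01` prefixes for leaves and inner nodes, values encoded as 8-byte big-endian integers, a power-of-two leaf count, and a root that is trusted to come from a correctly sorted tree. All function names are invented, and queries below the smallest or above the largest element are not handled.

```python
import hashlib
from bisect import bisect_left


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(value: int) -> bytes:
    return h(b"\x00" + value.to_bytes(8, "big"))  # assumed leaf encoding


def build_levels(values):
    level = [leaf_hash(v) for v in values]
    levels = [level]
    while len(level) > 1:
        level = [h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def path_for(levels, index):
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])
        index //= 2
    return path


def root_from(value, index, path):
    node = leaf_hash(value)
    for sibling in path:
        node = (h(b"\x01" + node + sibling) if index % 2 == 0
                else h(b"\x01" + sibling + node))
        index //= 2
    return node


def prove_absent(values, levels, query):
    """Return the two neighbouring leaves with their paths, or None."""
    i = bisect_left(values, query)
    if i == 0 or i == len(values) or values[i] == query:
        return None  # out of range, or the query is actually a member
    return (i - 1, values[i - 1], path_for(levels, i - 1),
            values[i], path_for(levels, i))


def verify_absent(root, depth, query, proof):
    left_index, left_value, left_path, right_value, right_path = proof
    return (
        0 <= left_index < 2 ** depth - 1            # both indices inside the tree
        and len(left_path) == depth and len(right_path) == depth
        and left_value < query < right_value         # neighbours bracket the query
        and root_from(left_value, left_index, left_path) == root
        and root_from(right_value, left_index + 1, right_path) == root
    )


values = [0, 2, 3, 4, 6, 7, 8, 9]
levels = build_levels(values)
root = levels[-1][0]

proof = prove_absent(values, levels, 5)
print(verify_absent(root, 3, 5, proof))
```

The verifier never sees the full list: it checks that the two leaves are at consecutive indices, that both hash up to the trusted root, and that the query lies strictly between them.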

To be specific, the properties are:

  1. The verifier has to trust that R is the root of an honestly ordered tree. This can be verified probabilistically (including via Fiat-Shamir) by sampling a few queries and observing that the order is always maintained. This may become quite heavy depending on the nature of the data (whether or not the keys allow for large gaps). It is better to simply assume that the R we know is honest.

  2. The worst-case proof size, with completely distinct branches, is only $2\log_2(n)$ hashes, where $n$ is the size of the set.

  3. Worst of all, you cannot easily update the tree without rebuilding it. To make an update you need to know the whole universe, not just the tip R as is the case for membership. Thus the construct is suitable for static dictionaries that are seldom updated, as well as for short-round membership protocols like JoinMarket (see the link below).

  4. The tree must be binary and of known size. If it is truncated (some parts of the DAG terminate earlier), use some fixed structural rule, for example that truncation may only occur along the rightmost branches:

                   R
               /       \
             N           N
           /   \       /   \
          N     N     N     7
         / \   / \   /
        0   2 3   4 6

For a more long-winded description, see https://gist.github.com/chris-belcher/eb9abe417d74a7b5f20aabe6bff10de0

kat
0

Merkle trees do not provide the option for a "non-membership proof". However, since the user knows the whole tree in your case, the server can just send the hash of the element (or the element itself, depending on how you construct the leaves). The user can then verify that the hash differs from every leaf of the tree. While this is still not very efficient (linear time in the number of elements), it will be more efficient than building an authentication structure for the complement of your data set, as long as your data set is small compared to the (data) universe.
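A minimal sketch of that check, assuming the leaves are plain SHA-256 hashes of the elements and that the user still holds all of them. The helper names and sample data are made up for illustration.

```python
import hashlib


def leaf_hash(element: bytes) -> bytes:
    # Assumption: a leaf is simply SHA-256 of the element.
    return hashlib.sha256(element).digest()


def is_absent(known_leaf_hashes, element: bytes) -> bool:
    """Linear scan: the element is absent if its hash matches no leaf."""
    target = leaf_hash(element)
    return all(target != leaf for leaf in known_leaf_hashes)


stored = [b"alice", b"bob", b"carol"]
leaf_hashes = [leaf_hash(e) for e in stored]

print(is_absent(leaf_hashes, b"dave"))
print(is_absent(leaf_hashes, b"bob"))
```

If the user keeps the leaf hashes in a Python `set`, the same check takes constant time on average, at the cost of storing every leaf.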

mephisto