17

SHA-1, SHA-2, and the standardized version of SHA-3 are all sequential. This is impractical for hashing very large files distributed across machines. Any sequential hash can be straightforwardly converted into an efficiently parallelized hash using Merkle trees, but then I lose the standardization, and this is undesirable if the hash is to be used for long term authenticated storage.

Question: Is there an officially standardized tree-based (parallelizable) hash?

The most important property of the tree-based hash for my purposes is that for any partition of the input string, each portion can be reduced down to $O(\log n)$ intermediate space in parallel such that the intermediate values can be combined into the final hash, with at most a constant factor overhead over a sequential hash.

otus
  • 32,462
  • 5
  • 75
  • 167
Geoffrey Irving
  • 404
  • 2
  • 12

3 Answers3

7

With SHA-3 Derived Functions (SP 800-185, pdf) there is now a standardized parallel hash based on SHA-3, called ParallelHash, appropriately.

However, it is not a tree hash, but more of a hash-list-based mode. The string to be hashed is divided into equal-sized blocks, which are hashed, concatenated and then hashed again.

While it is not a tree hash it should cover the use case of hashing very large files across many machines. The arbitrarily small blocking size allows you to get to your $O(\log n)$ requirement if you base it on an appropriate function of the input size.

otus
  • 32,462
  • 5
  • 75
  • 167
6

The BLAKE3 hash function was just announced today. Internally, it's a Merkle tree.

Jack O'Connor
  • 647
  • 6
  • 13
3

I'm working on a tree hash based on BLAKE2 that's absolutely not in any way official, but it's exactly the sort of design you described. Maybe this would be useful once it's been properly reviewed and stabilized: https://github.com/oconnor663/bao

One of the features of Bao that might be nice for authenticated storage, is that it can verify any part of a file against the root hash, without needing to retrieve the entire file first. So for example, if the authenticated file is a big video, and you want to stream it and seek around it on a mobile phone, you can do that.

Edit: With the release of BLAKE3, the latest version of Bao is no longer its own hash function. Instead, it's based on BLAKE3, implementing the verified streaming part of the spec.

Jack O'Connor
  • 647
  • 6
  • 13