12

I need a SHA3-255 or 511. What if I simply truncate a standard SHA3-256 or 512? Apart from the doubled probability of hash collision, are there any other things I should be aware of? I could also truncate one byte instead of one bit, if useful.

What I need is to be able to store something other than the hash in the same 32 or 64 bytes, so I need to sacrifice one bit to mark whether the bytes represent a hash or something else.

Alternatively, I could say that if the first byte, not bit, is 0xff, then the remaining ones represent something else. This should reduce the probability of hash collision, but I would have a probability of 1/256 that a hash starts with 0xff, generating ambiguity in my encoding. I could say that if the first 4 bytes are 0xffffffff, then I would end up with a probability of generating an ambiguous encoding of 1/2^32, but I would prefer having a well-defined encoding under any circumstances.

Any well known approach I'm not aware of?

kelalaka
ragazzojp

3 Answers

18

With all well-regarded hash functions, the bits of the hash all have equal worth: as far as anyone knows (unless they aren't telling), the bits are not correlated. If you take $k$ bits of an $n$-bit hash, you get a $k$-bit hash function. Truncating SHA-256 to 255 bits gives you a hash that's almost as good as SHA-256: it has $2^{255}$ strength against preimage attacks and $2^{127.5}$ strength against collision attacks.

There are precedents for taking certain bits of a hash. SHA-224 and SHA-384 are obtained with calculations that are essentially the same as SHA-256 and SHA-512 respectively, just with different initial values (which shouldn't matter to the strength of the algorithm) and with the output truncated to the smaller size. Another precedent is that UUID can be constructed from 122 bits of MD5 or SHA-1 hashes (out of 128 or 160 respectively).
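As a small illustration of the SHA-224 precedent, the following sketch (using only Python's standard `hashlib`; the message is an arbitrary placeholder) compares the leftmost 224 bits of SHA-256 with real SHA-224. Both are 28 bytes long, but the different initial values make them unrelated:

```python
import hashlib

msg = b"example message"  # arbitrary illustrative input

full = hashlib.sha256(msg).digest()     # 32 bytes
truncated = full[:28]                   # leftmost 224 bits of SHA-256
sha224 = hashlib.sha224(msg).digest()   # standard SHA-224, also 28 bytes

# Same length, but the different initial values give unrelated outputs.
print(len(truncated), len(sha224))
print(truncated != sha224)
```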

For SHA3, there's a cleaner construction. Instead of taking SHA3-256 and cutting off one bit, take the SHAKE256 255-bit or 248-bit output. SHAKE256 is the same as SHA3-256 except for the few domain-separation bits appended to the message before padding, so it has the same security properties, but it's explicitly designed to have variable-length output. You can even use SHAKE128 instead of SHAKE256, which is slightly cheaper to compute with no meaningful security loss for most purposes (its 128-bit security level roughly matches the collision resistance of a 255-bit output).

It's unlikely that there's an actual security problem with truncating a hash, but using SHAKE gives you greater confidence that nothing will go wrong. Even better, to avoid domain collisions (where two parts of the system calculate the hash of the same string for different reasons), calculate the cSHAKE output with a unique string as the customization string.
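A minimal sketch of the encoding from the question, built on SHAKE256 via Python's standard `hashlib`. The names `hash_slot`, `other_slot`, and `is_hash` are hypothetical, and using the top bit of the first byte as the marker is an assumption made for illustration. It keeps 255 bits of a 32-byte SHAKE256 output rather than a bit-exact 255-bit SHAKE output, which by the argument above should be equivalent in strength. cSHAKE is not shown because `hashlib` does not provide it.

```python
import hashlib

SLOT_SIZE = 32  # bytes per slot, as in the question


def hash_slot(data: bytes) -> bytes:
    """Return a 32-byte slot holding 255 bits of SHAKE256 output, top bit 0."""
    digest = bytearray(hashlib.shake_256(data).digest(SLOT_SIZE))
    digest[0] &= 0x7F  # clear the top bit: marks "this slot is a hash"
    return bytes(digest)


def other_slot(payload: bytes) -> bytes:
    """Return a 32-byte slot holding a 255-bit non-hash payload, top bit 1."""
    if len(payload) != SLOT_SIZE or payload[0] & 0x80:
        raise ValueError("payload must be 32 bytes with its top bit clear")
    return bytes([payload[0] | 0x80]) + payload[1:]


def is_hash(slot: bytes) -> bool:
    """True if the marker bit says the slot contains a hash."""
    return (slot[0] & 0x80) == 0
```

With this layout every 32-byte value decodes unambiguously: the marker bit alone decides how the other 255 bits are interpreted.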

6

Apart from the slightly reduced resistances, there is no problem:

Resistances for SHA3-512:

  1. Pre-image resistance decreases to $2^{511}$ or $2^{504}$, if 1 bit or 1 byte is trimmed, respectively.
  2. Second pre-image resistance decreases to $2^{511}$ or $2^{504}$, if 1 bit or 1 byte is trimmed, respectively.
  3. Collision resistance decreases to $\sqrt{2^{511}} = \sqrt{2}\cdot 2^{255}$ or $\sqrt{2^{504}} = 2^{252}$, if 1 bit or 1 byte is trimmed, respectively.

Similarly, resistances for SHA3-256:

  1. Pre-image resistance decreases to $2^{255}$ or $2^{248}$, if 1 bit or 1 byte is trimmed, respectively.
  2. Second pre-image resistance decreases to $2^{255}$ or $2^{248}$, if 1 bit or 1 byte is trimmed, respectively.
  3. Collision resistance decreases to $\sqrt{2^{255}} = \sqrt{2}\cdot 2^{127}$ or $\sqrt{2^{248}} = 2^{124}$, if 1 bit or 1 byte is trimmed, respectively.
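The figures in both lists follow from the generic bounds for an ideal $n$-bit hash: about $2^{n}$ work for (second) pre-images and $2^{n/2}$ for collisions. A short sketch that reproduces them as exponents (the helper name `generic_strength` is hypothetical):

```python
def generic_strength(output_bits: int) -> tuple[float, float]:
    """Return (preimage_bits, collision_bits) for an ideal hash of this size."""
    return float(output_bits), output_bits / 2


for full in (512, 256):
    for trimmed in (full - 1, full - 8):  # one bit or one byte removed
        pre, coll = generic_strength(trimmed)
        print(f"SHA3-{full} trimmed to {trimmed} bits: "
              f"preimage 2^{pre:g}, collision 2^{coll:g}")
```

Here an exponent of 255.5 corresponds to $\sqrt{2}\cdot 2^{255}$, and 127.5 to $\sqrt{2}\cdot 2^{127}$.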

The trimmed case resistance should be good enough for your application.

Actually, this is not uncommon. For example, SHA-224 is a truncated version of SHA-256 with different initial values, which provide domain separation.

Trimming is secure as long as the generic attacks remain out of reach at the reduced output size. We require hash functions to satisfy the avalanche criterion on the output bits: a change in any input bit should flip each output bit with probability of about one half. Each output bit of the hash function must depend on the input bits, so removing 1 bit or 1 byte doesn't affect the remaining bits.

kelalaka
3

Let's get one thing out of the way: forcing one bit to 0 or 1 does not change the output size of the hash. A hash output is not a number, so the output size would not be affected.

Reducing hash output is common practice. Although maybe not a direct requirement, generally the output of a hash is considered indistinguishable from random - if the input is unknown, of course. So basically you can do with it what you want. The common thing to do is to take the leftmost bits or bytes of the hash output. You're taking the rightmost bits or bytes, and that's OK too.

As you already found out, trying to use one value out of 256 to indicate a special case is tricky. You can of course use a set of bytes as an escape value, but since your output size is fixed, you'll have to sacrifice more security for the special cases: the hashes starting with 0xFF in your case. As SHA3-512 has plenty of security, I'd just sacrifice a bit or even a byte.

Finally, there is one rather odd issue with taking the leftmost bytes. You might get in trouble if you have other full hashes over the same data (the domain collision in Gilles' answer). To compensate for this, most hash functions use special bits or other constants when generating a shorter hash. Usually, publishing the shorter hash is not a problem. You could play it safe, however, and start off by hashing a specific magic value in advance, or by using SHAKE instead.
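The "magic value" idea can be sketched as follows, using Python's standard `hashlib`. The constant `DOMAIN_TAG` and both function names are hypothetical, chosen only for illustration. A plain truncation is a prefix of the full SHA3-256 hash of the same data, whereas hashing a fixed tag first gives a value unrelated to it:

```python
import hashlib

# Hypothetical application-specific magic value, fixed for the whole system.
DOMAIN_TAG = b"example-app/slot-hash/v1:"


def naive_short_hash(data: bytes) -> bytes:
    """Leftmost 31 bytes of SHA3-256: a prefix of the full hash of `data`."""
    return hashlib.sha3_256(data).digest()[:31]


def tagged_short_hash(data: bytes) -> bytes:
    """Leftmost 31 bytes of SHA3-256 over the magic value followed by `data`."""
    return hashlib.sha3_256(DOMAIN_TAG + data).digest()[:31]


msg = b"example message"  # arbitrary illustrative input
full = hashlib.sha3_256(msg).digest()
print(naive_short_hash(msg) == full[:31])   # the truncation reveals the prefix
print(tagged_short_hash(msg) == full[:31])  # the tagged hash does not
```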

Maarten Bodewes