
I was reading about reversible computing while thinking about FPGA circuits. I realized that it would be possible to write a reversible SHA-256 algorithm if you stored some additional information as "garbage". That is, the algorithm/circuit would compute a SHA-256 hash, but it would also produce some metadata that allows you to walk backwards through the logic gates and arrive at the original preimage.

A NOT gate is already reversible, and an XOR gate just needs to pass one of its inputs through as an output. Toffoli gates can be used for everything else, so any boolean function can be made reversible. A fully unrolled SHA-256 algorithm is just a boolean function with an input and an output.
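As a minimal sketch of the gates described above (the function names are illustrative, not taken from any library), the following models a reversible XOR and a Toffoli gate on single bits and checks that each one undoes itself. The `reversible_and` helper shows where garbage comes from: the two inputs have to be carried along next to the result.

```python
from itertools import product

def cnot(a, b):
    """Reversible XOR: passes a through and replaces b with a XOR b."""
    return a, a ^ b

def toffoli(a, b, c):
    """Toffoli (CCNOT): flips c only when both a and b are 1."""
    return a, b, c ^ (a & b)

def reversible_and(a, b):
    """AND via a Toffoli gate with a zero ancilla; a and b survive as garbage."""
    return toffoli(a, b, 0)

# Both gates are their own inverse: applying one twice restores the input.
for a, b in product((0, 1), repeat=2):
    assert cnot(*cnot(a, b)) == (a, b)
for bits in product((0, 1), repeat=3):
    assert toffoli(*toffoli(*bits)) == bits
```

Every AND in an unrolled circuit built this way leaves its inputs behind, which is why the raw garbage count grows with the number of gates rather than with the input size.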

If you used reversible gates for every step in a SHA-256 circuit, how many garbage bits would be in the output? And would it be possible to compress this data?

Let's say you have a 447-bit preimage (so one 512-bit chunk). Would it be possible to compress the garbage from a reversible SHA-256 circuit into fewer than 447 bits, so that [SHA-256 hash plus garbage bits] would be enough to reconstruct the preimage? (If not, then you may as well just store the original preimage.)

Another way of framing this question: How many bits of information about the preimage are stored in a SHA-256 hash?

Mike Edward Moras
ndbroadbent

2 Answers


I believe your question can be answered without any knowledge of the algorithm, only information theory. You say:

Would it be possible to compress the garbage from a reversible SHA-256 circuit into less than 511 bits, so that [SHA-256 hash plus garbage bits] would be enough to reconstruct the preimage?

This can be restated as:

Is it possible for [an algorithm] to compress all 511-bit inputs into strings of fewer than 511 bits?

The answer is "no", because there are 2^511 different inputs but only 2^511 - 1 bit-strings shorter than 511 bits, so at least two inputs would have to share the same compressed form.
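The counting behind this can be checked directly. The snippet below is only an illustration (the helper name is made up): it adds up the number of bit-strings of every length from 0 to n - 1 and confirms the total is one short of the number of n-bit inputs.

```python
def strings_shorter_than(n):
    """Count all bit-strings of length 0 through n - 1."""
    return sum(2**k for k in range(n))

n = 511
# One fewer short string than there are n-bit inputs, so no lossless
# scheme can shorten every input (pigeonhole principle).
assert strings_shorter_than(n) == 2**n - 1
```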

We could relax the constraint to:

Is it possible for [an algorithm] to compress some 511-bit inputs into strings of fewer than 511 bits (while making others longer)?

This, on the other hand, is trivially true:

  • If all "garbage bits" are 1, save the SHA-256 hash followed by 1
  • Else, save the SHA-256 hash followed by 0, then the "garbage bits"

Obviously, this wouldn't be useful, but since you can choose your compression based on observed results, you can always demonstrate some outputs that are shorter than the input. Other inputs, though, are guaranteed to give outputs longer than 511 bits.
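A rough sketch of the flag-bit scheme from the two bullets, with the hash itself left out because it is stored unchanged in both cases. Bit-strings are modelled as Python strings of '0' and '1', and the decoder is assumed to know the original garbage length `n`; both are choices made for illustration.

```python
def encode(garbage: str) -> str:
    """Encode a '0'/'1' string using a one-bit flag."""
    if garbage and set(garbage) == {"1"}:
        return "1"            # all-ones case: the flag alone is enough
    return "0" + garbage      # every other case costs one extra bit

def decode(encoded: str, n: int) -> str:
    """Invert encode(); n is the known length of the garbage string."""
    return "1" * n if encoded == "1" else encoded[1:]
```

Exactly one input gets much shorter, and every other input grows by one bit, which is the trade-off the pigeonhole argument forces.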

Compression is only useful if it exploits patterns in the input, either because the input space is constrained, or because some inputs are more likely than others. Since hash algorithms are designed to do the opposite, avalanching small changes in input and making all outputs equally probable, it's unlikely that the "garbage bits" would exhibit any patterns that could be usefully compressed.


From your comments, I realise that your actual aim was not to fit the result in less than 511 bits, but in less than 256 + 511 bits. We can trivially prove that exactly 256 + 511 bits is possible: plaintext XOR hash gives a string 511 bits long, which can be combined with the hash to recover the plaintext.
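A sketch of the XOR construction, under some stated assumptions: the message is held as an integer below 2^511, and it is hashed as a 64-byte big-endian encoding purely so that `hashlib` (which works on bytes) can be used. Hashing a true 511-bit string would need a bit-level SHA-256 implementation.

```python
import hashlib

N_BITS = 511  # preimage length used in this answer

def sha256_int(message: int) -> int:
    """SHA-256 of the message encoded as 64 big-endian bytes (illustrative choice)."""
    digest = hashlib.sha256(message.to_bytes(64, "big")).digest()
    return int.from_bytes(digest, "big")

def pack(message: int):
    """Return the 256-bit hash and the message masked with that hash."""
    h = sha256_int(message)
    return h, message ^ h     # masked value still fits in 511 bits

def unpack(h: int, masked: int) -> int:
    """Recover the message by XORing the mask off again."""
    return masked ^ h
```

Nothing is saved here: the pair is 256 + 511 bits, the same as storing the hash next to the plaintext.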

But we can do better by knowing two facts about SHA-256, which are common to all well-designed hash algorithms:

  • It always gives the same output for the same input (if this wasn't the case, it would not be useful at all). This implies that its output is entirely dependent on its input. It also implies that the state of your reversible circuit will be entirely dependent on the input.
  • Its output is evenly distributed; that is, every output is equally likely to occur. I don't know if this is proven for SHA-256, but it is a desirable property of any hash, so we can safely assume that it is true to some approximation.

This is enough to answer your last question:

How many bits of information about the preimage are stored in a SHA-256 hash?

256.

We know this because 2^256 equally likely outputs represent 256 bits of information, and that information all comes from the input; otherwise the same input would sometimes produce different output.

This in turn answers the original question:

how many "metadata" bits would be required for reversibility?

511 - 256 = 255.

To confirm this, divide the 2^511 possible input strings among the 2^256 possible output strings: each hash can be generated by 2^511 / 2^256 = 2^255 different inputs. Each of those inputs creates a distinct state of your reversible circuit, so there are 2^255 such states for each hash output. If we list all the states for a particular hash output, we can label them all uniquely using 255 bits.

So, mathematically, it must be possible to derive some scheme that classifies the state of the circuit into 255 bits, which, when combined with the 256 bit hash, can be used to reconstruct the input.
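The labelling argument can be tried at toy scale. In the sketch below, 12-bit inputs and a 4-bit hash stand in for 511 and 256 bits, and the "hash" is just the first 4 bits of SHA-256 over a 2-byte encoding; all of these sizes and names are assumptions made for illustration. Each input is stored as its hash plus its position in the list of inputs that share that hash.

```python
import hashlib
from collections import defaultdict

N_BITS, HASH_BITS = 12, 4   # toy sizes standing in for 511 and 256

def toy_hash(x: int) -> int:
    """First HASH_BITS bits of SHA-256 over a 2-byte encoding of x."""
    digest = hashlib.sha256(x.to_bytes(2, "big")).digest()
    return digest[0] >> (8 - HASH_BITS)

# Group every possible input under the hash it produces.
preimages = defaultdict(list)
for x in range(2**N_BITS):
    preimages[toy_hash(x)].append(x)

def encode(x: int):
    """Return (hash, label), where label is x's position within its class."""
    h = toy_hash(x)
    return h, preimages[h].index(x)

def decode(h: int, label: int) -> int:
    """Look the input back up from its hash and label."""
    return preimages[h][label]

# Bits needed for a fixed-width label that covers the largest class.
largest = max(len(v) for v in preimages.values())
label_bits = (largest - 1).bit_length()
```

With perfectly even output, every class would hold 2^8 inputs and `label_bits` would be N_BITS - HASH_BITS = 8. Since the even distribution only holds approximately, the largest class can be a little bigger, in which case a fixed-width label costs one bit more. The lookup table also makes clear that such a scheme exists mathematically but is not something you could build for the real 511-bit case.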


forest
IMSoP

If you wrote a reversible SHA-256 algorithm, how many "metadata" bits would be required for reversibility?

If you wrote such an algorithm, it would not be SHA-256 anymore. It would also lose its cryptographic usefulness.

SHA-256 is a non-reversible, one-way compression function. Removing the non-reversibility would remove the core property that makes SHA-256 a cryptographically secure hash.

In the end you would have a lossless, two-way compression function, and a bad one too, as it would waste too much space compared to simple compression formats like (for example) ZIP.

Also, since the goal ends up right there, it is not about cryptography anymore… rendering any further discussion about your idea off-topic.

TL;DR:

  • Assuming you do not modify SHA-256 and simply rely on your "metadata" for reconstruction, the size of your "metadata" would be equal to the size of the input.
  • Modifying SHA-256 itself to optimize the space needed for any kind of "metadata" is not an option, as the algo would lose its cryptographic usefulness (and not be SHA-256 anymore anyway).

Besides that, your question seems to try to work around something which has been asked several times before.

For example, see Would it be possible to generate the original data from a SHA-512 checksum?.

So, in a broad sense your question is a duplicate of all existing questions asking about restoring the original input data from a cryptographically secure hash output.

Any other interpretation of your question (in its current state) would make it an information theory question which would be more on-topic at Computer Science StackExchange.

Mike Edward Moras