The question's approach won't work as-is. Here are two intuitively convincing arguments that could be made rigorous. Both center on an inadequate choice of hash inputs during training.
- The neural network is trained to output integers at most $N$, and has no incentive to produce integers much larger. Thus when approximating $f$, it has no reason to output values much above $N$. But for every hash not in the training data, the correct value is above $N$. The network's guess for these will almost always be too low by a huge factor, thus pointless.
- The inputs of SHA-256 in the training phase are bitstrings of at most 60 bits (assuming a trillion is $10^{18}$, which fits in 60 bits). Considering the padding in SHA-256, it follows that in the 512-bit block at the input of the first and only invocation of the compression function, bits 62 to 505 are always zero (see the sketch after this list). The neural network thus has no way to learn what SHA-256 does with these bits. Unlearned bits 62 to 250 matter when evaluating $f$ at most points outside the training set, to the degree that each ignored bit roughly halves the likelihood of producing the correct output for such a point.
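To make the second argument concrete, here is a minimal sketch in Python (assuming integers are encoded as 8-byte big-endian strings, an encoding the question does not specify) that reproduces SHA-256's padding and checks which bits of the single 512-bit block ever vary over such training inputs:

```python
# Minimal sketch, assuming integers are encoded as 8-byte big-endian strings
# (the question does not specify an encoding).

def sha256_pad(msg: bytes) -> bytes:
    """Append SHA-256 padding to msg (valid for messages shorter than 56 bytes)."""
    bit_len = 8 * len(msg)
    padded = msg + b'\x80'                   # a single '1' bit, then zeros
    padded += b'\x00' * (56 - len(padded))   # zero-fill up to the length field
    padded += bit_len.to_bytes(8, 'big')     # 64-bit big-endian message length
    return padded                            # exactly 64 bytes = 512 bits

def nonzero_bit_positions(block: bytes) -> set:
    """1-indexed positions of the bits set in a 512-bit block."""
    bits = ''.join(f'{b:08b}' for b in block)
    return {i + 1 for i, bit in enumerate(bits) if bit == '1'}

ever_nonzero = set()
for n in (0, 1, 10**12, 10**18):             # representative training inputs
    ever_nonzero |= nonzero_bit_positions(sha256_pad(n.to_bytes(8, 'big')))

# With 8-byte messages, bits 66 to 505 of the block are zero for every
# training input: the network never sees them vary.
assert not any(66 <= p <= 505 for p in ever_nonzero)
```

With messages of exactly 60 bits rather than whole bytes, the always-zero range starts at bit 62, matching the figures above.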
The above counterarguments fail if we change the integers in the training data to have the correct distribution over $[0,2^{384})$. After that, any counterargument must to some degree invoke that SHA-256 meets its goal of being a "good" hash function. Meta-proof: the modified method could be made to work for particularly poor hash functions (e.g., one that merely truncates its input to the first 256 bits).
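As a concrete (and deliberately silly) illustration of that meta-proof, not taken from the question: with a truncation hash, every training pair contains its own answer, so the function to approximate is essentially the identity.

```python
import os

def poor_hash(msg: bytes) -> bytes:
    """A deliberately poor 'hash': keep only the first 256 bits of the input."""
    return msg[:32]

# With training inputs drawn from the correct distribution, the preimage is
# visible verbatim in the hash value; indeed the hash value is its own preimage.
msg = os.urandom(48)
h = poor_hash(msg)
assert poor_hash(h) == h
```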
But even with good training data, and restricting to weak-but-not-ridiculously-so hashes (e.g., ones that could be proposed as a feasible hash-cryptanalysis exercise in a one-trimester course on applied cryptography), the approach will fail, at least because it tries to break the hash from input/output pairs without using positively essential information: the definition of the hash, which in hash cryptanalysis is assumed available, and in the case of SHA-256 is very arbitrary (including a 256-bit IV and a 2048-bit table of round constants).
Where is the flaw in the question's reasoning?
The gaping one lies in "Neural networks are universal function approximators". To the extent that statement is correct, it comes with many conditions, including that the training data covers a suitable portion of the input set, and that the function has some kind of partial continuity, to cite two conditions that are not met here (the first because of the poor training data, the second because any passable hash makes the hashes of nearby inputs appear unrelated).
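That second condition is easy to check empirically; the small demo below uses Python's standard hashlib (the 8-byte encoding of the integers is my assumption and irrelevant to the point): digests of adjacent integers differ in about half of their 256 bits.

```python
import hashlib

def sha256_int(n: int) -> int:
    """SHA-256 of an 8-byte big-endian encoding of n, as an integer."""
    return int.from_bytes(hashlib.sha256(n.to_bytes(8, 'big')).digest(), 'big')

# Nearby inputs, unrelated outputs: adjacent integers yield digests differing
# in roughly 128 of 256 bits (the avalanche effect).
diffs = [bin(sha256_int(n) ^ sha256_int(n + 1)).count('1') for n in range(1000)]
print(sum(diffs) / len(diffs))   # close to 128
```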
A better approach to breaking hash preimage resistance using neural networks (as attempted in the question) would be training with
- as input, a hash value together with a CNF expression of the SAT problem of finding a preimage of that value, for a hash function similar to the target hash to be attacked;
- as desired output, an input of the hash function yielding that hash value, that is, a solution to the SAT problem.
It's computationally easy to generate such (input, desired output) pairs automatically: pick a random variation of the target hash function and a random input to that hash, and compute the hash value from these (see the sketch below). I conjecture that with appropriate neural network techniques, enough training, and a simple/weak enough target hash function, the network would be able to solve a sizable portion of the solvable preimage problems for the target hash.
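Here is a hedged toy sketch of that data generation, under strong simplifying assumptions: the "hash" is a small random GF(2)-linear map (far weaker than even a classroom-exercise hash), chosen only because its preimage problem is straightforward to express as CNF with Tseitin-style XOR clauses; the helper names (make_variant, toy_hash, preimage_cnf) are mine, not the question's.

```python
import random

def make_variant(n_in=16, n_out=16, seed=0):
    """A random 'variation' of a toy GF(2)-linear hash: output bit j is the
    XOR of the input bits selected by rows[j]."""
    rng = random.Random(seed)
    return [[i for i in range(n_in) if rng.random() < 0.5] for _ in range(n_out)]

def toy_hash(rows, x_bits):
    """Evaluate the toy linear hash on a list of 0/1 input bits."""
    return [sum(x_bits[i] for i in row) % 2 for row in rows]

def preimage_cnf(rows, h_bits, n_in):
    """Clauses (DIMACS-style signed literals) asserting toy_hash(rows, x) == h_bits,
    with variables 1..n_in for the input bits and Tseitin auxiliaries for the XORs."""
    clauses = []
    next_var = n_in + 1

    def xor_gate(a, b):
        nonlocal next_var
        t = next_var
        next_var += 1
        # t <-> (a XOR b), as four clauses
        clauses.extend([[-t, a, b], [-t, -a, -b], [t, -a, b], [t, a, -b]])
        return t

    for row, bit in zip(rows, h_bits):
        if not row:                        # empty XOR is the constant 0
            if bit == 1:
                clauses.append([])         # unsatisfiable hash bit
            continue
        acc = row[0] + 1                   # SAT variable of the first input bit
        for i in row[1:]:
            acc = xor_gate(acc, i + 1)
        clauses.append([acc] if bit == 1 else [-acc])
    return clauses, next_var - 1

def training_pair(seed):
    """One (input, desired output) pair: the preimage problem's CNF plus the
    hash value as the input, and a known preimage as the desired output."""
    rng = random.Random(seed)
    rows = make_variant(seed=seed)                # random variation of the hash
    x = [rng.randrange(2) for _ in range(16)]     # random input to that hash
    h = toy_hash(rows, x)                         # hash value computed from it
    clauses, n_vars = preimage_cnf(rows, h, 16)
    return (clauses, h), x

(problem, h), preimage = training_pair(seed=42)
assert toy_hash(make_variant(seed=42), preimage) == h
```

A real attempt would serialize the clause list and the hash value into whatever input representation the network uses, and train it to emit the preimage bits.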
Note: in neural network training, it's useful to restrict the training data to instances similar to the problem to be solved. Here, if we want to compute a preimage for a particular target hash value, we should filter the training data so that its hash values differ from that target value by as few bits as possible.
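For instance (a trivial helper of my own devising, not something from the question), the filter can rank candidate training pairs by the Hamming distance between their hash value and the particular target:

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length hash values."""
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

def filter_training_set(pairs, target_hash, keep=10_000):
    """Keep the (hash_value, preimage) pairs whose hash value is closest,
    in bits, to the target hash value we actually want to invert."""
    return sorted(pairs, key=lambda p: hamming(p[0], target_hash))[:keep]
```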
I think this is unlikely to beat state-of-the-art SAT solvers fed a straight CNF expression of the problem, and those already fail miserably against even fair hashes. So don't call me optimistic.