The question's approach won't work as-is. Here are two intuitively convincing arguments that could be made rigorous. Both center on an inadequate choice of hash inputs during training.
- The neural network is trained to output integers at most $N$, and has no incentive to produce integers much larger. Thus when approximating $f$, it has no reason to output values much above $N$. But for every hash not in the training data, the correct value is above $N$. The network's guess for these will almost always be too low by a huge factor, thus pointless.
- The inputs of SHA-256 in the training phase are bitstrings of at most 60 bits (assuming a trillion is $10^{18}$, which fits in 60 bits). Considering the padding in SHA-256, it follows that in the 512-bit block at the input of the first and only invocation of the compression function, bits 62 to 505 are always zero (see the sketch after this list). The neural network thus has no way to learn what SHA-256 does with these bits. Unlearned bits 62 to 250 matter when evaluating $f$ at most points outside the training set, to the degree that each ignored bit roughly halves the likelihood of producing the correct output for such a point.
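To make the second argument concrete, here is a minimal sketch in Python (assuming integers are encoded as 8-byte big-endian strings, an encoding the question does not specify) that reproduces SHA-256's padding and checks which bits of the single 512-bit block ever vary over such training inputs:

```python
# Minimal sketch, assuming integers are encoded as 8-byte big-endian strings
# (the question does not specify an encoding).

def sha256_pad(msg: bytes) -> bytes:
    """Append SHA-256 padding to msg (valid for messages shorter than 56 bytes)."""
    bit_len = 8 * len(msg)
    padded = msg + b'\x80'                   # a single '1' bit, then zeros
    padded += b'\x00' * (56 - len(padded))   # zero-fill up to the length field
    padded += bit_len.to_bytes(8, 'big')     # 64-bit big-endian message length
    return padded                            # exactly 64 bytes = 512 bits

def nonzero_bit_positions(block: bytes) -> set:
    """1-indexed positions of the bits set in a 512-bit block."""
    bits = ''.join(f'{b:08b}' for b in block)
    return {i + 1 for i, bit in enumerate(bits) if bit == '1'}

ever_nonzero = set()
for n in (0, 1, 10**12, 10**18):             # representative training inputs
    ever_nonzero |= nonzero_bit_positions(sha256_pad(n.to_bytes(8, 'big')))

# With 8-byte messages, bits 66 to 505 of the block are zero for every
# training input: the network never sees them vary.
assert not any(66 <= p <= 505 for p in ever_nonzero)
```

With messages of exactly 60 bits rather than whole bytes, the always-zero range starts at bit 62, matching the figures above.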
The above counterarguments fail if we change the integers in the training data to have the correct distribution over $[0,2^{384})$. After that, any counterargument must to some degree invoke that SHA-256 meets its goal of being a "good" hash function. Meta-proof: the modified method could be made to work for particularly poor hash functions (e.g., one that merely truncates its input to the first 256 bits).
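As a concrete (and deliberately silly) illustration of that meta-proof, not taken from the question: with a truncation hash, every training pair contains its own answer, so the function to approximate is essentially the identity.

```python
import os

def poor_hash(msg: bytes) -> bytes:
    """A deliberately poor 'hash': keep only the first 256 bits of the input."""
    return msg[:32]

# With training inputs drawn from the correct distribution, the preimage is
# visible verbatim in the hash value; indeed the hash value is its own preimage.
msg = os.urandom(48)
h = poor_hash(msg)
assert poor_hash(h) == h
```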
But even with good training data, and restricting to weak-but-not-ridiculously-so hashes (e.g., ones that could be proposed as a feasible hash-cryptanalysis exercise in a one-trimester course on applied cryptography), the approach will fail, at least because it tries to break the hash from input/output pairs without using positively essential information: the definition of the hash, which in hash cryptanalysis is assumed available, and in the case of SHA-256 is very arbitrary (including a 256-bit IV and a 2048-bit table of round constants).
Where is the flaw in the question's reasoning?
The gaping one lies in "Neural networks are universal function approximators". To the extent that statement is correct, it comes with many conditions, including that the training data covers a suitable portion of the input set, and that the function has some kind of partial continuity, to cite two conditions that are not met here (the first because of the poor training data, the second because any passable hash makes the hashes of nearby inputs appear unrelated).
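That second condition is easy to check empirically; the small demo below uses Python's standard hashlib (the 8-byte encoding of the integers is my assumption and irrelevant to the point): digests of adjacent integers differ in about half of their 256 bits.

```python
import hashlib

def sha256_int(n: int) -> int:
    """SHA-256 of an 8-byte big-endian encoding of n, as an integer."""
    return int.from_bytes(hashlib.sha256(n.to_bytes(8, 'big')).digest(), 'big')

# Nearby inputs, unrelated outputs: adjacent integers yield digests differing
# in roughly 128 of 256 bits (the avalanche effect).
diffs = [bin(sha256_int(n) ^ sha256_int(n + 1)).count('1') for n in range(1000)]
print(sum(diffs) / len(diffs))   # close to 128
```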
A better approach to breaking hash preimage resistance using neural networks (as attempted in the question) would be training with
- as input, a hash value together with a CNF expression of the SAT problem of finding a preimage of that value, for a hash function similar to the target hash to be attacked;
- as desired output, an input of the hash function yielding that hash value, that is, a solution to the SAT problem.
It's computationally easy to generate such (input, desired output) pairs automatically: pick a random variation of the target hash function and a random input to that hash, and compute the hash value from these (see the sketch below). I conjecture that with appropriate neural network techniques, enough training, and a simple/weak enough target hash function, the network would be able to solve a sizable portion of the solvable preimage problems for the target hash.
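Here is a hedged toy sketch of that data generation, under strong simplifying assumptions: the "hash" is a small random GF(2)-linear map (far weaker than even a classroom-exercise hash), chosen only because its preimage problem is straightforward to express as CNF with Tseitin-style XOR clauses; the helper names (make_variant, toy_hash, preimage_cnf) are mine, not the question's.

```python
import random

def make_variant(n_in=16, n_out=16, seed=0):
    """A random 'variation' of a toy GF(2)-linear hash: output bit j is the
    XOR of the input bits selected by rows[j]."""
    rng = random.Random(seed)
    return [[i for i in range(n_in) if rng.random() < 0.5] for _ in range(n_out)]

def toy_hash(rows, x_bits):
    """Evaluate the toy linear hash on a list of 0/1 input bits."""
    return [sum(x_bits[i] for i in row) % 2 for row in rows]

def preimage_cnf(rows, h_bits, n_in):
    """Clauses (DIMACS-style signed literals) asserting toy_hash(rows, x) == h_bits,
    with variables 1..n_in for the input bits and Tseitin auxiliaries for the XORs."""
    clauses = []
    next_var = n_in + 1

    def xor_gate(a, b):
        nonlocal next_var
        t = next_var
        next_var += 1
        # t <-> (a XOR b), as four clauses
        clauses.extend([[-t, a, b], [-t, -a, -b], [t, -a, b], [t, a, -b]])
        return t

    for row, bit in zip(rows, h_bits):
        if not row:                        # empty XOR is the constant 0
            if bit == 1:
                clauses.append([])         # unsatisfiable hash bit
            continue
        acc = row[0] + 1                   # SAT variable of the first input bit
        for i in row[1:]:
            acc = xor_gate(acc, i + 1)
        clauses.append([acc] if bit == 1 else [-acc])
    return clauses, next_var - 1

def training_pair(seed):
    """One (input, desired output) pair: the preimage problem's CNF plus the
    hash value as the input, and a known preimage as the desired output."""
    rng = random.Random(seed)
    rows = make_variant(seed=seed)                # random variation of the hash
    x = [rng.randrange(2) for _ in range(16)]     # random input to that hash
    h = toy_hash(rows, x)                         # hash value computed from it
    clauses, n_vars = preimage_cnf(rows, h, 16)
    return (clauses, h), x

(problem, h), preimage = training_pair(seed=42)
assert toy_hash(make_variant(seed=42), preimage) == h
```

A real attempt would serialize the clause list and the hash value into whatever input representation the network uses, and train it to emit the preimage bits.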
Note: in neural network training, it's useful to restrict the training data to instances similar to the problem to be solved. Here, if we want to compute a preimage for a particular target hash value, we should filter the training data so that its hash values differ from that target value by as few bits as possible.
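For instance (a trivial helper of my own devising, not something from the question), the filter can rank candidate training pairs by the Hamming distance between their hash value and the particular target:

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length hash values."""
    return sum(bin(x ^ y).count('1') for x, y in zip(a, b))

def filter_training_set(pairs, target_hash, keep=10_000):
    """Keep the (hash_value, preimage) pairs whose hash value is closest,
    in bits, to the target hash value we actually want to invert."""
    return sorted(pairs, key=lambda p: hamming(p[0], target_hash))[:keep]
```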
I think this is unlikely to beat state-of-the-art SAT solvers fed a straight CNF expression of the problem, and those already fail miserably against even fair hashes. So don't call me optimistic.