Is SHA-256 bijective on a certain domain?

Question

Following the discussion on bijectivity, and given that a function's inverse is defined relative to a domain and codomain, I would like to ask whether we can find subsets on which this function is bijective.

Take, say, the subset of all alphanumeric strings of exactly 5 characters. The cardinality of this subset should be the number of arrangements of 62 characters in 5 positions = 62! / 57! ~ 776 million. This is definitely lower than 2^256. Is this enough to say the function is injective on this domain, or do I need to calculate every single value to reach this conclusion? What happens to surjectivity in this case?

OBS: I may be confusing this with continuous functions, so do let me know if I have gotten something wrong.

Paul Uszak · Accepted Answer · 2021-04-01T22:36:20.623

Of course it is. If you tightly restrict the input domain, then the problem is simple. The function is deterministic, so just feed it inputs drawn from the input domain of interest (domain $A$). You then keep the unique hashes (co-domain $O$) and discard the colliding input/output pairs to create an input sub-domain $B \subseteq A$. You will have eliminated collisions and will have a bijection $B \to O$.
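A minimal sketch of that brute-force construction, assuming the domain $A$ is the set of alphanumeric strings of a fixed length. The helper name `restrict_to_bijection` is made up for illustration, and length 2 is used so that it runs instantly; length 5 follows the same pattern but needs far more time and memory.

```python
import hashlib
import string
from itertools import product

ALPHABET = string.ascii_letters + string.digits  # the 62 alphanumeric characters

def restrict_to_bijection(length):
    """Hash every string in A and keep only inputs whose digest is unique in A.

    Returns (forward, inverse): forward maps B -> O, inverse maps O -> B.
    """
    by_digest = {}
    for chars in product(ALPHABET, repeat=length):
        s = "".join(chars)
        digest = hashlib.sha256(s.encode("ascii")).digest()
        by_digest.setdefault(digest, []).append(s)
    # Discard any digest that was produced by more than one input.
    forward = {inputs[0]: d for d, inputs in by_digest.items() if len(inputs) == 1}
    # On B the "inverse" is nothing more than a lookup table.
    inverse = {d: s for s, d in forward.items()}
    return forward, inverse

forward, inverse = restrict_to_bijection(2)
print(len(forward), len(inverse))
```

Note that the inverse obtained this way is a table built by hashing everything forward, not an analytic inversion of SHA-256.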

Note: We haven't seen collisions on SHA-256 outputs yet, but the above theory holds. And using this brute-force approach, the restricted function becomes more likely to be a bijection as the input domain decreases in cardinality, as $p(\text{bijection}) \propto \frac{1}{|A|}$ through simple computability.

But I'm having a hard time understanding a cryptographic use for such strange domains. I'm unconvinced that sha256inv would actually exist at all, as restricting inputs is kinda cheating. And the function still only computes analytically one way, as $\text{sha256}: B \to O$, which is due to fundamental pre-image resistance. $\text{sha256inv}: O \to B$ remains elusive. And a general $\text{sha256inv}: O \to A$ must remain impossible, as you've deliberately eliminated collisions which we know mathematically exist.

P.S. $|A| \approx 916 \times 10^6$ if you consider 5 characters, each chosen at random from 62 alphanumeric values. That's easily computable on an enthusiast's machine.

P.P.S. My last para refers to your comments.

score 0 · Answer 2 · answered Apr 01 '21 at 20:43

As you know, cryptographic hash functions generally produce a fixed-size output while accepting inputs of much larger size. As such, collisions necessarily exist but finding them is prohibitively difficult.

As for the question of whether, given a particular set of inputs, there is an easy way to determine if two elements have an identical hash ("easy" here meaning substantially less work than computing all of the hashes): the answer has to be no. If that were an easy question to answer, you could use it in a fast algorithm to find a collision. The sketch is: increase the range $0..2^k$ until it contains a collision. Now run a binary search from each end to find the smallest range that contains a collision. The values you are looking for are the endpoints. You've now broken the hash with $O(n)$ work, where $n$ is the hash length.
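The reduction can be sketched as follows. Everything here is an illustrative assumption: `toy_hash` truncates SHA-256 to 16 bits so that collisions actually occur, inputs are integers encoded as 8 bytes, and `has_collision` stands in for the hypothetical fast oracle (it is implemented by brute force, since no such fast oracle is known).

```python
import hashlib

def toy_hash(i):
    """Toy 16-bit hash: first 2 bytes of SHA-256 over the 8-byte encoding of i."""
    return hashlib.sha256(i.to_bytes(8, "big")).digest()[:2]

def has_collision(lo, hi):
    """Stand-in oracle: do two inputs in [lo, hi) share a hash? (Brute force here.)"""
    seen = set()
    for i in range(lo, hi):
        h = toy_hash(i)
        if h in seen:
            return True
        seen.add(h)
    return False

def find_collision():
    # Step 1: grow the range [0, 2^k) until it contains a collision.
    k = 1
    while not has_collision(0, 2 ** k):
        k += 1
    # Step 2: binary search for the smallest hi with a collision in [0, hi).
    low, high = 2, 2 ** k
    while low < high:
        mid = (low + high) // 2
        if has_collision(0, mid):
            high = mid
        else:
            low = mid + 1
    hi = low
    # Step 3: binary search for the largest lo with a collision in [lo, hi).
    low, high = 0, hi - 2
    while low < high:
        mid = (low + high + 1) // 2
        if has_collision(mid, hi):
            low = mid
        else:
            high = mid - 1
    # The endpoints of the minimal range are the colliding pair.
    return low, hi - 1

a, b = find_collision()
print(a, b, toy_hash(a) == toy_hash(b))
```

The number of oracle calls grows linearly with the hash length in bits, which is why a genuinely fast oracle would contradict collision resistance.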

Also, your 5 alphanumeric characters calculation should be $62^5$, since characters may repeat.
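For reference, the two counts can be checked directly. The last few lines go one step beyond the thread and are an assumption, not a proof: they model SHA-256 as a uniformly random function and apply a union bound over all input pairs to estimate how likely any collision is within this domain. A domain smaller than $2^{256}$ does not by itself imply injectivity.

```python
from math import perm

ordered_no_repeats = perm(62, 5)  # 62!/57!: strings with no repeated character
all_strings = 62 ** 5             # strings where characters may repeat

print(ordered_no_repeats)  # 776520240
print(all_strings)         # 916132832

# Heuristic only: treat SHA-256 as a random function with 2^256 outputs.
# Union bound: n(n-1)/2 pairs, each colliding with probability 2^-256.
n = all_strings
collision_bound = n * (n - 1) / 2 / 2 ** 256
print(collision_bound)
```

Under that heuristic the bound is astronomically small, but certainty about injectivity on this domain would still require computing all the hashes.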
