10

I came across a question stating:

We have a message consisting of 10,000 characters. After computing its message digest using SHA-1, we decide to change the last 19 characters in the original message. How many bits in the digest will change if it is recomputed, and why?

Unless this is an explicit attack on the hashing scheme, the new hash will differ from the original one (exhibiting a sort of avalanche effect).

But I don't understand how small changes in the input text affect the corresponding hash. Is there a relationship that determines how many bits of the hash are affected when a certain change is made to the input text?

P.S.: I am not 100% sure, but I believe I have read somewhere that a hashing algorithm should change $X$% of the bits of the hash if a character/bit is added to or substituted in the previous input text.

U. Windl
Vasu Deo.S

3 Answers

26

For any one of the SHA hashes, the output should be indistinguishable from random. That means each bit flips with a probability of 50%, so on average half of the bits get flipped, as long as the input message isn't repeated (an identical message produces an identical hash, of course). It doesn't matter how many input bits are removed, added or altered; this always holds as long as the input message isn't identical to the previous one.

Hashes are likely well distributed, so the number of bits that get flipped follows a bell curve, like the one you would get from the total of many dice throws. The same goes for the number of zero bits, or one bits of course. So you cannot give an absolute number as an answer to this question. The result is not a function of $x$, where $x$ is the number of bits flipped in the input, as long as $x$ is larger than zero.
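As a rough illustration of the scenario in the question, here is a minimal Python sketch that hashes a 10,000-character message, changes its last 19 characters, and counts how many digest bits differ. The message contents and the helper name `hamming_distance` are assumptions made for this example; only `hashlib.sha1` comes from the standard library.

```python
import hashlib


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical 10,000-character message; the actual content is arbitrary.
original = b"a" * 10_000
# Change only the last 19 characters.
modified = original[:-19] + b"b" * 19

d1 = hashlib.sha1(original).digest()
d2 = hashlib.sha1(modified).digest()

changed = hamming_distance(d1, d2)
print(f"{changed} of {len(d1) * 8} digest bits changed")
```

The count should land somewhere near 80, but the exact value depends on the two messages, so a single run says little about the average.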

Maarten Bodewes
10

how many bits in the resultant hash will change, if x bits are changed in the original input

50% on average, regardless of how many bits are changed.


SHA-1, like all cryptographic hash functions, attempts to model a pseudorandom function according to the random oracle model.* This means that any change to the input will result in, on average, 50% of the output bits changing. Another way to put it is that each bit has a 50% chance of toggling when the input changes. It doesn't matter what the input is and whether it differs by a single bit or by nineteen characters.

In the random oracle model, every output bit is completely independent of every other bit. Because we don't have access to a "real" random oracle, we can only approximate it using mathematical functions. While this does mean that the output bits are not independent, they appear to be, and any distinguisher would constitute a cryptanalytic attack against the core hash function, which would be pretty big news.

* It doesn't model it perfectly, as evidenced by both the length extension attack and extant collision attacks, but that doesn't matter for the sake of your question. Despite its weaknesses, it still exhibits the avalanche effect extremely well.
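The per-bit claim can be checked empirically. The following is a small sketch, not a rigorous statistical test: it flips a single input bit in many random messages and records how often each of the 160 output bits toggles. The trial count, message length, and fixed seed are arbitrary assumptions chosen for the example.

```python
import hashlib
import random

TRIALS = 2000        # assumed sample size, chosen arbitrarily
DIGEST_BITS = 160    # SHA-1 output length in bits

rng = random.Random(0)            # fixed seed so the run is repeatable
flip_counts = [0] * DIGEST_BITS   # how often each output bit toggled

for _ in range(TRIALS):
    msg = bytearray(rng.getrandbits(8) for _ in range(64))
    h1 = int.from_bytes(hashlib.sha1(msg).digest(), "big")
    msg[0] ^= 0x01                # flip a single input bit
    h2 = int.from_bytes(hashlib.sha1(msg).digest(), "big")
    diff = h1 ^ h2                # set bits mark output bits that changed
    for bit in range(DIGEST_BITS):
        flip_counts[bit] += (diff >> bit) & 1

rates = [count / TRIALS for count in flip_counts]
print(f"per-bit flip rate: min {min(rates):.3f}, max {max(rates):.3f}")
print(f"average bits changed per trial: {sum(rates):.2f}")
```

Every per-bit rate should sit close to 0.5, and the average number of changed bits close to 80.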

forest
6

Actually, the number of changed bits can be described by a formula. The most likely value is 80, which occurs with a probability of about 6%. As you've guessed, the hash exhibits an avalanche effect. On average, each output bit will flip with probability $P = 0.5$ if an input bit flips, and the output bits are independent of each other (as far as we can tell). Thus one input flip or multiple input flips have the same effect. That generates a classic binomial distribution, $Bin(160, 0.5)$, with $\mu = \frac{160}{2} = 80$ and $\sigma = \frac{\sqrt{160}}{2} \approx 6.3$.

And it will look like this:

[Figure: plot of the $Bin(160, 0.5)$ distribution]
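The numbers quoted in this answer can be reproduced with a few lines of Python. This is a minimal sketch that assumes the idealized model of 160 independent output bits, each flipping with probability 0.5; the function name `pmf` is just a label chosen for the example.

```python
from math import comb, sqrt

N_BITS = 160   # SHA-1 digest length in bits
P_FLIP = 0.5   # assumed per-bit flip probability (idealized model)

mean = N_BITS * P_FLIP
sigma = sqrt(N_BITS * P_FLIP * (1 - P_FLIP))


def pmf(k: int) -> float:
    """Probability that exactly k of the 160 bits flip, for p = 0.5."""
    return comb(N_BITS, k) / 2**N_BITS


print(f"mean = {mean}, sigma = {sigma:.2f}")
print(f"P(exactly 80 bits change) = {pmf(80):.4f}")
```

Under this model, exactly 80 changed bits is the single most likely outcome, yet it still occurs in only about 6% of cases, which is why no single number answers the original question.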

Paul Uszak