4

This is a classical scenario: for our web-based authentication platform we have a username and a password field. The user enters these values and presses Log In. The browser gathers the user data, calculates a digest of the password and sends everything to the server side. We don't want to send the password over the network, it might not be safe.

On the server side we have the database, which stores the password hashes in a table (among other things).

Initially we used MD5 for this purpose, but after a while we dropped it: it is the most widely attacked message digest, with large rainbow tables available, so anyone who captures the packet can easily recover the password (if it's a weak one).

Now we have switched to SHA1 for calculating the digest. Only one problem remains. The initial database layout used 32 characters for the message digest of the password (the length of the MD5 hex string), but the SHA1 digest is 40 characters long (when encoded as a hex string). The bigger problem is that we don't want to increase the length of the digest column (nor the length of the digest itself), since the digest value is used at some stages in a few algorithms that rely on it having 32 characters.

So someone from the architecture team came up with an idea: let's add an extra validation step to the login procedure and ask the user for a 4-digit PIN. Once we have this PIN, we remove the characters found at the positions given by the PIN digits from the SHA1 digest of the password, twice: once counting from the beginning and once counting from the end. Removing 8 characters this way leaves a 32-character digest, which cannot be traced back to the password since it no longer contains all the required information.

There is only one thing which bothers us: what are the theoretical chances that two different passwords, which give two different hashes, will give the same result after applying two different character removal algorithms (based on two different PINs)?

And since we have neither a cryptanalyst nor a cryptographer on our team, we have to turn to the community :) Thank you for helping.

Maarten Bodewes
Ferenc Deak

2 Answers

12

Short answer: don't. Use a password hash like PBKDF2, scrypt or bcrypt.

Also, if at all possible, use a library that takes care of the low-level details, such as the password database, for you. For example, passlib might work if you use Python. I'm sorry if that sounds blunt, but that's how it is.
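
As a rough illustration of what such a password hash looks like, here is a minimal sketch using PBKDF2 from Python's standard `hashlib` module. The function names, the 16-byte salt and the iteration count are assumptions chosen for the example, not recommendations from this answer; a maintained library would normally pick and store these parameters for you.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed work factor for illustration only; tune it for your hardware.
ITERATIONS = 100_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a new password, using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(digest, expected)
```

The salt has to be stored next to the digest so that `verify_password` can recompute the same value at login.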

To answer your actual questions:

There is only one thing which bothers us: what are the theoretical chances that two different passwords, which give two different hashes, will give the same result after applying two different character removal algorithms (based on two different PINs)?

The chance that two different passwords give the same final hash is the same regardless of whether the PINs are different or not. The chance is $2^{-n}$ for an $n$-bit final hash, i.e. $2^{-128}$ for a 128-bit hash.

(To see why the PINs don't matter, consider the output hash as a random number. That's the best you can do. Different PINs would only matter if the password hashes are equal.)
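
To put a number on that, the arithmetic for a 32-character hex digest can be sketched as follows. This assumes, as above, that the final hash behaves like a uniformly random value; the variable names are only for illustration.

```python
HEX_CHARS = 32            # length of the stored digest in hex characters
BITS_PER_HEX_CHAR = 4     # each hex character encodes 4 bits

bits = HEX_CHARS * BITS_PER_HEX_CHAR       # 128 bits in the final hash
collision_probability = 2.0 ** -bits       # chance that two given inputs collide

print(bits)
print(collision_probability)  # on the order of 3e-39
```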

You can instead just truncate the hash, including the PIN in the string to be hashed together with the password. Also, if you went this way, you might as well use SHA-256 which is believed secure, instead of SHA-1 which has theoretical attacks that truncation might make practical.
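
A minimal sketch of that truncation approach, assuming the PIN is joined to the password with a `:` separator (the separator and the function name are hypothetical choices for this example). Note that this is still a plain fast hash, so the caveat about password hashes applies to it as well.

```python
import hashlib


def truncated_pin_hash(password: str, pin: str) -> str:
    """Hash PIN and password together with SHA-256 and keep 32 hex characters."""
    # Assumption: the PIN contains only digits, so ":" cannot appear in it
    # and the combined string is unambiguous.
    data = (pin + ":" + password).encode("utf-8")
    full_hex = hashlib.sha256(data).hexdigest()  # 64 hex characters
    return full_hex[:32]                         # first 128 bits
```

For example, `truncated_pin_hash("hunter2", "1234")` returns a 32-character string that fits the existing column, and changing either the PIN or the password changes the result.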


Again, you should not use just a normal hash to protect passwords. Further, your idea of hashing the password in the client is insecure:

The browser gathers the user data, calculates a digest of the password and sends everything to the server side. We don't want to send the password over the network, it might not be safe.

This is a good instinct, but it doesn't work as you described it. If you don't also hash on the server side, anyone who steals your password database will be able to use a stolen hash to log in directly. (Your site/app may ask for a password, but the attacker doesn't have to use it.)

You can hash both on the client and the server, although that only protects the plaintext password in case the transport encryption is broken. That's nice for users who reuse passwords, but will not protect their accounts on your system any more than just hashing on the server.

(Unless the connection over which you send the password is encrypted, an eavesdropper can always just capture the password or hash and log in!)
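
Hashing on both sides could look roughly like the sketch below, where the server treats whatever the client sends as the password and stretches it again before storing it. The client step would really run in the browser; it is modelled in Python here only to keep the example in one language, and all names and parameters are assumptions.

```python
import hashlib
import hmac
import os
from typing import Tuple


def client_digest(password: str) -> str:
    """What the browser would send instead of the plaintext password."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def server_store(received: str) -> Tuple[bytes, bytes]:
    """Server side: salt and stretch the received value before storing it."""
    salt = os.urandom(16)
    stored = hashlib.pbkdf2_hmac("sha256", received.encode("ascii"), salt, 100_000)
    return salt, stored


def server_check(received: str, salt: bytes, stored: bytes) -> bool:
    """Server side: recompute from the received value and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", received.encode("ascii"), salt, 100_000)
    return hmac.compare_digest(candidate, stored)
```

With this layout a stolen database row is no longer enough to log in, because the attacker would need the client digest, not the stored value.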


One more suggestion: if you really are limited to 32 characters in the password database, consider using Base64 encoding for the hashes if at all possible. That allows you to store 192 bits of the password hash. Binary would be even better, allowing the full 256 bits of scrypt or PBKDF2 with SHA-256.
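
The Base64 arithmetic works out because 24 bytes (192 bits) encode to exactly 32 characters with no padding. A small sketch, assuming PBKDF2 with a 24-byte output; the function name and iteration count are illustrative, and the salt would have to be stored in a separate column.

```python
import base64
import hashlib


def hash_to_32_chars(password: str, salt: bytes) -> str:
    """Derive 192 bits with PBKDF2 and encode them as 32 Base64 characters."""
    raw = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000, dklen=24
    )
    # 24 bytes / 3 * 4 = 32 Base64 characters, so no "=" padding is needed.
    return base64.b64encode(raw).decode("ascii")
```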

otus
2

Sending a hashed password adds no value. Consider this: whatever the client sends after hashing is what the server effectively treats as the password. Additionally, you are leaking an implementation detail to the client. SSL/TLS already ensures channel security, and a slow password hashing algorithm on the server is a proven approach. Precomputing the hash on the client does not prevent an attacker from carrying out a replay attack. Using bcrypt or scrypt on the server side, on the other hand, allows you to increase the complexity for new passwords without breaking backward compatibility with existing passwords.

Bugsta