
I think I've understood properly, but I want to make sure as this will involve money.

  1. The password requirement is a minimum of 16 characters and must contain at least one of each of [Upper, Lower, Digit, Other]
  2. My salt is 64 bytes generated by a cryptographic RNG
  3. I convert the salt to Base64 (without the trailing ==)
  4. I then get the UTF-8 bytes of the concatenation (SaltString + Password)
  5. I then SHA-512 those bytes in a loop 100k times
private static byte[] SaltPassphrase(byte[] salt, string password)
{
    using var sha512 = SHA512.Create();
    // Salt as Base64 string (without the == at the end) followed by password
    string input = $"{Convert.ToBase64String(salt)[0..^2]}{password}";
    // Initial hash, then 100,000 further iterations over the previous digest
    byte[] bytes = sha512.ComputeHash(Encoding.UTF8.GetBytes(input));
    for (int i = 0; i < 100_000; i++)
        bytes = sha512.ComputeHash(bytes);
    return bytes;
}

EDIT: Thanks everyone for your detailed explanations and useful comments!

Peter Morris

4 Answers


Except for the iteration count (we can argue that it is too small ‡) and some entropy loss (*), the answer is no!

  • There is no memory hardness, which is what mainly prevents massive ASIC/GPU searches.
  • There is no use of multiple threads, which is what mainly cripples massive CPU searches.

There are already password hashing algorithms designed for this, as opposed to plain cryptographic hashes like the SHA-x series. Some important ones are Scrypt (2009), PBKDF2 (2000), Argon2 (2015), and Balloon Hashing (2016). Argon2 was the winner of the Password Hashing Competition held in 2015. Now Balloon Hashing can compete with Argon2. Use Balloon Hashing or Argon2 for secure password hashing.

There are already libraries for each of these that you only need to parametrize, as with Argon2 below, instead of writing the details yourself.

A good Argon2 library

Here are the parameters:

Usage:  ./argon2 [-h] salt [-i|-d|-id] [-t iterations] [-m memory] [-p parallelism] [-l hash length] [-e|-r] [-v (10|13)]
        Password is read from stdin
Parameters:
        salt            The salt to use, at least 8 characters
        -i              Use Argon2i (this is the default)
        -d              Use Argon2d instead of Argon2i
        -id             Use Argon2id instead of Argon2i
        -t N            Sets the number of iterations to N (default = 3)
        -m N            Sets the memory usage of 2^N KiB (default 12)
        -p N            Sets parallelism to N threads (default 1)
        -l N            Sets hash output length to N bytes (default 32)
        -e              Output only encoded hash
        -r              Output only the raw bytes of the hash
        -v (10|13)      Argon2 version (defaults to the most recent version, currently 13)
        -h              Print argon2 usage

One might ask what $i$, $d$, and $id$ mean in Argon2. The answer, from draft-irtf-cfrg-argon2-03:

If you do not know the difference between them or you consider side-channel attacks as viable threats, choose Argon2id.

  • Argon2d is faster and uses data-dependent memory access. Data dependency immediately enables side-channel attacks. It is suitable for cryptocurrencies and for applications with no threat of side-channel attacks.

  • Argon2i uses data-independent memory access, which is preferred for password hashing and password-based key derivation.

  • Argon2id works as Argon2i for the first half of the first iteration and as Argon2d for the rest. This provides both side-channel protection and time-memory trade-off resistance.
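
Putting those parameters together in C# (the language of the question), here is a minimal sketch, assuming the Konscious.Security.Cryptography NuGet package; its Argon2id class, property names, and memory unit are written from memory here, so check the library's documentation before relying on it:

using System.Text;
using Konscious.Security.Cryptography;

private static byte[] Argon2idHash(string password, byte[] salt)
{
    // Parameters mirror the CLI flags above: -p, -m, -t, -l
    var argon2 = new Argon2id(Encoding.UTF8.GetBytes(password))
    {
        Salt = salt,                 // per-user random salt
        DegreeOfParallelism = 4,     // -p: threads (the CLI default is 1)
        MemorySize = 1 << 12,        // -m: memory cost in KiB (2^12 KiB = 4 MiB, the CLI default)
        Iterations = 3               // -t: passes over the memory
    };
    return argon2.GetBytes(32);      // -l: output length in bytes
}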

Balloon Hashing

It is now in NIST Special Publication 800-63B, though it needs some time for adoption. There are some libraries ready to use.

The main properties are:

  • Its memory-hardness has been proven.
  • It is built on standard primitives: it can use any non-memory-hard cryptographic hash function such as SHA-2, SHA-3, or BLAKE2.
  • It has built-in resistance against side-channel attacks, which requires that the memory access pattern is independent of the bits of the data to be hashed.

A little about passwords

For users' passwords, you should educate them to use diceware-based password generation. If they are not using it, then some restrictions can be helpful. Note that diceware-based passwords don't need upper/lower/digit/other restrictions, so be careful about forcing such rules on users.
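
As an illustration (a sketch, not part of the answer), generating a diceware-style passphrase with a cryptographically secure RNG in C#; "wordlist.txt" stands for a hypothetical local copy of a diceware word list such as the EFF large list:

using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

private static string GeneratePassphrase(int words = 6)
{
    // One word per line; the EFF large list has 7776 words, so six words
    // give about 6 * log2(7776) ≈ 77 bits of entropy.
    string[] list = File.ReadAllLines("wordlist.txt");
    var picked = Enumerable.Range(0, words)
                           .Select(_ => list[RandomNumberGenerator.GetInt32(list.Length)]);
    return string.Join(" ", picked);
}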

A little about password managers

Besides, almost everybody has a smartphone. One can use a password manager to generate and store passwords that are essentially random. Let users know about this, too. Some of them are free, like LastPass, 1Password, and KeePass.


Answer to comment:

I can't really use an algorithm that uses lots of memory (if I understood Memory Hardness correctly) because it's a server. Is the approach correct, is SHA512 suitable, and are the iterations suitable considering password length?

SHA-512 uses very little memory for hashing a password; it only needs to store 1024 bits of state to produce a hash. Argon2's default memory parameter is $2^{12}$ KiB = 4 MiB. If we consider that an attacker can run as many threads as their memory allows, they can run 32768 times more parallel guesses against the SHA-512-based scheme than against Argon2 with default memory. So even a little memory usage strengthens the password hashing complexity by $2^{15}$. 4 MiB is totally fine for servers, and one can adjust the memory according to the login accesses per second, or better, buy some more memory. Remember, password hashing algorithms leave the memory choice to the system; the bottleneck to consider is the number of login requests per second.
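
(Spelling out the factor from those figures: 4 MiB / 128 B $= 2^{22} / 2^{7} = 2^{15} = 32768$, i.e. roughly 32768 SHA-512 password guesses fit into the memory footprint of a single default-parameter Argon2 guess.)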


‡) OpenSSL speed result to show that SHA-512 100K is small.

Doing sha512 for 3s on 64 size blocks: 10896142 sha512's in 3.00s
Doing sha512 for 3s on 256 size blocks: 4673961 sha512's in 2.99s
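
(A rough back-of-envelope from those figures: about 10,896,142 / 3 ≈ 3.6 million SHA-512 computations per second on a single core, so one 100,000-iteration guess costs roughly 28 ms, i.e. about 36 guesses per second per core, and GPU or ASIC attackers achieve far higher raw SHA-512 rates than a CPU core.)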

*) How much entropy do we lose per iteration of a cryptographic hash function?

Squeamish Ossifrage wrote a great answer about this. The core of the result is:

Rather, there's a good chance that for any fixed function F, there are many cycles on which F is a permutation, and restricted to which F thus preserves entropy

kelalaka

Beyond @kelalaka's excellent answer, it is worth pointing out key differences between your implementation and more standard iterative hashing approaches like PBKDF2. Though PBKDF2 isn't memory-hard like some newer password hashing algorithms, I do consider it (or bcrypt) sufficient for most purposes. It is perhaps the most similar to your proposed solution, so it will be easier to understand: https://en.wikipedia.org/wiki/PBKDF2

A key difference is that PBKDF2 repeatedly mixes the password back in with the previous result. This prevents convergence of the hash. The iteration count is presumably small compared to the output space of the hash function, so this may not be significant, but if the effective space ends up smaller than we think, it could be. When you hash a value repeatedly, each iteration reduces the potential output space by a tiny amount due to collisions; iterating over and over, you end up with fewer and fewer distinct values.
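
For comparison, a minimal sketch (not from the answer) of the same job done with the PBKDF2 implementation built into .NET; the static Rfc2898DeriveBytes.Pbkdf2 helper assumes .NET 6 or later, and older versions can use the Rfc2898DeriveBytes constructor instead:

using System.Security.Cryptography;
using System.Text;

// PBKDF2 (RFC 2898) with HMAC-SHA-512: the password is used as the HMAC key
// on every iteration, so it keeps being mixed back in rather than chaining a
// bare hash of the previous digest.
private static byte[] DeriveKey(string password, byte[] salt) =>
    Rfc2898DeriveBytes.Pbkdf2(
        Encoding.UTF8.GetBytes(password),  // password bytes
        salt,                              // per-user random salt
        100_000,                           // iteration count
        HashAlgorithmName.SHA512,          // PRF: HMAC-SHA-512
        64);                               // output length in bytes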

Also, password complexity requirements are considered harmful. Consider a modern password policy: https://auth0.com/blog/dont-pass-on-the-new-nist-password-guidelines/

Meir Maor

It's a bad idea to design your own password hash.

Making cryptographic functions is still very much a process of trial and error. Sure, over time we have stumbled on some things that are a good idea or even mandatory. And we certainly have learned a lot of things that don't work. But we don't know how to make things that are proven secure all the way. A very subtle small mistake can cause a cryptographic construct to fail catastrophically. And there are a lot of opportunities for such mistakes. I'll name some later.

The best we can do right now is to have people with experience make a crypto thing, and then everyone tries to break that crypto thing. If enough people try and fail for a long time, then we have something that's probably good enough. We still can't prove the absence of problems, but if thousands of cryptanalysts couldn't break it, then hopefully the next thousand also can't.

But even this fails from time to time. Sometimes after years, decades even, a critical problem is found in something that was scrutinized by the smartest people for ages. That is why MD5 is no longer considered OK, and why RC4 is considered problematic. Or the landscape changes, leading e.g. to the need for salts in password hashes, and intentionally slow hash functions.

By using the best known current cryptographic algorithms, we can leverage this collective experience, and probably avoid repeating past mistakes. New mistakes may be discovered, but that can't be helped. Someone who builds their own crypto functions though, is likely to repeat some of those mistakes, or create entirely new ones.

I know, I know, it's fun to design that stuff yourself. And it feels good to build something that feels secure. Stack Exchange is littered with questions about custom encryption and (particularly) password hashes. Problem is, it's easy to build something that you yourself cannot break. But that doesn't mean others can't. This adage is ancient, and comes up often enough that it's now known as Schneier's law.

What can go wrong?

So, there can be problems with the very core of a crypto algorithm. This is what broke MD5 and RC4.

Or the algorithm can be completely fine, but an implementation can be broken. This can be a simple bug, or it can be something more subtle like a side-channel attack (which are surprisingly tricky to avoid sometimes).

But it's also possible (easy, even) for secure cryptographic primitives with bug-free implementations to be used incorrectly, rendering the system insecure. For example, correctly using encryption and digital signatures, but making naive assumptions during a communication handshake (SSL had its share of issues with this). Another well-known example is AES-ECB, which uses the perfectly fine AES in a way that can leak significant information.

In your hash function?

Your own password hashing function also falls prey to something like this here:

repeat 100_000:
    hash = sha512(hash)

SHA-512 is perfectly fine (as far as we know). Using it like this is frowned upon.

SHA-512 generates an "indistinguishable from random" 512-bit output for every input. It may generate the same output for two distinct inputs; this is called a collision. The problem is that an N-bit hash function applied to N-bit inputs is generally not a bijection. That means that when feeding a SHA-512 output back into SHA-512, some of the possible 512-bit values will collide with others, producing the same output. By necessity, this also means that some other hash values cannot be generated at all.

Let's look at how this works for SHA-512, truncated to 4 bits. Specifically, we take the first byte from the output, and zero out the high 4 bits in this byte. The resulting single byte is used as input to the next round. Different ways of selecting the bits gives different results, but the principle is the same.

Trying all 16 possible inputs for several rounds gives us these chains:

in      1.      2.      3.      4.      5.      6.      7.
00  ->  08  ->  06  ->  08  ->  06  ->  08  ->  06  ->  08
06  ->  08  /
07  ->  06  ->  08  ->  06  ->  08  ->  06  ->  08  ->  06
08  ->  06  /   
03  ->  04  ->  05  \
13  ->  15  ->  05  ->  09  ->  02  ->  10  ->  14  ->  14
04  ->  05  ->  09  ->  02  ->  10  ->  14  ->  14  /
15  ->  05  ->  09  ->  02  ->  10  ->  14  /
05  ->  09  ->  02  ->  10  ->  14  ->  14  
01  ->  11  ->  02  ->  10  ->  14  /
09  ->  02  ->  10  ->  14  ->  14  
11  ->  02  ->  10  ->  14  /
02  ->  10  ->  14  ->  14  
12  ->  10  /       /
10  ->  14  ->  14   
14  ->  14  /

After just 6 rounds, the 16 possible outputs have been reduced to just 3. The same effect can be seen for larger truncations. I ran some simulations for SHA-512 truncated to the first 1 byte (2⁸), 2 bytes (2¹⁶), and 3 bytes (2²⁴):

                bytes:  1    2      3
          input space:  256  65536  16777216
converged to M values:  13   330    2765
       after N rounds:  31   518    7114
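
For anyone who wants to reproduce this, here is a rough sketch (not the answer's original code, and the exact counts depend on how the truncated value is fed back in) of the 1-byte case in C#:

using System;
using System.Linq;
using System.Security.Cryptography;

// Apply SHA-512 truncated to its first byte to every reachable value and
// watch the reachable set shrink until it stops getting smaller.
using var sha = SHA512.Create();
byte Step(byte b) => sha.ComputeHash(new[] { b })[0];

var reachable = Enumerable.Range(0, 256).Select(i => (byte)i).ToHashSet();
int rounds = 0;
while (true)
{
    var next = reachable.Select(Step).ToHashSet();
    rounds++;
    if (next.Count == reachable.Count) break;   // image stopped shrinking
    reachable = next;
}
Console.WriteLine($"converged to {reachable.Count} values after {rounds} rounds");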

You can see that the convergence effect is present in all three scenarios. As for uncut SHA-512, my computer isn't fast enough to compute this, I couldn't find any information on the strength and certainty of the effect, and I'm not smart enough to construct a proof (if a proof is even possible). But odds are you'll see the effect in full SHA-512 too. My instinct suspects that it reduces the hash space by its square root (halving the bit length), but I could be very wrong [update: see links provided in the comments: one, two, three]. 100k rounds probably limits the damage, too.

Now, does this really harm your hash? Probably not. But it is a weakness, one that real-world password hash functions take care to avoid (e.g. by ensuring the password and salt are mixed back in regularly).

I think this is a great example of the subtle traps in crypto design. It's so easy to miss something like this. That's why I recommend using a battle-tested algorithm. They're better than anything you or I can come up with.

The other answers already make some recommendations for which hashes to use. Okay: PBKDF2, bcrypt. Better: Argon2, scrypt. I hear Balloon is a thing, but I don't know anything about it, so I have no opinion on it.

marcelm

I think the answer is: "Secure" is relative. The 512 bits of salt (8*64 bits) alone give roughly 1.3*10^154 possibilities. Transforming the salt (via Base64) does not change its strength.

Then you have about 95 bits of entropy (16 characters from an alphabet of 62: 26*2 letters plus 10 digits) making up the password, or roughly 5*10^28 possibilities. Assuming uniform probability (rather unlikely if chosen by a human) you have about 607 input bits to produce 512 output bits, i.e. roughly 6*10^182 possible inputs (so collisions are likely) mapped onto at most 2^512 hashes. That's a huge number.
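
(Spelling out the arithmetic, assuming an alphabet of exactly 62 characters and exactly 16 of them: salt 2^512 ≈ 1.3*10^154; password 62^16 ≈ 2^95 ≈ 5*10^28; combined roughly 607 bits, i.e. about 6*10^182 distinct inputs, mapped onto at most 2^512 possible hashes.)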

The numbers assume that the salt cannot be "factored out" from the hash, so the salt extends the search range.

Repeating the hash will only slow down building the table, not looking up the result (the resulting table will be at most 2^512 entries large).

U. Windl