34

I'm not even sure if they are serious, but I've heard many times that some people not only refuse to trust their computer to generate a random string (which is understandable) but also don't trust themselves to do it. So, instead of simply typing/writing down:

fcC540dfdK23xslLDdfd7dDL92

And then, a few seconds later, randomly changing a few of those characters to other ones, say one near the beginning, one in the middle and one at the end, they use dice, which they roll again and again to generate random numbers that are then treated as "truly random" and thus "truly secure".

Why would a dice roll be "more random" than simply coming up with a sequence in your head, and then changing some of the characters?

I simply don't believe that this could possibly be "not secure". Why the need to do the very tedious dice rolling? It feels like a ritual that they go through, based not on logic and reason, but on some sort of delusion that their brain is going to generate the same sequence as others who guess it, or a computer, even though they also change some of them after the phrase is "done".

I don't understand this. Is there anything that speaks for this practice as being "more random"/"more secure"?

Oliphaunt
K. B.

8 Answers

115

In short, it is more than a belief: there is strong evidence that humans are not good entropy sources. There is a test for this:

Try to win!

So we don't rely on random numbers generated in someone's head, or on random keyboard typing and mouse movements that, to an outsider, look like a monkey playing with a computer. We rely on good entropy sources like /dev/urandom. Sources of that kind are backed by solid research.
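As a minimal sketch of this advice, Python's secrets module draws from the operating system's entropy pool (the same kind of source as /dev/urandom), so no human judgment enters the string at all:

```python
import secrets
import string

# Build a 26-character alphanumeric password from OS-provided entropy.
# secrets.choice avoids the modulo bias a naive random-byte mapping can have.
alphabet = string.ascii_letters + string.digits
password = "".join(secrets.choice(alphabet) for _ in range(26))
print(password)
```

Each run prints a different string; nothing about it depends on what the user "feels" is random.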


kelalaka
31

For me, the fraud-related applications of Benford's Law come to mind. When people make up data, they tend to create overly uniform data, even when that's not appropriate. There is a definite psychology at work that can make people less random than they intend to be (Wikipedia links to a paper claiming humans are in fact bad at this), or perhaps misconceptions about what randomness "looks like." In any case, knowing about effects like this can create self-doubt about one's own ability to generate randomness. Indeed, the very idea of explicitly changing some of the allegedly random data you just generated may seem error-prone to some, and potentially the root of any problems that could later arise.
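To see why made-up "uniform" data stands out, Benford's Law predicts that leading digits are far from equally likely; the distribution is a one-liner to compute:

```python
import math

# Benford's law: in many naturally occurring datasets, the leading digit d
# occurs with probability log10(1 + 1/d) rather than a uniform 1/9 ≈ 0.111.
for d in range(1, 10):
    print(d, round(math.log10(1 + 1 / d), 3))
# The digit 1 leads about 30.1% of the time; 9 only about 4.6%.
```

Fabricated figures whose leading digits cluster near 1/9 each are exactly the kind of "too uniform" pattern fraud examiners look for.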

Dice, on the other hand, people trust to be random despite any unconscious bias they may be introducing. By following the outcome of dice rolls people can be more certain that there is no "gotcha" that might make their data less random. They had no real input and therefore feel sufficiently removed from the generation of the data.

Perhaps people are different enough that no general analysis could be done to make a case for a reduction in apparent entropy in human-generated random data. But I think this is ultimately a risk assessment -- i.e. are you willing to bet whatever you're protecting with the password on the assumption that your attempt at random data is truly random?

All of that said, I question whether this matters much, provided enough data is being generated. For example, a human-generated 8-character password is probably fairly insecure no matter how good a job they did at making it "random." In contrast, a 32-character password is probably fairly secure if they were trying at all. In either case, the way the password is actually used and/or secured may well matter more to whether their account will ultimately get compromised.

Still, it would be frustrating, even embarrassing, to learn that your carefully generated "random" password was able to be guessed due to its human origin, or because other "random" strings you had previously generated were compromised. Eliminating all possibility of that scenario, no matter how unlikely, is undoubtedly attractive to some, if only for that reason.

thesquaregroot
18

Why would a dice roll be "more random" than simply coming up with a sequence in your head, and then changing some of them?

Humans have too many biases regarding what a random sequence is. If you ask humans to generate a random sequence, they will probably take care not to use the same character twice in a row, i.e., aa or bb, because they think that ab is more random than aa. They will also have a bias due to the language they use, in which some combinations are more frequent than others. Humans easily, but wrongly, generate values based on what they have generated before, so there is no true independence between values! Also note that many people attach meaning to certain numbers (7, 13, 666, etc.) and then avoid some of them! All of this is very well known, and many experiments exist to demonstrate it. You may think that rolling dice is not really random either, but as there is no link between one roll and the next, the rolls are truly independent (at least nobody can control dependencies).

I simply don't believe that this could possibly be "not secure". Why the need to do the very tedious dice rolling? It feels like a ritual that they go through, based not on logic and reason, but on some sort of delusion that their brain is going to generate the same sequence as others who guess it, or a computer, even though they also change some of them after the phrase is "done".

Alas, not believing it is not sufficient. There are many scientific results on the subject, and generating a "secure" random sequence is not easy. Even a small bias may be dangerous and exploited. The human mind has real difficulty understanding and dealing with what randomness is.

I don't understand this. Is there anything that speaks for this practice as being "more random"/"more secure"?

There is true physical randomness -- for example, rolling a die. Of course, this alone can't be used in computer systems, as the throughput would not be sufficient. Truly random sequences can also be generated from radioactive decay, but that is not easy to integrate into a computer. So, modern random sequences are generated by a mix of pseudo-random generators and physical events. Pseudo-random generators are algorithms; thus they can't produce true randomness, only something very close. Mixing their output with true randomness gives even more security.
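As an illustrative sketch (my own, not something prescribed above), here is how dice rolls can be turned into uniformly distributed password characters without modulo bias. Three d6 rolls give 216 equally likely outcomes; accepting only the first 186 (= 3 × 62) keeps the mapping onto a 62-symbol alphabet perfectly uniform. The simulated die stands in for real physical rolls:

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols

def dice_to_char(roll_die):
    """Map three d6 rolls to one character via rejection sampling.

    Outcomes 0..215 are equally likely; we accept only 0..185 so that
    taking the result modulo 62 hits every character exactly 3 times."""
    while True:
        d1, d2, d3 = roll_die(), roll_die(), roll_die()
        index = (d1 - 1) * 36 + (d2 - 1) * 6 + (d3 - 1)  # 0..215
        if index < 186:
            return ALPHABET[index % 62]

# Simulated die for demonstration; with real dice you would type the rolls in.
simulated_die = lambda: random.randint(1, 6)
print("".join(dice_to_char(simulated_die) for _ in range(26)))
```

The rejection step is the part human intuition tends to skip: simply taking (roll mod alphabet size) would make some characters measurably more likely than others.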

15

Randomness is a measurable, statistical property of a set of values. It doesn't mean the same as "hard for a human to guess."

Your sample string is hard for a human to guess, but it isn't very random.

There is a tool called "ent" for most Unix systems that can quantify the randomness, by some measures, of a file.

Available here: https://www.fourmilab.ch/random/

Your string was 27 characters long, all ASCII, and limited to the set of [a-zA-Z0-9] . Let's compare your string to 27 characters from /dev/urandom limited to that same range, using "ent".

Your string: fcC540dfdK23xslLDdfd7dDL92

Here are the results from "ent".

$ ent test1.txt

Entropy = 3.926572 bits per byte.

Optimum compression would reduce the size of this 27 byte file by 50 percent.

Chi-square distribution for 27 samples is 532.41, and randomly would exceed this value less than 0.01 percent of the times.

The arithmetic mean value of data bytes is 77.9259 (127.5 = random).

Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).

Serial correlation coefficient is 0.271042 (totally uncorrelated = 0.0).

27 characters from /dev/urandom: Q9HpOpJrS3yYKlLc71yq003IMR

Here are the results from "ent".

$ ent test2.txt

Entropy = 4.458591 bits per byte.

Optimum compression would reduce the size of this 27 byte file by 44 percent.

Chi-square distribution for 27 samples is 304.85, and randomly would exceed this value 1.76 percent of the times.

The arithmetic mean value of data bytes is 78.8889 (127.5 = random).

Monte Carlo value for Pi is 4.000000000 (error 27.32 percent).

Serial correlation coefficient is -0.024251 (totally uncorrelated = 0.0).


The program was easily able to quantify how much less "random" (in the statistical sense) your string was.
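For readers without ent installed, the headline "bits per byte" figure is plain Shannon entropy over the byte frequencies, which a few lines of Python can reproduce (the exact value may differ slightly from ent's if the file carries a trailing newline):

```python
import math
from collections import Counter

def bits_per_byte(data: bytes) -> float:
    """Shannon entropy in bits per byte, the headline figure ent reports."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

human = b"fcC540dfdK23xslLDdfd7dDL92"
print(round(bits_per_byte(human), 6))
```

A string of all-distinct uniformly chosen bytes would approach 8 bits per byte; a short [a-zA-Z0-9] string with repeated characters, like the one above, sits well below that, which is exactly what ent measured.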

"People believe" we humans are bad at generating randomness because we are.

Nij
JesseM
14

People are not that bad, but we're slow. See How were one-time pads and keys historically generated? In summary, megabytes of 100% secure key material were generated for one-time pads by people simply key-smashing on typewriters. Sufficient to win three world wars. It's just that a human's entropy rate is a little lower than that of a laser-phase-based TRNG.

fcC540dfdK23xslLDdfd7dDL92

is pretty much random. But do it again, and again, and again. Randomness is a function of sample size: the more you create by keyboard smashing, the more susceptible it becomes to frequency analysis.

That's not to say that raw irreducible information (entropy) isn't being generated, but it has to be uniformly distributed for use in cryptography. The uniformity aspect is the difficult bit. So try it. Write out 500 kB of 'randomness' and then run it through a program called ent. I can guarantee that your data will fail the test.

That's not to say your typing wasn't random, but it won't have been random enough. Refer back to my linked answer and read about randomness extraction, which statistically reshapes biased randomness into useful cryptographic entropy.

Paul Uszak
9

Evidence suggests that people asked to generate random data will produce repetition in the data substantially less often than random chance would.

For example, let's assume you were asked to generate random digits (i.e., just 0 through 9).

In purely random data, a sequence like NN (i.e., the same digit twice in a row) happens about 10% of the time. That is, given some arbitrary first digit, there's a one in ten chance that we'll randomly choose the same digit the next time.

But when people are producing (what they want to be) random digits, most people see this as something that's unlikely to happen by random chance, so what they produce will have substantially fewer instances of the same digit twice in a row than random chance would suggest.

Two digit runs are only the tip of the iceberg though. By the same logic, we see that runs of three identical digits should happen around 1% of the time. That is, given some arbitrary digit N, there's a one in ten chance that the next digit we select will also be N, and a one in ten chance that the third time, we'll select N again. 1/10 * 1/10 = 1/100 = 1%.

That continues with longer strings as well--4 digit runs should happen with a frequency of about 0.1%, 5 digit runs with a frequency of about 0.01%, and so on.

Testing indicates, however, that when people are asked to generate random numbers, they'll produce repeated strings like this considerably less often than random chance would. And the longer the string, the worse the disparity between human-generated and randomly-generated strings becomes, to the point that most people simply won't produce a run of the same digit (say) 4 or 5 times in a row, no matter how many random digits you ask them to produce. To most people, the chances of that happening randomly seem so remote that they simply never do it. The same happens with other things that seem like obvious patterns such as "1234" or "3210"--most people won't produce them nearly as often as they would occur by random chance.
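The expected run frequencies above are easy to check by simulation; a quick Python sketch:

```python
import random

# Count how often a truly random digit stream repeats the previous digit.
# The argument above predicts doubles at ~10% of positions, triples at ~1%.
random.seed(1)  # fixed seed so the demonstration is repeatable
digits = [random.randrange(10) for _ in range(1_000_000)]
n = len(digits)
doubles = sum(digits[i] == digits[i - 1] for i in range(1, n))
triples = sum(digits[i] == digits[i - 1] == digits[i - 2] for i in range(2, n))
print(doubles / n)  # close to 0.10
print(triples / n)  # close to 0.01
```

Human-generated digit lists fall measurably short of these rates, which is one of the simplest statistical tells that the data didn't come from a true random source.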

Jerry Coffin
0

I suppose the problem is not that a human would generate a biased random number. Computers also use biased random sources, but as long as there is entropy in them, they can be hashed down into a shorter, random-enough number. However bad humans are, what humans think of obviously has some entropy in it.

The problem is that humans are bad at memorizing true random numbers and don't have an internal hash mechanism (at least not one humans are known to be able to feel and make use of). If they hash mechanically, it takes a lot of time and requires memorizing even more numbers. Most people would be lazy and choose to just use a computer. The rest, who don't feel lazy, are the ones who don't realize how biased they are or how to generate random numbers correctly. The average result is what you would expect.

user23013
0

It's mathematics and psychology. People tend to create patterns that aren't random even when they try not to.

Randomness isn't just any gibberish that doesn't mean anything; it's data without any pattern. Humans create patterns.