
When I calculate entropy for the xkcd Password Strength (comic 936) I don't get nearly the amount of entropy stated in the comic.

So why doesn't the first password "Tr0ub4dor&3" have an entropy of around 50 bits? And why doesn't the passphrase "correcthorsebatterystaple" represent over 100 bits of entropy?

Maarten Bodewes
Blafasel

2 Answers


I don't get nearly the amount of entropy stated in the comic.

Interestingly enough, the reasoning behind the entropy ratings is actually given in the comic itself: each of the little boxes represents 1 bit of uncertainty.

For Tr0ub4dor&3 this means:

  • The base word "troubador" is estimated to come from dictionaries of about $2^{16}$ words (16 bits)
  • It adds one bit for each of the letters o, a, o to encode whether that letter was substituted or not (3 bits)
  • It adds one bit for whether the word was capitalized
  • It adds one bit for the order of the trailing numeral and special character
  • It adds 3 bits for the unknown numeral, approximating the $10$ possible digits by $2^3$ (the exact value is $\log_2 10 \approx 3.32$ bits)
  • It adds 4 bits for the unknown punctuation, i.e. which of roughly 16 common symbols it is

This sums to $16+3+1+1+3+4=28$ bits.
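As a quick check, the comic's bit budget can be tallied in Python. The dictionary keys are only descriptive labels chosen for this sketch; the point is that the bits of independent uniform choices add, so an attacker who knows the scheme faces $2^{28}$ candidates.

```python
import math

# Bit budget for "Tr0ub4dor&3" as drawn in xkcd 936 (labels are descriptive only).
COMIC_BITS = {
    "uncommon base word (~2^16 dictionary)": 16,
    "substitution of o, a, o": 3,
    "capitalised or not": 1,
    "order of numeral and symbol": 1,
    "numeral (10 choices, rounded to 2^3)": 3,
    "punctuation (~16 symbols)": 4,
}


def total_bits(budget: dict[str, int]) -> int:
    """Entropies of independent uniform choices simply add up."""
    return sum(budget.values())


if __name__ == "__main__":
    bits = total_bits(COMIC_BITS)
    print(bits)                     # 28
    print(2 ** bits)                # 268435456 candidate passwords
    print(round(math.log2(10), 2))  # exact bits for one digit: 3.32
```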

For correct horse battery staple, the reasoning is that each of the four words is drawn from a dictionary of $2^{11}$ words, giving $4\times 11=44$ bits of entropy.
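The gap to the question's "over 100 bits" comes from counting characters instead of choices. A naive per-character estimate treats the 25 lowercase letters of "correcthorsebatterystaple" as if each were picked uniformly from 26 letters; that is presumably one way to arrive at a figure above 100 bits, but it ignores that the letters come in whole dictionary words. A small sketch comparing both (function names are made up for this example):

```python
import math


def passphrase_bits(num_words: int, wordlist_size: int) -> float:
    """Entropy of num_words words drawn uniformly and independently."""
    return num_words * math.log2(wordlist_size)


def naive_char_bits(length: int, alphabet_size: int) -> float:
    """Entropy if every character were chosen uniformly at random."""
    return length * math.log2(alphabet_size)


if __name__ == "__main__":
    print(passphrase_bits(4, 2 ** 11))  # 44.0, the comic's estimate
    phrase = "correcthorsebatterystaple"
    print(round(naive_char_bits(len(phrase), 26), 1))  # 117.5, naive count
```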

In both cases the estimate assumes that the attacker knows the scheme (the dictionary and the set of possible transformations) and that every choice, i.e. which word and which variant, is made uniformly at random.
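To make that assumption concrete: the 44-bit figure only holds if the words are actually drawn uniformly at random rather than picked by a human. A minimal sketch using Python's `secrets` module; `make_passphrase` and the generated demo word list are hypothetical stand-ins, and a real list of 2048 distinct words would be substituted.

```python
import math
import secrets


def make_passphrase(wordlist: list[str], num_words: int = 4) -> tuple[str, float]:
    """Pick words uniformly with a CSPRNG and report the resulting entropy.

    The entropy figure assumes independent, uniform draws from a list of
    distinct words which the attacker is allowed to know.
    """
    if len(set(wordlist)) != len(wordlist):
        raise ValueError("word list must not contain duplicates")
    words = [secrets.choice(wordlist) for _ in range(num_words)]
    bits = num_words * math.log2(len(wordlist))
    return " ".join(words), bits


if __name__ == "__main__":
    # Hypothetical placeholder for a real 2048-word list.
    demo_list = [f"word{i}" for i in range(2048)]
    phrase, bits = make_passphrase(demo_list)
    print(phrase)  # four randomly chosen words
    print(bits)    # 44.0
```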


If you want an even more thorough explanation of this comic, I highly recommend the bear's answer to this question over on InfoSec.SE.

SEJPM

One official way to estimate the strength of a user-selected password such as "Tr0ub4dor&3" is to look at the NIST recommendations. Granted, this guidance is now deprecated, but the relevant publication was NIST Special Publication 800-63 Version 1.0.2, Electronic Authentication Guideline.

Table A.1 (reproduced below in case of link rot):

[Image: NIST SP 800-63 Table A.1, estimated password guessing entropy (bits) vs. password length]

The reasoning behind this table is given in the document at §A.2.1, Guessing Entropy Estimate. NIST therefore estimates the entropy at about 33 bits if we interpolate to 11 characters in the column for dictionary and composition rules.
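The per-character rule of §A.2.1 can be sketched as below, under the assumption that it reads: 4 bits for the first character, 2 bits each for characters 2 to 8, 1.5 bits each for characters 9 to 20, 1 bit each from character 21 on, plus a 6-bit bonus for a composition rule requiring upper case and non-alphabetic characters. The dictionary-check bonus is "up to 6 bits" and varies with length in the table, so here it is only a caller-supplied parameter. If the table's 33 bits for 11 characters is read as the 28.5 bits computed here plus a dictionary bonus, that bonus is about 4.5 bits.

```python
def nist_guessing_entropy(length: int, composition_rule: bool = False,
                          dictionary_bonus: float = 0.0) -> float:
    """Rule-of-thumb guessing entropy (bits) after NIST SP 800-63 v1.0.2, A.2.1."""
    if length < 0:
        raise ValueError("length must be non-negative")
    if not 0.0 <= dictionary_bonus <= 6.0:
        raise ValueError("the dictionary bonus is at most 6 bits")
    bits = 0.0
    for position in range(1, length + 1):
        if position == 1:
            bits += 4.0   # first character
        elif position <= 8:
            bits += 2.0   # characters 2..8
        elif position <= 20:
            bits += 1.5   # characters 9..20
        else:
            bits += 1.0   # characters 21 and above
    if composition_rule:
        bits += 6.0       # upper case and non-alphabetic characters required
    return bits + dictionary_bonus


if __name__ == "__main__":
    print(nist_guessing_entropy(11))                         # 22.5
    print(nist_guessing_entropy(11, composition_rule=True))  # 28.5
```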

The main takeaway from this question is how difficult it is to assess the entropy of short sequences, particularly human-produced ones. The two current answers (28 and 33 bits) differ in estimated strength by a factor of 32. Comparing NIST's estimate with the 50 bits in Blafasel's original question, the estimates differ by a factor of 131,072. NIST says of the above, "Readers are cautioned against interpreting the following rules as anything more than a very rough rule of thumb method". True.
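Those factors follow from the fact that a difference of $d$ bits means a $2^d$ times larger search space: $2^{33-28} = 2^5 = 32$ and $2^{50-33} = 2^{17} = 131{,}072$. As a one-line check (the helper name is just for illustration):

```python
def strength_ratio(bits_a: float, bits_b: float) -> float:
    """How many times larger the search space of estimate a is than that of b."""
    return 2.0 ** (bits_a - bits_b)


if __name__ == "__main__":
    print(strength_ratio(33, 28))  # NIST vs. comic: 32.0
    print(strength_ratio(50, 33))  # question's 50 bits vs. NIST: 131072.0
```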

Another takeaway is that very few sites allow the stronger and easier-to-remember technique of choosing words from a word list, such as "correcthorsebatterystaple". The UK government's online services don't, no bank I'm aware of does, and stackexchange.com doesn't.

Paul Uszak