
I have read this post and this post about how best to calculate a pseudo-accurate entropy, in bits, of a password.

I can do this fine for passwords of a uniform nature, such as 8 letters drawn from an a–z range:

26^8 (a–z, length 8) = 208,827,064,576

Then to bits:

log2(208,827,064,576) ≈ 37.6035177451, rounded up to 38.

So its guessable entropy S is 38 − 1 = 37 bits.
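
In Python, for instance, the same arithmetic is:

```python
import math

# 8 characters, each drawn uniformly and independently from a-z.
keyspace = 26 ** 8                  # 208,827,064,576
entropy = math.log2(keyspace)       # ≈ 37.6035177451 bits

# An attacker searches about half the keyspace on average,
# hence the guessable figure of roughly entropy - 1 bits.
print(entropy, entropy - 1)
```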

My problem:

Now, I have a password made up of the following structure:

A–Z, 0–9, A–Z, 0–9 … to ten characters long (5 of each type).

Example:

A9B8C7D6E5

I figure the entropy value of that would be:

(b): 26 * 10 * 26 * 10 * 26 * 10 * 26 * 10 * 26 * 10 = 1,188,137,600,000

I had tried other variations for a shorter syntax, such as (26 × 10)^5, but those figures didn't seem to correlate.

Anyhow, the above gives entropy in bits of:

   log2(1,188,137,600,000) ≈ 40.1118390651

So S is 40. This strikes me as only slightly larger than the first example. I think I am working this correctly, and I am aware that the numeric characters severely limit the potential entropy of the whole password.
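
For reference, a quick check of equation (b) in Python:

```python
import math

# Equation (b): five letter/digit pairs, multiplied out in full.
product = 26 * 10 * 26 * 10 * 26 * 10 * 26 * 10 * 26 * 10
print(product)             # 1188137600000
print(math.log2(product))  # ≈ 40.1118390651
```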

  • Is my calculation and conclusion (S ≈ 40) correct?

  • If so, is there a more efficient way of working out equation (b)?

Thanks.

Martin

2 Answers


Is my calculation and conclusion (S ≈ 40) correct?

Yes, but it's important to state the premises behind these calculations and not relegate them to a footnote. Namely, that each character in the password is selected uniformly at random, independently of every other one. We routinely see people talk about password "entropy" in contexts where passwords are evidently not being selected randomly. And in such contexts your calculation doesn't tell you anything about the password's security.

Keep in mind also that overwhelmingly most people don't select passwords randomly. These numbers apply in circumstances that are, in real life, exceptional.

If so, is there a more efficient way of working out equation (b)?

Yes, by using the laws of logarithms, which allow you to work it out by adding small numbers instead of multiplying huge ones:

$$ \log(n \times m) = \log(n) + \log(m) $$

This is why entropy is normally expressed in bits—the base 2 logarithm of the number of equiprobable alternatives—because it just makes the math much easier. In this case:

  • The uniform random choice of one character out of a set of 26 is $\log_2(26) \approx 4.7$ bits.
  • The uniform random choice of one character out of a set of 10 is $\log_2(10) \approx 3.3$ bits.
  • The entropy of a sequence of independent random choices is the sum of their individual entropies.

Therefore, a randomly generated eight letter password has entropy of about $8 \times 4.7 = 37.6$ bits, and one randomly generated according to your latter schema has approximately $5 \times 4.7 + 5 \times 3.3 = 5 \times 8 = 40$ bits of entropy.
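
A minimal sketch of that additive bookkeeping, in Python for instance:

```python
from math import log2

LETTER = log2(26)  # ≈ 4.7 bits per uniformly chosen letter
DIGIT = log2(10)   # ≈ 3.3 bits per uniformly chosen digit

# Independent uniform choices add, so the totals are simple sums.
print(8 * LETTER)              # ≈ 37.6 bits: eight letters
print(5 * LETTER + 5 * DIGIT)  # ≈ 40.1 bits: five letter/digit pairs
```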


Addressing this comment:

@CodesInChaos your comment seems to imply that the order of certain character sets doesn't seem to matter when calculating entropy. I was wary of making this assumption myself when facing potential solutions to this problem.

If the order is predetermined then it doesn't matter, because then there's no choice about the order and therefore no uncertainty. Another way of looking at this is that there's a bijective function between the set of passwords that look like ABCDE12345 and the set that look like A1B2C3D4E5, so the uniform random choice of an element out of one set must have the same entropy as from the other. (The entropy is the base 2 logarithm of the number of alternatives in the uniform case, and in both cases the total number of alternatives is the same.)
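
To make that concrete, here is a toy illustration in Python, with deliberately small alphabets so both sets can be enumerated outright:

```python
from itertools import product

letters, digits = "AB", "12"  # toy alphabets: 2 letters, 2 digits

# Passwords shaped like LLDD versus LDLD.
grouped = {"".join(p) for p in product(letters, letters, digits, digits)}
interleaved = {"".join(p) for p in product(letters, digits, letters, digits)}

# Reordering positions is a bijection between the two sets, so they
# have the same size and hence the same entropy when chosen uniformly.
assert len(grouped) == len(interleaved) == 2 * 2 * 2 * 2
```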

Luis Casillas

In the initial exposition (8 letters), there is no reason to round entropy to the next integer; fractional entropy is fine, including in bits. We can compute it as $$\begin{align} S&=\log_2(26^8)\\ &=8\log_2(26)\\ &\approx37.6\text{ bit} \end{align}$$

I had never encountered "guessable entropy"; but yes, subtracting one from the entropy gives (approximately) the $\log_2$ of the expected number of guesses before finding the right one.
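
Concretely: trying $N$ equiprobable candidates in any fixed order finds the right one after $(N+1)/2\approx N/2$ guesses on average, and $$\log_2\!\left(\frac N2\right)=\log_2(N)-1$$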


For the problem (5 groups of one letter and one digit), we can write $$\begin{align} S&=\log_2(26\times10\times26\times10\times26\times10\times26\times10\times26\times10)\\ &=\log_2((26\times10)^5)\\ &=5\log_2(260)\\ &\approx40.1\text{ bit}\end{align}$$ thus yes, the question's calculation is essentially OK (if not maximally simple, and briefly erring in "but these figures didn't seem to correlate"; as the second line shows, they do).
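
A one-line check of that equality, in Python for instance:

```python
import math

long_form = 26 * 10 * 26 * 10 * 26 * 10 * 26 * 10 * 26 * 10
short_form = (26 * 10) ** 5
assert long_form == short_form == 1_188_137_600_000
print(5 * math.log2(260))  # ≈ 40.1118390651
```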

As illustrated, computations are easier using the properties of the logarithm that $$\begin{align} \forall(b,x,y)\in\mathbb R^3&\text{ with }b>0\text{ and }x>0\text{ and }y>0,\;&\log_b(x\,y)&=\log_b(x)+\log_b(y)\\ \forall(b,x,y)\in\mathbb R^3&\text{ with }b>0\text{ and }x>0,\;&\log_b(x^y)&=y\log_b(x) \end{align}$$

Using these properties, we can validly say that each letter accounts for $\log_2(26)\approx4.70\text{ bit}$, and each digit accounts for $\log_2(10)\approx3.32\text{ bit}$.
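
Summing those per-character figures recovers the same total as before: $$S=5\log_2(26)+5\log_2(10)=5\log_2(260)\approx40.1\text{ bit}$$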


Edit: that excellent other answer was first, and additionally points out that it is essential for the calculations made that each letter or digit is chosen uniformly and independently of others.

fgrieu