
How do I calculate the entropy of a password selected as described?

Choose 4 distinct words randomly from a list of 2000 words. Words can contain special-character substitutions. For example, the following substitutions may be used:

 Sub = {a, o, i, e, 8}
 Letter a -> {@, 6};  Letter o -> 0;  Letter i -> {1, !};  Letter e -> 9;  Letter 8 -> &

Assume uniform selection among the alternatives:

e.g., i is mapped to one of {i, 1, !}, each with the same probability.

Assume 90% of the words contain one of the letters in Sub, and 50% of them contain two letters in Sub.

I know that since we're selecting 4 words from a list of 2000, the number of possible combinations is 2000 × 2000 × 2000 × 2000, which corresponds to about 44 bits of entropy, since each word contributes about 11 bits (2^11 ≈ 2000). What I don't know is how to account for the fact that these words may contain special symbols. Please help. Thanks
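As a quick sanity check of that baseline figure, ignoring the substitutions for now (the distinct-words variant, which removes a tiny bit of entropy, is included for comparison):

```python
import math

# Baseline entropy of 4 independent, ordered picks from a 2000-word list,
# ignoring substitutions.
bits_per_word = math.log2(2000)   # ~10.97 bits
baseline = 4 * bits_per_word      # ~43.86 bits
print(f"{baseline:.2f} bits")

# Requiring the 4 words to be distinct removes only a sliver of entropy:
distinct = math.log2(2000 * 1999 * 1998 * 1997)
print(f"{distinct:.2f} bits")
```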

1 Answer


If you have a rule such as "substitute 5 random letters with a fixed symbol", a direct calculation may be possible if you know the average word length.

If you can't enumerate all the possible symbol substitutions for each word via a rule, it will be difficult to calculate the entropy directly.
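That said, under one possible reading of the question's percentages (assumed here: 10% of words have no substitutable letter, 40% have exactly one, 50% have exactly two) and assuming each substitutable letter is replaced uniformly among k alternatives, a direct calculation is feasible, because H(password) = H(word choices) + E[H(substitutions | words)]. A minimal sketch under those assumptions:

```python
import math

# Assumed distribution of substitutable letters per word
# (one reading of "90% have 1, 50% have 2").
p_subs = {0: 0.10, 1: 0.40, 2: 0.50}

# Assumed number of uniform alternatives per substitutable letter,
# e.g. i -> {i, 1, !} gives 3. Adjust to your actual rules.
alts = 3
bits_per_letter = math.log2(alts)

# H(password) = H(word choices) + expected substitution entropy.
word_bits = 4 * math.log2(2000)
expected_sub_letters = sum(n * p for n, p in p_subs.items())
sub_bits_per_word = expected_sub_letters * bits_per_letter
total = word_bits + 4 * sub_bits_per_word
print(f"{total:.1f} bits")  # substitutions add a handful of bits per word
```

Note that the substitutions add only a few bits per word here; the bulk of the entropy still comes from the word choices themselves.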

A reasonable way to estimate entropy when you can't calculate it is to use data compression. Write a script that outputs 1 million passwords generated with your method to a text file, then compress it with a slow, high-ratio algorithm such as LZMA on maximum settings. (Perhaps use the 7-Zip utility and try all of its available algorithms, picking the smallest result.) The size of the resulting file in bits, divided by the number of passwords in it, should be fairly close to the entropy per password in bits.
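The compression approach can be sketched in a few lines of Python with the standard lzma module. The word list below is a toy placeholder and the sample count is far smaller than suggested, purely for illustration; the estimate is an upper bound that tightens as the sample grows:

```python
import lzma
import math
import secrets

# Toy stand-in for the real generator: 4 words from a 2000-word list
# (no substitutions here, so the true entropy is 4*log2(2000) ~ 43.9 bits).
wordlist = [f"word{i:04d}" for i in range(2000)]

def make_password() -> str:
    return "".join(secrets.choice(wordlist) for _ in range(4))

n = 50_000  # use ~1 million for a tighter estimate
blob = "\n".join(make_password() for _ in range(n)).encode()

# Maximum-effort LZMA, as the answer suggests.
compressed = lzma.compress(blob, preset=9 | lzma.PRESET_EXTREME)

bits_per_password = len(compressed) * 8 / n
print(f"~{bits_per_password:.1f} bits/password "
      f"(true value: {4 * math.log2(2000):.1f})")
```

The estimate overshoots somewhat because LZMA can't perfectly model the word structure, but it lands in the right ballpark and, importantly, can never undershoot the true entropy by more than a negligible amount.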

rmalayter