I am reading about mono-alphabetic ciphers, which are prone to letter frequency analysis. To counter this, we can provide multiple substitutions, known as homophones, for a single letter, e.g. e could be assigned 17, 74, 35, 21. The homophones for a letter could be used in rotation or picked at random.

This is where I am confused:

Even with homophones, each element of plaintext affects only one element of ciphertext, and multiple-letter patterns (e.g. digram frequencies) still survive in the ciphertext, making cryptanalysis relatively straightforward.

How do multiple-letter patterns survive? For example, if for h we have homophones 17, 74, 35, 21 and for t we have 11, 69, 27, 24, then th could be 11 17 or 11 74 or 11 35 or 69 21, etc.

ishaq

1 Answer

For starters, you don't have 4 or 5 codes for every letter. The basic idea is that each symbol in the ciphertext alphabet has the same frequency. Most letters will still have only 1 or 2 symbols, and just the more common ones will have 3 or more.

So the frequencies of single symbols are roughly uniform. Let's take your example of th, which is the most common bigram in the English language (the frequencies are just examples; other sources might give different ones):

  • $t$ has frequency $0.0894$
  • $h$ has a frequency $0.0496$
  • $th$ has a frequency of $0.0271$

Now let's imagine we have a total of 100 symbols in the ciphertext, where we have 9 symbols for the letter $t$ and 5 for the letter $h$, so that each symbol has roughly the frequency $1/100$.
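As a rough check of those counts, here is a minimal sketch that assigns homophones in proportion to letter frequency. The rounding rule and the name `homophone_count` are my own assumptions for illustration, not part of any standard scheme.

```python
# Example letter frequencies quoted above (illustrative values only).
letter_freq = {"t": 0.0894, "h": 0.0496}
ALPHABET_SIZE = 100  # number of ciphertext symbols assumed in the example


def homophone_count(freq, alphabet_size=ALPHABET_SIZE):
    """Hypothetical rule: homophones proportional to frequency, at least one."""
    return max(1, round(freq * alphabet_size))


# How many ciphertext symbols each letter gets.
counts = {letter: homophone_count(f) for letter, f in letter_freq.items()}

# Frequency of each individual ciphertext symbol, assuming the homophones
# of a letter are used equally often.
per_symbol = {letter: letter_freq[letter] / counts[letter] for letter in counts}

print(counts)
print(per_symbol)
```

With these inputs, $t$ gets 9 symbols and $h$ gets 5, and each symbol ends up close to $1/100$.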

Now there are $9\cdot5 = 45$ possible symbol pairs for $th$. With 100 symbols in the alphabet, there are $100^2 = 10000$ possible bigrams, so each would have a frequency of $0.0001$ if the distribution were uniform. We know that $th$ has a frequency of $0.0271$; spreading that uniformly over the 45 combinations gives a frequency of $0.0271 / 45 \approx 0.0006$ per combination.

So even if the monogram probabilities are almost exactly $1/100$ for every symbol, each of those $45$ bigrams for $th$ has a frequency roughly 6 times as high as the average bigram should have. In comparison, in plain text the monogram $t$ has a frequency that is not even 2.5 times the uniform value $1/26 \approx 0.0385$.
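The arithmetic can be reproduced in a few lines. This is only a sketch using the example frequencies from this answer; the variable names are mine.

```python
# Example values from this answer (other sources give different frequencies).
T_FREQ = 0.0894            # plaintext frequency of the letter t
TH_FREQ = 0.0271           # plaintext frequency of the bigram th
T_SYMBOLS, H_SYMBOLS = 9, 5
ALPHABET_SIZE = 100        # ciphertext symbols

# Number of ciphertext symbol pairs that can encode th.
th_pairs = T_SYMBOLS * H_SYMBOLS

# Frequency of each such pair if th is spread evenly over them.
per_pair = TH_FREQ / th_pairs

# Frequency every pair would have if bigrams were uniform.
uniform_pair = 1 / ALPHABET_SIZE**2

# How far each th pair stands out from the uniform baseline.
bigram_ratio = per_pair / uniform_pair

# For comparison: how far plaintext t stands out from 1/26.
monogram_ratio = T_FREQ / (1 / 26)

print(f"th pairs: {th_pairs}, per-pair frequency: {per_pair:.6f}")
print(f"bigram ratio: {bigram_ratio:.2f}, monogram ratio: {monogram_ratio:.2f}")
```

The bigram ratio comes out at about 6 and the monogram ratio at about 2.3, so the bigram signal left in the homophonic ciphertext is stronger than the single-letter signal was in the plaintext.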

tylo