
I was given a ciphertext and am now trying to break it by finding the keyword. This is a keyword cipher, so:

PlainEnglish: ABCDEFGHIJKLMNOPQRSTUVWXYZ

If keyword is HELLO,

CipherLetters: HELOABCDFGIJKMNPQRSTUVWXYZ
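For concreteness, here is a minimal Python sketch of that construction. It assumes that repeated keyword letters are dropped and that only the letters A to Z are used. `keyword_alphabet`, `encrypt` and `decrypt` are hypothetical helper names, not from any library:

```python
import string

def keyword_alphabet(keyword: str) -> str:
    """Distinct keyword letters in order of first appearance, then the unused letters A-Z."""
    used = list(dict.fromkeys(ch for ch in keyword.upper() if ch in string.ascii_uppercase))
    rest = [ch for ch in string.ascii_uppercase if ch not in used]
    return "".join(used + rest)

def encrypt(plaintext: str, keyword: str) -> str:
    """Replace each letter by the cipher letter in the same position; other characters pass through."""
    table = str.maketrans(string.ascii_uppercase, keyword_alphabet(keyword))
    return plaintext.upper().translate(table)

def decrypt(ciphertext: str, keyword: str) -> str:
    """Inverse substitution: map the cipher alphabet back onto A-Z."""
    table = str.maketrans(keyword_alphabet(keyword), string.ascii_uppercase)
    return ciphertext.upper().translate(table)

print(keyword_alphabet("HELLO"))                             # HELOABCDFGIJKMNPQRSTUVWXYZ
print(decrypt(encrypt("Attack at dawn", "HELLO"), "HELLO"))  # ATTACK AT DAWN
```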

I've done a little frequency analysis: I computed the percentage of each letter in the ciphertext, arranged the letters in descending order of frequency, and compared them with the expected letter frequencies of English.
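A tally like this can be produced with a few lines of Python (3.9+ for the type hints). This is a sketch that assumes only a to z count as letters. Letters that never occur are simply absent from the result:

```python
from collections import Counter

def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Percentage of each letter a-z in `text`, most frequent first; other characters are ignored."""
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    if not letters:
        return []
    counts = Counter(letters)
    return [(ch, 100.0 * n / len(letters)) for ch, n in counts.most_common()]

for letter, pct in letter_frequencies("Hello, World!"):
    print(f"{letter}  {pct:6.3f}")
```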

I could carry on and try to match the letters one by one, but I read that it may also be possible to recover the keyword itself. I'm looking for some ideas on how to get started.

English letter frequencies (%) on the left, ciphertext letter frequencies (%) on the right:

e   12.702      o   10.772
t   9.056       k   10.611
a   8.167       t   9.003
o   7.507       d   8.521
i   6.966       n   8.039
n   6.749       j   6.913
s   6.327       q   6.431
h   6.094       r   4.984
r   5.987       c   4.823
d   4.253       h   3.939
l   4.025       l   3.617
c   2.782       s   3.617
u   2.758       u   3.296
m   2.406       i   2.894
w   2.36        w   2.492
f   2.228       a   2.412
g   2.015       b   1.849
y   1.974       m   1.688
p   1.929       y   1.367
b   1.492       v   1.125
v   0.978       e   0.884
k   0.772       g   0.241
j   0.153       p   0.161
x   0.15        z   0.161
q   0.095       f   0.08
z   0.074       x   0.08

So I rearranged the pairs back into normal alphabetical (ABC...) order, just to try to make more sense of it, and I sort of have a clue. I thought the keyword might be TVSHOW, but it can't be: I already tried deciphering with it, and W is way down in the table. Any ideas on how I can find the keyword?

abcdefghijklmnopqrstuvwxyz
tvshoabrnpglijdyfcqkuewzmx
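The second row above is what you get by pairing the two frequency columns rank for rank. A minimal Python sketch that reproduces it from the percentages (`rank_match` is a hypothetical name; tied percentages, such as l and s at 3.617, are assumed to be broken by their order in the table):

```python
import string

# Percentages copied from the two columns of the table above.
ENGLISH = {
    "e": 12.702, "t": 9.056, "a": 8.167, "o": 7.507, "i": 6.966, "n": 6.749,
    "s": 6.327, "h": 6.094, "r": 5.987, "d": 4.253, "l": 4.025, "c": 2.782,
    "u": 2.758, "m": 2.406, "w": 2.36, "f": 2.228, "g": 2.015, "y": 1.974,
    "p": 1.929, "b": 1.492, "v": 0.978, "k": 0.772, "j": 0.153, "x": 0.15,
    "q": 0.095, "z": 0.074,
}
CIPHER = {
    "o": 10.772, "k": 10.611, "t": 9.003, "d": 8.521, "n": 8.039, "j": 6.913,
    "q": 6.431, "r": 4.984, "c": 4.823, "h": 3.939, "l": 3.617, "s": 3.617,
    "u": 3.296, "i": 2.894, "w": 2.492, "a": 2.412, "b": 1.849, "m": 1.688,
    "y": 1.367, "v": 1.125, "e": 0.884, "g": 0.241, "p": 0.161, "z": 0.161,
    "f": 0.08, "x": 0.08,
}

def rank_match(english: dict, cipher: dict) -> str:
    """Pair the i-th most common English letter with the i-th most common cipher
    letter, then list the cipher letters in plaintext (a-z) order."""
    # sorted() is stable even with reverse=True, so ties keep their table order
    eng_ranked = sorted(english, key=english.get, reverse=True)
    ciph_ranked = sorted(cipher, key=cipher.get, reverse=True)
    mapping = dict(zip(eng_ranked, ciph_ranked))
    return "".join(mapping[ch] for ch in string.ascii_lowercase)

print(rank_match(ENGLISH, CIPHER))  # tvshoabrnpglijdyfcqkuewzmx
```

This naive pairing is only a starting point: letters with close percentages easily swap ranks.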
Ilmari Karonen

nfnmy

2 Answers


Let's start by considering which cipher letters should correspond to the most common letters E and T. According to your frequency analysis, the most likely candidates are O, K, T and maybe D and N.

Now, E is the fifth letter of the alphabet, so unless your keyword is very short (fewer than five distinct letters), it will encrypt to the fifth distinct letter of the keyword. (If the keyword is very short, E would probably encrypt to A or B, and both of those are rare in your ciphertext.) So let's ignore E for now.

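However, T is the twentieth letter of the alphabet, so unless your keyword has twenty or more (unique!) letters, T will encrypt to one of the "left-over" letters. Since the left-over letters keep their alphabetical order and fill the end of the cipher alphabet, T encrypts to itself if the keyword contains no letters from T to Z, and is pushed back by one left-over letter for each keyword letter in that range. Thus, the encryption of T tells you how many letters in the range T to Z the keyword contains.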
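A small Python check of this argument, using a hypothetical `image_of_t` helper and assuming the keyword has fewer than twenty distinct letters:

```python
import string

def keyword_alphabet(keyword: str) -> str:
    """Distinct keyword letters first, then the unused letters in alphabetical order."""
    used = list(dict.fromkeys(ch for ch in keyword.upper() if ch in string.ascii_uppercase))
    return "".join(used + [ch for ch in string.ascii_uppercase if ch not in used])

def image_of_t(keyword: str) -> tuple[str, int]:
    """Return (cipher letter for plaintext T, number of distinct keyword letters in T..Z)."""
    alphabet = keyword_alphabet(keyword)
    t_image = alphabet[string.ascii_uppercase.index("T")]
    in_tail = len(set(keyword.upper()) & set("TUVWXYZ"))
    return t_image, in_tail

for kw in ["HELLO", "NELSON", "TVSHOW"]:
    print(kw, *image_of_t(kw))
# HELLO T 0
# NELSON T 0
# TVSHOW P 3
```

For TVSHOW, T is pushed back three left-over letters (R, Q, P; S is skipped because it is in the keyword).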

So let's consider the possibilities. We can rule out T encrypting to K entirely: that would require the keyword to contain nine letters in the range T to Z, but there are only seven letters in that range! The same goes for D, and N, while barely possible, doesn't seem plausible either. Even O, five places before T in the alphabet, seems rather unlikely, given that most of the letters after T in the alphabet are fairly uncommon. So my guess would be that T, and thus also every letter after it in the alphabet, encrypts to itself. Looking at your letter frequencies, that seems plausible.

That leaves O and K as plausible encryptions for E; the other one of those then probably stands for A, which would mean that both of them occur in the keyword (and, in particular, that one of them is the first letter in the keyword). Alternatively, K might stand for O, and D or N for A.

What about P, Q, R and S, then? Well, there's no way Q encrypts to itself: U, being after T, should encrypt to itself, so that would make Q (ciphertext freq. 6.431) almost twice as common as U (3.296) in your plaintext, whereas in English Q is pretty much always followed by U. However, Q (nominal freq. 0.095) might well encrypt to P (ciphertext freq. 0.161); it certainly doesn't encrypt to Z, F or X, the only other ciphertext letters at least as rare as P. That means that one of Q, R and S must occur in the key (for Q to shift back onto P, one of the letters between P and T has to be missing from the left-over sequence). The ciphertext frequency of S (3.617) is far enough from its nominal frequency (6.327) that we might guess that it's in the key, which would give us the following partial cipher table:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
????????????????PQRTUVWXYZ    key letters: S, (O/K)

We might also guess that P (nominal freq. 1.929) encrypts to M (ciphertext freq. 1.688); it certainly can't encrypt to N (ciphertext freq. 8.039) or O (ciphertext freq. 10.772), unless your message is about peppy popping pepper pips or something. Could O (nominal freq. 7.507) then encrypt to L (ciphertext freq. 3.617)? Maybe, but K (ciphertext freq. 10.611) seems more likely.

That would mean that E encrypts to O (since K is now taken by O), and that L and N also occur in the key, since they are skipped in the sequence K, M, P. Which plaintext letters could they stand for? Well, looking at the frequency table, N might stand for A and L for C or D. That gives us the following partial cipher table:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
N???O?????????KMPQRTUVWXYZ    key letters: L, S

Presumably, L and S stand for C and D in either order, but since both have a ciphertext frequency of 3.617, the frequencies can't tell us which is which. What letter could stand for B, then? Well, E looks like a plausible choice: the frequencies are reasonably close (1.492 vs. 0.884), and E is common enough to be likely to appear in the keyword. Guessing that L stands for C, the cipher table looks like this:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
NELSO?????????KMPQRTUVWXYZ

Could the keyword be NELSO(N)? If so, A would stand for F, B for G and so on:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
NELSOABCDFGHIJKMPQRTUVWXYZ

Looking at the frequencies, I see no obvious mismatches, so I'm going to guess that this is the solution. Of course, examining the actual ciphertext could confirm or refute this, but based on the frequencies alone it looks plausible.
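One rough way to make "no obvious mismatches" concrete, offered as a sketch rather than a proper statistical test: score a candidate keyword by the sum of squared differences between each letter's English frequency and the ciphertext frequency of the letter it would encrypt to. Lower is better. The scoring rule and the `compare` name are my own choices, not a standard method:

```python
import string

# Percentages from the question's table (English, then ciphertext).
ENGLISH = {
    "e": 12.702, "t": 9.056, "a": 8.167, "o": 7.507, "i": 6.966, "n": 6.749,
    "s": 6.327, "h": 6.094, "r": 5.987, "d": 4.253, "l": 4.025, "c": 2.782,
    "u": 2.758, "m": 2.406, "w": 2.36, "f": 2.228, "g": 2.015, "y": 1.974,
    "p": 1.929, "b": 1.492, "v": 0.978, "k": 0.772, "j": 0.153, "x": 0.15,
    "q": 0.095, "z": 0.074,
}
CIPHER = {
    "o": 10.772, "k": 10.611, "t": 9.003, "d": 8.521, "n": 8.039, "j": 6.913,
    "q": 6.431, "r": 4.984, "c": 4.823, "h": 3.939, "l": 3.617, "s": 3.617,
    "u": 3.296, "i": 2.894, "w": 2.492, "a": 2.412, "b": 1.849, "m": 1.688,
    "y": 1.367, "v": 1.125, "e": 0.884, "g": 0.241, "p": 0.161, "z": 0.161,
    "f": 0.08, "x": 0.08,
}

def keyword_alphabet(keyword: str) -> str:
    """Distinct keyword letters first, then the unused letters in alphabetical order."""
    used = list(dict.fromkeys(ch for ch in keyword.upper() if ch in string.ascii_uppercase))
    return "".join(used + [ch for ch in string.ascii_uppercase if ch not in used])

def compare(keyword: str) -> tuple[float, str]:
    """Return (sum of squared frequency differences, plaintext letter with the largest difference)."""
    alphabet = keyword_alphabet(keyword).lower()
    # English frequency of each plaintext letter minus ciphertext frequency of its image
    diffs = {p: ENGLISH[p] - CIPHER[c] for p, c in zip(string.ascii_lowercase, alphabet)}
    total = sum(d * d for d in diffs.values())
    worst = max(diffs, key=lambda p: abs(diffs[p]))
    return total, worst

for kw in ["NELSON", "TVSHOW"]:
    score, worst = compare(kw)
    print(f"{kw}: score {score:.1f}, worst letter {worst}")
```

For NELSON the largest single mismatch is plaintext O (7.507) against ciphertext K (10.611); TVSHOW scores far worse, mainly because plaintext T would map to the rare ciphertext P (0.161).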

Ilmari Karonen

If TVSHOW were the keyword, you would have this substitution table (written horizontally for space reasons):

ABCDEFGHIJKLMNOPQRSTUVWXYZ
TVSHOWABCDEFGIJKLMNPQRUXYZ

This agrees with your table only for the first five letters (A to E).
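Such a comparison is easy to automate. A minimal Python sketch, where `agreeing_letters` is a hypothetical helper and the observed row is copied from the question:

```python
import string

def keyword_alphabet(keyword: str) -> str:
    """Distinct keyword letters first, then the unused letters in alphabetical order."""
    used = list(dict.fromkeys(ch for ch in keyword.upper() if ch in string.ascii_uppercase))
    return "".join(used + [ch for ch in string.ascii_uppercase if ch not in used])

def agreeing_letters(keyword: str, observed: str) -> str:
    """Plaintext letters whose cipher letter under `keyword` matches the observed table."""
    expected = keyword_alphabet(keyword)
    return "".join(p for p, e, o in zip(string.ascii_uppercase, expected, observed.upper()) if e == o)

print(keyword_alphabet("TVSHOW"))                                # TVSHOWABCDEFGIJKLMNPQRUXYZ
print(agreeing_letters("TVSHOW", "tvshoabrnpglijdyfcqkuewzmx"))  # ABCDE
```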

The frequency distribution only gives you an approximate solution, and it will normally not pin down the exact mapping of the less frequent letters. You'll have to use trial decryption and word matching, too.
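Trial decryption is easier with a helper that applies a partial table and leaves unknown letters as `?`, so you can look for words that fit. A minimal sketch; `partial_decrypt`, the guesses and the ciphertext "TCO LNT" are all made up for illustration (under the NELSON table it is "THE CAT"):

```python
import string

def partial_decrypt(ciphertext: str, table: dict[str, str]) -> str:
    """Decrypt with a partial plaintext->cipher table; unknown cipher letters become '?'.
    Non-letters (spaces, punctuation) are passed through unchanged."""
    inverse = {c.upper(): p.upper() for p, c in table.items()}
    out = []
    for ch in ciphertext.upper():
        if ch in string.ascii_uppercase:
            out.append(inverse.get(ch, "?"))
        else:
            out.append(ch)
    return "".join(out)

# Hypothetical guesses (plaintext -> cipher); C is deliberately left out.
guesses = {"E": "O", "T": "T", "A": "N", "H": "C", "S": "R"}
print(partial_decrypt("TCO LNT", guesses))  # THE ?AT
```

A pattern like `?AT` narrows the unknown letter to a handful of candidates, which you can then test against the rest of the ciphertext.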

Paŭlo Ebermann