Calculating XOR Key Sizes

Question

I've been playing around with the Matasano crypto challenges (cryptopals.com). I had a couple of false starts on the challenge that has you write a program to calculate the key size of a XOR-encrypted file using the Hamming Distance between bits (an approach related to the Index of Coincidence). After some head banging and pen-and-paper work on graph paper, I arrived at a working solution.

I've been playing around with my new toy script with different ciphertexts and key lengths, and I noticed that for key sizes greater than 12 bytes it works well (in my limited testing) and accurately returns the correct key size. For shorter key sizes, the most probable key length returned is a multiple of the correct key size. In fact, the first few most probable key sizes are multiples of the correct key size; the correct key size itself appears further down in the results, though it still ranks as more probable (based on Hamming Distance) than smaller, incorrect values.
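For readers who want to reproduce this, here is a minimal Python sketch of the kind of scoring described above. It is not the XOR-Brutr script itself: the function names are hypothetical, and comparing adjacent key-sized blocks is an assumption about how the distances are averaged.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def repeating_xor(data: bytes, key: bytes) -> bytes:
    """XOR data against a repeating key (encrypts and decrypts alike)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def normalized_avg_distance(ciphertext: bytes, key_size: int) -> float:
    """Average Hamming distance between adjacent key_size-byte blocks,
    divided by key_size (assumption: only full blocks are compared)."""
    last_start = len(ciphertext) - key_size
    blocks = [ciphertext[i:i + key_size]
              for i in range(0, last_start + 1, key_size)]
    pairs = list(zip(blocks, blocks[1:]))
    if not pairs:
        raise ValueError("ciphertext too short for this key size")
    total = sum(hamming_distance(a, b) for a, b in pairs)
    return total / len(pairs) / key_size


def rank_key_sizes(ciphertext: bytes, max_key_size: int):
    """Return (key_size, score) pairs, most probable (lowest score) first."""
    scores = [(k, normalized_avg_distance(ciphertext, k))
              for k in range(2, max_key_size + 1)
              if len(ciphertext) >= 2 * k]
    return sorted(scores, key=lambda pair: pair[1])
```

Calling `rank_key_sizes(ciphertext, 40)` gives a list comparable to the `Sort-Object AvgDist` output shown in the examples, although the exact numbers depend on how blocks are paired.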

Here are some examples that demonstrate what I'm talking about:

key = "eatoin shrdlu"

key.length

13

Let's XOR encrypt our plaintext with the key and get cipherText:

cipherText = XOR-Encrypt -String data -key key

cipherText (line breaks added) 21041a01001d003501010c090745151503021d000401060c4c31040f54240803491d1b191d4c14 070e011b491a4816482421223a2841161a0e4200070017441a140914114f060800050100101914 0941190e0a06491d0d52011f160411111c454e571b1152011a1017181b010c4e57120606174c01 0a41190e020b00161e171615550714134f1d0645531f1d161f01450e1a0a49014653091e084c01 0c0c114f061c00191d01104c14450301010a06001c0e520c1505004115010d4e571b090644181d 004135190c0047161a014404141304541b064e441c48050d181d45170103070b52120a1b080501 1c4110061a0d4c1c1b0716095b

Let's run this through the XOR brute forcer to see what it says about the key size:

XOR-Brutr -MaxKeySize 40 -String cipherText -Encoding base16 | Sort-Object AvgDist

KeySize          AvgDist
-------          -------
13               2.74660633484163
39               2.78461538461538
26               2.81730769230769
33               2.8989898989899
40               2.905
25               2.93
38               2.94210526315789
...

Beautiful! The result correctly calculates that the key size is 13 bytes, but with smaller keys....

key = "eatoin shrdl"

key.length

12

cipherText = XOR-Encrypt -String data -key key

XOR-Brutr -MaxKeySize 40 -String cipherText -Encoding base16 | Sort-Object AvgDist

KeySize          AvgDist
-------          -------
24               2.58333333333333
36               2.66111111111111
38               2.85789473684211
40               2.915
19               2.95215311004785
29               2.97536945812808
22               3.00909090909091
16               3.02232142857143
23               3.03381642512077
18               3.03703703703704
37               3.03783783783784
14               3.05357142857143
30               3.05714285714286
32               3.0625
27               3.06481481481481
12               3.06578947368421
28               3.06632653061224
...

Our top result is not 12, though 12 is the greatest common divisor of the top two results.

If we set the MaxKeySize to a higher value and use a 26 byte key:

key = "eatoin shrdluuldrhs niotae"

ciphertext = XOR-Encrypt -String data -key key

XOR-Brutr -MaxKeySize 104 -String CipherText -Encoding base16 | sort avgdist | select -First 10

KeySize          AvgDist
-------          -------
52               2.59615384615385
78               2.62179487179487
104              2.64423076923077
81               2.66666666666667
26               2.66826923076923
84               2.67857142857143
...

Again, a multiple of the actual key rises to the top.

Question: Why does this pattern emerge? Why does the Hamming Distance favor multiples of the actual key size? Should my script determine the gcd of the first few results and weight the gcd more heavily and return it as the probable key?

score 2 · Accepted Answer · answered Jun 08 '15 at 20:55

First off, the values above that are showing as AvgDist are actually Normalized Average Hamming Distances, not Average Hamming Distances.

I wrote a quick script to do some testing around this issue. Details are here, http://trustedsignal.blogspot.com/2015/06/xord-play-normalized-hamming-distance.html, and I won't rehash it all. Multiples of the actual key size come out as more probable because of the way normalization is done. Normalization means dividing the average Hamming Distance by the key size, and a larger key size generally results in a smaller Normalized Average Hamming Distance; see the blog post for an example.

For instance, if the Average Hamming Distance with a XOR key size of 3 is 7 and you divide 7 by 3, you get 2.333. If an Average Hamming Distance with a XOR key size of 9 is 19, when you divide by 9, you get 2.111.

In my testing across an admittedly limited sample set (random text samples from 80 different English-language texts from Project Gutenberg), I calculated the Normalized Average Hamming Distance for key sizes up to four times the actual key size. The key size with the smallest Normalized Average Hamming Distance was the correct key size in about 40% of the cases.

I also found a relatively simple way to improve the probability of returning the correct key size: count how often each greatest common divisor (GCD) occurs among the top n probable key sizes. Take the top n key sizes (I was playing around with the top five or six) and compute the GCD of each pair in that list. If one value occurs more frequently than the others, isn't 1 or 2, and is itself in the list of the top five or six values, then it is likely to be the correct key size. Using this method, I was able to get the correct key size more than 90% of the time.
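The GCD check described above could be sketched in Python roughly as follows. This is an illustration rather than the script from the blog post: `guess_key_size` is a hypothetical name, and falling back to the top-ranked size when no GCD clearly wins is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import gcd


def guess_key_size(ranked_sizes, top_n=6):
    """Pick a key size from candidates sorted most probable first."""
    top = list(ranked_sizes[:top_n])
    if not top:
        raise ValueError("no candidate key sizes given")
    # Count the GCD of every pair among the top candidates.
    counts = Counter(gcd(a, b) for a, b in combinations(top, 2))
    # GCDs of 1 and 2 are too common to carry information.
    for trivial in (1, 2):
        counts.pop(trivial, None)
    leaders = counts.most_common(2)
    if leaders:
        best, best_count = leaders[0]
        is_unique = len(leaders) == 1 or leaders[1][1] < best_count
        # Accept the GCD only if it wins outright and is itself a candidate.
        if is_unique and best in top:
            return best
    # Assumption: otherwise keep the top-ranked size.
    return top[0]
```

Fed the top six sizes from the 26-byte example in the question (52, 78, 104, 81, 26, 84), this returns 26. For the 12-byte example, 12 is not among the top six, so the sketch falls back to 24; the check only helps when the true size already ranks near the top.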

Again, I'm a relative noob when it comes to all things crypto, so it's entirely possible my results are bogus and the result of a flawed testing methodology, flawed programming logic, bad sample size, etc.

It would be great if smart folks would read the post and tell me where I went wrong.

score 1 · Answer 2 · answered May 30 '19 at 11:52

I'd like to address the 'why' in more detail

All thanks to Dave Hull for the blog post above and his investigation and fix. I worked through it too and just want to add some more thoughts as to why this happens in the first place, i.e. why multiples of the true key length usually end up top dog. This might be more appropriate for a comment but I don't have the rep.

N.B. I'm also a crypto novice so this may not be the complete picture.

1. NAHDs for multiples of the key length have the same expected value.

The first thing to note is that, if your true key length is $k$, then the expected value of the normalised average Hamming distance (NAHD) for key length $k$ is the same as for key length $2k$, $3k$, etc. This is because in every one of these cases, when you line up the characters for XOR, each character is XORed against another that has been encrypted using the same key byte.
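To spell that out, here is the cancellation written down. The notation is my own and not from the blog post: $p_i$ is the $i$-th plaintext byte, $c_i$ the $i$-th ciphertext byte, $K_j$ the $j$-th key byte, and $m$ any positive integer.

$$c_i \oplus c_{i+mk} = \left(p_i \oplus K_{i \bmod k}\right) \oplus \left(p_{i+mk} \oplus K_{(i+mk) \bmod k}\right) = p_i \oplus p_{i+mk}$$

The last step uses $(i+mk) \bmod k = i \bmod k$, so the two key bytes are identical and cancel. For every multiple of $k$ you are therefore measuring the Hamming distance between plaintext bytes, which is small for English text, whatever $m$ is.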

This explains why NAHDs for multiples of the key length end up near the top - they have the same expected value as for $k$. But it doesn't explain why they usually end up higher than the true key length - see below for that.

Note that there is nothing nefarious about the normalising process. The effect of 'normalising' and 'averaging' is that you divide by the total number of XORs that have taken place. So the expected value of the NAHD is 8 times the probability that a XOR in one of the places evaluates to 1 (the 8 coming from the 8 bits in a byte).
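As a worked version of that statement, suppose $N$ byte pairs are compared in total and assume, as a simplification, that every bit position differs with the same probability $q$ (both symbols are introduced here for illustration):

$$\mathbb{E}[\mathrm{NAHD}] = \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{8} \Pr[\text{bit } j \text{ of pair } i \text{ differs}] = \frac{8Nq}{N} = 8q$$

The key length does not appear in the result. As a rough sanity check, the value of about 2.75 reported for the 13-byte key in the question would correspond to $q \approx 0.34$ under this simplification.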

2. NAHDs have more variance for longer keys

So why do the NAHDs for multiples of $k$ usually end up very high on the list, when the NAHD for $k$ can be stuck lower down?

For a start, there are several multiples of $k$ kicking around, so one of them is bound to rank higher. For example, in Dave's experiment he tried all key lengths up to 4 times the true key length. That means $k, 2k, 3k$ and $4k$ were in the mix. So intuitively there's only a 25% chance that the true key length would rank highest out of the four.

But there's more to it. If you're using a longer key length, fewer bits end up being compared. For example, if your key length is half as long as the ciphertext length $C$, then the number of byte XORs you do is only $C/2$. But if your key length is 2, then you do $C-2$ XORs. So the longer the key length, the higher the variance in the NAHD.
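A small simulation illustrates the effect. It is a sketch under a strong assumption: each compared bit differs independently with the same probability, which real text does not satisfy exactly. The function names and the value 0.3 are made up for the illustration.

```python
import random
import statistics


def simulated_nahd(num_byte_pairs, p_bit_differs, rng):
    """One simulated NAHD: differing bits per compared byte pair."""
    differing = sum(rng.random() < p_bit_differs
                    for _ in range(8 * num_byte_pairs))
    return differing / num_byte_pairs


def nahd_spread(num_byte_pairs, p_bit_differs=0.3, trials=300, seed=0):
    """Mean and standard deviation of the simulated NAHD over many trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    samples = [simulated_nahd(num_byte_pairs, p_bit_differs, rng)
               for _ in range(trials)]
    return statistics.mean(samples), statistics.pstdev(samples)


if __name__ == "__main__":
    for pairs in (20, 200):
        mean, spread = nahd_spread(pairs)
        print(f"{pairs} byte pairs: mean {mean:.3f}, std dev {spread:.3f}")
```

Both runs centre on the same mean (8 times 0.3), but the run with fewer byte pairs has the wider spread, which is the situation a long guessed key length puts you in.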

So you have NAHDs for $k, 2k, 3k$ and $4k$, all having the same expected value, but the latter ones having higher variance. This would explain why, if the NAHDs for $2k, 3k, 4k$ are low, they are very low and hence end up topping the list rather than wallowing in the middle, like the NAHD for $k$.

However according to this analysis, the NAHD for $4k$ is just as likely to be above that for $k$ as below. It's just more extreme.
