
I'm currently trying to solve an online challenge, where I'm given two ciphertexts, encrypted using a one-time pad and the same key for both messages. The task is to decrypt those messages.

What I'm trying to do is take a common English word ("the" in this case) and XOR it, step by step, against the result of c1 XOR c2.

So basically: bits("the") XOR c1 XOR c2

My problem is that all the results I get are not usable. I know that "the" should be the right word to find something, as the instructor dropped that hint.

My code looks like this:

def text_to_bits(text, encoding='utf-8', errors='surrogatepass'):
    bits = bin(int.from_bytes(text.encode(encoding, errors), 'big'))[2:]
    return bits.zfill(8 * ((len(bits) + 7) // 8))


def text_from_bits(bits, encoding='utf-8', errors='surrogatepass'):
    n = int(bits, 2)
    try:
        return n.to_bytes((n.bit_length() + 7) // 8, 'big').decode(encoding, errors) or '\0'
    except UnicodeDecodeError:
        # The bytes are not valid text in the chosen encoding.
        return "Not processable"



def xor(bit1, bit2):
    result = ''
    for idx, val in enumerate(bit1):
        result = result + ('0' if (bit1[idx] == bit2[idx]) else '1')
    return result


def crib_brute(bit_str, crib_word_bit):
    for i in range(50):
        xor_result = xor(crib_word_bit, bit_str[i: len(crib_word_bit) + i])
        print(str(i) + ": " + text_from_bits(xor_result))


c1 = "1010110010011110011111101110011001101100111010001111011101101011101000110010011000000101001110111010010111100100111101001010000011000001010001001001010000000010101001000011100100010011011011011011010111010011000101010111111110010011010111001001010101110001111101010000001011110100000000010010111001111010110000001101010010110101100010011111111011101101001011111001101111101111000100100001000111101111011011001011110011000100011111100001000101111000011101110101110010010100010111101111110011011011001101110111011101100110010100010001100011001010100110001000111100011011001000010101100001110011000000001110001011101111010100101110101000100100010111011000001111001110000011111111111110010111111000011011001010010011100011100001011001101110110001011101011101111110100001111011011000110001011111111101110110101101101001011110110010111101000111011001111"
c2 = "1011110110100110000001101000010111001000110010000110110001101001111101010000101000110100111010000010011001100100111001101010001001010001000011011001010100001100111011010011111100100101000001001001011001110010010100101011111010001110010010101111110001100010100001110000110001111111001000100001001010100011100100001101010101111000100001111101111110111001000101111111101011001010000100100000001011001001010000101001110101110100001111100001011101100100011000110111110001000100010111110110111010010010011101011111111001011011001010010110100100011001010110110001001000100011011001110111010010010010110100110100000111100001111101111010011000100100110011111011001010101000100000011111010010110111001100011100001111100100110010010001111010111011110110001000111101010110101001110111001110111010011111111010100111000100111001011000111101111101100111011001111"

crib_word = "the"
crib_word_bit = text_to_bits(crib_word)

crib_brute(xor(c1, c2), crib_word_bit)

Those are my first 16 results:

0: eP
1: Not processable
2: Not processable
3: Not processable
4: Not processable
5: Sgi
6: :v}
7: Not processable
8: L
9: Not processable
10: Not processable
11: Not processable
12: Not processable
13: {d
14: Not processable
15: Not processable

Nothing looks like part of the English language. Can someone tell me what I'm doing wrong?

Robinbux

1 Answer


You're assuming that the messages consist of Unicode text in the UTF-8 encoding (or something similar enough to be analyzed as such), which means you're implicitly assuming the messages consist of 8-bit bytes. But they're each 847 bits long. 847 is not divisible by 8, so something's wrong.
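The length check is easy to script. This is only a minimal sketch: the helper name plausible_char_widths and the set of candidate widths are illustrative choices, not part of your code.

```python
def plausible_char_widths(n_bits, widths=(5, 6, 7, 8)):
    """Return the candidate bits-per-character values that divide n_bits evenly.

    The default candidates (5 to 8 bits) are an assumption chosen for
    illustration; other widths can be passed in.
    """
    return [w for w in widths if n_bits % w == 0]


# 847 is the ciphertext length stated above.
print(plausible_char_widths(847))  # → [7]
```

Of the candidate widths tried here, only 7 divides 847 evenly (847 = 7 × 121), which already hints at what the statistics below show.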

If you try printing out the bitwise XOR of the ciphertexts (i.e. xor(c1, c2)) and plot its index of coincidence at various intervals, you'll notice peaks at multiples of 7 bits:

  1  53.10% #####################################################
  2  53.15% #####################################################
  3  53.14% #####################################################
  4  53.32% #####################################################
  5  53.10% #####################################################
  6  53.09% #####################################################
  7  58.77% ###########################################################
  8  53.28% #####################################################
  9  52.98% #####################################################
 10  53.08% #####################################################
 11  53.01% #####################################################
 12  53.27% #####################################################
 13  52.73% #####################################################
 14  58.60% ###########################################################
 15  53.04% #####################################################
 16  53.22% #####################################################
 17  52.86% #####################################################
 18  53.02% #####################################################
 19  52.91% #####################################################
 20  53.46% #####################################################
 21  58.74% ###########################################################
 22  52.95% #####################################################
 23  52.58% #####################################################
 24  53.60% ######################################################
 25  53.34% #####################################################
 26  52.65% #####################################################
 27  52.78% #####################################################
 28  59.19% ###########################################################
 29  52.58% #####################################################
 30  52.72% #####################################################
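A table of this kind can be produced with a few lines of Python. The sketch below assumes the statistic is the fraction of positions where a bit equals the bit a fixed number of positions later; the function names are hypothetical, and the demo input is synthetic rather than the challenge data.

```python
def coincidence_rate(bits, shift):
    """Fraction of positions i where bits[i] == bits[i + shift]."""
    pairs = list(zip(bits, bits[shift:]))
    return sum(a == b for a, b in pairs) / len(pairs)


def print_coincidence_table(bits, max_shift=30):
    """Print one line per shift: the shift, the rate as a percentage, and a bar."""
    for shift in range(1, max_shift + 1):
        rate = coincidence_rate(bits, shift)
        print(f"{shift:3d}  {rate:6.2%} {'#' * round(rate * 100)}")


# Synthetic demo input (NOT the challenge data): a 7-bit pattern repeated,
# so the rate is exactly 100% at every multiple of 7.
demo = "1011001" * 20
print_coincidence_table(demo, max_shift=14)
```

To apply it to the challenge, pass xor(c1, c2) from the question's code instead of demo.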

The peaks at multiples of 7 strongly suggest that your messages are actually encoded with 7 bits per character, not 8.
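If that is right, the crib needs to be encoded with 7 bits per character and dragged in steps of 7 bits rather than 1. Here is a minimal sketch under the assumption that the plaintexts are plain 7-bit ASCII with characters aligned at multiples of 7 bits, which the statistics suggest but do not prove. All function names are hypothetical, and the demo uses made-up plaintexts, not the challenge ciphertexts.

```python
def text_to_bits7(text):
    """Encode ASCII text as a bit string, 7 bits per character (assumes code points < 128)."""
    return ''.join(format(ord(ch), '07b') for ch in text)


def text_from_bits7(bits):
    """Decode a bit string into text, reading 7 bits per character."""
    return ''.join(chr(int(bits[i:i + 7], 2)) for i in range(0, len(bits), 7))


def xor_bits(a, b):
    """Bitwise XOR of two bit strings (truncated to the shorter one)."""
    return ''.join('0' if x == y else '1' for x, y in zip(a, b))


def crib_drag7(xored, crib):
    """XOR the crib against every character-aligned window of the XORed texts.

    Returns a list of (character_position, candidate_text) pairs.
    """
    crib_bits = text_to_bits7(crib)
    results = []
    for start in range(0, len(xored) - len(crib_bits) + 1, 7):
        window = xored[start:start + len(crib_bits)]
        results.append((start // 7, text_from_bits7(xor_bits(crib_bits, window))))
    return results


# Synthetic demo (made-up plaintexts): XORing two ciphertexts that share a key
# cancels the key, leaving p1 XOR p2, so we can build that value directly.
p1 = "attack at dawn!"
p2 = "the quick brown"
xored = xor_bits(text_to_bits7(p1), text_to_bits7(p2))
for pos, guess in crib_drag7(xored, "the")[:3]:
    print(pos, repr(guess))
```

In the demo, the crib matches the start of p2, so position 0 reveals the first three characters of p1; other positions give noise. For the challenge, calling crib_drag7(xor(c1, c2), "the") with the question's own xor function is the equivalent step. Printing with repr keeps control characters visible instead of hiding them.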

Ilmari Karonen