XOR cipher for encrypting compiled C code

Question

I'm exploring ways of encrypting Intel hex files we send to customers for flashing onto an embedded device. The embedded processor itself has a built-in mechanism that prevents anyone from reading the contents of the flash memory unless you unlock it using a 128-bit password.

A trivial method I thought of to encrypt the contents of the hex file is to XOR its data fields with this 128-bit password. The embedded bootloader (which knows the password) would then XOR the incoming cipher text to recover the plain text before flashing it.
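For concreteness, here is a minimal Python sketch of the proposal. It XORs only the data field of each Intel HEX data record (type 00) with a 16-byte key and recomputes the record checksum. `xor_hex_record` is a hypothetical helper. Choosing the key byte by load address modulo 16 is an assumption the question does not specify; it lets records be processed independently.

```python
def xor_hex_record(line: str, key: bytes) -> str:
    """XOR the data field of one Intel HEX data record with a 16-byte key.

    Assumption: data byte at load address A is XORed with key[A % 16].
    Using only the 16-bit address field is enough for a 16-byte key,
    because extended-address records (types 02/04) add multiples of 16.
    """
    if len(key) != 16:
        raise ValueError("expected a 128-bit (16-byte) key")
    if not line.startswith(":"):
        raise ValueError("not an Intel HEX record")
    raw = bytes.fromhex(line[1:])
    count, addr, rtype = raw[0], int.from_bytes(raw[1:3], "big"), raw[3]
    # A valid record has the declared length and sums to 0 modulo 256.
    if len(raw) != 5 + count or (sum(raw) & 0xFF) != 0:
        raise ValueError("bad record length or checksum")
    if rtype != 0x00:  # only data records carry payload; pass others through
        return line
    data = raw[4:4 + count]
    mixed = bytes(b ^ key[(addr + i) % 16] for i, b in enumerate(data))
    body = raw[:4] + mixed
    checksum = (-sum(body)) & 0xFF  # two's-complement record checksum
    return ":" + (body + bytes([checksum])).hex().upper()
```

Because XOR is its own inverse, the bootloader would run the same transformation on each received record to recover the original data.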

The hex files are several hundred kilobytes in size. Typically such a scheme is susceptible to a frequency analysis attack, but (according to Wikipedia) the success of such an attack is dependent on the ability to make assumptions about patterns in the plain text. I have two conflicting arguments with regard to this, and I can't decide which one is more accurate.

1. Since the underlying plain text is processor instructions, one cannot make assumptions about repetition patterns.
2. Because the plain text consists of opcodes interspersed with their operands, it is drawn from a finite set (the opcodes supported by this processor), and this fact can be exploited to identify patterns.

Pertaining to the second argument: it is a CISC processor that supports several hundred instructions, and some of these opcodes (or parts of them) vary depending on the operand. It is certainly not a small set we're talking about.

Is this approach flawed or does it make for reasonable encryption under these circumstances? (I realize the question is subjective, but I don't know how to make it more quantitative).

The password certainly does not have to be limited to 128 bits, but due to size concerns it'll always be several orders of magnitude smaller than the code being encrypted.

What if I were to introduce some heuristic where, depending on the data content, the password was first rotated before XORing with a given block of plain text?

Lastly, I'm open to other ideas of dealing with this situation as well.

Accepted answer (score 6, edited Apr 13 '17 at 12:48)

The encryption scheme seems to be:

- re-use an existing 128-bit secret, originally used to unlock a read-prevention mechanism, as the 128-bit key;
- split the plaintext (data to protect from prying eyes) into 128-bit blocks;
- XOR each block with that 128-bit key.
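The scheme as summarized fits in a few lines of Python. `xor_blocks` is a hypothetical name, and the sketch assumes the plaintext is the raw binary image rather than the ASCII hex text.

```python
def xor_blocks(plaintext: bytes, key: bytes) -> bytes:
    """Split plaintext into 16-byte blocks and XOR each block with the same key.

    This is exactly a repeating-key XOR: byte i is XORed with key[i % 16].
    A short final block uses only a prefix of the key. Because XOR is its
    own inverse, the same function also decrypts.
    """
    if len(key) != 16:
        raise ValueError("expected a 128-bit (16-byte) key")
    return bytes(b ^ key[i % 16] for i, b in enumerate(plaintext))
```

Two properties follow directly. Identical plaintext blocks at 16-byte-aligned offsets produce identical ciphertext blocks, and an all-zero plaintext block produces a ciphertext block equal to the key itself.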

That approach is flawed. Two cardinal mistakes are made:

1. Using XOR with a keystream that repeats.
2. Reusing a secret (and, worse, without any key derivation).

Due to 1., if anything in the plaintext is guessable (such as a run of about 40 consecutive 00h bytes, an ASCII string of about that size that appears in the user interface, or a code sequence from a common library), the 128-bit secret is easily recovered. XORing any 16 consecutive ciphertext bytes with the guessed plaintext directly yields all 16 key bytes, and the confidentiality of the whole plaintext is then lost. That's kid's play, really. The same holds, for more effort and data (say, a few kilobytes), if only some characteristics of the plaintext are known (like the instruction set and compiler), enabling attacks such as frequency analysis.
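Both attacks from mistake 1 take only a few lines. The Python sketch below uses hypothetical helper names and assumes the plain repeating-key scheme described above. For frequency analysis it further assumes that a single byte value (00h by default) dominates every key-position column. That assumption stands in for the richer statistics an attacker would derive from the instruction set and compiler.

```python
from collections import Counter

KEY_LEN = 16  # 128-bit key, as in the question


def key_from_known_run(ciphertext: bytes, offset: int, known: bytes) -> bytes:
    """Known-plaintext attack on repeating-key XOR.

    `known` is guessed plaintext starting at byte `offset` (e.g. a run of 00h).
    The offset need not be 16-byte aligned: any 16 consecutive bytes touch
    every key position exactly once.
    """
    if len(known) < KEY_LEN:
        raise ValueError("need at least 16 guessed plaintext bytes")
    key = bytearray(KEY_LEN)
    for i in range(KEY_LEN):
        pos = offset + i
        key[pos % KEY_LEN] = ciphertext[pos] ^ known[i]
    return bytes(key)


def key_by_frequency(ciphertext: bytes, likely_plain: int = 0x00) -> bytes:
    """Frequency analysis on repeating-key XOR.

    Bytes at positions j, j+16, j+32, ... are all XORed with key[j], so each
    column is a single-byte XOR. Assumption: the most frequent plaintext byte
    in every column is `likely_plain` (0x00 here; 0xFF, a common erased-flash
    fill value, is another candidate).
    """
    if len(ciphertext) < KEY_LEN:
        raise ValueError("need at least one full block of ciphertext")
    key = bytearray(KEY_LEN)
    for j in range(KEY_LEN):
        most_common_byte, _count = Counter(ciphertext[j::KEY_LEN]).most_common(1)[0]
        key[j] = most_common_byte ^ likely_plain
    return bytes(key)
```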

Due to 2., the above attack also reveals the 128-bit key used to unlock the read-prevention mechanism, defeating it. The confidentiality of both earlier and later code is lost.
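Mistake 2 notes the absence of any derivation. The usual fix is to derive separate keys from one master secret, so that recovering one derived key reveals nothing about the other. The minimal sketch below uses only the Python standard library to implement HKDF (RFC 5869) with HMAC-SHA256. HKDF is a standard KDF swapped in here for illustration, not something the answer prescribes. The `info` labels and the master secret are hypothetical placeholders.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 16, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with HMAC-SHA256: extract a PRK, then expand it."""
    if length > 255 * 32:
        raise ValueError("length too large for HKDF-SHA256")
    # Extract: an absent salt defaults to HashLen zero bytes, per the RFC.
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]
```

For example, `hkdf_sha256(master, b"flash-unlock-v1")` and `hkdf_sha256(master, b"firmware-encryption-v1")` yield two unrelated 16-byte keys. This only addresses the reuse. The firmware key would still need a real cipher (for example AES in an authenticated mode), not a repeating XOR.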

Why not use true cryptography? See this answer, especially the final paragraph and what it links to.

> What if I were to introduce some heuristic where, depending on the data content, the password was first rotated before XORing with a given block of plain text?

If the rotation is the same for each block, it will not help at all to protect the confidentiality of the plaintext. A fixed rotation of the key is just another fixed 128-bit key, so the key-guessing attacks above work unmodified. It does help, ever so slightly, to protect the secret used to unlock the read-prevention mechanism, because that secret is now mildly different from the key used in the futile attempt to protect the code's confidentiality (though anyone who learns the rotation amount can simply undo it).
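A short sketch shows why a constant rotation changes nothing for the attacker. It is written in Python; `rotl128` and `xor_repeating` are hypothetical helpers. Treating the key as a big-endian 128-bit integer for bit rotation is an assumption about what "rotated" means.

```python
MASK128 = (1 << 128) - 1


def rotl128(key: bytes, r: int) -> bytes:
    """Rotate a 16-byte key left by r bits, viewed as a big-endian 128-bit integer.

    r is reduced modulo 128, so a negative r rotates right.
    """
    n = int.from_bytes(key, "big")
    r %= 128
    return (((n << r) | (n >> (128 - r))) & MASK128).to_bytes(16, "big")


def xor_repeating(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR: byte i is XORed with key[i % len(key)]."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

With a fixed rotation r, `xor_repeating(plain, rotl128(secret, r))` is an ordinary repeating-key XOR under the key `rotl128(secret, r)`. A 16-byte run of 00h in the plaintext hands that key to an attacker, and `rotl128(recovered, -r)` gives back the unlock secret once r is known.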

If the rotation varies with each block (say, by the low 7 bits of the block number), it might help slightly with confidentiality in some cases, but in many others it harms significantly. Suppose the attacker knows that rotation rule and assumes only that there exists a contiguous zone of a few kilobytes where 90% of the bytes have bit 7 clear (with the remaining bytes roughly randomly distributed). It is now trivial to recover the whole key by frequency analysis, and then the whole plaintext. The varying rotation moves every key bit into the bit-7 position of some byte in some blocks, so a majority vote on bit 7 of the ciphertext recovers each key bit in turn, whereas without rotation that bias would expose only the 16 key bits sitting at bit 7. What's needed here (if for some reason it were indispensable to use the same secret for two purposes) is a key-derivation mechanism, not security by obscurity.
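The bit-7 attack can be sketched in Python under the example's assumptions. These are: bit rotation of the key viewed as a big-endian 128-bit integer, a rotation amount equal to the low 7 bits of the block number, and plaintext in which most bytes have bit 7 clear. The function names are hypothetical.

```python
BLOCK = 16
MASK128 = (1 << 128) - 1


def rotl128(key: bytes, r: int) -> bytes:
    """Rotate a 16-byte key left by r bits (big-endian 128-bit integer view)."""
    n = int.from_bytes(key, "big")
    r %= 128
    return (((n << r) | (n >> (128 - r))) & MASK128).to_bytes(16, "big")


def encrypt_rotating(plain: bytes, key: bytes) -> bytes:
    """Example scheme: block b is XORed with the key rotated by (b & 0x7F) bits.

    XOR is its own inverse, so the same function decrypts.
    """
    out = bytearray()
    for start in range(0, len(plain), BLOCK):
        block_key = rotl128(key, (start // BLOCK) & 0x7F)
        out += bytes(p ^ k for p, k in zip(plain[start:start + BLOCK], block_key))
    return bytes(out)


def recover_key_bit7(cipher: bytes) -> bytes:
    """Recover the key by majority vote, assuming most plaintext bytes have bit 7 clear."""
    ones = [0] * 128   # observations of a 1 in ciphertext bit 7, per original key bit
    votes = [0] * 128  # total observations, per original key bit
    for idx, c in enumerate(cipher):
        block, j = divmod(idx, BLOCK)
        r = block & 0x7F
        # Bit 7 of byte j is bit (127 - 8*j) of the rotated key as an integer,
        # which came from bit (127 - 8*j - r) mod 128 of the original key.
        q = (127 - 8 * j - r) % 128
        votes[q] += 1
        ones[q] += c >> 7
    if min(votes) == 0:
        raise ValueError("too little ciphertext to observe every key bit")
    n = sum(1 << q for q in range(128) if 2 * ones[q] > votes[q])
    return n.to_bytes(16, "big")
```

Each key bit lands at a bit-7 position 16 times per 128 blocks (2 KB), which is why a few kilobytes of biased plaintext are enough.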

Also, are we talking about a password or a key? They are not the same thing: a key is assumed to be chosen at random; a password is not.
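A minimal Python sketch makes the distinction concrete. The function names, PBKDF2-HMAC-SHA256, and the 600 000 iteration count are assumptions chosen for illustration. A memory-hard KDF such as scrypt or Argon2 is often preferred for passwords.

```python
import hashlib
import secrets


def new_random_key() -> bytes:
    """A key: 128 bits drawn from a cryptographically secure random source."""
    return secrets.token_bytes(16)


def key_from_password(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """A password has far less entropy than 128 random bits; if one must be
    used, stretch it with a deliberately slow KDF. The iteration count is an
    assumption to tune for the target, and the salt must be kept with the output.
    """
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=16)
```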
