29

I am very new to cryptography (so be kind), but I have a question that may seem silly.

If the one-time pad is the perfect cipher and impossible to crack, why would the following algorithm not be one of the strongest:

To encrypt:

  • Generate a random bit array using a pseudo-random number generator with the key as the seed
  • The generated bit array is the same size as the data
  • Use a simple XOR to encrypt the data with the generated bit array

To decrypt:

  • The pseudo-random number generator will generate the same bit array from the same seed (the key)
  • Generate a bit array the same size as the encrypted data
  • XOR the encrypted data with the generated bit array

This is very simplified, but you can imagine how, for larger data, you could break the data into blocks and, to ensure that each block's bit array is different, derive a new seed for each block from the key. So, if $f(k)$ is the pseudo-random number generator, $k$ is the key, and $b$ is the block number, you can encrypt each block with $f(k+b)$.

Instead of using a pseudo-random number generator, you could use some hash algorithm to generate the pad.
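A minimal sketch of the scheme described above, assuming SHA-256 as the hash and assuming the per-block seed is formed by concatenating the key with an 8-byte block counter (a stand-in for $f(k+b)$). The names `keystream` and `xor_crypt` are illustrative only, and this is not a vetted cipher:

```python
import hashlib

def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pad bytes by hashing key || block counter."""
    out = bytearray()
    block = 0
    while len(out) < length:
        # Assumption: the per-block seed is the key followed by the block number.
        out += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        block += 1
    return bytes(out[:length])

def xor_crypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt; XOR with the same pad is its own inverse."""
    pad = keystream(key, len(data))
    return bytes(d ^ p for d, p in zip(data, pad))

key = b"shared secret key"
ciphertext = xor_crypt(key, b"attack at dawn")
plaintext = xor_crypt(key, ciphertext)  # recovers the original message
```

Because the pad is generated block by block, it never has to be held in memory all at once; a real implementation could XOR each 32-byte block as it is produced.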

Ceteris paribus (assuming you can use equivalently strong keys), would this not be a very strong cipher method? Perhaps it would be strong but heavy on memory, since during encryption and decryption you need to generate a pad as large as the data.

Is there a flaw in this algorithm, or would something like this not be practical and much slower than current standards?

Mike Edward Moras
dardawk

9 Answers

26

It's a good question. As pg1989 said, this is the basis behind stream ciphers, which are very fast in practice.

I thought I'd quickly expand upon your statement that "the one-time pad is the perfect cipher and impossible to crack." This is true, in a sense, but it's worth pointing out that an attacker sometimes wants to do something simpler than "cracking" the encryption scheme to recover plaintexts.

For instance, an attacker may just want to change the message so that the honest recipient decrypts something else. As a simple example, suppose that the attacker knew for sure that the message was either "yes" or "no!" and just wants to change the message to the other value. Note that:

  1. This is simple to do for messages encrypted using a one-time pad. The linked Wikipedia page describes why this is true, but it's probably more fun to figure it out for yourself.

  2. The attacker can do this without learning the plaintext. Hence, the original claim of a one-time pad being uncrackable is still true.
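For readers who would rather see the trick than work it out, here is a small sketch of the "yes"/"no!" swap. It assumes a byte-wise XOR pad drawn from `os.urandom`; the helper `xor_bytes` is an illustrative name. The attacker XORs the ciphertext with the difference between the two candidate messages and never touches the pad:

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

pad = os.urandom(3)                      # known only to sender and recipient
ciphertext = xor_bytes(b"yes", pad)      # what the attacker intercepts

# Attacker: knows the message is "yes" or "no!", but not which, and not the pad.
delta = xor_bytes(b"yes", b"no!")
forged = xor_bytes(ciphertext, delta)

# Recipient decrypts the forged ciphertext with the genuine pad.
received = xor_bytes(forged, pad)        # the opposite of what was sent
```

The same `delta` turns an encrypted "no!" into "yes", so the attacker flips the answer either way without learning which one was sent.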

There are some other issues with a one-time pad construction too, such as the fact that usually some state will be required to make sure that you don't re-use the seed during the next encryption. Hence, while your question is definitely not silly, and stream ciphers are very useful in certain circumstances, they also have their limits.

Mayank Varia
21

Yes, this is a widely-used cryptographic construction called a stream cipher. For more information about this and other encryption schemes, Coursera's cryptography class is a good resource.

Adi
pg1989
9

Congratulations, you just reinvented the stream cipher.

The main strength of the one-time pad is that the key space is as large as the message space. This means that any ciphertext-only attack fails, because every plaintext of the same length is equally consistent with the ciphertext.

This automatically means that any construct that decreases the key space (like using a seed for a PRNG) severely weakens the one-time pad.

Besides this, a one-time pad (like any XOR-based stream cipher) is very vulnerable to repeated-key attacks, whereas other ciphers are more secure against key reuse.
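A short sketch of why reusing a pad is dangerous, assuming two equal-length messages encrypted with the same random pad (the messages and the helper `xor_bytes` are made up for illustration). XORing the two ciphertexts cancels the pad entirely:

```python
import os

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

pad = os.urandom(16)
p1 = b"transfer $100 ok"
p2 = b"meet me at noon!"
c1 = xor_bytes(p1, pad)
c2 = xor_bytes(p2, pad)           # same pad reused: the mistake

# The attacker sees only c1 and c2; the pad drops out of their XOR.
leak = xor_bytes(c1, c2)          # equals p1 XOR p2

# If the attacker knows or guesses one plaintext, the other follows.
recovered_p2 = xor_bytes(leak, p1)
```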

ratchet freak
6

If using a cryptographically-secure random number generator then the result is a stream cipher. If using actual random numbers, then it's a one-time pad.

Any output you get from a random source needs to be run through a randomness extractor anyway in a 2:1 ratio (2 bits in, 1 bit out).

Don't forget to provide a MAC along with the ciphertext to prevent an attacker altering the message.
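One possible shape for this, sketched with HMAC-SHA-256 from Python's standard library in an encrypt-then-MAC arrangement. The function names `seal` and `open_sealed`, the sample bytes, and the use of a separate MAC key are assumptions made for the example:

```python
import hashlib
import hmac

TAG_LEN = 32  # SHA-256 output size in bytes

def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC tag computed over the ciphertext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def open_sealed(mac_key: bytes, blob: bytes) -> bytes:
    """Verify the tag and return the ciphertext, or raise if it was altered."""
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        raise ValueError("authentication failed: ciphertext was modified")
    return ciphertext

mac_key = b"independent MAC key"              # assumed distinct from the encryption key
blob = seal(mac_key, b"\x13\x37\xc0\xde")     # stand-in for real ciphertext bytes
tampered = bytes([blob[0] ^ 0x01]) + blob[1:] # attacker flips one bit
```

Calling `open_sealed(mac_key, blob)` returns the ciphertext unchanged, while `open_sealed(mac_key, tampered)` raises `ValueError`, so the recipient never decrypts an altered message.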

If you want to use a CSPRNG, don't use just one algorithm. There's potential for a powerful attacker to determine a pattern from it. Generate a few separate keys from quality randomness, each 512 bits or longer. Then, with each key, generate a separate key stream using a different algorithm. Then combine the streams, perhaps with XOR.

Don't forget you need a way to keep in sync with the other person even in network failure so you don't re-use part of the stream/pad.

NDF1
5

From what I understand from your question, you are describing a stream cipher.

"If the one-time pad is the perfect cipher and impossible to crack, why would the following algorithm not be one of the strongest ..."

You're on the right track; a one-time pad is essentially a perfect (unbreakable) stream cipher.

Without going into (any) mathematical details, the main difference between your average stream cipher and a one-time pad (OTP) is that an ideal OTP requires a random key at least as long as the plaintext, whereas your average stream cipher requires a key of only some fixed length N, regardless of plaintext size (assuming the keys in both cases are completely random).

In a perfect universe, your stream cipher would use a short key size to come up with an unguessable, unpredictable, completely random number of length equal to the plaintext, and XOR the two together.

What, however, would be the purpose of that? If the end result of the cipher is nondeterministic, there would be no way, given only the key, to recover the original message.

What we need then is a deterministic mathematical function where we can create a long string of predetermined digits given a particular input (the key), which are otherwise (very) unlikely to be guessed without the key.

Our system's security is now defined by the weaker of either our key size, or the flaws in our mathematical function.

Paul
3

I also want to add a bit more on why you would not want to use a true one-time pad. To use the analogy of sending a secret message from a capital to a diplomatic embassy, it is a great and secure, somewhat "foolproof" encryption method, but in practice it has some drawbacks:

  • You would need as much random data (i.e. the "codebook") as you would ever want to transmit. If you sent enough messages with sufficient size, you have exhausted your random pool. You cannot re-use the same random numbers ("codes") or it would be insecure.
  • You would now need a secure and reliable way to hand the codebook over. If there were any eavesdroppers, you couldn't do this. This means that you can't just email your key over to someone, you'd need to transport it in something akin to a locked attache case handcuffed to your wrist with a tamper proof (or tamper evident) seal.
  • Your local embassy has the codebook. Therefore, it is possible that a spy inside the embassy could use it to forge a message to the embassy, pretending it was from the capital (i.e. it is a symmetric cipher; the codebook can be used both ways).

So when your random data pool ran out, you'd need to send out a new attache, and could not use encryption until this was done.

With other methods, a public key can be sent in the clear - if an eavesdropper were to receive it, it would do them little good, and there is no "pool" of random data that would be exhausted.

As awkward and restrictive as it is however, I still believe there are HIGH SECURITY applications in which this may provide the strongest encryption.

Brad
3

This is indeed a good question; let me try to make it a bit more precise. Suppose:

  • Alice has a plaintext message of some number of bits, call it p.
  • Alice and Bob share a crypto-strength random number generator that generates n truly random bits.
  • Alice and Bob share a pseudo-random number generator that can take a seed of size n and produce one of $2^n$ sequences of p bits.
  • Alice and Bob have an insecure channel and a secure channel.
  • Alice (or Bob) creates a truly random key KEY of size n and sends it to Bob (or Alice) over the secure channel.
  • Alice creates a pseudo-random sequence SEQ of size p and xors it with the plaintext to produce a ciphertext of size p
  • Alice sends the ciphertext to Bob over the insecure channel.
  • Bob decrypts the ciphertext in the obvious way.
  • Details of the system are not part of the key. That is, the attacker should be presumed to know precisely how the PRNG works.

Ok, so first of all, the obvious criticism is: if Alice and Bob have a secure channel then why are they messing with encryption at all? And the answer is: the secure channel may be more expensive or only available at inconvenient times.

The second obvious criticism is: if Alice and Bob have a device that can generate n truly random bits then why not use it to generate p truly random bits in the first place, and get the pseudo-random number generator out of the picture? And the answer is: because sending p bits over the secure channel may be too expensive.

So what attacks could be mounted against this system?

The accepted answer points out that this system only defends against one attack, namely, it makes it very hard for Eve to decrypt the intercepted ciphertext. It does not provide any mechanism for Bob to verify that the ciphertext received was actually produced by Alice. If bits of that ciphertext are flipped, they'll be flipped in the plaintext too and Bob has no way of knowing that.

Another attack is: suppose n is a relatively small number, say, 32. That means there are only about four billion possible values for KEY and therefore only that many pseudo-random bit sequences SEQ. Eve can obtain the ciphertext and simply try all of them until one of them decrypts it into something sensible. This implies that n had better be pretty large; large enough that this brute-force attack is infeasible. (And of course, if n has to be so large to prevent this attack that it is a significant fraction of p, then again, take the PRNG out of the picture and just generate as many truly random bits as there are bits in the plaintext.)
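The brute-force attack can be sketched as follows. To keep the search fast, the example assumes n = 16 rather than 32, uses Python's `random.Random` (a Mersenne Twister, not cryptographically secure) as a stand-in PRNG, and treats "all printable ASCII" as the test for a sensible decryption. All of these choices are illustrative:

```python
import random

def prng_stream(seed: int, length: int) -> bytes:
    """Keystream from Python's Mersenne Twister; NOT cryptographically secure."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(length))

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def looks_sensible(data: bytes) -> bool:
    """Crude plausibility test: every byte is printable ASCII."""
    return all(32 <= byte < 127 for byte in data)

SEED_BITS = 16                    # deliberately tiny so the loop finishes quickly
secret_seed = 0xBEEF              # the KEY, unknown to Eve
message = b"Meet me at the embassy at noon."
ciphertext = xor_bytes(message, prng_stream(secret_seed, len(message)))

# Eve knows the PRNG and the ciphertext, and simply tries every possible seed.
candidates = []
for seed in range(2 ** SEED_BITS):
    guess = xor_bytes(ciphertext, prng_stream(seed, len(ciphertext)))
    if looks_sensible(guess):
        candidates.append((seed, guess))
```

The true seed and message always end up in `candidates`. Each extra bit of n doubles the work, which is why n must be large enough to put this search out of reach.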

And yet another attack is: suppose the attacker manages to obtain the whole ciphertext -- easy -- and manages to obtain or guess k bits of the plaintext. This is not as hard as you might think; lots of messages have common patterns in them. (WWII era allied codebreakers made good progress by assuming that a significant fraction of messages would end in a greeting to the Führer; and they were right.) If k bits of the plaintext and all of the ciphertext are known then k bits of SEQ are known. Now the question is: Can the attacker make a good guess as to the value of KEY given k bits of SEQ and knowledge of the internals of the PRNG? If they can then the task of generating SEQ just got a lot easier for the attacker. The PRNG had better be carefully designed to make this kind of key recovery infeasible.
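To make the known-plaintext attack concrete, here is a sketch against a deliberately weak generator: a linear congruential generator that emits its entire 32-bit state as keystream. The constants are the widely published Numerical Recipes LCG parameters; the message, the seed, and the guess that the message begins with "Dear" are invented for the example:

```python
A, C, M = 1664525, 1013904223, 2 ** 32   # public LCG constants

def lcg_stream(state: int, length: int) -> bytes:
    """Keystream made of successive LCG states, 4 bytes per step."""
    out = bytearray()
    while len(out) < length:
        state = (A * state + C) % M
        out += state.to_bytes(4, "big")
    return bytes(out[:length])

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

seed = 0xDEADBEEF                         # KEY, unknown to the attacker
message = b"Dear Bob, the meeting is moved to Friday."
ciphertext = xor_bytes(message, lcg_stream(seed, len(message)))

# Attacker guesses the first four plaintext bytes, which exposes
# the first four keystream bytes, i.e. one full generator state.
known = b"Dear"
first_state = int.from_bytes(xor_bytes(ciphertext[:4], known), "big")

# From that state the attacker regenerates the rest of SEQ without the seed.
rest = xor_bytes(ciphertext[4:], lcg_stream(first_state, len(ciphertext) - 4))
recovered = known + rest
```

Here 32 known bits are enough to decrypt everything that follows. A well-designed generator must not let any amount of known keystream reveal its internal state or key.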

Mike Edward Moras
Eric Lippert
1

You would need to generate random numbers at the same size and rate as your data. True randomness is crucial, and you must not re-use the numbers. Quantum RNGs may be the solution.

Aydin
0

I'm not an expert either, but I think an obvious flaw is that if you used a well-known PRNG algorithm with well-known constants, then it might be possible for an attacker to guess the pseudo-random number stream.

I'm using a method similar to the one you describe, but I have 4 PRNGs, each with "non-standard" constants I chose myself. I generate the pseudo-random key stream like this:

  1. using PRNG#4 generate a random seed#1 for PRNG#1, then use this to generate key stream 1, to the same length as the cleartext.

  2. using PRNG#4 generate a random seed#2 for PRNG#2, then use this to generate key stream 2, to the same length as the cleartext.

  3. using PRNG#4 generate a random seed#3 for PRNG#3, then use this to generate key stream 3, to the same length as the cleartext.

  4. XOR key streams 1, 2 & 3 together, to create the final key stream, which is XOR'd with the cleartext to create the cyphertext. Append seed#1, seed#2 & seed#3 to the cyphertext to enable decryption by the exact same process.

If each PRNG can generate (e.g.) 64000 PRNs before repeating, then there will be 64000 * 64000 * 64000 = 262144000000000 (about 2.6 × 10^14) different final key streams. This will help ensure that any particular final key stream will very rarely get re-used. I tested my algorithm for 300 million encryptions and no final key stream ever got repeated. Because I know the number of encryptions I perform per year (approx. 1 million), I'm confident that no final key stream would get re-used within 300 years, well beyond the lifetime of my system or its data. And because I have chosen my own PRNG constants, it would be difficult for an attacker to guess them. An important point to make here is that, should you decide to choose your own PRNG constants, you need to do some sanity checking of the randomness of the resulting PRN stream: badly chosen constants can result in a "PRN stream" that is completely predictable. My system depends on the encryption algorithm and implementation code being kept secure and private, with only the cyphertext publicly visible (which is the case).

The usual warnings apply here... when there are so many theoretically proven encryption algorithms available, you need a good reason to devise your own, unless it's just for fun.