Are picture files "random enough" to be usable as a one-time pad?

Question

Say you have a 1-megapixel picture whose pixels are chosen at random, with $2^{24}$ possible colours per pixel (RGB-24). That image would be unique, and the number of possible combinations, $(2^{24})^{10^6}$, is immense.
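To put a number on that count (a back-of-the-envelope expansion, assuming every pixel really were uniform and independent):

$$(2^{24})^{10^6} = 2^{24 \cdot 10^6} = 2^{24\,000\,000}$$

That is 24,000,000 bits, i.e. 3,000,000 bytes of pad, and the count itself is a number with roughly 7.2 million decimal digits.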

However when taking a picture in the real world, say of a clear sky, there will be a lot of repetition.

The question is: would such repetition present a security risk when the picture is used as a one-time pad, where the requirement for randomness is so high?

My hunch is that it is, as true randomness would require the possibility of all pixels being #FF0020 or whatever, but I would like to be proven right or wrong.

If I've been unclear at some point, please let me know and I will edit my post.

score 15 · Accepted Answer · answered Sep 08 '12 at 00:39

No. This is not safe. The one-time pad requires that the pad be generated by a true-random process, where each bit of the pad is chosen uniformly at random (0 or 1 with equal probability), independent of all other bits.
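For reference, here is a minimal sketch of the scheme as defined. The function name `otp_xor` is made up for illustration, and `secrets.token_bytes` (the operating system's CSPRNG) only stands in for a true-random source here; it is an assumption of this sketch, not a claim that it satisfies the definition.

```python
import secrets


def otp_xor(data: bytes, pad: bytes) -> bytes:
    """XOR data with a pad that is at least as long as the data."""
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"ATTACK ON 10 SEPT"
# Stand-in for a true-random pad (assumption: OS CSPRNG used for illustration).
pad = secrets.token_bytes(len(message))

ciphertext = otp_xor(message, pad)
recovered = otp_xor(ciphertext, pad)  # XOR with the same pad undoes the encryption
```

The pad must be as long as the message and must never be reused for a second message.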

Any deviation from that, and what you have is no longer the one-time pad cryptosystem -- it is some kludgy thing. In particular, once you deviate from that requirement even a little bit (and you're talking about a huge deviation), you are skating on thin ice and there will probably be security problems with your scheme.

If you're gonna use the one-time pad, you gotta use it exactly as it is defined, with a truly-random pad. There are no shortcuts, no halfway stuff. Messing around with this sort of thing is exactly what enabled the US to cryptanalyze Soviet use of a "one-time pad" in the VENONA project.

But in practice, you probably don't want to use the one-time pad anyway. The key management issues are enough that it is rarely a good choice in practice.

ronalchn · Answer 2 · 2012-09-07T23:05:30.487

You should not use the raw data of any image as a one-time pad. This is even worse with an image of a sky, because of the large number of blue pixels. In all images, adjacent pixels tend to be the same colour, which means there is a large amount of repetition.

If you want to use some of the data of the image as a one time pad, you will need to condition the data (concentrating the entropy present in the image).

A simple example of concentration of entropy is to take 2 not-so-random integers, a and b, and perform an operation, such as (a*b+a+b), then extracting the lower order bits (probably half). This scheme would eliminate bias present in the original integers. Of course, a more complex scheme is probably required.

A simple scheme you could use, which would be quite random, is to use a digest on the data. If for example, you believe that a third of the bits in the image contain useful entropy, then from every 64 pixels, containing 64*24 = 1536 bits, feed it into a SHA512 hash function, which will output a 512 bit digest (that is 64 bytes). You can then use that output for your one-time pad.
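A rough sketch of that digest scheme, assuming the image has already been decoded to raw RGB-24 bytes (the function name `condition` and the constants are made up for illustration):

```python
import hashlib

PIXELS_PER_BLOCK = 64
BYTES_PER_PIXEL = 3  # RGB-24
BLOCK_BYTES = PIXELS_PER_BLOCK * BYTES_PER_PIXEL  # 192 bytes = 1536 bits


def condition(raw_pixels: bytes) -> bytes:
    """Hash each 64-pixel block with SHA-512 and concatenate the digests."""
    out = bytearray()
    # Drop a trailing partial block rather than hashing less input than assumed.
    usable = len(raw_pixels) - len(raw_pixels) % BLOCK_BYTES
    for start in range(0, usable, BLOCK_BYTES):
        block = raw_pixels[start:start + BLOCK_BYTES]
        out += hashlib.sha512(block).digest()  # 64 bytes out per 192 bytes in
    return bytes(out)


# Example input: 768 bytes of dummy pixel data, i.e. four full blocks.
dummy_pixels = bytes(range(256)) * 3
conditioned = condition(dummy_pixels)
```

Note that two identical input blocks produce identical digests, so the one-third entropy estimate has to hold for every block, not just on average.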

An IEEE article on Bull Mountain, Intel's Random Number Generator, includes some discussion on "concentration" of randomness, when the input data is not random enough.

score 3 · Answer 3 · answered Sep 09 '12 at 18:59

To see why repetition is so dangerous, imagine attacking a worst-case scenario: a BMP picture file that is entirely black. The contents of the image file will be #000000 #000000 #000000 #000000 ... Now consider how a one-time pad works: it XORs the cleartext with the bit stream. So if your plaintext was "ATTACK ON 10 SEPT", and you XORed it with an image that started with some repeating black pixels, the resulting "cipher text" would be "ATTACK ON 10 SEPT". I wouldn't be surprised if your enemy is not surprised.
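The black-image case takes only a few lines to reproduce (a sketch; the all-zero byte string stands in for the pixel data of a black bitmap):

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings position by position."""
    return bytes(d ^ k for d, k in zip(data, key))


plaintext = b"ATTACK ON 10 SEPT"
black_pixels = bytes(len(plaintext))  # every byte is 0x00, like #000000 pixels

ciphertext = xor_bytes(plaintext, black_pixels)
print(ciphertext)  # b'ATTACK ON 10 SEPT'
```

XOR with zero is the identity, so the "ciphertext" is the plaintext.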

Any swath of repeating bytes in the key file will do the same. The attacker just has to try 255 guesses to look for stretches of intelligible ASCII text.
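That guessing attack can be sketched as follows. The helper names and the "printable ASCII" test are assumptions made for illustration; a real attacker would use a better test for intelligible text.

```python
def looks_like_text(data: bytes) -> bool:
    """Crude check: every byte is printable ASCII."""
    return all(32 <= b < 127 for b in data)


def guess_repeated_byte(ciphertext: bytes) -> list[tuple[int, bytes]]:
    """Try all 255 non-zero key bytes and keep the printable candidates."""
    hits = []
    for key_byte in range(1, 256):
        candidate = bytes(c ^ key_byte for c in ciphertext)
        if looks_like_text(candidate):
            hits.append((key_byte, candidate))
    return hits


secret = b"ATTACK ON 10 SEPT"
# A key stretch consisting of one repeated byte (0x5A chosen arbitrarily).
ciphertext = bytes(c ^ 0x5A for c in secret)
hits = guess_repeated_byte(ciphertext)
```

The list may contain a few false positives, but the real message is among the candidates and is easy to pick out by eye.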

Long ago a friend of mine wrote a proxy that used XOR "encryption" like this. My first attempt to discover his key was to download a black .GIF file, and his "secret key" printed itself in front of my eyes.

PulpSpy · Answer 4 · 2012-09-10T14:06:55.683

The amount of randomness in common pictures has actually been studied thoroughly, just not for applications to encryption, but rather for steganography. An artifact of images is that the least significant bit (it is what changes between slightly different shades of blue) has the highest entropy.

A simple stego-system is to overwrite the least significant bits of a picture with, say, a ciphertext or key (pseudorandom and truly random, respectively), a compressed plaintext (high entropy), or a raw plaintext (low entropy). In each case, it is generally practical to distinguish the true distribution of an image's LSBs from data of higher or lower entropy.

A consequence of this is that an image does not make a good one-time pad, as even the most random aspect of the image is not random enough.

score 2 · Answer 5 · answered Sep 09 '12 at 19:17

Great question. I had actually been thinking about the same thing some time ago, but I realized that using an image as a one-time pad isn't a good idea. Try taking some random pictures and then opening them with a hex editor (like XVI32). I did that and noticed that the bytes were not all that random; for example, many picture files have a lot of 0x00 bytes. Even though this only affects part of the picture, it would still give someone a head start on trying to decrypt.

Paul Uszak · Answer 6 · 2017-06-13T01:06:59.763

Photographs make perfect sources of randomness for OTPs. This question may be a little stale, but most of the answers here are wrong and it's an interest of mine.

One of the great questions facing Mankind is where to get entropy from? Entropy is a fundamental tenet of the Universe and is all around. It's just a question of getting at it. Since this is a photographic question, the following is an example of a typical cryptographic cat:-

The original unopened JPEG file is 3.4MB in size.

My best estimate via compression (fp8) of the entropy is 2.5MB. Let's be extremely clear - I'm compressing the original JPEG file without opening it. Never ever open a JPEG to measure entropy. You'll just measure the JPEG extraction algorithm and fall into the entropy vs. complexity trap.

Let's be ultra conservative (and lazy) and use a safety factor of 2.5. It is impossible that compression will improve this much, as all the latest compression tests already show a very pronounced asymptotic tendency. fp8 is amongst the best (non text specific) compression programs available that you can compile reasonably easily.

So usable entropy of image = 1 MB.

You then extract 1MB of entropy using a simple extractor on the original (unopened) JPEG file. You can use:-

Multiple Pearson hashes
Large matrix extractor
Wide substitution & permutation network
SHA1 & counter based extractor

Each way will render the 1MB of pure entropy to use as a perfect OTP. This is sufficient for 7000 Twitter messages. Then you can take another photo for next month.
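The arithmetic behind those figures, as a quick check. The 140 bytes per message is an assumption made here; the sizes are the ones quoted above.

```python
compressed_estimate_bytes = 2_500_000  # fp8 estimate of the 3.4MB JPEG's entropy
safety_factor = 2.5

# Usable entropy after applying the safety factor: 1 MB.
usable_bytes = compressed_estimate_bytes / safety_factor

bytes_per_message = 140  # assumption: one byte per character of a 140-character message
messages = int(usable_bytes // bytes_per_message)

print(usable_bytes, messages)  # 1000000.0 7142
```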

This technique works perfectly for two simple reasons:-

Assume I didn't show you the photo. I just take a photo of something, extract the entropy and then eat the storage card. The cat is an example, please do not say that looks like my cat. You wouldn't know what the image was. It could be anything in the world, from any angle and under any lighting condition.
The avalanche effect will ensure that even photographs that look identical to your eye will have entirely different extracted entropy sequences. And you have to factor in the sensor noise that makes unique all JPEG images ever taken by Man. All that's required is a single bit's difference in the unopened JPEG file.

Ultimately this relies on the entropy of your camera's viewpoint. And considering how many views there are on Earth, that's why photographs make perfect entropy sources for OTPs. This is trivially proven beyond doubt. I challenge anyone reading this to produce photo1 and photo2 where:-

SHA3-512(photo1.jpg) = SHA3-512(photo2.jpg)

EDIT.

As an exercise I did ent CUTE_CAT.JPG.fp8 which produced:-

Entropy = 7.999926 bits per byte.

Optimum compression would reduce the size of this 2503292 byte file 
by 0 percent.

Chi square distribution for 2503292 samples is 256.14, and randomly
would exceed this value 46.81 percent of the times.

Arithmetic mean value of data bytes is 127.4831 (127.5 = random).

Monte Carlo value for Pi is 3.140577400 (error 0.03 percent).

Serial correlation coefficient is -0.000503 (totally uncorrelated = 
0.0).

This is actually a very good prima facie pass for randomness. Even without the randomness extraction surprisingly.
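For anyone without ent to hand, three of those statistics can be approximated in a few lines. This is a simplified sketch, not ent's actual code, and the name `byte_stats` is made up.

```python
import math
from collections import Counter


def byte_stats(data: bytes) -> dict[str, float]:
    """Shannon entropy per byte, arithmetic mean and chi-square of byte counts."""
    if not data:
        raise ValueError("need at least one byte")
    n = len(data)
    counts = Counter(data)
    # Shannon entropy in bits per byte (8.0 means all 256 values equally frequent).
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    mean = sum(data) / n  # 127.5 for uniformly distributed bytes
    expected = n / 256
    chi_square = sum(
        (counts.get(value, 0) - expected) ** 2 / expected for value in range(256)
    )
    return {"entropy": entropy, "mean": mean, "chi_square": chi_square}


# Example input: every byte value exactly four times (perfectly flat histogram).
stats = byte_stats(bytes(range(256)) * 4)
```

These figures describe byte frequencies only; the flat example input above scores perfectly on all three despite being an obvious repeating pattern.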

score -2 · Answer 7 · edited Sep 28 '24 at 14:08

I have zero credentials on this subject. But common sense says (1) don't mess with JPEG (unless you know what you are doing), and (2) BMP LSBs should be fine (RAW or TIFF may be better; I haven't studied these yet), provided you (a) assume only the lowest bit can be profitably used, so 1/8 of the data captured is usable as entropy, (b) stage your photos very carefully (no blue skies or solid colors with even illumination), and (c) set your camera to native BMP, TIFF or RAW capture, since conversion from JPEG to these other formats risks spoiling your beautiful randomness with orderly algorithms, discarding nature's beautiful ugliness!

In the end, I think, whatever randomness you extract this way should be checked by an algorithm.

Write a quick and easy Python program to test how random the extracted bits actually are.

Even I -- a mere marine biologist -- could fake a good enough Python program to get that accomplished.

In the end, this seems cheap and easy. Is the LSB same as shot noise in these bitmap type formats? I feel like it HAS to be? Where else would you hide or store the randomness of miniature little, misbehaving, quantum indeterminate photons?

An xxd hexdump and a hex editor (Notepad++) might be enough to get it done, plus a quick little Python program to extract the LSBs and then check the entropy after extraction.
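Something along these lines would do for the extract-and-check step. It is only a sketch: it assumes the file header has already been skipped so that `pixel_bytes` holds raw, uncompressed sample bytes, and all function names are made up.

```python
import math


def extract_lsbs(pixel_bytes: bytes) -> list[int]:
    """Keep only the least significant bit of every sample byte."""
    return [b & 1 for b in pixel_bytes]


def pack_bits(bits: list[int]) -> bytes:
    """Pack bits into bytes, most significant bit first; drop leftover bits."""
    out = bytearray()
    for start in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


def bit_entropy(bits: list[int]) -> float:
    """Shannon entropy per bit, from the fraction of ones (1.0 = unbiased)."""
    if not bits:
        return 0.0
    p = sum(bits) / len(bits)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Example input: eight made-up sample bytes.
samples = bytes([0, 1, 2, 3, 254, 255, 8, 9])
lsbs = extract_lsbs(samples)
packed = pack_bits(lsbs)
```

This only measures how balanced the zeros and ones are; it says nothing about correlation between neighbouring pixels, so a strictly alternating pattern like the example still scores 1.0.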

P.S. Another intriguing possibility would be to take a picture of an outdoor scene, irrespective of complexity of scene (leave a little blue sky in, maybe if you want.). And take an indoor scene, either soft warm colors of an incandescent light at midnight, or the sterile hum and glow of a fluorescent bulb. Either way, overlay image one onto image two, by mere addition of pixel values. Or subtraction.

Do note the rules about "byte stuffing". I think those matter a lot here!

Maybe even get 4 bits of every 8 to act random this way? I dunno. I am just guessing. Obviously, write a program to test your random digits, i.e. your output entropy file.

If anybody finds anything wrong with what I wrote, please correct it, politely.
