Key generation algorithm based on bytes in a drive

Question

I am a novice cyber security student and thought of a way to generate a random cryptographic key.

The algorithm works as follows:

Get the size (in bytes) of every file in a directory (drills down into each folder to only get files)
Concatenate the byte sizes together. Example: a file length of 2 bytes and a file length of 4 bytes would make "24"
This creates a string thousands of digits long (I've gone up to 10,000,000 digits)
If the user needs an extra long key, hash each individual character using SHA-256
Take the final string (millions of bytes), hash it down to two 64 byte hashes, and combine them to make a 128 byte or 256 byte key based on user selection.
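The steps above can be sketched roughly as follows. This is a minimal illustration, not the original implementation: the function names are hypothetical, the traversal order is fixed by sorting (the question does not say how files are ordered), and the final "two hashes combined" step is simplified to a single SHA-256 digest.

```python
import hashlib
import os


def file_size_digits(root):
    """Concatenate the decimal size of every file under root into one string."""
    sizes = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # assumption: fix the traversal order so runs are repeatable
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                sizes.append(str(os.path.getsize(path)))
            except OSError:
                continue  # skip files that vanish or cannot be read
    return "".join(sizes)


def derive_key(root):
    """Hash the digit string down to 32 bytes (256 bits) with SHA-256."""
    digits = file_size_digits(root)
    return hashlib.sha256(digits.encode("ascii")).digest()
```

For a directory holding one 2-byte file and one 4-byte file, file_size_digits returns "24", matching the example above. Note that the output depends only on the file sizes and their order, not on the file contents.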

Some reasons I chose this method include the randomization of the string created since it's very unlikely that two people would have the exact same drive structure, and the size of the bytes hashed to make the key grows very rapidly since most files are several hundred bytes or more.

I guess this isn't a very direct question, but does this algorithm sound useful/secure? What are some ways I could improve on this algorithm?

Paul Uszak · Accepted Answer · 2017-05-30T21:20:28.257

Useful: No. Secure: No.

The most pertinent part of your question is the title which I'd slightly correct to "Key generation algorithm based on file lengths in a drive". It is here at the start that you've made invalid assumptions. There are others later on too though.

A random cryptographic key is essentially entropy, which you can read as uncertainty. So from a high level, key = entropy = uncertainty. The file structure on a typical machine is fairly well specified, whether it's a proper *nix machine or one of those other ones. If the machine is newly created (especially as a virtual image or embedded), the file layout may be identical between machine instances. So any key derived from the file system (lengths or contents thereof) will be highly predictable, and therefore insecure. You have confused complexity with uncertainty. The file system is certainly complex, but think of all of those free AOL CDs that ended up as coasters. If you had run your algorithm across the newly installed AOL directory, the generated keys would be identical on every machine you'd just installed on. All of your 10 million digit concatenations would have been identical, or very nearly identical.
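To make the point concrete, here is a small sketch, assuming a simplified version of the file-size scheme from the question. The function names and the "installer" layout are made up for illustration. Two separate directories receive files with the same names and sizes but different random contents, and the derived keys still match.

```python
import hashlib
import os
import tempfile


def key_from_file_sizes(root):
    """Simplified file-size scheme: concatenate sizes, then SHA-256 the string."""
    digits = ""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fixed traversal order
        for name in sorted(filenames):
            digits += str(os.path.getsize(os.path.join(dirpath, name)))
    return hashlib.sha256(digits.encode("ascii")).hexdigest()


def fake_install(root):
    """Write the same hypothetical installer layout into root."""
    layout = {
        "setup.exe": 4096,
        "readme.txt": 812,
        os.path.join("data", "help.dat"): 20000,
    }
    for rel, size in layout.items():
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(os.urandom(size))  # contents differ, sizes do not


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    fake_install(a)
    fake_install(b)
    key_a = key_from_file_sizes(a)
    key_b = key_from_file_sizes(b)
    print(key_a == key_b)  # prints True: same layout, same key
```

An attacker who can guess or reproduce the layout (a stock OS image, a common installer) can therefore reproduce the key without ever touching the victim's machine.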

The second error is the key length. As a security student, you'll come across typical acceptable key sizes. No one will ever require more than 256 bits of key security. There are physical arguments, based on the energy needed to count through 2^256 states, for why brute-forcing such a key is infeasible. Your attempt at creating an extra long key is meaningless. A lot of cryptographic primitives are designed around the 256 bit size. What were you attempting to use a 10 million byte key for, as there is no legitimate function to absorb such a key? [You're not thinking of a one-time pad, are you?]
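For comparison, a 256 bit key is normally drawn straight from the operating system's cryptographic random number generator. The following is a minimal sketch, assuming Python's standard secrets module; it is offered as the usual alternative, not as something the question's algorithm can be patched into.

```python
import secrets

KEY_BYTES = 32  # 32 bytes = 256 bits

# token_bytes reads from the OS CSPRNG, so each call yields a fresh key.
key = secrets.token_bytes(KEY_BYTES)

print(len(key) * 8)  # key length in bits
print(key.hex())     # hex encoding for display or storage
```

No file system walk, hashing step, or key stretching is needed to reach 256 bits this way.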

With respect, on the basis of this question you're probably quite new to cryptography. There is no effective improvement that can be made to this algorithm without a much more extensive understanding of the nature of a cryptographic key and its relationship to entropy.

You are, though, ultimately correct in realising that a key can be generated from the chaotic churn inside a modern computer environment. Chaos principles can be applied to a file system, and there is true entropy there, but it requires a much greater understanding of the principles before you can extract it safely.
