Database row level encryption scheme

Question

I want to secure some highly sensitive data in a database. The data needs to be encrypted and must remain secure for 100 years if it falls into an adversary's hands. I also want to limit the amount of plaintext held in RAM at any one time, so there is less chance of plaintext being paged to disk. The database may also be quite large, so access needs to be more efficient than decrypting the whole database every time. I am therefore thinking about encrypting the sensitive data at the database row level. A unique index that references each record would stay unencrypted so that records can still be found and retrieved, while the sensitive data itself is encrypted.

My solution would be to have the data per database row:

index | IV | sensitive encrypted data | MAC

A 256 bit database key, generated using /dev/random, will be used to encrypt the sensitive data.
The IV for each row will be 256 bits from /dev/urandom (faster than /dev/random).
The encryption algorithm will be Twofish.
The MAC of each record will be HMAC-SHA3 of the index, IV & sensitive data using the key.
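
A minimal sketch of the per-row MAC described above, using only the Python standard library. Several things here are assumptions rather than part of the scheme as stated: the helper names `row_mac` and `verify_row` are hypothetical, the MAC is computed over the ciphertext (encrypt-then-MAC), the index is taken to be an unsigned 64-bit integer, and the variable-length fields are length-prefixed so that the field boundaries cannot be shifted. Twofish is not in the standard library, so the ciphertext is treated as opaque bytes.

```python
import hashlib
import hmac
import struct


def row_mac(mac_key: bytes, index: int, iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC-SHA3-256 over index | IV | ciphertext with an unambiguous encoding."""
    # Assumption: the row index fits in an unsigned 64-bit integer.
    msg = struct.pack(">Q", index)
    # Length prefixes keep (iv, ciphertext) boundaries unambiguous.
    msg += struct.pack(">I", len(iv)) + iv
    msg += struct.pack(">I", len(ciphertext)) + ciphertext
    return hmac.new(mac_key, msg, hashlib.sha3_256).digest()


def verify_row(mac_key: bytes, index: int, iv: bytes,
               ciphertext: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time before decrypting."""
    expected = row_mac(mac_key, index, iv, ciphertext)
    return hmac.compare_digest(expected, tag)
```

Including the index in the MAC input means a row copied to a different index should fail verification.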

The system is single user. The user will create a strong alphanumeric passphrase (minimum 14 characters).

A password based key derivation function will be run on the passphrase to create a derived encryption key which will then be used to separately encrypt the database key with Twofish. This is so the user can change their password without having to re-encrypt the entire database - they can just create a new password and re-encrypt the database key instead.

To derive the key from the passphrase, PBKDF2 will be used with 10,000 iterations using HMAC-SHA3 with a 256 bit output and a salt of 256 bits obtained from /dev/urandom.
What I am trying to do is balance the password length required to keep the data secure against keeping the derivation reasonably fast for users on mobile devices, which have slow processors and limited memory. I don't want the user to wait more than 5 seconds for the PBKDF to complete.
A MAC is created using HMAC-SHA3-256(derived encryption key, salt | encrypted database key) and stored next to the salt and encrypted database key on disk. This can be verified when logging in to make sure they entered the correct password.
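
The login check described above could be sketched roughly as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names are hypothetical, the encrypted database key is treated as opaque bytes (Twofish is not in the Python standard library), and PBKDF2 is run with HMAC-SHA256 instead of HMAC-SHA3 because SHA-3 support in `hashlib.pbkdf2_hmac` depends on the Python and OpenSSL build. The salt is assumed to have a fixed length of 32 bytes, so concatenating it with the encrypted key is unambiguous.

```python
import hashlib
import hmac

ITERATIONS = 10_000  # iteration count taken from the question


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch the passphrase into a 256-bit key with PBKDF2.

    Assumption: HMAC-SHA256 is swapped in for HMAC-SHA3 here.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"),
                               salt, ITERATIONS, dklen=32)


def make_check_tag(derived_key: bytes, salt: bytes, wrapped_db_key: bytes) -> bytes:
    """HMAC-SHA3-256(derived key, salt | encrypted database key)."""
    return hmac.new(derived_key, salt + wrapped_db_key, hashlib.sha3_256).digest()


def passphrase_is_correct(passphrase: str, salt: bytes,
                          wrapped_db_key: bytes, stored_tag: bytes) -> bool:
    """Re-derive the key and compare tags in constant time."""
    candidate = make_check_tag(derive_key(passphrase, salt), salt, wrapped_db_key)
    return hmac.compare_digest(candidate, stored_tag)
```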

When the program loads, the user enters the passphrase. The KDF runs, which generates the key to decrypt the database encryption key. The real encryption key is then the only thing kept in RAM while the program is running and used to verify and decrypt individual database records when required.

1. What's the optimal length for the row level IV? Is 256 bits fine?
2. Is the minimum password strength of 14 characters and 10,000 iterations of PBKDF2 strong enough to protect the 256 bit database key? If not, what parameters would work?
3. Is PBKDF2 still a good algorithm to use here? If not, what Scrypt parameters?
4. Any further changes or recommendations to make the system secure?

score 2 · Accepted Answer · answered Oct 15 '14 at 10:06

The way the iterations work is that they roughly increase your security (in bits) by $\log_2(iterations)$. So, with an alphabet of 97 possible characters, you would still need $\frac{\log{2}}{\log{97}}\cdot (256 - \log_2(10000)) \approx 37$ characters in your password to have 256 bits of security. Think of it this way: $2^{256}$ possible keys is an astronomically large number, much more than the number of elementary particles in the universe. In comparison, 10,000 is basically nothing. Increasing the keyspace gives you exponential security, while adding iterations to your key derivation function gives you only linear security. It's nice, but at the end of the day it doesn't help as much as you think it does.
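
The arithmetic above can be reproduced with a few lines of Python. The function name `chars_needed` is hypothetical, and the calculation assumes every password character is drawn uniformly at random from the alphabet, which is the same assumption the formula makes.

```python
import math


def chars_needed(target_bits: float, alphabet_size: int, iterations: int) -> int:
    """Password length needed to reach target_bits of brute-force work."""
    bits_from_stretching = math.log2(iterations)  # about 13.3 bits for 10,000
    bits_per_char = math.log2(alphabet_size)      # about 6.6 bits for 97 symbols
    return math.ceil((target_bits - bits_from_stretching) / bits_per_char)


print(chars_needed(256, 97, 10_000))  # 37
print(chars_needed(120, 97, 10_000))  # 17
```

Doubling the iteration count adds only one bit, whereas each extra random character adds about 6.6 bits.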

Having said that, you probably don't need 256 bits of security anyway. As I said before, that is a ridiculous amount of security. No increase in computational power will ever let anyone brute force a 256-bit key. There are physical limits to computation, and it would take something like the entire energy of a dozen stars to be able to do that. The reason we have things like AES-256 is that we anticipate that in the future people will find more efficient attacks against AES which do not require brute forcing. With a longer key we hopefully delay these attacks. So, if someone discovers an attack which works in time $2^{n/2}$ for an $n$-bit key, AES-128 will be effectively broken. But with AES-256 we would still have 128 bits of security, which should suffice for a while longer.

For 100 years, the number of bits of security you really need is closer to 120-150. The outside limit of what anyone can brute force today is probably 70-80 bits, and remember adding bits gives you exponentially more security. So, you would really only need a password that was 17-20 characters long.

score 2 · Answer 2 · edited Apr 13 '17 at 12:48

Some points towards an answer:

Why HMAC-SHA3? HMAC and its security proofs have been devised for Merkle-Damgård hashes, and SHA3 is not one. HMAC-SHA256 would be fine (Updated per comment: the Keccak submission does endorse its use with HMAC, using block size parameters of 576 (resp. 832, 1088, 1152) bits for the hash with output of 512 (resp. 384, 256, 224) bits; but NIST's current draft specification of SHA-3 does not mention HMAC, or a block size; on NIST's website I only located this slide that does).
Why Twofish? It did not win the AES competition, and is (thus) less widely available, especially in hardware.
Twofish alone does not define an encryption algorithm, for it is a block cipher. You need to specify an operating mode.
On 1.: Since Twofish is a 128-bit block cipher (as AES is), we do not even know what to do with a 256-bit IV (in common operating modes). 128 bits is fine for all common modes.
On 2.: As explained in a fine other answer, a 14-character key/pass-phrase can't give 256-bit security. If 14 characters are drawn uniformly at random among 64 characters, and with 10000 iterations of PBKDF2, one gets $14\cdot\log_2(64)+\log_2(10000)\approx 97$ bits of security; that is, the odds of success of a hypothetical effort involving $2^{90}$ hashes (which may be feasible in a 5-year effort, starting from now, for a well funded governmental organization) are less than $1/150$. That's likely enough for any practical use (though there might exist regulations interpretable as requiring even more).
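
The estimate above can be checked numerically. This is only a sketch of the arithmetic; the function names are hypothetical, and it assumes a uniformly random passphrase, as the text does.

```python
import math


def effective_bits(length: int, alphabet_size: int, iterations: int) -> float:
    """Brute-force work factor in bits: passphrase entropy plus stretching."""
    return length * math.log2(alphabet_size) + math.log2(iterations)


def success_odds(attacker_log2_hashes: float, bits: float) -> float:
    """Chance that an attacker able to compute 2**attacker_log2_hashes hashes hits the key."""
    return min(1.0, 2.0 ** (attacker_log2_hashes - bits))


bits = effective_bits(14, 64, 10_000)  # about 97.3 bits
odds = success_odds(90, bits)          # about 1/156, i.e. less than 1/150
print(round(bits, 1), round(1 / odds, 2))
```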
On 2. and 3.: However, if a human needs to memorize a 14-character key/pass-phrase, or worse generates it, security can be tremendously lower. How much lower is hard to quantify (and the key will often end up written on a badly hidden note anyway). I advise
- at least allowing long and diverse pass-phrases;
- education of users on how to choose a good pass-phrase if they care for security;
- not hoping for strong security if they do not, no matter how clever the criteria used to filter weak passwords are, or the entropy-stretching function used;
- rather than PBKDF2, using scrypt with highest parameters (memory, iteration, CPUs) practical in all envisioned runtime environments and circumstances (in particular, scrypt's increase of the memory requirement dramatically lowers the efficiency of brute-force attack for a given cost; if for some reason the use of HMAC-SHA256 in scrypt was seen as an issue, it is easy to use any trusted MAC, including HMAC-SHA3-256); whatever the method for stretching of pass-phrase to key(s), I suggest that it requires heavy use of significant memory, which is a simple and effective deterrent for ASIC/FPGA-based brute force crackers;
- using a random per-instance salt (where instance is e.g. the passphrase, or database);
- trying to keep that salt secret inasmuch as feasible, which can only help.
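
A rough sketch of the scrypt suggestion above, using `hashlib.scrypt` from the Python standard library (available when Python is built against OpenSSL 1.1 or later). The function name and the parameter values (`n=2**14`, `r=8`, `p=1`, about 16 MiB of memory) are illustrative assumptions, not recommendations from the text; per the advice above they would be raised as far as the slowest target device allows.

```python
import hashlib
import os


def stretch_passphrase(passphrase: str, salt: bytes,
                       n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    """Derive a 256-bit key with scrypt; memory use is roughly 128 * n * r bytes."""
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=n, r=r, p=p,
        maxmem=64 * 1024 * 1024,  # allow up to 64 MiB so larger n still fits
        dklen=32,
    )


salt = os.urandom(32)  # random per-instance salt, stored next to the encrypted key
key = stretch_passphrase("example pass-phrase", salt)
```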
It is un-academic to use the same key for encryption and integrity, which seems to be the case here; however since the algorithms proposed for these two functions are so different, the main practical issue is that if one key leaks (e.g. through DPA), the other does.
Why not AES-GCM, which is standard and kills the confidentiality and integrity birds with the same stone/key?
It is not discussed how the passphrase, key, and deciphered data are going to be protected from disclosure (e.g. by shoulder-surfing, keyboard logger, RAM dump, etc.); that's a hard practical issue. Smart Cards may help, but that turns the whole proposition around.
