I am developing a row-level database encryption scheme. Ideally I would not do this and would instead rely on something vetted, but I haven't been able to find such a scheme online.
My primary concerns are:
- A database dump being leaked.
- An unauthorized person gaining access to the database server.
- Authorized personnel with access to the database server reading confidential data without going through my web application, which performs audit logging, etc.
Algorithm-wise, I am currently using AES-256-GCM for encryption and HKDF with SHA-512 to derive a separate key for each row from a single 'master key'. The master key is 32 bytes generated by a CSPRNG.
To facilitate key rotation without downtime, I store a master key version identifier with the encrypted data. On rotation, a new key and its version are added to the application configuration. Encryption operations then use the new key, while decryption remains possible for rows encrypted with the old key. Rows are re-encrypted with the new key in the background, and once that finishes the old key can be retired.
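The rotation flow described above can be sketched as a small versioned key ring. This is a minimal, hypothetical Python sketch: the class name `MasterKeyRing`, its methods, and holding the keys in an in-memory dict are illustrative assumptions rather than part of the scheme as described; in practice the keys would be loaded from the application configuration.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass, field


@dataclass
class MasterKeyRing:
    """Hypothetical in-memory key ring: version id -> 32-byte master key."""
    keys: dict[str, bytes] = field(default_factory=dict)
    current_version: str | None = None

    def add_key(self, version: str, key: bytes | None = None) -> None:
        # New master keys are 32 random bytes from the OS CSPRNG.
        if key is None:
            key = secrets.token_bytes(32)
        if len(key) != 32:
            raise ValueError("master key must be 32 bytes")
        self.keys[version] = key
        # The most recently added key is the one used for new encryptions.
        self.current_version = version

    def encryption_key(self) -> tuple[str, bytes]:
        if self.current_version is None:
            raise LookupError("no master key configured")
        return self.current_version, self.keys[self.current_version]

    def decryption_key(self, version: str) -> bytes:
        # Older versions stay available until every row has been re-encrypted.
        try:
            return self.keys[version]
        except KeyError:
            raise LookupError(f"unknown or retired master key version: {version}") from None

    def needs_reencryption(self, stored_value: str) -> bool:
        # Stored values look like "v5;<base64>"; compare the prefix to the current version.
        version, sep, _ = stored_value.partition(";")
        if not sep:
            raise ValueError("missing master key version prefix")
        return version != self.current_version

    def retire(self, version: str) -> None:
        if version == self.current_version:
            raise ValueError("cannot retire the current master key")
        del self.keys[version]
```

A background job could iterate over rows, check `needs_reencryption` on each stored value, decrypt with `decryption_key(old_version)`, re-encrypt with `encryption_key()`, and call `retire` once no rows reference the old version.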
Here's an example table layout in the database using my scheme:
| Id | EncryptedData |
|---|---|
| 1 | `v5;<base64 encoded encrypted JSON>` |
| 2 | `v4;<base64 encoded encrypted JSON>` |
(v4 and v5 here are the master key versions)
To derive a row key I do the following:
RowKey = HKDF(hash: SHA512, ikm: MasterKey, outputLen: AES_256_GCM_KEY_LEN, salt: SHA512(RowId), info: TableName)
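The derivation above can be sketched with only the Python standard library, implementing HKDF (RFC 5869) on top of HMAC-SHA-512. The function names `hkdf_sha512` and `derive_row_key`, and encoding the row id as its decimal string and the table name as UTF-8, are assumptions for illustration; the question does not specify how those values are turned into bytes.

```python
import hashlib
import hmac

AES_256_GCM_KEY_LEN = 32  # AES-256 uses a 32-byte key


def hkdf_sha512(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF (RFC 5869) instantiated with HMAC-SHA-512."""
    hash_len = hashlib.sha512().digest_size  # 64 bytes
    if length > 255 * hash_len:
        raise ValueError("requested output too long for HKDF")
    # Extract: PRK = HMAC(salt, IKM). RFC 5869 uses a zero-filled salt when none is given.
    if not salt:
        salt = bytes(hash_len)
    prk = hmac.new(salt, ikm, hashlib.sha512).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated until `length` bytes.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha512).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_row_key(master_key: bytes, table_name: str, row_id: int) -> bytes:
    # Assumption: the row id is hashed as its decimal string in UTF-8.
    salt = hashlib.sha512(str(row_id).encode("utf-8")).digest()
    return hkdf_sha512(master_key, salt, table_name.encode("utf-8"), AES_256_GCM_KEY_LEN)
```

The hand-rolled HKDF is only there to make the steps visible; a vetted implementation such as the `HKDF` class in the `cryptography` package would normally be preferable.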
The resulting row key is then used to encrypt data as follows:
Ciphertext, Tag = AES-256-GCM(RowKey, Nonce, Plaintext)
The nonce used above is 12 bytes generated by a CSPRNG. I generate a new nonce for every encryption operation.
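The encryption step above, with a fresh 12-byte random nonce per call, might look like the following. This sketch assumes the third-party `cryptography` package (the question does not name a library); `encrypt_row` and `decrypt_row` are hypothetical names, and no associated data is passed.

```python
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE_LEN = 12  # 96-bit nonce, freshly generated for every encryption
TAG_LEN = 16    # AESGCM appends a 128-bit authentication tag


def encrypt_row(row_key: bytes, plaintext: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (ciphertext, nonce, tag) for one encryption under the row key."""
    nonce = os.urandom(NONCE_LEN)  # CSPRNG output from the OS
    sealed = AESGCM(row_key).encrypt(nonce, plaintext, None)  # no associated data
    # `cryptography` returns ciphertext || tag; split them to match the description.
    return sealed[:-TAG_LEN], nonce, sealed[-TAG_LEN:]


def decrypt_row(row_key: bytes, ciphertext: bytes, nonce: bytes, tag: bytes) -> bytes:
    # Raises cryptography.exceptions.InvalidTag if the ciphertext, nonce, or tag was altered.
    return AESGCM(row_key).decrypt(nonce, ciphertext + tag, None)
```

Here `row_key` would be the 32-byte output of the HKDF step.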
For the final serialization into the EncryptedData column, I Base64-encode the concatenation of ciphertext, nonce, and tag, then prepend the master key version identifier and a semicolon.
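The serialization layout above can be sketched with the standard library alone. The names `serialize` and `deserialize` are hypothetical, and the sketch assumes standard Base64 with padding and version identifiers that contain no semicolon. Because the standard Base64 alphabet never includes `;`, splitting on the first semicolon is unambiguous, and since the nonce and tag have fixed sizes they can be sliced off the end of the decoded blob.

```python
import base64

NONCE_LEN = 12
TAG_LEN = 16


def serialize(version: str, ciphertext: bytes, nonce: bytes, tag: bytes) -> str:
    # Layout: "<version>;" + base64(ciphertext || nonce || tag)
    if len(nonce) != NONCE_LEN or len(tag) != TAG_LEN:
        raise ValueError("unexpected nonce or tag length")
    blob = ciphertext + nonce + tag
    return f"{version};{base64.b64encode(blob).decode('ascii')}"


def deserialize(value: str) -> tuple[str, bytes, bytes, bytes]:
    version, sep, encoded = value.partition(";")
    if not sep:
        raise ValueError("missing master key version prefix")
    blob = base64.b64decode(encoded, validate=True)
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("encrypted value too short")
    # The ciphertext has variable length, so slice the fixed-size nonce and tag off the end.
    tag = blob[-TAG_LEN:]
    nonce = blob[-(NONCE_LEN + TAG_LEN):-TAG_LEN]
    ciphertext = blob[:-(NONCE_LEN + TAG_LEN)]
    return version, ciphertext, nonce, tag
```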
Questions:
- Are there any glaring security flaws in this approach?
- Is there any prebuilt scheme I can use instead of rolling my own?
- Am I using the HKDF correctly? I'm not sure if the salt & info parameters are being correctly used.
- Is there any benefit to hashing the row id before using it as the salt?
- Is SHA-512 overkill or would I be fine using SHA-256?
Thanks for reading, I greatly appreciate any insights and suggestions!