2

I am using GnuPG with AES256 symmetric encryption. There is a passphrase for each kind of file to encrypt, e.g. the passphrase for all work-related documents might be MyWorkRelatedPass19, whereas private home videos might have the passphrase PrivateStuffOfMine34.

Then I came to a solution to make the passphrases a bit longer while minimizing the risk of forgetting part of the passphrase, and that is using a common prefix. So, all work-related documents would use CommonChunk#(/MyWorkRelatedPass19, and my home videos would be encrypted using CommonChunk%#(/PrivateStuffOfMine34.

In case the common prefix is leaked, does this give an advantage to the attacker?

In other words, if each star represents an unknown character, is

LeakedChunk'*********

easier to crack than

'*********

(where the length of the unknown chunk is the same in both scenarios)?

Also, assume that the unknown chunk is completely unrelated to the first one (obviously, if the prefix is "IwantToDrink", I wouldn't use "aGlassOfWater" as the second part).

EDIT: To further specify my question, I would like to know whether knowing the first part of a passphrase gives an attacker any advantage, in terms of how encryption and salting of passphrases work. It's clear that brute-forcing "??????" (six unknown ASCII characters) takes the same number of attempts as brute-forcing "HelloWorld??????" (HelloWorld followed by six unknown characters). In both cases there are six unknown slots to brute-force, but in the second case maybe knowing the first part of the passphrase reduces the search space, because the space of keys after salting is somehow conditioned by the known chunk of the passphrase. I hope this is well explained (my knowledge is very limited here).

I guess, if there is any weakness in "HelloWorld??????" vs "??????", it may be in the key derivation function, can anyone confirm, deny or expand on this?

Mephisto

3 Answers

2

The strength of any such passwords is the strength of the unique part of the password.

If the strengths of the un-prefixed passwords are truly independently sufficient (in other words, more than hard enough to crack), then adding a known prefix does not materially reduce that strength. This is primarily because password cracking requires getting a "full" match: the candidate password must be exactly and entirely correct (collisions aside). So if a password is 15 random characters and begins with "Super Secret" ... you still have to "get" those 15 characters (which is the hard part).
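As a rough illustration of why the known prefix does not change the work, here is a small Python sketch that enumerates every candidate an attacker would have to try, with and without a known prefix. The three-letter alphabet, the three unknown characters, and the function name `candidates` are assumptions chosen only to keep the enumeration tiny.

```python
from itertools import product

def candidates(known_prefix, alphabet, unknown_length):
    """Yield every passphrase consistent with the known prefix."""
    for tail in product(alphabet, repeat=unknown_length):
        yield known_prefix + "".join(tail)

alphabet = "abc"  # toy alphabet, for illustration only
n_without = sum(1 for _ in candidates("", alphabet, 3))
n_with = sum(1 for _ in candidates("LeakedChunk", alphabet, 3))

# Both counts are 3**3: the prefix adds characters but no uncertainty.
print(n_without, n_with)  # 27 27
```

The count depends only on the alphabet size and the number of unknown characters, which is the "unique part" mentioned above.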

But ...

The opposite case -- where the unique part of the password is weak and only the common prefix is strong -- is very bad.

Consider this naive (but common) password "system", in which the user goes to the trouble of generating a great password, but then uses it everywhere, just adding a site-specific mnemonic:

TXZaW$I3DyhAh7.facebook
TXZaW$I3DyhAh7.gmail
TXZaW$I3DyhAh7.twitter
TXZaW$I3DyhAh7.mastodon
TXZaW$I3DyhAh7.capitalone
TXZaW$I3DyhAh7.fanforum

The problem here is that if any of the sites is compromised -- and if any of those sites are storing the password in plain text, or it's leaked in a system log, etc. -- then the pattern would be immediately obvious to an attacker, and all of the passwords would be compromised.
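To show how obvious the pattern is once a few of these passwords sit side by side, here is a minimal Python sketch using the standard library's `os.path.commonprefix`, which compares strings character by character. The list is the example from above; nothing else is assumed.

```python
import os

leaked = [
    "TXZaW$I3DyhAh7.facebook",
    "TXZaW$I3DyhAh7.gmail",
    "TXZaW$I3DyhAh7.twitter",
    "TXZaW$I3DyhAh7.mastodon",
    "TXZaW$I3DyhAh7.capitalone",
    "TXZaW$I3DyhAh7.fanforum",
]

# The longest prefix shared by every entry is the "strong" part.
shared = os.path.commonprefix(leaked)
# What is left over is the only part that differs per site.
suffixes = [password[len(shared):] for password in leaked]

print(shared)    # TXZaW$I3DyhAh7.
print(suffixes)
```

Once the shared part is known, the remaining strength of each password is just the guessability of its suffix.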

I bring this other case up in my answer because it illustrates why your approach, while harmless, doesn't actually make anything much more secure. Making the passphrases "a bit longer" only helps across the ecosystem if that lengthening is globally unique. But if you add a common prefix to each password, then disclosure of that prefix reverts the strength of every password back to the strength of its unique remainder.

In your system, this is harmless (assuming strong passphrases) but unnecessary. In the "password system" case, it's fatal. And this isn't just hypothetical -- if you study a corpus of leaked / intercepted passwords across many different users and platforms, the people using a "password system" like this one are immediately obvious, and all of their passwords in the corpus can be cracked in short order (and you know what platform they're used for!)

Royce Williams
1

Short Answer

In General:

In a proper password-based key derivation scheme that uses modern cryptographic hash functions, prepending a common prefix that is unrelated to the unique portion of the passphrase (the "true password") will not make your passphrases or derived encryption keys weaker.

This is because every password is hashed into a seemingly random output in order to derive the corresponding encryption key. Whether the outputs are sufficiently indistinguishable from random bits depends on the robustness of the hashing algorithm. Since modern cryptographic hash functions have not yet been broken, we can assume that the encryption keys are reasonably secure from direct attacks-- even if they were generated from similar inputs.

As a result, the weakest link in the security scheme would be the passphrase itself, not the encryption key. This is a different issue from how the encryption keys are derived (it is a matter of password security). Should the common prefix be leaked, the security of the passphrase is dependent solely on the true password. A proper password-based key derivation scheme would not impose limitations on password lengths since the passphrases will ultimately be hashed to a fixed size, so if the true password and prefix are completely independent, then adding the prefix would not negatively impact password security.

GnuPG:

GnuPG uses S2K to derive keys from passphrases. This is a process that repeatedly salts and hashes the input, similar to PBKDF2. The hash function used can be specified, and according to the manual, GnuPG supports several modern cryptographic hash functions including SHA-256. This should ensure that the derived encryption keys appear to be indistinguishable from random bits, even if the original inputs are not that different. However, whether this algorithm is sufficient to protect your passphrases (any passphrase) from brute force attacks is questionable.


Long Answer

The post is confusing. The post title, the updated section, and the self-answer all seem to point towards different questions:

  1. Is using a common prefix in GnuPG passphrases worse than not using them? (implied by first sentence of post)

    • Answer: No, provided that you specify usage of a modern cryptographic hashing algorithm and no one manages to break it.
  2. "Is using a common prefix in passwords safe within any security scheme?" (implied by post title)

    • Answer: It depends on the security scheme. Additionally, the evaluation of safety and password strength is a matter of password security, which is a separate matter from key-derivation and security scheme design.
  3. "Will a password-based key derivation function produce similar outputs if the inputs are similar?" (implied by updated section)

    • Answer: It depends on the KDF you are using. If you are using a modern password hashing algorithm such as Argon2 or PBKDF2, then no.
  4. "Is using the same salt for password hashes secure?" (The summary of the AI response in the self-answer appears to refer to "precomputation attacks", i.e. dictionary attacks and rainbow tables, which implies it interpreted the "common prefix" to be the password salt. This is supported by how it claims Argon2 is safe from these attacks; Argon2 takes a salt as an input, and its implementations typically generate a fresh random salt for each hash, lowering the effectiveness of precomputation attacks.)

    • Answer: No. The common prefix should not be the salt. Every salt should be randomly generated and of sufficient length to foil precomputation attacks. You shouldn't need to worry about this if you are using a proper password-based key derivation scheme.

I suspect you may be misunderstanding how the symmetric encryption keys are derived. I will outline the general process below.

Proper Password-Based Key Derivation Scheme

A simple yet effective password-based key derivation scheme is one that uses a modern password hashing algorithm to securely convert a low-entropy input of any length (the passphrase) into a fixed-length output that appears to be drawn from a uniform random distribution.

Overview:

Encryption algorithms require keys, not passwords. A key is a string of bits (1s and 0s) of a length specified by the encryption algorithm. In AES-256, keys must be 256 bits long.

A password can be read as a key (by converting the letters/numbers/symbols into bits), but this is generally a bad idea since passwords might not have the exact number of bits required by the encryption algorithm.

To solve this, key-derivation functions are used. A key-derivation function is any algorithm that can derive one or more secret keys from a starting "original" or "master" key. Key derivation functions frequently use hash functions internally to transform the input (which can be of any length) into a unique* output of a fixed length.

* Technically, by the pigeonhole principle, the outputs are not unique, but trying to find a collision (identical output for two inputs) is infeasible because the number of possible outputs is too large to search within our lifetimes.

This means when you have a password like MyWorkRelatedPass19, it is not used to directly encrypt a plaintext. In a proper scheme, it is first transformed via a KDF into a value such as FC76A099A7BC0D0970B00C466ACCC34431B20524CE0806C206E4840A4F114F63, which is exactly 256 bits long and can thus be used as an AES-256 key.
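As a minimal sketch of that transformation, the Python below derives a 256-bit key from the example passphrase. It uses PBKDF2 from Python's standard library as a stand-in, not GnuPG's own S2K; the 16-byte salt and the iteration count are assumptions picked for illustration.

```python
import hashlib
import os

passphrase = b"MyWorkRelatedPass19"
salt = os.urandom(16)    # random, non-secret salt (assumed size)
iterations = 200_000     # assumed work factor, for illustration

# PBKDF2-HMAC-SHA256 stretches the passphrase into exactly 32 bytes.
key = hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)

print(len(key) * 8)  # 256
print(key.hex())     # differs on every run because the salt is random
```

Whatever the length of the passphrase, the output is always 32 bytes, which is what AES-256 needs.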

Hashing and KDFs:

A hash function is a deterministic one-way operation that maps an input of any length into a unique output of a fixed length. It is a tool that can be used for many purposes, such as checksums, digital signatures, and constructing efficient data structures.

In cryptography, we use specially designed cryptographic hash functions. One of the objectives of a good cryptographic hash function is exhibiting the avalanche effect, which means changing just one bit of the input should drastically change the output. In other words, the outputs of such a function should be statistically indistinguishable from random. This would mean hashing "CommonPrefix+A" and "CommonPrefix+B" would result in two completely different values, even though they differ by just one character.
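The avalanche effect is easy to observe directly. This short Python sketch hashes the two example inputs with SHA-256 and counts how many of the 256 output bits differ; the helper name `bit_difference` is just an illustrative choice.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the output bits that differ between SHA-256(a) and SHA-256(b)."""
    digest_a = hashlib.sha256(a).digest()
    digest_b = hashlib.sha256(b).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))

diff = bit_difference(b"CommonPrefix+A", b"CommonPrefix+B")
print(f"{diff} of 256 output bits differ")
```

For a well-behaved hash, the count should land near half of the output bits, even though the inputs share almost everything.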

Since hash functions are deterministic (using the same input produces the same output every time), it is possible to keep a list of known hashes and their inputs, and then refer to it when trying to find the input to a hash. For example, if I knew that MyWorkRelatedPass19 hashed to FC76A099A7BC0D0970B00C466ACCC34431B20524CE0806C206E4840A4F114F63, then the next time I saw the same hash out in the wild, I would know that the original input was MyWorkRelatedPass19.

To get around this, a "salt" can be used. A salt is a string appended or prepended to the input before hashing (e.g. the $SALT$ in $SALT$MyWorkRelatedPass19). The salt value can be anything, and does not need to be kept secret.

By using different salts, we can get different hash outputs from the same input. We might use $SALT123$MyWorkRelatedPass19 for one hash, and $SALT456$MyWorkRelatedPass19 to produce a completely different hash. However, to foil precomputation attacks (the list of known hashes and inputs), it is most effective for the salt to be random. This way, attackers with access to the hash outputs would have to recompute their tables for every single possible salt value.

Adding a common prefix to a password is not the same as adding a salt. The prefix is part of the input, and the salt would be attached to the complete input as part of the hashing process.
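Here is a small Python sketch of the difference, again using PBKDF2 from the standard library as a stand-in KDF (the iteration count and salt size are assumptions). The common prefix travels inside the passphrase, while the salt is a separate random input.

```python
import hashlib
import os

def derive(passphrase: bytes, salt: bytes) -> bytes:
    # The salt is a separate parameter; the prefix is just part of the passphrase.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000, dklen=32)

passphrase = b"CommonChunk#(/MyWorkRelatedPass19"  # prefix + true password
salt_a = os.urandom(16)
salt_b = os.urandom(16)

key_a = derive(passphrase, salt_a)
key_b = derive(passphrase, salt_b)

print(key_a != key_b)                      # same passphrase, different salts
print(derive(passphrase, salt_a) == key_a)  # same passphrase, same salt
```

The same prefixed passphrase gives unrelated keys under different salts, which is why a precomputed table built for one salt is useless for another.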

As mentioned previously, a key-derivation function is an algorithm that generates subkeys from a single "master key". A KDF often uses cryptographic hash functions internally; for example, Argon2 uses Blake2 while PBKDF2 can be specified to use SHA-256 or SHA-512. Sometimes, the hash functions themselves can be used as a KDF, but this should not be done with passwords. This is because password-based key derivation functions employ key-stretching to make weak inputs stronger. Furthermore, hash functions are intended to be fast, which is not desirable since it would mean attackers could compute hashes quickly and therefore brute-force hashes at a faster rate-- this is especially bad for passwords since they are typically not random (meaning attackers have a convenient list to start from that is more likely to hit a matching hash than starting from nothing).

The last reason is why PBKDF2 and GnuPG's S2K algorithms allow users to specify a work factor: this increases the number of iterations that the algorithm executes, making the process take longer and consume more resources.
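For the curious, here is a simplified Python sketch of the iterated-and-salted S2K idea as I read it in RFC 4880: the salt followed by the passphrase is fed to the hash over and over until `count` bytes have been hashed. It assumes SHA-256 and a key no longer than one digest, and it is not GnuPG's actual code; the function name and the demo count are my own choices.

```python
import hashlib
import os

def s2k_iterated_salted(passphrase: bytes, salt: bytes, count: int,
                        key_len: int = 32) -> bytes:
    """Simplified sketch: hash (salt + passphrase) repeatedly until
    `count` bytes have been fed in. Single hash context only."""
    if key_len > hashlib.sha256().digest_size:
        raise ValueError("this sketch supports at most one digest of key material")
    data = salt + passphrase
    count = max(count, len(data))  # always hash the full input at least once
    h = hashlib.sha256()
    full_rounds, remainder = divmod(count, len(data))
    for _ in range(full_rounds):
        h.update(data)
    h.update(data[:remainder])
    return h.digest()[:key_len]

salt = os.urandom(8)  # OpenPGP S2K salts are 8 bytes
key = s2k_iterated_salted(b"MyWorkRelatedPass19", salt, count=65_536)
print(key.hex())
```

Raising `count` makes every single guess cost more hashing, for the legitimate user once and for the attacker on every attempt.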

Conclusion:

Your passphrase is not used as an encryption key directly. It is hashed to derive a key, and if a modern cryptographic hash is used, the resulting key should be indistinguishable from random, no matter the input. Therefore, using similar passphrases should not result in similar-looking encryption keys.


Password Security

While common prefixes in passwords might not negatively affect the brute-forceability of the derived keys, that doesn't mean they won't ever damage the security of the password. If the common prefix is leaked or otherwise publicly-known, then the security of your passphrase depends on the unique portion (the "true password").

If the act of including a common prefix weakens the true password, then yes, a common prefix would weaken your passphrases as a whole.

To understand what is meant by "weaken", we must first understand the factors that determine the security level of a password. These factors depend on the attack model we are considering when evaluating security. The most common attack models to consider are the extremes of the "attacker knowledge spectrum". Real attackers usually fall somewhere in the middle of this spectrum, but it's easiest to only consider and defend against the extremes.

There are other models that account for how passwords are stored and how they are transmitted, but users typically don't have much control over these things, besides deciding whether to use a particular service or not.

One Side of the Attacker Knowledge Spectrum: No External Information Whatsoever

In this model, it is assumed that the attacker is given a set of hashes or derived keys, and their goal is to find the original inputs (the passwords) without knowing anything about who the outputs belong to. In this situation, the attacker is forced to rely on a brute-force attack: trying every possible combination of characters until the right input is found. The number of possible inputs (the search space) is therefore [the number of possible characters] raised to the power of [the number of characters in the input].

For a five-character password consisting of only lowercase English letters, the probability of a successful attempt would be 1/(26^5), or 1 in 11,881,376. On average, we expect an attacker to go through half of the search space, so it would probably take them about 5,940,688 tries. Considering how fast computers work, this is not very many tries at all.
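The arithmetic is quick to check in Python. The guess rate below is purely an assumed figure for illustration, not a measurement of any real attack.

```python
alphabet_size = 26   # lowercase English letters
length = 5

search_space = alphabet_size ** length
expected_tries = search_space // 2   # on average, half the space is searched

print(search_space)    # 11881376
print(expected_tries)  # 5940688

# Assumption: a hypothetical attacker making one million guesses per second.
guesses_per_second = 1_000_000
print(expected_tries / guesses_per_second, "seconds on average")
```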

To improve our odds of the attacker giving up before finding our password in the possible search space, we can try increasing the number of possible characters: including uppercase letters, numbers, symbols, even letters from other languages. This is the basis for most services asking users to include a symbol or number in their passwords; without this restriction, people might take the most convenient way out and just use lowercase English letters.

But the biggest impact would be felt by increasing the length of the password. Adding just one more character multiplies the number of possible inputs by the size of the character set, so the search space grows exponentially with length. Since we don't know what characters the attacker will assume to be valid, a password like correcthorsebatterystaple would be equally strong as FRENMT7EinjsWodpZojNAcEcp and 1234567890123456789012345 or even 我在图书馆吃甜点的时候看到一只大猫拿着火箭筒指着我; they are all 25 characters long. But no matter what character set the attacker uses, we can reasonably say that a shorter password has a smaller search space (and is therefore weaker) than a longer password with the same character set.

Therefore, against brute-force attackers with no external information, the most important factor in password security is the length of the password.
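To compare the two levers (bigger alphabet versus longer password), here is a short Python sketch that expresses the search space in bits. The alphabet sizes (26 lowercase letters, 95 printable ASCII characters) and the lengths are assumptions picked for the comparison.

```python
import math

def search_space_bits(alphabet_size: int, length: int) -> float:
    """log2 of alphabet_size ** length, i.e. the search space in bits."""
    return length * math.log2(alphabet_size)

for alphabet_size, length in [(26, 8), (95, 8), (26, 16)]:
    bits = search_space_bits(alphabet_size, length)
    print(f"alphabet={alphabet_size:3d} length={length:2d} -> {bits:.1f} bits")
```

In this comparison, doubling the length of a lowercase-only password adds more bits than switching the same eight characters to the full printable ASCII set.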

The Other Side of the Attacker Knowledge Spectrum: Total Knowledge of YOU

The probabilities of success for the brute-force attack above assume that all characters in a given character set are equally likely to appear. This is not true in reality; passwords often need to be memorable to the user and are not completely random. This is often achieved by using words rather than random combinations of characters.

However, the probability distribution of letter appearances in words is not uniform. In English words, the letter u is far more likely to appear after the letter q than any other letter. Also, humans like to use certain words or character combinations that appeal to them when coming up with passwords, such as birth dates, pet names, or convenient phrases like the name of the service. An attacker that had this information for a specific user would be significantly more likely to be able to guess that user's password in a smaller timespan than that of a generic brute-force attack.

To account for this, we must consider an attack model where the attacker knows everything about the victim, short of the actual password used. This includes preferences, habits, and even the method used to generate the password the attacker is trying to find. Even if the victim tried to come up with a "random" password, we cannot assume it is truly random since humans have biases and aren't very good at generating random values.

To make this more clear, imagine your method of deciding a password is to flip a coin and select one of two 1,000,000-character-long strings (of any of the 1,114,112 possible Unicode characters) to use as your password depending on whether the coin comes up heads or tails. Obviously, it would be totally infeasible to brute force either of these passwords. But an attacker that knew you would only ever use these two passwords would be able to crack it immediately-- if it wasn't the first one, it would have to be the second one.

Now, imagine instead of flipping a coin once, you flipped ten coins and had a password for each possible combination of outcomes. This would result in 2^10 or 1024 possibilities. Even if the attacker knew how you decided on a password, they would still need to try each one until they found the right one. Since you didn't know what the outcome would be in advance, neither does the attacker.

Therefore, against attackers with complete information on the target, the most important factor in password security is the randomness of the password, which can be quantified as the number of equally-likely passwords that could have been chosen.

This is the idea behind Diceware and similar password generation methods. It's also why you shouldn't reroll the results of these generators until you find a password you like; the [number of passwords you like] is smaller than the [number of possible passwords].
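A minimal Python sketch of this idea follows. The eight-word list is a hypothetical stand-in (the real Diceware list has 7776 words), and `secrets.choice` is used so that every word is picked with equal probability from a cryptographically secure source.

```python
import math
import secrets

# Hypothetical toy word list, for illustration only.
WORDS = ["correct", "horse", "battery", "staple",
         "rocket", "library", "dessert", "cat"]

def entropy_bits(equally_likely_choices: int) -> float:
    """Randomness of a choice among N equally likely options, in bits."""
    return math.log2(equally_likely_choices)

def generate_passphrase(word_count: int) -> str:
    """Pick each word independently and uniformly at random."""
    return "-".join(secrets.choice(WORDS) for _ in range(word_count))

passphrase = generate_passphrase(5)
print(passphrase)
print(entropy_bits(2 ** 10))          # 10.0 bits: ten fair coin flips
print(entropy_bits(len(WORDS) ** 5))  # 15.0 bits: five words from this toy list
```

With the full 7776-word list, five words would give roughly 64.6 bits. Rerolling until you like the result shrinks the number of equally likely outcomes, and with it the number passed to `entropy_bits`.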

How Common Prefixes Affect Password Security

When designing a password, we must prepare for attacks on both ends of the "attacker knowledge spectrum". This means we must assume the attacker knows the common prefix (because it is part of the password generation process).

If you are required to use a password with a maximum length (e.g. "no more than 30 characters"), then adding a common prefix to your passwords would reduce the length of the true password, which would make it easier to brute-force.

If the common prefix is related to the true password, then the number of possible true passwords is reduced, which makes it easier to guess.

Therefore, a common prefix in passphrases can weaken all of your passphrases. But conversely, if none of these situations apply, then the strength of the passphrase falls back to the strength of the true password, and a common prefix would not weaken it further.
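The maximum-length case can be put into numbers with a short Python sketch. It assumes a 30-character limit (the example above), a 95-character printable-ASCII alphabet, and a fully random unique part; the function name is my own.

```python
import math

ALPHABET_SIZE = 95   # assumption: printable ASCII
MAX_LENGTH = 30      # the "no more than 30 characters" example

def unique_part_bits(prefix: str, max_length: int = MAX_LENGTH) -> float:
    """Upper bound on the randomness left for the unique part, in bits."""
    remaining = max_length - len(prefix)
    if remaining < 0:
        raise ValueError("prefix is longer than the allowed password length")
    return remaining * math.log2(ALPHABET_SIZE)

print(unique_part_bits(""))                # full 30 characters available
print(unique_part_bits("CommonChunk#(/"))  # only 16 characters left
```

If there is no length limit, as with GnuPG passphrases in practice, this penalty does not apply and the prefix simply adds nothing once it is known.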

The takeaway here regarding password security is that your true password (unique portion of the passphrase) should be long and truly random (not human-ly random).

0

Ok. Here is what I think I understood after asking DeepSeek (this is being written in 2025, so take it with a grain of salt because the AI might be hallucinating).

The following password: HelloWorld**********

is weaker than this one: **********

where each '*' represents an ASCII character that is unknown to the attacker.

This might seem counterintuitive because in both cases the number of unknown characters to brute-force is identical. There is a catch, however. GnuPG does not use the passphrase as it is, but rather derives the key from the passphrase mixed with a random number (the salt), using an algorithm whose work factor can be set as high as roughly 65 million (GnuPG's maximum s2k-count is 65011712). This is done in order to slow down each attempt in a brute-force attack.

The time this step requires, deriving the key from the passphrase, does not grow much with the passphrase length; it stays within the same order of magnitude. Doubling the length of a passphrase dramatically increases the number of possibilities to brute-force, but the time it takes to derive a key from each candidate passphrase is nearly the same.

However, if half the password is known, the attacker can use some numerical hocus-pocus called "precomputation" that reduces the time it takes to derive the key from each candidate passphrase. In some scenarios the reduction can be dramatic, so brute-forcing the same number of guesses can take several orders of magnitude less time. Passwords that are not very long, and that would have been secure without a prepended prefix, then become feasible to crack.

I won't accept my own answer and I hope someone with better knowledge can explain things a bit better here.

Oh, and using the known part as a suffix instead of a prefix does not have the same vulnerability.

Remark: According to the AI, the Argon2 algorithm for deriving a key from a passphrase would be safe against this, but it's not yet implemented in GnuPG.

Mephisto