30

I was reading the comments of an article about a proposed new implementation of /dev/random in Linux today, and someone remarked that it must be bothersome to go through 43 revisions and still not have your patch land. A few comments down the line, someone seemingly implied that this new implementation would be FIPS 140-2 compliant, and that this is controversial because "a developer of one famous VPN kernel module", which "purposely utilize[s] only non-NIST approved algorithms", has a "strong opinion against FIPS compliance needed for governmental use cases".

Why is this? What is controversial about FIPS 140-2 compliance?

Newbyte
  • 427
  • 4
  • 12

4 Answers

34

I'll add to the other answer: the FIPS 140-2 certification rules for RNGs were flawed, and FIPS 140-2 change notice 2 (Dec. 2002) removed the part on self-tests. They are literally struck out from the standard, leaving a vacuum. Thus FIPS 140-2 prescribes no technically satisfactory test of the entropy source, and never did, and that's an issue. It only prescribes approved cryptographic conditioning (on which I have no technical reservations now that Dual_EC_DRBG is out).

It originally prescribed four tests (monobit, poker, runs, and long runs) to be performed on operator demand (at level 3) or at each power-up (level 4), with manual intervention required if a test fails. That's wrong for several reasons:

  • The acceptance levels are very stringent (more so than they were in FIPS 140-1). Even with a perfect generator, the tests will fail in the field with sizable probability, with human intervention mandated. This is plainly unacceptable in some applications, including OSes, TPMs, and smart cards. Anything operated unattended can't be level 4, and contortions are required at level 3 to justify the "operator demand" thing.
  • It's not specified that these tests should be run on the unconditioned entropy source. Thus it is tempting to run the tests on the random numbers as output by some conditioning block, which makes the tests largely pointless: if the unconditioned entropy source fails or becomes low-entropy, a test of the conditioned output won't catch that, unless the conditioning has a huge theoretical flaw.
  • The acceptance level of some tests (the monobit test in particular, and to a lesser degree the poker test) is such that a slight bias, which is perfectly normal and harmless for an unconditioned entropy source followed by proper cryptographic conditioning, will cause a disastrously high rejection rate (see the sketch after this list). Thus it's essentially impossible to apply the tests to the material that requires testing.
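To make the sensitivity concrete, here is a minimal sketch in Python (mine, not the standard's reference text; it assumes the originally published pass bounds of 9,725 < ones < 10,275 per 20,000-bit block) of the monobit test and of how a slight bias inflates its rejection rate:

# Sketch of the original FIPS 140-2 monobit test and its sensitivity to bias.
# Assumption: pass bounds 9,725 < ones < 10,275 for a 20,000-bit block.
import math
import secrets

N = 20_000              # block length in bits
LO, HI = 9_725, 10_275  # monobit pass interval (exclusive bounds)

def monobit_passes(block: bytes) -> bool:
    """FIPS 140-2 monobit test on a 2,500-byte (20,000-bit) block."""
    ones = sum(bin(b).count("1") for b in block)
    return LO < ones < HI

def reject_probability(p_one: float) -> float:
    """Normal-approximation probability that a source with P(bit = 1) = p_one fails."""
    mu = N * p_one
    sigma = math.sqrt(N * p_one * (1.0 - p_one))
    cdf = lambda x: 0.5 * math.erfc((mu - x) / (sigma * math.sqrt(2.0)))
    return cdf(LO) + (1.0 - cdf(HI))

print(monobit_passes(secrets.token_bytes(2_500)))  # almost always True for a good source
print(reject_probability(0.50))                    # ~1e-4: baseline false-alarm rate per block
print(reject_probability(0.51))                    # ~14%: a mild raw-source bias fails routinely

With a perfect source the per-block false-alarm rate is about 10^-4, but a 51/49 bias, trivially removed by any decent conditioner, already fails roughly one block in seven; that is the core of the objection above.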

The last two issues remain with the later SP 800-22 Rev. 1a NIST statistical tests for RNGs for cryptographic applications (which, according to my limited understanding, are now used during certification). The math of the tests is fine. But, as above, the tests are very sensitive, thus unusable on an unconditioned true entropy source (the tests would often fail), thus usable only at the output of a conditioned source, thus unable to detect the source's defects if the conditioning is good. And it's impossible to detect a competently backdoored generator from its output alone.
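For comparison, here is a small sketch of the simplest SP 800-22 test, the frequency (monobit) test, illustrating the same sensitivity. The p-value formula follows the published test description (p = erfc(|S_n| / sqrt(2n))); the bias figures in the final comment are rough normal-approximation estimates of mine:

import math

def sp800_22_frequency_pvalue(bits) -> float:
    """SP 800-22 frequency test: bits is an iterable of 0/1; returns the p-value."""
    n, s = 0, 0
    for b in bits:
        s += 1 if b else -1
        n += 1
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2.0))

# With n = 1,000,000 bits and a 50.3% bias toward ones, S_n is typically around
# 6,000, giving a p-value on the order of 1e-9 -- far below the usual 0.01
# cut-off. A lightly biased raw source fails outright, even though it may still
# carry ample entropy for a conditioned output.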

So these tests either fail and give good assurance that the material tested is distinguishable from random, or succeed and give an apparent assurance of security, even if the source's entropy is very limited or the conditioning has a backdoor, either of which can allow an attack.

The people at NIST are competent, and it's reasonable to wonder if the true purpose of these tests is to give that illusion of security. That would be in line with a long history of the US actively pushing weakened crypto:

  • The elaborate and decades-long compromise of the Swiss firm Crypto AG, which sold deliberately weakened cipher machines.
  • DES: an NSA publication, now declassified, acknowledges that its 56-bit key is the result of bargaining between the designers, who wanted 64-bit keys (at least: the Lucifer design had an option for 128-bit keys), and the NSA, which tried to impose 48-bit keys to ease cracking; see this.
  • Dual_EC_DRBG, mentioned in the other answer: a deliberate attempt (and for a time a success) to widely field an RNG, with a public design and parameters, that was secure except against US authorities (or whoever managed to change the public key, which happened).
fgrieu
  • 149,326
  • 13
  • 324
  • 622
31

Because there was previously a NIST-approved random number generator (Dual_EC_DRBG) that was championed by the NSA and had a flaw that is generally assumed to be an intentional backdoor created by the NSA. This has made some people distrust any crypto algorithms that come out of NIST. There are lots of articles on the net about this; here's one by Schneier that explains the issue in a fair amount of detail.

Swashbuckler
  • 2,126
  • 11
  • 8
9

You can make the best possible choice in all cases if you don't have to comply with FIPS 140-2. If you do have to comply with FIPS 140-2, you can only make the best approved choice. Thus FIPS 140-2 compliance never enables you to make better choices and sometimes forces you to make worse choices.

Say you have to choose between two options, one of which is FIPS 140-2 approved and the other is generally regarded as the much more secure choice by the cryptographic community. Which should you choose?

The answer is that you should definitely choose the one that is regarded as much more secure unless you must have FIPS 140-2 compliance. In that case, you must use the compliant one.

It is perfectly good to use FIPS 140-2 approved methods when they make sense. The only difference FIPS 140-2 compliance makes is that it forces you to make worse choices in some cases.

David Schwartz
  • 4,739
  • 21
  • 31
2

My thoughts are too long for commenting, so I'll wrap them up as an answer...

There are some serious issues with the other two answers, to the extent that some of us have misinterpreted what a randomness test is. As bullet points:-

1. "distrust any crypto algorithms that come out of NIST". There are no NIST generated algorithms in FIPS. Certainly none of the complexity of Dual_EC_DRBG. Runs and Poker tests are not US Department of Commerce (NIST) proprietary algorithms. They are mathematical characteristics of a uniformly random distribution. If I posit that the expected number of ones should be ~50%, does that make me a subversive? Neither does expanding the mean of 0.5 with $n$ standard deviations. $\mathcal{N}(\mu, \sigma^2)$ is the standardised form for that distribution and I wouldn't expect anything less incomplete. Checking for repeat output blocks (Continuous random number generator test) is not subversion, it's common sense.

2. Can I offer this FIPS test as evidence:-

$cat /dev/urandom | rngtest
rngtest 5
Copyright (c) 2004 by Henrique de Moraes Holschuh
This is free software; see the source for copying conditions.  There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rngtest: starting FIPS tests...
rngtest: bits received from input: 8310580032
rngtest: FIPS 140-2 successes: 415198
rngtest: FIPS 140-2 failures: 331
rngtest: FIPS 140-2(2001-10-10) Monobit: 41
rngtest: FIPS 140-2(2001-10-10) Poker: 53
rngtest: FIPS 140-2(2001-10-10) Runs: 123
rngtest: FIPS 140-2(2001-10-10) Long run: 115
rngtest: FIPS 140-2(2001-10-10) Continuous run: 0
rngtest: input channel speed: (min=10.703; avg=1976.720; max=19073.486)Mibits/s
rngtest: FIPS tests speed: (min=75.092; avg=199.723; max=209.599)Mibits/s
rngtest: Program run time: 43724402 microseconds

The failure rate is p=0.0008 (the arithmetic is checked in the sketch after this point). That's very comparable to the p=0.001 threshold within the SP800 STS test suite, and dieharder's:-

NOTE WELL:  The assessment(s) for the rngs may, in fact, be completely
  incorrect or misleading.  In particular, 'Weak' p values should occur
  one test in a hundred, and 'Failed' p values should occur one test in
  a thousand -- that's what p MEANS.  Use them at your Own Risk!  Be Warned!

So not apparently controversial.
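As a quick check of that arithmetic (the counts are copied from the rngtest output above; the theoretical monobit rate is a normal-approximation estimate of mine):

import math

successes, failures = 415_198, 331
blocks = successes + failures
print(failures / blocks)            # ~0.0008 overall per-block failure rate

# Monobit alone: 41 observed failures vs. the ~1e-4 expected from a perfect source
sigma = math.sqrt(20_000 * 0.25)                      # std. dev. of the ones count
expected = math.erfc(275 / (sigma * math.sqrt(2.0)))  # two-sided tail beyond +/- 275
print(41 / blocks, expected)                          # both come out close to 1e-4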

3. "It's not specified that these tests should be run on the unconditioned entropy source". Of course not. That's correct. No one has statistical characteristics for unconditioned entropy source distributions. They come in all shapes and locations. Some of them do not even have mathematical names (double sample of log normal, bathtub MOD $x$ e.t.c.) We can only run standardised statistical tests on conditioned final output.

4. "it's impossible to detect a competently backdoored generator from it's output alone". Again, of course. That's not the intention of e.g. FIPS startup testing. You need programmers and cryptographers for that. FIPS simply automates the randomness testing and sets out guidelines for basic security programming like no string literals for control, and relocatable code. All very normal.

Therefore FIPS 140 isn't all that contentious. Saying so is equivalent to saying NIST has backdoored the Normal distribution, or that dieharder is useless. FIPS is just not great at a few things. And testing 20,000-bit blocks fits neatly at the bottom end of the scale for randomness testing, just below ent (500,000 bits).

Paul Uszak
  • 15,905
  • 2
  • 32
  • 83