Fellow Stackers,
I have a formal customer requirement for a mechanism to have a failure rate of 0.1% or less. After some study, I found various methods for estimating the confidence interval of a binomial proportion and settled on the Wilson (1927) score interval [1,2]. I got a seemingly reasonable answer: if you observe 0 failures in n=1637 runs, you can conclude with 90% confidence (one-sided) that the mechanism's success probability is at least p=0.999. That sounds plausible, but I'm not sure of the details (z vs. t) or of the conclusion.
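For the zero-failure case, the Wilson lower limit appears to collapse to a simple closed form, which gives a quick sanity check on the 1637 figure. This is only a sketch: it assumes the lower limit as written in [1] with $\hat{p} = X/n = 1$, and $p_L$ is my own label for that lower limit.

$$
p_L = \frac{\hat{p} + \frac{z^2}{2n} - z\sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}
\;\xrightarrow{\;\hat{p}=1\;}\;
\frac{1}{1 + z^2/n} = \frac{n}{n + z^2}
$$

$$
p_L > p \iff n > \frac{z^2\,p}{1-p}, \qquad \frac{1.28^2 \times 0.999}{0.001} \approx 1636.8 \;\Rightarrow\; n = 1637
$$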
I made a table of results for various levels of confidence and several values of p. I included p=0.5, for grins. At the 90% confidence level, one-sided, n=2, a sample without failures (2 heads or 2 tails) is sufficient to conclude that the coin has p=0.5. At the 95% confidence level, 3 in a row (3 heads or 3 tails) is sufficient to conclude that the coin is fair. At the 99% confidence level, you need to observe 6 in a row (all heads or all tails) to conclude that p=0.5. I've flipped a lot of coins in my day, and a run of 6 is fairly unusual. Based on this, the k-in-a-row calculation doesn't really tell you how much coin-flipping work to expect, right? It does tell you what kind of sample would prove p, if you happen to observe such a sample. Is that correct?
Going back to the original issue, the 0.1% failure rate of a mechanism, I want to tell my manager what to expect in terms of labor. Do I tell him to plan for a tech to run 1637 trials, and probably several times over? If we get lucky and the first sample of 1637 trials doesn't contain a failure, I can quit, right? But there's no way to know when a 0-failure sample will occur.
So my questions are:
- Am I calculating the minimum number of trials correctly?
- Can I calculate how likely I am to observe a sample that proves p? In other words, how likely am I to get a sufficient number of successes to prove p (at a given confidence level) on the first try?
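To make the second question concrete: if the mechanism's true success probability were known, the chance that a single sample of n trials contains no more than the allowed number of failures would be a binomial sum. The Python sketch below computes that. The function name `prob_sample_passes` and the trial values of `p_true` are hypothetical, chosen only for illustration, since the true p is exactly what I don't know.

```python
from math import comb


def prob_sample_passes(p_true: float, n: int, max_failures: int = 0) -> float:
    """Probability of at most `max_failures` failures in `n` independent trials.

    `p_true` is an ASSUMED true per-trial success probability (hypothetical;
    in practice it is the unknown quantity being tested).
    """
    q = 1.0 - p_true  # per-trial failure probability
    # Sum the binomial probabilities of exactly k failures, k = 0..max_failures.
    return sum(
        comb(n, k) * q**k * p_true ** (n - k) for k in range(max_failures + 1)
    )


if __name__ == "__main__":
    # Mechanism sitting exactly at the 0.1% limit, 0 failures in 1637 trials.
    print(prob_sample_passes(0.999, 1637))
    # Mechanism ten times better than the requirement.
    print(prob_sample_passes(0.9999, 1637))
    # Fair coin: chance that 6 flips are all heads.
    print(prob_sample_passes(0.5, 6))
```

Under these assumed inputs, a mechanism sitting exactly at the 0.1% limit would produce a clean run of 1637 only roughly 19% of the time, while one ten times better would do so roughly 85% of the time. The last line is the chance of 6 heads in a row from a fair coin, 0.015625 (double it for all heads or all tails).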
Thanks for your attention. I am looking forward to some expert commentary!
KLN
References
[1] http://www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm
[2] https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval
Minimum Trials to Prove Failure Rate

90% Confidence (z=1.28)
           Trials to Prove Failure Rate with Failures
Failure    --------------------------------------------
Rate %            0        1        2        3        4
-------------------------------------------------------
0.01          16383    33388    48060    61826    75069
0.05           3276     6677     9611    12364    15012
0.1            1637     3338     4805     6181     7505
0.2             818     1668     2402     3090     3752
0.3             545     1112     1601     2059     2501
0.5             327      666      960     1235     1500
1               163      333      479      617      749
2                81      166      239      307      374
5                32       65       95      122      148
10               15       32       47       60       73
50                2        5        8       11       13
-------------------------------------------------------

95% Confidence (z=1.65)
           Trials to Prove Failure Rate with Failures
Failure    --------------------------------------------
Rate %            0        1        2        3        4
-------------------------------------------------------
0.01          27223    45001    60625    75265    89307
0.05           5443     8998    12123    15051    17859
0.1            2720     4498     6060     7524     8928
0.2            1359     2248     3029     3761     4463
0.3             905     1498     2018     2506     2974
0.5             542      898     1210     1503     1783
1               270      448      604      750      890
2               134      223      301      374      444
5                52       88      119      148      176
10               25       43       58       73       86
50                3        7        9       12       15
-------------------------------------------------------

99% Confidence (z=2.33)
           Trials to Prove Failure Rate with Failures
Failure    --------------------------------------------
Rate %            0        1        2        3        4
-------------------------------------------------------
0.01          54284    72913    89831   105775   121068
0.05          10853    14578    17962    21151    24209
0.1            5424     7287     8978    10573    12102
0.2            2710     3641     4487     5284     6048
0.3            1805     2426     2989     3521     4030
0.5            1081     1453     1792     2110     2416
1               538      724      893     1052     1205
2               267      360      444      523      600
5               104      141      174      206      237
10               49       68       85      100      115
50                6        9       13       16       18
-------------------------------------------------------
Here's the code I used to generate the table:
#!/usr/bin/perl -w
use strict;

# Wilson score interval for a binomial proportion, based on
# www.itl.nist.gov/div898/handbook/prc/section2/prc241.htm
#
# X   - successes
# n   - trials
# z   - For a 2-sided interval, use z_{1-alpha/2} for lower limit and
#       z_{alpha/2} for upper limit. For a one-sided test, use z_{alpha}
# end - 0 is lower limit, 1 is upper limit
sub ws($$$$)
{
    my ( $X, $n, $z, $end ) = @_;

    my $p_hat = $X / $n;
    my $C = 1 / ( 1 + ($z*$z)/$n );
    my $D = $p_hat + ($z*$z)/(2*$n);
    my $E = $z * sqrt( $p_hat*(1-$p_hat)/$n + ($z*$z)/(4*$n*$n) );

    if( $end == 0 ) {
        return $C*($D-$E);
    }
    else {
        return $C*($D+$E);
    }
}

# Search for the minimum number of trials such that the Wilson lower
# limit on the success probability exceeds the required value p.
#
# p   required probability of mechanism succeeding
# z   z-score for a given confidence level (1.65 for one-sided 95%)
# j   number of failures allowed. Usually just use 0
#
sub ws_search($$$)
{
    my ( $p, $z, $j ) = @_;
    my $n;
    my $p_lower = -1;

    for( $n = $j + 1; $n <= 1000000; $n += 1 )  # j+1 avoids sqrt of neg
    {
        $p_lower = ws( $n-$j, $n, $z, 0 );
        if( $p_lower > $p ) {
            last;
        }
    }
    return( $n );
}

# Minimum success rates
my @ps = ( 0.9999, 0.9995, 0.999, 0.998, 0.997, 0.995, 0.99, 0.98, 0.95, 0.90, 0.5 );

# Standard score for alpha, one-sided
#          90%   95%   99%
my @zs = ( 1.28, 1.65, 2.33 );

# Number of failures within a given sample
my @fails = ( 0, 1, 2, 3, 4 );

foreach my $z (@zs) {
    print( "z=$z\n" );
    foreach my $p (@ps) {
        printf( "p=%8.6f ", $p );
        foreach my $fail (@fails) {
            printf( "%8i ", ws_search( $p, $z, $fail ) );
        }
        print( "\n" );
    }
    print( "\n" );
}
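As a cross-check on the 0-failure column, the search loop can be compared against the closed form n > z²p/(1−p), which is what the lower limit in `ws` reduces to when X = n (p_hat = 1). Below is a minimal Python sketch, assuming only that simplification. The name `min_trials_zero_failures` is one I made up and is not part of the Perl script.

```python
import math


def min_trials_zero_failures(p: float, z: float) -> int:
    """Smallest n with n / (n + z^2) > p, i.e. n > z^2 * p / (1 - p).

    Assumes 0 failures in n trials, so the Wilson lower limit simplifies
    to n / (n + z^2). `p` is the required success probability (0 < p < 1).
    """
    bound = z * z * p / (1.0 - p)
    # Strict inequality: take the next integer above the bound.
    return math.floor(bound) + 1


if __name__ == "__main__":
    for z in (1.28, 1.65, 2.33):
        row = [min_trials_zero_failures(p, z) for p in (0.999, 0.99, 0.5)]
        print(f"z={z}: {row}")
```

For the rows I spot-checked (0.1%, 1% and 50% failure rates), the values agree with the 0-failure column of the table at all three confidence levels.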