
Suppose I want to estimate the head probability $p$ for a biased coin. Usually this is done by assuming that, for a large number of trials, the distribution of the sample proportion is approximately normal by the CLT. I am wondering how robust the CLT approximation is. Because a confidence interval is an objective statement about the true value of $p$, would the CLT approximation make the confidence interval calculated this way wrong in some extreme cases? Also, how can the confidence interval for this simple coin-flip example be found exactly, without using the CLT?

BruceET
  • 52,418
Shadumu
  • 321

1 Answer


Suppose we have $x = 21$ successes in $n = 50$ trials. Here are a few approaches to getting a 95% CI for success probability $\theta.$

Using binomial tails in quest of an "exact" interval. Without using a normal approximation, the rough idea behind a 95% CI for the binomial success probability $\theta$ is to find $\theta_1$ such that $P(X \le x; n, \theta_1) \approx .975$ and $\theta_2$ such that $P(X \le x; n, \theta_2) \approx .025.$ (Because $P(X \le x; n, \theta)$ decreases in $\theta,$ we have $\theta_1 < \theta_2.$) Then the CI is $(\theta_1, \theta_2).$ This idea is discussed in Section 3 of these course notes.

Briefly, an important difficulty is that the discreteness of the binomial distribution prevents getting 'tail probabilities' of exactly .025 for either $\theta_i.$ One solution is to fuss about putting a little more probability in one tail and a little less in the other in order to get as close to a 95% CI as possible. Another solution (the 'conservative' approach) is to allow tail probabilities smaller than .025, ensuring at least 95% confidence.

Below is a rough demonstration (with no fussing) that I programmed in R statistical software. It gives the CI $(.300, .568).$

x = 21;  n = 50
th = seq(0, 1, by=.001)        # grid of values of 'theta' in [0,1]
cdf = pbinom(x, n, th)
d1 = abs(cdf - .975)
th1 = th[d1 == min(d1)];  th1
## 0.3
d2 = abs(cdf - .025)
th2 = th[d2 == min(d2)];  th2
## 0.568

In R, a refined version of this style of CI is part of the output of the function binom.test, which includes the CI $(0.282, 0.568).$ From what I can tell, this function uses a conservative approach, hence the slightly smaller left endpoint. [By default, the function tests whether the coin is fair, although other hypotheses may be specified.]

binom.test(21,50)

        Exact binomial test

data:  21 and 50
number of successes = 21, number of trials = 50, p-value = 0.3222
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.2818822 0.5679396
sample estimates:
probability of success 
                  0.42 
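As a cross-check (this computation is mine, not part of the original output), the Clopper–Pearson endpoints that `binom.test` reports can be reproduced from beta quantiles, using the standard relationship between binomial tail probabilities and the beta distribution:

```r
x = 21;  n = 50
# Clopper-Pearson endpoints via beta quantiles:
#   lower = 2.5% quantile of Beta(x, n-x+1)
#   upper = 97.5% quantile of Beta(x+1, n-x)
c(qbeta(.025, x, n - x + 1), qbeta(.975, x + 1, n - x))
## 0.2818822 0.5679396
```

This matches the interval in the `binom.test` output above.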

Bayesian Posterior Interval. Sometimes Bayesian posterior intervals are used as confidence intervals. Based on a 'noninformative' $\mathsf{Unif}(0,1)$ prior distribution and the binomial likelihood function from $x = 21$ and $n = 50$, one obtains the posterior distribution $\mathsf{Beta}(x + 1, n-x + 1).$ Cutting probability .025 from each tail of this distribution one obtains the interval $(0.293, 0.558).$

qbeta(c(.025,.975), x+1,n-x+1)
## 0.2934549 0.5583072

Agresti-Coull Interval. The so-called 'Wald' CI, based on $\hat \theta = x/n,$ is of the form $\hat \theta \pm 1.96\sqrt{\hat\theta(1-\hat\theta)/n}.$ You are correct that this kind of interval can give very bad results for small $n,$ especially when $\theta$ is far from $1/2.$ Not only does it use a normal approximation that may not be appropriate for such $n$ and $\theta,$ it also substitutes the estimate $\hat\theta = x/n$ for the unknown $\theta$ under the square root sign. "Bad" means that the true coverage probability of a "95%" CI can be far from $95\%$, often much lower.
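For reference (my own computation, not part of the original answer), the Wald interval for $x = 21$ and $n = 50$ is:

```r
x = 21;  n = 50
ph = x/n                                # point estimate 0.42
ph + (-1:1)*1.96*sqrt(ph*(1 - ph)/n)    # lower limit, estimate, upper limit
## 0.2831926 0.4200000 0.5568074
```

Here $\hat\theta$ is near $1/2$ and $n = 50$ is moderate, so the Wald interval happens to be close to the others; the trouble shows up for small $n$ or extreme $\theta.$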

However, a considerable improvement is achieved by using $\tilde n = n + 4$ and $\tilde\theta = (x+2)/\tilde n$ to make the CI $\tilde \theta \pm 1.96\sqrt{\tilde\theta(1-\tilde\theta)/\tilde n}.$ This is called the Agresti-Coull interval; for confidence levels other than 95%, slight adjustments are made. You can google 'Agresti interval' for details. The Agresti interval for $x = 21$ and $n = 50$ is $(0.294, 0.558)$.

pa = (x+2)/(n+4);  pm = -1:1          # adjusted estimate; multipliers -1, 0, 1
pa + pm*1.96*sqrt(pa*(1-pa)/(n+4))    # lower limit, estimate, upper limit
## 0.2940364 0.4259259 0.5578154

While the Agresti intervals use a normal approximation, they do not assume that $\tilde \theta$ is the same as $\theta.$

When $n$ is quite small, all of the CIs based on a normal approximation perform badly. However, when $n$ is small it is increasingly difficult to find appropriate binomial quantiles to make an "exact" interval of approximate confidence 95% (or any other desired level).
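To illustrate the coverage problem (a sketch of my own, not from the original answer), the true coverage of the nominal 95% Wald interval can be computed exactly, with no simulation, by summing binomial probabilities over the outcomes $x$ whose interval happens to contain $\theta$:

```r
# Exact coverage probability of the nominal 95% Wald interval
wald.cover = function(n, theta) {
  x = 0:n;  ph = x/n
  me = 1.96*sqrt(ph*(1 - ph)/n)                     # margin of error at each x
  covers = (ph - me <= theta) & (theta <= ph + me)  # outcomes whose CI covers theta
  sum(dbinom(x[covers], n, theta))
}
wald.cover(50, 0.05)    # well below the nominal 0.95 for theta far from 1/2
```

Plotting `wald.cover` over a grid of $\theta$ values shows the erratic, often-low coverage that motivates the Agresti-Coull adjustment.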

    In the discrete setting I am inclined to just report the interval at whatever level of significance it provides, rather than seeking out the interval for a specified level of significance. This is basically your "conservative" approach, except that in doing this you could also reduce the level of significance from your initial "target". – Ian Jun 23 '17 at 23:12
  • Thanks for your comprehensive answer. One thing to clarify: when you say "this kind of interval can give very bad results for small n, especially when θ is far from 1/2", why does it give bad results for θ far from 1/2? Is it because of the truncation? – Shadumu Jun 24 '17 at 08:32
  • @user3229471 The normal approximation is valid for $nq \gg 1$ where $q = \min\{p, 1-p\}$. – Ian Jun 24 '17 at 10:17
  • Is it possible then to use the truncated normal? @Ian – Shadumu Jun 24 '17 at 11:42
  • @user3229471 I do not know whether there is a way to use some sort of asymmetrically truncated normal distribution for that purpose. – Ian Jun 24 '17 at 13:26
  • @Ian. One of my previous answers on binomial CIs shows the poor 'coverage' probability of the Wald interval, even in cases where the usual rules of thumb for using the normal approximation to the binomial are satisfied. Lack of normal fit, discreteness of the binomial, and using an estimate for the variance all play roles. Wilson-style intervals, as well as the Agresti and Bayesian intervals (above), can help. Even AP stat books now suggest Agresti. The flat-prior Bayesian interval is better if software is available. – BruceET Jun 24 '17 at 15:26
  • @BruceET How do we generalize it to the multinomial case (with the constraint that $\sum \theta_i = 1$)? It certainly can be decomposed into several binomial cases, but maybe a more exact and tight condition can be given? – Shadumu Jul 06 '17 at 16:04