
Here's the problem:

"During a test, we asked $120$ people who Jean-Jacques Rousseau was, and $12$ of them answered that he was a driver (which is false). Estimate the proportion of the population that would give a false answer. Obtain a $95\%$ confidence interval for this proportion."

Here's my attempt.

The estimated proportion of the population that would give a false answer is $12/120 = 10\%$.

Let $R_1, \dots, R_{120}$ be random variables representing the answers of each person. Let $X$ be a random variable that counts the number of people who give a wrong answer, i.e. $X = \sum_{i=1}^{120} I(R_i = 1)$.

Clearly, $X \sim B(n=120,\ p=10\%)$.

From there, I can set my pivot $Q = \frac{X-E(X)}{\sqrt{Var(X)}}$, where approximately $Q \sim N(0,1)$.

So, I have $$P(-1.96 < \frac{X-E(X)}{\sqrt{Var(X)}} < 1.96) = 0.95$$ where $E(X) = n\cdot p$ and $Var(X) = np(1-p)$.

This gives me $$P(5.56 < X < 18.44) = 0.95$$

Now, finally, we can obtain the confidence interval: $$[5.56/120, 18.44/120] = [0.046,0.154]$$

I think this is correct. However, I have the feeling that this method is unnecessarily complicated for a problem that seems quite simple. I guess there is a more clever way of using the central limit theorem, but I do not know how.

Skyris
  • "The proportion of the population that would give a false answer is 10%" is not a correct assertion. The 10% comes from the sample, since 12/120 = 10%. Furthermore, this $\hat p = 10\%$ is an estimate of the true population proportion of people who give a false answer. To complete the problem, just construct the 95% CI corresponding to $\hat p = 10\%$. – cat Jun 06 '18 at 20:02

1 Answer

The traditional (sometimes called 'Wald') confidence interval (CI) for a binomial proportion $\theta$ based on the number of successes $X$ in $n$ independent trials uses the point estimate $\hat \theta = X/n$ of $\theta$ and estimates the standard error of $\hat \theta$ as $SE = \sqrt{\frac{\hat \theta(1-\hat \theta)}{n}}.$ Thus an approximate 95% CI for $\theta$ is of the form $$\hat \theta \pm 1.96\sqrt{\frac{\hat \theta(1-\hat \theta)}{n}}.$$ Notice that there are two approximations involved in this type of CI: (a) approximating the binomial random variable $X \sim \mathsf{Binom}(n, \theta)$ by the normal distribution $\mathsf{Norm}(\mu= n\theta, \sigma=\sqrt{n\theta(1-\theta)}\,)$ and (b) approximating the standard error of $\hat \theta,$ which is $\sqrt{\frac{\theta(1-\theta)}{n}},$ by $\sqrt{\frac{\hat\theta(1-\hat\theta)}{n}}.$
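As a quick check, the Wald formula above can be applied to the data in the question ($X = 12$, $n = 120$). This is only a minimal sketch; the function name `wald_ci` is made up for illustration and is not from any library.

```python
import math

def wald_ci(x, n, z=1.96):
    """Wald interval: theta_hat +/- z * sqrt(theta_hat * (1 - theta_hat) / n)."""
    theta_hat = x / n                                  # point estimate X/n
    se = math.sqrt(theta_hat * (1 - theta_hat) / n)    # estimated standard error
    return theta_hat - z * se, theta_hat + z * se

lo, hi = wald_ci(12, 120)
print(f"Wald 95% CI: ({lo:.3f}, {hi:.3f})")  # Wald 95% CI: (0.046, 0.154)
```

This agrees with the interval $[0.046, 0.154]$ found in the question, so the Wald formula gets there in one step.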

More recently, computational studies have shown that, in conjunction, these two approximations can produce "95%" confidence intervals whose actual coverage is considerably less than 95%. An improvement suggested by Agresti and Coull is to estimate $\theta$ by $\check \theta = \frac{X+2}{n+4}$ and the standard error by $\sqrt{\frac{\check\theta(1-\check\theta)}{n+4}}.$ The Agresti-Coull CI of the form

$$\check \theta \pm 1.96\sqrt{\frac{\check \theta(1-\check \theta)}{n+4}},$$

has proved to be more accurate for small and moderate sample sizes $n.$ (For $n \ge 1000$ the adjustment becomes trivial and unnecessary.)
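For the data in the question, the Agresti-Coull interval can be computed as in the sketch below. The helper `exact_coverage` is an illustrative addition, not part of either method: it sums binomial probabilities over every outcome whose interval contains a chosen true $\theta$, which is one way to examine coverage for a given $n$ and $\theta$. All function names here are hypothetical, and the value $\theta = 0.1$ passed to it is an assumed true proportion used only for illustration.

```python
import math

def wald_ci(x, n, z=1.96):
    """Wald interval based on theta_hat = x / n."""
    theta_hat = x / n
    se = math.sqrt(theta_hat * (1 - theta_hat) / n)
    return theta_hat - z * se, theta_hat + z * se

def agresti_coull_ci(x, n, z=1.96):
    """Agresti-Coull interval based on theta_check = (x + 2) / (n + 4)."""
    theta_check = (x + 2) / (n + 4)
    se = math.sqrt(theta_check * (1 - theta_check) / (n + 4))
    return theta_check - z * se, theta_check + z * se

def exact_coverage(ci, n, theta):
    """Probability, under Binom(n, theta), that the interval ci(x, n) contains theta."""
    total = 0.0
    for x in range(n + 1):
        lo, hi = ci(x, n)
        if lo <= theta <= hi:
            # Binomial probability of observing exactly x successes
            total += math.comb(n, x) * theta**x * (1 - theta) ** (n - x)
    return total

lo, hi = agresti_coull_ci(12, 120)
print(f"Agresti-Coull 95% CI: ({lo:.3f}, {hi:.3f})")  # Agresti-Coull 95% CI: (0.057, 0.169)

# Coverage at one assumed true value of theta; other values of n and theta can be tried.
for name, ci in [("Wald", wald_ci), ("Agresti-Coull", agresti_coull_ci)]:
    print(name, round(exact_coverage(ci, 120, 0.1), 4))
```

Coverage depends on both $n$ and the true $\theta$, so a single pair of values says little on its own; looping over a grid of $\theta$ values gives a fuller picture.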

Note: For additional information you can google 'Agresti binomial confidence' or look at this page.

BruceET