
The textbook, Introduction to Probability by Anderson, Seppäläinen, and Valkó, reads on page 151:


Another task is to find the confidence interval around $\hat{p}$ that captures the true $p$ with a given (high) probability. The $100r\%$ confidence interval for the unknown success probability $p$ is given by $(\hat{p} - \epsilon, \hat{p} + \epsilon)$, where $\epsilon$ is chosen to satisfy $P(|\hat{p} - p| < \epsilon) \geq r$. In other words, the random interval $(\hat{p} - \epsilon, \hat{p} + \epsilon)$ contains the true $p$ with probability at least $r$.

Example 4.12.

We repeat a trial $1000$ times and observe $450$ successes. Find the $95\%$ confidence interval for the true success probability $p$. This time $n$ is given and we look for $\epsilon$ such that $P(|\hat{p} - p| < \epsilon) \geq 0.95$. From (4.9) we need to solve the inequality $2\Phi(2\epsilon\sqrt{n}) - 1 \geq 0.95$ for $\epsilon$. First simplify and then turn to the $\Phi$ table: $$\Phi(2\epsilon\sqrt{n}) \geq 0.975 \iff 2\epsilon\sqrt{n} \geq 1.96 \iff \epsilon \geq \frac{1.96}{2\sqrt{1000}} \approx 0.031.$$

Thus if $n = 1000$, then with probability at least $0.95$ the random quantity $\hat{p}$ satisfies $|\hat{p} - p| < 0.031$. If our observed ratio is $\hat{p} = \frac{450}{1000} = 0.45$, we say that the $95\%$ confidence interval for the true success probability $p$ is $(0.45 - 0.031, 0.45 + 0.031) = (0.419, 0.481)$.

Note carefully the terminology used in the example above. Once the experiment has been performed and $450$ successes observed, $\hat{p} = \frac{450}{1000}$ is no longer random. The true $p$ is also not random since it is just a fixed parameter. Thus we can no longer say that "the true $p$ lies in the interval $(\hat{p} - 0.031, \hat{p} + 0.031) = (0.419, 0.481)$ with probability $0.95$." That is why we say instead that $(\hat{p} - 0.031, \hat{p} + 0.031) = (0.419, 0.481)$ is the $95\%$ confidence interval for the true $p$.
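
For reference, the arithmetic in Example 4.12 can be reproduced with a short sketch. The function name `conservative_ci` is made up for illustration; it uses the bound $\epsilon = z / (2\sqrt{n})$ from the quoted example, with $z$ taken from the standard normal quantile function instead of a $\Phi$ table.

```python
from math import sqrt
from statistics import NormalDist


def conservative_ci(successes, n, level=0.95):
    """Return (low, high, eps) for the interval p_hat +/- eps.

    eps = z / (2 * sqrt(n)) is the bound used in the quoted Example 4.12;
    the function name and signature are illustrative only.
    """
    p_hat = successes / n
    # Solve 2 * Phi(z) - 1 = level, i.e. Phi(z) = (1 + level) / 2.
    z = NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 when level = 0.95
    eps = z / (2 * sqrt(n))
    return p_hat - eps, p_hat + eps, eps


low, high, eps = conservative_ci(450, 1000)
print(f"eps = {eps:.3f}, interval = ({low:.3f}, {high:.3f})")
```

With $450$ successes in $1000$ trials this gives $\epsilon \approx 0.031$ and the interval $(0.419, 0.481)$, matching the textbook's numbers.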


I do not understand the distinction that is being made here. What is the difference between saying, "the true $p$ lies in the interval $(0.419,0.481)$ with probability $0.95$" versus, "$(0.419, 0.481)$ is the $95\%$ confidence interval for the true $p$"?

Jbag1212
  • I don't know exactly, just that this confused the hell out of me as well. It has to do with the fundamentals of statistics, as well as with frequentist vs. Bayesian statistics. I never found a convincing and satisfying explanation though – AndroidBeginner Mar 26 '23 at 23:42
  • Are you quoting the text exactly? Particularly the part that reads "the true $p$ lies in the interval $(p - .031, p + .031)$": do you want to replace those last $p$'s with $\hat{p}$'s? Because for any positive $e$, it's always the case that $p$ lies in $(p - e, p + e)$. – user43208 Mar 26 '23 at 23:44
  • @user43208 Yes, you are right, I have fixed this. – Jbag1212 Mar 26 '23 at 23:49
  • The difference here is that $p$ is not random; the interval is. $p$ is just a number, so it is either in the given interval or it isn't. So the interpretation is that if you repeat the experiment of sampling 1000 times many times, the interval you construct with this procedure (which will change each time you repeat the experiment) will contain the true $p$ in at least (approximately) 95% of the repetitions. – Chris Janjigian Mar 27 '23 at 00:00
  • I trust that you searched for related questions on this site before you posted and found dozens of them, many of them with several answers. It would make it much more likely that someone can provide an answer that's actually helpful for you if you tell us something about why all those other answers weren't. (You could start here, but most of the questions you'll find searching for something like "confidence interval interpretation" will do.) – joriki Mar 27 '23 at 00:02
  • Independent of whether you found a related question-answer pair that addresses your question to your satisfaction, you should read carefully Chris's comment. You should think of that 95 percent as referring to the percentage of $\hat{p}$'s (which varies from sample to sample) in the entire sampling distribution such that $p$ lies in the confidence interval. For any particular fixed $\hat{p}$, though, it's 1 or 0 (i.e., $p$ is in it, or it's not) -- except you don't know which. – user43208 Mar 27 '23 at 00:29
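
The repeated-sampling reading in the comments above can be checked numerically. The following is a minimal sketch, not anything from the textbook: the true value `true_p = 0.45`, the number of repetitions, and the function name `coverage` are all assumptions chosen for illustration. In a real experiment $p$ is unknown; the simulation fixes it only so that we can count how often the interval catches it.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage(true_p, n, repetitions, level=0.95, seed=0):
    """Fraction of repeated experiments whose interval contains true_p.

    Each repetition draws n Bernoulli(true_p) trials, forms
    (p_hat - eps, p_hat + eps) with eps = z / (2 * sqrt(n)),
    and checks whether true_p falls inside. Illustrative helper only.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf((1 + level) / 2)
    eps = z / (2 * sqrt(n))
    hits = 0
    for _ in range(repetitions):
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n  # random: changes with every repetition
        if abs(p_hat - true_p) < eps:  # true_p is fixed throughout
            hits += 1
    return hits / repetitions


# Assumed values for illustration: true_p = 0.45, 2000 repetitions.
print(coverage(true_p=0.45, n=1000, repetitions=2000))
```

The printed fraction should land close to $0.95$. That number describes the procedure across many repetitions. For any single repetition the check `abs(p_hat - true_p) < eps` is simply true or false, which is the distinction the textbook is drawing.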
