Question: In a crossover trial comparing a new drug to a standard, $\pi$ denotes the probability that the new one is judged better. It is desired to estimate $\pi$ and test $H_0:\pi=0.50$ against $H_a: \pi \neq 0.50$. In $20$ independent observations, the new drug is better each time.

Give the ML estimate of $\pi$. Conduct a Wald test and construct a 95% Wald confidence interval of $\pi$. Are these sensible?

I have this so far:

ML: $\hat{\pi} = \frac{y}{n} \rightarrow \frac{20}{20}=1.$

The Wald test is $Z_w = \frac{\hat{\pi}-\pi_0}{\sqrt{\frac{(1-\hat{\pi})\hat{\pi}}{n}}}.$ Inputting our values we just get

$\rightarrow Z_w = \frac{1-0.50}{\sqrt{\frac{(1-1)0.50}{20}}}=0.$ For some reason the solution tells me that it goes to $\infty$. Could someone explain that?

The 95% Wald confidence interval for $\pi$ is $\rightarrow \hat{\pi} \pm Z_{\alpha/2} \sqrt{\frac{\hat{\pi}(1-\hat{\pi})}{n}}$.

Which is just $1 \pm 1.96(0) \rightarrow (1,0) \& (1,0).$ Is this sensible?

1 Answer

It is good that you ask, "Is this sensible?" I think the purpose of this problem is to illustrate a difficulty with the traditional or Wald CI: it gives absurd one-point "intervals" when $\hat \pi$ is either 0 or 1. (BTW, your test statistic is infinite because the denominator has the factor $(1-\hat \pi) = (1 - 1) = 0,$ so you are dividing the nonzero numerator $0.5$ by zero.)
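If it helps to see the arithmetic, here is a minimal R sketch of the Wald statistic and Wald CI for your data. The variable names (`pi_hat`, `se_hat`, `z_w`, `wald_ci`) are just my own labels, not anything from your textbook.

```r
# Data from the question: 20 successes in 20 trials, null value 0.5
x <- 20; n <- 20; pi0 <- 0.5

pi_hat <- x / n                             # ML estimate, equals 1
se_hat <- sqrt(pi_hat * (1 - pi_hat) / n)   # estimated SE, equals 0 here

# Wald statistic: a positive numerator over a zero denominator gives Inf in R
z_w <- (pi_hat - pi0) / se_hat

# 95% Wald CI: collapses to the single point 1 because se_hat is 0
wald_ci <- pi_hat + c(-1, 1) * qnorm(0.975) * se_hat

z_w
wald_ci
```

R follows the usual floating-point convention that a positive number divided by zero is `Inf`, which matches the "goes to $\infty$" in your solution.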

Presumably you have studied or are about to study other kinds of CIs for the binomial proportion. If not, you can look up the 'Wilson' interval on the Internet; the Wikipedia page is pretty good. The 95% Wilson CI results from 'inverting the test', solving the inequality

$$-1.96 \le \frac{\hat \pi - \pi}{\sqrt{\pi(1-\pi)/n}} \le 1.96$$

to get an interval for $\pi.$ (This involves solving a quadratic equation, and a page of tedious algebra.) For your situation with $x = n = 20,$ the Wilson interval is approximately $(.8389,1.000).$
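For a quick check of that number, here is a sketch in R using the standard closed-form solution of the quadratic (centre and half-width). The names `centre`, `half_width`, and `wilson_ci` are mine, and I use `qnorm(0.975)` rather than the rounded value 1.96.

```r
x <- 20; n <- 20
z <- qnorm(0.975)        # about 1.96
pi_hat <- x / n

# Closed-form Wilson (score) interval obtained by solving the quadratic
centre     <- (pi_hat + z^2 / (2 * n)) / (1 + z^2 / n)
half_width <- z / (1 + z^2 / n) *
  sqrt(pi_hat * (1 - pi_hat) / n + z^2 / (4 * n^2))

wilson_ci <- centre + c(-1, 1) * half_width
round(wilson_ci, 4)      # roughly 0.8389 and 1
```

When $\hat \pi = 1$ the lower endpoint simplifies to $n/(n + z^2),$ which is where the value near $0.8389$ comes from.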

The Wilson interval is a little messy to compute, so Agresti and Coull have proposed an interval that is very nearly the same for 95% intervals. The idea is to append four imaginary observations to your data, two Successes and two Failures. Thus, you have $\tilde n = n+ 4$ and $\tilde \pi = (x + 2)/\tilde n.$ Then the 95% CI is of the form $$\tilde \pi \pm 1.96\sqrt{\frac{\tilde \pi(1 - \tilde \pi)}{\tilde n}}.$$ [The solution for the Wilson interval has some $2$s and some "small" terms with powers of $n$ in denominators. The Agresti interval rounds $1.96$ up to $2$ and ignores some "small" terms.] In the Agresti interval, $0 < \tilde \pi < 1,$ so that nonsensical "one-point" CIs cannot occur. Perhaps more important, Brown, Cai, and DasGupta (2001) have shown that this Agresti or 'Plus-4' interval has actual coverage probabilities much nearer to the nominal 95% than the Wald intervals. The paper is readable, or you could look at this page.
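Here is a small R sketch of the Plus-4 computation for $x = n = 20,$ following the formula above. The variable names are my own, and clipping the endpoints to $[0, 1]$ in the last line is my assumption about how one would report the result, not part of the formula itself.

```r
x <- 20; n <- 20

n_tilde  <- n + 4                  # four imaginary observations
pi_tilde <- (x + 2) / n_tilde      # two of them counted as Successes
se_tilde <- sqrt(pi_tilde * (1 - pi_tilde) / n_tilde)

plus4_ci <- pi_tilde + c(-1, 1) * 1.96 * se_tilde
plus4_ci                           # about 0.806 and 1.027

# Assumption: report the interval clipped to the parameter space [0, 1]
plus4_clipped <- pmin(pmax(plus4_ci, 0), 1)
plus4_clipped
```

Notice that the raw upper endpoint is above 1 here, so the interval is not degenerate, but it can extend beyond the parameter space unless you clip it.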

Based on asymptotic results, Wald intervals are fine for very large $n.$ However, they involve two approximations that do not work well for small and moderate $n$: (1) the normal approximation to the binomial, and (2) the use of the approximate standard error $\sqrt{\hat \pi(1-\hat \pi)/n}$ instead of the exact standard error $\sqrt{\pi(1- \pi)/n}.$
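To get a feel for how much this matters at $n = 20,$ here is a rough R sketch that computes the exact coverage probability of a 95% interval at one true value of $\pi$ by summing binomial probabilities over the outcomes whose interval contains $\pi.$ The helper names (`wald_ci`, `plus4_ci`, `coverage`) and the choice $\pi = 0.9$ are mine, purely for illustration.

```r
# Each helper returns lower and upper endpoints for every x in a vector
wald_ci <- function(x, n, z = qnorm(0.975)) {
  p  <- x / n
  se <- sqrt(p * (1 - p) / n)
  list(lower = p - z * se, upper = p + z * se)
}

plus4_ci <- function(x, n, z = 1.96) {
  nt <- n + 4
  p  <- (x + 2) / nt
  se <- sqrt(p * (1 - p) / nt)
  list(lower = p - z * se, upper = p + z * se)
}

# Exact coverage: total binomial probability of outcomes x whose CI covers pi
coverage <- function(pi, n, ci_fun) {
  x  <- 0:n
  ci <- ci_fun(x, n)
  covered <- ci$lower <= pi & pi <= ci$upper
  sum(dbinom(x, n, pi)[covered])
}

cov_wald  <- coverage(0.9, 20, wald_ci)
cov_plus4 <- coverage(0.9, 20, plus4_ci)
c(wald = cov_wald, plus4 = cov_plus4)
```

At this particular $\pi$ the Wald coverage falls well short of 95%, largely because the outcome $x = 20$ produces the one-point interval at 1, which cannot cover $\pi = 0.9.$ Other values of $\pi$ and $n$ give different numbers, so treat this as one data point rather than a summary of the Brown, Cai, and DasGupta results.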

Finally, a Bayesian probability interval (based on a non-informative prior distribution) is sometimes used as a CI when a computer package such as R is available to compute the endpoints. When there are $x$ successes in $n$ trials, the 95% interval uses quantiles .025 and .975 of the distribution $\mathsf{Beta}(x+1, n - x + 1).$ So for your example with $x = n = 20,$ the interval would be $(0.8389, 0.9988).$

x = n = 20; qbeta(c(.025,.975), x + 1, n - x + 1)
## 0.8389024 0.9987951
BruceET