3

A coin has an unknown probability $\theta$ of landing tails. We would like to test the following hypotheses about the value of $\theta$: $$\begin{cases}H_0 : \theta=0.5 \\ H_1 : \theta > 0.5 \end{cases}$$ Suppose we flipped the coin 5 times and got 4 tails. Calculate the p-value.

Let $X$ be a random variable that takes the value $1$ if we get tails and $0$ if heads (a Bernoulli distribution with unknown parameter $\theta$), and let $X_1, \dots, X_5$ be a simple random sample of $X$.

Then the p-value is $P_{H_0}(T\geq t)$ (the probability computed assuming $H_0$ is true), where $T=\overline{X}_n$ and $t=\frac{4}{5}=0.8$.

So, $P_{H_0}(T\geq t)=P_{H_0}(\overline{X}_n\geq0.8)=1-P_{H_0}(\overline{X}_n <0.8)$

By the central limit theorem, $\overline{X}_n \sim N\left(0.5,\frac{(0.5)(1-0.5)}{5} \right)=N(0.5, 0.05)$ (approximately).

So, $p\text{-value}=1-\Phi \left(\frac{0.8-0.5}{\sqrt{0.05}} \right)\approx 1-\Phi \left(\frac{0.3}{0.22} \right)=0.0869$, which is incorrect. Where's my mistake?

The correct answer is

0.187
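(Added for reference, not part of the original post.) Both numbers above, the normal-approximation value and the stated correct answer, can be checked with a short Python sketch; the `normal_cdf` helper is a hand-rolled standard normal CDF built on `math.erf`:

```python
from math import comb, erf, sqrt

def normal_cdf(x):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

n, p0, t = 5, 0.5, 0.8

# CLT approximation: under H0, X-bar ~ N(p0, p0(1-p0)/n)
se = sqrt(p0 * (1 - p0) / n)
p_clt = 1 - normal_cdf((t - p0) / se)

# Exact p-value: P(at least 4 tails in 5 flips) under Binomial(5, 0.5)
p_exact = sum(comb(n, k) for k in range(4, n + 1)) / 2**n

print(p_clt)    # ≈ 0.0899 (the 0.0869 in the question comes from rounding √0.05 to 0.22)
print(p_exact)  # 0.1875
```

The gap between the two printed values is exactly the discrepancy the question is asking about.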

Moria
    Basically, you want to know $P(X \geq 4)$, where $X$ follows a binomial distribution with parameters $n=5$ and $p=0.5$. Besides this, note that a minimum requirement for the application of the central limit theorem is a sample size of $n=30.$ – trancelocation Jun 19 '22 at 14:27
  • The "n=30" mantra that is often seen is pretty much nonsense. For any reasonable adjudication of "close enough to normal to use the approximation", there are counterexample situations that won't meet it at n=30 (and indeed for essentially any other choice of n). Unless some conditions are placed on it, the best that can be said for a rule like that is "it works when it works", which is not of great practical value if you're trying to guess whether it's reasonable in some particular case where you don't know the population distribution you're faced with ("okay, but will it suffice here?"). – Glen_b Jun 20 '22 at 01:28
  • In some particular cases, n=2 may be fine. In some cases n=10 is excellent. In some cases n=1000 may not be sufficient. There are more specific rules of thumb for the binomial in particular (e.g. $n\min(p,1-p) > 5$ or $np(1-p)>10$; not that I am advocating either of those, as they're sometimes quite inadequate) that for particular p will require much larger n than 30 (e.g. consider p=1/36, the chance of snake-eyes on two dice, and look at np>5, which implies n>180; even then the approximation is pretty rough). – Glen_b Jun 20 '22 at 01:32

2 Answers

5

I think the Central Limit Theorem is a poor approximation here for 5 samples. The exact probability is $P(T \ge t) = P(5T \ge 4) = P(5T = 4) + P(5T = 5) = \frac{1}{2^5} \left(\binom{5}{4} + \binom{5}{5}\right) = \frac{6}{32} = 0.1875$.
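(Added for reference, not part of the original answer.) Since $5T$ counts tails among 5 fair flips under $H_0$, the exact value can be confirmed by brute-force enumeration of all $2^5$ equally likely flip sequences:

```python
from itertools import product

# Enumerate all 2^5 equally likely sequences of 5 flips under H0 (theta = 0.5);
# 1 represents tails, 0 represents heads
outcomes = list(product([0, 1], repeat=5))

# Fraction of sequences with at least 4 tails
p_exact = sum(1 for seq in outcomes if sum(seq) >= 4) / len(outcomes)
print(p_exact)  # 0.1875
```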

angryavian
1

$H_{0}: X \sim \mathcal{N}\left(0.5,\ \ 5\cdot 0.5 \cdot (1-0.5)\right)= \mathcal{N}(0.5,\ \ 1.25).$

$p\text{-value} = P(\{X >0.8\}) = 1 - P\left(\left\{Z\leq \tfrac{0.8-0.5}{\sqrt{1.25}}\right\}\right) = 1 -\Phi(0.2683)= 0.3942.$

Janusz
  • This seems to fall victim of relying on the CLT with an overly small sample size. – Daniel R. Collins Jun 19 '22 at 23:15
  • This isn't right. If $X$ is to represent the mean of the samples, then the normal approximation is $N(0.5, 0.5(1-0.5)/5)$ and the p-value is $P(X \ge 0.8)$ as OP has done. If instead $X$ is to represent the sum of the samples, then the normal approximation is $N(5 \cdot 0.5, 5 \cdot 0.5 (1-0.5))$ and the p-value is $P(X \ge 4)$. Both should give the same result and both suffer from the poor approximation due to the low sample size. – angryavian Jun 19 '22 at 23:34
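(Added for reference, not part of the thread.) The equivalence claimed in the comment above is easy to verify numerically: the mean-based and sum-based parametrizations produce the same z-score, hence the same p-value:

```python
from math import sqrt

n, p0 = 5, 0.5

# Mean parametrization: X-bar ~ N(p0, p0(1-p0)/n), threshold 0.8
z_mean = (0.8 - p0) / sqrt(p0 * (1 - p0) / n)

# Sum parametrization: S ~ N(n*p0, n*p0(1-p0)), threshold 4
z_sum = (4 - n * p0) / sqrt(n * p0 * (1 - p0))

print(z_mean, z_sum)  # identical z-scores, so identical p-values
```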