Consider the statistical model $\left( E, \{\Bbb P_ \theta\}_{\theta \in \Theta}\right)$ (with $\Theta \subseteq \Bbb R$) and an i.i.d. sample $X_1, \ldots, X_n$ from the true distribution $\Bbb P_{\bar{\theta}}.$ Note that we fix the true parameter $\bar{\theta}$ to distinguish it from the generic $\theta$ used to denote the elements of $\Theta.$ Let $\alpha \in (0, 1).$ Which of the following definitions of a confidence interval of level $1 - \alpha$ is the correct one? Could you point me to a reference on the correct definition?
A). An interval $\mathcal{I}_n$ whose boundaries do not depend on $\bar{\theta}$ (but are possibly a function of the sample) such that $\Bbb P_{\theta}(\mathcal{I}_n \ni \bar{\theta}) \geq 1 - \alpha \;\;\forall \;\theta \in \Theta.$
B). An interval $\mathcal{I}_n$ whose boundaries do not depend on $\bar{\theta}$ (but are possibly a function of the sample) such that $\Bbb P_{\theta}(\mathcal{I}_n \ni \theta) \geq 1 - \alpha \;\;\forall \;\theta \in \Theta.$
C). An interval $\mathcal{I}_n$ whose boundaries do not depend on $\bar{\theta}$ (but are possibly a function of the sample) such that $\Bbb P_{\bar{\theta}}(\mathcal{I}_n \ni \bar{\theta}) \geq 1 - \alpha.$
- It's definitely not A (because such an interval need not exist, and it wouldn't be interesting if it did). And I believe B and C are fundamentally the same thing. – user469053 Sep 06 '22 at 15:39
- @user469053 It is clear that B implies C but the converse does not hold. In any case, I am interested in the formal definition. Could you please help? – John D Sep 06 '22 at 18:46
- It's confusing what is meant by $P_\theta$ in B). You say that $\mathcal I_n$ is constructed using a sample from the true distribution $P_{\bar \theta}$, but I'm not sure if that's what you intend when you write $P_\theta(\theta \in \mathcal I_n)$. – user6247850 Sep 07 '22 at 02:02
- If you are saying that C would hold, regardless of which value in the parameter space was the "true" value $\overline{\theta}$, then you have an implicit $\forall \overline{\theta}\in \Theta$ in C, yielding equivalence of B and C. – user469053 Sep 07 '22 at 16:23
3 Answers
After a cursory scan of some textbooks (Lehmann and Romano, "Testing Statistical Hypotheses", sec. 5.4; Keener, "Theoretical Statistics", sec. 9.4; or Wasserman, "All of Statistics", sec. 6.3.2), the most common definition seems to be (B).
However, the intuitive definition of a confidence interval is an interval that contains the true parameter (i.e., $\bar{\theta}$) with a given probability, which suggests that (C) is the more appropriate definition. So why is it less common?
As you say, (B) implies (C). But if we are going to prove that (C) holds without knowing $\bar{\theta}$, the only way of doing this is to prove that it holds regardless of the true value of $\bar{\theta}$. But that's exactly what (B) means! Therefore, we might as well take (B) as the definition in the first place; despite being seemingly more restrictive, in practice we don't gain anything from a looser definition.
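To see (B) in action, here is a minimal Monte Carlo sketch (my own illustration, not from any of the cited textbooks), assuming the Gaussian model $N(\theta, 1)$ with known variance, $n = 25$, and the usual z-interval $\overline{X}_n \pm z_{1-\alpha/2}/\sqrt{n}$: the empirical coverage is about $1-\alpha$ for every choice of the true $\theta$, which is exactly the "regardless of the true value" requirement.

```python
# Hypothetical illustration: coverage of the usual z-interval under several
# candidate "true" values of theta (model: N(theta, 1), n = 25).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, sigma, alpha, reps = 25, 1.0, 0.05, 100_000
half = norm.ppf(1 - alpha / 2) * sigma / np.sqrt(n)   # interval half-width

for theta in [-3.0, 0.0, 1.7, 10.0]:                  # each plays the "true" value
    xbar = rng.normal(theta, sigma, size=(reps, n)).mean(axis=1)
    covered = np.abs(xbar - theta) <= half            # event {I_n contains theta}
    print(f"theta = {theta:5.1f}: empirical coverage = {covered.mean():.3f}")
# Each line prints roughly 0.950, i.e. condition (B) holds uniformly in theta.
```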
From Shao's Mathematical Statistics p. 471:
(...) $X=(X_1,...,X_n)$ denotes a sample from a population $P \in \mathcal{P}$; $\theta=\theta(P)$ denotes a functional from $\mathcal{P}$ to $\Theta\subset \mathcal{R}^k$ for a fixed integer $k$; and $C(X)$ denotes a confidence set for $\theta$, a set in $\mathcal{B}_\Theta$ (the class of Borel sets on $\Theta$). We adopt the basic concepts of confidence sets introduced in §2.4.3. In particular, $\inf_{P\in \mathcal{P}}P(\theta \in C(X))$ is the confidence coefficient of $C(X)$, and if the confidence coefficient of $C(X)$ is $\geq 1-\alpha$ for fixed $\alpha \in (0,1)$, then we say that $C(X)$ has significance level $1-\alpha$ or $C(X)$ is a level $1-\alpha$ confidence set.
and §2.4.3 at p. 129, where notation differs slightly:
Consider a real-valued $\vartheta$. If $C(X)=[\underline{\vartheta}(X),\overline{\vartheta}(X)]$ for a pair of real valued statistics $\underline{\vartheta}$ and $\overline{\vartheta}$, then $C(X)$ is called a confidence interval for $\vartheta$.
Also, from p. 129 again:
(...) $C(X)\in \mathcal{B}^k_\Theta$ depending only on the sample $X$.
So $C(X)$ is a level $1-\alpha$ confidence interval if $C(X)=[\underline{\vartheta}(X),\overline{\vartheta}(X)]$ and $$\inf_{P \in \mathcal{P}}P\big(\theta\in [\underline{\vartheta}(X),\overline{\vartheta}(X)]\big)\geq 1-\alpha.$$ Since $\theta: \mathcal{P}\to \Theta$ is a functional of $P$, the answer seems to be (B).
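As a concrete instance of this definition (my own illustration, not from Shao): take $\mathcal{P}=\{N(\mu,\sigma^2)^{\otimes n}: \mu \in \Bbb R\}$ with $\sigma^2$ known, $\theta(P)=\mu$, and $C(X)=[\overline{X}-z_{1-\alpha/2}\sigma/\sqrt{n},\ \overline{X}+z_{1-\alpha/2}\sigma/\sqrt{n}]$. Since $\sqrt{n}(\overline{X}-\mu)/\sigma \sim N(0,1)$ under each $P$,
$$P\big(\theta(P)\in C(X)\big)=P\left(\left|\tfrac{\sqrt{n}(\overline{X}-\mu)}{\sigma}\right|\leq z_{1-\alpha/2}\right)=1-\alpha \quad \text{for every } P\in\mathcal{P},$$
so the confidence coefficient $\inf_{P\in\mathcal{P}}P(\theta(P)\in C(X))$ equals $1-\alpha$. The coverage requirement is checked under each $P$ with its own $\theta(P)$, which is precisely statement (B).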
- Thanks for your answer. I guess my confusion comes from the abuse of notation that, coincidentally, is employed in your reference too, that is, the distinction between the true distribution $\bar{P} \in \mathcal{P}$ and a generic $P \in \mathcal{P}.$ Could you please shed some light on it? I think that it should formally say: "$X$ denotes a sample from a population $\bar{P} \in \mathcal{P}$; $\theta:\mathcal{P} \rightarrow \Theta \subseteq \Bbb R^k$ denotes a functional, ... In particular, $\inf_{P \in \mathcal{P} }P(\theta(P) \in C(X))$ is the ....". Could you confirm my suspicion? – John D Sep 08 '22 at 22:01
- @JohnD $\theta$ is defined as a functional on $\mathcal{P}$, so it seems to me that $P(\theta(P)\in C(X))$ is the full statement; e.g. the usual Gaussian confidence coefficient for the mean with known $\sigma^2$ is $P(\mu \in [\overline{X}-a, \overline{X}+b])$ where $\theta(P)=\mu$ and $P$ has Lebesgue density $(2\pi \sigma^2)^{-n/2}\exp(-\sum_{k=1}^{n}(x_k-\mu)^2/(2\sigma^2))$ (the laws are indexed by $\mu$). As for the presence of $\overline{\theta}$, there is no direct mention of it, so it is unclear; but this issue does not influence the definition of the confidence sets. – Snoop Sep 08 '22 at 23:24
- Thanks! So when the book writes "... denotes a sample from a population $P \in \mathcal{P}$ ..." it is not meant that $P$ denotes the true distribution, but rather that the true distribution belongs to $\mathcal{P},$ right? For me those statements are not equivalent. – John D Sep 09 '22 at 00:56
- My interpretation of the statement is that $\mathcal{P}$ is the collection of admissible laws of $X$ @JohnD – Snoop Sep 09 '22 at 01:08
The difference between B and C is, I believe, a linguistic distinction without a difference.
From the outset, we can either declare that there is a "true" value $\overline{\theta}$ of the parameter, or we can decline to do that. In the latter case, when we decline to declare that there is a "true" value, the meaning of the model is that the different values of $\theta$ in $\Theta$ exist in parallel universes, and in the $\theta$ universe, that value of $\theta$ is the true value in that universe. In this case, we're not saying that there is no true value of the parameter. We're effectively saying that each value is/could be the true value. Informally, this is equivalent to saying that there is a "true" value $\overline{\theta}$, but we don't know what it is and all of our theory/definitions/formulas, etc., will remain as stated regardless of which value of $\overline{\theta}$ is the true one.
The expression in C is consistent with the linguistic convention of declaring that there is a "true" value $\overline{\theta}$, and the expression in B is consistent with the linguistic convention of declining to do so. There is an explicit "$\forall \theta\in \Theta$" in B, and there is an implicit "$\forall \overline{\theta}\in \Theta$" in C, because C is intended to hold regardless of which value of $\overline{\theta}$ is the "true" value.
As a thought experiment to test whether there really is a fundamental difference between B and C, I would ask: if, in estimation, $\overline{\theta}$ is the true value, why would you ever compute or care about $\mathbb{P}_\theta$ for any $\theta$ other than $\overline{\theta}$? For an event $A$, $\mathbb{P}_\theta(A)$ refers precisely to the probability of the event $A$ in the universe in which $\theta$ is the true value of the parameter.
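A small numerical check of this point (my own sketch, assuming the Gaussian model $N(\theta,1)$, $n=25$, the usual z-interval, and a fixed $\bar{\theta}=0$): the probability that $\mathcal{I}_n$ covers the fixed value $\bar{\theta}$, evaluated under $\mathbb{P}_\theta$ for $\theta \neq \bar{\theta}$, drops quickly towards $0$. This illustrates why a uniform requirement like (A) cannot be met (as the first comment under the question notes) and why the only measure of interest is the one indexed by the true value.

```python
# Hypothetical illustration: P_theta( theta_bar in I_n ) in closed form,
# where I_n = Xbar +/- z*sigma/sqrt(n) and Xbar ~ N(theta, sigma^2/n) under P_theta.
import numpy as np
from scipy.stats import norm

n, sigma, alpha = 25, 1.0, 0.05
z = norm.ppf(1 - alpha / 2)
theta_bar = 0.0                                   # the fixed "true" value

for theta in [0.0, 0.2, 0.5, 1.0]:                # measure under which we evaluate
    delta = (theta - theta_bar) * np.sqrt(n) / sigma
    cover = norm.cdf(z - delta) - norm.cdf(-z - delta)
    print(f"theta = {theta:3.1f}: P_theta(theta_bar in I_n) = {cover:.3f}")
# Prints 0.950 only at theta = theta_bar and decays towards 0 as theta moves away,
# so this interval cannot satisfy the uniform requirement in (A).
```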