How likely is an anomalous difference in standard errors?

Question

A population $U$ has many elements $u_i$, each of which has an associated value of interest $x_i$ that can be measured precisely. The values of $x$ follow a Gaussian distribution. For the purposes of this question, it can be assumed to be the standard normal distribution, because the answer does not depend on the underlying mean or variance.

Two experimenters, $E_1$ and $E_2$, want to determine the population mean $\langle x\rangle$. $E_1$ obtains $n>1$ samples from $U$, computes their mean, and computes a 95% confidence interval of length $I_1$ centered on that estimated mean. $I_1$ is, of course, proportional to the standard error of the sample that $E_1$ obtained.

$E_2$ does exactly the same experiment as $E_1$, except that she uses twice the sample size: $2n$ samples. It is, of course, likely that $I_2 < I_1$, but that is not certain.

What is the probability $A(n)$ that $I_1 < I_2$?

In particular, how rapidly does $A(n)$ fall with $n$?

Suppose $E_1$ gets sample variance $S_1^2$ and $E_2$ gets $S_2^2$; then it seems you are asking for $P(S_1^2/n < S_2^2/(2n)) = P\left(\frac{S_1^2/n}{S_2^2/(2n)} < 1\right) = P(S_1^2/S_2^2 < 1/2)$, where $S_1^2/S_2^2 \sim F(n-1,2n-1).$ — BruceET, Jan 11 '17 at 18:45
Small adjustment necessary to compare CI lengths instead of std errors. — BruceET, Jan 11 '17 at 22:39
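The standard-error version in the comment above reduces to a single F probability, $P(F_{n-1,2n-1} < 1/2)$. A minimal R sketch evaluating it for a few sample sizes (the helper name `A_se` is hypothetical, introduced only for illustration; it ignores the $t$ critical values, which is the adjustment the second comment refers to):

```r
# Standard-error version of A(n):
# P(S_1^2/n < S_2^2/(2n)) = P(S_1^2/S_2^2 < 1/2), with S_1^2/S_2^2 ~ F(n-1, 2n-1).
# `A_se` is a hypothetical helper name used only for this sketch.
A_se <- function(n) pf(0.5, df1 = n - 1, df2 = 2 * n - 1)

n <- c(5, 10, 20, 40, 80)
round(A_se(n), 4)
```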

score 1 · Answer 1 · answered Jan 11 '17 at 19:06

$E_1$ computes a CI with half-width $I_1/2 = t_{\alpha}(n-1)S_{n}/\sqrt{n}$

$E_2$ computes a CI with half-width $I_2/2= t_{\alpha}(2n-1)S_{2n}/\sqrt{2n}$

where $S_k$ is the usual unbiased standard deviation for the sample of size $k$, and $t_\alpha(d)$ denotes the critical value for a $t$-distribution with $d$ degrees of freedom.

By Cochran's theorem, $$(n-1)\frac{S_n^2}{\sigma^2}\sim\chi^2_{n-1}\\ (2n-1)\frac{S_{2n}^2}{\sigma^2}\sim\chi^2_{2n-1} $$ so (assuming the samples are independent) $$\frac{S_n^2}{S_{2n}^2}\sim \frac{\chi^2_{n-1}/(n-1)}{\chi^2_{2n-1}/(2n-1)}\sim F_{n-1,2n-1} $$ where $F_{d_1,d_2}$ denotes a random variable with an F-distribution with $d_1$ and $d_2$ degrees of freedom. Then $$\begin{align}P(I_1<I_2) &= P(t_{\alpha}(n-1)S_{n}/\sqrt{n}< t_{\alpha}(2n-1)S_{2n}/\sqrt{2n})\\ &=P(\frac{S_{n}}{S_{2n}}< \frac{1}{\sqrt{2}}\frac{t_{\alpha}(2n-1)}{t_{\alpha}(n-1)})\\ &=P\left(\frac{S_{n}^2}{S_{2n}^2}< \frac{1}{2}\left(\frac{t_{\alpha}(2n-1)}{t_{\alpha}(n-1)}\right)^2\right)\\ &=P\left(F_{n-1,2n-1}<\frac{1}{2}\left(\frac{t_{\alpha}(2n-1)}{t_{\alpha}(n-1)}\right)^2\right)\end{align}.$$
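The last line of this derivation can be evaluated directly. A sketch in R, assuming $t_\alpha(d)$ is the two-sided critical value `qt(1 - (1 - level)/2, d)`, e.g. `qt(0.975, d)` for a 95% CI (the helper name `A_ci` is hypothetical):

```r
# P(I_1 < I_2) = P(F(n-1, 2n-1) < 0.5 * (t(2n-1) / t(n-1))^2),
# where t(d) is the two-sided critical value for a CI at the given level.
# `A_ci` is a hypothetical helper name used only for this sketch.
A_ci <- function(n, level = 0.95) {
  p  <- 1 - (1 - level) / 2       # e.g. 0.975 for a 95% CI
  t1 <- qt(p, df = n - 1)         # critical value for E_1 (n observations)
  t2 <- qt(p, df = 2 * n - 1)     # critical value for E_2 (2n observations)
  pf(0.5 * (t2 / t1)^2, df1 = n - 1, df2 = 2 * n - 1)
}

n <- c(5, 10, 20, 40, 80)
cbind(n,
      se_only = pf(0.5, n - 1, 2 * n - 1),  # ignores the critical values
      ci_95   = A_ci(n, 0.95),
      ci_99   = A_ci(n, 0.99))
```

Because $t_\alpha(2n-1) < t_\alpha(n-1)$, the threshold is always below $1/2$, so accounting for the critical values can only lower the probability relative to the standard-error version.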

Looks right (+1). My answer focused just on std err, not length of CI. For $n > 30$ and 95% CIs, the critical values of t are all about 2.0, so it doesn't make much difference. After lunch, I plan to adjust my answer. — BruceET, Jan 11 '17 at 19:29
Adjustment made for 99% CIs, where it makes enough difference to notice for small $n$. Asymptotically, negligible. — BruceET, Jan 11 '17 at 22:36
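The size of the adjustment discussed in these comments is the factor $\left(t_\alpha(2n-1)/t_\alpha(n-1)\right)^2$ that multiplies the threshold $1/2$. A short R sketch for tabulating it at 95% and 99% levels (`corr_factor` is a hypothetical name; values near 1 mean the CI-length and standard-error versions essentially agree):

```r
# Squared ratio of two-sided critical values; it scales the F threshold 1/2.
# `corr_factor` is a hypothetical helper name used only for this sketch.
corr_factor <- function(n, level) {
  p <- 1 - (1 - level) / 2
  (qt(p, 2 * n - 1) / qt(p, n - 1))^2
}

n <- c(5, 10, 30, 100, 1000)
round(cbind(n,
            ci_95 = corr_factor(n, 0.95),
            ci_99 = corr_factor(n, 0.99)), 4)
```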

BruceET · Answer 2 · 2017-01-11T22:29:57.797

Following up on my comments, this answer compares lengths of 99% CIs instead of standard errors. I used 99% CIs rather than 95% CIs because the necessary correction is a little larger, and especially noticeable for small $n.$

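```r
# A(n) = P(F(n-1, 2n-1) < 0.5*(t2/t1)^2) for 99% CIs
n = 2:60
t1 = qt(.995, n-1)       # critical value for E_1 (n observations)
t2 = qt(.995, 2*n-1)     # critical value for E_2 (2n observations)
a = pf(.5*(t2/t1)^2, n-1, 2*n-1)
plot(n, a, ylim=c(0,.11), pch=20)
abline(h=0, col="green2")
```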

Reality check: the computation above gives $A(20) = 0.0316.$ Simulate $m = 100{,}000$ experiments for each experimenter and compare the lengths of their 99% CIs:

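```r
m = 10^5;  n = 20
q1 = qt(.995, n-1);  q2 = qt(.995, 2*n-1)   # critical values for n and 2n obs
MAT1 = matrix(rnorm(m*n), nrow=m)           # m samples of size n  (E_1)
MAT2 = matrix(rnorm(m*n*2), nrow=m)         # m samples of size 2n (E_2)
len1 = 2*q1*apply(MAT1, 1, sd)/sqrt(n)      # 99% CI lengths for E_1
len2 = 2*q2*apply(MAT2, 1, sd)/sqrt(2*n)    # 99% CI lengths for E_2
mean(len1 < len2)
## 0.03144                # aprx A(20)
pf(.5*(q2/q1)^2, n-1, 2*n-1)
## 0.03158697             # exact A(20)
```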

score 1 · Answer 3 · edited Apr 13 '17 at 12:21

If $S^2_n$ denotes the sample variance obtained from a sample of size $n$, then $$P(I_2> I_1)=P\left(\frac{S^2_{2n}}{2n}>\frac{S^2_{n}}{n}\right)=P(S^2_{2n}>2S^2_n).\tag1$$ Since $S_n^2$ is distributed like $\frac1{n-1}\sum_1^{n-1}Y_i^2$ where $Y_1,Y_2,\ldots,Y_{n-1}$ are iid standard normal, the desired probability can be written $$ P\left(\frac1{2n-1}\sum_1^{2n-1} Y_i^2 > 2\frac1{n-1}\sum_1^{n-1}Y'^{2}_j\right)\tag2 $$ where the $Y_i$'s and $Y'_j$'s are mutually independent standard normal. Noting that each $Y^2$ has mean $\sigma^2=1$ and variance $2\sigma^4$, we subtract the means (the right side contributes $2\sigma^2$ and the left side $\sigma^2$, leaving $\sigma^2=1$ on the right), standardize each sum, and multiply through by $\sqrt{n/2}$ to obtain $$ P\left(\frac{\sqrt n}{\sqrt{2n-1}}\frac{\sum(Y_i^2-\sigma^2)}{\sqrt{(2n-1)2\sigma^4}}-2\frac{\sqrt n}{\sqrt{n-1}}\frac{\sum(Y'^{2}_j-\sigma^2)}{\sqrt{(n-1)2\sigma^4}} > \sqrt{\frac n2}\right).\tag3 $$ Now by the central limit theorem the LHS of the inequality in (3) is approximately distributed like $ \frac1{\sqrt 2}X -2X' $, where $X$ and $X'$ are independent standard normal; this has variance $\frac12+4=\frac92$, i.e. standard deviation $3/\sqrt2$. So (3) is approximately the Gaussian tail probability $$P\left(Z>\frac{\sqrt{n/2}}{3/\sqrt2}\right)=P\left(Z>\frac13\sqrt n\right)\le k\frac {e^{-n/18}}{\sqrt n},\tag4$$ which I've bounded using the standard Gaussian upper-tail inequality $P(Z>z)\le\frac{e^{-z^2/2}}{z\sqrt{2\pi}}$, giving $k=3/\sqrt{2\pi}$.

(Please check my algebra throughout.)
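Since (4) rests on a CLT approximation, it can be compared numerically with the exact value of (1), which (ignoring the $t$ critical values, as (1) does) is $P(F_{n-1,2n-1}<1/2)$. The R sketch below takes $\sigma^2=1$ and approximates $D=S^2_{2n}-2S^2_n$ by a normal with its exact mean $-1$ and variance $\frac{2}{2n-1}+\frac{8}{n-1}$; this finite-$n$ normal approximation is an illustrative choice, not necessarily the author's.

```r
# Exact P(S_{2n}^2 > 2 S_n^2) = P(F(n-1, 2n-1) < 1/2), with sigma^2 = 1,
# versus a normal approximation to D = S_{2n}^2 - 2 S_n^2
# using E[D] = -1 and Var[D] = 2/(2n-1) + 8/(n-1).
n      <- c(10, 20, 40, 80, 160)
exact  <- pf(0.5, n - 1, 2 * n - 1)
sd_D   <- sqrt(2 / (2 * n - 1) + 8 / (n - 1))
normal <- pnorm(1 / sd_D, lower.tail = FALSE)   # P(D > 0) under N(-1, sd_D^2)
signif(cbind(n, exact, normal, ratio = normal / exact), 3)
```

Both columns decrease with $n$; the ratio column shows how the relative accuracy of a normal tail approximation behaves as the probability gets small.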

As mentioned in the Answer of @r.e.s., your first equality in (1) is not strictly true, but the adjustment to correct for that is small; (+1) anyway. — BruceET, Jan 11 '17 at 19:33
@BruceET You are right, the critical value for $n$ differs from that for $2n$. Luckily the required adjustment washes out asymptotically. — grand_chat, Jan 11 '17 at 19:40
