
As far as I understand it, when estimating the population mean from a sample without knowing the population standard deviation $\sigma$, we can't use the $Z$-test. According to the Central Limit Theorem, the sampling distribution of sample means is a normal distribution with mean${}=\mu$ and standard deviation${} = \sigma / \sqrt n$. But the sample standard deviation $s = \frac {\sum(x_i - \mu)^2}{n} $ underestimates the true population parameter $\sigma$ (i.e. $s < \sigma$). For that reason, we cannot apply the Central Limit Theorem using the sample standard deviation directly.

But we use Bessel's correction precisely for this reason! We write $s = \frac {\sum(x_i - \mu)^2}{n-1} $ so that now $s ≈ \sigma$. My question is: after applying Bessel's correction, why can't we use the $Z$-test directly to estimate the population mean $\mu$?

The $T$-distribution is a bit flatter than the $Z$-distribution. This essentially reflects the fact that the population variance is a bit more than the variance we estimated from the sample. But did we not already take that into consideration by applying Bessel's correction?

Now another question arises in this context. From the Central Limit Theorem, the sampling distribution is essentially a Normal distribution and nothing else. Never any other distribution, and certainly NOT a $T$-distribution. Just because we may fail to estimate the variance does not mean we should change the sampling distribution of sample means from a $Z$-distribution to a $T$-distribution. If, instead of the $Z$-distribution, you had taken some other normal distribution with variance "a bit more than 1", at least that would have made sense. A $T$-distribution looks like a normal distribution, but is NOT a $Z$-distribution with variance increased by some amount. Just because there is some uncertainty in the determination of $\sigma$, why assume that some entirely different distribution should better approximate the distribution of sample means?

Note: I have already looked into the following answers, and they do not satisfactorily answer my question:

Estimating population SD when calculating t-statistic

When the population variance is unknown, we should use t-distribution.

3 Answers


But the sample standard deviation $s = \frac {\sum(x_i - \mu)^2}{n} $ underestimates the true population parameter $\sigma$ (i.e. $s < \sigma$).

Attention to some details is needed here.

  • You first said $\mu$ is the mean of the population from which the sample is drawn.
  • But then you said $\frac{\sum(x_i-\mu)^2} n$ is the sample standard deviation. That is wrong. You should distinguish between the population mean $\mu$ and the sample mean $\overline x.$ The latter is different for different samples of $n$ observations; the former is not. The sample variance, not the sample standard deviation, is $\frac{\sum (x_i-\overline x)^2} n,$ and that differs from $\frac{\sum(x_i-\mu)^2} n.$
  • This should be called $s^2,$ not $s,$ and (as mentioned) it's the sample variance, not the sample standard deviation.
  • This sample variance, on average, underestimates $\sigma^2.$ Note: $\sigma^2,$ not $\sigma.$
  • That is not the same as saying $s^2<\sigma^2;$ rather it means $\operatorname E(s^2) < \sigma^2.$

But we use Bessel's correction precisely for this reason! We write $s = \frac {\sum(x_i - \mu)^2}{n-1} $ so that now $s ≈ \sigma$.

  • No. We write $s^2$ (not $s$) ${} = \frac{\sum(x_i-\overline x)^2}{n-1},$ not $\frac{\sum(x_i-\mu)^2}{n-1}$ (it is $\overline x,$ not $\mu$), and what holds is $\operatorname E(s^2) = \sigma^2$: it is $\operatorname E(s^2),$ not $s^2$ itself, that equals $\sigma^2,$ and it is not true that $\operatorname E(s) = \sigma.$
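The gap between $\operatorname E(s^2)=\sigma^2$ and $\operatorname E(s)\ne\sigma$ is easy to check numerically. The following is a small simulation of my own (the choices $n=5,$ $\sigma=2,$ the seed, and the number of repetitions are illustrative assumptions, not taken from the discussion): the average of the Bessel-corrected $s^2$ comes out close to $\sigma^2,$ while the average of $s$ falls short of $\sigma.$

```python
import random
import statistics

# Draw many samples of size n from N(mu, sigma^2) and average both the
# Bessel-corrected sample variance s^2 and its square root s.
# (n, mu, sigma, and the seed are arbitrary illustrative choices.)
random.seed(0)
mu, sigma, n, reps = 0.0, 2.0, 5, 20000

s2_vals, s_vals = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(x)   # divides by n-1 (Bessel's correction)
    s2_vals.append(s2)
    s_vals.append(s2 ** 0.5)

mean_s2 = sum(s2_vals) / reps
mean_s = sum(s_vals) / reps

# E(s^2) is close to sigma^2 = 4, but E(s) falls short of sigma = 2,
# because the square root is concave (Jensen's inequality).
print(mean_s2, mean_s)
```

So Bessel's correction unbiases the variance, not the standard deviation: taking the square root of an unbiased estimator does not preserve unbiasedness.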

My question is: after applying Bessel's correction, why can't we use the $Z$-test directly to estimate the population mean $\mu$?

We have $s^2= \sum(x_i-\overline x)^2/(n-1).$

We know that

$$ \frac{\overline x - \mu}{\sigma/\sqrt n} \sim \operatorname N(0,1). \tag 1 $$

But $$ \frac{\overline x - \mu}{s/\sqrt n} \sim t_{n-1}. \tag 2 $$

The latter is used for deriving confidence intervals and hypothesis tests on $\mu$ because we cannot observe $\sigma,$ whereas we can observe $s.$

We need a pivotal random variable in which the only unobservable quantity is $\mu.$ “Pivotal” means its probability distribution does not depend on unobservables, and that the only unobservable taken into account in finding the value of the pivotal quantity is the one for which we want a confidence interval or a hypothesis test.
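To see concretely why $(2)$ has heavier tails than $(1),$ here is a small simulation of my own (the choices $n=5,$ the seed, and the $\pm 1.96$ cutoff are illustrative assumptions): the $\sigma$-based pivot lands outside $\pm 1.96$ about 5% of the time, while the $s$-based pivot does so noticeably more often, which is exactly the extra spread the $t_{n-1}$ distribution accounts for.

```python
import random
import statistics

# For normal samples, compare how often the two pivots exceed +/-1.96:
# Z-pivot: (xbar - mu) / (sigma / sqrt(n))   -- sigma known
# t-pivot: (xbar - mu) / (s / sqrt(n))       -- s re-estimated each sample
random.seed(1)
mu, sigma, n, reps = 0.0, 1.0, 5, 40000

z_exceed = t_exceed = 0
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s = statistics.stdev(x)            # Bessel-corrected (divides by n-1)
    if abs((xbar - mu) / (sigma / n ** 0.5)) > 1.96:
        z_exceed += 1
    if abs((xbar - mu) / (s / n ** 0.5)) > 1.96:
        t_exceed += 1

# sigma-pivot: about 5%; s-pivot: noticeably more, matching t_{n-1}.
print(z_exceed / reps, t_exceed / reps)
```

The excess for the $s$-based pivot comes entirely from the sampling fluctuation of $s,$ not from any bias in $s^2.$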

  • I understand the mistakes I made in the question. But you only started answering my actual question in the last two paragraphs. I understand why equation (1) holds (due to the CLT), but my main question (which you did not answer) is why equation (2) holds. I understand that just because $E[s^2] = \sigma^2$, we can't say that $E[s]=\sigma$. But still I don't fully get it. – user257330 Dec 31 '19 at 07:20
  • @user257330 : Actually, I had in mind a sample from a normally distributed population, so that the CLT is not needed in order to know about the distribution of $(1).$ Nor of $(2).$ Notice two things: the distributions of $(1)$ and $(2)$ differ from each other, and the distribution of $(2)$ does not change if $\sigma$ changes, since the numerator and the denominator are both multiplied by the same thing. And we can thus take $(2)$ to be the definition of the distribution that we call $t_{n-1}. \qquad$ – Michael Hardy Dec 31 '19 at 07:23
  • So Bessel's correction is meaningless if we want an unbiased standard deviation of the population? It only helps as long as we are satisfied with an unbiased variance? But even that is meaningless. If we have an unbiased variance, can't we just take the square root and expect it to give us an unbiased standard deviation? And if we indeed have an unbiased estimate of the standard deviation of the population, my actual question returns, that is: if $s = \sigma$, why is equation (2) NOT a normal distribution? – user257330 Dec 31 '19 at 07:31
  • No: The square root of an unbiased estimator of the variance is NOT an unbiased estimator of the standard deviation. – Michael Hardy Dec 31 '19 at 07:38
  • . . . . and it is certainly not true that $s=\sigma.$ You take a sample of $n$ observations from the population and you get a value of $s.$ Then take the next independent sample of $n$ observations and get a DIFFERENT value of $s.$ But the value of $\sigma$ does not change. The average of all possible values of $s^2$ is $\sigma^2,$ but that's not the same as saying $s^2=\sigma^2. \qquad$ – Michael Hardy Dec 31 '19 at 07:41
  • As far as I get it, in equation (1) there is only one random variable, which is $\bar x$; the others are constants for a given population. But in equation (2) there are actually two random variables, $\bar x$ and $s$, and although $E[s] = \sigma$, it changes the distribution from normal to t-dist (after taking the distribution of $s$ into the picture along with the normal distribution of $\bar x$). Am I right?

    Also, please state whether $E[s] = \sigma$? I wrote that just to emphasize that even if it were true, it would not stop the distribution from changing from a normal to a t-dist.

    – user257330 Dec 31 '19 at 07:52
  • ... my main confusion is the distribution changing from a normal to a t-dist. – user257330 Dec 31 '19 at 07:53
  • @user257330 : Everything before "Am I right?" is right. It is not true that $\operatorname E(s) = \sigma.$ The change from the normal distribution to the t-distribution coincides exactly with the step from $(1)$ to $(2)$. – Michael Hardy Dec 31 '19 at 08:01
  • @user257330 : You can get exactly the same confidence intervals that you get by using a t-distribution in the standard way by an only slightly different process that does NOT use Bessel's correction, but instead divides by $n$ rather than by $n-1. \qquad$ – Michael Hardy Dec 31 '19 at 20:28
  • @user257330 : I've posted a separate answer below that has a different emphasis. – Michael Hardy Dec 31 '19 at 21:08

My first posted answer spent a lot of time addressing errors. This one will concentrate on the fact that the topic of Bessel's correction in this context is really altogether a separate thing from the topic of how the t-distribution enters this problem.

We have:

  • $X_1,\ldots,X_n\sim\text{i.i.d.} \operatorname N(\mu,\sigma^2).$
  • $\overline X = (X_1+\cdots+X_n)/n.$
  • $S^2 = \big( (X_1-\overline X)^2 + \cdots + (X_n-\overline X)^2 \big)/(n-1)$
  • $U^2 = \big( (X_1-\overline X)^2 + \cdots + (X_n-\overline X)^2 \big)/n$

Now recall that $$ T= \frac{\overline X- \mu}{S/\sqrt n} \sim t_{n-1}. \tag 1 $$ Now go to our tables or our software and find the number $A$ for which $$ \Pr(-A<T<A) = 0.9 $$ and conclude that $$ \Pr\left( \overline X- A\frac S {\sqrt n} < \mu < \overline X + A\frac S{\sqrt n} \right) = 0.9. $$ But what if we did not use Bessel's correction? So we use $U$ instead of $S.$

We have $$ U = S\cdot \sqrt{\frac {n-1} n} $$ and therefore $$ \sqrt{\frac{n-1} n} \cdot \frac{\overline X - \mu}{U/\sqrt n} = \frac{\overline X - \mu}{S/\sqrt n}, $$ and so the event $$ -A < \sqrt{\frac{n-1} n} \cdot \frac{\overline X - \mu}{U/\sqrt n} < A $$ is the same as $$ -A\sqrt{\frac n {n-1}} < \frac{\overline X - \mu}{U/\sqrt n} < A\sqrt{\frac n {n-1}}, $$ i.e. $$ -B < \frac{\overline X - \mu}{U/\sqrt n} < B $$ with $B = A\sqrt{n/(n-1)},$ so that $$ \Pr\left( \overline X - B \frac U {\sqrt n} <\mu < \overline X + B\frac U {\sqrt n} \right) = 0.9. $$ This is exactly the same interval that we got using Bessel's correction. We could have simply designed our software and our tables to give us this number $B$ instead of the number $A$ that we get from the tables now used, and then proceeded without Bessel's correction.

So the use of Bessel's correction is an altogether separate issue from the problem of how to adjust the size of the confidence interval for the uncertainty in estimating $\sigma.$
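As a numeric check of this equivalence, here is a sketch of my own (the data, the seed, $n=8,$ and the tabulated quantile $A \approx 1.8946$ for $t_{7}$ at 90% are my illustrative assumptions): the interval built from $S$ with $A$ and the interval built from $U$ with $B = A\sqrt{n/(n-1)}$ coincide, because $AS = BU$ exactly.

```python
import random
import statistics

# One simulated sample; the seed and parameters are arbitrary choices.
random.seed(2)
n = 8
x = [random.gauss(10.0, 3.0) for _ in range(n)]
xbar = sum(x) / n

S = statistics.stdev(x)                              # divides by n-1
U = (sum((xi - xbar) ** 2 for xi in x) / n) ** 0.5   # divides by n
A = 1.8946                      # 0.95 quantile of t_7, from standard tables
B = A * (n / (n - 1)) ** 0.5    # rescaled quantile for the U-based interval

ci_S = (xbar - A * S / n ** 0.5, xbar + A * S / n ** 0.5)
ci_U = (xbar - B * U / n ** 0.5, xbar + B * U / n ** 0.5)

# The two intervals agree to rounding, since A*S == B*U exactly.
print(ci_S)
print(ci_U)
```

The rescaling of the quantile absorbs the factor $\sqrt{(n-1)/n}$ that Bessel's correction would otherwise supply, which is the point of the answer above.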


The issue is not that the sample standard deviation $S$ tends to underestimate the population standard deviation. The issue is that $S$ is a random variable rather than a constant, so its value fluctuates, and correspondingly $$ T = \frac{\bar X - \mu}{S/\sqrt{n}} $$ fluctuates more (has a larger variance) than $$ Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}}. $$ So $T$ has longer tails than $Z$.
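This can be checked with a quick simulation (a sketch of my own; $n = 6,$ the seed, and the repetition count are arbitrary assumptions): the $Z$-statistic has variance close to 1, while the $T$-statistic has variance close to $(n-1)/(n-3) = 5/3,$ the variance of $t_{5}.$

```python
import random
import statistics

# Simulate both statistics over many normal samples and compare variances.
random.seed(3)
mu, sigma, n, reps = 0.0, 1.0, 6, 40000

t_vals, z_vals = [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    s = statistics.stdev(x)   # random: changes from sample to sample
    t_vals.append((xbar - mu) / (s / n ** 0.5))
    z_vals.append((xbar - mu) / (sigma / n ** 0.5))

# Var(Z) is close to 1; Var(T) is close to (n-1)/(n-3) = 5/3,
# because the fluctuating denominator s inflates the spread.
print(statistics.variance(z_vals), statistics.variance(t_vals))
```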

– littleO