The formula for the Chi-Square test statistic is the following:
$$\chi^2=\sum_{i=1}^n\frac{({O_i-E_i})^2}{E_i}$$
where $O_i$ is observed data, and $E_i$ is expected.
I am just curious why this follows the $\chi^2$ distribution?
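For concreteness, here is a minimal sketch of how the statistic is computed, using hypothetical counts for 120 rolls of a supposedly fair die (numpy/scipy, with $n-1$ degrees of freedom for the tail probability):

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical example: a fair six-sided die rolled 120 times.
observed = np.array([18, 24, 16, 25, 21, 16])   # O_i
expected = np.full(6, 120 / 6)                   # E_i = 20 for each face

# Chi-square statistic: sum over cells of (O_i - E_i)^2 / E_i
stat = np.sum((observed - expected) ** 2 / expected)

# Under the null hypothesis, stat is approximately chi-square with n - 1 = 5 d.f.
p_value = chi2.sf(stat, df=len(observed) - 1)
print(stat, p_value)
```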
It is $\frac{O_i-E_i}{\sqrt{E_i}}$ that is taken to follow a normal distribution, not its square (which is a single summand of the statistic). In other words, we are assuming that the standardized errors are Gaussian; it is just an assumption. The square of a Gaussian variable follows a gamma distribution (specifically, a chi-square with one degree of freedom), and the sum of $n$ independent such squared variables follows a chi-square distribution with $n$ degrees of freedom (in the goodness-of-fit setting the constraint $\sum_{i=1}^n O_i = N$ makes the terms dependent, which is what reduces the degrees of freedom to $n-1$). Now, if we were to look at absolute values instead, namely $\left|\frac{O_i-E_i}{\sqrt{E_i}}\right|$, each term would have a half-normal distribution, and the sum $\sum_{i=1}^n\left|\frac{O_i-E_i}{\sqrt{E_i}}\right|$ would end up converging to a Gaussian (by the central limit theorem), though, unlike the chi-square case, without a standard named distribution for small $n$.
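As a rough illustration (a sketch only, with an arbitrary hypothetical null and numpy/scipy), simulating multinomial counts shows the standardized deviations behaving roughly like (correlated) Gaussians and their sum of squares tracking a chi-square:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
p = np.array([0.2, 0.3, 0.1, 0.4])    # hypothetical null cell probabilities
N, reps = 1000, 20000                 # sample size per table, number of simulated tables

O = rng.multinomial(N, p, size=reps)  # observed counts, one simulated table per row
E = N * p                             # expected counts under the null

Z = (O - E) / np.sqrt(E)              # standardized deviations: roughly Gaussian,
                                      # though correlated because each row sums to N
stat = (Z ** 2).sum(axis=1)           # the goodness-of-fit statistic for each table

# The statistic tracks a chi-square with (number of cells - 1) degrees of freedom.
print(np.quantile(stat, [0.5, 0.95]))
print(chi2.ppf([0.5, 0.95], df=len(p) - 1))
```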
It is correct to say that the "goodness-of-fit" statistic follows the chi-squared distribution asymptotically, not exactly. This means that the statistic lies in any given interval with probability approximately equal to that of a $ \chi^2_{n-1} $ variable lying in the same interval, provided the sample size $ N $ is large. Here I am assuming that the $E_i$s are expected frequencies arising from a completely specified model and that no estimation of parameters is involved; otherwise the degrees of freedom would change.
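One way to see the "provided $N$ is large" caveat (again just a sketch, with a hypothetical fully specified null): simulate the statistic at several sample sizes and compare the rejection rate at the nominal chi-square $5\%$ critical value.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
p = np.array([0.1, 0.2, 0.3, 0.4])   # a fully specified null (no parameters estimated)

def gof_stats(N, reps=20000):
    """Simulate the goodness-of-fit statistic for reps samples of size N."""
    O = rng.multinomial(N, p, size=reps)
    E = N * p
    return ((O - E) ** 2 / E).sum(axis=1)

# Rejection rate at the chi-square(n-1) critical value; it approaches 0.05 as N grows.
crit = chi2.ppf(0.95, df=len(p) - 1)
for N in (10, 50, 1000):
    print(N, np.mean(gof_stats(N) > crit))
```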
A neat and simple proof, as well as one that is less simple, can be found in http://sites.stat.psu.edu/~dhunter/asymp/lectures/p175to184.pdf. The simpler one goes roughly as follows: you need to observe that under the model, $ (O_1,O_2,\dots,O_n) $ has a multinomial distribution with parameters $N$ and cell probabilities $ \left(\frac{E_1}N, \frac{E_2}N, \dots, \frac{E_n}N \right) $.
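For instance (a sketch with hypothetical expected frequencies), one can draw such multinomial tables directly and check that the cell means match the $E_i$ and that every table sums to $N$ exactly:

```python
import numpy as np

rng = np.random.default_rng(2)
E = np.array([30.0, 50.0, 20.0])             # hypothetical expected frequencies E_i
N = int(E.sum())                             # total count N (here 100)
probs = E / N                                # cell probabilities E_i / N

O = rng.multinomial(N, probs, size=100000)   # draws of (O_1, ..., O_n)

# Empirical cell means match the E_i, and every draw sums to N exactly --
# the constraint responsible for the singularity discussed next.
print(O.mean(axis=0), E)
print(np.unique(O.sum(axis=1)))
```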
This means that when $N$ is large, $ (O_1,O_2,\dots,O_n) $ has approximately an $n$-variate normal distribution, but a singular one, since $ \sum_{i=1}^n O_i \equiv N $ is non-random. Another way to see the singularity is to note that the parameters of the distribution are the mean vector and the dispersion matrix, and the latter is singular.
However, any $n-1$ out of $ O_1, O_2, \dots, O_n $ have approximately a non-singular $(n-1)$-variate normal distribution. Choose $ \tilde O := (O_1, O_2, \dots, O_{n-1}) $ and compute the inverse $ \Sigma^{-1} $ of the dispersion matrix $ \Sigma $ of $\tilde O $. Specifically, $ \Sigma^{-1} $ turns out to have all off-diagonal entries equal to $ 1 / E_n $ and, for $ 1 \le i \le n-1$, $ 1/E_i + 1/E_n $ as the $i$th diagonal entry.
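Both the singularity of the full dispersion matrix and this closed form for $ \Sigma^{-1} $ are easy to verify numerically (a sketch with hypothetical expected frequencies):

```python
import numpy as np

E = np.array([30.0, 50.0, 20.0, 40.0, 60.0])   # hypothetical expected frequencies
N = E.sum()
n = len(E)

# Dispersion matrix of the full multinomial vector (O_1, ..., O_n):
# Var(O_i) = N p_i (1 - p_i), Cov(O_i, O_j) = -N p_i p_j, with p_i = E_i / N.
full_Sigma = np.diag(E) - np.outer(E, E) / N

# The full matrix is singular: it annihilates the all-ones vector,
# reflecting the constraint sum_i O_i = N.
print(np.allclose(full_Sigma @ np.ones(n), 0))              # True

# Dropping the last cell gives a nonsingular matrix whose inverse matches the
# stated closed form: 1/E_i + 1/E_n on the diagonal, 1/E_n off the diagonal.
Sigma = full_Sigma[:n-1, :n-1]
claimed_inverse = np.diag(1 / E[:n-1]) + 1 / E[-1]
print(np.allclose(np.linalg.inv(Sigma), claimed_inverse))   # True
```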
Finally, the goodness-of-fit statistic is shown to exactly equal the standardized sum of squares $ \{ \tilde O - E (\tilde O)\}^T \Sigma^{-1} \{ \tilde O - E (\tilde O)\} $, which approximately follows a $ \chi^2_{n-1} $ distribution because, for every $ k $, the map on ${\mathbb R}^k $ that takes a vector $ \tilde x $ to the real number $ ( \tilde x - \tilde a )^T A ( \tilde x - \tilde a ) $, whenever $ \tilde a \in {\mathbb R}^k$ and the $k\times k $ matrix $ A $ are fixed, is a continuous function, and the standardized sum of squares from an exact $k$-variate nonsingular normal distribution is $\chi^2_k$ distributed. Here $ n-1 $ plays the role of $k$.
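The last fact can also be checked by simulation (a sketch with an arbitrary nonsingular normal distribution): the standardized sum of squares of exact $k$-variate normal draws behaves like a $ \chi^2_k $ variable.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
k = 3
a = np.array([1.0, -2.0, 0.5])                 # arbitrary mean vector
A = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.0]])                # arbitrary nonsingular covariance matrix

X = rng.multivariate_normal(a, A, size=50000)  # exact k-variate normal draws
D = X - a
q = np.einsum('ij,jk,ik->i', D, np.linalg.inv(A), D)   # standardized sums of squares

# q is chi-square with k degrees of freedom (up to simulation noise in the quantiles).
print(np.quantile(q, [0.5, 0.95]))
print(chi2.ppf([0.5, 0.95], df=k))
```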
To check equality of the test statistic with $ \{ \tilde O - E (\tilde O)\}^T \Sigma^{-1} \{ \tilde O - E (\tilde O)\} $, you will need to use the facts that $ E(O_i) = E_i $ under the model, and, repeatedly, that $ \sum_{i=1}^n (O_i-E_i) = N - N = 0 $.
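A direct numerical check of this equality (with the same hypothetical expected frequencies as above) looks like:

```python
import numpy as np

rng = np.random.default_rng(3)
E = np.array([30.0, 50.0, 20.0, 40.0, 60.0])       # hypothetical expected frequencies
N = E.sum()
n = len(E)

O = rng.multinomial(int(N), E / N).astype(float)   # one observed table under the model

# Goodness-of-fit statistic over all n cells
gof = np.sum((O - E) ** 2 / E)

# Quadratic form over the first n-1 cells, using Sigma^{-1} in its closed form
Sigma_inv = np.diag(1 / E[:n-1]) + 1 / E[-1]
d = O[:n-1] - E[:n-1]
quad = d @ Sigma_inv @ d

print(np.isclose(gof, quad))   # True: the two expressions agree exactly
```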