
For the chi-squared test on a $2 \times 2$ contingency table, there is a proof that $\sum\frac{(O_i - E_i)^2}{E_i}$ equals $\frac{N(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}$.

Can anyone explain the steps in the proof? I know how to get from one expression to the other, but I'm not sure why certain steps happen.

Below I'll put the proof, in case anyone wants to see it or can explain it.

Thanks

V. Vancak
  • 16,927
lilne
  • 46
  • The first step is to calculate the expected value for each of the observed values a, b, c and d. These are: $E_a=(a+b)(a+c)/N$, $E_b=(a+b)(b+d)/N$, $E_c=(a+c)(c+d)/N$ and $E_d=(b+d)(c+d)/N$, and I understand where this comes from. – lilne Mar 19 '16 at 10:24
  • Next we subtract the expected value from the observed value, square the result and divide by the expected value. This gives the following four terms, which are summed together: $\frac{\left(a-\frac{(a+b)(a+c)}{N}\right)^2}{\frac{(a+b)(a+c)}{N}}+\frac{\left(b-\frac{(a+b)(b+d)}{N}\right)^2}{\frac{(a+b)(b+d)}{N}}+\frac{\left(c-\frac{(a+c)(c+d)}{N}\right)^2}{\frac{(a+c)(c+d)}{N}}+\frac{\left(d-\frac{(b+d)(c+d)}{N}\right)^2}{\frac{(c+d)(b+d)}{N}}$, and again I understand this step. The next step, however, causes some confusion. – lilne Mar 19 '16 at 10:28
  • How do I get from the above to this? I'll show just the first term, as it is a lot to type: $\left[\frac{ad-bc}{N}\right]^2\cdot\frac{N}{(a+b)(a+c)}$. No clue how the first term above becomes this! – lilne Mar 19 '16 at 10:29
  • You need to start with $\sum_{i,j} \frac{(O_{ij}-E_{ij})^2}{E_{ij}},$ where the sum is taken over $i = 1,2;\ j=1,2$. This formula works only for $2 \times 2$ tables. The idea is that $O_{11} = a, O_{12} = b,$ and so on. Also, pay attention to how the four $E_{ij}$'s are computed from $a, b, c,$ and $d.$ I have done this, and it can be done. But have coffee first. – BruceET Mar 19 '16 at 17:26
  • Haha I'm ready for the explanation – lilne Mar 19 '16 at 19:26
  • Method should now be clear. Your coffee, your sharpened pencil, your erasers, your pad of paper, your job. – BruceET Mar 19 '16 at 21:16
  • I know it's clear, but I don't get the next steps. – lilne Mar 20 '16 at 17:57

2 Answers

The same approach, but done by hand:

Start with $\chi^2=\sum^4_{i=1}\frac{(o_i-e_i)^2}{e_i}$ and expand the square: $$\chi^2=\sum^4_{i=1}\frac{(o_i-e_i)^2}{e_i}=\sum^4_{i=1}\bigl(\frac{o_i^2}{e_i}-2o_i+e_i\bigr)=\sum^4_{i=1}\frac{o_i^2}{e_i}-n$$ (since $\sum o_i=\sum e_i =n$).

Substitute the values in: $$\chi^2=\frac{na^2}{(a+c)(a+b)}-a+\frac{nb^2}{(b+d)(a+b)}-b+\frac{nc^2}{(a+c)(c+d)}-c+\frac{nd^2}{(c+d)(b+d)}-d$$

Notice: $$an-(a+c)(a+b)=a(a+b+c+d)-(a+c)(a+b)=ad-bc$$

So, $$\frac{a}{(a+c)(a+b)}[na-(a+c)(a+b)]=\frac{a(ad-bc)}{(a+c)(a+b)}$$

Do the same thing for $b$, $c$ and $d$.

From here it is fairly straightforward.
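
One way to fill in the remaining steps, as a sketch that uses only the identity above and the same symbols $a, b, c, d, n$: the analogous identities for the other three cells are

$$nb-(a+b)(b+d)=-(ad-bc),\qquad nc-(a+c)(c+d)=-(ad-bc),\qquad nd-(b+d)(c+d)=ad-bc,$$

so the four grouped terms give

$$\chi^2=(ad-bc)\left[\frac{a}{(a+b)(a+c)}-\frac{b}{(a+b)(b+d)}-\frac{c}{(a+c)(c+d)}+\frac{d}{(b+d)(c+d)}\right].$$

Pairing the first term with the second, and the fourth with the third:

$$\frac{a}{(a+b)(a+c)}-\frac{b}{(a+b)(b+d)}=\frac{a(b+d)-b(a+c)}{(a+b)(a+c)(b+d)}=\frac{ad-bc}{(a+b)(a+c)(b+d)},$$

$$\frac{d}{(b+d)(c+d)}-\frac{c}{(a+c)(c+d)}=\frac{d(a+c)-c(b+d)}{(a+c)(b+d)(c+d)}=\frac{ad-bc}{(a+c)(b+d)(c+d)}.$$

Adding the two pairs and using $\frac{1}{a+b}+\frac{1}{c+d}=\frac{n}{(a+b)(c+d)}$:

$$\chi^2=\frac{(ad-bc)^2}{(a+c)(b+d)}\left[\frac{1}{a+b}+\frac{1}{c+d}\right]=\frac{n(ad-bc)^2}{(a+b)(c+d)(a+c)(b+d)}.$$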

Ann
  • 11
I ran into the same problem. I used Mathematica to ease the computations.

$\chi ^2=\sum _{i=1}^4 \frac{(O_i-E_i)^2}{E_i}=\frac{n(a d-b c)^2}{(a+c)(b+d)(a+b)(c+d)}$

$O_i$ are the observed values: $O_1=a$, $O_2=b$, $O_3=c$ and $O_4=d$.

$E_i$ are the expected values: $E_1=\frac{(a+b)(a+c)}{n}$, $E_2=\frac{(a+b)(b+d)}{n}$, $E_3=\frac{(a+c)(c+d)}{n}$ and $E_4=\frac{(b+d)(c+d)}{n}$.

Then by expanding the summation we have: $\sum _{i=1}^4 \frac{(O_i-E_i)^2}{E_i}=\frac{(O_1-E_1)^2}{E_1}+\frac{(O_2-E_2)^2}{E_2}+\frac{(O_3-E_3)^2}{E_3}+\frac{(O_4-E_4)^2}{E_4}$

Substituting in the summation we have:

$\frac{n\left(a-\frac{(a + b)(a+c)}{n}\right)^2}{(a + b)(a+c)}+\frac{n\left(b-\frac{(a + b)(b+d)}{n}\right)^2}{(a + b)(b+d)}+\frac{n\left(c-\frac{(a + c)(c+d)}{n}\right)^2}{(a + c)(c+d)}+\frac{n\left(d-\frac{(b + d)(c+d)}{n}\right)^2}{(b + d)(c+d)}$

$\text{Simplify}\left[\frac{n\left(a-\frac{(a + b)(a+c)}{n}\right)^2}{(a + b)(a+c)}+\frac{n\left(b-\frac{(a + b)(b+d)}{n}\right)^2}{(a + b)(b+d)}+\frac{n\left(c-\frac{(a + c)(c+d)}{n}\right)^2}{(a + c)(c+d)}+\frac{n\left(d-\frac{(b + d)(c+d)}{n}\right)^2}{(b + d)(c+d)}\right]$

$\frac{(b+d) (c+d) ((a+b) (a+c)-a n)^2+(a+c) (c+d) ((a+b) (b+d)-b n)^2+(a+b) (b+d) ((a+c) (c+d)-c n)^2+(a+b) (a+c) ((b+d) (c+d)-d n)^2}{(a+b) (a+c) (b+d) (c+d) n}$

Then, expanding the numerator and distributing the denominator over every term, we have:

$\text{Expand}\left[\frac{(b+d) (c+d) ((a+b) (a+c)-a n)^2+(a+c) (c+d) ((a+b) (b+d)-b n)^2+(a+b) (b+d) ((a+c) (c+d)-c n)^2+(a+b) (a+c) ((b+d) (c+d)-d n)^2}{(a+b) (a+c) (b+d) (c+d) n}\right]$

$ -\frac{2 a^3 b c}{(a+b) (a+c) (b+d) (c+d)}-\frac{4 a^2 b^2 c}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a b^3 c}{(a+b) (a+c) (b+d) (c+d)}-\frac{4 a^2 b c^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{6 a b^2 c^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 b^3 c^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a b c^3}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 b^2 c^3}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a^3 b d}{(a+b) (a+c) (b+d) (c+d)}-\frac{4 a^2 b^2 d}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a b^3 d}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a^3 c d}{(a+b) (a+c) (b+d) (c+d)}-\frac{10 a^2 b c d}{(a+b) (a+c) (b+d) (c+d)}-\frac{10 a b^2 c d}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 b^3 c d}{(a+b) (a+c) (b+d) (c+d)}-\frac{4 a^2 c^2 d}{(a+b) (a+c) (b+d) (c+d)}-\frac{10 a b c^2 d}{(a+b) (a+c) (b+d) (c+d)}-\frac{6 b^2 c^2 d}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a c^3 d}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 b c^3 d}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a^3 d^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{6 a^2 b d^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{4 a b^2 d^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{6 a^2 c d^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{10 a b c d^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{4 b^2 c d^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{4 a c^2 d^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{4 b c^2 d^2}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a^2 d^3}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a b d^3}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 a c d^3}{(a+b) (a+c) (b+d) (c+d)}-\frac{2 b c d^3}{(a+b) (a+c) (b+d) (c+d)}+\frac{a^4 b c}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a^3 b^2 c}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a^2 b^3 c}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a b^4 c}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a^3 b c^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{7 a^2 b^2 c^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{5 a b^3 c^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{b^4 c^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a^2 b c^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{5 a b^2 c^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{2 b^3 c^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a b c^4}{(a+b) (a+c) (b+d) (c+d) n}+\frac{b^2 c^4}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a^4 b d}{(a+b) (a+c) (b+d) (c+d) 
n}+\frac{3 a^3 b^2 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a^2 b^3 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a b^4 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a^4 c d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{8 a^3 b c d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{14 a^2 b^2 c d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{8 a b^3 c d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{b^4 c d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a^3 c^2 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{14 a^2 b c^2 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{16 a b^2 c^2 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{5 b^3 c^2 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a^2 c^3 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{8 a b c^3 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{5 b^2 c^3 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a c^4 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{b c^4 d}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a^4 d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{5 a^3 b d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{7 a^2 b^2 d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a b^3 d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{5 a^3 c d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{16 a^2 b c d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{14 a b^2 c d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 b^3 c d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{7 a^2 c^2 d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{14 a b c^2 d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{7 b^2 c^2 d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a c^3 d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 b c^3 d^2}{(a+b) (a+c) (b+d) (c+d) n}+\frac{2 a^3 d^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{5 a^2 b d^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a b^2 d^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{5 a^2 c d^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{8 a b c d^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 b^2 c d^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 a c^2 d^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{3 b c^2 d^3}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a^2 d^4}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a b d^4}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a c d^4}{(a+b) (a+c) (b+d) (c+d) n}+\frac{b c d^4}{(a+b) (a+c) (b+d) (c+d) n}+\frac{a^2 b c n}{(a+b) (a+c) (b+d) (c+d)}+\frac{a b^2 c n}{(a+b) (a+c) (b+d) 
(c+d)}+\frac{a b c^2 n}{(a+b) (a+c) (b+d) (c+d)}+\frac{2 b^2 c^2 n}{(a+b) (a+c) (b+d) (c+d)}+\frac{a^2 b d n}{(a+b) (a+c) (b+d) (c+d)}+\frac{a b^2 d n}{(a+b) (a+c) (b+d) (c+d)}+\frac{a^2 c d n}{(a+b) (a+c) (b+d) (c+d)}+\frac{b^2 c d n}{(a+b) (a+c) (b+d) (c+d)}+\frac{a c^2 d n}{(a+b) (a+c) (b+d) (c+d)}+\frac{b c^2 d n}{(a+b) (a+c) (b+d) (c+d)}+\frac{2 a^2 d^2 n}{(a+b) (a+c) (b+d) (c+d)}+\frac{a b d^2 n}{(a+b) (a+c) (b+d) (c+d)}+\frac{a c d^2 n}{(a+b) (a+c) (b+d) (c+d)}+\frac{b c d^2 n}{(a+b) (a+c) (b+d) (c+d)} $

Then we fully simplify that messy horrible thing:

$\text{FullSimplify}[\text{messy horrible thing from above}]$

$ -2 (a+b+c+d)+\frac{(a+b+c+d)^2}{n}+\frac{\left(b c (d (c+d)+b (2 c+d))+a^2 (b (c+d)+d (c+2 d))+a \left(b^2 (c+d)+c d (c+d)+b \left(c^2+d^2\right)\right)\right) n}{(a+b) (a+c) (b+d) (c+d)} $

At this point we notice the term $(a + b + c + d)$, which we know equals $n$. We replace it with $n$ in the following expression and simplify:

$ \text{Simplify}[-2 (n)+\frac{(n)^2}{n}+\frac{\left(b c (d (c+d)+b (2 c+d))+a^2 (b (c+d)+d (c+2 d))+a \left(b^2 (c+d)+c d (c+d)+b \left(c^2+d^2\right)\right)\right) n}{(a+b) (a+c) (b+d) (c+d)}] $

$ \frac{(b c-a d)^2 n}{(a+b) (a+c) (b+d) (c+d)} $
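
As a sanity check on this result, here is a minimal Python sketch (not part of the Mathematica session above) that evaluates both sides with exact rational arithmetic. The function names and the sample counts are illustrative assumptions, and all row and column totals are assumed to be nonzero. It also hints at why the simplification works: each deviation $O_i - E_i$ equals $\pm\frac{ad-bc}{n}$.

```python
from fractions import Fraction


def expected_counts(a, b, c, d):
    """Expected counts (row total * column total / n) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return [
        Fraction((a + b) * (a + c), n),  # E_1, cell a
        Fraction((a + b) * (b + d), n),  # E_2, cell b
        Fraction((a + c) * (c + d), n),  # E_3, cell c
        Fraction((b + d) * (c + d), n),  # E_4, cell d
    ]


def chi2_from_definition(a, b, c, d):
    """Sum of (O_i - E_i)^2 / E_i over the four cells, computed exactly."""
    observed = [a, b, c, d]
    expected = expected_counts(a, b, c, d)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_closed_form(a, b, c, d):
    """n (ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), computed exactly."""
    n = a + b + c + d
    return Fraction(n * (a * d - b * c) ** 2,
                    (a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    a, b, c, d = 12, 5, 7, 9  # hypothetical counts, for illustration only
    n = a + b + c + d
    deviations = [o - e for o, e in zip([a, b, c, d], expected_counts(a, b, c, d))]
    # Every deviation has the same magnitude |ad - bc| / n; only the sign differs.
    print(deviations, Fraction(a * d - b * c, n))
    print(chi2_from_definition(a, b, c, d) == chi2_closed_form(a, b, c, d))
```

Using `Fraction` rather than floats keeps the comparison exact, so the equality check does not depend on rounding. A numeric check on one table is of course not a proof, only a way to catch algebra slips.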

There you go, hope this helps.