2

A clinical trial is done with 400 persons suffering from a particular disease, to find out whether a treatment is better than placebo. They are randomised to receive treatment or placebo (200 participants each). The outcome studied is how many get cured. The results are shown in the following 2x2 table:

\begin{array} {|l|r|r|} \hline \text{ } & \text{Treatment group} & \text{Placebo group}\\ \hline \text{Cured} & 172 & 151 \\ \hline \text{Not cured} & 28 & 49 \\ \hline \text{Total} & 200 & 200 \\ \hline \end{array}

The odds ratio calculated from this table is $1.99$. The objective now is to test the null hypothesis (odds ratio = 1) against the alternate hypothesis (odds ratio is not 1). Ludbrook's 2008 article describes an exact test for this scenario:
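For concreteness, the quoted odds ratio is the cross-product ratio of the table. A quick check (Python used here purely for illustration):

```python
# cured / not cured counts from the 2x2 table above
a, b = 172, 151   # cured: Treatment, Placebo
c, d = 28, 49     # not cured: Treatment, Placebo

odds_ratio = (a * d) / (c * b)   # cross-product (sample) odds ratio
print(round(odds_ratio, 2))      # 1.99
```

(R's `fisher.test` later reports the slightly different value 1.989975 because it uses the conditional maximum-likelihood estimate rather than the cross-product ratio.)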

The formula for executing a two-sided randomization test, adapted to a 2x2 table with the constraint that the column totals are fixed (single conditioning), is:

$$P=\frac{\text{all tables for which the summary statistic is at least as extreme as that observed, in either direction}}{\text{all possible tables with the same column totals}}$$

I am a bit confused about what exactly this means. Does it mean I should form all possible tables with 200 treatment and 200 control participants, with each participant having a 50% chance of getting cured? There would then be $2^{200} \times 2^{200}=2^{400}$ possible tables, each equally likely. I would then calculate what fraction of these tables give an odds ratio at least as extreme as the one I got experimentally, i.e. $1.99$. This would give me the p-value.
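To make that interpretation concrete, here is a rough Monte Carlo sketch of it in Python: sampling random tables instead of enumerating all $2^{400}$, and assuming the 50% cure rate. This only illustrates my reading of the formula; I am not asserting it is Ludbrook's actual test:

```python
import math
import random

random.seed(1)  # reproducible sketch

n = 200                    # participants per group
p = 0.5                    # assumed common cure probability (the 50% in question)
sims = 10_000              # Monte Carlo sample instead of all 2^400 tables
obs_log_or = math.log((172 * 49) / (28 * 151))  # observed log odds ratio

extreme = 0
for _ in range(sims):
    # every one of the 200 + 200 participants is cured independently with prob p
    cured_t = sum(random.random() < p for _ in range(n))
    cured_p = sum(random.random() < p for _ in range(n))
    if cured_t in (0, n) or cured_p in (0, n):
        extreme += 1       # degenerate table, odds ratio undefined: count as extreme
        continue
    log_or = math.log((cured_t * (n - cured_p)) / ((n - cured_t) * cured_p))
    if abs(log_or) >= abs(obs_log_or):
        extreme += 1

p_value = extreme / sims
print(p_value)   # small: tables this extreme are rare when p = 0.5
```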

Is this the correct interpretation? If not, why?

If so, why the assumption of 50% cure rate? Why not 20%, 70%, 90%, or any other number?

(I would have contacted the author directly, but it turns out he is deceased. That is why I asked this question here.)


Reference

John Ludbrook, Analysis of 2 × 2 tables of frequencies: matching test to experimental design, International Journal of Epidemiology, Volume 37, Issue 6, December 2008, Pages 1430–1435, https://doi.org/10.1093/ije/dyn162

Adhish
  • 129
  • Without explicit reference to odds ratios, your null hypothesis is that treatment and placebo are equally effective at curing the disease. The alternative hypothesis is that the true cure rates may be unequal; this includes the possibility that the Treatment might impede cure. // Realistically speaking, a clinical trial with only 400 subjects may be at an early phase of development and exploration of the Treatment. // My link to Wikipedia includes a discussion of randomized and permutation tests. – BruceET Jul 13 '20 at 01:38

2 Answers

1

Let's analyse this $2\times 2$ contingency table.

  • 172 of the 200 treated patients were cured, i.e. $\frac{172}{200}=86\%$

  • 151 of the 200 placebo patients were cured, i.e. $\frac{151}{200}=75.5\%$

Since $86\% > 75.5\%$, the treatment looks like it works.

Now the question is: is $86\%$ really greater than $75.5\%$, or is the difference due to the random variability of the phenomenon?

To get an answer, we can perform the $\chi^2$ test:

[Image: the observed contingency table, the expected table, and the table of chi-squared contributions]

  • the first table is your contingency table

  • the second is the expected table, under the hypothesis that there is no difference between the treatment and placebo groups (every expected value is calculated under the independence hypothesis, e.g. $161.5=\frac{323\times 200}{400}$)

  • the third table is the test: every cell is calculated as $\frac{[\text{Observed}-\text{Expected}]^2}{\text{Expected}}$

  • the total statistic is $7.09$, which corresponds to a p-value of $0.8\%$ using a chi-squared distribution with $(2-1)\times (2-1)=1$ degree of freedom
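The arithmetic in these bullets can be verified in a few lines (a Python sketch, offered only as a cross-check; the upper tail of a 1-df chi-squared distribution is $\mathrm{erfc}(\sqrt{x/2})$, so no statistics library is needed):

```python
import math

obs = [[172, 151], [28, 49]]          # observed counts: rows cured / not cured
row = [sum(r) for r in obs]           # 323, 77
col = [obs[0][j] + obs[1][j] for j in range(2)]   # 200, 200
total = sum(row)                      # 400

# expected counts under independence: row total * column total / grand total
exp = [[row[i] * col[j] / total for j in range(2)] for i in range(2)]

# chi-squared statistic: sum of (Observed - Expected)^2 / Expected over the cells
chi2 = sum((obs[i][j] - exp[i][j]) ** 2 / exp[i][j]
           for i in range(2) for j in range(2))

# upper tail for chi-squared with 1 df: P(X > x) = erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(chi2 / 2))

print(round(chi2, 2), p_value)   # 7.09 and roughly 0.008
```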

CONCLUDING: the result is highly statistically significant. The data are sufficient to reject the hypothesis of OR = 1 (the treatment helps patients get cured).

tommik
  • 33,201
  • 4
  • 17
  • 35
  • Thanks for the answer @tommik, but my question is not about the chi-squared test at all. It is about a different test, namely an exact test. For more details, see my question and the cited article. – Adhish Jul 12 '20 at 10:41
  • Your method of analysis using a chi-squared test is entirely reasonable and nicely presented (+1)--even though OP specifically asks about an exact test and mentions odds ratios. // Sample sizes are plenty large for the chi-squared statistic to have a chi-squared distribution. Also the chi-squared test is 2-sided. And its P-value 0.008 is not far from the P-value 0.01 of Fisher's Exact test. – BruceET Jul 13 '20 at 01:56
  • @BruceET As I mentioned in a comment elsewhere, this table is singly conditioned. Thus, chi-squared would not be an appropriate test as it assumes an unconditioned table (draw a random sample of size 400, not necessarily containing 200 in any group, and THEN classify into two groups in each of two ways). – Adhish Jul 13 '20 at 02:13
  • 1
    Opinions differ on Fisher exact and chi-squared tests. One might say that the chi-squared test, in its use of expected counts based on marginal totals, takes the necessary restrictions into account. Some criticize the exact test precisely because it is conditioned on the totals. There is a rich literature on these topics stretching back about 80 years, so we will not settle this in comments here. – BruceET Jul 13 '20 at 02:24
0

Fisher's exact test is based on a hypergeometric distribution.

Fisher's Exact Test in R. As implemented in R statistical software, the results of the two-sided test are as follows:

TABL = rbind(c(172,151), c(28,49))
TABL
     [,1] [,2]
[1,]  172  151
[2,]   28   49

fisher.test(TABL)

    Fisher's Exact Test for Count Data

data:  TABL
p-value = 0.01088
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 1.160626 3.464351
sample estimates:
odds ratio
  1.989975

Based on Hypergeometric Distribution. Here is one way to explain the connection to a hypergeometric distribution. Suppose the 77 not-cured patients are drawn from among the 400 patients (200 Treatment and 200 Placebo). What is the probability of seeing exactly 28 of them in the Treatment group? That is the following hypergeometric probability:

$$P(X=28)=\frac{{200\choose 28}{200\choose 49}}{{400\choose 77}}= 0.00292.$$

This is evaluated in R by computing the binomial coefficients or by using R's hypergeometric PDF function dhyper.

choose(200,28)*choose(200,49)/choose(400,77)
[1] 0.002917137
dhyper(28, 200,200, 77)
[1] 0.002917137

One-sided P-value: However, the P-value of a one-sided test would be $P(X\le 28) = 0.00544,$ which can be evaluated by summing 29 hypergeometric probabilities or by using R's hypergeometric CDF function phyper:

sum(dhyper(0:28, 200,200, 77))
[1] 0.005441333
phyper(28, 200,200, 77)
[1] 0.005441333

Two-Sided P-value: Finally, the P-value for a 2-sided test is the probability of a more extreme result in either direction: $P(X \le 28) + P(X \ge 49) = 0.01088,$ which is the P-value shown in the R printout from Fisher's Exact test above.

sum(dhyper(49:77, 200,200, 77))
[1] 0.005441333
2*phyper(28, 200,200, 77)
[1] 0.01088267
sum(dhyper(c(0:28, 49:77), 200,200, 77))
[1] 0.01088267
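As a cross-check outside R, the same hypergeometric probabilities can be built directly from binomial coefficients; this Python sketch mirrors the dhyper/phyper calls above:

```python
from math import comb

def dhyper(k, m=200, n=200, draws=77):
    """P(X = k): k of the 77 'not cured' fall in the Treatment group
    (size m), the remaining draws - k in the Placebo group (size n)."""
    return comb(m, k) * comb(n, draws - k) / comb(m + n, draws)

point = dhyper(28)                               # P(X = 28)
lower = sum(dhyper(k) for k in range(0, 29))     # P(X <= 28)
upper = sum(dhyper(k) for k in range(49, 78))    # P(X >= 49), equal by symmetry
two_sided = lower + upper

print(point, lower, two_sided)
```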

In the plot of the relevant hypergeometric PDF below, the two-sided P-value is the sum of the heights of the bars outside the vertical dotted lines. [The relevant hypergeometric distribution is precisely symmetrical because the Treatment and Placebo groups are of exactly the same size. One might say that there are ${400 \choose 77} \approx 6.3 \times 10^{83}$ possible $2 \times 2$ tables matching the experimental outcomes, but this hypergeometric distribution contains the information about them needed for a valid test.]

k = 0:77;  PDF = dhyper(k, 200,200, 77)
plot(k, PDF, type="h", col="blue", lwd=2, main="Hypergeometric PDF")
  abline(v=c(28.5, 48.5), col="red", lwd=2, lty="dotted")

[Plot: "Hypergeometric PDF", with the two-sided rejection region outside vertical dotted lines at 28.5 and 48.5]

BruceET
  • 52,418
  • 1
    Thanks for the detailed answer. However, Fisher's test is based on the hypergeometric distribution which assumes BOTH row and column sums are fixed (i.e. a doubly conditioned table). This is not an appropriate assumption in this case as the 323 (cured) and 77 (not cured) are not fixed. They are merely results of the experiment. What is fixed is the column totals: 200 and 200 (i.e. a singly conditioned table). Ludbrook, in his article, advises in such a scenario to avoid Fisher's exact test in favour of a different kind of exact test. It is the latter I wish to understand. – Adhish Jul 13 '20 at 02:10