When to use which multiple testing correction?

Question

There are a large number of multiple-testing p-value correction methods, e.g.:

   bonferroni : one-step correction
   sidak : one-step correction
   holm-sidak : step down method using Sidak adjustments
   holm : step-down method using Bonferroni adjustments
   simes-hochberg : step-up method (independent)
   hommel : closed method based on Simes tests (non-negative)
   fdr_bh : Benjamini/Hochberg (non-negative)
   fdr_by : Benjamini/Yekutieli (negative)
   fdr_tsbh : two stage fdr correction (non-negative)
   fdr_tsbky : two stage fdr correction (non-negative)

(based on https://www.statsmodels.org/dev/generated/statsmodels.stats.multitest.multipletests.html)
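To make the difference between these families concrete, here is a minimal pure-Python sketch of three of the listed adjustments (Bonferroni, Holm, and Benjamini/Hochberg). The function names and the p-values are hypothetical and chosen only for illustration. In practice, the statsmodels function linked above provides the same corrections via method="bonferroni", "holm", and "fdr_bh".

```python
def bonferroni(pvals):
    """One-step correction: multiply every p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Step-down correction: the k-th smallest p-value is multiplied by
    (m - k + 1), then monotonicity is enforced with a running maximum."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 is the smallest p-value
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def benjamini_hochberg(pvals):
    """Step-up FDR correction: the k-th smallest p-value is multiplied by
    m / k, then monotonicity is enforced with a running minimum."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # walk from the largest p-value down
        i = order[rank]
        running_min = min(running_min, pvals[i] * m / (rank + 1))
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from five tests (illustration only).
pvals = [0.005, 0.011, 0.02, 0.03, 0.5]
alpha = 0.05

for name, fn in [("bonferroni", bonferroni),
                 ("holm", holm),
                 ("fdr_bh", benjamini_hochberg)]:
    adjusted = fn(pvals)
    n_rejected = sum(p <= alpha for p in adjusted)
    print(name, [round(p, 4) for p in adjusted], "rejected:", n_rejected)
```

On these made-up inputs the FWER-controlling methods reject fewer hypotheses than the FDR-controlling one, which is the usual trade-off: Bonferroni and Holm limit the chance of any false positive, while Benjamini/Hochberg limits the expected proportion of false positives among the rejections.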

I have found a lot of pages that explain the methods individually (and why corrections are needed), but I have not found an overview of when to use which method, e.g. a comparison table or, even better, a decision flow diagram like the ones that exist for machine learning methods.

Any ideas? How do I decide which multiple testing correction I should apply?

Cole Wagner · Answer 1 · 2025-04-25T16:28:41.213

In short, the appropriate correction depends on the nature of the data. Without knowing the specifics, here are a few loose guidelines:

If the tests involve a control being compared to several treatments, use Dunnett's correction.
Tukey-Kramer is a good general-purpose option for when you want all pairwise comparisons of means. Do keep in mind, though, that this method assumes normality and equal variance.
Bonferroni is a more conservative option that is good when the assumptions for Tukey-Kramer are violated.
Scheffé is another conservative method for when you may want to do some other post-hoc analyses of the means, such as linear contrasts.

See Lee and Lee (https://pmc.ncbi.nlm.nih.gov/articles/PMC6193594/) for more details.

Hopefully this helps!

score -2 · Answer 2 · answered Jan 27 '21 at 18:28

The Bonferroni and Holm methods lead to the same FWER and disjunctive power (the probability of rejecting at least one hypothesis) when analysing multiple primary outcomes. This is because both methods adjust the smallest p-value in the same way. Similarly, the Hochberg and Hommel methods lead to the same FWER and disjunctive power when two primary outcomes are analysed; differences between these methods arise only when analysing three or more outcomes.

You can read about it in this article (link).
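The claim about Bonferroni and Holm can be checked with a small simulation. This is only a sketch under assumed inputs: the p-values are drawn uniformly from [0, 0.1] purely so that rejections occur often, and the helper names are hypothetical. It shows that the two methods always agree on whether at least one hypothesis is rejected, even though Holm can reject more hypotheses overall.

```python
import random


def bonferroni(pvals):
    """Multiply every p-value by the number of tests m."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Multiply the k-th smallest p-value by (m - k + 1), keeping the
    adjusted values non-decreasing with a running maximum."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


def n_rejected(adjusted, alpha=0.05):
    """Count hypotheses whose adjusted p-value is at or below alpha."""
    return sum(p <= alpha for p in adjusted)


random.seed(0)
same_any_rejection = True   # do both methods agree on "at least one rejection"?
holm_never_fewer = True     # does Holm always reject at least as many?
holm_rejects_more = 0       # trials in which Holm rejects strictly more

for _ in range(1000):
    # Hypothetical p-values for four outcomes (illustration only).
    pvals = [random.uniform(0.0, 0.1) for _ in range(4)]
    nb = n_rejected(bonferroni(pvals))
    nh = n_rejected(holm(pvals))
    same_any_rejection = same_any_rejection and ((nb > 0) == (nh > 0))
    holm_never_fewer = holm_never_fewer and (nh >= nb)
    holm_rejects_more += nh > nb

print("same 'any rejection' decision in every trial:", same_any_rejection)
print("Holm never rejects fewer than Bonferroni:", holm_never_fewer)
print("trials where Holm rejects strictly more:", holm_rejects_more)
```

The agreement follows from the smallest p-value being multiplied by m under both methods, so the first rejection happens under exactly the same condition. The methods differ only in how many further hypotheses are rejected after that.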
