What hypothesis is tested by the Mann–Whitney U test, and how does it differ from the Brunner-Munzel test?

Question

It’s not clear to me what exactly is tested by the Mann–Whitney U test (also called Wilcoxon rank-sum test).

First, assume independence of the observations, and ordinality of the data.

My understanding is that under these conditions, the Mann-Whitney U is testing the hypothesis of stochastic equality, that is

$H_0 : P(X > Y) = P(Y > X)$

$H_1 : P(X > Y) \neq P(Y > X)$

Additionally, if we assume that the distributions have the same shape and differ only in location, then the Mann-Whitney U becomes a test of equality of medians. If we also add the assumption of symmetry, then the Mann-Whitney U becomes a test of equality of means (assuming they exist).

However, some sources say the Mann-Whitney U assumes equal variance, which confuses me. It seems to me the Mann-Whitney U either makes no assumptions about the shape of the two distributions for the stochastic equality test, or assumes a difference in location only for the median test interpretation.

Even more confusing is that the Brunner-Munzel test is often described as “relaxing” the Mann-Whitney U's assumption of equal variance. Yet the hypotheses it tests are the same as the ones I gave for the Mann-Whitney U earlier, that is

$H_0 : P(X > Y) = P(Y > X)$

$H_1 : P(X > Y) \neq P(Y > X)$

So it seems to be testing the exact same hypothesis as the Mann-Whitney U, just without the “test of equality of medians” interpretation.

Are the hypotheses for the Mann-Whitney U and Brunner-Munzel tests the same? If not, how do they differ?

jginestet · Answer 1 · 2025-04-05T20:12:40.207

A bit late to the party, but, better late than never.

+1 for a good question, and in particular properly formulating the null for the Mann-Whitney test (MWUt) and Brunner-Munzel test (BMt).

Very simply, the Brunner-Munzel test is a "better" version of the Mann-Whitney U test, in the same sense that the Welch t-test is a "better" version of Student's t-test; the "better" versions do not suffer from the Behrens-Fisher problem (BFp). The BFp is inflation of the type I (alpha) error rate when the 2 samples have different variances (and, in the case of MWUt, this happens even if the sample sizes are the same). This is to the point where some authors (with whom I agree) have advocated always using BMt, or Welch.
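To make the alpha inflation concrete, here is a minimal simulation sketch. The settings are my own assumptions, chosen only for illustration: both samples are normal with mean 0 (so $P(X > Y) = P(Y > X)$ holds), the smaller sample has the larger standard deviation, and the sample sizes, seed and number of replicates are arbitrary. It relies on `scipy.stats.mannwhitneyu` and `scipy.stats.brunnermunzel` with their default settings.

```python
import numpy as np
from scipy import stats


def rejection_rates(n_x=10, n_y=50, sd_x=5.0, sd_y=1.0,
                    n_sim=2000, alpha=0.05, seed=0):
    """Estimate how often each test rejects although P(X > Y) = P(Y > X)."""
    rng = np.random.default_rng(seed)
    mwu_hits = 0
    bm_hits = 0
    for _ in range(n_sim):
        # Both samples are centred at 0, so the null is true;
        # only the spread and the sample size differ (illustrative choice).
        x = rng.normal(0.0, sd_x, size=n_x)
        y = rng.normal(0.0, sd_y, size=n_y)
        mwu_p = stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
        bm_p = stats.brunnermunzel(x, y, alternative="two-sided").pvalue
        mwu_hits += int(mwu_p < alpha)
        bm_hits += int(bm_p < alpha)
    return mwu_hits / n_sim, bm_hits / n_sim


mwu_rate, bm_rate = rejection_rates()
print(f"Mann-Whitney U rejection rate: {mwu_rate:.3f}")
print(f"Brunner-Munzel rejection rate: {bm_rate:.3f}")
```

In a setup like this one, I would expect the MWUt rate to land well above the nominal 0.05 and the BMt rate to stay much closer to it. With equal sample sizes the inflation should be milder, so the exact numbers depend heavily on the settings assumed above.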

So, the simplest answer is yes, they are the same.

Therefore, the additional assumptions of equal variance you may have found are simply there to avoid the BFp. But that is unnecessary, as we have the BMt. So I would ignore these sources, as they are, at best, dated (the seminal BMt paper was published in 2000).

Now, if you add some assumptions to MWUt (or BMt), you can torture it into either a test of medians, or a test of equal distributions. But these assumptions are basically not testable, and the tests are not robust to even small departures from these assumptions.

If one assumes that the 2 samples come from symmetric distributions, then MWUt and BMt become tests of equality of medians, and btw of equality of means as well. But if one of them is slightly skewed, even if they have equal medians, then I can always find a sample size for which the result is significant. So you will not know if your significant result is due to unequal medians, or asymmetry of at least 1 distribution, or some combination of both.
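A quick sketch of that point, under assumptions I picked purely for illustration: X is an exponential shifted so that its population median is 0, Y is a standard normal (median 0 as well), and the sample sizes and seed are arbitrary. By my calculation $P(X > Y)$ is roughly 0.56 for this pair, so the null of stochastic equality is false even though the medians are equal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)


def draw(n):
    """Two samples with equal population medians (0) but different shapes."""
    # X: right-skewed exponential, shifted so its population median is 0.
    x = rng.exponential(1.0, size=n) - np.log(2.0)
    # Y: symmetric standard normal, population median 0.
    y = rng.normal(0.0, 1.0, size=n)
    return x, y


p_values = {}
sample_medians = {}
for n in (20, 200, 5000):
    x, y = draw(n)
    p_values[n] = stats.brunnermunzel(x, y, alternative="two-sided").pvalue
    sample_medians[n] = (np.median(x), np.median(y))
    print(f"n = {n:5d}  medians = ({sample_medians[n][0]:+.3f}, "
          f"{sample_medians[n][1]:+.3f})  BM p-value = {p_values[n]:.3g}")
```

With the largest sample size the p-value should be tiny while both sample medians sit close to 0, so a "significant" result here says nothing about a difference in medians.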

If one assumes that the 2 samples come from distributions that are identical except for location, that is, all of the distributions' moments (all of them!) are identical except for the first (the mean), then it is a test of the median, but also of the mean, of the 17.5th percentile, etc. Except for the most trivial ones, I cannot think of any real-world case where a treatment/intervention or other difference between samples shifted only the central location, and nothing else.

So, quoting from wikipedia's MWUt page,

"the test is only consistent when the following occurs under H1: The probability of an observation from population X exceeding an observation from population Y is different (larger, or smaller) than the probability of an observation from Y exceeding an observation from X; i.e., $P(X > Y) \neq P(Y > X)$ or $P(X > Y) + 0.5 \cdot P(X = Y) \neq 0.5$."

Hence the null is $P(X > Y) = P(Y > X)$ or equivalently $P(X > Y) + 0.5 \cdot P(X = Y) = 0.5$, for both tests.
And there is really no valid reason to keep using MWUt, when one could use BMt.
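For what it's worth, the quantity in that null, $P(X > Y) + 0.5 \cdot P(X = Y)$, is easy to estimate directly from two samples by counting pairs. The helper below is a hypothetical name of my own (`prob_superiority`), not a SciPy function, and is only a sketch of the plug-in estimate.

```python
import numpy as np


def prob_superiority(x, y):
    """Plug-in estimate of P(X > Y) + 0.5 * P(X = Y) over all (x_i, y_j) pairs."""
    x = np.asarray(x, dtype=float)[:, None]  # column vector
    y = np.asarray(y, dtype=float)[None, :]  # row vector
    wins = (x > y).mean()    # share of pairs with x_i > y_j
    ties = (x == y).mean()   # share of tied pairs
    return wins + 0.5 * ties


x = [1, 2, 3]
y = [2, 2, 4]
p_hat = prob_superiority(x, y)
# 2 wins and 2 ties (each counted as half) out of 9 pairs, i.e. 3/9.
print(p_hat)
```

A value near 0.5 is what the null predicts; both tests ask whether the estimate is further from 0.5 than chance alone would explain, and they differ in how they estimate its variance.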
