
Long ago I learned about what I now recognize as the rank sum test.

I am confused, because what I remember differs from what I now find on Wikipedia.

As I remember it, it was a test on two differently labeled samples, called "u" and "d": one computes a rank sum much like the one described on the net, but without the terms involving $n_1,n_2$, and the purpose was to decide whether the "u"s are significantly ordered before the "d"s (or vice versa).

It looks to me as if the terminology has changed, just as iodine in chemistry is no longer denoted by "J" but nowadays by "I".

Maybe my memory is playing tricks on me, maybe the notation and interpretation have changed. Maybe all of the above.

That alone would not make a good question, so let me sharpen it:

Given a rank sum $r$ based on sample sizes $n_1,n_2$, what is the probability that $r>r_0$ when the distributions of the samples "u" and "d" are equal?


1 Answer


Suppose your "u" and "d" samples have $n_1$ and $n_2$ items respectively. In the rank sum test you pool these two samples together and compute the ranks of the "u" sample within this pooled list of $n_1 + n_2$ items. The test statistic (call it $R_1$) is the sum of the "u" ranks.
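
To make the ranking step concrete, here is a minimal Python sketch (with made-up "u" and "d" samples, not data from the question) that pools the two samples and computes $R_1$ using SciPy's `rankdata`:

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical "u" and "d" samples (made-up numbers, purely for illustration)
u = np.array([3.1, 4.7, 5.2, 6.0, 2.9])
d = np.array([4.1, 5.5, 6.3, 7.2, 6.8, 5.9])
n1, n2 = len(u), len(d)

# Pool both samples, rank the pooled list, and sum the ranks of the "u" items.
pooled = np.concatenate([u, d])
ranks = rankdata(pooled)        # ties receive average ranks
R1 = ranks[:n1].sum()           # the "u" items occupy the first n1 positions
print(n1, n2, R1)               # here: 5 6 20.0
```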

Under the null hypothesis that the "u" and "d" distributions are equal, the $n_1$ ranks of the "u" sample are a uniform random selection, without replacement, from the numbers $1, 2,\ldots, n_1+n_2$, so the test statistic $R_1$ has mean $$ \mu_{R_1}:=E(R_1)=\frac{n_1(n_1+n_2+1)}2\tag1$$ and variance $$\sigma^2_{R_1}:=\operatorname{Var}(R_1)=\frac{n_1n_2(n_1+n_2+1)}{12}\tag2$$ (see here for a derivation), and the standardized statistic $$z:=\frac{R_1 - \mu_{R_1}}{\sigma_{R_1}}$$ has an approximately standard normal distribution.
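
As a sketch of how to turn this into the probability the question asks for, $P(R_1 > r_0)$ under the null, one can plug (1) and (2) into the normal approximation; the sample sizes and $r_0$ below are hypothetical:

```python
from scipy.stats import norm

# Hypothetical sample sizes and an observed rank sum r0 (made-up values)
n1, n2, r0 = 5, 6, 38.0

mu    = n1 * (n1 + n2 + 1) / 2                 # (1): null mean of R_1
sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5  # square root of (2): null std. dev. of R_1

z = (r0 - mu) / sigma
p_upper = norm.sf(z)   # normal approximation to P(R_1 > r0) under the null
print(mu, sigma, z, p_upper)
```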

On Wikipedia, the query "rank sum test" redirects to the Mann–Whitney U test. These two tests are not synonymous, but there is an algebraic identity, shown on the Wikipedia page, that connects the rank sum statistic $R_1$ and the Mann–Whitney $U$ statistic: $$U = n_1n_2 + \frac{n_1(n_1+1)}2 - R_1 .\tag3$$ In other words, $U$ plus $R_1$ equals a constant. You can use (3) together with (1) and (2) to compute $$E(U) = \frac{n_1n_2}2,\qquad\operatorname{Var}(U)=\operatorname{Var}(R_1)=\frac{n_1n_2(n_1+n_2+1)}{12},$$ which confirms the result given on the Wikipedia page.
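
For a quick numerical check of (3), the following sketch compares the value obtained from the identity with the $U$ statistic that SciPy reports. This assumes SciPy ≥ 1.7, where `mannwhitneyu` returns the $U$ of its first argument; since (3) gives $n_1n_2$ minus the $U$ of the "u" sample, it is the $U$ of the "d" sample, so "d" is passed first.

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

# Same made-up samples as in the first sketch
u = np.array([3.1, 4.7, 5.2, 6.0, 2.9])
d = np.array([4.1, 5.5, 6.3, 7.2, 6.8, 5.9])
n1, n2 = len(u), len(d)

# Rank sum of the "u" sample within the pooled list
R1 = rankdata(np.concatenate([u, d]))[:n1].sum()

# U computed from identity (3)
U_from_R1 = n1 * n2 + n1 * (n1 + 1) / 2 - R1

# U reported by SciPy for its first argument -- here the "d" sample,
# which is the side that identity (3) corresponds to
U_scipy = mannwhitneyu(d, u).statistic

print(U_from_R1, U_scipy)   # both should print 25.0
```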

grand_chat
  • I see, but I don't fully understand yet. It is now obvious that the main source of my confusion was mixing up the sum of ranks with the other approach, counting for each u how many d's follow it. – Gyro Gearloose Aug 31 '24 at 16:31
  • By now I have figured out the relation between U and R; I didn't find a proof or a direct statement, but I proved it myself. Sorry my question was not straightforward. – Gyro Gearloose Sep 05 '24 at 19:03