3

For our purposes we need to find at least one case in which a 50th percentile and the median aren't the same thing. I will present three such cases.

The first case (proof) goes like this: Suppose we have the set of numbers {1,2,3}. Its median is 2. But 2 is NOT a 50th percentile, because 2 isn't a percentile at all; it's a tertile. It has 1/3 of the data points below it and 2/3 of the data points below or equal to it.

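The second proof: Suppose we have the set of numbers {1,2,3,4,5}. Its median is 3. But 3 is NOT a 50th percentile. Under the inclusive definition of a percentile (the share of data points less than or equal to the value) it's a 60th percentile, while under the exclusive definition (the share of data points strictly less than the value) it's a 40th percentile.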
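The arithmetic in the first two cases can be checked with a small sketch. The helper name `percentile_rank` is hypothetical, and "inclusive" and "exclusive" are taken here to mean "share of values less than or equal to x" and "share of values strictly less than x"; other sources may define them differently.

```python
def percentile_rank(data, x, inclusive=True):
    """Percentage of data points <= x (inclusive) or < x (exclusive).

    Hypothetical helper; this reading of 'inclusive'/'exclusive' is an assumption.
    """
    if inclusive:
        count = sum(1 for v in data if v <= x)
    else:
        count = sum(1 for v in data if v < x)
    return 100 * count / len(data)


# Second case: the median 3 of {1,2,3,4,5}
rank_incl = percentile_rank([1, 2, 3, 4, 5], 3, inclusive=True)
rank_excl = percentile_rank([1, 2, 3, 4, 5], 3, inclusive=False)
print(rank_incl, rank_excl)

# First case: the median 2 of {1,2,3} sits at thirds, not at a whole percent
print(percentile_rank([1, 2, 3], 2, inclusive=True))
print(percentile_rank([1, 2, 3], 2, inclusive=False))
```

Under these assumed definitions the median of {1,2,3,4,5} gets rank 60 or 40, never 50.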

The third proof: Suppose we have the set of numbers {0,10}. Its median is 5, and 5 is also a 50th percentile. But there are other 50th percentiles that aren't equal to 5 (and thus aren't medians), namely any number in the interval [0;10) for the inclusive definition of a percentile and any number in the interval (0;10] for the exclusive definition.
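A quick sketch of the third case, under the same assumed reading of the two definitions (inclusive counts values less than or equal to x, exclusive counts values strictly less than x). The helper `fraction_at_or_below` and the list of candidate values are made up for illustration.

```python
def fraction_at_or_below(data, x, inclusive):
    """Share of data points <= x (inclusive) or < x (exclusive)."""
    if inclusive:
        count = sum(1 for v in data if v <= x)
    else:
        count = sum(1 for v in data if v < x)
    return count / len(data)


data = [0, 10]
candidates = [0, 2, 5, 10]  # arbitrary probe values, including both endpoints

# Keep every candidate that has exactly 50% of the data at or below it.
inclusive_hits = [x for x in candidates if fraction_at_or_below(data, x, True) == 0.5]
exclusive_hits = [x for x in candidates if fraction_at_or_below(data, x, False) == 0.5]

print(inclusive_hits)  # 0, 2 and 5 qualify; 10 does not
print(exclusive_hits)  # 2, 5 and 10 qualify; 0 does not
```

The endpoints behave exactly as the half-open intervals suggest: 0 counts only under the inclusive reading and 10 only under the exclusive one.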

KarmaPeasant
  • Your third example is wrong. –  Jun 02 '19 at 14:35
  • @YvesDaoust Can you elaborate? – KarmaPeasant Jun 02 '19 at 14:41
  • How could, say $2$ be a $50^{th}$ percentile ? Or a median ? –  Jun 02 '19 at 14:43
  • It's a 50th percentile because 50% of data points are less than 2. – KarmaPeasant Jun 02 '19 at 14:44
  • OK, I read it as $\{0,\cdots,10\}$. In case of ties, all intermediate values are quantiles, and it is customary to take the mean, here $5$. –  Jun 02 '19 at 14:48
  • @YvesDaoust Can you please elaborate? I don't see how it proves that 2 isn't 50th percentile. I mean I totally believe that 5 is 50th percentile, but 2 is 50th percentile too. Besides, I don't see any ties here. And to be honest, your comment doesn't make sense to me. – KarmaPeasant Jun 02 '19 at 14:51
  • Can you read ? "all intermediate values are quantiles". –  Jun 02 '19 at 14:54
  • I can read. This combination of words just doesn't make sense to me. I don't know about the rule that you refer to. If you gave elaborate example, then possibly I would understand. – KarmaPeasant Jun 02 '19 at 14:55

2 Answers

0

For a correct definition of the quantiles, you must use the (empirical) $\text{cdf}$ of the distribution, i.e.

$$\text{cdf}_X(x)=\frac{\#\{k:x_k\le x\}}n.$$

When $X=\{1,2,3\}$, the $50\%$ percentile is $2$, because $\dfrac{\#\{1\}}3\le50\%$ while $\dfrac{\#\{1,2\}}3>50\%$: the "jump" occurs at $2$, and this is where the $\text{cdf}$ meets the $50\%$ horizontal.

If you don't adopt such a definition, neither the centiles nor the median would exist!

The median, the central quartile and the $50\%$ centile are exact synonyms.
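As a rough sketch of the cdf-based definition above: the functions `ecdf` and `quantile_from_ecdf` are hypothetical names, and returning the smallest data value at which the cdf reaches the level $p$ is one possible convention that I am assuming here, not something the answer spells out.

```python
def ecdf(data, x):
    """Empirical cdf: share of data points less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)


def quantile_from_ecdf(data, p):
    """Smallest data value whose empirical cdf reaches p (assumed convention)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    for v in sorted(data):
        if ecdf(data, v) >= p:
            return v


X = [1, 2, 3]
print(ecdf(X, 1))                  # 1/3, still below 50%
print(ecdf(X, 2))                  # 2/3, above 50%
print(quantile_from_ecdf(X, 0.5))  # the jump over 50% happens at 2
```

When the cdf hits the level exactly on a flat stretch, as it does for {0,10} at 50%, this convention returns the left end of the stretch; taking the midpoint instead is another common choice.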

  • Firstly, why do you think that your CDF is the only right method to find percentiles (and quantiles in general)? Secondly, if they are exact synonyms, then how do you explain me getting 50th percentiles different from the median when using the calculator? – KarmaPeasant Jun 02 '19 at 14:40
  • @user161005: because it is. –  Jun 02 '19 at 14:40
  • Amen. How about my second question? – KarmaPeasant Jun 02 '19 at 14:42
  • By second question I meant "if they are exact synonyms, then how do you explain me getting 50th percentiles different from the median when using the calculator?" – KarmaPeasant Jun 02 '19 at 14:54
-1

It turns out that there are at least 8 ways to calculate a percentile (probably even more).

I found this calculator that calculates percentiles in 8 different ways for the same dataset: https://www.wessa.net/rwasp_percentiles.wasp

I was unable to run the calculations for {1,2,3} and {0,10}, probably because these sets are too small. But I was able to run them for similar sets, {1,2,3,4,5,6,7,8,9} and {0,1,2,3,4,5,6,7,8,9} respectively.

It turns out that I'm both right and wrong in my examples. For some methods the 50th percentile is indeed different from the median, while for others it is equal to the median.

Let's review the results. For an analogue of the first case, the set {1,2,3,4,5,6,7,8,9}, I got the following results (the median is 5): (screenshot of the calculator output)

It seems that the methods used to calculate the 50th percentile are based on definitions of percentiles that allow percentiles to exist even for a set such as {1,2,3,4,5,6,7,8,9}. I have no idea what these definitions are, though.

Now the second case, the set {1,2,3,4,5}, where the median is 3: (screenshot of the calculator output) Only one method gave a 50th percentile different from the median.

Now the third case, the set {0,1,2,3,4,5,6,7,8,9}, where the median is 4.5: (screenshot of the calculator output) Three methods gave a result different from the median.

So, to summarize: Some methods can sometimes give a 50th percentile that is different from the median. Some methods never give a 50th percentile that is different from the median. Some methods (I'm looking at you, "Weighted Average at Xnp"!) gave a 50th percentile different from the median in all three of my tests. But even when the 50th percentile is different from the median, the difference is relatively small. Probably the difference can be ignored altogether.
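To see how a method can land off the median, here is a sketch of two interpolation rules. The formulas are my reading of the method names "Weighted Average at Xnp" and "Weighted Average at X(n+1)p" (interpolate between the order statistics around position $np$, or around $(n+1)p$, counting from 1); I have not checked them against the calculator's actual code, so treat them as assumptions. The function names are made up.

```python
import math
from statistics import median


def _interpolate_at(xs, pos):
    """Linear interpolation between order statistics at 1-based position pos."""
    n = len(xs)
    i = math.floor(pos)
    f = pos - i
    if i < 1:   # position before the first order statistic
        return xs[0]
    if i >= n:  # position at or past the last order statistic
        return xs[-1]
    return xs[i - 1] + f * (xs[i] - xs[i - 1])


def weighted_average_at_np(data, p):
    """Assumed rule: interpolate at position n*p."""
    xs = sorted(data)
    return _interpolate_at(xs, len(xs) * p)


def weighted_average_at_n_plus_1_p(data, p):
    """Assumed rule: interpolate at position (n+1)*p."""
    xs = sorted(data)
    return _interpolate_at(xs, (len(xs) + 1) * p)


for data in (list(range(1, 10)), [1, 2, 3, 4, 5], list(range(10))):
    print(median(data),
          weighted_average_at_np(data, 0.5),
          weighted_average_at_n_plus_1_p(data, 0.5))
```

With these assumed formulas, the $np$ rule gives a value half a step below the median on all three sets, while the $(n+1)p$ rule reproduces the median each time.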

KarmaPeasant
  • These alternative percentiles do not appear to be percentiles (even though on small datasets there are coincidences). –  Jun 02 '19 at 14:39
  • @YvesDaoust To what do you refer by "alternative percentiles"? And why do you doubt that they are really percentiles? – KarmaPeasant Jun 02 '19 at 14:57