So I understand that a quartile is a quantile where the data is divided into four groups.

   1   2   3
---|---|---|---

And 1, 2, and 3 are the quartiles. The second quartile is the median, etc.

But while studying for the GRE, I read this as part of the answer solution to a question:

From this you can conclude that the word quartile refers to one of the four groups that is created by listing the data in increasing order and then dividing the data into four groups of equal size.

Which indicates that these are the quartiles:

 1   2   3   4
---|---|---|---

So which is it? Are there 3 quartiles or 4?

On the bounty: I have sufficient evidence that the term is overloaded, so any answer that glosses over that won't be accepted. I'd like an explanation of how to use the terms and when.

MJD
jds
  • You have four quartiles. The first quartile represents 25% of the distribution. The second quartile represents 50% of the distribution. The third quartile represents 75% of the distribution. The fourth quartile represents 100% of the distribution. I would display it like this (with two vertical lines at the beginning and at the end): $$\begin{array}{|c|c|c|c|} 1 & 2 & 3 & 4 \\ \hline 1,1,2 & 2,3,4 & 4,5,6 & 7,9,9 \end{array}$$ – callculus42 Sep 03 '15 at 12:33
  • Why does Wikipedia only list 3 quartiles then? https://en.wikipedia.org/wiki/Quartile#Definitions. – jds Sep 03 '15 at 13:20
  • Maybe because it is obvious that $Q_4$ is the highest value. In my example $Q_3$ is $6+(7-6)\cdot \frac{3}{4}=6.75$. The values in the fourth quartile are between 6.75 and 9. The highest 25% of the values are between 6.75 and 9. – callculus42 Sep 03 '15 at 14:14
  • I don't think so. Throughout the entire page, it only refers to 3 quartiles. For example, in the introductory text: "The first quartile (Q1) is defined as the middle number between the smallest number and the median of the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the middle value between the median and the highest value of the data set." No mention of Q4. And in methods, it only gives methods of computing three quartiles: https://en.wikipedia.org/wiki/Quartile#Computing_methods. But then Wolfram Math refers to 4 quartiles. – jds Sep 03 '15 at 17:23
  • It is obvious that $Q_4$ for the normal distribution doesn't exist. But in my numerical example 9 could be denoted as $Q_4$, but it doesn't have to. – callculus42 Sep 03 '15 at 19:03
  • Stop saying things are obvious. – jds Sep 04 '15 at 13:16
  • @gwg When a mathematician says something is obvious, it isn't a personal attack or insult. It simply means that the claim is not resting on any special tricks, or hidden knowledge. That is to say, you do not need anything more than what you currently have to understand what is being said. – A. Thomas Yerger Oct 07 '15 at 02:05
  • In this case, I think, some of the "obvious" claims are not actually obvious. – David K Oct 07 '15 at 02:10
  • You need to read the first two paragraphs of the Wikipedia link you provided. Quartiles, depending on the context, refer either to FOUR separate sets OR to THREE points that separate these sets. Personally, I never even realized that there are ever only three quartiles, even though it's obvious. – A.S. Oct 07 '15 at 02:28
  • There are four quartiles: the word itself tells you this. – Mariano Suárez-Álvarez Oct 07 '15 at 02:55

3 Answers

If you look in a non-mathematical dictionary, you will often find both definitions. For example, http://www.oxforddictionaries.com/us/definition/american_english/quartile defines quartile as

1 Each of four equal groups into which a population can be divided according to the distribution of values of a particular variable.

1.1 Each of the three values of the random variable that divide a population into four groups.

It is possible to find examples where the first definition is used. In Digest of Education Statistics 1999, edited by Thomas D. Snyder, Table 143 on page 157 has four columns under the heading "Socioeconomic status quartile", labeled Lowest, Second, Third, and Highest. Moreover, in footnote 1 of Table 144, we find the passage

The "Low" SES group is the lowest quartile; the "Middle" SES group is the middle two quartiles; and the "High" SES group is the upper quartile.

So a "quartile" in this context is a subset of the sample to which an individual belongs.

The Wikipedia article on quartile cites only one reference, the article "Sample quantiles in statistical packages", which, as the title suggests, is all about computing numbers to describe quantiles, in particular, the return value of the R function quantile(). The article therefore is mainly (exclusively?) concerned with the correct way to compute the numerical values that divide the data into quartiles (or other quantiles). But if you go to other sources such as the NIST/SEMATECH e-Handbook of Statistical Methods, you will find passages such as

The box plot uses the median and the lower and upper quartiles (defined as the 25th and 75th percentiles). If the lower quartile is Q1 and the upper quartile is Q3, then the difference (Q3 - Q1) is called the interquartile range or IQ.

Here, clearly each quartile is a number: the lower quartile is not bounded by Q1; it is Q1 in this context, which is a number that can be subtracted from another number.

My attempts to search for "quartile" on the Web seem to dredge up many more examples of the "number" usage than of the "subset" usage. I can guess a few reasons for this, though I have not found much other discussion of it:

  1. Unless the number of observations in your sample is divisible by $4$, you will not be able to separate the sample into four equal parts by rank.

  2. Much of statistics has the goal of describing data succinctly, for example by a mean and standard deviation. The four lists of members of each of four equal (or nearly-equal) subsets of a large sample do not constitute a succinct description; in some cases this can be almost as verbose as the entire data set. On the other hand, it requires just three numbers to describe the boundaries between these subsets of the data, hence those three numbers appear frequently in the literature.

  3. There are several competing ways to compute the values that should serve as the "dividing lines" between the four (not necessarily exactly equal) ranked subsets of the data. This leads to a great deal written about "quartiles" using the "number" definition.
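As a small illustration of the third point, here is a sketch using Python's standard `statistics.quantiles` function on the twelve values from callculus42's comment under the question. The choice of Python and of these two particular methods is mine, not something taken from the sources quoted above; other packages offer further variants.

```python
import statistics

# The twelve values from the comment under the question.
data = [1, 1, 2, 2, 3, 4, 4, 5, 6, 7, 9, 9]

# "exclusive" places the cut points at positions p * (n + 1) in the ranked data;
# "inclusive" places them at p * (n - 1), counting from the minimum.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(exclusive)  # [2.0, 4.0, 6.75]
print(inclusive)  # [2.0, 4.0, 6.25]
```

Both calls return three numbers, and here only the third one differs between the methods. The "exclusive" value of 6.75 agrees with the $Q_3$ computed by hand in the comments.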

But notice that in the quoted passages from the Digest and Handbook, above, there is no ambiguity whatsoever about which meaning of "quartile" is intended. If a particular use of the word could possibly be ambiguous, one can first use the word in an unambiguous context to establish its meaning, or one can simply define it.

David K
The word quartile refers to both the four partitions (or quarters) of the data set, and to the three points that mark these divisions. After all, we can't have one without the other.

When citing a value for a quartile, though, we are specifically referring to the three dividing points, else it'd be meaningless. Thus, the first, second, and third quartiles have a specific value in a data set. These points are often referred to as the lower, middle, and upper quartile.

On the other hand, we can say that there are multiple data points contained in the first, second, third, and fourth quartiles. In this context, we refer to the actual partition.

It all depends on context. The word is malleable, but the intent ought to be clear when used properly.
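To make the two senses concrete, here is a minimal sketch in Python (my choice of language) using the twelve values from the comments under the question. The names `cut_points` and `groups` are just illustrative, and the even split by rank only works because the sample size happens to be divisible by 4.

```python
import statistics

data = sorted([1, 1, 2, 2, 3, 4, 4, 5, 6, 7, 9, 9])

# Sense 1: the quartiles as three numbers that mark the divisions.
cut_points = statistics.quantiles(data, n=4)

# Sense 2: the quartiles as four groups of equal size, taken by rank.
size = len(data) // 4
groups = [data[i * size:(i + 1) * size] for i in range(4)]

print(len(cut_points), "cut points:", cut_points)
for number, group in enumerate(groups, start=1):
    print("quartile", number, "contains", group)
```

Note that tied values such as the two 2s land in different groups when the split is done by rank, so the four groups are not always recoverable from the three cut points alone.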

zahbaz
Note: This is really just a long comment - but maybe it's helpful.

This seems to me to be purely a matter of context. I have never seen anything like this before, with only 3 quartiles - it's written into the word itself that there should be 4 (QUART-iles). That said, this kind of thing happens in mathematics relatively often - there will be multiple uses of the same word or piece of notation. Many times the overlap stems from two different situations with a similar defining characteristic, or with similar associated mental pictures/ideas. Likewise in this case, both meanings are highly similar.

My advice for when doing mathematics is to use whichever one you feel is most convenient or applicable, and if need be, leave a remark or something explaining the convention you chose.

As a prospective graduate student who is also studying for the GRE (both general and mathematics), I can say that I have never seen an ambiguous practice question. Although I can't find any statement from the test-makers that fixes a single definition, the questions I have seen are of the form above. Even if they aren't always that way, I can assure you that there will never be ambiguity on the exam. That is to say, if you compute the answer under one definition and it is not among the choices, the answer under the other definition will be, and vice versa. The two will never both be listed as correct answers without clarification.

A. Thomas Yerger