I know the formula for finding the median of grouped data, $$\mathrm{Median} = L_m + \left [ \frac { \frac{n}{2} - F_{m-1} }{f_m} \right ] \times c$$ and I know what all the letters stand for. But can anyone provide a derivation of it? I am very curious about where it comes from.
-
@Shahab thank you for putting a bounty on such an old question – Shivam Patel Jul 22 '14 at 17:12
-
"I know what all the letters stand for." Good for you! I don't. Why don't you tell us what they stand for? – Jul 28 '14 at 04:27
-
@Rahul: $L_m$ is the lower limit of the median class, $n$ is the total number of observations, $F_{m-1}$ is the cumulative frequency of the class preceding the median class, $f_m$ is the frequency of the median class, $c$ is the class width. – Jul 29 '14 at 10:10
-
https://www.themathdoctors.org/finding-the-median-of-grouped-data/ – Number945 Aug 11 '20 at 15:28
2 Answers
This formula is the result of a linear interpolation, which identifies the median under the assumption that data are uniformly distributed within the median class.
To derive the formula, note that $N/2$ is the number of observations below the median, so $N/2 - F_{m-1}$ is the number of observations that lie within the median class and below the median ($F_{m-1}$ is the cumulative frequency of the class preceding the median class, i.e. the total frequency of all classes below the median class).
As a result, the fraction $\displaystyle\frac {N/2 - F_{m-1}}{f_m}$ (where $f_m$ is the frequency of the median class) represents the proportion of data values in the median class that are below the median.
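Written as a single proportion, the uniformity assumption described in the next paragraph says that the share of the class width lying below the median equals the share of the class frequency lying below it, which gives the formula in one step:

$$\frac{\mathrm{Median} - L_m}{c} = \frac{\frac{N}{2} - F_{m-1}}{f_m} \quad\Longrightarrow\quad \mathrm{Median} = L_m + \left[ \frac{\frac{N}{2} - F_{m-1}}{f_m} \right] \cdot c$$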
Now if we assume that the data are uniformly distributed (i.e., equally spaced) within the median class, multiplying this fraction by $c$ (the total width of the median class) gives the distance from the lower limit of the median class to the median, i.e. the part of the class width that lies below the median. Adding the result to $L_m$ (the lower limit of the median class), we get the final formula $\displaystyle L_m + \left [ \frac { \frac{N}{2} - F_{m-1} }{f_m} \right ] \cdot c$, which identifies the median.
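The steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `grouped_median`, the input format (a list of contiguous `(lower_limit, upper_limit, frequency)` tuples sorted by lower limit) and the example table are all hypothetical choices made for the sketch.

```python
from typing import Sequence, Tuple


def grouped_median(classes: Sequence[Tuple[float, float, int]]) -> float:
    """Estimate the median of grouped data by linear interpolation.

    `classes` holds (lower_limit, upper_limit, frequency) tuples, sorted and
    contiguous (an assumed input format for this sketch).
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("frequency table is empty")
    half = n / 2
    cum_before = 0  # F_{m-1}: cumulative frequency of the classes below
    for lower, upper, f in classes:
        if cum_before + f >= half:
            # First class whose cumulative frequency reaches N/2: the median
            # class, with L_m = lower, f_m = f and c = upper - lower.
            c = upper - lower
            return lower + (half - cum_before) / f * c
        cum_before += f
    raise AssertionError("unreachable: cumulative frequency always reaches N/2")


# Hypothetical table: classes 0-10, ..., 40-50 with N = 35 observations.
table = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 7), (40, 50, 3)]
print(grouped_median(table))  # 23.75
```

Here the cumulative frequencies are 5, 13, 25, 32, 35, so the median class is 20-30 and the estimate is $20 + \frac{17.5 - 13}{12} \cdot 10 = 23.75$.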
-
But why do we find the $N/2$-th observation? How will it be the 50th percentile then? Why not find the $(N+1)/2$-th observation, as is done in the case of an individual series? – Shub Aug 10 '20 at 05:41
-
This method does not give the exact median, only an approximation. Accordingly, it is used when complete data are not available (e.g. when you only have a grouped frequency table). It is based on the assumption that the data are uniformly distributed within the median class, so the estimate is obtained by interpolation. Using $N/2$ in the formula gives the point where we expect the median value to lie under this assumption, irrespective of whether $N$ is even or odd. – Anatoly Aug 11 '20 at 17:38
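One way to see why $N/2$ and $(N+1)/2$ are not in conflict is to make "equally spaced" concrete. In this sketch we assume (our own modeling choice, not something stated in the thread) that the $f$ values of a class of width $c$ sit at the midpoints of $f$ equal sub-intervals, so the $j$-th value of the class is $L + (j - \tfrac12)\,c/f$. The $(N+1)/2$-th observation overall is then the $j = (N+1)/2 - F$-th value of the median class, at $L + (N/2 - F)\,c/f$, which is exactly the grouped formula. The hypothetical helpers below compare the formula with the ordinary median of the spread-out values.

```python
import statistics
from typing import List, Sequence, Tuple

Class = Tuple[float, float, int]  # (lower limit, upper limit, frequency)


def grouped_median(classes: Sequence[Class]) -> float:
    """Interpolation formula using N/2."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("frequency table is empty")
    cum_before = 0
    for lower, upper, f in classes:
        if cum_before + f >= n / 2:
            return lower + (n / 2 - cum_before) / f * (upper - lower)
        cum_before += f
    raise AssertionError("unreachable")


def spread_evenly(classes: Sequence[Class]) -> List[float]:
    """Hypothetical helper: place the f values of each class at the
    midpoints of f equal sub-intervals of that class."""
    values: List[float] = []
    for lower, upper, f in classes:
        c = upper - lower
        values.extend(lower + (i + 0.5) * c / f for i in range(f))
    return values


odd = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 7), (40, 50, 3)]  # N = 35
even = [(0, 10, 2), (10, 20, 4), (20, 30, 2)]                          # N = 8
for table in (odd, even):
    # statistics.median applies the usual (N+1)/2 rule (or averages the two
    # middle values when N is even) to the spread-out data.
    print(grouped_median(table), statistics.median(spread_evenly(table)))
# 23.75 23.75
# 15.0 15.0
```

Under this placement the two agree exactly for odd $N$, and for even $N$ whenever the $N/2$-th and $(N/2+1)$-th values fall in the same class; with other placements (or real, non-uniform data) the formula remains an approximation.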
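Suppose you want the share of some total value that corresponds to $4$ boys out of $10$ children: you compute $\frac{4}{10}$ multiplied by the total value.
Similarly, we write the median as $l + x$, where $l$ is the lower limit of the median class and $x$ is the share of the class width $c$ that corresponds to the observations of the median class lying below the median.
Thus $x = \frac{1}{f}\left(\frac{n}{2} - \mathrm{cf}\right) \times c$, where $f$ is the frequency of the median class and $\mathrm{cf}$ is the cumulative frequency of the class preceding it.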
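As a quick numeric check of this recipe, take a hypothetical table with classes 0-10, 10-20, 20-30, 30-40, 40-50 and frequencies 5, 8, 12, 7, 3, so $n = 35$ and the cumulative frequencies are 5, 13, 25, 32, 35. The first cumulative frequency to reach $n/2 = 17.5$ is 25, so the median class is 20-30, with $l = 20$, $f = 12$, $c = 10$ and a preceding cumulative frequency of 13:

$$x = \frac{1}{12}\left(\frac{35}{2} - 13\right) \times 10 = \frac{4.5}{12} \times 10 = 3.75, \qquad \text{median} = l + x = 20 + 3.75 = 23.75$$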