3

Short version: I want to know where the $-2$ comes from in the formula on p. 221 of CLRS 3rd edition.

Long version: CLRS (3rd ed.) give an algorithm for $O(n)$ worst case arbitrary order statistic of $n$ distinct numbers. The algorithm is roughly:

Input: an array of $n$ elements and $i$, the number of the order statistic to return from the elements.

  1. Divide the $n$ elements into $\lfloor n/5 \rfloor$ groups of 5 elements each along with an optional group containing $n\mod{5}$ elements (resulting in $\lceil n/5 \rceil$ groups.)
  2. Find the median of each of the groups by sorting.
  3. Recurse, using the $\lceil n/5 \rceil$ medians as the array and $\lfloor\lceil n/5 \rceil/2\rfloor$ as the order statistic, resulting in the median-of-medians.
  4. Partition the $n$ elements around the median-of-medians (using a quicksort-like $O(n)$ partitioning algorithm.
  5. Letting $k-1$ be the number of elements less than the median-of-medians, if $i = k$, return the median-of-medians. Otherwise recurse: if $i < k$ then recurse finding the $i$th order statistic of the $k-1$ elements less than the median-of-medians; if $i > k$, then recurse finding the $i-k$th order statistic of the $n-k$ elements greater than the median-of-medians.

Output: the $i$th order statistic of the $n$ numbers.

In the proof of the runtime, CLRS argue that the number of elements greater than the median-of-medians is at least:

$$ 3 \bigg(\bigg\lceil \frac{1}2 \bigg\lceil{\frac{n}5} \bigg\rceil \bigg\rceil - 2\bigg) $$

The reasoning is that half of the medians are greater than the median-of-medians, and each of those medians' groups has at least three elements greater than the median-of-medians (the median itself plus the two elements greater than the median.) That would result in

$$ 3 \bigg(\bigg\lceil \frac{1}2 \bigg\lceil{\frac{n}5} \bigg\rceil \bigg\rceil\bigg) $$

for the lower bound on the number of elements greater than the median-of-medians.

But we must account for two things: the group containing the median-of-medians (the median-of-medians is not greater than itself) and the group that contains the modulo leftovers. To account for the group containing the median-of-medians, we subtract 1, resulting in:

$$ 3 \bigg(\bigg\lceil \frac{1}2 \bigg\lceil{\frac{n}5} \bigg\rceil \bigg\rceil\bigg) - 1 $$

and I think that for the modulo leftovers group, we should subtract 4, because the least number of elements in the group is 1. So that would give:

$$ 3 \bigg(\bigg\lceil \frac{1}2 \bigg\lceil{\frac{n}5} \bigg\rceil \bigg\rceil\bigg) - 5 $$

which can be transformed into

$$ 3 \bigg(\bigg\lceil \frac{1}2 \bigg\lceil{\frac{n}5} \bigg\rceil \bigg\rceil - 2\bigg) + 1 $$

Why does my analysis lead to a lower-bound 1 greater than that given in CLRS?

Raphael
  • 73,212
  • 30
  • 182
  • 400
Carl G
  • 133
  • 6

1 Answers1

3

In

$$ 3 \bigg(\bigg\lceil \frac{1}2 \bigg\lceil{\frac{n}5} \bigg\rceil \bigg\rceil - 2\bigg) $$

we are subtracting 2 in order to discard the group containing the median of medians and the group of leftovers. So, 2 is the number of groups we are discarding.

First of all, note that there may not be a group of leftover elements: if $n$ is an exact multiple of 5 there will be no leftover group. We are interested in bounding from below the number of elements greater than the median-of-medians in the worst case, so suppose that a leftover group exists. Therefore, it will contain at least an element and no more than 4 elements. If you want to reason in terms of elements to be discarded and not in terms of groups, then we must discard exactly 3 elements for the group containing the median of medians (including the median of the medians), and at most 2 element from the leftovers group (if this group contains 1 or 2 elements you do not discard any element; if this group contains 3 or 4 elements, then you discard respectively 1 or 2 elements which are greater than the group's median). So, you discard in the worst case (leftover group with 4 elements) 3 + 2 elements:

$$ 3 \bigg(\bigg\lceil \frac{1}2 \bigg\lceil{\frac{n}5} \bigg\rceil \bigg\rceil\bigg) - 5 $$

Massimo Cafaro
  • 4,360
  • 19
  • 27