
Sometimes it's easy to identify the time complexity of an algorithm by examining it carefully. Algorithms with two nested loops over $N$ items are obviously $N^2$. Algorithms that explore all the possible combinations of $N$ groups of two values are obviously $2^N$.

However, I don't know how to "spot" an algorithm with $\Theta(N \log N)$ complexity. A recursive mergesort implementation, for example, is one. What are the common characteristics of mergesort or other $\Theta(N \log N)$ algorithms that would give me a clue if I were analyzing one?

I'm sure there is more than one way an algorithm can be of $\Theta(N \log N)$ complexity, so any and all answers are appreciated. BTW I'm seeking general characteristics and tips, not rigorous proofs.

Barry Fruitman

7 Answers


Your archetypical $\Theta(n \log n)$ algorithm is a divide-and-conquer algorithm, which divides (and recombines) the work in linear time and recurses over the pieces. Merge sort works that way: spend $O(n)$ time splitting the input into two roughly equal pieces, recursively sort each piece, and spend $\Theta(n)$ time combining the two sorted halves.
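For concreteness, here is a minimal merge sort sketch in C (the function name merge_sort and the temporary buffer are just illustrative, not from any particular library); the two recursive calls handle the halves and the merge loop is the linear-time recombination:

#include <stdlib.h>
#include <string.h>

/* Sort a[0..n-1]: two recursive calls on the halves, then a linear-time merge. */
void merge_sort(int *a, int n) {
    if (n < 2)
        return;                      /* base case: 0 or 1 elements are already sorted */
    int mid = n / 2;
    merge_sort(a, mid);              /* T(n/2): sort the left half  */
    merge_sort(a + mid, n - mid);    /* T(n/2): sort the right half */

    /* Theta(n) work: merge the two sorted halves through a temporary buffer. */
    int *tmp = malloc(n * sizeof *tmp);
    int i = 0, j = mid, k = 0;
    while (i < mid && j < n)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < n)   tmp[k++] = a[j++];
    memcpy(a, tmp, n * sizeof *a);
    free(tmp);
}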

Intuitively, continuing the divide-and-conquer idea, each division stage takes linear time in total: the time taken to divide a piece is linear in its size, so the increase in the number of pieces exactly offsets the decrease in their size. The total running time is the total cost of one division stage multiplied by the number of division stages. Since the size of the pieces is halved each time, there are $\log_2(n)$ division stages, so the total running time is $n \cdot \log(n)$. (Up to a multiplicative constant, the base of the logarithm is irrelevant.)

Putting it in equations, one way to estimate the running time $T(n)$ of such an algorithm is to express it recursively: $T(n) = 2 T(n/2) + \Theta(n)$. It's clear that this algorithm takes more than linear time, and we can see how much more by dividing by $n$: $$ \frac{T(n)}{n} = \frac{T(n/2)}{n/2} + \Theta(1) $$ When $n$ doubles, $T(n)/n$ increases by a constant amount: $T(n)/n$ grows logarithmically, or in other words, $T(n) = \Theta(n \log n)$.
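The same conclusion also falls out of unrolling the recurrence directly (a standard expansion, assuming for simplicity that $n$ is a power of two and the linear term is at most $cn$): $$ T(n) = 2T(n/2) + cn = 4T(n/4) + 2cn = \dots = 2^k\, T(n/2^k) + kcn. $$ Stopping at $k = \log_2 n$, where the subproblems reach constant size, gives $T(n) = n\,T(1) + cn \log_2 n = \Theta(n \log n)$.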

This is an instance of a more general pattern: the master theorem. For any recursive algorithm that divides its input of size $n$ into $a$ pieces of size $n/b$ and takes a time $f(n)$ to perform the division and recombination, the running time satisfies $T(n) = a \cdot T(n/b) + f(n)$. This leads to a closed form that depends on the values of $a$ and $b$ and the shape of $f$. If $a = b$ and $f(n) = \Theta(n)$, the master theorem states that $T(n) = \Theta(n \log n)$.
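Concretely, the relevant case of the theorem is the one where $f(n) = \Theta(n^{\log_b a})$, which gives $T(n) = \Theta(n^{\log_b a} \log n)$. With $a = b$ we have $\log_b a = 1$, so $f(n) = \Theta(n)$ falls into exactly this case and $T(n) = \Theta(n \log n)$.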

Gilles 'SO- stop being evil'

Two other categories of algorithms that take $\Theta(n \log n)$ time:

Algorithms where each item is processed in turn and it takes logarithmic time to process each item, e.g. HeapSort or many of the plane sweep computational geometry algorithms; a sketch of this pattern is given below.

Algorithms where the running time is dominated by a sorting pre-processing step. (For example, in Kruskal's algorithm for minimum spanning tree, we may sort the edges by weight as the first step).
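As a sketch of the first category (a plain array-based binary max-heap with helper names of my own choosing, not a library API), heapsort performs $\Theta(n)$ heap operations, each costing $O(\log n)$:

#include <stddef.h>

static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Restore the max-heap property by moving h[i] up towards the root: O(log n). */
static void sift_up(int *h, size_t i) {
    while (i > 0 && h[(i - 1) / 2] < h[i]) {
        swap(&h[i], &h[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

/* Restore the max-heap property by moving h[i] down towards the leaves: O(log n). */
static void sift_down(int *h, size_t n, size_t i) {
    for (;;) {
        size_t largest = i, l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && h[l] > h[largest]) largest = l;
        if (r < n && h[r] > h[largest]) largest = r;
        if (largest == i) break;
        swap(&h[i], &h[largest]);
        i = largest;
    }
}

/* Sort a[0..n-1] in place: n insertions plus n extractions, each O(log n). */
void heap_sort(int *a, size_t n) {
    for (size_t i = 1; i < n; i++)          /* grow the heap one item at a time */
        sift_up(a, i);
    for (size_t end = n; end > 1; end--) {  /* repeatedly move the max to the end */
        swap(&a[0], &a[end - 1]);
        sift_down(a, end - 1, 0);
    }
}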

Joe

Another category: Algorithms in which the output has size $\Theta(n \log n)$, and therefore $\Theta(n \log n)$ running time is linear in the output size.

Although the details of such algorithms often use divide-and-conquer techniques, they don't necessarily have to. The run-time fundamentally comes from the question being asked, and so I think it is worth mentioning separately.

This comes up in data structures based on an augmented binary search tree, where each node stores a secondary data structure, of size linear in the number of leaves in that node's subtree, used to search over those leaves. Such data structures come up often in geometric range searching and are often based on a decomposition scheme. See Agarwal's survey.

For a concrete example, consider the range tree, built to answer two-dimensional orthogonal range queries. Although the space was later reduced using compression techniques that pack multiple objects into a single word, the textbook (and most intuitive) version of the data structure requires $O(n \log n)$ space (each leaf is stored in an auxiliary structure at every node on the path from the leaf to the root, i.e. in $O(\log n)$ places), and the construction algorithm takes time linear in the space requirement.
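To make the space bound concrete, here is a simplified construction sketch in C (my own simplification: each node's auxiliary structure is a plain sorted array of $y$-values rather than a search tree, and the points are assumed to be pre-sorted by $x$). Every leaf value is copied into the array of each of its $O(\log n)$ ancestors, and each level is built by linear-time merges, so both the space and the construction time are $\Theta(n \log n)$:

#include <stdlib.h>

/* Node of a simplified range tree: a balanced tree over points sorted by x,
 * where each node stores the sorted y-values of all leaves in its subtree. */
struct node {
    struct node *left, *right;
    double *ys;    /* sorted y-values of the leaves below this node */
    int count;     /* number of entries in ys */
};

/* Build the tree over ys_by_x[lo..hi-1] (y-values of points already sorted by x). */
struct node *build(const double *ys_by_x, int lo, int hi) {
    struct node *v = malloc(sizeof *v);
    v->count = hi - lo;
    v->ys = malloc(v->count * sizeof *v->ys);
    if (v->count == 1) {
        v->left = v->right = NULL;
        v->ys[0] = ys_by_x[lo];
        return v;
    }
    int mid = lo + (hi - lo) / 2;
    v->left = build(ys_by_x, lo, mid);
    v->right = build(ys_by_x, mid, hi);
    /* Linear-time merge of the children's sorted arrays, as in merge sort. */
    int i = 0, j = 0, k = 0;
    while (i < v->left->count && j < v->right->count)
        v->ys[k++] = (v->left->ys[i] <= v->right->ys[j])
                         ? v->left->ys[i++] : v->right->ys[j++];
    while (i < v->left->count)  v->ys[k++] = v->left->ys[i++];
    while (j < v->right->count) v->ys[k++] = v->right->ys[j++];
    return v;
}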

Joe

A complexity of $O(n \log n)$ arises from divide-and-conquer algorithms which divide their input into $k$ pieces of roughly equal size in time $O(n)$, operate on all of these pieces recursively, and then combine them in time $O(n)$. If you only recurse on some of the pieces, the running time drops to $O(n)$.
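For instance, with linear divide/combine cost, recursing on just one of the two halves gives $$ T(n) = T(n/2) + cn \le cn\left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) = 2cn = O(n), $$ whereas recursing on both halves gives the $\Theta(n \log n)$ recurrence discussed above.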

Yuval Filmus

Those are typically algorithms of the "divide and conquer" variety, where the cost of dividing and combining subsolutions isn't "too large". Take a look at this FAQ to see what kinds of recurrences give rise to this behaviour.

vonbrand

Typically, divide-and-conquer algorithms will yield $O(N \log N)$ complexity.

Brent Hronik

A generic example of a loop (not an algorithm per se) that runs in $O(n \log n)$ is the following:

// Outer loop runs Theta(log n) times (i doubles each iteration),
// inner loop runs Theta(n) times, so the total work is Theta(n log n).
for (int i = 1; i < n; i = i * 2) {
    for (int j = 0; j < n; j = j + 1) {
        // Do some O(1) stuff on the input
    }
}

// Alternative variant: outer loop Theta(n) times, inner loop Theta(log n).
for (int i = 0; i < n; i = i + 1) {
    for (int j = n; j > 1; j = j / 2) {
        // Do some O(1) stuff on the input
    }
}

(Note that these nested loops are not $O(n^2)$, since only one of the two loops runs $\Theta(n)$ times. Also note that this is neither divide-and-conquer nor recursive.)

I can't give a concrete example of an algorithm that uses this loop, but it comes up often when coding custom algorithms.