
I've implemented an algorithm that, when analyzed, should run in $O(n \log n)$ time.

However, when plotting the computation time against the cardinality of the input set, it looks somewhat linear, and computing $R^2$ more or less confirms this. To sanity-check myself, I plotted $n$ on the $x$-axis and $n \log_2 n$ on the $y$-axis in Python, and that also looked linear. Computing $R^2$ (with scipy.stats.linregress) confuses me further, as I get $R^2 = 0.9995811978450471$ when my $x$ and $y$ data are created as follows:

import math

x, y = [], []
for n in range(2, 10000000):
    x.append(n)
    y.append(n * math.log2(n))
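
For reference, the $R^2$ I mention comes from a call like this (a sketch; result is just my local name for the value linregress returns):

from scipy.stats import linregress

result = linregress(x, y)
print(result.rvalue ** 2)  # squared correlation coefficient; prints ~0.99958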

Am I missing something fundamental? Am I using too few iterations for it to matter? Looking at the graph at http://bigocheatsheet.com/, the $n \log n$ curve does not seem linear at all.

Andreas V.

1 Answer


Just some general observations.

  • $O(n \log n)$ is only an upper bound. If it's not tight, that's your explanation right there.
  • A $\Theta(n \log n)$ running time can have many different components, for instance

    $\qquad\displaystyle a \cdot n\log n + b \cdot n \log \log n + c \cdot \sqrt n + d \cdot n + e \cdot \log n + f.$

    While asymptotically the linearithmic term dominates, if $a$ is small compared to the other coefficients, you will have a hard time detecting it (see the first sketch after this list).

  • Measuring wall-clock running time is noisy without end, in particular because the coefficients mentioned above get skewed by platform details. Try investigating counts instead, for instance of a dominant operation or block (second sketch below).
  • Linear regression always "works", in the sense that you always get a line and an $R^2$ out of it. Since the "difference" between $n \log n$ and $n$ is rather small (also considering the above point), it's not surprising you'd get a high confidence. Run linearithmic regression and compare (third sketch below)!
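
To make the second point concrete, here is a toy computation; the coefficients a and d are made up:

import math

# Hypothetical coefficients: tiny linearithmic term, large linear term.
a, d = 0.01, 10.0

for n in (10**3, 10**6, 10**9):
    print(n, a * n * math.log2(n), d * n)

# Even at n = 10^9, a*n*log2(n) is only about 3% of d*n:
# the asymptotically dominant term is practically invisible.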
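
For the counting suggestion, a minimal sketch; merge sort here is just a stand-in for whatever your algorithm is:

import random

def merge_sort(xs, counter):
    # Sorts xs and counts comparisons (the dominant operation) in counter[0].
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one comparison
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

for n in (10**3, 10**4, 10**5):
    counter = [0]
    merge_sort([random.random() for _ in range(n)], counter)
    print(n, counter[0])  # grows like n*log2(n), free of clock noise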
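
And linearithmic regression can be an ordinary least-squares fit against $n \log_2 n$ instead of $n$; compare the two fits. Here $y$ is synthetic (exactly $n \log_2 n$), so substitute your measurements:

import numpy as np

n = np.arange(2, 100000, dtype=float)
y = n * np.log2(n)  # replace with your measured running times

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

A_lin = np.column_stack([n, np.ones_like(n)])               # model y ~ a*n + b
A_log = np.column_stack([n * np.log2(n), np.ones_like(n)])  # model y ~ a*n*log2(n) + b

coef_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
coef_log, *_ = np.linalg.lstsq(A_log, y, rcond=None)

print("linear       R^2:", r_squared(y, A_lin @ coef_lin))  # very high, ~0.999
print("linearithmic R^2:", r_squared(y, A_log @ coef_log))  # 1.0 on this synthetic data
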
Raphael