
I've implemented an algorithm that, when analyzed, should run in $O(n \log n)$ time.

However, when plotting the computation time against the cardinality of the input set, it looks somewhat linear, and computing $R^2$ more or less confirms this. To sanity-check myself, I plotted $n$ on the $x$-axis and $n \log_2 n$ on the $y$-axis in Python, and that also looked linear. Computing $R^2$ (with scipy.stats.linregress) confuses me further, as I get $R^2 = 0.9995811978450471$ when my $x$ and $y$ data are created as follows:

import math

x, y = [], []
for n in range(2, 10000000):
    x.append(n)
    y.append(n * math.log2(n))
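
For reference, the $R^2$ I mention comes from a call like this (a sketch; result is just my local name for the value linregress returns):

from scipy.stats import linregress

result = linregress(x, y)
print(result.rvalue ** 2)  # squared correlation coefficient; prints ~0.99958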

Am I missing something fundamental? Am I using too few iterations for it to matter? Looking at the graph at http://bigocheatsheet.com/, the $n \log n$ curve does not seem linear at all.

Andreas V.

1 Answer


Just some general observations.

  • $O(n \log n)$ is only an upper bound. If it's not tight, that's your explanation right there.
  • A $\Theta(n \log n)$ running time can have many different components, for instance

    $\qquad\displaystyle a \cdot n\log n + b \cdot n \log \log n + c \cdot \sqrt n + d \cdot n + e \cdot \log n + f.$

    While asymptotically the linearithmic term dominates, if $a$ is small compared to the other coefficients, you will have a hard time detecting it (see the first sketch after this list).

  • Measuring wall-clock running time is noisy without end, in particular because the coefficients mentioned above get skewed by platform details. Try investigating counts instead, for instance of a dominant operation or block (second sketch below).
  • Linear regression always "works", in the sense that you always get a line and an $R^2$ out of it. Since the "difference" between $n \log n$ and $n$ is rather small (also considering the above point), it's not surprising you'd get a high confidence. Run linearithmic regression and compare (third sketch below)!
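
To make the second point concrete, here is a toy computation; the coefficients a and d are made up:

import math

# Hypothetical coefficients: tiny linearithmic term, large linear term.
a, d = 0.01, 10.0

for n in (10**3, 10**6, 10**9):
    print(n, a * n * math.log2(n), d * n)

# Even at n = 10^9, a*n*log2(n) is only about 3% of d*n:
# the asymptotically dominant term is practically invisible.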
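
For the counting suggestion, a minimal sketch; merge sort here is just a stand-in for whatever your algorithm is:

import random

def merge_sort(xs, counter):
    # Sorts xs and counts comparisons (the dominant operation) in counter[0].
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left = merge_sort(xs[:mid], counter)
    right = merge_sort(xs[mid:], counter)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        counter[0] += 1  # one comparison
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

for n in (10**3, 10**4, 10**5):
    counter = [0]
    merge_sort([random.random() for _ in range(n)], counter)
    print(n, counter[0])  # grows like n*log2(n), free of clock noise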
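
And linearithmic regression can be an ordinary least-squares fit against $n \log_2 n$ instead of $n$; compare the two fits. Here $y$ is synthetic (exactly $n \log_2 n$), so substitute your measurements:

import numpy as np

n = np.arange(2, 100000, dtype=float)
y = n * np.log2(n)  # replace with your measured running times

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

A_lin = np.column_stack([n, np.ones_like(n)])               # model y ~ a*n + b
A_log = np.column_stack([n * np.log2(n), np.ones_like(n)])  # model y ~ a*n*log2(n) + b

coef_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
coef_log, *_ = np.linalg.lstsq(A_log, y, rcond=None)

print("linear       R^2:", r_squared(y, A_lin @ coef_lin))  # very high, ~0.999
print("linearithmic R^2:", r_squared(y, A_log @ coef_log))  # 1.0 on this synthetic data
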
Raphael