Let me try to give you a different perspective on this. I'm a Site Reliability Engineer. My day job is managing Google's production systems. One of my responsibilities is provisioning and capacity management - that is, my colleagues and I decide how much hardware we need to run our services, and where we need it.
In order to do this, we need to know, calculate, or empirically evaluate all of the following pieces of information:
- How many requests (per second) will we need to process?
- How big will each request be?
- How much does it cost to process a request, as a function of its size?
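To show how those three inputs combine into a provisioning number, here's a toy back-of-the-envelope sketch in Python. It is purely illustrative - every number and name in it is made up for the example, not taken from any real service:

```python
# Toy capacity estimate from the three inputs above.
# Every number here is hypothetical, purely for illustration.

peak_qps = 50_000        # how many requests per second, at peak
avg_request_kb = 8       # how big each request is, on average
cpu_ms_per_kb = 0.5      # measured processing cost per KB of request

# Total CPU-seconds of work arriving per second at peak:
cpu_seconds_per_second = peak_qps * avg_request_kb * cpu_ms_per_kb / 1000  # = 200

# Suppose one machine gives us ~16 usable CPU-seconds per second (16 cores),
# and we keep a safety margin for redundancy and traffic spikes:
usable_cores_per_machine = 16
safety_factor = 1.5

machines_needed = cpu_seconds_per_second / usable_cores_per_machine * safety_factor
print(f"~{machines_needed:.0f} machines at peak")  # -> ~19 with these made-up numbers
```

A real provisioning exercise involves far more resources and constraints than CPU, but this is the shape of the arithmetic.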
For the most part, we evaluate all three of these questions through measurement and load testing - not with Big O. However, sometimes we also need to consider:
- Are all requests the same size, and if not, what happens to really big requests?
If our service is $O(1)$, then the answer is "We process them exactly the same as the small requests." If our service is $O(N)$, then the answer is "Each large request is equivalent* to some number of small requests based on its size, so we just need to increase the overall number of requests that we can process, based on the distribution of request sizes that we observe."

But if our service is slower than $O(N)$, then we start getting concerned about this question. For example, an $O(N^2)$ service might slow to a crawl if you throw a small number of really big requests at it, even though it could easily handle a large number of small requests. So now we need to test the service with large requests, which might not have been strictly necessary for an $O(N)$ service, and explicitly account for the overall distribution of request sizes in a much more careful and rigorous fashion than we would for an $O(N)$ service. Alternatively, we might need to redesign the service, stick a caching layer in front of it, or apply some other mitigation so that we can "pretend" the service is $O(N)$ or faster.
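To put rough numbers on that, here's a small sketch comparing how one big request weighs against a typical small request under linear versus quadratic cost models. The sizes and cost functions are made up purely for illustration:

```python
# Hypothetical comparison: how many "small requests' worth" of work does one
# big request represent, under different complexity classes?

small_n = 1_000      # size of a typical small request
big_n = 1_000_000    # size of an occasional really big request

def equivalent_small_requests(big, small, cost):
    """How many small requests one big request is worth under a given cost model."""
    return cost(big) / cost(small)

def linear(n):       # O(N): cost proportional to size
    return n

def quadratic(n):    # O(N^2): cost grows with the square of size
    return n ** 2

print(equivalent_small_requests(big_n, small_n, linear))     # 1000.0
print(equivalent_small_requests(big_n, small_n, quadratic))  # 1000000.0
```

Under the linear model, the big request just looks like a thousand extra small requests, which the normal QPS math absorbs. Under the quadratic model, a single such request is worth a million small ones - which is exactly when the size distribution stops being a detail and starts driving the capacity plan.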
The coefficients are, frankly, not very useful in this context. We're already measuring that information empirically anyway. You might wonder if we could somehow theoretically predict the results of our load tests and benchmarks, and thereby avoid running them, but in practice this turns out to be very difficult due to the sheer complexity of distributed systems: you would need to accurately model and characterize the performance of a lot of moving parts, most of which were not explicitly designed to provide real-time performance guarantees. Given that empirical load testing is a straightforward and standardized process, it's simply not worth trying to do this theoretically.
* A linear function may have a non-zero y-intercept, which makes this equivalence technically false. But in most cases, the y-intercept is not large enough for this to make a practical difference. If it is large enough to matter, then we account for it as a separate component of the overall cost. Similarly, very few services are actually $O(1)$, because at an absolute minimum you need to unmarshal the request, which is itself an $O(N)$ operation - but in practice, this is very cheap and may not be worth accounting for.
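Spelling out that caveat: if the measured cost of a request of size $N$ fits a linear model with some hypothetical fitted constants $a$ and $b$,

$$\text{cost}(N) \approx aN + b,$$

then a request of size $N$ is equivalent to $\frac{aN + b}{an + b}$ requests of size $n$, which is close to the intuitive $N/n$ only when $b \ll an$, i.e. when the fixed per-request overhead is small even compared to the work for a small request. When it isn't, that overhead is the "separate component of the overall cost" mentioned above.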