26

I've heard several times that for sufficiently small values of n, O(n) can be thought about/treated as if it's O(1).

Example:

The motivation for doing so is based on the incorrect idea that O(1) is always better than O(lg n), is always better than O(n). The asymptotic order of an operation is only relevant if under realistic conditions the size of the problem actually becomes large. If n stays small then every problem is O(1)!

What is sufficiently small? 10? 100? 1,000? At what point do you say "we can't treat this like a free operation anymore"? Is there a rule of thumb?

This seems like it could be domain- or case-specific, but are there any general rules of thumb about how to think about this?

Glorfindel
rianjs

11 Answers

45

This is largely piggy-backing on the answers already posted, but may offer a different perspective.

It's revealing that the question discusses "sufficiently small values of n". The whole point of Big-O is to describe how processing grows as a function of what's being processed. If the data being processed stays small, it's irrelevant to discuss the Big-O, because you're not interested in the growth (which isn't happening).

Put another way, if you're going a very short distance down the street, it may be equally fast to walk, use a bicycle, or drive. It may even be faster to walk if it would take a while to find your car keys, or if your car needs gas, etc.

For small n, use whatever's convenient.

If you're taking a cross-country trip, then you need to look at ways to optimize your driving, your gas mileage, etc.

Eric Hughes
21

All orders of growth involve a constant $C$ (several of them, actually). When the number of items is large enough, that constant is irrelevant. The question is whether the number of items is small enough for that constant to dominate.

Here's a visual way to think about it.

[Figure: runtime curves for $O(1)$, $O(n)$ and $O(n^2)$ versus number of items, each with its own startup offset and growth constant.]

All have a startup constant which determines their starting point on the Y axis. Each also has a critical constant $C$ dominating how fast they will increase.

  • For $O(1)$, $C$ determines the time.
  • $O(n)$ is really $C \times n$, where $C$ determines the angle.
  • $O(n^2)$ is really $C \times n^2$, where $C$ determines the sharpness of the curve.

To determine which algorithm you should use, you need to estimate the spot where the runtimes intersect. For example, an $O(1)$ solution with a high startup time or a high $C$ will lose to an $O(n)$ solution with a low startup time and a low $C$ until the number of items grows fairly large.
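As a rough illustration, here's a tiny sketch of that estimate in Python (the cost models and every constant in them are invented for illustration):

```python
# Sketch: find where an O(1) solution with a high startup cost and a
# high C overtakes an O(n) solution with a low startup cost and a low
# C. All constants are made up.

def cost_constant(n):
    return 500.0 + 100.0      # startup + C: expensive, but flat

def cost_linear(n):
    return 5.0 + 2.0 * n      # startup + C*n: cheap start, grows

n = 1
while cost_linear(n) <= cost_constant(n):
    n += 1
print(f"the O(n) solution stops winning above n = {n - 1}")  # 297 here
```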

Here's a real world example. You have to move a bunch of bricks across a yard. You can move them a few at a time with your hands, or go get a huge, slow backhoe to lift and drive them over in one trip. What is your answer if there are three bricks? What is your answer if there are three thousand?

Here's a CS example. Let's say you need a list which is always sorted. You could use a tree, which will keep itself in order in $O(\log{n})$ per operation. Or you could use an unsorted list and re-sort after every insert or deletion at $O(n \log{n})$. Because tree operations are complicated (they have a high constant) and sorting is so simple (low constant), the list will likely win up to hundreds or even thousands of items.

You can eyeball this sort of thing, but in the end benchmarking is what will do it. You also have to estimate how many items you'll typically have, and mitigate the risk of being handed more. You'll also want to document your assumptions, like "performance will degrade rapidly over $X$ items" or "we assume a maximum set size of $X$".
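For instance, a benchmark of the tree/list trade-off above might look like this sketch (Python has no built-in balanced tree, so `bisect.insort`, which keeps a list sorted with a very low constant, stands in as the second strategy; sizes and repetition counts are arbitrary):

```python
# Sketch: time two "always sorted" strategies as n grows. The
# benchmarking technique is the point, not these exact contenders.
import bisect
import random
import timeit

def resort_after_each_insert(items):
    out = []
    for x in items:
        out.append(x)
        out.sort()              # re-sort every time (Timsort)
    return out

def insort_each_insert(items):
    out = []
    for x in items:
        bisect.insort(out, x)   # binary search + shift
    return out

for n in (10, 100, 1000):
    data = [random.random() for _ in range(n)]
    for fn in (resort_after_each_insert, insort_each_insert):
        t = timeit.timeit(lambda: fn(data), number=100)
        print(f"n={n:5d}  {fn.__name__:25s}  {t:.4f}s")
```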

Because these requirements are subject to change, it's important to put these sorts of decisions behind an interface. In the tree/list example above, don't expose the tree or list. That way, if your assumptions turn out to be wrong, or you find a better algorithm, you can change your mind. You can even do a hybrid and dynamically switch algorithms as the number of items grows.
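A minimal sketch of such an interface in Python (the `SortedBag` name and everything inside it are invented for illustration):

```python
# Sketch: hide the data-structure choice behind an interface so the
# decision can be revisited without touching any callers.
import bisect

class SortedBag:
    """An always-sorted collection; callers never see how it's stored."""

    def __init__(self):
        self._items = []        # today: a sorted list

    def add(self, x):
        # If profiling later shows we outgrow the list, only this
        # class changes, e.g. to a tree, or to a hybrid that switches
        # representation once len(self) passes some threshold.
        bisect.insort(self._items, x)

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)
```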

Schwern
15

The quote is rather vague and imprecise. There are at least three related ways in which it can be interpreted.

The literal mathematical point behind it is that, if you're only interested in instances of size up to some limit, then there are only finitely many possible instances. For example, there are only finitely many graphs on up to a hundred vertices. If there are only finitely many instances, then you can, in principle, solve the problem by just constructing a look-up table of the answers to all the possible instances. Now, you can find the answer by first checking that the input isn't too big (which takes constant time: if the input is longer than $k$, it's invalid) and then looking up the answer in the table (which takes constant time: there are a fixed number of entries in the table). Note, though, that the actual size of the table is probably infeasibly big. I said there are only finitely many graphs on a hundred vertices, and that's true. It's just that this finite number is bigger than the number of atoms in the observable universe.
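Here's a small sketch of that look-up-table construction in Python (`fib` and the bound `K` are stand-ins; any pure function of a bounded input works the same way):

```python
# Sketch: with inputs bounded by K there are finitely many instances,
# so every answer can be precomputed and each query answered in O(1).

def fib(n):
    """Some function we'd rather not recompute; a stand-in."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

K = 100
TABLE = [fib(n) for n in range(K + 1)]   # finite table, built once

def fib_lookup(n):
    if not 0 <= n <= K:       # constant-time size check
        raise ValueError("input outside the table's bounds")
    return TABLE[n]           # constant-time look-up
```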

A more practical point is that, when we say that the running time of an algorithm is $\Theta(n^2)$, that only means that it's asymptotically $cn^2$ steps, for some constant $c$. That is, there's some constant $n_0$ such that, for all $n\geq n_0$, the algorithm takes roughly $cn^2$ steps. But maybe $n_0=100,000,000$ and you're only interested in instances of size much smaller than that. The asymptotic quadratic bound might not even apply to your small instances. You might be lucky and it might be faster on small inputs (or you might be unlucky and have it be slower). For example, for small $n$ (specifically, $n < 1000$), $n^2 < 1000n$, so you'd rather run a quadratic algorithm with good constants than a linear algorithm with bad constants. A real-life example of this is that the asymptotically most efficient matrix multiplication algorithms (variants of Coppersmith–Winograd, running in time $O(n^{2.3729})$) are seldom used in practice because Strassen's $O(n^{2.8074})$ algorithm is faster unless your matrices are really big.

A third point is that, if $n$ is small, then $n^2$ and even $n^3$ are small. For example, if you need to sort a few thousand items of data and you only need to sort them once, any sorting algorithm is good enough: a $\Theta(n^2)$ algorithm is still only going to need maybe a few tens of millions of instructions to sort your data, which isn't much time at all on a CPU that can perform billions of instructions per second. OK, there are memory accesses, too, but even a slow algorithm will take less than a second, so it's probably better to use a simple, slow algorithm and get it right than to use a complex, fast algorithm and find that it's lightning-fast but buggy and doesn't actually sort the data properly.
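To put a rough number on that, here's a timing sketch (a textbook insertion sort; pure Python adds heavy interpreter overhead, so expect on the order of a second for a few thousand items, where a compiled language would take milliseconds):

```python
# Sketch: time a Theta(n^2) insertion sort on a few thousand items.
import random
import time

def insertion_sort(a):
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:   # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

data = [random.random() for _ in range(3000)]
start = time.perf_counter()
insertion_sort(data)
print(f"sorted {len(data)} items in {time.perf_counter() - start:.2f}s")
assert data == sorted(data)          # and it actually sorts correctly
```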

David Richerby
9

Big-O notation really only says something about the behaviour for arbitrarily large $n$. For example, $f(n) = O(n^2)$ means that there is a constant $c > 0$ and an integer $n_0$ such that $f(n) < c n^2$ for every $n > n_0$.

In many cases, you can find a constant $c$ and say "for every $n > 0$, $f(n)$ is approximately $c n^2$", which is useful information to have. But in some cases this isn't true: if $f(n) = n^2 + 10^{18}$, then that statement is totally misleading. So just because something is $O(n^2)$ doesn't mean you can switch your brain off and ignore the actual function.

On the other hand, if you only ever encounter the values $n = 1$, $2$ and $3$, then in practice it makes no difference what $f(n)$ does for $n \geq 4$, so you might as well consider $f(n)$ to be $O(1)$, with $c = \max(f(1), f(2), f(3))$. And that's what "sufficiently small" means: the claim that $f(n) = O(1)$ doesn't mislead you if the only values of $n$ that you encounter are sufficiently small.

gnasher729
5

If it doesn't grow, it's O(1)

The author's statement is a bit axiomatic.

Orders of growth describe what happens to the amount of work you must do as N increases. If you know that N doesn't increase, your problem is effectively O(1).

Remember that O(1) doesn't mean "fast". An algorithm that always requires 1 trillion steps to complete is O(1). An algorithm that takes anywhere from 1 to 200 steps, but never more, is O(1). [1]

If your algorithm takes exactly N^3 steps, and you know that N can't be more than 5, it can never take more than 125 steps, so it's effectively O(1).

But again, O(1) doesn't necessarily mean "fast enough". That's a separate question that depends on your context. If it takes a week to finish something, you probably don't care if it's technically O(1).


[1] E.g., lookup in a hash table is O(1), even though hash collisions mean that you may have to look through several items in one bucket, as long as there's a hard limit on how many items can be in that bucket.
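A sketch of what that footnote describes (a separate-chaining table; nothing below enforces the per-bucket limit, that bound is the stated assumption):

```python
# Sketch: lookup in a chained hash table. The inner scan looks linear,
# but if every bucket holds at most a constant B entries, the whole
# lookup is O(1).
NUM_BUCKETS = 64

def make_table():
    return [[] for _ in range(NUM_BUCKETS)]

def insert(table, key, value):
    table[hash(key) % NUM_BUCKETS].append((key, value))

def lookup(table, key):
    for k, v in table[hash(key) % NUM_BUCKETS]:  # at most B entries
        if k == key:
            return v
    raise KeyError(key)
```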

Nathan Long
2

Now, I can use a hashtable, and have O(1) lookups (leaving aside the specific implementation of the hashtable), but if I had e.g., a list, I would have O(n) lookups. Given this axiom, these two are the same if the collections are small enough. But at some point they diverge... what is that point?

Practically, it's the point where building the hash table costs more than the benefit you gain from the improved lookups. This will vary a lot based on how often you're doing the lookup versus how often you're doing other things. A constant cost of 1 versus 10 isn't a big deal if you do it once. If you do it thousands of times a second, even that matters (though at least the total cost grows only linearly with the number of calls).
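A quick sketch of how you might locate that point empirically (the exact crossover depends on hardware and interpreter; the measuring technique is the point):

```python
# Sketch: compare the cost of building a hash table, scanning a list,
# and looking up in the table, across several collection sizes.
import timeit

for n in (1, 10, 100, 1000):
    keys = list(range(n))
    table = dict.fromkeys(keys)
    probe = n - 1                # worst case for the list scan
    t_build = timeit.timeit(lambda: dict.fromkeys(keys), number=1_000)
    t_list = timeit.timeit(lambda: probe in keys, number=100_000)
    t_hash = timeit.timeit(lambda: probe in table, number=100_000)
    print(f"n={n:5d}  build {t_build:.4f}s  "
          f"list-scan {t_list:.4f}s  hash-lookup {t_hash:.4f}s")
```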

Telastyn
2

While the quote is true (if vague), there are also dangers to it. In my opinion, you should look at complexity at every stage of your application.

It's all too easy to say: hey, I only have a small list, so if I want to check whether item A is in the list I'll just write an easy loop that traverses the list and compares the items.

Then a fellow programmer comes along who needs to use the list, sees your function, and thinks: hey, I don't want any duplicates in the list, so he uses the function for every item added to the list.

(mind you, it's still a small list scenario.)

Three years later I come along, and my boss has just made a big sale: our software is going to be used by a big national retailer, whereas before we only serviced small shops. And now my boss comes at me swearing and shouting, asking why the software that has always "worked fine" is now so terribly slow.

Turns out that list was a list of clients, and our customers had only maybe 100 clients each, so nobody noticed. The operation of populating the list was basically an O(1) operation, because it took less than a millisecond. Well, not so much when there are 10,000 clients to be added to it.
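In code, the trap looks something like this sketch (all names invented):

```python
# Sketch: the innocent linear scan, reused on every add, quietly makes
# populating n clients O(n^2).

def contains(clients, name):          # the "easy loop": O(n)
    for c in clients:
        if c == name:
            return True
    return False

def add_client(clients, name):        # called once per client...
    if not contains(clients, name):   # ...so populating is O(n^2)
        clients.append(name)

# The eventual fix: track membership in a set, making each add O(1)
# on average.
def add_client_fast(clients, seen, name):
    if name not in seen:
        seen.add(name)
        clients.append(name)
```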

And years after the original bad O(1) decision, the company almost lost a big client. All because of one little design/assumption error years before.

Pieter B
1

The motivation for doing so is based on the incorrect idea that O(1) is always better than O(lg n), is always better than O(n). The asymptotic order of an operation is only relevant if under realistic conditions the size of the problem actually becomes large.

If I have two algorithms with these times:

  • log(n)+10000
  • n+1

Then there exists some point where they cross. For n smaller than that, the "linear" algorithm is faster, and for n larger than that, the "logarithmic" algorithm is faster. Many people make the mistake of assuming the logarithmic algorithm is faster, but for small n, it isn't.
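You can locate that crossover numerically; here's a sketch (taking the logarithm to base 2, which is an assumption, since the base isn't specified above):

```python
# Sketch: find the first n where log(n) + 10000 beats n + 1.
import math

n = 1
while n + 1 <= math.log2(n) + 10000:   # linear still at least as cheap
    n += 1
print(n)   # first n where the "logarithmic" algorithm wins: ~10013
```

With these cost functions the crossover lands a bit above n = 10,000.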

If n stays small then every problem is O(1)!

I speculate that what's meant here is that if n is bounded, then every problem is O(1). For instance, if we're sorting integers, we may choose to use quicksort: O(n*log(n)), obviously. But if we decide that there can never be more than 2^64 = 1.8446744e+19 integers, then we know that n*log(n) <= 1.8446744e+19*log(1.8446744e+19) <= 1.1805916e+21. Therefore, the algorithm will always take fewer than 1.1805916e+21 "units of time". Since that's a constant bound, we can say the algorithm always finishes within that constant time, and hence is O(1). (Note that even if those units of time are nanoseconds, that's a grand total of over 37,411 years.) But still O(1).

Mooing Duck
0

My 2 cents:

Considered from a formal standpoint, the sentence "For small values of $n$, $O(n)$ can be treated as if it's $O(1)$" makes no sense, because $O(n)$ is not a function but a set of functions, as is $O(1)$.

One can say even more: $n$, as a value, does not appear in the definition of $O(n)$ at all; rather, the definition involves the identity function $\text{id}_\mathbb{N}(n)=n$. The variables $n, C, N$ which appear in the definition of $O(n)$ stand under quantifiers (they are so-called bound variables), and the only free variable is $\text{id}$: $$O(\text{id})=\left\{g \colon \exists C > 0, \exists N \in \mathbb{N}, \forall n > N, \ g(n) \leqslant C \cdot \text{id}(n) = C \cdot n \right\}$$ Strictly speaking, we have $O(\text{id})$, but it is more customary to speak of it as $O(n)$.

Now suppose we want to rephrase this sentence in terms of the elements of $O(n)$ and $O(1)$. Because $O(1) \subset O(n)$, we can, of course, say that some functions from $O(n)$ are also in $O(1)$.

Finally, let me note that $O$-notation, being asymptotic, is obviously not appropriate for some cases. For example, when we want to compare functions on some bounded subset of $\mathbb{N}$, we shouldn't use $O$-notation. Another example is constant factors: when they are important, $O$ can give the wrong answer. None of us would agree to pay $2$ billion instead of $1$ billion, yet $O(1) = O(2)$.

zkutch
0

I suspect many of these answers are missing a fundamental concept. O(1) versus O(n) is not the same as f(1) versus f(n) where f is a single function, because O doesn't represent a single function. Even Schwern's nice graph isn't quite valid, because it uses the same Y axis for all the lines. For them to share an axis, the lines would have to be f1(n), f2(n) and f3(n), where each is a function whose performance can be directly compared to the others.

I've heard several times that for sufficiently small values of n, O(n) can be thought about/treated as if it's O(1)

Well, if n=1, are they exactly the same? No. A function allowing a variable number of iterations has nothing in common with one that doesn't; big-O notation doesn't care, and neither should we.

Big-O notation is simply there to express what happens when we have an iterative process, and how performance (time or resources) degrades as 'n' increases.

So to answer the actual question... I would say that those who make that claim don't understand Big-O notation properly, because it's an illogical comparison.

Here's a similar question: if I loop through a string of characters, and I know that in general my strings will be fewer than 10 characters, can I say that it's the equivalent of O(1), but that if my strings were longer I'd say it was O(n)?

No, because a string of 10 characters takes 10 times as long as a string of 1 character, and a string of 1,000 characters takes 100 times as long as a string of 10. It's O(n).

JSobell
0

I believe the text you quoted is quite inaccurate (using the word "better" is usually meaningless unless you provide the context: in terms of time, space, etc.). Anyway, I believe the simplest explanation would be:

If the time of execution grows with the size of the input, then it is definitely not $O(1)$, and that should be clear. $O(1)$ does not mean fast. It just means (in terms of time complexity) that the time of execution has a constant upper bound.

Now, let's take a relatively small set of 10 elements and a few algorithms to sort it (just an example). Let's assume that we keep the elements in a structure that also provides us with an algorithm capable of sorting the elements in constant time. Let's say our sorting algorithms can have the following complexities (in big-O notation):

  1. $O(1)$
  2. $O(n)$
  3. $O(n \log{n})$
  4. $O(n^2)$

Which algorithm would you choose? The first answer that comes to mind may be "of course I'll use the $O(1)$ one!", but this is not necessarily correct. What you forget when thinking like that is that big-O notation hides the constant factor. And if you know your set is pretty small, then this constant factor may be much more important than the asymptotic complexity.

Now let's "reveal" the true complexities of the sorting algorithms mentioned above (where "true" means not hiding the constant), represented by the number of steps required to finish (and let's assume all steps take the same amount of time):

  1. $200$ steps
  2. $11n$ steps
  3. $4n\log_2{n}$ steps
  4. $1n^2$ steps

If our input is of size 10, then these are exact amounts of steps for every algorithm mentioned above:

  1. $200$ steps
  2. $11 \times 10 = 110$ steps
  3. $4 \times 10 \times 3.32 \approx 133$ steps
  4. $1 \times 100 = 100$ steps

As you can see, in this case the apparently worst algorithm, with asymptotic complexity $O(n^2)$, is the fastest one, beating the algorithms with $O(1)$, $O(n)$ and $O(n \log{n})$ asymptotic complexities. The constant factor hidden by the big-O notation matters here. In my opinion, this does not mean that we can treat $O(n^2)$ as better than $O(1)$ (what would that even mean?). It means that for sufficiently small input (as you've seen in the example) the $O(n^2)$ algorithm may still be faster than the $O(1)$ one because of the hidden constant. And if the constant is relatively large compared to the size of the input, it may matter more than the asymptotic complexity.
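Here's a short sketch that recomputes those step counts for several input sizes, so you can watch the ranking flip as $n$ grows:

```python
# Sketch: evaluate the four "true" cost functions above at several
# input sizes and print them fastest-first.
import math

costs = {
    "O(1)":       lambda n: 200,
    "O(n)":       lambda n: 11 * n,
    "O(n log n)": lambda n: 4 * n * math.log2(n),
    "O(n^2)":     lambda n: n * n,
}

for n in (5, 10, 50, 100):
    ranked = sorted(costs, key=lambda name: costs[name](n))
    line = ", ".join(f"{name}: {costs[name](n):.0f} steps" for name in ranked)
    print(f"n={n:3d}  {line}")
```

At $n=10$ the quadratic algorithm wins, as computed above; by $n=50$ the $O(1)$ algorithm's 200 steps already beat all three of the others.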

3yakuya