4

Consider the binary search problem on a sorted array of $n$ 16-bit integers. Everybody agrees that binary search needs $O(\log n)$ time, because it makes at most $O(\log n)$ steps. But at each step it needs to calculate a midpoint. The first midpoint is $\lfloor n/2 \rfloor$. The next midpoint could be $\lfloor n/2 \rfloor + \lfloor n/4 \rfloor$. If the searched number lies in the second half of the array, the midpoint at each step is at least $\lfloor n/2 \rfloor$. Each calculation of a new midpoint then involves a number with $O(\log n)$ bits, hence each midpoint calculation requires $O(\log n)$ time. I obtain a more precise complexity of $O(\log^2 n)$. Where is the error?

The same problem could appear in countless algorithms that use arrays or matrices. Take the well-known dynamic programming algorithm for the knapsack problem. Everybody agrees it takes $O(nW)$ time (e.g., the Wikipedia article on the knapsack problem), where $n$ is the number of items and $W$ is the capacity. But at each step, it needs to compute a difference of weights/capacities, which should account for an additional factor of $O(\log W)$. Where is the error?
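For concreteness, here is a minimal sketch of that DP (a standard textbook formulation, not taken from any particular source); the index expression `c - weights[i]` is exactly the difference of capacities the question is asking about:

```python
# Standard 0/1 knapsack DP over capacities 0..W.
# The inner loop computes c - weights[i], a difference of
# O(log W)-bit numbers, once per cell -- the operation that
# would cost extra under a logarithmic cost model.
def knapsack(weights, values, W):
    dp = [0] * (W + 1)  # dp[c] = best value achievable with capacity c
    for i in range(len(weights)):
        # iterate capacities downward so each item is used at most once
        for c in range(W, weights[i] - 1, -1):
            dp[c] = max(dp[c], dp[c - weights[i]] + values[i])
    return dp[W]
```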

Where/how does this $O(\log n)$ complexity factor disappear? If it were needed, it would have appeared in many algorithms.

P.S. A very similar problem arises in the questions asked here and here; thanks to Ariel, Raphael, and ryan for pointing this out.

Daniel Porumbel
  • 271
  • 1
  • 7

2 Answers

9

You're not wrong, you're just using a different cost model.

Typically there are two:

  • Uniform cost model - assigns a constant cost to every machine operation, regardless of the size of the numbers involved.

  • Logarithmic cost model - assigns each machine operation a cost proportional to the number of bits involved.

Under uniform cost, binary search takes a constant number of operations to find the next index to search, so we get a recurrence like so: $$\begin{align} T(1) & = O(1)\\[0.5em] T(n) & = T\left( \left\lfloor \frac{n}{2} \right\rfloor \right) + O(1)\\[0.5em] & = \Theta(\log n) \end{align}$$
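A minimal sketch of the search being analyzed (assuming the usual half-open-interval-free, inclusive-bounds formulation): under the uniform cost model, the midpoint line counts as a single $O(1)$ operation.

```python
# Plain binary search over a sorted list.
# Under the uniform cost model, (l + r) // 2 is one unit of cost
# per iteration, giving the Theta(log n) total above.
def binary_search(a, target):
    l, r = 0, len(a) - 1
    while l <= r:
        mid = (l + r) // 2  # one O(1) step in the uniform model
        if a[mid] == target:
            return mid
        elif a[mid] < target:
            l = mid + 1
        else:
            r = mid - 1
    return -1  # not found
```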

Under logarithmic cost, we can assume it takes $f(n)$ operations to find the next midpoint, and we get a recurrence like so: $$\begin{align} T(1) & = O(1)\\[0.5em] T(n) & = T\left( \left\lfloor \frac{n}{2} \right\rfloor \right) + f(n)\\[0.5em] \end{align}$$ For binary search, $f(n)$ is typically no more than the cost of computing $\lfloor \frac{l + r}{2} \rfloor$: the addition takes $\log_2 n$ operations, and dividing by $2$ is a bit shift, also taking $\log_2 n$ operations (see here). Which brings us to: $$\begin{align} T(n) & = T\left( \left\lfloor \frac{n}{2} \right\rfloor \right) + \Theta(\log n)\\[0.5em] & = \Theta(\log^2 n) \end{align}$$ These logarithmic costs are just for addition and shifting, though. If you were to do other operations on $n$-bit numbers, $f(n)$ could change. For example, $n$-bit multiplication or exponentiation can take more than linear time, and you would have to adjust your complexity accordingly under logarithmic cost.
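You can make the logarithmic-cost accounting concrete by instrumenting the search. The sketch below (a hypothetical cost model of my own, not a standard library facility) charges each midpoint computation a number of "bit operations" equal to the bit length of $l + r$, matching $f(n) = \Theta(\log n)$; the total then grows like $\log^2 n$ rather than $\log n$.

```python
# Binary search over the implicit sorted array [0, 1, ..., n-1],
# billing bit_length(l + r) "bit operations" per midpoint, as the
# logarithmic cost model would.
def binary_search_bit_cost(n, target):
    """Return (index_found, total_bit_ops)."""
    l, r = 0, n - 1
    bit_ops = 0
    while l <= r:
        bit_ops += max((l + r).bit_length(), 1)  # cost of the add + shift
        mid = (l + r) // 2
        if mid == target:       # a[mid] == mid in this implicit array
            return mid, bit_ops
        elif mid < target:
            l = mid + 1
        else:
            r = mid - 1
    return -1, bit_ops
```

Searching for $n-1$ with $n = 2^{20}$ takes about $20$ steps, each charged roughly $20$ bit operations, so the total lands near $\log^2 n \approx 400$ rather than $\log n \approx 20$.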

ryan
  • 4,533
  • 1
  • 16
  • 41
1

Usually people neglect the time it takes to perform simple arithmetic operations and assume them to be $O(1)$. In practice, computers work with fixed-size words, so we are assuming the numbers are bounded. And the hardware uses an adder circuit, which is extremely fast compared to other $O(\log(n))$ operations.

Indeed, if you go really deep and you have unbounded numbers $n, m$, a Turing machine needs $O(\log(n)+\log(m))$ time to add them, assuming they are not written in unary. With your array problem, I'd suggest looking carefully at what your input size is and writing the complexity very carefully, even for arithmetic operations; I expect them to be negligibly small. Compare, for instance, the size of your array with the bound on the numbers in the array, and so on.
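To see where that $O(\log(n)+\log(m))$ comes from, here is a toy ripple-carry adder on binary strings (an illustration only, not how any real machine is implemented): adding a $k$-bit and an $l$-bit number takes $\Theta(\max(k, l))$ elementary steps, i.e. $O(\log n + \log m)$ steps when the operands are the numbers $n$ and $m$ themselves.

```python
# Schoolbook addition of two binary strings, counting one
# elementary step per bit position -- the cost a bit-level
# machine model would charge for an addition.
def add_binary(a: str, b: str):
    """Add binary strings a and b; return (sum_string, steps_taken)."""
    a, b = a.zfill(len(b)), b.zfill(len(a))  # pad to equal length
    carry, out, steps = 0, [], 0
    for x, y in zip(reversed(a), reversed(b)):
        steps += 1                  # one step per bit position
        s = int(x) + int(y) + carry
        out.append(str(s % 2))
        carry = s // 2
    if carry:
        out.append('1')
    return ''.join(reversed(out)), steps
```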

Eugene
  • 1,106
  • 1
  • 6
  • 14