
Dijkstra's algorithm assigns a number to a non-removed vertex each time it finds a path from a removed vertex to it. The number of assignments is $\mathcal{O}(|V|^2)$. However, the complexity of an assignment is not $\mathcal{O}(1)$; it is $\mathcal{O}(\log s)$, where $s$ is the weight of the path from the initial vertex to the current one.

This would mean that the complexity is $\mathcal{O}(|V|^2\log s)$.
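For concreteness, here's the kind of $\mathcal{O}(|V|^2)$ array-based implementation I have in mind (the adjacency-matrix representation and the names are just my own sketch):

```python
import math

def dijkstra(weights, source):
    """weights[u][v] is the edge weight, or math.inf if there is no edge."""
    n = len(weights)
    dist = [math.inf] * n
    dist[source] = 0
    removed = [False] * n

    for _ in range(n):
        # Pick the closest non-removed vertex: O(|V|) per iteration.
        u = min((v for v in range(n) if not removed[v]), key=lambda v: dist[v])
        removed[u] = True
        for v in range(n):
            if not removed[v] and dist[u] + weights[u][v] < dist[v]:
                # The assignment I'm counting: it happens O(|V|^2) times
                # overall, and each one adds/compares path weights ("s").
                dist[v] = dist[u] + weights[u][v]
    return dist
```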

Am I missing something?

rus9384

2 Answers


Algorithms are usually analyzed under the word RAM model, in which basic operations on machine words cost $O(1)$. Machine words are defined as words of length $\log n$, where $n$ is the size of the input (in bits).

In the case of algorithms on weighted graphs, we could make the further assumption that weights are put in special registers for which "reasonable" arithmetic operations are $O(1)$. This assumption is made implicitly, without any mention of it, since the exact computation model isn't really needed when designing an algorithm, as long as you're not "cheating". It allows using floating point weights, for example.

The computation model serves two purposes:

  • To explain what constitutes "cheating", that is, what is not allowed (or rather, what is allowed).
  • To allow proving lower bounds.

If we are not careful with the computation model, we could solve PSPACE-complete problems such as TQBF; see here.

One instantiation of the murky class of computation models for weighted graphs is allowing the weights to fit in a constant number of machine words; then we can just analyze the algorithm using the word RAM model. In this case, any weight encountered during Dijkstra's algorithm also fits in a constant number of machine words, since any such weight is the sum of at most $|V|-1$ weights. Therefore in the word RAM model, assignment (as well as addition) costs $O(1)$ rather than $O(\log s)$.
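To make the bound concrete, here is a small illustration (the specific numbers are arbitrary): if each edge weight fits in one 64-bit word, then a path weight, being a sum of at most $|V|-1$ such weights, gains only about $\log_2 |V|$ extra bits and still fits in a constant number of machine words.

```python
# Illustrative check of the bound above; the numbers are arbitrary.
V = 10**6                      # number of vertices
w = 64                         # each edge weight fits in one 64-bit word
max_edge = 2**w - 1
max_path_weight = (V - 1) * max_edge   # a path has at most |V|-1 edges

print(max_path_weight.bit_length())    # 84 here: 64 + 20 bits for |V|-1
print(w + (V - 1).bit_length())        # matches the w + ~log2|V| bound
```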

Yuval Filmus

tl;dr- The idea that setting a value has $\mathcal{O}\left(\log{s}\right)$ complexity follows from a presumed model for numeric data in which a number $x$ requires $\log{x}$ bits to represent. This isn't the case in most real-world implementations, nor in general, so it isn't reflected in common descriptions of algorithm complexity.

However, the expression given in the question seems valid when the presumed data model is used. As such, I'd characterize this expression as implementation-specific; it's more precise when applicable, but it's only applicable when the relevant data model is used.

Intro

It sounds like you're assuming a data model for numeric values where values start at $0$ and each additional bit expands the representable range, as in common binary notation. I'll assume that you handle floating-point extensions in a similar manner, e.g. decimal $0.25$ is stored as binary 0.01 in the data model, while $\frac{1}{3}$ would require infinitely many bits to represent exactly.

In binary, we can distinguish $2^{n_{\text{bits}}}$ numbers given $n_{\text{bits}}$ bits of storage, and the number of bits needed grows as $x$ gets larger if we pin the data-zero to numeric $0$. So you could say that the storage complexity of a value $x$ is $\log_2\left(x\right)$, which is proportional to $\log\left(x\right)$.
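As a tiny check of that claim (using Python's arbitrary-precision integers, whose bit_length method reports exactly this quantity):

```python
# Bits needed to represent x in plain binary, i.e. roughly log2(x).
for x in (1, 10, 1_000, 1_000_000):
    print(x, x.bit_length())
# -> 1 1, 10 4, 1000 10, 1000000 20
```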

The problem with this logic is that you're presuming a data model for numeric values that's (1) not usually true in implementations and (2) not logically required in general. I'll then comment on (3) the general problem.

(1) Not true in practice

In most common implementations of algorithms like this, primitive datatypes are used. Such datatypes store the same number of bits regardless of the numeric interpretation of their value.

In these cases (which is most cases), the complexity simply doesn't grow with the value of the weights.
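As a quick illustration (Python's struct module stands in here for a fixed-width primitive type such as a 64-bit integer; the particular type is just an example), the storage used is the same no matter what the value is:

```python
import struct

# A fixed-width primitive occupies the same number of bytes regardless of value.
small = struct.pack('<q', 1)      # signed 64-bit integer holding 1
large = struct.pack('<q', 2**62)  # signed 64-bit integer holding 2^62
print(len(small), len(large))     # 8 8 -- one machine word each
```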

(2) Not true in the general case

If only (1) were the issue, it'd seem like just an implementation detail, right? But $\log{\left(x\right)}$ storage isn't logically required either.

The core problem is that it presumes a numeric model where you start at zero and add bits to represent larger values. While that seems like a sensible approach, it's by no means fundamentally the only reasonable way to store data.

For a practical example, consider a system where weights might be a multiple of $\pi$. To store $\pi$ exactly in the presumed data model, you'd need infinitely many bits, since (like $\frac{1}{3}$ above) its binary expansion never terminates. That's obviously not possible. But you can use algebraic logic (think Mathematica) with more abstract data types to bridge the gap, as sketched below.
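Here's a minimal sketch of that idea (purely illustrative, and limited to weights that are rational multiples of $\pi$): store the exact rational coefficient and compare coefficients directly. Since $\pi>0$, that's enough, and no bits are ever spent on the digits of $\pi$.

```python
from fractions import Fraction

# Represent a weight as (rational coefficient) * pi. Since pi > 0,
# comparing two such weights reduces to comparing their coefficients;
# the (infinite) binary expansion of pi is never stored.
def pi_multiple_less_than(a: Fraction, b: Fraction) -> bool:
    return a < b   # a*pi < b*pi  <=>  a < b

print(pi_multiple_less_than(Fraction(1, 3), Fraction(1, 2)))  # True: pi/3 < pi/2
```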

(3) The general problem

In general, the computer needs to do two things:

  1. Keep expressions for the total path distance.

  2. Be able to compare those expressions to determine if one's larger than the other.

Neither of these requires that the numbers be stored in a format with $\mathcal{O}\left(\log{s}\right)$ complexity. For example, consider someone asking a StackExchange question like this:

What's larger:

  1. the distance from Washington DC to Maryland; or

  2. the distance from Earth to the edge of the observable universe to the googolplexth power to the googolplexth power.

In answering this question, do you need $\log{\left(\left(\left[{r}_{\text{universe}}\right]^{{10}^{\left({10}^{100}\right)}}\right)^{{10}^{\left({10}^{100}\right)}}\right)}$ of storage in your brain? Or, if they asked about volumes or something else that involves $\pi$, does your brain need infinite data storage?

The answer's obviously no; you have comparison methods that aren't based on the presumed model with $\log_2\left(x\right)$ bits of storage needed. Likewise, a computer program shouldn't need to use that storage model, either.
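Here's a hedged sketch of how a program can make that kind of comparison (the representation is made up for illustration, and I've simplified the tower of exponents to a single googolplex-style power): keep each quantity as a base together with a power-of-ten exponent, and compare $\log\log$ values instead of ever writing the digits down.

```python
import math

# A quantity base ** (10 ** p), with base > 1, can be compared via
# log(log(.)) = p*log(10) + log(log(base)) -- no digits are materialized.
def loglog(base: float, p: int) -> float:
    return p * math.log(10) + math.log(math.log(base))

r_universe_m = 8.8e26     # rough radius of the observable universe, metres
dc_to_md_m = 1.0e4        # rough DC-to-Maryland distance, metres

# r_universe raised to the googolplexth power (10 ** (10**100))
# versus the plain DC-to-Maryland distance (exponent 10**0 = 1):
print(loglog(r_universe_m, 10**100) > loglog(dc_to_md_m, 0))   # True
```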

Conclusion

We don't attribute $\mathcal{O}\left(\log{s}\right)$ complexity to numeric data because that characterization follows from a model that's neither true in general nor used in practice.

If you do want to include that component in your complexity assessment of an algorithm, you'd need to qualify the complexity claim with the fact that you're considering a special case of the algorithm which uses the presumed data model.

Nat