
Wikipedia says union by rank without path compression gives an amortized time complexity of $O(\log n)$, and that union by rank combined with path compression gives an amortized time complexity of $O(\alpha(n))$ (where $\alpha$ is the inverse of the Ackermann function). However, it does not mention the running time of path compression without union by rank, which is what I usually implement myself.

What's the amortized time complexity of union-find with the path-compression optimization, but without the union by rank optimization?
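
For concreteness, here is a rough sketch (Python; the class name and structure are just illustrative) of the variant I mean: find compresses paths fully, while union links one root under the other with no rank or size heuristic.

```python
# Union-find with path compression but WITHOUT union by rank:
# union always hangs the first root under the second.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        # Path compression: point every node on the path directly at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry  # naive linking, no rank/size comparison
```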

Drathier

2 Answers


Seidel and Sharir proved in 2005 [1] that union-find using path compression with arbitrary linking has a total complexity of roughly $O((m+n)\log n)$ over $m$ operations on $n$ elements.

See [1], Section 3 (Arbitrary Linking): Let $f(m,n)$ denote the runtime of union-find with $m$ operations and $n$ elements. They proved the following:

Claim 3.1. For any integer $k>1$ we have $f(m, n)\leq (m+(k-1)n)\lceil \log_k(n) \rceil$.

According to [1], setting $k = \lceil m/n \rceil + 1$ gives $$f(m, n)\leq (2m+n) \log_{\lceil m/n\rceil +1}n.$$
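
To get a feel for this bound: if $m = \Theta(n)$, the base of the logarithm is a constant, so the total work is $O(n\log n)$, i.e. $O(\log n)$ amortized per operation; and if $m \geq n^{1+\varepsilon}$ for some fixed $\varepsilon > 0$, then $\log_{\lceil m/n\rceil+1} n \leq \log_{n^{\varepsilon}} n = 1/\varepsilon$, so the amortized cost per operation drops to $O(1)$.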

A similar bound was given using a more complex method by Tarjan and van Leeuwen in [2], Section 3:

Lemma 7 of [2]. Suppose $m \geq n$. In any sequence of set operations implemented using any form of compaction and naive linking, the total number of nodes on find paths is at most $(4m + n) \lceil \log_{\lfloor 1 + m/n \rfloor}n \rceil$. With halving and naive linking, the total number of nodes on find paths is at most $(8m+2n)\lceil \log_{\lfloor 1 + m/n \rfloor} n \rceil$.

Lemma 9 of [2]. Suppose $m < n$. In any sequence of set operations implemented using compression and naive linking, the total number of nodes on find paths is at most $n + 2m \lceil \log n\rceil + m$.
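
The "halving" in Lemma 7 refers to path halving, a one-pass alternative to full compression. A minimal sketch, assuming naive linking as in the lemma (Python; names are illustrative):

```python
# Path halving: during find, make every other node on the path point to its
# grandparent. One pass, no recursion, same asymptotic bounds as compression.
parent = list(range(16))   # example universe of 16 elements

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # halve: skip to the grandparent
        x = parent[x]
    return x

def union(x, y):
    rx, ry = find(x), find(y)
    if rx != ry:
        parent[rx] = ry                 # naive linking, as in the lemma
```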

[1]: R. Seidel and M. Sharir. Top-Down Analysis of Path Compression. SIAM J. Computing, Vol. 34, No. 3, 2005, pp. 515-525.

[2]: R. Tarjan and J. van Leeuwen. Worst-case Analysis of Set Union Algorithms. J. ACM, Vol. 31, No. 2, April 1984, pp. 245-281.


I don't know what the amortized running time is, but I can cite one possible reason why in some situations you might want to use both rather than just path compression: the worst-case running time per operation is $\Theta(n)$ if you use just path compression, which is much larger than if you use both union by rank and path compression.

Consider a sequence of $n$ Union operations maliciously chosen to yield a tree of depth $n-1$ (it is just a sequential path of nodes, where each node is the child of the previous node). Then performing a single Find operation on the deepest node takes $\Theta(n)$ time. Thus, the worst-case running time per operation is $\Theta(n)$.
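
To make the bad case concrete, here is a small self-contained sketch (Python; the structure and names are just for illustration) that performs such a malicious union sequence with naive linking and measures how many pointers a single Find on the deepest node walks:

```python
# Worst case for path compression with naive linking: a maliciously
# ordered union sequence degenerates the forest into a path of depth n-1.
n = 1000
parent = list(range(n))

def find(x):
    """Find with path compression; also report how many pointers were walked."""
    root, hops = x, 0
    while parent[root] != root:
        root = parent[root]
        hops += 1
    while parent[x] != root:           # second pass: compress the path
        parent[x], x = root, parent[x]
    return root, hops

def union(x, y):
    rx, _ = find(x)
    ry, _ = find(y)
    if rx != ry:
        parent[rx] = ry                # naive linking: no rank/size comparison

# Each union joins the current root with a fresh element, so every find
# inside union is O(1) and no compression happens while the chain grows.
for i in range(n - 1):
    union(i, i + 1)

_, hops = find(0)
print(hops)   # 999 = n - 1: this single find walks Theta(n) pointers
```

After that one expensive Find the path is compressed, so repeating it is cheap; that is exactly why the amortized bounds in the other answer can still be much smaller than $\Theta(n)$.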

In contrast, with the union-by-rank optimization, the worst-case running time per operation is $O(\log n)$: no single operation can ever take longer than $O(\log n)$. For many applications this won't matter: only the total running time of all operations (i.e., the amortized running time) matters, not the worst-case time for a single operation. In some cases, though, the worst-case time per operation does matter: for instance, an $O(\log n)$ bound per operation is useful in an interactive application where you want a guarantee that no single operation can cause the application to freeze for a long time, or in a real-time application where you need to ensure that you always meet the real-time guarantees.
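
As a sketch of the contrast (Python; illustrative, not a reference implementation): with union by rank a root of rank $r$ has at least $2^r$ descendants, so no tree is ever deeper than $\log_2 n$ and no single Find can walk more than $O(\log n)$ pointers.

```python
# Union by rank (plus path compression): a root's rank only grows when two
# trees of equal rank are merged, so a root of rank r has >= 2^r nodes and
# tree height never exceeds log2(n).
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:  # always attach the shallower tree...
            rx, ry = ry, rx
        self.parent[ry] = rx               # ...under the deeper one
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```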

D.W.