13

A B-tree is a data structure that looks like this:

[figure: example B-tree]

If I want to look for a specific value in this structure, I need to go through several elements in the root to find the right child node. Then I need to go through several elements in that child node to find its right child node, and so on.

The point is, when I have $n$ elements in every node, then I have to go through all of them in the worst case. So, we have $O(n)$ complexity for searching in one node.
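To make this concrete, here is a rough sketch of the search I have in mind (`BTreeNode` is just a made-up illustration, not real library code):

```python
# Rough sketch (not real library code): linear scan inside each node,
# then descend into the matching child.
class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys stored in this node
        self.children = children or []    # empty list => leaf node

def search(node, target):
    while node is not None:
        i = 0
        # the O(n)-per-node step: walk the keys one by one
        while i < len(node.keys) and target > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == target:
            return True
        # descend into the i-th child, or stop at a leaf
        node = node.children[i] if node.children else None
    return False
```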

Then we must go through all the levels of the structure, and there are $\log_m N$ of them, $m$ being the order of the B-tree and $N$ the total number of elements in the tree. So here we have $O(\log N)$ complexity in the worst case.

Putting this information together, we should have $O(n) \cdot O(\log n) = O(n \log n)$ complexity.

But the complexity is just $O(\log n)$. Why? What am I doing wrong?

Raphael
Eenoku

5 Answers

13

You have introduced both $n$ and $m$ for the order of the B-tree; I will stick to $m$.

The height is in the best case $\lceil \log_m(N + 1) \rceil$ and in the worst case $\lceil \log_{m/2}(N) \rceil$; there is also a saturation (minimum-fill) factor $d$, which you have not mentioned.
In either case the height is $O(\log N)$; notice that $m$ disappears, because it only contributes a constant factor.
Now, at every node you have at most $m$ sorted elements, so you can perform a binary search costing $O(\log_2 m)$, and the proper complexity is $O(\log N \cdot \log m)$.
Since $m \ll N$ and, more importantly, $m$ does not depend on $N$, it should not be mixed into $N$; either treat it as a constant or state the bound explicitly in terms of $m$ (with $m$, not $N$ or your $n$).
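As a rough, hypothetical sketch (the `Node` type with sorted `keys` and a `children` tuple is my own illustration, not anything from the question), the descent with a per-node binary search might look like this:

```python
from bisect import bisect_left
from collections import namedtuple

# Hypothetical node: sorted `keys`, and `children` (empty tuple for leaves).
Node = namedtuple("Node", ["keys", "children"])

def search(node, target):
    while node is not None:
        i = bisect_left(node.keys, target)                   # O(log m) inside the node
        if i < len(node.keys) and node.keys[i] == target:
            return True
        node = node.children[i] if node.children else None   # one level down
    return False

# Tiny example tree: one root and two leaves.
leaf_a = Node(keys=[1, 3, 5], children=())
leaf_b = Node(keys=[9, 11, 13], children=())
root = Node(keys=[7], children=(leaf_a, leaf_b))
print(search(root, 11), search(root, 4))   # True False
```

Here the loop runs $O(\log_m N)$ times and each `bisect_left` call costs $O(\log m)$ comparisons, which is exactly the $O(\log N \cdot \log m)$ bound above.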

KLD
Evil
2

The point is, when I have n elements in every node, then I have to go through all of them in the worst case. So, we have O(n) complexity for searching in one node.

No. You would do a binary search in the node, so the complexity of searching in a node is $O(\log n)$, not $O(n)$.
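For instance, with Python's standard `bisect` module (just an illustration of searching one node's sorted key array):

```python
from bisect import bisect_left

keys = [2, 5, 8, 12, 17, 23]                 # sorted keys of a single node
i = bisect_left(keys, 12)                    # binary search: O(log n) comparisons
print(i, i < len(keys) and keys[i] == 12)    # 3 True
```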

TilmannZ
2

Considering this as an order-$m$ B-tree, whether or not you take $m$ to be a constant, worst-case search takes $\Theta(\lg N)$ total comparisons ($N$ values total). As is stated in another answer (as a newbie, I cannot comment on it yet), the height of the tree is about $\log_m N = (\lg N)/(\lg m)$. Especially if you are taking $m$ to be variable, it is assumed that you will do a logarithmic search per node, of order $O(\lg m)$. Multiplying those terms, $\log_m N \cdot \lg m = ((\lg N) / (\lg m)) \cdot \lg m = \lg N$; you don't have to drop the $\lg m$ term using big-O, they really do cancel.
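For a concrete illustration (numbers made up): with $N = 10^6$ keys and $m = 1000$, the height is $\log_{1000} 10^6 = 2$, each node costs about $\lg 1000 \approx 10$ comparisons, and the total is $2 \cdot 10 = 20 \approx \lg 10^6$ comparisons, exactly the cancellation above.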

For most (but not all) analysis of external-memory algorithms, the page size is not treated as a constant. It isn't wrong to do so, but you generally get more information if you don't. One of the difficult things about external-memory algorithms is that you are usually trying to optimize (at least) two different things at once: overall operations, and page accesses, which are so inefficient that you might want to minimize them even if it means paying some extra in other operations. A B-tree is so elegant because, even when you consider the page size as a variable, it is asymptotically optimal for operations among comparison-based structures, and it simultaneously optimizes page accesses, at $O(\log_m N)$ per search. Notice how uninteresting that last fact becomes if we just treat $m$ as a constant: of course an $O(\lg N)$-comparison search would use $O(\lg N)$ page references. $O(\log_m N)$ is much more informative.

0

If you have $n$ elements in every node, the total number of elements in the tree grows exponentially with the height (roughly by a factor of $n$ per level). In complexity analysis, $n$ is the total number of elements in the whole tree, so if your tree is balanced there is no way that a single node holds $n$ elements.

Assume the tree in your question has 4 elements in every node: a root with 4 elements and 5 children with 4 elements each, so $N = 24$ elements in total. In the worst case you search the root and the single child that can contain the searched element, so you inspect only 8 of the 24 elements, which is still $O(\log N)$.

atakanyenel
0

You can have worst-case complexity $O(n)$ if

1) the number of keys per node is unlimited, all the keys end up in one node and for some reason the tree is not rebalanced, and

2) the keys in one node are accessed sequentially, and not in some more efficient way.

That would be a terrible way to implement a B-tree, and even in this case, it's still only the worst case complexity. You are partially right though ;-)
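As a caricature (purely illustrative, not how anyone should implement it), such a degenerate "tree" is just one unbounded node scanned sequentially:

```python
# Degenerate case: everything in one node, searched sequentially -> O(n).
node_keys = [42, 7, 19, 3, 88]     # never split, never rebalanced, not even sorted

def search(target):
    for k in node_keys:            # sequential access, O(n) in the worst case
        if k == target:
            return True
    return False
```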