
Prove that in a binary heap, the buildheap function performs at most $2N-2$ comparisons. I don't know how to prove it and would appreciate a hint. Thanks.

The buildheap procedure: we have n elements, which we first place in the tree as they are. We then start from the last node that is not a leaf (index floor(n/2)) and work back to the root, calling heapify on each node. Each step of heapify takes 2 comparisons: one to pick the larger of the two children (the smaller one for a min-heap) and one to compare the node with that child. Heapify is recursive and keeps going until the node reaches its right place.

buildheap(A):
    for i ← floor(length(A) / 2) downto 1
        Max-Heapify(A, i)

Max-Heapify(A, i):
    left ← 2×i
    right ← 2×i + 1
    largest ← i

    if left ≤ length(A) and A[left] > A[largest] then:
        largest ← left

    if right ≤ length(A) and A[right] > A[largest] then:
        largest ← right

    if largest ≠ i then:
        swap A[i] and A[largest]
        Max-Heapify(A, largest)
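To experiment with the bound, the pseudocode can be instrumented to count key comparisons. The following is only a sketch: the names `max_heapify`, `build_heap` and `worst_case_comparisons` are made up for illustration, the recursion is written as a loop (the comparison count is the same), and the brute-force search over all orderings is only practical for very small n.

```python
from itertools import permutations


def max_heapify(a, i, n, counter):
    """Sift a[i] down in the 1-indexed max-heap a[1..n]; counter[0] counts key comparisons."""
    while True:
        left, right, largest = 2 * i, 2 * i + 1, i
        if left <= n:
            counter[0] += 1          # comparison A[left] > A[largest]
            if a[left] > a[largest]:
                largest = left
        if right <= n:
            counter[0] += 1          # comparison A[right] > A[largest]
            if a[right] > a[largest]:
                largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest                  # continue where the recursion would go


def build_heap(values):
    """Build a max-heap; return (heap as a list, number of key comparisons)."""
    n = len(values)
    a = [None] + list(values)        # pad index 0 so children of i are 2i and 2i+1
    counter = [0]
    for i in range(n // 2, 0, -1):
        max_heapify(a, i, n, counter)
    return a[1:], counter[0]


def worst_case_comparisons(n):
    """Brute force: maximum comparison count over all orderings of n distinct keys."""
    return max(build_heap(p)[1] for p in permutations(range(n)))
```

For instance, `build_heap([1, 3, 2, 0])` uses 4 comparisons, and `worst_case_comparisons(n)` can be compared against 2n-2 for n up to 7 or 8 before the number of permutations gets too large.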

I wrote down the worst-case number of comparisons for small heaps:

nodes (n)     comparisons
 1                 0
 2                 1
 3                 2
 4                 4
 5                 6
 6                 7
 7                 8
 8                 11
 9                 14

I think the worst case is when a new level starts with a single node (3, 5, 9, ... nodes), because then the count is 2 less than the most we can have.

But I have another idea:

We have n/2 nodes that don't move down,

n/4 nodes that move one level down,

n/8 nodes that move 2 levels down, and so on.

Each move from one level to the next needs 2 comparisons, so we have

$\frac{n}{2} \times 0 + \frac{n}{4} \times 1 + \frac{n}{8} \times 2 + \cdots = n \times \left(\frac{1}{4} + \frac{2}{8} + \frac{3}{16} + \frac{4}{32} + \cdots\right)$

The sum $\frac{1}{4} + \frac{2}{8} + \frac{3}{16} + \frac{4}{32} + \cdots$ is $1$, and it gets multiplied by 2 (because of the 2 comparisons for each move), so this gives $2n$ rather than $2n-2$.
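A quick way to look at this series is to compute its partial sums exactly. This is a minimal sketch with a made-up helper name (`partial_sum`); it only illustrates that a finite number of terms always stays strictly below 1, and it does not model the exact number of nodes on each level of a real heap.

```python
from fractions import Fraction


def partial_sum(levels):
    """Exact value of 1/4 + 2/8 + ... + levels / 2**(levels + 1)."""
    return sum(Fraction(k, 2 ** (k + 1)) for k in range(1, levels + 1))


for levels in (1, 2, 3, 10, 20):
    s = partial_sum(levels)
    # the gap to 1 shrinks quickly but never reaches zero
    print(levels, s, float(1 - s))
```

The partial sum over K terms equals 1 - (K+2)/2^(K+1), so with finitely many levels the heuristic total stays below 2n.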

Did I make a mistake, or is the idea wrong?

negar

2 Answers


Floyd's method of constructing a heap consists of applying heapify to all non-leaf nodes, from the last one back to the root. In the worst case, running heapify on a node $v$ costs twice the height of the subtree rooted at $v$ (where the height of a leaf is 0), with only one exception: if the unique deepest path terminates at an only child, then the cost is twice the height minus 1.

Consider now a heap on $n$ elements, and suppose that $2^h \leq n < 2^{h+1}$. Let us first compute the total number of comparisons when ignoring the very last level. In other words, we are considering the case of a complete binary tree with $h$ levels, containing $2^h-1$ nodes in total. The total sum of heights is $$ \begin{aligned} &(h-1)2^0 + (h-2)2^1 + (h-3)2^2 + \cdots + 1\cdot 2^{h-2} + 0\cdot 2^{h-1} \\ &= (2^0 + 2^1 + \cdots + 2^{h-2}) + (2^0 + \cdots + 2^{h-3}) + \cdots + (2^0) \\ &= (2^{h-1} - 1) + (2^{h-2} - 1) + \cdots + (2^1 - 1) \\ &= 2^h - h - 1. \end{aligned} $$ The total number of comparisons is twice that.
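The closed form for the sum of heights is easy to sanity-check numerically. A small sketch, with `height_sum_perfect` being a name chosen here for illustration:

```python
def height_sum_perfect(h):
    """Sum of node heights in a perfect binary tree with h levels (leaves have height 0)."""
    # Depth d (root at depth 0) holds 2**d nodes, each of height h - 1 - d.
    return sum((h - 1 - d) * 2 ** d for d in range(h))


for h in range(1, 6):
    print(h, height_sum_perfect(h), 2 ** h - h - 1)
```

Both columns printed for each h should agree, matching the sum $2^h - h - 1$ derived above.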

The actual heap contains $m = n - 2^h + 1$ nodes at level $h+1$. These increase the sum of heights by 1 for each node which contains one of these $m$ nodes as a descendant. At depth $d$ (where the root is depth zero), there are $\lceil m/2^{h-d} \rceil$ such nodes. This shows that if $n$ is odd, the number of comparisons is exactly $$ 2(2^h-h-1) + 2\sum_{d=0}^{h-1} \left\lceil \frac{n-2^h+1}{2^{h-d}} \right\rceil. $$

If $n$ is even, then there are some nodes in which the unique deepest path terminates at an only child. To determine how many, let $x$ denote the last leaf, which has no sibling. The node $x$ is always the unique child of its parent. Going one level up, $x$ is the unique deepest descendant of its grandparent iff $n$ is divisible by 4. Continuing in this way, it is not hard to check that if $2^r$ is the largest power of 2 dividing $n$, then the number of such nodes is $r$, and so we have to subtract $r$ from the above formula: $$ 2(2^h-h-1) + 2\sum_{d=0}^{h-1} \left\lceil \frac{n-2^h+1}{2^{h-d}} \right\rceil - r. $$

We can simplify this formula using $2^h/2^{h-d} = 2^d$, to get $$ 2\sum_{d=0}^{h-1} \left\lceil \frac{n+1}{2^{h-d}} \right\rceil - 2h - r. $$ Replacing the ceiling with a floor (using $\lceil (n+1)/k \rceil = \lfloor n/k \rfloor + 1$) and changing the order of summation, this simplifies to $$ 2\sum_{d=1}^h \left\lfloor \frac{n}{2^d} \right\rfloor - r. $$

This is always at most $$ 2\sum_{d=1}^h \frac{n}{2^d} < 2n, $$ and so at most $2n-1$. In fact, the same argument shows that the sum itself is less than $n$, and so at most $n-1$, hence the number of comparisons is at most $2n-2$.
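The final formula can be cross-checked against a direct, structural computation of the worst case. The sketch below assumes the per-node cost model from the start of this answer (two comparisons per level with two children, one comparison at an only child); the function names are hypothetical and chosen for this illustration.

```python
def worst_heapify_cost(v, n):
    """Worst-case comparisons of heapify at node v in a 1-indexed heap of n nodes."""
    left, right = 2 * v, 2 * v + 1
    if left > n:           # leaf: nothing to compare
        return 0
    if right > n:          # only child: one comparison, and that child is a leaf
        return 1
    # two children: two comparisons, then continue into the costlier subtree
    return 2 + max(worst_heapify_cost(left, n), worst_heapify_cost(right, n))


def worst_build_cost(n):
    """Sum of the per-node worst cases over all non-leaf nodes."""
    return sum(worst_heapify_cost(v, n) for v in range(1, n // 2 + 1))


def floor_sum_formula(n):
    """2 * sum_{d=1..h} floor(n / 2^d) - r, where 2^r is the largest power of 2 dividing n."""
    h = n.bit_length() - 1              # 2^h <= n < 2^(h+1)
    r = (n & -n).bit_length() - 1       # exponent of 2 in n
    return 2 * sum(n >> d for d in range(1, h + 1)) - r
```

Comparing `worst_build_cost(n)` with `floor_sum_formula(n)` over a range of n is a cheap way to catch an off-by-one in the derivation.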

We can also get a more explicit expression for this quantity. Let the binary expansion of $n$ be $b_h b_{h-1} \ldots b_0$, where $b_h = 1$. The sum equals $$ \begin{aligned} &(b_h \ldots b_1)_2 + (b_h \ldots b_2)_2 + \cdots + (b_h)_2 \\ &= b_h (2^{h-1} + \cdots + 1) + b_{h-1} (2^{h-2} + \cdots + 1) + \cdots + b_0 (0) \\ &= b_h (2^h - 1) + b_{h-1} (2^{h-1} - 1) + \cdots + b_0 (1 - 1) \\ &= n - (b_h + \cdots + b_0). \end{aligned} $$ Thus, if we denote by $\sigma$ the sum of all digits in the binary representation of $n$, then we get $$ 2n - 2\sigma - r, $$ a formula also appearing on Wikipedia. Since $\sigma \geq 1$, this is always at most $2n - 2$.
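The digit-sum form is convenient to evaluate with bit operations. A minimal sketch (the helper names `digit_formula` and `halving_sum` are introduced here, not taken from any library):

```python
def digit_formula(n):
    """2n - 2*sigma - r, with sigma the binary digit sum of n and 2^r the largest power of 2 dividing n."""
    sigma = bin(n).count("1")
    r = (n & -n).bit_length() - 1    # n & -n isolates the lowest set bit
    return 2 * n - 2 * sigma - r


def halving_sum(n):
    """floor(n/2) + floor(n/4) + ... (terms vanish once 2^d > n)."""
    total, n = 0, n >> 1
    while n:
        total += n
        n >>= 1
    return total
```

Here `halving_sum(n)` should equal `n - bin(n).count("1")`, which is the identity used in the derivation above, and `digit_formula(n)` never exceeds 2n-2.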

Yuval Filmus

Focus on the final step of the process, the heapify call at the root. At this point we assume that the two child heaps are already built and properly organized. The cost to "fix" the heap at this stage is at most 2 log(n) - 2, that is, 2(log(n) - 1). In addition, we need to include the cost of building the two children. In total: T(n) = 2T(n/2) + 2(log(n) - 1). With T(1) = 0 and n a power of two, this solves to T(n) = 2n - 2 log(n) - 2, which is at most 2n - 2. For the method, see: Solving T(n) = 2T(n/2) + log n with the recurrence tree method

                         root: 2(log(n) - 1)
    fixed heap: 2(log(n/2) - 1)         fixed heap: 2(log(n/2) - 1)
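The recurrence can be evaluated directly for powers of two. This is only a sketch of the idealized model: the base case T(1) = 0 is an assumption made here, and splitting n into two halves of size n/2 is a simplification of the real subtree sizes.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def T(n):
    """T(n) = 2*T(n/2) + 2*(log2(n) - 1) for n a power of two, with T(1) = 0 assumed."""
    if n == 1:
        return 0
    log_n = n.bit_length() - 1       # exact log2 for a power of two
    return 2 * T(n // 2) + 2 * (log_n - 1)


for k in range(1, 6):
    n = 2 ** k
    print(n, T(n), 2 * n - 2 * k - 2)
```

Under these assumptions the two printed columns agree, so the values follow 2n - 2 log(n) - 2 and stay below 2n - 2.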