Let $T$ be a tree with $n$ vertices, having height $h$. If there are any internal vertices in $T$ at levels less than $h - 1$ that do not have two children, take a leaf at level $h$ and move it to be such a missing child. This only lowers the average depth of a leaf in this tree, and since we are trying to prove a lower bound on the average depth, it suffices to prove the bound for the resulting tree. Repeat this process until there are no more internal vertices of this type. As a result, all the leaves are now at levels $h - 1$ and $h$. Now delete all vertices at level $h$. This changes the number of vertices by at most (one more than) a factor of two and so has no effect on a big-Omega estimate (it changes $\log n$ by at most 1). Now the tree is complete, and it has $2^{h-1}$ leaves, all at depth $h - 1$, where now $n = 2^h - 1$. The desired estimate follows.

The statement above is the answer from the textbook, but I couldn't understand it. Can anyone give me a more explicit explanation?
- Can you be more precise about which parts you understand and which parts you don't? – Kimball May 18 '15 at 13:11
- "As a result, all the leaves are now at levels $h - 1$ and $h$ ... Now the tree is complete, ..." @Kimball – ZWHmepsy May 19 '15 at 02:03
2 Answers
We can do much better than this lower bound and compute the exact asymptotics.
Remark. The computation that follows gives the average depth of the internal nodes, not of the leaves; the leaf case is treated in the addendum further down.
Note that the functional equation for binary trees classified by height is $$T(z, u) = 1 + uzT(uz, u)^2.$$
As we are interested in the average we need to compute $$G(z) = \left.\frac{\partial}{\partial u} T(z, u)\right|_{u=1}.$$
We differentiate the functional equation, getting $$\frac{\partial}{\partial u} T(z, u) = z T(uz, u)^2 + 2uz T(uz, u) \left(z\frac{\partial T}{\partial z}(uz, u) + \frac{\partial T}{\partial u}(uz, u)\right),$$ where the partial derivatives on the right are evaluated at the point $(uz, u)$.
Writing $T(z)$ for the solution of $T(z) = 1 + z T(z)^2$, the generating function of the Catalan numbers $$C_n = \frac{1}{n+1} {2n\choose n}$$ we set $u=1$ to obtain $$G(z) = zT(z)^2 + 2z T(z) (z T'(z) + G(z))$$ which yields $$G(z) = z\frac{T(z)^2+2z T(z) T'(z)}{1-2z T(z)}.$$
From the functional equation we have $$T'(z) = T(z)^2 + 2z T(z) T'(z)$$ or $$T'(z) = \frac{T(z)^2}{1-2zT(z)}$$ so $G(z)$ becomes $$G(z) = z\frac{T(z)^2+2z T(z)^3/(1-2zT(z)) }{1-2z T(z)} \\ = z\frac{T(z)^2-2zT(z)^3+2z T(z)^3}{(1-2z T(z))^2} \\ = z\frac{T(z)^2}{(1-2z T(z))^2}.$$
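As a quick cross-check of this closed form (a side computation, assuming the standard explicit solution $T(z) = (1-\sqrt{1-4z})/(2z)$ of $T(z) = 1 + zT(z)^2$): we have $1 - 2zT(z) = \sqrt{1-4z}$ and $zT(z)^2 = T(z) - 1$, so $$G(z) = \frac{T(z)-1}{1-4z} \quad\text{and}\quad [z^n] G(z) = \sum_{k=1}^n C_k 4^{n-k}.$$ For $n=1,2,3$ this gives $1$, $2+4 = 6$ and $5+8+16 = 29$.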
To extract coefficients from this we use Lagrange inversion in the integral $$[z^n] G(z) = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} z\frac{T(z)^2}{(1-2z T(z))^2} \; dz.$$
Put $w=T(z)$ so that $w = 1 + z w^2$ or $$z = \frac{w-1}{w^2} \quad\text{and}\quad dz = \left(\frac{1}{w^2}-2\frac{w-1}{w^3}\right) \; dw.$$
This gives for the integral $$\frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{w^{2n}}{(w-1)^n} \frac{w^2}{(1-2(w-1)/w)^2} \left(\frac{1}{w^2}-2\frac{w-1}{w^3}\right) \; dw \\ = \frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{w^{2n}}{(w-1)^n} \frac{w^4}{(w-2(w-1))^2} \left(\frac{1}{w^2}-2\frac{w-1}{w^3}\right) \; dw \\ = \frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{w^{2n}}{(w-1)^n} \frac{w^4}{(1-(w-1))^2} \frac{1-(w-1)}{w^3} \; dw \\ = \frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{w^{2n+1}}{(w-1)^n} \frac{1}{1-(w-1)}\; dw.$$
Expanding into a series we have $$\frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{1}{(w-1)^n} \sum_{q=0}^{2n+1} {2n+1\choose q} (w-1)^q \sum_{p=0}^\infty (w-1)^p \; dw$$
which yields $$\sum_{q=0}^{n-1} {2n+1\choose q} = -{2n+1\choose n} + \sum_{q=0}^{n} {2n+1\choose q} = -{2n+1\choose n} + 2^{2n}.$$
This gives the sequence $$1, 6, 29, 130, 562, 2380, 9949, 41226, 169766,\ldots$$ which is OEIS A008549 where we learn of additional combinatorial interpretations.
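The closed form can be checked against a direct recurrence. The following is a minimal sketch, not part of the original derivation: the function name `internal_depth_totals` is made up for illustration, and the recurrence is what one gets by differentiating $T_n(u) = u^n \sum_{a+b=n-1} T_a(u) T_b(u)$ at $u=1$, which amounts to counting the root at depth 1.

```python
from math import comb

def internal_depth_totals(n_max):
    """Total internal-node depth (root counted at depth 1), summed over all
    binary trees with n internal nodes, for n = 0..n_max."""
    count = [1]  # count[n]: number of trees with n internal nodes (Catalan numbers)
    total = [0]  # total[n]: internal-node depth summed over all those trees
    for n in range(1, n_max + 1):
        c = t = 0
        for a in range(n):  # a internal nodes in the left subtree, b in the right
            b = n - 1 - a
            c += count[a] * count[b]
            # depths inside the two subtrees, plus one extra level for each of
            # the n internal nodes (the factor u^n in the functional equation)
            t += total[a] * count[b] + count[a] * total[b] + n * count[a] * count[b]
        count.append(c)
        total.append(t)
    return total

totals = internal_depth_totals(9)
closed = [4**n - comb(2 * n + 1, n) for n in range(10)]
print(totals[1:])
print(totals == closed)
```

If the derivation is right, the two lists agree term by term.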
Dividing by $n C_n$, the total number of internal nodes over all trees with $n$ internal nodes, we get for the average $$-\frac{1}{n} C_n^{-1} {2n+1\choose n} + \frac{1}{n} 2^{2n} C_n^{-1}.$$
The second term produces the dominant asymptotics while the first term yields $$-\frac{n+1}{n} \frac{(n!)^2}{(2n)!} \frac{(2n+1)!}{n!(n+1)!} = -\frac{2n+1}{n} = -2 - \frac{1}{n}.$$
Now using the asymptotics of the Catalan numbers which are $$C_n\sim \frac{4^n}{n^{3/2}\sqrt{\pi}}$$
we finally obtain for the average depth the value $$\frac{1}{n} 2^{2n} \frac{n^{3/2}\sqrt{\pi}}{4^n}$$ which simplifies to $$\sqrt{\pi n}.$$
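A numerical sanity check of this asymptotic, as a rough sketch (the helper name `average_internal_depth` is invented here). It evaluates the exact average $(4^n - {2n+1\choose n})/(nC_n)$ and compares it with $\sqrt{\pi n}$. Convergence is slow, because the constant term $-2$ is only of relative size about $2/\sqrt{\pi n}$.

```python
from math import comb, pi, sqrt

def average_internal_depth(n):
    """Exact average from the closed form, divided by n * C_n."""
    catalan = comb(2 * n, n) // (n + 1)
    total = 4**n - comb(2 * n + 1, n)
    return total / (n * catalan)  # exact big integers, one float division

def ratio(n):
    """Exact average divided by the asymptotic estimate sqrt(pi * n)."""
    return average_internal_depth(n) / sqrt(pi * n)

for n in (10, 100, 1000):
    print(n, ratio(n))
```

The printed ratios should increase towards 1 as $n$ grows.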
Addendum, average height of leaves.
Note that the functional equation for binary trees classified by total height of the leaves is (use the fact that a tree on $n$ internal nodes has $n+1$ leaves) $$T(z, u) = 1 + u^2 z T(uz, u)^2.$$
As we are interested in the average we need to compute $$G(z) = \left.\frac{\partial}{\partial u} T(z, u)\right|_{u=1}.$$
We differentiate the functional equation, getting $$\frac{\partial}{\partial u} T(z, u) = 2u z T(uz, u)^2 + 2u^2 z T(uz, u) \left(z\frac{\partial T}{\partial z}(uz, u) + \frac{\partial T}{\partial u}(uz, u)\right),$$ where the partial derivatives on the right are evaluated at the point $(uz, u)$.
We set $u=1$ to obtain $$G(z) = 2z T(z)^2 + 2z T(z) (zT'(z)+G(z))$$ which yields $$G(z) = z\frac{2T(z)^2 + 2zT(z)T'(z)}{1-2zT(z)}.$$
Substituting in the formula for $T'(z)$ yields $$G(z) = z\frac{2T(z)^2 + 2zT(z)^3/(1-2zT(z))}{1-2zT(z)} \\ = z\frac{2T(z)^2 -4zT(z)^3 + 2zT(z)^3}{(1-2zT(z))^2} \\ = z\frac{2T(z)^2 - 2zT(z)^3}{(1-2zT(z))^2} \\ = 2z T(z)^2\frac{1 - zT(z)}{(1-2zT(z))^2}.$$
This time the Lagrange inversion integral is $$[z^n] G(z) = \frac{1}{2\pi i} \int_{|z|=\epsilon} \frac{1}{z^{n+1}} 2z T(z)^2\frac{1 - zT(z)}{(1-2zT(z))^2} \; dz.$$
Use the same substitution as before to obtain for the integral $$\frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{2w^{2n}}{(w-1)^n} w^2 \frac{1-(w-1)/w}{(1-2(w-1)/w)^2} \left(\frac{1}{w^2}-2\frac{w-1}{w^3}\right) \; dw \\ = \frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{2w^{2n}}{(w-1)^n} w \frac{1}{(1-2(w-1)/w)^2} \frac{1-(w-1)}{w^3} \; dw \\ = \frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{2w^{2n}}{(w-1)^n} \frac{1}{(1-(w-1))^2} (1-(w-1)) \; dw \\ = \frac{1}{2\pi i} \int_{|w-1|=\epsilon} \frac{2w^{2n}}{(w-1)^n} \frac{1}{1-(w-1)} \; dw.$$
Re-write this to prepare for coefficient extraction: $$\frac{2}{2\pi i} \int_{|w-1|=\epsilon} \frac{1}{(w-1)^n} \sum_{q=0}^{2n} {2n\choose q} (w-1)^q \sum_{p=0}^\infty (w-1)^p \; dw.$$
This gives $$2\sum_{q=0}^{n-1} {2n\choose q} = 2^{2n} - {2n\choose n}.$$
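The same coefficients can be read off without the contour integral, as an independent check (assuming the standard explicit solution $T(z) = (1-\sqrt{1-4z})/(2z)$). Since $zT(z)^2 = T(z) - 1$ we have $T(z)(1 - zT(z)) = 1$, hence $2zT(z)^2(1-zT(z)) = 2zT(z) = 1 - \sqrt{1-4z}$. With $1 - 2zT(z) = \sqrt{1-4z}$ this gives $$G(z) = \frac{1-\sqrt{1-4z}}{1-4z} = \frac{1}{1-4z} - \frac{1}{\sqrt{1-4z}},$$ and therefore $$[z^n] G(z) = 4^n - {2n\choose n}.$$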
This is the sequence $$0, 2, 10, 44, 186, 772, 3172, 12952, 52666, 213524,\ldots$$ which is OEIS A068551.
Dividing by $(n+1) C_n$, the total number of leaves over all trees with $n$ internal nodes, the lower order term becomes $$-\frac{1}{n+1}{2n\choose n} C_n^{-1} = -1.$$
For the dominant term we get $$\frac{1}{n+1} 2^{2n} \frac{n^{3/2}\sqrt{\pi}}{4^n} \sim \sqrt{\pi n},$$ the same as before.
Concrete verification. These data can be verified with the combstruct package (Maple). This is the code.
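    with(combstruct);

    # gf(n): sum of u^(total leaf depth) over all binary trees with n internal nodes
    gf :=
    proc(n)
        option remember;
        local trees, leaves;

        # Z marks an internal node, V marks a leaf
        trees := { T=Union(V, Prod(Z, Sequence(T, card=2))),
                   Z=Atom, V=Atom };

        # total depth of the leaves of struct, whose root sits at depth height
        leaves :=
        proc(struct, height)
            if type(struct, function) then
                if op(0, struct) = Sequence then
                    return add(leaves(op(q, struct), height+1),
                               q=1..nops(struct));
                else
                    return add(leaves(op(q, struct), height),
                               q=1..nops(struct));
                fi;
            fi;

            if struct = Z then return 0 fi;
            return height;
        end;

        add(u^leaves(t, 0), t in allstructs([T, trees], size=2*n+1));
    end;

    # total leaf depth over all trees: derivative with respect to u at u=1
    f := n -> subs(u=1, diff(gf(n), u));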
The output is

    > seq(f(n), n=1..8);
          2, 10, 44, 186, 772, 3172, 12952, 52666
which confirms the data from above.
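For readers without Maple, the same brute-force check can be sketched in Python. This is an illustrative port, not the original code: trees are represented as nested `(left, right)` tuples with `()` standing for a leaf, and the names `trees`, `leaf_depth_sum` and `f` are chosen here for convenience.

```python
from functools import lru_cache

LEAF = ()  # a leaf is the empty tuple; an internal node is a (left, right) pair

@lru_cache(maxsize=None)
def trees(n):
    """All binary trees with n internal nodes (each internal node has two children)."""
    if n == 0:
        return (LEAF,)
    return tuple((left, right)
                 for a in range(n)              # a internal nodes go to the left
                 for left in trees(a)
                 for right in trees(n - 1 - a))

def leaf_depth_sum(tree, depth=0):
    """Sum of the depths of all leaves of tree, with the root at the given depth."""
    if tree == LEAF:
        return depth
    left, right = tree
    return leaf_depth_sum(left, depth + 1) + leaf_depth_sum(right, depth + 1)

def f(n):
    """Total leaf depth over all binary trees with n internal nodes."""
    return sum(leaf_depth_sum(t) for t in trees(n))

print([f(n) for n in range(1, 9)])
```

The printed list can be compared with the Maple output and with $4^n - {2n\choose n}$.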
A slightly different approach may be consulted at this MSE link.
What is written is not quite correct.
First off, you need to assume something like the tree being binary: in a star graph every vertex except the root is a leaf at depth 1, so the average leaf depth is 1 no matter how large $n$ is.
Now the idea of the argument is, given a binary tree, to balance it out. This means moving the leaves of the tree higher up until one gets a tree of some minimal height, call it $h$. (In the OP, $h$ is overused: it cannot mean both the initial height and the height after this process. For example, think of a path graph, i.e., a tree with only one vertex at each depth.) This process can only lower the average depth of the leaves, and, as in the link, $h$ is now within one of $\log_2(n)$.
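The balancing idea can be tested numerically on small cases. The sketch below is only an illustration under a simplifying assumption: it restricts to trees in which every internal vertex has exactly two children, and the function name `min_leaf_depth_sums` is made up here. It computes the smallest possible sum of leaf depths for each number of leaves $L$ by trying every split between the two subtrees of the root, and compares the resulting minimal average with $\log_2 L$.

```python
from math import log2

def min_leaf_depth_sums(max_leaves):
    """best[L] = smallest possible sum of leaf depths over binary trees with
    L leaves in which every internal vertex has exactly two children."""
    best = [0, 0]  # index 0 is unused; a single leaf sits at depth 0
    for L in range(2, max_leaves + 1):
        # k leaves go to the left subtree, L - k to the right; hanging both
        # under a new root pushes every one of the L leaves down one level
        best.append(min(best[k] + best[L - k] for k in range(1, L)) + L)
    return best

best = min_leaf_depth_sums(64)
for L in (2, 3, 8, 10, 64):
    print(L, best[L] / L, log2(L))
```

In each printed row the minimal average leaf depth should be at least $\log_2 L$, with equality when $L$ is a power of two, which is the balanced tree the argument reduces to.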