5

Given n, I want to randomly generate a binary tree (unlabelled) that has n end nodes. Could someone kindly provide a reference containing an algorithm for doing that?

I attempted to do as follows: From a PRNG obtain n PRNs in [0.0, 1.0) as (relative) frequencies of n symbols for generating a Huffman tree (used in data compression). But, if the PRNs used are uniform, then I think this would highly favour generation of those Huffman trees that are more flat and Huffman trees corresponding to widely different frequencies of the symbols would be highly suppressed in the generation process. If this is correct, how could one do better? Thanks in advance.

Raphael
  • 73,212
  • 30
  • 182
  • 400
Mok-Kong Shen
  • 535
  • 4
  • 11

2 Answers2

12

Binary trees are counted by Catalan numbers: the number of "full" binary trees (each node has either 0 or 2 children) with $n$ leaves is $C_{n-1}$. The Catalan numbers are given by an explicit formula, and also satisfy the recurrence $$ C_{n+1} = \sum_{i+j = n} C_i C_j, $$ which follows directly from the interpretation as counting binary trees. You can use this formula to decode a number in $[0,C_{n-1}) = \{0,\ldots,C_{n-1}-1\}$ to a binary tree. If the index is chosen randomly, you obtain a random binary tree.

The decoding procedure is recursive. First, we partition $[0,C_{n-1})$ into intervals of lengths $C_0 C_{n-2}, C_1 C_{n-3}, \ldots, C_{n-2} C_0$. If the number falls in the interval of length $C_i C_j$, then the left subtree will have $i+1$ leaves, and the right subtree will have $j+1$ leaves. Within the interval, we have an index in the range $[0,C_i C_j)$, which we decode to a pair of indices $x \in [0,C_i)$ and $y \in [0,C_j)$. We recursively compute which tree with $i+1$ leaves is encoded by $x$ and which tree with $j+1$ leaves is encoded by $y$. The recursion stops when $n = 1$, in which case the tree consists of just the root.

Yuval Filmus
  • 280,205
  • 27
  • 317
  • 514
1

I have found an algorithm on p.16 of the book: L. Alonso, R. Schott, Random Generation of Trees, Kluwer, 1995. Note however that the C code for it on p.191-192 of the book is erroneous (from communication of the first author). (I have coded the algorithm in Python and included it in a program of my own: http://s13.zetaboards.com/Crypto/topic/7164646/1/)

Mok-Kong Shen
  • 535
  • 4
  • 11