Knowing the frequencies of each symbol, is it possible to determine the maximum height of the tree without applying the Huffman algorithm? Is there a formula that gives this tree height?
3 Answers
Huffman coding gets within one bit of the entropy of the source. This means that if you calculate the entropy of your symbol frequencies, you will be within one bit of the average codeword length (i.e., the average depth of a leaf, which is not the same as the height of the tree). You can use this average to estimate typical codeword lengths, or you can use combinatorial methods to get deterministic bounds on the longest one.
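As a rough illustration of the entropy estimate, here is a minimal Rust sketch. The function names `entropy_bits` and `average_length_bounds` are made up for this example, and it assumes the frequencies are given as raw integer counts. It relies on the standard result that the average Huffman codeword length $L$ satisfies $H \le L < H + 1$, where $H$ is the entropy in bits.

```rust
/// Shannon entropy in bits of a distribution given as raw symbol counts.
/// (Hypothetical helper for illustration.)
fn entropy_bits(counts: &[u64]) -> f64 {
    let total: u64 = counts.iter().sum();
    counts
        .iter()
        .filter(|&&c| c > 0) // a zero count contributes nothing to the sum
        .map(|&c| {
            let p = c as f64 / total as f64;
            -p * p.log2()
        })
        .sum()
}

/// Bounds (H, H + 1) on the average Huffman codeword length L,
/// using H <= L < H + 1. This says nothing about the longest codeword.
fn average_length_bounds(counts: &[u64]) -> (f64, f64) {
    let h = entropy_bits(counts);
    (h, h + 1.0)
}
```

For counts `[1, 1, 2]` the entropy is 1.5 bits, and the Huffman code (lengths 2, 2, 1) has an average length of exactly 1.5 bits, so the lower bound is met in that case.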
According to this paper,
it is shown that if $p$ is in the range $0 < p \leq 1/2$, and if $K$ is the unique index such that $1/F_{K+3} < p \leq 1/F_{K+2}$, where $F_K$ denotes the $K$-th Fibonacci number (with $F_1 = F_2 = 1$), then the longest Huffman codeword for a source whose least probability is $p$ has length at most $K$, and no better bound is possible.
The index $K$ can be computed directly, for example in Rust:

    use std::io;

    /// Returns the K with 1/F_{K+3} < p <= 1/F_{K+2} (F_1 = F_2 = 1),
    /// or None if p lies outside the range (0, 1/2].
    fn find_k(p: f64) -> Option<usize> {
        if !(p > 0.0 && p <= 0.5) {
            return None; // the bound is only stated for 0 < p <= 1/2
        }
        // Invariant: lo = F_{k+2}, hi = F_{k+3}; start at k = 1 (F_3 = 2, F_4 = 3).
        // f64 is used so that a very small p cannot overflow an integer type.
        let (mut lo, mut hi) = (2.0_f64, 3.0_f64);
        let mut k = 1;
        // p <= 1/lo always holds here; advance until 1/hi < p as well.
        while 1.0 / hi >= p {
            let next = lo + hi;
            lo = hi;
            hi = next;
            k += 1;
        }
        Some(k)
    }

    fn main() {
        // Read the least symbol probability p from standard input.
        let mut input = String::new();
        io::stdin().read_line(&mut input).expect("failed to read input");
        let p: f64 = input.trim().parse().expect("expected a number");
        match find_k(p) {
            Some(k) => println!("{}", k),
            None => println!("No suitable K found."),
        }
    }
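To make the Fibonacci bound concrete, here is a small worked example. The value $p = 0.1$ is chosen purely for illustration, and the indexing assumes $F_1 = F_2 = 1$, so that $F_6 = 8$ and $F_7 = 13$.

$$
\frac{1}{F_7} = \frac{1}{13} \approx 0.0769 \;<\; 0.1 \;\leq\; 0.125 = \frac{1}{8} = \frac{1}{F_6}
\quad\Longrightarrow\quad K + 2 = 6, \quad K = 4 .
$$

So a source whose least probable symbol has probability $0.1$ cannot have a Huffman codeword longer than 4 bits. The probabilities $0.1, 0.1, 0.1, 0.2, 0.5$ show that 4 can be reached: merging gives $0.2$, then $0.3$, then $0.5$, then $1$, provided the tie between the two weights of $0.2$ is broken in favour of the merged node, and the two original $0.1$ symbols end up at depth 4.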
The pathological case is when the sorted symbol frequencies resemble the Fibonacci sequence. Let $N$ be the number of symbols. For $N > 2$, the maximum possible height is $N - 1$; for $N = 1$ or $N = 2$, it is 1.
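The Fibonacci worst case can be checked with a short Rust sketch. The helper names `huffman_height` and `fibonacci_weights` are invented for this illustration, and the code only tracks subtree heights rather than building the actual codewords. Following the convention above, a single symbol is assumed to have height 1.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Height of a Huffman tree built from the given weights.
/// Only (weight, height) pairs are tracked, not the tree itself.
fn huffman_height(weights: &[u64]) -> u32 {
    if weights.len() <= 1 {
        // Assumed convention: 0 for no symbols, 1 for a single symbol.
        return weights.len() as u32;
    }
    // Min-heap ordered by weight, with ties broken by smaller height.
    let mut heap: BinaryHeap<Reverse<(u64, u32)>> =
        weights.iter().map(|&w| Reverse((w, 0))).collect();
    while heap.len() > 1 {
        // Merge the two lightest subtrees into one.
        let Reverse((w1, h1)) = heap.pop().unwrap();
        let Reverse((w2, h2)) = heap.pop().unwrap();
        heap.push(Reverse((w1 + w2, h1.max(h2) + 1)));
    }
    let Reverse((_, height)) = heap.pop().unwrap();
    height
}

/// The first n Fibonacci numbers 1, 1, 2, 3, 5, ... used as symbol weights.
fn fibonacci_weights(n: usize) -> Vec<u64> {
    let (mut a, mut b) = (1u64, 1u64);
    let mut out = Vec::with_capacity(n);
    for _ in 0..n {
        out.push(a);
        let next = a + b;
        a = b;
        b = next;
    }
    out
}
```

For example, `huffman_height(&fibonacci_weights(6))` uses the weights 1, 1, 2, 3, 5, 8 and returns 5, which is $N - 1$, while eight equal weights give a balanced tree of height 3.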