The classic Huffman algorithm, as Wikipedia states, finds an optimal prefix-free binary code, i.e. one with minimum expected codeword length, given a set of symbols and their weights. Now suppose codewords for some (but not all) symbols have already been assigned, possibly in a suboptimal way. How should we assign codewords to the remaining symbols so as to minimize the expected codeword length? Assume that there exists at least one binary sequence that is not a prefix of any existing codeword and that has no existing codeword as a prefix.
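The assumption that such a binary sequence exists can be tested mechanically. The sketch below is not part of the original question: the function names `kraft_sum` and `has_free_sequence` are hypothetical, codewords are assumed to be Python strings of `0` and `1`, and the existing codewords are assumed to be prefix-free already. Under that assumption, a usable sequence exists exactly when the Kraft sum of the existing codewords is strictly below 1.

```python
from fractions import Fraction


def kraft_sum(codewords):
    """Sum of 2^(-len(w)) over all codewords, as an exact fraction."""
    return sum(Fraction(1, 2 ** len(w)) for w in codewords)


def has_free_sequence(codewords):
    """True if some binary string is neither a prefix of, nor prefixed by,
    any of the given codewords.

    Assumption: the given codewords are already prefix-free. In that case
    such a string exists exactly when the Kraft sum is below 1.
    """
    return kraft_sum(codewords) < 1


# Only "00" is taken, so e.g. "01" and "1" are still available.
print(has_free_sequence(["00"]))
# A complete code leaves no room for further codewords.
print(has_free_sequence(["0", "10", "11"]))
```

Exact fractions are used instead of floats so that the comparison with 1 is not affected by rounding.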

Misha

1 Answer

"Ordinary" Huffman is suboptimal

Let's call a symbol "free" if it has not yet been assigned any codeword. Here's an easy counterexample to the optimality of running the ordinary Huffman algorithm on the set of free symbols:

Suppose we have three symbols $a, b, c$, with frequencies $0.5, 0.25, 0.25$, of which $b$ and $c$ are free; $a$ has already been assigned the codeword $00$. Then the unique minimum-size tree in which $b$ and $c$ -- the lowest-frequency free symbols -- are siblings is

  /\
 / /\
a b  c

with expected codeword length $0.5 \cdot 2 + 0.25 \cdot 2 + 0.25 \cdot 2 = 2$, which is strictly greater than that of either of the two trees in which they are not siblings:

  /\        /\
 /\ c      /\ b
a  b      a  c

which each have expected codeword length $0.5 \cdot 2 + 0.25 \cdot 2 + 0.25 \cdot 1 = 1.75$.
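The arithmetic for this counterexample can be checked with a short script. This is only a sketch: the helper names (`is_prefix_free`, `expected_length`, `candidates`) are made up for illustration, and the brute-force search is limited to codewords of at most 3 bits, an assumption that is enough for this tiny instance because longer codewords can only increase the cost.

```python
from fractions import Fraction
from itertools import product


def is_prefix_free(words):
    """True if no word is a prefix of (or equal to) a different word."""
    words = list(words)
    return not any(
        i != j and words[j].startswith(words[i])
        for i in range(len(words))
        for j in range(len(words))
    )


def expected_length(code, freq):
    """Expected codeword length of a symbol -> codeword mapping."""
    return sum(freq[s] * len(w) for s, w in code.items())


def candidates(max_len):
    """All binary strings with 1..max_len bits."""
    for n in range(1, max_len + 1):
        for bits in product("01", repeat=n):
            yield "".join(bits)


freq = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
fixed = {"a": "00"}  # the pre-assigned codeword from the example

# b and c as siblings (first tree) versus b and c split up (second trees).
siblings = {**fixed, "b": "10", "c": "11"}
split = {**fixed, "b": "01", "c": "1"}

# Brute force over all assignments for b and c that keep the code prefix-free.
best = min(
    (expected_length({**fixed, "b": wb, "c": wc}, freq), wb, wc)
    for wb in candidates(3)
    for wc in candidates(3)
    if is_prefix_free([fixed["a"], wb, wc])
)

print(expected_length(siblings, freq), expected_length(split, freq), best[0])
```

The brute force confirms that no prefix-free assignment for $b$ and $c$ beats the non-sibling trees, so forcing the two lowest-frequency free symbols to be siblings costs strictly more here.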

j_random_hacker