Why is this code uniquely decodable?

Question

Source alphabet: $\{a, b, c, d, e, f\}$

Code alphabet: $\{0, 1\}$

$a\colon 0101$
$b\colon 1001$
$c\colon 10$
$d\colon 000$
$e\colon 11$
$f\colon 100$

I thought that for a code to be uniquely decodable, it had to be prefix-free. But in this code, the codeword for $c$ is a prefix of the codeword for $f$, for example, so it is not prefix-free. However, my textbook tells me that its reverse is prefix-free (I don't understand this), and therefore it is uniquely decodable. Can someone explain what this means, or why it is uniquely decodable? I know it satisfies Kraft's inequality, but that is only a necessary condition, not a sufficient condition.

Yuval Filmus · Accepted Answer · 2019-03-03T14:09:28.913

Your code has the property that if you reverse all codewords, then you get a prefix code. This implies that your code is uniquely decodable.

Indeed, consider any code $C = x_1,\ldots,x_n$ whose reverse $C^R := x_1^R,\ldots,x_n^R$ is uniquely decodable. I claim that $C$ is also uniquely decodable. This is because $$ w = x_{i_1} \ldots x_{i_m} \text{ if and only if } w^R = x_{i_m}^R \ldots x_{i_1}^R. $$ In words, decompositions of $w$ into codewords of $C$ are in one-to-one correspondence with decompositions of $w^R$ into codewords of $C^R$. Since the latter are unique, so are the former.
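The correspondence between decompositions can be checked on the code from the question. The following is a minimal sketch, not part of the original answer; the helper names `parses` and `reverse_parse` are illustrative assumptions, and the brute-force search is only meant for short words.

```python
def parses(word, code):
    """Return every way to split `word` into a sequence of codewords from `code`."""
    if word == "":
        return [[]]
    result = []
    for cw in code:
        if word.startswith(cw):
            for rest in parses(word[len(cw):], code):
                result.append([cw] + rest)
    return result


def reverse_parse(parse):
    """Map a decomposition of w over C to the matching decomposition of w^R over C^R."""
    return [cw[::-1] for cw in reversed(parse)]


code = ["0101", "1001", "10", "000", "11", "100"]  # C, from the question
rev_code = [cw[::-1] for cw in code]               # C^R

w = "100" + "0101" + "11"                          # the codewords for f, a, e
forward = parses(w, code)                          # decompositions of w over C
backward = parses(w[::-1], rev_code)               # decompositions of w^R over C^R
mapped = [reverse_parse(p) for p in forward]

print(forward)   # [['100', '0101', '11']]
print(backward)  # [['11', '1010', '001']]
```

Reversing each codeword and the order of the codewords turns the single decomposition of $w$ into the single decomposition of $w^R$, as the equivalence above states.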

Since prefix codes are uniquely decodable, it follows that the reverse of a prefix code is also uniquely decodable. This is the case in your example.
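As a quick check of this claim for the code in the question, here is a small sketch (the function name `is_prefix_free` is an illustrative assumption) that tests whether any codeword is a prefix of a different codeword, first for the code and then for its reverse.

```python
def is_prefix_free(code):
    """True if no codeword is a prefix of a different codeword."""
    return not any(a != b and b.startswith(a) for a in code for b in code)


code = ["0101", "1001", "10", "000", "11", "100"]
rev_code = [cw[::-1] for cw in code]

print(is_prefix_free(code))      # False: 10 is a prefix of 100 and of 1001
print(is_prefix_free(rev_code))  # True
```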

The McMillan inequality states that if $C$ is uniquely decodable then $$ \sum_{i=1}^n 2^{-|x_i|} \leq 1. $$ In other words, a uniquely decodable code satisfies Kraft's inequality. Therefore if all you're interested in is minimizing the expected codeword length, there is no reason to look beyond prefix codes.
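For the code in the question, the sum can be evaluated exactly. This is a small sketch using exact fractions; the name `kraft_sum` is an illustrative assumption.

```python
from fractions import Fraction


def kraft_sum(code):
    """Sum of 2^(-|x|) over all codewords x, computed exactly."""
    return sum(Fraction(1, 2 ** len(cw)) for cw in code)


code = ["0101", "1001", "10", "000", "11", "100"]
print(kraft_sum(code))  # 7/8, which is at most 1
```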

Sam Roweis gives in his slides a nice example of a uniquely decodable code which is neither a prefix code nor the reverse of a prefix code: $$ 0,01,110. $$ In order to show that this code is uniquely decodable, it suffices to show how to decode the first codeword of a word. If the word starts with a $1$, then the first codeword is $110$. If it is of the form $01^*$, then it must be either $0$ or $01$. Otherwise, there must be a prefix of the form $01^*0$. We now distinguish several cases:

$$ \begin{array}{c|cccc} \text{prefix} & 00 & 010 & 0110 & 01110 \\\hline \text{codeword} & 0 & 01 & 0 & 01 \end{array} $$ Longer runs of $1$ cannot be decoded at all.
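The case analysis above can be turned into a decoder for the code $0, 01, 110$. The following is a sketch under the assumption that the input is a plain string of `0` and `1` characters; the function names are illustrative, and undecodable inputs raise `ValueError`.

```python
def first_codeword(word):
    """Return the first codeword of a non-empty word over the code {0, 01, 110}."""
    if word.startswith("1"):
        if word.startswith("110"):
            return "110"
        raise ValueError("cannot decode")
    # The word starts with 0; count the run of 1s that follows it.
    k = 0
    while 1 + k < len(word) and word[1 + k] == "1":
        k += 1
    if 1 + k == len(word):
        # The whole word has the form 0 1^k, so it must be 0 or 01.
        if k == 0:
            return "0"
        if k == 1:
            return "01"
        raise ValueError("cannot decode")
    # The word has a prefix of the form 0 1^k 0; apply the table.
    if k in (0, 2):
        return "0"
    if k in (1, 3):
        return "01"
    raise ValueError("cannot decode")


def decode(word):
    """Split a word into codewords by repeatedly peeling off the first one."""
    out = []
    while word:
        cw = first_codeword(word)
        out.append(cw)
        word = word[len(cw):]
    return out


print(decode("0110010"))  # ['0', '110', '01', '0']
```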

score 5 · Answer 2 · answered Mar 04 '19 at 01:09

If I give you any message that you are supposed to decode, then you can do the following: Reverse the message, starting with the last bit instead of the first bit. Reverse the code words. Decode the message. Reverse the decoded string.

You can do that because after reversing the six code words, you get a prefix-free code: 1010, 1001, 01, 000, 11, 001 is prefix free.
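The four steps can be sketched as follows. The `CODE` table layout and the name `decode_via_reverse` are illustrative assumptions; the greedy matching is valid only because the reversed code is prefix-free.

```python
CODE = {"a": "0101", "b": "1001", "c": "10", "d": "000", "e": "11", "f": "100"}


def decode_via_reverse(message, code=CODE):
    """Decode by reading the message backwards against the reversed codewords."""
    rev_table = {cw[::-1]: sym for sym, cw in code.items()}  # reversed codewords
    out = []
    buffer = ""
    for bit in reversed(message):          # read from the last bit to the first
        buffer += bit
        if buffer in rev_table:            # prefix-free: the first match is the only one
            out.append(rev_table[buffer])
            buffer = ""
    if buffer:
        raise ValueError("message is not a concatenation of codewords")
    return "".join(reversed(out))          # undo the reversal of the symbol order


print(decode_via_reverse("10001011011"))  # face
```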

score 0 · Answer 3 · answered Mar 03 '19 at 23:50

If prefix-free means what I think it does, then the reverse of ‘a’, 1010, has the proper prefixes 1, 10, and 101, none of which is a whole codeword of the reversed code.

Therefore, if a message ends with 0101, it can only be an ‘a’ and you can apply similar logic to the preceding bit(s).

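However, what if there is no end to start from? Well, if the first bit is 1, you know it isn’t ‘a’ or ‘d’. The second bit will eliminate ‘e’ or {‘b’,‘c’,‘f’}. After that, a fixed number of further bits is not always enough, because ‘c’ (10) is a prefix of both ‘f’ (100) and ‘b’ (1001): for example, 10000 decodes as ‘c’‘d’ while 100000 decodes as ‘f’‘d’, so you may have to read ahead before committing to a codeword.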

As soon as you get to a unique sequence, you restart the algorithm on the next bit.
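One way to sketch a left-to-right decoder is with backtracking: try each codeword that matches at the current position, and back up if the remainder cannot be parsed. This is an illustrative sketch rather than the procedure described above; the `CODE` table layout and the name `decode_forward` are assumptions, and the plain recursion is only meant for short messages.

```python
CODE = {"a": "0101", "b": "1001", "c": "10", "d": "000", "e": "11", "f": "100"}


def decode_forward(message, code=CODE):
    """Return the decoded string, or None if the message cannot be parsed."""
    if message == "":
        return ""
    for sym, cw in code.items():
        if message.startswith(cw):
            rest = decode_forward(message[len(cw):], code)
            if rest is not None:           # this choice of first codeword works
                return sym + rest
    return None                            # no codeword leads to a full parse


print(decode_forward("10000"))   # cd
print(decode_forward("100000"))  # fd
```

The two messages share their first five bits, yet the first codeword differs, so a forward decoder for this code sometimes has to look past the current codeword before committing to it.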
