
It is known that the number $c(n)$ of phrases/tuples in the Lempel-Ziv parsing of a binary word of length $n$ satisfies $$c(n)\leq\frac{n}{(1-\epsilon_n)\log_2 n},$$ where $\epsilon_n\to 0$ as $n\to\infty$.

The proof is given in Cover & Thomas, Elements of Information Theory (Lemma 12.10.1, page 320 in the linked chapter).

I tried to generalize it to an alphabet of size $k$ by adjusting the proof step by step, but I failed. So here is my question:

How can I prove that the number $c(n)$ of phrases/tuples in the Lempel-Ziv parsing is bounded by $$c(n)\leq\frac{n}{(1-\epsilon_n)\log_k n}$$ for all words of length $n$ over an alphabet of size $k$, with $\epsilon_n\to0$ as $n\to\infty$?
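To make the quantity $c(n)$ concrete, here is a small sketch (my own illustration, not part of the question) that counts the phrases of the LZ78-style parsing, in which each phrase is the shortest prefix of the remaining input not yet in the dictionary, and compares the count against $n/\log_k n$ for a sample $k$-ary word:

```python
import math

def lz78_phrase_count(word):
    """Count phrases in the LZ78 parsing: repeatedly split off the
    shortest prefix of the remainder that is not yet in the dictionary."""
    seen = set()
    phrase = ""
    count = 0
    for ch in word:
        phrase += ch
        if phrase not in seen:
            seen.add(phrase)
            count += 1
            phrase = ""
    if phrase:  # a trailing, already-seen phrase still counts once
        count += 1
    return count

# Example: a word over a ternary alphabet (k = 3); the word is arbitrary.
word = "0120012120010212" * 20
n, k = len(word), 3
c = lz78_phrase_count(word)
print(c, n / math.log(n, k))  # c(n) vs. n / log_k n
```

For instance, `lz78_phrase_count("ABABABA")` parses the word as `A|B|AB|ABA` and returns 4.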

Danny

1 Answer


You don't need to redo the proof for this: simply note that $n$ symbols from an alphabet of size $k$ can be represented with $n \log_2 k$ bits. Applying the binary Lempel-Ziv bound to this bit string gives:

$\mbox{\# phrases} \leq \frac{n \log_2 k}{(1-\epsilon_{n \log_2 k})\log_2(n \log_2 k)}$

Dividing numerator and denominator by $\log_2 k$ then gives:

$\mbox{\# phrases} \leq \frac{n}{(1-\epsilon_{n \log_2 k})\left(\log_k n + \log_2(\log_2 k)/\log_2 k\right)}$

Since the extra term $\log_2(\log_2 k)/\log_2 k$ is nonnegative for $k \geq 2$, dropping it from the denominator only weakens the bound, giving $\mbox{\# phrases} \leq \frac{n}{(1-\epsilon_{n \log_2 k})\log_k n}$. As $\epsilon_{n \log_2 k} \to 0$ when $n \to \infty$, the result follows by taking $\epsilon'_n = \epsilon_{n \log_2 k}$.
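As a quick sanity check of the division step (the values of $n$ and $k$ here are my own arbitrary choices), one can verify numerically that $\log_2(n \log_2 k)/\log_2 k = \log_k n + \log_2(\log_2 k)/\log_2 k$:

```python
import math

n, k = 10**6, 5
# Left side: the denominator of the first bound, divided by log_2 k.
lhs = math.log2(n * math.log2(k)) / math.log2(k)
# Right side: the denominator of the rewritten bound.
rhs = math.log(n, k) + math.log2(math.log2(k)) / math.log2(k)
print(lhs, rhs)  # should agree up to floating-point error
```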

Ari Trachtenberg