I find this a bit difficult to describe, but I am interested in the following idea:
The LZ algorithm factors (verb) an input stream into adjacent factors. By definition, each factor is the longest prefix of the remaining text that also occurs in the text seen so far (or, equivalently, in the concatenation of the previous LZ factors).
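To make that definition concrete, here is a minimal Python sketch of the factorization I have in mind. It rests on a few assumptions of mine: the window is unbounded, a factor must occur entirely inside the text before its own starting position (no self-overlap), and a character that has never been seen becomes a one-character literal factor. The name `lz_factorize` is just a label I made up, and the search is deliberately naive rather than efficient.

```python
def lz_factorize(text: str) -> list[str]:
    """Greedy LZ-style factorization with an unbounded window.

    Each factor is the longest prefix of the remaining text that occurs
    entirely within the text before it; a never-seen character becomes a
    single-character literal factor (an assumption of this sketch).
    """
    factors = []
    i = 0
    n = len(text)
    while i < n:
        length = 0
        # Extend the candidate while it still occurs wholly inside text[:i].
        while i + length < n and text.find(text[i:i + length + 1], 0, i) != -1:
            length += 1
        # No earlier occurrence at all: emit the character as a literal.
        length = max(length, 1)
        factors.append(text[i:i + length])
        i += length
    return factors


if __name__ == "__main__":
    print(lz_factorize("abababab"))  # ['a', 'b', 'ab', 'abab']
```

Concatenating the returned factors always gives back the input, which is a quick sanity check on the sketch.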
I understand that in the long term (given infinite input and an infinite window) this coding scheme can achieve the Shannon limit, in the sense that it will eventually find every repeated pattern that exists.
However, for any given finite text (but with an unbounded window), how optimal is this?
Does the choice of factors earlier in the input have potentially detrimental effects later on? For instance, could LZ settle on a choice of factors that misses certain larger factors, or that misses a factorization giving a better cover of the input (i.e. a choice of factors that covers more of the text)?
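One way I can think of to probe this on small inputs is to compare the number of factors the greedy rule produces with the minimum number achievable by any factorization under the same rules. The sketch below is only an experiment under my own assumptions: a factor is either a single character or a substring that occurs entirely in the text before its start, the window is unbounded, and "better" means "fewer factors" (it says nothing about the encoded bit length). The names `occurs_before`, `greedy_factor_count` and `min_factor_count` are hypothetical helpers, not part of any library.

```python
def occurs_before(text: str, start: int, end: int) -> bool:
    """True if text[start:end] occurs entirely within text[:start]."""
    return text.find(text[start:end], 0, start) != -1


def greedy_factor_count(text: str) -> int:
    """Number of factors when each factor is taken as long as possible."""
    n = len(text)
    i = 0
    count = 0
    while i < n:
        j = i + 1  # a single character is always allowed as a literal
        while j < n and occurs_before(text, i, j + 1):
            j += 1
        count += 1
        i = j
    return count


def min_factor_count(text: str) -> int:
    """Fewest factors over ALL valid factorizations (dynamic programming)."""
    n = len(text)
    best = [0] + [n + 1] * n  # best[j] = fewest factors covering text[:j]
    for j in range(1, n + 1):
        for i in range(j):
            # text[i:j] is a valid factor if it is a literal or seen earlier.
            if j - i == 1 or occurs_before(text, i, j):
                best[j] = min(best[j], best[i] + 1)
    return best[n]


if __name__ == "__main__":
    for sample in ["abababab", "abracadabra", "abcabcabcabc"]:
        print(sample, greedy_factor_count(sample), min_factor_count(sample))
```

If the two counts ever differed on some input, that input would be an example of early greedy choices hurting later ones, at least by this factor-count measure.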
Or is the optimality of LZ constrained only by the window limit and the finite nature of the text? Please provide some kind of hand-waving or intuitive proof.