
Some time ago I stumbled upon the fact that, when repeatedly flipping a fair coin, the expected waiting time until the sequence HH first appears is higher than, for example, the one for HT. This depends on how the sequence overlaps with itself, i.e. on which of its prefixes coincide with its suffixes of the same length.

This led me to formulate a theorem stating that for a finite alphabet $A$ of size $q=|A|$ and a word $S=s_1s_2\dotsc s_k$ over $A$ of length $k=|S|$, and for $c_i=\left[ \left[ s_1s_2\dotsc s_{k-i} = s_{i+1}s_{i+2}\dotsc s_k \right] \right]$ (Iverson brackets), the expected number $N$ of independent, uniformly random symbols of $A$ that must be generated until $S$ first appears is $$\mathbb{E}(N)=\sum_{i=0}^{k-1} q^{k-i}c_i.$$
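As a quick sanity check of the formula, here is a minimal Python sketch that evaluates it directly. The helper names `autocorrelation` and `expected_wait` are made up for illustration, and the symbols are assumed to be drawn independently and uniformly from the alphabet.

```python
def autocorrelation(s):
    """Return [c_0, ..., c_{k-1}], where c_i = 1 iff s_1..s_{k-i} == s_{i+1}..s_k."""
    k = len(s)
    return [int(s[:k - i] == s[i:]) for i in range(k)]


def expected_wait(s, q):
    """Evaluate sum_{i=0}^{k-1} q^(k-i) * c_i for a word s over a q-letter alphabet."""
    k = len(s)
    return sum(q ** (k - i) * c for i, c in enumerate(autocorrelation(s)))


print(expected_wait("HH", 2))   # 6
print(expected_wait("HT", 2))   # 4
print(expected_wait("HTT", 2))  # 8
```

For HH both $c_0$ and $c_1$ equal 1, which gives $4+2=6$, while for HT only $c_0=1$, which gives 4.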

I know a standard proof using the optional stopping theorem for martingales (designing a clever betting game; see for example here or this YouTube video), but I have been trying to prove it from a Markov chain perspective. Suppose the Markov chain describing the progress towards $S$ has been constructed, for example ($\emptyset$)$\rightarrow$ (H) $\rightarrow$ (HT) $\rightarrow$ (HTT), plus the edges from HT to H, from H to itself, and from $\emptyset$ to itself. I know that by computing the fundamental matrix and summing its first row I obtain the expected value. But this is not general: I know the chain can always be constructed, but I have not found any correspondence between the transition or fundamental matrices and the overlap structure of the word.
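The Markov chain computation can be sketched as follows, under the assumption of independent, uniformly distributed symbols. State $j$ means "the last $j$ symbols match the first $j$ symbols of $S$". Instead of inverting $I-Q$ explicitly, the sketch solves $(I-Q)t=\mathbf{1}$ with exact fractions; $t_0$ is then the first-row sum of the fundamental matrix $(I-Q)^{-1}$. The function names `transient_matrix` and `expected_steps` are hypothetical.

```python
from fractions import Fraction


def transient_matrix(s, alphabet):
    """Transient block Q of the chain; state j = 'last j symbols equal s[:j]'."""
    k, q = len(s), len(alphabet)
    Q = [[Fraction(0)] * k for _ in range(k)]
    for j in range(k):
        for a in alphabet:
            w = s[:j] + a
            # Next state: length of the longest suffix of w that is a prefix of s.
            m = next(m for m in range(len(w), -1, -1)
                     if s.startswith(w[len(w) - m:]))
            if m < k:  # m == k is the absorbing state "S has appeared"
                Q[j][m] += Fraction(1, q)
    return Q


def expected_steps(s, alphabet):
    """First-row sum of (I - Q)^{-1}, found by solving (I - Q) t = 1."""
    Q = transient_matrix(s, alphabet)
    k = len(s)
    # Augmented matrix [I - Q | 1], reduced by Gauss-Jordan elimination.
    M = [[Fraction(int(i == j)) - Q[i][j] for j in range(k)] + [Fraction(1)]
         for i in range(k)]
    for col in range(k):
        pivot_row = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [x / pivot for x in M[col]]
        for r in range(k):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return M[0][k]


print(expected_steps("HTT", "HT"))  # 8
print(expected_steps("HH", "HT"))   # 6
```

This only confirms the values numerically for individual words; it does not by itself explain how the overlap structure shows up in the matrices.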

Then I found out, based on this post, that this expected value can also be determined using the generating function $F$ of the words that do not contain $S$, defined by $$ F(z)=\sum_{n=0}^{\infty} f(n)\cdot z^{-n}, $$ where $f(n)$ is the number of words of length $n$ not containing $S$. Indeed, $\mathbb{E}(N)=\sum_{n\ge 0}P(N>n)=\sum_{n\ge 0}f(n)q^{-n}=F(q)$. The paper "String Overlaps, Pattern Matching, and Nontransitive Games" by L. J. Guibas and A. M. Odlyzko states (equation 1.4) that this function is equal to $$ F(z)= \frac{z c(z)}{1+(z-q)c(z)},$$ where $c(z)=\sum_{i=0}^{k-1}c_i z^{k-1-i}$ is the autocorrelation polynomial of $S$, so that $F(q)=q\,c(q)=\sum_{i=0}^{k-1}q^{k-i}c_i$.

There is a proof (page 191) of a more general theorem, but I cannot seem to understand it correctly. What I gathered is that the form of $F(z)$ follows from a system of equations. The first of them follows from $$qf(n)= f(n+1)+f_S(n+1),$$ where $f_S(n)$ is the number of words of length $n$ that end with $S$ and contain no earlier occurrence of it. This is clear: appending a letter to a word of length $n$ without $S$ gives a word of length $n+1$ that either ends with its first occurrence of $S$ or does not contain $S$ at all. Multiplying by $z^{-n}$ and summing over $n\ge 0$ (using $f(0)=1$ and $f_S(0)=0$) gives the corresponding equation for the generating functions, $(z-q)F(z)+zF_S(z)=z$, where $F_S(z)=\sum_{n\ge 0}f_S(n)z^{-n}$ is the generating function of the words that end with $S$ and contain no earlier occurrence of it.

The second equation should somehow link the autocorrelation polynomial $c(z)$ with both generating functions. However, I cannot work it out. Has anybody come across this problem, or does anyone have time to help?
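The recurrence $qf(n)=f(n+1)+f_S(n+1)$ can at least be checked by brute force for small $n$. The following is a minimal sketch; the function name `counts` is made up for illustration, and the enumeration is exponential in $n$, so it is only meant for tiny cases.

```python
from itertools import product


def counts(s, alphabet, n):
    """Brute-force (f(n), f_S(n)) for the word s over the given alphabet."""
    f = f_s = 0
    for letters in product(alphabet, repeat=n):
        w = "".join(letters)
        pos = w.find(s)            # start index of the first occurrence, or -1
        if pos == -1:
            f += 1                 # s does not occur in w at all
        elif pos == n - len(s):
            f_s += 1               # the first occurrence of s ends exactly at position n
    return f, f_s


s, alphabet = "HTT", "HT"
q = len(alphabet)
for n in range(8):
    f_n, _ = counts(s, alphabet, n)
    f_next, fs_next = counts(s, alphabet, n + 1)
    assert q * f_n == f_next + fs_next
print("recurrence holds for n = 0..7")
```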

MatEZ