4

What is the average length of longest substring occurring at more than one position in a uniformly random binary string S of length N ?

For example,

n  answer
1  0/2
2  2/4
3  10/8
4  26/16

For n=4

0000  3
0001  2
0010  1
0011  1
0100  1
0101  2
0110  1
0111  2
1000  2
1001  1
1010  2
1011  1
1100  1
1101  1
1110  2
1111  3
-------
      26

Generalized version: in stead of a binary string, S is a uniformly random string of length N with alphabet of size M.

RobPratt
  • 50,938
VainMan
  • 1,745

1 Answers1

1

By Wikipedia: Longest repeated substring problem , we need to study the average depth of suffix trees of random binary strings. By Analysis of the average depth in a suffix tree under a Markov model, the average depth is $O(\log(N)/h)$, where $h$ is the entropy of the process producing the random bits (and the logarithm is probably base $2$ although all logarithms of $N$ are equivalent under a big-O). If the process produces both 0 and 1 bits uniformly at random, the entropy is $1$ (measured in bits), so $O(\log(N))$. Of course, if the bits aren't uniformly randomly generated, the entropy is lower, the expected depth of the suffix tree increases, and the expected length of the longest repeated substring increases.

Your generalized version is also covered in the paper. Notice that a larger alphabet produces greater entropy produces shorter repeated substrings.

Eric Towers
  • 70,953