
For which alphabet size does constructing a suffix tree take longer: a very small alphabet (because the tree gets deep) or a large one? Or does it depend on the algorithm you use? If it does, how does the alphabet size affect Ukkonen's algorithm?

Raphael
John Smith

1 Answer


A larger alphabet is usually a drawback. However, there are algorithms that can deal with this as long as the alphabet size is $n^{O(1)}$, that is, polynomial in the text length $n$.

Ukkonen's algorithm runs in $O(n)$ time only if the alphabet size is constant; without this assumption it takes $O(n \log n)$ time. The extra factor comes from finding the right outgoing edge at a node, which costs $O(\log \sigma)$ with a balanced search tree over an alphabet of size $\sigma \le n$. However, there are alternatives. You can compute the suffix array of a text in linear time with the DC3 algorithm. This is a super-cool algorithm that can be implemented in 50 lines of readable C++ code, and it is one of my all-time favorites. If you can compare two characters in constant time and the alphabet size is $n^{O(1)}$, then the DC3 algorithm runs in $O(n)$ time.
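To see where the logarithmic factor in the bound for Ukkonen's algorithm comes from, here is a minimal Python sketch of one way a suffix-tree node could store its children when the alphabet is not constant. The class name `SortedChildren` and its methods are hypothetical and only for illustration. It keeps the first characters of the outgoing edges sorted and finds a child by binary search, so a lookup costs $O(\log k)$ for $k$ children, and $k$ is at most the alphabet size.

```python
from bisect import bisect_left


class SortedChildren:
    """Children of one suffix-tree node, keyed by the first edge character.

    Hypothetical helper for illustration. Keys are kept sorted, so a lookup
    is a binary search over at most (alphabet size) entries.
    """

    def __init__(self):
        self.keys = []   # first characters of the outgoing edges, sorted
        self.nodes = []  # child payloads, parallel to self.keys

    def get(self, ch):
        """Return the child reached by character ch, or None if absent."""
        i = bisect_left(self.keys, ch)
        if i < len(self.keys) and self.keys[i] == ch:
            return self.nodes[i]
        return None

    def put(self, ch, node):
        """Insert or overwrite the child reached by character ch."""
        i = bisect_left(self.keys, ch)
        if i < len(self.keys) and self.keys[i] == ch:
            self.nodes[i] = node
        else:
            # Inserting into a Python list shifts elements, which is linear
            # in the number of children; a balanced search tree would keep
            # insertion logarithmic as well.
            self.keys.insert(i, ch)
            self.nodes.insert(i, node)
```

With a constant-size alphabet the same lookup could be a fixed-size array indexed by character, which is why the $\log$ factor disappears in that case.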

Note that you can build the suffix tree from the suffix array in $O(n)$ time once you have the LCP array. Basically, you compute the Cartesian tree of the LCP array and use the suffix array to label the nodes. The LCP array can also be computed with the DC3 algorithm.
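The construction from suffix array plus LCP array can be sketched in Python as follows. This is only an illustration under a few assumptions: the text ends with a unique sentinel character (here `$`) that is smaller than every other character, the suffix array is built by plain sorting instead of DC3, and the LCP array is computed with the algorithm of Kasai et al. instead of inside DC3. All function and class names are made up for this sketch. Only `suffix_tree_from_sa` is the linear-time step described above; it maintains the rightmost path of the tree on a stack, exactly as in a Cartesian-tree construction.

```python
def suffix_array_naive(s):
    """Suffix array by plain sorting. NOT linear time; stands in for DC3."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """Kasai et al.: lcp[i] = common prefix length of suffixes sa[i-1], sa[i]."""
    n = len(s)
    rank = [0] * n
    for i, p in enumerate(sa):
        rank[p] = i
    lcp = [0] * n
    h = 0
    for p in range(n):
        if rank[p] == 0:
            h = 0
            continue
        q = sa[rank[p] - 1]
        while p + h < n and q + h < n and s[p + h] == s[q + h]:
            h += 1
        lcp[rank[p]] = h
        if h > 0:
            h -= 1
    return lcp


class Node:
    def __init__(self, depth, start):
        self.depth = depth   # string depth: length of the path label from the root
        self.start = start   # start of some suffix passing through this node
        self.children = []   # children in lexicographic order


def suffix_tree_from_sa(s, sa, lcp):
    """Build the suffix tree in O(n); assumes s ends with a unique sentinel."""
    n = len(s)
    root = Node(0, 0)
    stack = [root]  # rightmost path, string depths strictly increasing
    for i in range(n):
        last = None
        while stack[-1].depth > lcp[i]:
            last = stack.pop()
        if stack[-1].depth < lcp[i]:
            # Split the edge above `last` with a new internal node of
            # string depth lcp[i]; `last` is the last child of stack[-1].
            mid = Node(lcp[i], last.start)
            stack[-1].children.pop()
            mid.children.append(last)
            stack[-1].children.append(mid)
            stack.append(mid)
        leaf = Node(n - sa[i], sa[i])
        stack[-1].children.append(leaf)
        stack.append(leaf)
    return root


def edge_labels(node, s):
    """Labels of the edges leaving `node`, in lexicographic order."""
    return [s[c.start + node.depth:c.start + c.depth] for c in node.children]
```

For example, with `s = "banana$"`, calling `sa = suffix_array_naive(s)`, `lcp = lcp_array(s, sa)` and `root = suffix_tree_from_sa(s, sa, lcp)` gives a root whose outgoing edges, as reported by `edge_labels(root, s)`, are labelled `$`, `a`, `banana$` and `na`. Each node is pushed and popped at most once, which is where the $O(n)$ bound for this step comes from.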

A.Schulz