What are the effects of the alphabet size on construction algorithms for suffix trees?

Question

For which alphabet size does it take longer to construct a suffix tree: a really small alphabet (because the algorithm has to go deep into the tree) or a large alphabet? Or does it depend on the algorithm you use? If it does, how does the alphabet size affect Ukkonen's algorithm?

A.Schulz · Answer 1 · 2012-11-23T09:43:36.243

A larger alphabet is usually a drawback. However, there are algorithms that can deal with this as long as the alphabet size is $n^{O(1)}$, where $n$ is the length of the text.

Ukkonen's algorithm runs in $O(n)$ time only if the alphabet size is constant; without this assumption it takes $O(n \log n)$ time, because finding the right outgoing edge at a node is no longer a constant-time operation. However, there are alternatives. You can compute the suffix array of a text in linear time with the DC3 algorithm. This is a super-cool fancy algorithm that can be implemented in 50 lines of readable C++ code - one of my all-time favorites. If you can compare two characters in constant time and the alphabet size is $n^{O(1)}$, then the DC3 algorithm runs in $O(n)$ time.
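One way to see why an alphabet of size $n^{O(1)}$ does no harm: integer characters below $n^c$ can be replaced by order-preserving ranks in $[0, n)$ using $c$ stable counting-sort passes, which is linear for constant $c$. The Python sketch below only illustrates that preprocessing idea; it is not the DC3 algorithm itself, and the name `rank_reduce` and the parameter `c` are assumptions made for this example.

```python
def rank_reduce(text, c):
    """Map integer characters in [0, n**c) to order-preserving ranks in [0, n).

    Runs c passes of a stable bucket (counting) sort with base max(n, 2),
    so the total work is O(c * n) bucket operations.
    """
    n = len(text)
    base = max(n, 2)
    if any(ch < 0 or ch >= base ** c for ch in text):
        raise ValueError("characters must lie in [0, base**c)")

    order = list(range(n))   # text positions, to be sorted by character value
    digit = 1                # weight of the digit handled in the current pass
    for _ in range(c):
        buckets = [[] for _ in range(base)]
        for pos in order:    # scanning in current order keeps the sort stable
            buckets[(text[pos] // digit) % base].append(pos)
        order = [pos for bucket in buckets for pos in bucket]
        digit *= base

    # Equal characters receive the same rank; ranks are dense, starting at 0.
    ranks = [0] * n
    rank = 0
    for i, pos in enumerate(order):
        if i > 0 and text[pos] != text[order[i - 1]]:
            rank += 1
        ranks[pos] = rank
    return ranks


# Six characters drawn from an alphabet of size below 6**4 = 1296.
print(rank_reduce([900, 5, 42, 5, 900, 0], 4))  # prints [3, 1, 2, 1, 3, 0]
```

After this step the text is a string over at most $n$ distinct symbols, which is the kind of input a linear-time suffix array construction can sort with counting sort.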

Note that you can obtain the suffix tree from the suffix array in $O(n)$ time once you have the LCP array. Basically, you compute the Cartesian tree of the LCP array and use the suffix array to label the nodes. The LCP array can also be computed with the DC3 algorithm.
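The conversion from suffix array plus LCP array to suffix tree can be sketched with a single stack-based scan, which in effect builds the Cartesian tree of the LCP array with equal minima merged into one node. The Python code below is a minimal sketch under stated assumptions: the suffix array is computed by plain sorting (a slow stand-in, not DC3), the LCP array by Kasai's algorithm (also not DC3), and the text must end in a unique sentinel such as `$` that is smaller than every other character. All names (`Node`, `build_suffix_tree`, and so on) are made up for this illustration.

```python
class Node:
    def __init__(self, depth, start):
        self.depth = depth    # string depth: length of the label from the root
        self.start = start    # start of some suffix that passes through this node
        self.children = []    # empty for leaves


def suffix_array_naive(s):
    # Placeholder: O(n^2 log n) sorting of suffixes, NOT a linear-time method.
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_kasai(s, sa):
    # lcp[r] = length of the common prefix of suffixes sa[r-1] and sa[r]; lcp[0] = 0.
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1
    return lcp


def build_suffix_tree(s, sa, lcp):
    # Assumes s ends with a unique sentinel, so no suffix is a prefix of another.
    n = len(s)
    root = Node(0, 0)
    stack = [root]            # rightmost path of the tree built so far
    for r in range(n):
        last = None
        while stack[-1].depth > lcp[r]:
            last = stack.pop()
        if stack[-1].depth < lcp[r]:
            # Split the edge to `last`: insert an internal node of depth lcp[r].
            inner = Node(lcp[r], last.start)
            stack[-1].children[-1] = inner   # `last` was the rightmost child
            inner.children.append(last)
            stack.append(inner)
        leaf = Node(n - sa[r], sa[r])
        stack[-1].children.append(leaf)
        stack.append(leaf)
    return root


def internal_labels(s, node):
    # Strings spelled by the internal nodes (root excluded), in preorder.
    found = []
    if node.children and node.depth > 0:
        found.append(s[node.start:node.start + node.depth])
    for child in node.children:
        found.extend(internal_labels(s, child))
    return found


text = "banana$"
sa = suffix_array_naive(text)
lcp = lcp_kasai(text, sa)
tree = build_suffix_tree(text, sa, lcp)
print(sa, lcp, internal_labels(text, tree))
```

Each suffix is pushed once and each node is popped at most once, so the scan in `build_suffix_tree` takes $O(n)$ time regardless of the alphabet size; only the two stand-in helpers are slower than linear or differ from what DC3 would produce them with.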
