The $y$-fast trie is a data structure for storing a sorted collection of $n$ integers from the range $[0, M)$. It builds on the $x$-fast trie, which also stores elements in this range.

The space usage of an $x$-fast trie is $O(n \log M)$ words. To reduce the space usage to $O(n)$ words, the $y$-fast trie applies a level of indirection. It partitions the stored keys, in sorted order, into groups of size $\Theta(\log M)$ and places each group into its own balanced binary search tree. To route queries to the correct BST, it stores the maximum element of each group in a "summary" $x$-fast trie. The space usage for the BSTs is then $O(n)$ machine words (each element is stored in exactly one tree), and the space usage for the $x$-fast trie is $O\left( \frac{n}{\log M} \cdot \log M \right) = O(n)$ words, for a total space usage of $O(n)$ words.

I was teaching $y$-fast tries the other day and was thinking about reducing the space usage further. If we count out how many words are needed, we need $O(n)$ words for the summary $x$-fast trie, then $3n$ words to store the integers across the BSTs (specifically, each BST entry stores one word for the integer, plus two for the child pointers). In the static case, where elements aren't added or removed, we could replace all the BSTs with a single sorted array of integers, with the $x$-fast trie storing the indices denoting where the blocks start and stop within that array. That reduces our space usage down to $n$ words for the array, plus $O(n)$ words for the $x$-fast trie.
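To make the static variant concrete, here is a minimal Python sketch of the idea, under some loud assumptions: the class name `StaticBlockedSet` and its parameters `universe_bits` (standing for $\log M$) and `k` are made up for illustration, and the summary is a plain sorted list searched with `bisect` rather than a real $x$-fast trie. That stand-in answers the same routing query, just in $O(\log n)$ time instead of $O(\log \log M)$.

```python
import bisect


class StaticBlockedSet:
    """Static successor structure: one sorted array plus a summary of block maxima.

    Assumption: the summary is a sorted list searched with bisect. It stands in
    for the x-fast trie, which would answer the same routing query faster.
    """

    def __init__(self, keys, universe_bits, k=1):
        self.keys = sorted(set(keys))
        # Block size on the order of log^k M, where log M = universe_bits.
        self.block = max(1, universe_bits ** k)
        # Entry i is the maximum of keys[i * block : (i + 1) * block].
        # Block boundaries are implicit because every block has the same size.
        self.maxima = [
            self.keys[min(start + self.block, len(self.keys)) - 1]
            for start in range(0, len(self.keys), self.block)
        ]

    def successor(self, x):
        """Return the smallest stored key >= x, or None if there is none."""
        # Step 1: route to the first block whose maximum is >= x.
        b = bisect.bisect_left(self.maxima, x)
        if b == len(self.maxima):
            return None
        # Step 2: binary search inside that one block only.
        lo = b * self.block
        hi = min(lo + self.block, len(self.keys))
        i = bisect.bisect_left(self.keys, x, lo, hi)
        return self.keys[i]
```

For example, `StaticBlockedSet([5, 1, 9, 14, 20, 33, 42], universe_bits=3)` uses blocks of three keys, and `successor(10)` is routed to the second block and finds 14 there.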

We could then reduce the space usage further by changing the size of our groups from $\Theta(\log M)$ to, say, $\Theta(\log^2 M)$. The summary $x$-fast trie then holds only $O(\frac{n}{\log^2 M})$ keys, so its space usage drops to $O(\frac{n}{\log M})$ words, and the costs of successor and predecessor queries don't change asymptotically. Overall, this gives a space usage of $n + O(\frac{n}{\log M})$ words. This isn't succinct by the formal definition of a succinct structure, but is still quite nice.

More generally, for any fixed $k > 1$, choosing blocks of size $\Theta(\log^k M)$ gives space usage $n + O(\frac{n}{\log^{k-1} M})$ and query time $O(k \log \log M)$.
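Spelled out, the accounting I have in mind is roughly the following, assuming the usual $O(\log M)$ words per key for an $x$-fast trie and a binary search within each block of the sorted array:

$$\text{space} = \underbrace{n}_{\text{sorted array}} + \underbrace{O\left( \frac{n}{\log^k M} \cdot \log M \right)}_{\text{summary } x\text{-fast trie}} = n + O\left( \frac{n}{\log^{k-1} M} \right) \text{ words}$$

$$\text{query} = \underbrace{O(\log \log M)}_{x\text{-fast trie lookup}} + \underbrace{O\left( \log \left( \log^k M \right) \right)}_{\text{binary search in one block}} = O(k \log \log M)$$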

My questions are the following:

  1. Does this work? As in, is there some nuance I haven't picked up on that messes up the analysis?
  2. Why not do this? Other than history, is there a reason not to choose a larger group size when working with $y$-fast tries?

Thanks!

templatetypedef