Subintervals during backward search of a BWT

Question

A Burrows-Wheeler Transform (BWT) can be used to find the suffix array interval for a pattern $P$ by issuing $p=|P|$ paired $rank$ queries
\begin{align} s^\prime &= C[P[i]] + rank(s-1, P[i]) + 1 \\ e^\prime &= C[P[i]] + rank(e, P[i]) \end{align}
where $s$ and $e$ denote the start and end of the current search range, and $rank(j, c)$ is the number of occurrences of symbol $c$ in $BWT[1..j]$. Initially $s=1$ and $e=|BWT|$, and after each step $[s^\prime, e^\prime]$ becomes the search range for the next step. $C$ is a lookup table where $C[c]$ is the number of symbols in the BWT that sort lexicographically before $c$.

The search proceeds backwards through the search pattern, so it starts at $i=|P|$ and finishes at $i=1$. For example, given the BWT for string $\text{mississippi\$}$ and pattern $P=\text{iss}$, the backward search will first identify the interval $[9,12]$ for $\text{s}$, then $[11,12]$ for $\text{ss}$, and lastly $[4,5]$ for $\text{iss}$.
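A minimal Python sketch of the backward search described above, assuming a naive BWT built by sorting suffixes and linear-time computation of $C$ and $rank$ (fine for toy inputs; a practical FM-index would use a succinct rank structure such as a wavelet tree). The names `build_bwt` and `backward_search` are illustrative and not taken from [1]; intervals are 1-based to match the formulas.

```python
def build_bwt(text):
    """Naive BWT via a sorted suffix array; assumes text ends with a unique '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])  # 0-based suffix starts
    return "".join(text[i - 1] for i in sa)  # text[-1] == '$' when i == 0


def backward_search(bwt, pattern):
    """Return the 1-based interval [s, e] reached after each backward step."""
    # C[c] = number of BWT symbols that sort before c
    C = {c: sum(1 for x in bwt if x < c) for c in set(bwt)}

    def rank(j, c):
        # occurrences of c in BWT[1..j]; rank(0, c) == 0
        return bwt[:j].count(c)

    s, e = 1, len(bwt)
    intervals = []
    for ch in reversed(pattern):
        if ch not in C:
            break  # symbol absent from the text: empty interval
        s = C[ch] + rank(s - 1, ch) + 1
        e = C[ch] + rank(e, ch)
        if s > e:
            break  # pattern does not occur
        intervals.append((s, e))
    return intervals


bwt = build_bwt("mississippi$")
print(bwt)                          # ipssm$pissii
print(backward_search(bwt, "iss"))  # [(9, 12), (11, 12), (4, 5)]
```

The search stops early when the interval becomes empty, which is how a non-occurring pattern is detected.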

See [1] for a more complete description of backward search and a walkthrough of the example.

My question is: during backward search, is it possible to determine which subinterval of the previous character's interval led to the current character's interval?

For example, given the BWT for string $\text{mississippi\$}$ and pattern $P=\text{issi}$, the backward search will first identify the interval $[2,5]$ for $\text{i}$ and then $[9,10]$ for $\text{si}$. What I want to know is which subinterval of $[2,5]$ for $\text{i}$ led to $[9,10]$ for $\text{si}$. For this toy example I can see that the answer is the subinterval $[3,4]$ of $[2,5]$, but I want to know if/how this can be computed during the paired rank queries described above.
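For the toy example, the answer can be checked by brute force: scan $BWT[s..e]$ and keep the rows whose character is the next pattern symbol. The `rows_with_char` helper below is a hypothetical, naive linear scan over the range (cost proportional to $e - s$), shown only to make the expected answer concrete; it is not the rank-based method the question asks about.

```python
BWT = "ipssm$pissii"  # BWT of "mississippi$"; rows are 1-based below


def rows_with_char(bwt, s, e, ch):
    """1-based rows j in [s, e] with BWT[j] == ch, found by a linear scan."""
    return [j for j in range(s, e + 1) if bwt[j - 1] == ch]


# Interval [2, 5] for "i"; the next pattern symbol (moving backwards) is "s".
print(rows_with_char(BWT, 2, 5, "s"))  # [3, 4]
```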

The literature suggests that this could be achieved with a forward search, but that would require a bidirectional representation of the BWT, which seems excessive given that we already know the intervals that contain the subintervals.

Any insight would be greatly appreciated!

score 0 · Accepted Answer · answered Mar 30 '21 at 19:14

The answer is deceptively simple. After a paired $rank$ query for a character $P[i]$, you can compute the positions of $P[i]$ in the BWT by performing a paired $select$ query with the ranks from the paired $rank$ query. As shown in the figure from the example linked in the question, the character at position $j$ in a BWT is the left extension of the suffix at position $j$ in the corresponding suffix array. This means the positions of $P[i]$ in the BWT found by the paired $select$ query also delimit the subinterval of the interval for $P[i+1..p]$ that led to the interval for $P[i..p]$.
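The left-extension property can be checked directly on the toy example. This sketch (naive suffix array, 1-based rows, illustrative function name `lf`) prints each row's BWT character next to its suffix, and asserts the standard LF-mapping fact: row $j$ maps to row $C[BWT[j]] + rank(j, BWT[j])$, whose suffix starts one position earlier in the text, i.e. it is $BWT[j]$ prepended to row $j$'s suffix.

```python
T = "mississippi$"
# Naive suffix array (1-based start positions); '$' sorts before letters in ASCII.
SA = [i + 1 for i in sorted(range(len(T)), key=lambda i: T[i:])]
# BWT[j] is the character preceding suffix SA[j]; T[-1] == '$' covers SA[j] == 1.
BWT = "".join(T[p - 2] for p in SA)


def lf(j):
    """LF mapping: row of the suffix obtained by prepending BWT[j] to row j's suffix."""
    c = BWT[j - 1]
    return sum(1 for x in BWT if x < c) + BWT[:j].count(c)


for j, p in enumerate(SA, start=1):
    print(j, BWT[j - 1], T[p - 1:])
    if p > 1:
        # Prepending BWT[j] to suffix T[p..] gives suffix T[p-1..], found at row lf(j).
        assert SA[lf(j) - 1] == p - 1
```

The row whose suffix is the whole text (its BWT character is `$`) is skipped, since prepending `$` does not produce another suffix.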

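In other words, the subinterval can be computed during the paired $rank$ query with a complementary paired $select$ query as follows
\begin{align} r_s &= rank(s-1, P[i]) + 1 \\ r_e &= rank(e, P[i]) \\ s^\prime &= C[P[i]] + r_s \\ e^\prime &= C[P[i]] + r_e \\ s^{\prime\prime} &= select(r_s, P[i]) \\ e^{\prime\prime} &= select(r_e, P[i]) \end{align}
where $select(r, c)$ returns the position of the $r$-th occurrence of $c$ in the BWT (defined when $r_s \le r_e$), and $s^{\prime\prime}$ and $e^{\prime\prime}$ delimit the subinterval of $[s, e]$ that led to the $P[i]$ interval $[s^\prime, e^\prime]$. As previously described, $s^{\prime\prime}$ and $e^{\prime\prime}$ are also the positions of the first and last occurrences of $P[i]$ within $BWT[s..e]$. Note that $[s^{\prime\prime}, e^{\prime\prime}]$ may also contain positions holding other characters; the rows that actually lead to $[s^\prime, e^\prime]$ are exactly the positions $j \in [s^{\prime\prime}, e^{\prime\prime}]$ with $BWT[j] = P[i]$. For the $\text{issi}$ example, $s=2$, $e=5$ and $P[i]=\text{s}$ give $r_s=1$, $r_e=2$, $[s^\prime, e^\prime]=[9,10]$ and $[s^{\prime\prime}, e^{\prime\prime}]=[3,4]$.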
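A minimal sketch of one combined rank/select step, using naive linear-time `rank` and `select` over a plain string (an assumption for readability; succinct rank/select structures would be used in practice). The helper name `extend` is hypothetical; it returns the new interval $[s^\prime, e^\prime]$ together with $[s^{\prime\prime}, e^{\prime\prime}]$, or `None` when $P[i]$ does not occur in $BWT[s..e]$.

```python
BWT = "ipssm$pissii"  # BWT of "mississippi$"


def rank(bwt, j, c):
    """Occurrences of c in BWT[1..j] (1-based; rank(bwt, 0, c) == 0)."""
    return bwt[:j].count(c)


def select(bwt, r, c):
    """1-based position of the r-th occurrence of c in the BWT."""
    seen = 0
    for j, x in enumerate(bwt, start=1):
        if x == c:
            seen += 1
            if seen == r:
                return j
    raise ValueError(f"fewer than {r} occurrences of {c!r}")


def extend(bwt, s, e, c):
    """One backward step from [s, e] with symbol c, plus the contributing range."""
    C = sum(1 for x in bwt if x < c)  # C[c]
    r_s = rank(bwt, s - 1, c) + 1
    r_e = rank(bwt, e, c)
    if r_s > r_e:
        return None  # c does not occur in BWT[s..e]
    return (C + r_s, C + r_e), (select(bwt, r_s, c), select(bwt, r_e, c))


print(extend(BWT, 2, 5, "s"))  # ((9, 10), (3, 4))
```

With $[s, e] = [1, 12]$ and $c = \text{s}$ the call returns `((9, 12), (3, 10))`; $BWT[3..10]$ also contains `m`, `$`, `p` and `i`, which is why only the rows of that range holding `s` contribute to the new interval.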
