How Can the Bounded Search Tree Algorithm for Closest String run in $\mathcal{O}(kd)$ per node?

Question

I am trying to understand an algorithm for solving Closest String using bounded search trees, as found in Parameterized Algorithms (Cygan et al., 2015).

Assume we are given $k$ strings $x_1, \dots, x_k$, all of length $L$, and an integer $d$. The problem is to decide whether there exists a string $y$ such that $d_H(y, x_i) \leq d$ for every input string $x_i$, where $d_H$ is the Hamming distance. If so, $y$ is called a center string.
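As a minimal Python sketch of these definitions (the function names `hamming` and `is_center` are hypothetical, and the inputs are assumed to be equal-length Python strings):

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of positions where the equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))


def is_center(y: str, strings: list[str], d: int) -> bool:
    """True if y is within Hamming distance d of every input string."""
    return all(hamming(y, x) <= d for x in strings)


strings = ["abcd", "abce", "abde"]
print(is_center("abce", strings, 1))  # True
print(is_center("abcd", strings, 1))  # False: d_H("abcd", "abde") = 2
```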

The algorithm goes as follows (I'll go quickly, since I assume the reader knows it; this is just to make sure we're on the same page):

1. Organize the input strings into a $k \times L$ matrix. Delete every column in which all strings have the same character: a solution can simply copy that character there at no cost to any distance.

1.5. <we may already be done if the matrix ends up large or small enough; e.g. if more than $kd$ columns remain, no center string exists>

2. Let $z = x_1$. If $z$ is a center string, we are done. Otherwise there exists an $x_i$ with $d_H(x_i, z) > d$, so $x_i$ and $z$ differ in at least $d + 1$ positions. Pick any $d + 1$ of these positions. If a (hypothetical) center string $y$ exists, then $d_H(x_i, y) \leq d$, so at least one of the chosen positions $p$ satisfies $y[p] = x_i[p] \neq z[p]$. We therefore branch into $d + 1$ subproblems, one per chosen position $p$, each recursing with $z[p]$ set to $x_i[p]$; in at least one branch this reduces $d_H(z, y)$ by one. Since $d_H(x_1, y) \leq d$ at the start, the correct branch reaches $y$ within at most $d$ levels of recursion (assuming $y$ exists).
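The steps above can be sketched in Python as follows. This is a minimal illustration, not the book's code: the names `preprocess`, `search`, `closest_string` and the `budget` parameter (how many more positions of $z$ may still be changed) are hypothetical, the strings are assumed to be equal-length Python strings, and every node naively recomputes all $k$ distances in $\mathcal{O}(kL)$ time, which is exactly the per-node cost the question is asking about.

```python
from __future__ import annotations

from typing import Optional


def hamming(a, b) -> int:
    """Number of positions where two equal-length sequences differ."""
    return sum(ca != cb for ca, cb in zip(a, b))


def preprocess(strings: list[str], d: int) -> Optional[tuple[list[str], list[int]]]:
    """Steps 1 and 1.5: delete columns where all strings agree.

    Returns (reduced strings, indices of kept columns), or None when more than
    k*d columns remain: any y mismatches at least one x_i in each such column,
    so the distances would sum to more than k*d.
    """
    k, length = len(strings), len(strings[0])
    keep = [j for j in range(length) if len({x[j] for x in strings}) > 1]
    if len(keep) > k * d:
        return None
    return ["".join(x[j] for j in keep) for x in strings], keep


def search(z: list[str], strings: list[str], d: int, budget: int) -> Optional[str]:
    """Step 2: find a center string y with d_H(y, z) <= budget, if one exists."""
    # Naive check: O(k * L) per node, since every distance is recomputed.
    far = next((x for x in strings if hamming(z, x) > d), None)
    if far is None:
        return "".join(z)  # z itself is a center string
    if budget == 0:
        return None
    # Any d+1 positions where z and the far string differ; a center y agrees
    # with the far string on at least one of them.
    positions = [p for p in range(len(z)) if z[p] != far[p]][: d + 1]
    for p in positions:
        old, z[p] = z[p], far[p]
        result = search(z, strings, d, budget - 1)
        z[p] = old  # undo before trying the next branch
        if result is not None:
            return result
    return None


def closest_string(strings: list[str], d: int) -> Optional[str]:
    pre = preprocess(strings, d)
    if pre is None:
        return None
    reduced, keep = pre
    # Start from z = x_1 with budget d, since d_H(x_1, y) <= d for any center y.
    found = search(list(reduced[0]), reduced, d, d)
    if found is None:
        return None
    # Put the found characters back into the kept columns of x_1; the deleted
    # columns keep x_1's character, which all strings share there.
    y = list(strings[0])
    for j, ch in zip(keep, found):
        y[j] = ch
    return "".join(y)


print(closest_string(["abcd", "abce", "abde"], 1))  # abce
print(closest_string(["abcd", "abce", "abde"], 0))  # None
```

With $d = 0$ the sketch rejects already in step 1.5, because 2 non-trivial columns remain and $kd = 0$.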

Now the complexity of this is supposed to be $\mathcal{O}(kL + kd(d + 1)^d)$. The $kL$ term is for step 1. The recursion depth is at most $d$, with $d + 1$ children at each node, which is where the $(d + 1)^d$ comes from. That leaves $kd$ as the time cost of step 2 at each node.
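As a worked count of the search-tree size, assuming depth at most $d$ and at most $d + 1$ children per node as described above, for $d \geq 1$ the total number of nodes is bounded by a geometric sum:

$$\sum_{j=0}^{d} (d+1)^j = \frac{(d+1)^{d+1} - 1}{d} \leq \frac{d+1}{d}\,(d+1)^d \leq 2\,(d+1)^d,$$

so the tree has $\mathcal{O}((d+1)^d)$ nodes, and a per-node cost of $\mathcal{O}(kd)$ yields the $kd(d+1)^d$ term.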

But how are the tasks of (1) checking whether $z$ is a center string and (2) picking out the $x_i$ and the $d + 1$ positions supposed to be doable in $\mathcal{O}(kd)$? After deleting the trivial columns in step 1, the matrix can still be up to $kd$ columns wide. So comparing $z$ with a single $x_i$ requires iterating over up to $kd$ characters, and doing this for all $k$ strings gives $\mathcal{O}(k^2d)$ per node. We cannot make any assumptions about the content of the matrix (the strings could all be identical, the instance could be unsolvable, it could be just barely solvable), only that its width is between $k + 1$ and $kd$. How does one arrive at this complexity? Or have I misunderstood the complexity bound they gave?

Inuyasha Yagami · Accepted Answer · 2023-05-31T09:24:07.407


Please see Theorem 1 from the original paper. The distances from $z$ to all the $x_i$ can be updated in $O(k)$ time.

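Note that $z$ is not necessarily one of the given strings $x_1, \dotsc, x_k$: the algorithm changes $z$ in exactly one position at each step of the recursion. Since only one character changes, the distance from $z$ to any particular string can be updated in $O(1)$ time, so the distances to all $k$ strings can be updated in $O(k)$ time. With these distances maintained, finding an $x_i$ with $d_H(x_i, z) > d$ takes $O(k)$ time, and picking $d + 1$ positions where $x_i$ and $z$ differ takes a single scan of length $L \leq kd$, so each node costs $O(kd)$.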
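A minimal Python sketch of this incremental bookkeeping, assuming the search keeps a cached distance array alongside $z$. The class `CenterCandidate` and its methods are hypothetical names for illustration, not taken from the paper or the book:

```python
from __future__ import annotations


class CenterCandidate:
    """Candidate string z together with cached distances d_H(z, x_i).

    Hypothetical helper: building it costs O(kL) once; each single-position
    change then costs O(k), and finding d+1 positions to branch on costs O(L).
    """

    def __init__(self, z: str, strings: list[str]) -> None:
        self.z = list(z)
        self.strings = strings
        self.dist = [sum(a != b for a, b in zip(z, x)) for x in strings]

    def set_char(self, p: int, ch: str) -> None:
        """Set z[p] = ch and update all k cached distances in O(k) total."""
        old = self.z[p]
        if old == ch:
            return
        for i, x in enumerate(self.strings):
            # Position p used to contribute 1 iff old != x[p]; now iff ch != x[p].
            self.dist[i] += (ch != x[p]) - (old != x[p])
        self.z[p] = ch

    def far_string(self, d: int) -> int | None:
        """Index of some x_i with d_H(z, x_i) > d, or None if z is a center. O(k)."""
        return next((i for i, dist in enumerate(self.dist) if dist > d), None)

    def branch_positions(self, i: int, d: int) -> list[int]:
        """First d+1 positions where z and x_i differ: one scan, O(L) = O(kd)."""
        x = self.strings[i]
        return [p for p in range(len(self.z)) if self.z[p] != x[p]][: d + 1]


c = CenterCandidate("cd", ["cd", "ce", "de"])
print(c.dist)                    # [0, 1, 2]
i = c.far_string(1)              # 2, since d_H("cd", "de") = 2 > 1
print(c.branch_positions(i, 1))  # [0, 1]
c.set_char(1, "e")               # z becomes "ce"
print(c.dist)                    # [1, 0, 1]
print(c.far_string(1))           # None: "ce" is a center for d = 1
```

Undoing a change when the recursion returns from a branch is just another `set_char` call, so it also costs $O(k)$ and does not affect the per-node bound.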

edited May 31 '23 at 09:24

answered May 31 '23 at 08:59
