If you have a long string of length $n$ and a shorter string of length $m$, what is a suitable recurrence to let you compute all $n-m+1$ Levenshtein distances between the shorter string and all substrings of the longer string of length $m$?

Can it in fact be done in $O(nm)$ time?

Raphael
Simd

1 Answer

If you want the minimum number of single-character changes (insertions, deletions, substitutions) that transform the shorter string $S$ (of length $m$) into some substring of the longer string $T$ (of length $n$), then this can be done in $O(nm)$ time.

The standard algorithm for computing the edit distance between two strings runs in $O(nm)$ time. It uses dynamic programming to build up a matrix $d[\cdot,\cdot]$ where $d[i,j]$ denotes the minimum number of single-character changes that transform $S[0\ldots i]$ to $T[0 \ldots j]$.
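As a rough illustration, the standard dynamic program might be sketched in Python as follows. The function name `edit_distance` is an assumption made for this example, and the table is indexed by prefix lengths (so row `i` corresponds to the first `i` characters of `s`), which may differ by one from the indexing used in the prose above. Only two rows of the matrix are kept at a time.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance between s and t in O(len(s) * len(t)) time."""
    m, n = len(s), len(t)
    # prev[j] is the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(n + 1))  # row 0: "" -> t[:j] costs j insertions
    for i in range(1, m + 1):
        curr = [i] + [0] * n  # column 0: s[:i] -> "" costs i deletions
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete s[i-1]
                curr[j - 1] + 1,     # insert t[j-1]
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    return prev[n]
```

For instance, `edit_distance("kitten", "sitting")` evaluates to 3.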

You can use a simple variation of this algorithm to build up a matrix $d'[\cdot,\cdot]$ where $d'[i,j]$ denotes the minimum number of single-character changes that transform $S[0\ldots i]$ to some suffix of $T[0 \ldots j]$ (i.e., to some string $T[k \ldots j]$ for some $k\le j$). The only change is in the base case: matching the empty prefix of $S$ costs $0$ at every position of $T$, so a match may start anywhere in $T$. The running time remains $O(nm)$. This problem is sometimes known as fuzzy (approximate) string search.
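A minimal sketch of this variation, under the same assumptions as before (Python, prefix-length indexing, and a hypothetical function name `best_substring_distance`): the top row is initialised to zeros instead of $0, 1, 2, \ldots$, and the answer is the minimum over the last row rather than its final entry.

```python
def best_substring_distance(s: str, t: str) -> int:
    """Smallest edit distance between s and any substring of t, in O(len(s) * len(t))."""
    m, n = len(s), len(t)
    # Row 0 is all zeros: the empty prefix of s matches an empty suffix
    # ending at any position of t for free, so a match can start anywhere.
    prev = [0] * (n + 1)
    for i in range(1, m + 1):
        curr = [i] + [0] * n  # nothing of t consumed: delete all i characters
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete s[i-1]
                curr[j - 1] + 1,     # insert t[j-1]
                prev[j - 1] + cost,  # match or substitute
            )
        prev = curr
    # prev[j] is now the best distance from all of s to a substring ending at j.
    return min(prev)
```

For example, `best_substring_distance("abcd", "abxcd")` is 1, achieved by the whole string `"abxcd"`, which is longer than the pattern.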

This gives you a single distance, which represents the smallest number of single-character changes needed to transform $S$ into a substring of $T$. Because the starting position $k$ is unconstrained, that substring is not forced to have length exactly $m$. (If you wanted to get all $n-m+1$ edit distances from $S$ to each length-$m$ substring of $T$, I don't know whether that can be done in $O(nm)$ time.)
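For comparison, a naive baseline for the "all windows" version of the problem simply runs the standard algorithm once per length-$m$ window. This is only a sketch with hypothetical function names; it takes $O(nm^2)$ time and says nothing about whether $O(nm)$ is achievable.

```python
def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance, keeping two rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        curr = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
        prev = curr
    return prev[len(b)]


def all_window_distances(s: str, t: str) -> list[int]:
    """Distance from s to every substring of t with length len(s)."""
    m = len(s)
    # One O(m^2) computation per window, n - m + 1 windows in total.
    return [edit_distance(s, t[k:k + m]) for k in range(len(t) - m + 1)]
```

With `s = "abcd"` and `t = "abxcd"`, this returns `[2, 2]`, whereas the best match over substrings of any length has distance 1.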

D.W.