Highest Voted 'edit-distance' Questions - Computer Science Stack Exchange

12 votes, 1 answer

Edit distance of lists with unique elements

Levenshtein edit distance between lists is a well-studied problem. But I can't find much on possible improvements if it is known that no element occurs more than once in each list. Let's also assume that the elements are…

asked Jul 26 '15 at 19:55

user362178
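
One special case with a known speedup, offered as a hedged sketch rather than an answer for full Levenshtein distance: if only insertions and deletions are allowed (no substitutions), the distance is len(a) + len(b) - 2·LCS(a, b), and when no element repeats within a list the LCS reduces to a longest increasing subsequence of positions, computable in O(n log n) instead of O(nm). The function names are illustrative.

```python
from bisect import bisect_left


def lcs_unique(a, b):
    """LCS length of two sequences whose elements are distinct within each sequence."""
    pos_in_b = {x: i for i, x in enumerate(b)}
    # A common subsequence is a subsequence of a whose positions in b strictly increase.
    seq = [pos_in_b[x] for x in a if x in pos_in_b]
    # Patience-style LIS: tails[k] = smallest tail of a strictly increasing run of length k+1.
    tails = []
    for p in seq:
        k = bisect_left(tails, p)
        if k == len(tails):
            tails.append(p)
        else:
            tails[k] = p
    return len(tails)


def indel_distance_unique(a, b):
    """Insertion/deletion-only edit distance (no substitutions) for lists without repeats."""
    return len(a) + len(b) - 2 * lcs_unique(a, b)
```

For example, `[1, 2, 3, 4]` and `[2, 4, 3, 1]` have an LCS of length 2, so their insert/delete distance is 4.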

12 votes, 5 answers

How does editing software (like Microsoft Word or Gmail) pick the 2nd string to compare in Levenshtein distance?

I understand the textbook explanation of how to use dynamic programming to find the minimum edit distance between 2 strings, but how is the 2nd string picked? I don't think the entire dictionary is compared, as sometimes the difference is…

algorithms strings edit-distance

asked Sep 05 '21 at 23:06

heretoinfinity
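
How Word or Gmail actually choose candidates is not established here. One simple, widely described approach (in the style of Peter Norvig's essay "How to Write a Spelling Corrector", minus its transposition case) inverts the problem: generate every string within one edit of the typed word and keep those that are in the dictionary, so no dictionary-wide Levenshtein scan is needed. A sketch, assuming a lowercase ASCII alphabet and a dictionary given as a Python set:

```python
import string


def edits1(word, alphabet=string.ascii_lowercase):
    """All strings at Levenshtein distance at most 1 from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    substitutions = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    return deletes | substitutions | inserts


def candidates(word, dictionary):
    """Known words within one edit of word (just word itself if it is already known)."""
    if word in dictionary:
        return {word}
    return edits1(word) & dictionary
```

With the dictionary `{"spelling", "spewing", "apple"}`, the input `"speling"` yields `{"spelling", "spewing"}`; the candidates would then be ranked, for example by word frequency.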

9 votes, 2 answers

Alternative to Hamming distance for permutations

I have two strings, where one is a permutation of the other. I was wondering if there is an alternative to Hamming distance where instead of finding the minimum number of substitutions required, it would find the minimum number of translocations…

terminology string-metrics permutations edit-distance

asked Oct 12 '12 at 16:31

user1357015
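
If a "translocation" is taken to mean swapping the characters at any two positions (an assumption; counting only adjacent swaps would give the Kendall tau distance instead), the minimum number of swaps is the Cayley distance: n minus the number of cycles of the permutation that maps one string onto the other. A sketch that assumes all characters are distinct, since with repeated characters the position mapping is not unique:

```python
def min_swaps(s, t):
    """Minimum number of arbitrary two-position swaps turning s into t.

    Assumes t is a permutation of s and that all characters are distinct.
    """
    if sorted(s) != sorted(t) or len(set(s)) != len(s):
        raise ValueError("expected permutations of distinct characters")
    target = {ch: i for i, ch in enumerate(t)}
    perm = [target[ch] for ch in s]  # position each character of s must move to
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            j = start
            while not seen[j]:  # walk one cycle of the permutation
                seen[j] = True
                j = perm[j]
    return len(perm) - cycles  # each swap can split one cycle into two
```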

9 votes, 2 answers

What are some efficient ways to find the differences between two large corpora of text that have similar, but differently ordered content?

I have two large files containing paragraphs of English text: The first text is about 200 pages long and has about 10 paragraphs per page (each paragraph is 5 sentences long). The second text contains almost precisely the same paragraphs and text…

strings data-mining natural-language-processing edit-distance

asked Oct 07 '15 at 02:59

vikram7
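
Because the paragraphs are reordered, a line-oriented diff will report spurious moves. A cheap first pass, assuming paragraphs are separated by blank lines and most are copied verbatim, is to treat each text as a multiset of whitespace-normalized paragraphs and subtract; only the leftovers then need a finer edit-distance comparison. A sketch (the function name is illustrative):

```python
import re
from collections import Counter


def unmatched_paragraphs(text_a, text_b):
    """Paragraphs of each text with no exact (whitespace-normalized) match in the other, ignoring order."""
    def split(text):
        return [" ".join(p.split()) for p in re.split(r"\n\s*\n", text) if p.strip()]

    ca, cb = Counter(split(text_a)), Counter(split(text_b))
    # Multiset difference keeps duplicates that occur more often in one text.
    return list((ca - cb).elements()), list((cb - ca).elements())
```

Hashing makes this roughly linear in the total text size; the unmatched paragraphs can then be paired up by a similarity measure.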

8 votes, 1 answer

Find all pairs of strings in a set with Levenshtein distance < d

I have a set of $n = 100$ million strings of length $l = 20$, and for each string in the set, I would like to find all the other strings in the set with Levenshtein distance $\le d = 4$ from that string. The Levenshtein distance (also called the…

algorithms strings string-metrics edit-distance

asked Feb 18 '16 at 19:24

1''
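
Whatever candidate-generation scheme is used, the final check can exploit the small threshold. In the standard DP table the minimum of each row never decreases from one row to the next, so a pair can be rejected as soon as a whole row exceeds d (and immediately if the lengths differ by more than d). A sketch of that verification step; it makes each rejected comparison cheaper but does not by itself avoid comparing all pairs:

```python
def within_distance(a, b, d):
    """True iff Levenshtein(a, b) <= d, abandoning the DP once a row's minimum exceeds d."""
    if abs(len(a) - len(b)) > d:
        return False  # the length difference is a lower bound on the distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1,               # delete ca
                          curr[j - 1] + 1,           # insert cb
                          prev[j - 1] + (ca != cb))  # match or substitute
        if min(curr) > d:
            return False  # later rows, and hence the final cell, are at least this large
        prev = curr
    return prev[-1] <= d
```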

7 votes, 2 answers

Efficient algorithm for edit distance for short sequences

I have an application that needs to compute billions of Levenshtein distances between pairs of strings. The strings are short DNA sequences (length 70) consisting of only 4 characters. Also, it can be assumed that one of the strings is fixed,…

strings edit-distance

asked Jul 26 '17 at 09:30

Ameer Jewdaki
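
Since one string is fixed and the alphabet is tiny, one well-known technique is Myers' bit-parallel algorithm (1999), here in the global-distance form described by Hyyrö: the fixed string's DP column is stored as bit vectors of +1/-1 vertical differences, so each character of the other string costs a constant number of word operations. A sketch in Python, where unbounded integers stand in for machine words (a 70-character pattern would need two 64-bit words or a 128-bit type in C):

```python
def myers_distance(pattern, text):
    """Levenshtein distance between pattern and text using Myers' bit-vector method.

    Bit i of pv / mv records whether D[i+1][j] - D[i][j] is +1 / -1 in the
    current column j (rows index pattern, columns index text).
    """
    m = len(pattern)
    if m == 0:
        return len(text)
    mask = (1 << m) - 1   # keep every vector m bits wide
    high = 1 << (m - 1)   # bit for the last row, D[m][j]
    # peq[c]: bit i set where pattern[i] == c. It depends only on the pattern,
    # so for a fixed pattern it could be built once and reused.
    peq = {}
    for i, c in enumerate(pattern):
        peq[c] = peq.get(c, 0) | (1 << i)
    pv, mv, score = mask, 0, m  # column 0: D[i][0] = i, all vertical deltas are +1
    for c in text:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) & mask) ^ pv) | eq
        ph = mv | (~(xh | pv) & mask)  # horizontal delta +1
        mh = pv & xh                   # horizontal delta -1
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Row 0 is D[0][j] = j, so its horizontal delta is +1: shift in a 1.
        ph = ((ph << 1) | 1) & mask
        mh = (mh << 1) & mask
        pv = mh | (~(xv | ph) & mask)
        mv = ph & xv
    return score
```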

7 votes, 1 answer

Levenshtein distance and dynamic time warping

I am not sure how to draw a parallel between the Wagner–Fischer algorithm and the DTW algorithm. In both cases we want to find the distance for each index combination (i, j). In Wagner–Fischer, we initialize the distance by the number of inserts we'd have to do from…

algorithms dynamic-programming string-metrics edit-distance time-series-analysis

asked Nov 05 '12 at 22:39

nicolas
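
Written side by side, the two recurrences differ mainly in two places: the boundary (Levenshtein charges one edit per skipped element, while DTW uses infinity so the alignment must start at the corner) and the step cost (Levenshtein's horizontal and vertical moves pay a fixed gap cost of 1, whereas DTW pays the local distance on every move, because a "gap" in DTW means matching one point to several). A sketch, assuming absolute difference as DTW's local cost:

```python
import math


def levenshtein(a, b):
    # Boundary: turning a prefix into the empty sequence costs one edit per element.
    D = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        D[i][0] = i
    for j in range(len(b) + 1):
        D[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i][j] = min(D[i - 1][j] + 1,                            # deletion: fixed cost
                          D[i][j - 1] + 1,                            # insertion: fixed cost
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # match / substitution
    return D[-1][-1]


def dtw(x, y, cost=lambda u, v: abs(u - v)):
    # Boundary: nothing aligns with an empty prefix, so the borders are infinite.
    D = [[math.inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            # Every move pays the local cost; vertical/horizontal moves repeat a point.
            D[i][j] = cost(x[i - 1], y[j - 1]) + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[-1][-1]
```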

7 votes, 0 answers

Number of strings at given edit distance

I would like to know the number of strings at edit distance $n$ of a string $s$. I guess this is textbook knowledge... but I cannot find the textbook in question. More formally, I have an alphabet $\Sigma$ (in my case, $|\Sigma| = 4$), and I…

combinatorics strings edit-distance word-combinatorics

asked Apr 25 '19 at 13:50

unamourdeswann
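
For small cases the counts can be checked by brute force: Levenshtein distance is the shortest-path distance in the graph whose edges are single insertions, deletions and substitutions, so a breadth-first search from $s$ lists the strings at each exact distance. A sketch (exponential, only useful for testing a conjectured formula on tiny inputs):

```python
def counts_by_distance(s, alphabet, max_dist):
    """counts[k] = number of strings over alphabet at Levenshtein distance exactly k from s."""
    def neighbours(w):
        for i in range(len(w) + 1):
            for c in alphabet:
                yield w[:i] + c + w[i:]              # insertion
            if i < len(w):
                yield w[:i] + w[i + 1:]              # deletion
                for c in alphabet:
                    if c != w[i]:
                        yield w[:i] + c + w[i + 1:]  # substitution

    seen = {s}
    frontier = {s}
    counts = [1]
    for _ in range(max_dist):
        # BFS layer k+1: strings one edit from layer k and not reached earlier.
        frontier = {v for w in frontier for v in neighbours(w)} - seen
        seen |= frontier
        counts.append(len(frontier))
    return counts
```

Over $\{a, b\}$, `counts_by_distance("a", "ab", 2)` gives `[1, 5, 8]`.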

6 votes, 1 answer

Extending ordered tree edit distance to DAGs

Computing edit distance (shortest sequence of edit operations) on ordered trees is a well studied problem with many known algorithms (e.g. Zhang & Shasha, RTED). There is also considerable literature on edit distance for general graph (e.g., this…

graphs edit-distance

asked Nov 23 '17 at 13:11

Martin Modrák

6 votes, 1 answer

Why is the running time of edit distance with memoization $O(mn)$?

I understand without memoization it is going to be $O(3^{\max\,\{m,n\}})$ because every call results in extra three calls: thus we end up having a call tree with three children for each node, with height $\max\,\{m,n\}$, $m$ and $n$ being lengths of two…

algorithm-analysis runtime-analysis dynamic-programming edit-distance

asked Dec 09 '14 at 07:40

Sandesh Kobal
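
The three recursive calls still happen, but with memoization each pair $(i, j)$ with $0 \le i \le m$ and $0 \le j \le n$ is evaluated at most once; every repeated call is a cache hit, and each evaluation does $O(1)$ work apart from its three lookups, so the total is $O((m+1)(n+1)) = O(mn)$. A sketch that also reports how many distinct subproblems were evaluated (the function name is illustrative):

```python
from functools import lru_cache


def edit_distance_memo(a, b):
    """Return (edit distance, number of distinct subproblems evaluated)."""
    @lru_cache(maxsize=None)
    def d(i, j):
        # d(i, j) = edit distance between the prefixes a[:i] and b[:j].
        if i == 0:
            return j
        if j == 0:
            return i
        return min(d(i - 1, j) + 1,
                   d(i, j - 1) + 1,
                   d(i - 1, j - 1) + (a[i - 1] != b[j - 1]))

    result = d(len(a), len(b))
    return result, d.cache_info().currsize
```

For `"kitten"` and `"sitting"` ($m = 6$, $n = 7$) it returns `(3, 56)`: one cache entry per cell of the $7 \times 8$ table.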

5 votes, 3 answers

Find member of CFL that is Levenshtein-closest to non-member string

Is there an (efficient?) algorithm which given a context-free language $L$ (given as a grammar) and a string $x$ with $x \not \in L$ computes a $y$ with $y \in L$ and $\forall y': y' \in L \implies d(x, y) \le d(x, y')$, where $d$ is the Levenshtein…

formal-languages context-free string-metrics edit-distance

asked Jul 03 '16 at 14:02

Jonas Kölker

5 votes, 1 answer

How to speed up the process of finding duplicates/similar items in a large number of strings?

Our software receives documents (on the order of tens of thousands) from various providers. Each document flows through a number of steps; one of those steps finds duplicates of this document and documents similar to it (within an 80% threshold). We…

strings edit-distance

asked Oct 02 '15 at 11:56

chester89
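
Pairwise edit distance over tens of thousands of documents is quadratic in both the number of documents and their length. A common alternative, which swaps edit distance for a different similarity measure (so the 80% threshold would have to be re-expressed), is to compare sets of word shingles with Jaccard similarity and use an inverted index so that only documents sharing at least one shingle are compared. A sketch; the shingle size `k=3` and the default threshold are illustrative assumptions:

```python
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Set of k-word shingles (the whole text as one shingle if it is shorter than k words)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def similar_pairs(docs, threshold=0.8, k=3):
    """Pairs (id_a, id_b, jaccard) of documents whose shingle sets overlap by >= threshold."""
    sh = {doc_id: shingles(text, k) for doc_id, text in docs.items()}
    index = defaultdict(set)  # shingle -> ids of documents containing it
    for doc_id, s in sh.items():
        for g in s:
            index[g].add(doc_id)
    candidates = set()
    for ids in index.values():
        candidates.update(combinations(sorted(ids), 2))
    result = []
    for a, b in sorted(candidates):
        jaccard = len(sh[a] & sh[b]) / len(sh[a] | sh[b])
        if jaccard >= threshold:
            result.append((a, b, jaccard))
    return result
```

Very common shingles still make some candidate lists large; MinHash with locality-sensitive hashing is the usual way to bound that, at the cost of approximate answers.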

5 votes, 1 answer

Understanding the heuristic used for approximate string searching through an FSA

The paper I'm looking at: Fast approximate string matching with finite automata (2009). Explanation of the algorithm (from my understanding, anyway): a word is input into the automaton, and from each state, a number of possible actions can be taken…

finite-automata strings edit-distance

asked Jan 21 '15 at 15:17

user2908849

5 votes, 2 answers

How fast can we identify almost-duplicates in a list of strings?

I'm having trouble figuring out the upper bound on the running time for this scenario: Input: $N$, the number of strings; $M$, an upper bound on string length; $T$, a threshold for edit distance (2 strings with a Damerau-Levenshtein edit distance lower than $T$ are…

algorithms algorithm-analysis runtime-analysis string-metrics edit-distance

asked Jun 09 '14 at 16:10

Eran Medan
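
For reference, the DP most often called Damerau-Levenshtein in practice is the restricted "optimal string alignment" (OSA) variant, which adds an adjacent-transposition case to the Levenshtein recurrence and costs $O(|a| \cdot |b|)$ per pair, so the naive all-pairs approach is $O(N^2 M^2)$. The unrestricted Damerau-Levenshtein distance needs a different algorithm (Lowrance and Wagner) and can give smaller values. A sketch of the OSA variant:

```python
def osa_distance(a, b):
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions,
    where no substring is edited more than once (restricted Damerau-Levenshtein)."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1, D[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                D[i][j] = min(D[i][j], D[i - 2][j - 2] + 1)  # swap two adjacent characters
    return D[m][n]
```

For `"ca"` and `"abc"` this gives 3, while the unrestricted distance is 2.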

5 votes, 1 answer

Semi-local Levenshtein distance

If you have a long string of length $n$ and a shorter string of length $m$, what is a suitable recurrence to let you compute all $n-m+1$ Levenshtein distances between the shorter string and all substrings of the longer string of length $m$? Can it…

recurrence-relation dynamic-programming strings string-metrics edit-distance

asked Jun 30 '13 at 11:00

Simd
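
As a baseline that pins down what is being computed (not the faster semi-local recurrence the question asks for), the $n-m+1$ window distances can be obtained by running the standard $O(m^2)$ DP once per window, $O((n-m+1) \cdot m^2)$ in total. A sketch (the function name is illustrative):

```python
def window_distances(pattern, text):
    """Levenshtein distance between pattern and every length-len(pattern) substring of text."""
    m = len(pattern)

    def lev(a, b):
        # Two-row Wagner-Fischer DP.
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, start=1):
            curr = [i]
            for j, cb in enumerate(b, start=1):
                curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
            prev = curr
        return prev[-1]

    return [lev(pattern, text[s:s + m]) for s in range(len(text) - m + 1)]
```

For `"abc"` against `"xabcx"` the windows are `"xab"`, `"abc"` and `"bcx"`, giving `[2, 0, 2]`.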
