Use for generic string matching, which may be exact substring matching (though in that case prefer the tag exact-string-matching), matching against a regular expression, or approximate matching (e.g. finding substrings within a given Levenshtein distance of a reference string).
Questions tagged [string-matching]
79 questions
8
votes
3 answers
What is the expected time complexity of checking equality of two arbitrary strings?
The simple (naive?) answer would be O(n), where n is the length of the shorter string, because in the worst case you must compare every pair of characters.
So far so good. I think we can all agree that checking equality of two equal length strings…
jtschoonhoven
- 297
- 1
- 3
- 9
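To make the worst-case argument in the question above concrete, here is a minimal sketch of a character-by-character equality check. The function name `strings_equal` is hypothetical, and real runtimes usually compare in bulk rather than one character at a time.

```python
def strings_equal(s: str, t: str) -> bool:
    """Compare two strings character by character, stopping at the first mismatch."""
    # Strings of different length can never be equal, so this check is O(1).
    if len(s) != len(t):
        return False
    # Worst case: every pair of characters is inspected, which is O(n).
    for a, b in zip(s, t):
        if a != b:
            return False  # Early exit: the best case is O(1).
    return True
```

The early exit is why the expected cost on random inputs can be far below the worst-case bound.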
8
votes
1 answer
How to perform orthogonal check on two circular binary strings?
Say we have two circular binary strings $a = a_0a_1...a_{n-1}$ and $b = b_0b_1...b_{n-1}$ with arbitrary starting points, and define $a$ and $b$ to be orthogonal if $\sum_{i=0}^{n-1}a_ib_i = 0$.
Is there an $O(n \log n)$ algorithm that can tell a rotation of such…
Yiqun Sun
- 81
- 1
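As a reference point for the orthogonality question above, the definition can be checked directly by trying every rotation. This is only a quadratic baseline, not the $O(n \log n)$ algorithm being asked for; the function name `orthogonal_rotation` and the list-of-bits input format are assumptions.

```python
from typing import Optional, Sequence

def orthogonal_rotation(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Return a shift s such that sum(a[i] * b[(i + s) % n]) == 0, or None."""
    n = len(a)
    if n != len(b):
        raise ValueError("both circular strings must have the same length")
    for shift in range(n):
        # For 0/1 entries the sum is zero exactly when no position has a 1 in both.
        if all(a[i] * b[(i + shift) % n] == 0 for i in range(n)):
            return shift
    return None
```

The sums for all shifts together form a cyclic cross-correlation of the two strings, which is the quantity an FFT-based approach would compute in one pass.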
6
votes
1 answer
Find the 'best' longest common subsequence
I am writing a program that computes and displays diffs. I implemented Myers' algorithm, which computes the LCS between 2 sequences (seq1 and seq2); its output is one of the possible LCS and a partition of seq1 and seq2, one projection of which is…
mookid
- 169
- 3
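For readers unfamiliar with the LCS problem behind the diff question above, here is a sketch of the classic dynamic-programming formulation. It is not Myers' algorithm, which the asker uses; it returns one of possibly many longest common subsequences, and the tie-breaking rule in the reconstruction is an arbitrary choice.

```python
def lcs(seq1, seq2):
    """Return one longest common subsequence of seq1 and seq2 as a list."""
    m, n = len(seq1), len(seq2)
    # dp[i][j] = length of an LCS of seq1[i:] and seq2[j:].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if seq1[i] == seq2[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table from the start to recover one optimal subsequence.
    out, i, j = [], 0, 0
    while i < m and j < n:
        if seq1[i] == seq2[j]:
            out.append(seq1[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # Tie-break: skip an element of seq1 first.
        else:
            j += 1
    return out
```

Changing the tie-break changes which LCS is returned, which is where a notion of a "best" LCS would enter.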
5
votes
1 answer
Find all substrings that fit the mask with asterisks
There is a problem.
Given string $text$ containing only letters and string $mask$ containing letters and asterisks (*), where asterisk means substitution of zero or more letters, find all substrings of $text$ that fit $mask$.
There is an example:…
Elman
- 155
- 6
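The mask problem above can be prototyped by brute force with Python's `re` module. This sketch assumes that a result should be reported as a `(start, end)` index pair and that the text contains only letters, as the question states; the function name `matching_substrings` is hypothetical, and the approach tests every substring, so it is a correctness baseline rather than an efficient solution.

```python
import re

def matching_substrings(text: str, mask: str):
    """Return all (start, end) pairs such that text[start:end] fits the mask."""
    # Each asterisk stands for zero or more letters; literal parts are escaped.
    pattern = re.compile(".*".join(re.escape(part) for part in mask.split("*")))
    n = len(text)
    return [
        (i, j)
        for i in range(n + 1)
        for j in range(i, n + 1)
        # fullmatch with pos/endpos tests exactly the slice text[i:j].
        if pattern.fullmatch(text, i, j)
    ]
```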
4
votes
2 answers
NFA models with characters on nodes, not edges
I am attempting to understand the inner workings of the open source string matching library Hyperscan.
It takes a multiple-engine approach to the problem of generating string matches, and I'm still in the early stages of following through the…
Daniel Martin
- 643
- 4
- 14
4
votes
0 answers
Remove contiguous 5th powers (5-fold repetitions) from list of 'a's and 'b's?
Given a list of characters in $\{a,b\}$, for example $abababababa$, what is the most efficient way to remove all 5th powers in a way that makes the string as short as possible?
(This example would reduce to a since the $(ab)^5$ cancels.) By 5th…
Learner of math
- 41
- 3
4
votes
1 answer
please help me understand the algorithm for building the KMP failure function
I am struggling to grasp the algorithm for building the KMP failure function. Most of my confusion concerns the line length=PI[length-1]. The pseudocode for the algorithm is below. Here are my questions:
1.)…
B_math
- 39
- 2
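The asker's pseudocode is not shown in the excerpt above, so the following is a sketch of the standard formulation of the KMP failure function, assumed to correspond to it. The line `length = pi[length - 1]` is the fallback step the question is about.

```python
def failure_function(pattern: str):
    """pi[i] = length of the longest proper prefix of pattern[:i+1] that is also its suffix."""
    pi = [0] * len(pattern)
    length = 0  # Length of the border currently being extended.
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # The current border extends by one character.
            length += 1
            pi[i] = length
            i += 1
        elif length > 0:
            # Mismatch: fall back to the next shorter border of pattern[:length]
            # and retry the same position i without advancing.
            length = pi[length - 1]
        else:
            # No border can be extended; pattern[:i+1] has border length 0.
            pi[i] = 0
            i += 1
    return pi
```

The fallback works because any border of `pattern[:i+1]` must extend a border of `pattern[:i]`, and the borders of a prefix are found by following `pi` repeatedly.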
4
votes
3 answers
Sorting array of strings (with repetitions) according to a given ordering
We get two arrays:
ordering = ["one", "two", "three"]
and
input = ["zero", "one", "two", "two", "three", "three", "three", "four"];
We want to find the array output so that
output = ["one", "two", "two", "three", "three", "three", "zero",…
Pe Wu
- 143
- 5
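One way to approach the ordering question above is to turn the ordering into a rank table and sort by rank. This sketch assumes that items missing from the ordering should go last in their original relative order, which agrees with the visible part of the expected output; the rest of that output is cut off in the excerpt. The function name `sort_by_ordering` is hypothetical.

```python
def sort_by_ordering(items, ordering):
    """Sort items by their position in ordering; unknown items go last."""
    rank = {value: index for index, value in enumerate(ordering)}
    unknown = len(ordering)  # Rank shared by every item not in the ordering.
    # sorted() is stable, so items with equal rank keep their input order.
    return sorted(items, key=lambda item: rank.get(item, unknown))
```

Building the rank table costs one pass over `ordering`, after which each key lookup is a constant-time dictionary access on average.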
3
votes
3 answers
By what criteria is the base value selected in Rabin Karp algorithm?
In the Rabin Karp algorithm the rolling hash is calculated as follows:
H1 = c1*a^(k-1) + c2*a^(k-2) + c3*a^(k-3) + … + ck*a^0
where a is a constant. On what basis is this value of a selected? Cormen uses the value 10, and in some other places it is 26. By…
Navjot Singh
- 1,215
- 1
- 9
- 26
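To show where the base enters the rolling hash in the Rabin-Karp question above, here is a minimal sketch. The defaults `base=256` and `mod=1_000_000_007` are illustrative assumptions, not recommendations taken from the question or its answers.

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_000_007):
    """Return all start indices where pattern occurs in text."""
    k, n = len(pattern), len(text)
    if k == 0 or k > n:
        return []
    high = pow(base, k - 1, mod)  # Weight a^(k-1) of the window's leading character.
    hp = ht = 0
    for i in range(k):
        # Horner's rule: c1*a^(k-1) + c2*a^(k-2) + ... + ck*a^0, reduced mod `mod`.
        hp = (hp * base + ord(pattern[i])) % mod
        ht = (ht * base + ord(text[i])) % mod
    matches = []
    for i in range(n - k + 1):
        # Equal hashes only suggest a match; verify to rule out collisions.
        if hp == ht and text[i:i + k] == pattern:
            matches.append(i)
        if i + k < n:
            # Slide the window: drop text[i], shift by one power of base, add text[i+k].
            ht = ((ht - ord(text[i]) * high) * base + ord(text[i + k])) % mod
    return matches
```

With a small base such as 10 or 26 the hash is easy to follow by hand for digits or lowercase letters, while the modulus determines how often distinct windows collide.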
3
votes
3 answers
Complexity of string comparison vs whitespace-trimmed string comparison
I recently worked on an algorithm which, among other things, checks strings for equality using the classic builtin equality operator:
str1 == str2
(I think it should be irrelevant to the question, but I faced this issue in C++, and str1 and str2…
Enlico
- 127
- 9
3
votes
1 answer
calculating the string similarity of an optimal alignment
Description of the algorithm's behavior
I have two strings s1 and s2, with $len\_s1 \le len\_s2$. I would like to find the substring of s2 that has the greatest similarity to s1. The following alignments are possible:
[s2[:i] for i in range(len_s1)] +…
maxbachmann
- 81
- 4
3
votes
0 answers
Intellij string search and highlight algorithm
I'm searching for an algorithm that takes two strings: a query and a string to be searched for the query. The algorithm should report 'found' when the string contains the characters of the query in the right order but with any amount of…
Philip Müller
- 31
- 1
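The matching rule described in the question above (query characters in order, with arbitrary gaps) is a subsequence test. Here is a minimal greedy sketch that also returns the matched positions, which could drive highlighting. The function name `fuzzy_match` is hypothetical, and taking the leftmost possible position for each character is an assumption, not necessarily what IntelliJ does.

```python
from typing import List, Optional

def fuzzy_match(query: str, text: str) -> Optional[List[int]]:
    """Return positions in text matching the query characters in order, or None."""
    positions = []
    start = 0
    for ch in query:
        # Take the leftmost occurrence of ch at or after the current position.
        idx = text.find(ch, start)
        if idx == -1:
            return None  # The query is not a subsequence of text.
        positions.append(idx)
        start = idx + 1
    return positions
```

Greedy leftmost matching always finds a match when one exists, but it does not prefer contiguous runs, so a ranking step would be needed for nicer highlights.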
3
votes
1 answer
Given a list of strings, find every pair $(x,y)$ where $x$ is a substring of $y$. Possible to do better than $O(n^2)$?
Consider the following algorithmic problem: Given a list of strings $L = [s_1, s_2, \dots, s_n]$, we want to know all pairs $(x,y)$ where $x$ is a substring of $y$. We can assume all strings have length at most $m$, where $m \ll n$, and are all…
securitymensch
- 81
- 4
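Because the question above assumes $m \ll n$, one possible approach is to index every substring of every string in a hash table and then look each string up. This is a sketch under that assumption, not the accepted answer; the function name `substring_pairs` is hypothetical, and equal strings at different positions are treated as substrings of each other.

```python
from collections import defaultdict

def substring_pairs(strings):
    """Return all pairs (x, y), at different list positions, where x is a substring of y."""
    # Map each non-empty substring to the indices of the strings containing it.
    containing = defaultdict(set)
    for j, y in enumerate(strings):
        for a in range(len(y)):
            for b in range(a + 1, len(y) + 1):
                containing[y[a:b]].add(j)
    pairs = []
    for i, x in enumerate(strings):
        # .get avoids inserting new keys into the defaultdict during lookup.
        for j in sorted(containing.get(x, ())):
            if i != j:
                pairs.append((x, strings[j]))
    return pairs
```

Indexing touches $O(n m^2)$ substrings, but the number of reported pairs can itself be quadratic in $n$, so no algorithm can beat $O(n^2)$ on inputs where that happens.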
3
votes
1 answer
one-to-many matching in bipartite graphs?
Consider having two sets $L$ (left) and $R$ (right).
$R$ nodes have a capacity limit.
Each edge $e$ has a cost $w(e)$.
I want to map each of the $L$ vertices to one node from $R$ (one-to-many matching), with minimum total edge-costs.
Each vertex in…
mcqueenvh
- 53
- 6
3
votes
1 answer
Find shortest prefix to generate original string by overlapping
Given a string $S$, I want to find the prefix string $P$ of shortest length, such that the original string $S$ can be generated by concatenating copies of $P$ (where overlapping is allowed).
For example, if $S = atgatgatatgat$, I want to find $P =…
Robert Lee
- 31
- 1
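The covering condition in the question above can be checked directly for each prefix in increasing order of length. This brute-force sketch assumes that copies may overlap or sit exactly end to end; the function name `shortest_cover` is hypothetical, and the running time is quadratic or worse, so it is a baseline rather than an efficient solution.

```python
def shortest_cover(s: str) -> str:
    """Return the shortest prefix whose overlapping copies cover all of s."""
    n = len(s)
    for length in range(1, n + 1):
        p = s[:length]
        covered = 0  # s[:covered] is covered by the copies placed so far.
        pos = s.find(p)
        # Accept an occurrence only if it leaves no gap after the covered part.
        while pos != -1 and pos <= covered:
            covered = pos + length
            pos = s.find(p, pos + 1)
        if covered == n:
            return p
    return s  # Only reached for the empty string.
```

The full string always covers itself, so the loop returns at `length == n` at the latest.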