Questions tagged [exact-string-matching]
24 questions
8
votes
2 answers
Minimal regular expression that matches a given set of words
I have a dictionary-like regular expression, an "or chain" of words,
word1|word2|word3|...
Unfortunately, the chain is too large. I'd like to find the minimal regular expression that is equivalent. How can I do that?
You should think of this…
Peter Krauss
- 151
- 8
7
votes
3 answers
Is there a data structure for efficiently searching a string that contains a given substring?
This question arose from a practical problem: given a set of texts, find one that contains a given string (not a word).
Let $S$ be a set of $n$ strings, and $l$ the length of the longest string in $S$. What will be the best data structure to…
Somnium
- 285
- 2
- 11
7
votes
1 answer
How does Galil's rule work in the Boyer-Moore algorithm?
I would like to know how the Boyer-Moore text searching algorithm works with Galil's rule. I tried to search for an explanation, but I couldn't understand the information I found, for example this Wikipedia page.
And why does this rule give us a linear time…
Andi Domi
- 73
- 7
6
votes
1 answer
String matching algorithm - check if a string matches a pattern
This looks like quite the challenge; given a pattern $P$ (of length $n$) and a string $S$ (of length $m$), how would you check whether the string matches the pattern? For instance:
If $P$ = "xyx" and $S$ = "foobarfoo" then $S$ matches $P.$
If $P$…
Mathguy
- 411
- 5
- 14
4
votes
1 answer
Runtime of good suffix table creation in Boyer-Moore algorithm
According to Wikipedia, both the bad character table and the good suffix table can be created in $O(n)$ time, where $n$ is the length of the pattern. It is pretty obvious how the bad character table can be computed in linear time, but I don't understand how…
nlogn
- 143
- 4
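The linear-time construction of the bad character table mentioned in the excerpt above can be illustrated with a short Python sketch. It assumes the common "last occurrence" form of the table; the function name `bad_character_table` is hypothetical, and the good suffix table the question actually asks about is not covered here.

```python
def bad_character_table(pattern: str) -> dict[str, int]:
    """Map each character of `pattern` to the 0-based index of its
    rightmost occurrence. One pass over the pattern, so O(n) time."""
    last = {}
    for i, ch in enumerate(pattern):
        # Later occurrences overwrite earlier ones, so the rightmost wins.
        last[ch] = i
    return last
```

Characters that do not occur in the pattern are simply absent from the dictionary; a caller would typically treat them as index -1.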
4
votes
4 answers
Substring in an infinite sequence of numbers
I have an infinite sequence of numbers, starting from 1, and need to find the position where a given substring of digits begins.
Example:
1234567891011121314151617181920 ...
S = 141
Result: 18
All I can think of is to convert the sequence to a string and find…
fryme
- 41
- 3
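The naive approach hinted at in the excerpt above (write out the sequence as a string and search it) can be sketched in Python as follows. This is only an illustration under the assumption that positions are 1-based, as in the example; `first_position` is a hypothetical name, and the sketch makes no attempt at the more clever solution the question is presumably after.

```python
from itertools import count


def first_position(pattern: str) -> int:
    """Return the 1-based position of the first occurrence of `pattern`
    in the digit string 1234567891011... (brute force)."""
    if not pattern.isdigit():
        raise ValueError("pattern must be a non-empty string of digits")
    text = ""
    for n in count(1):
        # An occurrence lying entirely in the old text would already have
        # been found, so only search the part that overlaps the new digits.
        start = max(0, len(text) - len(pattern) + 1)
        text += str(n)
        idx = text.find(pattern, start)
        if idx != -1:
            return idx + 1
```

The loop terminates for every digit string, because the pattern eventually appears inside some integer (for instance the integer written as "1" followed by the pattern).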
3
votes
3 answers
Complexity of string comparison vs whitespace-trimmed string comparison
I recently worked on an algorithm which, among other things, checks strings for equality using the classic builtin equality operator:
str1 == str2
(I think it should be irrelevant to the question, but I faced this issue in C++, and str1 and str2…
Enlico
- 127
- 9
3
votes
1 answer
Why does the exact string matching brute force algorithm not compare index 1 of P with index 1 of S in the first round of the for?
In my ADS course we were given this pseudo code for the "exact string matching brute force" algorithm:
ESM-BF(P, S)
m = length(P), n = length(S)
k = 0 # number of matches
for j = 1, ..., n-m+1 do
    i = 1
    while i ≤ m and P[i]…
ilam engl
- 151
- 6
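A minimal Python sketch of the brute-force exact string matching scheme that the truncated pseudocode above appears to describe, using 0-based indices instead of the 1-based indices of the excerpt. The loop body after `P[i]` is cut off in the excerpt, so the comparison and counting steps here are assumptions, and `count_matches` is a hypothetical name.

```python
def count_matches(pattern: str, text: str) -> int:
    """Count the (possibly overlapping) occurrences of `pattern` in `text`
    by trying every alignment."""
    m, n = len(pattern), len(text)
    k = 0  # number of matches
    for j in range(n - m + 1):  # every possible start position in text
        i = 0
        # Compare pattern[i] with the text character aligned to it.
        while i < m and pattern[i] == text[j + i]:
            i += 1
        if i == m:  # the whole pattern matched at shift j
            k += 1
    return k
```

In the first round (`j = 0`, `i = 0`) this compares the first character of the pattern with the first character of the text, then advances `i` only while the characters agree.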
3
votes
1 answer
Is there any neutral element for the cryptographic hash function SHA256? (or its variants)
My question is the following:
Is it possible to compute a string such that applying the SHA256 function to it yields the same string?
Edit for clarification:
If my string A is a neutral element of SHA256, then:
A == SHA256(A) is true.
Does A…
L. Fernandez
- 31
- 2
2
votes
0 answers
Bad character rule in the Apostolico–Giancarlo algorithm
In the paper "Tight bounds on the complexity of the Apostolico-Giancarlo algorithm" by Crochemore and Lecroq, the authors prove that the algorithm performs no more than $1.5n$ character comparisons in the processing stage. If I understand their proof…
Nikita Sivukhin
- 151
- 6
2
votes
0 answers
Why is the second while loop in KMP not a conditional statement?
When building the partial match table for KMP:
void buildBackTable() {
    int i = 0, j = -1; b[0] = -1;
    while (i < m) {
        while (j >= 0 && P[i] != P[j]) j = b[j]; // Why is this a while loop!?
        i++; j++;
        b[i] = j;
    }
}…
netwalkergreen
- 21
- 1
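The table construction quoted above can be transcribed into Python to experiment with it. This is a direct sketch of the excerpt's loop, with the pattern passed as a parameter instead of the globals `P`, `m` and `b`; the name `build_back_table` and the sample pattern are illustrative choices, not part of the original question.

```python
def build_back_table(pattern: str) -> list[int]:
    """Build the KMP back table with m + 1 entries, where b[0] = -1."""
    m = len(pattern)
    b = [0] * (m + 1)
    b[0] = -1
    i, j = 0, -1
    while i < m:
        # Fall back through shorter and shorter borders until one can be
        # extended by pattern[i], or until no border is left (j == -1).
        while j >= 0 and pattern[i] != pattern[j]:
            j = b[j]
        i += 1
        j += 1
        b[i] = j
    return b


print(build_back_table("aabaaa"))  # [-1, 0, 1, 0, 1, 2, 2]
```

For the pattern "aabaaa", the step at `i = 2` needs two consecutive fallbacks (`j` goes from 1 to 0 to -1), which a single `if` would not perform.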
2
votes
0 answers
How to build a minimal string matching DFA with limited memory?
I am working on finite state machine pruning, a problem that requires me to build finite state machines (in the manner of the Aho-Corasick algorithm) to match an evolving input string against a set of suffixes. If a match occurs, the search is…
fuz
- 913
- 6
- 20
2
votes
1 answer
Information-theoretic lower bound for succinct string dictionary of the Unicode Name property
Background
The literature on succinct data structures refers often to the “information-theoretic lower bound” of encoding data, i.e., the minimum number of bits needed to store the data – a concept related to information-theory entropy. For…
jschoi
- 123
- 5
2
votes
1 answer
What is the SetHorspool string searching algorithm and how is it implemented?
What is the SetHorspool string searching algorithm with pseudo-code so it can be easily implemented in a language of choice?
This has been implemented in 2 libraries I have come…
2
votes
2 answers
Sliding Window Dictionary String Matching
Consider the following problem. We are given a set of patterns (strings) $\Pi = \{\pi_i\}$, a text $s$, and a window length $k$. We want a list of all shifts $0 \le i \le |s|-k$ such that every pattern in $\Pi$ is contained in the substring…
dysonsfrog
- 227
- 1
- 4