Questions tagged [strings]

Questions about sequences of symbols, sets thereof and their properties as well as uses.

512 questions
43
votes
2 answers

Efficient data structures for building a fast spell checker

I'm trying to write a spell checker that should work with a fairly large dictionary. I want an efficient way to index my dictionary data so that I can use Damerau-Levenshtein distance to determine which words are closest to the misspelled…
Charles Menguy
  • 1,193
  • 1
  • 10
  • 12
32
votes
5 answers

Finding interesting anagrams

Say that $a_1a_2\ldots a_n$ and $b_1b_2\ldots b_n$ are two strings of the same length. An anagramming of two strings is a bijective mapping $p:[1\ldots n]\to[1\ldots n]$ such that $a_i = b_{p(i)}$ for each $i$. There might be more than one…
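The definition of an anagramming in the excerpt above can be checked mechanically. The following is only a minimal sketch: the function name `is_anagramming` is hypothetical, and it assumes 0-based indices with the mapping $p$ given as a list, whereas the question uses 1-based indices.

```python
def is_anagramming(a, b, p):
    """Check whether p is a bijection on range(n) with a[i] == b[p[i]] for all i.

    Assumption: indices are 0-based, and p is a list where p[i] is the image of i.
    """
    n = len(a)
    # Both strings and the mapping must have the same length.
    if len(b) != n or len(p) != n:
        return False
    # p is a bijection on {0, ..., n-1} exactly when it is a permutation of them.
    if sorted(p) != list(range(n)):
        return False
    # Every position i of a must match position p[i] of b.
    return all(a[i] == b[p[i]] for i in range(n))
```

For example, with `a = "abc"` and `b = "cab"`, the list `[1, 2, 0]` satisfies the definition, while the identity `[0, 1, 2]` does not.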
29
votes
2 answers

Longest Repeated (Scattered) Subsequence in a String

Informal Problem Statement: Given a string, e.g. $ACCABBAB$, we want to colour some letters red and some letters blue (and some not at all), such that reading only the red letters from left to right yields the same result as reading only the blue…
28
votes
1 answer

Is there a 'string stack' data structure that supports these string operations?

I'm looking for a data structure that stores a set of strings over a character set $\Sigma$, capable of performing the following operations. We denote $\mathcal{D}(S)$ as the data structure storing the set of strings $S$. Add-Prefix-Set on…
Alex ten Brink
  • 9,206
  • 3
  • 36
  • 63
27
votes
2 answers

Efficient map data structure supporting approximate lookup

I'm looking for a data structure that supports efficient approximate lookups of keys (e.g., Levenshtein distance for strings), returning the closest possible match for the input key. The best-suited data structures I've found so far are…
merijn
  • 409
  • 4
  • 6
25
votes
1 answer

Compression of domain names

I am curious as to how one might very compactly compress the domain of an arbitrary IDN hostname (as defined by RFC5890) and suspect this could become an interesting challenge. A Unicode host or domain name (U-label) consists of a string of Unicode…
21
votes
1 answer

Does every large enough string have repeats?

Let $\Sigma$ be some finite set of characters of fixed size. Let $\alpha$ be some string over $\Sigma$. We say that a nonempty substring $\beta$ of $\alpha$ is a repeat if $\beta = \gamma \gamma$ for some string $\gamma$. Now, my question is whether…
Alex ten Brink
  • 9,206
  • 3
  • 36
  • 63
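The notion of a repeat defined in the question above (a nonempty substring of the form $\gamma\gamma$) can be tested by brute force. This is a sketch for illustration only, not an efficient method; the name `has_repeat` is hypothetical.

```python
def has_repeat(alpha):
    """Return True if alpha contains a nonempty substring of the form gamma + gamma."""
    n = len(alpha)
    for start in range(n):
        # half is the length of gamma; the substring gamma + gamma must fit in alpha.
        for half in range(1, (n - start) // 2 + 1):
            first = alpha[start:start + half]
            second = alpha[start + half:start + 2 * half]
            if first == second:
                return True
    return False
```

For instance, `has_repeat("abcabc")` holds with $\gamma = abc$, while `"abacaba"` contains no repeat.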
19
votes
1 answer

How does the runtime of Ukkonen's algorithm depend on the alphabet size?

I am concerned with the asymptotic running time of Ukkonen's algorithm, perhaps the most popular algorithm for constructing suffix trees in linear (?) time. Here is a citation from the book "Algorithms on strings, trees and…
17
votes
3 answers

Dynamic programming exercise on cutting strings

I have been working on the following problem from this book. A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of…
Mark
  • 373
  • 1
  • 3
  • 7
15
votes
2 answers

Why is the base used to compute hashes in Rabin–Karp always prime?

The Rabin–Karp string matching algorithm requires a hash function which can be computed quickly. A common choice is $$ h(x_0\ldots x_n) = \sum_{i=0}^n b^i x_i, $$ where $b$ is prime (all computations are modulo $2^w$, where $w$ is the width of a…
Saurabh Jain
  • 291
  • 2
  • 7
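The hash from the question above, $h(x_0\ldots x_n) = \sum_{i=0}^n b^i x_i$ reduced modulo $2^w$, can be sketched as follows. The function name `poly_hash` and the default values `b=31` and `w=32` are assumptions chosen for illustration; they do not come from the question.

```python
def poly_hash(xs, b=31, w=32):
    """Return sum(b**i * xs[i] for i in range(len(xs))) modulo 2**w.

    xs is a sequence of non-negative integers (e.g. a bytes object).
    b and w are illustrative defaults, not values from the question.
    """
    mask = (1 << w) - 1  # reducing modulo 2**w is a bitwise AND with this mask
    h = 0
    power = 1  # holds b**i modulo 2**w
    for x in xs:
        h = (h + power * x) & mask
        power = (power * b) & mask
    return h
```

With `b=10`, the sequence `[1, 2, 3]` hashes to $1 + 20 + 300 = 321$, which makes the role of the base easy to see.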
14
votes
2 answers

Comparison between Aho-Corasick algorithm and Rabin-Karp algorithm

I am working on string searching algorithms that support multiple pattern search. I found two algorithms that seem like the strongest candidates in terms of running time, namely Aho-Corasick and Rabin-Karp. However, I could not find any…
Hawk
  • 241
  • 3
  • 7
13
votes
7 answers

How to check if two strings are permutations of each other using O(1) additional space?

Given two strings, how can you check whether they are permutations of each other using O(1) space? Modifying the strings is not allowed in any way. Note: O(1) space in relation to both the string length AND the size of the alphabet.
Teodor Dyakov
  • 1,341
  • 1
  • 13
  • 22
13
votes
5 answers

Word Frequency with Ordering in O(n) Complexity

During an interview for a Java developer position, I was asked the following: Write a function that takes two params: a String representing a text document and an integer providing the number of items to return. Implement the function such…
user2712937
  • 131
  • 1
  • 1
  • 3
12
votes
1 answer

Finding the longest repeating subsequence

Given a string $s$, I would like to find the longest repeating (at least twice) subsequence. That is, I would like to find a string $w$ which is a subsequence (not necessarily contiguous) of $s$ such that $w = w' \cdot w'$. That is, $w$ is a…
12
votes
1 answer

Edit distance of list with unique elements

Levenshtein edit distance between lists is a well-studied problem, but I can't find much on possible improvements when it is known that no element occurs more than once in each list. Let's also assume that the elements are…
user362178
  • 221
  • 1
  • 5