14

Given two arbitrary regular expressions, is there an "efficient" algorithm to determine whether they match the same set of strings?

More generally, can we compute the size of the intersection of the two match sets?

What algorithms are there to do this, and what complexity class do they live in?

If we disallow the Kleene star, does that alter the picture at all?

Juho
MathematicalOrchid

2 Answers

15

Equivalence of regular expressions is known to be PSPACE-complete, so no polynomial-time algorithm is known, and none exists unless P = PSPACE. The paper "Complexity of Decision Problems for Simple Regular Expressions" lists several subclasses of regular expressions with their respective complexities. (link)

Hendrik Jan
15

Hendrik Jan's answer covers the complexity class well, but it does not give an actual algorithm.

The simplest algorithm I know of is to convert each regular expression to a DFA. There are standard techniques for converting a regular expression to an NFA (for example, Thompson's construction) and an NFA to a DFA (the subset construction).
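To make the NFA-to-DFA step concrete, here is a minimal sketch of the subset construction in Python. The representation is my own assumption, not a standard API: `delta` is a dict from `(state, symbol)` to a set of states, the symbol `None` stands for an epsilon move, and the name `nfa_to_dfa` is just illustrative.

```python
from collections import deque

def nfa_to_dfa(alphabet, delta, start, accepting):
    """Subset construction (illustrative sketch).

    delta maps (state, symbol) -> set of NFA states; symbol None is an
    epsilon move. accepting is a set of NFA states.
    Returns (dfa_delta, dfa_start, dfa_accepting); each DFA state is a
    frozenset of NFA states, and the empty frozenset acts as a dead state.
    """
    def closure(states):
        # Add every state reachable through epsilon moves alone.
        stack, seen = list(states), set(states)
        while stack:
            q = stack.pop()
            for r in delta.get((q, None), ()):
                if r not in seen:
                    seen.add(r)
                    stack.append(r)
        return frozenset(seen)

    dfa_start = closure({start})
    dfa_delta, dfa_accepting = {}, set()
    queue, visited = deque([dfa_start]), {dfa_start}
    while queue:
        S = queue.popleft()
        # A subset is accepting if it contains any accepting NFA state.
        if S & accepting:
            dfa_accepting.add(S)
        for a in alphabet:
            T = closure({r for q in S for r in delta.get((q, a), ())})
            dfa_delta[(S, a)] = T
            if T not in visited:
                visited.add(T)
                queue.append(T)
    return dfa_delta, dfa_start, dfa_accepting


# Hypothetical example: an NFA for (a|b)*a, states 0 and 1.
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
dfa_delta, dfa_start, dfa_accepting = nfa_to_dfa("ab", nfa_delta, 0, {1})
```

Only subsets reachable from the start state are built, so small inputs often stay small, but nothing here avoids the worst case.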

Once you have two DFAs, testing equivalence is decidable and efficient (polynomial in the size of the DFAs): minimize both and check that the results are isomorphic, since the minimal DFA for a language is unique up to isomorphism.
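If you would rather not implement minimization, a different but also polynomial check is to walk the two DFAs in lockstep and look for a reachable pair of states where exactly one is accepting. Note this is a product-automaton search, not the minimize-and-compare method described above. A rough sketch, assuming both DFAs are complete (every state has a transition on every symbol) and are given as dicts from `(state, symbol)` to state; the function name is made up for illustration:

```python
from collections import deque

def dfa_equivalent(alphabet, d1, s1, f1, d2, s2, f2):
    """Return True iff two complete DFAs accept the same language.

    d1, d2 map (state, symbol) -> state and must be total over alphabet.
    s1, s2 are the start states; f1, f2 are the sets of accepting states.
    """
    queue, seen = deque([(s1, s2)]), {(s1, s2)}
    while queue:
        p, q = queue.popleft()
        # Some string leads here; if only one side accepts, it is a witness.
        if (p in f1) != (q in f2):
            return False
        for a in alphabet:
            nxt = (d1[(p, a)], d2[(q, a)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True
```

This visits at most |Q1| * |Q2| state pairs, so it is cheap once the DFAs exist. The expensive part is still building them.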

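However, constructing these DFAs from the NFAs can take a lot of time and produce extremely large DFAs, exponentially large in the worst case. For example, the language of strings over {a, b} whose n-th symbol from the end is a has an NFA with n + 1 states, but its minimal DFA needs 2^n states.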

Joey Eremondi