20

Setting:

  • regular expressions with backreferences
  • unary language (1-symbol alphabet)

Is the following problem decidable in this setting:

  • Given a regular expression with backreferences, does it define a regular language?

For example, (aa+)\1 defines a regular language, while (aa+)\1+ doesn't. Can we decide which one is the case?


For concreteness, "regular expressions with backreferences" here refer to e.g. the following subset of the usual Perl-compatible regular expressions:

  • a matches character a (the only character in the alphabet)
  • X* matches 0 or more occurrences of X
  • X|Y matches X or Y
  • parentheses can be used for grouping and capturing
  • \1. \2, etc. match the same string as the 1st, 2nd, etc. pair of parentheses

We can also use the normal shorthands e.g. X+ = XX*.

Raphael
  • 73,212
  • 30
  • 182
  • 400
Jukka Suomela
  • 2,036
  • 16
  • 26

1 Answers1

4

Evidence against the effective decidability of the problem is provided by the construction in the proof of Theorem 9 in my paper On Practical Regular Expressions: You could determine if there are finitely many Fermat primes.

David Richerby
  • 82,470
  • 26
  • 145
  • 239