Regular expressions with backreferences over unary alphabet

Question

Setting:

regular expressions with backreferences
unary language (1-symbol alphabet)

Is the following problem decidable in this setting:

Given a regular expression with backreferences, does it define a regular language?

For example, (aa+)\1 defines a regular language, while (aa+)\1+ doesn't. Can we decide which one is the case?

For concreteness, "regular expressions with backreferences" here refer to e.g. the following subset of the usual Perl-compatible regular expressions:

a matches character a (the only character in the alphabet)
X* matches 0 or more occurrences of X
X|Y matches X or Y
parentheses can be used for grouping and capturing
\1. \2, etc. match the same string as the 1st, 2nd, etc. pair of parentheses

We can also use the normal shorthands e.g. X+ = XX*.

score 4 · Accepted Answer · edited Aug 27 '18 at 13:57

4

Evidence against the effective decidability of the problem is provided by the construction in the proof of Theorem 9 in my paper On Practical Regular Expressions: You could determine if there are finitely many Fermat primes.

edited Aug 27 '18 at 13:57

David Richerby

82,470
26
145
239

answered Aug 26 '18 at 19:48

Holger Petersen

76
2

Regular expressions with backreferences over unary alphabet

1 Answers1