Given regular expression construct regex for the complement language

Question

Disclaimer: this is my uni assignment, which is worth comparatively few marks, so I assume that the answer should be simple. Hints are appreciated (as opposed to direct answers).

Write an algorithm which accepts a regular expression $r$ and produces a regular expression for the complement language $\overline{L[r]}$.

I think it is reasonable to assume that all operations I need to consider are just the concatenation, union and Kleene star. For simplicity, I assumed the alphabet to be $\{a,b,c\}$. I also think that I can invert individual operations as follows:

$a^* \to (a^*(b+c))^+$.
$(a+b) \to (\epsilon+(c(a+b+c)^*))$.
$ab \to \epsilon+((aa+b+c)(a+b+c)^*)$.

But attempting to combine these operations produces wrong results, for example:

$$
\begin{aligned}
a^*b^* &\to (a^*(b+c))^+(b^*(a+c))^+ \\
&\to (\epsilon+((a^*(b+c))^+(a^*(b+c))^++b+c)(a+b+c)^*)(\epsilon+((b^*(a+c))^++a+c)(a+b+c)^*)
\end{aligned}
$$

doesn't seem to do what I expect it to because, for example, it will match the empty string on both sides.

Should I perhaps consider transforming the regexp into DFA and "inverting" the DFA instead?

score 5 · Accepted Answer · answered Sep 09 '15 at 10:31

The problem with your attempt is that you've only looked at what happens when the regexp to transform is a simple one. For example, you've looked at what the complement of $a^*$ looks like. But in order to write a compositional complementation algorithm, you need to figure out how to compute a regular expression that recognizes the complement of the language of $r^*$, for an arbitrary regular expression $r$.

The bad news is that your approach to build a complement for $a^*$ does not generalize easily. Consider for example the regular expression $(a + b + c)$. A regular expression that recognizes its complement language is $\epsilon + (a + b + c) (a + b + c) (a + b + c)^*$ — but this isn't really useful to know if you're looking for the complement of $(a + b + c)^*$, which is the empty language. You can't reach the complement of $(a+b+c)^*$ from starring something related to the complement of $(a+b+c)$.
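This can be checked by brute force. The sketch below uses Python's `re` module, writing $+$ as a character class or `|` and $\epsilon$ as an empty alternative. The helper name `language` and the length bound of 4 are assumptions made for illustration only; it tests words up to that length, so it is a sanity check rather than a proof.

```python
import re
from itertools import product

ALPHABET = "abc"
MAX_LEN = 4  # arbitrary bound for the brute-force check (an assumption)

def language(pattern, max_len=MAX_LEN):
    """All words over ALPHABET of length <= max_len that fully match pattern."""
    words = ("".join(p) for n in range(max_len + 1)
             for p in product(ALPHABET, repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}

everything = language(r"[abc]*")
single = language(r"[abc]")                   # (a + b + c)
co_single = language(r"|[abc][abc][abc]*")    # epsilon + (a+b+c)(a+b+c)(a+b+c)^*

# Up to the length bound, co_single is exactly the complement of single.
assert co_single == everything - single

# The complement of (a + b + c)^* is empty ...
co_star = everything - language(r"[abc]*")
assert co_star == set()

# ... but starring the complement of (a + b + c) is far from empty.
# (The epsilon alternative is redundant under a star, so it is dropped here.)
star_of_co_single = language(r"(?:[abc][abc][abc]*)*")
assert star_of_co_single != co_star
```

Comparing finite slices of the languages like this is also a cheap way to test any candidate complement expression before trusting it.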

You can approach this problem by adding a complement operator to regular expressions, in which case the algorithm you're looking for is an algorithm to remove the complement operator from such generalized regular expressions. It turns out that this brings you close to an open problem, the generalized star height problem: it's unknown whether you can eliminate nested stars in such generalized regular expressions, which shows that the interaction between the star operator and the complement operator is poorly understood.

Your idea of transforming the regular expression into a DFA and back is a good one. It's perfectly reasonable to switch to a different representation of an object in which the operation you're interested in is easier. In fact, this is the usual algorithm to find a regular expression for the complement language of a regular expression:

1. Build a DFA that recognizes the language of the regular expression.
2. Transform the DFA into one that recognizes the complement language.
3. Build a regular expression whose language is that of the second DFA.

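Step 2 is very easy: just swap accepting and non-accepting states. This requires the DFA to be complete, i.e. every state has a transition on every symbol, possibly into a non-accepting sink state. Note that this doesn't work on an NFA!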
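As a minimal sketch of step 2 only (steps 1 and 3 are left out), assume the automaton is given as a transition dictionary. The function names `accepts`, `complement_dfa`, and `nfa_accepts`, and both example automata, are hypothetical and chosen for illustration. The DFA example is for $a^*$ over $\{a,b,c\}$ and includes an explicit sink state so that it is complete.

```python
from itertools import product

ALPHABET = "abc"

def accepts(delta, start, accepting, word):
    """Run a complete DFA (delta maps (state, symbol) -> state) on word."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]
    return state in accepting

def complement_dfa(states, accepting):
    """Step 2: the new accepting set is every state that was not accepting."""
    return states - accepting

# Hypothetical complete DFA for a^*: state 0 loops on 'a', state 1 is a sink.
states = {0, 1}
delta = {(0, "a"): 0, (0, "b"): 1, (0, "c"): 1,
         (1, "a"): 1, (1, "b"): 1, (1, "c"): 1}
accepting = {0}
co_accepting = complement_dfa(states, accepting)

# Every word up to length 4 is accepted by exactly one of the two DFAs.
words = ["".join(p) for n in range(5) for p in product(ALPHABET, repeat=n)]
assert all(accepts(delta, 0, accepting, w) != accepts(delta, 0, co_accepting, w)
           for w in words)

def nfa_accepts(delta, start, accepting, word):
    """Run an NFA (delta maps (state, symbol) -> set of states) on word."""
    current = {start}
    for symbol in word:
        current = {t for s in current for t in delta.get((s, symbol), set())}
    return bool(current & accepting)

# Hypothetical NFA: on 'a', state 0 may stay in 0 or move to accepting state 1.
nfa_delta = {(0, "a"): {0, 1}}
nfa_accepting = {1}
swapped = {0, 1} - nfa_accepting

# The same swap fails here: "a" is accepted both before and after it.
assert nfa_accepts(nfa_delta, 0, nfa_accepting, "a")
assert nfa_accepts(nfa_delta, 0, swapped, "a")
```

The swap fails on the NFA because a word is accepted as soon as any one run ends in an accepting state, so after swapping, a different run of the same word can still accept it.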
