Regex for $L = \{ w \mid w \in \Sigma^* \text{ and each substring } u \text{ of } w \text{ where } |u| = 4 \text{ contains the character } 0 \}$

Question

The task is to write a regular expression for the following language $L$ over $\Sigma = \{0,1\}$.

$L = \{ w \mid w \in \Sigma^* \text{ and each substring } u \text{ of } w \text{ where } |u| = 4 \text{ contains the character } 0 \}$

Note that if $|w| \leq 3$ then $w$ has no substring of length 4, so $w$ is in $L$ whether or not it contains a 0.

I came up with the following regular expression:

$r = (0 \Sigma^3 + \Sigma 0 \Sigma^2 + \Sigma^2 0 \Sigma + \Sigma^3 0)^* (\epsilon + \Sigma + \Sigma^2 + \Sigma^3)$, where $\Sigma = (0 + 1)$.

This answer is wrong because, for example, $r$ generates the word $0111101$, which contains the substring $1111$ and therefore is not in $L$.
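The counterexample can be checked mechanically. The sketch below is only an illustration: it assumes a Python translation of $r$ (with $\Sigma$ written as `[01]` and `+` as `|`), and the names `R_PATTERN`, `in_L` and `matches_r` are made up for this example. It also hints at why $r$ fails: $r$ only constrains the blocks that start at positions 0, 4, 8, ..., while the definition of $L$ constrains every window of length 4.

```python
import re

# Assumed Python translation of r: Sigma becomes [01], "+" becomes "|".
R_PATTERN = re.compile(
    r"(?:0[01]{3}|[01]0[01]{2}|[01]{2}0[01]|[01]{3}0)*[01]{0,3}"
)


def in_L(w: str) -> bool:
    """Direct definition of L: every length-4 substring of w contains a 0."""
    # For len(w) < 4 the range is empty, so the condition holds vacuously.
    return all("0" in w[i:i + 4] for i in range(len(w) - 3))


def matches_r(w: str) -> bool:
    """True if the whole word w is generated by r."""
    return R_PATTERN.fullmatch(w) is not None


w = "0111101"
# r splits w as 0111 | 101, but the window w[1:5] is 1111.
print(matches_r(w), in_L(w))  # prints: True False
```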

I tried converting the NFA in the picture to a regex, but the result was too long, and it misses the point of the question: I am supposed to derive the regex by understanding what characterizes the words in $L$.

I came to the conclusion that words in $L$ can't contain the substring $1111$, but I don't know how to use this to build a regex.

It seems I'm falling just short of the solution, and I would like to know how I can transform $r$ into the required regex.

[Image: NFA for $L$]

codeR · Accepted Answer · 2024-04-19T07:48:30.370

Basically, your language asks you to restrict the run length of 1s to at most three. Here's how you can construct the required regex:

Step 1: Construct a DFA $M$ that accepts all strings that have at least one run of 1s of length four (or more). It should have five states with a single final state.

Step 2: Construct the complement $M'$ of $M$ by swapping the final and non-final status of the states.

Step 3: Build the regex for $M'$ using Arden's theorem.
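Steps 1 and 2 can be sketched in code. This is a minimal illustration, not the only possible construction: it assumes the five states are numbered 0 to 4, where state $i < 4$ means "the current run of 1s has length $i$" and state 4 is the trap state reached once $1111$ has been seen. The function names are hypothetical.

```python
def final_state(w: str) -> int:
    """Run the assumed five-state DFA on w and return the state it ends in."""
    state = 0
    for ch in w:
        if state == 4:
            break  # trap state: 1111 has been seen, nothing can undo it
        # A 1 extends the current run; a 0 resets the run length to zero.
        state = state + 1 if ch == "1" else 0
    return state


def accepts_M(w: str) -> bool:
    """M: the single final state is 4 (some run of 1s has length >= 4)."""
    return final_state(w) == 4


def accepts_M_complement(w: str) -> bool:
    """M': final and non-final states swapped, so states 0..3 are final."""
    return final_state(w) != 4


print(accepts_M("0111101"), accepts_M_complement("0111101"))
```

For Step 3, one way the computation can go is as follows. Write $L_i$ (notation assumed here) for the set of strings that $M'$ accepts when started in state $i$, and use Arden's rule in the form $X = AX + B \Rightarrow X = A^*B$, valid when $\epsilon \notin A$:

$L_3 = 0L_0 + \epsilon$

$L_2 = 0L_0 + 1L_3 + \epsilon = (0+10)L_0 + (\epsilon+1)$

$L_1 = 0L_0 + 1L_2 + \epsilon = (0+10+110)L_0 + (\epsilon+1+11)$

$L_0 = 0L_0 + 1L_1 + \epsilon = (0+10+110+1110)L_0 + (\epsilon+1+11+111)$

so $L_0 = (0+10+110+1110)^*(\epsilon+1+11+111)$.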

If you want to derive the regex directly, here's how you can proceed: after three consecutive 1s, the next character must be a 0 unless the string ends there. Zeros themselves are unrestricted and may appear anywhere. So every word in $L$ is a sequence of blocks, each consisting of at most three 1s followed by a 0, and then a tail of at most three 1s. Thus, the regex is: $(0+10+110+1110)^*(\epsilon+1+11+111)$
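As a sanity check, the regex can be compared against the definition of $L$ by brute force over all short binary strings. The sketch below assumes a Python translation of the regex (`+` becomes `|`, and the tail $\epsilon+1+11+111$ becomes `1{0,3}`); `ANSWER`, `in_L` and `mismatches` are names introduced only for this example. An exhaustive check up to a bounded length is evidence, not a proof.

```python
import re
from itertools import product

# Assumed Python translation of (0+10+110+1110)^*(eps+1+11+111).
ANSWER = re.compile(r"(?:0|10|110|1110)*1{0,3}")


def in_L(w: str) -> bool:
    """Direct definition of L: every length-4 substring of w contains a 0."""
    return all("0" in w[i:i + 4] for i in range(len(w) - 3))


def mismatches(max_len: int) -> list[str]:
    """All binary strings up to max_len on which regex and definition disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            w = "".join(chars)
            if (ANSWER.fullmatch(w) is not None) != in_L(w):
                bad.append(w)
    return bad


print(mismatches(12))  # an empty list means no disagreement was found
```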
