Finding Language of a CFG

Question

Say you are given the following CFG $G$: $$ S \to S_1 \mid S_2 \\ S_1 \to AbAS_1c \mid \epsilon \\ S_2 \to BaBS_2c \mid \epsilon \\ A \to Aa \mid \epsilon \\ B \to Bb \mid \epsilon $$

What is $L(G)$?
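A quick way to sanity-check any candidate answer is to list the short strings the grammar actually generates. The sketch below does this with a bounded breadth-first search over leftmost derivations in Python. The names `GRAMMAR` and `enumerate_language` are illustrative. The search relies on one property of $G$: no production ever erases a terminal, so any sentential form with more than `max_len` terminals can be discarded.

```python
from collections import deque

# Grammar G from the question. Keys are nonterminals; each right-hand side is a
# tuple of symbols, and the empty tuple () stands for epsilon.
GRAMMAR = {
    "S": [("S1",), ("S2",)],
    "S1": [("A", "b", "A", "S1", "c"), ()],
    "S2": [("B", "a", "B", "S2", "c"), ()],
    "A": [("A", "a"), ()],
    "B": [("B", "b"), ()],
}


def enumerate_language(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    results = set()
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Locate the leftmost nonterminal; if there is none, form is a sentence.
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            # Terminals are never erased, so too many terminals is a dead end.
            terminals = sum(1 for sym in new_form if sym not in grammar)
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


print(sorted(enumerate_language(GRAMMAR, "S", 3), key=lambda w: (len(w), w)))
# → ['', 'ac', 'bc', 'abc', 'bac']
```

Strings such as `abc` (two $A$'s producing different numbers of $a$'s) and the absence of any string made of $a$'s alone are worth comparing against a proposed $L(G)$.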

So far I've derived the following regular expressions:

$ S \rightarrow (a^*ba^*)^*c \mid (b^*ab^*)^*c$

So far I've come up with this $L(G)$:

$L(G) = \{ (a^nba^n)^qc \mid (b^nab^n)^qc : n,q \geq 0 \}$

When you expand the second $S_1$, do you include the $c$ (as in, finish $S_1c$ and then recurse)?

Hendrik Jan · Answer 1 · 2016-07-01T11:03:13.803

Let's consider "half" of your problem: $S\to AbASc \mid \epsilon$, and $A\to aA \mid \epsilon$. It is important to realize that the number of $b$'s and $c$'s generated is equal. Try to express that in your solution.

At the same time, whenever two $A$'s are generated, each of them independently generates an arbitrary number of $a$'s. One of your solutions claims these two numbers are always the same, but they are not: for example, $abc$ is generated.

Personally I would prefer to write this language as $L(G_{\frac12}) = \{ wc^k \mid w\in\{a,b\}^*, k\ge1, \#_b(w) = k\} \cup \{\epsilon\}$, where $\#_b(w)$ is the notation for the number of occurrences of letter $b$ in the string $w$. Alternatively, if you want to avoid that notation perhaps $L(G_{\frac12}) = \{ a^{n_0}ba^{n_1}b\dots ba^{n_k}c^k \mid k\ge1, n_0,\dots,n_k\ge 0\}\cup \{\epsilon\}$.
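The first description translates directly into a membership test. The Python function below is a hypothetical helper (`in_half_language` does not appear in either answer): it splits off the trailing block of $c$'s and compares its length $k$ with $\#_b(w)$. It covers only the $S_1$ half of the grammar.

```python
def in_half_language(s: str) -> bool:
    """Test s against {w c^k : w in {a,b}*, k >= 1, #_b(w) = k} ∪ {ε}."""
    if s == "":
        return True  # the separately added epsilon
    w = s.rstrip("c")  # w is s without its trailing block of c's
    k = len(s) - len(w)  # k is the length of that block
    return k >= 1 and set(w) <= {"a", "b"} and w.count("b") == k


for s in ["", "abc", "aabaabcc", "a", "abcc", "bcbc"]:
    print(repr(s), in_half_language(s))
```

The `k >= 1` condition is what keeps strings consisting only of $a$'s out of the language.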

It is not immediate how to write this using "regular expressions", as the equality of the number of $b$'s and $c$'s is not a regular property itself. Also, we cannot put the star operator inside set brackets, as the star yields a language itself, so it would be proper to write $L(G_{\frac12}) = \bigcup_{k\ge1} (a^*b)^ka^*c^k \cup \{\epsilon\} = \bigcup_{k\ge0} (a^*ba^*)^kc^k $.
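For any fixed $k$, each term of these unions is an ordinary regular expression, so the two forms can be compared mechanically. The Python sketch below reads $k$ off the trailing $c$'s and then calls `re.fullmatch` with a `{k}` repetition. The function names are illustrative, and checking every string up to a fixed length is a sanity check, not a proof that the two unions are equal.

```python
import re
from itertools import product


def in_union_k_ge_1(s: str) -> bool:
    """s in (union over k >= 1 of (a*b)^k a* c^k) ∪ {ε}."""
    if s == "":
        return True
    w = s.rstrip("c")
    k = len(s) - len(w)
    return k >= 1 and re.fullmatch(f"(?:a*b){{{k}}}a*", w) is not None


def in_union_k_ge_0(s: str) -> bool:
    """s in union over k >= 0 of (a*ba*)^k c^k; k = 0 contributes only ε."""
    w = s.rstrip("c")
    k = len(s) - len(w)
    return re.fullmatch(f"(?:a*ba*){{{k}}}", w) is not None


def forms_agree(max_len: int) -> bool:
    """Compare both forms on every string over {a, b, c} up to max_len."""
    for n in range(max_len + 1):
        for letters in product("abc", repeat=n):
            s = "".join(letters)
            if in_union_k_ge_1(s) != in_union_k_ge_0(s):
                return False
    return True


print(forms_agree(7))  # → True
```

With $k = 0$, the first pattern would reduce to `a*`, which is why that form needs $k \ge 1$ plus a separate $\{\epsilon\}$, while $(a^*ba^*)^0$ is just $\epsilon$.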

Sorry about the $\dots \cup \{\epsilon\}$ in several of these solutions. Allowing $k=0$ would not give the right answer, as that would add $a^*$ to the language, which is not generated by the grammar.

Chul Hyun Kim · Accepted Answer · 2016-06-29T02:14:06.127

To find the language of the grammar, you need to understand how the recursion in the production rules works.

When solving $A \to Aa \mid \epsilon$, note that $\epsilon$ acts as a stopper for the recursive production, so what you need to determine is how many times the recursive production can be applied before it stops.

One way to get an expression for $A$ in terms of terminals is to keep applying the recursive production. Then take into account that $\epsilon$ can be applied at any point, which determines how many times the production was used, and use that to express $A$ concisely.

$A \Rightarrow Aa \Rightarrow Aaa \Rightarrow Aaaa \Rightarrow \dots$ You can see that one $a$ is produced per step and that $A \to \epsilon$ can stop the derivation after any step, so you can substitute $a^*$ for $A$, since $a^*$ stands for $a^n$ with $n$ any integer $\ge 0$.
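The "$\epsilon$ as a stopper" argument can be mirrored by computing the language of $A \to Aa \mid \epsilon$ as a fixed point: start from the empty set and repeatedly apply both alternatives. This Python sketch uses the illustrative name `language_of_A` and a length bound, which is needed only because $a^*$ itself is infinite.

```python
def language_of_A(max_len: int) -> set[str]:
    """Strings of length <= max_len derivable from A -> Aa | epsilon."""
    lang: set[str] = set()
    while True:
        # A -> epsilon contributes "", and A -> Aa appends one 'a' to any
        # string A already derives.
        new = {""} | {x + "a" for x in lang if len(x) < max_len}
        if new == lang:  # nothing new: fixed point reached
            return lang
        lang = new


print(sorted(language_of_A(4)))  # → ['', 'a', 'aa', 'aaa', 'aaaa']
```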

The same approach works for $S_1$ and $S_2$.

First, use the conclusion that $A$ can be replaced by $a^*$, since they describe the same strings; then $AbAS_1c$ becomes $(a^*)b(a^*)S_1c$. Keep applying this production until you see the pattern: $S_1 \Rightarrow (a^*)b(a^*)S_1c \Rightarrow (a^*)b(a^*)(a^*)b(a^*)S_1cc \Rightarrow (a^*)b(a^*)(a^*)b(a^*)(a^*)b(a^*)S_1ccc$ (note that one $c$ is produced every time the production is applied).

Now notice the adjacent $(a^*)(a^*)$, which can be replaced by a single $a^*$ because both denote the same language. Let $L$ be the number of $a$'s produced by the left star and $R$ the number produced by the right star; both are arbitrary integers with $L \ge 0$ and $R \ge 0$. Their sum $T = L + R$ is then also an integer with $T \ge 0$, and every such value is reachable (for example with $L = T$, $R = 0$). That is exactly the definition of $a^*$, so $(a^*)(a^*)$ is the same expression as $a^*$.
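The claim that $(a^*)(a^*)$ and $a^*$ denote the same language can be spot-checked by building both sets up to a length bound. The Python below is only a bounded check with illustrative helper names; the general argument is the one about $L + R$ above.

```python
def a_star(max_len: int) -> set[str]:
    """a^n for every 0 <= n <= max_len."""
    return {"a" * n for n in range(max_len + 1)}


def concat(xs: set[str], ys: set[str], max_len: int) -> set[str]:
    """All concatenations x + y with x in xs, y in ys, of length <= max_len."""
    return {x + y for x in xs for y in ys if len(x) + len(y) <= max_len}


bound = 6
print(concat(a_star(bound), a_star(bound), bound) == a_star(bound))  # → True
```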

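Then the derivation $S_1 \Rightarrow (a^*)b(a^*)S_1c \Rightarrow (a^*)b(a^*)(a^*)b(a^*)S_1cc \Rightarrow (a^*)b(a^*)(a^*)b(a^*)(a^*)b(a^*)S_1ccc$

becomes $S_1 \Rightarrow (a^*)b(a^*)S_1c \Rightarrow (a^*)b(a^*)b(a^*)S_1cc \Rightarrow (a^*)b(a^*)b(a^*)b(a^*)S_1ccc \Rightarrow (a^*)b(a^*)b(a^*)b(a^*)b(a^*)S_1cccc \Rightarrow \dots$

(You now know that a $c$ is produced every time you apply the rule, so this also answers your second question.)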
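The unrolling and merging steps can be replayed as plain string rewriting. In this Python sketch (the name `unroll_S1` is hypothetical), $A$ has already been replaced by `(a*)`, each step rewrites the single `S1` as `(a*)b(a*)S1c`, and adjacent `(a*)(a*)` pairs are then merged.

```python
def unroll_S1(steps: int) -> str:
    """Apply S1 -> (a*)b(a*)S1c `steps` times, then merge (a*)(a*) into (a*)."""
    form = "S1"
    for _ in range(steps):
        # There is always exactly one S1, so replacing the first one suffices.
        form = form.replace("S1", "(a*)b(a*)S1c", 1)
    # Merge adjacent pairs until none remain.
    while "(a*)(a*)" in form:
        form = form.replace("(a*)(a*)", "(a*)")
    return form


for steps in range(1, 4):
    print(unroll_S1(steps))
```

This prints `(a*)b(a*)S1c`, `(a*)b(a*)b(a*)S1cc` and `(a*)b(a*)b(a*)b(a*)S1ccc`, with one `b` and one `c` added per step.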

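So $S_1$ generates $(a^*b)^na^*c^n$, since one $a^*b$ and one $c$ are added every time the production for $S_1$ is applied; equivalently, you can write $a^*(ba^*)^nc^n$, grouping each $b$ with the $a^*$ that follows it. Here $n \ge 1$ is the number of times $S_1 \to AbAS_1c$ is used; applying $S_1 \to \epsilon$ immediately gives only $\epsilon$.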

You can do the same for $S_2$, and the result is $(b^*a)^nb^*c^n$, or equivalently $b^*(ab^*)^nc^n$.

Since $S \to S_1 \mid S_2$, you finally get $L(G)$:

$L(G) = \{\epsilon\} \cup \bigcup_{n \ge 1} a^*(ba^*)^nc^n \cup \bigcup_{n \ge 1} (b^*a)^nb^*c^n$
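The final answer can be turned into a membership test in the same spirit. The observation behind this Python sketch is that $S_2$ is $S_1$ with $a$ and $b$ swapped, so a single helper, parameterized by the "filler" letter and the counted letter, covers both halves. `in_half` and `in_L_G` are illustrative names.

```python
import re


def in_half(s: str, filler: str, counted: str) -> bool:
    """s in filler* (counted filler*)^n c^n for some n >= 1."""
    w = s.rstrip("c")  # w is s without its trailing block of c's
    n = len(s) - len(w)
    pattern = f"{filler}*(?:{counted}{filler}*){{{n}}}"
    return n >= 1 and re.fullmatch(pattern, w) is not None


def in_L_G(s: str) -> bool:
    """s in L(G): empty, or in the S1 half (a*(ba*)^n c^n), or in the S2 half."""
    return s == "" or in_half(s, "a", "b") or in_half(s, "b", "a")


for s in ["", "abaac", "babbacc", "aacc", "abc", "a", "abcc"]:
    print(repr(s), in_L_G(s))
```

Note that `abcc` is rejected: it has neither two $b$'s (needed for the $S_1$ half with $n = 2$) nor two $a$'s (needed for the $S_2$ half), and the two halves cannot be mixed within one string.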
