3

I have a Context-Free Grammar (CFG) G which has a S for generating a language L. Now to produce a grammar for L*, another variable T (which is not present in the variable set of G) is taken and the grammar is produced in the following way:

                                 G' = (V, A, T, P)  

where,

  G'.V = G.V U {T} (V is the variable set) 
  G'.A = G.A       (A is the alphabet here, I was't able to insert the symbol of capital sigma here)
  G'.T              is the starting variable
  G'.P = G.P U { T -> ST | NULL }   (P is a finite relation from V to (A U V)* )

My question is that why can't we use S-> SS | NULL instead of introducing T?

P.S. : This is my first question so apologies for incorrect formatting. Also I am new to Theory of Computation so a basic and simple explanation will help me more. Thank you.

killua
  • 33
  • 5

1 Answers1

3

In the general case, your grammar generates more words than those in $L^*$. Consider this grammar:

$$ \begin{array}{l} S \to A \\ A \to (S)\ |\ a \end{array} $$

The above grammar generates $L = \{ a, (a), ((a)), \ldots \}$.

Using Kleene's star, we get $L^* = \{a,aa,a(a), ((a))(a), \ldots \}$. Note that $(aa)\notin L^*$.

However, if we add your proposed productions, we get

$$ \begin{array}{l} S \to A\ |\ SS\ |\ \epsilon \\ A \to (S)\ |\ a \end{array} $$

whose language is $L' = \{a,(aa),((a)a)(a), \ldots \}$, and this is a strict superset of $L^*$.

Your construction looks correct in the cases where $S$ never occurs in the right-hand side of a production in the original grammar.

chi
  • 14,704
  • 1
  • 31
  • 40