I have been given the following set of production rules $P$ for a grammar $G(A) = \newcommand{\perm}[1]{\left\langle #1 \right\rangle}\perm{\Sigma, V, P, S}$, generated from a pushdown automaton $A = \perm{Q, \Sigma, \Gamma, \delta, q_0, F}$, that recognizes the language $\newcommand{\lang}{\mathcal L}\lang = \newcommand{\set}[1]{\left\{ #1 \right\}}\newcommand{\Nset}{\mathbb{N}}\set{0^n1^n \mid n \in \Nset \cup \set{0}}$:
\begin{align*}
\newcommand{\rewrite}{\longrightarrow}
\perm{q, Sp} \in \delta(s, \epsilon, \epsilon) &: A(s, \epsilon, f) \rewrite A(q, S, q) A(q, p, f) \\
\perm{q, \epsilon} \in \delta(q, 0, 0) &: A(q, 0, q) \rewrite 0 \\
\perm{q, \epsilon} \in \delta(q, 1, 1) &: A(q, 1, q) \rewrite 1 \\
\perm{q, 0S1} \in \delta(q, \epsilon, S) &: A(q, S, q) \rewrite A(q, 0, q) A(q, S, q) A(q, 1, q) \\
\perm{q, \epsilon} \in \delta(q, \epsilon, S) &: A(q, S, q) \rewrite \epsilon \\
\perm{f, \epsilon} \in \delta(q, \epsilon, S) &: A(q, p, f) \rewrite \epsilon,
\end{align*}

In addition to these, we have the following productions related to the initial and accepting states $s$ and $f$ of the automaton $A$:

$$ S \rewrite A(s, \epsilon, s), \quad A(s, \epsilon, s) \rewrite \epsilon \quad\text{and}\quad S \rewrite A(s, \epsilon, f)\,. $$
I'm supposed to transform this grammar into Chomsky normal form, but to do so I first need to find out which of the variables (non-terminals) $v \in V$ are generating, meaning that some string of terminals can be derived from them: $\newcommand{\derive}{\Longrightarrow} v \derive_G^\ast x$ for some $x \in \Sigma^\ast$.
The following algorithm can be used to find out whether a variable $V_i$ is generating:
- Mark every symbol of the alphabet $\Sigma$ as generating (which makes sense, as there is no non-empty language without an alphabet).
- For each rule $V_i \rewrite v$ whose right-hand side $v$ contains only generating symbols, mark the variable $V_i$ as generating.
- Repeat step 2 until no new variable gets marked; the variables left unmarked are not generating.
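
The marking procedure above can be sketched in Python roughly as follows. This is only a minimal sketch under some assumptions of mine: the function name `generating_symbols` and the short variable names (`A_sef` for $A(s, \epsilon, f)$, `A_qSq` for $A(q, S, q)$, and so on) are made up for illustration, and an $\epsilon$-production is encoded as an empty right-hand side. Under that encoding, an empty right-hand side trivially contains only generating symbols, and a self-referential rule simply waits until some other rule has marked its head.

```python
def generating_symbols(terminals, productions):
    """Return all generating symbols, found by repeated marking.

    productions is a list of (head, body) pairs, where body is a tuple
    of symbols. Assumption of this sketch: an epsilon-production is
    written as the empty tuple ().
    """
    # Step 1: every terminal is generating.
    generating = set(terminals)

    # Step 3: repeat step 2 until a full pass marks nothing new.
    changed = True
    while changed:
        changed = False
        # Step 2: mark the head of any rule whose body is fully marked.
        for head, body in productions:
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    return generating


# The grammar from the question, with hypothetical short names.
productions = [
    ("S", ("A_ses",)),
    ("A_ses", ()),                        # A(s, eps, s) -> eps
    ("S", ("A_sef",)),
    ("A_sef", ("A_qSq", "A_qpf")),
    ("A_q0q", ("0",)),
    ("A_q1q", ("1",)),
    ("A_qSq", ("A_q0q", "A_qSq", "A_q1q")),
    ("A_qSq", ()),                        # A(q, S, q) -> eps
    ("A_qpf", ()),                        # A(q, p, f) -> eps
]

result = generating_symbols({"0", "1"}, productions)
print(sorted(result - {"0", "1"}))
```

The loop stops as soon as one complete pass over the rules adds nothing, which is the "repeat step 2" condition of the list above.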
Now obviously the variables $A(q, 0, q)$ and $A(q, 1, q)$ are generating, as the terminals $0$ and $1$ are generating. But what about the variables $S$, $A(s,\epsilon,s)$, $A(q, S, q)$ and $A(q, p, f)$? The empty string $\epsilon$ is obviously a member of the language $\lang$, but can it be treated as if it were a symbol of the alphabet $\Sigma$?
If it can, then the symbols $A(s, \epsilon, s)$, $S$ and $A(q, p, f)$ are also generating. But what about $A(q, S, q)$? It appears on the right-hand side of one of its own rules, so can it be considered generating? If it can, then every variable is generating and this step would not simplify the production rules at all.
But is this the case?