
My problem is: how can I prove that a grammar is unambiguous? I have the following grammar: $$S → statement ∣ \mbox{if } expression \mbox{ then } S ∣ \mbox{if } expression \mbox{ then } S \mbox{ else } S$$

and I tried to convert it into an unambiguous grammar; I think this is correct:

  • $ S → S_1 ∣ S_2 $

  • $S_1 → \mbox{if } expression \mbox{ then } S ∣ \mbox{if } expression \mbox{ then } S_2 \mbox{ else } S_1$

  • $S_2 → \mbox{if } expression \mbox{ then } S_2 \mbox{ else } S_2 ∣ statement$

I know that an unambiguous grammar has exactly one parse tree for every word it generates.
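As a sanity check (not a proof), I counted parse trees mechanically. The sketch below uses my own one-letter abbreviations ('i' for "if expression then", 'e' for "else", 's' for "statement"; A and B stand for $S_1$ and $S_2$):

```python
# Sanity check (not a proof): count parse trees for a sample token string.
# Abbreviations (mine): 'i' = "if expression then", 'e' = "else",
# 's' = "statement"; A and B stand for S_1 and S_2.
AMBIG = {'S': ['s', 'iS', 'iSeS']}
UNAMB = {
    'S': ['A', 'B'],
    'A': ['iS', 'iBeA'],      # S_1: "unmatched" if
    'B': ['iBeB', 's'],       # S_2: "matched" if
}

def count_parses(grammar, nt, w, memo=None):
    """Number of parse trees deriving the string w from nonterminal nt."""
    if memo is None:
        memo = {}
    if (nt, w) not in memo:
        memo[(nt, w)] = sum(count_seq(grammar, rhs, w, memo)
                            for rhs in grammar[nt])
    return memo[(nt, w)]

def count_seq(grammar, rhs, w, memo):
    """Number of ways the symbol sequence rhs derives w."""
    if not rhs:
        return 1 if not w else 0
    head, rest = rhs[0], rhs[1:]
    if head in grammar:   # nonterminal: try every split of w
        return sum(count_parses(grammar, head, w[:k], memo)
                   * count_seq(grammar, rest, w[k:], memo)
                   for k in range(len(w) + 1))
    # terminal symbol: must match the first character of w
    return count_seq(grammar, rest, w[1:], memo) if w[:1] == head else 0

# "if E then if E then st else st" has two parses in the original grammar,
# exactly one in the rewritten one:
print(count_parses(AMBIG, 'S', 'iises'))   # 2
print(count_parses(UNAMB, 'S', 'iises'))   # 1
```

This only checks individual strings, of course; it cannot replace a proof.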

Raphael
user1594

4 Answers


There is (at least) one way to prove unambiguity of a grammar $G = (N,T,\delta,S)$ for language $L$. It consists of two steps:

  1. Prove $L \subseteq \mathcal{L}(G)$.
  2. Prove $[z^n]S_G(z) = |L_n|$, where $L_n = L \cap T^n$ is the set of words of length $n$ in $L$.

The first step is pretty clear: show that the grammar generates (at least) the words you want, that is correctness.

The second step shows that $G$ has as many syntax trees for words of length $n$ as $L$ has words of length $n$ -- together with step 1, this implies unambiguity. It uses the structure function of $G$, which goes back to Chomsky and Schützenberger [1], namely

$\qquad \displaystyle S_G(z) = \sum_{n=0}^\infty t_nz^n$

with $t_n = [z^n]S_G(z)$ the number of syntax trees $G$ has for words of length $n$. Of course you need to have $|L_n|$ for this to work.

The nice thing is that $S_G$ is (usually) easy to obtain for context-free languages, although finding a closed form for $t_n$ can be difficult. Transform $G$ into an equation system of functions with one variable per nonterminal:

$\qquad \displaystyle \left[ A(z) = \sum\limits_{(A, a_0 \dots a_k) \in \delta} \ \prod\limits_{i=0}^{k} \ \tau(a_i)\ : A \in N \right] \text{ with } \tau(a) = \begin{cases} a(z) &, a \in N \\ z &, a \in T \\ \end{cases}.$

This may look daunting but is really only a syntactical transformation as will become clear in the example. The idea is that generated terminal symbols are counted in the exponent of $z$ and because the system has the same form as $G$, $z^n$ occurs as often in the sum as $n$ terminals can be generated by $G$. Check Kuich [2] for details.
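To underline that this really is only a syntactic transformation, here is a throwaway helper (my own, hypothetical) that prints the equation system for a grammar given as a dict of nonterminals to right-hand-side strings:

```python
# Hypothetical helper: print the structure-function equation for each
# nonterminal, following the transformation above
# (nonterminal -> its series A(z), terminal -> z, empty rhs -> 1).
def structure_equations(grammar):
    eqs = {}
    for A, rhss in grammar.items():
        terms = []
        for rhs in rhss:
            factors = [f'{c}(z)' if c in grammar else 'z' for c in rhs] or ['1']
            terms.append('*'.join(factors))
        eqs[A] = f'{A}(z) = ' + ' + '.join(terms)
    return eqs

print(structure_equations({'S': ['aSa', 'bSb', '']})['S'])
# S(z) = z*S(z)*z + z*S(z)*z + 1
```

For the example grammar below this prints exactly the equation $S(z) = 2z^2 S(z) + 1$, just not yet simplified.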

Solving this equation system (computer algebra!) yields $S(z) = S_G(z)$; now you "only" have to pull the coefficient (in closed, general form). The TCS Cheat Sheet and computer algebra can often do so.


Example

Consider the simple grammar $G$ with rules

$\qquad \displaystyle S \to aSa \mid bSb \mid \varepsilon$.

It is clear that $\mathcal{L}(G) = \{ww^R \mid w \in \{a,b\}^*\}$ (step 1, proof by induction). There are $2^{\frac{n}{2}}$ palindromes of length $n$ if $n$ is even, $0$ otherwise.

Setting up the equation system yields

$\qquad \displaystyle S(z) = 2z^2S(z) + 1$

whose solution is

$\qquad \displaystyle S_G(z) = \frac{1}{1-2z^2}$.

The coefficients of $S_G$ coincide with the numbers of palindromes, so $G$ is unambiguous.
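For small $n$ the coefficient comparison can be done mechanically; a brute-force sketch (reading the coefficients off the functional equation rather than from a closed form):

```python
from itertools import product

N = 10  # check coefficients up to z^N

# Coefficients of S_G(z) = 1/(1 - 2 z^2), read off from the functional
# equation S(z) = 2 z^2 S(z) + 1:  t_0 = 1, t_1 = 0, t_n = 2 t_{n-2}.
t = [1, 0]
for n in range(2, N + 1):
    t.append(2 * t[n - 2])

# |L_n| by brute force: words of length n of the form w w^R over {a, b}
# are exactly the even-length palindromes.
for n in range(N + 1):
    L_n = sum(1 for w in product('ab', repeat=n)
              if n % 2 == 0 and w == w[::-1])
    assert t[n] == L_n

print(t)   # [1, 0, 2, 0, 4, 0, 8, 0, 16, 0, 32]
```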


  1. The Algebraic Theory of Context-Free Languages by Chomsky, Schützenberger (1963)
  2. On the entropy of context-free languages by Kuich (1970)
Raphael

For some grammars, a proof by induction (over word length) is possible.


Consider for example a grammar $G$ over $\Sigma = \{a,b\}$ given by the following rules:

$\qquad \displaystyle S \to aSa \mid bSb \mid \varepsilon$

All words of length $\leq 1$ in $L(G)$ -- there's only $\varepsilon$ -- have only one left-derivation.

Assume that all words of length $\leq n$ for some $n \in \mathbb{N}$ have only one left-derivation.

Now consider an arbitrary $w = w_1 w' w_{n+1} \in L(G) \cap \Sigma^{n+1}$. Clearly, $w_1 \in \Sigma$. If $w_1 = a$, we know that the first rule in every left-derivation has to be $S \to aSa$; if $w_1 = b$, it has to be $S \to bSb$. This covers all cases. By the induction hypothesis, we know that there is exactly one left-derivation for $w'$. In combination, we conclude that there is exactly one left-derivation for $w$ as well.
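The induction can be cross-checked by brute force (a sketch, not a proof, with my own encoding): enumerate all leftmost derivations up to a length bound and verify that no word is derived twice.

```python
from collections import Counter

RULES = ['aSa', 'bSb', '']   # right-hand sides of S -> aSa | bSb | eps
N = 8                        # length bound on generated words

def derive(form, out):
    """Expand the leftmost 'S' in the sentential form in every possible
    way; record finished words in out."""
    i = form.find('S')
    if i < 0:
        out[form] += 1       # no nonterminal left: a complete derivation
        return
    if len(form) - 1 > N:    # terminal part can only grow from here
        return
    for rhs in RULES:
        derive(form[:i] + rhs + form[i + 1:], out)

counts = Counter()
derive('S', counts)
# Every generated word has exactly one leftmost derivation,
# and every generated word is an even-length palindrome:
assert all(c == 1 for c in counts.values())
assert all(w == w[::-1] and len(w) % 2 == 0 for w in counts)
```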


This becomes harder if

  • there are multiple non-terminals,
  • the grammar is not linear, and/or
  • the grammar is left-recursive.

It may help to strengthen the claim to all sentential forms (if the grammar has no unproductive non-terminals) and "root" non-terminals.

I think the conversion to Greibach normal form maintains (un)ambiguity, so applying this step first may take care of left-recursion nicely.

The key is to identify one feature of every word that fixes (at least) one derivation step. The rest follows inductively.

Raphael

This is a good question, but some Googling would have told you that there is no general method for deciding ambiguity -- the problem is undecidable for context-free grammars -- so you need to make your question more specific.

reinierpost

Basically, it's a generation problem. Start with the start symbol and generate its children by applying every production; keep doing this recursively (DFS). After quite a few iterations, check whether you can generate the same expanded expression via two different derivations. If you can, the grammar is ambiguous. There is no way to bound the running time of this procedure, though. You might assume it's safe after generating, say, 30 levels of children :) (Of course it could bomb on the 31st.)
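A rough sketch of that procedure (my own encoding: uppercase letters are nonterminals; 'i'/'e'/'s' abbreviate the if/else/statement tokens of the question). It is only a semi-decision procedure: a `None` result proves nothing.

```python
from collections import Counter

def find_ambiguous_word(rules, start='S', max_len=10):
    """Depth-first search over leftmost derivations. Returns some word
    with two or more leftmost derivations, or None if none was found
    within the length bound (which proves nothing). Assumes the grammar
    has no cycles of terminal-free rules, so the search terminates."""
    seen = Counter()
    stack = [start]
    while stack:
        form = stack.pop()
        i = next((j for j, c in enumerate(form) if c in rules), -1)
        if i < 0:                        # terminal word reached
            seen[form] += 1
            if seen[form] > 1:
                return form
            continue
        for rhs in rules[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if sum(c not in rules for c in new) <= max_len:
                stack.append(new)
    return None

AMBIG = {'S': ['s', 'iS', 'iSeS']}   # the question's original grammar
UNAMB = {'S': ['A', 'B'], 'A': ['iS', 'iBeA'], 'B': ['iBeB', 's']}
print(find_ambiguous_word(AMBIG))    # some word with two parses (not None)
print(find_ambiguous_word(UNAMB))    # None
```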