
I am trying to understand what is meant by "deterministic" in expressions such as "deterministic context-free grammar" (there are more "deterministic" things in this field). I would appreciate an example more than the most elaborate explanation, if possible.

My primary source of confusion is from not being able to tell how this property of a grammar is different from (non-)ambiguity.

The closest I got to finding what it means is this quote from the paper by D. Knuth On the Translation of Languages from Left to Right:

Ginsburg and Greibach (1965) have defined the notion of a deterministic language; we show in Section V that these are precisely the languages for which there exists an LR(k) grammar

which becomes circular as soon as you get to Section V, because there it says that the languages an LR(k) parser can parse are the deterministic languages...


Below is an example I found that helps me understand what "ambiguous" means; please take a look:

onewartwoearewe

It can be parsed as one war two ear ewe or o new art woe are we, if the grammar allows it (say it has all the words I just listed).

What would I need to do to make this example language (non-)deterministic? (I could, for example, remove the word o from the grammar, to make the grammar not ambiguous).

Is the above language deterministic?

PS. The example is from the book Gödel, Escher, Bach: An Eternal Golden Braid.


Let's say we define the grammar for the example language like so:

S -> A 'we' | A 'ewe'
A -> B | BA
B -> 'o' | 'new' | 'art' | 'woe' | 'are' | 'one' | 'war' | 'two' | 'ear'

By the argument about having to parse the whole string, does this grammar make the language non-deterministic?


let explode s =
  let rec exp i l =
    if i < 0 then l else exp (i - 1) (s.[i] :: l) in
  exp (String.length s - 1) [];;

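let rec woe_parser s =
  (* cases are tried top to bottom and the first match wins; there is no
     backtracking, so alternatives to a chosen word are never tried *)
  match s with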
  | 'w' :: 'e' :: [] -> true
  | 'e' :: 'w' :: 'e' :: [] -> true
  | 'o' :: x -> woe_parser x
  | 'n' :: 'e' :: 'w' :: x -> woe_parser x
  | 'a' :: 'r' :: 't' :: x -> woe_parser x
  | 'w' :: 'o' :: 'e' :: x -> woe_parser x
  | 'a' :: 'r' :: 'e' :: x -> woe_parser x
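  (* this case is never reached: the 'o' :: x case above already matches every
     list starting with 'o', so OCaml warns that it is unused (warning 11).
     The clash between "o" and "one" is where the grammar's ambiguity shows up *)
  | 'o' :: 'n' :: 'e' :: x -> woe_parser x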
  | 'w' :: 'a' :: 'r' :: x -> woe_parser x
  | 't' :: 'w' :: 'o' :: x -> woe_parser x
  | 'e' :: 'a' :: 'r' :: x -> woe_parser x
  | _ -> false;;

woe_parser (explode "onewartwoearewe");;
- : bool = true
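Because `woe_parser` commits to the first word that fits and never backtracks, it cannot say how many readings a string has. Here is a minimal backtracking sketch that counts parse trees for the grammar S -> A 'we' | A 'ewe', A -> B | BA, assuming the word list from the question; `count_a`, `count_s` and `occurs_at` are illustrative names, not part of the original code.

```ocaml
(* Right-hand sides of the B rules (the word list from the question). *)
let words = [ "o"; "new"; "art"; "woe"; "are"; "one"; "war"; "two"; "ear" ]

(* Does word [w] start at index [i] of [s] and end at or before index [j]? *)
let occurs_at s i j w =
  let k = i + String.length w in
  k <= j && String.sub s i (String.length w) = w

(* Parse trees for A over s.[i..j-1]: A -> B takes one word covering the
   whole span, A -> BA takes one word and recurses on the rest. *)
let rec count_a s i j =
  List.fold_left
    (fun acc w ->
      if occurs_at s i j w then (
        let k = i + String.length w in
        acc + (if k = j then 1 else count_a s k j))
      else acc)
    0 words

(* Parse trees for S: an A followed by the suffix "we" or "ewe". *)
let count_s s =
  let n = String.length s in
  List.fold_left
    (fun acc suffix ->
      let m = String.length suffix in
      if m <= n && String.sub s (n - m) m = suffix then acc + count_a s 0 (n - m)
      else acc)
    0 [ "we"; "ewe" ]
```

With these definitions `count_s "onewartwoearewe"` is 2, matching the two derivations tabulated below, and `count_s "onewe"` is 1, although `woe_parser (explode "onewe")` returns false because the parser commits to 'o' and never tries 'one'.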

| Label   | Pattern      |
|---------+--------------|
| rule-01 | S -> A 'we'  |
| rule-02 | S -> A 'ewe' |
| rule-03 | A -> B       |
| rule-04 | A -> BA      |
| rule-05 | B -> 'o'     |
| rule-06 | B -> 'new'   |
| rule-07 | B -> 'art'   |
| rule-08 | B -> 'woe'   |
| rule-09 | B -> 'are'   |
| rule-10 | B -> 'one'   |
| rule-11 | B -> 'war'   |
| rule-12 | B -> 'two'   |
| rule-13 | B -> 'ear'   |

Generating onewartwoearewe:

First way to generate:

| Input             | Rule    | Product           |
|-------------------+---------+-------------------|
| S                 | rule-01 | A'we'             |
| A'we'             | rule-04 | BA'we'            |
| BA'we'            | rule-05 | 'o'A'we'          |
| 'o'A'we'          | rule-04 | 'o'BA'we'         |
| 'o'BA'we'         | rule-06 | 'onew'A'we'       |
| 'onew'A'we'       | rule-04 | 'onew'BA'we'      |
| 'onew'BA'we'      | rule-07 | 'onewart'A'we'    |
| 'onewart'A'we'    | rule-04 | 'onewart'BA'we'   |
| 'onewart'BA'we'   | rule-08 | 'onewartwoe'A'we' |
| 'onewartwoe'A'we' | rule-03 | 'onewartwoe'B'we' |
| 'onewartwoe'B'we' | rule-09 | 'onewartwoearewe' |
|-------------------+---------+-------------------|
|                   |         | 'onewartwoearewe' |

Second way to generate:

| Input             | Rule    | Product           |
|-------------------+---------+-------------------|
| S                 | rule-02 | A'ewe'            |
| A'ewe'            | rule-04 | BA'ewe'           |
| BA'ewe'           | rule-10 | 'one'A'ewe'       |
| 'one'A'ewe'       | rule-04 | 'one'BA'ewe'      |
| 'one'BA'ewe'      | rule-11 | 'onewar'A'ewe'    |
| 'onewar'A'ewe'    | rule-04 | 'onewar'BA'ewe'   |
| 'onewar'BA'ewe'   | rule-12 | 'onewartwo'A'ewe' |
| 'onewartwo'A'ewe' | rule-03 | 'onewartwo'B'ewe' |
| 'onewartwo'B'ewe' | rule-13 | 'onewartwoearewe' |
|-------------------+---------+-------------------|
|                   |         | 'onewartwoearewe' |
wvxvw

5 Answers

A PDA is deterministic, hence a DPDA, iff for every reachable configuration of the automaton there is at most one transition (i.e., at most one possible next configuration). If you have a PDA that can reach some configuration from which two or more distinct transitions are possible, you do not have a DPDA.

Example:

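Consider the following family of PDAs with $Q = \{q_0, q_1, q_2\}$, $\Sigma = \{a, b\}$, $\Gamma = \{a, b, Z_0\}$, start state $q_0$ and $\delta$ given by the following table, where each row gives the current state q, the input symbol e, the top of the stack s, the next state q' and the string s' that replaces s on the stack: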

q    e    s    q'   s'
--   --   --   --   --
q0   a    Z0   q1   aZ0
q0   a    Z0   q2   bZ0
...

These are nondeterministic PDAs because the initial configuration $(q_0, Z_0)$ is reachable, and there are two valid transitions leading away from it if the input symbol is $a$. Any time this PDA starts processing a string that begins with an $a$, there is a choice, and a choice means nondeterminism.

Consider, instead, the following transition table:

q    e    s    q'   s'
--   --   --   --   --
q0   a    Z0   q1   aZ0
q0   b    Z0   q2   bZ0
q1   a    a    q0   aa
q1   a    b    q0   ab
q1   a    b    q2   aa
q2   b    a    q0   ba
q2   b    b    q0   bb
q2   b    a    q1   bb

You might be tempted to say this PDA is nondeterministic; after all, there are two valid transitions away from the configuration $(q_1, b(a+b)^*Z_0)$ on input $a$, for instance. However, since this configuration is not reachable by any path through the automaton, it doesn't count. The only reachable configurations are a subset of $(q_0, (a+b)^*Z_0)$, $(q_1, a(a+b)^*Z_0)$ and $(q_2, b(a+b)^*Z_0)$, and for each of these configurations at most one transition is defined for each input symbol.
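The reachability argument can be checked mechanically, at least for a bounded number of input symbols. The sketch below is illustrative, not a general DPDA decision procedure: it ignores ε-moves and acceptance, writes the bottom marker $Z_0$ as 'Z', and assumes the second row of the second table reads q0 b Z0 q2 bZ0, which is what the reachable configuration $(q_2, b(a+b)^*Z_0)$ requires (with input a, that row would repeat the first table's conflict). The names `find_conflict`, `applicable` and `step` are hypothetical.

```ocaml
(* One table row: in [state], reading [input] with [top] on the stack,
   go to [next] and replace [top] by [push] (leftmost character = new top). *)
type row = { state : string; input : char; top : char; next : string; push : string }

let r state input top next push = { state; input; top; next; push }

(* Rows applicable in configuration (state, stack) on input symbol c. *)
let applicable delta (state, stack) c =
  match stack with
  | [] -> []
  | top :: _ ->
      List.filter (fun x -> x.state = state && x.input = c && x.top = top) delta

(* Take one move: pop the top symbol and push the row's string. *)
let step x (_, stack) =
  (x.next, List.of_seq (String.to_seq x.push) @ List.tl stack)

(* Breadth-first search over configurations reachable within [depth] input
   symbols; return the first (configuration, symbol) with two or more moves. *)
let find_conflict delta alphabet start depth =
  let conflict_in cfg =
    List.find_map
      (fun c -> if List.length (applicable delta cfg c) > 1 then Some (cfg, c) else None)
      alphabet
  in
  let rec go frontier d =
    if d > depth || frontier = [] then None
    else
      match List.find_map conflict_in frontier with
      | Some _ as found -> found
      | None ->
          go
            (List.concat_map
               (fun cfg ->
                 List.concat_map
                   (fun c -> List.map (fun x -> step x cfg) (applicable delta cfg c))
                   alphabet)
               frontier)
            (d + 1)
  in
  go [ start ] 0

let first_table = [ r "q0" 'a' 'Z' "q1" "aZ"; r "q0" 'a' 'Z' "q2" "bZ" ]

let second_table =
  [ r "q0" 'a' 'Z' "q1" "aZ"; r "q0" 'b' 'Z' "q2" "bZ";
    r "q1" 'a' 'a' "q0" "aa"; r "q1" 'a' 'b' "q0" "ab"; r "q1" 'a' 'b' "q2" "aa";
    r "q2" 'b' 'a' "q0" "ba"; r "q2" 'b' 'b' "q0" "bb"; r "q2" 'b' 'a' "q1" "bb" ]
```

Here `find_conflict` reports the choice at $(q_0, Z_0)$ on input a for the first table and nothing for the second, even though a check of the rows alone would flag the two q1 rows with input a and stack top b.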

A CFL is deterministic iff it is the language of some DPDA.

A CFG is unambiguous if every string has at most one parse tree (equivalently, at most one leftmost derivation) according to the CFG. Otherwise, the grammar is ambiguous. If you have a CFG and you can produce two different derivation trees for some string, you have an ambiguous grammar.

A CFL is inherently ambiguous iff it is not the language of any unambiguous CFG.

Note the following:

  • A deterministic CFL must be the language of some DPDA.
  • Every CFL is the language of infinitely many nondeterministic PDAs.
  • An inherently ambiguous CFL is not the language of any unambiguous CFG.
  • Every CFL is the language of infinitely many ambiguous CFGs.
  • An inherently ambiguous CFL cannot be deterministic.
  • A nondeterministic CFL may or may not be inherently ambiguous.
Patrick87

Here are examples (from Wikipedia):

The language of even-length palindromes over the alphabet $\{0, 1\}$ is not deterministic, but it has an unambiguous grammar: $S \rightarrow 0S0 | 1S1|\varepsilon$. The language is not deterministic because you need to look at the whole string to figure out where the middle is. The grammar is unambiguous because there is one and only one parse tree for each string in the language.
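To see what "figuring out where the middle is" costs, here is a sketch that simulates the nondeterministic automaton by trying every guess for the middle, next to a one-pass recognizer for the marked variant $\{wcw^R\}$, a standard example of a deterministic language. Both functions are illustrative and assume the input uses only 0 and 1 (plus c as the marker).

```ocaml
(* Nondeterministic recognizer for even-length palindromes over {0,1}:
   every possible middle position is one branch of the computation. *)
let accepts_even_pal (w : string) =
  let n = String.length w in
  let branch mid =
    (* Push the first [mid] symbols, then pop while matching the rest. *)
    let stack = ref [] in
    for i = 0 to mid - 1 do
      stack := w.[i] :: !stack
    done;
    let ok = ref true in
    for i = mid to n - 1 do
      (match !stack with
       | top :: rest when top = w.[i] -> stack := rest
       | _ -> ok := false)
    done;
    !ok && !stack = []
  in
  (* Accept iff some guess of the middle leads to acceptance. *)
  List.exists branch (List.init (n + 1) Fun.id)

(* Deterministic recognizer for w c w^R: push until the marker 'c', then pop.
   At every step there is exactly one thing to do, so nothing is guessed. *)
let accepts_marked_pal (w : string) =
  let n = String.length w in
  let rec push stack i =
    if i >= n then false
    else if w.[i] = 'c' then pop stack (i + 1)
    else push (w.[i] :: stack) (i + 1)
  and pop stack i =
    match stack with
    | [] -> i = n
    | top :: rest -> i < n && w.[i] = top && pop rest (i + 1)
  in
  push [] 0
```

The first function explores up to n + 1 branches; the second never branches, which is the informal difference between the two kinds of automaton.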

A context-free language is deterministic if and only if there exists at least one deterministic pushdown automaton that accepts it. (There may also be many nondeterministic pushdown automata that accept the language; it is still a deterministic language.) Essentially, a deterministic pushdown automaton is one whose transitions are determined by the current state, the input symbol and the topmost stack symbol. Deterministic here means that there is no more than one transition for any state/input symbol/topmost stack symbol triple. If some triple has two or more possible next states, the automaton is nondeterministic. (You would need to "guess" which transition to take in order to decide whether the automaton accepts.)
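That condition can be checked directly on a transition table. Below is a minimal sketch with a hypothetical table (the record fields and example rows are illustrative). Like the paragraph, it leaves ε-moves out, although the full DPDA definition also forbids an ε-move for a state and stack top that already have an input move.

```ocaml
(* A move: in [state], reading [input] with [top] on the stack,
   go to [next] and replace [top] by [push]. *)
type move = { state : int; input : char; top : char; next : int; push : string }

(* Triples (state, input, top) that have two or more moves: the places where
   the automaton would have to guess. *)
let conflicts (delta : move list) =
  let keys = List.map (fun m -> (m.state, m.input, m.top)) delta in
  keys
  |> List.filter (fun k -> List.length (List.filter (( = ) k) keys) > 1)
  |> List.sort_uniq compare

let is_deterministic delta = conflicts delta = []

(* Hypothetical table: push a's, then pop one a per b ('Z' = stack bottom). *)
let det =
  [ { state = 0; input = 'a'; top = 'Z'; next = 0; push = "aZ" };
    { state = 0; input = 'a'; top = 'a'; next = 0; push = "aa" };
    { state = 0; input = 'b'; top = 'a'; next = 1; push = "" };
    { state = 1; input = 'b'; top = 'a'; next = 1; push = "" } ]

(* A second move for (0, 'b', 'a') makes the table nondeterministic. *)
let nondet = det @ [ { state = 0; input = 'b'; top = 'a'; next = 0; push = "a" } ]
```

`is_deterministic det` holds, while `conflicts nondet` returns `[(0, 'b', 'a')]`. This check looks only at the table; it does not ask whether the conflicting configuration is ever reached.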

What Knuth proved was that the language of every LR(k) grammar is accepted by some deterministic pushdown automaton, and that the language of every deterministic pushdown automaton has an LR(k) grammar. So LR(k) grammars and deterministic pushdown automata can handle the same set of languages. But the set of languages accepted by some deterministic pushdown automaton is, by definition, the set of deterministic languages. So the argument isn't circular.

So a deterministic language always has an unambiguous grammar (an LR(k) grammar, and LR(k) grammars are never ambiguous). The converse fails: we've shown an unambiguous grammar whose language has no deterministic pushdown automaton, so it is an unambiguous grammar for a non-deterministic language.

Are there context-free languages for which no unambiguous grammar exists? It turns out there are. An example (again from Wikipedia) is the union of $\{a^nb^mc^md^n \mid n,m>0\}$ and $\{a^nb^nc^md^m \mid n,m>0\}$. Each of the sets is clearly context-free, and the union of context-free languages is context-free. Strings of the form $\{a^nb^nc^nd^n \mid n>0\}$ are obviously in this language (in fact, that set is the intersection of the two languages), and Hopcroft and Ullman proved that no matter what grammar you come up with for the union language, there will be some string in the intersection set that has two different parse trees.
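Membership in the two parts of the union is easy to test, and doing so shows where they overlap. A minimal sketch with illustrative names; it does not prove inherent ambiguity (that is the Hopcroft and Ullman argument), it only shows that strings of the form $a^nb^nc^nd^n$ belong to both parts, which is where the proof finds a string with two parse trees.

```ocaml
(* Split s into the block form a^i b^j c^k d^l; None if s has another shape. *)
let blocks s =
  let n = String.length s in
  let rec run ch i = if i < n && s.[i] = ch then run ch (i + 1) else i in
  let ia = run 'a' 0 in
  let ib = run 'b' ia in
  let ic = run 'c' ib in
  let id = run 'd' ic in
  if id = n then Some (ia, ib - ia, ic - ib, id - ic) else None

(* First part: a^n b^m c^m d^n with n, m > 0. *)
let in_l1 s =
  match blocks s with
  | Some (a, b, c, d) -> a > 0 && b > 0 && a = d && b = c
  | None -> false

(* Second part: a^n b^n c^m d^m with n, m > 0. *)
let in_l2 s =
  match blocks s with
  | Some (a, b, c, d) -> a > 0 && c > 0 && a = b && c = d
  | None -> false

let in_union s = in_l1 s || in_l2 s
```

For example, "aabbccdd" is in both parts, "abbccd" only in the first and "aabbcd" only in the second.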

Wandering Logic

Deterministic context-free languages are those which are accepted by some deterministic pushdown automaton (context-free languages are those accepted by some non-deterministic pushdown automaton). As such, it's a property of a language rather than of a grammar. In contrast, ambiguity is a property of a grammar, while inherent ambiguity is a property of a language (a context-free language is inherently ambiguous if every context-free grammar for the language is ambiguous).

There is a connection between the two definitions: deterministic context-free languages are never inherently ambiguous, as shown in the answer to this question.

Yuval Filmus

Definitions

  1. A deterministic pushdown acceptor (DPDA) is a pushdown automaton that never has a choice in its move.
  2. DPDAs and NPDAs are not equivalent: some CFLs are accepted by an NPDA but by no DPDA.
  3. A CFG is non-deterministic iff it has at least two productions for the same nonterminal whose right-hand sides begin with the same terminal prefix.
  4. A CFG is ambiguous iff there exists some w ∈ L(G) that has at least two distinct derivation trees. Equivalently, w has two or more distinct leftmost (or rightmost) derivations.
  5. A CFG is unambiguous iff every string has at most one derivation tree according to the CFG. Otherwise, the grammar is ambiguous.
  6. A CFL is inherently ambiguous iff it is not the language of any unambiguous CFG. It cannot have any DPDA.
    If every grammar that generates a CFL is ambiguous, then the CFL is called inherently ambiguous. Thus it is not the language of any unambiguous CFG.
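Definition 3 can be checked mechanically. The sketch below reads it as applying to productions of the same nonterminal and, as an assumption, treats each character of the question's grammar as one terminal; it reports every (nonterminal, first terminal) pair shared by two or more alternatives. The type `symbol` and the function names are illustrative.

```ocaml
(* Grammar symbols; as an assumption, every character is its own terminal. *)
type symbol = T of char | N of string

(* Turn a string of terminals into a list of symbols. *)
let ts s = List.map (fun c -> T c) (List.of_seq (String.to_seq s))

(* (nonterminal, first terminal) pairs shared by two or more alternatives:
   sharing a non-empty terminal prefix means sharing the first terminal. *)
let shared_prefixes (prods : (string * symbol list) list) =
  let firsts =
    List.filter_map
      (fun (lhs, rhs) -> match rhs with T c :: _ -> Some (lhs, c) | _ -> None)
      prods
  in
  firsts
  |> List.filter (fun key -> List.length (List.filter (( = ) key) firsts) >= 2)
  |> List.sort_uniq compare

(* The grammar from the question, written out character by character. *)
let question_grammar =
  [ ("S", N "A" :: ts "we"); ("S", N "A" :: ts "ewe");
    ("A", [ N "B" ]); ("A", [ N "B"; N "A" ]) ]
  @ List.map (fun w -> ("B", ts w))
      [ "o"; "new"; "art"; "woe"; "are"; "one"; "war"; "two"; "ear" ]
```

On the question's grammar this reports B's alternatives starting with a (art, are), o (o, one) and w (woe, war); S and A are not reported because their alternatives start with a nonterminal.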

Facts

  1. Every CFL is the language of infinitely many nondeterministic PDAs.
  2. Every CFL is the language of infinitely many ambiguous CFGs.
  3. A CFL accepted by some DPDA is not inherently ambiguous. (There exists at least one unambiguous CFG for it.)
  4. A CFL accepted by an NPDA may or may not be inherently ambiguous, as there may exist an unambiguous CFG (or even a DPDA) for it.
  5. A CFL generated by an ambiguous CFG may or may not be inherently ambiguous, as there may exist an unambiguous CFG (or even a DPDA) for it.
  6. A CFL generated by at least one unambiguous CFG is not inherently ambiguous. (This does not guarantee a DPDA for it: the even-length palindromes have an unambiguous CFG but no DPDA.)
  7. A non-deterministic grammar may or may not be ambiguous.

Answer to your question (relation between determinism and ambiguity)

  1. (Non-)ambiguity applies mainly to grammars (here CFGs). (Non-)determinism applies to both grammars and automata (here PDAs).

    If you want the logical differences, look at the last five points in the Facts section, as they relate ambiguity and determinism. I repeat them here:

  2. A CFL accepted by some deterministic PDA is not inherently ambiguous. (There exists at least one unambiguous CFG for it.)
  3. A CFL accepted by a non-deterministic PDA may or may not be inherently ambiguous, as there may exist an unambiguous CFG (or even a deterministic PDA) for it.
  4. A CFL generated by an ambiguous CFG may or may not be inherently ambiguous, as there may exist an unambiguous CFG (or even a deterministic PDA) for it.
  5. A CFL generated by at least one unambiguous CFG is not inherently ambiguous. (This does not guarantee a deterministic PDA for it.)
  6. A non-deterministic grammar may or may not be ambiguous.

PS:

  1. The accepted answer uses phrases like “CFL is deterministic”, “deterministic CFL”, “CFL cannot be deterministic” and “a nondeterministic CFL”. I guess the adjectives “deterministic” and “ambiguous” do not apply to CFLs, but to PDAs and CFGs (though the adjective “inherently ambiguous” does apply to CFLs). I don't want to criticize the original answer, as I learnt crucial points from it (in fact, I have copied some lines from it verbatim), but I felt it should be made more precise. So I have tried to state things more clearly here in two parts, definitions and facts (perhaps more verbosely than necessary). I considered editing the original answer instead, but that would mean deleting many points that use the phrases above, and I am not sure such a complete rewrite would be a valid edit.
  2. Notice that I have put quantitative words in bold italics to highlight the differences between the definitions and facts. The defined terms are only in bold.
  3. A couple of points are my own, so I would appreciate confirmation from someone knowledgeable about the correctness of every point.
Mahesha999

Perhaps an example would help here. Consider the language of palindromic strings over $\{a,b\}$, i.e. $\{ w \in (a+b)^* \mid w = w^R \}$. It is generated by the grammar $S\to aSa | bSb | a | b | \epsilon$, which is unambiguous, so the language is not inherently ambiguous. A PDA accepting this language would simply push $a$'s and $b$'s until it locates the middle of the string, possibly skipping one $a$ or $b$ (for odd-length strings), and then match the remaining $a$'s and $b$'s by reading input and popping the stack until both are empty. But there is no way to make a DETERMINISTIC PDA for this language, because the PDA cannot locate the center of the string without guessing.
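The unambiguity claim can be checked by counting parse trees. Once both ends of a substring are known, at most one production fits, so the count is always 0 or 1. Note that the function uses the right end j of the substring, which is exactly what a one-pass automaton reading left to right does not know. A minimal sketch; `trees` and `count_trees` are illustrative names.

```ocaml
(* Parse trees of s.[i..j-1] under S -> aSa | bSb | a | b | ε.
   For length >= 2 only aSa or bSb can apply, and only if both ends match. *)
let rec trees s i j =
  match j - i with
  | 0 -> 1 (* S -> ε *)
  | 1 -> if s.[i] = 'a' || s.[i] = 'b' then 1 else 0 (* S -> a | b *)
  | _ ->
      if (s.[i] = 'a' || s.[i] = 'b') && s.[i] = s.[j - 1]
      then trees s (i + 1) (j - 1) (* S -> aSa | bSb *)
      else 0

let count_trees s = trees s 0 (String.length s)
```

For instance, "abba" and "aba" each have exactly one parse tree, while "abab" has none.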

Yuval Filmus
PMar