9

In the article Parsing Expressions by Recursive Descent by Theodore Norvell (1999) the author starts with the following grammar for arithmetic expressions:

E --> E "+" E | E "-" E | "-" E | E "*" E | E "/" E | E "^" E | "(" E ")" | v

which is quite bad, because it's ambiguous and left-recursive. So he starts from removing the left recursion from it, and his result is as such:

E --> P {B P}
P --> v | "(" E ")" | U P
B --> "+" | "-" | "*" | "/" | "^"
U --> "-"

But I can't figure out how did he get to this result. When I try to remove the left recursion myself, I'm doing it the following way:

  1. Firs, I group together the productions which doesn't have left recursion in one group, and other (left-recursive) in another group:

    E --> E "+" E | E "-" E | E "*" E | E "/" E | E "^" E     // L-recursive
    E --> v | "(" E ")" | "-" E
  2. Next, I name them and factor for easier manipulations:

    E --> E B E  // L-recursive; B stands for "Binary operator"
    E --> P  // not L-recursive; P stands for "Primary Expression"
    P --> v | "(" E ")" | U E   // U stands for "Unary operator"
    B --> "+" | "-" | "*" | "/" | "^"
    P --> "-"

    Now I need to deal only with the first two productions, which are now easier to deal with.

  3. I rewrite those first two productions by starting from the non-L-recursive production (which is simply P, the Primary expression) and following it by the optional Tail T, which I define as the rest of the original production less the first left-recursive nonterminal (that is, just B E) followed by the Tail T, or which could be empty:

    E --> P T
    T --> B E T |

    (note the empty alternative for the tail).

  4. These two productions I can now rewrite in EBNF like this:

    E --> P {B E}

    which is nearly what the author get, but I have E instead of P there inside the zero-or-more repetition pattern (the Tail). The other productions I get quite the same as he have got:

    P --> v | "(" E ")" | U E
    B -> "+" | "-" | "*" | "/" | "^"
    U -> "-"

    but here too I have E instead of P in the first production for P.

So, my question is: What am I missing? What algebraic transformation on the syntax I need to proceed now to get the same exact form as the autor gets? I tried substitutions for E, but it only leads me into loops. I suspect that I need somehow to substitute P for E, but I don't know any legal transformation to justify it. Maybe you know what's the last missing step?

Raphael
  • 73,212
  • 30
  • 182
  • 400
SasQ
  • 290
  • 1
  • 9

1 Answers1

8

The missing step:

E --> P T
T --> B E T |

rewrite E in T:

E --> P T
T --> B P T T | 

Simplify T:

E --> P T
T --> B P T | 

Equivalent to:

E --> P T
T --> {B P}

And there you are.

kena
  • 141
  • 3