9

This grammar is left recursive:

Expression  ::= AdditionExpression

AdditionExpression  ::=
    MultiplicationExpression
        | AdditionExpression '+' MultiplicationExpression
        | AdditionExpression '-' MultiplicationExpression

MultiplicationExpression    ::=
    Term
        | MultiplicationExpression '*' Term
        | MultiplicationExpression '/' Term

Term    ::=
    Number
        | '(' AdditionExpression ')'

Number  ::=
    [+-]?[0-9]+(\.[0-9]+)?

So in theory, recursive descent won't work. But by exploiting the properties of the grammar that each left-recursive rule corresponds to a specific precedence level, and that lookahead of a single token is enough to choose the correct production, the left-recursive rules can be individually parsed with while loops.

For example, to parse the AdditionExpression non-terminal, this pseudocode suffices:

function parse_addition_expression() {
    num = parse_multiplication_expression()
    while (has_token()) {
        get_token()
        if (current_token == PLUS)
            num += parse_multiplication_expression()
        else if (current_token == MINUS)
            num -= parse_multiplication_expression()
        else {
            unget_token()
            return num
        }
    }
    return num
}

What is the correct name for this type of parser? This informative article only refers to it as the "Classic Solution": https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm

There must be a proper name for this type of parser.

Raphael
  • 73,212
  • 30
  • 182
  • 400
user71015
  • 91
  • 3

2 Answers2

11

It is just an LL(1) parser implemented with recursive descent.

Starts with:

AdditionExpression  ::=
    MultiplicationExpression
        | AdditionExpression '+' MultiplicationExpression
        | AdditionExpression '-' MultiplicationExpression

apply left-recursion removal to get an LL(1) grammar:

AdditionExpression  ::= 
    MultiplicationExpression AdditionExpressionTail

AdditionExpressionTail ::=
        | '+' MultiplicationExpression AdditionExpressionTail
        | '-' MultiplicationExpression AdditionExpressionTail

write the corresponding functions:

function parse_AdditionExpression() {
    parse_MultiplicationExpression()
    parse_AdditionExpressionTail()
}

function parse_AdditionExpressionTail() {
    if (has_token()) {
        get_token()
        if (current_token == PLUS) {
            parse_MultiplicationExpression()
            parse_AdditionExpressionTail()
        } else if (current_token == MINUS) {
            parse_MultiplicationExpression()
            parse_AdditionExpressionTail()
        } else {
            unget_token()
        }
    }
}

remove tail recursion:

function parse_AdditionExpressionTail() {
    while (has_token()) {
        get_token()
        if (current_token == PLUS)
            parse_MultiplicationExpression()
        else if (current_token == MINUS)
            parse_MultiplicationExpression()
        else {
            unget_token()
            return
        }
    }
}

inline:

function parse_AdditionExpression() {
    parse_MultiplicationExpression()
    while (has_token()) {
        get_token()
        if (current_token == PLUS)
            parse_MultiplicationExpression()
        else if (current_token == MINUS)
            parse_MultiplicationExpression()
        else {
            unget_token()
            return
        }
    }
}

and you have just to add the semantic processing to get your function.

AProgrammer
  • 3,099
  • 18
  • 20
5

You want to look into LL($k$) parsing. The Wikipedia article is mostly useless, but it's basically recursive descent with $k$ symbols lookahead.

There is also LL($*$) which permits unbounded lookahead.

See here for a comprehensive overview on how powerful this class of parsers is.

Raphael
  • 73,212
  • 30
  • 182
  • 400