I am trying to teach myself how to use Bison. The bison(1) man page describes it as follows:

Generate a deterministic LR or generalized LR (GLR) parser employing LALR(1), IELR(1), or canonical LR(1) parser tables.

What is an IELR parser? All the relevant articles I have found online are paywalled.

Bartosz Przybylski
fuz


The IELR(1) Parsing Algorithm

The IELR(1) parsing algorithm was developed in 2008 by Joel E. Denny as part of his Ph.D. research under the supervision of Brian A. Malloy at Clemson University. The IELR(1) algorithm is a variation of the so-called "minimal" LR(1) algorithm developed by David Pager in 1977, which itself is a variation of the LR(k) parsing algorithm invented by Donald Knuth in 1965. The IE in IELR(1) stands for inadequacy elimination (see last section).

LR(1) Algorithms

The LR(1) part of IELR(1) stands for Left-to-right scan, Rightmost derivation (in reverse), with 1 token of lookahead. Parsers built from Knuth's full LR(1) construction are called canonical LR(1) parsers. This class of parsing algorithms employs a bottom-up, shift-reduce parsing strategy: a stack of states and a state transition table together determine the next action to take during parsing.
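To make the "stack plus transition table" idea concrete, here is a minimal Python sketch of a table-driven shift-reduce driver. The toy grammar (E → E + n | n), the hand-built ACTION/GOTO tables, and the function name parse are hypothetical illustrations; a generator such as Bison computes tables like these automatically and stores them far more compactly.

```python
# Hypothetical toy grammar:
#   0: S' -> E
#   1: E  -> E + n
#   2: E  -> n
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}  # rule -> (left-hand side, length of right-hand side)

# Hand-built LR(1) tables for the grammar above; "$" marks end of input.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}


def parse(tokens):
    """Run the shift-reduce driver; return the rule numbers reduced, in order."""
    tokens = list(tokens) + ["$"]
    stack = [0]  # stack of parser states, starting in state 0
    pos = 0
    reductions = []
    while True:
        state, tok = stack[-1], tokens[pos]
        act = ACTION.get((state, tok))
        if act is None:
            raise SyntaxError(f"unexpected {tok!r} in state {state}")
        kind, arg = act
        if kind == "shift":
            stack.append(arg)  # consume the token and enter the new state
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del stack[-length:]  # pop one state per right-hand-side symbol (no empty rules here)
            stack.append(GOTO[(stack[-1], lhs)])  # then follow the GOTO entry for the left-hand side
            reductions.append(arg)
        else:  # accept
            return reductions
```

Calling parse(["n", "+", "n"]) returns [2, 1]. Read backwards, the reductions give the rightmost derivation E ⇒ E + n ⇒ n + n, which is where the "R" in LR comes from.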

Historically, LR(1) algorithms have been hampered by the large memory requirements of their transition tables. Pager's improvement was a method for merging compatible states (states with the same core whose lookaheads can safely be combined) while the transition table is being generated, which significantly reduces the size of the table. Pager's algorithm thus makes LR(1) parsers competitive with other parsing strategies in both space and time. The phrase "minimal LR(1) parser" refers to the small transition table that Pager's algorithm produces.
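The effect of merging states can be sketched in Python. Here an LR(1) state is modeled as a dict from an LR(0) core item to its lookahead set, a hypothetical representation chosen for brevity. The example uses the textbook LR(1) grammar S → a A d | b B d | a B e | b A e, A → c, B → c: the states reached after "a c" and "b c" share a core, but blindly unioning their lookaheads, as LALR(1) does, creates a reduce/reduce conflict. The weakly_compatible function is a sketch of Pager's weak compatibility test as it is commonly stated (not Pager's full algorithm); it refuses this particular merge.

```python
from itertools import combinations


def merge(s1, s2):
    """LALR-style merge: union the lookaheads of matching core items."""
    assert s1.keys() == s2.keys(), "states must share the same core"
    return {item: s1[item] | s2[item] for item in s1}


def reduce_reduce_conflicts(state):
    """Pairs of completed items (dot at the end) that share a lookahead token."""
    done = [item for item in state if item.endswith(".")]
    return [(a, b, state[a] & state[b])
            for a, b in combinations(done, 2) if state[a] & state[b]]


def weakly_compatible(s1, s2):
    """Sketch of Pager's weak compatibility: for every pair of items i != j,
    merging must not create a lookahead collision that was not already
    present in one of the original states."""
    assert s1.keys() == s2.keys(), "states must share the same core"
    for i, j in combinations(s1, 2):
        cross_free = not (s1[i] & s2[j]) and not (s2[i] & s1[j])
        already_conflicting = bool(s1[i] & s1[j]) or bool(s2[i] & s2[j])
        if not (cross_free or already_conflicting):
            return False
    return True


# States reached after reading "a c" and "b c" in the example grammar.
after_ac = {"A -> c .": {"d"}, "B -> c .": {"e"}}
after_bc = {"A -> c .": {"e"}, "B -> c .": {"d"}}

merged = merge(after_ac, after_bc)
print(reduce_reduce_conflicts(after_ac))      # -> []  (no conflict before merging)
print(reduce_reduce_conflicts(merged))        # A and B now collide on both d and e
print(weakly_compatible(after_ac, after_bc))  # -> False  (keep the states separate)
```

The design point is that the test is local: it only inspects the two candidate states, so it is cheap enough to run every time the generator considers a merge.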

Limitations of Pager's Algorithm

Minimal LR(1) algorithms generate the transition table from a particular input grammar for the language to be parsed. Different grammars can generate the same language; indeed, a non-LR(1) grammar can generate a language that is LR(1) parsable. To accommodate this, LR(1) parser generators in practice accept non-LR(1) grammars together with a specification for resolving conflicts between two possible parser actions (such as "shift-reduce conflicts"); in Bison, this specification takes the form of precedence and associativity declarations such as %left, %right and %nonassoc. Denny and Malloy found that, for certain non-LR(1) grammars with such conflict resolution, Pager's algorithm fails to generate a parser as powerful as the canonical LR(1) parser for the same grammar, even when the grammar generates an LR(1) language.
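Precedence-based resolution of shift/reduce conflicts can be sketched as a small Python function. The precedence table PREC and the function name resolve_shift_reduce are hypothetical; the decision rules follow Bison's documented behavior: a rule takes, by default, the precedence of the last terminal in its right-hand side; if the lookahead token has higher precedence the parser shifts, if the rule has higher precedence it reduces, and on a tie left associativity reduces, right associativity shifts and %nonassoc makes the input a syntax error. When either side has no declared precedence, Bison reports the conflict and shifts by default.

```python
# Hypothetical precedence declarations, in the spirit of:
#   %left '+'
#   %left '*'
#   %right '^'
# Later declarations bind tighter, so '*' outranks '+'.
PREC = {"+": (1, "left"), "*": (2, "left"), "^": (3, "right")}


def resolve_shift_reduce(rule_token, lookahead):
    """Choose 'shift', 'reduce', or 'error' for a shift/reduce conflict.

    rule_token is the terminal that gives the rule its precedence (by default
    the last terminal in the rule's right-hand side); lookahead is the token
    that could be shifted instead.
    """
    if rule_token not in PREC or lookahead not in PREC:
        return "shift"  # unresolved: Bison reports the conflict and shifts
    rule_level, rule_assoc = PREC[rule_token]
    tok_level, _ = PREC[lookahead]
    if tok_level > rule_level:
        return "shift"
    if tok_level < rule_level:
        return "reduce"
    # Equal precedence: the shared associativity decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[rule_assoc]
```

For example, with E + E on the stack and * as the lookahead, resolve_shift_reduce("+", "*") returns "shift", so 1 + 2 * 3 groups as 1 + (2 * 3); resolve_shift_reduce("+", "+") returns "reduce", which makes + left-associative.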

Denny and Malloy show that this limitation is not merely academic: they demonstrate that the parsers of Gawk and Gpic, both mature and widely used programs, perform incorrect parser actions.

IELR(1)'s Improvements

Denny and Malloy traced the deficiencies of Pager's algorithm by comparing the transition table it generates with the canonical LR(1) transition table for the same grammar. They identified two sources of what they term inadequacies, which appear in the table from Pager's algorithm but not in the canonical LR(1) table. Their IELR(1) (Inadequacy Elimination LR(1)) algorithm eliminates these inadequacies while generating a transition table that is virtually identical in size to the one produced by Pager's algorithm.
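For someone learning Bison, the practical upshot is that IELR(1) is selected per grammar through the lr.type variable: either %define lr.type ielr in the grammar file, or -Dlr.type=ielr on the command line (the default is lalr, and canonical-lr requests full canonical LR(1) tables). The Python helpers below, bison_command and run_bison, are hypothetical convenience wrappers that only build and run that command; they assume a bison executable on PATH.

```python
from __future__ import annotations

import shutil
import subprocess

# Values Bison accepts for %define lr.type (lalr is the default).
LR_TYPES = ("lalr", "ielr", "canonical-lr")


def bison_command(grammar_file: str, lr_type: str = "ielr") -> list[str]:
    """Build a bison command line that selects the given table construction."""
    if lr_type not in LR_TYPES:
        raise ValueError(f"unknown lr.type {lr_type!r}; expected one of {LR_TYPES}")
    # -DNAME=VALUE is the command-line equivalent of %define NAME VALUE.
    return ["bison", f"-Dlr.type={lr_type}", grammar_file]


def run_bison(grammar_file: str, lr_type: str = "ielr") -> None:
    """Run bison on grammar_file; raise if bison is missing or exits with an error."""
    if shutil.which("bison") is None:
        raise FileNotFoundError("bison not found on PATH")
    subprocess.run(bison_command(grammar_file, lr_type), check=True)
```

run_bison("parser.y") is then equivalent to typing bison -Dlr.type=ielr parser.y, where parser.y is a placeholder name. Generating the tables under both lalr and ielr (for example with bison's --report=state output) is a hands-on way to see where the two constructions differ for a given grammar.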

Robert Jacobson

An article introducing it, "IELR(1): Practical LR(1) Parser Tables for Non-LR(1) Grammars with Conflict Resolution" by Joel E. Denny and Brian A. Malloy of Clemson University, is freely available from Malloy's site (and via archive.org).

I can't say what such parsers are worth. (Personally, I don't understand the need for such crippled CFG parsing: why limit your expressive power when you can just use GLR? What does make sense to me is something like TAG (tree-adjoining grammars) or PEG (parsing expression grammars), which seem natural and add expressive power, or tree grammars for languages such as XML, in which recognizing parse trees is unproblematic by design.)

vike
reinierpost