4

This was an exam question for my course and I am struggling to actually answer it in a way that is not fluff.

Here is my current answer:

CFGs describe how non-terminal symbols are converted into terminal symbols via a parser. However, a scanner defines what those terminal symbols convert to in terms of lexical tokens. CFGs are grammatical descriptions of a language instead of simply defining what tokens should be scanned from an input string.

What is the correct way to answer this?

Filip
Dhruv Ghulati

3 Answers

4

You don't use CFGs because lexical analysis can typically be performed using finite automata, and these are faster than context-free parsers. It's a question of efficiency.

Yuval Filmus
0

Regular expressions are recognized by finite state machines, while context-free grammars are recognized by push-down automata. Push-down automata are not as efficient as FSMs in terms of time, hence it's all about efficiency.
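To make the difference concrete, here is a minimal sketch in Python (both functions and their names are my own illustrations, not taken from any particular compiler). The first recognizer is a two-state finite automaton for identifiers: it keeps one small state variable and does a fixed amount of work per character. The second checks nested brackets, which is not a regular language, so it needs a stack that grows with the nesting depth, which is the extra bookkeeping a push-down automaton carries.

```python
def is_identifier(s):
    """Finite automaton: state 0 = start, state 1 = inside an identifier."""
    state = 0
    for ch in s:
        if state == 0 and (ch.isalpha() or ch == "_"):
            state = 1          # first character: letter or underscore
        elif state == 1 and (ch.isalnum() or ch == "_"):
            state = 1          # later characters: letters, digits, underscore
        else:
            return False       # no valid transition: reject
    return state == 1          # accept only if at least one character was read


def is_balanced(s):
    """Push-down style recognizer: needs a stack, not just a fixed state."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in s:
        if ch in "([":
            stack.append(ch)   # remember the opening bracket
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False   # closing bracket does not match
    return not stack           # every opening bracket must be closed


print(is_identifier("foo_1"), is_identifier("1foo"))
print(is_balanced("([])"), is_balanced("([)]"))
```

Both loops here are linear in the input length, so the gap is mostly in constant factors and memory; the gap widens when the grammar needs a general context-free parsing algorithm rather than a deterministic one.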

0

Several angles here.

  • The distinction between "token" and "syntax" is somewhat arbitrary. Both can be (and have been) described by CFGs.
  • One reason for the split is that we read e.g. a C program in terms of tokens (variable names, operators, reserved words, comments), not in terms of characters. So it makes sense for the compiler writer to do what comes naturally.
  • A significant share of any compiler's time goes into reading and grouping characters into tokens, so it makes sense to make this as fast as possible. The technology to handle regular expression matching is mature. Much character detail (spaces, comments) can just be discarded without much analysis.
  • "Most" languages have the same (general) idea of what the basic building blocks (names, operators, even comments) look like. It makes sense to package that and free the compiler writer of a task that way (less code, fewer bugs...). Several parsing tools offer this prepackaged.
  • Parsing is complex, and its speed depends on the number of symbols handled. Grouping characters into tokens gives the parser less to process, and thus better performance.
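
As a rough illustration of the last few points, here is a small regex-driven scanner sketched in Python. The token classes, the names `TOKEN_SPEC`, `MASTER` and `tokenize`, and the sample input are all assumptions made up for this example. Note also that Python's `re` module is a backtracking matcher, so this shows the idea of regular token definitions rather than the DFA that a scanner generator would typically build.

```python
import re

# Hypothetical token classes for a tiny C-like fragment (illustrative only).
# COMMENT is listed before OP so that "/*" is not read as a division sign.
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),
    ("SPACE",   r"\s+"),
    ("NUMBER",  r"\d+"),
    ("NAME",    r"[A-Za-z_]\w*"),
    ("OP",      r"[-+*/=;()]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(text):
    """Group characters into (kind, lexeme) pairs, dropping spaces and comments."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup not in ("SPACE", "COMMENT"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


source = "total = total + 42;  /* running sum */"
tokens = tokenize(source)
print(len(source), "characters ->", len(tokens), "tokens")
print(tokens)
```

The parser that follows only ever sees the short token list, never the whitespace or the comment, which is where the "less stuff to parse" saving comes from.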
vonbrand