5

Forgive me for my ignorance as I am self-teaching myself some of this theory... I am having some trouble understanding how to systematically/algorithmically compute FOLLOW sets, given that I have computed the FIRST sets.

For the following grammar:

E -> TE'
E' -> +TE' | ε
T -> FT'
T' -> *FT' |  ε
F -> (E) | id

I found that the FIRST sets are:

FIRST(E) = { (, id }
FIRST(E') = { + ,  ε }
FIRST(T) = { (, id }
FIRST(T') = { *,  ε }
FIRST(F) = { (, id }

When it comes to computing the FOLLOW sets, I am trying to use the following rules:

1. $ goes into the set FOLLOW(S) where S is the start symbol.
2. If A -> aBb, then (FIRST(b) - { ε }) is in FOLLOW(B).
3. If A -> aB then everything in FOLLOW(A) is in FOLLOW(B)
4. If A -> aBb and ε is in FIRST(b), then everything in FOLLOW(A) is in FOLLOW(B).

Obviously, the step I am most certain here is that

FOLLOW(E) contains $, based on rule 1. But everything after that, I seem to have a hard time following a pattern on how to systematically compute the FOLLOW sets. For example, to compute FOLLOW(E), how are we supposed to approach this?

E -> TE', looks like A -> aB , so everything in FOLLOW(E) is in FOLLOW(E'). Where to go from here? Do I compute FOLLOW(E')? Then going to FOLLOW(E'):

E' -> + TE'... looks like A -> aBb. Then that means FIRST(E') - { ε } is in FOLLOW(T). So I do know that { + } is in FOLLOW(T).

As you can see, with the way I have interpreted this, I am going around all over the place computing the FOLLOW of another production. Ultimately, I end up losing track on what terminal symbols go into which follow set. :(

I am for sure misunderstanding something here.

What is the right approach/ordering to compute the FOLLOW sets?

1 Answers1

4

The simple way to do it is the same as the simple way to compute FIRST sets, using a least fixed-point algorithm in which a set of inclusion rules are applied repeatedly until a fixed-point is reached:

  1. Initialise all the sets to empty.
  2. For every production in order, use the rules to add elements into the sets. (As a slight efficiency, rule 2 only needs to be applied on the first pass. I didn't apply that to the trace below.)
  3. If any set was modified during step 2, do it again.

So, here are the productions in order (I added the missing $*$ to $T'$, and inserted an augmented start rule $S$, which avoids the need for your special case rule 1):

$$\begin{align} S &\to E\; $ \\ E &\to TE' \\ E' &\to +TE' \mid \epsilon \\ T &\to FT' \\ T' &\to *FT' \mid \epsilon \\ F &\to (E) \mid id \end{align}$$

  1. Set all the $FOLLOW$ sets to $\{\}$
  2. First pass. I left out productions like $F\to id$ which don't match any of the rules. $$ \begin{array}{c|l|l} S \to E\; $ &\text{Add }\$\text{ to }FOLLOW(E) &FOLLOW(E) = \{ \$ \}\\ E \to TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T) &FOLLOW(T) = \{ + \}\\ &\text{Add }FOLLOW(E)\text{ to }FOLLOW(E') &FOLLOW(E') = \{ \$ \}\\ &\text{Add }FOLLOW(E)\text{ to }FOLLOW(T) &FOLLOW(T) = \{ + \$ \}\\ E' \to +TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T) &\text{no change}\\ &\text{Add }FOLLOW(E')\text{ to }FOLLOW(E') &\text{no change}\\ &\text{Add }FOLLOW(E')\text{ to }FOLLOW(T) &\text{no change}\\ T \to FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F) &FOLLOW(F) = \{ * \}\\ &\text{Add }FOLLOW(T)\text{ to }FOLLOW(T') &FOLLOW(T') = \{ + \$ \}\\ T' \to *FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F) &\text{no change}\\ &\text{Add }FOLLOW(T')\text{ to }FOLLOW(T') &\text{no change}\\ &\text{Add }FOLLOW(T')\text{ to }FOLLOW(F) &FOLLOW(F) = \{ * + \$\}\\ F \to (E) &\text{Add })\text{ to }FOLLOW(E) &FOLLOW(E) = \{ ) \$\}\\ \end{array} $$
  3. There were various changes, so we repeat step 2.
  4. (Step 2, second pass) $$\begin{array}{c|l|l} S \to E\; $ &\text{Add }\$\text{ to }FOLLOW(E) &\text{no change}\\ E \to TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T) &\text{no change}\\ &\text{Add }FOLLOW(E)\text{ to }FOLLOW(E') &FOLLOW(E') = \{ )\$ \}\\ &\text{Add }FOLLOW(E)\text{ to }FOLLOW(T) &FOLLOW(T) = \{ + )\$ \}\\ E' \to +TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T) &\text{no change}\\ &\text{Add }FOLLOW(E')\text{ to }FOLLOW(E') &\text{no change}\\ &\text{Add }FOLLOW(E')\text{ to }FOLLOW(T) &\text{no change}\\ T \to FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F) &\text{no change}\\ &\text{Add }FOLLOW(T)\text{ to }FOLLOW(T') &FOLLOW(T') = \{ + ) \$ \}\\ T' \to *FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F) &\text{no change}\\ &\text{Add }FOLLOW(T')\text{ to }FOLLOW(T') &\text{no change}\\ &\text{Add }FOLLOW(T')\text{ to }FOLLOW(F) &FOLLOW(F) = \{ * + ) \$\}\\ F \to (E) &\text{Add })\text{ to }FOLLOW(E) &\text{no change}\\ \end{array} $$
  5. (step 3, second pass) There were some more changes. Once again.
  6. (Step 2, third pass) $$\begin{array}{c|l|l} S \to E\; $ &\text{Add }\$\text{ to }FOLLOW(E) &\text{no change}\\ E \to TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T) &\text{no change}\\ &\text{Add }FOLLOW(E)\text{ to }FOLLOW(E') &\text{no change}\\ &\text{Add }FOLLOW(E)\text{ to }FOLLOW(T) &\text{no change}\\ E' \to +TE' &\text{Add }FIRST(E')-\{\epsilon\}\text{ to }FOLLOW(T) &\text{no change}\\ &\text{Add }FOLLOW(E')\text{ to }FOLLOW(E') &\text{no change}\\ &\text{Add }FOLLOW(E')\text{ to }FOLLOW(T) &\text{no change}\\ T \to FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F) &\text{no change}\\ &\text{Add }FOLLOW(T)\text{ to }FOLLOW(T') &\text{no change}\\ T' \to *FT' &\text{Add }FIRST(T')-\{\epsilon\}\text{ to }FOLLOW(F) &\text{no change}\\ &\text{Add }FOLLOW(T')\text{ to }FOLLOW(T') &\text{no change}\\ &\text{Add }FOLLOW(T')\text{ to }FOLLOW(F) &\text{no change}\\ F \to (E) &\text{Add })\text{ to }FOLLOW(E) &\text{no change}\\ \end{array} $$

So the $FOLLOW$ sets are: $$\begin{align} FOLLOW(S) &= \{\}\\ FOLLOW(E) &= \{ )$ \}\\ FOLLOW(E') &= \{ )$\}\\ FOLLOW(T) &= \{ + ) $\}\\ FOLLOW(T') &= \{ + ) $\}\\ FOLLOW(F) &= \{ * + ) $ \} \end{align} $$

That's a very simple grammar, of course. You might want to try it out with a more complicated one, although it is typical that it terminates after very few iterations and that many of the steps are obvious no-ops.

rici
  • 12,150
  • 22
  • 40