Parsing 4 Dr William Harrison Fall 2008 HarrisonWL@missouri.edu CS4430 – Compilers I.


Page 1: Parsing 4 Dr William Harrison Fall 2008 HarrisonWL@missouri.edu CS4430 – Compilers I.

Parsing 4

Dr William Harrison

Fall 2008

HarrisonWL@missouri.edu CS4430 – Compilers I

Page 2:

Today

Continuing the second phase: “parsing”, or grammatical analysis

• discovers the real structure of a program and represents it in a computationally useful way

“Predictive” parsing, also called “recursive descent” parsing

Last time: basics of LL(1) parsing

• follow sets, table-driven predictive parsing, etc.

Today: finish predictive parsing

Reading: You should be reading Chapter 2 of “Modern Compiler Design”

Page 3:

Parsing concepts

Language

Grammar

Parser

* all inter-related, but different notions

Page 4:

Review: Parse Trees from Derivations

S → if E then S else S
S → begin S L
S → print E

L → end
L → ; S L

E → num = num

S ⇒ begin S L
  ⇒ begin print E L
  ⇒ begin print 1=1 L
  ⇒ begin print 1=1 end

Parse Tree Associated with Derivation

S
├── begin
├── S
│   ├── print
│   └── E
│       └── 1 = 1
└── L
    └── end

Page 5:

The Process of constructing Parsers

Language Design → Construct CFG → Recognize its Form → Construct Parser → Language Parser

Start again if not in an appropriate form (e.g., LL(1), LR(1), …)

Page 6:

Predictive Parsing

The first token on the RHS of each rule is unique.

S → if E then S else S
S → begin S L
S → print E

L → end
L → ; S L

E → num = num

• a.k.a. “recursive descent”
• If, for each non-terminal, the first symbol on the r.h.s. of each of its productions is a distinct terminal, then the grammar is LL(1).

Page 7:

“Rolling your own” Predictive Parsers

    int tok = getToken();
    void advance() { tok = getToken(); }
    void eat(int t) {
      /* compare with ==; a single = would assign t to tok */
      if (tok == t) { advance(); } else { error(); }
    }
    void S() {
      switch (tok) {
        case IF: eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
        case BEGIN: …
      }
    }
    void E() { eat(NUM); eat(EQ); eat(NUM); }
    …

S → if E then S else S
S → begin S L
S → print E

L → end
L → ; S L

E → num = num

Has one function for each non-terminal, and one clause for each production
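The slide's fragment can be filled out into a complete recognizer. The following is only a sketch under some assumptions that are not on the slide: the input is a pre-tokenized array ending in a hypothetical END_OF_INPUT sentinel, error() sets a flag instead of aborting, and parse_program is an illustrative entry point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Token kinds for the slide's grammar; END_OF_INPUT is an added sentinel. */
enum token { IF, THEN, ELSE, BEGIN, END, PRINT, SEMI, NUM, EQ, END_OF_INPUT };

static const enum token *input;  /* pre-tokenized input (assumption: no lexer) */
static size_t pos;               /* index of the next unread token */
static enum token tok;           /* one token of lookahead */
static bool failed;              /* set by error() instead of aborting */

static enum token getToken(void) {
    enum token t = input[pos];
    if (t != END_OF_INPUT) pos++;   /* never read past the sentinel */
    return t;
}
static void advance(void) { tok = getToken(); }
static void error(void) { failed = true; }
static void eat(enum token t) { if (tok == t) advance(); else error(); }

static void S(void);
static void L(void);
static void E(void);

/* One function per non-terminal, one case per production. */
static void S(void) {
    if (failed) return;
    switch (tok) {
    case IF:    eat(IF); E(); eat(THEN); S(); eat(ELSE); S(); break;
    case BEGIN: eat(BEGIN); S(); L(); break;
    case PRINT: eat(PRINT); E(); break;
    default:    error();
    }
}
static void L(void) {
    if (failed) return;
    switch (tok) {
    case END:  eat(END); break;
    case SEMI: eat(SEMI); S(); L(); break;
    default:   error();
    }
}
static void E(void) { eat(NUM); eat(EQ); eat(NUM); }

/* Returns true if the token array (ending in END_OF_INPUT) derives from S. */
bool parse_program(const enum token *tokens) {
    assert(tokens != NULL);
    input = tokens; pos = 0; failed = false;
    advance();
    S();
    return !failed && tok == END_OF_INPUT;
}
```

Calling parse_program on the tokens of "begin print 1=1 end" follows the same steps as the derivation on Page 4.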

Page 8:

Road map for today

Determining if a grammar is LL(1)
• “First” sets
• “Follow” sets

“Massaging” a grammar into LL(1)
• Using the technique of “left factoring”
• …and eliminating “left recursion”

Page 9:

Review: Table-driven Predictive Parsing

 1. S → E $
 2. E → T E’
 3. E’ → + T E’
 4. E’ → - T E’
 5. E’ → λ
 6. T → F T’
 7. T’ → * F T’
 8. T’ → / F T’
 9. T’ → λ
10. F → id
11. F → num
12. F → ( E )

Rows: top of the “parse stack”. Columns: front of token stream. Entries are production numbers; empty = error.

        +     *     id    $
S                   1
E                   2
E’      3                 5
T                   6
T’      9     7           9
F                   10

Actions are: predict(production), match, accept, and error
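The table and the four actions can be turned into a small driver loop. This is a sketch, not the course's implementation: the slide shows only the columns +, *, id and $, so the remaining columns (-, /, num, parentheses) are filled in here by hand from the FIRST and FOLLOW sets of the grammar, and the names ll1_parse, EOI, E1 and T1 are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals first, then non-terminals, so one enum can sit on the parse stack. */
enum sym { PLUS, MINUS, TIMES, DIV, ID, NUM, LPAREN, RPAREN, EOI,  /* EOI is "$" */
           S, E, E1, T, T1, F };                                    /* E1 = E', T1 = T' */
enum { N_TERM = EOI + 1, N_NONTERM = F - S + 1, MAX_RHS = 3, MAX_STACK = 256 };

/* Right-hand sides of productions 1..12 (index 0 unused); len 0 means lambda. */
static const struct { int len; enum sym rhs[MAX_RHS]; } prod[13] = {
    {0, {0}},
    {2, {E, EOI}},             /*  1. S  -> E $    */
    {2, {T, E1}},              /*  2. E  -> T E'   */
    {3, {PLUS, T, E1}},        /*  3. E' -> + T E' */
    {3, {MINUS, T, E1}},       /*  4. E' -> - T E' */
    {0, {0}},                  /*  5. E' -> lambda */
    {2, {F, T1}},              /*  6. T  -> F T'   */
    {3, {TIMES, F, T1}},       /*  7. T' -> * F T' */
    {3, {DIV, F, T1}},         /*  8. T' -> / F T' */
    {0, {0}},                  /*  9. T' -> lambda */
    {1, {ID}},                 /* 10. F  -> id     */
    {1, {NUM}},                /* 11. F  -> num    */
    {3, {LPAREN, E, RPAREN}},  /* 12. F  -> ( E )  */
};

/* table[X - S][t] = production to predict; 0 = empty = error. */
static const int table[N_NONTERM][N_TERM] = {
    /*        +  -  *  /  id  num (   )  $ */
    /* S  */ {0, 0, 0, 0, 1,  1,  1,  0, 0},
    /* E  */ {0, 0, 0, 0, 2,  2,  2,  0, 0},
    /* E' */ {3, 4, 0, 0, 0,  0,  0,  5, 5},
    /* T  */ {0, 0, 0, 0, 6,  6,  6,  0, 0},
    /* T' */ {9, 9, 7, 8, 0,  0,  0,  9, 9},
    /* F  */ {0, 0, 0, 0, 10, 11, 12, 0, 0},
};

/* Parses a token array that ends with EOI; returns true on accept. */
bool ll1_parse(const enum sym *tokens) {
    enum sym stack[MAX_STACK];
    int top = 0;
    stack[top++] = S;
    while (top > 0) {
        enum sym x = stack[--top];
        if (x <= EOI) {                         /* terminal on top: match */
            if (x != *tokens) return false;     /* error */
            if (x == EOI) return top == 0;      /* accept */
            tokens++;
        } else {                                /* non-terminal on top: predict */
            int p = table[x - S][*tokens];
            if (p == 0) return false;           /* empty entry = error */
            assert(top + prod[p].len <= MAX_STACK);
            for (int i = prod[p].len - 1; i >= 0; i--)
                stack[top++] = prod[p].rhs[i];  /* push RHS right-to-left */
        }
    }
    return false;
}
```

Pushing the right-hand side in reverse keeps its leftmost symbol on top of the stack, which is what makes the parse proceed left to right.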

Page 10:

Remaining Questions about Predictive Parsing

We were “given” that grammar and prediction table

Can all grammars be given a similar table?
• Alas, no – only LL(1) grammars

How did we come up with that table?
• “First” and “Follow” sets

Techniques for converting some grammars into LL(1)
• Eliminating left-recursion
• Left factoring

We’ll discuss these now

Page 11:

FIRST(γ)

γ is a sequence of symbols (terminal or non-terminal)
• For example, the right hand side of a production

FIRST returns the set of all possible terminal symbols that begin any string derivable from γ.

Consider two productions X → γ1 and X → γ2

If FIRST(γ1) and FIRST(γ2) have symbols in common, then the prediction mechanism will not know which way to choose.
• The grammar is not LL(1)!

If FIRST(γ1) and FIRST(γ2) have no symbols in common, then perhaps LL(1) can be used.
• We need some more formalisms.

Page 12:

FOLLOW sets and nullable productions

FOLLOW(X)
• X is a non-terminal
• The set of terminals that can immediately follow X.

nullable(X)
• X is a non-terminal
• True if, and only if, X can derive the empty string λ.

FIRST, FOLLOW & nullable can be used to construct predictive parsing tables for LL(1) parsers.

Can build tables that support LL(2), LL(3), etc.

X → A X B
C → X d
B → a b

FOLLOW(X) = ?

X → a B
X → B
B → d
B → λ

nullable(X) = ?

Page 13:

FOLLOW sets and nullable productions

FOLLOW(X)
• X is a non-terminal
• The set of terminals that can immediately follow X.

nullable(X)
• X is a non-terminal
• True if, and only if, X can derive the empty string λ.

FIRST, FOLLOW & nullable can be used to construct predictive parsing tables for LL(1) parsers.

Can build tables that support LL(2), LL(3), etc.

X → A X B
C → X d
B → a b

FOLLOW(X) = { d, a }

X → a B
X → B
B → d
B → λ

nullable(X) = true
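One common way to compute nullable, FIRST and FOLLOW is to iterate to a fixed point, adding members until nothing changes. The sketch below does this for the expression grammar from the table-driven review slide; the bitmask representation and the names compute_sets, first, follow, EOI, E1 and T1 are assumptions made for illustration, not something given in the lecture.

```c
#include <assert.h>
#include <stdbool.h>

enum sym { PLUS, MINUS, TIMES, DIV, ID, NUM, LPAREN, RPAREN, EOI,  /* EOI is "$" */
           S, E, E1, T, T1, F, N_SYM };                             /* E1 = E', T1 = T' */
enum { N_PROD = 12, MAX_RHS = 3 };

struct production { enum sym lhs; int len; enum sym rhs[MAX_RHS]; };

/* Productions 1..12 of the expression grammar; len 0 means lambda. */
static const struct production grammar[N_PROD] = {
    {S, 2, {E, EOI}}, {E, 2, {T, E1}},
    {E1, 3, {PLUS, T, E1}}, {E1, 3, {MINUS, T, E1}}, {E1, 0, {0}},
    {T, 2, {F, T1}},
    {T1, 3, {TIMES, F, T1}}, {T1, 3, {DIV, F, T1}}, {T1, 0, {0}},
    {F, 1, {ID}}, {F, 1, {NUM}}, {F, 3, {LPAREN, E, RPAREN}},
};

bool nullable[N_SYM];
unsigned first[N_SYM];    /* bit t set <=> terminal t is in the set */
unsigned follow[N_SYM];

/* Adds bits to *set; reports whether the set grew. */
static bool add(unsigned *set, unsigned bits) {
    unsigned old = *set;
    *set |= bits;
    return *set != old;
}

void compute_sets(void) {
    for (int s = 0; s < N_SYM; s++) {
        nullable[s] = false;
        first[s] = (s <= EOI) ? 1u << s : 0u;   /* FIRST(t) = {t} for a terminal t */
        follow[s] = 0u;
    }
    bool changed = true;
    while (changed) {                            /* repeat until a full pass adds nothing */
        changed = false;
        for (int p = 0; p < N_PROD; p++) {
            const struct production *pr = &grammar[p];
            assert(pr->len <= MAX_RHS);
            /* FIRST(lhs): walk the RHS for as long as the prefix is nullable. */
            int i = 0;
            for (; i < pr->len; i++) {
                changed |= add(&first[pr->lhs], first[pr->rhs[i]]);
                if (!nullable[pr->rhs[i]]) break;
            }
            /* The whole RHS is nullable (or empty): lhs is nullable. */
            if (i == pr->len && !nullable[pr->lhs]) {
                nullable[pr->lhs] = true;
                changed = true;
            }
            /* FOLLOW(rhs[i]): FIRST of what comes after it, and FOLLOW(lhs)
               if everything after it is nullable. */
            for (i = 0; i < pr->len; i++) {
                int j = i + 1;
                for (; j < pr->len; j++) {
                    changed |= add(&follow[pr->rhs[i]], first[pr->rhs[j]]);
                    if (!nullable[pr->rhs[j]]) break;
                }
                if (j == pr->len)
                    changed |= add(&follow[pr->rhs[i]], follow[pr->lhs]);
            }
        }
    }
}
```

After compute_sets(), E’ and T’ are the only nullable non-terminals, which is why productions 5 and 9 are entered in the table under tokens from their FOLLOW sets.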

Page 14:

Constructing the parse table

For every non-terminal X, production X → γ, and token t:

• Enter the production (X → γ) in row X, column t if t ∈ FIRST(γ),
• If γ is nullable, enter the production (X → γ) in row X, column t if t ∈ FOLLOW(X)

Page 15:

Constructing the parse table

What if there is more than one production in the same cell (row X, column t)?

X → γ1
X → γ2

Page 16:

Constructing the parse table

What if there is more than one production in the same cell (row X, column t)?

X → γ1
X → γ2

Then the grammar cannot be parsed with “predictive parsing”, and it is (by definition) not LL(1).
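The "more than one production in a cell" test is easy to mechanize. In this sketch each production is reduced to its row and a hand-computed predict set (FIRST of its right-hand side, plus FOLLOW of its left-hand side when the right-hand side is nullable). The two tiny grammars, the struct entry layout and build_table are all illustrative assumptions, and FOLLOW(E’) = {$} assumes $ follows the start symbol E.

```c
#include <assert.h>
#include <stdbool.h>

enum term { PLUS, ID, EOI, N_TERM };   /* EOI stands for "$" */

/* One production, reduced to what table construction needs:
   the row of its left-hand side and its predict set as a bitmask. */
struct entry { int row; unsigned predict; };

/* Fills table[row][t] with 1-based production numbers (0 = empty = error).
   Returns false as soon as a cell would hold two productions,
   i.e. when the grammar is not LL(1). */
bool build_table(const struct entry *prods, int n, int n_rows,
                 int table[][N_TERM]) {
    for (int r = 0; r < n_rows; r++)
        for (int t = 0; t < N_TERM; t++)
            table[r][t] = 0;
    for (int p = 0; p < n; p++) {
        assert(prods[p].row >= 0 && prods[p].row < n_rows);
        for (int t = 0; t < N_TERM; t++) {
            if (!(prods[p].predict & (1u << t))) continue;
            if (table[prods[p].row][t] != 0) return false;  /* conflict */
            table[prods[p].row][t] = p + 1;
        }
    }
    return true;
}

/* Rows: 0 = E, 1 = T.   E -> E + T | T,   T -> id */
const struct entry left_recursive[] = {
    {0, 1u << ID},    /* E -> E + T : FIRST = {id} */
    {0, 1u << ID},    /* E -> T     : FIRST = {id} */
    {1, 1u << ID},    /* T -> id                   */
};

/* Rows: 0 = E, 1 = E', 2 = T.   E -> T E',   E' -> + T E' | lambda,   T -> id */
const struct entry rewritten[] = {
    {0, 1u << ID},    /* E  -> T E'   : FIRST = {id}                     */
    {1, 1u << PLUS},  /* E' -> + T E' : FIRST = {+}                      */
    {1, 1u << EOI},   /* E' -> lambda : nullable, so FOLLOW(E') = {$}    */
    {2, 1u << ID},    /* T  -> id                                        */
};
```

build_table rejects the first grammar because both E productions land in the cell (E, id), and accepts the second.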

Page 17:

Shortcomings of LL Parsers

Recursive descent renders a readable parser.
• It depends on the first terminal symbol of each sub-expression providing enough information to choose which production to use.

But consider a predictive parser for this grammar:

E → E + T
E → T

    void E() {
      switch (tok) {
        case ?: E(); eat(PLUS); T(); break;   /* no way of choosing production */
        case ?: T(); break;
        …
      }
    }
    void T() { eat(ID); }

Page 18:

Eliminating left recursion

Consider this grammar; it’s “left-recursive”:

E → E + T
E → T

Cannot use LL(1) – why?

Consider this alternative, different grammar:

E → T E’
E’ → + T E’
E’ → λ

• This derives the same language
• Now there is no left recursion.
• There is a generalization for more complex grammars.
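With the left recursion gone, each function can pick its production from a single token of lookahead. A minimal sketch of a recognizer for the rewritten grammar, assuming T → id as in the earlier T() fragment; the token array, the END_OF_INPUT sentinel and the names Eprime and parse_sum are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum token { ID, PLUS, END_OF_INPUT };

static const enum token *cur;   /* points at the lookahead token */
static bool failed;

static void eat(enum token t) {
    if (*cur != t) { failed = true; return; }
    if (t != END_OF_INPUT) cur++;        /* never step past the sentinel */
}

/* T -> id */
static void T(void) { eat(ID); }

/* E' -> + T E' | lambda */
static void Eprime(void) {
    if (failed) return;
    switch (*cur) {
    case PLUS: eat(PLUS); T(); Eprime(); break;
    default:   break;                    /* lambda: consume nothing */
    }
}

/* E -> T E' */
static void E(void) { T(); Eprime(); }

/* Returns true if the token array (ending in END_OF_INPUT) is id (+ id)*. */
bool parse_sum(const enum token *tokens) {
    assert(tokens != NULL);
    cur = tokens; failed = false;
    E();
    eat(END_OF_INPUT);
    return !failed;
}
```

Unlike the E() attempted for the left-recursive grammar, this E() consumes a token (inside T) before any recursive call, so the recursion always makes progress.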

Page 19:

Removing Common Prefixes

Consider

S → if E then S else S
S → if E then S

It’s not LL(1) – why?

We can transform this grammar, and make it LL(1).

S → if E then S X
X → λ
X → else S

* Remark: The resulting grammar is not as readable.
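A sketch of a recursive-descent recognizer for the left-factored grammar. Two things here are assumptions beyond the slide: the base case S → print E and E → num = num are borrowed from the earlier grammar so that statements can terminate, and the names parse_stmt and END_OF_INPUT are illustrative. One caveat worth knowing: when ifs nest, else is also in FOLLOW(X), so the X row has two candidates on else; the sketch resolves this the usual way by always taking X → else S, which binds each else to the nearest if.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum token { IF, THEN, ELSE, PRINT, NUM, EQ, END_OF_INPUT };

static const enum token *cur;   /* points at the lookahead token */
static bool failed;

static void eat(enum token t) {
    if (*cur != t) { failed = true; return; }
    if (t != END_OF_INPUT) cur++;        /* never step past the sentinel */
}

static void S(void);

/* E -> num = num (borrowed from the earlier grammar) */
static void E(void) { eat(NUM); eat(EQ); eat(NUM); }

/* X -> else S | lambda */
static void X(void) {
    if (failed) return;
    if (*cur == ELSE) { eat(ELSE); S(); }
    /* otherwise lambda: consume nothing */
}

/* S -> if E then S X | print E */
static void S(void) {
    if (failed) return;
    switch (*cur) {
    case IF:    eat(IF); E(); eat(THEN); S(); X(); break;
    case PRINT: eat(PRINT); E(); break;
    default:    failed = true;
    }
}

/* Returns true if the token array (ending in END_OF_INPUT) is one statement. */
bool parse_stmt(const enum token *tokens) {
    assert(tokens != NULL);
    cur = tokens; failed = false;
    S();
    eat(END_OF_INPUT);
    return !failed;
}
```

The common prefix "if E then S" is parsed once in S(), and the decision between the two original productions is deferred to X(), where one token of lookahead is enough.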

Page 20:

In Class Discussion (1)

S → S ; S
S → id := E
E → id
E → num
E → E + E

How can we transform this grammar so that it accepts the same language, but
• Has no left recursion

We may have to use left factoring

Page 21:

In Class Discussion (2)

Write a grammar that accepts the same language, but
• Is not ambiguous
• Has no left recursion

S → S1 ; S      (was S → S ; S)
S → S1
S1 → id := E

E → E + E1      (was E → E + E)
E → E1
E1 → id
E1 → num

Prefix elimination doesn’t work here

Page 22:

Class Discussion (3)

Write a grammar that accepts the same language, but
• …
• Has no left recursion

S → S1 S2
S1 → id := E
S2 → ; S1 S2
S2 → λ

E → E1 E2
E1 → id
E1 → num
E2 → + E1 E2
E2 → λ

Before: (# is a terminal)
X → X # Y
X → Y

After:
X → Y X2
X2 → # Y X2
X2 → λ

This final grammar is LL(1).

Page 23:

Next time

LR Parsing
• LR(k) grammars are more expressive than LL(k) grammars,
• …but it’s not as obvious how to parse with them.