0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of...

Compilation 0368-3133
Lecture 4: Syntax Analysis: Parsing
Noam Rinetzky

The Real Anatomy of a Compiler
[Figure: compiler pipeline. Source text (txt) → Process text input → characters → Lexical Analysis → tokens → Syntax Analysis → AST → Sem. Analysis → Annotated AST → Intermediate code generation → IR → Intermediate code optimization → IR → Code generation → Symbolic Instructions (SI) → Target code optimization → SI → Machine code generation → MI → Write executable output → Executable code (exe). The Lexical Analysis and Syntax Analysis stages are highlighted.]

Broad kinds of parsers
• Parsers for arbitrary grammars
  – Earley's method, CYK method
  – Usually not used in practice (though this might change)
• Top-down parsers
  – Construct the parse tree in a top-down manner
  – Find the leftmost derivation
• Bottom-up parsers
  – Construct the parse tree in a bottom-up manner
  – Find the rightmost derivation in reverse order

CFG terminology
• Symbols:
  – Terminals (tokens): ; := ( ) id num print
  – Non-terminals: S E L
• Start non-terminal: S
  – Convention: the non-terminal appearing in the first derivation rule
• Grammar productions (rules): N → μ
    S → S ; S
    S → id := E
    E → id
    E → num
    E → E + E

CFG terminology
• Derivation: a sequence of replacements of non-terminals using the derivation rules
• Language: the set of strings of terminals derivable from the start symbol
• Sentential form: the result of a partial derivation
  – May contain non-terminals

Derivations
• Show that a sentence ω is in the language of a grammar G
  – Start with the start symbol
  – Repeatedly replace one of the non-terminals by a right-hand side of one of its productions
  – Stop when the sentence contains only terminals
• Given a sentential form αNβ and a rule N → µ: αNβ => αµβ
• ω is in L(G) if S =>* ω

Predictive parsing
• Recursive descent
• LL(k) grammars

Predictive parsing
• Given a grammar G and a word w, attempt to derive w using G
• Idea
  – Apply a production to the leftmost non-terminal
  – Pick the production rule based on the next input token
• General grammar
  – More than one option for choosing the next production based on a token
• Restricted grammars (LL)
  – Know exactly which single rule to apply
  – May require some lookahead to decide

Boolean expressions example
Input: not ( not true or false )
Grammar:
    E → LIT | ( E OP E ) | not E
    LIT → true | false
    OP → and | or | xor

The production to apply is known from the next token. Leftmost derivation:
    E => not E
      => not ( E OP E )
      => not ( not E OP E )
      => not ( not LIT OP E )
      => not ( not true OP E )
      => not ( not true or E )
      => not ( not true or LIT )
      => not ( not true or false )

Parse tree (children indented under their parent):
    E
      not
      E
        (
        E
          not
          E
            LIT
              true
        OP
          or
        E
          LIT
            false
        )

Implementation via recursion
Grammar:
    E → LIT | ( E OP E ) | not E
    LIT → true | false
    OP → and | or | xor

E() {
  if (current ∈ {TRUE, FALSE}) LIT();
  else if (current == LPAREN) {
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  }
  else if (current == NOT) { match(NOT); E(); }
  else error;
}

LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error;
}

OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error;
}
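The recursive functions above can be tried out with a small runnable sketch. It is written in Python rather than the slide's pseudo-code, and the names Parser, ParseError and accepts, as well as the use of plain strings as tokens, are assumptions made for illustration only.

```python
# Recursive-descent recognizer for:
#   E -> LIT | ( E OP E ) | not E ;  LIT -> true | false ;  OP -> and | or | xor
# Tokens are plain strings; "$" marks the end of the input (an assumption of this sketch).
LITERALS = {"true", "false"}
OPERATORS = {"and", "or", "xor"}


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Consume the current token if it is the expected one.
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, got {self.current!r}")
        self.pos += 1

    def E(self):
        if self.current in LITERALS:        # E -> LIT
            self.LIT()
        elif self.current == "(":           # E -> ( E OP E )
            self.match("(")
            self.E()
            self.OP()
            self.E()
            self.match(")")
        elif self.current == "not":         # E -> not E
            self.match("not")
            self.E()
        else:
            raise ParseError(f"unexpected token {self.current!r}")

    def LIT(self):
        if self.current not in LITERALS:
            raise ParseError(f"expected a literal, got {self.current!r}")
        self.match(self.current)

    def OP(self):
        if self.current not in OPERATORS:
            raise ParseError(f"expected an operator, got {self.current!r}")
        self.match(self.current)


def accepts(tokens):
    """Return True if the whole token sequence derives from E."""
    parser = Parser(tokens)
    try:
        parser.E()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For example, accepts("not ( not true or false )".split()) succeeds, while a dangling operator such as "( true or )" is rejected at the point where E() sees ")".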

FIRST sets
• FIRST(X) = { t | X →* t β } ∪ { ε | X →* ε }
  – FIRST(X) = all terminals that can appear first in some derivation from X
    • plus ε if ε can be derived from X
• Example:
  – FIRST(LIT) = { true, false }
  – FIRST( ( E OP E ) ) = { '(' }
  – FIRST( not E ) = { not }
• FIRST(α) can be defined for any sequence of symbols α

Computing FIRST sets
• FIRST(t) = { t }   // t is a terminal
• ε ∈ FIRST(X) if
  – X → ε, or
  – X → A1 … Ak and ε ∈ FIRST(Ai) for all i = 1…k
• FIRST(α) ⊆ FIRST(X) if
  – X → A1 … Ak α and ε ∈ FIRST(Ai) for all i = 1…k
• FIRST(X) is computed for non-terminals
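The rules above are usually solved by iterating until nothing changes. The following Python sketch shows one way to do that; the grammar encoding (a dict from non-terminal to a list of tuples, with the empty tuple standing for an ε-production) and the names first_sets and first_of_sequence are assumptions of this sketch, not part of the lecture.

```python
EPS = "ε"

# Example grammar: E -> T X ; X -> + E | ε ; T -> ( E ) | int Y ; Y -> * T | ε
GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}


def first_of_sequence(symbols, first):
    """FIRST of a sequence of symbols, given FIRST of the non-terminals so far."""
    result = set()
    for sym in symbols:
        # A symbol that is not a key of `first` is a terminal: FIRST(t) = {t}.
        sym_first = first[sym] if sym in first else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol (or none at all) can derive ε
    return result


def first_sets(grammar):
    """Fixed-point iteration: repeat until no FIRST set grows."""
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_sequence(alt, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


FIRST = first_sets(GRAMMAR)
```

The sets only ever grow and are bounded by the finite set of terminals plus ε, so the loop terminates.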

FOLLOW sets
• FOLLOW(X) = { t | S →* α X t β }
  – t is a terminal or $

FOLLOW sets: constraints
• $ ∈ FOLLOW(S)
• FIRST(β) – {ε} ⊆ FOLLOW(X)
  – for each A → α X β
• FOLLOW(A) ⊆ FOLLOW(X)
  – for each A → α X β with ε ∈ FIRST(β)

Example: FOLLOW sets
Grammar:
    E → T X
    X → + E | ε
    T → ( E ) | int Y
    Y → * T | ε

    Non-terminal   FOLLOW
    E              ), $
    T              +, ), $
    X              $, )
    Y              +, ), $

Constraints used:
• $ ∈ FOLLOW(S)
• FIRST(β) – {ε} ⊆ FOLLOW(X)
  – for each A → α X β
• FOLLOW(A) ⊆ FOLLOW(X)
  – for each A → α X β with ε ∈ FIRST(β)
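The three constraints can be applied mechanically until the sets stop growing. Below is a Python sketch for the example grammar; to keep it short, the FIRST sets of the non-terminals are entered by hand, and the names follow_sets, first_of_sequence, EPS and END are assumptions of the sketch.

```python
EPS, END = "ε", "$"

# E -> T X ; X -> + E | ε ; T -> ( E ) | int Y ; Y -> * T | ε
GRAMMAR = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}
# FIRST sets of the non-terminals, written down by hand for this grammar.
FIRST = {"E": {"(", "int"}, "T": {"(", "int"}, "X": {"+", EPS}, "Y": {"*", EPS}}


def first_of_sequence(symbols):
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})  # terminal: FIRST(t) = {t}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def follow_sets(grammar, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                      # $ ∈ FOLLOW(S)
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for alt in alternatives:
                for i, sym in enumerate(alt):
                    if sym not in grammar:      # only non-terminals have FOLLOW sets
                        continue
                    beta_first = first_of_sequence(alt[i + 1:])
                    new = beta_first - {EPS}    # FIRST(β) – {ε} ⊆ FOLLOW(X)
                    if EPS in beta_first:       # FOLLOW(A) ⊆ FOLLOW(X)
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FOLLOW = follow_sets(GRAMMAR, "E")
```

The result agrees with the table on the slide, for instance FOLLOW(T) = { +, ), $ }.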

Prediction table
• For each production A → α:
  – T[A, t] = α if t ∈ FIRST(α)
  – T[A, t] = α if ε ∈ FIRST(α) and t ∈ FOLLOW(A)
    • t can also be $
• T is not well defined ⇒ the grammar is not LL(1)
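As a rough illustration of the two table rules, the Python sketch below fills T from hand-written FIRST and FOLLOW sets and records every cell that would receive two different alternatives. The function name build_table and the data layout are assumptions of this sketch.

```python
EPS = "ε"


def first_of_sequence(symbols, first):
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})  # terminal: FIRST(t) = {t}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Return (table, conflicts); a non-empty conflict list means T is not well defined."""
    table, conflicts = {}, []
    for lhs, alternatives in grammar.items():
        for alt in alternatives:
            alt_first = first_of_sequence(alt, first)
            lookaheads = alt_first - {EPS}       # t ∈ FIRST(α)
            if EPS in alt_first:                 # ε ∈ FIRST(α) and t ∈ FOLLOW(A)
                lookaheads |= follow[lhs]
            for t in sorted(lookaheads):
                if (lhs, t) in table and table[(lhs, t)] != alt:
                    conflicts.append((lhs, t))
                table[(lhs, t)] = alt
    return table, conflicts


# An LL(1) grammar: E -> T X ; X -> + E | ε ; T -> ( E ) | int Y ; Y -> * T | ε
G1 = {
    "E": [("T", "X")],
    "X": [("+", "E"), ()],
    "T": [("(", "E", ")"), ("int", "Y")],
    "Y": [("*", "T"), ()],
}
FIRST1 = {"E": {"(", "int"}, "T": {"(", "int"}, "X": {"+", EPS}, "Y": {"*", EPS}}
FOLLOW1 = {"E": {")", "$"}, "T": {"+", ")", "$"}, "X": {")", "$"}, "Y": {"+", ")", "$"}}
TABLE1, CONFLICTS1 = build_table(G1, FIRST1, FOLLOW1)

# A grammar that is not LL(1): S -> A a b ; A -> a | ε
G2 = {"S": [("A", "a", "b")], "A": [("a",), ()]}
FIRST2 = {"S": {"a"}, "A": {"a", EPS}}
FOLLOW2 = {"S": {"$"}, "A": {"a"}}
TABLE2, CONFLICTS2 = build_table(G2, FIRST2, FOLLOW2)
```

For the second grammar the cell T[A, a] is claimed both by A → a and by A → ε, which is how a FIRST/FOLLOW conflict shows up in the table.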

LL(k) grammars
• A grammar is in the class LL(k) when its sentences can be derived via:
  – Top-down derivation
  – Scanning the input from left to right (L)
  – Producing the leftmost derivation (L)
  – With a lookahead of k tokens (k)
• A language is said to be LL(k) when it has an LL(k) grammar

LL(1) grammars
• A grammar is in the class LL(1) iff for every two productions A → α and A → β we have
  – FIRST(α) ∩ FIRST(β) = {}   // including ε
  – If ε ∈ FIRST(α) then FIRST(β) ∩ FOLLOW(A) = {}
  – If ε ∈ FIRST(β) then FIRST(α) ∩ FOLLOW(A) = {}

Problem: non-LL grammars

Back to problem 1: common prefix
    term → ID | indexed_elem
    indexed_elem → ID [ expr ]
• FIRST(term) = { ID }
• FIRST(indexed_elem) = { ID }
• FIRST/FIRST conflict

Solution: left factoring
• Rewrite the grammar to be in LL(1)
    Before:
      term → ID | indexed_elem
      indexed_elem → ID [ expr ]
    After:
      term → ID after_ID
      after_ID → [ expr ] | ε
• Intuition: just like factoring x*y + x*z into x*(y+z)

Left factoring: another example
    Before:
      S → if E then S else S | if E then S | T
    After:
      S → if E then S S' | T
      S' → else S | ε

Problem: null production
Grammar:
    S → A a b
    A → a | ε

bool S() {
  return A() && match(token('a')) && match(token('b'));
}

bool A() {
  return match(token('a')) || true;
}

• What happens for input "ab"?
• What happens if you flip the order of alternatives and try "aab"?

Back to problem 2: null production
    S → A a b
    A → a | ε
• FIRST(S) = { a }        FOLLOW(S) = { $ }
• FIRST(A) = { a, ε }     FOLLOW(A) = { a }
• FIRST/FOLLOW conflict

Solution: substitution
    Original grammar:
      S → A a b
      A → a | ε
    Substitute A in S:
      S → a a b | a b
    Left factoring:
      S → a after_A
      after_A → a b | b

Back to problem 3: left recursion
    E → E - term | term
• Left recursion cannot be handled with a bounded lookahead
• What can we do?

Left recursion removal
    G1:  N → N α | β
    G2:  N → β N'
         N' → α N' | ε
• L(G1) = β, βα, βαα, βααα, …
• L(G2) = same
• For our 3rd example:
    E → E - term | term
  becomes
    E → term TE
    TE → - term TE | ε
• Can be done algorithmically. Problem: the grammar becomes mangled beyond recognition
p. 130
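The G1 to G2 rewriting can be sketched in a few lines of Python. This handles only immediate left recursion of a single non-terminal; the function name remove_immediate_left_recursion and the tuple encoding of alternatives are assumptions of the sketch, and the fresh non-terminal is named by appending a prime rather than TE.

```python
def remove_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite N -> N α1 | ... | β1 | ...  as  N -> β N' ;  N' -> α N' | ε.

    `alternatives` is a list of tuples of symbols; the empty tuple stands for ε.
    Returns a dict with the rewritten rules.
    """
    # The α parts: what follows the leading N in each left-recursive alternative.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The β parts: alternatives that do not start with N.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: list(alternatives)}  # nothing to do
    fresh = nonterminal + "'"
    return {
        nonterminal: [beta + (fresh,) for beta in others],
        fresh: [alpha + (fresh,) for alpha in recursive] + [()],
    }


RESULT = remove_immediate_left_recursion("E", [("E", "-", "term"), ("term",)])
```

Applied to E → E - term | term this gives E → term E' and E' → - term E' | ε, the same shape as G2.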

LL(k) parsers
• Recursive descent
  – Manual construction
  – Uses recursion
• Wanted
  – A parser that can be generated automatically
  – Does not use recursion

LL(k) parsing via pushdown automata
• The pushdown automaton uses
  – Prediction stack
  – Input stream
  – Transition table
    • non-terminals × tokens → production alternative
    • The entry indexed by non-terminal N and token t contains the alternative of N that must be predicted when the current input starts with t

LL(k) parsing via pushdown automata
• Two possible moves
  – Prediction
    • When the top of the stack is a non-terminal N, pop N and look up table[N, t]. If table[N, t] is not empty, push table[N, t] on the prediction stack; otherwise, syntax error
  – Match
    • When the top of the prediction stack is a terminal T, it must be equal to the next input token t. If t == T, pop T and consume t. If t ≠ T, syntax error
• Parsing terminates when the prediction stack is empty
  – If the input is empty at that point, success. Otherwise, syntax error

Example transition table
Rules:
    (1) E → LIT
    (2) E → ( E OP E )
    (3) E → not E
    (4) LIT → true
    (5) LIT → false
    (6) OP → and
    (7) OP → or
    (8) OP → xor

Rows are non-terminals, columns are input tokens; each entry says which rule should be used.

           (    )    not   true   false   and   or   xor   $
    E      2         3     1      1
    LIT                    4      5
    OP                                    6     7    8

Model of non-recursive predictive parser
[Figure: a predictive parsing program reads the input (a + b $), consults the parsing table, maintains a stack (X, Y, Z, $ from top to bottom) and produces the output.]

Running parser example
Grammar: A → aAb | c
Input: aacbb$

Parsing table:
          a            b      c
    A     A → aAb             A → c

    Input suffix   Stack content   Move
    aacbb$         A$              predict(A,a) = A → aAb
    aacbb$         aAb$            match(a,a)
    acbb$          Ab$             predict(A,a) = A → aAb
    acbb$          aAbb$           match(a,a)
    cbb$           Abb$            predict(A,c) = A → c
    cbb$           cbb$            match(c,c)
    bb$            bb$             match(b,b)
    b$             b$              match(b,b)
    $              $               match($,$) – success
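The trace above can be reproduced with a small table-driven driver. The Python sketch below is one possible rendering of the predict and match moves; the function name ll1_parse, the move strings and the choice to keep "$" at the bottom of the stack are assumptions of this sketch.

```python
# Parsing table for A -> aAb | c, keyed by (non-terminal, lookahead token).
TABLE = {("A", "a"): ("a", "A", "b"), ("A", "c"): ("c",)}
NONTERMINALS = {"A"}


def ll1_parse(tokens, table, start, nonterminals):
    """Return (accepted, moves) for a sequence of single-symbol tokens."""
    input_ = list(tokens) + ["$"]
    stack = ["$", start]          # the top of the stack is the end of the list
    pos = 0
    moves = []
    while stack:
        top = stack.pop()
        tok = input_[pos]
        if top in nonterminals:   # prediction move
            rhs = table.get((top, tok))
            if rhs is None:
                moves.append(f"predict({top},{tok}) = ERROR")
                return False, moves
            stack.extend(reversed(rhs))   # leftmost symbol ends up on top
            moves.append(f"predict({top},{tok}) = {top} -> {''.join(rhs)}")
        else:                     # match move
            if top != tok:
                moves.append(f"match({top},{tok}) = ERROR")
                return False, moves
            pos += 1
            moves.append(f"match({top},{tok})")
    # The stack is empty; success only if the whole input, including $, was consumed.
    return pos == len(input_), moves


OK, MOVES = ll1_parse("aacbb", TABLE, "A", NONTERMINALS)
BAD, BAD_MOVES = ll1_parse("abcbb", TABLE, "A", NONTERMINALS)
```

On aacbb the driver performs three predictions and six matches, the last being match($,$); on abcbb it stops at predict(A,b), where the table entry is empty.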

Errors

Handling syntax errors
• Report and locate the error
• Diagnose the error
• Correct the error
• Recover from the error in order to discover more errors
  – without reporting too many "strange" errors

Error diagnosis
• Line number
  – may be far from the actual error
• The current token
• The expected tokens
• Parser configuration

Error recovery
• Becomes less important in interactive environments
• Example heuristics:
  – Search for a semicolon and ignore the statement
  – Try to "replace" tokens for common errors
  – Refrain from reporting 3 subsequent errors
• Globally optimal solutions
  – For every input w, find a valid program w' with a "minimal distance" from w

Illegal input example
Grammar: A → aAb | c
Input: abcbb$

Parsing table:
          a            b      c
    A     A → aAb             A → c

    Input suffix   Stack content   Move
    abcbb$         A$              predict(A,a) = A → aAb
    abcbb$         aAb$            match(a,a)
    bcbb$          Ab$             predict(A,b) = ERROR

Error handling in LL parsers
Grammar: S → a c | b S
Input: c$

Parsing table:
          a          b          c
    S     S → ac     S → bS

    Input suffix   Stack content   Move
    c$             S$              predict(S,c) = ERROR

• Now what?
  – Predict b S anyway: "missing token b inserted in line XXX"

Error handling in LL parsers
• Result: infinite loop

    Input suffix   Stack content   Move
    bc$            S$              predict(S,b) = S → bS
    bc$            bS$             match(b,b)
    c$             S$              Looks familiar?

Error handling and recovery
• x = a * (p + q * ( -b * (r-s);
• Where should we report the error?
• The valid prefix property

The valid prefix property
• For every prefix of tokens t1, t2, …, ti that the parser identifies as legal:
  – there exist tokens ti+1, ti+2, …, tn such that t1, t2, …, tn is a syntactically valid program
• If every token is considered as a single character:
  – For every prefix word u that the parser identifies as legal, there exists w such that u.w is a valid program

Recovery is tricky
• Heuristics for dropping tokens, skipping to a semicolon, etc.

Building the parse tree

Adding semantic actions
• Can add an action to perform on each production rule
• Can build the parse tree
  – Every function returns an object of type Node
  – Every Node maintains a list of children
  – Function calls can add new children

Building the parse tree

Node E() {
  result = new Node(); result.name = "E";
  if (current ∈ {TRUE, FALSE}) {        // E → LIT
    result.addChild(LIT());
  } else if (current == LPAREN) {        // E → ( E OP E )
    result.addChild(match(LPAREN));
    result.addChild(E());
    result.addChild(OP());
    result.addChild(E());
    result.addChild(match(RPAREN));
  } else if (current == NOT) {           // E → not E
    result.addChild(match(NOT));
    result.addChild(E());
  } else error;
  return result;
}
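A runnable counterpart of the tree-building function, again sketched in Python. The classes Node and TreeParser, the method names and the convention that match returns a leaf node are assumptions of this sketch; the slide only fixes the idea that every parsing function returns a Node and adds children to it.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = []

    def add_child(self, child):
        self.children.append(child)

    def leaves(self):
        """Token names at the leaves, left to right."""
        if not self.children:
            return [self.name]
        return [leaf for child in self.children for leaf in child.leaves()]


class TreeParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        # Consume the expected token and return it as a leaf node.
        if self.current != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.current!r}")
        self.pos += 1
        return Node(expected)

    def E(self):
        result = Node("E")
        if self.current in ("true", "false"):      # E -> LIT
            result.add_child(self.LIT())
        elif self.current == "(":                  # E -> ( E OP E )
            result.add_child(self.match("("))
            result.add_child(self.E())
            result.add_child(self.OP())
            result.add_child(self.E())
            result.add_child(self.match(")"))
        elif self.current == "not":                # E -> not E
            result.add_child(self.match("not"))
            result.add_child(self.E())
        else:
            raise SyntaxError(f"unexpected token {self.current!r}")
        return result

    def LIT(self):
        result = Node("LIT")
        if self.current not in ("true", "false"):
            raise SyntaxError(f"expected a literal, got {self.current!r}")
        result.add_child(self.match(self.current))
        return result

    def OP(self):
        result = Node("OP")
        if self.current not in ("and", "or", "xor"):
            raise SyntaxError(f"expected an operator, got {self.current!r}")
        result.add_child(self.match(self.current))
        return result


TOKENS = "not ( not true or false )".split()
TREE = TreeParser(TOKENS).E()
```

Reading the leaves of TREE from left to right gives back the input tokens, which is a quick sanity check that the tree is a parse tree of that input.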

Parser for fully parenthesized expressions

static int Parse_Expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();

    /* try to parse a digit */
    if (Token.class == DIGIT) {
        expr->type = 'D'; expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }

    /* try to parse a parenthesized expression */
    if (Token.class == '(') {
        expr->type = 'P'; get_next_token();
        if (!Parse_Expression(&expr->left)) Error("missing expression");
        if (!Parse_Operator(&expr->oper)) Error("missing operator");
        /* right operand of ( expr op expr ); the field name right is assumed by analogy with left */
        if (!Parse_Expression(&expr->right)) Error("missing expression");
        if (Token.class != ')') Error("missing )");
        get_next_token();
        return 1;
    }

    return 0;
}

Bottom-up parsing

Bottom-up parsing
• Goal: build a parse tree
  – Report an error if the input is not a legal program
• How:
  – Read the input left-to-right
  – Construct a subtree for the first leftmost tree node whose children have been constructed

Bottom-up parsing
Grammar (nonstandard precedence):
    E → E * T
    E → T
    T → T + F
    T → F
    F → id
    F → num
    F → ( E )
[Figure: a parse tree built bottom-up over the tokens 1, 2, 3, + and *; each number is reduced to F, T nodes are built above the F nodes, and E nodes are built at the top.]

Bottom-up parsing: LR(k) grammars
• A grammar is in the class LR(k) when its sentences can be derived via:
  – Bottom-up derivation
  – Scanning the input from left to right (L)
  – Producing the rightmost derivation (R)
    • in reverse order
  – With a lookahead of k tokens (k)
• A language is said to be LR(k) if it has an LR(k) grammar
• The simplest case is LR(0), which we will discuss

Terminology: reductions & handles
• The opposite of derivation is called reduction
  – Let A → α be a production rule
  – Derivation: βAµ ⇒ βαµ
  – Reduction: βαµ ⇒ βAµ
• A handle is the reduced substring
  – α is the handle for βαµ

Use shift & reduce
In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.

How does the parser know what to do?
[Figure: a parser with a stack, an action table and a goto table reads the input ( ( 23 + 7 ) * x ), given as the token stream LP LP Num OP Num RP OP Id RP, and outputs the tree Op(*) with children Op(+), whose children are Num(23) and Num(7), and Id(b).]

How does the parser know what to do?
• A state will keep the info gathered on handle(s)
  – A state in the "control" of the PDA
  – Also (part of) the stack alphabet
  – A state is a set of LR(0) items
• A table will tell it "what to do" based on the current state and the next token
  – The transition function of the PDA
• A stack will record the "nesting level"
  – The stack contains a sequence of prefixes of handles

Important bottom-up LR parsers
• LR(0): simplest, explains basic ideas
• SLR(1): simple, explains lookahead
• LR(1): complicated, very powerful, expensive
• LALR(1): complicated, powerful enough, used by automatic tools

LR(0) vs SLR(1) vs LR(1) vs LALR(1)
• All use shift/reduce
• Main difference: how to identify a handle
  – Technically: using different sets of states
• More expensive ⇒ more states ⇒ more specific choice of which reduction rule to use
• But the usage of the states is the same in all parsers
• Reduction is the same in all techniques
  – Once the handle is determined

LR(0) parsing

LR item
    N → α•β
• α: already matched
• β: to be matched (against the input)
• Hypothesis about αβ being a possible handle: so far we've matched α, expecting to see β

Example: LR(0) items
• All items can be obtained by placing a dot at every position for every production
• Before • = reduced
  – matched prefix
• After • = may be reduced
  – may be matched by suffix

Grammar:
    (1) S → E $
    (2) E → T
    (3) E → E + T
    (4) T → id
    (5) T → ( E )

LR(0) items:
     1: S → • E $
     2: S → E • $
     3: S → E $ •
     4: E → • T
     5: E → T •
     6: E → • E + T
     7: E → E • + T
     8: E → E + • T
     9: E → E + T •
    10: T → • id
    11: T → id •
    12: T → • ( E )
    13: T → ( • E )
    14: T → ( E • )
    15: T → ( E ) •
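Placing a dot at every position is easy to mechanize. The Python sketch below enumerates the items of the five-rule grammar; representing an item as a (left-hand side, right-hand side, dot position) triple and the names lr0_items, show and is_reduce_item are assumptions of this sketch.

```python
# The grammar from the slide, one (lhs, rhs) pair per production.
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("id",)),
    ("T", ("(", "E", ")")),
]


def lr0_items(grammar):
    """One item per production and per dot position 0..len(rhs)."""
    return [(lhs, rhs, dot) for lhs, rhs in grammar for dot in range(len(rhs) + 1)]


def show(item):
    lhs, rhs, dot = item
    return f"{lhs} → {' '.join(rhs[:dot] + ('•',) + rhs[dot:])}"


def is_reduce_item(item):
    """The dot is at the end: the whole right-hand side has been matched."""
    _, rhs, dot = item
    return dot == len(rhs)


ITEMS = lr0_items(GRAMMAR)
```

A production with n symbols on its right-hand side yields n + 1 items, so this grammar has 3 + 2 + 4 + 2 + 4 = 15 items, matching the list above.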

LR(0) items
    N → α•β     Shift item
    N → αβ•     Reduce item
• States are sets of items

LR(0) items
• A derivation rule with a location marker (●) is called an LR(0) item
    E → E * B | E + B | B
    B → 0 | 1

PDA states
Grammar:
    E → E * B | E + B | B
    B → 0 | 1
• A PDA state is a set of LR(0) items. E.g., q13 = { E → E ● * B, E → E ● + B, B → 1 ● }
• Intuitively, if we matched 1, then the state will remember the 3 possible alternative rules and where we are in each of them:
    (1) E → E ● * B
    (2) E → E ● + B
    (3) B → 1 ●


Intuition
• Read input tokens left-to-right and remember them in the stack
• When a right-hand side of a rule is found, remove it from the stack and replace it with the non-terminal it derives
• Remembering a token is called shift
  – Each shift moves to a state that remembers what we've seen so far
• Replacing a RHS with its LHS is called reduce
  – Each reduce goes to a state that determines the context of the derivation

Model of an LR parser
[Figure: an LR parser reads the input id + id + id $, consults the action table and the goto table, and produces the output. Its stack holds alternating states and symbols (terminals and non-terminals): 0, T, 2, +, 7, id, 5.]

LR parser stack
• A sequence made of state, symbol pairs
• For instance, a possible stack for the grammar
    S → E $
    E → T
    E → E + T
    T → id
    T → ( E )
  could be: 0 T 2 + 7 id 5 (the stack grows to the right)

Form of LR parsing table
• One row per state (0, 1, ...)
• Columns for terminals: shift/reduce actions
• Columns for non-terminals: goto part
• Entries:
    sn     shift state n
    rk     reduce by rule k
    gm     goto state m
    acc    accept
    empty  error

LR parser table example

            action                            goto
    STATE   id    +     (     )     $         E     T
    0       s5          s7                    g1    g6
    1             s3                acc
    2
    3       s5          s7                          g4
    4       r3    r3    r3    r3    r3
    5       r4    r4    r4    r4    r4
    6       r2    r2    r2    r2    r2
    7       s5          s7                    g8    g6
    8             s3          s9
    9       r5    r5    r5    r5    r5

Shift move
• action[q, a] = sn
[Figure: the LR parsing program with state q on top of the stack, the input $ … a … with a as the next token, and the action and goto tables.]

Result of shift
• action[q, a] = sn
[Figure: the same configuration after the move; the symbol a and the state n have been pushed on top of q, and a has been consumed from the input.]

Reduce move
• action[qn, a] = rk
• Production: (k) A → σ1 … σn
• Top of stack looks like q1 σ1 … qn σn for some q1 … qn
• goto[q, A] = qm
[Figure: the LR parsing program with the 2*n topmost stack entries (the symbols σ1 … σn and their states, qn on top) above a state q; the next input token is a.]

Result of reduce move
• action[qn, a] = rk
• Production: (k) A → σ1 … σn
• Top of stack looks like q1 σ1 … qn σn for some q1 … qn
• goto[q, A] = qm
[Figure: the same configuration after the move; the 2*n entries have been popped, and A and the state qm have been pushed on top of q.]

Accept move
• If action[q, a] = accept, parsing is completed
[Figure: the LR parsing program with state q on top of the stack and a as the next input token.]

Error move
• If action[q, a] = error (usually empty), parsing discovered a syntactic error
[Figure: the LR parsing program with state q on top of the stack and a as the next input token.]
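The four moves can be put together into a small driver loop. The Python sketch below uses the example action/goto table for the grammar (1) S → E $, (2) E → T, (3) E → E + T, (4) T → id, (5) T → ( E ), as read from the flattened table on the earlier slide, so the exact column placement is an assumption. The stack here holds states only, since the symbols are implied by the states; that simplification and the names lr_parse, ACTION, GOTO and RULES are also assumptions of this sketch.

```python
TERMINALS = ["id", "+", "(", ")", "$"]


def reduce_row(rule):
    # LR(0): a reduce state reduces regardless of the next token.
    return {t: ("r", rule) for t in TERMINALS}


ACTION = {
    0: {"id": ("s", 5), "(": ("s", 7)},
    1: {"+": ("s", 3), "$": "acc"},
    3: {"id": ("s", 5), "(": ("s", 7)},
    4: reduce_row(3),
    5: reduce_row(4),
    6: reduce_row(2),
    7: {"id": ("s", 5), "(": ("s", 7)},
    8: {"+": ("s", 3), ")": ("s", 9)},
    9: reduce_row(5),
}
GOTO = {0: {"E": 1, "T": 6}, 3: {"T": 4}, 7: {"E": 8, "T": 6}}
# rule number -> (left-hand side, length of right-hand side)
RULES = {2: ("E", 1), 3: ("E", 3), 4: ("T", 1), 5: ("T", 3)}


def lr_parse(tokens):
    """Return the list of rules used for reductions, or None on a syntax error."""
    input_ = list(tokens) + ["$"]
    stack = [0]                               # state stack, initial state 0
    pos = 0
    reductions = []
    while True:
        act = ACTION.get(stack[-1], {}).get(input_[pos])
        if act is None:                       # error move: empty entry
            return None
        if act == "acc":                      # accept move
            return reductions
        kind, n = act
        if kind == "s":                       # shift move: push state n, consume token
            stack.append(n)
            pos += 1
        else:                                 # reduce move by rule n
            lhs, length = RULES[n]
            del stack[-length:]               # pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(n)
```

The list of reductions is the rightmost derivation in reverse order: for id + id the driver reduces by T → id, E → T, T → id and finally E → E + T.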

Example
    Z → E $
    E → T | E + T
    T → i | ( E )

Example: parsing with LR items
Grammar:
    Z → E $
    E → T | E + T
    T → i | ( E )
Input: i + i $
Initial item: Z → •E $
Additional items: E → •T, E → •E + T, T → •i, T → •( E )
• Why do we need these additional LR items?
• Where do they come from?
• What do they mean?

ε-closure
• Given a set S of LR(0) items
• If P → α•Nβ is in state S
• then for each rule N → γ in the grammar, state S must also contain N → •γ

Grammar:
    Z → E $
    E → T | E + T
    T → i | ( E )

ε-closure({ Z → •E $ }) = { Z → •E $, E → •T, E → •E + T, T → •i, T → •( E ) }
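The closure rule can be computed with a simple worklist. The Python sketch below represents an item as a (left-hand side, right-hand side, dot position) triple; that representation and the function name closure are assumptions of this sketch.

```python
# Z -> E $ ; E -> T | E + T ; T -> i | ( E )
GRAMMAR = {
    "Z": [("E", "$")],
    "E": [("T",), ("E", "+", "T")],
    "T": [("i",), ("(", "E", ")")],
}


def closure(items, grammar):
    """Add N -> •γ for every item whose dot stands before a non-terminal N."""
    result = set(items)
    worklist = list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in grammar:   # dot is before a non-terminal
            nonterminal = rhs[dot]
            for alt in grammar[nonterminal]:
                new_item = (nonterminal, alt, 0)
                if new_item not in result:
                    result.add(new_item)
                    worklist.append(new_item)
    return result


START_STATE = closure({("Z", ("E", "$"), 0)}, GRAMMAR)
```

Starting from Z → •E $ the worklist first adds the two E items, and the item E → •T then pulls in the two T items, giving the five items listed above.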

Page 84: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

84

i + i $

Et •TEt •E+TTt •iTt •(E)

Zt •E$

Zt E$Et T|E+TTt i|(E)

Itemsdenotepossiblefuturehandles

Rememberpositionfromwhichwe’retryingtoreduce

Example:parsingwithLRitems

Page 85: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

85

Example: parsing with LR items

Input: i + i $

Grammar: Z → E $;  E → T | E + T;  T → i | ( E )

Items:
Z → •E $
E → •T
E → •E + T
T → •i
T → •( E )

Match items with the current token.

T → i•   Reduce item!

Page 86: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

86

i

Et T• Reduceitem!

T + i $Zt E$Et T|E+TTt i|(E)

Et •TEt •E+TTt •iTt •(E)

Zt •E$

Example:parsingwithLRitems

Page 87: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

87

T

Et T• Reduceitem!

i

E + i $Zt E$Et T|E+TTt i|(E)

Et •TEt •E+TTt •iTt •(E)

Zt •E$

Example:parsingwithLRitems

Page 88: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

88

T

i

E + i $Zt E$Et T|E+TTt i|(E)

Et •TEt •E+TTt •iTt •(E)

Zt •E$

Et E•+T

Zt E•$

Example:parsingwithLRitems

Page 89: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

89

T

i

E + i $Zt E$Et T|E+TTt i|(E)

Et •TEt •E+TTt •iTt •(E)

Zt •E$

Et E•+T

Zt E•$ Et E+•T

Tt •iTt •(E)

Example:parsingwithLRitems

Page 90: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

90

Et E•+T

Zt E•$ Et E+•T

Tt •iTt •(E)

E + T $

i

Zt E$Et T|E+TTt i|(E)

Et •TEt •E+TTt •iTt •(E)

Zt •E$

T

i

Example:parsingwithLRitems

Page 91: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

91

Et •TEt •E+TTt •iTt •(E)

Zt •E$

Zt E$Et T|E+TTt i|(E)

E + T

T

i

Et E•+T

Zt E•$ Et E+•T

Tt •iTt •(E)

i

Et E+T•

$

Reduceitem!

Example:parsingwithLRitems

Page 92: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

92

Et •TEt •E+TTt •iTt •(E)

Zt •E$

E $

E

T

i

+ T

Zt E•$

Et E•+T

i

Zt E$Et T|E+TTt i|(E)

Example:parsingwithLRitems

Page 93: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

93

Et •TEt •E+TTt •iTt •(E)

Zt •E$

E $

E

T

i

+ T

Zt E•$

Et E•+T

Zt E$•

i

Zt E$Et T|E+TTt i|(E)

Example:parsingwithLRitems

Reduceitem!

Page 94: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

94

Et •TEt •E+TTt •iTt •(E)

Zt •E$

Z

E

T

i

+ T

Zt E•$

Et E•+T

Zt E$•

Reduceitem!

E $

i

Zt E$Et T|E+TTt i|(E)

Example:parsingwithLRitems

Page 95: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

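GOTO/ACTION tables

95

State | i  | +  | (  | )  | $  | E  | T  | action
q0    | q5 |    | q7 |    |    | q1 | q6 | shift
q1    |    | q3 |    |    | q2 |    |    | shift
q2    |    |    |    |    |    |    |    | Z → E$
q3    | q5 |    | q7 |    |    |    | q4 | shift
q4    |    |    |    |    |    |    |    | E → E+T
q5    |    |    |    |    |    |    |    | T → i
q6    |    |    |    |    |    |    |    | E → T
q7    | q5 |    | q7 |    |    | q8 | q6 | shift
q8    |    | q3 |    | q9 |    |    |    | shift
q9    |    |    |    |    |    |    |    | T → (E)

Columns i through T form the GOTO table; the last column is the ACTION table.

An empty entry is an error move.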

Page 96: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR(0) parser tables

• Two types of rows:
  – Shift row: tells which state to GOTO for the current token
  – Reduce row: tells which rule to reduce (independent of the current token)
    • GOTO entries are blank

96

Page 97: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR parser data structures

• Input: remainder of the text to be processed
• Stack: sequence of pairs N, qi
  – N: symbol (terminal or non-terminal)
  – qi: state at which decisions are made
• Initial stack contains q0

97

Input suffix: + i $

Stack: q0 i q5   (the stack grows to the right)

Page 98: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR(0) pushdown automaton

• Two moves: shift and reduce
• Shift move
  – Remove the first token from the input
  – Push it on the stack
  – Compute the next state based on the GOTO table
  – Push the new state on the stack
  – If the new state is error, report an error

98

Before the shift:  input: i + i $   stack: q0
After the shift:   input: + i $     stack: q0 i q5

State | i  | +  | (  | )  | $  | E  | T  | action
q0    | q5 |    | q7 |    |    | q1 | q6 | shift

(the stack grows to the right)

Page 99: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR(0) pushdown automaton

• Reduce move
  – Using a rule N → α
  – Symbols in α and their following states are removed from the stack
  – The new state is computed based on the GOTO table (using the top of the stack, before pushing N)
  – N is pushed on the stack
  – The new state is pushed on top of N

99

Before the reduce:       input: + i $   stack: q0 i q5
After reducing T → i:    input: + i $   stack: q0 T q6

State | i  | +  | (  | )  | $  | E  | T  | action
q0    | q5 |    | q7 |    |    | q1 | q6 | shift

(the stack grows to the right)

Page 100: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

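GOTO/ACTION table

100

State | i  | +  | (  | )  | $  | E  | T
q0    | s5 |    | s7 |    |    | s1 | s6
q1    |    | s3 |    |    | s2 |    |
q2    | r1 | r1 | r1 | r1 | r1 | r1 | r1
q3    | s5 |    | s7 |    |    |    | s4
q4    | r3 | r3 | r3 | r3 | r3 | r3 | r3
q5    | r4 | r4 | r4 | r4 | r4 | r4 | r4
q6    | r2 | r2 | r2 | r2 | r2 | r2 | r2
q7    | s5 |    | s7 |    |    | s8 | s6
q8    |    | s3 |    | s9 |    |    |
q9    | r5 | r5 | r5 | r5 | r5 | r5 | r5

(1) Z → E $
(2) E → T
(3) E → E + T
(4) T → i
(5) T → ( E )

Warning: numbers mean different things!
rn = reduce using rule number n
sm = shift to state m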

Page 101: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

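Parsing id+id$

101

      | action                   | goto
State | id | +  | (  | )  | $   | E  | T  | S
0     | s5 |    | s7 |    |     | g1 | g6 |
1     |    | s3 |    |    | acc |    |    |
2     |    |    |    |    |     |    |    |
3     | s5 |    | s7 |    |     |    | g4 |
4     | r3 | r3 | r3 | r3 | r3  |    |    |
5     | r4 | r4 | r4 | r4 | r4  |    |    |
6     | r2 | r2 | r2 | r2 | r2  |    |    |
7     | s5 |    | s7 |    |     | g8 | g6 |
8     |    | s3 |    | s9 |     |    |    |
9     | r5 | r5 | r5 | r5 | r5  |    |    |

(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Stack | Input     | Action
0     | id + id $ | s5

Initialize with state 0. The stack grows to the right.

rn = reduce using rule number n
sm = shift to state m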

Page 103: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Parsingid+id$

103

Stack  | Input     | Action
0      | id + id $ | s5
0 id 5 | + id $    | r4

gotoactionSTE$)(+idg6g1s7s50

accs312

g4s7s53r3r3r3r3r34r4r4r4r4r45r2r2r2r2r26

g6g8s7s57s9s38

r5r5r5r5r59

(1)S® E$(2)E® T(3)E® E+ T(4)T® id(5)T® ( E)Stackgrowsthisway

rn =reduceusingrulenumbernsm =shifttostatem

Page 104: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Parsingid+id$

104

Stack  | Input     | Action
0      | id + id $ | s5
0 id 5 | + id $    | r4

gotoactionSTE$)(+idg6g1s7s50

accs312

g4s7s53r3r3r3r3r34r4r4r4r4r45r2r2r2r2r26

g6g8s7s57s9s38

r5r5r5r5r59

pop id 5

(1)S® E$(2)E® T(3)E® E+ T(4)T® id(5)T® ( E)Stackgrowsthisway

rn =reduceusingrulenumbernsm =shifttostatem

Page 105: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Parsingid+id$

105

Stack  | Input     | Action
0      | id + id $ | s5
0 id 5 | + id $    | r4

gotoactionSTE$)(+idg6g1s7s50

accs312

g4s7s53r3r3r3r3r34r4r4r4r4r45r2r2r2r2r26

g6g8s7s57s9s38

r5r5r5r5r59

push T 6

(1)S® E$(2)E® T(3)E® E+ T(4)T® id(5)T® ( E)Stackgrowsthisway

rn =reduceusingrulenumbernsm =shifttostatem

Page 106: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Parsingid+id$

106

Stack  | Input     | Action
0      | id + id $ | s5
0 id 5 | + id $    | r4
0 T 6  | + id $    | r2

gotoactionSTE$)(+idg6g1s7s50

accs312

g4s7s53r3r3r3r3r34r4r4r4r4r45r2r2r2r2r26

g6g8s7s57s9s38

r5r5r5r5r59

(1)S® E$(2)E® T(3)E® E+ T(4)T® id(5)T® ( E)Stackgrowsthisway

rn =reduceusingrulenumbernsm =shifttostatem

Page 107: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Parsingid+id$

107

Stack  | Input     | Action
0      | id + id $ | s5
0 id 5 | + id $    | r4
0 T 6  | + id $    | r2
0 E 1  | + id $    | s3

gotoactionSTE$)(+idg6g1s7s50

accs312

g4s7s53r3r3r3r3r34r4r4r4r4r45r2r2r2r2r26

g6g8s7s57s9s38

r5r5r5r5r59

(1)S® E$(2)E® T(3)E® E+ T(4)T® id(5)T® ( E)Stackgrowsthisway

rn =reduceusingrulenumbernsm =shifttostatem

Page 108: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Parsingid+id$

108

Stack     | Input     | Action
0         | id + id $ | s5
0 id 5    | + id $    | r4
0 T 6     | + id $    | r2
0 E 1     | + id $    | s3
0 E 1 + 3 | id $      | s5

gotoactionSTE$)(+idg6g1s7s50

accs312

g4s7s53r3r3r3r3r34r4r4r4r4r45r2r2r2r2r26

g6g8s7s57s9s38

r5r5r5r5r59

(1)S® E$(2)E® T(3)E® E+ T(4)T® id(5)T® ( E)Stackgrowsthisway

rn =reduceusingrulenumbernsm =shifttostatem

Page 109: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Parsingid+id$

109

Stack          | Input     | Action
0              | id + id $ | s5
0 id 5         | + id $    | r4
0 T 6          | + id $    | r2
0 E 1          | + id $    | s3
0 E 1 + 3      | id $      | s5
0 E 1 + 3 id 5 | $         | r4

gotoactionSTE$)(+idg6g1s7s50

accs312

g4s7s53r3r3r3r3r34r4r4r4r4r45r2r2r2r2r26

g6g8s7s57s9s38

r5r5r5r5r59

(1)S® E$(2)E® T(3)E® E+ T(4)T® id(5)T® ( E)Stackgrowsthisway

rn =reduceusingrulenumbernsm =shifttostatem

Page 110: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Parsingid+id$

110

Stack          | Input     | Action
0              | id + id $ | s5
0 id 5         | + id $    | r4
0 T 6          | + id $    | r2
0 E 1          | + id $    | s3
0 E 1 + 3      | id $      | s5
0 E 1 + 3 id 5 | $         | r4
0 E 1 + 3 T 4  | $         | r3

gotoactionSTE$)(+idg6g1s7s50

accs312

g4s7s53r3r3r3r3r34r4r4r4r4r45r2r2r2r2r26

g6g8s7s57s9s38

r5r5r5r5r59

(1)S® E$(2)E® T(3)E® E+ T(4)T® id(5)T® ( E)Stackgrowsthisway

rn =reduceusingrulenumbernsm =shifttostatem

Page 111: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

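Parsing id+id$

111

Stack          | Input     | Action
0              | id + id $ | s5
0 id 5         | + id $    | r4
0 T 6          | + id $    | r2
0 E 1          | + id $    | s3
0 E 1 + 3      | id $      | s5
0 E 1 + 3 id 5 | $         | r4
0 E 1 + 3 T 4  | $         | r3
0 E 1          | $         | s2

      | action                   | goto
State | id | +  | (  | )  | $   | E  | T  | S
0     | s5 |    | s7 |    |     | g1 | g6 |
1     |    | s3 |    |    | acc |    |    |
2     |    |    |    |    |     |    |    |
3     | s5 |    | s7 |    |     |    | g4 |
4     | r3 | r3 | r3 | r3 | r3  |    |    |
5     | r4 | r4 | r4 | r4 | r4  |    |    |
6     | r2 | r2 | r2 | r2 | r2  |    |    |
7     | s5 |    | s7 |    |     | g8 | g6 |
8     |    | s3 |    | s9 |     |    |    |
9     | r5 | r5 | r5 | r5 | r5  |    |    |

(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

The stack grows to the right.

rn = reduce using rule number n
sm = shift to state m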

Page 112: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

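LR(0) automaton example

112

States (item sets):
q0: Z → •E$, E → •T, E → •E+T, T → •i, T → •(E)
q1: Z → E•$, E → E•+T
q2: Z → E$•
q3: E → E+•T, T → •i, T → •(E)
q4: E → E+T•
q5: T → i•
q6: E → T•
q7: T → (•E), E → •T, E → •E+T, T → •i, T → •(E)
q8: T → (E•), E → E•+T
q9: T → (E)•

Transitions:
q0 --E--> q1,  q0 --T--> q6,  q0 --i--> q5,  q0 --(--> q7
q1 --+--> q3,  q1 --$--> q2
q3 --T--> q4,  q3 --i--> q5,  q3 --(--> q7
q7 --E--> q8,  q7 --T--> q6,  q7 --i--> q5,  q7 --(--> q7
q8 --+--> q3,  q8 --)--> q9

Shift states: q0, q1, q3, q7, q8. Reduce states: q2, q4, q5, q6, q9.

Annotations: read input "("; managed to reduce E.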

Page 113: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

States and LR(0) items

• The state will "remember" the potential derivation rules given the part that was already identified
• For example, if we have already identified E, then the state will remember the two alternatives:
  (1) E → E * B, (2) E → E + B
• Actually, we will also remember where we are in each of them:
  (1) E → E • * B, (2) E → E • + B
• A derivation rule with a location marker is called an LR(0) item.
• The state is actually a set of LR(0) items. E.g., q13 = { E → E • * B, E → E • + B }

Grammar:
E → E * B | E + B | B
B → 0 | 1

113

Page 114: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

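GOTO/ACTION tables

114

State | i  | +  | (  | )  | $  | E  | T  | action
q0    | q5 |    | q7 |    |    | q1 | q6 | shift
q1    |    | q3 |    |    | q2 |    |    | shift
q2    |    |    |    |    |    |    |    | Z → E$
q3    | q5 |    | q7 |    |    |    | q4 | shift
q4    |    |    |    |    |    |    |    | E → E+T
q5    |    |    |    |    |    |    |    | T → i
q6    |    |    |    |    |    |    |    | E → T
q7    | q5 |    | q7 |    |    | q8 | q6 | shift
q8    |    | q3 |    | q9 |    |    |    | shift
q9    |    |    |    |    |    |    |    | T → (E)

Columns i through T form the GOTO table; the last column is the ACTION table.

empty = error move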

Page 115: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR(0) parser tables

• Actions are determined by the topmost state
• Two types of rows:
  – Shift row: tells which state to GOTO for the current token
  – Reduce row: tells which rule to reduce (independent of the current token)
    • GOTO entries are blank

115

Page 117: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

GOTO/ACTION table

117

State | i  | +  | (  | )  | $  | E  | T  | action
q0    | s5 |    | s7 |    |    | s1 | s6 | shift
q1    |    | s3 |    |    | s2 |    |    | shift
q2    | r1 | r1 | r1 | r1 | r1 | r1 | r1 | Z → E$
q3    | s5 |    | s7 |    |    |    | s4 | shift
q4    | r3 | r3 | r3 | r3 | r3 | r3 | r3 | E → E+T
q5    | r4 | r4 | r4 | r4 | r4 | r4 | r4 | T → i
q6    | r2 | r2 | r2 | r2 | r2 | r2 | r2 | E → T
q7    | s5 |    | s7 |    |    | s8 | s6 | shift
q8    |    | s3 |    | s9 |    |    |    | shift
q9    | r5 | r5 | r5 | r5 | r5 | r5 | r5 | T → (E)

(1) Z → E $
(2) E → T
(3) E → E + T
(4) T → i
(5) T → ( E )

Warning: numbers mean different things!
rn = reduce using rule number n
sm = shift to state m

Page 118: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

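LR(0) parsing of i + i

118

Stack             | Input   | Action
q0                | i + i $ | shift
q0 i q5           | + i $   | reduce 4
q0 T q6           | + i $   | reduce 2
q0 E q1           | + i $   | shift
q0 E q1 + q3      | i $     | shift
q0 E q1 + q3 i q5 | $       | reduce 4
q0 E q1 + q3 T q4 | $       | reduce 3
q0 E q1           | $       | shift
q0 E q1 $ q2      |         | reduce 1
q0 Z              |         | accept

State | i  | +  | (  | )  | $  | E  | T
q0    | s5 |    | s7 |    |    | s1 | s6
q1    |    | s3 |    |    | s2 |    |
q2    | r1 | r1 | r1 | r1 | r1 | r1 | r1
q3    | s5 |    | s7 |    |    |    | s4
q4    | r3 | r3 | r3 | r3 | r3 | r3 | r3
q5    | r4 | r4 | r4 | r4 | r4 | r4 | r4
q6    | r2 | r2 | r2 | r2 | r2 | r2 | r2
q7    | s5 |    | s7 |    |    | s8 | s6
q8    |    | s3 |    | s9 |    |    |
q9    | r5 | r5 | r5 | r5 | r5 | r5 | r5

(1) Z → E $
(2) E → T
(3) E → E + T
(4) T → i
(5) T → ( E )

Warning: numbers mean different things!
rn = reduce using rule number n
sm = shift to state m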
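The trace above can be reproduced with a short table-driven loop. The sketch below is an illustration only: RULES, GOTO, REDUCE and parse are assumed names, the tables are transcribed from the slide, and the stack is a flat list that alternates states and symbols.

```python
# Numbered rules from the slide: (1) Z -> E $ ... (5) T -> ( E )
RULES = {
    1: ("Z", ["E", "$"]),
    2: ("E", ["T"]),
    3: ("E", ["E", "+", "T"]),
    4: ("T", ["i"]),
    5: ("T", ["(", "E", ")"]),
}

# Shift rows: next state per symbol. Reduce rows: rule number to apply.
GOTO = {
    0: {"i": 5, "(": 7, "E": 1, "T": 6},
    1: {"+": 3, "$": 2},
    3: {"i": 5, "(": 7, "T": 4},
    7: {"i": 5, "(": 7, "E": 8, "T": 6},
    8: {"+": 3, ")": 9},
}
REDUCE = {2: 1, 4: 3, 5: 4, 6: 2, 9: 5}

def parse(tokens):
    """Run the LR(0) automaton and return the list of moves taken."""
    stack = [0]          # q0, then alternating symbol/state pairs
    pos = 0
    moves = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            rule = REDUCE[state]
            lhs, rhs = RULES[rule]
            # Pop the symbols of the right-hand side and their states.
            del stack[len(stack) - 2 * len(rhs):]
            moves.append(f"reduce {rule}")
            if lhs == "Z":
                moves.append("accept")
                return moves
            # GOTO is consulted with the exposed state, before pushing lhs.
            stack += [lhs, GOTO[stack[-1]][lhs]]
        else:
            token = tokens[pos] if pos < len(tokens) else None
            target = GOTO[state].get(token)
            if target is None:       # empty entry = error move
                raise SyntaxError(f"unexpected {token!r} in state q{state}")
            stack += [token, target]
            pos += 1
            moves.append("shift")

print(parse(["i", "+", "i", "$"]))
```

Because reduce rows ignore the input, the loop never looks at the current token in a reduce state; that is exactly what makes the parser LR(0).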

Page 119: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Constructing an LR parsing table

• Construct a (determinized) transition diagram from LR items
• If there are conflicts, stop
• Fill table entries from the diagram

119

Page 120: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR item

120

N → α•β

α: already matched; β: to be matched against the input

Hypothesis about αβ being a possible handle: so far we've matched α, expecting to see β

Page 121: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Types of LR(0) items

121

N → α•β    Shift item

N → αβ•    Reduce item

Page 123: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Computing item sets

• Initial set
  – Z is the start symbol
  – ε-closure({Z → •α | Z → α is in the grammar})
• Next set from a set S and the next symbol X
  – step(S, X) = {N → αX•β | N → α•Xβ is in the item set S}
  – nextSet(S, X) = ε-closure(step(S, X))

123

Page 124: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Operations for transition diagram construction

• Initial = {S' → •S$}
• For an item set I:
  Closure(I) = I ∪ {X → •µ | X → µ is in the grammar and N → α•Xβ is in Closure(I)}
• Goto(I, X) = {N → αX•β | N → α•Xβ is in I}

124

Page 125: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Initial example

• Initial = {S → •E $}

125

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Page 126: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Closure example

• Initial = {S → •E $}
• Closure({S → •E $}) = {
    S → •E $
    E → •T
    E → •E + T
    T → •id
    T → •( E ) }

126

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Page 127: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Goto example

• Initial = {S → •E $}
• Closure({S → •E $}) = {
    S → •E $
    E → •T
    E → •E + T
    T → •id
    T → •( E ) }
• Goto({S → •E $, E → •E + T, T → •id}, E) = {S → E• $, E → E• + T}

127

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Page 128: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Constructing the transition diagram

• Start with state 0 containing the item set Closure({S → •E $})
• Repeat until no new states are discovered
  – For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(Goto(Ip, N))
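The construction loop above can be sketched as follows. This is a minimal illustration, not the course's implementation: items are assumed to be (lhs, rhs, dot) triples, and closure, goto and build_automaton are hypothetical names. States are numbered in discovery order, so the numbers need not match q0 to q9 on the slides.

```python
GRAMMAR = {
    "S": [("E", "$")],
    "E": [("T",), ("E", "+", "T")],
    "T": [("id",), ("(", "E", ")")],
}

def closure(items):
    """Add X -> •µ for every non-terminal X that follows a dot."""
    result, worklist = set(items), list(items)
    while worklist:
        _, rhs, dot = worklist.pop()
        if dot < len(rhs) and rhs[dot] in GRAMMAR:
            for alternative in GRAMMAR[rhs[dot]]:
                item = (rhs[dot], alternative, 0)
                if item not in result:
                    result.add(item)
                    worklist.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Move the dot over `symbol` in every item where it follows the dot."""
    return {(lhs, rhs, dot + 1) for lhs, rhs, dot in items
            if dot < len(rhs) and rhs[dot] == symbol}

def build_automaton():
    start = closure({("S", ("E", "$"), 0)})
    states, transitions, worklist = [start], {}, [start]
    while worklist:                       # repeat until no new states
        state = worklist.pop()
        symbols = {rhs[dot] for _, rhs, dot in state if dot < len(rhs)}
        for symbol in sorted(symbols):
            target = closure(goto(state, symbol))
            if target not in states:
                states.append(target)
                worklist.append(target)
            transitions[(states.index(state), symbol)] = states.index(target)
    return states, transitions

states, transitions = build_automaton()
print(len(states), "states,", len(transitions), "transitions")
```

For the example grammar this should discover the same ten item sets as the diagram, with transitions on terminals corresponding to shifts and transitions on non-terminals corresponding to goto entries.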

128

Page 130: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Automaton construction example

130

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Initialize: q0 = { S → •E$ }

Page 131: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

131

Automaton construction example

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

Apply Closure: q0 = { S → •E$, E → •T, E → •E+T, T → •i, T → •(E) }

Page 132: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

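Automaton construction example

132

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

q0 = { S → •E$, E → •T, E → •E+T, T → •i, T → •(E) }

q0 --T--> q6 = { E → T• }
q0 --(--> { T → (•E), E → •T, E → •E+T, T → •i, T → •(E) }
q0 --i--> q5 = { T → i• }
q0 --E--> q1 = { S → E•$, E → E•+T }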

Page 133: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

133

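Automaton construction example

Grammar:
(1) S → E $
(2) E → T
(3) E → E + T
(4) T → id
(5) T → ( E )

States (item sets):
q0: S → •E$, E → •T, E → •E+T, T → •i, T → •(E)
q1: S → E•$, E → E•+T
q2: S → E$•
q3: E → E+•T, T → •i, T → •(E)
q4: E → E+T•
q5: T → i•
q6: E → T•
q7: T → (•E), E → •T, E → •E+T, T → •i, T → •(E)
q8: T → (E•), E → E•+T
q9: T → (E)•

Transitions:
q0 --E--> q1,  q0 --T--> q6,  q0 --i--> q5,  q0 --(--> q7
q1 --+--> q3,  q1 --$--> q2
q3 --T--> q4,  q3 --i--> q5,  q3 --(--> q7
q7 --E--> q8,  q7 --T--> q6,  q7 --i--> q5,  q7 --(--> q7
q8 --+--> q3,  q8 --)--> q9

A terminal transition corresponds to a shift action in the parse table.

A non-terminal transition corresponds to a goto action in the parse table.

A single reduce item corresponds to a reduce action.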

Page 134: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Are we done?

• Can make a transition diagram for any grammar
• Can make a GOTO table for every grammar
• Cannot make a deterministic ACTION table for every grammar

134

Page 135: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

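LR(0) conflicts

135

Grammar:
Z → E $
E → T
E → E + T
T → i
T → ( E )
T → i [ E ]

q0: Z → •E$, E → •T, E → •E+T, T → •i, T → •(E), T → •i[E]
q5: T → i•, T → i•[E]

q0 --i--> q5 (q0 also has outgoing transitions on T, ( and E)

Shift/reduce conflict in q5.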

Page 136: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

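LR(0) conflicts

136

Grammar:
Z → E $
E → T
E → E + T
T → i
V → i
T → ( E )

q0: Z → •E$, E → •T, E → •E+T, T → •i, T → •(E), T → •i[E]
q5: T → i•, V → i•

q0 --i--> q5 (q0 also has outgoing transitions on T, ( and E)

Reduce/reduce conflict in q5.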

Page 137: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR(0) conflicts

• Any grammar with an ε-rule cannot be LR(0)
• Inherent shift/reduce conflict
  – A → ε• is a reduce item
  – P → α•Aβ is a shift item
  – A → ε• can always be predicted from P → α•Aβ

137

Page 138: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Conflicts

• Can construct a diagram for every grammar, but some may introduce conflicts
• Shift-reduce conflict: an item set contains at least one shift item and one reduce item
• Reduce-reduce conflict: an item set contains two reduce items
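The two definitions above translate directly into a check over one item set. The sketch assumes items are (lhs, rhs, dot) triples; the function name conflicts is illustrative.

```python
def conflicts(item_set):
    """Classify the LR(0) conflicts present in one item set.

    A reduce item has the dot at the end of its right-hand side;
    every other item is treated as a shift item, as on the slide.
    """
    reduce_items = [item for item in item_set if item[2] == len(item[1])]
    shift_items = [item for item in item_set if item[2] < len(item[1])]
    found = []
    if reduce_items and shift_items:
        found.append("shift-reduce")
    if len(reduce_items) >= 2:
        found.append("reduce-reduce")
    return found

# q5 for the grammar with T -> i [ E ]: { T -> i•, T -> i•[E] }
q5_brackets = {("T", ("i",), 1), ("T", ("i", "[", "E", "]"), 1)}
# q5 for the grammar with V -> i: { T -> i•, V -> i• }
q5_two_rules = {("T", ("i",), 1), ("V", ("i",), 1)}

print(conflicts(q5_brackets))    # ['shift-reduce']
print(conflicts(q5_two_rules))   # ['reduce-reduce']
```

A grammar is LR(0) exactly when this check returns an empty list for every state of its automaton.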

138

Page 139: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR variants

• LR(0): what we've seen so far
• SLR(1)
  – Removes infeasible reduce actions via FOLLOW set reasoning
• LR(1)
  – LR(0) with one lookahead token in items
• LALR(1)
  – LR(1) with merging of states with the same LR(0) component

139

Page 140: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

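LR(0) GOTO/ACTION tables

140

State | i  | +  | (  | )  | $  | E  | T  | action
q0    | q5 |    | q7 |    |    | q1 | q6 | shift
q1    |    | q3 |    |    | q2 |    |    | shift
q2    |    |    |    |    |    |    |    | Z → E$
q3    | q5 |    | q7 |    |    |    | q4 | shift
q4    |    |    |    |    |    |    |    | E → E+T
q5    |    |    |    |    |    |    |    | T → i
q6    |    |    |    |    |    |    |    | E → T
q7    | q5 |    | q7 |    |    | q8 | q6 | shift
q8    |    | q3 |    | q9 |    |    |    | shift
q9    |    |    |    |    |    |    |    | T → (E)

Columns i through T form the GOTO table; the last column is the ACTION table.

The ACTION table is determined only by the state and ignores the input.

The GOTO table is indexed by the state and a grammar symbol from the stack.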

Page 141: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

SLR parsing

• A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N
• A reduce item N → α• is applicable only when the lookahead is in FOLLOW(N)
  – If b is not in FOLLOW(N), we proved there is no derivation S ⇒* βNb.
  – Thus, it is safe to remove the reduce item from the conflicted state
• Differs from LR(0) only on the ACTION table
  – Now a row in the parsing table may contain both shift actions and reduce actions, and we need to consult the current token to decide which one to take
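The FOLLOW-based filtering can be sketched for the grammar extended with T → i [ E ]. This is an illustration under stated assumptions: the grammar has no ε-rules, so FIRST of a right-hand side is just FIRST of its first symbol, and first_sets, follow_sets and slr_actions are hypothetical names. FOLLOW is computed from the grammar as written, so the result may list more lookahead tokens than a hand-drawn table.

```python
GRAMMAR = [
    ("Z", ("E", "$")),
    ("E", ("T",)),
    ("E", ("E", "+", "T")),
    ("T", ("i",)),
    ("T", ("(", "E", ")")),
    ("T", ("i", "[", "E", "]")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

def follow_sets():
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for k, symbol in enumerate(rhs):
                if symbol not in NONTERMINALS:
                    continue
                if k + 1 < len(rhs):
                    nxt = rhs[k + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                      # symbol ends the rule
                    new = follow[lhs]
                if not new <= follow[symbol]:
                    follow[symbol] |= new
                    changed = True
    return follow

def slr_actions(item_set, follow):
    """Map each lookahead token to the action of one state."""
    actions = {}
    for lhs, rhs, dot in item_set:
        if dot == len(rhs):
            lookaheads, move = follow[lhs], ("reduce", lhs, rhs)
        elif rhs[dot] not in NONTERMINALS:
            lookaheads, move = {rhs[dot]}, ("shift",)
        else:
            continue
        for token in lookaheads:
            if actions.setdefault(token, move) != move:
                raise ValueError(f"SLR conflict on {token!r}")
    return actions

follow = follow_sets()
q5 = {("T", ("i",), 1), ("T", ("i", "[", "E", "]"), 1)}
print(slr_actions(q5, follow))
```

The state q5 that was a shift/reduce conflict under LR(0) becomes deterministic here, because [ is not in FOLLOW(T).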

141

Page 142: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

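SLR action table

142

SLR (uses 1 token of look-ahead; the columns are the lookahead token from the input):

State | i     | +       | (     | )       | [     | ] | $
0     | shift |         | shift |         |       |   |
1     |       | shift   |       |         |       |   | accept
2     |       |         |       |         |       |   |
3     | shift |         | shift |         |       |   |
4     |       | E → E+T |       | E → E+T |       |   | E → E+T
5     |       | T → i   |       | T → i   | shift |   | T → i
6     |       | E → T   |       | E → T   |       |   | E → T
7     | shift |         | shift |         |       |   |
8     |       | shift   |       | shift   |       |   |
9     |       | T → (E) |       | T → (E) |       |   | T → (E)

vs. LR(0) (no look-ahead):

state | action
q0    | shift
q1    | shift
q2    |
q3    | shift
q4    | E → E+T
q5    | T → i
q6    | E → T
q7    | shift
q8    | shift
q9    | T → (E)

Grammar: … as before …, with T → i and T → i [ E ]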

Page 143: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR(1) grammars

• In SLR: a reduce item N → α• is applicable only when the lookahead is in FOLLOW(N)
• But FOLLOW(N) merges lookahead for all alternatives for N
  – Insensitive to the context of a given production
• LR(1) keeps lookahead with each LR item
• Idea: a more refined notion of follows computed per item

143

Page 144: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR(1) items

• An LR(1) item is a pair
  – LR(0) item
  – Lookahead token
• Meaning
  – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token
• Example
  – The production L → id yields the following LR(1) items

144

LR(0) items:
[L → • id]
[L → id •]

LR(1) items:
[L → • id, *]   [L → • id, =]   [L → • id, id]   [L → • id, $]
[L → id •, *]   [L → id •, =]   [L → id •, id]   [L → id •, $]

Grammar:
(0) S' → S
(1) S → L = R
(2) S → R
(3) L → * R
(4) L → id
(5) R → L

Page 145: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR(1) items

• An LR(1) item is a pair
  – LR(0) item
  – Lookahead token
• Meaning
  – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token
• Example
  – The production L → id yields the following LR(1) items
• Reduce only if the expected lookahead matches the input
  – [L → id •, =] will be used only if the next input token is =
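As a small illustration of the pairing, the eight LR(1) items for L → id can be enumerated mechanically. The representation ((lhs, rhs, dot), lookahead) and the helper names below are assumptions made for this sketch.

```python
from itertools import product

def lr0_items(lhs, rhs):
    """All dot positions for one production."""
    return [(lhs, rhs, dot) for dot in range(len(rhs) + 1)]

def lr1_items(lhs, rhs, tokens):
    """Pair every LR(0) item with every candidate lookahead token."""
    return list(product(lr0_items(lhs, rhs), tokens))

def can_reduce(lr1_item, next_token):
    """A reduce item fires only if its lookahead matches the input."""
    (lhs, rhs, dot), lookahead = lr1_item
    return dot == len(rhs) and lookahead == next_token

items = lr1_items("L", ("id",), ["*", "=", "id", "$"])
print(len(items))                                        # 8
print(can_reduce((("L", ("id",), 1), "="), "="))         # True
print(can_reduce((("L", ("id",), 1), "="), "$"))         # False
```

In a real LR(1) construction only the lookaheads that can actually occur in a state are kept; enumerating all pairs here just mirrors the list on the slide.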

145

Page 146: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LALR(1)

• LR(1) tables have a huge number of entries
• Often don't need such refined observation (and cost)
• Idea: find states with the same LR(0) component and merge their lookahead components, as long as there are no conflicts
• LALR(1) is not as powerful as LR(1) in theory, but works quite well in practice
  – Merging cannot introduce new shift-reduce conflicts, only reduce-reduce conflicts, which are unlikely in practice
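The merging idea can be sketched by grouping LR(1) item sets on their LR(0) core. The representation (a state is a set of ((lhs, rhs, dot), lookahead) pairs) and the name merge_by_core are assumptions for illustration; the sketch does not rebuild transitions or check for conflicts.

```python
from collections import defaultdict

def merge_by_core(lr1_states):
    """Merge LR(1) item sets that share the same LR(0) core.

    Returns one dict per merged state, mapping each LR(0) item to the
    union of the lookaheads it had in the original states.
    """
    merged = defaultdict(lambda: defaultdict(set))
    for state in lr1_states:
        core = frozenset(item for item, _ in state)
        for item, lookahead in state:
            merged[core][item].add(lookahead)
    return [dict(items) for items in merged.values()]

# Three hypothetical LR(1) states; the first two share the core { L -> id• }.
state_a = {(("L", ("id",), 1), "=")}
state_b = {(("L", ("id",), 1), "$")}
state_c = {(("R", ("L",), 1), "$")}

lalr_states = merge_by_core([state_a, state_b, state_c])
print(len(lalr_states))    # 2
```

After merging, the reduce by L → id is allowed on both = and $, which is where a new reduce-reduce conflict could appear if another reduce item in the same state shared one of those lookaheads.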

146

Page 147: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Summary

147

Page 148: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

LR is More Powerful than LL

• Any LL(k) language is also in LR(k), i.e., LL(k) ⊂ LR(k).
  – LR is more popular in automatic tools
• But less intuitive
• Also, the lookahead is counted differently in the two cases
  – In an LL(k) derivation the algorithm sees the left-hand side of the rule + k input tokens and then must select the derivation rule
  – In LR(k), the algorithm "sees" all of the right-hand side of the derivation rule + k input tokens and then reduces
    • LR(0) sees the entire right-hand side, but no input token

148

Page 149: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Using tools to parse + create AST

terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV;
terminal LPAREN, RPAREN;
terminal UMINUS;
nonterminal Integer expr;
precedence left PLUS, MINUS;
precedence left DIV, MULT;
precedence left UMINUS;
%%
expr ::= expr:e1 PLUS expr:e2
         {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
       | expr:e1 MINUS expr:e2
         {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
       | expr:e1 MULT expr:e2
         {: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
       | expr:e1 DIV expr:e2
         {: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
       | MINUS expr:e1 %prec UMINUS
         {: RESULT = new Integer(0 - e1.intValue()); :}
       | LPAREN expr:e1 RPAREN
         {: RESULT = e1; :}
       | NUMBER:n
         {: RESULT = n; :}
       ;

149

Page 150: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Grammar Hierarchy

150

[Figure: diagram of nested grammar classes, labelled Non-ambiguous CFG, LR(1), LALR(1), SLR(1), LL(1), LR(0)]

Page 151: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Earley Parsing

151

Jay Earley, PhD

Page 152: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Earley Parsing

• Invented by Jay Earley [PhD, 1968]
• Handles arbitrary context-free grammars
  – Can handle ambiguous grammars
• Complexity O(N^3), where N = |input|
• Uses dynamic programming
  – Compactly encodes ambiguity

152

Page 153: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Dynamic programming

• Break a problem P into subproblems P1 … Pk
  – Solve P by combining solutions for P1 … Pk
  – Memoize (store) solutions to subproblems instead of re-computation
• Bellman-Ford shortest path algorithm
  – Sol(x, y, i) = minimum of
    • Sol(x, y, i-1)
    • Sol(t, y, i-1) + weight(x, t) for edges (x, t)
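The recurrence above can be sketched directly, reading Sol(x, y, i) as the length of the shortest path from x to y that uses at most i edges. The function name, the dictionary-based graph encoding and the sample graph are assumptions for illustration; negative cycles are assumed absent.

```python
import math

def shortest_paths_to(target, nodes, edges):
    """Return the shortest distance from every node to `target`.

    `edges` maps a pair (x, t) to weight(x, t).
    """
    # Sol(x, target, 0): only the target reaches itself with zero edges.
    sol = {x: (0 if x == target else math.inf) for x in nodes}
    for _ in range(len(nodes) - 1):
        previous = dict(sol)              # memoized Sol(., target, i-1)
        for (x, t), weight in edges.items():
            # sol[x] starts as Sol(x, target, i-1), then takes the minimum
            # with Sol(t, target, i-1) + weight(x, t) for each edge (x, t).
            sol[x] = min(sol[x], previous[t] + weight)
    return sol

graph = {("a", "b"): 1, ("b", "c"): 2, ("a", "c"): 5}
print(shortest_paths_to("c", ["a", "b", "c"], graph))
```

Each round reuses the stored results of the previous round instead of re-exploring paths, which is the same memoization idea the Earley sets rely on.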

153

Page 154: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Earley Parsing

• Dynamic programming implementation of a recursive descent parser
  – S[N+1]: sequence of sets of "Earley states"
    • N = |INPUT|
    • An Earley state (item) s is a sentential form + aux info
  – S[i]: all parse trees that can be produced (by an RDP) after reading the first i tokens
    • S[i+1] is built using S[0] … S[i]

154

Page 155: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Earley Parsing

• Parse arbitrary grammars in O(|input|^3)
  – O(|input|^2) for unambiguous grammars
  – Linear for most LR(k) languages
• Dynamic programming implementation of a recursive descent parser
  – S[N+1]: sequence of sets of "Earley states"
    • N = |INPUT|
    • An Earley state s is a sentential form + aux info
  – S[i]: all parse trees that can be produced (by an RDP) after reading the first i tokens
    • S[i+1] is built using S[0] … S[i]

155

Page 156: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Earley States

• s = <constituent, back>
  – constituent (dotted rule) for A → αβ
      A → •αβ    predicted constituents
      A → α•β    in-progress constituents
      A → αβ•    completed constituents
  – back: previous Earley state in the derivation

156

Page 158: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Earley Parser

Input = x[1…N]
S[0] = {<E' → •E, 0>}; S[1] = … = S[N] = {}
for i = 0 ... N do
  until S[i] does not change do
    foreach s ∈ S[i]
      if s = <A → …•a…, b> and a = x[i+1] then    // scan
        S[i+1] = S[i+1] ∪ {<A → …a•…, b>}
      if s = <A → …•X…, b> and X → α then         // predict
        S[i] = S[i] ∪ {<X → •α, i>}
      if s = <A → …•, b> and <X → …•A…, k> ∈ S[b] then    // complete
        S[i] = S[i] ∪ {<X → …A•…, k>}
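The pseudocode above can be turned into a small recognizer. This is a sketch under stated assumptions: the grammar is E' → E, E → E + E | n with no ε-rules, each set is a list used as a work queue, a state is a tuple (lhs, rhs, dot, back), and the names earley and accepts are illustrative.

```python
GRAMMAR = {"E'": [("E",)], "E": [("E", "+", "E"), ("n",)]}

def earley(tokens, start="E'"):
    """Build the Earley sets S[0..N] for a token list."""
    n = len(tokens)
    sets = [[] for _ in range(n + 1)]

    def add(i, state):
        if state not in sets[i]:
            sets[i].append(state)

    for rhs in GRAMMAR[start]:
        add(0, (start, rhs, 0, 0))
    for i in range(n + 1):
        # The list grows while we iterate, so new states are processed too.
        for lhs, rhs, dot, back in sets[i]:
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in GRAMMAR:                        # predict
                    for alternative in GRAMMAR[symbol]:
                        add(i, (symbol, alternative, 0, i))
                elif i < n and tokens[i] == symbol:          # scan
                    add(i + 1, (lhs, rhs, dot + 1, back))
            else:                                            # complete
                for lhs2, rhs2, dot2, back2 in sets[back]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add(i, (lhs2, rhs2, dot2 + 1, back2))
    return sets

def accepts(tokens, start="E'"):
    """True if a completed start rule spanning the whole input exists."""
    final = earley(tokens, start)[-1]
    return any(lhs == start and dot == len(rhs) and back == 0
               for lhs, rhs, dot, back in final)

print(accepts(["n", "+", "n"]))   # True
print(accepts(["n", "+"]))        # False
```

Processing each list in order replaces the "until S[i] does not change" loop; as the excerpt that follows discusses, that shortcut is only safe here because the grammar has no ε-rules.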

158

Page 159: 0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of parsers • Parsers for arbitrary grammars – Earley’smethod, CYK method – Usually,

Example


PRACTICAL EARLEY PARSING 621

S0
  S′ → •E , 0
  E → •E + E , 0
  E → •n , 0

n

S1
  E → n• , 0
  S′ → E• , 0
  E → E • +E , 0

+

S2
  E → E + •E , 0
  E → •E + E , 2
  E → •n , 2

n

S3
  E → n• , 2
  E → E + E• , 0
  E → E • +E , 2
  S′ → E• , 0

FIGURE 1. Earley sets for the grammar E → E + E | n and the input n + n. Items in bold are ones which correspond to the input’s derivation.

Earley recommended using lookahead for the COMPLETER step [2]; it was later shown that a better approach was to use lookahead for the PREDICTOR step [8]; later it was shown that prediction lookahead was of questionable value in an Earley parser which uses finite automata [9] as ours does.

In terms of implementation, the Earley sets are built in increasing order as the input is read. Also, each set is typically represented as a list of items, as suggested by Earley [1, 2]. This list representation of a set is particularly convenient, because the list of items acts as a ‘work queue’ when building the set: items are examined in order, applying SCANNER, PREDICTOR and COMPLETER as necessary; items added to the set are appended onto the end of the list.

3. THE PROBLEM OF ϵ

At any given point i in the parse, we have two partially-constructed sets. SCANNER may add items to S_{i+1} and S_i may have items added to it by PREDICTOR and COMPLETER. It is this latter possibility, adding items to S_i while representing sets as lists, which causes grief with ϵ-rules.

When COMPLETER processes an item [A → •, j] which corresponds to the ϵ-rule A → ϵ, it must look through S_j for items with the dot before an A. Unfortunately, for ϵ-rule items, j is always equal to i; COMPLETER is thus looking through the partially-constructed set S_i.³ Since implementations process items in S_i in order, if an item [B → … • A …, k] is added to S_i after COMPLETER has processed [A → •, j], COMPLETER will never add [B → … A • …, k] to S_i. In turn, items resulting directly and indirectly from [B → … A • …, k] will be omitted too. This effectively prunes potential derivation paths, which can cause correct input to be rejected. Figure 2 gives an example of this happening.

³ j = i for ϵ-rule items because they can only be added to an Earley set by PREDICTOR, which always bestows added items with the parent pointer i.

Grammar:
  S′ → S
  S → AAAA
  A → a
  A → E
  E → ϵ

S0
  S′ → •S , 0
  S → •AAAA , 0
  A → •a , 0
  A → •E , 0
  E → • , 0
  A → E• , 0
  S → A • AAA , 0

a

S1
  A → a• , 0
  S → A • AAA , 0
  S → AA • AA , 0
  A → •a , 1
  A → •E , 1
  E → • , 1
  A → E• , 1
  S → AAA • A , 0

FIGURE 2. An unadulterated Earley parser, representing sets using lists, rejects the valid input a. Missing items in S0 sound the death knell for this parse.

Two methods of handling this problem have been proposed. Grune and Jacobs aptly summarize one approach:

‘The easiest way to handle this mare’s nest is to stay calm and keep running the Predictor and Completer in turn until neither has anything more to add.’ [10, p. 159]

Aho and Ullman [11] specify this method in their presentation of Earley parsing and it is used by ACCENT [12], a compiler–compiler which generates Earley parsers.

The other approach was suggested by Earley [1, 2]. He proposed having COMPLETER note that the dot needed to be moved over A, then looking for this whenever future items were added to S_i. For efficiency’s sake, the collection of non-terminals to watch for should be stored in a data structure which allows fast access. We used this method initially for the Earley parser in the SPARK toolkit [13].

In our opinion, neither approach is very satisfactory. Repeatedly processing S_i, or parts thereof, involves a lot of activity for little gain; Earley’s solution requires an extra, dynamically-updated data structure and the unnatural mating of COMPLETER with the addition of items. Ideally, we want a solution which retains the elegance of Earley’s algorithm, only processes items in S_i once and has no run-time overhead from updating a data structure.

4. AN ‘IDEAL’ SOLUTION

Our solution involves a simple modification to PREDICTOR, based on the idea of nullability. A non-terminal A is said to be nullable if A ⇒* ϵ; terminal symbols, of course, can never be nullable. The nullability of non-terminals in a grammar may be easily precomputed using well-known techniques [14, 15]. Using this notion, our PREDICTOR can be stated as follows (our modification is in bold):

If [A → … • B …, j] is in S_i, add [B → •α, i] to S_i for all rules B → α. If B is nullable, also add [A → … B • …, j] to S_i.
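
The effect of the modified PREDICTOR can be tried out on the grammar of Figure 2. The sketch below is an illustration, not the paper's implementation: the function names, the dict-of-tuples grammar encoding, and the `fix` switch are assumptions. Each set is a list used as a work queue, every item is processed exactly once, and nullability is precomputed by a simple fixed-point iteration (one of several standard ways to do it).

```python
def nullable_set(grammar):
    """Non-terminals A with A ⇒* ϵ, found by iterating to a fixed point."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for lhs, alts in grammar.items():
            if lhs not in nullable and any(
                    all(sym in nullable for sym in alt) for alt in alts):
                nullable.add(lhs)
                changed = True
    return nullable


def recognize(grammar, start, tokens, fix=True):
    """List-based Earley recognizer; fix=True enables the nullable PREDICTOR."""
    nullable = nullable_set(grammar) if fix else set()
    n = len(tokens)
    S = [[] for _ in range(n + 1)]            # each set is a list / work queue

    def add(k, item):
        if item not in S[k]:
            S[k].append(item)                 # new items go to the end

    add(0, ("S'", (start,), 0, 0))
    for i in range(n + 1):
        pos = 0
        while pos < len(S[i]):                # every item is examined once
            lhs, rhs, dot, back = S[i][pos]
            pos += 1
            if dot == len(rhs):               # COMPLETER
                for l2, r2, d2, b2 in list(S[back]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, b2))
            elif rhs[dot] in grammar:         # PREDICTOR
                for alt in grammar[rhs[dot]]:
                    add(i, (rhs[dot], alt, 0, i))
                if rhs[dot] in nullable:      # the modification
                    add(i, (lhs, rhs, dot + 1, back))
            elif i < n and tokens[i] == rhs[dot]:   # SCANNER
                add(i + 1, (lhs, rhs, dot + 1, back))
    return ("S'", (start,), 1, 0) in S[n]


# Grammar of Figure 2: S → AAAA, A → a | E, E → ϵ
g = {"S": [("A", "A", "A", "A")], "A": [("a",), ("E",)], "E": [()]}
print(recognize(g, "S", ["a"], fix=False))    # unmodified, single pass
print(recognize(g, "S", ["a"], fix=True))     # with the nullable PREDICTOR
```

With `fix=False` the single-pass list version misses the same items as in Figure 2 and rejects the input a; with `fix=True` the dot is moved over nullable non-terminals at prediction time, so nothing depends on when COMPLETER happened to run.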

THE COMPUTER JOURNAL, Vol. 45, No. 6, 2002


Earley Parsing

Jay Earley, PhD
