0368-3133 Lecture 4 - TAUmaon/teaching/2016-2017/compilation/compilatio… · Broad kinds of...
Compilation 0368-3133
Lecture 4: Syntax Analysis: Parsing
Noam Rinetzky
The Real Anatomy of a Compiler
[Figure: compiler pipeline. Source text (txt) → Process text input → characters → Lexical Analysis → tokens → Syntax Analysis → AST → Sem. Analysis → Annotated AST → Intermediate code generation → IR → Intermediate code optimization → IR → Code generation → Symbolic Instructions → Target code optimization → SI → Machine code generation → MI → Write executable output → Executable code (exe)]
Lexical Analysis
Syntax Analysis
Broad kinds of parsers
• Parsers for arbitrary grammars
  – Earley's method, CYK method
  – Usually not used in practice (though this might change)
• Top-down parsers
  – Construct the parse tree in a top-down manner
  – Find the leftmost derivation
• Bottom-up parsers
  – Construct the parse tree in a bottom-up manner
  – Find the rightmost derivation in reverse order
CFG terminology
Symbols:
  Terminals (tokens): ; := ( ) id num print
  Non-terminals: S E L
Start non-terminal: S
  Convention: the non-terminal appearing in the first derivation rule
Grammar productions (rules): N → μ
S → S ; S
S → id := E
E → id
E → num
E → E + E
CFG terminology
• Derivation – a sequence of replacements of non-terminals using the derivation rules
• Language – the set of strings of terminals derivable from the start symbol
• Sentential form – the result of a partial derivation
  – May contain non-terminals
Derivations
• Show that a sentence ω is in the language of a grammar G
  – Start with the start symbol
  – Repeatedly replace one of the non-terminals by a right-hand side of a production
  – Stop when the sentence contains only terminals
• Given a sentence αNβ and a rule N → µ: αNβ => αµβ
• ω is in L(G) if S =>* ω
Predictive parsing
• Recursive descent
• LL(k) grammars
Predictive parsing
• Given a grammar G and a word w, attempt to derive w using G
• Idea
  – Apply a production to the leftmost nonterminal
  – Pick the production rule based on the next input token
• General grammar
  – More than one option for choosing the next production based on a token
• Restricted grammars (LL)
  – Know exactly which single rule to apply
  – May require some lookahead to decide
Boolean expressions example
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
Input: not ( not true or false )
E => not E => not ( E OP E ) => not ( not E OP E ) => not ( not LIT OP E ) => not ( not true OP E ) => not ( not true or E ) => not ( not true or LIT ) => not ( not true or false )
The production to apply is known from the next token.
[Figure: the parse tree built by this derivation: E → not E, E → ( E OP E ), with leaves not, true, or, false]
Implementation via recursion
E → LIT | ( E OP E ) | not E
LIT → true | false
OP → and | or | xor
void E() {
  if (current == TRUE || current == FALSE) {   // E → LIT
    LIT();
  } else if (current == LPAREN) {              // E → ( E OP E )
    match(LPAREN); E(); OP(); E(); match(RPAREN);
  } else if (current == NOT) {                 // E → not E
    match(NOT); E();
  } else {
    error();
  }
}
void LIT() {
  if (current == TRUE) match(TRUE);
  else if (current == FALSE) match(FALSE);
  else error();
}
void OP() {
  if (current == AND) match(AND);
  else if (current == OR) match(OR);
  else if (current == XOR) match(XOR);
  else error();
}
FIRST sets
• FIRST(X) = { t | X =>* t β } ∪ { ℇ | X =>* ℇ }
  – FIRST(X) = all terminals that can appear first in some derivation for X
    • plus ℇ if ℇ can be derived from X
• Example:
  – FIRST(LIT) = { true, false }
  – FIRST( ( E OP E ) ) = { '(' }
  – FIRST( not E ) = { not }
FIRST(α) can be defined for any sequence of symbols
Computing FIRST sets
• FIRST(t) = { t }   // "t" a terminal
• ℇ ∈ FIRST(X) if
  – X → ℇ, or
  – X → A1 .. Ak and ℇ ∈ FIRST(Ai) for all i = 1…k
• FIRST(α) ⊆ FIRST(X) if
  – X → A1 .. Ak α and ℇ ∈ FIRST(Ai) for all i = 1…k
FIRST(X) is computed for non-terminals
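The rules above are typically applied over and over until no FIRST set changes. Below is a minimal C sketch of that fixed-point computation, assuming the grammar of the FOLLOW-set example (E → T X, X → + E | ℇ, T → ( E ) | int Y, Y → * T | ℇ). The bit-set encoding (an extra EPS bit stands for ℇ), the symbol numbering and the names prods, first and compute_first are illustrative choices, not part of the lecture.

```c
#include <assert.h>
#include <stdbool.h>

/* Terminals of the example grammar; EPS is an extra bit standing for ℇ. */
enum { PLUS, STAR, LPAR, RPAR, INT, EPS };
/* Non-terminals are numbered from NT upward so both kinds fit in one int. */
enum { NT = 16, E = NT, T, X, Y, NT_END };
#define NUM_NT (NT_END - NT)
#define BIT(t) (1u << (t))

/* A production: left-hand side and a right-hand side terminated by -1. */
typedef struct { int lhs; int rhs[4]; } Prod;

static const Prod prods[] = {
    { E, { T, X, -1 } },           /* E → T X    */
    { X, { PLUS, E, -1 } },        /* X → + E    */
    { X, { -1 } },                 /* X → ℇ      */
    { T, { LPAR, E, RPAR, -1 } },  /* T → ( E )  */
    { T, { INT, Y, -1 } },         /* T → int Y  */
    { Y, { STAR, T, -1 } },        /* Y → * T    */
    { Y, { -1 } },                 /* Y → ℇ      */
};
#define NUM_PRODS ((int)(sizeof prods / sizeof prods[0]))

unsigned first[NUM_NT];            /* FIRST(N) as a bit set, indexed by N - NT */

/* FIRST(t) = {t} for a terminal; the current approximation for a non-terminal. */
static unsigned first_sym(int s) {
    return s >= NT ? first[s - NT] : BIT(s);
}

/* Apply the rules to every production until no set grows (a fixed point). */
void compute_first(void) {
    for (int i = 0; i < NUM_NT; i++) first[i] = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < NUM_PRODS; p++) {
            const int *rhs = prods[p].rhs;
            unsigned add = 0;
            int k = 0;
            /* X → A1..Ak α: the next symbol contributes while A1..Ak are all nullable */
            for (; rhs[k] != -1; k++) {
                unsigned f = first_sym(rhs[k]);
                add |= f & ~BIT(EPS);
                if (!(f & BIT(EPS))) break;
            }
            if (rhs[k] == -1) add |= BIT(EPS);   /* whole RHS nullable (or X → ℇ) */
            unsigned *dst = &first[prods[p].lhs - NT];
            if ((*dst | add) != *dst) { *dst |= add; changed = true; }
        }
    }
}
```

After compute_first(), the sets should be FIRST(E) = FIRST(T) = { (, int }, FIRST(X) = { +, ℇ } and FIRST(Y) = { *, ℇ }.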
FOLLOW sets
• FOLLOW(X) = { t | S =>* α X t β }
  – t is a terminal or $
FOLLOW sets: Constraints
• $ ∈ FOLLOW(S)
• FIRST(β) – {ℇ} ⊆ FOLLOW(X)
  – For each A → α X β
• FOLLOW(A) ⊆ FOLLOW(X)
  – For each A → α X β and ℇ ∈ FIRST(β)
Example: FOLLOW sets
• E → T X          X → + E | ℇ
• T → ( E ) | int Y          Y → * T | ℇ
Non-term.   FOLLOW
E           ), $
T           +, ), $
X           $, )
Y           +, ), $
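The FOLLOW constraints can be solved the same way, by iterating until nothing changes. This C sketch uses the example grammar and, as a simplifying assumption, hard-codes the FIRST sets of the non-terminals instead of recomputing them. The encoding and the names first_seq and compute_follow are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

enum { PLUS, STAR, LPAR, RPAR, INT, END, EPS };   /* END is the $ marker, EPS stands for ℇ */
enum { NT = 16, E = NT, T, X, Y, NT_END };
#define NUM_NT (NT_END - NT)
#define BIT(t) (1u << (t))

typedef struct { int lhs; int rhs[4]; } Prod;     /* RHS terminated by -1 */
static const Prod prods[] = {
    { E, { T, X, -1 } },           /* E → T X    */
    { X, { PLUS, E, -1 } },        /* X → + E    */
    { X, { -1 } },                 /* X → ℇ      */
    { T, { LPAR, E, RPAR, -1 } },  /* T → ( E )  */
    { T, { INT, Y, -1 } },         /* T → int Y  */
    { Y, { STAR, T, -1 } },        /* Y → * T    */
    { Y, { -1 } },                 /* Y → ℇ      */
};
#define NUM_PRODS ((int)(sizeof prods / sizeof prods[0]))

/* FIRST sets of the non-terminals, assumed already computed. */
static const unsigned first_nt[NUM_NT] = {
    [E - NT] = BIT(LPAR) | BIT(INT),
    [T - NT] = BIT(LPAR) | BIT(INT),
    [X - NT] = BIT(PLUS) | BIT(EPS),
    [Y - NT] = BIT(STAR) | BIT(EPS),
};

unsigned follow[NUM_NT];

/* FIRST of a symbol sequence ending in -1; contains EPS iff the sequence is nullable. */
static unsigned first_seq(const int *seq) {
    unsigned acc = 0;
    for (; *seq != -1; seq++) {
        unsigned f = *seq >= NT ? first_nt[*seq - NT] : BIT(*seq);
        acc |= f & ~BIT(EPS);
        if (!(f & BIT(EPS))) return acc;
    }
    return acc | BIT(EPS);
}

void compute_follow(int start) {
    for (int i = 0; i < NUM_NT; i++) follow[i] = 0;
    follow[start - NT] = BIT(END);                 /* $ ∈ FOLLOW(S) */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int p = 0; p < NUM_PRODS; p++) {
            const int *rhs = prods[p].rhs;
            for (int k = 0; rhs[k] != -1; k++) {  /* each occurrence A → α X β */
                if (rhs[k] < NT) continue;         /* only non-terminals get FOLLOW sets */
                unsigned beta = first_seq(&rhs[k + 1]);
                unsigned add = beta & ~BIT(EPS);   /* FIRST(β) – {ℇ} ⊆ FOLLOW(X) */
                if (beta & BIT(EPS))               /* β nullable: FOLLOW(A) ⊆ FOLLOW(X) */
                    add |= follow[prods[p].lhs - NT];
                unsigned *dst = &follow[rhs[k] - NT];
                if ((*dst | add) != *dst) { *dst |= add; changed = true; }
            }
        }
    }
}
```

Calling compute_follow(E) should reproduce the table above: FOLLOW(E) = FOLLOW(X) = { ), $ } and FOLLOW(T) = FOLLOW(Y) = { +, ), $ }.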
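Prediction Table
• For each production A → α:
  – T[A,t] = α if t ∈ FIRST(α)
  – T[A,t] = α if ℇ ∈ FIRST(α) and t ∈ FOLLOW(A)
    • t can also be $
• If T is not well defined (some entry T[A,t] receives two different productions), then the grammar is not LL(1)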
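Here is a sketch of filling T[A,t] with the two rules above, again for the FOLLOW example grammar. It assumes FIRST(α) of every right-hand side and FOLLOW of every non-terminal have already been computed; they are copied in by hand from the previous slides. The function build_table is a hypothetical name. It counts cells that would receive a second production, so a non-zero result means the grammar is not LL(1).

```c
#include <assert.h>

/* Table columns: the terminals and $ (END). EPS is only a marker bit for ℇ. */
enum { PLUS, STAR, LPAR, RPAR, INT, END, NUM_T, EPS = NUM_T };
enum { E, T, X, Y, NUM_NT };
#define BIT(t) (1u << (t))

/* Each production: its left-hand side and FIRST(α) of its right-hand side. */
typedef struct { int lhs; unsigned first_rhs; } Prod;
static const Prod prods[] = {
    { E, BIT(LPAR) | BIT(INT) },   /* 0: E → T X    */
    { X, BIT(PLUS) },              /* 1: X → + E    */
    { X, BIT(EPS) },               /* 2: X → ℇ      */
    { T, BIT(LPAR) },              /* 3: T → ( E )  */
    { T, BIT(INT) },               /* 4: T → int Y  */
    { Y, BIT(STAR) },              /* 5: Y → * T    */
    { Y, BIT(EPS) },               /* 6: Y → ℇ      */
};
#define NUM_PRODS ((int)(sizeof prods / sizeof prods[0]))

/* FOLLOW sets from the example table. */
static const unsigned follow[NUM_NT] = {
    [E] = BIT(RPAR) | BIT(END),
    [T] = BIT(PLUS) | BIT(RPAR) | BIT(END),
    [X] = BIT(RPAR) | BIT(END),
    [Y] = BIT(PLUS) | BIT(RPAR) | BIT(END),
};

int table[NUM_NT][NUM_T];          /* production number, or -1 for an error entry */

/* Fills T[A,t]; returns how many cells received a second production. */
int build_table(void) {
    int conflicts = 0;
    for (int a = 0; a < NUM_NT; a++)
        for (int t = 0; t < NUM_T; t++) table[a][t] = -1;
    for (int p = 0; p < NUM_PRODS; p++) {
        unsigned cols = prods[p].first_rhs & ~BIT(EPS);   /* t ∈ FIRST(α) */
        if (prods[p].first_rhs & BIT(EPS))
            cols |= follow[prods[p].lhs];                 /* ℇ ∈ FIRST(α) and t ∈ FOLLOW(A) */
        for (int t = 0; t < NUM_T; t++) {
            if (!(cols & BIT(t))) continue;
            if (table[prods[p].lhs][t] != -1) conflicts++;
            table[prods[p].lhs][t] = p;
        }
    }
    return conflicts;
}
```

For this grammar, build_table() should report no conflicts. For example, T[X, )] and T[X, $] both select X → ℇ, because ) and $ are in FOLLOW(X).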
LL(k) grammars
• A grammar is in the class LL(k) when it can be derived via:
  – Top-down derivation
  – Scanning the input from left to right (L)
  – Producing the leftmost derivation (L)
  – With lookahead of k tokens (k)
• A language is said to be LL(k) when it has an LL(k) grammar
LL(1) grammars
• A grammar is in the class LL(1) iff
  – For every two productions A → α and A → β we have
    • FIRST(α) ∩ FIRST(β) = {}   // including ℇ
    • If ℇ ∈ FIRST(α) then FIRST(β) ∩ FOLLOW(A) = {}
    • If ℇ ∈ FIRST(β) then FIRST(α) ∩ FOLLOW(A) = {}
Problem: Non-LL Grammars
Back to problem 1: common prefix
term → ID | indexed_elem
indexed_elem → ID [ expr ]
• FIRST(term) = { ID }
• FIRST(indexed_elem) = { ID }
• FIRST/FIRST conflict
Solution: left factoring
• Rewrite the grammar to be in LL(1)
Intuition: just like factoring x*y + x*z into x*(y+z)
term → ID | indexed_elem
indexed_elem → ID [ expr ]
becomes
term → ID after_ID
after_ID → [ expr ] | ℇ
Left factoring – another example
S → if E then S else S | if E then S | T
becomes
S → if E then S S'
S' → else S | ℇ
Problem: null production
S → A a b
A → a | ℇ
bool S() { return A() && match(token('a')) && match(token('b')); }
bool A() { return match(token('a')) || true; }
• What happens for input "ab"?
• What happens if you flip the order of alternatives and try "aab"?
Back to problem 2: null production
S → A a b
A → a | ℇ
• FIRST(S) = { a }      FOLLOW(S) = { $ }
• FIRST(A) = { a, ℇ }   FOLLOW(A) = { a }
• FIRST/FOLLOW conflict
Solution: substitution
S → A a b
A → a | ℇ
Substitute A in S:
S → a a b | a b
Left factoring:
S → a after_A
after_A → a b | b
Back to problem 3: Left recursion
E → E - term | term
• Left recursion cannot be handled with a bounded lookahead
• What can we do?
Left recursion removal
G1: N → N α | β
G2: N → β N'
    N' → α N' | ℇ
• L(G1) = β, βα, βαα, βααα, …
• L(G2) = same
• For our 3rd example:
E → E - term | term
becomes
E → term TE
TE → - term TE | ℇ
Can be done algorithmically. Problem: the grammar becomes mangled beyond recognition (p. 130)
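A recursive-descent evaluator for E → term TE, TE → - term TE | ℇ shows that the rewritten grammar can still compute subtraction left-associatively. This C sketch assumes, for illustration only, that a term is a single digit and the input contains no spaces. The value parsed so far is passed down into TE, so 7-2-1 is computed as (7-2)-1.

```c
#include <assert.h>
#include <ctype.h>

static const char *cur;            /* next unread character of the input */

/* term → digit (a simplifying assumption for this sketch) */
static int term(void) {
    assert(isdigit((unsigned char)*cur));
    return *cur++ - '0';
}

/* TE → - term TE | ℇ
   acc holds the value of everything to the left, which keeps subtraction
   left-associative even though the rewritten grammar is right-recursive. */
static int TE(int acc) {
    if (*cur == '-') {
        cur++;
        return TE(acc - term());
    }
    return acc;                    /* ℇ: the next character is not '-' */
}

/* E → term TE */
int E(const char *input) {
    cur = input;
    return TE(term());
}
```

E("7-2-1") should return 4, not the right-associative 7-(2-1) = 6.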
LL(k) Parsers
• Recursive Descent
  – Manual construction
  – Uses recursion
• Wanted
  – A parser that can be generated automatically
  – Does not use recursion
LL(k) parsing via pushdown automata
• Pushdown automaton uses
  – Prediction stack
  – Input stream
  – Transition table
    • nonterminals × tokens → production alternative
    • Entry indexed by nonterminal N and token t contains the alternative of N that must be predicted when the current input starts with t
LL(k) parsing via pushdown automata
• Two possible moves
  – Prediction
    • When the top of stack is a nonterminal N, pop N and look up table[N,t]. If table[N,t] is not empty, push table[N,t] on the prediction stack (rightmost symbol first, so its leftmost symbol ends up on top); otherwise – syntax error
  – Match
    • When the top of the prediction stack is a terminal T, it must be equal to the next input token t. If (t == T), pop T and consume t. If (t ≠ T), syntax error
• Parsing terminates when the prediction stack is empty
  – If the input is empty at that point, success. Otherwise, syntax error
Example transition table
Rows: non-terminals; columns: input tokens; entries: which rule should be used
        (    )    not   true   false   and   or   xor   $
E       2         3     1      1
LIT                     4      5
OP                                     6     7    8
(1) E → LIT   (2) E → ( E OP E )   (3) E → not E   (4) LIT → true   (5) LIT → false   (6) OP → and   (7) OP → or   (8) OP → xor
Model of non-recursive predictive parser
[Figure: the predictive parsing program reads the input a + b $, keeps a stack X Y Z $, consults the parsing table, and produces output]
Running parser example
A → aAb | c        Input: aacbb$
     a          b    c
A    A → aAb         A → c
Input suffix   Stack content   Move
aacbb$         A$              predict(A,a) = A → aAb
aacbb$         aAb$            match(a,a)
acbb$          Ab$             predict(A,a) = A → aAb
acbb$          aAbb$           match(a,a)
cbb$           Abb$            predict(A,c) = A → c
cbb$           cbb$            match(c,c)
bb$            bb$             match(b,b)
b$             b$              match(b,b)
$              $               match($,$) – success
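The prediction and match moves can be sketched in C for this grammar. The hard-coded predict function plays the role of the transition table. Encoding tokens and stack symbols as single characters is an assumption made for brevity. The right-hand side is pushed in reverse so that its leftmost symbol ends up on top.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Grammar A → aAb | c. Stack symbols: 'A' (non-terminal), 'a', 'b', 'c' (terminals)
   and '$' (bottom marker). The input string is expected to end with '$'. */
static const char *predict(char nonterminal, char lookahead) {
    if (nonterminal == 'A' && lookahead == 'a') return "aAb";   /* A → aAb */
    if (nonterminal == 'A' && lookahead == 'c') return "c";     /* A → c   */
    return NULL;                                                /* empty entry: error */
}

bool ll1_parse(const char *input) {
    char stack[64];
    int top = 0;
    stack[top++] = '$';
    stack[top++] = 'A';                       /* start symbol above $ */
    while (top > 0) {
        char X = stack[--top];                /* pop the top of the prediction stack */
        char t = *input;
        if (X == 'A') {                       /* prediction move */
            const char *rhs = predict(X, t);
            if (rhs == NULL) return false;    /* syntax error */
            for (int i = (int)strlen(rhs) - 1; i >= 0; i--) {
                assert(top < (int)sizeof stack);
                stack[top++] = rhs[i];        /* reversed: leftmost symbol on top */
            }
        } else {                              /* match move */
            if (X != t) return false;         /* terminal must equal the next token */
            input++;
        }
    }
    return *input == '\0';                    /* stack empty: success only if input consumed */
}
```

ll1_parse("aacbb$") should follow the trace above and succeed. ll1_parse("abcbb$") should fail at predict(A,b), as in the illegal input example that follows.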
Errors
Handling Syntax Errors
• Report and locate the error
• Diagnose the error
• Correct the error
• Recover from the error in order to discover more errors
  – without reporting too many "strange" errors
Error Diagnosis
• Line number
  – may be far from the actual error
• The current token
• The expected tokens
• Parser configuration
Error Recovery
• Becomes less important in interactive environments
• Example heuristics:
  – Search for a semicolon and ignore the statement
  – Try to "replace" tokens for common errors
  – Refrain from reporting 3 subsequent errors
• Globally optimal solutions
  – For every input w, find a valid program w' with a "minimal distance" from w
Illegal input example
A → aAb | c        Input: abcbb$
     a          b    c
A    A → aAb         A → c
Input suffix   Stack content   Move
abcbb$         A$              predict(A,a) = A → aAb
abcbb$         aAb$            match(a,a)
bcbb$          Ab$             predict(A,b) = ERROR
Error handling in LL parsers
S → a c | b S        Input: c$
     a          b          c
S    S → a c    S → b S
Input suffix   Stack content   Move
c$             S$              predict(S,c) = ERROR
• Now what?
  – Predict b S anyway: "missing token b inserted in line XXX"
Error handling in LL parsers
S → a c | b S        Input: c$ (with the missing b inserted)
     a          b          c
S    S → a c    S → b S
Input suffix   Stack content   Move
bc$            S$              predict(S,b) = S → b S
bc$            bS$             match(b,b)
c$             S$              Looks familiar?
• Result: infinite loop
Error handling and recovery
• x = a * (p+q * ( -b * (r-s);
• Where should we report the error?
• The valid prefix property
The Valid Prefix Property
• For every prefix of tokens
  – t1, t2, …, ti that the parser identifies as legal:
    • there exist tokens ti+1, ti+2, …, tn such that t1, t2, …, tn is a syntactically valid program
• If every token is considered as a single character:
  – For every prefix word u that the parser identifies as legal, there exists w such that u.w is a valid program
Recovery is tricky
• Heuristics for dropping tokens, skipping to semicolon, etc.
Building the Parse Tree
Adding semantic actions
• Can add an action to perform on each production rule
• Can build the parse tree
  – Every function returns an object of type Node
  – Every Node maintains a list of children
  – Function calls can add new children
Building the parse tree
Node E() {
  Node result = new Node(); result.name = "E";
  if (current == TRUE || current == FALSE) {   // E → LIT
    result.addChild(LIT());
  } else if (current == LPAREN) {              // E → ( E OP E )
    result.addChild(match(LPAREN));
    result.addChild(E());
    result.addChild(OP());
    result.addChild(E());
    result.addChild(match(RPAREN));
  } else if (current == NOT) {                 // E → not E
    result.addChild(match(NOT));
    result.addChild(E());
  } else {
    error();
  }
  return result;
}
Parser for Fully Parenthesized Expressions
static int Parse_Expression(Expression **expr_p) {
    Expression *expr = *expr_p = new_expression();
    /* try to parse a digit */
    if (Token.class == DIGIT) {
        expr->type = 'D'; expr->value = Token.repr - '0';
        get_next_token();
        return 1;
    }
    /* try to parse a parenthesized expression: ( expression operator expression ) */
    if (Token.class == '(') {
        expr->type = 'P'; get_next_token();
        if (!Parse_Expression(&expr->left)) Error("missing expression");
        if (!Parse_Operator(&expr->oper)) Error("missing operator");
        if (!Parse_Expression(&expr->right)) Error("missing expression");
        if (Token.class != ')') Error("missing )");
        get_next_token();
        return 1;
    }
    return 0;
}
Bottom-Up parsing
Bottom-Up Parsing
• Goal: Build a parse tree
  – Report error if input is not a legal program
• How:
  – Read input left-to-right
  – Construct a subtree for the first left-most tree node whose children have been constructed
Bottom-up parsing
E → E * T
E → T
T → T + F
T → F
F → id
F → num
F → ( E )
(Nonstandard precedence)
[Figure: a parse tree built bottom-up, leaves first, for an expression over 1, 2, 3, + and *]
Bottom-up parsing: LR(k) Grammars
• A grammar is in the class LR(k) when it can be derived via:
  – Bottom-up derivation
  – Scanning the input from left to right (L)
  – Producing the rightmost derivation (R), in reverse order
  – With lookahead of k tokens (k)
• A language is said to be LR(k) if it has an LR(k) grammar
• The simplest case is LR(0), which we will discuss
Terminology: Reductions & Handles
• The opposite of derivation is called reduction
  – Let A → α be a production rule
  – Derivation: β A µ => β α µ
  – Reduction: β α µ => β A µ
• A handle is the reduced substring
  – α is the handle for β α µ
Use Shift & Reduce
In each stage, we shift a symbol from the input to the stack, or reduce according to one of the rules.
[Figure: a shift-reduce parser with a stack, the input, the output, and an Action table and a Goto table]
[Figure: the token stream LP LP Num(23) Op(+) Num(7) RP Op(*) Id(b) RP, with partial subtrees built over Num(23), Num(7), Op(+), Op(*) and Id(b)]
How does the parser know what to do?
• A state will keep the info gathered on handle(s)
  – A state in the "control" of the PDA
  – Also (part of) the stack alphabet
• A table will tell it "what to do" based on the current state and the next token
  – The transition function of the PDA
• A stack will record the "nesting level"
  – Stack contains a sequence of prefixes of handles
Set of LR(0) items
Important Bottom-Up LR-Parsers
• LR(0) – simplest, explains basic ideas
• SLR(1) – simple, explains lookahead
• LR(1) – complicated, very powerful, expensive
• LALR(1) – complicated, powerful enough, used by automatic tools
LR(0) vs SLR(1) vs LR(1) vs LALR(1)
• All use shift/reduce
• Main difference: how to identify a handle
  – Technically: using different sets of states
• More expensive ⇒ more states ⇒ more specific choice of which reduction rule to use
• But the usage of the states is the same in all parsers
• Reduction is the same in all techniques
  – Once the handle is determined
LR(0) Parsing
LR item
N → α•β
α: already matched; β: to be matched against the input
Hypothesis about αβ being a possible handle: so far we've matched α, expecting to see β
Example: LR(0) Items
• All items can be obtained by placing a dot at every position for every production:
  – Before • = reduced – matched prefix
  – After • = may be reduced – may be matched by suffix
Grammar: (1) S → E $  (2) E → T  (3) E → E + T  (4) T → id  (5) T → ( E )
LR(0) items:
1: S → •E $      2: S → E • $      3: S → E $ •
4: E → • T       5: E → T •
6: E → • E + T   7: E → E • + T    8: E → E + • T   9: E → E + T •
10: T → • id     11: T → id •
12: T → • ( E )  13: T → ( • E )   14: T → ( E • )  15: T → ( E ) •
LR(0) items
N → α•β   Shift Item
N → αβ•   Reduce Item
States are sets of items
LR(0) Items
• A derivation rule with a location marker (●) is called an LR(0) item
E → E * B | E + B | B
B → 0 | 1
PDA States
• A PDA state is a set of LR(0) items. E.g., q13 = { E → E ● * B, E → E ● + B, B → 1 ● }
• Intuitively, if we matched 1, then the state will remember the 3 possible alternative rules and where we are in each of them:
  (1) E → E ● * B   (2) E → E ● + B   (3) B → 1 ●
E → E * B | E + B | B
B → 0 | 1
LR(0) Shift/Reduce Items
N → α•β   Shift Item
N → αβ•   Reduce Item
Intuition
• Read input tokens left-to-right and remember them in the stack
• When a right-hand side of a rule is found, remove it from the stack and replace it with the non-terminal it derives
• Remembering a token is called shift
  – Each shift moves to a state that remembers what we've seen so far
• Replacing RHS with LHS is called reduce
  – Each reduce goes to a state that determines the context of the derivation
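Model of an LR parser
[Figure: the LR parser reads the input id + id + id $ and produces output; its stack holds 0 T 2 + 7 id 5 (states interleaved with symbols); it consults an Action table and a Goto table whose columns are terminals and non-terminals]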
LR parser stack
• Sequence made of state, symbol pairs
• For instance, a possible stack for the grammar
  S → E $   E → T   E → E + T   T → id   T → ( E )
  could be: 0 T 2 + 7 id 5   (the stack grows to the right)
Form of LR parsing table
Rows: states 0, 1, …; columns: terminals (shift/reduce actions) and non-terminals (goto part)
• sn – shift state n
• rk – reduce by rule k
• gm – goto state m
• acc – accept
• (empty entry) – error
LR parser table example
State   id   +    (    )    $     E    T
0       s5        s7              g1   g6
1            s3             acc
2
3       s5        s7                   g4
4       r3   r3   r3   r3   r3
5       r4   r4   r4   r4   r4
6       r2   r2   r2   r2   r2
7       s5        s7              g8   g6
8            s3        s9
9       r5   r5   r5   r5   r5
(action part: id + ( ) $; goto part: E T)
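Shift move
[Figure: LR parsing program with state q on top of the stack and next input token a; it consults the action and goto tables]
• action[q,a] = sn
Result of shift
[Figure: the token a and then the state n are pushed on top of q; the input advances past a]
• action[q,a] = sn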
Reduce move
[Figure: the stack ends with … q σ1 q1 … σn qn, with qn on top (2·n entries above q); next input token a]
• action[qn,a] = rk
• Production: (k) A → σ1 … σn
• Top of stack looks like σ1 q1 … σn qn for some q1 … qn, with state q just below σ1
• goto[q,A] = qm
Result of reduce move
[Figure: the 2·n entries are popped, exposing q; then A and qm are pushed, so the stack ends … q A qm; the input is unchanged]
Accept move
[Figure: state q on top of the stack; next input token a]
• If action[q,a] = accept, parsing is completed
Error move
[Figure: state q on top of the stack; next input token a]
• If action[q,a] = error (usually an empty entry), parsing has discovered a syntactic error
Example
Z → E $
E → T | E + T
T → i | ( E )
Example: parsing with LR items
Z → E $   E → T | E + T   T → i | ( E )
Input: i + i $
Z → •E $
E → •T   E → •E + T   T → •i   T → •( E )
Why do we need these additional LR items? Where do they come from? What do they mean?
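ℇ-closure
• Given a set S of LR(0) items
• If P → α•Nβ is in state S
• then for each rule N → γ in the grammar, state S must also contain N → •γ
  – (repeat until no new items are added, since a new item N → •γ may itself have a non-terminal after the dot)
ℇ-closure({ Z → •E $ }) = { Z → •E $, E → •T, E → •E + T, T → •i, T → •( E ) }
Z → E $   E → T | E + T   T → i | ( E )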
Example: parsing with LR items (input i + i $)
• Items denote possible future handles
• Remember the position from which we're trying to reduce
Z → E $   E → T | E + T   T → i | ( E )
Initial items: Z → •E $, E → •T, E → •E + T, T → •i, T → •( E )
1. Match items with the current token i: T → i• – reduce item! Reduce i to T (now T + i $)
2. E → T• – reduce item! Reduce T to E (now E + i $)
3. After E: Z → E•$ and E → E•+T
4. Shift +: E → E+•T, together with T → •i and T → •( E )
5. Shift i: T → i• – reduce to T (now E + T $); then E → E+T• – reduce item! Reduce E + T to E (now E $)
6. After E: Z → E•$ and E → E•+T; shift $: Z → E$• – reduce item! Reduce E $ to Z
[Figure: the parse tree grows bottom-up across these steps: Z over E $, E over E + T, the left E over T over i, the right T over i]
GOTO/ACTION tables
State  action       i    +    (    )    $    E    T
q0     shift        q5        q7             q1   q6
q1     shift             q3             q2
q2     Z → E $
q3     shift        q5        q7                  q4
q4     E → E + T
q5     T → i
q6     E → T
q7     shift        q5        q7             q8   q6
q8     shift             q3        q9
q9     T → ( E )
(columns i through T: GOTO table; action column: ACTION table; empty entry – error move)
LR(0) parser tables
• Two types of rows:
  – Shift row – tells which state to GOTO for the current token
  – Reduce row – tells which rule to reduce (independent of the current token)
    • GOTO entries are blank
LR parser data structures
• Input – remainder of text to be processed
• Stack – sequence of pairs N, qi
  – N – symbol (terminal or non-terminal)
  – qi – state at which decisions are made
• Initial stack contains q0
Example: input suffix + i $, stack q0 i q5 (the stack grows to the right)
LR(0) pushdown automaton
• Two moves: shift and reduce
• Shift move
  – Remove first token from input
  – Push it on the stack
  – Compute next state based on GOTO table
  – Push new state on the stack
  – If new state is error – report error
Example: input i + i $, stack q0; shift (row q0 of the table: i → q5); now input + i $, stack q0 i q5
LR(0) pushdown automaton
• Reduce move
  – Using a rule N → α
  – Symbols in α and their following states are removed from the stack
  – New state computed based on GOTO table (using top of stack, before pushing N)
  – N is pushed on the stack
  – New state pushed on top of N
Example: input + i $, stack q0 i q5; reduce T → i: pop i q5, GOTO(q0, T) = q6; now input + i $, stack q0 T q6
GOTO/ACTION table
State   i    +    (    )    $    E    T
q0      s5        s7             s1   s6
q1           s3             s2
q2      r1   r1   r1   r1   r1   r1   r1
q3      s5        s7                  s4
q4      r3   r3   r3   r3   r3   r3   r3
q5      r4   r4   r4   r4   r4   r4   r4
q6      r2   r2   r2   r2   r2   r2   r2
q7      s5        s7             s8   s6
q8           s3        s9
q9      r5   r5   r5   r5   r5   r5   r5
(1) Z → E $   (2) E → T   (3) E → E + T   (4) T → i   (5) T → ( E )
Warning: numbers mean different things! rn = reduce using rule number n; sm = shift to state m
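Here is a table-driven sketch of the shift and reduce moves in C, hard-coding the GOTO/ACTION table above. It makes three simplifying assumptions. Only states are pushed, since the grammar symbols are implied by the states. Tokens are single characters, with i standing for id. Reducing by rule 1 (Z → E $) is treated as accept.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Rules: (1) Z → E$  (2) E → T  (3) E → E+T  (4) T → i  (5) T → (E); index 0 unused. */
static const int  rhs_len[6] = { 0, 2, 1, 3, 1, 3 };
static const char lhs[6]     = { 0, 'Z', 'E', 'E', 'T', 'T' };

/* LR(0) reduce rows: the rule to reduce by in each state, 0 for a shift row. */
static const int reduce_rule[10] = { 0, 0, 1, 0, 3, 4, 2, 0, 0, 5 };

/* Shift rows: target state per terminal column (i + ( ) $), -1 = empty (error). */
static const int shift_to[10][5] = {
    {  5, -1,  7, -1, -1 },   /* q0 */
    { -1,  3, -1, -1,  2 },   /* q1 */
    { -1, -1, -1, -1, -1 },   /* q2: reduce row */
    {  5, -1,  7, -1, -1 },   /* q3 */
    { -1, -1, -1, -1, -1 },   /* q4: reduce row */
    { -1, -1, -1, -1, -1 },   /* q5: reduce row */
    { -1, -1, -1, -1, -1 },   /* q6: reduce row */
    {  5, -1,  7, -1, -1 },   /* q7 */
    { -1,  3, -1,  9, -1 },   /* q8 */
    { -1, -1, -1, -1, -1 },   /* q9: reduce row */
};

/* Column of a token in shift_to, or -1 for anything else (including end of string). */
static int column(char c) {
    static const char cols[] = "i+()$";
    const char *p = (c != '\0') ? strchr(cols, c) : NULL;
    return p ? (int)(p - cols) : -1;
}

/* GOTO entries for the non-terminal columns E and T; -1 = empty. */
static int goto_nt(int q, char nt) {
    if (nt == 'E') return (q == 0) ? 1 : (q == 7) ? 8 : -1;
    if (nt == 'T') return (q == 0 || q == 7) ? 6 : (q == 3) ? 4 : -1;
    return -1;
}

/* Returns true iff the input (e.g. "i+i$") is accepted. */
bool lr0_parse(const char *input) {
    int stack[64], top = 0;
    stack[top++] = 0;                                   /* initial stack contains q0 */
    for (;;) {
        int q = stack[top - 1];
        int k = reduce_rule[q];
        if (k == 1) return *input == '\0';              /* reduce by Z → E$: accept */
        if (k != 0) {                                   /* reduce move */
            top -= rhs_len[k];                          /* pop one state per symbol of α */
            int next = goto_nt(stack[top - 1], lhs[k]); /* GOTO from the exposed state */
            if (next < 0) return false;
            stack[top++] = next;
        } else {                                        /* shift move */
            int c = column(*input);
            if (c < 0 || shift_to[q][c] < 0) return false;   /* empty entry: error */
            assert(top < 64);
            stack[top++] = shift_to[q][c];
            input++;
        }
    }
}
```

lr0_parse("i+i$") should make the same sequence of moves as the traces in this lecture and accept. lr0_parse("i+$") should hit an empty entry in q3 and fail.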
Parsing id + id $ (with the LR parser table example above; the stack grows to the right)
(1) S → E $   (2) E → T   (3) E → E + T   (4) T → id   (5) T → ( E )
rn = reduce using rule number n; sm = shift to state m
Initialize with state 0:
Stack              Input        Action
0                  id + id $    s5
0 id 5             + id $       r4   (pop id 5; goto[0,T] = 6, push T 6)
0 T 6              + id $       r2
0 E 1              + id $       s3
0 E 1 + 3          id $         s5
0 E 1 + 3 id 5     $            r4
0 E 1 + 3 T 4      $            r3
0 E 1              $            acc  (row 1, column $ of the table)
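LR(0) automaton example
States (q2, q4, q5, q6 and q9 are reduce states; the others are shift states):
q0: Z → •E$, E → •T, E → •E+T, T → •i, T → •(E)
q1: Z → E•$, E → E•+T
q2: Z → E$•
q3: E → E+•T, T → •i, T → •(E)
q4: E → E+T•
q5: T → i•
q6: E → T•
q7: T → (•E), E → •T, E → •E+T, T → •i, T → •(E)
q8: T → (E•), E → E•+T
q9: T → (E)•
Transitions:
q0 –E→ q1 ("managed to reduce E"), q0 –T→ q6, q0 –i→ q5, q0 –(→ q7 ("read input (")
q1 –$→ q2, q1 –+→ q3
q3 –T→ q4, q3 –i→ q5, q3 –(→ q7
q7 –E→ q8, q7 –T→ q6, q7 –i→ q5, q7 –(→ q7
q8 –)→ q9, q8 –+→ q3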
States and LR(0) Items
• The state will "remember" the potential derivation rules given the part that was already identified
• For example, if we have already identified E then the state will remember the two alternatives:
  (1) E → E * B, (2) E → E + B
• Actually, we will also remember where we are in each of them:
  (1) E → E ● * B, (2) E → E ● + B
• A derivation rule with a location marker is called an LR(0) item.
• The state is actually a set of LR(0) items. E.g.,
  q13 = { E → E ● * B, E → E ● + B }
E → E * B | E + B | B
B → 0 | 1
LR(0) parser tables
• Actions are determined by the topmost state
• Two types of rows:
  – Shift row – tells which state to GOTO for the current token
  – Reduce row – tells which rule to reduce (independent of the current token)
    • GOTO entries are blank
LR(0) Parsing of i + i
(using the GOTO/ACTION table with rules (1) Z → E $ (2) E → T (3) E → E + T (4) T → i (5) T → ( E ))
Stack                  Input      Action
q0                     i + i $    shift
q0 i q5                + i $      reduce 4
q0 T q6                + i $      reduce 2
q0 E q1                + i $      shift
q0 E q1 + q3           i $        shift
q0 E q1 + q3 i q5      $          reduce 4
q0 E q1 + q3 T q4      $          reduce 3
q0 E q1                $          shift
q0 E q1 $ q2                      reduce 1, then accept
Constructing an LR parsing table
• Construct a (determinized) transition diagram from LR items
• If there are conflicts – stop
• Fill table entries from the diagram
Computing item sets
• Initial set
  – Z is the start symbol
  – ℇ-closure({ Z → •α | Z → α is in the grammar })
• Next set from a set S and the next symbol X
  – step(S,X) = { N → αX•β | N → α•Xβ in the item set S }
  – nextSet(S,X) = ℇ-closure(step(S,X))
Operations for transition diagram construction
• Initial = { S' → •S $ }
• For an item set I:
  Closure(I) = Closure(I) ∪ { X → •µ is in grammar | N → α•Xβ in I }
• Goto(I, X) = { N → αX•β | N → α•Xβ in I }
Initial, Closure and Goto example
Grammar: (1) S → E $  (2) E → T  (3) E → E + T  (4) T → id  (5) T → ( E )
• Initial = { S → •E $ }
• Closure({ S → •E $ }) = { S → •E $, E → •T, E → •E + T, T → •id, T → •( E ) }
• Goto({ S → •E $, E → •E + T, T → •id }, E) = { S → E•$, E → E•+T }
Constructing the transition diagram
• Start with state 0 containing item set Closure({ S → •E $ })
• Repeat until no new states are discovered
  – For every state p containing item set Ip, and symbol N, compute state q containing item set Iq = Closure(goto(Ip, N))
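Below is a C sketch of Closure and Goto over LR(0) items for the example grammar, where an item is a production index plus a dot position. For brevity it assumes each production is encoded as a string whose first character is the left-hand side, with i standing for id. The names closure and goto_set are illustrative. goto_set applies Closure to its result, matching Iq = Closure(goto(Ip, N)).

```c
#include <assert.h>
#include <stdbool.h>

/* Example grammar, one character per symbol, upper case = non-terminal, i = id:
   (1) S → E$  (2) E → T  (3) E → E+T  (4) T → i  (5) T → (E)
   prods[p][0] is the left-hand side; the rest of the string is the right-hand side. */
static const char *const prods[] = { "SE$", "ET", "EE+T", "Ti", "T(E)" };
#define NUM_PRODS 5
#define MAX_ITEMS 32

typedef struct { int prod, dot; } Item;            /* N → α•β: production + dot position */
typedef struct { Item items[MAX_ITEMS]; int n; } ItemSet;

static void add(ItemSet *s, Item it) {
    for (int k = 0; k < s->n; k++)
        if (s->items[k].prod == it.prod && s->items[k].dot == it.dot) return;
    assert(s->n < MAX_ITEMS);
    s->items[s->n++] = it;
}

/* The symbol right after the dot, or '\0' for a reduce item N → αβ•. */
static char after_dot(Item it) {
    return prods[it.prod][1 + it.dot];
}

/* Closure(I): for each N → α•Xβ with X a non-terminal, add X → •µ for every X → µ.
   New items are appended to the set, so this single pass also processes them. */
void closure(ItemSet *s) {
    for (int k = 0; k < s->n; k++) {
        char X = after_dot(s->items[k]);
        if (X < 'A' || X > 'Z') continue;          /* terminal after the dot, or reduce item */
        for (int p = 0; p < NUM_PRODS; p++)
            if (prods[p][0] == X) add(s, (Item){ p, 0 });
    }
}

/* Closure(goto(I, X)): move the dot over X in every item that expects X. */
ItemSet goto_set(const ItemSet *I, char X) {
    ItemSet out = { .n = 0 };
    for (int k = 0; k < I->n; k++)
        if (after_dot(I->items[k]) == X)
            add(&out, (Item){ I->items[k].prod, I->items[k].dot + 1 });
    closure(&out);
    return out;
}
```

Starting from { S → •E $ }, closure should produce the five items of q0. goto_set on E should then give the two items S → E•$ and E → E•+T.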
Automaton construction example
Grammar: (1) S → E $  (2) E → T  (3) E → E + T  (4) T → id  (5) T → ( E )   (i stands for id in the items)
• Initialize: q0 = { S → •E $ }
• Apply Closure: q0 = { S → •E $, E → •T, E → •E + T, T → •i, T → •( E ) }
• Transitions out of q0:
  – on E: q1 = { S → E•$, E → E•+T }
  – on T: q6 = { E → T• }
  – on i: q5 = { T → i• }
  – on (: { T → (•E), E → •T, E → •E + T, T → •i, T → •( E ) }
Automaton construction example
Continuing until no new states appear yields the complete automaton q0 … q9 of the LR(0) automaton example.
• A terminal transition corresponds to a shift action in the parse table
• A non-terminal transition corresponds to a goto action in the parse table
• A single reduce item corresponds to a reduce action
Are we done?
• Can make a transition diagram for any grammar
• Can make a GOTO table for every grammar
• Cannot make a deterministic ACTION table for every grammar
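LR(0) conflicts
Grammar: Z → E $   E → T   E → E + T   T → i   T → ( E )   T → i [ E ]
q0 = { Z → •E $, E → •T, E → •E + T, T → •i, T → •( E ), T → •i [ E ] }
q5 (reached from q0 on i) = { T → i•, T → i•[ E ] }
⇒ Shift/reduce conflict (other states omitted)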
LR(0) conflicts
Grammar: Z → E $   E → T   E → E + T   T → i   V → i   T → ( E )
q0 = { Z → •E $, E → •T, E → •E + T, T → •i, T → •( E ), … }
q5 (reached from q0 on i) = { T → i•, V → i• }
⇒ Reduce/reduce conflict (other states omitted)
LR(0) conflicts
• Any grammar with an ℇ-rule cannot be LR(0)
• Inherent shift/reduce conflict
  – A → ℇ• – reduce item
  – P → α•Aβ – shift item
  – A → ℇ• can always be predicted from P → α•Aβ
Conflicts
• Can construct a diagram for every grammar, but some may introduce conflicts
• shift-reduce conflict: an item set contains at least one shift item and one reduce item
• reduce-reduce conflict: an item set contains two reduce items
LR variants
• LR(0) – what we've seen so far
• SLR(1)
  – Removes infeasible reduce actions via FOLLOW set reasoning
• LR(1)
  – LR(0) with one lookahead token in items
• LALR(1)
  – LR(1) with merging of states with the same LR(0) component
LR(0) GOTO/ACTION tables
(the same GOTO/ACTION tables as before)
• The ACTION table is determined only by the state; it ignores the input
• The GOTO table is indexed by the state and a grammar symbol from the stack
SLR parsing
• A handle should not be reduced to a non-terminal N if the lookahead is a token that cannot follow N
• A reduce item N → α• is applicable only when the lookahead is in FOLLOW(N)
  – If b is not in FOLLOW(N), we have proved there is no derivation S =>* βNb.
  – Thus, it is safe to remove the reduce item from the conflicted state
• Differs from LR(0) only in the ACTION table
  – Now a row in the parsing table may contain both shift actions and reduce actions, and we need to consult the current token to decide which one to take
SLR action table
SLR uses 1 token of look-ahead (columns: the lookahead token from the input); grammar as before, plus T → i [ E ]
State  i       +       (       )       [       ]       $
0      shift           shift
1              shift                                   accept
2
3      shift           shift
4              E→E+T           E→E+T                   E→E+T
5              T→i             T→i     shift           T→i
6              E→T             E→T                     E→T
7      shift           shift
8              shift           shift
9              T→(E)           T→(E)                   T→(E)
vs. the LR(0) action table (no look-ahead):
state  action
q0     shift
q1     shift
q2
q3     shift
q4     E → E + T
q5     T → i
q6     E → T
q7     shift
q8     shift
q9     T → ( E )
LR(1) grammars
• In SLR: a reduce item N → α• is applicable only when the lookahead is in FOLLOW(N)
• But FOLLOW(N) merges the lookahead for all alternatives for N
  – Insensitive to the context of a given production
• LR(1) keeps a lookahead with each LR item
• Idea: a more refined notion of follows, computed per item
LR(1) items
• An LR(1) item is a pair
  – LR(0) item
  – Lookahead token
• Meaning
  – We matched the part left of the dot, looking to match the part on the right of the dot, followed by the lookahead token
• Example
  – The production L → id yields the following LR(1) items
Grammar: (0) S' → S  (1) S → L = R  (2) S → R  (3) L → * R  (4) L → id  (5) R → L
LR(0) items: [L → ● id] [L → id ●]
LR(1) items: [L → ● id, *] [L → ● id, =] [L → ● id, id] [L → ● id, $] [L → id ●, *] [L → id ●, =] [L → id ●, id] [L → id ●, $]
• Reduce only if the expected lookahead matches the input
  – [L → id ●, =] will be used only if the next input token is =
LALR(1)
• LR(1) tables have a huge number of entries
• Often we don't need such refined observations (and their cost)
• Idea: find states with the same LR(0) component and merge their lookahead components, as long as there are no conflicts
• LALR(1) is not as powerful as LR(1) in theory but works quite well in practice
  – Merging cannot introduce new shift-reduce conflicts, only reduce-reduce conflicts, which are unlikely in practice
Summary
LR is More Powerful than LL
• Any LL(k) language is also in LR(k), i.e., LL(k) ⊂ LR(k).
  – LR is more popular in automatic tools
• But less intuitive
• Also, the lookahead is counted differently in the two cases
  – In an LL(k) derivation, the algorithm sees the left-hand side of the rule + k input tokens and then must select the derivation rule
  – In LR(k), the algorithm "sees" the entire right-hand side of the derivation rule + k input tokens and then reduces
    • LR(0) sees the entire right-hand side, but no input token
Using tools to parse + create AST
terminal Integer NUMBER;
terminal PLUS, MINUS, MULT, DIV;
terminal LPAREN, RPAREN;
terminal UMINUS;
nonterminal Integer expr;
precedence left PLUS, MINUS;
precedence left DIV, MULT;
precedence left UMINUS;
expr ::= expr:e1 PLUS expr:e2
           {: RESULT = new Integer(e1.intValue() + e2.intValue()); :}
       | expr:e1 MINUS expr:e2
           {: RESULT = new Integer(e1.intValue() - e2.intValue()); :}
       | expr:e1 MULT expr:e2
           {: RESULT = new Integer(e1.intValue() * e2.intValue()); :}
       | expr:e1 DIV expr:e2
           {: RESULT = new Integer(e1.intValue() / e2.intValue()); :}
       | MINUS expr:e1 %prec UMINUS
           {: RESULT = new Integer(0 - e1.intValue()); :}
       | LPAREN expr:e1 RPAREN
           {: RESULT = e1; :}
       | NUMBER:n
           {: RESULT = n; :}
       ;
GrammarHierarchy
150
Non-ambiguous CFGLR(1)
LALR(1)
SLR(1)
LL(1)
LR(0)
Earley Parsing
(Photo: Jay Earley, PhD)
• Invented by Jay Earley [PhD, 1968]
• Handles arbitrary context-free grammars
  – Can handle ambiguous grammars
• Complexity O(N³) when N = |input|
• Uses dynamic programming
  – Compactly encodes ambiguity
Dynamic programming
• Break a problem P into subproblems P1…Pk
  – Solve P by combining solutions for P1…Pk
  – Memoize (store) solutions to subproblems instead of re-computing them
• Bellman-Ford shortest path algorithm
  – Sol(x, y, i) = length of the shortest path from x to y that uses at most i edges
  – Sol(x, y, i) = minimum of
    • Sol(x, y, i-1)
    • Sol(t, y, i-1) + weight(x, t) for all edges (x, t)
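The recurrence can be filled in bottom-up, and only row i-1 of the table needs to be kept. This Java sketch fixes the target y and computes Sol(x, y, i) for every x. The edge encoding `{from, to, weight}` and the names `BellmanFordDP` and `shortestTo` are assumptions for illustration, and the graph is assumed to have no negative cycles.

```java
import java.util.Arrays;

class BellmanFordDP {
    static final long INF = Long.MAX_VALUE / 4;   // "no path yet"

    // Returns sol[x] = Sol(x, y, n-1): shortest distance from x to y.
    // edges[e] = {x, t, weight(x, t)}; assumes no negative cycles.
    static long[] shortestTo(int n, int[][] edges, int y) {
        long[] prev = new long[n];
        Arrays.fill(prev, INF);
        prev[y] = 0;                          // Sol(y, y, 0) = 0
        for (int i = 1; i <= n - 1; i++) {    // simple paths use at most n-1 edges
            long[] cur = prev.clone();        // first option: Sol(x, y, i-1)
            for (int[] e : edges) {           // second option: Sol(t, y, i-1) + weight(x, t)
                int x = e[0], t = e[1], w = e[2];
                if (prev[t] != INF && prev[t] + w < cur[x]) {
                    cur[x] = prev[t] + w;
                }
            }
            prev = cur;                       // memoized row i becomes row i-1
        }
        return prev;
    }
}
```

Keeping a single row is the memoization from the first bullet. Each Sol(·, y, i) is computed once from the stored row for i-1 and is never re-derived recursively.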
Earley Parsing
• Parses arbitrary grammars in O(|input|³)
  – O(|input|²) for unambiguous grammars
  – Linear for most LR(k) grammars
• Dynamic programming implementation of a recursive descent parser
  – S[0…N]: a sequence of N+1 sets of "Earley states", where N = |INPUT|
    • An Earley state (item) s is a sentential form + auxiliary info
  – S[i]: all parse trees that can be produced (by an RDP) after reading the first i tokens
    • S[i+1] is built using S[0]…S[i]
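Earley States
• s = <constituent, back>
  – constituent (dotted rule) for A → αβ
    A → •αβ   predicted constituents
    A → α•β   in-progress constituents
    A → αβ•   completed constituents
  – back: the previous Earley state in the derivation, i.e., the index of the Earley set in which this constituent was predicted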
Earley Parser
Input = x[1…N]
S[0] = {<E' → •E, 0>}; S[1] = … = S[N] = {}
for i = 0 … N do
  until S[i] does not change do
    foreach s ∈ S[i]
      if s = <A → …•a…, b> and a = x[i+1] then                // scan
        S[i+1] = S[i+1] ∪ {<A → …a•…, b>}
      if s = <A → …•X…, b> and X → α then                     // predict
        S[i] = S[i] ∪ {<X → •α, i>}
      if s = <A → …•, b> and <X → …•A…, k> ∈ S[b] then        // complete
        S[i] = S[i] ∪ {<X → …A•…, k>}
accept iff <E' → E•, 0> ∈ S[N]
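The pseudocode translates almost line for line into Java. This recognizer sketch rests on assumptions of our own:
- A grammar is a map from each non-terminal to its alternatives.
- Any symbol that is not a key of the map is a terminal.
- The augmented rule E' → E is created internally under the hypothetical name `S'`.

Sets are `LinkedHashSet`s, and each pass iterates over a snapshot so that items can be added while S[i] is being scanned.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class EarleyRecognizer {

    // An Earley state <A -> alpha . beta, back>; back = index of the set
    // where this constituent was predicted.
    record Item(String lhs, List<String> rhs, int dot, int back) {
        String next() { return dot < rhs.size() ? rhs.get(dot) : null; } // symbol after the dot
        Item advance() { return new Item(lhs, rhs, dot + 1, back); }     // move the dot right
    }

    // x holds the tokens x[1..N] of the slide, stored 0-based.
    static boolean recognize(Map<String, List<List<String>>> g, String start, List<String> x) {
        int n = x.size();
        List<Set<Item>> s = new ArrayList<>();
        for (int i = 0; i <= n; i++) s.add(new LinkedHashSet<>());
        Item init = new Item("S'", List.of(start), 0, 0);   // <E' -> .E, 0>
        s.get(0).add(init);
        for (int i = 0; i <= n; i++) {
            boolean changed = true;
            while (changed) {                                 // until S[i] does not change
                changed = false;
                for (Item it : new ArrayList<>(s.get(i))) {   // snapshot: S[i] may grow
                    String sym = it.next();
                    if (sym == null) {                        // complete
                        for (Item k : new ArrayList<>(s.get(it.back()))) {
                            if (it.lhs().equals(k.next())) changed |= s.get(i).add(k.advance());
                        }
                    } else if (g.containsKey(sym)) {          // predict
                        for (List<String> alpha : g.get(sym)) {
                            changed |= s.get(i).add(new Item(sym, alpha, 0, i));
                        }
                    } else if (i < n && sym.equals(x.get(i))) {  // scan: x[i+1] on the slide
                        s.get(i + 1).add(it.advance());
                    }
                }
            }
        }
        return s.get(n).contains(init.advance());           // <E' -> E., 0> in S[N]
    }
}
```

With the grammar E → E + E | n, the input n + n is accepted and n + is rejected. Because each S[i] is re-scanned until nothing changes, this version also copes with ϵ-rules, at the price of re-examining items on every pass.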
Example
PRACTICAL EARLEY PARSING 621
S0: S′ → •E, 0;  E → •E + E, 0;  E → •n, 0
  ↓ n
S1: E → n•, 0;  S′ → E•, 0;  E → E • +E, 0
  ↓ +
S2: E → E + •E, 0;  E → •E + E, 2;  E → •n, 2
  ↓ n
S3: E → n•, 2;  E → E + E•, 0;  E → E • +E, 2;  S′ → E•, 0
FIGURE 1. Earley sets for the grammar E → E + E | n and the input n + n. Items in bold are ones which correspond to the input's derivation.
Earley recommended using lookahead for the COMPLETER step [2]; it was later shown that a better approach was to use lookahead for the PREDICTOR step [8]; later it was shown that prediction lookahead was of questionable value in an Earley parser which uses finite automata [9] as ours does.
In terms of implementation, the Earley sets are built in increasing order as the input is read. Also, each set is typically represented as a list of items, as suggested by Earley [1, 2]. This list representation of a set is particularly convenient, because the list of items acts as a 'work queue' when building the set: items are examined in order, applying SCANNER, PREDICTOR and COMPLETER as necessary; items added to the set are appended onto the end of the list.
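Here is a small Java sketch of this list-as-work-queue representation. The index-based loop re-reads `size()` on every step, so items appended while the set is being processed are examined later in the same loop. The companion `HashSet` for constant-time duplicate checks is our addition, not something the paper prescribes, and `WorkQueueSet` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// One Earley set kept as a list that doubles as a work queue. The HashSet
// is an extra (our assumption) so duplicate checks do not scan the list.
class WorkQueueSet<T> {
    private final List<T> items = new ArrayList<>();
    private final Set<T> seen = new HashSet<>();

    // Append t to the end of the list unless it is already in the set.
    boolean add(T t) {
        if (!seen.add(t)) return false;
        items.add(t);
        return true;
    }

    // Examine items in order. The handler may call add(); because size()
    // is re-read on every iteration, appended items are examined too.
    void process(Consumer<T> handler) {
        for (int k = 0; k < items.size(); k++) {
            handler.accept(items.get(k));
        }
    }

    // Read-only view of the items in insertion order.
    List<T> items() {
        return Collections.unmodifiableList(items);
    }
}
```

In an Earley parser the handler would dispatch to SCANNER, PREDICTOR or COMPLETER depending on the symbol after the dot. A plain `for-each` over the list would instead throw `ConcurrentModificationException` as soon as an item is appended.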
3. THE PROBLEM OF ϵ
At any given point i in the parse, we have two partially-constructed sets. SCANNER may add items to $S_{i+1}$ and $S_i$ may have items added to it by PREDICTOR and COMPLETER. It is this latter possibility, adding items to $S_i$ while representing sets as lists, which causes grief with ϵ-rules.
When COMPLETER processes an item [A → •, j] which corresponds to the ϵ-rule A → ϵ, it must look through $S_j$ for items with the dot before an A. Unfortunately, for ϵ-rule items, j is always equal to i, so COMPLETER is looking through the partially-constructed set $S_i$.³
Since implementations process items in $S_i$ in order, if an item [B → … • A …, k] is added to $S_i$ after COMPLETER has processed [A → •, j], COMPLETER will never add [B → … A • …, k] to $S_i$. In turn, items resulting directly and indirectly from [B → … A • …, k] will be omitted too. This effectively prunes potential derivation paths, which can cause correct input to be rejected. Figure 2 gives an example of this happening.
³ j = i for ϵ-rule items because they can only be added to an Earley set by PREDICTOR, which always bestows added items with the parent pointer i.
Grammar: S′ → S;  S → AAAA;  A → a;  A → E;  E → ϵ
S0: S′ → •S, 0;  S → •AAAA, 0;  A → •a, 0;  A → •E, 0;  E → •, 0;  A → E•, 0;  S → A • AAA, 0
  ↓ a
S1: A → a•, 0;  S → A • AAA, 0;  S → AA • AA, 0;  A → •a, 1;  A → •E, 1;  E → •, 1;  A → E•, 1;  S → AAA • A, 0
FIGURE 2. An unadulterated Earley parser, representing sets using lists, rejects the valid input a. Missing items in S0 sound the death knell for this parse.
Two methods of handling this problem have been proposed. Grune and Jacobs aptly summarize one approach:
'The easiest way to handle this mare's nest is to stay calm and keep running the Predictor and Completer in turn until neither has anything more to add.' [10, p. 159]
Aho and Ullman [11] specify this method in their presentation of Earley parsing and it is used by ACCENT [12], a compiler–compiler which generates Earley parsers.
The other approach was suggested by Earley [1, 2]. He proposed having COMPLETER note that the dot needed to be moved over A, then looking for this whenever future items were added to $S_i$. For efficiency's sake, the collection of non-terminals to watch for should be stored in a data structure which allows fast access. We used this method initially for the Earley parser in the SPARK toolkit [13].
In our opinion, neither approach is very satisfactory. Repeatedly processing $S_i$, or parts thereof, involves a lot of activity for little gain; Earley's solution requires an extra, dynamically-updated data structure and the unnatural mating of COMPLETER with the addition of items. Ideally, we want a solution which retains the elegance of Earley's algorithm, only processes items in $S_i$ once and has no run-time overhead from updating a data structure.
4. AN ‘IDEAL’ SOLUTION
Our solution involves a simple modification to PREDICTOR, based on the idea of nullability. A non-terminal A is said to be nullable if A ⇒* ϵ; terminal symbols, of course, can never be nullable. The nullability of non-terminals in a grammar may be easily precomputed using well-known techniques [14, 15]. Using this notion, our PREDICTOR can be stated as follows (our modification is in bold):
If [A → … • B …, j] is in $S_i$, add [B → •α, i] to $S_i$ for all rules B → α. If B is nullable, also add [A → … B • …, j] to $S_i$.
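The sketch below puts these pieces together in Java. It first computes the nullable non-terminals with a fixpoint, then runs a single-pass Earley recognizer whose sets are lists used as work queues, so each item is processed exactly once.
- The `useFix` flag is a hypothetical parameter for comparison; it switches the nullable rule in PREDICTOR on or off.
- COMPLETER only looks at the items already present in $S_j$ when it runs, which is what makes the unmodified version lose items.
- The grammar encoding and the internal start name `S'` are our assumptions, not the paper's.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class NullableEarley {

    record Item(String lhs, List<String> rhs, int dot, int origin) {
        String next() { return dot < rhs.size() ? rhs.get(dot) : null; }
        Item advance() { return new Item(lhs, rhs, dot + 1, origin); }
    }

    // Fixpoint: A is nullable if some alternative of A consists only of
    // nullable symbols (an empty alternative qualifies trivially).
    static Set<String> nullable(Map<String, List<List<String>>> g) {
        Set<String> result = new HashSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> e : g.entrySet()) {
                if (result.contains(e.getKey())) continue;
                for (List<String> alt : e.getValue()) {
                    if (result.containsAll(alt)) {  // terminals never enter result
                        result.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return result;
    }

    static boolean recognize(Map<String, List<List<String>>> g, String start,
                             List<String> x, boolean useFix) {
        Set<String> nul = useFix ? nullable(g) : Set.of();
        int n = x.size();
        List<List<Item>> sets = new ArrayList<>();      // S_i as a list (work queue)
        List<Set<Item>> seen = new ArrayList<>();       // duplicate check per S_i
        for (int i = 0; i <= n; i++) {
            sets.add(new ArrayList<>());
            seen.add(new HashSet<>());
        }
        Item init = new Item("S'", List.of(start), 0, 0);
        add(sets, seen, 0, init);
        for (int i = 0; i <= n; i++) {
            List<Item> si = sets.get(i);
            for (int k = 0; k < si.size(); k++) {       // each item processed once
                Item it = si.get(k);
                String b = it.next();
                if (b == null) {                        // COMPLETER
                    List<Item> sj = sets.get(it.origin());
                    int size = sj.size();               // only items present now
                    for (int m = 0; m < size; m++) {
                        Item p = sj.get(m);
                        if (it.lhs().equals(p.next())) add(sets, seen, i, p.advance());
                    }
                } else if (g.containsKey(b)) {          // PREDICTOR
                    for (List<String> alpha : g.get(b)) add(sets, seen, i, new Item(b, alpha, 0, i));
                    if (nul.contains(b)) add(sets, seen, i, it.advance());  // the fix
                } else if (i < n && b.equals(x.get(i))) {  // SCANNER
                    add(sets, seen, i + 1, it.advance());
                }
            }
        }
        return seen.get(n).contains(init.advance());
    }

    // Append an item to S_i unless it is already there.
    private static void add(List<List<Item>> sets, List<Set<Item>> seen, int i, Item it) {
        if (seen.get(i).add(it)) sets.get(i).add(it);
    }
}
```

With `useFix` set to false, this sketch builds the same sets as in Figure 2, in the same order, and rejects the input a. With the fix enabled it accepts a, and it also accepts the empty input, since S itself is nullable.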
THE COMPUTER JOURNAL, Vol. 45, No. 6, 2002