Chap. 6, Bottom-Up Parsing J. H. Wang May 17, 2011.
Outline
• Overview
• Shift-Reduce Parsers
• LR(0) Table Construction
• Conflict Diagnosis
• Conflict Resolution and Table Construction
Overview
• Problems in top-down parsers
– Left recursion
– Common prefixes
– (Fig. 5.12 vs. Fig. 5.16)
• Bottom-up parsers can handle the largest class of grammars that can be parsed deterministically
• A bottom-up parser begins with the parse tree’s leaves and moves toward its root
• A bottom-up parser traces a rightmost derivation in reverse
• A bottom-up parser uses a grammar rule to replace the rule’s RHS with its LHS
• (Fig. 4.5 & Fig. 4.6)
• Bottom-up: from terminal symbols to the goal symbol
• Shift-reduce: the two most prevalent actions
– Shift symbols onto the parse stack
– Reduce a string of symbols on top of the stack to a nonterminal
• LR(k): scan the input from the left, producing a rightmost derivation in reverse, using k symbols of lookahead
• LR parsers are more general than LL parsers
– Yacc: an LR parser generator
Shift-Reduce Parsers
• LR parsers and rightmost derivations
– LR parsers construct rightmost derivations in reverse
– Fig. 6.2
• LR parsing as knitting
– How the RHS of a production is found
– Fig. 6.1
• In Fig. 6.1:
– Right needle: unprocessed portion of the string
– Left needle: parser’s stack (processed portion)
• Operations
– Shift: transfers a symbol from the right needle to the left needle
– Reduction: symbols at the top of the parse stack (left needle) that form a rule’s RHS are replaced by the rule’s LHS
• The LHS nonterminal A takes their place
– (Fig. 6.1)
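The two needle operations can be sketched in a few lines of Python. This is only an illustration, not the parser of Fig. 6.1: the grammar (E → plus E E, with an assumed second alternative E → num) and the function name shift_reduce are hypothetical, and the "reduce whenever an RHS is on top of the stack" policy happens to work for this tiny grammar but is not a general LR strategy.

```python
# Hypothetical prefix-notation grammar (the alternative E -> num is an assumption):
#   E -> plus E E
#   E -> num
RULES = [("E", ("plus", "E", "E")), ("E", ("num",))]


def shift_reduce(tokens):
    """Return the list of actions taken; raise ValueError if the parse fails."""
    stack = []            # left needle: processed symbols
    rest = list(tokens)   # right needle: unprocessed input
    actions = []
    while True:
        # Try to reduce first: look for a rule whose RHS is on top of the stack.
        for lhs, rhs in RULES:
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]      # pop the RHS
                stack.append(lhs)          # push the LHS in its place
                actions.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            # No reduction applies, so shift (or stop when input is exhausted).
            if not rest:
                break
            stack.append(rest.pop(0))
            actions.append(f"shift {stack[-1]}")
    if stack != ["E"]:
        raise ValueError(f"parse failed with stack {stack}")
    return actions


for step in shift_reduce(["plus", "num", "num"]):
    print(step)
```

Read from last to first, the reduce steps spell out a rightmost derivation of the input, which is the sense in which the parser works "in reverse".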
LR Parsing Engine
• A simple driver for a shift-reduce parser
– Fig. 6.3
– Driven by a table (Sec. 6.2.4)
– Indexed by the parser’s current state and the next input symbol
• Current state: top of the parser stack
– Shift and reduce actions are performed until
• Accepted: input is reduced to the goal symbol
• Error: no valid action is found
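A table-driven driver of this kind might look as follows. This is a sketch under stated assumptions, not a transcription of Fig. 6.3: the grammar (S → a S | b), the hand-built TABLE, the explicit "accept" entry, and the name parse are all hypothetical. After a reduction the LHS is prepended to the input and then shifted like any other symbol, which is one common way to organize such a driver.

```python
# Hypothetical grammar used only for this sketch:
#   rule 1: S -> a S
#   rule 2: S -> b
RULES = {1: ("S", 2), 2: ("S", 1)}   # rule number -> (LHS, length of RHS)

# TABLE[state][symbol] -> action; a missing entry means error.
# Built by hand for the grammar above (an assumption, not taken from a figure).
TABLE = {
    0: {"a": ("shift", 2), "b": ("shift", 3), "S": ("shift", 1)},
    1: {"$": ("accept",)},
    2: {"a": ("shift", 2), "b": ("shift", 3), "S": ("shift", 4)},
    3: {"a": ("reduce", 2), "b": ("reduce", 2), "$": ("reduce", 2)},
    4: {"a": ("reduce", 1), "b": ("reduce", 1), "$": ("reduce", 1)},
}


def parse(tokens):
    """Return the rule numbers applied, in order; raise SyntaxError on error."""
    stack = [0]                      # stack of parser states, start state first
    inp = list(tokens) + ["$"]       # input with end marker
    reductions = []
    while True:
        action = TABLE[stack[-1]].get(inp[0])   # index by state and next symbol
        if action is None:
            raise SyntaxError(f"unexpected {inp[0]!r} in state {stack[-1]}")
        if action[0] == "accept":
            return reductions
        if action[0] == "shift":
            stack.append(action[1])  # push the new state
            inp.pop(0)               # advance the input
        else:
            lhs, length = RULES[action[1]]
            del stack[-length:]      # pop one state per RHS symbol
            inp.insert(0, lhs)       # prepend the LHS to the input
            reductions.append(action[1])


print(parse(["a", "a", "b"]))   # prints [2, 1, 1]
```

The printed rule sequence, read backwards, is the rightmost derivation S ⇒ a S ⇒ a a S ⇒ a a b.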
LR Parse Table
• Given a sentential form, the handle is defined as the sequence of symbols that will next be replaced by reduction
– How to identify the handle
– Which production to employ
– (Fig. 6.4 & Fig. 6.5)
• In Fig. 6.5:
– [s]: shift to state s
– r: reduce by rule r
– Blank: error
• A bottom-up parse of “a b b d c $”
– Fig. 6.6 & Fig. 6.7
– A rightmost derivation in reverse
– Shift actions are implied by the inability to perform a useful reduction
• Tokens are shifted until a handle appears
LR(k) Parsing
• Concept of LR parsing introduced by Knuth in 1965
• LR(k)
– k: the number of lookahead symbols used in constructing the parse table (zero for LR(0))
– LR(0) and LR(1) parsers both use one symbol of lookahead at parse time
– Number of columns in the parse table: n^k (for n terminal symbols)
• Properties of LR(k) parsers
– Shifting symbols and examining lookahead until the end of the handle is found
– The handle is reduced to a nonterminal
– Determine whether to shift or reduce, based on the symbols already shifted (left context) and the next k lookahead symbols (right context)
• A grammar is LR(k) iff it is possible to construct an LR parse table such that k tokens of lookahead allow the parser to recognize exactly the strings in the grammar’s language
– Deterministic: each cell in the LR parse table contains only one entry
Formal Definition of LR(k) Grammars
• A grammar is LR(k) iff the following conditions imply αAy = γBx
– S ⇒*rm αAw ⇒rm αβw
– S ⇒*rm γBx ⇒rm αβy
– First_k(w) = First_k(y)
• LR(k) parsers can always determine the correct reduction (A → β) given
– The left context (αβ) up to the end of the handle
– The next k symbols (First_k(w)) of the input
LR(0) Table Construction
• (Fig. 6.2)
– E → plus E E
• LR(0) item
– A grammar production with a bookmark that indicates the current progress through the production’s RHS
• Fresh: E → . plus E E
• Reducible: E → plus E E .
– (Fig. 6.8)
• Parser state: a set of LR(0) items
• LR(0) construction algorithm
– Fig. 6.9 & Fig. 6.10
– ComputeGoto
• Closure of state s
• Transitions from s
– E.g.: Fig. 6.11
• Kernel of state s
• A DFA called CFSM (characteristic finite-state machine)
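The closure, transition, and state-discovery steps can be sketched as below. This is a minimal illustration rather than the algorithm of Fig. 6.9 and Fig. 6.10: the grammar is assumed to be Start → E $, E → plus E E | num (only the rule E → plus E E appears on the slide), items are represented as hypothetical (rule index, dot position) pairs, and the function names are made up for this sketch.

```python
# Assumed grammar; rule 0 is the augmented start rule.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("plus", "E", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add a fresh item B -> . gamma for every nonterminal B right after a dot."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for r, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (r, 0) not in result:
                    result.add((r, 0))
                    work.append((r, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` (this forms the kernel), then close it."""
    kernel = {(rule, dot + 1) for rule, dot in state
              if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol}
    return closure(kernel)


def build_cfsm():
    """Return (states, edges); edges maps (state index, symbol) -> state index."""
    start = closure({(0, 0)})
    states, index, edges = [start], {start: 0}, {}
    work = [start]
    while work:
        s = work.pop()
        # Symbols that appear immediately after a dot in this state.
        symbols = {GRAMMAR[r][1][d] for r, d in s if d < len(GRAMMAR[r][1])}
        for x in sorted(symbols):
            t = goto(s, x)
            if t not in index:          # a state is identified by its item set
                index[t] = len(states)
                states.append(t)
                work.append(t)
            edges[(index[s], x)] = index[t]
    return states, edges


states, edges = build_cfsm()
print(len(states), "states,", len(edges), "transitions")   # prints 7 states, 10 transitions
```

States containing an item whose dot is at the end of its RHS are the ones in which a reduction is possible.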
• CFSM recognizes its grammar’s viable prefixes
– Viable prefix: any prefix that does not extend beyond its handle
– Accept state in CFSM: a viable prefix that ends with a handle
• Reduction
• (Fig. 6.12)
• For an LR(0) grammar, the following properties hold
– Given a syntactically correct input string, the CFSM will block only in double-boxed states
– There’s at most one item in any double-boxed state
– If the input string is syntactically invalid, the parser will enter a state in which the offending symbol cannot be shifted
• To complete the parse table
– (Fig. 6.13 & 6.14)
– E.g.: (Fig. 6.15)
Conflict Diagnosis
• A parse table conflict arises when the table-construction method cannot decide between multiple alternatives for some table entry
– Shift/reduce conflicts
– Reduce/reduce conflicts
• Reasons for conflicts
– Grammar is ambiguous
– Grammar is not ambiguous, but the current table-building approach cannot resolve the conflict
• Use more lookahead
• Use a more powerful method
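For the LR(0) case, the two kinds of conflict can be detected by inspecting the items of a single state. The sketch below is illustrative only: the item representation (lhs, rhs, dot), the function name diagnose, and the example state (built from a hypothetical ambiguous rule E → E plus E) are assumptions made for this example.

```python
def diagnose(state, terminals):
    """Classify LR(0) conflicts in one state.

    state: iterable of items (lhs, rhs, dot), where rhs is a tuple of symbols.
    Returns a list that may contain "reduce/reduce" and "shift/reduce".
    """
    items = list(state)
    # Reducible items have the dot at the end of the RHS.
    reducible = [item for item in items if item[2] == len(item[1])]
    # A shift is possible if some item has a terminal right after the dot.
    can_shift = any(dot < len(rhs) and rhs[dot] in terminals
                    for _, rhs, dot in items)
    conflicts = []
    if len(reducible) > 1:
        conflicts.append("reduce/reduce")
    if reducible and can_shift:
        conflicts.append("shift/reduce")
    return conflicts


# Hypothetical inadequate state for an ambiguous rule E -> E plus E.
STATE = [
    ("E", ("E", "plus", "E"), 3),   # E -> E plus E .   (reducible)
    ("E", ("E", "plus", "E"), 1),   # E -> E . plus E   (can shift plus)
]
print(diagnose(STATE, {"plus", "num", "$"}))   # prints ['shift/reduce']
```

With lookahead-based methods the same check is made per table cell rather than per state, so some of these LR(0) conflicts disappear.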
Ambiguous Grammars
• Using state 5 in Fig. 6.16 as an example, the steps taken to understand conflicts
– Determine a sequence of vocabulary symbols that causes the parser to move from the start state to the inadequate state
• E plus E
– We obtain a snapshot
• E plus E . plus E
• (Fig. 6.17)
• Top parse tree
– Reduction
– Left-associative grouping for addition
• Bottom parse tree
– Shift
– Right-associative grouping for addition
• ⇒ We eliminate the ambiguity by creating a grammar that favors left-association
– (Fig. 6.18)
Grammars that are not LR(k)
• Reduce/reduce conflict
– Start ⇒rm Exprs $
⇒rm E a $
⇒rm E plus num a $
⇒*rm E plus … plus num a $
⇒rm num plus … plus num a $
Conflict Resolution and Table Construction
• Increasingly sophisticated lookahead techniques to resolve conflicts
– SLR(k): simple
– LALR(k)
– LR(k): the most powerful
SLR(k) Table Construction
• SLR(k): Simple LR with k tokens of lookahead
– A grammar that is not LR(0): Fig. 6.20
– Input string: num plus num times num $
• Replacing a terminal by a nonterminal whose role in the grammar is equivalent
– (Fig. 6.21)
• LR(0) construction: (Fig. 6.22)
– Shift/reduce conflict in state 6
• Shift: (can continue as in Fig. 6.21)
• Reduce: blocks in state 3
– E times num $ is not a valid sentential form
– E → E plus T is appropriate under some conditions
• For sentential forms
– E plus T $
– E plus T plus num $
• If the reduction can lead to a successful parse, then plus can appear next to E in some valid sentential form
– plus ∈ Follow(E)
– TryRuleInState(): (Fig. 6.23)
– SLR(1) parse table: (Fig. 6.24)
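The Follow-based test can be sketched as follows. This is not the TryRuleInState of Fig. 6.23: the grammar is an assumed expression grammar (Start → E $, E → E plus T | T, T → T times num | num), the helper names are hypothetical, and the First/Follow computation is simplified by assuming the grammar has no λ-rules.

```python
# Assumed grammar; rule numbers are list indices.
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("E", "plus", "T")),
    ("E", ("T",)),
    ("T", ("T", "times", "num")),
    ("T", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets by fixed-point iteration (valid only without lambda-rules)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """Follow sets by fixed-point iteration (valid only without lambda-rules)."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):            # something follows sym in this rule
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:                           # sym ends the rule: inherit Follow(lhs)
                    new = follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


def slr_actions(state, follow):
    """Map each terminal to the set of actions SLR(1) allows in `state`."""
    actions = {}
    for rule, dot in state:
        lhs, rhs = GRAMMAR[rule]
        if dot == len(rhs):                     # reduce only on Follow(lhs)
            for t in follow[lhs]:
                actions.setdefault(t, set()).add(f"reduce {rule}")
        elif rhs[dot] not in NONTERMINALS:
            actions.setdefault(rhs[dot], set()).add("shift")
    return actions


# The conflicting items: E -> E plus T .  and  T -> T . times num
print(slr_actions([(1, 3), (3, 1)], follow_sets()))
```

Because times is not in Follow(E) for this grammar, every terminal ends up with a single action and the LR(0) shift/reduce conflict disappears.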
LALR(k) Table Construction
• Sometimes SLR(k) construction fails only because the Follow_k information is not rule-specific
– (Fig. 6.25)
– Grammar is not ambiguous
– State 3 has a shift/reduce conflict
– Follow_k(A) = {b$^(k-1), c$^(k-1), $^k}
• Insufficient to resolve the conflict
• (Fig. 6.26)
• LALR(k): Lookahead LR with k tokens of lookahead
– Same number of rows (states) as the LR(0) table
– The most popular LR table-building method
• Balance of power and efficiency
– Redefine two methods
• TryRuleInState: ItemFollow set (Fig. 6.27)
• ComputeLookahead: lookahead propagation graph (Fig. 6.28)
LALR Propagation Graph
• Each LR(0) item occurs at most once in any state
– The pair (s, A → α . β): a vertex in the graph
– Edge between items i and j
• Symbols that follow the reducible form of item i should be included in the corresponding set of symbols for item j
• For item A → α . B γ, any symbol in First(γ) can follow each closure item B → . δ
• Propagation edges
– An edge is placed from an item A → α . B γ in state s to item A → α B . γ in state t
– When γ ⇒* λ, any symbol that can follow A can also follow B
• Example:
– Building propagation graph (Fig. 6.29)
– Evaluating propagation graph (Fig. 6.30)
• In general, multiple passes can be required for convergence
– (Fig. 6.31)
– (Fig. 6.32)
– (Fig. 6.33)
• In practice, LALR(1) lookahead computations converge quickly, usually in one or two passes
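Evaluating a propagation graph amounts to a fixed-point computation, which can be sketched as below. The vertex names, initial sets, and edges are invented for illustration and do not come from Fig. 6.29 or Fig. 6.30; in a real construction each vertex would be a (state, item) pair. The function evaluate is likewise a hypothetical name.

```python
def evaluate(item_follow, edges):
    """Propagate lookahead symbols along edges until nothing changes.

    item_follow: dict vertex -> set of initially known symbols
    edges: list of (src, dst); symbols of src must also appear in dst
    Returns (final sets, number of passes over the edge list).
    """
    follow = {v: set(s) for v, s in item_follow.items()}   # work on a copy
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for src, dst in edges:
            if not follow[src] <= follow[dst]:
                follow[dst] |= follow[src]
                changed = True
    return follow, passes


# Hypothetical three-vertex graph containing a cycle.
INITIAL = {"v1": {"$"}, "v2": {"plus"}, "v3": set()}
EDGES = [("v1", "v2"), ("v2", "v3"), ("v3", "v1")]

final, passes = evaluate(INITIAL, EDGES)
print(final, passes)
```

The last pass only confirms that no set changed. Visiting the same edges in a less favorable order (for example, reversed) costs an extra pass here, which illustrates why the number of passes depends on the traversal order.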
• LALR(1) is a powerful parsing method
• LALR(1) grammars are available for all popular programming languages
LR(k) Table Construction
• LR(k) parsing: not very practical because
– LR(1) tables are typically much larger than LR(0) tables (used for SLR(k) and LALR(k))
– It’s rare that LR(1) can handle a grammar for which LALR(1) fails
• Ex. (Fig. 6.35)
• When LALR(1) fails
– Grammar is ambiguous: LR(k) cannot help
– More lookahead needed: LR(k) can help, but LALR(k) might suffice
– No amount of lookahead suffices: LR(k) cannot help
• E.g. item 14
– ItemFollow(14) = {rb, rp}
– ItemFollow(15) = {rb, rp}
– Reduce/reduce conflict
• A state in LR(k) is uniquely identified not only by its kernel, but also by its lookahead
– For LR(k), we extend an item’s notation from A → α . β to [A → α . β, w]
• For LR(1), w is a symbol that can follow A after reduction
• For LR(k), w is a k-length string that can follow A after reduction
– The number of states in LR(k) is usually much larger
• We can also begin with LALR(1), and split states selectively
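The extended item notation [A → α . β, w] can be illustrated with a closure computation for LR(1) items. This is a sketch under several assumptions: the grammar (S' → S, S → C C, C → c C | d) is a hypothetical textbook-style example rather than one from the slides, items are (rule index, dot, lookahead) triples, the grammar has no λ-rules, and the function names are made up.

```python
# Hypothetical grammar; rule 0 is the augmented start rule.
GRAMMAR = [
    ("S'", ("S",)),
    ("S", ("C", "C")),
    ("C", ("c", "C")),
    ("C", ("d",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets by fixed-point iteration (valid only without lambda-rules)."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new = first[rhs[0]] if rhs[0] in NONTERMINALS else {rhs[0]}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def closure_lr1(items):
    """Close a set of LR(1) items (rule, dot, lookahead)."""
    first = first_sets()
    result = set(items)
    work = list(items)
    while work:
        rule, dot, look = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue                      # nothing to expand after the dot
        beta = rhs[dot + 1:]              # symbols after the nonterminal
        if not beta:
            lookaheads = {look}           # inherit the item's own lookahead
        elif beta[0] in NONTERMINALS:
            lookaheads = first[beta[0]]
        else:
            lookaheads = {beta[0]}
        for r, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for x in lookaheads:
                    if (r, 0, x) not in result:
                        result.add((r, 0, x))
                        work.append((r, 0, x))
    return frozenset(result)


start = closure_lr1({(0, 0, "$")})
print(len(start))   # prints 6
```

The same LR(0) item (for instance C → . d) appears once per lookahead symbol, which is why LR(1) states are more numerous than LR(0) states with the same kernels.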
Thanks for Your Attention!