Chap. 6, Bottom-Up Parsing J. H. Wang May 17, 2011.


Outline

• Overview
• Shift-Reduce Parsers
• LR(0) Table Construction
• Conflict Diagnosis
• Conflict Resolution and Table Construction

Overview

• Problems in top-down parsers
  – Left recursion
  – Common prefixes
  – (Fig. 5.12 vs. Fig. 5.16)

• Bottom-up parsers can handle the largest class of grammars that can be parsed deterministically

• A bottom-up parser begins with the parse tree’s leaves and moves toward its root

• A bottom-up parser traces a rightmost derivation in reverse

• A bottom-up parser uses a grammar rule to replace the rule’s RHS with its LHS

• (Fig. 4.5 & Fig. 4.6)

• Bottom-up: from terminal symbols to the goal symbol

• Shift-reduce: the two most prevalent actions
  – Shift: push symbols onto the parse stack
  – Reduce: replace a string of symbols on top of the stack with a nonterminal

• LR(k): scan the input from the left, producing a rightmost derivation in reverse, using k symbols of lookahead

• LR parsers are more general than LL parsers
  – Yacc: an LR parser generator

Shift-Reduce Parsers

• LR parsers and rightmost derivations
  – LR parsers construct rightmost derivations in reverse
  – Fig. 6.2
• LR parsing as knitting
  – How the RHS of a production is found
  – Fig. 6.1
• In Fig. 6.1:
  – Right needle: unprocessed portion of the string
  – Left needle: parser’s stack (processed portion)

• Operations
  – Shift: transfers a symbol from the right needle to the left needle
  – Reduction: the symbols at the top of the parse stack (left needle) that match the RHS of a production A -> γ are replaced by A
  – (Fig. 6.1)

LR Parsing Engine

• A simple driver for a shift-reduce parser
  – Fig. 6.3
  – Driven by a table (Sec. 6.2.4)
  – Indexed by the parser’s current state and the next input symbol
    • Current state: held on top of the parser stack
  – Shift and reduce actions are performed until
    • Accepted: input is reduced to the goal symbol
    • Error: no valid action found

(Fig. 6.3: driver pseudocode; operations used: PUSH, PEEK, POP, ADVANCE, PREPEND, ERROR)

LR Parse Table

• Given a sentential form, the handle is the sequence of symbols that will next be replaced by a reduction
  – How to identify the handle
  – Which production to employ
  – (Fig. 6.4 & Fig. 6.5)
• In Fig. 6.5:
  – [s]: shift to state s
  – r: reduction by rule r
  – Blank: error action
• A bottom-up parse of “a b b d c $”
  – Fig. 6.6 & Fig. 6.7
  – A rightmost derivation in reverse
  – Shift actions are implied by the inability to perform a useful reduction
    • Tokens are shifted until a handle appears
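A minimal sketch of a table-driven shift-reduce loop, to make the "shift until a handle appears, then reduce" cycle concrete. It is not the driver of Fig. 6.3: it uses the common state-stack formulation, and the toy grammar (Start -> E $, E -> plus E E | num), the hand-built tables, the state numbering, and the names `SHIFT`, `REDUCE`, `GOTO`, and `parse` are all assumptions made for illustration.

```python
# Assumed toy grammar:
#   rule 1: E -> plus E E
#   rule 2: E -> num
# with the goal rule Start -> E $.
RULES = {1: ("E", 3), 2: ("E", 1)}  # rule number -> (LHS, length of RHS)

# Hand-built tables for this grammar (hypothetical state numbering).
SHIFT = {
    (0, "plus"): 2, (0, "num"): 3,
    (2, "plus"): 2, (2, "num"): 3,
    (5, "plus"): 2, (5, "num"): 3,
}
REDUCE = {3: 2, 6: 1}  # state -> rule; in LR(0) the state alone decides
GOTO = {(0, "E"): 1, (2, "E"): 5, (5, "E"): 6}
ACCEPT = (1, "$")


def parse(tokens):
    """Return the rule numbers applied, in the order they were reduced."""
    stack = [0]          # stack of states; the top is the current state
    pos = 0              # index of the next unprocessed token
    applied = []
    while True:
        state = stack[-1]
        if state in REDUCE:
            # A handle is on top of the stack: pop it, then follow GOTO on the LHS.
            rule = REDUCE[state]
            lhs, length = RULES[rule]
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
            applied.append(rule)
            continue
        token = tokens[pos] if pos < len(tokens) else None
        if (state, token) == ACCEPT:
            return applied
        if (state, token) not in SHIFT:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
        stack.append(SHIFT[(state, token)])  # shift the token
        pos += 1


print(parse(["plus", "plus", "num", "num", "num", "$"]))  # [2, 2, 1, 2, 1]
```

Read backwards, the printed rule sequence (1, 2, 1, 2, 2) is a rightmost derivation of the input, which is the sense in which the parser traces a rightmost derivation in reverse.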

LR(k) Parsing

• Concept of LR parsing introduced by Knuth in 1965

• LR(k)
  – k: number of lookahead symbols used in constructing the parse table
  – LR(0) and LR(1): one symbol of lookahead at parse time
  – Number of columns in the parse table: n^k (n: size of the terminal alphabet)

• Properties of LR(k) parsers
  – Shift symbols and examine lookahead until the end of the handle is found
  – The handle is reduced to a nonterminal
  – Decide whether to shift or reduce based on the symbols already shifted (left context) and the next k lookahead symbols (right context)
• A grammar is LR(k) iff it is possible to construct an LR parse table such that k tokens of lookahead allow the parser to recognize exactly the strings in the grammar’s language
  – Deterministic: each cell in the LR parse table contains at most one entry

Formal Definition of LR(k) Grammars

• A grammar is LR(k) iff the following conditions imply αAy = γBx
  – S =>*rm αAw =>rm αβw
  – S =>*rm γBx =>rm αβy
  – First_k(w) = First_k(y)
• LR(k) parsers can always determine the correct reduction (A -> β) given
  – The left context (αβ) up to the end of the handle
  – The next k symbols (First_k(w)) of the input

LR(0) Table Construction

• (Fig. 6.2)
  – E -> plus E E
• LR(0) item
  – A grammar production with a bookmark that indicates the current progress through the production’s RHS
    • Fresh: E -> . plus E E
    • Reducible: E -> plus E E .
  – (Fig. 6.8)

• Parser state: a set of LR(0) items
• LR(0) construction algorithm
  – Fig. 6.9 & Fig. 6.10
  – ComputeGoto
    • Closure of state s
    • Transitions from s
  – E.g.: Fig. 6.11
    • Kernel of state s
    • A DFA called the CFSM (characteristic finite-state machine)
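The closure and transition steps can be sketched as follows. This is a hedged illustration, not the pseudocode of Fig. 6.9 and Fig. 6.10: the grammar (Start -> E $, E -> plus E E | num) is assumed, an item is represented as a hypothetical pair (rule index, dot position), and the function names `closure`, `goto`, and `build_cfsm` are chosen for this example only.

```python
# Assumed grammar; each rule is (LHS, RHS).
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("plus", "E", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Add a fresh item B -> . delta for every nonterminal B that follows a dot."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return frozenset(result)


def goto(state, symbol):
    """Advance the dot over `symbol` to get the kernel, then close it."""
    kernel = {(r, d + 1) for r, d in state
              if d < len(GRAMMAR[r][1]) and GRAMMAR[r][1][d] == symbol}
    return closure(kernel) if kernel else None


def build_cfsm():
    """Worklist construction of all states and transitions."""
    start = closure({(0, 0)})
    states = [start]
    index = {start: 0}
    edges = {}
    i = 0
    while i < len(states):
        state = states[i]
        symbols = {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        for x in sorted(symbols):
            target = goto(state, x)
            if target not in index:
                index[target] = len(states)
                states.append(target)
            edges[(i, x)] = index[target]
        i += 1
    return states, edges


states, edges = build_cfsm()
print(len(states), "states")  # 7 states
```

In this sketch the end marker $ is shifted like any other terminal, so the state containing Start -> E $ . is counted among the seven; a construction that treats $ specially would number the states differently.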

(Fig. 6.9 & Fig. 6.10: pseudocode for the LR(0) construction, including ComputeGoto, Closure, AddState, AdvanceDot, ProductionsFor, and ExtractElement)

• The CFSM recognizes its grammar’s viable prefixes
  – Viable prefix: any prefix of a sentential form that does not extend beyond its handle
  – Accept state in the CFSM: a viable prefix that ends with a handle
    • Reduction
    • (Fig. 6.12)
• For an LR(0) grammar, the following properties hold
  – Given a syntactically correct input string, the CFSM will block only in double-boxed states
  – There is at most one item in any double-boxed state
  – If the input string is syntactically invalid, the parser will enter a state in which the offending symbol cannot be shifted
• To complete the parse table
  – (Fig. 6.13 & 6.14)
  – E.g.: (Fig. 6.15)

(Fig. 6.13 & Fig. 6.14: pseudocode for CompleteTable, ComputeLookahead, TryRuleInState, AssertEntry, and ReportConflict)

Conflict Diagnosis

• A parse table conflict arises when the table-construction method cannot decide between multiple alternatives for some table entry
  – Shift/reduce conflicts
  – Reduce/reduce conflicts
• Reasons for conflicts
  – The grammar is ambiguous
  – The grammar is not ambiguous, but the current table-building approach cannot resolve the conflict
    • Use more lookahead
    • Use a more powerful method
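As a rough illustration of how the two kinds of conflict show up in an LR(0) state, the sketch below classifies a state from its items alone. The item representation (LHS, RHS, dot position), the function name `diagnose`, and the example states are assumptions; the second example uses a hypothetical nonterminal F that is not named in the text.

```python
def diagnose(state):
    """Classify LR(0) conflicts in a state given as a set of (lhs, rhs, dot) items."""
    reducible = [item for item in state if item[2] == len(item[1])]
    unfinished = [item for item in state if item[2] < len(item[1])]
    conflicts = []
    if reducible and unfinished:
        # The parser could reduce now or keep shifting toward a longer handle.
        conflicts.append("shift/reduce")
    if len(reducible) > 1:
        # Two different rules are both ready to be reduced.
        conflicts.append("reduce/reduce")
    return conflicts


# State reached on "E plus E" for an assumed ambiguous rule E -> E plus E.
ambiguous = {
    ("E", ("E", "plus", "E"), 3),   # E -> E plus E .
    ("E", ("E", "plus", "E"), 1),   # E -> E . plus E
}
print(diagnose(ambiguous))  # ['shift/reduce']

# Two rules with the same RHS, both complete (hypothetical).
same_rhs = {("E", ("num",), 1), ("F", ("num",), 1)}
print(diagnose(same_rhs))   # ['reduce/reduce']
```

A state that yields an empty list is adequate for LR(0); the others need lookahead (SLR, LALR, LR) or a grammar change before a deterministic table entry can be chosen.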

Ambiguous Grammars

• Using state 5 in Fig. 6.16 as an example, the steps taken to understand conflicts
  – Determine a sequence of vocabulary symbols that causes the parser to move from the start state to the inadequate state
    • E plus E
  – We obtain a snapshot
    • E plus E . plus E
    • (Fig. 6.17)
• Top parse tree
  – Reduction
  – Left-associative grouping for addition
• Bottom parse tree
  – Shift
  – Right-associative grouping for addition
• We eliminate the ambiguity by creating a grammar that favors left association
  – (Fig. 6.18)

Grammars that are not LR(k)

• Reduce/reduce conflict
  – Start =>rm Exprs $
    =>rm E a $
    =>rm E plus num a $
    =>*rm E plus … plus num a $
    =>rm num plus … plus num a $

Conflict Resolution and Table Construction

• Increasingly sophisticated lookahead techniques to resolve conflicts
  – SLR(k): the simplest
  – LALR(k)
  – LR(k): the most powerful

SLR(k) Table Construction

• SLR(k): Simple LR with k tokens of lookahead
  – A grammar that is not LR(0): Fig. 6.20
  – Input string: num plus num times num $
• Replacing a terminal by a nonterminal whose role in the grammar is equivalent
  – (Fig. 6.21)
• LR(0) construction: (Fig. 6.22)
  – Shift/reduce conflict in state 6
    • Shift: (can continue as in Fig. 6.21)
    • Reduce: blocks in state 3
  – E times num $ is not a valid sentential form
  – E -> E plus T is appropriate under some conditions
• For sentential forms
  – E plus T $
  – E plus T plus num $
• If the reduction can lead to a successful parse, then plus can appear next to E in some valid sentential form
  – plus ∈ Follow(E)
  – TryRuleInState(): (Fig. 6.23)
  – SLR(1) parse table: (Fig. 6.24)
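The Follow test can be sketched as a small fixed-point computation. The grammar below is an assumption reconstructed from the rule and sentential forms quoted above (it may differ from Fig. 6.20), it has no λ-productions, and the function names are hypothetical.

```python
# Assumed grammar, consistent with the rule E -> E plus T quoted in the text.
GRAMMAR = [
    ("Start", ["E", "$"]),
    ("E", ["E", "plus", "T"]),
    ("E", ["T"]),
    ("T", ["T", "times", "num"]),
    ("T", ["num"]),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_sets():
    """First sets, assuming no rule has an empty RHS."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            x = rhs[0]
            new = first[x] if x in NONTERMINALS else {x}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first


def follow_sets():
    """Follow sets by iterating until nothing changes."""
    first = first_sets()
    follow = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            for i, x in enumerate(rhs):
                if x not in NONTERMINALS:
                    continue
                if i + 1 < len(rhs):
                    nxt = rhs[i + 1]
                    new = first[nxt] if nxt in NONTERMINALS else {nxt}
                else:
                    new = follow[lhs]  # x ends the RHS: inherit Follow(lhs)
                if not new <= follow[x]:
                    follow[x] |= new
                    changed = True
    return follow


follow = follow_sets()
print(sorted(follow["E"]))  # ['$', 'plus']
```

Under these assumptions, a state containing both E -> E plus T . and T -> T . times num reduces only on plus or $, and shifts on times, because times is not in Follow(E); that is how SLR(1) removes the LR(0) shift/reduce conflict.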

(Fig. 6.23: SLR(1) version of TryRuleInState, which uses AssertEntry)

LALR(k) Table Construction

• Sometimes SLR(k) construction fails only because the Follow_k information is not rule-specific
  – (Fig. 6.25)
  – The grammar is not ambiguous
  – State 3 has a shift/reduce conflict
  – Follow_k(A) = {b$^(k-1), c$^(k-1), $^k}
    • Insufficient to resolve the conflict
    • (Fig. 6.26)
• LALR(k): Lookahead LR with k tokens of lookahead
  – Same number of rows (states) as the LR(0) table
  – The most popular LR table-building method
    • Balance of power and efficiency
  – Redefine two methods
    • TryRuleInState: ItemFollow set (Fig. 6.27)
    • ComputeLookahead: lookahead propagation graph (Fig. 6.28)

(Fig. 6.27: LALR version of TryRuleInState, which uses AssertEntry)

LALR Propagation Graph

• Each LR(0) item occurs at most once in any state
  – The pair (s, A->α.β): a vertex in the graph
  – Edge between items i and j
    • Symbols that follow the reducible form of item i should be included in the corresponding set of symbols for item j
• For item A->α.Bγ, any symbol in First(γ) can follow each closure item B->.δ
• Propagation edges
  – An edge is placed from an item A->α.Bγ in state s to item A->αB.γ in state t
  – When γ=>*λ, any symbol that can follow A can also follow B

• Example:
  – Building the propagation graph (Fig. 6.29)
  – Evaluating the propagation graph (Fig. 6.30)
• In general, multiple passes can be required for convergence
  – (Fig. 6.31)
  – (Fig. 6.32)
  – (Fig. 6.33)

• In practice, LALR(1) lookahead computations converge quickly, usually in one or two passes
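Evaluating a propagation graph amounts to pushing symbols along edges until no set grows. The sketch below shows that fixed-point loop on a made-up three-vertex graph; the vertex names, initial sets, edge order, and the function name `evaluate` are hypothetical and do not correspond to any figure.

```python
def evaluate(initial, edges):
    """Propagate symbols along edges until a full pass changes nothing.

    initial: vertex -> symbols known before propagation (e.g. from First sets)
    edges:   list of (src, dst) pairs; symbols of src must also belong to dst
    Returns the final sets and the number of passes, including the last
    pass that only confirms convergence.
    """
    follow = {v: set(symbols) for v, symbols in initial.items()}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for src, dst in edges:
            if not follow[src] <= follow[dst]:
                follow[dst] |= follow[src]
                changed = True
    return follow, passes


initial = {"a": {"$"}, "b": {"x"}, "c": set()}
edges = [("b", "c"), ("a", "b")]   # visited in an unfavorable order on purpose
follow, passes = evaluate(initial, edges)
print(sorted(follow["c"]), passes)  # ['$', 'x'] 3
```

Here the symbol $ needs one pass to reach b and a second to reach c because the edges are visited against the direction of flow; visiting them in the other order would converge a pass earlier, which is one reason convergence is usually fast in practice.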

• LALR(1) is a powerful parsing method
• LALR(1) grammars are available for all popular programming languages

LR(k) Table Construction

• LR(k) parsing: not very practical because
  – LR(1) tables are typically much larger than the LR(0) tables used for SLR(k) and LALR(k)
  – It’s rare that LR(1) can handle a grammar for which LALR(1) fails
    • Ex. (Fig. 6.35)
• When LALR(1) fails
  – Grammar is ambiguous: LR(k) cannot help
  – More lookahead needed: LR(k) can help, but LALR(k) might suffice
  – No amount of lookahead suffices: LR(k) cannot help

• E.g. item 14
  – ItemFollow(14)={rb, rp}
  – ItemFollow(15)={rb, rp}
  – Reduce/reduce conflict

• A state in LR(k) is uniquely identified not only by its kernel, but also by its lookahead
  – For LR(k), we extend an item’s notation from A->α.β to [A->α.β, w]
    • For LR(1), w is a symbol that can follow A after reduction
    • For LR(k), w is a k-length string that can follow A after reduction
  – The number of states in LR(k) is usually much larger

• We can also begin with LALR(1), and split states selectively
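To see why carrying lookahead in each item multiplies states, here is a hedged sketch of LR(1) closure. The grammar (Start -> E $, E -> plus E E | num), the hand-written `FIRST` table, the item representation (rule index, dot position, lookahead), and the function names are assumptions for illustration; the grammar has no λ-productions, which keeps the First computation trivial.

```python
# Assumed grammar; each rule is (LHS, RHS).
GRAMMAR = [
    ("Start", ("E", "$")),
    ("E", ("plus", "E", "E")),
    ("E", ("num",)),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}
FIRST = {"E": {"plus", "num"}}  # computed by hand for this grammar


def first_of(symbols, lookahead):
    """First of the string `symbols` followed by `lookahead` (no empty rules assumed)."""
    if not symbols:
        return {lookahead}
    x = symbols[0]
    return FIRST[x] if x in NONTERMINALS else {x}


def closure1(items):
    """LR(1) closure: each added item records a symbol that may follow it."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot, lookahead = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in first_of(rhs[dot + 1:], lookahead):
                for i, (lhs, _) in enumerate(GRAMMAR):
                    item = (i, 0, b)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)


start = closure1({(0, 0, None)})      # [Start -> . E $] has no lookahead of its own
after_plus = closure1({(1, 1, "$")})  # kernel [E -> plus . E E, $]
print(len(start), len(after_plus))    # 3 5
```

In the second state the fresh items for E appear once per lookahead symbol (plus and num), whereas the LR(0) state with the same kernel holds each of them once; states that differ only in such lookaheads are the ones LALR merges and full LR(1) keeps apart.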

Thanks for Your Attention!
