Bottom up parser

Bottom up parsing

Transcript of Bottom up parser

Page 1: Bottom up parser

Bottom up parsing

Page 2: Bottom up parser

Bottom-Up Parsing

Bottom-up parsing is more general than top-down parsing.

A bottom-up parser builds a derivation by working from the input sentence back toward the start symbol S.

It is the preferred method in practice.

It is also called LR parsing:

L means that tokens are read left to right.

R means that it constructs a rightmost derivation (in reverse).

Page 3: Bottom up parser

Bottom-up Parsing

• Two types of bottom-up parsing:

  • Operator-precedence parsing

  • LR parsing, which covers a wide range of grammars:

    • SLR – simple LR parser

    • LR – most general LR parser (canonical LR)

    • LALR – intermediate LR parser (lookahead LR parser)

Page 4: Bottom up parser

Bottom-Up Parsing

LR parsing reduces a string to the start symbol by inverting productions:

str ← input string of terminals

repeat

  Identify β in str such that A → β is a production

  Replace β by A in str

until str = S (the start symbol) OR all possibilities are exhausted
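
A minimal Python sketch of this loop, assuming the grammar used on the next slide (E → E + ( E ) | int) and space-separated tokens. The helper reduce_to_start is hypothetical, not from the slides. "All possibilities are exhausted" becomes a backtracking search over every choice of β, which can take exponential time in general. The search returns some successful sequence of reductions, not necessarily the reverse of a rightmost derivation; choosing the right β without guessing is exactly what handles and LR tables are for.

```python
# Grammar E -> E + ( E ) | int; each production is (head, body-as-tuple).
GRAMMAR = [
    ("E", ("E", "+", "(", "E", ")")),
    ("E", ("int",)),
]


def reduce_to_start(tokens, start="E"):
    """Return the sentential forms from tokens down to (start,), or None if
    every sequence of reductions gets stuck."""
    dead_ends = set()  # forms already known not to reach the start symbol

    def search(form):
        if form == (start,):
            return [form]
        if form in dead_ends:
            return None
        # "Identify beta in str such that A -> beta is a production"
        for head, body in GRAMMAR:
            n = len(body)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == body:
                    # "Replace beta by A in str" and continue from there.
                    path = search(form[:i] + (head,) + form[i + n:])
                    if path is not None:
                        return [form] + path
        dead_ends.add(form)  # all possibilities exhausted for this form
        return None

    return search(tuple(tokens))


steps = reduce_to_start("int + ( int ) + ( int )".split())
for form in steps:
    print(" ".join(form))
```

Every reduction either shortens the form or replaces a terminal by E, so the search always terminates. On this input the printed forms happen to match the sequence on the next slide.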

Page 5: Bottom up parser

Bottom-Up Parsing

Grammar: E → E + ( E ) | int

int + (int) + (int)
E + (int) + (int)
E + (E) + (int)
E + (int)
E + (E)
E

Read from top to bottom, this sequence is a rightmost derivation in reverse.

Page 7: Bottom up parser

Reductions

Bottom-up parsing can be viewed as the process of "reducing" a string w to the start symbol of the grammar.

At each reduction step, a substring that matches the body of a production is replaced by the non-terminal at the head of that production.

Page 8: Bottom up parser

Handle

Handle of a string: a substring that matches the RHS of some production AND whose reduction to the non-terminal on the LHS is a step along the reverse of some rightmost derivation.

Page 9: Bottom up parser

Handles

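Formally:

A handle of a right-sentential form γ is a pair <A → β, location of β in γ>.

That is, A → β is a handle of αβw at the location immediately after the end of α if S ⇒*rm αAw ⇒rm αβw, where ⇒rm denotes a rightmost derivation step, ⇒*rm zero or more such steps, and w contains only terminals.

A right-sentential form of an ambiguous grammar may have more than one handle.

Right-sentential forms of an unambiguous grammar have exactly one handle.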

Page 10: Bottom up parser

Example

Grammar:

S → aABe

A → Abc | b

B → d

Rightmost derivation: S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde

S → aABe is a handle of aABe in location 1.

B → d is a handle of aAde in location 3.

A → Abc is a handle of aAbcde in location 2.

A → b is a handle of abbcde in location 2.
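
The handles listed above can be checked mechanically. This sketch uses a hypothetical helper, handle_of_step, and assumes single-character grammar symbols with upper case marking non-terminals, as in this example. It relies on the fact that in a rightmost derivation the rightmost non-terminal is the one rewritten, so the symbols that replaced it form the handle of the next sentential form.

```python
# Grammar of this example; upper-case letters are non-terminals.
PRODUCTIONS = {("S", "aABe"), ("A", "Abc"), ("A", "b"), ("B", "d")}


def handle_of_step(prev, curr):
    """For one rightmost-derivation step prev => curr, return the handle of
    curr as (head, body, location), with a 1-based location."""
    # The rightmost non-terminal of prev is the one that was expanded.
    i = max(k for k, sym in enumerate(prev) if sym.isupper())
    head = prev[i]
    body = curr[i:i + len(curr) - len(prev) + 1]
    # Sanity checks: the surrounding symbols are unchanged, and head -> body
    # really is a production of the grammar.
    assert curr[:i] == prev[:i] and curr[i + len(body):] == prev[i + 1:]
    assert (head, body) in PRODUCTIONS
    return head, body, i + 1


derivation = ["S", "aABe", "aAde", "aAbcde", "abbcde"]
for prev, curr in zip(derivation, derivation[1:]):
    head, body, loc = handle_of_step(prev, curr)
    print(f"{head} -> {body} is a handle of {curr} in location {loc}")
```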

Page 11: Bottom up parser

Handle Pruning

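A rightmost derivation in reverse can be obtained by "handle pruning."

abbcde: the handle is b at location 2 (reduce by A → b), which gives

aAbcde

In aAbcde, b at location 3 is not a handle: reducing it by A → b gives

aAAcde

... from which no sequence of reductions leads back to S (blocked).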

Page 12: Bottom up parser

Handle-pruning

The process of discovering a handle and reducing it to the appropriate left-hand side is called handle pruning.

Handle pruning forms the basis for a bottom-up parsing method.

To construct a rightmost derivation

S = γ₀ ⇒ γ₁ ⇒ γ₂ ⇒ … ⇒ γₙ₋₁ ⇒ γₙ = w

apply the following simple algorithm:

for i ← n to 1 by -1

  Find the handle Aᵢ → βᵢ in γᵢ

  Replace βᵢ with Aᵢ to generate γᵢ₋₁

Page 13: Bottom up parser

1. For the grammar S -> 0 S 1 | 0 1, indicate the handle in each of the following right-sentential forms:

   a) 000111

   b) 00S11

2. For the grammar S -> S S + | S S * | a, indicate the handle in each of the following right-sentential forms:

   a) SSS + a * +

   b) SS + a * a +

   c) aaa * a + +

3. Give bottom-up parses for the following input strings:

   a) The input 000111 according to the grammar of Exercise 1

   b) The input aaa * a + + according to the grammar of Exercise 2

Page 14: Bottom up parser

Shift-Reduce Parsing with a Stack

Two problems:

locate a handle

decide which production to use (if there is more than one candidate production).

Page 15: Bottom up parser

Shift-reduce Parsing

A shift-reduce parser is a stack automaton with four actions:

Shift: the next word is shifted onto the stack

Reduce: the right end of the handle is at the top of the stack;

  locate the left end of the handle within the stack,

  pop the handle off the stack and push the appropriate left-hand side

Accept: stop parsing and report success

Error: call an error reporting/recovery routine

Accept and Error are simple.

Shift is just a push and a call to the scanner.

Reduce takes |rhs| pops and 1 push.

Page 16: Bottom up parser

x - 2 * y

Stack      Input            Handle   Action
$          id - num * id    none     shift
$ id       - num * id

1. Shift until the top of the stack is the right end of a handle

2. Find the left end of the handle and reduce

0  Goal   → Expr
1  Expr   → Expr + Term
2         | Expr - Term
3         | Term
4  Term   → Term * Factor
5         | Term / Factor
6         | Factor
7  Factor → number
8         | id
9         | ( Expr )

Page 18: Bottom up parser

Back to x - 2 * y

Stack      Input            Handle   Action
$          id - num * id    none     shift
$ id       - num * id       8,1      reduce 8
$ Factor   - num * id       6,1      reduce 6
$ Term     - num * id       3,1      reduce 3
$ Expr     - num * id

Expr is not a handle at this point because the reduction Goal → Expr does not occur at this point in the derivation; the parser shifts instead.

Page 22: Bottom up parser

Back to x - 2 * y

5 shifts + 9 reduces + 1 accept

Stack                   Input            Handle   Action
$                       id - num * id    none     shift
$ id                    - num * id       8,1      reduce 8
$ Factor                - num * id       6,1      reduce 6
$ Term                  - num * id       3,1      reduce 3
$ Expr                  - num * id       none     shift
$ Expr -                num * id         none     shift
$ Expr - num            * id             7,3      reduce 7
$ Expr - Factor         * id             6,3      reduce 6
$ Expr - Term           * id             none     shift
$ Expr - Term *         id               none     shift
$ Expr - Term * id                       8,5      reduce 8
$ Expr - Term * Factor                   4,5      reduce 4
$ Expr - Term                            2,3      reduce 2
$ Expr                                   0,1      reduce 0
$ Goal                                   none     accept
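
The trace above can be replayed with a short Python sketch. The helper find_handle is hypothetical: it hard-codes, for this grammar only, the handle decisions that an LR parser would look up in its ACTION table, using the top of the stack plus one token of lookahead (for example, Term is not reduced while * or / is the next token). It is an illustration under that assumption, not a general technique; on id - num * id it should produce the 5 shifts, 9 reductions and 1 accept shown in the table.

```python
# Expression grammar from the slides; rule numbers match the slides.
# The token "num" stands for the slides' "number".
RULES = {
    0: ("Goal", ["Expr"]),
    1: ("Expr", ["Expr", "+", "Term"]),
    2: ("Expr", ["Expr", "-", "Term"]),
    3: ("Expr", ["Term"]),
    4: ("Term", ["Term", "*", "Factor"]),
    5: ("Term", ["Term", "/", "Factor"]),
    6: ("Term", ["Factor"]),
    7: ("Factor", ["num"]),
    8: ("Factor", ["id"]),
    9: ("Factor", ["(", "Expr", ")"]),
}


def find_handle(stack, lookahead):
    """Hypothetical, grammar-specific stand-in for an ACTION table: return the
    rule whose right-hand side is a handle ending at the top of the stack,
    or None if the parser should shift."""
    top = stack[-1]
    if top == "num":
        return 7
    if top == "id":
        return 8
    if stack[-3:] == ["(", "Expr", ")"]:
        return 9
    if top == "Factor":
        if stack[-3:-1] == ["Term", "*"]:
            return 4
        if stack[-3:-1] == ["Term", "/"]:
            return 5
        return 6
    if top == "Term" and lookahead not in ("*", "/"):
        if stack[-3:-1] == ["Expr", "+"]:
            return 1
        if stack[-3:-1] == ["Expr", "-"]:
            return 2
        return 3
    if stack == ["$", "Expr"] and lookahead == "$":
        return 0
    return None


def shift_reduce(tokens):
    """Shift-reduce parse of tokens; returns the list of actions taken."""
    stack, words = ["$"], tokens + ["$"]
    pos, actions = 0, []
    while True:
        if stack == ["$", "Goal"] and words[pos] == "$":
            actions.append("accept")
            return actions
        rule = find_handle(stack, words[pos])
        if rule is not None:
            head, rhs = RULES[rule]
            del stack[-len(rhs):]  # reduce: |rhs| pops ...
            stack.append(head)     # ... and one push of the left-hand side
            actions.append(f"reduce {rule}")
        elif words[pos] != "$":
            stack.append(words[pos])  # shift: push the next word
            pos += 1
            actions.append("shift")
        else:
            raise SyntaxError(f"no handle and no input left; stack = {stack}")


print(shift_reduce(["id", "-", "num", "*", "id"]))
```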

Page 23: Bottom up parser

Back to x - 2 * y

Corresponding Parse Tree

[Figure: parse tree for x - 2 * y, rooted at Goal, with leaves <id,x>, -, <num,2>, *, <id,y>]

Page 24: Bottom up parser

Conflicts During Shift-Reduce Parsing

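Conflicts: "shift/reduce" (the parser cannot decide whether to shift or to reduce) or "reduce/reduce" (the parser cannot decide which of several reductions to make).

Example:

stmt → if expr then stmt

     | if expr then stmt else stmt

     | other (any other statement)

Stack: … if expr then stmt        Input: else …

Shift/reduce conflict: we cannot tell whether if expr then stmt on top of the stack is a handle (reduce it) or whether else should be shifted.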

Page 25: Bottom up parser

LR Parsing

A bottom-up parser based on a concept called LR(k) parsing:

"L" is for left-to-right scanning of the input,

"R" for constructing a rightmost derivation in reverse,

"k" for the number of input symbols of lookahead that are used in making parsing decisions.

Page 26: Bottom up parser

Why LR Parsers?

For a grammar to be LR, it is sufficient that a left-to-right shift-reduce parser be able to recognize handles of right-sentential forms when they appear on top of the stack.

Page 27: Bottom up parser

Why LR Parsers?

LR parsers can be constructed to recognize virtually all programming-language constructs for which context-free grammars can be written.

The LR-parsing method is the most general non-backtracking shift-reduce parsing method known.

An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.

The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive or LL methods.

Page 28: Bottom up parser

Components of LR Parser

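[Figure: an LR parser consists of an input buffer (e.g. A + B $), a stack (e.g. X, Y, Z, $), a parsing program that produces the output, and a parse table with Action and Goto parts.]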

Page 29: Bottom up parser

Techniques for Creating Parse Table

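SLR: Constructs parsing tables for a small set of grammars, called SLR(1) grammars.

Easy to develop.

CLR (canonical LR): Most powerful.

Generates a large parse table.

More difficult to develop.

Works for the largest class of grammars of the three (all LR(1) grammars), though not for every CFG.

Detects an error without making any reductions first (SLR and LALR parsers may perform a few reductions before reporting it).

LALR (lookahead LR): Widely used method.

Reduces the size of the parse table by merging states that have the same LR(0) core.

Some lookahead information may be lost when states are merged.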

Page 30: Bottom up parser

SLR Parsers

1. Formation of the augmented grammar G' for the given grammar G.

2. Construction of the LR(0) collection of items. To find the LR(0) collection of items, Closure(I) and Goto(I, X) have to be computed.

3. Finding FIRST and FOLLOW of the non-terminals.

4. Construction of the parse table.
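
Step 3 relies on FIRST and FOLLOW sets. Below is a minimal sketch of the usual fixed-point computation, assuming the grammar is stored as a Python dict from each non-terminal to a list of bodies (lists of symbols), with ε-productions written as empty lists and $ as the endmarker; the function names are illustrative. FIRST and FOLLOW are refined together until neither changes, which is safe because both sets only ever grow.

```python
EPS = "ε"

# Augmented expression grammar; each body is a list of symbols, [] means ε.
GRAMMAR = {
    "E'": [["E"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}


def first_of_seq(seq, first):
    """FIRST of a sequence of symbols, given (partial) FIRST sets of non-terminals."""
    result = set()
    for sym in seq:
        sym_first = first[sym] if sym in first else {sym}  # terminal: itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol can vanish (or seq is empty)
    return result


def first_and_follow(grammar, start):
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")  # the endmarker follows the start symbol
    changed = True
    while changed:  # both sets only grow, so iterate to a fixed point
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new_first = first_of_seq(body, first)
                if not new_first <= first[head]:
                    first[head] |= new_first
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in grammar:
                        continue  # FOLLOW is only defined for non-terminals
                    rest = first_of_seq(body[i + 1:], first)
                    new_follow = rest - {EPS}
                    if EPS in rest:  # sym can end the body: add FOLLOW(head)
                        new_follow |= follow[head]
                    if not new_follow <= follow[sym]:
                        follow[sym] |= new_follow
                        changed = True
    return first, follow


first, follow = first_and_follow(GRAMMAR, "E'")
print("FOLLOW(E) =", sorted(follow["E"]))
```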

Page 31: Bottom up parser

Formation of Augmented Grammar

The augmented grammar G' is G with a new start symbol S' and an additional production S' -> S.

For example, for the grammar

E->E+T|T

T->T*F|F

the augmented grammar G' is given by

E’->E

E->E+T|T

T->T*F|F

Page 32: Bottom up parser

Items and the LR(0) Automaton

How does a shift-reduce parser know when to shift and when to reduce?

How does the parser know that the symbol on the top of the stack is not a handle?

An LR parser makes shift-reduce decisions by maintaining states that keep track of where it is in a parse.

States represent sets of "items."

Page 33: Bottom up parser

Items and the LR(0) Automaton

An LR(0) item of a grammar G is a production of G with a dot at some position in the body.

Production A -> XYZ yields the four items

A -> ·XYZ

A -> X·YZ

A -> XY·Z

A -> XYZ·

The production A -> ε generates only one item, A -> ·

An item indicates how much of a production we have seen at a given point in the parsing process.
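
A small sketch of item generation, assuming an item is represented as a (head, body, dot) triple where dot counts the body symbols already seen; items_of and show are hypothetical helper names.

```python
def items_of(head, body):
    """All LR(0) items of head -> body as (head, body, dot) triples; dot is
    the number of body symbols seen so far. An empty body (an ε-production)
    gives exactly one item."""
    body = tuple(body)
    return [(head, body, dot) for dot in range(len(body) + 1)]


def show(item):
    """Render an item with '·' marking the dot."""
    head, body, dot = item
    return f"{head} -> " + " ".join(body[:dot] + ("·",) + body[dot:])


for item in items_of("A", ["X", "Y", "Z"]):
    print(show(item))
print(show(items_of("A", [])[0]))
```

A production with n body symbols yields n + 1 items, which is why A -> XYZ gives four and A -> ε gives one.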

Page 34: Bottom up parser

Items and the LR(0) Automaton

The item A -> ·XYZ indicates that we hope to see a string derivable from XYZ next on the input.

Item A -> X·YZ indicates that we have just seen on the input a string derivable from X and that we hope next to see a string derivable from YZ.

Item A -> XY·Z indicates that we have just seen on the input a string derivable from XY and that we hope next to see a string derivable from Z.

Item A -> XYZ· indicates that we have seen the body XYZ and that it may be time to reduce XYZ to A.

Page 35: Bottom up parser

Items and the LR(0) Automaton

One collection of sets of LR(0) items, called the canonical LR(0) collection, provides the basis for constructing a deterministic finite automaton that is used to make parsing decisions.

This automaton is called an LR(0) automaton.

Each state of the LR(0) automaton represents a set of items in the canonical LR(0) collection.

Page 36: Bottom up parser

Construction of LR(0) Items

Items can be viewed as the states of an NFA.

Items are grouped into sets, and each set becomes one state of a DFA.

The process of grouping items together is called the subset construction algorithm.

The CLOSURE and GOTO operations have to be computed.

Page 37: Bottom up parser

Closure of Item Sets

If I is a set of items for a grammar G, then CLOSURE(I) is the set of items constructed from I by the two rules:

1. Initially, add every item in I to CLOSURE(I).

2. If A -> α·Bβ is in CLOSURE(I) and B -> γ is a production, then add the item B -> ·γ to CLOSURE(I), if it is not already there. Apply this rule until no more new items can be added to CLOSURE(I).

Page 38: Bottom up parser

Computation of CLOSURE

E' -> E
E -> E + T | T
T -> T * F | F
F -> ( E ) | id

I0 = CLOSURE({[E' -> ·E]}):

E' -> ·E
E -> ·E + T
E -> ·T
T -> ·T * F
T -> ·F
F -> ·( E )
F -> ·id

Page 39: Bottom up parser

Computation of CLOSURE

SetOfItems CLOSURE(I) {
    J = I;
    repeat
        for ( each item A -> α·Bβ in J )
            for ( each production B -> γ of G )
                if ( B -> ·γ is not in J )
                    add B -> ·γ to J;
    until no more items are added to J on one round;
    return J;
}
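
The CLOSURE pseudocode translates almost line for line into Python. This sketch assumes (head, body, dot) item triples and hard-codes the augmented expression grammar from the CLOSURE example; applied to {E' -> ·E}, it should yield the seven items of I0.

```python
# Augmented expression grammar from the CLOSURE example.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    """CLOSURE of a set of LR(0) items; an item is (head, body, dot)."""
    result = set(items)  # J = I
    added = True
    while added:  # until no more items are added to J on one round
        added = False
        for _, body, dot in list(result):
            if dot < len(body) and body[dot] in GRAMMAR:  # item A -> α·Bβ
                b = body[dot]
                for prod_body in GRAMMAR[b]:
                    new_item = (b, prod_body, 0)  # item B -> ·γ
                    if new_item not in result:
                        result.add(new_item)
                        added = True
    return frozenset(result)


def show(item):
    head, body, dot = item
    return f"{head} -> " + " ".join(body[:dot] + ("·",) + body[dot:])


I0 = closure({("E'", ("E",), 0)})
for item in sorted(I0):
    print(show(item))
```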

Page 40: Bottom up parser

Kernel items: the initial item, S' -> ·S, and all items whose dots are not at the left end.

Non-kernel items: all items with their dots at the left end, except for S' -> ·S.

Page 41: Bottom up parser

The Function GOTO

GOTO(I, X) is defined to be the closure of the set of all items [A -> αX·β] such that [A -> α·Xβ] is in I.

If I is the set of two items {[E' -> E·], [E -> E· + T]}, then GOTO(I, +) contains the items

E -> E + ·T
T -> ·T * F
T -> ·F
F -> ·(E)
F -> ·id
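
A sketch of GOTO on top of CLOSURE, again assuming (head, body, dot) items and the augmented expression grammar. It also builds the canonical LR(0) collection by applying GOTO to every state and grammar symbol until no new state appears; states are numbered in discovery order, which may differ from textbook numbering. For this grammar the collection should have 12 states.

```python
# Augmented expression grammar; items are (head, body, dot) triples.
GRAMMAR = {
    "E'": [("E",)],
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in GRAMMAR:
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, symbol):
    """GOTO(I, X): closure of all [A -> αX·β] such that [A -> α·Xβ] is in I."""
    moved = {(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == symbol}
    return closure(moved)


def canonical_collection(start_item):
    """Apply GOTO to every state and symbol until no new state appears."""
    symbols = sorted({s for bodies in GRAMMAR.values() for b in bodies for s in b})
    states, transitions = [closure({start_item})], {}
    i = 0
    while i < len(states):
        for x in symbols:
            target = goto(states[i], x)
            if not target:
                continue  # no item has x after its dot
            if target not in states:
                states.append(target)
            transitions[(i, x)] = states.index(target)
        i += 1
    return states, transitions


def show(item):
    head, body, dot = item
    return f"{head} -> " + " ".join(body[:dot] + ("·",) + body[dot:])


item_set = {("E'", ("E",), 1), ("E", ("E", "+", "T"), 1)}
for item in sorted(goto(item_set, "+")):
    print(show(item))
states, transitions = canonical_collection(("E'", ("E",), 0))
print(len(states), "states")
```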

Page 43: Bottom up parser

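Grammar

S -> E + S | E
E -> num

(augmented with S' -> S $)

LR(0) automaton:

State 1: S' -> ·S $, S -> ·E + S, S -> ·E, E -> ·num
State 2: S -> E· + S, S -> E·
State 3: S -> E + ·S, S -> ·E + S, S -> ·E, E -> ·num
State 4: E -> num·
State 5: S -> E + S·
State 6: S' -> S· $
State 7: S' -> S $·

Transitions: 1 on num to 4, 1 on E to 2, 1 on S to 6, 2 on + to 3, 3 on num to 4, 3 on E to 2, 3 on S to 5, 6 on $ to 7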

Page 44: Bottom up parser

state   num      +              $                E    S
1       s4                                       g2   g6
2                s3             r(S -> E)
3       s4                                       g2   g5
4                r(E -> num)    r(E -> num)
5                               r(S -> E + S)
6                               s7
7       accept

Page 45: Bottom up parser

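LR(0) automaton for the grammar

S' -> S
S -> L = R | R
L -> * R | id
R -> L

I0: S' -> ·S, S -> ·L = R, S -> ·R, L -> ·* R, L -> ·id, R -> ·L
I1: S' -> S·
I2: S -> L· = R, R -> L·
I3: L -> id·
I4: S -> R·
I5: L -> *·R, R -> ·L, L -> ·* R, L -> ·id
I6: S -> L = ·R, R -> ·L, L -> ·* R, L -> ·id
I7: R -> L·
I8: L -> * R·
I9: S -> L = R·

Transitions:
GOTO(I0, S) = I1, GOTO(I0, L) = I2, GOTO(I0, R) = I4, GOTO(I0, *) = I5, GOTO(I0, id) = I3
GOTO(I2, =) = I6
GOTO(I5, R) = I8, GOTO(I5, L) = I7, GOTO(I5, *) = I5, GOTO(I5, id) = I3
GOTO(I6, R) = I9, GOTO(I6, L) = I7, GOTO(I6, *) = I5, GOTO(I6, id) = I3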

Page 46: Bottom up parser

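state   id    =               *     $               S   L   R
0       s3                    s5                    1   2   4
1                                   accept
2             s6/r(R -> L)          r(R -> L)
3             r(L -> id)            r(L -> id)
4                                   r(S -> R)
5       s3                    s5                        7   8
6       s3                    s5                        7   9
7             r(R -> L)             r(R -> L)
8             r(L -> * R)           r(L -> * R)
9                                   r(S -> L = R)

The entry s6/r(R -> L) in state 2 is a shift/reduce conflict: because = is in FOLLOW(R), the SLR rules ask the parser both to shift = and to reduce by R -> L, so this grammar is not SLR(1).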

Page 49: Bottom up parser

Structure of the LR Parsing Table

1. The ACTION function takes as arguments a state i and a terminal a (or $, the input endmarker). The value of ACTION[i, a] can have one of four forms:

a) Shift j, where j is a state. The action taken by the parser shifts input a onto the stack, but uses state j to represent a.

b) Reduce A -> β. The action of the parser reduces β on the top of the stack to head A.

c) Accept. The parser accepts the input and finishes parsing.

d) Error. The parser discovers an error in its input and takes some corrective action.

2. The GOTO function, defined on sets of items, is extended to states: if GOTO(Ii, A) = Ij, then GOTO also maps a state i and a nonterminal A to state j.

Page 50: Bottom up parser

Construction of SLR Parse Table

1. Construct C = {I0, I1, …, In}, the collection of sets of LR(0) items for G'.

2. The initial state of the parser is the one constructed from the set of items containing [S' -> ·S].

3. State i is constructed from Ii. The parsing actions for state i are determined as follows:

a) If [A -> α·aβ] is in Ii and GOTO(Ii, a) = Ij, then set ACTION[i, a] to "shift j". Here a must be a terminal.

b) If [A -> α·] is in Ii, then set ACTION[i, a] to "reduce A -> α" for all a in FOLLOW(A); here A may not be S'.

c) If [S' -> S·] is in Ii, then set ACTION[i, $] to "accept".

Page 51: Bottom up parser

Construction of SLR Parse Table

4. GOTO transitions are constructed for all non-terminals: if GOTO(Ii, A) = Ij, then GOTO[i, A] in the parse table is set to j.

5. All other entries are error entries.

If these rules produce conflicting actions for some entry, the grammar is not SLR(1).
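
Putting the pieces together, the following sketch builds an SLR table by these rules for the S -> L = R grammar shown earlier and reports every entry that receives more than one action. It assumes the grammar has no ε-productions (which keeps FIRST simple), and it numbers states in discovery order, so its numbers are not the slide's I0..I9. It should find exactly one conflict: in the state containing S -> L·=R and R -> L·, the entry for = holds both a shift and reduce R -> L.

```python
from collections import defaultdict

# Grammar of the S -> L = R example, augmented with S' -> S.
# Assumption: no ε-productions, so FIRST of a body is FIRST of its first symbol.
GRAMMAR = {
    "S'": [("S",)],
    "S": [("L", "=", "R"), ("R",)],
    "L": [("*", "R"), ("id",)],
    "R": [("L",)],
}
START = "S'"
NONTERMS = set(GRAMMAR)


def closure(items):
    result, work = set(items), list(items)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in NONTERMS:
            for prod in GRAMMAR[body[dot]]:
                item = (body[dot], prod, 0)
                if item not in result:
                    result.add(item)
                    work.append(item)
    return frozenset(result)


def goto(items, x):
    return closure({(h, b, d + 1) for h, b, d in items if d < len(b) and b[d] == x})


def first_of(sym, first):
    return first[sym] if sym in NONTERMS else {sym}


def follow_sets():
    first = {nt: set() for nt in NONTERMS}
    follow = {nt: set() for nt in NONTERMS}
    follow[START].add("$")
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for head, bodies in GRAMMAR.items():
            for body in bodies:
                if not first_of(body[0], first) <= first[head]:
                    first[head] |= first_of(body[0], first)
                    changed = True
                for i, sym in enumerate(body):
                    if sym not in NONTERMS:
                        continue
                    # FIRST of the next symbol, or FOLLOW(head) at the end.
                    new = first_of(body[i + 1], first) if i + 1 < len(body) else follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def slr_table():
    symbols = sorted({s for bodies in GRAMMAR.values() for b in bodies for s in b})
    states = [closure({(START, GRAMMAR[START][0], 0)})]
    action, goto_table = defaultdict(set), {}
    follow = follow_sets()
    i = 0
    while i < len(states):  # states are numbered in discovery order
        for x in symbols:
            target = goto(states[i], x)
            if not target:
                continue
            if target not in states:
                states.append(target)
            j = states.index(target)
            if x in NONTERMS:
                goto_table[(i, x)] = j          # GOTO entry (step 4)
            else:
                action[(i, x)].add(f"s{j}")     # shift entry
        for head, body, dot in states[i]:
            if dot < len(body):
                continue
            if head == START:
                action[(i, "$")].add("accept")  # accept entry
            else:
                for a in follow[head]:          # reduce on FOLLOW(head)
                    action[(i, a)].add(f"r({head}->{''.join(body)})")
        i += 1
    return states, action, goto_table


states, action, goto_table = slr_table()
conflicts = {key: acts for key, acts in action.items() if len(acts) > 1}
print(len(states), conflicts)
```

With the discovery order used here, the conflicting state happens to be number 2, and its shift on = goes to state 8.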

Page 52: Bottom up parser

LR-Parser Configurations

It helps to describe the complete state of the parser: its stack and the remaining input. A configuration of an LR parser is a pair

(s0 s1 … sm, ai ai+1 … an $)

where the first component is the stack contents and the second component is the remaining input.

Page 54: Bottom up parser

LR-Parsing Algorithm

INPUT: An input string w and an LR-parsing table with functions ACTION and GOTO for a grammar G.

OUTPUT: If w is in L(G), the reduction steps of a bottom-up parse for w; otherwise, an error indication.

METHOD: Initially, the parser has s0 on its stack, where s0 is the initial state, and w$ in the input buffer.
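
A sketch of the LR-parsing algorithm as a table-driven loop, using the SLR table for S -> E + S | E, E -> num shown earlier (state 1 is initial). Because that grammar treats $ as an ordinary terminal (S' -> S $), the sketch assumes the parser accepts as soon as state 7 is on top of the stack. The tuple encoding of table entries and the name lr_parse are assumptions for illustration. Each iteration also records the parser configuration (stack, remaining input).

```python
# SLR table for S' -> S $, S -> E + S | E, E -> num (state 1 is initial).
# Shift entries are ("s", state); reduce entries are ("r", head, body).
ACTION = {
    (1, "num"): ("s", 4),
    (2, "+"): ("s", 3),
    (2, "$"): ("r", "S", ("E",)),
    (3, "num"): ("s", 4),
    (4, "+"): ("r", "E", ("num",)),
    (4, "$"): ("r", "E", ("num",)),
    (5, "$"): ("r", "S", ("E", "+", "S")),
    (6, "$"): ("s", 7),
}
GOTO = {(1, "E"): 2, (1, "S"): 6, (3, "E"): 2, (3, "S"): 5}
INITIAL_STATE, ACCEPT_STATE = 1, 7  # assumption: reaching state 7 means accept


def lr_parse(tokens):
    """Return (reductions, configurations) for tokens, or raise SyntaxError."""
    stack = [INITIAL_STATE]   # the parser starts with s0 on its stack
    words = tokens + ["$"]    # ... and w$ in the input buffer
    pos, reductions, configs = 0, [], []
    while stack[-1] != ACCEPT_STATE:
        configs.append((tuple(stack), " ".join(words[pos:])))
        entry = ACTION.get((stack[-1], words[pos]))
        if entry is None:     # blank table entry: error
            raise SyntaxError(f"unexpected {words[pos]!r} in state {stack[-1]}")
        if entry[0] == "s":   # shift: push the new state and advance
            stack.append(entry[1])
            pos += 1
        else:                 # reduce A -> β: pop |β| states, then push GOTO
            _, head, body = entry
            del stack[len(stack) - len(body):]
            stack.append(GOTO[(stack[-1], head)])
            reductions.append(f"{head} -> {' '.join(body)}")
    return reductions, configs


reductions, configs = lr_parse(["num", "+", "num"])
print(reductions)
```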
