Post on 31-May-2020

G52CMP: Lecture 7
Syntactic Analysis: Bottom-Up Parsing

Henrik Nilsson

University of Nottingham, UK

G52CMP: Lecture 7 – p.1/39

This Lecture

• Parsing strategies: top-down and bottom-up.
• Shift-Reduce parsing theory.
• LR(0) parsing.
• LR(0), LR(k), and LALR(k) grammars.

Parsing Strategies

There are two basic strategies for parsing:
• Top-down parsing:
  - Attempts to construct the parse tree from the root downward.
  - Traces out a leftmost derivation.
  - Exemplified by Recursive-Descent Parsing.
• Bottom-up parsing:
  - Attempts to construct the parse tree from the leaves working up toward the root.
  - Traces out a rightmost derivation in reverse.

Top-Down: Leftmost Derivation

Consider the grammar:

S → aABe    A → bcA | c    B → d

Call sequence for predictive parser on abccde:

parseS            S ⇒lm aABe
  read a
  parseA            ⇒lm abcABe
    read b
    read c
    parseA          ⇒lm abccBe
      read c
  parseB            ⇒lm abccde
    read d
  read e
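
The call sequence above can be reproduced with a small predictive recursive-descent parser. The following is only a sketch: the function names (`parse`, `parse_s`, `parse_a`, `parse_b`, `read`) and the `ParseError` class are illustrative choices, not taken from the lecture.

```python
class ParseError(Exception):
    """Raised when the input is not in the language of the grammar."""


def parse(text):
    """Predictive parser for S -> aABe, A -> bcA | c, B -> d.

    Returns the sequence of procedure calls and reads, in order.
    """
    pos = 0
    trace = []

    def read(ch):
        nonlocal pos
        if pos >= len(text) or text[pos] != ch:
            raise ParseError(f"expected {ch!r} at position {pos}")
        trace.append(f"read {ch}")
        pos += 1

    def parse_s():
        trace.append("parseS")
        read("a")
        parse_a()
        parse_b()
        read("e")

    def parse_a():
        trace.append("parseA")
        # One symbol of lookahead selects the production for A.
        if pos < len(text) and text[pos] == "b":
            read("b")          # A -> bcA
            read("c")
            parse_a()
        else:
            read("c")          # A -> c

    def parse_b():
        trace.append("parseB")
        read("d")

    parse_s()
    if pos != len(text):
        raise ParseError(f"unexpected input at position {pos}")
    return trace


print("\n".join(parse("abccde")))
```

Each call to a parsing procedure corresponds to expanding the leftmost nonterminal, which is why the calls trace out a leftmost derivation.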

Shift-Reduce Parsing

Shift-reduce parsing is a general style of bottom-up syntax analysis:

• Works from the leaves toward the root of the parse tree.
• Has two basic actions:
  - Shift (read) next terminal symbol.
  - Reduce a sequence of read terminals and previously reduced nonterminals corresponding to the RHS of a production to the LHS nonterminal of that production.

Bottom-Up: Rightmost Der. in Reverse

Consider (again) the grammar:

S → aABe A → bcA | c B → d

Reduction steps for the sentence abccde to S:

abccde    (reduce by A → c)
abcAde    (reduce by A → bcA)
aAde      (reduce by B → d)
aABe      (reduce by S → aABe)
S

These reductions trace out the following rightmost derivation in reverse:

S ⇒rm aABe ⇒rm aAde ⇒rm abcAde ⇒rm abccde

LL, LR, and LALR parsing (1)

Three important classes of parsing methods:
• LL(k):
  - input scanned Left to right
  - Leftmost derivation
  - k symbols of lookahead
• LR(k):
  - input scanned Left to right
  - Rightmost derivation in reverse
  - k symbols of lookahead
• LALR(k): LookAhead LR, simplified LR parsing

LL, LR, and LALR parsing (2)

By extension, the classes of grammars these methods can handle are also classified as LL(k), LR(k), and LALR(k).

Why study LR and LALR parsing?

• These methods handle a wide class of grammars of practical significance.
• In particular, they handle both left- and right-recursive grammars (but left recursion needs less stack).
• LALR is a good compromise between expressiveness and space cost of implementation.
• Consequently, many parser generator tools are based on LALR.
• We will mainly study LR(0) parsing because it is the simplest, yet uses the same fundamental principles as LR(1) and LALR(1).

Shift-Reduce Parsing Theory (1)

Some terminology:
• An item for a CFG is a production with a dot anywhere in the RHS.

For example, the items for the grammar

S → aAc    A → Ab | ε

are

S → · aAc     A → · Ab
S → a · Ac    A → A · b
S → aA · c    A → Ab ·
S → aAc ·     A → ·
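
Enumerating the items of a grammar is mechanical: place the dot at every position of every right-hand side. The sketch below assumes a hypothetical representation of productions as `(lhs, rhs)` pairs of strings, with the empty string standing for ε; the names `items` and `show` are illustrative only.

```python
def items(productions):
    """Yield every item as (lhs, symbols before the dot, symbols after the dot)."""
    for lhs, rhs in productions:
        for dot in range(len(rhs) + 1):
            yield (lhs, rhs[:dot], rhs[dot:])


def show(item):
    lhs, before, after = item
    return f"{lhs} → {before}·{after}"


# S → aAc, A → Ab | ε (the empty string encodes ε)
GRAMMAR = [("S", "aAc"), ("A", "Ab"), ("A", "")]

for item in items(GRAMMAR):
    print(show(item))
```

A production with n symbols on its right-hand side gives n + 1 items, so the ε-production contributes exactly one item, A → ·.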

Shift-Reduce Parsing Theory (2)

• Recap: Given a CFG G = (N, T, P, S), a string φ ∈ (N ∪ T)* is a sentential form for G iff S ⇒*G φ.
• A right-sentential form is a sentential form that can be derived by a rightmost derivation.
• A handle of a right-sentential form φ is a substring α of φ such that S ⇒*rm δAw ⇒rm δαw and δαw = φ, where α, δ, φ ∈ (N ∪ T)*, and w ∈ T*.

Shift-Reduce Parsing Theory (3)

For example, consider the grammar:

S → aABe    A → bcA | c    B → d

The following is a rightmost derivation:

S ⇒rm aABe ⇒rm aAde ⇒rm abcAde

aABe, aAde and abcAde are right-sentential forms. Handle for each? aABe, d, and bcA.

For an unambiguous grammar, the rightmost derivation is unique. Thus we can talk about “the handle” rather than merely “a handle”.

Shift-Reduce Parsing Theory (4)

• A viable prefix of a right-sentential form φ is any prefix γ of φ ending no farther right than the right end of the handle of φ.
• An item A → α · β is valid for a viable prefix γ if there is a rightmost derivation S ⇒*rm δAw ⇒rm δαβw and δα = γ.
• An item is complete if the dot is the rightmost symbol in the item.

Shift-Reduce Parsing Theory (5)

Consider the grammar

S → aABe    A → bcA | c    B → d

and the rightmost derivation

S ⇒rm aABe ⇒rm aAde ⇒rm abcAde

The right-sentential form abcAde has handle bcA.

Viable prefixes? ε, a, ab, abc, abcA.
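
Once the handle is known, the viable prefixes of a right-sentential form follow directly from the definition: every prefix that ends at or before the right end of the handle. A minimal sketch, assuming the handle is given by its start index and length (the function name and parameters are illustrative):

```python
def viable_prefixes(form, handle_start, handle_len):
    """Prefixes of `form` ending no farther right than the handle's right end."""
    handle_end = handle_start + handle_len
    return [form[:i] for i in range(handle_end + 1)]


# abcAde has handle bcA, starting at index 1; the empty string stands for ε.
print(viable_prefixes("abcAde", 1, 3))
```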

Shift-Reduce Parsing Theory (6)

Last derivation step aAde ⇒rm abcAde by production A → bcA, meaning the handle is bcA.

Valid item for each non-ε viable prefix of abcAde considering this particular derivation only?

Viable prefix    Valid item
a                A → · bcA
ab               A → b · cA
abc              A → bc · A
abcA             A → bcA ·

Any complete valid item?
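
The table can be derived mechanically from the definition of a valid item. For a final derivation step δAw ⇒rm δαβw, each split of the right-hand side into α and β pairs the viable prefix δα with the item A → α · β. The sketch below assumes this single derivation step only; `valid_items` and its parameters are illustrative names.

```python
def valid_items(delta, lhs, rhs):
    """Pair each viable prefix δα with the item A → α·β, where rhs = αβ."""
    return [
        (delta + rhs[:dot], f"{lhs} → {rhs[:dot]}·{rhs[dot:]}")
        for dot in range(len(rhs) + 1)
    ]


# Last step aAde ⇒rm abcAde: δ = a, production A → bcA.
for prefix, item in valid_items("a", "A", "bcA"):
    print(f"{prefix:<6}{item}")
```

The last pair produced is the only one with a complete item: A → bcA · is valid for the viable prefix abcA.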

Shift-Reduce Parsing Theory (7)

Knowing the valid items for a viable prefix allows a rightmost derivation in reverse to be found:

• If A → α · is a complete valid item for a viable prefix γ = δα of a right-sentential form γw (w ∈ T*), then it appears that A → α can be used at the last step, and that the previous right-sentential form is δAw.
• If this indeed always is the case for a CFG G, then for any x ∈ L(G), since x is a right-sentential form, previous right-sentential forms can be determined until S is reached, giving a rightmost derivation of x.

Shift-Reduce Parsing Theory (8)

Of course, if A → α · is a complete valid item for a viable prefix γ = δα, in general, we only know it may be possible to use A → α to derive γw from δAw. For example:

• A → α · may be valid because of a different rightmost derivation S ⇒*rm δAw′ ⇒rm φw′.
• There could be two or more complete items valid for γ.
• There could be a handle of γw that includes symbols of w.

LR(0) Parsing (1)

• A CFG for which knowing a complete valid item is enough to determine the previous right-sentential form is called LR(0).
• For every CFG whatsoever, the set of viable prefixes is regular.
• Thus, an efficient parser can be developed for an LR(0) CFG based on a DFA for recognising viable prefixes and their valid items.
• The states of the DFA are sets of items valid for a recognised viable prefix.

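LR(0) Parsing (2)

A DFA recognising viable prefixes for the CFG

S → aABe    A → bcA | c    B → d

[Figure: DFA diagram. Its states (sets of items) and labelled edges are listed below, using the state names I0 to I9.]

State  Items                                Edges (symbol: target)
I0     S → · aABe                           a: I1
I1     S → a · ABe, A → · bcA, A → · c      A: I6, b: I2, c: I5
I2     A → b · cA                           c: I3
I3     A → bc · A, A → · bcA, A → · c       A: I4, b: I2, c: I5
I4     A → bcA ·
I5     A → c ·
I6     S → aA · Be, B → · d                 B: I8, d: I7
I7     B → d ·
I8     S → aAB · e                          e: I9
I9     S → aABe ·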

LR(0) Parsing (3)

Drawing conventions for “LR DFAs”:
• For the purpose of recognizing the set of viable prefixes, all drawn states are considered accepting.
• Error transitions and error states are not drawn.

LR(0) Parsing (4)

How to construct such a DFA is beyond the scope of this course. See e.g. Aho, Sethi, Ullman (1986) for details. However, some observations:

• Recall that the viable prefixes for the right-sentential form abcAde are ε, a, ab, abc, abcA. They are indeed all recognised by the DFA (all states are considered accepting).
• Recall that the item A → bc · A is valid for the viable prefix abc. The corresponding DFA state indeed contains that item. (Along with more items in this case!)

LR(0) Parsing (5)

• Recall that item A → bcA · is a complete valid item for the viable prefix abcA. The corresponding DFA state indeed contains that item (and only that item).

LR(0) Parsing (6)

Given a DFA recognising viable prefixes, an LR(0) parser can be constructed as follows:

• In a state without complete items: Shift
  - Read next terminal symbol and push it onto an internal parse stack.
  - Move to new state by following the edge labelled by the read terminal.

LR(0) Parsing (7)

• In a state with a single complete item: Reduce
  - The top of the parse stack contains the handle of the current right-sentential form (since we have recognised a viable prefix for which a single complete item is valid).
  - The handle is just the RHS of the valid item.
  - Reduce to the previous right-sentential form by replacing the handle on the parse stack with the LHS of the valid item.
  - Move to the state indicated by the new viable prefix on the parse stack.

LR(0) Parsing (8)

• If a state contains both complete and incomplete items, or if a state contains more than one complete item, then the grammar is not LR(0).
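
The shift, reduce and "not LR(0)" cases can be summarised as a check on the items of a single DFA state. The sketch below assumes items are encoded as hypothetical `(lhs, rhs, dot)` triples; the function names and the conflict labels "shift/reduce" and "reduce/reduce" (the customary names for the two failure cases) are not from the slides.

```python
def is_complete(item):
    _lhs, rhs, dot = item
    return dot == len(rhs)


def lr0_action(state):
    """Decide the LR(0) parser action for a state given as a list of items."""
    complete = [item for item in state if is_complete(item)]
    incomplete = [item for item in state if not is_complete(item)]
    if not complete:
        return ("shift",)
    if len(complete) > 1:
        return ("conflict", "reduce/reduce")
    if incomplete:
        return ("conflict", "shift/reduce")
    lhs, rhs, _dot = complete[0]
    return ("reduce", lhs, rhs)


# Two states of the DFA for S → aABe, A → bcA | c, B → d:
state_after_abc = [("A", "bcA", 2), ("A", "bcA", 0), ("A", "c", 0)]
state_after_abcA = [("A", "bcA", 3)]
print(lr0_action(state_after_abc))
print(lr0_action(state_after_abcA))
```

A grammar is LR(0) exactly when no state of its DFA yields a conflict under this check.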

LR(0) Parsing (9)

[Figure: the DFA for the CFG S → aABe, A → bcA | c, B → d, with its states labelled I0 to I9.]

I0: S → · aABe
I1: S → a · ABe, A → · bcA, A → · c
I2: A → b · cA
I3: A → bc · A, A → · bcA, A → · c
I4: A → bcA ·
I5: A → c ·
I6: S → aA · Be, B → · d
I7: B → d ·
I8: S → aAB · e
I9: S → aABe ·

Note: γw is the current right-sentential form.

State  Stack (γ)  Input (w)  Move
I0     ε          abccde     Shift
I1     a          bccde      Shift

LR(0) Parsing (13)

Complete sequence (γw is right-sentential form):

State  Stack (γ)  Input (w)  Move
I0     ε          abccde     Shift
I1     a          bccde      Shift
I2     ab         ccde       Shift
I3     abc        cde        Shift
I5     abcc       de         Reduce by A → c
I4     abcA       de         Reduce by A → bcA
I6     aA         de         Shift
I7     aAd        e          Reduce by B → d
I8     aAB        e          Shift
I9     aABe       ε          Reduce by S → aABe
       S          ε          Done

Cf: S ⇒rm aABe ⇒rm aAde ⇒rm abcAde ⇒rm abccde
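
The complete sequence can be reproduced by a small table-driven parser. This is a sketch under stated assumptions: the `GOTO` table encodes the edges of the DFA for this grammar, `REDUCE` lists the states that consist of a single complete item, and all Python names are illustrative. For simplicity the state is recomputed by running the DFA over the whole stack after every move, which follows the description "move to the state indicated by the new viable prefix" literally.

```python
class ParseError(Exception):
    pass


# Edges of the DFA; state names I0 to I9 follow the slides.
GOTO = {
    ("I0", "a"): "I1",
    ("I1", "b"): "I2", ("I1", "c"): "I5", ("I1", "A"): "I6",
    ("I2", "c"): "I3",
    ("I3", "b"): "I2", ("I3", "c"): "I5", ("I3", "A"): "I4",
    ("I6", "d"): "I7", ("I6", "B"): "I8",
    ("I8", "e"): "I9",
}
# States holding a single complete item, with the production to reduce by.
REDUCE = {"I4": ("A", "bcA"), "I5": ("A", "c"), "I7": ("B", "d"), "I9": ("S", "aABe")}


def state_for(stack):
    """Run the DFA over the viable prefix held on the parse stack."""
    state = "I0"
    for symbol in stack:
        state = GOTO[(state, symbol)]
    return state


def parse(text):
    """Return the list of (state, stack, input, move) rows of the parse."""
    stack, rest, trace = "", text, []
    while True:
        if stack == "S":
            if rest:
                raise ParseError(f"unexpected input {rest!r}")
            trace.append(("", "S", "", "Done"))
            return trace
        state = state_for(stack)
        if state in REDUCE:
            lhs, rhs = REDUCE[state]
            trace.append((state, stack, rest, f"Reduce by {lhs} → {rhs}"))
            # Replace the handle on top of the stack by the LHS.
            stack = stack[: len(stack) - len(rhs)] + lhs
        else:
            if not rest or (state, rest[0]) not in GOTO:
                raise ParseError(f"parse error in state {state} at {rest!r}")
            trace.append((state, stack, rest, "Shift"))
            stack, rest = stack + rest[0], rest[1:]


for row in parse("abccde"):
    print("{:<4}{:<6}{:<8}{}".format(*row))
```

Practical LR parsers avoid re-running the DFA by keeping a stack of states alongside (or instead of) the stack of symbols.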

LR(0) Parsing (14)

It is even clearer that the parser carries out the rightmost derivation in reverse if we look at the right-sentential forms γw of the reduction steps only:

abccde ⇐rm abcAde ⇐rm aAde ⇐rm aABe ⇐rm S

LR Parsing & Left/Right Recursion (1)

Remark: Note how the right-recursive production

A → bcA

causes symbols bc to pile up on the parse stack until a reduction by

A → c

can occur, in turn allowing the stacked symbols to be reduced away.

LR Parsing & Left/Right Recursion (2)

This is even clearer when parsing strings like

abcbcbccde or abcbcbcbcbccde

Exercise: Try parsing these!

Left-recursion allows reduction to happen sooner, thus keeping the size of the parse stack down. This is why left-recursive grammars often are preferred for LR parsing.
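
To see the stack growth for the right-recursive grammar S → aABe, A → bcA | c, B → d, one can measure the deepest parse stack reached for each input. The sketch below keeps a stack of DFA states rather than symbols (a common implementation choice, not something the slides prescribe); `GOTO`, `REDUCE` and `max_stack_depth` are illustrative names.

```python
GOTO = {
    ("I0", "a"): "I1",
    ("I1", "b"): "I2", ("I1", "c"): "I5", ("I1", "A"): "I6",
    ("I2", "c"): "I3",
    ("I3", "b"): "I2", ("I3", "c"): "I5", ("I3", "A"): "I4",
    ("I6", "d"): "I7", ("I6", "B"): "I8",
    ("I8", "e"): "I9",
}
# Reducing states: LHS nonterminal and length of the RHS (the handle).
REDUCE = {"I4": ("A", 3), "I5": ("A", 1), "I7": ("B", 1), "I9": ("S", 4)}


def max_stack_depth(text):
    """Parse `text` and return the largest number of symbols ever stacked."""
    states, rest, deepest = ["I0"], text, 0
    while True:
        state = states[-1]
        if state in REDUCE:
            lhs, handle_len = REDUCE[state]
            del states[-handle_len:]          # pop the handle
            if lhs == "S":
                if rest:
                    raise ValueError(f"unexpected input {rest!r}")
                return deepest
            states.append(GOTO[(states[-1], lhs)])
        else:
            if not rest or (state, rest[0]) not in GOTO:
                raise ValueError(f"parse error in state {state} at {rest!r}")
            states.append(GOTO[(state, rest[0])])
            rest = rest[1:]
        deepest = max(deepest, len(states) - 1)  # one state per stacked symbol


for text in ["abccde", "abcbcbccde", "abcbcbcbcbccde"]:
    print(text, max_stack_depth(text))
```

For this grammar, every additional bc in the input adds two symbols to the deepest stack, because no reduction is possible until the final c of A has been shifted.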

LR(1) Grammars (1)

• In practice, LR(0) tends to be a bit too restrictive.
• If we add one symbol of “lookahead” by determining the set of terminals that possibly could follow a handle being reduced by a production A → β, then a wider class of grammars can be handled.
• Such grammars are called LR(1).

LR(1) Grammars (2)

Idea:

• Associate a lookahead set with items:

  A → α · β, {a1, a2, . . . , an}

• On reduction, a complete item is only valid if the next input symbol belongs to its lookahead set.
• Thus it is OK to have two or more simultaneously valid complete items, as long as their lookahead sets are disjoint.

(Similar to predictive recursive-descent parsing.)
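
The role of the lookahead sets during a reduction can be sketched as a simple selection step. The encoding of complete items as `(lhs, rhs, lookahead_set)` triples, the function name, and the example state are all hypothetical and serve only to illustrate the idea.

```python
def choose_reduction(complete_items, next_symbol):
    """Pick the complete item whose lookahead set contains the next input symbol.

    Returns (lhs, rhs), or None if no complete item applies.
    """
    matches = [
        (lhs, rhs)
        for lhs, rhs, lookahead in complete_items
        if next_symbol in lookahead
    ]
    if len(matches) > 1:
        raise ValueError("lookahead sets are not disjoint")
    return matches[0] if matches else None


# A made-up state with two complete items and disjoint lookahead sets.
state = [("A", "c", {"d"}), ("B", "c", {"e"})]
print(choose_reduction(state, "d"))
print(choose_reduction(state, "e"))
```

With disjoint lookahead sets, at most one complete item can match, so the reduction is determined by the next input symbol alone.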

LR(k) Grammars

It is possible to have more than one symbol of lookahead:

• In general, a grammar that may be parsed with k symbols of lookahead is called LR(k).
• However, k > 1 does not add to the class of languages that can be defined.

LALR Grammars

A problem with LR(k) parsers is that the DFAs become very large.

• LALR (“lookahead LR”) is a simplified construction that leads to much smaller DFAs.
• The basic idea is to reduce the number of states by merging sets of LR(1) items that are “similar”.
• LALR places some additional constraints on a grammar, but those constraints are not too severe in practice. Most programming languages have LALR grammars.

Summary (1)

• For any CFG, the set of viable prefixes is a regular language, and can thus be recognized by a DFA.
• Each state of this DFA can be labelled by the set of items that are valid for the viable prefixes that particular state accepts.

Summary (2)

• If the grammar is such that each complete item labels a state of its own, then we can find the viable prefix of the previous right-sentential form in a rightmost derivation by replacing the right-hand side of the item, the handle, by the non-terminal to the left of the arrow. This is a reduction step.
• By repeatedly shifting (reading) input symbols and reducing when possible, we have a way to parse the input bottom-up.
• A grammar satisfying the above condition is an LR(0) grammar.