ML-YACC

David Walker

COS 320

Outline

• Last Week: Introduction to Lexing, CFGs, and Parsing

• Today: More parsing
  – automatic parser generation via ML-Yacc
  – Reading: Chapter 3 of Appel

Parser Implementation

• Implementation Options:

1. Write a parser from scratch
  – not as boring as writing a lexer, but not exactly a weekend in the Bahamas

2. Use a parser generator
  – Very general and robust. Sometimes not quite as efficient as hand-written parsers. Nevertheless, good for lazy compiler writers.

[Figure: Parser Specification -> parser generator -> Parser; at run time, stream of tokens -> Parser -> abstract syntax]

ML-Yacc specification

• three parts:

User Declarations: declare values available in the rule actions

%%

ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts

%%

Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax

ML-Yacc declarations (preliminaries)

• specify type of positions
    %pos int * int

• specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op

• specify end-of-parse token
    %eop EOF

• specify start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog

attribute-grammars

• ML-Yacc uses an attribute-grammar scheme
  – each nonterminal may have a semantic value associated with it
  – when the parser reduces with (X ::= s)
     • a semantic action will be executed
     • uses semantic values from symbols in s
  – when parsing is completed successfully
     • parser returns semantic value associated with the start symbol
     • usually a parse tree

attribute-grammars

• semantic actions typically build the abstract syntax for the internal language

• to use semantic values during parsing, we must declare symbol types:
  – %term NUM of int | PLUS | MUL | ...
  – %nonterm exp of int | fact of int | base of int

• type of semantic action must match type declared for LHS nonterminal in rule

ML-Yacc with Semantic Actions

datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

%%
...
%%

exp : fact                  (fact)
    | fact PLUS exp         (Add (fact, exp))

fact : base                 (base)
     | base MUL fact        (Mul (base, fact))

base : NUM                  (Int NUM)
     | LPAR exp RPAR        (exp)

(the parenthesized code is computing abstract syntax via semantic actions)
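To make the role of semantic actions concrete, here is a minimal Python sketch that builds the same kind of abstract syntax. It is not ML-Yacc and not an LR parser: it swaps in a hand-written recursive-descent parser over a layered exp / fact / base grammar, with each `return` playing the part of a semantic action. The class names mirror the `exp` datatype; the token encoding (Python ints for NUM, the strings '+', '*', '(' and ')') is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Int:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Mul:
    left: object
    right: object


def parse(tokens):
    """Parse a token list such as [2, '+', 3, '*', 4] into an AST."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def exp():
        # exp : fact | fact PLUS exp
        left = fact()
        if peek() == '+':
            advance()
            return Add(left, exp())      # "semantic action": Add (fact, exp)
        return left

    def fact():
        # fact : base | base MUL fact
        left = base()
        if peek() == '*':
            advance()
            return Mul(left, fact())     # "semantic action": Mul (base, fact)
        return left

    def base():
        # base : NUM | LPAR exp RPAR
        tok = advance()
        if tok == '(':
            inner = exp()
            assert advance() == ')', "expected ')'"
            return inner
        return Int(tok)                  # "semantic action": Int NUM

    tree = exp()
    assert pos == len(tokens), "unexpected trailing input"
    return tree
```

For example, `parse([2, '+', 3, '*', 4])` builds an `Add` node whose right child is a `Mul` node, because the layering of the grammar forces multiplication to bind tighter.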

A simpler grammar

%%
...
%%

exp : NUM                   (Int NUM)
    | exp PLUS exp          (Add (exp1, exp2))
    | exp MUL exp           (Mul (exp1, exp2))
    | LPAR exp RPAR         (exp)

why don’t we just use this simpler grammar?

A simpler grammar

this grammar is ambiguous!

NUM + NUM * NUM has two parse trees:

  parse tree 1: (NUM + NUM) * NUM
  parse tree 2: NUM + (NUM * NUM)
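The ambiguity can be checked mechanically. The sketch below is an illustration only: it enumerates every parse tree of a flat token list under the rules exp ::= NUM | exp + exp | exp * exp (parentheses are left out to keep it short), representing a tree as a nested tuple `(op, left, right)`. The function names are hypothetical.

```python
def all_parses(tokens):
    """Return every parse tree of tokens under exp ::= NUM | exp + exp | exp * exp."""
    if len(tokens) == 1:
        return [tokens[0]]               # exp ::= NUM
    trees = []
    # Operators sit at the odd positions; each one may be the root of the tree.
    for i in range(1, len(tokens), 2):
        op = tokens[i]
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append((op, left, right))
    return trees


def evaluate(tree):
    """Evaluate a parse tree built by all_parses."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = evaluate(left), evaluate(right)
    return l + r if op == '+' else l * r


trees = all_parses([1, '+', 2, '*', 3])
```

Here `trees` holds two distinct trees, one with `+` at the root and one with `*` at the root, and they evaluate to different numbers. That is why the choice of parse tree matters and cannot be left to chance.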

A simpler grammar

But it is so clean that it would be nice to use. Moreover, we know which parse tree we want. We just need a mechanism to specify it!

[Figure: the two parse trees for NUM + NUM * NUM: (NUM + NUM) * NUM and NUM + (NUM * NUM)]

Recall how LR parsing works:

exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

Input from lexer: NUM + NUM * NUM

Desired parse tree: NUM + (NUM * NUM)

State of parse so far: E + E      yet to read: * NUM

We have a shift-reduce conflict. What should we do to get the right parse? SHIFT.

  action        state of parse so far     yet to read
  (conflict)    E + E                     * NUM
  SHIFT         E + E *                   NUM
  SHIFT         E + E * NUM               (nothing)
  REDUCE        E + E * E                 (nothing)
  REDUCE        E + E                     (nothing)
  REDUCE        E                         (nothing)

The alternative parse

Input from lexer: NUM + NUM * NUM

State of parse so far: E + E      yet to read: * NUM

We have a shift-reduce conflict. Suppose we REDUCE next.

  action                    state of parse so far     yet to read
  (conflict)                E + E                     * NUM
  REDUCE                    E                         * NUM
  SHIFT SHIFT REDUCE        E * E                     (nothing)
  REDUCE                    E                         (nothing)

Resulting parse tree: (NUM + NUM) * NUM
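The two traces differ only in the decision taken at the conflict. The following sketch is a toy shift-reduce loop, not ML-Yacc's actual LR machinery: it assumes the input is a flat list of numbers separated by operators, treats each number as already reduced to E, and asks a caller-supplied policy (the hypothetical `on_conflict` parameter) whether to shift or reduce whenever E op E is on the stack and more input remains.

```python
def shift_reduce(tokens, on_conflict):
    """Parse tokens like [1, '+', 2, '*', 3] into nested (left, op, right) tuples.

    on_conflict(stack_op, next_op) must return 'shift' or 'reduce'.
    """
    rest = list(tokens)
    stack = [rest.pop(0)]                # shift the first NUM, reduce it to E
    while rest or len(stack) >= 3:
        if len(stack) >= 3 and rest:
            # E op E on the stack and an operator is next: the conflict.
            action = on_conflict(stack[-2], rest[0])
        elif len(stack) >= 3:
            action = 'reduce'            # no input left, only reductions remain
        else:
            action = 'shift'             # nothing to reduce yet
        if action == 'reduce':
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((left, op, right))
        else:
            stack.append(rest.pop(0))    # shift the operator
            stack.append(rest.pop(0))    # shift the NUM, reduce it to E
    return stack[0]


always_shift = shift_reduce([1, '+', 2, '*', 3], lambda s, n: 'shift')
always_reduce = shift_reduce([1, '+', 2, '*', 3], lambda s, n: 'reduce')
```

Shifting at the conflict groups the multiplication first, while reducing groups the addition first, matching the two traces.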


Summary

Input from lexer: NUM + NUM * NUM

State of parse so far: E + E      yet to read: * NUM

Desired parse tree: NUM + (NUM * NUM)

We have a shift-reduce conflict. We have E + E on the stack and we see *. We want to shift. We ALWAYS want to shift since * has higher precedence than + ==> symbols to the right on the stack get processed first.


Example 2

exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR

Input from lexer: NUM - NUM - NUM

State of parse so far: E - E      yet to read: - NUM

We have a shift-reduce conflict. We have E - E on the stack and we see -. We want “-” to be a left-associative operator, ie: NUM - NUM - NUM == ((NUM - NUM) - NUM). What do we do? REDUCE.

  action                    state of parse so far     yet to read
  (conflict)                E - E                     - NUM
  REDUCE                    E                         - NUM
  SHIFT SHIFT REDUCE        E - E                     (nothing)
  REDUCE                    E                         (nothing)

Resulting parse tree: ((NUM - NUM) - NUM)


Example 2: Summary

Input from lexer: NUM - NUM - NUM

Desired parse tree: ((NUM - NUM) - NUM)

We have a shift-reduce conflict. We have E - E on the stack and we see -. What do we do? REDUCE. We ALWAYS want to reduce since - is left-associative.
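A small arithmetic check shows why the choice matters for subtraction. This sketch assumes a plain list of numbers joined by "-" and uses two hypothetical helper functions: one groups the way an always-reduce parser does, the other the way an always-shift parser does.

```python
from functools import reduce


def eval_reduce_first(nums):
    """Always REDUCE at the conflict: left-associative, ((a - b) - c)."""
    return reduce(lambda acc, n: acc - n, nums)


def eval_shift_first(nums):
    """Always SHIFT at the conflict: right-associative, (a - (b - c))."""
    if len(nums) == 1:
        return nums[0]
    return nums[0] - eval_shift_first(nums[1:])
```

With the input 10 - 4 - 3, reducing first gives (10 - 4) - 3 = 3, while shifting first gives 10 - (4 - 3) = 9. Only the first matches the usual meaning of subtraction.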

precedence and associativity

• three solutions to dealing with operator precedence and associativity:

1) let Yacc complain
  • its default choice is to shift when it encounters a shift-reduce conflict
  • BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant

2) rewrite the grammar to eliminate ambiguity
  • can be complicated and less clear

3) use Yacc precedence directives
  • %left, %right, %nonassoc

precedence and associativity

• given directives, ML-Yacc assigns precedence to each terminal and rule
  – precedence of terminal based on order in which associativity is specified (terminals in later directives have higher precedence)
  – precedence of rule is the precedence of the right-most terminal
     • eg: precedence of (E ::= E + E) == prec(+)

• a shift-reduce conflict is resolved as follows
  – prec(terminal) > prec(rule) ==> shift
  – prec(terminal) < prec(rule) ==> reduce
  – prec(terminal) = prec(rule) ==>
     • assoc(terminal) = left ==> reduce
     • assoc(terminal) = right ==> shift
     • assoc(terminal) = nonassoc ==> report as error
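The resolution rules above can be written down as a short function. This is a sketch of the stated rules, not ML-Yacc's implementation. It assumes directives are given as a list of (associativity, terminals) pairs in source order, that terminals are the upper-case symbols of a rule, and that a conflict with no precedence information falls back to the default of shifting. All function names are hypothetical.

```python
def build_tables(directives):
    """Map each terminal to its precedence level and associativity.

    Later directives get higher precedence levels.
    """
    prec, assoc = {}, {}
    for level, (kind, terminals) in enumerate(directives, start=1):
        for t in terminals:
            prec[t] = level
            assoc[t] = kind
    return prec, assoc


def rule_precedence(rhs, prec):
    """Precedence of a rule: that of its right-most terminal (None if undeclared)."""
    terminals = [sym for sym in rhs if sym.isupper()]
    return prec.get(terminals[-1]) if terminals else None


def resolve(rhs, lookahead, prec, assoc):
    """Decide a shift-reduce conflict between reducing by rhs and shifting lookahead."""
    rp = rule_precedence(rhs, prec)
    tp = prec.get(lookahead)
    if rp is None or tp is None:
        return 'shift'                   # no directive applies: default is shift
    if tp > rp:
        return 'shift'
    if tp < rp:
        return 'reduce'
    # Equal precedence: associativity of the terminal decides.
    return {'left': 'reduce', 'right': 'shift', 'nonassoc': 'error'}[assoc[lookahead]]


prec, assoc = build_tables([('left', ['PLUS', 'MINUS']), ('left', ['MUL', 'DIV'])])
```

With these tables, `resolve(['exp', 'PLUS', 'exp'], 'MUL', prec, assoc)` picks shift and `resolve(['exp', 'PLUS', 'exp'], 'MINUS', prec, assoc)` picks reduce, which are the two decisions worked out by hand in the earlier examples.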

General situation:

  RHS of rule on stack: ... E % E        input, terminal T next: T E ... (yet to read)

Example 1: precedence directives give prec(MUL) > prec(PLUS)

  RHS of rule on stack: ... E PLUS E     yet to read: MUL E ...

  prec(MUL) > prec(PLUS) ==> SHIFT

Example 2: precedence directives give prec(PLUS) = prec(SUB)

  RHS of rule on stack: ... E PLUS E     yet to read: SUB E ...

  prec(PLUS) = prec(SUB), and SUB is left-associative ==> REDUCE

the fix

datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp | Uminus of exp

%%

%left PLUS MINUS
%left MUL DIV
%left UMINUS

%%

exp : NUM                          (Int NUM)
    | MINUS exp %prec UMINUS       (Uminus exp)
    | exp PLUS exp                 (Add (exp1, exp2))
    | exp MINUS exp                (Sub (exp1, exp2))
    | exp MUL exp                  (Mul (exp1, exp2))
    | exp DIV exp                  (Div (exp1, exp2))
    | LPAR exp RPAR                (exp)

  RHS of rule on stack: ... MINUS E (the unary minus rule)     yet to read: MUL E ...

changing precedence of rule alters decision:

prec(UMINUS) > prec(MUL) ==> we REDUCE
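The effect of %prec can be seen by comparing the decision with and without the override. This sketch hard-codes the precedence levels implied by the three %left directives of "the fix" (all terminals are left-associative, so equal precedence means reduce). The `decide` function and its `prec_override` parameter are hypothetical names used only for illustration.

```python
# Levels implied by: %left PLUS MINUS / %left MUL DIV / %left UMINUS
PREC = {'PLUS': 1, 'MINUS': 1, 'MUL': 2, 'DIV': 2, 'UMINUS': 3}


def decide(rhs_terminals, lookahead, prec_override=None):
    """Resolve a shift-reduce conflict for a rule whose terminals are rhs_terminals.

    prec_override plays the role of a %prec annotation on the rule.
    """
    rule_terminal = prec_override or rhs_terminals[-1]
    if PREC[lookahead] > PREC[rule_terminal]:
        return 'shift'
    return 'reduce'                      # lower precedence, or equal and %left


without_prec = decide(['MINUS'], 'MUL')
with_prec = decide(['MINUS'], 'MUL', prec_override='UMINUS')
```

Without the annotation the rule MINUS exp inherits the low precedence of MINUS, so the parser shifts MUL and - a * b groups as - (a * b). With %prec UMINUS the rule outranks MUL, so the parser reduces and the input groups as (- a) * b.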

the dangling else problem

• Grammar:
    S ::= if E then S else S | if E then S | ...

• Consider: if a then if b then S else S
  – parse 1: if a then (if b then S else S)
  – parse 2: if a then (if b then S) else S

• Parser reports a shift-reduce conflict
  – in default behavior: shift (what we want, since shifting the else attaches it to the nearest if, giving parse 1)

default behavior of ML-Yacc

• Shift-Reduce conflict
  – shift

• Reduce-Reduce conflict
  – reduce by first rule
  – generally considered unacceptable

• for assignment 3, your job is to write a grammar for Fun such that there are no conflicts
  – you may use precedence directives tastefully

Note: To enter ML-Yacc hell, use a parser to catch type errors

• when doing assignment 3, your job is to catch parse errors

• there are lots of programming errors that will slip by the parser:
  – eg: 3 + true
  – catching these sorts of errors is the job of the type checker
  – just as catching program structure errors was the job of the parser, not the lexer
  – attempting to do type checking in the parser is impossible (in general)
     • why? Hint: what does “context-free grammar” imply?