ML-YACC
David Walker
COS 320
Outline
• Last week
  – Introduction to lexing, CFGs, and parsing
• Today
  – More parsing: automatic parser generation via ML-Yacc
  – Reading: Chapter 3 of Appel
Parser Implementation
• Implementation options:
  1. Write a parser from scratch
     – not as boring as writing a lexer, but not exactly a weekend in the Bahamas
  2. Use a parser generator
     – very general and robust; sometimes not quite as efficient as hand-written parsers, but good for lazy compiler writers
[Figure: parser specification → parser generator → parser; the parser turns a stream of tokens into abstract syntax]
ML-Yacc specification
• three parts:
User Declarations: declare values available in the rule actions
%%
ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts
%%
Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax
ML-Yacc declarations (preliminaries)
• specify type of positions
    %pos int * int
• specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op
• specify end-of-parse token
    %eop EOF
• specify start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog
Simple ML-Yacc Example
%%
%term NUM | PLUS | MUL | LPAR | RPAR | EOF
%nonterm exp | fact | base
%pos int
%start exp
%eop EOF
%%
exp  : fact ()  | fact PLUS exp ()
fact : base ()  | base MUL fact ()
base : NUM ()   | LPAR exp RPAR ()
Notes: the %term and %nonterm lines declare the grammar symbols; the lines after the second %% are the grammar rules; each () is a semantic action, which currently does nothing.
attribute-grammars
• ML-Yacc uses an attribute-grammar scheme
  – each nonterminal may have a semantic value associated with it
  – when the parser reduces with (X ::= s)
    • a semantic action will be executed
    • it uses the semantic values of the symbols in s
  – when parsing completes successfully
    • the parser returns the semantic value associated with the start symbol
    • usually a parse tree
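The reduce step described above can be sketched in SML. This is a hypothetical model, not ML-Yacc's internals: the names `value`, `rule`, `popN`, and `reduce` are illustrative assumptions. The stack holds one semantic value per grammar symbol; reducing by X ::= s pops the values of s, runs the rule's action on them, and pushes the result as X's value.

```sml
(* Hypothetical sketch of an LR parser's semantic-value stack; ML-Yacc's
   generated parser has its own internal representation. *)

(* Semantic values: numbers carry an int, tokens like PLUS carry nothing. *)
datatype value = Num of int | NoValue

(* A rule X ::= s, modelled by |s| and an action over the RHS values (left to right). *)
type rule = { rhsLength : int, action : value list -> value }

(* Pop n values off the stack (top of stack = head of the list). *)
fun popN (0, stack) = ([], stack)
  | popN (n, v :: rest) =
      let val (popped, rest') = popN (n - 1, rest)
      in (v :: popped, rest') end
  | popN (_, []) = raise Fail "value stack underflow"

(* Reducing by X ::= s: pop the values of s, run the action, push X's value.
   The popped list is top-first, so reverse it to get left-to-right order. *)
fun reduce ({ rhsLength, action } : rule, stack : value list) : value list =
  let val (popped, rest) = popN (rhsLength, stack)
  in action (rev popped) :: rest end

(* exp ::= exp PLUS exp  with the action (exp1 + exp2) *)
val plusRule : rule =
  { rhsLength = 3,
    action = fn [Num a, NoValue, Num b] => Num (a + b)
              | _ => raise Fail "unexpected values for exp ::= exp PLUS exp" }

(* Value stack for "E + E" after reading 1 + 2, top of stack first. *)
val stackAfter = reduce (plusRule, [Num 2, NoValue, Num 1])   (* [Num 3] *)
```

Keeping values on a stack parallel to the parse means each action only ever sees the values of its own right-hand side, which is exactly the attribute-grammar discipline.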
attribute-grammars
• semantic actions typically build the abstract syntax for the internal language
• to use semantic values during parsing, we must declare symbol types:
  – %term NUM of int | PLUS | MUL | ...
  – %nonterm exp of int | fact of int | base of int
• the type of a semantic action must match the type declared for the LHS nonterminal of its rule
ML-Yacc with Semantic Actions
%%
%term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
%nonterm exp of int | fact of int | base of int
%pos int
%start exp
%eop EOF
%%
exp  : fact (fact) | fact PLUS exp (fact + exp)
fact : base (base) | base MUL fact (base * fact)
base : NUM (NUM)   | LPAR exp RPAR (exp)
Notes: the grammar symbols now carry type declarations, and the grammar rules carry semantic actions that compute the integer result.
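To see what these actions compute, here is a hand-written recursive-descent parser for the grammar exp ::= fact | fact PLUS exp, fact ::= base | base MUL fact, base ::= NUM | LPAR exp RPAR, with one SML function per nonterminal returning the same integer the actions would. This is a sketch under assumptions: the `token` datatype is hypothetical, and ML-Yacc itself generates a table-driven LALR(1) parser rather than code like this.

```sml
(* Hypothetical tokens, mirroring %term NUM of int | PLUS | MUL | LPAR | RPAR | EOF *)
datatype token = NUM of int | PLUS | MUL | LPAR | RPAR | EOF

(* One function per nonterminal. Each consumes a prefix of the token list and
   returns (semantic value, remaining tokens). *)

(* exp ::= fact | fact PLUS exp      action: fact + exp *)
fun exp toks =
      let val (f, rest) = fact toks
      in case rest of
           PLUS :: rest' => let val (e, rest'') = exp rest' in (f + e, rest'') end
         | _ => (f, rest)
      end
(* fact ::= base | base MUL fact     action: base * fact *)
and fact toks =
      let val (b, rest) = base toks
      in case rest of
           MUL :: rest' => let val (f, rest'') = fact rest' in (b * f, rest'') end
         | _ => (b, rest)
      end
(* base ::= NUM | LPAR exp RPAR      actions: NUM, exp *)
and base (NUM n :: rest) = (n, rest)
  | base (LPAR :: rest) =
      (case exp rest of
         (e, RPAR :: rest') => (e, rest')
       | _ => raise Fail "expected RPAR")
  | base _ = raise Fail "expected NUM or LPAR"

(* The whole input must be an exp followed by the end-of-parse token. *)
fun parse toks =
  case exp toks of
    (v, [EOF]) => v
  | _ => raise Fail "unexpected input after exp"

val seven = parse [NUM 1, PLUS, NUM 2, MUL, NUM 3, EOF]
```

Because fact sits below exp in the grammar, the product 2 * 3 is finished before the addition sees it, so `seven` evaluates to 7.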
ML-Yacc with Semantic Actions
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
%% ... %%
exp  : fact (fact)   | fact PLUS exp (Add (fact, exp))
fact : base (base)   | base MUL fact (Mul (base, fact))
base : NUM (Int NUM) | LPAR exp RPAR (exp)
Note: the semantic actions now compute abstract syntax.
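With these actions the parser returns a tree rather than a number. A small SML sketch of the tree built for the input 1 + 2 * 3 (NUM values 1, 2, 3), plus a hypothetical `show` function that prints it fully parenthesized so the grouping chosen by the grammar is visible:

```sml
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

(* Fully parenthesised rendering, so the tree structure is visible. *)
fun show (Int n) = Int.toString n
  | show (Add (a, b)) = "(" ^ show a ^ " + " ^ show b ^ ")"
  | show (Mul (a, b)) = "(" ^ show a ^ " * " ^ show b ^ ")"

(* The tree for 1 + 2 * 3: exp ::= fact PLUS exp at the root,
   and the product built from base MUL ... underneath it. *)
val tree = Add (Int 1, Mul (Int 2, Int 3))
val rendered = show tree
```

Here `rendered` is "(1 + (2 * 3))". Building a tree instead of computing an integer lets later compiler phases (type checking, code generation) work on the program without re-parsing it.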
A simpler grammar
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp
%% ... %%
exp : NUM            (Int NUM)
    | exp PLUS exp   (Add (exp1, exp2))
    | exp MUL exp    (Mul (exp1, exp2))
    | LPAR exp RPAR  (exp)
Why don’t we just use this simpler grammar?
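This grammar is ambiguous! NUM + NUM * NUM has two parse trees:
  – E ::= E + E, whose right E ::= E * E, i.e. NUM + (NUM * NUM)
  – E ::= E * E, whose left E ::= E + E, i.e. (NUM + NUM) * NUM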
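Why the ambiguity matters: the two trees mean different things. A minimal SML sketch with a hypothetical evaluator `eval`, taking the NUM values to be 1, 2, and 3:

```sml
datatype exp = Int of int | Add of exp * exp | Mul of exp * exp

fun eval (Int n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Mul (a, b)) = eval a * eval b

(* The two parse trees of 1 + 2 * 3 under the ambiguous grammar. *)
val plusAtRoot  = Add (Int 1, Mul (Int 2, Int 3))   (* NUM + (NUM * NUM) *)
val timesAtRoot = Mul (Add (Int 1, Int 2), Int 3)   (* (NUM + NUM) * NUM *)

val v1 = eval plusAtRoot
val v2 = eval timesAtRoot
```

The first tree evaluates to 7 and the second to 9, so a parser that picks arbitrarily would change the meaning of programs.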
But it is so clean that it would be nice to use. Moreover, we know which parse tree we want: NUM + (NUM * NUM). We just need a mechanism to specify it!
Recall how LR parsing works:
exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
Input from lexer: NUM + NUM * NUM
State of parse so far (stack): E + E        yet to read: * NUM
[Figure: desired parse tree NUM + (NUM * NUM), with the elements parsed so far highlighted]
We have a shift-reduce conflict. What should we do to get the right parse?
Answer: SHIFT.
Stack: E + E *        yet to read: NUM
SHIFT again.
Stack: E + E * NUM        yet to read: (nothing)
REDUCE (exp ::= NUM).
Stack: E + E * E
REDUCE (exp ::= exp MUL exp).
Stack: E + E
REDUCE (exp ::= exp PLUS exp).
Stack: E, which is the desired parse tree NUM + (NUM * NUM).
The alternative parse
Input from lexer: NUM + NUM * NUM
Stack: E + E        yet to read: * NUM
We have a shift-reduce conflict. Suppose we REDUCE next.
REDUCE (exp ::= exp PLUS exp).
Stack: E, built from NUM + NUM        yet to read: * NUM
Now: SHIFT, SHIFT, REDUCE (exp ::= NUM).
Stack: E * E        yet to read: (nothing)
REDUCE (exp ::= exp MUL exp).
Stack: E, but the tree is (NUM + NUM) * NUM, not the one we wanted.
Summary
Input from lexer: NUM + NUM * NUM
Stack: E + E        yet to read: * NUM
Desired parse tree: NUM + (NUM * NUM)
We have a shift-reduce conflict: E + E is on the stack and we see *. We want to shift. We ALWAYS want to shift here, since * has higher precedence than +
==> symbols further to the right on the stack get reduced first
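The decision rule in this summary can be simulated by a small hand-written shift-reduce loop. This is an operator-precedence style sketch under stated assumptions, not ML-Yacc's LALR(1) tables: the token names, the precedence levels, and the rule "shift iff the next operator has strictly higher precedence than the operator on the stack" (which makes every operator left-associative) are all illustrative. Each NUM is shifted and immediately reduced by exp ::= NUM, so the trace records both steps.

```sml
(* Hypothetical tokens for  exp ::= NUM | exp PLUS exp | exp MINUS exp | exp MUL exp *)
datatype token = NUM of int | PLUS | MINUS | MUL

datatype exp = Int of int | Bin of string * exp * exp

(* Stack entries: a parsed E, or an operator token waiting for its right operand. *)
datatype entry = E of exp | Op of token

(* Assumed precedence levels: + and - share the lower level, * is higher. *)
fun prec MUL = 2
  | prec _ = 1

fun name PLUS = "+" | name MINUS = "-" | name MUL = "*" | name (NUM _) = "NUM"

fun show (Int n) = Int.toString n
  | show (Bin (oper, a, b)) = "(" ^ show a ^ " " ^ oper ^ " " ^ show b ^ ")"

(* Returns the tree and the list of SHIFT/REDUCE actions taken. *)
fun parse tokens =
  let
    (* stack and trace are both kept newest-first *)
    fun loop ([E e], [], trace) = (e, rev trace)
      | loop (E b :: Op oper :: E a :: rest, [], trace) =
          (* input exhausted: reduce by exp ::= exp op exp *)
          loop (E (Bin (name oper, a, b)) :: rest, [], "REDUCE" :: trace)
      | loop (stack, NUM n :: input, trace) =
          (case stack of
             E _ :: _ => raise Fail "syntax error: two operands in a row"
           | _ => (* shift NUM, then reduce by exp ::= NUM *)
               loop (E (Int n) :: stack, input, "REDUCE" :: "SHIFT" :: trace))
      | loop (stack as E b :: Op oper :: E a :: rest, t :: input, trace) =
          (* shift-reduce conflict: E oper E on the stack, operator t next *)
          if prec t > prec oper
          then loop (Op t :: stack, input, "SHIFT" :: trace)
          else loop (E (Bin (name oper, a, b)) :: rest, t :: input, "REDUCE" :: trace)
      | loop (stack as [E _], t :: input, trace) =
          loop (Op t :: stack, input, "SHIFT" :: trace)
      | loop _ = raise Fail "syntax error"
  in
    loop ([], tokens, [])
  end

val (t1, trace1) = parse [NUM 1, PLUS, NUM 2, MUL, NUM 3]
```

Here `show t1` is "(1 + (2 * 3))". From the point where E + E is on the stack, the trace continues SHIFT, SHIFT, REDUCE, REDUCE, REDUCE, matching the walk-through above; with an equal-precedence operator next (for example a second -), the same loop reduces instead.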
Example 2
exp ::= NUM | exp PLUS exp | exp MUL exp | exp MINUS exp | LPAR exp RPAR
Input from lexer: NUM - NUM - NUM
Stack: E - E        yet to read: - NUM
We have a shift-reduce conflict: E - E is on the stack and we see -.
We want "-" to be a left-associative operator, i.e. NUM - NUM - NUM == ((NUM - NUM) - NUM).
What do we do?
Answer: REDUCE (exp ::= exp MINUS exp).
Stack: E, built from NUM - NUM        yet to read: - NUM
SHIFT, SHIFT, REDUCE (exp ::= NUM).
Stack: E - E        yet to read: (nothing)
REDUCE (exp ::= exp MINUS exp).
Stack: E, i.e. ((NUM - NUM) - NUM)
Example 2: Summary
Input from lexer: NUM - NUM - NUM, parsed as ((NUM - NUM) - NUM)
We have a shift-reduce conflict: E - E is on the stack and we see -. What do we do? REDUCE. We ALWAYS want to reduce here, since - is left-associative.
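Why the choice matters: grouping the same chain of subtractions differently changes the result. A minimal SML sketch with hypothetical helper names `subLeft` and `subRight`; reducing whenever another - arrives yields the left grouping, while always shifting would yield the right grouping.

```sml
(* Left-associative grouping: ((n0 - n1) - n2) - ... *)
fun subLeft (n :: ns) = foldl (fn (x, acc) => acc - x) n ns
  | subLeft [] = raise Fail "empty chain"

(* Right-associative grouping: n0 - (n1 - (n2 - ...)) *)
fun subRight [n] = n
  | subRight (n :: ns) = n - subRight ns
  | subRight [] = raise Fail "empty chain"

val left  = subLeft  [10, 4, 3]   (* (10 - 4) - 3 *)
val right = subRight [10, 4, 3]   (* 10 - (4 - 3) *)
```

`left` is 3 and `right` is 9; only the left grouping agrees with ordinary arithmetic.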
precedence and associativity
• three solutions to dealing with operator precedence and associativity:
  1) let Yacc complain
     • its default choice is to shift when it encounters a shift-reduce conflict
     • BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
  2) rewrite the grammar to eliminate ambiguity
     • can be complicated and less clear
  3) use Yacc precedence directives
     • %left, %right, %nonassoc
precedence and associativity
• given the directives, ML-Yacc assigns a precedence to each terminal and rule
  – the precedence of a terminal is based on the order in which its associativity is declared (later declarations have higher precedence)
  – the precedence of a rule is the precedence of its right-most terminal
    • e.g. precedence of (E ::= E + E) == prec(+)
• a shift-reduce conflict is resolved as follows:
  – prec(terminal) > prec(rule) ==> shift
  – prec(terminal) < prec(rule) ==> reduce
  – prec(terminal) = prec(rule) ==>
    • assoc(terminal) = left ==> reduce
    • assoc(terminal) = right ==> shift
    • assoc(terminal) = nonassoc ==> report as error
Situation: the RHS of a rule, ... E % E (where % is some operator), is on the stack; the next input terminal T is yet to read: T E ...
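The resolution rules above fit in one small SML function. This is a sketch: `assoc`, `action`, and `resolve` are hypothetical names, and precedence levels are plain integers where larger means binds tighter.

```sml
datatype assoc = Left | Right | Nonassoc
datatype action = Shift | Reduce | Error

(* Resolve a shift-reduce conflict between the next input terminal and the
   rule whose RHS is on the stack. Larger numbers mean higher precedence. *)
fun resolve (termPrec : int, termAssoc : assoc, rulePrec : int) : action =
  if termPrec > rulePrec then Shift
  else if termPrec < rulePrec then Reduce
  else (case termAssoc of
          Left => Reduce
        | Right => Shift
        | Nonassoc => Error)

(* Assumed levels for  %left PLUS MINUS  %left MUL DIV : later lines are higher. *)
val precPlus = 1
val precMul = 2

(* ... E PLUS E on the stack; the rule's precedence is prec(PLUS). *)
val onMul   = resolve (precMul, Left, precPlus)    (* MUL binds tighter *)
val onMinus = resolve (precPlus, Left, precPlus)   (* same level, %left *)
```

`onMul` is `Shift` and `onMinus` is `Reduce`, the two decisions worked through on the following slides.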
precedence and associativity
datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp
%%
%left PLUS MINUS
%left MUL DIV
%%
exp : NUM            (Int NUM)
    | exp PLUS exp   (Add (exp1, exp2))
    | exp MINUS exp  (Sub (exp1, exp2))
    | exp MUL exp    (Mul (exp1, exp2))
    | exp DIV exp    (Div (exp1, exp2))
    | LPAR exp RPAR  (exp)
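How the two %left lines turn into precedence levels can be sketched as a lookup table. Everything here is a hypothetical model of the rules stated on the previous slide (levels numbered by line order, rule precedence from the right-most terminal), not ML-Yacc's own code; the terminal list is an assumption about this grammar.

```sml
datatype assoc = Left | Right | Nonassoc

(* The directive lines, in file order:  %left PLUS MINUS  then  %left MUL DIV *)
val directives = [(Left, ["PLUS", "MINUS"]), (Left, ["MUL", "DIV"])]

(* Give every terminal on the i-th line (counting from 1) the level i,
   so terminals declared later bind tighter. *)
fun levels decls =
  let
    fun go (_, []) = []
      | go (i, (a, terms) :: rest) = map (fn t => (t, (i, a))) terms @ go (i + 1, rest)
  in
    go (1, decls)
  end

val table = levels directives

fun lookup t =
  case List.find (fn (t', _) => t' = t) table of
    SOME (_, info) => SOME info
  | NONE => NONE

(* Assumed terminal set for this grammar; every other symbol is a nonterminal. *)
val terminals = ["NUM", "PLUS", "MINUS", "MUL", "DIV", "LPAR", "RPAR"]
fun isTerminal s = List.exists (fn t => t = s) terminals

(* A rule's precedence is that of its right-most terminal, if that terminal has one. *)
fun rulePrec rhs =
  case List.filter isTerminal (rev rhs) of
    t :: _ => (case lookup t of SOME (p, _) => SOME p | NONE => NONE)
  | [] => NONE
```

For example, `lookup "MUL"` gives `SOME (2, Left)` and `rulePrec ["exp", "MUL", "exp"]` gives `SOME 2`, while `LPAR exp RPAR` gets no precedence because RPAR was never declared in a directive.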
precedence and associativity
RHS of rule on stack: ... E PLUS E        input: terminal T next (yet to read): MUL E ...
precedence directives: %left PLUS MINUS
                       %left MUL DIV
prec(MUL) > prec(PLUS)
SHIFT
precedence and associativity
RHS of rule on stack: ... E PLUS E        input: terminal T next (yet to read): MINUS E ...
precedence directives: %left PLUS MINUS
                       %left MUL DIV
prec(PLUS) = prec(MINUS)
REDUCE
one more example
datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp | Uminus of exp
%%
%left PLUS MINUS
%left MUL DIV
%%
exp : NUM            (Int NUM)
    | MINUS exp      (Uminus exp)
    | exp PLUS exp   (Add (exp1, exp2))
    | exp MINUS exp  (Sub (exp1, exp2))
    | exp MUL exp    (Mul (exp1, exp2))
    | exp DIV exp    (Div (exp1, exp2))
    | LPAR exp RPAR  (exp)
RHS of rule on stack: ... MINUS E        yet to read: MUL E ...
What happens?
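The rule MINUS exp gets prec(MINUS), and prec(MUL) > prec(MINUS) ==> we SHIFT, so - NUM * NUM groups as -(NUM * NUM) rather than (-NUM) * NUM.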
the fix
datatype exp = Int of int | Add of exp * exp | Sub of exp * exp | Mul of exp * exp | Div of exp * exp | Uminus of exp
%%
%left PLUS MINUS
%left MUL DIV
%left UMINUS
%%
exp : NUM                      (Int NUM)
    | MINUS exp %prec UMINUS   (Uminus exp)
    | exp PLUS exp             (Add (exp1, exp2))
    | exp MINUS exp            (Sub (exp1, exp2))
    | exp MUL exp              (Mul (exp1, exp2))
    | exp DIV exp              (Div (exp1, exp2))
    | LPAR exp RPAR            (exp)
RHS of rule on stack: ... MINUS E        yet to read: MUL E ...
changing the precedence of the rule alters the decision:
prec(UMINUS) > prec(MUL) ==> we REDUCE
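The effect of %prec can be checked with a few lines of SML. This sketch hard-codes assumed levels for the three %left lines of the fix; `level`, `rulePrec`, and `decide` are hypothetical helpers, and `decide` only handles the left-associative case needed here.

```sml
datatype action = Shift | Reduce

(* Assumed levels for  %left PLUS MINUS  %left MUL DIV  %left UMINUS  *)
fun level "PLUS" = 1
  | level "MINUS" = 1
  | level "MUL" = 2
  | level "DIV" = 2
  | level "UMINUS" = 3
  | level t = raise Fail ("no precedence declared for " ^ t)

(* A rule's precedence: its %prec terminal if one is given,
   otherwise its right-most terminal. *)
fun rulePrec (rightmost : string, precOverride : string option) : int =
  level (getOpt (precOverride, rightmost))

(* Every operator here is %left, so a tie reduces. *)
fun decide (next : string, rule : string * string option) : action =
  if level next > rulePrec rule then Shift else Reduce

(* "... MINUS E" on the stack, MUL next *)
val withoutPrec = decide ("MUL", ("MINUS", NONE))
val withPrec    = decide ("MUL", ("MINUS", SOME "UMINUS"))
```

`withoutPrec` is `Shift` (the problem) and `withPrec` is `Reduce` (the fix): only the rule's precedence changed, while the MINUS token itself keeps its binary-operator precedence.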
the dangling else problem
• Grammar: S ::= if E then S else S | if E then S | ...
• Consider: if a then if b then S else S
  – parse 1: if a then (if b then S else S)
  – parse 2: if a then (if b then S) else S
• Parser reports a shift-reduce conflict
  – default behavior: shift (which gives parse 1, what we want)
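A hand-written recursive-descent parser gets the same answer as ML-Yacc's default shift: after "if c then S" it grabs an ELSE as soon as it sees one, so the else attaches to the innermost open if. A sketch under assumptions: the `token` and `stmt` datatypes are hypothetical, and conditions and simple statements are just names.

```sml
(* Hypothetical tokens: conditions and simple statements are just names. *)
datatype token = IF | THEN | ELSE | COND of string | STMT of string

datatype stmt = IfThen of string * stmt
              | IfThenElse of string * stmt * stmt
              | Simple of string

(* S ::= if E then S else S | if E then S | s
   After "if c then S", an ELSE that comes next is consumed immediately,
   the analogue of choosing SHIFT. *)
fun stmt (IF :: COND c :: THEN :: rest) =
      let val (s1, rest1) = stmt rest
      in case rest1 of
           ELSE :: rest2 =>
             let val (s2, rest3) = stmt rest2
             in (IfThenElse (c, s1, s2), rest3) end
         | _ => (IfThen (c, s1), rest1)
      end
  | stmt (STMT s :: rest) = (Simple s, rest)
  | stmt _ = raise Fail "syntax error"

(* if a then if b then s1 else s2 *)
val (tree, leftover) =
  stmt [IF, COND "a", THEN, IF, COND "b", THEN, STMT "s1", ELSE, STMT "s2"]
```

`tree` is `IfThen ("a", IfThenElse ("b", Simple "s1", Simple "s2"))`, i.e. parse 1.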
• Alternative solution is to rewrite the grammar (M: "matched" statements, in which every then has its else; U: "unmatched" ones):
    S ::= M | U
    M ::= if E then M else M | ...
    U ::= if E then S | if E then M else U
default behavior of ML-Yacc
• Shift-reduce conflict
  – shift
• Reduce-reduce conflict
  – reduce by the rule that appears first in the grammar
  – generally considered unacceptable
• for assignment 3, your job is to write a grammar for Fun such that there are no conflicts
  – you may use precedence directives tastefully
Note: To enter ML-Yacc hell, use a parser to catch type errors
• when doing assignment 3, your job is to catch parse errors
• there are lots of programming errors that will slip by the parser:
  – e.g. 3 + true
  – catching these sorts of errors is the job of the type checker
  – just as catching program-structure errors was the job of the parser, not the lexer
  – attempting to do type checking in the parser is impossible (in general)
    • why? Hint: what does "context-free grammar" imply?
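A minimal illustration, assuming a hypothetical AST extended with booleans: the tree for 3 + true is perfectly well-formed, so the parser has no reason to reject it; a separate pass over the finished tree does. The names `check`, `ty`, and `TypeError` are illustrative, not part of the assignment code.

```sml
(* Hypothetical AST extended with booleans. *)
datatype exp = Int of int | Bool of bool | Add of exp * exp
datatype ty = IntTy | BoolTy

exception TypeError of string

(* A separate pass over the finished tree: + needs two int operands. *)
fun check (Int _) = IntTy
  | check (Bool _) = BoolTy
  | check (Add (a, b)) =
      (case (check a, check b) of
         (IntTy, IntTy) => IntTy
       | _ => raise TypeError "operands of + must be int")

(* The tree a parser could build for  3 + true : syntactically fine. *)
val parsed = Add (Int 3, Bool true)

(* The type checker, not the parser, rejects it. *)
val rejected = (ignore (check parsed); false) handle TypeError _ => true
```

`rejected` is `true`. The checker needs to know the type of each subtree, information that flows across the whole tree, which a context-free rule cannot consult.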