Lab 3: Using ML-Yacc
Zhong [email protected]

How to write a parser?
  Write a parser by hand
  Use a parser generator
    May not be as efficient as a hand-written parser
    General and robust
How does it work?
  parser specification -> parser generator -> parser
  stream of tokens -> parser -> abstract syntax

ML-Yacc specification
Three parts again:
  User Declarations: declare values available in the rule actions
  %%
  ML-Yacc Definitions: declare terminals and non-terminals; special declarations to resolve conflicts
  %%
  Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax

ML-Yacc Definitions
  specify type of positions
    %pos int * int
  specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op
  specify end-of-parse token
    %eop EOF
  specify start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog

A Simple ML-Yacc File
  %%
  %term NUM | PLUS | MUL | LPAR | RPAR
  %nonterm exp | fact | base
  %pos int
  %start exp
  %eop EOF
  %%
  exp  : fact ()
       | fact PLUS exp ()
  fact : base ()
       | base MUL fact ()
  base : NUM ()
       | LPAR exp RPAR ()
Slide callouts: the %term and %nonterm lines declare the grammar symbols; the lines after the second %% are the grammar rules; the () after each rule is its semantic action (currently they do nothing).

Each nonterminal may have a semantic value associated with it.
When the parser reduces with (X ::= s), a semantic action is executed; it uses the semantic values of the symbols in s.
When parsing completes successfully, the parser returns the semantic value associated with the start symbol, usually a syntax tree.
To use semantic values during parsing, we must declare symbol types:
  %term NUM of int | PLUS | MUL | ...
  %nonterm exp of int | fact of int | base of int
The type of a semantic action must match the type declared for the nonterminal on the left-hand side of its rule.

A Simple ML-Yacc File with Action
  %%
  %term NUM of int | PLUS | MUL | LPAR | RPAR
  %nonterm exp of int | fact of int | base of int
  %pos int
  %start exp
  %eop EOF
  %%
  exp  : fact (fact)
       | fact PLUS exp (fact + exp)
  fact : base (base)
       | base MUL fact (base * fact)
  base : NUM (NUM)
       | LPAR exp RPAR (exp)
Slide callouts: the grammar symbols now carry type declarations; the grammar rules now carry semantic actions; together the semantic actions compute an integer result.

Conflicts in ML-Yacc
We often write ambiguous grammars.
Example:
  exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR
  Tokens from lexer: NUM PLUS NUM MUL NUM
  State of parser: E+E
  To be read: MUL NUM
  If we shift: Shift E+E*, Shift E+E*E, Reduce E+E, Reduce E. Result is: E+(E*E)
  If we reduce: Reduce E, Shift E*, Shift E*E, Reduce E. Result is: (E+E)*E
This is a shift-reduce conflict. We want E+(E*E), because “*” has higher precedence than “+”.

Another shift-reduce conflict
  Tokens from lexer: NUM PLUS NUM PLUS NUM
  State of parser: E+E
  To be read: PLUS NUM
  If we shift: Shift E+E+, Shift E+E+E, Reduce E+E, Reduce E. Result is: E+(E+E)
  If we reduce: Reduce E, Shift E+, Shift E+E, Reduce E. Result is: (E+E)+E

Deal with shift-reduce conflicts
In this case we need to reduce, because “+” is left associative.
Deal with it!
  let ML-Yacc complain
    the default choice is to shift when it encounters a shift-reduce conflict
    BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
  rewrite the grammar to eliminate ambiguity
    can be complicated and less clear
  use Yacc precedence directives
    %left, %right, %nonassoc

Precedence and Associativity
  precedence of a terminal is based on the order in which associativity is specified: terminals declared in later directives have higher precedence
  precedence of a rule is the precedence of its right-most terminal
    e.g. precedence of (E ::= E + E) == prec(+)
  a shift-reduce conflict is resolved as follows
    prec(terminal) > prec(rule) ==> shift
    prec(terminal) < prec(rule) ==> reduce
    prec(terminal) = prec(rule) ==>
      assoc(terminal) = left ==> reduce
      assoc(terminal) = right ==> shift
      assoc(terminal) = nonassoc ==> report as error
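
The resolution rules above can be written down as a small decision function. This is only a sketch in Standard ML to make the three cases concrete. The names assoc, decision and resolve are made up for illustration and are not ML-Yacc's internals, and precedences are modelled as plain integers where a larger number means higher precedence.

```sml
(* Hypothetical names throughout; this is not ML-Yacc's own code. *)
datatype assoc = Left | Right | NonAssoc
datatype decision = Shift | Reduce | Error

(* termPrec  : precedence of the lookahead terminal
   rulePrec  : precedence of the rule that could be reduced
   termAssoc : associativity declared for the lookahead terminal *)
fun resolve (termPrec : int, rulePrec : int, termAssoc : assoc) : decision =
  if termPrec > rulePrec then Shift
  else if termPrec < rulePrec then Reduce
  else case termAssoc of
         Left => Reduce
       | Right => Shift
       | NonAssoc => Error

(* Assume PLUS is declared before MUL, both %left,
   so PLUS gets precedence 1 and MUL gets precedence 2. *)
val afterPlusSeeMul  = resolve (2, 1, Left)   (* parser holds E+E, next token is MUL  *)
val afterPlusSeePlus = resolve (1, 1, Left)   (* parser holds E+E, next token is PLUS *)
```

Under these assumptions the first call picks Shift, which yields E+(E*E), and the second picks Reduce, which yields (E+E)+E.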

  datatype exp = Int of int
               | Add of exp * exp
               | Sub of exp * exp
               | Mul of exp * exp
               | Div of exp * exp
  %%
  %left PLUS MINUS
  %left MUL DIV
  %%
  exp : NUM (Int NUM)
      | exp PLUS exp (Add (exp1, exp2))
      | exp MINUS exp (Sub (exp1, exp2))
      | exp MUL exp (Mul (exp1, exp2))
      | exp DIV exp (Div (exp1, exp2))
      | LPAR exp RPAR (exp)
Slide callout: the later directive (%left MUL DIV) has the higher precedence.
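
To see why the choice of parse matters, the two trees from the earlier shift-or-reduce example can be built by hand and evaluated. This is a sketch only: eval is a hypothetical helper that is not part of the lab files, and the input 1 + 2 * 3 is an assumed example for the token stream NUM PLUS NUM MUL NUM.

```sml
datatype exp = Int of int
             | Add of exp * exp
             | Sub of exp * exp
             | Mul of exp * exp
             | Div of exp * exp

(* eval is a hypothetical helper used only for this illustration *)
fun eval (Int n) = n
  | eval (Add (a, b)) = eval a + eval b
  | eval (Sub (a, b)) = eval a - eval b
  | eval (Mul (a, b)) = eval a * eval b
  | eval (Div (a, b)) = eval a div eval b

(* Assumed input: 1 + 2 * 3 *)
val shiftTree  = Add (Int 1, Mul (Int 2, Int 3))   (* E+(E*E): MUL binds tighter *)
val reduceTree = Mul (Add (Int 1, Int 2), Int 3)   (* (E+E)*E: the unwanted parse *)

val shiftValue  = eval shiftTree    (* 7 *)
val reduceValue = eval reduceTree   (* 9 *)
```

With MUL declared after PLUS, the parser is expected to build the first tree.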

Reduce-reduce Conflict
This kind of conflict is more difficult to deal with.
Example:
  sequence  ::= (empty) | maybeword | sequence word
  maybeword ::= (empty) | word
When we get a “word” from the lexer, there are two derivations:
  word -> maybeword -> sequence (rule 1)
  empty -> sequence, then sequence word -> sequence (rule 2)
We have more than one way to get “sequence” from the input “word”.
A reduce-reduce conflict means there are two or more rules that apply to the same sequence of input. This usually indicates a serious error in the grammar.
  ML-Yacc reduces by the first rule
  Generally, reduce-reduce conflicts are not allowed in your ML-Yacc file
We need to fix our grammar:
  sequence ::= (empty) | sequence word

Summary of conflicts
  Shift-reduce conflict
    resolve with precedence and associativity
    shift by default
  Reduce-reduce conflict
    reduce by first rule
    Not allowed!

Lab3
Your job is to finish a parser for the C language.
  Input: a “.c” file
  Output: “Success!” if the “.c” file is correct
File description:
  c.lex
  c.grm
  main.sml
  call-main.sml
  sources.cm
  lab3.mlb
  test.c

Using ML-Yacc
Read the ML-Yacc Manual
Run
  Once you finish “c.grm” and “c.lex”, run in the command line (using MLton's mlyacc and mllex):
    mlyacc c.grm
    mllex c.lex
  This produces “c.grm.sig”, “c.grm.sml”, “c.grm.desc” and “c.lex.sml”.
Then compile Lab3
  Start SML/NJ and run
    CM.make "sources.cm";
  or, in the command line,
    mlton lab3.mlb
To run lab3
  In SML/NJ,
    Main.parse "test.c";
  or, in the command line,
    lab3 test.c

“Debug” ML-Yacc File
When you run mlyacc, you'll see error messages if your ML-Yacc file has conflicts. For example:
  mlyacc c.grm
  2 shift/reduce conflicts
Open the file “c.grm.desc” (this file is generated by mlyacc).
The beginning of this file lists the conflicts:
  2 shift/reduce conflicts
  error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
  error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
Here “rule 12” means the 12th rule (counting from 0) in your ML-Yacc file.
The rest of the file lists all the states:
  state 0:
    prog : . structs vdecs preds funcs
    MYSTRUCT   shift 3
    prog       goto 429
    structs    goto 2
    structdec  goto 1
    .          reduce by rule 12

Use ML-Lex with ML-Yacc
Most of the work in “c.lex” this time can be copied from Lab2.
  You can re-use regular expressions and lexical rules
Difference from Lab2
  You have to define the tokens in “c.grm”:
    %term INT of int | EOF
  Each “%term” declaration in “c.grm” automatically appears in “c.grm.sig”:
    signature C_TOKENS =
    sig
      type ('a,'b) token
      type svalue
      val EOF: 'a * 'a -> (svalue,'a) token
      val INT: (int) * 'a * 'a -> (svalue,'a) token
    end
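
On the lexer side, each rule in “c.lex” then returns a token by calling one of these generated constructors. The fragment below is a sketch modelled on the calculator example in the ML-Yacc manual, not the lab's actual “c.lex”. The functor name CLexFun, the pos counter, and the digit and ws definitions are assumptions, and it uses a single int position for both the left and right position arguments.

```sml
(* c.lex sketch; CLexFun, pos, digit and ws are assumed names *)
structure Tokens = Tokens

type pos = int
type svalue = Tokens.svalue
type ('a,'b) token = ('a,'b) Tokens.token
type lexresult = (svalue,pos) token

val pos = ref 0                          (* current line number *)
fun eof () = Tokens.EOF (!pos, !pos)     (* EOF takes left and right positions *)

%%
%header (functor CLexFun(structure Tokens: C_TOKENS));
digit=[0-9];
ws=[\ \t];
%%
\n       => (pos := !pos + 1; lex());
{ws}+    => (lex());
{digit}+ => (Tokens.INT (valOf (Int.fromString yytext), !pos, !pos));
```

The INT rule passes three arguments because the signature gives INT the type (int) * 'a * 'a: the semantic value first, then the two positions.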

Hints
  Read the ML-Yacc Manual
  Read the language specification
  Test a lot!