Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.
![Page 1: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/1.jpg)
Lesson 10
CDT301 – Compiler Theory, Spring 2011
Teacher: Linus Källberg
![Page 2: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/2.jpg)
Outline
• Flex
• Bison
• Abstract syntax trees
![Page 3: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/3.jpg)
FLEX
![Page 4: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/4.jpg)
Flex
• Tool for automatic generation of scanners
• Open-source version of Lex
• Takes regular expressions as input
• Outputs a C (or C++) file for the scanner
![Page 5: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/5.jpg)
Flex
[Diagram: Regexps (mylexer.l) → Flex → int yylex() … (mylexer.c) → C compiler → 01101000110101010… (mylexer.obj)]
![Page 6: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/6.jpg)
The input file to Flex
Definitions
%%
Rules
%%
User code
![Page 7: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/7.jpg)
The definitions section
• Macro definitions:
  – Specify a letter:
    letter [A-Za-z]
  – Specify a delimiter:
    delimiter [ ,:;.]
  – Specify a digit:
    digit [0-9]
  – Specify an identifier (macros are expanded only when written inside braces):
    id {letter}({letter}|{digit})*
![Page 8: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/8.jpg)
The definitions section
• User code:
  %{
  #include <stdio.h>
  int a_nice_global_variable = 0;
  int my_favourite_function(void) { return 42; }
  %}
![Page 9: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/9.jpg)
The rules section
• Rule = regexp + C code
• Longest matching pattern is used
• If two equally long patterns match, the first one in the file is used
• Examples:
  =|>=?|<(=|>)? { return RELOP; }
  {id} { return ID; }
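The rule-selection policy above (longest match, ties broken by file order) can be sketched in plain C. This is only an illustration of the policy, not the code Flex generates; the token codes and the helper names `match_if`, `match_id` and `pick_rule` are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes for this sketch; NO_MATCH means no rule matched. */
enum { NO_MATCH = 0, IF_KW = 1, ID = 2 };

/* Rule 1: the literal keyword "if". Returns the match length, or 0. */
static size_t match_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

/* Rule 2: letter(letter|digit)*. Returns the match length, or 0. */
static size_t match_id(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Try every rule in file order and keep the longest match. A later rule
   replaces the current best only if it is strictly longer, so ties go
   to the rule listed first. */
int pick_rule(const char *s, size_t *len) {
    size_t (*rules[])(const char *) = { match_if, match_id };
    int tokens[] = { IF_KW, ID };
    int best = NO_MATCH;
    *len = 0;
    for (size_t i = 0; i < 2; i++) {
        size_t n = rules[i](s);
        if (n > *len) { *len = n; best = tokens[i]; }
    }
    return best;
}
```

On the input `if (` both rules match two characters, so the keyword rule wins because it is listed first; on `iffy` the identifier rule wins because its match is longer.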
![Page 10: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/10.jpg)
The regexp language of Flex
?   Previous regexp is optional
{}  Macro expansion (defined in the definitions section)
.   Matches any character that is not end of line
$   Matches the end of a line
^   Matches the beginning of a line
[]  Matches any enclosed character
![Page 11: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/11.jpg)
The [] syntax
• Similar to | but more powerful
• Example:
  digit [0123456789]
  is the same as
  digit 0|1|2|3|4|5|6|7|8|9
• Special characters inside the brackets: - (range) and ^ (negation, when placed first)
  digit [0-9]
  letter [A-Za-z]
  non_digit [^0-9]
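To make the meaning of ranges and negation concrete, here is a minimal C sketch of a membership test for a bracket expression. The function `in_class` is hypothetical and far simpler than what Flex does (no escapes, no named classes); `spec` is the text between `[` and `]`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sketch of bracket-expression matching. `spec` is the text between
   [ and ], e.g. "A-Za-z" or "^0-9". Escapes are not handled. */
bool in_class(const char *spec, char c) {
    bool negate = (spec[0] == '^');
    bool found = false;
    size_t i = negate ? 1 : 0;
    while (spec[i] != '\0') {
        if (spec[i + 1] == '-' && spec[i + 2] != '\0') {
            /* Range such as 0-9: inclusive on both ends. */
            if (spec[i] <= c && c <= spec[i + 2]) found = true;
            i += 3;
        } else {
            /* Single listed character. */
            if (spec[i] == c) found = true;
            i += 1;
        }
    }
    return negate ? !found : found;
}
```

With this sketch, `in_class("0-9", '7')` and `in_class("0123456789", '7')` give the same answer, which mirrors the equivalence of the two `digit` definitions above.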
![Page 12: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/12.jpg)
The user code section
• Only C code valid here
• Will be copied unchanged to the generated C file
![Page 13: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/13.jpg)
The generated scanner
• By default, a function called yylex() is defined
  – Works similarly to your GetNextToken() from lab 1
  – The name can be changed with options
• Some globals are defined as well (can be changed into local variables with options):
  yyin      The file to read from
  yytext    The matched lexeme (char*)
  yyleng    The length of yytext
  yylineno  Line number of the match
![Page 14: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/14.jpg)
The yywrap() function
• Called upon end-of-file
• Should be supplied by the user
• Suppressed with %option noyywrap or --noyywrap
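A user-supplied yywrap() can be as small as the following sketch, which assumes the scanner reads a single input and should simply stop at end-of-file.

```c
/* Returning nonzero tells the scanner that there is no more input,
   so scanning ends at end-of-file. */
int yywrap(void) {
    return 1;
}
```

A version that returns 0 would instead have to arrange for more input (for example by pointing yyin at another file) before returning.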
![Page 15: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/15.jpg)
Scanner states in Flex
• Affect which tokens are recognized
• Example from the language ALF:
  { fref 32 DEADC0DE }   <- Identifier
  { hex_val DEADC0DE }   <- Hex constant
![Page 16: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/16.jpg)
Scanner states in Flex
• Declare state:
  %x READ_HEX
• Use the state to make rules conditional:
  hex_val { BEGIN(READ_HEX); return HEX_VAL_KW; }
  [a-zA-Z_][a-zA-Z0-9_]* { return ID; }
  <READ_HEX>[0-9a-fA-F]+ { BEGIN(INITIAL); return NUM; }
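The effect of the READ_HEX state can be imitated by a hand-written scanner, sketched below. This is not what Flex generates; `next_token`, the token codes, `BAD_CHAR` and the blank-skipping are assumptions made for the illustration. The variable `state` plays the role of BEGIN(...).

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; BAD_CHAR marks input that no rule matches. */
enum { BAD_CHAR = -1, END = 0, HEX_VAL_KW, ID, NUM };
/* Scanner states, named after the start conditions on the slide. */
enum { INITIAL, READ_HEX };

static int state = INITIAL;

/* Scan one token starting at *p, advance *p past it, return its code. */
int next_token(const char **p) {
    const char *s = *p;
    size_t n = 0;
    while (*s == ' ') s++;                 /* skip blanks in every state */
    if (*s == '\0') { *p = s; return END; }

    if (state == READ_HEX) {
        /* Only the hex-constant rule is active in this state. */
        while (isxdigit((unsigned char)s[n])) n++;
        if (n == 0) { *p = s + 1; return BAD_CHAR; }
        state = INITIAL;                   /* like BEGIN(INITIAL) */
        *p = s + n;
        return NUM;
    }
    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        while (isalnum((unsigned char)s[n]) || s[n] == '_') n++;
        *p = s + n;
        if (n == 7 && strncmp(s, "hex_val", 7) == 0) {
            state = READ_HEX;              /* like BEGIN(READ_HEX) */
            return HEX_VAL_KW;
        }
        return ID;
    }
    *p = s + 1;
    return BAD_CHAR;
}
```

The same lexeme DEADC0DE is returned as NUM right after hex_val but as ID anywhere else, which is exactly the distinction the ALF example needs.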
![Page 17: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/17.jpg)
Online resources
http://flex.sourceforge.net/manual/index.html
![Page 18: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/18.jpg)
BISON
![Page 19: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/19.jpg)
Bison
• Tool for automatic generation of parsers
• Open-source alternative to Yacc
• Takes an SDT scheme as input
• Outputs C (or C++) source code for an LALR parser
• Commonly used together with Flex
![Page 20: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/20.jpg)
Bison
[Diagram: SDT scheme (myparser.y) → Bison → int yyparse() … (myparser.c) → C compiler → 01101000110101010… (myparser.obj); Bison also outputs token definitions (myparser.h)]
![Page 21: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/21.jpg)
The input file to Bison
Definitions
%%
SDT scheme
%%
User code
![Page 22: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/22.jpg)
Definitions section
• Define tokens
• Define operator precedence
• Define operator associativity
• Define the types of grammar symbol attributes
• Write C code between %{ and %}
• Issue certain commands to Bison
![Page 23: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/23.jpg)
Token definition
• Normal case:
  %token IDENTIFIER
  %token WHILE
• Token, precedence, associativity, and type:
  %left <Operator> RELOP
  %left <Operator> MINUSOP PLUSOP
  %right <Operator> NOTOP
• Enables use of ambiguous grammars!
![Page 24: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/24.jpg)
Defining types
• Just enter the type inside <> before the list of tokens:
  %left <Operator> RELOP
  %left <Operator> MULOP
  %right <Operator> NOTOP UNOP
  %token <String> ID STRING
• Or the same for non-terminals:
  %type <Node> stmnt expr actuals exprs
![Page 25: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/25.jpg)
The variable yylval
• Used by the lexical analyzer to store token attributes
• Default type is int
• May be given other types using %union:
  %union {
    int Operator;
    char *String;
    NODE_TYPE Node;
  }
• The type (member name) is then used like this:
  %token <String> ID STRING
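How a scanner fills in yylval can be sketched in plain C. The union below is a hand-written stand-in for the type Bison would derive from the %union above; `SemanticValue`, `AstNode`, `my_yylval`, the token codes and the two `scan_*` helpers are all hypothetical names chosen for this illustration.

```c
/* Stand-in for NODE_TYPE; its layout does not matter here. */
typedef struct AstNode AstNode;

/* Hand-written equivalent of the %union on the slide. */
typedef union {
    int Operator;
    char *String;
    AstNode *Node;
} SemanticValue;

/* Hypothetical token codes. */
enum { ID = 258, RELOP = 259 };

/* Plays the role of yylval in this sketch. */
static SemanticValue my_yylval;

/* What a scanner action for an identifier might do: store the lexeme
   in the String member and return the token code. */
int scan_id(char *lexeme) {
    my_yylval.String = lexeme;
    return ID;
}

/* What a scanner action for a relational operator might do. */
int scan_relop(int op) {
    my_yylval.Operator = op;
    return RELOP;
}
```

Because all members share storage, the parser must read back the member that matches the token's declared type, which is what the `<String>` and `<Operator>` tags tell Bison.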
![Page 26: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/26.jpg)
Code provided by the user
• yyerror(char* msg)
  – Function called on syntax errors
• yylex()
  – Function called to get the next token
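A minimal yyerror() might look like the sketch below. The `error_count` variable is an assumption added so that a caller can tell afterwards whether parsing failed, and the parameter is declared const so that string literals can be passed safely.

```c
#include <stdio.h>

/* Hypothetical counter, so the driver can check for errors afterwards. */
static int error_count = 0;

/* Called by the parser with a description of the syntax error. */
void yyerror(const char *msg) {
    error_count++;
    fprintf(stderr, "error: %s\n", msg);
}
```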
![Page 27: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/27.jpg)
Options to Bison
• Given on the command line or in the grammar file
• --defines or %defines: Output a C header file with definitions useful to a scanner
  – Tokens (#defines) and the type of yylval
• %error-verbose: More detailed error messages
• --name-prefix or %name-prefix: Change the default “yy” prefix on all names
• %define api.pure: Do not use globals
• --verbose or %verbose: Write detailed information to an extra output file
![Page 28: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/28.jpg)
Translation scheme section
decl : BASIC_TYPE idents ';'
     ;
idents : idents ',' ident
       | ident
       ;
ident : ID
      ;
![Page 29: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/29.jpg)
Semantic actions
• Written in C
• Executed when the production is used in a reduction
• $$, $1, $2, etc. refer to the attributes of the grammar symbols
  – Can be used as regular C variables
  – $$ refers to the attribute of the head, $1 to the attribute of the first symbol in the body, etc.
  E : E '+' T { $$ = $1 + $3; } ;
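What the action `{ $$ = $1 + $3; }` does at a reduction can be pictured with an explicit attribute stack. The sketch below is an illustration only; `values`, `push_value`, `reduce_add` and `top_value` are hypothetical names, and a real generated parser also keeps a state stack and checks for overflow.

```c
#include <stddef.h>

#define STACK_MAX 64

/* Attribute stack: one slot per grammar symbol on the parse stack. */
static int values[STACK_MAX];
static size_t top = 0;

/* Push the attribute of a shifted or reduced symbol (no bounds check). */
void push_value(int v) {
    values[top++] = v;
}

/* Reduce by E : E '+' T. The three body symbols occupy the top three
   slots; $1 is the deepest of them and $3 is on top. */
void reduce_add(void) {
    int d1 = values[top - 3];   /* $1: attribute of E */
    int d3 = values[top - 1];   /* $3: attribute of T */
    top -= 3;                   /* pop the body */
    values[top++] = d1 + d3;    /* push $$, the attribute of the head */
}

/* Attribute of the symbol currently on top of the stack. */
int top_value(void) {
    return values[top - 1];
}
```

The slot for `'+'` ($2) is present but unused, which is why the action skips from $1 to $3.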
![Page 30: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/30.jpg)
Using ambiguous grammars in Bison
• Default actions:
  – Reduce/reduce: choose first rule in file
  – Shift/reduce: always shift
• With explicit precedence and associativity:
  – Shift/reduce: Compare precedence and associativity of the rule with that of the lookahead token
![Page 31: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/31.jpg)
The %expect declaration
• To suppress shift/reduce warnings:
  %expect n
  where n is the exact number of conflicts
![Page 32: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/32.jpg)
Contextual precedence
• Same token might have different precedence depending on context:
  expr → expr - expr
       | expr * expr
       | - expr
       | id

  Stack        Input
  … - expr     * expr …
![Page 33: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/33.jpg)
Contextual precedence
• Define dummy token:
  %left '-'
  %left '*'
  %left UMINUS
• Use the %prec modifier:
  expr → - expr %prec UMINUS
![Page 34: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/34.jpg)
Examples of parser configurations
Stack                 Input     Action
… if (cond) stmt      else …    shift
… expr + expr         * …       shift
… expr * expr         + …       red. expr → expr * expr
… expr * expr         * …       red. expr → expr * expr
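The decisions in the last three rows follow from comparing the precedence of the rule on the stack with that of the lookahead token. The sketch below encodes that comparison; the types `Prec`, `Assoc`, `Action` and the function `resolve` are hypothetical, and the numeric levels simply follow declaration order (later declarations bind tighter).

```c
typedef enum { SHIFT, REDUCE, ERROR_ACTION } Action;
typedef enum { LEFT, RIGHT, NONASSOC } Assoc;

/* Precedence level and associativity of a token or rule. */
typedef struct {
    int level;
    Assoc assoc;
} Prec;

/* Resolve a shift/reduce conflict between a rule that could be reduced
   and a lookahead token that could be shifted. */
Action resolve(Prec rule, Prec lookahead) {
    if (rule.level > lookahead.level) return REDUCE;
    if (rule.level < lookahead.level) return SHIFT;
    /* Equal levels: associativity decides. */
    switch (lookahead.assoc) {
    case LEFT:  return REDUCE;
    case RIGHT: return SHIFT;
    default:    return ERROR_ACTION;
    }
}
```

With `+` at level 1 and `*` at level 2, both left-associative, this gives shift for `expr + expr` before `*` and reduce for `expr * expr` before `+` or `*`, as in the table. Giving a unary-minus rule a higher level, as `%prec UMINUS` does, makes it reduce before a following `*`.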
![Page 35: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/35.jpg)
Online resources
http://www.gnu.org/software/bison/manual/html_node/index.html
![Page 36: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/36.jpg)
ABSTRACT SYNTAX TREES
![Page 37: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/37.jpg)
Abstract syntax trees
• “AST” or just “syntax tree”
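[Figure: a parse tree with interior E nodes (left) and the corresponding syntax tree (right) for an expression built from a, 5, b, + and *]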
![Page 38: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/38.jpg)
Syntax trees vs. parse trees
Parse trees:
• Interior nodes are nonterminals, leaves are terminals
• Rarely constructed as an explicit data structure
• Represent the concrete syntax

Syntax trees:
• Interior nodes are “operators”, leaves are operands
• Commonly constructed as an explicit data structure
• Represent the abstract syntax
![Page 39: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/39.jpg)
Why syntax trees?
• Simplifies subsequent analyses
• Independent of the parsing strategy
• Makes it easier to add new analysis passes without having to modify the parser
• More compact representation than parse trees
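The point about adding passes can be illustrated with a small expression tree and two independent traversals over it. The `Expr` layout and the functions `eval` and `count_nodes` are assumptions for this sketch, not the node type used in the course labs.

```c
#include <stddef.h>

/* A tiny syntax tree: op is '+' or '*' for interior nodes and 0 for
   leaves, which carry a constant in value. */
typedef struct Expr {
    char op;
    int value;
    const struct Expr *left, *right;
} Expr;

/* Pass 1: evaluate a constant expression. */
int eval(const Expr *e) {
    switch (e->op) {
    case '+': return eval(e->left) + eval(e->right);
    case '*': return eval(e->left) * eval(e->right);
    default:  return e->value;   /* leaf */
    }
}

/* Pass 2: count the nodes. Added without touching eval or any parser. */
int count_nodes(const Expr *e) {
    if (e == NULL) return 0;
    return 1 + count_nodes(e->left) + count_nodes(e->right);
}
```

Both passes see only operators and operands; nothing about the grammar or the parsing strategy leaks into them.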
![Page 40: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/40.jpg)
Syntax tree example
if (a < 1) b = 2 + 3;
else { c = d * 4; e(f, 5); }

[Figure: syntax tree for the statement above. Nodes: if, <, =, =, call e, + and *; leaves: a, 1, b, 2, 3, c, d, 4, f and 5; several child links are null.]
![Page 41: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/41.jpg)
Exercise (1)
• Draw an abstract syntax tree for the statement
while (i < 100) { x = 2 * x; i = i + 1; }
![Page 42: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/42.jpg)
Constructing a syntax tree in Bison
expr : expr '+' expr { $$ = createOpNode($1, '+', $3); }
     | expr '*' expr { $$ = createOpNode($1, '*', $3); }
     | ID            { $$ = createIdNode($1.name); }
     ;
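The slide does not show createOpNode() and createIdNode() themselves. One plausible shape for them is sketched below; the `Node` layout and the decision to copy the identifier name are assumptions, not the course's actual implementation.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { OP_NODE, ID_NODE } NodeKind;

/* Hypothetical node layout shared by operator and identifier nodes. */
typedef struct Node {
    NodeKind kind;
    int op;                      /* operator character, for OP_NODE */
    char *name;                  /* identifier name, for ID_NODE */
    struct Node *left, *right;   /* operands, for OP_NODE */
} Node;

/* Build an interior node with two already-built subtrees. */
Node *createOpNode(Node *left, int op, Node *right) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->kind = OP_NODE;
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* Build a leaf node holding its own copy of the identifier name. */
Node *createIdNode(const char *name) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->kind = ID_NODE;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) { free(n); return NULL; }
    strcpy(n->name, name);
    return n;
}
```

Because reductions happen bottom-up, $1 and $3 already hold finished subtrees when the action for `expr '+' expr` runs, so each action only has to allocate one node.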
![Page 43: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/43.jpg)
Constructing a syntax tree in Bison
stmt : RETURN expr ';' { $$ = mReturn($2, $1); }
     ;
stmts : stmts stmt { $$ = connectStmts($1, $2); }
      |            { $$ = NULL; }
      ;
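connectStmts() is not defined on the slide. Assuming statements are chained through a `next` pointer, it could append a statement to a possibly empty list as in this sketch; the `Stmt` type and its `id` field are placeholders.

```c
#include <stddef.h>

/* Hypothetical statement node; id stands in for the real payload. */
typedef struct Stmt {
    int id;
    struct Stmt *next;
} Stmt;

/* Append stmt to the end of list. list is NULL when the empty
   production for stmts was reduced, in which case stmt becomes
   the whole list. */
Stmt *connectStmts(Stmt *list, Stmt *stmt) {
    Stmt *tail = list;
    if (list == NULL) return stmt;
    while (tail->next != NULL) tail = tail->next;
    tail->next = stmt;
    return list;
}
```

Since `stmts` is left-recursive, statements are appended in source order. Walking to the tail on every call is quadratic in the list length; keeping a tail pointer would avoid that.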
![Page 44: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/44.jpg)
Conclusion
• Flex generates C source code for a scanner given a set of regular expressions
• Bison generates C source code for a bottom-up parser given a syntax-directed translation scheme
• Building syntax trees simplifies subsequent analyses of the program
• Syntax trees can be built in semantic actions
![Page 45: Lesson 10 CDT301 – Compiler Theory, Spring 2011 Teacher: Linus Källberg.](https://reader030.fdocuments.in/reader030/viewer/2022032607/56649ebb5503460f94bc4348/html5/thumbnails/45.jpg)
Next time
• Syntax-directed definitions and translation schemes
• Semantic analysis and type analysis