Post on 31-Dec-2015
Lesson 10
CDT301 – Compiler Theory, Spring 2011
Teacher: Linus Källberg
Outline
• Flex
• Bison
• Abstract syntax trees
FLEX
Flex
• Tool for automatic generation of scanners
• Open-source version of Lex
• Takes regular expressions as input
• Outputs a C (or C++) file for the scanner
Flex
[Figure: regexps (mylexer.l) → Flex → int yylex() … (mylexer.c) → C compiler → 01101000110101010… (mylexer.obj)]
The input file to Flex
Definitions
%%
Rules
%%
User code
The definitions section
• Macro definitions:
  – Specify a letter:
      letter [A-Za-z]
  – Specify a delimiter:
      delimiter [ ,:;.]
  – Specify a digit:
      digit [0-9]
  – Specify an identifier (references to other macros are written in braces):
      id {letter}({letter}|{digit})*
The definitions section
• User code:
    %{
    #include <stdio.h>
    int a_nice_global_variable = 0;
    int my_favourite_function(void) {
        return 42;
    }
    %}
The rules section
• Rule = regexp + C code
• Longest matching pattern is used
• If two equally long patterns match, the first one in the file is used
• Examples:
    =|>=?|<(=|>)?   { return RELOP; }
    {id}            { return ID; }
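The two matching rules above can be sketched in plain C. This is only an illustration of the selection logic, not the table-driven code that Flex generates. The token codes, the hand-written matcher functions, the "if" keyword rule, and the name pick_rule() are all assumptions made for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real scanner would take them from the parser. */
enum { TOK_NONE = 0, TOK_IF, TOK_ID, TOK_RELOP };

/* Each matcher returns the length of the longest prefix it accepts (0 = no match). */
static size_t match_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_id(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static size_t match_relop(const char *s) {
    if (s[0] == '=') return 1;                                     /* =          */
    if (s[0] == '>') return s[1] == '=' ? 2 : 1;                   /* > or >=    */
    if (s[0] == '<') return (s[1] == '=' || s[1] == '>') ? 2 : 1;  /* <, <=, <>  */
    return 0;
}

/* Rules in "file order": earlier entries win when two matches are equally long. */
static const struct {
    size_t (*match)(const char *);
    int token;
} rules[] = {
    { match_if, TOK_IF },
    { match_id, TOK_ID },
    { match_relop, TOK_RELOP },
};

/* Returns the token of the winning rule and stores the match length in *len. */
int pick_rule(const char *input, size_t *len) {
    assert(input != NULL && len != NULL);
    int token = TOK_NONE;
    *len = 0;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t n = rules[i].match(input);
        if (n > *len) {          /* strictly longer only, so the first rule wins ties */
            *len = n;
            token = rules[i].token;
        }
    }
    return token;
}
```

With these rules, the input "if" is matched by both the keyword rule and the identifier rule with length 2, so the keyword rule wins because it comes first, while "ifx" is an identifier because the identifier match is longer.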
The regexp language of Flex
?     Previous regexp is optional
{}    Macro expansion (defined in the definitions section)
.     Matches any character that is not end of line
$     Matches the end of a line
^     Matches the beginning of a line
[]    Matches any enclosed character
The [] syntax
• Similar to | but more powerful
• Example:
    digit [0123456789]
  is the same as
    digit 0|1|2|3|4|5|6|7|8|9
• Special characters inside the brackets: - and ^
    digit [0-9]
    letter [A-Za-z]
    non_digit [^0-9]
The user code section
• Only C code valid here
• Will be copied unchanged to the generated C file
The generated scanner
• By default, a function called yylex() is defined
  – Works similarly to your GetNextToken() from lab 1
  – The name can be changed with options
• Some globals are defined as well (can be changed into local variables with options):
    yyin       The file to read from
    yytext     The matched lexeme (char*)
    yyleng     The length of yytext
    yylineno   Line number of the match (maintained when %option yylineno is set)
The yywrap() function
• Called upon end-of-file
• Should be supplied by the user
• Suppressed with %option noyywrap or --noyywrap
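A user-supplied yywrap() could look roughly like the sketch below. The return convention is the one Flex documents: a nonzero result means there is no more input, and zero means yyin has been pointed at a new file and scanning should continue. The file queue and the helper set_pending_files() are hypothetical, and yyin is declared here only as a stand-in so that the fragment compiles on its own; in a real project the generated scanner defines it.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the global that a Flex-generated scanner defines itself. */
FILE *yyin = NULL;

/* Hypothetical NULL-terminated queue of further input files. */
static const char *const *pending_files = NULL;

void set_pending_files(const char *const *files) {
    assert(files != NULL);
    pending_files = files;
}

/* Called by the scanner at end-of-file.
   Returns 0 if yyin now refers to another file, 1 if all input is consumed. */
int yywrap(void) {
    while (pending_files != NULL && *pending_files != NULL) {
        FILE *next = fopen(*pending_files++, "r");
        if (next != NULL) {
            if (yyin != NULL && yyin != stdin) fclose(yyin);
            yyin = next;
            return 0;            /* keep scanning from the new file */
        }
        /* Files that cannot be opened are skipped in this sketch. */
    }
    return 1;                    /* no more input */
}
```

A scanner that only ever reads one file needs nothing more than a yywrap() that returns 1, which is what %option noyywrap effectively gives you.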
Scanner states in Flex
• Affects what tokens should be recognized
• Example from the language ALF:
    { fref 32 DEADC0DE }    <- Identifier
    { hex_val DEADC0DE }    <- Hex constant
Scanner states in Flex
• Declare state:
    %x READ_HEX
• Use the state to make rules conditional:
    hex_val                  { BEGIN(READ_HEX); return HEX_VAL_KW; }
    [a-zA-Z_][a-zA-Z0-9_]*   { return ID; }
    <READ_HEX>[0-9a-fA-F]+   { BEGIN(INITIAL); return NUM; }
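To see what the state buys you, here is a hand-written C sketch of the same three rules. It is not what Flex generates; the struct scanner type, the token codes, the whitespace skipping, and the catch-all TOK_OTHER are assumptions added so the example is self-contained.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes and scanner states. */
enum { TOK_EOF = 0, TOK_HEX_VAL_KW, TOK_ID, TOK_NUM, TOK_OTHER };
enum state { INITIAL_STATE, READ_HEX_STATE };

struct scanner {
    const char *pos;     /* next unread character */
    enum state state;    /* current scanner state */
};

int next_token(struct scanner *sc) {
    assert(sc != NULL && sc->pos != NULL);

    /* Assumption: whitespace is skipped in every state. */
    while (isspace((unsigned char)*sc->pos)) sc->pos++;
    if (*sc->pos == '\0') return TOK_EOF;

    size_t n = 0;
    if (sc->state == READ_HEX_STATE) {
        /* <READ_HEX>[0-9a-fA-F]+ : only the hex rule is active here. */
        while (isxdigit((unsigned char)sc->pos[n])) n++;
        if (n > 0) {
            sc->pos += n;
            sc->state = INITIAL_STATE;       /* BEGIN(INITIAL) */
            return TOK_NUM;
        }
    } else {
        /* [a-zA-Z_][a-zA-Z0-9_]* with hex_val taking priority on a tie. */
        if (isalpha((unsigned char)sc->pos[0]) || sc->pos[0] == '_') {
            while (isalnum((unsigned char)sc->pos[n]) || sc->pos[n] == '_') n++;
        }
        if (n > 0) {
            int is_keyword = (n == 7 && strncmp(sc->pos, "hex_val", 7) == 0);
            sc->pos += n;
            if (is_keyword) {
                sc->state = READ_HEX_STATE;  /* BEGIN(READ_HEX) */
                return TOK_HEX_VAL_KW;
            }
            return TOK_ID;
        }
    }

    sc->pos++;                               /* anything else: one character */
    return TOK_OTHER;
}
```

The same lexeme DEADC0DE comes out as an identifier after fref but as a number after hex_val, because the keyword switched the scanner into the hex-reading state for exactly one token.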
Online resources
http://flex.sourceforge.net/manual/index.html
BISON
Bison
• Tool for automatic generation of parsers
• Open-source alternative to Yacc
• Takes an SDT scheme as input
• Outputs C (or C++) source code for an LALR parser
• Commonly used together with Flex
Bison
[Figure: SDT scheme (myparser.y) → Bison → int parse() … (myparser.c) and token definitions (myparser.h); myparser.c → C compiler → 01101000110101010… (myparser.obj)]
The input file to Bison
Definitions
%%
SDT scheme
%%
User code
Definitions section
• Define tokens
• Define operator precedence
• Define operator associativity
• Define the types of grammar symbol attributes
• Write C code between %{ and %}
• Issue certain commands to Bison
Token definition
• Normal case:
    %token IDENTIFIER
    %token WHILE
• Token, precedence, associativity, and type:
    %left <Operator> RELOP
    %left <Operator> MINUSOP PLUSOP
    %right <Operator> NOTOP
• Enables use of ambiguous grammars!
Defining types
• Just enter the type inside <> before the list of tokens:
    %left <Operator> RELOP
    %left <Operator> MULOP
    %right <Operator> NOTOP UNOP
    %token <String> ID STRING
• Or the same for non-terminals:
    %type <Node> stmnt expr actuals exprs
The variable yylval
• Used by the lexical analyzer to store token attributes
• Default type is int
• May be given other types using %union:
    %union {
        int Operator;
        char *String;
        NODE_TYPE Node;
    }
• The type (member name) is then used like this:
    %token <String> ID STRING
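On the scanner side, the union is used by writing to the member that matches the token's declared type before returning the token code. The sketch below spells out in C what that amounts to. The YYSTYPE typedef mirrors the %union above, but the definition of NODE_TYPE as a pointer, the token code value, and the helper on_identifier() are assumptions; in a real build, Bison's generated header provides YYSTYPE, yylval, and the token codes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node;                      /* hypothetical AST node, left opaque here */
typedef struct node *NODE_TYPE;   /* assumption: NODE_TYPE is a pointer type */

/* What the %union declaration corresponds to in C. */
typedef union {
    int Operator;
    char *String;
    NODE_TYPE Node;
} YYSTYPE;

/* Stand-in for the global that the Bison-generated parser defines. */
YYSTYPE yylval;

/* Hypothetical token code; the real value comes from the generated header. */
enum { ID = 258 };

/* What a scanner action for identifiers might do with yytext and yyleng.
   The lexeme is copied because yytext is only valid until the next match. */
int on_identifier(const char *text, size_t leng) {
    assert(text != NULL);
    char *copy = malloc(leng + 1);
    if (copy == NULL) abort();
    memcpy(copy, text, leng);
    copy[leng] = '\0';
    yylval.String = copy;         /* member chosen by %token <String> ID */
    return ID;
}
```

In a Flex rule this would be called as { return on_identifier(yytext, yyleng); }, and the parser then sees the copied string as the attribute of the ID token.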
Code provided by the user
• yyerror(char* msg)
  – Function called on syntax errors
• yylex()
  – Function called to get the next token
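A minimal yyerror() might look like the sketch below. The error counter and the stand-in declaration of yylineno are assumptions for the sake of a self-contained example; with Flex, yylineno comes from the generated scanner instead.

```c
#include <assert.h>
#include <stdio.h>

/* Stand-in for the scanner's line counter; normally defined by Flex. */
int yylineno = 1;

/* Hypothetical counter so the caller can tell whether parsing succeeded. */
int error_count = 0;

/* Called by the generated parser whenever it detects a syntax error. */
void yyerror(const char *msg) {
    assert(msg != NULL);
    fprintf(stderr, "line %d: %s\n", yylineno, msg);
    error_count++;
}
```

Counting the errors instead of exiting lets the parser continue and report more than one problem per run when error recovery is in place.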
Options to Bison
• Given on the command line or in the grammar file
• --defines or %defines: Output a C header file with definitions useful to a scanner
  – Tokens (#defines) and the type of yylval
• %error-verbose: More detailed error messages
• --name-prefix or %name-prefix: Change the default “yy” prefix on all names
• %define api.pure: Do not use globals
• --verbose or %verbose: Write detailed information to extra output file
Translation scheme section
    decl   : BASIC_TYPE idents ';'
           ;

    idents : idents ',' ident
           | ident
           ;

    ident  : ID
           ;
Semantic actions
• Written in C
• Executed when the production is used in a reduction
• $$, $1, $2, etc. refer to the attributes of the grammar symbols
  – Can be used as regular C variables
  – $$ refers to the attribute of the head, $1 to the attribute of the first symbol in the body, etc.
    E : E '+' T { $$ = $1 + $3; } ;
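One way to picture $$, $1, and $3 is as slots on a value stack that the parser keeps in step with its symbol stack. The sketch below models the reduction by E : E '+' T with a plain int stack. It is a simplified illustration, not Bison's generated code, and the names push_value(), reduce_add(), and value_stack_depth() are made up for the example.

```c
#include <assert.h>

#define STACK_MAX 64

static int values[STACK_MAX];   /* one attribute per symbol on the parse stack */
static int top = 0;             /* number of attributes currently on the stack */

void push_value(int v) {
    assert(top < STACK_MAX);
    values[top++] = v;
}

int value_stack_depth(void) {
    return top;
}

/* Reduction by  E : E '+' T  { $$ = $1 + $3; }
   The three body symbols occupy the top three slots. */
int reduce_add(void) {
    assert(top >= 3);
    int d1 = values[top - 3];   /* $1: attribute of E                    */
    int d3 = values[top - 1];   /* $3: attribute of T ($2 is the '+')    */
    int dd = d1 + d3;           /* $$: attribute of the head             */
    top -= 3;                   /* pop the body ...                      */
    values[top++] = dd;         /* ... and push the head's attribute     */
    return dd;
}
```

After shifting the attributes 2, a placeholder for '+', and 3, calling reduce_add() leaves a single slot holding 5, which is what later actions see as the attribute of the new E.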
Using ambiguous grammars in Bison
• Default actions:
  – Reduce/reduce: choose first rule in file
  – Shift/reduce: always shift
• With explicit precedence and associativity:
  – Shift/reduce: compare the precedence and associativity of the rule with that of the lookahead token
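The shift/reduce decision described above can be written as a small C function. Bison makes this choice once, while building the parser tables, so the sketch is only a model of the rule. The type names and the function resolve() are assumptions; the level of a rule is taken to be that of the last terminal in its body unless overridden.

```c
#include <assert.h>

enum assoc  { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Precedence of a rule or token; level 0 means "none declared".
   Higher levels bind tighter (declared later in the grammar file). */
struct prec {
    int level;
    enum assoc assoc;
};

/* Decide a shift/reduce conflict between a rule and the lookahead token. */
enum action resolve(struct prec rule, struct prec lookahead) {
    assert(rule.level >= 0 && lookahead.level >= 0);
    if (rule.level == 0 || lookahead.level == 0)
        return ACT_SHIFT;                   /* default: always shift */
    if (rule.level > lookahead.level)
        return ACT_REDUCE;                  /* rule binds tighter    */
    if (rule.level < lookahead.level)
        return ACT_SHIFT;                   /* token binds tighter   */
    switch (lookahead.assoc) {              /* same level: use associativity */
    case ASSOC_LEFT:  return ACT_REDUCE;
    case ASSOC_RIGHT: return ACT_SHIFT;
    default:          return ACT_ERROR;     /* non-associative operators */
    }
}
```

With '+' at level 1 and '*' at level 2, both left-associative, a stack ending in expr + expr shifts on '*', while a stack ending in expr * expr reduces on either '+' or '*'.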
The %expect declaration
• To suppress shift/reduce warnings:
    %expect n
  where n is the exact number of conflicts
Contextual precedence
• Same token might have different precedence depending on context:
    expr → expr - expr
         | expr * expr
         | - expr
         | id

    Stack        Input
    … - expr     * expr …
Contextual precedence
• Define dummy token:
    %left '-'
    %left '*'
    %left UMINUS
• Use the %prec modifier:
    expr → - expr %prec UMINUS
Examples of parser configurations
Stack                 Input     Action
… if (cond) stmt      else …    shift
… expr + expr         * …       shift
… expr * expr         + …       red. expr → expr * expr
… expr * expr         * …       red. expr → expr * expr
Online resources
http://www.gnu.org/software/bison/manual/html_node/index.html
ABSTRACT SYNTAX TREES
Abstract syntax trees
• “AST” or just “syntax tree”
Syntax trees vs. parse trees
[Figure: a parse tree and the corresponding syntax tree for an expression with the operands a, 5, and b and the operators + and *]
Parse trees:
• Interior nodes are nonterminals, leaves are terminals
• Rarely constructed as an explicit data structure
• Represents the concrete syntax
Syntax trees:
• Interior nodes are “operators”, leaves are operands
• Commonly constructed as an explicit data structure
• Represents the abstract syntax
Why syntax trees?
• Simplifies subsequent analyses
• Independent of the parsing strategy
• Makes it easier to add new analysis passes without having to modify the parser
• More compact representation than parse trees
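Syntax tree example
    if (a < 1) b = 2 + 3;
    else { c = d * 4; e(f, 5); }
[Figure: syntax tree for the code above, rooted at an if node, with the nodes <, =, +, *, and call e, the leaves a, 1, b, 2, 3, c, d, 4, f, and 5, and several null links]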
Exercise (1)
• Draw an abstract syntax tree for the statement
while (i < 100) { x = 2 * x; i = i + 1; }
Constructing a syntax tree in Bison
    expr : expr '+' expr { $$ = createOpNode($1, '+', $3); }
         | expr '*' expr { $$ = createOpNode($1, '*', $3); }
         | ID            { $$ = createIdNode($1.name); }
         ;
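The slides do not show the node constructors, so the following is only one plausible shape for them. The Node layout, the NodeKind enum, and the helper newNode() are assumptions; only the names createOpNode and createIdNode and their argument order are taken from the grammar above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { NODE_ID, NODE_OP } NodeKind;

/* Hypothetical AST node: operators have two children, identifiers have a name. */
typedef struct Node {
    NodeKind kind;
    char op;                     /* '+' or '*' for NODE_OP      */
    char *name;                  /* identifier name for NODE_ID */
    struct Node *left, *right;   /* operands for NODE_OP        */
} Node;

static Node *newNode(NodeKind kind) {
    Node *n = calloc(1, sizeof *n);   /* all other fields start as 0/NULL */
    if (n == NULL) abort();
    n->kind = kind;
    return n;
}

/* Leaf node for an identifier; the name is copied so the node owns it. */
Node *createIdNode(const char *name) {
    assert(name != NULL);
    Node *n = newNode(NODE_ID);
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) abort();
    strcpy(n->name, name);
    return n;
}

/* Interior node for a binary operator applied to two subtrees. */
Node *createOpNode(Node *left, char op, Node *right) {
    assert(left != NULL && right != NULL);
    Node *n = newNode(NODE_OP);
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}
```

Because each action runs when its production is reduced, the subtrees in $1 and $3 already exist when createOpNode is called, so the tree is assembled bottom-up without any extra bookkeeping.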
Constructing a syntax tree in Bison
    stmt  : RETURN expr ';' { $$ = mReturn($2, $1); }
          ;

    stmts : stmts stmt { $$ = connectStmts($1, $2); }
          |            { $$ = NULL; }
          ;
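A statement list like this is often kept as a singly linked chain of nodes. The sketch below shows one way connectStmts could work under that assumption; the Stmt type and its fields are hypothetical, and the course's actual helper may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement node with a link to the following statement. */
typedef struct Stmt {
    int id;                 /* placeholder for the statement's real contents */
    struct Stmt *next;      /* next statement in the list, NULL at the end   */
} Stmt;

/* Appends stmt to the end of list and returns the head of the list.
   An empty list is represented by NULL, matching the empty production. */
Stmt *connectStmts(Stmt *list, Stmt *stmt) {
    assert(stmt != NULL);
    stmt->next = NULL;
    if (list == NULL) return stmt;
    Stmt *tail = list;
    while (tail->next != NULL) tail = tail->next;
    tail->next = stmt;
    return list;
}
```

Since the rule is left-recursive, the statements arrive in source order and each reduction appends one more. Walking to the tail on every call is quadratic in the list length; keeping a tail pointer would avoid that at the cost of a slightly larger node or a separate list header.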
Conclusion
• Flex generates C source code for a scanner given a set of regular expressions
• Bison generates C source code for a bottom-up parser given a syntax-directed translation scheme
• Building syntax trees simplifies subsequent analyses of the program
• Syntax trees can be built in semantic actions
Next time
• Syntax-directed definitions and translation schemes
• Semantic analysis and type analysis