2 Compiler Overview
Uploaded by syed-faisal-ali
8/2/2019 2 Compiler Overview
2 Compiler Overview
Contents
• Introduction
• The Scanner
• The Parser
• The Semantic Analyzer
• The Intermediate Code Generator
• The Optimizer
• The Code Generator
Introduction
What is a compiler?
• A recognizer (of some source language L).
• A translator (of programs written in L into programs written in some object or
  target language L').

Here's a simple pictorial view:

    source program --> COMPILER --> object program
                          |
                          --> error messages
A compiler is itself a program, written in some host language. (In cs536, students will
implement a compiler for a simple source language using Java as the host language.)
A compiler operates in phases; each phase translates the source program from one
representation to another. Different compilers may include different phases, and/or
may order them somewhat differently. A typical organization is shown below.
              source code
              (sequence of characters)
                  ||
                  ||
                  \/
        ----------------------
        |  lexical analyzer  |
        |     (scanner)      |
        ----------------------
                  ||
                  ||  sequence of tokens
                  \/
        ----------------------
        |  syntax analyzer   |
        |      (parser)      |
        ----------------------
                  ||
                  ||  abstract-syntax tree
                  \/
        ----------------------
        | semantic analyzer  |
        ----------------------
                  ||
                  ||  augmented, annotated abstract-syntax tree
                  \/
        ----------------------
        | intermediate code  |
        |     generator      |                     /\
        ----------------------                     ||
                  ||                            FRONT END
                  ||  intermediate code   ----------------------------------
                  \/                            BACK END
        ----------------------                     ||
        |     optimizer      |                     \/
        ----------------------
                  ||
                  ||  optimized intermediate code
                  \/
        ----------------------
        |        code        |
        |     generator      |
        ----------------------
                  ||
                  ||
                  \/
        object program (might be assembly code or machine code)
Below, we look at each phase of the compiler.
The Scanner
The scanner is called by the parser; here's how it works:
• The scanner reads characters from the source program.
• The scanner groups the characters into lexemes (sequences of characters that
  "go together").
• Each lexeme corresponds to a token; the scanner returns the next token (plus
  maybe some additional information) to the parser.
• The scanner may also discover lexical errors (e.g., erroneous characters).
The definitions of what is a lexeme, token, or bad character all depend on the source
language.
Example
Here are some Java lexemes and the corresponding tokens:
    lexeme:               ;           =       index  tmp    37       102
    corresponding token:  SEMI-COLON  ASSIGN  IDENT  IDENT  INT-LIT  INT-LIT
Note that multiple lexemes can correspond to the same token (e.g., there are many
identifiers).
Given the source code:
    position = initial + rate * 60 ;
a Java scanner would return the following sequence of tokens:
    IDENT ASSIGN IDENT PLUS IDENT TIMES INT-LIT SEMI-COLON
Erroneous characters for Java source include # and control-a.
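The scanning steps above can be sketched in a few lines of Python. This is a minimal illustration and not the course's scanner: the function name `scan`, the `TOKEN_SPEC` table, and the regular expressions are assumptions made for this example, and it handles only the handful of tokens used in the running example. For brevity it returns the whole token list at once, whereas the scanner described above hands the parser one token per call.

```python
import re

# Hypothetical token table: (token name, pattern for its lexemes).
TOKEN_SPEC = [
    ("INT-LIT", re.compile(r"\d+")),
    ("IDENT", re.compile(r"[A-Za-z_]\w*")),
    ("ASSIGN", re.compile(r"=")),
    ("PLUS", re.compile(r"\+")),
    ("TIMES", re.compile(r"\*")),
    ("SEMI-COLON", re.compile(r";")),
]


def scan(source):
    """Return a list of (token, lexeme) pairs for the source string."""
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():          # whitespace separates lexemes
            pos += 1
            continue
        for name, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            # No pattern matched: this is a lexical error.
            raise ValueError(
                f"lexical error: bad character {source[pos]!r} at offset {pos}"
            )
    return tokens


for token, lexeme in scan("position = initial + rate * 60 ;"):
    print(token, lexeme)
```

Calling `scan("position = # 5 ;")` raises `ValueError`, which mirrors the erroneous-character case mentioned above.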
The Parser
The parser:
• Groups tokens into "grammatical phrases", discovering the underlying structure
  of the source program.
• Finds syntax errors. For example, in Java the source code
      position = * 5 ;
  corresponds to the sequence of tokens:
      IDENT ASSIGN TIMES INT-LIT SEMI-COLON
  All are legal tokens, but that sequence of tokens is erroneous.
• Might find some "static semantic" errors, e.g., a use of an undeclared variable,
  or variables that are multiply declared.
• Might generate code, or build some intermediate representation of the program
  such as an abstract-syntax tree.
Example
source code:            position = initial + rate * 60 ;

abstract-syntax tree:           =
                              /   \
                       position     +
                                  /   \
                           initial     *
                                     /   \
                                  rate    60
Notes:
• The interior nodes of the tree are operators.
• A node's children are its operands.
• Each subtree forms a "logical unit", e.g., the subtree with * at its root shows
  that because multiplication has higher precedence than addition, this operation
  must be performed as a unit (not initial+rate).
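To make the grouping concrete, here is a small recursive-descent parser in Python for just the statement form used in the example. It is a sketch under stated assumptions: the grammar (assignment of a sum of products), the nested-tuple tree representation, and the names `parse` and `TOKENS` are all chosen for illustration and are not part of the original notes. The token list is written out by hand so the example stands on its own.

```python
# Tokens for:  position = initial + rate * 60 ;
TOKENS = [
    ("IDENT", "position"), ("ASSIGN", "="), ("IDENT", "initial"),
    ("PLUS", "+"), ("IDENT", "rate"), ("TIMES", "*"),
    ("INT-LIT", "60"), ("SEMI-COLON", ";"),
]


def parse(tokens):
    """Parse  IDENT ASSIGN expr SEMI-COLON  into a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def expect(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, found {peek()}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def factor():                      # factor -> INT-LIT | IDENT
        if peek() == "INT-LIT":
            return int(expect("INT-LIT"))
        return expect("IDENT")

    def term():                        # term -> factor (TIMES factor)*
        node = factor()
        while peek() == "TIMES":
            expect("TIMES")
            node = ("*", node, factor())
        return node

    def expr():                        # expr -> term (PLUS term)*
        node = term()
        while peek() == "PLUS":
            expect("PLUS")
            node = ("+", node, term())
        return node

    target = expect("IDENT")
    expect("ASSIGN")
    value = expr()
    expect("SEMI-COLON")
    return ("=", target, value)


print(parse(TOKENS))
# ('=', 'position', ('+', 'initial', ('*', 'rate', 60)))
```

Because `expr` calls `term` for each operand, a product is always built as a complete subtree before it becomes an operand of +, which is how this sketch encodes the precedence rule. Feeding it the tokens for `position = * 5 ;` raises `SyntaxError`, since a TIMES token cannot start an expression.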
The Semantic Analyzer
The semantic analyzer checks for (more) "static semantic" errors, e.g., type errors. It
may also annotate and/or change the abstract syntax tree (e.g., it might annotate each
node that represents an expression with its type).
Example:
Abstract syntax tree before semantic analysis

                 =
               /   \
              /     \
        position     +
                    / \
                   /   \
             initial    *
                       / \
                      /   \
                   rate    60

Abstract syntax tree after semantic analysis

                 = (float)
               /   \
              /     \
        position     + (float)
        (float)     / \
                   /   \
             initial    * (float)
             (float)   / \
                      /   \
                   rate    intToFloat (float)
                               |
                               |
                               60 (int)
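The annotation step can be sketched as a recursive walk over the tree. The following Python is only an illustration: the nested-tuple tree format, the `SYMBOLS` table (which assumes that position, initial, and rate were all declared as float), and the function names are hypothetical. Each annotated node carries its type as its last element, and an intToFloat node is inserted wherever an int operand meets a float one.

```python
# Assumed declarations; a real compiler would collect these from the program.
SYMBOLS = {"position": "float", "initial": "float", "rate": "float"}


def type_of(node):
    """Every annotated node stores its type as its last element."""
    return node[-1]


def to_float(node):
    """Wrap an int-typed node in a conversion; leave float nodes alone."""
    if type_of(node) == "float":
        return node
    return ("intToFloat", node, "float")


def annotate(node):
    if isinstance(node, int):                    # integer literal
        return (node, "int")
    if isinstance(node, str):                    # variable use
        if node not in SYMBOLS:
            raise NameError(f"undeclared variable {node}")
        return (node, SYMBOLS[node])
    op, left, right = node
    left, right = annotate(left), annotate(right)
    if op == "=":
        if type_of(left) == "float":
            right = to_float(right)
        elif type_of(right) == "float":
            raise TypeError("cannot assign a float value to an int variable")
        return (op, left, right, type_of(left))
    if type_of(left) != type_of(right):          # mixed int/float arithmetic
        left, right = to_float(left), to_float(right)
    return (op, left, right, type_of(left))


tree = ("=", "position", ("+", "initial", ("*", "rate", 60)))
print(annotate(tree))
```

For the running example, the only change to the tree's shape is the intToFloat node above the literal 60; every other node simply gains a type.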
The Intermediate Code Generator
The intermediate code generator translates from abstract-syntax tree to intermediate
code. One possibility is 3-address code (code in which each instruction involves at
most 3 operands). Below is an example of 3-address code for the abstract-syntax tree
shown above. Note that in this example, the second and third instructions each have
exactly three operands (the location where the result of the operation is stored, and
two source operands); the first and fourth instructions have just two operands
("temp1" and "60" for instruction 1, and "position" and "temp3" for instruction 4).
temp1 = inttofloat(60)
temp2 = rate * temp1
temp3 = initial + temp2
position = temp3
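One way such code could be produced is a post-order walk of the tree that allocates a fresh temporary for every operator node. The Python below is a sketch, not the method of any particular compiler: the nested-tuple tree (with the conversion written as a one-argument `inttofloat` node) and the function name `gen_three_address` are assumptions made for this example.

```python
def gen_three_address(tree):
    """Return a list of 3-address instructions for an assignment tree."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def gen(node):
        """Emit code for an expression; return the name that holds its value."""
        if not isinstance(node, tuple):          # variable name or literal
            return str(node)
        if node[0] == "inttofloat":              # one-argument conversion
            arg = gen(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node                   # binary operator
        left_name = gen(left)
        right_name = gen(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    code.append(f"{target} = {gen(value)}")
    return code


tree = ("=", "position", ("+", "initial", ("*", "rate", ("inttofloat", 60))))
for line in gen_three_address(tree):
    print(line)
# temp1 = inttofloat(60)
# temp2 = rate * temp1
# temp3 = initial + temp2
# position = temp3
```

Operands are generated before the instruction that uses them, so the conversion of 60 is emitted first even though it sits deepest in the tree.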
The Optimizer
The optimizer tries to improve code generated by the intermediate code generator.
The goal is usually to make code run faster, but the optimizer may also try to make the
code smaller. In the example above, an optimizer might first discover that the
conversion of the integer 60 to a floating-point number can be done at compile time
instead of at run time. Then it might discover that there is no need for "temp1" or
"temp3". Here's the optimized code:
temp2 = rate * 60.0
position = initial + temp2
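The two discoveries described above correspond roughly to constant folding and the removal of a redundant copy. The Python sketch below applies both in a single pass over the instructions; the tuple format `(dest, op, arg1, arg2)` and the names `optimize` and `show` are assumptions for this illustration. The copy removal is safe here only because each temporary is used exactly once, which a real optimizer would have to check.

```python
CODE = [
    ("temp1", "inttofloat", "60", None),
    ("temp2", "*", "rate", "temp1"),
    ("temp3", "+", "initial", "temp2"),
    ("position", "copy", "temp3", None),
]


def optimize(code):
    constants = {}                     # temp name -> value known at compile time
    out = []
    for dest, op, a, b in code:
        a = constants.get(a, a)        # substitute known constants
        b = constants.get(b, b)
        if op == "inttofloat" and a.isdigit():
            # Do the conversion now instead of at run time.
            constants[dest] = str(float(int(a)))
            continue
        if op == "copy" and out and out[-1][0] == a and a.startswith("temp"):
            # The copied temp was computed by the previous instruction:
            # store that result directly into dest (assumes no other uses).
            _, prev_op, prev_a, prev_b = out.pop()
            out.append((dest, prev_op, prev_a, prev_b))
            continue
        out.append((dest, op, a, b))
    return out


def show(instr):
    dest, op, a, b = instr
    if op == "copy":
        return f"{dest} = {a}"
    if b is None:
        return f"{dest} = {op}({a})"
    return f"{dest} = {a} {op} {b}"


for instr in optimize(CODE):
    print(show(instr))
# temp2 = rate * 60.0
# position = initial + temp2
```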
The Code Generator
The code generator generates object code from (optimized) intermediate code. For
example, the following code might be generated for our running example:
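        .data
c1:
        .float 60.0
        .text
        l.s   $f0,rate           # $f0 = rate
        l.s   $f4,c1             # $f4 = 60.0
        mul.s $f0,$f0,$f4        # $f0 = rate * 60.0
        l.s   $f2,initial        # $f2 = initial
        add.s $f0,$f0,$f2        # $f0 = initial + rate * 60.0
        s.s   $f0,position       # position = $f0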
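A deliberately naive generator for this kind of output can be sketched in Python. Everything here is an assumption made for illustration: the instruction tuples `(dest, op, arg1, arg2)`, the `OPCODES` table, the fixed use of `$f0` and `$f2`, and the `c1`, `c2`, ... labels for literals. Storage for variables and temporaries is assumed to be declared elsewhere, and the output has not been checked against an assembler.

```python
OPCODES = {"*": "mul.s", "+": "add.s"}   # single-precision MIPS-style opcodes


def generate(code):
    """Translate (dest, op, arg1, arg2) instructions into assembly lines."""
    data = []
    text = []
    constants = {}                       # literal -> label in the .data section

    def operand(arg):
        # Literals are placed in memory under a generated label;
        # variable and temporary names are used as-is.
        if arg[0].isdigit():
            if arg not in constants:
                constants[arg] = f"c{len(constants) + 1}"
                data.append(f"{constants[arg]}: .float {arg}")
            return constants[arg]
        return arg

    for dest, op, a, b in code:
        text.append(f"l.s $f0,{operand(a)}")       # load first operand
        text.append(f"l.s $f2,{operand(b)}")       # load second operand
        text.append(f"{OPCODES[op]} $f0,$f0,$f2")  # compute
        text.append(f"s.s $f0,{dest}")             # store result
    return [".data"] + data + [".text"] + text


optimized = [
    ("temp2", "*", "rate", "60.0"),
    ("position", "+", "initial", "temp2"),
]
print("\n".join(generate(optimized)))
```

Unlike the hand-written sequence above, this sketch stores temp2 to memory and reloads it. Keeping such values in registers is the job of register allocation, which is omitted here.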