RIT
08/11/47 Chapter 1 1
Chapter 1: Introduction to Compiling
Dr. Winai Wichaipanitch
Rajamangala Institute of Technology
Klong 6, Thanyaburi, Pathumthani 12110
Tel: 06-999-2974
http://www.en.rit.ac.th/winai
Purpose of Compiler
Compilers translate a program written in one language (the source) into another language (the target).
Source program → Compiler → Target program
The compiler also reports error messages. Both source and target languages are diverse and varied.
Introduction to Compilers
As a discipline, compiler construction involves multiple CS&E areas:
- Programming languages and algorithms
- Theory of computing and software engineering
- Computer architecture and operating systems
Translation Mechanisms
Compilation: translate a source program in one language into an executable program in another language, then produce results by executing the new program.
Examples: C, C++, FORTRAN
Interpretation: read a source program and produce results while analyzing (understanding) that program.
Examples: BASIC, LISP
Case study: Java. First, the source is translated to Java bytecode; second, the bytecode is executed by interpretation (the JVM).
Comparison of Compiler/Interpreter
| | Compiler | Interpreter |
|---|---|---|
| Advantages | Fast program execution; exploits architecture features | Easy to debug; flexible to modify; machine independent |
| Disadvantages | Pre-processing of the program; complexity | Execution overhead; space overhead |
| Overview | Source code → compiler → object code; object code + data → results | Source code + data → interpreter → results |
Classifications of Compilers
Compilers can be viewed from many perspectives; however, all use the same basic tasks to accomplish their actions.
- Construction: single pass, multiple pass, load & go
- Functional: debugging, optimizing
We do not yet know the addresses, so the source code must be read twice (two passes).
The Model
The two fundamental parts:
- Analysis: decompose the source into an intermediate representation
- Synthesis: generate the target program from that representation
We will discuss both in this class, and focus on analysis.
Important Notes
Today there are many software tools that help with the analysis part; this was not the case in the early days. (Some) analysis is also important in:
- Structure/syntax-directed editors: force syntactically correct code to be entered
- Pretty printers: produce a standardized layout of the program structure (e.g., blank space, indentation)
- Static checkers: a "quick" compilation to detect rudimentary errors
- Interpreters: "real-time" execution of code, a line at a time
Important Notes
Compilation is not limited to programming-language applications:
- Text formatters: LaTeX and troff are languages whose commands format text
- Silicon compilers: take textual or graphical input and generate a circuit design
- Database query processors: database query languages are also programming languages; the input is compiled into a set of operations for accessing the database
The Many Phases of a Compiler
Source program
→ 1. Lexical analyzer
→ 2. Syntax analyzer
→ 3. Semantic analyzer
→ 4. Intermediate code generator
→ 5. Code optimizer
→ 6. Code generator
→ Target program
The symbol-table manager and the error handler support all phases.
1, 2, 3: Analysis (our focus)
4, 5, 6: Synthesis
Phases of a Modern Compiler
Example source program: IF (a<b) THEN c=1*d;
Lexical analyzer → token sequence:
  IF ( ID"a" < ID"b" ) THEN ID"c" = CONST"1" * ID"d" ;
Syntax analyzer → syntax tree:
  IF_stmt
    cond_expr: <
      a
      b
    list
      assign_stmt
        lhs: c
        rhs: *
          1
          d
Semantic analyzer → 3-address code:
  GE a, b, L1
  MULT 1, d, c
  L1:
Code optimizer → optimized 3-address code:
  GE a, b, L1
  MOV d, c
  L1:
Code generation → assembly code:
  loadi R1,a
  cmpi R1,b
  jge L1
  loadi R1,d
  storei R1,c
  L1:
Language-Processing System
Source program
→ Preprocessor
→ Compiler
→ Assembler
→ Relocatable machine code
→ Loader / link-editor (also takes the library and other relocatable object files)
→ Executable
The Analysis Task for Compilation
Three phases:
- Linear (lexical) analysis: a left-to-right scan to identify tokens (a token is a sequence of characters having a collective meaning)
- Hierarchical analysis: grouping of tokens into meaningful collections
- Semantic analysis: checking to ensure the correctness of components
Phase 1. Lexical Analysis
The easiest analysis: identify the tokens, which are the basic building blocks.
For example, in
position := initial + rate * 60 ;
all of position, :=, initial, +, rate, *, 60, and ; are tokens. Blanks, line breaks, etc. are scanned out.
Phase 2. Hierarchical Analysis, aka Parsing or Syntax Analysis
For the previous example, we would have the parse tree:
assignment statement
  identifier: position
  :=
  expression
    expression
      identifier: initial
    +
    expression
      expression
        identifier: rate
      *
      expression
        number: 60
Nodes of the tree are constructed using a grammar for the language.
What is a Grammar?
A grammar is a set of rules that govern the interdependencies and structure among the tokens. For example:
- A statement is an assignment statement, or a while statement, or an if statement, or ...
- An assignment statement is an identifier := expression ;
- An expression is (expression), or expression + expression, or expression * expression, or a number, or an identifier, or ...
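Syntax Tree
An if statement has the form
if expression then statement else statement ;
For example, the syntax tree of  if num = 0 then avg := 0 else avg := sum / num  is:
if statement
  expression (relop =)
    id: num
    0
  assign statement (then branch)
    id: avg
    :=
    expression: 0
  assign statement (else branch)
    id: avg
    :=
    expression (mulop /)
      id: sum
      id: num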
Why Have We Divided Analysis in This Manner?
- Lexical analysis scans the input; its linear actions are not recursive. It identifies only the individual "words" that are the tokens of the language.
- Recursion is required to identify the structure of an expression, as indicated in the parse tree. Syntax analysis verifies that the "words" are correctly assembled into "sentences".
Phase 3. Semantic Analysis
Semantic analysis finds more complicated semantic errors and supports code generation. The parse tree is augmented with semantic actions.
Compressed tree:
:=
  position
  +
    initial
    *
      rate
      60
Tree with the conversion action (the integer 60 is converted to a real):
:=
  position
  +
    initial
    *
      rate
      inttoreal
        60
Phase 3. Semantic Analysis
The most important activity in this phase is type checking: verifying the legality of operands.
Supporting Phases/Activities for Analysis
Symbol table creation/maintenance:
- Contains information (storage, type, scope, arguments) on each "meaningful" token, typically identifiers
- The data structure is created/initialized during lexical analysis
- It is utilized/updated during later analysis and synthesis
Symbol Table for Example
| Name | Type | Def/UDef | Other |
|---|---|---|---|
| avg | real id | D | … |
| if | keyword | | … |
| num | int id | D | … |
| sum | real id | D | … |
| then | keyword | | … |
Error Handling
- Detection of different errors, which correspond to all phases
- What kinds of errors are found during the analysis phase?
- What happens when an error is found?
The Synthesis Task for Compilation
- Intermediate code generation: an abstract-machine version of the code, independent of the architecture; it is easy to produce, and easy to translate in the final, machine-dependent code generation
- Code optimization: find more efficient ways to execute the code by replacing it with more efficient statements; two approaches: high-level-language optimization and "peephole" optimization
- Final code generation: generate relocatable, machine-dependent code
Reviewing the Entire Process
position := initial + rate * 60
→ lexical analyzer →
id1 := id2 + id3 * 60
→ syntax analyzer →
:=
  id1
  +
    id2
    *
      id3
      60
→ semantic analyzer →
:=
  id1
  +
    id2
    *
      id3
      inttoreal
        60
→ intermediate code generator
Symbol table: position …, initial …, rate …
Errors may be reported by any phase.
Reviewing the Entire Process (continued)
→ intermediate code generator → 3-address code:
temp1 := inttoreal(60)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
→ code optimizer →
temp1 := id3 * 60.0
id1 := id2 + temp1
→ final code generator →
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
Symbol table: position …, initial …, rate …
Assemblers
Assembly code: names are used for instructions, and names are used for memory addresses.
Two-pass assembly:
- First pass: all identifiers are assigned memory addresses (0-offset), e.g., substitute 0 for a and 4 for b
- Second pass: produce relocatable machine code. The instructions
MOV a, R1
ADD #2, R1
MOV R1, b
become
0001 01 00 00000000 *
0011 01 10 00000010
0010 01 00 00000100 *
where * marks the relocation bit.
Loaders and Link-Editors
Loader: takes relocatable machine code, alters the addresses, and places the altered instructions into memory.
Link-editor: takes many (relocatable) machine-code programs (with cross-references) and produces a single file. It needs to keep track of the correspondence between variable names and their addresses in each piece of code.
Compiler Cousins: Preprocessors
Preprocessors provide input to compilers.
1. Macro Processing
#define in C performs text substitution before compilation:
#define X 3
#define Y A*B+C
#define Z getchar()
2. File Inclusion
#include in C brings in the contents of another file before compiling. For example, if defs.h contains
//////
//////
//////
and main.c contains
#include "defs.h"
…---…---…---…---…---…---…---…---…---
then after preprocessing main.c becomes
//////
//////
//////
…---…---…---…---…---…---…---…---…---
3. Rational Preprocessors
Augment "old" languages with modern constructs, e.g., add macros for if-then, while, etc.
#define can make C code more Pascal-like:
#define begin {
#define end }
#define then
4. Language Extensions for a Database System
EQUEL - Database query language embedded in C
## Retrieve (DN=Department.Dnum) where
## Department.Dname = 'Research'
is preprocessed into
ingres_system("Retr…..Research'", ____, ____);
a procedure call in a programming language.
The Grouping of Phases
- Front end: analysis + intermediate code generation
- Back end: code generation + optimization
vs.
Number of passes:
- A pass requires reading and writing intermediate files
- Fewer passes: more efficiency
- However, fewer passes require more sophisticated memory management and compiler-phase interaction
- Trade-offs …
Compiler Construction Tools
Parser generators: produce syntax analyzers
Scanner generators: produce lexical analyzers
Syntax-directed translation engines: generate intermediate code
Automatic code generators: generate actual code
Data-flow engines: support optimization