Yu-Chen Kuo, Chapter 1: Introduction to Compiling
Date posted: 19-Dec-2015
• Source languages: Fortran, Pascal, C, etc.
• Target languages: another programming language or machine language
• Compilers:
  – Single-pass
  – Multi-pass
  – Load-and-go
  – Debugging
  – Optimizing
Analysis-Synthesis Model
• Compilation has two parts: analysis and synthesis
• Analysis:
  – Break the source program into its constituent pieces
  – Create an intermediate representation
  – Hierarchical structure: syntax tree
    • Node: an operation
    • Children of a node: the arguments of that operation
• Synthesis: construct the target program from the tree
Context of a Compiler
• Several other programs may be needed to create an executable (.exe) file
  – Preprocessor: expands macros
  – Assembler: translates assembly code into machine code
  – Loader/link-editor: links library routines
1.2 Analysis of the source program
• Three phases
1. Linear analysis
    • Divide the source program into tokens
2. Hierarchical analysis
    • Group the tokens hierarchically
3. Semantic analysis
    • Ensure the components fit together meaningfully
Lexical Analysis
• Linear analysis is also called lexical analysis or scanning
• E.g., position := initial + rate * 60 is grouped into the tokens:
1. Identifier position
2. Assignment symbol “:=”
3. Identifier initial
4. “+” sign
5. Identifier rate
6. “*” sign
7. number 60
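The seven tokens above can be reproduced with a small scanner. This is a minimal sketch, not the textbook's implementation: the token category names (`IDENT`, `ASSIGN`, and so on) and the regular expressions are assumptions made for illustration.

```python
import re

# Hypothetical token categories for the slide's example (names are illustrative).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("ASSIGN", r":="),
    ("PLUS", r"\+"),
    ("TIMES", r"\*"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Scan text left to right and return a list of (kind, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Lexical error: the remaining characters cannot form any token
            raise SyntaxError(f"cannot form a token at position {pos}: {text[pos]!r}")
        if m.lastgroup != "SKIP":  # blanks separate tokens but are not tokens
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for kind, lexeme in tokenize("position := initial + rate * 60"):
    print(kind, lexeme)
```

The scan is purely linear: no recursion or stack is needed, which is what distinguishes this phase from parsing.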
Syntax Analysis
• Hierarchical analysis: parsing or syntax analysis
  – Group tokens into grammatical phrases
  – Grammatical phrases are represented by a parse tree
Syntax Analysis
• Hierarchical structure is expressed by recursive rules
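• Recursively define an expression:
1. Any identifier is an expression
2. Any number is an expression
3. If expression1 and expression2 are expressions, then so are expression1 + expression2, expression1 * expression2, and (expression1)
• By rule 1, initial and rate are expressions
• By rule 2, 60 is an expression
• By rule 3, rate*60 is an expression, and therefore initial+rate*60 is an expression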
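The recursive rules map naturally onto a recursive-descent parser. The sketch below is one possible rendering, under assumptions the slide does not state: the rules are split into `expression`, `term`, and `factor` levels so that `*` binds tighter than `+`, and tree nodes are plain tuples. All function names are hypothetical.

```python
import re

def tokenize(text):
    # Identifiers, numbers, and the single-character symbols + * ( )
    return re.findall(r"[A-Za-z][A-Za-z0-9]*|\d+|[+*()]", text)

def parse_expression(tokens):
    """expression -> term { '+' term }"""
    node, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        node = ("+", node, right)
    return node, rest

def parse_term(tokens):
    """term -> factor { '*' factor }"""
    node, rest = parse_factor(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_factor(rest[1:])
        node = ("*", node, right)
    return node, rest

def parse_factor(tokens):
    """factor -> identifier | number | '(' expression ')'"""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == "(":
        node, rest = parse_expression(rest)  # rule 3: (expression1)
        if not rest or rest[0] != ")":
            raise SyntaxError("expected ')'")
        return node, rest[1:]
    if head.isdigit():
        return ("num", int(head)), rest      # rule 2
    if head[0].isalpha():
        return ("id", head), rest            # rule 1
    raise SyntaxError(f"unexpected token {head!r}")

tree, leftover = parse_expression(tokenize("initial + rate * 60"))
print(tree)
```

Each interior tuple is an operation and its remaining elements are the operand subtrees, matching the syntax-tree picture used earlier in the chapter.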
Syntax Analysis
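• Recursively define a statement:
1. If identifier1 is an identifier and expression2 is an expression, then identifier1 := expression2 is a statement
2. If expression1 is an expression and statement2 is a statement, then
    while (expression1) do statement2
    if (expression1) then statement2
  are statements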
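Lexical vs. Syntax Analysis
• The division between lexical and syntax analysis is somewhat arbitrary
• One deciding factor: is the construct inherently recursive or not?
  – Identifiers are recognized by a linear scan that stops at the first character that is neither a letter nor a digit; no recursion is needed
    • E.g., initial
  – A linear scan is not powerful enough to analyze expressions or statements without imposing a hierarchical structure
    • E.g., matching ( … ), begin … end, nested statements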
Semantic Analysis
• Check the source program for semantic errors
• Gather type information for code generation
• Use the hierarchical structure to identify operators and operands
• Perform type checking
  – E.g., using a real number to index an array is an error
  – Type conversion
  – E.g., Fig. 1.5: inttoreal(60) converts the integer 60 to a real if initial is a real number
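A type-checking pass with automatic conversion might look like the sketch below. It assumes a tuple-shaped syntax tree with nodes `("id", name)`, `("num", value)`, and `(op, left, right)`, and a `types` dictionary giving each identifier's declared type; both conventions and the function name `check` are illustrative, not from the slides.

```python
def check(node, types):
    """Return (typed_node, type) for a tuple-shaped syntax tree."""
    kind = node[0]
    if kind == "id":
        return node, types[node[1]]
    if kind == "num":
        return node, "real" if isinstance(node[1], float) else "int"
    left, left_type = check(node[1], types)
    right, right_type = check(node[2], types)
    if left_type != right_type:
        # Mixed int/real operands: wrap the integer side in a conversion node
        if left_type == "int":
            left = ("inttoreal", left)
        else:
            right = ("inttoreal", right)
        return (kind, left, right), "real"
    return (kind, left, right), left_type

tree = ("+", ("id", "initial"), ("*", ("id", "rate"), ("num", 60)))
typed_tree, result_type = check(tree, {"initial": "real", "rate": "real"})
print(typed_tree, result_type)
```

With both identifiers declared real, the only change to the tree is the `inttoreal` node wrapped around the literal 60.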
Analysis in Text Formatters
• \hbox {<list of boxes>}
• \hbox {\vbox{! 1} \vbox{@ 2}}
1.3 The Phases of A Compiler
• Phases
• First three phases: analysis portion
• Last three phases: synthesis portion
• Symbol-table management phase
• Error handler phases
Symbol-table Management
• Record the identifiers used in the source program
  – An identifier is detected by lexical analysis and then stored in the symbol table
• Collect the attributes of identifiers (these are not determined by lexical analysis)
  – Storage allocation: memory address
  – Type
  – Scope (where it is valid: local or global)
  – Arguments (in the case of procedure names)
    • Number and types of arguments
    • Argument-passing method (e.g., call by reference)
    • Return type
Symbol-table Management
• Semantic analysis uses the type information to check the type consistency of identifiers
• Code generation uses the storage-allocation information to generate properly relocatable addresses
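One way to picture the symbol table is as a dictionary of records whose fields are filled in by different phases. The sketch below is illustrative only: the `Entry` and `SymbolTable` names, the choice of fields, and the 4-byte size used for addresses are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    name: str                      # lexeme, entered by the lexical analyzer
    type: Optional[str] = None     # filled in by later phases
    scope: Optional[str] = None    # e.g. "local" or "global"
    address: Optional[int] = None  # storage allocation

class SymbolTable:
    def __init__(self):
        self._entries = {}

    def insert(self, name):
        """Add name if it is missing and return its entry."""
        return self._entries.setdefault(name, Entry(name))

    def lookup(self, name):
        """Return the entry for name, or None if it was never inserted."""
        return self._entries.get(name)

table = SymbolTable()
for lexeme in ["position", "initial", "rate"]:
    table.insert(lexeme)           # lexical analysis records the identifiers

# Later phases fill in the attributes (4-byte reals are an assumption).
for offset, name in enumerate(["position", "initial", "rate"]):
    entry = table.lookup(name)
    entry.type = "real"
    entry.scope = "global"
    entry.address = offset * 4

print(table.lookup("rate"))
```

Keeping the attributes optional reflects the slide's point that lexical analysis can enter the name but cannot yet know the type, scope, or address.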
Error Detection and Reporting
• Syntax and semantic analysis handle a large fraction of errors
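• Lexical phase: the remaining input characters do not form any token
• Syntax phase: the token stream violates the structure rules of the language
• Semantic phase: the construct is syntactically correct, but the operation has no meaning
  – E.g., adding an array name and a procedure name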
The Analysis Phases
• Lexical analysis
  – Group characters into tokens
    • Identifiers
    • Keywords (if, while)
    • Punctuation (‘(’, ‘)’)
    • Multi-character operators (‘:=’)
  – Enter the lexical value (lexeme) into the symbol table
    • position, rate, initial
• Syntax analysis
  – Fig. 1.11(a), 1.11(b)
The Analysis Phases
• Syntax analysis
• Semantic analysis
  – Type checking and type conversion
Intermediate Code Generation
• Represent the source program as code for an abstract machine
• Should be easy to produce
• Should be easy to translate into target program
• Three-address code (each instruction has at most three operands)
  – E.g., temp2 := id3 * temp1
  – Every memory location can act like a register
    • E.g., temp2 plays the role of a register such as BX
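Three-address code can be produced by a post-order walk of the syntax tree that introduces a fresh temporary for each operation. The following is a sketch under assumptions: tuple-shaped tree nodes, an `inttoreal` conversion node, and the names `generate`, `walk`, and `new_temp` are all hypothetical.

```python
def generate(target, tree):
    """Return three-address instructions that assign tree's value to target."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        kind = node[0]
        if kind in ("id", "num"):
            return str(node[1])           # leaves need no instruction
        if kind == "inttoreal":
            operand = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({operand})")
            return temp
        left = walk(node[1])              # operands first, then the operator
        right = walk(node[2])
        temp = new_temp()
        code.append(f"{temp} := {left} {kind} {right}")
        return temp

    result = walk(tree)
    code.append(f"{target} := {result}")
    return code

# id1 := id2 + id3 * inttoreal(60)
tree = ("+", ("id", "id2"), ("*", ("id", "id3"), ("inttoreal", ("num", 60))))
for line in generate("id1", tree):
    print(line)
```

Every generated instruction has one operator besides the assignment and at most three addresses, so temp2 := id3 * temp1 appears as the second line of the output.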
Code Optimization
• Improve the intermediate code
• Goal: faster-running machine code
  – temp1 := id3 * 60.0
  – id1 := id2 + temp1
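The two-instruction result can be reached from a four-instruction sequence by two simple transformations: performing the integer-to-real conversion of a constant at compile time, and merging a copy with the instruction that computed its source. The sketch below assumes instructions are `(dest, op, arg1, arg2)` tuples and that a temporary consumed by a copy is not used again afterwards; neither assumption comes from the slides.

```python
def optimize(code):
    """code: list of (dest, op, arg1, arg2) tuples. Returns a shorter list."""
    constants = {}   # temporaries known to hold a compile-time constant
    folded = []
    for dest, op, a, b in code:
        a = constants.get(a, a)
        b = constants.get(b, b)
        if op == "inttoreal" and isinstance(a, int):
            constants[dest] = float(a)   # do the conversion at compile time
            continue
        folded.append((dest, op, a, b))

    result = []
    for dest, op, a, b in folded:
        # "x := t" directly after "t := a op b" becomes "x := a op b"
        if op == "copy" and result and result[-1][0] == a:
            _, prev_op, prev_a, prev_b = result.pop()
            result.append((dest, prev_op, prev_a, prev_b))
        else:
            result.append((dest, op, a, b))
    return result

code = [
    ("temp1", "inttoreal", 60, None),
    ("temp2", "*", "id3", "temp1"),
    ("temp3", "+", "id2", "temp2"),
    ("id1", "copy", "temp3", None),
]
for instruction in optimize(code):
    print(instruction)
```

The sketch does not renumber temporaries, so the surviving temporary is called temp2 here where the slide calls it temp1; the two instructions are otherwise the same.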
Code Generation
• Generate relocatable machine code or assembly code
  – MOVF id3, R2
  – MULF #60.0, R2
  – MOVF id2, R1
  – ADDF R2, R1
  – MOVF R1, id1
1.4 Cousins of The Compiler
• Preprocessors
• Assemblers
• Two-Pass Assembler
• Loaders and Link-Editors
Preprocessors
• Macro processing
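• File inclusion
  – #include <global.h> is replaced by the contents of the file global.h
• “Rational” preprocessors
  – Augment older languages with modern flow-of-control and data-structuring facilities
• Language extensions
  – E.g., statements beginning with ## are query-language statements embedded in C
  – These are translated into procedure calls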
Preprocessors
• Example 1.2
  – Macro definition: \define\JACM #1; #2; #3 {{\sl J. ACM} {\bf #1}: #2, pp. #3.}
  – Macro use: \JACM 17;4;715-728
  – Result: J. ACM 17:4, pp. 715-728.
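The substitution step of this macro can be imitated in a few lines. This is only a sketch of parameter substitution, not of TeX itself: the `expand` function is hypothetical, and the font commands (`\sl`, `\bf`) are left out of the template so the output is plain text.

```python
import re

def expand(template, arguments):
    """Replace each #n in template with the n-th argument (1-based)."""
    return re.sub(r"#(\d)", lambda m: arguments[int(m.group(1)) - 1], template)

# Body of \JACM with the formatting commands stripped (an assumption for readability)
template = "J. ACM #1:#2, pp. #3."

# The actual arguments are separated by semicolons, as in the macro use
arguments = "17;4;715-728".split(";")
print(expand(template, arguments))
```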
Assembler
• Produces relocatable machine code
  – DW a #10
  – DW b #20
  – MOV a, R1
  – ADD #2, R1
  – MOV R1, b
• Load the contents of address a into R1
• Add the constant 2 to R1
• Store R1 into address b
Two-Pass Assembly
• First pass
  – Find all identifiers and their storage locations and store them in the symbol table
    Identifier   Address
    a            0
    b            4
• Second pass
  – Translate each operation code into its sequence of bits
  – Produce relocatable machine code
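The first pass can be sketched as a single loop over the source lines that assigns the next free address to each declared identifier. The `DW` line format follows the assembler slide; the 4-byte word size is an assumption chosen to agree with the addresses 0 and 4 in the table, and `first_pass` is a hypothetical name.

```python
WORD_SIZE = 4  # bytes per word; assumed, consistent with a at 0 and b at 4

def first_pass(lines):
    """Assign consecutive addresses to identifiers declared with DW."""
    symbols = {}
    next_address = 0
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "DW":
            symbols[fields[1]] = next_address
            next_address += WORD_SIZE
    return symbols

program = ["DW a #10", "DW b #20", "MOV a, R1", "ADD #2, R1", "MOV R1, b"]
symbols = first_pass(program)
print(symbols)
```

The second pass would then look up `a` and `b` in `symbols` while translating each instruction into bits.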
Two-Pass Assembly
• Example 1.3
Inst. Code   Register   Mem/Const.      Content        (R)
0001(MOV)    01(R1)     00(Mem)         00000000(a)    *
0011(ADD)    01(R1)     10(Constant)    00000010
0010(MOV)    01(R1)     00(Mem)         00000100(b)    *
Two-Pass Assembly
• ‘*’ denotes the relocation bit
  – Suppose the data is loaded starting at address 00001111
  – a should be at location 00001111 + 00000000 = 00001111
  – b should be at location 00001111 + 00000100 = 00010011
Inst. Code   Register   Mem/Const.      Content        (R)
0001(MOV)    01(R1)     00(Mem)         00001111(a)    *
0011(ADD)    01(R1)     10(Constant)    00000010
0010(MOV)    01(R1)     00(Mem)         00010011(b)    *
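The relocation arithmetic can be checked with a short sketch. Each instruction is modeled as a tuple of bit strings plus a flag for the relocation bit; this representation and the `relocate` function are assumptions made for illustration.

```python
def relocate(instructions, load_address):
    """Add load_address to the operand of every instruction marked relocatable.

    instructions: list of (opcode, register, mode, operand, relocatable),
    where the first four fields are bit strings.
    """
    result = []
    for opcode, register, mode, operand, relocatable in instructions:
        if relocatable:
            operand = format(int(operand, 2) + load_address, "08b")
        result.append((opcode, register, mode, operand))
    return result

code = [
    ("0001", "01", "00", "00000000", True),   # MOV a, R1   (address of a)
    ("0011", "01", "10", "00000010", False),  # ADD #2, R1  (constant, not relocated)
    ("0010", "01", "00", "00000100", True),   # MOV R1, b   (address of b)
]
relocated = relocate(code, 0b00001111)
for instruction in relocated:
    print(" ".join(instruction))
```

Only the two operands carrying the relocation bit change; the constant 2 in the ADD instruction is left alone.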