Post on 15-Dec-2015
Compilers and Language Translation
Gordon College
2
What’s a compiler? All computers only understand machine language
Therefore, high-level language instructions must be translated into machine language prior to execution
10000010010110100100101……
This isa program
3
What’s a compiler? Compiler
A piece of system software that translates high-level languages into machine language
10000010010110100100101……
Congrats!
while (c!='x') { if (c == 'a' || c == 'e' || c == 'i')
printf("Congrats!"); else if (c!='x') printf("You Loser!"); }
Compiler
gcc -o prog program.c
program.c
prog
4
Assembler (a kind of compiler)
LOAD X Assembly
0101 0000 0000 1001 Machine Language
(symbol table)(opcode table)
One-to-one translation
5
Compiler (high-level language translator)
a = b + c - d;
One-to-many translation
0101 00001110001 LOAD B0111 00001110010 ADD C0110 00001110011 SUBTRACT D0100 00001110100 STORE A
0101 00001110001 0111 00001110010…….
6
Goals of a compiler Code produced must be correct
A = (B+C)-(D+E);
Possible translation:LOAD BADD CSTORE BLOAD DADD ESTORE DLOAD BSUBTRACT DSTORE A
No - STORE B and STORE D changes the values of variables B and D which is the high-level language does not intend
Is this correct?
7
Goals of a compiler Code produced should be reasonably efficient
and conciseCompute the sum - 2x1+ 2x2+ 2x3+ 2x4+…. 2x50000
sum = 0.0for(i=0;i<50000;i++) {
sum = sum + (2.0 * x[i]);
Optimizing compiler:sum = 0.0for(i=0;i<50000;i++) {
sum = sum + x[i];sum = sum * 2.0; 49,999 less instructions
8
General Structure of a Compiler
9
The Compilation Process Phase I: Lexical analysis
Compiler examines the individual characters in the source program and groups them into syntactical units called tokens
Phase II: Parsing
The sequence of tokens formed by the scanner is checked to see whether it is syntactically correct
ScannerSourcecode
Groups of
tokens
ParserGroups of
tokens
correct
not correct
10
The Compilation Process
Phase III: Semantic analysis and code generation The compiler analyzes the meaning of the
high-level language statement and generates the machine language instructions to carry out these actions
Code Generator
Groupsof
tokens
Machinelanguage
11
The Compilation Process Phase IV: Code optimization
The compiler takes the generated code and sees whether it can be made more efficient
Code Optimizer Machine
languageMachinelanguage
12
Overall Execution Sequence on a High-Level Language Program
13
The Compilation Process
Source program
Original high-level language program
Object program
Machine language translation of the source program
14
Phase I: Lexical Analysis Lexical analyzer
The program that performs lexical analysis
More commonly called a scanner
Job of lexical analyzer Group input characters into tokens
• Tokens: Syntactical units that are treated as single, indivisible entities for the purposes of translation
Classify tokens according to their type
15
Phase I: Lexical AnalysisProgram statement
sum = sum + a[i];
Digital perspective:
tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;
Tokenized:
sum,=,sum,+,a[i],;
16
Phase I: Lexical Analysis
TOKEN TYPE CLASSIFICATION NUMBERSymbol 1Number 2= 3+ 4- 5; 6== 7If 8Else 9( 10) 11[ 12] 13…
Typical Token Classifications
17
Phase I: Lexical Analysis Lexical Analysis Process
1. Discard blanks, tabs, etc. - look for beginning of token.2. Put characters together 3. Repeat step 2 until end of token4. Classify and save token5. Repeat steps 1-4 until end of statement6. Repeat steps 1-5 until end of source code
Scannersum=sum+a[i];
sum 1= 3+ 4a 1[ 12i 1] 13; 6
18
Phase I: Lexical Analysis
Input to a scanner- A high-level language statement from the source program
Scanner’s output- A list of all the tokens in that statement- The classification number of each token found
Scannersum=sum+a[i];
sum 1= 3+ 4a 1[ 12i 1] 13; 6
19
Phase II: Parsing
Parsing phase
A compiler determines whether the tokens recognized by the scanner are a syntactically legal statement
Performed by a parser
20
Phase II: Parsing
Output of a parser
A parse tree, if such a tree exists
An error message, if a parse tree cannot be constructed
Successful construction of a parse tree is proof that the statement is correctly formed
21
Example High-level language statement: a = b + c
22
Grammars, Languages, and BNF
Syntax
The grammatical structure of the language
The parser must be given the syntax of the language
BNF (Backus-Naur Form)Most widely used notation for representing the syntax of a programming language
literal_expression ::= integer_literal | float_literal | string | character
23
Grammars, Languages, and BNF
In BNF
The syntax of a language is specified as a set of rules (also called productions)
A grammar
• The entire collection of rules for a language
Structure of an individual BNF rule
left-hand side ::= “definition”
24
Grammars, Languages, and BNF
BNF rules use two types of objects on the right-hand side of a production Terminals
• The actual tokens of the language
• Never appear on the left-hand side of a BNF rule Nonterminals
• Intermediate grammatical categories used to help explain and organize the language
• Must appear on the left-hand side of one or more rules
25
Grammars, Languages, and BNF
Goal symbol
The highest-level nonterminal
The nonterminal object that the parser is trying to produce as it builds the parse tree
All nonterminals are written inside angle brackets
Java BNF
26
BNF Example<postal-address> ::= <name-part> <street-address> <zip-part> <name-part> ::= <personal-part> <last-name> <opt-jr-part> <EOL> | <personal-part> <name-part> <personal-part> ::= <first-name> | <initial> "." <street-address> ::= <opt-apt-num> <house-num> <street-name> <EOL><zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL><opt-jr-part> ::= "Sr." | "Jr." | <roman-numeral> | ""
Identify the following:Goal symbol, terminals, nonterminals, a individual rule
Is this a legal postal address?Steve Moses Sr. 215 Rose Ave.Everywhere, NC 43563
27
Parsing Concepts and Techniques Fundamental rule of parsing:
By repeated applications of the rules of the grammar-
If the parser can convert the sequence of input tokens into the goal symbol the sequence of tokens is a syntactically valid
statement of the languageelse
the sequence of tokens is not a syntactically valid statement of the language
28
Parsing Example<httpaddress> ::= http:// <hostport> [ / <path> ] [ ? <search> ]<hostport> ::= <host> [ : <port> ]<host> ::= <hostname> | <hostnumber><hostname> ::= <ialpha> [ . <hostname> ]<hostnumber> ::= <digits> . <digits> . <digits> . <digits><port> ::= <digits><path> ::= <void> | <xpalphas> [ / <path> ]<search> ::= <xalphas> [ + <search> ]<xalpha> ::= <alpha> | <digit> | <safe> | <extra> | <escape> <xalphas> ::= <xalpha> [ <xalphas> ]<xpalpha> ::= <xalpha> | +<xpalphas> ::= <xpalpha> [ <xpalpha> ]<ialpha> ::= <alpha> [ <xalphas> ]<alpha> ::= a | b | … | z | A | B | … | Z<digit> ::= 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9<safe> ::= $ | - | _ | @ | . | & | ~<extra> ::= ! | * | " | ' | ( | ) | : | ; | , | <space><escape> ::= % <hex> <hex><hex> ::= <digit> | a | b | c | d | e | f | A | B | C | D | E | F<digits> ::= <digit> [ <digits> ]<void> ::=
Is the following http address legal:http://www.csm.astate.edu/~rossa/cs3543/bnf.html
29
Parsing Concepts and Techniques
Look-ahead parsing algorithms - intelligent parsers
One of the biggest problems in building a compiler is designing a grammar that:
Includes every valid statement that we want to be in the language
Excludes every invalid statement that we do not want to be in the language
30
Parsing Concepts and Techniques
Another problem in constructing a compiler: Designing a grammar that is not ambiguous
An ambiguous grammar allows the construction of two or more distinct parse trees for the same statement
NOT GOOD - multiple interpretations
31
Phase III: Semantics and Code Generation
Semantic analysis
The compiler makes a first pass over the parse tree to determine whether all branches of the tree are semantically valid
• If they are validthe compiler can generate machine language
instructionselse
there is a semantic error; machine language instructions are not generated
32
Phase III: Semantics and Code Generation
Semantic analysis
Syntactically correct, but semantically incorrect
example:
sum = a + b;
int a;double sum; data type mismatchchar b;
Semantic recordsa integersum doubleb char
33
Phase III: Semantics and Code Generation
Semantic analysis
a integerb char
temp ?
<expression> + <expression>
<expression>
Parse tree
Semantic recordSemantic record
Semantic record
34
Phase III: Semantics and Code Generation
Semantic analysis
a integerb integer
temp integer
<expression> + <expression>
<expression>
Parse tree
Semantic recordSemantic record
Semantic record
35
Phase III: Semantics and Code Generation
Code generation
Compiler makes a second pass over the parse tree to produce the translated code
36
Phase IV: Code Optimization Two types of optimization
Local Global
Local optimization The compiler looks at a very small block of
instructions and tries to determine how it can improve the efficiency of this local code block
Relatively easy; included as part of most compilers:
37
Phase IV: Code Optimization
Examples of possible local optimizations
Constant evaluation x = 1 + 1 ---> x = 2
Strength reduction x = x * 2 ---> x = x + x
Eliminating unnecessary operations
38
Phase IV: Code Optimization
Global optimization The compiler looks at large segments of the program
to decide how to improve performance Much more difficult; usually omitted from all but the
most sophisticated and expensive production-level “optimizing compilers”
Optimization cannot make an inefficient algorithm efficient - “only makes an efficient algorithm more efficient”
39
Summary A compiler is a piece of system software that
translates high-level languages into machine language
Goals of a compiler: Correctness and the production of efficient and concise code
Source program: High-level language program
40
Summary Object program: The machine language translation
of the source program
Phases of the compilation process
Phase I: Lexical analysis
Phase II: Parsing
Phase III: Semantic analysis and code generation
Phase IV: Code optimization