Compiler1
Chapter V: Compiler
Overview: To study the design and operation of compilers for high-level programming languages.
Contents:
Basic compiler (one-pass compiler) functions
Machine-dependent extensions (object-code generation and code optimization)
Compiler design alternatives: multi-pass compilers, interpreters, p-code compilers, and compiler-compilers
Basic compiler functions (cont.)
Source program: regard each statement as a sequence of tokens.
The task of scanning the source statement, recognizing and classifying the various tokens, is known as lexical analysis. The part of the compiler that performs it is called the scanner.
Each statement must then be recognized as some language construct described by the grammar. This process is called syntactic analysis, or parsing, and is performed by the parser.
Generation of object code.
Compilation process
Scanning (lexical analysis)
Parsing (syntactic analysis)
Code generation
Note: all three steps can be accomplished in a single pass.
Grammars
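A grammar for a programming language is a formal description of the syntax of programs and individual statements written in the language.
A grammar describes syntax, not semantics. The difference shows in an example:
I := J + K
X := Y + I
where X, Y : Real and I, J, K : Integer.
The two statements have identical syntax. However, their semantics are quite different: the first is an integer addition, while the second requires converting I to real before a floating-point addition.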
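The point can be illustrated with a small sketch. It assumes a hypothetical table of declared types and a helper named `describe_add` (neither appears in the slides), and it only describes the operation a compiler would have to select rather than emitting real object code.

```python
def describe_add(types, left, right):
    """Describe the operation implied by `left + right`, given declared types.

    `types` maps variable names to "integer" or "real" (an assumed encoding).
    """
    if types[left] == "integer" and types[right] == "integer":
        return "integer add"
    # Mixed or real operands: any integer operand must be converted first.
    steps = [f"convert {name} to real"
             for name in (left, right) if types[name] == "integer"]
    steps.append("floating-point add")
    return ", then ".join(steps)


# Declarations taken from the example: X, Y are Real; I, J, K are Integer.
types = {"X": "real", "Y": "real", "I": "integer", "J": "integer", "K": "integer"}

first = describe_add(types, "J", "K")    # for  I := J + K
second = describe_add(types, "Y", "I")   # for  X := Y + I
```

Both calls handle a statement of the same syntactic shape, yet `first` is "integer add" and `second` is "convert I to real, then floating-point add".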
Grammars (cont.)
BNF (Backus-Naur Form) is a notation for describing syntax. It is simple and widely used, and it provides capabilities that are sufficient for most purposes.
A BNF grammar consists of a set of rules, each of which defines the syntax of some construct in the programming language. E.g.,
<read> ::= READ ( <id-list> )
Grammars (cont.)
<read> ::= READ ( <id-list> )
<id-list> ::= id | <id-list> , id
Character strings enclosed between < and > are called nonterminal symbols.
Character strings not enclosed between < and > are called terminal symbols (i.e., tokens).
E.g., READ(value, sum, x, y)
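As a rough illustration of how the two rules above can drive recognition, here is a minimal sketch of a hand-written recognizer for `<read>`. The function names `tokenize` and `parse_read` and the token pattern are assumptions made for this example. The left-recursive `<id-list>` rule is handled with a loop, a common rewriting rather than something the slides prescribe.

```python
import re

# Assumed token shapes: identifiers/keywords, parentheses, and commas.
TOKEN_RE = re.compile(r"\s*([A-Za-z][A-Za-z0-9]*|[(),])")


def tokenize(text):
    """Split text into tokens; raise ValueError on an unexpected character."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_read(text):
    """Return the ids in the list if text matches <read>, else raise ValueError."""
    tokens = tokenize(text)
    pos = 0

    def expect(value):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != value:
            raise ValueError(f"expected {value!r}")
        pos += 1

    def expect_id():
        nonlocal pos
        if (pos >= len(tokens) or not tokens[pos][0].isalpha()
                or tokens[pos] == "READ"):
            raise ValueError("expected an identifier")
        pos += 1
        return tokens[pos - 1]

    expect("READ")
    expect("(")
    ids = [expect_id()]                    # <id-list> ::= id
    while pos < len(tokens) and tokens[pos] == ",":
        pos += 1                           # <id-list> ::= <id-list> , id
        ids.append(expect_id())
    expect(")")
    if pos != len(tokens):
        raise ValueError("unexpected tokens after ')'")
    return ids
```

For the statement `READ(value, sum, x, y)`, `parse_read` returns `['value', 'sum', 'x', 'y']`. An empty list, as in `READ()`, is rejected because `<id-list>` requires at least one id.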
Simplified Pascal grammar (cont.)
It is often convenient to display the analysis of a source statement in terms of a grammar as a tree, called a parse tree or syntax tree.
Grammars (cont.)
Draw the parse tree for ALPHA - BETA * GAMMA
If there is more than one possible parse tree for a given statement, the grammar is said to be ambiguous.
An ambiguous grammar would leave doubt about what object code should be generated.
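To make the consequence of ambiguity concrete, the sketch below enumerates every parse tree for ALPHA - BETA * GAMMA under a deliberately ambiguous grammar, `<exp> ::= <exp> - <exp> | <exp> * <exp> | id`. That grammar is an assumption chosen for illustration and is not the simplified Pascal grammar of the slides. The values bound to the identifiers are also made up.

```python
def parse_trees(tokens):
    """Return every parse tree for tokens under the assumed ambiguous grammar
        <exp> ::= <exp> - <exp> | <exp> * <exp> | id
    A tree is either an id (a string) or a tuple (left, operator, right).
    `tokens` must alternate id, operator, id, ..."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees = []
    # Try each operator position as the root of the tree.
    for i in range(1, len(tokens), 2):
        for left in parse_trees(tokens[:i]):
            for right in parse_trees(tokens[i + 1:]):
                trees.append((left, tokens[i], right))
    return trees


def evaluate(tree, env):
    """Evaluate a parse tree using the variable values in env."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    a, b = evaluate(left, env), evaluate(right, env)
    return a - b if op == "-" else a * b


tokens = ["ALPHA", "-", "BETA", "*", "GAMMA"]
trees = parse_trees(tokens)
env = {"ALPHA": 10, "BETA": 2, "GAMMA": 3}     # made-up values
values = [evaluate(tree, env) for tree in trees]
```

Two trees are found. One groups the statement as ALPHA - (BETA * GAMMA) and evaluates to 4; the other groups it as (ALPHA - BETA) * GAMMA and evaluates to 24. Code generated from one tree would compute a different result than code generated from the other, which is why the grammar must settle on a single tree.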
Lexical analysis (scanning)
Lexical analysis involves scanning the program to be compiled and recognizing the tokens that make up the source statements.
Scanners are usually designed to recognize keywords, operators, identifiers, integers, floating-point numbers, character strings, etc.
Identifiers might be defined by the rules:
<ident> ::= <letter> | <ident> <letter> | <ident> <digit>
<letter> ::= A | B | C | D | … | Z
<digit> ::= 0 | 1 | 2 | 3 | … | 9
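A scanner along these lines can be sketched with regular expressions. The keyword set, the operator set, and the token-class names below are assumptions for the example. Only the identifier pattern (a letter followed by letters or digits, uppercase only) comes from the rules above.

```python
import re

KEYWORDS = {"READ", "WRITE", "BEGIN", "END"}     # assumed keyword set

TOKEN_SPEC = [
    ("int",   r"[0-9]+"),
    ("ident", r"[A-Z][A-Z0-9]*"),                # the <ident> rule
    ("op",    r":=|[-+*(),;]"),                  # assumed operator set
    ("skip",  r"\s+"),
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pattern})"
                                for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return a list of (token class, text) pairs for the source string."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER_RE.match(source, pos)
        if m is None:
            raise ValueError(f"unrecognized character at position {pos}")
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "skip":
            continue                             # blanks separate tokens
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"                     # keywords look like identifiers
        tokens.append((kind, text))
    return tokens
```

For instance, `scan("SUM := SUM + 100")` yields an identifier, the `:=` operator, an identifier, the `+` operator, and an integer. Classifying keywords by table lookup after matching the identifier pattern is one common design. It works only when keywords are reserved words.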
The lexical scanning
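The scanner must deal with cases such as the following:
DO 10 I = 1, 100 (a DO loop)
DO 10 I = 1 (an assignment to the variable DO10I, because FORTRAN ignores blanks in a statement)
Keywords that are not reserved words may also be used as identifiers:
IF (THEN .EQ. ELSE) THEN
IF = THEN
ELSE
THEN = IF
ENDIF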
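The DO 10 I case shows that a scanner may need to look ahead in the statement before it can classify the first token. The sketch below is a simplified heuristic and not a full FORTRAN scanner: it removes blanks and then checks whether a comma follows the `=`. The function name `classify` and the result tuples are assumptions, and commas nested inside parentheses (array subscripts, for example) are not handled.

```python
import re

# After blank removal: DO <label> <variable> = <start> , <limit>
DO_LOOP_RE = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)")
# After blank removal: <variable> = <expression>
ASSIGN_RE = re.compile(r"([A-Z][A-Z0-9]*)=(.+)")


def classify(statement):
    """Classify a statement as a DO loop or an assignment (simplified)."""
    text = statement.replace(" ", "")        # blanks are not significant
    m = DO_LOOP_RE.fullmatch(text)
    if m:
        label, var, start, limit = m.groups()
        return ("do", label, var, start, limit)
    m = ASSIGN_RE.fullmatch(text)
    if m:
        return ("assign", m.group(1), m.group(2))
    raise ValueError("statement not recognized by this sketch")
```

With this sketch, `classify("DO 10 I = 1, 100")` is recognized as a loop with label 10 and index variable I, whereas `classify("DO 10 I = 1")` is an assignment to the variable `DO10I`. The two statements are indistinguishable until the comma is seen or found to be missing.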
A number of tools have been developed for automatically constructing lexical scanners from specifications stated in a special-purpose language.
Modeling Scanners as Finite Automata
The tokens of most programming languages can be recognized by a finite automaton.
Starting state vs. final state: scanning begins in the starting state. If the automaton stops in a final state, we say that it recognizes (or accepts) the string being scanned; otherwise, it fails to recognize the string.
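As an illustration, here is a minimal table-driven finite automaton that recognizes identifiers of the form "a letter followed by any number of letters or digits", using uppercase letters only. The state names, the `DEAD` state, and the function names are assumptions introduced for this sketch.

```python
START, IN_IDENT, DEAD = "start", "in_ident", "dead"
FINAL_STATES = {IN_IDENT}


def char_class(ch):
    """Map a character to the input class used in the transition table."""
    if "A" <= ch <= "Z":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"


# (current state, input class) -> next state; missing entries mean "no move".
TRANSITIONS = {
    (START, "letter"): IN_IDENT,
    (IN_IDENT, "letter"): IN_IDENT,
    (IN_IDENT, "digit"): IN_IDENT,
}


def accepts(string):
    """Return True if the automaton stops in a final state on this string."""
    state = START
    for ch in string:
        state = TRANSITIONS.get((state, char_class(ch)), DEAD)
        if state == DEAD:
            return False
    return state in FINAL_STATES
```

Here `accepts("SUM1")` is true, while `accepts("1SUM")` and `accepts("")` are false: the first of those has no transition out of the starting state on a digit, and the empty string leaves the automaton in the starting state, which is not final.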