Module 2: Compilers and Their Working. Software Construction, Lectures 10, 11 and 12.
![Page 1: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/1.jpg)
Module 2: Compilers and Their Working
Software Construction, Lectures 10, 11 and 12
![Page 2: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/2.jpg)
What are Compilers
• Translate information from one representation to another
• Usually information = program
• Typical compilers:
  • VC, VC++, GCC, JavaC
  • FORTRAN, Pascal, VB
• Translators:
  • Word to PDF
  • PDF to PostScript
![Page 3: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/3.jpg)
Source Code
• Optimized for human readability
• Matches human notions of grammar
• Uses named constructs such as variables and procedures
![Page 4: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/4.jpg)
How to Translate
• Translation is a complex process
• The source language and the generated code are very different
• We need to structure the translation
![Page 5: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/5.jpg)
Two-pass Compiler
source code → [Front End] → IR → [Back End] → machine code
(both stages report errors)
• Use an intermediate representation (IR)
• Front end maps legal source code into IR
• Back end maps IR into target machine code
![Page 6: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/6.jpg)
The Front-End
Modules:
• Scanner (also called lexical analyzer)
• Parser
source code → [scanner] → tokens → [parser] → IR
(both modules report errors)
![Page 7: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/7.jpg)
Scanner
• Maps character stream into words, the basic unit of syntax
• Produces pairs:
  • a word and
  • its part of speech
source code → [scanner] → tokens → [parser] → IR
(both modules report errors)
![Page 8: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/8.jpg)
Scanner Example
x = x + y becomes
<id,x> <assign,=> <id,x> <op,+> <id,y>
• In <id,x>, "id" is the token type and "x" is the word
• We call the pair “<token type, word>” a “token”
• Typical tokens: number, identifier, +, -, new, while, if
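The mapping from characters to <token type, word> pairs can be sketched in a few lines of Python. This is only an illustration: the token class names (`number`, `id`, `assign`, `op`) follow the slide's example, while the `skip` class and the `scan` helper are assumptions made for this sketch.

```python
import re

# Hypothetical token classes; the names follow the slide's <token type, word> pairs.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("assign", r"="),
    ("op",     r"[+\-*/]"),
    ("skip",   r"\s+"),          # whitespace separates words but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Return a list of (token type, word) pairs for the input string."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(scan("x = x + y"))
# [('id', 'x'), ('assign', '='), ('id', 'x'), ('op', '+'), ('id', 'y')]
```

A production scanner would usually be generated from the token descriptions rather than written by hand, but the input and output have the same shape.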
![Page 9: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/9.jpg)
Parser
source code → [scanner] → tokens → [parser] → IR
(both modules report errors)
• Recognizes context-free syntax and reports errors
• Guides context-sensitive (“semantic”) analysis
• Builds IR for source program
![Page 10: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/10.jpg)
What is Context-Free Syntax
• To understand this we need some background in context-free grammars
• A context-free grammar is a set of rewrite rules, such as those on the following slides
![Page 11: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/11.jpg)
Context-Free Grammars
Context-free syntax is specified with a grammar G = (S, N, T, P)
• S is the start symbol
• N is a set of non-terminal symbols
• T is a set of terminal symbols or words
• P is a set of productions or rewrite rules
![Page 12: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/12.jpg)
Context-Free Grammars
Grammar for expressions:
1. goal → expr
2. expr → expr op term
3.      | term
4. term → number
5.      | id
6. op   → +
7.      | -
![Page 13: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/13.jpg)
The Front End
For this CFG:
• S = goal
• T = { number, id, +, - }
• N = { goal, expr, term, op }
• P = { 1, 2, 3, 4, 5, 6, 7 }
![Page 14: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/14.jpg)
Context-Free Grammars
• Given a CFG, we can derive sentences by repeated substitution
• Consider the sentence (expression) x + 2 – y
![Page 15: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/15.jpg)
Derivation

| Production | Result |
|---|---|
| | goal |
| 1 | expr |
| 2 | expr op term |
| 5 | expr op y |
| 7 | expr – y |
| 2 | expr op term – y |
| 4 | expr op 2 – y |
| 6 | expr + 2 – y |
| 3 | term + 2 – y |
| 5 | x + 2 – y |
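Repeated substitution is mechanical enough to replay in code. The sketch below applies the slide's production numbers in order, always rewriting the rightmost non-terminal, which is the order the table uses. It works on token types, so the last line reads `id + number - id` rather than `x + 2 - y`; the `derive` helper and the dictionary layout are assumptions for illustration.

```python
# Productions of the expression grammar, keyed by the slide's rule numbers.
PRODUCTIONS = {
    1: ("goal", ["expr"]),
    2: ("expr", ["expr", "op", "term"]),
    3: ("expr", ["term"]),
    4: ("term", ["number"]),
    5: ("term", ["id"]),
    6: ("op",   ["+"]),
    7: ("op",   ["-"]),
}
NONTERMINALS = {"goal", "expr", "term", "op"}

def derive(rules):
    """Apply each rule to the rightmost non-terminal and record every sentential form."""
    form = ["goal"]
    steps = [" ".join(form)]
    for rule in rules:
        lhs, rhs = PRODUCTIONS[rule]
        # Index of the rightmost non-terminal in the current sentential form.
        index = max(i for i, symbol in enumerate(form) if symbol in NONTERMINALS)
        if form[index] != lhs:
            raise ValueError(f"rule {rule} does not apply to {form[index]}")
        form[index:index + 1] = rhs
        steps.append(" ".join(form))
    return steps

for step in derive([1, 2, 5, 7, 2, 4, 6, 3, 5]):
    print(step)
```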
![Page 16: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/16.jpg)
The Front End
• To recognize a valid sentence in some CFG, we reverse this process and build up a parse
• A parse can be represented by a tree: a parse tree or syntax tree
![Page 17: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/17.jpg)
Parse

| Production | Result |
|---|---|
| | goal |
| 1 | expr |
| 2 | expr op term |
| 5 | expr op y |
| 7 | expr – y |
| 2 | expr op term – y |
| 4 | expr op 2 – y |
| 6 | expr + 2 – y |
| 3 | term + 2 – y |
| 5 | x + 2 – y |
![Page 18: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/18.jpg)
Syntax Tree: x + 2 – y
goal
  expr
    expr
      expr
        term
          <id,x>
      op
        +
      term
        <number,2>
    op
      –
    term
      <id,y>
![Page 19: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/19.jpg)
Abstract Syntax Trees
• The parse tree contains a lot of unneeded information
• Compilers often use an abstract syntax tree (AST)
![Page 20: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/20.jpg)
Abstract Syntax Trees
–
  +
    <id,x>
    <number,2>
  <id,y>
• This is much more concise
• An AST summarizes grammatical structure without the details of derivation
• ASTs are one kind of intermediate representation (IR)
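As a rough illustration of how small the AST is compared with the parse tree, the tree for x + 2 – y can be held in two node types. The class names `Leaf` and `BinOp` and the `evaluate` helper are assumptions made for this sketch, not part of the lecture.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    kind: str    # token type, e.g. "id" or "number"
    word: str    # the word itself, e.g. "x" or "2"

@dataclass
class BinOp:
    op: str
    left: "Node"
    right: "Node"

Node = Union[Leaf, BinOp]

# AST for x + 2 - y: the subtraction is the root, the addition is its left child.
AST = BinOp("-", BinOp("+", Leaf("id", "x"), Leaf("number", "2")), Leaf("id", "y"))

def evaluate(node, env):
    """Walk the tree bottom-up; env maps identifier names to values."""
    if isinstance(node, Leaf):
        return int(node.word) if node.kind == "number" else env[node.word]
    left, right = evaluate(node.left, env), evaluate(node.right, env)
    return left + right if node.op == "+" else left - right

print(evaluate(AST, {"x": 5, "y": 3}))  # prints 4
```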
![Page 21: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/21.jpg)
Three-pass Compiler
source code → [Front End] → IR → [Middle End] → IR → [Back End] → machine code
(all stages report errors)
• Intermediate stage for code improvement or optimization
• Analyzes IR and rewrites (or transforms) IR
• Primary goal is to reduce running time of the compiled code
• May also improve space usage, power consumption, ...
• Must preserve the “meaning” of the code
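One small example of an IR-to-IR rewrite is constant folding, shown here on expression trees written as nested tuples. The lecture does not name a specific optimization; constant folding and the tuple encoding `(op, left, right)` are assumptions chosen for this sketch because they are short and preserve the meaning of the code.

```python
def fold_constants(node):
    """Rewrite an expression tree, replacing constant subexpressions by their value."""
    if not isinstance(node, tuple):
        return node                      # a number or a variable name: nothing to rewrite
    op, left, right = node
    left, right = fold_constants(left), fold_constants(right)
    if isinstance(left, int) and isinstance(right, int):
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
    return (op, left, right)             # operands not both constant: keep the node

# x + (2 * 3) is rewritten to x + 6; the result of the program is unchanged.
print(fold_constants(("+", "x", ("*", 2, 3))))   # ('+', 'x', 6)
```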
![Page 22: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/22.jpg)
Lexical Analysis: Scanner
source code → [scanner] → tokens → [parser] → IR
(both modules report errors)
![Page 23: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/23.jpg)
Lexical Analysis
• The task of the scanner is to take a program written in some programming language as a stream of characters and break it into a stream of tokens
• This activity is called lexical analysis
• The lexical analyzer partitions the input string into substrings, called words, and classifies them according to their role
• The output of lexical analysis is a stream of tokens
![Page 24: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/24.jpg)
Tokens
Example:
    if( i == j ) z = 0;
    else z = 1;
Input is just a sequence of characters:
    if ( \b i \b = = \b j \n \t ....
![Page 25: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/25.jpg)
Tokens
Goal:
• partition the input string into substrings
• classify them according to their role
A token is a syntactic category.
• Natural language: “He wrote the program”
  • Words: “He”, “wrote”, “the”, “program”
• Programming language: “if(b == 0) a = b”
  • Words: “if”, “(”, “b”, “==”, “0”, “)”, “a”, “=”, “b”
![Page 26: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/26.jpg)
Tokens
• Identifiers: x y11 maxsize
• Keywords: if else while for
• Integers: 2 1000 -44 5L
• Floats: 2.0 0.0034 1e5
• Symbols: ( ) + * / { } < > ==
• Strings: “enter x” “error”
![Page 27: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/27.jpg)
How to Describe Tokens?
Regular languages are the most popular for specifying tokens:
• Simple and useful theory
• Easy to understand
• Efficient implementations
![Page 28: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/28.jpg)
Example of Languages
• Alphabet = English characters; Language = English sentences
• Alphabet = ASCII; Language = C++ programs, Java, C#
![Page 29: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/29.jpg)
Recap
• Tokens: strings of characters representing lexical units of programs, such as identifiers, numbers, operators
• Regular expressions: a concise description of tokens. A regular expression describes a set of strings
• Language L(R): the set of strings represented by a regular expression R. L(R) is the language denoted by regular expression R
![Page 30: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/30.jpg)
Regular Expression
• R|S = either R or S
• RS = R followed by S (concatenation)
• R* = concatenation of R zero or more times (R* = ε|R|RR|RRR...)
• R? = ε | R (zero or one R)
• R+ = RR* (one or more R)
• [abc] = a|b|c (any of listed)
• [a-z] = a|b|....|z (range)
• [^ab] = c|d|... (anything but ‘a’ ‘b’)
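Each operator in the list can be tried out with Python's `re` module, which uses the same notation for these constructs. The `matches` helper is an assumption for this sketch; it uses `re.fullmatch` so that the whole string, not just a prefix, must belong to L(R). Python's engine also supports features beyond regular languages, so this only demonstrates the notation.

```python
import re

def matches(pattern, text):
    """True if the entire string belongs to the language of the pattern."""
    return re.fullmatch(pattern, text) is not None

EXAMPLES = [
    (r"a|b",   "b",   True),    # either R or S
    (r"ab",    "ab",  True),    # concatenation
    (r"a*",    "",    True),    # zero repetitions are allowed
    (r"a*",    "aaa", True),
    (r"a?",    "aa",  False),   # zero or one, not two
    (r"a+",    "",    False),   # at least one
    (r"[abc]", "c",   True),    # any of the listed characters
    (r"[a-z]", "Q",   False),   # range covers lowercase only
    (r"[^ab]", "c",   True),    # anything but a or b
    (r"[^ab]", "a",   False),
]

for pattern, text, expected in EXAMPLES:
    assert matches(pattern, text) == expected
print("all examples behave as listed")
```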
![Page 31: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/31.jpg)
How to Use REs
• We need a mechanism to determine whether an input string w belongs to L(R), the language denoted by regular expression R
![Page 32: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/32.jpg)
Acceptor
• Such a mechanism is called an acceptor
input string w, language L → [acceptor] → yes, if w ∈ L; no, if w ∉ L
![Page 33: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/33.jpg)
Finite Automata (FA)
• Specification: regular expressions
• Implementation: finite automata
• A finite automaton accepts a string if we can follow transitions labelled with characters in the string from the start state to some accepting state
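The acceptance rule can be made concrete with a table-driven automaton. The sketch below assumes a two-state machine for identifiers of the form letter (letter | digit)*; the state names, the character classes and the `accepts` function are illustrative choices, not taken from the lecture.

```python
def char_class(ch):
    """Map a character to the label used on the transitions."""
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"

# Transition table: (state, character class) -> next state.
TRANSITIONS = {
    ("start", "letter"): "ident",
    ("ident", "letter"): "ident",
    ("ident", "digit"):  "ident",
}
ACCEPTING = {"ident"}

def accepts(text):
    """Follow one transition per character; accept if we end in an accepting state."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, char_class(ch)))
        if state is None:          # no transition for this character: reject
            return False
    return state in ACCEPTING

print(accepts("y11"), accepts("11y"), accepts(""))   # True False False
```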
![Page 34: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/34.jpg)
SYNTACTIC VS SEMANTIC ANALYSIS
![Page 35: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/35.jpg)
Syntactic Analysis
Natural language analogy: consider the sentence “He wrote the program”
• Words: He (noun), wrote (verb), the (article), program (noun)
• Phrases: He (subject), wrote (predicate), the program (object)
• Together they form a sentence
![Page 36: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/36.jpg)
Syntactic Analysis
Programming language: if ( b <= 0 ) a = b
• b <= 0 is a bool expr
• a = b is an assignment
• Together they form an if-statement
![Page 37: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/37.jpg)
Syntactic Analysis
    int* foo(int i, int j))
    {
      for(k=0; i j; )
        fi( i > j )
          return j;
    }
• extra parenthesis: the second “)” after the parameter list
• missing expression: in the for header
• not a keyword: “fi”
![Page 38: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/38.jpg)
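Semantic Analysis
Grammatically correct, but the meaning is wrong: “He wrote the computer”
• Words: He (noun), wrote (verb), the (article), computer (noun)
• Phrases: He (subject), wrote (predicate), the computer (object)
• Together they still form a sentence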
![Page 39: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/39.jpg)
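Semantic Analysis
    int* foo(int i, int j)
    {
      for(k=0; i < j; j++ )
        if( i < j-2 )
          sum = sum+i;
      return sum;
    }
• undeclared var: k and sum are used without a declaration
• return type mismatch: sum is returned from a function declared to return int*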
![Page 40: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/40.jpg)
Role of the Parser
• Not all sequences of tokens are programs
• The parser must distinguish between valid and invalid sequences of tokens
What we need:
• An expressive way to describe the syntax
• An acceptor mechanism that determines whether the input token stream satisfies the syntax
• Parsing is the process of discovering a derivation for some sentence
• A mathematical model of syntax: a grammar G
• An algorithm for testing membership in L(G)
![Page 41: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/41.jpg)
Backus-Naur Form (BNF)
• Context-free grammars are (often) given by BNF expressions (Backus-Naur Form)
• Grammar rules in a similar form were first used in the description of the Algol60 language. The notation was developed by John Backus and adapted by Peter Naur for the Algol60 report, hence the term Backus-Naur Form (BNF)
• The meta-symbols of BNF are:
  • ::= meaning "is defined as"
  • | meaning "or"
  • < > angle brackets used to surround category names
  • [ and ] enclosing optional items
![Page 42: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/42.jpg)
Meta-symbols of BNF
• Optional items are enclosed in meta-symbols [ and ], example:
  <if_statement> ::= if <boolean_expression> then <statement_sequence> [ else <statement_sequence> ] end if ;
• Repetitive items (zero or more times) are enclosed in meta-symbols { and }, example:
  <identifier> ::= <letter> { <letter> | <digit> }
• Terminals of only one character are surrounded by quotes (") to distinguish them from meta-symbols, example:
  <statement_sequence> ::= <statement> { ";" <statement> }
• In recent textbooks, terminal and non-terminal symbols are distinguished by using bold face for terminals and suppressing < and > around non-terminals. This greatly improves readability. The example then becomes:
  if_statement ::= if boolean_expression then statement_sequence [ else statement_sequence ] end if ";"
![Page 43: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/43.jpg)
More Useful Grammar
1. expr → expr op expr
2.      | num
3.      | id
4. op   → +
5.      | –
6.      | *
7.      | /
![Page 44: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/44.jpg)
Derivation: x – 2 * y

| Rule | Sentential Form |
|---|---|
| - | expr |
| 1 | expr op expr |
| 3 | <id,x> op expr |
| 5 | <id,x> – expr |
| 1 | <id,x> – expr op expr |
| 2 | <id,x> – <num,2> op expr |
| 6 | <id,x> – <num,2> * expr |
| 3 | <id,x> – <num,2> * <id,y> |
![Page 45: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/45.jpg)
Derivation
• Such a process of rewrites is called a derivation
• The process of discovering a derivation is called parsing
• At each step, we choose a non-terminal to replace
• Different choices can lead to different derivations
Two derivations are of interest:
1. Leftmost derivation
2. Rightmost derivation
![Page 46: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/46.jpg)
Derivations
• Leftmost derivation: replace the leftmost non-terminal (NT) at each step
• Rightmost derivation: replace the rightmost NT at each step
• The example on the preceding slides was a leftmost derivation
• There is also a rightmost derivation
![Page 47: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/47.jpg)
Rightmost Derivation

| Rule | Sentential Form |
|---|---|
| - | expr |
| 1 | expr op expr |
| 3 | expr op <id,y> |
| 6 | expr * <id,y> |
| 1 | expr op expr * <id,y> |
| 2 | expr op <num,2> * <id,y> |
| 5 | expr – <num,2> * <id,y> |
| 3 | <id,x> – <num,2> * <id,y> |
![Page 48: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/48.jpg)
Derivations
• The two derivations produce different parse trees
• The parse trees imply different evaluation orders!
![Page 49: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/49.jpg)
Parse Trees: leftmost derivation
G
  E
    E
      x
    op
      –
    E
      E
        2
      op
        *
      E
        y
Evaluation order: x – ( 2 * y )
![Page 50: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/50.jpg)
Parse Trees: rightmost derivation
G
  E
    E
      E
        x
      op
        –
      E
        2
    op
      *
    E
      y
Evaluation order: ( x – 2 ) * y
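That the two trees really mean different things is easy to check numerically. The sketch below writes each tree as nested tuples `(op, left, right)` and evaluates both with the same variable values; the tuple encoding, the `evaluate` helper and the sample values x = 10, y = 3 are assumptions for illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def evaluate(tree, env):
    """Evaluate a tree of (op, left, right) tuples; leaves are names or numbers."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]             # identifier: look up its value
    return tree                      # number

LEFTMOST_TREE = ("-", "x", ("*", 2, "y"))    # x - (2 * y)
RIGHTMOST_TREE = ("*", ("-", "x", 2), "y")   # (x - 2) * y

env = {"x": 10, "y": 3}
print(evaluate(LEFTMOST_TREE, env))    # 4
print(evaluate(RIGHTMOST_TREE, env))   # 24
```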
![Page 51: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/51.jpg)
Precedence
• These two derivations point out a problem with the grammar
• It has no notion of precedence, or implied order of evaluation
To add precedence:
• Create a non-terminal for each level of precedence
• Isolate the corresponding part of the grammar
• Force the parser to recognize high-precedence subexpressions first
![Page 52: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/52.jpg)
Precedence
For algebraic expressions:
• Multiplication and division first (level one)
• Subtraction and addition next (level two)
![Page 53: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/53.jpg)
1. Goal   → expr
2. expr   → expr + term        (level two)
3.        | expr – term
4.        | term
5. term   → term * factor      (level one)
6.        | term / factor
7.        | factor
8. factor → number
9.        | id
![Page 54: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/54.jpg)
Precedence
• This grammar is larger
• It takes more rewriting to reach some of the terminal symbols
• But it encodes expected precedence
• It produces the same parse tree under leftmost and rightmost derivations
• Let’s see how it parses x – 2 * y
![Page 55: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/55.jpg)
Precedence: x – 2 * y
1. Goal   → expr
2. expr   → expr + term
3.        | expr – term
4.        | term
5. term   → term * factor
6.        | term / factor
7.        | factor
8. factor → number
9.        | id

The rightmost derivation:

| Rule | Sentential Form |
|---|---|
| - | Goal |
| 1 | expr |
| 3 | expr – term |
| 5 | expr – term * factor |
| 9 | expr – term * <id,y> |
| 7 | expr – factor * <id,y> |
| 8 | expr – <num,2> * <id,y> |
| 4 | term – <num,2> * <id,y> |
| 7 | factor – <num,2> * <id,y> |
| 9 | <id,x> – <num,2> * <id,y> |
![Page 56: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/56.jpg)
Parse Trees
G
  E
    E
      T
        F
          <id,x>
    –
    T
      T
        F
          <num,2>
      *
      F
        <id,y>
Evaluation order: x – ( 2 * y )
![Page 57: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/57.jpg)
![Page 58: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/58.jpg)
Precedence
• Both leftmost and rightmost derivations give the same expression
• This is because the grammar directly encodes the desired precedence
![Page 59: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/59.jpg)
Parsing Techniques
![Page 60: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/60.jpg)
Parsing Techniques
Top-down parsers:
• Start at the root of the parse tree and grow towards the leaves
• Pick a production and try to match the input
• A bad “pick” may force the parser to backtrack
• Some grammars are backtrack-free
![Page 61: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/61.jpg)
Top-down parsers
Also called LL parsing:
• The first L means that tokens are read left to right
• The second L means that the parser constructs a leftmost derivation
![Page 62: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/62.jpg)
Parsing Techniques
Bottom-up parsers:
• Start at the leaves and grow toward the root
• As input is consumed, encode possibilities in an internal state
• Start in a state valid for legal first tokens
• Bottom-up parsers handle a large class of grammars
• Preferred method in practice
![Page 63: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/63.jpg)
Bottom-up Parsing
Also called LR parsing:
• L means that tokens are read left to right
• R means that the parser constructs a rightmost derivation (in reverse)
![Page 64: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/64.jpg)
Top-Down Parser
• A top-down parser starts with the root of the parse tree
• The root node is labeled with the goal symbol of the grammar
![Page 65: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/65.jpg)
Top-Down Parsing Algorithm
• Construct the root node of the parse tree
• Repeat until the fringe (leaves) of the parse tree matches the input string:
  1. At a node labeled A, select a production with A on its lhs and, for each symbol on its rhs, construct the appropriate child
  2. When a terminal symbol is added to the fringe and it does not match the input, backtrack
  3. Find the next node to be expanded
![Page 66: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/66.jpg)
Top-Down Parsing
• The key is picking the right production in step 1
• That choice should be guided by the input string
![Page 67: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/67.jpg)
Expression Grammar
1.  Goal   → expr
2.  expr   → expr + term
3.         | expr - term
4.         | term
5.  term   → term * factor
6.         | term ∕ factor
7.         | factor
8.  factor → number
9.         | id
10.        | ( expr )
Let’s try parsing x – 2 * y
![Page 68: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/68.jpg)
| P | Sentential Form | Input |
|---|---|---|
| - | Goal | x – 2 * y |
| 1 | expr | x – 2 * y |
| 2 | expr + term | x – 2 * y |
| 4 | term + term | x – 2 * y |
| 7 | factor + term | x – 2 * y |
| 9 | <id,x> + term | x – 2 * y |
![Page 69: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/69.jpg)
This worked well for “x”, except that the next input word “–” does not match the “+” in the sentential form <id,x> + term.
![Page 70: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/70.jpg)
The parser must backtrack to the step where expr was expanded with rule 2 and try a different production.
![Page 71: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/71.jpg)
This time the “–” in the sentential form and the “–” in the input matched:

| P | Sentential Form | Input |
|---|---|---|
| - | Goal | x – 2 * y |
| 1 | expr | x – 2 * y |
| 3 | expr – term | x – 2 * y |
| 4 | term – term | x – 2 * y |
| 7 | factor – term | x – 2 * y |
| 9 | <id,x> – term | x – 2 * y |
![Page 72: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/72.jpg)
We can advance past “–” to look at “2”. The sentential form is still <id,x> – term; only the position in the input has moved.
![Page 73: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/73.jpg)
Now, we need to expand “term” in <id,x> – term.
![Page 74: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/74.jpg)
| P | Sentential Form | Input |
|---|---|---|
| - | <id,x> – term | x – 2 * y |
| 7 | <id,x> – factor | x – 2 * y |
| 8 | <id,x> – <num,2> | x – 2 * y |
| - | <id,x> – <num,2> | x – 2 * y |

• “2” matches “2”
• We have more input but no non-terminals left to expand
![Page 75: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/75.jpg)
• The expansion terminated too soon
• We need to backtrack and choose a different production for “term”
![Page 76: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/76.jpg)
| P | Sentential Form | Input |
|---|---|---|
| - | <id,x> – term | x – 2 * y |
| 5 | <id,x> – term * factor | x – 2 * y |
| 7 | <id,x> – factor * factor | x – 2 * y |
| 8 | <id,x> – <num,2> * factor | x – 2 * y |
| - | <id,x> – <num,2> * factor | x – 2 * y |
| - | <id,x> – <num,2> * factor | x – 2 * y |
| 9 | <id,x> – <num,2> * <id,y> | x – 2 * y |
| - | <id,x> – <num,2> * <id,y> | x – 2 * y |

Success! We matched and consumed all the input.
![Page 77: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/77.jpg)
Another Possible Parse

| P | Sentential Form | Input |
|---|---|---|
| - | Goal | x – 2 * y |
| 1 | expr | x – 2 * y |
| 2 | expr + term | x – 2 * y |
| 2 | expr + term + term | x – 2 * y |
| 2 | expr + term + term + term | x – 2 * y |
| 2 | expr + term + term + term + .... | x – 2 * y |

• This parse consumes no input!
• A wrong choice of expansion leads to non-termination
• The parser must make the right choice
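The runaway expansion can be reproduced with a naive hand-written parsing procedure for the production expr → expr + term. This is a deliberately broken sketch: `parse_expr` and `terminates` are hypothetical names, and Python's recursion limit stands in for "runs forever".

```python
def parse_expr(tokens, pos):
    """Naive top-down procedure for the production expr -> expr + term."""
    # The right-hand side starts with expr, so the procedure calls itself
    # at the same input position before it has consumed a single token.
    pos = parse_expr(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        pos += 1
    return pos

def terminates(tokens):
    """Report whether the naive parser returns at all."""
    try:
        parse_expr(tokens, 0)
        return True
    except RecursionError:
        return False

print(terminates(["x", "-", "2", "*", "y"]))   # prints False
```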
![Page 78: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/78.jpg)
Left Recursion
Top-down parsers cannot handle left-recursive grammars
![Page 79: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/79.jpg)
Left Recursion
Our expression grammar is left recursive.
This can lead to non-termination in a top-down parser
Non-termination is bad in any part of a compiler
For a top-down parser, any recursion must be a right recursion
We would like to convert left recursion to right recursion
To remove left recursion, we transform the grammar
![Page 80: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/80.jpg)
Eliminating Left Recursion
Consider a grammar fragment:
A → A α | β
where neither α nor β starts with A.
![Page 81: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/81.jpg)
Eliminating Left Recursion
We can rewrite this as:
A → β A'
A' → α A' | ε
where A' is a new non-terminal
![Page 82: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/82.jpg)
Eliminating Left Recursion
A → β A'
A' → α A' | ε
This accepts the same language but uses only right recursion
![Page 83: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/83.jpg)
Eliminating Left Recursion
The expression grammar we have been using contains two cases of left recursion
![Page 84: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/84.jpg)
Eliminating Left Recursion
expr → expr + term | expr – term | term
term → term * factor | term ∕ factor | factor
![Page 85: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/85.jpg)
Eliminating Left Recursion
Applying the transformation yields
expr → term expr'
expr' → + term expr'
      | – term expr'
      | ε
![Page 86: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/86.jpg)
Eliminating Left Recursion
Applying the transformation yields
term → factor term'
term' → * factor term'
      | ∕ factor term'
      | ε
![Page 87: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/87.jpg)
Eliminating Left Recursion
These fragments use only right recursion
A top-down parser will terminate using them.
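The transformation is mechanical, so it can be sketched as a small function. This is a minimal sketch under stated assumptions: productions are tuples of symbol names, the empty tuple stands for ε, ASCII `-` and `/` stand in for the slide operators, and the name `eliminate_left_recursion` is hypothetical. It handles only immediate left recursion of the form A → A α | β.

```python
EPSILON = ()  # assumption: the empty tuple represents the empty string


def eliminate_left_recursion(nonterminal, productions):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    `productions` is a list of tuples of symbols, all for `nonterminal`.
    Returns a dict mapping each non-terminal to its new productions.
    """
    new_nt = nonterminal + "'"
    # alpha parts: what follows the leading A in each left-recursive rule
    alphas = [p[1:] for p in productions if p and p[0] == nonterminal]
    # beta parts: the rules that do not start with A
    betas = [p for p in productions if not p or p[0] != nonterminal]
    if not alphas:
        return {nonterminal: list(productions)}  # nothing to rewrite
    return {
        nonterminal: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [EPSILON],
    }


expr_rules = [("expr", "+", "term"), ("expr", "-", "term"), ("term",)]
term_rules = [("term", "*", "factor"), ("term", "/", "factor"), ("factor",)]

new_expr = eliminate_left_recursion("expr", expr_rules)
new_term = eliminate_left_recursion("term", term_rules)
```

Here `new_expr` holds expr → term expr' and expr' → + term expr' | - term expr' | ε, and `new_term` holds the matching fragments for term.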
![Page 88: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/88.jpg)
1  Goal → expr
2  expr → term expr'
3  expr' → + term expr'
4        | – term expr'
5        | ε
6  term → factor term'
7  term' → * factor term'
8        | ∕ factor term'
9        | ε
10 factor → number
11        | id
12        | ( expr )
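One common way to use a right-recursive grammar like this is a recursive-descent parser with one method per non-terminal. The sketch below is illustrative only: the `tokenize` helper, the `Parser` class, and the tuple-shaped tree are assumptions made for this example, and ASCII `-` and `/` stand in for the slide operators.

```python
import re

# number | identifier | any other single character
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(text):
    """Turn source text into (kind, value) pairs, ending with ('eof', '')."""
    tokens = []
    for number, ident, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("number", number))
        elif ident:
            tokens.append(("id", ident))
        elif op.strip():  # skip stray whitespace
            tokens.append((op, op))
    tokens.append(("eof", ""))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def match(self, kind):
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, found {self.peek()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    def goal(self):                      # rule 1
        tree = self.expr()
        self.match("eof")
        return tree

    def expr(self):                      # rule 2
        return self.expr_prime(self.term())

    def expr_prime(self, left):
        if self.peek() in ("+", "-"):    # rules 3 and 4
            op = self.match(self.peek())
            return self.expr_prime((op, left, self.term()))
        return left                      # rule 5: epsilon

    def term(self):                      # rule 6
        return self.term_prime(self.factor())

    def term_prime(self, left):
        if self.peek() in ("*", "/"):    # rules 7 and 8
            op = self.match(self.peek())
            return self.term_prime((op, left, self.factor()))
        return left                      # rule 9: epsilon

    def factor(self):
        if self.peek() == "number":      # rule 10
            return ("num", self.match("number"))
        if self.peek() == "id":          # rule 11
            return ("id", self.match("id"))
        self.match("(")                  # rule 12
        tree = self.expr()
        self.match(")")
        return tree


tree = Parser("x - 2 * y").goal()
```

Passing the left operand into `expr_prime` and `term_prime` is a design choice of this sketch: it keeps the resulting tree left-associative even though the grammar itself is right recursive.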
![Page 89: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/89.jpg)
Predictive Parsing
If a top-down parser picks the wrong production, it may need to backtrack
Alternative is to look ahead in input and use context to pick correctly
How much lookahead is needed?
In general, an arbitrarily large amount
Fortunately, large classes of CFGs can be parsed with limited lookahead
Most programming language constructs fall in those subclasses
![Page 90: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/90.jpg)
LL(1) .... LL(k) PARSING
scan input from Left to right
do a Leftmost derivation
use 1..k symbols of lookahead
is a top-down parsing technique
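For k = 1, the idea can be sketched as a table-driven parser: a stack holds the part of the sentential form still to be matched, and one token of lookahead selects the production. The table below was written by hand for the 12-rule right-recursive expression grammar; the names `TABLE` and `ll1_parse`, the token kind names, and the `eof` marker are assumptions made for this example.

```python
NONTERMINALS = {"Goal", "expr", "expr'", "term", "term'", "factor"}

# (non-terminal, lookahead) -> (rule number, right-hand side)
TABLE = {}
for la in ("number", "id", "("):         # tokens that can start an expr
    TABLE["Goal", la] = (1, ["expr"])
    TABLE["expr", la] = (2, ["term", "expr'"])
    TABLE["term", la] = (6, ["factor", "term'"])
TABLE["expr'", "+"] = (3, ["+", "term", "expr'"])
TABLE["expr'", "-"] = (4, ["-", "term", "expr'"])
for la in (")", "eof"):                  # tokens that can follow an expr
    TABLE["expr'", la] = (5, [])         # epsilon
TABLE["term'", "*"] = (7, ["*", "factor", "term'"])
TABLE["term'", "/"] = (8, ["/", "factor", "term'"])
for la in ("+", "-", ")", "eof"):        # tokens that can follow a term
    TABLE["term'", la] = (9, [])         # epsilon
TABLE["factor", "number"] = (10, ["number"])
TABLE["factor", "id"] = (11, ["id"])
TABLE["factor", "("] = (12, ["(", "expr", ")"])


def ll1_parse(token_kinds):
    """Parse a list of token kinds; return the rule numbers applied."""
    tokens = list(token_kinds) + ["eof"]
    stack = ["eof", "Goal"]              # top of stack is the end of the list
    pos = 0
    applied = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top in NONTERMINALS:
            entry = TABLE.get((top, lookahead))
            if entry is None:
                raise SyntaxError(f"unexpected {lookahead} while expanding {top}")
            rule, rhs = entry
            applied.append(rule)
            stack.extend(reversed(rhs))  # leftmost symbol ends up on top
        elif top == lookahead:
            pos += 1                     # terminal matched, consume it
        else:
            raise SyntaxError(f"expected {top}, found {lookahead}")
    return applied


rules = ll1_parse(["id", "-", "number", "*", "id"])   # x - 2 * y
```

Because every table cell holds at most one production, the parser never has to backtrack: each step either expands a non-terminal or consumes a token.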
![Page 91: Module 2 Compiler and their Working Software Construction Lecture 10,11 and 12.](https://reader034.fdocuments.in/reader034/viewer/2022051007/5a4d1b307f8b9ab05999ab54/html5/thumbnails/91.jpg)
FURTHER IN ADVANCE COURSE …….
COMPILER CONSTRUCTION 7TH SEMESTER