Unit-3 Parsing Theory (Syntax Analyzer)
PREPARED BY:
PROF. HARISH I RATHOD
COMPUTER ENGINEERING DEPARTMENT
GUJARAT POWER ENGINEERING & RESEARCH INSTITUTE
COMPILER DESIGN (170701)
Introduction
• Syntax analysis is the second phase of a compiler, following lexical analysis.
• It checks whether the input conforms to the syntax of the language.
• It takes the tokens from the lexical analyzer and groups them so that programming structures can be recognized.
GPERI – CD - UNIT-3 2
Introduction
• If, after grouping, the tokens do not form any recognizable syntactic structure, a syntax error is reported.
• It is a major component of the front end of a compiler.
• The syntax of a programming language is specified using a notation called a context-free grammar (CFG).
Role of the parser
• It obtains a string of tokens from the lexical analyzer.
• It groups the tokens to identify larger structures in the program.
• It should report any syntax error in the program.
• It should recover from errors so that it can continue processing the rest of the input.
Role of the parser
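[Figure: the source program is read by the lexical analyzer; the parser calls getNextToken, receives the next token, and produces a parse tree or reports a syntax error; both the lexical analyzer and the parser use the symbol table.]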
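The interaction in the figure can be sketched in Python. This is a minimal sketch under stated assumptions: the classes Lexer and Parser and the method get_next_token are hypothetical (modelled on getNextToken in the figure), every non-blank character is treated as one token, and the parser handles only the list/digit grammar used later in this unit.

```python
class Lexer:
    """Hypothetical lexer: every non-blank character is one token."""

    def __init__(self, text):
        self.chars = [ch for ch in text if not ch.isspace()]
        self.pos = 0

    def get_next_token(self):
        """Return the next token, or "$" once the input is exhausted."""
        if self.pos < len(self.chars):
            token = self.chars[self.pos]
            self.pos += 1
            return token
        return "$"


class Parser:
    """Groups tokens for list -> list + digit | list - digit | digit."""

    DIGITS = set("0123456789")

    def __init__(self, lexer):
        self.lexer = lexer
        self.lookahead = lexer.get_next_token()   # ask the lexer for the first token

    def advance(self):
        token = self.lookahead
        self.lookahead = self.lexer.get_next_token()
        return token

    def digit(self):
        if self.lookahead not in self.DIGITS:
            raise SyntaxError(f"expected a digit, found {self.lookahead!r}")
        return self.advance()

    def parse(self):
        # The left-recursive list rule is applied as a loop, grouping to the left.
        tree = self.digit()
        while self.lookahead in ("+", "-"):
            op = self.advance()
            tree = (tree, op, self.digit())
        if self.lookahead != "$":
            raise SyntaxError(f"unexpected token {self.lookahead!r}")
        return tree


print(Parser(Lexer("9 - 5 + 2")).parse())   # (('9', '-', '5'), '+', '2')
```

An input such as "9 - + 2" makes digit() raise SyntaxError, which corresponds to the "Syntax Error" output in the figure; error recovery is not attempted here.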
Context-Free Grammar
• A grammar involves four components:
• Terminals,
• Non-terminals,
• A start symbol, and
• Productions.
• One non-terminal is selected as the start symbol.
• Each production consists of a non-terminal, followed by an arrow (→ or ::=), followed by a string of non-terminals and terminals.
Context-Free Grammar
• A context-free grammar (CFG) is defined as a 4-tuple (VN, ∑, P, S), where:
• VN = set of non-terminals,
• ∑ = set of terminals,
• S = the start symbol,
• P = set of production rules.
• Each production rewrites one non-terminal into a finite string of terminals and/or non-terminals.
Context-Free Grammar
• Example:
stmt -> if ( expr ) stmt else stmt
• Where:
• Non-terminals: stmt, expr
• Terminals: if, (, ), else
• Start symbol: stmt
Context-Free Grammar
• Example:
expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> ( expression )
factor -> id
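One way to hold the 4-tuple (VN, ∑, P, S) in code is sketched below, using the expression grammar above. The class name CFG and the method check are hypothetical; productions are stored as (head, body) pairs, with an empty body standing for ϵ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar G = (VN, Sigma, P, S)."""
    nonterminals: frozenset   # VN
    terminals: frozenset      # Sigma
    productions: tuple        # P: (head, body) pairs; body is a tuple of symbols
    start: str                # S

    def check(self):
        """Raise ValueError if the 4-tuple is not well formed, else return True."""
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        if self.nonterminals & self.terminals:
            raise ValueError("VN and Sigma must be disjoint")
        symbols = self.nonterminals | self.terminals
        for head, body in self.productions:
            if head not in self.nonterminals:
                raise ValueError(f"head {head!r} is not a non-terminal")
            for sym in body:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in a {head}-production")
        return True


expr_grammar = CFG(
    nonterminals=frozenset({"expression", "term", "factor"}),
    terminals=frozenset({"+", "-", "*", "/", "(", ")", "id"}),
    productions=(
        ("expression", ("expression", "+", "term")),
        ("expression", ("expression", "-", "term")),
        ("expression", ("term",)),
        ("term", ("term", "*", "factor")),
        ("term", ("term", "/", "factor")),
        ("term", ("factor",)),
        ("factor", ("(", "expression", ")")),
        ("factor", ("id",)),
    ),
    start="expression",
)
print(expr_grammar.check())
```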
Context-Free Grammar
• Notational conventions:
• Terminal symbols:
• Lowercase letters such as a, b, c.
• Operator symbols such as +, *, -, / etc.
• Punctuation symbols such as parentheses, comma and so on.
• The digits 0, 1, …, 9.
• Boldface strings such as id or if, each of which represents a single terminal symbol.
Context-Free Grammar
• Notational conventions:
• Non-terminal symbols:
• Uppercase letters, such as A, B, C.
• The letter S, when it appears, is usually the start symbol.
• Lowercase italic names such as expr or stmt.
Derivation
• The construction of a parse tree can be made precise by taking a derivational view, in which productions are treated as rewriting rules.
• Beginning with the start symbol, each rewriting step replaces a non-terminal by the body of one of its productions.
• Example grammar: E -> E + E | E * E | - E | ( E ) | id
Derivation
• Example grammar:
list -> list + digit
list -> list - digit
list -> digit
digit -> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Derivation
list
=> list + digit
=> list - digit + digit
=> digit - digit + digit
=> 9 - digit + digit
=> 9 - 5 + digit
=> 9 - 5 + 2
Derivation
• This is an example of a leftmost derivation, because the leftmost non-terminal is replaced at each step.
• Likewise, a rightmost derivation replaces the rightmost non-terminal at each step.
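The two derivation orders can be replayed mechanically. In this sketch (the function derive and the step lists are hypothetical names), each step names the production to apply, and derive rewrites the leftmost or rightmost non-terminal of the current sentential form, refusing a step whose head does not match that non-terminal.

```python
NONTERMINALS = {"list", "digit"}


def derive(start, steps, leftmost=True):
    """Apply (head, body) productions one per step to the leftmost (or rightmost)
    non-terminal of the current sentential form; return every form produced."""
    form = [start]
    forms = [" ".join(form)]
    for head, body in steps:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != head:
            raise ValueError(f"cannot apply a {head}-production to {form[i]}")
        form = form[:i] + list(body) + form[i + 1:]
        forms.append(" ".join(form))
    return forms


LEFTMOST_STEPS = [
    ("list", ["list", "+", "digit"]),
    ("list", ["list", "-", "digit"]),
    ("list", ["digit"]),
    ("digit", ["9"]),
    ("digit", ["5"]),
    ("digit", ["2"]),
]
RIGHTMOST_STEPS = [
    ("list", ["list", "+", "digit"]),
    ("digit", ["2"]),
    ("list", ["list", "-", "digit"]),
    ("digit", ["5"]),
    ("list", ["digit"]),
    ("digit", ["9"]),
]

for form in derive("list", LEFTMOST_STEPS):
    print("=>", form)
for form in derive("list", RIGHTMOST_STEPS, leftmost=False):
    print("=>", form)
```

Both runs end in 9 - 5 + 2; the rightmost one passes through list + 2 and list - 5 + 2 instead of 9 - digit + digit.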
Derivation
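• Construct a CFG for the language L = {w c wᴿ : w ∈ {a,b}*}, where wᴿ is the reverse of w.
• Sol:
• G = (VN, ∑, P, S)
• Here, VN = {S}, ∑ = {a, b, c}
• The production rules P are:
S -> a S a
S -> b S b
S -> c
• The language {w c w : w ∈ {a,b}*}, with the second w not reversed, is not context-free; the grammar above generates the mirror-image language.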
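A recognizer can follow the three productions directly: strip a matching a...a or b...b pair for S -> aSa | bSb, and accept when exactly c remains (S -> c). The function name in_wcwr is a hypothetical choice for this sketch.

```python
def in_wcwr(s: str) -> bool:
    """Recognize {w c w^R : w in {a,b}*} by mirroring S -> aSa | bSb | c."""
    if s == "c":                                          # S -> c
        return True
    if len(s) >= 3 and s[0] == s[-1] and s[0] in "ab":    # S -> aSa | bSb
        return in_wcwr(s[1:-1])
    return False


for word in ["abcba", "abcab", "c"]:
    print(word, in_wcwr(word))
```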
Parse Tree
• A string generated by a context-free grammar can be represented by a hierarchical structure called a tree.
• Such trees, which represent derivations, are called derivation trees, parse trees or syntax trees.
Parse Tree
• Characteristics of a parse tree:
• The root of the tree is labeled by the start symbol.
• Each leaf of the tree is labeled by a terminal (token) or by ϵ.
• Each interior node is labeled by a non-terminal.
• If A → X1 X2 … Xn is a production, then node A has immediate children X1, X2, …, Xn, where each Xi is a terminal, a non-terminal, or ϵ (ϵ denotes the empty string).
Parse Tree - Example
• Parse tree for 9 - 5 + 2:
list
├── list
│   ├── list
│   │   └── digit
│   │       └── 9
│   ├── -
│   └── digit
│       └── 5
├── +
└── digit
    └── 2
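The characteristics listed above map onto a small tree type: interior nodes carry non-terminal labels, leaves carry terminals or ϵ, and reading the leaves left to right gives the derived string (the yield). The class Node and its methods are hypothetical names for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                   # non-terminal, terminal, or "ϵ"
    children: list = field(default_factory=list)

    def leaves(self):
        """Yield of the tree: leaf labels read left to right (ϵ contributes nothing)."""
        if not self.children:
            return "" if self.label == "ϵ" else self.label
        return "".join(child.leaves() for child in self.children)

    def show(self, depth=0):
        """Indented text rendering, one node per line."""
        lines = ["  " * depth + self.label]
        for child in self.children:
            lines.extend(child.show(depth + 1))
        return lines


# Parse tree for 9 - 5 + 2 under list -> list + digit | list - digit | digit
tree = Node("list", [
    Node("list", [
        Node("list", [Node("digit", [Node("9")])]),
        Node("-"),
        Node("digit", [Node("5")]),
    ]),
    Node("+"),
    Node("digit", [Node("2")]),
])
print("\n".join(tree.show()))
print(tree.leaves())
```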
Exercise
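• Write a CFG that generates strings having an equal number of a's and b's.
• Sol:
• CFG G = (VN, ∑, P, S), where VN = {S}, ∑ = {a, b}
• P is defined as:
S -> aSb
S -> bSa
S -> SS
S -> ϵ
• The rule S -> SS is required: without it, strings such as abba, which begin and end with the same symbol, cannot be derived.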
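The grammar can be checked against the intended language by brute force. The sketch below (function name derives is hypothetical) decides S =>* s by trying each production in turn, memoizing on the substring, and then compares the result with "equal number of a's and b's" for every string over {a, b} up to length 8.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def derives(s: str) -> bool:
    """True if S =>* s under S -> aSb | bSa | SS | ϵ."""
    if s == "":
        return True                                   # S -> ϵ
    if len(s) >= 2 and s[0] + s[-1] in ("ab", "ba") and derives(s[1:-1]):
        return True                                   # S -> aSb | bSa
    # S -> SS with both parts non-empty (an empty part adds nothing new)
    return any(derives(s[:k]) and derives(s[k:]) for k in range(1, len(s)))


# Compare with the intended language on every string over {a, b} up to length 8.
for n in range(9):
    for chars in product("ab", repeat=n):
        w = "".join(chars)
        assert derives(w) == (w.count("a") == w.count("b")), w
print("grammar matches the language up to length 8")
```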
Exercise
• Construct a CFG for the language L = {aⁿbⁿ : n >= 1}.
• Sol:
• CFG G = (VN, ∑, P, S), where VN = {S}, ∑ = {a, b}
• P is defined as:
S -> aSb
S -> ab
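A recognizer that mirrors the two productions: S -> ab is the base case, and S -> aSb peels one a from the front and one b from the back. The function name in_anbn is hypothetical.

```python
def in_anbn(s: str) -> bool:
    """Recognize {a^n b^n : n >= 1} by mirroring S -> aSb | ab."""
    if s == "ab":                                       # S -> ab
        return True
    if len(s) >= 4 and s[0] == "a" and s[-1] == "b":    # S -> aSb
        return in_anbn(s[1:-1])
    return False


for word in ["ab", "aaabbb", "abab"]:
    print(word, in_anbn(word))
```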
Exercise
• Write a CFG that generates strings of balanced parentheses.
• Sol: The grammar accepts balanced left and right parentheses, e.g. (), ((())).
• CFG G = (VN, ∑, P, S), where VN = {S}, ∑ = {(, )}
• P is given by:
S -> SS
S -> (S)
S -> ϵ
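The grammar above is ambiguous and its rule S -> SS is left recursive, so it cannot be used directly for top-down parsing. This sketch instead assumes the equivalent grammar S -> ( S ) S | ϵ, which generates the same balanced strings, and parses it by recursive descent; the function name balanced is hypothetical.

```python
def balanced(s: str) -> bool:
    """Recursive-descent recognizer for S -> ( S ) S | ϵ."""
    pos = 0

    def parse_s():
        nonlocal pos
        # Lookahead "(" selects S -> ( S ) S; anything else selects S -> ϵ.
        if pos < len(s) and s[pos] == "(":
            pos += 1
            if not parse_s():
                return False
            if pos >= len(s) or s[pos] != ")":
                return False                 # missing closing parenthesis
            pos += 1
            return parse_s()
        return True

    return parse_s() and pos == len(s)       # all input must be consumed


for word in ["()", "((()))", "(()", ")("]:
    print(word, balanced(word))
```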
Exercise
• A CFG is given by the productions:
S -> a | a A S
A -> bS
• Obtain the derivation tree of the word abaabaa.
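One way to obtain the tree is to let a small top-down parser build it. Both S-alternatives begin with a, so the sketch looks one symbol further: a following b can only start A (A -> bS), because FOLLOW(S) = {a, $}. The functions parse_S, parse_A, parse and show are hypothetical names; trees are (label, children) tuples and leaves are plain strings.

```python
def parse_S(word, pos):
    """S -> a | a A S.  After 'a', a 'b' means a A S; anything else means S -> a."""
    if pos >= len(word) or word[pos] != "a":
        raise SyntaxError(f"expected 'a' at position {pos}")
    if pos + 1 < len(word) and word[pos + 1] == "b":
        a_tree, pos = parse_A(word, pos + 1)
        s_tree, pos = parse_S(word, pos)
        return ("S", ["a", a_tree, s_tree]), pos
    return ("S", ["a"]), pos + 1


def parse_A(word, pos):
    """A -> bS  (the caller has already checked that word[pos] == 'b')."""
    s_tree, pos = parse_S(word, pos + 1)
    return ("A", ["b", s_tree]), pos


def parse(word):
    tree, pos = parse_S(word, 0)
    if pos != len(word):
        raise SyntaxError(f"unexpected {word[pos]!r} at position {pos}")
    return tree


def show(tree, depth=0):
    """Indented rendering of a (label, children) tree."""
    if isinstance(tree, str):
        return ["  " * depth + tree]
    label, kids = tree
    lines = ["  " * depth + label]
    for kid in kids:
        lines += show(kid, depth + 1)
    return lines


print("\n".join(show(parse("abaabaa"))))
```

The resulting tree corresponds to the derivation S => aAS => abSS => abaS => abaaAS => abaabSS => abaabaS => abaabaa.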
Exercise
• Given the grammar G = (VN, ∑, P, S), where VN = {E}, S = E, ∑ = {id, +, *, (, )} and P consists of:
E -> E + E | E * E | (E) | id
• Obtain the derivation trees for id * id + id and (id + id) * id.
Ambiguity
• A grammar is said to be ambiguous if there exists more than one parse tree for the same sentence.
• Example:
S -> aSbS | bSaS | ϵ
• The string abab has two different parse trees.
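Ambiguity can be made concrete by counting parse trees. The sketch below (count_trees is a hypothetical name) counts trees for S -> aSbS | bSaS | ϵ: for aSbS it tries every position of the b that closes the first S, and symmetrically for bSaS. A count above 1 for some string shows the grammar is ambiguous.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(s: str) -> int:
    """Number of distinct parse trees for s under S -> aSbS | bSaS | ϵ."""
    if s == "":
        return 1                                   # S -> ϵ
    if s[0] not in "ab":
        return 0
    other = "b" if s[0] == "a" else "a"            # S -> aSbS or S -> bSaS
    total = 0
    for k in range(1, len(s)):                     # position of the second terminal
        if s[k] == other:
            total += count_trees(s[1:k]) * count_trees(s[k + 1:])
    return total


print(count_trees("abab"))
```

For abab the two trees come from matching the first a with the b at position 1 (a ϵ b S, with S deriving ab) or with the b at position 3 (a S b ϵ, with S deriving ba).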
Ambiguity
• A classical example of an ambiguous grammar is the if-then-else construct of many programming languages.
• Most languages have both if-then and if-then-else versions of the statement.
• The grammar rules for it are as follows:
stmt -> if condition then stmt else stmt
      | if condition then stmt
Ambiguity
• Consider the following code segment:
if a>b then
    if c>d then x=y
    else x=z
Ambiguity
• Parse tree 1: the else is matched with the outer if
stmt
├── if
├── condition: a>b
├── then
├── stmt
│   ├── if
│   ├── condition: c>d
│   ├── then
│   └── stmt: x=y
├── else
└── stmt: x=z
Ambiguity
• Parse tree 2: the else is matched with the inner if
stmt
├── if
├── condition: a>b
├── then
└── stmt
    ├── if
    ├── condition: c>d
    ├── then
    ├── stmt: x=y
    ├── else
    └── stmt: x=z
Eliminating Ambiguity
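• Ambiguities may be eliminated by rewriting the grammar.
• The if-then-else grammar may be rewritten as:
stmt -> m_stmt | unm_stmt
m_stmt -> if condition then m_stmt else m_stmt
        | other_stmt
unm_stmt -> if condition then stmt
          | if condition then m_stmt else unm_stmt
• Here m_stmt is a matched statement, in which every then is paired with an else, and unm_stmt is an unmatched one. The statement between a then and an else must be matched, so each else is paired with the nearest unmatched then.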
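Counting parse trees shows the effect of the rewrite. This sketch uses hypothetical one-letter tokens: i for "if condition then", e for "else" and o for any other statement, so the code segment above becomes iioeo. The function names are hypothetical; each returns the number of trees for its non-terminal.

```python
from functools import lru_cache

# Hypothetical tokens: i = "if condition then", e = "else", o = other_stmt.


@lru_cache(maxsize=None)
def ambiguous_stmt(t):
    """Parse trees under stmt -> i stmt e stmt | i stmt | o."""
    n = 1 if t == "o" else 0
    if t.startswith("i"):
        rest = t[1:]
        n += ambiguous_stmt(rest)                          # stmt -> i stmt
        for k, tok in enumerate(rest):                     # stmt -> i stmt e stmt
            if tok == "e":
                n += ambiguous_stmt(rest[:k]) * ambiguous_stmt(rest[k + 1:])
    return n


@lru_cache(maxsize=None)
def m_stmt(t):
    """m_stmt -> i m_stmt e m_stmt | o"""
    n = 1 if t == "o" else 0
    if t.startswith("i"):
        rest = t[1:]
        for k, tok in enumerate(rest):
            if tok == "e":
                n += m_stmt(rest[:k]) * m_stmt(rest[k + 1:])
    return n


@lru_cache(maxsize=None)
def unm_stmt(t):
    """unm_stmt -> i stmt | i m_stmt e unm_stmt"""
    n = 0
    if t.startswith("i"):
        rest = t[1:]
        n += stmt(rest)
        for k, tok in enumerate(rest):
            if tok == "e":
                n += m_stmt(rest[:k]) * unm_stmt(rest[k + 1:])
    return n


def stmt(t):
    """stmt -> m_stmt | unm_stmt"""
    return m_stmt(t) + unm_stmt(t)


print(ambiguous_stmt("iioeo"), stmt("iioeo"))
```

The original grammar gives two trees for iioeo; the rewritten grammar gives exactly one, the tree in which the else belongs to the inner if.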
Eliminating Ambiguity
• Another technique is to modify the language slightly.
• Many languages require that every if have a matching endif.
• The grammar is then modified as:
stmt -> if condition then stmt else stmt endif
      | if condition then stmt endif
Eliminating Ambiguity
• Example grammar:
E -> I
E -> E + E
E -> E * E
E -> (E)
I -> a | b | c
• This grammar is ambiguous because it encodes neither operator precedence nor grouping; rewriting it so that both are respected removes the ambiguity.
• There are two causes of ambiguity here:
1. The precedence of operators is not respected.
2. A sequence of identical operators can group either from the left or from the right.
Eliminating Ambiguity
• The unambiguous grammar:
E -> E + T | T
T -> T * F | F
F -> (E) | I
I -> a | b | c
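Both causes of ambiguity, and their removal, can be checked by counting parse trees. This sketch assumes single-character tokens; amb_E counts trees under the ambiguous grammar, while the hypothetical functions E, T and F count trees under the unambiguous one, one function per non-terminal.

```python
from functools import lru_cache

IDS = {"a", "b", "c"}


@lru_cache(maxsize=None)
def amb_E(t):
    """Parse trees under E -> I | E + E | E * E | (E), I -> a | b | c."""
    n = 1 if t in IDS else 0                        # E -> I
    if len(t) >= 2 and t[0] == "(" and t[-1] == ")":
        n += amb_E(t[1:-1])                         # E -> (E)
    for k, ch in enumerate(t):
        if ch in "+*":                              # E -> E + E | E * E
            n += amb_E(t[:k]) * amb_E(t[k + 1:])
    return n


@lru_cache(maxsize=None)
def E(t):
    n = T(t)                                        # E -> T
    for k, ch in enumerate(t):
        if ch == "+":
            n += E(t[:k]) * T(t[k + 1:])            # E -> E + T
    return n


@lru_cache(maxsize=None)
def T(t):
    n = F(t)                                        # T -> F
    for k, ch in enumerate(t):
        if ch == "*":
            n += T(t[:k]) * F(t[k + 1:])            # T -> T * F
    return n


@lru_cache(maxsize=None)
def F(t):
    n = 1 if t in IDS else 0                        # F -> I, I -> a | b | c
    if len(t) >= 2 and t[0] == "(" and t[-1] == ")":
        n += E(t[1:-1])                             # F -> (E)
    return n


for text in ["a+b*c", "a+b+c"]:
    print(text, amb_E(text), E(text))
```

a+b*c shows the precedence problem and a+b+c the grouping problem: each has two trees under the ambiguous grammar and one under the rewritten grammar.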
Eliminating Ambiguity
• The parse tree for a + b * c:
E
├── E
│   └── T
│       └── F
│           └── I
│               └── a
├── +
└── T
    ├── T
    │   └── F
    │       └── I
    │           └── b
    ├── *
    └── F
        └── I
            └── c
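A recursive-descent parser for the unambiguous grammar builds this grouping automatically. Because E -> E + T and T -> T * F are left recursive, this sketch uses the common workaround of a loop instead of a direct recursive call, which keeps + and * left-associative. It returns a compact tree of nested tuples rather than the full tree with E, T, F and I nodes; the function name parse is hypothetical.

```python
def parse(text):
    """Parse E -> E + T | T, T -> T * F | F, F -> (E) | I, I -> a | b | c."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def parse_E():
        left = parse_T()
        while peek() == "+":                  # E -> E + T, applied as a loop
            expect("+")
            left = ("+", left, parse_T())
        return left

    def parse_T():
        left = parse_F()
        while peek() == "*":                  # T -> T * F, applied as a loop
            expect("*")
            left = ("*", left, parse_F())
        return left

    def parse_F():
        nonlocal pos
        tok = peek()
        if tok == "(":                        # F -> (E)
            expect("(")
            inner = parse_E()
            expect(")")
            return inner
        if tok in ("a", "b", "c"):            # F -> I
            pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = parse_E()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse("a + b * c"))
```

Here * ends up below +, matching the parse tree above.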
Left Recursion
• A grammar is left recursive if it has a non-terminal A such that there is a derivation A =>+ Aα for some string α.
• Left recursion creates difficulties when designing top-down parsers: a recursive-descent procedure for A could call itself again without consuming any input and loop forever.
• Types of left recursion:
• Immediate left recursion
• General left recursion
Left Recursion
• Immediate left recursion:
• It occurs when a non-terminal A has a production rule of the form A -> Aα.
• In other words, a production is immediately left recursive if the leftmost symbol on its right side is the same as the non-terminal on its left side, for example: A -> Aα
Left Recursion
• Immediate left recursion (continued):
• It can be eliminated by introducing a new non-terminal symbol, say A'.
• The productions A -> Aα | β are replaced by:
A -> βA'
A' -> αA' | ϵ
Left Recursion
• Immediate left recursion (continued):
• In general, the rules
A -> Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn
where no βi begins with A, are replaced by:
A -> β1A' | β2A' | … | βnA'
A' -> α1A' | α2A' | … | αmA' | ϵ
Left Recursion
• Immediate left recursion (continued), example:
E -> E + T | T
T -> T * F | F
F -> (E) | id
• After eliminating the left recursion:
E -> TE'
E' -> +TE' | ϵ
T -> FT'
T' -> *FT' | ϵ
F -> (E) | id
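The general rule for immediate left recursion is short enough to code directly. In this sketch, eliminate_immediate and show are hypothetical names, bodies are tuples of symbols with () standing for ϵ, the new non-terminal is named by appending a prime (assumed not to clash with an existing name), and the grammar is assumed to have no cycle of the form A -> A.

```python
def eliminate_immediate(head, bodies):
    """A -> Aα1 | ... | Aαm | β1 | ... | βn  becomes
    A -> β1A' | ... | βnA'  and  A' -> α1A' | ... | αmA' | ϵ."""
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: list(bodies)}              # not left recursive: unchanged
    new = head + "'"                             # name for A' (assumed unused)
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


def show(rules):
    """Render rules as 'A -> X Y | ϵ' strings."""
    return [f"{head} -> " + " | ".join(" ".join(b) if b else "ϵ" for b in bodies)
            for head, bodies in rules.items()]


grammar = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "E", ")"), ("id",)],
}
result = {}
for head, bodies in grammar.items():
    result.update(eliminate_immediate(head, bodies))
print("\n".join(show(result)))
```

Running it on the example reproduces E -> TE', E' -> +TE' | ϵ, T -> FT', T' -> *FT' | ϵ and leaves F unchanged.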
Left Recursion
• General left recursion:
• Even when there is no immediate left recursion, a number of production rules may act together to give general (indirect) left recursion.
• For example:
S -> Aa
A -> Sb | c
• Here, S is left recursive, because S => Aa => Sba.
Left Recursion
• Algorithm to eliminate left recursion (the input grammar is assumed to have no cycles A =>+ A and no ϵ-productions):
1. Arrange the non-terminals in some order A1, A2, …, Am.
2. For i = 1 to m do
     for j = 1 to i-1 do
        replace each production of the form Ai -> Ajγ by Ai -> δ1γ | δ2γ | … | δkγ,
        where Aj -> δ1 | δ2 | … | δk are all the current Aj-productions
     eliminate the immediate left recursion among the Ai-productions
Left Recursion
• Example:
S -> Aa
A -> Sb | c
• Order the non-terminals as S, A.
• For i = 1: the rule S -> Aa has no immediate left recursion, so it is unchanged.
• For i = 2: substituting S -> Aa into A -> Sb gives A -> Aab | c, which has immediate left recursion. Eliminating it gives:
A -> cA'
A' -> abA' | ϵ
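The algorithm can be sketched in Python as follows. The function names are hypothetical, bodies are tuples of symbols with () standing for ϵ, and, as the algorithm requires, the input is assumed to have no cycles and no ϵ-productions. The immediate-recursion step is repeated here so the block runs on its own.

```python
def eliminate_immediate(head, bodies):
    """A -> Aα | β  becomes  A -> βA',  A' -> αA' | ϵ  (for every α and β)."""
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:
        return {head: list(bodies)}
    new = head + "'"                                  # name for A' (assumed unused)
    return {
        head: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


def eliminate_left_recursion(grammar, order):
    """grammar: dict head -> list of body tuples; order: A1, ..., Am."""
    g = {head: list(bodies) for head, bodies in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            new_bodies = []
            for body in g[ai]:
                if body and body[0] == aj:                # Ai -> Aj γ
                    gamma = body[1:]
                    new_bodies.extend(delta + gamma for delta in g[aj])
                else:
                    new_bodies.append(body)
            g[ai] = new_bodies
        g.update(eliminate_immediate(ai, g[ai]))          # end of iteration i
    return g


result = eliminate_left_recursion({"S": [("A", "a")], "A": [("S", "b"), ("c",)]},
                                  ["S", "A"])
for head, bodies in result.items():
    print(head, "->", " | ".join(" ".join(b) if b else "ϵ" for b in bodies))
```

With the order S, A this reproduces the hand computation: S -> Aa stays, and A -> Aab | c becomes A -> cA', A' -> abA' | ϵ.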