![Page 1: Compiler Design Lexical Analysis - staff.cs.upt.rostaff.cs.upt.ro/~chirila/teaching/upt/mse11-cd/lectures/cd0308.pdf · Lexical Analyzer fixed program that simulates an automaton](https://reader035.fdocuments.in/reader035/viewer/2022081613/5fbe549ce165ae5bc507befb/html5/thumbnails/1.jpg)
Compiler Design
Lexical Analysis
Design of a Lexical-Analyzer Generator
conf. dr. ing. Ciprian-Bogdan Chirila
http://www.cs.upt.ro/~chirila
Outline
The Structure of the Generated Analyzer
Pattern Matching Based on NFA’s
DFA’s for Lexical Analyzers
Implementing the Lookahead Operator
Objectives
to present the architecture of Lex
to discuss two approaches to the implementation of Lex
◦ NFA based
◦ DFA based
The Structure of the Generated Lexical Analyzer
a fixed program that simulates an automaton
◦ deterministic
◦ nondeterministic
transition table for the automaton
functions that are passed directly through Lex to the output (we will see next)
actions from the input program
◦ as fragments of code
◦ to be invoked at the appropriate time by the automaton simulator
Architecture of a Lexical Analyzer Generated by Lex
The Generation Process
each regular expression pattern is transformed into an NFA
all NFAs are combined into one
◦ a new start state is added
◦ new ε-transitions lead from it to the start state of each NFA Ni for pattern pi
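The combination step can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple representation, the name `combine`, the `EPS` marker and the state numbers are all hypothetical choices, and state 0 is assumed to be unused by the individual NFAs.

```python
from collections import defaultdict

EPS = None  # marker used here for ε-transitions (an assumption of this sketch)

def combine(nfas):
    """Merge NFAs N1..Nn into one NFA with a fresh start state 0.

    Each NFA is a tuple (start, accepting_states, transitions), where
    transitions maps (state, symbol) to a set of states. State names are
    assumed to be distinct across the NFAs and different from 0.
    """
    trans = defaultdict(set)
    accepting = {}                      # accepting state -> pattern number i
    for i, (start, finals, delta) in enumerate(nfas, start=1):
        trans[(0, EPS)].add(start)      # new ε-transition: 0 -> start of Ni
        for key, targets in delta.items():
            trans[key] |= targets
        for f in finals:
            accepting[f] = i
    return 0, accepting, dict(trans)

# Hand-built NFAs for the patterns a, abb and a*b+
n1 = (1, {2}, {(1, "a"): {2}})
n2 = (3, {6}, {(3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6}})
n3 = (7, {8}, {(7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8}})

start, accepting, trans = combine([n1, n2, n3])
print(sorted(trans[(start, EPS)]))  # [1, 3, 7]
```

Recording the pattern number next to each accepting state is what later lets the simulator decide which action Ai to run.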
Example
patterns
◦ a {action A1 for pattern p1}
◦ abb {action A2 for pattern p2}
◦ a*b+ {action A3 for pattern p3}
when several prefixes of the input match one or more patterns
◦ always prefer a longer prefix to a shorter prefix
◦ if the longest possible prefix matches multiple patterns, choose the pattern listed first
the lexeme “abb” is taken by the second pattern, p2, which is listed before p3
Conflict Resolution
the three patterns present some conflicts
abb matches p2 and p3
◦ we consider it a lexeme for p2
◦ p2 is listed above p3
aabbbb…
◦ many prefixes match p3, so we take the longest one and keep reading b's until another a is reached
◦ we then report the lexeme: the initial a's followed by as many b's as there are
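The two conflict-resolution rules can be tried out with a small brute-force Python sketch. It uses the standard `re` module rather than the automata Lex builds, and the names `RULES` and `next_lexeme` are hypothetical; it only illustrates which lexeme and pattern the rules select.

```python
import re

# The three patterns in the order they are listed
RULES = [("p1", re.compile(r"a")),
         ("p2", re.compile(r"abb")),
         ("p3", re.compile(r"a*b+"))]

def next_lexeme(text, pos=0):
    """Return (pattern name, lexeme) chosen by the two rules, or None."""
    for end in range(len(text), pos, -1):   # rule 1: try the longest prefix first
        prefix = text[pos:end]
        for name, rx in RULES:              # rule 2: first-listed pattern wins
            if rx.fullmatch(prefix):
                return name, prefix
    return None

print(next_lexeme("abb"))      # ('p2', 'abb')
print(next_lexeme("aabbbba"))  # ('p3', 'aabbbb')
```

Trying every prefix is quadratic and is not how a generated analyzer works; the automaton-based approaches reach the same answer in a single left-to-right pass.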
Pattern Matching Based on NFA’s
NFA simulation algorithm
S=ε-closure(s0);
c=nextChar();
while(c!=eof)
{
    S=ε-closure(move(S,c));
    c=nextChar();
}
if(S∩F!=Ø) return “yes”;
else return “no”;
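A runnable Python rendering of the simulation algorithm might look like the sketch below. The NFA is assumed to be the combined automaton for a, abb and a*b+, with hypothetical state numbers 0 to 8 and `None` standing for ε; the function names mirror the pseudocode but are otherwise illustrative.

```python
EPS = None  # marker for ε-transitions (an assumption of this sketch)

# Assumed combined NFA: 0 is the start state; 2, 6, 8 accept a, abb, a*b+
TRANS = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}
START, FINAL = 0, {2, 6, 8}

def eps_closure(states):
    """All states reachable from `states` using ε-transitions only."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in TRANS.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, c):
    """All states reachable from `states` by one transition on symbol c."""
    return {t for s in states for t in TRANS.get((s, c), ())}

def accepts(text):
    S = eps_closure({START})
    for c in text:                      # end of string plays the role of eof
        S = eps_closure(move(S, c))
    return bool(S & FINAL)              # "yes" iff S ∩ F is not empty

print(accepts("abb"), accepts("aaba"))  # True False
```

Note that this version answers yes or no for the whole input; a lexical analyzer additionally has to remember the last point at which an accepting state was seen.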
Example input a a b a
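pattern a*b+ was found: the set of states becomes empty on the last a, so the analyzer backs up to aab, the longest prefix that reached an accepting state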
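The run on input a a b a can be reproduced with the following Python sketch of an NFA-based matcher that remembers the last accepting position. The state numbering (0 to 8), the `ACCEPT` map and the function `scan` are assumptions made for illustration, since the slide figure is not reproduced here.

```python
EPS = None  # marker for ε-transitions (an assumption of this sketch)

# Assumed combined NFA for the patterns a, abb, a*b+
TRANS = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}
ACCEPT = {2: 1, 6: 2, 8: 3}   # accepting state -> pattern number

def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for t in TRANS.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def move(states, c):
    return {t for s in states for t in TRANS.get((s, c), ())}

def scan(text, pos=0):
    """Return (pattern number, lexeme) for the longest match, or None."""
    S, last, i = eps_closure({0}), None, pos
    while S and i < len(text):          # stop once the state set is empty
        S = eps_closure(move(S, text[i]))
        i += 1
        hits = [ACCEPT[s] for s in S if s in ACCEPT]
        if hits:                        # remember the most recent accepting point;
            last = (min(hits), text[pos:i])   # lowest number = first-listed pattern
    return last

print(scan("aaba"))  # (3, 'aab')
```

Returning `last` instead of the final state set is the "backing up" step: input read past the remembered position is simply not part of the lexeme.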
DFA Architecture for Lexical Analyzers
convert the NFA for all patterns into an equivalent DFA
◦ by using the subset construction algorithm
within each DFA state that contains one or more accepting NFA states
◦ determine the first pattern whose accepting state is represented
◦ make that pattern the output of the DFA state
The Subset Construction Algorithm
initially, ε-closure(s0) is the only state in Dstates, and it is unmarked;
while(there is an unmarked state T in Dstates)
{
    mark T;
    for(each input symbol a)
    {
        U=ε-closure(move(T,a));
        if (U is not in Dstates)
            add U as unmarked state to Dstates;
        Dtran[T,a]=U;
    }
}
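As a rough illustration, the subset construction can be written in Python as below. The NFA is assumed to be the combined automaton for a, abb and a*b+ with hypothetical state numbers, DFA states are represented as frozensets of NFA states, and the `output` map (first-listed pattern per DFA state) is an addition of this sketch that follows the rule for choosing a DFA state's output.

```python
EPS = None  # marker for ε-transitions (an assumption of this sketch)

# Assumed combined NFA for the patterns a, abb, a*b+
NFA = {
    (0, EPS): {1, 3, 7},
    (1, "a"): {2},
    (3, "a"): {4}, (4, "b"): {5}, (5, "b"): {6},
    (7, "a"): {7}, (7, "b"): {8}, (8, "b"): {8},
}
ACCEPT = {2: 1, 6: 2, 8: 3}   # accepting NFA state -> pattern number

def eps_closure(states):
    closure, stack = set(states), list(states)
    while stack:
        for t in NFA.get((stack.pop(), EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def move(states, a):
    return {t for s in states for t in NFA.get((s, a), ())}

def subset_construction(alphabet="ab"):
    start = eps_closure({0})
    dstates, unmarked, dtran = {start}, [start], {}
    while unmarked:
        T = unmarked.pop()              # taking T off the worklist marks it
        for a in alphabet:
            U = eps_closure(move(T, a))
            if U not in dstates:
                dstates.add(U)
                unmarked.append(U)
            dtran[(T, a)] = U
    # Output of a DFA state: first-listed pattern among its accepting NFA states
    output = {T: min(ACCEPT[s] for s in T if s in ACCEPT)
              for T in dstates if any(s in ACCEPT for s in T)}
    return start, dstates, dtran, output

start, dstates, dtran, output = subset_construction()
print(len(dstates))  # 7, counting the empty set
```

Because the loop follows the pseudocode literally, the empty set of NFA states is created as a DFA state of its own; it corresponds to the dead state Ø.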
NFA Example
NFA to DFA Example
DFA Simulation Example a b b a
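A table-driven simulation of this run could look like the Python sketch below. The transition table is an assumption: it is the DFA that the subset construction yields for the patterns a, abb and a*b+, with each DFA state named after the NFA states it contains (for example "0137"), and missing entries standing for the dead state.

```python
# Assumed DFA for a, abb, a*b+; missing entries lead to the dead state
DTRAN = {
    ("0137", "a"): "247", ("0137", "b"): "8",
    ("247", "a"): "7",    ("247", "b"): "58",
    ("7", "a"): "7",      ("7", "b"): "8",
    ("8", "b"): "8",
    ("58", "b"): "68",
    ("68", "b"): "8",
}
# Output of each accepting DFA state: the first-listed pattern it represents
OUTPUT = {"247": "a", "8": "a*b+", "58": "a*b+", "68": "abb"}

def scan(text, pos=0):
    """Return (pattern, lexeme) for the longest match starting at pos, or None."""
    state, last = "0137", None
    for i in range(pos, len(text)):
        state = DTRAN.get((state, text[i]))
        if state is None:               # dead state reached: stop reading
            break
        if state in OUTPUT:             # remember the last accepting point
            last = (OUTPUT[state], text[pos:i + 1])
    return last

print(scan("abba"))  # ('abb', 'abb')
```

On a b b a the states visited are 0137, 247, 58 and 68; the final a has no transition from 68, so the lexeme abb is reported for pattern abb.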
Dead States in DFA’s
the automaton shown is not quite a DFA
◦ a DFA has a transition from every state on every input symbol
we have omitted
◦ transitions to the dead state Ø
◦ transitions from the dead state Ø to itself on every input symbol
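Making the omitted transitions explicit is mechanical, as the following Python sketch suggests. The function name `complete`, the dictionary representation of the table and the string "Ø" used as the dead state's name are assumptions of this illustration.

```python
DEAD = "Ø"  # name chosen here for the dead state

def complete(dtran, states, alphabet):
    """Return a total transition table: every missing (state, symbol)
    entry, including those of the dead state itself, goes to DEAD."""
    full = dict(dtran)
    for s in list(states) + [DEAD]:
        for c in alphabet:
            full.setdefault((s, c), DEAD)
    return full

# A one-state fragment: state "8" loops on b and has no transition on a
partial = {("8", "b"): "8"}
full = complete(partial, ["8"], "ab")
print(full[("8", "a")], full[(DEAD, "b")])  # Ø Ø
```

A lexical analyzer usually does not store these entries; it only needs to notice that the dead state has been reached so that it can stop reading and back up.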
Bibliography
Alfred V. Aho, Monica S. Lam, Ravi Sethi, Jeffrey D. Ullman – Compilers: Principles, Techniques, and Tools, Second Edition, 2007