Lecture 3: Lexical Analysis - Computer Science
Transcript of Lecture 3: Lexical Analysis - Computer Science
![Page 1: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/1.jpg)
The University of North Carolina at Chapel Hill
Lecture 3: Lexical Analysis
COMP 524 Programming Language Concepts Stephen OlivierJanuary 20, 2009
Based on notes by A. Block, N. Fisher, F. Hernandez-Campos, J. Prins and D. Stotts
![Page 2: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/2.jpg)
The University of North Carolina at Chapel Hill
Goal of Lecture
Scanner (lexical analysis)
Parser (syntax analysis)
Semantic analysis & intermediate code gen.
Machine-independent optimization (optional)
Target code generation.
Machine-specific optimization (optional)
Symbol Table
Character Stream
Token Stream
Parse Tree
Abstract syntax tree
Modified intermediate form
Machine language
Modified target language
This includes regular expressions.
![Page 3: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/3.jpg)
The University of North Carolina at Chapel Hill
Scanning
•The main task of scanning is to identify tokens.
![Page 4: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/4.jpg)
The University of North Carolina at Chapel Hill
Pseudo-Code Scanner (Fig 2.5)
We skip any initial white spaceswe read the next characterif it is a ( we look at the next characterif that is a * we have a comment;we skip forward through the terminating *)
otherwise we return a ( and reuse the look-aheadIf it is one of the one-character tokens ([],;=+- etc.)we return that token
...
![Page 5: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/5.jpg)
The University of North Carolina at Chapel Hill
Pseudo-Code Scanner (Fig 2.5)
We skip any initial white spaceswe read the next characterif it is a ( we look at the next characterif that is a * we have a comment;we skip forward through the terminating *)
otherwise we return a ) and reuse the look-aheadIf it is one of the one-character tokens ([],;=+- etc.)we return that token
...
We could just turn this into real code and use that as the scanner, and that would be
fine for small programs...
![Page 6: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/6.jpg)
The University of North Carolina at Chapel Hill
Pseudo-Code Scanner (Fig 2.5)
We skip any initial white spaceswe read the next characterif it is a ( we look at the next characterif that is a * we have a comment;we skip forward through the terminating *)
otherwise we return a ) and reuse the look-aheadIf it is one of the one-character tokens ([],;=+- etc.)we return that token
...
... However, for larger programs that must be correct a more formal approach is more
appropriate.
![Page 7: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/7.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε )
![Page 8: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/8.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε ) “→” denotes assignment
![Page 9: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/9.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε ) “|” denotes or
![Page 10: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/10.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε ) Thus, digit equal “0” or “1” or “2” or ....
![Page 11: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/11.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
“*” denotes zero or more of this type.
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε )
![Page 12: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/12.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
Two REs next to each other denotes concatenation.
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε )
![Page 13: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/13.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
So natural number equals at least“one non-zero digit” followed by
“zero or more digits”.
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε )
![Page 14: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/14.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
“ε” means empty.
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε )
![Page 15: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/15.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
So, what does this mean?
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε )
![Page 16: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/16.jpg)
The University of North Carolina at Chapel Hill
Regular expressions
digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
non_zero_digit → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
natural_number → non_zero_digit digit*
It means “0 or a natural number” followed by “nothing” or by “. and zero or more
digits concluded by a non_zero number”
non_neg_number → (0 | natural_number) ( ( . digit* non_zero_digit) | ε )
![Page 17: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/17.jpg)
The University of North Carolina at Chapel Hill
Regular Expression Rules
• A RE consist of:
• A character (e.g., “0”, “1”, ...)
• The empty string (i.e., “ε”)
• Two REs next to each other (e.g., “non_negative_digit digit”) to denote concatenation.
• Two REs separated by “|” next to each other (e.g., “non_negative_digit | digit”) to denote one RE or the other.
• An RE followed by “*” (called the Kleene star) to denote zero or more iterations of the RE.
• Parentheses (in order to avoid ambiguity).
![Page 18: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/18.jpg)
The University of North Carolina at Chapel Hill
Regular Expression Rules
• A RE consist of:
• A character (e.g., “0”, “1”, ...)
• The empty string (i.e., “ε”)
• Two REs next to each other (e.g., “non_negative_digit digit”) to denote concatenation.
• Two REs separated by “|” next to each other (e.g., “non_negative_digit | digit”) to denote one RE or the other.
• An RE followed by “*” (called the Kleene star) to denote zero or more iterations of the RE.
• Parentheses (in order to avoid ambiguity).
A RE is NEVER defined in terms of itself!Thus, REs cannot define recursive
statements.
![Page 19: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/19.jpg)
The University of North Carolina at Chapel Hill
Regular Expression Rules
• A RE consist of:
• A character (e.g., “0”, “1”, ...)
• The empty string (i.e., “ε”)
• Two REs next to each other (e.g., “non_negative_digit digit”) to denote concatenation.
• Two REs separated by “|” next to each other (e.g., “non_negative_digit | digit”) to denote one RE or the other.
• An RE followed by “*” (called the Kleene star) to denote zero or more iterations of the RE.
• Parentheses (in order to avoid ambiguity).
The set of tokens that can be recognized by regular expressions is called a regular set.
![Page 20: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/20.jpg)
The University of North Carolina at Chapel Hill
Deterministic finite automaton (DFA)
•Every regular set can be defined by using deterministic finite automaton (DFA).
• DFAs are turing machines that have a finite number of states and deterministically move between states.
S
1 1
1
0
00
![Page 21: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/21.jpg)
The University of North Carolina at Chapel Hill
Deterministic finite automaton (DFA)
•Every regular set can be defined by using deterministic finite automaton (DFA).
• DFAs are turning machines that have a finite number of states and deterministically move between states.
S
1 1
1
0
Start State
00
![Page 22: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/22.jpg)
The University of North Carolina at Chapel Hill
Deterministic finite automaton (DFA)
•Every regular set can be defined by using deterministic finite automaton (DFA).
• DFAs are turning machines that have a finite number of states and deterministically move between states.
S
1 1
1
0
Intermediate State
00
![Page 23: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/23.jpg)
The University of North Carolina at Chapel Hill
Deterministic finite automaton (DFA)
•Every regular set can be defined by using deterministic finite automaton (DFA).
• DFAs are turning machines that have a finite number of states and deterministically move between states.
S
1 1
1
00
0
End State (double circle)
![Page 24: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/24.jpg)
The University of North Carolina at Chapel Hill
Deterministic finite automaton (DFA)
•Every regular set can be defined by using deterministic finite automaton (DFA).
• DFAs are turning machines that have a finite number of states and deterministically move between states.
S
1 1
1
00
0
This stands for:0*10*1(0|1)*
![Page 25: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/25.jpg)
The University of North Carolina at Chapel Hill
Constructing DFAs
•A DFA can be constructed from a RE via two steps.
1. Construct a nondeterministic finite automaton (NFA) from the RE.
2. Construct a DFA from the NFA.
3. Minimize the DFA
![Page 26: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/26.jpg)
The University of North Carolina at Chapel Hill
What is an NFA?
•An NFA is similar to a DFA, except that state transitions are nondeterministic.
•This nondeterminism is encapsulated via the epsilon transition (written as ε).
S
0ε
ε 1
![Page 27: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/27.jpg)
The University of North Carolina at Chapel Hill
What is an NFA?
•An NFA is similar to a DFA, except that state transitions are nondeterministic.
•This nondeterminism is encapsulated via the epsilon transition (written as ε).
S
0ε
ε 1
This stands for:0|1
![Page 28: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/28.jpg)
The University of North Carolina at Chapel Hill
What is an NFA?
•An NFA is similar to a DFA, except that state transitions are nondeterministic.
•This nondeterminism is encapsulated via the epsilon transition (written as ε).
S
0ε
ε 1
The ε transitions imply that either transition can be taken with any (or
no) input.
![Page 29: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/29.jpg)
The University of North Carolina at Chapel Hill
The four RE rules and NFA
S
Rule 1--Base case: “a”
a
![Page 30: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/30.jpg)
The University of North Carolina at Chapel Hill
The four RE rules and NFA
Rule 2--Concatenation: “AB”
S
A
S
B
S
AB
plus
![Page 31: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/31.jpg)
The University of North Carolina at Chapel Hill
The four RE rules and NFA
Rule 2--Concatenation: “AB”
S
A
S
B
S
AB
plus
Stands for Some NFA called “A”
![Page 32: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/32.jpg)
The University of North Carolina at Chapel Hill
The four RE rules and NFA
Rule 3--Alternation: “A|B”
S
A
S
B
B
S
or
A
A|B
ε
ε
ε
ε
![Page 33: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/33.jpg)
The University of North Carolina at Chapel Hill
The four RE rules and NFA
Rule 3--Alternation: “A|B”
S
A
S
B
B
S
or
A
A|B
ε
ε
ε
ε
Notice the epsilon transitions.
![Page 34: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/34.jpg)
The University of North Carolina at Chapel Hill
The four RE rules and NFA
Rule 4--Kleene Closure: “A*”
S
A
S
empty or repeated
A
A*
ε ε
ε
ε
![Page 35: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/35.jpg)
The University of North Carolina at Chapel Hill
The four RE rules and NFA
Rule 4--Kleene Closure: “A*”
S
A
S
empty or repeated
A
A*
ε ε
ε
ε
Notice the epsilon transitions.
![Page 36: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/36.jpg)
The University of North Carolina at Chapel Hill
Some examples:
•0|1*
•AB*
•F|(GH*)
•Z*|ε|Y*X
S
S
B
S
Aε
ε
ε
ε
S Aε ε
ε
ε
a
BA
![Page 37: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/37.jpg)
The University of North Carolina at Chapel Hill
Constructing DFAs
•A DFA can be constructed from a RE via two steps.
1. Construct a nondeterministic finite automaton (NFA) from the RE.
2. Construct a DFA from the NFA.
3. Minimize the DFA
![Page 38: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/38.jpg)
The University of North Carolina at Chapel Hill
Constructing DFAs
•A DFA can be constructed from a RE via two steps.
1. Construct a nondeterministic finite automaton (NFA) from the RE.
FOR_KEYWORD → for
IDENTIFIER → Alph ( Alph | Dig )*
REAL → Dig Dig* . Dig*
INT → Dig Dig*
![Page 39: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/39.jpg)
The University of North Carolina at Chapel Hill
Constructing DFAs
•A DFA can be constructed from a RE via two steps.
1. Construct a nondeterministic finite automaton (NFA) from the RE.
2. Construct a DFA from the NFA.
3. Minimize the DFA
![Page 40: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/40.jpg)
The University of North Carolina at Chapel Hill
Constructing a DFA from an NFA.
•Construct the DFA by “collapsing” the states of an NFA.
•Three steps
1.Identify set of states that can be reached from the start state via epsilon-transitions and make this one state.
2.For a given DFA state (which is a set of NFA states) consider each possible input and combine the resulting NFA states into one DFA state.
3.Repeat Step 2 until all states have been added.
![Page 41: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/41.jpg)
The University of North Carolina at Chapel Hill
An example
ES
A
B D
ε C
ε1
0
ε
ε
F
ε
G0ε
ε
0
S,A,B,F
1
1
Start
NFA DFA
C,G,E,F
B,D,F
G,E,F
00
0
0|1*00*
ε
![Page 42: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/42.jpg)
The University of North Carolina at Chapel Hill
An example
ES
A
B D
ε C
ε1
0
ε
ε
F
ε
G0ε
ε
0
S,A,B,F
1
Start
NFA DFA
All the states that we can reach via ε are in this state
C,G,E,F
B,D,F
G,E,F
0
0
0|1*00*
ε
1
0
![Page 43: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/43.jpg)
The University of North Carolina at Chapel Hill
An example
ES
A
B D
ε C
ε1
0
ε
ε
F
ε
G0ε
ε
0
S,A,B,F
1
Start
NFA DFA
C,G,E,F
B,D,F
G,E,F
0
0
0|1*00*
All the states that we can reach via 0 or ε
from SABF
ε
1
0
![Page 44: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/44.jpg)
The University of North Carolina at Chapel Hill
An example
ES
A
B D
ε C
ε1
0
ε
ε
F
ε
G0ε
ε
0
S,A,B,F
C,G,E,F
B,D,F
1
Start
NFA DFA
G,E,F
0
0
0|1*00*
All the states that we can reach via 1 or ε
from SABF
ε
1
0
![Page 45: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/45.jpg)
The University of North Carolina at Chapel Hill
An example
ES
A
B D
ε C
ε1
0
ε
ε
F
ε
G0ε
ε
0
S,A,B,F
C,G,E,F
B,D,F
1
Start
NFA DFA
G,E,F
0
0
0|1*00*
All the states that we can reach via 0 or ε from B,D,F or
C,G,F
ε
1
0
![Page 46: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/46.jpg)
The University of North Carolina at Chapel Hill
An example
ES
A
B D
ε C
ε1
0
ε
ε
F
ε
G0ε
ε
0
S,A,B,F
C,G,E,F
B,D,F
1
Start
NFA DFA
G,E,F
0
0
0|1*00* Self Loop.
ε
1
0
![Page 47: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/47.jpg)
The University of North Carolina at Chapel Hill
An example
ES
A
B D
ε C
ε1
0
ε
ε
F
ε
G0ε
ε
0
S,A,B,F
C,G,E,F
B,D,F
1
Start
NFA DFA
G,E,F
0
0
0|1*00*
ε
1
0
![Page 48: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/48.jpg)
The University of North Carolina at Chapel Hill
Minimize via partitioning
•First, partition states into final and non-final
•Second, determine the effect of the state transition based on what partition the transition goes to.
•Third, Create new partition for those states that have different transitions.
•Fourth, repeat.
![Page 49: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/49.jpg)
The University of North Carolina at Chapel Hill
An example
0
S,A,B,F
C,G,E,F
B,D,F
1
Start
DFA
G,E,F
0
0
1
0
State 0 1
SABF X-2 X-1
BDF X-2 X-1
CGEF X-2 N/A
GEF X-2 N/A
X-1 X-2
![Page 50: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/50.jpg)
The University of North Carolina at Chapel Hill
An example
0
X-1
X-2
1
Start
State 0 1
SABF X-2 X-1
BDF X-2 X-1
CGEF X-2 N/A
GEF X-2 N/A
X-1 X-2
0
![Page 51: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/51.jpg)
The University of North Carolina at Chapel Hill
Scanner Code
•Can create Scanner from the DFA one of two ways:
• Nested case statements (Handwritten)
• Tables (easy to generate from code, hard to write by hand)
![Page 52: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/52.jpg)
The University of North Carolina at Chapel Hill
![Page 53: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/53.jpg)
The University of North Carolina at Chapel Hill
Two complications--Nested Case
•Keywords
• It is possible to maintain a DFA for keywords, but the number of states would be even larger! So, they are handled as exceptions to the rule.
•“Dot-Dot”
• Pascal uses “..” to denote a range of numbers; however, to determine the meaning of the “..” we need to “look ahead” after reading the first “.” to determine if “.” denotes the end of a token or a beginning of a new token.
• “3.14” one token
• “2 .. 5” three tokens
![Page 54: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/54.jpg)
The University of North Carolina at Chapel Hill
This code specifies a two-dimensional transition table,
which tells “whether to move, return token, or announce error”
![Page 55: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/55.jpg)
The University of North Carolina at Chapel Hill
A second table tells when we might have hit the end of a token
(for backing up)
![Page 56: Lecture 3: Lexical Analysis - Computer Science](https://reader030.fdocuments.in/reader030/viewer/2022020912/620323fefa72f352824427d7/html5/thumbnails/56.jpg)
The University of North Carolina at Chapel Hill
Pragmas
•Pragmas are “comments” that provide direction for the compiler.
• For example, “Variable x is used a lot, keep it in memory if possible.”
•These are often handled by the parser since this makes the grammar much simpler.