Recognition of Tokens
Lecture 3
Section 3.4
Robb T. Koether
Hampden-Sydney College
Mon, Jan 19, 2015
Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 1 / 21
Outline
1 A Class of Tokens
2 The Input Buffer
3 Transition Diagrams
4 Writing the Lexer
5 Assignment
A Class of Tokens
We will explore and demonstrate the concepts of a lexer by using a simple class of tokens.
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
letter → [A-Za-z]
id → letter (letter | digit)∗
if → if
then → then
else → else
relop → < | > | <= | >= | = | <>
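The regular definitions above can be checked mechanically. The following is a minimal sketch, assuming C++11 `std::regex`; the helper names `isNumber`, `isId`, and `isRelop` are hypothetical and are not part of the lecture. Each helper tests whether a whole string is a lexeme of the named class.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers: each tests whether an entire string is a lexeme of
// the named token class, using the regular definitions from the slide.

bool isNumber(const std::string& s) {
    // number -> digits (. digits)? (E [+-]? digits)?
    static const std::regex number(R"([0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?)");
    return std::regex_match(s, number);
}

bool isId(const std::string& s) {
    // id -> letter (letter | digit)*
    static const std::regex id(R"([A-Za-z][A-Za-z0-9]*)");
    return std::regex_match(s, id);
}

bool isRelop(const std::string& s) {
    // relop -> < | > | <= | >= | = | <>
    static const std::regex relop(R"(<=|>=|<>|<|>|=)");
    return std::regex_match(s, relop);
}
```

Under these definitions `isId("if")` is also true, so a lexer needs a separate rule that gives the keywords if, then, and else priority over id.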
Whitespace
In addition to recognizing tokens, the lexer must strip whitespace from the input.
Whitespace is not a token, but it must be recognized by the lexer.
ws → (blank | tab | newline)+
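One way to recognize ws without producing a token is a small skipping routine. This is a sketch only; the function name `skipWhitespace` and the index-based interface are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the index of the first character at or after
// `pos` that is not a blank, tab, or newline, following
// ws -> (blank | tab | newline)+.
std::size_t skipWhitespace(const std::string& input, std::size_t pos) {
    while (pos < input.size() &&
           (input[pos] == ' ' || input[pos] == '\t' || input[pos] == '\n')) {
        ++pos;  // consume one whitespace character; no token is produced
    }
    return pos;
}
```

The caller would resume scanning for the next token at the returned index.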
The Input Buffer
The input to the lexer is a stream of characters.
We may consider the characters to be residing in a buffer.
We mark two positions in the buffer: lexemeBegin and forward.
The pointer lexemeBegin holds the starting position of the current token.
The pointer forward points to the current symbol.
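The two positions can be sketched as a small structure. Everything here is an assumption for illustration: the type name `InputBuffer`, the use of indices rather than raw pointers, and the member functions `next`, `retract`, and `accept`.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// A hypothetical buffer holding the two positions described above.
struct InputBuffer {
    std::string text;
    std::size_t lexemeBegin = 0;  // starting position of the current token
    std::size_t forward = 0;      // position of the current symbol

    explicit InputBuffer(std::string s) : text(std::move(s)) {}

    // Return the current symbol and advance forward.
    // '\0' is returned once the end of the input has been reached.
    char next() {
        char c = forward < text.size() ? text[forward] : '\0';
        ++forward;
        return c;
    }

    // Undo the most recent call to next().
    void retract() { --forward; }

    // Return the lexeme between lexemeBegin and forward, then start the
    // next token at forward. Call retract() first if the last symbol read
    // does not belong to the lexeme.
    std::string accept() {
        std::string lexeme = text.substr(lexemeBegin, forward - lexemeBegin);
        lexemeBegin = forward;
        return lexeme;
    }
};
```

With the text `int count`, three calls to `next()` read `i`, `n`, `t`; a fourth reads the blank, and `retract()` followed by `accept()` yields the lexeme `int`.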
The lexer begins in the start state with the current symbol (pointed to by both lexemeBegin and forward).
The process moves from state to state by following the transitions whose labels match the current symbol (forward).
This continues until no further moves are possible.
Example (Processing a Statement)
Input buffer: int count = 0;
[Figure: the buffer with the lexemeBegin and forward pointers; the successive frames carry these captions, in order]
The input buffer
Advance one symbol
Could be an identifier; could be a keyword
It is the keyword int
Skip whitespace
Could be an identifier; could be a keyword
It is an identifier
This is an operator, but which one?
It is the assignment operator
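The walk through `int count = 0;` can be imitated by a short scanning loop. This is a hedged sketch rather than the lecture's lexer: the function `scan`, the token-kind strings, the treatment of `int` as the only keyword, and the handling of `==` are all assumptions. Here `forward` is only advanced past symbols that belong to the lexeme, which has the same effect as reading one symbol too many and retracting.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using TokenList = std::vector<std::pair<std::string, std::string>>;

static bool isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
static bool isDigit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical scanner: returns (kind, lexeme) pairs for the input text.
TokenList scan(const std::string& text) {
    TokenList tokens;
    std::size_t lexemeBegin = 0;
    while (lexemeBegin < text.size()) {
        std::size_t forward = lexemeBegin;
        char c = text[forward];
        std::string kind;
        if (c == ' ' || c == '\t' || c == '\n') {
            ++lexemeBegin;  // skip whitespace: no token is produced
            continue;
        } else if (isLetter(c)) {
            // Could be an identifier; could be a keyword.
            while (forward < text.size() &&
                   (isLetter(text[forward]) || isDigit(text[forward]))) {
                ++forward;
            }
            bool isKeyword = text.substr(lexemeBegin, forward - lexemeBegin) == "int";
            kind = isKeyword ? "keyword" : "identifier";
        } else if (isDigit(c)) {
            while (forward < text.size() && isDigit(text[forward])) ++forward;
            kind = "number";
        } else if (c == '=') {
            // An operator, but which one? Look at the next symbol.
            ++forward;
            if (forward < text.size() && text[forward] == '=') {
                ++forward;
                kind = "equality";
            } else {
                kind = "assignment";
            }
        } else {
            ++forward;  // any other single character
            kind = "punctuation";
        }
        tokens.emplace_back(kind, text.substr(lexemeBegin, forward - lexemeBegin));
        lexemeBegin = forward;  // the next token starts where this one ended
    }
    return tokens;
}
```

For the statement in the example, `scan("int count = 0;")` yields a keyword, an identifier, an assignment operator, a number, and a punctuation token, in that order.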
Transition Diagrams
Definition (Transition Diagram)
A transition diagram is a directed graph.
It consists of a finite set of nodes, called states.
One state is designated the start state.
The directed edges between states represent transitions.
Each transition is labeled with a symbol (or possibly a regular expression).
A subset of the set of states is designated the accepting states.
The remaining states are rejecting states.
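The definition maps directly onto a data structure. The sketch below is illustrative only: the type `TransitionDiagram`, its members, and the builder `makeRelopDiagram` are hypothetical, edges are labeled with single symbols only, and the state numbers are chosen for this sketch rather than taken from any diagram in the lecture. It accepts a string only when the whole string leads from the start state to an accepting state, so it has no "other" edges and no retraction.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// A hypothetical transition diagram: states are ints, edges carry one symbol.
struct TransitionDiagram {
    int start = 0;                              // the start state
    std::set<int> accepting;                    // the accepting states
    std::map<std::pair<int, char>, int> edges;  // (state, symbol) -> next state

    // Follow edges from the start state, one symbol at a time.
    bool accepts(const std::string& input) const {
        int state = start;
        for (char c : input) {
            auto it = edges.find(std::make_pair(state, c));
            if (it == edges.end()) return false;  // no move is possible
            state = it->second;
        }
        return accepting.count(state) > 0;  // otherwise a rejecting state
    }
};

// Whole-string diagram for <, >, <=, >=, =, <> (state numbers are assumptions).
TransitionDiagram makeRelopDiagram() {
    TransitionDiagram d;
    d.start = 0;
    d.edges[std::make_pair(0, '<')] = 1;  // "<"
    d.edges[std::make_pair(1, '=')] = 2;  // "<="
    d.edges[std::make_pair(1, '>')] = 3;  // "<>"
    d.edges[std::make_pair(0, '=')] = 4;  // "="
    d.edges[std::make_pair(0, '>')] = 5;  // ">"
    d.edges[std::make_pair(5, '=')] = 6;  // ">="
    d.accepting = {1, 2, 3, 4, 5, 6};
    return d;
}
```

State 0 is the only rejecting state in this small diagram, so the empty string is rejected.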
Example (Relational Operators)
Consider the relational operators <, >, <=, >=, =, and <>.
The first symbol may be <, >, =, or something else.
If the first symbol is <, then the next symbol may be =, >, or something else.
If the first symbol is >, then the next symbol may be = or something else.
If the first symbol is =, then the next symbol is something else.
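[Figure: transition diagram for relop, with states 0 through 8 and start state 0]
< followed by =: LE
< followed by >: NE
< followed by other: LT and retract
=: EQ
> followed by =: GE
> followed by other: GT and retract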
Example (Identifiers)
[Figure: transition diagram for id, with start state 9]
9 --letter--> 10
10 --letter | digit--> 10
10 --other--> 11: identifier and retract
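The identifier diagram can be coded state by state. This is a sketch under stated assumptions: the function name `getIdentifier`, its index-based interface, and the use of `'\0'` to stand for the end of the input are all hypothetical. States 9, 10, and 11 follow the diagram above.

```cpp
#include <cstddef>
#include <string>

static bool isLetter(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
static bool isDigit(char c) { return c >= '0' && c <= '9'; }

// Hypothetical recognizer: returns the identifier that starts at `pos`, or
// an empty string if there is none. On success `pos` is moved to the first
// symbol after the identifier; on failure it is left unchanged.
std::string getIdentifier(const std::string& text, std::size_t& pos) {
    std::size_t forward = pos;
    int state = 9;
    while (true) {
        // '\0' stands in for the end of the input and counts as "other".
        char c = forward < text.size() ? text[forward] : '\0';
        ++forward;
        switch (state) {
        case 9:
            if (!isLetter(c)) return "";  // fail: not an identifier
            state = 10;
            break;
        case 10:
            if (isLetter(c) || isDigit(c)) break;  // stay in state 10
            {
                // State 11: "other" was read, so retract one symbol.
                --forward;
                std::string lexeme = text.substr(pos, forward - pos);
                pos = forward;
                return lexeme;
            }
        }
    }
}
```

Because the "other" symbol is retracted, it remains available as the first symbol of the next token.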
Example (Keywords)
[Figure: one transition diagram per keyword]
i f followed by other: keyword and retract
t h e n followed by other: keyword and retract
e l s e followed by other: keyword and retract
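A keyword diagram spells out the keyword one symbol at a time and then takes the "other" edge. The sketch below assumes that "other" means any symbol that is not a letter or a digit (or the end of the input), so that `thenext` is not mistaken for the keyword `then`; that reading, the name `matchKeyword`, and the interface are assumptions, not something the slide states.

```cpp
#include <cstddef>
#include <string>

static bool isLetterOrDigit(char c) {
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

// Hypothetical recognizer: returns true if `keyword` starts at `pos` and is
// followed by "other". On success `pos` is moved past the keyword; on
// failure it is left unchanged.
bool matchKeyword(const std::string& text, std::size_t& pos,
                  const std::string& keyword) {
    std::size_t forward = pos;
    for (char expected : keyword) {
        // One state per letter of the keyword.
        if (forward >= text.size() || text[forward] != expected) return false;
        ++forward;
    }
    // The "other" edge: the next symbol must not continue an identifier.
    if (forward < text.size() && isLetterOrDigit(text[forward])) return false;
    // The "other" symbol was only inspected, never consumed, which has the
    // same effect as reading it and then retracting.
    pos = forward;
    return true;
}
```

A lexer built this way would try each keyword before falling back to the identifier diagram.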
Draw the transition diagram for numbers
digit → [0-9]
digits → digit+
number → digits (. digits)? (E [+-]? digits)?
Writing the Lexer
The lexer is the program that implements the transition diagram.
We could use
A switch statement, and/or
An if-else structure.
The Lexer for Relational Operators
Token getRelop()
{
    int state = 0;
    char c = get_next_symbol();
    while (c == '<' || c == '=' || c == '>')
    {
        switch (state)
        {
            case 0:
                if (c == '<') state = 1;
                else if (c == '=') state = 2;
                else if (c == '>') state = 3;
                else fail();
                break;
            ...
            case 8:
                retract();
                return Token(GT);
        }
    }
}
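For comparison, here is a complete, self-contained variant of a relational-operator recognizer. It is a sketch, not the lecture's code: the `Relop` enumeration, the index-based signature, and the use of `'\0'` for the end of the input are assumptions, and the state numbers (1 after `<`, 6 after `>`) are this sketch's own choice. Accepting states are not given case labels because the function returns as soon as one is reached.

```cpp
#include <cstddef>
#include <string>

enum class Relop { LT, LE, EQ, NE, GT, GE, None };

// Hypothetical recognizer: returns the relational operator starting at
// `pos`, or Relop::None if there is none. On success `pos` is moved past
// the operator; on failure it is left unchanged.
Relop getRelop(const std::string& text, std::size_t& pos) {
    std::size_t forward = pos;
    int state = 0;
    while (true) {
        // Read the next symbol; '\0' stands in for the end of the input.
        char c = forward < text.size() ? text[forward] : '\0';
        ++forward;
        switch (state) {
        case 0:  // start state
            if (c == '<') state = 1;
            else if (c == '=') { pos = forward; return Relop::EQ; }
            else if (c == '>') state = 6;
            else return Relop::None;  // fail
            break;
        case 1:  // "<" has been read
            if (c == '=') { pos = forward; return Relop::LE; }
            if (c == '>') { pos = forward; return Relop::NE; }
            --forward;  // other: retract
            pos = forward;
            return Relop::LT;
        case 6:  // ">" has been read
            if (c == '=') { pos = forward; return Relop::GE; }
            --forward;  // other: retract
            pos = forward;
            return Relop::GT;
        }
    }
}
```

Unlike a loop that reads one symbol before the loop begins, this version reads a fresh symbol on every pass, so the second symbol of `<=`, `<>`, and `>=` is actually examined.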
Assignment
Read Section 3.4.
Exercises 1, 2(c)(i).