Lexical Analysis - An Introduction
The Front End
The purpose of the front end is to deal with the input language
Perform a membership test: code ∈ source language?
Is the program well-formed (semantically)?
Build an IR version of the code for the rest of the compiler
[Figure: Source code → Front End → IR → Back End → Machine code; Errors]
The Front End
Scanner
Maps stream of characters into words, the basic unit of syntax
x = x + y ; becomes <id,x> <eq,=> <id,x> <pl,+> <id,y> <sc,;>
Characters that form a word are its lexeme
Its part of speech (or syntactic category) is called its token type
Scanner discards white space & (often) comments
Speed is an issue in scanning: use a specialized recognizer
[Figure: Source code → Scanner → tokens → Parser → IR; Errors]
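The mapping from characters to <type, lexeme> pairs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the token type names (id, eq, pl, sc) come from the slide, while `TOKEN_SPEC`, `MASTER`, and `scan` are hypothetical names, and the patterns cover just the characters in the slide's example.

```python
import re

# Token types follow the slide's abbreviations: id, eq, pl, sc.
# The pattern table itself is an assumption made for this sketch.
TOKEN_SPEC = [
    ("id", r"[A-Za-z][A-Za-z0-9]*"),
    ("eq", r"="),
    ("pl", r"\+"),
    ("sc", r";"),
    ("ws", r"\s+"),  # white space is matched, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def scan(text):
    """Return a list of (token_type, lexeme) pairs, dropping white space."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(scan("x = x + y ;"))
```

Each pair keeps both the part of speech and the lexeme, so later phases can work with the type alone and still recover the spelling when they need it.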
The Front End
Parser
Checks stream of classified words (parts of speech) for grammatical correctness
Determines if code is syntactically well-formed
Guides checking at deeper levels than syntax
Builds an IR representation of the code
The Big Picture
Language syntax is specified with parts of speech, not words
Syntax checking matches parts of speech against a grammar
1. goal → expr
2. expr → expr op term
3.      | term
4. term → number
5.      | id
6. op   → +
7.      | -
S = goal
T = { number, id, +, - }
N = { goal, expr, term, op }
P = { 1, 2, 3, 4, 5, 6, 7 }
No words here!
Parts of speech, not words!
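To make the "parts of speech, not words" point concrete, here is a small Python sketch that checks a sequence of token types against the grammar above. It is not a general parser: it relies on the observation that rules 2 and 3 unroll to `term (op term)*`, and the names `TERMS`, `OPS`, and `derives_goal` are hypothetical.

```python
# Terminal symbols of the grammar, grouped by the rule that produces them.
TERMS = {"number", "id"}  # rules 4 and 5
OPS = {"+", "-"}          # rules 6 and 7


def derives_goal(parts_of_speech):
    """True if the sequence of token types can be derived from goal.

    Rules 2 and 3 (expr -> expr op term | term) describe one term
    followed by zero or more (op term) pairs.
    """
    if not parts_of_speech or parts_of_speech[0] not in TERMS:
        return False
    rest = parts_of_speech[1:]
    if len(rest) % 2 != 0:
        return False
    for i in range(0, len(rest), 2):
        if rest[i] not in OPS or rest[i + 1] not in TERMS:
            return False
    return True


# "x + 2" and "y + 7" differ as words but share the same parts of speech.
print(derives_goal(["id", "+", "number"]))
```

The check never looks at a lexeme, which is why the scanner can hand the parser token types and keep the spellings in a side table.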
The Big Picture
[Figure: specifications → Scanner Generator → tables or code → Scanner; source code → Scanner → parts of speech & words]
Specifications written as “regular expressions”
Represent words as indices into a global table
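One way to picture "words as indices into a global table" is a small interning table. The sketch below is an assumption about how such a table might look, not the layout any particular scanner generator uses; `WordTable`, `intern`, and `lookup` are hypothetical names.

```python
class WordTable:
    """Maps each distinct word to a small integer index and back."""

    def __init__(self):
        self._index = {}  # word -> index
        self._words = []  # index -> word

    def intern(self, word):
        """Return the index for word, adding it on first sight."""
        if word not in self._index:
            self._index[word] = len(self._words)
            self._words.append(word)
        return self._index[word]

    def lookup(self, index):
        """Return the word stored at index."""
        return self._words[index]


table = WordTable()
stream = [table.intern(w) for w in ["x", "=", "x", "+", "y", ";"]]
print(stream)
```

Repeated words share one index, so comparing two words later is an integer comparison rather than a string comparison.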
The Big Picture
Why study lexical analysis? Goals:
To simplify specification & implementation of scanners
To understand the underlying techniques and technologies
How to implement a scanner
Regular Expressions → NFA → DFA
Regular Expressions
Lexical patterns form a regular language
*** any finite language is regular ***
Regular expressions (REs) describe regular languages
Ever type “rm *.o a.out” ?
Regular Expressions
Regular Expression (over alphabet Σ)
ε is a RE denoting the set {ε}
If a is in Σ, then a is a RE denoting {a}
If x and y are REs denoting L(x) and L(y) then
x | y is an RE denoting L(x) ∪ L(y)
xy is an RE denoting L(x)L(y)
x* is an RE denoting L(x)*
Set Operations (review)
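| Operation | Written | Definition |
|---|---|---|
| Union of L and M | $L \cup M$ | $L \cup M = \{ s \mid s \in L \text{ or } s \in M \}$ |
| Concatenation of L and M | $LM$ | $LM = \{ st \mid s \in L \text{ and } t \in M \}$ |
| Kleene closure of L | $L^*$ | $L^* = \bigcup_{0 \le i \le \infty} L^i$ |
| Positive closure of L | $L^+$ | $L^+ = \bigcup_{1 \le i \le \infty} L^i$ |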
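The four operations can be tried out on small finite languages represented as Python sets of strings. Because the closures are infinite unions, this sketch cuts them off at a bound `max_i`, which is an assumption of the example; all function names are hypothetical.

```python
def union(L, M):
    """Strings that are in L or in M."""
    return L | M


def concat(L, M):
    """Every string of L followed by every string of M."""
    return {s + t for s in L for t in M}


def power(L, i):
    """L concatenated with itself i times; the zeroth power is {""}."""
    result = {""}
    for _ in range(i):
        result = concat(result, L)
    return result


def kleene(L, max_i):
    """Finite approximation of L*: union of the powers of L for 0 <= i <= max_i."""
    out = set()
    for i in range(max_i + 1):
        out |= power(L, i)
    return out


def positive(L, max_i):
    """Finite approximation of L+: union of the powers of L for 1 <= i <= max_i."""
    out = set()
    for i in range(1, max_i + 1):
        out |= power(L, i)
    return out


print(sorted(kleene({"a", "b"}, 2)))
```

The only difference between the two closures is whether the zeroth power, which contains just the empty string, is included.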
Examples of Regular Expressions
Identifiers:
Letter → (a|b|c| … |z|A|B|C| … |Z)
Digit → (0|1|2| … |9)
Identifier → Letter ( Letter | Digit )*
Numbers:
Integer → (+|-|ε) (0| (1|2|3| … |9)(Digit *) )
Decimal → Integer . Digit *
Real → ( Integer | Decimal ) E (+|-|ε) Digit *
Complex → ( Real , Real )
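These definitions translate almost directly into the syntax of Python's `re` module. The sketch below is one possible transcription: the constant names and the `matches` helper are hypothetical, Complex is left out, and the patterns deliberately keep the slide's definitions as written, including the fact that `Digit *` allows zero digits.

```python
import re

DIGIT = r"[0-9]"
LETTER = r"[A-Za-z]"
IDENTIFIER = rf"{LETTER}(?:{LETTER}|{DIGIT})*"
# (+|-|epsilon) becomes an optional sign; no leading zeros except "0" itself.
INTEGER = rf"[+-]?(?:0|[1-9]{DIGIT}*)"
DECIMAL = rf"{INTEGER}\.{DIGIT}*"
REAL = rf"(?:{INTEGER}|{DECIMAL})E[+-]?{DIGIT}*"


def matches(pattern, word):
    """True if the whole word belongs to the language of pattern."""
    return re.fullmatch(pattern, word) is not None


for word in ["x1", "1x", "-17", "007", "3.14", "1.5E-3"]:
    kinds = [name for name, pat in [("Identifier", IDENTIFIER),
                                    ("Integer", INTEGER),
                                    ("Decimal", DECIMAL),
                                    ("Real", REAL)] if matches(pat, word)]
    print(word, kinds)
```

Using `re.fullmatch` rather than `re.match` matters here, since a recognizer must account for the entire word and not just a prefix of it.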
Regular Expressions (the point)
Regular expressions can be used to specify the words to be translated to parts of speech by a lexical analyzer
Using results from automata theory and theory of algorithms, we can automatically build recognizers from regular expressions
We study REs and associated theory to automate scanner construction!
Example
Consider the problem of recognizing ILOC register names
Register → r (0|1|2| … |9) (0|1|2| … |9)*
Allows registers of arbitrary number
Requires at least one digit
RE corresponds to a recognizer (or DFA)
[Figure: Recognizer for Register. S0 moves to S1 on r; S1 moves to S2 on a digit (0|1|2| … |9); S2 loops to itself on a digit; S2 is the accepting state]
Example
DFA operation
Start in state S0 & take transitions on each input character
DFA accepts a word x iff x leaves it in a final state (S2)
So, r17 takes it through S0, S1, S2 and accepts
r takes it through S0, S1 and fails
Example (continued)
To be useful, recognizer must turn into code

Skeleton recognizer

    Char ← next character
    State ← s0
    while (Char ≠ EOF)
        State ← δ(State, Char)
        Char ← next character
    if (State is a final state)
        then report success
        else report failure

Table encoding RE

| δ | r | 0,1,2,3,4,5,6,7,8,9 | All others |
|---|---|---|---|
| s0 | s1 | se | se |
| s1 | se | s2 | se |
| s2 | se | s2 | se |
| se | se | se | se |
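The skeleton and the table fit together as in the following Python sketch. The state names and transitions are taken from the table; `char_class`, `DELTA`, `FINAL`, and `recognize` are hypothetical names, and reaching the end of the string stands in for EOF.

```python
def char_class(ch):
    """Map a character to the table column it belongs to."""
    if ch == "r":
        return "r"
    if ch in "0123456789":
        return "digit"
    return "other"


# Transition table: DELTA[state][column] -> next state.
DELTA = {
    "s0": {"r": "s1", "digit": "se", "other": "se"},
    "s1": {"r": "se", "digit": "s2", "other": "se"},
    "s2": {"r": "se", "digit": "s2", "other": "se"},
    "se": {"r": "se", "digit": "se", "other": "se"},
}
FINAL = {"s2"}


def recognize(word):
    """Run the DFA over word; succeed only if it ends in a final state."""
    state = "s0"
    for ch in word:  # the loop ends where the skeleton would see EOF
        state = DELTA[state][char_class(ch)]
    return state in FINAL


for word in ["r17", "r", "r1x", "17"]:
    print(word, recognize(word))
```

The loop body is the same for every regular expression; only the table changes, which is what makes automatic scanner generation practical.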
Example (continued)

Skeleton recognizer

    Char ← next character
    State ← s0
    while (Char ≠ EOF)
        State ← δ(State, Char)
        perform specified action
        Char ← next character
    if (State is a final state)
        then report success
        else report failure

Table encoding RE

| δ | r | 0,1,2,3,4,5,6,7,8,9 | All others |
|---|---|---|---|
| s0 | s1, start | se, error | se, error |
| s1 | se, error | s2, add | se, error |
| s2 | se, error | s2, add | se, error |
| se | se, error | se, error | se, error |
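The slide names the actions (start, add, error) without saying what they do. The Python sketch below assumes one plausible reading: start resets a running value, add folds the current digit into it, and error abandons the word. That interpretation, the early return on error, and all names are assumptions of this example.

```python
def char_class(ch):
    """Map a character to the table column it belongs to."""
    if ch == "r":
        return "r"
    if ch in "0123456789":
        return "digit"
    return "other"


ERR = ("se", "error")
# DELTA[state][column] -> (next state, action to perform)
DELTA = {
    "s0": {"r": ("s1", "start"), "digit": ERR, "other": ERR},
    "s1": {"r": ERR, "digit": ("s2", "add"), "other": ERR},
    "s2": {"r": ERR, "digit": ("s2", "add"), "other": ERR},
    "se": {"r": ERR, "digit": ERR, "other": ERR},
}
FINAL = {"s2"}


def recognize_register(word):
    """Return the register number if word is a register name, else None."""
    state = "s0"
    value = None
    for ch in word:
        state, action = DELTA[state][char_class(ch)]
        if action == "start":
            value = 0                     # assumed meaning of "start"
        elif action == "add":
            value = value * 10 + int(ch)  # assumed meaning of "add"
        else:
            return None                   # shortcut: se can never be left
    return value if state in FINAL else None


print(recognize_register("r17"))
```

Attaching actions to transitions lets the recognizer compute something useful, here the register number, in the same pass that decides acceptance.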
Algorithm Project 1
Open a file to read from
Open a file to write to
Create a scanner object
Call a method from Scanner class to scan, classify each token and write to the output file
Algorithm scanner
[Flowchart, summarized as steps]
Read a line from input file; ci = first character of this line
If ci == '#' || (ci == '/' && ci+1 == '/'): recognize the meta character; Current Token = meta character; print out token with new line
Else if ci == '"': recognize the string (read until you reach another "); Current Token = string; print out the token; i += len of token
Else if ci is a digit: recognize the number; Current Token = number; print out the token; i += len of token
Else if ci is a letter: recognize the id; if token is not a keyword, token is an ID, print with tag; otherwise token is a keyword, print the token; i += len of token
Else if ci is symbol: recognize the symbol; print; i += len of token
Else: print ci; i++
If not the end of line, continue with the next ci
If the end of line and not the end of file, read another line
If the end of file, stop
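The flowchart's order of tests can be followed in a short Python sketch. Several details are assumptions rather than part of the slides: the contents of `KEYWORDS` and `SYMBOLS`, allowing underscores in identifiers, returning (lexeme, type) pairs instead of printing, and skipping unclassified characters such as white space where the flowchart prints them.

```python
# Assumed sets for this sketch; the project's real lists are not given.
KEYWORDS = {"void", "int", "printf", "return", "if", "else", "while", "for"}
SYMBOLS = set("(){}[];,=+-*/<>!&|.%")


def scan_line(line):
    """Classify the tokens on one line, testing in the flowchart's order."""
    tokens = []
    i = 0
    while i < len(line):
        c = line[i]
        if c == "#" or line[i:i + 2] == "//":
            tokens.append((line[i:].rstrip(), "meta"))  # rest of the line
            break
        elif c == '"':
            end = line.find('"', i + 1)
            j = len(line) if end == -1 else end + 1     # unterminated: take the rest
            tokens.append((line[i:j], "string"))
            i = j
        elif c.isdigit():
            j = i
            while j < len(line) and line[j].isdigit():
                j += 1
            tokens.append((line[i:j], "number"))
            i = j
        elif c.isalpha() or c == "_":
            j = i
            while j < len(line) and (line[j].isalnum() or line[j] == "_"):
                j += 1
            word = line[i:j]
            tokens.append((word, "keyword" if word in KEYWORDS else "id"))
            i = j
        elif c in SYMBOLS:
            tokens.append((c, "symbol"))
            i += 1
        else:
            i += 1  # white space or unknown character: skipped in this sketch
    return tokens


def scan(text):
    """Scan every line of text and return one flat token list."""
    tokens = []
    for line in text.splitlines():
        tokens.extend(scan_line(line))
    return tokens


for lexeme, kind in scan('int b=4;\nprintf("Helloworld %d",b);'):
    print(lexeme, "----", kind)
```

Testing for the meta character before the symbol test matters, because `/` on its own is a symbol while `//` starts a comment.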
Sample: test1.c
#include <stdio.h>
void sample() {
int b=4;
printf("Helloworld %d",b);
}
int main() {
sample();
}
Token list
#include <stdio.h> ---- meta
void ---- keyword
sample ---- id
( ---- symbol
) ---- symbol
{ ---- symbol
int ---- keyword
b ---- id
= ---- symbol
4 ---- number
; ---- symbol
printf ---- keyword
( ---- symbol
"Helloworld %d" ---- string
, ---- symbol
b ---- id
) ---- symbol
; ---- symbol
} ---- symbol
int ---- keyword
main ---- id
( ---- symbol
) ---- symbol
{ ---- symbol
sample ---- id
( ---- symbol
) ---- symbol
; ---- symbol
} ---- symbol
Result
#include <stdio.h>
void CS322sample() {
int CS322b=4;
printf("Helloworld %d",CS322b);
}
int CS322main() {
CS322sample();
}
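The result above can be reproduced with a compact Python sketch. Note that it swaps the character-by-character scanner for a single regular-expression substitution, so it shows the effect of the tagging step rather than the project's own method. The `CS322` prefix comes from the result shown; `KEYWORDS` holds just enough entries for test1.c, and all names are hypothetical.

```python
import re

KEYWORDS = {"void", "int", "printf"}  # assumed: just enough for test1.c
PREFIX = "CS322"                      # the tag seen in the result

# A string literal or an identifier. Strings are listed first so that
# words inside quotes are matched as part of the string and never renamed.
WORD_OR_STRING = re.compile(r'"[^"]*"|[A-Za-z_][A-Za-z0-9_]*')


def tag_line(line):
    """Prefix every identifier on the line that is not a keyword."""
    if line.lstrip().startswith(("#", "//")):
        return line  # meta lines pass through unchanged

    def fix(match):
        text = match.group()
        if text.startswith('"') or text in KEYWORDS:
            return text
        return PREFIX + text

    return WORD_OR_STRING.sub(fix, line)


def tag_source(source):
    """Apply tag_line to every line of source."""
    return "\n".join(tag_line(line) for line in source.split("\n"))


print(tag_source('int b=4;\nprintf("Helloworld %d",b);'))
```

This sketch does not handle a `//` comment that starts in the middle of a line; a fuller version would reuse the scanner's meta test at every position.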