CSC3315 (Spring 2009)
CSC 3315
Lexical and Syntax Analysis
Hamid Harroud
School of Science and Engineering, Akhawayn University
http://www.aui.ma/~H.Harroud/csc3315/
Constructing a Lexical Analyzer
state = S                            // S is the start state
repeat {
    k = next character from the input
    if k == EOF {                    // the end of input
        if state is a final state then accept
        else reject
    }
    state = T[state, k]
    if state == empty then reject    // got stuck
}
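The loop above can be sketched in Java as follows. This is a minimal illustration, not the course's implementation: the DFA (identifiers made of a letter followed by letters or digits), the state numbering, the character classes, and the names TableDrivenDfa, charClass, and accepts are all assumptions chosen for the example.

```java
import java.util.Set;

class TableDrivenDfa {
    // Hypothetical DFA: a letter followed by any number of letters or digits.
    static final int START = 0;   // S, the start state
    static final int IN_ID = 1;   // the only final state
    static final int EMPTY = -1;  // marks a missing transition

    static final Set<Integer> FINAL_STATES = Set.of(IN_ID);

    // T[state][class] is the next state, or EMPTY when there is none.
    // Columns: 0 = letter, 1 = digit, 2 = any other character.
    static final int[][] T = {
        /* START */ { IN_ID, EMPTY, EMPTY },
        /* IN_ID */ { IN_ID, IN_ID, EMPTY },
    };

    // Maps a character to its column in T.
    static int charClass(char k) {
        if (Character.isLetter(k)) return 0;
        if (Character.isDigit(k)) return 1;
        return 2;
    }

    // Runs the DFA over the whole input; reaching the end plays the role of EOF.
    static boolean accepts(String input) {
        int state = START;
        for (int i = 0; i < input.length(); i++) {
            state = T[state][charClass(input.charAt(i))];
            if (state == EMPTY) return false;  // got stuck
        }
        return FINAL_STATES.contains(state);   // end of input
    }

    public static void main(String[] args) {
        System.out.println(accepts("x1"));
        System.out.println(accepts("1x"));
    }
}
```

Grouping characters into classes keeps the table small; a table indexed directly by character code would work the same way but need one column per character.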
Constructing a Lexical Analyzer
int LexAnalyzer() {
    getChar();
    if (isLetter(nextChar)) {
        addChar();
        getChar();
        while (isLetter(nextChar) || isDigit(nextChar)) {
            addChar();
            getChar();
        }
        return lookup(lexeme);
    } . . .
Constructing a Lexical Analyzer
int LexAnalyzer() {
    getChar();
    if (isLetter(nextChar)) { . . . }
    else if (isDigit(nextChar)) {
        addChar();
        getChar();
        while (isDigit(nextChar)) {
            addChar();
            getChar();
        }
        return INT_LIT;
    }
}
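A self-contained Java sketch of the hand-coded approach is given here for illustration. The helper names getChar, addChar, lookup, nextChar, lexeme, and INT_LIT follow the slides; everything else (the class name HandCodedLexer, the token code values, the keyword table, the whitespace skipping, and the UNKNOWN token) is an assumption. Unlike the slide version, the sketch reads the first character in the constructor rather than at the start of each call, so the lookahead character left over from the previous token is not lost.

```java
import java.util.Map;

class HandCodedLexer {
    // Token codes; the numeric values and the keyword table are assumptions.
    static final int IDENT = 10, INT_LIT = 11, IF_KW = 20, WHILE_KW = 21;
    static final int UNKNOWN = 99, EOF = -1;
    static final Map<String, Integer> KEYWORDS = Map.of("if", IF_KW, "while", WHILE_KW);

    private final String input;
    private int pos = 0;
    private int nextChar;  // current lookahead character, or -1 at end of input
    private final StringBuilder lexeme = new StringBuilder();

    HandCodedLexer(String input) {
        this.input = input;
        getChar();  // prime the one-character lookahead
    }

    private void getChar() { nextChar = pos < input.length() ? input.charAt(pos++) : -1; }
    private void addChar() { lexeme.append((char) nextChar); }
    private static boolean isLetter(int c) { return c >= 0 && Character.isLetter(c); }
    private static boolean isDigit(int c) { return c >= 0 && Character.isDigit(c); }

    // Keywords are recognized as identifiers first, then looked up in a table.
    private int lookup(String s) { return KEYWORDS.getOrDefault(s, IDENT); }

    String lexeme() { return lexeme.toString(); }

    // Returns the code of the next token; its text is available via lexeme().
    int lex() {
        lexeme.setLength(0);
        while (nextChar == ' ' || nextChar == '\t' || nextChar == '\n') getChar();
        if (isLetter(nextChar)) {
            addChar(); getChar();
            while (isLetter(nextChar) || isDigit(nextChar)) { addChar(); getChar(); }
            return lookup(lexeme.toString());
        } else if (isDigit(nextChar)) {
            addChar(); getChar();
            while (isDigit(nextChar)) { addChar(); getChar(); }
            return INT_LIT;
        } else if (nextChar == -1) {
            return EOF;
        }
        addChar(); getChar();  // consume one unrecognized character
        return UNKNOWN;
    }

    public static void main(String[] args) {
        HandCodedLexer lexer = new HandCodedLexer("count1 42 while");
        for (int tok = lexer.lex(); tok != EOF; tok = lexer.lex()) {
            System.out.println(tok + " " + lexer.lexeme());
        }
    }
}
```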
Lexical Errors
Consider the following two programs:
JLex: a scanner generator
[Figure: the JLex tool chain]
jlex specification (xxx.jlex) -> JLex.Main (java) -> generated scanner (xxx.jlex.java)
generated scanner (xxx.jlex.java) -> javac -> Yylex.class
input program (test.sim) -> P.main (java), using Yylex.class -> output of P.main
import java.io.*;

public class P {
    public static void main(String[] args) throws IOException {
        FileReader inFile = new FileReader(args[0]);
        Yylex scanner = new Yylex(inFile);
        Symbol token = scanner.next_token();
        while (token.sym != sym.EOF) {
            switch (token.sym) {
            case sym.INTLITERAL:
                System.out.println("INTLITERAL (" + ((IntLitTokenVal) token.value).intVal + ")");
                break;
            …
            }
            token = scanner.next_token();
        }
    }
}
JLex: a scanner generator
Regular expression rules
    regular-expression { action }
    regular-expression: the pattern to be matched
    action: the code to be executed when the pattern is matched
When the next_token() method is called, it repeats:
    Find the longest sequence of characters in the input (starting with the current character) that matches a pattern.
    Perform the associated action.
until a return in an action is executed.
Matching rules
If several patterns match sequences of characters starting at the current character, the pattern that matches the longest sequence is considered to be matched.
If several patterns match the same (longest) sequence of characters, the first such pattern is considered to be matched, so the order of the patterns can be important!
If an input character is not matched by any pattern, the scanner throws an exception.
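The matching rules can be imitated with java.util.regex to see how they interact. This is only a sketch of the rules as stated, not how JLex works internally (JLex compiles the patterns into a single automaton). The rule set, the token names, and the class name LongestMatchDemo are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Rules in the order they would appear in a specification (illustrative only).
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("IF", Pattern.compile("if"));
        RULES.put("ID", Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"));
        RULES.put("ASSIGN", Pattern.compile("="));
        RULES.put("EQUALS", Pattern.compile("=="));
        RULES.put("WS", Pattern.compile("[ \\t\\n]+"));
    }

    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestLen = 0;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() only matches at the start of the region.
                // The strict '>' keeps the earlier rule when lengths tie.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestName = rule.getKey();
                }
            }
            if (bestName == null) {
                // No pattern matches the current character.
                throw new IllegalStateException("unmatched character at position " + pos);
            }
            if (!bestName.equals("WS")) {
                tokens.add(bestName + ":" + input.substring(pos, pos + bestLen));
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("if ifx = =="));
    }
}
```

On the input "if ifx = ==", the text "if" is matched equally by IF and ID, so the earlier rule (IF) wins; "ifx" is matched further by ID, so ID wins despite coming later; and "==" is taken as EQUALS because it is longer than the "=" matched by ASSIGN.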
An Example
%%
DIGIT=      [0-9]
LETTER=     [a-zA-Z]
WHITESPACE= [ \t\n]     // space, tab, newline
%line
%%
{LETTER}({LETTER}|{DIGIT})*  {System.out.println(yyline+1 + ": ID " + yytext());}
{DIGIT}+        {System.out.println(yyline+1 + ": INT");}
"="             {System.out.println(yyline+1 + ": ASSIGN");}
"=="            {System.out.println(yyline+1 + ": EQUALS");}
{WHITESPACE}*   { }
.               {System.out.println(yyline+1 + ": bad char");}