Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief...

27
A brief [f] A brief [f] A brief [f] A brief [f]lex lex lex lex tutorial tutorial tutorial tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

Transcript of Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief...

Page 1: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A brief [f]A brief [f]A brief [f]A brief [f]lexlexlexlex tutorialtutorialtutorialtutorial

Saumya DebrayThe University of Arizona

Tucson, AZ 85721

Page 2: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 2

flex (and lex): Overview

Scanner generators:� Helps write programs whose control flow is

directed by instances of regular expressions in the input stream.

Input: a set of regular expressions + actions

Output: C code implementing a scanner:

function: yylexyylexyylexyylex()()()()

file: lexlexlexlex....yyyyyyyy.c.c.c.c

lex (or flex)

Page 3: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 3

Using flex

lex input spec (regexps + actions)

file: lex.yy.c

yylex()

{

}

driver code

compiler

lex

user supplies

main() {…}

orparser() {…}

Page 4: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 4

flex: input format

An input file has the following structure:

definitions %%rules %%user code

optionalrequired

Shortest possible legal flex input:

%%

Page 5: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 5

Definitions

� A series of:� name definitions, each of the form

name definition

e.g.:DIGIT [0-9]CommentStart "/*"ID [a-zA-Z][a-zA-Z0-9]*

� start conditions� stuff to be copied verbatim into the flex output

(e.g., declarations, #includes):– enclosed in %{ … }%, or – indented

Page 6: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 6

Rules

� The rules portion of the input contains a sequence of rules.

� Each rule has the formpattern action

where:

� pattern describes a pattern to be matched on the input

� pattern must be un-indented� action must begin on the same line.

Page 7: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 7

Patterns

� Essentially, extended regular expressions.� Syntax: similar to grep (see man page)� <<EOF>> to match “end of file”� Character classes:

– [:alpha:], [:digit:], [:alnum:], [:space:], etc. (see man page)

� {name} where name was defined earlier.

� “start conditions” can be used to specify that a pattern match only in specific situations.

Page 8: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 8

Example

%{#include <stdio.h>#include <stdlib.h>%}dgt [0-9]%%{dgt}+ return atoi(yytext);%%void main(){

int val, total = 0, n = 0;while ( (val = yylex()) > 0 ) {

total += val;n++;

}if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

Page 9: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 9

Example

%{#include <stdio.h>#include <stdlib.h>%}dgt [0-9]%%{dgt}+ return atoi(yytext);%%void main(){

int val, total = 0, n = 0;while ( (val = yylex()) > 0 ) {

total += val;n++;

}if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

Definition for a digit (could have used builtin definition [:digit:] instead)

Rule to match a number and return its value to the calling routine

Driver code(could instead have been in a separate file)

defin

ition

sru

les

user

cod

e

Page 10: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 10

Example

%{#include <stdio.h>#include <stdlib.h>%}dgt [0-9]%%{dgt}+ return atoi(yytext);%%void main(){

int val, total = 0, n = 0;while ( (val = yylex()) > 0 ) {

total += val;n++;

}if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

defin

ition

sru

les

user

cod

e

defining and using a name

Page 11: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 11

Example

%{#include <stdio.h>#include <stdlib.h>%}dgt [0-9]%%{dgt}+ return atoi(yytext);%%void main(){

int val, total = 0, n = 0;while ( (val = yylex()) > 0 ) {

total += val;n++;

}if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

defin

ition

sru

les

user

cod

e

defining and using a name

char *char *char *char * yytextyytextyytextyytext;;;;a buffer that holds the input characters that actually match the pattern

Page 12: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 12

Example

%{#include <stdio.h>#include <stdlib.h>%}dgt [0-9]%%{dgt}+ return atoi(yytext);%%void main(){

int val, total = 0, n = 0;while ( (val = yylex()) > 0 ) {

total += val;n++;

}if (n > 0) printf(“ave = %d\n”, total/n);

}

A flex program to read a file of (positive) integers and compute the average:

defin

ition

sru

les

user

cod

e

defining and using a name

char *char *char *char * yytextyytextyytextyytext;;;;a buffer that holds the input characters that actually match the pattern

Invoking the scanner: yylexyylexyylexyylex()()()()Each time yylex() is called, the scanner continues processing the input from where it last left off.Returns 0 on end-of-file.

Page 13: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 13

Matching the Input

� When more than one pattern can match the input, the scanner behaves as follows:� the longest match is chosen;� if multiple rules match, the rule listed first in the

flex input file is chosen;� if no rule matches, the default is to copy the next

character to stdout.

� The text that matched (the “token”) is copied to a buffer yytext.

Page 14: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 14

Matching the Input (cont’d)

Pattern to match C-style comments: /* … */

"/*"(.|\n)*"*/"Input:

#include <stdio.h> /* definitions */int main(int argc, char * argv[ ]) {

if (argc <= 1) {printf(“Error!\n”); /* no arguments */

}printf(“%d args given\n”, argc);

return 0;}

Page 15: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 15

Matching the Input (cont’d)

Pattern to match C-style comments: /* … */

"/*"(.|\n)*"*/"Input:

#include <stdio.h> /* definitions */int main(int argc, char * argv[ ]) {

if (argc <= 1) {printf(“Error!\n”); /* no arguments */

}printf(“%d args given\n”, argc);

return 0;}

longest match:

Page 16: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 16

Matching the Input (cont’d)

Pattern to match C-style comments: /* … */

"/*"(.|\n)*"*/"Input:

#include <stdio.h> /* definitions */int main(int argc, char * argv[ ]) {

if (argc <= 1) {printf(“Error!\n”); /* no arguments */

}printf(“%d args given\n”, argc);

return 0;}

longest match:Matched text shown in blue

Page 17: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 17

Start Conditions

� Used to activate rules conditionally.� Any rule prefixed with <S> will be activated only

when the scanner is in start condition S.

� Declaring a start condition S:� in the definition section: %x S

– “%x” specifies “exclusive start conditions”– flex also supports “inclusive start conditions” (“%s”),

see man pages.

� Putting the scanner into start condition S:� action: BEGIN(S)

Page 18: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 18

Start Conditions (cont’d)

� Example:� <STRING>[^"]* { …match string body… }

– [^"] matches any character other than "– The rule is activated only if the scanner is in the start

condition STRING.

� INITIAL refers to the original state where no start conditions are active.

� <*> matches all start conditions.

Page 19: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 19

Using Start Conditions

� Start conditions let us explicitly simulate finite state machines.

� This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*" ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 20: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 20

Using Start Conditions

� Start conditions let us explicitly simulate finite state machines.

� This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 21: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 21

Using Start Conditions

� Start conditions let us explicitly simulate finite state machines.

� This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 22: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 22

Using Start Conditions

� Start conditions let us explicitly simulate finite state machines.

� This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 23: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 23

Using Start Conditions

� Start conditions let us explicitly simulate finite state machines.

� This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 24: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 24

Using Start Conditions

� Start conditions let us explicitly simulate finite state machines.

� This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 25: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 25

Using Start Conditions

� Start conditions let us explicitly simulate finite state machines.

� This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 26: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 26

Using Start Conditions

� Start conditions let us explicitly simulate finite state machines.

� This lets us get around the “longest match” problem for C-style comments.

/ *

non-*

*

*/

non-{ /,* }

flex input:

%x S1, S2, S3%%"/" BEGIN(S1);<S1>"*" BEGIN(S2);<S2>[^*] ; /* stay in S2 */<S2>"*" BEGIN(S3);<S3>"*“ ; /* stay in S3 */<S3>[^*/] BEGIN(S2);<S3>"/" BEGIN(INITIAL);

S1 S2 S3

FSA for C comments:

Page 27: Saumya Debray - wiki.icmc.usp.brwiki.icmc.usp.br/images/4/46/Tutorial_Lex_2.pdfA brief [f]lexA brief [f]lleexxlextutorialtutorial Saumya Debray The University of Arizona Tucson, AZ

A quick tutorial on Lex 27

Putting it all together

� Scanner implemented as a functionint yylex();

� return value indicates type of token found (encoded as a +ve integer);

� the actual string matched is available in yytext.� Scanner and parser need to agree on token type

encodings� let yacc generate the token type encodings

– yacc places these in a file y.tab.h� use “#include y.tab.h” in the definitions section of the flex

input file.� When compiling, link in the flex library using “-lfl”