1 Lex. 2 Lex is a lexical analyzer Var = 12 + 9; if (test > 20) temp = 0; else while (a < 20)...

Lex

Lex is a lexical analyzer

Input:

Var = 12 + 9;
if (test > 20) temp = 0;
else while (a < 20) temp++;

Output (produced by Lex):

Ident: Var
Oper: =
Integer: 12
Oper: +
Integer: 9
Semicolon: ;
Keyword: if
Paren: (
Ident: test
Oper: >
....

For each kind of string there is a regular expression

Regular expressions (input to Lex):

"if"      /* keywords */
"then"

"+"       /* operators */
"-"
"="

Regular expressions (input to Lex):

(0|1|2|3|4|5|6|7|8|9)+    /* integers */
(a|b|...|z|A|B|...|Z)+    /* identifiers */

Integers: (0|1|2|3|4|5|6|7|8|9)+ can be written more compactly as [0-9]+

Identifiers: (a|b|...|z|A|B|...|Z)+ can be written more compactly as [a-zA-Z]+

Each regular expression has an action

Examples:

Regular expression    Action
\n                    linenum++;
[a-zA-Z]+             printf("identifier");
[0-9]+                printf("integer");

Default action: ECHO;

Prints the matched string to the output

A small program

%%
[a-zA-Z]+    printf("Identifier\n");
[0-9]+       printf("Integer\n");
[ \t\n]      ; /* skip spaces */

Input:

1234 test
var 566 78
9800

Output:

Integer
Identifier
Identifier
Integer
Integer
Integer
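
The behavior of the small program can be imitated in a few lines of Python. This is only a rough sketch: it applies the same three rules with Python's `re` module and does not reflect how Lex works internally. The names `RULES` and `scan` are invented for the illustration.

```python
import re

# Rules mirror the three Lex rules of the small program: (pattern, label).
# A label of None means "match and discard", like the empty action `;`.
RULES = [
    (re.compile(r"[a-zA-Z]+"), "Identifier"),
    (re.compile(r"[0-9]+"), "Integer"),
    (re.compile(r"[ \t\n]"), None),
]

def scan(text):
    """Return the labels a scanner with RULES would print for text."""
    labels = []
    pos = 0
    while pos < len(text):
        best = None
        for pattern, label in RULES:
            m = pattern.match(text, pos)
            # Keep the longest match; ties go to the earlier rule.
            if m and m.end() > pos and (best is None or m.end() > best[0]):
                best = (m.end(), label)
        if best is None:
            # Lex's default action would echo the character; this sketch skips it.
            pos += 1
            continue
        end, label = best
        if label is not None:
            labels.append(label)
        pos = end
    return labels

print("\n".join(scan("1234 test\nvar 566 78\n9800\n")))
```

Run on the three input lines of the example, the sketch prints Integer, Identifier, Identifier, Integer, Integer, Integer, one per line.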

Another program

%{
int linenum = 1;
%}
%%
[a-zA-Z]+    printf("Identifier\n");
[0-9]+       printf("Integer\n");
[ \t]        ; /* skip spaces */
\n           linenum++;
.            printf("Error in line: %d\n", linenum);

Input:

1234 test
var 566 78
9800 +
temp

Output:

Integer
Identifier
Identifier
Integer
Integer
Integer
Error in line: 3
Identifier

Lex matches the longest input string

Regular expressions: "if", "ifend"

Input:      ifend      if      ifn
Matches:    "ifend"    "if"    no match
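
The longest-match rule can be sketched as follows. The helper `longest_match` is a made-up name, and it handles only literal strings rather than full regular expressions.

```python
def longest_match(literals, text, pos=0):
    """Return the longest literal that matches text starting at pos, or None."""
    best = None
    for lit in literals:
        if text.startswith(lit, pos) and (best is None or len(lit) > len(best)):
            best = lit
    return best

RULES = ["if", "ifend"]
for word in ["ifend", "if", "ifn"]:
    print(word, "->", longest_match(RULES, word))
```

For `ifend` the sketch picks `ifend` over the shorter `if`. For `ifn` it reports the prefix `if`; the "no match" entry in the table corresponds to requiring that a rule match the whole word.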

Internal Structure of Lex

Regular expressions -> NFA -> DFA -> Minimal DFA

The final states of the DFA are associated with actions

Compilers

Compiler

Program (input):

v = 5;
if (v>5)
   x = 12 + v;
while (x !=3) {
   x = x - 3;
   v = 10;
}
......

Machine Code (output):

Add v,v,0
cmp v,5
jmplt ELSE
THEN: add x, 12,v
ELSE:
WHILE:
cmp x,3
...

Compiler

program -> Lexical analyzer -> parser -> machine code

Parser knows the grammar of the programming language

Parser

PROGRAM -> STMT_LIST
STMT_LIST -> STMT STMT_LIST | STMT;
STMT -> EXPR ; | IF_STMT | WHILE_STMT | { STMT_LIST }

EXPR -> EXPR + EXPR | EXPR - EXPR | ID
IF_STMT -> if (EXPR) then STMT | if (EXPR) then STMT else STMT
WHILE_STMT -> while (EXPR) do STMT

The parser constructs the derivation for the particular input program

Parser grammar: E -> E + E | E * E | INT

Input: 10 + 2 * 5

Derivation: E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5

Derivation: E => E + E => E + E * E => 10 + E*E => 10 + 2 * E => 10 + 2 * 5

Derivation tree:

      E
    / | \
   E  +  E
   |   / | \
  10  E  *  E
      |     |
      2     5

Derivation tree:

      E
    / | \
   E  +  E
   |   / | \
  10  E  *  E
      |     |
      2     5

Machine code:

mult t1, 2, 5
add t2, 10, t1
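
One way to get from the derivation tree to such instructions is a post-order walk: emit code for the two subtrees first, then for the operator that joins them. The sketch below assumes a simplified tree of nested tuples `(op, left, right)` with integer leaves; the function name `gen` and the temporaries `t1`, `t2` are illustrative choices, not a real compiler's output format.

```python
import itertools

def gen(node, code, temps):
    """Post-order walk: emit code for the children, then for the operator."""
    if isinstance(node, int):
        return str(node)              # a leaf is used directly as an operand
    op, left, right = node
    l = gen(left, code, temps)
    r = gen(right, code, temps)
    t = f"t{next(temps)}"             # fresh temporary for this result
    code.append(f"{'mult' if op == '*' else 'add'} {t}, {l}, {r}")
    return t

tree = ("+", 10, ("*", 2, 5))         # the tree for 10 + 2 * 5
code = []
gen(tree, code, itertools.count(1))
print("\n".join(code))
```

Because the multiplication sits lower in the tree, its instruction is emitted before the addition that uses its result.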

Parsing

grammar, input string -> Parser -> derivation

Example:

Grammar:

S -> SS
S -> aSb
S -> bSa
S -> λ

Input: aabb

Parser

Derivation: ?

Exhaustive Search

S -> SS | aSb | bSa | λ

Input: aabb

Phase 1:

S => SS
S => aSb
S => bSa
S => λ

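Phase 1 (after pruning), input aabb:

S => SS
S => aSb

The derivations S => bSa and S => λ cannot lead to aabb and are discarded.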

S -> SS | aSb | bSa | λ

Input: aabb

Phase 1:

S => SS
S => aSb

Phase 2:

S => SS => SSS
S => SS => aSbS
S => SS => bSaS
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb
S => aSb => abSab
S => aSb => ab

Phase 2:

S => SS => SSS
S => SS => aSbS
S => SS => S
S => aSb => aSSb
S => aSb => aaSbb

Phase 3:

S => aSb => aaSbb => aabb

Final result of exhaustive search (top-down parsing)

Grammar:

S -> SS
S -> aSb
S -> bSa
S -> λ

Input: aabb

Parser

Derivation: S => aSb => aaSbb => aabb
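
The phase-by-phase search can be sketched in Python as a breadth-first search over leftmost derivations. This is an illustrative sketch under several assumptions: variables are single uppercase letters, the empty string `""` stands for λ, and forms already seen are not revisited (so `S => SS => S` is dropped as a repeat of `S`). The names `exhaustive_parse`, `viable` and `max_phases` are invented here; the phase cap is needed because this grammar has a λ-production and the search would otherwise not stop on strings outside the language.

```python
def viable(form, w):
    """Cheap pruning test: can this sentential form still derive w?"""
    terminals = [c for c in form if not c.isupper()]
    if len(terminals) == len(form):      # no variable left, and form != w
        return False
    if len(terminals) > len(w):          # already more terminals than w has
        return False
    first_var = next(j for j, c in enumerate(form) if c.isupper())
    return w.startswith(form[:first_var])

def exhaustive_parse(rules, start, w, max_phases=10):
    """Return a derivation of w as a list of sentential forms, or None."""
    frontier = [[start]]
    seen = {start}
    for _ in range(max_phases):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            # Expand the leftmost variable with every applicable production.
            i = next(j for j, c in enumerate(form) if c.isupper())
            for rhs in rules[form[i]]:
                new = form[:i] + rhs + form[i + 1:]
                if new == w:
                    return derivation + [new]
                if new in seen or not viable(new, w):
                    continue
                seen.add(new)
                next_frontier.append(derivation + [new])
        frontier = next_frontier
    return None

RULES = {"S": ["SS", "aSb", "bSa", ""]}
print(" => ".join(exhaustive_parse(RULES, "S", "aabb")))
```

On the input aabb the sketch finds the derivation in the third phase, matching the result above.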

Time complexity of exhaustive search

Suppose there are no productions of the form

A -> λ
A -> B

Number of phases for string $w$: $2|w|$

For a grammar with $k$ rules

Time for phase 1: $k$

$k$ possible derivations

Time for phase 2: $k^2$

$k^2$ possible derivations

Time for phase $2|w|$: $k^{2|w|}$

$k^{2|w|}$ possible derivations

Total time needed for string $w$:

$k + k^2 + \cdots + k^{2|w|}$

Extremely bad!!!
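
To see how quickly this sum grows, it can be written in closed form as a geometric series. The numbers below are a made-up illustration (a hypothetical grammar with $k = 4$ rules and a string of length $|w| = 4$), assuming $k \ge 2$:

```latex
\[
  \sum_{i=1}^{2|w|} k^{i} \;=\; \frac{k^{2|w|+1} - k}{k - 1}
\]
% Hypothetical example: k = 4, |w| = 4, so 2|w| = 8
\[
  4 + 4^{2} + \cdots + 4^{8} \;=\; \frac{4^{9} - 4}{3} \;=\; \frac{262140}{3} \;=\; 87380
\]
```

The last term $k^{2|w|}$ dominates, so the total grows exponentially in the length of the string.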

There exist faster algorithms for specialized grammars

S-grammar: every production has the form A -> ax, where a is a symbol and x is a string of variables

Each pair (A, a) appears at most once

S-grammar example:

S -> aS
S -> bSS
S -> c

Each string has a unique derivation

S => aS => abSS => abcS => abcc

For S-grammars:

In the exhaustive search parsing there is only one choice in each phase

Time for a phase: 1

Total time for parsing string $w$: $|w|$
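
Because each pair (variable, next input symbol) selects at most one production, parsing with an S-grammar needs no search at all. The sketch below is one possible way to code this; the dictionary layout and the name `s_grammar_parse` are assumptions made for the example. It performs one production lookup per input symbol, although recording every sentential form, as done here for display, adds extra copying.

```python
def s_grammar_parse(rules, start, w):
    """Parse w with an S-grammar in one left-to-right pass.

    rules maps (variable, terminal) to the string of variables x in the
    production  variable -> terminal x.  Returns the derivation as a list
    of sentential forms, or None if w is not generated.
    """
    stack = [start]            # variables still to expand, leftmost on top
    derivation = [start]
    for i, a in enumerate(w):
        if not stack:
            return None        # input left over but nothing to expand
        A = stack.pop()
        x = rules.get((A, a))
        if x is None:
            return None        # no production of the form A -> a x
        stack.extend(reversed(x))
        derivation.append(w[: i + 1] + "".join(reversed(stack)))
    return derivation if not stack else None

RULES = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(" => ".join(s_grammar_parse(RULES, "S", "abcc")))
```

With the example grammar S -> aS, S -> bSS, S -> c, the sketch prints `S => aS => abSS => abcS => abcc`.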

For general context-free grammars:

There exists a parsing algorithm that parses a string $w$ in time $|w|^3$
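
The slide does not say which algorithm it has in mind. One well-known algorithm with a cubic bound is CYK, which requires the grammar to be in Chomsky normal form. The following is a minimal membership-test sketch under that assumption; the example grammar (for strings of the form a^n b^n, n >= 1) and all names are chosen for illustration only.

```python
def cyk(unit_rules, pair_rules, start, w):
    """CYK membership test for a grammar in Chomsky normal form.

    unit_rules: list of (A, a) for productions A -> a
    pair_rules: list of (A, B, C) for productions A -> B C
    """
    n = len(w)
    if n == 0:
        return False
    # table[i][l] = set of variables deriving the substring w[i : i + l]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        table[i][1] = {A for A, a in unit_rules if a == ch}
    # Three nested loops over length, start and split point give the |w|^3 factor.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for A, B, C in pair_rules:
                    if B in left and C in right:
                        table[i][length].add(A)
    return start in table[0][n]

# Hypothetical CNF grammar: S -> AB | AT, T -> SB, A -> a, B -> b
UNIT = [("A", "a"), ("B", "b")]
PAIR = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]
print(cyk(UNIT, PAIR, "S", "aabb"), cyk(UNIT, PAIR, "S", "abab"))
```

The running time is proportional to $|w|^3$ times the number of productions, since every (length, start, split) triple is visited once.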