1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

14
1 Syntax Specification Syntax Specification Regular Expressions Regular Expressions

Transcript of 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

Page 1: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

11

Syntax SpecificationSyntax SpecificationRegular ExpressionsRegular Expressions

Page 2: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

22

Phases of CompilationPhases of Compilation

Page 3: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

33

Syntax AnalysisSyntax Analysis

• Syntax:Syntax:– Webster’s definition: Webster’s definition: 1 a : the way in which linguistic 1 a : the way in which linguistic

elements (as words) are put together to form constituents elements (as words) are put together to form constituents (as phrases or clauses)(as phrases or clauses)

• The syntax of a programming languageThe syntax of a programming language– Describes its formDescribes its form

» i.e.i.e. Organization of tokens Organization of tokens (elements)(elements)

– Formal notationFormal notation» Context Free Grammars (CFGs)Context Free Grammars (CFGs)

Page 4: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

44

Review: Formal definition of tokensReview: Formal definition of tokens

• A set of tokens is a set of strings over an alphabetA set of tokens is a set of strings over an alphabet– {read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}{read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}

• A set of tokens is a A set of tokens is a regular setregular set that can be defined by that can be defined by comprehension using a comprehension using a regular expressionregular expression

• For every regular set, there is a For every regular set, there is a deterministic finite deterministic finite automatonautomaton (DFA) that can recognize it (DFA) that can recognize it

– i.e.i.e. determine whether a string belongs to the set or not determine whether a string belongs to the set or not– Scanners extract tokens from source code in the same way Scanners extract tokens from source code in the same way

DFAs determine membershipDFAs determine membership

Page 5: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

55

Regular ExpressionsRegular Expressions

• A regular expression (RE) is:A regular expression (RE) is:– A single characterA single character– The empty string, The empty string, – The The concatenationconcatenation of two regular expressions of two regular expressions

» Notation:Notation: RE RE11 RE RE22 ( (i.e. i.e. RERE11 followed by RE followed by RE22))– The The unionunion of two regular expressionsof two regular expressions

» Notation: Notation: RERE11 | RE | RE22

– The The closureclosure of a regular expression of a regular expression» Notation: Notation: RE*RE*» * is known as the * is known as the Kleene starKleene star» * * represents the concatenation of 0 or more stringsrepresents the concatenation of 0 or more strings

– Non-null enumerationNon-null enumeration» Notation: Notation: RERE++

» represents all non-null concatenations of RE (1 or more times)represents all non-null concatenations of RE (1 or more times)

Page 6: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

66

Regular Expressions BasicsRegular Expressions Basics

Let alphabet Let alphabet ={a,b} ={a,b} (means a and b are its only letters)(means a and b are its only letters)

aa**=(=(, a, aa, aaa, ...}, a, aa, aaa, ...}

(ab)*=((ab)*=(, ab, abab, ababab, ...}, ab, abab, ababab, ...}

a a b=(a, b=(a, , b, bb, bb, ...}, b, bb, bb, ...}

(a (a b)* b)*== all strings containing a’s and b’s all strings containing a’s and b’s

(a*b*)*=(ab*)*= all strings containing a’s and b’s(a*b*)*=(ab*)*= all strings containing a’s and b’s

a*b*={aa*b*={aiibbjj| i >=0, j>=0)| i >=0, j>=0)

Page 7: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

77

Building Regular ExpressionsBuilding Regular Expressions

Regular Expressions as LanguageRegular Expressions as Language

• ** while loopwhile loop–iterates 0 or more timesiterates 0 or more times

• concatenationconcatenation uvuv–sequential; first sequential; first u, u, then then vv

• uu vv OROR–select from one or the other or bothselect from one or the other or both

Page 8: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

88

Description Description Regular Expression Regular Expression

Let Let ={a,b} – all expressions over this alphabet={a,b} – all expressions over this alphabet

Strings withStrings with

• exactly one aexactly one a b*ab*b*ab*

• exactly two a’sexactly two a’s b*ab*ab*b*ab*ab*

• one or more a’sone or more a’s (b*ab*)* or (a (b*ab*)* or (a b)*a (a b)*a (a b)* b)*

• even number of a’seven number of a’s (b*ab*ab*)*(b*ab*ab*)*

• even number of a’s and exactly one beven number of a’s and exactly one b

• (aa)*b(aa)* (aa)*b(aa)* (aa)*ab(aa)*a (aa)*ab(aa)*a

• odd number of a’sodd number of a’s (b*ab*ab*)*b*ab*(b*ab*ab*)*b*ab*

• that don’t contain aathat don’t contain aa (b (b ab)*( ab)*( a) a)

Page 9: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

99

Regular Expression Regular Expression Description Description

Same alphabetSame alphabet

• (aa)*(aa)* even number of a’seven number of a’s

• (a (a b) (a b) (a b) (a b) (a b) (a b) (a b) b)

all strings of length 4all strings of length 4

• ((a ((a b) (a b) (a b) (a b) (a b) (a b) (a b))* b))*

strings of length divisible by 4strings of length divisible by 4

• (aa)* (aa)* ((a ((a b) (a b) (a b) (a b) (a b) (a b) (a b))* b))*

strings of a’s of length strings of a’s of length divisible by 4divisible by 4

Page 10: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

1010

Token Definition ExampleToken Definition Example

• Numeric literals in PascalNumeric literals in Pascal– Definition of the token Definition of the token unsigned_numberunsigned_number

digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer unsigned_integer digitdigit digitdigit* | * | digitdigit++

unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( . ( ( . unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )

• Recursion is not allowed in Regular Expressions!Recursion is not allowed in Regular Expressions!

Page 11: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

1111

ExerciseExercise

digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer unsigned_integer digitdigit digitdigit**

unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( ( ( .. unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )

• Regular expression forRegular expression for– Decimal numbersDecimal numbers

number number … …

Page 12: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

1212

ExerciseExercise

digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer unsigned_integer digitdigit digitdigit**

unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( ( ( .. unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )

• Regular expression forRegular expression for– Decimal numbersDecimal numbers

number number ( + | – | ( + | – | ) ) unsigned_integer unsigned_integer ( ( ( ( unsigned_integer unsigned_integer ) | ) | ) )

Page 13: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

1313

ExerciseExercise

digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer unsigned_integer digitdigit digitdigit**

unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( ( ( .. unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )

• Regular expression forRegular expression for– IdentifiersIdentifiers

identifier identifier ……

Page 14: 1 Syntax Specification Regular Expressions. 2 Phases of Compilation.

1414

ExerciseExercise

digit digit 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer unsigned_integer digitdigit digitdigit**

unsigned_number unsigned_number unsigned_integer unsigned_integer ( ( ( ( .. unsigned_integer unsigned_integer ) | ) | ) )( ( e ( + | – | ( ( e ( + | – | ) ) unsigned_integer unsigned_integer )) | | ) )

• Regular expression forRegular expression for– IdentifiersIdentifiers

identifier identifier letter letter ( ( letterletter | digit | | digit | )* )*

letter letter a | b | c | … | z a | b | c | … | z