Syntax Specification: Regular Expressions. Phases of Compilation.

Post on 12-Jan-2016


Syntax Specification: Regular Expressions

Phases of Compilation

Syntax Analysis

• Syntax:
– Webster’s definition: 1 a : the way in which linguistic elements (as words) are put together to form constituents (as phrases or clauses)
• The syntax of a programming language
– Describes its form
» i.e. the organization of tokens (elements)
– Formal notation
» Context Free Grammars (CFGs)

Review: Formal definition of tokens

• A set of tokens is a set of strings over an alphabet
– {read, write, +, -, *, /, :=, 1, 2, …, 10, …, 3.45e-3, …}
• A set of tokens is a regular set that can be defined by comprehension using a regular expression
• For every regular set, there is a deterministic finite automaton (DFA) that can recognize it
– i.e. determine whether a string belongs to the set or not
– Scanners extract tokens from source code in the same way DFAs determine membership
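As a minimal sketch of how a DFA decides membership, the Python below hand-codes a transition table for one small regular set: strings of one or more decimal digits. The state names ("start", "int") and the `accepts` helper are assumptions made for illustration and do not come from the slides.

```python
# Hypothetical DFA for the regular set "one or more decimal digits".
# States: "start" (nothing read yet) and "int" (at least one digit read).
DIGITS = "0123456789"

TRANSITIONS = {("start", d): "int" for d in DIGITS}
TRANSITIONS.update({("int", d): "int" for d in DIGITS})

ACCEPTING = {"int"}


def accepts(text):
    """Return True if the DFA ends in an accepting state after reading text."""
    state = "start"
    for ch in text:
        state = TRANSITIONS.get((state, ch))
        if state is None:  # no transition defined: the string is rejected
            return False
    return state in ACCEPTING


print(accepts("10"))  # True
print(accepts(""))    # False: the start state is not accepting
print(accepts("3x"))  # False: no transition on "x"
```

A scanner follows the same loop, except that it typically keeps reading while a transition exists and emits a token when it can go no further.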

Regular Expressions

• A regular expression (RE) is:
– A single character
– The empty string, ε
– The concatenation of two regular expressions
» Notation: RE1 RE2 (i.e. RE1 followed by RE2)
– The union of two regular expressions
» Notation: RE1 | RE2
– The closure of a regular expression
» Notation: RE*
» * is known as the Kleene star
» * represents the concatenation of 0 or more strings
– Non-null enumeration
» Notation: RE+
» represents all non-null concatenations of RE (1 or more times)
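The four operators above can be tried out with Python's `re` module, as a rough sketch. Python's pattern syntax is much richer than the formal definition, so only concatenation, `|`, `*` and `+` are used here; the `in_language` helper is a hypothetical name introduced for this example.

```python
import re


def in_language(pattern, text):
    """True if the whole string belongs to the language of the pattern."""
    return re.fullmatch(pattern, text) is not None


# Concatenation: a followed by b.
print(in_language(r"ab", "ab"))    # True
# Union: either a or b.
print(in_language(r"a|b", "b"))    # True
# Closure (Kleene star): zero or more a's, so the empty string belongs.
print(in_language(r"a*", ""))      # True
# Non-null enumeration: one or more a's, so the empty string does not belong.
print(in_language(r"a+", ""))      # False
print(in_language(r"a+", "aaa"))   # True
```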

Regular Expressions Basics

Let alphabet Σ = {a, b} (means a and b are its only letters)

a* = {ε, a, aa, aaa, ...}

(ab)* = {ε, ab, abab, ababab, ...}

a | b* = {a, ε, b, bb, bbb, ...}

(a | b)* = all strings of a’s and b’s (including ε)

(a*b*)* = (a | b*)* = all strings of a’s and b’s

a*b* = {a^i b^j | i >= 0, j >= 0}
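Identities like these can be spot-checked by brute force. The sketch below enumerates every string over {a, b} up to a small length and collects those that fully match a pattern; the `language` helper and the length bound are assumptions of this example, and a bounded check is evidence, not a proof.

```python
import re
from itertools import product


def language(pattern, max_len):
    """All strings over {a, b} of length <= max_len that fully match pattern."""
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}


all_strings = language(r"(a|b)*", 5)
print(len(all_strings))                        # 63 = 1 + 2 + 4 + 8 + 16 + 32
print(language(r"(a*b*)*", 5) == all_strings)  # True
print(sorted(language(r"a*b*", 2)))            # ['', 'a', 'aa', 'ab', 'b', 'bb']
```

The last line shows that a*b* excludes "ba": once a b has been read, no a may follow.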

Building Regular Expressions

Regular Expressions as Language

• * while loop
– iterates 0 or more times
• concatenation uv
– sequential; first u, then v
• u | v OR
– select from one or the other or both
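The programming-language analogy can be made concrete with a hand-written recognizer. The sketch below handles one hypothetical pattern, (a | b) b* a, chosen only for illustration: the union becomes a conditional, the star becomes a while loop, and concatenation becomes statements in sequence.

```python
def matches(text):
    """Hand-written recognizer for the hypothetical pattern (a | b) b* a."""
    i = 0
    # Union (a | b): select one alternative or the other.
    if i < len(text) and text[i] in ("a", "b"):
        i += 1
    else:
        return False
    # Closure b*: a while loop that iterates 0 or more times.
    while i < len(text) and text[i] == "b":
        i += 1
    # Concatenation: the final a must come next, in sequence.
    if i < len(text) and text[i] == "a":
        i += 1
    else:
        return False
    return i == len(text)  # nothing may follow the final a


print(matches("aba"))   # True
print(matches("ba"))    # True: the loop runs zero times
print(matches("bbba"))  # True
print(matches("ab"))    # False: the final a is missing
```

The greedy loop is safe here only because the symbol that must follow b* is different from b; patterns without that property need backtracking or a DFA.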

Description                                  Regular Expression

Let Σ = {a, b} – all expressions over this alphabet

Strings with
• exactly one a                              b*ab*
• exactly two a’s                            b*ab*ab*
• one or more a’s                            (b*ab*)+ or (a | b)*a(a | b)*
• even number of a’s                         b*(ab*ab*)*
• even number of a’s and exactly one b       (aa)*b(aa)* | (aa)*ab(aa)*a
• odd number of a’s                          (b*ab*ab*)*b*ab*
• that don’t contain aa                      (b | ab)*(ε | a)
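A description and a candidate regular expression can be compared mechanically over all short strings. The following sketch pairs several of the patterns above with a plain Python predicate for the description and lists any string on which the two disagree. The helper names and the length bound of 7 are assumptions of this example; `a?` stands in for (ε | a).

```python
import re
from itertools import product


def words(max_len):
    """Yield every string over {a, b} of length 0..max_len."""
    for n in range(max_len + 1):
        for p in product("ab", repeat=n):
            yield "".join(p)


CASES = [
    ("exactly one a", r"b*ab*", lambda w: w.count("a") == 1),
    ("exactly two a's", r"b*ab*ab*", lambda w: w.count("a") == 2),
    ("one or more a's", r"(a|b)*a(a|b)*", lambda w: w.count("a") >= 1),
    ("even a's, exactly one b", r"(aa)*b(aa)*|(aa)*ab(aa)*a",
     lambda w: w.count("a") % 2 == 0 and w.count("b") == 1),
    ("odd number of a's", r"(b*ab*ab*)*b*ab*", lambda w: w.count("a") % 2 == 1),
    ("no aa", r"(b|ab)*a?", lambda w: "aa" not in w),
]


def mismatches(pattern, predicate, max_len=7):
    """Strings where the pattern and the description disagree."""
    return [w for w in words(max_len)
            if (re.fullmatch(pattern, w) is not None) != predicate(w)]


for description, pattern, predicate in CASES:
    print(description, pattern, mismatches(pattern, predicate))
```

An empty list for each case means the pattern and the description agree on every string up to the chosen length.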

Regular Expression                           Description

Same alphabet

• (aa)*
strings of a’s of even length
• (a | b)(a | b)(a | b)(a | b)
all strings of length 4
• ((a | b)(a | b)(a | b)(a | b))*
strings of length divisible by 4
• (aa)* ∩ ((a | b)(a | b)(a | b)(a | b))*
strings of a’s of length divisible by 4, i.e. (aaaa)*
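The last description, strings of a's whose length is divisible by 4, can be checked by intersecting two of the languages above as sets. This sketch assumes that reading of the final entry; the `language` helper and the length bound are hypothetical and exist only for this example.

```python
import re
from itertools import product

ANY4 = r"(a|b)(a|b)(a|b)(a|b)"  # any string of length exactly 4


def language(pattern, max_len=8):
    """All strings over {a, b} of length <= max_len that fully match pattern."""
    words = ("".join(p) for n in range(max_len + 1) for p in product("ab", repeat=n))
    return {w for w in words if re.fullmatch(pattern, w)}


even_as = language(r"(aa)*")            # strings of a's of even length
len_div_4 = language("(" + ANY4 + ")*")  # strings of length divisible by 4
both = even_as & len_div_4               # set intersection

print(len(language(ANY4)))            # 16 strings of length 4
print(sorted(both, key=len))          # ['', 'aaaa', 'aaaaaaaa']
print(both == language(r"(aaaa)*"))   # True
```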

Token Definition Example

• Numeric literals in Pascal
– Definition of the token unsigned_number

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit* (equivalently, digit+)

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε ) ( ( e ( + | – | ε ) unsigned_integer ) | ε )

• Recursion is not allowed in Regular Expressions!
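The token definition can be transcribed into Python's `re` syntax as a sketch. Each name is expanded by plain string substitution and no name refers to itself, which is what the ban on recursion amounts to. The constant names are assumptions of this example; an optional part ( X | ε ) is written `(X)?`, and the ASCII hyphen is used for the minus sign.

```python
import re

DIGIT = r"[0-9]"
UNSIGNED_INTEGER = DIGIT + DIGIT + "*"          # digit digit*
UNSIGNED_NUMBER = (
    UNSIGNED_INTEGER
    + r"(\." + UNSIGNED_INTEGER + ")?"           # optional fractional part
    + r"(e[+-]?" + UNSIGNED_INTEGER + ")?"       # optional exponent part
)


def is_unsigned_number(text):
    """True if the whole string is an unsigned_number under this sketch."""
    return re.fullmatch(UNSIGNED_NUMBER, text) is not None


print(is_unsigned_number("10"))       # True
print(is_unsigned_number("3.45e-3"))  # True
print(is_unsigned_number("1e+5"))     # True
print(is_unsigned_number("3."))       # False: digits must follow the point
print(is_unsigned_number(".5"))       # False: digits must precede the point
```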

Exercise

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε ) ( ( e ( + | – | ε ) unsigned_integer ) | ε )

• Regular expression for
– Decimal numbers

number → …

Exercise

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε ) ( ( e ( + | – | ε ) unsigned_integer ) | ε )

• Regular expression for
– Decimal numbers

number → ( + | – | ε ) unsigned_integer ( ( . unsigned_integer ) | ε )
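A possible Python rendering of the decimal-number answer is sketched below. It assumes the optional fractional part begins with a decimal point, as in unsigned_number, and uses the ASCII hyphen for the minus sign; the constant and function names are hypothetical.

```python
import re

UNSIGNED_INTEGER = r"[0-9][0-9]*"
# Optional sign, integer part, optional fractional part.
NUMBER = r"[+-]?" + UNSIGNED_INTEGER + r"(\." + UNSIGNED_INTEGER + ")?"


def is_number(text):
    """True if the whole string is a decimal number under this sketch."""
    return re.fullmatch(NUMBER, text) is not None


print(is_number("42"))     # True
print(is_number("+7"))     # True
print(is_number("-12.5"))  # True
print(is_number("1."))     # False: digits must follow the point
print(is_number("--1"))    # False: at most one sign
```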

Exercise

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε ) ( ( e ( + | – | ε ) unsigned_integer ) | ε )

• Regular expression for
– Identifiers

identifier → …

Exercise

digit → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

unsigned_integer → digit digit*

unsigned_number → unsigned_integer ( ( . unsigned_integer ) | ε ) ( ( e ( + | – | ε ) unsigned_integer ) | ε )

• Regular expression for
– Identifiers

identifier → letter ( letter | digit | ε )*

letter → a | b | c | … | z
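The identifier answer can be sketched in Python in the same way. The example assumes `letter` covers only the lowercase letters a to z, as listed above, and leaves out any empty alternative inside the starred group because it adds nothing to the language. All names are hypothetical.

```python
import re

LETTER = r"[a-z]"
DIGIT = r"[0-9]"
# A letter followed by zero or more letters or digits.
IDENTIFIER = LETTER + "(" + LETTER + "|" + DIGIT + ")*"


def is_identifier(text):
    """True if the whole string is an identifier under this sketch."""
    return re.fullmatch(IDENTIFIER, text) is not None


print(is_identifier("x"))       # True
print(is_identifier("count2"))  # True
print(is_identifier("2count"))  # False: must start with a letter
print(is_identifier(""))        # False: at least one letter is required
```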