Lecture 2 Lexical Analysis
Topics
  Sample Simple Compiler
  Operations on strings
  Regular expressions
  Finite Automata
Readings:
January 11, 2006
CSCE 531 Compiler Construction
– 2 – CSCE 531 Spring 2006
Overview
Last Time
  A little History
  Compilers vs Interpreter
  Data-Flow View of Compilers
  Regular Languages
  Course Pragmatics
Today’s Lecture
  Why Study Compilers?
  xx
References
  Chapter 2, Chapter 3
Assignment Due Wednesday Jan 18
  3.3a; 3.5a,b; 3.6a,b,c; 3.7a; 3.8b
A Simple Compiler for Expressions
Chapter Two Overview
  Structure of the simple compiler, really just a translator for infix expressions → postfix
  Grammars
  Parse Trees
  Syntax directed Translation
  Predictive Parsing
Translator for Simple Expressions
  Grammar
  Rewritten grammar (equivalent one better for pred. parsing)
  Parsing modules fig 2.24
  Specification of Translator fig 2.35
  Structure of translator fig 2.36
Grammars
A grammar (or, more correctly, a context-free grammar) has
  A set of tokens, also known as terminals
  A set of nonterminals
  A set of productions of the form nonterminal → sequence of tokens and/or nonterminals
  A special nonterminal, the start symbol.
Example
  E → E + E
  E → E * E
  E → digit
Derivations
A derivation is a sequence of rewritings of a string of grammar symbols using the productions in a grammar.
We use the symbol ⇒ to denote that one string of grammar symbols is obtained by rewriting another using a production.
X ⇒ Y if there is a production N → β where
  The nonterminal N occurs in the sequence X of grammar symbols
  And Y is the same as X except β replaces the N
Example
  E ⇒ E+E ⇒ d+E ⇒ d+E*E ⇒ d+E+E*E ⇒ d+d+E*E ⇒ d+d+d*E ⇒ d+d+d*d
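A single derivation step can be sketched in C as a string rewrite. This is only an illustration: the helper names derive_step and derive_from_E are made up here, grammar symbols are assumed to be single characters, and d stands for digit. Every step in the example rewrites the leftmost E, so the sketch always replaces the leftmost occurrence.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the slides): rewrite the leftmost
 * occurrence of the nonterminal n in x by beta, storing the result in y.
 * Returns 1 on success, 0 if n does not occur in x or y is too small. */
int derive_step(const char *x, char n, const char *beta,
                char *y, size_t ysize)
{
    const char *p = strchr(x, n);          /* leftmost N in X */
    if (p == NULL)
        return 0;
    size_t prefix = (size_t)(p - x);
    if (prefix + strlen(beta) + strlen(p + 1) + 1 > ysize)
        return 0;
    memcpy(y, x, prefix);                  /* part of X before N */
    strcpy(y + prefix, beta);              /* beta replaces N */
    strcat(y, p + 1);                      /* part of X after N */
    return 1;
}

/* Apply the right-hand sides rhs[0..count-1] in order, each to the
 * leftmost E, starting from the start symbol E. The final string of
 * grammar symbols is left in out. Returns 1 if every step applied. */
int derive_from_E(const char *const rhs[], size_t count,
                  char *out, size_t outsize)
{
    char cur[128] = "E", next[128];
    for (size_t i = 0; i < count; i++) {
        if (!derive_step(cur, 'E', rhs[i], next, sizeof next))
            return 0;
        strcpy(cur, next);
    }
    if (strlen(cur) + 1 > outsize)
        return 0;
    strcpy(out, cur);
    return 1;
}
```

Applying the right-hand sides E+E, d, E*E, E+E, d, d, d in that order follows the example derivation and ends with d+d+d*d.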
Parse Trees
A graphical presentation of a derivation, satisfying
  Root is the start symbol
  Each leaf is a token or ε (note different font from text)
  Each interior node is a nonterminal
  If A is a parent with children X1, X2 … Xn then A → X1 X2 … Xn is a production
Syntax directed Translation
Frequently the rewriting by a production will be called a reduction, or reducing by the particular production.
Syntax directed translation attaches actions (code) that are done when the reductions are performed.
Example
  E → E + T   {print('+');}
  E → E - T   {print('-');}
  E → T
  T → 0   {print('0');}
  T → 1   {print('1');}
  …
  T → 9   {print('9');}
Specification of the translator (figure 2.38)
  S → L eof
  L → E ; L
  L → ε
  E → T E’
  E’ → + T { print('+'); } E’
  E’ → - T { print('-'); } E’
  E’ → ε
  T → F T’
  T’ → * F { print('*'); } T’
  T’ → / F { print('/'); } T’
  T’ → ε
  F → ( E )
  F → id { print(id.lexeme);}
  F → num { print(num.value);}
Translating to code
  E → T E’
  E’ → + T { print('+'); } E’
  E’ → - T { print('-'); } E’
  E’ → ε
Expr()
{
    int t;
    term();
    while(1)
        switch(lookahead){
        case '+': case '-':
            t = lookahead;
            match(lookahead); term();
            emit(t, NONE);
            continue;
        …
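The fragment above stops partway through Expr(). A minimal complete sketch of the same idea follows, under simplifying assumptions that are not in the slides: input comes from a string instead of a lexer, a term is a single digit, and the names infix_to_postfix, emit, term and expr are chosen only for illustration. The while loop plays the role of E’, and the operator is emitted after the second term, which is what produces postfix order.

```c
#include <ctype.h>
#include <stddef.h>

/* All names below are illustrative; the slides show only part of Expr(). */
static const char *input;   /* remaining input */
static char *out;           /* next free position in the output buffer */
static char *out_end;       /* last position, reserved for the terminator */
static int ok;              /* cleared on a syntax error or overflow */

static void emit(int c)
{
    if (out < out_end) *out++ = (char)c;
    else ok = 0;
}

/* term: a single digit; the action prints the digit */
static void term(void)
{
    if (isdigit((unsigned char)*input)) emit(*input++);
    else ok = 0;
}

/* expr: a term followed by zero or more (+|-) term pairs */
static void expr(void)
{
    term();
    while (ok && (*input == '+' || *input == '-')) {
        int t = *input++;   /* remember the operator and match it */
        term();
        emit(t);            /* the action runs after the right operand */
    }
}

/* Translate infix to postfix. Returns 1 on success, 0 on a syntax
 * error or if the output buffer is too small. */
int infix_to_postfix(const char *infix, char *postfix, size_t size)
{
    if (size == 0) return 0;
    input = infix; out = postfix; out_end = postfix + size - 1; ok = 1;
    expr();
    *out = '\0';
    return ok && *input == '\0';
}
```

With this sketch the input 9-5+2 translates to 95-2+, and an incomplete expression such as 9+ is rejected.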
Overview of the Code Figure 2.36
/class/csce531-001
Operations on Strings
A language over an alphabet is a set of strings of characters from the alphabet.
Operations on strings: let x = x1x2…xn and t = t1t2…tm then
  Concatenation: xt = x1x2…xnt1t2…tm
  Alternation: x|t = either x1x2…xn or t1t2…tm
Operations on Sets of Strings
Operations on sets of strings:
For these let S = {s1, s2, … sm} and R = {r1, r2, … rn}
Alternation: S | R = S ∪ R = {s1, s2, … sm, r1, r2, … rn}
Concatenation:
  SR = {sr | s ∈ S and r ∈ R}
     = {s1r1, s1r2, … s1rn, s2r1, … s2rn, … smr1, … smrn}
Power: S^2 = S S, S^3 = S^2 S, S^n = S^(n-1) S
What is S^0?
Kleene Closure: S* = ∪ (i = 0 to ∞) S^i; note S^0 = {ε}, so ε is in S*
Operations cont. Kleene Closure
Powers: S^2 = S S, S^3 = S^2 S, … S^n = S^(n-1) S
What is S^0?
Kleene Closure: S* = ∪ (i = 0 to ∞) S^i; note S^0 = {ε}, so ε is in S*
Examples of Operations on Sets of Strings
Operations on sets of strings:
For these let S = {a,b,c} and R = {t,u}
Alternation: S | R = S ∪ R = {a,b,c,t,u}
Concatenation:
  SR = {sr | s ∈ S and r ∈ R}
     = {at, au, bt, bu, ct, cu}
Power: S^2 = {aa, ab, ac, ba, bb, bc, ca, cb, cc}
  S^3 = {aaa, aab, aac, … ccc} 27 elements
Kleene closure: S* = {any string of any length, including the empty string ε, of a’s, b’s and c’s}
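The concatenation and power operations can be checked mechanically. The sketch below is one possible C rendering, not code from the course: the names concat_sets and set_power and the MAXLEN bound are assumptions, sets are plain arrays of strings, and duplicates are not removed (none arise in the example above).

```c
#include <stddef.h>
#include <string.h>

#define MAXLEN 16   /* assumed bound on string length, terminator included */

/* Store a followed by b in dst. Returns 0 if the result would not fit. */
static int join(char *dst, const char *a, const char *b)
{
    if (strlen(a) + strlen(b) + 1 > MAXLEN) return 0;
    strcpy(dst, a);
    strcat(dst, b);
    return 1;
}

/* Concatenation SR: every s followed by every r. The caller supplies
 * room for ns * nr strings. Returns the number of strings written. */
size_t concat_sets(const char *const s[], size_t ns,
                   const char *const r[], size_t nr,
                   char out[][MAXLEN])
{
    size_t k = 0;
    for (size_t i = 0; i < ns; i++)
        for (size_t j = 0; j < nr; j++) {
            if (!join(out[k], s[i], r[j])) return 0;
            k++;
        }
    return k;
}

/* Power S^n by repeated concatenation, starting from S^0 = { "" }.
 * out has room for cap strings. Returns the count, or 0 if it does not fit. */
size_t set_power(const char *const s[], size_t ns, unsigned n,
                 char out[][MAXLEN], size_t cap)
{
    size_t count = 1;
    if (cap == 0) return 0;
    out[0][0] = '\0';                      /* the empty string */
    for (unsigned step = 0; step < n; step++) {
        if (count * ns > cap) return 0;
        /* Expand from the back so no entry is overwritten before use. */
        for (size_t i = count; i-- > 0; )
            for (size_t j = ns; j-- > 0; ) {
                char tmp[MAXLEN];
                if (!join(tmp, out[i], s[j])) return 0;
                strcpy(out[i * ns + j], tmp);
            }
        count *= ns;
    }
    return count;
}
```

For S = {a,b,c} and R = {t,u}, concat_sets yields the 6 strings at through cu, and set_power yields 9 strings for n = 2 and 27 for n = 3. With n = 0 it yields the single empty string, which is the answer to "What is S^0?".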
Regular Expressions
For a given alphabet Σ the following are regular expressions:
  If a ∈ Σ then a is a regular expression and L(a) = { a }
  ε is a regular expression and L(ε) = { ε }
  Φ is a regular expression and L(Φ) = Φ
And if s and t are regular expressions denoting languages L(s) and L(t) respectively then
  st is a regular expression and L(st) = L(s) L(t)
  s | t is a regular expression and L(s | t) = L(s) ∪ L(t)
  s* is a regular expression and L(s*) = L(s)*
Why Regular Expressions?
We use regular expressions to describe the tokens
Examples:
Reg expr for C identifiers
  C identifiers? Any string of letters, underscores and digits that starts with a letter or underscore
  ID reg expr = (letter | underscore) (letter | underscore | digit)*
  Or more explicitly
  ID reg expr = (a|b|…|z|A|B|…|Z|_)(a|b|…|z|A|B|…|Z|_|0|1|…|9)*
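The identifier expression can be tried out with the POSIX regular expression functions in regex.h (regcomp, regexec, regfree). The sketch below assumes a POSIX system; the function name is_identifier is made up for the example, and the bracket expressions are shorthand for the long alternations written out above.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if s matches (letter | underscore)(letter | underscore | digit)*,
 * 0 if it does not, and -1 if the pattern fails to compile.
 * Keywords such as "for" also match; they are not excluded here. */
int is_identifier(const char *s)
{
    regex_t re;
    /* ^ and $ anchor the match so the whole string must be an identifier */
    if (regcomp(&re, "^[A-Za-z_][A-Za-z_0-9]*$",
                REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int matched = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return matched;
}
```

Under these assumptions fort and _tmp1 are accepted, while 9lives, a-b and the empty string are rejected.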
Pop Quiz
Given r and s are regular expressions then
  What is rε ? r | ε ?
  Describe the language denoted by 0*110*
  Describe the language denoted by (0|1)*110*
  Give a regular expression for the language of strings of 0’s and 1’s that end in a 1
  Give a regular expression for the language of strings of 0’s and 1’s such that every 0 is followed by a 1
Recognizers of Regular Languages
To develop efficient lexical analyzers (scanners) we will rely on a mathematical model called finite automata, similar to the state machines that you have probably seen. In particular we will use deterministic finite automata, DFAs.
The construction of a lexical analyzer will then proceed as:
1. Identify all tokens
2. Develop regular expressions for each
3. Convert the regular expressions to finite automata
4. Use the transition table for the finite automata as the basis for the scanner
We will actually use the tools lex and/or flex for steps 3 and 4.
Transition Diagram for a DFA
Start in state s0; then if the input is “f” make a transition to state s1.
Then from state s1, if the input is “o” make a transition to state s2.
And from state s2, if the input is “r” make a transition to state s3.
The double circle denotes an “accepting state”, which means we recognized the token.
Actually there is a missing state and transition.
[Transition diagram: s0 –f→ s1 –o→ s2 –r→ s3]
Now what about “fort”
The string “fort” is an identifier, not the keyword “for” followed by “t.”
Thus we can’t really recognize the token until we see a terminator: whitespace or a special symbol (one of , ; ( ) { } [ ]).
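One way to act on this in a hand-written scanner is to keep consuming characters until a terminator appears and only then decide between keyword and identifier. The sketch below is an illustration under assumptions: the token codes and the name scan_word are hypothetical, and any character that is not a letter, digit or underscore is treated as a terminator.

```c
#include <ctype.h>
#include <string.h>

enum token { TOK_FOR, TOK_ID, TOK_NONE };   /* hypothetical token codes */

/* Scan one word from the start of s. Characters are consumed until a
 * terminator (anything that is not a letter, digit or underscore) is
 * seen; only then is the lexeme compared with the keyword.
 * *len receives the length of the lexeme. */
enum token scan_word(const char *s, size_t *len)
{
    size_t n = 0;
    if (isalpha((unsigned char)s[0]) || s[0] == '_') {
        n = 1;
        while (isalnum((unsigned char)s[n]) || s[n] == '_')
            n++;
    }
    *len = n;
    if (n == 0)
        return TOK_NONE;                    /* no word starts here */
    if (n == 3 && strncmp(s, "for", 3) == 0)
        return TOK_FOR;                     /* exactly the keyword */
    return TOK_ID;
}
```

On the input for(i=0; the sketch reports the keyword with length 3, while on fort = 1; it reports an identifier of length 4.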
Deterministic Finite Automata
A Deterministic finite automaton (DFA) is a mathematical model that consists of
1. a set of states S
2. a set of input symbols Σ, the input alphabet
3. a transition function δ: S × Σ → S that for each state and each input maps to the next state
4. a state s0 that is distinguished as the start state
5. a set of states F distinguished as accepting (or final) states
DFA to recognize keyword “for”
Σ = {a, b, c, …, z, A, B, …, Z, 0, …, 9, ‘,’, ‘;’, …}
S = {s0, s1, s2, s3, sdead}
s0 is the start state
SF = {s3}
δ given by the table below
         f      o      r      Others
s0       s1     sdead  sdead  sdead
s1       sdead  s2     sdead  sdead
s2       sdead  sdead  s3     sdead
s3       sdead  sdead  sdead  sdead
sdead    sdead  sdead  sdead  sdead
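A transition table like this maps directly onto a two-dimensional array. The sketch below takes δ(s0,f) = s1, δ(s1,o) = s2, δ(s2,r) = s3 from the transition diagram and assumes every other transition goes to sdead. The identifiers delta_table, column and accepts_for are names chosen for the example.

```c
#include <stddef.h>

enum { S0, S1, S2, S3, SDEAD, NSTATES };        /* states */
enum { IN_F, IN_O, IN_R, IN_OTHER, NCLASSES };  /* table columns */

/* Transition table indexed as delta_table[state][column]. */
static const int delta_table[NSTATES][NCLASSES] = {
    /*            f      o      r      others */
    /* s0    */ { S1,    SDEAD, SDEAD, SDEAD },
    /* s1    */ { SDEAD, S2,    SDEAD, SDEAD },
    /* s2    */ { SDEAD, SDEAD, S3,    SDEAD },
    /* s3    */ { SDEAD, SDEAD, SDEAD, SDEAD },
    /* sdead */ { SDEAD, SDEAD, SDEAD, SDEAD },
};

/* Map an input character to its table column. */
static int column(char c)
{
    switch (c) {
    case 'f': return IN_F;
    case 'o': return IN_O;
    case 'r': return IN_R;
    default:  return IN_OTHER;
    }
}

/* Returns 1 if the DFA ends in the accepting state s3 after reading x. */
int accepts_for(const char *x)
{
    int state = S0;
    for (size_t i = 0; x[i] != '\0'; i++)
        state = delta_table[state][column(x[i])];
    return state == S3;
}
```

Only the string for is accepted; fort moves from s3 to sdead on the final t, which matches the earlier remark that “fort” is not the keyword.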
Language Accepted by a DFA
A string x0x1…xn is accepted by a DFA M = (Σ, S, s0, δ, SF) if si+1 = δ(si, xi) for i = 0, 1, …, n and sn+1 ∈ SF,
i.e. if x0x1…xn determines a path through the state diagram for the DFA that ends in an accepting state.
Then the language accepted by the DFA M = (Σ, S, s0, δ, SF), denoted L(M), is the set of all strings accepted by M.
DFA1.c
/*
 * Deterministic Finite Automata Simulation
 *
 * One line of input is read and then processed character by character.
 * Thus '\n' (EOL) is treated as the end of input.
 * The major functions are:
 *   delta(s,c) - that implements the transition function, and
 *   accept(s) - that tells whether state s is an accepting state or not.
 * The particular DFA recognizes strings of digits that end in 000.
 * The DFA has:
 *   S = {0, 1, 2, 3, DEAD_STATE}
 *   Transitions on 0: S0=>S1, S1=>S2, S2=>S3, S3=>S3
 *   Transitions on non-zero digits: S0=>S0, S1=>S0, S2=>S0, S3=>S0
 *   Transitions on non-digits: Si=> DEAD_STATE
 *
 */
#include <stdio.h>
#define DEAD_STATE -1
#define ACCEPT 1
#define DO_NOT 0
#define EOL '\n'

/* Prototypes: delta and accept are defined after main */
int delta(int s, int c);
int accept(int state);

int main(void){
    int c;
    int state;
    state = 0;    /* start state */
    /* Stop at end of line, at end of file, or once the DFA is dead */
    while((c = getchar()) != EOL && c != EOF && state != DEAD_STATE){
        state = delta(state, c);
    }
    if(accept(state)){
        printf("Accept!\n");
    }else{
        printf("Do not accept!\n");
    }
    return 0;
}
/* DFA Transition function delta */
/* delta(s,c) = transition from state s on input c */
int delta(int s, int c){
    switch (s){
    case 0: if (c == '0') return 1;
        else if((c > '0') && (c <= '9')) return 0;
        else return(DEAD_STATE);
    case 1: if (c == '0') return 2;
        else if((c > '0') && (c <= '9')) return 0;
        else return(DEAD_STATE);
    case 2: if (c == '0') return 3;
        else if((c > '0') && (c <= '9')) return 0;
        else return(DEAD_STATE);
    case 3: if (c == '0') return 3;
        else if((c > '0') && (c <= '9')) return 0;
        else return(DEAD_STATE);
    case DEAD_STATE: return DEAD_STATE;
    default:
        printf("Bad State\n");
        return(DEAD_STATE);
    }
}
/* accept(state) = ACCEPT if state is the accepting state 3, else DO_NOT */
int accept(int state){
    if (state == 3) return ACCEPT;
    else return DO_NOT;
}
Non-Deterministic Finite Automata
What does deterministic mean?
In a Non-Deterministic Finite Automaton (NFA) we relax the restriction that the transition function δ maps every state and every element of the alphabet to a unique state, i.e. δ: S × Σ → S
An NFA can:
  Have multiple transitions from a state for the same input
  Have ε transitions, where a transition from one state to another can be accomplished without consuming an input character
  Not have transitions defined for every state and every input
Note for NFAs δ: S × (Σ ∪ {ε}) → 2^S, where 2^S is the power set of S
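Because δ returns a set of states, an NFA can be simulated by tracking the set of all states it could currently be in. The sketch below uses a small NFA that is not from the slides: it accepts strings of a's and b's ending in abb, has no ε transitions, and stores each set of states as a bit mask. The names nfa_delta and nfa_accepts are illustrative.

```c
#include <stddef.h>

#define NSTATES 4

/* Hypothetical NFA for strings of a's and b's that end in abb.
 * The result is a set of states stored as a bit mask: bit i is set
 * when state i belongs to the set. */
static unsigned nfa_delta(int state, char c)
{
    switch (state) {
    case 0:                                   /* two choices on a */
        if (c == 'a') return (1u << 0) | (1u << 1);
        if (c == 'b') return (1u << 0);
        return 0u;
    case 1: return c == 'b' ? (1u << 2) : 0u;
    case 2: return c == 'b' ? (1u << 3) : 0u;
    default: return 0u;                       /* no transition defined */
    }
}

/* Track the set of all states the NFA could be in after each character.
 * Returns 1 if that set contains the accepting state 3 at the end. */
int nfa_accepts(const char *x)
{
    unsigned current = 1u << 0;               /* start in {0} */
    for (size_t i = 0; x[i] != '\0'; i++) {
        unsigned next = 0u;
        for (int s = 0; s < NSTATES; s++)
            if (current & (1u << s))
                next |= nfa_delta(s, x[i]);   /* union over current states */
        current = next;
    }
    return (current & (1u << 3)) != 0;
}
```

State 0 has two transitions on a, and states 1 to 3 leave some inputs undefined, which illustrates two of the three relaxations listed above. Handling ε transitions would additionally require closing each set under ε moves, which this sketch omits.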
Language Accepted by an NFA
A string x0x1…xn is accepted by an NFA M = (Σ, S, s0, δ, SF) if there are states s1, …, sn+1 with si+1 ∈ δ(si, xi) for i = 0, 1, …, n and sn+1 ∈ SF,
i.e. if x0x1…xn can determine a path through the state diagram for the NFA that ends in an accepting state, taking ε transitions wherever necessary.
Then the language accepted by the NFA M = (Σ, S, s0, δ, SF), denoted L(M), is the set of all strings accepted by M.
Thompson Construction
For any regular expression R construct an NFA, M, that accepts the language denoted by R, i.e., L(M) = L(R).