Course Overview
1
Course Overview
PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler
PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation
PART III: conclusion
8 Interpretation
9 Review
Supplementary material: Theoretical foundations (Regular expressions)
2
Regular Expressions
• A finite state machine is a good "visual" aid
– but it is not very suitable as a specification (its textual description is too clumsy)
• Regular expressions are a suitable specification
– a more compact way to define a language that can be accepted by an FSM
• Used to give the lexical description of a programming language
– define each "token" (keywords, identifiers, literals, operators, punctuation, etc.)
– define white-space, comments, etc.
• these are not tokens, but must be recognized and ignored
3
Example: Pascal identifier
• Lexical specification (in English):
– a letter, followed by zero or more letters or digits
• Lexical specification (as a regular expression):
– letter . (letter | digit)*
| means "or"
. means "followed by" (dot may be omitted)
* means "zero or more instances of"
( ) are used for grouping
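The identifier specification above can be tried out with Python's `re` module. This is a minimal sketch, assuming "letter" means the ASCII letters and "digit" means 0-9; the names `IDENTIFIER` and `is_identifier` are illustrative and not part of the course material.

```python
import re

# "letter . (letter | digit)*" in Python's regular expression syntax.
# Assumption: letter = a-z or A-Z, digit = 0-9.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def is_identifier(text: str) -> bool:
    """Return True if the whole string matches the identifier pattern."""
    return IDENTIFIER.fullmatch(text) is not None
```

`fullmatch` is used so that the whole string has to match, which corresponds to asking whether the string is in the language of the expression.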
4
Operands of a regular expression
• Operands are the same as labels on the edges of an FSM
– single characters, or
– the special character ε (the empty string)
• "letter" is a shorthand for
– a | b | c | ... | z | A | B | C | ... | Z
• "digit" is a shorthand for
– 0 | 1 | 2 | ... | 9
• Sometimes we put the characters in quotes
– necessary when denoting | . * ( )
5
Precedence of | . * operators.
• Consider regular expressions:
– letter.letter | digit*
– letter.(letter | digit)*

Regular Expression Operator   Analogous Arithmetic Operator   Precedence
|                             plus                            lowest
.                             times                           middle
*                             exponentiation                  highest
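Python's `re` module uses the same relative precedence (repetition binds tighter than concatenation, which binds tighter than alternation), so the two expressions above can be compared directly. This is a small sketch, again assuming ASCII letters and digits; `FIRST`, `SECOND` and `matches` are illustrative names.

```python
import re

LETTER = "[A-Za-z]"
DIGIT = "[0-9]"

# letter.letter | digit*   parses as   (letter.letter) | (digit*)
FIRST = re.compile(f"{LETTER}{LETTER}|{DIGIT}*")
# letter.(letter | digit)* needs the parentheses to group the alternation
SECOND = re.compile(f"{LETTER}({LETTER}|{DIGIT})*")


def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string is in the language of the pattern."""
    return pattern.fullmatch(text) is not None
```

The first pattern accepts exactly two letters or any run of digits, so `matches(FIRST, "a1")` is false, while the second accepts `"a1"` because the parentheses let `*` apply to the whole alternation.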
6
TEST YOURSELF
Question 1: Describe (in English) the language defined by each of the following regular expressions:
– letter (letter* | digit*)
– (letter | _ ) (letter | digit | _ )*
– digit* "." digit*
– digit digit* "." digit digit*
7
TEST YOURSELF
Question 2: Write a regular expression for each of these languages:
– The set of all C++ reserved words
• Examples: if, while, for, class, int, case, char, true, false
– C++ string literals that begin with " and end with " and don't contain any other " except possibly in the escape sequence \"
• Example: "The escape sequence \" occurs in this string"
– C++ comments that begin with /* and end with */ and don't contain any other */ within the string
• Example: /* This is a comment * still the same comment */
8
Example: Integer Literals
• An integer literal with an optional sign can be defined in English as:
– "(nothing or + or -) followed by one or more digits"
• The corresponding regular expression is:
– (+|-|ε) (digit.digit*)
• A new convenient operator '+'
– same precedence as '*'
– digit digit* is the same as digit+, which means "one or more digits"
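As a quick check of the integer-literal definition, the expression can be written in Python's `re` syntax, where `[+-]?` plays the role of "(nothing or + or -)" and `[0-9]+` the role of "one or more digits". A sketch only; `INTEGER` and `is_integer_literal` are hypothetical names.

```python
import re

# (+ | - | empty) followed by digit+, with digit assumed to be 0-9.
INTEGER = re.compile(r"[+-]?[0-9]+")


def is_integer_literal(text: str) -> bool:
    """Return True if the whole string is an optionally signed integer."""
    return INTEGER.fullmatch(text) is not None
```

A sign on its own, such as `"+"`, is rejected because at least one digit is required.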
9
Language Defined by a Regular Expression
• Recall: language = set of strings
• Language defined by an automaton
– the set of strings accepted by the automaton
• Language defined by a regular expression
– the set of strings that match the expression

Regular Exp.    Corresponding Set of Strings
ε               {""}
a               {"a"}
a.b.c           {"abc"}
a | b | c       {"a", "b", "c"}
(a | b | c)*    {"", "a", "b", "c", "aa", "ab", ..., "bccabb", ...}
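To make "the set of strings that match the expression" concrete, the finite part of a language can be listed by brute force: try every string over the alphabet up to some length and keep the ones that match. This is an illustrative sketch using Python's `re`; the function name `language_up_to` and the length bound are assumptions, not part of the slides.

```python
import re
from itertools import product


def language_up_to(pattern: str, alphabet: str, max_len: int) -> list:
    """List every string over `alphabet` of length <= max_len that matches."""
    compiled = re.compile(pattern)
    found = []
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                found.append(candidate)
    return found
```

For example, `language_up_to("a|b|c", "abc", 3)` yields the three one-character strings, while `(a|b|c)*` yields every string over the alphabet, which is why its full language is infinite.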
10
Concept of Reg Exp Generating a String
Rewrite the regular expression until only a sequence of letters (a string) is left

Example
(0|1)* 2 (0|1)*
(0|1) (0|1)* 2 (0|1)*
1 (0|1)* 2 (0|1)*
1 2 (0|1)*
1 2 (0|1) (0|1)*
1 2 (0|1)
1 2 0

Replacement Rules
1) r1 | r2 ––> r1
2) r1 | r2 ––> r2
3) r* ––> r r*
4) r* ––> ε
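The four replacement rules can be sketched as a small random generator. The tuple encoding of expressions, the function name `generate`, and the `max_repeats` cap (added only so that generation always terminates quickly) are assumptions made for this illustration.

```python
import random

# Expressions as nested tuples (an encoding chosen for this sketch):
#   ("lit", c)  ("seq", r1, r2, ...)  ("alt", r1, r2, ...)  ("star", r)
BIT = ("alt", ("lit", "0"), ("lit", "1"))
EXPR = ("seq", ("star", BIT), ("lit", "2"), ("star", BIT))  # (0|1)* 2 (0|1)*


def generate(node, rng, max_repeats=3):
    """Rewrite `node` into a plain string by applying the rules at random."""
    kind = node[0]
    if kind == "lit":
        return node[1]
    if kind == "seq":
        return "".join(generate(part, rng, max_repeats) for part in node[1:])
    if kind == "alt":
        # Rules 1 and 2: replace r1 | r2 by one of its alternatives.
        return generate(rng.choice(node[1:]), rng, max_repeats)
    if kind == "star":
        # Rule 3 (r* -> r r*) while the coin says so, then rule 4 (r* -> empty).
        pieces = []
        while len(pieces) < max_repeats and rng.random() < 0.5:
            pieces.append(generate(node[1], rng, max_repeats))
        return "".join(pieces)
    raise ValueError(f"unknown node kind: {kind}")
```

Calling `generate(EXPR, random.Random(seed))` with different seeds makes different rule choices and so can produce different strings, each containing exactly one `2` surrounded by `0`s and `1`s.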
11
Non–determinism in Generation
• Different rule applications may yield different final results

Example 1
(0|1)* 2 (0|1)*
(0|1) (0|1)* 2 (0|1)*
1 (0|1)* 2 (0|1)*
1 2 (0|1)*
1 2 (0|1) (0|1)*
1 2 (0|1)
1 2 0

Example 2
(0|1)* 2 (0|1)*
(0|1) (0|1)* 2 (0|1)*
0 (0|1)* 2 (0|1)*
0 2 (0|1)*
0 2 (0|1) (0|1)*
0 2 (0|1)
0 2 1
12
Concept of Language Generated by Reg Exp
• The set of all strings generated by a regular expression is the language of the regular expression
• In general, the language may be infinite
• A string in the language generated by a regular expression is often called a "token"
13
Examples of Languages and Reg Exp
• Σ = { 0, 1, . }
– (0 | 1)+ "." (0 | 1)* | (0 | 1)* "." (0 | 1)+ binary floating point numbers
– (0 0)* even-length all-zero strings
– 1* (0 1* 0 1*)* binary strings with even number of zeros
• Σ = { a, b, c, 0, 1, 2 }
– (a|b|c)(a|b|c|0|1|2)* alphanumeric identifiers
– (0|1|2)+ trinary numbers
14
Reg Exp Notational Shorthand
• R+ one or more strings of R: R(R*)
• R? optional R: (R|ε)
• [abcd] one of listed characters: (a|b|c|d)
• [a-z] one character from this range: (a|b|c|d...|z)
• [^abc] anything but one of the listed chars
• [^a-z] any one character not from this range
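The shorthand forms add convenience, not expressive power. A bounded check in Python can illustrate this by comparing two patterns on every string up to a small length. This is a sanity check rather than a proof, and `same_language` is a hypothetical helper name. Python's `re` writes the empty alternative as `(a|)`.

```python
import re
from itertools import product


def same_language(p1: str, p2: str, alphabet: str, max_len: int = 4) -> bool:
    """Compare two patterns on every string over `alphabet` up to max_len."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            s = "".join(chars)
            if (r1.fullmatch(s) is None) != (r2.fullmatch(s) is None):
                return False
    return True
```

For instance, `same_language("a+", "aa*", "ab")` and `same_language("[abcd]", "(a|b|c|d)", "abcde")` both hold, whereas `a+` and `a*` differ on the empty string.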
15
Equivalence of FSM and Regular Expressions
• Theorem:
– For each finite state machine M, we can construct a regular expression R such that M and R accept the same language.
– [proof omitted]
• Theorem:
– For each regular expression R, we can construct a finite state machine M such that R and M accept the same language.
– [proof outline follows]
16
Regular Expressions to NFSM (1)
• For each kind of reg exp, define an NFSM
– Notation: NFSM for reg exp M
[diagram: box labeled M]
• For ε
[diagram: NFSM for ε]
• For input a
[diagram: NFSM with an edge labeled a]
17
Regular Expressions to NFSM (2)
• For A . B
[diagram: NFSM for A followed by NFSM for B]
• For A | B
[diagram: NFSM for A and NFSM for B as alternative branches]
18
Regular Expressions to NFSM (3)
• For A*
[diagram: NFSM for A inside a loop]
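The case-by-case construction can be sketched in code. The diagrams are not reproduced in this transcript, so the exact placement of ε-edges below follows the usual Thompson-style construction and is an assumption. The class `NFSM` and the helpers `symbol`, `concat`, `union`, `star` and `accepts` are illustrative names.

```python
from itertools import count

EPS = None  # label used for ε-moves
_ids = count()  # source of fresh state numbers


class NFSM:
    def __init__(self, start, accept, edges):
        self.start, self.accept, self.edges = start, accept, edges


def symbol(a):
    """NFSM for a single input character (or for ε when a is EPS)."""
    s, f = next(_ids), next(_ids)
    return NFSM(s, f, [(s, a, f)])


def concat(a, b):
    """NFSM for A . B: link A's accepting state to B's start state."""
    return NFSM(a.start, b.accept, a.edges + b.edges + [(a.accept, EPS, b.start)])


def union(a, b):
    """NFSM for A | B: a new start branches to both, both join a new accept."""
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, b.start),
             (a.accept, EPS, f), (b.accept, EPS, f)]
    return NFSM(s, f, a.edges + b.edges + extra)


def star(a):
    """NFSM for A*: allow skipping A entirely or repeating it."""
    s, f = next(_ids), next(_ids)
    extra = [(s, EPS, a.start), (s, EPS, f),
             (a.accept, EPS, a.start), (a.accept, EPS, f)]
    return NFSM(s, f, a.edges + extra)


def closure(m, states):
    """All states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in m.edges:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(m, text):
    """Simulate the NFSM by tracking the set of states it may be in."""
    current = closure(m, {m.start})
    for ch in text:
        moved = {dst for src, label, dst in m.edges
                 if src in current and label == ch}
        current = closure(m, moved)
    return m.accept in current
```

For example, `concat(star(union(symbol("1"), symbol("0"))), symbol("1"))` builds a ten-state machine for (1|0)*1 that accepts `"0101"` and rejects `"10"`.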
19
Example of RegExp -> NFSM conversion
• Consider the regular expression (1|0)*1
• The NFSM is
[diagram: NFSM with states A through J and edges labeled 1, 0, 1]
20
Converting NFSM to DFSM
• Simulate the NFSM
• Each state of DFSM
– is a non-empty subset of states of the NFSM
• Start state of DFSM
– is the set of NFSM states reachable from the NFSM start state using only ε-moves
• Add a transition S –a–> S' to DFSM iff
– S' is the set of NFSM states reachable from any state in S after consuming only the input a, considering ε-moves as well
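The steps above can be sketched as a short subset construction. The input here is an NFSM for (1|0)*1 with states A through J. The transcript does not preserve the edges of that diagram, so the `EDGES` list is an assumed layout, and all names (`eps_closure`, `move`, `subset_construction`) are illustrative.

```python
EPS = ""  # label used for ε-moves

# Assumed edges for an NFSM of (1|0)*1 with states A..J (start A, accept J).
EDGES = [
    ("A", EPS, "B"), ("A", EPS, "H"),
    ("B", EPS, "C"), ("B", EPS, "D"),
    ("C", "1", "E"), ("D", "0", "F"),
    ("E", EPS, "G"), ("F", EPS, "G"),
    ("G", EPS, "A"), ("G", EPS, "H"),
    ("H", EPS, "I"), ("I", "1", "J"),
]
START, ACCEPT = "A", "J"


def eps_closure(states):
    """NFSM states reachable from `states` using only ε-moves."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in EDGES:
            if src == q and label == EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return frozenset(seen)


def move(states, symbol):
    """NFSM states reachable from `states` by consuming exactly `symbol`."""
    return {dst for src, label, dst in EDGES
            if src in states and label == symbol}


def subset_construction(alphabet):
    """Return (start state, transition table, accepting states) of the DFSM."""
    start = eps_closure({START})
    transitions, seen, todo = {}, {start}, [start]
    while todo:
        subset = todo.pop()
        for a in alphabet:
            target = eps_closure(move(subset, a))
            if not target:
                continue  # only non-empty subsets become DFSM states
            transitions[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting = {s for s in seen if ACCEPT in s}
    return start, transitions, accepting
```

Under the assumed edges, `subset_construction("01")` reaches only three subsets, starting from {A, B, C, D, H, I}. Only subsets that are actually reachable are built, so the result is usually far smaller than the set of all possible subsets.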
21
Remarks on converting NFSM to DFSM
• An NFSM may be in many states at any time
• How many different states?
• If there are N states, the NFSM must be in some subset of those N states
• How many subsets are there?
• 2^N = finitely many
• For example, if N = 5 then 2^N = 32 subsets
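The count can be confirmed by listing the subsets explicitly. A small sketch using `itertools.combinations`; the helper name `all_subsets` is an assumption.

```python
from itertools import combinations


def all_subsets(states):
    """Every subset of `states`, from the empty set up to the full set."""
    items = list(states)
    return [frozenset(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]
```

With five states, `len(all_subsets("ABCDE"))` equals `2 ** 5`, that is 32. One of those subsets is the empty set, so at most 31 of them can serve as DFSM states.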
22
NFSM -> DFSM Example
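[diagram: the NFSM for (1|0)*1 with states A through J and edges labeled 1, 0, 1]
[diagram: the resulting DFSM with states ABCDHI, FGHIABCD and EJGHIABCD, and transitions labeled 0 and 1]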
23
TEST YOURSELF
Question 3: First convert each of these regular expressions to an NFSM
– (a | b | ε) (a | b)
– (ab | ba)* (aa | bb)
Question 4: Next convert each resulting NFSM to a DFSM