Course Overview

23
1 Course Overview PART I: overview material 1 Introduction 2 Language processors (tombstone diagrams, bootstrapping) 3 Architecture of a compiler PART II: inside a compiler 4 Syntax analysis 5 Contextual analysis 6 Runtime organization 7 Code generation PART III: conclusion 8 Interpretation 9 Review Supplementary material: Theoretical foundations (Regular expressions)

description

Course Overview. PART I: overview material 1Introduction 2Language processors (tombstone diagrams, bootstrapping) 3Architecture of a compiler PART II: inside a compiler 4Syntax analysis 5Contextual analysis 6Runtime organization 7Code generation PART III: conclusion - PowerPoint PPT Presentation

Transcript of Course Overview

Page 1: Course Overview

1

Course Overview

PART I: overview material1 Introduction2 Language processors (tombstone diagrams, bootstrapping)3 Architecture of a compiler

PART II: inside a compiler4 Syntax analysis 5 Contextual analysis6 Runtime organization7 Code generation

PART III: conclusion8 Interpretation9 Review

Supplementary material:Theoretical foundations(Regular expressions)

Page 2: Course Overview

2

Regular Expressions

• finite state machine is a good “visual” aid– but it is not very suitable as a specification

(its textual description is too clumsy)• regular expressions are a suitable specification

– a more compact way to define a language that can be accepted by an FSM

• used to give the lexical description of a programming language – define each “token” (keywords, identifiers, literals,

operators, punctuation, etc)– define white-space, comments, etc

• these are not tokens, but must be recognized and ignored

Page 3: Course Overview

3

Example: Pascal identifier

• Lexical specification (in English):– a letter, followed by zero or more letters or

digits• Lexical specification (as a regular

expression): – letter . (letter | digit)* | means "or"

. means "followed by“ (dot may be omitted) * means zero or more instances of ( ) are used for grouping

Page 4: Course Overview

4

Operands of a regular expression

• Operands are same as labels on the edges of an FSM– single characters, or – the special character (the empty string)

• "letter" is a shorthand for – a | b | c | ... | z | A | B | C | ... | Z

• "digit“ is a shorthand for – 0 | 1 | 2 | … | 9

• sometimes we put the characters in quotes – necessary when denoting | . * ( )

Page 5: Course Overview

5

Precedence of | . * operators.

• Consider regular expressions:– letter.letter | digit*– letter.(letter | digit)*

Regular Expression Operator

Analogous Arithmetic Operator

Precedence

| plus lowest. times middle* exponentiatio

nhighest

Page 6: Course Overview

6

TEST YOURSELF

Question 1: Describe (in English) the language defined by each of the following regular expressions:

– letter (letter* | digit*)

– (letter | _ ) (letter | digit | _ )*

– digit* "." digit*

– digit digit* "." digit digit*

Page 7: Course Overview

7

TEST YOURSELF

Question 2: Write a regular expression for each of these languages:

– The set of all C++ reserved words• Examples: if, while, for, class, int, case, char, true, false

– C++ string literals that begin with ” and end with ” and don’t contain any other ” except possibly in the escape sequence \”

• Example: ”The escape sequence \” occurs in this string”– C++ comments that begin with /* and end with */

and don’t contain any other */ within the string• Example: /* This is a comment * still the same comment

*/

Page 8: Course Overview

8

Example: Integer Literals

• An integer literal with an optional sign can be defined in English as: – “(nothing or + or -) followed by one or more

digits”• The corresponding regular expression is:

– (+|-|) (digit.digit*) • A new convenient operator ‘+’

– same precedence as ‘*’– digit digit* is the same as – digit + which means "one or more digits"

Page 9: Course Overview

9

Language Defined by a Regular Expression• Recall: language = set of strings• Language defined by an automaton

– the set of strings accepted by the automaton• Language defined by a regular expression

– the set of strings that match the expression

Regular Exp. Corresponding Set of Strings {""} a {"a"} a.b.c {"abc"} a | b | c {"a", "b", "c"} (a | b | c)* {"", "a", "b", "c", "aa", "ab", ..., "bccabb" ...}

Page 10: Course Overview

10

Concept of Reg Exp Generating a String

Rewrite regular expression until have only a sequence of letters (string) left

Example(0|1)* 2 (0|1)*(0|1) (0|1)* 2 (0|

1)*1 (0|1)* 2 (0|1)*1 2 (0|1)*1 2 (0|1) (0|1)*1 2 (0|1)1 2 0

Replacement Rules

1) r1 | r2 ––> r1

2) r1 | r2 ––> r2

3) r* ––> r r*4) r* ––>

Page 11: Course Overview

11

Non–determinism in Generation

• Different rule applications may yield different final results

Example 1 (0|1)* 2 (0|1)* (0|1) (0|1)* 2 (0|1)* 1 (0|1)* 2 (0|1)* 1 2 (0|1)* 1 2 (0|1) (0|1)* 1 2 (0|1) 1 2 0

Example 2 (0|1)* 2 (0|1)* (0|1) (0|1)* 2 (0|1)* 0 (0|1)* 2 (0|1)* 0 2 (0|1)* 0 2 (0|1) (0|1)* 0 2 (0|1) 0 2 1

Page 12: Course Overview

12

Concept of Language Generated by Reg Exp

• Set of all strings generated by a regular expression is the language of the regular expression

• In general, language may be infinite• String generated by regular expression

language is often called a “token”

Page 13: Course Overview

13

Examples of Languages and Reg Exp

= { 0, 1, . }– (0 | 1)+ "." (0 | 1)* | (0 | 1)* "." (0 | 1)+

binary floating point numbers– (0 0)* even-length all-zero strings– 1* (0 1* 0 1*)* binary strings with even

number of zeros

= { a,b,c, 0, 1, 2 }– (a|b|c)(a|b|c|0|1|2)* alphanumeric identifiers– (0|1|2)+ trinary numbers

Page 14: Course Overview

14

Reg Exp Notational Shorthand

• R + one or more strings of R: R(R*)• R? optional R: (R|)• [abcd] one of listed characters: (a|b|c|d)• [a-z] one character from this range: (a|b|

c|d...|z)• [^abc] anything but one of the listed

chars• [^a-z] any one character not from this

range

Page 15: Course Overview

15

Equivalence of FSM and Regular Expressions

• Theorem: – For each finite state machine M, we can

construct a regular expression R such that M and R accept the same language.

– [proof omitted]• Theorem:

– For each regular expression R, we can construct a finite state machine M such that R and M accept the same language.

– [proof outline follows]

Page 16: Course Overview

16

Regular Expressions to NFSM (1)

• For each kind of reg exp, define a NFSM– Notation: NFSM for reg exp M

M

• For

• For input aa

Page 17: Course Overview

17

Regular Expressions to NFSM (2)

• For A . BA B

• For A | B

B

A

Page 18: Course Overview

18

Regular Expressions to NFSM (3)

• For A*

A

Page 19: Course Overview

19

Example of RegExp -> NFSM conversion

• Consider the regular expression(1|0)*1

• The NFSM is

10 1

A BC

D

E

FG H I J

Page 20: Course Overview

20

Converting NFSM to DFSM

• Simulate the NFSM• Each state of DFSM

– is a non-empty subset of states of the NFSM• Start state of DFSM

– is the set of NFSM states reachable from the NFSM start state using only -moves

• Add a transition S a > S’ to DFSM iff– S’ is the set of NFSM states reachable from any

state in S after consuming only the input a, considering -moves as well

Page 21: Course Overview

21

Remarks on converting NFSM to DFSM

• An NFSM may be in many states at any time

• How many different states ?• If there are N states, the NFSM must be in

some subset of those N states• How many subsets are there?• 2N = finitely many• For example, if N = 5 then 2N = 32 subsets

Page 22: Course Overview

22

NFSM -> DFSM Example

10 1

A BC

D

E

FG H I J

ABCDHI

FGHIABCD

EJGHIABCD

0

1

0

10 1

Page 23: Course Overview

23

TEST YOURSELF

Question 3: First convert each of these regular expressions to a NFSM– (a | b | ) (a | b)

– (ab | ba)* (aa | bb)

Question 4: Next convert each resulting NFSM to a DFSM