Chapter 4 Syntactic Analysis

Fall 2013

Page 1: Chapter 4 Syntactic Analysis

Fall 2013

Page 2: Chapter 4 Syntactic Analysis

Sub-phases of Syntactic Analysis
Grammars Revisited
Parsing
Abstract Syntax Trees
Scanning
Case Study: Syntactic Analysis in the Triangle Compiler

Page 3: Chapter 4 Syntactic Analysis

[Figure: Structure of a compiler. Source code → Lexical Analyzer → tokens → Parser & Semantic Analyzer → parse tree → Intermediate Code Generation → intermediate representation → Optimization → intermediate representation → Assembly Code Generation → Assembly code, with a shared Symbol Table.]

Page 4: Chapter 4 Syntactic Analysis

Main function
o Parse source program to discover its phrase structure
o Recursive-descent parsing
o Constructing an AST
o Scanning to group characters into tokens

Page 5: Chapter 4 Syntactic Analysis

Scanning (or lexical analysis)
o Source program transformed to a stream of tokens
• Identifiers
• Literals
• Operators
• Keywords
• Punctuation
o Comments and blank spaces discarded
Parsing
o To determine the source program's phrase structure
o Source program is input as a stream of tokens (from the scanner)
o Treats each token as a terminal symbol
Representation of phrase structure
o AST

Page 6: Chapter 4 Syntactic Analysis

Scan the file character by character and group characters into words and punctuation (tokens); remove white space and comments.

Example source:
let var y: Integer
in !new year
y := y+1

Tokens for this example:
let, var, y, :, Integer, in, y, :=, y, +, 1

Note: !new year does not appear in the list of tokens. Comments are removed along with white space.
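
A minimal Java sketch of such a scanner for the example above. It is not the Triangle compiler's own scanner. The class MiniScanner, its Kind enum and its small keyword set are hypothetical, and it assumes (as in Triangle) that ! starts a comment running to the end of the line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal scanner for the example program; not Triangle's own Scanner.
// Assumes '!' starts a comment that runs to the end of the line.
class MiniScanner {
    enum Kind { LET, VAR, IN, IDENTIFIER, INTLITERAL, OPERATOR, COLON, BECOMES, EOT }

    static final class Token {
        final Kind kind;
        final String spelling;
        Token(Kind kind, String spelling) { this.kind = kind; this.spelling = spelling; }
        @Override public String toString() { return kind + "(" + spelling + ")"; }
    }

    private final String src;
    private int pos = 0;

    MiniScanner(String src) { this.src = src; }

    // Discard blanks, tabs, newlines and comments; they never become tokens.
    private void skipSeparators() {
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (c == '!') {
                while (pos < src.length() && src.charAt(pos) != '\n') pos++;
            } else if (Character.isWhitespace(c)) {
                pos++;
            } else {
                return;
            }
        }
    }

    Token next() {
        skipSeparators();
        if (pos >= src.length()) return new Token(Kind.EOT, "");
        int start = pos;
        char c = src.charAt(pos);
        if (Character.isLetter(c)) {                 // identifier or keyword
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            String word = src.substring(start, pos);
            switch (word) {
                case "let": return new Token(Kind.LET, word);
                case "var": return new Token(Kind.VAR, word);
                case "in":  return new Token(Kind.IN, word);
                default:    return new Token(Kind.IDENTIFIER, word);
            }
        }
        if (Character.isDigit(c)) {                  // integer literal
            while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
            return new Token(Kind.INTLITERAL, src.substring(start, pos));
        }
        pos++;
        if (c == ':') {                              // ':' or ':=' (one character of lookahead)
            if (pos < src.length() && src.charAt(pos) == '=') {
                pos++;
                return new Token(Kind.BECOMES, ":=");
            }
            return new Token(Kind.COLON, ":");
        }
        if ("+-*/".indexOf(c) >= 0) return new Token(Kind.OPERATOR, String.valueOf(c));
        throw new IllegalStateException("unexpected character '" + c + "'");
    }

    // Scan the whole input, ending with the EOT token.
    static List<Token> scanAll(String src) {
        MiniScanner scanner = new MiniScanner(src);
        List<Token> tokens = new ArrayList<>();
        Token t;
        do {
            t = scanner.next();
            tokens.add(t);
        } while (t.kind != Kind.EOT);
        return tokens;
    }
}
```

Scanning the three-line example yields the eleven tokens listed above followed by an end-of-text token. The comment !new year is consumed by skipSeparators and never reaches the token list.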

Page 7: Chapter 4 Syntactic Analysis

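[Figure: the input converter reads the source text of the example on Page 6 into a buffer of characters (l e t ␣ v a r ␣ y : ␣ I n t e g e r ␣ i n . . . ., where ␣ marks a space). The scanner then groups the buffered characters into tokens.]

Token spelling → token kind:
let → let
var → var
y → Ident.
: → colon
Integer → Ident.
in → in
y → Ident.
:= → becomes
y → Ident.
+ → op.
1 → Intlit.
(end of text) → eot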

Page 8: Chapter 4 Syntactic Analysis

// literals, identifiers, operators...
INTLITERAL = 0, "<int>", CHARLITERAL = 1, "<char>", IDENTIFIER = 2, "<identifier>", OPERATOR = 3, "<operator>",

// reserved words - must be in alphabetical order...
ARRAY = 4, "array", BEGIN = 5, "begin", CONST = 6, "const", DO = 7, "do", ELSE = 8, "else", END = 9, "end", FUNC = 10, "func", IF = 11, "if", IN = 12, "in", LET = 13, "let", OF = 14, "of", PROC = 15, "proc", RECORD = 16, "record", THEN = 17, "then", TYPE = 18, "type", VAR = 19, "var", WHILE = 20, "while",

// punctuation...
DOT = 21, ".", COLON = 22, ":", SEMICOLON = 23, ";", COMMA = 24, ",", BECOMES = 25, ":=", IS = 26, "~",

// brackets...
LPAREN = 27, "(", RPAREN = 28, ")", LBRACKET = 29, "[", RBRACKET = 30, "]", LCURLY = 31, "{", RCURLY = 32, "}",

// special tokens...
EOT = 33, "", ERROR = 34, "<error>"

Page 9: Chapter 4 Syntactic Analysis

Context-free grammars
o Generate a set of sentences
o Each sentence is a string of terminal symbols
o An unambiguous sentence has a unique phrase structure embodied in its syntax tree
Develop parsers from context-free grammars

Page 10: Chapter 4 Syntactic Analysis

A regular expression (RE) is a convenient notation for expressing a set of strings of terminal symbols.

Main features
o '|' separates alternatives
o '*' indicates that the previous item may be repeated zero or more times
o '(' and ')' are grouping parentheses

ε: the empty string, a special string of length 0

Page 11: Chapter 4 Syntactic Analysis

Algebraic Properties
o | is commutative and associative
• r|s = s|r
• r|(s|t) = (r|s)|t
o Concatenation is associative
• (rs)t = r(st)
o Concatenation distributes over |
• r(s|t) = rs|rt
• (s|t)r = sr|tr
o ε is the identity for concatenation
• εr = r
• rε = r
o * is idempotent
• r** = r*
• r* = (r|ε)*

Page 12: Chapter 4 Syntactic Analysis

Common Extensions
o r+: one or more of expression r, same as rr*
o r^k: k repetitions of r
• r^3 = rrr
o ~r: the characters not in the expression r
• ~[\t\n]
o r-z: range of characters
• [0-9a-z]
o r?: zero or one copy of expression r (used for fields of an expression that are optional)

Page 13: Chapter 4 Syntactic Analysis

Regular Expression for Representing Months
o Examples of legal inputs
• January represented as 1 or 01
• October represented as 10
o First Try: (0|1|ε)[0-9]: 0, 1, or nothing, followed by a number between 0 and 9
• Matches all legal inputs? Yes: 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
• Matches any illegal inputs? Yes: 0, 00, 18

Page 14: Chapter 4 Syntactic Analysis

Regular Expression for Representing Months
o Examples of legal inputs
• January represented as 1 or 01
• October represented as 10
o Second Try: [1-9]|(0[1-9])|(1[0-2])
• Any number between 1 and 9, or 0 followed by any number between 1 and 9, or 1 followed by any number between 0 and 2
• Matches all legal inputs? Yes: 1, 2, 3, …, 10, 11, 12, 01, 02, …, 09
• Matches any illegal inputs? No
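
Both attempts can be checked mechanically with java.util.regex. This sketch writes the first try as (0|1)?[0-9], reading the empty alternative as an optional leading 0 or 1. MonthRegex is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical helper comparing the two month patterns from the slides.
class MonthRegex {
    // First try: "0, 1, or nothing" followed by a digit.
    static final Pattern FIRST_TRY = Pattern.compile("(0|1)?[0-9]");
    // Second try, as written on the slide.
    static final Pattern SECOND_TRY = Pattern.compile("[1-9]|(0[1-9])|(1[0-2])");

    // matches() succeeds only if the whole string matches the pattern.
    static boolean firstTry(String s) { return FIRST_TRY.matcher(s).matches(); }
    static boolean secondTry(String s) { return SECOND_TRY.matcher(s).matches(); }
}
```

Using matches() instead of find() matters here. It requires the entire input to match, which is what "legal input" means on these slides.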

Page 15: Chapter 4 Syntactic Analysis

Regular Expression for Floating Point Numbers
o Examples of legal inputs
• 1.0, 0.2, 3.14159, -1.0, 2.7e8, 1.0E-6, -2.5e+5
• Assume that a 0 is required before numbers less than 1, and that extra leading zeros are not prevented, so numbers such as 0011 or 0003.14159 are legal
o Building the regular expression
• Assume digit = 0|1|2|3|4|5|6|7|8|9
• Handle simple decimals such as 1.0, 0.2, 3.14159
digit+.digit+: 1 or more digits, followed by ., followed by 1 or more digits
• Add an optional sign (only minus, no plus)
(-|ε)digit+.digit+ or -?digit+.digit+

Page 16: Chapter 4 Syntactic Analysis

Regular Expression for Floating Point Numbers (cont.)
o Building the regular expression (cont.)
• Format for the exponent
(E|e)(+|-)?(digit+)
• Adding it as an optional expression to the decimal part
(-|ε)digit+.digit+((E|e)(+|-)?(digit+))?

Page 17: Chapter 4 Syntactic Analysis

Extended BNF (EBNF)
o Combination of BNF and RE
o N ::= X, where N is a nonterminal symbol and X is an extended RE, i.e., an RE constructed from both terminal and nonterminal symbols
o EBNF
• Right-hand side may use |, *, (, )
• Right-hand side may contain both terminal and nonterminal symbols

Page 18: Chapter 4 Syntactic Analysis

Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/

Generates
e
a + b
a - b - c
a + (b * c)
a + (b + c) / d
a - (b - (c - (d - e)))

Page 19: Chapter 4 Syntactic Analysis

Left Factorization
XY | XZ is equivalent to X(Y | Z)

single-Command ::= V-name := Expression
| if Expression then single-Command
| if Expression then single-Command else single-Command

single-Command ::= V-name := Expression
| if Expression then single-Command (ε | else single-Command)

Page 20: Chapter 4 Syntactic Analysis

Elimination of left recursion
N ::= X | N Y is equivalent to N ::= X (Y)*

Identifier ::= Letter | Identifier Letter | Identifier Digit
Identifier ::= Letter | Identifier (Letter | Digit)
Identifier ::= Letter (Letter | Digit)*
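
Once left recursion is eliminated, Identifier ::= Letter (Letter | Digit)* transcribes directly into code: one check for the leading Letter, then a loop for the starred group. This is a Java sketch. The names scanIdentifier, isLetter and isDigit are hypothetical, and letters are restricted to a-z and A-Z.

```java
// Hypothetical recogniser for Identifier ::= Letter (Letter | Digit)*.
// The left-recursive form would start with a call to itself; the iterative form
// becomes one test for Letter followed by a while loop for (Letter | Digit)*.
class IdentifierRecognizer {
    // Returns the end index (exclusive) of the identifier starting at 'start',
    // or -1 if the character at 'start' is not a letter.
    static int scanIdentifier(String text, int start) {
        int i = start;
        if (i >= text.length() || !isLetter(text.charAt(i))) return -1;   // Letter
        i++;
        while (i < text.length() && (isLetter(text.charAt(i)) || isDigit(text.charAt(i)))) {
            i++;                                                           // (Letter | Digit)*
        }
        return i;
    }

    static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
}
```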

Page 21: Chapter 4 Syntactic Analysis

Substitution of nonterminal symbols
Given N ::= X, we can substitute each occurrence of N with X iff N ::= X is nonrecursive and is the only production rule for N

single-Command ::= for Control-Variable := Expression To-or-Downto Expression do single-Command
| …
Control-Variable ::= Identifier
To-or-Downto ::= to
| downto

single-Command ::= for Identifier := Expression (to | downto) Expression do single-Command
| …

Page 22: Chapter 4 Syntactic Analysis

Starter set of an RE X
o starters[[X]]
o Set of terminal symbols that can start a string generated by X
Examples
o starters[[his | her | its]] = {h, i}
o starters[[(re)* set]] = {r, s}

Page 23: Chapter 4 Syntactic Analysis

Precise and complete definition of starters:

starters[[ε]] = {}
starters[[t]] = {t}, where t is a terminal symbol
starters[[X Y]] = starters[[X]] ∪ starters[[Y]] if X generates ε
starters[[X Y]] = starters[[X]] if X does not generate ε
starters[[X | Y]] = starters[[X]] ∪ starters[[Y]]
starters[[X*]] = starters[[X]]

To generalize for a starter set of an extended RE, add
o starters[[N]] = starters[[X]], where N is a nonterminal symbol defined by production rule N ::= X

Page 24: Chapter 4 Syntactic Analysis

Expression ::= primary-Expression (Operator primary-Expression)*
primary-Expression ::= Identifier | ( Expression )
Identifier ::= a|b|c|d|e
Operator ::= +|-|*|/

starters[[Expression]] = starters[[primary-Expression (Operator primary-Expression)*]]
= starters[[primary-Expression]]
= starters[[Identifier]] ∪ starters[[( Expression )]]
= starters[[a | b | c | d | e]] ∪ {(}
= {a, b, c, d, e, (}
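
The definition of starters can be executed directly. This Java sketch models REs as a small class hierarchy (Eps, Term, Seq, Alt, Star and NonTerm are hypothetical names). It computes starters together with the "generates ε" test the definition depends on. It assumes the grammar has no left recursion, as after the elimination step shown earlier. With left recursion, the recursion through nonterminals would not terminate.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of starters[[X]] over a small RE syntax tree.
// Assumes no left recursion: otherwise the recursion through NonTerm would not stop.
class Starters {
    static abstract class Re {
        abstract boolean generatesEmpty(Map<String, Re> g);  // can X generate the empty string?
        abstract Set<String> starters(Map<String, Re> g);    // always returns a fresh set
    }

    static final class Eps extends Re {                      // starters[[epsilon]] = {}
        boolean generatesEmpty(Map<String, Re> g) { return true; }
        Set<String> starters(Map<String, Re> g) { return new LinkedHashSet<>(); }
    }

    static final class Term extends Re {                     // starters[[t]] = {t}
        final String t;
        Term(String t) { this.t = t; }
        boolean generatesEmpty(Map<String, Re> g) { return false; }
        Set<String> starters(Map<String, Re> g) {
            Set<String> s = new LinkedHashSet<>();
            s.add(t);
            return s;
        }
    }

    static final class Seq extends Re {                      // X Y
        final Re x, y;
        Seq(Re x, Re y) { this.x = x; this.y = y; }
        boolean generatesEmpty(Map<String, Re> g) { return x.generatesEmpty(g) && y.generatesEmpty(g); }
        Set<String> starters(Map<String, Re> g) {
            Set<String> s = x.starters(g);
            if (x.generatesEmpty(g)) s.addAll(y.starters(g)); // add starters[[Y]] only if X can vanish
            return s;
        }
    }

    static final class Alt extends Re {                      // X | Y: union of both sides
        final Re x, y;
        Alt(Re x, Re y) { this.x = x; this.y = y; }
        boolean generatesEmpty(Map<String, Re> g) { return x.generatesEmpty(g) || y.generatesEmpty(g); }
        Set<String> starters(Map<String, Re> g) {
            Set<String> s = x.starters(g);
            s.addAll(y.starters(g));
            return s;
        }
    }

    static final class Star extends Re {                     // starters[[X*]] = starters[[X]]
        final Re x;
        Star(Re x) { this.x = x; }
        boolean generatesEmpty(Map<String, Re> g) { return true; }
        Set<String> starters(Map<String, Re> g) { return x.starters(g); }
    }

    static final class NonTerm extends Re {                  // starters[[N]] = starters[[X]] for N ::= X
        final String name;
        NonTerm(String name) { this.name = name; }
        boolean generatesEmpty(Map<String, Re> g) { return g.get(name).generatesEmpty(g); }
        Set<String> starters(Map<String, Re> g) { return g.get(name).starters(g); }
    }

    // Builds t1 | t2 | ... | tn from terminal spellings.
    static Re alts(String... ts) {
        Re r = new Term(ts[0]);
        for (int i = 1; i < ts.length; i++) r = new Alt(r, new Term(ts[i]));
        return r;
    }

    // The Expression grammar above, one production rule per nonterminal.
    static Map<String, Re> expressionGrammar() {
        Map<String, Re> g = new HashMap<>();
        Re primary = new NonTerm("primary-Expression");
        g.put("Expression", new Seq(primary, new Star(new Seq(new NonTerm("Operator"), primary))));
        g.put("primary-Expression", new Alt(new NonTerm("Identifier"),
                new Seq(new Term("("), new Seq(new NonTerm("Expression"), new Term(")")))));
        g.put("Identifier", alts("a", "b", "c", "d", "e"));
        g.put("Operator", alts("+", "-", "*", "/"));
        return g;
    }
}
```

For this grammar, g.get("Expression").starters(g) contains exactly a, b, c, d, e and (, which matches the hand calculation. The recursion into Expression inside the parentheses is never reached, because the terminal ( cannot generate ε.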

Page 25: Chapter 4 Syntactic Analysis

The purpose of scanning is to recognize tokens in the source program, that is, to group input characters (the source program text) into tokens.

Difference between parsing and scanning:
o Parsing groups terminal symbols, which are tokens, into larger phrases such as expressions and commands, and analyzes the tokens for correctness and structure
o Scanning groups individual characters into tokens

Page 26: Chapter 4 Syntactic Analysis

[Figure: Structure of a compiler, as on Page 3.]

Page 27: Chapter 4 Syntactic Analysis

[Figure: the scanner example from Page 7, repeated.]

Page 28: Chapter 4 Syntactic Analysis

Handle keywords (reserved words)
o Recognizes identifiers and keywords
o Match explicitly
• Write a regular expression for each keyword
• Identifier is any alphanumeric string which is not a keyword
o Match as an identifier, perform lookup
• No special regular expressions for keywords
• When an identifier is found, perform lookup into a preloaded keyword table

How does Triangle handle keywords? Discuss in terms of efficiency and ease of coding.
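
The second approach (match as an identifier, then look up) can be sketched in Java. The table holds the reserved words from the token list on Page 8 in alphabetical order, so java.util.Arrays.binarySearch can perform the lookup. KeywordTable and its methods are hypothetical names, and a hash set would work just as well.

```java
import java.util.Arrays;

// Hypothetical keyword table: every word is first scanned with the identifier
// pattern, then looked up here. The array must stay sorted for binarySearch.
class KeywordTable {
    private static final String[] KEYWORDS = {
        "array", "begin", "const", "do", "else", "end", "func", "if", "in",
        "let", "of", "proc", "record", "then", "type", "var", "while"
    };

    // True if 'word' is a reserved word, false if it is an ordinary identifier.
    static boolean isKeyword(String word) {
        return Arrays.binarySearch(KEYWORDS, word) >= 0;
    }

    // Classifies a word that has already matched the identifier pattern.
    static String classify(String word) {
        return isKeyword(word) ? "keyword " + word : "identifier " + word;
    }
}
```

With this design, the identifier pattern is the only word pattern the scanner needs. Adding a keyword means adding one table entry rather than a new regular expression.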

Page 29: Chapter 4 Syntactic Analysis

Remove white space
o Tabs, spaces, new lines
Remove comments
o Single line
-- Ada comment
o Multi-line, start and end delimiters
{ Pascal comment }
/* C comment */
o Nested
o Runaway comments
• Nonterminated comments can't be detected until end of file
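
Nested comments and runaway comments can both be handled with a depth counter. In this Java sketch, the delimiters { and } are assumed to nest, and CommentSkipper is a hypothetical class. A comment that is still open when the input runs out is reported only at that point, which is why a runaway comment cannot be detected before end of file.

```java
// Hypothetical sketch: skipping a comment delimited by '{' and '}', assuming comments nest.
class CommentSkipper {
    // 'pos' must point at an opening '{'. Returns the index just past the matching '}'.
    static int skipComment(String src, int pos) {
        int start = pos;
        int depth = 0;
        while (pos < src.length()) {
            char c = src.charAt(pos++);
            if (c == '{') {
                depth++;                    // a nested comment opens
            } else if (c == '}') {
                depth--;                    // innermost open comment closes
                if (depth == 0) return pos; // the outermost comment is closed
            }
        }
        // Only at end of input can the missing '}' be detected.
        throw new IllegalStateException("runaway comment starting at index " + start);
    }
}
```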

Page 30: Chapter 4 Syntactic Analysis

Perform look ahead
o Multi-character tokens
1..10 vs. 1.10
&, &&
<, <=
etc.
Challenging input languages
o FORTRAN
• Keywords not reserved
• Blanks are not a delimiter
• Example (comma vs. decimal point)
DO10I=1,5 start of a do loop (equivalent to a C for loop)
DO10I=1.5 an assignment statement, assignment to variable DO10I
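
The 1..10 vs. 1.10 case can be sketched in Java. After the digits of a number, the scanner peeks at the '.' and at the character after it, so two characters of lookahead are needed. The class LookaheadScanner and its token spellings INT, REAL and DOTDOT are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner fragment: 1..10 is INT, DOTDOT, INT, while 1.10 is a single REAL.
class LookaheadScanner {
    static List<String> scan(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isDigit(c)) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                // Peek two characters: '.' followed by a digit continues a real literal.
                if (i + 1 < s.length() && s.charAt(i) == '.' && Character.isDigit(s.charAt(i + 1))) {
                    i++;                                   // consume the '.'
                    while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
                    tokens.add("REAL(" + s.substring(start, i) + ")");
                } else {
                    tokens.add("INT(" + s.substring(start, i) + ")"); // leave any ".." for later
                }
            } else if (c == '.' && i + 1 < s.length() && s.charAt(i + 1) == '.') {
                tokens.add("DOTDOT");
                i += 2;
            } else if (Character.isWhitespace(c)) {
                i++;
            } else {
                tokens.add("CHAR(" + c + ")");
                i++;
            }
        }
        return tokens;
    }
}
```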

Page 31: Chapter 4 Syntactic Analysis

Challenging input languages (cont.)
o PL/I, keywords not reserved
IF THEN THEN THEN = ELSE; ELSE ELSE = THEN;

Page 32: Chapter 4 Syntactic Analysis

Error Handling
o Error token passed to parser, which reports the error
o Recovery
• Delete characters from the current token which have been read so far, restart scanning at the next unread character
• Delete the first character of the current lexeme and resume scanning from the next character
o Examples of lexical errors:
• 3.25e bad format for a constant
• Var#1 illegal character
o Some errors that are not lexical errors
• Mistyped keywords, e.g. Begim
• Mismatched parenthesis
• Undeclared variables

Page 33: Chapter 4 Syntactic Analysis

Issues
o Simpler design: the parser doesn't have to worry about white space, etc.
o Improved compiler efficiency: allows the construction of a specialized and potentially more efficient processor
o Enhanced compiler portability: input alphabet peculiarities and other device-specific anomalies can be restricted to the scanner

Page 34: Chapter 4 Syntactic Analysis

What are the keywords in Triangle?
How are keywords and identifiers implemented in Triangle?
Is look ahead implemented in Triangle?
o If so, how?

Page 35: Chapter 4 Syntactic Analysis

[Figure: Structure of a compiler, as on Page 3, with the Parser and the Semantic Analyzer shown as separate components.]

Page 36: Chapter 4 Syntactic Analysis

Given an unambiguous, context-free grammar, parsing is
o Recognition of an input string, i.e., deciding whether or not the input string is a sentence of the grammar
o Parsing of an input string, i.e., recognition of the input string plus determination of its phrase structure. The phrase structure can be represented by a syntax tree, or otherwise.

The grammar must be unambiguous so that every sentence of the grammar has exactly one syntax tree.

Page 37: Chapter 4 Syntactic Analysis

The syntax of programming language constructs is described by context-free grammars.

Advantages of unambiguous, context-free grammars
o A precise, yet easy-to-understand, syntactic specification of the programming language
o For certain classes of grammars, we can automatically construct an efficient parser that determines if a source program is syntactically well formed
o Imparts a structure to a programming language that is useful for the translation of source programs into correct object code and for the detection of errors
o Easier to add new constructs to the language if the implementation is based on a grammatical description of the language

Page 38: Chapter 4 Syntactic Analysis

Check the syntax (structure) of a program and create a tree representation of the program

Programming languages have non-regular constructs
o Nesting
o Recursion

Context-free grammars are used to express the syntax of programming languages

[Figure: sequence of tokens → parser → syntax tree]

Page 39: Chapter 4 Syntactic Analysis

Comprised of
o A set of tokens or terminal symbols
o A set of non-terminal symbols
o A set of rules or productions which express the legal relationships between symbols
o A start or goal symbol

Example:
1. expr → expr - digit
2. expr → expr + digit
3. expr → digit
4. digit → 0|1|2|…|9

Tokens: -, +, 0, 1, 2, …, 9
Non-terminals: expr, digit
Start symbol: expr

Page 40: Chapter 4 Syntactic Analysis

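(Grammar: rules 1 to 4 as on Page 39.)

Example input: 3 + 8 - 2

[Figure: parse tree for 3 + 8 - 2. The root expr has children expr, -, digit; that expr has children expr, +, digit; the innermost expr has the single child digit. The digit leaves derive 3, 8 and 2 from left to right.]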

Page 41: Chapter 4 Syntactic Analysis

Given a grammar for a language and a program, how do you know if the syntax of the program is legal?

A legal program can be derived from the start symbol of the grammar

Grammar must be unambiguous and context-free

Page 42: Chapter 4 Syntactic Analysis

The derivation begins with the start symbol.
At each step of a derivation, the right-hand side of a grammar rule is used to replace a non-terminal symbol.
Continue replacing non-terminals until only terminal symbols remain.

(Grammar: rules 1 to 4 as on Page 39.)

Example input: 3 + 8 - 2

expr
⇒ expr - digit (Rule 1)
⇒ expr - 2 (Rule 4)
⇒ expr + digit - 2 (Rule 2)
⇒ expr + 8 - 2 (Rule 4)
⇒ digit + 8 - 2 (Rule 3)
⇒ 3 + 8 - 2 (Rule 4)

Page 43: Chapter 4 Syntactic Analysis

The rightmost non-terminal is replaced in each step

(Grammar: rules 1 to 4 as on Page 39.)

Example input: 3 + 8 - 2

expr
⇒ expr - digit (Rule 1)
⇒ expr - 2 (Rule 4)
⇒ expr + digit - 2 (Rule 2)
⇒ expr + 8 - 2 (Rule 4)
⇒ digit + 8 - 2 (Rule 3)
⇒ 3 + 8 - 2 (Rule 4)

Page 44: Chapter 4 Syntactic Analysis

The leftmost non-terminal is replaced in each step

(Grammar: rules 1 to 4 as on Page 39.)

Example input: 3 + 8 - 2

expr
⇒ expr - digit (Rule 1)
⇒ expr + digit - digit (Rule 2)
⇒ digit + digit - digit (Rule 3)
⇒ 3 + digit - digit (Rule 4)
⇒ 3 + 8 - digit (Rule 4)
⇒ 3 + 8 - 2 (Rule 4)

Page 45: Chapter 4 Syntactic Analysis

The leftmost non-terminal is replaced in each step

expr
⇒ expr - digit (Rule 1)
⇒ expr + digit - digit (Rule 2)
⇒ digit + digit - digit (Rule 3)
⇒ 3 + digit - digit (Rule 4)
⇒ 3 + 8 - digit (Rule 4)
⇒ 3 + 8 - 2 (Rule 4)

[Figure: the parse tree for 3 + 8 - 2 (as on Page 40), with the six derivation steps numbered 1 to 6 on the nodes they expand.]

Page 46: Chapter 4 Syntactic Analysis

Parser examines terminal symbols of the input string, in order from left to right

Reconstructs the syntax tree from the bottom (terminal nodes) up (toward the root node)

Bottom-up parsing reduces a string w to the start symbol of the grammar.
o At each reduction step, a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.

Page 47: Chapter 4 Syntactic Analysis

Types of bottom-up parsing algorithms
o Shift-reduce parsing
• At each reduction step, a particular sub-string matching the right side of a production is replaced by the symbol on the left of that production, and if the sub-string is chosen correctly at each step, a rightmost derivation is traced out in reverse.
o LR(k) parsing
• The L is for left-to-right scanning of the input, the R is for constructing a rightmost derivation in reverse, and the k is for the number of input symbols of look-ahead that are used in making parsing decisions.

Page 48: Chapter 4 Syntactic Analysis

(Grammar: rules 1 to 4 as on Page 39.)

Example input: 3 + 8 - 2

[Figure: bottom-up construction of the parse tree for 3 + 8 - 2; digit and expr nodes are added above the input symbols, one reduction at a time.]

Page 49: Chapter 4 Syntactic Analysis

[Figure, continued: bottom-up construction of the parse tree for 3 + 8 - 2; further digit and expr nodes are added above the input.]
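
This is a hand-coded Java sketch of shift-reduce parsing for the grammar on Page 39. ShiftReduce is a hypothetical name. The reduce decisions are written out by hand for this one grammar, whereas an LR(k) parser would take them from a table generated from the grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-coded shift-reduce recogniser for
//   1. expr -> expr - digit   2. expr -> expr + digit
//   3. expr -> digit          4. digit -> 0|1|...|9
class ShiftReduce {
    // Returns the rule numbers applied, in order, or null if the input is rejected.
    static List<Integer> parse(String input) {
        List<String> stack = new ArrayList<>();
        List<Integer> reductions = new ArrayList<>();
        for (char c : input.toCharArray()) {
            if (c == ' ') continue;
            stack.add(String.valueOf(c));                  // shift
            boolean reduced = true;
            while (reduced) {                              // reduce while a handle is on top
                reduced = false;
                int n = stack.size();
                String top = stack.get(n - 1);
                if (top.length() == 1 && Character.isDigit(top.charAt(0))) {
                    stack.set(n - 1, "digit");             // rule 4
                    reductions.add(4);
                    reduced = true;
                } else if (top.equals("digit") && n >= 3 && stack.get(n - 3).equals("expr")
                        && (stack.get(n - 2).equals("+") || stack.get(n - 2).equals("-"))) {
                    reductions.add(stack.get(n - 2).equals("-") ? 1 : 2);
                    stack.subList(n - 3, n).clear();       // pop expr op digit
                    stack.add("expr");                     // rule 1 or 2
                    reduced = true;
                } else if (top.equals("digit") && n == 1) {
                    stack.set(0, "expr");                  // rule 3
                    reductions.add(3);
                    reduced = true;
                }
            }
        }
        boolean accepted = stack.size() == 1 && stack.get(0).equals("expr");
        return accepted ? reductions : null;
    }
}
```

For 3 + 8 - 2, it applies the rules 4, 3, 4, 2, 4, 1. Read backwards, these are rules 1, 4, 2, 4, 3, 4 of the rightmost derivation on Page 42, so the bottom-up parse traces a rightmost derivation in reverse.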

Page 50: Chapter 4 Syntactic Analysis

1. S → a A B e
2. A → A b c | b
3. B → d

Example input: abbcde

abbcde reduces to aAbcde (the first b is reduced to A by A → b)

[Figure: an A node above the first b of a b b c d e.]

Page 51: Chapter 4 Syntactic Analysis

(Grammar as on Page 50.)

Example input: abbcde

aAbcde reduces to aAde (A b c is reduced to A by A → A b c)

[Figure: a new A node above A b c.]

Page 52: Chapter 4 Syntactic Analysis

(Grammar as on Page 50.)

Example input: abbcde

aAde reduces to aABe (d is reduced to B by B → d)

[Figure: a B node above d.]

Page 53: Chapter 4 Syntactic Analysis

(Grammar as on Page 50.)

Example input: abbcde

aABe reduces to S (by S → a A B e)

[Figure: the completed parse tree with root S.]

Page 54: Chapter 4 Syntactic Analysis

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat .

the cat sees a rat . reduces to the Noun sees a rat . (cat is reduced to Noun)

Page 55: Chapter 4 Syntactic Analysis

(Grammar as on Page 54.)

Example input: the cat sees a rat .

the Noun sees a rat . reduces to Subject sees a rat . (the Noun is reduced to Subject)

Page 56: Chapter 4 Syntactic Analysis

(Grammar as on Page 54.)

Example input: the cat sees a rat .

Subject sees a rat . reduces to Subject Verb a rat . (sees is reduced to Verb)

Page 57: Chapter 4 Syntactic Analysis

(Grammar as on Page 54.)

Example input: the cat sees a rat .

Subject Verb a rat . reduces to Subject Verb a Noun . (rat is reduced to Noun)

Page 58: Chapter 4 Syntactic Analysis

(Grammar as on Page 54.)

Example input: the cat sees a rat .

Subject Verb a Noun . reduces to Subject Verb Object . (a Noun is reduced to Object)

What would happen if we chose 'Subject → a Noun' instead of 'Object → a Noun'?

Page 59: Chapter 4 Syntactic Analysis

(Grammar as on Page 54.)

Example input: the cat sees a rat .

Subject Verb Object . reduces to Sentence (by Sentence → Subject Verb Object .)

[Figure: the completed parse tree with root Sentence.]

Page 60: Chapter 4 Syntactic Analysis

The parser examines the terminal symbols of the input string, in order from left to right.

The parser reconstructs its syntax tree from the top (root node) down (towards the terminal nodes).

An attempt to find the leftmost derivation for an input string

Page 61: Chapter 4 Syntactic Analysis

General rules for top-down parsers
o Start with just a stub for the root node
o At each step, the parser takes the leftmost stub
o If the stub is labeled by terminal symbol t, the parser connects it to the next input terminal symbol, which must be t. (If not, the parser has detected a syntactic error.)
o If the stub is labeled by nonterminal symbol N, the parser chooses one of the production rules N ::= X1…Xn, and grows branches from the node labeled by N to new stubs labeled X1, …, Xn (in order from left to right).
o Parsing succeeds when and if the whole input string is connected up to the syntax tree.

Page 62: Chapter 4 Syntactic Analysis

Two forms
o Backtracking parsers
• Guess which rule to apply, back up, and change choices if they cannot proceed
o Predictive parsers
• Predict which rule to apply by using look-ahead tokens

Backtracking parsers are not very efficient. We will cover predictive parsers.

Page 63: Chapter 4 Syntactic Analysis

Many types
o LL(1) parsing
• The first L is for scanning the input from left to right, the second L is for producing a leftmost derivation, and the 1 is for using one input symbol of look-ahead
• Table driven, with an explicit stack to maintain the parse tree
o Recursive-descent parsing
• Uses recursive subroutines to traverse the parse tree

Page 64: Chapter 4 Syntactic Analysis

Lookahead in predictive parsing
o The lookahead token (next token in the input) is used to determine which rule should be used next
o For example:
1. term → num term'
2. term' → '+' num term' | '-' num term' | ε
3. num → '0'|'1'|'2'|…|'9'

Example input: 7 + 3 - 2

[Figure: top-down parse of 7 + 3 - 2. term is expanded to num term'; num matches 7; with lookahead +, term' is expanded to + num term'.]

Page 65: Chapter 4 Syntactic Analysis

(Grammar as on Page 64.)

Example input: 7 + 3 - 2

[Figure, continued: num matches 3; with lookahead -, the next term' is expanded to - num term'.]

Page 66: Chapter 4 Syntactic Analysis

(Grammar as on Page 64.)

Example input: 7 + 3 - 2

[Figure, continued: the last num matches 2.]
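
Here is a recursive-descent (predictive) version of this example in Java. Each method chooses its production from a single lookahead character, and term' takes the ε alternative whenever the lookahead is neither '+' nor '-'. TermParser, its trace list and the use of '$' for end of input are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical predictive parser for
//   term  -> num term'
//   term' -> '+' num term' | '-' num term' | epsilon
//   num   -> '0' | '1' | ... | '9'
class TermParser {
    private final String input;                    // input with blanks removed
    private int pos = 0;
    final List<String> trace = new ArrayList<>();  // productions chosen, in order

    TermParser(String text) { this.input = text.replace(" ", ""); }

    // One character of lookahead; '$' stands for end of input.
    private char lookahead() { return pos < input.length() ? input.charAt(pos) : '$'; }

    private void match(char expected) {
        if (lookahead() != expected) throw new IllegalStateException("expected " + expected + " at " + pos);
        pos++;
    }

    // Parse a whole term and require end of input afterwards.
    void parse() {
        parseTerm();
        if (lookahead() != '$') throw new IllegalStateException("unexpected " + lookahead() + " at " + pos);
    }

    private void parseTerm() {
        trace.add("term -> num term'");
        parseNum();
        parseTermPrime();
    }

    private void parseTermPrime() {
        switch (lookahead()) {
            case '+':
            case '-': {
                char op = lookahead();
                trace.add("term' -> '" + op + "' num term'");
                match(op);
                parseNum();
                parseTermPrime();
                break;
            }
            default:
                trace.add("term' -> \u03b5");        // epsilon: consume nothing
        }
    }

    private void parseNum() {
        char c = lookahead();
        if (c < '0' || c > '9') throw new IllegalStateException("expected digit at " + pos);
        trace.add("num -> '" + c + "'");
        pos++;
    }
}
```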

Page 67: Chapter 4 Syntactic Analysis

Top-down parsing algorithm
o Consists of a group of methods (programs) parseN, one for each nonterminal symbol N of the grammar
o The task of each method parseN is to parse a single N-phrase
o These parsing methods cooperate to parse complete sentences

Page 68: Chapter 4 Syntactic Analysis

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Example input: the cat sees a rat .

[Figure: the root Sentence with four stubs, Subject, Verb, Object and '.', above the input the cat sees a rat .]

a. Decide which production rule to apply. Only one applies, #1. This step created four stubs.

Page 69: Chapter 4 Syntactic Analysis

(Grammar as on Page 68.)

Example input: the cat sees a rat .

[Figure: top-down parse tree in progress; the Subject stub has been expanded and has a Noun child.]

Page 70: Chapter 4 Syntactic Analysis

(Grammar as on Page 68.)

Example input: the cat sees a rat .

[Figure: top-down parse tree in progress; the Subject stub has been expanded and has a Noun child.]

Page 71: Chapter 4 Syntactic Analysis

(Grammar as on Page 68.)

Example input: the cat sees a rat .

[Figure: top-down parse tree in progress; the Subject stub has been expanded and has a Noun child.]

Page 72: Chapter 4 Syntactic Analysis

(Grammar as on Page 68.)

Example input: the cat sees a rat .

[Figure: top-down parse tree in progress; both the Subject and Object stubs have been expanded, and each has a Noun child.]

Page 73: Chapter 4 Syntactic Analysis

(Grammar as on Page 68.)

Example input: the cat sees a rat .

[Figure: top-down parse tree in progress; both the Subject and Object stubs have been expanded, and each has a Noun child.]

Page 74: Chapter 4 Syntactic Analysis

(Grammar as on Page 68.)

Example input: the cat sees a rat .

[Figure: top-down parse tree in progress; both the Subject and Object stubs have been expanded, and each has a Noun child.]

Page 75: Chapter 4 Syntactic Analysis

parseSentence
parseSubject
parseObject
parseVerb
parseNoun

1. Sentence → Subject Verb Object .
2. Subject → I | a Noun | the Noun
3. Object → me | a Noun | the Noun
4. Noun → cat | mat | rat
5. Verb → like | is | see | sees

Page 76: Chapter 4 Syntactic Analysis

parseSentence
  parseSubject
  parseVerb
  parseObject
  parseEnd

Sentence → Subject Verb Object .

[Figure: the Sentence node with children Subject, Verb, Object and '.'.]

Page 77: Chapter 4 Syntactic Analysis

parseSubject
  if input = "I"
    accept
  else if input = "a"
    accept
    parseNoun
  else if input = "the"
    accept
    parseNoun
  else error

Subject → I | a Noun | the Noun

Page 78: Chapter 4 Syntactic Analysis

parseNoun
  if input = "cat"
    accept
  else if input = "mat"
    accept
  else if input = "rat"
    accept
  else error

Noun → cat | mat | rat

Page 79: Chapter 4 Syntactic Analysis

parseObject
  if input = "me"
    accept
  else if input = "a"
    accept
    parseNoun
  else if input = "the"
    accept
    parseNoun
  else error

Object → me | a Noun | the Noun

Page 80: Chapter 4 Syntactic Analysis

parseVerb
  if input = "like"
    accept
  else if input = "is"
    accept
  else if input = "see"
    accept
  else if input = "sees"
    accept
  else error

Verb → like | is | see | sees

Page 81: Chapter 4 Syntactic Analysis

parseEnd
  if input = "."
    accept
  else error

Sentence → Subject Verb Object . (the final '.')
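
The pseudocode above becomes runnable Java almost line for line. In this sketch, SentenceParser is a hypothetical class. It splits the input into words with "." as its own token and signals a syntax error by throwing an exception.

```java
// Hypothetical runnable version of parseSentence / parseSubject / ... for the grammar above.
class SentenceParser {
    private final String[] tokens;
    private int pos = 0;

    SentenceParser(String text) {
        // Put spaces around "." so it becomes a separate token, then split on blanks.
        this.tokens = text.replace(".", " . ").trim().split("\\s+");
    }

    private String current() { return pos < tokens.length ? tokens[pos] : "<eot>"; }

    private void acceptIt() { pos++; }                 // current token already checked

    private void error(String expected) {
        throw new IllegalArgumentException("syntax error: expected " + expected + " but found " + current());
    }

    // Sentence ::= Subject Verb Object .
    void parseSentence() {
        parseSubject();
        parseVerb();
        parseObject();
        parseEnd();
        if (pos != tokens.length) error("end of input");
    }

    // Subject ::= I | a Noun | the Noun
    private void parseSubject() {
        if (current().equals("I")) acceptIt();
        else if (current().equals("a") || current().equals("the")) { acceptIt(); parseNoun(); }
        else error("I, a or the");
    }

    // Object ::= me | a Noun | the Noun
    private void parseObject() {
        if (current().equals("me")) acceptIt();
        else if (current().equals("a") || current().equals("the")) { acceptIt(); parseNoun(); }
        else error("me, a or the");
    }

    // Noun ::= cat | mat | rat
    private void parseNoun() {
        switch (current()) {
            case "cat": case "mat": case "rat": acceptIt(); break;
            default: error("cat, mat or rat");
        }
    }

    // Verb ::= like | is | see | sees
    private void parseVerb() {
        switch (current()) {
            case "like": case "is": case "see": case "sees": acceptIt(); break;
            default: error("like, is, see or sees");
        }
    }

    private void parseEnd() {
        if (current().equals(".")) acceptIt(); else error(".");
    }

    // Convenience wrapper: true if the text is a sentence of the grammar.
    static boolean isSentence(String text) {
        try { new SentenceParser(text).parseSentence(); return true; }
        catch (IllegalArgumentException e) { return false; }
    }
}
```

For example, isSentence("the cat sees a rat.") returns true. The same words without the final period are rejected by parseEnd.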

Page 82: Chapter 4 Syntactic Analysis

Given a (suitable) context-free grammar
o Express the grammar in EBNF, with a single production rule for each nonterminal symbol, and perform any necessary grammar transformations
• Always eliminate left recursion
• Always left-factorize whenever possible
o Transcribe each EBNF production rule N ::= X to a parsing method parseN, whose body is determined by X
o Make the parser consist of:
• A private variable currentToken
• Private parsing methods developed in the previous step
• Private auxiliary methods accept and acceptIt, both of which call the scanner
• A public parse method that calls parseS (where S is the start symbol of the grammar), having first called the scanner to store the first input token in currentToken

Page 83: Chapter 4 Syntactic Analysis

"C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows away your whole leg."
o Bjarne Stroustrup

Page 84: Chapter 4 Syntactic Analysis

Did you really say that? Dr. Bjarne Stroustrup: Yes, I did say something along the lines of C makes it easy to shoot yourself in the foot; C++ makes it harder, but when you do, it blows your whole leg off. What people tend to miss is that what I said about C++ is to a varying extent true for all powerful languages. As you protect people from simple dangers, they get themselves into new and less obvious problems. Someone who avoids the simple problems may simply be heading for a not-so-simple one. One problem with very supporting and protective environments is that the hard problems may be discovered too late or be too hard to remedy once discovered. Also, a rare problem is harder to find than a frequent one because you don't suspect it.  I also said, "Within C++, there is a much smaller and cleaner language struggling to get out." For example, that quote can be found on page 207 of The Design and Evolution of C++. And no, that smaller and cleaner language is not Java or C#. The quote occurs in a section entitled "Beyond Files and Syntax". I was pointing out that the C++ semantics is much cleaner than its syntax. I was thinking of programming styles, libraries and programming environments that emphasized the cleaner and more effective practices over archaic uses focused on the low-level aspects of C.

Page 85: Chapter 4 Syntactic Analysis

For production rule N ::= X
o Convert the production rule to a parsing method named parseN:
  private void parseN () {
    parse X
  }
o Refine parse ε to a dummy statement
o Refine parse t (where t is a terminal symbol) to accept(t) or acceptIt()
o Refine parse N (where N is a nonterminal symbol) to a call of the corresponding parsing method:
  parseN()
o Refine parse X Y to:
  {
    parse X
    parse Y
  }
o Refine parse X | Y to:
  switch (currentToken.kind) {
    cases in starters[[X]]:
      parse X
      break;
    cases in starters[[Y]]:
      parse Y
      break;
    default:
      report a syntax error
  }
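
Applying these refinements to the Expression grammar from Page 18 gives the following Java sketch. The Kind enum, the one-character scanner and the use of an exception for syntax errors are assumptions. Identifier and Operator are treated as tokens delivered by the scanner, so they get no parsing methods of their own. The starred group (Operator primary-Expression)* becomes a while loop that runs as long as the current token is in starters[[Operator primary-Expression]], which is just the Operator token.

```java
// Hypothetical recursive-descent parser built with the refinement rules for
//   Expression         ::= primary-Expression (Operator primary-Expression)*
//   primary-Expression ::= Identifier | ( Expression )
//   Identifier         ::= a|b|c|d|e
//   Operator           ::= +|-|*|/
class ExpressionParser {
    enum Kind { IDENTIFIER, OPERATOR, LPAREN, RPAREN, EOT, ERROR }

    private final String text;
    private int pos = 0;
    private Kind currentToken;                        // kind of the current token

    ExpressionParser(String text) { this.text = text; }

    // Scanner: returns the kind of the next one-character token, skipping blanks.
    private Kind scan() {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) return Kind.EOT;
        char c = text.charAt(pos++);
        if (c >= 'a' && c <= 'e') return Kind.IDENTIFIER;
        if ("+-*/".indexOf(c) >= 0) return Kind.OPERATOR;
        if (c == '(') return Kind.LPAREN;
        if (c == ')') return Kind.RPAREN;
        return Kind.ERROR;
    }

    private void accept(Kind expected) {              // parse t, checking the token
        if (currentToken != expected) syntaxError("expected " + expected);
        currentToken = scan();
    }

    private void acceptIt() { currentToken = scan(); } // token already known to be right

    private void syntaxError(String msg) {
        throw new IllegalArgumentException(msg + ", found " + currentToken);
    }

    public void parse() {
        currentToken = scan();                        // store the first input token
        parseExpression();
        if (currentToken != Kind.EOT) syntaxError("expected end of text");
    }

    // Expression ::= primary-Expression (Operator primary-Expression)*
    private void parseExpression() {
        parsePrimaryExpression();
        while (currentToken == Kind.OPERATOR) {       // starters of the starred group
            acceptIt();
            parsePrimaryExpression();
        }
    }

    // primary-Expression ::= Identifier | ( Expression )
    private void parsePrimaryExpression() {
        switch (currentToken) {
            case IDENTIFIER:                          // starters[[Identifier]]
                acceptIt();
                break;
            case LPAREN:                              // starters[[( Expression )]]
                acceptIt();
                parseExpression();
                accept(Kind.RPAREN);
                break;
            default:
                syntaxError("expected identifier or (");
        }
    }

    static boolean accepts(String s) {
        try { new ExpressionParser(s).parse(); return true; }
        catch (IllegalArgumentException e) { return false; }
    }
}
```

All six example sentences from Page 18 are accepted, while inputs such as "a +" or "(a" fail at the point where the grammar predicts a different token.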

Page 86: Chapter 4 Syntactic Analysis

Chart 86

For X | Y
o Choose parse X only if the current token is one that can start an X-phrase
o Choose parse Y only if the current token is one that can start a Y-phrase
• starters[[X]] and starters[[Y]] must be disjoint

For X*
o Choose

    while (currentToken.kind is in starters[[X]])
        parse X

• starters[[X]] must be disjoint from the set of tokens that can follow X* in this particular context
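Why the X* condition matters can be seen in a small sketch over two hypothetical grammars whose tokens are single characters: S ::= a* b satisfies the condition (starters[[a]] = {a}, and only b can follow a*), while T ::= a* a violates it (a can also follow a*). The greedy while loop works for S but can never succeed for T, even on input that T generates. The class name StarDemo and the '$' end-of-text marker are assumptions of this sketch.

```java
// Each character of the input is one token; '$' marks end of text (an assumption of this sketch).
class StarDemo {
    private final String input;
    private int pos = 0;

    StarDemo(String input) { this.input = input + "$"; }

    private char current() { return input.charAt(pos); }

    private void accept(char expected) {
        if (current() != expected)
            throw new IllegalStateException("expected '" + expected + "' at position " + pos);
        pos++;
    }

    // S ::= a* b    starters[[a]] = {a} is disjoint from the follower set {b}
    private void parseS() {
        while (current() == 'a') accept('a');
        accept('b');
    }

    // T ::= a* a    starters[[a]] = {a} overlaps the follower set {a}
    private void parseT() {
        while (current() == 'a') accept('a');   // consumes every 'a' greedily ...
        accept('a');                            // ... so no 'a' is left for this accept
    }

    static boolean acceptsS(String s) {
        StarDemo p = new StarDemo(s);
        try { p.parseS(); p.accept('$'); return true; } catch (IllegalStateException e) { return false; }
    }

    static boolean acceptsT(String s) {
        StarDemo p = new StarDemo(s);
        try { p.parseT(); p.accept('$'); return true; } catch (IllegalStateException e) { return false; }
    }

    public static void main(String[] args) {
        System.out.println(acceptsS("aab"));  // true
        System.out.println(acceptsT("aa"));   // false, although T generates "aa"
    }
}
```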

Page 87: Chapter 4 Syntactic Analysis

Chart 87

A grammar that satisfies both these conditions is called an LL(1) grammar

Recursive-descent parsing is suitable only for LL(1) grammars
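Both conditions are plain set-disjointness tests, so they can be checked mechanically once the starter sets are known. The minimal Java sketch below hand-codes starter sets as token names rather than computing them from the grammar; the class and method names are hypothetical. As data it uses the single-Command alternatives of the grammar on Chart 90, leaving out the empty alternative: V-name := Expression and Identifier ( ... ) both start with Identifier, so the alternatives as written fail the X | Y test. The iteration check assumes the usual concrete rule Command ::= single-Command (; single-Command)* and assumes Command is followed only by end or end-of-text.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

class LL1Check {
    // X1 | X2 | ... : every pair of starter sets must be disjoint.
    static boolean alternativesOk(List<Set<String>> starterSets) {
        for (int i = 0; i < starterSets.size(); i++)
            for (int j = i + 1; j < starterSets.size(); j++)
                if (!Collections.disjoint(starterSets.get(i), starterSets.get(j)))
                    return false;
        return true;
    }

    // X* : starters[[X]] must be disjoint from the tokens that can follow X* in this context.
    static boolean iterationOk(Set<String> startersX, Set<String> followersOfXStar) {
        return Collections.disjoint(startersX, followersOfXStar);
    }

    public static void main(String[] args) {
        List<Set<String>> asWritten = List.of(
            Set.of("Identifier"),                 // V-name := Expression
            Set.of("Identifier"),                 // Identifier ( Actual-Parameter-Sequence )
            Set.of("begin"), Set.of("let"), Set.of("if"), Set.of("while"));
        // After merging the two Identifier alternatives into one (left factoring)
        List<Set<String>> leftFactored = List.of(
            Set.of("Identifier"), Set.of("begin"), Set.of("let"), Set.of("if"), Set.of("while"));

        System.out.println(alternativesOk(asWritten));      // false
        System.out.println(alternativesOk(leftFactored));   // true
        System.out.println(iterationOk(Set.of(";"), Set.of("end", "<eot>")));  // true
    }
}
```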

Page 88: Chapter 4 Syntactic Analysis

Chart 88

Good programming languages are designed with a relatively large “distance” between syntactically correct programs, so that a conceptual mistake is more likely to produce a syntactically incorrect program and be caught as a syntax error.

Error repair usually occurs at two levels:
o Local: repairs mistakes with little global import, such as missing semicolons and undeclared variables.
o Scope: repairs the program text so that scopes are correct. Errors of this kind include unbalanced parentheses and begin/end blocks.

Page 89: Chapter 4 Syntactic Analysis

Chart 89

Repair actions can be divided into insertions and deletions. Typically the compiler will use some lookahead and backtracking in attempting to make progress in the parse. There is great variation among compilers, though some languages (PL/C) carry a tradition of good error repair. Goals of error repair are:
o No input should cause the compiler to collapse
o Illegal constructs are flagged
o Frequently occurring errors are repaired gracefully
o Minimal stuttering or cascading of errors

LL-style parsing lends itself well to error repair, since the compiler uses the grammar’s rules to predict what should occur next in the input.
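A minimal Java sketch of the two repair actions, assuming a hypothetical two-rule grammar Commands ::= Cmd (; Cmd)* and Cmd ::= Identifier := Identifier. Insertion: when the parser predicts a terminal that is absent, it reports the error and continues as though the terminal had been present. Deletion (a simple panic mode): when nothing predicted fits, it discards tokens up to the next ';' and resumes there, which keeps one mistake from cascading into many messages. The class name, messages and the choice of ';' as the only synchronizing token are assumptions; real compilers use richer synchronization sets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical grammar:  Commands ::= Cmd (; Cmd)*     Cmd ::= Identifier := Identifier
class RepairingParser {
    private final String[] tokens;                     // blank-separated tokens, then "<eot>"
    private int pos = 0;
    private final List<String> errors = new ArrayList<>();

    RepairingParser(String src) {
        String[] words = src.trim().isEmpty() ? new String[0] : src.trim().split("\\s+");
        tokens = Arrays.copyOf(words, words.length + 1);
        tokens[words.length] = "<eot>";
    }

    private String current() { return tokens[pos]; }

    private static boolean isIdentifier(String t) { return Character.isLetter(t.charAt(0)); }

    // Insertion repair: report the missing terminal and carry on as if it were there.
    private void expect(String t) {
        if (current().equals(t)) pos++;
        else errors.add("inserted '" + t + "' before '" + current() + "'");
    }

    // Deletion repair: skip tokens up to the next ';' or the end of text.
    private void skipToSemicolon() {
        errors.add("syntax error at '" + current() + "', skipping to ';'");
        while (!current().equals(";") && !current().equals("<eot>")) pos++;
    }

    private void parseCmd() {
        if (!isIdentifier(current())) { skipToSemicolon(); return; }
        pos++;                                         // target identifier
        expect(":=");
        if (isIdentifier(current())) pos++;            // source identifier
        else skipToSemicolon();
    }

    List<String> parse() {
        parseCmd();
        while (true) {
            if (current().equals(";")) { pos++; parseCmd(); }
            else if (isIdentifier(current())) {        // only ';' or end of text is predicted here,
                errors.add("inserted ';' before '" + current() + "'");  // so assume ';' was left out
                parseCmd();
            } else break;
        }
        if (!current().equals("<eot>")) errors.add("unexpected '" + current() + "' after the last command");
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(new RepairingParser("x := y z := w").parse());  // [inserted ';' before 'z']
    }
}
```

Every branch of the loop consumes at least one token, so the repair cannot loop forever.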

Page 90: Chapter 4 Syntactic Analysis

Chart 90

single-Command ::=
      | V-name := Expression
      | Identifier ( Actual-Parameter-Sequence )
      | begin Command end
      | let Declaration in single-Command
      | if Expression then single-Command else single-Command
      | while Expression do single-Command

V-name ::= Identifier
      | V-name . Identifier
      | V-name [ Expression ]

Identifier ::= Letter (Letter | Digit)*
Letter ::= a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
      |A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z
Digit ::= 0|1|2|3|4|5|6|7|8|9
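The same refinement rules also work at the character level. A minimal Java sketch of a scanning method for Identifier ::= Letter (Letter | Digit)*; the class and method names are hypothetical. The letter test is written as ASCII ranges rather than Character.isLetter, because the Letter rule lists only the ASCII letters and Character.isLetter would also accept letters from other scripts.

```java
class IdentifierScanner {
    // Letter ::= a|b|...|z|A|B|...|Z
    static boolean isLetter(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }

    // Digit ::= 0|1|...|9
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Identifier ::= Letter (Letter | Digit)*
    // Returns the index just past the identifier starting at 'start', or -1 if none starts there.
    static int scanIdentifier(String src, int start) {
        if (start >= src.length() || !isLetter(src.charAt(start))) return -1;
        int i = start + 1;                                        // the leading Letter
        while (i < src.length() && (isLetter(src.charAt(i)) || isDigit(src.charAt(i))))
            i++;                                                  // (Letter | Digit)*
        return i;
    }

    static boolean isIdentifier(String s) {
        return scanIdentifier(s, 0) == s.length();
    }

    public static void main(String[] args) {
        System.out.println(scanIdentifier("y:=y+1", 0));  // 1
        System.out.println(isIdentifier("x1"));           // true
        System.out.println(isIdentifier("1x"));           // false
    }
}
```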

Page 91: Chapter 4 Syntactic Analysis

Chart 91

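Starter sets for regular expressions (RE)
o starters[[X]] is the set of terminal symbols that can start a string generated by X

Example
• starters[[single-Command]] = {Identifier, begin, let, if, while}
• What about V-name vs. Identifier? Both V-name := Expression and Identifier ( Actual-Parameter-Sequence ) start with an Identifier, so the first token alone cannot choose between them.
• When the current token is an Identifier, accept it and look ahead at the next token: ( selects the call command, while := (or the . or [ of a longer V-name) selects the assignment.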
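A minimal Java sketch of parseSingleCommand built on that lookahead, under simplifying assumptions: the empty alternative is omitted, Actual-Parameter-Sequence is cut down to one Expression, Declaration to var Identifier : Identifier, and Expression to operands joined by operators. The left-recursive V-name rule is rewritten as Identifier (. Identifier | [ Expression ])*, so the Identifier case accepts the Identifier first and then branches on the next token. The class name and the blank-separated token convention are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;

// Recognizer for a simplified single-Command; tokens in the source must be separated by blanks.
class SingleCommandParser {
    private static final Set<String> KEYWORDS =
        Set.of("begin", "end", "let", "in", "if", "then", "else", "while", "do", "var");
    private static final Set<String> OPERATORS = Set.of("+", "-", "*", "/", "<", ">", "=");

    private final String[] spellings;
    private int pos = 0;

    SingleCommandParser(String src) {
        String[] words = src.trim().isEmpty() ? new String[0] : src.trim().split("\\s+");
        spellings = Arrays.copyOf(words, words.length + 1);
        spellings[words.length] = "<eot>";
    }

    // Keywords, punctuation and operators are their own kind; other tokens are Identifier or IntegerLiteral.
    private String kind() {
        String s = spellings[pos];
        if (KEYWORDS.contains(s) || s.equals("<eot>")) return s;
        if (Character.isLetter(s.charAt(0))) return "Identifier";
        if (Character.isDigit(s.charAt(0))) return "IntegerLiteral";
        return s;
    }

    private void accept(String expectedKind) {
        if (!kind().equals(expectedKind))
            throw new IllegalStateException("expected " + expectedKind + ", found '" + spellings[pos] + "'");
        pos++;
    }

    private void acceptIt() { pos++; }

    boolean parseProgram() {
        try { parseCommand(); accept("<eot>"); return true; }
        catch (IllegalStateException e) { return false; }
    }

    // Command ::= single-Command (; single-Command)*
    private void parseCommand() {
        parseSingleCommand();
        while (kind().equals(";")) { acceptIt(); parseSingleCommand(); }
    }

    // starters[[single-Command]] = { Identifier, begin, let, if, while }
    private void parseSingleCommand() {
        switch (kind()) {
            case "Identifier":
                acceptIt();
                if (kind().equals("(")) {                    // lookahead '(' : call command
                    acceptIt(); parseExpression(); accept(")");
                } else {                                     // rest of a V-name, then ':='
                    while (kind().equals(".") || kind().equals("[")) {
                        if (kind().equals(".")) { acceptIt(); accept("Identifier"); }
                        else { acceptIt(); parseExpression(); accept("]"); }
                    }
                    accept(":="); parseExpression();
                }
                break;
            case "begin": acceptIt(); parseCommand(); accept("end"); break;
            case "let": acceptIt(); parseDeclaration(); accept("in"); parseSingleCommand(); break;
            case "if":
                acceptIt(); parseExpression(); accept("then"); parseSingleCommand();
                accept("else"); parseSingleCommand();
                break;
            case "while": acceptIt(); parseExpression(); accept("do"); parseSingleCommand(); break;
            default:
                throw new IllegalStateException("'" + spellings[pos] + "' cannot start a single-Command");
        }
    }

    // Simplified: Declaration ::= var Identifier : Identifier
    private void parseDeclaration() { accept("var"); accept("Identifier"); accept(":"); accept("Identifier"); }

    // Simplified: Expression ::= Primary (Operator Primary)*,  Primary ::= Identifier | IntegerLiteral
    private void parseExpression() {
        parsePrimary();
        while (OPERATORS.contains(kind())) { acceptIt(); parsePrimary(); }
    }

    private void parsePrimary() {
        String k = kind();
        if (!k.equals("Identifier") && !k.equals("IntegerLiteral"))
            throw new IllegalStateException("expected an operand, found '" + spellings[pos] + "'");
        acceptIt();
    }

    public static void main(String[] args) {
        System.out.println(new SingleCommandParser("let var y : Integer in y := y + 1").parseProgram());  // true
        System.out.println(new SingleCommandParser("y + 1").parseProgram());                             // false
    }
}
```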

Page 92: Chapter 4 Syntactic Analysis

Chart 92

Production                                        Label
Program ::= Command                               Program                (1.14)
Command ::= V-name := Expression                  AssignCommand          (1.15a)
    | Identifier ( Expression )                   CallCommand            (1.15b)
    | Command ; Command                           SequentialCommand      (1.15c)
    | if Expression then Command else Command     IfCommand              (1.15d)
    | while Expression do Command                 WhileCommand           (1.15e)
    | let Declaration in Command                  LetCommand             (1.15f)
Expression ::= Integer-Literal                    IntegerExpression      (1.16a)
    | V-name                                      VnameExpression        (1.16b)
    | Operator Expression                         UnaryExpression        (1.16c)
    | Expression Operator Expression              BinaryExpression       (1.16d)
V-name ::= Identifier                             SimpleVname            (1.17)
Declaration ::= const Identifier ~ Expression     ConstDeclaration       (1.18a)
    | var Identifier : Type-denoter               VarDeclaration         (1.18b)
    | Declaration ; Declaration                   SequentialDeclaration  (1.18c)
Type-denoter ::= Identifier                       SimpleTypeDenoter      (1.19)

Page 93: Chapter 4 Syntactic Analysis

Chart 93

An explicit representation of the source program’s phrase structure

AST for Mini-Triangle
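One common way to represent these ASTs in Java is one class per label. The sketch below is an assumption, not the Triangle compiler's own class design: it uses Java 16+ records, one marker interface per phrase class (Command, Expression, Vname, Declaration, TypeDenoter), and plain String spellings where the slides show Identifier, Operator and Integer-Literal terminal nodes.

```java
// Mini-Triangle abstract syntax as Java records (Java 16+): one record per label.
interface Command {}
interface Expression {}
interface Vname {}
interface Declaration {}
interface TypeDenoter {}

record Program(Command c) {}                                                           // (1.14)

record AssignCommand(Vname v, Expression e) implements Command {}                      // (1.15a)
record CallCommand(String identifier, Expression e) implements Command {}              // (1.15b)
record SequentialCommand(Command c1, Command c2) implements Command {}                 // (1.15c)
record IfCommand(Expression e, Command c1, Command c2) implements Command {}           // (1.15d)
record WhileCommand(Expression e, Command c) implements Command {}                     // (1.15e)
record LetCommand(Declaration d, Command c) implements Command {}                      // (1.15f)

record IntegerExpression(String spelling) implements Expression {}                     // (1.16a)
record VnameExpression(Vname v) implements Expression {}                               // (1.16b)
record UnaryExpression(String operator, Expression e) implements Expression {}         // (1.16c)
record BinaryExpression(Expression e1, String operator, Expression e2)
        implements Expression {}                                                       // (1.16d)

record SimpleVname(String identifier) implements Vname {}                              // (1.17)

record ConstDeclaration(String identifier, Expression e) implements Declaration {}     // (1.18a)
record VarDeclaration(String identifier, TypeDenoter t) implements Declaration {}      // (1.18b)
record SequentialDeclaration(Declaration d1, Declaration d2) implements Declaration {} // (1.18c)

record SimpleTypeDenoter(String identifier) implements TypeDenoter {}                  // (1.19)

class AstDemo {
    public static void main(String[] args) {
        // AST for:  let var y : Integer in y := y + 1
        Program p = new Program(
            new LetCommand(
                new VarDeclaration("y", new SimpleTypeDenoter("Integer")),
                new AssignCommand(
                    new SimpleVname("y"),
                    new BinaryExpression(
                        new VnameExpression(new SimpleVname("y")), "+", new IntegerExpression("1")))));
        System.out.println(p);
    }
}
```

Records give structural equals and a readable toString for free, which is convenient when testing a parser that builds these trees.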

Page 94: Chapter 4 Syntactic Analysis

Chart 94

Program ASTs (P):
• Program, with one subtree C (Program ::= Command, 1.14)

Command ASTs (C):
• AssignCommand, with subtrees V and E (Command ::= V-name := Expression, 1.15a)
• CallCommand, with subtrees Identifier (holding its spelling) and E (Command ::= Identifier ( Expression ), 1.15b)
• SequentialCommand, with subtrees C1 and C2 (Command ::= Command ; Command, 1.15c)
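A parsing method can return the AST of the phrase it has just recognized. The following Java sketch covers only the three command forms on this chart, using cut-down hypothetical record types in which V-names and expressions are reduced to single identifier or literal spellings. parseCommand folds the single-Commands into SequentialCommand nodes, so C1 ; C2 ; C3 becomes SequentialCommand(SequentialCommand(C1, C2), C3); nesting to the left is this sketch's choice, and nesting to the right would describe the same execution order.

```java
import java.util.Arrays;

// Cut-down AST types for this sketch only (Java 16+ records).
interface Command {}
record AssignCommand(String vname, String expression) implements Command {}      // V := E
record CallCommand(String identifier, String argument) implements Command {}     // I ( E )
record SequentialCommand(Command c1, Command c2) implements Command {}           // C1 ; C2

class CommandAstParser {
    private final String[] tokens;                       // blank-separated tokens, then "<eot>"
    private int pos = 0;

    CommandAstParser(String src) {
        String[] words = src.trim().isEmpty() ? new String[0] : src.trim().split("\\s+");
        tokens = Arrays.copyOf(words, words.length + 1);
        tokens[words.length] = "<eot>";
    }

    private String current() { return tokens[pos]; }

    private void accept(String expected) {
        if (!current().equals(expected))
            throw new IllegalStateException("expected '" + expected + "', found '" + current() + "'");
        pos++;
    }

    private String acceptIdentifier() {
        String t = current();
        if (!Character.isLetter(t.charAt(0))) throw new IllegalStateException("expected an identifier, found '" + t + "'");
        pos++;
        return t;
    }

    private String acceptOperand() {                     // identifier or integer literal
        String t = current();
        if (!Character.isLetterOrDigit(t.charAt(0))) throw new IllegalStateException("expected an operand, found '" + t + "'");
        pos++;
        return t;
    }

    // Program ::= Command <eot>
    Command parseProgram() {
        Command c = parseCommand();
        accept("<eot>");
        return c;
    }

    // Command ::= single-Command (; single-Command)*   builds left-nested SequentialCommand ASTs
    private Command parseCommand() {
        Command c = parseSingleCommand();
        while (current().equals(";")) {
            pos++;
            c = new SequentialCommand(c, parseSingleCommand());
        }
        return c;
    }

    private Command parseSingleCommand() {
        String id = acceptIdentifier();
        if (current().equals("(")) {                     // Identifier ( Expression )  ->  CallCommand
            pos++;
            String arg = acceptOperand();
            accept(")");
            return new CallCommand(id, arg);
        }
        accept(":=");                                    // V-name := Expression       ->  AssignCommand
        return new AssignCommand(id, acceptOperand());
    }

    public static void main(String[] args) {
        System.out.println(new CommandAstParser("x := 1 ; putint ( x ) ; y := x").parseProgram());
    }
}
```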

Page 95: Chapter 4 Syntactic Analysis

Chart 95

Command ASTs (C):
• IfCommand, with subtrees E, C1 and C2 (Command ::= if Expression then Command else Command, 1.15d)
• WhileCommand, with subtrees E and C (Command ::= while Expression do Command, 1.15e)
• LetCommand, with subtrees D and C (Command ::= let Declaration in Command, 1.15f)
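The tree shapes drawn on these charts can also be printed. The Java sketch below uses a hypothetical generic Node record (a label plus an ordered list of children) instead of one class per label, which is enough to show that an IfCommand has three subtrees while WhileCommand and LetCommand have two each. Subtrees below the command level are abbreviated to single labelled leaves.

```java
import java.util.List;

// A generic AST node: a label such as "WhileCommand" and its subtrees, in the order drawn on the charts.
record Node(String label, List<Node> children) {
    static Node leaf(String label) { return new Node(label, List.of()); }
    static Node of(String label, Node... children) { return new Node(label, List.of(children)); }

    // Render the tree with two spaces of indentation per level, one node per line.
    String render() {
        StringBuilder sb = new StringBuilder();
        render(sb, 0);
        return sb.toString();
    }

    private void render(StringBuilder sb, int depth) {
        sb.append("  ".repeat(depth)).append(label).append('\n');
        for (Node child : children) child.render(sb, depth + 1);
    }
}

class AstShapes {
    public static void main(String[] args) {
        // AST for:  let var x : Integer in while b do if b then x := 1 else x := 0
        Node tree =
            Node.of("LetCommand",
                Node.leaf("VarDeclaration x : Integer"),
                Node.of("WhileCommand",
                    Node.leaf("VnameExpression b"),
                    Node.of("IfCommand",
                        Node.leaf("VnameExpression b"),
                        Node.leaf("AssignCommand x := 1"),
                        Node.leaf("AssignCommand x := 0"))));
        System.out.print(tree.render());
        // Prints:
        // LetCommand
        //   VarDeclaration x : Integer
        //   WhileCommand
        //     VnameExpression b
        //     IfCommand
        //       VnameExpression b
        //       AssignCommand x := 1
        //       AssignCommand x := 0
    }
}
```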