Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

73
Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014

Transcript of Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Page 1: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Compiler Principle and Technology

Prof. Dongming LUMar. 7th, 2014

Page 2: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3. Context-Free Grammars and Parsing

(PART TWO)

Page 3: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Contents

PART ONE3.1 The Parsing Process 3.2 Context-Free Grammars3.3 Parse Trees and Abstract

PART TWO3.4 Ambiguity3.5 Extended Notations: EBNF and Syntax

Diagrams 3.6 Formal Properties of Context-Free

Languages

Page 4: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.4 Ambiguity

Page 5: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

What is Ambiguity

Parse trees and syntax trees uniquely express the structure of syntax

It is possible for a grammar to permit a string to have more than one parse tree

For example, the simple integer arithmetic grammar: exp exp op exp|( exp ) | number op + | - | *

The string: 34-3*42

Page 6: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

What is Ambiguity

exp

exp op exp

* number exp op exp

number - number

exp

exp op exp

_ exp op exp number number *

number

This string has two different parse trees.

Page 7: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

What is Ambiguity

exp => exp op exp=> exp op exp op exp ,=> number op exp op exp=>number - exp op exp=> number - number op

exp=> number - number * exp=> number - number *

number

exp=> exp op exp =>number op exp =>number - exp =>number - exp op exp =>number - number op exp =>number - number * exp => number - number *

number

Corresponding to the two leftmost derivations

Page 8: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

What is Ambiguity

The associated syntax trees are

Page 9: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

An Ambiguous Grammar

A grammar that generates a string with two distinct parse trees and at least two distinct derivations. Represents a serious problem for a parser Not specify precisely the syntactic structure of a

program

In some sense, an ambiguous grammar is like a non-deterministic automaton Two separate paths can accept the same string

Page 10: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

An Ambiguous Grammar Ambiguity in grammars

Cannot be removed nearly as easily as non-determinism in finite automata

No algorithm for doing so, unlike the situation in the case of automata

Ambiguous grammars Fail the tests that we introduce later for the

standard parsing algorithms A body of standard techniques have been developed

to deal with typical ambiguities that come up in programming languages.

Page 11: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Two Basic Methods dealing with Ambiguity1. A disambiguating rule: specifies in each ambiguous case which of the parse trees (or syntax trees) is the correct one

The advantage: it corrects the ambiguity without changing (and possibly complicating) the grammar.

The disadvantage: the syntactic structure of the language is no longer given by the grammar alone.

2. Change the grammar into a form: forces the construction of the correct parse tree, thus removing the ambiguity.

Of course, in either method, we must first decide which of the trees in an ambiguous case is the correct one.

Page 12: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Remove the Ambiguity in Simple Expression Grammar

A disambiguating rule that establishes the relative precedence of the three operations.

Give addition and subtraction the same precedence

And, to give multiplication a higher precedence

A further disambiguating rule is the associativity of each of the operations

Specify that all three of these operations are left associative

Page 13: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Remove the Ambiguity in simple Expression Grammar

Specify that an operation is nonassociative : A sequence of more than one operator in an expression is not allowed.

For instance, writing simple expression grammar in the following form: fully parenthesized expressions exp factor op factor | factor factor ( exp ) | number op + |- | *

Page 14: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Remove the Ambiguity in simple Expression Grammar

Strings such as 34-3-42 and even 34-3*42 are now illegal, and must instead be written with parentheses, such as

(34-3) -42 and 34- (3*42)

The disadvantage: not only changed the grammar, also changed the language being recognized.

Page 15: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.4.2 Precedence and Associativity

Page 16: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Two Typical Kinds

of Ambiguities

1. The arithmetic grammars, such as : the simple integer arithmetic grammar:

exp exp op exp|( exp ) | number op + | - | *

The string: 34-3*42

2. The dangling else problem (We will discuss this later)

We use precedence and associativity to solve the first

kind of ambiguity

Page 17: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Group of Equal Precedence

The precedence can be added to our simple expression grammar as follows:

exp exp addop exp | termaddop + | -term term mulop term| factormulop *factor ( exp ) | number

Addition and subtraction will appear "higher" (closer to the root) in the parse and syntax trees Receive lower precedence.

Page 18: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Precedence Cascade Grouping operators into different precedence

levels. Cascade is a standard method in syntactic

specification using BNF.

Replacing the rule exp exp addop exp | term by exp exp addop term |term or exp term addop exp |term A left recursive rule makes operators associate on

the left A right recursive rule makes them associate on

the right

Page 19: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Removal of Ambiguity

Removal of ambiguity in the BNF rules for simple arithmetic expressions write the rules to make all the operations

left associativeexp exp addop term |termaddop + | -term term mulop factor | factormulop *factor ( exp ) | number

Page 20: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

New Parse TreeThe parse tree for the expression 34-3*42 is

exp exp addop term term _ term mulop factor factor factor * number number number

Page 21: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

New Parse Tree The parse tree for the expression 34-3-42 exp exp addop term exp addop term _ factor term _ factor number factor number number

• The precedence cascades cause the parse trees to become much more complex

• The syntax trees, however, are not affected

Page 22: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.4.3 The dangling else problem

Page 23: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

An Ambiguity Grammar

Consider the grammar from:statement if-stmt | otherif-stmt if ( exp ) statement | if ( exp ) statement else statement exp 0 | 1

This grammar is ambiguous as a result of the optional else. Consider the string

if (0) if (1) other else other

Page 24: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

statement

if-stmt

( if exp ) statement

0 if-stmt

if ( exp ) statement

1 other

else statement

other

Page 25: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

statement

unmatched-stmt

( if exp ) statement

0 if-stmt

if ( exp ) statement else statement

1 other other

Page 26: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Dangling else problem

Which tree is correct? The first associates the else-part with the first if-

statement; The second associates it with the second if-

statement. This ambiguity called dangling else problem

This disambiguating rule: the most closely nested rule Implies that the second parse tree above is the

correct one.

Page 27: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

An Example

For example: if (x != 0)

if (y = = 1/x) ok = TRUE; else z = 1/x;

Note that, if we wanted we could associate the else-part with the first if-statement by using brackets {...} in C, as in

if (x != 0) { if (y = = 1/x) ok = TRUE; }

else z = 1/x;

Page 28: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

A Solution to the dangling else ambiguity in the BNF

statement matched-stmt | unmatched-stmtmatched-stmt if ( exp ) matched-stmt else

matched-stmt | other unmatched-stmt if ( exp ) statement | if ( exp ) matched-stmt else unmatched-stmtexp 0 | 1

Permitting only a matched-stmt to come before an else in an if-statement Forcing all else-parts to be matched as soon as

possible.

Page 29: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

statement

unmatched-stmt

( if exp ) statement

0 matched-stmt

if ( exp ) matched-stmt else matched-stmt

1 other other

Page 30: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

More about dangling else The dangling else problem has its origins in the syntax

of Algol60.

It is possible to design the syntax in such a way that the dangling else problem does not appear. Require the presence of the else-part, and this

method has been used in LISP and other functional languages (where a value must also be returned).

Use a bracketing keyword for the if-statement languages that use this solution include Algol68 and Ada.

Page 31: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

More About Dangling else

For example, in Ada, the programmer writes

if x /= 0 thenif y = 1/x then ok := true;else z := 1/x;end if; end if;

Associate the else-part with the second if-statement, the programmer writes

if x /= 0 thenif y = 1/x then ok := true;end ifelse z := 1/x; end if;

Page 32: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

More about dangling else

BNF in Ada (somewhat simplified) is

if-stmt if condition then statement-sequence end if

| if condition then statement-sequence else

statement-sequence end if

Page 33: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.4.4 Inessential ambiguity

Page 34: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Why Inessential

A grammar may be ambiguous and yet always produce unique abstract syntax trees.

The grammar ambiguously as

stmt-sequence stmt-sequence ; stmt-sequence

| stmt

stmt s

An ambiguity is called an inessential ambiguity

Either a right recursive or left recursive grammar rule would still result in the same syntax tree structure

Page 35: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Why Inessential Inessential ambiguity: the associated semantics do

not depend on what disambiguating rule is used.

Arithmetic addition or string concatenation, that represent associative operations (a binary operator • is asso­ciative if (a • b) • c = a • (b • c) for all values a, b, and c).

The syntax trees are still distinct, but represent the same semantic value

A parsing algorithm will need to apply some disambiguating rule that the compiler writer may need to supply

Page 36: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.5 Extended Notations: EBNF and Syntax Diagrams

Page 37: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.5.1 EBNF Notation

Page 38: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Special Notations for Repetitive Constructs

Repetition1. A A | (left recursive)2. A A | (right recursive)

where and are arbitrary strings of terminals and non-terminals, and

In the first rule does not begin with A

In the second does not end with A

Page 39: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Notation for repetition as regular expressions use, the asterisk * .

A * , and

A * EBNF opts to use curly brackets {. . .} to express repetition

A { } , and

A {} The problem with any repetition notation is that it obscures

how the parse tree is to be constructed, but, as we have seen, we often do not care.

Special Notations for Repetitive Constructs

Page 40: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Examples

Example: The case of statement sequences The grammar as follows, in right recursive form:

stmt-Sequence stmt ; stmt-Sequence | stmt

stmt s In EBNF this would appear as

stmt-sequence { stmt ; } stmt (right recursive form)

stmt-sequence stmt { ; stmt} (left recursive form)

Page 41: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Examples

A more significant problem occurs when the associativity matters

exp exp addop term | termexp term { addop term }

(imply left associativity)

exp {term addop } term (imply right associativity)

Page 42: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Optional construct are indicated by surrounding them with square brackets [...].

The grammar rules for if-statements with optional else-parts would be written as follows in EBNF:

statement if-stmt | other

if-stmt if ( exp ) statement [ else statement ]

exp 0 | 1

stmt-sequence stmt; stmt-sequence | stmt is written as

stmt-sequence stmt [ ; stmt-sequence ]

Special Notations for Repetitive Constructs

Page 43: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.5.2 Syntax Diagrams

Page 44: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Syntax Diagrams

Syntax Diagrams: Graphical representations for visually representing

EBNF rules. An example: consider the grammar rule

factor ( exp ) | number The syntax diagram:

factor

number

( )exp

The syntax diagram for a context free grammar is like the DFA for RE

It help us to translate grammars

into programs

Page 45: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Syntax Diagrams

Boxes representing terminals and non-terminals. Arrowed lines representing sequencing and

choices. Non-terminal labels for each diagram representing

the grammar rule defining that Non-terminal. A round or oval box is used to indicate terminals

in a diagram. A square or rectangular box is used to indicate

non-­terminals.

factor

number

( )exp

Page 46: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Syntax Diagrams

A repetition : A {B}

An optional : A [B]

B

A

B

A

Page 47: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Examples

Example: Consider the example of simple arithmetic expressions.

exp exp addop term | termaddop + | -term term mulop factor | factor mulop *factor ( exp ) | number

This BNF includes associativity and precedence

Page 48: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Examples

The corresponding EBNF isexp term { addop term }addop + | -term factor { mulop factor }mulop *factor ( exp ) | numberr ,

The corresponding syntax diagrams are given as follows:

exp

term

term addop

Page 49: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Examplesaddop

+

-

term

factor

factor mulop

*

mulop

factor

( exp )

number

Page 50: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Examples

Example: Consider the grammar of simplified if-statements, the BNF

Statement if-stmt | otherif-stmt if ( exp ) statement | if ( exp ) statement else statement exp 0 | 1

and the EBNF statement if-stmt | other if-stmt if ( exp ) statement [ else

statement ] exp 0 | 1

Page 51: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

The corresponding syntax diagrams are given in following figure.

statement

number

if-stmt

if-stmt

if ( ) statementexp

else statement

exp

0

1

Page 52: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.6 Formal Properties of Context-Free Language

Page 53: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.6.1 A Formal Definition of Context-Free Language

Page 54: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Definition

Definition: A context-free grammar consists of the following:

1. A set T of terminals.2. A set N of non-terminals (disjoint from

T).3. A set P of productions, or grammar

rules, of the form A a, where A is an element of N and a is an element of (TN)* (a possibly empty sequence of terminals and non-terminals).

4. A start symbol S from the set N.

Page 55: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Definition

Let G be a grammar as defined above, G = (T, N, P, S).

A derivation step over G is of the forma A => a , Where a and are elements of (TN)*, and A is in P.

The set of symbols: The union T N of the sets of terminals and

non-terminals A sentential form:

a string a in (TN)*.

Page 56: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Definition

The relation a =>* is defined to be the transitive closure of the derivation step relation =>; ta =>* if and only if there is a

sequence of 0 or more derivation steps (n >= 0) a1=> a 2 =>…=> a n-1=> a n such that a =-a1, and = a n

(If n = 0, then a =)

Page 57: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Definition

A derivation over the grammar G is of the form S =>* w, where w T* (i.e., w is a string of terminals only,

called a sentence), and S is the start symbol of G

The language generated by G, written L(G), is defined as the set L(G) = {w T* | there exists a derivation S =>*

w of G}. L(G) is the set of sentences derivable from S.

Page 58: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Definition A leftmost derivation S =>*lm w

is a derivation in which each derivation step a A => a , is such that a T*; that is, a consists only of

terminals. A rightmost derivation is one in which each

derivation step a A => a has the property that T*.

Page 59: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Parse Tree over Grammar G

A rooted labeled tree with the following properties:1. Each node is labeled with a terminal or a non-

terminal or .2. The root node is labeled with the start symbol S.3. Each leaf node is labeled with a terminal or with .4. Each non-leaf node is labeled with a non-terminal.5. If a node with label A N has n children with labels

X1, X2,..., Xn(which may be terminals or non-terminals), then A X1X2 ... Xn P (a production of the

grammar).

Page 60: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

CFG & Ambiguous

A set of strings L is said to be a context-free language if there is context-free grammar G such

that L = L (G).

A grammar G is ambiguous if there exists a string w L(G) such that

w has two distinct parse trees (or leftmost or rightmost derivations).

Page 61: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.6.2 Grammar Rules as Equations

Page 62: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Meaning of Equation

The grammar rules use the arrow symbol instead of an

equal sign to represent the definition of names for

structures (non-terminals)

Left and right-hands sides still hold equality in some

extents, but the defining process of the language that

results from this view is different.

Consider, for example, the following grammar rule, which is extracted (in simplified form) from our simple expression grammar: exp exp+ exp | number

Page 63: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Rules as Equation A non-terminal name like exp defines a set of

strings of terminals, called E; (which is the language, of the grammar if the

non-terminal is the start symbol). let N be the set of natural numbers;

(corresponding to the regular expression name number).

Then, the given grammar rule can be interpreted as the set equation E = (E + E) N

This is a recursive equation for the set E: E = N (N+N) (N+N+N) (N+N+N+N) ……

Page 64: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

3.6.3 Chomsky Hierarchy and Limits of Context-Free Syntax

Page 65: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

The Power of CFGConsider the definition of a number as a sequence of digits using regular expressions:

digit = 0|1|2|3|4|5|6|7|8|9 number = digit digit*

Writing this definition using BNF, instead, asDigit 0 |1|2|3|4|5|6|7|8|9number number digit |digit

Note: the recursion in the second rule is used to express repetition only.

Page 66: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Regular Grammar A grammar is called a regular grammar

The recursion in the rule is used to express repetition only

Can express everything that regular expressions can Can design a parser accepting characters directly from

the input source file and dispense with the scanner altogether.

A parser is a more powerful machine than a scanner but less efficient. The grammar would then express the complete

syntactic structure, including the lexical structure The language implementers would be expected to

extract these definitions from the grammar and turn them into a scanner

Page 67: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Context Rules Free of context rule:

Non-terminals appear by themselves to the left of the arrow in context-free rules.

A rule says that A may be replaced regardless of where the A occurs.

Context-sensitive grammar rule: A rule would apply only if occurs before and occurs

after the non-terminal. We would write this as

A => a , (a )

Context-sensitive grammars are more powerful than context-free grammars Also much more difficult to use as the basis for a

parser.

Page 68: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Requirement of a Context-Sensitive Grammar Rule

The C rule requires declaration before use First: Include the name strings themselves in the

grammar rules rather than include all names as identifier tokens that are indistinguishable.

Second: For each name, we would have to write a rule establishing its declaration prior to a potential use.

Generally, the length of an identifier is unrestricted The number of possible identifiers is (at least potentially)

infinite.

Even if names are allowed to be only two characters long, The potential for hundreds of new grammar rules. Clearly,

this is an impossible situation

Page 69: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Solution like a Disambiguating Rule State a rule (declaration before use) not explicit in the

grammar Such a rule cannot be enforced by the parser itself,

since it is beyond the power of (reasonable) context-free rules to express.

This rule becomes part of semantic analysis Depends on the use of the symbol table (which

records which identifiers have been declared)

The static semantics of the language include type checking (in a statically typed language) and such rules as declaration before use.

Regard as syntax only those rules that can be expressed by BNF rules. Everything else we regard as semantics.

Page 70: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Unrestricted Grammars

More general than the context-sensitive grammars. It have grammar rules of the form

, where there are no restrictions on the form of

the strings a and (except that a cannot be )

Page 71: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Types of Grammars The language classes they construct are also referred

to as the Chomsky hierarchy, after Noam Chomsky, who pioneered their use to describe natural languages. type 0 : unrestricted grammar, equivalent to Turing

machinestype 1 : context sensitive grammartype 2 : context free grammar, equivalent to

pushdown automatontype 3 : regular grammar , equivalent to finite

automata These grammars represent distinct levels of computational power.

Type 0Type 0

Type 1Type 1

Type 2Type 2

Type 3Type 3

Page 72: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

Summaries

Regular expressionRegular expression

Context free grammarContext free grammar

Power is limitedPower is limited

Formal Formal definitiodefinitionn

DerivationDerivationParse tree & Parse tree & Syntax treeSyntax tree

graphical graphical explanationexplanation

leftmost derivationleftmost derivation rightmost derivationrightmost derivation

AmbiguityAmbiguity ambiguity ambiguity eliminationelimination

Syntax analysisSyntax analysis

Lexical analysisLexical analysis

Page 73: Compiler Principle and Technology Prof. Dongming LU Mar. 7th, 2014.

End of Part TwoTHANKS