Syntax Analysis

Post on 07-Jan-2016

Syntax Analysis

The recognition problem: given a grammar G and a string w, is w ∈ L(G)?

The parsing problem: if G is a grammar and w ∈ L(G), how can w be derived in G?

Both of these problems are decidable (for the context-free grammars considered here) - that is, there are algorithms which will give a definite (correct) yes or no answer for any given instance of the problems.
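To make "decidable" concrete, here is a minimal sketch of a recognizer for one particular grammar, the expression grammar G0 used in the following slides (S -> S + S | S * S | (S) | a). The name `in_language` is hypothetical, and the method is a plain memoized search over substrings, not an algorithm taken from the slides. It always terminates because every recursive call works on a strictly shorter substring.

```python
from functools import lru_cache

def in_language(w: str) -> bool:
    """Decide whether w is in L(G0), where G0 is S -> S+S | S*S | (S) | a."""

    @lru_cache(maxsize=None)
    def derives(i: int, j: int) -> bool:
        # Can S derive the substring w[i:j]?
        if j - i == 1:
            return w[i] == "a"                      # S -> a
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")" and derives(i + 1, j - 1):
            return True                             # S -> (S)
        # S -> S + S or S -> S * S, trying every operator position k
        return any(
            w[k] in "+*" and derives(i, k) and derives(k + 1, j)
            for k in range(i + 1, j - 1)
        )

    return len(w) > 0 and derives(0, len(w))

print(in_language("a+(a*a)"))  # True
print(in_language("a+(a"))     # False
```

The sketch expects the string without spaces; it answers yes or no but says nothing about how the string is derived, which is the parsing problem.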

Parsing is important, because understanding the derivation of a structure helps us to understand the meaning of the structure.

Derivation Structure

Consider the expression in the language of G0:

a + (a * a)

In order to process this expression, it helps to consider the (a*a) substring as a more significant sub-unit than a+(a, for example.

We can use the derivation of the string. The productions of G0 are:

1) S -> S + S
2) S -> S * S
3) S -> (S)
4) S -> a

S => S+S => S+(S) => S+(S*S) => S+(S*a) => S+(a*a) => a+(a*a).

        S
      / | \
     S  +  S
     |   / | \
     a  (  S  )
         / | \
        S  *  S
        |     |
        a     a
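The derivation of a+(a*a) can be replayed mechanically. The sketch below is illustrative only: the names `PRODUCTIONS`, `apply_step` and `replay` are hypothetical, and each step is written as (production number, which occurrence of S to rewrite, counting from 0 at the left).

```python
# Productions of G0, numbered as in the slides.
PRODUCTIONS = {1: "S+S", 2: "S*S", 3: "(S)", 4: "a"}

def apply_step(form: str, production: int, occurrence: int) -> str:
    """Rewrite the given occurrence (0 = leftmost) of S in a sentential form."""
    positions = [i for i, ch in enumerate(form) if ch == "S"]
    pos = positions[occurrence]
    return form[:pos] + PRODUCTIONS[production] + form[pos + 1:]

def replay(steps):
    """Return every sentential form produced by a list of (production, occurrence) steps."""
    forms = ["S"]
    for production, occurrence in steps:
        forms.append(apply_step(forms[-1], production, occurrence))
    return forms

# The derivation of a+(a*a): productions 1, 3, 2, 4, 4, 4 in that order.
steps = [(1, 0), (3, 1), (2, 1), (4, 2), (4, 1), (4, 0)]
print(" => ".join(replay(steps)))
# S => S+S => S+(S) => S+(S*S) => S+(S*a) => S+(a*a) => a+(a*a)
```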

Derivation Trees

For any derivation, we can construct a derivation tree.

The root of the tree will be a node representing the start symbol.

Every time we apply a production A -> α, we add a subtree below A: A is the root, and there is a branch for every symbol of α, in the same left-to-right order in which they appear in α.

We read the string represented by the derivation tree by reading the "leaf" nodes in left-to-right order.

Note: "left-to-right" order means the "structural" order - the leftmost path, then the same path, but with the next-to-left branch at the last node where there was a choice, etc. - and not any order which may appear in the sketch.
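Reading the leaves in structural left-to-right order is a depth-first traversal that visits children in order. A minimal sketch, assuming a tree is stored as a pair (symbol, list of children); the names `leaf` and `tree_yield` are hypothetical.

```python
# A node is (symbol, children); a leaf has an empty list of children.
def leaf(symbol):
    return (symbol, [])

def tree_yield(node) -> str:
    """Concatenate the leaf symbols in structural left-to-right order."""
    symbol, children = node
    if not children:
        return symbol
    return "".join(tree_yield(child) for child in children)

# Derivation tree for a+(a*a) under G0.
product = ("S", [("S", [leaf("a")]), leaf("*"), ("S", [leaf("a")])])
tree = ("S", [
    ("S", [leaf("a")]),
    leaf("+"),
    ("S", [leaf("("), product, leaf(")")]),
])
print(tree_yield(tree))  # a+(a*a)
```

The result depends only on the order of the children lists, not on how the tree happens to be drawn.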

S => S+S => S+(S) => S+(S*S) => S+(S*a) => S+(a*a) => a+(a*a).

[Figure: the derivation tree for a+(a*a), built up one step at a time. Each => in the derivation above adds the subtree for the production just applied.]

Equivalent Derivations

Two different derivations can have the same derivation tree.

Example:

S => S+S => S+a => a+a

and

S => S+S => a+S => a+a

both produce the tree

      S
    / | \
   S  +  S
   |     |
   a     a

In CFGs, the order of applying productions is irrelevant, as long as the same production is applied to the same symbol.

Multiple Derivation Trees

Consider the two derivations below:

1. S => S+S => S+S*S => S+S*a => S+a*a => a+a*a

2. S => S*S => S*a => S+S*a => S+a*a => a+a*a

These give essentially different derivation trees for the same final sentence.

1.      S
      / | \
     S  +  S
     |   / | \
     a  S  *  S
        |     |
        a     a

2.        S
        / | \
       S  *  S
     / | \   |
    S  +  S  a
    |     |
    a     a

This causes problems for our attempt to understand a string by considering its derivation.

Ambiguous Grammars

A derivation in which at each step the rightmost non-terminal is replaced is a right derivation (also called a rightmost derivation).

In a right derivation, the order of symbols to be replaced is fixed.

A string has two different right derivations iff it has two different derivation trees.

A CFG is ambiguous if there is at least one string in L(G) having two or more different right derivations (or, equally, two or more different derivation trees).
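For a single string, the number of derivation trees can be counted directly. The sketch below does this for the grammar G0 (S -> S + S | S * S | (S) | a) only; `count_trees` is a hypothetical name, and the method is a memoized count over substrings. A count of two or more for any one string shows that G0 is ambiguous.

```python
from functools import lru_cache

def count_trees(w: str) -> int:
    """Number of distinct derivation trees for w under G0."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Number of trees with root S whose leaves spell w[i:j].
        if j - i == 1:
            return 1 if w[i] == "a" else 0          # S -> a
        total = 0
        if j - i >= 3 and w[i] == "(" and w[j - 1] == ")":
            total += count(i + 1, j - 1)            # S -> (S)
        for k in range(i + 1, j - 1):               # S -> S + S, S -> S * S
            if w[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(w)) if w else 0

print(count_trees("a+(a*a)"))  # 1
print(count_trees("a+a*a"))    # 2
```

Note that this checks one string at a time. It does not decide whether a grammar is ambiguous in general.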

The Problem With Ambiguity

By the previous example, the grammar of algebraic expressions, G0, is ambiguous.

Problem: 2+2*2 = ?

Under derivation 1., we get 2 + (2*2) = 6.

Under derivation 2., we get (2+2)*2 = 8.

Which do we select?

Why is this a problem?

Suppose we are attempting to analyse strings in the language of G0, in order to perform simple arithmetic - the structure of the derivation will tell us which operation to apply when.
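The two readings of 2+2*2 can be checked by evaluating each tree bottom-up. In this sketch the trees are simplified to nested tuples (left, operator, right) with numbers at the leaves; this representation and the name `evaluate` are assumptions made for illustration.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def evaluate(tree):
    """Evaluate a tree given as a number or a tuple (left, op, right)."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    return OPS[op](evaluate(left), evaluate(right))

tree_1 = (2, "+", (2, "*", 2))   # derivation 1: the product is a subtree of the sum
tree_2 = ((2, "+", 2), "*", 2)   # derivation 2: the sum is a subtree of the product
print(evaluate(tree_1), evaluate(tree_2))  # 6 8
```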

Unambiguous Expressions

We are aiming to produce an unambiguous version of G0. Essentially, we want to assign priorities to the operators, and reflect this in the grammar. Also, although it makes no difference to the evaluated expression, we want a+a+a to be (a+a)+a.

We will do this by introducing new symbols - a term, T, will represent a product; a factor, F, will represent things that can be multiplied; and S will represent sums.

An expression can be a sum of an expression and a term, or simply a term. A term can be a product of a term and a factor, or simply a factor. A factor can be an expression (in parentheses), or simply a symbol.

Example: Grammar G1.

S -> S + T | T
T -> T * F | F
F -> (S) | a
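One way to see that G1 fixes both the priority of * over + and the left-to-right grouping is to parse with it. The sketch below is a hand-written recursive-descent parser, which is not something the slides describe. Because the left-recursive productions S -> S + T and T -> T * F cannot be called recursively as written, each is replaced by a loop that builds the same left-leaning structure. All function names are hypothetical, and the result is a simplified nested-tuple tree (parentheses are dropped) rather than a full derivation tree.

```python
def parse(text: str):
    """Parse text according to G1 and return a nested tuple tree."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(symbol):
        nonlocal pos
        if peek() != symbol:
            raise SyntaxError(f"expected {symbol!r} at position {pos}")
        pos += 1

    def parse_s():                      # S -> S + T | T
        tree = parse_t()
        while peek() == "+":
            expect("+")
            tree = (tree, "+", parse_t())
        return tree

    def parse_t():                      # T -> T * F | F
        tree = parse_f()
        while peek() == "*":
            expect("*")
            tree = (tree, "*", parse_f())
        return tree

    def parse_f():                      # F -> (S) | a
        if peek() == "(":
            expect("(")
            tree = parse_s()
            expect(")")
            return tree
        expect("a")
        return "a"

    tree = parse_s()
    if peek() is not None:
        raise SyntaxError(f"unexpected {peek()!r} at position {pos}")
    return tree

print(parse("a+a*a"))   # ('a', '+', ('a', '*', 'a'))
print(parse("a+a+a"))   # (('a', '+', 'a'), '+', 'a')
```

Each input string now gets exactly one tree: the product binds tighter than the sum, and a+a+a groups as (a+a)+a.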

Ambiguity and Decidability

The ambiguity we have seen so far has always been a property of the grammar, and not of the language. However, there exist languages for which every grammar defining them is ambiguous.

Example: {a^i b^j c^k : i = j or j = k}

A language for which every defining grammar is ambiguous is inherently ambiguous.

More importantly, there is no algorithm which will determine whether or not a given grammar is ambiguous.