
COMP 3438 – Part II, Lecture 6: Syntax Analysis III

Dr. Zili Shao
Department of Computing
The Hong Kong Polytechnic Univ.

Overview of the Subject (COMP 3438)

Course Organization (this lecture: Syntax Analysis)

Part I: Unix System Programming (Device Driver Development)
- Overview of Unix Sys. Prog.
- Process
- File System
- Overview of Device Driver Development
- Character Device Driver Development
- Introduction to Block Device Driver

Part II: Compiler Design
- Overview of Compiler Design
- Lexical Analysis
- Syntax Analysis (HW #4)

Outline

Part I: Introduction to Syntax Analysis
1. Input (Tokens) and Output (Parse Tree)
2. How to specify syntax? Context Free Grammar (CFG)
3. How to obtain the parse tree?
   CFG → remove left recursion, left factoring, ambiguity → LL (Leftmost Derivation)
   CFG → remove ambiguity → LR (Reverse Rightmost Derivation)

Part II: Context Free Grammar, Parse Tree and Ambiguity

Part III: Bottom-up Parsing (LR)
- SLR, Canonical LR, LALR

Part III: Top-down Parsing (LL)
- Left Recursion, Left Factoring (Tutorial)
- Recursive-Descent Parsing
- Predictive Parsing (without backtracking): HW4
- Nonrecursive Predictive Parsing
- Software Tool: yacc (Lab)

Part IV: Predictive Parsing & Nonrecursive Predictive Parsing

Predictive Parser

- A special case of the recursive-descent parser
- Does NOT need backtracking
- How to get a grammar that can be parsed by a predictive parser:
  1. Remove left recursion
  2. Left factor the resulting grammar

Predictive parsers

In order to eliminate backtracking, we must know, given input symbol a and nonterminal A, which alternative of

A → α1 | α2 | ... | αn

is the right one, i.e., the one that derives a string beginning with a.

That is, we must be able to detect the proper alternative by looking at only the FIRST symbol it derives.

Predictive parsing relies on information about which FIRST symbols can be generated by the right side of a production.

Predictive parsers

Let α be the right side of a production for nonterminal A, i.e., A → α is a production. We define FIRST(α) to be the set of tokens {a} that appear as the FIRST symbols of the strings generated from α, i.e., FIRST(α) = {a | α =>* aβ}.

Consider again the example grammar G:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id

Then
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }

For two productions A → α and A → β, predictive parsing requires FIRST(α) and FIRST(β) to be disjoint so that the lookahead symbol can be used to decide which production to use.

Implement a predictive parser

For each nonterminal, we have one corresponding procedure.

Each procedure does two things:
1. Select a production to use based on the lookahead symbol.
   - Use the production with right side α if the lookahead symbol is in FIRST(α).
   - A production with ε on the right side is used if the lookahead symbol is not in the FIRST set of any other right-hand side.
2. Apply the production by mimicking its right side.
   - For a nonterminal, call the procedure for that nonterminal.
   - For a token that matches the lookahead symbol, read the next input token.
   - If at some point the token in the production does not match the lookahead symbol, an error is declared.

The parser begins with the procedure for the start symbol. It matches terminal symbols against the input and makes a potentially recursive procedure call whenever it has to expand a nonterminal.
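The procedure-per-nonterminal scheme can be sketched for the example grammar G as follows. This is a minimal illustration, not the course's reference implementation: the names PredictiveParser, ParseError and accepts are hypothetical, and tokens are assumed to arrive as plain strings such as "id", "+" and "(".

```python
class ParseError(Exception):
    """Raised when the lookahead symbol matches no production."""


class PredictiveParser:
    """Predictive (backtrack-free) recursive-descent parser for
         E  -> T E'        E' -> + T E' | eps
         T  -> F T'        T' -> * F T' | eps
         F  -> ( E ) | id
    """

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, terminal):
        # A terminal on the right side must equal the lookahead symbol.
        if self.lookahead != terminal:
            raise ParseError(f"expected {terminal!r}, found {self.lookahead!r}")
        self.pos += 1

    def E(self):
        self.T()
        self.E_prime()

    def E_prime(self):
        if self.lookahead == "+":            # "+" is in FIRST(+TE')
            self.match("+")
            self.T()
            self.E_prime()
        # otherwise choose E' -> eps and consume nothing

    def T(self):
        self.F()
        self.T_prime()

    def T_prime(self):
        if self.lookahead == "*":            # "*" is in FIRST(*FT')
            self.match("*")
            self.F()
            self.T_prime()
        # otherwise choose T' -> eps

    def F(self):
        if self.lookahead == "(":
            self.match("(")
            self.E()
            self.match(")")
        elif self.lookahead == "id":
            self.match("id")
        else:
            raise ParseError(f"unexpected {self.lookahead!r}")

    def parse(self):
        self.E()            # begin with the procedure for the start symbol
        self.match("$")     # the whole input must be consumed


def accepts(tokens):
    """Return True if the token list is a sentence of G."""
    try:
        PredictiveParser(tokens).parse()
        return True
    except ParseError:
        return False
```

For instance, accepts("id + id * id".split()) succeeds, while accepts(["id", "+"]) fails inside F because "$" is in neither FIRST set of F's alternatives.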

Predictive parsers

The above approach works only if the given grammar is free of nondeterminism, i.e., there is no conflict between right sides for any lookahead symbol.

If a conflict occurs, we try to resolve it in an ad hoc way.

If the nondeterminism cannot be eliminated, use a recursive-descent parser with backtracking to systematically try all possibilities.

Non-recursive predictive parsers

If we don't have a recursive language for writing the parser, or the overhead of recursive calls is too much, a non-recursive version (a tabular implementation of predictive parsing) can be used.

The parser maintains an input buffer, a stack and a parsing table.

[Figure: the Predictive Parsing Program reads the input buffer (a + b $), manipulates the stack (X Y Z $), consults the parsing table M, and produces the OUTPUT.]

- Input buffer: contains the input tokens, followed by "$" (denoting the end).
- Stack: a sequence of grammar symbols with $ on the bottom.
- Parsing table M: a two-dimensional array M[A, a], where A is a nonterminal and a is a terminal or the symbol $.

Non-recursive predictive parsers

The parser is controlled by a program that behaves as follows. Given the top stack symbol X and the current input symbol a:
- If X = a = $, the parser stops and announces successful completion of parsing.
- If X = a ≠ $, the parser pops X off the stack and advances the input pointer to the next input symbol.
- If X is a nonterminal, the parser looks up entry M[X, a] of the parsing table.
  - If M[X, a] = {X → UVW}, it replaces X on top of the stack by WVU (U on top).
  - If M[X, a] = error, it calls an error recovery routine.

[Figure: the stack X ... $ becomes U V W ... $ after applying X → UVW.]
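The control program above can be sketched as a short loop. This is an illustrative sketch only: the function name ll1_parse is hypothetical, the table TABLE is written out by hand for the expression grammar G, missing entries stand for "error", and error recovery is reduced to raising SyntaxError.

```python
# Parsing table M[A, a] for G, written by hand; () is an epsilon right side.
TABLE = {
    ("E", "id"): ("T", "E'"),      ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): (), ("E'", "$"): (),
    ("T", "id"): ("F", "T'"),      ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): (), ("T'", ")"): (), ("T'", "$"): (),
    ("F", "id"): ("id",),          ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}


def ll1_parse(tokens, table=TABLE, start="E"):
    """Return the productions applied, in order (a leftmost derivation)."""
    buf = list(tokens) + ["$"]       # input buffer ending with $
    stack = ["$", start]             # $ on the bottom, start symbol on top
    i = 0                            # input pointer
    output = []
    while True:
        X, a = stack[-1], buf[i]
        if X == "$" and a == "$":    # X = a = $: success
            return output
        if X in NONTERMINALS:
            rhs = table.get((X, a))
            if rhs is None:          # empty table cell: error
                raise SyntaxError(f"no entry M[{X}, {a}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push W V U so that U is on top
            output.append((X, rhs))
        elif X == a:                 # X = a != $: pop and advance
            stack.pop()
            i += 1
        else:
            raise SyntaxError(f"expected {X!r}, found {a!r}")
```

Calling ll1_parse("id + id * id".split()) returns the sequence of productions starting with E → TE'; an input such as ["+", "id"] raises SyntaxError because M[E, +] is empty.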

Example

Grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → ( E ) | id

Input: id + id * id

Construct predictive parsing table

Use two functions, FIRST and FOLLOW.

FIRST: let α be any string of grammar symbols. FIRST(α) is the set of terminals that begin the strings derived from α. If α =>* ε, then ε is also in FIRST(α).

FOLLOW: let A be a nonterminal. FOLLOW(A) is the set of terminals {a} that can appear immediately to the right of A in some sentential form, i.e., there exists a derivation of the form S =>* αAaβ for some α and β.
- If A can be the rightmost symbol in some sentential form, then everything in FOLLOW(S), including $, is also in FOLLOW(A).
- If A is the start symbol, then $ is in FOLLOW(A).

How to compute FIRST and FOLLOW?

FIRST(X)

To compute FIRST(X) for all grammar symbols X:

Rules:
1. If t is a terminal, then FIRST(t) is {t}.
2. If X → ε, then add ε to FIRST(X).
3. For each X → A1 … An α such that ε ∈ FIRST(Ai) for all i, 1 ≤ i ≤ n (n may be 0), add FIRST(α), except ε, to FIRST(X).
4. For each X → A1 … An such that ε ∈ FIRST(Ai) for all i, 1 ≤ i ≤ n, add ε to FIRST(X).
5. Repeat steps 3 and 4 until no FIRST set grows any further.

Example for FIRST

Given the grammar
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Compute the FIRST sets:
FIRST( ( ) = { ( }
FIRST( ) ) = { ) }
FIRST( id ) = { id }
FIRST( + ) = { + }
FIRST( * ) = { * }
FIRST( E ) = FIRST( T ) = FIRST( F ) = { (, id }
FIRST( E' ) = { +, ε }
FIRST( T' ) = { *, ε }
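The FIRST rules amount to a fixed-point iteration, which can be sketched as below. The representation is an assumption made for illustration: a grammar is a dict mapping each nonterminal to a list of right sides, each right side is a tuple of symbols, the empty tuple stands for ε, and any symbol that is not a key of the dict is treated as a terminal. The names GRAMMAR and first_sets are hypothetical.

```python
EPS = "ε"

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}


def first_sets(grammar):
    """Compute FIRST(X) for every grammar symbol X."""
    first = {nt: set() for nt in grammar}
    # Rule 1: FIRST(t) = {t} for every terminal t.
    for right_sides in grammar.values():
        for rhs in right_sides:
            for sym in rhs:
                if sym not in grammar:
                    first[sym] = {sym}
    # Rules 2-5: grow the sets until nothing changes.
    changed = True
    while changed:
        changed = False
        for head, right_sides in grammar.items():
            for rhs in right_sides:
                before = len(first[head])
                all_nullable = True
                for sym in rhs:
                    # Rule 3: every symbol before this one derives ε.
                    first[head] |= first[sym] - {EPS}
                    if EPS not in first[sym]:
                        all_nullable = False
                        break
                if all_nullable:
                    # Rules 2 and 4: the whole right side can derive ε.
                    first[head].add(EPS)
                if len(first[head]) != before:
                    changed = True
    return first
```

Running first_sets(GRAMMAR) reproduces the sets listed in the example, e.g. FIRST(E') = {+, ε}.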

FOLLOW(A)

To compute FOLLOW(A) for all nonterminals A:

Rules:
1. If S is the start symbol, then $ ∈ FOLLOW(S).
2. If A → αDβ, then everything in FIRST(β) except ε is placed in FOLLOW(D).
3. If A → αD, or A → αDβ where ε ∈ FIRST(β), then everything in FOLLOW(A) is in FOLLOW(D).

Example for FOLLOW(A)

Given the grammar
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id

Compute the FIRST sets:
FIRST( ( ) = { ( }
FIRST( ) ) = { ) }
FIRST( id ) = { id }
FIRST( + ) = { + }
FIRST( * ) = { * }
FIRST( E ) = FIRST( T ) = FIRST( F ) = { (, id }
FIRST( E' ) = { +, ε }
FIRST( T' ) = { *, ε }

Compute the FOLLOW sets:
FOLLOW( E ) = FOLLOW( E' ) = { ), $ }
FOLLOW( T ) = FOLLOW( T' ) = { +, ), $ }
FOLLOW( F ) = { *, +, ), $ }
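The FOLLOW rules can be sketched the same way, as a second fixed-point iteration that runs after FIRST has been computed. The grammar encoding (dict of tuples, empty tuple for ε) and the names first_of_string, first_sets and follow_sets are assumptions made for this illustration.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}


def first_of_string(symbols, first):
    """FIRST of a sequence of grammar symbols (ε if all of them derive ε)."""
    result = set()
    for sym in symbols:
        sym_first = first.get(sym, {sym})   # terminal t: FIRST(t) = {t}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def first_sets(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:
        changed = False
        for head, right_sides in grammar.items():
            for rhs in right_sides:
                new = first_of_string(rhs, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def follow_sets(grammar, start, first):
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)                      # rule 1
    changed = True
    while changed:
        changed = False
        for head, right_sides in grammar.items():
            for rhs in right_sides:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:      # FOLLOW is for nonterminals
                        continue
                    rest = first_of_string(rhs[i + 1:], first)
                    new = rest - {EPS}          # rule 2
                    if EPS in rest:             # rule 3 (β empty or nullable)
                        new |= follow[head]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow
```

With first = first_sets(GRAMMAR), the call follow_sets(GRAMMAR, "E", first) yields the FOLLOW sets shown in the example.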

Construction of Parse Table

For each production A → α of the grammar:
1. For each terminal a ∈ FIRST(α), add A → α to M[A, a];
2. If ε ∈ FIRST(α), then for each terminal b ∈ FOLLOW(A), add A → α to M[A, b];
3. If ε ∈ FIRST(α) and $ ∈ FOLLOW(A), add A → α to M[A, $].

Idea behind:
- If A → α is a production and a ∈ FIRST(α), then, when A is on top of the stack and a is the input symbol, replace A by α on the stack;
- else if α =>* ε, expand A by α when the current input symbol a ∈ FOLLOW(A).
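The three construction steps can be sketched as follows. To keep the sketch short, the FIRST and FOLLOW sets are copied from the examples above rather than computed; the names build_table and first_of_string and the dict-of-tuples grammar encoding are assumptions for illustration. Each table cell holds a list so that multiply-defined entries would remain visible.

```python
EPS, END = "ε", "$"

GRAMMAR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
# Sets taken from the FIRST and FOLLOW examples.
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", END}, "E'": {")", END},
    "T": {"+", ")", END}, "T'": {"+", ")", END},
    "F": {"*", "+", ")", END},
}


def first_of_string(symbols):
    """FIRST of a right side; a terminal t has FIRST(t) = {t}."""
    result = set()
    for sym in symbols:
        sym_first = FIRST.get(sym, {sym})
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def build_table(grammar):
    """Return M as a dict: (nonterminal, terminal or $) -> list of right sides."""
    table = {}
    for head, right_sides in grammar.items():
        for rhs in right_sides:
            f = first_of_string(rhs)
            columns = f - {EPS}              # step 1
            if EPS in f:
                # Steps 2 and 3: $ is stored in FOLLOW like any terminal.
                columns |= FOLLOW[head]
            for a in columns:
                table.setdefault((head, a), []).append(rhs)
    return table
```

For G every cell of build_table(GRAMMAR) holds exactly one production, and cells that are absent from the dict correspond to error entries.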

LL(1) parsing

The predictive recursive-descent method is a special case of so-called LL(k) parsing:
- scan the input string from Left to right,
- apply productions to the Leftmost nonterminal in the sentential form we are manipulating, and
- look ahead only as far as the next k terminals in the input string.

LL(1) parsing is the most common form of LL(k) parsing in practice.

A parse table constructed by the above method that has no multiply-defined entries is an LL(1) parsing table.

If the grammar is left recursive or ambiguous, at least one entry M[A, a] will hold multiple productions.

Is a grammar LL(1)?

Given a grammar G, G is LL(1) if for every rule A → α | β:
1. There exists no terminal a such that a ∈ FIRST(α) and also a ∈ FIRST(β);
2. At most one of α and β can derive the empty string;
3. If α derives the empty string, then β does not derive any string beginning with a terminal in FOLLOW(A).

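Ambiguous Grammars

Some grammars may need more than one symbol of lookahead (k > 1). However, some grammars are not LL regardless of how the grammar is changed:

S → if C then S | if C then S else S | a (other stmts)
C → b

Change to (left factoring):

S → if C then S X | a
X → else S | ε
C → b

- "else" ∈ FIRST(X)
- FIRST(X) − {ε} ⊆ FOLLOW(S), and FOLLOW(S) ⊆ FOLLOW(X) because X ends the right side of S, so "else" ∈ FOLLOW(X)
- X → else … | ε: both alternatives are therefore entered in M[X, else]

Problem sentence: "if b then if b then a else a"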
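The conflict in M[X, else] can be reproduced mechanically by building the table and reporting every cell that holds more than one production. The sketch below is self-contained and illustrative only: the function names ll1_conflicts and first_of, the helper grow, and the dict-of-tuples grammar encoding (empty tuple for ε) are all assumptions, not part of the lecture.

```python
EPS, END = "ε", "$"


def first_of(symbols, first):
    """FIRST of a symbol sequence; a terminal t has FIRST(t) = {t}."""
    out = set()
    for sym in symbols:
        f = first.get(sym, {sym})
        out |= f - {EPS}
        if EPS not in f:
            return out
    return out | {EPS}


def grow(target, new):
    """Add new to target in place; report whether target changed."""
    if new <= target:
        return False
    target |= new
    return True


def ll1_conflicts(grammar, start):
    """Return the multiply-defined cells of the table; empty dict if LL(1)."""
    first = {nt: set() for nt in grammar}
    follow = {nt: set() for nt in grammar}
    follow[start].add(END)

    changed = True
    while changed:                       # FIRST sets, to a fixed point
        changed = False
        for head, right_sides in grammar.items():
            for rhs in right_sides:
                changed |= grow(first[head], first_of(rhs, first))

    changed = True
    while changed:                       # FOLLOW sets, to a fixed point
        changed = False
        for head, right_sides in grammar.items():
            for rhs in right_sides:
                for i, sym in enumerate(rhs):
                    if sym in grammar:
                        rest = first_of(rhs[i + 1:], first)
                        new = rest - {EPS}
                        if EPS in rest:
                            new = new | follow[head]
                        changed |= grow(follow[sym], new)

    table = {}
    for head, right_sides in grammar.items():
        for rhs in right_sides:
            f = first_of(rhs, first)
            columns = (f - {EPS}) | (follow[head] if EPS in f else set())
            for a in columns:
                table.setdefault((head, a), []).append(rhs)
    return {cell: prods for cell, prods in table.items() if len(prods) > 1}


DANGLING_ELSE = {
    "S": [("if", "C", "then", "S", "X"), ("a",)],
    "X": [("else", "S"), ()],
    "C": [("b",)],
}
EXPR = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
```

ll1_conflicts(DANGLING_ELSE, "S") reports the single cell ("X", "else") holding both X → else S and X → ε, whereas ll1_conflicts(EXPR, "E") is empty, consistent with the expression grammar being LL(1).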

Complexity of LL(1) Parser

LL(1) parsers operate in linear time and need linear space relative to the length of the input because:
- Time: each input symbol is processed a constant number of times.
- Space: the stack size is bounded by a linear function of the input length.

However, changing the grammar to make it LL(1) may make the other phases of the compiler more difficult: it becomes harder to determine semantics and generate code.

Summary

- A non-recursive predictive parser maintains an input buffer, a stack and a parsing table.
- The parsing table is constructed using two functions: FIRST and FOLLOW.
  - A set of rules has been introduced to compute FIRST and FOLLOW.
  - Based on FIRST and FOLLOW, how do we construct the parsing table?
- LL(1) parsing
  - What is LL(1)?
  - Is a grammar LL(1)?
  - The complexity of LL(1)
