Compiler design syntax analysis

COMPILER DESIGN SYNTAX ANALYSIS RICHA SHARMA (LOVELY PROFESSIONAL UNIVERSITY) 1 Ms. RICHA SHARMA Assistant Professor [email protected] Lovely Professional University

Transcript of Compiler design syntax analysis

Page 1: Compiler design syntax analysis


COMPILER DESIGN
SYNTAX ANALYSIS

Ms. RICHA SHARMA
Assistant Professor
[email protected]
Lovely Professional University

Page 2: Compiler design syntax analysis


SYNTAX ANALYSIS INTRODUCTION
• THE LEXICAL PHASE IS IMPLEMENTED WITH FINITE AUTOMATA, AND A FINITE AUTOMATON CAN ONLY EXPRESS PROPERTIES THAT REQUIRE COUNTING MODULO A FIXED K.
• REGULAR LANGUAGES
  – THE WEAKEST FORMAL LANGUAGES IN WIDE USE
  – MANY APPLICATIONS
  – CANNOT HANDLE ARBITRARILY DEEP NESTING, SUCH AS NESTED LOOPS OR NESTED IF-ELSE
• TO SUMMARIZE, THE LEXER TAKES A STRING OF CHARACTERS AS INPUT AND PRODUCES A STRING OF TOKENS AS OUTPUT. THAT STRING OF TOKENS IS THE INPUT TO THE PARSER, WHICH PRODUCES A PARSE TREE OF THE PROGRAM.
• SOMETIMES THE PARSE TREE IS ONLY IMPLICIT: A COMPILER MAY NEVER ACTUALLY BUILD THE FULL PARSE TREE.
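The lexer step summarized above (characters in, tokens out) can be sketched in a few lines of Python. This is only an illustration: the token names NUM, ID, OP and SKIP and the function tokenize are assumptions made for this sketch, not part of the slides.

```python
import re

# Hypothetical token classes for a small expression language (assumed for this sketch).
TOKEN_SPEC = [
    ("NUM", r"\d+"),            # integer literals
    ("ID", r"[A-Za-z_]\w*"),    # identifiers
    ("OP", r"[+\-*/()]"),       # operators and parentheses
    ("SKIP", r"\s+"),           # whitespace, discarded
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Turn a string of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Every pattern here is a regular expression, which is why this phase fits finite automata. Checking that parentheses are balanced is left to the parser.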

Page 3: Compiler design syntax analysis


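ROLE OF SYNTAX ANALYSIS/PARSER

[Figure: Source program → Lexical Analyzer → token → Parser → parse tree → Rest of Front End → intermediate representation. The Parser asks the Lexical Analyzer for each token with getNextToken; the diagram also shows a Symbol table.]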

Page 4: Compiler design syntax analysis


CONTEXT FREE GRAMMARS

expression -> expression + term
expression -> expression - term
expression -> term
term -> term * factor
term -> term / factor
term -> factor
factor -> (expression)
factor -> id

G = (Σ, N, P, S)

Σ - IS A FINITE SET OF TERMINALS
N - IS A FINITE SET OF NON-TERMINALS
P - IS A FINITE SET OF PRODUCTION RULES
S - IS THE START SYMBOL (AN ELEMENT OF N)

• A GRAMMAR DERIVES STRINGS BY BEGINNING WITH THE START SYMBOL AND REPEATEDLY REPLACING A NON-TERMINAL BY THE RIGHT-HAND SIDE OF A PRODUCTION FOR THAT NON-TERMINAL.

• THE STRINGS OF TERMINALS THAT CAN BE DERIVED FROM THE START SYMBOL OF A GRAMMAR G FORM THE LANGUAGE L(G) DEFINED BY THE GRAMMAR.
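As an illustration of the four-part definition, the expression grammar from this slide can be written down as a small Python value and checked for consistency. The class name Grammar, its field names and the method problems are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar: terminals, non-terminals, productions, start symbol."""
    terminals: frozenset
    nonterminals: frozenset
    productions: tuple      # each production is (lhs, (symbol, symbol, ...))
    start: str

    def problems(self):
        """Return a list of reasons why this is not a well-formed grammar."""
        issues = []
        if self.terminals & self.nonterminals:
            issues.append("terminals and non-terminals overlap")
        if self.start not in self.nonterminals:
            issues.append("start symbol is not a non-terminal")
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                issues.append(f"left-hand side {lhs!r} is not a non-terminal")
            for symbol in rhs:
                if symbol not in self.terminals and symbol not in self.nonterminals:
                    issues.append(f"unknown symbol {symbol!r} in production for {lhs!r}")
        return issues


EXPRESSION_GRAMMAR = Grammar(
    terminals=frozenset({"+", "-", "*", "/", "(", ")", "id"}),
    nonterminals=frozenset({"expression", "term", "factor"}),
    productions=(
        ("expression", ("expression", "+", "term")),
        ("expression", ("expression", "-", "term")),
        ("expression", ("term",)),
        ("term", ("term", "*", "factor")),
        ("term", ("term", "/", "factor")),
        ("term", ("factor",)),
        ("factor", ("(", "expression", ")")),
        ("factor", ("id",)),
    ),
    start="expression",
)
```

Calling EXPRESSION_GRAMMAR.problems() returns an empty list, since every symbol used in a production is declared.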

Page 5: Compiler design syntax analysis


CONTEXT FREE GRAMMARS

• PROGRAMMING LANGUAGES HAVE RECURSIVE STRUCTURE.
• CONTEXT-FREE GRAMMARS ARE A NATURAL NOTATION FOR THIS RECURSIVE STRUCTURE.
• NOT ALL STRINGS OF TOKENS ARE PROGRAMS, SO THE PARSER MUST DISTINGUISH BETWEEN VALID AND INVALID STRINGS OF TOKENS.
• WE NEED:
  – A LANGUAGE FOR DESCRIBING VALID STRINGS OF TOKENS
  – A METHOD FOR DISTINGUISHING VALID FROM INVALID STRINGS OF TOKENS

Page 6: Compiler design syntax analysis


CONTEXT FREE GRAMMAR EXAMPLES

• ARITHMETIC EXPRESSIONS
E ::= T | E + T | E - T
T ::= F | T * F | T / F
F ::= id | (E)

• STATEMENTS
If Statement ::= if E then Statement else Statement

Steps:
1. Begin with a string containing only the start symbol S
2. Replace any non-terminal X in the string by the right-hand side of some production X -> Y1…Yn
3. Repeat (2) until there are no non-terminals left
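The three steps can be turned into a small search that lists every sentence of the arithmetic-expression grammar up to a given length. This is a sketch under stated assumptions: the names GRAMMAR and sentences are invented here, the leftmost non-terminal is always the one replaced, and over-long strings are pruned. Pruning is only safe because this grammar has no empty right-hand sides, so strings never shrink.

```python
from collections import deque

# The arithmetic-expression grammar from this slide; each alternative is a list of symbols.
GRAMMAR = {
    "E": [["T"], ["E", "+", "T"], ["E", "-", "T"]],
    "T": [["F"], ["T", "*", "F"], ["T", "/", "F"]],
    "F": [["id"], ["(", "E", ")"]],
}


def sentences(grammar, start, max_len):
    """Return all terminal strings of at most max_len symbols derivable from start."""
    seen = {(start,)}
    queue = deque([(start,)])          # step 1: begin with only the start symbol
    result = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the string is all terminals.
        index = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if index is None:              # step 3: no non-terminals left
            result.add(" ".join(form))
            continue
        for rhs in grammar[form[index]]:   # step 2: replace X by some right-hand side
            new_form = form[:index] + tuple(rhs) + form[index + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return result
```

With max_len set to 3 the result contains id, the four binary forms such as id + id, and ( id ).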

Page 7: Compiler design syntax analysis


DERIVATIONS
• A DERIVATION IS A SEQUENCE OF PRODUCTION APPLICATIONS BEGINNING WITH THE START SYMBOL.
• APPLYING PRODUCTIONS ONE AT A TIME, IN SEQUENCE, PRODUCES A DERIVATION:

A -> … -> … -> … -> … -> …

• A DERIVATION CAN BE DRAWN AS A TREE
  – THE START SYMBOL IS THE TREE'S ROOT
  – FOR A PRODUCTION X -> Y1…Yn ADD CHILDREN Y1…Yn TO NODE X
• GRAMMAR

E -> E + E | E * E | (E) | ID

• STRING ID * ID + ID

Page 8: Compiler design syntax analysis


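DERIVATIONS
DERIVATIONS ARE OF TWO TYPES:
• LEFTMOST DERIVATION: THE LEFTMOST NON-TERMINAL IS REPLACED AT EVERY STEP
• RIGHTMOST DERIVATION: THE RIGHTMOST NON-TERMINAL IS REPLACED AT EVERY STEP
• LET'S DISCUSS WITH AN EXAMPLE

GRAMMAR: E -> E + E | E * E | -E | (E) | ID
STRING: (ID+ID)

LEFTMOST DERIVATION: E => (E) => (E+E) => (ID+E) => (ID+ID)
RIGHTMOST DERIVATION: E => (E) => (E+E) => (E+ID) => (ID+ID)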
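The two derivations of (id+id) can be reproduced mechanically. In the sketch below, the function derive and the constant PRODUCTIONS are names assumed for illustration. The caller supplies the productions to apply, and the only difference between the two runs is which non-terminal is picked at each step.

```python
# Grammar from this slide: E -> E + E | E * E | -E | (E) | id
PRODUCTIONS = {
    "E": [["E", "+", "E"], ["E", "*", "E"], ["-", "E"], ["(", "E", ")"], ["id"]],
}


def derive(start, choices, leftmost=True):
    """Apply the given right-hand sides in order; return every sentential form."""
    form = [start]
    steps = ["".join(form)]
    for rhs in choices:
        positions = [i for i, sym in enumerate(form) if sym in PRODUCTIONS]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        index = positions[0] if leftmost else positions[-1]
        if rhs not in PRODUCTIONS[form[index]]:
            raise ValueError(f"{form[index]} -> {' '.join(rhs)} is not a production")
        form[index:index + 1] = rhs    # replace the chosen non-terminal by the right-hand side
        steps.append("".join(form))
    return steps


CHOICES = [["(", "E", ")"], ["E", "+", "E"], ["id"], ["id"]]
LEFTMOST = derive("E", CHOICES, leftmost=True)
RIGHTMOST = derive("E", CHOICES, leftmost=False)
```

LEFTMOST passes through (id+E) and RIGHTMOST through (E+id); both end at (id+id).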

Page 9: Compiler design syntax analysis


DERIVATIONS
• NOW WE PARSE THE STRING BELOW, PRODUCING A DERIVATION FOR IT AND BUILDING THE PARSE TREE AT THE SAME TIME.

• PARSE TREES HAVE TERMINALS AT THE LEAVES AND NON-TERMINALS AT THE INTERIOR NODES; FURTHERMORE, AN IN-ORDER TRAVERSAL OF THE LEAVES GIVES THE ORIGINAL INPUT.

• GRAMMAR E -> E + E | E * E | (E) | ID

• STRING ID * ID + ID

Page 10: Compiler design syntax analysis


LEFTMOST DERIVATION AND PARSE TREE

DERIVATION: E
PARSE TREE (CHILDREN IN BRACKETS): E

Page 11: Compiler design syntax analysis


LEFTMOST DERIVATION AND PARSE TREE

DERIVATION: E => E+E
PARSE TREE (CHILDREN IN BRACKETS): E[ E + E ]

Page 12: Compiler design syntax analysis


LEFTMOST DERIVATION AND PARSE TREE

DERIVATION: E => E+E => E*E+E
PARSE TREE (CHILDREN IN BRACKETS): E[ E[ E * E ] + E ]

Page 13: Compiler design syntax analysis


LEFTMOST DERIVATION AND PARSE TREE

DERIVATION: E => E+E => E*E+E => id*E+E
PARSE TREE (CHILDREN IN BRACKETS): E[ E[ E[id] * E ] + E ]

Page 14: Compiler design syntax analysis


LEFTMOST DERIVATION AND PARSE TREE

DERIVATION: E => E+E => E*E+E => id*E+E => id*id+E
PARSE TREE (CHILDREN IN BRACKETS): E[ E[ E[id] * E[id] ] + E ]

Page 15: Compiler design syntax analysis


LEFTMOST DERIVATION AND PARSE TREE

DERIVATION: E => E+E => E*E+E => id*E+E => id*id+E => id*id+id
PARSE TREE (CHILDREN IN BRACKETS): E[ E[ E[id] * E[id] ] + E[id] ]


Page 16: Compiler design syntax analysis


DERIVATIONS
• A PARSE TREE HAS
  – TERMINALS AT THE LEAVES
  – NON-TERMINALS AT THE INTERIOR NODES
• AN IN-ORDER TRAVERSAL OF THE LEAVES IS THE ORIGINAL INPUT
• THE PARSE TREE SHOWS THE ASSOCIATION OF OPERATIONS; THE INPUT STRING DOES NOT.

NOTE: EVERY PARSE TREE HAS EXACTLY ONE LEFTMOST AND ONE RIGHTMOST DERIVATION, SO THE TWO DERIVATIONS OF A STRING NORMALLY GIVE THE SAME PARSE TREE. IF A STRING HAS MORE THAN ONE PARSE TREE, THE GRAMMAR IS AMBIGUOUS.
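These properties can be checked on the parse tree built for id * id + id. The representation below is an assumption of this sketch: each node is a (label, children) pair, and the helpers are named node, leaves and leftmost_derivation. The sketch reads the leaves from left to right and recovers the leftmost derivation from the finished tree.

```python
def node(label, *children):
    """A tree node: a label plus a (possibly empty) list of child nodes."""
    return (label, list(children))


def leaves(tree):
    """Collect leaf labels from left to right."""
    label, children = tree
    if not children:
        return [label]
    collected = []
    for child in children:
        collected.extend(leaves(child))
    return collected


def leftmost_derivation(tree):
    """List the sentential forms obtained by always expanding the leftmost interior node."""
    form = [tree]
    steps = ["".join(label for label, _ in form)]
    while True:
        index = next((i for i, (_, kids) in enumerate(form) if kids), None)
        if index is None:
            return steps
        form[index:index + 1] = form[index][1]   # replace the node by its children
        steps.append("".join(label for label, _ in form))


# Parse tree for id * id + id with + at the root, as built on the preceding slides.
TREE = node(
    "E",
    node("E", node("E", node("id")), node("*"), node("E", node("id"))),
    node("+"),
    node("E", node("id")),
)
```

The leaves spell out the input, and the recovered derivation matches the one on the preceding slides. The grouping of id * id under the left child is visible only in the tree.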

Page 17: Compiler design syntax analysis


AMBIGUITY
• IF A STRING HAS TWO OR MORE RIGHTMOST DERIVATIONS, OR TWO OR MORE LEFTMOST DERIVATIONS, THEN IT HAS TWO DISTINCT PARSE TREES AND THE GRAMMAR IS AMBIGUOUS.

• AMBIGUITY IS BAD: IT LEAVES THE MEANING OF SOME PROGRAMS ILL-DEFINED.

• IF SOME PROGRAM HAS MULTIPLE PARSE TREES, THE COMPILER IS LEFT TO PICK WHICH OF THE POSSIBLE INTERPRETATIONS TO GENERATE CODE FOR, AND THAT IS NOT A GOOD IDEA.

• TO REMOVE AMBIGUITY WE REWRITE THE RULES SO THAT THEY ENCODE OPERATOR PRECEDENCE AND ASSOCIATIVITY.

Page 18: Compiler design syntax analysis


AMBIGUITY
Eg: The string id + id * id has two parse trees under a grammar such as E -> E + E | E * E | (E) | id, hence the grammar is ambiguous.

One can remove the ambiguity by rewriting the grammar, introducing a new non-terminal in place of the repeated one, but the result can be left- or right-recursive. Left recursion then has to be removed.
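The claim about id + id * id can be checked by counting parse trees. The sketch below assumes the grammar E -> E + E | E * E | (E) | id used on the earlier slides. The function count_parse_trees is hypothetical and hard-codes that grammar instead of handling grammars in general.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of tokens under E -> E + E | E * E | (E) | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E can derive tokens[i:j].
        total = 0
        if j - i == 1 and tokens[i] == "id":                          # E -> id
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":  # E -> (E)
            total += count(i + 1, j - 1)
        for k in range(i + 1, j - 1):                                 # E -> E + E | E * E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))
```

For id + id * id the count is 2: one tree groups id + id first, the other groups id * id first. A count above 1 for any string is enough to show the grammar is ambiguous.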

Page 19: Compiler design syntax analysis


AMBIGUITY
• IF WE HAVE AN AMBIGUOUS GRAMMAR:

E → E * E
E → NUM

• THE REWRITE DEPENDS ON THE ASSOCIATIVITY OF *; WE USE DIFFERENT REWRITE RULES FOR DIFFERENT ASSOCIATIVITY.
• IF * IS LEFT-ASSOCIATIVE, WE MAKE THE GRAMMAR LEFT-RECURSIVE BY HAVING A RECURSIVE REFERENCE ONLY TO THE LEFT OF THE OPERATOR SYMBOL.

UNAMBIGUOUS GRAMMAR:
E → E * E’
E → E’
E’ → NUM

Page 20: Compiler design syntax analysis


LEFT RECURSION
• UNAMBIGUOUS GRAMMAR:

E → E * E’
E → E’
E’ → NUM

• THIS GRAMMAR IS NOW LEFT RECURSIVE. A LEFT-RECURSIVE GRAMMAR IS ANY GRAMMAR THAT HAS A NON-TERMINAL WHERE, IF YOU START WITH THAT NON-TERMINAL AND DO SOME NON-EMPTY SEQUENCE OF REWRITES, YOU GET BACK A STRING THAT BEGINS WITH THE SAME NON-TERMINAL.

• CONSIDER THE LEFT-RECURSIVE GRAMMAR S -> S a | b
• S GENERATES ALL STRINGS STARTING WITH “b” AND FOLLOWED BY ANY NUMBER OF “a”S
• CAN REWRITE USING RIGHT-RECURSION:

S -> b S’
S’ -> a S’ | ε
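The rewrite of S -> S a | b follows a general pattern: A -> A α | β becomes A -> β A’ and A’ -> α A’ | ε. The sketch below handles immediate left recursion only. The function name, the use of an empty list for ε, and the convention of naming the new non-terminal by appending an apostrophe are all assumptions of this sketch.

```python
def eliminate_immediate_left_recursion(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | b1 | ... as A -> b A', A' -> a A' | epsilon.

    Each alternative is a list of symbols; the empty list stands for epsilon.
    A useless alternative A -> A is dropped.
    """
    recursive_tails = [alt[1:] for alt in alternatives
                       if alt and alt[0] == nonterminal and len(alt) > 1]
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive_tails:
        return {nonterminal: [list(alt) for alt in others]}
    fresh = nonterminal + "'"          # new non-terminal, assumed not to clash with existing names
    return {
        nonterminal: [list(beta) + [fresh] for beta in others],
        fresh: [list(alpha) + [fresh] for alpha in recursive_tails] + [[]],
    }


REWRITTEN = eliminate_immediate_left_recursion("S", [["S", "a"], ["b"]])
```

REWRITTEN maps S to b S' and S' to a S' or the empty string, which is the right-recursive grammar shown above. Indirect left recursion (A -> B …, B -> A …) needs an extra substitution step that this sketch does not attempt.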


Page 21: Compiler design syntax analysis


EXAMPLES OF LEFT RECURSION

1. E -> E + T | T
   T -> ID | (E)

2. S -> (L) | X
   L -> L,S | S

3. S -> S0S1S | 01
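A first step with each example is to find which non-terminal is left-recursive. The sketch below encodes the three grammars as Python dictionaries and reports every non-terminal that has an alternative beginning with itself. The names EXAMPLE_1, EXAMPLE_2, EXAMPLE_3 and immediately_left_recursive are assumptions, and only immediate left recursion is detected.

```python
def immediately_left_recursive(grammar):
    """Return the non-terminals A that have a production of the form A -> A ..."""
    return {
        lhs
        for lhs, alternatives in grammar.items()
        if any(bool(alt) and alt[0] == lhs for alt in alternatives)
    }


# The three example grammars, one list of symbols per alternative.
EXAMPLE_1 = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["ID"], ["(", "E", ")"]],
}
EXAMPLE_2 = {
    "S": [["(", "L", ")"], ["X"]],
    "L": [["L", ",", "S"], ["S"]],
}
EXAMPLE_3 = {
    "S": [["S", "0", "S", "1", "S"], ["0", "1"]],
}
```

The left-recursive non-terminals are E in the first grammar, L in the second and S in the third; those are the productions that need rewriting.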


Page 22: Compiler design syntax analysis


LEFT FACTORING

• LEFT FACTORING IS A GRAMMAR TRANSFORMATION THAT PRODUCES A DETERMINISTIC GRAMMAR, SUITABLE FOR PREDICTIVE OR TOP-DOWN PARSING, FROM A NON-DETERMINISTIC ONE.

• CONSIDER THE FOLLOWING GRAMMAR:

STMT -> IF EXPR THEN STMT ELSE STMT
      | IF EXPR THEN STMT

• ON SEEING THE INPUT IF, IT IS NOT CLEAR TO THE PARSER WHICH PRODUCTION TO USE.

• WE CAN EASILY PERFORM LEFT FACTORING: IF WE HAVE A -> αβ1 | αβ2 THEN WE REPLACE IT WITH

A -> αA’
A’ -> β1 | β2
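The replacement rule can be applied mechanically. The sketch below performs a single pass: alternatives that share a first symbol are grouped, their longest common prefix plays the role of α, and the remainders move to a new non-terminal. The names common_prefix and left_factor_once, the apostrophe naming scheme, and the empty list standing for ε are assumptions of this sketch.

```python
def common_prefix(first, second):
    """Longest common prefix of two symbol lists."""
    length = 0
    while length < len(first) and length < len(second) and first[length] == second[length]:
        length += 1
    return first[:length]


def left_factor_once(nonterminal, alternatives):
    """One pass of left factoring: A -> a b1 | a b2 becomes A -> a A', A' -> b1 | b2."""
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None          # group alternatives by their first symbol
        groups.setdefault(key, []).append(alt)
    result = {nonterminal: []}
    fresh_count = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:     # nothing to factor in this group
            result[nonterminal].extend(list(alt) for alt in group)
            continue
        prefix = list(group[0])
        for alt in group[1:]:
            prefix = common_prefix(prefix, alt)
        fresh_count += 1
        fresh = nonterminal + "'" * fresh_count
        result[nonterminal].append(prefix + [fresh])
        result[fresh] = [list(alt[len(prefix):]) for alt in group]   # [] stands for epsilon
    return result


FACTORED = left_factor_once("STMT", [
    ["IF", "EXPR", "THEN", "STMT", "ELSE", "STMT"],
    ["IF", "EXPR", "THEN", "STMT"],
])
```

For the STMT grammar the result is STMT -> IF EXPR THEN STMT STMT' with STMT' -> ELSE STMT | ε, so the parser can defer the choice until it has seen whether ELSE follows. If the remainders still share a prefix, the same pass can be run again on the new non-terminal.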


Page 23: Compiler design syntax analysis


EXAMPLES OF LEFT FACTORING

1. S -> iEtS | iEtSeS | a
   E -> b

2. S -> aSSbS | aSaSb | abb | b

3. S -> bSSaaS | bSSaSb | bSb | a


Page 24: Compiler design syntax analysis


Refer to the next presentation for Top Down Parsing.