Syntax Analysis (Chapter 4)
Date posted: 21-Dec-2015

Course Overview

PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler

PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation

PART III: conclusion
8 Interpretation
9 Review

Systematic Development of Rec. Descent Parser

(1) Express grammar in EBNF
(2) Grammar transformations: left factorization and left recursion elimination
(3) Create a parser class with
    – private variable currentToken
    – methods to call the scanner: accept and acceptIt
(4) Implement a public method for main function to call:
    – public parse method that
      • fetches the first token from the scanner
      • calls parseS (where S is the start symbol of the grammar)
      • verifies that the scanner next produces the end-of-file token
(5) Implement private parsing methods:
    – add a private parseN method for each non-terminal N

Developing RD Parser for Mini Triangle

Before we begin:
• The following non-terminals are recognized by the scanner
• They will be returned as tokens by the scanner

Identifier      ::= Letter (Letter|Digit)*
Integer-Literal ::= Digit Digit*
Operator        ::= + | - | * | / | < | > | =
Comment         ::= ! Graphic* eol

Assume scanner returns instances of this class:

public class Token {
    byte kind;
    String spelling;
    final static byte
        IDENTIFIER = 0,
        INTLITERAL = 1;
    ...
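To make the scanner's role concrete, here is a minimal sketch of a scanner for the four lexical rules above. The names SketchToken, SketchScanner and scanAll, and the OPERATOR and EOT kinds, are assumptions made for this illustration and are not part of the course code. The sketch does not distinguish keywords such as if or while from identifiers, which a real Mini Triangle scanner would have to do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Token class assumed above.
class SketchToken {
    static final byte IDENTIFIER = 0, INTLITERAL = 1, OPERATOR = 2, EOT = 3;
    final byte kind;
    final String spelling;

    SketchToken(byte kind, String spelling) {
        this.kind = kind;
        this.spelling = spelling;
    }
}

class SketchScanner {
    // Splits the source text into tokens, following the lexical grammar above.
    static List<SketchToken> scanAll(String src) {
        List<SketchToken> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                   // blanks and line ends separate tokens
            } else if (c == '!') {                     // Comment ::= ! Graphic* eol
                while (i < src.length() && src.charAt(i) != '\n') i++;
            } else if (Character.isLetter(c)) {        // Identifier ::= Letter (Letter|Digit)*
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                tokens.add(new SketchToken(SketchToken.IDENTIFIER, src.substring(start, i)));
            } else if (Character.isDigit(c)) {         // Integer-Literal ::= Digit Digit*
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(new SketchToken(SketchToken.INTLITERAL, src.substring(start, i)));
            } else if ("+-*/<>=".indexOf(c) >= 0) {    // Operator ::= + | - | * | / | < | > | =
                tokens.add(new SketchToken(SketchToken.OPERATOR, String.valueOf(c)));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        tokens.add(new SketchToken(SketchToken.EOT, ""));  // end-of-text marker
        return tokens;
    }
}
```

Comments are skipped rather than returned, so the parser never sees them; the final EOT token lets the parser check that nothing follows the program.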

(1)&(2) Developing RD Parser for Mini Triangle

Program        ::= single-Command
Command        ::= single-Command
                 | Command ; single-Command
single-Command ::= V-name := Expression
                 | Identifier ( Expression )
                 | if Expression then single-Command else single-Command
                 | while Expression do single-Command
                 | let Declaration in single-Command
                 | begin Command end
V-name         ::= Identifier
...

• single-Command: left factorization needed
• Command: left recursion elimination needed

(1)&(2) Express grammar in EBNF and transform

After factorization etc. we get:

Program        ::= single-Command
Command        ::= single-Command (; single-Command)*
single-Command ::= Identifier ( := Expression | ( Expression ) )
                 | if Expression then single-Command else single-Command
                 | while Expression do single-Command
                 | let Declaration in single-Command
                 | begin Command end
V-name         ::= Identifier
...

(1)&(2) Developing RD Parser for Mini Triangle

Expression         ::= primary-Expression
                     | Expression Operator primary-Expression
primary-Expression ::= Integer-Literal
                     | V-name
                     | Operator primary-Expression
                     | ( Expression )
Declaration        ::= single-Declaration
                     | Declaration ; single-Declaration
single-Declaration ::= const Identifier ~ Expression
                     | var Identifier : Type-denoter
Type-denoter       ::= Identifier

• Expression: left recursion elimination needed
• Declaration: left recursion elimination needed

(1)&(2) Express grammar in EBNF and transform

After factorization and recursion elimination:

Expression         ::= primary-Expression ( Operator primary-Expression )*
primary-Expression ::= Integer-Literal
                     | Identifier
                     | Operator primary-Expression
                     | ( Expression )
Declaration        ::= single-Declaration (; single-Declaration)*
single-Declaration ::= const Identifier ~ Expression
                     | var Identifier : Type-denoter
Type-denoter       ::= Identifier

(3)&(4) Create a parser class and public parse method

public class Parser {

    private Token currentToken;

    private void accept (byte expectedKind) {
        if (currentToken.kind == expectedKind)
            currentToken = scanner.scan( );
        else
            report syntax error
    }

    private void acceptIt( ) {
        currentToken = scanner.scan( );
    }

    public void parse( ) {
        acceptIt( );        // get the first token
        parseProgram( );    // Program is the start symbol
        if (currentToken.kind != Token.EOT)
            report syntax error
    }
    ...

(5) Implement private parsing methods

Program ::= single-Command

private void parseProgram( ) {
    parseSingleCommand( );
}

(5) Implement private parsing methods

single-Command ::= Identifier ( := Expression | ( Expression ) )
                 | if Expression then single-Command else single-Command
                 | ... other alternatives ...

private void parseSingleCommand( ) {
    switch (currentToken.kind) {
        case Token.IDENTIFIER : ...
        case Token.IF : ...
        ... other cases ...
        default: report a syntax error
    }
}
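The skeleton above can be filled in as follows. This is only a sketch under simplifying assumptions, not the course's parser: tokens are pre-scanned strings held in a list instead of coming from a scanner, a token's kind is computed from its spelling by a hypothetical kind() helper, syntax errors are reported by throwing a hypothetical SyntaxError, and the let alternative (declarations) is left out to keep the example short.

```java
import java.util.Arrays;
import java.util.List;

class MiniCommandParser {
    static class SyntaxError extends RuntimeException {
        SyntaxError(String message) { super(message); }
    }

    private static final List<String> KEYWORDS =
        Arrays.asList("if", "then", "else", "while", "do", "let", "in", "begin", "end");
    private static final List<String> OPERATORS =
        Arrays.asList("+", "-", "*", "/", "<", ">", "=");

    private final List<String> tokens;   // stand-in for the scanner
    private int pos = -1;
    private String currentToken;

    MiniCommandParser(String... tokens) { this.tokens = Arrays.asList(tokens); }

    // Kind of the current token; keywords and punctuation are their own kind.
    private String kind() {
        if (KEYWORDS.contains(currentToken)) return currentToken;
        if (Character.isLetter(currentToken.charAt(0))) return "IDENTIFIER";
        if (Character.isDigit(currentToken.charAt(0))) return "INTLITERAL";
        if (OPERATORS.contains(currentToken)) return "OPERATOR";
        return currentToken;
    }

    private void acceptIt() {
        pos++;
        currentToken = pos < tokens.size() ? tokens.get(pos) : "<eot>";
    }

    private void accept(String expectedKind) {
        if (kind().equals(expectedKind)) acceptIt();
        else throw new SyntaxError("expected " + expectedKind + " but found " + currentToken);
    }

    void parse() {
        acceptIt();             // get the first token
        parseSingleCommand();   // Program ::= single-Command
        if (!kind().equals("<eot>")) throw new SyntaxError("unexpected " + currentToken);
    }

    private void parseCommand() {
        parseSingleCommand();
        while (kind().equals(";")) { acceptIt(); parseSingleCommand(); }
    }

    private void parseSingleCommand() {
        switch (kind()) {
            case "IDENTIFIER":
                acceptIt();
                switch (kind()) {   // the left-factored tail: := Expression | ( Expression )
                    case ":=": acceptIt(); parseExpression(); break;
                    case "(":  acceptIt(); parseExpression(); accept(")"); break;
                    default: throw new SyntaxError(":= or ( expected but found " + currentToken);
                }
                break;
            case "if":
                acceptIt(); parseExpression(); accept("then"); parseSingleCommand();
                accept("else"); parseSingleCommand();
                break;
            case "while":
                acceptIt(); parseExpression(); accept("do"); parseSingleCommand();
                break;
            case "begin":
                acceptIt(); parseCommand(); accept("end");
                break;
            default:
                throw new SyntaxError("command expected but found " + currentToken);
        }
    }

    private void parseExpression() {
        parsePrimaryExpression();
        while (kind().equals("OPERATOR")) { acceptIt(); parsePrimaryExpression(); }
    }

    private void parsePrimaryExpression() {
        switch (kind()) {
            case "INTLITERAL":
            case "IDENTIFIER": acceptIt(); break;
            case "OPERATOR":   acceptIt(); parsePrimaryExpression(); break;
            case "(":          acceptIt(); parseExpression(); accept(")"); break;
            default: throw new SyntaxError("expression expected but found " + currentToken);
        }
    }

    // Convenience wrapper: true if the token sequence is a syntactically valid program.
    static boolean recognizes(String... tokens) {
        try { new MiniCommandParser(tokens).parse(); return true; }
        catch (SyntaxError e) { return false; }
    }
}
```

For instance, MiniCommandParser.recognizes("x", ":=", "1", "+", "y") accepts an assignment, while recognizes("x", "1") is rejected because neither := nor ( follows the identifier.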

Algorithm to convert EBNF into a RD parser

• The conversion of an EBNF specification into a Java or C++ implementation for a recursive descent parser is so “mechanical” that it could easily be automated (such tools exist, but we won’t use them in this course)

• We can describe the algorithm by a set of mechanical rewrite rules

N ::= X

private void parseN( ) {
    parse X    // as explained on next two slides
}

Algorithm to convert EBNF into a RD parser

parse ε
    // a dummy statement

parse N    where N is a non-terminal
    parseN( );

parse t    where t is a terminal
    accept(t);

parse X Y
    parse X
    parse Y

Algorithm to convert EBNF into a RD parser

parse X*

    while (currentToken.kind is in starters[X]) {
        parse X
    }

parse X | Y

    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            if neither X nor Y generates ε then report syntax error
    }
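As an illustration of the two rules above, here is a small sketch that applies them by hand to the made-up rule S ::= ( a | b c )* d. The grammar, the class name RuleDemo and the SyntaxError exception are assumptions chosen for this example; each input character counts as one token.

```java
class RuleDemo {
    static class SyntaxError extends RuntimeException {
        SyntaxError(String message) { super(message); }
    }

    private final String input;   // each character is one token
    private int pos = 0;

    RuleDemo(String input) { this.input = input; }

    // '$' stands for the end-of-text token once the input is used up.
    private char current() { return pos < input.length() ? input.charAt(pos) : '$'; }

    private void acceptIt() { pos++; }

    private void accept(char expected) {
        if (current() == expected) acceptIt();
        else throw new SyntaxError("expected " + expected + " but found " + current());
    }

    // S ::= ( a | b c )* d
    void parseS() {
        // parse X* : loop while the current token is in starters[a | b c] = {a, b}
        while (current() == 'a' || current() == 'b') {
            // parse X | Y : one case per alternative, chosen by its starters
            switch (current()) {
                case 'a': acceptIt(); break;
                case 'b': acceptIt(); accept('c'); break;
                default: throw new SyntaxError("unreachable");
            }
        }
        accept('d');
    }

    // True if the whole input is derived from S.
    static boolean recognizes(String input) {
        RuleDemo parser = new RuleDemo(input);
        try {
            parser.parseS();
            return parser.pos == input.length();
        } catch (SyntaxError e) {
            return false;
        }
    }
}
```

The loop terminates correctly because d, the only token that can follow the repetition, is not in starters[a | b c].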

Example: “Generation” of parseCommand

Command ::= single-Command ( ; single-Command )*

private void parseCommand( ) {
    parse single-Command ( ; single-Command )*
}

private void parseCommand( ) {
    parse single-Command
    parse ( ; single-Command )*
}

private void parseCommand( ) {
    parseSingleCommand( );
    parse ( ; single-Command )*
}

private void parseCommand( ) {
    parseSingleCommand( );
    while (currentToken.kind==Token.SEMICOLON) {
        parse ; single-Command
    }
}

private void parseCommand( ) {
    parseSingleCommand( );
    while (currentToken.kind==Token.SEMICOLON) {
        parse ;
        parse single-Command
    }
}

private void parseCommand( ) {
    parseSingleCommand( );
    while (currentToken.kind==Token.SEMICOLON) {
        acceptIt( );    // because SEMICOLON has just been checked
        parseSingleCommand( );
    }
}

Example: Generation of parseSingleDeclaration

single-Declaration ::= const Identifier ~ Expression
                     | var Identifier : Type-denoter

private void parseSingleDeclaration( ) {
    parse const Identifier ~ Expression
        | var Identifier : Type-denoter
}

private void parseSingleDeclaration( ) {
    switch (currentToken.kind) {
        case Token.CONST:
            parse const Identifier ~ Expression
            break;
        case Token.VAR:
            parse var Identifier : Type-denoter
            break;
        default: report syntax error
    }
}

private void parseSingleDeclaration( ) {
    switch (currentToken.kind) {
        case Token.CONST:
            parse const
            parse Identifier
            parse ~
            parse Expression
            break;
        case Token.VAR:
            parse var Identifier : Type-denoter
            break;
        default: report syntax error
    }
}

private void parseSingleDeclaration( ) {
    switch (currentToken.kind) {
        case Token.CONST:
            acceptIt( );
            parseIdentifier( );
            accept(Token.IS);
            parseExpression( );
            break;
        case Token.VAR:
            parse var Identifier : Type-denoter
            break;
        default: report syntax error
    }
}

private void parseSingleDeclaration( ) {
    switch (currentToken.kind) {
        case Token.CONST:
            acceptIt( );
            parseIdentifier( );
            accept(Token.IS);
            parseExpression( );
            break;    // without break, Java falls through into the VAR case
        case Token.VAR:
            acceptIt( );
            parseIdentifier( );
            accept(Token.COLON);
            parseTypeDenoter( );
            break;
        default: report syntax error
    }
}
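A runnable variant of the final step might look like the sketch below. It rests on several assumptions that are not in the slides: tokens are plain strings, the class name DeclarationParser and the SyntaxError exception are invented for the example, an Expression is cut down to a single integer literal, and the identifier check does not exclude keywords. Each case ends with break, as the rewrite rule for X | Y requires, so the const case cannot run on into the var case.

```java
import java.util.Arrays;
import java.util.List;

class DeclarationParser {
    static class SyntaxError extends RuntimeException {
        SyntaxError(String message) { super(message); }
    }

    private final List<String> tokens;   // stand-in for the scanner
    private int pos = 0;

    DeclarationParser(String... tokens) { this.tokens = Arrays.asList(tokens); }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eot>"; }

    private void acceptIt() { pos++; }

    private void accept(String expected) {
        if (current().equals(expected)) acceptIt();
        else throw new SyntaxError("expected " + expected + " but found " + current());
    }

    // Simplified: any token starting with a letter counts as an identifier.
    private void parseIdentifier() {
        if (Character.isLetter(current().charAt(0))) acceptIt();
        else throw new SyntaxError("identifier expected but found " + current());
    }

    // Simplification of this sketch: an Expression is a single integer literal.
    private void parseExpression() {
        if (Character.isDigit(current().charAt(0))) acceptIt();
        else throw new SyntaxError("expression expected but found " + current());
    }

    // single-Declaration ::= const Identifier ~ Expression | var Identifier : Type-denoter
    void parseSingleDeclaration() {
        switch (current()) {
            case "const":
                acceptIt();
                parseIdentifier();
                accept("~");
                parseExpression();
                break;
            case "var":
                acceptIt();
                parseIdentifier();
                accept(":");
                parseIdentifier();   // Type-denoter ::= Identifier
                break;
            default:
                throw new SyntaxError("declaration expected but found " + current());
        }
    }

    // Declaration ::= single-Declaration ( ; single-Declaration )*
    void parseDeclaration() {
        parseSingleDeclaration();
        while (current().equals(";")) {
            acceptIt();
            parseSingleDeclaration();
        }
    }

    // True if the whole token sequence is a Declaration.
    static boolean recognizes(String... tokens) {
        DeclarationParser parser = new DeclarationParser(tokens);
        try {
            parser.parseDeclaration();
            return parser.current().equals("<eot>");
        } catch (SyntaxError e) {
            return false;
        }
    }
}
```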

LL(1) Grammars

• The presented algorithm to convert EBNF into a parser does not work for all possible grammars.

• It only works for so-called “LL(1)” grammars.

• Basically, an LL(1) grammar is a grammar which can be parsed with a top-down parser with a lookahead (in the input stream of tokens) of one token.

• What grammars are LL(1)? How can we recognize that a grammar is (or is not) LL(1)?

=> We can deduce the necessary conditions from the parser generation algorithm.

LL(1) Grammars

parse X*

    while (currentToken.kind is in starters[X]) {
        parse X
    }

Condition: starters[X] must be disjoint from the set of tokens that can immediately follow X*

parse X | Y

    switch (currentToken.kind) {
        cases in starters[X]:
            parse X
            break;
        cases in starters[Y]:
            parse Y
            break;
        default:
            if neither X nor Y generates ε then report syntax error
    }

Conditions: starters[X] and starters[Y] must be disjoint sets, and if either X or Y generates ε then they must also be disjoint from the set of tokens that can immediately follow X | Y

LL(1) grammars and left factorization

The original Mini-Triangle grammar is not LL(1):

single-Command ::= V-name := Expression
                 | Identifier ( Expression )
                 | ...
V-name         ::= Identifier

For example:

Starters[V-name := Expression] = Starters[V-name] = Starters[Identifier]

Starters[Identifier ( Expression )] = Starters[Identifier]

NOT DISJOINT!
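The disjointness check can itself be done mechanically. The sketch below is one possible way to do it, under stated assumptions: the grammar is given in plain BNF as a map from each non-terminal to its alternatives, no alternative is empty (so the starters of an alternative are the starters of its first symbol), and any symbol without a rule is treated as a terminal. The names StartersCheck, originalGrammar, factoredGrammar and the helper non-terminal Rest are invented for this example. Only the starters condition is checked, not the condition on tokens that can follow.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StartersCheck {
    // Starters of one symbol: its computed set if it is a non-terminal, else the symbol itself.
    static Set<String> startersOf(String symbol, Map<String, List<List<String>>> grammar,
                                  Map<String, Set<String>> sets) {
        return grammar.containsKey(symbol) ? sets.get(symbol) : Collections.singleton(symbol);
    }

    // Computes starters[N] for every non-terminal N by iterating until nothing changes.
    static Map<String, Set<String>> starters(Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> sets = new HashMap<String, Set<String>>();
        for (String nonTerminal : grammar.keySet()) sets.put(nonTerminal, new HashSet<String>());
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<List<String>>> rule : grammar.entrySet()) {
                for (List<String> alternative : rule.getValue()) {
                    Set<String> first =
                        new HashSet<String>(startersOf(alternative.get(0), grammar, sets));
                    if (sets.get(rule.getKey()).addAll(first)) changed = true;
                }
            }
        }
        return sets;
    }

    // True if the starters sets of all alternatives of the non-terminal are pairwise disjoint.
    static boolean alternativesDisjoint(String nonTerminal,
                                        Map<String, List<List<String>>> grammar) {
        Map<String, Set<String>> sets = starters(grammar);
        Set<String> seen = new HashSet<String>();
        for (List<String> alternative : grammar.get(nonTerminal)) {
            for (String token : startersOf(alternative.get(0), grammar, sets)) {
                if (!seen.add(token)) return false;   // token starts two alternatives
            }
        }
        return true;
    }

    // single-Command ::= V-name := Expression | Identifier ( Expression )
    static Map<String, List<List<String>>> originalGrammar() {
        Map<String, List<List<String>>> g = new HashMap<String, List<List<String>>>();
        g.put("single-Command", Arrays.asList(
            Arrays.asList("V-name", ":=", "Expression"),
            Arrays.asList("Identifier", "(", "Expression", ")")));
        g.put("V-name", Arrays.asList(Arrays.asList("Identifier")));
        return g;
    }

    // single-Command ::= Identifier Rest,  Rest ::= := Expression | ( Expression )
    static Map<String, List<List<String>>> factoredGrammar() {
        Map<String, List<List<String>>> g = new HashMap<String, List<List<String>>>();
        g.put("single-Command", Arrays.asList(Arrays.asList("Identifier", "Rest")));
        g.put("Rest", Arrays.asList(
            Arrays.asList(":=", "Expression"),
            Arrays.asList("(", "Expression", ")")));
        return g;
    }
}
```

With these definitions, alternativesDisjoint("single-Command", originalGrammar()) is false because both alternatives start with Identifier, while the left-factored version passes the check for both single-Command and Rest.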

LL(1) grammars: left factorization

What happens when we generate a RD parser from a non-LL(1) grammar?

single-Command ::= V-name := Expression
                 | Identifier ( Expression )
                 | ...

private void parseSingleCommand( ) {
    switch (currentToken.kind) {
        case Token.IDENTIFIER:
            parse V-name := Expression
        case Token.IDENTIFIER:
            parse Identifier ( Expression )
        ...other cases...
        default: report syntax error
    }
}

wrong: overlapping cases

LL(1) grammars: left factorization

single-Command ::= V-name := Expression
                 | Identifier ( Expression )
                 | ...

Left factorization (and substitution of V-name)

single-Command ::= Identifier ( := Expression | ( Expression ) )
                 | ...

LL(1) Grammars: left recursion elimination

What happens if we don’t perform left-recursion elimination?

Command ::= single-Command
          | Command ; single-Command

public void parseCommand( ) {
    switch (currentToken.kind) {
        case in starters[single-Command]:
            parseSingleCommand( );
        case in starters[Command]:
            parseCommand( );
            accept(Token.SEMICOLON);
            parseSingleCommand( );
        default: report syntax error
    }
}

wrong: overlapping cases (starters[Command] is the same set as starters[single-Command], and the second case would call parseCommand again before any token has been consumed)

LL(1) Grammars: left recursion elimination

Command ::= single-Command
          | Command ; single-Command

Left recursion elimination

Command ::= single-Command (; single-Command)*

Abstract Syntax Trees

• So far we have talked about how to build a recursive descent parser which recognizes a given language described by an LL(1) EBNF grammar.

• Next we will look at
  – how to represent ASTs as data structures.
  – how to modify the parser to construct an AST data structure.

• We make heavy use of Object-Oriented Programming! (classes, inheritance, dynamic method binding)
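As a preview of both points, here is a minimal sketch of what AST classes and an AST-building parsing method could look like. All names in it (Ast, IntegerExpression, BinaryExpression, AstBuildingParser, show) are assumptions made for this illustration and need not match the classes used later in the course. Only integer literals, binary operators and parentheses are covered.

```java
import java.util.Arrays;
import java.util.List;

// Common superclass of all AST nodes.
abstract class Ast {
    abstract String show();   // dynamic method binding selects the subclass version
}

class IntegerExpression extends Ast {
    final String spelling;

    IntegerExpression(String spelling) { this.spelling = spelling; }

    @Override String show() { return spelling; }
}

class BinaryExpression extends Ast {
    final Ast left;
    final String operator;
    final Ast right;

    BinaryExpression(Ast left, String operator, Ast right) {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }

    @Override String show() {
        return "(" + left.show() + " " + operator + " " + right.show() + ")";
    }
}

class AstBuildingParser {
    private final List<String> tokens;   // stand-in for the scanner
    private int pos = 0;

    AstBuildingParser(String... tokens) { this.tokens = Arrays.asList(tokens); }

    private String current() { return pos < tokens.size() ? tokens.get(pos) : "<eot>"; }

    private boolean atOperator() {
        return current().length() == 1 && "+-*/<>=".contains(current());
    }

    // Expression ::= primary-Expression ( Operator primary-Expression )*
    // Each parsing method now returns the subtree it has recognized.
    Ast parseExpression() {
        Ast result = parsePrimaryExpression();
        while (atOperator()) {
            String operator = current();
            pos++;
            result = new BinaryExpression(result, operator, parsePrimaryExpression());
        }
        return result;
    }

    // primary-Expression ::= Integer-Literal | ( Expression )   (subset only)
    private Ast parsePrimaryExpression() {
        String token = current();
        if (Character.isDigit(token.charAt(0))) {
            pos++;
            return new IntegerExpression(token);
        }
        if (token.equals("(")) {
            pos++;
            Ast inner = parseExpression();
            if (!current().equals(")")) throw new IllegalStateException("expected )");
            pos++;
            return inner;
        }
        throw new IllegalStateException("unexpected token " + token);
    }
}
```

Because the grammar gives all operators the same priority and the loop extends the tree to the left, new AstBuildingParser("1", "+", "2", "*", "3").parseExpression().show() yields ((1 + 2) * 3).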