Syntax analysis

Structure of Programming Languages SYNTAX ANALYSIS VSRivera


Page 1: Syntax analysis

Structure of Programming Languages

SYNTAX ANALYSIS

VSRivera

Page 2: Syntax analysis

Syntax

“The arrangement of words as elements in a sentence to show their relationship”

Describes the sequence of symbols that make up valid programs.

Form of expressions, statements and program units.

Page 3: Syntax analysis

The General Problem of Describing Syntax: Terminology

A sentence is a string of characters over some alphabet.

A language is a set of sentences.

A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin).

A token is a category of lexemes (e.g., identifier).
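The lexeme/token distinction can be sketched in a few lines of Python. The token categories and regular expressions below are illustrative assumptions, not part of the slides; the point is only that each lexeme (the actual characters) is reported together with the token (category) it belongs to.

```python
import re

# Hypothetical token categories; names and patterns are illustrative only.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:begin|end)\b"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("OPERATOR",   r":=|[+*]"),
    ("SKIP",       r"\s+"),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(sentence):
    """Return (token, lexeme) pairs for a sentence, ignoring blanks."""
    pairs = []
    pos = 0
    while pos < len(sentence):
        match = PATTERN.match(sentence, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {sentence[pos]!r}")
        if match.lastgroup != "SKIP":
            # lastgroup is the category name, group() is the lexeme itself
            pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs

print(tokenize("begin sum := sum * total end"))
```

Here sum and total are different lexemes that share the single token IDENTIFIER.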

Page 4: Syntax analysis

Syntactic elements of the Language

Character set – ASCII, Unicode

Identifiers – restrictions on length reduce readability

Operator symbols – + and - represent the two basic arithmetic operations.

Page 5: Syntax analysis

Syntactic elements of the Language

Keywords and reserved words – a keyword is an identifier used as a fixed part of the syntax of a statement. It is a reserved word if it may not be used as a programmer-chosen identifier.

Noise words – optional words that are inserted in statements to improve readability.

Page 6: Syntax analysis

Syntactic elements of the Language

Comments – an important part of the documentation. REM, /* */, or //

Blanks (spaces)

Delimiters – a syntactic element used simply to mark the beginning or end of some syntactic unit such as a statement or expression. “begin” … “end”, or { }.

Page 7: Syntax analysis

Syntactic elements of the Language

Expressions – functions that access data objects in a program and return some value.

Statements

Page 8: Syntax analysis

Syntactic Analysis (parsing)

2nd stage in translation

Determines if the program being compiled is a valid sentence in the syntactic model of the programming language.

Page 9: Syntax analysis

Role of the Parser

Where lexical analysis splits the input into tokens, the purpose of syntax analysis (also known as parsing) is to recombine these tokens to reflect the structure of the text.

The parser must also reject invalid texts by reporting syntax errors, and recover from commonly occurring errors so that it can continue processing the remainder of its input.

Page 10: Syntax analysis

Role of the Parser

[Figure: Source program → Lexical Analyzer → token → Parser → Parse tree → Rest of front end → Intermediate representation. The parser requests each token from the lexical analyzer with “Get next token”.]
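The "get next token" hand-off in the figure can be sketched as follows. This is a minimal illustration under stated assumptions: tokens are simply whitespace-separated lexemes, "EOF" is a made-up end marker, the class and method names are hypothetical, and the only construct accepted is name := name + name.

```python
def lexical_analyzer(source):
    """Yield one token at a time; here a token is a whitespace-separated lexeme."""
    for lexeme in source.split():
        yield lexeme
    yield "EOF"  # hypothetical end-of-input marker

class Parser:
    def __init__(self, source):
        self.tokens = lexical_analyzer(source)
        self.current = next(self.tokens)

    def get_next_token(self):
        """Return the current token and ask the lexical analyzer for the next one."""
        token = self.current
        if token != "EOF":
            self.current = next(self.tokens)
        return token

    def expect(self, lexeme):
        token = self.get_next_token()
        if token != lexeme:
            raise SyntaxError(f"expected {lexeme!r}, got {token!r}")

    def parse_assignment(self):
        """Build a parse tree (nested tuples) for: name := name + name"""
        target = self.get_next_token()
        self.expect(":=")
        left = self.get_next_token()
        self.expect("+")
        right = self.get_next_token()
        self.expect("EOF")
        return ("assign", target, ("+", left, right))

print(Parser("A := B + C").parse_assignment())
```

The parser never sees the raw source text; it only pulls tokens on demand. For brevity the sketch does not check that the operands are valid names.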

Page 11: Syntax analysis

Formal Methods of Describing Syntax

Grammars

Parse Trees

Syntax Diagrams

Page 12: Syntax analysis

Grammars

Formal definition of the syntax of a programming language.

Collection of rules that define, mathematically, which strings of symbols are valid sentences.

Page 13: Syntax analysis

Parts of Grammar

Set of tokens/terminal symbols: symbols that are atomic (non-divisible); can be combined to form valid constructs in the language.

Set of non-terminal symbols: symbols used to represent intermediate definitions within the language; defined by productions; syntactic classes or categories.

Page 14: Syntax analysis

Parts of Grammar

Set of rules called productions: a production is a definition of a non-terminal symbol and has the form

x ::= y

where x is a non-terminal symbol and y is a sequence of symbols (non-terminal or terminal).

Page 15: Syntax analysis

Parts of Grammar

LHS: abstraction being defined

RHS: tokens, lexemes, references to other abstractions

Goal symbol: one of the set of non-terminal symbols; also referred to as the start symbol

Page 16: Syntax analysis

Rules to form Grammar

Every non-terminal symbol must appear to the left of the ::= of at least one production

The goal symbol must not appear to the right of the ::= of any production

A rule is recursive if its LHS appears in its RHS
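These three rules can be checked mechanically. The sketch below stores a grammar as a Python dict (an assumed representation, with hypothetical helper names) using the <program> example from these slides, then tests each rule in turn.

```python
# Keys are non-terminals; each value lists the alternative right-hand sides.
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", "<stmt_list>"]],
    "<stmt>":       [["<var>", ":=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"]],
}
GOAL = "<program>"

def is_nonterminal(symbol):
    return symbol.startswith("<") and symbol.endswith(">")

def undefined_nonterminals(grammar):
    """Non-terminals used on some RHS but never defined on an LHS."""
    used = {s for alts in grammar.values() for alt in alts
            for s in alt if is_nonterminal(s)}
    return used - set(grammar)

def goal_on_rhs(grammar, goal):
    """True if the goal symbol appears to the right of ::= in any production."""
    return any(goal in alt for alts in grammar.values() for alt in alts)

def recursive_rules(grammar):
    """Non-terminals whose LHS also appears in one of their own RHS alternatives."""
    return {lhs for lhs, alts in grammar.items()
            if any(lhs in alt for alt in alts)}

print(undefined_nonterminals(GRAMMAR))  # empty set: every non-terminal is defined
print(goal_on_rhs(GRAMMAR, GOAL))       # False
print(recursive_rules(GRAMMAR))         # {'<stmt_list>'}
```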

Page 17: Syntax analysis

Context Free Grammar (CFG)

Backus-Naur Form (BNF)

Grammar notation originally presented by John Backus (to describe ALGOL 58) and later modified by Peter Naur

Composed of a finite set of grammar rules which define a programming language.

Page 18: Syntax analysis

Examples

<conditional stmt> ::= if <boolean expr> then <stmt> else <stmt>
                     | if <boolean expr> then <stmt>

Page 19: Syntax analysis

Examples

<unsigned int> ::= <digit> | <unsigned int> <digit>

A rule is recursive if its LHS appears in its RHS
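The recursive rule maps directly onto a recursive function. The following is a literal transcription for illustration, assuming <digit> means one of the characters 0 to 9; the function name is hypothetical.

```python
DIGITS = "0123456789"

def is_unsigned_int(text):
    """<unsigned int> ::= <digit> | <unsigned int> <digit>"""
    if len(text) == 1:
        # First alternative: a single <digit>
        return text in DIGITS
    if len(text) > 1:
        # Second alternative: a shorter <unsigned int> followed by a <digit>
        return is_unsigned_int(text[:-1]) and text[-1] in DIGITS
    return False  # the empty string is not derivable

print(is_unsigned_int("2024"), is_unsigned_int("12a"))
```

The recursion is on the left part of the string because the rule is left-recursive: the non-terminal being defined appears first in its own right-hand side.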

Page 20: Syntax analysis

Examples

<assign> ::= <id> := <expr>
<id> ::= A | B | C
<expr> ::= <id> + <expr>
         | <id> * <expr>
         | ( <expr> )
         | <id>

Page 21: Syntax analysis

Examples

<program> ::= begin <stmt_list> end
<stmt_list> ::= <stmt> | <stmt> <stmt_list>
<stmt> ::= <var> := <expression>
<var> ::= A | B | C
<expression> ::= <var> + <var>

Page 22: Syntax analysis

Grammar Derivation

BNF is a generative device for defining languages.

The sentences of the language are generated through a sequence of applications of the rules, beginning with a special non-terminal (start symbol) of the grammar.

Page 23: Syntax analysis

Example

<program> => begin <stmt_list> end
          => begin <stmt> end
          => begin <var> := <expression> end
          => begin <var> := <var> + <var> end
          => begin A := B + C end
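A derivation like this one can be replayed step by step. In the sketch below (an illustration with an assumed dict representation and a hypothetical function name), each entry in choices selects which alternative to apply to the leftmost non-terminal. Because it always rewrites the leftmost non-terminal, it replaces A, B and C one at a time, so it shows more intermediate steps than the slide.

```python
GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", "<stmt_list>"]],
    "<stmt>":       [["<var>", ":=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"]],
}

def leftmost_derivation(grammar, start, choices):
    """Apply one production per step to the leftmost non-terminal."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        # Position of the leftmost symbol that still has productions
        index = next(i for i, s in enumerate(form) if s in grammar)
        form[index:index + 1] = grammar[form[index]][choice]
        steps.append(" ".join(form))
    return steps

# Alternatives chosen: program, single stmt, stmt, A, expression, B, C
for step in leftmost_derivation(GRAMMAR, "<program>", [0, 0, 0, 0, 0, 1, 2]):
    print("=>", step)
```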

Page 24: Syntax analysis

Example

A := B * ( A + C )

<assign> => <id> := <expr>
         => A := <expr>
         => A := <id> * <expr>
         => A := B * <expr>
         => A := B * ( <expr> )
         => A := B * ( <id> + <expr> )
         => A := B * ( A + <expr> )
         => A := B * ( A + <id> )
         => A := B * ( A + C )
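Going the other way, a small recursive-descent recognizer can check whether a string is derivable from <assign> and return its structure. This is a sketch, not a general parser: it assumes tokens are separated by blanks (parentheses are split off automatically), and the function and node names are hypothetical.

```python
def parse_assign(text):
    """Parse <assign> ::= <id> := <expr> and return a nested-tuple tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise SyntaxError(f"expected {expected or 'a token'}, got {token!r}")
        pos += 1
        return token

    def parse_id():
        # <id> ::= A | B | C
        token = take()
        if token not in ("A", "B", "C"):
            raise SyntaxError(f"expected A, B or C, got {token!r}")
        return token

    def parse_expr():
        # <expr> ::= <id> + <expr> | <id> * <expr> | ( <expr> ) | <id>
        if peek() == "(":
            take("(")
            inner = parse_expr()
            take(")")
            return ("paren", inner)
        left = parse_id()
        if peek() in ("+", "*"):
            operator = take()
            return (operator, left, parse_expr())
        return left

    target = parse_id()
    take(":=")
    tree = (":=", target, parse_expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_assign("A := B * ( A + C )"))
```

Each function corresponds to one non-terminal, and each branch inside parse_expr corresponds to one alternative of the <expr> rule.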

Page 25: Syntax analysis

When does derivation stop?

A derivation stops when the sentential form contains only terminal symbols.

By exhaustively choosing all combinations of choices, the entire language can be generated.
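Exhaustive generation can be sketched as a breadth-first search over derivations. Since <stmt_list> is recursive, the language of the <program> grammar is infinite, so this illustration adds a bound (max_symbols, an assumption of the sketch) on the length of a sentential form. The pruning is safe here only because no production in this grammar shortens a form.

```python
from collections import deque

GRAMMAR = {
    "<program>":    [["begin", "<stmt_list>", "end"]],
    "<stmt_list>":  [["<stmt>"], ["<stmt>", "<stmt_list>"]],
    "<stmt>":       [["<var>", ":=", "<expression>"]],
    "<var>":        [["A"], ["B"], ["C"]],
    "<expression>": [["<var>", "+", "<var>"]],
}

def generate(grammar, start, max_symbols):
    """Collect every sentence derivable using forms of at most max_symbols symbols."""
    sentences = []
    queue = deque([[start]])
    while queue:
        form = queue.popleft()
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            # Only terminals remain, so this derivation stops here
            sentences.append(" ".join(form))
            continue
        for alternative in grammar[form[index]]:
            new_form = form[:index] + alternative + form[index + 1:]
            if len(new_form) <= max_symbols:
                queue.append(new_form)
    return sentences

one_statement_programs = generate(GRAMMAR, "<program>", 7)
print(len(one_statement_programs))
print(one_statement_programs[:3])
```

With a bound of 7 symbols only single-statement programs fit; raising the bound admits programs with more statements.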

Page 26: Syntax analysis

Exercise

BNF of signed integer?

begin A := B + C; B := C; end

Page 27: Syntax analysis

Extended BNF (EBNF)

Does not increase the descriptive power of BNF (every EBNF rule can be rewritten in plain BNF)

Increases the readability and writability of BNF

Page 28: Syntax analysis

Extended BNF (EBNF)

Notational Extensions

An optional element may be indicated by enclosing the element in square brackets, [ … ].

A choice of alternatives may use the symbol | within a single rule, optionally enclosed in parentheses, ( … ), if needed.

An arbitrary sequence of instances of an element may be indicated by enclosing the element in braces followed by an asterisk, { … }* (zero or more instances; the asterisk is often omitted). A plus sign, { … }+, indicates one or more instances.

Page 29: Syntax analysis

Example

BNF

<expr> ::= <expr> + <term>
         | <expr> - <term>
         | <term>
<term> ::= <term> * <factor>
         | <term> / <factor>
         | <factor>

Page 30: Syntax analysis

Example

EBNF

<expr> ::= <term> { (+|-) <term> }
<term> ::= <factor> { (*|/) <factor> }
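One practical benefit of the EBNF form is that each { … } repetition becomes a plain loop in a hand-written parser. The sketch below illustrates this under two assumptions the slides do not state: <factor> is an unsigned integer literal, and tokens are separated by blanks. Function names are hypothetical.

```python
def parse_expression(text):
    """Parse <expr> ::= <term> { (+|-) <term> } into a nested-tuple tree."""
    tokens = text.split()
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        # Assumption: <factor> is an unsigned integer literal
        nonlocal pos
        token = peek()
        if token is None or not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        pos += 1
        return int(token)

    def term():
        # <term> ::= <factor> { (*|/) <factor> }
        nonlocal pos
        node = factor()
        while peek() in ("*", "/"):      # the braces become a loop
            operator = tokens[pos]
            pos += 1
            node = (operator, node, factor())
        return node

    def expr():
        # <expr> ::= <term> { (+|-) <term> }
        nonlocal pos
        node = term()
        while peek() in ("+", "-"):
            operator = tokens[pos]
            pos += 1
            node = (operator, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree

print(parse_expression("2 + 3 * 4"))
print(parse_expression("8 - 3 - 2"))
```

Because the loop folds each new operand into the tree built so far, 8 - 3 - 2 groups as (8 - 3) - 2, the same left-to-right grouping that the left-recursive BNF rules produce.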

Page 31: Syntax analysis

Example

BNF

<program> ::= begin <stmt_list> end

Page 32: Syntax analysis

Example

EBNF

<program> ::= begin <stmt> {<stmt>} end

<program> ::= begin {<stmt>}+ end

Page 33: Syntax analysis

Example

BNF

<signed int> ::= + <int> | - <int>
<int> ::= <digit> | <int> <digit>

EBNF

<signed int> ::= [+|-] <digit> {<digit>}*
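The EBNF operators have close counterparts in regular-expression syntax, which gives a quick way to try a rule out. The sketch below reads the EBNF rule as an optional sign followed by one or more digits (that reading, and the function name, are assumptions): the optional [+|-] becomes [+-]? and the digit repetition becomes [0-9]+.

```python
import re

# Optional sign, then at least one digit
SIGNED_INT = re.compile(r"[+-]?[0-9]+")

def is_signed_int(text):
    """True if the whole string matches the signed-integer rule."""
    return SIGNED_INT.fullmatch(text) is not None

for sample in ["+42", "-7", "42", "+", "4-2"]:
    print(repr(sample), is_signed_int(sample))
```

Note that the BNF version above requires a sign, while the square brackets in the EBNF version make it optional, so an unsigned string such as 42 is accepted only under the EBNF reading.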

Page 34: Syntax analysis

Exercise

EBNF of identifier?

Page 35: Syntax analysis

Solution

EBNF of identifier

<identifier> ::= <letter> { <letter> | <digit> }*
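The identifier rule can be checked by hand without any library support. The sketch below assumes <letter> means an ASCII letter and <digit> means 0 to 9 (the slides do not define them), and reads the braces as zero or more repetitions so that a single letter is a valid identifier.

```python
import string

LETTERS = set(string.ascii_letters)   # assumption: <letter> is A-Z or a-z
DIGITS = set(string.digits)           # assumption: <digit> is 0-9

def is_identifier(text):
    """<identifier> ::= <letter> followed by any number of letters or digits."""
    if not text or text[0] not in LETTERS:
        return False
    return all(ch in LETTERS or ch in DIGITS for ch in text[1:])

for sample in ["sum", "x1", "A", "1x", "total_2"]:
    print(repr(sample), is_identifier(sample))
```

The underscore in total_2 is rejected because the rule as written allows only letters and digits.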

Page 36: Syntax analysis

Get ½ sheet of yellow pad. Prepare for a quiz. Open Notes.

Page 37: Syntax analysis

Midterm Quiz #1

Using the following English grammar:

<sentence> ::= <noun phrase> <verb phrase> .
<noun phrase> ::= <determiner> <noun>
                | <determiner> <noun> <prepositional phrase>
<verb phrase> ::= <verb>
                | <verb> <noun phrase>
                | <verb> <noun phrase> <prepositional phrase>
<prepositional phrase> ::= <preposition> <noun phrase>
<noun> ::= boy | girl | cat | telescope | song | feather
<determiner> ::= a | the
<verb> ::= saw | touched | surprised | sang
<preposition> ::= by | with

Write the leftmost derivation of the sentence “the girl touched the cat with a feather”