CS 3360 1
Describing Syntax
CS 3360Spring 2012
Sec 3.1-3.4Adapted from Addison Wesley’s lecture notes (Copyright © 2004 Pearson Addison Wesley)
2CS 3360
Outline
Introduction Formal description of syntax Backus-Naur Form (BNF) Attribute grammars (probably next time
)
3CS 3360
Introduction
Who must use language definitions? Implementers Programmers (the users of the language)
Syntax - the form or structure of the expressions, statements, and program units
Semantics - the meaning of the expressions, statements, and program units
4CS 3360
Introduction (cont.)
ExampleSyntax of Java while statement
while (<boolean-expr>) <statement>
Semantics?
5CS 3360
Describing Syntax – Vocabulary
A sentence is a string of characters over some alphabet
A language is a set of sentences A lexeme is the lowest level syntactic unit
of a language (e.g., *, sum, while) A token is a category of lexemes (e.g.,
identifier)
6CS 3360
Example
index = 2 * count + 17;
Lexemes Tokensindex identifier= equal_sign2 int_literal* mult_opcount identifier+ plus_op17 int_literal; semicolon
7CS 3360
Describing Syntax
Formal approaches to describing syntax:Recognizers (once you have code)
Can tell whether a given string is in a language or not
Used in compilers, and called a parserGenerators (in order to build code)
Generate the sentences of a language Used to describe the syntax of a language
8CS 3360
Formal Methods of Describing Syntax Context-Free Grammars (CFG – see automata
course)
Developed by Noam Chomsky in the mid-1950’s
Language generators, meant to describe the syntax of natural languages
Define a class of languages called context-free languages
9CS 3360
Formal Methods of Describing Syntax Backus-Naur Form
Invented by John Backus to describe Algol 58 Extended by Peter Naur to describe Algol 60 BNF is equivalent to context-free grammars A metalanguage is a language used to describe
another language. In BNF, abstractions are used to represent classes of
syntactic structures--they act like syntactic variables (also called nonterminal symbols)
10CS 3360
Backus-Naur Form
<while_stmt> while ( <logic_expr> ) <stmt>
This is a rule (also called a production rule); it describes the structure of a while statement
11CS 3360
Backus-Naur Form
A rule has a left-hand side (LHS) and a right-hand side (RHS), and consists of terminal and non-terminal symbols
A grammar is a finite non-empty set of rules An abstraction (or non-terminal symbol) can
have more than one RHS
<stmt> <single_stmt>
| { <stmt_list> }
12CS 3360
Backus-Naur Form
Syntactic lists are described using recursion
<ident_list> ident
| ident , <ident_list>
Example sentences:
ident
ident , ident
ident , ident, ident
13CS 3360
Example
A grammar for small language: <program> <stmts> <stmts> <stmt> | <stmt> ; <stmts> <stmt> <var> = <expr> <var> a | b | c | d <expr> <term> + <term> | <term> - <term> <term> <var> | 5 Sample programa = b + 5
14CS 3360
Exercise
Define a grammar to generate all sentences of the form:
subject verb object .where subject is “i” or “we”, and verb is “love” or “like”, and object is “exercises” or “programming”.
15CS 3360
Exercise
Define the syntax of Java Boolean expressions consisting of: Constants: false and true Operators: !, &&, and ||
16CS 3360
Derivation
A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
Example: <ident_list> ident | ident , <ident_list>
<ident_list> => ident , <ident_list> => ident , ident , <ident_list> => ident, ident , ident
17CS 3360
More Example
a = b + 5 <program> => <stmts>
=> <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term>=> a = <var> + <term>=> a = b + <term>=> a = b + 5
<program> <stmts> <stmts> <stmt> | <stmt> ; <stmts> <stmt> <var> = <expr> <var> a | b | c | d <expr> <term> + <term> | <term> - <term> <term> <var> | 5
18CS 3360
Derivation
Every string of symbols in the derivation is a sentential form
A sentence is a sentential form that has only terminal symbols
A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded
A derivation may be neither leftmost nor rightmost
19CS 3360
Exercise
Derivea = b + 5by using a rightmost derivation.
<program> <stmts> <stmts> <stmt> | <stmt> ; <stmts> <stmt> <var> = <expr> <var> a | b | c | d <expr> <term> + <term> | <term> - <term> <term> <var> | 5
20CS 3360
Parse Tree A hierarchical representation of a
derivation
<program>
<stmts>
<stmt>
5
a
<var> = <expr>
<var>
b
<term> + <term>
21CS 3360
Ambiguity of Grammars
A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees.
22CS 3360
An Ambiguous Expression Grammar
<expr> <expr> <op> <expr> | 5 <op> / | -
<expr>
<expr> <expr>
<expr> <expr><op>
5 5 5- /
<op>
<expr>
<expr> <expr>
<expr> <expr>
<op><op>
<op>
5 5 5- /
23CS 3360
An Unambiguous Expression Grammar If we use the parse tree to indicate precedence levels of
the operators, we cannot have ambiguity
<expr> <expr> - <term> | <term><term> <term> / 5 | 5
<expr>
<expr> <term>
<term> <term>
5 5
5/
-
24CS 3360
Exercise
Prove or disprove the ambiguity of the following grammar<stmt> -> <if-stmt><if-stmt> -> if <expr> then <stmt> | if <expr> then <stmt> else <stmt>
25CS 3360
Operator Precedence
Derivation:<expr> => <expr> - <term>
=> <term> - <term>
=> 5 - <term>
=> 5 - <term> / 5
=> 5 - 5 / 5
<expr> <expr> - <term> | <term><term> <term> / 5 | 5
26CS 3360
Operator Associativity Can we describe operator associativity
correctly?
A = A + B + C
(A + B) + C or A + (B + C)?
Does it matter?
27CS 3360
Operator Associativity Operator associativity can also be indicated by
a grammar <expr> -> <expr> + <expr> | 5 (ambiguous) <expr> -> <expr> + 5 | 5 (unambiguous)
<expr><expr>
<expr>
<expr> 5
5
5
+
+
28CS 3360
Left vs. Right Recursion A rule is left recursive if its LHS also appears at
the beginning (left end) of its RHS. A rule is right recursive if its LHS also appears
at the right end of its RHS.
<factor> -> <expr> ** <factor> | <expr> <expr> -> c
Example: c ** c ** c interpreted as c ** (c ** c)
29CS 3360
Exercise Define a BNF grammar for expressions consisting of +, *, and **
(exponential). The operator ** has precedence over *, and * has precedence over +. Both + and * are left associative while ** is right associative.
Using the above grammar, draw a parse tree for the sentence:
7 + 6 + 5 * 4 * 3 ** 2 ** 1
Exercise to do in groups at the end of lecture
30CS 3360
Extended BNF (EBNF)
Extended BNF (just abbreviations): Optional parts are placed in brackets ([ ])
<meth_call> -> ident ( [<expr_list>] ) Put alternative parts of RHSs in parentheses and
separate them with vertical bars
<term> -> <term> (+ | -) const Put repetitions (0 or more) in braces ({ })
<ident> -> letter {letter | digit}
31CS 3360
Example
BNF: <expr> <expr> + <term> | <expr> - <term> | <term> <term> <term> * <factor> | <term> / <factor> | <factor>
EBNF: <expr> <term> {(+ | -) <term>} <term> <factor> {(* | /) <factor>}
32CS 3360
Exercise / Homework
Write BNF rules for the following EBNF rules:1. <meth_call> -> <ident> “(” [<expr_list>] “)”2. <term> -> <term> (+ | -) const
3. <ident> -> letter {letter | digit}
Due on Tuesday at the start of the session!
33CS 3360
Outline
Introduction Describing syntax formally Backus-Naur Form (BNF) Attribute grammars
34CS 3360
Attribute Grammars
CFGs cannot describe all of the syntax of programming languages
Additions to CFGs to carry some semantic info along through parse trees
Primary value of attribute grammars:Static semantics specificationCompiler design (static semantics checking)
35CS 3360
Basic Idea
Add attributes, attribute computation functions, and predicates to CFGs
Attributes Associated with grammar symbols Can have values assigned to them
Attribute computation functions Associated with grammar rules Specify how to compute attribute values Are often called semantic functions
Predicate functions Associated with grammar rules State some of the syntax and static semantic rules of the
language
36CS 3360
Example
BNF<meth_def> -> meth <meth_name> <meth_body> end <meth_name>
<meth_name> -> <identifier>
<meth_body> -> …
AG1. Syntax rule: <meth_def> -> meth <meth_name>[1]
<meth_body>
end <meth_name>[2]
Predicate: <meth_name>[1].string == <meth_name>[2].string
2. Syntax rule: <meth_name> -> <identifier>
Semantic rule: <meth_name>.string <- <identifier>.string
37CS 3360
Attribute Grammars Defined
An attribute grammar is a CFG with the following additions: A set of attributes A(X) for each grammar symbol X
A(X) consists of two disjoint sets S(X) and I(X) S(X): synthesized attributes I(X): inherited attributes
Each rule has a set of functions that define certain attributes of the non-terminals in the rule
Each rule has a (possibly empty) set of predicates to check for attribute consistency
38CS 3360
Attribute Functions
Let X0 X1 ... Xn be a rule Functions of the form S(X0) = f(A(X1), ... , A(Xn))
define synthesized attributes Functions of the form I(Xj) = f(A(X0), ... , A(Xn)), for
1 <= j <= n, define inherited attributes. Often of the form: I(Xj) = f(A(X0), ... , A(Xj-1))
Initially, there are intrinsic attributes on the leaves. Intrinsic attributes are synthesized attributes whose
value are determined outside the parse tree.
39CS 3360
Example - Type Checking Rules
BNF <assign> -> <var> = <expr> <expr> -> <var> | <var> + <var>
<var> -> A | B | C Rule
A variable is either int or float. If the two operands of + has the same type, the type of expression is that of the
operands; otherwise, it is float. The type of the left side of assignment must match the type of the right side.
Attributes actual_type: synthesized for <var> and <expr> expected_type: inherited for <expr> string: intrinsic for <var>
40CS 3360
Example – Attribute Grammar
1. Syntax rule: <assign> -> <var> = <expr>Semantic rule: <expr>.expected_type <- <var>.actual_type
2. Syntax rule: <expr> -> <var>[1] + <var>[2] Semantic rule: <expr>.actual_type <- (<var>[1].actual_type == int
&& <var>[2].actual_type == int) ? int : floatPredicate: <expr>.actual_type == <expr>.expected_type
3. Syntax rule: <expr> -> <var>Semantic rule: <expr>.actual_type <- <var>.actual_typePredicate: <expr>.actual_type == <expr>.expected_type
4. Syntax rule: <var> -> A | B | CSemantic rule: <var>.actual_type <- lookup(<var>.string)
41CS 3360
<expr>
<var>[2]
A = A + B
Example – Parse Tree A = A + B
<assign>
<var> <var>[1]
42CS 3360
<expr>
<var>[2]
A = A + B
Example – Flow of Attributes A = A + B
<assign>
<var> <var>[1]
expected_type actual_type
actual_type actual_type actual_type
<expr>.expected_type <- <var>.actual_type<expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float
43CS 3360
<expr>
<var>[2]
A = A + B
Example – Calculating Attributes
A = A + B<assign>
<var> <var>[1]
expected_type actual_type
actual_type actual_type actual_type
float float int
float
floatfloat
float int
<expr>.expected_type <- <var>.actual_type<expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float
44CS 3360
<expr>
<var>[2]
A = A + B
Example – Calculating Attributes
A = A + B<assign>
<var> <var>[1]
expected_type actual_type
actual_type actual_type actual_type
int int float
int
floatint
int float
<expr>.expected_type <- <var>.actual_type<expr>.actual_type <- (<var>[1].actual_type == int && <var>[2].actual_type == int) ? int : float
45CS 3360
Attribute Grammars
How are attribute values computed? If all attributes were inherited, the tree could
be decorated in top-down order. If all attributes were synthesized, the tree
could be decorated in bottom-up order. In many cases, both kinds of attributes are
used, and it is some combination of top-down and bottom-up that must be used.
46CS 3360
Group Exercise: homework due Tuesday February, 7 at the start of class BNF <cond_expr> -> <expr> ? <expr> : <expr>
<expr> -> <var> | <expr> + <expr> <var> -> id Rule
id's type can be bool, int, or float. Operands of + must be numeric and of the same type. The type of + is the type of its operands. The first operand of ?: must be of bool and the second and third must
be of the same type. The type of ?: is the type of its second and third operands.
Given the above BNF and rule:1. Define an attribute grammar2. Draw a decorated parse tree for “id ? id : id + id” assuming that the
first id is of type bool and the rest are of type int.
Top Related