CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the...

36
CSE443 Compilers Dr. Carl Alphonce [email protected] 343 Davis Hall http://www.cse.buffalo.edu/faculty/alphonce/SP17/CSE443/index.php https://piazza.com/class/iybn4ndqa1s3ei

Transcript of CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the...

Page 1: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

CSE443 Compilers

Dr. Carl Alphonce [email protected]

343 Davis Hall

http://www.cse.buffalo.edu/faculty/alphonce/SP17/CSE443/index.php https://piazza.com/class/iybn4ndqa1s3ei

Page 2: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

Phases of a

compiler

Figure 1.6, page 5 of text

Syntactic structure

Page 3: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

5

Language terminology(from Sebesta (10th ed), p. 115)

• A language is a set of strings of symbols, drawn from some finite set of symbols (called the alphabet of the language).

• “The strings of a language are called sentences”• “Formal descriptions of the syntax […] do not include

descriptions of the lowest-level syntactic units […] called lexemes.”

• “A token of a language is a category of its lexemes.”• Syntax of a programming language is often presented in

two parts:– regular grammar for token structure (e.g. structure of identifiers)– context-free grammar for sentence structure

Page 4: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

6

Examples of lexemes and tokensLexemes Tokens

foo identifieri identifiersum identifier-3 integer_literal10 integer_literal1 integer_literal; statement_separator= assignment_operator

Page 5: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

7

Backus-Naur Form (BNF)• Backus-Naur Form (1959)

– Invented by John Backus to describe ALGOL 58, modified by Peter Naur for ALGOL 60

– BNF is equivalent to context-free grammar – BNF is a metalanguage used to describe another language,

the object language– Extended BNF: adds syntactic sugar to produce more

readable descriptions

Page 6: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

8

BNF Fundamentals• Sample rules [p. 128]

<assign> → <var> = <expression><if_stmt> → if <logic_expr> then <stmt><if_stmt> → if <logic_expr> then <stmt> else <stmt>

• non-terminals/tokens surrounded by < and >• lexemes are not surrounded by < and >• keywords in language are in bold• → separates LHS from RHS• | expresses alternative expansions for LHS

<if_stmt> → if <logic_expr> then <stmt>| if <logic_expr> then <stmt> else <stmt>

• = is in this example a lexeme

Page 7: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

9

BNF Rules• A rule has a left-hand side (LHS) and a right-hand

side (RHS), and consists of terminal and nonterminal symbols

• A grammar is often given simply as a set of rules (terminal and non-terminal sets are implicit in rules, as is start symbol)

Page 8: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

10

Describing Lists• There are many situations in which a

programming language allows a list of items (e.g. parameter list, argument list).

• Such a list can typically be as short as empty or consisting of one item.

• Such lists are typically not bounded.• How is their structure described?

Page 9: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

11

Describing lists• The are described using recursive rules.• Here is a pair of rules describing a list of

identifiers, whose minimum length is one:<ident_list> -> ident

| ident , <ident_list>• Notice that ‘,’ is part of the object language (the

language being described by the grammar).

Page 10: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

12

Derivation of sentences from a grammar

• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)

Page 11: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

13

Recall example 2

G2 = ({a, the, dog, cat, chased},{S, NP, VP, Det, N, V},{S à NP VP, NP à Det N, Det à a | the,N à dog | cat, VP à V | VP NP, V à chased},

S)

Page 12: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

14

Example: derivation from G2

• Example: derivation of the dog chased a catS à NP VP à Det N VPà the N VPà the dog VPà the dog V NPà the dog chased NPà the dog chased Det Nà the dog chased a Nà the dog chased a cat

Page 13: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

15

Example 3

L3 = { 0, 1, 00, 11, 000, 111, 0000, 1111, … }

G3 = ( {0, 1},{S, ZeroList, OneList},{S à ZeroList | OneList,ZeroList à 0 | 0 ZeroList,OneList à 1 | 1 OneList },S )

Page 14: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

16

Example: derivations from G3• Example: derivation of 0 0 0 0

S à ZeroListà 0 ZeroListà 0 0 ZeroListà 0 0 0 ZeroListà 0 0 0 0

• Example: derivation of 1 1 1S à OneListà 1 OneListà 1 1 OneListà 1 1 1

Page 15: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

17

Observations about derivations

• Every string of symbols in the derivation is a sentential form.

• A sentence is a sentential form that has only terminal symbols.

• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded.

• A derivation can be leftmost, rightmost, or neither.

Page 16: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

18

An example programming language grammar fragment

<program> -> <stmt-list><stmt-list> -> <stmt>

| <stmt> ; <stmt-list><stmt> -> <var> = <expr><var> -> a

| b| c| d

<expr> -> <term> + <term>| <term> - <term>

<term> -> <var>| const

Page 17: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

19

A leftmost derivation ofa = b + const

<program> => <stmt-list> => <stmt> => <var> = <expr>=> a = <expr> => a = <term> + <term>=> a = <var> + <term> => a = b + <term>=> a = b + const

Page 18: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

20

Parse tree• A parse tree is an hierarchical representation of a

derivation: <program>

<stmt-list>

<stmt>

const

a

<var> = <expr>

<var>

b

<term> + <term>

Page 19: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

21

Parse trees and compilation

• A compiler builds a parse tree for a program (or for different parts of a program).

• If the compiler cannot build a well-formed parse tree from a given input, it reports a compilation error.

• The parse tree serves as the basis for semantic interpretation/translation of the program.

Page 20: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

22

Extended BNF• Optional parts are placed in brackets [ ]<proc_call> -> ident [(<expr_list>)]

• Alternative parts of RHSs are placed inside parentheses and separated via vertical bars <term> -> <term> (+|-) const

• Repetitions (0 or more) are placed inside braces { }<ident> -> letter {letter|digit}

Page 21: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

23

Comparison of BNF and EBNF• sample grammar fragment expressed in BNF

<expr> -> <expr> + <term>| <expr> - <term>| <term>

<term> -> <term> * <factor>| <term> / <factor>| <factor>

• same grammar fragment expressed in EBNF<expr> -> <term> {(+ | -) <term>}<term> -> <factor> {(* | /) <factor>}

Page 22: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

24

Ambiguity in grammars

• A grammar is ambiguous if and only if it generates a sentential form that has two or more distinct parse trees

• Operator precedence and operator associativity are two examples of ways in which a grammar can provide an unambiguous interpretation.

Page 23: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

25

Operator precedence ambiguity

The following grammar is ambiguous:<expr> -> <expr> <op> <expr> | const<op> -> / | -

The grammar treats the '/' and '-' operators equivalently.

Page 24: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

26

An ambiguous grammarfor arithmetic expressions

<expr> -> <expr> <op> <expr> | const<op> -> / | -

<expr>

<expr> <expr>

<expr> <expr>

<expr>

<expr> <expr>

<expr> <expr>

<op>

<op>

<op>

<op>

const const const const const const- -/ /

<op>

Page 25: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

28

Disambiguating the grammar

• If we use the parse tree to indicate precedence levels of the operators, we can remove the ambiguity.

• The following rules give / a higher precedence than -

<expr> -> <expr> - <term> | <term><term> -> <term> / const | const

<expr>

<expr> <term>

<term> <term>

const const

const/

-

Page 26: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

29

Links to BNF-style grammars for actual programming languages

Below are some links to grammars for real programming languages. Look at how the grammars are expressed.

– http://www.schemers.org/Documents/Standards/R5RS/– http://www.sics.se/isl/sicstuswww/site/documentation.html

In the ones listed below, find the parts of the grammar that deal with operator precedence.

– http://java.sun.com/docs/books/jls/index.html– http://www.lykkenborg.no/java/grammar/JLS3.html– http://www.enseignement.polytechnique.fr/profs/informatique/Jean-

Jacques.Levy/poly/mainB/node23.html– http://www.lrz-muenchen.de/~bernhard/Pascal-EBNF.html

Page 27: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

30

Derivation of2+5*3

using C grammar

<expression>

<conditional-expression>

<assignment-expression>

<logical-OR-expression>

<inclusive-OR-expression>

<AND-expression>

<logical-AND-expression>

<exclusive-OR-expression>

<equality-expression>

<relational-expression>

<shift-expression>

<additive-expression>

<additive-expression> + <multiplicative-expression>

<multiplicative-expression>

<cast-expression>

<unary-expression>

<postfix-expression>

<primary-expression>

<constant>

2

<multiplicative-expression> <cast-expression>

<unary-expression>

<postfix-expression>

<primary-expression>

<constant>

3

<cast-expression>

<unary-expression>

<postfix-expression>

<primary-expression>

<constant>

5

*

Page 28: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

31

Recursion and parentheses

• To generate 2+3*4 or 3*4+2, the parse tree is built so that + is higher in the tree than *.

• To force an addition to be done prior to a multiplication we must use parentheses, as in (2+3)*4.

• Grammar captures this in the recursive case of an expression, as in the following grammar fragment:

<expr> à <expr> + <term> | <term><term> à <term> * <factor> | <factor><factor> à <variable> | <constant> | “(” <expr> “)”

Page 29: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

Lecture discussion

There are many reasons to study the syntax of programming languages.When learning a new language you need to be able to read a syntax description to be able to write well-formed programs in the language.Understanding at least a little of what a compiler does in translating a program from high-level to low-level forms deepens your understanding of why programming languages are designed the way they are, and equips you to better diagnose subtle bugs in programs.The next slide shows the “evaluation order” remark in the C++ language reference, which alludes to the order being left unspecified to allow a compiler to optimize the code during translation.

32

Page 30: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

Shown on Visualizer

33

C++ Programming Language, 3rd edition. Bjarne Stroustrup. (c) 1997. Page 122.

Page 31: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

A compiler translates high level language statements into a much larger number of low-level statements, and then applies optimizations. The entire translation process, including optimizations, must preserve the semantics of the original high-level program.The next slides shows that different phases of compilation can apply different types of optimizations (some target-independent, some target-dependent).By not specifying the order in which subexpressions are evaluated (left-to-right or right-to-left) a C++ compiler can potentially re-order the resulting low-level instructions to give a “better” result.

34

Page 32: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

35

Com

pilers: principles, techniques, and tools (Aho

et al) (c) 2007, page 5.

Page 33: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

RL ⊆ CFLGiven a regular language L we can always construct a context free grammar G such that L = 𝓛(G).

For every regular langauge L there is an NFA M = (S,∑,𝛅,F,s0) such that L = 𝓛(M).

Build G = (N,T,P,S0) as follows:

N = { Ns | s ∈ S }

T = { t | t ∈ ∑ }

If 𝛅(i,a)=j, then add Ni → a Nj to P

If i ∈ F, then add Ni → 𝜀 to P

S0 = Nso

Page 34: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

(a|b)*abb

G = ( {A0, A1, A2, A3}, {a, b}, {A0 → a A0, A0 → b A0, A0 → a A1, A1 → b A2, A2 → b A3, A3 → 𝜀}, A0 }

0 1 2 3a b b

a

b

Page 35: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

RL ⊊ CFLShow that not all CF languages are regular.

To do this we only need to demonstrate that there exists a CFL that is not regular.

Consider L = { anbn | n ≥ 1 }

Claim: L ∈ CFL, L ∉ RL

Page 36: CSE443 Compilers - University at BuffaloLecture discussion There are many reasons to study the syntax of programming languages. When learning a new language you need to be able to

RL ⊊ CFLProof (sketch):

L ∈ CFL: S → aSb | ab

L ∉ RL (by contradiction):

Assume L is regular. In this case there exists a DFA D=(S,∑,𝛅,F,s0) such that 𝓛(D) = L.

Let k = |S|. Consider aibi, where i>k.

Suppose 𝛅(s0, ai) = sr. Since i>k, not all of the states between

s0 and sr are distinct. Hence, there are v and w, 0 ≤ v < w ≤ k such that sv = sw. In other words, there is a loop.

This DFA can certainly recognize aibi but it can also recognize ajbi, where i ≠ j, by following the loop.

"REGULAR GRAMMARS CANNOT COUNT"