Syntax and Grammars. Definitions Syntax is the form of its expressions, statements and program units...

Click here to load reader

download Syntax and Grammars. Definitions Syntax is the form of its expressions, statements and program units Semantics is the meaning of those expressions, statements.

of 32

Transcript of Syntax and Grammars. Definitions Syntax is the form of its expressions, statements and program units...

Syntax and Grammars

Syntax and GrammarsDefinitionsSyntax is the form of its expressions, statements and program unitsSemantics is the meaning of those expressions, statements and program units

Syntax can be precisely defined, while semantics are harder to describe:syntax easier were to sentence would unimportant this read if bethis sentence would be easier to read if syntax were unimportant

time flies like an arrow, fruit flies like a bananaContext-sensitive vs. context-freeEnglish is context sensitive since the meaning of a word can change based on the words around it

Programming languages are usually context-freeThe meaning of the words in a program are determined by the syntax rules of the language

Describing SyntaxTwo approaches:Language recognizersGiven a string of characters, determines if the string is valid in the languageApproach taken by the compiler

Language generatorsAllows a new valid sentence to be created each time it is usedMore easily read and understood

These formalizations aid in compiler design, and language acquisition4Language GeneratorsUsually called grammarsChomsky (1956, 1959)Linguist who described four classes of language generators, two which are useful for programming language syntax: context-free and regularInterested in the theoryBackus and NaurComputer scientists describing PL syntaxInterested in the practical aspect of describing ALGOL 58 and ALGOL 60BNF (Backus-Naur Form) now the standardBackus-Naur FormBNF is a metalanguage A language that describes another languageGives details for each of characters, lexemes and tokensPieces of syntaxIndividual characters e.g. alphanumeric keysLexemesSmall units E.g. numeric literals, operators, special wordsTokensCategories for lexemesE.g. identifier, mult_op, semicolon, int_literal

BNF Example C++ Identifier ::= ::= | _::= | ::= | | _ ::=A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z::=0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9Is R2D2 a valid identifier?

R R R R 2 R 2 R 2 R 2 D R 2 D R 2 D R 2 D 2 R 2 D 2 This sequence of steps is called a derivation.Parse trees

R2. . . Formal DefinitionsLet S be a set of symbols.A string over S is a finite sequence of zero or more symbols from the set S. Symbols whose meaning is predefined are called terminals. Symbols like A, _, 6, etc. are terminals in our BNF.Symbols whose meanings must be defined are called non-terminals, and are enclosed in angle-brackets (< and >). Symbols like , , etc. are non-terminals. Like variables, non-terminals usually describe what they represent.One symbol is designated as the starting non-terminal. The symbol is the starting non-terminal in our BNF.Each non-terminal must be defined by a production (rule):Different productions defining the same non-terminal:::=Def1::=Def2 ...::=Defncan be written in shorthand using the OR (|) operator: ::=Def1 | Def2 | | Defn::=where: is the nonterminal being defined, is a string of terminals and/or nonterminals defining .Formal Definitions continuedA BNF is a quadruple: (S, N, P, S), where:A derivation is a sequence of strings, beginning with the starting nonterminal S, in which each successive string replaces a nonterminal with one of its productions, and in which the final string consists solely of terminals. A derivation is sometimes called a parse.S is the set of symbols in the BNF;N is the subset of S that are nonterminals in the language;P is the set of productions defining the symbols in N; S is the element of N that is the starting nonterminal.Formal Definitions continued The act of building a derivation tree for a sentence (to check its correctness) is called parsing that sentence. The set of valid sentences in a language is the set of all sentences for which a parse tree exists!A BNF derivation tree (or parse tree) is a tree, such that the root of the tree is the starting nonterminal (S) in the BNF; the children of the root are the symbols (L to R) in a production whose is S, the starting nonterminal; each terminal child is a leaf; and each nonterminal child is the root of a derivation tree for that nonterminal.Formal Definitions continuedOur identifier-BNF permits identifiers to have arbitrary lengths through the use of recursive productions:The recursive production provides for unrestricted repetition of the non-terminal being defined, which is useful when what is being defined can be appear 0 or more times.::= | The e-production provides: a base-case for trivial instances of the non-terminal, andan anchor to terminate the recursion.The first production is recursive because the non-terminal on the appears in the production (on the ).Recursive ProductionsWriting BNFs: A 3-Step ProcessA. Start with the non-terminal youre defining.B. Build the productions to define the non-terminal:1. Start with the question: What comes first?2. If a construct is optional:a. create a nonterminal for that construct; b. add a production for the non-optional case; c. add an e-production for the optional case.3. If a construct can be repeated zero or more times:a. create a nonterminal for that construct; b. add an e-production for the zero-reps case;c. add a recursive production for the other cases.C. For each nonterminal in the of every production:repeat step B until all nonterminals have been defined.Example: C++ if statementA. Create a non-terminal for what were defining:::=if ( ) - keyword if- open parentheses- an expression- close parentheses- a statement- (optional) an else and a statementC. Repeat B for each undefined non-terminal:::=else | (For now we wont worry about and ) B. Build a production to define it: what comes first?How are we doing?if ( a < b) if (a < c) S1 else S2Uh oh::=if ( ) ::=else | AmbiguitiesIf multiple parse trees exist for a valid expression, the grammar is said to be ambiguousC++ allows this ambiguity in the if expression, and instead uses a semantic rule to resolve the ambiguity:An else associates with the closest prior unterminated ifNot all ambiguity is okMany times, if there is ambiguity, the grammar has been poorly structuredExample ::= = ::= A|B|C::= + | * | ( )|

Consider the expression A = B + C * A

From Concepts of Programming Languages, 9th ed. Sebesta20PrecedenceOrder of evaluations can be built in to the grammarAn expression lower in the parse tree then has higher precedence

::= | | ::= | + | - ::= | * | / | % ::=( ) | Example: 2+3*4

+

2 * 3

4

AssociativityWhen two operators in an expression have the same precedence (e.g. + and -, or * and /), a semantic rule is required to indicate which should have precedenceAssociativity can matter in computing where it does not matter in mathematicsE.g. 1000000 plus ten 1s on a machine with 7 digits of accuracyA grammar can imply operator associativityAssociativity Examples- is left-associative; which is correct?--842--842= is right-associative; which is correct?==xyz==xyzLeft-recursive productionsLeft associativity can be built into a grammar using left-recursive productions

::= | + | - --842Right-recursive productionsRight associativity can be built into a grammar using right-recursive productions

::= ==xyzExerciseBuild a parse tree for a = x - y / z + u ;Make sure the tree shows appropriate precedence and associativityWhich productions are in the grammar?Exercise::= | + | - ::= | * | / | % ::=( ) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ** | ::=Draw the parse tree for: 4 ** 2 ** 3 + 5 * 6 + 7Extended BNF (EBNF)No more description power than regular BNFIncreased readability and writabilityVariety of implementationsInstead of indicating a non-terminal, bold, underline or quote terminalsCapitalize first letter of non-terminals

Common extensions[ and ] surround optional symbolsNo productions!E.g. if_stmt ::= if ( expr ) stmt [ else stmt ]

{ and } surround symbols repeated zero or more timesNo recursion!E.g. ident ::= letter { letter | digit }

( and ) indicate groupings from which a single choice must be madeE.g. term ::= term ( * | / | % ) factorWhy BNF?The recursive productions make it easier for a compiler to parse the sentences in the languageBasic parsing algorithm

0. Push S (the starting symbol) onto a stack.1. Get the first terminal symbol t from the input file.2. Repeat the following steps:a. Pop the stack into topSymbol;b. If topSymbol is a nonterminal:1) Choose a production p of topSymbol based on t 2) If p != e:Push p right-to-left onto the stack.c. Else if topSymbol is a terminal && topSymbol == t:Get the next terminal symbol t from the input file.d. ElseGenerate a parse error message.while the stack is not empty.ExampleLets parse the declaration:int x, y; assuming that is our starting symbol.::= id ; | e ::=int | char | float | double | ::=, | e stack:

t:int

id

;

intid

;

id

;

id (x)

;

,,;

id

id

;

id (y);

;

;

Suppose our grammar is:SummaryThe EBNF is simpler and easier to use, so it is frequently used in language guides/manuals.

The BNFs recursive- and e-productions simplify the task of parsing, so it is more useful to compiler-writers.