Part 2 - Syntax

  • 8/18/2019 Part 2 - Syntax

    • The study of programming languages can be divided into the examination of syntax and semantics

     – Syntax is the form of expressions, statements, and program units

     – Semantics is the meaning of those expressions, statements, and program units

    • In other words, syntax is the form (structure, grammar) of a language, and semantics is its meaning

    • In a well-designed programming language, semantics should follow directly from syntax

    • Describing syntax is easier than describing semantics

    Example:

    syntax: if-else is an operator that takes three operands: a condition and two statements

    semantics: if the value of a is greater than the value of b, then increment a. Otherwise, increment b.

    if (a > b) a = a + 1; else b = b + 1;

    • Syntax is what the grammar allows; semantics is what it means.

    int x = "five"; syntax is okay (type identifier = value), semantics is wrong ("five" is not an int).
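
    The same distinction can be sketched in Python. This is only an illustration under stated assumptions: Python is used instead of the Java-like language of the slide, so the semantic error shows up at run time rather than at compile time, and the variable names are made up for the example.

```python
import ast

# Syntactically valid: the parser accepts "name = expression".
tree = ast.parse('x = "five" + 1')       # no SyntaxError is raised

# The meaning is wrong, though: a string and an int cannot be added.
try:
    exec(compile(tree, "<example>", "exec"), {})
    semantic_error = None
except TypeError as exc:
    semantic_error = exc

# Syntactically invalid: the grammar does not allow "= =" here.
try:
    ast.parse("x = = 1")
    syntax_error = None
except SyntaxError as exc:
    syntax_error = exc
```

    The first program gets past the parser and fails only when its meaning is evaluated; the second never gets past the parser.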

    Both the syntax and semantics of a programming language must be carefully defined so that:

    • language implementors can implement the language (correctly), so that programs developed with one implementation run correctly under another (portability)

    • programmers can use the language (correctly)

    Describing Syntax

    A language is a set of strings of characters from some alphabet.

    Examples:

    – English (using the standard alphabet)

    – binary numbers (using the alphabet {0, 1})
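
    Read this way, deciding whether a string belongs to a language is a membership test. A minimal sketch for the binary-number example, assuming (this is not stated on the slide) that the empty string is not a binary number:

```python
BINARY_ALPHABET = {"0", "1"}

def is_binary_number(s: str) -> bool:
    """True if s is a non-empty string over the alphabet {0, 1}."""
    return len(s) > 0 and all(ch in BINARY_ALPHABET for ch in s)
```

    For example, `is_binary_number("1011")` holds, while `is_binary_number("102")` does not because `2` is outside the alphabet.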

    The syntax rules of a language determine whether or not arbitrary strings belong to the language. The first step in specifying syntax is describing the basic units, or “words”, of the language, called lexemes.

    For example, some typical Java lexemes include:

    – if

    – ++

    – +

    THE GENERAL PROBLEM OF DESCRIBING SYNTAX

    • Lexemes are the lowest-level syntactic units

    • The lexemes of a programming language include its identifiers, literals, operators, and special words

    • A token of a language is a category of its lexemes

    • Lexemes are grouped into categories called tokens. Each token has one or more lexemes.

    • Tokens are specified using regular expressions or finite automata.

    • The scanner (lexical analyzer) of a compiler processes the character strings in the source program and determines the tokens that they represent.

    • Once the tokens of a language are defined, the next step is to determine which sequences of tokens are in the language.
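
    A scanner along these lines can be sketched with Python's `re` module. The token categories and their regular expressions below are hypothetical, chosen only to cover the small if-else example; a real language definition would specify its own.

```python
import re

# Hypothetical token categories; each is specified by a regular expression.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else)\b"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("INT_LIT",    r"\d+"),
    ("OP",         r"\+\+|[+=>]"),
    ("PUNCT",      r"[(){};]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (token, lexeme) pairs for the source text."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":         # whitespace separates lexemes only
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

    Calling `tokenize("if (a > b) a = a + 1; else b = b + 1;")` pairs each lexeme with its category, so `if` becomes a KEYWORD while `a` and `b` share the IDENTIFIER token.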

    FORMAL METHODS OF DESCRIBING SYNTAX

    • John Backus and Noam Chomsky independently invented the notation that is most widely used for describing programming language syntax

    CONTEXT-FREE GRAMMARS

    • Chomsky described classes of grammars that define classes of languages. Two of these grammar classes, context-free and regular, turned out to be useful for describing the syntax of programming languages

    • The tokens of programming languages can be described by regular grammars

    ORIGINS OF BACKUS-NAUR FORM (BNF)

    • BNF is a very natural notation for describing syntax

    • BNF is nearly identical to Chomsky's context-free grammars

    • A metalanguage is a language that is used to describe another language

    • BNF is a metalanguage for programming languages

    • The abstractions in a BNF description, or grammar, are called nonterminals

    • The lexemes and tokens of the rules are called terminals

    • A BNF description, or grammar, is simply a collection of rules
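
    The idea of a grammar as a collection of rules can be sketched as a plain data structure. The toy grammar below is an assumption made for illustration (it does not come from the slides), and `leftmost_step` is a hypothetical helper that performs one step of a leftmost derivation.

```python
# Each LHS nonterminal maps to a list of alternative right-hand sides.
GRAMMAR = {
    "<assign>": [["id", "=", "<expr>"]],
    "<expr>":   [["id", "+", "<expr>"], ["id"]],
}

NONTERMINALS = set(GRAMMAR)
TERMINALS = {sym for alts in GRAMMAR.values()
             for rhs in alts for sym in rhs} - NONTERMINALS

def leftmost_step(form, rhs):
    """Replace the leftmost nonterminal in a sentential form with rhs."""
    for i, sym in enumerate(form):
        if sym in NONTERMINALS:
            if rhs not in GRAMMAR[sym]:
                raise ValueError(f"{rhs} is not an alternative of {sym}")
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no nonterminal left to expand")

# Derive a sentence, starting from the start symbol.
form = ["<assign>"]
for rhs in (["id", "=", "<expr>"], ["id", "+", "<expr>"], ["id"]):
    form = leftmost_step(form, rhs)
# form now holds only terminals: id = id + id
```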

    BNF – Backus-Naur Form

    BNF is:

    – a metalanguage, that is, a language used to describe other languages

    – the standard way to describe programming language syntax

    – often used in language reference manuals

    The class (set) of languages that can be described using BNF is called the context-free languages, and BNF descriptions are also called context-free grammars or just grammars.

    BNF Notation

    Parse Tree

    • A parse tree is a graphical way of representing a derivation.

    • the root of the parse tree is always the start symbol

    • each interior node is a nonterminal

    • each leaf node is a token

    • the children of a nonterminal (interior node) are the RHS of some rule whose LHS is the nonterminal
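
    These properties can be checked mechanically. The sketch below assumes a toy grammar and a tuple encoding of tree nodes, both invented for illustration; it does not handle rules with an empty right-hand side.

```python
GRAMMAR = {
    "<assign>": [["id", "=", "<expr>"]],
    "<expr>":   [["id", "+", "<expr>"], ["id"]],
}
START = "<assign>"

def leaf(tok):
    """A node is (symbol, children); a leaf has no children."""
    return (tok, [])

# Parse tree for the sentence: id = id + id
TREE = ("<assign>", [leaf("id"), leaf("="),
        ("<expr>", [leaf("id"), leaf("+"),
            ("<expr>", [leaf("id")])])])

def is_valid(node, grammar):
    symbol, children = node
    if not children:                      # leaf: must be a token
        return symbol not in grammar
    rhs = [child[0] for child in children]
    # interior node: a nonterminal whose children spell one of its RHSs
    return (symbol in grammar and rhs in grammar[symbol]
            and all(is_valid(c, grammar) for c in children))

def frontier(node):
    """Read the leaves left to right to recover the derived sentence."""
    symbol, children = node
    if not children:
        return [symbol]
    return [tok for child in children for tok in frontier(child)]
```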

    For example, a parse tree for:

    if (id > num) id = num; else { id = id + num; id = id; }

    using the previous grammar is:

    A grammar is ambiguous if there are 2 or more distinct parse trees (or equivalently, leftmost derivations) for the same string.

    Consider the grammar:

    <expr> → id | num | (<expr>) | <expr> + <expr> | <expr> * <expr>

    and the string:

    id + num * id

    The following parse trees show that this grammar is ambiguous:

    Which parse tree would we prefer?
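
    The ambiguity can also be demonstrated by enumerating every parse tree of a token string. The following is a small sketch written for this one `<expr>` grammar, not a general parser; trees are encoded as nested tuples, which is an assumption of the example.

```python
from functools import lru_cache

def parse_trees(tokens):
    """Return every parse tree of tokens under the ambiguous <expr> grammar."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):                      # trees deriving tokens[i:j]
        found = []
        if j - i == 1 and tokens[i] in ("id", "num"):
            found.append(tokens[i])       # <expr> -> id | num
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            found += [("(", t, ")") for t in trees(i + 1, j - 1)]
        for k in range(i + 1, j - 1):     # candidate operator position
            if tokens[k] in ("+", "*"):
                found += [(left, tokens[k], right)
                          for left in trees(i, k)
                          for right in trees(k + 1, j)]
        return tuple(found)

    return trees(0, len(tokens))

both = parse_trees(["id", "+", "num", "*", "id"])
# both contains two distinct trees, so the grammar is ambiguous.
```

    Of the two trees, `("id", "+", ("num", "*", "id"))` groups the multiplication first, which matches the usual precedence of `*` over `+`.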

    This grammar modification gives rise to three proof obligations:

    – the two grammars define the same language

    – the second grammar always gives correct associativity and precedence

    – the second grammar is not ambiguous

    These proofs are omitted.

    SYNTAX GRAPHS

    • A graph is a collection of nodes, some of which are connected by lines, called edges

    • A directed graph is one in which the edges are directional; they have arrowheads on one end to indicate a direction

    • The information in BNF rules can be represented in a directed graph; such graphs are called syntax graphs. These graphs use rectangles for nonterminals and circles for terminals

    GRAMMARS AND RECOGNIZERS

    • One of the most widely used syntax analyzer generators is named yacc (yet another compiler compiler)

    • Syntax analyzers for programming languages, which are often called parsers, construct parse trees for given programs

    • The 2 broad classes of parsers are top-down, in which the tree is built from the root downward to the leaves, and bottom-up, in which the parse tree is built from the leaves upward to the root.

    RECURSIVE DESCENT PARSING

    • A context-free grammar can serve as the basis for the syntax analyzer, or parser, of a compiler

    • A simple kind of grammar-based top-down parser is named recursive descent

    • Parsing is the process of tracing a parse tree for a given input string

    • The basic idea of a recursive descent parser is that there is a subprogram for each nonterminal in the grammar
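
    A minimal sketch of the idea, with one method per nonterminal. The grammar in the comments is an assumed unambiguous expression grammar in a common textbook form, not necessarily the one used in these slides, and the parser returns a simplified nested-tuple tree rather than a full parse tree.

```python
class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):                       # <expr> -> <term> { + <term> }
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = (node, "+", self.term())
        return node

    def term(self):                       # <term> -> <factor> { * <factor> }
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = (node, "*", self.factor())
        return node

    def factor(self):                     # <factor> -> id | num | ( <expr> )
        tok = self.peek()
        if tok in ("id", "num"):
            self.pos += 1
            return tok
        if tok == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek() is not None:         # input must be fully consumed
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return tree
```

    With this grammar, `parse(["id", "+", "num", "*", "id"])` yields a single tree in which `*` binds tighter than `+`, and repeated `+` groups to the left.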

    ATTRIBUTE GRAMMARS

    • An attribute grammar is a device used to describe more of the structure of a programming language than is possible with a context-free grammar

    • An attribute grammar is an extension to a context-free grammar
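
    One typical use of the extension is type checking, which a context-free grammar alone cannot express. The sketch below is a loose illustration, not a full attribute-grammar evaluator: the symbol table `DECLARED`, the rule that int + real gives real, and the function names are all assumptions made for the example.

```python
# Hypothetical symbol table giving each variable's declared type.
DECLARED = {"a": "int", "b": "int", "x": "real"}

def actual_type(expr):
    """Synthesized attribute: computed from the children of <expr>."""
    if isinstance(expr, str):             # <expr> -> <var>
        return DECLARED[expr]
    left, _plus, right = expr             # <expr> -> <var> + <var>
    left_type, right_type = DECLARED[left], DECLARED[right]
    return "int" if left_type == right_type == "int" else "real"

def check_assign(target, expr):
    """Predicate on <assign> -> <var> = <expr>: the types must agree."""
    expected_type = DECLARED[target]      # passed down from the target <var>
    return expected_type == actual_type(expr)
```

    Here `check_assign("a", ("a", "+", "b"))` holds, while `check_assign("a", ("a", "+", "x"))` fails even though both assignments are syntactically well formed.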