Lect Slides
Uploaded by vikasdalal
-
Introduction to Compilers
-
Writing Cross Compilers
[Figure: two-stage cross-compilation diagram]
Stage 1: Mac C compiler source code (in Unix C) + Unix C compiler -> Mac C compiler usable on Unix
Stage 2: Mac C compiler source code (in Unix C) + Mac C compiler usable on Unix -> Mac C compiler usable on Mac
-
Writing Retargetable Compilers
Two methods:
Make a strict distinction between front end and back end, then use different back ends.
Generate code for a virtual machine, then build a compiler or interpreter to translate virtual machine code to a specific machine code.
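The second method can be illustrated with a minimal sketch. The instruction set below (OP_PUSH, OP_ADD, OP_MUL, OP_HALT) and the function vm_run are hypothetical names chosen for illustration, not part of the slides. A front end would emit such instructions once; each target then only needs its own interpreter or translator for them.

```c
#include <stddef.h>

/* Hypothetical virtual-machine instruction set (an assumption, not from the slides). */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } OpCode;

typedef struct {
    OpCode op;
    int operand;            /* used only by OP_PUSH */
} Instr;

/* One possible back end: interpret the VM code directly.
 * Returns the value on top of the stack when OP_HALT is reached.
 * Sketch only: no stack overflow or underflow checks. */
int vm_run(const Instr *code)
{
    int stack[64];
    size_t sp = 0;

    for (size_t pc = 0; ; pc++) {
        switch (code[pc].op) {
        case OP_PUSH: stack[sp++] = code[pc].operand;  break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_HALT: return sp ? stack[sp - 1] : 0;
        }
    }
}
```

For example, the expression (2 + 3) * 4 would be emitted as PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT, and the same instruction sequence could be fed to a different back end that emits native code instead of interpreting.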
-
Bootstrapping
Process of writing a compiler (or assembler) in the target programming language which it is intended to compile.
Applying this technique leads to a self-hosting compiler.
Many compilers for many programming languages are bootstrapped, including compilers for BASIC, ALGOL, C, Pascal, PL/I, Factor, Haskell, Modula-2, Oberon, OCaml, Common Lisp, Scheme, Java, Python, Scala, Nimrod, Eiffel, and more.
-
Formal Languages
Already studied
-
Roles of Scanner
Removal of comments
Case conversion
Removal of white spaces: blanks, tabs, carriage returns and line feeds
Interpretation of compiler directives: #include, #ifdef, #ifndef and #define are directives to redirect the input of the compiler
May be done by a precompiler
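A minimal sketch of the first three roles follows, assuming C-style /* ... */ comments and a case-insensitive source language. The function name scan_clean is hypothetical. Each run of white space or comments is reduced to a single blank so that neighbouring lexemes stay separated. Directive handling is left out.

```c
#include <ctype.h>
#include <stddef.h>

/* Copies src to dst while dropping comments, collapsing white space
 * (blank, tab, carriage return, line feed) and converting to lower case.
 * dst must have room for strlen(src) + 1 characters. */
void scan_clean(const char *src, char *dst)
{
    size_t out = 0;
    int pending_space = 0;      /* set when white space or a comment was skipped */

    for (size_t i = 0; src[i] != '\0'; ) {
        if (src[i] == '/' && src[i + 1] == '*') {
            i += 2;             /* skip the comment body up to the closing marker */
            while (src[i] != '\0' && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            if (src[i] != '\0')
                i += 2;
            pending_space = 1;
        } else if (src[i] == ' ' || src[i] == '\t' ||
                   src[i] == '\r' || src[i] == '\n') {
            i++;
            pending_space = 1;
        } else {
            if (pending_space && out > 0)
                dst[out++] = ' ';   /* one separator between lexemes */
            pending_space = 0;
            dst[out++] = (char)tolower((unsigned char)src[i]);
            i++;
        }
    }
    dst[out] = '\0';
}
```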
-
Token: An element of the lexical definition of the language.
Lexeme: A sequence of characters identified as a token.
Pattern: A set of strings is described by a rule called a pattern associated with a token.
-
Regular Languages and Regular Expressions
Studied in Theory of Computation
-
Possible Implementations
Lexical Analyzer Generator (e.g. Lex)
+ Safe, quick
- Must learn software, unable to handle unusual situations
Table-Driven Lexical Analyzer
+ General and adaptable method; the same function can be used for all table-driven lexical analyzers
- Building the transition table can be tedious and error-prone
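As a rough sketch of the table-driven approach, the following recognizes identifiers of the form letter (letter | digit)*. The state names, the character classes and the function table_accepts are assumptions made for illustration. The driver loop never mentions identifiers, so swapping in a different table changes the language recognized without touching the function.

```c
#include <ctype.h>

enum { CLS_LETTER, CLS_DIGIT, CLS_OTHER, NUM_CLASSES };
enum { ST_START, ST_IN_ID, ST_ERROR, NUM_STATES };

/* transition[state][character class] gives the next state */
static const int transition[NUM_STATES][NUM_CLASSES] = {
    /* ST_START */ { ST_IN_ID, ST_ERROR, ST_ERROR },
    /* ST_IN_ID */ { ST_IN_ID, ST_IN_ID, ST_ERROR },
    /* ST_ERROR */ { ST_ERROR, ST_ERROR, ST_ERROR },
};

static const int accepting[NUM_STATES] = { 0, 1, 0 };

static int classify(char c)
{
    if (isalpha((unsigned char)c)) return CLS_LETTER;
    if (isdigit((unsigned char)c)) return CLS_DIGIT;
    return CLS_OTHER;
}

/* Generic driver: returns 1 if the whole string s is accepted by the table. */
int table_accepts(const char *s)
{
    int state = ST_START;
    for (; *s != '\0'; s++)
        state = transition[state][classify(*s)];
    return accepting[state];
}
```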
-
Possible Implementations
Handwritten
+ Can be optimized, can handle any unusual situation, easy to build for most languages
- Error-prone, not adaptable or maintainable
-
Design of a Lexical Analyzer
Steps
1- Construct a set of regular expressions (REs) that define the form of all valid tokens
2- Derive an NDFA from the REs
3- Derive a DFA from the NDFA
4- Translate to a state transition table
5- Implement the table
6- Implement the algorithm to interpret the table
-
Specification of tokens
Regular expressions are an important notation for specifying patterns.
Rules to define regular expressions
Limitations of regular expressions
Cannot describe balanced or nested constructs.
Repeating strings cannot be described, e.g. {wcw | w is a string of a's and b's}
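To see why {wcw} is out of reach for a regular expression, note that a recognizer has to remember all of w, which can be arbitrarily long, in order to compare it with the second half. A finite automaton has only a fixed number of states and so cannot do this. The sketch below (is_wcw is a hypothetical helper) does the check by comparing the two halves held in memory.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if s has the form w c w, where w is a string of a's and b's. */
int is_wcw(const char *s)
{
    const char *c = strchr(s, 'c');        /* the separator */
    if (c == NULL)
        return 0;

    size_t n = (size_t)(c - s);            /* length of the first w */
    if (strlen(c + 1) != n)
        return 0;

    for (size_t i = 0; i < n; i++)
        if (s[i] != 'a' && s[i] != 'b')
            return 0;

    /* the comparison needs the whole first half to be available */
    return strncmp(s, c + 1, n) == 0;
}
```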
-
Regular Expressions
{ } : { }
s : {s | s in s^}
a : {a}
r | s : {r | r in r^} or {s | s in s^}
s* : {s^n | s in s^ and n >= 0}
s+ : {s^n | s in s^ and n >= 1}
id -> letter (letter | digit)*
Num -> digit+ (. digit+)? (E (+ | -)? digit+)?
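The Num pattern can be checked by hand with a few lines of C. This is only a sketch: matches_num and skip_digits are hypothetical names, and the function tests whether an entire string matches digit+ (. digit+)? (E (+ | -)? digit+)? rather than finding the longest matching prefix as a scanner would.

```c
#include <ctype.h>

/* Advances past a (possibly empty) run of digits. */
static const char *skip_digits(const char *p)
{
    while (isdigit((unsigned char)*p))
        p++;
    return p;
}

/* Returns 1 if the whole string s matches the Num pattern. */
int matches_num(const char *s)
{
    const char *p = skip_digits(s);
    if (p == s)
        return 0;                       /* digit+ needs at least one digit */

    if (*p == '.') {                    /* optional fraction: . digit+ */
        const char *q = skip_digits(p + 1);
        if (q == p + 1)
            return 0;
        p = q;
    }

    if (*p == 'E') {                    /* optional exponent: E (+|-)? digit+ */
        const char *q = p + 1;
        if (*q == '+' || *q == '-')
            q++;
        const char *r = skip_digits(q);
        if (r == q)
            return 0;
        p = r;
    }

    return *p == '\0';
}
```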
-
Recognition of tokens
Transition diagrams: As an intermediate step in the construction of a lexical analyzer, we produce a stylized flowchart, called a transition diagram.
[Figure: Transition diagram for identifiers and keywords. From the start state 9, a letter leads to state 10; state 10 loops on letter or digit; any other character leads to state 11, which performs return(gettoken(), install_id()).]
-
Implementing a transition diagram
A sequence of transition diagrams can be converted into a program to look for the tokens specified by the diagrams. Program size is proportional to the number of states and edges in the diagrams.
[Figure: Transition diagram for numbers. From the start state 25, a digit leads to state 26; state 26 loops on digit; any other character leads to state 27.]
C code for Lexical Analyzer is:
token nexttoken()
{
    while (1) {
        switch (state) {
        case 0:
            c = nextchar();   /* c is lookahead character */
            if (c == blank || c == tab || c == newline) {
                state = 0;
                lexeme_beginning++;   /* advance beginning of lexeme */
            }
            else if (c == '>') state = 6;
            else state = fail();
            break;
        /* cases 1-8 here */
        case 9:
            c = nextchar();
            if (isletter(c)) state = 10;
            else state = fail();
            break;
        case 10:
            c = nextchar();
            if (isletter(c)) state = 10;
            else if (isdigit(c)) state = 10;
            else state = 11;
            break;
        case 11:
            retract(1);   /* give the lookahead character back */
            install_id();
            return ( gettoken() );
        ... /* cases 12-24 here */
        case 25:
            c = nextchar();
            if (isdigit(c)) state = 26;
            else state = fail();
            break;
        case 26:
            c = nextchar();
            if (isdigit(c)) state = 26;
            else state = 27;
            break;
        case 27:
            retract(1);   /* give the lookahead character back */
            install_num();
            return ( NUM );
        }
    }
}
-
Gettoken()
Looks for lexeme in symbol table. If lexeme is keyword, corresponding token is returned; otherwise token id is returned.
Install_id()
Has access to buffer, where the identifier lexeme is located.
Symbol table is examined, and if the lexeme is found marked as a keyword, it returns 0.
If the lexeme is found and is a program variable, it returns a pointer to the symbol table entry.
If not found in the symbol table, it is installed as a variable and a pointer to the newly created entry is returned.
Install_num()
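The behaviour described for gettoken() and install_id() could look roughly like the sketch below. It rests on several assumptions: the lexeme is passed as a parameter instead of being read from the input buffer, the symbol table is a small fixed array searched linearly, and the token codes (TOK_ID, TOK_IF, TOK_THEN) and the names with the _sketch suffix are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

enum { TOK_ID = 256, TOK_IF, TOK_THEN };    /* hypothetical token codes */

typedef struct {
    char lexeme[32];
    int token;              /* keyword code, or TOK_ID for a program variable */
} SymEntry;

#define MAX_SYMS 64
static SymEntry symtab[MAX_SYMS] = {
    { "if", TOK_IF }, { "then", TOK_THEN }, /* keywords are pre-loaded */
};
static int sym_count = 2;

static SymEntry *lookup(const char *lexeme)
{
    for (int i = 0; i < sym_count; i++)
        if (strcmp(symtab[i].lexeme, lexeme) == 0)
            return &symtab[i];
    return NULL;
}

/* Keyword: returns NULL (the "0" of the slide).
 * Known variable: returns its entry. Unknown: installs it as a variable. */
SymEntry *install_id_sketch(const char *lexeme)
{
    SymEntry *e = lookup(lexeme);
    if (e != NULL)
        return e->token == TOK_ID ? e : NULL;

    /* sketch only: a real table would grow or report an error here */
    if (sym_count == MAX_SYMS || strlen(lexeme) >= sizeof symtab[0].lexeme)
        return NULL;

    e = &symtab[sym_count++];
    strcpy(e->lexeme, lexeme);
    e->token = TOK_ID;
    return e;
}

/* Returns the keyword token if the lexeme is a keyword, otherwise TOK_ID. */
int gettoken_sketch(const char *lexeme)
{
    SymEntry *e = lookup(lexeme);
    return e != NULL ? e->token : TOK_ID;
}
```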
-
Derive NDFA from REs
Could derive DFA from REs but:
Much easier to do NDFA, then derive DFA
No standard way of deriving DFAs from REs
Use Thompson's construction (Louden's)
[Figure: NDFA diagram with edges labeled letter and digit]
-
Derive DFA from NDFA
Use subset construction (Louden's)
May be optimized
Easier to implement:
No ε edges
Deterministic (no backtracking)
[Figure: DFA diagram with edges labeled letter, digit and [other]]
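The subset construction can be sketched compactly by representing each set of NDFA states as a bitmask. The four-state NDFA below is a hand-made example for letter (letter | digit)* with two ε edges. It is an assumption for illustration, not the automaton from the slide's figure, and names such as build_dfa are hypothetical.

```c
#define NFA_STATES 4
#define NSYMS 2             /* symbol 0 = letter, symbol 1 = digit */
#define MAX_DFA 16          /* at most 2^NFA_STATES subsets */

/* Example NDFA: 0 -letter-> 1, 1 -eps-> 2, 2 -letter/digit-> 3, 3 -eps-> 2 */
static const unsigned nfa_eps[NFA_STATES] = { 0, 1u << 2, 0, 1u << 2 };
static const unsigned nfa_move[NFA_STATES][NSYMS] = {
    { 1u << 1, 0 }, { 0, 0 }, { 1u << 3, 1u << 3 }, { 0, 0 },
};

/* Adds every state reachable through eps edges until nothing changes. */
static unsigned closure(unsigned set)
{
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < NFA_STATES; s++)
            if (set & (1u << s))
                set |= nfa_eps[s];
    } while (set != prev);
    return set;
}

/* States reachable from any state of the set on one input symbol. */
static unsigned move_on(unsigned set, int sym)
{
    unsigned result = 0;
    for (int s = 0; s < NFA_STATES; s++)
        if (set & (1u << s))
            result |= nfa_move[s][sym];
    return result;
}

/* Fills dfa_sets (one NDFA state set per DFA state) and dfa_trans
 * (-1 means no transition). Returns the number of DFA states. */
int build_dfa(unsigned dfa_sets[MAX_DFA], int dfa_trans[MAX_DFA][NSYMS])
{
    int count = 0;
    dfa_sets[count++] = closure(1u << 0);       /* start state */

    for (int i = 0; i < count; i++) {           /* dfa_sets doubles as the worklist */
        for (int sym = 0; sym < NSYMS; sym++) {
            unsigned target = closure(move_on(dfa_sets[i], sym));
            if (target == 0) {
                dfa_trans[i][sym] = -1;
                continue;
            }
            int j = 0;
            while (j < count && dfa_sets[j] != target)
                j++;
            if (j == count)
                dfa_sets[count++] = target;     /* a new DFA state */
            dfa_trans[i][sym] = j;
        }
    }
    return count;
}
```

A DFA state is accepting whenever its set contains an accepting NDFA state (state 2 in this example). The resulting table has no ε edges and at most one successor per symbol, which is what makes it easier to implement.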
-
Implementation Concerns
Backtracking
Principle: A token is normally recognized only when the next character is read.
Problem: Maybe this character is part of the next token.
Example: x
-
Implementation Concerns
Ambiguity
Problem: Some tokens' lexemes are subsets of other tokens.
Example: n-1. Is it or ?
Solutions:
Postpone the decision to the syntactic analyzer
Do not allow sign prefix to numbers in the lexical specification
Interact with the syntactic analyzer to find a solution. (Induces coupling)
-
Example
Alphabet: {:,*,=,(,),,{,},[a..z],[0..9]}
Simple tokens: {(,),{,},:,}
Composite tokens: {:=, >=, ..., (*, *)}
-
Example
Ambiguity problems:
Character   Possible tokens
:           :, :=
>           >, >=
<
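One common way to resolve this kind of ambiguity is to look one character ahead and prefer the longer token. The sketch below covers only the two complete rows of the table. The token names and the function scan_operator are hypothetical.

```c
typedef enum { T_COLON, T_ASSIGN, T_GT, T_GE, T_UNKNOWN } Tok;

/* Recognizes the operator at the start of input and stores the number of
 * characters it consumes in *length. The longer token wins when both fit. */
Tok scan_operator(const char *input, int *length)
{
    switch (input[0]) {
    case ':':
        if (input[1] == '=') { *length = 2; return T_ASSIGN; }
        *length = 1;
        return T_COLON;
    case '>':
        if (input[1] == '=') { *length = 2; return T_GE; }
        *length = 1;
        return T_GT;
    default:
        *length = 0;            /* not one of the handled characters */
        return T_UNKNOWN;
    }
}
```

When the lookahead character does not extend the token, it is left unconsumed for the next call, which is the same idea as retract(1) in a transition-diagram implementation.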