8/8/2019 BHIMKAYA
http://slidepdf.com/reader/full/bhimkaya 1/40
SRS
(SOFTWARE REQUIREMENT SPECIFICATION)
1. INTRODUCTION
COMPILER
Simply stated, a compiler is a program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target language. As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.

[Figure: source program → Compiler → target program, with error messages reported to the user]
Compilers are sometimes classified as single-pass, multi-pass, load-and-go, debugging, or optimizing, depending on how they have been constructed or on what function they are supposed to perform. Despite this apparent complexity, the basic tasks that any compiler must perform are essentially the same.
THE PHASES OF A COMPILER
Conceptually, a compiler operates in phases, each of which
transforms the source program from one representation to
another.
The first three phases form the bulk of the analysis portion of a compiler. Symbol table management and error handling are shown interacting with the six phases.
Symbol table management
An essential function of a compiler is to record the identifiers
used in the source program and collect information about various
attributes of each identifier. A symbol table is a data structure
containing a record for each identifier, with fields for the
attributes of the identifier. The data structure allows us to find
the record for each identifier quickly and to store or retrieve data
from that record quickly. When an identifier in the source program is detected by the lexical analyzer, the identifier is entered into the symbol table.
Error Detection and Reporting
Each phase can encounter errors. A compiler that stops when it
finds the first error is not as helpful as it could be.
The syntax and semantic analysis phases usually handle a large
fraction of the errors detectable by the compiler. The lexical
phase can detect errors where the characters remaining in the
input do not form any token of the language. Errors when the
token stream violates the syntax of the language are determined
by the syntax analysis phase. During semantic analysis the
compiler tries to detect constructs that have the right syntactic
structure but no meaning to the operation involved.
THE ANALYSIS PHASES:-
As translation progresses, the compiler’s internal representation of the source program changes. Consider the statement
position := initial + rate * 10
The lexical analysis phase reads the characters in the source program and groups them into a stream of tokens in which each token represents a logically cohesive sequence of characters, such as an identifier, a keyword, etc. The character sequence forming a token is called the lexeme for the token. Certain tokens will be augmented by a ‘lexical value’. For example, for any identifier the lexical analyzer generates not only the token id but also enters the lexeme into the symbol table, if it is not already present there. The lexical value associated with this occurrence of id points to the symbol table entry for this lexeme. The representation of the statement given above after lexical analysis would be:
id1 := id2 + id3 * 10
Syntax analysis imposes a hierarchical structure on the token
stream, which is shown by syntax trees.
THE SYNTHESIS PHASES:-
Intermediate Code Generation
After syntax and semantic analysis, some compilers generate an
explicit intermediate representation of the source program. This
intermediate representation can have a variety of forms.
In three-address code, the source program might look like this:
temp1 := inttoreal(10)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1 := temp3
Code Optimisation
The code optimization phase attempts to improve the
intermediate code, so that faster running machine codes will
result. Some optimizations are trivial. There is a great variation in
the amount of code optimization different compilers perform. In
those that do the most, called ‘optimising compilers’, a significant
fraction of the time of the compiler is spent on this phase.
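For the statement above, a trivial but classic improvement (as given in Aho & Ullman) is to perform the conversion of 10 once at compile time and to eliminate the copy through temp3:

```
temp1 := id3 * 10.0
id1   := id2 + temp1
```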
Code Generation
The final phase of the compiler is the generation of target code,
consisting normally of relocatable machine code or assembly
code. Memory locations are selected for each of the variables
used by the program. Then, intermediate instructions are each
translated into a sequence of machine instructions that perform
the same task. A crucial aspect is the assignment of variables to
registers.
LEXICAL ANALYSIS
Lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical
analysis are called lexical analyzers or lexers. A lexer is often
organized as separate scanner and tokenizer functions, though
the boundaries may not be clearly defined.
The purpose of the lexical analyzer is to partition the input text,
delivering a sequence of comments and basic symbols.
Comments are character sequences to be ignored, while basic
symbols are character sequences that correspond to terminal
symbols of the grammar defining the phrase structure of the
input.
A simple way to build a lexical analyzer is to construct a diagram that illustrates the structure of the tokens of the source language, and then to hand-translate the diagram into a program for finding tokens. Efficient lexical analyzers can be produced in this manner.
Role of a Lexical Analyzer
The lexical analyzer is the first phase of compiler. Its main task is
to read the input characters and produce as output a sequence of
tokens that the parser uses for syntax analysis. As in the figure,
upon receiving a “get next token” command from the parser the
lexical analyzer reads input characters until it can identify the
next token.
Fig. Interaction of lexical analyzer with parser (the parser issues “get next token”; the analyzer reads the source program and returns a token; both consult the symbol table).
Since the lexical analyzer is the part of the compiler that reads
the source text, it may also perform certain secondary tasks at
the user interface. One such task is stripping out from the source
program comments and white space in the form of blank, tab, and newline characters. Another is correlating error messages from the compiler with the source program.
Issues in Lexical Analysis
There are several reasons for separating the analysis phase of
compiling into lexical analysis and parsing.
1) Simpler design is the most important consideration. The
separation of lexical analysis from syntax analysis often allows us
to simplify one or the other of these phases.
2) Compiler efficiency is improved.
3) Compiler portability is enhanced.
Tokens, Patterns and Lexemes
There is a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule called a pattern associated with the token. The pattern is said to match each string in the set. A lexeme is a sequence of characters in the source program that is matched by the pattern for the token. For example, in the Pascal statement const pi = 3.1416; the substring pi is a lexeme for the token identifier.
In most programming languages, the following constructs are treated as tokens: keywords, operators, identifiers, constants, literal strings, and punctuation symbols such as parentheses, commas, and semicolons.
TOKEN      SAMPLE LEXEMES           INFORMAL DESCRIPTION OF PATTERN
const      const                    const
if         if                       if
relation   <, <=, =, <>, >, >=      < or <= or = or <> or > or >=
id         pi, count, D2            letter followed by letters and digits
num        3.1416, 0, 6.02E23       any numeric constant
literal    "core dumped"            any characters between " and " except "
In the example, when the character sequence pi appears in the source program, the token representing an identifier is returned to the parser. The returning of a token is often implemented by passing an integer corresponding to the token. It is this integer that is referred to as the boldface id in the above table.
A pattern is a rule describing a set of lexemes that can represent
a particular token in source program. The pattern for the token
const in the above table is just the single string const that spells
out the keyword.
Certain language conventions impact the difficulty of lexical analysis. Languages such as FORTRAN require certain constructs in fixed positions on the input line. Thus the alignment of a lexeme may be important in determining the correctness of a source program.
Attributes of Token
The lexical analyzer returns to the parser a representation for the token it has found. The representation is an integer code if the token is a simple construct such as a left parenthesis, comma, or colon. The representation is a pair consisting of an integer code and a pointer to a table if the token is a more complex element such as an identifier or constant. The integer code gives the token type; the pointer points to the value of that token. Pairs are also returned whenever we wish to distinguish between instances of a token.
Regular Expressions
In Pascal, an identifier is a letter followed by zero or more letters
or digits. Regular expressions allow us to define precisely sets
such as this. With this notation, Pascal identifiers may be defined
as
letter (letter | digit)*
The vertical bar here means “or”, the parentheses are used to group subexpressions, the star means “zero or more instances of” the parenthesized expression, and the juxtaposition of letter with the remainder of the expression means concatenation.
A regular expression is built up out of simpler regular expressions
using set of defining rules. Each regular expression r denotes a
language L(r). The defining rules specify how L(r) is formed by
combining in various ways the languages denoted by the
subexpressions of r .
Recognition of Tokens
The question of how to recognize the tokens is handled in this
section. The language generated by the following grammar is
used as an example.
Consider the following grammar fragment:
stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε
expr → term relop term
     | term
term → id
     | num
where the terminals if, then, else, relop, id and num generate sets of strings given by the following regular definitions:
if → if
then → then
else → else
relop → < | <= | = | <> | > | >=
id → letter ( letter | digit )*
num → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
For this language fragment the lexical analyzer will
recognize the keywords if, then, else, as well as the lexemes
denoted by relop, id, and num. To simplify matters, we assume keywords are reserved; that is, they cannot be used as identifiers. Unsigned integer and real numbers of Pascal are represented by num.
In addition, we assume lexemes are separated by white space, consisting of nonnull sequences of blanks, tabs and newlines. Our lexical analyzer will strip out white space. It will do so by comparing a string against the regular definition ws, below.
delim → blank | tab | newline
ws → delim+
If a match for ws is found, the lexical analyzer does not return a
token to the parser. Rather, it proceeds to find a token following
the white space and returns that to the parser. Our goal is to
construct a lexical analyzer that will isolate the lexeme for the
next token in the input buffer and produce as output a pair
consisting of the appropriate token and attribute value, using the
translation table given in the figure. The attribute values for the
relational operators are given by the symbolic constants
LT,LE,EQ,NE,GT,GE.
Transition diagram
A transition diagram is a stylized flowchart. Transition diagram is
used to keep track of information about characters that are seen
as the forward pointer scans the input. We do so by moving from
position to position in the diagrams as characters are read.
Positions in a transition diagram are drawn as circles and are called states. The states are connected by arrows, called edges.
Edges leaving state s have labels indicating the input characters that can next appear after the transition diagram has reached state s. The label other refers to any character that is not indicated by any of the other edges leaving s.
One state is labeled as the start state; it is the initial state of the transition diagram where control resides when we begin to recognize a token. Certain states may have actions that are executed when the flow of control reaches that state. On entering a state we read the next input character. If there is an edge from the current state whose label matches this input character, we then go to the state pointed to by the edge. Otherwise we indicate failure. A transition diagram for >= is shown in the figure.
A recognizer for a language is a program that takes as input a string x and answers ‘yes’ if x is a sentence of the language and ‘no’ otherwise. We compile a regular expression into a recognizer by constructing a transition diagram called a finite automaton. A finite automaton can be deterministic or nondeterministic, where nondeterministic means that more than one transition out of a state may be possible on the same input symbol.
Fig 5. Transition diagram for >= (start state 0; ‘>’ leads to state 6, ‘=’ to accepting state 7, other to accepting state 8)
DFAs are faster recognizers than NFAs, but can be much bigger than equivalent NFAs.
Non deterministic finite automata
A mathematical model consisting of:
1) a set of states S
2) an input alphabet
3) a transition function
4) an initial state
5) a set of final states
Lexical grammar
The specification of a programming language will include a set of rules, often expressed syntactically, specifying the set of possible character sequences that can form a token or lexeme. The whitespace characters are often ignored during lexical analysis.
Token
A token is a categorized block of text. The block of text corresponding to the token is known as a lexeme. A lexical analyzer processes lexemes to categorize them according to function, giving them meaning. This assignment of meaning is known as tokenization. A token can look like anything; it just needs to be a useful part of the structured text.
Consider this expression in the C programming language:
sum=3+2;
Tokenized in the following table:
lexeme   token type
sum      IDENTIFIER
=        OPERATOR
3        CONSTANT
+        OPERATOR
2        CONSTANT
;        SPECIAL CHARACTER
Tokens are frequently defined by regular expressions,
which are understood by a lexical analyzer generator such as lex. The lexical analyzer (either generated automatically by a tool like lex, or hand-crafted) reads in a stream of characters, identifies the lexemes in the stream, and categorizes them into tokens. This is called "tokenizing." If the lexer finds an invalid token, it will report an error.
Following tokenizing is parsing. From there, the interpreted data may be loaded into data structures, for general use, interpretation, or compiling.
2. PURPOSE
The first phase of the construction of a compiler is the
generation of a LEXICAL ANALYZER. The project aims at
building such an analyzer.
The program has been made on the concept of NFA (Non
Deterministic Finite Automata) for recognizing the
characters in the input and classifying them as tokens.
NFA (Non-Deterministic Finite Automata) is a mathematical model consisting of:
1) a set of states (S)
2) an input alphabet (∑)
3) a transition function (δ)
4) an initial state (q0)
5) a set of final states (F)
A lexical analyzer generator creates a lexical analyser using a set
of specifications usually in the format
p1 {action 1}
p2 {action 2}
. . . . . . . . . . . .
pn {action n}
where pi is a regular expression and each action actioni is a program fragment that is to be executed whenever a lexeme matched by pi is found in the input. If more than one pattern matches, then the longest lexeme matched is chosen. If there are two or more patterns that match the longest lexeme, the first listed matching pattern is chosen.
This is usually implemented using a finite automaton. There is an
input buffer with two pointers to it, a lexeme-beginning and a
forward pointer. The lexical analyser generator constructs a
transition table for a finite automaton from the regular expression
patterns in the lexical analyser generator specification. The lexical
analyser itself consists of a finite automaton simulator that uses this transition table to look for the regular expression patterns in the input buffer.
This can be implemented using an NFA or a DFA. The transition
table for an NFA is considerably smaller than that for a DFA, but
the DFA recognises patterns faster than the NFA.
Using NFA
The transition table for the NFA N is constructed for the
composite pattern p1|p2|. . .|pn, The NFA recognises the longest
prefix of the input that is matched by a pattern. In the final NFA, there is an accepting state for each pattern pi. The set of states the NFA can be in after seeing each input character is constructed. The NFA is simulated until it reaches termination or it reaches a set of states from which there is no transition defined for the current input symbol. The specification for the lexical analyser generator is such that a valid source program cannot entirely fill the input buffer without having the NFA reach termination. To find a correct match two things are done. Firstly,
whenever an accepting state is added to the current set of states,
the current input position and the pattern pi corresponding to this accepting state are recorded. If the current set of states
already contains an accepting state, then only the pattern that
appears first in the specification is recorded. Secondly, the
transitions are recorded until termination is reached. Upon
termination, the forward pointer is retracted to the position at
which the last match occurred. The pattern making this match
identifies the token found, and the lexeme matched is the string
between the lexeme beginning and forward pointers. If no
pattern matches, the lexical analyser should transfer control to
some default recovery routine.
3. REQUIREMENTS
HARDWARE SPECIFICATIONS
• minimum 128 MB RAM
• Pentium processor
SOFTWARE SPECIFICATIONS
• Operating system (Windows XP/Vista (64-bit)/ME/2000/98)
• Turbo C/C++ IDE
CODING

/* Program on lexical analysis */

#include<stdio.h>
#include<conio.h>
#include<graphics.h>
#include<ctype.h>
#include<string.h>

#define MAX 30

/* Title screen */
void first()
{
    int gd=DETECT, gm;
    initgraph(&gd, &gm, "c:\\tc\\bgi");
    setcolor(GREEN);
    settextstyle(10, 0, 7);
    outtextxy(130, 50, "LEXICAL");
    setcolor(YELLOW);
    settextstyle(10, 0, 7);
    outtextxy(90, 190, "ANALYSIS");
    settextstyle(1, 0, 4);
    getch();
    restorecrtmode();
}

/* "Submitted by" screen */
void second()
{
    int gdriver=DETECT, gmod;
    initgraph(&gdriver, &gmod, "c:\\tc\\bgi");
    setcolor(RED);
    rectangle(20, 85, 615, 435);
    rectangle(25, 90, 610, 430);
    setcolor(GREEN);
    settextstyle(1, 0, 4);
    outtextxy(30, 30, "SUBMITTED BY:-");
    settextstyle(6, 0, 1);
    setcolor(MAGENTA);
    outtextxy(40, 130, "NAME :- KHUSHBOO SHARMA");
    outtextxy(40, 160, "BRANCH :- CSE(B) - V SEM");
    outtextxy(40, 190, "ROLL NO. :- 0609210046");
    outtextxy(40, 220, "COLLEGE :- PCCS");
    setcolor(YELLOW);
    settextstyle(1, 0, 2);   /* char size must be an integer (was 2.5) */
    outtextxy(300, 250, "&");
    settextstyle(6, 0, 1);
    setcolor(BLUE);
    outtextxy(350, 280, "NAME :- NIHARIKA SETH");
    outtextxy(350, 310, "BRANCH :- CSE(B) - V SEM");
    outtextxy(350, 340, "ROLL NO. :- 0609210067");
    outtextxy(350, 370, "COLLEGE :- PCCS");
    getch();
    restorecrtmode();
}

/* Instructions screen */
void next()
{
    int gdriver=DETECT, gmod;
    initgraph(&gdriver, &gmod, "c:\\tc\\bgi");
    setcolor(RED);
    rectangle(20, 85, 615, 435);
    rectangle(25, 90, 610, 430);
    settextstyle(7, 0, 1);
    setcolor(GREEN+BLINK);
    outtextxy(110, 110, "ENTER THE CODE TO BE ANALYSED.");
    outtextxy(110, 160, "THE PROGRAM WILL FIND THE");
    outtextxy(110, 210, "VARIOUS TOKENS PRESENT IN THE");
    outtextxy(110, 260, "INPUT AND PROVIDE YOU WITH THE SAME");
    getch();
    restorecrtmode();
}
void main()
{
    char str[MAX];
    int state=0;
    int i=0, j, startid=0, endid, startcon, endcon;

    clrscr();
    first();
    for(j=0; j<MAX; j++)        /* initialise the buffer to NULL */
        str[j]=NULL;
    second();
    next();

    printf("\nEnter the string to be analysed:\n\n");
    gets(str);                  /* accept the input string */
    str[strlen(str)]=' ';
    gotoxy(400,110);
    printf("Analysis:\n\n");

    while(str[i]!=NULL)
    {
        while(str[i]==' ')      /* eliminate spaces */
            i++;

        switch(state)
        {
        case 0: if(str[i]=='i') state=1;                 /* if */
                else if(str[i]=='w') state=3;            /* while */
                else if(str[i]=='d') state=8;            /* do */
                else if(str[i]=='e') state=10;           /* else */
                else if(str[i]=='f') state=14;           /* for */
                else if(isalpha(str[i]) || str[i]=='_')
                {
                    state=17; startid=i;                 /* identifiers */
                }
                else if(str[i]=='<') state=19;   /* relational '<' or '<=' */
                else if(str[i]=='>') state=21;   /* relational '>' or '>=' */
                else if(str[i]=='=') state=23;   /* relational '==' or assignment '=' */
                else if(isdigit(str[i]))
                {
                    state=25; startcon=i;                /* constant */
                }
                else if(str[i]=='(') state=26;   /* special character '(' */
                else if(str[i]==')') state=27;   /* special character ')' */
                else if(str[i]==';') state=28;   /* special character ';' */
                else if(str[i]=='+') state=29;   /* operator '+' */
                else if(str[i]=='-') state=30;   /* operator '-' */
                break;

        /* States for 'if' */
        case 1: if(str[i]=='f') state=2;
                else { state=17; startid=i-1; i--; }
                break;
        case 2: if(str[i]=='(' || str[i]==NULL)
                {
                    printf("if : Keyword\n\n");
                    state=0;
                    i--;
                }
                else { state=17; startid=i-2; i--; }
                break;
        /* States for 'while' */
        case 3: if(str[i]=='h') state=4;
                else { state=17; startid=i-1; i--; }
                break;
        case 4: if(str[i]=='i') state=5;
                else { state=17; startid=i-2; i--; }
                break;
        case 5: if(str[i]=='l') state=6;
                else { state=17; startid=i-3; i--; }
                break;
        case 6: if(str[i]=='e') state=7;
                else { state=17; startid=i-4; i--; }
                break;
        case 7: if(str[i]=='(' || str[i]==NULL)
                {
                    printf("while : Keyword\n\n");
                    state=0;
                    i--;
                }
                else { state=17; startid=i-5; i--; }
                break;

        /* States for 'do' */
        case 8: if(str[i]=='o') state=9;
                else { state=17; startid=i-1; i--; }
                break;
        case 9: if(str[i]=='{' || str[i]==' ' || str[i]==NULL || str[i]=='(')
                {
                    printf("do : Keyword\n\n");
                    state=0;
                    i--;
                }
                break;
        /* States for 'else' */
        case 10: if(str[i]=='l') state=11;
                 else { state=17; startid=i-1; i--; }
                 break;
        case 11: if(str[i]=='s') state=12;
                 else { state=17; startid=i-2; i--; }
                 break;
        case 12: if(str[i]=='e') state=13;
                 else { state=17; startid=i-3; i--; }
                 break;
        case 13: if(str[i]=='{' || str[i]==NULL)
                 {
                     printf("else : Keyword\n\n");
                     state=0;
                     i--;
                 }
                 else { state=17; startid=i-4; i--; }
                 break;

        /* States for 'for' */
        case 14: if(str[i]=='o') state=15;
                 else { state=17; startid=i-1; i--; }
                 break;
        case 15: if(str[i]=='r') state=16;
                 else { state=17; startid=i-2; i--; }
                 break;
        case 16: if(str[i]=='(' || str[i]==NULL)
                 {
                     printf("for : Keyword\n\n");
                     state=0;
                     i--;
                 }
                 else { state=17; startid=i-3; i--; }
                 break;

        /* States for identifiers */
        case 17: if(isalnum(str[i]) || str[i]=='_')
                 {
                     state=18; i++;
                 }
                 else if(str[i]==NULL || str[i]=='<' || str[i]=='>' ||
                         str[i]=='(' || str[i]==')' || str[i]==';' ||
                         str[i]=='=' || str[i]=='+' || str[i]=='-') state=18;
                 i--;     /* note: executed on every path through case 17 */
                 break;
        case 18: if(str[i]==NULL || str[i]=='<' || str[i]=='>' || str[i]=='(' ||
                    str[i]==')' || str[i]==';' || str[i]=='=' || str[i]=='+' ||
                    str[i]=='-')
                 {
                     endid=i-1;
                     for(j=startid; j<=endid; j++)
                         printf("%c", str[j]);
                     printf(" : Identifier\n\n");
                     state=0;
                     i--;
                 }
                 break;

        /* States for relational operator '<' and '<=' */
        case 19: if(str[i]=='=') state=20;
                 else if(isalnum(str[i]) || str[i]=='_')
                 {
                     printf("< : Relational operator\n\n");
                     i--;
                     state=0;
                 }
                 break;
        case 20: if(isalnum(str[i]) || str[i]=='_')
                 {
                     printf("<= : Relational operator\n\n");
                     i--;
                     state=0;
                 }
                 break;
        /* States for relational operator '>' and '>=' */
        case 21: if(str[i]=='=') state=22;
                 else if(isalnum(str[i]) || str[i]=='_')
                 {
                     printf("> : Relational operator\n\n");
                     i--;
                     state=0;
                 }
                 break;
        case 22: if(isalnum(str[i]) || str[i]=='_')
                 {
                     printf(">= : Relational operator\n\n");
                     i--;
                     state=0;
                 }
                 break;

        /* States for relational operator '==' and assignment operator '=' */
        case 23: if(str[i]=='=') state=24;
                 else
                 {
                     printf("= : Assignment operator\n\n");
                     i--;
                     state=0;
                 }
                 break;
        case 24: if(isalnum(str[i]))
                 {
                     printf("== : Relational operator\n\n");
                     state=0;
                     i--;
                 }
                 break;

        /* States for constants */
        case 25: if(isalpha(str[i]))
                 {
                     printf("*** ERROR ***\n\n");
                     puts(str);
                     for(j=0; j<i; j++)
                         printf(" ");
                     printf("^");
                     printf("Error at position %d : Alphabet cannot follow digit\n", i);
                     state=99;
                 }
                 else if(str[i]=='(' || str[i]==')' || str[i]=='<' || str[i]=='>' ||
                         str[i]==NULL || str[i]==';' || str[i]=='=')
                 {
                     endcon=i-1;
                     for(j=startcon; j<=endcon; j++)
                         printf("%c", str[j]);
                     printf(" : Constant\n\n");
                     state=0;
                     i--;
                 }
                 break;

        /* State for special character '(' */
        case 26: printf("( : Special character\n\n");
                 startid=i;
                 state=0;
                 i--;
                 break;

        /* State for special character ')' */
        case 27: printf(") : Special character\n\n");
                 state=0;
                 i--;
                 break;

        /* State for special character ';' */
        case 28: printf("; : Special character\n\n");
                 state=0;
                 i--;
                 break;

        /* State for operator '+' */
        case 29: printf("+ : Operator\n\n");
                 state=0;
                 i--;
                 break;

        /* State for operator '-' */
        case 30: printf("- : Operator\n\n");
                 state=0;
                 i--;
                 break;

        /* Error state */
        case 99: goto END;
        }
        i++;
    }
    printf("\n\nEnd of program\n\n");
END:
    getch();
}
OUTPUT (example)

Correct input

Enter the string to be analysed : for(x1=0; x1<=10; x1++);

Analysis:

for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
; : Special character
x1 : Identifier
<= : Relational operator
10 : Constant
; : Special character
x1 : Identifier
+ : Operator
+ : Operator
) : Special character
; : Special character

End of program
Wrong input

Enter the string to be analysed : for(x1=0; x1<=19x; x++);

Analysis:

for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
; : Special character
x1 : Identifier
<= : Relational operator

*** ERROR ***
for(x1=0; x1<=19x; x++);
            ^
Error at position 12 : Alphabet cannot follow digit
LIMITATIONS

• The input text has to be entered in a single line without any indentation. This is contrary to the presently followed style for writing programs.
• The whitespace characters such as blanks, newline characters and tabs are not considered for the analysis (generation of tokens).
• If, during the analysis of the input, an invalid sequence of characters is encountered, an error message is displayed and the characters following this sequence are not analyzed.
FUTURE PROSPECTS OF THE PROJECT
• This program generates the lexical analyzer which is the first
step in the construction of a compiler.
• If the further phases of the construction process are performed correctly, an efficient compiler can be developed.
REFERENCES
1) Principles of Compiler Design
   by Alfred V. Aho & Jeffrey D. Ullman
2) Compilers: Principles, Techniques, and Tools
   by Alfred V. Aho, Ravi Sethi & Jeffrey D. Ullman
3) Websites:
www.wikipedia.org
www.crazyengineers.com
www.mec.com
www.curriri.com
ACKNOWLEDGEMENT
This project is the outcome of the efforts of several people, apart
from the team members, and it is important that their help be
acknowledged here.
First of all I want to present my sincere gratitude and deep
appreciations to Mr. ROHIT SACHAN and Mr. ABHINAV YADAV
(PROJECT INCHARGE) for their valuable support and guidance.
Without motivation, a person is literally unable to make her best
effort. I am highly grateful to them for their guidance as they
played an important role in making this project a success.
I would also like to devote my special thanks to Mrs. LALITA
VERMA (H.O.D-C.S.E) for availing us with the various required
resources.
KHUSHBOO SHARMA
0609210046
B. TECH. (5th SEM.)
COMPUTER SCIENCE & ENGINEERING
ACKNOWLEDGEMENT
This project is the outcome of the efforts of several people, apart
from the team members, and it is important that their help be
acknowledged here.
First of all I want to present my sincere gratitude and deep
appreciations to Mr. ROHIT SACHAN and Mr. ABHINAV YADAV
(PROJECT INCHARGE ) for their valuable support and guidance.
Without motivation, a person is literally unable to make her best
effort. I am highly grateful to them for their guidance as they
played an important role in making this project a success.
I would also like to devote my special thanks to Mrs. LALITA
VERMA (H.O.D-C.S.E) for availing us with the various required
resources.
NIHARIKA SETH
0609210067
B. TECH. (5th SEM.)
COMPUTER SCIENCE & ENGINEERING
MINI PROJECT ON
SUBMITTED TO:                SUBMITTED BY:
Mr. ROHIT SACHAN             KHUSHBOO SHARMA (0609210046)
&                            &
Mr. ABHINAV YADAV            NIHARIKA SETH (0609210067)
CERTIFICATE OF APPROVAL
This is to certify that KHUSHBOO SHARMA student
of B. Tech. - Computer Science & Engineering (5TH sem.)
of PRIYADARSHINI COLLEGE OF COMPUTER
SCIENCES, GREATER NOIDA (roll no. – 0609210046) has
successfully completed her mini project under my guidance.
Mrs. Lalita Verma
H.O.D. (C.S.E. department)
Priyadarshini College of Computer Sciences
CERTIFICATE OF APPROVAL
This is to certify that NIHARIKA SETH student of B.
Tech. - Computer Science & Engineering (5TH sem.) of
PRIYADARSHINI COLLEGE OF COMPUTER
SCIENCES, GREATER NOIDA (roll no. – 0609210067) has
successfully completed her mini project under my guidance.
Mrs. Lalita Verma
H.O.D. (C.S.E. department)
Priyadarshini College of Computer Sciences