Fig. LEVEL-0 DFD (DATA FLOW DIAGRAM): SOURCE CODE (input) → NFA generation → TOKENS (output)

SRS

(SOFTWARE REQUIREMENTS SPECIFICATION)

1. INTRODUCTION

COMPILER

Simply stated, a compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language). As an important part of this translation process, the compiler reports to its user the presence of errors in the source program.

Fig. A compiler: source program (input) → compiler → target program (output), with error messages reported to the user.

Compilers are sometimes classified as single-pass, multi-pass,

load-and-go, debugging, or optimizing, depending on how they have been constructed or on what function they are supposed to

perform. Despite this apparent complexity, the basic tasks that

any compiler must perform are essentially the same.



THE PHASES OF A COMPILER  

Conceptually, a compiler operates in  phases, each of which

transforms the source program from one representation to

another.

Fig. The phases of a compiler: lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization and code generation, with symbol table management and error handling interacting with all six phases.


The first three phases form the bulk of the analysis portion of a compiler. Symbol table management and error handling are shown interacting with the six phases.

Symbol table management

An essential function of a compiler is to record the identifiers

used in the source program and collect information about various

attributes of each identifier. A symbol table is a data structure

containing a record for each identifier, with fields for the

attributes of the identifier. The data structure allows us to find

the record for each identifier quickly and to store or retrieve data

from that record quickly. When an identifier in the source

program is detected by the lexical analyzer, the identifier is entered

into the symbol table.
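
To make the idea concrete, a symbol table can be kept as an array of records with a combined lookup-and-insert routine. The following C sketch is an illustration only, not part of the project code; the field names, table size and function name are assumptions.

#include <string.h>

#define TABLE_SIZE 100

struct symbol {
    char lexeme[32];   /* the identifier's character sequence       */
    int  token;        /* integer token code returned to the parser */
};

static struct symbol table[TABLE_SIZE];
static int count = 0;

/* Return the index of lexeme in the table, inserting it if absent.
   (No overflow check, for brevity.)                                */
int lookup_or_insert(const char *lexeme, int token)
{
    int i;
    for (i = 0; i < count; i++)
        if (strcmp(table[i].lexeme, lexeme) == 0)
            return i;
    strcpy(table[count].lexeme, lexeme);
    table[count].token = token;
    return count++;
}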

Error Detection and Reporting

Each phase can encounter errors. A compiler that stops when it

finds the first error is not as helpful as it could be.

The syntax and semantic analysis phases usually handle a large

fraction of the errors detectable by the compiler. The lexical

phase can detect errors where the characters remaining in the

input do not form any token of the language. Errors in which the token stream violates the syntax of the language are detected

by the syntax analysis phase. During semantic analysis the

compiler tries to detect constructs that have the right syntactic

structure but no meaning to the operation involved.


THE ANALYSIS PHASES:-

As translation progresses, the compiler's internal representation

of the source program changes. Consider the statement,

position := initial + rate * 10

The lexical analysis phase reads the characters in the source program and groups them into a stream of tokens in which each

token represents a logically cohesive sequence of characters,

such as an identifier, a keyword etc. The character sequence

forming a token is called the lexeme for the token. Certain tokens

will be augmented by a ‘lexical value’. For example, for any identifier the lexical analyzer generates not only the token id but also enters the lexeme into the symbol table, if it is not already present there. The lexical value associated with this occurrence of id

points to the symbol table entry for this lexeme. The

representation of the statement given above after the lexical

analysis would be:

id1 := id2 + id3 * 10
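
For illustration, the lexemes of the statement map to tokens roughly as follows (the token names are indicative only, not prescribed by this document):

position → id (lexical value: pointer to symbol table entry 1)
:=       → assignment symbol
initial  → id (pointer to symbol table entry 2)
+        → addition operator
rate     → id (pointer to symbol table entry 3)
*        → multiplication operator
10       → num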

Syntax analysis imposes a hierarchical structure on the token

stream, which is shown by syntax trees.

THE SYNTHESIS PHASES:-

Intermediate Code Generation

After syntax and semantic analysis, some compilers generate an

explicit intermediate representation of the source program. This

intermediate representation can have a variety of forms.

In three-address code, the source program might look like this,


temp1 := inttoreal(10)
temp2 := id3 * temp1
temp3 := id2 + temp2
id1   := temp3

Code Optimisation

The code optimization phase attempts to improve the

intermediate code, so that faster running machine codes will

result. Some optimizations are trivial. There is a great variation in

the amount of code optimization different compilers perform. In

those that do the most, called ‘optimising compilers’, a significant

fraction of the time of the compiler is spent on this phase.
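
For example (a typical textbook illustration, not taken from this project), for the intermediate code above the conversion of 10 to a real number can be done once at compile time and temp3 can be eliminated, so the four instructions reduce to two:

temp1 := id3 * 10.0
id1   := id2 + temp1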

Code Generation

The final phase of the compiler is the generation of target code,

consisting normally of relocatable machine code or assembly

code. Memory locations are selected for each of the variables

used by the program. Then, intermediate instructions are each

translated into a sequence of machine instructions that perform

the same task. A crucial aspect is the assignment of variables to

registers.
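
As an illustration only (the mnemonics MOVF, MULF, ADDF and the registers R1, R2 are assumptions, not prescribed by this document), the intermediate code for the statement above might be translated into target code such as:

MOVF id3, R2
MULF #10.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1

Deciding to keep the partial results in registers R1 and R2 is precisely the register-assignment question mentioned above.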


LEXICAL ANALYSIS

Lexical analysis is the process of converting a sequence of characters into a sequence of tokens. Programs performing lexical

analysis are called lexical analyzers or lexers. A lexer is often

organized as separate scanner and tokenizer functions, though

the boundaries may not be clearly defined.

The purpose of the lexical analyzer is to partition the input text,

delivering a sequence of  comments and basic symbols.

Comments are character sequences to be ignored, while basic

symbols are character sequences that correspond to terminal

symbols of the grammar defining the phrase structure of the

input.

A simple way to build a lexical analyzer is to construct a diagram

that illustrates the structure of the tokens of the source language,

and then to hand-translate the diagram into a program for finding

tokens. Efficient lexical analysers can be produced in this manner.

Role of a Lexical Analyzer

The lexical analyzer is the first phase of compiler. Its main task is

to read the input characters and produce as output a sequence of 

tokens that the parser uses for syntax analysis. As in the figure,

upon receiving a “get next token” command from the parser, the

lexical analyzer reads input characters until it can identify the

next token.


Fig. Interaction of lexical analyzer with parser: the lexical analyzer reads the source program and returns a token to the parser on each “get next token” request, both consulting the symbol table.

Since the lexical analyzer is the part of the compiler that reads

the source text, it may also perform certain secondary tasks at

the user interface. One such task is stripping out from the source

program comments and white space in the form of blank, tab,

and new line character. Another is correlating error messages

from the compiler with the source program.

Issues in Lexical Analysis

There are several reasons for separating the analysis phase of 

compiling into lexical analysis and parsing.

1) Simpler design is the most important consideration. The

separation of lexical analysis from syntax analysis often allows us

to simplify one or the other of these phases.

2) Compiler efficiency is improved.

3) Compiler portability is enhanced.



Tokens, Patterns and Lexemes

There is a set of strings in the input for which the same token is

produced as output. This set of strings is described by a rule

called a pattern associated with the token. The pattern is said to match each string in the set. A lexeme is a sequence of

characters in the source program that is matched by the pattern

for the token. For example, in the Pascal statement const pi =

3.1416; the substring pi is a lexeme for the token identifier.

In most programming languages, the following constructs are

treated as tokens: keywords, operators, identifiers, constants,

literal strings, and punctuation symbols such as parentheses, commas, and semicolons.

TOKEN      SAMPLE LEXEMES           INFORMAL DESCRIPTION OF PATTERN
const      const                    const
if         if                       if
relation   <, <=, =, <>, >, >=      < or <= or = or <> or >= or >
id         pi, count, D2            letter followed by letters and digits
num        3.1416, 0, 6.02E23       any numeric constant
literal    “core dumped”            any characters between “ and ” except ”

In the example when the character sequence pi appears in the

source program, the token representing an identifier is returned

to the parser. The returning of a token is often implemented by passing an integer corresponding to the token. It is this integer that is referred to as the boldface id in the above table.

A pattern is a rule describing a set of lexemes that can represent

a particular token in source program. The pattern for the token


const in the above table is just the single string const that spells

out the keyword.

Certain language conventions impact the difficulty of lexical

analysis. Languages such as FORTRAN require certain constructs in fixed positions on the input line. Thus the alignment

of a lexeme may be important in determining the correctness of a

source program.

Attributes of Token

The lexical analyzer returns to the parser a representation for the

token it has found. The representation is an integer code if the

token is a simple construct such as a left parenthesis, comma, or colon. The representation is a pair consisting of an integer code and a pointer to a table if the token is a more complex element such as an identifier or constant. The integer code gives the token type, and the pointer points to the value of that token. Pairs are also returned whenever we wish to distinguish between instances of a token.

Regular Expressions

In Pascal, an identifier is a letter followed by zero or more letters

or digits. Regular expressions allow us to define precisely sets

such as this. With this notation, Pascal identifiers may be defined

as

letter (letter | digit)*

The vertical bar here means “or”, the parentheses are used to group subexpressions, the star means “zero or more instances of” the parenthesized expression, and the juxtaposition of letter with the remainder of the expression means concatenation.
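
As a small hand-written illustration (not part of the project program; the function name is an assumption), a C routine that checks whether a string matches letter (letter | digit)* could be written as:

#include <ctype.h>

/* Returns 1 if s matches letter(letter|digit)*, 0 otherwise. */
int matches_identifier(const char *s)
{
    int i;
    if (!isalpha((unsigned char)s[0]))      /* must start with a letter       */
        return 0;
    for (i = 1; s[i] != '\0'; i++)          /* zero or more letters or digits */
        if (!isalnum((unsigned char)s[i]))
            return 0;
    return 1;
}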


A regular expression is built up out of simpler regular expressions

using set of defining rules. Each regular expression r denotes a

language L(r). The defining rules specify how L(r) is formed by

combining in various ways the languages denoted by the

subexpressions of r .

Recognition of Tokens

The question of how to recognize the tokens is handled in this

section. The language generated by the following grammar is

used as an example.

Consider the following grammar fragment:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | ε

expr → term relop term
     | term

term → id
     | num

where the terminals if , then, else, relop, id and num generate

sets of strings given by the following regular definitions:

if    → if
then  → then
else  → else
relop → < | <= | = | <> | > | >=
id    → letter(letter|digit)*


num   → digit+ (. digit+)? (E(+|-)? digit+)?

For this language fragment the lexical analyzer will

recognize the keywords if, then, else, as well as the lexemes

denoted by relop, id, and num. To simplify matters, we assume keywords are reserved; that is, they cannot be used as

identifiers. Unsigned integer and real numbers of Pascal are

represented by num.

In addition, we assume lexemes are separated by white space,

consisting of nonnull sequences of blanks, tabs and newlines. Our

lexical analyzer will strip out white space. It will do so by

comparing a string against the regular definition ws, below.

delim → blank | tab | newline
ws    → delim+

If a match for ws is found, the lexical analyzer does not return a

token to the parser. Rather, it proceeds to find a token following

the white space and returns that to the parser. Our goal is to

construct a lexical analyzer that will isolate the lexeme for the

next token in the input buffer and produce as output a pair

consisting of the appropriate token and attribute value, using the

translation table given in the figure. The attribute values for the

relational operators are given by the symbolic constants

LT,LE,EQ,NE,GT,GE.
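
The translation table itself is not reproduced in this text. Based on the regular definitions above, it presumably pairs each pattern with the token and attribute value to be returned, along the following lines (a reconstruction, not the original figure):

REGULAR EXPRESSION   TOKEN   ATTRIBUTE VALUE
ws                   -       -
if                   if      -
then                 then    -
else                 else    -
id                   id      pointer to table entry
num                  num     pointer to table entry
<                    relop   LT
<=                   relop   LE
=                    relop   EQ
<>                   relop   NE
>                    relop   GT
>=                   relop   GE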

Transition diagram

A transition diagram is a stylized flowchart. Transition diagram is

used to keep track of information about characters that are seen

as the forward pointer scans the input. We do so by moving from

position to position in the diagrams as characters are read.

Positions in a transition diagram are drawn as circles and are

called states. The states are connected by arrow, called edges.


Edges leaving state s have labels indicating the input characters

that can next appear after the transition diagram has reached

state s. The label other refers to any character that is not

indicated by any of the other edges leaving s.

One state is labeled as the start state; it is the initial state of the

transition diagram where control resides when we begin to

recognize a token. Certain states may have actions that are executed when the flow of control reaches that state. On entering a state we read the next input character. If there is an edge from the current state whose label matches this input character, we then go to the state pointed to by the edge. Otherwise we indicate failure. A transition diagram for >= is shown in the figure.

A recognizer for a language is a program that takes as input a string x and answers ‘yes’ if x is a sentence of the language and ‘no’ otherwise. We compile a regular expression into a recognizer by constructing a transition diagram called a finite automaton. A finite automaton can be deterministic or non-deterministic, where non-deterministic means that more than one transition out of a state may be possible on the same input symbol.

Fig 5. Transition diagram for >= (start state 0, states 6, 7 and 8, with an edge labelled other).
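
Hand-translating this diagram into code, as described earlier, might give a fragment like the following C sketch (an illustration only, not the project's program; GT and GE are assumed attribute constants):

#define GT 1
#define GE 2

/* Recognize > or >= starting at input[*forward]; returns GT, GE or -1. */
int relop_greater(const char *input, int *forward)
{
    int state = 0;
    for (;;) {
        char c = input[*forward];
        switch (state) {
        case 0:                                      /* start state      */
            if (c == '>') { state = 6; (*forward)++; }
            else return -1;                          /* not this diagram */
            break;
        case 6:
            if (c == '=') { (*forward)++; return GE; }  /* state 7: >=   */
            return GT;                               /* state 8: '>' via the
                                                        other edge; the extra
                                                        character is not
                                                        consumed (retracted) */
        }
    }
}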


DFAs are faster recognizers than NFAs but can be much bigger than equivalent NFAs.

Non-deterministic finite automata

A mathematical model consisting of:

1) a set of states S
2) an input alphabet
3) a transition function
4) an initial state
5) a set of final states

Lexical grammar 

The specification of a programming language will include a set of rules, often expressed syntactically, specifying the set of possible character sequences that can form a token or lexeme. The whitespace characters are often ignored during lexical analysis.

Token

A token is a categorized block of text. The block of text corresponding to the token is known as a lexeme. A lexical analyzer processes lexemes to categorize them according to function, giving them meaning. This assignment of meaning is known as tokenization. A token can look like anything; it just needs to be a useful part of the structured text.

Consider this expression in the C programming language:

sum=3+2; 

Tokenized in the following table:


lexeme   token type

sum      IDENTIFIER
=        OPERATOR
3        CONSTANT
+        OPERATOR
2        CONSTANT
;        SPECIAL CHARACTER

Tokens are frequently defined by regular expressions, which are understood by a lexical analyzer generator such as lex. The lexical analyzer (either generated automatically by a tool like lex, or hand-crafted) reads in a stream of characters, identifies the lexemes in the stream, and categorizes them into tokens. This is called "tokenizing." If the lexer finds an invalid token, it will report an error.

Following tokenizing is parsing. From there, the interpreted data may be loaded into data structures, for general use, interpretation, or compiling.


2. PURPOSE

The first phase of the construction of a compiler is the

generation of a LEXICAL ANALYZER. The project aims at

building such an analyzer.

The program has been made on the concept of NFA (Non

Deterministic Finite Automata) for recognizing the

characters in the input and classifying them as tokens.

NFA (Non-Deterministic Finite Automaton) is a mathematical model consisting of:

1) a set of states (S)
2) an input alphabet (Σ)
3) a transition function (δ)
4) an initial state (q0)
5) a set of final states (F)

A lexical analyzer generator creates a lexical analyser using a set

of specifications usually in the format

p1 {action 1}

p2 {action 2}

. . . . . . . . . . . .

pn {action n}


where pi is a regular expression and each actioni is a program fragment that is to be executed whenever a lexeme matched by pi is found in the input. If more than one pattern matches, then the longest lexeme matched is chosen. If there are

two or more patterns that match the longest lexeme, the first

listed matching pattern is chosen.
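
For example (an illustration, not taken from the original text): if the patterns are the keyword if and the identifier pattern letter(letter|digit)*, listed in that order, then the input if8 is reported as an identifier because that pattern matches the longer lexeme if8, whereas the input if matches both patterns with a lexeme of the same length and is reported as the keyword, because that pattern is listed first.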

This is usually implemented using a finite automaton. There is an

input buffer with two pointers to it, a lexeme-beginning and a

forward pointer. The lexical analyser generator constructs a

transition table for a finite automaton from the regular expression

patterns in the lexical analyser generator specification. The lexical

analyser itself consists of a finite automaton simulator that uses this transition table to look for the regular expression patterns in

the input buffer.

This can be implemented using an NFA or a DFA. The transition

table for an NFA is considerably smaller than that for a DFA, but

the DFA recognises patterns faster than the NFA.

Using NFA

The transition table for the NFA N  is constructed for the

composite pattern p1|p2|...|pn. The NFA recognises the longest

prefix of the input that is matched by a pattern. In the final NFA,

there is an accepting state for each pattern pi. The sequence of sets of states the NFA can be in after seeing each input character is constructed. The NFA is simulated until it reaches termination

or it reaches a set of states from which there is no transition

defined for the current input symbol. The specification for the

lexical analyser generator is written so that a valid source program

cannot entirely fill the input buffer without having the NFA reach

termination. To find a correct match two things are done. Firstly,

whenever an accepting state is added to the current set of states,


the current input position and the pattern pi are recorded

corresponding to this accepting state. If the current set of states

already contains an accepting state, then only the pattern that

appears first in the specification is recorded. Secondly, the

transitions are recorded until termination is reached. Upon

termination, the forward pointer is retracted to the position at

which the last match occurred. The pattern making this match

identifies the token found, and the lexeme matched is the string

between the lexeme beginning and forward pointers. If no

pattern matches, the lexical analyser should transfer control to

some default recovery routine.
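
A minimal sketch of this longest-match bookkeeping is given below, for illustration only; StateSet and the three helper routines are assumptions introduced here, not part of the project or of any library.

typedef struct { int states[64]; int n; } StateSet;

StateSet start_set(void);                  /* epsilon-closure of the start state */
StateSet next_states(StateSet s, char c);  /* one NFA step on input character c  */
int      accepting_pattern(StateSet s);    /* first-listed accepted pattern,
                                              or -1 if none                      */

/* Scan one lexeme starting at lexeme_beginning; returns 1 on a match. */
int scan(const char *buf, int lexeme_beginning,
         int *matched_pattern, int *lexeme_end)
{
    StateSet s = start_set();
    int forward = lexeme_beginning;
    int last_match_pos = -1, last_pattern = -1;
    int p;

    while (buf[forward] != '\0' && s.n > 0) {
        s = next_states(s, buf[forward]);
        forward++;
        p = accepting_pattern(s);
        if (p >= 0) {                      /* record position and pattern */
            last_match_pos = forward;
            last_pattern = p;
        }
    }
    if (last_pattern < 0)
        return 0;                          /* no match: default recovery  */
    *lexeme_end = last_match_pos;          /* retract the forward pointer */
    *matched_pattern = last_pattern;
    return 1;
}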


3. REQUIREMENTS

HARDWARE SPECIFICATIONS

• minimum 128 MB RAM

• Pentium processor

SOFTWARE SPECIFICATIONS

• Operating system (Windows XP / Vista (64-bit) / ME / 2000 / 98)

• Turbo C/C++ IDE


CODING

/* Program on lexical analysis */

#include<stdio.h>
#include<conio.h>
#include<graphics.h>
#include<ctype.h>
#include<string.h>
#define MAX 30

void first()
{
    int gd=DETECT, gm;
    initgraph(&gd, &gm, "c:\\tc\\bgi");
    setcolor(GREEN);
    settextstyle(10,0,7);
    outtextxy(130,50,"LEXICAL");
    setcolor(YELLOW);
    settextstyle(10,0,7);
    outtextxy(90,190,"ANALYSIS");
    settextstyle(1,0,4);
    getch();
    restorecrtmode();
}

void second()
{
    char str[MAX];
    int gdriver=DETECT, gmod;
    initgraph(&gdriver, &gmod, "c:\\tc\\bgi");
    setcolor(RED);
    rectangle(20,85,615,435);
    rectangle(25,90,610,430);
    setcolor(GREEN);
    settextstyle(1,0,4);


    outtextxy(30,30,"SUBMITTED BY:-");
    settextstyle(6,0,1);
    setcolor(MAGENTA);
    outtextxy(40,130,"NAME :- KHUSHBOO SHARMA");
    outtextxy(40,160,"BRANCH :- CSE(B) - V SEM");
    outtextxy(40,190,"ROLL NO. : - 0609210046");
    outtextxy(40,220,"COLLEGE :- PCCS");
    setcolor(YELLOW);
    settextstyle(1,0,2.5);
    outtextxy(300,250,"&");
    settextstyle(6,0,1);
    setcolor(BLUE);
    outtextxy(350,280,"NAME :- NIHARIKA SETH");
    outtextxy(350,310,"BRANCH :- CSE(B) - V SEM");
    outtextxy(350,340,"ROLL NO. : - 0609210067");
    outtextxy(350,370,"COLLEGE :- PCCS");
    getch();
    restorecrtmode();
}

void next()
{
    char str[MAX];
    int gdriver=DETECT, gmod;
    initgraph(&gdriver, &gmod, "c:\\tc\\bgi");
    setcolor(RED);
    rectangle(20,85,615,435);
    rectangle(25,90,610,430);
    setcolor(GREEN);
    settextstyle(7,0,4);
    settextstyle(7,0,1);
    setcolor(GREEN+BLINK);
    outtextxy(110,110,"ENTER THE CODE TO BE ANALYSED.");
    outtextxy(110,160,"THE PROGRAM WILL FIND THE");
    outtextxy(110,210,"VARIOUS TOKENS PRESENT IN THE");
    outtextxy(110,260,"INPUT AND PROVIDE YOU WITH THE SAME");
    getch();
    restorecrtmode();


}

void main()
{
    char str[MAX];
    int state=0;
    int i=0, j, startid=0, endid, startcon, endcon;
    clrscr();
    first();
    for(j=0; j<MAX; j++)
        str[j]=NULL;                                 //Initialise NULL
    second();
    next();
    printf("\nEnter the string to be analysed:\n\n");
    gets(str);                                       //Accept input string
    str[strlen(str)]=' ';
    gotoxy(400,110);
    printf("Analysis:\n\n");
    while(str[i]!=NULL)
    {
        while(str[i]==' ')                           //To eliminate spaces
            i++;
        switch(state)
        {
        case 0: if(str[i]=='i') state=1;             //if
                else if(str[i]=='w') state=3;        //while
                else if(str[i]=='d') state=8;        //do
                else if(str[i]=='e') state=10;       //else
                else if(str[i]=='f') state=14;       //for
                else if(isalpha(str[i]) || str[i]=='_')
                {
                    state=17;
                    startid=i;
                }                                    //identifiers
                else if(str[i]=='<') state=19;       //relational '<' or '<='


                else if(str[i]=='>') state=21;       //relational '>' or '>='
                else if(str[i]=='=') state=23;       //relational '==' or assignment '='
                else if(isdigit(str[i]))
                {
                    state=25; startcon=i;
                }                                    //constant
                else if(str[i]=='(') state=26;       //special characters '('
                else if(str[i]==')') state=27;       //special characters ')'
                else if(str[i]==';') state=28;       //special characters ';'
                else if(str[i]=='+') state=29;       //operator '+'
                else if(str[i]=='-') state=30;       //operator '-'
                break;

        //States for 'if'
        case 1: if(str[i]=='f') state=2;
                else { state=17; startid=i-1; i--; }
                break;
        case 2: if(str[i]=='(' || str[i]==NULL)
                {
                    printf("if : Keyword\n\n");
                    state=0;
                    i--;


                }
                else { state=17; startid=i-2; i--; }
                break;

        //States for 'while'
        case 3: if(str[i]=='h') state=4;
                else { state=17; startid=i-1; i--; }
                break;
        case 4: if(str[i]=='i') state=5;
                else { state=17; startid=i-2; i--; }
                break;
        case 5: if(str[i]=='l') state=6;
                else { state=17; startid=i-3; i--; }
                break;
        case 6: if(str[i]=='e') state=7;
                else { state=17; startid=i-4; i--; }
                break;
        case 7: if(str[i]=='(' || str[i]==NULL)
                {
                    printf("while : Keyword\n\n");
                    state=0;
                    i--;
                }
                else { state=17; startid=i-5; i--; }
                break;

        //States for 'do'
        case 8: if(str[i]=='o') state=9;
                else { state=17; startid=i-1; i--; }
                break;
        case 9: if(str[i]=='{' || str[i]==' ' || str[i]==NULL || str[i]=='(')
                {
                    printf("do : Keyword\n\n");
                    state=0;
                    i--;
                }
                break;


        //States for 'else'
        case 10: if(str[i]=='l') state=11;
                 else { state=17; startid=i-1; i--; }
                 break;
        case 11: if(str[i]=='s') state=12;
                 else { state=17; startid=i-2; i--; }
                 break;
        case 12: if(str[i]=='e') state=13;
                 else { state=17; startid=i-3; i--; }
                 break;
        case 13: if(str[i]=='{' || str[i]==NULL)
                 {
                     printf("else : Keyword\n\n");
                     state=0;
                     i--;
                 }
                 else { state=17; startid=i-4; i--; }
                 break;

        //States for 'for'
        case 14: if(str[i]=='o') state=15;
                 else { state=17; startid=i-1; i--; }
                 break;
        case 15: if(str[i]=='r') state=16;
                 else { state=17; startid=i-2; i--; }
                 break;
        case 16: if(str[i]=='(' || str[i]==NULL)
                 {
                     printf("for : Keyword\n\n");
                     state=0;
                     i--;
                 }
                 else { state=17; startid=i-3; i--; }
                 break;

        //States for identifiers
        case 17: if(isalnum(str[i]) || str[i]=='_')
                 {
                     state=18; i++;


                 }
                 else if(str[i]==NULL||str[i]=='<'||str[i]=='>'||
                         str[i]=='('||str[i]==')'||str[i]==';'||
                         str[i]=='='||str[i]=='+'||str[i]=='-') state=18;
                 i--;
                 break;
        case 18: if(str[i]==NULL || str[i]=='<' || str[i]=='>' || str[i]=='(' ||
                    str[i]==')' || str[i]==';' || str[i]=='=' || str[i]=='+' ||
                    str[i]=='-')
                 {
                     endid=i-1;
                     printf("");
                     for(j=startid; j<=endid; j++)
                         printf("%c", str[j]);
                     printf(" : Identifier\n\n");
                     state=0;
                     i--;
                 }
                 break;

        //States for relational operator '<' & '<='
        case 19: if(str[i]=='=') state=20;
                 else if(isalnum(str[i]) || str[i]=='_')
                 {
                     printf("< : Relational operator\n\n");
                     i--;
                     state=0;
                 }
                 break;
        case 20: if(isalnum(str[i]) || str[i]=='_')
                 {
                     printf("<= : Relational operator\n\n");
                     i--;
                     state=0;
                 }
                 break;


        //States for relational operator '>' & '>='
        case 21: if(str[i]=='=') state=22;
                 else if(isalnum(str[i]) || str[i]=='_')
                 {
                     printf("> : Relational operator\n\n");
                     i--;
                     state=0;
                 }
                 break;
        case 22: if(isalnum(str[i]) || str[i]=='_')
                 {
                     printf(">= : Relational operator\n\n");
                     i--;
                     state=0;
                 }
                 break;

        //States for relational operator '==' & assignment operator '='
        case 23: if(str[i]=='=') state=24;
                 else
                 {
                     printf("= : Assignment operator\n\n");
                     i--;
                     state=0;
                 }
                 break;
        case 24: if(isalnum(str[i]))
                 {
                     printf("== : Relational operator\n\n");
                     state=0;
                     i--;
                 }
                 break;

        //States for constants
        case 25: if(isalpha(str[i]))
                 {
                     printf("*** ERROR ***\n\n");


                     puts(str);
                     for(j=0; j<i; j++)
                         printf(" ");
                     printf("^");
                     printf("Error at position %d : Alphabet cannot follow digit\n", i);
                     state=99;
                 }
                 else if(str[i]=='(' || str[i]==')' || str[i]=='<' || str[i]=='>' ||
                         str[i]==NULL || str[i]==';' || str[i]=='=')
                 {
                     endcon=i-1;
                     printf("");
                     for(j=startcon; j<=endcon; j++)
                         printf("%c", str[j]);
                     printf(" : Constant\n\n");
                     state=0;
                     i--;
                 }
                 break;

        //State for special character '('
        case 26: printf("( : Special character\n\n");
                 startid=i;
                 state=0;
                 i--;
                 break;

        //State for special character ')'
        case 27: printf(") : Special character\n\n");
                 state=0;
                 i--;
                 break;

        //State for special character ';'
        case 28: printf("; : Special character\n\n");
                 state=0;
                 i--;


                 break;

        //State for operator '+'
        case 29: printf("+ : Operator\n\n");
                 state=0;
                 i--;
                 break;

        //State for operator '-'
        case 30: printf("- : Operator\n\n");
                 state=0;
                 i--;
                 break;

        //Error State
        case 99: goto END;
        }
        i++;
    }
    printf("\n\nEnd of program\n\n");
END:
    getch();
}

OUTPUT (example)

Correct input

Enter the string to be analysed : for(x1=0; x1<=10; x1++);

Analysis:

for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant


; : Special character
x1 : Identifier
<= : Relational operator
10 : Constant
; : Special character
x1 : Identifier
+ : Operator
+ : Operator
) : Special character
; : Special character

End of program

Wrong input

Enter the string to be analyzed: for(x1=0; x1<=19x; x++);

Analysis:

for : Keyword
( : Special character
x1 : Identifier
= : Assignment operator
0 : Constant
; : Special character
x1 : Identifier
<= : Relational operator


*** ERROR ***

for(x1=0; x1<=19x; x++);
            ^
Error at position 12 : Alphabet cannot follow digit

LIMITATIONS

• The input text has to be entered in a single line without any

indentations. This is contrary to the presently followed style

for writing programs.

• The whitespace characters such as blanks, newline

characters and tabs are not considered for the analysis

(generation of tokens).

• If, during the analysis of the input, an invalid sequence of characters is encountered, an error message is displayed and

the characters following this sequence are not analyzed.


FUTURE PROSPECTS OF THE PROJECT

• This program generates the lexical analyzer which is the first

step in the construction of a compiler.

• If the further phases of the construction process are

performed correctly then an efficient compiler can be

developed.


REFERENCES

1) Principles of Compiler Design, by Alfred V. Aho & Jeffrey D. Ullman

2) Compilers: Principles, Techniques, and Tools, by Alfred V. Aho, Ravi Sethi & Jeffrey D. Ullman

3) Websites:


www.wikipaedia.org

www.crazyengineers.com

www.mec.com

www.curriri.com

ACKNOWLEDGEMENT

This project is the outcome of the efforts of several people, apart

from the team members, and it is important that their help be

acknowledged here.

First of all I want to present my sincere gratitude and deep

appreciation to Mr. ROHIT SACHAN and Mr. ABHINAV YADAV

(PROJECT INCHARGE) for their valuable support and guidance.

Without motivation, a person is literally unable to make her best

effort. I am highly grateful to them for their guidance as they

played an important role in making this project a success.


I would also like to devote my special thanks to Mrs. LALITA

VERMA (H.O.D-C.S.E) for availing us with the various required

resources.

KHUSHBOO SHARMA
0609210046
B. TECH. (5th SEM.)

COMPUTER SCIENCE & ENGINEERING

ACKNOWLEDGEMENT

This project is the outcome of the efforts of several people, apart

from the team members, and it is important that their help be

acknowledged here.

First of all I want to present my sincere gratitude and deep

appreciation to Mr. ROHIT SACHAN and Mr. ABHINAV YADAV

(PROJECT INCHARGE ) for their valuable support and guidance.

Without motivation, a person is literally unable to make her best

effort. I am highly grateful to them for their guidance as they

played an important role in making this project a success.


I would also like to devote my special thanks to Mrs. LALITA

VERMA (H.O.D-C.S.E) for availing us with the various required

resources.

NIHARIKA SETH

0609210067
B. TECH. (5th SEM.)
COMPUTER SCIENCE & ENGINEERING

MINI PROJECT ON


SUBMITTED TO:
Mr. ROHIT SACHAN
&
Mr. ABHINAV YADAV

SUBMITTED BY:
KHUSHBOO SHARMA (0609210046)
&
NIHARIKA SETH (0609210067)

CERTIFICATE OF APPROVAL

This is to certify that KHUSHBOO SHARMA student

of B. Tech. - Computer Science & Engineering (5TH sem.)

of PRIYADARSHINI COLLEGE OF COMPUTER SCIENCES, GREATER NOIDA (roll no. –

0609210046) has successfully completed her mini

project under my guidance.


Mrs. Lalita Verma
H.O.D. (C.S.E. Department)

Priyadarshini College of Computer Sciences

CERTIFICATE OF APPROVAL

This is to certify that NIHARIKA SETH student of B.

Tech. - Computer Science & Engineering (5TH sem.) of 

PRIYADARSHINI COLLEGE OF COMPUTER SCIENCES, GREATER NOIDA (roll no. –


0609210067) has successfully completed her mini

project under my guidance.

Mrs. Lalita Verma
H.O.D. (C.S.E. Department)

Priyadarshini College of Computer Sciences
