Spam Filter


SPAM FILTER

A Minor Project report submitted to
Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal
towards partial fulfillment of the Degree of
Bachelor of Engineering
in
Computer Science and Engineering

Guided By: Ms. Suhani Agrawal

Submitted By:
Abhas Mehta 0832CS071004
Angil Jain 0832CS071013
Krisha Dubey 0832CS071043
Maitreyee Bhise 0832CS071046

Computer Science and Engineering Department
ChameliDevi Institute of Technology and Management, Indore (M.P.)
2009-10

CHAMELIDEVI INSTITUTE OF TECHNOLOGY AND MANAGEMENT,

INDORE

Enrollment No: 0832CS071004, 0832CS071013, 0832CS071043, 0832CS071046

College Name: ChameliDevi Institute of Technology and Management

Branch: Computer Science and Engineering Sem: VI

E-mail: [email protected], [email protected], [email protected], [email protected]

1. Name of the Student: Abhas Mehta, Angil Jain, Krisha Dubey, Maitreyee Bhise

2. Title of the Project: "SPAM FILTER"

3. Name of the Guide: Ms. Suhani Agrawal

4. Education Qualification of the Guide: B.E. (Computer Science), M.B.A.

5. Working/Teaching experience of the Guide: 4 Years

6. Software used in the Project: Umbrello, SQLyog, Wamp Server 2.0h, Adobe Dreamweaver CS3

Name and Signature of the Student:
Abhas Mehta 0832CS071004
Angil Jain 0832CS071013
Krisha Dubey 0832CS071043
Maitreyee Bhise 0832CS071046

Name, Designation and Signature of the Guide:
Ms. Suhani Agrawal

Date:……………….. Date:………………

CHAMELIDEVI INSTITUTE OF TECHNOLOGY AND MANAGEMENT,

INDORE

CERTIFICATE OF AUTHENTICATED WORK

This is to certify that the project report entitled SPAM FILTER, submitted to R.G.P.V. Bhopal in partial fulfillment of the requirement for the award of the degree of Bachelor of Engineering, is an original work carried out by Abhas Mehta, Angil Jain, Krisha Dubey and Maitreyee Bhise, enrollment nos. 0832CS071004, 0832CS071013, 0832CS071043 and 0832CS071046, under my guidance. The matter embodied in this project is authentic, is the genuine work of the students, and has not been submitted either to this University or to any other University/Institute for the fulfillment of the requirement of any course of study.

Name and Signature of the Student:
Abhas Mehta 0832CS071004
Angil Jain 0832CS071013
Krisha Dubey 0832CS071043
Maitreyee Bhise 0832CS071046

Name, Designation and Signature of the Guide:
Ms. Suhani Agrawal

Date:……………….. Date:………………

CHAMELIDEVI INSTITUTE OF TECHNOLOGY AND MANAGEMENT,

INDORE

RECOMMENDATION

A minor project report entitled "SPAM FILTER", submitted by Abhas Mehta (0832CS071004), Angil Jain (0832CS071013), Krisha Dubey (0832CS071043) and Maitreyee Bhise (0832CS071046), is recommended and forwarded for partial fulfillment of the degree of Bachelor of Engineering in Computer Science and Engineering of Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal, for the academic year 2009-10.

Ms. Suhani Agrawal
Project Guide, Department of Computer Science and Engineering
CITM, Indore

Mr. Prashant Lakkadwala
Head of Department of Computer Science and Engineering
CITM, Indore

Dr. C.N.S. Murthy
Director, Department of Computer Science and Engineering
CITM, Indore

CHAMELIDEVI INSTITUTE OF TECHNOLOGY AND MANAGEMENT,

INDORE

CERTIFICATE

A minor project report entitled “SPAM FILTER”, submitted by

Abhas Mehta(0832CS071004), Angil Jain(0832CS071013),

Krisha Dubey(0832CS071043), Maitreyee Bhise(0832CS071046) is

recommended and forwarded for partial fulfillment of degree of Bachelor of

Engineering in Computer Science and Engineering of Rajiv Gandhi Proudyogiki

Vishwavidyalaya, Bhopal, for academic year of 2009-10.

INTERNAL EXAMINER EXTERNAL EXAMINER

CHAMELIDEVI INSTITUTE OF TECHNOLOGY AND MANAGEMENT, INDORE

ROLES AND RESPONSIBILITIES FORM

S.No.  Enrollment No.  Name of the Team Member  Role                    Tasks and Responsibilities
1.     0832CS071004    Abhas Mehta              System Administrator    Coding + System Monitoring
2.     0832CS071013    Angil Jain               Database Administrator  Coding + Database Design
3.     0832CS071043    Krisha Dubey             Configuration Manager   Coding + Client & Server Configuration
4.     0832CS071046    Maitreyee Bhise          System Manager          Coding + Documentation

Name and Signature of the Project Team Members:

1. Abhas Mehta (0832CS071004) Signature:
2. Angil Jain (0832CS071013) Signature:
3. Krisha Dubey (0832CS071043) Signature:
4. Maitreyee Bhise (0832CS071046) Signature:

Signature of the Guide:

Date:…………….

ABSTRACT

In this project we develop anti-spam software to be deployed at mail servers, so that spam is filtered out at the server itself. The software is based on a white list and a black list of domain names, from which mail is allowed and denied respectively. The lists are continuously updated based on feedback from users. Our approach combines the white list, black list and collaborative approaches. We have replaced the challenge-response system with a feedback mechanism from recipients, and it is through this same feedback mechanism that the collaborative approach is implemented. The anti-spam software works on the basis of three lists of domain names, namely:

1. white list (w-list)

2. black list (b-list)

3. suspicious list (s-list)

ACKNOWLEDGEMENTS

We express our sincere gratitude to Ms. Suhani Agrawal, our Project Guide, Department of Computer Science & Engineering, for providing us valuable support and necessary help whenever required, and for helping us explore new technologies through her technical expertise.

We would also like to thank Mr. Prashant Lakkadwala, Head of Department of Computer Science and Engineering, for providing us necessary help.

We would also like to express our sincere gratitude to Director Dr. C.N.S. Murthy and Director General Dr. S. RAJASHEKHARIAH for providing us valuable support.

We extend our sincere thanks to all teaching and non-teaching staff of the Computer Science & Engineering Department, C.I.T.M., Indore, for providing necessary information and for their kind co-operation.

We would like to thank our classmates for their motivation and their valuable suggestions during the project.

A blend of gratitude, pleasure and great satisfaction is what we feel in conveying our indebtedness to all those who have directly or indirectly contributed to the successful completion of our project work.

Finally, we express our love and respect towards our family members, who are our strength in every work we do.

TABLE OF CONTENTS

1) Introduction
    1.1) Background
    1.2) Objectives
    1.3) Purpose, Scope and Applicability
        Purpose
        Scope
        Applicability
    1.4) Achievements
    1.5) Organization of Report
2) Survey of Technologies
3) Requirement and Analysis
    3.1) Problem Definition
    3.2) Requirement Specification
        3.2.1) Functional Requirements
        3.2.2) Non-Functional Requirements
        3.2.3) Planning and Scheduling
    3.3) Software and Hardware
        3.3.1) Software Requirements
        3.3.2) Hardware Requirements
    3.4) Preliminary Product Description
    3.5) Conceptual Models
        3.5.1) Class Diagram
        3.5.2) Use Case Diagram
        3.5.3) Entity Relationship Diagram
        3.5.4) Architecture Design
        3.5.5) Sequence Diagram
        3.5.6) Collaboration Diagram
        3.5.7) Activity Diagrams
        3.5.8) Data Flow Diagram
4) System Design
    4.1) Basic Modules
    4.2) Data Design
    4.3) User Interface Design
    4.4) Security Issues

CHAPTER 1

INTRODUCTION

1.1) BACKGROUND

The major approaches to spam filtering include text analysis, white and black lists of domain names, and community-based approaches.

Text analysis of mail contents is a widely used approach to spam. Many solutions deployable on the server and client sides are available. Naive Bayes is one of the most popular algorithms used in these approaches; SpamBayes and the Mozilla Mail spam filter are examples of such solutions. But rejecting mails based on text analysis can be a serious problem in the case of false positives. Normally, users and organizations would not want any genuine e-mail to be lost.

The black list approach is one of the earliest approaches tried for filtering spam. The strategy is to accept all mails except those from explicitly blacklisted domains or e-mail IDs. As new domains keep entering the category of spamming domains, this strategy tends not to work well.

The white list approach is the strategy of accepting mails from explicitly white-listed domains or addresses and putting all others in a lower-priority queue, which is delivered only after the sender responds to a confirmation request sent by the spam filtering system. The problem with this "challenge-response system" is that it assumes that genuine mails are not sent by automated e-mail accounts, which is increasingly not the case. Various transaction management systems, automated responses, mailing lists etc. pose problems for this approach.

Solutions like SpamAssassin incorporate many of these approaches, relying largely on a rule-based approach, but in the process incur the obvious disadvantage of excessive overhead and of being resource intensive. Further, the possibility of false positives cannot be ruled out altogether.

Another approach discussed in the literature and in practice is collaborative spam filtering. The concept is that of cooperation amongst a set of users. This strategy relies on the usual bulk nature of spam, where several users are expected to receive the same message. The idea is for an early receiver to add the spam to a central repository so that later receivers can filter the message out by matching it against those in the repository. This cooperative and collaborative approach obviously has administrative and policy problems, so an attempt to make a completely automated system might not work out well. Vipul's Razor is one such widely used tool.

1.2) OBJECTIVES

The following are the objectives of the project:

1. The user needs to be able to distinguish spam from legitimate messages. To do this, he needs to identify typical spam characteristics and practices.

2. Spam needs to be stopped as far as possible.

3. The database of the three lists, i.e. the black list, white list and suspicious list, needs to be kept updated.
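The third objective, keeping the three lists updated from recipient feedback, could be sketched as follows. The report does not state the update rule, so the threshold, the counters and every name in this C fragment are assumptions made purely for illustration: a domain moves to the black list once spam reports outnumber genuine reports by a fixed margin, and to the white list in the opposite case.

```c
/* The three lists named in the report. */
enum list_id { LIST_WHITE, LIST_BLACK, LIST_SUSPICIOUS };

/* One tracked domain with its feedback counters (hypothetical layout). */
struct domain_entry {
    const char *name;
    enum list_id list;
    int spam_reports;     /* recipients who flagged mail from this domain */
    int genuine_reports;  /* recipients who confirmed mail as genuine */
};

/* Assumed margin; the report does not specify one. */
#define FEEDBACK_THRESHOLD 3

/* Record one piece of recipient feedback and move the domain to another
 * list when the margin is reached. Returns the list the domain is on
 * after the update; below the margin the domain stays where it was. */
enum list_id apply_feedback(struct domain_entry *d, int is_spam)
{
    if (is_spam)
        d->spam_reports++;
    else
        d->genuine_reports++;

    if (d->spam_reports - d->genuine_reports >= FEEDBACK_THRESHOLD)
        d->list = LIST_BLACK;
    else if (d->genuine_reports - d->spam_reports >= FEEDBACK_THRESHOLD)
        d->list = LIST_WHITE;
    return d->list;
}
```

Using a margin rather than a single report means one mistaken complaint cannot blacklist a genuine domain on its own; the right margin would have to be tuned by the administrator.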

1.3) PURPOSE, SCOPE AND APPLICABILITY

1.3.1) PURPOSE

The purpose of the project is to reduce the following costs that spam imposes on organizations:

* Reduced Employee Productivity: organizations lose productivity to the extent that employees spend time viewing and deleting spam messages rather than doing productive tasks. Cost estimates vary, but one large organization with 52,000 employees, using a rule of thumb that a person takes approximately 12 seconds to scan and delete one spam message, estimated that each employee would spend 5½ hours per year deleting spam at a cost of $14 million.

* Increased Network Resource Costs: as the volume of spam e-mail increases, so does the cost of the system resources needed to support it. If 50% of the e-mail coming into an organization is spam, then half of the capacity of the organization's e-mail servers (as well as the related LAN bandwidth and storage backups) is dedicated solely to processing and storing junk mail. In addition, all this e-mail traffic requires additional bandwidth, network hardware, and archival storage capacity.

* Increased IT Administration Costs: with more IT infrastructure and end-user problems comes the need for additional IT resources. This includes additional network and e-mail administrators, as well as additional help desk and technical support resources to assist end users.

* Increased Legal Liability Risk: a significant proportion of spam includes offensive or hate-based content that enterprises cannot allow into their organizations. Otherwise, they risk creating a "hostile workplace." For example, HR organizations at many large enterprises fear their companies may become the targets of lawsuits unless they quickly take proactive steps to stem the tide of pornographic spam.

* Reduced Security and Control: IT security staff worry that spammers will take advantage of the biggest weakness of their security architecture: people. Tricking end users into opening messages or attachments containing malicious applications or viruses, or into divulging sensitive company information, represents a serious and increasing security exposure. In addition, some of the cures are worse than the disease: desktop solutions often allow end users to ignore established security practices and define their own spam policies, while outsourced solutions require organizations to hand over control of their confidential, mission-critical e-mail stream to a third party. Add to this the personal cost of various spam-based scams to employees, who may be tricked into spending money on unwanted items, divulging personal data that the spammers can then use for identity theft, or falling prey to fraudulent get-rich-quick schemes.
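The productivity estimate quoted above (52,000 employees, about 12 seconds per message, 5½ hours per employee per year, $14 million) can be checked with simple arithmetic. The C sketch below assumes that the $14 million is the organization's total annual cost, which the text does not say explicitly; the function names are illustrative. Under that assumption the figures imply about 1,650 spam messages per employee per year and a cost of roughly $49 per employee-hour.

```c
/* Hours one employee spends per year deleting spam. */
double hours_per_year(double messages_per_year, double seconds_per_message)
{
    return messages_per_year * seconds_per_message / 3600.0;
}

/* Number of messages per year implied by a given number of hours. */
double implied_messages_per_year(double hours, double seconds_per_message)
{
    return hours * 3600.0 / seconds_per_message;
}

/* Cost of one employee-hour implied by a total annual cost
 * (assumes total_cost covers all employees). */
double implied_hourly_cost(double total_cost, double employees,
                           double hours_each)
{
    return total_cost / (employees * hours_each);
}
```

For example, `implied_messages_per_year(5.5, 12.0)` gives the 1,650 messages mentioned above, and `implied_hourly_cost(14e6, 52000.0, 5.5)` gives the hourly figure.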

1.3.2) SCOPE

Functions: The following functions are to be implemented in this project.

1. View white list
The user can view the white list and verify the domain names present in it.

2. View black list
The user can check whether the black list contains any domain name that should have been in the white list.

3. View the mails
The user can view the mails that he is receiving.

1.3.2.1) PROJECT RESOURCES

The project resources are categorized as:

* Human Resources
The human resources description covers the skills required to complete the development, the specialty of the developers, and the organizational hierarchy in terms of the organizational position of each person involved in development.

Specialty and skills: The project is an application development project and thus calls for developers who specialize in the selected programming language, which is C for our project. We are also using PHP for the front end and MySQL server for the database. Good programming skills and proper handling of the related operations are required of the human resources.

Organizational position: The developer team consists of members who are all at the same position. Each member performs the tasks of a team manager as well as a software engineer, in coordination with the other members of the team.

* Reusable Software Resources
Reusable software resources are the reusable components used for building the system. They can be divided into the following categories:

Off-the-Shelf Components: We are using the Milter APIs, which are used for building mail filters.

Full-Experience Components: The full-experience components for our project include the various functions provided by C and PHP.

Partial-Experience Components: There are no partial-experience components in our software.

New Components: There are no new components in our software.

1.3.3)APPLICABILITY

Spam filters are deployed on organizations' e-mail servers, where they also relieve the related LAN bandwidth and storage backups.

1.4) ACHIEVEMENTS

* It increases employee productivity.

* It reduces network resource costs.

* It reduces IT administration costs.

* It reduces legal liability risk.

* It increases security and control.

1.5) ORGANISATION OF REPORT

Up to this point, the project report gives only an introduction to the application. The description that follows details what the system is and how it works. The most important part of the report describes how we realized the project, including the technologies and tools used, requirements analysis, how we planned to meet deadlines, software and hardware requirements at both the server and client side, the preliminary product description, and various conceptual models including the class diagram, sequence diagram, use case diagram, activity diagram, collaboration diagram, ER diagram and others.

Then follows the System Design, which includes basic modules, data design, procedural design, user interfaces, security issues and test case design.

Finally, the report covers implementation and testing details, followed by the conclusion, future extensions and improvements.

CHAPTER 2

SURVEY OF TECHNOLOGY

Programming Language

In our project we have used C and PHP for programming.

PHP (a recursive acronym for PHP: Hypertext Preprocessor) is an open source, server-side web scripting language for creating dynamic web pages. Besides being browser independent, it offers a simple and universal cross-platform solution for e-commerce and for complex web and database-driven applications.

PHP has:

1. A low, smooth learning curve.

2. Broad functionality for databases, strings, network connectivity, file system support, Java, COM, XML, CORBA, WDDX, and Macromedia Flash.

3. Platform compatibility with UNIX (all variants), Win32 (NT/95/98/2000), QNX, MacOS (WebTen), OS X, OS/2, and BeOS.

4. Server compatibility for Apache module (UNIX, Win32), CGI/FastCGI, thttpd, fhttpd, phttpd, ISAPI (IIS, Zeus), NSAPI (Netscape iPlanet), Java servlet engines, AOLserver, and Roxen/Caudium module.

5. A rapid development cycle. New versions with bug fixes, additional functionality, and other improvements are released every few months.

6. A vibrant and supportive community. Code examples and free code abound.

7. The PHP group has done an excellent job of providing new users with resources and support.

The PHP functions used are as follows:

1. mysql_connect():

resource mysql_connect([string hostname[:port][:/path/to/socket]] [, string username] [, string password])

This function establishes a connection to a MySQL server on the specified hostname (or localhost if none is specified). It returns a link identifier if successful, or false otherwise.

2. mysql_close():

boolean mysql_close([resource link_identifier])

This function closes non-persistent links to the MySQL server and returns true or false, depending on its success.

3. mysql_select_db():

boolean mysql_select_db(string database_name [, resource link_identifier])

This function is equivalent to the USE statement in the MySQL interpreter. It sets the currently active database. Subsequent calls to mysql_query() are then executed against the selected database.

4. mysql_query():

resource mysql_query(string query [, resource link_identifier])

This function sends SQL statements to the MySQL server to be executed. For queries other than SELECT statements, the function returns true on success and false on failure. For SELECT statements, it returns a result identifier on success and false on failure. The result identifier can be used with mysql_result() or one of the mysql_fetch_*() functions to access the resulting data.

5. mysql_affected_rows():

int mysql_affected_rows([resource link_identifier])

This function returns the number of rows that were changed by the most recent INSERT, REPLACE, UPDATE, or DELETE query for the given link_identifier.

6. mysql_result():

mixed mysql_result(resource result, int row [, mixed field])

This function retrieves a single value from a mysql_query() result set. To retrieve a full row of data from the result set, use one of the mysql_fetch_*() functions covered later.

Sendmail APIs (Milter)

The Sendmail Content Management API (Milter) provides an interface for third-party software to validate and modify messages as they pass through the mail transport system. Filters can process messages' connection (IP) information, envelope protocol elements, message headers, and/or message body contents, and modify a message's recipients, headers, and body. The MTA configuration file specifies which filters are to be applied, and in what order, allowing an administrator to combine multiple independently-developed filters.

We expect to see both vendor-supplied, configurable mail filtering applications and a multiplicity of script-like filters designed by and for MTA administrators. A certain degree of coding sophistication and domain knowledge on the part of the filter provider is assumed. This allows filters to exercise fine-grained control at the SMTP level. However, as will be seen in the example, many filtering applications can be written with relatively little protocol knowledge.

Given these expectations, the API is designed to achieve the following

goals:

1. Reliability. Coding failures in a Milter process that cause that

process to hang or core-dump should not stop mail delivery.

Faced with such a failure, sendmail should use a default

mechanism, either behaving as if the filter were not present or as

if a required resource were unavailable. The latter failure mode

will generally have sendmail return a 4xx SMTP code (although

in later phases of the SMTP protocol it may cause the mail to be

queued for later processing).

2. Simplicity. The API should make implementation of a new filter

no more difficult than absolutely necessary. Subgoals include:

o Encourage good thread practice by defining thread-clean

interfaces including local data hooks.

o Provide all interfaces required while avoiding unnecessary

pedanticism.

3. Performance. Simple filters should not seriously impact overall

MTA performance.
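The two default mechanisms described under the reliability goal can be pictured as a small decision function. This is only a model of the behaviour described in the text, not code from sendmail or libmilter; all type and function names are hypothetical.

```c
/* What the MTA should do when a filter hangs or crashes
 * (the two default mechanisms named under "Reliability"). */
enum failure_policy {
    ON_FAILURE_IGNORE_FILTER,  /* behave as if the filter were not present */
    ON_FAILURE_TEMPFAIL        /* behave as if a required resource were unavailable */
};

enum mta_action {
    MTA_CONTINUE_DELIVERY,  /* carry on without the filter's verdict */
    MTA_REPLY_4XX,          /* temporary SMTP failure; the sender retries */
    MTA_QUEUE_FOR_LATER     /* keep the message and process it later */
};

/* late_phase is nonzero when the failure happens in a later phase of the
 * SMTP dialogue, where queueing may replace the 4xx reply. */
enum mta_action action_on_filter_failure(enum failure_policy policy,
                                         int late_phase)
{
    if (policy == ON_FAILURE_IGNORE_FILTER)
        return MTA_CONTINUE_DELIVERY;
    return late_phase ? MTA_QUEUE_FOR_LATER : MTA_REPLY_4XX;
}
```

In both cases mail delivery itself survives a broken filter, which is the point of the reliability goal: the failure costs at most a retry or a delay.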

UMBRELLO

Umbrello UML Modeller helps the software development process by using the industry-standard Unified Modelling Language (UML) to enable you to create diagrams for designing and documenting your systems.

Umbrello UML Modeller is a UML diagram tool that can support you in the software development process. Especially during the analysis and design phases of this process, Umbrello UML Modeller will help you to get a high quality product. UML can also be used to document your software designs to help you and your fellow developers.

Having a good model of your software is the best way to communicate with other

developers working on the project and with your customers. A good model is

extremely important for medium and big-size projects, but it is also very useful for

small ones. Even if you are working on a small one man project you will benefit from

a good model because it will give you an overview that will help you code things

right the first time.

The Unified Modelling Language (UML) is a diagramming language or notation to

specify, visualize and document models of Object Oriented software systems. UML is not

a development method, that means it does not tell you what to do first and what to do next or

how to design your system, but it helps you to visualize your design and communicate

with others. UML is controlled by the Object Management Group

(OMG) and is the industry standard for graphically describing software.

UML is designed for Object Oriented software design and has limited use for other programming paradigms. UML is composed of many model elements that represent the different parts of a software system. The UML elements are used to create diagrams, which represent a certain part, or a point of view, of the system. UML is the diagramming language used to describe such models. You can represent your ideas in UML using different types of diagrams. Umbrello UML Modeller 1.2 supports the following types:

Use Case Diagrams show actors (people or other users of the system), use cases (the

scenarios when they use the system), and their relationships.

Class Diagrams show classes and the relationships between them.

Sequence Diagrams show objects and a sequence of method calls they make to other objects.

Collaboration Diagrams show objects and their relationship, putting emphasis on the objects that participate in the message exchange.

State Diagrams show states, state changes and events in an object or a part of the system.

Activity Diagrams show activities and the changes from one activity to another with the events occurring in some part of the system.

Component Diagrams show the high level programming components (such as KParts or Java Beans).

Deployment Diagrams show the instances of the components and their relationships.

CHAPTER 3

REQUIREMENTS AND ANALYSIS

3.1) PROBLEM DEFINITION:

Unsolicited e-mails, usually sent in bulk and known as spam, are a growing concern for users, mail server administrators, ISPs, and businesses and other organizations all over the world. According to some estimates, as much as 60 percent of Internet e-mail traffic is due to spam. Given the scale and ubiquity of the problem, anti-spam solutions are being sought and provided all over the world. Several solutions exist and many more are being discussed.

In this project we will develop anti-spam software to be deployed at mail servers, to filter out spam at the server itself. The software will be based on white and black lists of domain names, from which mails are allowed and rejected respectively. The lists will be continuously updated based on feedback from users (the recipients of the mails).
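The list-based decision described above might look like the following C sketch. It is a minimal illustration under stated assumptions, not the project's actual code: the function names are hypothetical, the lists are plain in-memory arrays rather than database tables, the sender is assumed to be a bare `user@domain` address, and a domain found on neither list is treated as suspicious.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum verdict { VERDICT_DELIVER, VERDICT_REJECT, VERDICT_SUSPICIOUS };

/* Copy the lower-cased domain part of "user@domain" into out.
 * Returns 0 on success, -1 if there is no domain or out is too small. */
int sender_domain(const char *address, char *out, size_t out_size)
{
    const char *at = strrchr(address, '@');
    size_t i, len;

    if (at == NULL || at[1] == '\0')
        return -1;
    len = strlen(at + 1);
    if (len + 1 > out_size)
        return -1;
    for (i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)at[1 + i]);
    out[len] = '\0';
    return 0;
}

/* Linear search; list entries are assumed to be stored in lower case. */
static int in_list(const char *domain, const char *const *list, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(domain, list[i]) == 0)
            return 1;
    return 0;
}

/* Black list wins over white list; unknown or malformed senders are
 * reported as suspicious (an assumption, see the lead-in). */
enum verdict classify_sender(const char *address,
                             const char *const *white, size_t n_white,
                             const char *const *black, size_t n_black)
{
    char domain[256];

    if (sender_domain(address, domain, sizeof domain) != 0)
        return VERDICT_SUSPICIOUS;
    if (in_list(domain, black, n_black))
        return VERDICT_REJECT;
    if (in_list(domain, white, n_white))
        return VERDICT_DELIVER;
    return VERDICT_SUSPICIOUS;
}
```

Checking the black list first is a design choice of this sketch: if a domain ends up on both lists by mistake, the mail is rejected rather than delivered.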

3.2) REQUIREMENT SPECIFICATION

REQUIREMENT DOCUMENTATION:

* In the software, the user should be able to enter his/her request or complaint regarding some domain names.

* The user should be able to receive genuine mails.

* He should be able to lodge a complaint with the administrator in case he receives mails which are spam.

* He should be able to request that certain domain names be entered into the white list in case they are genuine but are in the suspicious list.

* In case the user wants to see the list of domain names present in the black list and white list, he should be able to view it.

On the basis of the requirement definition, we can restate our requirements in technical terms by dividing the project into different modules, each of which takes care of the task assigned to it:

* A user database is maintained.

* On entering the user name and password, the user is able to access the databases.

FUNCTIONAL & NON-FUNCTIONAL REQUIREMENTS

Functional Requirements

* Login: check the database to see whether the user name and password are correct.

* Register a new user if wanted.

* A user can view the list of domain names present in the black list as well as the white list.

* A user can lodge a complaint with the administrator regarding the entry of genuine domain names into the black list.

Non-functional Requirements

Non-functional requirements describe restrictions on the system that limit our choices for developing a solution to the problem.

The system must be reliable in most situations and correct to a high degree. Extensibility is also a key requirement, to keep the application compatible with the ever-changing world of present-day computers and to enhance its functionality from time to time as desired.

GENERAL TYPES OF REQUIREMENT

PHYSICAL ENVIRONMENT: The software will be deployed at the server level.

USER AND HUMAN FACTOR: There is not much for the user to interact with in this software, except adding domain names to the white list and viewing the white list and black list contents.

3.3) PLANNING AND SCHEDULING

* Identification of the various categories of users.

* Users can view the list of domain names present in the black and white lists.

* Designing of the various web pages, including the Home page, Login page etc.

* Schema definition: the databases and table structures have been designed.

3.4) SOFTWARE AND HARDWARE REQUIREMENTS

3.4.1) SOFTWARE REQUIREMENTS

Sendmail server

Apache Web Server

Linux Operating System

3.4.2) HARDWARE REQUIREMENTS

(i) 64 MB RAM
(ii) Network Interface Card

3.5) PRELIMINARY PRODUCT DESCRIPTION

* A user database is maintained.

* On entering the user name and password, the user is able to access the databases.

* A user can view the list of domain names present in the black list as well as the white list.

* A user can lodge a complaint with the administrator regarding the entry of genuine domain names into the black list.

3.6)CONCEPTUAL MODELS

Data Flow Diagram (DFD)

A data flow diagram is a directed graph in which nodes represent processing activities and arcs represent the data items transmitted between processing nodes.

[Figure: 0-level DFD. Labels: Internet, Incoming mails, Spam filter, White list, Black list, Suspicious list, Domain name 1, Domain name 3, Genuine mails, Inbox.]

[Figure: Level-I DFD. Labels: Mail check, White list, Black list, Suspicious list, Domain name 1, Domain name 2, Domain name 3.]

[Figure: Level-II DFD. Labels: Internet, Request for connection, Mlfi_connect, Domain name, Env_from, From field, Mail check, Black_check, Forge_check, White_check (results Bool 1, Bool 2, Bool 3), Add_suspicious, White list, Black list, Suspicious list, Domain name 1, Domain name 2, Domain name 3, Reject, Deliver the genuine mail.]

[Figure: ER diagram. Label: Suspicious list.]

[Figure: Use-case diagram.]

[Figure: Sequence diagram.]

[Figure: Flowchart.]
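The Level-II DFD names a Forge_check step alongside Black_check and White_check, fed by the domain name obtained at connection time (Mlfi_connect) and the From field obtained from the envelope (Env_from). The report does not define the check, so the C sketch below is only one plausible reading: it flags a message when the connecting host does not belong to the domain claimed in the From field. Both function names are hypothetical, and both inputs are assumed to be lower-cased already.

```c
#include <string.h>

/* Returns 1 if host is `domain` itself or a subdomain of it, else 0. */
int host_in_domain(const char *host, const char *domain)
{
    size_t hl = strlen(host);
    size_t dl = strlen(domain);

    if (dl == 0 || hl < dl)
        return 0;
    if (strcmp(host + (hl - dl), domain) != 0)
        return 0;   /* host does not end with the domain */
    /* Either an exact match, or the suffix must start at a label boundary. */
    return hl == dl || host[hl - dl - 1] == '.';
}

/* Assumed meaning of Forge_check: 1 if the sender domain looks forged. */
int looks_forged(const char *connecting_host, const char *from_domain)
{
    return !host_in_domain(connecting_host, from_domain);
}
```

This is a heuristic rather than proof of forgery: legitimate mail relayed through a third-party server would fail it, so a filter built this way would more safely send such domains to the suspicious list than reject them outright.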

CHAPTER 4

SYSTEM DESIGN

4.1) BASIC MODULES

MODULE 1

NEW REGISTRATION

INPUTS TO THE MODULE:

1. First Name

2. Last Name

3. User Name

4. Password

5. Retype Password

6. Contact Number

7. City

8. Enter the Image

9. Create my Account

10. Cancel

OUTPUT OF THE MODULE:

New Registration Successful.
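The checks that the registration module would need before reporting success can be sketched in C as follows. The report lists the inputs but not the validation rules, so this fragment assumes that every text field is required and that Password and Retype Password must match; the structure and names are hypothetical, and the image upload and the two buttons are left out.

```c
#include <stddef.h>
#include <string.h>

/* Text inputs of the New Registration module. */
struct registration {
    const char *first_name;
    const char *last_name;
    const char *user_name;
    const char *password;
    const char *retype_password;
    const char *contact_number;
    const char *city;
};

enum reg_status { REG_OK, REG_MISSING_FIELD, REG_PASSWORD_MISMATCH };

static int is_blank(const char *s)
{
    return s == NULL || s[0] == '\0';
}

/* Assumed rules: all fields required, the two passwords must be equal. */
enum reg_status validate_registration(const struct registration *r)
{
    if (is_blank(r->first_name) || is_blank(r->last_name) ||
        is_blank(r->user_name) || is_blank(r->password) ||
        is_blank(r->retype_password) || is_blank(r->contact_number) ||
        is_blank(r->city))
        return REG_MISSING_FIELD;
    if (strcmp(r->password, r->retype_password) != 0)
        return REG_PASSWORD_MISMATCH;
    return REG_OK;
}
```

Only when the function returns `REG_OK` would the module store the account and show "New Registration Successful".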

MODULE 2

SIGN-IN MODULE

INPUTS TO THE MODULE:

1. User Name

2. Password

OUTPUT OF THE MODULE:

Successful login if the user name and password are correct; otherwise the user is directed to an error page.

MODULE 3

SUBMODULE 1: Date Time Access

INPUTS TO THE SUBMODULE:

1. Date Time

2. Exit

3. Sign In

OUTPUT OF THE SUBMODULE:

Shows the date and time of the last login; Exit directs the user to the Sign-In page.

SUBMODULE 2: File Access

INPUTS TO THE SUBMODULE:

1. File Name

2. View

3. Back

OUTPUT OF THE SUBMODULE:

If View is chosen, the file is displayed; if Back is chosen, the user is directed to the Sign-In page.

4.2)Data DesignDesign concepts provide the software designer with the foundation from which more

sophisticated design methods can be applied. Each design concept helps the software

engineer to answer the following question:

What criteria can be used to partition software into individual components?

How function or data structure detail separated from a conceptual

representation of the software?

What uniform criteria define the technical quality of a software design?

The main design concepts are outlined below:

Abstraction:- Abstraction permits one to concentrate on a problem at some level of generalization without regard to irrelevant low-level details. It also permits one to work with concepts and terms that are familiar in the problem environment without having to transform them to an unfamiliar structure.

Refinement:- Stepwise refinement is a top-down design activity that develops a program by successively refining levels of procedural detail. In each step of refinement, one or several instructions of the given program are decomposed into more detailed instructions until all the instructions are expressed in terms of an underlying computer or programming language.

Modularity:- Modularity is the design concept in which the software is divided into separately named and addressable components, often called modules, that are integrated to satisfy the problem requirements.

Software Architecture:- Software architecture alludes to “the overall structure of the software and the ways in which that structure provides conceptual integrity for a system”. It is the hierarchical structure of program components (modules), the manner in which those components interact, and the structure of the data that are used by the components.

Control Hierarchy:- Also known as the program structure, the control hierarchy represents the organization of the program components (modules) and implies a hierarchy of control.

Structural Partitioning:- Structural partitioning is the process of partitioning the program structure both horizontally and vertically. Horizontal partitioning defines separate branches of the modular hierarchy for each major program function. Vertical partitioning, on the other hand, suggests that control (decision making) and work should be distributed top-down in the program structure.

Data Structure:- Data structure is a representation of the logical relationship among the individual elements of data. Data structure dictates the organization, methods of access, degree of associativity, and processing alternatives for information.

Software Procedure:- The software procedure focuses on the processing details of each module individually and must provide a precise specification of processing, including the sequence of events, exact decision points, repetitive operations, and even data organization and structure.

Information Hiding:- The principle of information hiding states that modules should be specified and designed so that information (procedures and data) contained within a module is inaccessible to other modules that have no need for such information.

Data Structure

Database system is used to store white list, black list and suspicion list.

Built-in data structures of C on Linux are used.

DATA DICTIONARY:

Name: User
Aliases: none
How used:
Description:

User = [email acc. holder | Administrator]
Email acc. holder = email id + password
email id = * xyz+@+domain name *
password = * any 8 digit string *

Administrator = login id + password
Login id = personal id
Personal id = * a 3 digit unique no. *
Password = * any 8 digit string *

Name: database [white list | black list | suspicion list]
Aliases: none
Where used: Used as the spam filter in the mail server to identify spam.
How used: To compare the from field of the mail header with the entries present in the database.
Description:

database = [white list | black list | suspicion list]
White list = [genuine domain names]
Black list = [non-genuine domain names]
Suspicion list = [domain names which are not present in either the black list or the white list]
Domain name = * xyz+@+abc.com | abc.com *

Name: e-mail
Aliases: none
Where used: Received by the mail server.
How used: Used by the user if genuine.
Description:

e-mail = * header+content *
header = [from field | to field]
from field = * email-id of sender *
to field = * email-id of receiver *
content = * message *
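The format rules in the data dictionary can be sketched as small C validators. This is only an illustration: the function names are hypothetical, and reading "8 digit string" as "a string of exactly 8 characters" is an assumption, since the dictionary does not say whether only digits are allowed.

```c
#include <ctype.h>
#include <string.h>

/* email id = local part + '@' + domain name; both parts must be
   non-empty and the address must contain exactly one '@'. */
int is_valid_email_id(const char *s)
{
    const char *at = strchr(s, '@');
    return at != NULL && at != s && at[1] != '\0'
        && strchr(at + 1, '@') == NULL;
}

/* password = any string of exactly 8 characters
   (assumed reading of "8 digit string"). */
int is_valid_password(const char *s)
{
    return strlen(s) == 8;
}

/* personal id = exactly 3 decimal digits. */
int is_valid_personal_id(const char *s)
{
    if (strlen(s) != 3)
        return 0;
    for (int i = 0; i < 3; i++)
        if (!isdigit((unsigned char)s[i]))
            return 0;
    return 1;
}
```

For example, is_valid_email_id("xyz@abc.com") accepts the address, while a bare domain such as "abc.com" is rejected as an email id.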

Procedural and Functional Details Implementation

Function to update the database through a user-friendly interface.

Function to access the database for reading by the user.

Function to store data in the database.

Function to take user input.

Procedure to combine the various modules into one using a GUI.

Interface Characterization

Options for the user to access the various databases.

Text areas and text fields for inserting data into the database.

Design translation to programming languages:

Database using MySQL.

Coding the functions and modules in syntactically valid statements in PHP and C, integrating them using a GUI.

Algorithm Design

First, a mail arrives through the Internet.

The domain name of the from field is extracted.

The domain name is matched against the list of domain names present in the black list.

If it matches, the mail is simply rejected.

Otherwise, the domain name is matched against the white list.

If it matches, the mail is sent to the user.

If it does not match, the domain name is added to a list called the suspicious list, and the mail is sent to the user with an attachment asking the user to add the domain name to the white list.

If the user does not add the domain name to the white list, it is automatically added to the black list after a given time.
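The decision steps above can be sketched in C as follows. This is a minimal illustration and not the project's code: it uses in-memory arrays in place of the database tables, the names (verdict, sender_domain, classify_sender, suspicion_expired) are hypothetical, the from address is assumed to be a bare address without angle brackets, and domains are compared case-sensitively.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Possible outcomes for an incoming mail (illustrative names). */
enum verdict { VERDICT_REJECT, VERDICT_DELIVER, VERDICT_SUSPICIOUS };

/* Return the text after the last '@', or the whole string if none. */
const char *sender_domain(const char *addr)
{
    const char *at = strrchr(addr, '@');
    return at ? at + 1 : addr;
}

/* Return 1 if domain equals one of the n entries in list, else 0. */
int in_list(const char *domain, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(domain, list[i]) == 0)
            return 1;
    return 0;
}

/* Black list is checked first, then the white list; a domain found
   in neither list is reported as suspicious (and is still delivered). */
enum verdict classify_sender(const char *from,
                             const char *const *black, size_t nblack,
                             const char *const *white, size_t nwhite)
{
    const char *domain = sender_domain(from);

    if (in_list(domain, black, nblack))
        return VERDICT_REJECT;
    if (in_list(domain, white, nwhite))
        return VERDICT_DELIVER;
    return VERDICT_SUSPICIOUS;
}

/* Return 1 once a suspicious entry has waited at least grace_seconds
   without being white-listed, i.e. it should move to the black list. */
int suspicion_expired(time_t added, time_t now, double grace_seconds)
{
    return difftime(now, added) >= grace_seconds;
}
```

Checking the black list before the white list mirrors the order of the steps, so a domain present in both lists would be rejected.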

4.5) SECURITY ISSUES

To maintain the security of the system, authentication-based access is used. Although various authentication techniques are available, ID-and-password authentication is used in this application.

This means that the user needs to provide an ID and password to access the account. The sender's domain name of each received mail is compared with the listed domain names, and the mail is handled according to the list in which the domain name is found. If the domain name is in neither list, it is placed in the suspicious list.


CHAPTER 5

IMPLEMENTATION AND TESTING


Code-

#include <stdio.h>
#include <stdlib.h>
#include <string.h>                   /* string manipulation */
#include <sysexits.h>                 /* exit status codes for system programs */
#include <unistd.h>                   /* getopt() */
#include "/root/filter/mysql/mysql.h" /* definitions for the libmysql library */
#include <libmilter/mfapi.h>          /* mail filter (milter) library functions */

/* Shared state. Note that libmilter may run callbacks on several
   threads, and these globals are not synchronized. */
MYSQL my_connection;
MYSQL_RES *res_ptr;
MYSQL_ROW sqlrow;
char *domain_name;
SMFICTX *a;
int rejflag = 0, whiteflag = 0;

/* Helper functions defined later in this file. */
int mail_check(char *mailaddr);
int black_check(char *mailaddr);
int white_check(char *mailaddr);
int sus_add(char *mailaddr);

/********************************mlfi_connect**********************************/
/* Called when: once, at the start of each SMTP connection.
   Default behavior: do nothing; return SMFIS_CONTINUE.
   Arguments:
     ctx       the opaque context structure.
     hostname  the host name of the message sender, as determined by a
               reverse lookup on the host address. If the reverse lookup
               fails, hostname will contain the message sender's IP
               address enclosed in square brackets.
     hostaddr  the host address, as determined by a getpeername() call on
               the SMTP socket. NULL if the type is not supported in the
               current version or if the SMTP connection is made via stdin. */
sfsistat mlfi_connect(SMFICTX *ctx, char *hostname, _SOCK_ADDR *hostaddr)
{
    a = ctx;

    printf("%s", hostname);
    domain_name = hostname;

    printf("\n");

    return SMFIS_CONTINUE; /* continue processing */
}

/*****************************mlfi_envfrom*******************************/
/* Called when: once at the beginning of each message, before mlfi_envrcpt.
   Default behavior: do nothing; return SMFIS_CONTINUE.
   Arguments:
     ctx   opaque context structure.
     argv  null-terminated SMTP command arguments; argv[0] is guaranteed
           to be the sender address. Later arguments are the ESMTP
           arguments.
   Return values:
     SMFIS_TEMPFAIL  Reject this sender and message with a temporary
                     error; a new sender (and hence a new message) may
                     subsequently be specified. mlfi_abort is not called.
     SMFIS_REJECT    Reject this sender and message; a new sender/message
                     may be specified. mlfi_abort is not called.
     SMFIS_DISCARD   Accept and silently discard this message.
                     mlfi_abort is not called.
     SMFIS_ACCEPT    Accept this message. mlfi_abort is not called. */
sfsistat mlfi_envfrom(SMFICTX *ctx, char **argv)
{
    /* smfi_getsymval may be called from within any of the mlfi_*
       callbacks. Which macros are defined depends on when it is called.
       Arguments:
         ctx      the opaque context structure.
         symname  the name of a sendmail macro. Single-letter macros can
                  optionally be enclosed in braces ("{" and "}"); longer
                  macro names must be enclosed in braces, just as in a
                  sendmail.cf file.
       Return value: the value of the given macro as a null-terminated
       string, or NULL if the macro is not defined. */
    char *mailaddr = smfi_getsymval(ctx, "{mail_addr}");

    /* Without a sender address there is nothing to look up. */
    if (mailaddr == NULL)
        return SMFIS_CONTINUE;

    printf("MAIL FROM :%s", mailaddr);
    mail_check(mailaddr);

    printf("\n\n");

    if (rejflag == 0)
        return SMFIS_CONTINUE; /* continue processing */
    else
        return SMFIS_REJECT;
}

/************************************mail_check***********************************/

/* Called with the value returned by smfi_getsymval() (a null-terminated
   string). Checks the black list first, then the white list, and adds
   the sender to the suspicious list if it is in neither. */
int mail_check(char *mailaddr)
{
    /* Start every message from a clean state. */
    rejflag = 0;
    whiteflag = 0;
    if (mailaddr == NULL)
        return EXIT_SUCCESS;

    printf("\n*************** check_black_list*****************");

    /* Compare the sender with the black list entries in the database;
       a match marks the mail for rejection. */
    black_check(mailaddr);
    if (rejflag == 0) {
        printf("\n **************check_white_list******************");
        printf("\n\n");

        /* Compare the sender with the white list entries in the
           database; a match allows the mail through. */
        white_check(mailaddr);
        if (!whiteflag) {
            printf("\n**************** add_to_suspicious****************");
            /* The sender is in neither the black list nor the white
               list, so it is added to the suspicious list table. */
            sus_add(mailaddr);
        }
    }
    return EXIT_SUCCESS;
}

/***************************black_check*************************/

int black_check(char *mailaddr)
{
    int res;

    rejflag = 0;
    mysql_init(&my_connection);

    /* mysql_real_connect() opens a connection to the MySQL server on the
       given host. It returns the connection handle on success, or NULL
       on failure. */
    if (mysql_real_connect(&my_connection, "127.0.0.1", "root", "", "nit", 0, NULL, 0)) {
        /* mysql_query() sends an SQL statement to the server. It returns
           zero on success and nonzero on error. The rows of a SELECT are
           then read with mysql_use_result() and mysql_fetch_row(). */
        res = mysql_query(&my_connection, "SELECT * FROM b_list ");
        if (res) {
            printf("SELECT error:%s\n", mysql_error(&my_connection));
        } else {
            res_ptr = mysql_use_result(&my_connection);
            while (res_ptr && (sqlrow = mysql_fetch_row(res_ptr))) {
                if (sqlrow[0] && !strcmp(mailaddr, sqlrow[0])) {
                    rejflag = 1;
                    printf("\n REJECT");
                    printf("%s\n", sqlrow[0]);
                    printf("\n\n\n\n");
                    break;
                } else {
                    printf("\n OK");
                }
            }
            if (res_ptr)
                mysql_free_result(res_ptr); /* release the result set */
        }
    }
    mysql_close(&my_connection);
    return EXIT_SUCCESS;
}

/*****************************white_check***************************/

int white_check(char *mailaddr)
{
    int res;

    whiteflag = 0;
    mysql_init(&my_connection);

    /* mysql_real_connect() opens a connection to the MySQL server on the
       given host. It returns the connection handle on success, or NULL
       on failure. */
    if (mysql_real_connect(&my_connection, "127.0.0.1", "root", "", "nit", 0, NULL, 0)) {
        /* mysql_query() returns zero on success and nonzero on error. */
        res = mysql_query(&my_connection, "SELECT * FROM w_list ");
        if (res) {
            /* Reached when the query could not be executed. */
            printf("SELECT error:%s\n", mysql_error(&my_connection));
        } else {
            res_ptr = mysql_use_result(&my_connection);

            /* mysql_fetch_row() takes the result set of the previous
               query and returns the next row as an array of column
               values, or NULL when no rows are left. */
            while (res_ptr && (sqlrow = mysql_fetch_row(res_ptr))) {
                if (sqlrow[0] && !strcmp(mailaddr, sqlrow[0])) {
                    whiteflag = 1;
                    printf("\n WHITE_OK :");
                    printf("%s\n", sqlrow[0]);
                    printf("\n\n\n\n");
                    break;
                } else {
                    printf("\n Add to Suspicious List");
                }
            }
            if (res_ptr)
                mysql_free_result(res_ptr); /* release the result set */
        }
    }
    mysql_close(&my_connection);
    return EXIT_SUCCESS;
}

/**************************sus_add******************************/

int sus_add(char *mailaddr)
{
    int res = 1;
    size_t len = strlen(mailaddr);
    char escaped[2 * 118 + 1]; /* worst case: every byte is escaped */
    char query[300];

    /* Refuse addresses that would not have fitted the original
       150-byte statement buffer. */
    if (len > 118)
        return EXIT_FAILURE;

    mysql_init(&my_connection);

    /* mysql_real_connect() returns the connection handle on success,
       or NULL on failure. */
    if (mysql_real_connect(&my_connection, "127.0.0.1", "root", "", "nit", 0, NULL, 0)) {
        /* Escape quotes and other special characters so the address
           cannot alter the SQL statement. */
        mysql_real_escape_string(&my_connection, escaped, mailaddr, (unsigned long)len);
        snprintf(query, sizeof query, "insert into sus_list values('%s')", escaped);

        printf("\n add to sus my==%s", query);
        printf("\n\n");

        /* mysql_query() returns zero on success and nonzero on error.
           An INSERT produces no result set to read. */
        res = mysql_query(&my_connection, query);
        if (res)
            printf("INSERT error:%s\n", mysql_error(&my_connection));
    }
    mysql_close(&my_connection);
    return res ? EXIT_FAILURE : EXIT_SUCCESS;
}

struct smfiDesc smfilter =
{
    "badDNS",      /* filter name */
    SMFI_VERSION,  /* version code -- do not change */
    SMFIF_NONE,    /* flags */
    mlfi_connect,  /* connection info filter */
    NULL,          /* SMTP HELO command filter */
    mlfi_envfrom,  /* envelope sender filter */
    NULL,          /* envelope recipient filter */
    NULL,          /* header filter */
    NULL,          /* end of header */
    NULL,          /* body block filter */
    NULL,          /* end of message */
    NULL,          /* message aborted */
    NULL           /* connection cleanup */
};

int main(int argc, char *argv[])
{
    int c;
    const char *args = "p:";

    /* Process command line options */
    while ((c = getopt(argc, argv, args)) != -1) {
        switch (c) {
        case 'p':
            if (optarg == NULL || *optarg == '\0') {
                (void) fprintf(stderr, "Illegal conn: %s\n",
                               optarg ? optarg : "(null)");
                exit(EX_USAGE);
            }
            /* smfi_setconn must be called once before smfi_main.
               Effects: sets the socket through which the filter
               communicates with sendmail.
               Argument: the address of the desired communication
               socket, as a NULL-terminated string in "proto:address"
               format. */
            (void) smfi_setconn(optarg);
            break;
        }
    }

    /* smfi_register must be called before smfi_main.
       Effects: creates a filter using the information given in the
       smfiDesc argument. Multiple calls to smfi_register within a
       single process are not allowed.
       Argument: smfilter, a filter descriptor of type smfiDesc
       describing the filter's functions. */
    if (smfi_register(smfilter) == MI_FAILURE) {
        fprintf(stderr, "smfi_register failed\n");
        exit(EX_UNAVAILABLE);
    }

    /* smfi_main is called after a filter's initialization is complete.
       Effects: hands control to the Milter event loop.
       Return value: MI_FAILURE if it fails to establish a connection.
       This may occur for any of a variety of reasons (e.g. an invalid
       address passed to smfi_setconn). The reason for the failure will
       be logged. Otherwise, smfi_main returns MI_SUCCESS. */
    return smfi_main();
}
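The -p option above is passed straight to smfi_setconn. As a hedged sketch, a pre-check such as the following could be run on the argument first. The helper name conn_spec_has_known_proto is hypothetical and not part of libmilter; the prefixes are those of the documented socket forms (unix: or local: followed by a path, and inet: or inet6: followed by port@host).

```c
#include <string.h>

/* Return 1 if spec begins with one of the protocol prefixes used in
   libmilter's "proto:address" socket format and has a non-empty
   address part, else 0. Hypothetical helper, not a libmilter call. */
int conn_spec_has_known_proto(const char *spec)
{
    static const char *const protos[] = {
        "unix:", "local:", "inet:", "inet6:"
    };

    if (spec == NULL)
        return 0;
    for (size_t i = 0; i < sizeof protos / sizeof protos[0]; i++) {
        size_t n = strlen(protos[i]);
        if (strncmp(spec, protos[i], n) == 0 && spec[n] != '\0')
            return 1;
    }
    return 0;
}
```

The check looks only at the prefix; it does not verify that the path, port, or host after the colon is usable.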

IMPLEMENTATION APPROACHES

PHASES IMPLEMENTED

PHASE I

Step 1. Identify needs and benefits.

Activity 1.1 Meeting with customer.

Activity 1.2 Identifying the basic needs and project constraints.

Activity 1.3 Establish project statements.

PHASE II

Step 2. Selection of the process model to be used for design and implementation of the system.

Activity 2.1 Study and comparison of various models.

Activity 2.2 Study of process model performed.

Activity 2.3 Select the most appropriate process model.

Activity 2.4 Establishing the scope of project.

Activity 2.5 Listing the advantages and disadvantages of project.

PHASE III

Step 3. Perform the requirement analysis of the system.

Activity 3.1 Prepare the requirement documents.

PHASE IV

Step 4. Identify the project resource requirements for accomplishment of the project.

Activity 4.1 Define reusable software and human resources.

PHASE V

Step 5. Partitioning the software into individual components.

Activity 5.1 Identification of the modules that comprise the system.

Activity 5.2 Sketch the interrelationship between different modules.

PHASE VI

Step 6. Coding of the modules.

Activity 6.1 Identification of functions performed by each module.

Activity 6.2 The modules are coded.

PHASE VII

Step 7. Verification and testing of proper functioning of the system.

Activity 7.1 Error-free execution of the project.

Activity 7.2 Testing of the system.

5.3)TESTING APPROACH

Testing Method Used.

The following testing methods have been used in this project:

Black Box Testing

Black box testing, also called behavioral testing, focuses on the functional requirements of the software. That is, black box testing enables the software engineer to derive sets of input conditions that will fully exercise all functional requirements for a program. It attempts to find errors in the following categories:

Incorrect or missing functions

Interface errors

Errors in data structures or external database access

Initialization and termination errors

This testing strategy is applied during later stages of testing. Because black box testing purposely disregards control structure, attention is focused on the information domain. Tests are designed to answer the following questions:

How is functional validity tested?

How is system behavior and performance tested?

What classes of input will make a good test case?

Is the system particularly sensitive to certain input values?

How are the boundaries of a data class isolated?

What data rates and data volume can the system tolerate?

What effect will specific combination of data have on system operation?

The various types of black box testing are:

Graph-based testing

Equivalence partitioning

Boundary value analysis

Comparison testing

Orthogonal array testing

How is functional validity tested?

The functional validity of the project was tested by giving a series of inputs, noting the output, and then matching the output obtained with the expected output.

All the modules were tested individually in different environments and were found to be functionally valid.

Basis Path Testing

Basis Path Testing is a White Box Testing technique proposed by Tom McCabe [MCC 76]. The basis path method enables the test case designer to derive a logical complexity measure of a procedural design and use this measure as a guide for defining a basis set of execution paths. Test cases derived to exercise the basis set are guaranteed to execute every statement in the program at least one time during testing.

1. Flow Graph Notation:

A flow graph depicts the logical control flow of the process. Each circle is called a flow graph node and represents one or more procedural statements.

2. Cyclomatic Complexity:

Cyclomatic complexity is a software metric that provides a quantitative measure of the logical complexity of a program. The value for cyclomatic complexity defines the number of independent paths in the basis set of a program and provides us with an upper bound for the number of tests that must be conducted to ensure that all statements have been executed at least once.

Data Flow Graph

[Figure: flow graph with nodes 1,2 to 8]

1,2 - Checking the mail form of the incoming mail.

3 - Checking for the domain name of the incoming mail in the black list.

4 - Reject the incoming mail.

5 - Checking whether the mail is forged or not.

6 - Check for the domain name in the white list.

7 - Deliver the mail.

8 - Add to the suspicious list.

Path 1: 1,2,3,4,1,2

Input: Mail with the from field contained in the black list.

Output: Sender reject.

Path 2: 1,2,3,5,4,1,2

Input: Mail with an incorrect domain name in the from field.

Output: Sender reject.

Path 3: 1,2,3,4,5,6,7

Input: Mail with the from field contained in the white list.

Output: Sender OK.

Path 4: 1,2,3,4,5,6,8,7

Input: Mail with the from field contained neither in the black list nor in the white list.

Output: From field is added to the suspicious list and the mail is delivered.
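The number of basis paths can be cross-checked with McCabe's standard formulas: V(G) = E - N + 2 for a connected flow graph with E edges and N nodes, or equivalently V(G) = P + 1, where P is the number of binary decision (predicate) nodes. The C sketch below computes both. The function names are illustrative, and treating nodes 3, 5 and 6 as the three binary decisions of this flow graph is an assumption, since the edges of the figure cannot be recovered from the text. Under that assumption V(G) = 3 + 1 = 4, which agrees with the four paths listed above.

```c
/* V(G) = E - N + 2 for a connected flow graph
   with E edges and N nodes. */
int cyclomatic_from_graph(int edges, int nodes)
{
    return edges - nodes + 2;
}

/* Equivalent form: V(G) = P + 1, where P is the number of
   binary decision (predicate) nodes in the flow graph. */
int cyclomatic_from_predicates(int predicates)
{
    return predicates + 1;
}
```

As a small sanity check of the first form, a single if-else has 4 nodes (decision, two branches, join) and 4 edges, giving V(G) = 4 - 4 + 2 = 2, the same value as one predicate plus one.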


CHAPTER 6

RESULTS AND DISCUSSIONS


CHAPTER 7

CONCLUSION

7.1)CONCLUSION


This approach of handling spam using white and black lists of domain names together with recipient feedback works fairly well. As more users use the system, its efficiency and accuracy are expected to go up substantially. The system is also expected to stabilize soon enough not to cause any substantial user resistance. The system can also be used in combination with other approaches, such as content checking, to filter out spam.

7.2)LIMITATIONS OF THE PROJECT

Following are the limitations of our project:

At present, the spam filter filters mails only according to the domain names listed in the black list. It is therefore not yet able to filter spam on the basis of message content or any other criteria.

DIFFICULTIES ENCOUNTERED

Following are the difficulties encountered during the making of our project:

As this was our first experience with the Linux operating system, it was somewhat difficult to cope with its environment.

Since the sendmail server needs to be configured to enable the milter facility, the configuration was a difficult task to perform.

The Apache web server needs to be configured to run PHP, and initially PHP was not working properly.


FUTURE SCOPE OF PROJECT

There is wide scope for enhancement in our project.

The following enhancement can be made:

Filtering of spam can be done on the basis of message content.


BIBLIOGRAPHY

[1] Red Hat Linux 8 Server –Mohammed J. Kabir

[2] Professional Linux programming wrox publication

[3] Professional PHP Guide wrox publication


Other References:

[1] Evaluating Anti-Spam Solutions - Criteria that Makes the Difference,Michael

Vizard, editor-in-chief, CRN (www.syntegra.com/us/anti spam/evaluating anti-spam

solutions full.pdf )

[2] Spambayes anti-spam, http://spambayes.sourceforge.net/

[3] Mozilla spam filter, http://www.mozilla.org/mailnews/spam.html

[4] SPAM Research Center http://www.spamresearchcenter.com/

[5] SpamAssassin, http://www.spamassassin.org/

[6] Send-mail, http://www.sendmail.org/

[7] Milter community website, http://www.milter.org/

[8] PHP, http://www.php.net/
