Spam Filter
SPAM FILTER
A Minor Project report submitted to Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal
towards partial fulfillment of the Degree of
Bachelor of Engineering in
Computer Science and Engineering
Guided By: Ms. Suhani Agrawal
Submitted By:
Abhas Mehta 0832CS071004
Angil Jain 0832CS071013
Krisha Dubey 0832CS071043
Maitreyee Bhise 0832CS071046
Computer Science and Engineering Department
ChameliDevi Institute of Technology and Management, Indore (M.P.)
2009-10
CHAMELIDEVI INSTITUTE OF TECHNOLOGY AND MANAGEMENT,
INDORE
Enrollment No: 0832CS071004, 0832CS071013, 0832CS071043, 0832CS071046
College Name:ChameliDevi Institute of Technology and Management
Branch: Computer Science and Engineering Sem: VI
E-mail: [email protected],[email protected], [email protected],[email protected]
1. Name of the Student: Abhas Mehta, Angil Jain, Krisha Dubey, Maitreyee Bhise
2. Title of the Project: “SPAM FILTER”
3. Name of the Guide: Ms. Suhani Agrawal
4. Education Qualification of the Guide: B.E. (Computer Science), M.B.A.
5. Working/Teaching experience of the Guide: 4 Years
6. Software used in the Project: Umbrello, SQLyog, WampServer 2.0h, Adobe Dreamweaver CS3
Name and Signature of the Student:
Abhas Mehta 0832CS071004
Angil Jain 0832CS071013
Krisha Dubey 0832CS071043
Maitreyee Bhise 0832CS071046
Name, Designation and Signature of the Guide:
Ms. Suhani Agrawal
Date:……………….. Date:………………
CERTIFICATE OF AUTHENTICATED WORK
This is to certify that the project report entitled SPAM FILTER, submitted to
R.G.P.V. Bhopal in partial fulfillment of the requirement for the award of the
degree of Bachelor of Engineering, is an original work carried out by Abhas
Mehta, Angil Jain, Krisha Dubey and Maitreyee Bhise (enrollment nos.
0832CS071004, 0832CS071013, 0832CS071043, 0832CS071046) under my
guidance. The matter embodied in this project is authentic and is genuine work
done by the students and has not been submitted, whether to this University or to any
other University/Institute, for the fulfillment of the requirement of any course of
study.
Name and Signature of the Student:
Abhas Mehta 0832CS071004
Angil Jain 0832CS071013
Krisha Dubey 0832CS071043
Maitreyee Bhise 0832CS071046
Name, Designation and Signature of the Guide:
Ms. Suhani Agrawal
Date:……………….. Date:………………
RECOMMENDATION
A minor project report entitled “SPAM FILTER”, submitted by Abhas
Mehta (0832CS071004), Angil Jain (0832CS071013), Krisha
Dubey (0832CS071043) and Maitreyee Bhise (0832CS071046), is recommended and
forwarded for partial fulfillment of the degree of Bachelor of Engineering in
Computer Science and Engineering of Rajiv Gandhi Proudyogiki
Vishwavidyalaya, Bhopal, for the academic year 2009-10.
Ms. Suhani Agrawal
Project Guide, Department of Computer Science and Engineering
CITM, Indore
Mr. Prashant Lakkadwala
Head of Department, Computer Science and Engineering
CITM, Indore
Dr. C.N.S. Murthy
Director, Department of Computer Science and Engineering
CITM, Indore
CERTIFICATE
A minor project report entitled “SPAM FILTER”, submitted by
Abhas Mehta(0832CS071004), Angil Jain(0832CS071013),
Krisha Dubey(0832CS071043), Maitreyee Bhise(0832CS071046) is
recommended and forwarded for partial fulfillment of degree of Bachelor of
Engineering in Computer Science and Engineering of Rajiv Gandhi Proudyogiki
Vishwavidyalaya, Bhopal, for academic year of 2009-10.
INTERNAL EXAMINER EXTERNAL EXAMINER
ROLES AND RESPONSIBILITIES FORM
S.No.  Enrollment No.  Name of the Team Member  Role                    Tasks and Responsibilities
1.     0832CS071004    Abhas Mehta              System Administrator    Coding + System Monitoring
2.     0832CS071013    Angil Jain               Database Administrator  Coding + Database Design
3.     0832CS071043    Krisha Dubey             Configuration Manager   Coding + Client & Server Configuration
4.     0832CS071046    Maitreyee Bhise          System Manager          Coding + Documentation
Name and Signature of the Project Team Members:
1. Abhas Mehta (0832CS071004) Signature:
2. Angil Jain (0832CS071013) Signature:
3. Krisha Dubey (0832CS071043) Signature:
4. Maitreyee Bhise (0832CS071046) Signature:
Signature of the Guide:
Date:…………….
ABSTRACT
In this project we develop anti-spam software to be deployed at mail
servers to filter out spam at the server itself. The software is based on
a white list and a black list of domain names from which mail is
allowed and denied respectively. The lists are continuously updated based on
the feedback of the users. Our approach combines white list, black list and collaborative
approaches. We have substituted the challenge-response system with a feedback
mechanism from recipients. It is through the same feedback mechanism that the
collaborative approach is implemented. The anti-spam software works on the basis
of three lists of domain names, namely:
1. white list (w-list)
2. black list (b-list)
3. suspicious list (s-list)
ACKNOWLEDGEMENTS
We express our sincere gratitude towards Ms. Suhani Agrawal, our Project Guide,
Department of Computer Science & Engineering, for providing us valuable
support and necessary help whenever required and also for helping us explore new
technologies with her technical expertise.
We would also like to thank Mr. Prashant Lakkadwala, Head of
Department of Computer Science and Engineering, for providing us necessary
help.
We would also like to express our sincere gratitude towards Director Dr.
C.N.S. Murthy and Director General Dr. S. Rajashekhariah for
providing us valuable support.
We forward our sincere thanks to all teaching and non-teaching staff of the
Computer Science & Engineering Department, C.I.T.M., Indore for providing
necessary information and their kind co-operation.
We would like to thank our classmates for their motivation and their
valuable suggestions during the project.
A blend of gratitude, pleasure and great satisfaction is what we feel in
conveying our indebtedness to all those who have directly or indirectly contributed to
the successful completion of our project work.
Finally, we express our love and respect towards our family members, who
are our strength in every work we do.
TABLE OF CONTENTS
1) Introduction
    1.1) Background
    1.2) Objectives
    1.3) Purpose, Scope and Applicability
        Purpose
        Scope
        Applicability
    1.4) Achievements
    1.5) Organization of Report
2) Survey of Technologies
3) Requirement and Analysis
    3.1) Problem Definition
    3.2) Requirement Specification
        3.2.1) Functional Requirements
        3.2.2) Non-Functional Requirements
        3.2.3) Planning and Scheduling
    3.3) Software and Hardware
        3.3.1) Software Requirements
        3.3.2) Hardware Requirements
    3.4) Preliminary Product Description
    3.5) Conceptual Models
        3.5.1) Class Diagram
        3.5.2) Use Case Diagram
        3.5.3) Entity Relationship Diagram
        3.5.4) Architecture Design
        3.5.5) Sequence Diagram
        3.5.6) Collaboration Diagram
        3.5.7) Activity Diagrams
        3.5.8) Data Flow Diagram
4) System Design
    4.1) Basic Modules
    4.2) Data Design
    4.3) User Interface Design
    4.4) Security Issues
1.1)BACKGROUND
Major approaches adopted towards spam filtering include text analysis, white and
black lists of domain names, and community-based approaches.
Text analysis of the contents of mails is a widely used approach to spam.
Many solutions deployable on the server and client sides are available. Naive Bayes
is one of the most popular algorithms used in these approaches. SpamBayes and the
Mozilla Mail spam filter are examples of such solutions. But rejecting mails based
on text analysis can be a serious problem in case of false positives. Normally users
and organizations would not want any genuine e-mails to be lost.
The black list approach has been one of the earliest approaches tried for filtering spam.
The strategy is to accept all mails except the ones from the domains/e-mail ids
explicitly blacklisted. With newer domains continually entering the category of spamming
domains, this strategy tends not to work so well.
The white list approach is the strategy
of accepting mails from the domains/addresses explicitly white listed and putting the
others in a lower-priority queue, which is delivered only after the sender responds to a
confirmation request sent by the spam filtering system. The problem with this
"challenge-response system" is that it assumes that genuine mails are not sent by
automatic e-mail accounts, which is increasingly not the case. Various transaction
management systems, automated responses, mailing lists etc. pose problems
for this approach.
Solutions like SpamAssassin incorporate many
of these approaches, largely relying on a "rule-based approach", but in the process
gain the obvious disadvantage of excessive overheads and of being resource
intensive. Further, the possibility of false positives cannot be ruled out
altogether.
Another approach discussed in literature and practice is the
collaborative approach towards spam filtering. The concept is that of cooperation
amongst a set of users. This strategy draws on the usual bulk nature of spam,
where several users are expected to receive the same spam. The idea is for the
early receiver to add the spam to a central repository so that late receivers can
filter the message out by matching it against the ones in the repository. This
cooperative and collaborative approach obviously has administrative and policy
problems. An attempt to make a completely automated system, thus, might not
work out well. Vipul's Razor is one widely used tool of this kind.
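The repository idea behind the collaborative approach can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the method used by Vipul's Razor or by this project: the names spam_repo, body_digest, repo_report and repo_contains are hypothetical, the repository is a fixed-size in-memory array rather than a shared server, and a plain FNV-1a hash of the body stands in for a real message signature (so any change to the text defeats the match).

```c
#include <stddef.h>
#include <stdint.h>

#define REPO_CAPACITY 1024

/* Hypothetical central repository: digests of messages reported as spam. */
struct spam_repo {
    uint64_t digests[REPO_CAPACITY];
    size_t count;
};

/* 64-bit FNV-1a hash of the message body (a stand-in for a real signature). */
uint64_t body_digest(const char *body)
{
    uint64_t h = 0xcbf29ce484222325ULL;            /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)body; *p; ++p) {
        h ^= *p;
        h *= 0x100000001b3ULL;                     /* FNV prime */
    }
    return h;
}

/* Late receiver: returns 1 if this body was already reported, else 0. */
int repo_contains(const struct spam_repo *repo, const char *body)
{
    uint64_t d = body_digest(body);
    for (size_t i = 0; i < repo->count; ++i)
        if (repo->digests[i] == d)
            return 1;
    return 0;
}

/* Early receiver: report a spam body. Returns 0 on success, -1 if full. */
int repo_report(struct spam_repo *repo, const char *body)
{
    if (repo_contains(repo, body))
        return 0;                                  /* already known */
    if (repo->count == REPO_CAPACITY)
        return -1;
    repo->digests[repo->count++] = body_digest(body);
    return 0;
}
```

An early receiver would call repo_report() on a spam message, after which repo_contains() returns 1 for later copies of the same body. The administrative and policy problems mentioned above (who may report, and how false reports are undone) are outside this sketch.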
1.2)OBJECTIVES
The following are the objectives of the project:
1. The user needs to be able to distinguish spam from legitimate messages. To do this
he needs to identify typical spam characteristics and practices.
2. We need to stop spam as far as possible.
3. Updating the database of the three lists, i.e. black list, white list and suspicious
list.
1.3)PURPOSE SCOPE AND APPLICABILITY
1.3.1)PURPOSE
The purpose of the project is to counter the following costs that spam imposes on organizations:
*Reduced Employee Productivity – organizations lose productivity to the extent
employees spend time viewing and deleting spam messages rather than doing
productive tasks. Cost estimates vary, but one large organization with 52,000
employees, using a rule of thumb that a person takes approximately 12 seconds
to scan and delete one spam message, estimated that each employee would spend
5½ hours per year deleting spam, at a total cost of $14 million.
*Increased Network Resource Costs – as the volume of spam e-mail increases, so
does the cost of system resources to support it. If 50% of e-mail coming into an
organization is spam, then half of an organization’s e-mail servers (as well as
related LAN bandwidth and storage backups) are dedicated solely to processing
and storing junk mail. In addition, all this e-mail traffic requires additional
bandwidth, network hardware, and archival storage capacity.
*Increased IT Administration Costs – with more IT infrastructure and end-user
problems comes the need for additional IT resources. This includes additional
network and e-mail administrators, as well as additional help desk/technical
support resources to assist end users.
*Increased Legal Liability Risk – a significant proportion of spam includes
offensive, or hate-based content that enterprises cannot allow into their
organizations. Otherwise, they risk creating a “hostile workplace.” For example,
HR organizations at many large enterprises fear their companies may become the
targets of lawsuits unless they quickly take proactive steps to stem the tide of
pornographic spam.
*Reduced Security and Control – IT security staff worry that spammers will take
advantage of the biggest weakness of their security architecture—people. Tricking
end users into opening messages or attachments containing malicious applications
or viruses, or tricking the user into divulging sensitive company information
represents a serious and increasing security exposure. In addition, some of the
cures are worse than the disease: desktop solutions often allow end-users to ignore
established security practices and define their own spam policies, while
outsourced solutions require organizations to hand over control of their
confidential, mission-critical e-mail stream to a third party. Add to this the
personal cost of various spam-based scams to employees, who may be tricked into
spending money on unwanted items, divulging personal data that the spammers
can then use for identity theft, or falling prey to fraudulent get-rich-quick
schemes.
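The productivity estimate quoted under Reduced Employee Productivity can be cross-checked with a few lines of C. The four input figures are the ones given in the text; the derived values (messages per employee per year, total hours lost and the implied hourly cost) are only arithmetic consequences of those figures and are not stated in the source. The function names are hypothetical.

```c
/* Figures quoted in the text. */
#define EMPLOYEES           52000L
#define SECONDS_PER_SPAM    12L
#define HOURS_PER_EMPLOYEE  5.5
#define ANNUAL_COST_USD     14000000.0

/* Spam messages one employee handles per year: 5.5 h * 3600 s/h / 12 s. */
long spam_per_employee_per_year(void)
{
    return (long)(HOURS_PER_EMPLOYEE * 3600.0) / SECONDS_PER_SPAM;
}

/* Total hours the organization spends on spam per year. */
double total_hours_lost(void)
{
    return EMPLOYEES * HOURS_PER_EMPLOYEE;
}

/* Hourly labour cost implied by the quoted annual cost. */
double implied_cost_per_hour(void)
{
    return ANNUAL_COST_USD / total_hours_lost();
}
```

Under these figures each employee deletes 1,650 spam messages a year, the organization loses 286,000 hours, and the $14 million total corresponds to a labour cost of a little under $49 per hour.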
1.3.2)SCOPE
Functions: The following functions are to be implemented in this project:
1. View white list
The user can view the white list and verify the domain names
which are present there.
2. View black list
The user can check whether the black list contains any domain name which should
have been in the white list.
3. View the mails
The user can view the mails which he is receiving.
1.3.2.1)PROJECT RESOURCES
The project resources are categorized as:
*Human Resources:
The human resources description covers the skills
required to complete the development, the specialty of the developers, and the
organizational position of each person
involved in development.
Specialty and skills
The project is an application development project and thus calls for
the specialty of developers in the selected programming language,
which is C for our project. We are also using PHP for the front end
and MySQL server for the database. Good programming skills and
proper handling of related operations are required of the
human resources.
Organizational position
The developer team consists of members all at the same position.
Each member performs the task of a team manager as well as a
software engineer in coordination with the other members of the
team.
*Reusable Software Resources
Reusable software resources are the reusable components used for building the
system. They can be divided into the following categories:
Off-the-Shelf Components
We are using the Milter APIs, which are used for making mail filters.
Full-Experience Components
The full-experience components for our project include the
various functions provided by C and PHP.
Partial-Experience Components
There are no partial-experience components in our software.
New Components
There are no new components in our software.
1.3.3)APPLICABILITY
Spam filters are used on organizations’ e-mail servers (and relieve the related LAN
bandwidth and storage backups).
1.4)ACHIEVEMENTS
*It increases Employee Productivity.
*It reduces Network Resource Costs.
*It reduces IT Administration Costs.
*It reduces Legal Liability Risk.
*It increases Security and Control.
1.5)ORGANISATION OF REPORT
The report so far gives only an introduction to the application; the description
that follows details what the system is and how it works. The
most important part of the report is how we have realized our project,
including the technologies and tools used, requirements analysis, how we planned to
meet deadlines, software and hardware requirements at both server and client
side, and the preliminary product description.
Next come the various conceptual models, including the class diagram, sequence diagram, use case
diagram, activity diagram, collaboration diagram, ER diagram and others.
Then follows System Design, which includes basic modules, data design, procedural
design, user interfaces, security issues and test case design.
Finally the report covers implementation and testing details and, at last, the conclusion,
future extensions and improvements.
Programming Language
In our project we have used C and PHP for programming.
PHP (a recursive acronym for PHP: Hypertext Preprocessor) is an open source,
server-side web scripting language for creating dynamic web pages. Besides
being browser independent, it offers a simple and universal cross-platform
solution for e-commerce and for complex web and database-driven applications.
PHP has:
1. A low, smooth learning curve.
2. Broad functionality for databases, strings, network connectivity, file
system support, Java, COM, XML, CORBA, WDDX, and Macromedia
Flash.
3. Platform compatibility with UNIX (all
variants), Win32 (NT/95/98/2000), QNX, Mac OS (WebTen), OS X, OS/2,
and BeOS.
4. Server compatibility for Apache module (UNIX, Win32), CGI/FastCGI,
thttpd, fhttpd, phttpd, ISAPI (IIS, Zeus), NSAPI (Netscape
iPlanet), Java servlet engines, AOLServer, and Roxen/Caudium module.
5. A rapid development cycle. New versions with bug fixes, additional
functionality, and other improvements are released every few months.
6. A vibrant and supportive community. Code examples and free code
abound.
7. The PHP group has done an excellent job of providing new
users with resources and support.
Different functions of PHP used are as follows:
1. mysql_connect():
resource mysql_connect([string hostname[:port][:/path/to/socket]] [, string username] [, string password])
This function establishes the connection to a MySQL server on the specified
hostname (or localhost if none is specified). It returns a link identifier if
successful, or false otherwise.
2. mysql_close():
boolean mysql_close([resource link_identifier])
This function closes non-persistent links to the MySQL server and returns true or
false, depending on its success.
3. mysql_select_db():
boolean mysql_select_db(string database_name [, resource link_identifier])
This function is equivalent to the USE statement in the MySQL interpreter. It
sets the currently active database. Subsequent calls to mysql_query() are then
executed against the selected database.
4. mysql_query():
resource mysql_query(string query [, resource link_identifier])
This is used to send SQL statements to the MySQL server to be executed. For
queries other than SELECT statements, the function returns true on success and
false on failure. For SELECT statements, this function returns a result identifier on
success and false on failure. The result identifier can be used with mysql_result()
or one of the mysql_fetch_*() functions to access the resulting data.
5. mysql_affected_rows():
int mysql_affected_rows([resource link_identifier])
This returns the number of rows that were changed by the most recent INSERT,
REPLACE, UPDATE, or DELETE query for the given link_identifier.
6. mysql_result():
mixed mysql_result(resource result, int row [, mixed field])
This is used to retrieve a single value from a mysql_query() result set. To retrieve
a full row of data from the result set, use one of the mysql_fetch_*() functions.
Sendmail APIs (Milter)
The Sendmail Content Management API (Milter) provides an interface
for third-party software to validate and modify messages as they pass
through the mail transport system. Filters can process messages'
connection (IP) information, envelope protocol elements, message
headers, and/or message body contents, and modify a message's
recipients, headers, and body. The MTA configuration file specifies
which filters are to be applied, and in what order, allowing an
administrator to combine multiple independently-developed filters.
We expect to see both vendor-supplied, configurable mail filtering
applications and a multiplicity of script-like filters designed by and for
MTA administrators. A certain degree of coding sophistication and
domain knowledge on the part of the filter provider is assumed. This
allows filters to exercise fine-grained control at the SMTP level.
However, as will be seen in the example, many filtering applications can
be written with relatively little protocol knowledge.
Given these expectations, the API is designed to achieve the following
goals:
1. Reliability. Coding failures in a Milter process that cause that
process to hang or core-dump should not stop mail delivery.
Faced with such a failure, sendmail should use a default
mechanism, either behaving as if the filter were not present or as
if a required resource were unavailable. The latter failure mode
will generally have sendmail return a 4xx SMTP code (although
in later phases of the SMTP protocol it may cause the mail to be
queued for later processing).
2. Simplicity. The API should make implementation of a new filter
no more difficult than absolutely necessary. Subgoals include:
o Encourage good thread practice by defining thread-clean
interfaces including local data hooks.
o Provide all interfaces required while avoiding unnecessary
pedanticism.
3. Performance. Simple filters should not seriously impact overall
MTA performance.
UMBRELLO
Umbrello UML Modeller helps the software development process by using the industry
standard Unified Modelling Language (UML) to enable you to create diagrams for
designing and documenting your systems.
Umbrello UML Modeller is a UML diagram tool that can support you in the software
development process. Especially during the analysis and design phases of this
process, Umbrello UML Modeller will help you to get a high quality product. UML
can also be used to document your software designs to help you and your fellow
developers.
Having a good model of your software is the best way to communicate with other
developers working on the project and with your customers. A good model is
extremely important for medium and big-size projects, but it is also very useful for
small ones. Even if you are working on a small one man project you will benefit from
a good model because it will give you an overview that will help you code things
right the first time.
The Unified Modelling Language (UML) is a diagramming language or notation to
specify, visualize and document models of Object Oriented software systems. UML is not
a development method, that means it does not tell you what to do first and what to do next or
how to design your system, but it helps you to visualize your design and communicate
with others. UML is controlled by the Object Management Group
(OMG) and is the industry standard for graphically describing software.
UML is designed for Object Oriented software design and has limited use for other
programming paradigms.
UML is composed of many model elements that represent
the different parts of a software system. The UML elements are used to create
diagrams, which represent a certain part, or a point of view, of the system. UML is the
diagramming language used to describe such models. You can represent your ideas
in UML using different types of diagrams. Umbrello UML Modeller 1.2 supports the
following types:
Use Case Diagrams show actors (people or other users of the system), use cases (the
scenarios when they use the system), and their relationships.
Class Diagrams show classes and the relationships between them.
Sequence Diagrams show objects and a sequence of method calls they make to other objects.
Collaboration Diagrams show objects and their relationship, putting emphasis on the objects that participate in the message exchange.
State Diagrams show states, state changes and events in an object or a part of the system.
Activity Diagrams show activities and the changes from one activity to another with the events occurring in some part of the system.
Component Diagrams show the high level programming components (such as KParts or Java Beans).
Deployment Diagrams show the instances of the components and their relationships.
3.1)PROBLEM DEFINITION:
Unsolicited e-mails, usually sent in bulk and known as spam, are a growing concern for
users, mail server administrators, ISPs, businesses and other organizations all over
the world. According to some estimates, as much as 60 percent of Internet e-mail
traffic is due to spam. Given the scale and ubiquity of the problem, anti-spam
solutions are being sought and provided all over the world. Several
solutions exist and many more are being talked about.
In this project we will develop anti-spam software to be
deployed at mail servers to filter out spam at the server itself. The
software will be based on white and black lists of domain names from
which mails would be allowed and rejected respectively. The lists would
be continuously updated based on the feedback from users (recipients of the
mails).
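One way the feedback-driven list update described above could look in C is sketched below. The report does not specify the update rule, so everything here is an assumption made for illustration: the names domain_db, db_feedback and SPAM_REPORT_THRESHOLD are hypothetical, unknown domains start on the suspicious list, a domain is moved to the black list after three spam reports, and a single "genuine" report moves it to the white list. In the project itself the lists live in a MySQL database and complaints go through the administrator.

```c
#include <string.h>

enum list_kind { LIST_WHITE, LIST_BLACK, LIST_SUSPICIOUS };

#define MAX_DOMAINS            256
#define MAX_DOMAIN_LEN         255
#define SPAM_REPORT_THRESHOLD  3     /* hypothetical policy value */

struct domain_entry {
    char name[MAX_DOMAIN_LEN + 1];
    enum list_kind list;
    int spam_reports;                /* spam complaints received so far */
};

struct domain_db {
    struct domain_entry entries[MAX_DOMAINS];
    int count;
};

/* Linear search for a domain; returns NULL if it is not in the database. */
struct domain_entry *db_find(struct domain_db *db, const char *domain)
{
    for (int i = 0; i < db->count; ++i)
        if (strcmp(db->entries[i].name, domain) == 0)
            return &db->entries[i];
    return NULL;
}

/* Look up a domain, adding it to the suspicious list if it is unknown.
 * Returns NULL if the database is full or the name is too long. */
struct domain_entry *db_get_or_add(struct domain_db *db, const char *domain)
{
    struct domain_entry *e = db_find(db, domain);
    if (e != NULL)
        return e;
    if (db->count == MAX_DOMAINS || strlen(domain) > MAX_DOMAIN_LEN)
        return NULL;
    e = &db->entries[db->count++];
    strcpy(e->name, domain);
    e->list = LIST_SUSPICIOUS;
    e->spam_reports = 0;
    return e;
}

/* Apply one piece of recipient feedback (is_spam != 0 means "this was spam").
 * Returns the list the domain ends up on, or -1 on error. */
int db_feedback(struct domain_db *db, const char *domain, int is_spam)
{
    struct domain_entry *e = db_get_or_add(db, domain);
    if (e == NULL)
        return -1;
    if (!is_spam) {
        e->list = LIST_WHITE;
        e->spam_reports = 0;
    } else if (++e->spam_reports >= SPAM_REPORT_THRESHOLD) {
        e->list = LIST_BLACK;
    }
    return (int)e->list;
}
```

With a zero-initialized domain_db, three calls to db_feedback(&db, "example.com", 1) leave the domain on the black list, while db_feedback(&db, "example.org", 0) places a previously unknown domain on the white list.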
3.2) REQUIREMENT SPECIFICATION
REQUIREMENT DOCUMENTATION:
In the software the user should be able to enter his/her
requests/complaints regarding domain names.
The user should be able to receive genuine mails.
He should be able to lodge a complaint with the administrator in case he
receives mails which are spam.
He should be able to request that domain names which are genuine but are in the
suspicious list be entered into the white list.
In case the user wants to see the list of domain names present in the black list and white list, he should be able to view it.
On the basis of the requirement definition we can restate our requirements
in technical terms by dividing the project into different modules,
each module taking care of the task assigned to it:
A user database is maintained.
On entering the user name and password, the user is able to access the
databases.
FUNCTIONAL & NON FUNCTIONAL REQUIREMENTS
Functional Requirements
Login. Check the database to see whether the user name and password are correct.
Register a new user if wanted.
A user can view the list of domain names present in the black as
well as the white list.
A user can lodge a complaint with the administrator regarding the entry of genuine
domain names into the black list.
Non-functional Requirements
Non-functional requirements describe restrictions on the system that limit our choices for developing a solution
to the problem.
The system must be reliable in most situations. The system
must be correct to a high degree. Extensibility is also a key
requirement, to keep the application compatible with the ever-changing
world of present day computers and to enhance its functionality from
time to time as desired.
GENERAL TYPES OF REQUIREMENT
PHYSICAL ENVIRONMENT: The software will be deployed at the server
level.
USER AND HUMAN FACTOR: There is not much for the user to interact
with in this software except adding domain names to the white list and
viewing the white list/black list contents.
3.3)PLANNING AND SCHEDULING
Identification of the various categories of users.
Users can view the list of domain names present in the black and white lists.
Designing of the various web pages, including the Home page, Login page etc.
Schema Definition: databases and table structures have been designed.
3.4)SOFTWARE AND HARDWARE REQUIREMENTS
3.4.1)SOFTWARE REQUIREMENTS
Sendmail Server
Apache Web Server
Linux Operating System
3.4.2) HARDWARE REQUIREMENTS
(i) 64 MB RAM
(ii) Network Interface Card
3.5) PRELIMINARY PRODUCT DESCRIPTION
A user database is maintained.
On entering the user name and password, the user is able to access the
databases.
A user can view the list of domain names present in the black as well as the
white list.
A user can lodge a complaint with the administrator regarding the entry of genuine domain names
into the black list.
3.6)CONCEPTUAL MODELS
Data Flow Diagram (DFD)
It is a directed graph in which nodes represent processing activities and arcs represent data items transmitted between processing nodes.
[Figure: 0-LEVEL DFD. Labels: Internet, Incoming Mails, Spam filter, White list, Suspicious list, Black list, Domain name 1, Domain name 3, Genuine mails, Inbox.]
[Figure: LEVEL-I DFD. Labels: Mail check, White list, Black list, Suspicious list, Domain name 1, Domain name 2, Domain name 3.]
[Figure: LEVEL-II DFD. Labels: Request for connection, Mlfi_connect, Domain name, Env_from, From field, Mail check, Black_check, Forge_check, White_check, Add_suspicious, Bool 1, Bool 2, Bool 3, White list, Black list, Suspicious list, Domain name 1, Domain name 2, Domain name 3, Reject, Deliver the genuine mail.]
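The mail check named in the Level-II DFD can be sketched in C as below. Only the process names (black_check, forge_check, white_check, add_suspicious, reject, deliver) come from the diagram; their exact behaviour is not described in the report, so this sketch makes several assumptions that should be read as such: forge_check is taken to compare the envelope sender domain with the domain in the From field, a forged or black-listed sender is rejected, and mail from an unknown domain is delivered while the domain is recorded in the suspicious list. The types domain_list, filter_lists and verdict are hypothetical, and the lists are in-memory arrays rather than database tables.

```c
#include <stddef.h>
#include <string.h>

enum verdict { VERDICT_DELIVER, VERDICT_REJECT };

#define MAX_LIST 64

struct domain_list {
    const char *names[MAX_LIST];     /* pointers owned by the caller */
    size_t count;
};

struct filter_lists {
    struct domain_list white, black, suspicious;
};

/* Returns 1 if the domain is in the list, else 0. */
int list_contains(const struct domain_list *l, const char *domain)
{
    for (size_t i = 0; i < l->count; ++i)
        if (strcmp(l->names[i], domain) == 0)
            return 1;
    return 0;
}

/* Appends a domain pointer; returns 0 on success, -1 if the list is full. */
int list_add(struct domain_list *l, const char *domain)
{
    if (l->count == MAX_LIST)
        return -1;
    l->names[l->count++] = domain;
    return 0;
}

int black_check(const struct filter_lists *f, const char *domain)
{
    return list_contains(&f->black, domain);
}

int white_check(const struct filter_lists *f, const char *domain)
{
    return list_contains(&f->white, domain);
}

/* Assumed meaning: the sender is forged when the two domains differ. */
int forge_check(const char *envelope_domain, const char *header_domain)
{
    return strcmp(envelope_domain, header_domain) != 0;
}

void add_suspicious(struct filter_lists *f, const char *domain)
{
    if (!list_contains(&f->suspicious, domain))
        list_add(&f->suspicious, domain);
}

/* Decide what to do with one message, given its two sender domains. */
enum verdict mail_check(struct filter_lists *f,
                        const char *envelope_domain,
                        const char *header_domain)
{
    if (black_check(f, envelope_domain))
        return VERDICT_REJECT;
    if (forge_check(envelope_domain, header_domain))
        return VERDICT_REJECT;
    if (!white_check(f, envelope_domain))
        add_suspicious(f, envelope_domain);   /* unknown sender: keep track */
    return VERDICT_DELIVER;
}
```

In a real milter these checks would run inside the callbacks registered with Sendmail, and the verdict would be mapped to the filter's return code. That wiring is omitted here so that the sketch stays independent of the Milter library.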
4.1) BASIC MODULES
MODULE 1
NEW REGISTRATION
INPUTS TO THE MODULE:
1. First Name
2. Last Name
3. User Name
4. Password
5. Retype Password
6. Contact Number
7. City
8. Enter the Image
9. Create my Account
10. Cancel
OUTPUT OF THE MODULE:
New Registration Successful.
MODULE 2
SIGN-IN MODULE
INPUTS TO THE MODULE:
1. User Name
2. Password
OUTPUT OF THE MODULE:
Successful login if the user name and password are correct; otherwise the user is directed to an error page.
MODULE 3
SUBMODULE 1: Date Time Access
INPUTS TO THE SUBMODULE:
1. Date Time
2. Exit
3. Sign In
OUTPUT OF THE SUBMODULE:
Shows the date and time of the last login; Exit directs the user to the Sign-In page.
SUBMODULE 2: File Access
INPUTS TO THE SUBMODULE:
1. File Name
2. View
3. Back
OUTPUT OF THE SUBMODULE:
If View is chosen the file is displayed; if Back is chosen the user is directed to the Sign-In page.
4.2)Data Design
Design concepts provide the software designer with a foundation from which more
sophisticated design methods can be applied. Each design concept helps the software
engineer to answer the following questions:
What criteria can be used to partition software into individual components?
How is function or data structure detail separated from a conceptual
representation of the software?
What uniform criteria define the technical quality of a software design?
The main design concepts are outlined below:
Abstraction:-Abstraction permits one to concentrate on a problem at some
level of generalization without regard to irrelevant low level details; use of
abstraction also permits one to work with concepts and terms that are familiar
in the problem environment without having to transform them to an
unfamiliar structure.
Refinement:- Stepwise refinement is a top-down design activity which
involves developing a program by successively refining levels of procedural
detail. In each step of refinement, one or several instructions of the given
program are decomposed into more detailed instructions until all the
instructions are expressed in terms of the underlying computer or
programming language.
Modularity:-Modularity is the design concept in which the software is
divided into separately named and addressable components, often called
modules, that are integrated to satisfy the problem requirements.
Software Architecture:- Software architecture alludes to “the overall
structure of the software and the ways in which that structure provides
conceptual integrity for a system”. It’s a hierarchical structure of program
components (modules), the manner in which those components interact and
the structure of data that are used by the components.
Control Hierarchy:- Also known as the program structure, the control
hierarchy represents the organization of the program components (modules)
and implies a hierarchy of control.
Structural Partitioning:- Structural partitioning is a process of partitioning the
program structure both horizontally and vertically. Horizontal partitioning
involves defining separate branches of the modular hierarchy for each major
program function. Vertical partitioning, on the other hand, suggests that
control (decision making) and work should be distributed top down in the
program structure.
Data Structure:- Data structure is a representation of the logical relationship
among the individual elements of data. Data structure dictates the
organization, methods of access, degree of associativity, and processing
alternatives for information.
Software Procedure:- The software procedure focuses on the processing
details of each module individually and must provide a precise specification
of processing, including the sequence of events, exact decision points,
repetitive operations, and even data organization and structure.
Information Hiding:- The principle of information hiding states that
modules should be specified and designed so that information (procedures
and data) contained within a module is inaccessible to other modules that have
no need for such information.
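Information hiding can be illustrated with a minimal C sketch. It assumes a hypothetical module that keeps a small list of domain names: the array and its counter are declared static, so other source files can reach them only through the two functions. The names domain_store_add and domain_store_contains are illustrative and are not part of the project code.

```c
#include <string.h>

#define MAX_DOMAINS 16
#define MAX_DOMAIN_LEN 64

/* Hidden state: 'static' limits these names to this source file,
   so no other module can read or modify them directly. */
static char domains[MAX_DOMAINS][MAX_DOMAIN_LEN];
static int domain_count = 0;

/* Public interface (hypothetical name). Returns 1 on success, or 0 if the
   store is full or the name does not fit in one slot. */
int domain_store_add(const char *name)
{
    if (domain_count >= MAX_DOMAINS || strlen(name) >= MAX_DOMAIN_LEN)
        return 0;
    strcpy(domains[domain_count], name);
    domain_count++;
    return 1;
}

/* Public interface (hypothetical name). Returns 1 if the name was stored
   earlier, 0 otherwise. */
int domain_store_contains(const char *name)
{
    int i;

    for (i = 0; i < domain_count; i++) {
        if (strcmp(domains[i], name) == 0)
            return 1;
    }
    return 0;
}
```

A caller would use only domain_store_add("abc.com") and domain_store_contains("abc.com"); the storage layout can then change without affecting any other module.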
Data Structure
A database system is used to store the white list, black list and suspicion list.
Built-in structures of C/Linux are used.
DATA DICTIONARY:
Name: User
Aliases: none
How used:
Description:
User = [Email acc. holder | Administrator]
Email acc. holder = email id + password
email id = * xyz + @ + domain name *
password = * any 8 digit string *
Administrator = login id + password
Login id = personal id
Personal id = * a 3 digit unique no. *
Password = * any 8 digit string *

Name: database [white list | black list | suspicion list]
Aliases: none
Where used: Used by the spam filter in the mail server to identify spam
How used: To compare the from field of the mail header with the entries present in the database
Description:
database = [white list | black list | suspicion list]
White list = [genuine domain names]
Black list = [non-genuine domain names]
Suspicion list = [domain names which are present in neither the black list nor the white list]
Domain name = * xyz + @ + abc.com | abc.com *

Name: e-mail
Aliases: none
Where used: Received by the mail server
How used: Used by the user if genuine
Description:
e-mail = * header + content *
header = [from field | to field]
from field = * email-id of sender *
to field = * email-id of receiver *
content = * message *
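The dictionary defines an email id as a local part, an '@' sign and a domain name. As a rough sketch of how the domain name could be taken from the from field, the helper below returns the text after the last '@'. The function name extract_domain is an assumption made for illustration; the project code does not contain it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the domain part of an e-mail id
   (the text after the last '@'), or NULL when the id has no local part,
   no '@' or no domain. The result points into the caller's string; nothing
   is copied. */
const char *extract_domain(const char *email_id)
{
    const char *at;

    if (email_id == NULL)
        return NULL;
    at = strrchr(email_id, '@');
    if (at == NULL || at == email_id || at[1] == '\0')
        return NULL;
    return at + 1;
}
```

For the id xyz@abc.com the helper yields abc.com, which is the form stored in the white, black and suspicion lists.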
Procedural and Functional details implementation
Function to implement updating of the database through a user-friendly
interface.
Function to implement accessing of the database for reading purposes by
the user.
Function to implement storing data in the database.
Function to take user input.
Procedure to combine the various modules into one using a GUI.
Interface Characterization
Options are provided for the user to access the various databases.
Text-areas and text-fields are provided for inserting data into the database.
Design translation to programming languages:
Database using MySQL.
Coding the functions and modules in syntactically valid statements
in PHP and C, integrating them using a GUI.
Algorithm Design
First of all, the mail comes through the internet.
The domain name of the from field is extracted.
Now, the domain name is matched against the list of domain
names present in the black list.
If the domain name matches, the mail is simply rejected.
Otherwise the domain name is matched against the white list.
If the domain name matches, the mail is sent to the user.
If it does not match, the domain name is added to a list
called the suspicious list and the mail is sent to the user with an
attachment asking the user to add it to the white list.
If the user does not add the domain name to the white list, it is
automatically added to the black list after a given time.
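The decision steps above can be sketched in C as follows. This is only an illustration under stated assumptions: the lists are plain in-memory arrays rather than database tables, and the names verdict, list_contains and classify_domain are hypothetical. The timed move from the suspicious list to the black list is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Possible outcomes of the algorithm (hypothetical names). */
enum verdict {
    MAIL_REJECT,             /* domain found in the black list */
    MAIL_DELIVER,            /* domain found in the white list */
    MAIL_DELIVER_SUSPICIOUS  /* domain in neither list: deliver and add to the suspicious list */
};

/* Returns 1 if 'domain' equals one of the n entries of 'list'. */
static int list_contains(const char *const *list, size_t n, const char *domain)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (strcmp(list[i], domain) == 0)
            return 1;
    }
    return 0;
}

/* The black list is checked first, then the white list. */
enum verdict classify_domain(const char *domain,
                             const char *const *black, size_t n_black,
                             const char *const *white, size_t n_white)
{
    if (list_contains(black, n_black, domain))
        return MAIL_REJECT;
    if (list_contains(white, n_white, domain))
        return MAIL_DELIVER;
    return MAIL_DELIVER_SUSPICIOUS;
}
```

Checking the black list before the white list mirrors the order given in the algorithm, so a domain that appears in both lists is rejected.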
4.5)SECURITY ISSUES
In order to maintain the security of the system, authentication-based access is used.
Although various authentication techniques are available, ID and password
authentication was found to be the most useful for this application.
This means that the user needs to provide an ID and password to access his or her
account. The domain names of mails received by the user are compared with the lists and
the mails are handled accordingly. If a domain name is in neither list, the user is
directed to the suspected list.
Code-
#include <stdio.h>
#include <stdlib.h>
#include <string.h>    /* string functions such as strcmp() */
#include <sysexits.h>  /* exit status codes for system programs */
#include <unistd.h>    /* getopt() and optarg */
#include "/root/filter/mysql/mysql.h"  /* defines for the libmysql library */
#include <libmilter/mfapi.h>           /* include file for mail filter library functions */

/* Note: libmilter runs its callbacks on several threads, so these
   globals are not thread-safe. */
MYSQL my_connection;
MYSQL_RES *res_ptr;
MYSQL_ROW sqlrow;
char *domain_name;
SMFICTX *a;
int rejflag = 0, whiteflag = 0;

/* Helper functions defined later in this file. */
int mail_check(char *mailaddr);
int black_check(char *mailaddr);
int white_check(char *mailaddr);
int sus_add(char *mailaddr);

/********************************mlfi_connect**********************************/
/* Called When:-      Once, at the start of each SMTP connection.
   Default Behavior:- Do nothing; return SMFIS_CONTINUE.
   Argument:-  Description
   ctx         the opaque context structure.
   hostname    the host name of the message sender, as determined by a
               reverse lookup on the host address. If the reverse lookup
               fails, hostname will contain the message sender's IP address
               enclosed in square brackets.
   hostaddr    the host address, as determined by a getpeername() call on
               the SMTP socket. NULL if the type is not supported in the
               current version or if the SMTP connection is made via stdin. */
sfsistat mlfi_connect(SMFICTX *ctx, char *hostname, _SOCK_ADDR *hostaddr)
{
    a = ctx;
    printf("%s", hostname);
    domain_name = hostname;
    printf("\n");
    return SMFIS_CONTINUE; /* continue processing */
}
/*****************************mlfi_envfrom*******************************/
/* Called When:-      mlfi_envfrom is called once at the beginning of each
                      message, before mlfi_envrcpt.
   Default Behavior:- Do nothing; return SMFIS_CONTINUE.
   Argument:-  Description
   ctx         Opaque context structure.
   argv        Null-terminated SMTP command arguments; argv[0] is guaranteed
               to be the sender address. Later arguments are the ESMTP
               arguments.
   Return Values:-  Description
   SMFIS_TEMPFAIL   Reject this sender and message with a temporary error;
                    a new sender (and hence a new message) may subsequently
                    be specified. mlfi_abort is not called.
   SMFIS_REJECT     Reject this sender and message; a new sender/message may
                    be specified. mlfi_abort is not called.
   SMFIS_DISCARD    Accept and silently discard this message. mlfi_abort is
                    not called.
   SMFIS_ACCEPT     Accept this message. mlfi_abort is not called. */
sfsistat mlfi_envfrom(SMFICTX *ctx, char **argv)
{
    /* smfi_getsymval
       Called When:-  smfi_getsymval may be called from within any of the
                      mlfi_* callbacks. Which macros are defined will depend
                      on when it is called.
       Argument:-  Description
       ctx         The opaque context structure.
       symname     The name of a sendmail macro. Single letter macros can
                   optionally be enclosed in braces ("{" and "}"), longer
                   macro names must be enclosed in braces, just as in a
                   sendmail.cf file.
       Return Values:- smfi_getsymval returns the value of the given macro
                   as a null-terminated string, or NULL if the macro is not
                   defined. */
    char *mailaddr = smfi_getsymval(ctx, "{mail_addr}");

    /* Reset the flags so that the result of an earlier message is not reused. */
    rejflag = 0;
    whiteflag = 0;

    if (mailaddr == NULL) {
        /* The macro is not defined, so there is nothing to check. */
        return SMFIS_CONTINUE;
    }

    printf("MAIL FROM :%s", mailaddr);
    mail_check(mailaddr);
    printf("\n\n");

    if (rejflag == 0) {
        return SMFIS_CONTINUE; /* continue processing */
    } else {
        return SMFIS_REJECT;
    }
}
/************************************mail_check***********************************/
/* This function is called after smfi_getsymval() and receives its return
   value (a null-terminated string). */
int mail_check(char *mailaddr)
{
    printf("\n*************** check_black_list*****************");
    /* Compares the sender's domain name with the black list in the database;
       if it is listed, the sender is not allowed to send mail. */
    black_check(mailaddr);
    if (rejflag == 0) {
        printf("\n **************check_white_list******************");
        printf("\n\n");
        /* Compares the sender's domain name with the white list in the
           database; if it is listed, the sender is allowed to send mail. */
        white_check(mailaddr);
        if (!whiteflag) {
            printf("\n**************** add_to_suspicious****************");
            /* Called when the domain name is present in neither the black
               list nor the white list; adds the domain name to the database
               (suspicious list). */
            sus_add(mailaddr);
        }
    }
    return EXIT_SUCCESS;
}
/***************************black_check*************************/
int black_check(char *mailaddr)
{
    int res;

    rejflag = 0; /* not rejected unless a black list entry matches */
    mysql_init(&my_connection);
    /* mysql_real_connect() establishes the connection to a MySQL server on
       the specified host. It returns the connection handle if successful,
       or NULL otherwise. */
    if (mysql_real_connect(&my_connection, "127.0.0.1", "root", "", "nit", 0, NULL, 0)) {
        /* mysql_query() sends an SQL statement to the MySQL server to be
           executed. It returns zero on success and a nonzero value on
           failure. After a successful SELECT, the rows are read with
           mysql_use_result() and mysql_fetch_row(). */
        res = mysql_query(&my_connection, "SELECT * FROM b_list ");
        if (res) {
            printf("SELECT error:%s\n", mysql_error(&my_connection));
        } else {
            res_ptr = mysql_use_result(&my_connection);
            while (res_ptr && (sqlrow = mysql_fetch_row(res_ptr))) {
                if (sqlrow[0] && !strcmp(mailaddr, sqlrow[0])) {
                    rejflag = 1;
                    printf("\n REJECT");
                    printf("%s\n", sqlrow[0]);
                    printf("\n\n\n\n");
                    break;
                } else {
                    rejflag = 0;
                    printf("\n OK");
                }
            }
            if (res_ptr)
                mysql_free_result(res_ptr); /* release the result set */
        }
    }
    mysql_close(&my_connection); /* release the connection handle */
    return EXIT_SUCCESS;
}
/*****************************white_check***************************/
int white_check(char *mailaddr)
{
    int res;

    whiteflag = 0; /* not white-listed unless an entry matches */
    mysql_init(&my_connection);
    /* mysql_real_connect() establishes the connection to a MySQL server on
       the specified host. It returns the connection handle if successful,
       or NULL otherwise. */
    if (mysql_real_connect(&my_connection, "127.0.0.1", "root", "", "nit", 0, NULL, 0)) {
        /* mysql_query() sends an SQL statement to the MySQL server to be
           executed. It returns zero on success and a nonzero value on
           failure. */
        res = mysql_query(&my_connection, "SELECT * FROM w_list ");
        if (res) {
            /* Reached when the query could not be executed. */
            printf("SELECT error:%s\n", mysql_error(&my_connection));
        } else {
            res_ptr = mysql_use_result(&my_connection);
            /* mysql_fetch_row() retrieves the rows returned from the server.
               It takes the result set pointer of the previous query and
               returns an array holding the fetched row, or NULL if there
               are no more rows left. */
            while (res_ptr && (sqlrow = mysql_fetch_row(res_ptr))) {
                if (sqlrow[0] && !strcmp(mailaddr, sqlrow[0])) {
                    whiteflag = 1;
                    printf("\n WHITE_OK :");
                    printf("%s\n", sqlrow[0]);
                    printf("\n\n\n\n");
                    break;
                } else {
                    whiteflag = 0;
                    printf("\n Add to Suspicious List");
                }
            }
            if (res_ptr)
                mysql_free_result(res_ptr); /* release the result set */
        }
    }
    mysql_close(&my_connection); /* release the connection handle */
    return EXIT_SUCCESS;
}
/**************************sus_add******************************/
int sus_add(char *mailaddr)
{
    int res;
    char escaped[2 * 255 + 1]; /* worst case: every character is escaped */
    char query[600];
    size_t len = strlen(mailaddr);

    if (len > 255) {
        /* Too long for the fixed-size buffers above. */
        printf("\n address too long for suspicious list");
        return EXIT_FAILURE;
    }

    mysql_init(&my_connection);
    /* mysql_real_connect() establishes the connection to a MySQL server on
       the specified host. It returns the connection handle if successful,
       or NULL otherwise. */
    if (mysql_real_connect(&my_connection, "127.0.0.1", "root", "", "nit", 0, NULL, 0)) {
        /* printf("Connection success\n"); */
        /* Escape quotes in the address so that it cannot alter the SQL
           statement. */
        mysql_real_escape_string(&my_connection, escaped, mailaddr, (unsigned long)len);
        snprintf(query, sizeof(query), "insert into sus_list values('%s')", escaped);
        printf("\n add to sus my==%s", query);
        printf("\n\n");

        /* mysql_query() sends an SQL statement to the MySQL server to be
           executed. It returns zero on success and a nonzero value on
           failure. An INSERT produces no result set. */
        res = mysql_query(&my_connection, query);
        if (res) {
            printf("INSERT error:%s\n", mysql_error(&my_connection));
        }
    }
    mysql_close(&my_connection); /* release the connection handle */
    return EXIT_SUCCESS;
}
struct smfiDesc smfilter =
{
    "badDNS",      /* filter name */
    SMFI_VERSION,  /* version code -- do not change */
    SMFIF_NONE,    /* flags */
    mlfi_connect,  /* connection info filter */
    NULL,          /* SMTP HELO command filter */
    mlfi_envfrom,  /* envelope sender filter */
    NULL,          /* envelope recipient filter */
    NULL,          /* header filter */
    NULL,          /* end of header */
    NULL,          /* body block filter */
    NULL,          /* end of message */
    NULL,          /* message aborted */
    NULL           /* connection cleanup */
};
int main(int argc, char *argv[])
{
    int c;
    const char *args = "p:";

    /* Process command line options */
    while ((c = getopt(argc, argv, args)) != -1) {
        switch (c) {
        case 'p':
            if (optarg == NULL || *optarg == '\0') {
                (void) fprintf(stderr, "Illegal conn: %s\n",
                               optarg ? optarg : "(null)");
                exit(EX_USAGE);
            }
            /* Called When:- smfi_setconn must be called once before
                             smfi_main.
               Effects:-     Sets the socket through which the filter
                             communicates with sendmail.
               Argument:-    The address of the desired communication socket.
                             The address should be a NULL-terminated string
                             in "proto:address" format. */
            (void) smfi_setconn(optarg);
            break;
        }
    }

    /* Called when:- smfi_register must be called before smfi_main.
       Effects:-     smfi_register creates a filter using the information
                     given in the smfiDesc argument. Multiple calls to
                     smfi_register within a single process are not allowed.
       Argument:-    smfilter  A filter descriptor of type smfiDesc
                     describing the filter's functions. */
    if (smfi_register(smfilter) == MI_FAILURE) {
        fprintf(stderr, "smfi_register failed\n");
        exit(EX_UNAVAILABLE);
    }

    /* Called when:-  smfi_main is called after a filter's initialization is
                      complete.
       Effects:-      smfi_main hands control to the Milter event loop.
       Return value:- smfi_main will return MI_FAILURE if it fails to
                      establish a connection. This may occur for any of a
                      variety of reasons (e.g. invalid address passed to
                      smfi_setconn). The reason for the failure will be
                      logged. Otherwise, smfi_main will return MI_SUCCESS. */
    return smfi_main();
}
IMPLEMENTATION APPROACHES
PHASES IMPLEMENTED
PHASE I
Step 1. Identify needs and benefits.
Activity 1.1 Meeting with customer.
Activity 1.2 Identifying the basic needs and project constraints.
Activity 1.3 Establish project statements.
PHASE II
Step 2. Selection of the process model to be used for design and
implementation of the system.
Activity 2.1 Study and comparison of various models.
Activity 2.2 Study of process model performed.
Activity 2.3 Select the most appropriate process model.
Activity 1.2.4 Establishing the scope of project.
Activity 1.2.5 Listing the advantages and disadvantages of project.
PHASE III
Step 3. Perform the requirement analysis of the system.
Activity 3.1 Prepare the requirement documents.
PHASE IV
Step 4. Identify the project resource requirements for accomplishment of the project.
Activity 4.1 Define reusable s/w and human resources.
PHASE V
Step 5. Partitioning the software into individual components.
Activity 5.1 Identification of the modules that comprise the system.
Activity 5.2 Sketch the interrelationship between different modules.
PHASE VI
Step 6. Coding of the modules.
Activity 6.1 Identification of functions performed by each module.
Activity 6.2 The modules are coded.
PHASE VII
Step 7. Verification and testing of proper functioning of the system.
Activity 7.1 Error free execution of the project.
Activity 7.2 Testing of the system.
5.3)TESTING APPROACH
Testing Method Used.
The following testing methods have been used in this project:
Black Box Testing
Black Box testing, also called behavioral testing, focuses on the functional
requirements of the software. That is, black box testing enables the
software engineer to derive sets of input conditions that will fully exercise
all functional requirements for a program. It attempts to find errors in the
following categories:
Incorrect or missing functions.
Interface errors.
Errors in data structures or external database access.
Initialization and termination errors.
This testing strategy is applied during later stages of testing. Because black
box testing purposely disregards control structure, attention is focused on
the information domain. Tests are designed to answer the following
questions.
How is functional validity tested?
How is system behavior and performance tested?
What classes of input will make a good test case?
Is the system particularly sensitive to certain input values?
How are the boundaries of a data class isolated?
What data rates and data volume can the system tolerate?
What effect will specific combination of data have on system operation?
The various types of black box testing are:
Graph-based testing
Equivalence partitioning
Boundary value analysis
Comparison testing
Orthogonal array testing
How is functional validity tested?
The functional validity of the project was tested by giving a series of
inputs, noting the output, and then matching the output obtained with
the expected output.
All the modules were tested individually in different environments and
were found to be functionally valid.
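As an illustration of boundary value analysis, the sketch below tests a hypothetical password check against inputs just below, at, and just above the boundary. It assumes that the data dictionary entry "any 8 digit string" means a string of exactly 8 characters; the names is_valid_password and count_failed_cases are not part of the project code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: accepts a password of exactly 8 characters
   (an assumed reading of "any 8 digit string"). */
int is_valid_password(const char *password)
{
    return password != NULL && strlen(password) == 8;
}

/* One black box test case: an input and the expected output. */
struct test_case {
    const char *input;
    int expected;
};

/* Runs the boundary cases and returns how many of them produced an
   output different from the expected one. */
int count_failed_cases(void)
{
    static const struct test_case cases[] = {
        { "",          0 },  /* empty input */
        { "1234567",   0 },  /* one below the boundary */
        { "12345678",  1 },  /* exactly on the boundary */
        { "123456789", 0 }   /* one above the boundary */
    };
    size_t i;
    int failed = 0;

    for (i = 0; i < sizeof(cases) / sizeof(cases[0]); i++) {
        if (is_valid_password(cases[i].input) != cases[i].expected)
            failed++;
    }
    return failed;
}
```

The test only looks at inputs and outputs, never at the internal structure of the function, which is what makes it a black box test.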
Basis Path Testing
Basis Path Testing is a White Box Testing Technique proposed by Tom
McCabe [MCC 76]. The Basis path method enables the test case designer
to derive a logical complexity measure of a procedural design and use this
measure as a guide for defining a basis set of execution paths. Test cases
derived to exercise the basis set are guaranteed to execute every statement
in the program at least one time during testing.
1. Flow Graph notation:
Flow Graph depicts logical control flow of the process. Each circle
is called a flow graph node and represents one or more procedural
statements .
2. Cyclomatic Complexity
Cyclomatic Complexity is a software metric that provides a
quantitative measure of logical complexity of a program. The value
for Cyclomatic Complexity defines the number of independent paths
in the basis set of a program and provides us with an upper bound
for the number of tests that must be conducted to ensure that all
statements have been executed at least once.
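For a connected flow graph, cyclomatic complexity is commonly computed as V(G) = E - N + 2, where E is the number of edges and N the number of nodes, or equivalently as P + 1, where P is the number of predicate (two-way decision) nodes. The small C sketch below shows both forms; the function names and the figures in the usage note are made up for illustration and do not describe the project's own flow graph.

```c
/* V(G) = E - N + 2 for a flow graph that forms a single connected component. */
int cyclomatic_complexity(int edges, int nodes)
{
    return edges - nodes + 2;
}

/* V(G) = P + 1, where P is the number of predicate (two-way decision) nodes. */
int cyclomatic_from_predicates(int predicate_nodes)
{
    return predicate_nodes + 1;
}
```

For a hypothetical graph with 9 edges and 7 nodes, cyclomatic_complexity(9, 7) gives 4, so a basis set of 4 independent paths would be needed; a graph with 3 predicate nodes gives the same value through cyclomatic_from_predicates(3).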
Data Flow Graph
[Figure: flow graph with nodes 1,2 and 3 to 8]
1,2 - Checking the mailform of the incoming mail.
3 - Checking for the domain name of the incoming mail in the black list.
4 - Reject the incoming mail.
5 - Checking whether the mail is forged or not.
6 - Check for the domain name in the white list.
7 - Deliver the mail.
8 - Add to the suspicious list.
Path 1: 1,2,3,4,1,2
Input: Mail with the from field contained in the black list.
Output: Sender reject.
Path 2: 1,2,3,5,4,1,2
Input: Mail with incorrect domain name in from field .
Output: Sender reject.
Path 3: 1,2,3,4,5,6,7
Input: Mail with the from field contained in the white list.
Output: Sender OK.
Path 4: 1,2,3,4,5,6,8,7
Input: Mail with the from field contained neither in the black list nor in the white list.
Output: From field is added to the suspicious list and the mail is delivered.
This approach of handling spam using white and black lists of domain names and
recipient feedback works fairly well. With a higher number of users using the system,
the efficiency and accuracy are expected to go up substantially. The system is also
expected to stabilize soon enough so as not to cause any substantial user resistance.
Also, the system can be used in combination with other approaches, like content
checking, towards filtering out spam.
7.2)LIMITATIONS OF THE PROJECT
The following are the limitations of our project:
Our spam filter is capable of filtering mails only according to the
domain names listed in the black list. Therefore, at this stage, it is not able to
filter spam on the basis of its contents or some other criteria.
DIFFICULTIES ENCOUNTERED
The following difficulties were encountered while making our project:
As it was our first experience with the Linux operating system, it
was a bit difficult to cope with its environment.
Since the sendmail server needs to be configured to enable the milter facility,
the configuration was a difficult task to perform.
The Apache web server needs to be configured to run PHP, and initially PHP was
not working properly.
FUTURE SCOPE OF PROJECT
There is wide scope for enhancement in our project.
The following enhancement can be made:
Filtering of spam can be done on the basis of its contents.
BIBLIOGRAPHY
[1] Red Hat Linux 8 Server –Mohammed J. Kabir
[2] Professional Linux programming wrox publication
[3] Professional PHP Guide wrox publication
Other References:
[1] Evaluating Anti-Spam Solutions - Criteria that Makes the Difference,Michael
Vizard, editor-in-chief, CRN (www.syntegra.com/us/anti spam/evaluating anti-spam
solutions full.pdf )
[2] Spambayes anti-spam, http://spambayes.sourceforge.net/
[3] Mozilla spam filter, http://www.mozilla.org/mailnews/spam.html
[4] SPAM Research Center http://www.spamresearchcenter.com/
[5] SpamAssassin, http://www.spamassassin.org/
[6] Send-mail, http://www.sendmail.org/
[7] Milter community website, http://www.milter.org/
[8] PHP, http://www.php.net/