Divya Document

97
AUTOMATIC DISCOVERY OF ASSOCIATION ORDERS BETWEEN NAME AND ALIASES FROM THE WEB USING ANCHOR TEXT BASED CO-OCCURENCES CHAPTER 1 INTRODUCTION This paper mainly deals with information retrieval system. Information retrieval is the area where users might search for documents, information within documents and metadata from documents on the web. Many users query might include retrieval of documents for personal names. 1 DEPARTMENT OF CSE GDMM,NANDIGAMA

description

AUTOMATIC DISCOVERY OF ASSOCIATION ORDERS BETWEEN NAME AND ALIASES FROM THE WEB USING ANCHOR TEXT BASED CO-OCCURENCES

Transcript of Divya Document

1

AUTOMATIC DISCOVERY OF ASSOCIATION ORDERS BETWEEN NAME AND ALIASES FROM THE WEB USING ANCHOR TEXT BASED CO-OCCURENCES

CHAPTER 1INTRODUCTION

This paper mainly deals with information retrieval system. Information retrieval is the area where users might search for documents, information within documents and metadata from documents on the web. Many users query might include retrieval of documents for personal names. Many celebrities and experts from various fields are referred by their original names on web. Most of the queries to web search engines include person names. For example, people might use Michel Jackson as a query on search engine to know about him. The search engine might give the relevant documents met the information need of the users query. Apparently celebrities and experts might also be referred by their aliases on the web. Many web pages about person names might also be created by aliases. For example, a newspaper article might refer the persons using their original names, whereas a blogger might refer them using their nick names. The user will not be able to retrieve all information about a person if he only uses his personal name. To retrieve complete information about a person name, one might know about his aliases on the web. Various types of words are used as aliases on the web. Identifying aliases will be helpful in information retrieval. The aliases are extracted using previously proposed alias extraction method. The search engine expands the query on person names by tagging the extracted aliases to retrieve relevant web pages those are referred by original names as well as aliases thereby improving recall and MRR.1.1 MOTIVATIONSearching for information about people in the Web is one of the most common activities of Internet users. Around 30% of search engine queries include person names .However, retrieving information about people from web search engines can become difficult when a person has nicknames or name aliases. For example, the famous Japanese major league base ballplayer Hideki Matsui is often called as Godzilla on the Web. A newspaper article on the baseball player might use the real name, Hideki Matsui, whereas a blogger would use the alias, Godzilla, in a blog entry. We will not be able to retrieve all the information about the base ballplayer if we only use his real name .Identifying aliases of a name is important in information retrieval. In information retrieval, to improve recall of a web search on a person name, a search engine can automatically expand a query using aliases of the name. In our previous example, a user who searches for Hideki Matsui might also be interested in retrieving documents in which Matsui is referred to as Godzilla. Consequently, we can expand a query on Hideki Matsui using his alias name Godzilla.The set A of its aliases to be the set of all words or multiword expressions that are used to refer on the web. For example, Godzilla is a one-word alias for Hideki Matsui, whereas alias the Fresh Prince contains three words and refers to Will Smith. Various types of terms are used as aliases on the web. For instance, in the case of an actor, the name of a role or the title of a drama (or a movie) can later become an alias for the person (e.g., Fresh Prince, Knight Rider). Titles or professions such as president, doctor, professor, etc., are also frequently used as aliases. Variants or abbreviations of names such as Bill for William, and acronyms such as JFK for John Fitzgerald Kennedy are also types of name aliases that are observed frequently on the web.1.2 OVERVIEW

An individual is typically referred by numerous name aliases on the web. Accurate identification of aliases of a given person name is useful in various web related tasks such as information retrieval, personal name disambiguation, and relation extraction. We propose a method to extract aliases of a given personal name from the web. Given a personal name, the proposed method first extracts a set of aliases. Second, we rank the extracted candidates according to the likelihood of a candidate being a correct alias of the given name.

We define numerous ranking scores to evaluate candidate aliases using three approaches: lexical pattern frequency, word co-occurrences in an anchor text graph, and page counts on the web. To construct a robust alias detection system, we integrate the different ranking scores into a single ranking function using ranking support vector machines. We evaluate the proposed method on two data sets: an English personal names data set, an English place names data set The proposed method outperforms numerous baselines and previously proposed name alias extraction methods, achieving a statistically significant mean reciprocal rank (MRR) of 0.67. Experiments carried out using location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities, and for different languages. Moreover, the aliases extracted using the proposed method are successfully utilized in an information retrieval task and improve recall by 20 percent in a relation detection task.

To select the best aliases among the extracted candidates, we propose numerous ranking scores based upon three approaches: word co-occurrences in an anchor text graph, and page counts on the web. Moreover, using real-world name alias data, we train a ranking support vector machine to learn the optimal combination of individual ranking scores to construct a robust alias extraction method.

Along with the recent rapid growth of social media such as blogs, extracting and classifying sentiment on the web has received much attention However, when people express their views about a particular entity, they do so by referring to the entity not only using the real name but also using various aliases of the name. By aggregating texts that use various aliases to refer to an entity, a sentiment analysis system can produce an informed judgment related to the sentiment.1.3 LIMITATIONS

However, an inherent limitation is that they cannot identify aliases which share no words or letters with the real name For example; approximate string matching methods would not identify Fresh Prince as an alias for Will Smith.

The most likely correct aliases are assigned a higher rank. Ranking a set of candidate aliases must be in descending order only. The co-occurrences will be considered not only anchor text but also the overall process on the web.

The different ranking scores must be integrated to evaluate ranking function which a complex process Comparison of SVM ranks with is previously proposed rank must be difficult.1.4 ORGANISATION OF DOCUMENTATION

In this project documentation we have initially put the definition and objective of the project as well as the design of the project which is followed by the implementation and testing phases. Finally the project has been concluded successfully and also the future enhancements of the project were given in this documentation.

CHAPTER 2

LITERATURE SURVEY

2.1 INTRODUCTIONLiterature survey is the most important step in software development process. Before developing the tool it is necessary to determine the time factor, economy n company strength. Once these things r satisfied, ten next steps are to determine which operating system and language can be used for developing the tool. Once the programmers start building the tool the programmers need lot of external support. This support can be obtained from senior programmers, from book or from websites. Before building the system the above consideration are taken into account for developing the proposed system.2.2 EXISTING SYSTEMThe existing namesake disambiguation algorithm assumes the real name of a person to be given and does not attempt to disambiguate people who are referred only by aliases.

2.2.1 DISADVANTAGES OF EXISTING SYSTEM 1) To low MRR and AP scores on all data sets. 2) To complex hub discounting measure.2.3 PROPOSED SYSTEMThe proposed method will work on the aliases and get the association orders between name and aliases to help search engine tag those aliases according to the orders such as first order associations, second order associations etc so as to substantially increase the recall and MRR of the search engine while searching made on person names. The term recall is defined as the percentage of relevant documents that were in fact retrieved for a search query on search engine. The mean reciprocal rank of the search engine for a given sample of queries is that the average of the reciprocal ranks for each query. The term word co-occurrence refers to the temporal property of the two words occurring at the same web page or same document on the web. The anchor text is the clickable text on web pages, which points to a particular web document. Moreover the anchor texts are used by search engine algorithms to provide relevant documents for search results because they point to the web pages that are relevant to the user queries. So the anchor texts will be helpful to find the strength of association between two words on the web. The anchor texts-based co-occurrence means that the two anchor texts from the different web pages point to the same the URL on the web. The anchor texts which point to the same URL are called as inbound anchor texts. The proposed method will find the anchor texts-based co-occurrences between name and aliases using co-occurrence statistics and will rank the name and aliases by support vector machine according to the co-occurrence measures in order to get connections among name and aliases for drawing the word co-occurrence graph. Then a word co-occurrence graph will be created and mined by graph mining algorithm so as to get the hop distance between name and aliases that will lead to the association orders of aliases with the name. The search engine can now expand the search query on a name by tagging the aliases according to their association orders to retrieve all relevant pages which in turn will increase the recall and achieve a substantial MRR.

2.4 CONCLUSIONThis paper presents a software application which avoids manual hours and helps the customer to track the status of applications. It is not secure to maintain important information manually it is better to store in database which helps in avoiding conflicts. In this software application no specific training is required for the employees to use this application.CHAPTER 3

ANALYSIS

3.1 INTRODUCTIONFEASIBILITY STUDY The feasibility of the project is analyzed in this phase and business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis the feasibility study of the proposed system is to be carried out. This is to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.Three key considerations involved in the feasibility analysis are

ECONOMICAL FEASIBILITY

TECHNICAL FEASIBILITY

SOCIAL FEASIBILITY

ECONOMICAL FEASIBILITY This study is carried out to check the economic impact that the system will have on the organization. The amount of fund that the company can pour into the research and development of the system is limited. The expenditures must be justified. Thus the developed system as well within the budget and this was achieved because most of the technologies used are freely available. Only the customized products had to be purchased. TECHNICAL FEASIBILITY

This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not have a high demand on the available technical resources. This will lead to high demands on the available technical resources. This will lead to high demands being placed on the client. The developed system must have a modest requirement, as only minimal or null changes are required for implementing this system. SOCIAL FEASIBILITYThe aspect of study is to check the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, instead must accept it as a necessity. The level of acceptance by the users solely depends on the methods that are employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is also able to make some constructive criticism, which is welcomed, as he is the final user of the system.3.2 SOFTWARE SYSTEM SPECIFICATION

3.2.1 PURPOSE OF THE PROJECT

To train and evaluate the proposed method, there are two data sets: the personal names data set and the place names data set. The personal names data set includes people from various fields of cinema, sports, politics, science, and mass media. The place names data set contains aliases for US states. According to the definition of co-occurrences if the two anchor texts co-occur in pointing to the same URL, then undirected edge will be drawn between them to denote their co-occurrences. A word co-occurrence graph like will be created for the name and aliases according to their first order associations among them. Each name and aliases will be represented by a node in the graph. The two nodes will be connected if they make first order associations between them. The edge between nodes will describe that the nodes bearing anchor texts co-occur according to the definition of anchor texts co-occurrences. Next the hop distance between nodes will be identified in order to have first, second, and higher order associations between name and aliases by graph mining algorithm.3.2.2 SCOPE OF THE PROJECT Extracting aliases of an entity is important for various tasks such as identification of relations among entities, web search and entity disambiguation. To extract relations among entities properly, one must first identify those entities. We propose a novel approach to find aliases of a given name using automatically extracted lexical patterns. We exploit a set of known names and their aliases as training data and extract lexical patterns that convey information related to aliases of names from text snippets returned by a web search engine. The patterns are then used to find candidate aliases of a given name. We use anchor texts to design a word co-occurrence model and use it to define various ranking scores to measure the association between a name and a candidate alias. The ranking scores are integrated with page-count-based association measures using support vector machines to leverage a robust alias detection method. The proposed method outperforms numerous baselines and previous work on alias extraction on a dataset of personal names, achieving a statistically significant mean reciprocal rank of 0.6718. Experiments carried out using a dataset of location names and Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different types of named entities and for other languagesSRS MODEL

Fig 3.1 steps to develop system analysis modelThewaterfall modelis asequentialdesignprocess, often used insoftware development processes, in which progress is seen as flowing steadily downwards (like awaterfall) through the phases of Conception, Initiation,Analysis,Design, Construction,Testing, Production/Implementation, andMaintenance. The waterfall development model originates in themanufacturingandconstruction industries highly structured physical environments in which after-the-fact changes are prohibitively costly, if not impossible. Since no formal software development methodologies existed at the time, this hardware-oriented model was simply adapted for software development.

The first known presentation describing use of similar phases in software engineering was held by Herbert D. Benington at Symposium on advanced programming methods for digital computers on 29 June 1956. This presentation was about the development of software forSAGE. In 1983 the paper was republished with a foreword by Benington pointing out that the process was not in fact performed in a strict top-down fashion, but depended on a prototype.

The first formal description of the waterfall model is often cited as a 1970 article by Winston W. Royce, although Royce did not use the term "waterfall" in this article. Royce presented this model as an example of a flawed, non-working model.This, in fact, is how the term is generally used in writing about software development to describe a critical view of a commonly used software development practice.3.3 SYSTEM CONFIGURATION

3.3.1 HARDWARE SYSTEM CONFIGURATION Processor

- Pentium III/IV Speed

- 1.1 Ghz

RAM

- 256 MB(min)

Hard Disk

- 20 GB

Floppy Drive

- 1.44 MB

Key Board

- Standard Windows Keyboard

Mouse

- Two or Three Button Mouse

Monitor

- SVGA 3.3.2 Software System Configuration

Operating System

: Windows95/98/2000/XP

Application Server

: Tomcat5.0/6.X Front End

: HTML, Java, Jsp

Scripts

: JavaScript.

Server side Script

: Java Server Pages.

Database

: MysQl

Database Connectivity : JDBC.3.4SPECIFIC PROJECT SOFTWARESJava Technology

Initially the language was called as oak later renamed as Java in 1991. The primary motivation of this language was the need for a platform-independent (i.e., architecture neutral) language that could be used to create software to be embedded in various consumer electronic devices. Java is a programmers language.

Java is cohesive and consistent.

Except for those constraints imposed by the Internet environment, Java gives the programmer, full control.

Finally, Java is to Internet programming where C was to system programming. Importance of java to the internet Java has had a profound effect on the Internet. This is because; Java expands the Universe of objects that can move about freely in Cyberspace. In a network, two categories of objects are transmitted between the Server and the Personal computer. They are: Passive information and Dynamic active programs. The Dynamic, Self-executing programs cause serious problems in the areas of Security and probability. But, Java addresses those concerns and by doing so, has opened the door to an exciting new form of program called the Applet.

Java can be used to create two programs

Applications and Applets: An application is a program that runs on our Computer under the operating system of that computer. It is more or less like one creating using C or C++. Javas ability to create Applets makes it important. An Applet is an application designed to be transmitted over the Internet and executed by a Java compatible web browser. An applet is actually a tiny Java program, dynamically downloaded across the network, just like an image. But the difference is, it is an intelligent program, not just a media file. It can react to the user input and dynamically change.

Java Virtual Machine (JVM)Beyond the language, there is the Java virtual machine. The Java virtual machine is an important element of the Java technology. The virtual machine can be embedded within a web browser or an operating system. Once a piece of Java code is loaded onto a machine, it is verified. As part of the loading process, a class loader is invoked and does byte code verification makes sure that the code thats has been generated by the compiler will not corrupt the machine that its loaded on. Byte code verification takes place at the end of the compilation process to make sure that is all accurate and correct. So byte code verification is integral to the compiling and executing of Java code.

THE THREE-OOP PRINCIPLES

A)ENCAPSULATION

Encapsulation is the mechanism that binds together code and the data it manipulates, and keeps both safe from outside interference and misuse. One way to think about encapsulation is as a protective wrapper that prevents the code and data from begin arbitrarily accessed by other code defined outside of the wrapper. Access to he code and data inside the wrappers is tightly controlled through a well-defined interface. To relate this to the real world, consider the automatic transmission on an automobile. It encapsulates hundreds of bits of information about our engine, such as how much you are accelerating, the pitch of the surface you are on, and the position of the shift lever.B) INHERITANCEInheritance is the process by which one object acquires the properties of another object. This is important because it supports the concept of hierarchical classification. As mentioned earlier, most knowledge is made manageable by hierarchical classification. Inheritance interacts with encapsulation as well. If a given class encapsulates some attributes, then subclass will have the same attributes plus any that it adds as part of its specialization. C)POLYMORPHISM

Polymorphism (from the Greek, meaning many forms) is a feature that allows one interface to be used for a general class of actions. More generally, the concept of polymorphism is often expressed by the phrase one interface, multiple methods. This means that is possible to design a generic interface to a group or related activities. This helps reduce complexity by allowing the same interface to be used to specify a general class of action. It is the compilers job to select the specific action as it applies to each situationMY SQLMySQL is a relational database management system (RDBMS), and ships with no GUI tools to administer MySQL databases or manage data contained within the databases. Users may use the included command line tools, or use MySQL "front-ends", desktop software and web applications that create and manage MySQL databases, build database structures, back up data, inspect status, and work with data records. The official set of MySQL front-end tools,my sql work bench is actively developed by Oracle, and is freely available for use. Graphical PackageThe official MySQL Workbench is a free integrated environment developed by MySQL AB, that enables users to graphically administer MySQL databases and visually design database structures. MySQL Workbench replaces the previous package of software, My SQL GUI Tools. Similar to other third-party packages, but still considered the authoritative My SQL front end, My SQL Workbench lets users manage database design & modeling, SQL development (replacing My SQL Query Browser) and Database administration (replacing MySQL Administrator).

MySQL Workbench is available in two editions, the regular free and open source Community Edition which may be downloaded from the MySQL website, and the proprietary Standard Edition which extends and improves the feature set of the Community Edition.MySQL ships with many command line tools, from which the main interface is 'mysql' client. Third-parties have also developed tools to manage, optimize, monitor and backup a MySQL server, some listed below. All these tools work on *NIX type operating systemsAs of April 2009[update], MySQL offered MySQL 5.1 in two different variants: the open source MySQL Community Server and the commercial Enterprise Server. MySQL 5.5 is offered under the same licences. They have a common code base and include the following features:

A broad subset of ANSI SQL 99, as well as extensions

Cross-platform support

Stored procedures Triggers Cursors Updatable Views Information schema Strict mode X/Open XA distributed transaction processing (DTP) support; two phase commit as part of this, using Oracle's InnoDB engine

Independent storage engines Transactions with the InnoDB, and Cluster storage engines; savepoints with InnoDB

SSL supportLimitationsLike other SQL databases, MySQL does not currently comply with the full SQL standard for some of the implemented functionality, including foreign key references when using some storage engines other than the 'standard' InnoDBTriggers are currently limited to one per action / timing, i.e. maximum one after insert and one before insert on the same table. There are no triggers on viewsMySQL, like most other transactional relational databases, is strongly limited by hard disk performance. This is especially true in terms of write latency. Given the recent appearance of very affordable consumer grade SATA interface Solid-state drives that offer zero mechanical latency, a fivefold speedup over even an eight drive RAID array can be had for a smaller investmentWindows DeploymentMySQL can be built and installed manually from source code, but this can be tedious so it is more commonly installed from a binary package unless special customizations are required. On most Linux distributions the package management system can download and install MySQL with minimal effort, though further configuration is often required to adjust security and optimization settings.Though MySQL began as a low-end alternative to more powerful proprietary databases, it has gradually evolved to support higher-scale needs as well. It is still most commonly used in small to medium scale single-server deployments

CommunityThe MySQL server software itself and the client libraries use dual-licensing distribution. They are offered under GPL version 2, beginning from 28 June 2000 (which in 2009 has been extended with a FLOSS License Exception) or to use a proprietary license. Support can be obtained from the official manual. Free support additionally is available in different IRC channels and forums. Oracle offers paid support via its MySQL Enterprise products.

Apache tomcatApache Tomcat (or simply Tomcat, formerly also Jakarta Tomcat) is an open source web server and servlet container developed by the Apache Software Foundation (ASF). Tomcat implements the Java Servlet and the JavaServer Pages (JSP) specifications from Sun Microsystems, and provides a "pure Java" HTTP web server environment for Java code to run.Apache Tomcat includes tools for configuration and management

ClusterThis component has been added to manage large applications. It is used for Load balancing that can be achieved through many techniques.Clustering support currently requires the JDK version 1.5 or later.

High availabilityA high-availability feature has been added to facilitate the scheduling of system upgrades (e.g. new releases, change requests) without affecting the live environment. This is done by dispatching live traffic requests to a temporary server on a different port while the main server is upgraded on the main port. It is very useful in handling user requests on high-traffic web applications.[2]Web ApplicationIt has also added user as well as system based web applications enhancement to add support for deployment across the variety of environments. It also tries to manage session as well as applications across the network.

Tomcat building is additional components. A number of additional components may be used with Apache Tomcat. These components may be built by users should they need them or they can be downloaded from one of the mirrors may be built by users

FeaturesTomcat 7.x implements the Servlet 3.0 and JSP 2.2 specifications. It requires Java version 1.6, although previous versions have run on Java 1.1 through 1.5. Versions 5 through 6 saw improvements in garbage collection, JSP parsing, performance and scalability. Native wrappers, known as "Tomcat Native", are available for Microsoft Windows and Unix for platform integration3.5 CONTENT DIAGRAM OF THE PROJECTFFig3.2: Architecture of proposed method

The architecture outlined in Fig3.2 and comprises four main components namely computation of word co-occurrence statistics, ranking anchor texts, creation of anchor text co-occurrence graph, and discovery of association orders. The anchor text mining to measure the associations between anchor texts. Ranking support vector machine (SVM) will be used to rank the anchor texts with respect to each anchor text to identify the highest ranking anchor text for making first order associations among anchor texts. The whole process that should done in the project is described as follows

The input is in the form of allinanchor: input is given to the Google search engine and it will retrieve the all corresponding anchor texts and urls according to the given input and those anchor texts and urls are kept in a table called contingency table in contingency table the anchor texts and urls are arranged in two columns. After creation of contingency table the word co-occurrence frequency can be computed

In the second part of the architecture the training data set is presented. The input is given to the support vector machine. The support vector machine generates a ranking function that function is given to the ranking algorithm. The ranking algorithm generates a first order association.

In the third part of the architecture to get the connections among the name and aliases the graph drawing algorithm is drawn. From that algorithm the word co-occurrence graph is created. The word co-occurrence graph should be mined by graph mining algorithm. The hop distance is found by using graph mining algorithm. Finally the association orders will be discovered by using the hop distances.3.6ALGORITHM3.6.1 Keyword Extraction Algorithm Matsuo, Ishizuka proposed a method called keyword extraction algorithm that applies to a single document without using a corpus. Frequent terms are extracted first, and then a set of co-occurrences between each term and the frequent terms, i.e., occurrences in the same sentences, are generated. Co-occurrence distribution showed the importance of a term in the document. However, this method only extracts a keyword from a document but not correlate any more documents using anchor texts-based co-occurrence frequency3.7 FLOW CHART

Fig3.3: Flow chart of the project3.8 CONCLUSION

In this phase, we understand the software requirements specifications for the project. We arrange all the required components to develop the project in this phase itself so that we will have a clear idea regarding the requirements before designing the project. Thus we will proceed to the design phase followed by the implementation phase of the project.

CHAPTER 4

DESIGN

4.1 INTRODUCTION4.1.1 INPUT DESIGN

The input design is the link between the information system and the user. It comprises the developing specification and procedures for data preparation and those steps are necessary to put transaction data in to a usable form for processing can be achieved by inspecting the computer to read data from a written or printed document or it can occur by having people keying the data directly into the system. The design of input focuses on controlling the amount of input required, controlling the errors, avoiding delay, avoiding extra steps and keeping the process simple. The input is designed in such a way so that it provides security and ease of use with retaining the privacy. Input Design considered the following things:

What data should be given as input?

How the data should be arranged or coded?

The dialog to guide the operating personnel in providing input.

Methods for preparing input validations and steps to follow when error occur.4.1.2 OBJECTIVES

Input Design is the process of converting a user-oriented description of the input into a computer-based system. This design is important to avoid errors in the data input process and show the correct direction to the management for getting correct information from the computerized system.

It is achieved by creating user-friendly screens for the data entry to handle large volume of data. The goal of designing input is to make data entry easier and to be free from errors. The data entry screen is designed in such a way that all the data manipulates can be performed. It also provides record viewing facilitiesWhen the data is entered it will check for its validity. Data can be entered with the help of screens. Appropriate messages are provided as when needed so that the user will not be in maize of instant. Thus the objective of input design is to create an input layout that is easy to follow

4.1.3 OUTPUT DESIGN

A quality output is one, which meets the requirements of the end user and presents the information clearly. In any system results of processing are communicated to the users and to other system through outputs. In output design it is determined how the information is to be displaced for immediate need and also the hard copy output. It is the most important and direct source information to the user. Efficient and intelligent output design improves the systems relationship to help user decision-making.

1. Designing computer output should proceed in an organized, well thought out manner; the right output must be developed while ensuring that each output element is designed so that people will find the system can use easily and effectively. When analysis design computer output, they should Identify the specific output that is needed to meet the requirements.

2. Select methods for presenting information.

3. Create document, report, or other formats that contain information produced by the system.

The output form of an information system should accomplish one or more of the following objectives.

Convey information about past activities, current status or projections of the

Future.

Signal important events, opportunities, problems, or warnings.

Trigger an action.

Confirm an action4.2 UML DIAGRAMS4.2.1 INTRODUCTIONUML is a method for describing the system architecture in detail using the blueprint.UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.

UML is a very important part of developing objects oriented software and the software development process.

UML uses mostly graphical notations to express the design of software projects. Using the UML helps project teams communicate, explore potential designs, and validate the architectural design of the software.

DefinitionUML is a general-purpose visual modeling language that is used to specify, visualize, construct, and document the artifacts of the software system.

UML is a languageIt will provide vocabulary and rules for communications and function on conceptual and physical representation. So it is modeling language.UML SpecifyingSpecifying means building models that are precise, unambiguous and complete. In particular, the UML address the specification of all the important analysis, design and implementation decisions that must be made in developing and displaying a software intensive system.UML VisualizationThe UML includes both graphical and textual representation. It makes easy to visualize the system and for better understanding.

UML ConstructingUML models can be directly connected to a variety of programming languages and it is sufficiently expressive and free from any ambiguity to permit the direct execution of models.

UML DocumentingUML provides variety of documents in addition raw executable codes.

Fig 4.1: Modeling a System Architecture using views of UML

The use case view of a system encompasses the use cases that describe the behavior of the system as seen by its end users, analysts, and testers. The design view of a system encompasses the classes, interfaces, and collaborations that form the vocabulary of the problem and its solution. The process view of a system encompasses the threads and processes that form the system's concurrency and synchronization mechanisms. The implementation view of a system encompasses the components and files that are used to assemble and release the physical system. The deployment view of a system encompasses the nodes that form the system's hardware topology on which the system executes.Uses of UML The UML is intended primarily for software intensive systems. It has been used effectively for such domain as

Enterprise Information System

Banking and Financial Services

Telecommunications

Transportation

Defense/AerospaceRetails

Medical Electronics

Scientific Fields

Distributed Web

Building blocks of UMLThe vocabulary of the UML encompasses 3 kinds of building blocks

Things

Relationships

Diagrams

ThingsThings are the data abstractions that are first class citizens in a model. Things are of 4 types

Structural Things, Behavioral Things , Grouping Things, An notational Things

RelationshipsRelationships tie the things together. Relationships in the UML are

Dependency, Association, Generalization, Specialization

UML DiagramsA diagram is the graphical presentation of a set of elements, most often rendered as a connected graph of vertices (things) and arcs (relationships).

There are two types of diagrams, they areStructural and Behavioral Diagrams

Structural DiagramsThe UMLs four structural diagrams exist to visualize, specify, construct and document the static aspects of a system. View the static parts of a system using one of the following diagrams. Structural diagrams consist of Class Diagram, Object Diagram, Component Diagram, and Deployment Diagram.

Behavioral Diagrams The UMLs five behavioral diagrams are used to visualize, specify, construct, and document the dynamic aspects of a system. The UMLs behavioral diagrams are roughly organized around the major ways which can model the dynamics of a system.

Behavioral diagrams consists of

Use case Diagram, Sequence Diagram, Collaboration Diagram, State chart Diagram, Activity Diagram4.2.2 CLASS DIAGRAMClass diagrams are widely used to describe the types of objects in a system and their relationships. Class diagrams model class structure and contents using design elements such as classes, packages and objects. Class diagrams describe three different perspectives when designing a system, conceptual, specification, and implementation. These perspectives become evident as the diagram is created and help solidify the design. Class diagrams are arguably the most used UML diagram type. It is the main building block of any object oriented solution. It shows the classes in a system, attributes and operations of each class and the relationship between each class. In most modeling tools a class has three parts, name at the top, attributes in the middle and operations or methods at the bottom. In large systems with many classes related classes are grouped together to create class diagrams. Different relationships between diagrams are show by different types of Arrows.

Fig 4.2: class diagram4.2.3 USE-CASE DIAGRAMA use case is a set of scenarios that describing an interaction between a user and a system. A use case diagram displays the relationship among actors and use cases. The two main components of a use case diagram are use cases and actors.

Fig 4.3: Elements of use-case diagramAn actor is represents a user or another system that will interact with the system you are modeling. A use case is an external view of the system that represents some action the user might perform in order to complete a task.

Contents:

Use cases

Actors

Dependency, Generalization, and association relationships

System boundary

Fig4.4: Use case diagram4.2.4 SEQUENCE DIAGRAMSequence diagrams in UML shows how object interact with each other and the order those interactions occur. Its important to note that they show the interactions for a particular scenario. The processes are represented vertically and interactions are show as arrows

Fig4.5: Sequence diagram4.2.5 COLLABORATION DIAGRAMCommunication diagram was called collaboration diagram in UML. It is similar to sequence diagrams but the focus is on messages passed between objects. The same information can be represented using a sequence diagram and different objects.

Fig4.6: Collaboration diagram4.2.6 ACTIVITY DIAGRAMActivity diagrams describe the workflow behavior of a system. Activity diagrams are similar tostate diagramsbecause activities are the state of doing something. The diagrams describe the state of activities by showing the sequence of activities performed. Activity diagrams can show activities that are conditional or parallelActivity diagrams should be used in conjunction with other modeling techniques such asinteraction diagramsand state diagrams. The main reason to use activity diagrams is to model the workflow behind the system being designed. Activity Diagrams are also useful for: analyzing a use case by describing what actions need to take place and when they should occur;describing a complicated sequential algorithm;and modeling applications with parallel processes.

Fig 4.7: Activity diagram 4.2.7 STATE CHART DIAGRAMSState chart diagrams are similar to activity diagrams although notations and usage changes a bit. They are sometime known as state diagrams or start chart diagrams as well.. Below State machine diagram show the basic states and actions.

Fig4.8: State chart diagram4.2.8 COMPONENT DIAGRAMA component diagram displays the structural relationship of components of a software system. These are mostly used when working with complex systems that have many components.. Below images shows a component diagram.

user

Fig 4.9: Component diagram4.2.9 DEPLOYMENT DIAGRAMA deployment diagrams shows the hardware of your system and the software in those hardware. Deployment diagrams are useful when your software solution is deployed across multiple machines with each having a unique configuration

Fig 4.10: Deployment diagram4.3 LOGICAL DESIGNLogical design is a process through which requirements are translated into a representation of software. Initially the representation depicts a holistic view of software. Subsequent refinement leads to a design representation that is very close to source code.

The conceptual structure of a database is called a schema. Schema shows the kinds of data that exists in a database and how the kinds of data are logically related to each other. A schema can be regarded as a blueprint that portrays both kind of data used in building a database and logical relationship that exist among various kinds of data. At the minimum, the schema must represent all needed data items, must correctly represent their interrelationships, and must be able to support all reports. Schema is frequently depicted pictorially using Data Flow Diagrams (DFD). 4.3.1 DATA FLOW DIAGRAMS

Data Flow Diagrams (DFD) depicts information flow and transforms that are applied as data move from input to output. The DFD is also known as Data Flow Graph or Bubble Chart. It is the starting point of the design phase that functionality decomposes the requirement specification down to the lowest level of details. Thus, a DFD describes what data flows (logical) rather than how they are processed. Data Flow Diagrams are made up of number of symbols which represent system components. Data Flow modeling methods used for kinds of symbols. These symbols are used to represent four kinds of system components. Processes, Data Stores, External Entities, Data Flows.

Processes

Processes show what system does. Each process has one or more data inputs and produces one or more data outputs. Processes are represented by round rectangles in DFD.Data StoresA file or data stores is repository of data. Processes can enter data into a store or retrieve data from data store. The line in the DFD and each store represents each data store as a unique name.External Entities

External entities are outside the system but they supply either input data into the system or used for the system output. They are entities on which the designer has no control. There may be an organization or other bodies with which system interacts.

Data Flows

Data Flows model the passage of data on the system and represented by the lines joining the system components. An arrow indicates the direction of flow and line is labeled by the name of data flow. Flow of data in the system can take place

Between two processes

From a data store to a process

From a process to a process

From source to a process

4.3.2Level 0 DFD Diagrams

Fig 4.11: level 0 data flow diagram

First the user login into the system with his user id and password and uploads the image after successful completion of uploading the image the user has to search the image with the original name and nick name that means aliases. The original name and nick name of the celebrity is given at the time of image uploading. The user has to search the image with the aliases. Then he got the results for the appropriate image.4.3.3 Level 1 DFD diagram

Fig 4.12: level 1 dataflow diagramFirst the user login into the system with his user id and password if he is authenticated. If the user is not authenticated then first he register into the system then only he login into the system. Then the image uploading is done where the original name and aliases is given. Then the user searches the image with the alias name then he got the appropriate results for the corresponding search. The difference between level 0 level 1 DFD diagrams is nothing but the more brief description is given in the level 1 DFD diagram4.4 DATABASE DESIGN

Table name : Admin

Function : This table contain two fields they are name and password where the

Admin login with his name and password where he can browse and upload the photos Field Type Null

name varchar(255) NO

password varchar(255) NO

Table 4.1: Admin tableTable name: CommentFunction : This table contains four fields they are id, name, comment, date. This Table is used to put a comment to the user uploaded photos. The users names also displayed on his comments.Field Type Null

id varchar(255) NO

name varchar(255) NO

comment varchar(255) NO

date varchar(255) NO

Table 4.2: Comment tableTable name: NewFunction : The user can login with his login id and upload photos with their tags and descriptions. The likes and comments are also presented where the

another users Can put a comment on already uploaded photo.

Field Type Null

image blob NO

tag varchar(255) NO

iname varchar(255) NO

id int(255) NO

count int(255) NO

comment int(255) NO

description varchar(255) NO

like int(255) NO

date varchar(255) NO

Table 4.3: New table

Table name: RegFunction : The new user can register with their name and e-mail and whenever the

Registration process is successfully completed then that user login with Their id and browsing is done Field Type Null

name varchar(255) NO

email varchar(255) NO

gender varchar(255) NO

phone varchar(255) NO

date varchar(255) NO

Table 4.4: Registration tableE-R DIAGRAMAn ER model is an abstract way to describe a database. Describing a database usually starts with a relational database, which stores data in tables. Some of the data in these tables point to data in other tables - for instance, your entry in the database could point to several entries for each of the phone numbers that are yours. The ER model would say that you are an entity, and each phone number is an entity, and the relationship between you and the phone numbers is 'has a phone number'. Diagrams created to design these entities and relationships are called entityrelationship diagrams or ER diagrams.An entity may be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identifiedA relationship captures how entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns. Entities and relationships can both have attributes

Peter Chen, the father of ER modeling said in his seminal paper"The entity-relationship model adopts the more natural view that the real world consists of entities and relationships. It incorporates some of the important semantic information about the real world." Limitations ER models assume information content that can readily be represented in a relational database. They describe only a relational structure for this information.They are inadequate for systems

Fig4.13: E-R Diagram4.5 MODULE DESIGNING AND ORGANIZATIONTotally there are 5 modules in our project

1. Co-occurences in anchor texts

2. Role of anchor texts

3. Anchor text co-occurrence frequency

4. Ranking anchor texts

5. Discovery of association orders

1. Co-occurrences in Anchor Texts The proposed method will first retrieve all corresponding URLs from search engine for all anchor texts in which name and aliases appear. Most of the search engines provide search operators to search in anchor texts on the web. For example, Google provides in anchor or Allinanchor search operator to retrieve URLs that are pointed by the anchor text given as a query. For example, query on Allinanchor:Hideki Matsui to the Google will provide all URLs pointed by Hideki Matsui anchor text on the web. For example, the picture of Arnold Schwarzenegger is shown in Fig which is being liked by four different anchor texts. According to the definition of co-occurrences on anchor texts, Terminator and Predator are co-occurring. As well, The Expendables and Governator are also co-occurring.

Fig4.14: A picture of Arnold Schwarzenegger being linked by different anchor texts on the web2. Role of Anchor Texts The main objective of search engine is to provide the most relevant documents for a users query. Anchor texts play a vital role in search engine algorithm because it is clickable text which points to a particular relevant page on the web. Hence search engine considers anchor text as a main factor to retrieve relevant documents to the users query. Anchor texts are used in synonym extraction, ranking and classification of web pages and query translation in cross language information retrieval system.3. Anchor Texts Co-occurrence FrequencyThe two anchor texts appearing in different web pages are called as inbound anchor texts if they point to the same URL. Anchor texts co-occurrence frequency between anchor texts refers to the number of different URLs on which they co-occur. For example, if p and x that are two anchor texts are co-occurring, then p and x point to the same URL. If the co-occurrence frequency between p and x is that say an example k, and then p and x co-occur in k number of different URLs. Anchor texts x

P k

Table 4.5: Anchor text co-occurrence frequency table4. Ranking Anchor Texts

Ranking SVM will be used for ranking the aliases. The ranking SVM will be trained by training samples of name and aliases. All the co-occurrence measures for the anchor texts of the training samples will be found and will be normalized into the range of [0-1]. The normalized values termed as feature vectors will be used to train the SVM to get the ranking function to test the given anchor texts of name and aliases. Then for each anchor text, the trained SVM using the ranking function will rank the other anchor texts with respect to their co-occurrence measures with it. The highest ranking anchor text will be elected to make a firstorder association with its corresponding anchor text for which ranking was performed. Next the word co-occurrence graph will be drawn for name and aliases according to the first order associations between them. However, this method considered only the first order co-occurrences on aliases to rank them but did not focus on the second order co-occurrences to improve recall and achieve a substantial MRR for the web search engine

Fig 4.15: word co-occurrence graph5. Discovery of Association Orders

Using the graph mining algorithm, the word co-occurrence graph will be mined to find the hop distances between nodes in graph. The hop distances between two nodes will be measured by counting the number of edges in-between the corresponding two nodes. The number of edges will yield the association orders between two nodes. According to the definition, a node that lies n hops away from p has an n-order co-occurrence with p. Hence the first, second and higher order associations between name and aliases will be identified by finding the hop distances between them. The search engine can now expand the query on person names by tagging the aliases according to the association orders with the name. Thereby the recall will be substantially improved by 40% in relation detection task. Moreover the search engine will get a substantial MRR for a sample of queries by giving relevant search results.4.6 CODING

4.6.1 ADMIN LOGIN PAGE

Untitled Document

4.6.2 USER LOGIN PAGE

Untitled Document

4.6.3 LIKE PAGE

4.6.4 COMMENT PAGE

4.7 CONCLUSIONIn this way we can design the layout of the project which is to be implemented during the construction phase. Thus we will have a clear picture of the project before being coded. Hence any necessary enhancements can be made during this phase and coding can be started.CHAPTER 5

IMPLEMANTATION

5.1 INTRODUCTIONImplementation is the stage of the project when the theoretical design is turned out into a working system. Thus it can be considered to be the most critical stage in achieving a successful new system and in giving the user, confidence that the new system will work and be effective.

The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, designing of methods to achieve changeover and evaluation of changeover methods. Implementation can be preceded through JSP.JSP will be more suitable for dynamic page designing, data sharing and mining concepts. For maintaining data information we go for MS-SQL as database back end.

Implementation is the stage of the project when the theoretical design is turned out into a working system. Thus it can be considered to be the most critical stage in achieving a successful new system and in giving the user, confidence that the new system will work and be effective.

The implementation stage involves careful planning, investigation of the existing system and its constraints on implementation, designing of methods to achieve changeover and evaluation of changeover methods.

Implementation is the process of converting a new system design into operation. It is the phase that focuses on user training, site preparation and file conversion for installing a candidate system. The important factor that should be considered here is that the conversion should not disrupt the functioning of the organization.

The main purpose of Internet-based productivity applications, such as share values, is to promote the productivity of human users as a group.

It is accepted in this context that high local responsiveness and high concurrency are conducive to individual and group productivity.

Share amount is calculated for all value.

Consistency control in this environment must not only guarantee convergence of replicated data, but also attempt to preserve intentions of operations.

Operational transformation (OT) is a well-established method for optimistic consistency control.

This paper analyzes the root of correctness problems in OT and establishes a novel operational transformation framework for developing OT algorithms and proving their correctness.

Operational Transformation (OT) has been well accepted in group editors for archive high local responsiveness and unconstrained collaboration.

Remote operations are transformed before they are executed such that inconsistencies are repaired.5.2 METHOD OF IMPLEMENTATION

When the system is ready for implementation, emphasis switches to communicate with the finance department staff. Open discussion with the staff is important from the beginning of the project. Staff can be expected to be concerned about the effect of the automation on their jobs and the fear of redundancy or loss of status must be allayed immediately. During the implementation phase it is important that all staff concerned be apprised of the objectives of overall operation of the system. They will need shinning on how computerization will change their duties and need to understand how their role relates to the system as a whole. An organization-training program is advisable; this can include demonstrations, newsletters, seminars etc.

The department should allocate a member of staff, who understands the system and the equipment, and should be made responsible for the smooth operation of the system. An administrator should coordinate the users to the system.

Users should be informed about new aspects of the system that will, affect them. The features of the system explained with the adequate documentation. New services such as security, on-line application form and back-ups must be advertised on the staff when the time is ripe.

Existing documents such as employee loan details should be entered into the new system. Since these files are very large, conversion of these may continue long after the system based on current files has been implemented. Hence we need to assign responsibility for each activity.

The system may come into full operation via number of possible routes. Complete change over at one point time is conceptually the most tidy. But this approach requires careful planning and coordination, particularly during the changeover. A phased approach, possible implementing the system of the section relating to one operation or procedure first and progressing to more novel or complex subsystems in the fullness of time. These likely to be less traumatic. A phased approach gives the staff time to adjust to the new system. But depends on being able to split the system, without reliance on it. Thus approach is sensible when the consequences of failure are disastrous, but will require extra staff time. The fourth angle, is pilot operation permits any problems to be tackled on a smaller scale operation. Pilot operation generally means the implementation of the complete system, but at one location or branch only.5.2.1 FORM DESIGNThe various forms and their processing is given below

Admin Login ScreenPurposeThe purpose of the screen is to allow the administrator to enter into the system.

DetailsThe Edit boxes are used to enter the username and password.The send button is used to login into the systemScreen design

Image uploading screenPurpose

The purpose of the screen is to upload the image and admin give both name and aliases of the image.DetailsThe edit boxes are used to enter the details

The browse button is used to upload the image

The send button indicates the process completion.Screen design

User login screenPurpose

The purpose of the screen is to login the user into the systemDetails

The edit boxes are used to enter the text

The send button is used to login into the system Screen design

Search screen

Purpose

The purpose of the screen is to search the images which are already uploadedDetails

The search bar is used to search the images Screen design

Like screenPurpose

The purpose of the screen is to like the images which are already uploaded according to our wish

Details

The like link is used to like the images Screen design

Comment screenPurpose

The purpose of the screen is to comment the images which are already uploaded according to our wish

Details

The comment link is used to comment the images the edit box will appear to comment

Screen design

5.2.2 OUTPUT SCREENS

Screen1: welcome screen

Screen 2: admin login screen

Screen 3:image uploading screen

Screen 4: user registration screen

Screen 5: user login screen

Screen6: search screen

Screen 7: image display screen

Screen 8: like screen

Screen 9: comment screen

Screen 10: comment posting screen

5.2.3 RESULT ANALYSISThe proposed method will compute anchor texts-based co-occurrences among the given personal name and aliases, and will create a word co-occurrence graph by making connections between nodes representing name and aliases in the graph based on their first order associations with each other. The graph mining algorithm to find out the hop distances between nodes will be used to identify the association orders between name and aliases. Ranking SVM will be used to rank the anchor texts according to the co-occurrence statistics in order to identify the anchor texts in the first order associations. The web search engine can expand the query on a personal name by tagging aliases in the order of their associations with name to retrieve all relevant results thereby improving recall and achieving a substantial MRR compared to that of previously proposed methods.5.3 CONCLUSIONIn this way we implemented the project successfully for an easy interaction of the user with the interfaces and enhanced security with less effort work. We proceed to the next phase i.e., testing which is very important before delivering the project.

CHAPTER 6

TESTING AND VALIDATION

6.1 INTRODUCTIONTesting is the process of detecting errors. Testing performs a very critical role for quality assurance and for ensuring the reliability of software. The results of testing are used later on during maintenance also.

The aim of testing is often to demonstrate that a program works by showing that it has no errors. The basic purpose of testing phase is to detect the errors that may be present in the program. Hence one should not start testing with the intent of showing that a program works, but the intent should be to show that a program doesnt work. Testing is the process of executing a program with the intent of finding errors.

Software testing is a critical element of software quality assurance and represents the ultimate review of software specification, design and coding. The increasing visibility of software as a system element and the attendant costs associated with a software failure are motivating forces for well planned, thorough testing. It is not unusual for software. Development organization to expend 40 percent of total project effort on testing. Hence the importance of software testing and its implications with respect to software quality cannot be overemphasized. Different types of testing have been carried out for this system, and they are briefly explained below.6.1.1 TESTING OBJECTIVESThe main objective of testing is to uncover a host of errors, systematically and with minimum effort and time. Stating formally, we can say,

Testing is a process of executing a program with the intent of finding an error.

A successful test is one that uncovers an as yet undiscovered error.

A good test case is one that has a high probability of finding error, if it exists.

The tests are inadequate to detect possibly present errors.6.1.2 LEVELS OF TESTINGIn order to uncover the errors present in different phases we have the concept of levels of testing. This is a type of black box testing that is based on the specifications of the software that is to be tested. The application is tested by providing input and then the results are examined that need to conform to the functionality it was intended for. The testing is conducted on a complete, integrated system to evaluate the system's compliance with its specified requirements.The basic levels of testing are as shown below

Client Needs

Requirements

Design

Code

Fig 6.1: Levels of testing

6.2 TEST PLAN

A Test plan is a plan prepared by the testers to follow a systematic procedure adapted to fulfill the process of testing. This includes the order of testing scenarios Implemented. As our system is concerned we follow the below procedures:

1. Point out what to be tested and where to be tested:

a) Teat the presence of database connection to obtain the data.

b) Check for the completeness of code to run the system.

c) Decide the portions of code to be tested and not to be tested.

d) Implemented the test scenarios and check the outputs with the specified ones.

2. Fail Criteria:

Here we specify the conditions where the system may fail and add them to test conditions that are to be tested.

3. Test Cases:

The test case Specification deals with the details of the test data, which is used to test the system.6.3 TESTING STRATEGIES

A strategy for software testing integrates software test case design methods into a well-planned series of steps that result in the successful construction of software. Unit Testing

Unit testing focuses verification effort on the smallest unit of software i.e. the module. Using the detailed design and the process specifications testing is done to uncover errors within the boundary of the module. All modules must be successful in the unit test before the start of the integration testing begins.

White Box Testing

White Box Testing mainly focuses on the internal performance of the product. Here a part will be taken at a time and tested thoroughly at a statement level to find the maximum possible errors. Also construct a loop in such a way that the part will be tested with in a range. That means the part is executed at its boundary values and within bounds for the purpose of testing.

Black Box Testing

This testing method considers a module as a single unit and checks the unit at interface and communication with other modules rather getting into details at statement level. Here the module will be treated as a block box that will take some input and generate output. Output for a given set of input combinations are forwarded to other modules.

Integration Testing

After the unit testing we have to perform integration testing. The goal here is to see if modules can be integrated properly or not. This testing activity can be considered as testing the design and hence the emphasis on testing module interactions. It also helps to uncover a set of errors associated with interfacing. Here the input to these modules will be the unit tested modules.

Integration testing is classifies in two types

1. Top-Down Integration Testing.

2. Bottom-Up Integration Testing.

In Top-Down Integration Testing modules are integrated by moving downward through the control hierarchy, beginning with the main control module.

In Bottom-Up Integration Testing each sub module is tested separately and then the full system is tested.

System Testing

Project testing is an important phase without which the system cant be released to the end users. It is aimed at ensuring that all the processes are according to the specification accurately.Acceptance TestingAcceptance Test is performed with realistic data of the client to demonstrate that the software is working satisfactorily. Testing here is focused on external behavior of the system; the internal logic of program is not emphasized.

6.3.1 TESTING ACTIVITIESThe activities of testing include:

Component Inspection: This finds faults in an individual component through the manual inspection of its source code. Inspections can be conducted before or after the unit test.

Usability Testing: This finds differences between the system and the users expectation of what it should do? Usability testing tests the users understanding of the system.

Unit Testing: This finds faults by isolating an individual component using test stubs and drivers and by exercising the component using a test case. Unit testing focuses on the building blocks of the software system, that is, objects and subsystems. The most important unit testing techniques are:

1. Equivalence Testing or Black box testing: This black box testing minimizes the number of test cases. The possible inputs are partitioned into equivalence classes, and a test case is selected for each class. Equivalence testing involves two steps:

Identification of the equivalence classes and

Selection of the test inputs.

2. Path based testing or White box testing: This white box testing technique identifies faults in the implementation of the component. The assumption here is that, by exercising all possible paths through the code at least once, most faults will trigger failures. The identification of paths requires knowledge of the source code and data structures. Integration Testing: This finds faults by integrating several components together. Integration Testing detects faults that have not been detected during unit testing by focusing on small groups of components. Two or more components are integrated and tested and when no new faults are revealed, additional components are added to the group.

System Testing: This focuses on the complete system, its functional and nonfunctional requirements, and its target environment. All the testing activities stated are to be implemented on large scale projects to get the consistent System to be designed. All of them are not applicable to small projects that do not much analysis done.6.4 VALIDATIONNameField validation test in signup

Locationnewsignup.jsp

Input1

Log1All the fields filled correctly

The system will shows him welcome screen to his Inbox

Input2

Log2Invalid pin number

A message Enter Pin Correctly is displayed.

Table 6.1: validations6.5 CONCLUSIONIn this way we also completed the testing phase of the project and ensured that the system is ready to go live. Thus we developed a new technology courier system so that people will have tracking details of their consignments.CHAPTER 7

CONCLUSION

The proposed method compute anchor texts-based co-occurrences among the given personal name and aliases, and create a word co-occurrence graph by making connections between nodes representing name and aliases in the graph based on their first order associations with each other. The graph mining algorithm to find out the hop distances between node used to identify the association orders between name and aliases. Ranking SVM will be used to rank the anchor texts according to the co-occurrence statistics in order to identify the anchor texts in the first order associations. The web search engine can expand the query on a personal name by tagging aliases in the order of their associations with name to retrieve all relevant results thereby improving recall and achieving a substantial MRR compared to that of previously proposed methods.

CHAPTER 8

BIBILOGRAPHYGood Teachers are worth more than thousand books, we have them in Our Department

References Made From

[1] J. Artiles, J. Gonzalo , and F. Verdejo, A Testbed for People Searching Strategies in the WWW, Proc. SIGIR 05, pp. 569-570, 2005.

[2] R. Guha and A. Garg, Disambiguating People in Search, technical report, Stanford Univ., 2004.

[3] D.Bollegala, Y. Matsuo, and M. Ishizuka , Automatic Discovery of Personal Name Aliases from the Web, IEEE Transactions on Knowledge and Data Engineering, vol. 23, No. 6, June 2011.

[4] Y. Matsuo, and M. Ishizuka, Keyword Extraction from a Single Document using Word Co-occurrence Statistical Information, International Journal on Artificial Intelligence Tools, 2004.

[5] W. Lu, L. Chien and H. Lee, Anchor Text Mining for Translation of Web Queries: A Transitive Translation Approach, ACM Transactions on Information Systems, Vol. 22, No. 2, Aprill 2004, Pages 242-269.

[6] Z. Liu, W. Yu, Y. Deng, Y. Wang, and Z. Bian, A Feature selection Method for Document Clustering based on Part-of-Speech and Word Co-occurrence, Proceedings of 7th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 10), pp. 2331-2334, Aug 2010.

[7] F. Figueiredo, L. Rocha, T. Couto, T. Salles, M.A. Gonclaves, and W. Meira Jr, Word Co-occurrence Features for Text Classification, Vol 36, Issues 5, Pages 843-858, July 2011.

[8] G. Salton and C. Buckley, Term-Weighting Approaches in Automatic Text Retrieval, Information processing and Management, vol. 24, pp. 513-523, 1988.

[9] T. Dunning, Accurate Methods for the Statistics of Surprise and Coincidence, Computational Linguistics, vol. 19, pp. 61-74, 1993.

[10] K. Church and P. Hanks, Word Association Norms, Mutual Information and Lexicography, Computational Linguistics, Vol. 16, pp. 22-29, 1991.

[11] T. Hisamitsu and Y. Niwa, Topic-Word Selection Based on Combinatorial Probability, Proc. Natural Language Processing Pacific-Rim Symp. (NLPRS 01), pp.289-296, 2001.

[12] F.Smadja, Retrieveing Collocations from Text: Xtract, Computational Liguistics, Vol. 19, no 1, pp. 143-177, 1993.

[13] T. Joachims, Optimizing Search Engines using Clickthrough Data, proc. ACM SIGKDD 02, 2002.

[14] D. Chakrabarti and C. Faloutsos , Graph Mining: Laws, Generators, and Algorithms, ACM Computing Surveys, Vol. 38, March 2006, Article 2.

[15] C.C. Agarwal and H. Wang, Graph Data Management and Mining : A Survey of Algorithms and Applications, DOI 10.1007/978-1-4419-6045-0_2,@ Springler Science+Business Media, LLC 2010.

Sites Referred:

http://java.sun.comhttp://www.sourcefordgde.comhttp://www.networkcomputing.com/http://www.roseindia.com/http://www.java2s.com/Start

Extracts Frequent Item Set

Generate Occurrences in the Same Sentence

Stop

Extracts a Keyword

Terminator

Governator

The expendables

Predator

Hideki Matsui

Matsui

Godzilla

Baseball

Yankees

Sports

New York

Acceptance Testing

System Testing

Integration Testing

Unit Testing

User login

Authenticated

yes

No

Upload image

Search image with aliases

Getting results

Getting Results

Search Image with Aliases

Register

Upload Image

Login

Home Page

Upload Image

Search Image with Aliases

Search Image with Aliases

Getting Results

User Login

Upload Image

User Login

Upload Image

Alias Names

Search Image

With Alias Names

Getting Results

PAGE 67DEPARTMENT OF CSE GDMM,NANDIGAMA