ONTOLOGY BASED SEMANTIC WEB SERVICE WITH NATURAL LANGUAGE PROCESSING FUNCTIONALITIES


The problem of natural language understanding is one of the first problems that researchers in AI tried to solve. There is a fundamental need for a system that can bring this ability to understand natural language into the web and create a web that can understand and reason. Such a web is known as the Semantic Web, which is still in the nascent stages of development. The Semantic Web employs tools and software that aim to achieve the kind of understanding needed in future web technologies, such as ontologies, natural language software and information retrieval algorithms. We propose a system that addresses this problem of natural language understanding. The proposed system accepts input in the form of well-formed English sentences (natural English) and produces output that is both pertinent and correct. The system performs part-of-speech tagging, which reduces the complexity of the input and makes it tractable for the natural language parser, which in turn translates the natural language query into a query on a database schema. This database schema is built on knowledge-based logic derived from a domain-specific ontology. Hence, data is stored free of logical errors, and information retrieval becomes easier when this is coupled with natural language processing. We use the domain “Demographics of India” for demonstration purposes.
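The flow described above (tag the words, then let a parser turn the tagged sentence into a database query) can be sketched as follows. This is a minimal sketch in Python, not the project's Java implementation: the tiny lexicon, the single supported question pattern "What is the <attribute> of <State>?", the `state` table and its column names are all hypothetical stand-ins for a real POS tagger and the Demographics of India schema.

```python
import re

# Hypothetical mini-lexicon standing in for a trained POS tagger (Penn Treebank tags).
LEXICON = {
    "what": "WP", "is": "VBZ", "the": "DT", "of": "IN",
    "population": "NN", "literacy": "NN", "rate": "NN", "area": "NN",
}

# Hypothetical mapping from noun phrases to columns of a `state` table.
ATTRIBUTE_COLUMNS = {
    "population": "population",
    "literacy rate": "literacy_rate",
    "area": "area_sq_km",
}


def pos_tag(sentence: str) -> list[tuple[str, str]]:
    """Tag every word; unknown capitalised words are treated as proper nouns (NNP)."""
    tagged = []
    for word in re.findall(r"[A-Za-z]+", sentence):
        tag = LEXICON.get(word.lower())
        if tag is None:
            tag = "NNP" if word[0].isupper() else "NN"
        tagged.append((word, tag))
    return tagged


def to_sql(tagged: list[tuple[str, str]]) -> tuple[str, tuple[str]]:
    """Translate 'What is the <attribute> of <State>?' into a parameterised query."""
    attribute = " ".join(w.lower() for w, t in tagged if t == "NN")
    state = " ".join(w for w, t in tagged if t == "NNP")
    if attribute not in ATTRIBUTE_COLUMNS or not state:
        raise ValueError("question pattern not supported by this sketch")
    column = ATTRIBUTE_COLUMNS[attribute]  # chosen from a fixed whitelist, never from user text
    return f"SELECT {column} FROM state WHERE name = %s", (state,)
```

For example, `to_sql(pos_tag("What is the population of Kerala?"))` yields the query `SELECT population FROM state WHERE name = %s` with the parameter `("Kerala",)`. The `%s` placeholder is the style used by common MySQL drivers for Python. Keeping the state name as a bound parameter, rather than pasting it into the SQL string, keeps user input out of the query text.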

The proposed model consists of four modules, namely the POS Tagger, the NLP Parser, the Ontology and Database, and the User Interface. The POS Tagger receives the input sentence and identifies the part of speech of every word in it. This is vital to the project, because without understanding the role of the input words it would be difficult to build a query from them. The second module is the NLP Parser, which takes the output of the POS Tagger and builds a query based on the original natural English sentence. The Ontology is the core of the project, as it holds the logic and structure of the knowledge base that the database implements. It is this database, modeled on the Ontology, that is queried to produce the final result. The User Interface is used to accept input and display the result.

Existing System:

Today’s web services are not organized well enough to operate intelligently. This is precisely why we need the Semantic Web: to overcome the reasoning deficiency of the present-day web. Since this project is a new concept in its nascent stages, we cannot compare it completely with one particular existing system. But when we compare it with the web as a whole, we can clearly say that the current web is keyword oriented and lacks the infrastructure and resources to provide solutions in a logical, intelligent manner.

For example, almost all search engines on the internet use only keywords and not the logic behind the query. In simple words, they match the words found in the search string and return results based on previous visits by all users and on popularity, rather than trying to provide the correct answer to the query or the most logical answer to a question.

Currently, only one search engine on the internet, “ask.com”, has partially accomplished this by combining natural language processing with keyword-based techniques. Even so, it does not use an ontology to update itself dynamically and learn from queries.

Demerits of the Existing System:

The existing system cannot produce pertinent and accurate results for the user. For a web service that answers questions, as a search engine does, the existing system fails to deliver, because the narrow keyword-based approach and an algorithm that depends on the number of successful previous search results do little to find answers to questions.

E.g.: The existing system will not return a good answer to the query “Which states does the Yamuna flow through?”, because its keyword-based approach searches for individual keywords such as ‘river’, ‘Yamuna’ and ‘flow’, and returns the most popular (i.e. most visited) result.
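By contrast, an ontology-backed system can answer the Yamuna question by following relations and checking class membership instead of matching words. The sketch below is a toy in-memory triple store, not the project's Protégé ontology or MySQL schema. The predicate names `type` and `flowsThrough` are hypothetical, and the facts are an illustrative, deliberately incomplete subset.

```python
# Illustrative, deliberately incomplete ontology facts as (subject, predicate, object).
TRIPLES = {
    ("Yamuna", "type", "River"),
    ("Yamuna", "flowsThrough", "Uttarakhand"),
    ("Yamuna", "flowsThrough", "Haryana"),
    ("Yamuna", "flowsThrough", "Delhi"),
    ("Yamuna", "flowsThrough", "Uttar Pradesh"),
    ("Uttarakhand", "type", "State"),
    ("Haryana", "type", "State"),
    ("Uttar Pradesh", "type", "State"),
    ("Delhi", "type", "UnionTerritory"),
}


def objects(subject: str, predicate: str) -> set[str]:
    """All objects linked to `subject` by `predicate`."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def is_a(entity: str, cls: str) -> bool:
    """True if the ontology classifies `entity` as an instance of `cls`."""
    return (entity, "type", cls) in TRIPLES


def states_river_flows_through(river: str) -> set[str]:
    """Answer 'Which states does <river> flow through?' by following the
    flowsThrough relation and keeping only entities classified as State."""
    if not is_a(river, "River"):
        return set()
    return {place for place in objects(river, "flowsThrough") if is_a(place, "State")}
```

Here `states_river_flows_through("Yamuna")` returns Uttarakhand, Haryana and Uttar Pradesh but not Delhi. Delhi is excluded because the ontology classifies it as a union territory rather than a state, which is exactly the kind of logical distinction a keyword match cannot make.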

Proposed System:

The architecture of the proposed system is shown in Fig 3.1.

Fig 3.1 Architecture of the Proposed System

This system accepts natural English as input and produces pertinent and correct results as output. It does this by relying on a strong logical knowledge base built along the lines of a logically consistent ontology, which ensures that the resulting database remains logically sound and correct.

The natural language input is processed by a POS Tagger and an NLP Parser, which together ensure that the English sentence is correctly understood by the system and that a corresponding query is generated against the database modeled on the Demographics of India ontology.

Finally, the user interface accepts the input and displays the output after it is processed.

The POS Tagger and the NLP Parser are implemented in Java, the user interface in JSP, the Ontology with the Protégé tool, and the database in MySQL 5.1.

The system accepts simple queries and can be extended to support more complex ones.

Madhura Raju

This project was proposed in May, 2009

9/1/2009

A Report

On

The Enervation of IPv4

leading to

The Adoption of IPv6

Submitted by

Madhura

Page 1 of 8


The Enervation of IPv4 leading to the Adoption of IPv6

The limitations of IPv4 have cleared the way for the deployment of IPv6, which is slowly spreading across the world and is widely expected to replace IPv4. A closer look at the subject gives a clear picture of why, how and when the transition is taking place.

Before turning to the subject itself, the fundamentals behind it are discussed.

Internet Protocol Address

All information exchanged between a sender and a receiver on the internet, irrespective of its type and content, is structured in the form of “packets”. A packet is a formatted block of data, also known as a datagram. Each packet includes a header and a message (also called the payload). The header contains the source and destination addresses. The Internet Protocol (IP) is a standard that enables devices on a network, each with a unique IP address, to communicate with one another. This unique address identifies every device on both the local area network and the internet. The primary purpose of IP addresses is to route packets through the network to their respective destinations.

Format of the IP Address

An IPv4 address is a 32-bit address, written as four whole numbers separated by periods. Each number uses 8 bits and can therefore range from 0 to 255. Every IP address has two parts: the NetID and the HostID. The NetID identifies the network, while the HostID identifies a particular host on that network. The NetID is the unique Internet number that can be requested from the Network Information Center (NIC). The HostID, also known as the machine or local address, identifies the specific machine connected to the network.
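The relationship between the dotted notation, the underlying 32-bit number and the NetID/HostID split can be shown with Python's standard `ipaddress` module. This is a minimal sketch that assumes the NetID/HostID boundary is given as a prefix length, as in CIDR notation. The helper names `to_dotted_quad` and `split_address` are hypothetical.

```python
import ipaddress


def to_dotted_quad(value: int) -> str:
    """Render a 32-bit integer as four 8-bit numbers (0-255) separated by periods."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


def split_address(address: str, prefix_len: int) -> tuple[str, int]:
    """Split an IPv4 address into its NetID (shown as an address) and its HostID.

    prefix_len is the number of leading bits assumed to belong to the NetID.
    """
    value = int(ipaddress.IPv4Address(address))  # the underlying 32-bit number
    host_mask = (1 << (32 - prefix_len)) - 1       # ones in every HostID bit
    net_id = value & ~host_mask                    # clear the HostID bits
    host_id = value & host_mask                    # keep only the HostID bits
    return to_dotted_quad(net_id), host_id
```

With a 24-bit NetID, `split_address("192.168.10.37", 24)` gives `("192.168.10.0", 37)`: the first three numbers identify the network and the last one identifies the host. The standard library reaches the same network address through `ipaddress.IPv4Network("192.168.10.37/24", strict=False).network_address`.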

IP Address Management

The Internet Assigned Numbers Authority (IANA) is the key entity behind the management of IP addresses. IANA allocates IP address blocks to the five Regional Internet Registries (RIRs), which in turn allocate them to Internet Service Providers (ISPs). The ISPs then apportion IP addresses to individual nodes or networks, and in this way connect networks and individual machines to the internet.
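The hierarchical hand-down of address blocks described above (IANA to a registry, a registry to ISPs, an ISP to its customers' networks) amounts to repeatedly splitting a block into smaller subnets. The sketch below uses the standard `ipaddress` module. The private range 10.0.0.0/8 and the chosen prefix lengths are illustrative assumptions only; real allocations come from public address space and follow each registry's own policies.

```python
import ipaddress

# Illustrative allocation chain. 10.0.0.0/8 is private address space, used here
# only as a stand-in; real IANA/RIR allocations are made from public space.
iana_block = ipaddress.IPv4Network("10.0.0.0/8")

# IANA -> RIR: hand the first /12 out of the /8 to a registry.
rir_block = next(iana_block.subnets(new_prefix=12))

# RIR -> ISPs: the registry divides its /12 into /16 blocks for providers.
isp_blocks = list(rir_block.subnets(new_prefix=16))

# ISP -> customers: the first provider splits its /16 into /24 networks.
customer_nets = list(isp_blocks[0].subnets(new_prefix=24))

print(rir_block, len(isp_blocks), len(customer_nets), customer_nets[1])
# prints: 10.0.0.0/12 16 256 10.0.1.0/24
```

Each step only narrows the block it received, so every customer network stays inside its ISP's block, which in turn stays inside the registry's block.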

Domain Name System

The Domain Name System (DNS) is a naming system that saves users from having to remember hard-to-recall numeric IP addresses. It associates each address with an alphabetical name that is easy to remember, and a DNS server translates domain names into IP addresses. Though the task sounds simple, it is a complicated operation, because the DNS database is one of the most frequently accessed databases on the internet.
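Part of what makes name translation complicated is that no single server holds every name. A resolver starts at the root and follows referrals to more specific servers, caching answers so that repeated lookups do not have to repeat that work. The sketch below is a toy model of that iterative walk, not the DNS wire protocol. The zone layout and server labels are hypothetical, and `example.com` and `192.0.2.0/24` are reserved for documentation (RFC 2606, RFC 5737).

```python
# Toy model of iterative DNS resolution; the zone data is hypothetical.
ZONES = {
    "root": {"delegate": {"com.": "com-server"}},
    "com-server": {"delegate": {"example.com.": "example-server"}},
    "example-server": {"records": {"www.example.com.": "192.0.2.10"}},
}


def resolve(name: str, cache: dict[str, str]) -> str:
    """Resolve `name` starting at the root, following delegations; cache the answer."""
    name = name if name.endswith(".") else name + "."
    if name in cache:                      # answered before: no queries needed
        return cache[name]
    server = "root"
    while True:
        zone = ZONES[server]
        address = zone.get("records", {}).get(name)
        if address is not None:
            cache[name] = address
            return address
        for suffix, next_server in zone.get("delegate", {}).items():
            if name == suffix or name.endswith("." + suffix):
                server = next_server       # referral to a more specific server
                break
        else:
            raise LookupError(f"no record for {name}")
```

A first call to `resolve("www.example.com", cache)` visits three servers. A second call with the same `cache` dictionary is answered immediately.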

IPv4 Exhaustion

The current version of the Internet Protocol is IPv4, the first version to be widely deployed and still the most extensively used. IPv4 uses 32-bit addressing and is therefore limited to 2^32 (4,294,967,296) unique addresses. As mentioned before, every device on a network needs a unique IP address, which has led to a shortage of IP addresses. Of the roughly 4.29 billion IPv4 addresses, large blocks are allocated to various organizations and are not available for public allocation, which makes the shortage worse. If these blocks could be redefined into a pool of regular addresses available for public allocation, the problem could be alleviated, but only to a very small extent, because the operation is complicated, expensive and time consuming.

Solutions to deal with the Exhaustion:

Temporary: These issues are temporarily mitigated by Network Address Translation (NAT), in which one public IP address is shared among many hosts on a local area network. The main function of NAT here is IP masquerading, which hides a whole private network behind one public IP address. This has been a major mitigation of the exhaustion. Other temporary mitigations include moving from classful networks to Classless Inter-Domain Routing (CIDR), and Virtual Private Networks (VPNs). The only permanent solution to this problem is migration to the new version, IPv6.
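How one public address can serve a whole private network is easiest to see in the translation table a NAT device keeps. The sketch below models port-based translation (often called NAPT or PAT), the common form of IP masquerading. The class name, the starting port 40000 and the sequential port assignment are illustrative assumptions. 203.0.113.0/24 is a documentation range.

```python
from typing import Optional


class Napt:
    """Toy port-based NAT (NAPT) table: many private hosts share one public IP."""

    def __init__(self, public_ip: str, first_port: int = 40000) -> None:
        self.public_ip = public_ip
        self.next_port = first_port                       # hypothetical start of the public port pool
        self.outbound: dict[tuple[str, int], int] = {}    # (private ip, port) -> public port
        self.inbound: dict[int, tuple[str, int]] = {}     # public port -> (private ip, port)

    def translate_out(self, private_ip: str, private_port: int) -> tuple[str, int]:
        """Rewrite the source of an outgoing packet to the shared public address."""
        key = (private_ip, private_port)
        if key not in self.outbound:
            self.outbound[key] = self.next_port
            self.inbound[self.next_port] = key
            self.next_port += 1
        return self.public_ip, self.outbound[key]

    def translate_in(self, public_port: int) -> Optional[tuple[str, int]]:
        """Find the private host a reply belongs to; unknown ports map to None (dropped)."""
        return self.inbound.get(public_port)
```

Two hosts, 192.168.1.10 and 192.168.1.11, both appear to the internet as 203.0.113.5, distinguished only by the public port the table assigned them. An unsolicited packet to a port with no entry has nowhere to go. That is why NAT also hides the private network, and why it is a mitigation rather than a real increase in addresses.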

As internet usage grew exponentially, the depletion of the IPv4 address space was foreseeable. In his “IPv4 Address Report”, Geoff Huston predicts that the RIR pool will be exhausted by March 2013. The main task of an RIR is to allocate and register internet numbers within a particular region. There are five Regional Internet Registries:

1. American Registry for Internet Numbers (ARIN)
2. Réseaux IP Européens Network Coordination Centre (RIPE NCC)
3. Asia-Pacific Network Information Centre (APNIC)
4. Latin American and Caribbean Internet Addresses Registry (LACNIC)
5. African Network Information Centre (AFRINIC)

IANA delegates to the RIRs the responsibility of providing Internet address resources to their customers and end users. The RIRs thus allocate the IP addresses that are eventually assigned to the various stations in a network.

Permanent: After considerable experimentation, a consensus was reached that the best way to handle this crisis is to adopt IPv6, the next-generation Internet Protocol. IPv6 was recommended by the IPng Area Directors of the Internet Engineering Task Force at the Toronto meeting in 1994, after which it was approved by the Internet Engineering Steering Group and made a “Proposed Standard”. In 1997 it was made a “Draft Standard”.

Once the IPv4 pool is exhausted, allocation of IPv4 addresses will end and IPv6 will take over. What sets IPv6 apart is the number of unique addresses it provides: 2^128, because it uses 128-bit addresses instead of the 32-bit addresses of IPv4. Once IPv4 exhaustion reaches this threshold, organizations will start deploying IPv6.
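The gap between the two address spaces can be checked directly with Python's standard `ipaddress` module, by asking each "all addresses" network how many addresses it contains. This is only a worked check of the 2^32 and 2^128 figures above.

```python
import ipaddress

# Size of each whole address space, read from the "all addresses" network.
ipv4_total = ipaddress.IPv4Network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.IPv6Network("::/0").num_addresses

assert ipv4_total == 2 ** 32 == 4_294_967_296
assert ipv6_total == 2 ** 128

# IPv6 contains 2**96 complete copies of the IPv4 address space.
copies = ipv6_total // ipv4_total
assert copies == 2 ** 96

print(f"IPv4: {ipv4_total:,} addresses")
print(f"IPv6: {ipv6_total:,} addresses")
```

Adding 96 bits to the address does not add 96 times as many addresses. It multiplies the space by 2^96, which is why IPv6 is treated as the permanent fix rather than another mitigation.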

ARIN has already circulated this information to the organizations that use IPv4, predicting that the supply of IPv4 Internet numbers will run out within two years. It suggests that firms start planning for IPv6 adoption so that they can continue to acquire additional IP addresses.

The large address space is an obvious advantage of IPv6 over IPv4. The IPv6 header is designed to speed up routing; its base header has a fixed length of 40 bytes. IPv6 also supports much larger payloads than IPv4 (more than 65,535 bytes) through “jumbograms”, which improves overall performance on high-throughput networks. Other advantages of IPv6 are multicasting, stronger security (due to IPsec being part of its core) and autoconfiguration of hosts when they connect to an IPv6 routed network.
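The fixed-length header mentioned above is simple enough to build by hand, which helps explain why routers can process it quickly. The sketch packs the 40-byte IPv6 base header using the field layout of RFC 8200: version, traffic class, flow label, payload length, next header, hop limit, and the source and destination addresses. The default values `next_header=17` (UDP) and `hop_limit=64` are illustrative choices, and the function name is hypothetical.

```python
import ipaddress
import struct


def ipv6_fixed_header(src: str, dst: str, payload_len: int,
                      next_header: int = 17, hop_limit: int = 64,
                      traffic_class: int = 0, flow_label: int = 0) -> bytes:
    """Pack the 40-byte IPv6 base header in network byte order."""
    # First 32 bits: version (4 bits) | traffic class (8 bits) | flow label (20 bits)
    first_word = (6 << 28) | ((traffic_class & 0xFF) << 20) | (flow_label & 0xFFFFF)
    return struct.pack(
        "!IHBB16s16s",
        first_word,
        payload_len,   # 16 bits; a jumbogram sets this to 0 and carries its length
                       # in a Hop-by-Hop option instead (RFC 2675)
        next_header,
        hop_limit,
        ipaddress.IPv6Address(src).packed,
        ipaddress.IPv6Address(dst).packed,
    )
```

The format string `"!IHBB16s16s"` adds up to 4 + 2 + 1 + 1 + 16 + 16 = 40 bytes every time. Unlike the IPv4 header, there are no variable-length options and no header checksum for a router to recompute at each hop.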

The Transition:

The transition from IPv4 to IPv6 cannot be done overnight; many complications must be handled carefully. The changeover is a prolonged process, mainly because of the sheer size of the internet and the vast number of IPv4 users, which also explains the delay in the transition. One important point is that IPv4 and IPv6 can coexist, so organizations only need to carry out an upgrade rather than a wholesale replacement.

The migration can be carried out node by node across the routed network, using autoconfiguration procedures to avoid manual operations. A closer study of IPv6 reveals several interesting facts. IPv6 is designed so that its addresses can be derived from IPv4 addresses. Another interesting feature is that IPv6 nodes can follow the dual-stack approach, which means they support both IPv6 and IPv4 at the same time.
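Two well-known ways of deriving IPv6 addresses from IPv4 addresses are the IPv4-mapped form `::ffff:a.b.c.d`, used by dual-stack hosts, and the 6to4 prefix `2002:wwxx:yyzz::/48`, which embeds a public IPv4 address after the 16-bit value 2002. The sketch below builds both with the standard `ipaddress` module. The helper names are hypothetical, and 6to4 is named here as an example mechanism, not as part of the report's SIT description.

```python
import ipaddress


def ipv4_mapped(v4: str) -> ipaddress.IPv6Address:
    """IPv4-mapped IPv6 address ::ffff:a.b.c.d, as used by dual-stack hosts."""
    return ipaddress.IPv6Address("::ffff:" + str(ipaddress.IPv4Address(v4)))


def sixtofour_prefix(v4: str) -> ipaddress.IPv6Network:
    """6to4 prefix 2002:wwxx:yyzz::/48 built from a public IPv4 address."""
    v4_bits = int(ipaddress.IPv4Address(v4))
    prefix = (0x2002 << 112) | (v4_bits << 80)   # 16 bits of 2002, then the 32 IPv4 bits
    return ipaddress.IPv6Network((prefix, 48))
```

For 192.0.2.1 these give `::ffff:192.0.2.1` and `2002:c000:201::/48`. The standard library can recover the original IPv4 address from either form through the `ipv4_mapped` and `sixtofour` properties of `IPv6Address`.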

Thus the migration involves smooth interoperability between IPv4 and IPv6, gradual deployment of IPv6 routers and hosts, and ease of understanding for both network administrators and end users. To assist the conversion, a set of mechanisms called the Simple Internet Transition (SIT) has been defined. SIT allows IPv4 hosts and routers to be upgraded to IPv6 progressively, one at a time. It also provides addressing simplicity, in that IPv6 can reuse existing IPv4 addresses. Thus, once the migration is under way, manufacturers will integrate IPv6 into networks, routers and operating systems, and users will adapt to the change.


Bibliography

1. John Curran, Chairman, ARIN. Letter to CEOs. https://www.arin.net/knowledge/about_resources/ceo_letter.pdf
2. A chapter on migration. http://www.cu.ipv6tf.org/literatura/chap12.pdf
3. IPv4 and IPv6 threat comparison and best practice evaluation. http://seanconvery.com/v6-v4-threats.pdf
4. Geoff Huston. IPv4 Address Report. http://www.potaroo.net/tools/ipv4/index.html
5. Organisation for Economic Co-operation and Development. Internet Address Space. http://www.oecd.org/dataoecd/7/1/40605942.pdf
6. Notes on Internet Protocol. http://www.wisegeek.com/what-is-ip-or-internet-protocol.htm
7. The Choice: Exhaustion or Transition. http://www.6journal.org/archive/00000285/01/the_choice_ipv4_exhaustion_or_transition_to_ipv6_v4.4.pdf
8. American Registry for Internet Numbers. https://www.arin.net/resources/request/index.html


SOFTWARE QUALITY

03 October 2008

Software Quality-V unit OOAD

Software Quality Assurance

INTRODUCTION

To develop and deliver robust systems, we need a high level of confidence that:
- each component will behave correctly, and
- their collective behavior is correct.

For this, verifying components in isolation is necessary, but not sufficient.


History of how “debugging” came into existence
- Grace Murray Hopper, in the years following WWII
- September 9th, 1947
- The team was working on the Harvard University Mark II, a room-sized relay calculator, when it experienced a problem.
- A “moth” was found trapped in the machine and was taped into the log book.
- (“First actual case of bug being found.”)



DEBUGGING AND TESTING

DEBUGGING:
- Is the process of eliminating syntactical bugs.

TESTING:
- Is the process of detecting and eliminating logical bugs.


What Testing Shows
- errors
- requirements conformance
- performance
- an indication of quality


Types of Errors
- Language errors result from incorrectly constructed code.
- Run-time errors occur when a statement attempts an impossible operation.
- Logic errors occur when the code does not perform the way you intended.



QUALITY ASSURANCE TESTING

TYPES

ERROR-BASED TESTING & SCENARIO-BASED TESTING

Error-based testing
- Searches a class’s method for clues of interest and then tests those clues.

E.g.:
- Payroll computation method of the Employee class:
- anEmployee.computePay(hours)
- This is called “testing the boundary conditions”.
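The `computePay(hours)` example can be made concrete. The payroll rule below is a hypothetical assumption: hours above 40 are paid at 1.5 times the rate, and negative hours are rejected. The "clues of interest" in such a method are the 40-hour threshold and the lower limit of 0, so the tests probe values on and immediately around those boundaries. The method is written as `compute_pay`, following Python naming.

```python
class Employee:
    """Hypothetical payroll rule: hours above 40 are paid at 1.5x the hourly rate."""

    OVERTIME_THRESHOLD = 40

    def __init__(self, hourly_rate: float) -> None:
        self.hourly_rate = hourly_rate

    def compute_pay(self, hours: float) -> float:
        if hours < 0:
            raise ValueError("hours cannot be negative")
        regular = min(hours, self.OVERTIME_THRESHOLD)
        overtime = max(hours - self.OVERTIME_THRESHOLD, 0)
        return regular * self.hourly_rate + overtime * self.hourly_rate * 1.5


def test_boundaries() -> None:
    an_employee = Employee(hourly_rate=10.0)
    # Values on and immediately around the lower boundary (0 hours).
    assert an_employee.compute_pay(0) == 0.0
    # Values on and immediately around the overtime boundary (40 hours).
    assert an_employee.compute_pay(39) == 390.0
    assert an_employee.compute_pay(40) == 400.0
    assert an_employee.compute_pay(41) == 415.0
    # Just below the lower boundary must be rejected.
    try:
        an_employee.compute_pay(-1)
    except ValueError:
        pass
    else:
        raise AssertionError("negative hours were accepted")


test_boundaries()
```

An off-by-one mistake such as writing `>` where `>=` was meant tends to show up only at 40 or 41 hours, which is why boundary values are chosen instead of typical ones such as 20 hours.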



Scenario-based testing
- Also called user-based testing.
- Concentrates on what the user does, rather than what the product does.
- Captures use cases and the user’s tasks and performs them as tests.
- More complex and realistic.
- Covers higher-visibility interaction bugs, though it does not find everything.



TESTING STRATEGIES

There are many strategies, but most use a combination of the following:
- Black box testing
- White box testing
- Top-down testing
- Bottom-up testing

Testing cannot prove the correctness of a system, but it can establish its “acceptability”.




Black Box Testing
- Concept: represents a system whose inner workings are not available for inspection.
- In black box testing, the test item is treated as “black”, since its logic is unknown.
- Only the input and output are known, not the implementation.

(Diagram: INPUT → ??? → OUTPUT)
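A black box test judges a component only by what goes in and what comes out. In the sketch below, the tester knows only the specification of a hypothetical `sort_scores` function (it returns the same numbers in ascending order) and checks exactly that, without relying on how the function is written. The function name and the chosen inputs are illustrative assumptions.

```python
from collections import Counter


def sort_scores(scores: list[int]) -> list[int]:
    """System under test. A black box tester does not read this body."""
    return sorted(scores)


def black_box_check(func, data: list[int]) -> bool:
    """Judge func only by its input/output behaviour, never by its code."""
    result = func(list(data))                 # pass a copy so the test input is not disturbed
    is_ordered = all(a <= b for a, b in zip(result, result[1:]))
    same_items = Counter(result) == Counter(data)
    return is_ordered and same_items


# Inputs chosen from the specification alone: empty, single, duplicates, reversed.
cases = [[], [7], [3, 1, 3], [5, 4, 3, 2, 1]]
assert all(black_box_check(sort_scores, case) for case in cases)
```

The same checker would accept any correct implementation and reject faulty ones (for example, one that returns its input unchanged, or one that drops items). This independence from the implementation is what distinguishes black box testing from white box testing.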



BLACK BOX TESTING

(Diagram: input test data, including the inputs I_e that cause anomalous behaviour, is fed to the system; the output test results include the outputs O_e that reveal the presence of defects.)



White Box Testing
- Assumes that the logic is important and must be tested to guarantee proper functioning.
- Main use: error-based testing.
- One form of white box testing is path testing, measured by:
  - statement testing coverage
  - branch testing coverage
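The difference between the two coverage measures listed above shows up even in a tiny function. In the hypothetical `discount_rate` below, a single test with a large order placed by a member executes every statement, which achieves full statement coverage. Branch coverage also requires each decision to evaluate to False at least once. The `trace` set is an illustrative hand-made instrument, not a real coverage tool.

```python
ALL_OUTCOMES = {("total>=100", True), ("total>=100", False),
                ("is_member", True), ("is_member", False)}


def discount_rate(total: float, is_member: bool, trace: set) -> float:
    """Hypothetical function under test; `trace` records each decision outcome."""
    rate = 0.0
    big_order = total >= 100
    trace.add(("total>=100", big_order))
    if big_order:
        rate += 0.05
    trace.add(("is_member", is_member))
    if is_member:
        rate += 0.05
    return rate


# One test executes every statement (100% statement coverage) ...
trace = set()
assert discount_rate(150, True, trace) == 0.1
statement_suite_outcomes = set(trace)

# ... but branch coverage also needs each decision to evaluate False.
assert discount_rate(50, False, trace) == 0.0
assert trace == ALL_OUTCOMES
assert statement_suite_outcomes != ALL_OUTCOMES
```

A bug that only appears when a decision is False, such as a wrong default rate, would pass the first suite unnoticed. This is why branch coverage is the stronger criterion.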


White Box Testing
- A software testing approach that uses the inner structural and logical properties of the program for verification and for deriving test data.
- Also called clear box testing, glass box testing and structural testing.

(Diagram: INPUT → program with visible internals → OUTPUT)




Top Down Testing
- Test the top layer, or controlling subsystem, first.
- Then combine all the subsystems that are called by the tested subsystems and test the resulting collection of subsystems.
- Repeat until all subsystems are incorporated into the test.
- Test stubs are used to simulate the components of lower layers that have not yet been integrated.
- No drivers are needed.
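In top-down testing, the controlling module is exercised first, while the modules beneath it are replaced by stubs. In the sketch below, a hypothetical `Checkout` module (the top layer) depends on a tax module that is assumed not to be integrated yet. A `TaxServiceStub` returns a fixed, canned answer so that the top-level logic can be tested now. All names are illustrative assumptions.

```python
class TaxServiceStub:
    """Stub standing in for a lower-layer module that is not integrated yet.

    It returns a fixed, canned answer instead of doing a real tax calculation.
    """

    def tax_for(self, amount: float) -> float:
        return 0.0


class Checkout:
    """Top-level (controlling) module under test; the tax module is injected."""

    def __init__(self, tax_service) -> None:
        self.tax_service = tax_service

    def total(self, prices: list[float]) -> float:
        subtotal = sum(prices)
        return subtotal + self.tax_service.tax_for(subtotal)


# Top-down: exercise the controlling logic now, with the stub underneath.
checkout = Checkout(TaxServiceStub())
assert checkout.total([10.0, 5.5]) == 15.5
assert checkout.total([]) == 0
```

When the real tax module is ready, it replaces the stub through the same constructor argument, and a subset of these tests is re-run against the integrated pair. No driver is needed, because the module under test is itself the top of the call chain.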




Top Down Testing

(Diagram: a module hierarchy A to G, with A at the top.)
- The top module is tested with stubs.
- Stubs are replaced one at a time, “depth first”.
- As new modules are integrated, some subset of the tests is re-run.

Top-down testing assumes that the main logic, or object interaction, of an application needs more testing than an individual object’s methods.



BOTTOM UP TESTING
- Starts with the details of the system and proceeds to higher levels.
- More appropriate.
- Test each object, then combine the objects and test their interaction and the messages passed among them.
- Leads to integration testing, which leads to system testing.
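Bottom-up testing starts from a low-level worker module and uses a driver to call it, because the higher-level modules that would normally call it do not exist yet. In the sketch below, `parse_amount` is a hypothetical worker module that converts a non-negative amount string to cents. `driver` is throwaway test code that feeds it inputs and checks the results. Both names and the non-negative input assumption are illustrative.

```python
def parse_amount(text: str) -> int:
    """Low-level worker module: convert a non-negative amount like '12.50' to cents."""
    units, _, fraction = text.strip().partition(".")
    fraction = (fraction + "00")[:2]   # pad or truncate to exactly two digits
    return int(units) * 100 + int(fraction)


def driver() -> None:
    """Test driver: temporary code that calls the worker module directly,
    standing in for the higher-level modules that are not written yet."""
    cases = {"12.50": 1250, "3": 300, " 0.7 ": 70, "10.05": 1005}
    for text, expected in cases.items():
        actual = parse_amount(text)
        assert actual == expected, f"{text!r}: expected {expected}, got {actual}"


driver()
```

Once the module that really calls `parse_amount` is written, it takes the driver's place, and the pair is tested together. This is the step from unit testing toward integration testing described above.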




Bottom up testing

(Diagram: a module hierarchy A to G.)
- Worker modules are grouped into builds and integrated.
- Drivers are replaced one at a time, “depth first”.



IMPACT OF OBJECT ORIENTATION ON TESTING
- Some types of errors could become less plausible.
- Some could become more plausible.
- Some new errors might appear.

- Impact of inheritance on testing
- Reusability of tests
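Inheritance affects testing in both of the ways listed above. A subclass inherits methods that must still be shown to work in the subclass's context, and the tests written for the base class can be reused to do exactly that. The sketch below uses the standard `unittest` module. A mixin class holds the base-class tests, and each concrete test case re-runs them against its own class. `Account` and `SavingsAccount` are hypothetical examples.

```python
import unittest


class Account:
    def __init__(self) -> None:
        self.balance = 0

    def deposit(self, amount: int) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount


class SavingsAccount(Account):
    """Subclass that adds interest; it inherits deposit() unchanged."""

    def add_interest(self, rate: float) -> None:
        self.balance += int(self.balance * rate)


class AccountContract:
    """Tests written once for Account and mixed into each subclass's test case."""

    account_class = Account

    def test_deposit_increases_balance(self) -> None:
        acct = self.account_class()
        acct.deposit(50)
        self.assertEqual(acct.balance, 50)

    def test_rejects_non_positive_deposit(self) -> None:
        with self.assertRaises(ValueError):
            self.account_class().deposit(0)


class AccountTest(AccountContract, unittest.TestCase):
    account_class = Account


class SavingsAccountTest(AccountContract, unittest.TestCase):
    account_class = SavingsAccount   # inherited tests re-run against the subclass

    def test_interest(self) -> None:
        acct = SavingsAccount()
        acct.deposit(100)
        acct.add_interest(0.05)
        self.assertEqual(acct.balance, 105)
```

Running both test cases executes five tests: the two contract tests for `Account`, and the same two plus one new test for `SavingsAccount`. The mixin is deliberately not a `TestCase` itself, so it never runs on its own.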



TEST CASES

To test a system:
- construct the test input,
- describe how the output will look,
- perform the tests, and
- compare the results with the expected output.

Myers’s objectives of testing:
- Testing is the process of executing a program with the intent of finding errors.
- A good test case is one that has a high probability of detecting an as-yet undiscovered error.
- A successful test case is one that detects an as-yet undiscovered error.
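The four steps above map naturally onto a table-driven harness. Each case records its constructed input and expected output, and a runner performs the tests and compares the results. The sketch uses a leap-year function (Gregorian rule) as a hypothetical program under test. The `Case` structure and `run_cases` helper are illustrative names, not a standard API.

```python
from dataclasses import dataclass


def is_leap_year(year: int) -> bool:
    """Program under test (Gregorian calendar rule)."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


@dataclass
class Case:
    name: str
    test_input: int    # step 1: the constructed test input
    expected: bool     # step 2: how the output should look


def run_cases(cases: list[Case]) -> list[str]:
    """Steps 3 and 4: perform each test and compare it with the expected output.

    Returns the names of failing cases. In Myers's terms, a failing case is a
    successful test, because it has uncovered an error.
    """
    failures = []
    for case in cases:
        actual = is_leap_year(case.test_input)
        if actual != case.expected:
            failures.append(case.name)
    return failures


cases = [
    Case("ordinary leap year", 2008, True),
    Case("ordinary common year", 2009, False),
    Case("century, not leap", 1900, False),
    Case("400-year rule", 2000, True),
]
assert run_cases(cases) == []
```

The cases for 1900 and 2000 are the ones with the highest probability of exposing an error, because a naive "divisible by 4" implementation gets 1900 wrong.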



TEST PLAN
- A test plan is developed to detect and identify potential problems before the software is delivered to its users.
- It offers a road map for testing activities, whether usability, user satisfaction or quality assurance tests.
- Users might demand a test plan with the product.
- It should state the test objectives and how to meet them.



STEPS TO CREATE A TEST PLAN
- Objectives of the test
  - Create the objectives and describe how they will be met.
- Development of the test cases
  - Develop the input and output data, and test.
- Test analysis
  - Examine the test output and document it. If there are errors, debug and repeat until no errors remain.


Guidelines for Developing Test Plans
- Include as much detail as possible about the tests.
- Include a schedule and a list of required resources.
- Document every type of test.
- Track the changes to the code.
- Keep the test plan and the product in sync, and keep the plan up to date.
- Keep configuration information and complete routine updates.




Myers’s Debugging Principles
- Bug locating principles
- Debugging principles

Bug locating principles:
- Think.
- If you reach an impasse, sleep on it.
- If the impasse remains, describe the problem to someone else.
- Use debugging tools.
- Experiment only as a last resort.



Debugging principles:
- Where there is one bug, there is likely to be another.
- Fix the problem, not just its symptom.
- The probability of the fix being correct drops as the size of the program increases.
- Beware of the possibility that an error correction will create a new error.



Let’s consider the test cases of an ATM system:
- Bank client inserts card → password requested.
- Password incorrect → error message, card ejected.
- Transaction completed → main menu shown.
- Cash is low → system notifies the bank.
- Act of vandalism occurs → alarm sounds.
- Wrong PIN entered → 3 chances provided → still wrong → card deactivated.
- And so on.
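The "wrong PIN" line above can be turned directly into scenario-based tests, which replay a user's task step by step and check what the user would see. The `Atm` class below is a hypothetical model of only the PIN part of the session. Its messages and the three-attempt limit follow the scenario, while the structure and names are illustrative assumptions.

```python
class Atm:
    """Hypothetical ATM session model covering only the PIN scenario."""

    MAX_ATTEMPTS = 3

    def __init__(self, correct_pin: str) -> None:
        self.correct_pin = correct_pin
        self.failed_attempts = 0
        self.card_active = True

    def enter_pin(self, pin: str) -> str:
        if not self.card_active:
            return "card deactivated"
        if pin == self.correct_pin:
            self.failed_attempts = 0
            return "main menu"
        self.failed_attempts += 1
        if self.failed_attempts >= self.MAX_ATTEMPTS:
            self.card_active = False
            return "card deactivated"
        return "wrong pin, try again"


def test_wrong_pin_three_times_deactivates_card() -> None:
    atm = Atm(correct_pin="1234")
    assert atm.enter_pin("0000") == "wrong pin, try again"
    assert atm.enter_pin("1111") == "wrong pin, try again"
    assert atm.enter_pin("2222") == "card deactivated"
    # Even the right PIN is refused once the card is deactivated.
    assert atm.enter_pin("1234") == "card deactivated"


def test_correct_pin_shows_main_menu() -> None:
    atm = Atm(correct_pin="1234")
    assert atm.enter_pin("0000") == "wrong pin, try again"
    assert atm.enter_pin("1234") == "main menu"


test_wrong_pin_three_times_deactivates_card()
test_correct_pin_shows_main_menu()
```

Writing the scenario as a test immediately raises the kind of new question noted on the next slide. For example, should the failure counter reset after a successful PIN? This sketch assumes it does.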



- This is an iterative process.
- Every iteration exposes a new issue.
- The positive aspect is that it raises new questions that can refine the system toward perfection.

THANK YOU

Madhura Raju



11/11/2010

A Report

On

SOFTWARE USABILITY

A NECESSITY NOT AN OPTION

Submitted by

Madhura

Page 1 of 4


SOFTWARE USABILITY

A NECESSITY NOT AN OPTION

Usability, in general, refers to the quality of being able to provide good service. Applied to software, it is a very important aspect of computer science.

“Software usability” refers to how easy to use a software product or user interface is made. It relates directly to customer satisfaction, without which a software product serves no purpose. Usability is not restricted to software products; it also applies to websites and to software design in general.

Need for Software Usability:

To keep customers happy and comfortable, to increase employee productivity, and to make a company’s operations more efficient, software usability must be of a high standard. Otherwise, customers will not use or recommend the software because of the difficulties they encounter, employees will be less efficient because content on the intranet is not easily accessible, and the company can incur losses because of ineffective websites.

Assessing Usability:

The first question to ask when assessing the usability of software is: how easy is it to use? After answering this question, various other attributes should be checked. The software should be easy to understand and well equipped to guide the user through the entire workflow. The tasks that the user requests should be completed quickly, and the software or website should be quick to learn. If users take too long, the product should be restructured into a simpler application.

The target users should be known well beforehand, and the product should be designed accordingly. Rapid prototyping and feedback mechanisms can be used to ascertain the product’s ease of use. The user’s handling of the product or website is observed closely. The user’s level of satisfaction, the frequency of errors made while working with the product, and the efficiency and consistency with which operations are carried out without being forgotten are the factors that indicate the usability of the software.

The whole procedure of assessing the usability of an end product or a website can be carried out either by third-party consultants or by internal groups, such as focus groups. There are various consultancy companies working on usability. The prototype should be carefully tested and approved by a sample of customers. If there is bad feedback about usability, it should be handled by an expert group. Focus groups are good, but testing directly with users is the best way to assess usability: the number of steps users take to accomplish their tasks without encountering errors, and their use of online help when they come across a problem, are observed closely.
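The observations described above can be summarised as simple per-session figures, which makes it easier to compare two prototypes or two rounds of testing. This is a minimal sketch that assumes each observed session is recorded with the hypothetical fields below: task completion, step count, errors, help lookups, and a 1 to 5 satisfaction rating. It computes plain averages and is not a standard usability scoring method.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One observed user session (hypothetical observation fields)."""
    completed: bool
    steps: int
    errors: int
    help_lookups: int
    satisfaction: int   # e.g. a 1-5 rating given by the user afterwards


def summarize(sessions: list[Session]) -> dict[str, float]:
    """Average the factors the report lists across all observed sessions."""
    return {
        "completion_rate": mean(1.0 if s.completed else 0.0 for s in sessions),
        "mean_steps": mean(s.steps for s in sessions),
        "errors_per_session": mean(s.errors for s in sessions),
        "help_lookups_per_session": mean(s.help_lookups for s in sessions),
        "mean_satisfaction": mean(s.satisfaction for s in sessions),
    }


sessions = [
    Session(True, 8, 0, 0, 5),
    Session(True, 12, 2, 1, 3),
    Session(False, 20, 4, 2, 2),
    Session(True, 10, 1, 1, 4),
]
summary = summarize(sessions)
```

For these four made-up sessions, the completion rate is 0.75 and the average is 12.5 steps per task. A redesign that lowers steps, errors and help lookups while raising satisfaction is measurably more usable.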

Thus, software usability is an essential requirement for customer satisfaction and increased user efficiency.

