ONTOLOGY BASED SEMANTIC WEB SERVICE WITH NATURAL LANGUAGE PROCESSING FUNCTIONALITIES
Uploaded by madhura-raju
Natural language understanding is one of the earliest problems that AI researchers tried to solve.
There is a fundamental need for a system that can bring this ability to the web, creating a web that
can understand and reason. Such a web is known as the Semantic Web, and it is still in its nascent
stages of development. The Semantic Web relies on tools such as ontologies, natural language software
and information retrieval algorithms to achieve this understanding. We propose a system that addresses
this problem of natural language understanding. The proposed system accepts input in the form of
well-formed English sentences and is intended to produce output that is both pertinent and correct.
The system performs part-of-speech tagging, which reduces the complexity of the input and makes it
usable by the natural language parser, which in turn translates the natural language query into a
query on a database schema. This schema is built on knowledge-based logic obtained from a
domain-specific ontology. Hence, data is stored free of logical errors, and information retrieval
becomes easier when this is coupled with natural language processing. We use the domain
“Demographics of India” for demonstration purposes.
The proposed model consists of four modules: the POS tagger, the NLP parser, the ontology and
database, and the user interface. The POS tagger takes the input sentence and identifies the part of
speech of every word in it. This is vital to the project because, without understanding the role of
each input word, it is difficult to build a query from the sentence. The NLP parser takes the output
of the POS tagger and builds a query based on the original English sentence. The ontology is the
integral part of the project, as it holds the logic and the structure of the knowledge base, which is
realized by the database. It is this database, modeled after the ontology, that is queried to produce
the final result. The user interface is used to accept the input and display the result.
Existing System:
Today’s web services are not organized enough to operate in an intelligent way. This is
precisely why we need the Semantic Web: to overcome the reasoning deficiency of the present-day
web. Since this project is a new concept in its nascent stages, we cannot compare it completely with
any one existing system. But when we consider the web as a whole, we can say that the current web
is largely keyword oriented and lacks the infrastructure and resources to provide solutions in a
logical, artificially intelligent manner.
For example, almost all search engines on the internet make use of keywords only, and not the
logic behind the query. In simple words, they match the words found in the search string and rank
results by previous visits from all users and by popularity, rather than trying to provide the
correct answer to the query or the most logical answer to a question.
Currently, only one search engine on the internet, “ask.com”, has accomplished this partially, by
combining natural language processing with keyword-based techniques. But even it does not use the
concept of an ontology to dynamically update itself and learn from queries.
Demerits of the Existing System:
The existing system is not able to produce pertinent and accurate results for the user. For a web
service that answers questions, as in a search engine, the existing system fails to deliver because
the narrow keyword-based approach, together with ranking that depends on the number of successful
previous searches, does little to find answers to questions.
E.g.: the existing system will not return a good answer to the search query “Which states does the
Yamuna flow through?” because it depends on a keyword-based approach, which searches for individual
keywords like ‘states’, ‘Yamuna’, ‘flow’, etc. and returns the most popular (i.e. most visited) result.
Proposed System:
The architecture of the proposed system is shown in Fig 3.1.
Fig 3.1 Architecture of the Proposed System
This system accepts natural English as input and is intended to produce pertinent and correct results
as output. It accomplishes this with a knowledge base built along the lines of a logically consistent
ontology, which ensures that the resulting database remains logically sound.
The natural language input is processed by a POS tagger and an NLP parser, which together ensure that
the English sentence is correctly interpreted by the system and that a corresponding query is
generated for the database modeled on the Demographics of India ontology.
Finally, the user interface accepts the input and displays the output after it is processed.
The POS tagger and the NLP parser are written in Java, the user interface in JSP, the ontology is
created with the Protégé tool, and the database uses MySQL 5.1.
The system accepts simple queries and can be extended to support queries that are more complex in
nature.
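The tagging-and-parsing pipeline described above can be illustrated with a minimal sketch. It is written in Python for brevity (the project itself is described as Java), and everything in it is an assumption made for illustration: the tiny lexicon, the tag names, and the table and column names (`state`, `river`, `river_state`) are hypothetical and are not taken from the project's actual ontology or schema.

```python
import re

# Hypothetical mini-lexicon; a real POS tagger would cover far more words.
LEXICON = {
    "which": "WH", "what": "WH",
    "states": "NOUN", "state": "NOUN", "population": "NOUN",
    "does": "AUX", "is": "AUX",
    "the": "DET", "of": "PREP", "through": "PREP",
    "flow": "VERB",
}


def pos_tag(sentence):
    """Tag each word; unknown capitalised words are treated as proper nouns."""
    tagged = []
    for word in re.findall(r"[A-Za-z]+", sentence):
        tag = LEXICON.get(word.lower())
        if tag is None:
            tag = "PROPN" if word[0].isupper() else "UNK"
        tagged.append((word, tag))
    return tagged


def build_query(tagged):
    """Translate one supported question pattern into a parameterised SQL query."""
    nouns = [w.lower() for w, t in tagged if t == "NOUN"]
    verbs = [w.lower() for w, t in tagged if t == "VERB"]
    names = [w for w, t in tagged if t == "PROPN"]
    if nouns[:1] == ["states"] and verbs == ["flow"] and names:
        # Table and column names are assumed, not the project's real schema.
        sql = ("SELECT s.name FROM state s "
               "JOIN river_state rs ON rs.state_id = s.id "
               "JOIN river r ON r.id = rs.river_id "
               "WHERE r.name = %s")
        return sql, (names[0],)
    raise ValueError("query pattern not supported in this sketch")


tagged = pos_tag("Which states does the Yamuna flow through?")
sql, params = build_query(tagged)
print(tagged)
print(sql, params)
```

The point of the sketch is the division of labour: the tagger only labels words, and the parser decides which labelled words become the selected column and which become the filter value.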
Madhura Raju
This project was proposed in May, 2009
9/1/2009
A Report
On
The Enervation of IPv4
leading to
The Adoption of IPv6
Submitted by
Madhura
The Enervation of IPv4 leading to the Adoption of IPv6
The limitations of IPv4 have paved a clear path for the deployment of IPv6. IPv6 is slowly reaching
the world and is widely expected to replace IPv4. A closer look at this subject gives a clear picture
of why, how and when the transition is taking place.
Before turning to the subject matter, the fundamentals are discussed.
Internet Protocol Address
All information exchanged between a sender and a receiver on the internet, irrespective of type and
content, is structured in the form of “packets”. A packet is a formatted block of data, also known as
a datagram. Each packet includes a header and a message (also called the payload). The header carries
the source and destination addresses. The Internet Protocol is a standard that lets the devices on a
network communicate with one another, each having a unique IP address. This address identifies a
device on both the Local Area Network and the internet. The primary purpose of IP addresses is to
route packets through the network to their respective destinations.
Format of the IP Address
An IPv4 address is a 32-bit address, written as four whole numbers separated by periods. Each number
uses 8 bits and can thus range from 0 to 255. Every IP address has two sections: the NetID and the
HostID. The NetID identifies the network, while the HostID refers to a particular host within that
network. The NetID is the unique Internet Number that can be requested from the Network Information
Center (NIC). The HostID, also known as the machine or local address, represents the specific machine
that is linked to the network.
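As a small illustration of the format just described, the following sketch uses Python's standard `ipaddress` module to split an address into its four 8-bit numbers and into network and host parts. The address `192.0.2.10` and the `/24` prefix length are assumptions chosen for the example; where the NetID ends and the HostID begins depends on the prefix in use.

```python
import ipaddress

# 192.0.2.10 is a documentation address; the /24 prefix is an assumption.
iface = ipaddress.ip_interface("192.0.2.10/24")

# Four whole numbers, each stored in 8 bits (0 to 255).
octets = [int(part) for part in str(iface.ip).split(".")]

# The same address viewed as a single 32-bit number.
as_int = int(iface.ip)

# NetID: the bits covered by the prefix. HostID: the remaining low-order bits.
net_id = int(iface.network.network_address)
host_id = as_int & int(iface.network.hostmask)

print(octets)
print(f"{as_int:032b}")
print(ipaddress.ip_address(net_id), host_id)
```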
IP Address Management
The Internet Assigned Numbers Authority (IANA) is the key entity behind the management of IP
addresses. The IANA, in collaboration with the five Regional Internet Registries (RIRs), allocates IP
blocks to Internet Service Providers (ISPs). The ISPs in turn apportion IP addresses to individual
nodes or networks. Thus the ISPs connect networks and individual machines to the internet.
Domain Name System
The DNS is a naming system that removes the need to remember hard-to-recall numeric IP addresses. It
lets an address be represented by an alphabetical name that can easily be committed to memory. A DNS
server is used to translate domain names into IP addresses. Though the task sounds simple, it is a
demanding operation because the Domain Name System database is one of the most accessed databases on
the internet.
IPv4 Exhaustion
The current version of the Internet Protocol is IPv4, the first version to be widely deployed. IPv4
employs 32-bit addressing and is therefore limited to 2^32 (4,294,967,296) unique addresses, or around
4.29 billion. Since every device on a network needs a unique IP address, this has led to a shortage of
IP addresses. In addition, large blocks of addresses are allocated to various organizations. These
blocks are not available for public allocation, which adds to the shortage. If these address blocks
could be returned to a pool of regular addresses available for public allocation, the problem would be
alleviated, but only to a very small extent, because the whole operation is complicated, expensive and
time consuming.
Solutions to deal with the Exhaustion:
Temporary:
These issues are temporarily eased by Network Address Translation (NAT), in which one Internet
Protocol address can be shared among many hosts in a Local Area Network. The main function of NAT is
IP masquerading, which conceals a whole private network behind one public IP address. This serves as a
major mitigation of the exhaustion. Other temporary mitigations are classful networks, Classless
Inter-Domain Routing (CIDR) and the Virtual Private Network (VPN). The only permanent solution to this
problem is migration to the new version, IPv6.
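The idea of IP masquerading can be sketched as a translation table that maps each private (address, port) pair to a port on the single shared public address. This is a simplified illustration, not how any particular router implements NAT; the class name `NatTable`, the starting port 40000 and all addresses are assumptions made for the example.

```python
import itertools


class NatTable:
    """Toy port-address translation: many private hosts share one public IP."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._out = {}  # (private_ip, private_port) -> public_port
        self._in = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Rewrite the source of an outgoing packet to the shared public address."""
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._ports)
            self._out[key] = port
            self._in[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port):
        """Map a reply back to the private host, or None if no mapping exists."""
        return self._in.get(public_port)


nat = NatTable("203.0.113.5")
a = nat.outbound("192.168.1.10", 5000)
b = nat.outbound("192.168.1.11", 5000)
print(a, b)
print(nat.inbound(a[1]))
```

Both private hosts appear to the outside world under the same public address and are told apart only by the translated port, which is why a single IPv4 address can serve a whole private network.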
As the usage of the internet grew exponentially, the depletion of the IPv4 address space was
foreseeable. In the “IPv4 Address Report”, Geoff Huston predicts that the RIR pool will be exhausted
by March 2013. The main task of an RIR is to allocate and register internet number resources within a
particular region. There are five Regional Internet Registries:
1. American Registry for Internet Numbers (ARIN)
2. RIPE Network Coordination Centre (RIPE NCC), for Europe
3. Asia-Pacific Network Information Centre (APNIC)
4. Latin American and Caribbean Internet Addresses Registry (LACNIC)
5. African Network Information Centre (AFRINIC)
The IANA delegates to the RIRs the responsibility of providing internet addresses to customers and
end users. Thus the RIRs allocate address blocks within their regions, from which the stations in a
network receive their IP addresses.
Permanent:
After a good deal of experimentation, a consensus was reached that the best way to handle this crisis
is the adoption of IPv6, the next-generation Internet Protocol. It was recommended by the IPng Area
Directors of the Internet Engineering Task Force at its Toronto meeting in 1994, after which it was
approved by the Internet Engineering Steering Group and made a “Proposed Standard”. In 1998 it was
made a “Draft Standard”.
Once the IPv4 pool is exhausted, the allocation of IPv4 addresses will end, and it will be time for
IPv6 to play its part. The distinguishing feature of IPv6 is that it provides 2^128 unique addresses:
it has a 128-bit address, unlike the 32-bit address of IPv4. Once the threshold of IPv4 exhaustion is
reached, organizations will start deploying IPv6.
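The difference in scale between the two address sizes is easy to check with a few lines of arithmetic. The sketch below only evaluates the two powers of two mentioned above; the variable names are chosen for the example.

```python
ipv4_total = 2 ** 32   # 32-bit addresses
ipv6_total = 2 ** 128  # 128-bit addresses

print(f"IPv4: {ipv4_total:,}")
print(f"IPv6: {ipv6_total:,}")

# Every extra bit doubles the space, so IPv6 offers 2**96 addresses
# for each single IPv4 address.
ratio = ipv6_total // ipv4_total
print(f"IPv6 addresses per IPv4 address: 2**{128 - 32}")
```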
This information has already been circulated by the American Registry for Internet Numbers to the
organizations that use IPv4. ARIN has predicted that the shortage of Internet Numbers will occur
within two years, and suggests that firms start planning IPv6 adoption in order to continue acquiring
additional IP addresses.
The large address size is an obvious advantage of IPv6 over IPv4. The IPv6 header is designed to
speed up the routing process. IPv6 also supports payloads larger than those IPv4 allows; such packets
are called jumbograms, and they can improve overall performance over high-throughput networks.
Other advantages of IPv6 are multicasting, stronger security (due to IPsec being part of its core)
and auto-configuration of hosts when they are connected to an IPv6 routed network.
The Transition:
The transition from IPv4 to IPv6 is not a task that can be done overnight. It involves many
complications that must be handled carefully. The changeover is a prolonged process, mainly because
of the sheer extent of the internet and the very large number of IPv4 users, which is one reason for
the delay in the transition. One important point is that IPv4 and IPv6 can coexist. This means that
organizations only need to carry out an upgrade.
The migration has to be done node by node in the routed network, and it can use the
auto-configuration procedures to avoid manual operations. A closer study of IPv6 reveals several
interesting facts. IPv6 is designed in such a way that its addresses can be derived from IPv4
addresses. Another interesting feature is that IPv6 nodes can follow the dual-stack approach, which
means that a node supports both IPv6 and IPv4 at the same time.
Thus the migration involves interoperability between IPv4 and IPv6, gradual deployment of IPv6
routers and hosts, and ease of understanding for both network administrators and end users. To assist
the conversion, a set of mechanisms called Simple Internet Transition (SIT) is implemented. First, SIT
allows IPv4 hosts and routers to be upgraded to IPv6 progressively, one at a time. Secondly, it
simplifies addressing, in that IPv6 can even use IPv4 addresses.
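One standard way in which an IPv6 address can carry an IPv4 address is the IPv4-mapped form, where the 32-bit IPv4 address occupies the low-order bits of the 128-bit address. The report does not say which embedding SIT relies on, so the choice of the IPv4-mapped form here is an assumption, and `192.0.2.33` is just a documentation address. The sketch uses Python's standard `ipaddress` module.

```python
import ipaddress

v4 = ipaddress.IPv4Address("192.0.2.33")

# IPv4-mapped layout: 80 zero bits, 16 one bits, then the 32-bit IPv4 address.
mapped = ipaddress.IPv6Address((0xFFFF << 32) | int(v4))

print(mapped)              # the full 128-bit address
print(mapped.ipv4_mapped)  # the embedded IPv4 address recovered from it
```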
Thus, once the migration is settled, manufacturers will start integrating IPv6 into networks,
routers and operating systems, and users will adapt to the change.
Bibliography
1. The letter to the CEOs from John Curran, Chairman, ARIN: https://www.arin.net/knowledge/about_resources/ceo_letter.pdf
2. A chapter on migration: http://www.cu.ipv6tf.org/literatura/chap12.pdf
3. IPv4 and IPv6 threat comparison and best practice evaluation: http://seanconvery.com/v6-v4-threats.pdf
4. IPv4 Address Report by Geoff Huston: http://www.potaroo.net/tools/ipv4/index.html
5. Internet Address Spacing by Organization for Economic Cooperation and Development: http://www.oecd.org/dataoecd/7/1/40605942.pdf
6. Notes on Internet Protocol: http://www.wisegeek.com/what-is-ip-or-internet-protocol.htm
7. The Choice: Exhaustion or Transition: http://www.6journal.org/archive/00000285/01/the_choice_ipv4_exhaustion_or_transition_to_ipv6_v4.4.pdf
8. American Registry for Internet Numbers: https://www.arin.net/resources/request/index.html
SOFTWARE QUALITY
03 October 2008
Software Quality-V unit OOAD
Software Quality Assurance
INTRODUCTION
To develop and deliver robust systems, we need a high level of confidence that:
- Each component will behave correctly.
- The collective behavior is correct.
For this:
- Verifying components in isolation is necessary, but not sufficient.
History of how “debugging” came into existence
- Grace Murray Hopper and her team, shortly after WWII
- September 9th, 1947
- They were working on the room-sized Harvard University Mark II relay calculator, which
  experienced a problem.
- A “moth” was found trapped in the machine and was taped into the log book.
- ("First actual case of bug being found.")
DEBUGGING AND TESTING
DEBUGGING:
- The process of eliminating syntactical bugs.
TESTING:
- The process of detecting and eliminating logical bugs.
What Testing Shows
- errors
- requirements conformance
- performance
- an indication of quality
Types of Errors
- Language errors result from incorrectly constructed code.
- Run-time errors occur when a statement attempts an impossible operation.
- Logic errors occur when the code does not perform the way you intended.
QUALITY ASSURANCE TESTING TYPES
ERROR-BASED TESTING & SCENARIO-BASED TESTING
Error-based testing
- Searches a class’s methods for clues of interest and then tests those clues.
E.g.:
- Payroll computation method of the Employee class:
- anEmployee.computePay(hours)
- This is called “testing the boundary conditions”.
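A minimal sketch of what testing the boundary conditions of such a payroll method could look like is given below. The slides only name `anEmployee.computePay(hours)`; the hourly rate, the 40-hour threshold, the overtime factor and the rejection of negative hours are all assumptions invented for this illustration, and the method is spelled `compute_pay` to follow Python naming.

```python
class Employee:
    """Hypothetical payroll class; the rate and overtime rule are assumptions."""

    STANDARD_HOURS = 40
    OVERTIME_FACTOR = 1.5

    def __init__(self, hourly_rate):
        self.hourly_rate = hourly_rate

    def compute_pay(self, hours):
        if hours < 0:
            raise ValueError("hours must not be negative")
        regular = min(hours, self.STANDARD_HOURS)
        overtime = max(hours - self.STANDARD_HOURS, 0)
        return self.hourly_rate * (regular + self.OVERTIME_FACTOR * overtime)


an_employee = Employee(hourly_rate=10)

# Probe values at and on either side of each boundary (0 and 40 hours).
boundary_cases = {0: 0, 1: 10, 39: 390, 40: 400, 41: 415}
for hours, expected in boundary_cases.items():
    assert an_employee.compute_pay(hours) == expected, hours

# Just below the lower boundary the method should refuse the input.
try:
    an_employee.compute_pay(-1)
    rejected = False
except ValueError:
    rejected = True
print("negative hours rejected:", rejected)
```

The test values cluster around the points where the method changes behavior, since that is where off-by-one mistakes tend to hide.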
Scenario-based testing
- Also called user-based testing.
- Concentrates on what the user does rather than on what the product does.
- Capture use cases and the user’s tasks, and perform them as tests.
- More complex and realistic.
- Covers higher-visibility interaction bugs, though it does not find everything.
TESTING STRATEGIES
- There are many strategies, but most use a combination of the following:
  - Black box testing
  - White box testing
  - Top down testing
  - Bottom up testing
Testing can’t prove the correctness of the system, but it can establish its “acceptability”.
Black Box Testing
- Concept: represents a system whose inner workings are not available for inspection.
- In a black box, the test item is treated as “black”, since its logic is unknown.
- Only the input and output are known, not the implementation.
[Figure: INPUT → ??? → OUTPUT]
BLACK BOX TESTING
[Figure: input test data (Ie: inputs causing anomalous behaviour) → System → output test results
(Oe: outputs which reveal the presence of defects)]
White Box Testing
- Assumes that the logic is important and must be tested to guarantee proper functioning.
- Main use: error-based testing.
- One form of white box testing is path testing:
  - Statement testing coverage
  - Branch testing coverage
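The difference between statement coverage and branch coverage can be shown on a very small function. The function `apply_discount` and its flat discount of 10 are hypothetical and exist only for this illustration.

```python
def apply_discount(price, is_member):
    """Hypothetical function with a single decision point."""
    if is_member:
        price = price - 10
    return price


# One test with is_member=True executes every statement (statement coverage).
member_price = apply_discount(100, True)

# The False outcome of the `if` is only exercised by a second test,
# which is what branch coverage additionally requires.
guest_price = apply_discount(100, False)

print(member_price, guest_price)
```

With only the first test, a bug that affects non-members would go unnoticed even though every line of the function has been executed.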
White Box Testing
- A software testing approach that uses the inner structural and logical properties of the program
  for verification and for deriving test data.
- Also called clear box testing, glass box testing and structural testing.
[Figure: INPUT → OUTPUT]
Top Down Testing
- Test the top layer or the controlling subsystem first.
- Then combine all the subsystems that are called by the tested subsystems and test the resulting
  collection of subsystems.
- Do this until all subsystems are incorporated into the test.
- Test stubs are used to simulate the components of lower layers that have not yet been integrated.
- No drivers are needed.
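The role of a test stub in the steps above can be sketched as follows. All class names (`PayrollController`, `TaxServiceStub`, `FlatTaxService`) and the 25% tax rule are assumptions made up for the example; the point is only that the top layer is tested first against a canned lower layer, which is later replaced by the real component.

```python
class TaxServiceStub:
    """Stub standing in for a lower-layer component that is not integrated yet."""

    def tax_for(self, gross):
        return 0  # canned answer, just enough to exercise the caller


class PayrollController:
    """Top-layer (controlling) component under test."""

    def __init__(self, tax_service):
        self.tax_service = tax_service

    def net_pay(self, gross):
        return gross - self.tax_service.tax_for(gross)


class FlatTaxService:
    """Real lower-layer component, integrated later in place of the stub."""

    def tax_for(self, gross):
        return gross // 4  # assumed 25% flat tax


# Step 1: test the top layer against the stub.
with_stub = PayrollController(TaxServiceStub()).net_pay(1000)

# Step 2: replace the stub with the real component and re-run the test.
with_real = PayrollController(FlatTaxService()).net_pay(1000)

print(with_stub, with_real)
```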
Top Down Testing
[Figure: module hierarchy with top module A and lower modules B, F, G, C, D, E]
- The top module is tested with stubs.
- Stubs are replaced one at a time, "depth first".
- As new modules are integrated, some subset of tests is re-run.
Assumes that the main logic or object interaction of the application needs more testing than an
individual object’s methods.
BOTTOM UP TESTING
- Starts with the details of the system and proceeds to a higher level.
- More appropriate
- Test each object, combine them, and test their interaction and the messages passed among the
  objects.
- Leads to integration testing, which leads to systems testing.
Bottom Up Testing
[Figure: module hierarchy with top module A and lower modules B, F, G, C, D, E]
- Drivers are replaced one at a time, "depth first".
- Worker modules are grouped into builds and integrated.
IMPACT OF OBJECT ORIENTATION ON TESTING
- Some types of errors could become less plausible.
- Some could become more plausible.
- Some new errors might appear.
- Impact of inheritance in testing
- Reusability of tests
TEST CASES
To test a system:
- Construct test input,
- describe how the output will look,
- perform the tests,
- compare the results with the expected output.
Myers’s objectives of testing:
- Testing is the process of executing a program with the intent of finding errors.
- A good test case is one with a high probability of detecting errors.
- A successful test case is one that detects an as-yet undiscovered error.
TEST PLAN
- A test plan is developed to detect and identify potential problems before the software is
  delivered to the users.
- It offers a road map for testing activities, whether usability, user satisfaction or quality
  assurance tests.
- Users might demand a test plan with the product.
- It should state the test objectives and how to meet them.
STEPS TO CREATE A TEST PLAN
- Objectives of the test
  - Create the objectives and describe how to achieve them.
- Development of the test case
  - Develop the input and output data, and test.
- Test analysis
  - Examine the test output and document it. If there are errors, debug and repeat until an
    error-free state is reached.
Guidelines for Developing Test Plans
- Include as much detail as possible about the tests.
- Include a schedule and a list of required resources.
- Document every type of test.
- Track the changes to the code.
- Keep the test plan and the product in sync and up to date.
- Keep configuration information and complete routine updates.
Myers’s Debugging Principles
- Bug locating principles
- Debugging principles
Bug locating principles:
- Think.
- If you reach a bottleneck, sleep on it.
- If the bottleneck remains, describe the problem to someone else.
- Use debugging tools.
- Experimentation should be done as a last resort.
Debugging principles:
- Where there is one bug, there is likely to be another.
- Fix the problem, not just the symptom of it.
- The probability of the solution being correct drops as the size of the program increases.
- Beware of the possibility that an error correction will create a new error.
Let’s consider the test cases of an ATM system:
- If the bank client inserts a card → password request
- Password incorrect → error message, card is ejected
- Transaction completed → show main menu
- If the cash is low → system notifies the bank
- Act of vandalism occurs → sound alarm
- Wrong PIN entered → 3 chances provided → still wrong → card is deactivated
- And so on.
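The PIN-related scenarios in the list above can be turned into executable test cases with a small model of the ATM. This is a sketch under assumptions: the class `Atm`, its method names, the returned strings and the PIN `"1234"` are all invented for the example, and only the "wrong PIN, three chances, card deactivated" path is modelled.

```python
class Atm:
    """Toy ATM session covering only the PIN scenarios from the list above."""

    MAX_ATTEMPTS = 3

    def __init__(self, correct_pin):
        self._correct_pin = correct_pin
        self.failed_attempts = 0
        self.card_active = True

    def enter_pin(self, pin):
        if not self.card_active:
            return "card deactivated"
        if pin == self._correct_pin:
            self.failed_attempts = 0
            return "main menu"
        self.failed_attempts += 1
        if self.failed_attempts >= self.MAX_ATTEMPTS:
            self.card_active = False
            return "card deactivated"
        return "error message"


def run_scenario(pins):
    """Feed a sequence of PIN entries to a fresh ATM and collect its responses."""
    atm = Atm(correct_pin="1234")
    return [atm.enter_pin(p) for p in pins]


one_mistake = run_scenario(["0000", "1234"])
three_mistakes = run_scenario(["0000", "1111", "2222", "1234"])
print(one_mistake)
print(three_mistakes)
```

Each scenario pairs a constructed input sequence with the output the tester expects, which mirrors the "construct input, describe output, perform, compare" steps of a test case.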
- This is an iterative process.
- At every iteration a new issue is exposed.
- The positive aspect is that it gives rise to new questions that could refine the system closer
  to perfection.
THANK YOU
Madhura Raju
11/11/2010
A Report
On
SOFTWARE USABILITY
A NECESSITY NOT AN OPTION
Submitted by
Madhura
SOFTWARE USABILITY
A NECESSITY NOT AN OPTION
Usability in general refers to how well something can provide good service to its users. Combined
with software, it is a very important aspect of computer science. “Software usability” is a measure
of how easy to use a software product or a user interface is. It relates directly to customer
satisfaction, without which the software product serves no purpose. It is not restricted to software
products; it also pertains to websites and to software design in general.
Need for Software Usability:
To keep customers happy and comfortable, to increase employees’ productivity and to improve the way
a company operates, software usability should be of a very good standard. Otherwise, customers will
not use or recommend the software because of the difficulties they encounter; employees will be less
efficient because the content on the intranet is not easily accessible; and the company can incur
losses due to the ineffective operation of its websites.
Assessing Usability:
The first question to ask when assessing the usability of software is: how easy is it to use? Once
this question is answered, various other attributes should be checked. The software should be easily
understood and well equipped to guide the user through the lifecycle. The tasks that the user
requests should be completed quickly, and the software or website should be quick to learn. If users
take too long, the product has to be restructured into a simpler application.
The target users should be identified well beforehand, and the product should be designed
accordingly. Rapid prototyping and feedback mechanisms can be used to ascertain the ease of use of
the product. The user’s handling of the product or website is observed closely. The user’s level of
satisfaction, the frequency of errors made while working with the product, and the user’s efficiency
and consistency in carrying out operations without forgetting them are the factors that indicate the
usability of the software.
The whole procedure of assessing the usability of an end product or a website can be carried out
either by third-party consultants or by internal groups, such as focus groups. Various consultancy
companies work on usability. The prototype should be carefully tested and approved by a sample of
customers. If the feedback about usability is bad, it should be handled by an expert group. Focus
groups are good, but testing directly with users is the best way to assess usability. The number of
steps users take to accomplish their tasks without encountering errors, and their use of online help
when they come across a problem, are observed closely.
Thus, Software Usability is a very important requisite for Customer
Satisfaction and Increased User Efficiency.