Self-Adaptive Based Natural Language Interface for Disambiguation of Semantic Search
Nurfadhlina Mohd Sharef, nurfadhlina@upm.edu.my
Mohammad Yasser Shafazand, 79.zand@gmail.com
Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
"Big Data" refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze (McKinsey).
“Linked Data” stands for semantically well structured, interconnected, syntactically interoperable datasets that are distributed among several repositories either inside or outside organisations (http://www.semantic-web.at/big-data-linked-data).
Utilizing Linked Data and Big Data for organisational and enterprise purposes will be one of the next big challenges in the evolution of the web.
Big Data takes account of the fact that new techniques and technologies are needed for the sustainable and socially balanced exploitation of huge data pools. The Linked Data paradigm is one approach to cope with Big Data, as it advances the hypertext principle from a web of documents to a web of rich data.
Semantic Web: a webby way to link data
Open Data meets the Semantic Web: Linked Open Data
http://www.semantic-web-journal.net/system/files/swj488.pdf
One of the key challenges in making use of Big Data lies in finding ways of dealing with heterogeneity, diversity, and complexity of the data, while its volume and velocity forbid solutions available for smaller datasets as based, e.g., on manual curation or manual integration of data. Semantic Web Technologies are meant to deal with these issues, and indeed since the advent of Linked Data a few years ago, they have become central to mainstream Semantic Web research and development.
We can easily understand Linked Data as being a part of the greater Big Data landscape, as many of the challenges are the same. The linking component of Linked Data, however, puts an additional focus on the integration and conflation of data across multiple sources.
BIG DATA
Volume, Velocity, Variety, Value and Veracity
Supercomputing
Internet of Things
Semantic Web
Social Science
Smart Data
Smart data makes sense out of Big Data (http://amitsheth.blogspot.com/2013/06/transforming-big-data-into-smart-data.html).
It provides value by harnessing the challenges posed by the volume, velocity, variety and veracity of Big Data, in turn providing actionable information and improving decision making.
It uses background knowledge, experiences, and advanced and contextualized reasoning, and is often highly personalized.
It is focused on the actionable value in the data creation, processing and consumption phases for improving the human experience.
5 Steps to Turn Big Data into Smart Data
http://tdwi.org/Articles/2014/07/15/Turning-Big-Data-into-Smart-Data-2.aspx?Page=1
1. Add meaning
2. Add context
3. Embrace Graphs
4. Iterate
5. Adopt standards
Natural Language Query and Generated SPARQL

Query: What is the lowest point in kansas?
SPARQL:
    SELECT ?c0
    WHERE {
      ?c0 ?p0 ?i0 .
      ?c0 a geo:LoPoint .
      filter (?i0 = geo:kansas) .
      filter (?p0 = geo:isLowestPointOf) .
    }

Query: What is the area of idaho?
SPARQL:
    SELECT ?i0
    WHERE {
      ?c0 ?p0 ?i0 .
      filter (?c0 = geo:idaho) .
      filter (?p0 = geo:stateArea) .
    }

Query: what states border oklahoma?
SPARQL:
    SELECT ?i0
    WHERE {
      ?c0 ?p0 ?i0 .
      ?i0 a geo:State .
      filter (?c0 = geo:oklahoma) .
      filter (?p0 = geo:borders) .
    }

Query: what is the population of oregon?
SPARQL:
    SELECT ?i0
    WHERE {
      ?c0 ?p0 ?i0 .
      filter (?c0 = geo:oregon) .
      filter (?p0 = geo:statePopulation) .
    }
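
The generated queries above all share one shape: a single triple pattern, an optional class constraint, and equality filters on the variables. As a rough illustration only (not the authors' generator), the sketch below assembles such a query from those slots. The function name build_sparql and its parameters are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


def build_sparql(select_var: str,
                 filters: Dict[str, str],
                 type_constraint: Optional[Tuple[str, str]] = None) -> str:
    """Assemble a query in the shape of the examples above.

    select_var      -- variable to return, e.g. "?i0"
    filters         -- maps a variable to the geo: term it must equal
    type_constraint -- optional (variable, class) pair, e.g. ("?i0", "geo:State")
    """
    lines = ["?c0 ?p0 ?i0 ."]              # the generic triple pattern
    if type_constraint is not None:
        var, cls = type_constraint
        lines.append(f"{var} a {cls} .")   # restrict a variable to a class
    for var, term in filters.items():
        lines.append(f"filter ({var} = {term}) .")
    body = "\n".join("  " + line for line in lines)
    return f"SELECT {select_var}\nWHERE {{\n{body}\n}}"


# Hypothetical usage: rebuild the "what states border oklahoma?" query.
query = build_sparql("?i0",
                     {"?c0": "geo:oklahoma", "?p0": "geo:borders"},
                     type_constraint=("?i0", "geo:State"))
print(query)
```

Keeping the slots separate from the query text means a different annotation of the same question only changes the arguments, not the template.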
Ambiguities in Querying Big Data
Ambiguity arises:
when there is more than one possible concept annotation for a word in the NL input
when a word in the NL input cannot be matched with any KB concept
when there is more than one possible SPARQL pattern while constructing the SPARQL query
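
The first two kinds of ambiguity listed above can be detected with a simple lookup. The sketch below is a minimal illustration under assumptions: the LEXICON table is a made-up stand-in for the real dictionary, and geo:HiPoint is an invented concept added only to force a two-candidate case. The third kind (several possible SPARQL patterns) is not covered here.

```python
from typing import Dict, List

# Hypothetical lexicon: surface word -> candidate KB concepts (illustrative only).
LEXICON: Dict[str, List[str]] = {
    "kansas": ["geo:kansas"],
    "border": ["geo:borders"],
    "area": ["geo:stateArea"],
    "point": ["geo:LoPoint", "geo:HiPoint"],  # two candidates -> ambiguous
}


def classify_tokens(tokens: List[str]) -> Dict[str, List[str]]:
    """Sort tokens into unambiguous, ambiguous, and unmatched groups."""
    report: Dict[str, List[str]] = {
        "unambiguous": [], "ambiguous": [], "unmatched": [],
    }
    for token in tokens:
        candidates = LEXICON.get(token, [])
        if len(candidates) == 1:
            report["unambiguous"].append(token)
        elif len(candidates) > 1:
            report["ambiguous"].append(token)   # more than one annotation
        else:
            report["unmatched"].append(token)   # no KB concept found
    return report


report = classify_tokens(["point", "kansas", "elevation"])
print(report)
```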
Self Adaptive Model for Semantic Data Search in Big Data
Input: NL query
Output: Answer
Process:
1. Load the ontology and build a matrix of the object properties, classes and instances and their connections.
2. Let T be the tokenized and stemmed NL query.
3. For each t ∈ T, let A be the set of annotations based on relevant concepts.
4. For each a ∈ A:
   a. Create and add possible triplets, filters and options statements using the dictionary and reasoner (using bottom-up reasoning rules).
   b. Create new SPARQL syntax using the output of step 4(a).
   c. Run the SPARQL and send the statements and results to the reasoner.
5. Return the last created SPARQL syntax that has results.
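
The process above can be sketched in a few lines of Python. This is a heavily simplified illustration under stated assumptions, not the SANLI implementation: the ontology matrix and the reasoner are omitted, the KB is a two-triple toy list, the stemmer is naive, run_query replaces a real SPARQL engine, and geo:borderLength is an invented property used only to create a competing annotation.

```python
import re
from itertools import product
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy stand-in for the loaded ontology (step 1).
KB: List[Triple] = [
    ("geo:oklahoma", "geo:borders", "geo:texas"),
    ("geo:oklahoma", "geo:borders", "geo:kansas"),
]
INSTANCES: Dict[str, List[str]] = {"oklahoma": ["geo:oklahoma"]}
PROPERTIES: Dict[str, List[str]] = {
    "border": ["geo:borderLength", "geo:borders"],  # ambiguous on purpose
}


def tokenize_and_stem(question: str) -> List[str]:
    """Step 2: lowercase, keep letter runs, strip a trailing 's' (naive stemmer)."""
    words = re.findall(r"[a-z]+", question.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words]


def run_query(subject: str, predicate: str) -> List[str]:
    """Step 4c stand-in: match (subject, predicate, ?i0) against the toy KB."""
    return [o for s, p, o in KB if s == subject and p == predicate]


def answer(question: str) -> Optional[Tuple[str, List[str]]]:
    tokens = tokenize_and_stem(question)
    # Step 3: collect every candidate annotation for the tokens.
    subjects = [c for t in tokens for c in INSTANCES.get(t, [])]
    predicates = [c for t in tokens for c in PROPERTIES.get(t, [])]
    best = None
    # Step 4: try each combination of annotations.
    for subject, predicate in product(subjects, predicates):
        sparql = (f"SELECT ?i0 WHERE {{ ?c0 ?p0 ?i0 . "
                  f"filter (?c0 = {subject}) . filter (?p0 = {predicate}) . }}")
        results = run_query(subject, predicate)
        if results:  # Step 5: remember the last query that returned results
            best = (sparql, results)
    return best


result = answer("What states border oklahoma?")
print(result)
```

In this toy run the geo:borderLength candidate returns nothing and is discarded, so the geo:borders reading is the one that survives; this mirrors the idea of letting query results settle the ambiguity.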
Results
SANLI was tested on two different datasets, namely Mooney’s Geography ontology and a Quran structure ontology.
SANLI is able to correctly answer all questions in the geography ontology where the questions have <s, p, o>, <o, p, s>, <p, o>, <o, p> and <o> patterns identified.
Rules for other patterns have not yet been implemented. For example, <o, p, o> patterns mostly produce a true/false result, as in “Does Texas border Oklahoma?”.
Conclusion
The Semantic Web can leverage sophisticated analytics with Big Data.
Big Data and Linked Data will be an integral part of the future web infrastructure, where massive amounts of data are available, connected and identifiable via Uniform Resource Identifiers.
More personalized applications to exploit smart data to its maximum potential.