Self adaptive based natural language interface for disambiguation of

Self-Adaptive Based Natural Language Interface for Disambiguation of Semantic Search

Nur Fadhlina Mohd Sharef (nurfadhlina@upm.edu.my)

Mohammad Yasser Shafazand (79.zand@gmail.com)

Faculty of Computer Science and Information Technology, Universiti Putra Malaysia, Serdang, Selangor, Malaysia

"Big Data" refers to data sets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze (McKinsey).

“Linked Data” stands for semantically well structured, interconnected, syntactically interoperable datasets that are distributed among several repositories either inside or outside organisations.

http://www.semantic-web.at/big-data-linked-data

Utilizing Linked Data and Big Data for organisational and enterprise purposes will be one of the next big challenges in the evolution of the web.

Big Data takes account of the fact that new techniques and technologies are needed for the sustainable and socially balanced exploitation of huge data pools. The Linked Data paradigm is one approach to cope with Big Data, as it advances the hypertext principle from a web of documents to a web of rich data.

Semantic Web: a webby way to link data

Open Data meets the Semantic Web: Linked Open Data

http://www.semantic-web-journal.net/system/files/swj488.pdf


One of the key challenges in making use of Big Data lies in finding ways of dealing with heterogeneity, diversity, and complexity of the data, while its volume and velocity forbid solutions available for smaller datasets as based, e.g., on manual curation or manual integration of data. Semantic Web Technologies are meant to deal with these issues, and indeed since the advent of Linked Data a few years ago, they have become central to mainstream Semantic Web research and development.

We can easily understand Linked Data as being a part of the greater Big Data landscape, as many of the challenges are the same. The linking component of Linked Data, however, puts an additional focus on the integration and conflation of data across multiple sources.

BIG DATA

Volume, Velocity, Variety, Value and Veracity

Supercomputing

Internet of Things

Semantic Web

Social Science

Smart Data

Smart data makes sense out of Big Data.

It provides value from harnessing the challenges posed by the volume, velocity, variety and veracity of big data, in turn providing actionable information and improving decision making.

It uses background knowledge, experiences, and advanced and contextualized reasoning, and is often highly personalized.

It is focused on the actionable value in the data creation, processing and consumption phases for improving the human experience.

http://amitsheth.blogspot.com/2013/06/transforming-big-data-into-smart-data.html

5 Steps to Turn Big Data into Smart Data

1. Add meaning

2. Add context

3. Embrace Graphs

4. Iterate

5. Adopt standards

http://tdwi.org/Articles/2014/07/15/Turning-Big-Data-into-Smart-Data-2.aspx?Page=1

Natural Language Query and Generated SPARQL

Natural language query: What is the lowest point in kansas?

Generated SPARQL:

    SELECT ?c0
    WHERE {
        ?c0 ?p0 ?i0 .
        ?c0 a geo:LoPoint .
        filter (?i0 = geo:kansas) .
        filter (?p0 = geo:isLowestPointOf) .
    }

Natural language query: What is the area of idaho?

Generated SPARQL:

    SELECT ?i0
    WHERE {
        ?c0 ?p0 ?i0 .
        filter (?c0 = geo:idaho) .
        filter (?p0 = geo:stateArea) .
    }

Natural language query: what states border oklahoma?

Generated SPARQL:

    SELECT ?i0
    WHERE {
        ?c0 ?p0 ?i0 .
        ?i0 a geo:State .
        filter (?c0 = geo:oklahoma) .
        filter (?p0 = geo:borders) .
    }

Natural language query: what is the population of oregon?

Generated SPARQL:

    SELECT ?i0
    WHERE {
        ?c0 ?p0 ?i0 .
        filter (?c0 = geo:oregon) .
        filter (?p0 = geo:statePopulation) .
    }
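The queries above all share one shape: a generic triple pattern, an optional class constraint, and filters that pin individual variables to ontology terms. A minimal Python sketch of how such a query string might be assembled is given here. The helper name `build_query` and its parameters are assumptions made for illustration and are not taken from the presented system.

```python
def build_query(select_var, subject=None, predicate=None, obj=None, var_class=None):
    """Assemble a SELECT query in the style of the generated examples.

    Hypothetical helper: every argument is a prefixed name such as
    "geo:idaho", and any argument left as None stays an unbound variable.
    """
    # Every example starts from the same generic triple pattern.
    lines = ["?c0 ?p0 ?i0 ."]
    # Optional class constraint on the selected variable (e.g. geo:State).
    if var_class is not None:
        lines.append(f"?{select_var} a {var_class} .")
    # One filter per term that the NL query pinned down.
    for var, value in (("c0", subject), ("p0", predicate), ("i0", obj)):
        if value is not None:
            lines.append(f"filter (?{var} = {value}) .")
    body = "\n    ".join(lines)
    return f"SELECT ?{select_var}\nWHERE {{\n    {body}\n}}"


# "what states border oklahoma?"
print(build_query("i0", subject="geo:oklahoma", predicate="geo:borders",
                  var_class="geo:State"))
```

The filters are emitted in subject, predicate, object order, which can differ from the order in the slides. Filter order does not change the meaning of the query.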

Ambiguities in Querying Big Data

when there is more than one possible concept annotation for a word in the NL input

when a word in the NL input cannot be matched with any KB concept

when there is more than one possible SPARQL pattern while constructing the SPARQL query
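The first two kinds of ambiguity can be detected when words are annotated against the knowledge base. The Python sketch below illustrates the idea under stated assumptions: `LEXICON` is a hand-made toy dictionary, `geo:HiPoint` is a made-up concept added only to create an ambiguous entry, and `classify_tokens` is a hypothetical helper. The third kind (several possible SPARQL patterns) is not covered here.

```python
# Toy word-to-concept dictionary (assumption for illustration only).
LEXICON = {
    "kansas": ["geo:kansas"],
    "border": ["geo:borders"],
    "point": ["geo:LoPoint", "geo:HiPoint"],  # geo:HiPoint is hypothetical
}


def classify_tokens(tokens, lexicon):
    """Sort tokens into resolved, ambiguous and unmatched groups."""
    report = {"resolved": {}, "ambiguous": {}, "unmatched": []}
    for token in tokens:
        concepts = lexicon.get(token, [])
        if not concepts:
            # No KB concept matches this word.
            report["unmatched"].append(token)
        elif len(concepts) == 1:
            report["resolved"][token] = concepts[0]
        else:
            # More than one possible concept annotation.
            report["ambiguous"][token] = list(concepts)
    return report


report = classify_tokens(["lowest", "point", "kansas"], LEXICON)
print(report)
```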

Self Adaptive Model for Semantic Data Search in Big Data

Input: NL query

Output: Answer

Process:

1. Load the ontology and build a matrix of the object properties, classes and instances and their connections

2. Let T be the tokenized and stemmed NL query

3. For each t ∈ T, let A be the set of annotations based on relevant concepts

4. For each a ∈ A

a. Create and add possible triplets, filters and options statements using the dictionary and the reasoner (using bottom-up reasoning rules)

b. Create a new SPARQL query using the statements from step 4(a)

c. Run the SPARQL query and send the statements and results to the reasoner

5. Return the last created SPARQL query that has results
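The process above can be sketched in Python as follows. This is only an illustration under several assumptions, not the presented implementation: the knowledge base is a three-triple in-memory list rather than a loaded ontology, `area_value_placeholder` stands in for a real literal, `geo:cityArea` is a made-up property used to create an ambiguity, stemming is reduced to stripping a trailing "s", the reasoner is omitted, only the pattern with one subject and one property is handled, and the query is answered by an index lookup instead of a SPARQL engine.

```python
from itertools import product

# Toy knowledge base (assumption); a real system would load an ontology.
TRIPLES = [
    ("geo:oklahoma", "geo:borders", "geo:texas"),
    ("geo:oklahoma", "geo:borders", "geo:kansas"),
    ("geo:idaho", "geo:stateArea", "area_value_placeholder"),
]
# geo:cityArea is hypothetical; it makes the word "area" ambiguous.
PROPERTIES = {"geo:borders", "geo:stateArea", "geo:cityArea"}
LEXICON = {
    "border": ["geo:borders"],
    "oklahoma": ["geo:oklahoma"],
    "idaho": ["geo:idaho"],
    "area": ["geo:cityArea", "geo:stateArea"],
}


def build_index(triples):
    """Step 1: map (subject, property) pairs to their connected objects."""
    index = {}
    for s, p, o in triples:
        index.setdefault((s, p), []).append(o)
    return index


def annotate(nl_query, lexicon):
    """Steps 2-3: tokenize, crudely stem, and collect candidate concepts."""
    annotations = []
    for word in nl_query.lower().rstrip("?").split():
        stem = word if word in lexicon else word.rstrip("s")
        if stem in lexicon:
            annotations.append(lexicon[stem])
    return annotations


def answer(nl_query, triples=TRIPLES, lexicon=LEXICON):
    """Steps 4-5: try each annotation choice, keep the last with results."""
    index = build_index(triples)
    best = None
    for candidate in product(*annotate(nl_query, lexicon)):
        subjects = [c for c in candidate if c not in PROPERTIES]
        props = [c for c in candidate if c in PROPERTIES]
        if len(subjects) != 1 or len(props) != 1:
            continue  # this sketch handles one subject plus one property
        sparql = (f"SELECT ?i0 WHERE {{ ?c0 ?p0 ?i0 . "
                  f"filter (?c0 = {subjects[0]}) . "
                  f"filter (?p0 = {props[0]}) . }}")
        # Stand-in for running the query against a SPARQL endpoint.
        results = index.get((subjects[0], props[0]), [])
        if results:
            best = (sparql, results)
    return best


print(answer("What is the area of idaho?"))
```

In this toy run, the word "area" has two candidate properties, and only the choice that returns results is kept. Returning `None` signals that no candidate query produced an answer.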

Results

SANLI is tested on two different datasets, namely Mooney’s Geography ontology and a Quran structure ontology.

SANLI is able to correctly answer all questions in the geography ontology where the questions have the <s, p, o>, <o, p, s>, <p, o>, <o, p> and <o> patterns identified.

Rules for other patterns have not yet been implemented. For example, <o, p, o> patterns mostly result in a true/false answer, as in “Does Texas border Oklahoma?”.

Conclusion

The Semantic Web can leverage sophisticated analytics with big data.

Big Data and Linked Data will be an integral part of the future web infrastructure, where massive amounts of data are available, connected and identifiable via Uniform Resource Identifiers.

More personalized applications are needed to exploit smart data to its maximum potential.

