+Semantic Interpretation of User Queries for Question Answering on Interlinked Data
Saeedeh Shekarpour
Supervisor: Prof. Dr. Sören Auer
EIS research group - Bonn University
7 January 2015
+Search engines can answer queries which match certain templates
+Search engines still lack the ability to answer more complex queries
+Evolution of Web
Web of Documents → Semantic Web → Web of Data
+RDF model
RDF is a standard for describing Web resources.
The RDF data model expresses statements about Web resources in the form of subject-predicate-object triples.
The statement “Jack knows Alice” is represented as:
(Jack) —knows→ (Alice)
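The triple model can be illustrated in code; a minimal sketch, with triples as plain tuples and illustrative `ex:`/`foaf:` prefixes:

```python
# The "Jack knows Alice" statement as a (subject, predicate, object) tuple.
# Prefixes ex: and foaf: are illustrative, not from the original slides.
triples = [("ex:Jack", "foaf:knows", "ex:Alice")]

def objects(triples, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects(triples, "ex:Jack", "foaf:knows"))  # ['ex:Alice']
```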
+The growth of Linked Open Data
May 2007: 12 datasets
August 2014: 570 datasets, more than 74 billion triples
+How to retrieve data from Linked Data?
Linked Data characteristics:
• Wide range of topical domains
• Variety in vocabularies
• Interlinked data
SPARQL queries:
• Knowledge about the ontology
• Proficiency in formulating formal queries
• Explicit and unambiguous semantics
Text queries (either keyword or natural language):
• Simple retrieval approach
• Implicit and ambiguous semantics
• Popular
+Comparison of search approaches
[Figure: search approaches positioned along two axes — data-semantic unaware vs. data-semantic aware, and keyword-based query vs. natural language query — comparing Information Retrieval systems, Question Answering systems, and our approach, SINA]
+Objective: transformation from textual query to formal query
Which television shows were created by Walt Disney?
SELECT * WHERE
{ ?v0 a dbo:TelevisionShow.
?v0 dbo:creator dbr:Walt_Disney. }
+Test bed datasets
One single dataset: DBpedia.
Three interlinked datasets from the life-science domain:
1. Drugbank: contains information about drugs, drug target (i.e. protein) information, interactions and enzymes.
2. Diseasome: contains information about diseases and genes associated with these diseases.
3. Sider: contains information about drugs and their side effects.
+The addressed challenges
Challenges
Query Segmentation
Resource Disambiguation
Query Expansion
Formal Query Construction
Data Fusion on Linked Data
+Query Segmentation
Definition: query segmentation is the process of identifying the right segments of data items that occur in the keyword queries.
Input query: What are the side effects of drugs used for Tuberculosis?
Sequence of keywords: (side, effect, drug, Tuberculosis)
Two segmentations:
side effect | drug | Tuberculosis
side effect drug | Tuberculosis
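Enumerating the candidate segmentations of a keyword sequence can be sketched as follows: each of the n−1 adjacent keyword pairs is either merged into one segment or split, giving 2^(n−1) candidates (the function name is illustrative, not from the paper):

```python
from itertools import product

def segmentations(keywords):
    """Enumerate all contiguous segmentations of a keyword sequence."""
    results = []
    # Each boolean decides whether to cut before the corresponding keyword.
    for cuts in product([False, True], repeat=len(keywords) - 1):
        segs, current = [], [keywords[0]]
        for kw, cut in zip(keywords[1:], cuts):
            if cut:
                segs.append(" ".join(current))
                current = [kw]
            else:
                current.append(kw)
        segs.append(" ".join(current))
        results.append(segs)
    return results

# The example from the slide: 2^3 = 8 candidate segmentations.
print(segmentations(["side", "effect", "drug", "Tuberculosis"]))
```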
+Resource Disambiguation
Definition: resource disambiguation is the process of recognizing the suitable resources in the underlying knowledge base.
Input query: What are the side effects of drugs used for Tuberculosis?
Ambiguous resources:
• diseasome:Tuberculosis
• sider:Tuberculosis
Input query: Who produced films starring Natalie Portman?
Ambiguous resources:
• dbpedia/ontology/film
• dbpedia/property/film
+Concurrent approach
Query Segmentation
Resource Disambiguation
+Modeling using hidden Markov model
Query Segmentation & Resource Disambiguation
[Figure: HMM with hidden states 1–9 plus an Unknown Entity state, a Start state, and the observations Keyword 1 through Keyword 4]
+Bootstrapping the model parameters
1. The emission probability is defined based on the similarity of the label of each state with a segment; this similarity is computed using string similarity and Jaccard similarity.
2. Semantic relatedness is the basis for the transition probability and the initial probability. Intuitively, it is based on two values: distance and connectivity degree. We transform these two values into hub and authority values using a weighted HITS algorithm.
3. The HITS algorithm is a link-analysis algorithm that was originally developed for ranking Web pages. It assigns a hub and an authority value to each Web page.
4. The initial probability and the transition probability are defined as a uniform distribution over the hub and authority values.
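The hub/authority computation in step 2 can be sketched as a generic weighted HITS iteration. This is not the paper's exact formulation — how distance and connectivity degree are encoded as edge weights is not shown here, and the graph below is illustrative:

```python
def weighted_hits(graph, iterations=50):
    """Weighted HITS on a directed graph given as {node: {neighbour: weight}}."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: weighted sum of hub scores of nodes pointing at n.
        auth = {n: sum(w * hub[u] for u, nbrs in graph.items()
                       for v, w in nbrs.items() if v == n) for n in nodes}
        norm = sum(auth.values()) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub: weighted sum of authority scores of nodes n points at.
        hub = {n: sum(w * auth[v] for v, w in graph.get(n, {}).items())
               for n in nodes}
        norm = sum(hub.values()) or 1.0
        hub = {n: h / norm for n, h in hub.items()}
    return hub, auth
```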
+Evaluation of bootstrapping
The accuracy of the bootstrapped transition probability using different distribution functions, i.e., Normal, Zipfian and uniform distributions.
+Output of the model after running the Viterbi algorithm
Sequence of keywords: (television show creat Walt Disney)
Ranked paths (probability, resources):
0.0023    dbo:TelevisionShow  dbo:creator  dbr:Walt_Disney
0.0014    dbo:TelevisionShow  dbo:creator  dbr:Category:Walt_Disney
0.000589  dbr:TelevisionShow  dbo:creator  dbr:Walt_Disney
0.000353  dbr:TelevisionShow  dbo:creator  dbr:Category:Walt_Disney
0.0000376 dbp:television  dbp:show  dbo:creator  dbr:Category:Walt_Disney
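The decoding step can be sketched with the classic Viterbi algorithm. Note that the slide shows several ranked paths, which would require a k-best variant; the sketch below finds only the single most probable path, and all probabilities in the example are toy values, not the bootstrapped parameters:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, state path) of the most probable hidden path."""
    # Initialise with the first observation.
    V = [{s: (start_p[s] * emit_p[s].get(observations[0], 0.0), [s])
          for s in states}]
    for obs in observations[1:]:
        layer = {}
        for s in states:
            # Best predecessor for state s at this step.
            prob, path = max(
                (V[-1][prev][0] * trans_p[prev].get(s, 0.0) *
                 emit_p[s].get(obs, 0.0), V[-1][prev][1] + [s])
                for prev in states)
            layer[s] = (prob, path)
        V.append(layer)
    return max(V[-1].values())
```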
+Query Expansion
Definition: query expansion is a way of reformulating the input query in order to overcome the vocabulary mismatch problem.
Input query: Wife of Barack Obama
Reformulated query: Spouse of Barack Obama
+
Analysis of linguistic features vs. semantic features
A method for automatic query expansion
+Linguistic features
WordNet is a popular data source for expansion.
Linguistic features extracted from WordNet are:
1. Synonyms: words having a similar meaning to the input keyword.
2. Hyponyms: words representing a specialization of the input keyword.
3. Hypernyms: words representing a generalization of the input keyword.
+Semantic features from Linked Data
1. SameAs: deriving resources using owl:sameAs.
2. SeeAlso: deriving resources using rdfs:seeAlso.
3. Equivalence class/property: deriving classes or properties using owl:equivalentClass and owl:equivalentProperty.
4. Super class/property: deriving all super classes/properties of the input resource by following the rdfs:subClassOf or rdfs:subPropertyOf property.
5. Sub class/property: deriving resources by following rdfs:subClassOf or rdfs:subPropertyOf property paths ending with the input resource.
6. Broader concepts: deriving concepts using the SKOS vocabulary properties skos:broader and skos:broadMatch.
7. Narrower concepts: deriving concepts using skos:narrower and skos:narrowMatch.
8. Related concepts: deriving concepts using skos:closeMatch, skos:mappingRelation and skos:exactMatch.
+Exemplary expansion graph of the word movie
[Figure: expansion graph connecting movie to film, motion picture show, home movie, production, video and telefilm]
+Objective of experiment
How effectively do linguistic and semantic features perform?
How well does a linear weighted combination of features perform?
+Benchmark creation
We created a benchmark extracted from QALD1 and QALD2.
The benchmark contains all keywords having the vocabulary mismatch problem and their corresponding matches.
+Accuracy results of the prediction function based on linguistic and semantic features
Features   | Weighting Mechanism              | Precision | Recall | F-score
Linguistic | SVM                              | 0.730     | 0.650  | 0.620
Semantic   | SVM                              | 0.680     | 0.630  | 0.600
Linguistic | Decision Tree / Information Gain | 0.588     | 0.579  | 0.568
Semantic   | Decision Tree / Information Gain | 0.755     | 0.684  | 0.661
+Statistics over the number of derived words and matches
Feature              | #derived words | #matches
synonym              | 503            | 23
hyponym              | 2703           | 10
hypernym             | 657            | 14
sameAs               | 2332           | 12
seeAlso              | 49             | 2
equivalence          | 2              | 0
super class/property | 267            | 4
sub class/property   | 2166           | 4
+Automatic query expansion
Pipeline: input query → data extraction and preparation (from an external data source) → heuristic method → reformulated query
+Expansion set for each segment
The expansion set contains:
• the original segment
• the lemmatized segment
• words derived from WordNet: synonyms, hyponyms, hypernyms
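Assembling such an expansion set can be sketched as follows. WordNet access is mocked here with a small hand-made dictionary (hypothetical entries); in practice a WordNet library would supply the synonyms, hyponyms and hypernyms:

```python
# Hypothetical stand-in for WordNet lookups.
WORDNET = {
    "wife": {"synonyms": ["spouse"], "hyponyms": ["first lady"],
             "hypernyms": ["woman"]},
}

def expansion_set(segment, lemma=None):
    """Original segment + lemma + WordNet-derived words."""
    lemma = lemma or segment
    words = {segment, lemma}
    entry = WORDNET.get(lemma, {})
    for feature in ("synonyms", "hyponyms", "hypernyms"):
        words.update(entry.get(feature, []))
    return words

print(sorted(expansion_set("wife")))
```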
+Reformulating the query using a hidden Markov model
Input query: wife of Barack Obama
[Figure: HMM over the segments "Barack", "Obama", "Barack Obama", "wife" and expansion words such as "spouse", "first lady", "woman", starting from a Start state]
+Triple-based co-occurrence
In a given triple t = (s, p, o), two words w1 and w2 are co-occurring if they appear in the labels (rdfs:label) of at least two resources.
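One way to read this definition in code — the interpretation that w1 and w2 must both be found, spread over at least two distinct resources of the triple, is an assumption, as is the sample label data:

```python
def cooccur(triple, labels, w1, w2):
    """Do w1 and w2 co-occur in the labels of the triple's resources?"""
    has_w1 = {r for r in triple if w1 in labels.get(r, "").lower().split()}
    has_w2 = {r for r in triple if w2 in labels.get(r, "").lower().split()}
    # Both words present, covering at least two distinct resources.
    return bool(has_w1) and bool(has_w2) and len(has_w1 | has_w2) >= 2

# Illustrative labels for the resources of one triple.
labels = {"dbo:spouse": "spouse", "dbr:Michelle_Obama": "Michelle Obama"}
t = ("dbr:Barack_Obama", "dbo:spouse", "dbr:Michelle_Obama")
print(cooccur(t, labels, "spouse", "michelle"))  # True
```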
+Goals of evaluation
How effective is our method at correctly reformulating queries which have the vocabulary mismatch problem?
How robust is the method for queries which do not have the vocabulary mismatch problem?
Sample of our benchmark:
Query                  | Mismatch word | Match word | #derived words
Movies with Tom Cruise | movie         | film       | 77
Altitude of Everest    | altitude      | elevation  | 16
Soccer clubs in Spain  | -             | -          | 19
Employees of Google    | -             | -          | 10
+Rank (R) and Cumulative rank (CR) for the test queries
+Formal Query Construction
Definition: once the resources are detected, a connected subgraph of the knowledge base graph, called the query graph, has to be determined which fully covers the set of mapped resources.
Disambiguated resources: sider:sideEffect, diseasome:possibleDrug, diseasome:1154
SPARQL query:
SELECT ?v3 WHERE {
  diseasome:1154 diseasome:possibleDrug ?v1 .
  ?v1 owl:sameAs ?v2 .
  ?v2 sider:sideEffect ?v3 . }
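For chain-shaped queries like this one, the construction can be sketched by connecting the ordered resources with fresh variables. This is a simplification, not the paper's actual algorithm: the resource ordering is assumed to be given, and the helper name is illustrative:

```python
def chain_to_sparql(start, properties, projection="*"):
    """Build a SPARQL query string from a chain of resources,
    inserting fresh variables ?v1, ?v2, ... between them."""
    lines, subject = [], start
    for i, prop in enumerate(properties, 1):
        lines.append(f"{subject} {prop} ?v{i} .")
        subject = f"?v{i}"
    return f"SELECT {projection} WHERE {{\n  " + "\n  ".join(lines) + "\n}"

# Reproduces the shape of the query above.
print(chain_to_sparql("diseasome:1154",
                      ["diseasome:possibleDrug", "owl:sameAs",
                       "sider:sideEffect"]))
```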
+Data Fusion on Linked Data
The answer to a question may be spread among different datasets employing heterogeneous schemas.
Constructing a federated query requires exploiting links between the different datasets on the schema and instance levels.
+Two different approaches
Template-based query construction
Forward chaining based query construction
Federated Query Construction
+Forward chaining based query construction
Query: What are the side effects of drugs used for Tuberculosis?
1. Set of resources:
• diseasome:1154 (type instance)
• diseasome:possibleDrug (type property)
• sider:sideEffect (type property)
2. Incomplete query graphs, one template per property:
Template 1: diseasome:1154 —possibleDrug→ ?v0
Template 2: ?v1 —sideEffect→ ?v2
3. Query graph: the templates are joined into one connected graph.
+Evaluation
Goals of the experiment:
1. Performance of the disambiguation method, measured using Mean Reciprocal Rank (MRR).
2. Performance of the forward chaining query construction method, measured using precision and recall.
Benchmarks:
1. 25 queries on the 3 interlinked life-science datasets.
2. QALD1 and QALD3 benchmarks over DBpedia.
3. QALD2 was used for bootstrapping.
+Runtime
Parallelization over three components:
1. Segment validation
2. Resource retrieval
3. Query construction
+SINA architecture
[Figure: client-server architecture. A client sends keywords and receives the query result. Server-side components: Query Preprocessing (Stanford CoreNLP), Query Expansion, Segment Validation, Resource Retrieval, Disambiguation, Query Construction and Representation, exchanging intermediate artifacts (reformulated query, valid segments, mapped resources, tuples of resources, SPARQL queries) and accessing the underlying interlinked knowledge bases via the OWL API and an HTTP client.]
+Demo
+Conclusion
We researched and addressed a number of challenges.
The results of the evaluation confirm the feasibility and high accuracy of the approach, for instance:
1. Query segmentation and resource disambiguation achieved an MRR between 86% and 96%.
2. Query construction achieved a precision of 32% on the DBpedia QALD3 benchmark and 95% on the life-science datasets.
We learnt that:
1. In Linked Data, the structure as well as the topology of the data can be leveraged for inference and heuristics.
2. Using structure and topology, without any deep text analysis, Linked Data can enhance the power of question answering.
+Future work
This was the first step in a long agenda.
We plan to:
1. use supervised learning to enhance the parameters of the model.
2. extend our benchmark to enable further evaluations.
3. employ more interlinked datasets to investigate the challenges of scalability.
4. extend different aspects of each approach.
We are going to target new challenges, e.g. query cleaning.
We will continue this work with students joining us through the Marie Curie ITN network.
+Questions?