Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge
Schema-Agnostic Queries
(SAQ-2015)
Semantic Web Challenge
André Freitas, Christina Unger
{Schema-agnostic | Schema-free | Vocabulary-independent}
Motivation
Big Data
Vision: More complete data-based picture of the world for
systems and users.
Shift in the Database Landscape
Very large and dynamic “schemas”.
Before 2000: 10s-100s of attributes
Circa 2015: 1,000s-1,000,000s of attributes
Brodie & Liu, 2010
Semantic Heterogeneity
Decentralized content generation.
Multiple perspectives (conceptualizations) of reality.
Ambiguity, vagueness, inconsistency.
Size, Complexity, Dynamicity and Decentralisation (SCoDD)
Brodie & Liu, 2010; Helland, 2012
Databases for a Complex World
How do you query data in this scenario?
Schema-agnosticism
Abstraction Layer
SELECT {
Bill Clinton daughter ?x .
}
Bill Clinton --child--> Chelsea Clinton
Schema-agnostic queries
Query approaches over structured databases
which allow users to satisfy complex information
needs without understanding the
representation (schema) of the database.
Schema-agnostic queries
Schema-free queries
Vocabulary-independent queries
...
First-level independence
(Relational Model)
“… it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and representation and organization of data on the other”
Codd, 1970
Second-level independence
(Schema-agnosticism)
Vocabulary Problem for Databases
BillClinton hasDaughter ?x marriedTo ?y .
Semantic Gap
Schema-agnostic query mechanisms
Abstraction level differences
Lexical variation
Structural (compositional) differences
The SAQ 2015 Test Collection
The Goal
To support easy querying over complex
databases with large schemata, relieving
users from the need to understand the
formal representation of the data.
Test Collection
Training set: 30 schema-agnostic queries
Test set: 103 schema-agnostic queries
DBpedia 2014 and associated YAGO classes
Task
Return the correct answers for the schema-agnostic query.
Evaluation measures: precision, recall, F1-score.
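The three measures can be sketched for a single query as below. This is a minimal illustration, not the challenge's official scorer: the names `score_query` and `macro_average`, the set-based comparison of answers, scoring an unanswered query as zero, and macro-averaging over queries are all assumptions that the slides do not confirm.

```python
def score_query(returned, gold):
    """Precision, recall and F1 for one query, comparing answer sets."""
    returned, gold = set(returned), set(gold)
    correct = len(returned & gold)
    # Assumption: an empty answer set scores 0 rather than being skipped.
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


def macro_average(scores):
    """Average each measure over all queries (assumed aggregation)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))


# Hypothetical answers for two queries (illustrative identifiers only).
scores = [
    score_query({"res:A", "res:B"}, {"res:A"}),  # one spurious answer
    score_query(set(), {"res:C"}),                # unanswered query
]
print(macro_average(scores))
```

Under macro-averaging, the average F1 is the mean of per-query F1 values, so it need not equal the harmonic mean of the average precision and average recall.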
Query type I: Schema-agnostic SPARQL query
Keep the query language syntax (SPARQL).
Allow vocabulary and structural variations.
SELECT ?y {
BillClinton hasDaughter ?x .
?x marriedTo ?y .
}
Query type I: Schema-agnostic SPARQL query
Schema-agnostic query:
SELECT ?y {
BillClinton hasDaughter ?x .
?x marriedTo ?y .
}
Resolved query:
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
SELECT ?y {
:Bill_Clinton dbpedia:child ?x .
?x dbpedia2:spouse ?y .
}
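The step from the schema-agnostic query to the resolved query can be illustrated with a simple lookup-table rewrite. This is only a sketch: the table `MAPPINGS` is hand-written from the Bill Clinton example, the helper names `resolve_term` and `resolve_query` are hypothetical, and an actual schema-agnostic system has to discover such mappings itself rather than read them from a dictionary.

```python
# Hand-written vocabulary mappings taken from the example query.
# A real schema-agnostic system would have to find these on its own.
MAPPINGS = {
    "BillClinton": ":Bill_Clinton",
    "hasDaughter": "dbpedia:child",
    "marriedTo": "dbpedia2:spouse",
}

PREFIXES = (
    "PREFIX : <http://dbpedia.org/resource/>\n"
    "PREFIX dbpedia2: <http://dbpedia.org/property/>\n"
    "PREFIX dbpedia: <http://dbpedia.org/ontology/>\n"
)


def resolve_term(term):
    """Map a user term to a dataset term; variables pass through unchanged."""
    if term.startswith("?"):
        return term
    return MAPPINGS[term]  # a KeyError signals an unmapped term


def resolve_query(select_var, triples):
    """Rewrite schema-agnostic triple patterns into a SPARQL query string."""
    body = "\n".join(
        "  " + " ".join(resolve_term(t) for t in triple) + " ."
        for triple in triples
    )
    return f"{PREFIXES}SELECT {select_var} {{\n{body}\n}}"


agnostic = [("BillClinton", "hasDaughter", "?x"), ("?x", "marriedTo", "?y")]
print(resolve_query("?y", agnostic))
```

A term-by-term table like this covers only one-to-one vocabulary mappings; it cannot express structural changes, such as a property in the user query becoming an aggregate over a different triple pattern.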
Expected Result
Query type II: Schema-agnostic keyword query
Keyword query:
"Bill Clinton daughter married to"
Resolved query:
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
SELECT ?y {
:Bill_Clinton dbpedia:child ?x .
?x dbpedia2:spouse ?y .
}
Classification of Mappings
BillClinton (i s) -> Bill_Clinton (i s) | string_similar
daughterOf (p) -> child (p) | substring, similar
marriedTo (p) -> spouse (p) | substring, related
Data model category and position: (i s) = (instance, subject)
Mappings can be Challenging
SELECT DISTINCT ?num
WHERE {
Soyuz_program numberOfMissions ?num .
}
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT (COUNT(DISTINCT ?uri) AS ?num)
WHERE {
?uri dbp:programme res:Soyuz_programme .
}
Example Mappings
languageOf (p) -> spokenIn (p) | related
writtenBy (p) -> author (p) | substring, related
FemaleFirstName (c o) -> gender (p) | substring, related
state (p) -> locatedInArea (p) | related
extinct (p) -> conservationStatus (p) | related
constructionDate (p) -> beginningDate (p) | substring, related
calledAfter (p) -> shipNamesake (p) | related
in (p) -> location (p) | functional_content
in (p) -> isPartOf (p) | functional_content
extinct (p) -> 'EX' (v o) | substring, abbreviation
startAt (p) -> sourceCountry (p) | substring, synonym
U.S._State (c o) -> StatesOfTheUnitedStates (c o) | string_similar
wifeOf (p) -> spouse (p) | substring, similar
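The mapping lines above follow a regular pattern: source term, its data model category, target term, its category, and one or more mapping types. A minimal parser for that notation might look like the following sketch. The `Mapping` type, the `parse_mapping` and `data_model_pattern` helpers, and the regular expression are illustrative assumptions, not part of the test collection's tooling.

```python
import re
from typing import NamedTuple


class Mapping(NamedTuple):
    source: str
    source_category: str
    target: str
    target_category: str
    types: tuple


# Pattern: term (category) -> term (category) | type[, type ...]
LINE = re.compile(r"^(\S+) \(([^)]*)\) -> (\S+) \(([^)]*)\) \| (.+)$")


def parse_mapping(line):
    """Parse one categorized mapping line into a Mapping record."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a mapping line: {line!r}")
    source, source_cat, target, target_cat, types = match.groups()
    return Mapping(source, source_cat, target, target_cat,
                   tuple(t.strip() for t in types.split(",")))


def data_model_pattern(mapping):
    """Render the category pair in the '(p)->(p)' style used in the slides."""
    return f"({mapping.source_category})->({mapping.target_category})"


m = parse_mapping("writtenBy (p) -> author (p) | substring, related")
print(m.types)                # ('substring', 'related')
print(data_model_pattern(m))  # (p)->(p)
```

Parsed records of this kind could then be counted to produce distributions over mapping types or data model patterns.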
Test Collection Item
Schema-agnostic keyword query
Schema-agnostic SPARQL query
Resolved SPARQL query
Categorized mappings
Answer
Test Collection Construction Methodology
Derived from Question Answering over Linked Data (QALD; Unger et al.)
by converting natural language queries to schema-agnostic queries.
Test Collection Analysis
What is expressed in the test collection?
Classifying the distribution of features in the mappings.
Mapping Type Distribution (fraction of mappings)
exact 0.3645
related 0.1464
substring 0.1402
substring, related 0.0779
null 0.0623
string similar 0.0498
substring, stem 0.0436
similar 0.0218
synonym 0.0156
stem 0.0125
functional 0.0125
substring, synonym 0.0062
substring, similar 0.0062
predicate chain 0.0062
functional content mapping 0.0124
substring, related 0.0031
substring, string similar 0.0031
substring, abbreviation 0.0031
string similar, related 0.0031
string similar, acronym 0.0031
acronym, string similar 0.0031
acronym 0.0031
Example: state -> locatedInArea | related
Data Model Mappings Distribution (fraction of mappings)
(p)->(p) 0.3863
(c o)->(c o) 0.1495
(i s)->(i s) 0.0966
(i o)->(i o) 0.0779
(i o)->(i s) 0.0592
(c o)->(p) 0.0436
(p)->(n) 0.0312
(n)->(p) 0.0218
(c o)->(i o) 0.0187
(i s)->(i o) 0.0156
(i o)->(c o) 0.0125
(v)->(v) 0.0093
(p)->(c o) 0.0093
(c o)->(v o) 0.0093
(op)->(p) 0.0093
(v o)->(v o) 0.0062
(c o)->(n) 0.0062
(i s)->(p) 0.0062
(p)->(v o) 0.0031
(i o)->(v) 0.0031
(i s)->(c o) 0.0031
(n)->(v) 0.0031
(p)->(i s) 0.0031
(p)->(op) 0.0031
(c o)->(i s) 0.0031
(i o)->(p) 0.0031
(i o)->(v o) 0.0031
(i s)->(v o) 0.0031
Example: state -> locatedInArea | (p)->(p)
Compositional Mapping (# of words, fraction of mappings)
1->1 0.5358
2->1 0.1308
2->2 0.0872
1->2 0.0810
0->1 0.0249
2->0 0.0218
3->2 0.0156
3->3 0.0156
1->5 0.0156
1->3 0.0125
1->0 0.0125
3->1 0.0093
2->3 0.0093
5->1 0.0062
4->1 0.0062
3->0 0.0031
4->3 0.0031
4->2 0.0031
2->7 0.0031
3->7 0.0031
Example: state -> locatedInArea | 1 word -> 3 words
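The word counts in a compositional mapping can be reproduced by splitting each term at its camelCase or underscore boundaries. The sketch below assumes that this is how words were counted, which the slides do not state; the helper names `split_words` and `composition` are hypothetical.

```python
import re


def split_words(term):
    """Split a camelCase or snake_case identifier into its words."""
    # Runs of capitals (acronyms), capitalized or lowercase words, digits.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", term)


def composition(source, target):
    """Return the word-count pattern of a mapping, e.g. '1->3'."""
    return f"{len(split_words(source))}->{len(split_words(target))}"


print(split_words("locatedInArea"))           # ['located', 'In', 'Area']
print(composition("state", "locatedInArea"))  # 1->3
print(composition("hasDaughter", "child"))    # 2->1
```

An empty source or target string yields a count of zero, which matches patterns such as 0->1 for terms that have no counterpart on one side.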
Distribution of Operations (fraction of test queries)
select 0.8155
order/offset 0.0680
ask 0.0291
order/offset 0.0291
count 0.0291
limit 0.0097
group/count 0.0097
order/limit 0.0097
Query-set heterogeneity
102 distinct query mapping patterns
Query mapping pattern:
- semantic mapping
- composition patterns
- operations
https://sites.google.com/site/eswcsaq2015/
Participant System
Results
Measure UMBC_Equity-SFQ
Avg. precision 0.33
Avg. recall 0.36
Avg. F1-measure 0.31
Fraction of queries answered 0.44
Summary
Schema-agnostic queries are a primary functionality for contemporary databases.
The SAQ challenge supports a fine-grained understanding of the semantic phenomena behind mapping schema-agnostic queries.