Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge


Page 1: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Schema-Agnostic Queries

(SAQ-2015)

Semantic Web Challenge

André Freitas, Christina Unger

{Schema-agnostic | Schema-free | Vocabulary-independent}

Page 2: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Motivation

Page 3: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Big Data

Vision: a more complete, data-based picture of the world for systems and users.

Page 4: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Shift in the Database Landscape

Very large and dynamic “schemas”.

Before 2000: 10s-100s of attributes

Circa 2015: 1,000s-1,000,000s of attributes

Brodie & Liu, 2010

Page 5: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Semantic Heterogeneity

Decentralized content generation.

Multiple perspectives (conceptualizations) of reality.

Ambiguity, vagueness, inconsistency.

Size, Complexity, Dynamicity and Decentralisation (SCoDD)

Brodie & Liu, 2010; Helland, 2012

Page 6: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Databases for a Complex World

How do you query data in this scenario?

Page 7: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Schema-agnosticism

Abstraction layer

SELECT {
  Bill Clinton daughter ?x .
}

Data (figure): Bill Clinton -- child --> Chelsea Clinton

Page 8: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Schema-agnostic queries

Query approaches over structured databases that allow users to satisfy complex information needs without understanding the representation (schema) of the database.

Page 9: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Schema-agnostic queries

Schema-free queries

Vocabulary-independent queries

...


Page 10: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

First-level independency

(Relational Model)

“… it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and representation and organization of data on the other”

Codd, 1970

Second-level independency

(Schema-agnosticism)


Page 11: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Vocabulary Problem for Databases

BillClinton hasDaughter ?x marriedTo ?y .

Semantic gap (bridged by schema-agnostic query mechanisms):

Abstraction level differences

Lexical variation

Structural (compositional) differences


Page 13: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

The SAQ 2015 Test Collection

Page 14: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

The Goal

To support easy querying over complex databases with large schemata, relieving users from the need to understand the formal representation of the data.

Page 15: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Test Collection

Training set: 30 schema-agnostic queries

Test set: 103 schema-agnostic queries

DBpedia 2014 and associated YAGO classes


Page 16: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Task

Return the correct answers for the schema-agnostic query.

Evaluation measures: precision, recall, F1-score.
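The three measures can be sketched for a single query as set overlap between the returned and the gold-standard answers. This is a minimal illustration, not the official evaluation script: the function names are hypothetical, and the handling of empty answer sets and the per-query (macro) averaging are assumptions that the slides do not spell out.

```python
def precision_recall_f1(returned, gold):
    """Set-based precision, recall and F1 for a single query."""
    returned, gold = set(returned), set(gold)
    correct = len(returned & gold)
    # Assumption: an empty answer set scores 0 rather than being undefined.
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall > 0:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def macro_average(per_query_scores):
    """Average each measure over all queries (assumed macro-averaging)."""
    n = len(per_query_scores)
    return tuple(sum(s[i] for s in per_query_scores) / n for i in range(3))


scores = [
    precision_recall_f1({"a", "b"}, {"a"}),  # one spurious answer
    precision_recall_f1(set(), {"c"}),       # query left unanswered
]
print(macro_average(scores))
```

Under macro-averaging, the average F1 is the mean of the per-query F1 values, so it generally differs from the F1 computed from the average precision and average recall.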


Page 17: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Query type I: Schema-agnostic SPARQL query

Keep the query language syntax (SPARQL).

Allow vocabulary and structural variations.

SELECT ?y {
  BillClinton hasDaughter ?x .
  ?x marriedTo ?y .
}

Page 18: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Query type I: Schema-agnostic SPARQL query

Schema-agnostic query:

SELECT ?y {
  BillClinton hasDaughter ?x .
  ?x marriedTo ?y .
}

Resolved query over DBpedia:

PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT ?y {
  :Bill_Clinton dbpedia:child ?x .
  ?x dbpedia2:spouse ?y .
}
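Once the term mappings are known, turning the schema-agnostic query into the resolved one is a mechanical substitution. The sketch below shows only that last step. The hard-coded `MAPPINGS` table and the helper names are hypothetical, and a real system would have to discover the mappings itself, which is the hard part of the task.

```python
# Hypothetical mapping table: schema-agnostic term -> dataset term.
# The three entries mirror the Bill Clinton example on the slide.
MAPPINGS = {
    "BillClinton": ":Bill_Clinton",
    "hasDaughter": "dbpedia:child",
    "marriedTo": "dbpedia2:spouse",
}

PREFIXES = """PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>"""


def resolve_term(term, mappings):
    """Variables pass through unchanged; other terms are looked up."""
    if term.startswith("?"):
        return term
    return mappings[term]  # a KeyError means no mapping is known


def resolve_query(select_var, triples, mappings=MAPPINGS):
    """Build a resolved SPARQL string from schema-agnostic triple patterns."""
    body = "\n".join(
        "  " + " ".join(resolve_term(t, mappings) for t in triple) + " ."
        for triple in triples
    )
    return f"{PREFIXES}\nSELECT {select_var} {{\n{body}\n}}"


query = resolve_query(
    "?y",
    [("BillClinton", "hasDaughter", "?x"), ("?x", "marriedTo", "?y")],
)
print(query)
```

This one-to-one substitution covers only the simplest case. It does not handle structural differences, such as a mapping that turns one predicate into a chain of predicates.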

Page 19: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Expected Result


Page 20: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Query type II: Schema-agnostic keyword query

Keyword query:

"Bill Clinton daughter married to"

Resolved query over DBpedia:

PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>

SELECT ?y {
  :Bill_Clinton dbpedia:child ?x .
  ?x dbpedia2:spouse ?y .
}

Page 21: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Classification of Mappings

BillClinton (i s) -> Bill_Clinton (i s) | string_similar

daughterOf (p) -> child (p) | substring, similar

marriedTo (p) -> spouse (p) | substring, related

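Notation: the tag in parentheses gives the data model category and position of the term, e.g. (i s) = (instance, subject). The labels after the bar classify the mapping.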
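The mapping notation above is regular enough to parse into records. The following is a sketch under the assumption that every line has the shape `term (category) -> term (category) | type, type`. The `Mapping` class and `parse_mapping` function are hypothetical names and not part of the challenge material.

```python
import re
from dataclasses import dataclass

# Pattern for lines such as:
#   BillClinton (i s) -> Bill_Clinton (i s) | string_similar
MAPPING_RE = re.compile(
    r"^(?P<src>\S+) \((?P<src_cat>[^)]*)\) -> "
    r"(?P<tgt>\S+) \((?P<tgt_cat>[^)]*)\) \| (?P<types>.+)$"
)


@dataclass
class Mapping:
    source: str
    source_category: str  # e.g. "i s" = (instance, subject)
    target: str
    target_category: str
    types: tuple          # e.g. ("substring", "related")


def parse_mapping(line):
    """Parse one annotated mapping line into a Mapping record."""
    m = MAPPING_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a mapping line: {line!r}")
    types = tuple(t.strip() for t in m.group("types").split(","))
    return Mapping(
        m.group("src"), m.group("src_cat"),
        m.group("tgt"), m.group("tgt_cat"),
        types,
    )


mapping = parse_mapping("marriedTo (p) -> spouse (p) | substring, related")
print(mapping)
```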

Page 22: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Mappings can be Challenging

Schema-agnostic query:

SELECT DISTINCT ?num
WHERE {
  Soyuz_program numberOfMissions ?num .
}

Resolved query over DBpedia:

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX res: <http://dbpedia.org/resource/>

SELECT (COUNT(DISTINCT ?uri) AS ?num)
WHERE {
  ?uri dbp:programme res:Soyuz_programme .
}

Page 23: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Example Mappings

languageOf (p) -> spokenIn (p) | related

writtenBy (p) -> author (p) | substring, related

FemaleFirstName (c o) -> gender (p) | substring, related

state (p) -> locatedInArea (p) | related

extinct (p) -> conservationStatus (p) | related

constructionDate (p) -> beginningDate (p) | substring, related

calledAfter (p) -> shipNamesake (p) | related

in (p) -> location (p) | functional_content

in (p) -> isPartOf (p) | functional_content

extinct (p) -> 'EX' (v o) | substring, abbreviation

startAt (p) -> sourceCountry (p) | substring, synonym

U.S._State (c o) -> StatesOfTheUnitedStates (c o) | string_similar

wifeOf (p) -> spouse (p) | substring, similar


Page 24: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Test Collection Item


Schema-agnostic keyword query

Schema-agnostic SPARQL query

Resolved SPARQL query

Categorized mappings

Answer

Page 25: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Test Collection Construction Methodology

Derived from Question Answering over Linked Data (QALD; Unger et al.).

Natural language queries were converted into schema-agnostic queries.

Page 26: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Test Collection Analysis

What is expressed in the test collection?

Classifying the distribution of features in the mappings.
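A feature distribution of this kind is a relative-frequency count over the annotated labels. The sketch below uses a made-up toy list of labels, not the SAQ-2015 annotations, and the function name is hypothetical.

```python
from collections import Counter


def feature_distribution(labels):
    """Relative frequency of each label, most frequent first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, round(n / total, 4)) for label, n in counts.most_common()]


# Toy input (made up for illustration, not the SAQ-2015 data).
toy_labels = ["exact", "exact", "related", "substring"]
distribution = feature_distribution(toy_labels)
print(distribution)
```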


Page 27: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Mapping Type Distribution

exact 0.3645

related 0.1464

substring 0.1402

substring, related 0.0779

null 0.0623

string similar 0.0498

substring, stem 0.0436

similar 0.0218

synonym 0.0156

stem 0.0125

functional 0.0125

substring, synonym 0.0062

substring, similar 0.0062

predicate chain 0.0062

functional content mapping 0.0124

substring, related 0.0031

substring, string similar 0.0031

substring, abbreviation 0.0031

string similar, related 0.0031

string similar, acronym 0.0031

acronym, string similar 0.0031

acronym 0.0031

Example: state -> locatedInArea is a "related" mapping.

Page 28: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Data Model Mappings Distribution

(p)->(p) 0.3863

(c o)->(c o) 0.1495

(i s)->(i s) 0.0966

(i o)->(i o) 0.0779

(i o)->(i s) 0.0592

(c o)->(p) 0.0436

(p)->(n) 0.0312

(n)->(p) 0.0218

(c o)->(i o) 0.0187

(i s)->(i o) 0.0156

(i o)->(c o) 0.0125

(v)->(v) 0.0093

(p)->(c o) 0.0093

(c o)->(v o) 0.0093

(op)->(p) 0.0093

(v o)->(v o) 0.0062

(c o)->(n) 0.0062

(i s)->(p) 0.0062

(p)->(v o) 0.0031

(i o)->(v) 0.0031

(i s)->(c o) 0.0031

(n)->(v) 0.0031

(p)->(i s) 0.0031

(p)->(op) 0.0031

(c o)->(i s) 0.0031

(i o)->(p) 0.0031

(i o)->(v o) 0.0031

(i s)->(v o) 0.0031

Example: state -> locatedInArea is a (p)->(p) mapping.

Page 29: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Compositional Mapping (# of words)

1->1 0.5358

2->1 0.1308

2->2 0.0872

1->2 0.0810

0->1 0.0249

2->0 0.0218

3->2 0.0156

3->3 0.0156

1->5 0.0156

1->3 0.0125

1->0 0.0125

3->1 0.0093

2->3 0.0093

5->1 0.0062

4->1 0.0062

3->0 0.0031

4->3 0.0031

4->2 0.0031

2->7 0.0031

3->7 0.0031

Example: state -> locatedInArea maps 1 word -> 3 words.
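The word counts can be approximated by splitting terms on camelCase and underscore boundaries. How the collection's authors actually counted words is not stated on the slides, so the splitting rule and both function names below are assumptions.

```python
import re


def split_words(term):
    """Split a camelCase or underscore-separated term into words."""
    words = []
    for part in re.split(r"[_\s]+", term):
        # Acronym runs, capitalized or lowercase words, and digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return words


def composition(source, target):
    """Return the '<n>-><m>' word-count signature of a mapping."""
    return f"{len(split_words(source))}->{len(split_words(target))}"


print(split_words("locatedInArea"))
print(composition("state", "locatedInArea"))
```

For the example above, this rule yields three words for locatedInArea and therefore the signature 1->3.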

Page 30: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Distribution of Operations

select 0.8155

order/offset 0.0680

ask 0.0291

order/offset 0.0291

count 0.0291

limit 0.0097

group/count 0.0097

order/limit 0.0097


Page 31: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Query-set heterogeneity

102 distinct query mapping patterns

Query mapping pattern:

- semantic mapping

- composition patterns

- operations
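One way to count distinct query mapping patterns is to reduce each query to a signature built from the three components listed above and collect the signatures in a set. The sketch assumes that two queries share a pattern when all three components match regardless of order. The slides do not give the exact equality criterion, and the annotations below are toy data.

```python
def query_pattern(mapping_types, compositions, operations):
    """Order-insensitive signature of one query's mapping pattern."""
    return (
        tuple(sorted(mapping_types)),
        tuple(sorted(compositions)),
        tuple(sorted(operations)),
    )


# Toy annotations (hypothetical, not taken from the test collection).
toy_queries = [
    (["string_similar", "substring, similar"], ["1->1", "1->1"], ["select"]),
    (["substring, similar", "string_similar"], ["1->1", "1->1"], ["select"]),
    (["related"], ["1->3"], ["select", "count"]),
]

distinct_patterns = {query_pattern(*q) for q in toy_queries}
print(len(distinct_patterns))
```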

Page 32: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

https://sites.google.com/site/eswcsaq2015/

Page 33: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Participant System


Page 34: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Results

Measure UMBC_Equity-SFQ

Avg. precision 0.33

Avg. recall 0.36

Avg. F1-measure 0.31

Fraction of queries answered 0.44

Page 35: Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Summary

Schema-agnostic queries are a primary functionality for contemporary databases.

The SAQ challenge supports a fine-grained understanding of the semantic phenomena behind mapping schema-agnostic queries.
