Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge

Posted on 03-Aug-2015



André Freitas, Christina Unger

{Schema-agnostic | Schema-free | Vocabulary-independent}

Motivation

Big Data

Vision: a more complete, data-based picture of the world for systems and users.

Shift in the Database Landscape

Very large and dynamic "schemas":

- before 2000: 10s-100s of attributes
- circa 2015: 1,000s-1,000,000s of attributes

(Brodie & Liu, 2010)

Semantic Heterogeneity

Decentralized content generation.

Multiple perspectives (conceptualizations) of reality.

Ambiguity, vagueness, inconsistency.

Size, Complexity, Dynamicity and Decentralisation (SCoDD)

(Brodie & Liu, 2010; Helland, 2012)

Databases for a Complex World

How do you query data in this scenario?

Schema-agnosticism

Abstraction Layer

SELECT {
  Bill Clinton daughter ?x .
}

[Figure: Bill Clinton -child-> Chelsea Clinton]

Schema-agnostic queries

Query approaches over structured databases that allow users to satisfy complex information needs without understanding the representation (schema) of the database.

Schema-agnostic queries

Schema-free queries

Vocabulary-independent queries

...


First-level independence

(Relational Model)

“… it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and representation and organization of data on the other”

Codd, 1970

Second-level independence

(Schema-agnosticism)

Vocabulary Problem for Databases

BillClinton hasDaughter ?x marriedTo ?y .

Semantic gap (to be bridged by schema-agnostic query mechanisms):

Abstraction level differences

Lexical variation

Structural (compositional) differences


The SAQ-2015 Test Collection

The Goal

To support easy querying over complex databases with large schemata, relieving users of the need to understand the formal representation of the data.

Test Collection

Training set: 30 schema-agnostic queries

Test set: 103 schema-agnostic queries

Dataset: DBpedia 2014 and the associated YAGO classes

Task

Return the correct answers for each schema-agnostic query.

Evaluation measures: precision, recall, F1-score.
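Per-query scores can be computed by comparing the system's answer set with the gold-standard answer set. The following is a minimal sketch in Python. It assumes set-valued answers and a plain average over queries (macro-averaging). The function names prf and macro_average, the empty-answer convention and the sample answers are all hypothetical, and the official SAQ-2015 scorer may differ.

```python
from typing import Iterable, List, Tuple

Scores = Tuple[float, float, float]


def prf(system: Iterable[str], gold: Iterable[str]) -> Scores:
    """Precision, recall and F1 for one query, comparing answer sets.

    Hypothetical convention: an empty system answer gets precision 0.0
    and an empty gold answer gets recall 0.0.
    """
    sys_set, gold_set = set(system), set(gold)
    correct = len(sys_set & gold_set)
    precision = correct / len(sys_set) if sys_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


def macro_average(results: List[Scores]) -> Scores:
    """Average the per-query (precision, recall, F1) tuples over all queries."""
    n = len(results)
    p, r, f = (sum(scores[i] for scores in results) / n for i in range(3))
    return p, r, f


# Hypothetical answer sets for two queries (placeholder URIs).
per_query = [
    prf({"res:A"}, {"res:A"}),                             # fully correct
    prf({"res:A", "res:B"}, {"res:B", "res:C", "res:D"}),  # partially correct
]
print(macro_average(per_query))
```

Because F1 is computed per query and then averaged, the average F1 need not equal the harmonic mean of the average precision and average recall.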

Query type I: Schema-agnostic SPARQL query

Keep the query language syntax (SPARQL).

Allow vocabulary and structural variations.

SELECT ?y {
  BillClinton hasDaughter ?x .
  ?x marriedTo ?y .
}

Resolved against DBpedia, this query becomes:

PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
SELECT ?y {
  :Bill_Clinton dbpedia:child ?x .
  ?x dbpedia2:spouse ?y .
}

Expected Result


Query type II: Schema-agnostic keyword query

"Bill Clinton daughter married to"

Resolved SPARQL query:

PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
SELECT ?y {
  :Bill_Clinton dbpedia:child ?x .
  ?x dbpedia2:spouse ?y .
}

Classification of Mappings

BillClinton (i s) -> Bill_Clinton (i s) | string_similar

daughterOf (p) -> child (p) | substring, similar

marriedTo (p) -> spouse (p) | substring, related

Each term is annotated with its data model category and position, e.g. (i s) = (instance, subject).
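Only the purely lexical mapping types can be approximated with string comparison alone. The following is a hypothetical Python illustration, not the procedure used to annotate the test collection. It labels a pair as exact when the terms match case-insensitively. Otherwise it labels the pair string_similar when difflib's SequenceMatcher ratio reaches an assumed threshold of 0.8. Semantic types such as related or synonym would require a lexical resource or a distributional model and are not covered.

```python
from difflib import SequenceMatcher
from typing import Optional


def lexical_mapping_type(query_term: str, db_term: str,
                         threshold: float = 0.8) -> Optional[str]:
    """Label a term pair as 'exact' or 'string_similar', else None.

    Hypothetical approximation: the comparison is case-insensitive and
    the 0.8 threshold is an assumption, not a challenge setting.
    """
    q, d = query_term.lower(), db_term.lower()
    if q == d:
        return "exact"
    # ratio() is 2*M/T: M matching characters, T total length of both strings.
    if SequenceMatcher(None, q, d).ratio() >= threshold:
        return "string_similar"
    return None  # semantic types (related, synonym, ...) are out of scope here


print(lexical_mapping_type("BillClinton", "Bill_Clinton"))  # string_similar
print(lexical_mapping_type("marriedTo", "spouse"))          # None
```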

Mappings can be Challenging

Schema-agnostic query:

SELECT DISTINCT ?num
WHERE {
  Soyuz_program numberOfMissions ?num .
}

Resolved SPARQL query:

PREFIX dbp: <http://dbpedia.org/property/>
PREFIX res: <http://dbpedia.org/resource/>
SELECT (COUNT(DISTINCT ?uri) AS ?num)
WHERE {
  ?uri dbp:programme res:Soyuz_programme .
}

Example Mappings

languageOf (p) -> spokenIn (p) | related

writtenBy (p) -> author (p) | substring, related

FemaleFirstName (c o) -> gender (p) | substring, related

state (p) -> locatedInArea (p) | related

extinct (p) -> conservationStatus (p) | related

constructionDate (p) -> beginningDate (p) | substring, related

calledAfter (p) -> shipNamesake (p) | related

in (p) -> location (p) | functional_content

in (p) -> isPartOf (p) | functional_content

extinct (p) -> 'EX' (v o) | substring, abbreviation

startAt (p) -> sourceCountry (p) | substring, synonym

U.S._State (c o) -> StatesOfTheUnitedStates (c o) | string_similar

wifeOf (p) -> spouse (p) | substring, similar
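Each mapping annotation follows a regular pattern: a source term with its category, an arrow, a target term with its category, and then a comma-separated list of mapping types. The following hypothetical Python parser assumes exactly the notation shown on these slides. Terms contain no spaces and categories appear in parentheses. The test collection's actual file format is not described here.

```python
import re
from dataclasses import dataclass
from typing import List

# Matches lines such as "wifeOf (p) -> spouse (p) | substring, similar".
# Assumption: terms contain no whitespace; categories sit in parentheses.
MAPPING_RE = re.compile(
    r"^\s*(?P<src>\S+)\s+\((?P<src_cat>[^)]*)\)\s*->\s*"
    r"(?P<dst>\S+)\s+\((?P<dst_cat>[^)]*)\)\s*\|\s*(?P<types>.+?)\s*$"
)


@dataclass
class Mapping:
    source: str
    source_category: str  # e.g. "p" or "c o"
    target: str
    target_category: str
    types: List[str]      # e.g. ["substring", "similar"]


def parse_mapping(line: str) -> Mapping:
    """Parse one annotation line; raise ValueError if it does not match."""
    m = MAPPING_RE.match(line)
    if m is None:
        raise ValueError(f"not a mapping annotation: {line!r}")
    types = [t.strip() for t in m.group("types").split(",")]
    return Mapping(m.group("src"), m.group("src_cat"),
                   m.group("dst"), m.group("dst_cat"), types)


print(parse_mapping("extinct (p) -> 'EX' (v o) | substring, abbreviation"))
```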


Test Collection Item

Each item consists of:

- Schema-agnostic keyword query
- Schema-agnostic SPARQL query
- Resolved SPARQL query
- Categorized mappings
- Answer

Test Collection Construction Methodology

Derived from Question Answering over Linked Data (QALD; Unger et al.).

Natural language queries were converted into schema-agnostic queries.

Test Collection Analysis

What is expressed in the test collection?

Classifying the mappings and analyzing the distribution of their features.

Mapping Type Distribution

exact 0.3645

related 0.1464

substring 0.1402

substring, related 0.0779

null 0.0623

string similar 0.0498

substring, stem 0.0436

similar 0.0218

synonym 0.0156

stem 0.0125

functional 0.0125

substring, synonym 0.0062

substring, similar 0.0062

predicate chain 0.0062

functional content mapping 0.0124

substring, related 0.0031

substring, string similar 0.0031

substring, abbreviation 0.0031

string similar, related 0.0031

string similar, acronym 0.0031

acronym, string similar 0.0031

acronym 0.0031

Example: state -> locatedInArea is classified as related.
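Relative frequencies like those above come from counting each annotated label and dividing by the total number of mappings. The following is a minimal Python sketch. Its sample labels are hypothetical, not the SAQ-2015 annotations. Combined labels such as "substring, related" count as a single category, as they do on the slide.

```python
from collections import Counter
from typing import Dict, List


def relative_frequencies(labels: List[str]) -> Dict[str, float]:
    """Map each label to its share of all labels, most frequent first."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.most_common()}


# Hypothetical sample of mapping-type labels (not the real test collection).
sample = ["exact", "exact", "related", "substring, related", "exact"]
for label, share in relative_frequencies(sample).items():
    print(f"{label} {share:.4f}")
```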

Data Model Mappings Distribution

(p)->(p) 0.3863

(c o)->(c o) 0.1495

(i s)->(i s) 0.0966

(i o)->(i o) 0.0779

(i o)->(i s) 0.0592

(c o)->(p) 0.0436

(p)->(n) 0.0312

(n)->(p) 0.0218

(c o)->(i o) 0.0187

(i s)->(i o) 0.0156

(i o)->(c o) 0.0125

(v)->(v) 0.0093

(p)->(c o) 0.0093

(c o)->(v o) 0.0093

(op)->(p) 0.0093

(v o)->(v o) 0.0062

(c o)->(n) 0.0062

(i s)->(p) 0.0062

(p)->(v o) 0.0031

(i o)->(v) 0.0031

(i s)->(c o) 0.0031

(n)->(v) 0.0031

(p)->(i s) 0.0031

(p)->(op) 0.0031

(c o)->(i s) 0.0031

(i o)->(p) 0.0031

(i o)->(v o) 0.0031

(i s)->(v o) 0.0031

Example: state -> locatedInArea is a (p)->(p) mapping.

Compositional Mapping (# of words)

1->1 0.5358

2->1 0.1308

2->2 0.0872

1->2 0.0810

0->1 0.0249

2->0 0.0218

3->2 0.0156

3->3 0.0156

1->5 0.0156

1->3 0.0125

1->0 0.0125

3->1 0.0093

2->3 0.0093

5->1 0.0062

4->1 0.0062

3->0 0.0031

4->3 0.0031

4->2 0.0031

2->7 0.0031

3->7 0.0031

Example: state -> locatedInArea maps 1 word to 3 words.
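The word counts depend on how identifiers are split into words. The following Python sketch assumes that underscores, whitespace and lower-to-upper camelCase boundaries separate words, which is an assumption because the annotation guidelines are not given here. With that rule it reproduces the state -> locatedInArea example. The function names word_count and composition_pattern are hypothetical.

```python
import re

# Zero-width boundary between a lowercase letter/digit and an uppercase letter.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def word_count(term: str) -> int:
    """Count words in an identifier such as 'locatedInArea' or 'Bill_Clinton'."""
    words = []
    for chunk in re.split(r"[_\s]+", term.strip()):
        if chunk:  # skip empty pieces (e.g. an empty term counts as 0 words)
            words.extend(_CAMEL_BOUNDARY.split(chunk))
    return len(words)


def composition_pattern(query_term: str, db_term: str) -> str:
    """Render a mapping as 'n->m' word counts, as in the table above."""
    return f"{word_count(query_term)}->{word_count(db_term)}"


print(composition_pattern("state", "locatedInArea"))  # 1->3
```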

Distribution of Operations

select 0.8155

order/offset 0.0680

ask 0.0291

order/offset 0.0291

count 0.0291

limit 0.0097

group/count 0.0097

order/limit 0.0097


Query-set heterogeneity

102 distinct query mapping patterns

A query mapping pattern combines:

- semantic mappings
- composition patterns
- operations

https://sites.google.com/site/eswcsaq2015/
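A "distinct query mapping pattern" can be made concrete by representing each query as the combination of its mapping types, composition patterns and operation, and then counting distinct combinations. The following Python sketch uses an order-insensitive tuple as that representation. This encoding is hypothetical and not necessarily the one behind the figure of 102.

```python
from typing import Iterable, List, Tuple

Pattern = Tuple[Tuple[str, ...], Tuple[str, ...], str]


def mapping_pattern(mapping_types: Iterable[str],
                    compositions: Iterable[str],
                    operation: str) -> Pattern:
    """Order-insensitive signature of one query's mappings and operation."""
    return (tuple(sorted(mapping_types)), tuple(sorted(compositions)), operation)


def count_distinct_patterns(queries: List[Pattern]) -> int:
    return len(set(queries))


# Hypothetical annotations for three queries.
queries = [
    mapping_pattern(["exact", "related"], ["1->1", "1->3"], "select"),
    mapping_pattern(["related", "exact"], ["1->3", "1->1"], "select"),  # same pattern
    mapping_pattern(["exact"], ["1->1"], "count"),
]
print(count_distinct_patterns(queries))  # 2
```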

Participant System


Results

Measure                        UMBC_Equity-SFQ
Avg. precision                 0.33
Avg. recall                    0.36
Avg. F1-measure                0.31
Fraction of queries answered   0.44

Summary

Schema-agnostic querying is a core functionality for contemporary databases.

The SAQ challenge supports a fine-grained understanding of the semantic phenomena behind mapping schema-agnostic queries.
