Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf ·...

108
Database Optimization Techniques for Semantic Queries Ioana Manolescu INRIA, France [email protected], http://pages.saclay.inria.fr/Ioana.Manolescu https://team.inria.fr/oak SEBD 2015, Gaeta I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 1 / 73

Transcript of Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf ·...

Page 1: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Database Optimization Techniques for SemanticQueries

Ioana Manolescu

INRIA, [email protected],

http://pages.saclay.inria.fr/Ioana.Manolescu

https://team.inria.fr/oak

SEBD 2015, Gaeta

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 1 / 73

Page 2: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Part I

Motivation and outline

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 2 / 73

Page 3: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic data

Most popular data model: RDF (W3C’s Resource DescriptionFramework)

Famous application: the Linked Open Data cloud

(2007)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 3 / 73

Page 4: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic data

Most popular data model: RDF (W3C’s Resource DescriptionFramework)

Famous application: the Linked Open Data cloud (2007)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 3 / 73

Page 5: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic data

Most popular data model: RDF (W3C’s Resource DescriptionFramework)

Famous application: the Linked Open Data cloud (2008)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 4 / 73

Page 6: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic data

Most popular data model: RDF (W3C’s Resource DescriptionFramework)

Famous application: the Linked Open Data cloud (2009)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 5 / 73

Page 7: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic data

Most popular data model: RDF (W3C’s Resource DescriptionFramework)

Famous application: the Linked Open Data cloud (2010)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 6 / 73

Page 8: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Linked Open Data cloud, 2014

900,129 documents describing 8,038,396 resources(Schmachtenberg, Bizer, Paulheim, ISWC 2014)

Topic Datasets %Government 183 18.05

Publications 96 9.47

Life sciences 83 8.19

User-generated 48 4.73

Cross-domain 41 4.04

Media 22 2.17

Geographic 21 2.07

Social web 520 51.28

Total 1014 100

There is more (Billion Triple Challenge etc.)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 7 / 73

Page 9: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic Data

1 RDF: three-attribute relation (subject, property, object)

The subject is the resource being describedThe resource has the property property whose value is objectResource type is a property, specified just like any other

2 RDFS semantics providing information about the propertiesand classes (types) of resources:

Any undergraduateStudent is a StudentAnyone having a graduationDate is a Student (but may also beof other types) ...

Semantics leads to implicit data

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 8 / 73

Page 10: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic Data

1 RDF: three-attribute relation (subject, property, object)

The subject is the resource being describedThe resource has the property property whose value is objectResource type is a property, specified just like any other

2 RDFS semantics providing information about the propertiesand classes (types) of resources:

Any undergraduateStudent is a StudentAnyone having a graduationDate is a Student (but may also beof other types) ...

Semantics leads to implicit data

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 8 / 73

Page 11: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic Data

1 RDF: three-attribute relation (subject, property, object)

The subject is the resource being describedThe resource has the property property whose value is objectResource type is a property, specified just like any other

2 RDFS semantics providing information about the propertiesand classes (types) of resources:

Any undergraduateStudent is a StudentAnyone having a graduationDate is a Student (but may also beof other types) ...

Semantics leads to implicit data

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 8 / 73

Page 12: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Querying Semantic Data

Main RDFS constraints: rdfs:subClassOf, rdfs:subPropertyOf,rdfs:domain, rdfs:range

Beyond RDFS

Many (richer) constraint (or ontology) languages

W3C standards: OWL 2 profile family

DL Lite family� Calvanese, De Giacomo, Lembo, Lenzerini, Rosati, JAR2007

Datalog±

� Cali, Gottlob Lukasiewick, Marnette, Pierris, LICS 2010

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 9 / 73

Page 13: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Do we really need the semantics?

Yes. All the time.

Application knowledge / constraints:

Every Senator is an ElectedOfficial which is a Person

(On Wikipedia) being BornInAPlace means being a Person

The source and destination of a tripFromTo are either astreetAddress, or a cityAddress or a countryAddress

1 Without the semantics, we may miss query answers

2 Semantic contraints are a compact way of encodinginformation (“every ElectedOfficial is a Person” stated onlyonce)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 10 / 73

Page 14: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Do we really need the semantics?

Yes. All the time.

Application knowledge / constraints:

Every Senator is an ElectedOfficial which is a Person

(On Wikipedia) being BornInAPlace means being a Person

The source and destination of a tripFromTo are either astreetAddress, or a cityAddress or a countryAddress

1 Without the semantics, we may miss query answers

2 Semantic contraints are a compact way of encodinginformation (“every ElectedOfficial is a Person” stated onlyonce)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 10 / 73

Page 15: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Motivation and context

Outline

1 Motivation

2 RDF data model and query language

3 Query answering techniques

4 Cover-based query reformulation for the database fragment ofRDF

5 Cover-based query reformulation framework for FOL-reduciblesettings

6 Performance and concluding remarks

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 11 / 73

Page 16: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Part II

RDF data and queries

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 12 / 73

Page 17: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language RDF

The Resource Description Framework (RDF)

RDF graph – set of triples

Assertion Triple Relational notation

Class s rdf:type o o(s)Property s p o p(s, o)

doi1

Book

“El Aleph”

:b1

“J. L. Borges”

“1949”

publishedIn

hasTitle

writtenBy

hasName

rdf:typeresource (URI)

blank node

literal (string)

property

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 13 / 73

Page 18: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language RDFS

RDF Schema (RDFS)

Declare deductive constraints between classes and properties

Constraint Triple OWA interpretation

Subclass s rdfs:subClassOf o s ⊆ o

Subproperty s rdfs:subPropertyOf o s ⊆ o

Domain typing s rdfs:domain o Πdomain(s) ⊆ o

Range typing s rdfs:range o Πrange(s) ⊆ o

Book

Publication

Person

writtenBy

hasAuthor

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 14 / 73

Page 19: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language RDF entailment

Open-world assumption and RDF entailment

RDF data model – based on the open-world assumption.→ deductive constraints – implicitly propagate triples

Implicit triples: part of the graph – not explicitly present

Entailment – reasoning mechanism

set of explicit triples+ → derive implicit triples

some entailment rules

Exhaustive application of entailment → saturation (closure)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 15 / 73

Page 20: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language RDF entailment

Open-world assumption and RDF entailment

RDF data model – based on the open-world assumption.→ deductive constraints – implicitly propagate triples

Implicit triples: part of the graph – not explicitly present

Entailment – reasoning mechanism

set of explicit triples+ → derive implicit triples

some entailment rules

Exhaustive application of entailment → saturation (closure)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 15 / 73

Page 21: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language RDF entailment

Open-world assumption and RDF entailment

RDF data model – based on the open-world assumption.→ deductive constraints – implicitly propagate triples

Implicit triples: part of the graph – not explicitly present

Entailment – reasoning mechanism

set of explicit triples+ → derive implicit triples

some entailment rules

Exhaustive application of entailment → saturation (closure)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 15 / 73

Page 22: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language RDF entailment

RDF entailment

The semantics of an RDF graph G is its saturation G∞.

Sample RDFSentailment rules

Instance entailment from combining schema and instancetriples

rdfs9 c1 rdfs:subClassOf c2 ∧ s rdf:type c1 `rdfs9RDF s rdf:type c2

rdfs7 p1 rdfs:subPropertyOf p2 ∧ s p1 o `rdfs7RDF s p2 o

rdfs2 p rdfs:domain c ∧ s p o `rdfs2RDF s rdf:type c

rdfs3 p rdfs:range c ∧ s p o `rdfs3RDF o rdf:type c

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73

Page 23: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language Query answering

SPARQL query language and SPARQL conjunctive queries

SPARQL is the W3C query language for RDF.

SPARQL conjunctive queries = Basic Graph Pattern (BGP) queries

Sample BGP query:

q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,hasAuthor, a),(b, publishedIn, ”1949”)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 17 / 73

Page 24: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language Query answering

Query semantics / Answer set

query evaluation 6= query answering

The evaluation of a query only uses the graph’s explicittriples

For the (complete) answer set, evaluate q against thegraph’s saturation

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 18 / 73

Page 25: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language Query answering

Query answering example

Given the query q(x, y):- x rdf:type y:

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(G) = {(doi1,Book)}q(G∞) = {(doi1,Book), (doi1,Publication), ( :b1,Person)}

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 19 / 73

Page 26: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language Query answering

Query answering example

Given the query q(x, y):- x rdf:type y:

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(G) = {(doi1,Book)}

q(G∞) = {(doi1,Book), (doi1,Publication), ( :b1,Person)}

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 19 / 73

Page 27: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

RDF data model and query language Query answering

Query answering example

Given the query q(x, y):- x rdf:type y:

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(G) = {(doi1,Book)}q(G∞) = {(doi1,Book), (doi1,Publication), ( :b1,Person)}

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 19 / 73

Page 28: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Part III

Query answering techniques

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 20 / 73

Page 29: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Reasoning

The need for reasoning

Query answering needs explicit and implicit data!

Saturation-based query answering (when the saturation resultis finite)

Reformulation-based query answering

Hybrids of the above� J. Urbani, F. van Harmelen, S. Schlobach, and H. Bal,“QueryPIE: Backward reasoning for OWL Horst over verylarge knowledge bases”, ISWC 2011

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 21 / 73

Page 30: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Graph saturation

Saturation-based query answering

schema

G

G∞

query q

answer

q(G∞) can be computed using an RDBMS

G∞ needs time to be computed and space to be stored

Not suitable for high update rate (data and/or schema triples)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 22 / 73

Page 31: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Graph saturation

Saturation-based query answering

schema

G

G∞

query q

answer

q(G∞) can be computed using an RDBMS

G∞ needs time to be computed and space to be stored

Not suitable for high update rate (data and/or schema triples)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 22 / 73

Page 32: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Graph saturation

Saturation maintenance

schema

G

G∞

query q

answer

Compute ∆ for an update of an RDF graph G s.t.• (µ(G))∞ = G∞ ∪∆ when µ an insertion• (µ(G))∞ = G∞ \∆ when µ a deletion

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 23 / 73

Page 33: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Reformulation-based query answering

schema

query q query qref

answer

G

qref (G) can be evaluated using an RDBMS

Robust to updates

Reformulated queries are complex, thus costly to evaluate

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 24 / 73

Page 34: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Reformulation-based query answering

schema

query q query qref

answer

G

qref (G) can be evaluated using an RDBMS

Robust to updates

Reformulated queries are complex, thus costly to evaluate

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 24 / 73

Page 35: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Reformulation-based query answering

schema

query q query qref

answer

G

qref (G) can be evaluated using an RDBMS

Robust to updates

Reformulated queries are complex, thus costly to evaluate

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 25 / 73

Page 36: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Reformulation-based query answering

schema

query q query qref

answer

G

qref (G) can be evaluated using an RDBMS

Robust to updates

Reformulated queries are complex, thus costly to evaluate

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 25 / 73

Page 37: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Semantic query answering in the presence of mappings

� Di Pinto, Lembo, Lenzerini, Mancini, Poggi, Rosatti, Ruzzi,Savo, EDBT 2013

The logical storage of the data is related to the ontology bymeans of mappings.

A mapping states that a query on the database is includedinto a query on the ontology.

Reformulation-based query answering in the presence of(GAV) mappings:� Lanti, Rezk, Xiao and Calvanese, EDBT 2015

1 Start-up phase (may perform some materialization)2 Query rewriting phase (based on the schema)3 Query translation phase (e.g., into SQL, based on the

mappings)4 Query execution phase (e.g., through an RDBMS)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 26 / 73

Page 38: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Semantic query answering in the presence of mappings

Reformulation-based query answering in the presence of(GAV) mappings:� Lanti, Rezk, Xiao and Calvanese, EDBT 2015

1 Start-up phase (may perform some materialization)2 Query rewriting phase (based on the schema)3 Query translation phase (e.g., into SQL, based on the

mappings)4 Query execution phase (e.g., through an RDBMS)

This talk: Optimized reformulation technique based on con-sidering translation and rewriting together

Can be composed with mappings (and further mapping-based opti-mizations)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 27 / 73

Page 39: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Semantic query answering in the presence of mappings

Reformulation-based query answering in the presence of(GAV) mappings:� Lanti, Rezk, Xiao and Calvanese, EDBT 2015

1 Start-up phase (may perform some materialization)2 Query rewriting phase (based on the schema)3 Query translation phase (e.g., into SQL, based on the

mappings)4 Query execution phase (e.g., through an RDBMS)

This talk: Optimized reformulation technique based on con-sidering translation and rewriting together

Can be composed with mappings (and further mapping-based opti-mizations)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 27 / 73

Page 40: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Semantic query answering vs. Ontology-Based DataAccess (OBDA)

Fundamentally the same

Input data (stored somehow), ontology, query

Output query answer

In general, mappings may be GAV, LAV, GLAV etc.LAV can be very practical (see: RDBMS indexes!)

We optimize query rewriting and translation independently ofthe global ↔ local (source) data mappings

Integration/interactions are intriguing and current/future work

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 28 / 73

Page 41: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Reformulation-based query answering landscape

schema

query q query qref

answer

G

Target reformulation languages for conjunctive queries (CQs):

mainly unions of CQs (UCQs)

� F. Goasdoue, I. Manolescu, A. Roatis: “Efficient query answeringagainst dynamic RDF databases”, EDBT 2013

joins of single-atom CQs (SCQs)

� M. Thomazo: “Compact Rewritings for Existential Rules”, IJCAI 2013

joins of UCQs (JUCQs)

� D. Bursztyn, F. Goasdoue, I. Manolescu: “OptimizingReformulation-based Query Answering in RDF”, EDBT 2015

Wait: is this about SQL syntax?!...

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 29 / 73

Page 42: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Reformulation-based query answering landscape

schema

query q query qref

answer

G

Target reformulation languages for conjunctive queries (CQs):

mainly unions of CQs (UCQs)

� F. Goasdoue, I. Manolescu, A. Roatis: “Efficient query answeringagainst dynamic RDF databases”, EDBT 2013

joins of single-atom CQs (SCQs)

� M. Thomazo: “Compact Rewritings for Existential Rules”, IJCAI 2013

joins of UCQs (JUCQs)

� D. Bursztyn, F. Goasdoue, I. Manolescu: “OptimizingReformulation-based Query Answering in RDF”, EDBT 2015

Wait: is this about SQL syntax?!...

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 29 / 73

Page 43: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Query answering techniques Query reformulation

Reformulation-based query answering landscape

schema

query q query qref

answer

G

Target reformulation languages for conjunctive queries (CQs):

mainly unions of CQs (UCQs)

� F. Goasdoue, I. Manolescu, A. Roatis: “Efficient query answeringagainst dynamic RDF databases”, EDBT 2013

joins of single-atom CQs (SCQs)

� M. Thomazo: “Compact Rewritings for Existential Rules”, IJCAI 2013

joins of UCQs (JUCQs)

� D. Bursztyn, F. Goasdoue, I. Manolescu: “OptimizingReformulation-based Query Answering in RDF”, EDBT 2015

Wait: is this about SQL syntax?!...

Yes. And it makes a big difference.From failing to feasible, or 4 orders of magnitudespeed-up on the 8 M triples DBLP dataset.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 30 / 73

Page 44: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Part IV

Cover-based reformulation for the

database fragment of RDF

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 31 / 73

Page 45: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

The database fragment of RDF

Largest RDF fragment for which an FOL reformulation-based queryanswering technique is known

Restrict RDF entailment to the rules dedicated to RDFSchema only (a.k.a. RDFS entailment).

No restriction to graphs: any triple allowed by the RDFspecification is also allowed in the DB fragment (e.g., blanknodes in data and schema)

Algorithm Reformulate(q, db) for BGP queries over schema anddata. Output size bound O((6 ∗#db2)#q)� Goasdoue, Manolescu, Roatis, EDBT 2013

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 32 / 73

Page 46: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-UCQ query reformulation example

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :

(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b,hasAuthor, a),(b,publishedIn, ”1949”)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 33 / 73

Page 47: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-UCQ query reformulation example

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :

(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 34 / 73

Page 48: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-UCQ query reformulation example

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :

(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(2) q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 35 / 73

Page 49: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-UCQ query reformulation example

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :

(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(2) q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”) ∪

(3) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 36 / 73

Page 50: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-UCQ query reformulation example

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :

(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(2) q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”) ∪

(3) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”)

q(G∞) = qref (G) = {( :b1, ”El Aleph”)}.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 37 / 73

Page 51: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-UCQ query reformulation example

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) leads to qref :

(0) q(a, t) :- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(1) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) ∪

(2) q(a, t) :- (b, rdf:type,Book), (b,hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”) ∪

(3) q(a, t) :- (b,writtenBy, x), (b, hasTitle, t), (b,writtenBy, a),(b, publishedIn, ”1949”)

q(G∞) = qref (G) = {( :b1, ”El Aleph”)}.I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 37 / 73

Page 52: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-SCQ query reformulation example

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) produces the qref :

(0) q(b) :- (b, rdf:type,Book) ∪ (b,writtenBy, x) ./(1) q(b, a) :- (b,hasAuthor, a) ∪ (b,writtenBy, a) ./(2) q(b) :- (b,publishedIn, ”1949”)

q(G∞) = qref (G) = {( :b1, ”El Aleph”)}.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 38 / 73

Page 53: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-SCQ query reformulation example

doi1

Book

Publication

“El Aleph”

:b1

“J. L. Borges”

“1949”

Person

writtenBy

hasAuthor

publishedIn

rdfs:subClassOf

rdfs:domain

rdfs:range

rdfs:subPropertyOf

hasTitle

writtenBy

hasName

rdf:type

rdf:type

hasAuthor rdf:type

rdfs:domain

q(a, t):- (b, rdf:type,Book), (b, hasTitle, t), (b, hasAuthor, a),(b, publishedIn, ”1949”) produces the qref :

(0) q(b) :- (b, rdf:type,Book) ∪ (b,writtenBy, x) ./(1) q(b, a) :- (b,hasAuthor, a) ∪ (b,writtenBy, a) ./(2) q(b) :- (b,publishedIn, ”1949”)

q(G∞) = qref (G) = {( :b1, ”El Aleph”)}.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 38 / 73

Page 54: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation

RDF Entailment Rules

query q Optimiser

Reformulation algorithm

query q′

answer

G

1. Enlarges the query reformulation language w.r.t. UCQ/SCQto have more than one reformulation alternative

2. Uses a cost model for estimating the cost of evaluating qref

through an RDBMS3. Chooses the cheapest alternative from the search space.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 39 / 73

Page 55: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation

RDF Entailment Rules

query q Optimiser

Reformulation algorithm

query q′

answer

G

1. Enlarges the query reformulation language w.r.t. UCQ/SCQto have more than one reformulation alternative2. Uses a cost model for estimating the cost of evaluating qref

through an RDBMS

3. Chooses the cheapest alternative from the search space.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 39 / 73

Page 56: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation

RDF Entailment Rules

query q Optimiser

Reformulation algorithm

query q′

answer

G

1. Enlarges the query reformulation language w.r.t. UCQ/SCQto have more than one reformulation alternative2. Uses a cost model for estimating the cost of evaluating qref

through an RDBMS3. Chooses the cheapest alternative from the search space.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 39 / 73

Page 57: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

Optimized reformulation into JUCQs

qrefc(qref )

Query q

Graph G

CQ-to-UCQ ref. algo. qrefUCQ ref

JUCQ ref

q1

c(q1)

qbest

c(qbest)

≡qn

c(qn)

. . .

≡. . .

qbest

RDBMS

Results

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 40 / 73

Page 58: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation

RDF Entailment Rules

query q Optimiser

Reformulation algorithm

query q′

answer

G

Given the query

q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)

and a state-of-the-art CQ-to-UCQ reformulation algorithm .ref :

the UCQ reformulation is: (t1, t2, t3)ref

the SCQ reformulation is: (t1)ref 1 (t2)ref 1 (t3)ref

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 41 / 73

Page 59: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation

RDF Entailment Rules

query q Optimiser

Reformulation algorithm

query q′

answer

G

Given the query

q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)

and a state-of-the-art CQ-to-UCQ reformulation algorithm .ref :

the UCQ reformulation is: (t1, t2, t3)ref

the SCQ reformulation is: (t1)ref 1 (t2)ref 1 (t3)ref

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 41 / 73

Page 60: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation

Given the query

q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)

and a state-of-the-art CQ-to-UCQ reformulation algorithm .ref , thespace of JUCQs is:

1. (t1, t2, t3)ref

2. (t1)ref 1 (t2)ref 1 (t3)ref

3. (t1, t2)ref 1 (t3)ref

4. (t1)ref 1 (t2, t3)ref

5. (t1, t3)ref 1 (t2)ref

6. (t1, t2)ref 1 (t1, t3)ref

7. (t1, t2)ref 1 (t2, t3)ref

8. (t1, t3)ref 1 (t2, t3)ref

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 42 / 73

Page 61: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation algorithms

Exhaustive algorithm

Impractical since search space size is the number of minimal coversof a set of n elements

Greedy algorithm

Driven by cost; starting from 1-atom fragments and “growing” them

Given the query

q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)

a cost-based greedy exploration is:

(t1)ref 1 (t2)ref 1 (t3)ref

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 43 / 73

Page 62: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation: greedy algorithmexample

Given the query

q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)

a cost-based greedy exploration is:

(t1)ref 1 (t2)ref 1 (t3)ref

(t1, t2)ref 1 (t3)ref (t1, t3)ref 1 (t2)ref (t1)ref 1 (t2, t3)ref

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 44 / 73

Page 63: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based reformulation for the DB fragment of RDF

CQ-to-JUCQ query reformulation algorithm

Given the query

q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)

a cost-based greedy exploration is:

(t1)ref 1 (t2)ref 1 (t3)ref

(t1, t2)ref 1 (t3)ref (t1, t3)ref 1 (t2)ref (t1)ref 1 (t2, t3)ref

(t1, t2, t3)ref (t1, t3)ref 1 (t1, t2)ref (t1, t3)ref 1 (t2, t3)ref

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 45 / 73

Page 64: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Part V

Cover-based reformulation in

FOL-reducible settings

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 46 / 73

Page 65: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Optimised reformulation-based query answering

For any setting where query answering is reducible to FOL queryevaluationThe same overall approach:

qrefc(qref )

Query q

Deductive constraints

FOL ref. algo. qrefFixed FOL reform.

Best FOL reformulation

q1

c(q1)

qbest

c(qbest)

≡qn

c(qn)

. . .

≡. . .

qbest

RDBMS

Answers

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 47 / 73

Page 66: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

DL-LiteR facts (a.k.a. ABox)

Given:

NC : set of concept names (unary predicates),

NR : set of role names (binary predicates),

NI : set of individuals (constants).

Finite number of:

Concept assertions: A(a) with A ∈ NC and a ∈ NI

Role assertions: R(a, b) with R ∈ NR and a, b ∈ NI

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 48 / 73

Page 67: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

DL-LiteR facts (a.k.a. ABox)

Given:

NC : set of concept names (unary predicates),

NR : set of role names (binary predicates),

NI : set of individuals (constants).

Finite number of:

Concept assertions: A(a) with A ∈ NC and a ∈ NI

Role assertions: R(a, b) with R ∈ NR and a, b ∈ NI

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 48 / 73

Page 68: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

DL-LiteR constraints (a.k.a. TBox)

Given:

NC : set of concept names (unary predicates),

NR : set of role names (binary predicates),

Set of:

Concept inclusions: B v C

Role inclusions: Q v S

using the following grammar (where A ∈ NC and R ∈ NR):

B := A | ∃Q, C := B | ¬B, Q := R | R−, S := Q | ¬Q

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 49 / 73

Page 69: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

DL-LiteR constraints (a.k.a. TBox)

Given:

NC : set of concept names (unary predicates),

NR : set of role names (binary predicates),

Set of:

Concept inclusions: B v C

Role inclusions: Q v S

using the following grammar (where A ∈ NC and R ∈ NR):

B := A | ∃Q, C := B | ¬B, Q := R | R−, S := Q | ¬Q

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 49 / 73

Page 70: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

FOL Reformulation

Consider the query q(x):- PhDStudent(x) ∧ worksWith(y , x) andKB K = 〈T ,A〉 with the TBox T and ABox A below:

(T1) PhDStudent v Researcher

(T2) ∃worksWith v Researcher

(T3) ∃worksWith− v Researcher

(T4) worksWith v worksWith−

(T5) supervisedBy v worksWith

(T6) ∃supervisedBy v PhDStudent

(T7) PhDStudent v ¬∃supervisedBy−

(A1) worksWith(Ioana,Francois)(A2) supervisedBy(Damian, Ioana)(A3) supervisedBy(Damian,Francois)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 50 / 73

Page 71: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

FOL Reformulation

The UCQ reformulation of q is:

qUCQ(x):- q1(x):- PhDStudent(x) ∧ worksWith(y , x)∨ q2(x):- PhDStudent(x) ∧ worksWith(x , y)∨ q3(x):- PhDStudent(x) ∧ supervisedBy(y , x)∨ q4(x):- PhDStudent(x) ∧ supervisedBy(x , y)∨ q5(x):- supervisedBy(x , z) ∧ worksWith(y , x)∨ q6(x):- supervisedBy(x , z) ∧ worksWith(x , y)∨ q7(x):- supervisedBy(x , z) ∧ supervisedBy(y , x)∨ q8(x):- supervisedBy(x , z) ∧ supervisedBy(x , y)∨ q9(x):- supervisedBy(x , x)∨ q10(x):- supervisedBy(x , y)

Naive exhaustive application of specialization steps leads, ingeneral, to highly redundant reformulations.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 51 / 73

Page 72: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

FOL Reformulation

The UCQ reformulation of q is:

qUCQ(x):- q1(x):- PhDStudent(x) ∧ worksWith(y , x)∨ q2(x):- PhDStudent(x) ∧ worksWith(x , y)∨ q3(x):- PhDStudent(x) ∧ supervisedBy(y , x)∨ q4(x):- PhDStudent(x) ∧ supervisedBy(x , y)∨ q5(x):- supervisedBy(x , z) ∧ worksWith(y , x)∨ q6(x):- supervisedBy(x , z) ∧ worksWith(x , y)∨ q7(x):- supervisedBy(x , z) ∧ supervisedBy(y , x)∨ q8(x):- supervisedBy(x , z) ∧ supervisedBy(x , y)∨ q9(x):- supervisedBy(x , x)∨ q10(x):- supervisedBy(x , y)

Naive exhaustive application of specialization steps leads, ingeneral, to highly redundant reformulations.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 51 / 73

Page 73: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

FOL Reformulation minimization

Minimization of qUCQ by eliminating disjuncts contained in anotherleads to:

qUCQmin(x):- q1(x):- PhDStudent(x) ∧ worksWith(y , x)∨ q2(x):- PhDStudent(x) ∧ worksWith(x , y)∨ q3(x):- PhDStudent(x) ∧ supervisedBy(y , x)∨ q9(x):- supervisedBy(x , y)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 52 / 73

Page 74: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based FOL reformulation

� Bursztyn, Goasdoue, Manolescu (DL 2015)Consider the query q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y) and KBK = 〈T ,A〉 with the TBox T and ABox A below:

(T1) B v ∃R ′(T2) R ′ v R

(A1) A(a)(A2) B(a)

The only answer to q(x) is a.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 53 / 73

Page 75: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based FOL reformulation

� Bursztyn, Goasdoue, Manolescu (DL 2015)Consider the query q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y) and KBK = 〈T ,A〉 with the TBox T and ABox A below:

(T1) B v ∃R ′(T2) R ′ v R

(A1) A(a)(A2) B(a)

The only answer to q(x) is a.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 53 / 73

Page 76: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based FOL reformulation

q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C1 = {{A(x),R(x , y)}, {R ′(z , y)}}.

The cover-based FOL reformulation of q w.r.t. C1 is:

qJUCQ(x):- qUCQ1 (x, y):- (A(x) ∧ R(x, y))∨(A(x) ∧ R ′(x, y)) ∧

qUCQ2 (y):- R ′(z , y)

Cover C1 prevents unification ⇒ qJUCQ(x) is not a FOL

reformulation of q w.r.t. T .

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 54 / 73

Page 77: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based FOL reformulation

q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C1 = {{A(x),R(x , y)}, {R ′(z , y)}}.The cover-based FOL reformulation of q w.r.t. C1 is:

qJUCQ(x):- qUCQ1 (x, y):- (A(x) ∧ R(x, y))∨(A(x) ∧ R ′(x, y)) ∧

qUCQ2 (y):- R ′(z , y)

Cover C1 prevents unification ⇒ qJUCQ(x) is not a FOL

reformulation of q w.r.t. T .

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 54 / 73

Page 78: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Cover-based FOL reformulation

q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C1 = {{A(x),R(x , y)}, {R ′(z , y)}}.The cover-based FOL reformulation of q w.r.t. C1 is:

qJUCQ(x):- qUCQ1 (x, y):- (A(x) ∧ R(x, y))∨(A(x) ∧ R ′(x, y)) ∧

qUCQ2 (y):- R ′(z , y)

Cover C1 prevents unification ⇒ qJUCQ(x) is not a FOL

reformulation of q w.r.t. T .

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 54 / 73

Page 79: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Safe cover

q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C2 = {{A(x)}, {R(x , y),R ′(z , y)}}

The cover-based FOL reformulation of q w.r.t. C1 is:

qJUCQ(x):- qUCQ1 (x):- (A(x)

qUCQ2 (x):- (R(x, y) ∧ R ′(z , y))∨ (R ′(x, y) ∧ R ′(z , y))∨ (R ′(x, y)) ∨ (B(x))

Cover C2 preserves unification ⇒ qJUCQ(x) is a FOL

reformulation of q w.r.t. T and C2 a safe cover.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 55 / 73

Page 80: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Safe cover

q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C2 = {{A(x)}, {R(x , y),R ′(z , y)}}The cover-based FOL reformulation of q w.r.t. C1 is:

qJUCQ(x):- qUCQ1 (x):- (A(x)

qUCQ2 (x):- (R(x, y) ∧ R ′(z , y))∨ (R ′(x, y) ∧ R ′(z , y))∨ (R ′(x, y)) ∨ (B(x))

Cover C2 preserves unification ⇒ qJUCQ(x) is a FOL

reformulation of q w.r.t. T and C2 a safe cover.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 55 / 73

Page 81: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Safe cover

q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C2 = {{A(x)}, {R(x , y),R ′(z , y)}}The cover-based FOL reformulation of q w.r.t. C1 is:

qJUCQ(x):- qUCQ1 (x):- (A(x)

qUCQ2 (x):- (R(x, y) ∧ R ′(z , y))∨ (R ′(x, y) ∧ R ′(z , y))∨ (R ′(x, y)) ∨ (B(x))

Cover C2 preserves unification ⇒ qJUCQ(x) is a FOL

reformulation of q w.r.t. T and C2 a safe cover.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 55 / 73

Page 82: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Root cover

We call root cover the minimal safe cover.

C2 = {{A(x)}, {R(x , y),R ′(z , y)}} is a root cover for qw.r.t. T .

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 56 / 73

Page 83: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Root cover

We call root cover the minimal safe cover.

C2 = {{A(x)}, {R(x , y),R ′(z , y)}} is a root cover for qw.r.t. T .

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 56 / 73

Page 84: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Extended cover and extended fragment queries

q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C3 = {f1‖f1, f2‖f0}, with:

f0 = {A(x)}f1 = {R(x , y),R ′(z , y)}f2 = {A(x),R(x , y)}

The extended cover-based FOL reformulation of q w.r.t. C3 is:

qe(x):- qUCQ|f1‖f1(x):- ((R(x, y) ∧ R ′(z , y))

∨R ′(x, y) ∨ B(x))

qUCQf2‖f0(x):- ((A(x) ∧ R(x, y))

∨(A(x) ∧ R ′(x, y))∨(A(x) ∧ B(x)))

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 57 / 73

Page 85: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Extended cover and extended fragment queries

q(x):- A(x) ∧ R(x , y) ∧ R ′(z , y)K = 〈T = {B v ∃R ′,R ′ v R},A = {A(a),B(a)}〉Consider the cover C3 = {f1‖f1, f2‖f0}, with:

f0 = {A(x)}f1 = {R(x , y),R ′(z , y)}f2 = {A(x),R(x , y)}

The extended cover-based FOL reformulation of q w.r.t. C3 is:

qe(x):- qUCQ|f1‖f1(x):- ((R(x, y) ∧ R ′(z , y))

∨R ′(x, y) ∨ B(x))

qUCQf2‖f0(x):- ((A(x) ∧ R(x, y))

∨(A(x) ∧ R ′(x, y))∨(A(x) ∧ B(x)))

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 57 / 73

Page 86: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Search space

Covers

Eq(extended covers)

Lq(safe covers)

Croot

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 58 / 73

Page 87: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Part VI

Performance

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 59 / 73

Page 88: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Performance potential of optimized reformulation

CQ-to-JUCQ query reformulation performance

Given the query

q1(x , y) :- x rdf:type y , (t1)x ub:degreeFrom “http : //www .University532.edu”, (t2)x ub:memberOf “http : //www .Department1.University7.edu” (t3)

and the LUBM 100M benchmark:

JUCQ #reformulations exec. time (ms)

(t1, t2, t3)ref 2, 256 6,387(t1)ref 1 (t2)ref 1 (t3)ref 195 1,074,026

(t1, t2)ref 1 (t3)ref 755 1, 968

(t1)ref 1 (t2, t3)ref 200 846, 710

(t1, t3)ref 1 (t2)ref 568 554(t1, t2)ref 1 (t1, t3)ref 1, 316 2, 734

(t1, t2)ref 1 (t2, t3)ref 764 2, 289

(t1, t3)ref 1 (t2, t3)ref 576 588

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 60 / 73

Page 89: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Datasets and RDBMS engines

DBLP (8 M) and LUBM (1 M and 100 M) millions triples

PostgreSQL 9.3.2

System A

System B

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 61 / 73

Page 90: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Reformulation algorithms

Basic CQ-to-UCQ algorithm

Picked that of [Goasdoue et al., EDBT’13] since it handles thelargest known fragment of RDF.

Comparison:

1 UCQ reformulation

2 SCQ reformulation

3 Greedy JUCQ reformulation

4 Exhaustive JUCQ reformulation

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 62 / 73

Page 91: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Query answering on LUBM 100 M using PostgreSQL

28 queries; 2 to 6 atoms; 1 to 318, 089 reformulations

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 63 / 73

Page 92: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Query answering on LUBM 100 M using System A

28 queries; 2 to 6 atoms; 1 to 318, 089 reformulations

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 64 / 73

Page 93: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Query answering on LUBM 100 M using System B

28 queries; 2 to 6 atoms; 1 to 318, 089 reformulations

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 65 / 73

Page 94: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Optimized reformulation: take-home message

1 Equivalent SQL syntaxes are not equal from the RDBMSoptimizer perspective (inside or outside well-supporteddialect)

2 Chosing the reformulation with the help of textbook costmodel formulas makes queries feasible or efficient when theywere not

3 This amounts to enlarging the optimizer’s “can-do” dialect, ata very modest performance overhead.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 66 / 73

Page 95: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Experiments on DL-LiteR reformulation

1 CQ-to-UCQ reformulation

2 Croot reformulation

3 Our cover-based greedy reformulation

4 Our cover-based exhaustive reformulation (as a yardstick forthe quality of the solution our greedy finds)

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 67 / 73

Page 96: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Query answering on LUBM∃20 15 millions facts datasetusing PostgreSQL

13 queries; 2 to 10 atoms, 5.77 on average; 35 to 667reformulations, 290.2 on average.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 68 / 73

Page 97: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Performance Optimized reformulation experiments on three RDBMSs

Algorithms running time on LUBM∃20 15 millions factsdataset

13 queries; 2 to 10 atoms, 5.77 on average; 35 to 667reformulations, 290.2 on average.

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 69 / 73

Page 98: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Part VII

Conclusion and perspectives

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 70 / 73

Page 99: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

Where we stand

Constraints (a.k.a. semantics) are crucial for applications, so thepush is continuous for chosing “the right constraint language”

We considered semantic query answering through FOL reduction(i.e., SQL).Not any SQL query resulting from reformulation is handled well bycurrent RDBMSs!Vast performance differences between different reformulations ofthe same query; cost-based approach

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 71 / 73

Page 100: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

Where we stand

Constraints (a.k.a. semantics) are crucial for applications, so thepush is continuous for chosing “the right constraint language”We considered semantic query answering through FOL reduction(i.e., SQL).

Not any SQL query resulting from reformulation is handled well bycurrent RDBMSs!Vast performance differences between different reformulations ofthe same query; cost-based approach

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 71 / 73

Page 101: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

Where we stand

Constraints (a.k.a. semantics) are crucial for applications, so thepush is continuous for chosing “the right constraint language”We considered semantic query answering through FOL reduction(i.e., SQL).Not any SQL query resulting from reformulation is handled well bycurrent RDBMSs!

Vast performance differences between different reformulations ofthe same query; cost-based approach

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 71 / 73

Page 102: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

Where we stand

Constraints (a.k.a. semantics) are crucial for applications, so thepush is continuous for chosing “the right constraint language”We considered semantic query answering through FOL reduction(i.e., SQL).Not any SQL query resulting from reformulation is handled well bycurrent RDBMSs!Vast performance differences between different reformulations ofthe same query; cost-based approach

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 71 / 73

Page 103: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

What is ahead

Continuous push on the expressivity - efficiency frontier;updates...

RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints

RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.

“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.

Most attractive languages currently: DL-Lite, Datalog±

New relevance of query optimization literature and research!

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73

Page 104: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

What is ahead

Continuous push on the expressivity - efficiency frontier;updates...

RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints

RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.

“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.

Most attractive languages currently: DL-Lite, Datalog±

New relevance of query optimization literature and research!

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73

Page 105: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

What is ahead

Continuous push on the expressivity - efficiency frontier;updates...

RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints

RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.

“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.

Most attractive languages currently: DL-Lite, Datalog±

New relevance of query optimization literature and research!

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73

Page 106: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

What is ahead

Continuous push on the expressivity - efficiency frontier;updates...

RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints

RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.

“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.

Most attractive languages currently: DL-Lite, Datalog±

New relevance of query optimization literature and research!

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73

Page 107: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

What is ahead

Continuous push on the expressivity - efficiency frontier;updates...

RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints

RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.

“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.

Most attractive languages currently: DL-Lite, Datalog±

New relevance of query optimization literature and research!

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73

Page 108: Database Optimization Techniques for Semantic Queriessebd2015.dia.uniroma3.it/pdf/manolescu.pdf · I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 16 / 73 RDF data model

Open issues and perspectives

What is ahead

Continuous push on the expressivity - efficiency frontier;updates...

RDBMSs are highly efficient for some forms of queries (typicallyconjunctive queries of medium size), not for all FOL reductions ofqueries under constraints

RDBMSs have convenient features (indexing, join order,transactions) that make them attractive back-ends for queryanswering.

“Novel” platforms (MapReduce, NoSQL...) will raise the samequery evaluation performance problems anyway.

Most attractive languages currently: DL-Lite, Datalog±

New relevance of query optimization literature and research!

I. Manolescu Semantic Query Optimization SEBD 2015, Gaeta 72 / 73