Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable...

113
Scalable Multi-Query Optimization for SPARQL Wangchao Le 1 Anastasios Kementsietsidis 2 Songyun Duan 2 Feifei Li 1 1 University of Utah 2 IBM Research April 2, 2012

Transcript of Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable...

Page 1: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Scalable Multi-Query Optimization for SPARQL

Wangchao Le1 Anastasios Kementsietsidis2 Songyun Duan2 Feifei Li1

1University of Utah 2IBM Research

April 2, 2012

Page 2: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Outline

1 Introduction

2 Preliminary

3 Our approach

4 Experiments

5 Conclusions

2 / 113

Page 3: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Outline

1 Introduction

2 Preliminary

3 Our approach

4 Experiments

5 Conclusions

3 / 113

Page 4: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

We are inundated with a large collection of RDF (ResourceDescription Framework) data.

DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics

Available engines to manage RDF data?

RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.

[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.

[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.

[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.

[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.

[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.

[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.

4 / 113

Page 5: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

We are inundated with a large collection of RDF (ResourceDescription Framework) data.

DBpedia, Uniprot, Freebase etc

<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>

<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>

</rdf:Description></rdf:RDF>

Internally ...

A large graph and encode rich semantics

Available engines to manage RDF data?

RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.

[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.

[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.

[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.

[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.

[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.

[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.

5 / 113

Page 6: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

We are inundated with a large collection of RDF (ResourceDescription Framework) data.

DBpedia, Uniprot, Freebase etc

<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>

<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>

</rdf:Description></rdf:RDF>

<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .

Triple format:

Internally ...

subject predicate object

A large graph and encode rich semantics

Available engines to manage RDF data?

RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.

[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.

[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.

[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.

[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.

[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.

[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.

6 / 113

Page 7: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

We are inundated with a large collection of RDF (ResourceDescription Framework) data.

DBpedia, Uniprot, Freebase etc

<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>

<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>

</rdf:Description></rdf:RDF>

<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .

Triple format:

Internally ...

subject predicate object

A large graph and encode rich semantics

Available engines to manage RDF data?

RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.

[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.

[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.

[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.

[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.

[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.

[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.

7 / 113

Page 8: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

We are inundated with a large collection of RDF (ResourceDescription Framework) data.

DBpedia, Uniprot, Freebase etc

<rdf:RDFxmlns:rdf=http://www.w3.org/1999/02/22-rdf-syntax-ns#xmlns:dcterms=”http://purl.org/dc/terms/”>

<rdf:Description rdf:about=”urn:x-states:New York”><dcterms:alternative>NY</dcterms:alternative>

</rdf:Description></rdf:RDF>

<http://. . ./New York> <http://purl.org/dc/terms/alternative> ”NY” .

Triple format:

Internally ...

Query language: SPARQLsubject predicate object

A large graph and encode rich semantics

Available engines to manage RDF data?

RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.

[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.

[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.

[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.

[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.

[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.

[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.

8 / 113

Page 9: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

We are inundated with a large collection of RDF (ResourceDescription Framework) data.

DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics

Available engines to manage RDF data?

RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.

[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.

[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.

[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.

[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.

[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.

[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.

9 / 113

Page 10: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

We are inundated with a large collection of RDF (ResourceDescription Framework) data.

DBpedia, Uniprot, Freebase etcA large graph and encode rich semantics

Available engines to manage RDF data?

RDBMS: Migrate RDF, e.g., Sesame, JenaSDB etc.Generic RDF stores: e.g., RDF3X, JenaTDB etc.

[VP07] D.J. Abadi, et al. Scalable semantic web data management using vertical partitioning. In VLDB, 2007.

[HEX08] C. Weiss, et al. Hexastore: sextuple indexing for semantic web data management. In VLDB, 2008.

[RDF3X] T. Neumann, G. Weikum. RDF-3X: a RISC-style engine for RDF. In VLDB, 2008.

[SJP09] T. Neumann, G. Weikum. Scalable Join Processing on Very Large RDF Graphs. In SIGMOD, 2009.

[BM10] M. Atre, et al. Matrix ”Bit” loaded: A Scalable Lightweight Join Query Processor for RDF Data. In WWW, 2010.

[SSQ11] J. Huang, et al. Scalable SPARQL Querying of Large RDF Graphs. In VLDB, 2011.

10 / 113

Page 11: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

Q3

RDF store

Q1Q2

Qn−1QnSPARQL queries

11 / 113

Page 12: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

Q3

RDF store

Q1Q2

Qn−1Qn

• Observation: queries share common parts• Multi-query optimization

SPARQL queries

12 / 113

Page 13: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]

SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.

[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.

[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.

[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.

[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.

[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.

For SPARQL and RDF, new issues arise in practice.

Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution

13 / 113

Page 14: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]

SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.

[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.

[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.

[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.

[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.

[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.

For SPARQL and RDF, new issues arise in practice.

Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution

14 / 113

Page 15: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]

SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.

[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.

[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.

[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.

[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.

[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.

For SPARQL and RDF, new issues arise in practice.

Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joins

Store dependent solution

15 / 113

Page 16: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Introduction

A tempting choice: turn to MQO in relational databases[MQO88][MQO90][MQO00]

SPARQL↔relational algebra [EPS08][FSR07].Exist quite a few relational solutions for RDF store.

[MQO90] T. Sellis, et al. On the Multiple-Query Optimization Problem. In TKDE, 1990.

[MQO88] T. Sellis, et al. Multiple-query optimization. In TODS, 1988.

[MQO00] P. Roy, et al. Efficient and extensible algorithms for multi query optimization. In SIGMOD, 2000.

[EPS08] R. Angles, et al. The Expressive Power of SPARQL. In ISWC, 2008.

[FSR07] A. Polleres, et al. From SPARQL to rules (and back). In WWW, 2007.

For SPARQL and RDF, new issues arise in practice.

Convert SPARQL to SQL: not all engines use RDBMSConversion to SQL → a large number of joinsStore dependent solution

16 / 113

Page 17: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Outline

1 Introduction

2 Preliminary

3 Our approach

4 Experiments

5 Conclusions

17 / 113

Page 18: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Problem statement.

Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

18 / 113

Page 19: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Type 1: Q := SELECT RD WHERE GP

Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+

Problem statement.

Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

19 / 113

Page 20: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Type 1: Q := SELECT RD WHERE GP

Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+

subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”

(a) triple table D

SELECT ?nameWHERE { ?x name ?name, ?x zip 10001,

}

(b) Example query QOPT

name”Alice””Bob””Ella”

Problem statement.

Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

20 / 113

Page 21: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Type 1: Q := SELECT RD WHERE GP

Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+

subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”

(a) triple table D

SELECT ?name , ?mail , ?hpageWHERE { ?x name ?name, ?x zip 10001,

OPTIONAL {?x mbox ?mail }OPTIONAL {?x www ?hpage }}

(b) Example query QOPT

name”Alice””Bob””Ella”

Problem statement.

Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

21 / 113

Page 22: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Type 1: Q := SELECT RD WHERE GP

Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+

subj pred objp1 name ”Alice”p1 zip 10001p1 mbox alice@homep1 mbox alice@workp1 www http://home/alicep2 name ”Bob”p2 zip 10001p3 name ”Ella”p3 zip 10001p3 www http://work/ellap4 name ”Tim”p4 zip ”11234”

(a) triple table D

SELECT ?name , ?mail , ?hpageWHERE { ?x name ?name, ?x zip 10001,

OPTIONAL {?x mbox ?mail }OPTIONAL {?x www ?hpage }}

(b) Example query QOPT

name mail hpage”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella

(c) Output QOPT(D)

Problem statement.

Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

22 / 113

Page 23: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Type 1: Q := SELECT RD WHERE GP

Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+

Problem statement.

Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

23 / 113

Page 24: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Type 1: Q := SELECT RD WHERE GP

Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+

Problem statement.

Input: a set Q of Type 1 queries and a data graph G

Output: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

24 / 113

Page 25: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Type 1: Q := SELECT RD WHERE GP

Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+

Problem statement.

Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queries

Requirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

25 / 113

Page 26: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Preliminary

We foucs on two types of queries

Type 1: Q := SELECT RD WHERE GP

Type 2: QOPT := SELECT RD WHERE GP (OPTIONAL GPOPT)+

Problem statement.

Input: a set Q of Type 1 queries and a data graph GOutput: a set of rewritten queries, QOPT of Type 1 and Type 2 queriesRequirements:

soundness and completeness: QOPT(G)≡Q (G).

cost:Tr (Q)+Te (Qopt)

Te (Q)≤ 1

26 / 113

Page 27: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

1 Introduction

2 Preliminary

3 Our approach

4 Experiments

5 Conclusions

27 / 113

Page 28: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

28 / 113

Page 29: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

: constant : variable

Q1: SELECT * WHERE{?x1 P1 ?z1 . ?y1 P2 ?z1 . ?y1 P3 ?w1 . ?w1 P4 v1 . }

29 / 113

Page 30: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

30 / 113

Page 31: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,

}

?z?x

?y

P1

P2

(I) Structure only QOPT

31 / 113

Page 32: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,

OPTIONAL {?y P3 ?w , ?w P4 v1 }

}

?z?x

?y

P1

P2

P3

v1 ?wP4

(I) Structure only QOPT

32 / 113

Page 33: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,

OPTIONAL {?y P3 ?w , ?w P4 v1 }OPTIONAL {?t P3 ?x , ?t P5 v1, ?w P4 v1 }

}

?z?x

?y

P1

P2

?t

P3

P5 P3

v1 ?wP4

(I) Structure only QOPT

33 / 113

Page 34: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

SELECT *WHERE { ?x P1 ?z , ?y P2 ?z ,

OPTIONAL {?y P3 ?w , ?w P4 v1 }OPTIONAL {?t P3 ?x , ?t P5 v1, ?w P4 v1 }

}

?z?x

?y

P1

P2

?t

P3

P5 P3

v1 ?wP4

Evaluated once→potential saving

(I) Structure only QOPT

OPTIONALs are evaluated on top of the common substructures

(intermediate results cached by engine).

34 / 113

Page 35: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

pattern p α(p)

?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%

*Max common subquery is not selective

(II) Using cost in optimization

35 / 113

Page 36: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

pattern p α(p)

?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%

*Max common subquery is not selective

(II) Using cost in optimization

36 / 113

Page 37: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Motivating example

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2

P4?w2

?t2

SELECT *WHERE { ?w P4 v1,

OPTIONAL {?x1 P1 ?z1, ?y1 P2 ?z1, ?y1 P3 ?w }OPTIONAL {?x2 P1 ?z2, ?y2 P2 ?z2, ?t2 P3 ?x2, ?t2 P5 v1 }

}

(II) Using cost in optimization

37 / 113

Page 38: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

38 / 113

Page 39: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn} They often do not share one common subquery

39 / 113

Page 40: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

Paritition input queries

Group 1 Group 2 Group k

• Similar queries can be optimized together

40 / 113

Page 41: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

Paritition input queries

Group 1 Group 2 Group k

• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets

41 / 113

Page 42: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

Paritition input queries

Group 1 Group 2 Group k

• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets

Rewriting Rewriting Rewriting

42 / 113

Page 43: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

Paritition input queries

Group 1 Group 2 Group k

• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets

Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries(hierarchically)→ a set of type 2 queries

43 / 113

Page 44: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

Paritition input queries

Group 1 Group 2 Group k

• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets

Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries

• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery

(hierarchically)→ a set of type 2 queries

44 / 113

Page 45: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

Paritition input queries

Group 1 Group 2 Group k

• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets

Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries

• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery

?z?x

?y

P1

P2

?t

P3

P5 P3

v1 ?wP4

pattern p α(p)

?x P1 ?z 30%?y P2 ?z 20%?y P3 ?w 18%?w P4 v1 1%?t P5 v1 2%

(hierarchically)→ a set of type 2 queries

45 / 113

Page 46: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

Paritition input queries

Group 1 Group 2 Group k

• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets

Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries

• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery

Execution

(hierarchically)→ a set of type 2 queries

46 / 113

Page 47: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Q={q1, q2, . . . , qn}

Paritition input queries

Group 1 Group 2 Group k

• Similar queries can be optimized together• finding structure similarity is expensive• group by predicates• distance: Jaccard similarity of predicate sets

Rewriting Rewriting Rewriting • Recursively rewrite a subset of type 1 queries

• finding common edge subgraphs• optimizations to avoid bad efficiency• cost: guard against bad rewritings• approx. by the min selectivity in common subquery

Execution

Result distribution

r(q1) r(q2) r(qn)

(hierarchically)→ a set of type 2 queries

47 / 113

Page 48: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Related issues

Distributing results, i.e., Type 2 query→Type 1 queriesRD of a Type 1 query: e.g., X and Z

↑↓columns from results of the Type 2 rewriting

Soundness and completeness

Extensibility of the solution: more general queries

handle variable predicatesOPTIONAL queries

48 / 113

Page 49: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Related issues

Distributing results, i.e., Type 2 query→Type 1 queries

X Y Z

ab cd ef g

RD of a Type 1 query: e.g., X and Z↑↓

columns from results of the Type 2 rewriting

Soundness and completeness

Extensibility of the solution: more general queries

handle variable predicatesOPTIONAL queries

49 / 113

Page 50: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Related issues

Distributing results, i.e., Type 2 query→Type 1 queries

X Y Z

ab cd ef g

RD of a Type 1 query: e.g., X and Z↑↓

columns from results of the Type 2 rewriting

Soundness and completeness

Extensibility of the solution: more general queries

handle variable predicatesOPTIONAL queries

50 / 113

Page 51: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Related issues

Distributing results, i.e., Type 2 query→Type 1 queries

X Y Z

ab cd ef g

RD of a Type 1 query: e.g., X and Z↑↓

columns from results of the Type 2 rewriting

Soundness and completeness

Extensibility of the solution: more general queries

handle variable predicatesOPTIONAL queries

51 / 113

Page 52: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Implementation highlights

C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory

Dataset

Extend LUBM benchmark generator:randomness in structure, variances of sel.

RDF stores: Jena TDB 0.85 etc

Queries

Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO

52 / 113

Page 53: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Implementation highlights

C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory

Dataset

Extend LUBM benchmark generator:randomness in structure, variances of sel.

1 5 10 15 20 25 30 35 40 45 50

10−1

100

101

Se

lectivity (

%)

Predicate ID

Selective Non−selective

RDF stores: Jena TDB 0.85 etc

Queries

Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO

53 / 113

Page 54: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Implementation highlights

C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory

Dataset

Extend LUBM benchmark generator:randomness in structure, variances of sel.

RDF stores: Jena TDB 0.85 etc

Queries

Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO

54 / 113

Page 55: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Implementation highlights

C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory

Dataset

Extend LUBM benchmark generator:randomness in structure, variances of sel.

RDF stores: Jena TDB 0.85 etc

QueriesEnsure randomness in structure, e.g., star, chain and circle

Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO

55 / 113

Page 56: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Implementation highlights

C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory

Dataset

Extend LUBM benchmark generator:randomness in structure, variances of sel.

RDF stores: Jena TDB 0.85 etc

QueriesParameter Symbol Default RangeDataset size D 4M 3M to 9MNumber of queries |Q| 100 60 to 160Size of query (num of triple patterns) |Q| 6 5 to 9Number of seed queries κ 6 5 to 10Size of seed queries |qcmn| ∼ |Q|/2 1 to 5Max selectivity of patterns in Q αmax (Q) random 0.1% to 4%Min selectivity of patterns in Q αmin(Q) 1% 0.1% to 4%

Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO

56 / 113

Page 57: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Implementation highlights

C++64-bit Linux, 2GHz Xeon(R) CPU, 4GB memory

Dataset

Extend LUBM benchmark generator:randomness in structure, variances of sel.

RDF stores: Jena TDB 0.85 etc

QueriesParameter Symbol Default RangeDataset size D 4M 3M to 9MNumber of queries |Q| 100 60 to 160Size of query (num of triple patterns) |Q| 6 5 to 9Number of seed queries κ 6 5 to 10Size of seed queries |qcmn| ∼ |Q|/2 1 to 5Max selectivity of patterns in Q αmax (Q) random 0.1% to 4%Min selectivity of patterns in Q αmin(Q) 1% 0.1% to 4%

Rewriting w/ structure: MQO-S; rewriting w/ structure and cost: MQO

57 / 113

Page 58: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Time on rewritingMQO-S-C: structure based rewriting

MQO-C: rewriting integrating with cost

58 / 113

Page 59: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Time on rewritingMQO-S-C: structure based rewriting

MQO-C: rewriting integrating with cost

60 80 100 120 140 1600

0.25

0.5

0.75

1

1.25

1.5

Tim

e (s

econ

ds)

MQO−S−C MQO−C

|Q|

*Costly/bad rewritings are rejected→more rounds of comparisons.

59 / 113

Page 60: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Time on distributing resultsMQO-S-P: parsing results from MQO-S

MQO-P: parsing results with MQO

60 / 113

Page 61: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Time on distributing resultsMQO-S-P: parsing results from MQO-S

MQO-P: parsing results with MQO

60 80 100 120 140 1600

0.25

0.5

0.75

1

1.25

1.5

Tim

e (s

econ

ds)

MQO−S−P MQO−P

|Q|

*Non-selective common subqueriesincrease the set of results.

61 / 113

Page 62: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Time on distributing resultsMQO-S-P: parsing results from MQO-S

MQO-P: parsing results with MQO

60 80 100 120 140 1600

0.25

0.5

0.75

1

1.25

1.5

Tim

e (s

econ

ds)

MQO−S−P MQO−P

|Q|

*Non-selective common subqueriesincrease the set of results.

*Both rewriting and parsingare efficiently doable

62 / 113

Page 63: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Varying num of queries in a batchNo-MQO: no optimization

MQO-S: optimization based on structural rewriting

MQO: integrating cost

63 / 113

Page 64: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Varying num of queries in a batchNo-MQO: no optimization

MQO-S: optimization based on structural rewriting

MQO: integrating cost

60 80 100 120 140 1600

50

100

150

Num

ber

of q

uerie

s

No−MQO MQO−S MQO

|Q|

*Both reduce the num ofqueries to be exectued

64 / 113

Page 65: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Varying num of queries in a batchNo-MQO: no optimization

MQO-S: optimization based on structural rewriting

MQO: integrating cost

60 80 100 120 140 1600

100

200

300

400

Tim

e (s

econ

ds)

No−MQO MQO−S MQO

|Q|65 / 113

Page 66: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Varying min. selectivity in seed queriesNo-MQO: no optimization

MQO-S: optimization based on structural rewriting

MQO: integrating cost

66 / 113

Page 67: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Varying min. selectivity in seed queriesNo-MQO: no optimization

MQO-S: optimization based on structural rewriting

MQO: integrating cost

0.1 0.5 1 2 40

25

50

75

100

Num

ber

of q

uerie

s

No−MQO MQO−S MQO

αmin(qcmn) (%)

*MQO: reject more bad rewritings;MQO-S: not sensitive

67 / 113

Page 68: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Varying min. selectivity in seed queriesNo-MQO: no optimization

MQO-S: optimization based on structural rewriting

MQO: integrating cost

0.1 0.5 1 2 40

100

200

300

Tim

e (s

econ

ds)

No−MQO MQO−S MQO

αmin(qcmn) (%)68 / 113

Page 69: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Varying seed sizeMQO-S: optimization based on structural rewriting

MQO: integrating cost

percentage=Te (common subquery)Te (Qopt )

× 100%

69 / 113

Page 70: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Experiments

Varying seed sizeMQO-S: optimization based on structural rewriting

MQO: integrating cost

percentage=Te (common subquery)Te (Qopt )

× 100%

1 2 3 4 5

60

80

100

Per

cent

age

(%)

MQO−S MQO

|qcmn|

No−OPTIONAL− − −

*MQO-S: up to 25% time on optional

70 / 113

Page 71: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Conclusions

In dealing RDF data on the Web, store independency is important

Combining SPARQL language and graph algorithms can achieveMQO, i.e., by rewriting queries

Cost must be taken in consideration during rewriting

71 / 113

Page 72: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

The End

Thank You

Q and A

72 / 113

Page 73: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Partition input queries

Object: similar queries can be optimized together in rewriting

Represent each query as a set of predicates.

Measure the similarity of a pair of queries by set similarity

Grouping: k-means

73 / 113

Page 74: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Partition input queries

Object: similar queries can be optimized together in rewriting

Represent each query as a set of predicates.

Measure the similarity of a pair of queries by set similarity

Grouping: k-means

74 / 113

Page 75: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Partition input queries

Object: similar queries can be optimized together in rewriting

Represent each query as a set of predicates.

Measure the similarity of a pair of queries by set similarity

Grouping: k-means

75 / 113

Page 76: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Partition input queries

Object: similar queries can be optimized together in rewriting

p1

p2 p3

p1

p3p2 p4

p1

p2 p3

p7

p8p9

p7

p7

p8

p9 p9

p8

p7p10

p2

p1

p1p3

p4

p4

p2p3

p3

p2p2 p1

p1p3

p4 p1

p2

p4

p2

p1p3

p7

p9p8

p9

p8p6

p9

p7p7

q1 q2 q3 q4 q5 q6

q12q11q10q9q8q7

p4

Represent each query as a set of predicates.

Measure the similarity of a pair of queries by set similarity

Grouping: k-means

76 / 113

Page 77: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Partition input queries

Object: similar queries can be optimized together in rewriting

p1

p2 p3

p1

p3p2 p4

p1

p2 p3

p7

p8p9

p7

p7

p8

p9 p9

p8

p7p10

p2

p1

p1p3

p4

p4

p2p3

p3

p2p2 p1

p1p3

p4 p1

p2

p4

p2

p1p3

p7

p9p8

p9

p8p6

p9

p7p7

q1 q2 q3 q4 q5 q6

q12q11q10q9q8q7

p4

Distance: Jaccard similarity on predicates

Represent each query as a set of predicates.

Measure the similarity of a pair of queries by set similarity

Grouping: k-means

77 / 113

Page 78: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Partition input queries

Object: similar queries can be optimized together in rewriting

p1

p2 p3

p1

p3p2 p4

p1

p2 p3

p7

p8p9

p2

p1

p1p3

p4

p4

p2p3

p3

p2p2 p1

p1p3

p4 p1

p2

p4

p2

p1p4

p5p6

p9

p7p7

{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}

{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}

q1 q2 q3 q4 q5 q6

q12q11q10q9q8q7

p4

p7

p7

p8

p9 p9

p8

p7p10

p7

p9p8

p9

Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.

Measure the similarity of a pair of queries by set similarity

Grouping: k-means

78 / 113

Page 79: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Partition input queries

Object: similar queries can be optimized together in rewriting

p1

p2 p3

p1

p3p2 p4

p1

p2 p3

p7

p8p9

p2

p1

p1p3

p4

p4

p2p3

p3

p2p2 p1

p1p3

p4 p1

p2

p4

p2

p1p4

p5p6

p9

p7p7

{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}

{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}

q1 q2 q3 q4 q5 q6

q12q11q10q9q8q7

p4

p7

p7

p8

p9 p9

p8

p7p10

p7

p9p8

p9

Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.

Measure the similarity of a pair of queries by set similarity

Grouping: k-means

79 / 113

Page 80: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Partition input queries

Object: similar queries can be optimized together in rewriting

p1

p2 p3

p1

p3p2 p4

p1

p2 p3

p7

p8p9

p2

p1

p1p3

p4

p4

p2p3

p3

p2p2 p1

p1p3

p4 p1

p2

p4

p2

p1p4

p5p6

p9

p7p7

{p1,p2,p3} {p1,p2,p3} {p1,p2,p3} {p7,p8,p9} {p7,p8,p9} {p7,p8,p9,p10}

{p1,p2,p3,p4} {p2,p3,p4} {p1,p2,p3,p4} {p1,p2,p4} {p7,p8,p9} {p5,p6,p7,p9}

q1 q2 q3 q4 q5 q6

q12q11q10q9q8q7

p4

p7

p7

p8

p9 p9

p8

p7p10

p7

p9p8

p9

Distance: Jaccard similarity on predicatesRepresent each query as a set of predicates.

Measure the similarity of a pair of queries by set similarity

Grouping: k-means

80 / 113

Page 81: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewriting → finding maximal common triple patterns

In the language of graph . . .

[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs

81 / 113

Page 82: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

qab qcd

qabcd

qa qb qc qd . . . . . . . . . . . . qr qs

Rewriting → finding maximal common triple patterns

In the language of graph . . .

[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs

82 / 113

Page 83: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcd

Rewriting → finding maximal common triple patterns

In the language of graph . . .

[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs

83 / 113

Page 84: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

Pair up queries with max Jaccard similarity

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcd

Rewriting → finding maximal common triple patterns

In the language of graph . . .

[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs

84 / 113

Page 85: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

Pair up queries with max Jaccard similarity

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcdrewrite

Rewriting → finding maximal common triple patterns

In the language of graph . . .

[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs

85 / 113

Page 86: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

Pair up queries with max Jaccard similarity

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcdrewrite

Rewriting → finding maximal common triple patterns

In the language of graph . . .

[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs

86 / 113

Page 87: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

Pair up queries with max Jaccard similarity

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcdrewrite

Rewriting → finding maximal common triple patterns

In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs

87 / 113

Page 88: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

Pair up queries with max Jaccard similarity

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcdrewrite

Rewriting → finding maximal common triple patterns

In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs

88 / 113

Page 89: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

Pair up queries with max Jaccard similarity

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcdrewrite

Rewriting → finding maximal common triple patterns

In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs→ maximal common connected induced sugraphs in linegraphs

89 / 113

Page 90: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Hierarchical rewriting and clustering (inside a group)

Rewrite pairs of queries bottom up

Pair up queries with max Jaccard similarity

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcdrewrite

Rewriting → finding maximal common triple patterns

In the language of graph . . .[CLQ01] I. Koch. Enumerating all connected maximal common subgraphs in two graphs. In Theoretical Computer Science, 2001.

maximal common connected edge subgraphs→ maximal common connected induced sugraphs in linegraphs→ maximal cliques in the product graph

90 / 113

Page 91: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

91 / 113

Page 92: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

92 / 113

Page 93: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

93 / 113

Page 94: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

• product graph: simultaneous walk

94 / 113

Page 95: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

• product graph: simultaneous walk

95 / 113

Page 96: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

• product graph: simultaneous walk

graph query pattern 1 graph query pattern 2

The triangle (clique) highlights the common subgraph composed by

96 / 113

Page 97: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

• product graph: simultaneous walk

• blowup in size, esp. > 2 queries

affect clique detection

graph query pattern 1 graph query pattern 2

The triangle (clique) highlights the common subgraph composed by

97 / 113

Page 98: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

• product graph: simultaneous walk

• blowup in size, esp. > 2 queries

affect clique detection

• optimize the product graph

graph query pattern 1 graph query pattern 2

The triangle (clique) highlights the common subgraph composed by

98 / 113

Page 99: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

• product graph: simultaneous walk

• blowup in size, esp. > 2 queries

affect clique detection

• optimize the product graph

• prune non-common predicates• check the constants

99 / 113

Page 100: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

• product graph: simultaneous walk

• blowup in size, esp. > 2 queries

affect clique detection

• optimize the product graph

• prune non-common predicates• check the constants• prune vertices with non-common

neighborhoods

100 / 113

Page 101: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

?z4?x4

?y4

P1

P2

v1

?z3?x3

?y3

P1

P2P3

P5

v1

?z2?x2

?y2

P1

P2P3

P5

?w1v1

?z1?x1

?y1

P1

P2

P4

P3

(a) Query Q1 (b) Query Q2 (c) Query Q3 (d) Query Q4

P4 P4?w2

?t2

?w3

?u4

P4?w4

P3

P6

v1

• Linegraph: invert vertices and edges

• sub−sub:`0, sub−obj:`1, obj−sub:`2, obj−obj:`3

P2

P1

P3

P4

`3

`3

`0

`0

`1

`2

P5

P1 P3

`3

`3`3

`0

P2

`3

`1`2

`1

`2

(a) L(Q1) (b) L(Q2) (c) L(Q3) (d) L(Q4)

`0

P2P1

P3 P5

P4

`3

`3

`2 `1`0

`0

`3`3

P4

`2 `1

P1P2

P3 P6

P4

`3

`3

`0 `0`3

`3

`0`0

• product graph: simultaneous walk

• blowup in size, esp. > 2 queries

affect clique detection

• optimize the product graph

• prune non-common predicates• check the constants• prune vertices with non-common

neighborhoods

`3

`3

P2P1

L(GPp):

S: P3 P4

101 / 113

Page 102: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.

[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.

Integrate cost into rewriting

Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting

102 / 113

Page 103: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.

[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.

Integrate cost into rewriting

Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting

103 / 113

Page 104: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.

[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.

Integrate cost into rewriting

Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting

104 / 113

Page 105: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.

[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.

Integrate cost into rewriting

Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritings

Measure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting

105 / 113

Page 106: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.

[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.

Integrate cost into rewriting

Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximation

Cost: discard bad rewritings, keep good ones in hierarchical rewriting

106 / 113

Page 107: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.

[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.

Integrate cost into rewriting

Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting

107 / 113

Page 108: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.

[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.

Integrate cost into rewriting

Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting

108 / 113

Page 109: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Find maximal cliques in the product graph [CLQ02][CLQ03][CLQ02] Patric R.J. Ostergard. A fast algorithm for the maximum clique problem. In Discrete Applied Mathematics, 2002.

[CLQ03] E. Tomita et al. An efficient branch-and-bound algorithm for finding a maximum clique. In Discrete Mathematics andTheoretical Computer Science, LNCS, 2003.

Integrate cost into rewriting

Structure: maximize size of the common subquery in a rewritingEvaluation on cost: guard against bad rewritingsMeasure: min selectivity in the common subquery for approximationCost: discard bad rewritings, keep good ones in hierarchical rewriting

qa qb qc qd . . . . . . . . . . . . qr qs

qab qcd

qabcd

Stop when the selectivitydrops

109 / 113

Page 110: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Related issues

Distributing results, i.e., Type 2 query→Type 1 queriesRD of a Type 1 query

↑↓columns from results of the Type 2 rewriting

Soundness and completeness

Extensibility of the solution: more general queries

handle variable predicatesnested OPTIONALs

110 / 113

Page 111: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Related issues

Distributing results, i.e., Type 2 query→Type 1 queries

name mail hpage

”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella

RD of a Type 1 query

↑↓columns from results of the Type 2 rewriting

Soundness and completeness

Extensibility of the solution: more general queries

handle variable predicatesnested OPTIONALs

111 / 113

Page 112: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Related issues

Distributing results, i.e., Type 2 query→Type 1 queries

name mail hpage

”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella

RD of a Type 1 query

↑↓columns from results of the Type 2 rewriting

Soundness and completeness

Extensibility of the solution: more general queries

handle variable predicatesnested OPTIONALs

112 / 113

Page 113: Scalable Multi-Query Optimization for SPARQL - cs.utah.edulifeifei/papers/mqoslides.pdf · Scalable Multi-Query Optimization for SPARQL Wangchao Le1 Anastasios Kementsietsidis2 Songyun

Our approach

Related issues

Distributing results, i.e., Type 2 query→Type 1 queries

name mail hpage

”Alice” alice@home http://home/alice”Alice” alice@work http://home/alice”Bob””Ella” http://work/ella

RD of a Type 1 query

↑↓columns from results of the Type 2 rewriting

Soundness and completeness

Extensibility of the solution: more general queries

handle variable predicatesnested OPTIONALs

113 / 113