Approximate and Incremental Processing of Complex Queries against the Web of Data

KIT University of the State of Baden-Württemberg and National Large-scale Research Center of the Helmholtz Association Institute of Applied Informatics and Formal Description Methods (AIFB) www.kit.edu Approximate and Incremental Processing of Complex Queries against the Web of Data Thanh Tran, Günter Ladwig, Andreas Wagner DEXA 2011


Page 1: Approximate and Incremental Processing of Complex Queries against the Web of Data

KIT – University of the State of Baden-Württemberg and

National Large-scale Research Center of the Helmholtz Association

Institute of Applied Informatics and Formal Description Methods (AIFB)

www.kit.edu

Approximate and Incremental Processing of Complex Queries against the Web of Data

Thanh Tran, Günter Ladwig, Andreas Wagner

DEXA 2011


Institute of Applied Informatics and Formal Description Methods (AIFB), August 31st, 2011

Contents

Introduction

Overview: Approximate & Incremental Processing

Entity Search

Approximate Structure Matching

Structure-based Result Refinement and Computation

Evaluation

Conclusion

DEXA 2011, Toulouse, France


INTRODUCTION


Introduction – Data Model

Resource Description Framework (RDF)


[Figure: example RDF data graph with entities p1–p5, i1, i2, u1, u2, a1, a2, c1 and c2, connected by the relation edges supervises, knows, worksAt, authorOf, partOf and conference, plus name attribute edges (e.g., nameP2, nameP5, U1)]
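The rest of the talk relies on one distinction in this data model: attribute edges point from an entity to a literal value, and relation edges connect two entities. The sketch below stores an RDF graph as a set of (subject, predicate, object) triples and separates the two kinds of edge. The triples are hypothetical and only loosely modeled on the figure. The `LiteralValue` wrapper used to mark literals is also our own assumed encoding.

```python
from typing import NamedTuple, Union

class LiteralValue(NamedTuple):
    """Wraps a literal object such as a name or an age (assumed encoding)."""
    value: Union[str, int]

Triple = tuple[str, str, Union[str, LiteralValue]]

# Hypothetical triples, loosely modeled on the example data graph.
TRIPLES: set[Triple] = {
    ("p2", "supervises", "p1"),
    ("p1", "worksAt", "i1"),
    ("i1", "partOf", "u1"),
    ("p1", "authorOf", "a1"),
    ("a1", "conference", "c1"),
    ("p1", "age", LiteralValue(29)),
    ("i1", "name", LiteralValue("AIFB")),
    ("u1", "name", LiteralValue("KIT")),
    ("c1", "name", LiteralValue("ICDE")),
}

def attribute_edges(triples: set[Triple]) -> set[Triple]:
    """Edges from an entity to a literal value."""
    return {t for t in triples if isinstance(t[2], LiteralValue)}

def relation_edges(triples: set[Triple]) -> set[Triple]:
    """Edges between two entities."""
    return {t for t in triples if not isinstance(t[2], LiteralValue)}

print(len(attribute_edges(TRIPLES)), len(relation_edges(TRIPLES)))  # 4 5
```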


Introduction – Query Model

Basic Graph Patterns

Conjunctive queries over RDF data: graph pattern matching


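[Figure: example query graph with variables x, y, z, u, v and w; relation edges worksAt, author, conf, supervise and partOf; attribute constraints age 29, name AIFB, name KIT and name ICDE]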
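A basic graph pattern is a set of triple patterns whose variables must be bound consistently. Its answers are the variable bindings under which every pattern becomes a triple of the data. The brute-force backtracking matcher below illustrates only these semantics, not the evaluation strategy proposed in the talk. The `?` prefix for variables and the toy data are our own assumptions.

```python
from typing import Iterator, Optional

Triple = tuple[str, str, str]

def is_var(term: str) -> bool:
    """Query variables carry a leading '?' (assumed convention)."""
    return term.startswith("?")

def match_bgp(patterns: list[Triple], data: set[Triple],
              binding: Optional[dict] = None) -> Iterator[dict]:
    """Yield every binding under which all triple patterns occur in `data`."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in data:
        extended, ok = dict(binding), True
        for term, value in zip(first, triple):
            if is_var(term):
                # A variable must agree with the value it is already bound to.
                if extended.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:  # constants must match exactly
                ok = False
                break
        if ok:
            yield from match_bgp(rest, data, extended)

# Hypothetical data and a three-pattern query.
DATA = {
    ("p1", "worksAt", "i1"), ("i1", "name", "AIFB"), ("p1", "age", "29"),
    ("p3", "worksAt", "i2"), ("i2", "name", "Other"), ("p3", "age", "29"),
}
QUERY = [("?x", "worksAt", "?z"), ("?z", "name", "AIFB"), ("?x", "age", "29")]
print(list(match_bgp(QUERY, DATA)))  # [{'?x': 'p1', '?z': 'i1'}]
```

Exhaustive matching like this is exactly what becomes expensive on large graphs. That cost is the motivation for the approximate pipeline below.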


Contribution

Techniques for matching (basic) query patterns against graph-structured data have limits

We might wish to trade completeness and exactness for responsiveness


Our approach allows an “affordable” computation of an initial set of approximate results, which can be incrementally refined as needed.


Contribution – Pipeline Overview

Pipeline of operations in which approximate results are refined incrementally


[Figure: pipeline Entity Search → Approximate Structure Matching → Structure-based Result Refinement → Structure-based Answer Computation, supported by the Entity & Neighborhood Index, the Structure Index and the Relation Index, and producing intermediate, approximate results]
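The incremental behaviour of this pipeline can be sketched as a generator. It applies one refinement step after another and hands every intermediate result to the caller, who may stop as soon as the approximation is good enough. The stage functions in the example are hypothetical placeholders that prune arbitrary candidates. The only property the sketch relies on is the one the slides state for every step: a refinement keeps a subset of its input.

```python
from typing import Callable, Iterable, Iterator

Stage = Callable[[frozenset], frozenset]

def incremental_pipeline(
    initial: frozenset, stages: Iterable[tuple[str, Stage]]
) -> Iterator[tuple[str, frozenset]]:
    """Yield (step name, results) after entity search and after every refinement."""
    results = initial
    yield "entity search", results
    for name, stage in stages:
        refined = stage(results)
        # Every step only prunes candidates; it must never add new ones.
        if not refined <= results:
            raise ValueError(f"{name} returned results outside its input")
        results = refined
        yield name, results

# Hypothetical candidates and hypothetical pruning decisions per stage.
candidates = frozenset({"p1", "p3", "p5", "p6"})
stages = [
    ("approximate structure matching", lambda r: r - {"p6"}),
    ("structure-based result refinement", lambda r: r - {"p5"}),
    ("structure-based answer computation", lambda r: r - {"p3"}),
]
for name, res in incremental_pipeline(candidates, stages):
    print(name, sorted(res))  # a caller could `break` here once satisfied
```

Because it is a generator, later stages never run unless the caller asks for them. This mirrors the idea that users decide whether exact results are worth the extra cost.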


ENTITY SEARCH


Entity Search

Entity index

Stores attribute edges of the data graph

Enables lookup of entities by attribute and value

Entity search

Obtains candidate bindings for all query variables that have attribute edges

Does not consider structure (i.e., relations between entities)

Query decomposition and transformation

Decompose the query into entity queries to create a transformed query
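As a minimal sketch of the entity index, assume an in-memory inverted index from (attribute, value) pairs to entity IDs. The slide does not describe the actual index structure. An entity query with several attribute constraints on the same variable is answered by intersecting the posting sets. As stated above, relation edges are ignored at this stage. The function names and the toy data are hypothetical.

```python
from collections import defaultdict

def build_entity_index(attribute_triples):
    """Map each (attribute, value) pair to the set of entities that have it."""
    index = defaultdict(set)
    for entity, attribute, value in attribute_triples:
        index[(attribute, value)].add(entity)
    return index

def entity_search(index, constraints):
    """Candidate bindings for one entity query.

    `constraints` lists (attribute, value) pairs that all apply to the same
    query variable; a candidate must satisfy every one of them.
    """
    postings = [index.get((a, v), set()) for a, v in constraints]
    return set.intersection(*postings) if postings else set()

# Hypothetical attribute edges.
ATTRIBUTES = [
    ("p1", "age", 29), ("p3", "age", 29), ("p2", "age", 41),
    ("i1", "name", "AIFB"), ("u1", "name", "KIT"), ("c1", "name", "ICDE"),
]
index = build_entity_index(ATTRIBUTES)
print(sorted(entity_search(index, [("age", 29)])))      # ['p1', 'p3']
print(sorted(entity_search(index, [("name", "KIT")])))  # ['u1']
```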


Query Decomposition & Transformation


Identify entity queries

Breadth-first search starting from a random variable

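[Figure: example query graph with variables x, y, z, u, v and w; relation edges worksAt, author, conf, supervise and partOf; attribute constraints age 29, name AIFB, name KIT and name ICDE]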


Query Decomposition & Transformation


[Figure: original query graph and transformed query, in which the entity queries x (age 29), z (name AIFB), u (name KIT) and v (name ICDE) are collapsed into single nodes that remain linked with y and w via worksAt, author, conf, supervise and partOf]

Collapse entity queries
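Here is one reading of the decomposition step, sketched under our own assumptions. We traverse the query graph breadth-first from a randomly chosen variable. Every variable that carries attribute edges yields an entity query made of those constraints. The relation edges between variables remain as the skeleton of the transformed query. The query encoding and the edge directions in the example are our own interpretation of the figure. The function name `decompose` is hypothetical.

```python
import random
from collections import defaultdict, deque

def decompose(attribute_atoms, relation_atoms, seed=None):
    """Split a query into entity queries plus the remaining relation atoms.

    attribute_atoms: (variable, attribute, value) triples
    relation_atoms:  (variable, predicate, variable) triples
    Assumes the query graph is connected.
    """
    constraints = defaultdict(list)
    for var, attr, value in attribute_atoms:
        constraints[var].append((attr, value))

    # Undirected adjacency over relation edges, used only for the traversal.
    neighbours = defaultdict(set)
    for s, _, o in relation_atoms:
        neighbours[s].add(o)
        neighbours[o].add(s)

    variables = sorted(set(neighbours) | set(constraints))
    start = random.Random(seed).choice(variables)
    entity_queries, seen, queue = {}, {start}, deque([start])
    while queue:
        var = queue.popleft()
        if constraints[var]:  # variable has attribute edges -> entity query
            entity_queries[var] = constraints[var]
        for nxt in sorted(neighbours[var]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Collapsed entity queries are linked only by the relation atoms.
    return entity_queries, list(relation_atoms)

ATTR = [("x", "age", 29), ("z", "name", "AIFB"),
        ("u", "name", "KIT"), ("v", "name", "ICDE")]
REL = [("x", "worksAt", "z"), ("z", "partOf", "u"), ("x", "author", "y"),
       ("y", "conf", "v"), ("w", "supervise", "x")]
entity_queries, skeleton = decompose(ATTR, REL, seed=0)
print(sorted(entity_queries))  # ['u', 'v', 'x', 'z']
```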


Entity Search Results

Use the entity index to obtain bindings for all entity queries in the transformed query

Entity queries are necessary conditions, but not sufficient ones

The final results will be a subset of these bindings


x z u v

p1 i1 u1 c1

p3 i1 u1 c1

p5 i1 u1 c1

p6 i1 u1 c1

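[Figure: transformed query with the entity queries x (age 29), z (name AIFB), u (name KIT) and v (name ICDE), the variables y and w, and the relation edges worksAt, author, conf, supervise and partOf]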


APPROXIMATE STRUCTURE MATCHING


Approximate Structure Matching

So far, only the entity parts of the query have been matched

Relation edges have yet to be processed

Instead of performing exact equijoins, we propose to perform a neighborhood join

A neighborhood join allows us to check whether two entities are connected via relation edges (but not via which ones)

Again: necessary, but not sufficient


The k-neighborhood of an entity e is the set of entities in the data graph that can be reached from e via a path of relation edges of length k or less.

A neighborhood join between two sets of entities E1, E2 is an equijoin between all pairs e1 ∈ E1, e2 ∈ E2, where e1 and e2 are considered equivalent if the intersection of their k-neighborhoods is non-empty.
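The two definitions translate almost directly into code. This sketch computes k-neighborhoods with a depth-bounded breadth-first search over relation edges. It then evaluates the neighborhood join exactly as defined, by testing whether the neighborhoods of each pair intersect. Two choices are our own assumptions: edges are treated as undirected, and each neighborhood includes the entity itself at distance 0. The graph is hypothetical.

```python
from collections import defaultdict, deque

def k_neighborhood(adj, start, k):
    """Entities reachable from `start` via at most k relation edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond distance k
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)

def neighborhood_join(adj, e1s, e2s, k):
    """All pairs (e1, e2) whose k-neighborhoods share at least one entity."""
    hood = {e: k_neighborhood(adj, e, k) for e in set(e1s) | set(e2s)}
    return {(a, b) for a in e1s for b in e2s if hood[a] & hood[b]}

# Hypothetical relation edges, stored as undirected adjacency.
edges = [("p1", "i1"), ("i1", "u1"), ("p5", "i2")]
adj = defaultdict(set)
for s, o in edges:
    adj[s].add(o)
    adj[o].add(s)

print(neighborhood_join(adj, {"p1", "p5"}, {"u1"}, k=1))  # {('p1', 'u1')}
```

Under this intersection semantics with undirected edges, e1 and e2 join exactly when they are at most 2k edges apart. In the example, p1 and u1 are two hops apart and share i1 in their 1-neighborhoods.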


Neighborhood Join via Bloom Filters

For each entity, we store the set of its k-neighborhood entities as a Bloom filter

Bloom filter

Space-efficient, probabilistic data structure for set-membership tests

False positives are possible (false negatives are not)

We refine the results of the previous step

To perform a neighborhood join between bindings E1, E2

Load the Bloom filters for one set of entities, say E1

In a nested-loop manner, check whether the entities in E2 are contained in these Bloom filters
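Here is a self-contained sketch of the Bloom-filter variant. Several parameters are our own assumptions, not values from the talk: the filter size `m`, the number of hash functions `h`, and the use of salted SHA-256 digests to derive bit positions. Following the slide, each entity's filter is precomputed, the filters for E1 are loaded, and every entity of E2 is probed in a nested loop. Because a hit may be a false positive, the join can only over-approximate. It never loses a true pair.

```python
import hashlib

class BloomFilter:
    """A basic Bloom filter over strings: m bits and h hash functions."""

    def __init__(self, m: int = 1024, h: int = 3):
        self.m, self.h = m, h
        self.bits = 0  # a Python int used as the bit array

    def _positions(self, item: str):
        # Derive h bit positions from salted SHA-256 digests.
        for i in range(self.h):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits >> pos & 1 for pos in self._positions(item))

def build_filters(neighborhoods):
    """Precompute one filter per entity (stands in for the stored index)."""
    filters = {}
    for entity, hood in neighborhoods.items():
        bf = BloomFilter()
        for n in hood:
            bf.add(n)
        filters[entity] = bf
    return filters

def bloom_neighborhood_join(filters, e1s, e2s):
    """Load the filters of E1, then probe every entity of E2 (nested loop)."""
    return {(e1, e2) for e1 in e1s for e2 in e2s if e2 in filters[e1]}

# Hypothetical precomputed k-neighborhoods.
NEIGHBORHOODS = {"p1": {"p1", "i1", "p2"}, "p5": {"p5", "i2"}}
filters = build_filters(NEIGHBORHOODS)
pairs = bloom_neighborhood_join(filters, ["p1", "p5"], ["i1", "p2", "u9"])
# pairs contains ('p1', 'i1') and ('p1', 'p2'); any extra pair is a false positive
```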


Neighborhood Join via Bloom Filters

Load the Bloom filters for the entities bound to x

Check whether the entities bound to w, y and z are in the neighborhood of x

When k = 2, the Bloom filters for x also cover u and v

[Figure: example query graph with the k=1 and k=2 neighborhoods of x highlighted]


STRUCTURE-BASED RESULT REFINEMENT


Structure-based Result Refinement

From approximate structure matching (ASM), we know that the entities in the intermediate results are connected

With structure-based result refinement, we find out whether they are connected via the paths captured by the query atoms

The query is matched against a structure index graph

Bisimulation-based summary of the data graph that captures structural information

Nodes in the data graph with the same “structure” are grouped together

Much smaller than the data graph


Necessary, but not sufficient.


Structure Index


[Figure: data graph G and its structure index graph G~ obtained by bisimulation; the index nodes E1–E6 group data nodes with the same structure, e.g. {p1, p3}, {p2, p4}, {i1, i2}, {u1, u2}, {a1, a2}, {c1, c2} and {p5}]
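A grouping like the one in the figure can be computed by partition refinement. This sketch uses a forward (outgoing-edge) bisimulation of our own choosing: two nodes are equivalent when they have outgoing edges with the same labels into equivalent blocks. The loop refines naively until the partition stops changing. The exact bisimulation notion and algorithm behind the index in the talk may differ, and the edges below are hypothetical. Each resulting block corresponds to the extension of one index node.

```python
from collections import defaultdict

def bisimulation_blocks(nodes, edges):
    """Partition nodes by forward bisimulation via naive partition refinement.

    edges: (source, label, target) triples. Returns a dict node -> block id.
    """
    out = defaultdict(set)
    for s, label, t in edges:
        out[s].add((label, t))
    block = {n: 0 for n in nodes}  # start with one block containing everything
    while True:
        # Signature: current block plus the set of (label, target block) pairs.
        sig = {n: (block[n], frozenset((lab, block[t]) for lab, t in out[n]))
               for n in nodes}
        ids = {}
        new_block = {n: ids.setdefault(sig[n], len(ids)) for n in nodes}
        # The new partition refines the old one, so equal counts mean stability.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block

EDGES = [
    ("p2", "supervises", "p1"), ("p4", "supervises", "p3"),
    ("p1", "worksAt", "i1"), ("p3", "worksAt", "i2"),
    ("i1", "partOf", "u1"), ("i2", "partOf", "u2"),
]
NODES = sorted({n for s, _, t in EDGES for n in (s, t)})
groups = defaultdict(list)
for node, b in bisimulation_blocks(NODES, EDGES).items():
    groups[b].append(node)
print(sorted(sorted(g) for g in groups.values()))
# [['i1', 'i2'], ['p1', 'p3'], ['p2', 'p4'], ['u1', 'u2']]
```

This naive loop is fine for illustration. Large graphs call for an efficient partition-refinement algorithm such as Paige–Tarjan.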


Structure-based Result Refinement

We take advantage of this property:

Match the query against the structure index graph to obtain sets of extensions that contain potential query answers

Bindings computed in the previous ES/ASM steps can only be answers if they are contained in the matched extensions


Whenever there is a match of a query graph q on G, the query also matches on G~. Moreover, the extensions of the index graph matches contain all data graph matches, i.e., the bindings to query variables.
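Given this property, pruning reduces to a membership test. In this minimal sketch, we assume that matching the query on G~ has already produced, for every query variable, the set of data nodes in the extensions of the index nodes it was matched to. An intermediate ES/ASM result row is then kept only if each of its bindings falls inside its variable's set. The function name and data are hypothetical. The property yields only a necessary condition, so surviving rows are still candidates.

```python
def refine_with_structure_index(rows, extensions):
    """Keep result rows whose bindings all lie in the matched extensions.

    rows:       list of dicts mapping query variables to data-graph entities
    extensions: dict mapping each query variable to the set of entities in
                the extensions of the index nodes it was matched to
    """
    return [
        row for row in rows
        if all(entity in extensions.get(var, set()) for var, entity in row.items())
    ]

# Hypothetical intermediate results and extensions.
rows = [
    {"x": "p1", "z": "i1", "u": "u1"},
    {"x": "p5", "z": "i1", "u": "u1"},
]
extensions = {"x": {"p1", "p3"}, "z": {"i1", "i2"}, "u": {"u1", "u2"}}
print(refine_with_structure_index(rows, extensions))
# [{'x': 'p1', 'z': 'i1', 'u': 'u1'}]
```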


STRUCTURE-BASED ANSWER COMPUTATION


Structure-based Answer Computation

Finally, the results that exactly match the query are computed in the last refinement step.

Only in this step do we actually perform joins on the data.
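Here is a minimal sketch of what performing joins on the data can look like in this last step. Each relation atom of the query is evaluated as an in-memory hash join between the remaining candidate rows and that predicate's edges. The join either checks an edge between two bound variables or extends a row with new bindings. The talk does not specify the join algorithm. The relation encoding, the function names and the data are all hypothetical.

```python
from collections import defaultdict

def join_relation(rows, edges, s_var, o_var):
    """Hash join of binding rows with one predicate's (subject, object) edges."""
    by_subj, by_obj = defaultdict(set), defaultdict(set)
    for s, o in edges:
        by_subj[s].add(o)
        by_obj[o].add(s)
    out = []
    for row in rows:
        if s_var in row and o_var in row:  # both bound: check that the edge exists
            if row[o_var] in by_subj.get(row[s_var], ()):
                out.append(row)
        elif s_var in row:                 # extend the row with matching objects
            out += [{**row, o_var: o} for o in sorted(by_subj.get(row[s_var], ()))]
        elif o_var in row:                 # extend the row with matching subjects
            out += [{**row, s_var: s} for s in sorted(by_obj.get(row[o_var], ()))]
        else:
            raise ValueError("at least one side of the atom must be bound")
    return out

def exact_answers(rows, relation_atoms, relations):
    """Apply one join per relation atom (subject var, predicate, object var)."""
    for s, p, o in relation_atoms:
        rows = join_relation(rows, relations.get(p, ()), s, o)
    return rows

RELATIONS = {
    "worksAt": [("p1", "i1"), ("p3", "i2")],
    "author": [("p1", "a1"), ("p1", "a2")],
}
rows = [{"x": "p1", "z": "i1"}, {"x": "p3", "z": "i1"}]
atoms = [("x", "worksAt", "z"), ("x", "author", "y")]
print(exact_answers(rows, atoms, RELATIONS))
# [{'x': 'p1', 'z': 'i1', 'y': 'a1'}, {'x': 'p1', 'z': 'i1', 'y': 'a2'}]
```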


EVALUATION


Evaluation

Systems

INC: the proposed approach

VP: join processing using vertical partitioning with sextuple indexing

Datasets

DBLP: 13M triples

LUBM: 0.7M – 6.7M triples

Queries

Generated 80 queries via random sampling

Different shapes: path, star, graph


Results – Average Processing Time


Results – Average Processing Time

Neighborhood Distance


Results – Precision vs. Time


Results – Precision


Conclusion

We proposed a novel process for the approximate and incremental processing of complex graph pattern queries

Initial results are computed in a small fraction of the total time and then incrementally refined via approximate matching at low cost

Responsiveness increases because inexact results are available early

Users can decide whether, and for which results, exactness and completeness are desirable

Experiments show that our approach is relatively fast at computing exact and complete results, indicating that the proposed mechanism is able to reuse intermediate results


BACKUP SLIDES
