gStore: Answering SPARQL Queries Via Subgraph Matching


Presented by Guan Wang

Kent State University

October 24, 2011


Outline

RDF & SPARQL

Previous Solutions for SPARQL Queries

Overview of gStore

Encoding Technique

VS*-tree & Query Algorithm

Experiments

Conclusions


What is RDF

RDF is a general-purpose framework that provides structured, machine-understandable metadata for the Web.

It is based upon the idea of making statements about resources in the form of subject-predicate-object expressions. These expressions are known as triples in RDF.

Statement = (Subject, Predicate, Object): the predicate links the subject to the object.


RDF Model Example

[Figure: the resource page.html has a Creator edge to "Guan" and a Title edge to "Guan's Home Page".]

Subject | Predicate | Object
page.html | Creator | Guan
page.html | Title | Guan's Home Page
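The example model can be written down directly as a list of triples. The following is a minimal Python sketch; the `Triple` type and the `objects_of` helper are illustrative names, not part of RDF or of any library, and the two statements follow the Creator and Title edges of the figure.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# The two statements of the example model.
triples = [
    Triple("page.html", "Creator", "Guan"),
    Triple("page.html", "Title", "Guan's Home Page"),
]


def objects_of(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


creator = objects_of("page.html", "Creator")
title = objects_of("page.html", "Title")
```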


What is SPARQL

SPARQL is a query language for RDF. It provides a standard format for writing queries that target RDF data and a set of standard rules for processing those queries and returning the results.

The building blocks of a SPARQL query are graph patterns that include variables. The result of the query is the set of values that these variables must take for the patterns to match the RDF graph.


Example of SPARQL

SELECT ?name WHERE {
    ?m <hasName> ?name .
    ?m <BornOnDate> "1809-02-12" .
    ?m <DiedOnDate> "1865-04-15" .
}

Names beginning with a ? or a $ are variables.

Graph patterns are given as a list of triple patterns enclosed within braces {}.

The variables named after the SELECT keyword are the variables that will be returned as results (as in SQL).

Each conjunction, denoted by a dot, corresponds to a join.
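To make the "each dot is a join" reading concrete, here is a small Python sketch that evaluates the example query by extending variable bindings one triple pattern at a time. It is a toy evaluator under stated assumptions, not how any real SPARQL engine is implemented: the function names are hypothetical, and the data triples and vertex identifiers (`Abraham_Lincoln`, `Charles_Darwin`) are illustrative stand-ins for the RDF graph shown on the slides.

```python
def is_var(term):
    """SPARQL variables start with ? or $."""
    return term.startswith("?") or term.startswith("$")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None on conflict."""
    new = dict(binding)
    for p, v in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or check it against an earlier binding.
            if new.setdefault(p, v) != v:
                return None
        elif p != v:
            return None
    return new


def evaluate(patterns, triples):
    """Join the patterns left to right; each step is one join."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match_pattern(pattern, t, b)) is not None]
    return bindings


# Illustrative data (assumption): identifiers are made up for this sketch.
triples = [
    ("Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
    ("Charles_Darwin", "hasName", "Charles Darwin"),
    ("Charles_Darwin", "BornOnDate", "1809-02-12"),
]

query = [
    ("?m", "hasName", "?name"),
    ("?m", "BornOnDate", "1809-02-12"),
    ("?m", "DiedOnDate", "1865-04-15"),
]

names = [b["?name"] for b in evaluate(query, triples)]
```

Only the vertex that satisfies all three patterns survives the joins; the second vertex is dropped at the last pattern because it has no matching DiedOnDate triple.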


RDF Graph


SPARQL Queries

Query Graph

SPARQL Query:

SELECT ?name WHERE {
    ?m <hasName> ?name .
    ?m <BornOnDate> "1809-02-12" .
    ?m <DiedOnDate> "1865-04-15" .
}


Subgraph Match vs. SPARQL Queries


Existing Solutions: Three-Column Table

SPARQL Query:

SELECT ?name WHERE {
    ?m <hasName> ?name .
    ?m <BornOnDate> "1809-02-12" .
    ?m <DiedOnDate> "1865-04-15" .
}

Drawback: too many self-joins (each additional triple pattern joins the triple table with itself again).
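The self-join cost is easy to see when the example query is translated to SQL over a single triple table. The sketch below uses Python's built-in `sqlite3` module; the table name `triples`, its column names, and the sample rows are assumptions made for illustration, not the schema of any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Illustrative rows (assumption): identifiers are made up.
        ("Abraham_Lincoln", "hasName", "Abraham Lincoln"),
        ("Abraham_Lincoln", "BornOnDate", "1809-02-12"),
        ("Abraham_Lincoln", "DiedOnDate", "1865-04-15"),
        ("Charles_Darwin", "hasName", "Charles Darwin"),
        ("Charles_Darwin", "BornOnDate", "1809-02-12"),
    ],
)

# Three triple patterns sharing ?m become two self-joins on the subject column.
sql = """
SELECT t1.object
FROM triples AS t1
JOIN triples AS t2 ON t2.subject = t1.subject
JOIN triples AS t3 ON t3.subject = t1.subject
WHERE t1.predicate = 'hasName'
  AND t2.predicate = 'BornOnDate' AND t2.object = '1809-02-12'
  AND t3.predicate = 'DiedOnDate' AND t3.object = '1865-04-15'
"""
rows = conn.execute(sql).fetchall()
```

A query with k triple patterns needs k - 1 such joins against the same large table, which is the drawback named above.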


Existing Solutions: Property Table

Drawback: a big waste of space.


Existing Solutions: Vertically Partitioned Tables

Drawback: too many merge joins.


Existing Solutions: RDF-3X

Drawback: difficult to handle updates.

RDF-3X exploits the fact that an RDF triple has only three elements (subject, predicate, and object): it builds indexes for all six possible orderings and optimizes the order of merge joins.
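The idea of indexing all six orderings can be sketched in a few lines of Python. This is only a toy illustration with sorted in-memory lists; it does not reproduce RDF-3X's actual storage layout, and the helper names (`build_index`, `prefix_scan`) and sample triples are assumptions.

```python
from bisect import bisect_left
from itertools import permutations

# Illustrative triples (assumption): identifiers are made up.
triples = [
    ("Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("Charles_Darwin", "hasName", "Charles Darwin"),
    ("Charles_Darwin", "BornOnDate", "1809-02-12"),
]

POSITION = {"s": 0, "p": 1, "o": 2}
ORDERS = ["".join(p) for p in permutations("spo")]  # spo, sop, pso, pos, osp, ops


def build_index(order):
    """Sorted copy of the triples with columns rearranged to `order`."""
    return sorted(tuple(t[POSITION[c]] for c in order) for t in triples)


indexes = {order: build_index(order) for order in ORDERS}


def prefix_scan(order, prefix):
    """Return all index rows in `order` that start with `prefix`."""
    index = indexes[order]
    start = bisect_left(index, prefix)  # first row >= prefix
    rows = []
    for row in index[start:]:
        if row[:len(prefix)] != prefix:
            break
        rows.append(row)
    return rows


# "Who was born on 1809-02-12?" is a range scan on the (p, o, s) ordering.
born = prefix_scan("pos", ("BornOnDate", "1809-02-12"))
```

Because every ordering is kept sorted, any triple pattern becomes a range scan and results arrive in an order suitable for merge joins; the price is that every update has to touch all six orderings.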


Overview of gStore (Store)

Represent an RDF dataset by an RDF graph G and store it by its adjacency list table.
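A minimal Python sketch of an adjacency list built from triples is shown below. The layout (a dictionary from each subject vertex to its outgoing (predicate, object) pairs) is an assumption chosen for illustration; the slides do not specify the exact table format, and the sample triples are made up.

```python
from collections import defaultdict


def build_adjacency(triples):
    """Map each subject vertex to its outgoing (predicate, object) edges."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
    return dict(adjacency)


# Illustrative triples (assumption).
triples = [
    ("Abraham_Lincoln", "hasName", "Abraham Lincoln"),
    ("Abraham_Lincoln", "BornOnDate", "1809-02-12"),
    ("Charles_Darwin", "hasName", "Charles Darwin"),
]

adjacency = build_adjacency(triples)
```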


Overview of gStore (Encoding)

Encode each entity and class vertex into a bitstring, called its signature.

Link these vertex signatures to form a data signature graph G* according to the structure of the RDF graph.
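The following Python sketch illustrates the general idea of a vertex signature: hash each adjacent edge into a few bit positions and OR the results together. It is a simplified illustration under loud assumptions; the signature length, the number of bits per item, the use of SHA-256, and all function names are choices made here and are not the encoding described in the gStore paper.

```python
import hashlib

SIG_BITS = 64        # assumed signature length
BITS_PER_ITEM = 3    # assumed number of bits set per hashed item


def item_bits(item):
    """Hash a string to a bitstring with at most BITS_PER_ITEM bits set."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    bits = 0
    for i in range(BITS_PER_ITEM):
        bits |= 1 << (digest[i] % SIG_BITS)
    return bits


def vertex_signature(edges):
    """OR the bits of every adjacent (predicate, object) pair.

    An object of None stands for a query variable and contributes no bits.
    """
    signature = 0
    for predicate, obj in edges:
        signature |= item_bits("p:" + predicate)
        if obj is not None:
            signature |= item_bits("o:" + obj)
    return signature


def may_match(query_sig, data_sig):
    """A data vertex is a candidate only if it covers every query bit."""
    return (query_sig & data_sig) == query_sig


data_sig = vertex_signature([
    ("hasName", "Abraham Lincoln"),
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
])
query_sig = vertex_signature([
    ("hasName", None),               # ?name is a variable
    ("BornOnDate", "1809-02-12"),
    ("DiedOnDate", "1865-04-15"),
])
is_candidate = may_match(query_sig, data_sig)
```

A true match always passes this test, so the filter produces no false negatives. A vertex that does not match can still pass if its bits happen to cover the query's bits, so candidates must be verified against the actual graph.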


Overview of gStore (VS*-tree)


Encoding Technique


VS*-tree

Each leaf node of the tree corresponds to one vertex signature in the data signature graph G*.

Given two leaf nodes d1 and d2 in the tree, we introduce an edge between them if and only if there is an edge between d1 and d2 in G*.

Given nodes d1 and d2 in the tree, we introduce a super edge from d1 to d2 if and only if there is at least one edge from d1's children to d2's children.

The label of the super edge d1 → d2 is obtained by performing a bitwise OR over the n edge labels from d1's children to d2's children.
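The super-edge rule can be sketched in Python as follows. Edge labels are modelled as integers used as bitstrings; the function name, the dictionary layout, and the vertex names are hypothetical and chosen only for this illustration.

```python
def super_edge_label(children_a, children_b, edge_labels):
    """Bitwise OR of all edge labels from children of a to children of b.

    `edge_labels` maps (source, target) pairs to integer bitstrings.
    Returns None when no child-to-child edge exists, i.e. no super edge.
    """
    label = 0
    found = False
    for u in children_a:
        for v in children_b:
            if (u, v) in edge_labels:
                label |= edge_labels[(u, v)]
                found = True
    return label if found else None


# Illustrative child-level edges (assumption).
edge_labels = {
    ("v1", "v3"): 0b0010,
    ("v2", "v4"): 0b1000,
}

forward = super_edge_label(["v1", "v2"], ["v3", "v4"], edge_labels)
backward = super_edge_label(["v3", "v4"], ["v1", "v2"], edge_labels)
```

Because the super-edge label contains every bit of every underlying edge label, a query edge whose bits are not covered by the super edge cannot match any edge beneath it, which lets a search prune whole subtrees.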


Query Algorithm


Experiments

Datasets: Yago and DBLP, two popular semantic datasets with millions of triples.

Data size: approximately 4GB.


Experiments (Exact Queries)


Experiments (Wildcard Queries)


Conclusions

gStore proposes to store and query RDF data from a graph database perspective.

The VS*-tree is used as the indexing method for the vertex bitstrings and supports SPARQL queries in a scalable manner.

Signature-based matching can yield false positives.


Reference

[ICDE09] Thanh Tran, Haofen Wang, Sebastian Rudolph, Philipp Cimiano, "Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data", DOI 10.1109/ICDE.2009.119.

[VLDB07] Daniel J. Abadi, Adam Marcus, Samuel R. Madden, Kate Hollenbach, "Scalable Semantic Web Data Management Using Vertical Partitioning", VLDB '07, September 23-28, 2007, Vienna, Austria.

[PVLDB08] Cathrin Weiss, Panagiotis Karras, Abraham Bernstein, "Hexastore: Sextuple Indexing for Semantic Web Data Management", PVLDB '08, August 23-28, 2008, Auckland, New Zealand.

[PVLDB08] Thomas Neumann, Gerhard Weikum, "RDF-3X: a RISC-style Engine for RDF", PVLDB '08, August 23-28, 2008, Auckland, New Zealand.

[VLDB11] Lei Zou, Jinghui Mo, Lei Chen, M. Tamer Özsu, Dongyan Zhao, "gStore: Answering SPARQL Queries via Subgraph Matching", VLDB '11, August 29 - September 3, 2011, Seattle, Washington.

Thank you!