G-SPARQL: A Hybrid Engine for Querying Large Attributed Graphs
Sherif Sakr (NICTA & UNSW, Sydney, Australia)
Sameh Elnikety and Yuxiong He (Microsoft Research, Redmond, WA)
CIKM 2012

Example 1: Social Network
[Figure: a social network of people (Alice, Bob, Chris, David, Ed, France, George, Hillary) and photos (Photo1 to Photo8)]
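
Example 2: Bibliographical Network
[Figure: attributed bibliographic graph. People: John (age: 45), Alice (age: 42, location: Sydney), Smith (age: 28, office: 518). Papers: Paper 1 (keyword: graph) and Paper 2 (keyword: XML, type: Demo). Venue: VLDB’12 (location: Istanbul). Institutions: UNSW (country: Australia, established: 1949) and Microsoft (country: USA, established: 1975). Edges carry attributes such as title (Professor, Senior Researcher), author order (order: 1, 2) and publication month (Month: 1, 3); a citedBy edge links the two papers.]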

Contributions
1. G-SPARQL language
  – Pattern matching
  – Reachability
2. Hybrid execution engine
  – Graph topology in main memory
  – Graph data in relational database
3. Algebraic transformation
  – Operators
  – Optimizations
4. Experimental evaluation

1. G-SPARQL Query Language
• Extends a subset of SPARQL
  – Based on triple patterns: (subject, predicate, object)
• Sub-graph matching patterns on
  – Graph structure
  – Node attribute
  – Edge attribute
• Reachability patterns on
  – Path
  – Shortest path
[Figure: a triple pattern drawn as an edge from subject to object]

G-SPARQL Syntax
[Figure: the G-SPARQL syntax]

G-SPARQL Pattern Matching
• Node attribute
  – ?Person @officeNumber "518"
• Edge attribute
  – ?E @Role "Programmer"
• Structural
  – ?Person worksAt Microsoft
  – ?Person ?E(worksAt) Microsoft
[Figure: Alice (officeNumber = 518) linked to Microsoft by an edge with Role = Programmer]
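The three pattern kinds above can be illustrated with a minimal Python sketch over a toy in-memory graph. The data layout (node attributes as dicts, edges as tuples carrying their own attribute dicts), the function names, the `worksAt` label on the pictured edge, and the extra employee Bob are all hypothetical; this is not the G-SPARQL engine's API, only a way to show what each pattern binds.

```python
# Hypothetical toy graph based on the slide's figure (Bob is an invented
# second employee so the filters have something to reject).
nodes = {
    "Alice": {"officeNumber": "518"},
    "Bob": {"officeNumber": "210"},
    "Microsoft": {},
}
# Edges as (edge id, label, source, target, edge attributes).
edges = [
    ("e1", "worksAt", "Alice", "Microsoft", {"Role": "Programmer"}),
    ("e2", "worksAt", "Bob", "Microsoft", {"Role": "Manager"}),
]


def match_node_attr(attr, value):
    """?Person @attr "value": nodes whose attribute equals value."""
    return sorted(n for n, attrs in nodes.items() if attrs.get(attr) == value)


def match_structural(label, obj, edge_attr=None, edge_value=None):
    """?Person ?E(label) obj, optionally with ?E @edge_attr "edge_value".

    Returns (subject, edge id) bindings for ?Person and ?E.
    """
    bindings = []
    for eid, lbl, src, dst, attrs in edges:
        if lbl != label or dst != obj:
            continue  # structural part does not match
        if edge_attr is not None and attrs.get(edge_attr) != edge_value:
            continue  # edge-attribute filter rejects this edge
        bindings.append((src, eid))
    return bindings


print(match_node_attr("officeNumber", "518"))                          # ['Alice']
print(match_structural("worksAt", "Microsoft"))                        # [('Alice', 'e1'), ('Bob', 'e2')]
print(match_structural("worksAt", "Microsoft", "Role", "Programmer"))  # [('Alice', 'e1')]
```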

G-SPARQL Reachability
• Path
  – Subject ??PathVar Object
• Shortest path
  – Subject ?*PathVar Object
• Path filters
  – Path length
  – All edges
  – All nodes
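As a rough illustration of a path pattern with filters, the sketch below runs breadth-first search over a directed adjacency list built from the bibliographic example (edge direction and the use of BFS for both `??` and `?*` are assumptions). An all-nodes filter is applied by pruning nodes that fail the predicate, and a length filter such as `Length(??P, <= 3)` is checked on the shortest surviving path, which suffices because BFS returns a path of minimum length. `find_path` and `node_ok` are hypothetical names.

```python
from collections import deque

# Directed topology of the bibliographic example (hypothetical encoding):
# every edge from the Graph Representation tables, keyed by node label.
graph = {
    "John": ["Paper 2", "Microsoft", "Alice"],
    "Alice": ["Paper 2", "Paper 1", "UNSW", "Smith"],
    "Smith": ["Paper 1", "UNSW"],
    "Paper 1": ["VLDB'12", "Paper 2"],
    "Paper 2": ["VLDB'12"],
}


def find_path(src, dst, max_len=None, node_ok=lambda n: True):
    """BFS from src to dst over nodes that satisfy node_ok.

    Returns the node list of a shortest qualifying path, or None. Since BFS
    finds a shortest path, a path with at most max_len edges exists iff the
    returned path passes the length check.
    """
    if not node_ok(src):
        return None
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:  # walk parent links back to src
                path.append(u)
                u = parent[u]
            path.reverse()
            if max_len is not None and len(path) - 1 > max_len:
                return None  # path-length filter fails
            return path
        for v in graph.get(u, []):
            if v not in parent and node_ok(v):  # all-nodes filter prunes v
                parent[v] = u
                queue.append(v)
    return None


print(find_path("John", "UNSW", max_len=3))  # ['John', 'Alice', 'UNSW']
```

An all-edges filter could be handled the same way, by skipping edges that fail a predicate before enqueueing their target.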

Example: G-SPARQL Query
SELECT ?L1 ?L2
WHERE {
  ?X ??P ?Y.
  ?X @Label ?L1. ?Y @Label ?L2.
  ?X @Age ?Age1. ?Y @Age ?Age2.
  ?X Affiliated UNSW. ?Y ?E(Affiliated) Microsoft.
  ?X LivesIn Sydney. ?E @Title "Researcher".
  FILTER(?Age1 >= 40). FILTER(?Age2 >= 40).
  FILTERPATH( Length( ??P, <= 3) ).
}

Outline
1. G-SPARQL language
  – Pattern matching
  – Reachability
2. Hybrid execution engine
  – Graph topology in main memory
  – Graph data in relational database
3. Algebraic transformation
  – Operators
  – Optimizations
4. Experimental evaluation

2. Hybrid Execution Engine
• Reachability queries
  – Main-memory algorithms
  – Example: BFS and Dijkstra’s algorithm
• Pattern matching queries
  – Relational database
  – Indexing
    » Example: B-tree
  – Query optimizations
    » Example: selectivity estimation and join ordering
  – Recursive queries
    » Not efficient: large intermediate results and multiple joins
[Figure: the social network example graph]
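For weighted shortest paths the slide names Dijkstra’s algorithm. A textbook version with a binary heap is sketched below; the friendship edges and their weights are invented for illustration, since the social network figure carries no weights (with unit weights the result matches BFS).

```python
import heapq


def dijkstra(graph, src, dst):
    """Shortest weighted path from src to dst.

    graph maps a node to a list of (neighbor, non-negative weight) pairs.
    Returns (distance, path), or (inf, None) if dst is unreachable.
    """
    dist = {src: 0}
    parent = {src: None}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            # The first time dst is popped its distance is final; rebuild the path.
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return d, path[::-1]
        if d > dist[u]:
            continue  # stale heap entry, a shorter distance was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), None


# Hypothetical weighted friendship edges between people from the example.
friends = {
    "Alice": [("Bob", 4), ("Chris", 1)],
    "Chris": [("Bob", 1), ("David", 5)],
    "Bob": [("David", 1)],
}
print(dijkstra(friends, "Alice", "David"))  # (3, ['Alice', 'Chris', 'Bob', 'David'])
```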

Graph Representation
Node attribute tables, one (ID, Value) table per attribute:
label: (1, John), (2, Paper 2), (3, Alice), (4, Microsoft), (5, VLDB’12), (6, Paper 1), (7, UNSW), (8, Smith)
age: (1, 45), (3, 42), (8, 28)
office: (8, 518)
location: (3, Sydney), (5, Istanbul)
keyword: (2, XML), (6, graph)
type: (2, Demo)
country: (4, USA), (7, Australia)
established: (4, 1975), (7, 1949)
Edge tables, one (eID, sID, dID) table per edge label:
authorOf: (1, 1, 2), (5, 3, 2), (6, 3, 6), (11, 8, 6)
affiliated: (3, 1, 4), (8, 3, 7), (12, 8, 7)
published: (4, 2, 5), (10, 6, 5)
citedBy: (9, 6, 2)
supervise: (7, 3, 8)
know: (2, 1, 3)
Edge attribute tables, one (ID, Value) table per attribute, keyed by eID:
title: (3, Senior Researcher), (8, Professor)
order: (1, 2), (5, 1), (6, 2), (11, 1)
month: (4, 3), (10, 1)
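To make the storage layout concrete, the sketch below loads three of the tables above into an in-memory SQLite database (a stand-in for the IBM DB2 backend used in the evaluation) and answers the pattern `?X affiliated UNSW. ?X @label ?L. ?X @age ?A. FILTER(?A >= 40)` with a hand-written join. The SQL is one plausible translation, not output of the G-SPARQL compiler; the index name is hypothetical, and SQLite indexes are B-trees, in line with the indexing point on the previous slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One (ID, Value) table per node attribute, one (eID, sID, dID) table per edge label.
cur.execute("CREATE TABLE label (ID INTEGER PRIMARY KEY, Value TEXT)")
cur.execute("CREATE TABLE age (ID INTEGER PRIMARY KEY, Value INTEGER)")
cur.execute("CREATE TABLE affiliated (eID INTEGER PRIMARY KEY, sID INTEGER, dID INTEGER)")
# Hypothetical secondary index (a B-tree in SQLite) to find edges by destination.
cur.execute("CREATE INDEX affiliated_dID ON affiliated (dID)")

cur.executemany("INSERT INTO label VALUES (?, ?)", [
    (1, "John"), (2, "Paper 2"), (3, "Alice"), (4, "Microsoft"),
    (5, "VLDB'12"), (6, "Paper 1"), (7, "UNSW"), (8, "Smith"),
])
cur.executemany("INSERT INTO age VALUES (?, ?)", [(1, 45), (3, 42), (8, 28)])
cur.executemany("INSERT INTO affiliated VALUES (?, ?, ?)",
                [(3, 1, 4), (8, 3, 7), (12, 8, 7)])

# ?X affiliated UNSW. ?X @label ?L. ?X @age ?A. FILTER(?A >= 40)
query = """
SELECT x_label.Value
FROM affiliated AS a
JOIN label AS dst ON dst.ID = a.dID
JOIN label AS x_label ON x_label.ID = a.sID
JOIN age AS x_age ON x_age.ID = a.sID
WHERE dst.Value = 'UNSW' AND x_age.Value >= 40
ORDER BY x_label.Value
"""
rows = [row[0] for row in cur.execute(query)]
print(rows)  # ['Alice']
conn.close()
```

Each attribute lookup becomes one more join on `ID`, which is why the relational side benefits from join ordering and selectivity estimation.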

Hybrid Execution Engine: interfaces
[Figure: a G-SPARQL query is answered over the example graph through two interfaces: SQL commands and traversal operations]

3. Intermediate Language & Compilation
[Figure: Step 1, front-end compilation, turns the G-SPARQL query into an algebraic query plan; Step 2, back-end compilation, turns that plan into a physical execution plan of SQL commands and traversal operations over the example graph]

Intermediate Language
• Objective
  – Generate query plan and chop it
    » Reachability part -> main-memory algorithms on topology
    » Pattern matching part -> relational database
  – Optimizations
• Features
  – Independent of execution engine and graph representation
  – Algebraic query plan

G-SPARQL Algebra
• Variant of “Tuple Algebra”
• Algebra details
  – Data: tuples
    » Sets of nodes, edges, paths
  – Operators
    » Relational: select, project, join
    » Graph-specific: node and edge attributes, adjacency
    » Path operators
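A minimal sketch of tuple-algebra operators, assuming tuples are Python dicts from variable names to node IDs or values. `select`, `project` and `join` are the relational ones; `node_attr` stands in for a graph-specific node-attribute operator. All names are hypothetical, the nested-loop join is chosen for clarity rather than speed, and the toy data comes from the Graph Representation tables.

```python
# Toy tables from the Graph Representation slide (node ID -> value).
AGE = {1: 45, 3: 42, 8: 28}
LABEL = {1: "John", 3: "Alice", 4: "Microsoft", 7: "UNSW", 8: "Smith"}


def select(tuples, pred):
    """Relational selection: keep tuples satisfying pred."""
    return [t for t in tuples if pred(t)]


def project(tuples, variables):
    """Relational projection onto the given variables."""
    return [{v: t[v] for v in variables} for t in tuples]


def join(left, right):
    """Natural join on shared variable names (nested loop, for clarity)."""
    out = []
    for lt in left:
        for rt in right:
            shared = lt.keys() & rt.keys()
            if all(lt[v] == rt[v] for v in shared):
                out.append({**lt, **rt})
    return out


def node_attr(tuples, var, attr_table, out_var):
    """Graph-specific: bind out_var to the attribute of the node in var.
    Tuples whose node lacks the attribute are dropped."""
    return [{**t, out_var: attr_table[t[var]]} for t in tuples if t[var] in attr_table]


aff = [{"?X": 1, "?Y": 4}, {"?X": 3, "?Y": 7}, {"?X": 8, "?Y": 7}]  # affiliated edges
unsw = [{"?Y": 7}]  # ?Y bound to the UNSW node

# ?X affiliated UNSW, ?X @age >= 40, return ?X's label
with_age = node_attr(join(aff, unsw), "?X", AGE, "?Age")
result = project(node_attr(select(with_age, lambda t: t["?Age"] >= 40), "?X", LABEL, "?L"), ["?L"])
print(result)  # [{'?L': 'Alice'}]
```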

[Figure: G-SPARQL algebra operators, with the relational operators highlighted]

[Figure: G-SPARQL algebra operators, grouped into relational and NOT relational]

Front-end Compilation (Step 1)
• Input
  – G-SPARQL query
• Output
  – Algebraic query plan
• Technique
  – Map triple patterns to G-SPARQL operators
  – Use inference rules
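The mapping step can be pictured as one rule per triple-pattern shape. The sketch below is an assumption about what such rules look like, not the paper’s inference rules: the operator names (`NodeAttr`, `EdgeAttr`, `EdgeJoin`, `PathJoin`, `ShortestPathJoin`) are hypothetical labels, and an `@` pattern counts as an edge attribute when its subject is bound as an edge variable elsewhere, as `?E` is in `?Y ?E(affiliated) Microsoft`.

```python
import re

# Matches an edge-variable predicate such as ?E(affiliated).
EDGE_VAR = re.compile(r"^(\?\w+)\((\w+)\)$")


def map_patterns(patterns):
    """Map (subject, predicate, object) triple patterns to hypothetical operator names."""
    # First pass: collect variables bound to edges, e.g. ?E in ?Y ?E(affiliated) Microsoft.
    edge_vars = set()
    for _, pred, _ in patterns:
        m = EDGE_VAR.match(pred)
        if m:
            edge_vars.add(m.group(1))

    plan = []
    for subj, pred, obj in patterns:
        if pred.startswith("??"):
            op = "PathJoin"            # reachability: any path
        elif pred.startswith("?*"):
            op = "ShortestPathJoin"    # reachability: shortest path
        elif pred.startswith("@"):
            op = "EdgeAttr" if subj in edge_vars else "NodeAttr"
        else:
            op = "EdgeJoin"            # structural pattern, with or without an edge variable
        plan.append((op, (subj, pred, obj)))
    return plan


patterns = [
    ("?X", "??P", "?Y"),
    ("?X", "@label", "?L1"),
    ("?Y", "?E(affiliated)", "Microsoft"),
    ("?E", "@title", '"Researcher"'),
    ("?X", "affiliated", "UNSW"),
]
print([op for op, _ in map_patterns(patterns)])
# ['PathJoin', 'NodeAttr', 'EdgeJoin', 'EdgeAttr', 'EdgeJoin']
```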

Front-end Compilation: Inference Rules
[Figure: inference rules mapping triple patterns to G-SPARQL algebra operators]

Front-end Compilation: Optimizations
• Objective
  – Delay execution of traversal operations
• Technique
  – Order triple patterns based on restrictiveness
• Heuristics: triple pattern P1 is more restrictive than P2 if
  1. P1 has fewer path variables than P2
  2. P1 has fewer variables than P2
  3. P1’s variables have more filter statements than P2’s variables
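One way to apply these heuristics is as a sort key, assuming they are checked in the listed priority order and that more restrictive patterns should be evaluated earlier. Patterns are (subject, predicate, object) strings and `filter_counts` maps each variable to the number of FILTER statements on it; both encodings are hypothetical. Under these rules the path pattern `?X ??P ?Y` lands last, which is the delayed traversal the objective asks for.

```python
import re

# An edge-variable predicate such as ?E(affiliated) contributes the variable ?E.
EDGE_VAR = re.compile(r"^(\?\w+)\(\w+\)$")


def variables(pattern):
    """Variables mentioned by a triple pattern."""
    found = set()
    for term in pattern:
        m = EDGE_VAR.match(term)
        if m:
            found.add(m.group(1))
        elif term.startswith("?"):
            found.add(term)
    return found


def restrictiveness_key(pattern, filter_counts):
    vs = variables(pattern)
    path_vars = sum(1 for v in vs if v.startswith(("??", "?*")))
    filters = sum(filter_counts.get(v, 0) for v in vs)
    # Smaller key = more restrictive: fewer path variables, then fewer
    # variables, then more filters (hence the negation).
    return (path_vars, len(vs), -filters)


def order_patterns(patterns, filter_counts):
    """Most restrictive first; sorted() is stable, so ties keep query order."""
    return sorted(patterns, key=lambda p: restrictiveness_key(p, filter_counts))


patterns = [
    ("?X", "??P", "?Y"),
    ("?X", "@age", "?Age1"),
    ("?X", "affiliated", "UNSW"),
    ("?Y", "?E(affiliated)", "Microsoft"),
    ("?X", "@label", "?L1"),
]
ordered = order_patterns(patterns, {"?Age1": 1})
for p in ordered:
    print(p)
```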

Back-end Compilation (Step 2)
• Input
  – G-SPARQL algebraic plan
• Output
  – SQL commands
  – Traversal operations
• Technique
  – Substitute G-SPARQL relational operators with SPJ (select-project-join) operators
  – Traverse the plan
    » Bottom up
    » Stop when reaching the root or a non-relational operator
    » Transform relational algebra to SQL commands
  – Send non-relational operators to main-memory algorithms
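The bottom-up chopping step can be sketched as a post-order walk over a plan tree. In this hypothetical version, each maximal subtree made only of relational operators is emitted as one SQL step (rendered as text instead of real SQL), each non-relational operator becomes a traversal step, and a relational operator sitting above a traversal gets its own SQL step; how the engine actually handles that last case is an assumption here, and the operator names are invented.

```python
from dataclasses import dataclass, field

# Operators that can be translated to SQL (assumed set for this sketch).
RELATIONAL = {"scan", "select", "project", "join"}


@dataclass
class Op:
    name: str
    args: str = ""
    children: list = field(default_factory=list)


def is_relational(op):
    return op.name in RELATIONAL


def all_relational(op):
    """True if op and its whole subtree are relational."""
    return is_relational(op) and all(all_relational(c) for c in op.children)


def render(op):
    """Stand-in for SQL generation: a textual form of a relational fragment."""
    inner = ", ".join(render(c) for c in op.children)
    return f"{op.name}[{op.args}]({inner})" if inner else f"{op.name}[{op.args}]"


def chop(op, steps=None):
    """Emit steps in bottom-up (post-order) order: each maximal all-relational
    subtree becomes one SQL step, each non-relational operator a traversal step."""
    if steps is None:
        steps = []
    if all_relational(op):
        steps.append(("SQL", render(op)))
        return steps
    for child in op.children:
        chop(child, steps)
    kind = "SQL" if is_relational(op) else "TRAVERSAL"
    steps.append((kind, f"{op.name}[{op.args}]"))
    return steps


plan = Op("project", "?L1, ?L2", [
    Op("path", "?X ??P ?Y, length <= 3", [
        Op("select", "age >= 40", [Op("scan", "X")]),
        Op("select", "age >= 40", [Op("scan", "Y")]),
    ]),
])
steps = chop(plan)
for step in steps:
    print(step)
```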

Back-end Compilation: Optimizations
• Optimize each relational fragment of the query plan
  – Before generating its SQL command
• All operators in a fragment are Select/Project/Join
• Apply standard techniques
  – For example, pushing selections down
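Selection pushdown, the standard rewrite named above, is sketched below on a tiny plan tree: a selection whose predicate mentions columns from only one join input is moved below the join onto that input. The classes and column names are hypothetical; a real optimizer would also push through projections and split conjunctive predicates.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Scan:
    table: str
    cols: frozenset


@dataclass(frozen=True)
class Select:
    pred: str             # predicate text, e.g. "age.Value >= 40"
    pred_cols: frozenset  # columns the predicate references
    child: "Plan"


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"
    cond: str             # join condition text


Plan = Union[Scan, Select, Join]


def columns(p):
    """Columns produced by a plan node."""
    if isinstance(p, Scan):
        return p.cols
    if isinstance(p, Select):
        return columns(p.child)
    return columns(p.left) | columns(p.right)


def push_selections(p):
    """Move each Select below a Join when its predicate uses only one input's columns."""
    if isinstance(p, Scan):
        return p
    if isinstance(p, Join):
        return Join(push_selections(p.left), push_selections(p.right), p.cond)
    child = push_selections(p.child)
    if isinstance(child, Join):
        if p.pred_cols <= columns(child.left):
            pushed = push_selections(Select(p.pred, p.pred_cols, child.left))
            return Join(pushed, child.right, child.cond)
        if p.pred_cols <= columns(child.right):
            pushed = push_selections(Select(p.pred, p.pred_cols, child.right))
            return Join(child.left, pushed, child.cond)
    return Select(p.pred, p.pred_cols, child)  # predicate spans both inputs: keep it


aff = Scan("affiliated", frozenset({"affiliated.sID", "affiliated.dID"}))
age = Scan("age", frozenset({"age.ID", "age.Value"}))
plan = Select("age.Value >= 40", frozenset({"age.Value"}),
              Join(aff, age, "age.ID = affiliated.sID"))
optimized = push_selections(plan)
print(optimized)
```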

Example: G-SPARQL Query
SELECT ?L1 ?L2
WHERE {
  ?X ??P ?Y.
  ?X @label ?L1. ?Y @label ?L2.
  ?X @age ?Age1. ?Y @age ?Age2.
  ?X affiliated UNSW. ?Y ?E(affiliated) Microsoft.
  ?X livesIn Sydney. ?E @title "Researcher".
  FILTER(?Age1 >= 40). FILTER(?Age2 >= 40).
}

Example: Query Plan
[Figure: algebraic query plan for the example query]

4. Experimental Evaluation
• Objective
  – Show that the hybrid design is a good idea
  – Good performance from combining the DBMS with the main-memory topology
• Data sets
  – Real ACM bibliographic network
  – Synthetic graphs
    » See technical report

Experimental Environment
• Workload
  – Queries Q1 … Q12, created for the evaluation
• Process
  – Compare with Neo4j (non-optimized and optimized)
• Environment
  – Implementation
    » Main-memory algorithms in C++
    » IBM DB2
  – PC server
Results on Real Dataset

[Figure: response time on the ACM bibliographic network]

Conclusions
• G-SPARQL language
  – Expresses pattern matching and reachability queries on attributed graphs
• Hybrid engine
  – Graph topology in main memory
  – Graph data in a relational database
• Compilation into an algebraic plan
  – Operators and optimizations
• Evaluation
  – Real and synthetic datasets
  – Good performance
    » Leveraging the database engine and the main-memory topology