Cypher-based Graph Pattern Matching in Gradoop
GRADES2017: Graph Data-management Experiences & Systems
Chicago
May 2017
Martin Junghanns (1), Max Kießling (1, 2), Alex Averbuch (2), André Petermann (1) and Erhard Rahm (1)
(1) University of Leipzig – Database Research Group
(2) Neo Technology
Cypher-based Graph Pattern Matching in GRADOOP – GRADES 2017 2
Motivation GRADOOP Implementation Benchmark
Motivation
"Who are the closest enemies of each Orc?"
Cypher
Flink Gelly
"Which two clan leaders hate each other and one of them knows Frodo over one to ten hops?"
Cypher
Flink Gelly (or any other non-declarative graph processing system)
GRADOOP
"An open-source graph dataflow system for declarative analytics of heterogeneous graph data."
Distributed Graph Storage (Apache HDFS)
Distributed Operator Execution (Apache Flink)
Extended Property Graph Model (EPGM)
Analytical API
I/O, Graph Operators, Graph Algorithms
Property Graph Model
[Figure: example property graph with five vertices and five edges]
Vertices:
Hobbit {name: Samwise}
Orc {name: Azog}
Clan {name: Tribes of Moria, founded: 1981}
Orc {name: Bolg}
Hobbit {name: Frodo, yob: 2968}
Edges:
leaderOf {since: 2790}
memberOf {since: 2013}
hates {since: 2301}
hates
knows {since: 2990}
Extended Property Graph Model
[Figure: the same five vertices and five edges, now grouped into two logical graphs]
Logical graphs:
Area {title: Mordor}
Area {title: Shire}
Vertices:
Hobbit {name: Samwise}
Orc {name: Azog}
Clan {name: Tribes of Moria, founded: 1981}
Orc {name: Bolg}
Hobbit {name: Frodo, yob: 2968}
Edges:
leaderOf {since: 2790}
memberOf {since: 2013}
hates {since: 2301}
hates
knows {since: 2990}
Gradoop Graph Transformations
Graph Collection, unary: Limit, Selection, Pattern Matching, Distinct, Apply, Reduce, Call
Graph Collection, binary: Equality, Union, Intersection, Difference
Logical Graph, unary: Aggregation, Pattern Matching, Transformation, Grouping, Call, Subgraph
Logical Graph, binary: Equality, Combination, Overlap, Exclusion, Fusion
[Figure: Pattern and resulting Graph Collection]
Pattern Matching (Single-Graph Setting)
GraphCollection collection = graph3.cypher("MATCH (:Green)-[:orange]->(:Orange) RETURN *", ISO, ISO);
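The single-graph setting can be illustrated with a minimal plain-Java sketch, assuming a tiny in-memory graph: every occurrence of the pattern in the input graph becomes one element of the result collection. The class, method names and example data below are hypothetical and are not Gradoop's API; the two ISO arguments of the call above are not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch; not Gradoop's classes or API.
class PatternMatchSketch {

  static final class Edge {
    final long id;
    final String label;
    final long source;
    final long target;

    Edge(long id, String label, long source, long target) {
      this.id = id;
      this.label = label;
      this.source = source;
      this.target = target;
    }
  }

  // Matches the pattern (:srcLabel)-[:edgeLabel]->(:tgtLabel).
  // Each match {sourceId, edgeId, targetId} stands for one graph of the result collection.
  static List<long[]> match(Map<Long, String> vertexLabels, List<Edge> edges,
      String srcLabel, String edgeLabel, String tgtLabel) {
    List<long[]> matches = new ArrayList<>();
    for (Edge e : edges) {
      if (e.label.equals(edgeLabel)
          && srcLabel.equals(vertexLabels.get(e.source))
          && tgtLabel.equals(vertexLabels.get(e.target))) {
        matches.add(new long[] {e.source, e.id, e.target});
      }
    }
    return matches;
  }

  // Made-up example data that reuses the colour labels of the query above.
  static List<long[]> example() {
    Map<Long, String> labels = new HashMap<>();
    labels.put(1L, "Green");
    labels.put(2L, "Orange");
    labels.put(3L, "Green");
    labels.put(4L, "Blue");
    List<Edge> edges = new ArrayList<>();
    edges.add(new Edge(10, "orange", 1, 2)); // matches
    edges.add(new Edge(11, "orange", 3, 2)); // matches
    edges.add(new Edge(12, "blue", 1, 2));   // wrong edge label
    edges.add(new Edge(13, "orange", 1, 4)); // wrong target label
    return match(labels, edges, "Green", "orange", "Orange");
  }
}
```

In this sketch, `PatternMatchSketch.example()` yields two matches, one per qualifying edge.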
Implementation
DataSet<EPGMGraphHead>
Id  Label  Properties
1   Area   {title:Mordor}
2   Area   {title:Shire}

DataSet<EPGMVertex>
Id  Label   Properties                             Graphs
1   Orc     {name:Azog}                            {1}
2   Clan    {name:Tribes of Moria, founded:1981}   {1}
3   Orc     {name:Bolg}                            {1,2}
4   Hobbit  {name:Frodo, yob:2968}                 {2}
5   Hobbit  {name:Samwise}                         {2}

DataSet<EPGMEdge>
Id  Label     Source  Target  Properties    Graphs
1   leaderOf  1       2       {since:2790}  {1}
2   memberOf  3       2       {since:2013}  {1}
3   hates     3       4       {since:2301}  {2}
4   hates     3       5       {}            {2}
5   knows     5       4       {since:2990}  {2}
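The three-dataset layout can be sketched with plain Java objects, assuming that graph membership is stored as a set of graph ids on every vertex and edge, as the "Graphs" column suggests. `EpgmSketch` and its nested classes are hypothetical stand-ins, not Gradoop's `EPGMVertex` or `EPGMEdge`, and the graph heads are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical plain-Java stand-ins for the vertex and edge tables above.
class EpgmSketch {

  static final class Vertex {
    final long id;
    final String label;
    final Map<String, Object> properties;
    final Set<Long> graphs; // ids of the logical graphs that contain this vertex

    Vertex(long id, String label, Map<String, Object> properties, Long... graphs) {
      this.id = id;
      this.label = label;
      this.properties = properties;
      this.graphs = new HashSet<>(Arrays.asList(graphs));
    }
  }

  static final class Edge {
    final long id;
    final String label;
    final long source;
    final long target;
    final Map<String, Object> properties;
    final Set<Long> graphs;

    Edge(long id, String label, long source, long target,
        Map<String, Object> properties, Long... graphs) {
      this.id = id;
      this.label = label;
      this.source = source;
      this.target = target;
      this.properties = properties;
      this.graphs = new HashSet<>(Arrays.asList(graphs));
    }
  }

  // Builds a property map from alternating keys and values.
  static Map<String, Object> props(Object... keyValues) {
    Map<String, Object> map = new LinkedHashMap<>();
    for (int i = 0; i + 1 < keyValues.length; i += 2) {
      map.put((String) keyValues[i], keyValues[i + 1]);
    }
    return map;
  }

  // Rows of the vertex table above.
  static List<Vertex> exampleVertices() {
    return Arrays.asList(
        new Vertex(1, "Orc", props("name", "Azog"), 1L),
        new Vertex(2, "Clan", props("name", "Tribes of Moria", "founded", 1981), 1L),
        new Vertex(3, "Orc", props("name", "Bolg"), 1L, 2L),
        new Vertex(4, "Hobbit", props("name", "Frodo", "yob", 2968), 2L),
        new Vertex(5, "Hobbit", props("name", "Samwise"), 2L));
  }

  // Rows of the edge table above.
  static List<Edge> exampleEdges() {
    return Arrays.asList(
        new Edge(1, "leaderOf", 1, 2, props("since", 2790), 1L),
        new Edge(2, "memberOf", 3, 2, props("since", 2013), 1L),
        new Edge(3, "hates", 3, 4, props("since", 2301), 2L),
        new Edge(4, "hates", 3, 5, props(), 2L),
        new Edge(5, "knows", 5, 4, props("since", 2990), 2L));
  }

  // A logical graph is recovered by filtering on graph membership.
  static List<Long> vertexIdsOf(List<Vertex> vertices, long graphId) {
    List<Long> ids = new ArrayList<>();
    for (Vertex v : vertices) {
      if (v.graphs.contains(graphId)) {
        ids.add(v.id);
      }
    }
    return ids;
  }

  static List<Long> edgeIdsOf(List<Edge> edges, long graphId) {
    List<Long> ids = new ArrayList<>();
    for (Edge e : edges) {
      if (e.graphs.contains(graphId)) {
        ids.add(e.id);
      }
    }
    return ids;
  }
}
```

Because vertex 3 (Bolg) carries both graph ids, it appears in both logical graphs, which is the overlap that the set-valued "Graphs" column makes possible.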
Parsing, Planning, Execution
[Figure: query graph over the variables c1, o1, o2, c2 and h with the predicates below, followed by candidate query plans annotated => 23, => 42, => 84, => 123, => 456, => 789]
(c1 != c2) AND (o1 != o2) AND (h.name = Frodo Baggins)
PlanTableEntry | type: GRAPH | all-vars: [...] | proc-vars: [...] | attr-vars: [] | est-card: 23 | prediates: () | Plan :
|-FilterEmbeddingsNode{filterPredicate=((c1 != c2) AND (o1 != o2))}
|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c1, filterPredicate=((c1.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e0', targetVar='c1', filterPredicate=((_e0.label = leaderOf)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[o1], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o1, filterPredicate=((o1.label = Orc)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o1', edgeVar='_e1', targetVar='o2', filterPredicate=((_e1.label = hates)), projectionKeys=[]}
|.|.|-JoinEmbeddingsNode{joinVariables=[o2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[h], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=h, filterPredicate=((h.label = Hobbit) AND (h.name = Frodo Baggins)), projectionKeys=[]}
|.|.|.|.|-ExpandEmbeddingsNode={startVar='o2', pathVar='_e3', endVar='h', lb=1, ub=10, direction=OUT, vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=o2, filterPredicate=((o2.label = Orc)), projectionKeys=[]}
|.|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e3', targetVar='h', filterPredicate=((_e3.label = knows)), projectionKeys=[]}
|.|.|.|-JoinEmbeddingsNode{joinVariables=[c2], vertexMorphism=H, edgeMorphism=I}
|.|.|.|.|-FilterAndProjectVerticesNode{vertexVar=c2, filterPredicate=((c2.label = Clan)), projectionKeys=[]}
|.|.|.|.|-FilterAndProjectEdgesNode{sourceVar='o2', edgeVar='_e2', targetVar='c2', filterPredicate=((_e2.label = leaderOf)), projectionKeys=[]}
FilterAndProject
Filter: Hobbit(name=Frodo Baggins)
Project: [ ]
$\sigma_{Label=Hobbit \wedge name=Frodo}(V)$, $\pi_{h.Id}(V')$
FlatMap(Vertex -> Embedding): DataSet<Vertex> to DataSet<Embedding>

DataSet<Vertex>
id  Properties
1   {…}
2   {…}
3   {…}
…   …

Example vertex properties: name: Frodo Baggins, height: 1.22m, gender: male, city: Bag End

Embedding examples:
h.id  h.name  h.height  …
31    Frodo   1.22      …

h.id
32
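The FlatMap(Vertex -> Embedding) step can be approximated with Java streams, assuming that each vertex yields either no embedding (filter fails) or exactly one embedding holding its id and the projected property values. This is a sketch over `java.util.stream`, not over Flink's DataSet API; `Vertex`, `Embedding` and the example data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch of a combined filter-and-project step; not Gradoop's operator.
class FilterAndProjectSketch {

  static final class Vertex {
    final long id;
    final String label;
    final Map<String, Object> properties = new LinkedHashMap<>();

    Vertex(long id, String label) {
      this.id = id;
      this.label = label;
    }

    Vertex with(String key, Object value) {
      properties.put(key, value);
      return this;
    }
  }

  static final class Embedding {
    final long id;                // id of the matched vertex
    final List<Object> projected; // values of the projection keys, in order

    Embedding(long id, List<Object> projected) {
      this.id = id;
      this.projected = projected;
    }
  }

  // Emits zero embeddings if the filter rejects the vertex, otherwise exactly one.
  static Stream<Embedding> toEmbedding(Vertex v, Predicate<Vertex> filter, List<String> keys) {
    if (!filter.test(v)) {
      return Stream.empty();
    }
    List<Object> projected = new ArrayList<>();
    for (String key : keys) {
      projected.add(v.properties.get(key));
    }
    return Stream.of(new Embedding(v.id, projected));
  }

  // Filter and projection happen in a single pass over the vertices.
  static List<Embedding> filterAndProject(List<Vertex> vertices, Predicate<Vertex> filter,
      List<String> keys) {
    return vertices.stream()
        .flatMap(v -> toEmbedding(v, filter, keys))
        .collect(Collectors.toList());
  }

  // Made-up vertices; the filter mirrors Hobbit(name=Frodo Baggins).
  static List<Embedding> example(List<String> keys) {
    List<Vertex> vertices = Arrays.asList(
        new Vertex(31, "Hobbit").with("name", "Frodo Baggins").with("height", 1.22),
        new Vertex(32, "Hobbit").with("name", "Samwise"),
        new Vertex(33, "Orc").with("name", "Azog"));
    Predicate<Vertex> isFrodo =
        v -> v.label.equals("Hobbit") && "Frodo Baggins".equals(v.properties.get("name"));
    return filterAndProject(vertices, isFrodo, keys);
  }
}
```

With an empty key list the surviving embedding carries only the vertex id, which corresponds to `projectionKeys=[]`; passing `["name"]` would additionally carry the name value.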
JoinEmbeddings
Left: (c1:Clan)<-[:hasLeader]-(o1:Orc)
Right: (o1:Orc)-[:hates]->(o2:Orc)
$L \bowtie_{o1.id} R$

Left input, DataSet<Embedding>
c.id  _e1.id  o1.id
51    11      2
52    12      3
…     …       …

Right input, DataSet<Embedding>
o1.id  _e2.id  o2.id
2      13      5
3      14      3
…      …       …

FlatJoin(lhs, rhs -> combine(lhs, rhs))

Output, DataSet<Embedding>
c.id  _e1.id  o1.id  _e2.id  o2.id
51    11      2      13      5
52    12      3      14      3
… … …

Combine: check for vertex/edge isomorphism, remove duplicate entries
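The join-and-combine step can be sketched as a hash join over id arrays, under two assumptions: "remove duplicate entries" means dropping the repeated join column, and the isomorphism check means that all vertex ids (and, separately, all edge ids) in a combined embedding are pairwise distinct. `JoinEmbeddingsSketch` and its methods are hypothetical, not Gradoop's implementation; the rows are the ones from the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of joining two embedding sets on a shared variable.
class JoinEmbeddingsSketch {

  // True if the ids in the given columns are pairwise distinct.
  static boolean distinct(long[] row, int[] cols) {
    Set<Long> seen = new HashSet<>();
    for (int c : cols) {
      if (!seen.add(row[c])) {
        return false;
      }
    }
    return true;
  }

  // Concatenates both rows and drops the right-hand copy of the join column.
  static long[] combine(long[] left, long[] right, int rightCol) {
    long[] out = new long[left.length + right.length - 1];
    System.arraycopy(left, 0, out, 0, left.length);
    int pos = left.length;
    for (int i = 0; i < right.length; i++) {
      if (i != rightCol) {
        out[pos++] = right[i];
      }
    }
    return out;
  }

  // vertexCols and edgeCols refer to column positions in the combined row.
  static List<long[]> join(List<long[]> left, int leftCol, List<long[]> right, int rightCol,
      int[] vertexCols, int[] edgeCols, boolean vertexIso, boolean edgeIso) {
    Map<Long, List<long[]>> index = new HashMap<>();
    for (long[] l : left) {
      index.computeIfAbsent(l[leftCol], k -> new ArrayList<>()).add(l);
    }
    List<long[]> out = new ArrayList<>();
    for (long[] r : right) {
      List<long[]> bucket = index.get(r[rightCol]);
      if (bucket == null) {
        continue;
      }
      for (long[] l : bucket) {
        long[] combined = combine(l, r, rightCol);
        if (vertexIso && !distinct(combined, vertexCols)) {
          continue; // one vertex bound to two variables
        }
        if (edgeIso && !distinct(combined, edgeCols)) {
          continue; // one edge bound to two variables
        }
        out.add(combined);
      }
    }
    return out;
  }

  // Rows from the slide: left is (c, _e1, o1), right is (o1, _e2, o2).
  static List<long[]> slideExample(boolean isomorphism) {
    List<long[]> left = Arrays.asList(new long[] {51, 11, 2}, new long[] {52, 12, 3});
    List<long[]> right = Arrays.asList(new long[] {2, 13, 5}, new long[] {3, 14, 3});
    return join(left, 2, right, 0, new int[] {0, 2, 4}, new int[] {1, 3},
        isomorphism, isomorphism);
  }
}
```

Under these assumptions the second combined row (52, 12, 3, 14, 3) binds o1 and o2 to the same vertex, so the sketch discards it when isomorphism is requested and keeps it otherwise.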
Benchmark
• 16x Intel(R) Xeon(R) 2.50GHz 6 (12), 48 GB RAM
• 1 Gigabit Ethernet
• Hadoop 2.6.0
• Flink 1.1.2

Dataset       # Vertices  # Edges  Disk size
LDBC-SNB 10   29 M        167 M    19 GB
LDBC-SNB 100  271 M       1.6 B    191 GB
[Figures: results for queries Q2 and Q6]
Summary
• Gradoop & Extended Property Graph Model
  - Schema flexible: type labels and properties
  - Logical graphs / graph collections
• Cypher Pattern Matching Operator
  - Flexible operator for computing matches
  - Combination with existing analytical operators
  - Extendible architecture (planner, statistics, …)
• Implemented on Apache Flink
  - Horizontal scalability
  - Combine with other Flink libraries
www.gradoop.com
Junghanns, M.; Kießling, M.; Averbuch, A.; Petermann, A.; Rahm, E.: "Cypher-based Graph Pattern Matching in Gradoop". Proc. ACM SIGMOD Workshop on Graph Data Management Experiences and Systems (GRADES), 2017.
Petermann, A.; Junghanns, M.; Kemper, S.; Gomez, K.; Teichmann, N.; Rahm, E.: "Graph Mining for Complex Data Analytics". Proc. ICDM Conf. (Demo), 2016.
Junghanns, M.; Petermann, A.; Teichmann, N.; Gomez, K.; Rahm, E.: "Analyzing Extended Property Graphs with Apache Flink". Int. Workshop on Network Data Analytics (NDA), SIGMOD, 2016.
Petermann, A.; Junghanns, M.: "Scalable Business Intelligence with Graph Collections". it – Special Issue on Big Data Analytics, 2016.
Petermann, A.; Junghanns, M.; Müller, M.; Rahm, E.: "Graph-based Data Integration and Business Intelligence with BIIIG". Proc. VLDB Conf. (Demo), 2014.