Security Ops, Engineering, and Intelligence Integration through the power of Graph(DB)!
A case-for-graph-db
-
Upload
dhaval-dalal -
Category
Technology
-
view
184 -
download
0
description
Transcript of A case-for-graph-db
A case for Graph Database?
@softwareartisan
11th Sept 2014
Context
Direct and Cross-Functional reporting represents a network even for a simple organisation.
What about modelling a group?
Apiary Functionality
Structural Operations Mine Organisational Data
!• Expand/Collapse levels • View lineage • Summary Data at all levels • CRUD on all data (nodes/
relationships) • Link/De-link Sub-graphs or
nodes • Evolving attributes of nodes
and relationships based • Adding new nodes and
relationships
!• Affinities Graph - Who talks to
whom the most • Discover Skills Communities • Detecting overlap using SLPA
(Speaker-listener Label Propogation)
Convince ourself first!• Anyone should ask - “What makes it a case for
Graph DB? and Can you prove it?”
• Its basically a de-risking act.
• Two major aspects that we looked at
• Flexibility in schema evolution
• Performance
What to compare against?• RDBMS’ are a natural choice to be compared against.
• MongoDB, though a NoSQL document store
• good for storing DDD style aggregates.
• not for inter-connected data.
• We picked Neo4j
• But remember, this is not a battle, we are just trying to find out when you should use what!
Flexibility in Evolution
Flexibility in Evolution
Flexibility in Evolution• Entity Diversity
• Different kinds of nodes
Flexibility in Evolution• Entity Diversity
• Different kinds of nodes
• Connection Diversity
• links could have different weights, directions.
Flexibility in Evolution• Entity Diversity
• Different kinds of nodes
• Connection Diversity
• links could have different weights, directions.
• Evolution of Entities and Links, themselves over time.
• Varietal data needs
• Is every node/link structured regularly or irregularly, connected or disconnected nodes etc…
Analysis Model (Phase 1)
1) Neo4J Domain Model
Node Properties
Person name
type
level
Relationships Properties
DIRECTLY_MANAGES N/A
Note: For the purpose of establishing the case, we have modeled minimal relationships and not all the relationships that would have been in the final application. Below is a list of relationships that are yet pending to be modeled, but are not relevant for the purposes of taking performance measurements.
Relationships Properties
INDIRECTLY_MANAGES N/A
DIRECTLY_REPORTS_TO N/A
INDIRECTLY_REPORTS_TO N/A
MEETS frequency, where
PREFERS_COMPANY_OF N/A
5
Analysis Model (Phase 1)
1) Neo4J Domain Model
Node Properties
Person name
type
level
Relationships Properties
DIRECTLY_MANAGES N/A
Note: For the purpose of establishing the case, we have modeled minimal relationships and not all the relationships that would have been in the final application. Below is a list of relationships that are yet pending to be modeled, but are not relevant for the purposes of taking performance measurements.
Relationships Properties
INDIRECTLY_MANAGES N/A
DIRECTLY_REPORTS_TO N/A
INDIRECTLY_REPORTS_TO N/A
MEETS frequency, where
PREFERS_COMPANY_OF N/A
5
2) SQL Domain Model
Queries
Above screen-flow and modeling for organizations and groups use cases requires us to run queries that do the following:
1) Gather Subordinates names till a visibility level from current level.
2) Gather Subordinates Aggregate Data from current level.
3) Gather Overall Aggregate Data for Dashboard
We have run these queries at different levels for various volumes of person-data ranging from 1000 people (needed by the organization use case) to 1 million people (needed by the Group use case) in steps of multiples of 10, i.e 1K, 10K, 100K, 1M. We have additionally taken readings for 2M and 3M organization at level 8.
Note: For Neo4j, we have restricted ourselves to Parametric Cypher queries (and not included Traversal API) by executing them and noting the execution time programmatically. This, we believe is a fair apple-to-apple comparison because just as what SQL is to RDBMS, Cypher is to Neo4j. Also, we understand that the Traversal API is likely the fastest at the moment; however, for the longer term as Neo4j intends to further the Cypher query planning and optimization, we think its prudent to stick to it.
6
Minimal set of functionality
1) Gather Subordinates names till a visibility level from a current level
CURRENT LEVEL
We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:
start n = node:Person(name = "fName lName")match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->mreturn nodes(p)
Where visibility level is a number that indicates the number of levels to show.
For SQL we have to recursively add joins for each level, a generic SQL can be written as:
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1,L2Reportees.directly_manages AS Reportee2,...FROM person_reportee manager LEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
Visibility Level
Database Queries
2 Neo4j start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*1..2]->mreturn nodes(p)
2
MySQL/MSSQL
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS ReporteeFROM person_reportee managerLEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
7
Names
Measured performance of 3 queries
• Subordinate names from current level until a visibility level
• Aggregate Data from current level until a visibility level
• Overall Aggregate Data for Dashboard
• distribution of people at various levels
2) Gather Subordinates Aggregate data from current level
We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people.
Further, the optimized Generic Cypher query is:
start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*0..(totalLevels - n.level)]->m-[:DIRECTLY_MANAGES*1..(totalLevels - n.level)]->owhere n.level + visibilityLevel >= m.levelreturn m.name as Subordinate, count(o) as Total
For SQL aggregate query, we not only have to recursively add joins but also perform inner unions for each level till the last level to obtain the aggregate data for that level. Once we obtain the data for a particular level (per person), we perform outer unions to get the final result for all the levels. This results in a very big SQL query.
Here is a sample query that returns aggregate data for 3 levels below the current level. Say, we have current level as 5 with a total of 8 levels, this means I need to aggregate data for each person starting below level 5 until 8 as well as for the person in question (in the where clause).
(SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.pid AS directReportees, 0 AS count! FROM person_reportee manager! WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION ! SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees
UNION SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid
9
AggregateData
Current Level
Apple-to-Apple comparison• MySQL, MS-SQL and Neo4j
• We did not use Traversal API (though faster), just as Cypher is to Neo4j as SQL is to RDBMS’
• For longer term, Neo4j intends to further Cypher Query planning and optimisation.
• Indexing
• Enabled on join columns for MySQL and MS-SQL DBs
• For Neo4j, Person names were indexed
Environment Consistency• Same machine for all databases
• MySQL v5.6.12, MS-SQL Server 2008 R2, Neo4j v1.9 (Advanced)
• DB and tools on the same machine
• Avoid network transport times being factored in.
• Out-of-box settings for all DBs apart from giving 3 GB to the java process that ran Neo4j in Embedded mode.
Functional Equivalence• Consistent data distribution.
• 8 Levels in org with people at each level managing the next for all DBs.
• Functionally equivalent queries.
• Measurements for worst possible queries scenario being executed by the application
• Say if the top-boss logs in and wants to see all the levels (max. visibility level), the query will take the most time.
Measurement Tools• MS-SQL
• Query Profiler
• MySQL
• we noted the duration (excluding fetch time) from MySQL workbench
• Neo4j
• Executed Parameteric queries programatically in Embedded Mode.
• Did not use Neo4j shell for measurements as its intended to be an Ops tool (and not a transactional tool).
Gather Subordinate Names Query
1) Gather Subordinates names till a visibility level from a current level
We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:
start n = node:Person(name = "fName lName")match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->mreturn nodes(p)
Where visibility level is a number that indicates the number of levels to show.
For SQL we have to recursively add joins for each level, a generic SQL can be written as:
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1,L2Reportees.directly_manages AS Reportee2,...FROM person_reportee manager LEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
Visibility Level
Database Queries
2 Neo4j start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*1..2]->mreturn nodes(p)
2
MySQL/MSSQL
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS ReporteeFROM person_reportee managerLEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
7
1) Gather Subordinates names till a visibility level from a current level
We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:
start n = node:Person(name = "fName lName")match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->mreturn nodes(p)
Where visibility level is a number that indicates the number of levels to show.
For SQL we have to recursively add joins for each level, a generic SQL can be written as:
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1,L2Reportees.directly_manages AS Reportee2,...FROM person_reportee manager LEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
Visibility Level
Database Queries
2 Neo4j start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*1..2]->mreturn nodes(p)
2
MySQL/MSSQL
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS ReporteeFROM person_reportee managerLEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
7
1) Gather Subordinates names till a visibility level from a current level
CURRENT LEVEL
We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:
start n = node:Person(name = "fName lName")match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->mreturn nodes(p)
Where visibility level is a number that indicates the number of levels to show.
For SQL we have to recursively add joins for each level, a generic SQL can be written as:
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1,L2Reportees.directly_manages AS Reportee2,...FROM person_reportee manager LEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
Visibility Level
Database Queries
2 Neo4j start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*1..2]->mreturn nodes(p)
2
MySQL/MSSQL
SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS ReporteeFROM person_reportee managerLEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")
7
Names
Gather Subordinate Names
Volume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 78 0 109 0 93 0 226 0 49 0 6 0 679 164 3329 760 1538 269
4 94 15 94 16 78 16 114 82 133 61 35 0 670 178 2878 851 892 183
5 109 0 62 15 47 0 212 43 138 13 59 0 813 166 2848 763 1533 221
6 125 0 156 62 63 0 118 78 278 72 26 0 699 171 3607 1124 1387 319
7 109 16 94 47 62 0 151 90 420 71 13 0 728 177 3123 929 1457 357
8 110 0 141 31 78 0 231 46 621 84 18 0 721 164 2879 1040 906 207
0
50
100
150
200
3 4 5 6 7 8
164178
166 171 177164
1000 People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
375
750
1125
1500
3 4 5 6 7 8
760851
763
1124
9291040
1000 People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
-100
0
100
200
300
400
3 4 5 6 7 8
269
183221
319357
207
1000 People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
14
Neo4jMSSQLMySQL
Warm Cache Plots
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Lvls
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
m
378
0109
093
0226
049
06
06791643329
7601538
269
494
1594
1678
1611482133
6135
06701782878
851892183
5109
062
1547
021243138
1359
08131662848
7631533
221
6125
0156
6263
011878278
7226
0699171360711241387
319
710916
9447
62015190420
7113
07281773123
9291457
357
8110
0141
3178
023146621
8418
072116428791040906207
050100
150
200
34
56
78
164
178
166
171
177
164
1000
Peo
ple
- Gat
her S
ubor
dina
te N
ames
Query Execution Time (ms)
Leve
ls
375
750
1125
1500
34
56
78
760
851
763
1124
929
1040
1000
Peo
ple
- Gat
her S
ubor
dina
te A
ggre
gate
Query Execution Time (ms)
Leve
ls
-1000
100
200
300
400
34
56
78
269
183
221
319
357
207
1000
Peo
ple
- Ove
rall
Aggr
egat
e
Query Execution Time (ms)
Leve
ls
14
Ne
o4j
MS
SQ
LM
yS
QLW
arm
Cac
he P
lots
Volume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 109 0 140 31 109 15 172 107 294 181 103 1 730 176 6129 898 2292 613
4 141 16 343 234 109 0 229 116 1267 156 102 1 776 178 5865 952 2073 455
5 94 15 640 561 94 15 270 164 403 188 327 1 727 184 5718 1194 2119 328
6 94 0 998 905 78 0 401 123 650 254 38 3 742 221 5692 1458 3182 453
7 93 0 1482 1326 78 0 303 177 870 374 68 1 705 185 5710 1188 1990 456
8 93 0 2745 2683 78 0 387 163 1146 573 71 1 713 171 6002 1271 1976 36
0
75
150
225
300
3 4 5 6 7 8
176 178 184221
185 171
10K People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
750
1500
2250
3000
3 4 5 6 7 8
898 9521194
14581188 1271
10K People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
-175
0
175
350
525
700
3 4 5 6 7 8
613
455
328
453 456
36
10K People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
15
Warm Cache Plots
Neo4jMSSQLMySQL
Volume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 124 0 1107 998 140 31 617 537 1201 608 168 12 736 166 13340 4404 5157 566
4 109 0 2231 2106 140 31 1224 650 1263 667 157 19 812 167 14686 4697 5396 594
5 124 16 5616 5460 125 31 1377 757 1830 913 194 12 713 167 15351 5516 5118 497
6 109 0 9734 9625 140 31 1006 811 3002 1274 159 12 756 175 17056 6145 4736 573
7 93 0 16271 16100 156 31 1125 784 2947 1566 158 14 700 177 17821 6915 4592 592
8 109 0 25334 25287 125 32 1285 951 4256 2171 510 14 691 183 19881 8304 4731 557
0
250
500
750
1000
3 4 5 6 7 8166 167 167 175 177 183
100K People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
7500
15000
22500
30000
3 4 5 6 7 8
4404 4697 5516 6145 6915 8304
100K People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
150
300
450
600
3 4 5 6 7 8
566 594
497573 592
557
100K People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)Levels
16
Neo4JMSSQLMySQL
Warm Cache Plots
Volume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate AggregateGather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate AggregateGather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm
Cold Warm Cold Warm
Cold Warm Cold Warm Cold Warm
Cold Warm
Cold Warm Cold Warm
3 109 16 10998 10171 452 281 6337 4973 22703 20037 798 116 718 176 109432 44265 371902109
4 172 0 20467 19734 515 280 5677 5280 14888 7813 1245 114 740 182 115011 106260 360332194
5 172 16 60606 60653 531 281 7111 5959 12047 8293 1211 140 721 184 169123 77528 340682248
6 141 0 92789 92867 483 265 7278 6266 15226 7076 1251 140 709 173 255813 124810 330152107
7 218 0 160151 158793 499 280 11890 6724 15226 7076 1330 111 700 177 171607 125152 328131960
8 265 0 281301 280630 577 265 11300 8777 20560 14564 976 121 835 153 136432 73121 236731912
Warm Cache Plots
-2250
0
2250
4500
6750
9000
3 4 5 6 7 8
176 182 184 173 177 153
1M People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
75000
150000
225000
300000
3 4 5 6 7 8
44265
10626077528
124810125152
73121
1M People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
750
1500
2250
3000
3 4 5 6 7 8
2109 2194 2248 2107 1960 1912
1M People - Overall AggregateQ
uery
Exe
cutio
n Ti
me
(ms)
Levels
17
Neo4JMSSQLMySQL
4.O
ne o
f the
impo
rtan
t obs
erva
tions
that
we
have
mad
e is
that
LEF
T jo
in in
MyS
QL
is fas
ter
as co
mpa
red
to IN
NER
join
(con
trary
to o
ur th
inking
- LE
FT jo
in w
ould
be
mor
e ex
pens
ive
as co
mpa
red
to IN
NER
). A
s all
the
abov
e qu
eries
are
func
tiona
lly e
quiva
lent,
it w
ould
not
be
fair
to u
se IN
NER
join
for
takin
g m
easu
rem
ents.
H
owev
er, w
e co
uld n
ot r
esist
the
cu
riosit
y to
che
ck o
ut IN
NER
join
perfo
rman
ce.
Exec
uting
the
abov
e qu
eries
usin
g inn
er jo
in on
1M
(all
leve
ls), 2
M a
nd 3
M (f
or le
vel 8
), w
e fo
und
a sig
nifica
nt in
crea
se in
que
ry e
xecu
tion
time
for
MyS
QL.
Belo
w ar
e th
e re
sults
of
rem
oving
the
LEFT
join
from
the
Gat
her S
ubor
dina
te N
ames
que
ry fo
r 1M
, 2M
Lev
el 8,
and
3M L
evel
8.
Wha
t doe
s this
mea
n to
us o
r any
app
licat
ion?
Say
, if th
e sit
uatio
n de
man
ded
that
we
chan
ge
the
LEFT
join
to IN
NER
join,
sudd
enly
we
wou
ld fi
nd o
urse
lves i
n a
bad
posit
ion
perfo
rman
ce
wise
.
Whe
reas
its
impo
rtan
t to
not
e th
at N
eo4j’
s pe
rform
ance
is a
lmos
t co
nsta
nt t
ime
exec
utio
n fo
r all d
ata
sizes
.
For
the
abov
e m
easu
rem
ents,
we
did
not
facto
r-in
IND
IREC
TLY_
MAN
AGES
rela
tions
hip,
how
ever
, tak
en in
to c
onsid
erat
ion,
it w
ould
tran
slate
to a
dditio
nal j
oin
for
SQL,
thus
blo
ating
th
e alr
eady
blo
ated
qu
ery
and
incre
asing
co
mpl
exity
. Th
is w
ill fu
rthe
r de
grad
e th
e pe
rform
ance
as j
oin
selec
tion
wou
ld n
ow ta
ke m
ore
time.
19
-175
000
1750
0
3500
0
5250
0
7000
0 1M-3
1M-5
1M-7
2M-8
LEFT
Vs
INN
ER V
s N
eo4j
Query Execution Time (ms)
Org
Size
-Lev
els
MyS
QL
- LEF
T Jo
inM
ySQ
L - I
NNER
Neo4
j
MyS
QL
MyS
QL
MyS
QL
MyS
QL
Ne
o4j
Ne
o4j
Org
S
ize
-L
eve
ls
LEFT
JOIN
LE
FT JO
IN
INN
ER JO
ININ
NER
JOIN
cold
war
mco
ldw
arm
cold
war
m
1M-3
109
16156
16718
176
1M-4
172
0140
0740
182
1M-5
172
01154
874
721
184
1M-6
141
0531
312
709
173
1M-7
218
03760
3432
700
177
1M-8
265
01714515896
835
153
2M-8
281
04006136301
822
149
3M-8
234
06046661776
744
148
War
m C
ache
Plo
ts
MySQL - Left Vs Inner Join
4. One of the important observations that we have made is that LEFT join in MySQL is faster as compared to INNER join (contrary to our thinking - LEFT join would be more expensive as compared to INNER). As all the above queries are functionally equivalent, it would not be fair to use INNER join for taking measurements. However, we could not resist the curiosity to check out INNER join performance.
Executing the above queries using inner join on 1M (all levels), 2M and 3M (for level 8), we found a significant increase in query execution time for MySQL. Below are the results of removing the LEFT join from the Gather Subordinate Names query for 1M, 2M Level 8, and 3M Level 8.
What does this mean to us or any application? Say, if the situation demanded that we change the LEFT join to INNER join, suddenly we would find ourselves in a bad position performance wise. Whereas its important to note that Neo4j’s performance is almost constant time execution for all data sizes.
For the above measurements, we did not factor-in INDIRECTLY_MANAGES relationship, however, taken into consideration, it would translate to additional join for SQL, thus bloating the already bloated query and increasing complexity. This will further degrade the performance as join selection would now take more time.
19
-17500
0
17500
35000
52500
70000
1M-3 1M-5 1M-7 2M-8
LEFT Vs INNER Vs Neo4jQ
uery
Exe
cutio
n Ti
me
(ms)
Org Size-LevelsMySQL - LEFT Join MySQL - INNER Neo4j
MySQLMySQLMySQLMySQL Neo4jNeo4j
Org Size-Levels
LEFT JOIN LEFT JOIN INNER JOININNER JOIN
cold warm cold warm cold warm
1M-3 109 16 156 16 718 176
1M-4 172 0 140 0 740 182
1M-5 172 0 1154 874 721 184
1M-6 141 0 531 312 709 173
1M-7 218 0 3760 3432 700 177
1M-8 265 0 17145 15896 835 153
2M-8 281 0 40061 36301 822 149
3M-8 234 0 60466 61776 744 148
Warm Cache Plots
Performance
Performance
Performance
1. Cartesian Product All possible combinations
Performance
1. Cartesian Product All possible combinations
Performance
2. Filter
1. Cartesian Product All possible combinations
Performance
2. Filter
1. Cartesian Product All possible combinations
PerformanceData Size
2. Filter
1. Cartesian Product All possible combinations
DS
PerformanceData Size
2. Filter
1. Cartesian Product All possible combinations
DSTime
JoinsPerformance
Data Size
2. Filter
1. Cartesian Product All possible combinations
DSTime
| J
JoinsPerformance
Data Size
2. Filter
1. Cartesian Product All possible combinations
Project
People
Skills
DSTime
| J
JoinsPerformance
Data Size
2. Filter
1. Cartesian Product All possible combinations
Project
People
Skills
DSTime
Query is now localised to a section of graph,
solves large data-size problem| J
JoinsPerformance
Data Size
2. Filter
1. Cartesian Product All possible combinations
Project
People
Skills
DSTime
Traversal Query is now localised to a section of graph,
solves large data-size problem| J
JoinsPerformance
Data Size
2. Filter
1. Cartesian Product All possible combinations
Project
People
Skills
DSTime Time
Traversal Query is now localised to a section of graph,
solves large data-size problem| J
Relationships• Not first-class citizens in RDBMS’ and NoSQL aggregate stores
• document stores or key-value stores.
• Cost of executing connected queries is high.
• “Who reports immediately to Smarty Pants?” - not a problem
• But, “Who all report to Smarty Pants?” - introduces recursive joins and going down more than 5-6 levels the space and time complexity is very high.
• “What skills does this person have?“ is relatively cheaper than “which people have these skills?”
• what about - “which are people who have these skills also have those skills?”
Gather Subordinates Aggregate Data
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Lvls
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
m
378
0109
093
0226
049
06
06791643329
7601538
269
494
1594
1678
1611482133
6135
06701782878
851892183
5109
062
1547
021243138
1359
08131662848
7631533
221
6125
0156
6263
011878278
7226
0699171360711241387
319
710916
9447
62015190420
7113
07281773123
9291457
357
8110
0141
3178
023146621
8418
072116428791040906207
050100
150
200
34
56
78
164
178
166
171
177
164
1000
Peo
ple
- Gat
her S
ubor
dina
te N
ames
Query Execution Time (ms)
Leve
ls
375
750
1125
1500
34
56
78
760
851
763
1124
929
1040
1000
Peo
ple
- Gat
her S
ubor
dina
te A
ggre
gate
Query Execution Time (ms)
Leve
ls
-1000
100
200
300
400
34
56
78
269
183
221
319
357
207
1000
Peo
ple
- Ove
rall
Aggr
egat
e
Query Execution Time (ms)
Leve
ls
14
Ne
o4j
MS
SQ
LM
yS
QLW
arm
Cac
he P
lots
4.O
ne o
f the
impo
rtan
t obs
erva
tions
that
we
have
mad
e is
that
LEF
T jo
in in
MyS
QL
is fas
ter
as co
mpa
red
to IN
NER
join
(con
trary
to o
ur th
inking
- LE
FT jo
in w
ould
be
mor
e ex
pens
ive
as co
mpa
red
to IN
NER
). A
s all
the
abov
e qu
eries
are
func
tiona
lly e
quiva
lent,
it w
ould
not
be
fair
to u
se IN
NER
join
for
takin
g m
easu
rem
ents.
H
owev
er, w
e co
uld n
ot r
esist
the
cu
riosit
y to
che
ck o
ut IN
NER
join
perfo
rman
ce.
Exec
uting
the
abov
e qu
eries
usin
g inn
er jo
in on
1M
(all
leve
ls), 2
M a
nd 3
M (f
or le
vel 8
), w
e fo
und
a sig
nifica
nt in
crea
se in
que
ry e
xecu
tion
time
for
MyS
QL.
Belo
w ar
e th
e re
sults
of
rem
oving
the
LEFT
join
from
the
Gat
her S
ubor
dina
te N
ames
que
ry fo
r 1M
, 2M
Lev
el 8,
and
3M L
evel
8.
Wha
t doe
s this
mea
n to
us o
r any
app
licat
ion?
Say
, if th
e sit
uatio
n de
man
ded
that
we
chan
ge
the
LEFT
join
to IN
NER
join,
sudd
enly
we
wou
ld fi
nd o
urse
lves i
n a
bad
posit
ion
perfo
rman
ce
wise
.
Whe
reas
its
impo
rtan
t to
not
e th
at N
eo4j’
s pe
rform
ance
is a
lmos
t co
nsta
nt t
ime
exec
utio
n fo
r all d
ata
sizes
.
For
the
abov
e m
easu
rem
ents,
we
did
not
facto
r-in
IND
IREC
TLY_
MAN
AGES
rela
tions
hip,
how
ever
, tak
en in
to c
onsid
erat
ion,
it w
ould
tran
slate
to a
dditio
nal j
oin
for
SQL,
thus
blo
ating
th
e alr
eady
blo
ated
qu
ery
and
incre
asing
co
mpl
exity
. Th
is w
ill fu
rthe
r de
grad
e th
e pe
rform
ance
as j
oin
selec
tion
wou
ld n
ow ta
ke m
ore
time.
19
-175
000
1750
0
3500
0
5250
0
7000
0 1M-3
1M-5
1M-7
2M-8
LEFT
Vs
INN
ER V
s N
eo4j
Query Execution Time (ms)
Org
Size
-Lev
els
MyS
QL
- LEF
T Jo
inM
ySQ
L - I
NNER
Neo4
j
MyS
QL
MyS
QL
MyS
QL
MyS
QL
Ne
o4j
Ne
o4j
Org
S
ize
-L
eve
ls
LEFT
JOIN
LE
FT JO
IN
INN
ER JO
ININ
NER
JOIN
cold
war
mco
ldw
arm
cold
war
m
1M-3
109
16156
16718
176
1M-4
172
0140
0740
182
1M-5
172
01154
874
721
184
1M-6
141
0531
312
709
173
1M-7
218
03760
3432
700
177
1M-8
265
01714515896
835
153
2M-8
281
04006136301
822
149
3M-8
234
06046661776
744
148
War
m C
ache
Plo
ts
Volume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 78 0 109 0 93 0 226 0 49 0 6 0 679 164 3329 760 1538 269
4 94 15 94 16 78 16 114 82 133 61 35 0 670 178 2878 851 892 183
5 109 0 62 15 47 0 212 43 138 13 59 0 813 166 2848 763 1533 221
6 125 0 156 62 63 0 118 78 278 72 26 0 699 171 3607 1124 1387 319
7 109 16 94 47 62 0 151 90 420 71 13 0 728 177 3123 929 1457 357
8 110 0 141 31 78 0 231 46 621 84 18 0 721 164 2879 1040 906 207
0
50
100
150
200
3 4 5 6 7 8
164178
166 171 177164
1000 People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
375
750
1125
1500
3 4 5 6 7 8
760851
763
1124
9291040
1000 People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
-100
0
100
200
300
400
3 4 5 6 7 8
269
183221
319357
207
1000 People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
14
Neo4jMSSQLMySQL
Warm Cache Plots
Volume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 109 0 140 31 109 15 172 107 294 181 103 1 730 176 6129 898 2292 613
4 141 16 343 234 109 0 229 116 1267 156 102 1 776 178 5865 952 2073 455
5 94 15 640 561 94 15 270 164 403 188 327 1 727 184 5718 1194 2119 328
6 94 0 998 905 78 0 401 123 650 254 38 3 742 221 5692 1458 3182 453
7 93 0 1482 1326 78 0 303 177 870 374 68 1 705 185 5710 1188 1990 456
8 93 0 2745 2683 78 0 387 163 1146 573 71 1 713 171 6002 1271 1976 36
0
75
150
225
300
3 4 5 6 7 8
176 178 184221
185 171
10K People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
750
1500
2250
3000
3 4 5 6 7 8
898 9521194
14581188 1271
10K People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
-175
0
175
350
525
700
3 4 5 6 7 8
613
455
328
453 456
36
10K People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
15
Warm Cache Plots
Neo4jMSSQLMySQL
Volume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 124 0 1107 998 140 31 617 537 1201 608 168 12 736 166 13340 4404 5157 566
4 109 0 2231 2106 140 31 1224 650 1263 667 157 19 812 167 14686 4697 5396 594
5 124 16 5616 5460 125 31 1377 757 1830 913 194 12 713 167 15351 5516 5118 497
6 109 0 9734 9625 140 31 1006 811 3002 1274 159 12 756 175 17056 6145 4736 573
7 93 0 16271 16100 156 31 1125 784 2947 1566 158 14 700 177 17821 6915 4592 592
8 109 0 25334 25287 125 32 1285 951 4256 2171 510 14 691 183 19881 8304 4731 557
0
250
500
750
1000
3 4 5 6 7 8166 167 167 175 177 183
100K People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
7500
15000
22500
30000
3 4 5 6 7 8
4404 4697 5516 6145 6915 8304
100K People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
150
300
450
600
3 4 5 6 7 8
566 594
497573 592
557
100K People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
16
Neo4JMSSQLMySQL
Warm Cache Plots
Volume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate AggregateGather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate AggregateGather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm
Cold Warm Cold Warm
Cold Warm Cold Warm Cold Warm
Cold Warm
Cold Warm Cold Warm
3 109 16 10998 10171 452 281 6337 4973 22703 20037 798 116 718 176 109432 44265 371902109
4 172 0 20467 19734 515 280 5677 5280 14888 7813 1245 114 740 182 115011 106260 360332194
5 172 16 60606 60653 531 281 7111 5959 12047 8293 1211 140 721 184 169123 77528 340682248
6 141 0 92789 92867 483 265 7278 6266 15226 7076 1251 140 709 173 255813 124810 330152107
7 218 0 160151 158793 499 280 11890 6724 15226 7076 1330 111 700 177 171607 125152 328131960
8 265 0 281301 280630 577 265 11300 8777 20560 14564 976 121 835 153 136432 73121 236731912
Warm Cache Plots
-2250
0
2250
4500
6750
9000
3 4 5 6 7 8
176 182 184 173 177 153
1M People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
75000
150000
225000
300000
3 4 5 6 7 8
44265
10626077528
124810125152
73121
1M People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
750
1500
2250
3000
3 4 5 6 7 8
2109 2194 2248 2107 1960 1912
1M People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
17
Neo4JMSSQLMySQL
Gather Overall Aggregate Query
start n = node(*) !return n.level as Level, count(n) as Total!order by Level
SELECT level, count(id)!FROM person!GROUP BY level
Gather Overall Aggregate Data
Volume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 78 0 109 0 93 0 226 0 49 0 6 0 679 164 3329 760 1538 269
4 94 15 94 16 78 16 114 82 133 61 35 0 670 178 2878 851 892 183
5 109 0 62 15 47 0 212 43 138 13 59 0 813 166 2848 763 1533 221
6 125 0 156 62 63 0 118 78 278 72 26 0 699 171 3607 1124 1387 319
7 109 16 94 47 62 0 151 90 420 71 13 0 728 177 3123 929 1457 357
8 110 0 141 31 78 0 231 46 621 84 18 0 721 164 2879 1040 906 207
0
50
100
150
200
3 4 5 6 7 8
164178
166 171 177164
1000 People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
375
750
1125
1500
3 4 5 6 7 8
760851
763
1124
9291040
1000 People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
-100
0
100
200
300
400
3 4 5 6 7 8
269
183221
319357
207
1000 People - Overall AggregateQ
uery
Exe
cutio
n Ti
me
(ms)
Levels
14
Neo4jMSSQLMySQL
Warm Cache Plots
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
Vo
lum
e 1
000 P
eo
ple
Org
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MyS
QL
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
MS
-SQ
L
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Ne
o4j
Lvls
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Nam
es
Gat
her
Subo
rdin
ate
Aggr
egat
e
Gat
her
Subo
rdin
ate
Aggr
egat
e
Ove
rall
Aggr
egat
eO
vera
ll Ag
greg
ate
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
mCo
ldW
arm
Cold
War
m
378
0109
093
0226
049
06
06791643329
7601538
269
494
1594
1678
1611482133
6135
06701782878
851892183
5109
062
1547
021243138
1359
08131662848
7631533
221
6125
0156
6263
011878278
7226
0699171360711241387
319
710916
9447
62015190420
7113
07281773123
9291457
357
8110
0141
3178
023146621
8418
072116428791040906207
050100
150
200
34
56
78
164
178
166
171
177
164
1000
Peo
ple
- Gat
her S
ubor
dina
te N
ames
Query Execution Time (ms)
Leve
ls
375
750
1125
1500
34
56
78
760
851
763
1124
929
1040
1000
Peo
ple
- Gat
her S
ubor
dina
te A
ggre
gate
Query Execution Time (ms)
Leve
ls
-1000
100
200
300
400
34
56
78
269
183
221
319
357
207
1000
Peo
ple
- Ove
rall
Aggr
egat
e
Query Execution Time (ms)
Leve
ls
14
Ne
o4j
MS
SQ
LM
yS
QLW
arm
Cac
he P
lots
4.O
ne o
f the
impo
rtan
t obs
erva
tions
that
we
have
mad
e is
that
LEF
T jo
in in
MyS
QL
is fas
ter
as co
mpa
red
to IN
NER
join
(con
trary
to o
ur th
inking
- LE
FT jo
in w
ould
be
mor
e ex
pens
ive
as co
mpa
red
to IN
NER
). A
s all
the
abov
e qu
eries
are
func
tiona
lly e
quiva
lent,
it w
ould
not
be
fair
to u
se IN
NER
join
for
takin
g m
easu
rem
ents.
H
owev
er, w
e co
uld n
ot r
esist
the
cu
riosit
y to
che
ck o
ut IN
NER
join
perfo
rman
ce.
Exec
uting
the
abov
e qu
eries
usin
g inn
er jo
in on
1M
(all
leve
ls), 2
M a
nd 3
M (f
or le
vel 8
), w
e fo
und
a sig
nifica
nt in
crea
se in
que
ry e
xecu
tion
time
for
MyS
QL.
Belo
w ar
e th
e re
sults
of
rem
oving
the
LEFT
join
from
the
Gat
her S
ubor
dina
te N
ames
que
ry fo
r 1M
, 2M
Lev
el 8,
and
3M L
evel
8.
Wha
t doe
s this
mea
n to
us o
r any
app
licat
ion?
Say
, if th
e sit
uatio
n de
man
ded
that
we
chan
ge
the
LEFT
join
to IN
NER
join,
sudd
enly
we
wou
ld fi
nd o
urse
lves i
n a
bad
posit
ion
perfo
rman
ce
wise
.
Whe
reas
its
impo
rtan
t to
not
e th
at N
eo4j’
s pe
rform
ance
is a
lmos
t co
nsta
nt t
ime
exec
utio
n fo
r all d
ata
sizes
.
For
the
abov
e m
easu
rem
ents,
we
did
not
facto
r-in
IND
IREC
TLY_
MAN
AGES
rela
tions
hip,
how
ever
, tak
en in
to c
onsid
erat
ion,
it w
ould
tran
slate
to a
dditio
nal j
oin
for
SQL,
thus
blo
ating
th
e alr
eady
blo
ated
qu
ery
and
incre
asing
co
mpl
exity
. Th
is w
ill fu
rthe
r de
grad
e th
e pe
rform
ance
as j
oin
selec
tion
wou
ld n
ow ta
ke m
ore
time.
19
-175
000
1750
0
3500
0
5250
0
7000
0 1M-3
1M-5
1M-7
2M-8
LEFT
Vs
INN
ER V
s N
eo4j
Query Execution Time (ms)
Org
Size
-Lev
els
MyS
QL
- LEF
T Jo
inM
ySQ
L - I
NNER
Neo4
j
MyS
QL
MyS
QL
MyS
QL
MyS
QL
Ne
o4j
Ne
o4j
Org
S
ize
-L
eve
ls
LEFT
JOIN
LE
FT JO
IN
INN
ER JO
ININ
NER
JOIN
cold
war
mco
ldw
arm
cold
war
m
1M-3
109
16156
16718
176
1M-4
172
0140
0740
182
1M-5
172
01154
874
721
184
1M-6
141
0531
312
709
173
1M-7
218
03760
3432
700
177
1M-8
265
01714515896
835
153
2M-8
281
04006136301
822
149
3M-8
234
06046661776
744
148
War
m C
ache
Plo
ts
Volume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 109 0 140 31 109 15 172 107 294 181 103 1 730 176 6129 898 2292 613
4 141 16 343 234 109 0 229 116 1267 156 102 1 776 178 5865 952 2073 455
5 94 15 640 561 94 15 270 164 403 188 327 1 727 184 5718 1194 2119 328
6 94 0 998 905 78 0 401 123 650 254 38 3 742 221 5692 1458 3182 453
7 93 0 1482 1326 78 0 303 177 870 374 68 1 705 185 5710 1188 1990 456
8 93 0 2745 2683 78 0 387 163 1146 573 71 1 713 171 6002 1271 1976 36
0
75
150
225
300
3 4 5 6 7 8
176 178 184221
185 171
10K People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
750
1500
2250
3000
3 4 5 6 7 8
898 9521194
14581188 1271
10K People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
-175
0
175
350
525
700
3 4 5 6 7 8
613
455
328
453 456
36
10K People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
15
Warm Cache Plots
Neo4jMSSQLMySQL
Volume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm
3 124 0 1107 998 140 31 617 537 1201 608 168 12 736 166 13340 4404 5157 566
4 109 0 2231 2106 140 31 1224 650 1263 667 157 19 812 167 14686 4697 5396 594
5 124 16 5616 5460 125 31 1377 757 1830 913 194 12 713 167 15351 5516 5118 497
6 109 0 9734 9625 140 31 1006 811 3002 1274 159 12 756 175 17056 6145 4736 573
7 93 0 16271 16100 156 31 1125 784 2947 1566 158 14 700 177 17821 6915 4592 592
8 109 0 25334 25287 125 32 1285 951 4256 2171 510 14 691 183 19881 8304 4731 557
0
250
500
750
1000
3 4 5 6 7 8166 167 167 175 177 183
100K People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
7500
15000
22500
30000
3 4 5 6 7 8
4404 4697 5516 6145 6915 8304
100K People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
150
300
450
600
3 4 5 6 7 8
566 594
497573 592
557
100K People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
16
Neo4JMSSQLMySQL
Warm Cache Plots
Volume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People Org
MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j
Lvls Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate AggregateGather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate Aggregate
Gather Subordinate Aggregate
Overall AggregateOverall Aggregate
Gather Subordinate Names
Gather Subordinate Names
Gather Subordinate AggregateGather Subordinate Aggregate
Overall AggregateOverall Aggregate
Cold Warm
Cold Warm Cold Warm
Cold Warm Cold Warm Cold Warm
Cold Warm
Cold Warm Cold Warm
3 109 16 10998 10171 452 281 6337 4973 22703 20037 798 116 718 176 109432 44265 371902109
4 172 0 20467 19734 515 280 5677 5280 14888 7813 1245 114 740 182 115011 106260 360332194
5 172 16 60606 60653 531 281 7111 5959 12047 8293 1211 140 721 184 169123 77528 340682248
6 141 0 92789 92867 483 265 7278 6266 15226 7076 1251 140 709 173 255813 124810 330152107
7 218 0 160151 158793 499 280 11890 6724 15226 7076 1330 111 700 177 171607 125152 328131960
8 265 0 281301 280630 577 265 11300 8777 20560 14564 976 121 835 153 136432 73121 236731912
Warm Cache Plots
-2250
0
2250
4500
6750
9000
3 4 5 6 7 8
176 182 184 173 177 153
1M People - Gather Subordinate Names
Que
ry E
xecu
tion
Tim
e (m
s)Levels
0
75000
150000
225000
300000
3 4 5 6 7 8
44265
10626077528
124810125152
73121
1M People - Gather Subordinate Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
0
750
1500
2250
3000
3 4 5 6 7 8
2109 2194 2248 2107 1960 1912
1M People - Overall Aggregate
Que
ry E
xecu
tion
Tim
e (m
s)
Levels
17
Neo4JMSSQLMySQL
But, what if...• I need to aggregate data, placing an OLAPish demand on
the data?
• Use non-graph stores: RDBMS or NoSQL alongside.
• Graph Compute Engines are optimised for scanning and processing large amounts of information in batch.
• Giraph, Pegasus etc…
• Polyglot persistence is a norm.
• Has anyone tried Datomic?
Asking Questions• Is your data connected?
• Or does your domain naturally gravitate towards or has good number of joins (Dense Connections) leading to explosion with large data?
• For example: Making Recommendations
• Or are you finding yourself writing complex SQL Queries?
• For example: recursive joins or lots of joins
Qs?
References• Graph Databases
• Jim Webber, Ian Robinson and Emil Eifrem
• Apiary: A case for Neo4j?
• Anuj Mathur and Dhaval Dalal
• Code and scaffolding programs that we used are available on - https://github.com/ EqualExperts/Apiary-Neo4j-RDBMS-Comparison