A case-for-graph-db

40
A case for Graph Database? [email protected] @softwareartisan 11th Sept 2014

description

These days we hear a lot about NoSQL and particularly graph database. But before jumping straight into development using any graph database, we should ask the following questions - 'what makes it a case for GraphDB? And can you prove it?' Basically de-risking and making a case for management buy in. Further, its important to convince ourselves. This talk highlights some cases where GraphDB will be useful. Followed by some insights from a comparison we did between Neo4j and MySQL/MSSQL. By the end of this session, you'll understand the advantages of using a GraphDB and also what questions to ask before selecting a GraphDB. This was presented at TechJam on 11th Sept 2014

Transcript of A case-for-graph-db

Page 1: A case-for-graph-db

A case for Graph Database?

[email protected] !

@softwareartisan

11th Sept 2014

Page 2: A case-for-graph-db

Context

Direct and Cross-Functional reporting represents a network even for a simple organisation.

What about modelling a group?

Page 3: A case-for-graph-db

Apiary Functionality

Structural Operations Mine Organisational Data

!• Expand/Collapse levels • View lineage • Summary Data at all levels • CRUD on all data (nodes/

relationships) • Link/De-link Sub-graphs or

nodes • Evolving attributes of nodes

and relationships based • Adding new nodes and

relationships

!• Affinities Graph - Who talks to

whom the most • Discover Skills Communities • Detecting overlap using SLPA

(Speaker-listener Label Propogation)

Page 4: A case-for-graph-db

Convince ourself first!• Anyone should ask - “What makes it a case for

Graph DB? and Can you prove it?”

• Its basically a de-risking act.

• Two major aspects that we looked at

• Flexibility in schema evolution

• Performance

Page 5: A case-for-graph-db

What to compare against?• RDBMS’ are a natural choice to be compared against.

• MongoDB, though a NoSQL document store

• good for storing DDD style aggregates.

• not for inter-connected data.

• We picked Neo4j

• But remember, this is not a battle, we are just trying to find out when you should use what!

Page 6: A case-for-graph-db

Flexibility in Evolution

Page 7: A case-for-graph-db

Flexibility in Evolution

Page 8: A case-for-graph-db

Flexibility in Evolution• Entity Diversity

• Different kinds of nodes

Page 9: A case-for-graph-db

Flexibility in Evolution• Entity Diversity

• Different kinds of nodes

• Connection Diversity

• links could have different weights, directions.

Page 10: A case-for-graph-db

Flexibility in Evolution• Entity Diversity

• Different kinds of nodes

• Connection Diversity

• links could have different weights, directions.

• Evolution of Entities and Links, themselves over time.

• Varietal data needs

• Is every node/link structured regularly or irregularly, connected or disconnected nodes etc…

Page 11: A case-for-graph-db

Analysis Model (Phase 1)

1) Neo4J Domain Model

Node Properties

Person name

type

level

Relationships Properties

DIRECTLY_MANAGES N/A

Note: For the purpose of establishing the case, we have modeled minimal relationships and not all the relationships that would have been in the final application. Below is a list of relationships that are yet pending to be modeled, but are not relevant for the purposes of taking performance measurements.

Relationships Properties

INDIRECTLY_MANAGES N/A

DIRECTLY_REPORTS_TO N/A

INDIRECTLY_REPORTS_TO N/A

MEETS frequency, where

PREFERS_COMPANY_OF N/A

5

Analysis Model (Phase 1)

1) Neo4J Domain Model

Node Properties

Person name

type

level

Relationships Properties

DIRECTLY_MANAGES N/A

Note: For the purpose of establishing the case, we have modeled minimal relationships and not all the relationships that would have been in the final application. Below is a list of relationships that are yet pending to be modeled, but are not relevant for the purposes of taking performance measurements.

Relationships Properties

INDIRECTLY_MANAGES N/A

DIRECTLY_REPORTS_TO N/A

INDIRECTLY_REPORTS_TO N/A

MEETS frequency, where

PREFERS_COMPANY_OF N/A

5

2) SQL Domain Model

Queries

Above screen-flow and modeling for organizations and groups use cases requires us to run queries that do the following:

1) Gather Subordinates names till a visibility level from current level.

2) Gather Subordinates Aggregate Data from current level.

3) Gather Overall Aggregate Data for Dashboard

We have run these queries at different levels for various volumes of person-data ranging from 1000 people (needed by the organization use case) to 1 million people (needed by the Group use case) in steps of multiples of 10, i.e 1K, 10K, 100K, 1M. We have additionally taken readings for 2M and 3M organization at level 8.

Note: For Neo4j, we have restricted ourselves to Parametric Cypher queries (and not included Traversal API) by executing them and noting the execution time programmatically. This, we believe is a fair apple-to-apple comparison because just as what SQL is to RDBMS, Cypher is to Neo4j. Also, we understand that the Traversal API is likely the fastest at the moment; however, for the longer term as Neo4j intends to further the Cypher query planning and optimization, we think its prudent to stick to it.

6

Minimal set of functionality

Page 12: A case-for-graph-db

1) Gather Subordinates names till a visibility level from a current level

CURRENT LEVEL

We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:

start n = node:Person(name = "fName lName")match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->mreturn nodes(p)

Where visibility level is a number that indicates the number of levels to show.

For SQL we have to recursively add joins for each level, a generic SQL can be written as:

SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1,L2Reportees.directly_manages AS Reportee2,...FROM person_reportee manager LEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")

Visibility Level

Database Queries

2 Neo4j start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*1..2]->mreturn nodes(p)

2

MySQL/MSSQL

SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS ReporteeFROM person_reportee managerLEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")

7

Names

Measured performance of 3 queries

• Subordinate names from current level until a visibility level

• Aggregate Data from current level until a visibility level

• Overall Aggregate Data for Dashboard

• distribution of people at various levels

2) Gather Subordinates Aggregate data from current level

We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people.

Further, the optimized Generic Cypher query is:

start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*0..(totalLevels - n.level)]->m-[:DIRECTLY_MANAGES*1..(totalLevels - n.level)]->owhere n.level + visibilityLevel >= m.levelreturn m.name as Subordinate, count(o) as Total

For SQL aggregate query, we not only have to recursively add joins but also perform inner unions for each level till the last level to obtain the aggregate data for that level. Once we obtain the data for a particular level (per person), we perform outer unions to get the final result for all the levels. This results in a very big SQL query.

Here is a sample query that returns aggregate data for 3 levels below the current level. Say, we have current level as 5 with a total of 8 levels, this means I need to aggregate data for each person starting below level 5 until 8 as well as for the person in question (in the where clause).

(SELECT T.directReportees AS directReportees, sum(T.count) AS count FROM ( SELECT manager.pid AS directReportees, 0 AS count! FROM person_reportee manager! WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") UNION ! SELECT manager.pid AS directReportees, count(manager.directly_manages) AS count FROM person_reportee manager WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees

UNION SELECT manager.pid AS directReportees, count(reportee.directly_manages) AS count FROM person_reportee manager JOIN person_reportee reportee ON manager.directly_manages = reportee.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName") GROUP BY directReportees UNION SELECT manager.pid AS directReportees, count(L2Reportees.directly_manages) AS count FROM person_reportee manager JOIN person_reportee L1Reportees ON manager.directly_manages = L1Reportees.pid JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid

9

AggregateData

Current Level

Page 13: A case-for-graph-db

Apple-to-Apple comparison• MySQL, MS-SQL and Neo4j

• We did not use Traversal API (though faster), just as Cypher is to Neo4j as SQL is to RDBMS’

• For longer term, Neo4j intends to further Cypher Query planning and optimisation.

• Indexing

• Enabled on join columns for MySQL and MS-SQL DBs

• For Neo4j, Person names were indexed

Page 14: A case-for-graph-db

Environment Consistency• Same machine for all databases

• MySQL v5.6.12, MS-SQL Server 2008 R2, Neo4j v1.9 (Advanced)

• DB and tools on the same machine

• Avoid network transport times being factored in.

• Out-of-box settings for all DBs apart from giving 3 GB to the java process that ran Neo4j in Embedded mode.

Page 15: A case-for-graph-db

Functional Equivalence• Consistent data distribution.

• 8 Levels in org with people at each level managing the next for all DBs.

• Functionally equivalent queries.

• Measurements for worst possible queries scenario being executed by the application

• Say if the top-boss logs in and wants to see all the levels (max. visibility level), the query will take the most time.

Page 16: A case-for-graph-db

Measurement Tools• MS-SQL

• Query Profiler

• MySQL

• we noted the duration (excluding fetch time) from MySQL workbench

• Neo4j

• Executed Parameteric queries programatically in Embedded Mode.

• Did not use Neo4j shell for measurements as its intended to be an Ops tool (and not a transactional tool).

Page 17: A case-for-graph-db

Gather Subordinate Names Query

1) Gather Subordinates names till a visibility level from a current level

We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:

start n = node:Person(name = "fName lName")match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->mreturn nodes(p)

Where visibility level is a number that indicates the number of levels to show.

For SQL we have to recursively add joins for each level, a generic SQL can be written as:

SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1,L2Reportees.directly_manages AS Reportee2,...FROM person_reportee manager LEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")

Visibility Level

Database Queries

2 Neo4j start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*1..2]->mreturn nodes(p)

2

MySQL/MSSQL

SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS ReporteeFROM person_reportee managerLEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")

7

1) Gather Subordinates names till a visibility level from a current level

We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:

start n = node:Person(name = "fName lName")match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->mreturn nodes(p)

Where visibility level is a number that indicates the number of levels to show.

For SQL we have to recursively add joins for each level, a generic SQL can be written as:

SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1,L2Reportees.directly_manages AS Reportee2,...FROM person_reportee manager LEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")

Visibility Level

Database Queries

2 Neo4j start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*1..2]->mreturn nodes(p)

2

MySQL/MSSQL

SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS ReporteeFROM person_reportee managerLEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")

7

1) Gather Subordinates names till a visibility level from a current level

CURRENT LEVEL

We have varied total hierarchy levels in the organization from 3 to 8 for different volumes of people. Optimized generic Cypher query is:

start n = node:Person(name = "fName lName")match p = n-[:DIRECTLY_MANAGES*1..visibilityLevel]->mreturn nodes(p)

Where visibility level is a number that indicates the number of levels to show.

For SQL we have to recursively add joins for each level, a generic SQL can be written as:

SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS Reportee1,L2Reportees.directly_manages AS Reportee2,...FROM person_reportee manager LEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid LEFT JOIN person_reportee L2Reportees ON L1Reportees.directly_manages = L2Reportees.pid ... ... WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")

Visibility Level

Database Queries

2 Neo4j start n = node:Person(name = "fName lName")match n-[:DIRECTLY_MANAGES*1..2]->mreturn nodes(p)

2

MySQL/MSSQL

SELECT manager.pid AS Bigboss, manager.directly_manages AS Subordinate, L1Reportees.directly_manages AS ReporteeFROM person_reportee managerLEFT JOIN person_reportee L1ReporteesON manager.directly_manages = L1Reportees.pid WHERE manager.pid = (SELECT id FROM person WHERE name = "fName lName")

7

Names

Page 18: A case-for-graph-db

Gather Subordinate Names

Volume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 78 0 109 0 93 0 226 0 49 0 6 0 679 164 3329 760 1538 269

4 94 15 94 16 78 16 114 82 133 61 35 0 670 178 2878 851 892 183

5 109 0 62 15 47 0 212 43 138 13 59 0 813 166 2848 763 1533 221

6 125 0 156 62 63 0 118 78 278 72 26 0 699 171 3607 1124 1387 319

7 109 16 94 47 62 0 151 90 420 71 13 0 728 177 3123 929 1457 357

8 110 0 141 31 78 0 231 46 621 84 18 0 721 164 2879 1040 906 207

0

50

100

150

200

3 4 5 6 7 8

164178

166 171 177164

1000 People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

375

750

1125

1500

3 4 5 6 7 8

760851

763

1124

9291040

1000 People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

-100

0

100

200

300

400

3 4 5 6 7 8

269

183221

319357

207

1000 People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

14

Neo4jMSSQLMySQL

Warm Cache Plots

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Lvls

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

m

378

0109

093

0226

049

06

06791643329

7601538

269

494

1594

1678

1611482133

6135

06701782878

851892183

5109

062

1547

021243138

1359

08131662848

7631533

221

6125

0156

6263

011878278

7226

0699171360711241387

319

710916

9447

62015190420

7113

07281773123

9291457

357

8110

0141

3178

023146621

8418

072116428791040906207

050100

150

200

34

56

78

164

178

166

171

177

164

1000

Peo

ple

- Gat

her S

ubor

dina

te N

ames

Query Execution Time (ms)

Leve

ls

375

750

1125

1500

34

56

78

760

851

763

1124

929

1040

1000

Peo

ple

- Gat

her S

ubor

dina

te A

ggre

gate

Query Execution Time (ms)

Leve

ls

-1000

100

200

300

400

34

56

78

269

183

221

319

357

207

1000

Peo

ple

- Ove

rall

Aggr

egat

e

Query Execution Time (ms)

Leve

ls

14

Ne

o4j

MS

SQ

LM

yS

QLW

arm

Cac

he P

lots

Volume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 109 0 140 31 109 15 172 107 294 181 103 1 730 176 6129 898 2292 613

4 141 16 343 234 109 0 229 116 1267 156 102 1 776 178 5865 952 2073 455

5 94 15 640 561 94 15 270 164 403 188 327 1 727 184 5718 1194 2119 328

6 94 0 998 905 78 0 401 123 650 254 38 3 742 221 5692 1458 3182 453

7 93 0 1482 1326 78 0 303 177 870 374 68 1 705 185 5710 1188 1990 456

8 93 0 2745 2683 78 0 387 163 1146 573 71 1 713 171 6002 1271 1976 36

0

75

150

225

300

3 4 5 6 7 8

176 178 184221

185 171

10K People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

750

1500

2250

3000

3 4 5 6 7 8

898 9521194

14581188 1271

10K People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

-175

0

175

350

525

700

3 4 5 6 7 8

613

455

328

453 456

36

10K People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

15

Warm Cache Plots

Neo4jMSSQLMySQL

Volume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 124 0 1107 998 140 31 617 537 1201 608 168 12 736 166 13340 4404 5157 566

4 109 0 2231 2106 140 31 1224 650 1263 667 157 19 812 167 14686 4697 5396 594

5 124 16 5616 5460 125 31 1377 757 1830 913 194 12 713 167 15351 5516 5118 497

6 109 0 9734 9625 140 31 1006 811 3002 1274 159 12 756 175 17056 6145 4736 573

7 93 0 16271 16100 156 31 1125 784 2947 1566 158 14 700 177 17821 6915 4592 592

8 109 0 25334 25287 125 32 1285 951 4256 2171 510 14 691 183 19881 8304 4731 557

0

250

500

750

1000

3 4 5 6 7 8166 167 167 175 177 183

100K People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

7500

15000

22500

30000

3 4 5 6 7 8

4404 4697 5516 6145 6915 8304

100K People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

150

300

450

600

3 4 5 6 7 8

566 594

497573 592

557

100K People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)Levels

16

Neo4JMSSQLMySQL

Warm Cache Plots

Volume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate AggregateGather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate AggregateGather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm

Cold Warm Cold Warm

Cold Warm Cold Warm Cold Warm

Cold Warm

Cold Warm Cold Warm

3 109 16 10998 10171 452 281 6337 4973 22703 20037 798 116 718 176 109432 44265 371902109

4 172 0 20467 19734 515 280 5677 5280 14888 7813 1245 114 740 182 115011 106260 360332194

5 172 16 60606 60653 531 281 7111 5959 12047 8293 1211 140 721 184 169123 77528 340682248

6 141 0 92789 92867 483 265 7278 6266 15226 7076 1251 140 709 173 255813 124810 330152107

7 218 0 160151 158793 499 280 11890 6724 15226 7076 1330 111 700 177 171607 125152 328131960

8 265 0 281301 280630 577 265 11300 8777 20560 14564 976 121 835 153 136432 73121 236731912

Warm Cache Plots

-2250

0

2250

4500

6750

9000

3 4 5 6 7 8

176 182 184 173 177 153

1M People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

75000

150000

225000

300000

3 4 5 6 7 8

44265

10626077528

124810125152

73121

1M People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

750

1500

2250

3000

3 4 5 6 7 8

2109 2194 2248 2107 1960 1912

1M People - Overall AggregateQ

uery

Exe

cutio

n Ti

me

(ms)

Levels

17

Neo4JMSSQLMySQL

4.O

ne o

f the

impo

rtan

t obs

erva

tions

that

we

have

mad

e is

that

LEF

T jo

in in

MyS

QL

is fas

ter

as co

mpa

red

to IN

NER

join

(con

trary

to o

ur th

inking

- LE

FT jo

in w

ould

be

mor

e ex

pens

ive

as co

mpa

red

to IN

NER

). A

s all

the

abov

e qu

eries

are

func

tiona

lly e

quiva

lent,

it w

ould

not

be

fair

to u

se IN

NER

join

for

takin

g m

easu

rem

ents.

H

owev

er, w

e co

uld n

ot r

esist

the

cu

riosit

y to

che

ck o

ut IN

NER

join

perfo

rman

ce.

Exec

uting

the

abov

e qu

eries

usin

g inn

er jo

in on

1M

(all

leve

ls), 2

M a

nd 3

M (f

or le

vel 8

), w

e fo

und

a sig

nifica

nt in

crea

se in

que

ry e

xecu

tion

time

for

MyS

QL.

Belo

w ar

e th

e re

sults

of

rem

oving

the

LEFT

join

from

the

Gat

her S

ubor

dina

te N

ames

que

ry fo

r 1M

, 2M

Lev

el 8,

and

3M L

evel

8.

Wha

t doe

s this

mea

n to

us o

r any

app

licat

ion?

Say

, if th

e sit

uatio

n de

man

ded

that

we

chan

ge

the

LEFT

join

to IN

NER

join,

sudd

enly

we

wou

ld fi

nd o

urse

lves i

n a

bad

posit

ion

perfo

rman

ce

wise

.

Whe

reas

its

impo

rtan

t to

not

e th

at N

eo4j’

s pe

rform

ance

is a

lmos

t co

nsta

nt t

ime

exec

utio

n fo

r all d

ata

sizes

.

For

the

abov

e m

easu

rem

ents,

we

did

not

facto

r-in

IND

IREC

TLY_

MAN

AGES

rela

tions

hip,

how

ever

, tak

en in

to c

onsid

erat

ion,

it w

ould

tran

slate

to a

dditio

nal j

oin

for

SQL,

thus

blo

ating

th

e alr

eady

blo

ated

qu

ery

and

incre

asing

co

mpl

exity

. Th

is w

ill fu

rthe

r de

grad

e th

e pe

rform

ance

as j

oin

selec

tion

wou

ld n

ow ta

ke m

ore

time.

19

-175

000

1750

0

3500

0

5250

0

7000

0 1M-3

1M-5

1M-7

2M-8

LEFT

Vs

INN

ER V

s N

eo4j

Query Execution Time (ms)

Org

Size

-Lev

els

MyS

QL

- LEF

T Jo

inM

ySQ

L - I

NNER

Neo4

j

MyS

QL

MyS

QL

MyS

QL

MyS

QL

Ne

o4j

Ne

o4j

Org

S

ize

-L

eve

ls

LEFT

JOIN

LE

FT JO

IN

INN

ER JO

ININ

NER

JOIN

cold

war

mco

ldw

arm

cold

war

m

1M-3

109

16156

16718

176

1M-4

172

0140

0740

182

1M-5

172

01154

874

721

184

1M-6

141

0531

312

709

173

1M-7

218

03760

3432

700

177

1M-8

265

01714515896

835

153

2M-8

281

04006136301

822

149

3M-8

234

06046661776

744

148

War

m C

ache

Plo

ts

Page 19: A case-for-graph-db

MySQL - Left Vs Inner Join

4. One of the important observations that we have made is that LEFT join in MySQL is faster as compared to INNER join (contrary to our thinking - LEFT join would be more expensive as compared to INNER). As all the above queries are functionally equivalent, it would not be fair to use INNER join for taking measurements. However, we could not resist the curiosity to check out INNER join performance.

Executing the above queries using inner join on 1M (all levels), 2M and 3M (for level 8), we found a significant increase in query execution time for MySQL. Below are the results of removing the LEFT join from the Gather Subordinate Names query for 1M, 2M Level 8, and 3M Level 8.

What does this mean to us or any application? Say, if the situation demanded that we change the LEFT join to INNER join, suddenly we would find ourselves in a bad position performance wise. Whereas its important to note that Neo4j’s performance is almost constant time execution for all data sizes.

For the above measurements, we did not factor-in INDIRECTLY_MANAGES relationship, however, taken into consideration, it would translate to additional join for SQL, thus bloating the already bloated query and increasing complexity. This will further degrade the performance as join selection would now take more time.

19

-17500

0

17500

35000

52500

70000

1M-3 1M-5 1M-7 2M-8

LEFT Vs INNER Vs Neo4jQ

uery

Exe

cutio

n Ti

me

(ms)

Org Size-LevelsMySQL - LEFT Join MySQL - INNER Neo4j

MySQLMySQLMySQLMySQL Neo4jNeo4j

Org Size-Levels

LEFT JOIN LEFT JOIN INNER JOININNER JOIN

cold warm cold warm cold warm

1M-3 109 16 156 16 718 176

1M-4 172 0 140 0 740 182

1M-5 172 0 1154 874 721 184

1M-6 141 0 531 312 709 173

1M-7 218 0 3760 3432 700 177

1M-8 265 0 17145 15896 835 153

2M-8 281 0 40061 36301 822 149

3M-8 234 0 60466 61776 744 148

Warm Cache Plots

Page 20: A case-for-graph-db

Performance

Page 21: A case-for-graph-db

Performance

Page 22: A case-for-graph-db

Performance

1. Cartesian Product All possible combinations

Page 23: A case-for-graph-db

Performance

1. Cartesian Product All possible combinations

Page 24: A case-for-graph-db

Performance

2. Filter

1. Cartesian Product All possible combinations

Page 25: A case-for-graph-db

Performance

2. Filter

1. Cartesian Product All possible combinations

Page 26: A case-for-graph-db

PerformanceData Size

2. Filter

1. Cartesian Product All possible combinations

DS

Page 27: A case-for-graph-db

PerformanceData Size

2. Filter

1. Cartesian Product All possible combinations

DSTime

Page 28: A case-for-graph-db

JoinsPerformance

Data Size

2. Filter

1. Cartesian Product All possible combinations

DSTime

| J

Page 29: A case-for-graph-db

JoinsPerformance

Data Size

2. Filter

1. Cartesian Product All possible combinations

Project

People

Skills

DSTime

| J

Page 30: A case-for-graph-db

JoinsPerformance

Data Size

2. Filter

1. Cartesian Product All possible combinations

Project

People

Skills

DSTime

Query is now localised to a section of graph,

solves large data-size problem| J

Page 31: A case-for-graph-db

JoinsPerformance

Data Size

2. Filter

1. Cartesian Product All possible combinations

Project

People

Skills

DSTime

Traversal Query is now localised to a section of graph,

solves large data-size problem| J

Page 32: A case-for-graph-db

JoinsPerformance

Data Size

2. Filter

1. Cartesian Product All possible combinations

Project

People

Skills

DSTime Time

Traversal Query is now localised to a section of graph,

solves large data-size problem| J

Page 33: A case-for-graph-db

Relationships• Not first-class citizens in RDBMS’ and NoSQL aggregate stores

• document stores or key-value stores.

• Cost of executing connected queries is high.

• “Who reports immediately to Smarty Pants?” - not a problem

• But, “Who all report to Smarty Pants?” - introduces recursive joins and going down more than 5-6 levels the space and time complexity is very high.

• “What skills does this person have?“ is relatively cheaper than “which people have these skills?”

• what about - “which are people who have these skills also have those skills?”

Page 34: A case-for-graph-db

Gather Subordinates Aggregate Data

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Lvls

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

m

378

0109

093

0226

049

06

06791643329

7601538

269

494

1594

1678

1611482133

6135

06701782878

851892183

5109

062

1547

021243138

1359

08131662848

7631533

221

6125

0156

6263

011878278

7226

0699171360711241387

319

710916

9447

62015190420

7113

07281773123

9291457

357

8110

0141

3178

023146621

8418

072116428791040906207

050100

150

200

34

56

78

164

178

166

171

177

164

1000

Peo

ple

- Gat

her S

ubor

dina

te N

ames

Query Execution Time (ms)

Leve

ls

375

750

1125

1500

34

56

78

760

851

763

1124

929

1040

1000

Peo

ple

- Gat

her S

ubor

dina

te A

ggre

gate

Query Execution Time (ms)

Leve

ls

-1000

100

200

300

400

34

56

78

269

183

221

319

357

207

1000

Peo

ple

- Ove

rall

Aggr

egat

e

Query Execution Time (ms)

Leve

ls

14

Ne

o4j

MS

SQ

LM

yS

QLW

arm

Cac

he P

lots

4.O

ne o

f the

impo

rtan

t obs

erva

tions

that

we

have

mad

e is

that

LEF

T jo

in in

MyS

QL

is fas

ter

as co

mpa

red

to IN

NER

join

(con

trary

to o

ur th

inking

- LE

FT jo

in w

ould

be

mor

e ex

pens

ive

as co

mpa

red

to IN

NER

). A

s all

the

abov

e qu

eries

are

func

tiona

lly e

quiva

lent,

it w

ould

not

be

fair

to u

se IN

NER

join

for

takin

g m

easu

rem

ents.

H

owev

er, w

e co

uld n

ot r

esist

the

cu

riosit

y to

che

ck o

ut IN

NER

join

perfo

rman

ce.

Exec

uting

the

abov

e qu

eries

usin

g inn

er jo

in on

1M

(all

leve

ls), 2

M a

nd 3

M (f

or le

vel 8

), w

e fo

und

a sig

nifica

nt in

crea

se in

que

ry e

xecu

tion

time

for

MyS

QL.

Belo

w ar

e th

e re

sults

of

rem

oving

the

LEFT

join

from

the

Gat

her S

ubor

dina

te N

ames

que

ry fo

r 1M

, 2M

Lev

el 8,

and

3M L

evel

8.

Wha

t doe

s this

mea

n to

us o

r any

app

licat

ion?

Say

, if th

e sit

uatio

n de

man

ded

that

we

chan

ge

the

LEFT

join

to IN

NER

join,

sudd

enly

we

wou

ld fi

nd o

urse

lves i

n a

bad

posit

ion

perfo

rman

ce

wise

.

Whe

reas

its

impo

rtan

t to

not

e th

at N

eo4j’

s pe

rform

ance

is a

lmos

t co

nsta

nt t

ime

exec

utio

n fo

r all d

ata

sizes

.

For

the

abov

e m

easu

rem

ents,

we

did

not

facto

r-in

IND

IREC

TLY_

MAN

AGES

rela

tions

hip,

how

ever

, tak

en in

to c

onsid

erat

ion,

it w

ould

tran

slate

to a

dditio

nal j

oin

for

SQL,

thus

blo

ating

th

e alr

eady

blo

ated

qu

ery

and

incre

asing

co

mpl

exity

. Th

is w

ill fu

rthe

r de

grad

e th

e pe

rform

ance

as j

oin

selec

tion

wou

ld n

ow ta

ke m

ore

time.

19

-175

000

1750

0

3500

0

5250

0

7000

0 1M-3

1M-5

1M-7

2M-8

LEFT

Vs

INN

ER V

s N

eo4j

Query Execution Time (ms)

Org

Size

-Lev

els

MyS

QL

- LEF

T Jo

inM

ySQ

L - I

NNER

Neo4

j

MyS

QL

MyS

QL

MyS

QL

MyS

QL

Ne

o4j

Ne

o4j

Org

S

ize

-L

eve

ls

LEFT

JOIN

LE

FT JO

IN

INN

ER JO

ININ

NER

JOIN

cold

war

mco

ldw

arm

cold

war

m

1M-3

109

16156

16718

176

1M-4

172

0140

0740

182

1M-5

172

01154

874

721

184

1M-6

141

0531

312

709

173

1M-7

218

03760

3432

700

177

1M-8

265

01714515896

835

153

2M-8

281

04006136301

822

149

3M-8

234

06046661776

744

148

War

m C

ache

Plo

ts

Volume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 78 0 109 0 93 0 226 0 49 0 6 0 679 164 3329 760 1538 269

4 94 15 94 16 78 16 114 82 133 61 35 0 670 178 2878 851 892 183

5 109 0 62 15 47 0 212 43 138 13 59 0 813 166 2848 763 1533 221

6 125 0 156 62 63 0 118 78 278 72 26 0 699 171 3607 1124 1387 319

7 109 16 94 47 62 0 151 90 420 71 13 0 728 177 3123 929 1457 357

8 110 0 141 31 78 0 231 46 621 84 18 0 721 164 2879 1040 906 207

0

50

100

150

200

3 4 5 6 7 8

164178

166 171 177164

1000 People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

375

750

1125

1500

3 4 5 6 7 8

760851

763

1124

9291040

1000 People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

-100

0

100

200

300

400

3 4 5 6 7 8

269

183221

319357

207

1000 People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

14

Neo4jMSSQLMySQL

Warm Cache Plots

Volume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 109 0 140 31 109 15 172 107 294 181 103 1 730 176 6129 898 2292 613

4 141 16 343 234 109 0 229 116 1267 156 102 1 776 178 5865 952 2073 455

5 94 15 640 561 94 15 270 164 403 188 327 1 727 184 5718 1194 2119 328

6 94 0 998 905 78 0 401 123 650 254 38 3 742 221 5692 1458 3182 453

7 93 0 1482 1326 78 0 303 177 870 374 68 1 705 185 5710 1188 1990 456

8 93 0 2745 2683 78 0 387 163 1146 573 71 1 713 171 6002 1271 1976 36

0

75

150

225

300

3 4 5 6 7 8

176 178 184221

185 171

10K People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

750

1500

2250

3000

3 4 5 6 7 8

898 9521194

14581188 1271

10K People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

-175

0

175

350

525

700

3 4 5 6 7 8

613

455

328

453 456

36

10K People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

15

Warm Cache Plots

Neo4jMSSQLMySQL

Volume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 124 0 1107 998 140 31 617 537 1201 608 168 12 736 166 13340 4404 5157 566

4 109 0 2231 2106 140 31 1224 650 1263 667 157 19 812 167 14686 4697 5396 594

5 124 16 5616 5460 125 31 1377 757 1830 913 194 12 713 167 15351 5516 5118 497

6 109 0 9734 9625 140 31 1006 811 3002 1274 159 12 756 175 17056 6145 4736 573

7 93 0 16271 16100 156 31 1125 784 2947 1566 158 14 700 177 17821 6915 4592 592

8 109 0 25334 25287 125 32 1285 951 4256 2171 510 14 691 183 19881 8304 4731 557

0

250

500

750

1000

3 4 5 6 7 8166 167 167 175 177 183

100K People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

7500

15000

22500

30000

3 4 5 6 7 8

4404 4697 5516 6145 6915 8304

100K People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

150

300

450

600

3 4 5 6 7 8

566 594

497573 592

557

100K People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

16

Neo4JMSSQLMySQL

Warm Cache Plots

Volume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate AggregateGather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate AggregateGather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm

Cold Warm Cold Warm

Cold Warm Cold Warm Cold Warm

Cold Warm

Cold Warm Cold Warm

3 109 16 10998 10171 452 281 6337 4973 22703 20037 798 116 718 176 109432 44265 371902109

4 172 0 20467 19734 515 280 5677 5280 14888 7813 1245 114 740 182 115011 106260 360332194

5 172 16 60606 60653 531 281 7111 5959 12047 8293 1211 140 721 184 169123 77528 340682248

6 141 0 92789 92867 483 265 7278 6266 15226 7076 1251 140 709 173 255813 124810 330152107

7 218 0 160151 158793 499 280 11890 6724 15226 7076 1330 111 700 177 171607 125152 328131960

8 265 0 281301 280630 577 265 11300 8777 20560 14564 976 121 835 153 136432 73121 236731912

Warm Cache Plots

-2250

0

2250

4500

6750

9000

3 4 5 6 7 8

176 182 184 173 177 153

1M People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

75000

150000

225000

300000

3 4 5 6 7 8

44265

10626077528

124810125152

73121

1M People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

750

1500

2250

3000

3 4 5 6 7 8

2109 2194 2248 2107 1960 1912

1M People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

17

Neo4JMSSQLMySQL

Page 35: A case-for-graph-db

Gather Overall Aggregate Query

start n = node(*) !return n.level as Level, count(n) as Total!order by Level

SELECT level, count(id)!FROM person!GROUP BY level

Page 36: A case-for-graph-db

Gather Overall Aggregate Data

Volume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People OrgVolume 1000 People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 78 0 109 0 93 0 226 0 49 0 6 0 679 164 3329 760 1538 269

4 94 15 94 16 78 16 114 82 133 61 35 0 670 178 2878 851 892 183

5 109 0 62 15 47 0 212 43 138 13 59 0 813 166 2848 763 1533 221

6 125 0 156 62 63 0 118 78 278 72 26 0 699 171 3607 1124 1387 319

7 109 16 94 47 62 0 151 90 420 71 13 0 728 177 3123 929 1457 357

8 110 0 141 31 78 0 231 46 621 84 18 0 721 164 2879 1040 906 207

0

50

100

150

200

3 4 5 6 7 8

164178

166 171 177164

1000 People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

375

750

1125

1500

3 4 5 6 7 8

760851

763

1124

9291040

1000 People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

-100

0

100

200

300

400

3 4 5 6 7 8

269

183221

319357

207

1000 People - Overall AggregateQ

uery

Exe

cutio

n Ti

me

(ms)

Levels

14

Neo4jMSSQLMySQL

Warm Cache Plots

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

Vo

lum

e 1

000 P

eo

ple

Org

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MyS

QL

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

MS

-SQ

L

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Ne

o4j

Lvls

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Nam

es

Gat

her

Subo

rdin

ate

Aggr

egat

e

Gat

her

Subo

rdin

ate

Aggr

egat

e

Ove

rall

Aggr

egat

eO

vera

ll Ag

greg

ate

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

mCo

ldW

arm

Cold

War

m

378

0109

093

0226

049

06

06791643329

7601538

269

494

1594

1678

1611482133

6135

06701782878

851892183

5109

062

1547

021243138

1359

08131662848

7631533

221

6125

0156

6263

011878278

7226

0699171360711241387

319

710916

9447

62015190420

7113

07281773123

9291457

357

8110

0141

3178

023146621

8418

072116428791040906207

050100

150

200

34

56

78

164

178

166

171

177

164

1000

Peo

ple

- Gat

her S

ubor

dina

te N

ames

Query Execution Time (ms)

Leve

ls

375

750

1125

1500

34

56

78

760

851

763

1124

929

1040

1000

Peo

ple

- Gat

her S

ubor

dina

te A

ggre

gate

Query Execution Time (ms)

Leve

ls

-1000

100

200

300

400

34

56

78

269

183

221

319

357

207

1000

Peo

ple

- Ove

rall

Aggr

egat

e

Query Execution Time (ms)

Leve

ls

14

Ne

o4j

MS

SQ

LM

yS

QLW

arm

Cac

he P

lots

4.O

ne o

f the

impo

rtan

t obs

erva

tions

that

we

have

mad

e is

that

LEF

T jo

in in

MyS

QL

is fas

ter

as co

mpa

red

to IN

NER

join

(con

trary

to o

ur th

inking

- LE

FT jo

in w

ould

be

mor

e ex

pens

ive

as co

mpa

red

to IN

NER

). A

s all

the

abov

e qu

eries

are

func

tiona

lly e

quiva

lent,

it w

ould

not

be

fair

to u

se IN

NER

join

for

takin

g m

easu

rem

ents.

H

owev

er, w

e co

uld n

ot r

esist

the

cu

riosit

y to

che

ck o

ut IN

NER

join

perfo

rman

ce.

Exec

uting

the

abov

e qu

eries

usin

g inn

er jo

in on

1M

(all

leve

ls), 2

M a

nd 3

M (f

or le

vel 8

), w

e fo

und

a sig

nifica

nt in

crea

se in

que

ry e

xecu

tion

time

for

MyS

QL.

Belo

w ar

e th

e re

sults

of

rem

oving

the

LEFT

join

from

the

Gat

her S

ubor

dina

te N

ames

que

ry fo

r 1M

, 2M

Lev

el 8,

and

3M L

evel

8.

Wha

t doe

s this

mea

n to

us o

r any

app

licat

ion?

Say

, if th

e sit

uatio

n de

man

ded

that

we

chan

ge

the

LEFT

join

to IN

NER

join,

sudd

enly

we

wou

ld fi

nd o

urse

lves i

n a

bad

posit

ion

perfo

rman

ce

wise

.

Whe

reas

its

impo

rtan

t to

not

e th

at N

eo4j’

s pe

rform

ance

is a

lmos

t co

nsta

nt t

ime

exec

utio

n fo

r all d

ata

sizes

.

For

the

abov

e m

easu

rem

ents,

we

did

not

facto

r-in

IND

IREC

TLY_

MAN

AGES

rela

tions

hip,

how

ever

, tak

en in

to c

onsid

erat

ion,

it w

ould

tran

slate

to a

dditio

nal j

oin

for

SQL,

thus

blo

ating

th

e alr

eady

blo

ated

qu

ery

and

incre

asing

co

mpl

exity

. Th

is w

ill fu

rthe

r de

grad

e th

e pe

rform

ance

as j

oin

selec

tion

wou

ld n

ow ta

ke m

ore

time.

19

-175

000

1750

0

3500

0

5250

0

7000

0 1M-3

1M-5

1M-7

2M-8

LEFT

Vs

INN

ER V

s N

eo4j

Query Execution Time (ms)

Org

Size

-Lev

els

MyS

QL

- LEF

T Jo

inM

ySQ

L - I

NNER

Neo4

j

MyS

QL

MyS

QL

MyS

QL

MyS

QL

Ne

o4j

Ne

o4j

Org

S

ize

-L

eve

ls

LEFT

JOIN

LE

FT JO

IN

INN

ER JO

ININ

NER

JOIN

cold

war

mco

ldw

arm

cold

war

m

1M-3

109

16156

16718

176

1M-4

172

0140

0740

182

1M-5

172

01154

874

721

184

1M-6

141

0531

312

709

173

1M-7

218

03760

3432

700

177

1M-8

265

01714515896

835

153

2M-8

281

04006136301

822

149

3M-8

234

06046661776

744

148

War

m C

ache

Plo

ts

Volume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People OrgVolume 10K People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 109 0 140 31 109 15 172 107 294 181 103 1 730 176 6129 898 2292 613

4 141 16 343 234 109 0 229 116 1267 156 102 1 776 178 5865 952 2073 455

5 94 15 640 561 94 15 270 164 403 188 327 1 727 184 5718 1194 2119 328

6 94 0 998 905 78 0 401 123 650 254 38 3 742 221 5692 1458 3182 453

7 93 0 1482 1326 78 0 303 177 870 374 68 1 705 185 5710 1188 1990 456

8 93 0 2745 2683 78 0 387 163 1146 573 71 1 713 171 6002 1271 1976 36

0

75

150

225

300

3 4 5 6 7 8

176 178 184221

185 171

10K People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

750

1500

2250

3000

3 4 5 6 7 8

898 9521194

14581188 1271

10K People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

-175

0

175

350

525

700

3 4 5 6 7 8

613

455

328

453 456

36

10K People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

15

Warm Cache Plots

Neo4jMSSQLMySQL

Volume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People OrgVolume 100K People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm Cold Warm

3 124 0 1107 998 140 31 617 537 1201 608 168 12 736 166 13340 4404 5157 566

4 109 0 2231 2106 140 31 1224 650 1263 667 157 19 812 167 14686 4697 5396 594

5 124 16 5616 5460 125 31 1377 757 1830 913 194 12 713 167 15351 5516 5118 497

6 109 0 9734 9625 140 31 1006 811 3002 1274 159 12 756 175 17056 6145 4736 573

7 93 0 16271 16100 156 31 1125 784 2947 1566 158 14 700 177 17821 6915 4592 592

8 109 0 25334 25287 125 32 1285 951 4256 2171 510 14 691 183 19881 8304 4731 557

0

250

500

750

1000

3 4 5 6 7 8166 167 167 175 177 183

100K People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

7500

15000

22500

30000

3 4 5 6 7 8

4404 4697 5516 6145 6915 8304

100K People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

150

300

450

600

3 4 5 6 7 8

566 594

497573 592

557

100K People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

16

Neo4JMSSQLMySQL

Warm Cache Plots

Volume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People OrgVolume 1M People Org

MySQLMySQLMySQLMySQLMySQLMySQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL MS-SQL Neo4jNeo4jNeo4jNeo4jNeo4jNeo4j

Lvls Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate AggregateGather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate Aggregate

Gather Subordinate Aggregate

Overall AggregateOverall Aggregate

Gather Subordinate Names

Gather Subordinate Names

Gather Subordinate AggregateGather Subordinate Aggregate

Overall AggregateOverall Aggregate

Cold Warm

Cold Warm Cold Warm

Cold Warm Cold Warm Cold Warm

Cold Warm

Cold Warm Cold Warm

3 109 16 10998 10171 452 281 6337 4973 22703 20037 798 116 718 176 109432 44265 371902109

4 172 0 20467 19734 515 280 5677 5280 14888 7813 1245 114 740 182 115011 106260 360332194

5 172 16 60606 60653 531 281 7111 5959 12047 8293 1211 140 721 184 169123 77528 340682248

6 141 0 92789 92867 483 265 7278 6266 15226 7076 1251 140 709 173 255813 124810 330152107

7 218 0 160151 158793 499 280 11890 6724 15226 7076 1330 111 700 177 171607 125152 328131960

8 265 0 281301 280630 577 265 11300 8777 20560 14564 976 121 835 153 136432 73121 236731912

Warm Cache Plots

-2250

0

2250

4500

6750

9000

3 4 5 6 7 8

176 182 184 173 177 153

1M People - Gather Subordinate Names

Que

ry E

xecu

tion

Tim

e (m

s)Levels

0

75000

150000

225000

300000

3 4 5 6 7 8

44265

10626077528

124810125152

73121

1M People - Gather Subordinate Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

0

750

1500

2250

3000

3 4 5 6 7 8

2109 2194 2248 2107 1960 1912

1M People - Overall Aggregate

Que

ry E

xecu

tion

Tim

e (m

s)

Levels

17

Neo4JMSSQLMySQL

Page 37: A case-for-graph-db

But, what if...• I need to aggregate data, placing an OLAPish demand on

the data?

• Use non-graph stores: RDBMS or NoSQL alongside.

• Graph Compute Engines are optimised for scanning and processing large amounts of information in batch.

• Giraph, Pegasus etc…

• Polyglot persistence is a norm.

• Has anyone tried Datomic?

Page 38: A case-for-graph-db

Asking Questions• Is your data connected?

• Or does your domain naturally gravitate towards or has good number of joins (Dense Connections) leading to explosion with large data?

• For example: Making Recommendations

• Or are you finding yourself writing complex SQL Queries?

• For example: recursive joins or lots of joins

Page 39: A case-for-graph-db

Qs?

Page 40: A case-for-graph-db

References• Graph Databases

• Jim Webber, Ian Robinson and Emil Eifrem

• Apiary: A case for Neo4j?

• Anuj Mathur and Dhaval Dalal

• Code and scaffolding programs that we used are available on - https://github.com/ EqualExperts/Apiary-Neo4j-RDBMS-Comparison