Post on 23-Jul-2018
How graph databases started the multi-model revolution
Luca Garulli
Author and CEO @OrientDB
QCon Sao Paulo - March 26, 2015
“90% of the data in the world today has been created in the last two years alone.”
- IBM
Welcome to Big Data
Just Data
Order #134 (Order)
Luca (Provider)
Commodore Amiga 1200 (Product)
Jill (Customer)
Monitor 40” (Product)
Mouse (Product)
Bruno (Provider)
Data by itself has little value; it’s the relationships between data that give it incredible value
Relationships give data “meaning”
Luca (Provider)
Bruno (Provider)
Jill (Customer)
Order #134 (Order)
Commodore Amiga 1200 (Product)
Monitor 40” (Product)
Mouse (Product)
(Sells): Provider → Product, 3 edges
(Has): Order #134 → Product, 3 edges
(Makes): Jill → Order #134
Joins Are Evil
Customer
ID   Name
10   John
11   John
24   Mike
28   Mike

CustomerAddress
ID   Address
10   24
10   33
32   44

Address
ID   Location
24   Milan
33   London
18   Paris
18   Madrid
44   Moscow
Is this familiar?
Imagine an Address Book where we want to find Luca’s phone number
A-Z
A-L | M-Z
Index Lookup: how does it work?
Index algorithms are all similar and based on balanced trees
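Level 1: A-Z splits into A-L | M-Z
Level 2: A-L splits into A-D | E-L; M-Z splits into M-R | S-Z
Level 3: A-D splits into A-B | C-D; E-L splits into E-G | H-L
Level 4: E-G splits into E-F | G; H-L splits into H-J | K-L
Level 5: Luca
Found! This lookup took 5 steps: A-Z, A-L, E-L, H-L, K-L. A balanced tree over N records is about log N levels deep, so with millions of indexed records every lookup takes more steps, and the walk starts again from the root each time.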
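The address-book walk can be sketched in a few lines of Python. This is only an illustration: a binary search over a sorted list stands in for the balanced tree, and the names `lookup` and `max_steps` and the sample names are assumptions, not part of the talk. Real database indexes are usually B-trees with many keys per node, so they need fewer levels than this sketch, but the step count still grows with the size of the index.

```python
def lookup(sorted_keys, target):
    """Binary search that also reports how many comparisons it made."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps


def max_steps(n):
    """Worst-case comparisons for n >= 1 sorted keys: floor(log2(n)) + 1."""
    return n.bit_length()


# Hypothetical address book, kept sorted like an index.
names = sorted(["Anna", "Bruno", "Carl", "Jill", "John", "Luca", "Mike", "Sara"])
index, steps = lookup(names, "Luca")
print(names[index], "found in", steps, "steps")
print("worst case for 1,000,000 keys:", max_steps(1_000_000), "steps")
```

The point of the sketch is the growth rate: doubling the number of keys adds one more step to every lookup.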
Joins Kill Performance
Customer → CustomerAddress → Address
Joins are executed every time you cross relationships
Querying millions of records while joining 3-4 tables could generate billions of combinations
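To make the "combinations" concrete, here is a minimal sketch that joins the three sample tables with a naive nested loop and counts every combination it examines. The function name `nested_loop_join` is an assumption for illustration. Real engines use indexes and smarter join strategies, so this is the worst case, not a description of any particular RDBMS.

```python
# The three tables from the slide, as lists of tuples.
customers = [(10, "John"), (11, "John"), (24, "Mike"), (28, "Mike")]
customer_address = [(10, 24), (10, 33), (32, 44)]
addresses = [(24, "Milan"), (33, "London"), (18, "Paris"),
             (18, "Madrid"), (44, "Moscow")]


def nested_loop_join(customers, customer_address, addresses):
    """Join the three tables naively and count the combinations examined."""
    combinations = 0
    rows = []
    for cust_id, name in customers:
        for link_cust_id, link_addr_id in customer_address:
            for addr_id, location in addresses:
                combinations += 1
                if cust_id == link_cust_id and link_addr_id == addr_id:
                    rows.append((name, location))
    return rows, combinations


rows, combinations = nested_loop_join(customers, customer_address, addresses)
print(rows)          # the matching (name, location) pairs
print(combinations)  # 4 * 3 * 5 combinations examined
```

Twelve small rows already produce 60 combinations for two results; the product of the table sizes is what grows into billions.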
In a world that’s becoming more connected, we need a better way to store data and manage relationships
Read: Data is important, but relationships are even more fundamental today
“A graph database is any storage system that provides
index-free adjacency”
- Marko Rodriguez (author of TinkerPop Blueprints)
Property Graph Model*
Vertices and Edges can have properties
Edges are directed
* https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
Luca (Vertex) company: OrientTechnologies
Sao Paulo (Vertex) people: 12,000,000
Luca → Sao Paulo (Edge) Visited on: 2015
1-N and N-M Relationships
An Edge connects only 2 vertices
Use multiple edges to represent 1-N and N-M relationships
Visited on: 2015 (Edge)
Worked on: 2015 (Edge)
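A minimal sketch of the property graph model in Python, assuming hypothetical `Vertex` and `Edge` classes (they are not the Blueprints API). It also assumes that the "Worked on" edge connects the same two vertices as "Visited on", which the slide layout suggests but does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    out_vertex: Vertex  # where the edge starts
    in_vertex: Vertex   # where the edge ends (edges are directed)
    properties: dict = field(default_factory=dict)


luca = Vertex("Luca", {"company": "OrientTechnologies"})
sao_paulo = Vertex("Sao Paulo", {"people": 12_000_000})

# Each edge connects exactly two vertices; several edges between the
# same pair model richer (1-N, N-M) relationships.
edges = [
    Edge("Visited", luca, sao_paulo, {"on": 2015}),
    Edge("Worked", luca, sao_paulo, {"on": 2015}),  # assumed endpoints
]


def labels_between(edges, start, end):
    """Return the labels of all edges going from start to end."""
    return [e.label for e in edges
            if e.out_vertex is start and e.in_vertex is end]


print(labels_between(edges, luca, sao_paulo))
```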
How does a true* Graph Database
manage relationships?
*a “Graph” layer on top of a DBMS doesn’t qualify as a true GraphDB
Each element in the Graph has its own immutable Record ID
Luca (Vertex) #13:55
Sao Paulo (Vertex) #15:99
Visited on: 2015 (Edge) #22:11
Connections use persistent pointers
Luca (Vertex) #13:55: out = #22:11
Sao Paulo (Vertex) #15:99: in = #22:11
Visited on: 2015 (Edge) #22:11: out = #13:55, in = #15:99
A Graph Database creates the relationship just once
(when the edge is created)
VS
An RDBMS computes the relationship every time you query the database
When you move from an RDBMS to a Graph Database, the cost of crossing a relationship
drops from O(log N) per index lookup to near O(1)
With a Graph Database, the traversing time is
not affected by database size!
This is huge in the BigData age
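The pointer-following idea can be sketched as follows. This is an illustration under stated assumptions, not OrientDB's storage engine: a Python dict keyed by Record ID stands in for direct access to a record's position, the record layout is invented for the example, and `out_neighbors` is a hypothetical helper. The Record IDs are the ones shown on the slides.

```python
# Records keyed by Record ID. Vertices hold lists of edge IDs;
# the edge holds the IDs of the two vertices it connects.
records = {
    "#13:55": {"name": "Luca", "out": ["#22:11"], "in": []},
    "#15:99": {"name": "Sao Paulo", "out": [], "in": ["#22:11"]},
    "#22:11": {"label": "Visited", "on": 2015,
               "out": "#13:55", "in": "#15:99"},
}


def out_neighbors(records, rid):
    """Follow each outgoing edge pointer, then that edge's `in` pointer."""
    vertex = records[rid]
    return [records[records[edge_rid]["in"]] for edge_rid in vertex["out"]]


for neighbor in out_neighbors(records, "#13:55"):
    print(neighbor["name"])
```

Each hop is two direct lookups by Record ID, regardless of how many other records exist; no index is searched and no join is computed.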
Graph Databases Easily Manage Complex Relationships
No costs to traverse relationships:
• Recommendation engines
• Social Applications
• Spatial Apps
• Master Data Management
• Information Clustering
[Figure: example graph. Vertices: John; Thriller, Comedy; Pulp Fiction, Mr Bean; Theater A, Theater B, Theater C; NYC, San José. Edge labels: Lives in, Likes.]
GraphDB Database Quadrant
Vertical axis: Relationships Complexity >
Horizontal axis: Data Complexity >
Relational
Key Value
Column
Graph
Document
These were 1st generation NoSQL products, where each tool was only good at a few use cases
1st Generation NoSQL: Scenario
Application
Oracle (RDBMS)
Redis or Memcache (Key/Value)
MongoDB (DocDB)
Neo4j (GraphDB)
ETL
Primary DB
1st Generation NoSQL: Problems
- No standard between NoSQL products
- Multiple vendors = multiple skills
- ETL + synchronization code is costly to write and maintain
- Performance and Reliability are hard to predict
What’s a Multi-Model DBMS?
Graph
Document
Object
Key/Value
Multi-Model represents the intersection of multiple models in just one product
- Just one product to learn and maintain
- Just one vendor relationship to manage
- No ETL, no synchronization required
- Performance and Reliability are easy to test from the beginning
Relationships give data “meaning”
Luca (Provider)
Bruno (Provider)
Jill (Customer)
Order #134 (Order)
Commodore Amiga 1200 (Product)
Monitor 40” (Product)
3 Wheel Mouse (Product)
(Sells): Provider → Product, 3 edges
(Has): Order #134 → Product, 3 edges
(Makes): Jill → Order #134
Multi-Model domain schema
Legend: V = Vertex, E = Edge, arrow = Inherits
Vertex classes:
  Actor: name: string, surname: string
  Customer (inherits from Actor)
  Provider (inherits from Actor)
  Product: name: string, qty: int
  Order: number: int, date: datetime
Edge classes:
  Sells: price: decimal
  Makes
  Has: price: decimal
Vertices and Edges are Documents
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {
    "city": "London",
    "tags": "millennial"
  }
}
Jill (Makes) Order
General purpose solution:
• JSON
• Schema-less
• Schema-full
• Schema-hybrid
• Nested documents
• Rich indexing and querying
• Developer friendly
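The three schema modes can be illustrated with a small validator. This is a sketch of the idea only: `validate`, its `declared` and `strict` parameters, and the rule that every declared field is required are assumptions made for the example, not how OrientDB enforces schemas.

```python
import json

# The document from the slide, with plain quotes.
doc = json.loads("""
{
  "@rid": "12:382",
  "@class": "Customer",
  "name": "Jill",
  "surname": "Raggio",
  "phone": "+39 33123212",
  "details": {"city": "London", "tags": "millennial"}
}
""")


def validate(document, declared, strict=False):
    """Check a document against declared fields (name -> Python type).

    strict=False accepts extra fields; strict=True rejects them.
    Fields starting with '@' are treated as record metadata.
    """
    errors = []
    for name, expected in declared.items():
        if name not in document:
            errors.append(f"missing field: {name}")
        elif not isinstance(document[name], expected):
            errors.append(f"wrong type for {name}")
    if strict:
        for name in document:
            if name not in declared and not name.startswith("@"):
                errors.append(f"undeclared field: {name}")
    return errors


declared = {"name": str, "surname": str}
print(validate(doc, {}))                      # schema-less: anything goes
print(validate(doc, declared))                # schema-hybrid: extras allowed
print(validate(doc, declared, strict=True))   # schema-full: extras rejected
```

In the hybrid case the declared fields are checked while `phone` and the nested `details` document pass through untouched.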
Polymorphic queries
SELECT * FROM Customer
  → Jill (Customer)
SELECT * FROM Provider
  → Luca (Provider), Bruno (Provider)
SELECT * FROM Actor
  → Bruno (Provider), Jill (Customer), Luca (Provider)
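The same behaviour can be mimicked with ordinary class inheritance. This is a sketch, assuming (as the domain schema suggests) that Customer and Provider both inherit from Actor; `select_from` is a hypothetical helper, not an OrientDB call.

```python
class Actor:
    def __init__(self, name):
        self.name = name


class Customer(Actor):
    pass


class Provider(Actor):
    pass


records = [Provider("Luca"), Customer("Jill"), Provider("Bruno")]


def select_from(records, cls):
    """Return the names of all records of class cls or any subclass of it."""
    return [r.name for r in records if isinstance(r, cls)]


print(select_from(records, Customer))
print(select_from(records, Provider))
print(select_from(records, Actor))  # includes every subclass
```

Querying the superclass returns records of every subclass, which is what makes the query polymorphic.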
Multi-Model complex domains schema
Legend: V = Vertex, E = Edge, arrow = Inherits
Vertex classes: Band, Genre, Account, MusicTaste, Location
Edge classes: Likes, Performs, Plays
Multi-Model complex domains
Snow Patrol (Band)
Luca (Account)
Jill (Account)
Indie (Genre)
Rock (Genre)
123, 1st Street Austin, TX (Location)
(Performs): April 7, 2015, 9pm-11.30pm
(Likes): 4 edges
(Plays): 1 edge
Multi-Model Database Quadrant
Vertical axis: Relationships Complexity >
Horizontal axis: Data Complexity >
Relational
Key Value
Column
Graph
Multi-Model
Document
There are a few DBMSs that claim to be Multi-Model, but they do not have a true Graph Engine.
The “Graph” is only a layer on top of the engine.
Under the hood they do JOINs, which means traversal time is affected by database size.
Meet OrientDB
The First Ever Multi-Model Database Combining Flexibility of Documents with Connectedness of Graphs
FEATURES | ORIENTDB | MONGODB | NEO4J | MYSQL (RDBMS)
Operational Database: X X X
Graph Database: X X
Document Database: X X
Object-Oriented Concepts: X
Schema-full, Schema-less, Schema mix: X
User and Role & Record Level Security: X
Record Level Locking: X X X
SQL: X X
ACID Transaction: X X X
Relationships (Linked Documents): X X X
Custom Data Types: X X X
Embedded Documents: X X
Multi-Master Zero Configuration Replication: X
Sharding: X X
Server Side Functions: X X X
Native HTTP Rest/JSON: X X
Embeddable with No Restrictions: X
OrientDB features
API & Standards
• Support for TinkerPop standard for Graph DB: Gremlin language and Blueprints API
• SQL + extensions for graphs
• JDBC driver to connect any BI tool
• HTTP/JSON support
• Drivers in Java, Node.js, Python, PHP, .NET, Perl, C/C++ and more
Availability and Integrity
• Atomic, Consistent, Isolated and Durable (ACID) multi-statement transactions
[Figure: Multi-master Replication. Two Master Nodes, surrounded by nodes labeled C.]
Scalability and Performance
• Multi-Master Replication, Sharding and Auto-Discovery to Simplify Ops
• 200k+ TPS on Commodity Hardware
[Figure: two Master Nodes and an Auto-Discovered Node, surrounded by nodes labeled C.]
Some numbers
50,000 Downloads per Month from 200+ countries.
70+ Committers contributing to the product
1000s Users from SMBs to Fortune 10 Companies.
17+ Years of Research have been put in the product
A Bright Future
Graph DBMSs increased their popularity by 500% within the last 2 years
Document DBMSs are the 3rd fastest growing category
Get Started for Free
OrientDB Community Edition is FREE for any purpose (Apache 2 license)
Udemy Getting Started Training is ★★★★★ and Free
http://www.orientechnologies.com/getting-started
OrientDB Enterprise is Free for Development