An Introduction to NOSQL, Graph Databases and Neo4j

51
NOSQL Databases and Neo4j

description

Class lecture on "An Introduction to NOSQL, Graph Databases and Neo4j"

Transcript of An Introduction to NOSQL, Graph Databases and Neo4j

Page 1: An Introduction to NOSQL, Graph Databases and Neo4j

NOSQL Databases and Neo4j

Page 2: An Introduction to NOSQL, Graph Databases and Neo4j

Database and DBMS

• Database - Organized collection of data• The term database is correctly applied to the

data and their supporting data structures.

• DBMS - Database Management System: a software package with computer programs that controls the creation, maintenance and use of a database.

Page 3: An Introduction to NOSQL, Graph Databases and Neo4j

Types of Happening Databases

• Relational database – nothing new but still in use and it seems it will always be a happening one.

• Cloud databases – everything is cloudy.• Data warehouse – Huge! Huge! Huge! archives.• Embedded databases – you can’t see them :P • Document oriented database – the In thing.• Hypermedia database – WWW.• Graph database – facebook, twitter, social

network.

Page 4: An Introduction to NOSQL, Graph Databases and Neo4j

Not Only SQLNOSQL is simply…

Page 5: An Introduction to NOSQL, Graph Databases and Neo4j

Why NOSQL now?

Driving trends

Page 6: An Introduction to NOSQL, Graph Databases and Neo4j

Trend 1: Data Size

Page 7: An Introduction to NOSQL, Graph Databases and Neo4j

Trend 2: ConnectednessIn

form

ation

con

necti

vity

Text Documents

Hypertext

Feeds

Blogs

Wikis

UGC

Tagging

Folksonomies

RDFa

Onotologies

GGG

Page 8: An Introduction to NOSQL, Graph Databases and Neo4j

Trend 3: Semi-structured information

• Individualisation of content– 1970’s salary lists, all elements exactly one job– 2000’s salary lists, we need many job columns!

• Store more data about each entity• Trend accelerated by the decentralization of

content generation – Age of participation (“web 2.0”)

Page 9: An Introduction to NOSQL, Graph Databases and Neo4j

Trend 4: Architecture

DB

Application

1980’s: Single Application

Page 10: An Introduction to NOSQL, Graph Databases and Neo4j

Trend 4: Architecture

DB

Application

1990’s: Integration Database Antipattern

ApplicationApplication

Page 11: An Introduction to NOSQL, Graph Databases and Neo4j

Trend 4: Architecture

2000’s: SOA

DB

Application

DB

Application

DB

Application

RESTful, hypermedia, composite apps

Page 12: An Introduction to NOSQL, Graph Databases and Neo4j

Side note: RDBMS performanceSalary list

Most Web apps

Social Network

Location-based services

Page 13: An Introduction to NOSQL, Graph Databases and Neo4j

Four NOSQL Categories

Page 14: An Introduction to NOSQL, Graph Databases and Neo4j

Four NOSQL Categories

Page 15: An Introduction to NOSQL, Graph Databases and Neo4j

Key-Value Stores

• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)

• Data model:– Global key-value mapping– Highly fault tolerant (typically)

• Examples:– Riak, Redis, Voldemort

Page 16: An Introduction to NOSQL, Graph Databases and Neo4j

Column Family (BigTable)

• Google’s “Bigtable: A Distributed Storage System for Structured Data” (2006)

• Data model:– A big table, with column families– Map-reduce for querying/processing

• Examples:– HBase, HyperTable, Cassandra

Page 17: An Introduction to NOSQL, Graph Databases and Neo4j

Document Databases

• Data model– Collections of documents– A document is a key-value collection– Index-centric, lots of map-reduce

• Examples– CouchDB, MongoDB

Page 18: An Introduction to NOSQL, Graph Databases and Neo4j

Graph Databases

• Data model:– Nodes with properties– Named relationships with properties– Hypergraph, sometimes

• Examples:– Neo4j (of course), Sones GraphDB, OrientDB,

InfiniteGraph, AllegroGraph

Page 19: An Introduction to NOSQL, Graph Databases and Neo4j

Why Graph Databases?• Schema Less and Efficient storage of Semi Structured Information• No O/R mismatch – very natural to map a graph to an Object Oriented

language like Ruby.• Express Queries as Traversals. Fast deep traversal instead of slow SQL

queries that span many table joins.• Very natural to express graph related problem with traversals

(recommendation engine, find shortest parth etc..)• Seamless integration with various existing programming languages.• ACID Transaction with rollbacks support.• Whiteboard friendly – you use the language of node,properties and

relationship to describe your domain (instead of e.g. UML) and there is no need to have a complicated O/R mapping tool to implement it in your database. You can say that Neo4j is “Whiteboard friendly” !( http://video.neo4j.org/JHU6F/live-graph-session-how-allison-knows-james/)

Page 20: An Introduction to NOSQL, Graph Databases and Neo4j

Social Network “path exists” Performance

• Experiment:• ~1k persons• Average 50 friends per

person• pathExists(a,b)

limited to depth 4

# persons query time

Relational database

1000 2000ms

Page 21: An Introduction to NOSQL, Graph Databases and Neo4j

Social Network “path exists” Performance

• Experiment:• ~1k persons• Average 50 friends per

person• pathExists(a,b)

limited to depth 4

# persons query time

Relational database

1000 2000ms

Neo4j 1000 2ms

Page 22: An Introduction to NOSQL, Graph Databases and Neo4j

Social Network “path exists” Performance

• Experiment:• ~1k persons• Average 50 friends per

person• pathExists(a,b)

limited to depth 4

# persons query time

Relational database

1000 2000ms

Neo4j 1000 2ms

Neo4j 1000000 2ms

Page 23: An Introduction to NOSQL, Graph Databases and Neo4j

What are graphs good for?• Recommendations• Business intelligence• Social computing• Geospatial• Systems management• Web of things• Genealogy• Time series data• Product catalogue• Web analytics• Scientific computing (especially bioinformatics)• Indexing your slow RDBMS• And much more!

Page 24: An Introduction to NOSQL, Graph Databases and Neo4j

Graphs

Page 25: An Introduction to NOSQL, Graph Databases and Neo4j

Directed Graphs

Page 26: An Introduction to NOSQL, Graph Databases and Neo4j

Breadth First Search

Page 27: An Introduction to NOSQL, Graph Databases and Neo4j

Depth First Search

?????????????????

Page 28: An Introduction to NOSQL, Graph Databases and Neo4j

Graph Databases

• A graph database stores data in a graph, the most generic of data structures, capable of elegantly representing any kind of data in a highly accessible way.

Page 29: An Introduction to NOSQL, Graph Databases and Neo4j

Graphs

• “A Graph —records data in→ Nodes —which have→ Properties”

Page 30: An Introduction to NOSQL, Graph Databases and Neo4j

Graphs

• “Nodes —are organized by→ Relationships —which also have→ Properties”

Page 31: An Introduction to NOSQL, Graph Databases and Neo4j

Query a graph with Traversal

• “A Traversal —navigates→ a Graph; it —identifies→ Paths —which order→ Nodes”

Page 32: An Introduction to NOSQL, Graph Databases and Neo4j

Indexes

• “An Index —maps from→ Properties —to either→ Nodes or Relationships”

Page 33: An Introduction to NOSQL, Graph Databases and Neo4j

Neo4j is a Graph Database

• “A Graph Database —manages a→ Graph and —also manages related→ Indexes”

Page 34: An Introduction to NOSQL, Graph Databases and Neo4j

Neo4j – Hey! This is why I am a Graph Database.

• The fundamental units that form a graph are nodes and relationships.

• In Neo4j, both nodes and relationships can contain properties.

• Nodes are often used to represent entities, but depending on the domain relationships may be used for that purpose as well.

Page 35: An Introduction to NOSQL, Graph Databases and Neo4j

Node in Neo4j

Page 36: An Introduction to NOSQL, Graph Databases and Neo4j

Relationships in Neo4j

• Relationships between nodes are a key part of Neo4j.

Page 37: An Introduction to NOSQL, Graph Databases and Neo4j

Relationships in Neo4j

Page 38: An Introduction to NOSQL, Graph Databases and Neo4j

Twitter and relationships

Page 39: An Introduction to NOSQL, Graph Databases and Neo4j

Properties

• Both nodes and relationships can have properties.

• Properties are key-value pairs where the key is a string.

• Property values can be either a primitive or anarray of one primitive type.

For example String, int and int[] values are valid for properties.

Page 40: An Introduction to NOSQL, Graph Databases and Neo4j

Properties

Page 41: An Introduction to NOSQL, Graph Databases and Neo4j

Paths in Neo4j

• A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result.

Page 42: An Introduction to NOSQL, Graph Databases and Neo4j

Traversals in Neo4j

• Traversing a graph means visiting its nodes, following relationships according to some rules.

• In most cases only a subgraph is visited, as you

already know where in the graph the interesting nodes and relationships are found.

• Traversal API

• Depth first and Breadth first.

Page 43: An Introduction to NOSQL, Graph Databases and Neo4j

Starting and Stopping

Page 44: An Introduction to NOSQL, Graph Databases and Neo4j

Preparing the database

Page 45: An Introduction to NOSQL, Graph Databases and Neo4j

Wrap mutating operations in a transaction.

Page 46: An Introduction to NOSQL, Graph Databases and Neo4j

Creating a small graph

Page 47: An Introduction to NOSQL, Graph Databases and Neo4j

Print the data

Page 48: An Introduction to NOSQL, Graph Databases and Neo4j

Remove the data

Page 49: An Introduction to NOSQL, Graph Databases and Neo4j

The Matrix Graph Database

Page 50: An Introduction to NOSQL, Graph Databases and Neo4j

Traversing the Graph

Page 51: An Introduction to NOSQL, Graph Databases and Neo4j

Resources & References

• Neo4j website : http://neo4j.org/• Neo4j learning resources:

http://neo4j.org/resources/• Videos about Neo4j: http://video.neo4j.org/• Neo4j tutorial:

http://docs.neo4j.org/chunked/snapshot/tutorials.html

• Neo4j Java API documentation: http://api.neo4j.org/current/