An Introduction to Neo4j

An Introduction to Neo4j @doryokujin GraphDB Meet-Up Japan #1


Page 1: An Introduction to Neo4j

An Introduction to Neo4j @doryokujin

GraphDB Meet-Up Japan #1

Page 2: An Introduction to Neo4j

・Takahiro Inoue (age 26)

・twitter: doryokujin

・Majored in Math (Statistics & Graph Algorithms)

・Data Scientist

・Leader of MongoDB JP

・Interests: Data Processing, GraphDB

About Me

Page 3: An Introduction to Neo4j

(1) Introduction

(2) Code Examples

(3) Cypher

(4) Other Features

Agenda

Page 4: An Introduction to Neo4j

(1) Introduction

Page 5: An Introduction to Neo4j

・24/7 production deployment since 2003

・2011/09: $10.6 Million Series A Funding

・Always “Java first”

・ACID transactions

・Property Graph Model

・Uses Lucene indexes for graph properties

Neo4j Product Overview

Page 6: An Introduction to Neo4j

・No object/relational mismatch: every object-oriented model is a “graph”

・Easy schema evolution: data first, bottom-up approach to schemas

・Efficient storage of semi-structured information: Neo4j's key-value properties can efficiently represent semi-structured data

・High performance on deep traversals

・Disk-based, native graph storage manager

Neo4j Key Benefits

Neo4j Product Overview

Page 7: An Introduction to Neo4j

・ACID transactions: - custom JTA/JTS-compliant transaction manager

- distributed transactions

- two-phase commit (2PC)

- transaction recovery

- deadlock detection

- enterprise-strength database.

・Massive scalability: supports billions of nodes/relationships/properties on a single machine

Neo4j Key Benefits

Neo4j Product Overview

Page 8: An Introduction to Neo4j

・“Today, NOSQL is by and large a Web phenomenon, not an enterprise success story”

“Building The Enterprise NOSQL Company”

・Support for transactions

・Support for durability

・Support for Java

NOSQL, The Web And The Enterprise


Page 9: An Introduction to Neo4j

Neo4j License and Price List

Edition | License | Description | Price
Community | Open source (GPLv3) | Fully ACID transactional graph DB | Free
Advanced | Commercial and AGPL | + SNMP & JMX monitoring | 500 USD
Enterprise | Commercial and AGPL | + High load & high availability | 2000 USD

Neo4j Price List

Page 10: An Introduction to Neo4j

(2) Code Examples

Page 11: An Introduction to Neo4j

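・Define the relationship types we want to use:

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.RelationshipType;
import org.neo4j.kernel.EmbeddedGraphDatabase;

private static enum ExampleRelationshipTypes implements RelationshipType
{
    EXAMPLE
}

・Next, start the (embedded) database:

GraphDatabaseService graphDb = new EmbeddedGraphDatabase( DB_PATH );
// registerShutdownHook(): helper in the Hello World example that shuts the database down when the JVM exits
registerShutdownHook( graphDb );

・Finally, shut down the database:

graphDb.shutdown();

Hello World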

Page 12: An Introduction to Neo4j

Transaction tx = graphDb.beginTx(); // Start transaction
try
{
    // Creates 2 nodes
    Node firstNode = graphDb.createNode();
    firstNode.setProperty( NAME_KEY, "Hello" );
    Node secondNode = graphDb.createNode();
    secondNode.setProperty( NAME_KEY, "World" );

    // Creates a relationship
    firstNode.createRelationshipTo( secondNode,
        ExampleRelationshipTypes.EXAMPLE );

    String greeting = firstNode.getProperty( NAME_KEY ) + " "
        + secondNode.getProperty( NAME_KEY );
    System.out.println( greeting );
    tx.success();
}
finally
{
    tx.finish();
}
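The Hello World slides create two nodes with a name property and join them with one relationship. To make the property graph model concrete without a running Neo4j instance, here is a minimal in-memory sketch. The classes `HelloGraphSketch`, `Node` and `Relationship` are hypothetical illustrations, not the Neo4j API; relationship types are plain strings instead of a `RelationshipType` enum, and there are no transactions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph, for illustration only (not the Neo4j API).
final class HelloGraphSketch {
    static final class Node {
        final long id;
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();

        Node(long id) { this.id = id; }

        void setProperty(String key, Object value) { properties.put(key, value); }

        Object getProperty(String key) { return properties.get(key); }

        // The node keeps a direct reference to the new relationship,
        // so a later traversal can follow it without any index lookup.
        Relationship createRelationshipTo(Node other, String type) {
            Relationship rel = new Relationship(this, other, type);
            outgoing.add(rel);
            return rel;
        }
    }

    static final class Relationship {
        final Node start;
        final Node end;
        final String type;

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    private final List<Node> nodes = new ArrayList<>();

    Node createNode() {
        Node node = new Node(nodes.size());
        nodes.add(node);
        return node;
    }

    // Mirrors the Hello World slide: two named nodes joined by an EXAMPLE relationship.
    static String helloWorld() {
        HelloGraphSketch graph = new HelloGraphSketch();
        Node first = graph.createNode();
        first.setProperty("name", "Hello");
        Node second = graph.createNode();
        second.setProperty("name", "World");
        Relationship rel = first.createRelationshipTo(second, "EXAMPLE");
        return rel.start.getProperty("name") + " " + rel.end.getProperty("name");
    }
}
```

`HelloGraphSketch.helloWorld()` builds the same two-node graph as the slide and reads the greeting back through the relationship's start and end nodes.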

Page 13: An Introduction to Neo4j

Node Class

Return type | Method | Description
Relationship | createRelationshipTo(...) | Creates a relationship between this node and another node.
void | delete() | Deletes this node if it has no relationships attached to it.
Iterable<Relationship> | getRelationships() | Returns all the relationships attached to this node.
boolean | hasRelationship() | Returns true if there are any relationships attached to this node.
Traverser | traverse(...) | Instantiates a traverser.

Page 14: An Introduction to Neo4j

Path Class

Return type | Method | Description
Node | endNode() | Returns the end node of this path.
Iterator<PropertyContainer> | iterator() | Iterates through both the Nodes and Relationships of this path in order.
Relationship | lastRelationship() | Returns the last Relationship in this path.
int | length() | Returns the length of this path (i.e. the number of relationships).
Iterable<Node> | nodes() | Returns all the nodes in this path.
Iterable<Relationship> | relationships() | Returns all the relationships in between the nodes which this path consists of.
Node | startNode() | Returns the start node of this path.
String | toString() | Returns a natural string representation of this path.

Page 15: An Introduction to Neo4j

Relationship Class

Return type | Method | Description
void | delete() | Deletes this relationship.
Node | getEndNode() | Returns the end node of this relationship.
long | getId() | Returns the unique id of this relationship.
Node[] | getNodes() | Returns the two nodes that are attached to this relationship.
Node | getOtherNode(Node node) | A convenience operation that, given a node that is attached to this relationship, returns the other node.
Node | getStartNode() | Returns the start node of this relationship.
RelationshipType | getType() | Returns the type of this relationship.
boolean | isType(RelationshipType type) | Indicates whether this relationship is of the given type.

Page 16: An Introduction to Neo4j

PropertyContainer Class (for nodes and relationships)

Return type | Method | Description
GraphDatabaseService | getGraphDatabase() | Gets the GraphDatabaseService that this Node or Relationship belongs to.
Object | getProperty(String key) | Returns the property value associated with the given key.
boolean | hasProperty(String key) | Returns true if this property container has a property accessible through the given key.
Object | removeProperty(String key) | Removes the property associated with the given key and returns the old value.
void | setProperty(String key, Object value) | Sets the property value for the given key to value.

Page 17: An Introduction to Neo4j

private void printFriends( Node person )
{
    Traverser traverser = person.traverse(
        Order.BREADTH_FIRST,                    // breadth-first search
        StopEvaluator.END_OF_GRAPH,             // traverse the whole graph
        ReturnableEvaluator.ALL_BUT_START_NODE, // return every node except the start node
        MyRelationshipTypes.KNOWS,              // follow only "KNOWS" relationships
        Direction.OUTGOING );                   // follow outgoing arrows only
    for ( Node friend : traverser )
    {
        // print the value of the "name" property of each returned node
        System.out.println( friend.getProperty( "name" ) );
    }
}

Traversals
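What the `traverse(...)` call above does can be sketched as a plain breadth-first search. This assumes the graph is given as a hypothetical adjacency map from each person to the people they KNOW (outgoing edges only); `FriendsTraversal.friendsOf` is an illustrative name, not a Neo4j method.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for person.traverse(BREADTH_FIRST, END_OF_GRAPH,
// ALL_BUT_START_NODE, KNOWS, OUTGOING); this is not Neo4j code.
final class FriendsTraversal {
    // knowsOutgoing maps each person to the people they KNOW (outgoing edges only).
    static List<String> friendsOf(Map<String, List<String>> knowsOutgoing, String start) {
        List<String> result = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>(); // FIFO queue gives breadth-first order
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {                // END_OF_GRAPH: run until nothing is left
            String current = queue.poll();
            if (!current.equals(start)) {
                result.add(current);              // ALL_BUT_START_NODE
            }
            for (String next : knowsOutgoing.getOrDefault(current, List.of())) {
                if (visited.add(next)) {          // visit each node at most once
                    queue.add(next);
                }
            }
        }
        return result;
    }
}
```

With Neo → Trinity, Morpheus; Morpheus → Trinity, Cypher; Cypher → Agent Smith, `friendsOf(graph, "Neo")` returns Trinity, Morpheus, Cypher, Agent Smith in breadth-first order.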

Page 18: An Introduction to Neo4j

Neo4j Wiki

[Figure: example graph from the Neo4j Wiki with the nodes Trinity, Morpheus, Cypher and Agent Smith, labelled 1, 1, 2 and 3]

Page 19: An Introduction to Neo4j

Evaluators Class

Method | Description
all() | Returns all nodes.
atDepth(int depth) | Returns an Evaluator which only includes positions at depth and prunes everything deeper than that.
excludeStartPosition() | Returns an Evaluator which includes everything except the start position (the start node).
fromDepth(int depth) | Returns an Evaluator which only includes positions from depth and deeper and never prunes anything.
includingDepths(int minDepth, int maxDepth) | Returns an Evaluator which only includes positions between depths minDepth and maxDepth.
toDepth(int depth) | Returns an Evaluator which includes positions down to depth and prunes everything deeper than that.
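The depth-based evaluators differ only in two decisions per position: whether to include it in the result, and whether to prune (stop expanding) below it. The sketch below encodes those two decisions for each row of the table as a hypothetical `Verdict` and applies them during a breadth-first traversal. It is a standalone illustration, not Neo4j's `Evaluator` interface; pruning at `maxDepth` in `includingDepths` is an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntFunction;

final class DepthEvaluators {
    // Hypothetical verdict for one position: include it? stop expanding below it?
    static final class Verdict {
        final boolean include;
        final boolean prune;

        Verdict(boolean include, boolean prune) {
            this.include = include;
            this.prune = prune;
        }
    }

    static IntFunction<Verdict> all()                  { return d -> new Verdict(true, false); }
    static IntFunction<Verdict> excludeStartPosition() { return d -> new Verdict(d > 0, false); }
    static IntFunction<Verdict> atDepth(int depth)     { return d -> new Verdict(d == depth, d >= depth); }
    static IntFunction<Verdict> toDepth(int depth)     { return d -> new Verdict(d <= depth, d >= depth); }
    static IntFunction<Verdict> fromDepth(int depth)   { return d -> new Verdict(d >= depth, false); }

    static IntFunction<Verdict> includingDepths(int minDepth, int maxDepth) {
        return d -> new Verdict(d >= minDepth && d <= maxDepth, d >= maxDepth);
    }

    // Breadth-first traversal that asks the evaluator about every reached node.
    static List<String> traverse(Map<String, List<String>> out, String start,
                                 IntFunction<Verdict> evaluator) {
        List<String> included = new ArrayList<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            int d = depth.get(node);
            Verdict verdict = evaluator.apply(d);
            if (verdict.include) {
                included.add(node);
            }
            if (verdict.prune) {
                continue; // do not expand below this node
            }
            for (String next : out.getOrDefault(node, List.of())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, d + 1);
                    queue.add(next);
                }
            }
        }
        return included;
    }
}
```

On the chain A → B → C → D with an extra edge A → E, `atDepth(1)` yields B, E; `toDepth(1)` yields A, B, E; and `fromDepth(2)` yields C, D.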

Page 20: An Introduction to Neo4j

private static Traverser findHackers( final Node startNode )
{
    TraversalDescription td = Traversal.description()
            .breadthFirst()
            .relationships( RelTypes.CODED_BY, Direction.OUTGOING )
            .relationships( RelTypes.KNOWS, Direction.OUTGOING )
            .evaluator(
                    Evaluators.returnWhereLastRelationshipTypeIs( RelTypes.CODED_BY ) );
    return td.traverse( startNode );
}

Traverser traverser = findHackers( getNeoNode() );
int numberOfHackers = 0;
for ( Path hackerPath : traverser )
{
    System.out.println( "At depth " + hackerPath.length() + " => "
                        + hackerPath.endNode()
                                .getProperty( "name" ) );
    numberOfHackers++;
}

12.4. Traversal

New Traversal Framework

Traverses two relationship types
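`findHackers` follows only outgoing KNOWS and CODED_BY relationships and returns the paths whose last relationship is CODED_BY. A minimal in-memory sketch of that rule, assuming a hypothetical edge list of (from, type, to) triples and a breadth-first walk that visits each node once; `HackerTraversal` is an illustrative name, not part of Neo4j.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class HackerTraversal {
    static final class Edge {
        final String from;
        final String type;
        final String to;

        Edge(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // Returns "At depth N => name" for every node reached through a CODED_BY edge.
    static List<String> findHackers(List<Edge> edges, String start) {
        Map<String, List<Edge>> out = new HashMap<>();
        for (Edge e : edges) {
            // Like the two .relationships(...) calls: only these types are followed.
            if (e.type.equals("KNOWS") || e.type.equals("CODED_BY")) {
                out.computeIfAbsent(e.from, k -> new ArrayList<>()).add(e);
            }
        }
        List<String> hits = new ArrayList<>();
        Set<String> visited = new HashSet<>(List.of(start));
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        Map<String, Integer> depth = new HashMap<>(Map.of(start, 0));
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (Edge e : out.getOrDefault(node, List.of())) {
                if (!visited.add(e.to)) {
                    continue; // already reached by a shorter or equal path
                }
                int d = depth.get(node) + 1;
                depth.put(e.to, d);
                queue.add(e.to);
                // Like returnWhereLastRelationshipTypeIs(CODED_BY).
                if (e.type.equals("CODED_BY")) {
                    hits.add("At depth " + d + " => " + e.to);
                }
            }
        }
        return hits;
    }
}
```

With test edges modelled on the Matrix example (Neo KNOWS Morpheus, Morpheus KNOWS Cypher, Cypher KNOWS Agent Smith, Agent Smith CODED_BY The Architect, plus a few others), the sketch returns `At depth 4 => The Architect`.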

Page 21: An Introduction to Neo4j

・Order
- BREADTH_FIRST
- DEPTH_FIRST

・Direction
- BOTH, INCOMING, OUTGOING

・StopEvaluator
- END_OF_GRAPH, DEPTH_ONE

・ReturnableEvaluator
- ALL, ALL_BUT_START_NODE

・Return type (REST API traversals)
- node
- relationship
- path: contains full representations of the start and end nodes; the rest are URIs
- fullpath: contains full representations of all nodes and relationships

Traverser traverser = person.traverse(
    Order.BREADTH_FIRST,
    StopEvaluator.END_OF_GRAPH,
    ReturnableEvaluator.ALL_BUT_START_NODE,
    MyRelationshipTypes.KNOWS,
    Direction.OUTGOING );
for ( Node friend : traverser ) { ... }

Traversals

Page 22: An Introduction to Neo4j

/*
 * 1. Begin a transaction.
 * 2. Operate on the graph.
 * 3. Mark the transaction as successful (or not).
 * 4. Finish the transaction.
 */
Transaction tx = graphDb.beginTx();
try
{
    ... // any operation that works with the node space
    tx.success();
}
finally
{
    tx.finish();
}

Transaction
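The four numbered steps rely on `finish()` committing only when `success()` was called first; otherwise the work is rolled back. A hypothetical sketch of that contract (`TxSketch.Tx` buffers writes in a map and is not Neo4j's `Transaction`) shows why `success()` is the last statement in `try` and `finish()` sits in `finally`: an exception skips `success()`, so `finish()` discards the pending writes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical transaction for illustration only; not org.neo4j.graphdb.Transaction.
final class TxSketch {
    static final class Tx {
        private final Map<String, String> store;                     // the "database"
        private final Map<String, String> pending = new HashMap<>(); // uncommitted writes
        private boolean success;

        Tx(Map<String, String> store) { this.store = store; }

        void put(String key, String value) { pending.put(key, value); }

        void success() { success = true; }

        // Commit buffered writes if success() was called, otherwise discard them.
        void finish() {
            if (success) {
                store.putAll(pending);
            }
            pending.clear();
        }
    }

    // The slide's four steps; 'fail' simulates an exception thrown mid-transaction.
    static void write(Map<String, String> store, String key, String value, boolean fail) {
        Tx tx = new Tx(store);          // 1. begin a transaction
        try {
            tx.put(key, value);         // 2. operate on the "graph"
            if (fail) {
                throw new IllegalStateException("simulated failure");
            }
            tx.success();               // 3. mark as successful
        } finally {
            tx.finish();                // 4. commit or roll back
        }
    }
}
```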

Page 23: An Introduction to Neo4j

・Indexes either nodes or relationships

・Indexes their properties
※ Each node has direct references to its adjacent nodes

・Default: neo4j-lucene-index component

・Full-text search, sorting, caching, range queries

・The graph database can also index with the graph “itself”: B-trees, R-trees, quadtrees

Indexing

Page 24: An Introduction to Neo4j

Indexing their properties: Graph Databases and Endogenous Indices

[Figure (from “The Graph Traversal Programming Pattern”): a property graph with created, follows and cites edges between elements such as name=twarko, age=30; name=neo4j, views=56781; name=tenderlove, gender=male. A Lucene index holds the name, views and gender property indices, while each element has direct pointers to its neighbours.]

Page 25: An Introduction to Neo4j

GraphDatabaseService graphDb = new EmbeddedGraphDatabase( "path/to/neo4j-db" );
IndexService index = new LuceneIndexService( graphDb );

// (write operations must run inside a transaction, as on the Transaction slide)
Node andy = graphDb.createNode();
Node larry = graphDb.createNode();

andy.setProperty( "name", "Andy Wachowski" );
andy.setProperty( "title", "Director" );
larry.setProperty( "name", "Larry Wachowski" );
larry.setProperty( "title", "Director" );

index.index( andy, "name", andy.getProperty( "name" ) );
index.index( andy, "title", andy.getProperty( "title" ) );
index.index( larry, "name", larry.getProperty( "name" ) );
index.index( larry, "title", larry.getProperty( "title" ) );

Indexing

http://wiki.neo4j.org/content/Indexing_with_IndexService

Page 26: An Introduction to Neo4j

// Returns the andy node.
index.getSingleNode( "name", "Andy Wachowski" );

// Contains only the larry node.
for ( Node hit : index.getNodes( "name", "Larry Wachowski" ) )
{
    // do something
}

// Contains both andy and larry.
for ( Node hit : index.getNodes( "title", "Director" ) )
{
    // do something
}

Indexing

http://wiki.neo4j.org/content/Indexing_with_IndexService
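Conceptually, the exact-match `IndexService` calls above maintain a map from (key, value) to nodes. A minimal sketch under that assumption, with node ids as `long`s. `ExactIndexSketch` is hypothetical, and returning `null` for no hit and throwing for several hits in `getSingleNode` are choices made for this sketch, not documented Neo4j behaviour.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class ExactIndexSketch {
    // key -> (value -> ids of the nodes indexed under that key/value pair)
    private final Map<String, Map<Object, Set<Long>>> index = new HashMap<>();

    void index(long nodeId, String key, Object value) {
        index.computeIfAbsent(key, k -> new HashMap<>())
             .computeIfAbsent(value, v -> new LinkedHashSet<>())
             .add(nodeId);
    }

    // Every node indexed with exactly this key/value pair (read-only view).
    Set<Long> getNodes(String key, Object value) {
        Map<Object, Set<Long>> byValue = index.getOrDefault(key, Collections.emptyMap());
        return Collections.unmodifiableSet(byValue.getOrDefault(value, Collections.emptySet()));
    }

    // In this sketch: null when nothing matches, an exception when the hit is ambiguous.
    Long getSingleNode(String key, Object value) {
        Set<Long> hits = getNodes(key, value);
        if (hits.isEmpty()) {
            return null;
        }
        if (hits.size() > 1) {
            throw new IllegalStateException("More than one node for " + key + "=" + value);
        }
        return hits.iterator().next();
    }
}
```

Indexing andy (id 1) and larry (id 2) as on the slide, `getNodes("title", "Director")` returns both ids and `getSingleNode("name", "Andy Wachowski")` returns 1.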

Page 27: An Introduction to Neo4j

IndexService index = ...; // your LuceneFulltextIndexService

index.getNodes( "name", "wachowski" ); // --> andy and larry
index.getNodes( "name", "andy" ); // --> andy
index.getNodes( "name", "Andy" ); // --> andy
index.getNodes( "name", "larry Wachowski" ); // --> larry
index.getNodes( "name", "wachowski larry" ); // --> larry
index.getNodes( "name", "wachow* andy" ); // --> andy and larry
index.getNodes( "name", "+wachow* +larry" ); // --> larry
index.getNodes( "name", "andy AND larry" ); // --> (no hits)
index.getNodes( "name", "andy OR larry" ); // --> andy and larry
index.getNodes( "name", "Wachowski AND larry" ); // --> larry

Full-text Search

http://wiki.neo4j.org/content/Indexing_with_IndexService
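A full-text index differs from the exact index by tokenizing and lower-casing values, which is why "andy" and "Andy" both find andy. Below is a hypothetical inverted-index sketch that treats every query term as required (AND) and a trailing `*` as a prefix match. It does not implement Lucene's query syntax (`OR`, `+`, and so on) or its default operator.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class FulltextSketch {
    // Hypothetical inverted index: lower-cased token -> names of nodes containing it.
    private final Map<String, Set<String>> postings = new TreeMap<>();

    void index(String nodeName, String text) {
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(nodeName);
        }
    }

    // Every query term must match (AND); a trailing '*' makes the term a prefix match.
    Set<String> query(String q) {
        Set<String> result = null;
        for (String term : q.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) continue;
            Set<String> hits = new TreeSet<>();
            if (term.endsWith("*")) {
                String prefix = term.substring(0, term.length() - 1);
                for (Map.Entry<String, Set<String>> e : postings.entrySet()) {
                    if (e.getKey().startsWith(prefix)) hits.addAll(e.getValue());
                }
            } else {
                hits.addAll(postings.getOrDefault(term, Set.of()));
            }
            if (result == null) {
                result = hits;
            } else {
                result.retainAll(hits); // intersect with earlier terms
            }
        }
        return result == null ? new TreeSet<>() : result;
    }
}
```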

Page 28: An Introduction to Neo4j

GraphDB as an External Indexing System

The Graph Traversal Pattern

3.2 Traversing Endogenous Indices

A graph is a general-purpose data structure. A graph can be used to model lists, maps, trees, etc. As such, a graph can model an index. It was assumed, in §2.2, that a graph database makes use of an external indexing system to index the properties of its vertices and edges. The reason stated was that specialized indexing systems are better suited for special-purpose queries such as those involving full-text search. However, in many cases, there is nothing that prevents the representation of an index within the graph itself: vertices and edges can be indexed by other vertices and edges.²⁴ In fact, given the nature of how vertices and edges directly reference each other in a graph database, index look-up speeds are comparable. Endogenous indices afford graph databases a great flexibility in modeling a domain. Not only can objects and their relationships be modeled (e.g. people and their friendships), but also the indices that partition the objects into meaningful subsets (e.g. people within a 2D region of space).²⁵ The remainder of this subsection will discuss the representation and traversal of a spatial, 2D-index that is explicitly modeled within a property graph.

The domain of spatial analysis makes use of advanced indexing structures such as the quadtree [4, 17]. Quadtrees partition a two-dimensional plane into rectangular boxes based upon the spatial density of the points being indexed. Figure 7 diagrams how space is partitioned as the density of points increases within a region of the index.

Fig. 7. A quadtree partition of a plane. This figure is an adaptation of a public domain image provided courtesy of David Eppstein.

²⁴ One of the primary motivations behind this article is to stress the importance of thinking of a graph as simply an index of itself, where the primary purpose is to traverse the various defined indices in ways that elicit problem-solving within the domain being modeled.

²⁵ Those indices that have a graph-like structure are suited for representing as a graph. It is noted that not all indices meet this criteria.

Marko A. Rodriguez and Peter Neubauer

In order to demonstrate how a quadtree index can be represented and traversed, a toy graph data set is presented. This data set is diagrammed in Figure 8. The top half of Figure 8 represents a quadtree index (vertices 1-9). This quadtree index is partitioning “points of interest” (vertices a-i) located within the diagrammed plane.²⁶ All vertices maintain three properties: bottom-left …

[Figure 8 diagram: quadtree vertices 1-9 (type=quad) connected by sub edges, each with bottom-left/top-right bounds, from the root's bl=[0,0], tr=[100,100] down to vertex 9's bl=[50,25], tr=[62,37]; below them, the points of interest a-i on a plane spanning [0,0] to [100,100]]

Fig. 8. A quadtree index of a space that contains points of interest. The index is composed of the vertices 1-9 and the points of interest are the vertices a-i. While not diagrammed for the sake of clarity, all edges are labeled sub (meaning subsumes) and each point of interest vertex has an associated bottom-left (bl) property, top-right (tr) property, and a type property which is equal to “poi.”

²⁶ The plane depicted does not actually exist as a data structure, but is represented here to denote how the different vertices lying on that plane are spatially located (i.e. spatial information is represented explicitly in the properties of the vertices). Thus, vertices closer to each other on the plane are closer together.

The Graph Traversal Pattern
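Traversing the endogenous quadtree index amounts to walking `sub` edges from the root and skipping every quad whose box does not intersect the query rectangle. A minimal sketch, assuming hypothetical vertices that carry a `type` ("quad" or "poi") and `bl`/`tr` corners as described for Figure 8; the class and method names are illustrative only, and the tiny data set in the usage note is not the one from the figure.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class QuadtreeTraversal {
    // Hypothetical vertex: type is "quad" or "poi"; bl/tr are {x, y} corners.
    static final class Vertex {
        final String name;
        final String type;
        final double[] bl;
        final double[] tr;
        final List<Vertex> children = new ArrayList<>(); // targets of outgoing "sub" edges

        Vertex(String name, String type, double[] bl, double[] tr) {
            this.name = name;
            this.type = type;
            this.bl = bl;
            this.tr = tr;
        }

        // Adds a "sub" (subsumes) edge and returns this vertex for chaining.
        Vertex subsumes(Vertex child) {
            children.add(child);
            return this;
        }
    }

    // Axis-aligned boxes overlap (touching edges count as overlap).
    static boolean intersects(double[] bl1, double[] tr1, double[] bl2, double[] tr2) {
        return bl1[0] <= tr2[0] && bl2[0] <= tr1[0] && bl1[1] <= tr2[1] && bl2[1] <= tr1[1];
    }

    // Box [bl, tr] lies completely inside the query box [qbl, qtr].
    static boolean within(double[] bl, double[] tr, double[] qbl, double[] qtr) {
        return qbl[0] <= bl[0] && tr[0] <= qtr[0] && qbl[1] <= bl[1] && tr[1] <= qtr[1];
    }

    // Depth-first walk over "sub" edges that never descends into a vertex whose
    // box misses the query box; returns the matching poi names, sorted.
    static List<String> pointsIn(Vertex root, double[] qbl, double[] qtr) {
        List<String> found = new ArrayList<>();
        Deque<Vertex> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Vertex v = stack.pop();
            if (!intersects(v.bl, v.tr, qbl, qtr)) {
                continue; // prune: nothing below this quad can match
            }
            if (v.type.equals("poi") && within(v.bl, v.tr, qbl, qtr)) {
                found.add(v.name);
            }
            for (Vertex child : v.children) {
                stack.push(child);
            }
        }
        Collections.sort(found);
        return found;
    }
}
```

With a root quad [0,0]-[100,100] split into [0,0]-[50,50] (holding a at (10,10)) and [50,50]-[100,100] (holding b at (60,70) and c at (90,90)), the query box [50,50]-[75,75] returns only b, and point a is never tested.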

Page 29: An Introduction to Neo4j

(3) Cypher

Page 30: An Introduction to Neo4j

Cypher

・Designed to be a humane query language

・Most of the keywords like WHERE and ORDER BY are inspired by SQL

・Pattern matching borrows expression approaches from SPARQL

・Regular expression matching is implemented using the Scala programming language

Page 31: An Introduction to Neo4j

TraversalDescription description = Traversal.description()
    .breadthFirst()
    .relationships( Relationships.PAIRED, Direction.OUTGOING )
    .evaluator( Evaluators.excludeStartPosition() );
description.traverse( startNode ); // Retrieves the traverser

Cypher

start programmer=(3) match (programmer)-[:PAIRED]->(pair) return pair

start programmer=(3) match (programmer)-[:PAIRED]->(pair)
where pair.age > 30 return pair, count(*) order by age
skip 5 limit 10

Cypher is very simple

More complex conditions

How Neo4j uses Scala’s Parser Combinator: Cypher’s internals ‒ Part 1

Page 32: An Introduction to Neo4j

[Incoming/Outgoing relationships]

# All nodes that A has outgoing relationships to.
> start n=node(3) match (n)-->(x) return x
==> Node[4]{name->"Bossman"}
==> Node[5]{name->"Cesar"}

# All nodes that have outgoing relationships to A.
> start n=node(3) match (n)<--(x) return x
==> Node[1]{name->"David"}

[Match by relationship type]

# All nodes that are blocked by A.
> start n=node(3) match (n)-[:BLOCKS]->(x) return x
==> Node[5]{name->"Cesar"}

[Multiple relationships]

# The three nodes in the path.
> start a=node(3) match (a)-[:KNOWS]->(b)-[:KNOWS]->(c) return a,b,c
==> a: Node[3]{name->"Anders"}
==> b: Node[4]{name->"Bossman"}
==> c: Node[2]{name->"Emil"}

15.4. Match

Match
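The three MATCH patterns above can be mimicked by scanning a relationship list. This is a hypothetical sketch, not how Cypher executes queries. The test data is chosen to reproduce the slide's results: Anders KNOWS Bossman, Anders BLOCKS Cesar, Bossman KNOWS Emil, and David → Anders (the KNOWS type on that last relationship is an assumption).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of three MATCH patterns over a relationship list.
final class PatternMatchSketch {
    static final class Rel {
        final String from;
        final String type;
        final String to;

        Rel(String from, String type, String to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    // (n)-->(x): end node of every outgoing relationship of n, any type.
    static List<String> outgoing(List<Rel> rels, String n) {
        List<String> xs = new ArrayList<>();
        for (Rel r : rels) {
            if (r.from.equals(n)) xs.add(r.to);
        }
        return xs;
    }

    // (n)<--(x): start node of every relationship that points at n.
    static List<String> incoming(List<Rel> rels, String n) {
        List<String> xs = new ArrayList<>();
        for (Rel r : rels) {
            if (r.to.equals(n)) xs.add(r.from);
        }
        return xs;
    }

    // (n)-[:TYPE]->(x): like outgoing(), but only relationships of one type.
    static List<String> outgoingOfType(List<Rel> rels, String n, String type) {
        List<String> xs = new ArrayList<>();
        for (Rel r : rels) {
            if (r.from.equals(n) && r.type.equals(type)) xs.add(r.to);
        }
        return xs;
    }

    // (a)-[:TYPE]->(b)-[:TYPE]->(c): one row [a, b, c] per matching two-hop path.
    static List<List<String>> twoHops(List<Rel> rels, String a, String type) {
        List<List<String>> rows = new ArrayList<>();
        for (String b : outgoingOfType(rels, a, type)) {
            for (String c : outgoingOfType(rels, b, type)) {
                rows.add(List.of(a, b, c));
            }
        }
        return rows;
    }
}
```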

Page 33: An Introduction to Neo4j

[Shortest path]

# Find the shortest path between two nodes, as long as the path is at most 15
# relationships long. Inside the parentheses you write a single link of a pattern:
# the start node, the relationship (here variable-length, up to 15 hops) and the end node.
> start d=node(1), e=node(2) match p = shortestPath( d-[*..15]->e ) return p
==> p: (1)--[KNOWS,2]-->(3)--[KNOWS,0]-->(4)--[KNOWS,3]-->(2)

15.4. Match

Graph Algorithm
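`shortestPath( d-[*..15]->e )` can be pictured as a breadth-first search over outgoing relationships that stops expanding after 15 hops. A hypothetical sketch over an adjacency map of integer node ids; it is not Neo4j's implementation and returns just one shortest path.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

final class ShortestPathSketch {
    // Breadth-first search over outgoing edges. Returns the node ids of one shortest
    // path from start to end, or an empty list if none exists within maxHops.
    static List<Integer> shortestPath(Map<Integer, List<Integer>> out,
                                      int start, int end, int maxHops) {
        Map<Integer, Integer> parent = new HashMap<>(); // node -> predecessor on the path
        Map<Integer, Integer> hops = new HashMap<>();   // node -> distance from start
        Deque<Integer> queue = new ArrayDeque<>();
        hops.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            if (node == end) {
                // Walk the predecessor chain back to start.
                LinkedList<Integer> path = new LinkedList<>();
                for (Integer n = end; n != null; n = parent.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            if (hops.get(node) == maxHops) {
                continue; // like [*..15]: never build a path longer than maxHops
            }
            for (int next : out.getOrDefault(node, List.of())) {
                if (!hops.containsKey(next)) {
                    hops.put(next, hops.get(node) + 1);
                    parent.put(next, node);
                    queue.add(next);
                }
            }
        }
        return List.of();
    }
}
```

For edges 1 → 3 → 4 → 2 plus a longer detour 1 → 5 → 6 → 7 → 2, `shortestPath(out, 1, 2, 15)` returns [1, 3, 4, 2], the same node sequence as the slide's result; with `maxHops` set to 2 it returns an empty list.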

Page 34: An Introduction to Neo4j

[Count / Group Count]

# The start node and the count of related nodes.
> start n=node(2) match (n)-->(x) return n, count(*)
==>
n: Node[2]{name->"A",property->13}
count(*): 3

# The relationship types and their group count.
> start n=node(2) match (n)-[r]->() return type(r), count(*)
==>
TYPE(r): KNOWS
count(*): 3

15.7. Aggregation

Aggregation

Page 35: An Introduction to Neo4j

[SUM/AVG/MAX/COLLECT]

> start n=node(2,3,4) return sum(n.property)
==> 90

> start n=node(2,3,4) return avg(n.property)
==> 30.0

> start n=node(2,3,4) return max(n.property)
==> 44

> start n=node(2,3,4) return collect(n.property)
==> List(13, 33, 44)

15.7. Aggregation

Aggregation
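The aggregation functions map directly onto Java streams. This sketch reproduces the slide's numbers for the property values 13, 33 and 44 (sum 90, average 30.0, maximum 44) and the `type(r), count(*)` grouping; `AggregationSketch` and its method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

final class AggregationSketch {
    // type(r), count(*): group relationship types and count each group (sorted by type).
    static Map<String, Long> countByType(List<String> relationshipTypes) {
        return relationshipTypes.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    // sum(n.property)
    static int sum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // avg(n.property); NaN for an empty input in this sketch
    static double avg(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);
    }

    // max(n.property); throws NoSuchElementException for an empty input
    static int max(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).max().getAsInt();
    }
}
```

`collect(n.property)` needs no helper here: it is simply the list of values itself.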

Page 36: An Introduction to Neo4j

(4) Other Features

Page 37: An Introduction to Neo4j

High Availability

Page 38: An Introduction to Neo4j

・Always a single master and zero or more slaves

・Neo4j HA can handle writes on a slave so there is no need to redirect writes to the master

・A slave will handle writes by synchronizing with the master to preserve consistency

・Updates propagate from the master to other slaves eventually

・If the master goes down, any running write transaction will be rolled back, and no writes can take place during master election

High Availability

Page 39: An Introduction to Neo4j

High Availability

[ZooKeeper as a distributed coordination service]
・Master election
・Propagation of cluster and machine status information
・Fault detection

Slaves can handle write transactions.

Updates to slaves are eventually consistent.

Write operations automatically synchronize with the master.

Automatic failover

Only available in Neo4j Enterprise

Page 40: An Introduction to Neo4j

[1] Full Backup

・Full backup copies the database files

・Without acquiring any locks

・Transactions will continue and the store will change

[2] Incremental Backup

・Incremental backup does not copy store files

・Instead, it copies the logs of the transactions

Backup

Only available in Neo4j Enterprise

Page 42: An Introduction to Neo4j

[Languages]

・Clojure

・Erlang bindings to Neo4j: nerlo, cali

・Gremlin Graph programming language

・Groovy

・Java object mapping

・PHP

・Python

・Ruby (including RESTful API)

・Scala (including RESTful API)

Languages

[Frameworks]

・Grails

・Griffon

・Qi4j: Domain Driven Development in Java, with great persistence architecture

・Roo

[Neo4j REST clients]

・Neo4RestNet .Net REST client

・Neo4jRestSharp .Net REST client

・Common Lisp REST client

・PHP REST client

・Python REST client

Page 43: An Introduction to Neo4j

Heroku Add-on

http://addons.heroku.com/neo4j

Page 44: An Introduction to Neo4j

GraphDB: Comparison (2010/12)

GraphDB | License | Language | Protocol | Data Model | Gremlin | Binding | SQL-like Query
Neo4j | GPL | Java | REST/JSON | Property Graph | Yes | Ruby, Python, Scala, ... | -
sones | AGPLv3 | C# | REST/JSON (XML) | Property Graph (+Extend) | Yes | - | Yes
OrientDB | Apache 2.0 | Java | REST/JSON | Property Graph | Yes | PHP, JRuby, Python, JS, ... | Yes
InfoGrid | AGPLv3 | Java | REST/JSON | Property Graph? (MeshObj) | - | - | -
InfiniteGraph | Product | C++ | - | Property Graph | - | - | -