Building Applications with a Graph Database

Tobias Lindaaker

Software Developer @ Neo Technology

twitter: @thobe / @neo4j / #neo4j

email: [email protected]

web: http://neo4j.org/

web: http://thobe.org/

CON6484

Presented at JavaOne 2013, Tuesday September 24. "Data Modeling Patterns" co-created with Ian Robinson. "Pitfalls and Anti-Patterns" created by Ian Robinson.


Page 2: Building Applications with a Graph Database

What you’ll face

๏ Modeling your domain

๏ Choosing your deployment model

๏ Deploying and maintaining your application and DB

๏ Evolving your application and your domain

Most things are Surprisingly Familiar

Page 3: Building Applications with a Graph Database

Introducing the sample Application


Page 4: Building Applications with a Graph Database

Neo Technology Test Lab

๏ One-stop place for QA

• Real-world cluster tests

• Benchmarks

• Charting

• Statistics

‣ Uses HdrHistogram: http://giltene.github.io/HdrHistogram/

• Integrated log analysis

‣ GC logs and app logs

๏ Click-and-go cluster deployment

Page 5: Building Applications with a Graph Database

Neo Technology Test Lab

๏ 2 perpetual servers

• 1 database server (could be extended to a cluster for high availability)

• 1 “Test Lab Manager”

‣ Manages clusters and test executions

‣ Serves up the UI

๏ Data-centric HTTP API

๏ UI in pure JavaScript: static files, client-side rendering

Page 6: Building Applications with a Graph Database

Neo Technology Test Lab

๏ All state is in the DB, which allows for multiple Manager instances and greatly simplifies redeploy:

1. Start new instance for the new manager

2. Verify that the new manager works properly

3. Re-bind elastic IP to new instance

4. Terminate old instance

๏No downtime on redeploy

Page 7: Building Applications with a Graph Database

Neo Technology Test Lab

๏ Cute but useful: single click to SSH into a cluster server in the browser

๏ VT100 emulator in JavaScript

๏ Uses com.jcraft:jsch to let the manager connect to the server

• (only) the manager has the private key to the servers

๏ Tunnel terminal connection through WebSocket

๏ Really useful for introspection

• Why did installation fail?

Page 8: Building Applications with a Graph Database

Analysis of requirements

๏UI for reporting and overview of activity

๏Easy to use & Easy to extend

๏API for triggering real world cluster tests from the CI system

๏Eat our own dog food

•Use Neo4j for storage needs

•Use our Cloud hosting solution

๏Make costs visible

๏Strong desire not to own hardware


Page 9: Building Applications with a Graph Database

Data storage/retrieval requirements

๏Store all meta-data about tests and their outcome

•The actual result data can be raw files

๏ All entities can have arbitrary events attached

• These should always be fetched, and are used to determine the state of the entity

๏ Minimize the number of round-trips made to the database

• Each action should preferably be only one DB call

Page 10: Building Applications with a Graph Database

Graph Database Queries


Page 11: Building Applications with a Graph Database

An overview of Cypher

The basics in one slide

๏ START - the node(s) your query starts from (not needed in Neo4j 2.0)

๏ MATCH - the pattern to follow from the start point(s); this expands your search space

๏ WHERE - filter instances of the pattern; this reduces your search space

๏ RETURN - create a result projection of each matching instance of the pattern

๏ Patterns are described using ASCII-art

• Example:

    (me)-[:FRIEND]-()-[:FRIEND]-(foaf),
    (me)-[:LIKES]->()<-[:LIKES]-(foaf)
    // find friends of my friends that share an interest with me

Page 12: Building Applications with a Graph Database

An overview of Cypher


๏CREATE - create nodes and relationships based on a pattern

๏SET - assign properties to nodes and relationships

๏DELETE - delete nodes or relationships

๏CREATE UNIQUE - as CREATE, but only if no match is found

• being superseded by MERGE in Neo4j 2.0

๏FOREACH - perform update operation for each item in a collection

Creates and Updates

Page 13: Building Applications with a Graph Database

Some more advanced Cypher

๏ WITH - start a sub-query, carrying over only the declared variables. Similar format to RETURN; allows the same kinds of projections.

๏ ORDER BY - sort the matching pattern instances by a property. Used in WITH or RETURN.

๏ SKIP and LIMIT - page through results; used with ORDER BY.

๏ Aggregation

• COLLECT - gather a part of the pattern from every matching pattern instance into one collection. Comparable to SQL’s GROUP BY.

• SUM - sum an expression over the matches (like in SQL)

• AVG, MIN, MAX, and COUNT - as in SQL

Page 14: Building Applications with a Graph Database

Modeling your domain


Page 15: Building Applications with a Graph Database

Domain modeling guideline

๏ Query first

๏ Whiteboard first

๏ Examples first

๏ Redundancy - avoid

๏ Thank You

Look at the top left of your keyboard!

Page 16: Building Applications with a Graph Database

Query First


๏Create the model to satisfy your queries

๏Do not attempt to mirror the real world

•You might do that, but it is not a goal in itself

๏Start by writing down the queries you need to satisfy

•Write using natural language

•Then analyze and formalize

๏Now you are ready to draw the model...

Page 17: Building Applications with a Graph Database

Whiteboard first


Page 18: Building Applications with a Graph Database

Example first

๏ Draw one or more examples of entities in your domain

๏ Do not leap straight to UML or other archetypical models

๏ Once you have a few examples you can draw the model (unless it is already clear from the examples)

Page 19: Building Applications with a Graph Database

Redundancy - avoid

๏ Relationships can be traversed in both directions, so avoid creating “inverse” relationships

๏ Don’t connect each node of a certain “type” to some node that represents that type

• Leads to unnecessary bottlenecks

• Use the path you reached a node through to know its type

• Use labels to find start points

‣ and for deciding type dynamically if multiple are possible

๏ Avoid materializing information that can be inferred

• Don’t add FRIEND_OF_A_FRIEND relationships when you have FRIEND relationships

Page 20: Building Applications with a Graph Database

Domain modeling method


Page 21: Building Applications with a Graph Database

Method

1. Identify application/end-user goals

2. Figure out what questions to ask of the domain

3. Identify entities in each question

4. Identify relationships between entities in each question

5. Convert entities and relationships to paths

- These become the basis of the data model

6. Express questions as graph patterns

- These become the basis for queries

Thanks to Ian Robinson

Page 22: Building Applications with a Graph Database

1. Application/End-User Goals

As an employee

I want to know who in the company has similar skills to me

So that we can exchange knowledge

Thanks to Ian Robinson

Page 23: Building Applications with a Graph Database

2. Questions to ask of the Domain

Which people, who work for the same company as me, have similar skills to me?

As an employee

I want to know who in the company has similar skills to me

So that we can exchange knowledge

Thanks to Ian Robinson

Page 24: Building Applications with a Graph Database

3. Identify Entities


Which people, who work for the same company as me, have similar skills to me?

•Person

•Company

•Skill

Thanks to Ian Robinson

Page 25: Building Applications with a Graph Database

4. Identify Relationships Between Entities


Which people, who work for the same company as me, have similar skills to me?

•Person WORKS FOR Company

•Person HAS SKILL Skill

Thanks to Ian Robinson

Page 26: Building Applications with a Graph Database

5. Convert to Cypher Paths

• Person WORKS FOR Company

• Person HAS SKILL Skill

(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)

Slide annotations: each (...) is a Node, each -[...]-> is a Relationship, :Person / :Company / :Skill are Labels, and :WORKS_FOR / :HAS_SKILL are Relationship Types.

Thanks to Ian Robinson

Page 32: Building Applications with a Graph Database

Consolidate Pattern

(:Person)-[:WORKS_FOR]->(:Company),
(:Person)-[:HAS_SKILL]->(:Skill)

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

[Figure: a Person node with a WORKS_FOR relationship to a Company node and a HAS_SKILL relationship to a Skill node.]

Thanks to Ian Robinson

Page 33: Building Applications with a Graph Database

Candidate Data Model

(:Company)<-[:WORKS_FOR]-(:Person)-[:HAS_SKILL]->(:Skill)

[Figure: example graph. Three Person nodes (name: Ian, Jacob, Tobias) each have a WORKS_FOR relationship to one Company node (name: ACME) and HAS_SKILL relationships to Skill nodes (name: Neo4j, Scala, Python, C#).]

Thanks to Ian Robinson

Page 34: Building Applications with a Graph Database

6. Express Question as Graph Pattern

Which people, who work for the same company as me, have similar skills to me?

[Figure: graph pattern. Two Person nodes, me and colleague, both have a WORKS_FOR relationship to the same Company node (company) and a HAS_SKILL relationship to the same Skill node (skill).]

Thanks to Ian Robinson

Page 35: Building Applications with a Graph Database

Cypher Query

Which people, who work for the same company as me, have similar skills to me?

MATCH (company)<-[:WORKS_FOR]-(me:Person)-[:HAS_SKILL]->(skill),
      (company)<-[:WORKS_FOR]-(colleague)-[:HAS_SKILL]->(skill)
WHERE me.name = {name}
RETURN colleague.name AS name,
       count(skill) AS score,
       collect(skill.name) AS skills
ORDER BY score DESC

1. Graph pattern (MATCH)

2. Filter, using index if available (WHERE)

3. Create projection of result (RETURN)

Thanks to Ian Robinson

Page 39: Building Applications with a Graph Database

First Match, Second Match, Third Match

[Figures: three slides overlay the graph pattern (me, colleague, company, skill) on the example graph of Ian, Jacob, Tobias, ACME, and the skills Neo4j, Scala, Python, and C#, showing one match of the pattern per slide.]

Thanks to Ian Robinson

Page 42: Building Applications with a Graph Database

Result of the Query

+-------------------------------------+
| name    | score | skills            |
+-------------------------------------+
| "Ian"   | 2     | ["Scala","Neo4j"] |
| "Jacob" | 1     | ["Neo4j"]         |
+-------------------------------------+
2 rows

Thanks to Ian Robinson

Page 43: Building Applications with a Graph Database

Data Modeling Patterns


Page 44: Building Applications with a Graph Database

Ordered List of Entities


๏When

•Entities have a natural succession

•You need to traverse the sequence

๏ You may need to identify the beginning or end (first/last, earliest/latest, etc.)

๏Examples

• Event stream

•Episodes of a TV series

• Job history

Thanks to Ian Robinson

Page 45: Building Applications with a Graph Database

Example: Episodes in Doctor Who

[Figure: five episode nodes (title: Robot, The Ark in Space, The Sontaran Experiment, Genesis of the Daleks, Revenge of the Cybermen) chained by NEXT relationships, with a second list over the same nodes chained by NEXT IN PRODUCTION relationships. A season node (season: 12) is attached to the list with FIRST and LAST relationships, and a NEXT SEASON relationship connects it with the season node (season: 11).]

๏ Can interleave multiple lists with different semantics, using different relationship types

๏ Can organize lists into groups by group nodes

Thanks to Ian Robinson

Page 48: Building Applications with a Graph Database

Example: Recent events


Page 49: Building Applications with a Graph Database

Add to list

// create the structure we want for the most recent one
MATCH (test:Test{testId:{testId}})
MERGE (recents:Recent{type:"Test"})
CREATE (recents)-[:LAST_COMPLETED_TEST]->(test)

// start a new sub-query, carrying through ‘recents’ and ‘test’
WITH recents, test

// matching the relationship we just created...
MATCH (recents)-[:LAST_COMPLETED_TEST]->(test),
      // ...ensures that ‘previous’ is a different relationship
      (previousTest)<-[previous:LAST_COMPLETED_TEST]-(recents)
// if there was no previous, this sub-query will match nothing

// re-link to the previousTest
DELETE previous
CREATE (test)-[:PREVIOUS_COMPLETED_TEST]->(previousTest)

Page 55: Building Applications with a Graph Database

Get 5 most recently completed tests

MATCH (recents:Recent{type:"Test"}),
      (recents)-[:LAST_COMPLETED_TEST]->(last),
      tests=(last)-[:PREVIOUS_COMPLETED_TEST*0..5]->()
WITH tests ORDER BY length(tests) DESC LIMIT 1
RETURN extract(test IN nodes(tests) : test.testId) AS testIds

Get the next page of 5

MATCH (last:Test{testId:{testId}}),
      tests=(last)-[:PREVIOUS_COMPLETED_TEST*0..5]->()
WITH tests ORDER BY length(tests) DESC LIMIT 1
RETURN extract(test IN nodes(tests) : test.testId) AS testIds
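The list queries treat the graph as a singly linked list with a head pointer: the Recent node points at the newest test through LAST_COMPLETED_TEST, and each test points at its predecessor through PREVIOUS_COMPLETED_TEST. As a rough in-memory analogy only (plain Java, not a Neo4j API; the class and method names are invented for illustration), the add and page operations could be sketched like this:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory analogy of the "recent tests" list; all names are hypothetical. */
class RecentTests {
    // Plays the role of the LAST_COMPLETED_TEST relationship (null when empty).
    private String lastCompleted;
    // Plays the role of PREVIOUS_COMPLETED_TEST: testId -> previous testId.
    private final Map<String, String> previous = new HashMap<>();

    /** Mirrors "Add to list": link the new test to the old head, then move the head. */
    void add(String testId) {
        if (lastCompleted != null) {
            previous.put(testId, lastCompleted);
        }
        lastCompleted = testId;
    }

    /** Follows at most maxHops PREVIOUS links from startId, like *0..maxHops. */
    List<String> walk(String startId, int maxHops) {
        List<String> ids = new ArrayList<>();
        String current = startId;
        for (int hops = 0; current != null && hops <= maxHops; hops++) {
            ids.add(current);
            current = previous.get(current);
        }
        return ids;
    }

    /** Most recent tests, starting from the head of the list. */
    List<String> mostRecent(int maxHops) {
        return walk(lastCompleted, maxHops);
    }
}
```

A variable-length pattern such as `*0..5` may span up to five relationships and therefore up to six nodes; `walk` follows the same convention, so a caller paging from the last test of the previous page gets that test back as the first element.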

Page 57: Building Applications with a Graph Database

Active-Set pattern


Page 58: Building Applications with a Graph Database

Adding and Removing from Active Set

// Create cluster into active set
MATCH (clusters:ActiveSet{type:"Cluster"}),
      (creator:User{userId:{userId}})
CREATE (clusters)-[:CLUSTER]->(cluster:Cluster{
           clusterId: {clusterId}, clusterType: {clusterType} }),
       (cluster)-[:CREATED]->(:Event{timestamp:{creationDate}})
           <-[:ACTION]-(creator)

// Destroy cluster (remove it from the active set)
MATCH (cluster:Cluster{clusterId:{clusterId}})<-[r:CLUSTER]-(),
      (destroyer:User{userId:{userId}})
CREATE (cluster)-[:DESTROYED]->(:Event{timestamp:{destroyDate}})
           <-[:ACTION]-(destroyer)
DELETE r

Page 59: Building Applications with a Graph Database

Entities and Events/Actions

๏ Events/Actions often involve multiple parties

• E.g. the actor that caused the event, and the affected entity

๏ Can include other circumstantial detail, which may be common to multiple events

๏ Examples:

• Patrick worked for Acme from 2001 to 2005 as a Software Developer

• Sarah sent an email to Lucy, copying in David and Claire

๏ In environments with concurrent updates, events can be used to compute state

• No need to explicitly store state

Thanks to Ian Robinson

Page 60: Building Applications with a Graph Database

Represent the Event/Action as a Node

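[Figure: two examples. An employment node (from: 2001, to: 2005) is connected to a person (name: Patrick), a role (title: Software Developer), and a company (name: Acme) through EMPLOYMENT, ROLE, and COMPANY relationships. An email node (subject, content) is connected to its sender (name: Sarah) through FROM, to its recipient (name: Lucy) through TO, and to two further people through CC.]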

Thanks to Ian Robinson

Page 61: Building Applications with a Graph Database

Using Events to compute State

๏ Every update of an entity adds an event to it

๏ Every read query collects up all events for the entity

๏ Entity state is computed in your (Java) code from the events

import java.util.List;

public class Cluster {
    private final List<ClusterEvent> events;

    public ClusterState getState() {
        // Start from the initial state; every event implies a state.
        ClusterState state = ClusterState.AWAITING_LAUNCH;
        for ( ClusterEvent event : events ) {
            ClusterState candidate = event.impliedState();
            // Keep the most advanced state (assumes ClusterState is Comparable, e.g. an enum).
            if ( candidate.compareTo( state ) > 0 ) state = candidate;
        }
        return state;
    }
    // ...
}
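To make the idea concrete, here is a minimal self-contained sketch of state derived from events. Only `AWAITING_LAUNCH` and `impliedState()` come from the slide; the other state names, the `record` method, and the assumption that states form an ordered enum are illustrative guesses, not the Test Lab's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical states, declared in the order a cluster progresses through them. */
enum ClusterState { AWAITING_LAUNCH, RUNNING, DESTROYED }

/** Hypothetical event: each one implies the state the cluster has reached. */
final class ClusterEvent {
    private final ClusterState impliedState;

    ClusterEvent(ClusterState impliedState) {
        this.impliedState = impliedState;
    }

    ClusterState impliedState() {
        return impliedState;
    }
}

final class EventSourcedCluster {
    private final List<ClusterEvent> events = new ArrayList<>();

    /** An update only ever appends an event; no state field is stored. */
    void record(ClusterEvent event) {
        events.add(event);
    }

    /** The state is the furthest-progressed state implied by any event. */
    ClusterState state() {
        ClusterState state = ClusterState.AWAITING_LAUNCH;
        for (ClusterEvent event : events) {
            ClusterState candidate = event.impliedState();
            // Enum compareTo orders by declaration position.
            if (candidate.compareTo(state) > 0) {
                state = candidate;
            }
        }
        return state;
    }
}
```

Because the result is a maximum over all events, it does not depend on the order in which concurrent writers attached them.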

Page 62: Building Applications with a Graph Database

Repository pattern

๏ Centralize your queries into one or a few places

๏ Puts load logic (with translation from DB layer to App layer) next to store logic (with the reverse transformation logic)

๏ Simplifies testing

• If you use Java, test with Embedded Neo4j. Interact through Cypher (for the code under test). Verify using the object graph API.

๏ Simplifies model evolution - load/store & conversion encapsulated
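A repository along these lines might look like the sketch below. `CypherRunner` is a hypothetical stand-in for whatever executes Cypher in your setup (it is not a Neo4j interface), and the query text and class names are illustrative. The point is only that the query and the row-to-object translation sit in one class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for whatever runs Cypher (embedded engine or a driver). */
interface CypherRunner {
    List<Map<String, Object>> run(String query, Map<String, Object> parameters);
}

/** Application-layer object, independent of how it is stored. */
final class ClusterSummary {
    final String clusterId;
    final String clusterType;

    ClusterSummary(String clusterId, String clusterType) {
        this.clusterId = clusterId;
        this.clusterType = clusterType;
    }
}

/** All Cluster queries live here, next to the row-to-object translation. */
final class ClusterRepository {
    // Illustrative query, written in the parameter style used on these slides.
    static final String FIND_BY_ID =
        "MATCH (cluster:Cluster{clusterId:{clusterId}}) "
      + "RETURN cluster.clusterId AS clusterId, cluster.clusterType AS clusterType";

    private final CypherRunner runner;

    ClusterRepository(CypherRunner runner) {
        this.runner = runner;
    }

    /** Returns the cluster, or null if no row matched. */
    ClusterSummary findById(String clusterId) {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("clusterId", clusterId);
        List<Map<String, Object>> rows = runner.run(FIND_BY_ID, parameters);
        if (rows.isEmpty()) {
            return null;
        }
        Map<String, Object> row = rows.get(0);
        return new ClusterSummary((String) row.get("clusterId"),
                                  (String) row.get("clusterType"));
    }
}
```

The slide recommends testing against Embedded Neo4j; to keep this sketch dependency-free, a test could instead pass a stub `CypherRunner` that returns canned rows, which exercises the translation but not the query itself.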

Page 63: Building Applications with a Graph Database

Find all active clusters - Neo4j 2.0

MATCH (clusters:ActiveSet{type:"Cluster"}),
      (clusters)-[:CLUSTER]->(cluster), // each active cluster
      (server)-[?:MEMBER_OF]->(cluster), // 0 or more servers
      (server)-[e]->(event:Event), // any relationship to an Event
      (event)-[?]->(details) // 0 or more details
// group by (cluster, server, e, event)
WITH cluster, server, e, event, collect(details) as eventDetails
// A second WITH to do collect-of-collect
WITH cluster, server, // group by (cluster, server)
     collect({ type: type(e), data: event, details: eventDetails }) as serverEvents
// Group the servers (with events) for each cluster
WITH cluster, collect({ server: server, events: serverEvents }) as servers
MATCH (cluster)-[?:PARAMETERS]->(parameters),
      // Find all events for this cluster
      (cluster)-[e]->(event:Event)<-[:ACTION]-(actor)
RETURN cluster, servers, parameters,
       // Collect the events in three (aligned) collections
       collect({ type: type(e), data: event, actor: actor} ) as events

Page 71: Building Applications with a Graph Database

Get Cluster by ID - Neo4j 2.0

MATCH (cluster{type:{clusterId}}), // match single cluster by ID
      (server)-[?:MEMBER_OF]->(cluster), // 0 or more servers
      (server)-[e]->(event:Event), // any relationship to an Event
      (event)-[?]->(details) // 0 or more details
// group by (cluster, server, e, event)
WITH cluster, server, e, event, collect(details) as eventDetails
// A second WITH to do collect-of-collect
WITH cluster, server, // group by (cluster, server)
     collect({ type: type(e), data: event, details: eventDetails }) as serverEvents
// Group the servers (with events) for each cluster
WITH cluster, collect({ server: server, events: serverEvents }) as servers
MATCH (cluster)-[?:PARAMETERS]->(parameters),
      // Find all events for this cluster
      (cluster)-[e]->(event:Event)<-[:ACTION]-(actor)
RETURN cluster, servers, parameters,
       // Collect the events in three (aligned) collections
       collect({ type: type(e), data: event, actor: actor} ) as events

Page 72: Building Applications with a Graph Database

Query Code Management

64

Page 73: Building Applications with a Graph Database

Query Code Management

•Queries will have similar fragments.

• Store fragments as String constants in code

•Concatenate on load time to get full queries

•Keep all queries static - constants from load time

•Use query parameters for the things that change

•Use repository pattern to encapsulate queries

64

Page 74: Building Applications with a Graph Database

Query Code Management

•Queries will have similar fragments.

• Store fragments as String constants in code

•Concatenate on load time to get full queries

•Keep all queries static - constants from load time

•Use query parameters for the things that change

•Use repository pattern to encapsulate queries

What you’ll gain

• Improves testability - all your queries are known and tested

• Improves security - no injections (parameters are values only)

• Improves performance - the query optimizer cache will love you
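A minimal sketch of the bullets above, assuming a hypothetical `ClusterQueries` repository class. The `Cluster` label, the `id` property and the fragment names are illustrative assumptions, not taken from the Test Lab code. Fragments live in String constants, full queries are assembled once, and only parameter values change per call.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical repository that owns every query about clusters. */
final class ClusterQueries {
    // Shared fragments, stored once as String constants.
    // The label and property names here are assumptions for illustration.
    static final String MATCH_CLUSTER_BY_ID =
            "MATCH (cluster:Cluster {id: {clusterId}})\n";
    static final String MATCH_SERVERS =
            "MATCH (server)-[:MEMBER_OF]->(cluster)\n";

    // Full queries are assembled once, before any request is served,
    // and never change afterwards.
    static final String GET_CLUSTER =
            MATCH_CLUSTER_BY_ID + "RETURN cluster\n";
    static final String GET_SERVERS =
            MATCH_CLUSTER_BY_ID + MATCH_SERVERS + "RETURN server\n";

    private ClusterQueries() {
    }

    /** Only parameter values vary between executions, never the query text. */
    static Map<String, Object> clusterIdParameter(String clusterId) {
        Map<String, Object> parameters = new HashMap<>();
        parameters.put("clusterId", clusterId);
        return Collections.unmodifiableMap(parameters);
    }
}
```

Because the query strings are fixed, each one can be exercised by a test, and user input can only ever arrive as a parameter value.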

64

Page 75: Building Applications with a Graph Database

Multiple layers of models

65

Page 76: Building Applications with a Graph Database

Domain modeling layers

66

Client model (or UI model)

Application model

Database model

๏Multiple abstraction layers

๏Allows evolving the layers independently

•Client / UI

•Application / Business logic

•Database model

๏Specialize each layer for its purpose
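One way to picture the three layers, as a sketch only: the `Cluster` class, its fields and the view keys below are hypothetical and not taken from the Test Lab code. The database model is a plain map of properties, the application model is a typed class, and the client model is a view shaped for the UI.

```java
import java.util.HashMap;
import java.util.Map;

/** Application model: typed and used by the business logic (hypothetical). */
final class Cluster {
    final String id;
    final int serverCount;

    Cluster(String id, int serverCount) {
        this.id = id;
        this.serverCount = serverCount;
    }

    /** Database model to application model: the row is a map of properties. */
    static Cluster fromDatabaseRow(Map<String, Object> row) {
        String id = (String) row.get("id");
        int serverCount = ((Number) row.get("serverCount")).intValue();
        return new Cluster(id, serverCount);
    }

    /** Application model to client model: shaped for display, not for storage. */
    Map<String, Object> toClientView() {
        Map<String, Object> view = new HashMap<>();
        view.put("clusterId", id);
        view.put("size", serverCount == 1 ? "1 server" : serverCount + " servers");
        return view;
    }
}
```

Only the two mapping methods know about the neighbouring layers, so a change to the stored properties or to the UI format stays local to one of them.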

Page 77: Building Applications with a Graph Database

Implementing the domain

67

Page 78: Building Applications with a Graph Database

Choosing your deployment model

68

Page 79: Building Applications with a Graph Database

First: choosing a database!

๏First choice: Model (Relational, Graph, Document, ...)

๏Second choice: Vendor

•Neo4j - Market leader

•OrientDB - Document/Graph/SQL

•InfiniteGraph - Objectivity as Graph

•DEX - spin off from research group

๏Different vendor, different query language:

•Cypher (Neo4j)

•Gremlin / Blueprints (TinkerPop)

69

Page 80: Building Applications with a Graph Database

Choosing your deployment model

70

๏Standalone DB with the Application as a connecting client?

๏Database embedded in the Application?

๏Standalone DB with custom extensions?

๏Which client driver?

•Community developed? (endorsed)

•Roll your own?

•No “official” drivers (yet)

Page 81: Building Applications with a Graph Database

Standalone vs Embedded

71

Standalone

๏Pros:

•Familiar deployment

•Code in any language

๏Cons:

•“Interpreted” queries

•Round-trip for algorithmic queries

Embedded

๏Pros:

•Super fast

•Persistent, Transactional, infinite memory

๏Cons:

•Java only (any JVM language)

•Your App and the DB will contend for GC

Page 82: Building Applications with a Graph Database

Standalone with custom extensions?

๏A tradeoff attempt to get the best of both worlds.

•Use Cypher for most queries

•Write extensions with custom queries where performance is insufficient

๏Requires you to write Java (other JVM languages possible, but harder)

๏Trickier and more verbose API than writing Cypher

๏Can do algorithmic things (custom code) that Cypher can't

๏Better performance in many cases

•Cypher is constantly improving - the need is diminishing

๏Not supported by Neo4j Cloud hosting providers

๏Start with Standalone, add extensions when needed

72

Page 83: Building Applications with a Graph Database

Choosing a client driver

๏Spring Data Neo4j (by Neo Technology)

๏Neography (Ruby, by Max de Marzi, now at Neo Technology)

๏Neo4jPHP (PHP, by Josh Adell)

๏Neo4jClient (.NET, by Tatham Oddie and Romiko Derbynew)

๏Py2neo (Python, by Nigel Small)

๏Neocons (Clojure, by Michael Klishin)

๏and more: neo4j.org/develop/drivers

73

Page 84: Building Applications with a Graph Database

Quite simple to write your own...

๏Focus on the Cypher HTTP endpoint

๏Convert returned JSON to something convenient to work with

• I.e. convert Nodes & Relationships to maps of properties

๏Also need the indexing HTTP endpoint

• at least for Neo4j pre 2.0

๏Less than half a day's effort, 1265 LOC (>50% test code)

74

public interface Cypher
{
    CypherResult execute( CypherStatement statement )
        throws CypherExecutionException;

    void addToIndex( long nodeId, String indexName,
                     String propertyKey, String propertyValue )
        throws CypherExecutionException;

    void createNodeIfAbsent( String indexName,
                             String propertyKey, String propertyValue,
                             Map<String, Object> properties )
        throws CypherExecutionException;
}

public class CypherStatement // Builder pattern
{
    public CypherStatement( String... lines ) {...}

    public CypherStatement withParameter( String key, Object value )
    {
        ...
        return this;
    }
}

Official client coming w/ Neo4j 2.{low}
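The "convert returned JSON to something convenient" step might look roughly like the sketch below. It assumes the response has already been parsed by a JSON library of your choice into plain Lists and Maps. It also assumes the response shape: a `columns` array, a `data` array of rows, and nodes or relationships represented as objects with `self` and `data` keys. The `CypherRows` class and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that turns a parsed Cypher response into row maps. */
final class CypherRows {
    private CypherRows() {
    }

    /** Combines the "columns" and "data" arrays into one map per row. */
    static List<Map<String, Object>> toRows(List<String> columns,
                                            List<List<Object>> data) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (List<Object> values : data) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 0; i < columns.size(); i++) {
                row.put(columns.get(i), unwrap(values.get(i)));
            }
            rows.add(row);
        }
        return rows;
    }

    /**
     * Assumption: a node or relationship arrives as an object with "self"
     * and "data" keys. It is replaced by its property map ("data").
     * Every other value is returned unchanged.
     */
    static Object unwrap(Object value) {
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            if (map.containsKey("self") && map.get("data") instanceof Map) {
                return map.get("data");
            }
        }
        return value;
    }
}
```

Keeping this conversion in one place means the rest of the application only ever sees maps of properties, never the wire format.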

Page 85: Building Applications with a Graph Database

The choices we made for our Test Lab

๏Use AWS

•Mainly for EC2, but once you have bought in to AWS there are a lot of other services that will serve you well

‣SQS for sending work between servers

‣SNS for sending messages back to the manager

‣S3 for storing files (benchmark results, logs, etc.)

๏Use Neo4j Cloud

•To have an app where we try it out ourselves

•Make backup and availability a separate concern

75

Page 86: Building Applications with a Graph Database

Deploying and maintaining your application and DB

76

Page 87: Building Applications with a Graph Database

Operational Concerns

๏Backups

•Weekly full backups

•Daily incremental backups

•Keep logs for 48 hours (so an incremental backup still works even if it runs a bit late)

•Why that frequency?

‣Fits the load schedule of most apps

‣Provides very good recovery ability

๏Monitoring

• JMX and Logback supported

•Notifications (e.g. Nagios) being worked on

77

Page 88: Building Applications with a Graph Database

Scaling Neo4j

78

๏Neo4j HA provides

• Fault tolerance by redundancy

•Read scalability by replication

•Writes at same levels as a single instance

๏Neo4j does not yet scale “horizontally”, i.e. shard automatically; this is being worked on

•For reads your application can route queries for certain parts of your domain to certain hosts, effectively “sharding” the cache in Neo4j, keeping different data elements in RAM on different machines
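The read routing described above could be sketched as follows. The `ReadRouter` class, the routing key and the host names are hypothetical, and the hash-modulo mapping is just one simple way to pick a host. The point is that the same part of the domain always reads from the same machine, so that machine keeps it in cache.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical router that pins each part of the domain to one read host. */
final class ReadRouter {
    private final List<String> hosts;

    ReadRouter(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.hosts = new ArrayList<>(hosts);
    }

    /**
     * The same routing key always maps to the same host.
     * floorMod keeps the index non-negative for negative hash codes.
     */
    String hostFor(String routingKey) {
        int index = Math.floorMod(routingKey.hashCode(), hosts.size());
        return hosts.get(index);
    }
}
```

With plain modulo, most keys move to a different host whenever the host list changes, so caches have to warm up again after adding or removing a machine.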

Page 89: Building Applications with a Graph Database

Pitfalls and Anti-Patterns

79

Page 90: Building Applications with a Graph Database

Modeling Entities as Relationships

80

๏Limits data model evolution

•A relationship connects two things

•Modeling an entity as a relationship prevents it from being related to more than two things

๏Smells:

• Lots of attribute-like properties

•Use of relationships as starting point of queries

๏Entities hidden in verbs:

• E.g. emailed, reviewed

Thanks to Ian Robinson

Page 91: Building Applications with a Graph Database

Example: Movie Reviews

81

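[Figure: graph of two Person nodes (name: Tobias; name: Jonas) and two Movie nodes (title: The Hobbit; title: The Matrix), connected by three REVIEWED relationships. Each REVIEWED relationship carries the review as properties: (text: This is the ..., source: amazon.com, date: 20100515), (text: When I saw ..., source: imdb.com, date: 20121218), (text: My brother and ..., source: filmreview.org, date: 20121218).]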

Thanks to Ian Robinson

Page 92: Building Applications with a Graph Database

New Requirement: Comment on Reviews

82

๏Allow users to comment on each other's reviews

๏Not possible in this model: a review is a relationship, so it can’t be connected to another entity

[Figure: the same graph as on the previous slide. Two Person nodes (name: Tobias; name: Jonas) and two Movie nodes (title: The Hobbit; title: The Matrix) are connected by three REVIEWED relationships that carry text, source and date properties.]

Thanks to Ian Robinson

Page 93: Building Applications with a Graph Database

Revised model

83

[Figure: revised graph with the same two Person nodes (name: Tobias; name: Jonas) and two Movie nodes (title: The Hobbit; title: The Matrix). Each of the three reviews is now a Review node carrying the text, source and date properties. WROTE_REVIEW relationships connect Person nodes to Review nodes, and REVIEW_OF relationships connect Review nodes to Movie nodes.]

Thanks to Ian Robinson

Page 94: Building Applications with a Graph Database

Evolving your application and your domain

84

Page 95: Building Applications with a Graph Database

Updating the domain model

85

๏Query first, Whiteboard First, Examples first...

๏Update your application domain model to support both DB model versions

•Write the new version only

๏Test, test, test

๏Re-deploy

๏Run background job to update from old model version to new

•Can be as simple as a single query... (but can also be more complex)

๏Remove support for old model

๏Re-deploy
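During the transition, "support both DB model versions" can be as small as a reader that accepts either shape. The sketch below borrows the Trade/currency example from the refactoring slides later in this deck. It assumes a hypothetical read query that returns the old property under the alias `currencyProperty` and the code of the new Currency node under the alias `currencyCode`, with whichever one is absent coming back as null.

```java
import java.util.Map;

/** Hypothetical reader that understands both model versions of a Trade. */
final class TradeCurrency {
    private TradeCurrency() {
    }

    /**
     * Prefers the new model (a related Currency node) and falls back to the
     * old model (a currency property on the Trade itself).
     * The column aliases are assumptions made for this sketch.
     */
    static String read(Map<String, Object> row) {
        Object fromNode = row.get("currencyCode");
        if (fromNode != null) {
            return (String) fromNode;
        }
        return (String) row.get("currencyProperty");
    }
}
```

Writes go to the new shape only, so once the background job has converted the remaining old data the fallback branch can be deleted.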

Page 96: Building Applications with a Graph Database

Refactoring your graph

Definition

•Restructure graph without changing informational semantics

Reasons

• Improve design

• Enhance performance

•Accommodate new functionality

• Enable iterative and incremental development of data model

The common ones

•Convert a Property to a Node

•Convert a Relationship to a Node

86

Thanks to Ian Robinson

Page 97: Building Applications with a Graph Database

Convert a Property to a Node

// find nodes that have the currency property
MATCH (t:Trade) WHERE has(t.currency)

// limit the size of the transaction
WITH t LIMIT {batchSize}

// find or create the (unique) node for this currency
MERGE (c:Currency{code:t.currency})

// create relationship to the currency node
CREATE (t)-[:CURRENCY]->(c)

// remove the property
REMOVE t.currency

// when the returned count is smaller than batchSize,
// you are done
RETURN count(t) AS numberRemoved

87

Thanks to Ian Robinson

Page 98: Building Applications with a Graph Database

Convert a Relationship to a Node

// find emailed relationships
MATCH (a:User)-[r:EMAILED]->(b:User)

// limit the size of each transaction
WITH a, r, b LIMIT {batchSize}

// create a new node and relationships for it
CREATE (a)<-[:FROM]-(:Email{ content: r.content, title: r.title })-[:TO]->(b)

// delete the old relationship
DELETE r

// when the returned count is smaller than batchSize,
// you are done
RETURN count(r) AS numberDeleted

88

Thanks to Ian Robinson
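Both refactoring queries are meant to be run repeatedly until the returned count drops below the batch size. A minimal sketch of that driver loop follows. The `BatchedRefactoring` class is hypothetical, and `runBatch` stands in for whatever code executes one of the queries with the given `batchSize` parameter in its own transaction and returns the reported count.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical driver for a batched graph refactoring. */
final class BatchedRefactoring {
    private BatchedRefactoring() {
    }

    /**
     * Calls runBatch(batchSize) until it reports fewer changes than
     * batchSize, which means nothing is left to convert.
     * Returns the total number of changes reported.
     */
    static long runToCompletion(IntUnaryOperator runBatch, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        int changed;
        do {
            // One call corresponds to one execution of the refactoring query.
            changed = runBatch.applyAsInt(batchSize);
            total += changed;
        } while (changed >= batchSize);
        return total;
    }
}
```

If the amount of old data is an exact multiple of the batch size, the loop makes one extra call that reports zero changes before it stops.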

Page 99: Building Applications with a Graph Database

Neo4j 2.0

89

Page 100: Building Applications with a Graph Database

Neo4j 2.0

90

๏All about making it more convenient to model data

๏“Labels” for Nodes, enable you to model your types

๏ Indexing performed by the database, automatically, based on Labels

๏Also adds user definable constraints, based on Labels

๏START clause is gone from Cypher; instead, MATCH uses the schema information from labels used in your query to determine the best start points.

Page 101: Building Applications with a Graph Database

Migrating to 2.0

91

๏Test queries with new Neo4j version

• Explicitly specify Cypher version for queries that fail (prefix with CYPHER 1.9 - this will work with existing db)

๏Redeploy app with queries known to work on both versions

๏Update the database - rolling with HA, downtime with single db

๏Very similar process for updating the domain model...

•Create schema for your domain with indexes to replace your manual indexes

•Make your write-queries add labels

•Update all existing data: add labels

•Change reads to use MATCH with labels instead of START

•Drop old (manual) indexes

Page 102: Building Applications with a Graph Database

Summary

92

Page 103: Building Applications with a Graph Database

Building apps with Graph Databases

๏Model for your Queries, draw on a Whiteboard, using Examples, avoid Redundancy, Thank You.

๏Use Cypher where possible, write Java extensions if needed for performance (frequently not needed - just update to next version)

๏ Incremental modeling approach supported and pleasant!

๏Most Application Development Best Practices are the same!

๏Neo4j 2.0 makes modeling a whole lot nicer

•makes Cypher complete - no need to index manually!

93

Page 104: Building Applications with a Graph Database

http://neotechnology.com

Questions?