Roland Guijt - Introduction to Graph Databases and Neo4j



Introduction to Graph Databases and Neo4j

What Is a Graph Database?

Roland Guijt, www.rmgsolutions.nl

@rolandguijt

Agenda

What Is a Graph?

What Is a Graph Database?

Why a Graph Database?

Graph Databases vs. Relational Databases

Graph Databases vs. NoSQL Databases

Examples of Graph Databases

What Is a Graph Database?


[Figure: an example social graph. Mark, Joanna, Nick and Megan are nodes connected by FOLLOWS relationships, plus one LOVES relationship annotated "@Nick I love you".]

Graphs:
• Easily extendable and expandable
• Friendly to the human brain
• Whiteboard compatible

A graph database is a database that uses graph structures to represent and store data

Graph Databases

• All about relationships
• Agility
• Flexibility
• Query language
• Performance

Property Graph Model

• Contains nodes and relationships
• Nodes and relationships contain properties
• Relationships are named and directed, with a start and an end node

[Figure: a node Joanna (Name: Joanna, City: Salt Lake City, Married: true) connected by a WORKS_FOR relationship (Since: 2010/1/1) to a node Pluralsight (Name: Pluralsight, City: Salt Lake City, Rocks: true).]
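The property graph model can be sketched in a few lines of Python. This is a minimal in-memory illustration that reuses the figure's data; the class and field names (`Node`, `Relationship`, `rel_type`) are hypothetical and are not Neo4j's API or storage format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Nodes carry arbitrary key/value properties; no schema is enforced.
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    # Relationships are named (rel_type) and directed (start -> end),
    # and can carry properties of their own.
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


joanna = Node({"Name": "Joanna", "City": "Salt Lake City", "Married": True})
pluralsight = Node({"Name": "Pluralsight", "City": "Salt Lake City", "Rocks": True})
works_for = Relationship("WORKS_FOR", joanna, pluralsight, {"Since": "2010/1/1"})

# Following the relationship answers "where does Joanna work, and since when?"
print(works_for.start.properties["Name"], works_for.rel_type,
      works_for.end.properties["Name"], works_for.properties["Since"])
# prints: Joanna WORKS_FOR Pluralsight 2010/1/1
```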

Why a Graph Database?

“Use a relational database for all applications”

“Consider the type of database for every application you’re writing”

Why a Graph Database?

• Flexible schema
• Structure and queries are brain friendly (= easier)
• Highly related data

Graph Databases vs. Relational Databases

Relational | Graph
Tables | Nodes
Schema with nullables | No schema
Relations with foreign keys | Relation is a first-class citizen
Related data fetched with joins | Related data fetched with a pattern

Relational Database Advantages
• Calculations within one table
• Grouping of data
• Highly structured data

The Foreign Key System

Customer
CustomerId | Name | City
1 | Joanna | Salt Lake City

Order
OrderId | CustomerId | Date
1 | 1 | 2015/1/1

LineItem
OrderId | ProductId | Quantity
1 | 1 | 5

Product
ProductId | Description | Use
1 | Candle | Inside
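To make the foreign-key chain concrete, here is a sketch that loads the four tables above into an in-memory SQLite database (Python's standard `sqlite3` module, standing in for a production RDBMS) and asks "which products did Joanna order?". `Order` is quoted because it is a reserved word in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerId INTEGER PRIMARY KEY, Name TEXT, City TEXT);
CREATE TABLE "Order" (OrderId INTEGER PRIMARY KEY,
                      CustomerId INTEGER REFERENCES Customer(CustomerId), Date TEXT);
CREATE TABLE LineItem (OrderId INTEGER REFERENCES "Order"(OrderId),
                       ProductId INTEGER REFERENCES Product(ProductId), Quantity INTEGER);
CREATE TABLE Product (ProductId INTEGER PRIMARY KEY, Description TEXT, Use TEXT);
INSERT INTO Customer VALUES (1, 'Joanna', 'Salt Lake City');
INSERT INTO "Order" VALUES (1, 1, '2015/1/1');
INSERT INTO LineItem VALUES (1, 1, 5);
INSERT INTO Product VALUES (1, 'Candle', 'Inside');
""")

# Three joins are needed to get from a customer to the products she ordered.
rows = conn.execute("""
SELECT c.Name, p.Description, li.Quantity
FROM Customer c
JOIN "Order" o ON o.CustomerId = c.CustomerId
JOIN LineItem li ON li.OrderId = o.OrderId
JOIN Product p ON p.ProductId = li.ProductId
WHERE c.Name = 'Joanna'
""").fetchall()
print(rows)  # [('Joanna', 'Candle', 5)]
```

Every extra level of relatedness adds another join, which is the cost the next experiment measures.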

Partner and Vukotic’s Experiment

• Social network
• Friends-of-friends structure
• MySQL and Neo4j
• 1,000,000 people
• Each with an average of 50 friends
• Depth 2: find all friends of a user's friends
• Depth 3: find all friends of friends of a user's friends
• Etc.

Depth | Rel. DB (s) | Neo4j (s) | # records
2 | 0.016 | 0.01 | ~2,500
3 | 30.267 | 0.168 | ~110,000
4 | 1543.505 | 1.359 | ~600,000
5 | Unfinished | 2.132 | ~8,000,000
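The depth-n queries in the experiment come down to repeated neighbour lookups. The sketch below uses a small hypothetical friendship map (not the experiment's data) and expands the frontier one hop at a time; each hop reads a person's friend list directly instead of joining a friendship table against itself, which is roughly why the graph side scales better as depth grows.

```python
def friends_at_depth(graph, start, depth):
    """People whose shortest distance from `start` is exactly `depth` hops."""
    seen = {start}
    frontier = {start}
    for _ in range(depth):
        next_frontier = set()
        for person in frontier:
            for friend in graph.get(person, ()):  # direct neighbour lookup, no join
                if friend not in seen:
                    seen.add(friend)
                    next_frontier.add(friend)
        frontier = next_frontier
    return frontier


# Hypothetical, symmetric friendship graph.
graph = {
    "Ann": ["Bob", "Cid"],
    "Bob": ["Ann", "Dee"],
    "Cid": ["Ann", "Eve"],
    "Dee": ["Bob", "Fay"],
    "Eve": ["Cid"],
    "Fay": ["Dee"],
}
print(sorted(friends_at_depth(graph, "Ann", 2)))  # ['Dee', 'Eve']
```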

Relational Database Normalization

Normalization is encouraged

Created when disk space was expensive

A Document

Customer
{
  Name: "Joanna",
  City: "Salt Lake City",
  Order: {
    id: 1,
    Date: "2015/1/1",
    LineItems: [{
      Quantity: 3,
      Product: { Description: "Candle", Use: "Inside" }
    }]
  }
}

Document Databases
• Duplication of data is not something to avoid
• Copy master data
• All related data in one entity

Documents

Customer
{
  Name: "Joanna",
  City: "Salt Lake City",
  Order: {
    id: 1,
    Date: "2015/1/1",
    LineItems: [{
      Quantity: 3,
      Product: { Description: "Candle", Use: "Inside" }
    }]
  }
}

Customer
{
  Name: "Peter",
  City: "Dallas",
  Order: {
    id: 2,
    Date: "2015/2/1",
    LineItems: [{
      Quantity: 2,
      Product: { Description: "Matches", Use: "Inside" }
    }]
  }
}

Graph Databases vs. Document Databases

Document | Graph
Documents | Nodes
No schema | No schema
Relations with foreign keys or embedded | Relation is a first-class citizen
Related data fetched with joins or embedded | Related data fetched with a pattern

A Social Graph

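[Figure: a social graph linking people (John, Cathy, Deborah, Jennifer, Mike) to their skills (Graphs, ALM, Testing, Java, .Net, Web API) and to the companies they work for (Cyber IT, Active).]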

Who shares Cathy’s skills?

Who works in the same company as Cathy and shares the most skills?
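Both questions are short walks through the graph: person -> skill <- person, and person -> company <- person. The sketch below answers them over hypothetical edges, since the figure's actual connections are not recoverable; the dictionaries `employer` and `has_skill` are illustrative stand-ins for the graph.

```python
# Hypothetical data: the figure's actual edges are not recoverable.
employer = {
    "John": "Cyber IT", "Cathy": "Cyber IT", "Deborah": "Cyber IT",
    "Jennifer": "Active", "Mike": "Active",
}
has_skill = {
    "John": {"Java", "Testing"},
    "Cathy": {"Graphs", "Java", ".Net", "Web API"},
    "Deborah": {".Net", "Web API", "ALM"},
    "Jennifer": {"Graphs", "Java", ".Net"},
    "Mike": {"ALM"},
}


def shared_skills(person):
    """Who shares skills with `person`? Maps each such person to the shared skills."""
    mine = has_skill[person]
    return {other: mine & skills
            for other, skills in has_skill.items()
            if other != person and mine & skills}


def best_colleague(person):
    """Who works at the same company as `person` and shares the most skills?"""
    shared = shared_skills(person)
    colleagues = [p for p in shared if employer[p] == employer[person]]
    return max(colleagues, key=lambda p: len(shared[p]), default=None)


print(sorted(shared_skills("Cathy")))  # ['Deborah', 'Jennifer', 'John']
print(best_colleague("Cathy"))         # Deborah
```

With this data Jennifer shares the most skills overall, but she works elsewhere, so the second question picks Deborah.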

Security

[Figure: a security graph linking users (John, Cathy, Deborah, Jennifer, Mike) to roles (Admin, Editor, Poster, Reader) and roles to rights (Login, Read, Insert, Update, Delete, Grant rights).]

Which rights does Deborah have?

Who edited a blog post and when?
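"Which rights does Deborah have?" is a two-hop walk: user -> role -> right. Here is a sketch over hypothetical role assignments, since the figure's actual edges are not recoverable; `member_of` and `grants` are illustrative names.

```python
# Hypothetical assignments: the figure's actual edges are not recoverable.
member_of = {"Cathy": ["Admin"], "Deborah": ["Editor", "Poster"], "Mike": ["Reader"]}
grants = {
    "Admin": {"Login", "Read", "Insert", "Update", "Delete", "Grant rights"},
    "Editor": {"Login", "Read", "Update"},
    "Poster": {"Login", "Read", "Insert"},
    "Reader": {"Login", "Read"},
}


def rights_of(user):
    """Walk user -> role -> right and collect every right reachable that way."""
    rights = set()
    for role in member_of.get(user, []):
        rights |= grants[role]
    return rights


print(sorted(rights_of("Deborah")))  # ['Insert', 'Login', 'Read', 'Update']
```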

Logistics

Summary

• A graph is a collection of nodes connected by relationships.
• Graph databases are flexible and performant with highly related data.
• All database types have their place.
• Relational databases are suitable for reporting and calculations on a single table. Weak point: related tables.
• Document databases are suitable for storing objects. Weak point: related documents.
• Graph databases are great in many scenarios, but not all.

What’s Next?

• Neo4j

Introducing Neo4j


Agenda

What Is Neo4j?

Neo4j’s Editions

Neo4j’s Graph

Installing Neo4j

Getting Started

Settings

What Is Neo4j?

• ACID
• Graph database
• Java
• Enterprise features
• Billions of entities
• REST API

What Is Neo4j?

• Graph database
• Nodes
• Relationships
• No schema
• Cypher

Neo4j Editions

Building Block 1: Node

• Schemaless entities/objects
• Contain properties
  - Key = string
  - Value = primitive data type
  - Indexing
  - Unique constraint
• Can have labels

Data Types

boolean (true/false)

byte (8 bits)

short (16 bits)

int (32 bits)

long (64 bits)

float (32 bits)

double (64 bits)

char (Unicode)

string (Unicode)

arrays

- Set implicitly
- Automatic conversion when updating
- No nulls
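The three rules above can be mimicked on a plain Python dict. This is a hypothetical helper, not Neo4j code: Python's `int` stands in for byte/short/int/long, `float` for float/double and `str` for char/string, and the array check is a simplified "one element type" rule.

```python
PRIMITIVES = (bool, int, float, str)


def set_property(props, key, value):
    """Hypothetical helper mimicking the property rules above on a plain dict."""
    if value is None:
        # No nulls: assigning null removes the property instead of storing it.
        props.pop(key, None)
        return
    if isinstance(value, list):
        # Arrays hold primitive values of a single type (simplified check).
        if not all(isinstance(v, PRIMITIVES) for v in value):
            raise TypeError(f"{key!r}: arrays may only contain primitive values")
        if len({type(v) for v in value}) > 1:
            raise TypeError(f"{key!r}: arrays must be homogeneous")
    elif not isinstance(value, PRIMITIVES):
        raise TypeError(f"{key!r}: unsupported property type {type(value).__name__}")
    # The type is implied by the value; a later write may store another type.
    props[key] = value


p = {}
set_property(p, "name", "Joanna")
set_property(p, "salary", 100000)
set_property(p, "salary", None)
print(p)  # {'name': 'Joanna'}
```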

Building Block 2: Relationships

• Connect nodes
• Are directed
• Are named
• Can contain properties (same as in nodes)

Installing Neo4j

Windows
• Desktop app (installer)
• Console app
• Windows service

Linux
• Unix console app
• Linux service

Mac OS X
• Homebrew
• Terminal
• OS X service

• Community Edition is a great way to start
• Requires a JDK

Demo: Installing Neo4j on Windows

Demo: Getting Started

Demo: Settings

Summary

• Neo4j is a reliable graph database implemented in Java, with enterprise features and a REST API.
• It uses the property graph model and is schemaless by default.
• Cypher is Neo4j’s primary query language.
• The Community Edition is free and open source.
• Enterprise features are available in the other editions.
• Node properties get an implicit data type when set with a query, and support indexing and a unique constraint.
• Relationships support the same kind of properties.
• Neo4j can be installed as a desktop app (Windows only), as a command-line app and as a service.
• Neo4j is configurable using text files.

What’s Next?

• Cypher

Querying Data


Agenda

What Is Cypher?

MATCH and RETURN

Other Language Elements

Advanced Syntax

Data Modelling

What Is Cypher?

• Graph query language
• Declarative
• Easy on the brain
• Pattern matching
• Clauses

Cypher Is About Pattern Matching

Recipe to make a query:
- Think of a whiteboard-friendly pattern or structure you would like to retrieve
- Translate it into ASCII art
- Surround it with clauses

[Diagram: Node -PLAYED-> Node]

()-[:PLAYED]->()

Cypher Is About Pattern Matching

[Diagram: Actor -PLAYED-> Character]

(:Actor)-[:PLAYED]->(:Character)

Cypher Is About Pattern Matching

[Diagram: Actor (name: Matt Smith) -PLAYED-> Character]

(:Actor {name: 'Matt Smith'})-[:PLAYED]->(:Character)

Cypher Is About Pattern Matching

[Diagram: Actor (name: Matt Smith) -PLAYED-> Character -COMES_FROM-> Planet (name: Gallifrey)]

(:Actor {name: 'Matt Smith'})-[:PLAYED]->(:Character)-[:COMES_FROM]->(:Planet {name: 'Gallifrey'})

The MATCH and RETURN Clauses

[Diagram: Actor (name: Matt Smith) -PLAYED-> Character]

(:Actor {name: 'Matt Smith'})-[:PLAYED]->(:Character)

(:Actor {name: 'Matt Smith'})-[:PLAYED]->(c:Character)

MATCH (:Actor {name: 'Matt Smith'})-[:PLAYED]->(c:Character)
RETURN c
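What MATCH does with such a pattern can be imitated on a tiny in-memory graph: keep only the relationships of the right type whose start and end nodes carry the required labels and properties. The sample graph and the helper names (`node_matches`, `played_by`) are hypothetical; Neo4j's query planner works very differently.

```python
# Hypothetical sample graph: node id -> (label, properties), plus typed, directed edges.
nodes = {
    1: ("Actor", {"name": "Matt Smith"}),
    2: ("Character", {"name": "Doctor"}),
    3: ("Planet", {"name": "Gallifrey"}),
    4: ("Actor", {"name": "Karen Gillan"}),
    5: ("Character", {"name": "Amy Pond"}),
}
edges = [(1, "PLAYED", 2), (2, "COMES_FROM", 3), (4, "PLAYED", 5)]


def node_matches(node_id, label, props):
    """True if the node has `label` and every property in `props`, like (:Label {k: v})."""
    node_label, node_props = nodes[node_id]
    return node_label == label and all(node_props.get(k) == v for k, v in props.items())


def played_by(actor_name):
    """MATCH (:Actor {name: actor_name})-[:PLAYED]->(c:Character) RETURN c.name"""
    return [nodes[end][1]["name"]
            for start, rel_type, end in edges
            if rel_type == "PLAYED"
            and node_matches(start, "Actor", {"name": actor_name})
            and node_matches(end, "Character", {})]


print(played_by("Matt Smith"))  # ['Doctor']
```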

Query Examples: 2 Loose Ends

MATCH (actors:Actor)-[:REGENERATED_TO]-> (others)

RETURN actors.name, others.name;

Return the name property of every node with the Actor label, side by side with the name property of each node at the other end of its REGENERATED_TO relationship.

MATCH (:Character{name:'Doctor'})<-[:ENEMY_OF]-(:Character)-[:COMES_FROM]->(p:Planet)

RETURN p.name as Planet, count(p) AS Count;

Collect all nodes with the Character label that have an ENEMY_OF relationship with the Doctor, and check whether they have a COMES_FROM relationship with a node with the Planet label. Return the names of those planets along with the number of occurrences.
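`count(p)` works here because Cypher groups implicitly: every non-aggregated column in RETURN (here `p.name`) acts as a grouping key. A rough Python equivalent, over a hypothetical list of planet names the MATCH might produce (one entry per matched path):

```python
from collections import Counter

# Hypothetical MATCH output: one planet name per matched enemy -> planet path.
matched_planet_names = ["Skaro", "Mondas", "Skaro", "Gallifrey", "Skaro"]

# RETURN p.name AS Planet, count(p) AS Count: group on p.name, count per group.
rows = sorted(Counter(matched_planet_names).items())
for planet, count in rows:
    print(planet, count)
```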

Query Examples: More Complex

MATCH (:Actor{name:"Matt Smith"}) -[:APPEARED_IN]-> (ep:Episode) <-[:APPEARED_IN]- (:Character{name:'Amy Pond'}),

(ep) <-[:APPEARED_IN]-(enemies:Character) <-[:ENEMY_OF]-(:Character{name:'Doctor'})

RETURN ep AS Episode, collect(enemies.name) AS Enemies;

Give me all the episodes that both the character Amy Pond and the actor Matt Smith appeared in, and list beside each episode the enemies of the Doctor that appeared in it.


Where

MATCH (:Actor {name: 'Matt Smith'})-[:PLAYED]->(c:Character)
RETURN c

MATCH (a:Actor)-[:PLAYED]->(c:Character)
WHERE a.name = 'Matt Smith'
RETURN c

• Filters result set

Order By

MATCH (a:Actor)-[:PLAYED]->(c:Character)
WHERE a.name = 'Matt Smith'
RETURN c
ORDER BY c.name

• Orders the result set
• Supports multiple properties
• Use DESC to reverse the order

Skip and Limit

MATCH (:Actor {name: 'Matt Smith'})-[:PLAYED]->(c:Character)
RETURN c
SKIP 5
LIMIT 10

• SKIP skips the first rows of the result set
• LIMIT limits the number of rows returned
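ORDER BY, SKIP and LIMIT apply in that order: sort, drop the first rows, then cap what remains. A minimal Python analogue on an in-memory list (the helper `order_skip_limit` is hypothetical):

```python
def order_skip_limit(rows, key, descending=False, skip=0, limit=None):
    """Mimic ORDER BY key [DESC] SKIP skip LIMIT limit on an in-memory sequence."""
    ordered = sorted(rows, key=key, reverse=descending)  # ORDER BY ... [DESC]
    end = None if limit is None else skip + limit          # LIMIT counts after SKIP
    return ordered[skip:end]                               # SKIP drops the first rows


print(order_skip_limit(range(20), key=lambda x: x, skip=5, limit=10))
# [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```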

Union

MATCH (a:Actor)
RETURN a.name AS name
UNION
MATCH (c:Character)
RETURN c.name AS name

• Glues result sets together (every part must return the same column names)
• Use UNION ALL to include duplicates

With

MATCH (a:Actor)
WITH a.name AS name, count(a) AS count
ORDER BY name
WHERE count > 10
RETURN name

• Manipulates the result set for the rest of the query
• Can have an ORDER BY clause

Start (legacy)

• Was used to access legacy indexes
• Provided a starting point for the pattern

Predicates

• Return true or false for a given input
• Input can be properties or patterns
• Mostly used in the WHERE clause
• ALL, ANY, NONE, SINGLE, EXISTS

MATCH (a:Actor)
WHERE EXISTS((a)-[:PLAYED]->())
RETURN a.name

Scalar Functions

• Return a single value
• LENGTH, TYPE, ID, COALESCE, HEAD, LAST, TIMESTAMP, TOINT, TOFLOAT, TOSTRING

MATCH p = (:Actor)-[:PLAYED]->(:Character)
RETURN LENGTH(p)

Collection Functions

• Return collections of ‘things’
• NODES, RELATIONSHIPS, LABELS
• EXTRACT, FILTER, TAIL
• RANGE, REDUCE

MATCH p = (:Actor)-[:PLAYED]->(:Character)
RETURN NODES(p)

Mathematical Functions

• ABS, ACOS, ASIN, ATAN, COS, COT, DEGREES, EXP, FLOOR, ROUND, SQRT, etc.

String Functions

• STR, REPLACE, SUBSTRING, LEFT, RIGHT, LTRIM, RTRIM, TRIM, LOWER, UPPER, SPLIT

Advanced Syntax: Directionless Relationships

MATCH (:Episode)-[:PREVIOUS]-(e:Episode)
RETURN e

Advanced Syntax: No Relationship Defined

MATCH (:Episode)-->(e:Episode)
RETURN e

Advanced Syntax: No Relationship Name

MATCH (:Actor)-[]->()-[]->(p:Planet)
RETURN p

Advanced Syntax: Number of Hops

MATCH (:Actor)-[*2]->(p:Planet)
RETURN p

MATCH (c:Character)-[:COMPANION_OF*1..2]-(:Character)
RETURN c

Advanced Syntax: Shortest Path

MATCH (earth:Planet {name: "Earth"}),
      (gallifrey:Planet {name: "Gallifrey"}),
      p = shortestPath((earth)-[*..15]-(gallifrey))
RETURN p
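`shortestPath` returns one shortest path of at most 15 relationships, ignoring direction. A breadth-first search gives the same kind of answer on a small graph; this is a sketch over a hypothetical adjacency map, not Neo4j's implementation. Listing each relationship from both ends is what makes direction irrelevant here.

```python
from collections import deque


def shortest_path(graph, start, goal, max_hops=15):
    """Breadth-first search following at most max_hops relationships.
    Returns the list of nodes on one shortest path, or None."""
    parents = {start: None}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        if hops == max_hops:
            continue  # do not expand beyond the maximum length
        for neighbour in graph.get(node, ()):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append((neighbour, hops + 1))
    return None


# Hypothetical undirected adjacency (each relationship listed from both ends).
graph = {
    "Earth": ["Amy Pond"],
    "Amy Pond": ["Earth", "Doctor"],
    "Doctor": ["Amy Pond", "Gallifrey"],
    "Gallifrey": ["Doctor"],
}
print(shortest_path(graph, "Earth", "Gallifrey"))
# ['Earth', 'Amy Pond', 'Doctor', 'Gallifrey']
```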

Advanced Syntax: Optional MATCH

MATCH (a:Character)
OPTIONAL MATCH (a)-[r:COMES_FROM]->()
RETURN r

Summary

• Cypher is a powerful, declarative query language for Neo4j.
• It uses patterns to query data.
• Cypher’s main clauses are MATCH and RETURN.
• There are more SQL-like clauses, such as WHERE.
• Many powerful functions complement the language.
• Going beyond the basic syntax opens up even more powerful query possibilities.

What’s Next?

• Manipulating data

Manipulating Data


Agenda

Creating, Updating and Deleting

Importing CSV

Indexes and Unique Constraint

Advanced Data Manipulation

Create: creates nodes and relationships

CREATE (n)

CREATE (n:Actor {name: 'Peter Capaldi'}) RETURN n

MATCH (matt:Actor {name: 'Matt Smith'}), (chris:Actor {name: 'Christopher Eccleston'})
CREATE (matt)-[:REGENERATED_TO]->(chris)

Create Complete Path

CREATE p = (:Actor {name: 'Peter Capaldi'})-[:APPEARED_IN]->(:Episode {name: 'The Time of The Doctor'})
RETURN p

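Delete: deletes nodes and relationships

MATCH (matt:Actor {name: 'Matt Smith'})
DELETE matt

MATCH (matt:Actor {name: 'Matt Smith'})-[r]-()
DELETE matt, r

- A node that still has relationships cannot be deleted on its own; the second query deletes the relationships together with the node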

Set: manipulates properties

MATCH (matt:Actor {name: 'Matt Smith'})
SET matt.salary = 100000, matt.active = true

// Replaces all of matt's properties with a copy of chris's properties
MATCH (matt:Actor {name: 'Matt Smith'}), (chris:Actor {name: 'Christopher Eccleston'})
SET matt = chris

// Setting a property to NULL removes it
MATCH (matt:Actor {name: 'Matt Smith'})
SET matt.salary = NULL

Set: sets labels

MATCH (matt:Actor {name: 'Matt Smith'})
SET matt:Doctor

Remove: removes properties or labels

MATCH (matt:Actor {name: 'Matt Smith'})
REMOVE matt:Doctor

MATCH (matt:Actor {name: 'Matt Smith'})
REMOVE matt.salary

Merge: a MATCH replacement that returns the pattern if it exists, or creates it if it does not

MERGE (peter:Actor {name: 'Peter Capaldi'})
RETURN peter

MERGE (peter:Actor {name: 'Peter Capaldi', salary: 100000})
RETURN peter

MATCH (peter:Actor {name: 'Peter Capaldi'}), (doctor:Character {name: 'Doctor'})
MERGE (peter)-[r:PLAYED]->(doctor)
RETURN r
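MERGE either matches the whole pattern or creates it, so the two node MERGEs above can behave differently: if the existing Peter Capaldi node has no `salary`, the second query finds no complete match and creates another node. A minimal get-or-create sketch of that behaviour on a Python list (the helper `merge_node` is hypothetical):

```python
def merge_node(store, label, props):
    """Hypothetical MERGE (n:label {props}): return a full match or create a new node."""
    for node in store:
        if node["label"] == label and all(node["props"].get(k) == v for k, v in props.items()):
            return node, False  # matched: nothing is created
    node = {"label": label, "props": dict(props)}
    store.append(node)
    return node, True  # no complete match: the whole pattern is created


store = []
_, created_first = merge_node(store, "Actor", {"name": "Peter Capaldi"})
_, created_again = merge_node(store, "Actor", {"name": "Peter Capaldi"})
_, created_with_salary = merge_node(store, "Actor", {"name": "Peter Capaldi", "salary": 100000})
print(created_first, created_again, created_with_salary, len(store))  # True False True 2
```

In Cypher, the usual way to avoid such duplicates is to MERGE on the identifying property only and set the rest separately, for example `MERGE (peter:Actor {name: 'Peter Capaldi'}) ON CREATE SET peter.salary = 100000`.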

Foreach: helper to set a property or label on every element of a path

MATCH p = (actors:Actor)-[r:PLAYED]->(others)
WHERE actors.salary > 100000
FOREACH (n IN nodes(p) | SET n.done = true)

Index: performance gain when querying for a certain property value

CREATE INDEX ON :Actor(name)

- Keeps a dictionary of values
- Watch for performance issues while writing
- The use of an index is automatic

DROP INDEX ON :Actor(name)

MATCH (matt:Actor {name: 'Matt Smith'}) RETURN matt

Unique Constraint: ensures uniqueness of a property value

- Currently the only constraint available
- Watch for performance issues while writing
- Also adds an index

CREATE CONSTRAINT ON (a:Actor) ASSERT a.name IS UNIQUE

DROP CONSTRAINT ON (a:Actor) ASSERT a.name IS UNIQUE

Importing CSV
- Cypher supports importing CSV
- CSV files can be loaded from the local file system or via HTTPS, HTTP and FTP
- Use CREATE and MERGE in conjunction with LOAD CSV
- Example: actors, movies, connections

Importing CSV: Step 1
- Import actors
- The CSV looks like this:

id,name
3,Michael Douglas
4,Martin Sheen
5,Morgan Freeman

LOAD CSV WITH HEADERS FROM "http://docs.neo4j.org/chunked/2.1.6/csv/import/persons.csv"
AS csvLine
CREATE (p:Person {id: toInt(csvLine.id), name: csvLine.name})

Importing CSV: Step 2
- Import movies, normalize countries
- The CSV looks like this:

id,title,country
1,Wall Street,USA
2,The American President,USA

LOAD CSV WITH HEADERS FROM "http://docs.neo4j.org/chunked/2.1.6/csv/import/movies.csv"
AS csvLine
MERGE (country:Country {name: csvLine.country})
CREATE (movie:Movie {id: toInt(csvLine.id), title: csvLine.title})
CREATE (movie)-[:MADE_IN]->(country)

Importing CSV: Step 3
- Import the actor -> movie relationships
- The CSV looks like this:

personId,movieId,role
4,1,Carl Fox
4,2,A.J. MacInerney

LOAD CSV WITH HEADERS FROM "http://docs.neo4j.org/chunked/2.1.6/csv/import/roles.csv"
AS csvLine
MATCH (actor:Person {id: toInt(csvLine.personId)}), (movie:Movie {id: toInt(csvLine.movieId)})
CREATE (actor)-[:PLAYED {role: csvLine.role}]->(movie)
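The three import steps can be traced in plain Python with the standard `csv` module, using the sample rows above inlined as strings so the sketch runs offline. The dictionaries standing in for nodes and relationships are hypothetical; only the order of operations mirrors the Cypher (CREATE persons, MERGE countries, CREATE movies, MATCH both ends by id, CREATE PLAYED).

```python
import csv
import io

# The three CSV files from the slides, inlined so the sketch runs offline.
PERSONS = "id,name\n3,Michael Douglas\n4,Martin Sheen\n5,Morgan Freeman\n"
MOVIES = "id,title,country\n1,Wall Street,USA\n2,The American President,USA\n"
ROLES = "personId,movieId,role\n4,1,Carl Fox\n4,2,A.J. MacInerney\n"


def lines(text):
    """One dict per CSV line, keyed by header (like LOAD CSV WITH HEADERS ... AS csvLine)."""
    return csv.DictReader(io.StringIO(text))


# Step 1: CREATE one Person per line; ids converted like toInt(csvLine.id).
persons = {int(row["id"]): {"name": row["name"]} for row in lines(PERSONS)}

# Step 2: MERGE countries (created once), CREATE movies and MADE_IN links.
countries, movies, made_in = {}, {}, []
for row in lines(MOVIES):
    country = countries.setdefault(row["country"], {"name": row["country"]})
    movie_id = int(row["id"])
    movies[movie_id] = {"title": row["title"]}
    made_in.append((movie_id, country["name"]))

# Step 3: MATCH both ends by id, then CREATE PLAYED with the role as a property.
played = [(persons[int(row["personId"])]["name"],
           movies[int(row["movieId"])]["title"],
           row["role"])
          for row in lines(ROLES)]

print(len(countries), played)
```

One difference worth noting: an unknown id raises `KeyError` here, whereas the Cypher MATCH simply finds nothing and creates no relationship for that line.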

Summary

• Use CREATE to create nodes and relationships.
• With DELETE you can remove them.
• Set and update property values, and add labels to nodes, with SET.
• REMOVE deletes properties and labels.
• MERGE only creates if needed.
• An index on a property makes querying faster, but writing slower.
• Use the unique constraint to make property values unique.
• Import data from other systems with Cypher’s support for reading CSV.

What’s Next?

• REST API

The REST API


Agenda

REST

Service Root

Node and Relationship Operations

Indexes and Unique Constraint

Cypher via REST

Client Access

What You Should Know About REST

• Web service
• HTTP
• Data at a URL
• HTTP methods
• Hypermedia controls

A Typical Request and Response

Request:

POST http://someurl
Accept: application/json; charset=UTF-8
Content-Type: application/json

{"name": "Peter Capaldi"}

Response:

201 Created
Content-Length: 1239
Content-Type: application/json; charset=UTF-8
Location: http://localhost:7474/db/data/node/107

{ <Some Data> }

Service Root

GET http://localhost:7474/db/data/
Accept: application/json; charset=UTF-8

- Provides a REST starting point
- Returns a list of hypermedia links

Node Operations: Get by Id

GET http://localhost:7474/db/data/node/1
Accept: application/json; charset=UTF-8

- GET HTTP method
- On the node URL from the service root
- Returns a data object with the node's properties
- And hypermedia links to get the rest

Node Operations: Create

POST http://localhost:7474/db/data/node
Accept: application/json; charset=UTF-8

- POST HTTP method
- On the node URL from the service root
- Returns the created node

Node Operations: Create with Properties

POST http://localhost:7474/db/data/node
Accept: application/json; charset=UTF-8
Content-Type: application/json

{"name": "Peter Capaldi"}

- Attach content to the POST request
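Building this request by hand takes only the standard library. The sketch below constructs, but does not send, the POST with Python's `urllib.request`; the function name `create_node_request` is hypothetical. The `/db/data` endpoints belong to the Neo4j 2.x/3.x REST API (removed in Neo4j 4.0), which is why the request is only built here.

```python
import json
import urllib.request


def create_node_request(properties, base="http://localhost:7474/db/data"):
    """Build (but do not send) the POST that creates a node with `properties`."""
    return urllib.request.Request(
        f"{base}/node",
        data=json.dumps(properties).encode("utf-8"),
        headers={"Accept": "application/json; charset=UTF-8",
                 "Content-Type": "application/json"},
        method="POST",
    )


req = create_node_request({"name": "Peter Capaldi"})
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
# Sending it needs a running server: urllib.request.urlopen(req).
# A 201 response carries the new node's URL in its Location header.
```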

Node Operations: Delete

DELETE http://localhost:7474/db/data/node/100
Accept: application/json; charset=UTF-8

- DELETE HTTP method

Node Operations: Properties

PUT http://localhost:7474/db/data/node/1/properties/salary
Accept: application/json; charset=UTF-8
Content-Type: application/json

100000

- Use the same base URL (without the property name) to GET all properties of a node
- PUT HTTP method: set a property on the node
- Property name in the URL, value attached as the body
- PUT without a property name replaces all properties
- DELETE HTTP method: remove a property from the node

Node Operations: Labels

POST http://localhost:7474/db/data/node/1/labels
Accept: application/json; charset=UTF-8
Content-Type: application/json

["Person", "Actor"]

- Like properties
- GET lists, POST adds, PUT replaces

Relationship Operations: General

- Like nodes
- Use the relationship URL
- Notable exceptions follow

Relationship Operations: Get by node

GET http://localhost:7474/db/data/node/1/relationships/all
Accept: application/json; charset=UTF-8

GET http://localhost:7474/db/data/node/1/relationships/all/PLAYED&REGENERATED_TO
Accept: application/json; charset=UTF-8

Relationship Operations: Create

POST http://localhost:7474/db/data/node/1/relationships
Accept: application/json; charset=UTF-8
Content-Type: application/json

{
  "to" : "http://localhost:7474/db/data/node/19",
  "type" : "LOVES",
  "data" : {
    "intensity" : "medium"
  }
}

- POST
- Include JSON with the details

Node Operations: Traversals

- Traverse the graph
- One node as the starting point
- Paged traversals are stored for later retrieval

Ingredients
- URL of the starting node
- What to return, as URL extension: path, fullpath, node, relationship
- Further details attached as the request body

Node Operations: Traversals

POST http://localhost:7474/db/data/node/1/traverse/node
Accept: application/json; charset=UTF-8
Content-Type: application/json

{
  "order" : "breadth_first",
  "return_filter" : {
    "body" : "position.endNode().getProperty('name').toLowerCase().contains('p')",
    "language" : "javascript"
  },
  "prune_evaluator" : {
    "body" : "position.length() > 10",
    "language" : "javascript"
  },
  "uniqueness" : "node_global",
  "relationships" : [ {
    "direction" : "out",
    "type" : "REGENERATED_TO"
  }, {
    "direction" : "all",
    "type" : "PLAYED"
  } ],
  "max_depth" : 3
}

Batch Operations

POST http://localhost:7474/db/data/batch
Accept: application/json; charset=UTF-8
Content-Type: application/json

[ {
  "method" : "POST",
  "to" : "/node",
  "id" : 0,
  "body" : {
    "name" : "bob"
  }
}, {
  "method" : "POST",
  "to" : "/node",
  "id" : 1,
  "body" : {
    "age" : 12
  }
}, {
  "method" : "POST",
  "to" : "{0}/relationships",
  "id" : 3,
  "body" : {
    "to" : "{1}",
    "data" : {
      "since" : "2010"
    },
    "type" : "KNOWS"
  }
} ]
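The batch body is just a JSON array of jobs, and `"{0}"` / `"{1}"` are placeholders that the server replaces with the URLs of the nodes created by the jobs with `id` 0 and 1. A sketch that generates the same body (the function name `batch_create_knows` is hypothetical):

```python
import json


def batch_create_knows(first, second, since):
    """Build a /db/data/batch body: create two nodes, then link them with KNOWS.

    "{0}" and "{1}" refer back to the nodes created by the jobs with id 0 and 1.
    """
    jobs = [
        {"method": "POST", "to": "/node", "id": 0, "body": first},
        {"method": "POST", "to": "/node", "id": 1, "body": second},
        {"method": "POST", "to": "{0}/relationships", "id": 3,
         "body": {"to": "{1}", "data": {"since": since}, "type": "KNOWS"}},
    ]
    return json.dumps(jobs, indent=2)


print(batch_create_knows({"name": "bob"}, {"age": 12}, "2010"))
```

Because all jobs travel in one request, the two nodes and the relationship are created in a single round trip instead of three.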

Indexes

GET http://localhost:7474/db/data/schema/index/Actor
Accept: application/json; charset=UTF-8

• List all indexes for a label

POST http://localhost:7474/db/data/schema/index/Actor
Accept: application/json; charset=UTF-8
Content-Type: application/json

{
  "property_keys" : [ "name" ]
}

• Create an index on a label

DELETE http://localhost:7474/db/data/schema/index/Actor/name
Accept: application/json; charset=UTF-8

• Drop the index (the property key is part of the URL)

Constraints

• Like indexes
• Base URL example: http://localhost:7474/db/data/schema/constraint/Actor

Transactional Cypher Endpoint

• Execute Cypher via the REST API
• Supports different output styles, all in JSON
• A transaction can remain open between requests
• A transaction can time out

Begin a Transaction and Commit in One Request

POST http://localhost:7474/db/data/transaction/commit
Accept: application/json; charset=UTF-8
Content-Type: application/json

{
  "statements" : [ {
    "statement" : "CREATE (n {props}) RETURN n",
    "parameters" : {
      "props" : {
        "name" : "Peter Capaldi",
        "salary" : 100000
      }
    }
  } ]
}

Output Styles

• Specify the style after the statement

{
  "statements" : [ {
    "statement" : "CREATE (n) RETURN n",
    "resultDataContents" : [ "REST" ]
  } ]
}

• Default: columns and contents
• REST: same output as the REST operations
• Graph: to reconstruct a graph

Begin a Transaction

• POST to the transaction base URL

POST http://localhost:7474/db/data/transaction

• Returns info about the transaction

201 Created
Content-Type: application/json
Location: http://localhost:7474/db/data/transaction/7

{
  "commit" : "http://localhost:7474/db/data/transaction/7/commit",
  "results" : …..,
  "transaction" : {
    "expires" : "Mon, 2 Feb 2015 20:53:51 +0000"
  }
}

Execute Subsequent Request in Transaction

• POST to the transaction URL returned earlier

POST http://localhost:7474/db/data/transaction/7

• POST to the commit URL for the final statements in the transaction

POST http://localhost:7474/db/data/transaction/7/commit

• To roll back: DELETE to the transaction URL

DELETE http://localhost:7474/db/data/transaction/7

• Or let the timeout expire
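The whole transaction lifecycle can be laid out as a sequence of requests. This sketch only builds them with `urllib.request` and sends nothing; the transaction URL and the `commit` field are hypothetical values of the kind returned in the 201 response above, and the helper names `statements` and `post` are illustrative. Parameter placeholders use the 2.x `{name}` syntax from the slides (newer Neo4j versions use `$name`).

```python
import json
import urllib.request

BASE = "http://localhost:7474/db/data/transaction"
HEADERS = {"Accept": "application/json; charset=UTF-8",
           "Content-Type": "application/json"}


def statements(*pairs):
    """Build the {"statements": [...]} body from (cypher, parameters) pairs."""
    return {"statements": [{"statement": s, "parameters": p} for s, p in pairs]}


def post(url, body):
    """Build (but do not send) a JSON POST request."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=HEADERS, method="POST")


# 1. Begin: POST the first statements to the base URL.
begin = post(BASE, statements(
    ("CREATE (n {props}) RETURN n", {"props": {"name": "Peter Capaldi"}})))

# Hypothetical values from the 201 response: Location header and "commit" field.
tx_url = "http://localhost:7474/db/data/transaction/7"
response_body = {"commit": "http://localhost:7474/db/data/transaction/7/commit"}

# 2. More statements in the same transaction: POST to the transaction URL.
more = post(tx_url, statements(
    ("MATCH (n) WHERE n.name = {name} SET n.salary = 100000",
     {"name": "Peter Capaldi"})))

# 3a. Commit: POST to the commit URL (an empty statement list just commits).
commit = post(response_body["commit"], statements())

# 3b. Or roll back instead: DELETE on the transaction URL.
rollback = urllib.request.Request(tx_url, headers=HEADERS, method="DELETE")
```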

Client Access

Create HTTP requests and parse JSON
- Do it yourself
- More work
- No dependency
- Total freedom

Use a client library
- Someone else does the work
- Ready to go
- Dependency
- Maybe not entirely what you want

Client Access Demo

Create HTTP requests and parse JSON
- C# .NET console app
- Microsoft HttpClient
- Class models for request/response

Use a client library
- C# .NET console app
- Readify Neo4jClient
- Class models for the actor node

Summary

• The REST API provides access from various platforms.
• REST accomplishes this by leveraging HTTP.
• Call the service root to get a list of URLs, called hypermedia controls, that provide a starting point.
• There are two ways to do operations on the REST API: use pure REST operations or execute Cypher.
• To access Neo4j from your app, a client library is the easiest way, but low-level HTTP calls are also a possibility.

Thank You

Contact me:

[email protected]
@rolandguijt