Neo4j Manual 2.1 SNAPSHOT


  • The Neo4j Manual v2.1-SNAPSHOT

    The Neo4j Team neo4j.org www.neotechnology.com


    Publication date 2014-03-03 04:16:34

    Copyright 2014 Neo Technology

    Starting points

      • What is a graph database?
      • Cypher Query Language
      • Languages / Remote Client Libraries
      • REST API
      • Installation
      • Upgrading
      • Security

    License: Creative Commons 3.0

    This book is presented in open source and licensed through Creative Commons 3.0. You are free to copy, distribute, transmit, and/or adapt the work. This license is based upon the following conditions:

    Attribution. You must attribute the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

    Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.

    Any of the above conditions can be waived if you get permission from the copyright holder.

    In no way are any of the following rights affected by the license:

      • Your fair dealing or fair use rights
      • The author's moral rights
      • Rights other persons may have either in the work itself or in how the work is used, such as publicity or privacy rights

    Note: For any reuse or distribution, you must make clear to the others the license terms of this work. The best way to do this is with a direct link to this page: http://creativecommons.org/licenses/by-sa/3.0/

  • Table of Contents

    Preface
    I. Introduction
        1. Neo4j Highlights
        2. Graph Database Concepts
        3. The Neo4j Graph Database
    II. Tutorials
        4. Getting started with Cypher
        5. Data Modeling Examples
        6. Languages
    III. Cypher Query Language
        7. Introduction
        8. Syntax
        9. General Clauses
        10. Reading Clauses
        11. Writing Clauses
        12. Importing Data from CSV
        13. Functions
        14. Schema
        15. From SQL to Cypher
    IV. Reference
        16. Capabilities
        17. Transaction Management
        18. Data Import
        19. Graph Algorithms
        20. REST API
        21. Deprecations
    V. Operations
        22. Installation & Deployment
        23. Configuration & Performance
        24. High Availability
        25. Backup
        26. Security
        27. Monitoring
    VI. Tools
        28. Web Interface
        29. Neo4j Shell
    VII. Community
        30. Community Support
        31. Contributing to Neo4j
    VIII. Advanced Usage
        32. Extending the Neo4j Server
        33. Using Neo4j embedded in Java applications
        34. The Traversal Framework
        35. Legacy Indexing
        36. Batch Insertion
    A. Manpages
        neo4j
        neo4j-installer
        neo4j-shell
        neo4j-backup
        neo4j-arbiter

  • Preface

    This is the reference manual for Neo4j version 2.1-SNAPSHOT, authored by the Neo4j Team.

    The main parts of the manual are:

      • Part I, Introduction: introducing graph database concepts and Neo4j.
      • Part II, Tutorials: learn how to use Neo4j.
      • Part III, Cypher Query Language: details on the Cypher query language.
      • Part IV, Reference: detailed information on Neo4j.
      • Part V, Operations: how to install and maintain Neo4j.
      • Part VI, Tools: guides on tools.
      • Part VII, Community: getting help from, contributing to.
      • Part VIII, Advanced Usage: using Neo4j in more advanced ways.
      • Appendix A, Manpages: command line documentation.

    The material is practical, technical, and focused on answering specific questions. It addresses how things work, what to do and what to avoid to successfully run Neo4j in a production environment.

    The goal is to be thumb-through and rule-of-thumb friendly.

    Each section should stand on its own, so you can hop right to whatever interests you. When possible, the sections distill "rules of thumb" which you can keep in mind whenever you wander out of the house without this manual in your back pocket.

    The included code examples are executed when Neo4j is built and tested. Also, the REST API request and response examples are captured from real interaction with a Neo4j server. Thus, the examples are always in sync with how Neo4j actually works.

    There are other documentation resources besides the manual as well:

      • Neo4j Cypher Refcard, see http://docs.neo4j.org/ for available versions.
      • Neo4j GraphGist, an online tool for creating interactive web pages with executable Cypher statements: http://gist.neo4j.org/.
      • The main Neo4j site at http://www.neo4j.org/ is a good starting point to learn about Neo4j.

    Who should read this?

    The topics should be relevant to architects, administrators, developers and operations personnel.

  • Part I. Introduction

    This part gives a bird's-eye view of what a graph database is, and then outlines some specifics of Neo4j.

  • Chapter 1. Neo4j Highlights

    As a robust, scalable and high-performance database, Neo4j is suitable for full enterprise deployment, or a subset of the full server can be used in lightweight projects.

    It features:

      • true ACID transactions,
      • high availability,
      • scales to billions of nodes and relationships,
      • high speed querying through traversals,
      • declarative graph query language.

    Proper ACID behavior is the foundation of data reliability. Neo4j enforces that all operations that modify data occur within a transaction, guaranteeing consistent data. This robustness extends from single instance embedded graphs to multi-server high availability installations. For details, see Chapter 17, Transaction Management.

    Reliable graph storage can easily be added to any application. A graph can scale in size and complexity as the application evolves, with little impact on performance. Whether starting new development, or augmenting existing functionality, Neo4j is only limited by physical hardware.

    A single server instance can handle a graph of billions of nodes and relationships. When data throughput is insufficient, the graph database can be distributed among multiple servers in a high availability configuration. See Chapter 24, High Availability to learn more.

    The graph database storage shines when storing richly-connected data. Querying is performed through traversals, which can perform millions of traversal steps per second. A traversal step resembles a join in an RDBMS.

  • Chapter 2. Graph Database Concepts

    This chapter contains an introduction to the graph data model and also compares it to other data models used when persisting data.

    2.1. What is a Graph Database?

    A graph database stores data in a graph, the most generic of data structures, capable of elegantly representing any kind of data in a highly accessible way. Let's follow along some graphs, using them to express graph concepts. We'll read a graph by following arrows around the diagram to form sentences.

    2.1.1. A Graph contains Nodes and Relationships

    A Graph records data in Nodes which have Properties

    The simplest possible graph is a single Node, a record that has named values referred to as Properties. A Node could start with a single Property and grow to a few million Properties, though that can get a little awkward. At some point it makes sense to distribute the data into multiple nodes, organized with explicit Relationships.

    [Figure: a Graph records data in Nodes and in Relationships; Relationships organize Nodes; Nodes and Relationships have Properties; Labels group Nodes.]

    2.1.2. Relationships organize the Graph

    Nodes are organized by Relationships which also have Properties

    Relationships organize Nodes into arbitrary structures, allowing a Graph to resemble a List, a Tree, a Map, or a compound Entity, any of which can be combined into yet more complex, richly interconnected structures.

    2.1.3. Labels group the Nodes

    Nodes are grouped by Labels into Sets

    Labels are a means of grouping the nodes in the graph. They can be used to restrict queries to subsets of the graph, as well as enabling optional model constraints and indexing rules.
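
    The three building blocks described so far (nodes with properties, relationships with properties, and labels) can be modelled in a few lines. What follows is a minimal in-memory sketch in Python, not Neo4j's API: the class names Node and Relationship, the helper nodes_with_label, and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A record with named values (properties) and zero or more labels."""
    properties: dict = field(default_factory=dict)
    labels: set = field(default_factory=set)


@dataclass
class Relationship:
    """A directed, typed connection between two nodes; it has properties too."""
    start: Node
    end: Node
    type: str
    properties: dict = field(default_factory=dict)


def nodes_with_label(nodes, label):
    """Restrict a query to the subset of nodes that carry the given label."""
    return [n for n in nodes if label in n.labels]


# Hypothetical sample data.
alice = Node({"name": "Alice"}, {"Person"})
bob = Node({"name": "Bob"}, {"Person"})
album = Node({"title": "Kind of Blue"}, {"Album"})
nodes = [alice, bob, album]
relationships = [
    Relationship(alice, bob, "KNOWS", {"since": 2012}),
    Relationship(bob, album, "OWNS"),
]

people = nodes_with_label(nodes, "Person")
print([n.properties["name"] for n in people])
```

    Filtering by label first is what lets a query work on a subset of the graph rather than on every node.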

    2.1.4. Query a Graph with a Traversal

    A Traversal navigates a Graph; it identifies Paths which order Nodes

    A Traversal is how you query a Graph, navigating from starting Nodes to related Nodes according to an algorithm, finding answers to questions like "what music do my friends like that I don't yet own," or "if this power supply goes down, what web services are affected?"

    [Figure: a Traversal navigates a Graph, identifies Paths and expresses an Algorithm; the Graph records data in Nodes and Relationships; Relationships organize Nodes; Paths order Nodes.]
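
    The "what music do my friends like that I don't yet own" question can be answered by a two-step traversal. The sketch below is plain Python over a dictionary, assuming hypothetical relationship types FRIEND, LIKES and OWNS and made-up sample data; it illustrates the idea and is not how Neo4j executes a traversal.

```python
# Hypothetical graph: each entry maps a node to its outgoing relationships,
# written as (relationship type, target node) pairs.
graph = {
    "me": [("FRIEND", "Anna"), ("FRIEND", "Ben"), ("OWNS", "Album A")],
    "Anna": [("LIKES", "Album A"), ("LIKES", "Album B")],
    "Ben": [("LIKES", "Album C")],
}


def targets(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [target for kind, target in graph.get(node, []) if kind == rel_type]


def music_friends_like_that_i_lack(person):
    """Traverse person -FRIEND-> friend -LIKES-> album, skipping owned albums."""
    owned = set(targets(person, "OWNS"))
    found = []
    for friend in targets(person, "FRIEND"):
        for album in targets(friend, "LIKES"):
            if album not in owned and album not in found:
                found.append(album)
    return found


print(music_friends_like_that_i_lack("me"))
```

    Only the starting node's neighbourhood is visited; the rest of the graph is never touched.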

    2.1.5. Indexes look-up Nodes or Relationships

    An Index maps from Properties to either Nodes or Relationships

    Often, you want to find a specific Node or Relationship according to a Property it has. Rather than traversing the entire graph, use an Index to perform a look-up, for questions like "find the Account for username master-of-graphs."

    [Figure: Indexes map from Properties and map to Nodes and Relationships; Relationships organize Nodes; Nodes and Relationships have Properties.]
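
    To make the difference between scanning and looking up concrete, here is a small Python sketch. The account data and the helper names find_by_scan and build_index are hypothetical; a real index is maintained by the database rather than rebuilt by hand.

```python
from collections import defaultdict

# Hypothetical account nodes, keyed by an internal node id.
nodes = {
    0: {"username": "master-of-graphs", "plan": "pro"},
    1: {"username": "newbie", "plan": "free"},
    2: {"username": "lurker", "plan": "free"},
}


def find_by_scan(key, value):
    """Without an index: inspect every node in the graph."""
    return [node_id for node_id, props in nodes.items() if props.get(key) == value]


def build_index(key):
    """Map each value of one property to the ids of the nodes that have it."""
    index = defaultdict(list)
    for node_id, props in nodes.items():
        if key in props:
            index[props[key]].append(node_id)
    return index


username_index = build_index("username")
print(username_index["master-of-graphs"])
```

    Both approaches return the same node; the index simply answers with one dictionary look-up instead of a pass over all nodes.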

    2.1.6. Neo4j is a Graph Database

    A Graph Database manages a Graph and also manages related Indexes

    Neo4j is a commercially supported open-source graph database. It was designed and built from the ground up to be a reliable database, optimized for graph structures instead of tables. Working with Neo4j, your application gets all the expressiveness of a graph, with all the dependability you expect out of a database.

    [Figure: a Graph Database manages a Graph and Indexes; the Graph records data in Nodes and Relationships; Indexes map from Properties to Nodes and Relationships; Relationships organize Nodes; Nodes and Relationships have Properties; a Traversal navigates the Graph, identifies Paths and expresses an Algorithm; Paths order Nodes.]

    2.2. Comparing Database Models

    A Graph Database stores data structured in the Nodes and Relationships of a graph. How does this compare to other persistence models? Because a graph is a generic structure, let's compare how a few models would look in a graph.

    2.2.1. A Graph Database transforms a RDBMS

    Topple the stacks of records in a relational database while keeping all the relationships, and you'll see a graph. Where an RDBMS is optimized for aggregated data, Neo4j is optimized for highly connected data.

    Figure 2.1. RDBMS

    [Diagram: records A1 to A3, B1 to B7 and C1 to C3.]

    Figure 2.2. Graph Database as RDBMS

    [Diagram: the same records A1 to A3, B1 to B7 and C1 to C3, drawn as connected nodes.]

    2.2.2. A Graph Database elaborates a Key-Value Store

    A Key-Value model is great for lookups of simple values or lists. When the values are themselves interconnected, you've got a graph. Neo4j lets you elaborate the simple data structures into more complex, interconnected data.

    Figure 2.3. Key-Value Store

    [Diagram: keys K1, K2 and K3 pointing to values V1, V2 and V3 and to other keys.]

    K* represents a key, V* a value. Note that some keys point to other keys as well as plain values.

    Figure 2.4. Graph Database as Key-Value Store

    [Diagram: K1, K2 and K3 with values V1, V2 and V3, drawn as connected nodes.]
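
    The step from a key-value store to a graph can be sketched as follows. The store contents are an assumption (the exact links in the figure are not recoverable here), and to_graph and referring_keys are hypothetical helper names.

```python
# Hypothetical key-value store: each key holds a plain value plus
# references to other keys.
store = {
    "K1": {"value": "V1", "refs": ["K2"]},
    "K2": {"value": "V2", "refs": ["K1", "K3"]},
    "K3": {"value": "V3", "refs": ["K1"]},
}


def to_graph(store):
    """Turn keys into nodes and key references into explicit relationships."""
    nodes = {key: entry["value"] for key, entry in store.items()}
    relationships = [
        (key, ref) for key, entry in store.items() for ref in entry["refs"]
    ]
    return nodes, relationships


def referring_keys(relationships, key):
    """Follow relationships backwards: which keys point at this one?"""
    return [start for start, end in relationships if end == key]


nodes, relationships = to_graph(store)
print(referring_keys(relationships, "K1"))
```

    Once the references are explicit relationships, they can be followed in either direction, which a plain key look-up does not offer.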

    2.2.3. A Graph Database relates Column-Family

    Column Family (BigTable-style) databases are an evolution of key-value, using "families" to allow grouping of rows. Stored in a graph, the families could become hierarchical, and the relationships among data become explicit.

    2.2.4. A Graph Database navigates a Document Store

    The container hierarchy of a document database accommodates nice, schema-free data that can easily be represented as a tree, which is of course a graph. Refer to other documents (or document elements) within that tree and you have a more expressive representation of the same data. When in Neo4j, those relationships are easily navigable.

    Figure 2.5. Document Store

    [Diagram: documents D1 and D2 containing subdocuments S1, S2 and S3, values V1 to V4, and the references D2/S2 and D1/S1.]

    D=Document, S=Subdocument, V=Value, D2/S2 = reference to subdocument in (other) document.

    Figure 2.6. Graph Database as Document Store

    [Diagram: D1, D2, S1, S2, S3 and V1 to V4, drawn as connected nodes.]

  • Chapter 3. The Neo4j Graph Database

    This chapter goes into more detail on the data model and behavior of Neo4j.

    3.1. Nodes

    The fundamental units that form a graph are nodes and relationships. In Neo4j, both nodes and relationships can contain properties.

    Nodes are often used to represent entities, but depending on the domain, relationships may be used for that purpose as well.

    Apart from properties and relationships, nodes can also be labeled with zero or more labels.

    [Figure: a Node can have Relationships, Properties and Labels; Relationships can have Properties.]

    Let's start out with a really simple graph, containing only a single node with one property:

    [Figure: a single node with the property name: Peter.]

    3.2. Relationships

    Relationships between nodes are a key part of a graph database. They allow for finding related data. Just like nodes, relationships can have properties.

    [Figure: a Relationship has a Start node, an End node and a Relationship type, and can have Properties; the Relationship type is uniquely identified by its Name.]

    A relationship connects two nodes, and is guaranteed to have valid start and end nodes.

    [Figure: a Start node connected to an End node by a relationship.]

    As relationships are always directed, they can be viewed as outgoing or incoming relative to a node, which is useful when traversing the graph:

    [Figure: a Node with one incoming relationship and one outgoing relationship.]

    Relationships are equally well traversed in either direction. This means that there is no need to add duplicate relationships in the opposite direction (with regard to traversal or performance).

    While relationships always have a direction, you can ignore the direction where it is not useful in your application.

    Note that a node can have relationships to itself as well:

    [Figure: a Node with a loop relationship to itself.]

    To further enhance graph traversal, all relationships have a relationship type. Note that the word "type" might be misleading here; you could rather think of it as a label. The following example shows a simple social network with two relationship types.

    [Figure: a social network of Maja, Oscar, William and Alice, connected by follows and blocks relationships.]

    Using relationship direction and type

    What                              How
    get who a person follows          outgoing follows relationships, depth one
    get the followers of a person     incoming follows relationships, depth one
    get who a person blocks           outgoing blocks relationships, depth one
    get who a person is blocked by    incoming blocks relationships, depth one
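
    The four look-ups in the table reduce to two helpers, one per direction, each filtered by relationship type. The Python sketch below assumes a hypothetical set of edges between the four people (this transcript does not preserve who follows or blocks whom), and the helper names outgoing and incoming are illustrative only.

```python
# Hypothetical edges, written as (start node, relationship type, end node).
relationships = [
    ("Maja", "follows", "Oscar"),
    ("Oscar", "follows", "Maja"),
    ("Oscar", "blocks", "William"),
    ("William", "follows", "Alice"),
]


def outgoing(person, rel_type):
    """End nodes of relationships of rel_type that start at person."""
    return [end for start, kind, end in relationships
            if start == person and kind == rel_type]


def incoming(person, rel_type):
    """Start nodes of relationships of rel_type that end at person."""
    return [start for start, kind, end in relationships
            if end == person and kind == rel_type]


print(outgoing("Oscar", "follows"))    # who Oscar follows
print(incoming("Oscar", "follows"))    # the followers of Oscar
print(outgoing("Oscar", "blocks"))     # who Oscar blocks
print(incoming("William", "blocks"))   # who William is blocked by
```

    The same stored relationship serves both directions, which is why no duplicate reverse relationship is needed.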

    3.3. Properties

    Both nodes and relationships can have properties.

    Properties are key-value pairs where the key is a string. Property values can be either a primitive or an array of one primitive type. For example String, int and int[] values are valid for properties.

    Note: NULL is not a valid property value. NULLs can instead be modeled by the absence of a key.

    [Figure: a Property has a Key and a Value; the Key is a String; the Value can be a Primitive (boolean, byte, short, int, long, float, double, char) or a String, or an array of one of these.]

    Property value types

    Type      Description                                                 Value range
    boolean                                                               true/false
    byte      8-bit integer                                               -128 to 127, inclusive
    short     16-bit integer                                              -32768 to 32767, inclusive
    int       32-bit integer                                              -2147483648 to 2147483647, inclusive
    long      64-bit integer                                              -9223372036854775808 to 9223372036854775807, inclusive
    float     32-bit IEEE 754 floating-point number
    double    64-bit IEEE 754 floating-point number
    char      16-bit unsigned integers representing Unicode characters    \u0000 to \uffff (0 to 65535)
    String    sequence of Unicode characters

    For further details on float/double values, see the Java Language Specification.
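
    The integer ranges in the table follow directly from the bit widths: a signed integer of n bits spans -2^(n-1) to 2^(n-1) - 1. The Python sketch below recomputes them and picks the narrowest listed type for a value. The helper names value_range and smallest_integer_type are hypothetical; Neo4j itself stores whichever Java type the application supplies.

```python
# Bit widths of the signed integer property types listed in the table.
INTEGER_BITS = {"byte": 8, "short": 16, "int": 32, "long": 64}


def value_range(bits):
    """Inclusive range of a signed two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_integer_type(value):
    """Return the narrowest listed type that can hold value."""
    if value is None:
        # Mirrors the note above: NULL is modelled by leaving the key out.
        raise ValueError("NULL is not a valid property value; omit the key instead")
    for name, bits in INTEGER_BITS.items():
        low, high = value_range(bits)
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit in a 64-bit long")


print(value_range(16))
print(smallest_integer_type(128))
```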

    3.4. Labels

    A label is a named graph construct that is used to group nodes into sets; all nodes labeled with the same label belong to the same set. Many database queries can work with these sets instead of the whole graph, making queries easier to write and more efficient. A node may be labeled with any number of labels, including none, making labels an optional addition to the graph.

    [Figure: a Label has a Name and groups Nodes.]

    Labels are used when defining constraints and adding indexes for properties.

    An example would be a label named User that you label all your nodes representing users with. With that in place, you can ask Neo4j to perform operations only on your user nodes, such as finding all users with a given name.

    However, you can use labels for much more. For instance, since labels can be added and removed during runtime, they can be used to mark temporary states for your nodes. You might create an Offline label for phones that are offline, a Happy label for happy pets, and so on.

    3.4.1. Label names

    Any non-empty Unicode string can be used as a label name. In Cypher, you may need to use the backtick (`) syntax to avoid clashes with Cypher identifier rules. By convention, labels are written with CamelCase notation, with the first letter in upper case. For instance, User or CarOwner.

    Labels have an id space of an int, meaning the maximum number of labels the database can contain is roughly 2 billion.

    3.5. Paths

    A path is one or more nodes with connecting relationships, typically retrieved as a query or traversal result.

    [Figure: a Path has a Start Node and an End Node, and can contain one or more Relationships, each accompanied by a Node.]

    The shortest possible path has length zero and looks like this:

    [Figure: a single Node.]

    A path of length one:

    [Figure: Node 1 connected to Node 2 by Relationship 1.]

    Another path of length one:

    [Figure: Node 1 with Relationship 1 looping back to itself.]
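
    The three example paths share one rule: a path's length is its number of relationships, and it always has one more node than that. A minimal Python sketch, assuming a hypothetical Path class that is not part of any Neo4j API:

```python
from dataclasses import dataclass


@dataclass
class Path:
    """An ordered list of nodes joined by an ordered list of relationships."""
    nodes: list
    relationships: list

    def __post_init__(self):
        # Every relationship in the path leads from one node to the next one.
        if len(self.nodes) != len(self.relationships) + 1:
            raise ValueError("a path needs exactly one more node than relationships")

    @property
    def length(self):
        """The length of a path is its number of relationships."""
        return len(self.relationships)


single = Path(["Node"], [])
one_hop = Path(["Node 1", "Node 2"], ["Relationship 1"])
loop = Path(["Node 1", "Node 1"], ["Relationship 1"])

print(single.length, one_hop.length, loop.length)
```

    The loop case lists the same node twice, once as start node and once as end node, which is why it still has length one.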

    3.6. Traversal

    Traversing a graph means visiting its nodes, following relationships according to some rules. In most cases only a subgraph is visited, as you already know where in the graph the interesting nodes and relationships are found.

    Cypher provides a declarative way to query the graph powered by traversals and other techniques. See Part III, Cypher Query Language for more information.

    Neo4j comes with a callback based traversal API which lets you specify the traversal rules. At a basic level there's a choice between traversing breadth-first or depth-first.

    For an in-depth introduction to the traversal framework, see Chapter 34, The Traversal Framework. For Java code examples see Section 33.7, Traversal.
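
    The breadth-first versus depth-first choice, and the idea of passing a callback that carries the traversal rules, can be illustrated without the Neo4j API. The Python sketch below uses a hypothetical adjacency-list graph and a hypothetical traverse function whose include callback decides which visited nodes are returned; it is a stand-in for the concept, not the traversal framework itself.

```python
from collections import deque

# Hypothetical graph as adjacency lists of outgoing relationships.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def traverse(start, breadth_first, include=lambda node: True):
    """Visit every node reachable from start and return those `include` accepts."""
    frontier = deque([start])
    seen = set()
    result = []
    while frontier:
        # A queue (FIFO) gives breadth-first order, a stack (LIFO) depth-first.
        node = frontier.popleft() if breadth_first else frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        if include(node):
            result.append(node)
        # Reverse for the stack so the first neighbour is explored first.
        neighbours = graph[node] if breadth_first else reversed(graph[node])
        for neighbour in neighbours:
            if neighbour not in seen:
                frontier.append(neighbour)
    return result


print(traverse("A", breadth_first=True))
print(traverse("A", breadth_first=False))
```

    Breadth-first visits all nodes at depth one before going deeper; depth-first follows one branch to its end before backtracking.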

    3.7. Schema

    Neo4j is a schema-optional graph database. You can use Neo4j without any schema. Optionally you can introduce it in order to gain performance or modeling benefits. This allows a way of working where the schema does not get in your way until you are at a stage where you want to reap the benefits of having one.

    3.7.1. Indexes

    Note: This feature was introduced in Neo4j 2.0, and is not the same as the legacy indexes (see Chapter 35, Legacy Indexing).

    Performance is gained by creating indexes, which improve the speed of looking up nodes in the database. Once you've specified which properties to index, Neo4j will make sure your indexes are kept up to date as your graph evolves. Any operation that looks up nodes by the newly indexed properties will see a significant performance boost.

    Indexes in Neo4j are eventually available. That means that when you first create an index, the operation returns immediately. The index is populating in the background and so is not immediately available for querying. When the index has been fully populated it will eventually come online. That means that it is now ready to be used in queries.

    If something should go wrong with the index, it can end up in a failed state. When it is failed, it will not be used to speed up queries. To rebuild it, you can drop and recreate the index. Look at logs for clues about the failure.

    You can track the status of your index by asking for the index state through the API you are using. Note, however, that this is not yet possible through Cypher.

    How to use indexes in the different APIs:

      • Cypher: Section 14.1, Indexes
      • REST API: Section 20.13, Indexing
      • Listing Indexes via Shell: Section 29.6.11, Listing Indexes and Constraints
      • Java Core API: Section 33.3, User database with indexes
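
    The index life cycle described above (populating, then online, or failed if something goes wrong) can be pictured as a small state machine. The following Python sketch is a toy model under stated assumptions: the Index class, its populate method standing in for the background job, and the rule that lookup returns None while the index is unusable are all hypothetical and do not reproduce Neo4j's implementation.

```python
from enum import Enum


class IndexState(Enum):
    POPULATING = "populating"
    ONLINE = "online"
    FAILED = "failed"


class Index:
    """Toy index over one property; population is a separate, later step."""

    def __init__(self, key):
        self.key = key
        self.state = IndexState.POPULATING  # creating the index returns at once
        self.entries = {}

    def populate(self, nodes):
        """Stand-in for the background job that fills the index."""
        try:
            for node_id, props in nodes.items():
                if self.key in props:
                    self.entries.setdefault(props[self.key], []).append(node_id)
            self.state = IndexState.ONLINE
        except Exception:
            # Any error during population leaves the index unusable.
            self.state = IndexState.FAILED

    def lookup(self, value):
        """Return matching node ids, or None when the index cannot be used."""
        if self.state is not IndexState.ONLINE:
            return None
        return self.entries.get(value, [])


index = Index("name")
print(index.state, index.lookup("Peter"))
index.populate({0: {"name": "Peter"}, 1: {"name": "Maja"}})
print(index.state, index.lookup("Peter"))
```

    A caller that receives None would fall back to scanning, which matches the statement that a populating or failed index is not used to speed up queries.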

    3.7.2. Constraints

    Note: This feature was introduced in Neo4j 2.0.

    Neo4j can help you keep your data clean. It does so using constraints, which allow you to specify the rules for what your data should look like. Any changes that break these rules will be denied.

    In this version, unique constraints are the only available constraint type.

    How to use constraints in the different APIs:

      • Cypher: Section 14.2, Constraints
      • REST API: Section 20.14, Constraints
      • Listing Constraints via Shell: Section 29.6.11, Listing Indexes and Constraints
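
    What "denied" means for a unique constraint can be shown with a toy store that refuses a second node with the same property value. This Python sketch is an illustration only: the class names, the create method, and the choice to let nodes without the property through are assumptions of the sketch, not a description of Neo4j's internals.

```python
class UniqueConstraintViolation(Exception):
    """Raised when a change would break the unique constraint."""


class LabelledStore:
    """Nodes of one label with a unique constraint on one property."""

    def __init__(self, label, unique_key):
        self.label = label
        self.unique_key = unique_key
        self.nodes = []
        self._seen_values = set()

    def create(self, **properties):
        value = properties.get(self.unique_key)
        # Assumption of this sketch: nodes lacking the property are not checked.
        if value is not None:
            if value in self._seen_values:
                raise UniqueConstraintViolation(
                    f"{self.label}.{self.unique_key} = {value!r} already exists"
                )
            self._seen_values.add(value)
        self.nodes.append(properties)
        return properties


users = LabelledStore("User", "name")
users.create(name="Peter")
try:
    users.create(name="Peter")
except UniqueConstraintViolation as error:
    print("denied:", error)
```

    The rejected change leaves the store untouched, so the data stays consistent with the rule.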

  • Part II. Tutorials

    The tutorial part describes how to use Neo4j. It takes you from Hello World to advanced usage of graphs.

  • Chapter 4. Getting started with Cypher

    This chapter will guide you through your first steps with Cypher.

    In the online edition of this manual, all queries in this section can be executed interactively without installing Neo4j on your computer.

    Otherwise, first get the Neo4j server running to try things out locally. Instructions are found in Section 22.2, Server Installation. With the server running, you can choose to issue Cypher queries from either the web interface or the Neo4j shell. See Chapter 28, Web Interface or Chapter 29, Neo4j Shell.

    4.1. Create nodes and relationships

    Create a node for the actor Tom Hanks:

        CREATE (n:Actor { name:"Tom Hanks" });

    Let's find the node we created:

        MATCH (actor:Actor { name: "Tom Hanks" })
        RETURN actor;

    Now let's create a movie and connect it to the Tom Hanks node with an ACTED_IN relationship:

        MATCH (actor:Actor)
        WHERE actor.name = "Tom Hanks"
        CREATE (movie:Movie { title:'Sleepless in Seattle' })
        CREATE (actor)-[:ACTED_IN]->(movie);

    Using a WHERE clause in the query above to get the Tom Hanks node does the same thing as the pattern in the MATCH clause of the previous query.

    This is how our graph looks now:

    [Figure: an Actor node (name = 'Tom Hanks') with an ACTED_IN relationship to a Movie node (title = 'Sleepless in Seattle').]

    We can do more of the work in a single clause. CREATE UNIQUE will make sure we don't create duplicate patterns. Using this: [r:ACTED_IN] lets us return the relationship.

        MATCH (actor:Actor { name: "Tom Hanks" })
        CREATE UNIQUE (actor)-[r:ACTED_IN]->(movie:Movie { title:"Forrest Gump" })
        RETURN r;

    Set a property on a node:

        MATCH (actor:Actor { name: "Tom Hanks" })
        SET actor.DoB = 1944
        RETURN actor.name, actor.DoB;

    The labels Actor and Movie help us organize the graph. Let's list all Movie nodes:

        MATCH (movie:Movie)
        RETURN movie AS `All Movies`;

        All Movies
        Node[1]{title:"Sleepless in Seattle"}
        Node[2]{title:"Forrest Gump"}
        2 rows

    4.2. Movie Database

    Our example graph consists of movies with title and year and actors with a name. Actors have ACTS_IN relationships to movies, which represent the roles they played. This relationship also has a role attribute.

    We'll go with three movies and three actors:

        CREATE (matrix1:Movie { title : 'The Matrix', year : '1999-03-31' })
        CREATE (matrix2:Movie { title : 'The Matrix Reloaded', year : '2003-05-07' })
        CREATE (matrix3:Movie { title : 'The Matrix Revolutions', year : '2003-10-27' })
        CREATE (keanu:Actor { name:'Keanu Reeves' })
        CREATE (laurence:Actor { name:'Laurence Fishburne' })
        CREATE (carrieanne:Actor { name:'Carrie-Anne Moss' })
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix1)
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix2)
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix3)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix1)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix2)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix3)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix1)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix2)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix3)

    This gives us the following graph to play with:

    [Figure: three Movie nodes ('The Matrix', year '1999-03-31'; 'The Matrix Reloaded', year '2003-05-07'; 'The Matrix Revolutions', year '2003-10-27') and three Actor nodes ('Keanu Reeves', 'Laurence Fishburne', 'Carrie-Anne Moss'). Each actor has an ACTS_IN relationship to each movie, with role 'Neo', 'Morpheus' and 'Trinity' respectively.]

    Let's check how many nodes we have now:

        MATCH (n)
        RETURN "Hello Graph with " + count(*) + " Nodes!" AS welcome;

    Return a single node, by name:

        MATCH (movie:Movie { title: 'The Matrix' })
        RETURN movie;

    Return the title and date of the matrix node:

        MATCH (movie:Movie { title: 'The Matrix' })
        RETURN movie.title, movie.year;

    Which results in:

        movie.title     movie.year
        "The Matrix"    "1999-03-31"
        1 row

    Show all actors:

        MATCH (actor:Actor)
        RETURN actor;


    Return just the name, and order them by name:

        MATCH (actor:Actor)
        RETURN actor.name
        ORDER BY actor.name;

    Count the actors:

        MATCH (actor:Actor)
        RETURN count(*);

    Get only the actors whose names end with "s":

        MATCH (actor:Actor)
        WHERE actor.name =~ ".*s$"
        RETURN actor.name;

    Here are some exploratory queries for unknown datasets. Don't run these on live production databases!

    Count nodes:

        MATCH (n)
        RETURN count(*);

    Count relationship types:

        MATCH (n)-[r]->()
        RETURN type(r), count(*);

        type(r)   | count(*)
        "ACTS_IN" | 9

        1 row

    List all nodes and their relationships:

        MATCH (n)-[r]->(m)
        RETURN n AS `from`, r AS `->`, m AS to;

        from | -> | to
        Node[3]{name:"Keanu Reeves"} | :ACTS_IN[0]{role:"Neo"} | Node[0]{title:"The Matrix", year:"1999-03-31"}
        Node[3]{name:"Keanu Reeves"} | :ACTS_IN[1]{role:"Neo"} | Node[1]{title:"The Matrix Reloaded", year:"2003-05-07"}
        Node[3]{name:"Keanu Reeves"} | :ACTS_IN[2]{role:"Neo"} | Node[2]{title:"The Matrix Revolutions", year:"2003-10-27"}
        Node[4]{name:"Laurence Fishburne"} | :ACTS_IN[3]{role:"Morpheus"} | Node[0]{title:"The Matrix", year:"1999-03-31"}
        Node[4]{name:"Laurence Fishburne"} | :ACTS_IN[4]{role:"Morpheus"} | Node[1]{title:"The Matrix Reloaded", year:"2003-05-07"}
        Node[4]{name:"Laurence Fishburne"} | :ACTS_IN[5]{role:"Morpheus"} | Node[2]{title:"The Matrix Revolutions", year:"2003-10-27"}
        Node[5]{name:"Carrie-Anne Moss"} | :ACTS_IN[6]{role:"Trinity"} | Node[0]{title:"The Matrix", year:"1999-03-31"}
        Node[5]{name:"Carrie-Anne Moss"} | :ACTS_IN[7]{role:"Trinity"} | Node[1]{title:"The Matrix Reloaded", year:"2003-05-07"}
        Node[5]{name:"Carrie-Anne Moss"} | :ACTS_IN[8]{role:"Trinity"} | Node[2]{title:"The Matrix Revolutions", year:"2003-10-27"}

        9 rows


    4.3. Social Movie Database

    Our example graph consists of movies with title and year and actors with a name. Actors have ACTS_IN relationships to movies, which represent the roles they played. This relationship also has a role attribute.

    So far, we have queried the movie data; now let's update the graph too.

        CREATE (matrix1:Movie { title : 'The Matrix', year : '1999-03-31' })
        CREATE (matrix2:Movie { title : 'The Matrix Reloaded', year : '2003-05-07' })
        CREATE (matrix3:Movie { title : 'The Matrix Revolutions', year : '2003-10-27' })
        CREATE (keanu:Actor { name:'Keanu Reeves' })
        CREATE (laurence:Actor { name:'Laurence Fishburne' })
        CREATE (carrieanne:Actor { name:'Carrie-Anne Moss' })
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix1)
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix2)
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix3)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix1)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix2)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix3)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix1)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix2)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix3)

    We will add ourselves, friends and movie ratings.

    Here's how to add a node for yourself and return it. Let's say your name is "Me":

        CREATE (me:User { name: "Me" })
        RETURN me;

        me
        Node[6]{name:"Me"}

        1 row
        Nodes created: 1
        Properties set: 1
        Labels added: 1

    Let's check if the node is there:

        MATCH (me:User { name: "Me" })
        RETURN me.name;

    Add a movie rating:

        MATCH (me:User { name: "Me" }),(movie:Movie { title: "The Matrix" })
        CREATE (me)-[:RATED { stars : 5, comment : "I love that movie!" }]->(movie);

    Which movies did I rate?

        MATCH (me:User { name: "Me" }),(me)-[rating:RATED]->(movie)
        RETURN movie.title, rating.stars, rating.comment;

        movie.title  | rating.stars | rating.comment
        "The Matrix" | 5            | "I love that movie!"

        1 row


    We need a friend!

        CREATE (friend:User { name: "A Friend" })
        RETURN friend;

    Add our friendship idempotently, so we can re-run the query without adding it several times. We return the relationship to check that it has not been created several times.

        MATCH (me:User { name: "Me" }),(friend:User { name: "A Friend" })
        CREATE UNIQUE (me)-[friendship:FRIEND]->(friend)
        RETURN friendship;

    You can rerun the query and see that it doesn't change anything the second time!

    Let's update our friendship with a since property:

        MATCH (me:User { name: "Me" })-[friendship:FRIEND]->(friend:User { name: "A Friend" })
        SET friendship.since='forever'
        RETURN friendship;

    Let's pretend to be our friend and see which movies our friends have rated.

        MATCH (me:User { name: "A Friend" })-[:FRIEND]-(friend)-[rating:RATED]->(movie)
        RETURN movie.title, avg(rating.stars) AS stars, collect(rating.comment) AS comments, count(*);

        movie.title  | stars | comments               | count(*)
        "The Matrix" | 5.0   | ["I love that movie!"] | 1

        1 row

    That's too little data, so let's add some more friends and friendships.

        MATCH (me:User { name: "Me" })
        FOREACH (i IN range(1,10)| CREATE (friend:User { name: "Friend " + i }),(me)-[:FRIEND]->(friend));

    Show all our friends:

        MATCH (me:User { name: "Me" })-[r:FRIEND]->(friend)
        RETURN type(r) AS friendship, friend.name;

        friendship | friend.name
        "FRIEND"   | "A Friend"
        "FRIEND"   | "Friend 1"
        "FRIEND"   | "Friend 2"
        "FRIEND"   | "Friend 3"
        "FRIEND"   | "Friend 4"
        "FRIEND"   | "Friend 5"
        "FRIEND"   | "Friend 6"
        "FRIEND"   | "Friend 7"
        "FRIEND"   | "Friend 8"
        "FRIEND"   | "Friend 9"
        "FRIEND"   | "Friend 10"

        11 rows
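
    Cypher's range(1,10) includes both endpoints, which is why ten extra friends appear next to "A Friend" in the 11-row listing. As a small illustration in Python (not Neo4j code, and with a purely illustrative variable name), where range excludes its upper bound, the same names could be generated like this:

```python
# Cypher's range(1,10) is inclusive at both ends; Python's range() excludes
# the upper bound, so 11 is needed to reach "Friend 10".
friend_names = ["Friend " + str(i) for i in range(1, 11)]

print(len(friend_names))
print(friend_names[0], "...", friend_names[-1])
```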


    4.4. Finding Paths

    Our example graph consists of movies with title and year and actors with a name. Actors have ACTS_IN relationships to movies, which represent the roles they played. This relationship also has a role attribute.

    So far we have queried and updated the data; now let's find interesting constellations, a.k.a. paths.

        CREATE (matrix1:Movie { title : 'The Matrix', year : '1999-03-31' })
        CREATE (matrix2:Movie { title : 'The Matrix Reloaded', year : '2003-05-07' })
        CREATE (matrix3:Movie { title : 'The Matrix Revolutions', year : '2003-10-27' })
        CREATE (keanu:Actor { name:'Keanu Reeves' })
        CREATE (laurence:Actor { name:'Laurence Fishburne' })
        CREATE (carrieanne:Actor { name:'Carrie-Anne Moss' })
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix1)
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix2)
        CREATE (keanu)-[:ACTS_IN { role : 'Neo' }]->(matrix3)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix1)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix2)
        CREATE (laurence)-[:ACTS_IN { role : 'Morpheus' }]->(matrix3)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix1)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix2)
        CREATE (carrieanne)-[:ACTS_IN { role : 'Trinity' }]->(matrix3)

    All other movies that actors in "The Matrix" acted in, ordered by occurrence:

        MATCH (:Movie { title: "The Matrix" })<-[:ACTS_IN]-(actor)-[:ACTS_IN]->(movie)
        RETURN movie.title, count(*)
        ORDER BY count(*) DESC;

        movie.title              | count(*)
        "The Matrix Revolutions" | 3
        "The Matrix Reloaded"    | 3

        2 rows

    Let's see who acted in each of these movies:

        MATCH (:Movie { title: "The Matrix" })<-[:ACTS_IN]-(actor)-[:ACTS_IN]->(movie)
        RETURN movie.title, collect(actor.name), count(*) AS count
        ORDER BY count DESC;

        movie.title              | collect(actor.name)                                        | count
        "The Matrix Revolutions" | ["Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss"] | 3
        "The Matrix Reloaded"    | ["Keanu Reeves", "Laurence Fishburne", "Carrie-Anne Moss"] | 3

        2 rows

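    What about co-acting, that is, actors that acted together:

        MATCH (:Movie { title: "The Matrix" })<-[:ACTS_IN]-(actor)-[:ACTS_IN]->(movie)<-[:ACTS_IN]-(colleague)
        RETURN actor.name, collect(DISTINCT colleague.name);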


        actor.name           | collect(distinct colleague.name)
        "Laurence Fishburne" | ["Keanu Reeves", "Carrie-Anne Moss"]
        "Keanu Reeves"       | ["Laurence Fishburne", "Carrie-Anne Moss"]

        3 rows

    Which of those other actors acted most often with anyone from the Matrix cast?

        MATCH (:Movie { title: "The Matrix" })<-[:ACTS_IN]-(actor)-[:ACTS_IN]->(movie)


        p | length(p)
        [Node[3]{name:"Keanu Reeves"}, :ACTS_IN[1]{role:"Neo"}, Node[1]{title:"The Matrix Reloaded", year:"2003-05-07"}, :ACTS_IN[4]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[3]{role:"Morpheus"}, Node[0]{title:"The Matrix", year:"1999-03-31"}, :ACTS_IN[6]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
        [Node[3]{name:"Keanu Reeves"}, :ACTS_IN[1]{role:"Neo"}, Node[1]{title:"The Matrix Reloaded", year:"2003-05-07"}, :ACTS_IN[4]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[5]{role:"Morpheus"}, Node[2]{title:"The Matrix Revolutions", year:"2003-10-27"}, :ACTS_IN[8]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
        [Node[3]{name:"Keanu Reeves"}, :ACTS_IN[1]{role:"Neo"}, Node[1]{title:"The Matrix Reloaded", year:"2003-05-07"}, :ACTS_IN[7]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 2
        [Node[3]{name:"Keanu Reeves"}, :ACTS_IN[2]{role:"Neo"}, Node[2]{title:"The Matrix Revolutions", year:"2003-10-27"}, :ACTS_IN[5]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[3]{role:"Morpheus"}, Node[0]{title:"The Matrix", year:"1999-03-31"}, :ACTS_IN[6]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
        [Node[3]{name:"Keanu Reeves"}, :ACTS_IN[2]{role:"Neo"}, Node[2]{title:"The Matrix Revolutions", year:"2003-10-27"}, :ACTS_IN[5]{role:"Morpheus"}, Node[4]{name:"Laurence Fishburne"}, :ACTS_IN[4]{role:"Morpheus"}, Node[1]{title:"The Matrix Reloaded", year:"2003-05-07"}, :ACTS_IN[7]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 4
        [Node[3]{name:"Keanu Reeves"}, :ACTS_IN[2]{role:"Neo"}, Node[2]{title:"The Matrix Revolutions", year:"2003-10-27"}, :ACTS_IN[8]{role:"Trinity"}, Node[5]{name:"Carrie-Anne Moss"}] | 2

        9 rows

    But that's a lot of data; we just want to look at the names and titles of the nodes of the path.

        MATCH p =(:Actor { name: "Keanu Reeves" })-[:ACTS_IN*0..5]-(:Actor { name: "Carrie-Anne Moss" })
        RETURN extract(n IN nodes(p)| coalesce(n.title,n.name)) AS `names and titles`, length(p)
        ORDER BY length(p)
        LIMIT 10;

        names and titles | length(p)
        ["Keanu Reeves", "The Matrix", "Carrie-Anne Moss"] | 2
        ["Keanu Reeves", "The Matrix Reloaded", "Carrie-Anne Moss"] | 2
        ["Keanu Reeves", "The Matrix Revolutions", "Carrie-Anne Moss"] | 2
        ["Keanu Reeves", "The Matrix", "Laurence Fishburne", "The Matrix Reloaded", "Carrie-Anne Moss"] | 4
        ["Keanu Reeves", "The Matrix", "Laurence Fishburne", "The Matrix Revolutions", "Carrie-Anne Moss"] | 4
        ["Keanu Reeves", "The Matrix Reloaded", "Laurence Fishburne", "The Matrix", "Carrie-Anne Moss"] | 4
        ["Keanu Reeves", "The Matrix Reloaded", "Laurence Fishburne", "The Matrix Revolutions", "Carrie-Anne Moss"] | 4
        ["Keanu Reeves", "The Matrix Revolutions", "Laurence Fishburne", "The Matrix", "Carrie-Anne Moss"] | 4
        ["Keanu Reeves", "The Matrix Revolutions", "Laurence Fishburne", "The Matrix Reloaded", "Carrie-Anne Moss"] | 4

        9 rows


    4.5. Labels, Constraints and Indexes

    Labels are a convenient way to group nodes together. They are used to restrict queries, define constraints and create indexes.

    The following gives an example of how to use labels. Let's start by adding a constraint: in this case we decide that all Movie node titles should be unique.

        CREATE CONSTRAINT ON (movie:Movie) ASSERT movie.title IS UNIQUE

    Note that adding the unique constraint will add an index on that property, so we won't do that separately. If we drop the constraint, we will have to add an index instead, as needed.

    In this case we want an index to speed up finding actors by name in the database:

        CREATE INDEX ON :Actor(name)

    Indexes can be added at any time. Constraints can be added after a label is already in use, but that requires that the existing data complies with the constraints. Note that it will take some time for an index to come online when there's existing data.

    Now, let's add some data.

        CREATE (actor:Actor { name:"Tom Hanks" }),(movie:Movie { title:'Sleepless in Seattle' }),
          (actor)-[:ACTED_IN]->(movie);

    Normally you don't specify indexes when querying for data. They will be used automatically. This means we can simply look up the Tom Hanks node, and the index will kick in behind the scenes to boost performance.

        MATCH (actor:Actor { name: "Tom Hanks" })
        RETURN actor;

    Now let's say we want to add another label to a node. Here's how to do that:

        MATCH (actor:Actor { name: "Tom Hanks" })
        SET actor :American;

    To remove a label from nodes, this is what to do:

        MATCH (actor:Actor { name: "Tom Hanks" })
        REMOVE actor:American;

    For more information on labels and related topics, see:

    - Section 3.4, Labels
    - Chapter 14, Schema
    - Section 14.2, Constraints
    - Section 14.1, Indexes
    - Section 9.7, Using
    - Section 11.4, Set
    - Section 11.6, Remove


    Chapter 5. Data Modeling Examples

    The following chapters contain simplified examples of how different domains can be modeled using Neo4j. The aim is not to give full examples, but to suggest possible ways to think using nodes, relationships, graph patterns and data locality in traversals.

    The examples make heavy use of Cypher queries; see Part III, Cypher Query Language, for more information.


    5.1. Linked Lists

    A powerful feature of using a graph database is that you can create your own in-graph data structures, for example a linked list.

    This data structure uses a single node as the list reference. The reference has an outgoing relationship to the head of the list, and an incoming relationship from the last element of the list. If the list is empty, the reference will point to itself.

    To make it clear what happens, we will show how the graph looks after each query.

    To initialize an empty linked list, we simply create a node, and make it link to itself. Unlike the actual list elements, it doesn't have a value property.

        CREATE (root { name: 'ROOT' })-[:LINK]->(root)
        RETURN root

    [Figure: a single node with name = 'ROOT' and a LINK relationship pointing to itself.]

    Adding a value is done by finding the relationship where the new value should be placed, and replacing it with a new node and two relationships to it. We also have to handle the fact that the before and after nodes could be the same as the root node. The case where before, after and the root node are all the same makes it necessary to use CREATE UNIQUE, so that we do not create two new value nodes by mistake.

        MATCH (root)-[:LINK*0..]->(before),(after)-[:LINK*0..]->(root),(before)-[old:LINK]->(after)
        WHERE root.name = 'ROOT' AND (before.value < 25 OR before = root) AND (25 < after.value OR after = root)
        CREATE UNIQUE (before)-[:LINK]->({ value:25 })-[:LINK]->(after)
        DELETE old

    [Figure: the ROOT node and a node with value = 25, connected in a cycle by two LINK relationships.]

    Let's add one more value:

        MATCH (root)-[:LINK*0..]->(before),(after)-[:LINK*0..]->(root),(before)-[old:LINK]->(after)
        WHERE root.name = 'ROOT' AND (before.value < 10 OR before = root) AND (10 < after.value OR after = root)
        CREATE UNIQUE (before)-[:LINK]->({ value:10 })-[:LINK]->(after)
        DELETE old

    [Figure: the ROOT node, a node with value = 10 and a node with value = 25, connected in a cycle by three LINK relationships.]

    Deleting a value, conversely, is done by finding the node with the value, and the two relationships going in and out from it, and replacing the relationships with a new one.

        MATCH (root)-[:LINK*0..]->(before),(before)-[delBefore:LINK]->(del)-[delAfter:LINK]->(after),
          (after)-[:LINK*0..]->(root)
        WHERE root.name = 'ROOT' AND del.value = 10
        CREATE UNIQUE (before)-[:LINK]->(after)
        DELETE del, delBefore, delAfter

    [Figure: the ROOT node and the node with value = 25, connected in a cycle by two LINK relationships.]

    Deleting the last value node is what requires us to use CREATE UNIQUE when replacing the relationships. Otherwise, we would end up with two relationships from the root node to itself, as both before and after nodes are equal to the root node, meaning the pattern would match twice.

        MATCH (root)-[:LINK*0..]->(before),(before)-[delBefore:LINK]->(del)-[delAfter:LINK]->(after),
          (after)-[:LINK*0..]->(root)
        WHERE root.name = 'ROOT' AND del.value = 25
        CREATE UNIQUE (before)-[:LINK]->(after)
        DELETE del, delBefore, delAfter

    [Figure: the ROOT node alone again, with a LINK relationship pointing to itself.]
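
    The insert and delete steps of this section can be mirrored in plain Python to make the relinking explicit. This is only an illustrative sketch, not Neo4j code: the Node class and the insert, delete and values helpers are hypothetical names, and the next attribute stands in for the outgoing LINK relationship. Unlike the Cypher queries, the sketch also accepts duplicate values.

```python
class Node:
    """A list element; `next` stands in for the outgoing LINK relationship."""

    def __init__(self, value=None):
        self.value = value   # None marks the ROOT reference node
        self.next = self     # a fresh node links to itself, like the empty list


def insert(root, value):
    """Insert value at its sorted position between a `before` and an `after` node."""
    before = root
    # Advance while the next node is a value node holding a smaller value.
    while before.next is not root and before.next.value < value:
        before = before.next
    node = Node(value)
    node.next = before.next  # new node -> after
    before.next = node       # before -> new node (replaces the old link)


def delete(root, value):
    """Remove the first node holding value by linking before directly to after."""
    before = root
    while before.next is not root:
        if before.next.value == value:
            before.next = before.next.next
            return True
        before = before.next
    return False


def values(root):
    """Follow the links from the root until the root is reached again."""
    out, node = [], root.next
    while node is not root:
        out.append(node.value)
        node = node.next
    return out


root = Node()
insert(root, 25)
insert(root, 10)
print(values(root))                     # the two values in ascending order
delete(root, 10)
delete(root, 25)
print(values(root), root.next is root)  # empty list: root links to itself
```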


    5.2. TV Shows

    This example shows how TV Shows with Seasons, Episodes, Characters, Actors, Users and Reviews can be modeled in a graph database.

    5.2.1. Data Model

    Let's start out with an entity-relationship model of the domain at hand:

    [Figure: entity-relationship diagram with the entities TV Show, Season, Episode, Review, Character, User and Actor, connected by the relationships has, featured, wrote and played.]

    To implement this in Neo4j we'll use the following relationship types:

        Relationship Type  | Description
        HAS_SEASON         | Connects a show with its seasons.
        HAS_EPISODE        | Connects a season with its episodes.
        FEATURED_CHARACTER | Connects an episode with its characters.
        PLAYED_CHARACTER   | Connects actors with characters. Note that an actor can play multiple characters in an episode, and that the same character can be played by multiple actors as well.
        HAS_REVIEW         | Connects an episode with its reviews.
        WROTE_REVIEW       | Connects users with reviews they contributed.

    5.2.2. Sample Data

    Let's create some data and see how the domain plays out in practice:

        CREATE (himym:TVShow { name: "How I Met Your Mother" })
        CREATE (himym_s1:Season { name: "HIMYM Season 1" })
        CREATE (himym_s1_e1:Episode { name: "Pilot" })
        CREATE (ted:Character { name: "Ted Mosby" })
        CREATE (joshRadnor:Actor { name: "Josh Radnor" })
        CREATE UNIQUE (joshRadnor)-[:PLAYED_CHARACTER]->(ted)
        CREATE UNIQUE (himym)-[:HAS_SEASON]->(himym_s1)
        CREATE UNIQUE (himym_s1)-[:HAS_EPISODE]->(himym_s1_e1)
        CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(ted)
        CREATE (himym_s1_e1_review1 { title: "Meet Me At The Bar In 15 Minutes & Suit Up", content: "It was awesome" })
        CREATE (wakenPayne:User { name: "WakenPayne" })
        CREATE (wakenPayne)-[:WROTE_REVIEW]->(himym_s1_e1_review1)<-[:HAS_REVIEW]-(himym_s1_e1)

        (himym_s1)-[:HAS_EPISODE]->(himym_s1_e1)
        CREATE (marshall:Character { name: "Marshall Eriksen" })
        CREATE (robin:Character { name: "Robin Scherbatsky" })
        CREATE (barney:Character { name: "Barney Stinson" })
        CREATE (lily:Character { name: "Lily Aldrin" })
        CREATE (jasonSegel:Actor { name: "Jason Segel" })
        CREATE (cobieSmulders:Actor { name: "Cobie Smulders" })
        CREATE (neilPatrickHarris:Actor { name: "Neil Patrick Harris" })
        CREATE (alysonHannigan:Actor { name: "Alyson Hannigan" })
        CREATE UNIQUE (jasonSegel)-[:PLAYED_CHARACTER]->(marshall)
        CREATE UNIQUE (cobieSmulders)-[:PLAYED_CHARACTER]->(robin)
        CREATE UNIQUE (neilPatrickHarris)-[:PLAYED_CHARACTER]->(barney)
        CREATE UNIQUE (alysonHannigan)-[:PLAYED_CHARACTER]->(lily)
        CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(marshall)
        CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(robin)
        CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(barney)
        CREATE UNIQUE (himym_s1_e1)-[:FEATURED_CHARACTER]->(lily)
        CREATE (himym_s1_e1_review2 { title: "What a great pilot for a show :)", content: "The humour is great." })
        CREATE (atlasredux:User { name: "atlasredux" })
        CREATE (atlasredux)-[:WROTE_REVIEW]->(himym_s1_e1_review2)<-[:HAS_REVIEW]-(himym_s1_e1)

        MATCH (tvShow:TVShow)-[:HAS_SEASON]->(season)-[:HAS_EPISODE]->(episode)
        WHERE tvShow.name = "How I Met Your Mother"
        RETURN season.name, episode.name

        season.name      | episode.name
        "HIMYM Season 1" | "Pilot"

        1 row

    We could also grab the reviews, if there are any, by slightly tweaking the query:

        MATCH (tvShow:TVShow)-[:HAS_SEASON]->(season)-[:HAS_EPISODE]->(episode)
        WHERE tvShow.name = "How I Met Your Mother"
        WITH season, episode
        OPTIONAL MATCH (episode)-[:HAS_REVIEW]->(review)
        RETURN season.name, episode.name, review

        season.name      | episode.name | review
        "HIMYM Season 1" | "Pilot"      | Node[5]{title:"Meet Me At The Bar In 15 Minutes & Suit Up", content:"It was awesome"}
        "HIMYM Season 1" | "Pilot"      | Node[15]{title:"What a great pilot for a show :)", content:"The humour is great."}

        2 rows

    Now let's list the characters featured in a show. Note that in this query we only put identifiers on the nodes we actually use later on. The other nodes of the path pattern are designated by ().

        MATCH (tvShow:TVShow)-[:HAS_SEASON]->()-[:HAS_EPISODE]->()-[:FEATURED_CHARACTER]->(character)
        WHERE tvShow.name = "How I Met Your Mother"
        RETURN DISTINCT character.name

        character.name
        "Ted Mosby"
        "Marshall Eriksen"
        "Robin Scherbatsky"
        "Barney Stinson"
        "Lily Aldrin"

        5 rows

    Now let's look at how to get all cast members of a show.

        MATCH (tvShow:TVShow)-[:HAS_SEASON]->()-[:HAS_EPISODE]->(episode)-[:FEATURED_CHARACTER]->()

        (er_s9)
        CREATE UNIQUE (er_s9)-[:HAS_EPISODE]->(er_s9_e17)
        WITH er_s9_e17
        MATCH (actor:Actor),(episode:Episode)
        WHERE actor.name = "Josh Radnor" AND episode.name = "Peter's Progress"
        WITH actor, episode
        CREATE (keith:Character { name: "Keith" })
        CREATE UNIQUE (actor)-[:PLAYED_CHARACTER]->(keith)
        CREATE UNIQUE (episode)-[:FEATURED_CHARACTER]->(keith)

    And now we'll create a query to find the episodes that he has appeared in:

        MATCH (actor:Actor)-[:PLAYED_CHARACTER]->(character)(character)


    character.name AS Character

        Show                    | Season           | Episode            | Character
        "How I Met Your Mother" | "HIMYM Season 1" | "Pilot"            | "Ted Mosby"
        "ER"                    | "ER S7"          | "Peter's Progress" | "Keith"

        2 rows


    5.3. ACL structures in graphs

    This example gives a generic overview of an approach to handling Access Control Lists (ACLs) in graphs, and a simplified example with concrete queries.

    5.3.1. Generic approach

    In many scenarios, an application needs to handle security on some form of managed objects. This example describes one pattern to handle this through the use of a graph structure and traversers that build a full permissions structure for any managed object, with exclude and include overriding possibilities. This results in a dynamic construction of ACLs based on the position and context of the managed object.

    The result is a complex security scheme that can easily be implemented in a graph structure, supporting permissions overriding, principal and content composition, without duplicating data anywhere.

    Technique

    As seen in the example graph layout, there are some key concepts in this domain model:

    - The managed content (folders and files), connected by HAS_CHILD_CONTENT relationships.
    - The Principal subtree pointing out principals that can act as ACL members, pointed out by the PRINCIPAL relationships.
    - The aggregation of principals into groups, connected by the IS_MEMBER_OF relationship. One principal (user or group) can be part of many groups at the same time.
    - The SECURITY relationships, connecting the content composite structure to the principal composite structure, containing an addition/removal modifier property ("+RW").


    Constructing the ACL

    The calculation of the effective permissions (e.g. Read, Write, Execute) for a principal for any given ACL-managed node (content) follows a number of rules that will be encoded into the permissions traversal:

    Top-down Traversal

    This approach will let you define a generic permission pattern on the root content, and then refine that for specific sub-content nodes and specific principals.

    1. Start at the content node in question and traverse upwards to the content root node to determine the path to it.
    2. Start with an optimistic effective permissions list of "all permitted" (111 in a bit-encoded ReadWriteExecute case), or 000 if you prefer pessimistic security handling (everything is forbidden unless explicitly allowed).
    3. Beginning from the topmost content node, look for any SECURITY relationships on it.
    4. If found, check whether the principal in question is part of the end-principal of the SECURITY relationship.
    5. If yes, add the "+" permission modifiers to the existing permission pattern, and revoke the "-" permission modifiers from the pattern.
    6. If two principal nodes link to the same content node, first apply the more generic principal's modifiers.
    7. Repeat the security modifier search all the way down to the target content node, thus overriding more generic permissions with the ones set on nodes closer to the target node.

    The same algorithm is applicable for the bottom-up approach, basically just traversing from the target content node upwards and applying the security modifiers dynamically as the traverser goes up.

    Example

    To get the resulting access rights for, e.g., "user 1" on "My File.pdf" with a top-down approach on the model in the graph above, the traversal would go like this:

    1. Traveling upward, we start with "Root folder", and set the permissions to 11 initially (only considering Read, Write).
    2. There are two SECURITY relationships to that folder. User 1 is contained in both of them, but "root" is more generic, so apply it first, then "All principals" (+W +R), giving 11.
    3. "Home" has no SECURITY instructions, continue.
    4. "user1 Home" has SECURITY. First apply "Regular Users" (-R -W), giving 00, then "user 1" (+R +W), giving 11.
    5. The target node "My File.pdf" has no SECURITY modifiers on it, so the effective permissions for "User 1" on "My File.pdf" are ReadWrite 11.
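
    The top-down rules can be sketched in a few lines of Python using a two-bit Read/Write encoding. This is a simplified illustration under stated assumptions, not the traverser implementation: the function names, the "+RW"/"-RW" modifier strings and the sample path are hypothetical, the root folder is reduced to a single SECURITY entry, and each node's modifiers are assumed to be listed from the most generic principal to the most specific one.

```python
READ, WRITE = 0b10, 0b01  # bit encoding, Read first and Write second ("11" = ReadWrite)


def apply_modifier(perms, modifier):
    """Apply a modifier such as "+RW" or "-R" to a permission bit pattern."""
    bits = 0
    for flag in modifier[1:]:
        bits |= {"R": READ, "W": WRITE}[flag]
    # "+" grants the listed bits, "-" revokes them.
    return perms | bits if modifier[0] == "+" else perms & ~bits


def effective_permissions(path, principals, start=READ | WRITE):
    """Walk the content path from the root down to the target node.

    path: one list of (principal, modifier) pairs per content node.
    principals: every principal the user is or belongs to.
    start: READ | WRITE for optimistic handling, 0 for pessimistic handling.
    """
    perms = start
    for security in path:
        for principal, modifier in security:
            if principal in principals:
                perms = apply_modifier(perms, modifier)
    return perms


# Hypothetical data loosely following the example above.
path = [
    [("All principals", "+RW")],                      # Root folder
    [],                                               # Home
    [("Regular Users", "-RW"), ("user 1", "+RW")],    # user1 Home
    [],                                               # My File.pdf
]
user1 = {"All principals", "Regular Users", "user 1"}
print(format(effective_permissions(path, user1), "02b"))
```

    Because modifiers closer to the target are applied last, the "user 1" grant on "user1 Home" overrides the revocation inherited through "Regular Users".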

    5.3.2. Read-permission example

    In this example, we are going to examine a tree structure of directories and files. Also, there are users that own files and roles that can be assigned to users. Roles can have permissions on directory or file structures (here we model only canRead, as opposed to full rwx Unix permissions) and be nested. A more thorough example of modeling ACL structures can be found at "How to Build Role-Based Access Control in SQL".


    [Figure: example graph with the directory and file nodes FileRoot, Home, HomeU1, HomeU2, Desktop, etc, init.d, File1 and File2, the user and role nodes Root, User, User1, User2, Role, SUDOers, Admin1 and Admin2, and the relationship types contains, leaf, owns, member, subRole, canRead and has.]

    Find all files in the directory structure

    In order to find all files contained in this structure, we need a variable length query that follows all contains relationships and retrieves the nodes at the other end of the leaf relationships.

        MATCH ({ name: 'FileRoot' })-[:contains*0..]->(parentDir)-[:leaf]->(file)
        RETURN file

    resulting in:

        file
        Node[10]{name:"File1"}
        Node[9]{name:"File2"}

        2 rows
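
    What the variable length pattern does can be approximated by an ordinary graph traversal. The Python sketch below is illustrative only: the contains and leaf dictionaries are an assumed reading of the example graph (the exact edges are not spelled out in the text), and find_files is a hypothetical helper name.

```python
# Assumed directory tree: `contains` maps a directory to its sub-directories,
# `leaf` maps a directory to the files stored directly in it.
contains = {
    "FileRoot": ["Home", "etc"],
    "Home": ["HomeU1", "HomeU2"],
    "etc": ["init.d"],
    "HomeU2": ["Desktop"],
}
leaf = {
    "HomeU1": ["File1"],
    "Desktop": ["File2"],
}


def find_files(root):
    """Follow zero or more `contains` hops from root, then one `leaf` hop."""
    files, stack = [], [root]
    while stack:
        directory = stack.pop()
        files.extend(leaf.get(directory, []))       # the -[:leaf]-> step
        stack.extend(contains.get(directory, []))   # the -[:contains*0..]-> steps
    return sorted(files)


print(find_files("FileRoot"))
```

    Starting the traversal with the root itself on the stack corresponds to the lower bound of 0 in contains*0.., so files stored directly under the root would be found as well.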

    What files are owned by whom?

    If we introduce the concept of ownership on files, we can then ask for the owners of the files we find, connected via owns relationships to file nodes.

    MATCH ({ name: 'FileRoot' })-[:contains*0..]->()-[:leaf]->(file)


    Who has access to a File?

    Suppose we now want to check which users have read access to all Files, and define our ACL as follows:

    - The root directory has no access granted.
    - Any user having a role that has been granted canRead access to one of the parent folders of a File has read access.

    In order to find users that can read any part of the parent folder hierarchy above the files, Cypher provides optional variable length paths.

        MATCH (file)


    5.4. Hyperedges

    Imagine a user being part of different groups. A group can have different roles, and a user can be part of different groups. Apart from the membership, a user can also have different roles in different groups. The association of a User, a Group and a Role can be referred to as a HyperEdge. However, it can easily be modeled in a property graph as a node that captures this n-ary relationship, as depicted below in the U1G2R1 node.

    Figure 5.1. Graph

    [Figure: the nodes User1, Group, Group1, Group2, Role, Role1 and Role2 together with the hyperedge nodes U1G2R1 and U1G1R2, connected by the relationship types hasRoleInGroup, hasGroup, hasRole, in, isA and canHave.]

    5.4.1. Find Groups

    To find out which roles a user has in a particular group (here Group2), the following query can traverse this HyperEdge node and provide answers.

    Query.

        MATCH ({ name: 'User1' })-[:hasRoleInGroup]->(hyperEdge)-[:hasGroup]->({ name: 'Group2' }),
          (hyperEdge)-[:hasRole]->(role)
        RETURN role.name

    The role of User1 is returned:


    Result

        role.name
        "Role1"

        1 row

    5.4.2. Find all groups and roles for a user

    Here, find all groups and the roles a user has, sorted by the name of the role.

    Query.

        MATCH ({ name: 'User1' })-[:hasRoleInGroup]->(hyperEdge)-[:hasGroup]->(group),
          (hyperEdge)-[:hasRole]->(role)
        RETURN role.name, group.name
        ORDER BY role.name ASC

    The groups and roles of User1 are returned:

    Result

        role.name | group.name
        "Role1"   | "Group2"
        "Role2"   | "Group1"

        2 rows

    5.4.3. Find common groups based on shared roles

    Assume a more complicated graph:

    1. Two user nodes User1, User2.
    2. User1 is in Group1, Group2, Group3.
    3. User1 has Role1, Role2 in Group1; Role2, Role3 in Group2; Role3, Role4 in Group3 (hyper edges).
    4. User2 is in Group1, Group2, Group3.
    5. User2 has Role2, Role5 in Group1; Role3, Role4 in Group2; Role5, Role6 in Group3 (hyper edges).

    The graph for this looks like the following (nodes like U1G2R23 representing the HyperEdges):

    Figure 5.2. Graph

    [Figure: the nodes User1 and User2, Group1 to Group3 and Role1 to Role6, together with the hyperedge nodes U1G1R12, U1G2R23, U1G3R34, U2G1R25, U2G2R34 and U2G3R56, connected by the relationship types hasRoleInGroup, hasGroup and hasRole.]

    To return Group1 and Group2, as User1 and User2 share at least one common role in these two groups, the query looks like this:

    Query.

        MATCH (u1)-[:hasRoleInGroup]->(hyperEdge1)-[:hasGroup]->(group),
          (hyperEdge1)-[:hasRole]->(role),
          (u2)-[:hasRoleInGroup]->(hyperEdge2)-[:hasGroup]->(group),
          (hyperEdge2)-[:hasRole]->(role)
        WHERE u1.name = 'User1' AND u2.name = 'User2'
        RETURN group.name, count(role)
        ORDER BY group.name ASC

    The groups where User1 and User2 share at least one common role:

    Result

        group.name | count(role)
        "Group1"   | 1
        "Group2"   | 1

        2 rows
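
    The same answer can be checked by hand with a small Python sketch that treats every hyperedge node as a (user, group, roles) triple, using the memberships listed at the start of this section. The data layout and the function name shared_roles_per_group are illustrative assumptions, not part of Neo4j.

```python
from collections import Counter

# One triple per hyperedge node, taken from the list of users, groups and roles.
hyperedges = [
    ("User1", "Group1", {"Role1", "Role2"}),
    ("User1", "Group2", {"Role2", "Role3"}),
    ("User1", "Group3", {"Role3", "Role4"}),
    ("User2", "Group1", {"Role2", "Role5"}),
    ("User2", "Group2", {"Role3", "Role4"}),
    ("User2", "Group3", {"Role5", "Role6"}),
]


def shared_roles_per_group(edges, user_a, user_b):
    """Count, per group, the roles both users hold in that same group."""
    counts = Counter()
    for ua, group_a, roles_a in edges:
        for ub, group_b, roles_b in edges:
            if ua == user_a and ub == user_b and group_a == group_b:
                shared = roles_a & roles_b   # both hyperedges point at the role
                if shared:
                    counts[group_a] += len(shared)
    return sorted(counts.items())            # ordered by group name


print(shared_roles_per_group(hyperedges, "User1", "User2"))
# prints [('Group1', 1), ('Group2', 1)]
```

    Group3 does not appear because the two users hold disjoint roles there, which corresponds to the pattern failing to match on a common role node.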


    5.5. Basic friend finding based on social neighborhood

    Imagine an example graph like the following one:

    Figure 5.3. Graph

    [Figure: the person nodes Bill, Derrick, Ian, Sara, Jill and Joe, connected by seven knows relationships.]

    To find the friends of Joe's friends that are not already his friends, the query looks like this:

    Query.

        MATCH (joe { name: 'Joe' })-[:knows*2..2]-(friend_of_friend)
        WHERE NOT (joe)-[:knows]-(friend_of_friend)
        RETURN friend_of_friend.name, COUNT(*)
        ORDER BY COUNT(*) DESC, friend_of_friend.name

    This returns a list of friends-of-friends ordered by the number of connections to them, and secondly by their name.

    Resultfriend_of_friend.name COUNT(*)"Ian" 2

    "Derrick" 1

    "Jill" 1

    3 rows
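As a rough illustration of what the two-hop pattern computes, here is a small Python sketch that works on a plain edge list instead of a Neo4j database. The edge list is an assumption: the figure only shows six people and seven knows relationships, so the pairs below were chosen to be consistent with the result table.

```python
from collections import Counter

# Hypothetical undirected "knows" pairs (assumed, not taken from the manual).
KNOWS = [
    ("Joe", "Sara"), ("Joe", "Bill"), ("Sara", "Bill"), ("Sara", "Ian"),
    ("Sara", "Jill"), ("Bill", "Derrick"), ("Bill", "Ian"),
]


def adjacency(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def friends_of_friends(edges, person):
    """Count two-hop paths to people who are not already direct friends."""
    adj = adjacency(edges)
    direct = adj.get(person, set())
    counts = Counter()
    for friend in direct:
        for candidate in adj[friend]:
            # Skip the person themselves and existing friends (WHERE NOT ...).
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # ORDER BY COUNT(*) DESC, friend_of_friend.name
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Under these assumed edges, `friends_of_friends(KNOWS, "Joe")` ranks Ian first with two connecting paths, followed by Derrick and Jill with one each.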


    5.6. Co-favorited places

    Figure 5.4. Graph (nodes: SaunaX, CoffeeShop1, Cool, Cosy, MelsPlace, CoffeeShop3, CoffeeShop2, CoffeShop2, Jill, Joe; relationships: tagged, favorite)

    5.6.1. Co-favorited places: users who like x also like y

    Find the places that people who favorited this place also like:

    • Determine who has favorited place x.
    • What else have they favorited that is not place x?

    Query.

    MATCH (place)<-[:favorite]-(person)-[:favorite]->(stuff)
    WHERE place.name = 'CoffeeShop1'
    RETURN stuff.name, count(*)
    ORDER BY count(*) DESC , stuff.name

    The list of places that are favorited by people who favorited the start place.

    Result

    stuff.name     count(*)
    "MelsPlace"    2
    "CoffeShop2"   1
    "SaunaX"       1

    3 rows

    5.6.2. Co-tagged places: places related through tags

    Find places that are tagged with the same tags:

    • Determine the tags for place x.
    • What else is tagged the same as x that is not x?

    Query.

    MATCH (place)-[:tagged]->(tag)

    Result

    otherPlace.name   collect(tag.name)
    "MelsPlace"       ["Cool", "Cosy"]
    "CoffeeShop2"     ["Cool"]
    "CoffeeShop3"     ["Cosy"]

    3 rows


    5.7. Find people based on similar favorites

    Figure 5.5. Graph (nodes: Sara, Cats, Bikes, Derrick, Jill, Joe; relationships: favorite, friend)

    To find possible new friends based on them liking similar things to the asking person, use a query like this:

    Query.

    MATCH (me { name: 'Joe' })-[:favorite]->(stuff)


    5.8. Find people based on mutual friends and groups

    Figure 5.6. Graph (nodes: Bill, Group1, Bob, Jill, Joe; relationships: member_of_group, knows)

    In this scenario, the problem is to determine mutual friends and groups, if any, between persons. If no mutual groups or friends are found, 0 should be returned.

    Query.

    MATCH (me { name: 'Joe' }),(other)
    WHERE other.name IN ['Jill', 'Bob']
    OPTIONAL MATCH pGroups=(me)-[:member_of_group]->(mg)(mf)


    5.9. Find friends based on similar tagging

    Figure 5.7. Graph (nodes: Animals, Hobby, Surfing, Sara, Bikes, Horses, Cats, Derrick, Joe; relationships: tagged, favorite)

    To find people similar to me based on the taggings of their favorited items, one approach could be:

    • Determine the tags associated with what I favorite.
    • What else is tagged with those tags?
    • Who favorites items tagged with the same tags?
    • Sort the result by how many of the same things these people like.

    Query.

    MATCH (me)-[:favorite]->(myFavorites)-[:tagged]->(tag)


    5.10. Multirelational (social) graphs

    Figure 5.8. Graph (nodes: cats, nature, Ben, Sara, Joe, bikes, cars, Maria; relationships: LIKES, FOLLOWS, LOVES)

    This example shows a multi-relational network between persons and things they like. A multi-relational graph is a graph with more than one kind of relationship between nodes.

    Query.

    MATCH (me { name: 'Joe' })-[r1:FOLLOWS|:LOVES]->(other)-[r2]->(me)
    WHERE type(r1)=type(r2)
    RETURN other.name, type(r1)

    The query returns people whom Joe FOLLOWS or LOVES and who return the same type of relationship.

    Result

    other.name   type(r1)
    "Sara"       "FOLLOWS"
    "Maria"      "FOLLOWS"
    "Maria"      "LOVES"

    3 rows


    5.11. Implementing newsfeeds in a graph

    Graph: the persons Bob, Alice and Joe are connected by FRIEND relationships (two with status = 'CONFIRMED', one with status = 'PENDING'). Each person has a STATUS relationship and a NEXT relationship leading to status update nodes: bob_s1 (text = 'bobs status1', date = 1), bob_s2 (text = 'bobs status2', date = 4), alice_s1 (text = 'Alices status1', date = 2), alice_s2 (text = 'Alices status2', date = 5), joe_s1 (text = 'Joe status1', date = 3), joe_s2 (text = 'Joe status2', date = 6).

    Implementing a newsfeed or timeline feature is a frequent requirement for social applications. The following examples are inspired by Newsfeed feature powered by Neo4j Graph Database. The query asked here is:

    Starting at me, retrieve the time-ordered status feed of the status updates of me and all friends that are connected via a CONFIRMED FRIEND relationship to me.

    Query.

    MATCH (me { name: 'Joe' })-[rels:FRIEND*0..1]-(myfriend)
    WHERE ALL (r IN rels WHERE r.status = 'CONFIRMED')
    WITH myfriend
    MATCH (myfriend)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)
    RETURN myfriend.name AS name, statusupdates.date AS date, statusupdates.text AS text
    ORDER BY statusupdates.date DESC LIMIT 3

    To understand the strategy, let's divide the query into five steps:

    1. First, get the list of all my friends (along with me) through the FRIEND relationship (MATCH (me { name: 'Joe' })-[rels:FRIEND*0..1]-(myfriend)). Also, a WHERE predicate can be added to check whether the friend request is pending or confirmed.

    2. Get the latest status update of my friends through the STATUS relationship (MATCH (myfriend)-[:STATUS]-(latestupdate)).

    3. Get subsequent status updates (along with the latest one) of my friends through NEXT relationships (MATCH (myfriend)-[:STATUS]-(latestupdate)-[:NEXT*0..1]-(statusupdates)), which will give you the latest and one additional status update; adjust 0..1 to whatever suits your case.

    4. Sort the status updates by posted date (ORDER BY statusupdates.date DESC).

    5. LIMIT the number of updates you need in every query (LIMIT 3).

    Result

    name    date   text
    "Joe"   6      "Joe status2"
    "Bob"   4      "bobs status2"
    "Joe"   3      "Joe status1"

    3 rows
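The five steps can also be traced in a small Python sketch over in-memory data. This is only an illustration of the query logic, not Neo4j code. The status texts and dates come from the graph above; which pairs of people the FRIEND relationships connect is an assumption, chosen so that the output is consistent with the result table (Joe and Bob confirmed, Joe and Alice pending).

```python
# (person, person, status) for each FRIEND relationship. Assumed pairing.
FRIENDS = [
    ("Joe", "Bob", "CONFIRMED"),
    ("Joe", "Alice", "PENDING"),
    ("Bob", "Alice", "CONFIRMED"),
]

# Status updates per person in the order STATUS -> NEXT, as (date, text).
STATUS_CHAIN = {
    "Bob": [(1, "bobs status1"), (4, "bobs status2")],
    "Alice": [(2, "Alices status1"), (5, "Alices status2")],
    "Joe": [(3, "Joe status1"), (6, "Joe status2")],
}


def newsfeed(me, friends, chains, next_hops=1, limit=3):
    """Return the newest updates of `me` and all CONFIRMED friends."""
    # Step 1: me (zero hops) plus friends over CONFIRMED FRIEND relationships.
    people = {me}
    for a, b, status in friends:
        if status == "CONFIRMED" and me in (a, b):
            people.add(b if a == me else a)
    # Steps 2 and 3: the update behind STATUS plus up to `next_hops` NEXT hops.
    updates = []
    for person in people:
        for date, text in chains.get(person, [])[: next_hops + 1]:
            updates.append((person, date, text))
    # Steps 4 and 5: newest first, then cut to `limit` rows.
    updates.sort(key=lambda update: update[1], reverse=True)
    return updates[:limit]
```

With the assumed FRIEND pairing, `newsfeed("Joe", FRIENDS, STATUS_CHAIN)` returns the same three rows as the query: Joe (6), Bob (4) and Joe (3). Alice's updates are left out because her relationship to Joe is still pending.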

    Here, the example shows how to add a new status update into the existing data for a user.

    Query.

    MATCH (me)
    WHERE me.name='Bob'
    OPTIONAL MATCH (me)-[r:STATUS]-(secondlatestupdate)
    DELETE r
    CREATE (me)-[:STATUS]->(latest_update { text:'Status',date:123 })
    WITH latest_update, collect(secondlatestupdate) AS seconds
    FOREACH (x IN seconds | CREATE latest_update-[:NEXT]->x)
    RETURN latest_update.text AS new_status

    Dividing the query into steps, this query resembles adding a new item in the middle of a doubly linked list:

    1. Get the latest update (if it exists) of the user through the STATUS relationship (OPTIONAL MATCH (me)-[r:STATUS]-(secondlatestupdate)).

    2. Delete the STATUS relationship between the user and secondlatestupdate (if it exists), as this now becomes the second latest update. Only the latest update is attached through a STATUS relationship; all earlier updates are connected to their subsequent updates through a NEXT relationship (DELETE r).

    3. Now, create the new statusupdate node (with text and date as properties) and connect it with the user through a STATUS relationship (CREATE (me)-[:STATUS]->(latest_update { text:'Status',date:123 })).

    4. Pass the new status update, together with the second latest update or an empty collection, on to the next query part (WITH latest_update, collect(secondlatestupdate) AS seconds).

    5. Now, create a NEXT relationship between the latest status update and the second latest status update (if it exists) (FOREACH(x in seconds | CREATE latest_update-[:NEXT]->x)).

    Result

    new_status
    "Status"

    1 row
    Nodes created: 1
    Relationships created: 2
    Properties set: 2
    Relationships deleted: 1
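The relinking done by this query is the usual "insert at the head of a linked list" operation. A minimal Python sketch of the same idea follows. It uses plain dictionaries instead of nodes; the `add_status` helper and the sample user are hypothetical and only serve as illustration.

```python
def add_status(user, text, date):
    """Attach a new status to `user` and chain the previous one behind it."""
    previous = user.get("STATUS")  # OPTIONAL MATCH: None if there is no status yet
    latest = {"text": text, "date": date, "NEXT": previous}
    # Replacing the entry plays the role of DELETE r followed by CREATE.
    user["STATUS"] = latest
    return latest


# Hypothetical user that already has one status update.
bob = {"name": "Bob", "STATUS": {"text": "older status", "date": 4, "NEXT": None}}
new_status = add_status(bob, "Status", 123)
```

After the call, the new status hangs directly off the user, and the former latest update is reachable through `NEXT`. For a user without any status, `previous` is `None`, which corresponds to the empty collection in the FOREACH step.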

    Graph: Node[0] (name = 'Bob'), Node[1] (name = 'bob_s1', text = 'bobs status1', date = 1), Node[2] (name = 'bob_s2', text = 'bobs status2', date = 4); relationships: STATUS, NEXT.


    5.12. Boosting recommendation results

    Figure 5.9. Graph (nodes: Clark Kent, Lois Lane, Jimmy Olsen, Daily Planet, Perry White, Anderson Cooper, CNN; relationships: KNOWS with weight = 4, WORKS_AT with weight = 2 and activity values 45, 56, 10, 6, 3 and 2)

    This query finds recommended friends for the origin: people who work at the same place as the origin or who know a person that the origin knows. In addition, the origin should not already know the target. The recommendation is weighted by the weight of the relationship r2 and boosted by twice the value of the activity property, if that relationship has one.

    Query.

    MATCH (origin)-[r1:KNOWS|WORKS_AT]-(c)-[r2:KNOWS|WORKS_AT]-(candidate)
    WHERE origin.name = "Clark Kent" AND type(r1)=type(r2) AND NOT (origin)-[:KNOWS]-(candidate)
    RETURN origin.name AS origin, candidate.name AS candidate,
      SUM(ROUND(r2.weight +(COALESCE(r2.activity, 0)* 2))) AS boost
    ORDER BY boost DESC LIMIT 10

    This returns the recommended friends for the origin nodes and their recommendation score.

    Result

    origin         candidate           boost
    "Clark Kent"   "Perry White"       22.0
    "Clark Kent"   "Anderson Cooper"   4.0

    2 rows
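The scoring expression in the RETURN clause can be written out as a short Python sketch. The function names are hypothetical. The example inputs are assumptions as well: a single WORKS_AT relationship with weight 2 and activity 10, and a single KNOWS relationship with weight 4 and no activity. They reproduce the two scores in the result table, but whether these are exactly the relationships behind each row is not stated in the text.

```python
def boost(weight, activity=None):
    """Score one r2 relationship: weight + 2 * activity, missing activity counts as 0."""
    activity = 0 if activity is None else activity  # COALESCE(r2.activity, 0)
    # ROUND(...); the inputs used here are integers, so rounding changes nothing.
    return float(round(weight + activity * 2))


def total_boost(relationships):
    """Sum the scores of all r2 relationships that lead to one candidate (SUM)."""
    return sum(boost(weight, activity) for weight, activity in relationships)
```

For example, `total_boost([(2, 10)])` gives 22.0 and `total_boost([(4, None)])` gives 4.0. A candidate reachable over several paths would get the sum of the individual scores.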


    5.13. Calculating the clustering coefficient of a network

    Figure 5.10. Graph (a node with name = 'startnode' and unnamed nodes; relationships: KNOWS)

    In this example, adapted from Niko Gamulin's blog post on Neo4j for Social Network Analysis, the graph in question shows the 2-hop relationships of a sample person as nodes with KNOWS relationships.

    The clustering coefficient of a selected node is defined as the probability that two randomly selected neighbors are connected to each other. With the number of neighbors as n and the number of mutual connections between the neighbors as r, the calculation is: r / (n!/(2!(n-2)!)).

    The number of possible connections between two neighbors is n!/(2!(n-2)!) = 4!/(2!(4-2)!) = 24/4 = 6, where n is the number of neighbors (n = 4), and the actual number r of connections is 1. Therefore the clustering coefficient of node 1 is 1/6.

    n and r are quite simple to retrieve via the following query:

    Query.

    MATCH (a { name: "startnode" })--(b)
    WITH a, count(DISTINCT b) AS n
    MATCH (a)--()-[r]-()--(a)
    RETURN n, count(DISTINCT r) AS r

    This returns n and r for the above calculations.

    Result

    n   r
    4   1

    1 row
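Given n and r from the query, the final division can be done in application code. The following Python sketch applies the formula from the text. The function name is hypothetical, and returning 0 for fewer than two neighbors is a convention assumed here, since no pair of neighbors exists in that case.

```python
from fractions import Fraction
from math import comb


def clustering_coefficient(n, r):
    """Return r divided by the number of possible neighbor pairs, n!/(2!(n-2)!)."""
    possible = comb(n, 2)  # number of distinct pairs among n neighbors
    if possible == 0:
        return Fraction(0)  # assumed convention for n < 2
    return Fraction(r, possible)
```

For the values returned above, `clustering_coefficient(4, 1)` evaluates to 1/6, matching the manual calculation.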


    5.14. Pretty graphs

    This section shows how to create some of the named pretty graphs on Wikipedia.

    5.14.1. Star graph

    The graph is created by first creating a center node and then, once per element in the range, creating a leaf node and connecting it to the center.

    Query.

    CREATE (center)
    FOREACH (x IN range(1,6)| CREATE (leaf),(center)-[:X]->(leaf))
    RETURN id(center) AS id;

    The query returns the id of the center node.

    Result

    id
    0

    1 row
    Nodes created: 7
    Relationships created: 6

    Figure 5.11. Graph (a center node connected to six leaf nodes by X relationships)

    5.14.2. Wheel graph

    This graph is created in a number of steps:

    • Create a center node.
    • Once per element in the range, create a leaf and connect it to the center.
    • Connect neighboring leafs.
    • Find the minimum and maximum leaf and connect these.
    • Return the id of the center node.

    Query.

    CREATE (center)
    FOREACH (x IN range(1,6)| CREATE (leaf { count:x }),(center)-[:X]->(leaf))
    WITH center
    MATCH (large_leaf)<--(center)-->(small_leaf)
    WHERE large_leaf.count = small_leaf.count + 1
    CREATE (small_leaf)-[:X]->(large_leaf)
    WITH center, min(small_leaf.count) AS min, max(large_leaf.count) AS max
    MATCH (first_leaf)<--(center)-->(last_leaf)
    WHERE first_leaf.count = min AND last_leaf.count = max
    CREATE (last_leaf)-[:X]->(first_leaf)
    RETURN id(center) AS id

    The query returns the id of the center node.

    Result

    id
    0

    1 row
    Nodes created: 7
    Relationships created: 12
    Properties set: 6
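To see where the twelve relationships come from, the construction can be replayed as a plain edge list. This Python sketch is only an illustration; the function name and the use of 0 as the center identifier are assumptions, and leaves are identified by their count value.

```python
def wheel_edges(leaf_count=6):
    """Return the (start, end) pairs of a wheel graph with `leaf_count` leaves."""
    center = 0  # assumed identifier for the center node
    leaves = list(range(1, leaf_count + 1))
    # One spoke from the center to every leaf.
    spokes = [(center, leaf) for leaf in leaves]
    # Neighboring leaves: small_leaf.count + 1 = large_leaf.count.
    rim = [(leaf, leaf + 1) for leaf in leaves[:-1]]
    # Close the rim by connecting the last leaf to the first one.
    rim.append((leaves[-1], leaves[0]))
    return spokes + rim
```

`wheel_edges()` produces 6 spokes, 5 links between neighboring leaves and 1 closing link, which adds up to the 12 relationships reported by the query.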

    Figure 5.12. Graph (a center node and six leaf nodes with count = 1 to count = 6; relationships: X)

    5.14.3. Complete graph

    To create this graph, we first create 6 nodes and label them with the Leaf label. We then match all the unique pairs of nodes, and create a relationship between them.

    Query.

    FOREACH (x IN range(1,6)| CREATE (leaf:Leaf { count : x }))
    WITH *
    MATCH (leaf1:Leaf),(leaf2:Leaf)
    WHERE id(leaf1)< id(leaf2)
    CREATE (leaf1)-[:X]->(leaf2);

    Nothing is returned by this query.

    Result

    (empty result)

    Nodes created: 6
    Relationships created: 15
    Properties set: 6
    Labels added: 6
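The `id(leaf1) < id(leaf2)` condition keeps exactly one ordering of every pair of leaves. The same selection of unique pairs can be sketched in Python with `itertools.combinations`. The function name is hypothetical, and the numbers 1 to 6 stand in for the node ids.

```python
from itertools import combinations


def complete_graph_edges(node_count=6):
    """Return every unique pair (a, b) with a < b among `node_count` nodes."""
    nodes = range(1, node_count + 1)
    # combinations() yields each unordered pair once, in ascending order.
    return list(combinations(nodes, 2))
```

For six nodes this gives 6 * 5 / 2 = 15 pairs, the number of relationships created by the query.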

    Figure 5.13. Graph (six nodes labeled Leaf with count = 1 to count = 6; relationships: X)

    5.14.4. Friendship graph

    This query first creates a center node and then, once per element in the range, creates a cycle graph and connects it to the center.

    Query.

    CREATE (center)
    FOREACH (x IN range(1,3)| CREATE (leaf1),(leaf2),(center)-[:X]->(leaf1),(center)-[:X]->(leaf2),
      (leaf1)-[:X]->(leaf2))
    RETURN ID(center) AS id

    The id of the center node is returned by the query.

    Result

    id
    0

    1 row
    Nodes created: 7
    Relationships created: 9

    Figure 5.14. Graph (a center node and three pairs of leaf nodes; relationships: X)


    5.15. A multilevel indexing structure (path tree)

    In this example, a multi-level tree structure is used to index event nodes (here Event1, Event2 and Event3), in this case with a YEAR-MONTH-DAY granularity, making this a timeline indexing structure. However, this approach should work for a wide range of multi-level ranges.

    The structure follows a couple of rules:

    • Events can be indexed multiple times by connecting the indexing structure leafs with the events via a VALUE relationship.

    • The querying is done in a path-range fashion. That is, the start and end paths from the indexing root to the start and end leafs in the tree are calculated.

    Using Cypher, the queries following different strategies can be expressed as path sections and put together using one single query.

    The graph below depicts a structure with 3 Events being attached to an index structure at different leafs.

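    Figure 5.15. Graph (index tree: Root; Year 2010 and Year 2011, reached via 2010 and 2011; Month 12 and Month 01, reached via 12 and 01; Day 31, Day 01, Day 02 and Day 03, reached via 31, 01, 02 and 03; the day leafs are chained by NEXT; Event1, Event2 and Event3 are attached by VALUE)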

    5.15.1. Return zero range

    Here, only the events indexed under one leaf (2010-12-31) are returned. The query only needs one path segment rootPath (color Green) through the index.

    Figure 5.16. Graph (the same index tree as in Figure 5.15)

    Query.

    MATCH rootPath=(root)-[:`2010`]->()-[:`12`]->()-[:`31`]->(leaf),
      (leaf)-[:VALUE]->(event)
    WHERE root.name = 'Root'
    RETURN event.name
    ORDER BY event.name ASC

    Returning all events on the date 2010-12-31, in this case Event1 and Event2.

    Result

    event.name
    "Event1"
    "Event2"

    2 rows

    5.15.2. Return the full range

    In this case, the range goes from the first to the last leaf of the index tree. Here, startPath (color Greenyellow) and endPath (color Green) span the range, valuePath (color Blue) then connects the leafs, and the values can be read from the middle node, hanging off the values (color Red) path.

    Figure 5.17. Graph (the same index tree as in Figure 5.15)

    Query.

    MATCH startPath=(root)-[:`2010`]->()-[:`12`]->()-[:`31`]->(startLeaf),
      endPath=(root)-[:`2011`]->()-[:`01`]->()-[:`03`]->(endLeaf),
      valuePath=(startLeaf)-[:NEXT*0..]->(middle)-[:NEXT*0..]->(endLeaf),
      vals=(middle)-[:VALUE]->(event)
    WHERE root.name = 'Root'
    RETURN event.name
    ORDER BY event.name ASC

    Returning all events between 2010-12-31 and 2011-01-03, in this case all events.

    Result

    event.name
    "Event1"
    "Event2"
    "Event2"
    "Event3"

    4 rows
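The path-range idea can be imitated with nested dictionaries for the year, month and day levels and a `next` pointer between leafs. The Python sketch below is an illustration under assumptions, not the structure Neo4j stores. The class and function names are hypothetical, and the placement of the second Event2 entry (2011-01-01) and of Event3 (2011-01-03) is assumed; the text only fixes Event1 and Event2 on 2010-12-31.

```python
class Leaf:
    """A day leaf holding the events attached via VALUE."""

    def __init__(self, events):
        self.events = list(events)
        self.next = None  # NEXT relationship to the following day leaf


def build_index(entries):
    """Build root[year][month][day] from chronologically ordered entries."""
    root, previous = {}, None
    for (year, month, day), events in entries:
        leaf = Leaf(events)
        root.setdefault(year, {}).setdefault(month, {})[day] = leaf
        if previous is not None:
            previous.next = leaf
        previous = leaf
    return root


def events_in_range(root, start, end):
    """Walk from the start leaf along NEXT to the end leaf and collect events.

    Assumes that `start` is not after `end`.
    """
    leaf = root[start[0]][start[1]][start[2]]      # startPath
    end_leaf = root[end[0]][end[1]][end[2]]        # endPath
    found = []
    while leaf is not None:                        # valuePath
        found.extend(leaf.events)
        if leaf is end_leaf:
            break
        leaf = leaf.next
    return sorted(found)


# Assumed event placement, see the note above.
ENTRIES = [
    (("2010", "12", "31"), ["Event1", "Event2"]),
    (("2011", "01", "01"), ["Event2"]),
    (("2011", "01", "02"), []),
    (("2011", "01", "03"), ["Event3"]),
]
ROOT = build_index(ENTRIES)
```

Calling `events_in_range(ROOT, ("2010", "12", "31"), ("2011", "01", "03"))` lists Event2 twice because it is indexed under two leafs, just as in the query result. Using the same leaf as start and end corresponds to the zero range case.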


    5.15.3. Return partly shared path ranges

    Here, the query range results in partly shared paths when querying the index, making the introduction of a common path segment commonPath (color Black) necessary before spanning startPath (color Greenyellow) and endPath (color Darkgreen). After that, valuePath (color Blue) connects the leafs, and the indexed values are returned off the values (color Red) path.

    Figure 5.18. Graph (the same index tree as in Figure 5.15)

    Query.

    MATCH commonPath=(root)-[:`2011`]->()-[:`01`]->(commonRootEnd),
      startPath=(commonRootEnd)-[:`01`]->(startLeaf),
      endPath=(commonRootEnd)-[:`03`]->(endLeaf),
      valuePath=(startLeaf)-[:NEXT*0..]->(middle)-[:NEXT*0..]->(endLeaf),
      vals=(middle)-[:VALUE]->(event)
    WHERE root.name = 'Root'
    RETURN event.name
    ORDER BY event.name ASC

    Returning all events between 2011-01-01 and 2011-01-03, in this case Event2 and Event3.

    Result

    event.name
    "Event2"
    "Event3"

    2 rows


    5.16. Complex similarity computations

    5.16.1. Calculate similarities by complex calculations

    Here, a similarity between two players in a game is calculated by the number of times they have eaten the same food.

    Query.

    MATCH (me { name: 'me' })-[r1:ATE]->(food)(food)


    5.17. The Graphity activity stream model

    5.17.1. Find Activity Streams in a network without scaling penalty

    This is an approach for scaling the retrieval of activity streams in a friend graph put forward by Rene Pickhardt as Graphity. In short, a linked list is created for every person's friends in the order in which the last activities of these friends have occurred. When new activities occur for a friend, all the ordered friend lists that this friend is part of are reordered, transferring computing load to the time of new event updates instead of activity stream reads.

    Tip

    This approach makes excessive use of relationship types. This needs to be taken into consideration when designing a production system with this approach. See Section 16.5, Capacity, for the maximum number of relationship types.

    To find the activity stream for a person, just follow the linked list of the friend list, and retrieve the needed amount of activities from the respective activity list of the friends.

    Query.

    MATCH p=(me { name: 'Jane' })-[:jane_knows*]->(friend),
      (friend)-[:has]->(status)
    RETURN me.name, friend.name, status.name, length(p)
    ORDER BY length(p)

    This returns the activity stream for Jane.

    Result

    me.name   friend.name   status.name   length(p)
    "Jane"    "Bill"        "Bill_s1"     1
    "Jane"    "Joe"         "Joe_s1"      2
    "Jane"    "Bob"         "Bob_s1"      3

    3 rows
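Reading the stream is a walk along one person's friend list. The Python sketch below illustrates this with two dictionaries in place of the graph. The names `JANE_KNOWS`, `HAS` and `activity_stream` are hypothetical, and the order of the chain (Jane, Bill, Joe, Bob) is inferred from the path lengths in the result table.

```python
# Jane's friend list as a linked list: each entry points to the next friend.
JANE_KNOWS = {"Jane": "Bill", "Bill": "Joe", "Joe": "Bob"}

# The status node reached through each friend's "has" relationship.
HAS = {"Bill": "Bill_s1", "Joe": "Joe_s1", "Bob": "Bob_s1"}


def activity_stream(me, knows_next, latest_status):
    """Follow the friend list of `me` and collect each friend's latest status."""
    stream = []
    friend, hops = knows_next.get(me), 1
    while friend is not None:
        if friend in latest_status:
            stream.append((me, friend, latest_status[friend], hops))
        friend = knows_next.get(friend)
        hops += 1  # corresponds to length(p)
    return stream
```

`activity_stream("Jane", JANE_KNOWS, HAS)` returns Bill, Joe and Bob in list order with path lengths 1, 2 and 3. No sorting is needed at read time, because the list is kept in order when updates are written.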

    Figure 5.20. Graph (nodes: Jane, Bill, Joe, Bob, Ted, Bill_s1, Bill_s2, Joe_s1, Joe_s2, Bob_s1, Ted_s1, Ted_s2; relationships: jane_knows, bob_knows, has, next)


    5.18. User roles in graphs

    This is an example showing a hierarchy of roles. What's interesting is that a tree is not sufficient for storing this kind of structure, as elaborated below.

    This is an implementation of an example found in the article A Model to Represent Directed Acyclic Graphs (DAG) on SQL Databases by Kemal Erdogan. The article discusses how to store directed acyclic graphs (DAGs) in SQL based DBs.

    DAGs are almost trees, but with a twist: it may be possible to reach the same node through different paths. Trees are restricted from this possibility, which makes them much easier to handle. In our case it is Ali and Engin, as they are both admins and users and thus reachable through these group nodes. Reality often looks this way and can't be captured by tree structures.

    In the article an SQL Stored Procedure solution is provided. The main idea, which also has some support from scientists, is to pre-calculate all possible (transitive) paths. Pros and cons of this approach:

    • decent performance on read
    • low performance on insert
    • wastes lots of space
    • relies on stored procedures

    In Neo4j storing the roles is trivial. In this case we use PART_OF (green edges) relationships to model the group hierarchy and MEMBER_OF (blue edges) to model membership in groups. We also connect the top level groups to the reference node by ROOT relationships. This gives us a useful partitioning of the graph. Neo4j has no predefined relationship types; you are free to create any relationship types and give them the semantics you want.

    Let's now have a look at how to retrieve information from the graph. The queries are done using Cypher; the Java code uses the Neo4j Traversal API (see Section 34.2, Traversal Framework Java API, which is part of Part VIII, Advanced Usage).


    5.18.1. Get the admins

    In Cypher, we could get the admins like this:

    MATCH ({ name: 'Admins' })(group)
    RETURN group.name

    group.name
    "ABCTechnicians"
    "Technicians"
    "Users"

    3 rows

    Using the Neo4j Java Traversal API, this query looks like:

    Node jale = getNodeByName( "Jale" );


    traversalDescription = db.t