The openCypher Project - An Open Graph Query Language

Post on 15-Apr-2017


The openCypher project

Michael Hunger

Philip Rathle
General comment: we have lots & lots of great slides introducing property graphs & Cypher. Suggest you reuse those as that will save lots of time, rather than attempting to create more. I'll send you a few that I have. Suggest asking +michael.hunger@neotechnology.com or +ryan.boyd@neotechnology.com or Nicole as well.

Topics

• Property Graph Model
• Cypher - A language for querying graphs
• Cypher History
• Cypher Demo
• Current implementation in Neo4j
• User Feedback
• Opening up - The openCypher project
• Governance, Contribution Process
• Planned Deliverables

The Property Graph Model
You know it, right?


[Figure: example property graph with a CAR node and a DRIVES relationship. Properties shown: name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975; since: Jan 10, 2011; brand: "Volvo", model: "V70".]

Labeled Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties

[Figure: two PERSON nodes with relationships labeled LOVES (twice), LIVES WITH, and OWNS.]
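The components above can be sketched as a small in-memory data structure. This is only an illustration, not how Neo4j stores data: the `Node` and `Relationship` classes and the `outgoing` helper are hypothetical names, and the assignment of properties to nodes and the relationship directions are an assumed reading of the example figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                   # e.g. {"Person"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node                                   # relationships are directed: start -> end
    type: str                                     # exactly one type per relationship
    end: Node
    properties: dict = field(default_factory=dict)


# Assumed reading of the example figure (directions are a guess).
dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
    Relationship(ann, "OWNS", car),
]


def outgoing(node, rel_type):
    """Return the end nodes of all relationships of `rel_type` leaving `node`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]


driven = outgoing(dan, "DRIVES")
```

Here `driven` holds the single car node, and the `since` value lives on the relationship rather than on either node.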

Relational Versus Graph Models

[Figure: side-by-side comparison. Relational model: Person, Person-Friend, and Person tables with rows ANDREAS, DELIA, TOBIAS, MICA. Graph model: nodes ANDREAS, TOBIAS, MICA, DELIA connected by three KNOWS relationships.]

Cypher Query Language
Why, How, When?


Why Yet Another Query Language (YAQL)?

• SQL and SPARQL hurt our brains

• Our brains crave patterns

• It's all about patterns

• Creating a query language is fun (and hard work)

Michael Hunger
probably add a slide on the property graph model for which Cypher is made?
Petra Selmer
As discussed on Weds, talking about the property graph model may be too basic for the audience? Up to you...

What is Cypher?

• A graph query language that allows for expressive and efficient querying of graph data

• Intuitive, powerful and easy to learn

• Write graph queries by describing patterns in your data

• Focus on your domain, not the mechanics of data access

• Designed to be a human-readable query language

• Suitable for developers and operations professionals


What is Cypher?

• Cypher is declarative: users express what data to retrieve, not how to retrieve it

• The guiding principle behind Cypher is to make simple things easy and complex things possible

• A humane query language

• Stolen from SQL (common keywords), SPARQL (pattern matching), Python and Haskell (collection semantics)

Why Cypher?

Compared to:
• SPARQL (Cypher came from real-world use, not academia)
• Gremlin (declarative vs imperative)
• SQL (graph-specific vs set-specific)

(Cypher)-[:LOVES]->(ASCII Art)
A language should be readable, not just writable. You will read your code dozens more times than you write it. Regular expressions, for example, are write-only.

Querying the Graph
Some Examples With Cypher


Basic Query: Who do people report to?

MATCH (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})

[Figure: the query above annotated with NODE, LABEL, and PROPERTY callouts; nodes Steven and Andrew joined by a REPORTS_TO relationship.]

Basic Query Comparison: Who do people report to?

SELECT *
FROM Employee AS e
  JOIN Employee_Report AS er ON (e.id = er.manager_id)
  JOIN Employee AS sub ON (er.sub_id = sub.id)

MATCH (e:Employee)-[:REPORTS_TO]->(mgr:Employee)
RETURN *


Cypher Syntax
Only Tip of the Iceberg


Syntax: Patterns

( )-->( )

(node:Label {key:value})

(node1)-[rel:REL_TYPE {key:value}]->(node2)

(node1)-[:REL_TYPE1]->(node2)<-[:REL_TYPE2]-(node3)

(node1)-[:REL_TYPE*m..n]->(node2)

Patterns are used in

• (OPTIONAL) MATCH

• CREATE, MERGE

• shortestPath()

• Predicates

• Expressions

• (Comprehensions)

Syntax: Structure

(OPTIONAL) MATCH <patterns>

WHERE <predicates>

RETURN <expression> AS <name>

ORDER BY <expression>

SKIP <offset> LIMIT <size>

Syntax: Automatic Aggregation

MATCH <patterns>

RETURN <expr>, collect([distinct] <expression>) AS <name>,

count(*) AS freq

ORDER BY freq DESC
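"Automatic" here means there is no GROUP BY clause: every non-aggregated expression in RETURN acts as a grouping key. The Python sketch below emulates that behaviour on a handful of made-up rows. The `aggregate` function and the sample data are hypothetical and only mirror the shape of the template above.

```python
from collections import defaultdict

# Hypothetical matched rows: (grouping expression, expression to collect).
rows = [
    ("Andrew", "Steven"),
    ("Andrew", "Nancy"),
    ("Steven", "Robert"),
    ("Andrew", "Steven"),
]


def aggregate(rows, distinct=False):
    """Emulate RETURN key, collect([distinct] value), count(*) AS freq ORDER BY freq DESC."""
    collected = defaultdict(list)
    freq = defaultdict(int)
    for key, value in rows:
        freq[key] += 1                      # count(*) counts every row in the group
        if not distinct or value not in collected[key]:
            collected[key].append(value)    # collect(...) gathers values per group
    result = [(key, collected[key], freq[key]) for key in freq]
    result.sort(key=lambda row: row[2], reverse=True)
    return result


grouped = aggregate(rows, distinct=True)
```

Note that `count(*)` still counts all three "Andrew" rows even though the distinct collection holds only two names.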

DataFlow: WITH

WITH <expression> AS <name>, ....

• controls data flow between query segments
• separates reads from writes
• can also
  • aggregate
  • sort
  • paginate
• replacement for HAVING
• as many WITHs as you like

Structure: Writes

CREATE <pattern>

MERGE <pattern> ON CREATE ... ON MATCH ...

(DETACH) DELETE <entity>

SET <property,label>

REMOVE <property,label>

Data Import

[USING PERIODIC COMMIT <count>]

LOAD CSV [WITH HEADERS] FROM "URL" AS row

... any Cypher clauses, mostly match + updates ...

Collections

UNWIND (range(1,10) + [11,12,13]) AS x

WITH collect(x) AS coll

WHERE any(x IN coll WHERE x % 2 = 0)

RETURN size(coll), coll[0], coll[1..-1] ,

reduce(a = 0, x IN coll | a + x),

extract(x IN coll | x*x), filter(x IN coll WHERE x > 10),

[x IN coll WHERE x > 10 | x*x ]
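For readers more at home in Python, the collection expressions above map fairly directly onto list operations. The sketch below is a rough equivalent rather than a definition of Cypher semantics; the dictionary keys are made-up names for the returned columns. Two details worth noting: Cypher's `range(1,10)` includes both ends, and the slice `coll[1..-1]` excludes its end index, just like a Python slice.

```python
from functools import reduce

# UNWIND (range(1,10) + [11,12,13]) AS x  ...  WITH collect(x) AS coll
coll = list(range(1, 11)) + [11, 12, 13]

# WHERE any(x IN coll WHERE x % 2 = 0)
has_even = any(x % 2 == 0 for x in coll)

result = {
    "size": len(coll),                                   # size(coll)
    "first": coll[0],                                    # coll[0]
    "inner": coll[1:-1],                                 # coll[1..-1]
    "total": reduce(lambda a, x: a + x, coll, 0),        # reduce(a = 0, x IN coll | a + x)
    "squares": [x * x for x in coll],                    # extract(x IN coll | x*x)
    "over_ten": [x for x in coll if x > 10],             # filter(x IN coll WHERE x > 10)
    "over_ten_squared": [x * x for x in coll if x > 10], # [x IN coll WHERE x > 10 | x*x]
}
```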

Maps & Entities

WITH {age:42, name: "John", male:true} as data

WHERE exists(data.name) AND data["age"] = 42

CREATE (n:Person) SET n += data

RETURN [k in keys(n) WHERE k CONTAINS "a"

| {key: k, value: n[k] } ]

Optional Schema

CREATE INDEX ON :Label(property)

CREATE CONSTRAINT ON (n:Label) ASSERT n.property IS UNIQUE

CREATE CONSTRAINT ON (n:Label) ASSERT exists(n.property)

CREATE CONSTRAINT ON (:Label)-[r:REL]->(:Label2)

ASSERT exists(r.property)

And much more ...

neo4j.com/docs/stable/cypher-refcard

More Examples


Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, each up to 3 levels down

Cypher Query

MATCH (sub)-[:REPORTS_TO*0..3]->(boss),
      (report)-[:REPORTS_TO*1..3]->(sub)
WHERE boss.firstName = 'Andrew'
RETURN sub.firstName AS Subordinate, count(report) AS Total;

SQL Query

[Figure: the equivalent SQL query, shown as an image on the slide.]
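To make the variable-length patterns `*0..3` and `*1..3` in the Cypher query concrete, here is a small Python emulation over an in-memory org chart. The employee names beyond Steven, Andrew, and Robert, the reporting structure, and both function names are invented for illustration. It also assumes each employee has at most one manager, so every path is unique.

```python
# Hypothetical org chart: employee -> manager (one REPORTS_TO relationship each).
REPORTS_TO = {
    "Steven": "Andrew",
    "Nancy": "Andrew",
    "Robert": "Steven",
    "Anne": "Steven",
    "Michael": "Robert",
}
EVERYONE = set(REPORTS_TO) | set(REPORTS_TO.values())


def managers_within(employee, min_hops, max_hops):
    """Return everyone reachable from `employee` in min_hops..max_hops REPORTS_TO steps."""
    found, current, hops = [], employee, 0
    while current is not None and hops <= max_hops:
        if hops >= min_hops:
            found.append(current)
        current = REPORTS_TO.get(current)
        hops += 1
    return found


def report_totals(boss):
    """Emulate the query: subordinates of `boss` (0..3 hops) with their report counts (1..3 hops)."""
    totals = {}
    for sub in EVERYONE:
        if boss not in managers_within(sub, 0, 3):
            continue                       # (sub)-[:REPORTS_TO*0..3]->(boss) does not match
        reports = [r for r in EVERYONE if sub in managers_within(r, 1, 3)]
        if reports:                        # MATCH drops rows where the second pattern finds nothing
            totals[sub] = len(reports)
    return totals


totals = report_totals("Andrew")
```

Because the lower bound of the first pattern is 0, the boss himself appears as a row; employees nobody reports to disappear, since a plain MATCH (unlike OPTIONAL MATCH) keeps only rows where both patterns match.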

Who is in Robert’s (direct, upwards) reporting chain?

MATCH path=(e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN path;

Product Cross-Sell

MATCH (choc:Product {productName: 'Chocolade'})<-[:ORDERS]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:ORDERS]->(other:Product)
RETURN employee.firstName, other.productName, count(distinct o2) as count
ORDER BY count DESC
LIMIT 5;

Neo4j's Cypher Implementation


History of Cypher

• 1.4 - Cypher initially added to Neo4j
• 1.6 - Cypher becomes part of REST API
• 1.7 - Collection functions, global search, pattern predicates
• 1.8 - Write operations
• 1.9 - Type System, Traversal Matcher, Caches, String functions, more powerful WITH, Laziness, Profiling, Execution Plan
• 2.0 - Label support, label-based indexes and constraints, MERGE, transactional HTTP endpoint, literal maps, slices, new parser, OPTIONAL MATCH
• 2.1 - LOAD CSV, COST Planner, reduce eagerness, UNWIND, versioning
• 2.2 - COST Planner default, EXPLAIN, PROFILE, vis. Query Plan, IDP
• 2.3 -

Try it out!

Petra Selmer
+michael.hunger@neotechnology.com - I added a new image
Michael Hunger
cut off the browser chrome, show the graph result instead

APIs
• Embedded
  graphDb.execute(query, params);
• HTTP – transactional Cypher endpoint
  :POST /db/data/transaction[/commit]
  {statements:[{statement: "query", parameters: params, resultDataContents:["row"], includeStats:true}, ...]}
• Bolt – binary protocol
  Driver driver = GraphDatabase.driver( "bolt://localhost" );
  Session session = driver.session();
  Result rs = session.run("CREATE (n) RETURN n");
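The HTTP variant boils down to posting a JSON document of statements. As a rough sketch, the request body from the slide could be assembled like this in Python; `build_transaction_payload` is a hypothetical helper name, the field names and the path are taken from the slide, and nothing is actually sent over the network.

```python
import json

# Path as given on the slide; host and port are deliberately left out.
ENDPOINT_PATH = "/db/data/transaction/commit"


def build_transaction_payload(query, params):
    """Build the JSON body for one statement sent to the transactional endpoint."""
    statement = {
        "statement": query,
        "parameters": params,
        "resultDataContents": ["row"],   # ask for plain rows in the response
        "includeStats": True,            # ask for update statistics as well
    }
    return json.dumps({"statements": [statement]})


body = build_transaction_payload("CREATE (n) RETURN n", {})
```

Several statements can share one request by adding more entries to the `statements` list.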

Cypher Today - Neo4j Implementation

• Convert the input query into an abstract syntax tree (AST)
• Optimise and normalise the AST (alias expansion, constant folding etc)
• Create a query graph - a high-level, abstract representation of the query - from the normalised AST
• Create a logical plan, consisting of logical operators, from the query graph, using the statistics store to calculate the cost. The cheapest logical plan is selected using IDP (iterative dynamic programming)
• Create an execution plan from the logical plan by choosing a physical implementation for logical operators
• Execute the query

http://neo4j.com/blog/introducing-new-cypher-query-optimizer/
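The cost-based step can be illustrated with a deliberately tiny sketch: list the candidate plans, estimate a cost for each from statistics, and keep the cheapest. Everything here is an assumption made for illustration. The statistics are invented, the cost model is simply "estimated rows touched", and the candidates are enumerated exhaustively rather than with IDP, which a real planner needs because the number of possible plans grows too quickly.

```python
# Hypothetical statistics, standing in for the planner's statistics store.
STATS = {"all_nodes": 10_000, "by_label": {"Employee": 9, "Product": 77}}


def candidate_plans(label):
    """Return (plan description, estimated rows touched) pairs for MATCH (n:<label>)."""
    return [
        ("AllNodesScan -> Filter(label)", STATS["all_nodes"]),
        ("NodeByLabelScan", STATS["by_label"][label]),
    ]


def cheapest_plan(label):
    # Toy cost model: the cost of a plan equals its estimated number of rows touched.
    return min(candidate_plans(label), key=lambda plan: plan[1])


best = cheapest_plan("Employee")
```

With these numbers the label scan wins by a wide margin, which is the kind of decision the statistics are there to support.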


Neo4j Query Planner

Cost-based query planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds
  • Non optimal queries
  • Cartesian Product
  • Missing Indexes, Global Scans
  • Typos
  • Massive Fan-Out

openCypher
An open graph query language


Why?

We love Cypher!

Our users love Cypher.

We want to make everyone happy through using it.

And have Cypher run on their data(base).

We want to collaborate with community and industry partners to create the best graph query language possible!

Michael Hunger
what are the guiding principles around openCypher, what about governance, decision making, processes?
Petra Selmer
The final decision - regarding PRs of CIPs, feature reqs, RI artifacts etc - will be made by the CLG. I am unsure as to whether this is something that you want to have written on the slide, though. The idea is to open up the "body", but we have not yet discussed this in enough depth to give more specifics. We've also toyed with opening up the CLG, but nothing further, and certainly nothing we want to commit to at this early stage.
Petra Selmer
Re processes, this is (will be covered in the CIP slides)
Petra Selmer
I have actually put in a point about the CLG and governance after all.... Please amend the point if it sounds too... controlling! Philip has made mention of the fact that the CLG must not appear to be "gatekeepers", so we have to tread very lightly here.
Petra Selmer
I have now published some CLG minutes, but not sure whether it is useful to show this here... so up to you. If you do decide it is useful, the link is https://opencypher.github.io/meeting-minutes/
Michael Hunger
what are the goals
Petra Selmer
I've moved the "Goals" slide to just after this one.. does this address the comment?

We love the love


Future of (open)Cypher

• Decouple the language from Neo4j

• Open up and make the language design process transparent

• Encourage use within databases, tools, highlighters, etc.

• Delivery of language docs, tools and implementation

• Governed by the Cypher Language Group (CLG)


CIP (Cypher Improvement Proposal)
• A CIP is a semi-formal specification providing a rationale for new language features and constructs
• Contributions are welcome: submit either a CIP (as a pull request) or a feature request (as an issue) at the openCypher GitHub repository
• See "Resources" for
  • accepted CIPs
  • Contribution Process
  • Template

github.com/opencypher/openCypher

CIP structure
• Sections include:
  • motivation
  • background
  • proposal (including the syntax and semantics)
  • alternatives
  • interactions with existing features
  • benefits
  • drawbacks
• Example of the "STARTS WITH / ENDS WITH / CONTAINS" CIP

Deliverables

✔ Improvement Process
✔ Governing Body
✔ Language grammar (Jan-2016)

• Technology compatibility kit (TCK)
• Cypher Reference Documentation
• Cypher language specification
• Reference implementation (under Apache 2.0)
• Cypher style guide
• Opening up the CLG

Cypher language specification

• EBNF Grammar

• Railroad diagrams

• Semantic specification

• Licensed under a Creative Commons license

Language Grammar (RELEASED Jan-30-2016)

…

Match = ['OPTIONAL', SP], 'MATCH', SP, Pattern, {Hint}, [Where] ;

Unwind = 'UNWIND', SP, Expression, SP, 'AS', SP, Variable ;

Merge = 'MERGE', SP, PatternPart, {SP, MergeAction} ;

MergeAction = ('ON', SP, 'MATCH', SP, SetClause) | ('ON', SP, 'CREATE', SP, SetClause) ;

...

github.com/opencypher/openCypher/blob/master/grammar.ebnf

Technology Compatibility Kit (TCK)

● Validates a Cypher implementation

● Certifies that it complies with a given version of Cypher

● Based on given dataset

● Executes a set of queries and

● Verifies expected outputs
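One way to picture the points above is a harness that feeds queries to an implementation and compares what comes back with the expected rows. The sketch below is purely illustrative and says nothing about how the actual TCK is or will be built: `run_compliance_suite`, `toy_execute`, the dataset, and the test case are all hypothetical, and `toy_execute` is a stand-in for a real Cypher engine that understands exactly one query.

```python
def run_compliance_suite(execute, dataset, cases):
    """Run each (query, expected_rows) case through `execute` and collect the failures."""
    failures = []
    for query, expected in cases:
        actual = execute(dataset, query)
        # Compare sorted copies so that row order does not matter for unordered queries.
        if sorted(actual) != sorted(expected):
            failures.append((query, expected, actual))
    return failures


def toy_execute(dataset, query):
    # Stand-in for the implementation under test: it only knows one hard-coded query.
    if query == "MATCH (n:Person) RETURN n.name":
        return [(name,) for name in dataset["Person"]]
    raise NotImplementedError(query)


dataset = {"Person": ["Dan", "Ann"]}
cases = [("MATCH (n:Person) RETURN n.name", [("Ann",), ("Dan",)])]

failures = run_compliance_suite(toy_execute, dataset, cases)
```

An empty `failures` list means every case produced the expected rows. Comparing structured rows rather than printed text is one possible design choice, since it keeps the check independent of how an implementation formats its output.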

Michael Hunger
how would that work? checking the outputs? on a textual basis? i.e. completely independent of implementation stack/language?
Petra Selmer
I think at this stage any more detail may back us into a corner... if this was further along, we could say more about it. However, we're just at the beginning..

Cypher Reference Documentation

• Style Guide

• User documentation describing the use of Cypher

• Example datasets with queries

• Tutorials

• GraphGists

Style Guide

• Labels are CamelCase

• Properties and functions are lowerCamelCase

• Keywords and Relationship-Types are ALL_CAPS

• Patterns should be complete and left to right

• Put anchored nodes first

• .... to be released ...

Reference implementation (ASL 2.0)

• A fully functional implementation of key parts of the stack needed to support Cypher inside a platform or tool

• First deliverable: parser taking a Cypher statement and parsing it into an AST (abstract syntax tree)

• Future deliverables:
  • Rule-based query planner
  • Query runtime
• Distributed under the Apache 2.0 license
• Can be used as an example or as an implementation foundation

The Cypher Language Group (CLG)

• The steering committee for language evolution

• Reviews feature requests and proposals (CIP)

• Caretakers of the language

• Focus on guiding principles

• Long term focus, no quick fixes & hacks

• Currently a group of Cypher authors, developers and users

• Publish Meeting Minutes -> opencypher.github.io/meeting-minutes/

“Graph processing is becoming an indispensable part of the modern big data stack. Neo4j’s Cypher query language has greatly accelerated graph database adoption.

We are looking forward to bringing Cypher’s graph pattern matching capabilities into the Spark stack, making it easier for masses to access query graph processing.”

- Ion Stoica, CEO & Founder Databricks

“Lots of software systems could be improved by using a graph datastore. One thing holding back the category has been the lack of a widely supported, standard graph query language. We see the appearance of openCypher as an important step towards the broader use of graphs across the industry.”

- Rebecca Parsons, ThoughtWorks, CTO

Some people like it

And support openCypher

Resources

• http://www.opencypher.org/

• https://github.com/opencypher/openCypher

• https://github.com/opencypher/openCypher/blob/master/CONTRIBUTING.adoc

• https://github.com/opencypher/openCypher/tree/master/cip

• https://github.com/opencypher/openCypher/pulls

• http://groups.google.com/group/openCypher

• @openCypher

Please contribute
Feedback, Ideas, Proposals
Implementations

Thank You!
Questions?