Webinar: RDBMS to Graphs

Post on 21-Mar-2017



RDBMS TO GRAPH

Webinar: Live from San Mateo, March 9, 2017

TRADITIONAL DATA STRUCTURE

[Diagram: HR-tools, Supply, Payments, Logistics, CRM, and Support, each backed by its own RDBMS]

SHIFT TOWARDS SYSTEMS OF ENGAGEMENT

• Users Engaging With Devices
• Users Engaging With Users
• Devices Engaging With Devices

SYSTEMS OF ENGAGEMENT

SHIFT TOWARDS SYSTEMS OF ENGAGEMENT

You are here!

Data volume

SYSTEMS OF RECORD: Relational Database Model
• Structured
• Pre-computed
• Based on rigid rules

SYSTEMS OF ENGAGEMENT: NoSQL Database Model
• Highly Flexible
• Real-Time Queries
• Highly Contextual

SYSTEMS OF RECORD

SYSTEMS OF ENGAGEMENT

This is data modeled as a graph!

Intuitiveness Speed Agility

Intuitiveness


Speed

“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10-100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”

- Volker Pacher, Senior Developer

“Minutes to milliseconds” performance

Queries up to 1000x faster than RDBMS or other NoSQL

Intuitiveness

Speed, because Native Graph Database

Agility

WHAT DOES “NATIVE” MEAN?

Employee ID | Name | PictureRef | Building | Office | Department | Title | Degree1 | Uni1 | Major1
4951870 | John Doe | s3://acme-pics/4951870.png | 1200 | 124A | Eng | Software Engineer II | MS | Harvard | Computer Science
9765207 | Jane Smith | s3://acme-pics/9765207.png | 1300 | 187D | BizOps | Sr Operations Associate | BS | Stanford | Physics
4150915 | Shyam Bhatt | s3://acme-pics/4150915.png | 45 | 432C | Sales | Enterprise Sales Assoc | MBA | Penn | Accounting
7566243 | Kathryn Bates | s3://acme-pics/7566243.png | 44 | 334B | Eng | Staff Software Engineer | PhD | UCB | Computer Science

WHY DOES “NATIVE” MATTER?

Think of a Relational DB Query

EmpID | Name | PictureRef
4951870 | John Doe | s3://acme-pics/4951870.png
9765207 | Jane Smith | s3://acme-pics/9765207.png
4150915 | Shyam Bhatt | s3://acme-pics/4150915.png
7566243 | Kathryn Bates | s3://acme-pics/7566243.png

EmpID | Manager ID | StartDate | EndDate
4951870 | 9765207 | 20170101 | null
9765207 | 7566243 | 20150130 | null
4150915 | 8795882 | 20141215 | 20150312
7566243 | 8509238 | 20120605 | 20140124

EmpID | Building | Office
4951870 | 1200 | 124A
9765207 | 1300 | 187D
4150915 | 45 | 432C


16+ Index Lookups. Expensive!

(Partial) Graph View

1 Index Lookup (find :Employee nodes)

Then Index-Free Adjacency

(:Employee {id: 4951870})-[:IS_MANAGED_BY]->(:Employee {id: 9765207})
(:Employee {id: 4951870})-[:HAS_OFFICE]->(:Office {id: 1200124a})-[:LOCATED_IN]->(:Building {id: 1200})
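To make the contrast between index lookups and index-free adjacency concrete, here is a minimal Python sketch. It is a toy model, not Neo4j's storage engine: the dictionaries, the `Node` class, the `stats` counter, and the office id `1300187d` are all hypothetical and exist only for illustration, while the employee IDs and buildings come from the example tables. The question asked is "which building does this employee's manager sit in?"

```python
from dataclasses import dataclass, field

# Relational style: every hop between tables is another index probe.
# Toy "indexes" keyed by EmpID, with values taken from the example tables.
manager_index = {4951870: 9765207, 9765207: 7566243}
office_index = {4951870: ("1200", "124A"), 9765207: ("1300", "187D")}
stats = {"index_lookups": 0}


def index_lookup(index, key):
    """Simulate one index probe and count it."""
    stats["index_lookups"] += 1
    return index[key]


def managers_building_relational(emp_id):
    manager_id = index_lookup(manager_index, emp_id)            # probe 1
    building, _office = index_lookup(office_index, manager_id)  # probe 2
    return building


# Graph style: one index lookup finds the start node, then each hop
# follows a direct reference stored on the node itself.
@dataclass
class Node:
    label: str
    props: dict
    rels: dict = field(default_factory=dict)  # relationship type -> Node


building_1300 = Node("Building", {"id": "1300"})
office_187d = Node("Office", {"id": "1300187d"}, {"LOCATED_IN": building_1300})
jane = Node("Employee", {"id": 9765207}, {"HAS_OFFICE": office_187d})
john = Node("Employee", {"id": 4951870}, {"IS_MANAGED_BY": jane})
employee_index = {4951870: john, 9765207: jane}


def managers_building_graph(emp_id):
    start = index_lookup(employee_index, emp_id)  # the only index probe
    office = start.rels["IS_MANAGED_BY"].rels["HAS_OFFICE"]
    return office.rels["LOCATED_IN"].props["id"]
```

In this sketch the relational version pays one probe per table it touches, so the count grows with every extra hop, while the graph version stays at a single probe however long the chain of relationships is.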


A Naturally Adaptive Model + A Query Language Designed for Connectedness = Agility

Cypher

Typical Complex SQL Join vs. The Same Query using Cypher

MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total
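To show what the variable-length patterns in that query expand to, here is a rough Python sketch of the same traversal over an in-memory org chart. The `MANAGES` dictionary and every name in it except "John Doe" are made up for illustration, and the sketch assumes the hierarchy is a tree, so each person is reached by exactly one path.

```python
from collections import Counter

# Hypothetical org chart: manager -> direct reports.
MANAGES = {
    "John Doe": ["Ann", "Bob"],
    "Ann": ["Cid"],
    "Bob": [],
    "Cid": ["Dee"],
    "Dee": [],
}


def reachable(start, min_hops, max_hops):
    """Yield one node per MANAGES path of min_hops..max_hops hops from start."""
    frontier = [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            yield from frontier
        # Move one level further down the hierarchy.
        frontier = [r for node in frontier for r in MANAGES.get(node, [])]


def subordinate_totals(boss):
    totals = Counter()
    for sub in reachable(boss, 0, 3):          # (boss)-[:MANAGES*0..3]->(sub)
        for _report in reachable(sub, 1, 3):   # (sub)-[:MANAGES*1..3]->(report)
            totals[sub] += 1                   # count(report), grouped by sub
    return dict(totals)
```

As in the Cypher version, the `*0..3` bound means the boss counts as their own "subordinate" at zero hops, and people with no reports produce no row at all.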

Project Impact
• Less time writing queries
• Less time debugging queries
• Code that’s easier to read

ABOUT ME
• Developed web apps for 5 years including e-commerce, business workflow, more.
• Worked at Google for 8 years on Google Apps, Cloud Platform
• Technologies: Python, Java, BigQuery, Oracle, MySQL, OAuth

ryan@neo4j.com @ryguyrg

NEO4j USE CASES
• Real Time Recommendations
• Master Data Management
• Fraud Detection
• Identity & Access Management
• Graph Based Search
• Network & IT-Operations


GRAPH THINKING: Real Time Recommendations

[Diagram: graph with VIEWED and BOUGHT relationships]

“As the current market leader in graph databases, and with enterprise features for scalability and availability, Neo4j is the right choice to meet our demands.”

- Marcos Wada, Software Developer, Walmart


GRAPH THINKING: Master Data Management

[Diagram: graph with MANAGES, LEADS, REGION, and COLLABORATES relationships]

Neo4j is the heart of Cisco HMP: used for governance and single source of truth and a one-stop shop for all of Cisco’s hierarchies.


GRAPH THINKING: Master Data Management

[Diagram: graph of Solution, SupportCase, KnowledgeBaseArticle, and Message nodes]

Neo4j is the heart of Cisco’s Helpdesk Solution too.


GRAPH THINKING: Fraud Detection

[Diagram: graph with OPENED_ACCOUNT, HAS, IS_ISSUED, and LIVES relationships]

“Graph databases offer new methods of uncovering fraud rings and other sophisticated scams with a high-level of accuracy, and are capable of stopping advanced fraud scenarios in real-time.”

- Gorka Sadowski, Cyber Security Expert


GRAPH THINKING: Graph Based Search


[Diagram: graph with PUBLISH, INCLUDE, CREATE, CAPTURE, IN, SOURCE, and USES relationships]

Uses Neo4j to manage the digital assets inside of its next generation in-flight entertainment system.


GRAPH THINKING: Network & IT-Operations

[Diagram: graph with BROWSES, CONNECTS, BRIDGES, ROUTES, POWERS, HOSTS, and QUERIES relationships]

Uses Neo4j for network topology analysis for big telco service providers


GRAPH THINKING: Identity And Access Management


[Diagram: graph with TRUSTS, ID, AUTHENTICATES, OWNS, and CAN_READ relationships]

UBS was the recipient of the 2014 Graphie Award for “Best Identity And Access Management App”


Neo4j Adoption by Selected Verticals
• SOFTWARE
• FINANCIAL SERVICES
• RETAIL
• MEDIA & BROADCASTING
• SOCIAL NETWORKS
• TELECOM
• HEALTHCARE

AGENDA
• Use Cases
• SQL Pains
• Building a Neo4j Application
• Moving from RDBMS -> Graph Models
• Walk through an Example
• Creating Data in Graphs
• Querying Data

SQL

Day in the Life of an RDBMS Developer

SELECT p.name, c.country, c.leader, p.hair, u.name, u.pres, u.state
FROM people p
LEFT JOIN country c ON c.ID = p.country
LEFT JOIN uni u ON p.uni = u.id
WHERE u.state = 'CT'

[Slide: the word JOIN repeated dozens of times]

SQL Pains
• Complex to model and store relationships
• Performance degrades with increases in data
• Queries get long and complex
• Maintenance is painful

Graph Gains
• Easy to model and store relationships
• Performance of relationship traversal remains constant with growth in data size
• Queries are shortened and more readable
• Adding additional properties and relationships can be done on the fly - no migrations

What does this Graph look like?

CYPHER

Dan Loves Ann

Property Graph Model

CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})

LOVES

LABEL PROPERTY

NODE NODE

LABEL PROPERTY
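For readers who think in application code, the property graph model can be sketched as two small Python classes. This is only an illustration of the concepts (nodes with labels and properties, typed and directed relationships that can also hold properties); the class and function names are hypothetical and have nothing to do with how Neo4j stores data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # e.g. {"name": "Dan"}


@dataclass
class Relationship:
    type: str        # e.g. "LOVES"
    start: Node      # relationships always have a direction
    end: Node
    properties: dict = field(default_factory=dict)   # relationships can hold data too


# The same data as the CREATE statement: Dan LOVES Ann.
dan = Node({"Person"}, {"name": "Dan"})
ann = Node({"Person"}, {"name": "Ann"})
loves = Relationship("LOVES", dan, ann)


def describe(rel):
    """Render a relationship in an arrow notation similar to Cypher's."""
    return f'{rel.start.properties["name"]} -[:{rel.type}]-> {rel.end.properties["name"]}'
```

Calling `describe(loves)` renders the relationship with its direction, which mirrors how the `CREATE` statement reads from left to right.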

MATCH (p:Person)-[:WENT_TO]->(u:Uni),
      (p)-[:LIVES_IN]->(c:Country),
      (u)-[:LED_BY]->(l:Leader),
      (u)-[:LOCATED_IN]->(s:State)
WHERE s.abbr = 'CT'
RETURN p.name, c.country, c.leader, p.hair, u.name, l.name, s.abbr

How do you use Neo4j?

CREATE MODEL

+

LOAD DATA QUERY DATA

How do you use Neo4j?


Official Language Drivers

Community Language Drivers

Java Stored Procedures and Functions

GET STARTED TODAY!!!


https://neo4j.com/sandbox-v2

Architectural Options

Data Storage and Business Rules Execution

Data Mining and Aggregation

Application

Graph Database Cluster

Neo4j Neo4j Neo4j

Ad Hoc Analysis

Bulk Analytic Infrastructure: Hadoop, EDW…

Data Scientist

End User

Databases: Relational, NoSQL, Hadoop

RDBMS to Graph Options

MIGRATE ALL DATA

MIGRATE SUBSET

DUPLICATE SUBSET

Non-Graph Queries, Graph Queries

Graph Queries, Non-Graph Queries

All Queries

Relational Database

Graph Database

Application

Application

Application

Non-Graph Data

All Data

FROM RDBMS TO GRAPHS

Northwind

Northwind - the canonical RDBMS Example

( )-[:TO]->(Graph)

( )-[:IS_BETTER_AS]->(Graph)

Starting with the ER Diagram

Locate the Foreign Keys

Drop the Foreign Keys

Find the JOIN Tables

(Simple) JOIN Tables Become Relationships

Attributed JOIN Tables -> Relationships with Properties
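The modeling steps above (rows become nodes, foreign keys become relationships, attributed JOIN tables become relationships with properties) can be sketched in a few lines of Python. The rows below are illustrative stand-ins for Northwind tables, and the `nodes` / `relationships` structures and the `add_nodes` helper are hypothetical names chosen for this sketch, not part of any Neo4j tooling.

```python
# Toy rows standing in for a few Northwind tables (values are illustrative).
suppliers = [{"supplierID": 1, "companyName": "Exotic Liquids"}]
products = [{"productID": 1, "productName": "Chai", "supplierID": 1}]
orders = [{"orderID": 100}]
# Attributed JOIN table: two foreign keys plus columns of its own.
order_details = [{"orderID": 100, "productID": 1, "quantity": 12, "unitPrice": 18.0}]

nodes = {}          # (label, key) -> properties
relationships = []  # (start node key, type, end node key, properties)


def add_nodes(label, rows, key, foreign_keys=()):
    """Each row becomes a node; foreign-key columns are dropped."""
    for row in rows:
        props = {k: v for k, v in row.items() if k not in foreign_keys}
        nodes[(label, row[key])] = props


add_nodes("Supplier", suppliers, "supplierID")
add_nodes("Product", products, "productID", foreign_keys=("supplierID",))
add_nodes("Order", orders, "orderID")

# A foreign key becomes a relationship between the two nodes it linked.
for row in products:
    relationships.append(
        (("Supplier", row["supplierID"]), "SUPPLIES", ("Product", row["productID"]), {})
    )

# An attributed JOIN table becomes a relationship that carries properties.
for row in order_details:
    props = {k: v for k, v in row.items() if k not in ("orderID", "productID")}
    relationships.append(
        (("Order", row["orderID"]), "INCLUDES", ("Product", row["productID"]), props)
    )
```

Note that the `supplierID` column disappears from the Product node: once the relationship exists, the foreign key is redundant.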

Querying a Subset Today

As a Graph

QUERYING THE GRAPH

using openCypher

Property Graph Model

CREATE (:Employee {firstName: "Steven"})-[:REPORTS_TO]->(:Employee {firstName: "Andrew"})

Steven REPORTS_TO Andrew

LABEL PROPERTY

NODE NODE

LABEL PROPERTY

Who do people report to?

MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN *


Who do people report to?

MATCH (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
RETURN e.employeeID AS managerID, e.firstName AS managerName,
       sub.employeeID AS employeeID, sub.firstName AS employeeName;


Who does Robert report to?

MATCH p = (e:Employee)<-[:REPORTS_TO]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN p


What is Robert’s reporting chain?

MATCH p = (e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = 'Robert'
RETURN p


Who’s the Big Boss?

MATCH (e:Employee)
WHERE NOT (e)-[:REPORTS_TO]->()
RETURN e.firstName AS bigBoss
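As a cross-check on what the reporting-chain and big-boss queries compute, here is the same logic in plain Python over a tiny in-memory hierarchy. The `REPORTS_TO` dictionary is assumed sample data for this sketch, not the output of the webinar's database.

```python
# Assumed sample data: employee -> the person they report to.
REPORTS_TO = {"Robert": "Steven", "Steven": "Andrew", "Nancy": "Andrew"}


def reporting_chain(name):
    """Follow REPORTS_TO upward until someone has no manager.

    The Cypher pattern [:REPORTS_TO*] returns one path per chain length;
    this returns the managers along the longest of those paths.
    """
    chain = []
    while name in REPORTS_TO:
        name = REPORTS_TO[name]
        chain.append(name)
    return chain


def big_bosses():
    """Employees with no outgoing REPORTS_TO relationship."""
    employees = set(REPORTS_TO) | set(REPORTS_TO.values())
    return sorted(e for e in employees if e not in REPORTS_TO)
```

With this sample data, Robert's chain runs through Steven to Andrew, and Andrew is the only employee who reports to nobody.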


Product Cross-Selling

MATCH (choc:Product {productName: 'Chocolade'})<-[:INCLUDES]-(:Order)<-[:SOLD]-(employee),
      (employee)-[:SOLD]->(o2)-[:INCLUDES]->(other:Product)
RETURN employee.firstName, other.productName, COUNT(DISTINCT o2) AS count
ORDER BY count DESC
LIMIT 5;


(ASIDE ON GRAPH COMPUTE)

Shortest Path Between Airports

MATCH p = shortestPath(
  (a:Airport {code: "SFO"})-[*0..2]->(b:Airport {code: "MSO"})
)
RETURN p
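Conceptually, a hop-bounded shortest path like the one above can be found with a breadth-first search. The sketch below illustrates that idea in Python; it is not a description of how Neo4j implements `shortestPath`, and the `ROUTES` dictionary is a hypothetical route map invented for the example.

```python
from collections import deque

# Hypothetical directed routes between airports.
ROUTES = {
    "SFO": ["SEA", "SLC"],
    "SEA": ["MSO"],
    "SLC": ["MSO"],
    "MSO": [],
}


def shortest_path(start, goal, max_hops):
    """Breadth-first search that gives up beyond max_hops relationships."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        if len(path) - 1 == max_hops:
            continue  # the [*0..2] upper bound: do not expand further
        for nxt in ROUTES.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None
```

Because breadth-first search explores paths in order of length, the first path that reaches the goal is a shortest one; tightening `max_hops` to 1 makes the search return `None` for this route map.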

(END ASIDE ON GRAPH COMPUTE)

POWERING AN APP

Simple App


Simple Python Code
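The code shown on these slides did not survive in the transcript. As a stand-in, here is a minimal sketch of how a Python app might run one of the webinar's queries through the official `neo4j` driver. It is not the presenter's code: the function names, the connection URI, and the credentials are placeholders, and `main()` needs the `neo4j` package plus a running database loaded with the Northwind data.

```python
REPORTING_CHAIN_QUERY = """
MATCH (e:Employee)<-[:REPORTS_TO*]-(sub:Employee)
WHERE sub.firstName = $first_name
RETURN e.firstName AS manager
"""


def managers_of(session, first_name):
    """Return the first names of everyone above `first_name`.

    `session` is a driver session, or any object with a compatible
    run(query, **params) method that yields records indexable by key.
    """
    result = session.run(REPORTING_CHAIN_QUERY, first_name=first_name)
    return [record["manager"] for record in result]


def main():
    # Imported here so the helper above can be used and tested without
    # the driver installed. URI and credentials are placeholders.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        print(managers_of(session, "Robert"))
    driver.close()
```

Passing the name as the `$first_name` parameter, rather than formatting it into the query string, avoids injection problems and lets the database reuse the query plan.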


LOADING OUR DATA

CSV

CSV files for Northwind


3 Steps to Creating the Graph

IMPORT NODES -> CREATE INDEXES -> IMPORT RELATIONSHIPS

Importing Nodes

// Create customers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/customers.csv" AS row
CREATE (:Customer {companyName: row.CompanyName, customerID: row.CustomerID, fax: row.Fax, phone: row.Phone});

// Create products
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
CREATE (:Product {productName: row.ProductName, productID: row.ProductID, unitPrice: toFloat(row.UnitPrice)});

// Create suppliers
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/suppliers.csv" AS row
CREATE (:Supplier {companyName: row.CompanyName, supplierID: row.SupplierID});

// Create employees
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/employees.csv" AS row
CREATE (:Employee {employeeID: row.EmployeeID, firstName: row.FirstName, lastName: row.LastName, title: row.Title});

Creating Relationships

// Customers purchase orders
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (customer:Customer {customerID: row.CustomerID})
MERGE (customer)-[:PURCHASED]->(order);

// Suppliers supply products
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/products.csv" AS row
MATCH (product:Product {productID: row.ProductID})
MATCH (supplier:Supplier {supplierID: row.SupplierID})
MERGE (supplier)-[:SUPPLIES]->(product);

// Orders include products, with price and quantity on the relationship
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (product:Product {productID: row.ProductID})
MERGE (order)-[pu:INCLUDES]->(product)
ON CREATE SET pu.unitPrice = toFloat(row.UnitPrice), pu.quantity = toFloat(row.Quantity);

// Employees sell orders
USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/neo4j-contrib/developer-resources/gh-pages/data/northwind/orders.csv" AS row
MATCH (order:Order {orderID: row.OrderID})
MATCH (employee:Employee {employeeID: row.EmployeeID})
MERGE (employee)-[:SOLD]->(order);
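The order of the three steps (import nodes, create indexes, import relationships) matters because every relationship import has to find its two endpoint nodes by key. The Python sketch below mimics that flow with the standard `csv` module and tiny in-memory files. The CSV contents are made-up stand-ins for the Northwind files, and plain dictionaries play the role of the database indexes.

```python
import csv
import io

# Tiny in-memory stand-ins for the Northwind CSV files (made-up rows).
PRODUCTS_CSV = "ProductID,ProductName,SupplierID\n1,Chai,1\n2,Chang,1\n"
SUPPLIERS_CSV = "SupplierID,CompanyName\n1,Exotic Liquids\n"


def read_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


# Step 1: import nodes (one dict per row, keeping only node properties).
products = [
    {"productID": r["ProductID"], "productName": r["ProductName"]}
    for r in read_rows(PRODUCTS_CSV)
]
suppliers = [
    {"supplierID": r["SupplierID"], "companyName": r["CompanyName"]}
    for r in read_rows(SUPPLIERS_CSV)
]

# Step 2: create indexes on the keys that relationship imports will match on.
product_by_id = {p["productID"]: p for p in products}
supplier_by_id = {s["supplierID"]: s for s in suppliers}

# Step 3: import relationships by looking up both endpoints through the indexes.
supplies = []
for r in read_rows(PRODUCTS_CSV):
    supplier = supplier_by_id[r["SupplierID"]]
    product = product_by_id[r["ProductID"]]
    supplies.append((supplier["companyName"], "SUPPLIES", product["productName"]))
```

Without step 2, each lookup in step 3 would have to scan every node of that type, which is why the indexes are created before the relationships are loaded.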

High Performance LOADing

neo4j-import

4.58 million things and their relationships…

Loads in 100 seconds!

JDBC

apoc.load.jdbc

THERE’S A PROCEDURE FOR THAT

https://github.com/neo4j-contrib/neo4j-apoc-procedures

WRAPPING UP

“We found Neo4j to be literally thousands of times faster than our prior MySQL solution, with queries that require 10 to 100 times less code. Today, Neo4j provides eBay with functionality that was previously impossible.”

- Volker Pacher, Senior Developer

THANK YOU!

Ryan Boyd @ryguyrg ryan@neo4j.com