Grails goes Graph

SpringSource 2GX 2009

Grails Goes Graph

Stefan Armbruster, presales engineer. Twitter: @darthvader42

2012 SpringOne 2GX. All rights reserved. Do not distribute without permission.

about @self

This talk: Grails goes Graph

Intro into Graph (Databases)

Intro into Neo4j

Grails Neo4j plugin

Live demo

case study

trend 1: data growth

source: Digital Universe Study 2011 by IDC

Eric Schmidt: Every two days we create as much information as we did up to 2003

trend 2: data connectedness

Information connectivity: text documents, hypertext, feeds, blogs, wikis, UGC, tagging, folksonomies, RDF, ontologies, GGG

trend 3: semi-structured information

Individualisation of content

1970s salary lists: all elements have exactly one job

2000s salary lists, we need many job columns!

All-encompassing "entire world views"

Store more data about each entity

Trend accelerated by the decentralization of content generation

Age of participation (web 2.0)

trend 4: architecture

1980s: mainframe

1990s: DB as integration platform

2000s: decoupling of services

2010: SOA

trend 4: scale for performance

Salary list, most web apps, social network, location-based services

data changes over time: 4 trends

amount of data grows (bigdata)

data gets more connected

less structure: semi-structured

architecture: massive horizontal scalability

NoSQL: what does that mean?

NO to SQL ?

not only SQL!

simplistic cartography of NoSQL

side note: aggregate oriented databases

"There is a significant downside - the whole approach works really well when data access is aligned with the aggregates, but what if you want to look at the data in a different way? ... Order entry naturally stores orders as aggregates, but analyzing product sales cuts across the aggregate structure. This is why aggregate-oriented stores talk so much about map-reduce"

Martin Fowler on http://martinfowler.com/bliki/AggregateOrientedDatabase.html

graphs are everywhere

graphs everywhere

Relationships in Politics, Economics, History, Science, Transportation

Biology, Chemistry, Physics, Sociology: Body, Ecosphere, Reaction, Interactions

Internet: Hardware, Software, Interaction

Social Networks: Family, Friends

Work, Communities

Neighbours, Cities, Society

relationships

the world is rich, messy and related data

relationships are at least as important as the things they connect

Graphs = Whole > parts

complex interactions

always changing, change of structures as well

Graph: Relationships are part of the data

RDBMS: Relationships part of the fixed schema

questions & answers

Complex Questions

Answers lie between the lines (things)

Locality of the information

Global searches / operations very expensive

constant query time, regardless of data volume

categories

Categories == Classes, Trees ?

What if more than one category fits?

Tags

Categories via relationships like IS_A

any number, easy change

virtual Relationships - Traversals

Category dynamically derived from queries
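The idea of categories as relationships can be sketched in a few lines of plain Java. This is a minimal in-memory illustration, not Neo4j code: the class name `CategoryGraph` and the node names are made up for the example. Categories are ordinary nodes, membership is an IS_A edge, and the effective categories of a node are derived by a traversal.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model: categories are nodes, membership is an IS_A edge.
class CategoryGraph {
    // node name -> names of the nodes it points to via IS_A
    private final Map<String, Set<String>> isA = new HashMap<>();

    void addIsA(String from, String to) {
        isA.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    // All categories reachable from a node by following IS_A edges transitively.
    Set<String> categoriesOf(String node) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo =
                new ArrayDeque<>(isA.getOrDefault(node, Collections.emptySet()));
        while (!todo.isEmpty()) {
            String current = todo.poll();
            if (seen.add(current)) {
                todo.addAll(isA.getOrDefault(current, Collections.emptySet()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        CategoryGraph g = new CategoryGraph();
        g.addIsA("Neo4j", "GraphDatabase");
        g.addIsA("Neo4j", "JavaLibrary");    // more than one category fits
        g.addIsA("GraphDatabase", "NoSQL");  // the hierarchy is just more edges
        System.out.println(g.categoriesOf("Neo4j"));
    }
}
```

Adding or changing a category means adding or removing an edge; no class hierarchy or schema has to change.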

everyone is talking about graphs

Facebook Open Graph

Neo4j

example of a property graph

querying the graph: your choice

Simple way: navigate relationship paths by core API

More powerful: simple traversers with callbacks for where to end the traversal and what should be in the result set

Even more powerful: Traversal API, a fluent interface for specifying traversals

Shell: mimics unix filesystem commands (ls, cd, ...)

Gremlin: graph traversal language

Cypher: the SQL for Neo4j

Declarative

Designed for Humans (Devs + Domain experts)

deprecated

to be deprecated

Cypher examples

START user=node(5,4,1,2,3)
MATCH user-[:friend]->follower
WHERE follower.name =~ /S.*/
RETURN user, follower.name

START john=node:node_auto_index(name = 'John')
MATCH john-[:friend]->()-[:friend]->fof
RETURN john, fof
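To make the second query concrete, here is roughly what its two-hop pattern computes, written as plain Java over an in-memory adjacency map. This is only a sketch: the class `FriendOfFriend` and the sample names are assumptions for illustration, and it neither uses Neo4j nor reproduces every detail of Cypher's matching rules. It returns one entry per matched two-hop path, so a person reachable via two different friends appears twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the pattern start-[:friend]->()-[:friend]->fof.
class FriendOfFriend {
    // friend: person -> persons reachable over one outgoing "friend" edge
    static List<String> friendsOfFriends(Map<String, List<String>> friend, String start) {
        List<String> result = new ArrayList<>();
        for (String middle : friend.getOrDefault(start, Collections.emptyList())) {
            // second hop: everyone the intermediate person lists as a friend
            result.addAll(friend.getOrDefault(middle, Collections.emptyList()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> friend = new HashMap<>();
        friend.put("John", Arrays.asList("Sara", "Joe"));
        friend.put("Sara", Arrays.asList("Maria"));
        friend.put("Joe", Arrays.asList("Steve", "Maria"));
        // prints [Maria, Steve, Maria]
        System.out.println(friendsOfFriends(friend, "John"));
    }
}
```

The declarative query states the shape of the path; the loop version has to spell out each hop by hand.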

query performance

a sample social graph with ~1,000 persons

average 50 friends per person

pathExists(a,b) limited to depth 4

caches warmed up to eliminate disk I/O

database        # persons    query time
relational DB   1,000        2,000 ms
Neo4j           1,000        2 ms
Neo4j           1,000,000    2 ms
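A depth-limited pathExists(a,b) can be sketched as a breadth-first search. The version below is a plain-Java illustration under stated assumptions: the class `PathExists`, its helper `addFriendship`, and the tiny sample graph are invented for the example, and it does not reproduce the benchmark above. It shows why the work depends on the neighbourhood actually visited rather than on the total number of persons stored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: breadth-first search limited to maxDepth hops.
class PathExists {
    static void addFriendship(Map<Integer, Set<Integer>> friends, int a, int b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    static boolean pathExists(Map<Integer, Set<Integer>> friends, int a, int b, int maxDepth) {
        if (a == b) {
            return true;
        }
        Set<Integer> visited = new HashSet<>();
        visited.add(a);
        List<Integer> frontier = Collections.singletonList(a);
        // Each loop iteration expands the search by exactly one hop.
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<Integer> next = new ArrayList<>();
            for (int person : frontier) {
                for (int friend : friends.getOrDefault(person, Collections.emptySet())) {
                    if (friend == b) {
                        return true;
                    }
                    if (visited.add(friend)) {
                        next.add(friend);
                    }
                }
            }
            frontier = next;
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> friends = new HashMap<>();
        for (int i = 1; i < 6; i++) {
            addFriendship(friends, i, i + 1); // chain 1-2-3-4-5-6
        }
        System.out.println(pathExists(friends, 1, 5, 4)); // prints true (4 hops)
        System.out.println(pathExists(friends, 1, 6, 4)); // prints false (needs 5 hops)
    }
}
```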

deployment options

Embedded in JVM

Just drop a couple of jars into your application

Use EmbeddedGraphDatabase

Very fast: no marshalling/unmarshalling, no network overhead

Neo4j as Server

Exposes rich REST interface

granular API means many requests; consider network overhead

use batching or Cypher if possible

Add custom modules to the server (plugins/unmanaged extensions)

Both embedded and server can be run as HA!

One master, multiple slaves

Zookeeper for managing the cluster, about to change for upcoming versions

Neo4j HA architecture

Licensing Neo4j

3 editions available:

Community: GPL

Advanced: Community + enhanced Monitoring + enhanced Webadmin

AGPL or Commercial

Enterprise: Advanced + HA + online backup + GCR-Cache

AGPL or Commercial

Neo4j - Overview

[Diagram: Neo4j overview drawn as a graph. Labels: RUNS_AS, RUNS_ON, SCALES_TO, HIGH_AVAIL., PROVIDES, LICENSED_LIKE, INTEGRATES, TRAVERSALS, Sharding, Master/Slave]

graphconnect.com, Nov 6-7

GORM

Grails Object Relational Mapping (GORM) aka grails-data-mapping

Lib: https://github.com/SpringSource/grails-data-mapping

manages meta-model of domain classes

Common data persistence abstraction layer

Methods for domain classes (CRUD + finders + X)

Extensible

Access to low level API of the implementation

TCK for implementations, 200+ test cases

Existing implementations:

Simple (in-memory, hashmap based, for unit testing)

Hibernate, JPA

MongoDB, SimpleDB, Dynamo, Redis, (Riak), Neo4j

some key abstractions in g-d-m

MappingContext: holds meta-information about mapping domain classes to the underlying datastore, does type conversion, holds list of EntityPersisters

Datastore: creates sessions, manages the connection to the low-level storage

Session: similar to a Hibernate Session

EntityPersister: does the dirty work: interacts with the low-level datastore

Query: knows how to query the datastore by criteria (criterion, projections, ...)
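How these pieces layer on top of each other can be illustrated with a toy. The classes below (`ToyDatastore`, `ToySession`, `ToyEntityPersister`) are hypothetical and deliberately named so they cannot be mistaken for the real grails-data-mapping interfaces, whose signatures are not shown in these slides. The sketch only mirrors the division of labour described above: the datastore owns the low-level storage and hands out sessions, and the session delegates the actual reads and writes to an entity persister.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical: does the "dirty work" against the low-level store (here a HashMap).
class ToyEntityPersister {
    private final Map<Long, Map<String, Object>> store;
    private long nextId = 1;

    ToyEntityPersister(Map<Long, Map<String, Object>> store) {
        this.store = store;
    }

    long persist(Map<String, Object> entity) {
        long id = nextId++;
        store.put(id, new HashMap<>(entity)); // defensive copy of the properties
        return id;
    }

    Map<String, Object> retrieve(long id) {
        return store.get(id);
    }
}

// Hypothetical: the unit of work that application code talks to.
class ToySession {
    private final ToyEntityPersister persister;

    ToySession(ToyEntityPersister persister) {
        this.persister = persister;
    }

    long persist(Map<String, Object> entity) {
        return persister.persist(entity);
    }

    Map<String, Object> retrieve(long id) {
        return persister.retrieve(id);
    }
}

// Hypothetical: owns the connection to the storage and creates sessions.
class ToyDatastore {
    private final Map<Long, Map<String, Object>> lowLevelStore = new HashMap<>();
    private final ToyEntityPersister persister = new ToyEntityPersister(lowLevelStore);

    ToySession connect() {
        return new ToySession(persister);
    }

    public static void main(String[] args) {
        ToyDatastore datastore = new ToyDatastore();
        ToySession session = datastore.connect();
        Map<String, Object> person = new HashMap<>();
        person.put("name", "John");
        long id = session.persist(person);
        System.out.println(session.retrieve(id).get("name")); // prints John
    }
}
```

Swapping the HashMap for another backend would only touch the persister, which is the point of the abstraction layer.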

GORM has a price tag ;-)

Grails Neo4j Integration

Resources:

Lib: https://github.com/SpringSource/grails-data-mapping

Plugin: http://www.grails.org/plugin/neo4j

Plugin docs: http://springsource.github.com/grails-data-mapping/neo4j/manual/index.html

goal: use Neo4j as persistence layer for a standard Grails domain model

Mapping Grails domain model to the nodespace

[Diagram: Grails concepts (domain class, association, domain class instance, domain instance property) mapped onto the node space (reference node, subreference, instance, properties)]

2 challenges involved

Locking of domain nodes in HA mode

Category nodes become super nodes: a potential bottleneck on traversals

Solutions:

add intermediate category nodes

use indexing instead
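The effect of intermediate category nodes can be shown with a small calculation. The sketch below is plain Java, not plugin code. The slides do not say how the plugin picks an intermediate node, so choosing it by instance id modulo a fixed bucket count is an assumption, as are the class and method names. It shows that the category node keeps a small, fixed degree while the instances spread over the intermediates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: category node -> N intermediate nodes -> instance nodes.
class IntermediateCategoryNodes {
    // One list of instance ids per intermediate node.
    private final List<List<Long>> intermediates = new ArrayList<>();

    IntermediateCategoryNodes(int intermediateCount) {
        for (int i = 0; i < intermediateCount; i++) {
            intermediates.add(new ArrayList<>());
        }
    }

    // Assumption: the intermediate node is chosen by id modulo the bucket count.
    void addInstance(long instanceId) {
        int bucket = (int) Math.floorMod(instanceId, (long) intermediates.size());
        intermediates.get(bucket).add(instanceId);
    }

    // Relationships attached to the category node itself.
    int categoryNodeDegree() {
        return intermediates.size();
    }

    // Largest number of instance relationships on any single intermediate node.
    int largestIntermediateDegree() {
        int max = 0;
        for (List<Long> bucket : intermediates) {
            max = Math.max(max, bucket.size());
        }
        return max;
    }

    public static void main(String[] args) {
        IntermediateCategoryNodes category = new IntermediateCategoryNodes(4);
        for (long id = 1; id <= 1000; id++) {
            category.addInstance(id);
        }
        System.out.println(category.categoryNodeDegree());        // prints 4
        System.out.println(category.largestIntermediateDegree()); // prints 250
    }
}
```

Without the intermediates, the single category node would carry all 1000 relationships, and every write that attaches a new instance would contend for that one node.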

[Diagram: reference node, domain node, instance nodes]

currently working in the neo4j plugin (1/2)

passing >98% of GORM TCK (hurray!)

accessing embedded, REST and HA datasources, and ImpermanentGraphDatabase for testing

property type conversion

support of schemaless properties

access to native API: instance.getNode(), bean: graphDatabaseService

GORM enhancements:

.traverseStatic, .cypherStatic

.traverse, .cypher

currently working in the neo4j plugin (2/2)

prevention of locking exceptions by using intermediate category nodes

Declarative indexing: apply the static mapping closure just the standard way

convenience methods on Neo4j's nodes and relationships: node. =

JSON marshalling for Neo4j's Node and Relationships

embed Neo4j's webadmin into grails application

praying to the demo god...

looking into the crystal ball

get rid of subreferences in favour of indexing

migrate plugin to use Cypher only instead of core-API

option for mapping domain classes as a relationship: think of roads between cities having a distance property

fix open issues: http://bit.ly/KEmVX2

maybe use Spring Data Neo4j internally

and more
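The roads-between-cities idea from the list above amounts to a domain class that is stored as a relationship carrying its own properties. A minimal plain-Java sketch of that shape follows. The plugin feature was only planned at the time of the talk, so nothing here reflects its actual API: the classes `Road` and `RoadMap`, the city names, and the distance value are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical: a domain class that is a relationship, not a node.
class Road {
    final String from;
    final String to;
    final int distanceKm; // property that lives on the relationship itself

    Road(String from, String to, int distanceKm) {
        this.from = from;
        this.to = to;
        this.distanceKm = distanceKm;
    }
}

class RoadMap {
    private final List<Road> roads = new ArrayList<>();

    void connect(String from, String to, int distanceKm) {
        roads.add(new Road(from, to, distanceKm));
    }

    // Roads are treated as undirected: either endpoint order matches.
    OptionalInt distance(String a, String b) {
        for (Road road : roads) {
            boolean sameDirection = road.from.equals(a) && road.to.equals(b);
            boolean reverse = road.from.equals(b) && road.to.equals(a);
            if (sameDirection || reverse) {
                return OptionalInt.of(road.distanceKm);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        RoadMap map = new RoadMap();
        map.connect("CityA", "CityB", 120); // illustrative value only
        System.out.println(map.distance("CityB", "CityA").getAsInt()); // prints 120
        System.out.println(map.distance("CityA", "CityC").isPresent()); // prints false
    }
}
```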

case study

back in 2010 a website to collect and aggregate opinions of soccer fans went live

votes can be based on almost everything: players, teams, matches, events in matches

hard to model with classic RDBMS

Neo4j to the rescue, used in embedded mode

as always: hard and very tight schedule; built up technical debt due to lack of automated tests

Neo4j HA scales very well for reads

case study: lessons learned

massive amount of very small write transactions in HA mode caused trouble, e.g. locking exceptions upon user registration

aggregate multiple write transactions using JMS queue

serious issues with full GCs: since app AND Neo4j reside in the same JVM, full GCs happen

if stop-the-world pause is too large: master switch

have load balancer with 2 setups (planned):

write-driven requests go to master node

read-driven requests go to slave nodes
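The lesson about aggregating many small writes can be sketched without JMS. In the plain-Java illustration below, an in-process `BlockingQueue` stands in for the JMS queue and a counter stands in for opening and committing a Neo4j transaction; the class `WriteBatcher`, the batch size, and the write payloads are all assumptions for the example. The point is only the shape: producers enqueue, and a single consumer applies many writes per transaction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: collect small writes and apply them in batches.
class WriteBatcher {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final List<String> applied = new ArrayList<>();
    private final int maxBatchSize;
    private int transactions = 0;

    WriteBatcher(int maxBatchSize) {
        this.maxBatchSize = maxBatchSize;
    }

    // Called by request threads instead of opening their own transaction.
    void submit(String write) {
        pending.add(write);
    }

    // Drains up to maxBatchSize writes and applies them as one simulated transaction.
    int flushOnce() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch, maxBatchSize);
        if (batch.isEmpty()) {
            return 0;
        }
        transactions++;         // stands in for one begin/commit on the master
        applied.addAll(batch);  // stands in for the actual graph writes
        return batch.size();
    }

    int transactionCount() {
        return transactions;
    }

    int appliedCount() {
        return applied.size();
    }

    public static void main(String[] args) {
        WriteBatcher batcher = new WriteBatcher(100);
        for (int i = 0; i < 250; i++) {
            batcher.submit("register user " + i);
        }
        while (batcher.flushOnce() > 0) {
            // keep flushing until the queue is empty
        }
        System.out.println(batcher.transactionCount()); // prints 3
        System.out.println(batcher.appliedCount());     // prints 250
    }
}
```

Here 250 writes end up in 3 transactions instead of 250, which is the kind of reduction in lock contention the case study was after.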

References

general overview of NoSQL: http://www.nosql-databases.org/

Neo4j itself: http://www.neo4j.org

http://api.neo4j.org

http://doc.neo4j.org

Neo4j Grails plugin:

source: https://github.com/SpringSource/grails-data-mapping

docs: http://springsource.github.com/grails-data-mapping/neo4j/

issues: http://jira.grails.org/browse/GPNEO4J

demo app: https://github.com/sarmbruster/neo4jsample

Java REST driver: https://github.com/jexp/neo4j-java-rest-binding

my blog: http://blog.armbruster-it.de

twitter: @darthvader42
