Posted on 08-Sep-2014
What’s inside?
‣ PostgreSQL
‣ Neo4j
‣ ArangoDB
Python Frameworks
‣ Bulbflow
‣ py2neo
‣ NetworkX
‣ Arango-python
Relational to Graph model crash course
“Switching from relational to the graph model”
by Luca Garulli
http://goo.gl/z08qwk
http://www.slideshare.net/lvca/switching-from-relational-to-the-graph-model
My motivation is quite simple:
“The best material model of a cat is another, or preferably the same, cat.”
– Norbert Wiener
Good old Postgres
create table nodes (
    node integer primary key,
    name varchar(10) not null,
    feat1 char(1),
    feat2 char(1));

create table edges (
    a integer not null references nodes(node) on update cascade on delete cascade,
    b integer not null references nodes(node) on update cascade on delete cascade,
    primary key (a, b));

create index a_idx on edges(a);
create index b_idx on edges(b);

-- each unordered pair at most once, whatever its direction
create unique index pair_unique_idx on edges (LEAST(a, b), GREATEST(a, b));

-- and no self-loops
alter table edges add constraint no_self_loops_chk check (a <> b);

insert into nodes values (1, 'node1', 'x', 'y');
insert into nodes values (2, 'node2', 'x', 'w');
insert into nodes values (3, 'node3', 'x', 'w');
insert into nodes values (4, 'node4', 'z', 'w');
insert into nodes values (5, 'node5', 'x', 'y');
insert into nodes values (6, 'node6', 'x', 'z');
insert into nodes values (7, 'node7', 'x', 'y');

insert into edges values
    (1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
    (3, 6), (4, 7), (5, 1), (5, 6), (6, 1);

-- directed graph: nodes that node 2 points to
select * from nodes n join edges e on n.node = e.b where e.a = 2;

-- undirected graph: all neighbours of node 1
select * from nodes where node in (
    select case when a = 1 then b else a end from edges where 1 in (a, b));
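To try the two adjacency queries without a Postgres server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. This is an assumption, not the original setup: SQLite replaces Postgres, min()/max() replace LEAST/GREATEST, and the helper names successors and neighbours are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the Postgres schema above (assumption:
# SQLite has no LEAST/GREATEST, so scalar min()/max() are used instead).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table nodes (node integer primary key, name text not null);
    create table edges (
        a integer not null references nodes(node),
        b integer not null references nodes(node),
        primary key (a, b),
        check (a <> b)
    );
    -- each unordered pair at most once
    create unique index pair_unique_idx on edges (min(a, b), max(a, b));
""")
conn.executemany("insert into nodes values (?, ?)",
                 [(i, "node{}".format(i)) for i in range(1, 8)])
conn.executemany("insert into edges values (?, ?)",
                 [(1, 3), (2, 1), (2, 4), (3, 4), (3, 5),
                  (3, 6), (4, 7), (5, 1), (5, 6), (6, 1)])


def successors(node):
    """Directed graph: follow edges a -> b only."""
    rows = conn.execute(
        "select n.node from nodes n join edges e on n.node = e.b "
        "where e.a = ? order by n.node", (node,))
    return [r[0] for r in rows]


def neighbours(node):
    """Undirected graph: an edge touching `node` in either column counts."""
    rows = conn.execute(
        "select node from nodes where node in ("
        " select case when a = ? then b else a end"
        " from edges where ? in (a, b)) order by node", (node, node))
    return [r[0] for r in rows]


print(successors(2))   # [1, 4]
print(neighbours(1))   # [2, 3, 5, 6]
```

Because of the unique index on the sorted pair, inserting (3, 1) after (1, 3) raises sqlite3.IntegrityError, which is what keeps the table usable as an undirected graph.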
I'm from Odessa, I'm just drinking.
Neo4j
The most famous graph database.
• 1,333 mentions within repositories on GitHub
• 1,140,000 results in Google
• 26,868 tweets
• Really nice admin interface
• Awesome help tips
A lot of Python libraries:
Py2Neo, Neomodel, neo4django, bulbflow
// Create node1 and node2 and a
// RELATED relationship between the two nodes
CREATE (node1 {name: "node1"}), (node2 {name: "node2"}), (node1)-[:RELATED]->(node2);
Neo4j is friendly and powerful. The only drawback is its somewhat complex query language, Cypher.
from py2neo import neo4j, node, rel

graph_db = neo4j.GraphDatabaseService(
    "http://localhost:7474/db/data/")

die_hard = graph_db.create(
    node(name="Bruce Willis"),
    node(name="John McClane"),
    node(name="Alan Rickman"),
    node(name="Hans Gruber"),
    node(name="Nakatomi Plaza"),
    rel(0, "PLAYS", 1),
    rel(2, "PLAYS", 3),
    rel(1, "VISITS", 4),
    rel(3, "STEALS_FROM", 4),
    rel(1, "KILLS", 3))
py2neo nodes
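In the create() call above, the integers in rel(0, "PLAYS", 1) refer to nodes by their position in the same argument list (this is my reading of py2neo 1.x batch creation). The plain-Python sketch below, with a hypothetical resolve() helper and no database at all, spells out which relationships that call produces.

```python
# Hypothetical helper, not part of py2neo: it only shows how positional
# (start_index, type, end_index) entries map onto the nodes of one call.
def resolve(nodes, rels):
    """Turn index-based relationship tuples into (name, TYPE, name) triples."""
    return [(nodes[start]["name"], rel_type, nodes[end]["name"])
            for start, rel_type, end in rels]


nodes = [
    {"name": "Bruce Willis"},    # index 0
    {"name": "John McClane"},    # index 1
    {"name": "Alan Rickman"},    # index 2
    {"name": "Hans Gruber"},     # index 3
    {"name": "Nakatomi Plaza"},  # index 4
]
rels = [(0, "PLAYS", 1), (2, "PLAYS", 3), (1, "VISITS", 4),
        (3, "STEALS_FROM", 4), (1, "KILLS", 3)]

for triple in resolve(nodes, rels):
    print(triple)
```

The first line printed is ('Bruce Willis', 'PLAYS', 'John McClane'), and the last is ('John McClane', 'KILLS', 'Hans Gruber').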
from py2neo import neo4j, node

graph_db = neo4j.GraphDatabaseService(
    "http://localhost:7474/db/data/")
alice, bob, carol = node(name="Alice"), \
                    node(name="Bob"), \
                    node(name="Carol")
abc = neo4j.Path(
    alice, "KNOWS", bob, "KNOWS", carol)
abc.create(graph_db)
abc.nodes
# [node(**{'name': 'Alice'}),
#  node(**{'name': 'Bob'}),
#  node(**{'name': 'Carol'})]
py2neo paths
Alice KNOWS Bob KNOWS Carol
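neo4j.Path above is built from an alternating sequence: node, relationship type, node, relationship type, node. The sketch below models only that layout in plain Python; path_nodes and path_relationships are hypothetical helpers, not py2neo's API, and strings stand in for real nodes.

```python
# Hypothetical helpers, not py2neo's API: they model the alternating
# node, relationship type, node, ... sequence passed to neo4j.Path.
def path_nodes(*entities):
    """Nodes sit at the even positions of the sequence."""
    if len(entities) % 2 == 0:
        raise ValueError("a path must start and end with a node")
    return list(entities[0::2])


def path_relationships(*entities):
    """Each relationship type at an odd position links its two neighbours."""
    path_nodes(*entities)  # reuse the shape check above
    return [(entities[i], entities[i + 1], entities[i + 2])
            for i in range(0, len(entities) - 1, 2)]


abc = ("Alice", "KNOWS", "Bob", "KNOWS", "Carol")
print(path_nodes(*abc))          # ['Alice', 'Bob', 'Carol']
print(path_relationships(*abc))  # [('Alice', 'KNOWS', 'Bob'), ('Bob', 'KNOWS', 'Carol')]
```

A single node is also a valid (zero-length) path, which is why the length check only rejects even-length sequences.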
from bulbs.neo4jserver import Graph

g = Graph()
james = g.vertices.create(name="James")
julie = g.vertices.create(name="Julie")
g.edges.create(james, "knows", julie)
bulbflow framework
FlockDB, OrientDB, InfoGrid, HyperGraphDB
WAT?
ArangoDB
“In any investment, you expect to have fun and make profit.”
– Michael Jordan
I’m the developer of a Python driver for ArangoDB.
• NoSQL database storage
• Graph of documents
• AQL (ArangoDB Query Language) to execute graph queries
• Edge data type to create edges between nodes (with properties)
• Multiple edge collections to keep different kinds of edges
• Support for the Gremlin graph query language
A small experiment with graphs and Twitter:
I looked at my tweets and at the people who favorited them.
Then I looked at those people’s tweets and did the same with the people who favorited theirs.
1-level depth
2-level depth
3-level depth
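The experiment is essentially a breadth-first crawl: level 1 is everyone who favorited my tweets, level 2 is everyone who favorited theirs, and so on. The sketch below shows that traversal on a small hypothetical favorited_by mapping (user to the people who favorited that user's tweets); the real data came from Twitter via the talk's scraper, which is not reproduced here, and favoriters_by_depth is a hypothetical name.

```python
from collections import deque


def favoriters_by_depth(start, favorited_by, max_depth):
    """Breadth-first walk that groups newly seen users by depth level."""
    levels = {}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for fan in favorited_by.get(user, []):
            if fan not in seen:  # each user is counted once, at its shallowest level
                seen.add(fan)
                levels.setdefault(depth + 1, []).append(fan)
                queue.append((fan, depth + 1))
    return levels


# Hypothetical toy data, not real Twitter accounts.
favorited_by = {
    "me": ["ann", "bob"],
    "ann": ["cid", "bob"],
    "bob": ["dan"],
    "cid": ["eve", "me"],
    "dan": [],
}

print(favoriters_by_depth("me", favorited_by, 3))
# {1: ['ann', 'bob'], 2: ['cid', 'dan'], 3: ['eve']}
```

Tracking seen users is what stops the crawl from looping when someone at level 3 favorites my own tweets again.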
Code behind
from arango import create

arango = create(db="tweets_maxmaxmaxmax")
arango.database.create()
arango.tweets.create()
arango.tweets_edges.create(
    type=arango.COLLECTION_EDGES)
from_doc = arango.tweets.documents.create({})
to_doc = arango.tweets.documents.create({})
arango.tweets_edges.edges.create(from_doc, to_doc)
from arango.aql import F, V

query = arango.tweets_edges.query.over(
    F.EDGES(
        "tweets_edges",
        ~V("tweets/196297127"),
        ~V("outbound")))
Here we create an edge from from_doc to to_doc
Getting edges for tweet 196297127
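Conceptually, an ArangoDB edge is just a document in an edge collection whose _from and _to attributes hold "collection/key" handles, plus any properties of its own; EDGES(collection, vertex, direction) then selects edges by one of those two attributes. The plain-Python model below is an approximation for intuition only, not the driver's or the server's implementation; the toy documents and the edges() helper are hypothetical.

```python
# Plain-Python model of an edge collection (assumption: each edge is a
# document with _from/_to handles plus its own properties, e.g. "what").
knows = [
    {"_from": "users/0", "_to": "users/1", "what": "user_0 knows user_1"},
    {"_from": "users/0", "_to": "users/2", "what": "user_0 knows user_2"},
    {"_from": "users/2", "_to": "users/3", "what": "user_2 knows user_3"},
]


def edges(collection, vertex, direction):
    """Rough analogue of EDGES(collection, vertex, direction)."""
    if direction == "outbound":
        return [e for e in collection if e["_from"] == vertex]
    if direction == "inbound":
        return [e for e in collection if e["_to"] == vertex]
    if direction == "any":
        return [e for e in collection if vertex in (e["_from"], e["_to"])]
    raise ValueError("unknown direction: {}".format(direction))


for e in edges(knows, "users/0", "outbound"):
    print(e["what"])
```

Keeping different kinds of relations in separate edge collections (for example knows versus a hypothetical follows) simply means calling the same lookup on a different list.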
Full example
• Sample dataset with 10 users
• Relations between users
• Visualise within the admin interface
Sample dataset
from arango import create

def dataset(a):
    a.database.create()
    a.users.create()
    a.knows.create(type=a.COLLECTION_EDGES)

    for u in range(10):
        a.users.documents.create({
            "name": "user_{}".format(u),
            "age": u + 20,
            "gender": u % 2 == 0})

a = create(db="experiments")
dataset(a)
Relations between users
def relations(a):
    rels = (
        (0, 1), (0, 2), (2, 3), (4, 3), (3, 5), (5, 1),
        (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))

    def get_user(user_id):
        # First user document whose name is user_<user_id>
        return a.users.query.filter(
            "obj.name == 'user_{}'".format(user_id)).execute().first

    for f, t in rels:
        what = "user_{} knows user_{}".format(f, t)
        from_doc, to_doc = get_user(f), get_user(t)
        a.knows.edges.create(from_doc, to_doc, {"what": what})
        print("{}->{}: {}".format(from_doc.id, to_doc.id, what))

a = create(db="experiments")
relations(a)
Relations between users: output
users/2744664487->users/2744926631: user_0 knows user_1
users/2744664487->users/2745123239: user_0 knows user_2
users/2745123239->users/2745319847: user_2 knows user_3
users/2745516455->users/2745319847: user_4 knows user_3
users/2745319847->users/2745713063: user_3 knows user_5
users/2745713063->users/2744926631: user_5 knows user_1
users/2744664487->users/2745713063: user_0 knows user_5
users/2745713063->users/2745909671: user_5 knows user_6
users/2745909671->users/2746106279: user_6 knows user_7
users/2746106279->users/2746302887: user_7 knows user_8
users/2746499495->users/2746302887: user_9 knows user_8
AQL, getting paths
FOR p IN PATHS(users, knows, 'outbound')
    FILTER p.source.name == 'user_5'
    RETURN p.vertices[*].name
from arango import create
from arango.aql import F, V


def querying(a):
    query = (a.knows.query
             .over(F.PATHS("users", "knows", ~V("outbound")))
             .filter("obj.source.name == '{}'".format("user_5"))
             .result("obj.vertices[*].name"))
    for data in query.execute(wrapper=lambda c, i: i):
        print(data)


a = create(db="experiments")
querying(a)
Paths output
['user_5']
['user_5', 'user_1']
['user_5', 'user_6']
['user_5', 'user_6', 'user_7']
['user_5', 'user_6', 'user_7', 'user_8']
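To see why there are exactly these five results, here is a plain-Python sketch that enumerates every simple outbound path starting at user_5 over the same (from, to) pairs used in relations(). It approximates what PATHS(users, knows, 'outbound') plus the source filter returns, including the zero-length path; it is not ArangoDB's implementation, and outbound_paths is a hypothetical name.

```python
def outbound_paths(edges, source):
    """All simple outbound paths from `source`, in depth-first order,
    including the zero-length path [source]."""
    adjacency = {}
    for f, t in edges:
        adjacency.setdefault(f, []).append(t)

    paths = []

    def walk(path):
        paths.append(list(path))
        for nxt in adjacency.get(path[-1], []):
            if nxt not in path:  # skip cycles so every path stays simple
                walk(path + [nxt])

    walk([source])
    return paths


# Same pairs as in relations() above.
rels = ((0, 1), (0, 2), (2, 3), (4, 3), (3, 5), (5, 1),
        (0, 5), (5, 6), (6, 7), (7, 8), (9, 8))

for p in outbound_paths(rels, 5):
    print(["user_{}".format(u) for u in p])
```

Since user_5 only points to user_1 (a dead end) and user_6, which leads on to user_7 and user_8, the printed lists match the output above line for line.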
Links
• Arango paths: http://goo.gl/n2L3SK
• Neo4j: http://goo.gl/au5y9I
• Scraper: http://goo.gl/nvMFGk
• Visualiser: http://goo.gl/Rzdwci
Thanks. Questions?
@maxmaxmaxmax