NoSQL War Stories preso: Hadoop and Neo4j for networks

Transcript of NoSQL War Stories preso: Hadoop and Neo4j for networks

Page 1: NoSQL War Stories preso: Hadoop and Neo4j for networks

TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21|1299 6461 9318 38091|EGP|195.66.224.97|0|0||NAG||

AS1299

AS6461

AS9318

AS38091
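The slide above pairs one pipe-delimited routing table dump line with the AS path it contains. As a minimal sketch of that decomposition (not the presenter's code), the snippet below splits the line on `|` and labels the columns with the 14 field names that appear later in this transcript. The function names `parse_table_dump` and `as_path` are hypothetical, and the sketch assumes the path holds only plain AS numbers separated by spaces.

```python
# Column names taken from the 14-field layout shown later in this transcript.
FIELDS = [
    "proto", "time", "type", "peerip", "peeras", "prefix", "path",
    "origin", "nexthop", "localpref", "MED", "community", "AAGG",
    "aggregator",
]


def parse_table_dump(line):
    """Split one pipe-delimited dump line into a dict keyed by field name.

    Hypothetical helper: the trailing empty column produced by the final
    '|' is ignored because zip() stops after the 14 named fields.
    """
    values = line.rstrip("\n").split("|")
    return dict(zip(FIELDS, values))


def as_path(record):
    """Return the AS path as a list of 'AS<number>' labels.

    Assumption: the path column contains only space-separated AS numbers.
    """
    return ["AS" + asn for asn in record["path"].split()]


line = (
    "TABLE_DUMP2|1332345590|B|195.66.224.97|1299|1.11.64.0/21|"
    "1299 6461 9318 38091|EGP|195.66.224.97|0|0||NAG||"
)
record = parse_table_dump(line)
hops = as_path(record)
print(record["prefix"], hops)
```

Each entry in `hops` corresponds to one of the AS labels listed on the slide.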

Page 2: NoSQL War Stories preso: Hadoop and Neo4j for networks
Page 3: NoSQL War Stories preso: Hadoop and Neo4j for networks

http://www.cascading.org/

Page 4: NoSQL War Stories preso: Hadoop and Neo4j for networks

Every('nodes')[First[decl:'id', 'name']]

Hfs['TextDelimited[['id', 'name']]']['/tmp/nodes']']

[{2}:'id', 'name'][{2}:'id', 'name']

[tail]

[{2}:'id', 'name'][{2}:'id', 'name']

GroupBy('nodes')[by:['id']]

nodes[{1}:'id'][{2}:'id', 'name']

Each('nodes')[FilterPartialDuplicates[decl:'id', 'name']]

[{2}:'id', 'name'][{2}:'id', 'name']

Each('nodes')[PathToNodes[decl:'id', 'name']]

[{2}:'id', 'name'][{2}:'id', 'name']

GlobHfs[/Users/friso/Downloads/bview/alltxt.txt]

[{14}:'proto', 'time', 'type', 'peerip', 'peeras', 'prefix', 'path', 'origin', 'nexthop', 'localpref', 'MED', 'community', 'AAGG', 'aggregator'][{14}:'proto', 'time', 'type', 'peerip', 'peeras', 'prefix', 'path', 'origin', 'nexthop', 'localpref', 'MED', 'community', 'AAGG', 'aggregator']

Each('edges')[PathToEdges[decl:'from', 'to', 'updatecount']]

[{14}:'proto', 'time', 'type', 'peerip', 'peeras', 'prefix', 'path', 'origin', 'nexthop', 'localpref', 'MED', 'community', 'AAGG', 'aggregator'][{14}:'proto', 'time', 'type', 'peerip', 'peeras', 'prefix', 'path', 'origin', 'nexthop', 'localpref', 'MED', 'community', 'AAGG', 'aggregator']

Every('edges')[Sum[decl:'updatecount'][args:1]]

Hfs['TextDelimited[['from', 'to', 'updatecount']]']['/tmp/edges']']

[{3}:'from', 'to', 'updatecount'][{3}:'from', 'to', 'updatecount']

[{3}:'from', 'to', 'updatecount'][{3}:'from', 'to', 'updatecount']

GroupBy('edges')[by:['from', 'to']]

edges[{2}:'from', 'to'][{3}:'from', 'to', 'updatecount']

[{3}:'from', 'to', 'updatecount'][{3}:'from', 'to', 'updatecount']

[head]
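The flow plan above has two branches: 'nodes' is grouped by 'id' and reduced with First, while 'edges' is grouped by ('from', 'to') and reduced with a Sum over 'updatecount'. The plain-Python sketch below mimics that aggregation in memory to show what the two outputs might contain. It is not Cascading code, and the bodies of `path_to_nodes` and `path_to_edges` are assumptions standing in for PathToNodes and PathToEdges, whose real implementations are not shown in the slides. The second sample path is a shortened copy of the first, added only for illustration.

```python
from collections import Counter


def path_to_nodes(path):
    """Hypothetical stand-in for PathToNodes: one (id, name) pair per AS."""
    return [(asn, "AS" + asn) for asn in path.split()]


def path_to_edges(path):
    """Hypothetical stand-in for PathToEdges: (from, to, updatecount=1)
    for each pair of adjacent ASes on the path.

    Assumption: repeated neighbours (a == b) are skipped so that no
    self-loops are emitted.
    """
    hops = path.split()
    return [(a, b, 1) for a, b in zip(hops, hops[1:]) if a != b]


def build_graph(paths):
    nodes = {}         # like GroupBy on 'id' followed by First
    edges = Counter()  # like GroupBy on ('from', 'to') followed by Sum
    for path in paths:
        for node_id, name in path_to_nodes(path):
            nodes.setdefault(node_id, name)  # keep the first name seen per id
        for src, dst, count in path_to_edges(path):
            edges[(src, dst)] += count       # sum 'updatecount' per edge
    return nodes, edges


# First path is from the slide; the second is an illustrative shortened copy.
paths = ["1299 6461 9318 38091", "1299 6461 9318"]
nodes, edges = build_graph(paths)
print(nodes)
print(dict(edges))
```

In the flow itself, the two results are written as delimited text to /tmp/nodes and /tmp/edges.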

Page 5: NoSQL War Stories preso: Hadoop and Neo4j for networks
Page 7: NoSQL War Stories preso: Hadoop and Neo4j for networks

http://bit.ly/IzWvcT and http://bit.ly/HHNNIb

Page 8: NoSQL War Stories preso: Hadoop and Neo4j for networks

http://neo4j.org/

http://thejit.org/

Page 9: NoSQL War Stories preso: Hadoop and Neo4j for networks

org.neo4j.kernel.impl.batchinsert.BatchInserter

org.neo4j.graphdb.index.BatchInserterIndexProvider

30M nodes + 250M edges, < 20 minutes

Page 10: NoSQL War Stories preso: Hadoop and Neo4j for networks
Page 11: NoSQL War Stories preso: Hadoop and Neo4j for networks
Page 12: NoSQL War Stories preso: Hadoop and Neo4j for networks
Page 13: NoSQL War Stories preso: Hadoop and Neo4j for networks

• No SQL was used throughout the entire codebase

• (Even though it was tempting to use Hive at one point)

• You can find code here: https://github.com/friso/graphs

• You can find me on Twitter here: @fzk

• You can find me on e-mail here: [email protected]