HBaseCon 2012 | Storing and Manipulating Graphs in HBase
Transcript of HBaseCon 2012 | Storing and Manipulating Graphs in HBase
![Page 1: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/1.jpg)
Storing and Manipulating Graphs in HBase
@danklynn
![Page 2: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/2.jpg)
Keeps Contact Information Current and Complete
Based in Denver, Colorado
CTO & Co-Founder
![Page 3: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/3.jpg)
Turn Partial Contacts Into Full Contacts
![Page 4: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/4.jpg)
Refresher: Graph Theory
![Page 5: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/5.jpg)
Refresher: Graph Theory
![Page 6: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/6.jpg)
Refresher: Graph Theory
Vertex
![Page 7: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/7.jpg)
Refresher: Graph Theory
Edge
![Page 8: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/8.jpg)
Social Networks
![Page 9: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/9.jpg)
Tweets
@danklynn
@xorlev
“#HBase rocks”
author
follows
retweeted
![Page 10: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/10.jpg)
Web Links
http://fullcontact.com/blog/
http://techstars.com/
<a href=”...”>TechStars</a>
![Page 11: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/11.jpg)
Why should you care?
Vertex Influence
- PageRank
- Social Influence
- Network bottlenecks
Identifying Communities
![Page 12: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/12.jpg)
Storage Options
![Page 13: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/13.jpg)
neo4j
![Page 14: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/14.jpg)
Very expressive querying (e.g. Gremlin)
neo4j
![Page 15: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/15.jpg)
Transactional
neo4j
![Page 16: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/16.jpg)
Data must fit on a single machine
neo4j
:-(
![Page 17: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/17.jpg)
FlockDB
![Page 18: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/18.jpg)
Scales horizontally
FlockDB
![Page 19: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/19.jpg)
Very fast
FlockDB
![Page 20: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/20.jpg)
No multi-hop query support
:-(
FlockDB
![Page 21: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/21.jpg)
RDBMS (e.g. MySQL, Postgres)
![Page 22: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/22.jpg)
Transactional
RDBMS
![Page 23: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/23.jpg)
Huge amounts of JOINing
RDBMS
:-(
![Page 24: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/24.jpg)
![Page 25: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/25.jpg)
Massively scalable
HBase
![Page 26: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/26.jpg)
Data model well-suited
HBase
![Page 27: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/27.jpg)
Multi-hop querying?
HBase
![Page 28: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/28.jpg)
Modeling Techniques
![Page 29: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/29.jpg)
Adjacency Matrix
[Figure: graph with vertices 1, 2, 3]
![Page 30: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/30.jpg)
Adjacency Matrix

|       | 1 | 2 | 3 |
|-------|---|---|---|
| **1** | 0 | 1 | 1 |
| **2** | 1 | 0 | 1 |
| **3** | 1 | 1 | 0 |
![Page 31: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/31.jpg)
Adjacency Matrix
Can use vectorized libraries
![Page 32: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/32.jpg)
Adjacency Matrix
Requires O(n^2) memory (n = number of vertices)
![Page 33: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/33.jpg)
Adjacency Matrix
Hard(er) to distribute
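
A minimal Java sketch of the adjacency matrix for the three-vertex example, assuming an undirected graph with vertices numbered from 1. The class and method names are illustrative and do not come from the talk. It shows why the memory cost is quadratic: the array always holds n * n cells, regardless of how many edges exist.

```java
import java.util.ArrayList;
import java.util.List;

class AdjacencyMatrixSketch {
    // Builds an n x n matrix for an undirected graph whose vertices are numbered 1..n.
    static int[][] build(int n, int[][] edges) {
        int[][] m = new int[n][n];      // n * n cells are allocated up front
        for (int[] e : edges) {
            m[e[0] - 1][e[1] - 1] = 1;
            m[e[1] - 1][e[0] - 1] = 1;  // undirected: mirror the entry
        }
        return m;
    }

    // The neighbours of a vertex are the columns holding 1 in its row.
    static List<Integer> neighbors(int[][] m, int vertex) {
        List<Integer> out = new ArrayList<>();
        for (int col = 0; col < m.length; col++) {
            if (m[vertex - 1][col] == 1) {
                out.add(col + 1);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] m = build(3, new int[][] {{1, 2}, {1, 3}, {2, 3}});
        System.out.println(neighbors(m, 1)); // prints [2, 3]
    }
}
```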
![Page 34: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/34.jpg)
Adjacency List
[Figure: graph with vertices 1, 2, 3]
![Page 35: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/35.jpg)
Adjacency List
1 2,3
2 1,3
3 1,2
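
The same three-vertex graph can be sketched as an adjacency list in Java. This is an illustrative in-memory version (the class and method names are assumptions, not from the talk). Each vertex stores only the neighbours it actually has, so storage grows with the number of edges rather than with n^2.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class AdjacencyListSketch {
    // vertex -> sorted set of neighbouring vertices
    private final Map<Integer, SortedSet<Integer>> neighbors = new TreeMap<>();

    // Adds an undirected edge by recording each endpoint in the other's list.
    void addEdge(int a, int b) {
        neighbors.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        neighbors.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    SortedSet<Integer> neighborsOf(int v) {
        return neighbors.getOrDefault(v, new TreeSet<>());
    }

    // Total number of stored neighbour entries (two per undirected edge).
    int storedEntries() {
        int total = 0;
        for (SortedSet<Integer> list : neighbors.values()) {
            total += list.size();
        }
        return total;
    }

    public static void main(String[] args) {
        AdjacencyListSketch g = new AdjacencyListSketch();
        g.addEdge(1, 2);
        g.addEdge(1, 3);
        g.addEdge(2, 3);
        System.out.println(g.neighborsOf(1)); // prints [2, 3]
    }
}
```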
![Page 36: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/36.jpg)
Adjacency List Design in HBase
t:danklynn
p:+13039316251
![Page 37: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/37.jpg)
Adjacency List Design in HBase
| row key | “edges” column family |
|---|---|
| e:[email protected] | p:+13039316251= ..., t:danklynn= ... |
| p:+13039316251 | t:danklynn= ..., e:[email protected]= ... |
| t:danklynn | e:[email protected]= ..., p:+13039316251= ... |
![Page 38: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/38.jpg)
Adjacency List Design in HBase
| row key | “edges” column family |
|---|---|
| e:[email protected] | p:+13039316251= ..., t:danklynn= ... |
| p:+13039316251 | t:danklynn= ..., e:[email protected]= ... |
| t:danklynn | e:[email protected]= ..., p:+13039316251= ... |

What to store?
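
The row layout above can be modelled in plain Java as a sorted map of sorted maps. This is only an in-memory stand-in for an HBase table, not HBase client code: the row key is the source vertex, each column qualifier in the "edges" family is a destination vertex, and the cell value is whatever you decide to store for the edge. The class name, the `e:user@example.com` key, and the `"1.0"` cell values are hypothetical placeholders.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

class EdgeTableSketch {
    // row key (source vertex) -> column qualifier (destination vertex) -> cell value
    private final NavigableMap<String, NavigableMap<String, String>> rows = new TreeMap<>();

    // Stores one outbound edge as a single cell in the source vertex's row.
    void putEdge(String fromVertex, String toVertex, String value) {
        rows.computeIfAbsent(fromVertex, k -> new TreeMap<>()).put(toVertex, value);
    }

    // Reading one row returns every outbound edge of that vertex.
    NavigableMap<String, String> edgesOf(String vertex) {
        return rows.getOrDefault(vertex, new TreeMap<>());
    }

    public static void main(String[] args) {
        EdgeTableSketch table = new EdgeTableSketch();
        // Each relationship is written twice, once per direction, as on the slide.
        table.putEdge("t:danklynn", "p:+13039316251", "1.0");
        table.putEdge("p:+13039316251", "t:danklynn", "1.0");
        table.putEdge("t:danklynn", "e:user@example.com", "1.0");
        table.putEdge("e:user@example.com", "t:danklynn", "1.0");
        System.out.println(table.edgesOf("t:danklynn").keySet());
    }
}
```

Keeping both directions means a single row read answers "what is this vertex connected to?" from either end, at the cost of writing each edge twice.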
![Page 39: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/39.jpg)
Custom Writables

    package org.apache.hadoop.io;

    public interface Writable {
        void write(java.io.DataOutput dataOutput) throws java.io.IOException;
        void readFields(java.io.DataInput dataInput) throws java.io.IOException;
    }

java
![Page 40: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/40.jpg)
Custom Writables

    class EdgeValueWritable implements Writable {
        EdgeValue edgeValue

        void write(DataOutput dataOutput) {
            dataOutput.writeDouble edgeValue.weight
        }

        void readFields(DataInput dataInput) {
            Double weight = dataInput.readDouble()
            edgeValue = new EdgeValue(weight)
        }

        // ...
    }

groovy
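
To see the write/readFields round trip without a Hadoop dependency, here is a small Java sketch. `WritableLike` is a local stand-in that mirrors the shape of `org.apache.hadoop.io.Writable`, and `EdgeWeightWritable`, `serialize`, and `deserialize` are hypothetical names introduced for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Local stand-in with the same two methods as Hadoop's Writable (assumption for this sketch).
interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class EdgeWeightWritable implements WritableLike {
    double weight;

    public void write(DataOutput out) throws IOException {
        out.writeDouble(weight);          // 8 bytes, big-endian
    }

    public void readFields(DataInput in) throws IOException {
        weight = in.readDouble();         // must read fields in the order they were written
    }

    static byte[] serialize(WritableLike w) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            w.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static EdgeWeightWritable deserialize(byte[] bytes) {
        try {
            EdgeWeightWritable w = new EdgeWeightWritable();
            w.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return w;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        EdgeWeightWritable original = new EdgeWeightWritable();
        original.weight = 0.75;
        byte[] bytes = serialize(original);
        System.out.println(bytes.length + " bytes, weight " + deserialize(bytes).weight);
    }
}
```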
![Page 41: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/41.jpg)
Don’t get fancy with byte[]

    class EdgeValueWritable implements Writable {
        EdgeValue edgeValue

        byte[] toBytes() {
            // use strings if you can help it
        }

        static EdgeValueWritable fromBytes(byte[] bytes) {
            // use strings if you can help it
        }
    }

groovy
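
One way to follow the "use strings" advice is to encode the value as UTF-8 text, which keeps cells human-readable in tools such as the HBase shell. The Java sketch below assumes the edge value is a single double weight; `EdgeValueCodec` is a hypothetical name, not part of the talk's code.

```java
import java.nio.charset.StandardCharsets;

class EdgeValueCodec {
    // Encodes the weight as its decimal string form, then as UTF-8 bytes.
    static byte[] toBytes(double weight) {
        return Double.toString(weight).getBytes(StandardCharsets.UTF_8);
    }

    // Decodes the UTF-8 text back into a double.
    static double fromBytes(byte[] bytes) {
        return Double.parseDouble(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] cell = toBytes(0.75);
        System.out.println(new String(cell, StandardCharsets.UTF_8)); // prints 0.75
        System.out.println(fromBytes(cell));                          // prints 0.75
    }
}
```

The trade-off is size: the text form can take more bytes than a raw 8-byte double, in exchange for values that are easy to inspect and debug.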
![Page 42: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/42.jpg)
Querying by vertex

    def get = new Get(vertexKeyBytes)
    get.addFamily(edgesFamilyBytes)

    Result result = table.get(get);
    result.noVersionMap.each { family, data ->
        // construct edge objects as needed
        // data is a Map<byte[],byte[]>
    }
![Page 43: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/43.jpg)
Adding edges to a vertex

    def put = new Put(vertexKeyBytes)

    put.add(
        edgesFamilyBytes,
        destinationVertexBytes,
        edgeValue.toBytes() // your own implementation here
    )

    // if writing directly
    table.put(put)

    // if using TableReducer
    context.write(NullWritable.get(), put)
![Page 44: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/44.jpg)
Distributed Traversal / Indexing
t:danklynn
p:+13039316251
![Page 45: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/45.jpg)
Distributed Traversal / Indexing
t:danklynn
p:+13039316251
![Page 46: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/46.jpg)
Distributed Traversal / Indexing
t:danklynn
p:+13039316251
Pivot vertex
![Page 47: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/47.jpg)
Distributed Traversal / Indexing
t:danklynn
p:+13039316251
MapReduce over outbound edges
![Page 48: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/48.jpg)
Distributed Traversal / Indexing
t:danklynn
p:+13039316251
Emit vertexes and edge data grouped by the pivot
![Page 49: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/49.jpg)
Distributed Traversal / Indexing
t:danklynn
p:+13039316251
Reduce key
“Out” vertex
“In” vertex
![Page 51: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/51.jpg)
Distributed Traversal / Indexing
Iteration 0
![Page 52: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/52.jpg)
Distributed Traversal / Indexing
Iteration 1
![Page 53: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/53.jpg)
Distributed Traversal / Indexing
Iteration 2
![Page 54: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/54.jpg)
Distributed Traversal / Indexing
Iteration 2
Reuse edges created during previous iterations
![Page 55: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/55.jpg)
Distributed Traversal / Indexing
Iteration 3
![Page 56: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/56.jpg)
Distributed Traversal / Indexing
Iteration 3
Reuse edges created during previous iterations
![Page 57: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/57.jpg)
Distributed Traversal / Indexing
hops requires only
iterations
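
The pivot idea can be sketched in memory with plain Java. This is not the talk's MapReduce job: it is a single-process simulation under the assumption that edges are stored in both directions, so every neighbour of a pivot is both an "in" and an "out" vertex. Each pass reads a snapshot of the graph, treats every vertex as a pivot, and links each pair of its neighbours. Because later passes reuse the edges created earlier, the reachable distance in this sketch can at most double per pass. All names are illustrative.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class PivotTraversalSketch {
    // Records an undirected edge in both vertices' neighbour sets.
    static void addEdge(Map<String, Set<String>> g, String a, String b) {
        g.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        g.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // One pass: every vertex acts as a pivot and links each pair of its neighbours.
    static Map<String, Set<String>> iterate(Map<String, Set<String>> graph) {
        Map<String, Set<String>> next = new TreeMap<>();
        // Carry over all existing edges so later passes can reuse them.
        for (Map.Entry<String, Set<String>> row : graph.entrySet()) {
            next.computeIfAbsent(row.getKey(), k -> new TreeSet<>()).addAll(row.getValue());
        }
        // "around" is the neighbour set of one pivot, read from the snapshot.
        for (Set<String> around : graph.values()) {
            for (String in : around) {
                for (String out : around) {
                    if (!in.equals(out)) {
                        addEdge(next, in, out);
                    }
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> g = new TreeMap<>();
        addEdge(g, "a", "b");
        addEdge(g, "b", "c");
        addEdge(g, "c", "d");
        addEdge(g, "d", "e");
        Map<String, Set<String>> pass1 = iterate(g);
        Map<String, Set<String>> pass2 = iterate(pass1);
        System.out.println("after pass 1, a -> " + pass1.get("a"));
        System.out.println("after pass 2, a -> " + pass2.get("a"));
    }
}
```

On the five-vertex chain in `main`, the first pass connects `a` to `c` (two hops away) and the second pass connects `a` to `e` (four hops away).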
![Page 58: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/58.jpg)
Tips / Gotchas
![Page 59: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/59.jpg)
Do implement your own comparator
java

    public static class Comparator extends WritableComparator {
        public int compare(
                byte[] b1, int s1, int l1,
                byte[] b2, int s2, int l2) {
            // .....
        }
    }
![Page 60: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/60.jpg)
Do implement your own comparator
java

    static {
        WritableComparator.define(VertexKeyWritable.class, new VertexKeyWritable.Comparator());
    }
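
The body of such a comparator is elided on the slide. One plausible choice, shown here as a standalone Java sketch with no Hadoop dependency, is an unsigned lexicographic comparison directly over the serialized bytes, which avoids deserializing keys during the sort. The class name is hypothetical, and whether this ordering suits your keys depends on how they are encoded.

```java
class RawBytesComparatorSketch {
    // Same parameter shape as the slide's compare(): (buffer, start, length) for each key.
    static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int shared = Math.min(l1, l2);
        for (int i = 0; i < shared; i++) {
            int a = b1[s1 + i] & 0xff;   // treat bytes as unsigned values
            int b = b2[s2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return l1 - l2;                  // identical prefix: the shorter key sorts first
    }

    public static void main(String[] args) {
        byte[] phone = "p:+13039316251".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        byte[] twitter = "t:danklynn".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        // Negative result: the "p:" key sorts before the "t:" key.
        System.out.println(compare(phone, 0, phone.length, twitter, 0, twitter.length) < 0);
    }
}
```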
![Page 61: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/61.jpg)
MultiScanTableInputFormat

    MultiScanTableInputFormat.setTable(conf, "graph");
    MultiScanTableInputFormat.addScan(conf, new Scan());
    job.setInputFormatClass(MultiScanTableInputFormat.class);

java
![Page 62: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/62.jpg)
TableMapReduceUtil

    TableMapReduceUtil.initTableReducerJob("graph", MyReducer.class, job);

java
![Page 63: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/63.jpg)
Elastic MapReduce
![Page 64: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/64.jpg)
Elastic MapReduce
HFiles
![Page 65: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/65.jpg)
Elastic MapReduce
HFiles → SequenceFiles → (copy to S3)
![Page 66: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/66.jpg)
Elastic MapReduce
HFiles → SequenceFiles → (copy to S3) → SequenceFiles → Elastic MapReduce
![Page 67: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/67.jpg)
Elastic MapReduce
HFiles → SequenceFiles → (copy to S3) → SequenceFiles → Elastic MapReduce
![Page 68: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/68.jpg)
Elastic MapReduce
HFiles → SequenceFiles → (copy to S3) → SequenceFiles → Elastic MapReduce → HFiles

    HFileOutputFormat.configureIncrementalLoad(job, outputTable)
![Page 69: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/69.jpg)
Elastic MapReduce
HFiles → SequenceFiles → (copy to S3) → SequenceFiles → Elastic MapReduce → HFiles → HBase

    HFileOutputFormat.configureIncrementalLoad(job, outputTable)

    $ hadoop jar hbase-VERSION.jar completebulkload
![Page 70: HBaseCon 2012 | Storing and Manipulating Graphs in HBase](https://reader037.fdocuments.in/reader037/viewer/2022102608/5565773dd8b42a95028b4c6f/html5/thumbnails/70.jpg)
Additional Resources
Google Pregel: BSP-based graph processing system
Apache Giraph: Implementation of Pregel for Hadoop
MultiScanTableInputFormat: (code to appear on GitHub)
Apache Mahout: Distributed machine learning on Hadoop