GraphOps: A Dataflow Library for
Graph Analytics Acceleration
22 February 2016
FPGA 2016
Tayo Oguntebi*
Google, Inc.
Kunle Olukotun
Pervasive Parallelism Laboratory
Stanford University
* Work done while the author was at Stanford University
Outline
The GraphOps Library
Locality-Optimized Graph Representation
Results and Conclusions
The GraphOps Library
Optimized set of hardware blocks for executing common graph processing functions
Ease of use
Energy-efficiency
Betweenness Centrality
PageRank
Conductance
Running Example: PageRank
Procedure pagerank() {
    Double diff;
    Int cnt = 0;
    Double N = G.NumNodes();
    G.pg_rank = 1 / N;
    Do {
        diff = 0.0;
        Foreach (t: G.Nodes) {
            Double val = (1-d) / N + d * Sum(w: t.InNbrs) {
                w.pg_rank / w.OutDegree()};
            diff += | val - t.pg_rank |;
            t.pg_rank <= val @ t;
        }
        cnt++;
    } While ((diff > e) && (cnt < max));
}
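The procedure above can be sketched in plain Python for readers who want to run it. This is only an illustration: the function name `pagerank`, the dictionary-of-out-neighbors input, and the default values for `d`, `e` and `max_iter` are assumptions made here, not part of the slides.

```python
def pagerank(out_nbrs, d=0.85, e=1e-6, max_iter=100):
    """Iterate PageRank until the summed change is at most e or max_iter is reached."""
    nodes = list(out_nbrs)
    n = len(nodes)
    # The update sums over in-neighbors, so invert the out-neighbor lists once.
    in_nbrs = {t: [] for t in nodes}
    for w, targets in out_nbrs.items():
        for t in targets:
            in_nbrs[t].append(w)
    rank = {t: 1.0 / n for t in nodes}
    cnt = 0
    while True:
        diff = 0.0
        new_rank = {}
        for t in nodes:
            val = (1 - d) / n + d * sum(
                rank[w] / len(out_nbrs[w]) for w in in_nbrs[t]
            )
            diff += abs(val - rank[t])
            new_rank[t] = val  # deferred write: applied after the whole sweep
        rank = new_rank
        cnt += 1
        if not (diff > e and cnt < max_iter):
            return rank, cnt


# A three-node cycle: every node keeps rank 1/3.
ranks, iterations = pagerank({0: [1], 1: [2], 2: [0]})
```

Writing into `new_rank` and swapping after each sweep mirrors the deferred assignment in the pseudocode, so every node in an iteration reads the previous iteration's scores.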
The GraphOps Library
ForAllPropRdr NbrPropRed ElemUpdate
DRAM Interface
The GraphOps Library
DATA UTILITY CONTROL
ForAllPropRdr NbrPropRed ElemUpdate
AllNodePropRdr NbrPropRdr SetReader
SetWriter NbrPropFilter GlobNbrRed
VertexReader NbrSetReader
The GraphOps Library
DATA UTILITY CONTROL
ForAllPropRdr NbrPropRed ElemUpdate
AllNodePropRdr NbrPropRdr SetReader
SetWriter NbrPropFilter GlobNbrRed
VertexReader NbrSetReader
Callout labels: Reduction, Mutation, Property Filtering, Data Readers, Set Manipulation
The GraphOps Library
Set of optimized hardware blocks for executing common graph processing functions
High-level: Easy to use
Composable: Flexible enough to compose different applications
Extensible and parameterizable
Pre-verified: Low-level implementation details built-in to the design
Problem: Poor locality leads to poor performance!
Outline
The GraphOps Library
Locality-Optimized Graph Representation
Results and Conclusions
Rethinking the Graph Representation
Conventional Form: Compressed Sparse Row (Adjacency Lists)
[Figure: example graph with nodes 0 to 6]
Node Indices: 0 1 2 3 4 5 6
Node Array: 0 1 3 6 10 13 16 18
Edge Array: 2 3 4 6 5 0 6 5 4 1 1 3 5 2 3 4 2 3
Property Array (e.g. Pagerank scores): p0 p1 p2 p3 p4 p5 p6
No locality!
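To make the locality problem concrete, here is a small Python sketch that gathers neighbor properties from the compressed sparse row arrays on this slide. The array values are copied from the slide; the helper name `gather_nbr_props` and the use of strings as stand-ins for property values are assumptions for illustration.

```python
# Arrays as shown on the slide (7 nodes, 18 edges).
node_array = [0, 1, 3, 6, 10, 13, 16, 18]
edge_array = [2, 3, 4, 6, 5, 0, 6, 5, 4, 1, 1, 3, 5, 2, 3, 4, 2, 3]
prop_array = ["p0", "p1", "p2", "p3", "p4", "p5", "p6"]


def gather_nbr_props(v):
    """Return the properties of v's neighbors using CSR indexing."""
    start, end = node_array[v], node_array[v + 1]
    # Each neighbor id is an index into prop_array, so consecutive reads
    # can land anywhere in the property array.
    return [prop_array[edge_array[i]] for i in range(start, end)]


nbrs_of_2 = gather_nbr_props(2)  # reads prop_array at indices 6, 5, 0
```

The edge list for a node is contiguous, but the property reads it triggers are not, which is the access pattern the slide labels as having no locality.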
Rethinking the Graph Representation
Locality-Optimized Form
[Figure: example graph with nodes 0 to 6]
Node Indices: 0 1 2 3 4 5 6
Node Array: 0 1 3 6 10 13 16 18
Edge Array: 2 3 4 6 5 0 6 5 4 1 1 3 5 2 3 4 2 3
Property Array (e.g. Pagerank scores): p0 p1 p2 p3 p4 p5 p6
Locality-Optimized Array: p2 p3 p4 p0 p5 p6 p1 p4 p5 p6 p1 p3 p5 p2 p3 p4 p2 p3
Trades off compactness for locality… Space for time
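A minimal Python sketch of how a locality-optimized array could be derived from the other arrays. The function names are hypothetical. The edge array below lists each neighbor set in ascending order so that the result matches the locality-optimized array printed on the slide; the slide's own Edge Array row lists some neighbor sets in a different order.

```python
node_array = [0, 1, 3, 6, 10, 13, 16, 18]
# Neighbor ids, sorted within each node's set (assumption, see lead-in).
edge_array = [2, 3, 4, 0, 5, 6, 1, 4, 5, 6, 1, 3, 5, 2, 3, 4, 2, 3]
prop_array = ["p0", "p1", "p2", "p3", "p4", "p5", "p6"]


def build_lo_array(edge_array, prop_array):
    """Replicate each neighbor's property into the slot of the edge that points at it."""
    return [prop_array[dst] for dst in edge_array]


def read_nbr_props(node_array, lo_array, v):
    """Neighbor properties of v are now one contiguous slice."""
    return lo_array[node_array[v]:node_array[v + 1]]


lo_array = build_lo_array(edge_array, prop_array)
nbrs_of_3 = read_nbr_props(node_array, lo_array, 3)
```

The replicated array has one entry per edge (18 here) instead of one per node (7), which is the space-for-time trade the slide describes.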
Pre-Processing the Layout
We have locality…now need to restore consistency
Property Array (e.g. Pagerank): p0 p1 p2 p3 p4 p5 p6
Locality-Optimized Array: p2 p3 p4 p0 p5 p6 p1 p4 p5 p6 p1 p3 p5 p2 p3 p4 p2 p3
ProcessGraphLayout(): Scatter operation
Performed on the host
“The cheapest decent memory controller that you can buy is still an Intel Xeon CPU…” – Prof. Christos Kozyrakis
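The scatter step can be sketched as follows. This is an assumed host-side implementation for illustration only: the slides name the operation `ProcessGraphLayout()` but do not show its code, and the helper names `build_slot_index` and `scatter` are invented here.

```python
def build_slot_index(edge_array, num_nodes):
    """For each node, list the slots of the replicated array that hold its property."""
    slots = [[] for _ in range(num_nodes)]
    for i, dst in enumerate(edge_array):
        slots[dst].append(i)
    return slots


def scatter(prop_array, slots, lo_array):
    """Copy each updated property to every replicated slot (scattered writes)."""
    for v, value in enumerate(prop_array):
        for i in slots[v]:
            lo_array[i] = value


# Tiny graph: node 0 -> {1, 2}, node 1 -> {0}, node 2 -> {0}.
edge_array = [1, 2, 0, 0]
slots = build_slot_index(edge_array, 3)
lo_array = [0.0] * len(edge_array)
scatter([0.5, 0.3, 0.2], slots, lo_array)
```

The slot index depends only on the graph structure, so in this sketch it is built once and reused each time the properties change.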
Programming Model
Graph* g;
GenerateGraph(g);
PreprocessGraphLayout(); // Prepare locality-optimized form
do {
    WriteToDeviceMem();
    Run();
    ReadFromDeviceMem();
    ProcessGraphLayout(); // i.e. scatter
} while (not converged);
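The loop above can be simulated entirely in software to see how the pieces fit. In this sketch the accelerator is replaced by a toy kernel that sets each node to the mean of its neighbors' replicated values; that kernel, the function names, and the tolerance are all assumptions for illustration and do not reflect the real device code.

```python
def preprocess_graph_layout(edge_array, props):
    """Build (or refresh) the replicated array from the property array."""
    return [props[dst] for dst in edge_array]


def toy_device_run(node_array, lo_array):
    """Stand-in for the accelerator: mean of each node's neighbor values.

    Assumes every node has at least one neighbor.
    """
    out = []
    for v in range(len(node_array) - 1):
        nbrs = lo_array[node_array[v]:node_array[v + 1]]
        out.append(sum(nbrs) / len(nbrs))
    return out


def host_loop(node_array, edge_array, props, tol=1e-9, max_iters=100):
    lo_array = preprocess_graph_layout(edge_array, props)
    iters = 0
    while True:
        # Stands in for WriteToDeviceMem(); Run(); ReadFromDeviceMem();
        new_props = toy_device_run(node_array, lo_array)
        delta = sum(abs(a - b) for a, b in zip(new_props, props))
        props = new_props
        # Host-side scatter restores consistency of the replicated array.
        lo_array = preprocess_graph_layout(edge_array, props)
        iters += 1
        if delta <= tol or iters >= max_iters:
            return props, iters


# Complete graph on three nodes: values converge to their average, 1.0.
final_props, iters = host_loop([0, 2, 4, 6], [1, 2, 0, 2, 0, 1], [3.0, 0.0, 0.0])
```

The point of the sketch is the ordering: the replicated array is only refreshed between runs, on the host, so the device always streams a consistent snapshot.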
Outline
The GraphOps Library
Locality-Optimized Graph Representation
Results and Conclusions
Energy Efficiency (Throughput / Watt)
[Figure: Efficiency: SpMV. x-axis: Graph Size (N), 512K to 16M. y-axis: Energy Efficiency (MEPS/W), 0 to 8. Series: GraphOps, SW 1, SW 8, GraphOps+Scatter]
Uniform graph. Avg degree 8.
Thank You
• Details and full results in the paper
• Questions: Find me during the break / poster session.
• Complete library open-sourced (MIT License) and available at:
https://github.com/tayo/GraphOps
Supplementary Material
ForAll Property Reader (ForAllPropRdr)
Evaluation Platforms
• Intel Xeon 5650 @ 2.7GHz
• 2 sockets, 12 cores, 24 threads
• Bandwidth: 32 GB/s per socket
• 3 Memory Channels
• FPGA: Xilinx Virtex-6 (150MHz)
• Connected to host via PCIe x8 Gen 2
• Bandwidth: 38.4 GB/s
PageRank System
[Figure: ForAllPropRdr, NbrPropRed and ElemUpdate blocks connected to DRAM. Streams: Vertices (Fully-Used), Pagerank scores (Locality-Optimized, Not Fully-Used), Updated Pageranks (Fully-Used)]
Constraining Factors
Locality
• Optimal: Sequential access
• Using: Alternating reads to the different arrays. All units operating simultaneously
Packet size
• Optimal: 384 bytes x 4
• Using: 192 bytes x 2
Burst size
• Optimal: as large as possible (max 256)
• Using: usually 1-2 (enough for a nbr set)
Bandwidth Usage
[Figure: PageRank Bandwidth Usage. y-axis: 0% to 120%. Series: Prop Array (Write), Prop Array (Read), L-O Array (Read), Node Array (Read). Bar labels: 2%, 93%, 2%, 2%]
Effective performance is about 1/6 of what bandwidth allows
Single-memory channel
L-O array access has to wait on others
The GraphOps Library: Utility Blocks
ForAllPropRdr NbrPropRed ElemUpdate
DRAM Interface
MemUnit MemUnit MemUnit MemUnit
EndSignal INT
Done Done Done
Neighbor Property Reducer (NbrPropRed)
Element Update (ElemUpdate)
X-Stream: Streaming Graphs on CPUs
Graph processing system using commodity hardware
Sequentially streams entire edge lists, generates updates on active edges
Designed to take advantage of sequential memory access; absolutely no memory lookups necessary
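The streaming idea described above can be illustrated with a simplified edge-centric PageRank step in Python. This is a sketch of the general approach (stream every edge in order, emit an update per edge, then stream the updates), not X-Stream's actual interface; the function name and argument layout are assumptions.

```python
def edge_centric_iteration(edges, rank, out_degree, d=0.85):
    """One PageRank iteration driven by a sequential pass over the edge list."""
    n = len(rank)
    # Scatter phase: read edges strictly in order, emit one update per edge.
    updates = [(dst, rank[src] / out_degree[src]) for src, dst in edges]
    # Gather phase: read the update list in order, accumulate per destination.
    acc = [0.0] * n
    for dst, contribution in updates:
        acc[dst] += contribution
    return [(1 - d) / n + d * a for a in acc]


# Three-node cycle: ranks stay at 1/3.
edges = [(0, 1), (1, 2), (2, 0)]
new_rank = edge_centric_iteration(edges, [1 / 3] * 3, [1, 1, 1])
```

Unlike the adjacency-list form, no per-node index is consulted to find edges; both the edge list and the update list are consumed front to back.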
X-Stream Comparison: Datasets
| Name | Nodes | Edges | Description |
|---|---|---|---|
| amazon0601 | 475K | 3.4M | Amazon product co-purchasing network from June 1 2003 |
| cit-Patents | 3.8M | 16.5M | Citation network among US Patents |
| wiki-Talk | 2.4M | 5M | Wikipedia talk (communication) network |
| web-BerkStan | 685K | 7.6M | Web graph of Berkeley and Stanford |
| soc-Pokec | 1.6M | 30.6M | Pokec online social network |
Datasets are courtesy of the Stanford SNAP project. snap.stanford.edu
X-Stream Comparison
[Figure: three run-time (ms) charts, each comparing x-stream, graphops (total), graphops (run-time) and graphops (scatter). Panels: spmv (axis 0 to 700), conductance (axis 0 to 400), pagerank, 5 iterations (axis 0 to 2500)]
Power Comparison
X-Stream: 190 W (2 sockets, TDP)
GraphOps: ~25 W
Potential Future Work
• Higher level synthesis tool to target the GraphOps library
• Hide data transfer latency with double buffering and asynchronous execution
• Investigate locality-optimized storage for other sparse domains, e.g. machine learning
• Batch updates for host-side application
• Multi-FPGA
• Dynamic Graphs
Memory Consistency
Single writer per array
If a GraphOps block is modifying an array, only that block may be simultaneously reading from the array
Replicated arrays are read-only. Updates are made to the standard property array.
Use a SCATTER operation at the end of the computation
Locality-Optimized Format: on CPUs
PageRank
2M nodes, 16M edges
OMP-C++, 4-thread
Current run-time: 3040
With replicated arrays: 1610
Advantage was erased with the scatter
Locality-Optimized Format: on CPUs
Colleague (Chris) has been working on graph storage formats
He attempted to implement my idea as part of a CPU run-time
The scatter nullifies the advantage of the coalesced memory accesses
Bandwidth Study
Evaluation Platforms
• Intel Xeon 5650 @ 2.7GHz
• 2 sockets, 12 cores, 24 threads
• Bandwidth: 32 GB/s per socket
• 3 Memory Channels
• FPGA: Xilinx Virtex-6 (150MHz)
• Connected to host via PCIe x8 Gen 2
• Bandwidth: 38.4 GB/s
Methodology
• Instrumented memory interface units with counters
Line Bandwidth: 6.4 GB/s
Streaming Architecture: Page Rank Accelerator
[Figure: Vertex Reader, Nbr Reducer and Elem Updater connected to DRAM. Streams: Vertices, Pagerank data (replicated), Updated Pageranks]
[Figure: Page Rank Bandwidth Usage. y-axis: 0% to 120%. Series: Prop Array (Write), Prop Array (Read), Rep Array (Read), Node Array (Read). Bar labels: 2%, 93%, 2%, 2%]
Constraining Factors:
• Locality
  • Optimal: Sequential access
  • Using: Alternating reads to the different arrays. All units operating simultaneously
• Packet size:
  • Optimal: 384 bytes x 4
  • Using: 192 bytes x 2
• Burst size:
  • Optimal: as large as possible (max 256)
  • Using: usually 1-2 (enough for nbr set)
Memory Layout
Node Array: 0 1 3 6 10 13 16 18
Edge Array (not used): 2 3 4 0 5 6 1 4 5 6 1 3 5 2 3 4 2 3
Property Array (Pagerank): p0 p1 p2 p3 p4 p5 p6 p7
Replicated Array (Pagerank): p2 p3 p4 p0 p5 p6 p1 p4 p5 p6 p1 p3 p5 p2 p3 p4 p2 p3
Bandwidth Efficiency
Streaming Architecture: Page Rank Accelerator
[Figure: Vertex Reader, Nbr Reducer and Elem Updater connected to DRAM. Streams: Vertices (Fully Utilized), Replicated data (Not fully used), Updated Pageranks (Fully Utilized)]
Usage Calculations
• Replicated array requests (number of bursts): 1, 1, 1, 2, 1, 1, 1, 2, …
• Average number of bursts per request: 1.22
  • Obtained by dividing the instrumented value for the replicated data by the number of nodes
• Average number of bursts used per request: assuming a uniform graph with average degree 8, 8 nbrs is 0.25 bursts, so the usage rate is 0.25/1.22 = 0.205
• Expected nbr bandwidth: 6.4 GB/s * 0.205 = 1.312 GB/s
• Peak performance of PageRank: 36 MEPS == 216 MB/s
• That is about 1/6 of the expected performance (216 MB/s vs. 1.312 GB/s)
• Cause of the lost performance: single memory channel
  • Queuing/Switching: Nbr Reducer has to wait on the other requests using the memory channel concurrently and pay the cost of switching the active bank/rank/columns, etc.
• Ideally: multiple memory channels, with one of them dedicated to streaming the replicated data
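The usage calculation can be reproduced with a few lines of Python. All input numbers come from the slide; the function name is hypothetical and the conversion 1 GB = 1000 MB is an assumption made here.

```python
def bandwidth_usage(line_bw_gbs=6.4, bursts_per_request=1.22,
                    bursts_needed=0.25, measured_mbs=216.0):
    """Return (usage rate, expected neighbor bandwidth in GB/s, achieved fraction)."""
    usage_rate = bursts_needed / bursts_per_request
    expected_gbs = line_bw_gbs * usage_rate
    # Assumes 1 GB = 1000 MB when comparing against the measured rate.
    achieved_fraction = measured_mbs / (expected_gbs * 1000.0)
    return usage_rate, expected_gbs, achieved_fraction


usage_rate, expected_gbs, achieved_fraction = bandwidth_usage()
```

Without intermediate rounding the expected bandwidth comes out slightly under the slide's 1.312 GB/s, and the achieved fraction is close to 1/6.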
Limitations of the GraphOps Library
• Limited expressability
• Limited portability
• Requires coalesced data for efficiency [1]
– Common graph formats lead to highly inefficient memory behavior
[1] Efficient Parallel Graph Exploration on Multi-core CPU and GPU. Hong, Oguntebi, et al.
Streaming Processors
• Multiple “functional units” execute simultaneously
• Each function performs a different task on the data stream flowing through it
• GraphOps blocks are implemented as coarse-grained functions
• Simpler approach for end user: higher level building blocks
[Figure: Function 1, Function 2, Function 3, Function 4]
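As a rough software analogy for the streaming model, Python generators can be chained so that each stage consumes the stream produced by the previous one. The stage names below are hypothetical and only loosely echo a read, reduce, update flow; generators run one element at a time in a single thread, whereas the hardware units operate concurrently.

```python
def read_nbr_sets(node_array, lo_array):
    """Stage 1: emit (node, neighbor values) from contiguous slices."""
    for v in range(len(node_array) - 1):
        yield v, lo_array[node_array[v]:node_array[v + 1]]


def reduce_sum(stream):
    """Stage 2: reduce each neighbor set to a single value."""
    for v, values in stream:
        yield v, sum(values)


def update(stream, out):
    """Stage 3: write each reduced value into the output array."""
    for v, total in stream:
        out[v] = total
    return out


node_array = [0, 2, 3, 4]
lo_array = [1.0, 2.0, 3.0, 4.0]
result = update(reduce_sum(read_nbr_sets(node_array, lo_array)), [0.0] * 3)
```

Each stage only sees the stream handed to it, which is what makes such blocks easy to recombine into different applications.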
Disadvantages to Graph Replication
Additional GraphOps diagrams
Figures from FPGA Paper
GraphOps are Parameterizable
Edge Properties
• A logical way of describing the locality-optimized format
– Think of the data as being associated with an edge instead of a vertex
Approach
• Initially started with a domain-specific HLS approach
• Was hoping to build full applications on the FPGA
– Sensitive control was difficult / time-consuming to generate automatically in hardware
– Especially without an ISA and full architecture
– Memory behavior was bad anyway
• Converted to an accelerator-based approach
Real-world Dataset properties
• (from snap website)
How Different from GPUs and CPU Vector Machines
Scatter/Gather Options in HW