Abstract. Pose Graph Optimization arXiv:1907.05538v2 [cs ...
Optimization of graph storage using GoFFish
-
Upload
anushree-prasanna-kumar -
Category
Data & Analytics
-
view
56 -
download
2
Transcript of Optimization of graph storage using GoFFish
![Page 1: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/1.jpg)
GoFFish: A Sub-Graph Centric Framework for
Large-Scale Graph Analytics
Guide: Prof. Dinkar Sitaram Department of Computer Science and Engineering PES Institute of Technology
Ms. Prafullata Department of Computer Science and Engineering PES Institute of Technology
Guide: Prof. Yogesh Simmhan Department of SERC Indian Institute of Science
Team Members: Anushree P K 1PI11IS017 Bhavani B 1PI11IS027 Mithilesh K G 1PI11IS059
![Page 2: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/2.jpg)
What is GoFFISH?
GoFFish is a scalable software framework for storing graphs, and composing and executing graph
analytics in a Cloud and commodity cluster environment
It consists of :-
1. GoFS – It is a distributed store for partitioning, storing and accessing graph datasets across hosts in a cluster.
2. Gopher - Gopher is a programming framework that offers sub-graph centric abstractions on a a Cloud or cluster in conjunction with GoFS.
GoFFish is implemented in Java.
![Page 3: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/3.jpg)
Existing GoFFISH Storage Architecture
Worker 2 Worker 3
Worker 1 + Head Node
Partition 2
Partition 1
Partition 3
t0 – t10
Slice 1 Slice 1
Same Storage format in worker 1
• Large graph partitioned into subgraphs and distributed across workers.
• GoFFISH default storage 10 bins in a slice. 10 instances of every subgraph.
![Page 4: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/4.jpg)
Using the GoFFish framework: To store the real time graphs in – temporal and spatial formats To compute the efficiencies of both storage formats based on the input algorithm (gopher job)
Intuition :-
The time taken for a gopher job over large graphs should be optimized when the graphs are stored
In the given two formats depending on the algorithm run.
For example for vertex count algorithm temporal format should take lessar computation time
Problem Definition
![Page 5: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/5.jpg)
GoFFISH Storage Architecture – Our Model
Worker 2 Worker 3
Worker 1 + Head Node
Partition 2
Partition 1
Partition 3
t0 – tn
Slice 1
Same Storage format in worker 1
• All instances for a subgraph in one slice.
• One bin per slice.
Slice 2
Slice 3
Slice 4
![Page 6: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/6.jpg)
We used slicing pointers instancegroupingsize and numsubgraphbins to manipulate the way the graphs were partitioned in order to obtain the desired storage format. These slicing pointers correspond the temporal and subgraph bin packing schemes in GoFFISH.
![Page 7: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/7.jpg)
Algorithms • Vertex Count Each subgraph processor calculate number of vertices within a subgraph. They send messages to all other subgraphs with their count where each subgraph calculate the total using the messages received.
• Connected Components Each subgraph finds the smallest vertex id for each subgraph and propagate that smallest value to its connected subgraphs. If incoming value to a subgraph is different from its current value it updates the current value and propagate the changes to its neighbours.
![Page 8: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/8.jpg)
Flow Diagram
![Page 9: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/9.jpg)
Dataset :- Road network graph Vehicle route tracking using traffic cams –Time-series graph of sync camera snapshots –Sensors are vertices –Edges are road connectivity w/ distance weight Graph instance is image metadata every N sec –License plate, vehicle color, direction, speed Urban
Dataset
![Page 10: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/10.jpg)
0
2000
4000
6000
8000
10000
12000
0 1 2 3 4
TIm
e T
ake
n
Partition Id
Partition-Wise Total App Time
S=10,t=10
S=1,T=ALL
4200
4250
4300
4350
4400
4450
0 1 2 3 4
TIm
e T
ake
n
Partition Id
Partition-Wise Total App Time
S=10,t=10
S=1,T=ALL
Vertex Count Connected Components
Performance Analysis
![Page 11: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/11.jpg)
Partition 1
0
500
1000
1500
2000
2500
3000
3500
4000
4500
0 1 2 3 4
Tim
e T
ake
n
Superstep
Superstep- wise Compute Task Time
VC(s=10,t=10)
VC(s=1,t=ALL)
0
50
100
150
200
250
300
350
400
0 0.5 1 1.5 2 2.5 3 3.5
Tim
e T
ake
n
Superstep
Superstep- wise Compute Task Time
CC(s=10,t=10)
CC(s=1,t=ALL)
Vertex Count Connected Components
Performance Analysis
![Page 12: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/12.jpg)
DEMO + Loggers
![Page 13: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/13.jpg)
Screenshots
![Page 14: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/14.jpg)
Screenshots
![Page 15: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/15.jpg)
GoFFish: A Sub-Graph Centric Framework for Large-Scale Graph Analytics - Indian
Institute of Science, Bangalore 560012 India, University of Southern California, Los
Angeles CA 90089 USA, November 26, 2013 Scalable Analytics over Distributed Time-series Graphs using GoFFish - Indian
Institute of Science, Bangalore 560012 India, University of Southern California, Los
Angeles CA 90089 USA, June 23, 2014 Chronos: A Graph Engine for Temporal Graph Analysis - Tsinghua University,
University of Science and Technology of China, Microsoft Research
References
![Page 16: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/16.jpg)
The Team
Anushree Prasanna Kumar 8th Sem ISE
Bhavani B 8th Sem ISE
Mithilesh Kumar 8th Sem ISE
![Page 17: Optimization of graph storage using GoFFish](https://reader031.fdocuments.in/reader031/viewer/2022022411/58ec9b9d1a28aba0758b46a1/html5/thumbnails/17.jpg)
Thank you