Ex-MATE: Data-Intensive Computing with Large Reduction Objects and Its Application to Graph Mining
Wei Jiang and Gagan Agrawal
Outline
April 21, 2023
• Background
• System Design of Ex-MATE
• Parallel Graph Mining with Ex-MATE
• Experiments
• Related Work
• Conclusion
Background (I)
• Map-Reduce
  • Simple API: map and reduce
  • Easy to write parallel programs
  • Fault-tolerant for large-scale data centers
  • Performance? Always a concern for the HPC community
• Generalized Reduction
  • First proposed in FREERIDE, which was developed at Ohio State in 2001-2003
  • Shares a similar processing structure
  • The key difference lies in a programmer-managed reduction object
  • Better performance?
Map-Reduce Execution
Comparing Processing Structures
• The reduction object represents the intermediate state of the execution
• The reduce function is commutative and associative
• Sorting, grouping, and similar overheads are eliminated with the reduction function/object
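The contrast between the two processing structures can be sketched with a small counting task. This is only an illustration, not FREERIDE or MATE code: the function names and the in-memory "chunks" that stand in for per-worker data are assumptions made for the example.

```python
from collections import defaultdict


def map_reduce_count(chunks):
    """Map-Reduce style: emit (key, 1) pairs, group them by key, then reduce."""
    # Map phase: every element becomes an intermediate (key, value) pair.
    pairs = [(word, 1) for chunk in chunks for word in chunk]
    # Shuffle phase: the pairs are sorted and grouped by key.
    groups = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    # Reduce phase: each group is folded into a single value.
    return {key: sum(values) for key, values in groups.items()}


def generalized_reduction_count(chunks):
    """Generalized reduction: each element updates a reduction object directly."""
    local_objects = []
    for chunk in chunks:  # each chunk stands in for one worker's share of the data
        reduction_object = defaultdict(int)
        for word in chunk:
            reduction_object[word] += 1  # commutative and associative update
        local_objects.append(reduction_object)
    # Combination phase: merge the per-worker reduction objects.
    combined = defaultdict(int)
    for obj in local_objects:
        for key, value in obj.items():
            combined[key] += value
    return dict(combined)


chunks = [["a", "b", "a"], ["b", "c"], ["a"]]
print(map_reduce_count(chunks))
print(generalized_reduction_count(chunks))
```

The second version never materializes the list of intermediate pairs and never sorts or groups them, which is the overhead the slide refers to.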
Our Previous Work
• A comparative study between FREERIDE and Hadoop:
  • FREERIDE outperformed Hadoop by factors of 5 to 10
  • Possible reasons: Java vs. C++? HDFS overheads? Inefficiency of Hadoop? API difference?
• Developed MATE (Map-Reduce system with an AlternaTE API) on top of Phoenix from Stanford
  • Adopted Generalized Reduction
  • Focused on API differences
  • MATE improved on Phoenix by an average of 50%
  • Avoids a large set of intermediate pairs between Map and Reduce
  • Reduces memory requirements
Extending MATE
• Main issues of the original MATE:
  • Only works on a single multi-core machine
  • Datasets must reside in memory
  • Assumes the reduction object MUST fit in memory
• This paper extends MATE to address these limitations
  • Focus on graph mining: an emerging class of applications
  • These require large reduction objects as well as large-scale datasets
  • E.g., PageRank could have an 8GB reduction object!
  • Supports managing reduction objects of arbitrary size
  • Also supports reading disk-resident input data
• Evaluated Ex-MATE against PEGASUS
  • PEGASUS: a Hadoop-based graph mining system
System Design and Implementation
• System design of Ex-MATE
  • Execution overview
  • Support of distributed environments
• System APIs in Ex-MATE
  • One set provided by the runtime: operations on reduction objects
  • Another set defined or customized by the users: reduction, combination, etc.
• Runtime in Ex-MATE
  • Data partitioning
  • Task scheduling
  • Other low-level details
Ex-MATE Runtime Overview
• Basic one-stage execution
Implementation Considerations
• Support for processing very large datasets
  • Partitioning function: partition the data and distribute it to a number of nodes
  • Splitting function: use the multi-core CPU on each node
• Management of a large reduction object (R.O.): reduce disk I/O!
  • Outputs (R.O.) are updated in a demand-driven way
  • Partition the reduction object into splits
  • Inputs are re-organized based on data access patterns
  • Reuse an R.O. split as much as possible in memory
  • Example: Matrix-Vector Multiplication
An MV-Multiplication Example
[Figure: the input matrix divided into blocks labeled (1, 1), (2, 1), (1, 2), shown with the input vector and the output vector]
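The idea of reusing one split of the reduction object while it is in memory can be sketched as a blocked matrix-vector multiplication. The traversal order below (all matrix blocks that update one output split are processed before moving to the next split) is an assumption made for illustration; the function name `blocked_matvec` and the `loads` counter are hypothetical and not part of Ex-MATE.

```python
def blocked_matvec(matrix, vector, block):
    """Multiply a dense square matrix by a vector, one output split at a time."""
    n = len(vector)
    output = [0.0] * n
    loads = 0  # how many times an output split is brought "into memory"
    for row_start in range(0, n, block):  # one split of the output vector
        rows = range(row_start, min(row_start + block, n))
        split = [0.0] * len(rows)  # the split that is currently resident
        loads += 1
        for col_start in range(0, n, block):  # every matrix block that updates it
            cols = range(col_start, min(col_start + block, n))
            for offset, i in enumerate(rows):
                for j in cols:
                    split[offset] += matrix[i][j] * vector[j]
        # The finished split is written back exactly once.
        output[row_start:row_start + len(rows)] = split
    return output, loads


matrix = [[1, 2, 3, 4], [0, 1, 0, 1], [2, 0, 2, 0], [1, 1, 1, 1]]
vector = [1, 2, 3, 4]
print(blocked_matvec(matrix, vector, block=2))
```

With this ordering each output split is loaded once, regardless of how many matrix blocks contribute to it; an ordering that interleaved rows of blocks would have to reload splits repeatedly.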
GIM-V for Graph Mining (I)
• Generalized Iterative Matrix-Vector Multiplication (GIM-V)
  • First proposed at CMU
  • Similar to the common MV multiplication
• MV multiplication: v'(i) = sum over j of m(i, j) × v(j)
• Three operations in GIM-V:
  • combine m(i, j) and v(j): does not have to be a multiplication (Multiplication in plain MV multiplication)
  • combineAll n partial results for the element i: does not have to be the sum (Sum in plain MV multiplication)
  • assign v(new) to v(i): the previous value of v(i) is updated by a new value (Assignment in plain MV multiplication)
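A minimal sequential sketch of one GIM-V iteration, with the three operations passed in as functions. The name `gim_v` and the dense list-of-lists matrix are assumptions for the example; plugging in multiplication, sum, and assignment recovers the ordinary matrix-vector product.

```python
def gim_v(matrix, vector, combine2, combine_all, assign):
    """One GIM-V iteration over a dense matrix (list of rows) and a vector."""
    n = len(vector)
    new_vector = []
    for i in range(n):
        # combine2: pair each m(i, j) with v(j)
        partials = [combine2(matrix[i][j], vector[j]) for j in range(n)]
        # combineAll: fold the n partial results for element i
        v_new = combine_all(partials)
        # assign: decide how v(i) is updated with the new value
        new_vector.append(assign(vector[i], v_new))
    return new_vector


# Plain MV multiplication: multiplication, sum, assignment.
result = gim_v(
    [[1, 2], [3, 4]],
    [5, 6],
    combine2=lambda m, v: m * v,
    combine_all=sum,
    assign=lambda old, new: new,
)
print(result)
```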
GIM-V for Graph Mining (II)
• A set of graph mining applications can fit into GIM-V
  • PageRank, Diameter Estimation, Finding Connected Components, Random Walk with Restart, etc.
• Parallelization of GIM-V:
  • Using Map-Reduce in PEGASUS
    • A two-stage algorithm: two consecutive map-reduce jobs
  • Using Generalized Reduction in Ex-MATE
    • A one-stage algorithm: simpler code
GIM-V Example: PageRank
• PageRank is used by Google to calculate the relative importance of web pages
  • Direct implementation of GIM-V: v(j) is the ranking value
• The three customized operations are:
  • combine: Multiplication
  • combineAll: Sum
  • assign: Assignment
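The formulas on this slide did not survive extraction, so the sketch below falls back on the usual PageRank formulation with a damping factor `c`; the exact constants and the edge-list representation are assumptions, not taken from the slides. It performs one iteration: the multiplication step scales each neighbor's rank, the sum step adds the partial results plus the random-jump term, and the assignment step replaces the old rank.

```python
def pagerank_step(edges, ranks, c=0.85):
    """One PageRank iteration over a directed edge list of (src, dst) pairs."""
    n = len(ranks)
    out_degree = [0] * n
    for src, _dst in edges:
        out_degree[src] += 1
    partial_sums = [0.0] * n
    for src, dst in edges:
        m = 1.0 / out_degree[src]  # column-normalized entry m(dst, src)
        # Multiplication step, folded into the running sum for element dst.
        partial_sums[dst] += c * m * ranks[src]
    # Sum step adds the random-jump term; assignment overwrites the old rank.
    return [(1.0 - c) / n + s for s in partial_sums]


# A 3-node cycle keeps the uniform distribution unchanged.
edges = [(0, 1), (1, 2), (2, 0)]
ranks = pagerank_step(edges, [1.0 / 3] * 3)
print(ranks)
```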
GIM-V: Other Algorithms
• Diameter Estimation: HADI is an algorithm to estimate the diameter of a given graph
  • The three customized operations are:
    • combine: Multiplication
    • combineAll: Bitwise-or
    • assign: Bitwise-or
• Finding Connected Components: HCC is a new algorithm to find the connected components of large graphs
  • The three customized operations are:
    • combine: Multiplication
    • combineAll: Minimal
    • assign: Minimal
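As an illustration of the HCC operations listed above, the following sketch labels every node with the smallest node id in its component. It assumes an undirected graph, an adjacency matrix with m(i, j) = 1 for each edge, and node ids as initial labels; these details and the function name `hcc` are assumptions for the example rather than the authors' implementation.

```python
def hcc(num_nodes, edges):
    """Label each node with the minimum node id of its connected component."""
    labels = list(range(num_nodes))  # initial vector: v(i) = i
    neighbors = [[] for _ in range(num_nodes)]
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    while True:
        new_labels = []
        for i in range(num_nodes):
            # combine: m(i, j) * v(j), with m(i, j) = 1 for each neighbor j
            partials = [1 * labels[j] for j in neighbors[i]]
            # combineAll: minimum of the partial results
            candidate = min(partials) if partials else labels[i]
            # assign: minimum of the old and the new value
            new_labels.append(min(labels[i], candidate))
        if new_labels == labels:  # no label changed, so the iteration converged
            return labels
        labels = new_labels


print(hcc(6, [(0, 1), (1, 2), (3, 4)]))
```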
Parallelization of GIM-V (I)
Using Map-Reduce: Stage I
Map: map M(i, j) and V(j) to reducer j
Parallelization of GIM-V (II)
Using Map-Reduce: Stage I (cont.)
Reduce: map "combine2(M(i, j), V(j))" to reducer i
Parallelization of GIM-V (III)
Using Map-Reduce: Stage II
Map:
Parallelization of GIM-V (IV)
Using Map-Reduce: Stage II (cont.)
Reduce:
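The two-stage structure can be simulated in a single process as follows. Stage I follows the slide text (matrix and vector elements keyed by j, then combine2 results keyed by i). The Stage II details were lost with the slide images, so the identity map and the combineAll-plus-assign reduce shown here are an assumption, and the function name and sparse `(i, j, m)` entry format are hypothetical.

```python
from collections import defaultdict


def gim_v_two_stage(entries, vector, combine2, combine_all, assign):
    """Simulate two consecutive map-reduce jobs for one GIM-V iteration."""
    # Stage I map: send M(i, j) and V(j) to reducer j.
    stage1 = defaultdict(lambda: {"matrix": [], "v": None})
    for i, j, m in entries:
        stage1[j]["matrix"].append((i, m))
    for j, v in enumerate(vector):
        stage1[j]["v"] = v
    # Stage I reduce: emit combine2(M(i, j), V(j)) keyed by i.
    stage2 = defaultdict(list)
    for j, group in stage1.items():
        for i, m in group["matrix"]:
            stage2[i].append(combine2(m, group["v"]))
    # Stage II map (assumed identity), then Stage II reduce:
    # combineAll the partial results for i and assign the new value.
    new_vector = list(vector)  # rows without any matrix entry keep their old value
    for i, partials in stage2.items():
        new_vector[i] = assign(vector[i], combine_all(partials))
    return new_vector


entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
print(gim_v_two_stage(entries, [5, 6], lambda m, v: m * v, sum, lambda old, new: new))
```

The intermediate `stage1` and `stage2` groupings are what a real Map-Reduce runtime would materialize and shuffle between the two jobs.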
Parallelization of GIM-V (V)
Using Generalized Reduction in Ex-MATE:
Reduction:
Parallelization of GIM-V (VI)
Using Generalized Reduction in Ex-MATE:
Finalize:
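The reduction and finalize code on these slides was lost in extraction. The sketch below is a guess at the one-stage structure, not the Ex-MATE API: each simulated worker folds combine2 results straight into a local reduction object keyed by i, the local objects are combined, and a finalize step applies assign. It assumes `combine_all` is commutative and associative when applied pairwise (true for sum and minimum); all names are hypothetical.

```python
def gim_v_generalized_reduction(entries, vector, combine2, combine_all, assign,
                                num_workers=2):
    """One GIM-V iteration in a single reduction pass plus a finalize step."""
    # Reduction: each worker updates its own reduction object in place.
    local_objects = []
    for w in range(num_workers):
        reduction_object = {}
        for i, j, m in entries[w::num_workers]:  # this worker's share of the matrix
            x = combine2(m, vector[j])
            if i in reduction_object:
                reduction_object[i] = combine_all([reduction_object[i], x])
            else:
                reduction_object[i] = x
        local_objects.append(reduction_object)
    # Combination: merge the per-worker reduction objects.
    combined = {}
    for obj in local_objects:
        for i, x in obj.items():
            combined[i] = combine_all([combined[i], x]) if i in combined else x
    # Finalize: assign the accumulated value to each vector element.
    return [assign(old, combined[i]) if i in combined else old
            for i, old in enumerate(vector)]


entries = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
print(gim_v_generalized_reduction(entries, [5, 6], lambda m, v: m * v, sum,
                                  lambda old, new: new))
```

No intermediate key-value pairs are grouped or shuffled here; the only state carried between elements is the reduction object itself.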
Experiments Design
• Applications: three graph mining algorithms
  • PageRank, Diameter Estimation, and Finding Connected Components
• Evaluation:
  • Performance comparison with PEGASUS
    • PEGASUS provides a naïve version and an optimized version
  • Speedups with an increasing number of nodes
  • Scalability speedups with an increasing size of datasets
• Experimental platform:
  • A cluster of multi-core CPU machines
  • Used up to 128 cores (16 nodes)
Results: Graph Mining (I)
PageRank: 16GB dataset; a graph of 256 million nodes and 1 billion edges
[Figure: average time per iteration (min) vs. number of nodes; annotation: 10.0 speedup]
Results: Graph Mining (II)
HADI: 16GB dataset; a graph of 256 million nodes and 1 billion edges
[Figure: average time per iteration (min) vs. number of nodes; annotation: 11.0 speedup]
Results: Graph Mining (III)
HCC: 16GB dataset; a graph of 256 million nodes and 1 billion edges
[Figure: average time per iteration (min) vs. number of nodes; annotation: 9.0 speedup]
Scalability: Graph Mining (IV)
HCC: 8GB dataset; a graph of 256 million nodes and 0.5 billion edges
[Figure: average time per iteration (min) vs. number of nodes; annotations: 1.7 speedup, 1.9 speedup]
Scalability: Graph Mining (V)
HCC: 32GB dataset; a graph of 256 million nodes and 2 billion edges
[Figure: average time per iteration (min) vs. number of nodes; annotations: 1.9 speedup, 2.7 speedup]
Scalability: Graph Mining (VI)
HCC: 64GB dataset; a graph of 256 million nodes and 4 billion edges
[Figure: average time per iteration (min) vs. number of nodes; annotations: 1.9 speedup, 2.8 speedup]
Observations
• Performance trends are similar for all three applications
  • Consistent with the fact that all three applications are implemented using the GIM-V method
• Ex-MATE outperforms PEGASUS significantly for all three graph mining algorithms
• Reasonable speedups for different datasets
• Better scalability for larger datasets with an increasing number of nodes
Related Work: Academia
• Evaluation of Map-Reduce-like models in various parallel programming environments:
  • Phoenix-rebirth for large-scale multi-core machines
  • Mars for a single GPU
  • MITHRA for GPGPUs in heterogeneous platforms
  • Recent IDAV for GPU clusters
• Improvement of Map-Reduce API:
  • Integrating pre-fetch and pre-shuffling into Hadoop
  • Supporting online queries
  • Enforcing less restrictive synchronization semantics between Map and Reduce
Related Work: Industry
• Google's Pregel system:
  • Map-Reduce may not be so suitable for graph operations
  • Proposed to target graph processing
  • Open source version: HAMA project in Apache
• Variants of Map-Reduce:
  • Dryad/DryadLINQ from Microsoft
  • Sawzall from Google
  • Pig/Map-Reduce-Merge from Yahoo!
  • Hive from Facebook
Conclusion
• Ex-MATE supports the management of reduction objects of arbitrary sizes
  • Deals with disk-resident reduction objects
• Outperforms PEGASUS for both the naïve and optimized implementations for all three graph mining applications
  • Has simpler code
• Offers a promising alternative for developing efficient data-intensive applications
  • Uses GIM-V for parallelizing graph mining
Thank You, and Acknowledgments
Questions and comments
Wei Jiang - [email protected]
Gagan Agrawal - [email protected]
This project was supported by: