SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.
-
Upload
wilfred-day -
Category
Documents
-
view
214 -
download
0
description
Transcript of SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.
![Page 1: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/1.jpg)
SALSA
Harp: Collective Communication on Hadoop
Judy Qiu, Indiana University
![Page 2: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/2.jpg)
SALSA
Prof. David CrandallComputer Vision
Prof. Filippo MenczerComplex Networks and Systems
Bingjing Zhang
AcknowledgementXiaoming Gao Stephen Wu
Thilina Gunarathne Yuan Young
Prof. Haixu TangBioinformatics
SALSA HPC Group http://salsahpc.indiana.edu
School of Informatics and ComputingIndiana University
Zhenghao Gu
Prof. Madhav MarathNetwork Science and HCI
Prof. Andrew NgMachine Learning
![Page 3: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/3.jpg)
SALSA
Machine Learning on Big Data
• Mahout on Hadoop• https://mahout.apache.org/
• MLlib on Spark• http://spark.apache.org/mllib/
• GraphLab Toolkits• http://graphlab.org/projects/toolkits.html• GraphLab Computer Vision Toolkit
Extracting Knowledge with Data Analytics
![Page 4: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/4.jpg)
MapReduce ModelDAG Model Graph Model BSP/Collective Model
Storm
TwisterFor Iterations/Learning
For Streaming
For Query
S4
Drill
HadoopMPI
Dryad/DryadLINQ Pig/PigLatin
Spark
Shark
Spark Streaming
MRQL
HiveTez
GiraphHama
GraphLab
HarpGraphX
HaLoop
Samza
The World of Big Data Tools
StratosphereReef
Do we need 140 software packages?
![Page 5: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/5.jpg)
SALSA
Programming Runtimes
High-level programming models such as MapReduce adopt a data-centered designComputation starts from dataSupport moving computation to dataShows promising results for data-intensive computing( Google, Yahoo, Amazon, Microsoft …)
Challenges: traditional MapReduce and classical parallel runtimes cannot solve iterative algorithms efficiently
Hadoop: repeated data access to HDFS, no optimization to (in memory) data caching and (collective) intermediate data transfers MPI: no natural support of fault tolerance; programming interface is complicated
MPI, PVM, Hadoop MapReduce
Chapel, X10,HPF
Classic Cloud: Queues, Workers
DAGMan, BOINC
Workflows, Swift, Falkon
PaaS:Worker Roles
Perform Computations EfficientlyAchieve Higher Throughput
Pig Latin, Hive
![Page 6: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/6.jpg)
SALSA
(a) Map Only(Pleasingly Parallel)
(b) ClassicMapReduce
(c) Iterative MapReduce
(d) Loosely Synchronous
- CAP3 Gene Analysis- Smith-Waterman
Distances- Document conversion
(PDF -> HTML)- Brute force searches in
cryptography- Parametric sweeps- PolarGrid MATLAB data
analysis
- High Energy Physics (HEP) Histograms
- Distributed search- Distributed sorting- Information retrieval- Calculation of Pairwise
Distances for sequences (BLAST)
- Expectation maximization algorithms
- Linear Algebra- Data mining, includes
K-means clustering - Deterministic
Annealing Clustering- Multidimensional
Scaling (MDS) - PageRank
Many MPI scientific applications utilizing wide variety of communication constructs, including local interactions- Solving Differential
Equations and particle dynamics with short range forces
Pij
Collective Communication MPI
Input
Output
mapInput
map
reduce
Inputmap
iterations
No Communication
reduce
Applications & Different Interconnection Patterns
Domain of MapReduce and Iterative Extensions
![Page 7: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/7.jpg)
SALSA
Iterative MapReduce
• Mapreduce is a Programming Model instantiating the paradigm of bringing computation to data
• Iterative Mapreduce extends Mapreduce programming model and support iterative algorithms for Data Mining and Data Analysis
• Is it possible to use the same computational tools on HPC and Cloud?• Enabling scientists to focus on science not programming distributed
systems
![Page 8: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/8.jpg)
SALSA
Data Analysis ToolsMapReduce optimized for iterative computations
Twister: the speedy elephant
In-Memory• Cacheable map/reduce tasks
Data Flow • Iterative• Loop Invariant • Variable data
Thread • Lightweight• Local aggregation
Map-Collective • Communication patterns optimized for large intermediate data transfer
Portability• HPC (Java)• Azure Cloud (C#)• Supercomputer (C++, Java)
Abstractions
![Page 9: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/9.jpg)
SALSA
Reduce (Key, List<Value>)
Map(Key, Value)
Loop Invariant DataLoaded only once
Faster intermediate data transfer mechanismCombiner
operation to collect all reduce
outputs
Cacheable map/reduce tasks
(in memory)
Configure()
Combine(Map<Key,Value>)
Programming Model for Iterative MapReduce
Distinction on loop invariant data and variable data (data flow vs. δ flow)Cacheable map/reduce tasks (in-memory)Combine operation
Main Programwhile(..){ runMapReduce(..)}
Variable data
![Page 10: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/10.jpg)
SALSA10
Broadcast Comparison: Twister vs. MPI vs. Spark
At least a factor of 120 on 125 nodes, compared with the simple broadcast algorithm
The new topology-aware chain broadcasting algorithm gives 20% better performance than best C/C++ MPI methods (four times faster than Java MPJ) A factor of 5 improvement over non-optimized (for topology) pipeline-based method over 150 nodes.
Tested on IU Polar Grid with 1 Gbps Ethernet connection
High Performance Data Movement
![Page 11: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/11.jpg)
SALSA
Harp Map-Collective Communication Model
• Parallelism Model • Architecture
ShuffleM M M MCollective Communication
M M M M
R R
Map-Collective ModelMapReduce Model
YARN
MapReduce V2
Harp
MapReduce Applications
Map-Collective ApplicationsApplication
Framework
Resource Manager
We generalize the Map-Reduce concept to Map-Collective, noting that large collectives are a distinguishing feature of data intensive and data mining applications.
Hadoop Plugin (on Hadoop 1.2.1 and Hadoop 2.2.0)
![Page 12: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/12.jpg)
SALSA
Vertex Table
KeyValue Partition
Array
Commutable
Key-ValuesVertices, Edges, MessagesDouble Array
Int Array
Long Array
Array Partition < Array Type >
Struct Object
Vertex Partition
Edge Partition
Array Table <Array Type>
Message Partition
KeyValue Table
Byte Array
Message Table
EdgeTable
Broadcast, Send, Gather
Broadcast, Allgather, Allreduce, Regroup-(combine/reduce), Message-to-Vertex, Edge-to-Vertex
Broadcast, Send
Table
Partition
Basic Types
Hierarchical Data Abstraction and Collective Communication
![Page 13: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/13.jpg)
SALSA
K-means Clustering Parallel Efficiency
• Shantenu Jha et al. A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures. 2014.
![Page 14: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/14.jpg)
SALSA
0 20 40 60 80 100 120 1400.00
0.20
0.40
0.60
0.80
1.00
1.20
WDA-MDS Parallel Efficiency on Big Red II Nodes: 8, 16, 32, 64, 128, with 32 Cores per Node
JVM settings: -Xmx42000M -Xms42000M -XX:NewRatio=1 -XX:SurvivorRatio=18
100k 200k 300k 400k
WDA-MDS Performance on Big Red II
![Page 15: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/15.jpg)
SALSA
Data Intensive Kmeans Clustering─ Image Classification: 7 million images; 512 features per image; 1 million clusters 10K Map tasks; 64G broadcasting data (1GB data transfer per Map task node);20 TB intermediate data in shuffling.
![Page 16: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/16.jpg)
SALSA
• Provides system authors with a centralized (pluggable) control flow • Embeds a user-defined system controller called the Job Driver• Event driven control
• Package a variety of data-processing libraries (e.g., high-bandwidth shuffle, relational operators, low-latency group communication, etc.) in a reusable form.
• To cover different models such as MapReduce, query, graph processing and stream data processing
Apache Open Source Project
![Page 17: SALSASALSA Harp: Collective Communication on Hadoop Judy Qiu, Indiana University.](https://reader035.fdocuments.in/reader035/viewer/2022070611/5a4d1bb77f8b9ab0599cf29c/html5/thumbnails/17.jpg)
SALSA
• Research run times that will run Algorithms on a much larger scale
• Provide Data Service on Clustering and MDS Algorithms
Future Work