High Performance Parallel Computing with Clouds and Cloud Technologies
Jaliya Ekanayake and Geoffrey Fox
School of Informatics and Computing, Indiana University Bloomington
Cloud Computing and Software Services: Theory and Techniques, July 2010
Presented by: Inderjeet Singh
Introduction
Problem
Data Analysis Applications
Evaluations and Analysis
Performance of MPI on Clouds
Benchmarks and Results
Conclusions and Future Work
Critique
Overview
Introduction
Clouds and Cloud Technologies
Apache Hadoop (open-source implementation of Google MapReduce)
DryadLINQ (Microsoft API for Dryad)
CGL-MapReduce (iterative version of MapReduce)
These are referred to interchangeably as cloud technologies / parallel runtimes / cloud runtimes
On-demand provisioning of resources
Customizable virtual machines (VMs)
Root privileges
Fast provisioning (within minutes)
Pay only for what you use
Better resource utilization
Advantages of Cloud
Cloud technologies:
Moving computation to data
Better Quality of Service (QoS)
Simple communication topologies
Distributed file systems (HDFS, GFS)
MPI (the basis of most HPC applications):
Many fine-grained communication topologies
Use of fast networks
Features
Software framework to support distributed computing on large datasets across clusters of computers
Map step: the master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. Each worker node processes its smaller problem and passes the answer back to its master node.
Reduce step: the master node collects the answers to all the sub-problems and combines them in some way to form the output (a minimal sketch follows below).
MapReduce
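To make the two steps concrete, here is a minimal single-process Python sketch of the map/reduce pattern; the word-count task and function names are illustrative only (real frameworks such as Hadoop run the map tasks on distributed worker nodes):

from collections import defaultdict

def map_step(partition):
    # Worker: turn one input partition into intermediate (key, value) pairs.
    return [(word, 1) for word in partition.split()]

def reduce_step(pairs):
    # Master: combine all intermediate answers into the final result.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# The master partitions the input and hands one partition to each worker.
partitions = ["moving computation to data", "data centric processing of data"]
intermediate = []
for partition in partitions:  # each iteration stands in for one worker node
    intermediate.extend(map_step(partition))
print(reduce_step(intermediate))  # e.g. {'data': 3, 'moving': 1, ...}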
Large data- and compute-intensive applications
Traditional approach:
Execution on clusters/grids/supercomputers
Moving both application and data to the available computational power
Efficiency decreases with large datasets
Better approach:
Execution with cloud technologies
Moving computation to the data to perform processing
A more data-centric approach
Comparisons of features supported by different cloud technologies and MPI
What applications are best handled by cloud technologies?
What overheads do they introduce?
Can traditional parallel runtimes such as MPI be used in the cloud? If so, what overheads do they incur?
Problem
Types of Applications (Based upon communication)
Map only (Cap3)
MapReduce (HEP)
Iterative/complex style (matrix multiplication and k-means clustering)
Data Analysis Applications
Cap3 – sequence assembly program that operates on a collection of gene sequence files to produce several outputs
HEP – High Energy Physics data analysis application
K-Means clustering – iteratively refines the computed clusters
Matrix multiplication – Cannon's algorithm
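Since Cannon's algorithm recurs throughout the evaluation, here is a minimal sketch of it in Python using mpi4py and NumPy (both assumed dependencies; the block size and random data are made up). Run with a square number of processes, e.g. mpiexec -n 4 python cannon.py:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
p = comm.Get_size()
q = int(round(p ** 0.5))  # q x q process grid; p must be a perfect square
assert q * q == p

n = 64  # local block size (illustrative)
cart = comm.Create_cart([q, q], periods=[True, True])
row, col = cart.Get_coords(cart.Get_rank())

A = np.random.rand(n, n)  # this rank's blocks of A and B
B = np.random.rand(n, n)
C = np.zeros((n, n))

# Initial alignment: shift A left by `row` positions and B up by `col`.
src, dst = cart.Shift(1, -row)
cart.Sendrecv_replace(A, dest=dst, source=src)
src, dst = cart.Shift(0, -col)
cart.Sendrecv_replace(B, dest=dst, source=src)

# q steps: multiply local blocks, then shift A left and B up by one.
for _ in range(q):
    C += A @ B
    src, dst = cart.Shift(1, -1)
    cart.Sendrecv_replace(A, dest=dst, source=src)
    src, dst = cart.Shift(0, -1)
    cart.Sendrecv_replace(B, dest=dst, source=src)

The repeated exchange of whole n x n blocks is exactly the large-message communication that the later results blame for the slowdown on VMs.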
Iterative/Complex Style
Standard MapReduce does not support iterative/complex style applications, so the authors [Fox] built CGL-MapReduce
CGL-MapReduce supports long-running tasks and retains static data in memory across invocations
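A conceptual single-process Python sketch of that idea (this is not CGL-MapReduce's actual API): the large static input is loaded once and kept in memory, while only the small variable data, here 1-D k-means centers, changes between iterations:

import random

STATIC_DATA = [random.random() for _ in range(10_000)]  # loaded once, reused

def map_task(centers):
    # Reuses the in-memory static data; only `centers` varies per iteration.
    groups = {i: [] for i in range(len(centers))}
    for x in STATIC_DATA:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        groups[nearest].append(x)
    return groups

def reduce_task(groups, centers):
    new_centers = []
    for i, c in enumerate(centers):
        pts = groups[i]
        # New center = mean of assigned points; keep the old one if empty.
        new_centers.append(sum(pts) / len(pts) if pts else c)
    return new_centers

centers = [0.1, 0.9]
for _ in range(10):  # the long-running iterative computation
    centers = reduce_task(map_task(centers), centers)
print(centers)

In plain Hadoop, STATIC_DATA would be re-read from the distributed file system on every iteration; avoiding that reload is the point of CGL-MapReduce.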
Performance: average running time
Overhead = [P × T(P) − T(1)] / T(1)
where P = number of parallel processes, T(P) = running time on P processes, and T(1) = sequential running time
Evaluation and Analysis
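The metric can be stated as a tiny Python helper; the timing values below are hypothetical, purely to show how the formula behaves:

def overhead(t1, tp, p):
    # Overhead = [P * T(P) - T(1)] / T(1); 0 means perfect linear scaling.
    return (p * tp - t1) / t1

def speedup(t1, tp):
    return t1 / tp

# Hypothetical timings: 1000 s sequential, 18 s on P = 64 processes.
print(overhead(1000.0, 18.0, 64))  # 0.152 -> 15.2% parallel overhead
print(speedup(1000.0, 18.0))       # ~55.6x speedup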
Figures: performance results with DryadLINQ and with Hadoop / CGL-MapReduce / MPI
Cap3 (map only) and HEP (MapReduce) perform well with cloud runtimes
K-means clustering and matrix multiplication (both iterative) show high overheads with cloud runtimes compared to the MPI runtime
CGL-MapReduce also gives lower overhead for large datasets
Goals:
Measure the overhead of virtual machines (VMs) on MPI-based parallel applications
Determine how applications with different communication/computation (C/C) ratios perform on the cloud
Assess the effect of different CPU core assignment strategies for VMs when running these MPI applications on them
Performance of MPI on Private Cloud
Three MPI applications with different C/C ratio requirements:
Matrix multiplication (Cannon's algorithm)
K-means clustering
Concurrent wave equation solver
Table: computation and communication complexities of the different MPI applications used
Eucalyptus- and Xen-based cloud infrastructure
16 nodes, each with 2 quad-core Intel Xeon processors and 32 GB of memory
Nodes connected with a 1-gigabit Ethernet connection
Same software configuration for both bare-metal nodes and VMs:
• OS: Red Hat Enterprise Linux Server release 5.2
• OpenMPI version 1.3.2
Benchmarks and Results
Different CPU core / virtual machine assignment strategies
Invariant used to select the number of MPI processes: number of MPI processes = number of CPU cores used
Matrix Multiplication (Cannon’s)
◦ Speedup decreases by 34% between bare metal and 8 VMs/node at 81 processes
◦ Caused by the exchange of large messages and heavier communication
Speedup – fixed matrix size (5184 × 5184)
Performance – 64 CPU cores
K-Means Clustering
◦ Communication is much smaller than computation
◦ Communication here depends on the number of clusters formed
◦ Overhead is large for small data sizes, so less speedup is observed
Total overhead (number of MPI processes = 128)
Performance – 128 CPU Cores
Concurrent Wave Equation Solver
◦ The amount of communication is fixed and data transfer rates are low
◦ The low C/C ratio of O(1/n) leads to more latency sensitivity and lower performance on VMs (sketched below)
◦ 8 VMs per node has 7% more overhead than a bare-metal node
Performance – 128 CPU cores; Total overhead (number of MPI processes = 128)
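To see why the communication volume is fixed, here is a minimal mpi4py/NumPy sketch of a 1-D concurrent wave equation solver (assumed setup; sizes and constants are made up). Each rank updates its own strip of n points but exchanges only two boundary values per time step, so the C/C ratio falls off as O(1/n):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
n = 1000  # points per rank (illustrative)

u = np.sin(np.linspace(0, np.pi, n + 2))  # displacement, with 2 ghost cells
u_prev = u.copy()                          # zero initial velocity
c = 0.5                                    # Courant-like constant (made up)

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(100):
    # Halo exchange: two scalar messages per step, independent of n.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[n+1:n+2], source=right)
    comm.Sendrecv(u[n:n+1], dest=right, recvbuf=u[0:1], source=left)
    # Leapfrog update of the interior points.
    u_new = 2 * u - u_prev
    u_new[1:-1] += c * (u[:-2] - 2 * u[1:-1] + u[2:])
    u_prev, u = u, u_new

Because each message is tiny, per-message latency (which virtualization inflates) dominates, matching the VM overheads reported above.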
◦ In multi-VM configurations, scheduling of I/O operations of DomUs (user domains) happens via Dom0 (the privileged OS)
Figure: communication between Dom0 and DomUs when 1 VM per node is deployed (top) and when 8 VMs per node are deployed (bottom)
When using multiple VMs on multi-core CPUs, it is advisable to use runtimes that support in-node communication (OpenMPI rather than LAM-MPI)
Figure: LAM-MPI vs. OpenMPI in different VM configurations
Cloud runtimes work well for pleasingly parallel (map-only and MapReduce) applications with large datasets
Overheads of cloud runtimes are high for parallel applications that require iterative/complex communication patterns (MPI-based applications)
Work needs to be done on finding cloud-friendly algorithms for these applications
CGL-MapReduce is efficient for iterative-style MapReduce applications (e.g., k-means)
Conclusions and Future Work
Overheads for MPI applications increase as the number of VMs per node increases (22–50% performance degradation)
In-node communication is important; MapReduce applications, being less susceptible to latencies, may perform well on VMs deployed in clouds
Future work: integration of MapReduce and MPI (a biological DNA sequencing application)
No results for MPI implementations of the pleasingly parallel applications (Cap3, HEP), so time comparisons between MPI and the cloud runtimes are missing for them
Missing evaluations of the HPC applications implemented with cloud runtimes on the private cloud, which would be critical to show the effect of multi-VM/multi-core configurations on their performance
Different memory sizes (16/32 GB) across clusters with different OS configurations, which could bias the results
Critique
Jaliya Ekanayake and Geoffrey Fox, "High Performance Parallel Computing with Clouds and Cloud Technologies," Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 34, 20 pages, 2010.
High Performance Parallel Computing with Clouds and Cloud Technologies (slides): http://www.slideshare.net/jaliyae/high-performance-parallel-computing-with-clouds-and-cloud-technologies
MapReduce, Wikipedia: http://en.wikipedia.org/wiki/MapReduce
References