High Performance Parallel Computing with Clouds and Cloud Technologies
Jaliya Ekanayake and Geoffrey Fox
School of Informatics and Computing, Indiana University Bloomington
Cloud Computing and Software Services: Theory and Techniques, July 2010
Presented by: Inderjeet Singh
Introduction
Problem
Data Analysis Applications
Evaluations and Analysis
Performance of MPI on Clouds
Benchmarks and Results
Conclusions and Future Work
Critique
Overview
Introduction
Clouds and Cloud Technologies
Apache Hadoop (open-source implementation of Google MapReduce)
DryadLINQ (Microsoft API for Dryad)
CGL-MapReduce (iterative version of MapReduce)
These are referred to interchangeably as cloud technologies / parallel runtimes / cloud runtimes
On-demand provisioning of resources
Customizable virtual machines (VMs)
Root privileges
Fast provisioning (within minutes)
Pay only for what you use
Better resource utilization
Advantages of Cloud
Cloud technologies:
Moving computation to data
Better Quality of Service (QoS)
Simple communication topologies
Distributed file systems (HDFS, GFS)
MPI (the basis of most HPC applications):
Many fine-grained communication topologies
Use of fast networks
Features
Software framework to support distributed computing on large datasets across clusters of computers
Map step: the master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. Each worker node processes its smaller problem and passes the answer back to its master node.
Reduce step: the master node collects the answers to all the sub-problems and combines them in some way to form the output (a minimal sketch follows below).
MapReduce
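To make the two steps concrete, here is a minimal single-process Python sketch of the map/reduce pattern; the word-count task and function names are illustrative only (real frameworks such as Hadoop run the map tasks on distributed worker nodes):

from collections import defaultdict

def map_step(partition):
    # Worker: turn one input partition into intermediate (key, value) pairs.
    return [(word, 1) for word in partition.split()]

def reduce_step(pairs):
    # Master: combine all intermediate answers into the final result.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# The master partitions the input and hands one partition to each worker.
partitions = ["moving computation to data", "data centric processing of data"]
intermediate = []
for partition in partitions:  # each iteration stands in for one worker node
    intermediate.extend(map_step(partition))
print(reduce_step(intermediate))  # e.g. {'data': 3, 'moving': 1, ...}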
Large data- and compute-intensive applications
Traditional approach:
Execution on clusters/grids/supercomputers
Moving both application and data to the available computational power
Efficiency decreases with large datasets
Better approach:
Execution with cloud technologies
Moving computation to the data to perform processing
A more data-centric approach
Comparisons of features supported by different cloud technologies and MPI
What applications are best handled by cloud technologies?
What overheads do they introduce?
Can traditional parallel runtimes such as MPI be used in the cloud? If so, what overheads do they incur?
Problem
Types of Applications (Based upon communication)
Map only (Cap3)
MapReduce (HEP)
Iterative/complex style (matrix multiplication and k-means clustering)
Data Analysis Applications
Cap3 – sequence assembly program that operates on a collection of gene sequence files to produce several outputs
HEP – High Energy Physics data analysis application
K-Means clustering – iteratively refines the computed clusters
Matrix multiplication – Cannon's algorithm
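Since Cannon's algorithm recurs throughout the evaluation, here is a minimal sketch of it in Python using mpi4py and NumPy (both assumed dependencies; the block size and random data are made up). Run with a square number of processes, e.g. mpiexec -n 4 python cannon.py:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
p = comm.Get_size()
q = int(round(p ** 0.5))  # q x q process grid; p must be a perfect square
assert q * q == p

n = 64  # local block size (illustrative)
cart = comm.Create_cart([q, q], periods=[True, True])
row, col = cart.Get_coords(cart.Get_rank())

A = np.random.rand(n, n)  # this rank's blocks of A and B
B = np.random.rand(n, n)
C = np.zeros((n, n))

# Initial alignment: shift A left by `row` positions and B up by `col`.
src, dst = cart.Shift(1, -row)
cart.Sendrecv_replace(A, dest=dst, source=src)
src, dst = cart.Shift(0, -col)
cart.Sendrecv_replace(B, dest=dst, source=src)

# q steps: multiply local blocks, then shift A left and B up by one.
for _ in range(q):
    C += A @ B
    src, dst = cart.Shift(1, -1)
    cart.Sendrecv_replace(A, dest=dst, source=src)
    src, dst = cart.Shift(0, -1)
    cart.Sendrecv_replace(B, dest=dst, source=src)

The repeated exchange of whole n x n blocks is exactly the large-message communication that the later results blame for the slowdown on VMs.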
Iterative/Complex Style
Standard MapReduce does not support iterative/complex style applications, so the authors [Fox] built CGL-MapReduce
CGL-MapReduce supports long-running tasks and retains static data in memory across invocations
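A conceptual single-process Python sketch of that idea (this is not CGL-MapReduce's actual API): the large static input is loaded once and kept in memory, while only the small variable data, here 1-D k-means centers, changes between iterations:

import random

STATIC_DATA = [random.random() for _ in range(10_000)]  # loaded once, reused

def map_task(centers):
    # Reuses the in-memory static data; only `centers` varies per iteration.
    groups = {i: [] for i in range(len(centers))}
    for x in STATIC_DATA:
        nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        groups[nearest].append(x)
    return groups

def reduce_task(groups, centers):
    new_centers = []
    for i, c in enumerate(centers):
        pts = groups[i]
        # New center = mean of assigned points; keep the old one if empty.
        new_centers.append(sum(pts) / len(pts) if pts else c)
    return new_centers

centers = [0.1, 0.9]
for _ in range(10):  # the long-running iterative computation
    centers = reduce_task(map_task(centers), centers)
print(centers)

In plain Hadoop, STATIC_DATA would be re-read from the distributed file system on every iteration; avoiding that reload is the point of CGL-MapReduce.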
Performance: average running time
Overhead = [P × T(P) − T(1)] / T(1)
where P = number of parallel processes, T(P) = running time on P processes, and T(1) = sequential running time
Evaluation and Analysis
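The metric can be stated as a tiny Python helper; the timing values below are hypothetical, purely to show how the formula behaves:

def overhead(t1, tp, p):
    # Overhead = [P * T(P) - T(1)] / T(1); 0 means perfect linear scaling.
    return (p * tp - t1) / t1

def speedup(t1, tp):
    return t1 / tp

# Hypothetical timings: 1000 s sequential, 18 s on P = 64 processes.
print(overhead(1000.0, 18.0, 64))  # 0.152 -> 15.2% parallel overhead
print(speedup(1000.0, 18.0))       # ~55.6x speedup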
Figures: performance results with DryadLINQ and with Hadoop / CGL-MapReduce / MPI
Cap3 (map only) and HEP (MapReduce) perform well with cloud runtimes
K-means clustering and matrix multiplication (both iterative) show high overheads with cloud runtimes compared to the MPI runtime
CGL-MapReduce also gives lower overhead for large datasets
Goals:
Measure the overhead of virtual machines (VMs) on MPI-based parallel applications
Determine how applications with different communication/computation (C/C) ratios perform on the cloud
Assess the effect of different CPU core assignment strategies for VMs when running these MPI applications on them
Performance of MPI on Private Cloud
Three MPI applications with different C/C ratio requirements:
Matrix multiplication (Cannon's algorithm)
K-means clustering
Concurrent wave equation solver
Table: computation and communication complexities of the different MPI applications used
Eucalyptus- and Xen-based cloud infrastructure
16 nodes, each with 2 quad-core Intel Xeon processors and 32 GB of memory
Nodes connected with a 1-gigabit Ethernet connection
Same software configuration for both bare-metal nodes and VMs:
• OS: Red Hat Enterprise Linux Server release 5.2
• OpenMPI version 1.3.2
Benchmarks and Results
Different CPU core / virtual machine assignment strategies
Invariant used to select the number of MPI processes: number of MPI processes = number of CPU cores used
Matrix Multiplication (Cannon’s)
◦ Speedup decreases by 34% between bare metal and 8 VMs/node at 81 processes
◦ Caused by the exchange of large messages and heavier communication
Speedup – fixed matrix size (5184 × 5184)
Performance – 64 CPU cores
K-Means Clustering
◦ Communication is much smaller than computation
◦ Communication here depends on the number of clusters formed
◦ Overhead is large for small data sizes, so less speedup is observed
Total overhead (number of MPI processes = 128)
Performance – 128 CPU Cores
Concurrent Wave Equation Solver
◦ The amount of communication is fixed and data transfer rates are low
◦ The low C/C ratio of O(1/n) leads to more latency sensitivity and lower performance on VMs (sketched below)
◦ 8 VMs per node has 7% more overhead than a bare-metal node
Performance – 128 CPU cores; Total overhead (number of MPI processes = 128)
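To see why the communication volume is fixed, here is a minimal mpi4py/NumPy sketch of a 1-D concurrent wave equation solver (assumed setup; sizes and constants are made up). Each rank updates its own strip of n points but exchanges only two boundary values per time step, so the C/C ratio falls off as O(1/n):

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
n = 1000  # points per rank (illustrative)

u = np.sin(np.linspace(0, np.pi, n + 2))  # displacement, with 2 ghost cells
u_prev = u.copy()                          # zero initial velocity
c = 0.5                                    # Courant-like constant (made up)

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

for _ in range(100):
    # Halo exchange: two scalar messages per step, independent of n.
    comm.Sendrecv(u[1:2], dest=left, recvbuf=u[n+1:n+2], source=right)
    comm.Sendrecv(u[n:n+1], dest=right, recvbuf=u[0:1], source=left)
    # Leapfrog update of the interior points.
    u_new = 2 * u - u_prev
    u_new[1:-1] += c * (u[:-2] - 2 * u[1:-1] + u[2:])
    u_prev, u = u, u_new

Because each message is tiny, per-message latency (which virtualization inflates) dominates, matching the VM overheads reported above.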
◦ In multi-VM configurations, scheduling of I/O operations of DomUs (user domains) happens via Dom0 (the privileged OS)
Figure: communication between Dom0 and DomUs when 1 VM per node is deployed (top) and when 8 VMs per node are deployed (bottom)
When using multiple VMs on multi-core CPUs, it is advisable to use runtimes that support in-node communication (OpenMPI rather than LAM-MPI)
Figure: LAM-MPI vs. OpenMPI in different VM configurations
Cloud runtimes work well for pleasingly parallel (map-only and MapReduce) applications with large datasets
Overheads of cloud runtimes are high for parallel applications that require iterative/complex communication patterns (MPI-based applications)
Work needs to be done on finding cloud-friendly algorithms for these applications
CGL-MapReduce is efficient for iterative-style MapReduce applications (e.g., k-means)
Conclusions and Future Work
Overheads for MPI applications increase as the number of VMs per node increases (22–50% performance degradation)
In-node communication is important; MapReduce applications, being less susceptible to latencies, may perform well on VMs deployed in clouds
Future work: integration of MapReduce and MPI (a biological DNA sequencing application)
No results for MPI implementations of the pleasingly parallel applications (Cap3, HEP), so time comparisons between MPI and the cloud runtimes are missing for them
Missing evaluations of the HPC applications implemented with cloud runtimes on the private cloud, which would be critical to show the effect of multi-VM/multi-core configurations on their performance
Different memory sizes (16/32 GB) across clusters with different OS configurations, which could bias the results
Critique
Jaliya Ekanayake and Geoffrey Fox, "High Performance Parallel Computing with Clouds and Cloud Technologies," Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol. 34, 20 pages, 2010.
High Performance Parallel Computing with Clouds and Cloud Technologies (slides): http://www.slideshare.net/jaliyae/high-performance-parallel-computing-with-clouds-and-cloud-technologies
MapReduce, Wikipedia: http://en.wikipedia.org/wiki/MapReduce
References