The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster

Posted on 26-Jan-2015


Description

The refactoring of the Hadoop MapReduce framework, which separated resource management (YARN) from job execution (MapReduce), has allowed multiple programming paradigms to take advantage of massive-scale Hadoop Distributed File System (HDFS) clusters. Hamster (Hadoop And Mpi on the same cluSTER) is a port of OpenMPI that uses YARN as its resource manager. Hamster allows applications written using MPI (Message Passing Interface) to run alongside other YARN applications and frameworks, such as MapReduce, on the same Hadoop cluster. In this talk, I will describe the architecture of Hamster and present a few MPI applications that have been demonstrated to run on Hadoop. GraphLab uses MPI as one of its supported communication libraries and can read/write data from/to HDFS. I will describe how GraphLab runs on top of Hadoop using Hamster, and present a few benchmarks in graph analytics, comparing GraphLab with other machine learning frameworks.

Transcript of The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster

The Zoo Expands Labrador 💛 Elephant, Thanks to Hamster

Milind Bhandarkar Chief Scientist, Pivotal Software, Inc.

About Me

• http://www.linkedin.com/in/milindb

• Founding member of Hadoop team at Yahoo! [2005-2010]

• Contributor to Apache Hadoop since v0.1

• Built and led Grid Solutions Team at Yahoo! [2007-2010]

• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)

• Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems (acquired by Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly Greenplum)

Hamster

• Hadoop and MPI on the same cluster

• Runtime for OpenMPI applications on YARN

• Available on Pivotal HD

Why MPI?

• Hadoop dataflow paradigms (MapReduce, Tez, etc.) are not suitable for iterative applications

• Message Passing Interface (MPI)

• Mature standard

• Used extensively in HPC

• Huge ecosystem

MPI in Science & Engineering

Earth Atmosphere

Chemistry

Biology

Math Nuclear

MPI in Industry

Mechanical, Car

Finance/Banking, Oil Exploration, Cryptography

Spacecraft

OpenMPI

• Mature open-source implementation of the MPI 3.0 standard (mpi-forum.org)

• New BSD license

• 30+ contributing organizations from academia, research and industry

• http://open-mpi.org

OpenMPI Architecture

[Diagram: OpenMPI's pluggable component architecture]

Hamster Design

• YARN as Resource Manager

• Hamster Application Manager

• Manages MPI jobs

• (Tries to) implement gang scheduling

• Leverages OMPI/ORTE strengths

• Wire-up, task monitoring, fast interconnect
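Gang scheduling means an MPI job starts only when all of its processes can start together, since MPI ranks generally cannot make progress until every peer is up. A minimal sketch of an all-or-nothing allocator, assuming containers arrive in batches across heartbeats; `GangAllocator`, `offer`, and `timeout` are hypothetical names, not Hamster's or YARN's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GangAllocator:
    """Hypothetical all-or-nothing allocator for one MPI job (not Hamster's API).

    Containers granted by the resource manager are held until the whole gang of
    `needed` containers is present; only then is the job launched.
    """
    needed: int
    held: List[str] = field(default_factory=list)

    def offer(self, containers: List[str]) -> Optional[Tuple[List[str], List[str]]]:
        """Accept the containers granted in one allocation round (e.g. one heartbeat).

        Returns (gang, surplus) once `needed` containers are held, else None.
        A real AM would hand `surplus` back to the resource manager.
        """
        self.held.extend(containers)
        if len(self.held) < self.needed:
            return None  # keep waiting: never start a partial MPI job
        gang, surplus = self.held[: self.needed], self.held[self.needed:]
        self.held = []
        return gang, surplus

    def timeout(self) -> List[str]:
        """Abandon this attempt and return every held container for release."""
        released, self.held = self.held, []
        return released
```

For a 4-process job, `offer(["c1", "c2"])` returns `None`; a later `offer(["c3", "c4", "c5"])` returns the gang `["c1", "c2", "c3", "c4"]` plus the surplus `["c5"]`. The `timeout` path matters because holding a partial gang indefinitely can starve other YARN applications, which is one reason gang scheduling on a shared cluster is only "tried".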

Hamster Architecture

[Diagram: Hamster architecture. Components: the YARN Resource Manager (Scheduler, AMService); the MPI AM (MPI Scheduler, HNP); Node Managers, each hosting the Hamster framework daemon / Node Service (NS) as an auxiliary service, with MPI processes running in containers. Protocols: Client-RM, RM-AM, AM-NM, RM-NodeManager.]

Hamster AppMaster

• Master daemon for MPI (similar to the JobTracker in MapReduce)

• Implements and participates in the YARN RM application lifecycle protocol

• Maintains a heartbeat with the RM to ensure liveness

• MPI Scheduler: negotiates resource allocation with the YARN RM

• Head Node Process (HNP): manages job execution
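The liveness side of the AM/RM relationship can be modeled as an expiry check: the RM treats an AM as dead if its last heartbeat is older than a configured interval (in YARN, `yarn.am.liveness-monitor.expiry-interval-ms`). The class below is a hypothetical, simplified model with time passed in explicitly; it is not YARN's implementation.

```python
from typing import Dict, List


class LivenessMonitor:
    """Simplified, hypothetical model of RM-side AM liveness tracking.

    An application counts as expired when its last heartbeat is more than
    `expiry_s` seconds old. Timestamps are passed in explicitly so the logic
    is deterministic.
    """

    def __init__(self, expiry_s: float) -> None:
        self.expiry_s = expiry_s
        self.last_seen: Dict[str, float] = {}

    def register(self, app_id: str, now: float) -> None:
        self.last_seen[app_id] = now  # registration counts as the first heartbeat

    def heartbeat(self, app_id: str, now: float) -> None:
        if app_id not in self.last_seen:
            raise KeyError(f"{app_id} has not registered")
        self.last_seen[app_id] = now

    def unregister(self, app_id: str) -> None:
        self.last_seen.pop(app_id, None)  # clean finish: stop tracking

    def expired(self, now: float) -> List[str]:
        """Applications whose last heartbeat is older than the expiry interval."""
        return sorted(a for a, t in self.last_seen.items() if now - t > self.expiry_s)
```

In practice the same heartbeat also carries resource requests and returns newly granted containers, which is how the MPI Scheduler's negotiation and the liveness guarantee share one channel.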

Hamster Node Service

• User-level daemon per MPI job

• Manages task execution

• Coarse-grained container management

• Bootstrapped by YARN-NM

• Implemented as YARN Auxiliary Service
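Coarse-grained container management means one Hamster daemon per node looks after all of a job's local processes, rather than each process being tracked individually. A sketch of the bookkeeping this implies, assuming ranks are assigned to granted containers in arrival order (a by-slot mapping); `group_ranks_by_node` and its input format are hypothetical.

```python
from typing import Dict, List, Tuple


def group_ranks_by_node(allocations: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Hypothetical by-slot mapping of MPI ranks onto granted containers.

    `allocations` lists (node_name, containers_granted_on_that_node) in the order
    the grants arrived; a node may appear more than once. Ranks are numbered
    consecutively, and the result groups them per node, i.e. the set of local
    processes a single per-node daemon would launch and monitor.
    """
    plan: Dict[str, List[int]] = {}
    next_rank = 0
    for node, count in allocations:
        local = plan.setdefault(node, [])
        for _ in range(count):
            local.append(next_rank)
            next_rank += 1
    return plan
```

For example, `group_ranks_by_node([("nodeA", 3), ("nodeB", 2), ("nodeA", 1)])` returns `{"nodeA": [0, 1, 2, 5], "nodeB": [3, 4]}`: two daemons, one per node, instead of six separately managed processes.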

Why GraphLab on Hadoop?

• Graph analytics & machine learning are only one stage in an end-to-end (E2E) data pipeline

• ETL/preprocessing

• Building graphs from fact & dimension tables

• Publishing analytics results, post-processing
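Building a graph from fact and dimension tables amounts to a join followed by a projection onto an edge list. A minimal sketch, assuming a ratings-style fact table of `(user_id, item_id, rating)` rows and dimension tables keyed by integer id; the function name and schema are hypothetical.

```python
from typing import Dict, List, Tuple

Fact = Tuple[int, int, float]  # (user_id, item_id, rating), hypothetical schema


def build_bipartite_edges(
    facts: List[Fact], users: Dict[int, str], items: Dict[int, str]
) -> List[Tuple[int, int, float]]:
    """Join a fact table with two dimension tables into a weighted edge list.

    Rows whose user or item key is missing from its dimension table are dropped,
    as in an inner join. Item ids are shifted past the largest user id so both
    vertex types share one id space.
    """
    offset = max(users) + 1 if users else 0
    edges = []
    for user_id, item_id, rating in facts:
        if user_id in users and item_id in items:
            edges.append((user_id, offset + item_id, float(rating)))
    return edges
```

On a Hadoop cluster this step would typically run as a MapReduce, Hive, or SQL job writing the edge list to HDFS, where GraphLab can read it directly.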

GraphLab 2.2

• Communication patterns based on Data

• Several Toolkits (Graph Analytics + ML Algorithms) available

• Graph-Programming API

• Uses MPI for communication
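GraphLab's graph-programming API expresses an algorithm as a per-vertex program; in GraphLab 2.x this follows the gather/apply/scatter pattern. The sketch below is plain Python, not GraphLab's C++ API: a synchronous PageRank in which each vertex gathers contributions from its in-neighbours and then applies the update. The normalized (1 - d)/n formulation and the uniform redistribution of dangling-vertex rank are assumptions chosen for illustration.

```python
from typing import List, Tuple


def pagerank(edges: List[Tuple[int, int]], n: int,
             iterations: int = 50, damping: float = 0.85) -> List[float]:
    """Synchronous PageRank in gather/apply style over vertices 0..n-1.

    gather: each vertex sums rank[u] / out_degree[u] over its in-neighbours u.
    apply:  rank[v] = (1 - d) / n + d * (gathered + dangling_mass / n).
    Vertices with no out-edges ("dangling") spread their rank uniformly, so
    the ranks keep summing to 1.
    """
    out_deg = [0] * n
    in_nbrs: List[List[int]] = [[] for _ in range(n)]
    for src, dst in edges:
        out_deg[src] += 1
        in_nbrs[dst].append(src)

    rank = [1.0 / n] * n
    for _ in range(iterations):
        dangling = sum(rank[v] for v in range(n) if out_deg[v] == 0)
        new_rank = []
        for v in range(n):
            gathered = sum(rank[u] / out_deg[u] for u in in_nbrs[v])  # gather
            new_rank.append((1 - damping) / n + damping * (gathered + dangling / n))  # apply
        rank = new_rank  # synchronous: every vertex read the previous iteration
    return rank
```

In a distributed run the graph is partitioned across MPI processes, so the gather step for vertices whose neighbours live elsewhere is where MPI communication happens.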

Pivotal HD

[Diagram: the Pivotal HD stack, with components marked as Apache or Pivotal]

• HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume

• Resource Management & Workflow: YARN, ZooKeeper, Oozie

• Command Center: configure, deploy, monitor, manage

• Spring XD, Spring

• Pivotal HD Enterprise, Xtension Framework

• HAWQ (Advanced Database Services): catalog services, query optimizer, dynamic pipelining, ANSI SQL + analytics

• GemFire XD (Real-Time Database Services): distributed in-memory store, query, transactions, ingestion, processing, Hadoop driver (parallel with compaction), ANSI SQL + in-memory

• MADlib algorithms

• Virtual Extensions: GraphLab, Open MPI

Performance

Test Environment

• Pivotal Analytics Workbench Cluster

• Pivotal HD 1.1 (Apache Hadoop 2.0.5)

• Hamster 1.0, OpenMPI 1.7.2

• 515 nodes

• 2x6-core Westmere, 48GB RAM, 12x2TB SATA, Mellanox FDR InfiniBand

Null Job

• Measures the overhead of launching MPI jobs

• Tests scalability of resource allocation, launching, and wire-up

• Sub-linear scalability (slightly worse than O(log N))

• Overhead of launching 15,000 processes is about 1 minute
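One way to see why launch and wire-up can scale close to O(log N) is a tree-structured fan-out: if every process already reached forwards to one new process per round, coverage doubles each round. A sketch of that arithmetic, assuming an idealized binomial-tree fan-out with equal-cost rounds; this is a model, not a description of how ORTE actually routes its launch messages.

```python
import math


def fanout_rounds(n: int) -> int:
    """Rounds for an idealized binomial-tree fan-out to reach n processes.

    Round 0: one process (the launcher) has the message. In each round every
    process that already has it forwards it to one new process, so coverage
    doubles: after r rounds, 2**r processes are reached.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rounds, reached = 0, 1
    while reached < n:
        reached *= 2
        rounds += 1
    return rounds


# The loop agrees with the closed form ceil(log2(n)).
for n in (1000, 15000):
    print(n, fanout_rounds(n), math.ceil(math.log2(n)))
```

Under this model 15,000 processes need 14 rounds, only 4 more than 1,000 processes need (10). Measured scaling that is "slightly worse than O(log N)" is consistent with per-round costs that themselves grow with N, for example larger wire-up messages.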

[Chart: E2E time. Total runtime (sec.) vs. process number]

[Chart: Allocation time (sec.) vs. number of processes]

[Chart: Launch time (sec.) vs. number of processes]

Comparison with OpenMPI

• HPL (High-Performance Linpack, the benchmark behind the TOP500 list)

• Number of processes: 50 to 1000

• Hamster is 1% slower than OpenMPI
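The 1% figure is a relative overhead: the extra time Hamster takes, as a fraction of the native OpenMPI time. A small helper for computing it per run and averaging across process counts; the function names are hypothetical, and any timings you feed it should be your own measurements, not values read off the chart.

```python
from typing import Dict, Tuple


def relative_overhead(t_candidate: float, t_baseline: float) -> float:
    """Percent by which t_candidate is slower than t_baseline (negative = faster)."""
    if t_baseline <= 0:
        raise ValueError("baseline time must be positive")
    return 100.0 * (t_candidate - t_baseline) / t_baseline


def mean_overhead(runs: Dict[int, Tuple[float, float]]) -> float:
    """Average per-run overhead; runs maps process count -> (t_hamster, t_openmpi)."""
    if not runs:
        raise ValueError("need at least one run")
    overheads = [relative_overhead(h, o) for h, o in runs.values()]
    return sum(overheads) / len(overheads)
```

With placeholder timings, `mean_overhead({50: (101.0, 100.0), 1000: (30.6, 30.0)})` averages a 1% and a 2% overhead to roughly 1.5. Averaging percentages rather than raw seconds keeps long, small-NP runs from dominating the summary.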

[Chart: HPL, Hamster vs. OpenMPI. Time (sec.) for 1000, 500, 200, and 50 processes]

GraphLab ALS

• Wikipedia dataset

• 4.3M terms, 3.3M documents, 513M occurrences

• 17 processes

• 5 iterations
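ALS (alternating least squares) factors a sparse matrix, here term-by-document occurrences, into low-rank factors: fix one factor, solve a regularized least-squares problem for the other in closed form, and alternate. GraphLab's toolkit uses a configurable rank; the pure-Python sketch below uses rank 1 so each update is a scalar formula. `als_rank1` and its parameters are hypothetical.

```python
from typing import List, Tuple


def als_rank1(ratings: List[Tuple[int, int, float]], n_rows: int, n_cols: int,
              iterations: int = 5, lam: float = 0.1) -> Tuple[List[float], List[float]]:
    """Rank-1 alternating least squares on the observed entries of a sparse matrix.

    Minimizes  sum_{(i,j) observed} (r_ij - u_i v_j)^2 + lam * (sum u_i^2 + sum v_j^2)
    by alternating closed-form updates:
        u_i = sum_j r_ij v_j / (sum_j v_j^2 + lam)   (v fixed)
        v_j = sum_i r_ij u_i / (sum_i u_i^2 + lam)   (u fixed)
    """
    if lam <= 0:
        raise ValueError("lam must be positive (also avoids 0/0 for empty rows)")
    by_row: List[List[Tuple[int, float]]] = [[] for _ in range(n_rows)]
    by_col: List[List[Tuple[int, float]]] = [[] for _ in range(n_cols)]
    for i, j, r in ratings:
        by_row[i].append((j, r))
        by_col[j].append((i, r))

    u = [1.0] * n_rows
    v = [1.0] * n_cols
    for _ in range(iterations):
        for i in range(n_rows):  # solve for row factors with v held fixed
            num = sum(r * v[j] for j, r in by_row[i])
            den = sum(v[j] ** 2 for j, _ in by_row[i]) + lam
            u[i] = num / den
        for j in range(n_cols):  # solve for column factors with u held fixed
            num = sum(r * u[i] for i, r in by_col[j])
            den = sum(u[i] ** 2 for i, _ in by_col[j]) + lam
            v[j] = num / den
    return u, v
```

Each half-step touches every observed entry once, so one ALS iteration over 513M occurrences is a full pass over the data; that repeated pass is exactly the iterative pattern the talk says dataflow frameworks handle poorly.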

[Chart: GraphLab ALS, time (sec.), Hamster vs. OpenMPI]

GraphLab PageRank

• Twitter dataset

• 4.1M nodes, 1.4B edges

• Data size: 26GB

• NP (number of processes) = 17

• 50 iterations: 297 seconds

• 100 iterations: 339 seconds
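The two PageRank timings also allow a rough split between fixed cost and per-iteration cost. This assumes runtime is linear in the iteration count k (two data points cannot confirm that), with T_0 standing for everything that does not repeat per iteration, presumably including loading the 26GB graph and launching processes:

```latex
\begin{aligned}
T(k) &= T_0 + k\,t_{\mathrm{iter}} \\
t_{\mathrm{iter}} &= \frac{T(100) - T(50)}{100 - 50} = \frac{339 - 297}{50}\ \mathrm{s} = 0.84\ \mathrm{s} \\
T_0 &= T(50) - 50\,t_{\mathrm{iter}} = 297 - 42 = 255\ \mathrm{s}
\end{aligned}
```

Under this model, roughly 255 of the 297 seconds in the 50-iteration run are fixed cost, and each additional iteration adds under a second.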

Questions?