The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
The Zoo Expands: Labrador *Loves* Elephant, Thanks to Hamster
Milind Bhandarkar Chief Scientist, Pivotal Software, Inc.
About Me
• http://www.linkedin.com/in/milindb
• Founding member of Hadoop team at Yahoo! [2005-2010]
• Contributor to Apache Hadoop since v0.1
• Built and led Grid Solutions Team at Yahoo! [2007-2010]
• Parallel Programming Paradigms [1989-today] (PhD cs.illinois.edu)
• Center for Development of Advanced Computing (C-DAC), National Center for Supercomputing Applications (NCSA), Center for Simulation of Advanced Rockets, Siebel Systems (acquired by Oracle), Pathscale Inc. (acquired by QLogic), Yahoo!, LinkedIn, and Pivotal (formerly Greenplum)
Hamster
• Hadoop and MPI on the same cluster
• Runtime for OpenMPI applications on YARN
• Available on Pivotal HD
Why MPI?
• Hadoop dataflow paradigms (MapReduce, Tez, etc.) are not well suited to iterative applications
• Message Passing Interface (MPI)
• Mature standard
• Used extensively in HPC
• Huge ecosystem
MPI in Science & Engg
• Earth, Atmosphere, Chemistry, Biology, Math, Nuclear
MPI in Industry
• Mechanical, Finance/Bank, Oil Exploration, Cryptography, Spacecraft
OpenMPI
• Mature open source implementation of the MPI 3.0 standard (mpi-forum.org)
• New BSD license
• 30+ contributing organizations from academia, research and industry
• http://open-mpi.org
OpenMPI Architecture
Pluggable
Hamster Design
• YARN as Resource Manager
• Hamster Application Manager
• Manages MPI jobs
• (Tries to) implement gang scheduling
• Leverages OMPI/ORTE strengths
• Wire-up, task monitoring, fast interconnect
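The gang-scheduling bullet above can be illustrated with a small sketch. It is not Hamster's code: the `GangAllocator` class and its method names are hypothetical, and it models only the all-or-nothing idea, in which an MPI job launches once every requested container has been granted and not before.

```python
from dataclasses import dataclass, field


@dataclass
class GangAllocator:
    """Toy model of all-or-nothing (gang) allocation; hypothetical, not Hamster's API."""
    needed: int                                  # containers the MPI job requires
    held: list = field(default_factory=list)     # containers granted so far

    def on_containers_granted(self, containers):
        """Accumulate containers as the resource manager grants them in batches."""
        self.held.extend(containers)
        # Keep only what the job needs; any surplus is returned to the caller.
        surplus = self.held[self.needed:]
        self.held = self.held[:self.needed]
        return surplus

    def ready(self):
        """The job may launch only when the whole gang is present."""
        return len(self.held) == self.needed


alloc = GangAllocator(needed=4)
alloc.on_containers_granted(["c1", "c2"])
print(alloc.ready())            # False: partial allocation, do not launch
surplus = alloc.on_containers_granted(["c3", "c4", "c5"])
print(alloc.ready(), surplus)   # True ['c5']
```

A resource manager that hands out containers in batches gives no guarantee that a full gang arrives at once, so a scheme like this has to wait and hold partial allocations, which may be what the "(tries to)" qualifier refers to.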
Hamster Architecture
[Diagram. Components shown: Client; Resource Manager (Scheduler, AMService); Node Managers with Aux Srvcs; MPI AM (MPI Scheduler, HNP); Proc/Containers; Framework Daemon; NS. Labelled protocols: Client-RM, RM-AM, AM-NM, RM-NodeManager.]
Hamster AppMaster
• Master daemon for MPI (similar to JobTracker in MapReduce)
• Implements and participates in the YARN-RM App lifecycle protocol
• Maintains heartbeat with RM to ensure liveness
• MPI Scheduler: negotiates resource allocation with YARN-RM
• Head Node Process (HNP): manages job execution
Hamster Node Service
• User-level daemon per MPI job
• Manages task execution
• Coarse-grained container management
• Bootstrapped by YARN-NM
• Implemented as YARN Auxiliary Service
Why GraphLab on Hadoop?
• Graph analytics and machine learning are only one stage in an end-to-end (E2E) data pipeline
• ETL/preprocessing
• Building graphs from fact and dimension tables
• Publishing analytics results, post-processing
GraphLab 2.2
• Communication patterns based on data
• Several toolkits (graph analytics + ML algorithms) available
• Graph-programming API
• Uses MPI for communication
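To make "graph-programming API" more concrete, here is a minimal sketch in plain Python of a vertex-centric update loop of the general kind such APIs expose. It is not GraphLab's API; the `pagerank` function and the three-vertex graph are illustrative assumptions. Each vertex recomputes its value from contributions arriving along its in-edges, so the data exchanged in each round is determined by the graph itself.

```python
def pagerank(out_edges, iterations=20, damping=0.85):
    """Synchronous PageRank on a dict {vertex: [out-neighbours]}."""
    vertices = list(out_edges)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # "Gather": each vertex sums contributions arriving along its in-edges.
        incoming = {v: 0.0 for v in vertices}
        for src, targets in out_edges.items():
            if not targets:
                continue  # dangling vertex: contributes nothing in this sketch
            share = rank[src] / len(targets)
            for dst in targets:
                incoming[dst] += share
        # "Apply": update every vertex from its gathered sum.
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in vertices}
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for vertex, value in sorted(ranks.items()):
    print(vertex, round(value, 3))
```

In a distributed setting the gather step is where messages cross machine boundaries, which is the part a library would delegate to MPI.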
Pivotal HD
[Diagram of the Pivotal HD stack, with a legend distinguishing Apache and Pivotal components.]
• Storage, processing and tools: HDFS, HBase, Pig, Hive, Mahout, MapReduce, Sqoop, Flume
• Resource Management & Workflow: YARN, ZooKeeper, Oozie
• Pivotal HD Enterprise: Command Center (configure, deploy, monitor, manage), Spring XD, Virtual Extensions, GraphLab, Open MPI
• HAWQ (Advanced Database Services): ANSI SQL + Analytics, Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining
• GemFire XD (Real-Time Database Services): ANSI SQL + In-Memory, Distributed In-memory Store, Query, Transactions, Ingestion, Processing, Hadoop Driver (Parallel with Compaction)
• MADlib Algorithms
Performance
Test Environment
• Pivotal Analytics Workbench Cluster
• Pivotal HD 1.1 (Apache Hadoop 2.0.5)
• Hamster 1.0, OpenMPI 1.7.2
• 515 nodes
• 2 x 6-core Westmere, 48 GB RAM, 12 x 2 TB SATA, Mellanox FDR InfiniBand
Null Job
• Measures overhead of launching MPI jobs
• Tests scalability of resource allocation, launching and wire-up
• Sub-linear scalability (slightly worse than O(log N))
• Overhead of launching 15,000 processes: 1 minute
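One way to see why launch and wire-up overhead can grow roughly logarithmically is to model a tree-based launch, in which every already-running process starts a fixed number of others in each round. This is an illustrative model only: the `launch_rounds` helper and the fan-out of 2 are assumptions, the slides do not describe the launch mechanism actually used, and the measured overhead also covers resource allocation.

```python
def launch_rounds(num_processes, fanout=2):
    """Rounds needed when each running process starts `fanout` more per round."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    running, rounds = 1, 0
    while running < num_processes:
        running += running * fanout   # every running process spawns `fanout` children
        rounds += 1
    return rounds


for n in (1000, 4000, 15000):
    print(n, launch_rounds(n))
```

Under this model, with the assumed fan-out of 2, going from 1,000 to 15,000 processes adds only two rounds (7 to 9), which is the qualitative behaviour a logarithmic bound describes.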
Total Runtime
[Chart: E2E time in seconds (axis 5 to 60) versus number of processes (axis 0 to 16000).]
Allocation Time
[Chart: allocation time in seconds (axis 1 to 6) versus number of processes (axis 0 to 16000).]
Launch Time
[Chart: launch time in seconds (axis 0 to 30) versus number of processes (axis 0 to 16000).]
Comparison with OpenMPI
• HPL (High-Performance Linpack, the Top500 benchmark)
• Number of processes: 50–1000
• Hamster 1% slower than OpenMPI
HPL - Hamster vs OpenMPI
[Chart: time in seconds (axis 0 to 120) for 1000, 500, 200 and 50 processes.]
GraphLab ALS
• Wikipedia dataset
• 4.3M terms, 3.3M documents, 513M occurrences
• 17 processes
• 5 iterations
GraphLab ALS
[Chart: time in seconds (axis 0 to 1340), Hamster vs OpenMPI.]
GraphLab PageRank
• Twitter dataset
• 4.1 M nodes, 1.4 B edges
• Data size: 26 GB
• NP = 17
• 50 iterations: 297 seconds
• 100 iterations: 339 seconds
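The two timings above allow a rough split between fixed cost and per-iteration cost, assuming run time is linear in the iteration count. That linearity is an assumption, not something the slides state, and the helper name `split_fixed_and_marginal` is illustrative.

```python
def split_fixed_and_marginal(iters_a, secs_a, iters_b, secs_b):
    """Fit secs = fixed + per_iter * iters through two measurements."""
    per_iter = (secs_b - secs_a) / (iters_b - iters_a)
    fixed = secs_a - per_iter * iters_a
    return fixed, per_iter


# Measurements from the slide: 50 iterations in 297 s, 100 iterations in 339 s.
fixed, per_iter = split_fixed_and_marginal(50, 297, 100, 339)
print(f"fixed = {fixed:.0f} s, per iteration = {per_iter:.2f} s")
# prints: fixed = 255 s, per iteration = 0.84 s
```

Under that assumption, most of the run time is a fixed cost of about 255 seconds, and each additional iteration adds under a second.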
Questions?