Dynamic Provisioning of Data Intensive Computing Middleware Frameworks
Dynamic Provisioning of Data Intensive Computing Middleware Frameworks: A Case Study
Linh B. Ngo1 Michael E. Payne1 Flavio Villanustre2 Richard Taylor2 Amy W. Apon1
1School of Computing, Clemson University 2LexisNexis® Risk Solutions
Contents
1. Overview of Clemson University's Cyberinfrastructure Resource
2. Demand for Dynamic Data-Intensive Computing Middleware Frameworks
3. Dynamic Provisioning of Data-Intensive Computing Framework
4. Deploying Hadoop Ecosystem vs. Deploying HPCC Systems®
5. Lessons Learned
Cyberinfrastructure Resource at Clemson University
Condominium model
2,007 compute nodes (21,400 cores), including 276 GPU nodes
Sustained 551 TFLOPS (benchmarked on GPU nodes only)
1,289 active users, 12 academic departments across 36 fields of research
Facilities
Cyberinfrastructure Resource at Clemson University
• 1G/10G/Myrinet-10G/InfiniBand-40G/InfiniBand-56G interconnects
• Local storage between 100-200GB (majority) and 400-900GB (since 2013)
• Shared 233TB OrangeFS scratch space and more than 3PB archival space
Demand for Dynamic Data-Intensive Computing Middleware Frameworks
• Genome Sequencing (Hadoop MapReduce/GPGPU)
• Molecular Dynamics Forward Flux Sampling (Hadoop Streaming/LAMMPS)
• Streaming Data Infrastructure for Connected Vehicle System (Hadoop Distributed File System/Spark/Kafka)
• Big Scholarly Data (HPCC Systems)
• CS Course in Distributed and Cluster Computing (MPI/MapReduce, Hadoop/Spark/HPCC Systems® …)
Demand for Dynamic Data-Intensive Computing Middleware Frameworks
• Changes in the cyberinfrastructure support model for data infrastructure:
  – Beyond a traditional remote distributed file system model
  – From static and dedicated resources to dynamic resources
  – Data management processes co-located with computing processes
• Challenges for system administrators:
  – Accommodating different frameworks for different research projects
  – Complying with existing administrative policies and scheduling priorities
• What can users do?
  – Deploy dynamic data-intensive computing frameworks within the limits of user privileges and without the intervention of administrators
Dynamic Provisioning of Data-Intensive Computing Framework: Installation
• Where to install
  1. Home directory: persistent, limited storage
  2. Shared distributed storage: fast, semi-persistent, "unlimited" storage
  3. Local storage on compute node: fast, non-persistent, requires reinstallation
• How to handle dependencies
  1. Ideally in home or shared distributed storage (for persistence)
  2. Dynamic loading mechanisms via environment paths
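The dynamic-loading mechanism above can be sketched as a few environment exports; the `$HOME/software` layout is a hypothetical example of where a user might place dependencies built without root access:

```shell
# Hypothetical user-space install location for dependencies (illustrative).
DEPS="$HOME/software"
mkdir -p "$DEPS/bin" "$DEPS/lib" "$DEPS/include"

# Make user-installed binaries, shared libraries, and headers visible to
# frameworks at run time, with no administrator intervention required.
export PATH="$DEPS/bin:$PATH"
export LD_LIBRARY_PATH="$DEPS/lib:${LD_LIBRARY_PATH:-}"
export CPATH="$DEPS/include:${CPATH:-}"
```

Placing these exports in a job's startup script makes the same user-space dependencies available on every allocated node.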
Dynamic Provisioning of Data-Intensive Computing Framework: Deployment
[Figure: deployment workflow: configuration scripts on user.palmetto.clemson.edu read PBS_NODEFILE and provision target deployment directories on the local disks of the allocated nodes]
Deploying Hadoop Ecosystem vs. Deploying HPCC Systems®: Overview
• Hadoop Ecosystem: open-source alternatives based on the conceptual architecture of a data-intensive computing infrastructure developed by Google
• HPCC Systems®: a comprehensive data-intensive computing system targeting enterprise users, developed in the early 2000s, open source since 2011
Deploying Hadoop Ecosystem vs. Deploying HPCC Systems®: Installation: Hadoop
• Self-contained, pre-compiled jar files
• No installation needed; relies on shell scripts to launch component daemons
• Dependencies: JDK
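Because the release is self-contained, a user-space "installation" amounts to unpacking the tarball and pointing the launch scripts at a JDK. A minimal sketch, in which the version number and mirror URL are assumptions rather than details from the slides:

```shell
# Illustrative: unpack a pre-built Hadoop release into the home directory.
HADOOP_VERSION=2.7.3
cd "$HOME"
# wget "https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
# tar xzf "hadoop-${HADOOP_VERSION}.tar.gz"

# The only dependency is a JDK; point Hadoop's launch scripts at it.
export JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/java}"
export HADOOP_HOME="$HOME/hadoop-${HADOOP_VERSION}"
export PATH="$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"
```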
Deploying Hadoop Ecosystem vs. Deploying HPCC Systems®: Installation: HPCC Systems
• Standard configure/make/make install
  – Assumes an industrial production environment (with administrative privileges)
  – Modification to avoid hard-coded system installation paths
  – Modification of template XML configuration files to avoid default HPCC Systems-specific user creation and administrative checks
• Dependencies:
  – Not on Palmetto: ICU, Xalan, Xerces, APR …
  – On Palmetto but not the correct version: Binutils
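The path modification above boils down to redirecting the install prefix into the home directory so no root access is needed. A sketch of the configure/make/make install flow named on the slide, with illustrative path and flag values:

```shell
# User-space build sketch: redirect the prefix under $HOME to avoid the
# hard-coded system installation paths mentioned above (values illustrative).
PREFIX="$HOME/hpcc"
mkdir -p "$PREFIX"
# ./configure --prefix="$PREFIX"   # instead of a system-wide location
# make -j"$(nproc)"
# make install                     # everything lands under $HOME, no root
```

Dependencies missing from the cluster (ICU, Xalan, Xerces, APR …) can be built the same way and exposed through environment paths.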
Deploying Hadoop Ecosystem vs. Deploying HPCC Systems: Deployment: Hadoop
• Component placement determination
• Clean up target directories from previous deployments
• Create target directories (log, storage, pid …)
• Synchronize order of component start-up
Component placement across PBS_NODEFILE:
  1st node: NameNode, ResourceManager, SparkMaster
  2nd node: DataNode, NodeManager, SparkExecutor
  3rd node: DataNode, NodeManager, SparkExecutor
  …
  nth node: DataNode, NodeManager, SparkExecutor
• Additional components (HBase, Hive, Kafka …) can be added to this deployment model
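The deployment steps above can be sketched as a script driven by PBS_NODEFILE. The directory names and start-up helper scripts are hypothetical, and `RUN="echo"` makes it a dry run so the logic can be exercised without a cluster allocation:

```shell
# Dry-run sketch of the Hadoop deployment steps (names illustrative).
RUN="echo"
NODEFILE="${PBS_NODEFILE:-/tmp/nodefile}"          # fall back to a sample
[ -f "$NODEFILE" ] || printf 'node1\nnode2\nnode3\n' > "$NODEFILE"

# Step 1: component placement -- masters on the first node, workers after.
MASTER=$(sort -u "$NODEFILE" | head -n 1)
WORKERS=$(sort -u "$NODEFILE" | tail -n +2)
SCRATCH="/local_scratch/$USER/hadoop"              # hypothetical local disk

# Steps 2-3: clean up previous deployments, recreate target directories.
for node in $MASTER $WORKERS; do
  $RUN ssh "$node" "rm -rf $SCRATCH; mkdir -p $SCRATCH/logs $SCRATCH/pids"
done

# Step 4: synchronized start-up order -- master daemons before workers.
$RUN ssh "$MASTER" start-master-daemons.sh   # NameNode, ResourceManager, ...
for node in $WORKERS; do
  $RUN ssh "$node" start-worker-daemons.sh   # DataNode, NodeManager, ...
done
```

Removing `RUN="echo"` would execute the commands over passwordless ssh within the job allocation.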
Deploying Hadoop Ecosystem vs. Deploying HPCC Systems: Deployment: HPCC Systems
• Determine node allocation and internal IP addresses
• HPCC Systems is configured via its own deployment programs (configmgr, configgen, hpcc-init)
[Figure: HPCC Systems component placement across the nodes listed in PBS_NODEFILE]
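The first step above, determining the node allocation and each node's internal IP address, might be sketched as follows; the output file name is illustrative, and `getent` is assumed to resolve the cluster hostnames:

```shell
# Resolve the allocated hostnames from PBS_NODEFILE to internal IPs.
NODEFILE="${PBS_NODEFILE:-/tmp/sample_nodefile}"   # fall back to a sample
[ -f "$NODEFILE" ] || printf 'localhost\n' > "$NODEFILE"

sort -u "$NODEFILE" | while read -r host; do
  ip=$(getent hosts "$host" | awk '{print $1; exit}')
  echo "$host $ip"
done > "$HOME/hpcc_nodes.txt"   # e.g., fed to HPCC's deployment tools
```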
Deploying Hadoop Ecosystem vs. Deploying HPCC Systems: Deployment: HPCC Systems
• Node memory constraints
  – HPCC Systems reserves 75% of available memory for thor by default
  – Palmetto does not allow unlimited memory reservation
  – As a result, thor_master cannot launch new jobs via fork()
  – Resolved by lowering the memory reservation
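The fix above corresponds to lowering thor's memory reservation in the generated configuration. The fragment below is an assumption based on HPCC Systems conventions (attribute name, cluster name, and value are all illustrative, not taken from the slides):

```xml
<!-- Illustrative environment.xml fragment: a fixed, lower memory
     reservation keeps thor under the scheduler's per-job memory limit,
     instead of the default 75% of node memory. -->
<ThorCluster name="mythor" globalMemorySize="8192"/>
```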
Lessons Learned
• A common approach can be adapted for both the Hadoop Ecosystem and HPCC Systems
• Limitations on non-administrative accounts can impact deployment and performance via system resource constraints
  – Unable to utilize all available memory on allocated nodes (HPCC Systems)
• Dynamic deployment via non-administrative accounts gives users the initiative to experiment with and utilize new large-scale frameworks without additional burden on administrators
Lessons Learned
• Experience in deploying as users is, in turn, highly applicable to deployment with administrative privileges.
  – E.g., the CloudLab cloud computing experimental testbed, with non-persistent, ephemeral, and short-term (15-hour) allocations: script-based installation and deployment are needed, even with administrative rights, to automate the deployment of the experiment
• Experience in deploying as administrators is helpful in debugging user-based deployments:
  – Identification and resolution of the memory allocation issue in HPCC Systems were done by changing system limits using administrative commands.
QUESTIONS?
Linh B. Ngo1 Michael E. Payne1 Flavio Villanustre2 Richard Taylor2 Amy W. Apon1
{lngo,mpayne3,aapon}@clemson.edu 1School of Computing, Clemson University
{flavio.villanustre,richard.taylor}@lexisnexis.com 2LexisNexis Risk Solutions
More information about HPCC Systems can be found at http://hpccsystems.com