Operationalizing YARN Based Hadoop Clusters in the Cloud
Abhishek Modi, Lead Developer, YARN and Hadoop Team, Qubole
Hadoop at Qubole
● Over 300 petabytes of data processed per month.
● More than 100 customers and more than 1,000 active users.
● Over 1 million Hadoop jobs completed per month.
● More than 8,000 Hadoop clusters brought up per month.
Qubole Architecture
[Architecture diagram: users reach the Qubole SaaS layer through the Qubole UI or the Qubole REST API; the SaaS layer brings up and manages multiple Hadoop clusters (production and new), all backed by cloud storage.]
Ephemeral Hadoop Clusters
[Lifecycle diagram: bring up cluster → perform jobs (scaling up and down as needed) → terminate cluster.]
Challenges: Ephemeral Hadoop Clusters
• Use cloud storage for job input and output.
• Auto-scale the cluster according to the workload.
• Store job history and logs in a persistent location.
• Adapt YARN/HDFS to take ephemeral cloud nodes into account.
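As a minimal sketch of this lifecycle, the following Python pseudocode wires the three phases together; ClusterManager and all of its methods are hypothetical stand-ins for illustration, not Qubole's actual API.

```python
# Illustrative sketch of the ephemeral-cluster lifecycle. ClusterManager and
# its methods are hypothetical stand-ins, not Qubole's actual API.

class ClusterManager:
    def bring_up(self, config):
        print("bringing up cluster with", config)
        return "cluster-1"

    def submit_job(self, cluster, job):
        # job input and output live in cloud storage, not on the cluster
        print(f"running {job} on {cluster}")

    def terminate(self, cluster):
        # job history and logs are persisted to cloud storage before this
        print("terminating", cluster)

def run_workload(cm, config, jobs):
    cluster = cm.bring_up(config)        # cluster exists only for this workload
    try:
        for job in jobs:
            cm.submit_job(cluster, job)  # cluster auto-scales while jobs run
    finally:
        cm.terminate(cluster)            # always torn down, even on failure

run_workload(ClusterManager(), {"nodes": 4}, ["job-1", "job-2"])
```

Because input, output, logs, and history all live in cloud storage, terminating the cluster loses no state.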
YARN Auto-scaling
Up-scaling for MR Jobs
[Sequence diagram: the user submits a job to the Resource Manager, which launches the MR AppMaster under a NodeManager on Node 1. The AppMaster issues container requests; the Resource Manager allocates resources on the existing NodeManagers (containers C1 and C2 on Node 2). Based on task progress and outstanding requests, the Resource Manager sends an up-scale request to the Cluster Manager, which adds Node 3, where new containers C3 and C4 run.]
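The core of the up-scale decision can be sketched as a simple capacity rule: convert the container demand the current nodes cannot satisfy into a node count to request from the Cluster Manager. The names and numbers below are illustrative assumptions, not Qubole's actual logic.

```python
import math

def nodes_to_add(pending_containers, container_mb, free_mb_per_node, max_new_nodes):
    """Translate unsatisfied container demand into a node request."""
    deficit_mb = pending_containers * container_mb       # memory still needed
    needed = math.ceil(deficit_mb / free_mb_per_node)    # nodes to cover it
    return min(needed, max_new_nodes)                    # respect cluster limit

# e.g. 40 pending 2 GB containers and 16 GB usable per new node -> 5 nodes
print(nodes_to_add(40, 2048, 16384, max_new_nodes=10))
```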
Generic Up-scaling
[Diagram: the MR, Spark, and Tez AppMasters all request resources from the Resource Manager; the Resource Manager issues a single up-scale request to the Cluster Manager, which adds nodes (e.g., Node 2).]
Down-scaling
[Diagram: NodeManagers report status updates for their running containers (C1–C4) to the Resource Manager. The Resource Manager evaluates that the cluster is underutilized and can be down-scaled, then selects the node whose estimated task completion time is lowest.]
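A minimal sketch of the selection rule above, assuming the estimated completion times come from the task-progress reports in the status updates; the data is illustrative.

```python
# Pick the node whose running tasks are estimated to finish soonest, so the
# least work is lost or delayed by removing it. Data here is illustrative.

def pick_node_to_remove(nodes):
    """nodes: list of (node_id, estimated_task_completion_seconds)."""
    return min(nodes, key=lambda n: n[1])[0]

candidates = [("node-2", 120.0), ("node-3", 35.0), ("node-4", 410.0)]
print(pick_node_to_remove(candidates))  # -> node-3
```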
Graceful Shutdown
[Sequence diagram: the user submits Jobs 1–3 and the Resource Manager allocates containers on Nodes 1 and 3. Once Job 1 completes on the node selected for removal, the Resource Manager decommissions the node and asks the Cluster Manager to remove it.]
Re-commissioning
[Sequence diagram: NodeManagers hold containers C1–C4; a node in graceful shutdown still holds map output. When the user submits another job and an up-scale request is triggered, the Resource Manager re-commissions the node being shut down instead of adding a new one, and allocates new containers (e.g., C3) on it.]
Further Optimizations in Down-scaling
• Containers can hold the output of Map tasks.
• Such a node cannot be terminated until its Map output is consumed.
• Upload the Map output to cloud storage (sketched below).
• Reducers then read the Map output directly from the cloud.
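A sketch of the map-output offload in the bullets above: before a node holding live map output is released, its shuffle files are copied to cloud storage so reducers can fetch them from there. The upload helper and URL scheme are hypothetical, not Hadoop's shuffle API.

```python
# Hypothetical offload of map output to cloud storage; `upload` and the URL
# scheme are illustrative helpers, not Hadoop's shuffle implementation.

def upload(local_path, bucket, key):
    print(f"uploading {local_path} -> s3://{bucket}/{key}")

def offload_map_output(node, map_outputs, bucket):
    urls = {}
    for task_id, local_path in map_outputs.items():
        key = f"shuffle/{node}/{task_id}"
        upload(local_path, bucket, key)         # persist the shuffle file
        urls[task_id] = f"s3://{bucket}/{key}"  # reducers read this instead
    return urls                                 # node is now safe to terminate

print(offload_map_output("node-7", {"m_000001": "/tmp/m1.out"}, "demo-bucket"))
```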
HDFS-Based Up-scaling
• DFS usage and the incoming data rate are monitored periodically.
• Upscale if free DFS falls below an absolute threshold.
• Upscale if free DFS is projected to fall below that threshold within the next few minutes (sketched below).
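A minimal sketch of this trigger, with an illustrative threshold and projection window:

```python
# Illustrative HDFS-based up-scaling trigger; the threshold and projection
# window are made-up numbers, not Qubole's defaults.

FREE_DFS_THRESHOLD_GB = 100   # absolute floor for free DFS
PROJECTION_MINUTES = 10       # "next few minutes" look-ahead

def should_upscale(free_dfs_gb, ingest_gb_per_min):
    if free_dfs_gb < FREE_DFS_THRESHOLD_GB:
        return True  # already below the absolute threshold
    projected = free_dfs_gb - ingest_gb_per_min * PROJECTION_MINUTES
    return projected < FREE_DFS_THRESHOLD_GB  # will cross it soon

# 180 GB free, ingesting 10 GB/min -> projected 80 GB free -> upscale
print(should_upscale(180, 10))  # True
```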
Cost Benefits of Auto-scaling
Volatile Nodes
• AWS and Google Cloud provide volatile nodes, termed “Spot Nodes” or “Preemptible Nodes”.
• Available at a very low price compared to stable nodes.
• Can be lost at any time without prior notification.
• Hadoop's failure resilience makes these nodes good candidates for Hadoop clusters.
• Approximately 77% of all Qubole clusters make use of volatile nodes.
Volatile Nodes at Qubole
• When starting a cluster, the percentage of volatile nodes can be specified.
• A maximum “bid” price for volatile nodes is also specified.
• Qubole Placement Policy (sketched below):
– Ensures at least one replica of each HDFS block is present on a stable node.
– No Application Master is scheduled on volatile nodes.
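A sketch of what the placement policy's two rules amount to; this is illustrative logic, not the actual HDFS BlockPlacementPolicy implementation:

```python
# Illustrative reading of the two placement rules; not the actual HDFS
# BlockPlacementPolicy implementation.

def choose_replica_nodes(stable, volatile, replication=3):
    nodes = [stable[0]]            # rule 1: one replica on a stable node
    pool = volatile + stable[1:]   # remaining replicas may be volatile
    return nodes + pool[: replication - 1]

def schedule_app_master(stable, volatile):
    return stable[0]               # rule 2: AMs never land on volatile nodes

print(choose_replica_nodes(["s1", "s2"], ["v1", "v2"]))  # ['s1', 'v1', 'v2']
```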
Rebalancing – Volatile Nodes
• While up-scaling, the RM tries to maintain the volatile-node percentage.
• If volatile nodes are not available, it falls back to stable nodes.
• It periodically tries to re-balance the volatile-node percentage (sketched below).
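A sketch of the up-scale path under a target volatile percentage, with the fall-back to stable nodes; a failed spot acquisition is modeled by the hypothetical acquire_volatile() returning None:

```python
# Illustrative up-scale planning under a target volatile percentage; a failed
# spot acquisition is modeled by acquire_volatile() returning None.

def plan_upscale(count, target_volatile_pct, acquire_volatile, acquire_stable):
    want_volatile = round(count * target_volatile_pct / 100)
    nodes = []
    for i in range(count):
        node = acquire_volatile() if i < want_volatile else None
        nodes.append(node if node is not None else acquire_stable())
    return nodes

# e.g. target 50% volatile, but the spot market yields only one node
spot_pool = iter(["spot-1"])
print(plan_upscale(4, 50, lambda: next(spot_pool, None), lambda: "stable"))
# -> ['spot-1', 'stable', 'stable', 'stable']; a later re-balance pass would
# try to swap stable nodes back to volatile ones.
```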
Job History
• Show job history for terminated clusters.
• Multi-tenant Job History Server.
• Clusters generally run in isolated networks, so a proxy is needed.
• Job history files need to be stored in cloud storage.
Job History – Running Cluster
[Sequence diagram: the user clicks a UI link; the Qubole UI authenticates the request, and the cluster proxy finds the cluster corresponding to the request, forwards the request to that Hadoop cluster, and proxifies the links in the returned HTML and JavaScript.]
Job History – Terminated Cluster
[Sequence diagram: the user clicks a UI link; the Qubole UI authenticates the request, and the cluster proxy finds that the cluster is down. The proxy fetches the jhist file from cloud storage, the Job History Server renders it, and the proxy proxifies the links before returning the rendered job history to the user.]
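A sketch of the two proxy paths, assuming hypothetical request, cluster, and storage objects; the terminated-cluster branch fetches the persisted jhist file and has the multi-tenant Job History Server render it:

```python
# Illustrative proxy dispatch; request/cluster/storage are hypothetical.

def serve_job_history(request, cluster, storage, render):
    if cluster["running"]:
        return f"proxied to {cluster['id']}"  # running-cluster path
    key = f"history/{cluster['id']}/{request['job_id']}.jhist"
    return render(storage[key])  # terminated path: fetch jhist, render via JHS

storage = {"history/c-42/job_001.jhist": "<jhist bytes>"}
print(serve_job_history({"job_id": "job_001"},
                        {"id": "c-42", "running": False},
                        storage, lambda j: f"rendered {j}"))
```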
Cloud Read/Write Optimizations
• Write output directly to the cloud without staging it in a temporary location.
• Optimized fetching of file status for large numbers of files with a common prefix.
• Added streaming-upload support in NativeS3FileSystem.
• Added bulk delete and move support in NativeS3FileSystem (sketched below).
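As an illustration of the bulk-delete optimization, a sketch that batches deletes instead of issuing one request per key (S3's bulk-delete call accepts up to 1,000 keys per request); client.delete_batch is a hypothetical stand-in:

```python
# Illustrative batching for bulk deletes: one request per batch of keys
# instead of one request per key. delete_batch is a hypothetical stand-in;
# S3's real bulk-delete call accepts up to 1,000 keys per request.

def bulk_delete(client, keys, batch_size=1000):
    for i in range(0, len(keys), batch_size):
        client.delete_batch(keys[i : i + batch_size])  # one round-trip/batch

class FakeClient:
    def delete_batch(self, batch):
        print(f"deleting {len(batch)} keys")

bulk_delete(FakeClient(), [f"tmp/part-{i}" for i in range(2500)])
# -> deleting 1000 keys / deleting 1000 keys / deleting 500 keys
```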
Open Source Issues
• Issues with a newer version of JetS3t (0.9.4):
– Seek performance degraded by around 10x.
– Empty files.
• Deadlock when the number of threads reading from S3 exceeds JetS3t's maximum number of connections (HADOOP-12739).
• Too many queues cause a deadlock in the cluster (YARN-3633).
• Support for SOCKS proxy was missing from HA.
Thank You