Economic Scheduling of Hadoop Jobs
Economic Scheduling of
Hadoop Jobs
The Dynamic Priority MapReduce Scheduler
Thomas Sandholm, HP Labs, Palo Alto
Kevin Lai, HP Labs, Palo Alto
1
The Problem
» Allocate slots on compute nodes for job tasks
» Classic Approach: Throughput optimization
» Cross User Priorities inferred based on heuristics
» Social Scheduling
» Our Approach: User value optimization
» Users are given an incentive to scale up or down
» Automate demand conflict resolution
2
Other Hadoop Schedulers
» FIFO
» HOD
» Fairshare
» Capacity
» Designed for no queues, or a few static queues with fixed QoS
» Works well in corporate clusters
3
Dynamic Priority Scheduler Requirements
» Users may come and go frequently
» Users may be unknown to providers
» Users may want to schedule jobs across data centers and Hadoop installations
Result: manual, social scheduling of users (assumed to be cooperating) breaks down
4
Architecture
Example from the diagram: 4/(4+1.5+2)*15 = 8
5
Our Solution: Automated Resource Allocation
[Figure labels: Budget Remaining, Share, Spending Rate, Running Tasks, Pending Tasks]
6
Proportional-Share Scheduling
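» $q_i = b_i / (b_i + p)$, the share of user $i$ with spending rate $b_i$
» $p = \sum_{j \neq i} b_j$, the combined spending rate of all other users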
» Huberman et al Spawn ‘92
» Waldspurger et al Lottery Scheduling ‘95
» Lai et al Tycoon ‘05
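The share formula above can be checked numerically. What follows is a minimal Python sketch, not the scheduler's own code: the function name `proportional_shares` and the user names are hypothetical, and rounding to whole slots is left out. It uses the numbers from the Architecture slide (spending rates 4, 1.5 and 2), assuming the factor 15 in that expression is the number of available slots.

```python
def proportional_shares(spending_rates):
    """Return q_i = b_i / (b_i + p) per user, where p is the sum of the other users' rates."""
    total = sum(spending_rates.values())
    if total == 0:
        # Nobody is bidding, so nobody holds a share.
        return {user: 0.0 for user in spending_rates}
    shares = {}
    for user, b_i in spending_rates.items():
        p = total - b_i  # combined spending rate of everyone except this user
        shares[user] = b_i / (b_i + p)
    return shares


# Hypothetical users with the rates from the Architecture slide.
rates = {"alice": 4.0, "bob": 1.5, "carol": 2.0}
shares = proportional_shares(rates)

# Assumption: 15 slots in total, shares applied without rounding.
slots = {user: q * 15 for user, q in shares.items()}
print(slots)
```

Because $b_i + p$ is the same total for every user, the shares always sum to 1, so the full slot pool is divided among the bidders.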
7
Key Design Principles
» Pay-per-use: the spending rate is deducted from the budget only if a job performed work
» Work-conserving: users are never charged more than their spending rates, but can get more slots if other users are idle
» Preemptive: higher-spending users may cause tasks from lower-spending users to be killed
» Scalable: no memory or history-based fair-share smoothing
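The pay-per-use principle can be illustrated with a small accounting loop. This is a sketch under stated assumptions, not the plugin's implementation: the `Account` class and the function names are hypothetical, one charge per allocation interval is assumed, and a user whose remaining budget cannot cover the spending rate is assumed to stop bidding.

```python
from dataclasses import dataclass


@dataclass
class Account:
    budget: float            # remaining budget
    spending_rate: float     # amount the user offers per allocation interval
    running_tasks: int = 0   # tasks that ran during the interval


def effective_rate(acct):
    # Assumption: a user who can no longer afford the rate stops bidding.
    return acct.spending_rate if acct.budget >= acct.spending_rate else 0.0


def shares(accounts):
    # Proportional share over the rates of users who can still pay.
    rates = {user: effective_rate(a) for user, a in accounts.items()}
    total = sum(rates.values())
    return {user: (r / total if total > 0 else 0.0) for user, r in rates.items()}


def charge(accounts):
    # Pay-per-use: deduct only from users that ran tasks, and never
    # more than their spending rate.
    for acct in accounts.values():
        if acct.running_tasks > 0:
            acct.budget -= effective_rate(acct)


accounts = {
    "alice": Account(budget=10.0, spending_rate=4.0, running_tasks=3),
    "bob": Account(budget=10.0, spending_rate=2.0, running_tasks=0),    # idle
    "carol": Account(budget=1.0, spending_rate=2.0, running_tasks=1),   # out of funds
}
print(shares(accounts))
charge(accounts)
print({user: a.budget for user, a in accounts.items()})
```

Since nothing is remembered between intervals except the remaining budget, the per-interval work stays proportional to the number of users, which is in the spirit of the "Scalable" bullet.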
8
Implementation
» Standalone Hadoop MapReduce JobTracker Scheduler Plugin
» HTTP/XML/REST Servlet to provide secure management and monitoring of queues
» Generic queue allocation/accounting classes (could move into mapred core)
» Pluggable scheduler enforcing shares when scheduling jobs (could be replaced by capacity/fairshare enforcers)
9
Configuration
Option                                     Example
mapred.jobtracker.taskScheduler            org.apache.hadoop.mapred.DynamicPriorityScheduler
mapred.priority-scheduler.kill-interval    0
mapred.dynamic-scheduler.alloc-interval    20
mapred.dynamic-scheduler.budget-file       /etc/hadoop.budget
mapred.priority-scheduler.acl-file         /etc/hadoop.acl
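The options above could be written out in Hadoop's usual `<configuration>`/`<property>` XML layout. The slide does not say which file they belong in, so placing them in the JobTracker's site configuration is an assumption, and the helper name `to_hadoop_xml` is hypothetical. A minimal Python sketch that renders the example values:

```python
import xml.etree.ElementTree as ET

# Option names and example values as listed on the slide.
options = {
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.DynamicPriorityScheduler",
    "mapred.priority-scheduler.kill-interval": "0",
    "mapred.dynamic-scheduler.alloc-interval": "20",
    "mapred.dynamic-scheduler.budget-file": "/etc/hadoop.budget",
    "mapred.priority-scheduler.acl-file": "/etc/hadoop.acl",
}


def to_hadoop_xml(props):
    """Render a dict as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    if hasattr(ET, "indent"):  # pretty-printing needs Python 3.9+
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


xml_text = to_hadoop_xml(options)
print(xml_text)
```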
10
Experiment
Fairshare vs Capacity vs FIFO vs DP
2-80 simulated users/queues
2 Clusters
PiEstimator Simulation
11
Budget Dynamics
[Figure. Panels: DynPrio no preempt, DynPrio preempt, Capacity scheduler, FIFO scheduler. Annotations: Funding runs out, Budget replenished]
12
Service Differentiation
[Figure. Panels: DynPrio, FIFO]
13
Dynamic Adjustment
14
More info
» Papers
› SIGMETRICS 2009
› Workshop on Job Scheduling for Parallel Processing (JSSPP’10)
› International Conference on Cloud Computing and Virtualization (CCV’10)
» HADOOP-4768 JIRA
» Source:
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/dynamic-scheduler/
15