Economic Scheduling of Hadoop Jobs

Economic Scheduling of Hadoop Jobs - The Dynamic Priority MapReduce Scheduler. Thomas Sandholm, HP Labs, Palo Alto; Kevin Lai, HP Labs, Palo Alto

description

A presentation on the Dynamic Priority MapReduce Scheduler by Thomas Sandholm and Kevin Lai of HP Labs, Palo Alto. This scheduler is a contribution to Hadoop 0.21+.

Transcript of Economic Scheduling of Hadoop Jobs

Page 1: Economic Scheduling of Hadoop Jobs

Economic Scheduling of Hadoop Jobs

- The Dynamic Priority MapReduce Scheduler

Thomas Sandholm, HP Labs, Palo Alto

Kevin Lai, HP Labs, Palo Alto

Page 2: Economic Scheduling of Hadoop Jobs

The Problem

» Allocate slots on compute nodes for job tasks

» Classic Approach: Throughput optimization

» Cross User Priorities inferred based on heuristics

» Social Scheduling

» Our Approach: User value optimization

» Users are given an incentive to scale up or down

» Automate demand conflict resolution


Page 3: Economic Scheduling of Hadoop Jobs

Other Hadoop Schedulers

» FIFO

» HOD

» Fairshare

» Capacity

» Designed for no queues or few static fixed QoS queues

» Works well in corporate clusters


Page 4: Economic Scheduling of Hadoop Jobs

Dynamic Priority Scheduler Requirements

» Users may come and go frequently

» Users may be unknown to providers

» Users may want to schedule jobs across data centers and Hadoop installations

Manual, social scheduling of users (assumed to be cooperating) breaks down

Page 5: Economic Scheduling of Hadoop Jobs

Architecture

[Architecture diagram. Calculation shown in the figure: 4/(4+1.5+2)*15 = 8]

Page 6: Economic Scheduling of Hadoop Jobs

Our Solution: Automated Resource Allocation

[Figure. Labels: Budget Remaining; Share; Spending Rate; Running Tasks; Pending Tasks]

Page 7: Economic Scheduling of Hadoop Jobs

Proportional-Share Scheduling

» q_i = b_i / (b_i + p)

» p = Σ_{j≠i} b_j (written b_{-i}: the sum over all users other than i)

» Huberman et al., Spawn ’92

» Waldspurger et al., Lottery Scheduling ’95

» Lai et al., Tycoon ’05
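The proportional-share rule above can be sketched in a few lines of Python. This is a minimal illustration, not the scheduler's actual code: it assumes that b_i is a user's bid (spending rate) and q_i the resulting share, and it reuses the numbers from the Architecture slide under the assumption that 4, 1.5 and 2 are spending rates and 15 is the number of slots. The function name and user names are hypothetical.

```python
def proportional_shares(bids):
    """Return q_i = b_i / (b_i + p) for each user, where p is the sum of the other bids."""
    total = sum(bids.values())
    shares = {}
    for user, bid in bids.items():
        p = total - bid  # sum of all other users' bids
        shares[user] = bid / (bid + p) if bid + p > 0 else 0.0
    return shares


# Assumed reading of the Architecture slide: three spending rates, 15 slots.
bids = {"user_a": 4.0, "user_b": 1.5, "user_c": 2.0}
total_slots = 15
slots = {user: q * total_slots for user, q in proportional_shares(bids).items()}
print(slots)  # user_a gets roughly 8 of the 15 slots
```

Because every share has the same denominator (the sum of all bids), the shares always add up to 1, so the full slot count is distributed whenever at least one bid is positive.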


Page 8: Economic Scheduling of Hadoop Jobs

Key Design Principles

» Pay-per-use: spending rate is only deducted from budget if a job performed work

» Work-conserving: users are never charged more than their spending rates but can get more slots if other users are idle

» Preemptive: higher spending users may cause tasks from lower spending users to be killed

» Scalable: no memory-based or history-based fair-share smoothing
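The pay-per-use and work-conserving principles above can be illustrated with a toy allocation interval. This is a sketch under simplifying assumptions, not the scheduler's implementation: the function name, the dictionary fields, and the rule of charging one flat spending rate per interval are all hypothetical, and preemption is not modeled.

```python
def run_interval(users, total_slots):
    """Simulate one allocation interval and return the slots granted per user.

    users maps a name to {"budget": number, "rate": number, "demand": int}.
    Budgets are updated in place.
    """
    # Work-conserving (simplified): users with no pending work, or who cannot
    # pay their rate, do not compete, so the remaining users split all slots.
    active = {
        name: u for name, u in users.items()
        if u["demand"] > 0 and u["rate"] > 0 and u["budget"] >= u["rate"]
    }
    total_rate = sum(u["rate"] for u in active.values())
    granted = {name: 0 for name in users}
    for name, u in active.items():
        fair = int(u["rate"] * total_slots / total_rate)  # proportional share, rounded down
        granted[name] = min(u["demand"], fair)
        # Pay-per-use: charge only if the user actually received slots,
        # and never more than the spending rate the user set.
        if granted[name] > 0:
            u["budget"] -= u["rate"]
    return granted


users = {
    "a": {"budget": 10, "rate": 3, "demand": 10},
    "b": {"budget": 10, "rate": 1, "demand": 10},
    "c": {"budget": 10, "rate": 4, "demand": 0},  # idle: no pending tasks
}
print(run_interval(users, total_slots=8))
print({name: u["budget"] for name, u in users.items()})
```

In this run the idle user keeps its full budget, and the two active users divide all eight slots in proportion to their rates. Slots left unused by a user whose demand is below its share are not redistributed here, which a fuller sketch would need to handle.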


Page 9: Economic Scheduling of Hadoop Jobs

Implementation

» Standalone Hadoop MapReduce JobTracker Scheduler Plugin

» HTTP/XML/REST Servlet to provide secure management and monitoring of queues

» Generic queue allocation/accounting classes (could move into mapred core)

» Pluggable scheduler enforcing shares when scheduling jobs (could be replaced by capacity/fairshare enforcers)


Page 10: Economic Scheduling of Hadoop Jobs

Configuration

Option                                     Example
mapred.jobtracker.taskScheduler            org.apache.hadoop.mapred.DynamicPriorityScheduler
mapred.priority-scheduler.kill-interval    0
mapred.dynamic-scheduler.alloc-interval    20
mapred.dynamic-scheduler.budget-file       /etc/hadoop.budget
mapred.priority-scheduler.acl-file         /etc/hadoop.acl
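Hadoop reads properties like these from XML configuration files made of property elements with a name and a value. The following sketch generates such a fragment from the options listed above. It assumes the values belong in the cluster's MapReduce site configuration; the helper name is hypothetical, and the values are the slide's examples rather than recommendations.

```python
import xml.etree.ElementTree as ET

# Option names and example values copied from the slide.
SETTINGS = {
    "mapred.jobtracker.taskScheduler": "org.apache.hadoop.mapred.DynamicPriorityScheduler",
    "mapred.priority-scheduler.kill-interval": "0",
    "mapred.dynamic-scheduler.alloc-interval": "20",
    "mapred.dynamic-scheduler.budget-file": "/etc/hadoop.budget",
    "mapred.priority-scheduler.acl-file": "/etc/hadoop.acl",
}


def build_configuration(settings):
    """Build a <configuration> element with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


xml_text = ET.tostring(build_configuration(SETTINGS), encoding="unicode")
print(xml_text)
```

Keeping the options in one dictionary makes it easy to review them against the table before writing the output into a configuration file.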


Page 11: Economic Scheduling of Hadoop Jobs

Experiment

Fairshare vs Capacity vs FIFO vs DP

2-80 simulated users/queues

2 Clusters

PiEstimator Simulation


Page 12: Economic Scheduling of Hadoop Jobs

Budget Dynamics

[Figure. Annotations: Funding runs out; Budget replenished. Labels: DynPrio no preempt; DynPrio preempt; Capacity scheduler; FIFO scheduler]

Page 13: Economic Scheduling of Hadoop Jobs

Service Differentiation

[Figure. Labels: DynPrio; FIFO]

Page 14: Economic Scheduling of Hadoop Jobs

Dynamic Adjustment


Page 15: Economic Scheduling of Hadoop Jobs

More info

» Papers

› SIGMETRICS 2009

› Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP’10)

› International Conference on Cloud Computing and Virtualization (CCV’10)

» HADOOP-4768 JIRA

» Source: http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/dynamic-scheduler/