Hadoop Summit Fair Scheduler (1)

  • 7/25/2019 Hadoop Summit Fair Scheduler (1)

    1/30

    UC Berkeley

    Job Scheduling with the Fair and Capacity Schedulers

    Matei Zaharia

    Wednesday, June 10, 2009

    Santa Clara Marriott

    Motivation

    Provide fast response times to small jobs in a shared Hadoop cluster

    Improve utilization and data locality over separate clusters and Hadoop on Demand

    Hadoop at Facebook

    600-node cluster running Hive

    200 jobs/day, 50+ users

    Apps: statistical reports, spam detection, ad optimization, ...

    Facebook Job Types

    Production jobs: data import, hourly reports, etc.

    Small ad-hoc jobs: Hive queries, sampling

    Long experimental jobs: machine learning, etc.

    GOAL: fast response times for small jobs, guaranteed service levels for production jobs

    Outline

    Fair scheduler basics

    Configuring the fair scheduler

    Capacity scheduler

    Useful links

    FIFO Scheduling

    Job Queue

    Fair Scheduling

    Job Queue

    Fair Scheduler Basics

    Group jobs into

    Pools

    Determined from a configurable job property

    - Default in 0.20: user.name (one pool per user)

    Pools have properties:

    - Minimum map slots

    - Minimum reduce slots

    - Limit on # of running jobs

    Example Pool Allocations

    [Figure: the entire cluster (100 slots) split among the pools matei, jeff, ads (min share = 40) and tom (min share = 30); the jobs shown hold 15, 15, 30 and 40 slots.]

    Scheduling Algorithm

    Split each pool's min share among its jobs

    Split each pool's total share among its jobs

    When a slot needs to be assigned:

    - If there is any job below its min share, schedule it

    - Else schedule the job that we've been most unfair to (based on
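The slot-assignment rule on this slide can be sketched roughly as follows. This is a minimal illustration rather than the scheduler's actual code: the `Job` class, the even split in `split_share`, and the use of "fair share minus running slots" as the unfairness measure are all assumptions made for the example, since the slide's own measure is cut off.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    min_share: float   # this job's part of its pool's min share, in slots
    fair_share: float  # this job's part of its pool's total share, in slots
    running: int = 0   # slots the job currently holds


def split_share(pool_share, job_names):
    """Split a pool's share among its jobs (assumption: an even split)."""
    if not job_names:
        return {}
    per_job = pool_share / len(job_names)
    return {name: per_job for name in job_names}


def pick_job(jobs):
    """Return the job that should receive the next free slot, or None."""
    if not jobs:
        return None
    # Rule 1: a job below its min share is served first.
    below_min = [j for j in jobs if j.running < j.min_share]
    if below_min:
        # Hypothetical tie-break: the job furthest below its min share.
        return max(below_min, key=lambda j: j.min_share - j.running)
    # Rule 2: otherwise pick the job treated most unfairly so far.
    # Assumption: unfairness = fair share minus slots currently held.
    return max(jobs, key=lambda j: j.fair_share - j.running)
```

For example, with jobs holding 12, 3 and 1 slots against min shares of 10, 5 and 0, the second job is chosen first because it is the only one below its min share.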

    Scheduler Dashboard

    Change priority

    Change pool

    FIFO mode (for testing)

    Additional Features

    Weights for unequal sharing:

    - Job weights based on priority (each level = 2x)

    - Job weights based on size

    - Pool weights

    Limits for # of running jobs:

    - Per user

    - Per pool
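As a worked illustration of priority-based weights, the sketch below doubles a job's weight for each priority level and then splits slots in proportion to weight. The level names follow Hadoop's job priorities; treating NORMAL as weight 1.0 and the helper names are assumptions made for this example.

```python
# Priority levels from lowest to highest.
PRIORITY_LEVELS = ["VERY_LOW", "LOW", "NORMAL", "HIGH", "VERY_HIGH"]


def priority_weight(priority):
    """Weight doubles with each priority level (assumption: NORMAL = 1.0)."""
    steps_above_normal = (
        PRIORITY_LEVELS.index(priority) - PRIORITY_LEVELS.index("NORMAL")
    )
    return 2.0 ** steps_above_normal


def weighted_shares(total_slots, weights):
    """Divide total_slots among jobs in proportion to their weights."""
    total_weight = sum(weights.values())
    return {name: total_slots * w / total_weight for name, w in weights.items()}
```

Under these assumptions, a HIGH-priority job sharing 30 slots with a NORMAL-priority job would be entitled to 20 of them.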

    Installing the Fair Scheduler

    Build it:

        ant package

    Place it on the classpath:

        cp build/contrib/fairscheduler/*.jar lib

    Configuration Files

    Hadoop config (conf/mapred-site.xml)

    - Contains scheduler options, pointer to pools file

    Pools file (pools.xml)

    - Contains min share allocations and limits on pools

    - Reloaded every 15 seconds at runtime

    Minimal hadoop-site.xml

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>
    <property>
      <name>mapred.fairscheduler.allocation.file</name>
      <value>/path/to/pools.xml</value>
    </property>

    Minimal pools.xml

    <?xml version="1.0"?>
    <allocations></allocations>

    Configuring a Pool

    <?xml version="1.0"?>
    <allocations>
      <pool name="ads">
        <minMaps>10</minMaps>
        <minReduces>5</minReduces>
      </pool>
    </allocations>

    Setting Running Job Limits

    <?xml version="1.0"?>
    <allocations>
      <pool name="ads">
        <minMaps>10</minMaps>
        <minReduces>5</minReduces>
        <maxRunningJobs></maxRunningJobs>
      </pool>
      <user name="matei">
        <maxRunningJobs>1</maxRunningJobs>
      </user>
    </allocations>
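To make the effect of running-job limits concrete, here is a rough sketch of how per-pool and per-user limits could decide which submitted jobs may run. The function name, the tuple layout, and the "admit in submission order, everything else waits" policy are assumptions for illustration, not the scheduler's actual implementation.

```python
def runnable_jobs(jobs, pool_limits, user_limits):
    """Return the ids of jobs allowed to run.

    jobs: list of (job_id, user, pool) tuples in submission order.
    pool_limits / user_limits: dicts mapping a name to its maxRunningJobs;
    a missing entry means no limit (an assumption of this sketch).
    """
    unlimited = float("inf")
    running_per_pool = {}
    running_per_user = {}
    runnable = []
    for job_id, user, pool in jobs:
        pool_ok = running_per_pool.get(pool, 0) < pool_limits.get(pool, unlimited)
        user_ok = running_per_user.get(user, 0) < user_limits.get(user, unlimited)
        if pool_ok and user_ok:
            runnable.append(job_id)
            running_per_pool[pool] = running_per_pool.get(pool, 0) + 1
            running_per_user[user] = running_per_user.get(user, 0) + 1
        # Otherwise the job waits until a running job finishes.
    return runnable
```

With a limit of one running job for user matei, a second job from matei would wait even if the pool still has room.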

    Default Per-User Running Job Limit

    <?xml version="1.0"?>
    <allocations>
      <pool name="ads">
        <minMaps>10</minMaps>
        <minReduces>5</minReduces>
        <maxRunningJobs></maxRunningJobs>
      </pool>
      <user name="matei">
        <maxRunningJobs>1</maxRunningJobs>
      </user>
      <userMaxJobsDefault>10</userMaxJobsDefault>
    </allocations>
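A pools file of this shape can be read with a few lines of standard-library code, which is handy for checking a file before the scheduler reloads it. The sketch below is only an illustration: `parse_allocations` and the `SAMPLE` document are made up for the example, and the sample leaves out the pool-level `maxRunningJobs` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample file, modelled on the element names used in the slides.
SAMPLE = """<?xml version="1.0"?>
<allocations>
  <pool name="ads">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
  </pool>
  <user name="matei">
    <maxRunningJobs>1</maxRunningJobs>
  </user>
  <userMaxJobsDefault>10</userMaxJobsDefault>
</allocations>"""


def parse_allocations(xml_text):
    """Return (pools, user_limits, default_user_limit) from a pools file."""
    root = ET.fromstring(xml_text)
    # Each <pool> becomes a dict of its integer-valued child elements.
    pools = {
        pool.get("name"): {child.tag: int(child.text) for child in pool}
        for pool in root.findall("pool")
    }
    # Only users that actually declare a limit are recorded.
    user_limits = {}
    for user in root.findall("user"):
        limit = user.findtext("maxRunningJobs")
        if limit is not None:
            user_limits[user.get("name")] = int(limit)
    default = root.findtext("userMaxJobsDefault")
    return pools, user_limits, (int(default) if default is not None else None)
```

Calling `parse_allocations(SAMPLE)` yields the ads pool's minimums, the limit for matei, and the default per-user limit as plain Python values.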

    Other Parameters

    mapred.fairscheduler.assignmultiple:

    - Assign a map and a reduce on each heartbeat; improves ramp-up speed and throughput

    - Recommendation: set to true

    Other Parameters

    mapred.fairscheduler.poolnameproperty:

    Which JobConf property sets what pool a job is in

    - Default: user.name (one pool per user)

    - Can make up your own, e.g.

    Useful Setting

    <property>
      <name>mapred.fairscheduler.poolnameproperty</name>
      <value>pool.name</value>
    </property>
    <property>
      <name>pool.name</name>
      <value>${user.name}</value>
    </property>

    Make pool.name default to user.name
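The effect of this setting can be sketched as a small lookup: the scheduler reads the property named by mapred.fairscheduler.poolnameproperty, and variable expansion turns `${user.name}` into the submitting user when the job does not set pool.name itself. The code below is a simplified model under stated assumptions: configurations are plain dicts, job settings override site settings, and expansion is a single pass (function names are hypothetical).

```python
import re

_VARIABLE = re.compile(r"\$\{([^}]+)\}")


def expand(value, conf):
    """Replace each ${key} in value with conf[key]; unknown keys are left as-is."""
    return _VARIABLE.sub(lambda m: conf.get(m.group(1), m.group(0)), value)


def pool_for_job(job_conf, site_conf):
    """Work out which pool a job lands in (simplified model)."""
    # Which property names the pool; the slides give user.name as the default.
    prop = site_conf.get("mapred.fairscheduler.poolnameproperty", "user.name")
    # Assumption: values set on the job override the site-wide defaults.
    merged = {**site_conf, **job_conf}
    raw = merged.get(prop)
    return expand(raw, merged) if raw is not None else None
```

With the two properties from the slide in `site_conf`, a job from user matei that sets nothing lands in pool "matei", while a job that sets pool.name to "ads" lands in pool "ads".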

    Future Plans

    Preemption (killing tasks) if a job is starved of its min or fair share for some time (HADOOP-4665)

    Global scheduling optimization (HADOOP-4667)

    FIFO pools (HADOOP-4803, HADOOP-5186)

    Capacity Scheduler

    Organizes jobs into queues

    Queue shares as %'s of cluster

    FIFO scheduling within each queue

    Supports preemption

    http://hadoop.apache.org/core/docs/current/capacity_scheduler.html
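As a small illustration of the two ideas on this slide, the sketch below converts percentage shares into slot counts and picks the next job from a queue in FIFO order. The function names and the rounding down to whole slots are assumptions for this example, not the Capacity Scheduler's actual behavior.

```python
from collections import deque


def queue_slots(total_slots, percent_by_queue):
    """Turn each queue's percentage of the cluster into a whole number of slots."""
    # Assumption: fractional slots are rounded down.
    return {q: total_slots * pct // 100 for q, pct in percent_by_queue.items()}


def next_job(queue):
    """FIFO within a queue: the oldest waiting job goes first."""
    return queue[0] if queue else None
```

For a 100-slot cluster with hypothetical queues at 60% and 40%, this gives 60 and 40 slots, and within each queue the earliest submitted job is served first.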

    Thanks!

    Fair scheduler included in Hadoop 0.19+ and in Cloudera's Distribution for Hadoop

    Fair scheduler for Hadoop 0.17 and 0.18: http://issues.apache.org/jira/browse/HADOOP-3746

    Capacity scheduler included in Hadoop 0.19+

    Docs: http://hadoop.apache.org/core/docs/current

    My email: matei@cloudera.com