DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters Nanyang Technological University Shanjiang Tang, Bu-Sung Lee, Bingsheng He School of Computer Engineering Nanyang Technological University 22/6/15


DynamicMR: A Dynamic Slot Allocation Optimization Framework for MapReduce Clusters

Nanyang Technological University

Shanjiang Tang, Bu-Sung Lee, Bingsheng He

School of Computer Engineering

Nanyang Technological University

23/4/21


Outline

• Background and Motivation
• DynamicMR Overview
• Experimental Evaluation
• Conclusion


Big Data is Everywhere

• Large volumes of data are being collected and warehoused.
  – Web data, e-commerce
  – Purchases at department/grocery stores
  – Bank/credit card transactions
  – Social networks
  – Astronomical image processing
  – Bioinformatics


MapReduce is a Promising Choice

• A popular parallel programming model

[Figure: MapReduce data flow. Input data is processed by map tasks (map-phase computation) into intermediate results, which reduce tasks (reduce-phase computation) turn into output results that form the final result.]
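The programming model in the figure can be illustrated with a single-process word count. This is only a sketch of the map, shuffle, and reduce steps, not Hadoop code; the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key (the step between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single output result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an iterable of text lines."""
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is everywhere", "data is big"]))
```

In a real cluster the map and reduce calls run in parallel on different nodes; here they run sequentially so the data flow is easy to follow.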


Hadoop

• Apache Hadoop is an open-source framework for reliable, scalable, distributed computing. It implements the MapReduce computational paradigm.
  – Scales up to 6,000-10,000 machines
  – Supports multi-tenancy

• Useful links:
  – http://hadoop.apache.org/
  – http://hadoop.apache.org/docs/r0.20.2/mapred_tutorial.html
  – http://apache.panu.it/hadoop/common/stable/


Challenges in Distributed Environment

• Node failures and stragglers (slow nodes)
  – Mean time between failures for 1000 nodes = 1 day
  – Impact: affects performance.

• Commodity network = low bandwidth
  – Push computation to the data (data locality optimization)
  – Impact: affects performance.

• Resource contention in a shared cluster environment
  – Requires performance isolation and fair resource sharing
  – Impact: affects performance and fairness.

Performance and fairness optimization are important!
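As a rough check on the failure figure above: if node failures are independent and occur at a constant rate (an assumption the slide does not state), the cluster-wide mean time between failures is the per-node value divided by the number of nodes. One failure per day across 1000 nodes then corresponds to a per-node MTBF of about 1000 days. The helper name `cluster_mtbf_days` is hypothetical.

```python
def cluster_mtbf_days(node_mtbf_days, num_nodes):
    """Cluster-wide MTBF, assuming independent nodes with constant failure rates.

    Each node fails at rate 1 / node_mtbf_days, so the cluster sees failures at
    rate num_nodes / node_mtbf_days; the mean gap between failures is the inverse.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return node_mtbf_days / num_nodes


if __name__ == "__main__":
    # A node that fails about once every 1000 days, in a 1000-node cluster.
    print(cluster_mtbf_days(1000, 1000))
```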


Our Work

• Challenge: How can we improve the performance of Hadoop while guaranteeing fairness?

• Our solution: DynamicMR, a dynamic resource allocation system for Hadoop.
  – Improve the resource utilization as much as possible.
  – Improve the utilization efficiency as much as possible.


Observation 1: Poor Resource Utilization

• Hadoop abstracts resources into map slots and reduce slots.
  – Configured statically by the Hadoop administrator.
  – Resource constraint: map tasks can only use map slots, and reduce tasks can only use reduce slots.

[Figure: slot timeline (time axis 0 to 44) of the map (M) and reduce (R) tasks of jobs J1 to J4 under a static slot configuration.]

Slot resources are wasted during computation!
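The waste can be seen with a small calculation. The sketch below assumes a simplified model in which every task occupies exactly one slot; the function name `static_allocation` and the slot counts in the example are hypothetical, not taken from the slides.

```python
def static_allocation(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Static configuration: each task type may only use its own slot type.

    Returns (running_maps, running_reduces, idle_slots).
    """
    running_maps = min(map_slots, pending_maps)
    running_reduces = min(reduce_slots, pending_reduces)
    idle = (map_slots - running_maps) + (reduce_slots - running_reduces)
    return running_maps, running_reduces, idle


if __name__ == "__main__":
    # Early in a job there are many map tasks and no reduce tasks yet:
    # the 4 reduce slots sit idle although 6 map tasks are still waiting.
    print(static_allocation(map_slots=4, reduce_slots=4,
                            pending_maps=10, pending_reduces=0))
```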


Technique 1: Dynamic Hadoop Slot Allocation (DHSA)

• Core idea of DHSA:
  – Slots are generic and can be used by either map or reduce tasks, although the numbers of map and reduce slots are pre-configured.
  – Map tasks prefer to use map slots, and likewise reduce tasks prefer to use reduce slots.

[Figure: slot timeline (time axis 0 to 44) of the map (M) and reduce (R) tasks of jobs J1 to J4 under DHSA.]
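The core idea can be sketched as a two-step allocation: tasks first fill slots of their own type, and any slots left idle are then lent to the other task type. This is a simplified illustration under the assumption that every task occupies one slot; it does not model fairness between jobs, and the function name `dynamic_allocation` is hypothetical rather than the authors' implementation.

```python
def dynamic_allocation(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Generic slots with a preference for the matching slot type.

    Returns (running_maps, running_reduces, idle_slots).
    """
    # Step 1: each task type uses its own pre-configured slots first.
    running_maps = min(map_slots, pending_maps)
    running_reduces = min(reduce_slots, pending_reduces)
    idle_map_slots = map_slots - running_maps
    idle_reduce_slots = reduce_slots - running_reduces

    # Step 2: leftover slots of one type are borrowed by the other type.
    extra_maps = min(idle_reduce_slots, pending_maps - running_maps)
    extra_reduces = min(idle_map_slots, pending_reduces - running_reduces)

    running_maps += extra_maps
    running_reduces += extra_reduces
    idle = map_slots + reduce_slots - running_maps - running_reduces
    return running_maps, running_reduces, idle


if __name__ == "__main__":
    # With 10 pending map tasks and no reduce tasks, map tasks borrow
    # the 4 idle reduce slots, so no slot is wasted.
    print(dynamic_allocation(map_slots=4, reduce_slots=4,
                             pending_maps=10, pending_reduces=0))
```

Only one direction of borrowing can be active at a time: a slot type has idle slots only when its own task type has nothing left waiting.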


Observation 2: Speculative Execution is a Double-edged Sword

• Speculative scheduling
  – Runs a backup task for a straggling task.
  – Pros: can improve the performance of a single job.
  – Cons: reduces resource utilization efficiency, especially when there are other pending tasks.

[Figure: two task schedules involving a straggler and its backup task, annotated "Benefit J1" and "Benefit the whole workload".]

A performance tradeoff between a single job and batch jobs!


Technique 2: Speculative Execution Performance Balancing (SEPB)

• Key idea of SEPB:
  – Instead of running speculative tasks immediately when a straggler of a job is detected, we check a subset of jobs (maxNumOfJobsCheckedForPendingTasks) for pending tasks.
  – If there are pending tasks, allocate pending tasks. Otherwise, allocate a speculative task.

[Figure: job queue J1 to J6, with a window of maxNumOfJobsCheckedForPendingTasks jobs marked.]
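The decision rule can be sketched as follows. This is a minimal illustration, assuming the job queue is an ordered list and that the checked subset is its first maxNumOfJobsCheckedForPendingTasks entries; the `Job` class and the `sepb_decide` function are hypothetical names, not part of Hadoop.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pending_tasks: int  # tasks of this job that have not started yet


def sepb_decide(straggler_job, job_queue, max_num_of_jobs_checked_for_pending_tasks):
    """Decide what an idle slot should run once a straggler has been detected.

    Returns ("pending", job_name) if a job in the checked window still has
    pending tasks, otherwise ("speculative", straggler_job.name).
    """
    window = job_queue[:max_num_of_jobs_checked_for_pending_tasks]
    for job in window:
        if job.pending_tasks > 0:
            return ("pending", job.name)
    return ("speculative", straggler_job.name)


if __name__ == "__main__":
    queue = [Job("J1", 0), Job("J2", 0), Job("J3", 3)]
    # A small window sees no pending tasks, so J1's backup task runs.
    print(sepb_decide(queue[0], queue, 2))
    # A larger window finds J3's pending tasks and runs one of those instead.
    print(sepb_decide(queue[0], queue, 3))
```

A larger window favors the whole workload, while a smaller one favors the job that owns the straggler.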


Observation 3: Load Balance Requirement Harms Data Locality

• Load balancing is adopted by Hadoop.
  – Hadoop tries to keep the load (i.e., the number of running tasks) on each node as close as possible.

Load balancing causes J1 to fail to achieve data locality!


Technique 3: Slot PreScheduling

• Key idea: Improve data locality at the expense of load balance.
  – When a node has idle slots and local data, we preschedule the task on that machine first.
  – Otherwise, we keep the load balance constraint.
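The rule can be sketched as a per-node assignment function. The slides do not give the details, so this sketch makes several assumptions: the load balance constraint is modeled as a cap (`load_limit`) on running tasks per node, a task counts as local if the node holds a replica of its input, and the names `Task` and `assign` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    data_nodes: frozenset  # nodes that hold a replica of the task's input


def assign(node, idle_slots, running_tasks, load_limit, pending):
    """Pick the task to launch on `node`, or None if nothing should start."""
    if idle_slots == 0:
        return None
    local = [task for task in pending if node in task.data_nodes]
    if local:
        # Preschedule: data locality wins even if the node is at its load limit.
        return local[0]
    if running_tasks < load_limit and pending:
        # No local data: fall back to the load balance constraint.
        return pending[0]
    return None


if __name__ == "__main__":
    t1 = Task("t1", frozenset({"n2"}))
    t2 = Task("t2", frozenset({"n1"}))
    # n1 is already at its load limit but has an idle slot and local data for t2.
    chosen = assign("n1", idle_slots=1, running_tasks=3, load_limit=3,
                    pending=[t1, t2])
    print(chosen.name if chosen else None)
```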


DynamicMR

• A combination of the aforementioned three techniques.
  – DHSA: Slot Utilization Optimization.
  – SEPB, Slot PreScheduling: Utilization Efficiency Optimization.

[Figure: DynamicMR overview showing idle slots, map tasks, and reduce tasks, organized as (1) Slot Utilization Optimization (DHSA) and (2) Utilization Efficiency Optimization (SEPB and Slot PreScheduling).]


Experimental Setup

• Hadoop cluster
  – 10 nodes, each with two Intel X5675 CPUs (6 cores per CPU at 3.07 GHz), 24 GB DDR3 memory, and 56 GB hard disks.

• Benchmarks and data sets.


DynamicMR Performance Evaluation


DynamicMR vs. YARN

• DynamicMR achieves better performance than YARN.
  – DynamicMR benefits from controlling the ratio of concurrently running map and reduce tasks, which YARN does not do.


Conclusion

• We propose DynamicMR, a framework that improves the performance of MapReduce workloads while maintaining fairness.
  – It consists of three techniques: DHSA, SEPB, and Slot PreScheduling.

• Experimental results show that:
  – It improves the performance of Hadoop by 46%~115% for single jobs and 49%~112% for batch jobs.
  – It outperforms YARN by about 2%~9% for multiple jobs.


DHSA Evaluation

• DHSA achieves better performance than Hadoop.
• Hadoop is sensitive to the slot configuration, whereas DHSA is not.


SEPB Evaluation

• SEPB improves the performance of the whole set of jobs (Figure a).

• With SEPB, there is a performance tradeoff between an individual job and the whole set of jobs (Figure b).


Slot PreScheduling Evaluation

• Data Locality and Performance Improvement
