Overprovisioning for Performance Consistency in Grids

Nezih Yigitbasi and Dick Epema
Parallel and Distributed Systems Group, Delft University of Technology
http://guardg.st.ewi.tudelft.nl/


Transcript of Overprovisioning for Performance Consistency in Grids

Page 1: Overprovisioning for Performance Consistency in Grids

22-04-23

Challenge the future

Delft University of Technology

Overprovisioning for Performance Consistency in Grids

Nezih Yigitbasi and Dick Epema

Parallel and Distributed Systems Group, Delft University of Technology

http://guardg.st.ewi.tudelft.nl/

Page 2: Overprovisioning for Performance Consistency in Grids

The Problem: Performance inconsistency in grids

• Inconsistent performance is common in grids
  • bursty workloads
  • variable background loads
  • high rate of failures
  • highly dynamic & heterogeneous environment

[Figure: Bag-of-Tasks with 128 tasks submitted every 15 minutes; annotation: ~70X]

How can we provide consistent performance in grids?

Page 3: Overprovisioning for Performance Consistency in Grids

Our goals

GOAL-1: Realistic performance evaluation of static and dynamic overprovisioning strategies (system’s perspective)

GOAL-2: Dynamically determine the overprovisioning factor (Κ) for user-specified performance requirements (user’s perspective)

Page 4: Overprovisioning for Performance Consistency in Grids


Outline

Overprovisioning Strategies

Experimental Setup

Results

Dynamically Determining Κ

Conclusions

Page 5: Overprovisioning for Performance Consistency in Grids

Overprovisioning (I)

• Increasing the system capacity to provide better and, in particular, consistent performance even under variable workloads and unexpected demands

Pros
• simple
• obviates the need for complex algorithms
• easy to deploy & maintain

Cons
• cost-ineffective
• workloads may evolve (e.g., increasing user base)
• poorly utilized systems

Page 6: Overprovisioning for Performance Consistency in Grids

Overprovisioning (II)

• Preferred way of providing performance guarantees
  • typical data center utilization is no more than 15-50%
  • telecommunication systems have ~30% utilization on average

L. A. Barroso and U. Hölzle, The Case for Energy-Proportional Computing, IEEE Computer, December 2007.

• High overprovisioning factors (Κ) are common in modern systems
  • Google: 450,000 (2005)
  • Microsoft: 218,000 (mid-2008)
  • Facebook: 10,000+ (2009)

Page 7: Overprovisioning for Performance Consistency in Grids

Overprovisioning strategies

1. Static
   i. Largest
   ii. All
   iii. Number
   • Where should we deploy the resources?
   • Does it make any difference?

2. Dynamic
   • Dynamic overprovisioning
     • a.k.a. auto-scaling
     • low/high thresholds for acquiring/releasing resources

• Given Κ, it is straightforward to determine the number of processors for a strategy

[Figure: Static vs. Dynamic overprovisioning; labels: Time, Demand, Waste]
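As a rough illustration of how Κ translates into processor counts, the Python sketch below assumes that Κ is the ratio of overprovisioned capacity to initial capacity, that Largest adds all extra processors to the largest cluster, that All scales every cluster by Κ, and that Number adds new clusters of a fixed size. These interpretations, the function names, and the example cluster sizes are assumptions for illustration, not definitions taken from the slides.

```python
import math


def extra_processors(cluster_sizes, k):
    """Processors to add so that total capacity becomes k times the initial one."""
    return math.ceil((k - 1) * sum(cluster_sizes))


def overprovision_largest(cluster_sizes, k):
    """Assumed 'Largest' strategy: put every extra processor in the largest cluster."""
    sizes = list(cluster_sizes)
    largest = sizes.index(max(sizes))
    sizes[largest] += extra_processors(cluster_sizes, k)
    return sizes


def overprovision_all(cluster_sizes, k):
    """Assumed 'All' strategy: scale every cluster by the factor k."""
    return [math.ceil(k * size) for size in cluster_sizes]


def overprovision_number(cluster_sizes, k, new_cluster_size):
    """Assumed 'Number' strategy: add new clusters of a fixed (hypothetical) size."""
    n_new = math.ceil(extra_processors(cluster_sizes, k) / new_cluster_size)
    return list(cluster_sizes) + [new_cluster_size] * n_new


# Hypothetical three-cluster grid, overprovisioned with K = 1.5.
initial = [80, 40, 40]
print(overprovision_largest(initial, 1.5))
print(overprovision_all(initial, 1.5))
print(overprovision_number(initial, 1.5, new_cluster_size=40))
```

Under these assumptions all three static strategies end up with the same total capacity and differ only in where the processors are placed.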

Page 8: Overprovisioning for Performance Consistency in Grids


Outline

Overprovisioning Strategies

Experimental Setup

Results

Dynamically Determining Κ

Conclusions

Page 9: Overprovisioning for Performance Consistency in Grids

System model

• DAS-3 multi-cluster grid
• Global Resource Managers (GRM) interacting with Local Resource Managers (LRM)

[Figure: a GRM with a global queue and several LRMs with local queues; labels: global job, local jobs]

Page 10: Overprovisioning for Performance Consistency in Grids

Workload

• Realistic workloads consisting of Bags-of-Tasks (BoTs)
• Simulations using 10 workloads with 80% load
  • each workload has ~1650 BoTs and ~10K tasks
  • duration of each workload is between 1 day and 1 week
• Real background load trace
  • DAS-3 trace of June ’08 (http://gwa.ewi.tudelft.nl/)

(Distribution parameters are determined after base-two log transformation)

Page 11: Overprovisioning for Performance Consistency in Grids


Scheduling model

Page 12: Overprovisioning for Performance Consistency in Grids

Methodology

• Compare the overprovisioned system with the initial system (NO)
• For Dynamic
  • min/max resource acquisition time: 69/129 s; min/max release time: 18/23 s
  • low/high thresholds: 60%/70%
• Κ varies over time, so for a fair comparison keep it within a ±10% range
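The threshold rule for the Dynamic strategy might look like the Python sketch below. It assumes that the 60%/70% thresholds apply to system utilization, that resources are acquired above the high threshold and released below the low one, and that one step of resources is acquired or released per decision. The function name, the `step` parameter, and this reading of the thresholds are illustrative assumptions.

```python
LOW_THRESHOLD = 0.60   # release resources below this utilization (assumed meaning)
HIGH_THRESHOLD = 0.70  # acquire resources above this utilization (assumed meaning)


def scaling_decision(busy_processors, total_processors, step=1):
    """Return how many processors to acquire (+) or release (-) right now."""
    utilization = busy_processors / total_processors
    if utilization > HIGH_THRESHOLD:
        return +step
    if utilization < LOW_THRESHOLD:
        return -step
    return 0  # inside the band between the thresholds: keep the current capacity


# Hypothetical utilization samples for a 100-processor system.
for busy in (50, 65, 80):
    print(busy, scaling_decision(busy, 100))
```

The gap between the two thresholds acts as a dead band, which keeps the system from acquiring and releasing resources on every small fluctuation in load.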

Page 13: Overprovisioning for Performance Consistency in Grids

Traditional performance metrics

[Figure: timeline from "First task submitted" to "Last task done", labeled Makespan]
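Read this way, the makespan of a BoT is the time between the submission of its first task and the completion of its last task. A minimal Python sketch, with hypothetical function and argument names and made-up timestamps:

```python
def makespan(submit_times, finish_times):
    """Time from the first task's submission to the last task's completion."""
    return max(finish_times) - min(submit_times)


# Hypothetical BoT with three tasks (times in seconds).
print(makespan(submit_times=[0, 5, 10], finish_times=[30, 42, 25]))
```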

Page 14: Overprovisioning for Performance Consistency in Grids

Consistency metrics

• We define two metrics, Cd and Cs, to capture the notion of consistency across two dimensions

• The system gets more consistent as Cd gets closer to 1 and Cs gets closer to 0

• A tighter range of the NSL is a sign of better consistency

Page 15: Overprovisioning for Performance Consistency in Grids


Outline

Overprovisioning Strategies

Experimental Setup

Results

Dynamically Determining Κ

Conclusions

Page 16: Overprovisioning for Performance Consistency in Grids

Performance of scheduling policies

• ECT is the worst
• Dynamic Per Task is the best

Page 17: Overprovisioning for Performance Consistency in Grids

Performance of different strategies

[Figure panels: Different Overprovisioning Factors (Κ); Different Strategies]

• Consistency obtained with overprovisioning is much better than in the initial system (NO)
• Static strategies provide similar performance (only Κ matters)
• All and Largest are viable alternatives to Number, as Number increases the administration, installation, and maintenance costs
• Dynamic strategy has better performance compared to static strategies
• Κ = 2.5 is the critical value

Page 18: Overprovisioning for Performance Consistency in Grids

Cost of different strategies

• Use CPU-Hours
  • time a processor is used [h]
  • round up partial instance-hours to one hour, similar to the Amazon EC2 on-demand instance pricing model
• Significant reduction, as high as ~40%, in cost
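The CPU-hour accounting with round-up can be sketched in Python as follows. The function name and the per-processor usage values are hypothetical; the only rule taken from the slide is that a partial hour is charged as a full hour.

```python
import math

SECONDS_PER_HOUR = 3600


def billed_cpu_hours(usage_seconds):
    """Sum of per-processor usage, each rounded up to whole hours."""
    return sum(math.ceil(seconds / SECONDS_PER_HOUR) for seconds in usage_seconds)


# Hypothetical usage of three processors: exactly 1 h, just over 1 h, and 10 s.
print(billed_cpu_hours([3600, 3601, 10]))
```

Under this rule a processor held for only a few seconds still costs a full hour, which is worth keeping in mind when a dynamic strategy acquires and releases resources frequently.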

Page 19: Overprovisioning for Performance Consistency in Grids


Outline

Overprovisioning Strategies

Experimental Setup

Results

Dynamically Determining Κ

Conclusions

Page 20: Overprovisioning for Performance Consistency in Grids

Determining Κ dynamically

• So far we have taken the system’s perspective; now we take the user’s perspective

• How can we dynamically determine Κ given the user performance requirements?

• We use a simple feedback-control approach to deploy additional resources dynamically to meet user performance requirements
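One possible shape of such a feedback loop is sketched below in Python. It is not the controller used in the talk: the update rule (raise Κ when the recent average makespan exceeds the target range, lower it when the average falls below the range), the step size, the bounds on Κ, and all names are assumptions chosen only to illustrate the idea.

```python
from statistics import mean


def update_k(k, recent_makespans, target_low, target_high,
             step=0.1, k_min=1.0, k_max=4.0):
    """One control step: adjust K from recently observed makespans (minutes)."""
    observed = mean(recent_makespans)
    if observed > target_high:
        k += step   # too slow: deploy additional resources
    elif observed < target_low:
        k -= step   # faster than required: release some resources
    # Keep K inside the assumed bounds and avoid floating-point drift.
    return min(k_max, max(k_min, round(k, 2)))


# Hypothetical run against a [250, 300] minute requirement.
k = 1.0
for window in ([400, 500], [350, 330], [280, 290]):
    k = update_k(k, window, target_low=250, target_high=300)
    print(window, k)
```

The target range plays the role of the user-specified performance requirement, and the list of recent makespans stands in for the historical performance data the controller would observe.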

Page 21: Overprovisioning for Performance Consistency in Grids

Evaluation

• Simulated DAS-3 without background load
• ~1.5 month workload consisting of ~33K BoTs
• Empirically show that the controller stabilizes
• Average makespan for the workload in the initial system (without the controller) is ~3120 minutes
• Three scenarios from tight to loose performance requirements
  • [250m-300m]
  • [700m-750m]
  • [1000m-1250m]

Page 22: Overprovisioning for Performance Consistency in Grids

Results (I)

• Significant improvement, as high as ~65%, when the performance requirements are tight

• ~40%-50% improvement for loose performance requirements

Page 23: Overprovisioning for Performance Consistency in Grids

Results (II)

[Figure panels, one per scenario: [250m-300m], [700m-750m], [1000m-1250m]]

Page 24: Overprovisioning for Performance Consistency in Grids

Conclusions

GOAL-1: Realistic Performance Evaluation of Different Strategies

• Overprovisioning improves performance consistency significantly
• Static strategies provide similar performance (only Κ matters)
• Dynamic strategy performs better than the static strategies
• Need to determine the critical value to maximize the benefit of overprovisioning

GOAL-2: Dynamically Determining Κ for Given User Performance Requirements

• Feedback-controlled system tuning Κ dynamically using historical performance data and specified performance requirements
• The number of BoTs meeting the performance requirements increases significantly, as high as 65%, compared to the initial system

Page 25: Overprovisioning for Performance Consistency in Grids

More Information:

• Guard-g Project: http://guardg.st.ewi.tudelft.nl/

• PDS publication database: http://www.pds.twi.tudelft.nl

• http://www.st.ewi.tudelft.nl/~nezih/

Thank you! Questions? Comments?