Energy Prediction for I/O Intensive Workflow Applications 1 MASc Exam Hao Yang NetSysLab The Electrical and Computer Engineering Department The University of British Columbia

1

Energy Prediction for I/O Intensive Workflow Applications

MASc Exam

Hao Yang

NetSysLab

The Electrical and Computer Engineering Department

The University of British Columbia

2

Background - Workflow Applications

Montage Workflow

Computation

File Dependency

Characteristics:
• File-based communication
• Large number of tasks
• Large amount of I/O
• Common data access patterns

Background - Application Execution

3

Figure: application tasks, each with local storage, are coordinated by the workflow runtime engine and communicate through files on a central storage system (e.g., GPFS, NFS). File-based communication and the large I/O volume make the central storage system an I/O bottleneck.

Background - Intermediate Storage System

4

Figure: application tasks on the compute nodes, each with local storage, share an intermediate storage system and are coordinated by the workflow runtime engine. Data is staged in from the central storage system (e.g., GPFS, NFS) and staged out to it.

5

Background - Context of this thesis

This work focuses on workflow application execution on intermediate storage systems.

6

Research Problem – Energy Consumption

• The pursuit of performance used to dominate conventional computing.

• Energy efficiency is the new concern.

Computing Equipment Energy Bill

7

Research Problem - Configuration Decisions

Montage Workload Energy Delay Product (EDP)

Configuring the runtime system is complex (Example: resource allocation decision)

8

• Q1: What performance optimizations in storage systems lead to energy savings?

• Q2: What is the performance and energy impact of power-centric tuning techniques?

• Q3: How can users balance time-to-solution and energy consumption when given a target application?

Research Problem - Questions

9

Outline

• Background
• Research Problem
• Methodology
• Evaluation
• Conclusion

10

Methodology – Building Energy Consumption Predictor

The goal of this work is to build an energy consumption predictor to aid system configuration and provisioning decisions.

• Answer what-if questions (e.g., is configuration A better than configuration B from an energy perspective?)

• Customize the optimization metric (e.g., energy consumption, performance-energy product)

Methodology – Energy Model

11

Figure: application tasks with local storage on the compute nodes, the intermediate storage, and the workflow runtime engine, with diagram labels A, B, C, and D.

Execution States:
• Idle
• Network Transfer
• Storage I/O
• Task Processing

Power Profiles: one per execution state

12

Methodology – Energy Model

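Execution States:
• Idle
• Network Transfer
• I/O ops (read, write)
• Task Processing

Energy = Power Profile * Predicted Time, summed over the execution states:

$E = \sum_{s \in S} P_s \cdot T_s$

where $S$ is the set of execution states, $P_s$ is the power profile of state $s$, and $T_s$ is the predicted time spent in state $s$.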
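The energy model on this slide multiplies each state's power profile by the predicted time spent in that state. A minimal sketch of that calculation for a single node, assuming the per-state products are summed; the function name `predict_energy`, the state keys, and all numbers are hypothetical placeholders, not values from the thesis:

```python
from typing import Dict


def predict_energy(power_w: Dict[str, float], time_s: Dict[str, float]) -> float:
    """Return energy in joules: sum over states of power (W) * time (s)."""
    missing = set(time_s) - set(power_w)
    if missing:
        raise KeyError(f"no power profile for states: {sorted(missing)}")
    return sum(power_w[state] * seconds for state, seconds in time_s.items())


# Placeholder power profile (watts) for the four execution states.
power_profile_w = {
    "idle": 100.0,
    "network_transfer": 110.0,
    "storage_io": 115.0,
    "task_processing": 150.0,
}

# Placeholder predicted time (seconds) spent in each state.
predicted_time_s = {
    "idle": 20.0,
    "network_transfer": 5.0,
    "storage_io": 10.0,
    "task_processing": 15.0,
}

energy_j = predict_energy(power_profile_w, predicted_time_s)
print(f"predicted energy: {energy_j:.0f} J")
```

For a multi-node run, one plausible extension is to evaluate the same sum per node and add the results.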

13

Methodology – Energy Model

How to seed the energy model?

• Power states: use synthetic benchmarks to measure the power consumption in each state.

• Time estimates: augment a performance predictor to track the time spent in each state.
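The two seeding steps could be sketched roughly as follows. This is an illustration under stated assumptions, not the thesis's implementation: it assumes each synthetic benchmark yields a list of power samples for one state, and that the performance predictor can emit `(state, start, end)` intervals. The names `seed_power_profile` and `time_per_state` are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def seed_power_profile(benchmark_samples_w: Dict[str, List[float]]) -> Dict[str, float]:
    """Average the power samples (W) recorded while each state's benchmark ran."""
    return {state: mean(samples) for state, samples in benchmark_samples_w.items()}


def time_per_state(intervals: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Sum the duration (s) of all (state, start_s, end_s) intervals per state."""
    totals: Dict[str, float] = defaultdict(float)
    for state, start_s, end_s in intervals:
        if end_s < start_s:
            raise ValueError(f"interval for {state!r} ends before it starts")
        totals[state] += end_s - start_s
    return dict(totals)


# Made-up samples from two synthetic benchmarks.
profile = seed_power_profile({
    "idle": [90.0, 92.0],
    "storage_io": [120.0, 124.0, 128.0],
})

# Made-up state intervals emitted by a performance predictor.
times = time_per_state([
    ("idle", 0.0, 2.0),
    ("storage_io", 2.0, 5.0),
    ("idle", 5.0, 6.0),
])

print(profile)
print(times)
```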

14

Methodology – Building Energy Consumption Predictor

L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of the 28th ACM International Conference on Supercomputing, ICS'14, (Acceptance Rate: 20%) June 2014.

L. B. Costa, S. Al-Kiswany, A. Barros, H. Yang, and M. Ripeanu, “Predicting Intermediate Storage Performance for Workflow Applications”, In Proceedings of PDSW'13, 2013.

Sources of inaccuracies:
• Homogeneity, power meter
• Time prediction
• Model simplification (metadata, scheduling, …)

15

Evaluation Outline

• Synthetic benchmarks: Workflow Patterns
• Real workflow applications
• Predicting Energy Impact of Power-tuning Techniques
• Predicting Energy-Performance Tradeoffs

16

Evaluation - Platform

Grid5000 Lyon site:

• Taurus Cluster (11 nodes): two 2.3GHz Intel Xeon E5-2630 CPUs (each with 6 cores), 32GB memory, 10 Gbps NIC

• Sagittaire Cluster (16 nodes): two 2.4GHz AMD Opteron CPUs (each with one core), 2GB RAM, 1 Gbps NIC

• SME Omegawatt power meter per node: 0.01W power resolution at 1Hz sampling rate

Figure labels: Idle, App, Storage I/O, Net transfer
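Since the power meters report one power sample per second, measured energy can be approximated by summing the samples and multiplying by the sampling period. A minimal sketch, assuming each sample represents the average power over its period; the function names and sample values are hypothetical:

```python
from typing import Sequence


def measured_energy_j(samples_w: Sequence[float], sampling_rate_hz: float = 1.0) -> float:
    """Approximate energy (J) as sum of power samples (W) times the sampling period (s)."""
    if sampling_rate_hz <= 0:
        raise ValueError("sampling rate must be positive")
    period_s = 1.0 / sampling_rate_hz
    return sum(samples_w) * period_s


def to_watt_hours(energy_j: float) -> float:
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return energy_j / 3600.0


# Four made-up one-second power samples from a single node.
samples_w = [100.0, 120.0, 110.0, 90.0]
energy_j = measured_energy_j(samples_w)
print(f"{energy_j:.1f} J = {to_watt_hours(energy_j):.4f} Wh")
```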

17

Evaluation – Synthetic benchmarks: Workflow Patterns

Montage Workflow

Pipeline

Reduce

18

Evaluation – Synthetic benchmarks: Workflow Patterns

19

Evaluation – Synthetic benchmarks: Workflow Patterns

• Average accuracy: 88%
• 20-30x faster than running the actual benchmark
• 200x-300x fewer resources (machines * runtime)

Using Default Storage System Configuration (DSS)
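The resource metric on this slide is machines * runtime. The sketch below shows how a speedup and a resource-savings factor relate under that metric. All inputs are hypothetical (a 10-machine benchmark run and a single-machine predictor run) and are chosen only to illustrate the arithmetic, not to reproduce the measured setup.

```python
def resource_cost(machines: int, runtime_s: float) -> float:
    """Resource cost in machine-seconds (machines * runtime)."""
    return machines * runtime_s


def savings_factor(actual_machines: int, actual_runtime_s: float,
                   predictor_machines: int, predictor_runtime_s: float) -> float:
    """How many times fewer machine-seconds the predictor uses than the actual run."""
    actual = resource_cost(actual_machines, actual_runtime_s)
    predicted = resource_cost(predictor_machines, predictor_runtime_s)
    return actual / predicted


# Hypothetical: the benchmark takes 600 s on 10 machines,
# while the predictor takes 30 s on 1 machine.
speedup = 600.0 / 30.0
factor = savings_factor(10, 600.0, 1, 30.0)
print(f"speedup: {speedup:.0f}x, resource savings: {factor:.0f}x")
```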

20

Evaluation – Synthetic benchmarks: Workflow Patterns

S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), Under Review, Submitted in June 2014.

L. B. Costa, H. Yang, E. Vairavanathan, A. Barros, K. Maheshwari, G. Fedak, D. S. Katz, M. Wilde, M. Ripeanu and S. Al-Kiswany, “The Case for Workflow-Aware Storage: An Opportunity Study using MosaStore”, Journal of Grid Computing 2014.

Pipeline Energy Consumption

DSS – Default Storage System Configuration

WOSS – Workflow Optimized Storage System Configuration

Q1: What are the energy savings that performance optimizations in storage can bring?

• The predictor is accurate in both configurations.
• It suggests the better configuration from an energy perspective.

21

Evaluation – Real Workflow Applications

BLAST workflow Montage workflow

22

Evaluation – Real Workflow Applications

BLAST Result (Energy 89%, Time 95%)

Montage Result (Energy 84%, Time 86%)

23

Evaluation – CPU Throttling

• CPU throttling is an important technique in which processors run at less than their maximum frequency to conserve power.

• This technique reduces instantaneous power but can prolong execution time.

Q2: What is the energy and performance impact of CPU throttling? Is it application-specific?

CPU-bound application: BLAST

I/O-bound application: pipeline benchmark
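Because throttling lowers power but can lengthen the run, its net effect on energy depends on how much the execution time grows. A minimal sketch of that tradeoff, assuming a single average power value per run; the function name and all numbers are hypothetical and are not the measured BLAST or pipeline results:

```python
def relative_energy_change(base_power_w: float, base_time_s: float,
                           throttled_power_w: float, throttled_time_s: float) -> float:
    """Fractional energy change caused by throttling; negative means savings."""
    base_energy_j = base_power_w * base_time_s
    throttled_energy_j = throttled_power_w * throttled_time_s
    return throttled_energy_j / base_energy_j - 1.0


# Hypothetical I/O-bound run: power drops 20%, runtime barely grows.
io_bound = relative_energy_change(150.0, 100.0, 120.0, 105.0)

# Hypothetical CPU-bound run: same power drop, runtime nearly doubles.
cpu_bound = relative_energy_change(150.0, 100.0, 120.0, 190.0)

print(f"I/O-bound: {io_bound:+.0%}, CPU-bound: {cpu_bound:+.0%}")
```

In this made-up example the I/O-bound run saves energy while the CPU-bound run spends more, which is the kind of application-specific outcome Q2 asks about.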

24

Evaluation – CPU Throttling

BLAST Result

Pipeline Result

Energy Time

Energy Time

17% savings when using maximum throttling

96% cost when using maximum CPU throttling

Frequency Level: 1200MHz, 1800MHz, 2300MHz

Conclusion:
• The computational and I/O characteristics of the application determine whether CPU throttling brings energy savings or energy costs.
• The predictor can be used to make these decisions.

25

Evaluation – Predicting Energy Delay Product

User’s optimization metric:
• Performance (use more machines)
• Energy
• Energy-Delay Product (EDP, energy * time)

• Consider the allocation decision.
• Use the Montage workload on two clusters to demonstrate prediction.

Q3: How can users balance time-to-solution and energy consumption when given a target application?
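One way to use predictions for the allocation decision is to compute each candidate's EDP (energy * time) and compare it with the pure-performance and pure-energy choices. A minimal sketch follows; the node counts and predicted values are made up for illustration, and `best_allocation` is a hypothetical helper, not part of the predictor.

```python
from typing import Callable, Dict, Tuple

# Maps node count -> (predicted energy in joules, predicted time in seconds).
Predictions = Dict[int, Tuple[float, float]]


def edp(energy_j: float, time_s: float) -> float:
    """Energy-Delay Product: energy * time (lower is better)."""
    return energy_j * time_s


def best_allocation(predictions: Predictions,
                    metric: Callable[[float, float], float] = edp) -> int:
    """Return the node count that minimizes the chosen metric."""
    return min(predictions, key=lambda nodes: metric(*predictions[nodes]))


# Made-up predictions for four candidate allocations.
predictions: Predictions = {
    2: (40_000.0, 400.0),
    4: (44_000.0, 220.0),
    8: (60_000.0, 150.0),
    16: (100_000.0, 120.0),
}

fastest = best_allocation(predictions, metric=lambda e, t: t)
lowest_energy = best_allocation(predictions, metric=lambda e, t: e)
lowest_edp = best_allocation(predictions)

print(f"fastest: {fastest} nodes, lowest energy: {lowest_energy} nodes, "
      f"lowest EDP: {lowest_edp} nodes")
```

With these placeholder values the three metrics pick different allocations, which is why the choice of optimization metric matters.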

26

Evaluation – Predicting Energy Delay Product

Montage EDP at Taurus

Montage EDP at Sagittaire

27

Conclusion

• This thesis presents an energy consumption predictor in the workflow application domain.

• The proposed energy model and prediction framework achieve adequate accuracy to be useful for the energy-oriented configurations this work targets.

28

Resulting Publications

Energy Prediction
• H. Yang, L. B. Costa and M. Ripeanu, “Energy Prediction for I/O Intensive Workflows Applications”, submitted to the 7th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS) 2014 (co-located with Supercomputing/SC 2014), under review.

Performance Prediction and Provisioning
• L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration and Provisioning for I/O Intensive Workflows”, In Preparation.
• L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of ICS'14, Acceptance rate: 20%. June 2014.
• L. B. Costa, S. Al-Kiswany, A. Barros, H. Yang, and M. Ripeanu, “Predicting Intermediate Storage Performance for Workflow Applications”, In Proceedings of PDSW'13, 2013.

Evaluating Storage Systems for Scientific Data in the Cloud
• K. Maheshwari, J. Wozniak, H. Yang, D. S. Katz, M. Ripeanu, V. Zavala, M. Wilde, “Evaluating Storage Systems for Scientific Data in the Cloud”, In Proceedings of the 5th Workshop on Scientific Cloud Computing (ScienceCloud), co-located with ACM HPDC 2014 (Best Paper Award).

A Workflow-Optimized Storage System
• S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “A Software Defined Storage for Scientific Workflow Applications”, In Preparation.
• S. Al-Kiswany, L. B. Costa, H. Yang, E. Vairavanathan, M. Ripeanu, “The Case for Cross-Layer Optimizations in Storage: A Workflow-Optimized Storage System”, IEEE Transactions on Parallel and Distributed Systems (TPDS), Under Review, Submitted in June 2014.
• L. B. Costa, H. Yang, E. Vairavanathan, A. Barros, K. Maheshwari, G. Fedak, D. S. Katz, M. Wilde, M. Ripeanu and S. Al-Kiswany, “The Case for Workflow-Aware Storage: An Opportunity Study using MosaStore”, accepted by Journal of Grid Computing, 2014.

29

• The system model
• Model seeding
• Workload description

System Deployment Configuration:
• Number of Storage Nodes
• Number of Client Nodes
• Chunk Size
• Replication Level
• …

Platform Performance Parameters:
• Manager Service Time ($\mu_{ma}$)
• Storage Service Time ($\mu_{sm}$)
• Client Service Time ($\mu_{cli}$)
• Remote Network Service Time ($\mu_{re\text{-}net}$)
• Local Network Service Time ($\mu_{lo\text{-}net}$)

I/O traces

Task Dependency Graph

L. B. Costa, S. Al-Kiswany, H. Yang, and M. Ripeanu, “Supporting Storage Configuration for I/O Intensive Workflows”, In Proceedings of the 28th ACM International Conference on Supercomputing, ICS'14, June 2014.

Backup Slides

30

Limitations:
• Simplification of the model
• Short tasks / small workload
• Not validated using new devices (e.g., SSD)

Backup Slides

31

Alternative Approaches:
• Utilization
• Detailed simulation
• Machine learning

Backup Slides

32

Run benchmarks in parallel to obtain the power of combined states, e.g., run the storage and network benchmarks in parallel.

91.6W, :129.0W, : 127.7W

Backup Slides

Combined states

33

Energy Composition (pipeline benchmark):
• Idle energy: 64%
• App processing: 9.2%
• Storage operations: 15.8%
• Network transfer: 10.6%

Backup Slides

34

Sagittaire power profiles

Backup Slides

175W

25W

8W

7W