1
Workflow scheduling and optimization on clouds
Maciej Malawski
AGH University of Science and Technology Department of Computer Science
Academic Computer Centre CYFRONETKraków, Poland
University of Notre DameCenter for Research Computing
Indiana, USA
2
Problem space and selected areas
• Applications:
  – Scientific workflows
  – Workflow ensembles (multiple workflows)
  – Bag-of-tasks applications
• Infrastructure:
  – IaaS clouds (Amazon, Google, Azure)
    • Single cloud
    • Multiple clouds
  – Private clouds (OpenStack)
  – Alternative/emerging infrastructures (Google App Engine, AWS Lambda, EC2 burstable instances (T2))
• Optimization objectives and constraints:
  – Cost optimization under a deadline constraint
  – Maximization of completed workflows under budget and deadline constraints
• Cloud storage aspects in scheduling:
  – Multiple clouds – inter-cloud storage and transfer
  – Single cloud – delays, caching
• Problems:
  – Resource provisioning (creating resources on demand)
  – Task scheduling (assigning tasks to resources)
  – Interplay between autoscaling systems and schedulers
• Algorithms:
  – Static planning
  – Dynamic scheduling
  – Mathematical programming
  – Adaptive
• Interesting problems:
  – Uncertainty of estimates
  – Task granularity vs. resource billing frequency
• Performance modeling:
  – Evaluation of clouds
  – Application benchmarking
Workflow Ensembles - Problem Description
• Typical research question: how much computation can we complete given the limited time and budget of our research project?
• Constraints: budget and deadline
• Goal: given budget and deadline, maximize the number of prioritized workflows completed in an ensemble
• Workflow = DAG of tasks
3
[Figure: number of VMs vs. time, bounded by the deadline. The budget corresponds to the area of the VM x time rectangle, hence NVM = Budget / Time]
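The NVM = Budget / Time relation treats the budget as the area of the VM-count x time rectangle. A minimal sketch, assuming a fixed per-VM-hour price (the $18 = 3 VMs x 6 h example on the next slide implies $1 per VM-hour):

```python
def max_vms(budget, deadline_hours, price_per_vm_hour):
    """Number of VMs that can run for the whole deadline within budget.

    The budget is the area of the VMs x time rectangle:
    budget = n_vms * deadline_hours * price_per_vm_hour.
    """
    return int(budget // (deadline_hours * price_per_vm_hour))

# Matching the deck's example: $18 budget, 6 h deadline, $1 per VM-hour.
print(max_vms(18, 6, 1))  # 3
```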
4
Dynamic Algorithm: DPDS
Workflow-Aware: WA-DPDS
Static Algorithm: SPSS

[Figure: Gantt charts comparing the schedules produced by DPDS, WA-DPDS, and SPSS for an ensemble of three workflows (a, b, c, with priorities 1, 2, 3; task labels such as a.70 give runtimes in minutes) on VM.1–VM.3 (SPSS uses VM.1–VM.4). Deadline: 360 minutes; budget: $18 = 3 VMs * 6 h. Time axis: 0–360 minutes]
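The dynamic algorithms above provision VMs at runtime. A simplified, illustrative sketch of a DPDS-style provisioning loop follows; the utilization thresholds (0.9 / 0.7) and helper names are assumptions, not the published algorithm:

```python
# Illustrative sketch of utilization-driven dynamic provisioning in the
# spirit of DPDS: start with as many VMs as the budget allows for the whole
# deadline, then scale up or down based on utilization thresholds.

def initial_vms(budget, deadline_hours, price_per_vm_hour):
    """Start with as many VMs as can run for the whole deadline within budget."""
    return max(1, int(budget // (deadline_hours * price_per_vm_hour)))

def adjust_vms(n_vms, busy_vms, spent, budget, upper=0.9, lower=0.7):
    """One autoscaling decision based on current VM utilization."""
    utilization = busy_vms / n_vms
    if utilization > upper and spent < budget:
        return n_vms + 1   # high utilization and budget remaining: scale up
    if utilization < lower and n_vms > 1:
        return n_vms - 1   # idle capacity: release a VM
    return n_vms

# With the deck's numbers ($18, 6 h, $1/VM-hour) the loop starts with 3 VMs.
print(initial_vms(18, 6, 1))  # 3
```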
Evaluation
• Simulation:
  – Enables us to explore a large parameter space
  – The simulator uses the CloudSim framework
  – CloudWorkflowSimulator: https://github.com/malawski/cloudworkflowsimulator
• Ensembles:
  – Synthetic workflows generated using parameters from real applications (Montage, CyberShake, LIGO, SIPHT, Epigenomics)
  – Randomized using different distributions and priorities
• Experiments:
  – Determine relative performance
  – Measure the effect of low-quality estimates and delays

M. Malawski, G. Juve, E. Deelman, J. Nabrzyski: Algorithms for Cost- and Deadline-Constrained Provisioning for Scientific Workflow Ensembles in IaaS Clouds. Future Generation Computer Systems, vol. 48, pp. 1-18 (July 2015), http://dx.doi.org/10.1016/j.future.2015.01.004
6
Model of storage and data access in clouds
• Problem: most scheduling algorithms assume p2p communication between nodes
• Scientific workflows are data-intensive:
  – Communication-to-computation ratio
  – Existing cloud storage technologies
• We assume:
  – 1..N replicas
  – Bandwidth limited at the VM and at the replica endpoint
  – Latency
  – Fair sharing of bandwidth
• We can model:
  – In-memory storage – memcache
  – Cloud storage – Amazon S3
  – Shared filesystem – NFS
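Under these assumptions, the transfer time for a single file can be estimated as below. This is an illustrative sketch of the model, with hypothetical names and parameters:

```python
# Sketch of the transfer-time estimate implied by the storage model:
# a transfer is capped both by the VM's link and by the replica endpoint,
# whose bandwidth is shared fairly among concurrent readers.

def transfer_time(size_mb, vm_bw, replica_bw, concurrent_readers, latency_s=0.0):
    """Estimated time (s) to fetch one file from a storage replica.

    vm_bw and replica_bw are bandwidth caps in MB/s.
    """
    effective_bw = min(vm_bw, replica_bw / concurrent_readers)
    return latency_s + size_mb / effective_bw

# A lone reader is limited by its 100 MB/s VM link; with 10 readers the
# 500 MB/s replica endpoint becomes the bottleneck (50 MB/s each).
print(transfer_time(1000, 100, 500, 1))   # 10.0
print(transfer_time(1000, 100, 500, 10))  # 20.0
```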
7
Storage and locality-aware algorithms
• Include data transfer estimates in task runtimes:
  – Storage-Aware DPDS (SA-DPDS)
  – Storage- and Workflow-Aware DPDS (SWA-DPDS)
  – Storage-Aware SPSS (SA-SPSS)
• New scheduling algorithms that take advantage of caches and file locality to improve performance:
  – Dynamic Provisioning Locality-Aware Scheduling (DPLS)
  – Storage- and Workflow-Aware DPLS (SWA-DPLS)
8
Locality-Aware Scheduling
• Examines the virtual machines' caches at the time of task submission
• Chooses the virtual machine on which the task is predicted to finish earliest
• Uses both runtime and file transfer time estimates

[Figure: task placement with a cached input file vs. no cache; a low-priority task is shown]
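The selection rule described above can be sketched as follows; this is an illustrative toy version, not the authors' DPLS implementation:

```python
# Pick the VM where the task is predicted to finish earliest, skipping the
# input transfer when the VM's cache already holds the file.

def predicted_finish(vm, task):
    """Predicted finish time = time the VM frees up + transfer + runtime."""
    transfer = 0.0 if task["input"] in vm["cache"] else task["transfer_time"]
    return vm["free_at"] + transfer + task["runtime"]

def choose_vm(vms, task):
    """Return the VM with the earliest predicted finish time for the task."""
    return min(vms, key=lambda vm: predicted_finish(vm, task))

vms = [
    {"name": "vm1", "free_at": 5.0, "cache": {"f1"}},  # busy, but caches the input
    {"name": "vm2", "free_at": 0.0, "cache": set()},   # idle, must transfer it
]
task = {"input": "f1", "runtime": 10.0, "transfer_time": 8.0}
print(choose_vm(vms, task)["name"])  # vm1 (finishes at 15.0 vs 18.0)
```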
9
Selected results for in-memory storage (memcache)
10
Parallel transfer and cache hit ratio
• Applications with a high degree of parallelism can benefit from parallel transfers
• Caching can significantly improve cloud storage performance
Inaccurate Runtime Estimate Results
11
[Figure: cost normalized by budget and makespan normalized by deadline, under inaccurate runtime estimates]
12
Task granularity
• Workflows with many short tasks are much easier to schedule using simple dynamic algorithms
• When task runtimes approach the cloud billing cycle (e.g. 1 hour), static planning algorithms have an advantage
Montage with artificially stretched tasks
13
Cost optimization of applications on multiple clouds
• Infrastructure model:
  – Multiple compute and storage clouds
  – Heterogeneous instance types
• Application model:
  – Bag of tasks
  – Multi-level workflows
• Mathematical modeling with AMPL and CMPL
• Cost optimization under deadline constraints
• Mixed integer programming
• Bonmin, Cplex solvers
• Models for fine-grained and coarse-grained workflows
• Adaptive scheduling model:
  – Static scheduling level-by-level

M. Malawski, K. Figiela, J. Nabrzyski: Cost minimization for computational applications on hybrid cloud infrastructures. Future Generation Computer Systems, vol. 29, issue 7, September 2013, pp. 1786-1794, http://dx.doi.org/10.1016/j.future.2013.01.004
M. Malawski, K. Figiela, M. Bubak, E. Deelman, J. Nabrzyski: Scheduling multi-level deadline-constrained scientific workflows on clouds based on cost optimization. Scientific Programming (2015), http://dx.doi.org/10.1155/2015/680271
T. Dziok, K. Figiela, M. Malawski: Adaptive Multi-level Workflow Scheduling with Uncertain Task Estimates. PPAM 2015 (accepted)
14
PaaSage – Deployment and Execution of Scientific Workflows in a Model-based Cloud Platform

Motivation
• Provisioning of multi-cloud resources for scientific workflows
• Loosely coupled integration with cloud management platforms
• Leverage cloud elasticity for autoscaling of scientific workflows driven by the workflow execution stage

Objectives
• Integrate the HyperFlow workflow runtime environment with the PaaSage cloud platform
• Application-agnostic interplay of an application-specific workflow scheduler with the generic provisioning and autoscaling components of PaaSage

Novelty
• On-demand deployment of the workflow runtime environment as part of the workflow application
• Workflow engine as another application component driving the execution of other components
• Avoidance of tight coupling to a particular cloud infrastructure and middleware

PaaSage platform: an open and integrated platform to support model-driven development, deployment and adaptive execution of multi-cloud applications.

Integration with PaaSage
• The CAMEL application model is automatically generated from the HyperFlow workflow description; it includes the initial deployment plan and the scalability rules that control autoscaling behavior
• Monitoring information sent from the task scheduler and VM workers to the PaaSage Executionware triggers the scalability rules and automatic scaling of the workflow application
Bartosz Baliś, Marian Bubak, Kamil Figiela, Maciej Malawski, Maciej Pawlik, Towards Deployment and Autoscaling of Scientific Workflows with HyperFlow and PaaSage, CGW’14
15
Levee Monitoring Application – ISMOP project
• Levee breach threat due to a passing wave
• High water levels lasting for up to 2 weeks
• Large areas of levees affected (100+ km)
16
ISMOP threat level assessment workflow
Implemented in the HyperFlow workflow engine
17
ISMOP resource provisioning model
• Cost optimization under a deadline
• Bag-of-tasks model:
  – Selection of dominating tasks
  – Uniform task runtimes
• Performance model: T = f(v, d, s, …)
  – T – total computing time
  – v – number of VMs
  – d – time window in days
  – s – number of tasks (sections)
• Equation (1):
  – Parameters a, b, c to be determined experimentally
  – Solve eq. (1) to compute the number of VMs given a deadline
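Equation (1) itself is not reproduced in this transcript, so the sketch below assumes an illustrative linear form T(v, d, s) = a + (b + c·d)·s / v and inverts it for the VM count; the model form and the parameter values are assumptions, not the project's actual equation:

```python
import math

# Assumed illustrative model: per-task work grows linearly with the time
# window d and is divided across v VMs, plus a fixed overhead a.

def total_time(v, d, s, a, b, c):
    """Total computing time T under the assumed model."""
    return a + (b + c * d) * s / v

def vms_for_deadline(deadline, d, s, a, b, c):
    """Smallest integer v such that T(v, d, s) <= deadline."""
    if deadline <= a:
        raise ValueError("deadline is below the fixed overhead a")
    return math.ceil((b + c * d) * s / (deadline - a))

# e.g. a=1, b=2, c=0.5, a 16-day window, 128 sections, deadline of 65:
print(vms_for_deadline(65, 16, 128, 1, 2, 0.5))  # 20 VMs
```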
18
ISMOP Experiments
• Setup: private cloud infrastructure
  – a node with 8 cores (Xeon E5-2650)
  – virtual machines (1 VCPU, 512 MB RAM)
  – data for simulated scenarios (244 MB total) on local disks
• Test runs:
  – 128 sections, 16 days
  – 16 VMs, 1 day
  – 128 sections, 1 day
  – 1024 sections, 1 day
  – 16 VMs, 16 VMs
• Warmup tasks:
19
Analysis of results
• Warmup tasks clearly separated as outliers
• Linear functions
• Parameters a, b, c determined using a non-linear fit
• The model fits the data well

[Figure: measured runtimes with the warmup tasks marked]
Bartosz Balis, Marek Kasztelnik, Maciej Malawski, Piotr Nowakowski, Bartosz Wilk, Maciej Pawlik, Marian Bubak, Execution Management and Efficient Resource Provisioning for Flood Decision Support, Procedia Computer Science, Volume 51, 2015, Pages 2377-2386, ISSN 1877-0509, http://dx.doi.org/10.1016/j.procs.2015.05.412.
20
IaaS provider evaluation. Criteria (weight): EEA Zoning (20), jClouds API Support (20), BLOB storage support (10), Per-hour instance billing (5), API Access (5), Published price (5), VM Image Import/Export (3), Relational DB support (2)

Rank  Provider           Criterion values (in the order above)  Score
1     Amazon AWS         1 1 1 1 1 1 0 1                        27
2     Rackspace          1 1 1 1 1 1 0 1                        27
3     SoftLayer          1 1 1 1 1 1 0 0                        25
4     CloudSigma         1 1 0 1 1 1 1 0                        18
5     ElasticHosts       1 1 0 1 1 1 1 0                        18
6     Serverlove         1 1 0 1 1 1 1 0                        18
7     GoGrid             1 1 0 1 1 1 0 0                        15
8     Terremark ecloud   1 1 0 1 1 0 1 0                        13
9     RimuHosting        1 1 0 0 1 1 0 1                        12
10    Stratogen          1 1 0 0 1 0 1 0                         8
11    Bluelock           1 1 0 0 1 0 0 0                         5
12    Fujitsu GCP        1 1 0 0 1 0 0 0                         5
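A weighted score of this kind is simply a dot product of per-criterion flags with the criterion weights. A generic sketch using the table's weights, with hypothetical feature values:

```python
# Weighted-scoring sketch for ranking providers; the weights follow the
# survey table above, but the example feature values are hypothetical.

WEIGHTS = {
    "eea_zoning": 20, "jclouds_api_support": 20, "blob_storage": 10,
    "hourly_billing": 5, "api_access": 5, "published_price": 5,
    "image_import_export": 3, "relational_db": 2,
}

def score(features):
    """Weighted sum of 0/1 criterion flags."""
    return sum(WEIGHTS[name] * flag for name, flag in features.items())

all_criteria_met = {name: 1 for name in WEIGHTS}
print(score(all_criteria_met))  # 70, the sum of all weights
```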
Cloud performance evaluation
• Performance of VM deployment times
• Virtualization overhead
• Evaluation of open-source cloud stacks (Eucalyptus, OpenNebula, OpenStack)
• Survey of European public cloud providers
• Performance evaluation of top cloud providers (EC2, Rackspace, SoftLayer)
• A grant from Amazon has been obtained

M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski and S. Varma: Evaluation of Cloud Providers for VPH Applications, poster at CCGrid2013 - 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013
21
Cloud and Big Data Benchmarking and Verification Methodology
• Methodology for evaluation of systems and applications:
  – Qualitative metrics (architectures, functionality)
  – Quantitative metrics (performance, stability, cost)
  – Test scenarios, test cases and parameters
  – Experiment planning, analysis of results
• Selection of benchmarks:
  – Portfolio of standard benchmarks
  – Design of application-specific scenarios
• Target platforms:
  – IaaS clouds (public, private)
  – Hybrid clouds with cloud bursting
  – Real-time Big Data processing systems (Hadoop, Spark, ElasticSearch)
• Collaboration with Samsung R&D Polska:
  – Methodology applied to the cloud infrastructure of the industrial partner
  – Consultancy on the analysis of results and development of a Testing-as-a-Service (TaaS) system
K. Zieliński, M. Malawski, M. Jarząb, S. Zieliński, K. Grzegorczyk, T. Szepieniec, and M. Zyśk: Evaluation Methodology of Converged Cloud Environments. In: K. Wiatr, J. Kitowski, M. Bubak (Eds) Proceedings of the Seventh ACC Cyfronet AGH Users’ Conference, ACC CYFRONET AGH, Kraków, ISBN 978-83-61433-09-5, pp. 77-78 (2014)
22
Thank you!
• DICE Team at AGH & Cyfronet: Marian Bubak, Piotr Nowakowski, Bartosz Baliś, Maciej Pawlik, Marek Kasztelnik, Bartosz Wilk, Tomasz Bartyński, Jan Meizner, Daniel Harężlak
• PhD student: Kamil Figiela
• MSc students: Piotr Bryk, Tomasz Dziok
• Notre Dame: Jarek Nabrzyski
• USC/ISI: Ewa Deelman, Gideon Juve
• Projects & grants: EU FP7 VPH-Share, PL-Grid, EU FP7 PaaSage, ISMOP (PL)
• References:
  – CloudWorkflowSimulator: https://github.com/malawski/cloudworkflowsimulator
  – HyperFlow: https://github.com/dice-cyfronet/hyperflow/
  – DICE Team: http://dice.cyfronet.pl