1
Workflow scheduling and optimization on clouds
Maciej Malawski
AGH University of Science and Technology Department of Computer Science
Academic Computer Centre CYFRONETKraków, Poland
University of Notre DameCenter for Research Computing
Indiana, USA
2
Problem space and selected areas
• Applications:
  – Scientific workflows
  – Workflow ensembles (multiple workflows)
  – Bag-of-tasks applications
• Infrastructure:
  – IaaS clouds (Amazon, Google, Azure)
    • Single cloud
    • Multiple clouds
  – Private clouds (OpenStack)
  – Alternative/emerging infrastructures (Google App Engine, AWS Lambda, EC2 burstable instances (T2))
• Optimization objectives and constraints:
  – Cost optimization under a deadline constraint
  – Maximization of completed workflows under budget and deadline constraints
• Cloud storage aspects in scheduling:
  – Multiple clouds – inter-cloud storage and transfer
  – Single cloud – delays, caching
• Problems:
  – Resource provisioning (creating resources on demand)
  – Task scheduling (assigning tasks to resources)
  – Interplay between autoscaling systems and schedulers
• Algorithms:
  – Static planning
  – Dynamic scheduling
  – Mathematical programming
  – Adaptive
• Interesting problems:
  – Uncertainty of estimates
  – Task granularity vs. resource billing frequency
• Performance modeling:
  – Evaluation of clouds
  – Application benchmarking
Workflow Ensembles - Problem Description
• Typical research question: how much computation can we complete given the limited time and budget of our research project?
• Constraints: budget and deadline
• Goal: given budget and deadline, maximize the number of prioritized workflows completed in an ensemble
• Workflow = DAG of tasks
3
[Figure: number of VMs vs. time, bounded by the deadline. The budget corresponds to the area of the VM x time rectangle, hence NVM = Budget / Time]
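The NVM = Budget / Time relation treats the budget as the area of the VM-count x time rectangle. A minimal sketch, assuming a fixed per-VM-hour price (the $18 = 3 VMs x 6 h example on the next slide implies $1 per VM-hour):

```python
def max_vms(budget, deadline_hours, price_per_vm_hour):
    """Number of VMs that can run for the whole deadline within budget.

    The budget is the area of the VMs x time rectangle:
    budget = n_vms * deadline_hours * price_per_vm_hour.
    """
    return int(budget // (deadline_hours * price_per_vm_hour))

# Matching the deck's example: $18 budget, 6 h deadline, $1 per VM-hour.
print(max_vms(18, 6, 1))  # 3
```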
4
Dynamic Algorithm: DPDS
Workflow-Aware: WA-DPDS
Static Algorithm: SPSS

[Figure: Gantt charts comparing the schedules produced by DPDS, WA-DPDS, and SPSS for an ensemble of three workflows (a, b, c, with priorities 1, 2, 3; task labels such as a.70 give runtimes in minutes) on VM.1–VM.3 (SPSS uses VM.1–VM.4). Deadline: 360 minutes; budget: $18 = 3 VMs * 6 h. Time axis: 0–360 minutes]
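The dynamic algorithms above provision VMs at runtime. A simplified, illustrative sketch of a DPDS-style provisioning loop follows; the utilization thresholds (0.9 / 0.7) and helper names are assumptions, not the published algorithm:

```python
# Illustrative sketch of utilization-driven dynamic provisioning in the
# spirit of DPDS: start with as many VMs as the budget allows for the whole
# deadline, then scale up or down based on utilization thresholds.

def initial_vms(budget, deadline_hours, price_per_vm_hour):
    """Start with as many VMs as can run for the whole deadline within budget."""
    return max(1, int(budget // (deadline_hours * price_per_vm_hour)))

def adjust_vms(n_vms, busy_vms, spent, budget, upper=0.9, lower=0.7):
    """One autoscaling decision based on current VM utilization."""
    utilization = busy_vms / n_vms
    if utilization > upper and spent < budget:
        return n_vms + 1   # high utilization and budget remaining: scale up
    if utilization < lower and n_vms > 1:
        return n_vms - 1   # idle capacity: release a VM
    return n_vms

# With the deck's numbers ($18, 6 h, $1/VM-hour) the loop starts with 3 VMs.
print(initial_vms(18, 6, 1))  # 3
```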
Evaluation
• Simulation:
  – Enables us to explore a large parameter space
  – The simulator uses the CloudSim framework
  – CloudWorkflowSimulator: https://github.com/malawski/cloudworkflowsimulator
• Ensembles:
  – Synthetic workflows generated using parameters from real applications (Montage, CyberShake, LIGO, SIPHT, Epigenomics)
  – Randomized using different distributions and priorities
• Experiments:
  – Determine relative performance
  – Measure the effect of low-quality estimates and delays

M. Malawski, G. Juve, E. Deelman, J. Nabrzyski: Algorithms for Cost- and Deadline-Constrained Provisioning for Scientific Workflow Ensembles in IaaS Clouds. Future Generation Computer Systems, vol. 48, pp. 1-18 (July 2015), http://dx.doi.org/10.1016/j.future.2015.01.004
6
Model of storage and data access in clouds
• Problem: most scheduling algorithms assume p2p communication between nodes
• Scientific workflows are data-intensive:
  – Communication-to-computation ratio
  – Existing cloud storage technologies
• We assume:
  – 1..N replicas
  – Bandwidth limited at the VM and at the replica endpoint
  – Latency
  – Fair sharing of bandwidth
• We can model:
  – In-memory storage – memcache
  – Cloud storage – Amazon S3
  – Shared filesystem – NFS
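Under these assumptions, the transfer time for a single file can be estimated as below. This is an illustrative sketch of the model, with hypothetical names and parameters:

```python
# Sketch of the transfer-time estimate implied by the storage model:
# a transfer is capped both by the VM's link and by the replica endpoint,
# whose bandwidth is shared fairly among concurrent readers.

def transfer_time(size_mb, vm_bw, replica_bw, concurrent_readers, latency_s=0.0):
    """Estimated time (s) to fetch one file from a storage replica.

    vm_bw and replica_bw are bandwidth caps in MB/s.
    """
    effective_bw = min(vm_bw, replica_bw / concurrent_readers)
    return latency_s + size_mb / effective_bw

# A lone reader is limited by its 100 MB/s VM link; with 10 readers the
# 500 MB/s replica endpoint becomes the bottleneck (50 MB/s each).
print(transfer_time(1000, 100, 500, 1))   # 10.0
print(transfer_time(1000, 100, 500, 10))  # 20.0
```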
7
Storage and locality-aware algorithms
• Include data transfer estimates in task runtimes:
  – Storage-Aware DPDS (SA-DPDS)
  – Storage- and Workflow-Aware DPDS (SWA-DPDS)
  – Storage-Aware SPSS (SA-SPSS)
• New scheduling algorithms that take advantage of caches and file locality to improve performance:
  – Dynamic Provisioning Locality-Aware Scheduling (DPLS)
  – Storage- and Workflow-Aware DPLS (SWA-DPLS)
8
Locality-Aware Scheduling
• Examines the virtual machines' caches at the time of task submission
• Chooses the virtual machine on which the task is predicted to finish earliest
• Uses both runtime and file transfer time estimates

[Figure: task placement with a cached input file vs. no cache; a low-priority task is shown]
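The selection rule described above can be sketched as follows; this is an illustrative toy version, not the authors' DPLS implementation:

```python
# Pick the VM where the task is predicted to finish earliest, skipping the
# input transfer when the VM's cache already holds the file.

def predicted_finish(vm, task):
    """Predicted finish time = time the VM frees up + transfer + runtime."""
    transfer = 0.0 if task["input"] in vm["cache"] else task["transfer_time"]
    return vm["free_at"] + transfer + task["runtime"]

def choose_vm(vms, task):
    """Return the VM with the earliest predicted finish time for the task."""
    return min(vms, key=lambda vm: predicted_finish(vm, task))

vms = [
    {"name": "vm1", "free_at": 5.0, "cache": {"f1"}},  # busy, but caches the input
    {"name": "vm2", "free_at": 0.0, "cache": set()},   # idle, must transfer it
]
task = {"input": "f1", "runtime": 10.0, "transfer_time": 8.0}
print(choose_vm(vms, task)["name"])  # vm1 (finishes at 15.0 vs 18.0)
```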
9
Selected results for in-memory storage (memcache)
10
Parallel transfer and cache hit ratio
• Applications with a high degree of parallelism can benefit from parallel transfers
• Caching can significantly improve cloud storage performance
Inaccurate Runtime Estimate Results
11
[Figure: cost normalized by budget and makespan normalized by deadline, under inaccurate runtime estimates]
12
Task granularity
• Workflows with many short tasks are much easier to schedule using simple dynamic algorithms
• When task runtimes approach the cloud billing cycle (e.g. 1 hour), static planning algorithms have an advantage
Montage with artificially stretched tasks
13
Cost optimization of applications on multiple clouds
• Infrastructure model:
  – Multiple compute and storage clouds
  – Heterogeneous instance types
• Application model:
  – Bag of tasks
  – Multi-level workflows
• Mathematical modeling with AMPL and CMPL
• Cost optimization under deadline constraints
• Mixed integer programming
• Bonmin, Cplex solvers
• Models for fine-grained and coarse-grained workflows
• Adaptive scheduling model:
  – Static scheduling level-by-level

M. Malawski, K. Figiela, J. Nabrzyski: Cost minimization for computational applications on hybrid cloud infrastructures. Future Generation Computer Systems, vol. 29, issue 7, September 2013, pp. 1786-1794, http://dx.doi.org/10.1016/j.future.2013.01.004
M. Malawski, K. Figiela, M. Bubak, E. Deelman, J. Nabrzyski: Scheduling multi-level deadline-constrained scientific workflows on clouds based on cost optimization. Scientific Programming (2015), http://dx.doi.org/10.1155/2015/680271
T. Dziok, K. Figiela, M. Malawski: Adaptive Multi-level Workflow Scheduling with Uncertain Task Estimates. PPAM 2015 (accepted)
14
PaaSage – Deployment and Execution of Scientific Workflows in a Model-based Cloud Platform

Motivation
• Provisioning of multi-cloud resources for scientific workflows
• Loosely coupled integration with cloud management platforms
• Leverage cloud elasticity for autoscaling of scientific workflows driven by the workflow execution stage

Objectives
• Integrate the HyperFlow workflow runtime environment with the PaaSage cloud platform
• Application-agnostic interplay of an application-specific workflow scheduler with the generic provisioning and autoscaling components of PaaSage

Novelty
• On-demand deployment of the workflow runtime environment as part of the workflow application
• Workflow engine as another application component driving the execution of other components
• Avoidance of tight coupling to a particular cloud infrastructure and middleware

PaaSage platform: an open and integrated platform to support model-driven development, deployment and adaptive execution of multi-cloud applications.

Integration with PaaSage
• The CAMEL application model is automatically generated from the HyperFlow workflow description; it includes the initial deployment plan and the scalability rules that control autoscaling behavior
• Monitoring information sent from the task scheduler and VM workers to the PaaSage Executionware triggers the scalability rules and automatic scaling of the workflow application
Bartosz Baliś, Marian Bubak, Kamil Figiela, Maciej Malawski, Maciej Pawlik, Towards Deployment and Autoscaling of Scientific Workflows with HyperFlow and PaaSage, CGW’14
15
Levee Monitoring Application – ISMOP project
• Levee breach threat due to a passing wave
• High water levels lasting for up to 2 weeks
• Large areas of levees affected (100+ km)
16
ISMOP threat level assessment workflow
Implemented in the HyperFlow workflow engine
17
ISMOP resource provisioning model
• Cost optimization under a deadline
• Bag-of-tasks model:
  – Selection of dominating tasks
  – Uniform task runtimes
• Performance model: T = f(v, d, s, …)
  – T – total computing time
  – v – number of VMs
  – d – time window in days
  – s – number of tasks (sections)
• Equation (1):
  – Parameters a, b, c to be determined experimentally
  – Solve eq. (1) to compute the number of VMs given a deadline
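Equation (1) itself is not reproduced in this transcript, so the sketch below assumes an illustrative linear form T(v, d, s) = a + (b + c·d)·s / v and inverts it for the VM count; the model form and the parameter values are assumptions, not the project's actual equation:

```python
import math

# Assumed illustrative model: per-task work grows linearly with the time
# window d and is divided across v VMs, plus a fixed overhead a.

def total_time(v, d, s, a, b, c):
    """Total computing time T under the assumed model."""
    return a + (b + c * d) * s / v

def vms_for_deadline(deadline, d, s, a, b, c):
    """Smallest integer v such that T(v, d, s) <= deadline."""
    if deadline <= a:
        raise ValueError("deadline is below the fixed overhead a")
    return math.ceil((b + c * d) * s / (deadline - a))

# e.g. a=1, b=2, c=0.5, a 16-day window, 128 sections, deadline of 65:
print(vms_for_deadline(65, 16, 128, 1, 2, 0.5))  # 20 VMs
```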
18
ISMOP Experiments
• Setup: private cloud infrastructure
  – a node with 8 cores (Xeon E5-2650)
  – virtual machines (1 VCPU, 512 MB RAM)
  – data for simulated scenarios (244 MB total) on local disks
• Test runs:
  – 128 sections, 16 days
  – 16 VMs, 1 day
  – 128 sections, 1 day
  – 1024 sections, 1 day
  – 16 VMs, 16 VMs
• Warmup tasks:
19
Analysis of results
• Warmup tasks clearly separated as outliers
• Linear functions
• Parameters a, b, c determined using a non-linear fit
• The model fits the data well

[Figure: measured runtimes with the warmup tasks marked]
Bartosz Balis, Marek Kasztelnik, Maciej Malawski, Piotr Nowakowski, Bartosz Wilk, Maciej Pawlik, Marian Bubak, Execution Management and Efficient Resource Provisioning for Flood Decision Support, Procedia Computer Science, Volume 51, 2015, Pages 2377-2386, ISSN 1877-0509, http://dx.doi.org/10.1016/j.procs.2015.05.412.
20
IaaS provider evaluation. Criteria (weight): EEA Zoning (20), jClouds API Support (20), BLOB storage support (10), Per-hour instance billing (5), API Access (5), Published price (5), VM Image Import/Export (3), Relational DB support (2)

Rank  Provider           Criterion values (in the order above)  Score
1     Amazon AWS         1 1 1 1 1 1 0 1                        27
2     Rackspace          1 1 1 1 1 1 0 1                        27
3     SoftLayer          1 1 1 1 1 1 0 0                        25
4     CloudSigma         1 1 0 1 1 1 1 0                        18
5     ElasticHosts       1 1 0 1 1 1 1 0                        18
6     Serverlove         1 1 0 1 1 1 1 0                        18
7     GoGrid             1 1 0 1 1 1 0 0                        15
8     Terremark ecloud   1 1 0 1 1 0 1 0                        13
9     RimuHosting        1 1 0 0 1 1 0 1                        12
10    Stratogen          1 1 0 0 1 0 1 0                         8
11    Bluelock           1 1 0 0 1 0 0 0                         5
12    Fujitsu GCP        1 1 0 0 1 0 0 0                         5
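A weighted score of this kind is simply a dot product of per-criterion flags with the criterion weights. A generic sketch using the table's weights, with hypothetical feature values:

```python
# Weighted-scoring sketch for ranking providers; the weights follow the
# survey table above, but the example feature values are hypothetical.

WEIGHTS = {
    "eea_zoning": 20, "jclouds_api_support": 20, "blob_storage": 10,
    "hourly_billing": 5, "api_access": 5, "published_price": 5,
    "image_import_export": 3, "relational_db": 2,
}

def score(features):
    """Weighted sum of 0/1 criterion flags."""
    return sum(WEIGHTS[name] * flag for name, flag in features.items())

all_criteria_met = {name: 1 for name in WEIGHTS}
print(score(all_criteria_met))  # 70, the sum of all weights
```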
Cloud performance evaluation
• Performance of VM deployment times
• Virtualization overhead
• Evaluation of open-source cloud stacks (Eucalyptus, OpenNebula, OpenStack)
• Survey of European public cloud providers
• Performance evaluation of top cloud providers (EC2, Rackspace, SoftLayer)
• A grant from Amazon has been obtained

M. Bubak, M. Kasztelnik, M. Malawski, J. Meizner, P. Nowakowski and S. Varma: Evaluation of Cloud Providers for VPH Applications, poster at CCGrid2013 - 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Delft, the Netherlands, May 13-16, 2013
21
Cloud and Big Data Benchmarking and Verification Methodology
• Methodology for evaluation of systems and applications:
  – Qualitative metrics (architectures, functionality)
  – Quantitative metrics (performance, stability, cost)
  – Test scenarios, test cases and parameters
  – Experiment planning, analysis of results
• Selection of benchmarks:
  – Portfolio of standard benchmarks
  – Design of application-specific scenarios
• Target platforms:
  – IaaS clouds (public, private)
  – Hybrid clouds with cloud bursting
  – Real-time Big Data processing systems (Hadoop, Spark, ElasticSearch)
• Collaboration with Samsung R&D Polska:
  – Methodology applied to the cloud infrastructure of the industrial partner
  – Consultancy on the analysis of results and development of a Testing-as-a-Service (TaaS) system
K. Zieliński, M. Malawski, M. Jarząb, S. Zieliński, K. Grzegorczyk, T. Szepieniec, and M. Zyśk: Evaluation Methodology of Converged Cloud Environments. In: K. Wiatr, J. Kitowski, M. Bubak (Eds) Proceedings of the Seventh ACC Cyfronet AGH Users’ Conference, ACC CYFRONET AGH, Kraków, ISBN 978-83-61433-09-5, pp. 77-78 (2014)
22
Thank you!
• DICE Team at AGH & Cyfronet: Marian Bubak, Piotr Nowakowski, Bartosz Baliś, Maciej Pawlik, Marek Kasztelnik, Bartosz Wilk, Tomasz Bartyński, Jan Meizner, Daniel Harężlak
• PhD student: Kamil Figiela
• MSc students: Piotr Bryk, Tomasz Dziok
• Notre Dame: Jarek Nabrzyski
• USC/ISI: Ewa Deelman, Gideon Juve
• Projects & grants: EU FP7 VPH-Share, PL-Grid, EU FP7 PaaSage, ISMOP (PL)
• References:
  – CloudWorkflowSimulator: https://github.com/malawski/cloudworkflowsimulator
  – HyperFlow: https://github.com/dice-cyfronet/hyperflow/
  – DICE Team: http://dice.cyfronet.pl