Execution Environment for On-Demand Computing Services Based on Shared Clusters

PhD thesis, Grenoble University
By Rodrigue Chakode (LIG/INRIA, Equipe Mescal)
Advisors: Jean-François Méhaut, Maurice Tchuenté

Description

This thesis studies resource management for on-demand computing services on a shared cluster. In this context, the aim is to propose tools that automatically allocate resources for executing on-demand user requests, share resources among the services in proportion to each business's investment, and maximize resource utilization. Funded by the Minalogic global business cluster through the Ciloe Project (http://ciloe.minalogic.net), this work targets organizations such as SMBs that cannot afford to purchase and maintain a dedicated computing infrastructure. We first carried out an in-depth survey of the areas of on-demand computing and high-performance computing. Based on this survey, we defined a virtualized architecture that enables the dynamic execution of user requests through a dedicated resource manager. We then proposed policies and algorithms flexible enough to offer a suitable tradeoff between fairness and resource utilization. Working within an industrial collaboration, we developed a prototype of our proposal as a proof of concept. Based on open standards, this prototype relies on existing virtualization tools such as OpenNebula for allocating and manipulating virtual machines on the cluster's nodes. Using this prototype and various workloads, we carried out experiments to evaluate our architecture and scheduling algorithms. The results show that our contributions achieve the expected goals while being reliable and efficient.

Transcript of Execution Environment for On-Demand Computing Services Based on Shared Clusters

Page 1: Execution Environment for On-Demand Computing Services Based on Shared Clusters


Execution Environment for On-Demand Computing Services Based on Shared Clusters

PhD thesis, Grenoble University

By Rodrigue Chakode (LIG/INRIA, Equipe Mescal)

Advisors: Jean-François Méhaut, Maurice Tchuenté

Page 2:

Cloud Computing in a Nutshell

◉ Enables computing features as services

◉ Free or commercial services accessible over network

◉ On-demand and elastic access, with utility billing

– Customers (users of the service) only pay for what they use, aka pay-as-you-go

– Requests for more or less features should be satisfied quickly

◉ Services set up transparently for customers

– They don't have to care about how the service is enabled

Page 3:

Context Statement on Cloud Computing

◉ Various sorts of cloud services

– Infrastructure-as-a-Service, Platform-as-a-Service, Software-as-a-Service, Data-as-a-Service, Translation-as-a-Service...

– Almost everything could be a service (XaaS)

◉ Requires setting up a suitable computing infrastructure

– Servers, storage, network fabrics, cooling systems...

◉ May require significant investments

– Out of reach for many small or medium businesses (SMBs)

– Market currently dominated by the biggest organizations

Introduction

Page 4:

Challenges for HPC

◉ Many software packages require intensive computing capabilities

– E.g. EDA Applications (Ciloe Project)

– Integrated circuits need to be simulated before manufacturing

◉ Computing architectures are increasingly parallel

– SMP, NUMA, GPU, Cluster... and soon many-core architectures

◉ HPC applications run on clusters of multicore nodes (SMP/NUMA)

◉ Also expensive

Example of a cluster. Credit: CEA

Introduction

Page 5:

Bring HPC Services into Clouds

◉ Services requiring intensive computations

◉ Services enabled from a mutualized cluster

– Cluster supported by several businesses

– Each business providing its own service

– Cluster's resources shared among the services

◉ Study within the context of an industrial collaboration

– The Ciloe Project [http://ciloe.minalogic.net]

– Three SMBs developing EDA applications are involved

Introduction

Page 6:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 7:

Resource Management for HPC SaaS Services

◉ What is a service?

– Computes customer data with a specific application

– Input specifies an application and the data

– Output retrieved after the computation

– No further interaction necessary

Problem Statement

Page 8:

Related Research Issues

◉ Data Management

◉ Resilience and Fault Tolerance

◉ Security and Privacy

◉ Resource Management

Problem Statement

Page 9:

Scheduling Problems

◉ Share the cluster's resources among the services

– According to the investments of the different businesses

◉ Maximize the use of resources

– Use idle resources to run pending requests

– Run miscellaneous tasks on idle resources in a best-effort way

◉ Minimize the impact of selfish behaviors

– A business can under-invest while needing a lot of resources

Problem Statement
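The first objective, shares proportional to investments, can be made concrete with a small sketch. This is an illustrative rendering with made-up names, not the thesis's actual algorithm: each business's entitlement is its investment over the total investment, times the cluster size.

```python
from fractions import Fraction

def entitlements(investments, total_resources):
    """Split a cluster's resources among businesses in proportion to
    their investments (illustrative sketch, not the thesis's code)."""
    total = sum(investments.values())
    return {b: Fraction(inv, total) * total_resources
            for b, inv in investments.items()}

# Three businesses funding a 7-resource cluster, as in the later case study.
shares = entitlements({"B1": 2, "B2": 2, "B3": 3}, 7)
# B1 and B2 are each entitled to 2 resources, B3 to 3.
```

Exact fractions avoid rounding artifacts when entitlements are later compared against actual usage.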

Page 10:

Resource Allocation for On-demand Services

◉ Running requests in a dynamic way

– Resources should be allocated dynamically

– Allocated resources should be freed up automatically once a request is completed

– Handle Input/Output data in a transparent way

◉ Need to think of resource partitioning

– Modern computing nodes have several cores

– The number of cores required by certain tasks can be less than the number available on a node

Problem Statement

Page 11:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 12:

Background on Existing SaaS Clouds

◉ Target office and collaborative applications

– E.g. Google Docs, Salesforce, Office365...

– Need for interactivity

◉ SaaS cloud as a layer on top of a PaaS

– PaaS can rely on an IaaS layer

– IaaS enables on-demand resource allocation

• Virtualization plays an important role

◉ Resources belong to a single organization

Background on SaaS Clouds

Page 13:

Services for Intensive Computations

◉ No need for interactivity

◉ Requires high dynamicity and transparency

• Allocation of resources when executing a task

• Release of resources once a task is completed

◉ Mutualized resources

=> Need to deal with sharing the resources among the services

Background on SaaS Clouds

Page 14:

Scheduling services on mutualized resources

◉ Raises conflicting objectives

– Fairness toward the service suppliers

– Efficiency in the use of resources

◉ Prioritizing one objective penalizes the other

=> Requires making a tradeoff

Background on resource management

Page 15:

Common resource scheduling strategies

◉ First-come, First-served (FCFS)

+ Fair to users

– Inefficient in terms of utilization

– May be unfair to some businesses in our context

◉ FCFS along with Backfilling (EASY/Conservative)

+ Improves utilization

– May significantly delay the biggest tasks

+ Possible optimization with conservative backfilling

– Remains unfair in our context

Background on resource management
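The EASY backfilling idea mentioned above can be sketched as follows. This is a simplifying rendering of mine (single pool of cores, dict-based tasks), not the thesis's scheduler: a later task may jump ahead only if it cannot delay the reservation made for the first queued task.

```python
def easy_backfill(free_cores, running, queue, now):
    """Choose tasks to start now under EASY backfilling (sketch).
    `running`: list of (end_time, cores) pairs for busy allocations;
    `queue`: FCFS-ordered tasks, each a dict with 'cores'/'duration'."""
    started = []
    # Plain FCFS: start queued tasks in order while they fit.
    while queue and queue[0]["cores"] <= free_cores:
        task = queue.pop(0)
        free_cores -= task["cores"]
        running.append((now + task["duration"], task["cores"]))
        started.append(task)
    if not queue:
        return started
    # Head does not fit: find its "shadow time" (when enough cores
    # free up) by replaying completions in time order.
    head = queue[0]
    avail, shadow = free_cores, now
    for end, cores in sorted(running):
        avail += cores
        shadow = end
        if avail >= head["cores"]:
            break
    spare = avail - head["cores"]  # cores the head won't need even then
    # Backfill tasks that either finish before the shadow time or
    # fit in the spare cores, so the head is never delayed.
    for task in list(queue[1:]):
        fits_now = task["cores"] <= free_cores
        ends_in_time = now + task["duration"] <= shadow
        in_spare = task["cores"] <= min(free_cores, spare)
        if (fits_now and ends_in_time) or in_spare:
            queue.remove(task)
            free_cores -= task["cores"]
            if in_spare and not ends_in_time:
                spare -= task["cores"]
            running.append((now + task["duration"], task["cores"]))
            started.append(task)
    return started

# 2 of 4 cores busy until t=10; the 4-core head must wait until then,
# so only the short 2-core task can be backfilled.
queue = [{"cores": 4, "duration": 3},
         {"cores": 2, "duration": 5},    # ends before t=10: backfilled
         {"cores": 2, "duration": 20}]   # would delay the head: held
started = easy_backfill(2, [(10, 2)], queue, now=0)
```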

Page 16:

Background on resource management

How Resources are Assigned to Tasks

◉ Simple assignment strategies

– Greedy and round-robin algorithms

◉ Assignments guided by performance requirements

– Notion of match-making (affinities between resources and tasks)

◉ Prioritization

– Higher-priority tasks get access to resources first

• Preemption can be introduced

=> Notion of best-effort when certain tasks only run on idle resources

◉ Reservation and leasing

– Resources are allocated for a given time slot

Page 17:

Background on resource management

Common resource sharing strategies

◉ Static sharing (partitioning)

+ Fair and easy to set up

– Inefficient in terms of utilization in our context

◉ Fair-sharing (no partitioning + dynamic priorities)

+ Tradeoff between fairness and utilization

– May still raise unfair situations in our context

[Figure: two allocations of resources R1–R7 among Business 1, Business 2 and Business 3, contrasting static partitioning with fair-sharing.]

Page 18:

Partitioning Individual Node

◉ Requires isolation among tasks

– A task must not access resources allocated to another task

◉ Isolation with containers (cgroups, cpusets, OpenVZ, LXC...)

+ Low-level partitioning inducing low overhead

=> good performance

– Not flexible, since not easy to handle dynamically

◉ Isolation with virtual machines (VMs)

+ High-level partitioning

=> High flexibility in terms of automation

– Possible performance overhead

― Mitigated by several optimizations (e.g. HVM, paravirtualization, PCI passthrough...)

Background on resource management

Page 19:

Synthesis on Partitioning Resources

◉ Virtual machines enable interesting features

– Partitioning each individual node with strong isolation

– Allocating and freeing up resources dynamically

– Suspending/restarting best-effort tasks

◉ Powerful, proven VM management tools

– Handle VMs on an individual node

• Xen, KVM, ESXi, Hyper-V...

– Handle VMs in distributed environments

• OpenNebula, Eucalyptus, OpenStack...

― Target IaaS clouds

Page 20:

Problems to Address With VMs

◉ Deal with performance overhead

– Generic optimizations

• HVM, PCI Passthrough

– Solution-specific optimizations

• Paravirtualization (Xen, Hyper-V)

• Virtio (KVM, Xen)

◉ Allocate custom VMs dynamically in distributed environments

– Contextualization enables interesting features (OpenNebula)

Page 21:

Shortcomings of Existing Work with Respect to Our Aims

◉ On-demand HPC services on a mutualized cluster

– Existing SaaS clouds focus on collaborative or office applications

• Resources owned by a single organization

◉ Existing resource-sharing strategies don't suit our needs

=> Necessity to design new approaches

◉ Contributions

– Scheduling strategy for sharing mutualized resources

– Architecture for on-demand HPC services

– Prototyping for evaluation

Background on resource management

Page 22:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 23:

Ideas for the resource sharing strategy

◉ Combines the advantages...

– of static sharing, where fairness is easy to maintain

– with those of fair-sharing, which improves utilization

◉ Enables elasticity in resource sharing

– A business may use more resources than its investment:

• When the task causing this situation has a duration less than an acceptable threshold, noted D

• Or when the task is of best-effort type

=> Limits the impact of selfish behaviors by certain businesses

Contributions : Overview
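The elasticity rule above, exceeding the invested share only for tasks shorter than the threshold D or of best-effort type, can be sketched as an admission test. The field names and the value of D are illustrative assumptions of mine, not the thesis's code:

```python
D = 3600  # acceptable duration threshold in seconds (illustrative)

def may_exceed_share(task, usage, share):
    """Return True if a business may run `task` even when that would
    take it beyond its invested share (sketch of the rule).
    `usage` and `share` are fractions of the cluster's capacity;
    task['size'] is the fraction the task needs."""
    within_share = usage + task["size"] <= share
    short_enough = task["duration"] <= D
    best_effort = task.get("best_effort", False)
    return within_share or short_enough or best_effort

# A long, non-best-effort task pushing a business beyond its share
# is held back; a short or best-effort one is admitted.
long_task = {"size": 0.1, "duration": 7200}
short_task = {"size": 0.1, "duration": 600}
```

Either escape hatch (short duration or preemptibility) lets idle resources be used without letting a business permanently squat beyond its share.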

Page 24:

Handling Requests Dynamically

◉ Encapsulate each task within a virtual machine (VM)

– Eases the partitioning of nodes and enables dynamicity

◉ Enable a Specific SaaS Manager

– Implements the scheduling strategy to address the resource sharing issues

– Handles the allocation and destruction of VMs

◉ Exploit the Contextualization of VMs

– VM created, customized, and started dynamically

• VM configured to launch the task at startup

– VM automatically destroyed once the task is completed
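Contextualization, as used here, attaches files and variables to the VM's template so the guest can configure itself and launch the task at boot. A hedged sketch of an OpenNebula-style template fragment follows; all paths, variable names and values are illustrative, not taken from the thesis:

```
CONTEXT = [
  FILES    = "/srv/context/init.sh /srv/context/task.conf",
  TASK_CMD = "/opt/apps/solver --input /data/job42/input.dat",
  TARGET   = "hdb"
]
```

At boot, an init script inside the VM would read these values from the context device, run the task, and signal completion so the manager can destroy the VM.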

Page 25:

Architecture Model

◉ The SaaS Manager on top of the cluster

– Relies on a virtual infrastructure manager (VIM)

– VIM relies on hypervisors

◉ Possibility of reusing existing tools

– Avoids rewriting existing features

– Benefits from the features of powerful, proven tools

Contributions : Architecture Model

Page 26:

Design Driven by Openness, Performance and Interoperability

◉ OpenNebula enables support for handling the VMs

– Featuring the contextualization

◉ Xen manages VMs on each individual node

– Exploits paravirtualization for better performance

◉ The different components are coupled through open APIs

– Ensures better interoperability

Contributions : Architecture Model

Page 27:

Resource Sharing Strategy : Case study

◉ A situation with three businesses B1, B2 and B3

– B1 (with green tasks) invested for 2/7 of resources (R1, R2...R7)

– B2 (with red tasks) invested for 2/7

– B3 (with blue tasks) for 3/7

◉ In the figure, think of tasks as their associated VMs

Contributions : Resource Management Strategy

[Figure: tasks t1–t6 from the three businesses, some running on R1–R7 and some queued.]

Page 28:

Resource Sharing Strategy : Example 1

◉ Assume the durations of t1 and t5 are <= D (the chosen duration threshold)

– B1 and B3 are using ratios of resources greater than their investments

– This represents an extra ratio of 1/14 for each of them

Contributions : Resource Management Strategy

[Figure: t1 and t5 running beyond their businesses' invested shares; the remaining tasks queued.]
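The 1/14 figure can be checked with a quick computation; the usage value is reconstructed from the example (B1's 2/7 entitlement plus the excess) and is illustrative only:

```python
from fractions import Fraction

# B1 invested 2/7 of the cluster (= 4/14); running t1 brings its
# usage to 5/14 (figures consistent with the example above).
invested = Fraction(2, 7)
used = Fraction(5, 14)
extra = used - invested
print(extra)  # -> 1/14
```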

Page 29:

Resource sharing strategy : Example 2

◉ None of the tasks has a duration <= D, but task t2 is of best-effort type

– B1 is using a ratio of resources 1/7 greater than its investment

– t2 can be suspended at any time

Contributions : Resource Management Strategy

[Figure: best-effort task t2 running on idle resources while the other tasks are queued.]

Page 30:

About Implementation

◉ Relies on principles of resource leasing

– A lease consists of allocating a virtual machine for running a task

– The duration of a lease depends on the related task

• Its duration and its type (best-effort or not)

◉ Two kinds of leases handled specifically

– Non-preemptive leases

• Assigned to tasks related to the customers

―Non preemptive tasks

=> Resources only freed up at completion

– Preemptive leases

• Assigned to best-effort tasks

― VMs can be suspended and restarted later

=> No guarantee of completion

Contributions : Resource Management Strategy
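The two lease kinds can be sketched as a tiny data type; the class and method names are mine, not the thesis's, and the behavior strings stand in for the real scheduler actions:

```python
from dataclasses import dataclass

@dataclass
class Lease:
    """A resource lease backing one task's VM (illustrative sketch)."""
    task_id: str
    preemptive: bool  # True for best-effort tasks

    def on_pressure(self) -> str:
        """Reaction when the scheduler reclaims resources."""
        if self.preemptive:
            return "suspend-vm"   # VM suspended, restarted later
        return "keep-running"     # resources freed only at completion

customer = Lease("t1", preemptive=False)    # non-preemptive lease
best_effort = Lease("t2", preemptive=True)  # preemptive lease
```

Separating the two kinds keeps the completion guarantee for customer tasks while still letting the scheduler reclaim resources from best-effort work.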

Page 31:

Prototyping and Overview on Integration

◉ SVMSched (Smart Virtual Machine Scheduler)

– Drop-in replacement for OpenNebula's default scheduler

– Proper interfaces that provide the SaaS abstraction

– Deals with allocating and freeing up VMs dynamically

– Implements the resource sharing strategy

– Supports contextualization data stored on Network File Systems

Contributions : Prototyping

Page 32:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 33:

Evaluation Protocol

◉ Evaluation of application performance

– Time to set up the VM

– Performance overhead induced by the virtualization

◉ Study of the scheduling strategy

– Does it behave well regarding fairness and utilization?

– If not, how can it be improved?

◉ Experimental conditions

– Nodes from Grid'5000: each with 2×4 cores at 2.27 GHz and 8 GB of RAM

– Xen 3.4.2 and OpenNebula 1.4.2 along with VM images of 500MB

– Applications from the Parsec Benchmark (BodyTrack, Blackscholes, Freqmine)

Evaluation

Page 34:

Evaluation

Performances of the virtualization

◉ Full VMs perform better than contextualized ones, but the difference is slight

◉ High overhead for applications requiring intensive disk I/O

◉ VMs can outperform native machines for concurrent tasks requiring intensive memory I/O

◉ Contextualized VMs require a constant, low setup time

– ~15 s (< 5% of the duration of a 5-minute task) with an image of 500 MB

◉ Full VMs: setup times grow linearly

Page 35:

Evaluation

Analyzing the scheduling strategy

◉ With a well-chosen threshold

– Businesses can benefit from the mutualization

– The temptation for selfish behaviors is curbed

– Best-effort tasks allow better utilization

◉ Mutualization is not beneficial when

– The threshold is not suitably chosen

– There are no best-effort tasks

=> The strategy then degenerates to static sharing

Page 36:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 37:

Conclusion

◉ We studied and built an environment for enabling HPC SaaS services on shared computing resources

– Designed an architecture model that relies on virtualization for executing on-demand requests

– Designed resource management algorithms that share resources fairly while maximizing their use

◉ A prototype has been developed to evaluate our contributions experimentally

– Results showed the feasibility of our approach

– The prototype is integrated in the deliverables of the Ciloe Project

◉ We have thus opened a way to address the cost problem that strongly constrains SMBs needing HPC resources for their applications

Conclusion & Perspectives

Page 38:

Perspectives

◉ A model for predicting the duration of each task

– Envisioning an approximation model based on reinforcement learning

◉ An economic billing model

– Which parameters should invoicing take into account?

• Per-use costs of software licenses and computing resources + earnings

◉ Dimensioning the platform

– To give each business a suitable view of its resource needs

Conclusion & Perspectives

Page 39:

About this Work

◉ Awards

– 1st Prize Grid'5000 Challenge, Reims 2011

◉ Book Chapter

– Rodrigue Chakode, Jean-François Méhaut, Blaise-Omer Yenke. Scheduling On-demand SaaS Services on a Shared Virtual Cluster. In Cloud Computing and Services Science, pages 259–276. ISBN 978-1-4614-2325-6, Springer-Verlag, April 2012.

◉ International conferences

– Rodrigue Chakode, Blaise-Omer Yenke, Jean-François Méhaut. Resource Management of Virtual Infrastructure for On-demand SaaS Services. In CLOSER 2011 - International Conference on Cloud Computing and Services Science, pages 352–361. Netherlands, May 2011.

– Rodrigue Chakode, Jean-François Méhaut, François Charlet. High Performance Computing on Demand: Sharing and Mutualizing Clusters. In AINA'10 - IEEE International Conference on Advanced Information Networking and Applications, pages 126–133. Australia, April 2010.

◉ National conferences

– Rodrigue Chakode, Blaise-Omer Yenke. Utilisation des machines virtuelles comme support de services de calcul à la demande (Using virtual machines to support on-demand computing services). In RenPar'20: Rencontres francophones du Parallélisme. Saint-Malo, France, May 2011.

◉ Other publications (in the cloud community)

– Rodrigue Chakode. SVMSched: A tool to enable On-demand SaaS and PaaS Services on top of OpenNebula. OpenNebula Official Blog, http://blog.opennebula.org/?p=1646.

– Listed in the OpenNebula Software Ecosystem: http://opennebula.org/software:ecosystem:svmsched

Page 40:

Thanks for your attention!