Execution Environment for On-Demand Computing Services Based on Shared Clusters

PhD thesis, Grenoble University
By Rodrigue Chakode (LIG/INRIA, Equipe Mescal)
Advisors: Jean-François Méhaut, Maurice Tchuenté

Description

This thesis studies resource management for on-demand computing services on a shared cluster. In this context, the aim is to propose tools that automatically allocate resources for executing on-demand user requests, share resources among the services in proportion to each business's investment, and maximize resource utilization. Funded by the Minalogic global business cluster through the Ciloe Project (http://ciloe.minalogic.net), this work targets organizations such as SMBs that cannot afford to purchase and maintain a dedicated computing infrastructure. We first carried out an in-depth survey of the areas of on-demand computing and high-performance computing. Based on this survey, we defined a virtualized architecture that enables the dynamic execution of user requests through a dedicated resource manager. We then proposed policies and algorithms flexible enough to offer a suitable tradeoff between fairness and resource utilization. Working within an industrial collaboration, we developed a prototype of our proposal as a proof of concept. Based on open standards, this prototype relies on existing virtualization tools such as OpenNebula for allocating and manipulating virtual machines on the cluster's nodes. Using this prototype and various workloads, we carried out experiments to evaluate our architecture and scheduling algorithms. The results show that our contributions achieve the expected goals while being reliable and efficient.

Transcript of Execution Environment for On-Demand Computing Services Based on Shared Clusters

Page 1: Execution Environment for On-Demand Computing Services Based on Shared Clusters


Execution Environment for On-Demand Computing Services Based on Shared Clusters

PhD thesis, Grenoble University

By Rodrigue Chakode (LIG/INRIA, Equipe Mescal)

Advisors: Jean-François Méhaut, Maurice Tchuenté

Page 2:

Cloud Computing in a Nutshell

◉ Enables computing features as services

◉ Free or commercial services accessible over network

◉ On-demand and elastic access, with utility billing

– Customers (users of the service) only pay for what they use, aka pay-as-you-go

– Requests for more or less features should be satisfied quickly

◉ Services set up transparently for customers

– They don't have to care about how the service is enabled

Page 3:

Context Statement on Cloud Computing

◉ Various sorts of cloud services

– Infrastructure-as-a-Service, Platform-as-a-Service, Software-as-a-Service, Data-as-a-Service, Translation-as-a-Service...

– Almost everything could be a service (XaaS)

◉ Requires setting up a suitable computing infrastructure

– Servers, storage, network fabrics, cooling systems...

◉ May require significant investments

– Out of reach for many small or medium businesses (SMBs)

– Market currently dominated by the biggest organizations

Introduction

Page 4:

Challenges for HPC

◉ Many software packages require intensive computing capabilities

– E.g. EDA Applications (Ciloe Project)

– Integrated circuits need to be simulated before manufacturing

◉ Computing architectures are increasingly parallel

– SMP, NUMA, GPU, Cluster... and soon many-core architectures

◉ HPC applications run on clusters of multicore nodes (SMP/NUMA)

◉ Also expensive

Example of a cluster. Credit: CEA

Introduction

Page 5:

Bring HPC Services into Clouds

◉ Services requiring intensive computations

◉ Services enabled from a mutualized cluster

– Cluster supported by several businesses

– Each business providing its own service

– Cluster's resources shared among the services

◉ Study within the context of an industrial collaboration

– The Ciloe Project [http://ciloe.minalogic.net]

– Three SMBs developing EDA applications are involved

Introduction

Page 6:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 7:

Resource Management for HPC SaaS Services

◉ What is a service?

– Computes customer data with a specific application

– Input specifies an application and the data

– Output retrieved after the computation

– No further interaction necessary

Problem Statement

Page 8:

Related Research Issues

◉ Data Management

◉ Resilience and Fault Tolerance

◉ Security and Privacy

◉ Resource Management

Problem Statement

Page 9:

Scheduling Problems

◉ Share the cluster's resources among the services

– According to the investments of the different businesses

◉ Maximize the use of resources

– Use idle resources to run pending requests

– Run miscellaneous tasks on idle resources in a best-effort way

◉ Minimize the impact of selfish behaviors

– A business can under-invest while needing a lot of resources

Problem Statement
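The first objective, shares proportional to investments, can be made concrete with a small sketch. This is an illustrative rendering with made-up names, not the thesis's actual algorithm: each business's entitlement is its investment over the total investment, times the cluster size.

```python
from fractions import Fraction

def entitlements(investments, total_resources):
    """Split a cluster's resources among businesses in proportion to
    their investments (illustrative sketch, not the thesis's code)."""
    total = sum(investments.values())
    return {b: Fraction(inv, total) * total_resources
            for b, inv in investments.items()}

# Three businesses funding a 7-resource cluster, as in the later case study.
shares = entitlements({"B1": 2, "B2": 2, "B3": 3}, 7)
# B1 and B2 are each entitled to 2 resources, B3 to 3.
```

Exact fractions avoid rounding artifacts when entitlements are later compared against actual usage.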

Page 10:

Resource Allocation for On-demand Services

◉ Running requests in a dynamic way

– Resources should be allocated dynamically

– Allocated resources should be freed up automatically once a request is completed

– Handle Input/Output data in a transparent way

◉ Need to think of resource partitioning

– Modern computing nodes have several cores

– The number of cores required by certain tasks can be less than the number available on a node

Problem Statement

Page 11:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 12:

Background on Existing SaaS Clouds

◉ Target office and collaborative applications

– E.g. Google Docs, Salesforce, Office365...

– Need for interactivity

◉ SaaS cloud as a layer on top of a PaaS

– PaaS can rely on an IaaS layer

– IaaS enables on-demand resource allocation

• Virtualization plays an important role

◉ Resources belong to a single organization

Background on SaaS Clouds

Page 13:

Services for Intensive Computations

◉ No need for interactivity

◉ Requires high dynamicity and transparency

• Allocation of resources when executing a task

• Release of resources once a task is completed

◉ Mutualized resources

=> Need to deal with sharing the resources among the services

Background on SaaS Clouds

Page 14:

Scheduling services on mutualized resources

◉ Raises conflicting objectives

– Fairness toward the service suppliers

– Efficiency in the use of resources

◉ Prioritizing one objective penalizes the other

=> Requires making a tradeoff

Background on resource management

Page 15:

Common resource scheduling strategies

◉ First-come, First-served (FCFS)

+ Fair to users

– Inefficient in terms of utilization

– May be unfair to some businesses in our context

◉ FCFS along with Backfilling (EASY/Conservative)

+ Improves utilization

– May significantly delay the biggest tasks

+ Possible optimization with conservative backfilling

– Remains unfair in our context

Background on resource management
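The EASY backfilling idea mentioned above can be sketched as follows. This is a simplifying rendering of mine (single pool of cores, dict-based tasks), not the thesis's scheduler: a later task may jump ahead only if it cannot delay the reservation made for the first queued task.

```python
def easy_backfill(free_cores, running, queue, now):
    """Choose tasks to start now under EASY backfilling (sketch).
    `running`: list of (end_time, cores) pairs for busy allocations;
    `queue`: FCFS-ordered tasks, each a dict with 'cores'/'duration'."""
    started = []
    # Plain FCFS: start queued tasks in order while they fit.
    while queue and queue[0]["cores"] <= free_cores:
        task = queue.pop(0)
        free_cores -= task["cores"]
        running.append((now + task["duration"], task["cores"]))
        started.append(task)
    if not queue:
        return started
    # Head does not fit: find its "shadow time" (when enough cores
    # free up) by replaying completions in time order.
    head = queue[0]
    avail, shadow = free_cores, now
    for end, cores in sorted(running):
        avail += cores
        shadow = end
        if avail >= head["cores"]:
            break
    spare = avail - head["cores"]  # cores the head won't need even then
    # Backfill tasks that either finish before the shadow time or
    # fit in the spare cores, so the head is never delayed.
    for task in list(queue[1:]):
        fits_now = task["cores"] <= free_cores
        ends_in_time = now + task["duration"] <= shadow
        in_spare = task["cores"] <= min(free_cores, spare)
        if (fits_now and ends_in_time) or in_spare:
            queue.remove(task)
            free_cores -= task["cores"]
            if in_spare and not ends_in_time:
                spare -= task["cores"]
            running.append((now + task["duration"], task["cores"]))
            started.append(task)
    return started

# 2 of 4 cores busy until t=10; the 4-core head must wait until then,
# so only the short 2-core task can be backfilled.
queue = [{"cores": 4, "duration": 3},
         {"cores": 2, "duration": 5},    # ends before t=10: backfilled
         {"cores": 2, "duration": 20}]   # would delay the head: held
started = easy_backfill(2, [(10, 2)], queue, now=0)
```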

Page 16:

Background on resource management

How Resources are Assigned to Tasks

◉ Simple assignment strategies

– Greedy and round-robin algorithms

◉ Assignments guided by performance requirements

– Notion of match-making (affinities between resources and tasks)

◉ Prioritization

– Higher-priority tasks get access to resources first

• Preemption can be introduced

=> Notion of best-effort when certain tasks only run on idle resources

◉ Reservation and leasing

– Resources are allocated for a given time slot

Page 17:

Background on resource management

Common resource sharing strategies

◉ Static sharing (partitioning)

+ Fair and easy to set up

– Inefficient in terms of utilization in our context

◉ Fair-sharing (no partitioning + dynamic priorities)

+ Tradeoff between fairness and utilization

– May still raise unfair situations in our context

[Figure: two allocations of resources R1–R7 among Business 1, Business 2 and Business 3, contrasting static partitioning with fair-sharing.]

Page 18:

Partitioning Individual Node

◉ Requires isolation among tasks

– A task must not access resources allocated to another task

◉ Isolation with containers (cgroups, cpusets, OpenVZ, LXC...)

+ Low-level partitioning inducing low overhead

=> good performance

– Not flexible, since not easy to handle dynamically

◉ Isolation with virtual machines (VMs)

+ High-level partitioning

=> High flexibility in terms of automation

– Possible performance overhead

― Mitigated by several optimizations (e.g. HVM, paravirtualization, PCI passthrough...)

Background on resource management

Page 19:

Synthesis on Partitioning Resources

◉ Virtual machines enable interesting features

– Partitioning each individual node with strong isolation

– Allocating and freeing up resources dynamically

– Suspending/restarting best-effort tasks

◉ Powerful, proven VM management tools

– Handle VMs on an individual node

• Xen, KVM, ESXi, Hyper-V...

– Handle VMs in distributed environments

• OpenNebula, Eucalyptus, OpenStack...

― Target IaaS clouds

Page 20:

Problems to Address With VMs

◉ Deal with performance overhead

– Generic optimizations

• HVM, PCI Passthrough

– Solution-specific optimizations

• Paravirtualization (Xen, Hyper-V)

• Virtio (KVM, Xen)

◉ Allocate custom VMs dynamically in distributed environments

– Contextualization enables interesting features (OpenNebula)

Page 21:

Shortcomings of Existing Work with Respect to Our Aims

◉ On-demand HPC services on a mutualized cluster

– Existing SaaS clouds focus on collaborative or office applications

• Resources owned by a single organization

◉ Existing resource-sharing strategies don't suit our needs

=> Necessity to design new approaches

◉ Contributions

– Scheduling strategy for sharing mutualized resources

– Architecture for on-demand HPC services

– Prototyping for evaluation

Background on resource management

Page 22:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 23:

Ideas for the resource sharing strategy

◉ Combines the advantages...

– of static sharing, where fairness is easy to maintain

– with those of fair-sharing, which improves utilization

◉ Enables elasticity in resource sharing

– A business may use more resources than its investment:

• When the task causing this situation has a duration less than an acceptable threshold, noted D

• Or when the task is of best-effort type

=> Limits the impact of selfish behaviors by certain businesses

Contributions : Overview
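The elasticity rule above, exceeding the invested share only for tasks shorter than the threshold D or of best-effort type, can be sketched as an admission test. The field names and the value of D are illustrative assumptions of mine, not the thesis's code:

```python
D = 3600  # acceptable duration threshold in seconds (illustrative)

def may_exceed_share(task, usage, share):
    """Return True if a business may run `task` even when that would
    take it beyond its invested share (sketch of the rule).
    `usage` and `share` are fractions of the cluster's capacity;
    task['size'] is the fraction the task needs."""
    within_share = usage + task["size"] <= share
    short_enough = task["duration"] <= D
    best_effort = task.get("best_effort", False)
    return within_share or short_enough or best_effort

# A long, non-best-effort task pushing a business beyond its share
# is held back; a short or best-effort one is admitted.
long_task = {"size": 0.1, "duration": 7200}
short_task = {"size": 0.1, "duration": 600}
```

Either escape hatch (short duration or preemptibility) lets idle resources be used without letting a business permanently squat beyond its share.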

Page 24:

Handling Requests Dynamically

◉ Encapsulate each task within a virtual machine (VM)

– Eases the partitioning of nodes and enables dynamicity

◉ Enable a Specific SaaS Manager

– Implements the scheduling strategy to address the resource sharing issues

– Handles the allocation and destruction of VMs

◉ Exploit the Contextualization of VMs

– VM created, customized, and started dynamically

• VM configured to launch the task at startup

– VM automatically destroyed once the task is completed
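Contextualization, as used here, attaches files and variables to the VM's template so the guest can configure itself and launch the task at boot. A hedged sketch of an OpenNebula-style template fragment follows; all paths, variable names and values are illustrative, not taken from the thesis:

```
CONTEXT = [
  FILES    = "/srv/context/init.sh /srv/context/task.conf",
  TASK_CMD = "/opt/apps/solver --input /data/job42/input.dat",
  TARGET   = "hdb"
]
```

At boot, an init script inside the VM would read these values from the context device, run the task, and signal completion so the manager can destroy the VM.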

Page 25:

Architecture Model

◉ The SaaS Manager on top of the cluster

– Relies on a virtual infrastructure manager (VIM)

– VIM relies on hypervisors

◉ Possibility of reusing existing tools

– Avoids rewriting existing features

– Benefits from the features of powerful, proven tools

Contributions : Architecture Model

Page 26:

Design Driven by Openness, Performance and Interoperability

◉ OpenNebula enables support for handling the VMs

– Featuring the contextualization

◉ Xen manages VMs on each individual node

– Exploits paravirtualization for better performance

◉ The different components are coupled through open APIs

– Ensures better interoperability

Contributions : Architecture Model

Page 27:

Resource Sharing Strategy : Case study

◉ A situation with three businesses B1, B2 and B3

– B1 (with green tasks) invested for 2/7 of resources (R1, R2...R7)

– B2 (with red tasks) invested for 2/7

– B3 (with blue tasks) for 3/7

◉ In the figure, think of tasks as their associated VMs

Contributions : Resource Management Strategy

[Figure: tasks t1–t6 from the three businesses, some running on R1–R7 and some queued.]

Page 28:

Resource Sharing Strategy : Example 1

◉ Assume the durations of t1 and t5 are <= D (the chosen duration threshold)

– B1 and B3 are using ratios of resources greater than their investments

– This represents an extra ratio of 1/14 for each of them

Contributions : Resource Management Strategy

[Figure: t1 and t5 running beyond their businesses' invested shares; the remaining tasks queued.]
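The 1/14 figure can be checked with a quick computation; the usage value is reconstructed from the example (B1's 2/7 entitlement plus the excess) and is illustrative only:

```python
from fractions import Fraction

# B1 invested 2/7 of the cluster (= 4/14); running t1 brings its
# usage to 5/14 (figures consistent with the example above).
invested = Fraction(2, 7)
used = Fraction(5, 14)
extra = used - invested
print(extra)  # -> 1/14
```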

Page 29:

Resource sharing strategy : Example 2

◉ None of the tasks has a duration <= D, but task t2 is of best-effort type

– B1 is using a ratio of resources 1/7 greater than its investment

– t2 can be suspended at any time

Contributions : Resource Management Strategy

[Figure: best-effort task t2 running on idle resources while the other tasks are queued.]

Page 30:

About Implementation

◉ Relies on principles of resource leasing

– A lease consists of allocating a virtual machine for running a task

– The duration of a lease depends on the related task

• Its duration and its type (best-effort or not)

◉ Two kinds of leases handled specifically

– Non-preemptive leases

• Assigned to tasks related to the customers

―Non preemptive tasks

=> Resources only freed up at completion

– Preemptive leases

• Assigned to best-effort tasks

― VMs can be suspended and restarted later

=> No guarantee of completion

Contributions : Resource Management Strategy
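The two lease kinds can be sketched as a tiny data type; the class and method names are mine, not the thesis's, and the behavior strings stand in for the real scheduler actions:

```python
from dataclasses import dataclass

@dataclass
class Lease:
    """A resource lease backing one task's VM (illustrative sketch)."""
    task_id: str
    preemptive: bool  # True for best-effort tasks

    def on_pressure(self) -> str:
        """Reaction when the scheduler reclaims resources."""
        if self.preemptive:
            return "suspend-vm"   # VM suspended, restarted later
        return "keep-running"     # resources freed only at completion

customer = Lease("t1", preemptive=False)    # non-preemptive lease
best_effort = Lease("t2", preemptive=True)  # preemptive lease
```

Separating the two kinds keeps the completion guarantee for customer tasks while still letting the scheduler reclaim resources from best-effort work.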

Page 31:

Prototyping and Overview on Integration

◉ SVMSched (Smart Virtual Machine Scheduler)

– Drop-in replacement for OpenNebula's default scheduler

– Proper interfaces that provide the SaaS abstraction

– Deals with allocating and freeing up VMs dynamically

– Implements the resource sharing strategy

– Supports contextualization data stored on Network File Systems

Contributions : Prototyping

Page 32:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 33:

Evaluation Protocol

◉ Evaluation of application performance

– Time to set up the VM

– Performance overhead induced by the virtualization

◉ Study of the scheduling strategy

– Does it behave well regarding fairness and utilization?

– If not, how can it be improved?

◉ Experimental conditions

– Nodes from Grid'5000: each with 2×4 cores at 2.27 GHz and 8 GB of RAM

– Xen 3.4.2 and OpenNebula 1.4.2 along with VM images of 500MB

– Applications from the Parsec Benchmark (BodyTrack, Blackscholes, Freqmine)

Evaluation

Page 34:

Evaluation

Performances of the virtualization

◉ Full VMs perform better than contextualized ones, but the difference is slight

◉ High overhead for applications requiring intensive disk I/O

◉ VMs can outperform native machines for concurrent tasks requiring intensive memory I/O

◉ Contextualized VMs require a constant, low setup time

– ~15 s (< 5% of the duration of a 5-minute task) with an image of 500 MB

◉ Full VMs: setup times grow linearly

Page 35:

Evaluation

Analyzing the scheduling strategy

◉ With a well-chosen threshold

– Businesses can benefit from the mutualization

– The temptation for selfish behaviors is curbed

– Best-effort tasks allow better utilization

◉ Mutualization is not beneficial when

– The threshold is not suitably chosen

– There are no best-effort tasks

=> The strategy then degenerates to static sharing

Page 36:

Outline

◉ Introduction

◉ Problem statement

◉ Background

– Existing SaaS clouds and their related RM issues

– Survey on existing resource sharing techniques

◉ Contributions

– Overview : Scheduling Approach and Execution Model

– Architecture Model and Scheduling Strategy

– Prototyping

◉ Experimental evaluation

– Evaluation Protocol

– Results

◉ Conclusion & perspectives

Page 37:

Conclusion

◉ We studied and built an environment for enabling HPC SaaS services on shared computing resources

– Designed an architecture model that relies on virtualization for executing on-demand requests

– Designed resource management algorithms that share resources fairly while maximizing their use

◉ A prototype has been developed to evaluate our contributions experimentally

– Results showed the feasibility of our approach

– The prototype is integrated in the deliverables of the Ciloe Project

◉ We have thus opened a way to address the cost problem that strongly constrains SMBs needing HPC resources for their applications

Conclusion & Perspectives

Page 38:

Perspectives

◉ A model for predicting the duration of each task

– Envisioning an approximation model based on reinforcement learning

◉ An economic billing model

– Which parameters should invoicing take into account?

• Per-use costs of software licenses and computing resources + earnings

◉ Dimensioning the platform

– To give each business a suitable view of its resource needs

Conclusion & Perspectives

Page 39:

About this Work

◉ Awards

– 1st Prize Grid'5000 Challenge, Reims 2011

◉ Book Chapter

– Rodrigue Chakode, Jean-François Méhaut, Blaise-Omer Yenke. Scheduling On-demand SaaS Services on a Shared Virtual Cluster. In Cloud Computing and Services Science, pages 259–276. ISBN 978-1-4614-2325-6, Springer-Verlag, April 2012.

◉ International conferences

– Rodrigue Chakode, Blaise-Omer Yenke, Jean-François Méhaut. Resource Management of Virtual Infrastructure for On-demand SaaS Services. In CLOSER 2011 - International Conference on Cloud Computing and Services Science, pages 352–361. Netherlands, May 2011.

– Rodrigue Chakode, Jean-François Méhaut, François Charlet. High Performance Computing on Demand: Sharing and Mutualizing Clusters. In AINA'10 - IEEE International Conference on Advanced Information Networking and Applications, pages 126–133. Australia, April 2010.

◉ National conferences

– Rodrigue Chakode, Blaise-Omer Yenke. Utilisation des machines virtuelles comme support de services de calcul à la demande (Using virtual machines to support on-demand computing services). In RenPar'20: Rencontres francophones du Parallélisme. Saint-Malo, France, May 2011.

◉ Other publications (in the cloud community)

– Rodrigue Chakode. SVMSched: A tool to enable On-demand SaaS and PaaS Services on top of OpenNebula. OpenNebula Official Blog, http://blog.opennebula.org/?p=1646.

– Listed in the OpenNebula Software Ecosystem: http://opennebula.org/software:ecosystem:svmsched

Page 40:

Thanks for your attention!