Resource Optimization in Multi-processor Real-time...


Mälardalen University Licentiate Thesis 263

Resource Optimization in Multi-processor Real-time Systems

Hamid Reza Faragardi

ISBN 978-91-7485-336-0
ISSN 1651-9256

Address: P.O. Box 883, SE-721 23 Västerås. Sweden
Address: P.O. Box 325, SE-631 05 Eskilstuna. Sweden
E-mail: [email protected]
Web: www.mdh.se

Mälardalen University Press Licentiate Theses
No. 263


2017

School of Innovation, Design and Engineering

Popular Science Summary

This thesis addresses the topic of resource efficiency in the context of multi-core time-critical processor systems.

Today, almost wherever you turn and look, you find computers and computer systems. Most of these computer systems use so-called multi-core processors to perform computations and provide functionality. These multi-core processors are used in everything from small embedded computer systems, such as mobile phones, to larger systems that are geographically spread across different locations and interconnected via the Internet, such as so-called data centers in the cloud. In both of these types of multi-core processor systems, efficient use of computing power and computational resources is essential.

Industrial systems, such as those found in automotive and avionics applications, are usually subject to requirements on the timing of the system. Systems of this type, where correct function also depends on correct timing, are called real-time systems. In a real-time system, the first and greatest challenge for the system's engineering and design team is to provide a solution in which all of the system's timing requirements are met.

Industrie 4.0 is the latest development in the automation and manufacturing industry when it comes to creating the next generation of smart factories. Two types of multi-core processors are central building blocks in these smart factories: 1) multi-core processors in all of the factory's embedded systems, and 2) multi-core processors in the cloud data centers that the factory uses. Both of these categories of multi-core processor systems are considered in the thesis, specifically 1) efficient use of embedded processors with multiple cores in the context of real-time systems and 2) efficient use of multiple processors in cloud data centers. These two types of multi-core systems are treated separately in this licentiate thesis, each with an accompanying introduction to the challenges of achieving a resource-efficient system design. We model the system and propose smart algorithms to optimize the system's resource efficiency while ensuring that all of the system's timing requirements are met. The results presented facilitate the construction of multi-core time-critical processor systems.

Abstract

This thesis addresses the topic of resource efficiency in multiprocessor systems in the presence of timing constraints.

Nowadays, almost wherever you look, you find a computing system. Most computing systems employ a multiprocessor platform. Multiprocessor systems can be found in a broad spectrum of computing systems ranging from a tiny chip hosting multiple cores to large geographically-distributed cloud data centers connected by the Internet. In multiprocessor systems, efficient use of computing resources is a substantial element when it comes to achieving a desirable performance for running software applications.

Most industrial applications, e.g., automotive and avionics applications, are subject to a set of real-time constraints that must be met. Such applications, along with the underlying hardware and software components running the application, constitute a real-time system. In real-time systems, the first and major concern of the system designer is to provide a solution where all timing constraints are met. Therefore, in multiprocessor real-time systems, not only resource efficiency, but also meeting all the timing requirements, is a major concern.

Industrie 4.0 is the current trend in automation and manufacturing when it comes to creating the next generation of smart factories. Two categories of multiprocessor systems play a significant role in the realization of such a smart factory: 1) multi-core processors, which are the key computing element of embedded systems, and 2) cloud computing data centers as the supplier of massive data storage and large computational power. Both these categories are considered in the thesis, i.e., 1) the efficient use of embedded multi-core processors where multiple processors are located on the same chip, applied to execute a real-time application, and 2) the efficient use of multi-processors within a cloud computing data center. We address these two categories of multi-processor systems separately.


For each of them, we identify the key challenges to achieve a resource-efficient design of the system. We then formulate the problem and propose optimization solutions to optimize the efficiency of the system, while satisfying all timing constraints. Introducing a resource-efficient solution for those two categories of multi-processor systems facilitates deployment of Industrie 4.0 in smart manufacturing factories where multi-core embedded processors and cloud computing data centers are two central cornerstones.

To my dear family.

Acknowledgment

First of all I would like to give thanks to my supervisors Prof. Thomas Nolte, Prof. Björn Lisper, Prof. Kristian Sandström, and Dr. Alessandro Papadopoulos. Fulfilling this goal would not have been possible without their kind support, encouragement and valuable comments.

I would also like to thank my office mate Lic. Matthias Becker for his help and comments on my research project.

A special thanks goes to the CORE research group for providing a friendly environment, specifically to: Prof. Thomas Nolte, Associate Prof. Moris Behnam, Prof. Kristian Sandström, Dr. Alessandro Papadopoulos, Lic. Matthias Becker, Dr. Mohammad Ashjaei, Dr. Saad Mubeen, Lic. Sara Afshar.

I would also like to thank the IDT administration staff for their help in dealing with all practical issues. Many thanks go to Carola, Jenny, Sofia and the others.

I would also like to thank my friends and colleagues at the department for all the fun we have had during work and conference trips: Matthias, Dag, Arash, Hossein, Farid, Sara Ab., Mohammad, Alessandro, Sara x. Ab., Mehrdad, Sahar, Meng, Nesredin, Nima, Saad, Sara Afshar and others!

Last, I would like to express my gratitude to my family for their constant support and love throughout my life.

This work has been supported by Mälardalen University, Vinnova via the FFI initiative “AUTOSAR for Multicore in Automotive and Automation Industries”, and KKS via the initiative PREMISE “Predictable Multicore Systems”.

Hamid Reza Faragardi
Västerås, May 10, 2017


List of publications

Papers included in the licentiate thesis¹

Paper A A communication-aware solution framework for mapping AUTOSAR runnables on multi-core systems, Hamid Reza Faragardi, Kristian Sandström, Björn Lisper, Thomas Nolte. In Proceedings of the 19th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2014, September. IEEE Industrial Electronics Society Scholarship Award.

Paper B An efficient scheduling of AUTOSAR runnables to minimize communication cost in multi-core systems, Hamid Reza Faragardi, Kristian Sandström, Björn Lisper, Thomas Nolte. In Proceedings of the 6th IEEE International Symposium on Telecommunication (IST), 2014, August.

Paper C A resource efficient framework to run automotive embedded software on multi-core ECUs, Hamid Reza Faragardi, Kristian Sandström, Björn Lisper, Thomas Nolte. Under submission, 2017, May.

Paper D A profit-aware allocation of high performance computing applications on distributed cloud data centers with environmental considerations, Hamid Reza Faragardi, Aboozar Rajabi, Thomas Nolte, Amir Hossein Heidarizadeh. CSI Journal on Computer Science and Engineering, Vol 10, pp. 28-38, 2014.

Paper E Towards energy-aware resource scheduling to maximize reliability in cloud computing systems, Hamid Reza Faragardi, Aboozar Rajabi, Reza Shojaee, Thomas Nolte. In Proceedings of the 15th IEEE International Conference on High Performance Computation and Communication (HPCC), 2013, May.

Paper F Towards energy-aware placement of real-time virtual machines in a cloud data center, Nima Khalilzad, Hamid Reza Faragardi, Thomas Nolte. In Proceedings of the 17th IEEE International Conference on High Performance Computation and Communication (HPCC), 2015, August.

¹ The included articles have been reformatted to comply with the licentiate thesis layout.


Additional papers, not included in the licentiate thesis

1. A cost efficient design of a multi-sink multi-controller WSN in a smart factory, Hamid Reza Faragardi, Hossien Fotouhi, Thomas Nolte, Rahim Rahmani. 18th IEEE International Conference on High Performance Computing and Communications (HPCC), 2017.

2. Ethical considerations in Cloud computing systems, Hamid Reza Faragardi. Doctoral Symposium on DIGITALISATION for a Sustainable Society, (short paper) 2017.

3. Ethical considerations in Cloud computing systems, Hamid Reza Faragardi, Hossien Fotouhi, Maryam Vahabi, Thomas Nolte. Journal of Software: Practice and Experience, Wiley Press, submitted September 2017 (Under Review).

4. EAICA: An energy-aware resource provisioning algorithm for real-time cloud services, Hamid Reza Faragardi, Aboozar Rajabi, Thomas Nolte. In Proceedings of the 21st IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2016.

5. Energy-efficient scheduling of real-time cloud services using task consolidation and dynamic voltage scaling, Ramin Razavi, Aboozar Rajabi, Hamid Reza Faragardi, Tahoora Pourashraf, Nasser Yazdani. In Proceedings of the 7th IEEE International Symposium on Telecommunications (IST), 2014.

6. Towards a communication-aware mapping of software components in multi-core embedded real-time systems, Hamid Reza Faragardi, Kristian Sandström, Björn Lisper, Thomas Nolte. In Proceedings of the 20th IEEE Real-Time and Embedded Technology and Applications Symposium (WIP session), 2014.

7. Communication-aware scheduling of AUTOSAR runnables on multicore systems, Hamid Reza Faragardi, Kristian Sandström, Björn Lisper, Thomas Nolte. In Proceedings of the International Workshop on Design Space Exploration of Cyber-Physical Systems, Springer 2014.

8. From reliable distributed system toward reliable cloud by cat swarm optimization, Reza Shojaee, Hamid Reza Faragardi, Nasser Yazdani. International Journal of Information and Communication Technology, Vol. 5, pp. 9-18, 2013.

9. Optimal task allocation for maximizing reliability in distributed real-time systems, Hamid Reza Faragardi, Reza Shojaee, Mohammad Amin Keshtkar, Hamid Tabani. In Proceedings of the 12th IEEE/ACIS International Conference on Computer and Information Science (ICIS), 2013.

10. An efficient scheduling of HPC applications on geographically distributed cloud data centers, Aboozar Rajabi, Hamid Reza Faragardi, Thomas Nolte. Computer Networks and Distributed Systems, Springer 2013.

11. An analytical model to evaluate reliability of cloud computing systems in the presence of QoS requirements, Hamid Reza Faragardi, Reza Shojaee, Hamid Tabani, Aboozar Rajabi. In Proceedings of the 12th IEEE/ACIS International Conference on Computer and Information Science (ICIS), 2013.

12. Towards a Communication-efficient Mapping of AUTOSAR Runnables on Multi-cores, Hamid Reza Faragardi, Björn Lisper, Thomas Nolte. In Proceedings of the 18th IEEE International Conference on Emerging Technologies and Factory Automation (WIP session), Italy, 2013.

13. Communication-aware and Energy-efficient Resource Provisioning for Real-Time Cloud Services, Aboozar Rajabi, Hamid Reza Faragardi, Naser Yazdani. In Proceedings of the 17th CSI Symposium on Computer Architecture and Digital Systems (CADS), 2013.

14. Reliability-aware task allocation in distributed computing systems using hybrid simulated annealing and tabu search, Hamid Reza Faragardi, Reza Shojaee, Nasser Yazdani. High Performance Computing and Communication IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012.

15. A new cat swarm optimization based algorithm for reliability-oriented task allocation in distributed systems, Reza Shojaee, Hamid Reza Faragardi, Sara Alaee, Nasser Yazdani. In Proceedings of the 6th International Symposium on Telecommunications (IST), 2012. Nominated as one of the best papers, and invited to the International Journal of Information and Communication Technology.


16. Allocation of hard real-time periodic tasks for reliability maximization in distributed systems, Hamid Reza Faragardi, Reza Shojaee, Maziar Mirzazad-Barijough, Roozbeh Nosrati. In Proceedings of the 15th IEEE International Conference on Computational Science and Engineering, 2012.

Contents

I Thesis

1 Introduction
    1.1 Motivation
    1.2 Multi-processor Categories
        1.2.1 Multi-processor System I
        1.2.2 Multi-processor System II
    1.3 Outline of the Thesis

2 Background
    2.1 Embedded and Real-Time Multi-Core Systems
        2.1.1 Multi-Core Processor Architecture
        2.1.2 The AUTOSAR standard
        2.1.3 Key Challenges
    2.2 Cloud Computing Systems
        2.2.1 Cloud Data Center Topology
        2.2.2 Energy Consumption of a Data Center
        2.2.3 Energy Consumption in a Cloud Federation
        2.2.4 Key Challenges

3 Research Summary
    3.1 Problem Statement and Research Goals
        3.1.1 Research Themes
        3.1.2 Research Questions
    3.2 Research Methodology

4 Contributions and Discussion
    4.1 Technical Contribution
    4.2 Personal Contribution

5 Conclusions and Future Work
    5.1 Summary
    5.2 Future Work

6 Overview of the Papers
    6.1 Papers with focus on multi-core resource optimization in the context of automotive systems
        6.1.1 Paper A
        6.1.2 Paper B
        6.1.3 Paper C
    6.2 Papers with focus on resource optimization in cloud computing systems
        6.2.1 Paper D
        6.2.2 Paper E
        6.2.3 Paper F

Bibliography

II Included Papers

7 Paper A: A Communication-aware Solution Framework for Mapping AUTOSAR Runnables on Multi-core Systems
    7.1 Introduction
    7.2 Related Work
    7.3 Problem Modeling
        7.3.1 Problem Description
        7.3.2 Communication Time Analysis
        7.3.3 Optimization Problem
    7.4 Solution Framework
        7.4.1 Solution 1: Simple Mapping
        7.4.2 Solution 2: SMSAFR
        7.4.3 Solution 3: The Utilization-based Refinement Approach
    7.5 Performance Evaluation
    7.6 Conclusion
    Bibliography

8 Paper B: An Efficient Scheduling of AUTOSAR Runnables to Minimize Communication Cost in Multi-core Systems
    8.1 Introduction
    8.2 Related Work
    8.3 Problem Modeling
    8.4 Solution Framework
        8.4.1 Framework 1: SMSA
        8.4.2 Framework 2: The Refinement Approach
    8.5 Performance Evaluation
    8.6 Conclusion
    Bibliography

9 Paper C: A Resource Efficient Framework to Run Automotive Embedded Software on Multi-core ECUs
    9.1 Introduction
    9.2 Related Work
    9.3 Problem Modeling
        9.3.1 The AUTOSAR Architecture
        9.3.2 Problem Description
        9.3.3 Communication Time Analysis
        9.3.4 Optimization Problem
    9.4 Solution Framework
        9.4.1 Method 1: Simple Mapping-based Approach (SMSA)
        9.4.2 Method 2: The Feedback-based Refinement Approach (SMSAFR)
        9.4.3 Method 3: The Utilization-based Refinement Approach (PUBRF)
    9.5 Performance Evaluation
        9.5.1 Alternative Frameworks
        9.5.2 Application and Hardware Specifications
        9.5.3 Comparison Results
    9.6 Conclusion
    Bibliography

10 Paper D: A Profit-aware Allocation of High Performance Computing Applications on Distributed Cloud Data Centers with Environmental Considerations
    10.1 Introduction
    10.2 Related Works
    10.3 Problem Definition
        10.3.1 System Model
        10.3.2 Energy Model
        10.3.3 CO2 Emission Model
        10.3.4 Profit Model
        10.3.5 Optimization Problem
    10.4 Proposed Solution
        10.4.1 Architecture
        10.4.2 Imperialist Competitive Algorithm (ICA)
        10.4.3 Highest Execution time-Lowest Power consumption (HELP) Heuristic
        10.4.4 Online Allocation
        10.4.5 Migration Handler
    10.5 Performance Evaluation
    10.6 Conclusion and Future Work
    Bibliography

11 Paper E: Towards Energy-aware Resource Scheduling to Maximize Reliability in Cloud Computing Systems
    11.1 Introduction
    11.2 Related Work
    11.3 Problem Description
    11.4 System Modeling
        11.4.1 Notation
        11.4.2 Principle Constraints
        11.4.3 Reliability Evaluation
        11.4.4 Power Consumption Evaluation
        11.4.5 Multi-Objective Optimization
    11.5 Solution Approach
        11.5.1 Imperialist Competitive Algorithm
        11.5.2 Online Scheduling Algorithm
    11.6 Performance Evaluation
    11.7 Conclusion
    Bibliography

12 Paper F: Towards Energy-Aware Placement of Real-Time Virtual Machines in a Cloud Data Center
    12.1 Introduction
    12.2 Model
    12.3 Solution
        12.3.1 From Application to VM Specification
        12.3.2 VM to Core Allocation
        12.3.3 VM Placement Algorithm
    12.4 Related Work
    12.5 Conclusions and Future Work
    Bibliography

I Thesis

Chapter 1

Introduction

1.1 Motivation

Nowadays, most computing systems employ a multiprocessor platform. Multiprocessor systems span a wide spectrum, ranging from a tiny chip hosting multiple cores to a huge cloud federation of geographically-distributed cloud data centers connected by the Internet. In multiprocessor systems, the substantial challenge in achieving reasonable performance is resource optimization. Although it is known that a multiprocessor system with N processing nodes may not deliver the same performance as N uniprocessors (due to shared resources and inter-core communication overhead), resource optimization gives us an opportunity to approach the maximum performance of a multiprocessor system. In this thesis, the performance of the system is defined from the application perspective, meaning that an application executes faster on a higher-performance multiprocessor than on a lower-performance one.

In many instances of multiprocessor systems, real-time constraints need to be considered in the modeling of the system. These can be either hard real-time constraints, which are commonly found in avionics and automotive applications, or soft real-time constraints, which are common in multimedia and telecommunication applications. Apart from real-time constraints, there may be additional complex constraints in multiprocessor systems, such as memory limitations, limitations on communication bandwidth, and heterogeneity. In this work, we focus on challenges related to resource optimization in multiprocessor systems where not only real-time constraints but also other complex constraints complicate the problem.

When dealing with real-time applications, performance may not be the major concern as long as all real-time constraints are met. In such systems, an additional criterion is usually defined as the main concern of the system designers: resource efficiency. In this thesis we define resource efficiency as the amount of resources required to execute an application with an acceptable performance (rather than high performance), while the application and system constraints are met. Accordingly, a highly efficient multi-processor system is able to execute a real-time application using a minimum number of resources without sacrificing real-time constraints.

The allocation of a given workload on a multi-processor system dramatically affects the efficiency of the system. In an allocation scenario, two main aspects should be taken into account. The first is to distribute the workload uniformly among the processors in order to increase the performance of the application. The second is to minimize the communication cost, since it can have a negative impact on the performance of a multi-processor system. When the modules of an application running on a multi-processor system communicate with each other, load balancing and communication cost pull in opposite directions, in the sense that load balancing inherently increases the communication cost. System designers strive to find the right compromise between load balancing and communication overhead.
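The tension between the two aspects can be illustrated with a small sketch. The module names, loads and traffic volumes below are hypothetical values chosen for illustration, and the exhaustive enumeration is only practical for toy instances; it is not the method used in the thesis.

```python
from itertools import product

# Hypothetical toy workload: module name -> CPU load (arbitrary units).
LOADS = {"a": 4, "b": 4, "c": 2, "d": 2}
# Hypothetical traffic between communicating modules (arbitrary units).
TRAFFIC = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 5}


def max_core_load(alloc, n_cores):
    """Load of the busiest core; lower means better balance."""
    per_core = [0] * n_cores
    for module, core in alloc.items():
        per_core[core] += LOADS[module]
    return max(per_core)


def comm_cost(alloc):
    """Total traffic that crosses a core boundary."""
    return sum(w for (u, v), w in TRAFFIC.items() if alloc[u] != alloc[v])


def all_allocations(n_cores):
    """Enumerate every mapping of modules to cores (toy sizes only)."""
    modules = sorted(LOADS)
    for cores in product(range(n_cores), repeat=len(modules)):
        yield dict(zip(modules, cores))


def best(n_cores, key):
    return min(all_allocations(n_cores), key=key)


# Prioritize balance first, then communication; and the other way round.
balanced = best(2, lambda a: (max_core_load(a, 2), comm_cost(a)))
clustered = best(2, lambda a: (comm_cost(a), max_core_load(a, 2)))
print(max_core_load(balanced, 2), comm_cost(balanced))
print(max_core_load(clustered, 2), comm_cost(clustered))
```

In this toy instance the best-balanced allocation pays the highest communication cost, while the allocation with no inter-core traffic puts everything on one core. Intermediate allocations, such as keeping the two heavily communicating pairs together, sit between these extremes.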

In principle, it is not possible to propose a single solution that optimizes resource efficiency for all types of multiprocessor systems. Consequently, each type of multiprocessor system must be treated separately, with regard to both its hardware architecture and its application. Two types (categories) of multiprocessor systems are discussed in this thesis. The first, denoted Multiprocessor System I, is a multi-core embedded system: a small multiprocessor system used to execute component-based real-time applications. The second, denoted Multiprocessor System II, is a cloud computing system that is able to guarantee the timing requirements of the services it provides. In the rest of this section we take a deeper look at each of these systems.

There are four principal reasons to consider these two categories of multi-processor systems (multi-core embedded systems and cloud computing systems) among the many other variants of multi-processor systems.

1. Both represent highly popular research trends in the context of multi-processor real-time systems. Multi-core embedded systems are quite common in the context of embedded real-time systems. Ninety-eight percent of all multi-core processors are manufactured as components of embedded systems [1]. Accordingly, when we talk about embedded systems, we implicitly talk about multi-core embedded systems. Cloud computing is another popular multi-processor system, providing a highly scalable and cost-effective solution by outsourcing data and processing to a shared cloud infrastructure called a cloud data center. Recently, there has been a growing trend to use cloud computing to run real-time applications (called real-time cloud), such as telecommunication, multimedia, and online gaming applications. Using cloud computing to run hard real-time applications is also an active research topic, and it is our main target for future work.

2. Most manufacturing industries are dealing with the challenges associated with both categories of multi-processor systems at the same time. At present, automation, automotive, avionics and several other manufacturing industries have started to replace their traditional single-core embedded devices with multi-core embedded devices, while at the same time they have also started to move from local computing servers to cloud infrastructures. In such industries, computing servers are employed to store and process large streams of data coming from sensors, control loops and other embedded devices. The increasing size and complexity of the data, together with growing processing demands, have led them to adopt cloud computing as a scalable solution for managing their exponentially growing demands.

3. They are two fundamental elements in both the Internet of Things and Industry 4.0. The Internet of Things (IoT) and Industry 4.0 are two terms that are increasingly used by academia, industry and the daily press. The term Industrie 4.0 is used for the next industrial revolution, which is taking place right now. Basically, it is the current trend of automation in manufacturing technologies to create what has been called a smart factory [2]. In a smart factory, objects, machines and humans are connected intelligently to integrate machinery, warehousing systems and production facilities in the form of Cyber-Physical Systems (CPS). The integration of three principal elements, namely multi-core embedded systems as the key part of a cyber-physical system, the IoT, and cloud computing, constitutes the principle of Industry 4.0. Accordingly, embedded multi-core devices and cloud computing are two building blocks of Industry 4.0. Not only in Industry 4.0, but also in other IoT systems, these two multi-processor systems play an active role. Considering that most applications in Industry 4.0 and IoT systems introduce real-time requirements, both of the multi-processor systems must be capable of handling real-time applications efficiently.

4. Similar approaches can be applied to achieve a resource efficient solution for both of these systems. Although these two categories of multi-processor systems may seem to have totally different characteristics, they share several concepts and face similar challenges. Basically, in both systems, the problem is to allocate application components that communicate with each other, subject to specific deadlines, so as to maximize the efficiency of the system while also satisfying other application and system constraints. The approaches are also the same in principle: define a goal function specifying the efficiency metric, model the real-time and other system constraints, formulate the problem as an optimization problem, solve it using heuristic algorithms, and finally evaluate the approach through simulation experiments in which parameter values are derived from real-world benchmarks.
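The common recipe described in the fourth reason can be sketched in a few lines. This is a minimal illustration under stated assumptions: the component utilizations are hypothetical, the per-core utilization bound of 1.0 (the schedulability condition for EDF with implicit deadlines on one core) stands in for the real-time constraints, and a plain random local search stands in for the heuristic algorithms. None of these choices is taken from the thesis.

```python
import math
import random

# Hypothetical component set: name -> CPU utilization (execution time / period).
UTIL = {"c1": 0.5, "c2": 0.4, "c3": 0.3, "c4": 0.3, "c5": 0.2, "c6": 0.2}
N_CORES = len(UTIL)


def core_loads(alloc):
    loads = [0.0] * N_CORES
    for comp, core in alloc.items():
        loads[core] += UTIL[comp]
    return loads


def goal(alloc):
    """Goal function (efficiency metric): number of cores in use."""
    return len(set(alloc.values()))


def feasible(alloc):
    """Constraint model: total utilization per core is at most 1.0.

    Assumption: this bound stands in for the real-time constraints.
    """
    return all(load <= 1.0 + 1e-9 for load in core_loads(alloc))


def local_search(steps=2000, seed=0):
    """Heuristic solver: minimize goal() subject to feasible()."""
    rng = random.Random(seed)
    comps = sorted(UTIL)
    alloc = {c: i for i, c in enumerate(comps)}  # one core each: feasible
    for _ in range(steps):
        cand = dict(alloc)
        cand[rng.choice(comps)] = rng.randrange(N_CORES)
        # Accept only feasible moves that do not worsen the goal.
        if feasible(cand) and goal(cand) <= goal(alloc):
            alloc = cand
    return alloc


# Evaluation: compare the result with a simple lower bound.
result = local_search()
lower_bound = math.ceil(sum(UTIL.values()) - 1e-9)
print(goal(result), "cores used; lower bound", lower_bound)
```

Keeping the goal function, the constraint check and the search loop as separate functions mirrors the steps listed above, so any one of them can be swapped (for example, a different efficiency metric or a different heuristic) without touching the others.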

1.2 Multi-processor Categories

In the following, we present in more detail the two categories of multi-processor systems targeted by the work conducted in this thesis.

1.2.1 Multi-processor System I

The first category of multi-processor systems considered in this work is a multi-core embedded system on which a real-time component-based application is executed. Although we investigate the efficient use of multi-core processors to run real-time embedded software from a general perspective, automotive applications are taken into account as a case study. The size of these applications is relatively small. In the automotive industry, software systems are usually developed according to a well-known standard called Automotive Open System Architecture (AUTOSAR). An AUTOSAR-based application consists of a set of Software Components (SCs). Figure 1.1 shows an AUTOSAR system on a multi-core processor.

Figure 1.1: AUTOSAR architecture on multi-core processors.

The main goal in this system is to minimize the overall processor utilization, while the main constraints are end-to-end latencies along a sequence of SCs (called a transaction). A transaction corresponds to a mission in the system; in a car, for example, a transaction could be the anti-lock braking system, the engine control process, or temperature control. In this system, the key hurdles to achieving resource efficiency are as follows:

1. Inter-SC communication costs: if the SCs communicating with eachother are located on different tasks, and possibly also on different cores,then it imposes overhead on the system not only because of a higherload on the shared memory, but also a higher CPU utilization besidesrequiring a system call to invoke OS mechanisms to perform inter-task(possibly inter-core) communication.

2. Restricted assignment of some SCs to a subset of cores: an SC may only be allowed to be allocated to a subset of the cores, due to the location of the basic software components required to execute the SC.

3. Shared SCs: an SC could be shared among different transactions, which may prevent those transactions from running the same SC at the same time. If the transactions sharing SCs are not effectively serialized, system efficiency suffers.


An effective resource optimizer must be capable of dealing with these three challenges to reduce their negative impact on the system efficiency.

1.2.2 Multi-processor System II

The second category of multi-processor systems considered in this work is a cloud data center, where multiprocessors are connected to each other through a communication network, in contrast to the multi-core system in which all the cores are located on a single chip. Traditionally, enterprises bought their own computing infrastructure to run their applications, such as financial analysis and distributed data processing. Cloud computing, however, offers a highly scalable solution for various types of user demands. Recently, cloud computing has also become popular for executing various soft real-time applications, such as video streaming, gaming and telecommunication applications.

A cloud computing system that is able to guarantee the timing requirements of the provided services is usually called a Real-Time Cloud (RTC). Resource optimization plays a critical role in achieving acceptable performance in an RTC while using the minimum number of resources. That is exactly what we take into account in designing a resource provisioning mechanism for real-time cloud services. Here again, resources should be utilized in such a way that not only are the timing requirements of applications met, but also the minimum computing and communication resources are required. Minimizing the required resources has the potential to reduce the Total Cost of Ownership (TCO) by reducing the cost of operating the system. For example, minimizing the number of required servers can result in lower energy consumption, which is one of the most significant costs of operating a data center.

In cloud computing systems, timing constraints can be considered from two different perspectives:

1. The first is the end-user's perspective, where the total response time, from the submission of a request by the user until the completion of its execution, should meet a specific deadline. The main challenge in this perspective is to decide which applications are able to run on an RTC and which should run locally. This decision can be made with respect to several considerations, such as the timing requirements of the applications, the QoS guaranteed by the cloud provider, and the delay of the Internet and other communication equipment.


2. The second is the cloud provider's perspective, where a set of applications has been submitted to a cloud and the cloud system should be able to execute them within a specific time. In fact, cloud providers must be able to give a guarantee for the quality of services, specifically with respect to the response time of services. That is what we call a real-time cloud.

Indeed, the former perspective is broader and includes the latter, in the sense that providing an RTC is one of the steps needed to satisfy the end-to-end response time constraints. In the current state of this research, the main focus is on the cloud provider perspective (the second perspective), in which a set of real-time applications is submitted to an RTC and the main challenge is to allocate the computation and communication resources efficiently. For an allocation to be efficient, timing requirements must be met while the minimum number of resources (or low-cost resources) is used, in order to optimize the cost of operating the system. The costs of operation in an RTC include (i) energy consumption, which can also be translated into carbon emission, and (ii) the amount of resources required for providing acceptable performance, availability and reliability of services.

In this thesis we also propose an RTC that is capable of handling periodic real-time tasks (and sporadic tasks with a known minimum inter-arrival time), as the task set in most real-time applications is formed as a set of periodic (or sporadic) tasks. The task set is mapped onto a real-time Virtual Machine (VM). In this case, the VM specification must reflect the resource demand of the dedicated periodic task set, such that satisfying the VM demands allows the tasks to complete their execution before their deadlines.

It should be mentioned that the continuation of this research (future work) intends to cover the first perspective as well, by providing a local cloud (called a fog cloud) connected by a predictable communication link (such as software-defined networks) to the RTC introduced in this thesis.

1.3 Outline of the Thesis

This thesis is organized in 12 chapters. Chapter 2 introduces the required background of the thesis. Chapter 3 presents the research summary and the research methodology used in the thesis. Research questions are also highlighted in this chapter. The contributions of the thesis are presented in Chapter 4. In Chapter 5 we present the conclusions and future work. Chapter 6 provides an outline of the papers included in the thesis. Finally, the included papers are presented in Chapters 7 to 12.

Chapter 2

Background

In this chapter we provide the high-level background information needed to contextualize the thesis and the work itself.

2.1 Embedded and Real-Time Multi-Core Systems

Embedded systems are those systems where a computer system is embedded as part of a complete device, often including hardware and mechanical parts [3]. Most embedded systems have a dedicated function, often subject to timing constraints. In such systems, due to resource limitations, efficient use of resources such as memory, the CPU, and communication equipment is a key point in the design of the system.

Once an embedded system includes real-time requirements, the correctness of a computation alone is not enough; the response time of the computation is also important. In other words, to fulfill the functionality of the system, the response should be produced within a specific time, referred to as the deadline. Deadlines are inherent in the application requirements and, depending on those requirements, could be either hard or soft. For example, in the automotive industry the deadline associated with the braking system is a hard deadline, i.e., the corresponding computations absolutely need to be performed before the deadline.

One of the principal solutions for system designers when it comes to satisfying real-time requirements is scheduling. The scheduler specifies the order of execution of the tasks such that they can meet their deadlines. In real-time systems, to ensure that a given task set is able to meet its deadlines, two offline operations are performed: (i) worst-case execution time analysis, and (ii) schedulability analysis. In the former, the worst-case execution time of each task is calculated, while in the latter, the worst-case timing behavior of the system under a particular scheduling mechanism is examined, producing a yes/no answer for the schedulability of the considered task set.

Fixed Priority Scheduling (FPS) is a well-known scheduling mechanism employed in a large number of real-time systems. The schedulability analysis of FPS was proposed in [4]. Earliest Deadline First (EDF) is another well-known scheduling algorithm, under which, on a uniprocessor, all tasks of a given workload meet their deadlines as long as the total utilization of the workload is not greater than one [5].
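As a minimal sketch of the two tests mentioned above, the following Python code checks a periodic task set under EDF using the utilization condition and under FPS using the classical response-time recurrence. It assumes a single preemptive processor, independent tasks, deadlines equal to periods, and rate-monotonic priorities for FPS; the `Task` type and the function names are illustrative and are not taken from [4] or [5].

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Optional


@dataclass(frozen=True)
class Task:
    wcet: int    # worst-case execution time C_i
    period: int  # period T_i (deadline assumed equal to the period)


def edf_schedulable(tasks: List[Task]) -> bool:
    """EDF utilization test: schedulable if the total utilization is <= 1."""
    return sum(t.wcet / t.period for t in tasks) <= 1.0


def fps_response_time(tasks: List[Task], index: int) -> Optional[int]:
    """Response time of tasks[index]; tasks must be sorted by decreasing priority.

    Iterates R = C_i + sum_j ceil(R / T_j) * C_j over the higher-priority
    tasks j until a fixed point is reached. Returns None if R exceeds the
    deadline (assumed equal to the period).
    """
    task = tasks[index]
    response = task.wcet
    while True:
        interference = sum(ceil(response / hp.period) * hp.wcet
                           for hp in tasks[:index])
        updated = task.wcet + interference
        if updated > task.period:
            return None
        if updated == response:
            return response
        response = updated


def fps_schedulable(tasks: List[Task]) -> bool:
    """Yes/no answer for FPS with rate-monotonic priorities (an assumption)."""
    ordered = sorted(tasks, key=lambda t: t.period)
    return all(fps_response_time(ordered, i) is not None
               for i in range(len(ordered)))
```

For instance, a task set with (C, T) pairs (2, 4) and (3, 6) has a total utilization of exactly one, so it passes the EDF test, while the response-time recurrence for the second task exceeds its period under rate-monotonic FPS.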

For multi-processor scheduling, there are two main approaches, each with its own advantages and disadvantages.

1. Partitioned scheduling, where the given task set is first divided among the processors (i.e., task allocation), and the subset of tasks dedicated to each processor is then scheduled by a uniprocessor scheduling algorithm such as FPS or EDF [6].

Figure 2.1: Partitioned scheduling scheme.

Partitioned scheduling enables us to employ well-known uniprocessor schedulers along with their schedulability analysis. Additionally, partitioned scheduling has a low run-time complexity, since task migration is not allowed. Nevertheless, it adds a new challenge to the system design, i.e., the task allocation problem. If a given task set is not properly partitioned among the processors, the processors cannot be utilized efficiently. Accordingly, a good task allocation is key to achieving resource efficiency in multi-processor systems under partitioned scheduling. Unfortunately, the task allocation problem is known to be NP-hard [7]; thus, no polynomial-time algorithm is known for finding the optimal allocation of tasks to processors.

2. Global scheduling, where a single shared queue is employed to schedule the tasks instead of the multiple dedicated queues used in partitioned scheduling [6]. In this case, different jobs of a task may execute on different processors. Global EDF is one of the well-known global schedulers; however, it is not optimal, i.e., a task set that is not schedulable under Global EDF may be schedulable under other algorithms. Proportionate fair (pfair) is a global scheduling algorithm that is optimal for periodic tasks with implicit deadlines [8]; however, it requires many preemptions and migrations, which impose a considerable overhead.

Figure 2.2: Global scheduling scheme.

In most parts of this thesis, particularly those dealing with real-time scheduling for multi-core embedded processors, the partitioned scheduling approach is adopted. Consequently, besides processor scheduling, the task allocation problem is also taken into account.
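To make the task allocation problem concrete, the sketch below partitions tasks among cores with the first-fit decreasing heuristic. This is a generic bin-packing heuristic chosen only for illustration, not the allocation method developed in this thesis. It assumes that each core is scheduled by EDF, so that a core can accept tasks up to a total utilization of one; the function name and the `capacity` parameter are hypothetical.

```python
from typing import Dict, List, Optional


def first_fit_decreasing(utilizations: Dict[str, float],
                         num_cores: int,
                         capacity: float = 1.0) -> Optional[List[List[str]]]:
    """Assign tasks to cores; returns one task list per core, or None on failure."""
    cores: List[List[str]] = [[] for _ in range(num_cores)]
    load = [0.0] * num_cores
    # Consider the heaviest tasks first.
    for name in sorted(utilizations, key=utilizations.get, reverse=True):
        for core in range(num_cores):
            # Small tolerance guards against floating-point rounding.
            if load[core] + utilizations[name] <= capacity + 1e-9:
                cores[core].append(name)
                load[core] += utilizations[name]
                break
        else:
            return None  # the heuristic found no core for this task
    return cores
```

Since the heuristic is not optimal, a `None` result does not prove that no feasible partition exists; it only means that this particular heuristic did not find one.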

We now look at the multi-core architecture, which lets us elaborate the discussion about resource efficiency on multi-core embedded processors.

2.1.1 Multi-Core Processor Architecture

A multi-core processor hosts multiple processing cores on a single die, where shared memory is typically used for inter-core communication. One of the common multi-core architectures used in a wide variety of embedded real-time applications [9] employs a three-level cache hierarchy, in which not only main memory but also multiple levels of the cache are shared among the cores. In such architectures each core has its own private L1 cache, a second-level (L2) cache is shared by each pair of cores, and a third-level (L3) cache is shared among all cores. It is difficult to characterize the latency values with precise numbers, but in general the L2 cache latency is about two to three times the L1 cache latency, the L3 cache latency is roughly ten times the L1 cache latency, and the RAM latency is two orders of magnitude larger than the L1 cache latency [10].

Figure 2.3 shows a sample of such an architecture with four processing cores. It should be mentioned that there are other types of shared-cache processors, and this figure only shows one example. For instance, in another variation of the three-level cache architecture, the L2 cache is also private to each core rather than shared by a pair of cores (e.g., Intel Core i7).

Figure 2.3: A three-level shared-cache quad core architecture.


Figure 2.4: AUTOSAR software architecture. [Figure labels: Application Layer; AUTOSAR Run-Time Environment (RTE); Basic Software components and services; OS; Complex Driver Layer; Hardware (CPU Core 1, 2, ..., n).]

2.1.2 The AUTOSAR standard

AUTOSAR is a software standard for developing component-based applications that is now widely adopted in automotive systems. An AUTOSAR-based application consists of a set of loosely coupled Software Components (SWCs). Each SWC specifies its input and output ports, and an SWC can only communicate through these ports. AUTOSAR provides an abstract communication mechanism called the Virtual Functional Bus (VFB). The VFB not only allows for a strict separation between applications and infrastructure, but also ensures that inter-SWC communication is performed in a standard way. It conceptually makes an SWC independent of the underlying hardware architecture of the Electronic Control Unit (ECU). All services demanded by the SWCs are provided by the AUTOSAR Run-Time Environment (RTE). Indeed, application SWCs are conceptually located on top of the RTE. The RTE is generated specifically for every ECU. The RTE itself uses the AUTOSAR Operating System (OS) and Basic Software (BSW). The VFB functionality is also implemented by the RTE for each ECU. The RTE is also responsible for mapping the corresponding system calls to the BSW modules.

Figure 2.4 depicts this architecture. The BSW provides a standardized, highly configurable set of services, such as communication over various physical interfaces, NVRAM access, and management of run-time errors. The BSW forms the biggest part of the standardized AUTOSAR environment [11].


Multi-core support in AUTOSAR is still optional. In versions earlier than 4.0, there was no support for multi-core ECUs. In version 4.0, all basic software components had to be located on a single core, creating a bottleneck in the system. AUTOSAR version 4.2 suggests copying or moving some basic software components to other processing cores to increase the parallelism of the system. The configuration of the basic software on different cores is outside the scope of this thesis and remains a concern of the RTE designers. However, these design decisions affect the cost of allocating SWCs to cores, and thus steer the search towards allocation solutions in which SWCs that require a specific BSW module are allocated to the core on which that module is located.

SWCs have inter-communication relationships that are assumed to be based on non-blocking read/write semantics [12]. To exchange data among SWCs located on the same task, InterRunnable Variables (also called local labels) are used, which are read and written by the SWCs. SWCs located on different tasks have to use RTE mechanisms, e.g., the Sender-Receiver mechanism, to transfer data. Indeed, reading and writing inter-task labels is managed by the RTE. SWCs located on different tasks typically follow read-execute-write semantics, a common pattern in AUTOSAR that is also called implicit access [13], where a local copy of the inter-task label is created for data access and the modified data is written back when the task execution terminates.
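The read-execute-write (implicit access) pattern described above can be illustrated with a small Python sketch. It is a conceptual model only and does not use any AUTOSAR or RTE API; the dictionary of labels, the label names and the function names are all hypothetical.

```python
from typing import Callable, Dict

# Shared inter-task labels, standing in for RTE-managed data (hypothetical names).
inter_task_labels: Dict[str, int] = {"wheel_speed": 40, "brake_torque": 0}


def run_task_implicit(body: Callable[[Dict[str, int]], None],
                      shared: Dict[str, int]) -> None:
    """Run one task instance with read-execute-write semantics."""
    snapshot = dict(shared)   # read: copy the labels at task start
    local = dict(snapshot)
    body(local)               # execute: the task only touches its local copy
    for key, value in local.items():
        # write: publish only the labels the task modified, at termination
        if key not in snapshot or snapshot[key] != value:
            shared[key] = value


def brake_controller(labels: Dict[str, int]) -> None:
    """Example task body: derives a torque request from the wheel speed."""
    labels["brake_torque"] = labels["wheel_speed"] * 2
```

Calling `run_task_implicit(brake_controller, inter_task_labels)` leaves the shared labels untouched while the body runs and publishes `brake_torque` only once the body has returned.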

In an AUTOSAR application, three types of communication are often taken into account: the first and second types cover inter-SWC communication, while the third covers the interaction between SWCs and BSW modules. The first type covers data dependencies between SWCs, where an SWC has to start running with fresh data generated by its predecessors; in other words, there is a precedence relation in their execution order. In the second type, a pair of SWCs exchange data but the freshness of the data does not matter, or at least the maximum age of data [14] is acceptable as long as all SWCs complete within their periods; in other words, there is no precedence relation in their execution order. The third type represents the communication cost originating from SWCs communicating with BSW modules.

The data dependency between the SWCs (the first communication type) is reflected by a set of transactions, each of which represents an end-to-end function implemented by a sequence of SWCs. Indeed, each transaction is a directed acyclic graph in which each node is an SWC and each link represents a data dependency. To fulfill the mission of a transaction, the fresh data generated by the latest instance of the predecessors should be provided. Figure 2.5 shows a sample of a transaction. Without loss of generality we can assume that all SWCs are covered by at least one transaction; if an SWC is not included in any transaction, we introduce a new dummy transaction covering only this SWC.

Figure 2.5: A sample of a transaction.

The transaction Γi has a relative end-to-end deadline, before which all runnables of the transaction must finish their execution. The transaction deadline corresponds to one of the following:

• The deadline of the mission associated with that transaction. For example, the mission could be the braking system in a car, where the whole transaction must complete before a specific end-to-end deadline. This is the case when the whole mission is supposed to run within a single multi-core ECU and the transaction outputs are sent to a system output (an actuator, for instance).

• A partial deadline, when only a portion of a mission is covered by this transaction running on a single multi-core ECU and the other parts of the mission are executed by other ECUs in the system. In this case, we assume that the system designers have defined a partial deadline for the transaction within this ECU (deadline decomposition [15]), meaning that if this ECU completes the transaction before the partial deadline and provides the output data for transmission to other ECUs on time, then the whole mission is able to meet its deadline.
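A transaction and its end-to-end deadline can be sketched as follows. The code represents a transaction as a mapping from each SWC to its predecessors and estimates the end-to-end latency as the longest path through the graph. This is a deliberately simplified assumption: it supposes that every SWC starts as soon as all its predecessors have finished and that a response time per SWC is already known, whereas a real end-to-end analysis also has to account for periods, sampling delays and scheduling. All names are illustrative.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List


def end_to_end_latency(predecessors: Dict[str, List[str]],
                       response_time: Dict[str, float]) -> float:
    """Longest path through the transaction DAG, weighted by response times."""
    finish: Dict[str, float] = {}
    # static_order() yields every SWC after all of its predecessors and
    # raises graphlib.CycleError if the graph is not acyclic.
    for swc in TopologicalSorter(predecessors).static_order():
        start = max((finish[p] for p in predecessors.get(swc, [])), default=0.0)
        finish[swc] = start + response_time[swc]
    return max(finish.values())


def meets_deadline(predecessors: Dict[str, List[str]],
                   response_time: Dict[str, float],
                   deadline: float) -> bool:
    """Check the (simplified) latency against the transaction deadline."""
    return end_to_end_latency(predecessors, response_time) <= deadline
```

The same check applies to both cases listed above; only the value passed as `deadline` differs (the mission deadline or the partial deadline assigned to this ECU).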

Task Synchronization Mechanisms

Regarding mutually exclusive access to resources, AUTOSAR recommends using the Priority Ceiling Protocol (PCP) [16] for intra-core task synchronization. AUTOSAR provides a function called GetResource(), which uses PCP. The main idea of PCP is to ensure that when a task preempts a critical section of another task and executes its own critical section, the priority at which this new critical section executes is guaranteed to be higher than the inherited priorities of all the preempted critical sections. This not only bounds the blocking time, but also significantly reduces priority inversion: PCP ensures that priority inversion lasts no longer than a single resource-holding period of a lower-priority task. However, this mechanism does not scale to multi-core processors, since priorities are insufficient to prevent access from tasks executing on other cores.
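The blocking bound stated above (at most one critical section of a lower-priority task) can be sketched as follows. The code computes the ceiling of each resource and the resulting worst-case blocking of a task on a single core. It assumes that larger numbers denote higher priorities and that the length of every critical section is known; the data layout and function names are hypothetical and unrelated to the AUTOSAR API.

```python
from typing import Dict, List, Tuple

# task name -> list of (resource, critical-section length)
CriticalSections = Dict[str, List[Tuple[str, int]]]


def resource_ceilings(priority: Dict[str, int],
                      sections: CriticalSections) -> Dict[str, int]:
    """Ceiling of a resource = highest priority among the tasks that use it."""
    ceilings: Dict[str, int] = {}
    for task, uses in sections.items():
        for resource, _ in uses:
            current = ceilings.get(resource, priority[task])
            ceilings[resource] = max(current, priority[task])
    return ceilings


def pcp_blocking(task: str, priority: Dict[str, int],
                 sections: CriticalSections) -> int:
    """Longest critical section of a lower-priority task that can block `task`."""
    ceilings = resource_ceilings(priority, sections)
    candidates = [length
                  for other, uses in sections.items()
                  if priority[other] < priority[task]
                  for resource, length in uses
                  if ceilings[resource] >= priority[task]]
    return max(candidates, default=0)
```

A critical section counts as a candidate only if its resource ceiling is at least the priority of the task under analysis; the maximum over the candidates reflects that the task is blocked at most once.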

AUTOSAR provides a SpinlockType mechanism for inter-core task synchronization, which addresses the shortcomings of PCP on multi-core processors. It is a busy-waiting mechanism that polls a lock variable until it becomes available. An atomic test-and-set operation is used to poll the lock. Once a lock variable is obtained by a task, tasks on other cores are not able to obtain it until it is released. This mechanism works properly on multi-core processors, where the memory shared between the cores can be used to implement the lock variables.

In the context of non-nested spinlocks, the SpinlockType mechanism could potentially lead to deadlocks, as noted in the standard specification itself. A deadlock happens when a lower-priority task holding a resource protected by a SpinlockType is preempted by a higher-priority task that later tries to acquire the same SpinlockType [17]. To avoid this problem, AUTOSAR version 4.2.2 recommends using the SuspendAllInterrupts function to disable preemption during the execution of a Global Critical Section (GCS). This solution not only avoids deadlocks and starvation, but also reduces the amount of remote blocking suffered during multi-core synchronization. Its disadvantage, however, is that even higher-priority tasks that require a different shared resource suffer from the non-preemptive execution of the critical section on a resource R. Basically, the SpinlockType mechanism increases the execution time of a task by the busy-waiting time needed to access a global shared resource. The busy-waiting time is also called spinlock time.
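The busy-waiting behaviour described above can be sketched in Python as follows. This is a conceptual model, not the AUTOSAR spinlock API: Python threads stand in for tasks on different cores, and a non-blocking `threading.Lock.acquire` call stands in for the atomic test-and-set instruction. The `SpinLock` class and the `spins` counter are assumptions made for illustration.

```python
import threading
from typing import List


class SpinLock:
    """Busy-waiting lock; a non-blocking acquire stands in for test-and-set."""

    def __init__(self) -> None:
        self._flag = threading.Lock()
        self.spins = 0  # failed polls: a rough, unsynchronized spin-time indicator

    def acquire(self) -> None:
        while not self._flag.acquire(blocking=False):  # atomic test-and-set
            self.spins += 1                            # busy-wait, then poll again

    def release(self) -> None:
        self._flag.release()


def worker(lock: SpinLock, counter: List[int], rounds: int) -> None:
    for _ in range(rounds):
        lock.acquire()
        counter[0] += 1   # global critical section on the shared counter
        lock.release()


def run(rounds: int = 200, workers: int = 2) -> int:
    """Start the workers, wait for them, and return the final counter value."""
    lock, counter = SpinLock(), [0]
    threads = [threading.Thread(target=worker, args=(lock, counter, rounds))
               for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter[0]
```

The time a worker spends in the `while` loop of `acquire` corresponds to the spinlock time, which adds directly to the execution time of the waiting task.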

2.1.3 Key Challenges

One of the major resources in multi-core embedded processors is the CPU. A component-based application contains a set of software components exchanging a large number of messages. This is also the case for an AUTOSAR-based application, where runnables send many messages to each other. The inter-SC messages impose a CPU overhead on both the sender and receiver tasks. If the mapping of SCs to tasks and the allocation of tasks to cores are not done efficiently, a considerable portion of the CPU time is spent on communication. This also results in a substantial memory and cache overhead for inter-core communication.
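The dependence of the communication overhead on the mapping can be illustrated with a simple cost function. The three per-message costs below are hypothetical placeholder values, not measurements, and the cost structure (same task, same core, different cores) is an assumption made for illustration rather than the communication model proposed in this thesis.

```python
from typing import Dict, Tuple

# Illustrative per-message costs (hypothetical values, not measurements).
COST_SAME_TASK = 1     # exchange through a local label
COST_SAME_CORE = 5     # inter-task communication on one core
COST_CROSS_CORE = 20   # inter-core communication


def communication_cost(messages: Dict[Tuple[str, str], int],
                       task_of: Dict[str, str],
                       core_of: Dict[str, int]) -> int:
    """Total cost of a mapping; messages maps (sender SC, receiver SC) to a count."""
    total = 0
    for (sender, receiver), count in messages.items():
        sender_task, receiver_task = task_of[sender], task_of[receiver]
        if sender_task == receiver_task:
            unit = COST_SAME_TASK
        elif core_of[sender_task] == core_of[receiver_task]:
            unit = COST_SAME_CORE
        else:
            unit = COST_CROSS_CORE
        total += count * unit
    return total
```

With such a function, two candidate mappings of the same SCs can be compared directly: placing the most talkative pair of SCs on the same task lowers the total, while separating them across cores raises it.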


2.2 Cloud Computing Systems

Cloud computing is a new generation of distributed systems that inherits several paradigms from previous computing models, such as mainframe computing, cluster computing and grid computing. Let us look at the evolution of cloud computing.

• In the mainframe computing model, multiple users were able to access a computer that was powerful for its time through terminals. Since purchasing and maintaining a mainframe was extremely expensive, it was not feasible for an organization to buy one mainframe for every employee. Therefore, by sharing computing resources, mainframes provided a cost-effective, practical solution.

• In cluster computing, a group of computers connected by a Local Area Network (LAN) forms a cluster, which is similar to the notion of cloud data centers. However, a cluster differs from a cloud or a grid, since clouds and grids are larger in scale and can be geographically distributed. In other words, a cluster is tightly coupled, whereas a grid or a cloud is loosely coupled.

• Grid computing shares many features with cloud computing. The major differences are that (i) utility computing is not included in the definition of grid, and (ii) a cloud can provide software services, whereas a grid only provides infrastructure services.

Cloud computing services are categorized into three forms of service: Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS). In the SaaS form, a cloud provider offers a software application as an on-demand service, which users access through a subscription in a pay-as-you-go model. Dropbox and Google Docs are examples of this form of cloud service. In the PaaS model, a platform for developing web applications is provided, and developers can benefit from the facilities already in place to quickly create their own applications. PaaS makes the development, testing, and deployment of applications simple and cost-effective. Apprenda is an example of this form. In the IaaS form, computing and communication hardware resources (such as servers, storage, network and operating systems) are provided as on-demand services in a pay-as-you-go model [18]. In this self-service model, users can create their own Virtual Machines (VMs) and specify the required processing power of the VMs and networking services (e.g., firewalls). Microsoft Azure and Amazon Web Services are examples of this category of cloud services. Figure 2.6 depicts these three categories in a cloud computing system. In this work, we mainly focus on SaaS and IaaS, as they are the most popular forms of cloud services.

Figure 2.6: Cloud computing layers (Software as a Service, Platform as a Service, Infrastructure as a Service).

A cloud computing system includes one or more data centers, each of which contains a set of servers (hosts) connected through a communication network. The servers in a data-center could be identical, in which case it is called a homogeneous data-center, or they could have different characteristics, in which case it is called a heterogeneous data-center. Each server itself consists of multiple components such as multi-core processors, hard disks handled by RAID controllers, memory modules and I/O devices. Let us look at one of the common topologies of a data-center.

2.2.1 Cloud Data Center Topology

In Figure 2.7, an example of a cloud data-center is illustrated, consisting of multiple inter-connected modules, each of which contains a set of racks and aggregation switches that connect the racks together. A rack itself comprises a set of servers connected to each other through the rack switches.

Figure 2.7: A three-tier data-center network topology.

2.2.2 Energy Consumption of a Data Center

Data centers account for around 1.5% to 2% of the total energy consumption in the US, and around 1.5% worldwide [19, 20]. Three main elements contribute to the energy consumption of a data-center: (i) the cooling and power distribution system, (ii) computing servers, and (iii) network equipment. This thesis mainly targets the energy consumption of servers; however, consolidating the workload on a minimum number of servers potentially decreases the energy consumption of the network devices and the cooling system as well.

The power consumption of a server consists of the power consumed by the CPU, memory, disk storage and network interfaces. It is shown in [21] that the power consumption of the CPU is one of the influential factors in the overall power consumption of a host. Accordingly, the power consumption of a host can be estimated from the CPU utilization using the linear model [22] defined in Eq. 2.1.

\[ P(U) = k \times P_{\max} + (1 - k) \times P_{\max} \times U \tag{2.1} \]

where P(U) is the power consumption of a host when its CPU utilization is U, Pmax is the maximum power of a fully utilized host, and k is the fraction of power consumed by an idle host. It is cost-effective to turn a host off when its utilization is equal to zero. It should be noted that a host still consumes energy when it is turned off. Thus, to provide a more accurate energy model, the off-consumption (i.e., the consumption of a plugged-in host when it is off) should be taken into consideration. It is observed in [23] that the off-consumption is 15% of the idle consumption. Hence, under the assumption that a server is always turned off whenever it is idle (utilization equal to zero), Eq. 2.2 models the power consumption of the ith host.

\[
P_i(U) =
\begin{cases}
k \times P_i^{\max} + (1 - k) \times P_i^{\max} \times U_i & U_i > 0 \\
0.15 \times P^{\mathrm{idle}} & U_i = 0
\end{cases}
\tag{2.2}
\]
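A minimal sketch of this power model is given below. It assumes that the idle power in Eq. 2.2 equals k times the maximum power, which follows from Eq. 2.1 evaluated at zero utilization. The function and parameter names are illustrative, and the default off-consumption fraction of 0.15 is the value reported in [23].

```python
def host_power(utilization: float, p_max: float, k: float,
               off_fraction: float = 0.15) -> float:
    """Power of one host following the two cases of Eq. 2.2.

    utilization  -- CPU utilization U_i in [0, 1]
    p_max        -- power of the fully utilized host
    k            -- fraction of p_max consumed by an idle host
    off_fraction -- off-consumption as a fraction of the idle consumption
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must lie in [0, 1]")
    p_idle = k * p_max  # Eq. 2.1 evaluated at U = 0
    if utilization == 0.0:
        # The host is switched off but still plugged in.
        return off_fraction * p_idle
    return p_idle + (1.0 - k) * p_max * utilization
```

For a hypothetical host with `p_max = 200` and `k = 0.5`, the model gives 150 at half load, 200 at full load, and 15 when the host is switched off.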

Power consumption of a server. This power model was originally proposed for a uni-processor; however, most common servers use a multi-core processor. There are three ways to adapt the model to a multi-core processor. The most straightforward way is to consider each core as an independent processor. In this case, the power model could be too pessimistic, as the cores are located on the same chip and therefore their power consumption is significantly lower than that of a multi-processor in which the processors do not share a chip. The second way is to treat a multi-core processor as a single processor with a higher computation capacity and a higher power consumption. In this case, if we assume that the multi-core processor in each server is homogeneous (all processing cores are identical), the total utilization of the workload assigned to the server can be divided by the number of cores in the server, and Eq. 2.2 can then be applied to calculate the power consumption of the server. This assumes that the workload assigned to the server is uniformly distributed among its processing cores; we therefore need a load-balancing algorithm to assign VMs to the cores of a server in a balanced manner. According to this approach, the power consumption of a server is derived from the server utilization, which is calculated by Eq. 2.3:

\[ U_j(X) = \frac{\sum_{i=1}^{N} u_i X_{i,j}}{m} \tag{2.3} \]

where m and N denote the number of cores and the number of software applications respectively, ui denotes the utilization demanded by the ith VM, the set X refers to a particular VM placement, and Xi,j denotes the existence of the ith VM on the jth server: if VM i is placed on server j, then Xi,j = 1; otherwise, Xi,j = 0. In addition, Uj(X) denotes the utilization of the jth host under the placement X. The third power model considers the load of each processing core and calculates the total power consumption of the multi-core chip from the individual core loads, which makes it a more sophisticated power model.
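The server utilization of Eq. 2.3 can be computed as in the sketch below. The placement X is represented as a 0/1 matrix with one row per VM and one column per server; this representation and the function name are assumptions made for illustration.

```python
from typing import List


def server_utilization(vm_utilization: List[float],
                       placement: List[List[int]],
                       server: int,
                       cores: int) -> float:
    """U_j(X) = (sum over i of u_i * X[i][j]) / m for server j with m cores."""
    total = sum(u * placement[i][server]
                for i, u in enumerate(vm_utilization))
    return total / cores
```

For example, with three VMs of utilization 0.8, 1.2 and 0.4, the first two placed on server 0 and the third on server 1, and four cores per server, the utilization is 0.5 for server 0 and 0.1 for server 1.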

2.2.3 Energy Consumption in a Cloud Federation

A cloud federation consists of multiple data-centers, which could be geographically distributed and are connected to each other to provide an integrated cloud computing system. Most often, high-bandwidth communication lines are dedicated to connecting these data-centers. In a cloud federation, the cost of executing applications and the overall energy consumption depend on the data-center (also called site) chosen to provide the services. Therefore, a proper allocation of the workload to cost-effective sites can significantly reduce both the energy consumption and the cost of providing services.

2.2.4 Key Challenges

In a cloud computing system, we mainly concentrate on minimizing the number of required servers by consolidating the workload on the minimum number of servers, as long as real-time and other system constraints can be satisfied. Consolidating the workload makes it possible to reduce the energy consumption of a data-center by switching idle servers into sleep mode. Energy consumption contributes significantly to the cost of operation [24] of a cloud data-center. A higher cost of operation results in a higher cost of services for cloud customers. Moreover, higher energy consumption has severe negative environmental impacts.

Chapter 3

Research Summary

3.1 Problem Statement and Research Goals

The overall goal of the thesis is to achieve resource-efficient implementation of multi-processor systems in the presence of real-time constraints. Various types of resources need to be handled in order to achieve such a goal.

Two categories of multi-processor systems that exist in the context of Industrie 4.0 are highlighted. The first category represents systems hosting an embedded multi-core processor on which a component-based real-time application is run. The second category represents cloud computing systems, which provide a large pool of computing and storage resources composed of a set of distributed servers connected through a communication network. Both categories share the goal of minimizing resource consumption as long as all real-time and other system requirements are met. Reducing the required resources makes it possible to serve a bigger workload and to minimize energy consumption.

3.1.1 Research Themes

Within each category (embedded multi-core and cloud data center), a research theme is outlined with corresponding research questions.


Research theme 1: Optimizing the CPU usage in an embedded multi-core platform by efficient mapping of software components.

Along this theme we mainly focus on a homogeneous embedded multi-core system running a component-based real-time application. In order to optimize the CPU usage, communication between software components is considered the prominent hurdle to fully utilizing the CPU in an embedded multi-core platform. Therefore, the problem can be reformulated as minimizing the inter-SC communication costs.

Research theme 2: Minimizing energy consumption for execution of real-time applications in a cloud computing system.

In this theme, among various techniques to reduce the energy consumption of cloud data centers, we adopt workload consolidation mechanisms, where it is desirable to use a minimum number of servers for hosting real-time applications while making sure that all applications are able to meet their deadlines.
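Workload consolidation can be viewed as a bin-packing problem. The sketch below uses the classic first-fit decreasing heuristic as a stand-in; it is not the allocation algorithm of the included papers. The per-server `capacity` is a hypothetical bound that is assumed to encode the timing constraint (for instance, a utilization limit below which all deadlines are met).

```python
from typing import List


def first_fit_decreasing(demands: List[float], capacity: float = 1.0) -> List[List[float]]:
    """Pack utilization demands onto as few servers as the heuristic finds.

    Each inner list holds the demands placed on one server. The capacity
    is an assumed schedulability bound, not a value from the thesis.
    """
    servers: List[List[float]] = []
    for demand in sorted(demands, reverse=True):
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds server capacity {capacity}")
        for server in servers:
            # Small tolerance guards against floating-point rounding.
            if sum(server) + demand <= capacity + 1e-9:
                server.append(demand)
                break
        else:
            # No powered-on server has room, so switch on another one.
            servers.append([demand])
    return servers


demands = [0.5, 0.7, 0.3, 0.2, 0.3]
placement = first_fit_decreasing(demands)
active_servers = len(placement)
```

Every server that does not appear in `placement` can stay in sleep mode, which is where the energy saving comes from.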

3.1.2 Research Questions

Based on the above-mentioned research themes and the overall goal, we now list the related research questions. Note that the first two questions are specific to the first and second research theme, respectively, while the third and fourth questions are shared between them.

Q1: How can we model the inter-software component communication cost, independent of the underlying hardware architecture, to pose a general communication model?

Q2: How can the energy consumption be modeled alongside the timing requirements in a cloud computing system?

Q3: How can each of the proposed models be presented as an optimization problem (either single-objective or multi-objective), and afterward, how can they be solved effectively, in the sense that the optimization mechanism has a strong chance of finding a global optimum, or at least ensures that the deviation from the global optimum is not significant?

Q4: How can we set up real-world benchmarks to evaluate the efficiency of the proposed solutions; in other words, what is their expected efficiency in practice?

3.2 Research Methodology

A research methodology similar to what is proposed by [25] is applied in this research work. Figure 3.1 depicts the different steps of the research methodology along with the transitions between the steps. We start with specifying research goals. Since the research goals in this research can broadly cover all types of multi-processor systems, we introduce two research themes associated with embedded multi-core processors and cloud data centers, respectively. The research goals related to the first research theme have mainly come from practical challenges of our industrial partners (ABB Corporate Research and Volvo Trucks), while the research goals related to the second research theme have mainly come from a comprehensive survey of the state of the art based on current trends in the cloud community, as well as discussions with experts from both academia and industry.

In the next step, once the research goal(s) are defined, we formulate the research questions. As mentioned, a subset of the research questions is shared between both research themes (Q3, Q4), while the rest are dedicated to one of the research themes (Q1, Q2). Based on the research questions, we study the literature to identify which parts of the challenges can be addressed using existing solutions, and which parts are open challenges.

The next phase is to propose solution(s) for the open challenges to answer the research questions. For most of the research questions, more than one solution is presented in this thesis, such that each solution refines and elaborates the previous solution introduced for the same research question.

The implementation phase mainly relies on the use of simulation. It is worth noting that during the implementation process we sometimes discover drawbacks or shortcomings in either the proposed algorithm or the system model; in this case, we go back to the proposed model or algorithm, revise it, and then resume the implementation process. Additionally, depending on the results of the evaluation, we refine the proposed solution and revisit the later steps. This iteration is repeated until the results are satisfactory.

[Flowchart with the steps: Identify Research Goals; Formulate Research Questions; Specify Research Themes (Research Theme 1 and 2); Study Related Works; Present a Solution; Implement the Solution; Evaluate the Solution.]

Figure 3.1: Research methodology.

Chapter 4

Contributions and Discussion

4.1 Technical Contribution

In the following, we discuss the research works reflected in this thesis, along with the technical contributions of the thesis, in order to answer the stated research questions. At the same time, the relation between the research papers and the research questions is indicated.

To address the first research theme (using multi-core embedded systems to run AUTOSAR-based applications), we started with a simple model of the communication between software components located on different tasks and on different cores, thereby addressing research question Q1. Furthermore, as a simplification, we assumed that there are no dependent transactions (no shared runnables), or that if a runnable is shared between multiple transactions, it does not contain internal state, meaning that the shared runnable can be replicated and different replicas are allowed to execute at the same time. The problem is then modelled as an integer linear optimization problem (research question Q3). Then the relation between the process of mapping software components to tasks and the process of allocating tasks to cores is discussed, and it is demonstrated how these two processes should be interleaved to achieve a more efficient solution (Paper B). The paper shows that performing these two steps sequentially yields a less efficient solution than performing the allocation of tasks to cores iteratively, with feedback from the mapping process. In other words, the mapping algorithm is invoked in every iteration of the search algorithm during the exploration of the search space to increase the chance of achieving a global optimum.

We then continue to refine both the communication model and the solution framework (Paper A). A three-level shared-cache processor is taken into account, which introduces more scenarios for communication between a pair of runnables located on different cores. This addresses research question Q1 in more detail. The goal function concentrates on the CPU utilization rather than the total inter-runnable communication cost; in other words, the inter-runnable communication time is translated into CPU utilization. This translation allows us to consider a trade-off between (i) the increase of the CPU utilization as a consequence of merging two tasks with different periods, and (ii) the reduction of the communication cost as a result of merging the two communicating tasks. The trade-off raises the chance of converging to a more efficient solution. Before the communication time was translated into CPU utilization, in contrast, we could merge only communicating tasks having the same period. As mentioned, the method for dealing with the optimization problem also became more mature. A parallel heuristic algorithm is introduced that runs on multiple cores to explore a greater part of the search space and thereby improve the chance of finding a global optimum.
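The trade-off between (i) and (ii) can be illustrated with a deliberately simplified model. The sketch assumes that a merged task runs at the shorter of the two periods, so the slower runnables are executed more often than needed, and that the communication time is charged once per activation of the faster task. Both assumptions, and all function and parameter names, are hypothetical and are not the model of Paper A.

```python
def utilization_separate(c1: float, t1: float, c2: float, t2: float, comm_time: float) -> float:
    """CPU utilization when two communicating tasks stay separate.

    c1, c2 are execution times and t1, t2 are periods. The communication
    time is translated into utilization by charging it once per activation
    of the faster task (an assumption made for this sketch).
    """
    return c1 / t1 + c2 / t2 + comm_time / min(t1, t2)


def utilization_merged(c1: float, t1: float, c2: float, t2: float) -> float:
    """CPU utilization when both tasks are merged into one task.

    The merged task is assumed to run at the shorter period, so the
    communication cost disappears but the slower part is over-sampled.
    """
    return (c1 + c2) / min(t1, t2)


def should_merge(c1: float, t1: float, c2: float, t2: float, comm_time: float) -> bool:
    """Merge only if it lowers the total CPU utilization."""
    return utilization_merged(c1, t1, c2, t2) < utilization_separate(c1, t1, c2, t2, comm_time)


# Different periods, cheap communication: over-sampling outweighs the saving.
cheap_comm = should_merge(c1=1.0, t1=10.0, c2=1.0, t2=20.0, comm_time=0.2)
# Different periods, expensive communication: merging pays off.
costly_comm = should_merge(c1=1.0, t1=10.0, c2=1.0, t2=20.0, comm_time=0.8)
# Equal periods: merging removes the communication cost without over-sampling.
same_period = should_merge(c1=1.0, t1=10.0, c2=1.0, t2=10.0, comm_time=0.2)
```

Expressing both effects in the same unit (CPU utilization) is what makes the comparison in `should_merge` possible; with a pure communication-cost objective, the over-sampling penalty would be invisible.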

To make the solution framework sufficiently mature, we also consider the communication cost between the runnables and basic software components (Paper C). For this purpose, we took a deeper look at the details of the underlying AUTOSAR architecture on which application components are executed. Moreover, dependent transactions (shared runnables, possibly with internal states) came into play. In this way, we addressed research question Q3 more precisely. A real-world benchmark, an Engine Management System (EMS), was considered as the test case to indicate the efficiency of the proposed solution framework in a complicated real-world application. The EMS was chosen as the test case since it has a large number of runnables with many inter-runnable communications. This was done to address research question Q4.

The same challenges as discussed above can be found in the context of cloud computing, which is fundamentally very similar to a multi-core system in terms of the interconnected processors on which a set of communicating application components are executed. Here again, we have timing constraints associated with the application components. However, cloud applications are by nature usually more tolerant of deadline misses; such a system is called a soft real-time system. Another minor difference is the goal of the optimization. For the first theme the goal was to increase the efficiency of the system in terms of CPU usage and inter-component communication cost, whereas in the cloud research we focus on the efficiency of the system in terms of the reduction of energy consumption (green cloud) as the goal function, since energy consumption is a prominent parameter significantly affecting the cost of cloud services.

To address the second research theme, we first integrated energy consumption into a model we already had for the reliability analysis of a cloud data center (research question Q2). We then formulated a multi-objective optimization problem (research question Q3) comprising both the reliability of a data center and the energy consumption of its servers. Some servers of a data center may have a lower hazard rate while consuming more power. Accordingly, the optimization should find the right compromise between the two objectives (Paper E).

We then investigate the problem from a wider perspective, in which a set of geographically distributed data centers composes a cloud, called a cloud federation (Paper D, addressing research questions Q2 and Q3). A cloud federation is an effective solution for running a large number of High Performance Computing Applications (HPCAs) in a cost-efficient manner. The carbon emission, which is another significant factor in the development of a green cloud, is also taken into account. The amount of carbon emission depends on (i) the amount of energy consumed by a data center and (ii) the source of the electrical energy (fossil or renewable) used by the cloud data center. In this way, a multi-objective optimization problem arises that minimizes the cost of energy consumption on the one hand and the carbon emission on the other. Minimizing one of the objectives (cost or carbon emission) does not necessarily minimize the other, since a data center using a renewable source of energy may be located in a state or a country with a higher energy cost. A hierarchical allocation framework was proposed in this article (Paper D), in which the first layer assigns the HPCAs to data centers, while the second layer allocates HPCAs to servers. Here again, the notion of using feedback from the inner layer (allocation of HPCAs to servers) to guide the search algorithm (optimal allocation of HPCAs to data centers) towards a global optimum forms the core of the allocation framework. As mentioned, this notion was first applied in Paper B and then also in Papers A and C.
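The conflict between the two objectives can be made visible with a small Pareto-dominance check. The sketch below is illustrative only: the data center names, energy prices, and carbon intensities are invented, and a plain Pareto filter stands in for the multi-objective optimization used in Paper D.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    energy_cost: float  # monetary cost of the consumed energy
    carbon: float       # kg of CO2 emitted


def evaluate(name: str, energy_kwh: float, price_per_kwh: float, kg_co2_per_kwh: float) -> Candidate:
    """Both objectives scale with the energy used, but with different factors."""
    return Candidate(name, energy_kwh * price_per_kwh, energy_kwh * kg_co2_per_kwh)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a dominates b if it is no worse in both objectives and better in one."""
    no_worse = a.energy_cost <= b.energy_cost and a.carbon <= b.carbon
    better = a.energy_cost < b.energy_cost or a.carbon < b.carbon
    return no_worse and better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical data centers running the same 100 kWh workload.
candidates = [
    evaluate("fossil-cheap", 100.0, 0.05, 0.8),
    evaluate("renewable-pricey", 100.0, 0.12, 0.1),
    evaluate("fossil-pricey", 100.0, 0.15, 0.9),
]
front = pareto_front(candidates)
front_names = [c.name for c in front]
```

Two candidates survive: the cheapest one and the cleanest one. Neither is better in both objectives, which is why a compromise has to be chosen rather than a single optimum computed.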

Finally, an interesting bridge between the first theme and the second theme was created (Paper F, again addressing Q2 and Q3), where the tasks to be executed in a cloud data center not only have deadlines but are also periodic. We showed how real-time applications consisting of a set of periodic tasks can be mapped to Virtual Machines (VMs). The output of the mapping of applications to VMs gives us the VMs' specification in terms of the number of CPUs, the CPU utilization demand of each of the requested CPUs, and the memory demand of each VM. The VMs should then be allocated to servers in the first step, and the VMs allocated to a server should be scheduled on the available cores of the server in the second step.
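As a rough illustration of how a VM specification could be derived from a set of periodic tasks, the sketch below partitions task utilizations onto virtual CPUs with a first-fit heuristic. The class names, the per-CPU bound `cpu_bound`, and the heuristic itself are assumptions made for this example; they are not the mapping procedure of Paper F.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PeriodicTask:
    wcet: float      # worst-case execution time
    period: float    # activation period, same time unit as wcet
    memory_mb: int   # memory demand


@dataclass
class VMSpec:
    cpu_utilizations: List[float]  # utilization demand of each requested CPU
    memory_mb: int                 # total memory demand of the VM

    @property
    def num_cpus(self) -> int:
        return len(self.cpu_utilizations)


def derive_vm_spec(tasks: List[PeriodicTask], cpu_bound: float = 1.0) -> VMSpec:
    """Partition tasks onto virtual CPUs and summarize the result as a VM spec.

    cpu_bound is an assumed per-CPU utilization limit under which the
    tasks on that CPU are taken to be schedulable.
    """
    cpus: List[float] = []
    for task in sorted(tasks, key=lambda t: t.wcet / t.period, reverse=True):
        u = task.wcet / task.period
        if u > cpu_bound:
            raise ValueError("a single task exceeds the per-CPU bound")
        for i, load in enumerate(cpus):
            if load + u <= cpu_bound + 1e-9:
                cpus[i] = load + u
                break
        else:
            # No existing virtual CPU has room, so request one more.
            cpus.append(u)
    return VMSpec(cpus, sum(t.memory_mb for t in tasks))


app = [
    PeriodicTask(6, 10, 64),
    PeriodicTask(5, 10, 32),
    PeriodicTask(2, 10, 16),
    PeriodicTask(4, 20, 16),
]
spec = derive_vm_spec(app)
```

The resulting `spec` carries exactly the three quantities named above: the number of CPUs, the utilization demand per CPU, and the memory demand, which is the information a server-level placement step would consume.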

4.2 Personal Contribution

The research presented in this thesis was done in collaboration with ABB Corporate Research, Sweden, the University of Tehran, Iran, and Mälardalen University, Sweden. I am the main driver of the work and the main author of all included papers except Paper F, where I am the second author. In Paper F, my contributions include (i) formulation of the problem as an optimization problem, (ii) dealing with the task allocation problem, and (iii) collaboration in the writing of the corresponding sections. Prof. Thomas Nolte, Prof. Björn Lisper, Prof. Kristian Sandström, and Dr. Alessandro Papadopoulos are my supervisors and contributed to discussions and reviewing of the work. Dr. Nasser Yazdani, Reza Shojaee and Aboozar Rajabi contributed to discussions of Paper E, and Dr. Nima Khalilzad contributed to the writing of Paper F.

Chapter 5

Conclusions and Future Work

5.1 Summary

In this thesis we address two central areas in the context of resource optimization in multi-processor systems, namely multi-core embedded systems running component-based real-time applications and real-time cloud computing. When it comes to the design of these systems, processor utilization is the main optimization concern in the former area, and energy consumption in the latter. In both areas, we also address real-time aspects inherent in the application requirements, i.e., these systems have requirements on the timeliness of software execution.

When it comes to optimization of these systems, a similar approach is adopted in both areas. The first step is to formulate the problem as an integer linear optimization problem. The second step is to use heuristic algorithms to cope with the complexity of the optimization problem. The final step is to evaluate the proposed solutions through a simulation framework, investigating multiple sizes and variations of system configurations. The results of the methods proposed in this thesis show that these systems can significantly benefit from an optimized design when it comes to both resource utilization and energy consumption.
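As an illustration of what the first step can look like, a generic server-minimization problem may be written as an integer linear program. This is a textbook bin-packing style sketch, not one of the formulations from the included papers; the symbols are assumptions introduced here: $x_{ij} = 1$ if application $i$ is placed on server $j$, $y_j = 1$ if server $j$ is switched on, $u_i$ is the utilization demand of application $i$, and $U_{\max}$ is a per-server bound assumed to guarantee that deadlines are met.

```latex
\begin{align}
\min_{x,\,y} \quad & \sum_{j=1}^{m} y_j \\
\text{s.t.} \quad & \sum_{j=1}^{m} x_{ij} = 1, && i = 1, \dots, n \\
& \sum_{i=1}^{n} u_i \, x_{ij} \le U_{\max} \, y_j, && j = 1, \dots, m \\
& x_{ij} \in \{0, 1\}, \quad y_j \in \{0, 1\}
\end{align}
```

The first constraint places every application on exactly one server, and the second both enforces the capacity bound and forces $y_j = 1$ whenever something is placed on server $j$. Problems of this kind are NP-hard in general, which motivates the heuristic algorithms of the second step.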

5.2 Future Work

Future work can be conducted in several directions. In the rest of this section, four central directions of future work are highlighted:

• The first direction concerns the area of the embedded multi-core platform used to run AUTOSAR-based applications. The support for AUTOSAR on multi-core platforms is still at an early stage. Varying the number of copies of each basic software component and their allocation to cores has the potential to improve the efficiency of the system. As future work, the problem of mapping and allocating application software components can be integrated with the challenge of optimally configuring the basic software components. In other words, not only should the allocation of application software components to cores take the configuration of the basic software into account, but the configuration of the basic software can also change with regard to the allocation of the application software components to cores in order to provide better efficiency.

• Another direction, which is also related to the first area, is to propose a more accurate communication model to measure the inter-software component communication cost. Although we do not want to limit the model to a particular processor architecture, it is still possible to consider more details without relying on a specific architecture. For example, the communication time has been assumed to be linear in the data size, which is usually not the case when the amount of exchanged data is small: exchanging N bytes of data does not take N times as long as exchanging one byte (it is usually faster), since for small N the dominant factor is the memory access time.

• As the real-time cloud is a new concept, there are various directions in which to continue this research work. The first direction is to investigate how cloud computing can be adopted as an efficient and predictable solution for running automation and automotive applications, where a harder level of timing and availability guarantees is required. One of the interesting solutions for addressing such situations is to interleave an intermediate computing layer between a factory and a cloud data center to deal with the latency requirements of hard real-time applications. The intermediate layer can be implemented as a private cloud inside the factory collaborating with a cloud data center. This potentially gives us more predictability when running real-time applications, not only in terms of timing concerns, but also in terms of availability and security of the sensitive data.

• The data center network is a significant parameter affecting both the energy consumption of a data center and the completion time of communicating tasks. We intend to integrate this important factor into our considerations. In this way, the overhead of migrating VMs or tasks (from one physical machine to another) could be specified clearly; this leads to a supplementary model of a real-time cloud data center.
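The non-linearity mentioned in the second direction above can be sketched with an affine cost model, in which a fixed memory access latency is paid once per exchange and only the remainder grows with the data size. The function name and both parameter values are hypothetical and are chosen purely for illustration.

```python
def comm_time_ns(n_bytes: int, access_latency_ns: float = 100.0, ns_per_byte: float = 0.5) -> float:
    """Affine communication-time model: fixed latency plus a per-byte term.

    Both default parameters are assumed values, not measurements.
    """
    if n_bytes <= 0:
        return 0.0
    return access_latency_ns + ns_per_byte * n_bytes


one_byte = comm_time_ns(1)
sixty_four_bytes = comm_time_ns(64)

# A purely linear model would predict 64 times the one-byte time.
linear_prediction = 64 * one_byte
```

With these assumed parameters, exchanging 64 bytes costs far less than 64 times the one-byte cost, because the access latency dominates for small transfers. A model of this shape stays independent of any specific processor architecture while removing the linearity assumption.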

Chapter 6

Overview of the Papers

6.1 Papers with focus on multi-core resource optimization in the context of automotive systems

6.1.1 Paper A

A communication-aware solution framework for mapping AUTOSAR runnables on multi-core systems, Hamid Reza Faragardi, Kristian Sandström, Björn Lisper, Thomas Nolte. In Proceedings of the 19th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), 2014. IEEE Industrial Electronics Society Scholarship Award.

An AUTOSAR-based software application contains a set of software components, each of which encapsulates a set of runnable entities. In fact, the mission of the system is fulfilled through the collaboration between the runnables. Several trends have recently emerged to utilize multi-core technology to run AUTOSAR-based software. The overhead of communication between the runnables is not only one of the major performance bottlenecks in multi-core processors, but also the main source of unpredictability in the system. Appropriate mapping of the runnables onto a set of tasks (called the mapping process) along with proper allocation of the tasks to processing cores (called the task allocation process) can significantly reduce the communication overhead. In this paper, three solutions are suggested, each of which comprises both the mapping and the allocation processes. The goal is to maximize key performance aspects by reducing the overall inter-runnable communication time while satisfying given timing and precedence constraints. A large number of experiments are carried out to demonstrate the efficiency of the proposed solutions.

6.1.2 Paper B

An efficient scheduling of AUTOSAR runnables to minimize communication cost in multi-core systems, Hamid Reza Faragardi, Kristian Sandström, Björn Lisper, Thomas Nolte. In Proceedings of the 7th IEEE International Symposium on Telecommunications (IST), 2014.

The AUTOSAR consortium has developed the worldwide standard for automotive embedded software systems. From a processor perspective, AUTOSAR was originally developed for single-core processor platforms. Recent trends have raised the desire to use multi-core processors to run AUTOSAR software. However, there are several challenges in reaching a highly efficient and predictable design of AUTOSAR-based embedded software utilizing multi-core processors. In this paper a solution framework comprising both the mapping of runnables onto a set of tasks and the scheduling of the generated task set on a multi-core processor is suggested. The goal of the work presented in this paper is to minimize the overall inter-runnable communication cost while meeting all corresponding timing and precedence constraints. The proposed solution framework is evaluated and compared with an exhaustive method to demonstrate the convergence to an optimal solution. Since the exhaustive method is not applicable to large instances of the problem, the proposed framework is also compared with a well-known meta-heuristic algorithm to substantiate the capability of the framework to scale up. The experimental results clearly demonstrate the high efficiency of the solution in terms of both communication cost and average processor utilization.

6.1.3 Paper C

An efficient use of multi-core ECUs to run automotive embedded software, Hamid Reza Faragardi, Kristian Sandström, Björn Lisper, Thomas Nolte. Under submission, 2017.

The tremendous growth of software features in automotive systems has led to a trend towards multi-core processors. The growing use of software increases the software complexity of such systems and requires more powerful hardware, such as multi-core processors, to run it. To manage software complexity and reuse, component-based software engineering is a promising solution. However, there are several challenges inherent in the intersection of resource efficiency and predictability of multi-core processors when it comes to running component-based embedded software. In this paper, we present a software design framework solving these challenges. The framework comprises both the mapping of software components onto executable tasks and the partitioning of the generated task set onto the processing cores of a multi-core processor. In contrast to previous solutions, this framework allows for dependent transactions of software components where the components have internal states, which is common in embedded and automotive software. The goal of the work presented in this paper is to enhance the resource efficiency of the system by optimizing the software design with respect to 1) the inter-software-component communication cost, 2) the cost of synchronization among dependent transactions of software components, and 3) the interaction of software components with the basic software services resident in the software platform on which the components are executed. The proposed framework is compared with alternative solutions. An engine management system, one of the most complex automotive sub-systems, is considered as a test case, and the experimental results indicate a total CPU utilization reduction of up to 11.2% on a quad-core processor in comparison with the common framework in the literature.

6.2 Papers with focus on resource optimization in cloud computing systems

6.2.1 Paper D

A profit-aware allocation of high performance computing applications on distributed cloud data centers with environmental considerations, Hamid Reza Faragardi, Aboozar Rajabi, Thomas Nolte, Amir Hossein Heidarizadeh. CSI Journal on Computer Science and Engineering, Vol 10, pp. 28 - 38, 2014.

A Set of Geographically Distributed Cloud data centers (SGDC) is a promising platform to run a large number of High Performance Computing Applications (HPCAs) in a cost-efficient manner. Energy consumption is a key factor affecting the profit of a cloud provider. In an SGDC, as the data centers are located in different corners of the world, the cost of energy consumption and the amount of CO2 emission vary significantly among the data centers. Therefore, in such systems a proper allocation of HPCAs not only results in CO2 emission reduction, but also causes a substantial increase of the provider's profit. Furthermore, CO2 emission reduction mitigates the destructive environmental impacts. In this paper, the problem of allocating a set of HPCAs on an SGDC is discussed, and a two-level allocation framework is introduced to deal with the problem. The proposed framework is able to reach a good compromise between CO2 emission and the provider's profit subject to satisfying HPCA deadlines and memory constraints. Simulation results based on a real intensive workload demonstrate that the proposed framework reduces the CO2 emission by 17% and increases the provider's profit by 9% on average.

6.2.2 Paper E

Towards energy-aware resource scheduling to maximize reliability in cloud computing systems, Hamid Reza Faragardi, Aboozar Rajabi, Reza Shojaee, Thomas Nolte. In Proceedings of the 15th IEEE International Conference on High Performance Computing and Communications (HPCC), 2013.

Reliability is a key metric for assessing performance in Cloud Computing Systems (CCSs). Fault tolerance methods are extensively used to enhance reliability in CCSs. However, these methods impose extra hardware and/or software cost. Proper resource allocation is an alternative approach which can significantly improve system reliability without any extra overhead. On the other hand, considering reliability irrespective of energy consumption and Quality of Service (QoS) requirements is not desirable in CCSs. In this paper, an analytical model to analyze system reliability alongside energy consumption and QoS requirements is introduced. Based on the proposed model, a new online resource allocation algorithm is suggested to find the right compromise between system reliability and energy consumption while satisfying QoS requirements. The algorithm is a new swarm intelligence technique based on imperialist competition which combines the strengths of some well-known meta-heuristic algorithms with an effective fast local search. A wide range of simulation results, based on real data, clearly demonstrate the high efficiency of the proposed algorithm.

6.2.3 Paper F

Towards energy-aware placement of real-time virtual machines in a cloud data center, Nima Khalilzad, Hamid Reza Faragardi, Thomas Nolte. In Proceedings of the 17th IEEE International Conference on High Performance Computation and Communication (HPCC), 2015.

In this paper we study the problem of allocating a set of real-time applications, each comprising a set of periodic tasks, onto a cloud data center. We approach the problem from a cloud provider's point of view. The periodic tasks are first mapped to a set of VMs. The VM specification is abstracted based on the properties of the tasks assigned to the VM. VMs are then placed on servers. The placement algorithm not only considers the timing requirements of the real-time applications running within the VMs, but also attempts to minimize energy consumption by reducing the number of used servers. To deal with this problem, an integer linear optimization problem is first introduced, followed by a two-level placement framework.

Bibliography

[1] Michael Barr. Real men program in C. Embedded Systems Design, 22(7):3, 2009.

[2] Mario Hermann, Tobias Pentek, and Boris Otto. Design principles for Industrie 4.0 scenarios. In System Sciences (HICSS), 2016 49th Hawaii International Conference on, pages 3928–3937. IEEE, 2016.

[3] Daniel D Gajski, Frank Vahid, Sanjiv Narayan, and Jie Gong. Specification and design of embedded systems, volume 4. Prentice Hall Englewood Cliffs, 1994.

[4] Chung Laung Liu and James W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. Journal of the ACM, 20(1):46–61, 1973.

[5] Project MAC (Massachusetts Institute of Technology). Engineering Robotics Group and ML Dertouzos. Control robotics: The procedural control of physical processes. 1973.

[6] Giorgio C Buttazzo. Hard real-time computing systems: predictable scheduling algorithms and applications, volume 24. Springer Science & Business Media, 2011.

[7] Michael R. Garey and David S. Johnson. Computers and Intractability; A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1990.

[8] Sanjoy K. Baruah, N.K. Cohen, C.G. Plaxton, and D.A. Varvel. Proportionate progress: A notion of fairness in resource allocation. Algorithmica, 15(6):600–625, 1996.

[9] Software techniques for shared-cache multi-core systems, https://software.intel.com/en-us/articles/software-techniques-for-shared-cache-multi-core-systems.

[10] David Levinthal. Performance analysis guide for Intel Core i7 processor and Intel Xeon 5500 processors, http://software.intel.com/sites/products/collateral/hpc/vtune/performanceanalysis guide.pdf.

[11] Niko Böhm, Daniel Lohmann, and Wolfgang Schröder-Preikschat. Multi-core processors in the automotive domain: An AUTOSAR case study. Proceedings Work-in-Progress in Euromicro Conference on Real-Time Systems, 2010.

[12] Abhijit Davare, Qi Zhu, Marco Di Natale, Claudio Pinello, Sri Kanajan, and Alberto Sangiovanni-Vincentelli. Period optimization for hard real-time distributed automotive systems. In Proceedings of the 44th annual Design Automation Conference, pages 278–283. ACM, 2007.

[13] Simon Kramer, Dirk Ziegenbein, and Arne Hamann. Real world automotive benchmarks for free. In 6th International Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems (WATERS). IEEE, 2015.

[14] Nico Feiertag, Kai Richter, Johan Nordlander, and Jan Jonsson. A compositional framework for end-to-end path delay calculation of automotive systems under different path semantics. In Workshop on Compositional Theory and Technology for Real-Time Embedded Systems (CRTS), 2008.

[15] Nicola Serreli, Giuseppe Lipari, and Enrico Bini. Deadline assignment for component-based analysis of real-time transactions. In 2nd Workshop on Compositional Real-Time Systems, Washington, DC, USA, 2009.

[16] C Douglass Locke and John B Goodenough. A practical application of the ceiling protocol in a real-time system. book published by ACM, 1988.

[17] Karthik Singaram Lakshmanan, Gaurav Bhatia, and Ragunathan Rajkumar. AUTOSAR extensions for predictable task synchronization in multi-core ECUs. Technical report, SAE Technical Paper, 2011.

[18] Rackspace Support. Understanding the cloud computing stack, https://support.rackspace.com/whitepapers/understanding-the-cloud-computing-stack-saas-paas-iaas/.

[19] C. Petty. Gartner estimates ICT industry accounts for 2 percent of global CO2 emissions.

[20] Alberto Leva, Daniele Mastrandrea, Marco Bonvini, and Alessandro Vittorio Papadopoulos. Object-oriented modelling and simulation of air flow in data centres based on a quasi-3D approach for energy optimisation. In Utility and Cloud Computing (UCC), 2014 IEEE/ACM 7th International Conference on, pages 554–559. IEEE, 2014.

[21] Xiaobo Fan, Wolf-Dietrich Weber, and Luiz André Barroso. Power provisioning for a warehouse-sized computer. In ACM SIGARCH Computer Architecture News, volume 35, pages 13–23. ACM, 2007.

[22] Anton Beloglazov, Jemal Abawajy, and Rajkumar Buyya. Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing. Future Generation Computer Systems, 28(5):755–768, 2012.

[23] A-C Orgerie, Laurent Lefèvre, and J-P Gelas. Demystifying energy consumption in grids and clouds. In Green Computing Conference, pages 335–342, 2010.

[24] Rajkumar Buyya, Anton Beloglazov, and Jemal Abawajy. Energy-efficient management of data center resources for cloud computing: A vision, architectural elements, and open challenges. arXiv preprint arXiv:1006.0308, 2010.

[25] Mary Shaw. The coming-of-age of software architecture research. In Proceedings of the 23rd International Conference on Software Engineering (ICSE), pages 656–, 2001.
