Optimization Algorithms for Task Offloading and Scheduling in Cloud Computing
by
Sowndarya Sundar
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
© Copyright 2019 by Sowndarya Sundar
Abstract
Optimization Algorithms for
Task Offloading and Scheduling in Cloud Computing
Sowndarya Sundar
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2019
Cloud computing can augment the capabilities of resource-poor local devices with the help of
resourceful servers. Computational offloading refers to the migration of application tasks from
local devices for execution at the cloud. Intelligent task offloading and scheduling can help
optimize parameters such as energy consumption or execution time. In this dissertation, we
address such optimization problems in cloud computing environments.
Many existing works that address this problem make simplistic or impractical assumptions with respect to the system model or propose inefficient solutions. We consider more
practical system models with finite-capacity and heterogeneous local devices, cloudlets, and
edge-clouds. We also address offloading of applications consisting of dependent tasks, multi-user
environments, and scheduling tasks that arrive over time. We consider meaningful optimization
objectives and constraints. Our aim is to propose efficient algorithms with high performance
to obtain these task offloading and scheduling decisions.
We first consider the problem where a single user wishes to execute applications consisting of
dependent tasks in a multi-tier cloud computing environment that may consist of cloudlets,
peer devices, and a remote cloud. We propose the Individual Time Allocation with Greedy
Scheduling (ITAGS) algorithm in order to minimize total cost subject to application deadlines.
We then consider a multi-user cloud environment where each user has certain budget constraints.
We propose the Single Task Unload for Budget Resolution (STUBR) algorithm to minimize
a weighted sum completion time objective, and we prove performance guarantees for it. Finally, we
address the online problem where tasks arrive over time, and we do not know task information
in advance. We propose the Task Dispatch through Online Training (TDOT) algorithm in
order to maximize profit to a cloud service provider subject to processor load constraints, and
provide performance guarantees.
For each of these problems, we also use trace-driven simulation results to compare against
existing alternatives and analyze the performance of the proposed algorithms in various sce-
narios. We see that the proposed algorithms are efficient, outperform other alternatives, and
exhibit near-optimal performance.
Contents
List of Tables vi
List of Figures vi
1 Introduction 1
1.1 Computational Offloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Optimization and Task Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Summary of Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3.1 Offloading Dependent Tasks with Communication Delay . . . . . . . . . . 3
1.3.2 Multi-user Task Scheduling with Budget Constraints . . . . . . . . . . . . 4
1.3.3 Online Scheduling for Profit Maximization at the Cloud . . . . . . . . . . 5
1.4 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Literature Review 6
2.1 Computational Offloading Frameworks . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Partitioning and Offloading Tasks . . . . . . . . . . . . . . . . . . . . . . 6
2.1.2 Common Cloud Computing Environments . . . . . . . . . . . . . . . . . . 7
2.2 Offloading Independent Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.1 Single-user Task Offloading . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2.2 Multi-user Task Offloading . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Offloading Dependent Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.1 Energy Consumption Objective . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.2 Makespan Objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3.3 Energy Consumption under a Deadline Objective . . . . . . . . . . . . . . 11
2.4 Online Task Offloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Related Theoretical Works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.5.1 Job-shop Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.5.2 Online Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.6 Review and Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Offloading Dependent Tasks with Communication Delay 15
3.1 System Model and Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . 16
3.1.1 Local Processors and Remote Cloud . . . . . . . . . . . . . . . . . . . . . 16
3.1.2 Task Dependency Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.1.3 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Individual Time Allocation with Greedy Scheduling (ITAGS) . . . . . . . . . . . 20
3.2.1 Binary Relaxation and Individual Time Allowance . . . . . . . . . . . . . 20
3.2.2 Alternative Discretization Heuristic . . . . . . . . . . . . . . . . . . . . . 21
3.2.3 ITAGS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.4 Feasibility and Complexity Analysis . . . . . . . . . . . . . . . . . . . . . 26
3.3 Scheduling multiple applications with different deadlines . . . . . . . . . . . . . . 26
3.3.1 Binary Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3.2 Modified Alternative Discretization Heuristic . . . . . . . . . . . . . . . . 27
3.3.3 Modified ITAGS Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.4 Feasibility and Complexity Analysis . . . . . . . . . . . . . . . . . . . . . 29
3.4 Trace-driven and Randomized Simulations . . . . . . . . . . . . . . . . . . . . . . 31
3.4.1 Comparison Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.2 Trace-driven Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.4.3 Simulation with Randomly Generated Task Trees . . . . . . . . . . . . . . 36
3.4.4 Run-Time Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.5 Multiple Applications and Uncertain Processing and Communication Times 37
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4 Multi-user Task Scheduling with Budget Constraints 41
4.1 System Model and Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . 42
4.1.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.1.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 The STUBR algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2.1 Relaxed Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4.2.2 Rounded Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2.3 Dealing with Budget Violation . . . . . . . . . . . . . . . . . . . . . . . . 52
4.2.4 WSPT Ordering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2.5 Feasibility and Complexity Analysis . . . . . . . . . . . . . . . . . . . . . 53
4.3 STUBR Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3.1 With Task Release Times . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3.2 With Fixed Communication Times . . . . . . . . . . . . . . . . . . . . . . 55
4.3.3 With Sequence-dependent Communication Times . . . . . . . . . . . . . . 56
4.4 Trace-driven Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.4.1 Traces and Parameter Setting . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.2 Comparison Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.4.3 For Release Times and Fixed Communication Times . . . . . . . . . . . . 62
4.4.4 For Sequence-dependent Communication Times . . . . . . . . . . . . . . . 64
4.4.5 Runtime Overhead . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5 Online Scheduling for Profit Maximization at the Cloud 66
5.1 System Model and Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . 67
5.1.1 Cloud Processors and Online Task Arrival . . . . . . . . . . . . . . . . . . 67
5.1.2 Profit Maximization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2 Task Dispatch through Online Training . . . . . . . . . . . . . . . . . . . . . . . 69
5.2.1 Offline Solution through Lagrange Relaxation . . . . . . . . . . . . . . . . 70
5.2.2 Online Scheduling with Partial-Task Profit Taking . . . . . . . . . . . . . 70
5.2.3 Performance Bound Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2.4 Complexity Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3 Modified Algorithm Without Partial-Task Profit Taking . . . . . . . . . . . . . . 79
5.4 TDOT with Data Requirements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.4.1 Offline Solution through Lagrange Relaxation . . . . . . . . . . . . . . . . 80
5.4.2 Online Scheduling Algorithm with Partial-Task Profit Taking . . . . . . . 81
5.4.3 Performance Bound Analysis . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.4.4 TDOT-G with Data Requirements . . . . . . . . . . . . . . . . . . . . . . 90
5.5 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.5.1 Comparison Targets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5.5.2 Simulation Setup and Task Requirements . . . . . . . . . . . . . . . . . . 91
5.5.3 I.I.D. Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.5.4 Google-cluster Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5.5 Overall Profit and ε Values . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6 Concluding Remarks 97
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.2.1 Task Scheduling in the Presence of Zero Task Information . . . . . . . . . 98
6.2.2 Online Dependent-Task Scheduling . . . . . . . . . . . . . . . . . . . . . . 98
6.2.3 Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
6.2.4 Straggler Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.2.5 Fuzzy Load Constraint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
Bibliography 99
List of Tables
2.1 Literature Review on Offloading Independent Tasks . . . . . . . . . . . . . . . . 8
2.2 Literature Review on Offloading Dependent Tasks . . . . . . . . . . . . . . . . . 10
2.3 Literature Review on Online Task Offloading . . . . . . . . . . . . . . . . . . . . 12
3.1 Chapter 3 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Run-time (sec) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.1 Chapter 4 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
List of Figures
3.1 Example network of local processors and cloud. . . . . . . . . . . . . . . . . . . . 17
3.2 Dummy tasks, d1 and d2, added to a DAG of 5 tasks. . . . . . . . . . . . . . . . 17
3.3 Simulation topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 Cost vs. application deadline for Gaussian elimination and FFT in Scenario 1 . . 32
3.5 Cost vs. application deadline for Gaussian elimination and FFT for Scenario 2 . 32
3.6 Cost vs. application deadline for Gaussian elimination and FFT for Scenario 3 . 33
3.7 Cost vs. application deadline for Gaussian elimination and FFT in Scenario 4 . . 33
3.8 Task Graph for Gaussian Elimination Application with a matrix size of 5 . . . . 34
3.9 Cost vs. application deadline for randomly generated task trees . . . . . . . . . . 36
3.10 Cost vs. application deadline for different number of applications A . . . . . . . 38
3.11 Cost vs. Realizations for 15% error . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.12 Cost vs. error with known and unknown processing and communication times
(with 95% confidence intervals) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1 Example system of 3 users and 5 cloud processors. . . . . . . . . . . . . . . . . . 43
4.2 For chess application on Galaxy S5. . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.3 For compute intensive application on Nexus 10. . . . . . . . . . . . . . . . . . . . 62
4.4 For chess application on Galaxy S5. . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.5 For compute intensive application on Nexus 10. . . . . . . . . . . . . . . . . . . . 64
5.1 Example system with two CSs consisting of two processors each. . . . . . . . . . 68
5.2 Effect of arrival rate λ on non-training set profit for i.i.d. tasks . . . . . . . . . . 92
5.3 Effect of max. data load Q on non-training set profit for i.i.d. tasks . . . . . . . 93
5.4 Effect of max. processing load L on non-training set profit for Google-cluster tasks 94
5.5 Effect of max. data load Q on non-training set profit for Google-cluster tasks . . 94
5.6 Effect of ε on overall profit for Google-cluster tasks . . . . . . . . . . . . . . . . . 95
Chapter 1
Introduction
The usage of computationally intensive and resource-demanding applications has been rapidly
increasing in recent times. However, the improvements in the hardware and battery of local
devices are not sufficient to keep up with the power and time requirements of these applications.
For example, several solutions have been proposed to enhance CPU performance [1], [2]
and to manage the disk and screen in an intelligent manner [3], [4]. However, these solutions
require changes to the structure of local devices, or new hardware, which increases cost and
may not be feasible for all devices [5]. As a result, resource poverty is a major obstacle for
many applications [6].
A Mobile Cloud Computing (MCC) system is one where mobile devices offload their compu-
tational tasks to cloud resource providers [7]. The cloud abstracts the complexities of provision-
ing computation and storage infrastructure [8], and provides the local devices access to nearly
unlimited computing power. This helps combat resource poverty and augments the capabilities
of the devices through a technique called computational offloading. MCC has been proven
to be advantageous for several applications such as mobile commerce, mobile learning, mobile
healthcare, gaming [5], image and language processing [7], sharing GPS/internet data [9], and
crowd computing [10].
One of the major challenges facing mobile cloud computing is the communication delay
and overhead incurred due to the transfer of data to the remote cloud. Edge computing [6], [11]
is a more recent advancement in MCC where finite computational/cloud resources are made
available at the edge of the network or in the vicinity of the mobile users. For example, in a
Mobile Edge Computing (MEC) system, MEC servers are deployed at the cellular base stations
and are shared by the mobile users.
Computational offloading has several benefits and can be used to optimize multiple param-
eters, as described in Section 1.1. In order to fully exploit the advantages of computational
offloading, we need to make intelligent task offloading decisions and utilize the resources at the
cloud efficiently. This is explained in Section 1.2.
1.1 Computational Offloading
Computational offloading refers to the migration of the computationally intensive parts (or
tasks) of an application from the local device to more powerful servers at the remote cloud.
This means that the execution of these resource-hungry parts of the application takes place in
the cloud and the results of the execution can be communicated to the local device for further
execution or to output results. This can prove to be extremely beneficial for the application
users for the following reasons:
• Augments capabilities of the mobile device
As the servers in the cloud are much more powerful in terms of speed and capability than
the local processors in the mobile device, computational offloading enables the mobile
device to run even computationally-intensive applications that the local processors alone
cannot handle. In other words, computational offloading gives the mobile user access to
nearly unlimited computing power thereby rendering the mobile device more resourceful.
• Decreases energy consumption
Offloading parts of the application means that the mobile device has less work to execute,
which reduces its energy consumption and, as a result, helps improve its battery lifetime.
• Improves response time
Executing some parts of the application on the faster servers at the cloud can reduce the
time taken to execute the overall application, also known as the makespan of the application.
This, in turn, improves the application's response time.
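The energy argument above can be made concrete with a back-of-envelope comparison: offloading saves energy when the radio energy spent shipping the task's data is less than the CPU energy spent computing it locally. The sketch below is illustrative only; every parameter value is a hypothetical assumption, not a measurement.

```python
def offload_saves_energy(cycles, cpu_speed, cpu_power,
                         data_bits, bandwidth, radio_power):
    """Compare local-compute energy with radio-transfer energy (joules)."""
    e_local = cpu_power * cycles / cpu_speed         # energy to compute locally
    e_offload = radio_power * data_bits / bandwidth  # energy to ship the data
    return e_offload < e_local, e_local, e_offload

# Hypothetical task: 2 Gcycles at 1 GHz and 0.9 W locally, versus
# 4 Mb of data over a 5 Mb/s link at 1.25 W radio power.
saves, e_local, e_offload = offload_saves_energy(
    cycles=2e9, cpu_speed=1e9, cpu_power=0.9,
    data_bits=4e6, bandwidth=5e6, radio_power=1.25)
```

With these numbers, local execution costs 1.8 J while transmission costs 1.0 J, so offloading saves energy; a larger input or a slower link would reverse the conclusion.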
Existing research work has resulted in several computational offloading systems such as
energy-aware migration decisions at run-time [7], multiple virtual machine images [12], and
trusted cloudlets [6]. The aforementioned benefits of offloading systems can be optimized
through effective task scheduling.
1.2 Optimization and Task Scheduling
Each application can be modelled as a number of tasks, and each task can be executed either
locally at the mobile device or remotely at the cloud. This binary offloading decision on each
task should be taken such that the offloading for the entire application is optimal in terms of
the objective. This objective could be the overall energy consumption or application makespan.
It could also be a more sophisticated objective, such as cost or energy consumption subject to
latency constraints on the execution of the application, in order to provide a Quality of Service
guarantee to the application users [7], [13], [14].
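As a minimal illustration of this binary decision problem, the sketch below brute-forces all 2^n per-task assignments to minimize total energy subject to a latency constraint. The additive timing model and all numbers are simplifying assumptions; the chapters ahead use richer models and efficient algorithms rather than exhaustive search.

```python
from itertools import product

def best_offload_decision(local_energy, remote_energy,
                          local_time, remote_time, deadline):
    """Brute-force the 2^n binary per-task offloading decisions.

    Minimizes total energy subject to an application deadline, under the
    simplifying assumption that task execution times add up sequentially.
    """
    n = len(local_energy)
    best_energy, best_decision = float("inf"), None
    for decision in product((0, 1), repeat=n):  # 0 = local, 1 = cloud
        energy = sum(remote_energy[i] if d else local_energy[i]
                     for i, d in enumerate(decision))
        time = sum(remote_time[i] if d else local_time[i]
                   for i, d in enumerate(decision))
        if time <= deadline and energy < best_energy:
            best_energy, best_decision = energy, decision
    return best_energy, best_decision

# Three hypothetical tasks: the cloud is cheaper in energy but slower
# end-to-end because of data transfer.
energy, decision = best_offload_decision(
    local_energy=[5.0, 8.0, 3.0], remote_energy=[1.0, 2.0, 1.5],
    local_time=[2.0, 4.0, 1.0], remote_time=[3.0, 3.0, 2.5],
    deadline=8.0)
```

Here the best feasible choice offloads the first two tasks and keeps the third local; offloading everything would violate the deadline.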
The tasks that constitute an application can be considered to be independent or dependent
in nature based on precedence constraints and possible data communication between them.
Dependent tasks are often modeled using a task graph with nodes of the graph representing
the tasks and the edges in the graph representing the dependencies between the tasks. There
are several different existing techniques that can be used to identify the best task scheduling
decision for both independent and dependent tasks, and these are investigated in Chapter 2.
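Once a placement of tasks onto processors is fixed, such a task graph can be evaluated with a single earliest-finish-time pass in topological order. The sketch below is a hypothetical illustration that charges a fixed communication delay on edges whose endpoints run on different processors and, for simplicity, ignores processor contention.

```python
def earliest_finish(preds, proc_time, placement, comm_delay):
    """Earliest finish time of each task for a fixed placement.

    preds[t]     : predecessors of task t (task ids 0..n-1, topologically ordered)
    proc_time[t] : execution time of t on its assigned processor
    placement[t] : processor id where t runs
    comm_delay   : extra delay on each edge crossing processors
    """
    finish = {}
    for t in range(len(proc_time)):
        ready = 0.0
        for p in preds[t]:
            extra = comm_delay if placement[p] != placement[t] else 0.0
            ready = max(ready, finish[p] + extra)
        finish[t] = ready + proc_time[t]
    return finish

# Diamond-shaped DAG 0 -> {1, 2} -> 3, with task 2 offloaded to processor 1.
preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}
finish = earliest_finish(preds, proc_time=[1.0, 2.0, 1.0, 1.0],
                         placement=[0, 0, 1, 0], comm_delay=0.5)
```

In this example, task 3 must wait for both branches, including the cross-processor delay on the edges touching task 2, before it can start.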
Cloud service providers (CSPs) such as Amazon EC2 [15] often have multiple different
instances or servers with different storage, price, and computational capability. Consequently,
objectives such as makespan and energy consumption at the cloud can further be optimized
by scheduling tasks intelligently at the cloud, i.e., identifying which servers in the cloud should
run which tasks. It may also be in the CSP's interest to optimize objectives such as revenue
or profit. Solving these optimization problems in practice is quite challenging due to the lack of
prior information about arriving tasks. Only a few existing works address this problem of online
task arrival and scheduling at the cloud; these are also reviewed in Chapter 2.
The task offloading problem may also involve a multi-tier cloud computing environment
where a user is allowed to offload its task to a nearby edge-cloud or peer device, or to a
far away remote cloud. Based on the available communication and computation resources,
task scheduling decisions can be made in these more sophisticated environments to optimize
aforementioned objectives. Throughout this dissertation, we assume that the software required
to run the offloaded tasks already exists at the cloud, edge-clouds, and peer devices.
The main motivation of this dissertation lies in designing efficient algorithms to identify ef-
fective task scheduling decisions for computational offloading in cloud computing environments.
1.3 Summary of Contribution
Through this dissertation, we study optimization problems in cloud computing environments
through intelligent computational offloading and task scheduling. The results presented in
Chapters 3, 4, and 5 have appeared in [16], [17], [18], [19], [20]. In particular, this dissertation
makes the following contributions.
1.3.1 Offloading Dependent Tasks with Communication Delay
In Chapter 3, we consider the offloading of applications comprising multiple tasks, over a
generic cloud computing system consisting of a network of heterogeneous local processors and
a remote cloud. The local processors can represent the processing cores in a single mobile
device, local peer devices, and/or nearby cloudlets, depending on their computational speed
and communication distance from the user device. There is a time and a cost associated with
task execution, which depend on both the task and the processor where the task is scheduled.
We allow each application to consist of inter-dependent tasks with possible data commu-
nication between them. Each task may have predecessor tasks that must be completed before
the task can start. Furthermore, if data need to be transferred between tasks on different
processors, a communication delay, as well as communication cost, is incurred. The objective
of this work is to identify a task scheduling decision that minimizes the total cost of running
applications, subject to application completion deadlines. Our cost model is general and may
include, for example, energy consumption or usage charges for task processing and communication.
We observe that the precedence constraints and data transfer requirements between
tasks can drastically complicate their scheduling decision. Furthermore, the need to account
for both the cost and the run-time of the application adds to the challenge. Prior studies have
assumed simplified processor models to facilitate tractable analysis, such as non-concurrent
local and remote processors [7], infinite-capacity local processors [14], [21], [22], and negligible
delay between local processors [23], [24]. We use a more realistic processor model in this study.
This problem can be shown to be NP-hard, and, as such, there is no polynomial run-time
guarantee for finding an optimal solution. We propose a heuristic algorithm, termed ITAGS,
that utilizes a relaxed solution to the problem to obtain good task scheduling decisions. Through
trace-based simulation using real applications, as well as various randomly generated task trees,
we investigate the performance of ITAGS, highlighting the effect of the application deadline,
communication delay, number of processors, and number of tasks. We compare against existing
alternatives, including a discretization heuristic, and against the cost lower bound. We observe
that ITAGS demonstrates superior performance in all scenarios considered.
1.3.2 Multi-user Task Scheduling with Budget Constraints
In Chapter 4, we study a problem of task scheduling and offloading in a cloud computing system
to minimize the computational delays of the users' tasks. We consider a multi-user scenario
with finite-capacity user devices, a finite-capacity cloud consisting of heterogeneous servers,
budget constraints for the users, and an objective to minimize weighted sum completion time.
In particular, we consider user tasks that may have different processing times, release times,
communication times, and weights. A task may be executed locally on the user’s device or
offloaded to a server at the finite-capacity cloud. The servers at the cloud are heterogeneous
processors with different speeds. The users are required to pay a certain monetary price based
on the usage time of a processor at the cloud, and the price may potentially depend on the
processor speed. Each user has a specific budget which determines the monetary cost that the
user is willing to spend for offloading tasks to the cloud.
Our objective is to identify the task scheduling decision that minimizes the sum of weighted
completion times of all tasks subject to all users’ budget constraints. The problem is NP-hard
since minimizing the sum of weighted completion times of jobs with release times on a single
processor is NP-hard. Our solution approach is inspired by an interval-indexed Integer Linear
Program (ILP) introduced in [25]. We exploit the structure of an approximation solution to
such an ILP to solve our problem.
We propose the Single-Task Unload through Budget Resolution (STUBR) algorithm, and
prove performance bounds for different task and channel models. We also assess its performance
through trace-based simulation. We see that STUBR exhibits maximum performance gains of
more than 50% for both chess and compute intensive applications [22] in comparison with the
Greedy Weighted Shortest Processing Time (WSPT) scheme. Finally, our simulation results
demonstrate that STUBR is highly scalable with respect to the number of users in the system.
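The Greedy WSPT comparison scheme mentioned above is built on Smith's classical single-machine rule: sequence tasks in decreasing weight-to-processing-time ratio. The sketch below illustrates that baseline rule only (it is not STUBR, and it ignores release times, budgets, and multiple processors):

```python
def wspt_order(tasks):
    """Smith's rule (WSPT): sort by decreasing weight/processing-time ratio.

    tasks: list of (weight, processing_time) pairs for one processor.
    Returns the sequence and the resulting weighted sum of completion times.
    """
    order = sorted(tasks, key=lambda wp: wp[0] / wp[1], reverse=True)
    clock = total = 0.0
    for weight, ptime in order:
        clock += ptime            # completion time of this task
        total += weight * clock   # accumulate weighted completion time
    return order, total

# Three hypothetical tasks: the short, heavily weighted task goes first.
order, total = wspt_order([(1, 4.0), (3, 2.0), (2, 1.0)])
```

For a single machine without release times this ordering is optimal; the budget constraints and heterogeneous cloud processors studied in Chapter 4 are what break that optimality and motivate STUBR.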
1.3.3 Online Scheduling for Profit Maximization at the Cloud
In Chapter 5, we adopt a more practical online task model in which tasks arrive over time.
We consider the offloading of these tasks to multiple cloud servers.
Each cloud server in our system model consists of finite-capacity and heterogeneous processors.
As a result, the cloud servers could represent cloudlets, edge-clouds, or peer devices, in a generic
and hybrid cloud computing environment.
In this chapter, we address the task scheduling problem from the perspective of a cloud
service provider (CSP) that obtains profit by processing user tasks. In our model, the profit
obtained is a function of the task processing time and the profit generated per unit time on the
scheduled processor. We aim to obtain the scheduling decision that maximizes the total profit
across all tasks arriving within a time interval, subject to processor load constraints. In our
online model, we do not know in advance the total number of tasks that will arrive within the
time interval. We also do not know the processing times of a task until it arrives at the CSP’s
controller, which then dispatches the task to the scheduled cloud server for processing.
We propose the Task Dispatch through Online Training (TDOT) algorithm, which consists
of training and exploitation phases. We provide performance bound analysis to show that
TDOT can generate profit that is close to the optimum, given a suitable size for the training
task set. TDOT assumes that profit can be obtained from partially-completed tasks, so we
further propose a modified version of TDOT, termed TDOT-G, for implementations where
profit can only be obtained from fully-completed tasks. Through simulation, using randomly-
generated as well as Google cluster data, we compare the performance of TDOT and TDOT-G
with that of greedy scheduling, logistic regression, and an offline upper-bound solution.
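TDOT itself is developed in Chapter 5; purely to illustrate the training/exploitation structure, the hypothetical sketch below estimates a profit-density threshold from an initial training prefix of arrivals and then greedily admits later tasks subject to a load budget. This is an assumed toy scheme, not the actual TDOT algorithm.

```python
def two_phase_dispatch(tasks, train_frac, load_budget):
    """Illustrative training/exploitation scheduler (not the actual TDOT).

    tasks: (profit, load) pairs in arrival order. The first train_frac
    fraction is only observed, to estimate a profit-density threshold;
    subsequent tasks are admitted while they clear the threshold and
    fit within the remaining load budget.
    """
    k = max(1, int(len(tasks) * train_frac))
    densities = sorted(p / l for p, l in tasks[:k])
    threshold = densities[len(densities) // 2]   # median training density
    used = profit = 0.0
    accepted = []
    for i, (p, l) in enumerate(tasks[k:], start=k):
        if p / l >= threshold and used + l <= load_budget:
            used += l
            profit += p
            accepted.append(i)
    return profit, accepted

# Five hypothetical arrivals; the first two are used only for training.
profit, accepted = two_phase_dispatch(
    tasks=[(2, 1.0), (1, 2.0), (6, 2.0), (1, 1.0), (4, 1.0)],
    train_frac=0.4, load_budget=3.0)
```

The training prefix sacrifices some profit in exchange for a usable estimate of what a "good" task looks like, which is the trade-off the performance bounds in Chapter 5 quantify.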
1.4 Thesis Organization
This dissertation is organized as follows. Related works are reviewed in Chapter 2. Chapter 3
presents our work on computational offloading of dependent tasks with communication delay.
Chapter 4 deals with a multi-user task offloading and scheduling problem. Chapter 5 addresses
an online scenario where tasks arrive over time. We conclude this dissertation in Chapter 6.
Chapter 2
Literature Review
In this chapter, we study the existing works that have addressed computational offloading
and task scheduling problems for cloud computing. In Section 2.1, we provide an overview of
computational offloading frameworks focusing on traditional offloading techniques and various
cloud computing environments. In Sections 2.2, 2.3 and 2.4, we review the works that address
offloading independent, dependent, and online tasks respectively. Finally, in Section 2.5, we
focus on related theoretical works in other domains that address similar problems.
2.1 Computational Offloading Frameworks
In this section, we provide some background on the framework required for computational of-
floading, namely, the infrastructure required to offload and schedule tasks, and the various cloud
computing environments for which the computational offloading problem has been studied.
2.1.1 Partitioning and Offloading Tasks
The earliest studies of computational offloading considered just two possible outcomes for the
scheduling decision, namely, either offloading the entire application to the cloud or executing
the entire application locally on the mobile device [6], [26], [27]. In [27], this decision is made
with an objective to conserve energy for the
mobile device. The energy-optimal execution policy is obtained by solving two constrained
optimization problems, i.e., optimizing the clock frequency to complete CPU cycles for mo-
bile execution and the data transmission rate for cloud execution. However, offloading in finer
granularity has been shown to provide more flexibility and better performance than standalone
mobile or standalone cloud execution [13], [28].
As a result, each mobile application can be partitioned into different parts or tasks. This
partitioning can be either done by the programmers [29], [30] or by a profiler that partitions
applications automatically [7], [31], [32]. Each task can be executed either locally at the mobile
device or remotely at the cloud. This binary decision on each task should be taken such that
the offloading for the entire application is optimal in terms of an objective such as minimizing
the overall energy consumption or the makespan. If a task is to be executed at the cloud, we
may further decide which cloud server should execute it in order to further optimize our
objective.
2.1.2 Common Cloud Computing Environments
Traditionally, most existing works study offloading tasks to powerful servers at a remote cloud
[7], [12]. However, offloading from a local device to a distant remote cloud can incur significant
delays, particularly if a large amount of input/output data needs to be communicated between
the cloud and the mobile device.
Consequently, edge computing [11, 33–36] is a recent advancement in Mobile Cloud Com-
puting (MCC) where finite computational/cloud resources are made available at the edge of
the network or in the vicinity of the mobile users. For example, in a Mobile/Multi-access Edge
Computing (MEC) system [11, 35, 36], MEC servers are deployed at the cellular base stations
and are shared by the mobile users. Some works also consider offloading tasks to peer devices
such as nearby mobiles [37], [38], [39].
Some existing works consider a two-tier or three-tier offloading system where a task can be
executed 1) locally on the mobile device, 2) on a finite-capacity nearby computational resource,
or 3) at the remote cloud. A centralized decision engine or scheduler may decide whether
to offload or where to schedule each task. The existing decision-making and task scheduling
schemes are analyzed in the following sections.
2.2 Offloading Independent Tasks
Several works look to solve this offline task scheduling problem by assuming that the required
task information is known in advance. Such works can be broadly split into two categories, i.e.,
those that consider scheduling 1) independent tasks and 2) dependent tasks. In this section,
we study existing works that look to offload or schedule a number of independent tasks in a
cloud computing environment. Practically, these independent tasks may belong to a particular
mobile application or be individual applications by themselves. These tasks may belong to a
single user or multiple users. In Table 2.1, we summarize these existing works.
2.2.1 Single-user Task Offloading
In [41], a context-aware decision-making algorithm is formulated to schedule independent tasks,
taking into account the wireless medium and cloud resources. Here, the objective is to schedule
the tasks such that the overall execution time and energy across all cloud resources are mini-
mized. However, the cloud virtual machines (VMs) are assumed to be homogeneous in nature.
Similarly, [42] aims to optimize the offloading decision of the user to minimize the overall cost
of energy, computation, and delay for an application consisting of multiple independent tasks
Table 2.1: Literature Review on Offloading Independent Tasks

Ref. | Objective | Solution | Assumptions
Single-user
[40] | Weighted sum completion time & makespan | Heuristic | Homogeneous VMs
[41] | Execution time + energy | Heuristic | Homogeneous VMs
[42] | Overall cost | Heuristic | Single cloud server
[43] | Weighted sum of execution time + energy | Sub-optimal | Single MEC server
Multi-user
[44] | Number of beneficial cloud users | Game-theoretic; Nash Equilibrium | Constant task processing times at cloud
[45] | Response time | Game-theoretic; Nash Equilibrium | Infinite-capacity cloud server
[46] | Cost of weighted energy + delay + monetary | Game-theoretic; Nash Equilibrium | Tasks can be rejected without execution
[47] | Cost of energy + delay + computation | Approximation algorithm using SDR | Infinite-capacity cloud server
using semidefinite relaxation and randomization mapping approaches. This work assumes that
tasks can be offloaded to only a single remote server. In [43], an objective of weighted sum of
the execution delay and device energy consumption is considered to make an offloading decision
in a MEC (Mobile Edge Computing) system, and a sub-optimal algorithm is proposed.
2.2.2 Multi-user Task Offloading
In [44], [45], and [46], a game-theoretic approach is adopted to obtain offloading decisions for
independent tasks from multiple users. In [44], a shared mobile-edge cloud is considered, and a
distributed algorithm is proposed to compute a Nash Equilibrium solution. In [45], a three-tier
mobile cloud computing system consisting of peer devices, cloudlets, and a remote cloud is
considered, and the problem is modeled as a Generalized Nash Equilibrium game, whereas in
[46], the problem is modeled as a potential game.
Unlike these decentralized techniques, in [47], a centralized approximation algorithm is
proposed to make offloading decisions with the objective of minimizing the overall cost of energy,
delay, and computation of all users. However, this work assumes a single infinite-capacity
remote cloud server.
2.3 Offloading Dependent Tasks
Each of the following techniques has been employed to identify an offloading decision on an
application consisting of dependent tasks satisfying some objective. In Table 2.2, we summarize
these techniques, and their advantages and disadvantages.
2.3.1 Energy Consumption Objective
Computational offloading decisions can be taken with the sole objective of minimizing the
energy consumption or overall cost due to application execution. Cost and energy can be
considered to be analogous in nature and hence, we study the works that look to minimize
either of these objectives. In [54] and [48], the problem of minimizing the total cost is
addressed using graph partitioning. The proposed branch-and-bound algorithm provides an
optimal solution but has exponential-time complexity. Furthermore, [48] also makes several
impractical assumptions with respect to the system model.
2.3.2 Makespan Objective
Optimal offloading techniques to minimize makespan can be found for an application consisting
of sequential tasks, i.e., considering only task graphs with linear topology. In [55], the one-time
offload property has been proven for sequential tasks. This property states that the optimal
set of tasks to be offloaded in a sequential task graph while minimizing makespan will always
be a sequence of consecutive tasks in the task graph. In [49], an algorithm is proposed to find
Table 2.2: Literature Review on Offloading Dependent Tasks

Ref. | Min. Objective | Solution | Assumptions
[8] | Total cost | Optimal in O(n^3) time | Cloud & client tasks do not execute simultaneously
[48] | Energy consumption | Approximate; high complexity | Cloud & client tasks do not execute simultaneously
[49] | Makespan | Optimal | Sequential tasks
[50] | Makespan | Approximate | Infinite-capacity mobile device
[51] | Makespan | Approximate; high complexity | No communication
[7] | Energy consumption under deadline | Optimal; exponential time complexity | Cloud & client tasks do not execute simultaneously; infinite-capacity mobile device
[14] | Energy consumption under deadline | Optimal; step-size-dependent time complexity | Infinite-capacity mobile device
[13] | Energy consumption under deadline | Approximate | Sequential tasks
[52] | Overall latency under resource cost | PTAS solution | Devices possess infinite capacity
[23] | Cost under deadline | Heuristic | Delay between local processors negligible
[53] | Weighted sum of energy and completion time | Heuristic | Device and cloud possess infinite capacity
the entry-task and exit-task of the optimal one-time offload such that makespan or completion
time is minimized.
For generic task graphs, a load balancing heuristic is suggested in [49] to identify the offload-
ing decision. Numerically, this load balancing heuristic is shown to give better performance in
comparison with the greedy offloading algorithm in Odessa [56]. In [50], applications consisting
of dependent modules (or tasks), multiple finite-capacity servers at the cloud, and multi-user
scenarios are considered. A Mixed Integer Linear Programming (MILP) problem is formulated
with the objective of minimizing makespan. Furthermore, two greedy heuristics are proposed
to obtain solutions. The problem of task scheduling onto unrelated parallel machines is con-
sidered in [51], with an application to cloud computing. In this work, there exist precedence
constraints between tasks of the application, but no communication delay is considered between
the processors and no data communication is considered between tasks.
2.3.3 Energy Consumption under a Deadline Objective
Purely minimizing the energy consumption without regard to the makespan could result in
large application delays, particularly in practical systems wherein faster processors consume
more energy and vice-versa. Similarly, purely minimizing the makespan could result in large
amounts of energy being consumed. As a result, the objective of minimizing energy under
an application deadline has been considered in order to achieve a trade-off between these two
important quantities. In [53], an energy-efficiency cost objective, formulated as a weighted
sum of application completion time and energy consumption, is considered. However, this
work assumes that the mobile device and the remote cloud have an infinite number of available
servers.
The problem of maximizing energy savings at the mobile device due to computational of-
floading, subject to an application deadline, is formulated as an integer linear program in [7].
While integer linear programming provides an optimal solution, it is NP-hard in general and
hence, there is no polynomial run-time guarantee. In [14] and [52], a dynamic programming
approach is proposed to obtain a polynomial-time solution. However, the algorithms assume
that mobile devices are capable of simultaneously processing any number of tasks without any
loss in speed or efficiency. This assumption simplifies the problem and consequently allows dy-
namic programming to be used to provide a solution. In [13], the classical LARAC (Lagrangian
Relaxation Based Aggregated Cost) algorithm is adopted to obtain an approximate solution for
the specific case of sequential tasks.
2.4 Online Task Offloading
In [57], a stochastic optimization problem is formulated in order to schedule tasks either locally
on the mobile device or on a single MEC server. The tasks are assumed to be fluid in nature,
and a task is generated at the beginning of each time slot with a certain probability. Similarly,
Table 2.3: Literature Review on Online Task Offloading

Ref. | Objective | Solution | Assumptions
[57] | Power-constrained delay | Optimal stochastic policy | Fluid tasks; single MEC server
[64] | Resource usage | Heuristics | Preemptable tasks
[58] | Cost | Approximate | Fluid tasks; infinite-capacity datacenter
[59] | Response time | Reinforcement learning | Homogeneous VMs
[60] | Weighted makespan + offloading cost | Approximate | Identical processors
[63] | Queue-length-constrained cost | Approximate | Identical tasks & cost functions for an application
in [58], data is offloaded with an objective to minimize bandwidth, storage and computation
costs. Two online approximation algorithms are proposed. However, data is scheduled in a fluid
fashion, the processing capacity of the datacenters is not accounted for, and data can be sent
to only one chosen datacenter in a time slot.
In [59], a reinforcement learning approach is proposed to schedule online tasks to VMs, where
the VMs have a fixed buffer size. Similarly, in [60], an approximation algorithm is proposed to
schedule online tasks to identical processors. In [61], the CloudSim toolkit is used to simulate the
proposed task scheduling algorithm with an objective to minimize makespan, whereas in [62],
Monte-Carlo simulation is used to test the proposed heuristic.
In [63], tasks from a certain number of applications arrive at an MEC system in each time
slot, and an approximation algorithm is proposed to minimize long-term average cost under
a queue-length constraint. In [64], two algorithms are proposed to optimize cloud resource
usage through preemptable task execution. These works are summarized in Table 2.3.
2.5 Related Theoretical Works
There are several theoretical works across domains that we can gain inspiration from in order
to solve task scheduling problems in cloud computing environments. Here, we address two
well-studied theoretical techniques:
• Job-shop scheduling to help solve task offloading problems, particularly for offline inde-
pendent tasks.
• Online learning to help solve online task offloading problems.
2.5.1 Job-shop Scheduling
Several job-shop scheduling works [25, 65–68] address the problem of scheduling jobs to par-
allel processors, and propose algorithms with performance guarantees. However, they often
make simplistic assumptions, and consequently the proposed techniques cannot be trivially
extended to work with a practical cloud computing system. For example, [67] assumes equal-
length jobs, while [68] assumes a single-processor system. [25] and [66] address the problem of
scheduling jobs to heterogeneous machines for objectives of cost and weighted sum completion
time, respectively, and provide performance guarantees for their proposed methods. However,
neither of these works accommodates multiple users, dependent jobs, or communication times.
Hence, they cannot be easily applied to solve computational offloading problems for practical
system models.
2.5.2 Online Learning
Online learning algorithms focus on making decisions online in the presence of partial or no
information by learning information over time. However, existing works require convexity of
functions and utilize specific objectives such as regret [69]. Some works propose online learning
algorithms to solve auction problems [70] or adwords problems [71]. Again, while these works
provide inspiration to solve online task scheduling problems for cloud computing environments,
they cannot be directly applied to solve these problems because of lack of practical objectives,
constraints, or assumptions.
2.6 Review and Contribution
In this dissertation, we aim to propose algorithms to solve optimization problems in cloud
computing environments by identifying task offloading or scheduling decisions. In Chapter 4,
we address the problem of offloading independent tasks to a network of heterogeneous
processors. We consider an objective of minimizing sum completion time subject to multiple
user budget constraints. In Section 2.2, we reviewed the existing works that address similar
problems. However, we can see from Table 2.1 that these works address the problem from a de-
centralized game-theoretic perspective or make certain assumptions with respect to the system
model such as homogeneous resources or an infinite-capacity cloud. We propose centralized al-
gorithms with performance guarantees, and consider a heterogeneous and finite-capacity system
model.
Similarly, in Section 2.3, we reviewed the existing works that deal with dependent task
scheduling, and we can see from Table 2.2 that these works make similar assumptions with
respect to the system model. In Chapter 3, we consider a system model consisting of finite-
capacity devices and generic DAG applications in order to do away with these assumptions. We
wish to identify a task scheduling decision that minimizes overall cost subject to application
completion deadlines, and propose an efficient heuristic algorithm to obtain effective solutions.
In Chapter 5, we address the online scheduling of tasks, where the total number of tasks
and task information are not known a priori. From Table 2.3, we see that existing works make
assumptions such as fluid tasks and homogeneous resources. We consider a system model that
does away with these assumptions, and propose algorithms with performance guarantees. We
also compare against other alternatives through trace-driven simulation.
Chapter 3
Offloading Dependent Tasks with
Communication Delay
In this chapter, we study the scheduling decision for applications consisting of dependent tasks,
in a generic cloud computing system comprising a network of heterogeneous local processors
and a remote cloud. We formulate an optimization problem to find the offloading decision that
minimizes the overall execution cost, subject to application completion deadlines. Since this
problem is NP-hard, we propose a heuristic algorithm termed Individual Time Allocation with
Greedy Scheduling (ITAGS) to obtain an efficient solution, and study its performance through
simulation.
The contributions of this work are as follows:
• We formulate a problem of cost minimization in scheduling a single application with
dependent tasks and a completion deadline, over a generic cloud computing system with
heterogeneous processors and communication delay. We relax the binary constraints to
obtain a convex problem and a lower bound to the optimal objective of the original
problem.
• We observe that a scheduling solution obtained by directly discretizing the binary-relaxed
solution does not provide satisfactory performance. Therefore, we propose a new heuristic
algorithm, termed Individual Time Allocation with Greedy Scheduling (ITAGS), which
utilizes the binary-relaxed solution to allocate a completion deadline to each individual
task and then greedily optimizes the scheduling of each task subject to its time allowance.
• We consider an extension to this problem where we need to schedule multiple applications
with different completion deadlines, and propose a modified version of ITAGS to obtain
a solution.
• Through trace-based simulation with real-world applications from [72], as well as various
randomly generated task trees, we study the impact of the application deadline and
other system settings on the performance of ITAGS. We further compare ITAGS with
the dynamic programming approach from [14,21], other alternatives including the above
discretization heuristic, and the cost lower bound, demonstrating its superior effectiveness.
We also evaluate through simulation the robustness of ITAGS to variation in processing
times and communication delays.
The problem of offloading dependent tasks to multiple types of processors has been con-
sidered in [21], [22], [23], and [24]. However, in [21] and [22], the devices are assumed to possess
infinite capacity in terms of the number of tasks that can be processed simultaneously without
reduction in the processing speed for each task. On the other hand, in [24], the local pro-
cessor cores are assumed to exist on a single mobile device, and an objective of only energy
consumption by the mobile device is considered. Similarly, in [23], we investigate the objective
of cost minimization subject to an application deadline, for heterogeneous local and remote
processors. However, the delay between the local processors is assumed to be negligible. In
this chapter, we account for the delay between all processors in order to arrive at a general
model that encompasses scenarios such as offloading to peer devices and cloudlets in addition
to the cloud. The local processors have finite capacity, and there is a time and a cost associated
with both task execution and data communication between any two locations. This leads to a
unique and novel problem formulation. Additionally, we consider an extension with multiple
applications consisting of dependent tasks, which has not been considered in existing literature.
The rest of the chapter is organized as follows. Section 3.1 describes the system model
and the problem formulation. In Section 3.2, we present the motivation and details of ITAGS.
In Section 3.3, we consider the extension with multiple applications and propose the modified
version of ITAGS. Section 3.4 presents the simulation results, and a concluding summary is given
in Section 3.5.
3.1 System Model and Problem Formulation
In this section, we consider the problem of offloading a single application with a completion
deadline. We extend this model to multiple applications in Section 3.3.
3.1.1 Local Processors and Remote Cloud
We consider a system with a finite number of local processors. These processors may be installed
in mobile edge computing hosts, cloudlet devices, or peer mobile devices. These processors may
have different speeds but are assumed to be unary, i.e., each processor executes one task at a
time, while the other tasks assigned to the processor wait in a queue. We emphasize that, with
respect to the cost and delay in task processing, this assumption is without loss of generality.1
1 It is easy to see that there is no benefit in processor-sharing with respect to the sum queueing-and-execution delay of the tasks.
Figure 3.1: Example network of local processors and cloud.
Figure 3.2: Dummy tasks, d1 and d2, added to a DAG of 5 tasks.
We further assume a remote cloud center that provides an essentially infinite number of
processors, possibly through leasing of virtual machines. Consequently, the remote cloud can
be viewed as an additional processor having infinite capacity in terms of the number of tasks
it can process simultaneously. Let the set of all processors, including the remote cloud, be P and its size be M. Let dij be the delay per unit data transfer between processors i and j. For
simplicity of illustration, we assume dij = dji and dij = 0 if i = j. An example of such a system
is depicted in Figure 3.1.
3.1.2 Task Dependency Graph
Consider a single application that must be completed before a deadline L. The application
is partitioned into tasks, whose dependency is modeled as a directed acyclic graph (DAG)
G = 〈V, E〉 where V is the set of tasks and E is the set of edges. The edge (i, k) on the graph
specifies that there is some required data transfer, eik, from task i to task k and hence, k
cannot start before i finishes. Furthermore, if they are scheduled at different processors j and
v respectively, the communication delay is eikdjv and the communication cost is ceikdjv, where
c is the communication cost per unit time. The delay values are typically smaller when offloading
to nearby local processors than when offloading to the remote cloud.
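As a concrete reading of this model, the following sketch (hypothetical helper name, not code from this thesis) computes the communication delay and cost for a single DAG edge when its two endpoint tasks are placed on given processors:

```python
# Hypothetical sketch of the communication model of Section 3.1.2.
def comm_delay_and_cost(e_ik, d_jv, c):
    """Delay and cost of transferring task i's output to task k, when i runs
    on processor j and k on processor v (both zero if j == v, since d_jj = 0)."""
    delay = e_ik * d_jv          # communication delay: e_ik * d_jv
    cost = c * delay             # communication cost: c * e_ik * d_jv
    return delay, cost

# 4 units of data over a link with delay 0.5 per unit, at price c = 2 per unit time.
print(comm_delay_and_cost(4.0, 0.5, 2.0))  # (2.0, 4.0)
print(comm_delay_and_cost(4.0, 0.0, 2.0))  # same processor: (0.0, 0.0)
```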
If task i is executed on processor j, the execution time is tij and the execution cost is pjtij ,
Table 3.1: Chapter 3 Notations
Notation Description
tij execution time for task i on processor j
pj processing cost per unit time on processor j
c communication cost per unit time
eik amount of data to be communicated from task i to k
djv delay per unit data between processors j and v
L application deadline
M total number of processors
N ′ total number of tasks
where pj is the processing price per unit time on processor j. In practice, the processing times
and data transfer requirement may be obtained by applying a program profiler as shown in
experimental studies such as MAUI [7] and Thinkair [12]. In this work, we proceed assuming
that such information is already given.
We assume that an application is initiated at a particular local processor and must end at the
same local processor. To model this requirement, for a given DAG representing an application,
we insert two dummy nodes, i.e., tasks having zero execution time and zero communication
cost. One dummy task is inserted at the start to trigger the application at the local device,
and another task is inserted at the very end to receive all the results back at the local device.
This is depicted in Figure 3.2. This insertion is without loss of generality since it preserves the
application. Hence, the total number of tasks can be considered to be
N ′ = |V|+ 2. (3.1)
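The dummy-task insertion can be sketched as follows; this is a minimal illustration with a hypothetical helper, not code prescribed by the thesis:

```python
# Sketch of the dummy-task insertion of Figure 3.2 (hypothetical helper).
def add_dummy_tasks(tasks, edges):
    """tasks: list of task ids; edges: set of (i, k) precedence pairs.
    Returns the DAG augmented with zero-time, zero-data 'entry'/'exit'
    tasks, which trigger the application at the local device and collect
    the results back at the local device."""
    have_pred = {k for _, k in edges}
    have_succ = {i for i, _ in edges}
    new_edges = set(edges)
    new_edges |= {("entry", t) for t in tasks if t not in have_pred}
    new_edges |= {(t, "exit") for t in tasks if t not in have_succ}
    return ["entry"] + list(tasks) + ["exit"], new_edges

tasks, edges = add_dummy_tasks([1, 2, 3], {(1, 2), (1, 3)})
print(len(tasks))  # N' = |V| + 2 = 5
```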
3.1.3 Problem Formulation
The task scheduling decision contains both the mapping between tasks and processors and the
order of the tasks allocated to each processor. We define the scheduling decision variables as
follows:
x_{ijr} :=
\begin{cases}
1 & \text{if task } i \text{ is on processor } j \text{ at position } r, \\
0 & \text{otherwise},
\end{cases}

for all i = 1, . . . , N', j = 1, . . . , M, and r = 1, . . . , N'. Each task is to be scheduled to exactly one
of the existing positions on the processors. Hence,

\sum_{j=1}^{M} \sum_{r=1}^{N'} x_{ijr} = 1, \quad \forall i = 1, \dots, N'. \qquad (3.2)
Furthermore, each position on each processor can be assigned to at most one task, which is
given by

\sum_{i=1}^{N'} x_{ijr} \le 1, \quad \forall r = 1, \dots, N', \; j = 1, \dots, M. \qquad (3.3)
The positions in each processor are filled by the tasks sequentially, i.e., until one position
on a processor is occupied, tasks cannot be assigned to subsequent positions. This is imposed
by the following constraint:
\sum_{i=1}^{N'} x_{ijr} - \sum_{i=1}^{N'} x_{ij(r-1)} \le 0, \quad \forall r = 2, \dots, N', \; j = 1, \dots, M. \qquad (3.4)
The two dummy tasks inserted are required to be scheduled on a local processor, so we have
\sum_{r=1}^{N'} x_{11r} = 1, \qquad \sum_{r=1}^{N'} x_{N'1r} = 1. \qquad (3.5)
Furthermore, our task scheduling decision is required to meet the application deadline,
which imposes constraints on the finishing times of the tasks. Let Fi be the finish time of task
i, for i = 1, . . . N ′. Then
F_{N'} \le L \qquad (3.6)
ensures that the last task, and consequently the overall application, is completed by the deadline.
In addition,
F_1 = 0 \qquad (3.7)
sets the finish time of the first task to zero as it is a dummy task and has zero execution time.
The relationship between the finish times of tasks on the same local processor j and the
decision variables is given by
F_i - F_k + C(2 - x_{ijr} - x_{kj(r-1)}) \ge t_{ij}, \quad \forall i, k = 1, \dots, N', \; r = 2, \dots, N', \; j = 1, \dots, (M-1), \qquad (3.8)
where we assign a large positive number to C. This ensures that the finish time of a task
on processor j is at least equal to the sum of the finish time of the preceding task and the
processing time of the present task. Note that 2− xijr − xkj(r−1) is zero if and only if tasks k
and i are placed consecutively on processor j.
Finally, since the tasks of the application are dependent, the finish time of a task must be
greater than that of each of its predecessors by the amount of its predecessor’s execution time
and communication time from its predecessor. Thus, we have
F_i - F_k \ge \sum_{r=1}^{N'} \sum_{j=1}^{M} t_{ij} x_{ijr} + \sum_{j=1}^{M} \sum_{t=1}^{N'} \sum_{r=1}^{N'} \sum_{v=1}^{M} e_{ki} d_{vj} x_{ijr} x_{kvt}, \quad \forall i = 1, \dots, N', \; (k, i) \in \mathcal{E}. \qquad (3.9)
The first term on the right hand side of (3.9) is the total execution time, and the second term
is the total data communication time, which occurs when task i is executed on processor j and
its predecessor k is executed on another processor v.
We define the total cost of application execution as the sum of the total execution cost and
the total communication cost. Our goal is to identify the schedule that minimizes this total
cost, subject to the application deadline, L. This can be formulated as an optimization problem
as follows:
\underset{\{x_{ijr}\}}{\text{minimize}} \quad \sum_{r=1}^{N'} \sum_{j=1}^{M} \sum_{i=1}^{N'} p_j t_{ij} x_{ijr} + \sum_{i=1}^{N'} \sum_{k=1}^{N'} \sum_{j=1}^{M} \sum_{v=1}^{M} \sum_{r=1}^{N'} \sum_{t=1}^{N'} c\, e_{ik} d_{jv} x_{ijr} x_{kvt}, \qquad (3.10)

subject to (3.2)–(3.9),

x_{ijr} \in \{0, 1\}, \quad \forall i = 1, \dots, N', \; r = 1, \dots, N', \; j = 1, \dots, M. \qquad (3.11)
This problem is NP-hard since it contains the Generalized Assignment Problem (GAP) as
a special case, and GAP is NP-hard. Hence, we do not expect to find an optimal solution in
polynomial time. Consequently, we propose the ITAGS algorithm and study its effectiveness in
solving this problem.
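For intuition on the formulation, the following sketch enumerates all schedules of a toy two-task chain on two processors with made-up parameters. Exhaustive enumeration is shown only to illustrate the objective (3.10) and the deadline constraint; it is exactly what becomes intractable for realistic instances, and the dummy tasks are omitted for brevity:

```python
# Brute-force illustration of the cost-minimization problem on a toy chain
# 1 -> 2 with two processors; all numbers are made up for this example.
from itertools import product

t = [[2.0, 1.0],           # t[i][j]: execution time of task i on processor j
     [3.0, 1.5]]
p = [1.0, 4.0]             # p[j]: processing price per unit time
d = [[0.0, 0.5],           # d[j][v]: delay per unit data between processors
     [0.5, 0.0]]
e12, c, L = 2.0, 1.0, 5.0  # data on edge (1, 2), comm price, deadline

best = None
for s1, s2 in product(range(2), repeat=2):   # processor choice for each task
    comm = e12 * d[s1][s2]                   # communication delay term
    makespan = t[0][s1] + comm + t[1][s2]    # chain: task 2 waits for task 1
    if makespan > L:                         # deadline constraint
        continue
    cost = p[s1] * t[0][s1] + p[s2] * t[1][s2] + c * comm   # objective (3.10)
    if best is None or cost < best[0]:
        best = (cost, (s1, s2))
print(best)  # (5.0, (0, 0)): running both tasks locally is cheapest here
```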
3.2 Individual Time Allocation with Greedy Scheduling (ITAGS)
The ITAGS algorithm is built on the concept of appropriately allocating the application dead-
line among the individual tasks. To provide a guideline on the suitable amount of individual
time allocation, we first consider a binary-relaxed version of the original problem in the next
subsection. We follow that by discussing how one might design, as an inferior alternative to
ITAGS, a feasible binary solution via direct discretization. We then present the details of
ITAGS, concluding with a discussion of its feasibility and computational complexity.
3.2.1 Binary Relaxation and Individual Time Allowance
Optimization problem (3.10) is a mixed integer program, and it is non-convex due to its non-
convex objective and constraints (3.9). However, we note that the communication delay and
cost terms in (3.9) and (3.10) can be modified as follows:
\sum_{j=1}^{M} \sum_{t=1}^{N'} \sum_{r=1}^{N'} \sum_{v=1}^{M} e_{ki} d_{vj} x_{ijr} x_{kvt} \qquad (3.12)

= \sum_{j=1}^{M} \sum_{v=1}^{M} e_{ki} d_{vj} \left( \sum_{r=1}^{N'} x_{ijr} \right) \left( \sum_{t=1}^{N'} x_{kvt} \right)

= \sum_{j=1}^{M} \sum_{v=1}^{M} e_{ki} d_{vj} \max\!\left[ \left( \sum_{r=1}^{N'} x_{ijr} \right) + \left( \sum_{t=1}^{N'} x_{kvt} \right) - 1, \; 0 \right], \qquad (3.13)
where the last equality holds because {xijr} are binary. This converts the non-convex (3.12) to
a convex form in (3.13).
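The identity underlying (3.13) is easy to verify exhaustively for binary variables:

```python
# Quick check of the linearization used in (3.13): for binary a and b, the
# bilinear product a*b equals the convex expression max(a + b - 1, 0).
from itertools import product

assert all(a * b == max(a + b - 1, 0) for a, b in product((0, 1), repeat=2))
print("a*b == max(a + b - 1, 0) on all binary pairs")
```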
Therefore, we perform the following two-step binary relaxation on the original problem:
• Replace the communication terms in (3.10) and (3.9) by (3.13);
• Replace the binary constraints in (3.11) with linear constraints by simply restricting the
decision variables to be non-negative.
This leads to the following convex problem over decision variables {xijr}:
\underset{\{x_{ijr}\}}{\text{minimize}} \quad \sum_{r=1}^{N'} \sum_{j=1}^{M} \sum_{i=1}^{N'} p_j t_{ij} x_{ijr} + \sum_{i=1}^{N'} \sum_{k=1}^{N'} \sum_{j=1}^{M} \sum_{v=1}^{M} c\, e_{ki} d_{vj} \max\!\left[ \left( \sum_{r=1}^{N'} x_{ijr} \right) + \left( \sum_{t=1}^{N'} x_{kvt} \right) - 1, \; 0 \right] \qquad (3.14)

subject to (3.2)–(3.8),

F_i - F_k \ge \sum_{r=1}^{N'} \sum_{j=1}^{M} t_{ij} x_{ijr} + \sum_{j=1}^{M} \sum_{v=1}^{M} e_{ki} d_{vj} \max\!\left[ \left( \sum_{r=1}^{N'} x_{ijr} \right) + \left( \sum_{t=1}^{N'} x_{kvt} \right) - 1, \; 0 \right],
\quad \forall i = 1, \dots, N', \; (k, i) \in \mathcal{E}, \qquad (3.15)

x_{ijr} \ge 0, \quad \forall i = 1, \dots, N', \; r = 1, \dots, N', \; j = 1, \dots, M. \qquad (3.16)
An optimal solution to problem (3.14) can be efficiently computed using convex program-
ming solvers such as CVX. Note that replacing (3.11) with (3.16) is equivalent to allowing a
single task to be distributed and executed partially across several processors and positions.
This is unrealistic, but solving this relaxed problem is useful for two purposes. First, since
the relaxed problem has a larger feasible set, it serves as a lower bound to the optimum of
the original problem, which can be used for numerical performance benchmarking. Second,
the relaxed solution can be leveraged to recover a binary solution to the original problem. In
particular, as a part of the ITAGS algorithm, it supplies the individual time allowance for each
task as explained in Section 3.2.3.
3.2.2 Alternative Discretization Heuristic
Before presenting the details of ITAGS, we first consider a conventional approach to recover a
binary solution from the above relaxed solution, by discretizing the fractional solution xijr to
binary values. We will show later that such an approach, although non-trivial, does not provide
satisfactory performance. Therefore, it will be used mainly for performance benchmarking
against ITAGS.
We note that discretizing the fractional xijr solutions is challenging. Directly rounding
them to binary values will violate some constraints of the original problem. In particular, the
constraints on relative positions of tasks on a processor need to be taken into consideration,
to ensure that the scheduled tasks are in proper order to satisfy the dependency requirement.
Consequently, we consider the following algorithm, termed the discretization heuristic, which 1) dis-
regards the task positions in the relaxed solution, 2) schedules each task to a processor based
on the fractional solution, and 3) calculates the resultant task starting times to obtain their
relative position values for the final binary solution.
Reduction to Task-on-Processor
In this step, we assign xijr values to their corresponding task-on-processor variables yij as
follows:
y_{ij} = \sum_{r=1}^{N'} x_{ijr}, \quad \forall i = 1, \dots, N', \; j = 1, \dots, M. \qquad (3.17)

Thus, the yij variables contain just the fractional solution for each task i on each processor
j, which disregards the position information in xijr. It should be noted that the yij obey the
scheduling constraint \sum_{j=1}^{M} y_{ij} = 1, \forall i = 1, \dots, N'.
Discretization
We next discretize the fractional yij solutions to decide the processor assignment decision si for
every task i by picking the processor that has the maximum yij value. The intuition behind
this is that a yij value can be viewed as the probability of scheduling task i on processor j, and
thus we take the decision with the highest probability:
s_i = \arg\max_j \; y_{ij}, \quad \forall i = 1, \dots, N'. \qquad (3.18)

Thus, the decision for every task i is

y_{ij} :=
\begin{cases}
1 & \text{if } j = s_i, \\
0 & \text{if } j \ne s_i.
\end{cases}
Mapping to Positions
Although we have determined the processor on which each task needs to be scheduled, we still
need to decide the positions of tasks on each processor, or the starting times for each task.
Towards this end, we sort the tasks in the order of increasing Fi values from the solution to
the relaxed problem (3.14). This sorting will ensure that the precedence constraints in (3.9)
are obeyed between any two consecutive tasks. Thus, scheduling the tasks to their assigned
processors in this order will give us their corresponding positions and starting time values.
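The three steps of the discretization heuristic can be sketched as follows, on made-up fractional data and with a hypothetical helper name:

```python
# Sketch of the discretization heuristic of Section 3.2.2: marginals (3.17),
# argmax discretization (3.18), then ordering tasks by relaxed finish times.
def discretize(x, F):
    """x[i][j][r]: fractional relaxed solution; F[i]: relaxed finish times."""
    # Step 1: task-on-processor variables y_ij = sum_r x_ijr   (3.17)
    y = [[sum(per_pos) for per_pos in per_proc] for per_proc in x]
    # Step 2: s_i = argmax_j y_ij                              (3.18)
    s = [max(range(len(yi)), key=lambda j: yi[j]) for yi in y]
    # Step 3: fill positions in order of increasing F_i, which keeps the
    # precedence constraints satisfied between consecutive tasks.
    schedule = {}
    for i in sorted(range(len(x)), key=lambda i: F[i]):
        schedule.setdefault(s[i], []).append(i)
    return s, schedule

# Two tasks, two processors, two positions each (each task's entries sum to 1).
x = [[[0.6, 0.1], [0.2, 0.1]],   # task 0 leans toward processor 0
     [[0.1, 0.2], [0.4, 0.3]]]   # task 1 leans toward processor 1
print(discretize(x, F=[1.0, 3.0]))  # ([0, 1], {0: [0], 1: [1]})
```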
Feasibility Check
For the above task schedule, we check if the total delay meets the application deadline L. If so,
the corresponding cost is the resulting solution, or else the algorithm fails to produce a feasible
schedule. In the latter case, we will use the same fallback procedure as ITAGS described below,
to offload all tasks to the cloud.
3.2.3 ITAGS Algorithm
The task scheduling decision is required to meet the overall application completion deadline.
A purely greedy algorithm might schedule each task to the processor where it achieves the
minimum cost such that the overall application deadline is still met. However, such an algorithm
prioritizes the tasks that are scheduled in the beginning as these tasks would be able to take
away a larger chunk of the overall deadline allowance and make cost-effective decisions for
themselves. On the other hand, the tasks that are scheduled later in the greedy process would
have a relatively smaller portion of the overall deadline allowance available, resulting in possible
infeasibility and performance degradation.
Thus, the guiding principle behind the design of ITAGS is that the overly greedy aspect
of the above approach should be countered, by assigning individual deadlines to the tasks to
ensure uniform priority for all tasks regardless of their scheduling order. ITAGS consists of
three major steps: 1) Set individual time allowance for each task; 2) Schedule each task to a
processor based on a greedy approach subject to its individual deadline; and 3) Check feasibility
by testing if the last task meets the overall application deadline.
Step 1: Individual Time Allocation
In Step 1, we identify the time allowance to be given to each task. This is achieved by performing
binary relaxation on the original problem as detailed in Section 3.2.1, and solving the relaxed
problem to obtain the finish time Fi for each task i. These finish times are treated as individual
task deadlines in the next step.
Step 2: Task Scheduling
Once the individual deadlines are set in Step 1, Step 2 of ITAGS aims at assigning a processor
si for each task i. This task scheduling process has a principled greedy nature as the algorithm
takes one task at a time and schedules it to the processor where the task 1) can complete its
execution before its individual deadline and 2) incurs minimum additional cost.
ITAGS schedules the tasks starting from the top of the DAG and works its way down to
the bottom. Specifically, the tasks are scheduled in the increasing order of individual deadline
Fi. We note that this ensures that a task is scheduled only after its predecessors have been
scheduled since (3.15) ensures that the Fi value of task i exceeds that of its predecessors. The
topmost task is the first dummy task, and it is always scheduled to the local processor where
the application is initiated. Then, as ITAGS moves down the list of unscheduled tasks, for each
task i, we decide its start time STi and processor si.
First, for each potential processor j, we compute the accumulated execution delay Dij and
cost Cij , due to the execution of i on processor j, as follows:
Dij = max_{(k,i)∈E} ( STk + t_{k s_k} + Tki )  (3.19)

Cij = pj tij + c ∑_{(k,i)∈E} Tki  (3.20)

where Tki = d_{s_k j} eki is the communication delay from processor sk to processor j with respect to
the data from task k to task i. In (3.19), the sum inside the max calculates the time when a parent
task k completes execution and its data transfer to task i has arrived at processor j. Therefore,
Dij is the earliest start time of task i on processor j, taking into account all parents of task
i. Note that if both tasks k and i are scheduled onto the same processor, i.e., sk = j, then the
communication delay per unit data d_{s_k j} = 0 and consequently Tki = 0.
However, knowing Dij is not sufficient to decide whether task i should be placed on processor
j, since Dij does not take into account the waiting time for a task on processor j if the processor
is local and is already executing another task. Therefore, we keep a tab on the end of busy time
on each local processor j, denoted by SLj , and we update it every time a task is scheduled onto
the processor. In other words, every time that some task k is assigned to processor j, we set
SLj = STk + tkj . (3.21)
This takes into account the amount of time that a task will have to wait for processor j if
it is assigned to this processor. Note that for the remote cloud M , SLM is always zero as
we assume that the cloud has infinite capacity in terms of the number of tasks it can process
simultaneously, resulting in zero waiting time for any task scheduled to the cloud.
As a result, the start time of a task i assigned to processor j is the maximum of the
accumulated execution delay Dij and current end of busy time SLj . Thus, in order for task i
to complete execution by its individual deadline Fi, the following condition must be satisfied:
max{Dij , SLj}+ tij ≤ Fi. (3.22)
We then choose processor si to schedule task i as follows:
si = argmin_{j∈Ji} Cij  if Ji ≠ ∅,
     argmin_j Dij       if Ji = ∅.  (3.23)
where Ji is the set of all processors for which (3.22) is satisfied for task i. From (3.23) we
see that if the individual deadline Fi is too tight and cannot be met by any processor, ITAGS
gracefully falls back to a greedy-time algorithm, i.e., one that tries to minimize makespan.
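The per-task decision of Step 2, equations (3.19)-(3.23), can be sketched as follows. This is an illustrative sketch, not the thesis implementation; the first dummy task, which has no parents, is assumed to be handled separately by the caller, and all argument names are hypothetical.

```python
def schedule_task(i, parents, ST, s, t, d, e, p, c, SL, F_i):
    """One ITAGS Step-2 decision for task i.

    parents  -- predecessors k of i, i.e. (k, i) in E (already scheduled)
    ST, s    -- start times and processor choices of scheduled tasks
    t[i][j]  -- execution time of task i on processor j
    d[u][v]  -- communication delay per unit data between processors (d[u][u] = 0)
    e[(k,i)] -- data sent from task k to task i
    p[j], c  -- processor and communication prices; SL[j] -- end of busy time
    F_i      -- individual deadline of task i from the relaxed solution
    """
    D, C = {}, {}
    for j in range(len(p)):
        T = {k: d[s[k]][j] * e[(k, i)] for k in parents}          # T_ki, zero if s_k = j
        D[j] = max(ST[k] + t[k][s[k]] + T[k] for k in parents)    # eq. (3.19)
        C[j] = p[j] * t[i][j] + c * sum(T.values())               # eq. (3.20)
    # Processors on which i can still meet its individual deadline, eq. (3.22).
    J = [j for j in D if max(D[j], SL[j]) + t[i][j] <= F_i]
    if J:                                 # eq. (3.23): cheapest feasible processor
        sj = min(J, key=lambda j: C[j])
    else:                                 # greedy-time fallback: minimize delay
        sj = min(D, key=lambda j: D[j])
    return sj, max(D[sj], SL[sj])         # chosen processor and actual start time
```

After each call, the caller would record the returned start time, set s[i], and, for a local processor, advance SL to the new end of busy time as in (3.21).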
Step 3: Feasibility Check
The process outlined in Step 2 is repeated until the last dummy task. This dummy task is to be
scheduled to the local processor that initiated the application in order to obtain the results at the
initiating device. If this last task does not meet the overall application deadline L, infeasibility
occurs and the algorithm fails to produce a feasible schedule. Alternatively, if every task has
been scheduled successfully to some processor, then a feasible decision is obtained. These two
possibilities result in termination of the algorithm.
The details of ITAGS are given in Algorithm 1.
Algorithm 1 ITAGS algorithm (after Step 1)
Input: DAG G = 〈V, E〉, P, L, and solution to problem (3.14).
Output: Scheduling decision variables {xijr}
  SLj ← 0 for all j ∈ P
  STi ← 0 for all i ∈ V
  s1 = 1 {Schedule first dummy task to initiating processor}
  while there exist tasks not scheduled do
    Choose unscheduled task i with minimum Fi
    for all j ∈ P do
      Calculate Dij from (3.19)
      Calculate Cij from (3.20)
    end for
    if i = N′ then
      sN′ = 1 {Schedule last dummy task to initiating processor}
    else
      Find si from (3.23)
    end if
    STi ← max{D_{i si}, SL_{si}} {Setting actual starting time}
    if si < M then
      SL_{si} ← STi + t_{i si} {Updating the end of busy time for local processors}
    end if
  end while
  if D_{N′ s_{N′}} > L then
    No feasible decision produced.
    return
  end if
  xijr ← 0 for all i, j and r
  Sort the tasks scheduled to each processor in increasing order of STi and obtain their positions ri.
  for all i ∈ V do
    x_{i si ri} = 1
  end for
3.2.4 Feasibility and Complexity Analysis
For our NP-hard optimization problem (3.10), neither the discretization heuristic nor ITAGS
provides a feasibility guarantee. Consequently, for practical cases, we may consider a fallback
option that simply offloads all tasks belonging to the application to the remote cloud, if a
feasible solution is not found by the algorithm. Such a fallback option is applicable under the
assumption that the cloud has fast processors and high-speed access, so that offloading to the
cloud can meet the overall application deadline, albeit with an added cost.
The computational complexity of the discretization heuristic is O(2|V|+ |E|). On the other
hand, the computational complexity for the ITAGS algorithm excluding the time to compute
the lower bound solution is O(M(|V| + |E|)), which is polynomial with respect to the size of
the application. The time to compute the lower bound is dependent on the algorithms used by
the software to arrive at the solution. Assuming that a primal barrier algorithm and a υ-self-concordant
barrier function with µ as the barrier parameter are used, the number of iterations
needed to arrive at the solution is O(√υ log(υµ/ε)) for a convex program [73].
3.3 Scheduling multiple applications with different deadlines
Now, we consider an extension where multiple applications need to be executed, each one
having its own application completion deadline. Let the total number of applications be A.
Each application a is modeled as a DAG Ga = 〈Va, Ea〉 where Va is the set of tasks and Ea is
the set of edges. Edge eaki specifies the amount of data to be communicated from task i to task
k belonging to application a.
Similar to the single-application case in Section 3.1.2, the number of tasks in application
a, including the dummy tasks, is given by N ′a. Each application a has an application deadline
given by La, and if its task i is executed on processor j, the execution time is taij .
We redefine the task scheduling decision variables as follows.
xaijr := 1 if task i belonging to application a is on processor j at position r,
         0 otherwise,

for all a = 1, . . . , A, i = 1, . . . , N′a, j = 1, . . . , M and r = 1, . . . , P, where P = ∑_{a=1}^{A} N′a.
3.3.1 Binary Relaxation
The relaxed problem (3.14) is modified for this new multiple-application model as follows.
minimize_{xaijr}  ∑_{a=1}^{A} ∑_{r=1}^{P} ∑_{j=1}^{M} ∑_{i=1}^{N′a} pj taij xaijr
                + ∑_{a=1}^{A} ∑_{i=1}^{N′a} ∑_{k=1}^{N′a} ∑_{j=1}^{M} ∑_{v=1}^{M} eaki dvj max[ (∑_{r=1}^{P} xaijr) + (∑_{t=1}^{P} xakvt) − 1, 0 ]  (3.24)

subject to

∑_{j=1}^{M} ∑_{r=1}^{P} xaijr = 1,  ∀a = 1, . . . , A, i = 1, . . . , N′a,  (3.25)

∑_{a=1}^{A} ∑_{i=1}^{N′a} xaijr ≤ 1,  ∀r = 1, . . . , P, j = 1, . . . , M,  (3.26)

∑_{a=1}^{A} ∑_{i=1}^{N′a} xaijr − ∑_{a=1}^{A} ∑_{i=1}^{N′a} xaij(r−1) ≤ 0,  ∀r = 2, . . . , P, j = 1, . . . , M,  (3.27)

F_{aN′a} ≤ La,  ∀a = 1, . . . , A,  (3.28)

Fa1 = 0,  ∀a = 1, . . . , A,  (3.29)

Fai − Fak ≥ ∑_{r=1}^{P} ∑_{j=1}^{M} taij xaijr + ∑_{j=1}^{M} ∑_{v=1}^{M} eaki dvj max[ (∑_{r=1}^{P} xaijr) + (∑_{t=1}^{P} xakvt) − 1, 0 ],
    ∀a = 1, . . . , A, i = 1, . . . , N′a, (k, i) ∈ Ea,  (3.30)

Fai − Fbk + C(2 − xaijr − xbkj(r−1)) ≥ taij,  ∀a, b = 1, . . . , A, i = 1, . . . , N′a,
    k = 1, . . . , N′b, r = 2, . . . , P, j = 1, . . . , M,  (3.31)

∑_{r=1}^{P} xa11r = 1,  ∑_{r=1}^{P} x_{aN′a 1r} = 1,  ∀a = 1, . . . , A,  (3.32)

xaijr ≥ 0,  ∀a = 1, . . . , A, i = 1, . . . , N′a, r = 1, . . . , P, j = 1, . . . , M.  (3.33)
Consequently, using this relaxed solution, we can extend the discretization heuristic and
ITAGS proposed in Sections 3.2.2 and 3.2.3 respectively, with some modifications, to solve this
problem.
3.3.2 Modified Alternative Discretization Heuristic
The discretization heuristic can be adjusted to accommodate multiple applications with different
deadlines by tweaking its three constituent steps presented in Section 3.2.2 as follows.
Reduction to Task-on-Processor
In this step, we assign xaijr values to their corresponding task-on-processor variables yaij as
follows:
yaij = ∑_{r=1}^{P} xaijr,  ∀a = 1, . . . , A, i = 1, . . . , N′a, j = 1, . . . , M.  (3.34)
Discretization
We next discretize the fractional yaij solutions to decide the processor assignment decision sai
for every task i by picking the processor that has the maximum yaij value.
sai = argmax_j yaij,  ∀a = 1, . . . , A, i = 1, . . . , N′a.  (3.35)

Thus, the decision for every task i is

yaij := 1 if j = sai,  0 if j ≠ sai.
Mapping to Positions
We sort the tasks in the order of increasing Fai values, ∀a = 1, . . . , A, i = 1, . . . , N′a, from the solution
to the relaxed problem (3.24). We schedule the tasks to their assigned processors in this order
to obtain their corresponding positions and starting time values.
Feasibility Check
For the above task schedule, we check if the application deadline La is met for every application
a. If so, the corresponding cost is the resulting solution, or else the algorithm fails to produce
a feasible task schedule for one or more applications. In the latter case, we will use the fallback
procedure described in Section 3.3.4.
3.3.3 Modified ITAGS Algorithm
Step 1: Individual Time Allocation
We solve the relaxed problem to obtain the finish time Fai for each task i belonging to appli-
cation a. These finish times are treated as individual task deadlines in the next step.
Step 2: Task Scheduling
Tasks are scheduled in the increasing order of individual deadline Fai. The topmost task is the
first dummy task, and it is always scheduled to the local processor where the application is
initiated. Then, as ITAGS moves down the list of unscheduled tasks, for each task i, we decide
its start time STai and processor sai.
First, for each potential processor j, we compute the accumulated execution delay Daij and
cost Caij , due to the execution of i of application a on processor j, as follows:
Daij = max_{(k,i)∈Ea} ( STak + t_{ak s_ak} + Taki )  (3.36)

Caij = pj taij + c ∑_{(k,i)∈Ea} Taki  (3.37)
where Taki = d_{s_ak j} eaki is the communication delay from processor sak to processor j with respect
to the data from task k to task i. As in the single-application case, every time that task i of
application a is assigned to processor j, we update the end of busy time of processor j as

SLj = STai + taij,  (3.38)
where STai is the start time of task i of application a.
In order for the task to complete execution by its individual deadline Fai, the following
condition must be satisfied:
max{Daij , SLj}+ taij ≤ Fai. (3.39)
We then choose processor sai to schedule task i of application a as follows:
sai = argmin_{j∈Jai} Caij  if Jai ≠ ∅,
      argmin_j Daij        if Jai = ∅.  (3.40)
where Jai is the set of all processors for which (3.39) is satisfied.
Step 3: Feasibility Check
The process outlined in Step 2 is repeated until all tasks are scheduled. The dummy tasks
are to be scheduled to the local processor that initiated each application. If the last dummy task of any application a
does not meet its application deadline La, infeasibility occurs and the algorithm fails to
produce a feasible schedule for that application. Alternatively, if every task has been scheduled
successfully to some processor, then a feasible decision is obtained. These two possibilities
result in termination of the algorithm.
3.3.4 Feasibility and Complexity Analysis
For this NP-hard optimization problem (3.24), neither the discretization heuristic nor ITAGS
provides a feasibility guarantee. If application deadline La is not met for some application a,
we use a fallback option where the algorithm simply offloads all tasks belonging to application
a to the remote cloud, similar to Section 3.2.4. Note that this will not alter the feasibility of
the other applications.
The computational complexity of the discretization heuristic, excluding the time to compute
the lower bound solution, is O(2 ∑_{a=1}^{A} (|Va| + |Ea|)). On the other hand, the computational
complexity of the ITAGS algorithm is O(M ∑_{a=1}^{A} (|Va| + |Ea|)), which is polynomial with
respect to the total number of applications and their sizes.
Algorithm 2 Modified ITAGS algorithm (after Step 1)
Input: DAGs Ga = 〈Va, Ea〉, P, La, ∀a ∈ {1, . . . , A}, and solution to problem (3.24).
Output: Scheduling decision variables {xaijr}
  SLj ← 0 for all j ∈ P
  STai ← 0 for all a ∈ {1, . . . , A}, i ∈ Va
  sa1 = 1, ∀a ∈ {1, . . . , A} {Schedule first dummy task of each a to initiating processor}
  while there exist tasks not scheduled do
    Choose unscheduled task i with minimum Fai across all applications a
    a ← the application that includes task i
    for all j ∈ P do
      Calculate Daij from (3.36)
      Calculate Caij from (3.37)
    end for
    if i = N′a then
      s_{aN′a} = 1 {Schedule last dummy task to initiating processor}
    else
      Find sai from (3.40)
    end if
    STai ← max{D_{ai sai}, SL_{sai}} {Setting actual starting time}
    if sai < M then
      SL_{sai} ← STai + t_{ai sai} {Updating the end of busy time for local processors}
    end if
  end while
  for all a ∈ {1, . . . , A} do
    if D_{aN′a s_{aN′a}} > La then
      No feasible decision produced.
      return
    end if
  end for
  xaijr ← 0 for all a, i, j and r
  Sort the tasks scheduled to each processor in increasing order of STai and obtain their positions rai.
  for all a ∈ {1, . . . , A}, i ∈ Va do
    x_{ai sai rai} = 1
  end for
3.4 Trace-driven and Randomized Simulations
We investigate the performance of ITAGS with extensive simulation over multiple offloading
scenarios and applications, using both real-world applications and randomly generated task
trees with practical parameter values.
3.4.1 Comparison Targets
We compare ITAGS with the following alternatives:
• Discretization heuristic in Section 3.2.2 for single application, and modified discretization
heuristic in Section 3.3.2 for multiple applications.
• Purely local: Scheduling all tasks on the local device, i.e., user’s own device/processor.
• Purely remote: Scheduling all tasks on the remote cloud.
• Greedy algorithm: Picking tasks starting from the top of the DAG and scheduling each
task onto the processor where it has the least accumulated cost such that the overall
application deadline is still met.
• Kao’s dynamic programming: The dynamic programming method proposed in [14, 21].
Since the local device in [14,21] can execute any number of tasks simultaneously without
increasing the required processing time of each task, it is essentially assumed to have
an infinite number of identical unary-capacity processors. Furthermore, there is zero
delay between the local device and the remote cloud. Thus, we study the performance
of this dynamic programming algorithm by allowing only a finite number of identical
local processors and practical delay between these processors. In other words, we run
their algorithm to obtain a scheduling decision and apply this decision to our system
by queuing the tasks appropriately and calculating the cost and deadline accounting for
inter-processor delay.
All of the above algorithms are provided with the same fallback option as ITAGS. The lower
bound solution, described in Section 3.2.1, is also observed for benchmarking purposes. It is
calculated using the SDPT3 solver of CVX.
We present the results for the single-application case in Sections 3.4.2, 3.4.3 and 3.4.4, and for the
multiple-applications case in Section 3.4.5. Additionally, in Section 3.4.5, we evaluate the performance
of ITAGS when the task processing times and communication times are not precisely known
or are subject to variation.
3.4.2 Trace-driven Simulation
The delay constrained cost minimization problem under consideration and the ITAGS algorithm
have general application to different network topologies. Here we consider the following typical
scenarios.
[Figure 3.3: Simulation topology — the user’s own mobile device/processor, a peer device/local processor, a cloudlet, and the remote cloud, interconnected by 802.11n, 802.11ac, and LTE-A links.]
[Figure 3.4: Cost vs. application deadline for (a) Gaussian elimination and (b) FFT in Scenario 1.]
[Figure 3.5: Cost vs. application deadline for (a) Gaussian elimination and (b) FFT in Scenario 2.]
[Figure 3.6: Cost vs. application deadline for (a) Gaussian elimination and (b) FFT in Scenario 3.]
[Figure 3.7: Cost vs. application deadline for (a) Gaussian elimination and (b) FFT in Scenario 4.]
Figure 3.8: Task Graph for Gaussian Elimination Application with a matrix size of 5
• Scenario 1 : Identical local processors (including the initiating processor) and a remote
cloud
• Scenario 2 : The initiating processor, a peer device/local processor, and a remote cloud
• Scenario 3 : The initiating processor, a cloudlet, and a remote cloud
• Scenario 4 : The initiating processor and a 3-tier architecture (peer device, cloudlet and
remote cloud) given in Figure 3.3.
Note that we apply Kao’s dynamic programming scheme only to Scenario 1, as the system
model considered in [14,21] cannot be extended to more complicated scenarios. We always label
the local processor initiating the application as processor 1.
We use the application DAG structures presented in [72] for Gaussian elimination (depicted
in Figure 3.8) and the FFT algorithm, as well as additional information provided in [72] with
respect to the computation and communication times, to test our proposed algorithms for the
aforementioned scenarios. We consider the Gaussian elimination application with a matrix size
of 5. We generate random values for the processing time of a single loop uniformly from the
interval (0.5, 5) ms and allocate processing times ti1 for each task i accordingly based on the
number of loops required for the execution of the task in the Gaussian elimination algorithm.
Similarly, the input/output data is drawn uniformly from the interval (10, 100) KB. For the
FFT algorithm, we generate the processing times ti1 for each task i uniformly in (0.5, 20) ms
and input/output data amount is drawn uniformly from the interval (10, 100) KB. Further, we
enforce that the computation times for the tasks in each level are equal and the communication
times between the tasks at two particular levels are equal as given in [72]. The rest of the
parameter values are kept the same as those of the Gaussian elimination application.
We use energy consumption as the measurement of cost. We set c = 0.935 watt [74].
The local processor initiating the application has p1 = 0.944 watt [74]. The following are the
parameter settings for the different scenarios.
• Scenario 1 : 3 additional identical local processors which may all be on the same initiating
device or on different devices. We assume that all these local processors have pj = 0.944
watt and tij = ti1, for each task i and processor j = 2, 3, 4.
• Scenario 2 : A single additional local processor, representing the peer device, having
p2 = 1.5 watts and ti2 = 0.75ti1 for task i.
• Scenario 3 : For Scenario 3, we consider a cloudlet consisting of two processors p2 = p3 = 4
watts and ti2 = ti3 = 0.5ti1 for each task i.
• Scenario 4 : Both the peer device from Scenario 2 and the cloudlet from Scenario 3.
For each of the above scenarios, we additionally consider a more powerful and consequently more
expensive remote cloud, consisting of an unlimited number of available processors, with pM = 10
watts and tiM = 0.12ti1 for each task i. For communication between processors, we consider
practical communication delay based on the links as given in Figure 3.3, with 6.15 ns/byte
for 802.11ac, 17.77 ns/byte for 802.11n, and 80 ns/byte for Long Term Evolution Advanced
(LTE-A).
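As a quick sanity check on these per-byte delays, the transfer time of a hypothetical 100 KB inter-task data item (an illustrative size; 1 KB is taken as 1000 bytes here) on each link type works out as follows:

```python
# Transfer delay of an illustrative 100 KB inter-task data item on each link,
# using the per-byte delays quoted above.
data_bytes = 100 * 1000
links_ns_per_byte = {"802.11ac": 6.15, "802.11n": 17.77, "LTE-A": 80.0}
for link, ns_per_byte in links_ns_per_byte.items():
    delay_ms = data_bytes * ns_per_byte * 1e-6   # ns -> ms
    print(f"{link}: {delay_ms:.3f} ms")
# 802.11ac: 0.615 ms, 802.11n: 1.777 ms, LTE-A: 8.000 ms
```

Even a moderate transfer over LTE-A therefore costs several milliseconds, which is on the same order as the task execution times used here; this is why ignoring inter-processor delay, as in [14, 21], can distort the resulting schedule.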
Figures 3.4-3.7 depict the cost versus application deadline for the Gaussian elimination and
FFT applications for these four scenarios. We see from these figures that ITAGS performs
consistently better than the other alternatives. From Figures 3.4a and 3.4b, we see that the
dynamic programming approach in [14, 21] performs poorly when subjected to practical con-
straints such as finite capacity processors and inter-processor delay. Naive algorithms such as
purely local, purely remote, and greedy do not give satisfactory cost, particularly for non-trivial
values of application deadline. The discretization heuristic generally performs better than the
naive alternatives but is out-performed by ITAGS. We also see that with increasing values of
application deadline, the cost decreases due to the cost-time tradeoff. For large values of appli-
cation deadline, the decisions tend towards being purely local as the local device is the cheapest
and the slowest in our settings.
[Figure 3.9: Cost vs. application deadline for randomly generated task trees — (a) effect of the number of processors M; (b) effect of communication cost c; (c) effect of local delay per byte dl; (d) effect of the number of tasks N.]
3.4.3 Simulation with Randomly Generated Task Trees
In order to further assess the behavior of ITAGS over richer parameter settings, we conduct
simulation based on randomly generated task trees in terms of the DAG structure, task execu-
tion times on the processors, and input/output data between tasks. For each parameter setting,
we observe the average performance of various algorithms over multiple realizations of the ran-
domly generated task trees. From our observations in the previous section, the discretization
heuristic mostly outperforms the other naive alternatives. Hence, we present comparison only
with the discretization heuristic, but also use the lower bound solution for benchmarking.
In Figure 3.9, we study the effect of various parameters on the performance of ITAGS. We
again use energy consumption as the measurement of cost. We consider a general topology with
multiple local helper processors and a remote cloud. We assume the processor initiating the
application, labeled as processor 1, has p1 = 0.944 watt [74], and any additional local helper
processors, representing faster cloudlets or peer devices, have pj = 1.5 watts and tij = 0.75ti1 for
each task i and processor j = 2, . . . , (M − 1). We consider a more powerful and consequently
more expensive remote cloud with pM = 10 watts and tiM = 0.25ti1 for each task i. Here,
Table 3.2: Run-time (sec)

            M = 3                      M = 4
 N     Disc. Heu.    ITAGS       Disc. Heu.    ITAGS
 5       5.0612      5.0620        8.6019      8.6029
 7      10.0160     10.0167       17.0474     17.0484
10      22.9529     22.9538       41.5981     41.5991
15      79.7471     79.7484      178.5292    178.5308
ti1 = (number of cycles)/(1.2 GHz), where the processor speed is 1.2 GHz and the number of cycles is drawn from
a uniform distribution in the interval (100, 200) mega cycles. We set by default M = 3, N = 5,
c = 0.935 watt [74], and communication delay between the local processors dl = 10 ns/byte
but vary each of them in different plots. The input/output data amount is drawn uniformly
from the interval (1, 3) MB. The communication delay is taken as 50 ns/byte between a local
processor and the remote cloud.
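For reproducibility, the numeric parameters described above can be drawn as in the following sketch; the DAG structure itself is generated separately and is not shown, and the function and variable names are illustrative rather than taken from the thesis code.

```python
import random

def draw_instance(N, M, seed=None):
    """Draw the randomized per-task parameters used in the simulations.

    Returns t[i][j] (execution times, seconds), p[j] (prices, watts), and a
    sampler for the per-edge input/output data amount (bytes).  Processor 1
    is the initiating device, processors 2..M-1 are local helpers, and
    processor M is the remote cloud.
    """
    rng = random.Random(seed)
    # t_i1 = (number of cycles) / 1.2 GHz, cycles uniform in (100, 200) Mcycles.
    t1 = [rng.uniform(100e6, 200e6) / 1.2e9 for _ in range(N)]
    # Helpers run in 0.75x the local time, the remote cloud in 0.25x.
    t = [[ti] + [0.75 * ti] * (M - 2) + [0.25 * ti] for ti in t1]
    # Prices: initiating processor, local helpers, remote cloud.
    p = [0.944] + [1.5] * (M - 2) + [10.0]
    # Per-edge data drawn uniformly from (1, 3) MB.
    edge_data = lambda: rng.uniform(1e6, 3e6)
    return t, p, edge_data
```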
We see that ITAGS substantially outperforms the discretization heuristic, and by inference
the other alternatives, over a wide range of parameter values in the number of processors,
the communication price, and the application size. Furthermore, as the application deadline
increases, ITAGS converges to the lower bound solution, and hence also converges to the opti-
mum, faster than the discretization heuristic.
3.4.4 Run-Time Comparison
In Table 3.2, we show the run-time of ITAGS under the settings of Figure 3.9a, averaged over all
L values. We observe that the run-time of ITAGS is nearly identical to that of the discretization
heuristic. Therefore, the substantial performance benefit of ITAGS is achieved with negligible
run-time penalty. Furthermore, ITAGS scales well with respect to the application size.
3.4.5 Multiple Applications and Uncertain Processing and Communication
Times
In this section, we consider multiple applications with different deadlines and the modified
algorithms proposed in Section 3.3. For this general scenario, we also study the robustness of
ITAGS to variation in the processing times and communication times.
Figure 3.10 depicts cost versus application deadline for different number of applications A.
For this figure, we assume that all applications have the same deadline, and use the default
parameter settings used in Section 3.4.3. We see that when A = 2 and A = 3, the performance
gap between ITAGS and the discretization heuristic is larger than when A = 1, i.e., the single-
application case. In other words, ITAGS can be expected to perform comparatively even better for
larger systems with multiple applications.
We now consider the scenario where we do not know the exact task processing times and
[Figure 3.10: Cost vs. application deadline for different number of applications A.]
[Figure 3.11: Cost vs. realizations for 15% error.]
[Figure 3.12: Cost vs. error with known and unknown processing and communication times (with 95% confidence intervals) — (a) single Gaussian elimination application; (b) two randomly-generated applications.]
communication times. We assume that we know certain estimates of these times, and allow
for an error about these estimates. We use the default parameter settings from Section 3.4.3,
except treating the processing time values (tij for all task i and processor j) and communication
time values (djr between processors j and r) there as estimates. We allow the actual times to
vary uniformly randomly between ((1 − ε)tij , (1 + ε)tij) and ((1 − ε)djr, (1 + ε)djr) for some
error ε ∈ [0, 1].
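This error model amounts to scaling every estimate by an independent uniform factor; a minimal helper (with illustrative names) is:

```python
import random

def perturb(estimate, eps, rng=random):
    """Draw an 'actual' time uniformly from ((1 - eps)*est, (1 + eps)*est),
    matching the error model described above."""
    return estimate * rng.uniform(1 - eps, 1 + eps)

def perturb_instance(t, d, eps, rng=random):
    """Perturb every processing time t[i][j] and delay d[j][r] of one realization."""
    t_act = [[perturb(tij, eps, rng) for tij in row] for row in t]
    d_act = [[perturb(djr, eps, rng) for djr in row] for row in d]
    return t_act, d_act
```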
Figures 3.11 and 3.12 compare the costs for the cases with known and unknown task pro-
cessing times and communication times. The labels marked ’unknown’ refer to the cases where
we know just the estimates and not the exact times. The labels marked ’known’ refer to the
cases where we know the exact times. For Figure 3.11, we consider two randomly-generated
applications with deadlines L1 = 0.6 and L2 = 0.8. We fix error ε = 0.15, and run ITAGS and
the discretization heuristic for multiple realizations. We see that ITAGS performs consistently
better than the discretization heuristic for every realization, for both known and unknown cases.
Additionally, we see that for some realizations, ITAGS and the discretization heuristic provide
the same solution for the ’known’ scenario but discretization heuristic provide a worse solution
for the ’unknown’ scenario. This implies that ITAGS is comparatively a more robust algorithm.
Figure 3.12 depicts the cost performance versus error. For each value of the error $\varepsilon$, we run several realizations and average the performance. We plot the 95% confidence intervals to understand the variation in cost values for a particular value of the error, and consequently gain insight into the robustness of the schemes. Figure 3.12a considers a single Gaussian elimination application with a deadline $L = 0.6$, and Figure 3.12b considers two randomly-generated applications with deadlines $L_1 = 0.6$ and $L_2 = 0.8$. We see that ITAGS exhibits better cost performance than the discretization heuristic in both the known and unknown cases. We also see that the confidence intervals for ITAGS are smaller on average than those of the discretization heuristic, which implies that ITAGS is more robust to variation in processing/communication times.
3.5 Summary
We study the scheduling of applications consisting of dependent tasks on heterogeneous pro-
cessors with communication delay and application completion deadlines. The proposed cost
minimization formulation is generic, allowing different cost structures and processor topologies.
To overcome the obstacles of task dependency and deadline constraint, we have developed the
ITAGS approach, where the scheduling of each task is assisted by an individual time allowance
obtained from a binary-relaxed version of the original optimization problem. Through trace-
driven and randomized simulations, we show that ITAGS substantially outperforms a wide
range of known algorithms. Furthermore, as the deadline constraint is relaxed, it converges to
optimality much faster than other alternatives.
Chapter 4
Multi-user Task Scheduling with Budget Constraints
In this chapter, we study task scheduling and offloading in a cloud computing system with
multiple users where tasks have different processing times, release times, communication times,
and weights. Each user may schedule a task locally or offload it to a shared cloud with het-
erogeneous processors by paying a price for the resource usage. Our work aims at identifying
a task scheduling decision that minimizes the weighted sum completion time of all tasks, while
satisfying the users’ budget constraints.
Our main contributions are summarized below:
• We first consider the problem where all tasks are available at time zero and communication
times are negligible. We formulate an interval-indexed ILP inspired by [25]. Using a
relaxed LP-solution, we obtain an integer solution that is shown to provide a constant-
factor approximation to the minimum weighted sum completion time. Even though this integer solution may violate the budget constraints, we make the interesting observation that the average budget violation decreases as the number of users grows.
• Based on the above observation, we propose an algorithm termed Single Task Unload for
Budget Resolution (STUBR). In addition to finding a relaxed and rounded LP-solution for
the above interval-indexed ILP, STUBR resolves budget violations. We prove performance
bounds for this budget-resolved solution. We then use a greedy task ordering scheme on
each processor to further reduce the weighted sum completion time. We also study the
computational complexity of STUBR.
• We then extend STUBR to more practical models (a) with task release times, (b) with
fixed communication times, and (c) with sequence-dependent communication times, i.e.,
considering a finite-capacity channel model where tasks must be sequenced and commu-
nicated. We obtain performance bounds for these cases as well.
• Our trace-driven simulation shows that STUBR performs consistently better than the existing alternatives. It exhibits maximum performance gains of more than 50% for both chess and compute-intensive applications [22] in comparison with the Greedy Weighted Shortest Processing Time (WSPT) scheme. Finally, our simulation results demonstrate that STUBR is highly scalable with respect to the number of users in the system.
The general problem of minimizing the weighted sum completion time on a single processor
has been well studied [65], but few works in the literature have considered the same objective
to schedule tasks in a multi-processor cloud environment. In [40], the authors proposed an Ant
Colony Optimization based algorithm to solve this NP-hard problem. The same objective was
also considered in [75] for scheduling coflows in data center networks and approximation algo-
rithms were proposed. In [25], the authors considered this objective for scheduling tasks with
release times on parallel processors, and proposed an 8-approximation algorithm. Our solu-
tion approach is inspired by [25]. However, our problem, in addition to considering a weighted
sum completion time objective and task release times, also accounts for multiple users, and
per-user budget constraints, which renders our problem more challenging than those addressed
in [25,40,65,75]. Additionally, in this work, we also address the proposed problem under more
generic task communication time models. The existing works that consider the sum completion time objective in multiprocessor environments, i.e., [25, 40, 65, 75], do not consider communication times and cannot be easily extended to accommodate a finite-capacity communication channel model as considered in this work.
The rest of the chapter is organized as follows. Section 4.1 describes the system model
and the problem formulation. In Section 4.2, we propose the STUBR algorithm, and present
performance guarantees. In Section 4.3, we extend this to the problem with release times, fixed
communication times, and with sequence-dependent communication times. Section 4.4 presents
the simulation results, and we summarize the chapter in Section 4.5.
4.1 System Model and Problem Formulation
In this section, we present details of the system model and problem formulation. Initially,
we consider the problem of scheduling immediately available tasks to heterogeneous processors
under user budget constraints. In Section 4.3, we extend this to the case where tasks have
release times, fixed communication times, and sequence-dependent communication times.
4.1.1 System Model
Processors and Tasks
We consider a system with N user/mobile devices. Each user i ∈ {1, . . . , N} wishes to complete
a set of independent tasks, denoted by Ji. Each user has its own unary local processor, i.e., it
can execute only one task at a time. This assumption is without loss of generality, as allowing
Figure 4.1: Example system of 3 users and 5 cloud processors.
Table 4.1: Chapter 4 Notations

Notation             Description
$t_j$                local processing time of task $j$
$c_j$                communication time of task $j$
$t^R_j$              release time of task $j$
$w_j$                weight of task $j$
$\alpha_{ir}$        speed-up achieved by user $i$'s tasks on processor $r$
$\beta_r$            cost per unit time to utilize processor $r$
$B_i$                budget of user $i$
$J_i$                set of tasks user $i$ wishes to execute
$R$                  set of all processors (local and cloud)
$C$                  set of cloud processors
$R_i$                set of cloud processors and user $i$'s local processor
$R'$                 set of machine-interval processors
$N$                  total number of users
$(\tau_{l-1}, \tau_l)$   time interval $l$
$L$                  number of intervals
multiple tasks to share a processor simultaneously will not provide any improvement to our
weighted sum completion time objective. The system includes a finite-capacity cloud consisting
of a number of heterogeneous processors that run at different speeds. Let C be the set of cloud
processors. Each processor at the cloud is assumed to be unary. Similar to the local processor
scenario, this unary-capacity assumption is also without loss of generality. An example of such
a system is illustrated in Figure 4.1.
The processing time for each task j ∈ Ji is tj on user i’s local processor. The speed-up
factor for each cloud processor r is αir ≥ 0, so that the processing time for task j at processor r
is αirtj . Each user can execute its tasks either locally or remotely at one of the cloud processors.
The processing times may be obtained by applying a program profiler as shown in experimental
studies such as MAUI [7], Clonecloud [31], and Thinkair [12]. In this work, we proceed assuming
that such information is already given. We also consider a weight $w_j$ associated with each task, to signify the relative urgency of certain tasks with respect to the others. For notational simplicity, we further define $R$ as the set of all processors (including all users' local processors) and $R_i$ as the set of processors to which user $i$ can offload its tasks, i.e., its own local processor and the cloud processors.
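As a minimal illustration of the processing-time model above (the helper name and values here are purely illustrative, not part of the thesis), a cloud processor scales a task's local time $t_j$ by the user-specific speed-up factor $\alpha_{ir}$:

```python
# Illustrative sketch of the processing-time model: p_jr = alpha_ir * t_j on
# a cloud processor, and t_j on the user's own local processor.

def processing_time(t_j, alpha_ir, on_cloud):
    """Return p_jr: cloud processors scale t_j by the user-specific
    speed-up factor alpha_ir; the local processor runs the task at t_j."""
    return alpha_ir * t_j if on_cloud else t_j

# Example: a task with local time 4.0 and cloud speed-up factor 0.5
print(processing_time(4.0, 0.5, on_cloud=True))   # -> 2.0
print(processing_time(4.0, 0.5, on_cloud=False))  # -> 4.0
```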
User Budget
The users are required to pay a certain price per unit time to use the processors at the cloud,
but no price to execute tasks locally on their own device. Let βr be the cost per unit time for
executing a task on processor r. Each user i has a budget Bi that determines the total expense
that the user is willing to incur for offloading tasks to the cloud.
Release Times and Communication Times
Each task j has a release time tRj , i.e., the time at which the task j becomes available at the local
processor. Furthermore, each task may require some input data that needs to be communicated
if the task is to be executed at the cloud. The time to transmit the input data for task j to the
cloud is given by cj . We consider two different communication models:
1. Fixed Communication Times: The input data for each task j can be transmitted to the cloud as soon as the task is available. Hence, the communication delay for task j is simply $c_j$. This allows us to model a communication link with a large number of channels.
2. Sequence-dependent Communication Times: The input data for each task cannot necessarily be transmitted as soon as the task is available. We assume that the data is transmitted to the scheduled processor one task at a time, i.e., the channel to a processor is unary. Hence, the overall communication delay for task j is the sum of the transmission times of itself and all tasks sequenced before it. This allows us to model a communication link with finite capacity.
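The difference between the two models can be sketched as follows (a toy sketch with hypothetical names; in the sequence-dependent case we assume a transmission order on the unary channel is already given):

```python
# Toy sketch of the two communication-delay models (illustrative names).

def fixed_delay(c, j):
    """Fixed model: task j's delay is just its own transmission time c[j]."""
    return c[j]

def sequence_dependent_delay(c, order, j):
    """Unary-channel model: task j waits for every task sequenced before it,
    so its delay is the sum of transmission times up to and including j."""
    k = order.index(j)
    return sum(c[t] for t in order[:k + 1])

c = {1: 2.0, 2: 1.0, 3: 3.0}   # transmission times c_j
order = [2, 1, 3]               # transmission sequence on the channel
print(fixed_delay(c, 1))                      # -> 2.0
print(sequence_dependent_delay(c, order, 1))  # -> 3.0 (c_2 + c_1)
```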
4.1.2 Problem Formulation
For clarity of presentation, we initially consider the case where all tasks are released and available at time zero, and the links between processors are fast enough that the communication delay between them is negligible. This leads to the problem formulation in this section and the corresponding STUBR algorithm in Section 4.2. We provide details on how they are extended to the cases with non-zero task release times, fixed communication times, and sequence-dependent communication times in Section 4.3.
We wish to identify the task scheduling decision that minimizes the weighted sum completion
time of all tasks subject to user budget constraints. We formulate the proposed problem by
using an interval-indexing method proposed in [25]. Towards this end, we divide the time axis
into intervals $(\tau_{l-1}, \tau_l)$, where $\tau_0 = 1$ and $\tau_l = 2^{l-1}$ for $l \in \{1, \dots, L\}$, and $L$ is the smallest integer such that

$$2^{L-1} \ge \sum_j t_j.$$

This means that $2^{L-1}$ is a sufficiently large time horizon for the scheduling of all given tasks, since it accounts for the worst-case completion time $\sum_j t_j$. The task scheduling decision determines the processor on which each task should be scheduled, as well as the order of the tasks. We define decision variables $\{x_{jrl}\}$, where $x_{jrl} = 1$ if and only if task $j$ finishes execution on processor $r$ in time interval $l \in \{1, \dots, L\}$. Such an approach reduces the number of variables in our formulation in comparison with a time-indexed formulation with constant-size intervals, making it computationally tractable, with a small penalty in the precision of quantifying the optimization objective. The optimization problem is defined below.
$$\min_{\{x_{jrl}\}} \sum_{i=1}^{N} \sum_{j \in J_i} w_j \sum_{r \in R_i} \sum_{l=1}^{L} \tau_{l-1} x_{jrl}, \tag{4.1}$$
s.t.
$$\sum_{l=1}^{L} \sum_{r \in R_i} x_{jrl} = 1, \quad \forall i \in \{1, \dots, N\},\ j \in J_i, \tag{4.2}$$
$$\sum_{i=1}^{N} \sum_{j \in J_i} \alpha_{ir} t_j x_{jrl} \le \tau_l, \quad \forall r \in R,\ l \in \{1, \dots, L\}, \tag{4.3}$$
$$\sum_{j \in J_i} \sum_{l=1}^{L} \sum_{r \in R_i} \beta_r \alpha_{ir} t_j x_{jrl} \le B_i, \quad \forall i \in \{1, \dots, N\}, \tag{4.4}$$
$$x_{jrl} = 0, \ \text{if } \tau_l < \alpha_{ir} t_j, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in C,\ l \in \{1, \dots, L\}, \tag{4.5}$$
$$x_{jrl} = 0, \ \text{if } \tau_l < t_j, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \notin C,\ l \in \{1, \dots, L\}, \tag{4.6}$$
$$x_{jrl} = 0, \ \text{if } B_i < \beta_r \alpha_{ir} t_j, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in R,\ l \in \{1, \dots, L\}, \tag{4.7}$$
$$x_{jrl} \in \{0, 1\}, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in R,\ l \in \{1, \dots, L\}. \tag{4.8}$$
The objective (4.1) is to minimize the weighted sum completion times of tasks across all users.
Constraint (4.2) ensures that every task is assigned to exactly one processor and one interval.
Constraint (4.3) enforces that for each interval l, the total load on every processor r cannot
exceed $\tau_l$. Constraint (4.4) enforces the budget constraint for each user. Constraints (4.5)-(4.7) ensure that individual tasks do not exceed the interval deadline $\tau_l$ or the budget. Constraint (4.8) forces the decision variables to take binary values.
Remark 1. One may note that $\tau_{l-1}$ is a lower bound on the completion time of a task completing in interval $l$, and consequently, (4.1) is a lower bound on the weighted sum completion time.
In Sections 4.2 and 4.3, we present algorithms that provide worst-case performance guarantees
in terms of constant factors above this lower-bound objective. Hence, the same algorithms also
have at least the same worst-case performance guarantees with respect to the optimal weighted
sum completion time.
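The interval-indexed time axis above can be sketched as follows (a small sketch under the stated convention $\tau_0 = 1$, $\tau_l = 2^{l-1}$; the helper name is illustrative):

```python
# Sketch of the geometric interval structure: tau_0 = 1, tau_l = 2**(l-1),
# and L is the smallest integer with 2**(L-1) >= sum_j t_j.
import math

def interval_boundaries(local_times):
    """Return (L, [tau_0, ..., tau_L]) for the given local processing times."""
    horizon = sum(local_times)
    # smallest L with 2**(L-1) >= horizon
    L = max(1, math.ceil(math.log2(horizon)) + 1)
    taus = [1] + [2 ** (l - 1) for l in range(1, L + 1)]
    return L, taus

# sum of times = 10, so L = 5 and the horizon 2**(L-1) = 16 covers it
print(interval_boundaries([3.0, 4.0, 3.0]))  # -> (5, [1, 1, 2, 4, 8, 16])
```

Note how $L$ grows only logarithmically with the total processing time, which is the source of the formulation's tractability.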
4.2 The STUBR algorithm
In this section, we present the STUBR algorithm to solve problem (4.1). We then prove
some guarantees and properties of this algorithm, to better understand its functionality and
performance.
STUBR has the following steps:
1. Relax the integer constraints in problem (4.1) and obtain a relaxed solution.
2. Round this solution to obtain an integer solution whose objective value is no higher than 8 times the objective value achieved by the relaxed solution, and thus also no higher than 8 times the optimal objective value of problem (4.1). While this rounded solution may violate the budgets, we prove that the average cost over a large number of users meets the average user budget.
3. We resolve any budget violation by strategically moving some tasks to the local device.
4. To further reduce the total weighted completion time, we note that the well-known WSPT
is optimal for a single processor and jobs without release times. Hence, on each processor,
we reorder the tasks allocated to it by WSPT ordering.
These steps are explained in detail in the following sections.
4.2.1 Relaxed Solution
For each user $i \in \{1, \dots, N\}$, $j \in J_i$, and $r \in R_i$, let $p_{jr}$ and $b_{jr}$ be the processing time and cost for scheduling task $j$ on processor $r$. For our initial plain model with no release times and
communication times, we have

$$p_{jr} := \begin{cases} \alpha_{ir} t_j & \text{if } r \in C, \\ t_j & \text{otherwise}, \end{cases} \tag{4.9}$$

$$b_{jr} := \begin{cases} \beta_r \alpha_{ir} t_j & \text{if } r \in C, \\ 0 & \text{otherwise}. \end{cases} \tag{4.10}$$
Using (4.9) and (4.10), we reformulate the optimization problem in Section 4.1.2, and relax the
integer constraints to obtain the following linear program.
$$\min_{\{x_{jrl}\}} \sum_{i=1}^{N} \sum_{j \in J_i} w_j \sum_{r \in R_i} \sum_{l=1}^{L} \tau_{l-1} x_{jrl}, \tag{4.11}$$
s.t.
$$\sum_{l=1}^{L} \sum_{r \in R_i} x_{jrl} = 1, \quad \forall i \in \{1, \dots, N\},\ j \in J_i, \tag{4.12}$$
$$\sum_{i=1}^{N} \sum_{j \in J_i} p_{jr} x_{jrl} \le \tau_l, \quad \forall r \in R,\ l \in \{1, \dots, L\}, \tag{4.13}$$
$$\sum_{j \in J_i} \sum_{r \in R_i} \sum_{l=1}^{L} b_{jr} x_{jrl} \le B_i, \quad \forall i \in \{1, \dots, N\}, \tag{4.14}$$
$$x_{jrl} = 0, \ \text{if } \tau_l < p_{jr}, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in R,\ l \in \{1, \dots, L\}, \tag{4.15}$$
$$x_{jrl} = 0, \ \text{if } B_i < b_{jr}, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in R,\ l \in \{1, \dots, L\}, \tag{4.16}$$
$$x_{jrl} \ge 0, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in R,\ l \in \{1, \dots, L\}. \tag{4.17}$$
The above linear program can be solved efficiently in polynomial time to obtain a relaxed solution to problem (4.1). This formulation also resembles the LP-relaxed version of the problem of minimizing the weighted sum completion time on a system of unrelated1 machines, i.e., $R\,||\sum w_j C_j$ in the standard parallel-processor-scheduling notation, as formulated in [25]. However, our formulation has additional budget constraints, (4.14) and (4.16), that must be met for each user, and it accommodates multiple users, unlike the formulation in [25]. These aspects make our formulation more complex, requiring more sophisticated techniques for recovering an integer solution and resolving budget overage.
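The LP relaxation (4.11)-(4.17) can be sketched on a toy two-user instance as follows (the instance and all names are illustrative, and we assume `scipy` is available; the code simply enumerates the feasible variables $x_{jrl}$ and hands the LP to `linprog`):

```python
# Toy sketch of the LP relaxation (4.11)-(4.17) with scipy.optimize.linprog.
# Two users, one task each; user j owns task j. All values are illustrative.
import numpy as np
from scipy.optimize import linprog

tasks = [0, 1]
w = [1.0, 2.0]                 # weights w_j
t = [2.0, 3.0]                 # local processing times t_j
procs = ["local0", "local1", "cloud"]
alpha = {0: 0.5, 1: 0.5}       # cloud speed-up alpha_ir per user
beta = 1.0                     # cloud price per unit time beta_r
B = [10.0, 0.5]                # per-user budgets B_i
taus = [1, 1, 2, 4, 8]         # tau_0..tau_L with L = 4

def p(j, r):                   # processing time p_jr, cf. eq. (4.9)
    return alpha[j] * t[j] if r == "cloud" else t[j]

def b(j, r):                   # cost b_jr, cf. eq. (4.10)
    return beta * alpha[j] * t[j] if r == "cloud" else 0.0

allowed = {0: ["local0", "cloud"], 1: ["local1", "cloud"]}  # R_i
idx = {}                       # variable index for each feasible x_jrl
for j in tasks:
    for r in allowed[j]:
        for l in range(1, len(taus)):
            if taus[l] >= p(j, r) and B[j] >= b(j, r):      # (4.15)-(4.16)
                idx[(j, r, l)] = len(idx)

c = np.zeros(len(idx))         # objective (4.11): w_j * tau_{l-1}
for (j, r, l), k in idx.items():
    c[k] = w[j] * taus[l - 1]

A_eq, b_eq = [], []
for j in tasks:                # assignment constraint (4.12)
    row = np.zeros(len(idx))
    for key, k in idx.items():
        if key[0] == j:
            row[k] = 1.0
    A_eq.append(row); b_eq.append(1.0)

A_ub, b_ub = [], []
for r in procs:                # capacity constraint (4.13)
    for l in range(1, len(taus)):
        row = np.zeros(len(idx))
        for (j, rr, ll), k in idx.items():
            if rr == r and ll == l:
                row[k] = p(j, r)
        A_ub.append(row); b_ub.append(taus[l])
for i in tasks:                # budget constraint (4.14); user i owns task i
    row = np.zeros(len(idx))
    for (j, r, l), k in idx.items():
        if j == i:
            row[k] = b(j, r)
    A_ub.append(row); b_ub.append(B[i])

res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
print(round(res.fun, 3))       # -> 5.0
```

Here user 1's tight budget excludes the cloud via (4.16), forcing its task local, while user 0's task can finish in the first interval.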
4.2.2 Rounded Solution
In [25], the authors used a method proposed in [66] for solving the generalized assignment
problem, and obtained an integer solution for their problem at hand. However, our formulation
1In the model of unrelated machines, the processing times of a task on any two machines are independent.
renders a further constrained version of the generalized assignment problem due to the budget
constraints. We therefore extend the method proposed in [66] to obtain a rounded solution to
problem (4.1). We study the behavior of this solution and later employ techniques to improve
this solution for our problem. We also provide worst-case performance and incurred cost guar-
antees. Additionally, we study the behavior of the average incurred cost as the number of users
increases in the system.
Rounding Technique
We first convert the LP solution $x_{jrl}$ to $x_{jr'}$, where each machine-interval pair $(r, l)$ is viewed as a single virtual processor $r' \in R'$, such that $R'$ is the set of machine-interval processors. This facilitates the application of the rounding method proposed in [66] to our problem.

The rounding technique lists the tasks in non-increasing order of $p_{jr'}$, for $r' \in R'$, and constructs a bipartite fractional matching. A fractional matching between task nodes and machine nodes assigns each task node partially to multiple machine nodes, and all allocated fractions for a particular task node must sum to 1. Let $f(v_{r's}, u_j)$ denote the fractional matching between task nodes $u_j$, for $j \in J_i$, $i \in \{1, \dots, N\}$, and machine nodes $v_{r's}$, for $r' \in R'$, $s \in \{1, \dots, k_{r'}\}$, where $k_{r'} = \lceil \sum_j x_{jr'} \rceil$. This is constructed in accordance with the following:

$$x_{jr'} = \sum_{s : (v_{r's}, u_j) \in E} f(v_{r's}, u_j), \quad \forall r' \in R',\ j \in J_i,\ i \in \{1, \dots, N\}, \tag{4.18}$$

$$\sum_{j : (v_{r's}, u_j) \in E} f(v_{r's}, u_j) = 1, \quad \forall r' \in R',\ s \in \{1, \dots, k_{r'} - 1\}, \tag{4.19}$$
where E is the set of edges of the bipartite graph. This fractional matching is then converted
to a minimum cost integer matching where each task is assigned to a single machine node.
For our problem, the matching cost is the contribution to the weighted sum completion time, so this yields a minimum weighted sum completion time integer matching. We call this integer solution x = {x_jrl}. This integer matching solution, however, is likely to violate the interval deadline constraints ($\tau_l$) as well as the user budget constraints, since the relaxed solution that meets these constraints has been rounded. We now analyze the extent to which these constraints can be violated, and the resulting performance guarantee.
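The final matching step can be illustrated with a toy stand-in (brute force over permutations in place of a cubic-time Hungarian implementation, purely to show what the matching computes; the cost matrix is hypothetical):

```python
# Toy stand-in for the minimum-cost integer matching between task nodes and
# machine nodes v_{r's}: brute force over permutations (stdlib only). Entries
# are illustrative objective contributions of placing each task on each node.
from itertools import permutations

cost = [
    [1.0, 2.0, 4.0],   # task 0 on machine nodes 0, 1, 2
    [2.0, 4.0, 8.0],   # task 1
    [1.5, 3.0, 6.0],   # task 2
]

def min_cost_matching(cost):
    """Assign each task to a distinct machine node, minimizing total cost.
    (The Hungarian algorithm does this in cubic time; brute force suffices
    for a toy instance.)"""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[j][perm[j]] for j in range(n)))
    return list(best), sum(cost[j][best[j]] for j in range(n))

assignment, total = min_cost_matching(cost)
print(assignment, total)   # -> [2, 0, 1] 9.0
```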
Interval Deadline Violation and Performance Guarantee
Lemma 1. With the rounded solution, the total processing time of all tasks on every machine-interval processor $r' = (r, l) \in R'$, for $l \in \{1, \dots, L\}$, is at most $2\tau_l$, i.e., constraint (4.13) is violated by at most $\tau_l$.
Proof. For each machine node $v_{r's}$, let the maximum possible processing time be

$$p^{\max}_{r's} = \max_{j : (v_{r's}, u_j) \in E} p_{jr'}, \tag{4.20}$$

and the minimum possible processing time be

$$p^{\min}_{r's} = \min_{j : (v_{r's}, u_j) \in E} p_{jr'}. \tag{4.21}$$

Consequently, $p^{\min}_{r's} \ge p^{\max}_{r'(s+1)}$, since tasks are allocated in non-increasing order of $p_{jr'}$ while constructing the fractional bipartite matching. Along the lines of the proof in [66], we have, for each $r' \in R'$,

$$\sum_{s=2}^{k_{r'}} p^{\max}_{r's} \le \sum_{s=1}^{k_{r'}-1} p^{\min}_{r's} \tag{4.22}$$
$$\le \sum_{s=1}^{k_{r'}-1} \sum_{j : (v_{r's}, u_j) \in E} p_{jr'} f(v_{r's}, u_j) \tag{4.23}$$
$$\le \sum_{s=1}^{k_{r'}} \sum_{j : (v_{r's}, u_j) \in E} p_{jr'} f(v_{r's}, u_j) \tag{4.24}$$
$$= \sum_{i=1}^{N} \sum_{j \in J_i} p_{jr'} x_{jr'} \le \tau_l. \tag{4.25}$$

Furthermore, $p^{\max}_{r'1} \le \tau_l$ for all $r'$, from (4.15). Hence, we have

$$\sum_{i=1}^{N} \sum_{j \in J_i} p_{jr} x_{jrl} = \sum_{i=1}^{N} \sum_{j \in J_i} p_{jr'} x_{jr'} \tag{4.26}$$
$$\le \sum_{s=1}^{k_{r'}} p^{\max}_{r's} \le 2\tau_l, \quad \forall r \in R,\ l \in \{1, \dots, L\}. \tag{4.27}$$
We derive an approximation ratio for the integer matching x which is presented in the
following theorem.
Theorem 1. The objective value of the rounded solution obtained from the integer matching x
cannot be worse than 8 times the optimal objective of problem (4.1).
Proof. We define new intervals $\bar\tau_l := 2^{l+1}$, for all $l \in \{1, \dots, L\}$. From Lemma 1, we can see that every task $j$ that was scheduled in the $l$th interval will be completed by time $\bar\tau_l$. This is because $\bar\tau_l - \bar\tau_{l-1} = 2\tau_l$, and from Lemma 1, we know that the total processing time for the tasks assigned to the $l$th interval does not exceed $2\tau_l$. Let the contribution to the objective by task $j$ be $O^{\text{relax}}_j$ in the relaxed solution and $O^{\text{round}}_j$ in the rounded solution. If task $j$ is scheduled to complete in interval $l$, we have

$$O^{\text{relax}}_j = w_j \tau_{l-1} = w_j 2^{l-2}. \tag{4.28, 4.29}$$

Similarly, for the rounded solution, we have

$$O^{\text{round}}_j \le w_j \bar\tau_l \le w_j 2^{l+1} = w_j 2^{l-2} \cdot 2^3 \le 8\, O^{\text{relax}}_j. \tag{4.30-4.33}$$

This implies that

$$\sum_{i=1}^{N} \sum_{j \in J_i} O^{\text{round}}_j \le \sum_{i=1}^{N} \sum_{j \in J_i} 8\, O^{\text{relax}}_j. \tag{4.34}$$
We see that the rounded objective value is at most 8 times the relaxed solution, and hence,
at most 8 times the optimal objective of problem (4.1) since the relaxed solution by definition
returns an objective value that is below the optimal objective.
Multiple Users and Incurred Cost Guarantees
Since our problem also accounts for multiple users and budget constraints, we wish to evaluate
the performance of this rounded solution with respect to these parameters.
Theorem 2. With the rounded solution, the sum of the incurred cost of all users cannot be
worse than (|R′|+ 1) times the sum of user budgets.
Proof. Let $b^{\max}_{r's}$ and $b^{\min}_{r's}$ be the maximum and minimum possible costs at machine node $(r', s)$, respectively. For each processor $r'$, we have $p^{\min}_{r's} \ge p^{\max}_{r'(s+1)}$, as explained in Lemma 1. Consequently, we have $b^{\min}_{r's} \ge b^{\max}_{r'(s+1)}$ for our model from (4.9) and (4.10). Then, we have

$$\sum_{i=1}^{N} \sum_{r' \in R'} \sum_{s=2}^{k_{r'}} b^{\max}_{r's} \le \sum_{i=1}^{N} \sum_{r' \in R'} \sum_{s=1}^{k_{r'}-1} b^{\min}_{r's} \tag{4.35}$$
$$\le \sum_{i=1}^{N} \sum_{r' \in R'} \sum_{s=1}^{k_{r'}-1} \sum_{j : (v_{r's}, u_j) \in E} b_{jr'} f(v_{r's}, u_j) \tag{4.36}$$
$$\le \sum_{i=1}^{N} \sum_{r' \in R'} \sum_{s=1}^{k_{r'}} \sum_{j : (v_{r's}, u_j) \in E} b_{jr'} f(v_{r's}, u_j) \tag{4.37}$$
$$= \sum_{i=1}^{N} \sum_{r' \in R'} \sum_{j \in J_i} b_{jr'} x_{jr'} \le \sum_{i=1}^{N} B_i. \tag{4.38}$$
Hence, if we take out the tasks allocated to machine nodes vr′1 for every r′ ∈ R′, the remaining
tasks have a sum cost that is less than the sum of user budgets. There are at most |R′| such
tasks. Furthermore, for each r′ ∈ R′ and j such that (vr′1,uj) ∈ E , we know from (4.16) that
$$b_{jr'} \le b^{\max}_{r'1} \le \sum_{i=1}^{N} B_i. \tag{4.39}$$
Hence, we have

$$\sum_{i=1}^{N} \sum_{j \in J_i} \sum_{r \in R} \sum_{l=1}^{L} b_{jr} x_{jrl} = \sum_{i=1}^{N} \sum_{j \in J_i} \sum_{r' \in R'} b_{jr'} x_{jr'} \tag{4.40}$$
$$\le \sum_{i=1}^{N} \sum_{r' \in R'} \sum_{s=1}^{k_{r'}} b^{\max}_{r's} \le (|R'| + 1) \sum_{i=1}^{N} B_i. \tag{4.41}$$
Remark 2. We see from the above that at most one task on each machine-interval processor violates the sum of the user budgets. Consequently, we can find a subset of at most $|R'|$ tasks that violate the sum of the user budgets.
The following conclusions follow directly from Theorem 2.
Corollary 1. If $b_{jr}$ is independent of task $j$, let $b_r = b_{jr}$. We further define $S = \{r \in R' : \exists j,\ x_{jr} = 1\}$. Then, we have

$$\sum_{i=1}^{N} \sum_{j \in J_i} \sum_{r \in R} \sum_{l=1}^{L} b_r x_{jrl} \le \sum_{i=1}^{N} B_i + \sum_{r \in S} b_r. \tag{4.42}$$
Corollary 2. If $C_i$ is the incurred cost for user $i$, then

$$\frac{1}{N} \sum_{i=1}^{N} C_i \le \frac{1}{N} \sum_{i=1}^{N} B_i + \frac{1}{N} |R'| B^{\max}, \tag{4.43}$$

where $B^{\max} = \max_i B_i$, and for the specific case from Corollary 1,

$$\frac{1}{N} \sum_{i=1}^{N} C_i \le \frac{1}{N} \sum_{i=1}^{N} B_i + \frac{1}{N} \sum_{r \in S} b_r. \tag{4.44}$$
If we increase the number of users N , the total processing time increases, and consequently,
the number of intervals L increases. But we note that since the interval size increases expo-
nentially, the number of intervals L only increases logarithmically. Additionally, the number
of processors |R| is fixed. This implies that |R′| = L|R| increases more slowly in comparison
with N . Hence, we can see that as N → ∞, the second term on the right-hand side of (4.43)
approaches zero, leading to Corollary 3.
Corollary 3. As N →∞, the average cost incurred across all users meets the average budget.
Thus, the average user cost performance improves as the number of users in the system
increases. This property indicates that the proposed algorithm is highly scalable and is a
suitable choice for multi-user systems.
4.2.3 Dealing with Budget Violation
Even if the budget constraints are met on average, the budget constraints for each individual
user could still be violated. In cases where the users expect strict budget constraints, we need to
identify a technique by which this rounded solution can be modified to ensure that each user’s
budget is met, while not significantly affecting the weighted sum completion time. Since there
is no budget constraint on executing tasks on a user’s local device, we propose the following
technique to move certain tasks to the local device in the event of a budget violation:
1. Check whether the budget is violated for user i.
2. If so, sort all of its offloaded tasks, $\{j \in J_i \mid x_{jrl} = 1 \text{ for some } r \in C\}$, in non-decreasing order of $w_j t_j$. We do this because we expect a task with a smaller weight and a smaller local processing time to do less damage to the weighted sum completion time objective when transferred to the local device.
3. Start with the first task (with the least $w_j t_j$) and schedule it on the local device. Update the incurred cost of user i by subtracting the previously incurred cost of this task.
4. If the incurred cost now meets the budget, stop. If not, repeat Step 3 with the next task in the sorted order until the budget is met.
5. Repeat for all users.
6. Once every user meets its budget, we apply our modified WSPT (presented in Section 4.2.4) on all processors.
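The repair loop above can be sketched for one user as follows (an illustrative structure with hypothetical names, not the thesis implementation):

```python
# Sketch of the budget-resolution step for a single user: repeatedly move the
# offloaded task with the smallest w_j * t_j back to the local device until
# the incurred cloud cost meets the budget.

def resolve_budget(offloaded, budget):
    """offloaded: list of (w_j, t_j, cloud_cost_j) for one user's cloud tasks.
    Returns (tasks kept on the cloud, tasks moved to the local device)."""
    kept = sorted(offloaded, key=lambda job: job[0] * job[1])  # by w_j * t_j
    moved = []
    cost = sum(job[2] for job in kept)
    while cost > budget and kept:
        job = kept.pop(0)      # least w_j * t_j does the least damage
        moved.append(job)
        cost -= job[2]         # refund this task's cloud cost
    return kept, moved

kept, moved = resolve_budget([(1.0, 2.0, 3.0), (2.0, 3.0, 4.0)], budget=4.0)
print(len(moved))              # -> 1: moving one task restores the budget
```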
Now we wish to understand the impact on performance of moving tasks to the local device in order to meet the budget.
Theorem 3. The objective value of the final solution is at most $2^{\lceil \log_2(2 + \frac{1}{a}) \rceil + 2}$ times the optimal solution, where $a = \min_{i,r} \alpha_{ir}$ is the minimum speed-up factor in the system.
Proof. We know, from Remark 2, that for every interval $l$, at most one task from every cloud processor needs to be moved to the local device, and this task has a maximum processing time of $\tau_l$. We also know, from Lemma 1, that the total processing time on a local processor for the tasks assigned to the $l$th interval does not exceed $2\tau_l$. Furthermore, from (4.15), the processing time of a task scheduled to finish in interval $l$ cannot exceed $\tau_l$. Thus, after moving a task belonging to user $i$ from cloud processor $r$ to user $i$'s local processor, the total processing time on the local processor will be $(2 + \frac{1}{\alpha_{ir}})\tau_l$ in the worst case. Since this task that we move back may belong to any user, this value will be at most $(2 + \frac{1}{a})\tau_l$, where $a = \min_{i,r} \alpha_{ir}$ is the minimum speed-up factor. In other words, we now have

$$\sum_{i=1}^{N} \sum_{j \in J_i} p_{jr} x_{jrl} \le \left(2 + \frac{1}{a}\right)\tau_l, \quad \forall r \in R,\ l \in \{1, \dots, L\}. \tag{4.45}$$

We need to redefine the $\bar\tau_l$ defined in Theorem 1 such that every task that is assigned to the $l$th interval may be run entirely within the interval $(\bar\tau_{l-1}, \bar\tau_l)$. In other words, we need

$$\left(2 + \frac{1}{a}\right)\tau_l \le \bar\tau_l - \bar\tau_{l-1}. \tag{4.46}$$

Towards this end, we set $x = \lceil \log_2(2 + \frac{1}{a}) \rceil + 1$ and $\bar\tau_l = 2^x \tau_l = 2^x 2^{l-1} = 2^{x+l-1}$.

We now get, for every task $j$,

$$\frac{O^{\text{round}}_j}{O^{\text{relax}}_j} \le \frac{\bar\tau_l}{\tau_{l-1}} \le 2^{x+1} \le 2^{\lceil \log_2(2 + \frac{1}{a}) \rceil + 2}. \tag{4.47}$$

Thus, the objective value of the final solution is at most $2^{\lceil \log_2(2 + \frac{1}{a}) \rceil + 2}$ times the optimal solution.
4.2.4 WSPT Ordering
From the above, we obtain a scheduling decision for every task that specifies on which processor
the task should be executed. Some processors will be assigned multiple tasks. We know that
the WSPT ordering is optimal for the weighted sum completion time objective for a single
processor and jobs without release times [76]. Thus, we perform a WSPT ordering on the tasks
allocated to a particular processor to further improve our objective value as follows:
1. Step 1: Obtain the task scheduling decision, i.e., the processor on which each task should be scheduled.
2. Step 2: On each processor $r \in R$, order the scheduled tasks in non-decreasing order of $p_{jr}/w_j$. This ensures that tasks with smaller weights and longer processing times (without accounting for wait times) are scheduled later.
3. Step 3: Modify the task completion times correspondingly, and obtain the new objective value.
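The WSPT reordering on one processor can be sketched as follows (an illustrative helper; jobs are $(p_{jr}, w_j)$ pairs):

```python
# Sketch of WSPT on a single processor: sort by p_jr / w_j, then recompute
# completion times and the weighted sum completion time.

def wspt_order(jobs):
    """jobs: list of (p_jr, w_j). Returns (ordered jobs, weighted sum of
    completion times) under the WSPT ordering."""
    ordered = sorted(jobs, key=lambda job: job[0] / job[1])
    finish, total = 0.0, 0.0
    for p, w in ordered:
        finish += p            # completion time of this job
        total += w * finish    # its weighted contribution
    return ordered, total

# The heavy, short job (1.0, 2.0) runs first; total drops from 11.0 to 6.0
ordered, total = wspt_order([(3.0, 1.0), (1.0, 2.0)])
print(ordered, total)          # -> [(1.0, 2.0), (3.0, 1.0)] 6.0
```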
4.2.5 Feasibility and Complexity Analysis
It can be readily noted that the STUBR algorithm provides a feasible solution. In other words,
the user budgets are always met, and all the tasks are always scheduled. Thus, in the worst
case with extremely tight budgets, the algorithm will execute all tasks locally.
The time complexity of STUBR is dominated by the LP-solving step (in Section 4.2.1) and the rounding step (in Section 4.2.2), which involves finding the weighted sum completion time bipartite matching. An LP can be solved in $O(n^{3.5})$ time, where $n$ is the number of variables [77]. For our problem, this implies that the time complexity for solving the LP is $O((P|R|L)^{3.5})$, where $P = \sum_{i=1}^{N} |J_i|$ is the total number of tasks. On the other hand, bipartite matching can be solved in time cubic in the number of vertices by utilizing the Hungarian algorithm, proposed in [78]. If $P > |R|$, the time complexity of this step is $O(P^3)$. Thus, we see that the overall worst-case time complexity of STUBR is $O((P^2 L)^{3.5})$.
4.3 STUBR Extensions
In this section, we consider the models with release times, fixed communication times, and
sequence-dependent communication times, introduced in Section 4.1.1.
4.3.1 With Task Release Times
STUBR can also be applied to solve the problem of scheduling tasks with release times. We
reformulate problem (4.11) to incorporate release times tRj for every task j as follows.
$$\min_{\{x_{jrl}\}} \sum_{i=1}^{N} \sum_{j \in J_i} w_j \sum_{r \in R_i} \sum_{l=1}^{L} \tau_{l-1} x_{jrl}, \tag{4.48}$$
s.t.
$$\sum_{l=1}^{L} \sum_{r \in R_i} x_{jrl} = 1, \quad \forall i \in \{1, \dots, N\},\ j \in J_i, \tag{4.49}$$
$$\sum_{i=1}^{N} \sum_{j \in J_i} p_{jr} x_{jrl} \le \tau_l, \quad \forall r \in R,\ l \in \{1, \dots, L\}, \tag{4.50}$$
$$\sum_{j \in J_i} \sum_{r \in R_i} \sum_{l=1}^{L} b_{jr} x_{jrl} \le B_i, \quad \forall i \in \{1, \dots, N\}, \tag{4.51}$$
$$x_{jrl} = 0, \ \text{if } \tau_l < t^R_j + p_{jr}, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in R,\ l \in \{1, \dots, L\}, \tag{4.52}$$
$$x_{jrl} = 0, \ \text{if } B_i < b_{jr}, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in R,\ l \in \{1, \dots, L\}, \tag{4.53}$$
$$x_{jrl} \ge 0, \quad \forall i \in \{1, \dots, N\},\ j \in J_i,\ r \in R,\ l \in \{1, \dots, L\}. \tag{4.54}$$
On applying the same rounding method proposed in Section 4.2.2, we can easily see that
Lemma 1 is satisfied for this case as well. Additionally, we can also extend the results in
Theorem 1 to prove the following.
Theorem 4. The objective value of the rounded solution obtained from the integer matching x
cannot be worse than 8 times the optimal objective of problem (4.48).
Proof. Every task that is assigned to the $l$th interval may be run entirely within the interval $(\bar\tau_{l-1}, \bar\tau_l)$, where $\bar\tau_l := 2^{l+1}$. This is because $\bar\tau_l - \bar\tau_{l-1} = 2\tau_l$, and from Lemma 1, we know that the total processing time for the tasks assigned to the $l$th interval does not exceed $2\tau_l$. Additionally, every task $j$ that is assigned to the $l$th interval will have been released by $\bar\tau_{l-1}$, because $\bar\tau_{l-1} > \tau_l \ge t^R_j + p_{jr} \ge t^R_j$. Thus, similar to Theorem 1, we see that the rounded objective value is at most 8 times that of the relaxed solution, and hence, at most 8 times the optimal objective of problem (4.48), since the relaxed solution by definition returns an objective value that is below the optimal objective.
We can also see that Theorem 2 and the corresponding corollaries are satisfied for this case. We apply the budget resolution technique proposed in Section 4.2.3 and can easily see that Theorem 3 can be proved for this case as well. Additionally, we apply a modified WSPT, similar to that in Section 4.2.4, which we call m-WSPT ordering, to accommodate task release times. We do this by scheduling tasks in the non-decreasing order of $\frac{t_j^R + p_{jr}}{w_j}$ in Step 2.
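As a concrete illustration, the m-WSPT ordering rule can be sketched in a few lines (Python here for brevity; the field names are hypothetical, not from the thesis):

```python
# Hedged sketch of the m-WSPT ordering rule: tasks are sorted in
# non-decreasing order of (release time + processing time) / weight.
def m_wspt_order(tasks):
    """tasks: list of dicts with keys 'release', 'proc', 'weight'."""
    return sorted(tasks, key=lambda t: (t["release"] + t["proc"]) / t["weight"])

tasks = [
    {"id": "a", "release": 0.0, "proc": 4.0, "weight": 2.0},  # ratio 2.0
    {"id": "b", "release": 1.0, "proc": 1.0, "weight": 2.0},  # ratio 1.0
    {"id": "c", "release": 0.0, "proc": 3.0, "weight": 1.0},  # ratio 3.0
]
order = [t["id"] for t in m_wspt_order(tasks)]
print(order)  # ['b', 'a', 'c']
```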
4.3.2 With Fixed Communication Times
We can further extend the solution in Section 4.3.1 to the case where every task $j$ has release time $t_j^R$ and communication time $c_j$. This is equivalent to defining a new per-processor release time $t_{jr}^R$, the release time for scheduling task $j$ on processor $r$, as follows:

$t_{jr}^R := \begin{cases} t_j^R + c_j & \text{if } r \in C, \\ t_j^R & \text{otherwise}. \end{cases}$ (4.55)
Thus, the new version of problem (4.11) becomes

$\min_{\{x_{jrl}\}} \ \sum_{i=1}^{N} \sum_{j \in J_i} w_j \sum_{r \in R_i} \sum_{l=1}^{L} \tau_{l-1}\, x_{jrl},$ (4.56)
s.t. $\sum_{l=1}^{L} \sum_{r \in R_i} x_{jrl} = 1, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,$ (4.57)
$\sum_{i=1}^{N} \sum_{j \in J_i} p_{jr} x_{jrl} \le \tau_l, \quad \forall r \in R,\ l \in \{1,\dots,L\},$ (4.58)
$\sum_{j \in J_i} \sum_{r \in R_i} \sum_{l=1}^{L} b_{jr} x_{jrl} \le B_i, \quad \forall i \in \{1,\dots,N\},$ (4.59)
$x_{jrl} = 0, \ \text{if } \tau_l < t_{jr}^R + p_{jr}, \quad \forall i \in \{1,\dots,N\},\ r \in R,\ l \in \{1,\dots,L\},$ (4.60)
$x_{jrl} = 0, \ \text{if } B_i < b_{jr}, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,\ r \in R,\ l \in \{1,\dots,L\},$ (4.61)
$x_{jrl} \ge 0, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,\ r \in R,\ l \in \{1,\dots,L\}.$ (4.62)
We can see that the fixed communication times are incorporated in constraint (4.60). On
applying the same rounding method proposed in Section 4.2.2, we can easily see that Lemma
1 is satisfied for this case as well. Additionally, we can also extend the results in Theorem 4 to
prove the following.
Theorem 5. The objective value of the rounded solution obtained from the integer matching x
cannot be worse than 8 times the optimal objective of problem (4.56).
Proof. The proof is similar to the proof of Theorem 4, except that we note that every task $j$ that is assigned to the $l$th interval, i.e., $(\bar{\tau}_{l-1}, \bar{\tau}_l)$, will have been released by $\bar{\tau}_{l-1}$ because $\bar{\tau}_{l-1} = 2\tau_l > \tau_l \ge t_{jr}^R + p_{jr} > t_{jr}^R$.
We see that Theorem 2, the corresponding corollaries, as well as Theorem 3 can be proved for this case as well. Additionally, we can accommodate both task release times and communication times by scheduling tasks in the non-decreasing order of $\frac{t_{jr}^R + p_{jr}}{w_j}$ in Step 2 of m-WSPT proposed in Section 4.2.4.
4.3.3 With Sequence-dependent Communication Times
Modified Channel Model and Problem Formulation
In a more practical model of a communication channel with finite channel capacity, the input
data is communicated to the scheduled processor one task at a time. To extend STUBR to this
more complicated scenario, we introduce the following new decision variables:
$x_{jrpl} := \begin{cases} 1 & \text{if task } j \text{ is communicated in interval } p \text{ to processor } r \text{ and executed in interval } l, \\ 0 & \text{otherwise}. \end{cases}$ (4.63)
We define the communication time for a task $j$ on a processor $r$ as

$c_{jr} := \begin{cases} c_j & \text{if } r \in C, \\ 0 & \text{otherwise}. \end{cases}$ (4.64)
It should be noted that, under this channel model, the release time of a task $j$ at the local device is $t_j^R$, but the release time of the task at a cloud processor is now determined by the sequence in which tasks are communicated to this processor.
Relaxed Solution
Incorporating (4.63), (4.64), and the consideration of sequence-dependent communication times
into the first step of STUBR, we first solve the following optimization problem to obtain an
LP-relaxed solution.
$\min_{\{x_{jrpl}\}} \ \sum_{i=1}^{N} \sum_{j \in J_i} w_j \sum_{r \in R_i} \sum_{l=1}^{L} \tau_{l-1} \sum_{p=1}^{L} x_{jrpl},$ (4.65)
s.t. $\sum_{l=1}^{L} \sum_{r \in R_i} \sum_{p=1}^{l} x_{jrpl} = 1, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,$ (4.66)
$\sum_{i=1}^{N} \sum_{j \in J_i} \sum_{p=1}^{l} p_{jr} x_{jrpl} \le \tau_l, \quad \forall r \in R,\ l \in \{1,\dots,L\},$ (4.67)
$\sum_{i=1}^{N} \sum_{j \in J_i} \sum_{l=p}^{L} c_j x_{jrpl} \le \tau_p, \quad \forall r \in C,\ p \in \{1,\dots,L\},$ (4.68)
$\sum_{j \in J_i} \sum_{r \in R_i} \sum_{l=1}^{L} \sum_{p=1}^{L} b_{jr} x_{jrpl} \le B_i, \quad \forall i \in \{1,\dots,N\},$ (4.69)
$\sum_{l=1}^{L} x_{jrpl} = 0, \ \text{if } \tau_p < t_j^R + c_j, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,\ r \in C,\ p \in \{1,\dots,L\},$ (4.70)
$\sum_{p=1}^{L} x_{jrpl} = 0, \ \text{if } \tau_l < t_j^R + p_{jr}, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,\ r \notin C,\ l \in \{1,\dots,L\},$ (4.71)
$x_{jrpl} = 0, \ \text{if } \tau_l < \tau_{p-1} + p_{jr} \text{ and } \tau_p > t_j^R + c_j, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,\ r \in C,\ p \in \{1,\dots,L\},\ l \in \{1,\dots,L\},$ (4.72)
$x_{jrpl} = 0, \ \text{if } B_i < b_{jr}, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,\ r \in R,\ l \in \{1,\dots,L\},\ p \in \{1,\dots,L\},$ (4.73)
$x_{jrpl} \ge 0, \quad \forall i \in \{1,\dots,N\},\ j \in J_i,\ r \in R,\ p, l \in \{1,\dots,L\}.$ (4.74)
Constraint (4.68) enforces that, for each interval $p$, the total load on the channel cannot exceed $\tau_p$. Constraints (4.70) and (4.71) ensure that individual tasks do not exceed $\tau_p$ and $\tau_l$, respectively. Constraint (4.72) ensures that a task cannot be communicated in interval $p$ and executed in interval $l$ if, even though the task can be communicated by $\tau_p$, it cannot complete execution by $\tau_l$.
Rounded Solution
We convert the LP solution $x_{jrpl}, \forall j, r, p, l$, to

$y_{jr'} = \sum_{p=1}^{L} x_{jrpl},$ (4.75)

where each $(r, l)$ pair is viewed as a single virtual processor $r'$, and

$z_{jr} = \sum_{l=1}^{L} x_{jrpl},$ (4.76)

where each $(r, p)$ pair is viewed as a single virtual processor. We then apply the rounding technique proposed in Section 4.2.2 to both $y_{jr'}$ and $z_{jr}$ separately.
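The two marginalizations can be sketched as follows (an illustrative NumPy snippet, not the thesis implementation; the array shapes and the random test data are assumptions):

```python
import numpy as np

# Given an LP solution x[j, r, p, l], form the two marginals used for
# rounding: y sums over communication intervals p (each (r, l) pair acts
# as a virtual processor), z sums over execution intervals l.
rng = np.random.default_rng(0)
J, R, L = 3, 2, 4
x = rng.random((J, R, L, L))                # x[j, r, p, l]
x /= x.sum(axis=(1, 2, 3), keepdims=True)   # each task's mass sums to 1

y = x.sum(axis=2)   # y[j, r, l]: marginal over communication intervals
z = x.sum(axis=3)   # z[j, r, p]: marginal over execution intervals

# Both marginals preserve each task's total assignment mass.
print(np.allclose(y.sum(axis=(1, 2)), 1.0),
      np.allclose(z.sum(axis=(1, 2)), 1.0))  # → True True
```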
Lemma 2. With the rounded solution, the total processing time of all tasks for every $r' \in R'$ and interval $l \in \{1,\dots,L\}$ cannot be worse than $2\tau_l$, i.e., constraint (4.67) is violated by at most $\tau_l$.

Proof. From inequality (4.24) of Lemma 1 and using constraint (4.67), we can see that for each $r' \in R'$,

$\sum_{s=2}^{k_{r'}} p_{r's}^{\max} \le \sum_{i=1}^{N} \sum_{j \in J_i} p_{jr'}\, y_{jr'}$ (4.77)
$\le \tau_l.$ (4.78)

Furthermore, $p_{r'1}^{\max} \le \tau_l, \forall r'$, from (4.71), (4.72), and (4.74). Hence, we have

$\sum_{i=1}^{N} \sum_{j \in J_i} \sum_{p=1}^{L} p_{jr}\, x_{jrpl} \le 2\tau_l, \quad \forall r \in R,\ l \in \{1,\dots,L\}.$ (4.79)
Lemma 3. With the rounded solution, the total communication time of all tasks for every $r \in R$ and interval $p \in \{1,\dots,L\}$ cannot be worse than $2\tau_p$, i.e., constraint (4.68) is violated by at most $\tau_p$.

Proof. For each machine node $v_{rs}$, let the maximum possible communication time be

$c_{rs}^{\max} = \max_{j:(v_{rs},u_j)\in E} c_j,$ (4.80)

and the minimum possible communication time be

$c_{rs}^{\min} = \min_{j:(v_{rs},u_j)\in E} c_j.$ (4.81)

From inequality (4.24) of Lemma 1 and using constraint (4.68), we can see that for each $r \in R$,

$\sum_{s=2}^{k_r} c_{rs}^{\max} \le \sum_{i=1}^{N} \sum_{j \in J_i} c_{jr}\, z_{jr}$ (4.82)
$\le \tau_p.$ (4.83)

Furthermore, $c_{r1}^{\max} \le \tau_p, \forall r$, from (4.70) and (4.74). Hence, we have

$\sum_{i=1}^{N} \sum_{j \in J_i} \sum_{l=1}^{L} c_j\, x_{jrpl} \le 2\tau_p, \quad \forall r \in R,\ p \in \{1,\dots,L\}.$ (4.84)
Theorem 6. The objective value of the rounded solution obtained from the integer matching $x$ cannot be worse than 32 times the optimal objective of problem (4.65).

Proof. We define new communication intervals $\bar{\tau}_p := 2^{p+1} = 4\tau_p$, similar to Theorem 1. We can easily show using Lemma 3 that every task $j$ assigned to finish communication in the $p$th interval may be communicated entirely within the interval $(\bar{\tau}_{p-1}, \bar{\tau}_p)$. Thus, the actual release time of these tasks at the cloud processors is $\bar{\tau}_p$.

We also define new execution intervals $\hat{\tau}_l := 2^{l+3}$. We can easily show using Lemma 2 that every task $j$ assigned to finish execution in the $l$th interval may be run entirely within the interval $(\hat{\tau}_{l-1}, \hat{\tau}_l)$. Every task $j$ that is assigned to the $l$th interval will have been released by $\hat{\tau}_{l-1}$ because

$\hat{\tau}_{l-1} = 8\tau_l \ge 8\tau_{p-1} + 8 p_{jr} > 8\tau_{p-1} = 4\tau_p = \bar{\tau}_p.$ (4.85)

If task $j$ is scheduled to complete in interval $l$, we have

$O_j^{\text{relax}} = w_j 2^{l-2}.$ (4.86)

Similarly, for the rounded solution, we have

$O_j^{\text{round}} \le w_j 2^{l+3}$ (4.87)
$= w_j 2^{l-2} \cdot 2^5$ (4.88)
$\le 32\, O_j^{\text{relax}}.$ (4.89)

This implies that

$\sum_{i=1}^{N} \sum_{j \in J_i} O_j^{\text{round}} \le \sum_{i=1}^{N} \sum_{j \in J_i} 32\, O_j^{\text{relax}}.$ (4.90)

Thus, we see that the rounded objective value is at most 32 times the relaxed solution, and hence, at most 32 times the optimal objective of problem (4.65), since the relaxed solution by definition returns an objective value that is no greater than the optimal objective.
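The powers-of-two arithmetic behind the factor of 32 can be checked mechanically (a sketch assuming base intervals $\tau_l = 2^{l-1}$, relaxed cost $w_j 2^{l-2}$, and rounded cost at most $w_j 2^{l+3}$, consistent with (4.86)-(4.88)):

```python
# Sanity check of the Theorem 6 arithmetic: the ratio of the rounded
# bound 2**(l+3) to the relaxed value 2**(l-2) is 2**5 = 32 for every l.
for l in range(2, 10):
    relaxed = 2.0 ** (l - 2)
    rounded = 2.0 ** (l + 3)
    assert rounded / relaxed == 32.0
print("ratio is 32 for all intervals checked")
```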
Theorem 2 and the corresponding corollaries on budget violation can be trivially extended
for this case since the costs are only dependent on task processing times.
Dealing with Budget Violation
We apply the same budget resolution technique proposed in Section 4.2.3 in order to ensure
that individual user budgets are met. Consequently, along the lines of Theorem 3, we can prove
the following.
Theorem 7. The objective value of the final solution is at most $2^{\lceil \log_2(2 + \frac{1}{a}) \rceil + 3}$ times the optimal solution, where $a = \min_{i,r} \alpha_{ir}$ is the minimum value of the speed-up factor in the system.

Proof. Similar to Theorem 3, we can show that after moving a task belonging to user $i$ back to the local device, the local total processing time will be at most $(2 + \frac{1}{a})\tau_l$, where $a = \min_{i,r} \alpha_{ir}$ is the minimum value of the speed-up factor. In other words, we now have, using (4.67),

$\sum_{i=1}^{N} \sum_{j \in J_i} \sum_{p=1}^{L} p_{jr}\, x_{jrpl} \le \left(2 + \frac{1}{a}\right) \tau_l, \quad \forall r \in R,\ l \in \{1,\dots,L\}.$ (4.91)

We need to redefine the $\hat{\tau}_l$ defined in Theorem 6 such that every task that is assigned to the $l$th interval is available for processing by $\hat{\tau}_{l-1}$ and may be run entirely within the interval $(\hat{\tau}_{l-1}, \hat{\tau}_l)$. Towards this end, we set $x = \lceil \log_2(2 + \frac{1}{a}) \rceil + 1$ and $\hat{\tau}_l = 2^{x+l}$.

We now get, for every task $j$,

$\frac{O_j^{\text{round}}}{O_j^{\text{relax}}} \le \frac{\hat{\tau}_l}{\tau_{l-1}} \le 2^{x+2} = 2^{\lceil \log_2(2 + \frac{1}{a}) \rceil + 3}.$ (4.92)

Thus, the objective value of the final solution is at most $2^{\lceil \log_2(2 + \frac{1}{a}) \rceil + 3}$ times the optimal solution.
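The bound in Theorem 7 is easy to evaluate numerically (a sketch; the speed-up values below are illustrative, not from the thesis):

```python
import math

# Evaluate the Theorem 7 bound 2**(ceil(log2(2 + 1/a)) + 3) for a few
# candidate minimum speed-up factors a = min_{i,r} alpha_{ir}.
def stubr_bound(a):
    return 2 ** (math.ceil(math.log2(2 + 1.0 / a)) + 3)

for a in (1.0, 0.5, 0.2, 0.1):
    print(a, stubr_bound(a))
```

The bound grows slowly with $1/a$: halving the minimum speed-up factor at most doubles the constant.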
Remark 3. Even though the proven worst-case bounds look large, from the trace-driven simula-
tion results in Section 4.4, we see that the performance of STUBR in practice is no worse than
4 times the relaxed solution and consequently the optimal, for all models and cases considered.
We do not apply the modified WSPT ordering here as the release times on processors depend
on the sequence of tasks transmitted, and we cannot trivially extend this technique to improve
performance for this case.
For all of the extensions described in this section, using similar arguments as in Section
4.2.5, it can be verified that the STUBR algorithm always provides a feasible solution and has
worst-case time complexity O((P 2L)3.5).
4.4 Trace-driven Simulation
In addition to the worst-case bounds derived in Sections 4.2 and 4.3, in this section, we inves-
tigate the performance of STUBR, using trace-driven simulation. We study the effect of user
budget and number of tasks on algorithm performance. We evaluate STUBR for the model
with release times and fixed communication times described in Section 4.3.2, and for the model
with sequence-dependent release times and communication times described in Section 4.3.3.
4.4.1 Traces and Parameter Setting
In [22], the authors conducted experiments on four different applications, and provided task
characteristics in terms of input data, computation need, and arrival rates. Additionally, they
consider different mobile devices with varying computational capacities. We use the traces from
this paper in order to test our proposed algorithm as follows:
1. We take the computation need and input data given in [22] as mean, and allow a maximum
of ±50 % variation. In other words, we randomly pick these values from a uniform
distribution in (50% mean, 150% mean).
(a) We calculate the mean local processing time tj of tasks.
(b) We calculate the mean communication time of a task by dividing the input data (in
MBytes) by the available data rate, which is 20 Mbps from [22].
2. We pick the release time values from a uniform distribution in the range $(0, \frac{\text{number of tasks}}{\text{arrival rate (in tasks/sec)}})$.
3. We pick the task weights from a uniform distribution in the range (0,1).
4. We run multiple randomized iterations (for different values of input data and computation)
for each parameter setting, and take the average among them to plot each point on the
graph.
We run our simulation using MATLAB, and utilize the CVX programming package to solve
our linear programs.
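The sampling steps above can be sketched as follows (Python rather than MATLAB; the mean values, the cycles-per-second figure, and the MBytes-to-Mbits conversion are illustrative assumptions, not values from [22]):

```python
import random

random.seed(42)

def sample_task(mean_cycles, mean_data_mb, n_tasks, arrival_rate,
                cycles_per_sec, rate_mbps=20.0):
    """Draw one task following the trace-driven setup: each quantity is
    uniform in (50%, 150%) of its mean; release times are uniform in
    (0, number of tasks / arrival rate); weights are uniform in (0, 1)."""
    cycles = random.uniform(0.5, 1.5) * mean_cycles
    data_mb = random.uniform(0.5, 1.5) * mean_data_mb
    return {
        "local_time": cycles / cycles_per_sec,
        # MBytes -> Mbits, then divide by the 20 Mbps rate (the thesis
        # simply divides data by the rate; the factor 8 is our assumption).
        "comm_time": data_mb * 8.0 / rate_mbps,
        "release": random.uniform(0.0, n_tasks / arrival_rate),
        "weight": random.uniform(0.0, 1.0),
    }

tasks = [sample_task(1e9, 2.0, n_tasks=10, arrival_rate=0.5,
                     cycles_per_sec=2e9) for _ in range(10)]
print(len(tasks))  # 10
```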
4.4.2 Comparison Targets
We use the following targets for comparison with the STUBR algorithm:
• Lower bound : This is the relaxed solution obtained from Section 4.2.1.
• Rounded infeasible: This is the solution obtained from Section 4.2.2, without dealing with
budget violation. We also perform a modified WSPT, proposed in Section 4.2.4, on this
solution. Hence, this solution has an objective value that is a constant times the optimal,
but may violate the user budgets.
• Greedy: All tasks are sorted in the non-decreasing order of weighted local processing time $\frac{t_j}{w_j}$ for all $j$, and each task is scheduled in this order onto the processor where it meets its user's budget and has the fastest processing time.
[Figure 4.2: For chess application on Galaxy S5. (a) Effect of user budget; (b) effect of the number of tasks per user. Plots of weighted sum completion time (s), comparing STUBR, local processing, greedy, comm. sensitive, rounded infeasible, and lower bound.]
[Figure 4.3: For compute intensive application on Nexus 10. (a) Effect of user budget; (b) effect of the number of tasks per user. Plots of weighted sum completion time (s), comparing STUBR, local processing, greedy, comm. sensitive, rounded infeasible, and lower bound.]
• Local processing: All tasks are scheduled locally, and ordered in the non-decreasing order of $\frac{t_j^R + t_j}{w_j}$ for all $j$. This illustrates the benefits of offloading using our algorithm.
• Comm. sensitive: All tasks are sorted in the non-decreasing order of communication time $c_j$ for all $j$, and each task is scheduled in this order onto the processor where it meets its user's budget and has the fastest processing time. This sorting tries to offload the tasks that have shorter communication times, thereby decreasing the overall contribution of communication time to the objective.
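The greedy baseline above can be sketched as follows (an illustrative Python sketch; the data structures, the toy numbers, and the assumption that cost is proportional to processing time via the processor price are ours, not from the thesis):

```python
# Greedy baseline: sort by t_j / w_j, then place each task on the fastest
# affordable processor. A smaller speed-up factor alpha means a faster
# processor (processing time = alpha * t_local).
def greedy_schedule(tasks, procs, budgets):
    """tasks: (user, t_local, weight); procs: (proc_id, alpha, price);
    budgets is mutated in place as tasks consume their user's budget."""
    schedule = {}
    for uid, t_local, w in sorted(tasks, key=lambda t: t[1] / t[2]):
        best = None
        for pid, alpha, price in procs:
            cost = price * alpha * t_local   # pay per unit of processing time
            if cost <= budgets[uid] and (best is None or alpha < best[1]):
                best = (pid, alpha, price)
        if best is not None:
            pid, alpha, price = best
            budgets[uid] -= price * alpha * t_local
            schedule[(uid, t_local, w)] = pid
    return schedule

procs = [("local", 1.0, 0.0), ("fast", 0.2, 2.0), ("slow", 0.5, 0.5)]
tasks = [(0, 4.0, 2.0), (0, 2.0, 1.0)]
sched = greedy_schedule(tasks, procs, {0: 3.0})
print(sched)
```

With a budget of 3, both toy tasks are affordable on the fastest processor, so the greedy rule offloads both; with a tighter budget, later tasks would fall back to slower or local processors.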
4.4.3 For Release Times and Fixed Communication Times
In this section, we evaluate STUBR for the model with release times and fixed communication
times described in Section 4.3.2, for chess and compute-intensive applications presented in [22].
[Figure 4.4: For chess application on Galaxy S5. (a) Effect of user budget; (b) effect of the number of tasks per user. Plots of weighted sum completion time (s), comparing STUBR, local processing, greedy, comm. sensitive, rounded solution, and lower bound.]

In Figures 4.2a and 4.2b, we consider three Galaxy S5 users and chess applications. We consider a five-processor cloud with speed-up factors $\alpha_{i1} = 0.5$, $\alpha_{i2} = \alpha_{i3} = 0.1$, and $\alpha_{i4} = \alpha_{i5} = 0.2$ for every user $i$. We set the processor prices as $\beta_1 = 0.5$, $\beta_2 = \beta_3 = 3$, and $\beta_4 = \beta_5 = 2$. This parameter setting ensures that the users will have to pay a higher price to use a faster processor.
For Figure 4.2a, we consider users with equal budget, and constant number of tasks |J1| = 5,
|J2| = 5, and |J3| = 10. This allows us to study the impact of user budget on the weighted
sum completion time and algorithm performance. We see that as the user budget increases,
the weighted sum completion time decreases as expected. We also see that the STUBR curve
appears to plateau beyond a particular value of budget that is large enough to offload all tasks
to the fastest processors. On the other hand, for tighter values of budget, we see that the
STUBR curve coincides with the local execution curve. Additionally, we also note that the gap
between STUBR and the rounded solution decreases with increasing budget until eventually
the STUBR curve meets the rounded solution curve. This illustrates that the amount of budget
violation decreases with increasing budget, and consequently, the STUBR solution approaches
the rounded solution.
In Figure 4.2b, we observe the impact of the number of tasks per user, for user budgets
B1 = B2 = B3 = 5. The total weighted completion time increases with increasing number
of tasks per user (and total number of tasks) as expected. We see that the performance gap
between STUBR and other schemes increases with increasing number of tasks, indicating that
STUBR is more scalable.
In Figures 4.3a and 4.3b, we consider three Nexus 10 users running compute-intensive
applications (as described in [22]). We consider the same five-processor simulation setup as
that of Figures 4.2a and 4.2b. For Figure 4.3a, we consider constant number of tasks |J1| = 5,
|J2| = 5, and |J3| = 10. For Figure 4.3b, we set B1 = B2 = B3 = 20. We again see that
STUBR provides superior performance and scales well.
[Figure 4.5: For compute intensive application on Nexus 10. (a) Effect of user budget; (b) effect of the number of tasks per user. Plots of weighted sum completion time (s), comparing STUBR, local processing, greedy, comm. sensitive, rounded infeasible, and lower bound.]
4.4.4 For Sequence-dependent Communication Times
We now consider the model with sequence-dependent release times and communication times
described in Section 4.3.3. In Figures 4.4 and 4.5, we use the same simulation setting from
Section 4.4.3 for the modified channel model and STUBR proposed in Section 4.3.3.
Interestingly, we observe that STUBR performs better than even the rounded solution
for some samples. This happens because moving tasks to the local device may significantly
reduce task completion times in some cases because of the reduction of sequence-dependent
release/communication times, particularly when these dominate the processing times. Further-
more, STUBR outperforms all other alternatives for both chess and compute-intensive appli-
cations. In fact, the performance gap between STUBR and other alternatives is even greater
than for the fixed communication case. It is also interesting to note that the communication
sensitive scheme performs better than the greedy scheme for this sequence-dependent commu-
nication model since the communication times now contribute more to the overall objective. In
some cases, we see that the objective of the greedy and communication sensitive schemes may even increase with increasing user budget; these naive schemes let the initial tasks use up all of the budget and do not take release times into account when making scheduling decisions. We again see that STUBR scales well with increasing number of tasks per user.
4.4.5 Runtime Overhead
We note that the sum completion time values that we are dealing with in these figures are of
the order of hundreds to thousands of seconds. This is much greater than the runtime of the
algorithm, and thus, the overhead due to the runtime of the algorithm is still compensated by
the improvements in the completion times due to scheduling and offloading.
4.5 Summary
We have studied a multi-user computational offloading problem, for a system consisting of a
finite-capacity cloud with heterogeneous processors. The offloaded tasks incur monetary cost
for the cloud resource usage and each user has a budget constraint. We have formulated the problem of minimizing the weighted sum completion time subject to the user budget constraints. The proposed STUBR algorithm relaxes, rounds, and resolves budget
violations, and it sorts the tasks to obtain an effective solution. We have also obtained inter-
esting performance bounds for both the underlying rounded solution and the budget-resolved
solution for different release-time and communication-time models. Through simulation us-
ing real-world application traces, we have observed that STUBR is scalable and substantially
outperforms the existing alternatives especially for larger systems.
Chapter 5
Online Scheduling for Profit
Maximization at the Cloud
In this chapter, we study the scheduling of tasks that arrive dynamically at a networked cloud
computing system consisting of heterogeneous processors. Execution of tasks yields some profit
to the cloud service provider. We intend to maximize the total profit across all tasks arriving
within a time interval, subject to processor load constraints, without prior knowledge of the
task arrival times or processing requirements. We propose polynomial-time algorithms that use
a combination of learning and dual-optimization techniques to obtain effective solutions.
The main contributions of this work are as follows:
• We formulate our task scheduling problem and propose the Task Dispatch through Online
Training (TDOT) algorithm. It consists of two broad phases: (1) a training phase where
we observe the processing times of some arriving tasks to obtain information about task
characteristics, and (2) an exploitation phase where we make decisions on future tasks with
the help of the information obtained. We draw inspiration from a relaxed solution to the
offline problem to identify the parameters that bridge TDOT’s training and exploitation
phases. This algorithm assumes that profit can be obtained on a partially-completed task,
if the processor load constraint is met before the task could complete execution.
• We derive performance bounds that quantify TDOT’s effectiveness against the offline
benchmark. For example, for Poisson task arrivals, we present a scenario (below Corollary
4) where TDOT achieves an expected profit that is at least half of the maximum profit
achievable by any offline algorithm.
• We consider an extension where we allow each task to have data requirements in addition
to the computation requirements. We then extend TDOT to deal with both computation
and storage resources.
• We then propose a modified version of TDOT, namely TDOT with Greedy Scheduling
(TDOT-G), for implementation in systems where profit can only be obtained from fully-
completed tasks. We use tasks generated from randomly-generated i.i.d. data as well
as Google cluster traces [79] to investigate the practical performance of the proposed algorithms. We compare them with alternatives such as greedy scheduling, logistic regression, and an offline upper-bound solution. We observe that TDOT and TDOT-G generally outperform all other online alternatives and achieve near-optimal performance over the non-training set of tasks.
The existing works that tackle online problems in cloud computing often make certain
assumptions such as a single server [57,80], purely fluid tasks [57,58,62], homogeneous resources
[59, 60, 81], preemptable tasks [64], or propose heuristic solutions [61, 62]. On the other hand,
certain theoretical works address generic online problems such as assigning items to agents with
budgets [71,82] and scheduling jobs to machines [67,68], providing performance guarantees for
their schemes. However, these works solve a simpler problem [82], address a considerably
different objective [67, 68], or make certain impractical assumptions such as equal-length jobs
[67] or a single-processor system [68]. Some of the techniques we apply are similar to those
proposed in [71], but unlike [71], we (i) have no prior knowledge of the total number of tasks,
(ii) propose an algorithm to obtain feasible task scheduling decisions, (iii) consider a second
resource to accommodate for task data requirements, (iv) propose a modified algorithm for
implementations where profit can only be obtained from fully-completed tasks, and (v) assess
the performance of the proposed schemes through both performance bound analysis and trace-
based simulation.
The rest of this chapter is organized as follows. In Section 5.1, we describe the system
model comprising cloud processors and online tasks, and formulate the optimization problem
to maximize profit. In Section 5.2, we propose the Task Dispatch through Online Training
(TDOT) algorithm, and provide a performance bound analysis for TDOT. We further propose
an improved version of TDOT, termed TDOT-G, in Section 5.3. In Section 5.4, we extend our
work to consider an additional task requirement, such as data requirements. In Section 5.5,
we present the simulation results, to compare the performance of TDOT and TDOT-G with
other alternatives including greedy scheduling, logistic regression, and an offline upper-bound
solution.
5.1 System Model and Problem Statement
5.1.1 Cloud Processors and Online Task Arrival
We consider a CSP with K broadly defined cloud servers (CSs), which can be, for example,
remote cloud servers, mobile edge hosts, or cloudlets. Each CS k has Pk processors, which may
not be identical. Tasks arrive at the CSP’s controller at an average rate of λ tasks per unit
time over a duration of length T . The role of the controller is to dispatch the tasks to the CSs
for execution. The processing requirements, e.g. number of cycles or processing time, for each
Table 5.1: Notations
$t_{jrk}$: processing time for task $j$ on processor $r$ at CS $k$
$p_{rk}$: profit obtained per unit time on processor $r$ at CS $k$
$T$: time duration of the system
$L$: maximum load on each processor
$\lambda$: task arrival rate
$P_k$: total number of processors at CS $k$
$K$: total number of CSs
Figure 5.1: Example system with two CSs consisting of two processors each.
task $j$ on processor $r$ in CS $k$ is given by $t_{jrk}, \forall r, k$, and is known only once the task arrives at
the controller.
5.1.2 Profit Maximization
We assume that the work of processor r in CS k generates profit prk per unit time, which may
account for multiple contributing factors such as the revenue from user payment and the cost
of maintaining the processor. Then, the profit obtained by executing task j having processing
requirements tjrk on processor r in CS k is given by prktjrk. By intelligently scheduling this
task, we may maximize the profit we gain from it. In this paper, we aim to identify a task
dispatching decision that maximizes the total profit across all tasks arriving within duration T .
It is important to note that since tasks arrive dynamically, we have no prior knowledge of the
exact number of tasks that arrive within duration T , nor their processing requirements on any
processor.
Let $M$ be the random number of tasks that arrive within duration $T$. We define the scheduling decision variables as $x_{jrk} = 1$ when task $j$ is scheduled to processor $r$ in cloud server $k$, and 0 otherwise. We consider a given load constraint $L$ on each processor$^1$, so that

$\sum_{j=1}^{M} t_{jrk}\, x_{jrk} \le L, \quad \forall k \in \{1,\dots,K\},\ r \in \{1,\dots,P_k\}.$ (5.1)

Additionally, each task is executed at most once:

$\sum_{k=1}^{K} \sum_{r=1}^{P_k} x_{jrk} \le 1, \quad \forall j \in \{1,\dots,M\}.$ (5.2)
We aim to maximize the profit of the CSP. Hence, we formulate an optimization problem as follows:

$\underset{\{x_{jrk}\}}{\text{maximize}} \ \sum_{j=1}^{M} \sum_{k=1}^{K} \sum_{r=1}^{P_k} p_{rk}\, t_{jrk}\, x_{jrk}$ (5.3)
subject to (5.1)-(5.2),
$x_{jrk} \in \{0,1\}, \quad \forall j \in \{1,\dots,M\},\ k \in \{1,\dots,K\},\ r \in \{1,\dots,P_k\}.$ (5.4)
Here, objective (5.3) is to maximize the total profit across all tasks, CSs, and processors, while
we ensure that the maximum load L is well-utilized by packing these tasks efficiently.
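On a toy instance, the offline version of problem (5.3) can be brute-forced to make the objective and constraints concrete (illustrative numbers; in the online setting $M$ and the $t_{jrk}$ are unknown in advance, which is precisely the difficulty the later sections address):

```python
from itertools import product

# Brute-force the tiny offline version of problem (5.3): each task is
# either dropped or placed on one processor (here the CS index is folded
# into the processor index), subject to the per-processor load cap L.
def offline_opt(t, p, L):
    """t[j][r]: processing time of task j on processor r;
    p[r]: profit per unit time on processor r."""
    M, R = len(t), len(p)
    best = 0.0
    for assign in product(range(-1, R), repeat=M):   # -1 = not scheduled
        load = [0.0] * R
        profit = 0.0
        for j, r in enumerate(assign):
            if r >= 0:
                load[r] += t[j][r]
                profit += p[r] * t[j][r]
        if all(load[r] <= L for r in range(R)):
            best = max(best, profit)
    return best

t = [[2.0, 3.0], [2.0, 1.0], [3.0, 2.0]]
p = [1.0, 2.0]
print(offline_opt(t, p, L=4.0))  # → 11.0
```

The exponential enumeration is only viable for a handful of tasks; it serves here to pin down what "offline optimal" means before the online algorithm is introduced.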
Remark 4. The multiple cloud servers in our model allow us to differentiate between groups of processors. For example, all the processors in a particular CS may incur a different cost from those in another, and we may have profits $p_{rk} = p_k, \forall r, k$. However, one may also visualize this model as just processors with different profit rates.
In the offline version of the problem, the number of tasks and the task processing times
are known in advance. On the other hand, the online nature of the proposed problem is more
challenging due to the lack of prior information. Consequently, we propose a polynomial-time
online algorithm that uses training to identify appropriate scheduling decisions.
5.2 Task Dispatch through Online Training
In this section, we first obtain an optimal solution to the binary-relaxed offline problem for
performance benchmarking and to gain insights into the online algorithm construction. We
then propose the TDOT algorithm for online task scheduling, and provide a performance bound
with respect to the relaxed offline solution.
$^1$The load constraint can be generalized to be processor dependent, i.e., $L_{rk}$. See Section 5.2.3 for a discussion on this.
5.2.1 Offline Solution through Lagrange Relaxation
We relax the binary constraints (5.4) in the offline version of problem (5.3). The dual of this problem is then given by

$\underset{\{u_{rk}\ge 0,\ v_j \ge 0\}}{\text{minimize}} \ \sum_{k=1}^{K} \sum_{r=1}^{P_k} u_{rk} L + \sum_{j=1}^{M} v_j$ (5.5)
subject to $u_{rk} t_{jrk} + v_j \ge p_{rk} t_{jrk}, \quad \forall j \in \{1,\dots,M\},\ r \in \{1,\dots,P_k\},\ k \in \{1,\dots,K\},$ (5.6)

where $u_{rk}$ and $v_j$ are Lagrange multipliers corresponding to constraints (5.1) and (5.2), respectively.
Constraint (5.6) implies that an optimal solution must satisfy

$v_j = \max_{r,k}\, (p_{rk} - u_{rk})\, t_{jrk}, \quad \forall j \in \{1,\dots,M\}.$ (5.7)

In other words, given optimal $u = \{u_{rk}, \forall r, k\}$, we should assign each task $j$ to the processor given by $\arg\max_{r,k}\, (p_{rk} - u_{rk})\, t_{jrk}$. Thus, the dual problem can be rewritten as follows:

$\underset{\{u \ge 0\}}{\text{minimize}} \ D(u)$ (5.8)

where

$D(u) = \sum_{k=1}^{K} \sum_{r=1}^{P_k} u_{rk} L + \sum_{j=1}^{M} \max_{r,k}\, (p_{rk} - u_{rk})\, t_{jrk}.$ (5.9)
This solution is optimal for the binary-relaxed, offline version of problem (5.3), and is an upper
bound to the optimal online solution. We call this solution OPT and use it in Section 5.2.3 to
define the performance bound.
5.2.2 Online Scheduling with Partial-Task Profit Taking
Now we consider the online problem where tasks arrive dynamically. We neither know the total
number of tasks arriving within duration T nor the processing times of the tasks in advance.
Hence, we need to dynamically learn about the processing time characteristics, i.e., optimal
u values defined in (5.9). The proposed TDOT algorithm utilizes a technique that initially
performs training to learn from arriving tasks, and then uses the information to produce profit
on the remaining set of tasks. TDOT assumes that profit can also be obtained from partially-
completed tasks within the load constraints. In other words, if only a part of the task scheduled
to a processor can be completed before the load L is met, then we retain the partial profit
generated due to the execution of that task up to that point. This assumption is eliminated in
Section 5.3, i.e., we consider a model where profits can only be obtained from fully-completed
tasks. The TDOT algorithm consists of two phases and involves a user-defined parameter $0 < \epsilon < 1$.
Training
We observe the first $\lfloor \epsilon \lambda T \rfloor$ arriving tasks, denoted by $A = \{1, \dots, \lfloor \epsilon \lambda T \rfloor\}$. For each task $j \in A$, we record its computing requirement and hence $t_{jrk}, \forall r, k$. These tasks may be arbitrarily scheduled. For simplicity, we may for now ignore the profit earned from these tasks and set $x_{jrk} = 0, \forall j \in \{1, \dots, \lfloor \epsilon \lambda T \rfloor\}, r, k$, which is shown later not to affect our derivations of the competitive ratios for TDOT.
If we allocate only $\epsilon L$ load to $A$, then we can write the dual problem objective (5.9) purely for $A$ as follows:

$D(u, A) = \sum_{k=1}^{K} \sum_{r=1}^{P_k} u_{rk}\, \epsilon L + \sum_{j \in A} \max_{r,k}\, (p_{rk} - u_{rk})\, t_{jrk}.$ (5.10)
Since the dual of an LP is also an LP, we can use any existing LP solver to efficiently obtain $u^* = \arg\min_{u\ge 0} D(u,\mathcal{A})$.
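To make the training step concrete, the dual minimization above can be handed to a generic LP solver. The following is a minimal Python sketch, not the thesis implementation: function and variable names are illustrative, processors are flattened into a single index, and the inner max in (5.10) is linearised with one epigraph variable per training task before calling SciPy's `linprog`.

```python
import numpy as np
from scipy.optimize import linprog

def train_weights(t, p, eps_L):
    """Solve the dual LP (5.10) over the training set A.

    t: (n_tasks, n_proc) processing times t_jrk, processors flattened over (r, k)
    p: (n_proc,) per-unit profits p_rk
    eps_L: the reduced load eps * L allocated to the training set

    Decision variables are [u_1..u_P, v_1..v_n], where v_j is the epigraph
    variable standing in for max_{r,k} (p_rk - u_rk) * t_jrk:
        min  eps_L * sum(u) + sum(v)
        s.t. v_j >= (p_q - u_q) * t_jq   for all tasks j and processors q
             u >= 0, v >= 0
    """
    n, P = t.shape
    c = np.concatenate([np.full(P, eps_L), np.ones(n)])
    # One inequality row per (task, processor): -t_jq*u_q - v_j <= -p_q*t_jq
    A_ub = np.zeros((n * P, P + n))
    b_ub = np.zeros(n * P)
    for j in range(n):
        for q in range(P):
            row = j * P + q
            A_ub[row, q] = -t[j, q]
            A_ub[row, P + j] = -1.0
            b_ub[row] = -p[q] * t[j, q]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (P + n))
    return res.x[:P]  # the learned weights u*
```

The returned weights play the role of u* in the exploitation phase below.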
Exploitation
Let $\mathcal{A}^c$ denote all tasks arriving after task ⌊ελT⌋. Now, for each arriving task j ∈ $\mathcal{A}^c$, we apply weights $u^*$ to obtain the scheduling decision as follows. We set
$$(r', k') = \arg\max_{r,k}\,(p_{rk} - u^*_{rk})\,t_{jrk}, \qquad (5.11)$$
if the task can be scheduled on r′ in CS k′ without violating (1− ε)L. Otherwise, we schedule
just a fraction of the task that can be scheduled while meeting (1− ε)L. This is because TDOT
assumes we may obtain profit from partially-completed tasks as well. We achieve this by defining
load variables $l_{rk}$ for every r and k. If task j satisfies $t_{jr'k'} < (1-\varepsilon)L - l_{r'k'}$, we set $x_{jr'k'} = 1$; else we set $x_{jr'k'} = \frac{(1-\varepsilon)L - l_{r'k'}}{t_{jr'k'}}$. We then update the load variable, $l_{r'k'} = l_{r'k'} + x_{jr'k'}\,t_{jr'k'}$. We
stop at the end of duration T .
After the above two phases, we obtain scheduling decisions $x_{jrk}$, ∀j, r, k, and the resulting profit can be calculated by
$$\sum_{j=1}^{M}\sum_{k=1}^{K}\sum_{r=1}^{P_k} p_{rk}\,t_{jrk}\,x_{jrk}.$$
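A single exploitation step, including the partial-task case, can be sketched as follows. This is a hypothetical Python fragment with illustrative names, assuming a flattened processor index; it is not the thesis implementation.

```python
import numpy as np

def tdot_schedule(task_t, p, u_star, load, cap):
    """One TDOT exploitation step with partial-task profit taking.

    task_t: processing times of the arriving task on each (flattened) processor
    p, u_star: per-unit profits and trained dual weights
    load: mutable array of current loads l_rk; cap: the budget (1 - eps) * L
    Returns (chosen processor, scheduled fraction x in [0, 1]).
    """
    score = (p - u_star) * task_t      # weighted marginal profit, as in (5.11)
    best = int(score.argmax())
    remaining = cap - load[best]
    if remaining <= 0:
        return best, 0.0               # budget on this processor exhausted
    # Schedule the whole task if it fits, otherwise only the fitting fraction.
    x = 1.0 if task_t[best] <= remaining else remaining / task_t[best]
    load[best] += x * task_t[best]
    return best, x
```

Repeating this for every arrival in $\mathcal{A}^c$ until time T reproduces the exploitation loop described above.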
Remark 5. TDOT uses single-shot learning, unlike a reinforcement learning approach, often used to solve multi-armed bandit problems, which iteratively explores and exploits for each incoming task. Single-shot learning is more efficient to implement and can be more effective when tasks arriving within duration T have similar characteristics. As shown below, it also has a performance bound, unlike reinforcement learning.
5.2.3 Performance Bound Analysis
The TDOT algorithm, despite its simple premise of training and exploitation, obtains performance that is close to the optimum. In this section, we present a performance bound for the expected profit produced by TDOT in comparison to the upper-bound offline solution.
Let $S(u^*,\mathcal{A}^c)$ be the profit obtained by TDOT on the non-training set $\mathcal{A}^c$. We next provide in Lemma 4 a conditional performance bound on $S(u^*,\mathcal{A}^c)$ with respect to the upper bound OPT. The following definitions are necessary. We define $R(u^*)$ as the profit obtained in the absence of load constraints by applying weights $u^*$ to the entire set of M tasks, and $R_{rk}(u^*)$ as the contribution of processor r in CS k to $R(u^*)$. We further define $R_{rk}(u^*,\mathcal{A})$ similarly to $R_{rk}(u^*)$, except over just the set of tasks $\mathcal{A}$.
Lemma 4. For any given M number of tasks that arrive within duration T, if we have
$$\sum_{k=1}^{K}\sum_{r=1}^{P_k} \left|R_{rk}(u^*,\mathcal{A}) - \varepsilon R_{rk}(u^*)\right| \le \varepsilon^2\sqrt{\frac{\lambda T}{M}}\,\max\{\mathrm{OPT},\, R(u^*)\}, \qquad (5.12)$$
then $S(u^*,\mathcal{A}^c) \ge \left(1 - \varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}$.
Proof. We first define a few functions for a given M, for the purposes of the proof. Let $y_{jrk}(u^*)$ be the scheduling decision in the absence of the load constraint L, obtained by applying weights $u^*$ to the entire set of M tasks. Specifically, ∀j, r, k, $y_{jrk}(u^*) = 1$ if $(r,k) = \arg\max_{r',k'} \left(1 - \frac{u^*_{r'k'}}{p_{r'k'}}\right) p_{r'k'}\,t_{jr'k'}$ and 0 otherwise. Then, we obtain
$$R(u^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} R_{rk}(u^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j=1}^{M} p_{rk}\,t_{jrk}\,y_{jrk}(u^*). \qquad (5.13)$$
Note that Rrk(u∗) is the profit obtained due to utilizing processor r in CS k. Similarly, by
applying this to just A, we have
$$R(u^*,\mathcal{A}) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} R_{rk}(u^*,\mathcal{A}) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j\in\mathcal{A}} p_{rk}\,t_{jrk}\,y_{jrk}(u^*). \qquad (5.14)$$
We define the contribution of processor r in CS k to the dual objective (5.9) as
$$D_{rk}(u^*) = u^*_{rk}L + \left(1 - \frac{u^*_{rk}}{p_{rk}}\right)R_{rk}(u^*), \qquad (5.15)$$
and to the dual objective (5.10) over just the set of tasks $\mathcal{A}$ as
$$D_{rk}(u^*,\mathcal{A}) = u^*_{rk}\,\varepsilon L + \left(1 - \frac{u^*_{rk}}{p_{rk}}\right)R_{rk}(u^*,\mathcal{A}). \qquad (5.16)$$
Let xjrk(u∗) be the scheduling decision obtained by applying u∗ in the presence of load
constraints. We define S(u∗) as the profit obtained while applying u∗ to the entire set of tasks
in the presence of load constraints. We can write S(u∗) as follows.
$$S(u^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j=1}^{M} p_{rk}\,t_{jrk}\,x_{jrk}(u^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} \min\Big\{p_{rk}L,\ \sum_{j=1}^{M} p_{rk}\,t_{jrk}\,y_{jrk}(u^*)\Big\}. \qquad (5.17)$$
This is because the tasks are divisible and TDOT assigns a fraction of a task to ensure that L is exactly met when $\sum_{j=1}^{M} t_{jrk}\,y_{jrk}(u^*) > L$. We again define the contribution of processor r in CS k to $S(u^*)$ as
$$S_{rk}(u^*) = \min\Big\{p_{rk}L,\ \sum_{j=1}^{M} p_{rk}\,t_{jrk}\,y_{jrk}(u^*)\Big\} \qquad (5.18)$$
$$= \min\{p_{rk}L,\ R_{rk}(u^*)\}. \qquad (5.19)$$
The profit obtained by TDOT can be written as
$$S(u^*,\mathcal{A}^c) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j\in\mathcal{A}^c} p_{rk}\,t_{jrk}\,x_{jrk}(u^*), \qquad (5.20)$$
where the contribution of processor r in CS k to $S(u^*,\mathcal{A}^c)$ is
$$S_{rk}(u^*,\mathcal{A}^c) = \sum_{j\in\mathcal{A}^c} p_{rk}\,t_{jrk}\,x_{jrk}(u^*) \qquad (5.21)$$
$$= \min\{(1-\varepsilon)p_{rk}L,\ R_{rk}(\mathcal{A}^c)\}. \qquad (5.22)$$
For simplicity, for the rest of the proof, we drop $u^*$ from all functions. We also define some $s_{rk}$, ∀r, k, such that
$$|R_{rk}(\mathcal{A}) - \varepsilon R_{rk}| \le s_{rk}, \qquad (5.23)$$
and
$$\sum_{r,k} s_{rk} \le \varepsilon^2\sqrt{\frac{\lambda T}{M}}\,\max\{\mathrm{OPT},\, R\}, \qquad (5.24)$$
which is given by the hypothesis of the lemma in (5.12). Additionally, we set $a_{rk} = s_{rk}/\varepsilon$.
We first prove that for all r, k,
$$\max\{R_{rk},\, D_{rk}\} - S_{rk} \le a_{rk}. \qquad (5.25)$$
We consider the following two cases.
• Case 1: $u^*_{rk} > 0$.
We can see, from (5.15), that $D_{rk} \le \max\{p_{rk}L,\, R_{rk}\}$. Consequently,
$$\max\{R_{rk},\, D_{rk}\} - S_{rk} \le \max\{p_{rk}L,\, R_{rk}\} - \min\{p_{rk}L,\, R_{rk}\} \qquad (5.26)$$
$$= |p_{rk}L - R_{rk}|. \qquad (5.27)$$
Since $u^*_{rk} > 0$, by the complementary slackness conditions on the LP for just the tasks in $\mathcal{A}$, we have $R_{rk}(\mathcal{A}) = \varepsilon p_{rk}L$. Thus, from (5.23), we have
$$|\varepsilon p_{rk}L - \varepsilon R_{rk}| \le s_{rk}. \qquad (5.28)$$
This implies that $|p_{rk}L - R_{rk}| \le a_{rk}$. From (5.27), we can now prove (5.25).
• Case 2: $u^*_{rk} = 0$. We have
$$D_{rk} = R_{rk} \qquad (5.29)$$
and
$$R_{rk} \le \frac{R_{rk}(\mathcal{A})}{\varepsilon} + a_{rk} \quad (\text{from } (5.23)) \quad \le\ p_{rk}L + a_{rk} \quad (\text{by complementary slackness}). \qquad (5.30)$$
Thus, we get
$$S_{rk} + a_{rk} = \min\{p_{rk}L + a_{rk},\ R_{rk} + a_{rk}\} \ge \min\{R_{rk},\ R_{rk} + a_{rk}\} \quad (\text{from } (5.30)) \quad = R_{rk} = D_{rk} \quad (\text{from } (5.29)). \qquad (5.31)$$
Hence, (5.25) is proven for Case 2.
We can sum (5.25) over r and k to obtain
$$\max\{R,\, D\} - S \le \sum_{k=1}^{K}\sum_{r=1}^{P_k} a_{rk}. \qquad (5.32)$$
Note that S ≤ OPT ≤ D, by weak duality. Therefore, from (5.32) and using (5.24), we can see that
$$R - \mathrm{OPT} \le \frac{1}{\varepsilon}\sum_{k=1}^{K}\sum_{r=1}^{P_k} s_{rk} \le \varepsilon\sqrt{\frac{\lambda T}{M}}\,R. \qquad (5.33)$$
Again using weak duality and (5.32), we have $\mathrm{OPT} - S \le \varepsilon\sqrt{\frac{\lambda T}{M}}\,R$. Consequently, using (5.33),
$$S \ge \frac{1 - 2\varepsilon\sqrt{\frac{\lambda T}{M}}}{1 - \varepsilon\sqrt{\frac{\lambda T}{M}}}\,\mathrm{OPT}. \qquad (5.34)$$
Now, from (5.23), since $R_{rk}(\mathcal{A}^c) = R_{rk} - R_{rk}(\mathcal{A})$, we see that $R_{rk}(\mathcal{A}^c) > (1-\varepsilon)R_{rk} - s_{rk}$, taking both cases into consideration. Applying this to (5.22), we have
$$S_{rk}(\mathcal{A}^c) \ge \min\{(1-\varepsilon)p_{rk}L,\ (1-\varepsilon)R_{rk} - s_{rk}\} \qquad (5.35)$$
$$\ge (1-\varepsilon)S_{rk} - s_{rk}. \qquad (5.36)$$
Summing over r and k,
$$S(\mathcal{A}^c) \ge \sum_{k=1}^{K}\sum_{r=1}^{P_k} (1-\varepsilon)S_{rk} \quad (\text{ignoring second-order terms}) \qquad (5.37)$$
$$= (1-\varepsilon)S \qquad (5.38)$$
$$\ge (1-\varepsilon)\,\frac{1 - 2\varepsilon\sqrt{\frac{\lambda T}{M}}}{1 - \varepsilon\sqrt{\frac{\lambda T}{M}}}\,\mathrm{OPT} \quad (\text{from } (5.34)) \qquad (5.39)$$
$$\ge \left(1 - \varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}. \qquad (5.40)$$
This lemma states that if $u^*$ produces an unconstrained profit on the entire set of tasks that is proportionally close to that on the training set $\mathcal{A}$ for each (r, k), then we obtain a performance bound on the profit on the non-training set, i.e., $S(u^*,\mathcal{A}^c)$, for a given M.
Lemma 4 is used in our main theorem below, for which we need to define $P_{\max} = \max_k P_k$ and $c_{\max} = \max_{r,k} p_{rk}$.
Definition 1. $\Lambda \subseteq [0,1]^{KP_{\max}}$ is a (y, ε)-net for a scheduling rule $y_{jrk}(\cdot)$ and some ε > 0 if, for all $u \in [0,1]^{KP_{\max}}$, there exists a $u' \in \Lambda$ such that $|y_{jrk}(u) - y_{jrk}(u')| \le \varepsilon$ for all j, r, k, where $y_{jrk}(u)$ is the scheduling decision obtained by applying u in the absence of load constraints.
Theorem 8. If $\frac{\mathrm{OPT}}{c_{\max}} \ge \frac{KP_{\max}\ln(KP_{\max}|\Lambda|/\varepsilon)}{\varepsilon^3}$, then we have
$$\mathbb{E}_M\big[\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M]\big] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\lambda T}\,\mathbb{E}_M\!\left[\frac{1}{\sqrt{M}}\right]\right)\mathrm{OPT}.$$
Proof. For each 1 ≤ r ≤ $P_k$, 1 ≤ k ≤ K, and u ∈ Λ, we define a bad event $B_{r,k,u}$ as one that satisfies $|R_{rk}(u,\mathcal{A}) - \varepsilon R_{rk}(u)| > s_{rk,u}$, where
$$s_{rk,u} = \frac{2}{3}c_{\max}\ln\!\left(\frac{2}{\delta}\right) + \|R_{rk}(u)\|_2\sqrt{2\left(\frac{\varepsilon\lambda T}{M}\right)\ln\frac{2}{\delta}}. \qquad (5.41)$$
The form of (5.41) is inspired by Lemma 3 from [71], except that $|\mathcal{A}|/M \neq \varepsilon$ in our case due to the randomness of task arrivals.
Similar to [71], we can show that $\Pr(B_{r,k,u}) \le \delta$ for every 0 < δ < 1. To prove our theorem, we set $\delta = \frac{\varepsilon}{KP_{\max}|\Lambda|}$.
We next show that the choice of $s_{rk,u}$ in (5.41) satisfies the hypothesis in Lemma 4. We sum up (5.41) over r and k. The first term on the right-hand side is bounded by $O\!\left(KP_{\max}c_{\max}\ln\frac{1}{\delta}\right)$. Since $\frac{\mathrm{OPT}}{c_{\max}} \ge \frac{KP_{\max}\ln(1/\delta)}{\varepsilon^3}$, this is less than $\varepsilon^3\,\mathrm{OPT}$. In order to bound the contribution of the second terms, we use the following two inequalities:
$$\|R_{rk}(u)\|_2 \le \sqrt{c_{\max}\,R_{rk}(u)} \qquad (5.42)$$
and
$$\sum_{r,k}\sqrt{R_{rk}(u)} \le \sqrt{KP_{\max}\sum_{r,k}R_{rk}(u)} = \sqrt{KP_{\max}\,R(u)}. \qquad (5.43)$$
Now, combining these, we have
$$\sum_{r,k}\|R_{rk}(u)\|_2\sqrt{2\left(\frac{\varepsilon\lambda T}{M}\right)\ln\!\left(\frac{2}{\delta}\right)} \le \sqrt{KP_{\max}c_{\max}\,\frac{\varepsilon\lambda T}{M}\ln\!\left(\frac{1}{\delta}\right)R(u)} \le \sqrt{\frac{\varepsilon^4\lambda T}{M}\,\mathrm{OPT}\,R(u)}, \qquad (5.44)$$
where the last inequality is due to $\frac{\mathrm{OPT}}{c_{\max}} \ge \frac{KP_{\max}\ln(1/\delta)}{\varepsilon^3}$. Thus, we see that
$$\sum_{r,k} s_{rk,u} \le \varepsilon^2\sqrt{\frac{\lambda T}{M}}\,\max\{\mathrm{OPT},\, R\}. \qquad (5.45)$$
Hence, setting $s_{rk} = s_{rk,u}$ gives us (5.24).
Suppose $u^* \in \Lambda$. Then, using the fact that $\Pr(B_{r,k,u}) \le \frac{\varepsilon}{KP_{\max}|\Lambda|}$ and by simply applying a union bound over all u ∈ Λ and r, k, we have that with probability ≥ 1 − ε, none of the events $B_{r,k,u}$ happen. Thus, we can apply Lemma 4 to conclude that
$$\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M] \ge (1-\varepsilon)\left(1 - \varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT} \qquad (5.46)$$
$$\ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}. \qquad (5.47)$$
Alternatively, suppose $u^* \notin \Lambda$. Because Λ is a (y, ε)-net, there exists $u' \in \Lambda$ such that $|y_{jrk}(u^*) - y_{jrk}(u')| \le \varepsilon$. Consequently, we can prove that
$$|R_{rk}(u',\mathcal{A}) - R_{rk}(u^*,\mathcal{A})| \le \sum_{j\in\mathcal{A}} p_{rk}\,t_{jrk}\,|y_{jrk}(u') - y_{jrk}(u^*)| \le \varepsilon C(\mathcal{A}),$$
where $C(\mathcal{A})$ is a constant for the set of tasks $\mathcal{A}$. Similarly, we have a constant C for the entire set of tasks. Now, by using this and the triangle inequality, we have
$$|R_{rk}(u^*,\mathcal{A}) - \varepsilon R_{rk}(u^*)| \le |R_{rk}(u',\mathcal{A}) - \varepsilon R_{rk}(u')| + |R_{rk}(u',\mathcal{A}) - R_{rk}(u^*,\mathcal{A})| + \varepsilon|R_{rk}(u') - R_{rk}(u^*)| \qquad (5.48)$$
$$\le s_{rk,u'} + \varepsilon C(\mathcal{A}) + \varepsilon^2 C \qquad (5.49)$$
$$\le s_{rk,u'} + O(\varepsilon^2 C). \qquad (5.50)$$
Summing over r and k, and applying Lemma 4, we again have
$$\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}. \qquad (5.51)$$
Combining both cases, we get $\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}$.
We then take the expectation over M to obtain
$$\mathbb{E}_M\big[\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M]\big] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\lambda T}\,\mathbb{E}_M\!\left[\frac{1}{\sqrt{M}}\right]\right)\mathrm{OPT}. \qquad (5.52)$$
Note that the conclusion of Theorem 8 gives us a bound on the expected performance of TDOT.
Remark 6. The condition on $\frac{\mathrm{OPT}}{c_{\max}}$ in Theorem 8 is easily met in all practical scenarios, as it is the ratio of the total profit across all tasks and processors to the profit per unit time on a single processor, which is generally a large value.
Furthermore, by Jensen's inequality, we have $\mathbb{E}_M\!\left[\frac{1}{\sqrt{M}}\right] \le \sqrt{\mathbb{E}_M\!\left[\frac{1}{M}\right]}$. Using this in Theorem 8, we now have
$$\mathbb{E}_M\big[\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M]\big] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\lambda T\,\mathbb{E}_M\!\left[\frac{1}{M}\right]}\right)\mathrm{OPT} \ge \left(1 - 2\varepsilon - \sqrt{\varepsilon\,\mathbb{E}_M\!\left[\frac{\varepsilon\lambda T}{M}\right]}\right)\mathrm{OPT}. \qquad (5.53)$$
Thus, we can see that the profit performance gap depends on $\mathbb{E}_M\!\left[\frac{\varepsilon\lambda T}{M}\right]$, which is the expected proportion of tasks in the training set. Furthermore, using a lower bound on $\mathbb{E}_M\!\left[\frac{1}{M}\right]$ when M is Poisson [83], we have the following corollary.
Corollary 4. Assume the condition of Theorem 8 is met. If M has a Poisson distribution with mean λT, we have
$$\mathbb{E}_M\big[\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M]\big] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\frac{(3+\lambda T)(1-e^{-\lambda T})}{\lambda T}}\right)\mathrm{OPT}. \qquad (5.54)$$
This corollary allows precise numerical calculation. As an example, if λ = 0.1, T = 1000 s, and we choose ε = 0.15, then $\mathbb{E}_M\big[\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M]\big] \ge 0.5\,\mathrm{OPT}$.
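The numerical claim can be checked directly by evaluating the right-hand side of (5.54); a short Python computation using the example's values:

```python
import math

lam, T, eps = 0.1, 1000.0, 0.15
lam_T = lam * T

# Coefficient of OPT on the right-hand side of Corollary 4, eq. (5.54)
gap = eps * math.sqrt((3 + lam_T) * (1 - math.exp(-lam_T)) / lam_T)
coeff = 1 - 2 * eps - gap   # approximately 0.548, hence at least 0.5 OPT
```

Since the coefficient evaluates to roughly 0.55, the expected profit is indeed at least 0.5 OPT for these parameters.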
Corollary 5. For λ → ∞, (5.54) reduces to
$$\mathbb{E}_M\big[\mathbb{E}[S(u^*,\mathcal{A}^c)\,|\,M]\big] \ge (1 - 3\varepsilon)\,\mathrm{OPT}. \qquad (5.55)$$
When the task arrival rate is high, there are enough tasks for training so that ε can be set
small. In this case, Corollary 5 suggests that TDOT can perform close to an optimal offline
algorithm.
Remark 7. Instead of a single load constraint L, we can consider processor-dependent load
constraints $L_{rk}$, ∀r, k, and rewrite equation (5.1) as follows:
$$\sum_{j=1}^{M} t_{jrk}\,x_{jrk} \le L_{rk}, \quad \forall r \in \{1,\ldots,P_k\},\ k \in \{1,\ldots,K\}.$$
We note that all results can be trivially extended to this case.
Remark 8. Our performance bound is computed purely based on the profit from the non-training
set of tasks Ac, but it is compared against the profit of an upper-bound offline algorithm that
considers the entire set of tasks. Consequently, any additional profit we obtain on training set
A is a bonus and further improves profit performance. We note that the value of ε we choose
splits the tasks into sets A and Ac, and consequently impacts profit performance. We study the
effect of ε on the total profit in Section 5.5.
5.2.4 Complexity Analysis
An LP can be solved in $O(n^{3.5}B)$ time, where n is the number of variables and B is the number of bits in the input [77]. Thus, for a given M, the dual minimization during the training phase of TDOT can be done in $O((\varepsilon M P)^{3.5}B)$ time, where $P = \sum_{k=1}^{K} P_k$ is the total number of processors. On the other hand, the time complexity of the exploitation phase is $O((1-\varepsilon)M)$. Thus, the time complexity of TDOT is dominated by LP solving in the training phase.
Remark 9. We note that $|\mathcal{A}| = \varepsilon M$, and hence the above complexity is equivalent to $O((|\mathcal{A}|P)^{3.5}B)$. This is usually small, since the number of training tasks, i.e., $|\mathcal{A}|$, is much smaller relative to M.
5.3 Modified Algorithm Without Partial-Task Profit Taking
The TDOT algorithm proposed in the previous section assumes that we may obtain profit from partially-completed tasks if the load constraint on the scheduled processor is met. Hence, we propose a variant, namely TDOT with Greedy scheduling (TDOT-G), for scenarios where profit can be obtained only for tasks that have fully completed execution while meeting the load constraints. This algorithm consists of the same two broad phases as TDOT, namely the training phase and the exploitation phase.
In this version, if an incoming task cannot be scheduled on the maximum profit processor,
we try to schedule it on the second maximum profit processor, and then the third maximum
profit processor, and so on. We expect this technique to result in better practical performance
than simply discarding a task that cannot be scheduled on the maximum profit processor as we
greedily try to ensure that the current task is at least executed on some processor, which will
produce some profit. The following are the steps of this algorithm.
• Step 1: Observe the processing times of the first ⌊ελT⌋ arriving tasks, $\mathcal{A}$.
• Step 2: Find weights $u^* = \arg\min_{u\ge 0} D(u,\mathcal{A})$.
• Step 3: For each incoming task j, we initialize $\mathcal{P}$ to be the total set of processors.
– Step 3a: Schedule the task to processor $(r',k') = \arg\max_{r,k\in\mathcal{P}} (p_{rk} - u^*_{rk})\,t_{jrk}$, if (1 − ε)L is not violated on processor r′ in CS k′. If (1 − ε)L is violated on processor r′ in CS k′, go to Step 3b. This violation is checked by using load variables $l_{rk}$, ∀r, k, similar to TDOT.
– Step 3b: Remove (r′, k′) from P and repeat Step 3a unless P is empty, i.e., the task
cannot be scheduled on any processor.
• Step 4: Stop at the end of duration T .
The overall complexity of TDOT-G is still dominated by the LP-solving step, and is given by $O((|\mathcal{A}|P)^{3.5}B)$, as shown in Section 5.2.4.
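Step 3 above amounts to a ranked greedy fallback over processors. The following is a minimal Python sketch under stated assumptions: hypothetical names, a flattened processor index, and a single load budget; it is not the thesis implementation.

```python
import numpy as np

def tdot_g_schedule(task_t, p, u_star, load, cap):
    """TDOT-G Step 3: try processors in decreasing weighted-profit order and
    schedule the whole task on the first one whose load budget still fits it."""
    score = (p - u_star) * task_t
    for q in np.argsort(-score):           # best-scoring processor first
        if load[q] + task_t[q] <= cap:     # (1 - eps) * L not violated
            load[q] += task_t[q]
            return int(q), 1.0             # task fully scheduled
    return None, 0.0                       # all processors exhausted: discard
```

Unlike TDOT, no fractional assignment is made; a task that fits nowhere earns zero profit, which matches the no-partial-profit model of this section.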
5.4 TDOT with Data Requirements
In this section, we allow each arriving task j to have data requirements djrk on processor r
in CS k, in addition to the processing requirements tjrk defined in Section 5.1. We assume
that, similar to the task processing requirements, these data requirements are only known once the task becomes available for dispatch. Each processor r in CS k also has a data load constraint Q, similar to the processing load constraint L. Problem (5.3) can then be reformulated to accommodate these as follows:
$$\begin{aligned}
\max_{\{x_{jrk}\}}\quad & \sum_{j=1}^{M}\sum_{k=1}^{K}\sum_{r=1}^{P_k} p_{rk}\,t_{jrk}\,x_{jrk} + \sum_{j=1}^{M}\sum_{k=1}^{K}\sum_{r=1}^{P_k} q_{rk}\,d_{jrk}\,x_{jrk} && (5.56)\\
\text{s.t.}\quad & \sum_{j=1}^{M} t_{jrk}\,x_{jrk} \le L, \quad \forall r \in \{1,\ldots,P_k\},\ k \in \{1,\ldots,K\}, && (5.57)\\
& \sum_{j=1}^{M} d_{jrk}\,x_{jrk} \le Q, \quad \forall r \in \{1,\ldots,P_k\},\ k \in \{1,\ldots,K\}, && (5.58)\\
& \sum_{k=1}^{K}\sum_{r=1}^{P_k} x_{jrk} \le 1, \quad \forall j \in \{1,\ldots,M\}, && (5.59)\\
& x_{jrk} \in \{0,1\}, \quad \forall j \in \{1,\ldots,M\},\ r \in \{1,\ldots,P_k\},\ k \in \{1,\ldots,K\}. && (5.60)
\end{aligned}$$
5.4.1 Offline Solution through Lagrange Relaxation
Similar to Section 5.2, the dual problem of the LP relaxation of (5.56) is given by
$$\begin{aligned}
\min_{\{u_{rk}\ge0,\,w_{rk}\ge0,\,v_j\ge0\}}\quad & \sum_{k=1}^{K}\sum_{r=1}^{P_k} u_{rk}L + \sum_{k=1}^{K}\sum_{r=1}^{P_k} w_{rk}Q + \sum_{j=1}^{M} v_j && (5.61)\\
\text{s.t.}\quad & u_{rk}t_{jrk} + w_{rk}d_{jrk} + v_j \ge p_{rk}t_{jrk} + q_{rk}d_{jrk},\\
& \quad \forall j \in \{1,\ldots,M\},\ r \in \{1,\ldots,P_k\},\ k \in \{1,\ldots,K\}, && (5.62)
\end{aligned}$$
where urk, wrk and vj are Lagrange multipliers corresponding to constraints (5.57), (5.58) and
(5.59) respectively. We can rewrite constraint (5.62) as follows:
$$v_j \ge \left(1-\frac{u_{rk}}{p_{rk}}\right)p_{rk}\,t_{jrk} + \left(1-\frac{w_{rk}}{q_{rk}}\right)q_{rk}\,d_{jrk}, \quad \forall j \in \{1,\ldots,M\},\ r \in \{1,\ldots,P_k\},\ k \in \{1,\ldots,K\}. \qquad (5.63)$$
In other words, given the optimal $z = \{u_{rk}, w_{rk}, \forall r, k\}$, we should assign each task j to the processor given by $\arg\max_{r,k} \left(1-\frac{u_{rk}}{p_{rk}}\right)p_{rk}t_{jrk} + \left(1-\frac{w_{rk}}{q_{rk}}\right)q_{rk}d_{jrk}$.
Thus, the dual problem can be rewritten as
$$\min_{z\ge0}\ D(z), \qquad (5.64)$$
where
$$D(z) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} u_{rk}L + \sum_{k=1}^{K}\sum_{r=1}^{P_k} w_{rk}Q + \sum_{j=1}^{M}\max_{r,k}\left[\left(1-\frac{u_{rk}}{p_{rk}}\right)p_{rk}t_{jrk} + \left(1-\frac{w_{rk}}{q_{rk}}\right)q_{rk}d_{jrk}\right]. \qquad (5.65)$$
5.4.2 Online Scheduling Algorithm with Partial-Task Profit Taking
Now we consider the online problem where tasks arrive dynamically, and we need to learn the optimal z values defined in (5.65). We modify the TDOT algorithm proposed in Section 5.2.2 for this problem as follows.
Training
We observe the first ⌊ελT⌋ arriving tasks, denoted by $\mathcal{A} = \{1, \ldots, \lfloor\varepsilon\lambda T\rfloor\}$. For each task j ∈ $\mathcal{A}$, we record its computing requirement and hence $t_{jrk}$, ∀r, k. These tasks may be arbitrarily scheduled. For simplicity, we may ignore for now the profit earned from these tasks and set $x_{jrk} = 0$, ∀j ∈ $\{1, \ldots, \lfloor\varepsilon\lambda T\rfloor\}$, r, k, which is shown later not to affect our derivations of the competitive ratios for TDOT.
If we allocate only εL and εQ loads to A, then we can write the dual problem objective
(5.65) purely for A as follows.
$$D(z,\mathcal{A}) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} u_{rk}\,\varepsilon L + \sum_{k=1}^{K}\sum_{r=1}^{P_k} w_{rk}\,\varepsilon Q + \sum_{j\in\mathcal{A}}\max_{r,k}\left[\left(1-\frac{u_{rk}}{p_{rk}}\right)p_{rk}t_{jrk} + \left(1-\frac{w_{rk}}{q_{rk}}\right)q_{rk}d_{jrk}\right]. \qquad (5.66)$$
Since the dual of an LP is also an LP, we can use any existing LP solver to efficiently obtain $z^* = \arg\min_{z\ge 0} D(z,\mathcal{A})$.
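The training LP now carries two budget terms and two weight vectors. Extending the earlier single-budget sketch (hypothetical names, flattened processors, SciPy's `linprog`), constraint (5.62) becomes one inequality row per (task, processor) pair:

```python
import numpy as np
from scipy.optimize import linprog

def train_weights_data(t, d, p, q, eps_L, eps_Q):
    """Solve the dual LP (5.66) over the training set.

    t, d: (n_tasks, n_proc) processing and data requirements
    p, q: (n_proc,) per-unit profits for processing and data
    Decision variables are [u (P), w (P), v (n)]; each v_j is the epigraph
    variable for the inner max in (5.66).
    """
    n, P = t.shape
    c = np.concatenate([np.full(P, eps_L), np.full(P, eps_Q), np.ones(n)])
    A_ub = np.zeros((n * P, 2 * P + n))
    b_ub = np.zeros(n * P)
    for j in range(n):
        for k in range(P):
            row = j * P + k
            # (5.62): u_k*t_jk + w_k*d_jk + v_j >= p_k*t_jk + q_k*d_jk
            A_ub[row, k] = -t[j, k]
            A_ub[row, P + k] = -d[j, k]
            A_ub[row, 2 * P + j] = -1.0
            b_ub[row] = -(p[k] * t[j, k] + q[k] * d[j, k])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (2 * P + n))
    return res.x[:P], res.x[P:2 * P]       # (u*, w*)
```

The returned pair (u*, w*) forms the weight vector z* used in the exploitation phase below.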
Exploitation
Let $\mathcal{A}^c$ denote all tasks arriving after task ⌊ελT⌋. Now, for each arriving task j ∈ $\mathcal{A}^c$, we apply weights $z^*$ to obtain the scheduling decision as follows. We set
$$(r', k') = \arg\max_{r,k}\left[\left(1-\frac{u^*_{rk}}{p_{rk}}\right)p_{rk}t_{jrk} + \left(1-\frac{w^*_{rk}}{q_{rk}}\right)q_{rk}d_{jrk}\right]. \qquad (5.67)$$
If task j can be scheduled on r′ in CS k′ without violating either (1 − ε)L or (1 − ε)Q, we schedule the entire task to processor r′ in CS k′. Otherwise, we schedule just the fraction of the task that can be scheduled on processor r′ in CS k′ while meeting both (1 − ε)L and (1 − ε)Q.
We achieve this by defining load variables $l_{rk}$ and $m_{rk}$ for every r and k. If task j satisfies both $t_{jr'k'} < (1-\varepsilon)L - l_{r'k'}$ and $d_{jr'k'} < (1-\varepsilon)Q - m_{r'k'}$, we schedule the entire task, i.e., $x_{jr'k'} = 1$, and update the load variables $l_{r'k'} = l_{r'k'} + t_{jr'k'}$ and $m_{r'k'} = m_{r'k'} + d_{jr'k'}$. Otherwise, for task j, we execute just an $x^l_{jr'k'} = \frac{(1-\varepsilon)L - l_{r'k'}}{t_{jr'k'}}$ fraction of the task, store an $x^m_{jr'k'} = \frac{(1-\varepsilon)Q - m_{r'k'}}{d_{jr'k'}}$ fraction of the data, and update the load variables $l_{r'k'} = l_{r'k'} + x^l_{jr'k'}\,t_{jr'k'}$ and $m_{r'k'} = m_{r'k'} + x^m_{jr'k'}\,d_{jr'k'}$. We stop at the end of duration T.
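The two-budget exploitation step can be sketched as below (hypothetical Python, flattened processor index, not the thesis implementation); the fractions mirror the $x^l$ and $x^m$ update rules above, and the score $(p - u^*)t + (q - w^*)d$ equals the bracketed expression in (5.67).

```python
import numpy as np

def tdot_data_schedule(task_t, task_d, p, q, u_star, w_star, l, m, cap_L, cap_Q):
    """One TDOT exploitation step with processing and data budgets.

    Picks the processor maximising the weighted profit in (5.67); schedules
    the whole task if both budgets fit, else only the fitting fractions.
    """
    score = (p - u_star) * task_t + (q - w_star) * task_d
    k = int(score.argmax())
    if l[k] + task_t[k] <= cap_L and m[k] + task_d[k] <= cap_Q:
        xl = xm = 1.0                                  # whole task fits
    else:
        xl = max(0.0, min(1.0, (cap_L - l[k]) / task_t[k]))
        xm = max(0.0, min(1.0, (cap_Q - m[k]) / task_d[k]))
    l[k] += xl * task_t[k]
    m[k] += xm * task_d[k]
    return k, xl, xm
```

As before, applying this to every arrival until time T completes the exploitation phase.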
5.4.3 Performance Bound Analysis
Let $S(z^*,\mathcal{A}^c)$ be the profit obtained by TDOT on the non-training set $\mathcal{A}^c$. We provide a conditional performance bound on $S(z^*,\mathcal{A}^c)$ with respect to the offline optimum OPT in Lemma 5. We define some functions similar to Section 5.2.3. We define $R(z^*)$ as the profit obtained due to the total processing time in the absence of load constraints and by applying weights $z^*$ to the entire set of M tasks, and $R_{rk}(z^*)$ as the contribution of processor r in CS k to $R(z^*)$. We further define $R_{rk}(z^*,\mathcal{A})$ similarly to $R_{rk}(z^*)$, except over just the set of tasks $\mathcal{A}$. Similarly, we define $C(z^*)$, $C_{rk}(z^*)$, and $C_{rk}(z^*,\mathcal{A})$ as the corresponding profit obtained due to the data requirements. Let $y_{jr'k'}(z^*)$ be the scheduling decision in the absence of load constraints L and Q, obtained by applying weights $z^*$ to the entire set of M tasks.
We now prove a performance bound on the revenue obtained on the test set $\mathcal{A}^c$ with respect to the total offline solution OPT.
Lemma 5. For any given M number of tasks that arrive within duration T, if we have
$$\sum_{k=1}^{K}\sum_{r=1}^{P_k}\max\big\{|R_{rk}(z,\mathcal{A}) + C_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z) - \varepsilon C_{rk}(z)|,\ |R_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z)|,\ |C_{rk}(z,\mathcal{A}) - \varepsilon C_{rk}(z)|,\ |(R_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z)) - (C_{rk}(z,\mathcal{A}) - \varepsilon C_{rk}(z))|\big\} \le \varepsilon^2\sqrt{\frac{\lambda T}{M}}\,\max\{\mathrm{OPT},\ R(z^*) + C(z^*)\}, \qquad (5.68)$$
then $S(z^*,\mathcal{A}^c) \ge \left(1 - \varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}$.
Proof. We first define a few functions for a given M, for the purposes of the proof. Let $y_{jrk}(z^*)$ be the scheduling decision in the absence of load constraints L and Q, obtained by applying weights $z^*$ to the entire set of M tasks. Specifically, ∀j, r, k, $y_{jrk}(z^*) = 1$ if
$$(r,k) = \arg\max_{r',k'}\left[\left(1-\frac{u^*_{r'k'}}{p_{r'k'}}\right)p_{r'k'}t_{jr'k'} + \left(1-\frac{w^*_{r'k'}}{q_{r'k'}}\right)q_{r'k'}d_{jr'k'}\right]$$
and 0 otherwise. Then, we obtain
$$R(z^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} R_{rk}(z^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j=1}^{M} p_{rk}\,t_{jrk}\,y_{jrk}(z^*). \qquad (5.69)$$
Note that Rrk(z∗) is the profit obtained due to utilizing processor r in CS k. Similarly, by
applying this to just A, we have
$$R(z^*,\mathcal{A}) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} R_{rk}(z^*,\mathcal{A}) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j\in\mathcal{A}} p_{rk}\,t_{jrk}\,y_{jrk}(z^*). \qquad (5.70)$$
Similarly, we have
$$C(z^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} C_{rk}(z^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j=1}^{M} q_{rk}\,d_{jrk}\,y_{jrk}(z^*) \qquad (5.71)$$
and
$$C(z^*,\mathcal{A}) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} C_{rk}(z^*,\mathcal{A}) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j\in\mathcal{A}} q_{rk}\,d_{jrk}\,y_{jrk}(z^*). \qquad (5.72)$$
For simplicity, we also define the function $Y_{rk}(z^*) = R_{rk}(z^*) + C_{rk}(z^*)$, ∀r, k, and
$$Y(z^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k} Y_{rk}(z^*) = R(z^*) + C(z^*). \qquad (5.73)$$
We define the contribution of processor r in CS k to the dual objective (5.65) as
$$D_{rk}(z^*) = u^*_{rk}L + w^*_{rk}Q + \left(1-\frac{u^*_{rk}}{p_{rk}}\right)R_{rk}(z^*) + \left(1-\frac{w^*_{rk}}{q_{rk}}\right)C_{rk}(z^*), \qquad (5.74)$$
and to the dual objective over just the set of tasks $\mathcal{A}$ as
$$D_{rk}(z^*,\mathcal{A}) = u^*_{rk}\,\varepsilon L + w^*_{rk}\,\varepsilon Q + \left(1-\frac{u^*_{rk}}{p_{rk}}\right)R_{rk}(z^*,\mathcal{A}) + \left(1-\frac{w^*_{rk}}{q_{rk}}\right)C_{rk}(z^*,\mathcal{A}). \qquad (5.75)$$
Let xjrk(z∗) be the scheduling decision obtained by applying z∗ in the presence of load
constraints. We define S(z∗) as the profit obtained while applying z∗ to the entire set of tasks
in the presence of load constraints. We can write S(z∗) as follows.
$$S(z^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j=1}^{M} p_{rk}\,t_{jrk}\,x_{jrk}(z^*) + \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j=1}^{M} q_{rk}\,d_{jrk}\,x_{jrk}(z^*) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\min\{p_{rk}L + C_{rk}(z^*),\ q_{rk}Q + R_{rk}(z^*),\ p_{rk}L + q_{rk}Q,\ Y_{rk}(z^*)\}. \qquad (5.76)$$
This is because the tasks are divisible and TDOT assigns a fraction of a task to ensure that L and Q are exactly met when $\sum_{j=1}^{M} t_{jrk}\,y_{jrk}(z^*) > L$ and $\sum_{j=1}^{M} d_{jrk}\,y_{jrk}(z^*) > Q$. We again define the contribution of processor r in CS k to $S(z^*)$ as
$$S_{rk}(z^*) = \min\{p_{rk}L + C_{rk}(z^*),\ q_{rk}Q + R_{rk}(z^*),\ p_{rk}L + q_{rk}Q,\ Y_{rk}(z^*)\}. \qquad (5.77)$$
The profit obtained by TDOT can be written as
$$S(z^*,\mathcal{A}^c) = \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j\in\mathcal{A}^c} p_{rk}\,t_{jrk}\,x_{jrk}(z^*) + \sum_{k=1}^{K}\sum_{r=1}^{P_k}\sum_{j\in\mathcal{A}^c} q_{rk}\,d_{jrk}\,x_{jrk}(z^*), \qquad (5.78)$$
where the contribution of processor r in CS k to $S(z^*,\mathcal{A}^c)$ is
$$S_{rk}(z^*,\mathcal{A}^c) = \min\{(1-\varepsilon)p_{rk}L + C_{rk}(z^*,\mathcal{A}^c),\ (1-\varepsilon)q_{rk}Q + R_{rk}(z^*,\mathcal{A}^c),\ (1-\varepsilon)p_{rk}L + (1-\varepsilon)q_{rk}Q,\ Y_{rk}(z^*,\mathcal{A}^c)\}. \qquad (5.79)$$
For simplicity, for the rest of the proof, we drop $z^*$ from all functions. We also define some $s_{rk}$, ∀r, k, such that
$$\max\big\{|R_{rk}(\mathcal{A}) + C_{rk}(\mathcal{A}) - \varepsilon R_{rk} - \varepsilon C_{rk}|,\ |R_{rk}(\mathcal{A}) - \varepsilon R_{rk}|,\ |C_{rk}(\mathcal{A}) - \varepsilon C_{rk}|,\ |(R_{rk}(\mathcal{A}) - \varepsilon R_{rk}) - (C_{rk}(\mathcal{A}) - \varepsilon C_{rk})|\big\} \le s_{rk}, \qquad (5.80)$$
and
$$\sum_{r,k} s_{rk} \le \varepsilon^2\sqrt{\frac{\lambda T}{M}}\,\max\{\mathrm{OPT},\, Y\}, \qquad (5.81)$$
which is given by the hypothesis of the lemma in (5.68). Additionally, we set $a_{rk} = s_{rk}/\varepsilon$.
We first prove that for all r, k,
$$\max\{Y_{rk},\, D_{rk}\} - S_{rk} \le a_{rk}. \qquad (5.82)$$
We consider the following four cases.
• Case 1: $u^*_{rk} > 0$ and $w^*_{rk} > 0$.
We can see, from (5.74), that
$$D_{rk} \le \max\{p_{rk}L + C_{rk},\ q_{rk}Q + R_{rk},\ p_{rk}L + q_{rk}Q,\ Y_{rk}\}. \qquad (5.83)$$
Consequently,
$$\max\{Y_{rk},\, D_{rk}\} - S_{rk} \le \max\{p_{rk}L + C_{rk},\ q_{rk}Q + R_{rk},\ p_{rk}L + q_{rk}Q,\ Y_{rk}\} - \min\{p_{rk}L + C_{rk},\ q_{rk}Q + R_{rk},\ p_{rk}L + q_{rk}Q,\ Y_{rk}\} \qquad (5.84)$$
$$\le \max\{|p_{rk}L + q_{rk}Q - Y_{rk}|,\ |q_{rk}Q - C_{rk}|,\ |p_{rk}L - R_{rk}|,\ |p_{rk}L + C_{rk} - q_{rk}Q - R_{rk}|\} \qquad (5.85)$$
$$\le \max\{|p_{rk}L + q_{rk}Q - Y_{rk}|,\ |q_{rk}Q - C_{rk}| + |p_{rk}L - R_{rk}|,\ |p_{rk}L + C_{rk} - q_{rk}Q - R_{rk}|\} \qquad (5.86)$$
Since $u^*_{rk} > 0$ and $w^*_{rk} > 0$, by the complementary slackness conditions on the LP for just the tasks in $\mathcal{A}$, we have $R_{rk}(\mathcal{A}) = \varepsilon p_{rk}L$ and $C_{rk}(\mathcal{A}) = \varepsilon q_{rk}Q$. Thus, from (5.80), we have
$$|\varepsilon p_{rk}L + \varepsilon q_{rk}Q - \varepsilon R_{rk} - \varepsilon C_{rk}| \le s_{rk}, \qquad (5.87)$$
which implies that $|p_{rk}L + q_{rk}Q - Y_{rk}| \le a_{rk}$. Similarly, we also have
$$|\varepsilon p_{rk}L + \varepsilon C_{rk} - \varepsilon q_{rk}Q - \varepsilon R_{rk}| \le s_{rk}, \qquad (5.88)$$
which implies that $|p_{rk}L + C_{rk} - q_{rk}Q - R_{rk}| \le a_{rk}$. Similarly, we can prove that
$$|q_{rk}Q - C_{rk}| \le a_{rk} \qquad (5.89)$$
and
$$|p_{rk}L - R_{rk}| \le a_{rk}. \qquad (5.90)$$
From (5.86), we can now prove (5.82).
• Case 2: $u^*_{rk} > 0$ and $w^*_{rk} = 0$. This implies that
$$D_{rk} = u^*_{rk}L + \left(1-\frac{u^*_{rk}}{p_{rk}}\right)R_{rk} + C_{rk}. \qquad (5.91)$$
Thus,
$$D_{rk} \le \max\{p_{rk}L + C_{rk},\ Y_{rk}\} \qquad (5.92)$$
$$\le \max\{p_{rk}L + C_{rk},\ q_{rk}Q + R_{rk},\ p_{rk}L + q_{rk}Q,\ Y_{rk}\}. \qquad (5.93)$$
Now, similar to Case 1, we can prove (5.82).
• Case 3: $u^*_{rk} = 0$ and $w^*_{rk} > 0$. This implies that
$$D_{rk} = w^*_{rk}Q + \left(1-\frac{w^*_{rk}}{q_{rk}}\right)C_{rk} + R_{rk}. \qquad (5.94)$$
Thus,
$$D_{rk} \le \max\{q_{rk}Q + R_{rk},\ Y_{rk}\} \qquad (5.95)$$
$$\le \max\{p_{rk}L + C_{rk},\ q_{rk}Q + R_{rk},\ p_{rk}L + q_{rk}Q,\ Y_{rk}\}. \qquad (5.96)$$
Now, similar to Case 1, we can prove (5.82).
• Case 4: $u^*_{rk} = 0$ and $w^*_{rk} = 0$. We have
$$D_{rk} = Y_{rk} \qquad (5.97)$$
and, from (5.80),
$$R_{rk} \le \frac{R_{rk}(\mathcal{A})}{\varepsilon} + a_{rk} \le p_{rk}L + a_{rk} \quad (\text{by complementary slackness}), \qquad (5.98)$$
$$C_{rk} \le \frac{C_{rk}(\mathcal{A})}{\varepsilon} + a_{rk} \le q_{rk}Q + a_{rk} \quad (\text{by complementary slackness}), \qquad (5.99)$$
$$Y_{rk} \le \frac{R_{rk}(\mathcal{A}) + C_{rk}(\mathcal{A})}{\varepsilon} + a_{rk} \le p_{rk}L + q_{rk}Q + a_{rk} \quad (\text{by complementary slackness}). \qquad (5.100)$$
Thus, we get
$$S_{rk} + a_{rk} = \min\{p_{rk}L + C_{rk} + a_{rk},\ q_{rk}Q + R_{rk} + a_{rk},\ p_{rk}L + q_{rk}Q + a_{rk},\ Y_{rk} + a_{rk}\} \ge Y_{rk} \quad (\text{from } (5.98)\text{–}(5.100)) \quad = D_{rk} \quad (\text{from } (5.97)). \qquad (5.101)$$
Hence, (5.82) is proven for Case 4.
We can sum (5.82) over r and k to obtain
$$\max\{Y,\, D\} - S \le \sum_{k=1}^{K}\sum_{r=1}^{P_k} a_{rk}. \qquad (5.102)$$
Note that S ≤ OPT ≤ D, by weak duality. Therefore, from (5.102) and using (5.81), we can see that
$$Y - \mathrm{OPT} \le \frac{1}{\varepsilon}\sum_{k=1}^{K}\sum_{r=1}^{P_k} s_{rk} \le \varepsilon\sqrt{\frac{\lambda T}{M}}\,Y. \qquad (5.103)$$
Again using weak duality and (5.102), we have $\mathrm{OPT} - S \le \varepsilon\sqrt{\frac{\lambda T}{M}}\,Y$. Consequently, using (5.103),
$$S \ge \frac{1 - 2\varepsilon\sqrt{\frac{\lambda T}{M}}}{1 - \varepsilon\sqrt{\frac{\lambda T}{M}}}\,\mathrm{OPT}. \qquad (5.104)$$
Now, from (5.80), since $R_{rk}(\mathcal{A}^c) + C_{rk}(\mathcal{A}^c) = R_{rk} + C_{rk} - R_{rk}(\mathcal{A}) - C_{rk}(\mathcal{A})$, we see that $R_{rk}(\mathcal{A}^c) + C_{rk}(\mathcal{A}^c) > (1-\varepsilon)(R_{rk} + C_{rk}) - s_{rk}$, taking both cases into consideration. Using this, we can see that
$$Y_{rk}(\mathcal{A}^c) > (1-\varepsilon)Y_{rk} - s_{rk}, \qquad (5.105)$$
$$C_{rk}(\mathcal{A}^c) > (1-\varepsilon)C_{rk} - s_{rk}, \qquad (5.106)$$
and
$$R_{rk}(\mathcal{A}^c) > (1-\varepsilon)R_{rk} - s_{rk}. \qquad (5.107)$$
Applying these to (5.79), we have
$$S_{rk}(\mathcal{A}^c) = \min\{(1-\varepsilon)p_{rk}L + C_{rk}(\mathcal{A}^c),\ (1-\varepsilon)q_{rk}Q + R_{rk}(\mathcal{A}^c),\ (1-\varepsilon)(p_{rk}L + q_{rk}Q),\ Y_{rk}(\mathcal{A}^c)\} \qquad (5.108)$$
$$\ge \min\{(1-\varepsilon)p_{rk}L + (1-\varepsilon)C_{rk} - s_{rk},\ (1-\varepsilon)q_{rk}Q + (1-\varepsilon)R_{rk} - s_{rk},\ (1-\varepsilon)(p_{rk}L + q_{rk}Q),\ (1-\varepsilon)Y_{rk} - s_{rk}\} \qquad (5.109)$$
$$\ge (1-\varepsilon)S_{rk} - s_{rk}. \qquad (5.110)$$
Summing over r and k,
$$S(\mathcal{A}^c) \ge \sum_{k=1}^{K}\sum_{r=1}^{P_k} (1-\varepsilon)S_{rk} \quad (\text{ignoring second-order terms}) \qquad (5.111)$$
$$= (1-\varepsilon)S \qquad (5.112)$$
$$\ge (1-\varepsilon)\,\frac{1 - 2\varepsilon\sqrt{\frac{\lambda T}{M}}}{1 - \varepsilon\sqrt{\frac{\lambda T}{M}}}\,\mathrm{OPT} \quad (\text{from } (5.104)) \qquad (5.113)$$
$$\ge \left(1 - \varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}. \qquad (5.114)$$
Theorem 9. If $\frac{\mathrm{OPT}}{c_{\max}} \ge \frac{KP_{\max}\ln(KP_{\max}|\Lambda|/\varepsilon)}{\varepsilon^3}$ for some Λ that is a (y, ε)-net, then we have
$$\mathbb{E}_M\big[\mathbb{E}[S(z^*,\mathcal{A}^c)\,|\,M]\big] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\lambda T}\,\mathbb{E}_M\!\left[\frac{1}{\sqrt{M}}\right]\right)\mathrm{OPT}.$$
Proof. For each 1 ≤ r ≤ $P_k$, 1 ≤ k ≤ K, and z ∈ Λ, we define a bad event $B_{r,k,z}$ as one that satisfies $\max\{|R_{rk}(z,\mathcal{A}) + C_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z) - \varepsilon C_{rk}(z)|,\ |R_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z)|,\ |C_{rk}(z,\mathcal{A}) - \varepsilon C_{rk}(z)|,\ |(R_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z)) - (C_{rk}(z,\mathcal{A}) - \varepsilon C_{rk}(z))|\} > s_{rk,z}$. Thus, the probability of a bad event is given by
$$\Pr(B_{r,k,z}) \le 4\delta, \qquad (5.115)$$
where
$$s_{rk,z} = \frac{2}{3}c_{\max}\ln\!\left(\frac{2}{\delta}\right) + \|Y_{rk}(z)\|_2\sqrt{2\left(\frac{\varepsilon\lambda T}{M}\right)\ln\frac{2}{\delta}}, \qquad (5.116)$$
for 0 < δ < 1. The form of (5.116) is inspired by Lemma 3 from [71], except that $|\mathcal{A}|/M \neq \varepsilon$ in our case due to the randomness of task arrivals. We use Lemma 3 from [71] to prove that
$$\Pr\big(|R_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z)| > p_{rk,z}\big) \le \delta, \qquad (5.117)$$
for
$$p_{rk,z} = \frac{2}{3}c_{\max}\ln\!\left(\frac{2}{\delta}\right) + \|R_{rk}(z)\|_2\sqrt{2\left(\frac{\varepsilon\lambda T}{M}\right)\ln\frac{2}{\delta}}. \qquad (5.118)$$
Since $p_{rk,z} \le s_{rk,z}$, this implies
$$\Pr(S_1) = \Pr\big(|R_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z)| > s_{rk,z}\big) \le \delta. \qquad (5.119)$$
Similarly, we can prove
$$\Pr(S_2) = \Pr\big(|C_{rk}(z,\mathcal{A}) - \varepsilon C_{rk}(z)| > s_{rk,z}\big) \le \delta, \qquad (5.120)$$
$$\Pr(S_3) = \Pr\big(|Y_{rk}(z,\mathcal{A}) - \varepsilon Y_{rk}(z)| > s_{rk,z}\big) \le \delta, \qquad (5.121)$$
and
$$\Pr(S_4) = \Pr\big(|(R_{rk}(z,\mathcal{A}) - \varepsilon R_{rk}(z)) - (C_{rk}(z,\mathcal{A}) - \varepsilon C_{rk}(z))| > s_{rk,z}\big) \le \delta. \qquad (5.122)$$
Thus, by the union bound,
$$\Pr(B_{r,k,z}) = \Pr(S_1 \cup S_2 \cup S_3 \cup S_4) \qquad (5.123)$$
$$\le \Pr(S_1) + \Pr(S_2) + \Pr(S_3) + \Pr(S_4) \qquad (5.124)$$
$$\le 4\delta. \qquad (5.125)$$
To prove our theorem, we set $\delta = \frac{\varepsilon}{KP_{\max}|\Lambda|}$. We show that the choice of $s_{rk,z}$ satisfies the hypothesis in Lemma 5. We sum up (5.116) over r and k. The first term on the right-hand side is bounded by $O\!\left(KP_{\max}c_{\max}\ln\frac{1}{\delta}\right)$. Since $\frac{\mathrm{OPT}}{c_{\max}} \ge \frac{KP_{\max}\ln(1/\delta)}{\varepsilon^3}$, this is less than $\varepsilon^3\,\mathrm{OPT}$. In order to bound the contribution of the second terms, we use the following two inequalities:
$$\|Y_{rk}(z)\|_2 \le \sqrt{c_{\max}\,Y_{rk}(z)} \qquad (5.126)$$
and
$$\sum_{r,k}\sqrt{Y_{rk}(z)} \le \sqrt{KP_{\max}\sum_{r,k}Y_{rk}(z)} = \sqrt{KP_{\max}\,Y(z)}. \qquad (5.127)$$
Now, combining these, we have
$$\sum_{r,k}\|Y_{rk}(z)\|_2\sqrt{2\left(\frac{\varepsilon\lambda T}{M}\right)\ln\!\left(\frac{2}{\delta}\right)} \le \sqrt{KP_{\max}c_{\max}\,\frac{\varepsilon\lambda T}{M}\ln\!\left(\frac{1}{\delta}\right)Y(z)} \le \sqrt{\frac{\varepsilon^4\lambda T}{M}\,\mathrm{OPT}\,Y(z)}, \qquad (5.128)$$
where the last inequality is due to $\frac{\mathrm{OPT}}{c_{\max}} \ge \frac{KP_{\max}\ln(1/\delta)}{\varepsilon^3}$. Thus, we see that
$$\sum_{r,k} s_{rk,z} \le \varepsilon^2\sqrt{\frac{\lambda T}{M}}\,\max\{\mathrm{OPT},\, Y\}. \qquad (5.129)$$
Hence, setting $s_{rk} = s_{rk,z}$ gives us (5.81).
Suppose $z^* \in \Lambda$. Then, using the fact that $\Pr(B_{r,k,z}) \le \frac{\varepsilon}{KP_{\max}|\Lambda|}$ and by simply applying a union bound over all z ∈ Λ and r, k, we have that with probability ≥ 1 − ε, none of the events $B_{r,k,z}$ happen. Thus, we can apply Lemma 5 to conclude that
$$\mathbb{E}[S(z^*,\mathcal{A}^c)\,|\,M] \ge (1-\varepsilon)\left(1 - \varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT} \qquad (5.130)$$
$$\ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}. \qquad (5.131)$$
Alternatively, suppose $z^* \notin \Lambda$. Because Λ is a (y, ε)-net, there exists $z' \in \Lambda$ such that $|y_{jrk}(z^*) - y_{jrk}(z')| \le \varepsilon$. Consequently, we can prove that
$$|R_{rk}(z',\mathcal{A}) - R_{rk}(z^*,\mathcal{A})| \le \sum_{j\in\mathcal{A}} p_{rk}\,t_{jrk}\,|y_{jrk}(z') - y_{jrk}(z^*)| \le \varepsilon C(\mathcal{A}),$$
where $C(\mathcal{A})$ is a constant for the set of tasks $\mathcal{A}$. Similarly, we have a constant C for the entire set of tasks. Now, by using this and the triangle inequality, we have
$$|R_{rk}(z^*,\mathcal{A}) - \varepsilon R_{rk}(z^*)| \le |R_{rk}(z',\mathcal{A}) - \varepsilon R_{rk}(z')| + |R_{rk}(z',\mathcal{A}) - R_{rk}(z^*,\mathcal{A})| + \varepsilon|R_{rk}(z') - R_{rk}(z^*)| \le s_{rk,z'} + \varepsilon C(\mathcal{A}) + \varepsilon^2 C \qquad (5.132)$$
$$\le s_{rk,z'} + O(\varepsilon^2 C). \qquad (5.133)$$
We can similarly prove bounds for $|C_{rk}(z^*,\mathcal{A}) - \varepsilon C_{rk}(z^*)|$, $|Y_{rk}(z^*,\mathcal{A}) - \varepsilon Y_{rk}(z^*)|$, and $|(R_{rk}(z^*,\mathcal{A}) - \varepsilon R_{rk}(z^*)) - (C_{rk}(z^*,\mathcal{A}) - \varepsilon C_{rk}(z^*))|$. Summing over r and k, and applying Lemma 5, we again have
$$\mathbb{E}[S(z^*,\mathcal{A}^c)\,|\,M] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}. \qquad (5.134)$$
Combining both cases, we get $\mathbb{E}[S(z^*,\mathcal{A}^c)\,|\,M] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\frac{\lambda T}{M}}\right)\mathrm{OPT}$.
We then take the expectation over M to obtain
$$\mathbb{E}_M\big[\mathbb{E}[S(z^*,\mathcal{A}^c)\,|\,M]\big] \ge \left(1 - 2\varepsilon - \varepsilon\sqrt{\lambda T}\,\mathbb{E}_M\!\left[\frac{1}{\sqrt{M}}\right]\right)\mathrm{OPT}. \qquad (5.135)$$
5.4.4 TDOT-G with Data Requirements
TDOT-G proposed in Section 5.3 can be modified to accommodate task data requirements by
modifying just Step 3a as follows:
Step 3a: Schedule each task to processor
$$(r',k') = \arg\max_{r,k\in\mathcal{P}}\left\{\left(1-\frac{u^*_{rk}}{p_{rk}}\right)p_{rk}t_{jrk} + \left(1-\frac{w^*_{rk}}{q_{rk}}\right)q_{rk}d_{jrk}\right\}, \qquad (5.136)$$
only if both (1 − ε)L and (1 − ε)Q are not violated on processor r′ in CS k′. If (1 − ε)L or (1 − ε)Q is violated on processor r′ in CS k′, go to Step 3b. This violation is checked by using load variables $l_{rk}$ and $m_{rk}$, ∀r, k, similar to TDOT.
5.5 Simulation Results
We investigate the performance of our proposed algorithms through extensive simulation, using i.i.d. task data as well as Google cluster traces with practical parameter values. We present the comparison targets and simulation setup in Sections 5.5.1 and 5.5.2, respectively. We study the non-training set profit performance on randomly-generated i.i.d. tasks in Section 5.5.3, Google-cluster tasks in Section 5.5.4, and the entire set of tasks in Section 5.5.5.
5.5.1 Comparison Targets
We use the following comparison targets to evaluate the performance of TDOT and TDOT-G:
• Logistic Regression - Greedy (LR-G): We use
$$(r',k') = \arg\max_{r,k}\{(p_{rk} - u_{rk})t_{jrk} + (q_{rk} - w_{rk})d_{jrk}\}$$
as the training labels for each task j in the training set $\mathcal{A}$. We then perform multi-class classification using logistic regression [84] to obtain the label for each non-training task, and schedule the task to the corresponding processor as long as (1 − ε)L and (1 − ε)Q are not violated. Else, we use a technique similar to TDOT-G. We set $\mathcal{P}$ to be the remaining set of processors. We schedule task j to processor $(r',k') = \arg\max_{r,k\in\mathcal{P}} p_{rk}t_{jrk} + q_{rk}d_{jrk}$, if (1 − ε)L and (1 − ε)Q are not violated on processor r′ in CS k′. If either is violated, we remove the processor from $\mathcal{P}$ and repeat until the task is scheduled or all processors are exhausted.
• Naive Bayes: We prepare the training labels in a manner similar to LR-G above, and per-
form Naive Bayes classification to obtain labels for non-training tasks. We then schedule
the task to the corresponding processor as long as (1− ε)L and (1− ε)Q are not violated.
Else, we discard the task.
• Support Vector Machine (SVM): Similar to Naive Bayes above, but we use SVM classifier
instead to obtain labels for non-training tasks.
• Greedy Algorithm: This is similar to the 'G' portion of TDOT-G. We set $\mathcal{P}$ to be the entire set of processors. We schedule task j to processor $(r',k') = \arg\max_{r,k\in\mathcal{P}} p_{rk}t_{jrk} + q_{rk}d_{jrk}$, if (1 − ε)L and (1 − ε)Q are not violated on processor r′ in CS k′. If either is violated, we remove the processor from $\mathcal{P}$ and repeat until the task is scheduled or all processors are exhausted.
• Upper Bound Offline: Solve formulation (5.61) to obtain an upper bound.
Based on whether we plot the profit on just the non-training set Ac or on the overall set of
tasks, we modify these comparison targets accordingly. For fair comparison, in Section 5.5.5
we ensure that every comparison target obtains profit on the training set as well.
5.5.2 Simulation Setup and Task Requirements
We consider two different CSs with two processors each. The profits are set to p11 = 0.5,
p12 = 0.7, p21 = 0.3, and p22 = 0.3, and q11 = 0.2, q12 = 0.2, q21 = 0.1, and q22 = 0.1. We set
default values of system duration D = 3000, task arrival rate λ = 0.1, maximum processing
load L = 3500, maximum data load Q = 4000 for Google-cluster tasks and Q = 1500 for i.i.d.
tasks, and ε = 0.2.
• Randomly-generated i.i.d. tasks: We draw the task processing requirements tjrk, ∀r, k, and
task data requirements djrk, ∀r, k, from independent and identically distributed (i.i.d.)
uniform distributions over [5, 20] and [2, 10], respectively. We set default values of system
duration D = 3000, task arrival rate λ = 0.1, maximum processing load L = 3500, maximum
data load Q = 800, and ε = 0.2.
• Google-cluster tasks: We use the task events information from Google cluster data [79] to
obtain the task arrival times, and compute the average number of task arrivals per unit time as λ = 1/(average inter-arrival time)
Figure 5.2: Effect of arrival rate λ on non-training set profit for i.i.d. tasks (curves: TDOT-G, TDOT, LR-G, Greedy, Naive Bayes, SVM, Upper Bound Offline)
from these values. We consider Poisson task arrival at the controller, so that the total
number of tasks that arrive within duration T is a Poisson random variable with mean
λT . We also use the task usage information from [79], i.e., task start times and end times,
to obtain task processing times. We set mean task processing time mj = (task end time
- task start time). We then consider processors with different relative speeds, α11 = 1,
α12 = 2, α21 = 1.5, and α22 = 1.5, to obtain varied processing times on different processors.
Furthermore, we add ±50% randomness to the task processing times and data requirements
to simulate unrelated processors/resources. We set default
values of system duration D = 3000, maximum processing load L = 2500, maximum data
load Q = 600, and ε = 0.2.
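The randomly-generated i.i.d. workload above can be produced by a short script. The following sketch is our own illustration (the helper name and task representation are hypothetical): inter-arrival times are drawn as i.i.d. exponentials so that arrivals form a Poisson process of rate λ, and each task draws its per-processor requirements from the stated uniform distributions.

```python
import random

def generate_iid_tasks(duration=3000, lam=0.1,
                       processors=((1, 1), (1, 2), (2, 1), (2, 2)),
                       t_range=(5, 20), d_range=(2, 10), seed=0):
    """Poisson arrivals of rate lam over [0, duration]: inter-arrival
    times are i.i.d. Exponential(lam).  Each task draws its per-processor
    processing requirement t_jrk ~ U(5, 20) and data requirement
    d_jrk ~ U(2, 10), independently across processors and tasks."""
    rng = random.Random(seed)
    tasks, now = [], 0.0
    while True:
        now += rng.expovariate(lam)      # next arrival time
        if now > duration:
            break
        tasks.append({
            "arrival": now,
            "t": {rk: rng.uniform(*t_range) for rk in processors},
            "d": {rk: rng.uniform(*d_range) for rk in processors},
        })
    return tasks
```

Under this model, the total number of tasks arriving within a duration T is a Poisson random variable with mean λT, matching the arrival model described for the Google-cluster trace as well.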
In the implementation of TDOT, while picking processor $(r', k') = \arg\max_{r,k} (p_{rk} - u^*_{rk})t_{jrk}$,
we break ties uniformly at random if there are multiple processors that give the maximum value
within a tolerance of 0.001.
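This randomized tie-break amounts to the following small helper (a sketch; the function name is ours, with the 0.001 tolerance from the text as the default):

```python
import random

def argmax_with_tiebreak(scores, tol=1e-3, rng=random):
    """Return a key attaining the maximum of `scores`, chosen uniformly
    at random among all keys whose score is within tol of the maximum."""
    best = max(scores.values())
    near_best = [k for k, v in scores.items() if best - v <= tol]
    return rng.choice(near_best)
```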
In Sections 5.5.3 and 5.5.4, we analyze the profit obtained on the non-training set of tasks,
Ac, for i.i.d tasks and Google-cluster tasks respectively. In Section 5.5.5, we then analyze the
overall profit obtained.
5.5.3 I.I.D. Tasks
In this section, we consider the case where task processing times and data requirements are
drawn from i.i.d. uniform distributions (tjrk ∼ U(5, 20) and djrk ∼ U(2, 10)). According to
the performance bound proven in Theorem 9 for i.i.d. task requirements, TDOT approaches
near-optimal performance for increasing values of arrival rate λ and number of tasks M , and
decreasing values of ε. In order to study this, we plot the non-training profit versus λ in Figure
5.2. As we increase λ in the x-axis, we also decrease ε proportionally to maintain a constant
training-set size, i.e., ⌊ελT⌋. The other values are set to the defaults indicated in Section
5.5.2. We observe that TDOT outperforms the other alternatives, particularly as the arrival rate and
Figure 5.3: Effect of max. data load Q on non-training set profit for i.i.d. tasks (curves: TDOT-G, TDOT, LR-G, Greedy, Naive Bayes, SVM, Upper Bound Offline)
consequently the total number of tasks increases. This is in line with the proven performance
bound. Thus, for a fixed training set size, as we increase the non-training set size, we expect
TDOT to approach near-optimality.
In Figure 5.3, we vary the data load Q for the fixed default values of L, λ, and ε. We see that
TDOT and TDOT-G achieve superior non-training-set profit relative to the other alternatives.
Thus, a reasonable guideline is to pick TDOT if the CSP may obtain profit from
partially-completed tasks, and TDOT-G if not.
While these figures validate the proven theorems and indicate the effectiveness of the pro-
posed algorithms for i.i.d. task requirements, we also wish to evaluate the performance of the
schemes under more practical settings. Towards this end, in the following sections, we consider
tasks obtained from Google cluster data [79].
5.5.4 Google-cluster Tasks
Figures 5.4 and 5.5 show the non-training profit versus the processing load L and the data
load Q, respectively. The other values are set to the defaults indicated in Section 5.5.2.
We see that TDOT again outperforms the other algorithms, particularly for tighter L and Q
load values. This makes sense particularly because TDOT allows profit to be obtained from
partially-completed tasks and this effect is prominent when the loads are tight. On the other
hand, TDOT-G performs better for loose loads, and is the alternative to choose if the CSP can
obtain profit only from fully-completed tasks. Both TDOT and TDOT-G have low computational
complexity compared to alternatives such as LR-G, Naive Bayes, and SVM, since they only involve
solving a linear program to find the weights, rather than training a classifier by convex
gradient descent.
5.5.5 Overall Profit and ε Values
Although we use the first ⌊ελT⌋ tasks for training purposes, in practice some profit can
be made on these tasks. We use the Greedy Algorithm presented in Section 5.5.1 applied to the
Figure 5.4: Effect of max. processing load L on non-training set profit for Google-cluster tasks (curves: TDOT-G, TDOT, LR-G, Greedy, Naive Bayes, SVM, Upper Bound Offline)
Figure 5.5: Effect of max. data load Q on non-training set profit for Google-cluster tasks (curves: TDOT-G, TDOT, LR-G, Greedy, Naive Bayes, SVM, Upper Bound Offline)
Figure 5.6: Effect of ε on overall profit for Google-cluster tasks (curves: TDOT-G, TDOT, LR-G, Greedy, Naive Bayes, SVM)
training set of tasks. Adding this profit obtained on the training set to the profit obtained on
the non-training set, from Section 5.5.4, gives us the overall profit. Thus, for fair comparison,
we add this training-set profit to TDOT, TDOT-G, LR-G, SVM, and Naive Bayes. Both the greedy
and upper-bound solutions obtain profit on the entire set of tasks, by definition.
Figure 5.6 shows the overall profit performance for different values of ε using Google-cluster
task data similar to Section 5.5.4. We see that the best profit is achieved for ε = 0.3, and
TDOT-G exhibits superior performance for all values of ε relative to other alternatives.
We see that the upper-bound offline and greedy solutions have performance that is independent
of ε, as expected. However, a value of ε = 0.2 produces the best profit performance on
average when using TDOT and TDOT-G. This suggests that training on around 20% of the expected
number of tasks in this setting gives the algorithm enough tasks to learn well while leaving
enough tasks to exploit the benefit of training. We note that TDOT
and TDOT-G still outperform the other online alternatives.
5.6 Summary
We study the online scheduling of tasks to multiple cloud servers with an objective to maximize
profit subject to load constraints. The processors in our model are heterogeneous and unary-
capacity, and the tasks arrive dynamically, resulting in a challenging problem. We have proposed
a polynomial-time TDOT algorithm that consists of a training phase and an exploitation phase
to obtain effective scheduling solutions. We provided a performance bound for TDOT under
the assumption that profit can also be obtained on partially-completed tasks if the load is
already met. We also proposed a modified algorithm, TDOT-G, for implementations where
profit can only be obtained on fully-completed tasks. Through trace-driven simulation, we saw
that TDOT and TDOT-G consistently outperform the comparison targets and can be tuned to
exhibit near-optimal performance.
Chapter 6
Concluding Remarks
6.1 Conclusions
In this dissertation, we have studied task offloading and scheduling in cloud computing
environments. We have investigated different optimization problems, making scheduling
decisions that minimize cost or completion time, or maximize profit, and we have proposed
efficient algorithms to solve these challenging NP-hard problems.
In Chapter 3, we have investigated the scheduling of applications consisting of dependent
tasks on heterogeneous processors with communication delay and application completion dead-
lines. The proposed cost minimization formulation is generic, allowing different cost structures
and processor topologies. To overcome the obstacles of task dependency and deadline con-
straint, we have developed the ITAGS approach, where the scheduling of each task is assisted
by an individual time allowance obtained from a binary-relaxed version of the original opti-
mization problem. Through trace-driven and randomized simulations, we show that ITAGS
substantially outperforms a wide range of known algorithms. Furthermore, as the deadline
constraint is relaxed, it converges to optimality much faster than other alternatives.
In Chapter 4, we have considered a multi-user computational offloading problem, for a
system consisting of a finite-capacity cloud with heterogeneous processors and tasks with het-
erogeneous release times, processing times, and communication times. The offloaded tasks incur
monetary cost for the cloud resource usage and each user has a budget constraint. We have
formulated a problem to minimize the weighted sum completion time subject to the user budget
constraints. We have proposed the STUBR algorithm and proved performance guarantees for it.
Using trace-driven simulation, we have compared it against existing alternatives and observed
that STUBR is scalable and substantially outperforms them, especially for larger systems.
In Chapter 5, we have addressed the online scheduling of tasks to multiple cloud servers
with an objective to maximize profit subject to load constraints. The processors in our model
are heterogeneous and unary-capacity, and the tasks arrive dynamically, resulting in a chal-
lenging problem. We have proposed a polynomial-time TDOT algorithm that consists of a
training phase and an exploitation phase to obtain effective scheduling solutions. We provided
a performance bound for TDOT under the assumption that profit can also be obtained on
partially-completed tasks if the load is already met. We also proposed a modified algorithm,
TDOT-G, for implementations where profit can only be obtained on fully-completed tasks.
Through trace-driven simulation, we saw that TDOT and TDOT-G consistently outperform
the comparison targets and can be tuned to exhibit near-optimal performance.
For all these three problems considered in this dissertation, we proposed efficient and effec-
tive algorithms and evaluated the performance of the algorithms through mathematical analysis
as well as trace-driven simulation results.
6.2 Future Directions
6.2.1 Task Scheduling in the Presence of Zero Task Information
In Chapter 5, we consider the problem where tasks arrive over time and we know the task
processing time once the task arrives at the controller. We then utilize this information in
order to make better task scheduling decisions. However, we are also interested in considering
the more practical problem where we know the task processing time only after making the
scheduling decision and executing the task. We still expect a learning algorithm to perform
well, and could potentially use a training-exploitation technique similar to TDOT, but it would
need to be modified to cope with the lack of information.
6.2.2 Online Dependent-Task Scheduling
A further extension would be to incorporate dependencies or priorities among online task
arrivals. So far, we have assumed that tasks arriving online are independent and can be executed
in any order. But it is possible that multiple tasks belong to a particular application and
need to be executed in a certain order, or we may associate certain tasks with higher priority
if they are urgent. Accounting for these while scheduling tasks will allow us to create a more
robust and practical scheduling scheme.
6.2.3 Caching
In all the chapters of this dissertation, we assume that the scheduling decision needs to be
computed each time for an application or an arriving task. However, some computation time
and power can be saved if these decisions can be cached and used to make future decisions
when the same application or task arrives. Incorporating caching in our scheduling will require
an additional layer of sophistication from our algorithms.
6.2.4 Straggler Nodes
A processor or node that performs poorly or more slowly than anticipated, due to issues such
as faulty hardware, is called a straggler node [85]. In our work, we have assumed that the
processors are all well-behaved. However, scheduling algorithms can be implemented such
that they take the straggler nodes into account and preempt or restart tasks to manage these
situations.
6.2.5 Fuzzy Load Constraint
In Chapter 5, we consider load constraints for each processor that are strict and deterministic.
A more practical and interesting model might be to consider the case where these constraints
are probabilistic, or can be violated in some situations (e.g., in order to complete the
execution of a task that has already begun).
Bibliography
[1] R. Kakerow, “Low power design methodologies for mobile communication,” in Computer
Design: VLSI in Computers and Processors, 2002. Proceedings. 2002 IEEE International
Conference on. IEEE, 2002, pp. 8–13.
[2] L. D. Paulson, “Low-power chips for high-powered handhelds,” Computer, vol. 36, no. 1,
pp. 21–23, 2003.
[3] J. W. Davis, “Power benchmark strategy for systems employing power management,”
in Electronics and the Environment, 1993., Proceedings of the 1993 IEEE International
Symposium on. IEEE, 1993, pp. 117–119.
[4] R. N. Mayo and P. Ranganathan, “Energy consumption in mobile devices: why future
systems need requirements–aware energy scale-down,” in Power-Aware Computer Systems.
Springer, 2003, pp. 26–40.
[5] H. T. Dinh, C. Lee, D. Niyato, and P. Wang, “A survey of mobile cloud computing: archi-
tecture, applications, and approaches,” Wireless communications and mobile computing,
vol. 13, no. 18, pp. 1587–1611, 2013.
[6] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case for VM-based cloudlets
in mobile computing,” IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, 2009.
[7] E. Cuervo, A. Balasubramanian, D.-k. Cho, A. Wolman, S. Saroiu, R. Chandra, and
P. Bahl, “Maui: making smartphones last longer with code offload,” in Proc. ACM Inter-
national Conference on Mobile Systems, Applications, and Services (MobiSys), 2010.
[8] D. Kovachev, T. Yu, and R. Klamma, “Adaptive computation offloading from mobile
devices into the cloud,” in Parallel and Distributed Processing with Applications (ISPA),
2012 IEEE 10th International Symposium on. IEEE, 2012, pp. 784–791.
[9] N. Vallina-Rodriguez and J. Crowcroft, “Erdos: achieving energy savings in mobile os,” in
Proc. ACM workshop on MobiArch, pp. 37–42, 2011.
[10] M. Satyanarayanan, “Mobile computing: the next decade,” ACM SIGMOBILE Mobile
Computing and Communications Review, vol. 15, no. 2, pp. 2–10, 2011.
[11] B. Liang, “Mobile edge computing,” in Key Technologies for 5G Wireless Systems, V. W.
S. Wong, R. Schober, D. W. K. Ng, and L.-C. Wang, Eds., Cambridge University Press,
2017.
[12] S. Kosta, A. Aucinas, P. Hui, R. Mortier, and X. Zhang, “Thinkair: Dynamic resource
allocation and parallel execution in the cloud for mobile code offloading,” in Proc. IEEE
INFOCOM, 2012.
[13] W. Zhang, Y. Wen, and D. O. Wu, “Energy-efficient scheduling policy for collaborative
execution in mobile cloud computing,” in Proc. IEEE INFOCOM, 2013.
[14] B. Y.-H. Kao and B. Krishnamachari, “Optimizing mobile computational offloading with
delay constraints,” in Proc. IEEE GLOBECOM, 2014.
[15] Amazon EC2, “Pricing of on-demand instances,” https://aws.amazon.com/ec2/pricing/on-demand/.
[16] S. Sundar and B. Liang, “Offloading dependent tasks with communication delay and dead-
line constraint,” in Proc. IEEE Conference on Computer Communications (INFOCOM),
2018.
[17] S. Sundar, J. P. Champati, and B. Liang, “Completion time minimization in multi-user task
scheduling with heterogeneous processors and budget constraints,” in Proc. IEEE/ACM
International Symposium on Quality of Service (IWQoS), Short Paper, 2018.
[18] S. Sundar and B. Liang, “Individual time allocation with greedy scheduling for offloading
dependent tasks with communication delay,” under review IEEE Transactions on Cloud
Computing (TCC), 2018.
[19] S. Sundar, J. P. Champati, and B. Liang, “Multi-user task offloading to heterogeneous
processors with communication delay and budget constraints,” under review IEEE Trans-
actions on Cloud Computing (TCC), 2018.
[20] S. Sundar and B. Liang, “Task dispatch through online training for profit maximization
at the cloud,” in Proc. IEEE INFOCOM Workshop on Network Intelligence, 2018.
[21] Y.-H. Kao, B. Krishnamachari, M.-R. Ra, and F. Bai, “Hermes: Latency optimal task
assignment for resource-constrained mobile computing,” in Proc. IEEE INFOCOM, 2015.
[22] K. Habak, M. Ammar, K. A. Harras, and E. Zegura, “Femto clouds: Leveraging mobile
devices to provide cloud service at the edge,” in Proc. IEEE CLOUD, 2015.
[23] S. Sundar and B. Liang, “Communication augmented latest possible scheduling for cloud
computing with delay constraint and task dependency,” in Proc. IEEE INFOCOM Work-
shop on Green and Sustainable Networking and Computing (GSNC 2016), 2016.
[24] X. Lin, Y. Wang, Q. Xie, and M. Pedram, “Task scheduling with dynamic voltage and
frequency scaling for energy minimization in the mobile cloud computing environment,”
IEEE Transactions on Services Computing, vol. 8, no. 2, pp. 175–186, 2015.
[25] L. A. Hall, A. S. Schulz, D. B. Shmoys, and J. Wein, “Scheduling to minimize average com-
pletion time: Off-line and on-line approximation algorithms,” Mathematics of Operations
Research, vol. 22, no. 3, pp. 513–544, 1997.
[26] B.-G. Chun and P. Maniatis, “Augmented smartphone applications through clone cloud
execution.” in HotOS, vol. 9, 2009, pp. 8–11.
[27] Y. Wen, W. Zhang, and H. Luo, “Energy-optimal mobile application execution: Taming
resource-poor mobile devices with cloud clones,” in Proc. IEEE INFOCOM, 2012.
[28] P. Balakrishnan and C.-K. Tham, “Energy-efficient mapping and scheduling of task inter-
action graphs for code offloading in mobile cloud computing,” in Proc. IEEE/ACM 6th
International Conference on Utility and Cloud Computing, pp. 34–41, 2013.
[29] J. Flinn, D. Narayanan, and M. Satyanarayanan, “Self-tuned remote execution for per-
vasive computing,” in Hot Topics in Operating Systems, 2001. Proceedings of the Eighth
Workshop on. IEEE, 2001, pp. 61–66.
[30] J. Flinn, S. Y. Park, and M. Satyanarayanan, “Balancing performance, energy, and qual-
ity in pervasive computing,” in Distributed Computing Systems, 2002. Proceedings. 22nd
International Conference on. IEEE, 2002, pp. 217–226.
[31] B.-G. Chun, S. Ihm, P. Maniatis, M. Naik, and A. Patti, “Clonecloud: elastic execution
between mobile device and cloud,” in Proceedings of the sixth conference on Computer
systems. ACM, 2011, pp. 301–314.
[32] R. K. Balan, M. Satyanarayanan, S. Y. Park, and T. Okoshi, “Tactics-based remote exe-
cution for mobile computing,” in Proceedings of the 1st international conference on Mobile
systems, applications and services. ACM, 2003, pp. 273–286.
[33] M. Satyanarayanan, P. Bahl, R. Caceres, and N. Davies, “The case for VM-based cloudlets
in mobile computing,” IEEE Pervasive Computing, vol. 8, no. 4, pp. 14–23, Oct. 2009.
[34] F. Bonomi, R. Milito, J. Zhu, and S. Addepalli, “Fog computing and its role in the internet
of things,” in Proc. ACM SIGCOMM Workshop on Mobile Cloud Computing (MCC), 2012.
[35] ETSI, “Mobile edge computing (MEC); framework and reference architecture,” ETSI GS
MEC 003 V1.1.1, 2016.
[36] Y. Mao, C. You, J. Zhang, K. Huang, and K. B. Letaief, “A survey on mobile edge
computing: The communication perspective,” IEEE Communications Surveys & Tutorials,
vol. 19, no. 4, pp. 2322–2358, 2017.
[37] R. K. Balan, D. Gergle, M. Satyanarayanan, and J. Herbsleb, “Simplifying cyber foraging
for mobile devices,” in Proc. ACM MobiSys, 2007.
[38] A. Mtibaa, K. A. Harras, and A. Fahim, “Towards computational offloading in mobile
device clouds,” in Proc. IEEE International Conference on Cloud Computing Technology
and Science (CloudCom), 2013.
[39] A. Mtibaa, K. A. Harras, K. Habak, M. Ammar, and E. W. Zegura, “Towards mobile
opportunistic computing,” in Proc. IEEE CLOUD, 2015.
[40] C. Mateos, E. Pacini, and C. G. Garino, “An ACO-inspired algorithm for minimizing
weighted flowtime in cloud-based parameter sweep experiments,” Advances in Engineering
Software, vol. 56, pp. 38–50, 2013.
[41] B. Zhou, A. V. Dastjerdi, R. N. Calheiros, S. N. Srirama, and R. Buyya, “A context sen-
sitive offloading scheme for mobile cloud computing service,” in Proc. IEEE International
Conference on Cloud Computing (CLOUD), 2015.
[42] M.-H. Chen, B. Liang, and M. Dong, “A semidefinite relaxation approach to mobile cloud
offloading with computing access point,” in Proc. IEEE Signal Processing Advances in
Wireless Communications (SPAWC), 2015.
[43] Y. Mao, J. Zhang, and K. B. Letaief, “Joint task offloading scheduling and transmit power
allocation for mobile-edge computing systems,” in Proc. IEEE Wireless Communications
and Networking Conference (WCNC), pp. 1–6, 2017.
[44] X. Chen, L. Jiao, W. Li, and X. Fu, “Efficient multi-user computation offloading for mobile-
edge cloud computing,” IEEE/ACM Transactions on Networking, vol. 24, no. 5, pp. 2795–
2808, 2016.
[45] V. Cardellini, V. D. N. Persone, V. Di Valerio, F. Facchinei, V. Grassi, F. L. Presti,
and V. Piccialli, “A game-theoretic approach to computation offloading in mobile cloud
computing,” Mathematical Programming, vol. 157, no. 2, pp. 421–449, 2016.
[46] L. Tianze, W. Muqing, Z. Min, and L. Wenxing, “An overhead-optimizing task scheduling
strategy for ad-hoc based mobile edge computing,” IEEE Access, vol. 5, pp. 5609–5622,
2017.
[47] M.-H. Chen, B. Liang, and M. Dong, “Multi-user multi-task offloading and resource allo-
cation in mobile cloud systems,” IEEE Transactions on Wireless Communications, vol. 17,
no. 10, pp. 6790–6805, 2018.
[48] Z. Li, C. Wang, and R. Xu, “Computation offloading to save energy on handheld devices:
a partition scheme,” in Proc. ACM International Conference on Compilers, Architecture,
and Synthesis for Embedded Systems, 2001.
[49] M. Jia, J. Cao, and L. Yang, “Heuristic offloading of concurrent tasks for computation-
intensive applications in mobile cloud computing,” in Proc. IEEE INFOCOM Workshops,
2014.
[50] L. Yang, J. Cao, and H. Cheng, “Resource constrained multi-user computation partitioning
for interactive mobile cloud applications,” Technical report, Dept. of Computing, Hong
Kong Polytechnic Univ, 2012.
[51] M.-A. Hassan Abdel-Jabbar, I. Kacem, and S. Martin, “Unrelated parallel machines with
precedence constraints: application to cloud computing,” in Proc. IEEE CLOUDNET,
2014.
[52] Y.-H. Kao, B. Krishnamachari, M.-R. Ra, and F. Bai, “Hermes: Latency optimal task
assignment for resource-constrained mobile computing,” IEEE Transactions on Mobile
Computing, vol. 16, no. 11, pp. 3056–3069, 2017.
[53] S. Guo, B. Xiao, Y. Yang, and Y. Yang, “Energy-efficient dynamic offloading and re-
source scheduling in mobile cloud computing,” in Proc. IEEE International Conference on
Computer Communications (INFOCOM), 2016.
[54] C. Wang and Z. Li, “Parametric analysis for adaptive computation offloading,” ACM
SIGPLAN Notices, vol. 39, no. 6, pp. 119–130, 2004.
[55] Y. Zhang, H. Liu, L. Jiao, and X. Fu, “To offload or not to offload: an efficient code
partition algorithm for mobile cloud computing,” in Proc. IEEE CLOUDNET, 2012.
[56] M.-R. Ra, A. Sheth, L. Mummert, P. Pillai, D. Wetherall, and R. Govindan, “Odessa:
enabling interactive perception applications on mobile devices,” in Proceedings of the 9th
international conference on Mobile systems, applications, and services. ACM, 2011, pp.
43–56.
[57] J. Liu, Y. Mao, J. Zhang, and K. B. Letaief, “Delay-optimal computation task scheduling
for mobile-edge computing systems,” in Proc. IEEE International Symposium on Informa-
tion Theory (ISIT), 2016.
[58] L. Zhang, C. Wu, Z. Li, C. Guo, M. Chen, and F. C. Lau, “Moving big data to the cloud:
An online cost-minimizing approach,” IEEE Journal on Selected Areas in Communications,
vol. 31, no. 12, pp. 2710–2721, 2013.
[59] Z. Peng, D. Cui, J. Zuo, Q. Li, B. Xu, and W. Lin, “Random task scheduling scheme
based on reinforcement learning in cloud computing,” Cluster computing, vol. 18, no. 4,
pp. 1595–1607, 2015.
[60] J. P. Champati and B. Liang, “One-restart algorithm for scheduling and offloading in a
hybrid cloud,” in Proc. IEEE International Symposium on Quality of Service (IWQoS),
2015.
[61] Y. Fang, F. Wang, and J. Ge, “A task scheduling algorithm based on load balancing in
cloud computing,” in Proc. International Conference on Web Information Systems and
Mining, 2010.
[62] H. Goudarzi and M. Pedram, “Maximizing profit in cloud computing system via resource
allocation,” in Proc. IEEE Distributed Computing Systems Workshops (ICDCSW), 2011.
[63] Y. Chen, N. Zhang, Y. Zhang, and X. Chen, “Dynamic computation offloading in edge
computing for internet of things,” IEEE Internet of Things Journal, 2019.
[64] J. Li, M. Qiu, Z. Ming, G. Quan, X. Qin, and Z. Gu, “Online optimization for scheduling
preemptable tasks on iaas cloud systems,” Journal of Parallel and Distributed Computing,
vol. 72, no. 5, pp. 666–677, 2012.
[65] D. P. Williamson and D. B. Shmoys, The Design of Approximation Algorithms, 1st ed.
New York, NY, USA: Cambridge University Press, 2011.
[66] D. B. Shmoys and E. Tardos, “An approximation algorithm for the generalized assignment
problem,” Mathematical Programming, vol. 62, no. 1-3, pp. 461–474, 1993.
[67] M. Chrobak, W. Jawor, J. Sgall, and T. Tichy, “Online scheduling of equal-length jobs:
Randomization and restarts help,” SIAM Journal on Computing, vol. 36, no. 6, pp. 1709–
1728, 2007.
[68] B. Kalyanasundaram and K. Pruhs, “Maximizing job completions online,” in Proc. Euro-
pean Symposium on Algorithms, 1998.
[69] S. Shalev-Shwartz, “Online learning and online convex optimization,” Foundations
and Trends® in Machine Learning, vol. 4, no. 2, pp. 107–194, 2012.
[70] M.-F. Balcan, A. Blum, J. D. Hartline, and Y. Mansour, “Mechanism design via machine
learning,” in Proc. IEEE Symposium on Foundations of Computer Science (FOCS), 2005, pp. 605–614.
[71] N. R. Devanur and T. P. Hayes, “The adwords problem: online keyword matching with
budgeted bidders under random permutations,” in Proc. ACM Conference on Electronic
Commerce, 2009.
[72] H. Topcuoglu, S. Hariri, and M.-y. Wu, “Performance-effective and low-complexity task
scheduling for heterogeneous computing,” IEEE Transactions on Parallel and Distributed
Systems, vol. 13, no. 3, pp. 260–274, 2002.
[73] F. A. Potra and S. J. Wright, “Interior-point methods,” Journal of Computational and
Applied Mathematics, vol. 124, no. 1, pp. 281–302, 2000.
[74] B. Flipsen, J. Geraedts, A. Reinders, C. Bakker, I. Dafnomilis, and A. Gudadhe, “Envi-
ronmental sizing of smartphone batteries,” in Proc. IEEE Electronics Goes Green (EGG),
pp. 1–9, 2012.
[75] Z. Qiu, C. Stein, and Y. Zhong, “Minimizing the total weighted completion time of coflows
in datacenter networks,” in Proc. ACM symposium on Parallelism in Algorithms and Ar-
chitectures, 2015.
[76] W. E. Smith, “Various optimizers for single-stage production,” Naval Research Logistics,
vol. 3, no. 1-2, pp. 59–66, 1956.
[77] N. Karmarkar, “A new polynomial-time algorithm for linear programming,” in Proc. ACM
Symposium on Theory of Computing, 1984.
[78] H. W. Kuhn, “The hungarian method for the assignment problem,” Naval Research Logis-
tics, vol. 2, no. 1-2, pp. 83–97, 1955.
[79] J. Wilkes, “More google cluster data,” Available at https://github.com/google/cluster-data,
Nov. 2011.
[80] J. P. Champati and B. Liang, “Semi-online algorithms for computational task offload-
ing with communication delay,” IEEE Transactions on Parallel and Distributed Systems,
vol. 28, no. 4, pp. 1189–1201, 2017.
[81] J. P. Champati and B. Liang, “Single restart with time stamps for computational offload-
ing in a semi-online setting,” in Proc. IEEE Conference on Computer Communications
(INFOCOM), 2017.
[82] G. Aggarwal, G. Goel, C. Karande, and A. Mehta, “Online vertex-weighted bipartite
matching and single-bid budgeted allocations,” in Proc. ACM-SIAM Symposium on Dis-
crete Algorithms, 2011.
[83] E. L. Grab and I. R. Savage, “Tables of the expected value of 1/X for positive Bernoulli
and Poisson variables,” Journal of the American Statistical Association, vol. 49, no. 265,
pp. 169–177, 1954.
[84] B. Krishnapuram, L. Carin, M. A. Figueiredo, and A. J. Hartemink, “Sparse multinomial
logistic regression: Fast algorithms and generalization bounds,” IEEE Transactions on
Pattern Analysis & Machine Intelligence, no. 6, pp. 957–968, 2005.
[85] M. Zaharia, A. Konwinski, A. D. Joseph, R. H. Katz, and I. Stoica, “Improving mapreduce
performance in heterogeneous environments.” USENIX Symposium on Operating Systems
Design and Implementation, vol. 8, no. 4, p. 7, 2008.