MOBILE OFFLOADING FOR ENERGY-EFFICIENT
COMPUTATION ON SMARTPHONES
BY
LIYAO XIANG
A THESIS SUBMITTED IN CONFORMITY WITH THE REQUIREMENTS
FOR THE DEGREE OF MASTER OF APPLIED SCIENCE, DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING,
AT THE UNIVERSITY OF TORONTO.
COPYRIGHT © 2015 BY LIYAO XIANG. ALL RIGHTS RESERVED.
Mobile Offloading for Energy-efficient Computation on Smartphones
Master of Applied Science Thesis
Edward S. Rogers Sr. Dept. of Electrical and Computer Engineering
University of Toronto
by Liyao Xiang
2015
Abstract
Mobile offloading enables mobile devices to distribute computation-intensive tasks to the cloud
or other devices for energy conservation or performance gains. In principle, the idea is to trade
the relatively low communication energy expense for high computation power consumption. In
this thesis, we first focus on the technique of mobile code offloading to the cloud by proposing
the new technique of coalesced offloading, which exploits the potential for multiple applica-
tions to coordinate their offloading requests with the objective of saving additional energy on
mobile devices. We then turn our attention to collaborative mobile computing, where a group of
mobile users with a common target job form coalitions to reduce the overall energy cost. We
propose distributed collaboration strategies through game theory, formulating the problem as
a non-transferable utility coalitional game and solving it with merge-and-split rules.
To my family
Acknowledgments

First, I would like to express my gratitude towards my advisor, Professor Baochun Li. Without
his guidance, support, and patience, my master's study could not have been completed so smoothly.
Throughout the research and thesis writing process, he provided insightful advice on exploring
some interesting and promising visions, as well as sound suggestions on technical writing.
I am also thankful to all dear members in iQua Research Group at the University of Toronto,
who not only offered practical suggestions to my research but also created a stimulating and
delightful environment in which to learn and grow. Special regards to Chen Feng, who recently
took a tenure-track faculty position at the University of British Columbia, for his funny jokes
and wise advice.
Last but not least, I would like to give my genuine regards to my parents, who love and
support me unconditionally throughout the entire journey. They are the source of my happiness
and power.
Contents

Abstract
Acknowledgments
List of Tables
List of Figures

1 Introduction
  1.1 Mobile Code Offloading
  1.2 Collaborative Mobile Computing
  1.3 Thesis Organization

2 Related Work
  2.1 Mobile Code Offloading
  2.2 Collaborative Mobile Computing

3 Coalesced Offloading from Mobile Devices to the Cloud
  3.1 Overview
  3.2 Motivation and Problem Formulation
    3.2.1 Motivation
    3.2.2 The Coalesced Offloading Problem
  3.3 Coalesced Offloading: an Offline Solution
    3.3.1 From Continuous-Time to Discrete-Time Formulation
    3.3.2 Optimal Offline Algorithm
  3.4 Ready, Set, Go: Online Algorithms
    3.4.1 The Dynamic TCP Acknowledgment Problem
    3.4.2 The Online Algorithm Aθ
    3.4.3 Deterministic Online Algorithm: Performance Analysis
    3.4.4 Performance Analysis of Aθ
  3.5 Performance Evaluation
    3.5.1 Measuring the Tail Time
    3.5.2 Model-Driven Evaluation
    3.5.3 Experiments on the Mobile Phone
  3.6 Summary

4 Coalition Formation for Collaborative Mobile Computing
  4.1 Overview
  4.2 The Collaborative Computing Model
  4.3 Task Distribution for Mobile Applications
  4.4 Coalition Formation among Mobile Users
    4.4.1 Coalitional Game and Properties
    4.4.2 Coalition Formation Algorithm
    4.4.3 Stability Analysis
  4.5 Simulation Results and Analysis
    4.5.1 Simulation Setup
    4.5.2 Performance Evaluation
  4.6 Summary

5 Conclusion

Bibliography

List of Tables

3.1 Energy cost reduction compared with the naive strategy.

List of Figures

3.1 The benefits of coalesced offloading.
3.2 The coalesced offloading problem: an illustrative example.
3.3 The online algorithm A1.
3.4 The proof of the competitive ratio of Aθ.
3.5 The proof of Lemma 3.2 (to prove the competitive ratio of Aθ).
3.6 (a) The fcost of offloading requests with different levels of fluctuations. (b) The estimated energy cost with varying α.
3.7 (c) The energy consumption on the mobile device with varying α. (d) The request transmissions on the mobile device w/o the RSG algorithm.
3.8 (e) The request transmissions on the mobile device w/o the RSG algorithm. (f) The battery voltage change as measured on the mobile device. The top figure shows the result of the naive strategy, and the bottom one shows the result of the RSG algorithm.
4.1 The workflow of an example job.
4.2 The resource graph.
4.3 An example of mapping jobs to a set of mobile devices.
4.4 Average energy cost per user when users are non-cooperative, or run the centralized and merge-and-split algorithms using Pareto order and utilitarian order. (a) User connecting ratio 0.10. (b) User connecting ratio 0.35. (c) User connecting ratio 0.60. (d) User connecting ratio 0.95.
4.5 Average coalition size when users run the centralized and merge-and-split algorithms. (a) User connecting ratio 0.10. (b) User connecting ratio 0.35. (c) User connecting ratio 0.60. (d) User connecting ratio 0.95.
4.6 Average running time when users run the centralized and merge-and-split algorithms.
Chapter 1
Introduction
Thin, user-interactive mobile devices and the powerful, remote cloud appear to sit at opposite
ends of the computing world. At one end, the mobile device boasts the advantages of mobility
and responsiveness to the user, but generally lacks computation power and battery capacity; at
the other end, the cloud service requires stable network connections, yet its computation power
appears infinite to its users. To exploit the advantages of both, it is natural to offload some
computation-intensive parts of a mobile application for execution on the cloud. As a
generalization of this idea, offloaded parts of any granularity can be regarded as tasks to be
executed on any other device. The goal of saving energy is achieved by trading the relatively
low energy consumption of network communication for the high power expense of intensive
computation.
In this thesis, we study two problems: the first asks how to save energy when offloading code
from a single device to the cloud; the second concerns collaboration among different mobile
devices to minimize the overall energy consumption. In what follows, we give a brief overview
of each problem, and present how this thesis is organized.
1.1 Mobile Code Offloading
Heralded as a primary feature in mobile cloud computing, code offloading from mobile devices
to the cloud has received a substantial amount of research attention in the recent literature. The
concept of code offloading is intuitively simple: with an abundance of computing power in the
cloud computing infrastructure and a keen awareness of power efficiency on mobile devices, it
is natural to offload a portion of the computational requests within computationally intensive
mobile applications. With its roots dating back to the notion of thin clients in the 1990s, code
offloading may be instrumental in a wide variety of mobile applications, from natural language
processing (e.g., Apple’s Siri) to augmented reality.
Code offloading can be performed at the granularity level of thread execution [1,2], method
invocation [3], and even full VM migration [4]. Either way, offloading requests have been well
planned, with the optimization objective of gaining better application performance and energy
efficiency. To achieve the objective, a typical solution includes a profiler on the mobile device
that collects runtime statistics of the mobile application, as well as a solver that partitions the
computation in a way that optimizes energy consumption or application performance.
However, existing works have so far focused on one application only. In reality, mainstream
mobile operating systems support multitasking, with multiple applications running simultane-
ously on a mobile device. Particularly, there may be several services running in the background
while one or two applications running on screen. A user may ask Siri (or Google Now) about
a location, viewing the augumented reality street view on her phone, while in the background
downloading a cloud-sourced video over 3G or 4G mobile networks at the same time. When
multiple applications send their offloading requests to the cloud independently without any
coordination, the cellular or Wi-Fi network interface needs to be activated to transmit these
requests, entering the high-power state at arbitrary times. This may potentially consume more
energy: once a network interface enters the high-power state, it lingers in this state for a pe-
riod of time, usually seconds, after completing the transmission of all the existing requests [5].
The amount of energy the network interface consumes in the high-power state before it enters
stand-by again, referred to as the tail energy, is proportional to the length of time the interface
stays in this state (referred to as the tail time).
During the tail time, the smartphone transmits nothing and merely waits for incoming
transmission requests, so that portion of energy is simply wasted. If
multiple applications send their transmission requests without careful scheduling, much energy
is wasted in the high-power state. In Chapter 3, we study the tail time phenomenon,
and propose algorithms to reduce such energy waste as much as possible without incurring too
much penalty in latency.
1.2 Collaborative Mobile Computing
Alongside the effort of computational offloading to the cloud, the trend of mobile users
seeking computing power outside their devices to assist with computation-intensive tasks is
increasingly popular. Cyber foraging and cloudlets [6] have long been proposed as a way to
liberate mobile devices from severe resource constraints. Apart from migrating computation to
the cloud, other features, such as Apple's recently released 'Continuity', have made
cross-platform application state migration possible, allowing users to smoothly move between
Mac OS X and iOS devices in close proximity when editing an email or answering a phone call.
Moreover, the recent literature on I/O sharing [7] between mobile systems further facilitates
collaboration between mobile devices, by permitting an application running on one mobile
system to access the I/O devices on another device. This points to a clear trend: more and
more tasks that previously could only be done on a single device are now designed to be
executed in a distributed fashion across interconnected devices and platforms. Indeed, many
prototype applications today, in areas such as social sensing, crowdsourcing, and content
sharing, need to recruit many devices to work jointly.
Moving from the powerful cloud computing infrastructure to nearby computational devices,
the strategy of assigning tasks is no longer a binary choice. When multiple users are involved,
the problem of finding a task arrangement that is optimal in terms of energy consumption is
NP-hard. A possible approach is to let the users make decisions in a distributed fashion and
form coalitions. In Chapter 4, the problem of collaborative mobile computing is studied with
a focus on task distribution and coalition formation.
1.3 Thesis Organization
The remainder of this thesis is organized as follows. In Chapter 2, we present the background
and related work. Chapter 3 discusses our energy-saving approach of coalescing offloaded
computational requests from devices to the cloud; both simulations and real-world experiments
on the iPhone are presented. Chapter 4 proposes a coalitional game for collaborative mobile
computing, and adopts merge-and-split rules to form the coalitions; the efficiency of the
algorithm is verified by simulations. In Chapter 5, we conclude our work and discuss
directions for future work.
Chapter 2
Related Work
In this chapter, we briefly review the related literature in the field of mobile cloud
computing, illustrate the differences between other work and ours, and motivate the chapters
that follow.
2.1 Mobile Code Offloading
Many existing works in the literature of code offloading between mobile devices and the cloud
only considered the optimal offloading choice of a single application. Works such as [1, 3] de-
cided at runtime which parts of the application are to be remotely executed with an optimization
engine, in order to achieve the best energy savings. Kosta et al. [8] developed a framework of
smartphone virtualization in the cloud, allowing method-level computation offloading. Gordon
et al. [2] used a distributed shared memory technique instead of remote procedure calls to sup-
port multi-threaded applications to run on multiple machines. However, it has been observed
that the on-and-off switching state of the network interface, incurred by offloading requests
from multiple simultaneously running applications, unnecessarily consumes much idle energy.
Without considering this aspect in the optimization framework, it is insufficient to discuss
code offloading for a single application alone.
Our work is also closely related to Balasubramanian et al. [5], as it found that 3G incurs a
high tail energy overhead for lingering in the high-power state after the completion of a transfer.
It also proposed a scheduling algorithm to minimize the energy consumed while meeting user-
specified deadlines. However, the scheme is only designed for delay-tolerant and prefetching
applications, without taking the length of delays into account.
Our online strategies are tied to the online algorithm literature [9–11]. The dynamic TCP
acknowledgment problem is a generalization of the classical ski rental problem with the same
competitive ratio. We show that our problem is a generalization of the dynamic TCP
acknowledgment problem, and we prove that our algorithms achieve the same competitive ratios
as in that special case, which are already known to be the best possible. A similar problem
arises in scheduling tasks to minimize the total power consumption [12], which presented an
effort to minimize the number of "gaps," i.e., idle periods, in application execution.
2.2 Collaborative Mobile Computing
Most previous works in the area of application migration or code offloading, such as MAUI [3],
CloneCloud [1], and ThinkAir [8], only consider offloading via the Internet to powerful
servers in the remote cloud. The cloud seems to be a perfect backend for smartphones, but the
long WAN latency often fails to meet the stringent delay requirements of mobile applications.
To counter the long WAN latency, the idea of a "Mobile Cloud" was proposed in [13, 14] to take
advantage of computing power in close proximity for execution speedups and energy savings.
Compared to offloading to the cloud, the "Mobile Cloud" is especially useful when Internet
access is expensive or unavailable. Our work is most closely related to this category in terms
of collaboration among multiple devices.
However, the setting of our work is different from Serendipity [15] and other work on sharing
computing resources: in those works, one initiator mobile device typically 'borrows' idle
computational resources available on other devices in its environment to accomplish tasks of a
certain structure. In this work, a device 'borrows' computation power from mobile devices that
have the same job objective and can share the workload. Works like [13, 16, 17] discuss
collaborative computation performed in a distributed fashion on a set of mobile devices, but
the incentives of participants are not taken into account. As a matter of fact, it is very hard
to define fairness in a situation where users contribute multiple resources (computation power
and network connections), and even harder to devise a proper incentive mechanism that
motivates everyone to contribute. In this work, we do not assume users are selfish in the sense
that each is purely motivated by its own benefit and switches between coalitions whenever it
sees a utility improvement: such an approach may reflect how users behave in the real world,
but it is impractical, taking too many iterations before all users reach an agreement. Instead,
we adopt a simple merge-and-split rule for each coalition to follow, in which coalitions can be
merged into one or split into any number of smaller coalitions when at least one user sees a
strict utility improvement while no other user's utility is hurt.
From the coalition formation perspective, our work is related to the cooperative media
streaming problem [18] and the coalition formation problem of UAVs in wireless networks [19].
While in cooperative streaming each mobile device contributes its bandwidth endowment in a
cooperative manner, in our work both the computation power and the connection links contribute
to a resource pool that is shared among the users in a coalition. The cooperative streaming
game is a game without transferable utility, while the UAV deployment problem is a
transferable utility game. The problem in our work is found to be a non-transferable utility
(NTU) game, where each user adjusts its strategy according to its energy expense, which cannot
be transferred across devices.
Chapter 3
Coalesced Offloading from Mobile Devices
to the Cloud
3.1 Overview
In this chapter, we propose the concept of coalesced offloading, which seeks to achieve
additional energy savings by exploiting the potential for multiple mobile applications to coordinate
their code offloading requests to the cloud. Coalesced offloading realizes the intuition that, by
sending code offloading requests in “bundles,” the period of time that the network interface
stays in the high-power state can be reduced, thus saving additional energy. Our proposed
technique of coalesced offloading is inspired by timer coalescing, used in the kernel of Mac
OS X 10.9 Mavericks, which improves energy efficiency by deferring and shifting computation
tasks from multiple applications into the same time interval. To our knowledge, our work
represents the first attempt to improve power efficiency by bundling offloading requests from
multiple applications in a coalesced fashion.
Since bundling offloading requests may incur additional offloading delays, we choose to
formulate the problem of coalesced offloading as a joint optimization problem, with both the
energy cost and the response time considered. The highlight of our original contributions is the
design of two online algorithms, collectively referred to as Ready, Set, Go (RSG), that are de-
signed to solve our optimization problem. As the benchmark for evaluating RSG, we first study
an offline algorithm that computes the optimal solution with a time complexity of O(n), with
the impractical assumption that the exact arrival times of future requests from all the applica-
tions are known a priori. Without any knowledge of upcoming offloading requests beforehand,
our deterministic online algorithm is 2-competitive against the optimal offline algorithm, and
our randomized online algorithm is e/(e− 1)-competitive (1.58-competitive). We analytically
show that both online algorithms achieve the best possible competitive ratios in their respective
cases. Our online algorithms are simple enough to implement: using both simulations and our
real-world implementation on the iOS platform, we show that the RSG online algorithm is able
to realize an additional energy saving of up to 20% for the deterministic case and 27% for the
randomized case with a variety of offloading request patterns.
3.2 Motivation and Problem Formulation
In this section, we first motivate the notion of coalesced offloading, and then formally formulate
the optimization problem of making optimal offloading decisions, considering both the energy
cost and application performance.
3.2.1 Motivation
With current code offloading techniques, if a portion of the application code (e.g., a method
invocation or a thread) is to be offloaded to the cloud, an offloading request will be generated,
and the cellular or Wi-Fi network interface on the mobile device will be activated, incurring
a small ramp-up energy cost, such as the WiFi association overhead. After the completion of
transmitting each request, the interface will not immediately switch to the low-power state.
Instead, it remains at the high-power state for tens of seconds — an inactive period referred to
as the tail time [5], as shown in Fig. 3.1 (a). If there is another request coming in during the
tail time, the inactivity timer will be reset, and the interface will stay at the high-power state
until the end of the transmission, plus another period of the tail time if there are no further
successive requests. The tail time phenomenon is especially critical with the 3G interface,
which consumes nearly 60% of the total energy consumption [5].
Figure 3.1: The benefits of coalesced offloading. (a) Before bundling: the requests of app 1 and app 2 are transmitted independently at times t1, . . . , t7. (b) After bundling: the requests are coalesced into transmissions at t2 (t1'), t3, t5 (t4'), and t7 (t6'), shortening the time spent in the high-power state.
As an important insight that we explore in this thesis, the tail time phenomenon can be
alleviated if we bundle the offloading requests into small batches, and handle them all together.
This reduces the energy consumption, as the wireless network interface on the mobile device
is activated fewer times and a shorter total tail time is incurred.
may not have frequent successive requests for code offloading; we focus on the abundant re-
quest bundling opportunities that exist when we consider the offloading requests from multiple
applications running on the device simultaneously. As the example in Fig. 3.1 (b) shows, three
bundles can be formed when the offloading requests of two applications are considered at the
same time, which leads to a shorter period of time for the network interface to stay in the
high-power state, compared to handling each request independently without any coordination.
Such request bundling from multiple applications is formally referred to as coalesced
offloading in this thesis. It requires all offloading requests to be granted by an OS-level
coalesced offloading framework, possibly with a delay, before application code is actually
offloaded to the cloud.
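To make the energy argument concrete, the following Python sketch compares the high-power occupancy time of uncoordinated versus bundled transmissions under the tail-time model described above. The tail time T and all request times are made-up illustrative values, not measurements from this thesis.

```python
# A minimal sketch of the tail-time model described above; the tail time T and
# the request times are made-up illustrative values, not measurements.

T = 10.0  # assumed tail time, in seconds

def high_power_time(transmissions, T):
    """Total time the interface spends in the high-power state when requests
    are sent at the given times, each transmission assumed instantaneous and
    followed by a tail of length T (cut short if another transmission occurs)."""
    ts = sorted(transmissions)
    total = sum(min(nxt - cur, T) for cur, nxt in zip(ts, ts[1:]))
    return total + T  # full tail after the last transmission

app1 = [0.0, 12.0, 30.0]  # offloading request times of application 1
app2 = [4.0, 18.0, 33.0]  # offloading request times of application 2

# Uncoordinated: every request is transmitted at its own arrival time.
uncoordinated = high_power_time(app1 + app2, T)
# Coalesced: nearby requests from both apps are bundled into three batches,
# each transmitted at the arrival time of its latest request (cf. Fig. 3.1(b)).
bundled = high_power_time([4.0, 18.0, 33.0], T)

print(uncoordinated, bundled)  # 41.0 30.0 — bundling reduces high-power time
```

Under this model, bundling strictly reduces the energy proxy whenever it reduces the number of separate tail periods; the latency cost of withholding requests is what the optimization in Section 3.2.2 trades off against.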
3.2.2 The Coalesced Offloading Problem
While coalesced offloading is able to reduce energy costs, request bundling requires the subse-
quent offloading requests to wait for a period of time for the next batch to be handled, which
results in additional offloading delays and may adversely affect the application performance.
The main challenge of coalesced offloading is balancing the tradeoff between the energy cost
and the application performance. If requests are bundled more aggressively, lower energy costs
are incurred, as a shorter period of time is spent in the high-power state for code offloading.
However, withholding the offloading requests will inevitably cause longer offloading delays.
On the other hand, sending offloading requests in a more scattered manner can maintain the
high performance of applications, but will incur a longer period of time in the high-power state,
causing more energy to be consumed. To find the “sweet spot” in such an inherent tradeoff be-
tween energy savings and application performance, we formulate the problem of coalesced
offloading as a joint optimization problem, considering both the energy cost and application
performance in the objective function.
We assume that there are M applications, 1, 2, . . . , M, running on the mobile device, and
each application generates multiple offloading requests during its runtime based on its own
profiler and solver. Let a1, a2, . . . be the arrival time sequence of the offloading requests
across all the applications, and g1, g2, . . . be the granting time sequence, each element
representing one transmission from the mobile device to the cloud. Notice that multiple
requests can be bundled and granted in one transmission. The granting time directly determines
the transition time from the low-power to the high-power state. The device transitions from
the high-power to the low-power state only when the network has been inactive for the length
of the tail time; that is, the interface enters the low-power state at least one tail time
after the preceding transmission. We use the sequences t1, t2, . . . and s1, s2, . . . to denote
the transition times at which the wireless interface enters the high-power state from the
low-power state, and vice versa, respectively. Let T be the duration of the tail time after
the completion of a transmission. Since the duration of a request transmission is a few orders
of magnitude shorter than the tail time (which is on the order of seconds), we assume that all
request transmissions are completed instantaneously.
Fig. 3.2 shows an illustrative example of our model. The offloading requests arrival time
sequence is a1, a2, . . . , a9, and the granting time sequence is g1, g2, . . . , g5. As we can see, two
offloading requests generated at time a1 and a2 are delayed to be transmitted at g1, the arrival
time of the third request a3, and the network interface goes into the high-power state. Since
the high-power state remains for at least T , requests generated at a4 and a5 are transmitted
immediately. The network interface transitions to the low-power state after idling for time T,
and enters the high-power state again at the next transmission time g4. In a nutshell, we seek
to find the optimal granting time sequence g1, g2, . . ., which determines when the wireless
interface of the mobile device should stay in the high-power state to transmit offloading
requests, such that a combined interest in both the energy cost and the application
performance is optimized.
Figure 3.2: The coalesced offloading problem: an illustrative example. (A power-state timeline with tail time T, showing arrivals a1, . . . , a9 and grants g1, . . . , g5, where a3(g1) = t1 and a8(g4) = t2, and s1 marks the transition back to the low-power state.)

Since the actual energy cost is nearly linear in the duration that the network interface stays
in the high-power state, in our problem formulation we use that time duration to represent the
energy cost.
Observation 3.1 If a transmission occurs at gi when the network interface is in the high-power
state, the energy cost is gi − gi−1. If gi occurs when the network interface is in the low-power
state, the energy cost can be considered as T .
To be more specific, when transmission gi occurs while the interface is in the high-power
state, it extends that state for a period of gi − gi−1. If gi occurs during the low-power
state, it contributes the tail time T to the energy cost. If gi − gi−1 > T, the high-power
state expires before gi, so gi occurs in the low-power state. Thus, the energy cost for one
transmission is min{gi − gi−1, T}. The joint optimization problem of coalesced offloading can
be formulated as follows:
\[
\min\; f_{\mathrm{cost}} \;=\; \sum_{j} \min\{g_j - g_{j-1},\, T\} \;+\; \alpha \sum_{j} \sum_{i:\; g_{j-1} \le a_i \le g_j} (g_j - a_i). \tag{3.1}
\]
In the objective function, the first term represents the energy cost while the second term denotes
the total latencies as offloading requests are postponed by the coalesced offloading framework.
α is introduced to combine the two objectives, and to balance the conflicting interests between
minimizing the energy costs and minimizing the total latencies for granting the offloading
requests.
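The objective can be evaluated directly. The sketch below assumes the first transmission always pays a full tail T (i.e., there is no earlier g0) and that each request is served by the earliest transmission at or after its arrival; the arrival times, grant times, T, and α are illustrative values, not from the thesis.

```python
# A sketch of objective (3.1). We assume the first transmission pays a full
# tail T, and each request is served by the earliest grant at or after its
# arrival; the arrival/grant times, T, and alpha are illustrative values.

def fcost(arrivals, grants, T, alpha):
    g = sorted(grants)
    # Energy term: each transmission extends the high-power state by at most T.
    energy = T + sum(min(g[j] - g[j - 1], T) for j in range(1, len(g)))
    # Latency term: each request waits from its arrival until it is granted.
    latency = sum(min(x for x in g if x >= a) - a for a in arrivals)
    return energy + alpha * latency

arrivals = [1.0, 2.0, 5.0, 9.0]
print(fcost(arrivals, grants=[5.0, 9.0], T=10.0, alpha=0.5))  # 17.5 (bundled)
print(fcost(arrivals, grants=arrivals, T=10.0, alpha=0.5))    # 18.0 (immediate)
```

With these values, bundling the first three requests at g = 5 beats granting every request at its own arrival; a larger α penalizes delay more heavily and eventually favors immediate granting.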
At first glance, our formulated problem is similar to the dynamic TCP acknowledgment
problem [9]. The dynamic TCP acknowledgment problem discusses the scenario when a num-
ber of subsequent messages are to be acknowledged, whether we should acknowledge each
individual message immediately upon receiving it, or acknowledge multiple messages with a
single acknowledgment packet. While we amortize the tail energy by delaying the offloading
requests, the dynamic TCP acknowledgment problem delays the acknowledgments to alleviate
the acknowledgment overhead. That said, the two problems are actually quite different. In our
problem, the tail energy is determined by the amount of time that the mobile device stays in
the high-power state, since the transmission time is negligible compared to the tail time; in
the dynamic TCP acknowledgment problem, the acknowledgment overhead mainly depends on the
number of acknowledgments.
3.3 Coalesced Offloading: an Offline Solution
In this section, we solve the optimization problem formulated above by first transforming it
into a discrete-time optimization problem, and then presenting an offline solution based on
dynamic programming.
3.3.1 From Continuous-Time to Discrete-Time Formulation
By carefully analyzing problem (3.1), we make the following key observation.

Observation 3.2 To minimize fcost, all offloading requests should be transmitted either at their respective arrival times or at the arrival times of other requests.
Proof We prove that any transmission schedule that does not satisfy the above condition increases the total cost. Suppose two requests arrive sequentially at times 〈a1, a2〉, and we are about to schedule their transmission time g with the objective of minimizing fcost. We have three choices: (1) g ∈ (0, a1], (2) g ∈ (a1, a2), (3) g ∈ [a2, ∞).
For the first case, the total cost is

fcost1 = (a1 − g) + min{a2 − a1, T} + T.

Obviously, when g = a1, the value is minimized:

fmin_cost1 = min{a2 − a1, T} + T.
In the second case, there is an α(t − a1) latency cost for the first request. Adding the same energy cost as in the first case, the total cost takes the form

fcost2 = min{a2 − t, T} + T + α(t − a1),  a1 < t < a2.

Similarly, the cost in the third case is

fcost3 = T + α(a2 − a1) + 2α(g − a2),

whose minimum value,

fmin_cost3 = T + α(a2 − a1),

is attained at g = a2.
From fcost2, we have:

• When a2 − t < T,

fcost2 = α(a2 − a1) + (1 − α)(a2 − t) + T.

Obviously, if α < 1, fcost2 > fmin_cost3. If α > 1, then

fcost2 > (t − a1) + (a2 − t) + T = a2 − a1 + T ≥ fmin_cost1.

• When a2 − t > T,

fcost2 = α(t − a1) + 2T > 2T ≥ fmin_cost1.
Therefore, no offloading request should be granted at times other than the arrival times if the total cost fcost is to be minimized. All requests are granted either at their own arrival times or at the arrival times of other requests.
Since tj indicates the time when the wireless interface enters the high-power state, after which requests are granted for transmission as they arrive, tj must equal the arrival time of one of the requests. Similarly, sj falls exactly one tail time T after the arrival time of some granted request. The original problem is thus equivalent to determining at which request's arrival the interface should be switched to the high-power state, and from which request's arrival no further transmissions should take place. For each request, the scheduling decision becomes whether to transmit it immediately or to wait until the next transmission. In this way, we transform the original problem of deciding when to power the network interface on and off into making a decision for each request upon its arrival: whether to send it out immediately or to delay it. We use 1 to represent transmitting the current request (with or without previously delayed requests), and 0 to represent the decision to delay the current request. The original problem (3.1) thus becomes that of deciding a binary transmission sequence 〈1, 0, 0, . . . , 1, . . .〉 for the successively arriving requests, such that the total cost fcost is minimized.
In a nutshell, if a request is granted immediately, its latency cost is 0, and the energy cost of transmitting the request arriving at ai is min{ai − gprev, T}, where gprev represents the preceding transmission time; if the request is delayed, it incurs only a latency cost, since it does not extend the tail time. Whenever a request is withheld from immediate transmission, its latency cost is α(gnext − ai), where gnext is the next transmission time. Thus, we have

f^i_cost = min{ai − gprev, T}, if granted;
           α(gnext − ai), if delayed.    (3.2)

Let fcost represent the sum of the energy and latency costs of transmitting the entire request sequence. We should minimize

fcost = Σ(i = 1 to n) f^i_cost,    (3.3)

over the 2^n possible binary transmission sequences, according to Eqn. (3.2).
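As a sketch, the per-request costs of Eqn. (3.2) can be summed directly for any candidate binary sequence; the function and variable names below are ours for illustration, and we assume the last request is granted:

```python
def schedule_cost(arrivals, seq, T, alpha):
    """Evaluate f_cost (Eqn. (3.3)) for a binary transmission sequence.

    A granted request (seq[i] == 1) pays the energy cost
    min{a_i - g_prev, T}, or a full tail T for the very first grant;
    a delayed request (seq[i] == 0) pays the latency cost
    alpha * (g_next - a_i). The last request is assumed granted.
    """
    grants = [a for a, s in zip(arrivals, seq) if s]
    cost, g_prev = 0.0, None
    for a, s in zip(arrivals, seq):
        if s:
            cost += T if g_prev is None else min(a - g_prev, T)
            g_prev = a
        else:
            # A delayed request waits for the next grant at or after it.
            g_next = next(g for g in grants if g >= a)
            cost += alpha * (g_next - a)
    return cost
```

For example, with a tail time of 9 and α = 0.3, granting both of two requests arriving at times 0 and 2 costs 9 + 2, while delaying the first one costs 0.3 · 2 + 9.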
The transformation from the resulting binary transmission sequence back into the time sets 〈t1, t2, . . . , tk〉 and 〈s1, s2, . . . , sk〉 is simple: let t1 be the arrival time of the first 1 appearing in the sequence. Whenever the interface enters the high-power state, a timer is set to T. If a request is granted before the timer counts down to 0, the timer is reset to T. s1 is the first time the timer counts down to 0. Whenever the binary transmission sequence turns to 1 again, we set the arrival time of that request as t2. In this way, we alternately determine the time sequence t1, t2, . . . of entering the high-power state and the time sequence s1, s2, . . . of leaving it.
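The timer procedure above can be sketched as follows (function and variable names are ours, for illustration):

```python
def to_power_intervals(arrivals, seq, T):
    """Recover the on/off time sets <t1,...,tk> and <s1,...,sk> from a
    binary transmission sequence: the timer is reset to T on every grant,
    and the interface leaves the high-power state when it reaches 0."""
    t, s = [], []
    expiry = None  # time the current tail would end; None = interface off
    for a, granted in zip(arrivals, seq):
        if not granted:
            continue  # delayed requests do not touch the timer
        if expiry is None or a > expiry:
            if expiry is not None:
                s.append(expiry)  # previous high-power period ended
            t.append(a)           # interface enters the high-power state
        expiry = a + T            # timer reset to T on each grant
    if expiry is not None:
        s.append(expiry)
    return t, s
```

With T = 9, granting requests arriving at 0, 2, and 20 yields on-times 〈0, 20〉 and off-times 〈11, 29〉: the second grant rides on the first tail, while the third opens a new high-power period.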
3.3.2 Optimal Offline Algorithm
We now present an optimal offline algorithm to solve problem (3.1), in which the arrival time sequence a1, a2, . . . , an is given a priori. The objective is to output a binary transmission sequence Seq[n] such that the total cost fcost is minimized. Though it depends on the unrealistic assumption of knowing the timing of all future requests, our offline algorithm serves as the benchmark for designing and evaluating our online algorithms.
We use dynamic programming to obtain an optimal offline algorithm with a time complexity of O(n^2) (the inner latency sums can be maintained incrementally). Let Cmin[i] be the minimum cost of the arrival time subsequence 〈a1, a2, . . . , ai〉 and Seq[i] be the corresponding binary transmission sequence. For an arrival time sequence of length i, there are 2^i possible binary transmission sequences; rather than enumerating them, the dynamic program implicitly considers all of them to obtain the one with the minimum cost.
With respect to the binary transmission sequence, we state the following facts that lead to the offline algorithm. If the last request must be transmitted, there are 2^(i−1) possible binary transmission sequences in total for the arrival time sequence 〈a1, a2, . . . , ai〉. If the granting time sequence of 〈a1, a2, . . . , ai−1〉 is a prefix of that of 〈a1, a2, . . . , ai〉, the cost of 〈a1, a2, . . . , ai〉 is the cost of 〈a1, a2, . . . , ai−1〉 plus min{ai − ai−1, T}. Otherwise, the last j − 1 requests before ai are delayed and granted together with ai, and the cost of 〈a1, a2, . . . , ai〉 is the sum of the cost of 〈a1, a2, . . . , ai−j〉, the latency costs of the requests from ai−j+1 to ai−1, and the tail time T. To obtain the granting time sequence minimizing the total cost, we therefore first find the minimum-cost granting time sequences of all prefixes of 〈a1, a2, . . . , ai〉. Our optimal offline algorithm is summarized in Algorithm 3.1.
Algorithm 3.1 The Offline Algorithm
Input: a1, a2, . . . , an
Output: Seq[n]
Initialize Cmin[0] = 0, Seq[0] = 〈〉
Initialize Cmin[1] = T, Seq[1] = 〈1〉
for i ∈ [2, n] do
    Cmin[i] = Cmin[i − 1] + min{ai − ai−1, T}
    Seq[i] = 〈Seq[i − 1], 1〉
    for j ∈ [2, i] do
        C[i] = Cmin[i − j] + α Σ(k = i−j+1 to i−1) (ai − ak) + T
        if C[i] < Cmin[i] then
            Cmin[i] = C[i]
            Seq[i] = 〈Seq[i − j], 0, . . . , 0 (j − 1 zeros), 1〉
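Algorithm 3.1 can be sketched in Python as follows; this is an illustrative implementation with our own function and variable names, not the thesis's code:

```python
def offline_schedule(arrivals, T, alpha):
    """A sketch of Algorithm 3.1: a dynamic program returning the minimum
    total cost and the corresponding binary transmission sequence, given
    all arrival times a priori."""
    n = len(arrivals)
    if n == 0:
        return 0.0, []
    C = [0.0] * (n + 1)           # C[i]: minimum cost of the first i arrivals
    seq = [[]] + [None] * n
    C[1], seq[1] = float(T), [1]
    for i in range(2, n + 1):
        ai = arrivals[i - 1]
        # Option 1: grant request i within the tail of request i-1.
        C[i] = C[i - 1] + min(ai - arrivals[i - 2], T)
        seq[i] = seq[i - 1] + [1]
        # Option 2: delay requests i-j+1 .. i-1 and grant them all with
        # request i, paying their latency plus a fresh tail T.
        for j in range(2, i + 1):
            latency = alpha * sum(ai - arrivals[k] for k in range(i - j, i - 1))
            c = C[i - j] + latency + T
            if c < C[i]:
                C[i], seq[i] = c, seq[i - j] + [0] * (j - 1) + [1]
    return C[n], seq[n]
```

With T = 9 and α = 0.3, two requests at times 0 and 2 are best served by delaying the first (cost 0.3 · 2 + 9 = 9.6 versus 9 + 2 = 11 for granting both).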
Theorem 3.1 The offline algorithm produces an optimal solution to the transformed coalesced offloading problem (3.3).

Proof It is known that if a problem possesses the optimal substructure property, then any dynamic programming algorithm that explores all subproblems is an optimal algorithm. To see that problem (3.3) possesses the optimal substructure property, we only need to note that if the sequence Seq[i] optimizes the total cost of input 〈a1, a2, . . . , ai〉, then its prefix Seq[i − j] must be an optimal solution for the subsequence 〈a1, a2, . . . , ai−j〉. Our algorithm explores all possible subproblems, and therefore obtains the optimal solution.
3.4 Ready, Set, Go: Online Algorithms
We are now ready to design Ready, Set, Go (RSG), our online algorithms to solve problem (3.1) without a priori knowledge of the arrival time sequence. We begin by considering algorithms that probabilistically vary the amount of latency, following an approach similar to that used for the dynamic TCP acknowledgment problem [9]. We show that the coalesced offloading problem is a generalization of this problem, which is itself known to be a generalization of the online ski rental problem.
3.4.1 The Dynamic TCP Acknowledgment Problem
The dynamic TCP acknowledgment problem is a generalization of the online ski rental problem, and takes the following form. The input is a sequence of packet arrival times a1, a2, . . . , an, and the output is a set of times t1, t2, . . . , tk at which an acknowledgment occurs. The latency is defined as the amount of time that elapses between a packet's arrival and its acknowledgment. The cost of each acknowledgment is 1. The objective is to minimize

k + Σ(1 ≤ j ≤ k) latency(j).

Karlin et al. proved that the randomized algorithm for this problem has an optimal competitive ratio of e/(e − 1).
We can see from the following analysis that the coalesced offloading problem is a generalization of the dynamic TCP acknowledgment problem. While the cost of each acknowledgment in the dynamic TCP acknowledgment problem is a constant, its counterpart in our problem, i.e., the energy consumption of each transmission, is a function of the previous transmission time. To show that the dynamic TCP acknowledgment problem is a special case of our problem, we only need to set T such that T < (gi − gi−1) for all i. The energy cost of each transmission is then a constant T. If we further set both T and α to 1, the energy cost is exactly the acknowledgment cost of the TCP problem, while the latency costs in the two problems are equivalent.
3.4.2 The Online Algorithm Aθ
Our algorithm Aθ is defined as follows.
Definition 3.1 Aθ is a randomized algorithm that selects θ between 0 and 1 according to the probability density function p(θ) = e^θ/(e − 1). Let R(t, t′) be the number of requests that arrive between times t and t′, and let g1, g2, . . . , gi, . . . be the times at which requests are granted and transmitted. Algorithm Aθ grants the next request at gi+1 such that there exists a time τi+1, gi < τi+1 < gi+1, satisfying

R(gi, τi+1)(gi+1 − τi+1) = (θ/α)Si,    (3.4)

where

Si = 2(min{τi+1 − gi, T} + min{gi+1 − τi+1, T}) − min{gi+1 − gi, T}.
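Since the CDF of this density, P(x) = (e^x − 1)/(e − 1), inverts in closed form, θ can be drawn by inverse-transform sampling; a sketch (function names are ours):

```python
import math
import random

def sample_theta(rng=random):
    """Draw theta from p(theta) = e^theta / (e - 1) on [0, 1] by
    inverse-transform sampling: P(x) = (e^x - 1)/(e - 1), so
    P^{-1}(u) = ln(1 + u * (e - 1)) for uniform u in [0, 1)."""
    u = rng.random()
    return math.log(1.0 + u * (math.e - 1.0))
```

The inverse maps u = 0 to θ = 0 and u = 1 to θ = 1, so every draw lands in [0, 1) as required.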
The intuition behind the equation above is simple: given the previous transmission occurring at time gi, an additional transmission happening at τi+1 would reduce the latency cost by θSi, while Si is essentially the increase in energy cost due to the additional transmission. It is easy to prove that

min{τi+1 − gi, T} + min{gi+1 − τi+1, T} ≥ min{gi+1 − gi, T},

so that

Si ≥ min{gi+1 − gi, T} ≥ min{gi+1 − τi+1, T}.    (3.5)
Fig. 3.3 helps to explain our algorithms and proofs. The x-axis represents time, and the y-axis represents the number of request arrivals. The staircase function indicates the arrival sequence of the requests, and the gi are the times at which bundles of requests are granted. The shaded area between the staircase curve and the dotted line represents the saved latency cost. Fig. 3.3 shows an example of the online algorithm A1, letting τ0 = g0 = 0.
Figure 3.3: The online algorithm A1.
3.4.3 Deterministic Online Algorithm: Performance Analysis
We prove that when θ = 1, the deterministic online algorithm A1 is 2-competitive against the
optimal algorithm AOPT.
Lemma 3.1 The optimal algorithm grants a request between any pair of successive transmis-
sions.
Proof Suppose that A1 grants requests at times g1, g2, . . . , gi, . . .. Enrich the sequence by adding a transmission at time τi+1, gi < τi+1 < gi+1, for every i, such that Eqn. (3.4) is satisfied. It is easy to see from Fig. 3.3 that, by adding a transmission at time τi+1, the latency cost decreases by at least min{gi+1 − τi+1, T} units between gi and gi+1, whereas the additional energy consumption incurred is at most min{gi+1 − τi+1, T} units. In this case, the new sequence is at least as good as the original one. The bound on the latency reduction follows from Eqn. (3.5). To see that the additional energy cost is at most min{gi+1 − τi+1, T}, recall that the additional transmission at τi+1 adds an energy cost of min{τi+1 − gi, T}, while the original energy cost of the transmission at gi+1, min{gi+1 − gi, T}, is updated to min{gi+1 − τi+1, T}. The net increase of the energy cost satisfies

min{τi+1 − gi, T} − min{gi+1 − gi, T} + min{gi+1 − τi+1, T} ≤ min{gi+1 − τi+1, T},

since τi+1 < gi+1. Hence, there exists an optimal sequence that grants requests at least once in the interval (gi, gi+1) for every i.
Theorem 3.2 Algorithm A1 is 2-competitive.
Proof Let the input be an arbitrary request arrival sequence. As Lemma 3.1 has shown, the optimal algorithm grants at least one request between any pair of successive transmissions. In Fig. 3.4, we use a ray instead of a staircase curve to simplify the presentation of the arrival times of the requests. The solid and dotted lines stand for the transmission sequences of the optimal algorithm AOPT and of A1, respectively. The shaded area below AOPT and above A1 is L(A1\AOPT), the latency cost incurred by A1 but not by AOPT. This area is bounded by the granting times of the optimal sequence on the left and the granting times of A1 on the right. Thus, by the definition of A1, we have

L(A1\AOPT) ≤ (1/α) Σi Si.

The total cost of A1 can then be calculated as follows:

CA1 = αL(A1) + E(A1)
    ≤ CAOPT + αL(A1\AOPT) − E(AOPT) + E(A1)
    ≤ CAOPT + α(1/α) Σi Si − Σi (min{gi+1 − τi+1, T} + min{τi+1 − gi, T}) + Σi min{gi+1 − gi, T}
    = CAOPT + Σi (min{gi+1 − τi+1, T} + min{τi+1 − gi, T})
    = CAOPT + E(AOPT)
    ≤ 2COPT.
3.4.4 Performance Analysis of Aθ
Theorem 3.3 The competitive ratio between the expected cost incurred by Aθ and the optimal
cost is e/(e− 1).
Proof We start by decomposing the total cost of Aθ. As illustrated in Fig. 3.4, L(Aθ\AOPT) is the latency incurred by Aθ but not by AOPT, shown as the dark shaded area above the dotted line and below the solid line. Likewise, L(AOPT\Aθ) stands for the latency incurred by AOPT but not by Aθ, shown as the light shaded area above the solid line and below the dotted line. The latency cost of Aθ is the area above the curve of Aθ and below the curve of request arrivals, which is at most the area above the solid curve, plus the dark shaded area, minus the light shaded area. Thus, the total cost satisfies

CAθ ≤ Eθ + (COPT − EOPT) + α[L(Aθ\AOPT) − L(AOPT\Aθ)],

letting Eθ and EOPT be the energy costs of Aθ and AOPT, respectively.

Figure 3.4: The proof of the competitive ratio of Aθ.

By the definition of Aθ, the dark shaded area satisfies

L(Aθ\AOPT) ≤ (θ/α) Σi Si = (θ/α)(2EOPT − Eθ).
For the light shaded area, we first prove the following lemma.

Lemma 3.2 The light shaded area L(AOPT\Aθ) satisfies:

αL(AOPT\Aθ) ≥ ∫_θ^1 E(x)dx − (1 − 2θ)EOPT − θEθ.    (3.6)

To prove the lemma, we make the following claim. Let M(E, θ) be the minimum, over all possible granting sequences W with energy cost E, of the area above W and below the Aθ curve, as shown in Fig. 3.5, where we omit the request arrivals and use a line to represent Aθ. We claim that, for any u > v ≥ θ,

M(Eu, θ) ≥ [(v − θ)/α](Ev − Eu) + M(Ev, θ).    (3.7)
Proof Let nu and nv represent the total numbers of grants made by algorithms Au and Av on the same input. The granting sequence is h1, h2, . . . , hnu for Au and g1, g2, . . . , gnv for Av. As shown in Fig. 3.5, the shaded rectangles of Av, constructed according to the definition of Av, intersect the Au curve at most nu times. Therefore, at least nv − nu shaded rectangles lie strictly above the curve of Au. Pick exactly nv − nu of them, denote each by its transmission sequence number i, and define the set of these rectangles as V∗. Let

S(V∗) = Σ(i ∈ V∗) Si.

The sum of the areas of the nv − nu rectangles in V∗ is then (v/α)S(V∗), and the part of this area that lies above the curve of Aθ is at most (θ/α)S(V∗). Thus, the shaded area below the Aθ curve is at least [(v − θ)/α]S(V∗), and this area lies strictly above the curve of Au. We next generate a new granting sequence g∗1, g∗2, . . . , g∗n with A∗v whose energy cost is exactly the same as that of the transmission sequence of Av; the newly generated sequence also issues a grant at τi for every i ∈ V∗. Thus, the shaded area that lies strictly below the curve of A∗v but above the curve of Au in Fig. 3.5 gives

M(Eu, θ) − M(Ev, θ) ≥ [(v − θ)/α]S(V∗).    (3.8)

Note that Ev in Eqn. (3.8) is the energy cost of the new granting sequence of A∗v, which equals the energy cost of Av. To obtain Eqn. (3.7), it remains to prove that

S(V∗) ≥ Ev − Eu.    (3.9)

Figure 3.5: The proof of Lemma 3.2 (to prove the competitive ratio of Aθ).
By Eqn. (3.5), we have

S(V∗) ≥ Σ(i ∈ V∗) min{gi+1 − gi, T}.

For the same input sequence, the total time spanned by the transmission sequences of Au and Av is the same:

Σ(i = 0 to nu−1) (hi+1 − hi) = Σ(i = 0 to nv−1) (gi+1 − gi).

Removing the nv − nu items in V∗ from the right-hand side and applying the minimum function to both sides, we get

Σ(i = 0 to nu−1) min{hi+1 − hi, T} ≥ Σ(j ∉ V∗) min{gj+1 − gj, T}.
Therefore,

S(V∗) ≥ Σ(i = 0 to nv−1) min{gi+1 − gi, T} − Σ(j ∉ V∗) min{gj+1 − gj, T}
      ≥ Σ(i = 0 to nv−1) min{gi+1 − gi, T} − Σ(i = 0 to nu−1) min{hi+1 − hi, T}
      = Ev − Eu.

Combining with Eqn. (3.8) gives us Eqn. (3.7).
Letting u = v + dv, Eqn. (3.7) can be rewritten as

M(Ev+dv, θ) ≥ [(v − 2θ)/α](Ev − Ev+dv) + M(Ev, θ).

Integrating from θ to t, for any θ < t ≤ 1, we have

∫_θ^t dM(Ev, θ) ≥ −∫_θ^t [(v − θ)/α] dEv + (θ/α) ∫_θ^t dEv,

which is equivalent to

M(Et, θ) − M(Eθ, θ) ≥ (1/α)[∫_θ^t Ev dv − (t − θ)Et] + (θ/α)(Et − Eθ).

By definition, M(Eθ, θ) = 0. Since Ev ≤ Et for v > t, we have

M(Et, θ) ≥ (1/α)[∫_θ^1 Ev dv − (1 − 2θ)Et − θEθ].

Let Et = EOPT, and recall that M(EOPT, θ) is a lower bound for L(AOPT\Aθ). Thus, Lemma 3.2 is proved.
We can now prove Theorem 3.3. By the definition of Aθ, θ is picked from [0, 1] according to the probability density function p(θ) = e^θ/(e − 1), with CDF P(x) = ∫_0^x p(θ)dθ = (e^x − 1)/(e − 1). Then

CAθ ≤ COPT − EOPT + ∫_0^1 p(θ)[Eθ + α(L(Aθ\AOPT) − L(AOPT\Aθ))]dθ
    ≤ COPT − EOPT + ∫_0^1 p(θ)[Eθ + 2θEOPT − θEθ − ∫_θ^1 Ex dx + (1 − 2θ)EOPT + θEθ]dθ
    = COPT − EOPT + ∫_0^1 p(θ)[Eθ − ∫_θ^1 Ex dx + EOPT]dθ
    = COPT + ∫_0^1 p(θ)Eθ dθ − ∫_0^1 p(θ) ∫_θ^1 Ex dx dθ
    = COPT + ∫_0^1 p(θ)Eθ dθ − ∫_0^1 Ex P(x) dx    (by changing the order of integration)
    = COPT + ∫_0^1 (p(θ) − P(θ))Eθ dθ.

At last, since p(θ) − P(θ) = 1/(e − 1) is a constant, we have

CAθ/COPT ≤ 1 + [∫_0^1 (p(θ) − P(θ))Eθ dθ] / [∫_0^1 Eθ dθ] = 1 + 1/(e − 1) = e/(e − 1).
3.5 Performance Evaluation
We evaluate both our offline and online algorithms using model-driven simulations and real-world experiments on a mobile device. We start by measuring the tail time used in our model, and evaluate the cost performance of both the offline and online algorithms. We then quantify the reduction in energy consumption achieved by the RSG algorithms using real-world runtime traces from mobile applications.
We run all of our real-world experiments on an iPhone 3GS with iOS 6.1.3, using the Bell Mobility 3G cellular network. To measure the energy consumption, we use PowerGremlin [20], a power usage monitor application, to record the run-time battery capacity (mAh) at a sample interval of one second. All of our measurements are performed under stable network conditions, with the mobile device running in a standalone environment in which all applications and background tasks other than our application-level prototype service are shut off, and with the screen off.
3.5.1 Measuring the Tail Time
Our methodology for measuring the tail time is as follows. We initially planned to use PowerGremlin to track the energy trace of sending a single packet; however, the resulting energy change turned out to be too subtle to detect. We therefore measure the tail time by transmitting successive packets of equal sizes at fixed time intervals. Our argument is that, when the interval is smaller than the tail time T, the 3G network interface is kept on from one transmission to the next, so varying the transmission interval makes no difference to the overall energy consumption. On the contrary, if the interval between transmissions is longer than the tail time T, the 3G network interface enters standby a time T after the completion of the last transmission, so the overall energy consumption is reduced.
To measure the 3G tail time, we generate stable sequential offloading requests over a period of 5 minutes. To eliminate the effect of varying transmission costs incurred by different packet sizes, we set the packets to be of equal sizes, and small enough to avoid a heavy transmission overhead. In our experiments, the time intervals between requests span from 3 to 17 seconds. Our measurement results are in accordance with our argument: the total energy consumed during the 5-minute period remains at the same level as the transmission interval varies from 3 to 9 seconds, but drops dramatically beyond 9 seconds. As a result, we take 9 seconds as the tail time of the 3G interface in iOS 6, and use this value in our subsequent simulations and experiments.
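The decision rule we applied to these measurements can be sketched as a simple threshold test; the data and the drop_ratio threshold below are illustrative, not our actual measurements:

```python
def estimate_tail_time(energy_by_interval, drop_ratio=0.9):
    """Pick the tail time from measurements of total energy per fixed
    transmission interval: the smallest interval whose total energy falls
    clearly below the plateau seen at the shortest intervals.
    (An illustrative heuristic; drop_ratio is an assumed threshold.)"""
    intervals = sorted(energy_by_interval)
    plateau = energy_by_interval[intervals[0]]
    for iv in intervals:
        if energy_by_interval[iv] < drop_ratio * plateau:
            return iv
    return None
```

If no interval shows a clear drop, the sweep did not reach the tail time and the function returns None, signaling that longer intervals should be tested.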
3.5.2 Model-Driven Evaluation
In our model-driven evaluations, the trace of offloading requests is a sequence of arrival times 〈a1, a2, . . . , ai, . . .〉, simulating the timing of multiple offloading requests from several simultaneously running applications. We categorize the request patterns into three types: low, medium, and high fluctuation. With these request sequences as input, we compare the total cost fcost of the online RSG algorithms against the benchmark of the offline algorithm. Each simulation result is averaged over 500 rounds of tests.
As Fig. 3.6-(a) shows, on average, the cost of the randomized online RSG algorithm is no more than 1.4009 times that of the benchmark offline algorithm, while the ratio of the cost of the deterministic online RSG algorithm to that of the offline algorithm is 1.4652. Both numbers are within the e/(e − 1) ≈ 1.58 and 2 competitive ratios analyzed previously. In addition, the randomized online algorithm generally achieves a better fcost when the fluctuation is higher, while the performance of the deterministic online algorithm remains almost the same across inputs.
Fig. 3.6-(b) compares the energy costs of the naive, deterministic online, randomized online, and offline algorithms as the weight factor α varies. The naive strategy is to send each offloading request upon its arrival. As stated previously, the energy cost is proportional to the time that the network interface stays in the high-power state, which we use to estimate the energy costs. As Fig. 3.6-(b) illustrates, the naive strategy incurs the highest energy cost for all values of α. The offline algorithm performs best when α is less than 1, but approaches the curve of the naive case as α increases. This is consistent with our observation that the offline algorithm grants more requests immediately upon arrival when more weight is placed on the latency. The curves of the two RSG online algorithms lie between the naive and offline ones, and their energy costs climb less dramatically than the offline curve.
Figure 3.6: (a) The fcost of offloading requests with different levels of fluctuations. (b) The estimated energy cost with varying α.
3.5.3 Experiments on the Mobile Phone
In our real-world experiments, we choose XML-RPC [21] to emulate successive offloading requests generated by multiple applications and their transfers to the cloud. We use three typical types of offloading requests (random, bursty, and stable) to represent real-world traces. Each measurement result is averaged over 50 trials, with each trial containing around 50 transfer requests. The parameter α is set to 0.3 unless otherwise noted. From our experimental results, as shown in Fig. 3.7-(c), we observe that the RSG deterministic online, randomized online, and offline algorithms achieve average energy reductions of 20.23%, 27.10%, and 60.20%, respectively, across all three types of requests, compared to the naive strategy.

Table 3.1: Energy cost reduction compared with the naive strategy.

α      Offline    Randomized    Deterministic
0.3    62.3%      28.2%         14.84%
1.0    36.23%     14.33%        8.49%
1.3    34.09%     13.86%        8.61%
Figure 3.7: (c) The unit-time energy cost (mAh) for the three types of requests. (d) The unit-time energy cost (mAh) on the mobile device with varying α.
Looking into how the energy costs vary with α, we find that when α is smaller, RSG bundles requests more aggressively, so that more energy is saved. Fig. 3.7-(d) and Table 3.1 together illustrate the energy cost reductions of the different algorithms compared to the naive strategy with varying α, using random requests only.
We take a step further and test our algorithms using real-world traces. To collect the traces, we run three typical mobile applications (Rubik Solver, Email, and online chatting) that are ready to offload on our iPhone 3GS, and leverage Wireshark [22] to record their network traffic. The Rubik Solver exhibits highly bursty traffic because it is computation-intensive, whereas Email regularly checks with the server in the background, and online chatting generates traffic arbitrarily from time to time. Fig. 3.8-(e) shows the actual transmission times before and after scheduling. Apparently, with the RSG algorithm, requests from multiple applications are transmitted in bundles. To further verify our results, we monitor the raw battery voltage variation on the mobile device. As Fig. 3.8-(f) shows, the battery voltage is more stable and decreases more moderately with RSG. Our experiments reveal that by performing the RSG algorithm on our real-world traces, the energy consumption is reduced by 20.71%.
Figure 3.8: (e) The request transmissions on the mobile device with and without the RSG algorithm. (f) The battery voltage change as measured on the mobile device. The top figure shows the result of the naive strategy, and the bottom one shows the result of the RSG algorithm.
3.6 Summary
Coordinating the offloading requests of multiple applications to achieve greater energy savings
while maintaining satisfactory performance is an important issue in offloading from mobile
devices to the cloud. In particular, how can we schedule the offloading requests without any
knowledge of the future requests? To answer that, we propose RSG, which consists of two
online algorithms, one deterministic and one randomized, that dynamically decide when to
grant requests without future information. We prove that the RSG online algorithm achieves
the best possible 2-competitive ratio for the deterministic case and e/(e−1) for the randomized
one. With RSG, our real-world implementation on the iOS platform has shown a substantial
amount of energy savings.
Chapter 4
Coalition Formation for Collaborative
Mobile Computing
4.1 Overview
In this chapter, we study the coalition formation problem for a group of mobile users working on one job. Several cases motivate such collaborations. A mobile device constrained by its own battery may need to seek help from other devices: a typical example is a device that would drain its battery before finishing the job, and thus looks for collaborators to share the job with and bring down its energy cost. Alternatively, a device may be limited by its capability or physical location; for example, a crowdsourcing task may require data gathered from the sensors or cameras of smartphones in a certain area. The collaborative computing model also fits many other applications in crowdsourcing, content sharing, indoor localization, and so on. A particular scenario arises when users enter a venue: they are likely to use the same shopping-aid or navigation application while in close proximity to one another, and crowdsourcing to map an indoor floor plan is such a job that can be shared among a group of proximate users. Since collaboration with other devices can potentially mitigate the overall costs, each mobile device would naturally seek to form coalitions to share its computation and network resources. The underlying principle is similar to that of mobile code offloading to the cloud: each mobile user trades relatively low communication energy expense for high computation cost, so that the overall energy cost is reduced. We leave the security and privacy problems to other literature, and focus on the collaboration issues.
Some general questions arise in the above scenarios: among a set of users who are interested in collaboratively performing a job, how should the tasks be distributed so that the overall energy cost is minimized? How do coalitions form among users hosting different resources? In this work, the terms user and device are used interchangeably. We assume that each user is concerned only with the energy consumed by computation and network connections, as these are the major energy cost factors for a smartphone, and they are pertinent to task distribution. Executing the same task may incur different energy costs across devices. Choosing which collaborators to work with also has a significant impact on the energy cost, since mobile devices communicate with each other via various wireless channels with different energy characteristics. As pointed out in [23], the energy consumed in transmitting a certain amount of data is inversely proportional to the available bandwidth. If we describe the computation capability and network channels of each mobile device using a resource graph, the first problem can be formulated as an optimization problem over all partitions of the graph. However, a centralized solution that minimizes the energy cost is difficult and impractical to obtain; furthermore, a central arbitrator hardly exists in the real world. To tackle the problem, we make the following contributions:
• We formulate the task distribution problem as a 0-1 integer quadratic programming problem with quadratic constraints, minimizing the overall energy cost of completing a job on a group of mobile devices.

• We propose a distributed algorithm for coalition formation based on merge-and-split rules. Using the proposed algorithm, multiple mobile users can self-organize into disjoint independent coalitions.
The remainder of this chapter is organized as follows. Sec. 4.2 introduces the underlying infrastructure and describes the system model. Sec. 4.3 formulates the problem as a centralized task distribution problem. Sec. 4.4 takes another look at the problem, reconsiders it as a coalitional game among users, and proposes a simple distributed merge-and-split algorithm for forming coalitions. The algorithm's performance is evaluated against non-cooperative, other cooperative, and centralized schemes in the following section. The last section briefly summarizes the chapter.
4.2 The Collaborative Computing Model
In this section, we describe the task distribution problem for a group of mobile users. A job is modelled by its workflow, as shown in Fig. 4.1. Divided by functionality, each block stands for an atomic task that can only be executed on one device. An arrow between blocks represents the data flow from one task to the other: the task an arrow points to requires the output of the preceding task as its input; otherwise, the task cannot be completed. For example, the outputs of "Image Capturing" are the inputs of "Data Backup" and "Features Extraction". Tasks with such I/O relations can be executed on different devices or on the same device; in the former case, the link is considered an external link. Of all the state variables that affect the power consumption of a phone, CPU utilization and network connections are the two most significant attributes, measured by computation cycles and the amount of data transmitted or received, respectively.
We express the above job requirement using a directed graph Gt = (V,E). Each node
i ∈ V is a task associated with computation cycles ci. Each link ei,j ∈ E between nodes i
and j is associated with d_{i,j}, the amount of data transferred from task i to task j. The computational cycles of each task and the amount of data to be transferred between tasks are profiled beforehand.
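As a concrete sketch, the profiled task graph can be encoded with plain dictionaries. The task names and sizes below follow Fig. 4.1; the exact wiring of the figure is only partially recoverable, so the edge set is one plausible reading (the 180 KB link is omitted because its endpoints are unclear), and the data-structure layout is our own illustrative choice:

```python
# Task graph Gt = (V, E): each node i carries computation cycles c_i,
# each directed edge (i, j) carries the data volume d_ij transferred
# from task i to task j.  Cycle counts are in millions, data in KB,
# following Fig. 4.1; the edge wiring is a plausible reading only.
cycles = {                      # c_i, in millions of CPU cycles
    "capture_a": 15, "capture_b": 20,
    "backup": 10, "extract": 50, "match": 100,
}
data = {                        # d_ij, in KB
    ("capture_a", "backup"): 350, ("capture_a", "extract"): 2000,
    ("capture_b", "backup"): 350, ("capture_b", "extract"): 2000,
    ("extract", "match"): 500,
}

def successors(task):
    """Tasks that consume the output of `task`."""
    return [j for (i, j) in data if i == task]
```

Any graph library would do equally well; dictionaries simply keep the profiled c_i and d_{i,j} values explicit.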
[Figure: five task blocks, Image Capturing (15M cycles), Image Capturing (20M cycles), Data Backup (10M cycles), Features Extraction (50M cycles), and Find Match (100M cycles), connected by data links of 350 KB, 2 MB, 350 KB, 2 MB, 180 KB, and 500 KB]
Figure 4.1: The Workflow of an Example Job
Due to disparities in hardware and operating systems, different mobile devices consume different amounts of energy for the same computation cycles. Since these mobile devices are distributed in different locations, they connect with each other in various ways, as shown in Fig. 4.2: they can establish Bluetooth or WiFi ad hoc connections when in proximity, or connect via 3G or LTE networks. The energy costs of different wireless channels vary as well. To describe the computational and network resources of the potentially collaborative mobile users, we use the directed resource graph Gr = (N, L), in which each node n ∈ N stands for a device that takes a_n Joules to execute one unit computational cycle. Each link l_{n,m} ∈ L of the graph represents the communication channel from user n to user m, and is associated with b_{n,m} Joules per KB of data transferred from device n to m. The energy costs with regard to computational cycles and data transfer are profiled on each device a priori.
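The resource graph G_r can be sketched the same way: per-device computation costs a_n and per-link transfer costs b_{n,m}. The device names and numbers below are invented for illustration; in the system they would come from per-device profiling:

```python
# Resource graph Gr = (N, L): each device n has a computation cost a_n
# (energy per unit of computation), each directed link (n, m) a transfer
# cost b_nm (energy per KB).  All values here are invented placeholders.
a = {"phone1": 0.05, "phone2": 0.04, "phone3": 0.06}   # J per M cycles (assumed scale)
b = {
    ("phone1", "phone2"): 0.02, ("phone2", "phone1"): 0.02,   # e.g. WiFi ad hoc
    ("phone2", "phone3"): 0.15, ("phone3", "phone2"): 0.15,   # e.g. cellular
}

def transfer_energy(n, m, kb):
    """Energy (J) to ship `kb` KB from device n to m; None if no link."""
    cost = b.get((n, m))
    return None if cost is None else cost * kb
```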
[Figure: mobile devices interconnected via the Internet, a WiFi AP, and Bluetooth links]
Figure 4.2: The Resource Graph
From a holistic view, the goal is to minimize the overall energy consumption of all users accomplishing the job. We assume that all users involved share the common goal of finishing the job. Although each of them could complete the job individually without incurring any communication cost, they may also collaborate with as many other users as possible, as long as the tasks can be divided up. In the latter case, a user can reduce its computation burden at the cost of increased communication energy consumption. Between these two extremes, there should be a sweet spot that minimizes the overall energy cost. To find it, we are interested in partitioning the resource graph Gr into subgraphs, over which users form coalitions, and then mapping the task graph Gt onto each resource subgraph. The steps of partitioning the resource graph and mapping tasks are coupled in minimizing the total energy consumption.
In the next section, we illustrate that the centralized approach to finding the optimal partition and distribution is highly impractical. We then derive a distributed solution that enables each user to choose its set of collaborators.
4.3 Task Distribution for Mobile Applications
Given that the resources and the job workflow are modelled as graphs, there are multiple ways to partition the users into coalitions, and within each coalition, different ways to assign tasks to each user. Our objective is to find the optimal way to partition the resource graph and map the job workflow graph onto it such that the overall energy consumption is minimized.
Besides energy costs, other objectives can be defined within the same problem structure. For example, the processing time of the job can be the goal of task assignment if the job is delay-sensitive. In our problem, we seek a centralized solution that minimizes the overall energy consumption of executing the job. Let B be the set of all partitions of graph Gr, and T be one user coalition of the partition P ∈ B. Our objective is:
\min_{P \in B} \sum_{T \in P} \min C(T). \quad (4.1)
C(T) is the energy consumption of coalition T, i.e., the sum of the energy expenses of all mobile devices in that coalition. Given one partition of the graph, we map the job onto each subgraph in the way that minimizes the energy consumption of the coalition. Fig. 4.3 shows a toy example: mobile devices n1 to n5 form two coalitions T1 and T2, each of which executes the job, and the job consists of three tasks i1, i2, i3.
[Figure: devices n1–n5 partitioned into coalitions T1 and T2, with the tasks i1, i2, i3 mapped onto the devices of each coalition]
Figure 4.3: An example of mapping jobs to a set of mobile devices
In addition to the notation in Section 4.2, we define the following to describe the energy consumed in executing tasks and in communications.
• Let l_{n,m} = 1 if devices n and m are connected on Gr, and 0 otherwise.
• Let e_{i,j} = 1 if task i is connected to task j on Gt, i.e., the output of task i is the input of task j; e_{i,j} = 0 if tasks i and j are not directly associated.
• Let s_{i,n} = 1 if task i is assigned to device n, and 0 otherwise.
• Let r_{i,n} = 1 if task i can be executed on device n, and 0 otherwise.
For a given partition P and a group of users T, we optimize the energy expense over the coalition under the topology constraints of the resource and job workflow graphs. Energy consumption minimization over the coalition can be formulated as follows.
\min C(T) = \sum_{n \in T} \phi_n(T), \quad (4.2)

where \phi_n(T) represents the energy consumption on device n \in T:

\phi_n(T) = \begin{cases} \infty, & \text{if } |T| > |V|, \\ E_n(T), & \text{otherwise,} \end{cases} \quad (4.3)

E_n(T) = a_n \sum_{i \in V} s_{i,n} c_i + \sum_{\substack{i, j \in V \\ i \neq j}} \sum_{\substack{m \in T \\ m \neq n}} s_{i,n} s_{j,m} \left( e_{i,j} b_{n,m} d_{i,j} + e_{j,i} b_{m,n} d_{j,i} \right). \quad (4.4)
We consider it infeasible to divide an atomic task, and thus set the cost to infinity when there are more than |V| users in a coalition. The first part of (4.4) is linear and represents the computational energy cost, while the second part is quadratic w.r.t. s_{i,n} and denotes the network energy expense on all external links of the device. The constraints are as follows:
\sum_{n \in T} s_{i,n} = 1, \quad \forall i \in V, \quad (4.5)

\sum_{i \in V} s_{i,n} \geq 1, \quad \forall n \in T, \quad (4.6)

s_{i,n} s_{j,m} e_{i,j} \leq l_{n,m}, \quad \forall i \neq j, \ \forall n \neq m, \ n, m \in T, \quad (4.7)

s_{i,n} \leq r_{i,n}, \quad \forall i \in V, \ \forall n \in T, \quad (4.8)

s_{i,n} \in \{0, 1\}, \quad \forall n \in T, \ \forall i \in V. \quad (4.9)
The given parameters above are a_n, c_i, l_{n,m}, b_{n,m}, e_{i,j}, d_{i,j}, and r_{i,n}. Given a coalition T, the optimization problem in (4.2) is to find the optimal s_{i,n} that minimizes the energy consumption of the coalition. Constraint (4.5) ensures that each task i is executed on exactly one device. Constraint (4.6) requires each device to be involved in at least one task. Constraint (4.7) states that devices n and m can collaborate only when the external link between them exists. Constraint (4.8) expresses the availability of the resources needed on a device to perform a certain task. The problem in (4.2) is quadratic and non-convex, and the constraints in (4.7) can be non-convex as well, so standard solvers cannot handle this integer program directly. However, since the solution space is bounded (2^{|V| \cdot |T|} \leq 2^{|V|^2}), a brute-force or heuristic search can be used to find the optimal solution.
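Since the per-coalition solution space is bounded, the formulation (4.2)–(4.9) can be mirrored directly by an exhaustive search. The sketch below is an illustrative reading, not the thesis's implementation: it enumerates all assignments s, filters by constraints (4.6)–(4.8), and charges each external transfer once (a simplification of the per-device accounting in (4.4), which splits the cost between sender and receiver):

```python
from itertools import product

def min_coalition_energy(tasks, cycles, data, devices, a, b, link, can_run):
    """Brute-force solution of the coalition subproblem (4.2)-(4.9).

    tasks: list of task ids (V); cycles[i] = c_i
    data[(i, j)] = d_ij for each edge with e_ij = 1
    devices: list of device ids (the coalition T); a[n] = energy per cycle unit
    b[(n, m)] = energy per KB on link n -> m
    link(n, m) -> bool mirrors l_nm; can_run(i, n) -> bool mirrors r_in
    Returns (best_cost, best_assignment), or (inf, None) if infeasible.
    Exhaustive (|T| ** |V| assignments): meant to mirror the formulation,
    not to scale.
    """
    if len(devices) > len(tasks):                      # barrier case of (4.3)
        return float("inf"), None
    best, best_s = float("inf"), None
    for assign in product(devices, repeat=len(tasks)):
        s = dict(zip(tasks, assign))                   # s[i] == n means s_in = 1, so (4.5) holds
        if set(assign) != set(devices):                # (4.6): every device gets a task
            continue
        if not all(can_run(i, s[i]) for i in tasks):   # (4.8)
            continue
        cost = sum(a[n] * sum(cycles[i] for i in tasks if s[i] == n)
                   for n in devices)                   # computation term of (4.4)
        feasible = True
        for (i, j), d in data.items():                 # communication term of (4.4)
            n, m = s[i], s[j]
            if n == m:
                continue                               # internal link: no transfer cost
            if not link(n, m):                         # (4.7): external link must exist
                feasible = False
                break
            cost += b[(n, m)] * d                      # each transfer charged once here
        if feasible and cost < best:
            best, best_s = cost, s
    return best, best_s
```

On a two-task, two-device toy instance this recovers the cheaper of the two device-disjoint assignments, plus the transfer cost of their connecting link.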
4.4 Coalition Formation among Mobile Users
We first seek a centralized solution that minimizes the energy consumption of all users. In the centralized approach, we assume there is an arbitrator who has the energy profiles of all participating users; it decides the coalition structure and, within each coalition, assigns tasks to each user. However, it is shown in [24] that a problem such as (4.1) is NP-complete, mainly because the number of possible partitions of graph Gr grows exponentially with the number of users. Moreover, finding the optimal way of assigning tasks to each group adds further complexity. In the real world, an arbitrator rarely exists to assist with the decision making. Thus we propose a distributed solution that enables each user to make local decisions to join or split from coalitions depending on its preference.
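The exponential growth can be made concrete: the number of partitions of an N-user set is the Bell number B_N. A short recursive enumerator (an illustrative sketch) shows why traversing all of B quickly becomes infeasible:

```python
def partitions(items):
    """Yield every partition of `items` as a list of blocks (coalitions)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put the first element into each existing block in turn ...
        for k in range(len(smaller)):
            yield smaller[:k] + [[first] + smaller[k]] + smaller[k + 1:]
        # ... or let it start a new singleton coalition.
        yield smaller + [[first]]
```

Five users already yield 52 partitions, ten users 115,975, and twenty users roughly 5.2 × 10^13, before even solving the inner task-assignment problem for each coalition.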
4.4.1 Coalitional Game and Properties
A coalitional game is a game in which groups of players may enforce cooperative behavior; hence the game is a competition between coalitions of players. Within coalitional game theory, coalition formation has been a research topic of continuing interest, and a set of analytical tools has been developed [25].
In the setting of collaborative computing, the proposed problem is modeled as an (N, v) coalitional game, where N is the set of players (users) and v is the utility function, or value, of a coalition, which in our case corresponds to the energy cost C(T) of user coalition T. Notice that if the coalition size is larger than the number of tasks in the job, i.e., |V| in the task graph Gt, the job cannot be divided among the users in the coalition, and the corresponding utility is defined as negative infinity. Otherwise, the utility v(T) should be a decreasing function of the energy cost, and is given by:
v(T) = -C(T) = -\sum_{n \in T} \phi_n(T) \quad (4.10)
We first give the following definition from [26] in order to prove important properties of the collaborative computing game model.
Definition 4.1 A coalitional game (N, v) is said to have a transferable utility if the utility
function v(T ) can be arbitrarily apportioned between the coalition’s players. Otherwise, the
coalitional game has a non-transferable utility and each player will have its own utility within
coalition T .
By this definition, we hereby give the first property of the collaborative computing model.
Property 4.1 The proposed collaborative computing game (N, v) has a non-transferable util-
ity.
Proof According to (4.10), the utility v(T) of coalition T is the negative sum of the energy costs of its users. Since the energy cost per device is fixed once the task assignment is done, the utility cannot be transferred among the users. Because the utility of coalition T cannot be arbitrarily apportioned between the coalition's players, the proposed coalitional game has a non-transferable utility.
Generally, it is assumed that the grand coalition, in which all users participate, maximizes the utilities of all users. However, this property does not hold in our case: on one hand, within one coalition, if more than |V| users participate in performing the job, it is infeasible to further split atomic tasks into subtasks; on the other hand, as the number of participating users increases, the communication energy cost increases as well, which decreases the total utility. Therefore, for the proposed (N, v) coalitional game we have the second property:
Property 4.2 For the collaborative computing game (N, v), the grand coalition of all users
does not always form. Disjoint independent coalitions will form among the mobile users.
Overall, we have a non-transferable (N, v) coalitional game, and our goal is to devise a distributed algorithm for coalition formation.
4.4.2 Coalition Formation Algorithm
First, we give the following definitions, which will be used in the derivation of the coalition formation algorithm.
Definition 4.2 A collection is any family T := {T_1, ..., T_l} of mutually disjoint coalitions. If additionally \bigcup_{j=1}^{l} T_j = N, the collection T is called a partition of N.
Definition 4.3 Assume A and B are partitions of the same set C. A comparison relation ▷ is defined such that A ▷ B means the way A partitions C is preferable to the way B partitions C.
Each comparison relation ▷ is used only to compare partitions of the same set of players; partitions of different sets of players are incomparable w.r.t. ▷. As illustrated in [25], the comparison relations on partitions are induced in a canonic way from the corresponding comparison relations on multisets of reals:

A ▷ B ⟺ v(A) ▷ v(B). \quad (4.11)
The comparison relation ▷ on multisets of reals can be defined via different orders, such as the utilitarian order, Nash order, or lexmin order. In this work we consider both coalitional values and individual values when users choose which coalition to join or split from. Coalitional values are values of the coalition as a group; the utilitarian order compares partitions by the total sum of the utilities of their coalitions. Formally, the utilitarian order is defined as follows:

A ▷ B ⟺ \sum_{T \in A} v(T) > \sum_{T \in B} v(T), \quad (4.12)

where v(T) is as defined in Sec. 4.4.1 and corresponds to the (negative) total energy consumed within coalition T.
From an individual's perspective, each user joins or splits from a coalition depending on its own preference over the energy consumption of its device, while not hurting others' benefit. The lower its energy consumption, the higher the user's preference for the coalition. Hence we consider individual values: for a partition T := {T_1, ..., T_l},

φ(T) := {φ_n(T_i) | T_i ∈ T, n ∈ T_i}. \quad (4.13)

Given two partitions T = {T_1, ..., T_l} and T' = {T'_1, ..., T'_k} of the same set of players N, the comparison relations now compare φ(T) and φ(T'), which are multisets of |N| real numbers,
one number per player. φ(T) can also be viewed as a sequence of user utilities of length |N|. We compare such sequences using the Pareto order:

A ▷ B ⟺ ∀ n, φ_n(A) ≤ φ_n(B) and ∃ m, φ_m(A) < φ_m(B), \quad (4.14)

where A and B are assumed to be of the same length.
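Under the assumption that per-user energies are plain numbers and lower energy is preferred, the two comparison relations can be sketched as simple predicates (the function names are our own):

```python
def utilitarian_better(energies_a, energies_b):
    """Utilitarian (coalitional) order over per-user energies: structure A
    is preferred iff its total energy is strictly lower, i.e. its total
    utility (negative energy) is strictly higher, as in (4.12)."""
    return sum(energies_a) < sum(energies_b)

def pareto_better(energies_a, energies_b):
    """Pareto (individual) order of (4.14): no user is worse off and at
    least one is strictly better off.  Both sequences list the energies
    of the same users in the same order."""
    assert len(energies_a) == len(energies_b)
    return (all(x <= y for x, y in zip(energies_a, energies_b))
            and any(x < y for x, y in zip(energies_a, energies_b)))
```

Note that whenever `pareto_better` holds, `utilitarian_better` holds as well, but not conversely, which is the strictness relation discussed in Sec. 4.5.2.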
With the comparison relations defined in (4.12) and (4.14), we hereby give the following two rules that allow us to transform partitions of the grand coalition:

merge: {T_1, ..., T_k} ∪ P → {\bigcup_{j=1}^{k} T_j} ∪ P, where {\bigcup_{j=1}^{k} T_j} ▷ {T_1, ..., T_k}.
split: {\bigcup_{j=1}^{k} T_j} ∪ P → {T_1, ..., T_k} ∪ P, where {T_1, ..., T_k} ▷ {\bigcup_{j=1}^{k} T_j}.
Using the above rules, multiple coalitions can merge into a larger one if at least one user
strictly reduces its energy usage. Likewise, one coalition can be split into smaller coalitions.
Because the number of different partitions is finite, every iteration of the merge and split rules
terminates.
Theorem 4.1 Given the comparison relation defined in Def. 4.3, every iteration of merge and
split rules terminates.
However, when different sequences of merge and split rules are applied to the initial partition, the outcomes may differ. In the following section, we study under what conditions arbitrary sequences of these two rules are guaranteed to yield the same outcome. Before that, we illustrate our algorithms based on these two simple rules.
4.4.3 Stability Analysis
In this section, we study the conditions guaranteeing the unique outcome of the iterations of
the merge and split rules. First, we introduce the notion of defection function D that assigns to
Algorithm 4.1 Collaborative Computing Game through Merge and Split
Input: initial partition T = {T_1, ..., T_l} = N
Output: final partition T_final
repeat
    T = Merge(T)
    T = Split(T)
until merge and split terminate
T_final = T

Algorithm 4.2 Merge
Input: initial partition T = {T_1, ..., T_k}
Output: intermediate partition F
for i ∈ [1, k] do
    for j ∈ [i + 1, k] do
        if {T_i ∪ T_j} ▷ {T_i, T_j} then
            T_i = T_i ∪ T_j
            remove T_j: for l ∈ [j, k - 1] do T_l = T_{l+1}
            k = k - 1; j = j - 1
F = {T_1, ..., T_k}

Algorithm 4.3 Split
Input: initial partition T = {T_1, ..., T_k}
Output: intermediate partition F
for i ∈ [1, k] do
    randomly choose T_j ⊂ T_i
    if {T_j, T_i \ T_j} ▷ {T_i} then
        T_i = T_i \ T_j
        T_{k+1} = T_j; k = k + 1
F = {T_1, ..., T_k}
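Algorithms 4.1–4.3 can be sketched in a few dozen lines. This is one plausible reading, not the thesis's exact implementation: `value(c)` is assumed to return the per-user energies φ_n(c) of coalition c (e.g. from the task-assignment subproblem), and `better` is a comparison relation over aligned energy lists, e.g. the Pareto order of (4.14):

```python
import random

def merge_split(partition, value, better, seed=0):
    """One plausible reading of Algorithms 4.1-4.3 (not the thesis's code).

    partition: list of frozensets of users
    value(c):  dict user -> phi_n(c), the energy of each user in coalition c
    better(a, b): comparison over equal-length energy lists for the same users
    """
    rng = random.Random(seed)

    def profile(coalitions):
        prof = {}
        for c in coalitions:
            prof.update(value(c))
        return prof

    def prefers(new, old):
        # Both sides cover the same user set; align energies per user.
        pa, pb = profile(new), profile(old)
        users = sorted(pa)
        return better([pa[u] for u in users], [pb[u] for u in users])

    changed = True
    while changed:
        changed = False
        # Merge rule: fuse the first pair whose union is preferred.
        for i in range(len(partition)):
            for j in range(i + 1, len(partition)):
                ci, cj = partition[i], partition[j]
                if prefers([ci | cj], [ci, cj]):
                    partition = [x for x in partition if x not in (ci, cj)] + [ci | cj]
                    changed = True
                    break
            if changed:
                break
        if changed:
            continue
        # Split rule: try a random bipartition of each coalition.
        for c in partition:
            if len(c) < 2:
                continue
            sub = frozenset(rng.sample(sorted(c), len(c) // 2))
            if prefers([sub, c - sub], [c]):
                partition = [x for x in partition if x != c] + [sub, c - sub]
                changed = True
                break
    return partition
```

The loop terminates because each applied rule strictly improves the comparison relation and the number of partitions is finite, matching Theorem 4.1; like Algorithm 4.3, the split step samples a random bipartition rather than trying all subsets.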
each partition some partitioned subsets of the grand coalition. In particular, D_p is the family of defection functions that allow the formation of all partitions of the grand coalition, and D_c are the functions that allow the formation of all collections in the grand coalition. According to the results of [25], D_c-stability is key to a unique outcome.
Definition 4.4 A partition P = {P_1, ..., P_k} of N is D_c-stable iff the following two conditions are satisfied:
• for each i ∈ {1, ..., k} and each pair of disjoint coalitions A and B such that A ∪ B ⊆ P_i, {A ∪ B} ▷ {A, B};
• for each coalition T ⊆ N such that T ⊄ P_i for all i ∈ {1, ..., k},

{T}[P] ▷ {T},

where

{T}[P] := {P_1 ∩ T, ..., P_k ∩ T} \setminus {∅}.
Adopting the analysis of [25], we obtain the following theorem guaranteeing the unique outcome of the merge and split rules.
Theorem 4.2 Assume that ▷ is a comparison relation and P is a D_c-stable partition; then P is the unique outcome of every iteration of the merge and split rules.
4.5 Simulation Results and Analysis
4.5.1 Simulation Setup
Energy usage data can be collected using profilers such as the MAUI profiler in [3], along with the computational cycles per task and the amount of data transferred between tasks. The data can be either files or the state transferred between devices for distributed task execution. All this information is taken as input to the task distribution optimization problem to determine which task should be assigned to which device so that the overall energy consumption is minimized.
In our simulation, we adopt the task graph shown in Fig. 4.3, set the computational cycles of each task to 20–100 M cycles, and set the data transferred on each link to 10–1000 KB. As pointed out in [23], the energy consumption of transmitting a fixed amount of data is related to the available bandwidth or connection condition. As further verified in [3], downloading 50 KB of data over WiFi is the most energy-efficient, at approximately 20 mJ/KB, while downloading over GSM or 3G costs around 50–200 mJ/KB. Moreover, transmitting data under bad connectivity can consume much more energy than under good conditions, regardless of the connection mode of the smartphone.
In the experiments, we fix the topology of the task graph and randomly generate resource connection graphs in which every two nodes are linked with a given probability. The data transfer energy cost of a link is uniformly distributed on [20, 200] mJ/KB, and the computational energy cost of each device is uniformly drawn from [40, 60] mJ/M cycles. Given the task and resource graphs, each algorithm (centralized, merge-and-split with Pareto order, and merge-and-split with utilitarian order) is run 50 times to obtain the following results.
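The random instances can be reproduced with a small generator; the parameter ranges follow the setup described in this section, while the symmetric-link simplification and the function shape are our own assumptions:

```python
import random

def random_resource_graph(n_users, p_connect, seed=0):
    """Generate a random resource graph following the simulation setup:
    every pair of users is linked with probability p_connect, link
    transfer cost uniform on [20, 200] mJ/KB, per-device computation
    cost uniform on [40, 60] mJ per M cycles.  Links are made symmetric
    here, a simplifying assumption of this sketch."""
    rng = random.Random(seed)
    a = {n: rng.uniform(40, 60) for n in range(n_users)}    # mJ / M cycles
    b = {}
    for n in range(n_users):
        for m in range(n + 1, n_users):
            if rng.random() < p_connect:
                cost = rng.uniform(20, 200)                 # mJ / KB
                b[(n, m)] = b[(m, n)] = cost
    return a, b
```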
4.5.2 Performance Evaluation
In this section, we compare the performance of the merge-and-split algorithms with the centralized algorithm and with the non-cooperative setting, in terms of average energy cost per user, average running time, and average coalition size. The centralized algorithm traverses all partitions of the given resource graph and finds the optimal partition that minimizes the total energy cost over the entire graph. In the non-cooperative setting, each user finishes the job individually without incurring any communication cost. Simulation results are given in Fig. 4.4, Fig. 4.5, and Fig. 4.6, respectively.
Fig. 4.4(a-d) shows the average energy cost per user for different resource graph sizes. The energy costs are averaged over randomly generated resource graphs with a fixed probability that every two nodes are connected. Fig. 4.4(a) represents the situation where most of the nodes in the resource graph are disconnected, while Fig. 4.4(d) shows the other extreme, where the resource graph is almost fully connected. The proposed algorithm yields a significant reduction in average energy cost per user, up to 71.59% relative to the non-cooperative case. This advantage grows with the total number of users and with the connection ratio of the users: as both increase, each user has many more external links to choose from. In other words, constraint (4.7) is much more relaxed.
However, as the connection ratio increases, the gap between the merge-and-split algorithms and the centralized algorithm also widens. The gap exists because the merge-and-split algorithm terminates but does not necessarily yield a unique outcome; only part of the entire partition set of the resource graph can be traversed before termination. For example, in one simulation round with 7 users, the optimal partition was {{1, 2}, {3, 6, 7}, {4, 5}}, whereas merge-and-split with utilitarian order produced {{1, 2, 3}, {4, 5, 6}, {7}}. The latter partition cannot be transformed into the former by the merge and split rules, since {1, 2, 3} ▷ {{1, 2}, {3}} and thus the coalition {1, 2, 3} cannot be split up. The difference between the average energy computed by merge-and-split and by the centralized algorithm grows with the connection ratio for the following reason: when the connection ratio is low, in the optimal structure users are more likely to split up rather than form coalitions, which is easily achieved by the split rule; when the connection ratio is high, the optimal structure may not be reachable by the merge and split rules.
In each set of experiments, the centralized algorithm gives the lower bound, but it becomes hardly feasible as the total number of users increases. For 3, 5, and 7 users, the mean energy cost of merge-and-split with Pareto order is on average 6.23% above the lower bound, while that with utilitarian order is 5.71% above. We also observe that, while merge-and-split with Pareto order incurs more average energy cost than utilitarian order when the total number of users is below 10, it incurs less energy cost beyond 10 users. Indeed, the Pareto order is stricter than the utilitarian order: whenever the comparison relation satisfies the individual (Pareto) order, it satisfies the coalitional (utilitarian) order as well, but the reverse is not necessarily true.
Fig. 4.5(a-d) shows the average coalition size as the underlying topology of the resource graph varies. There are two trends: the average coalition size is larger when there are more users in total, or when the user connecting ratio is higher. That is, when more users are connected, they tend to merge into coalitions. This observation aligns with the principle that trading communication costs for computation costs reduces the overall energy cost. Moreover, the variance of the average coalition size is higher when users are less connected, largely because the computational cost then accounts for a significant part of the overall cost. We also observe that, with 10 users, merge-and-split with Pareto order results in an average coalition size of 1.4516, smaller than the 1.6034 of utilitarian order. Viewing Fig. 4.4 and Fig. 4.5 together, it is interesting to see that the average energy cost does not necessarily drop as the coalition size grows: although the centralized algorithm yields larger coalitions than merge-and-split while costing less per user, the utilitarian order yields larger coalitions than the Pareto order with a higher average energy cost per user.
Fig. 4.6 illustrates the average running time, averaged over all user connecting ratios, for different numbers of users for each algorithm. The centralized algorithm is highly inefficient when the number of users exceeds 7, while merge-and-split performs significantly better and scales well as the number of users increases.
4.6 Summary
In this work, we study the problem of coalition formation among a group of collaborative mobile users. To distribute the tasks while minimizing the energy costs, we formulate the problem as a 0-1 integer programming problem and apply a heuristic method to solve it. However, assigning the tasks globally in a centralized way is both impractical in the real world and infeasible to compute. Thus we devise a merge-and-split algorithm in which the decision to join or split from a coalition is made distributedly by each user, considering only the utility improvement of the users in that coalition. We also reveal, in the stability analysis, the conditions under which the merge-and-split algorithm yields a unique outcome. Finally, the simulation results show that our algorithm obtains near-optimal results and is far more efficient than the centralized strategy.
[Figure: four panels; x-axis: number of users (3, 5, 7, 10, 15, 20); y-axis: average energy cost per user (mJ); curves: non-cooperative, merge-and-split (Pareto order), merge-and-split (utilitarian order), centralized]
Figure 4.4: Average energy cost per user when users are non-cooperative, or run the centralized and merge-and-split algorithms with Pareto and utilitarian orders. (a) User connecting ratio 0.10. (b) User connecting ratio 0.35. (c) User connecting ratio 0.60. (d) User connecting ratio 0.95.
[Figure: four panels; x-axis: number of users (3, 5, 7, 10, 15, 20); y-axis: average coalition size (number of users per coalition); curves: centralized, merge-and-split (Pareto order), merge-and-split (utilitarian order)]
Figure 4.5: Average coalition size when users run the centralized and merge-and-split algorithms. (a) User connecting ratio 0.10. (b) User connecting ratio 0.35. (c) User connecting ratio 0.60. (d) User connecting ratio 0.95.
[Figure: x-axis: number of users (3, 5, 7, 10, 15, 20); y-axis: average running time (s); curves: centralized, merge-and-split (Pareto order), merge-and-split (utilitarian order)]
Figure 4.6: Average running time when users run the centralized and merge-and-split algorithms.
Chapter 5
Conclusion
With a focus on energy-efficient mobile offloading, this thesis studied two major problems: coalesced offloading from mobile devices to the cloud, and coalition formation among collaborative mobile users. In essence, mobile offloading achieves energy savings by trading relatively low communication costs for high computation costs. The main contributions of the two works are listed below.
In Chapter 3, we proposed the idea of coalesced offloading, which batches offloading requests from multiple applications to reduce the period during which the smartphone stays in the high-power state. The problem is formulated as a joint optimization over both energy consumption and response time. An offline solution is designed to serve as the performance benchmark, and an online algorithm is derived that achieves the optimal competitive ratio.
In Chapter 4, we studied how a group of mobile users can collaborate with each other on one job. The problem is formulated as a coalitional game in which users choose to join or split from coalitions depending on the energy cost of the coalition; the energy cost is computed based on the task assignment to all users within the coalition. Since finding the optimal partition of the coalition group is NP-hard, we tackled the problem with a merge-and-split algorithm, and verified its efficiency through simulations.
While this thesis has raised and solved some practical problems in the area of mobile cloud computing, there are many more interesting problems pointing to new and challenging directions for future research. We detail some of them in the following.
• Optimal mobile offloading with network performance prediction. In the previous literature, it is assumed that the smartphone has stable network connections during the offloading process. This assumption is not valid in practice, especially when users carry their phones around. When the connection is not sufficiently good, offloading to the cloud may drain the battery rather than save energy. Hence it is a very practical consideration to incorporate user mobility and network connection prediction into the mobile offloading framework. For the prediction aspect, machine learning techniques may be used.
• Optimal mobile offloading for specific types of applications. Until very recently, most mobile offloading frameworks were designed for general applications. However, it is not efficient to apply such a general framework to every application. Some applications have particular characteristics: for example, a typical learning algorithm minimizes an objective function to obtain a model, and iteratively refines this model by processing the training data; an iBeacon app may constantly gather data from nearby beacons and upload information to the cloud. Such applications need a mobile offloading framework optimized specifically for them. Hence the question is no longer simply whether to offload, but how to take the application structure into consideration.