
Received: 25 September 2015 Revised: 15 March 2017 Accepted: 21 March 2017

DOI: 10.1002/stvr.1635

SPECIAL ISSUE PAPER

Detecting display energy hotspots in Android apps

Mian Wan Yuchen Jin Ding Li Jiaping Gui Sonal Mahajan William G. J. Halfond

Department of Computer Science, University of Southern California, Los Angeles, CA, USA

Correspondence
William G. J. Halfond, Department of Computer Science, University of Southern California, Los Angeles, California, USA.
Email: [email protected]

Funding information
National Science Foundation, Grant/Award Number: CCF-1321141 and CCF-1619455

Summary

The energy consumption of mobile apps has become an important consideration for developers as the underlying mobile devices are constrained by battery capacity. Display represents a significant portion of an app’s energy consumption—up to 60% of an app’s total energy consumption. However, developers lack techniques to identify the user interfaces in their apps for which energy needs to be improved. This paper presents a technique for detecting display energy hotspots—user interfaces of a mobile app whose energy consumption is greater than optimal. The technique leverages display power modeling and automated display transformation techniques to detect these hotspots and prioritize them for developers. The evaluation of the technique shows that it can predict display energy consumption to within 14% of the ground truth and accurately rank display energy hotspots. Furthermore, the approach found 398 display energy hotspots in a set of 962 popular Android apps, showing the pervasiveness of this problem. For these detected hotspots, the average power savings that could be realized through better user interface design was 30%. Taken together, these results indicate that the approach represents a potentially impactful technique for helping developers to detect energy related problems and reduce the energy consumption of their mobile apps.

KEYWORDS

display, energy, mobile applications, optimization, power

1 INTRODUCTION

In less than six years, mobile apps have gone from zero downloads to over 35 billion downloads [1,2]. Simultaneously, smartphones have achieved a nearly 31% penetration rate [3]. Smartphones and apps have become so popular, in part, because they combine sensors and data access to provide many useful services and a rich user experience. However, the usability of mobile devices is inherently limited by their battery power, and the use of popular features, such as the camera and network, can quickly deplete a device’s limited battery power. Therefore, energy consumption has become an important concern. For the most part, major reductions in energy consumption have come about through a focus on developing better batteries, more efficient hardware, and better operating system level resource management. However, software engineers have become increasingly aware of the way an app’s implementation can impact its energy consumption [4-7]. This realization has motivated the development of software-level techniques that can identify energy bugs and provide more insights into the energy related behaviors of an application.

An important observation is that the display component of a smartphone consumes a significant portion of the device’s total battery power. This problem has only grown as smartphone display sizes have increased from an average of 2.9 inches in 2007 to 4.8 inches in 2014 [8]. Studies show that display can now consume up to 60% of the total energy expended by a mobile app [9,10]. Traditionally, optimizing display power has been seen as outside of the control of software developers. This is true for LCD screens, for which energy consumption is based on the display’s brightness and is controlled by either the end user or the Operating System (OS) performing opportunistic dimming of the display. However, most modern smartphones, such as the Samsung Galaxy S7, are powered by a new generation of screen technology, the organic light-emitting diode (OLED). For this type of screen, brightness is still important [11]; however, the colors that are displayed also become important. Because of the underlying technology, this type of screen consumes less energy when displaying darker colors (eg, black) than lighter ones (eg, white). The use of these screens means there are enormous energy savings to be realized at the software level by optimizing the colors and layouts of the user interfaces (UIs) displayed by the smartphone. In fact, prior studies have shown that savings of over 40% can be achieved by this method [6,9,10].

Softw Test Verif Reliab. 2017;27:e1635. wileyonlinelibrary.com/journal/stvr. Copyright © 2017 John Wiley & Sons, Ltd. https://doi.org/10.1002/stvr.1635


Despite the high impact of focusing on display energy, developers lack techniques that can help them identify where in their apps such savings can be realized. For example, the well-known Android battery monitor only provides device level display energy consumption and cannot isolate the display energy per app or per UI screen. Other energy-related techniques have focused on surveys to identify patterns of energy consumption [5], design refactoring techniques that improve energy consumption [7,12], programming language level constructs to make implementation more energy aware [13], energy visualization techniques [14], or energy prediction techniques [15]. Although helpful, the mentioned techniques do not account for display energy, nor are they able to isolate display related energy. Existing work on display energy has focused on techniques that can transform the colors in a UI (eg, Nyx [6] and Chameleon [10]). However, these techniques do not guide developers as to where they should be applied; therefore, they must either be (1) used automatically for the entire app, which means that although colors will be transformed automatically into more energy efficient equivalents, the color transformation may be less aesthetically pleasing than a developer guided one; or (2) applied based solely on developers’ intuition as to where they would be most effective, which means that some energy-inefficient UIs may be missed.

This paper presents a novel approach to assist developers in identifying the UIs of their apps that can be improved with respect to energy consumption. To do this, the approach combines display energy modeling and color transformation techniques to identify a display energy hotspot (DEH)—a UI of a mobile app whose energy consumption is higher than an energy-optimized but functionally equivalent UI. The approach is fully automated and does not require software developers to use power monitoring equipment to isolate the display energy, which, as explained in Section 3, requires extensive infrastructure and technical expertise. The approach to identify DEHs performs three general steps. First, the approach traverses the UIs of an app and takes screenshots of the app’s UIs when they change in response to different user actions. Second, for each screenshot, the approach calculates an estimate of how much energy and power could be saved by using a color optimized version of the screenshot. Finally, the approach ranks the UIs based on the magnitude of these differences. The approach reports these results, along with detailed power and energy information, to the developer, who can target the most impactful UIs for energy-optimizing transformations.

The paper also presents the results of an empirical evaluation of the approach on a collection of real-world and popular mobile apps. The results showed that the approach was able to accurately estimate display power consumption to within 14% of the measured ground truth and identify the most impactful DEHs. Furthermore, the results generated by the approach can be generalized from one hardware platform to others. The approach was also used to investigate 962 Android market apps; the investigation showed that 41% of these apps have DEHs. Overall, these results indicate that the approach can accurately identify DEHs and can be useful to assist developers in reducing the display energy of their mobile applications.

The rest of this paper is organized as follows: Section 2 describes the approach for detecting DEHs. Section 3 describes how to build the display power model for a device. The results of the evaluation are in Section 4. Related work is discussed in Section 5. Finally, the conclusions and contributions are summarized in Section 6.

2 APPROACH

The goal of the approach is to assist developers in identifying UIs that can be improved with respect to energy consumption. More specifically, the approach detects DEHs, which are UIs that consume more display energy than their energy-optimized versions would. To detect these, the approach automatically scans each UI of a mobile app and then determines if a more energy efficient version could be designed. It is important to note that DEHs are not necessarily energy bugs, as the DEHs may not be caused by a fault in the traditional sense. Instead, DEHs represent points where code is energy inefficient with respect to an optimized alternative. After detecting the DEHs, the UIs are ranked in order of the potential energy improvement that could be realized via energy optimization and then reported to the developers.

To achieve complete automation and not require developers to have power monitoring equipment, there are two significant challenges to be addressed. The first challenge is to determine how much display energy will be consumed by an app at runtime without physical measurements. To address this, the insight is that power consumption can be estimated by a display power model that takes UI screenshots as input. The second challenge is to determine whether a more energy-efficient version of a UI exists and to quantify the difference between these two versions. To address this, the insight is that automated energy-oriented color transformation techniques can be used to recolor the screenshots and then calculate the difference between the original and the more efficient version. Based on these two insights, the approach can automatically detect DEHs without requiring power monitoring equipment.

An overview of the approach is shown in Figure 1. The approach requires three inputs: (1) a description of the workload for which the developers want to evaluate the UIs, (2) a display energy profile (DEP) that describes the hardware characteristics of the target mobile device platform, and (3) the mobile app to be analyzed. Using these inputs, the approach performs the detection in five steps. In the first step, the approach instruments the apps to record runtime information about the UI layout. This information is used to identify certain types of components, such as ads, that should not be part of the DEH identification. The second step is to run the app based on the workload description and capture screenshots of the different UIs displayed. In the third step, the approach processes these screenshots and generates energy-efficient versions via a color transformation technique. Next, in the fourth step, a hardware model based on the DEP is used to predict the display energy that would be consumed by each of the screenshots and their energy-optimized versions. Finally, the fifth step compares the energy consumption of each UI with that of its optimized version and gives the developers a list of UIs ranked according to the potential energy impact of transforming each UI. Each of these steps is now explained in more detail.

FIGURE 1 Overview of the approach. UI, user interface

2.1 Step 1: gather UI layout information

The goal of the first step is to facilitate the detection of content displayed in the app’s UI that should not be considered for the purpose of detecting DEHs. This type of content is called Excluding Content (EC) and, broadly, it includes UI elements whose appearance will vary between executions. Excluding Contents are very common in mobile apps. For example, mobile ads are present in over 50% of all apps [16]. Since the colors present in an EC cannot or should not be changed by a developer, yet they could occupy a potentially significant amount of the screenspace of a UI, they must be identified and removed from the screenshots to preserve the usefulness of the calculated display energy for a UI.

The primary challenge in detecting ECs is that they are mostly indistinguishable from static content. For example, there is no visual difference between an image that is static versus one that is dynamically loaded, nor is there a difference in terms of the APIs used to display them. A notable exception is mobile ads, which invoke special APIs to visually render themselves in the UI. Based on this insight, the approach instruments the app so that when the workload is executed during step 2, the ads’ size and location are recorded. The process to identify ad related information is as follows. First, the approach identifies invocations in the app’s bytecode that call the app’s ad network API. Then, certain ad related event handlers and callbacks are instrumented. The exact set of invocations to be instrumented varies by ad network. For example, for the Google ad network, AdMob, this set would include onAdLoaded and onReceiveAd, which can be defined by an ad listener, and loadAd, which is defined in the ad library and can be called by an Activity. At runtime, the instrumentation records timestamps, position, and size information about each of the ads displayed. This information is used to populate a sequence of tuples F, in which each tuple is of the form ⟨t, a⟩, where t is the time at which the content was displayed and a represents the location and size of the occupied area.

The approach provides a mechanism by which other types of ECs can be excluded from the DEH detection as well. Tuples may be manually added to F. This allows developers to specify image or text areas that they know to be dynamic and that should be excluded from the energy and power analysis in step 3.
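As a concrete illustration, F can be modeled as a list of ⟨t, a⟩ tuples to which both the instrumented ad callbacks and the developer append entries. The following is a minimal sketch; the names (Area, ECEntry, record_ad_shown) and the coordinate convention are illustrative, not part of the paper's implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Area:
    x: int       # left edge of the occupied region, in pixels
    y: int       # top edge
    width: int
    height: int

@dataclass(frozen=True)
class ECEntry:
    t: float     # time the content was displayed, relative to the first event
    area: Area   # location and size of the occupied area

def record_ad_shown(f, t, x, y, width, height):
    """Append a tuple to F; in the real system this is driven by
    instrumented ad callbacks such as onAdLoaded or onReceiveAd."""
    f.append(ECEntry(t, Area(x, y, width, height)))

F = []
record_ad_shown(F, t=2.5, x=0, y=1180, width=720, height=100)  # banner ad
# Developers may also add tuples manually for known-dynamic regions:
F.append(ECEntry(0.0, Area(0, 0, 720, 60)))
```

Both the automatic and the manual entries end up in the same sequence, so the later steps can treat them uniformly.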

2.2 Step 2: workload execution and screenshot capture

The goal of the second step is to convert the workload description into a set of screenshots that can drive the display energy analysis in the subsequent steps. Strictly speaking, a workload description is not a necessary input to the approach since an automated UI crawler (eg, PUMA [17]) could navigate an app and execute a fixed or random distribution of actions over the UI elements. However, the use of a workload description allows the approach to analyze the app using realistic user behaviors or a particular workload of interest to the developers. For example, developers could collect execution traces of real users interacting with their app and use this to define a workload for the energy evaluation.

The inputs to this step are the target app and its workload description. The app A is the Android Package Kit (APK) file that can be executed on a mobile device. The workload W is represented as a sequence of event tuples in which each tuple is of the form ⟨e, t⟩, where e is a representation of the event (eg, “OK button pressed”) and t is a timestamp of when the event occurs relative to the first event (ie, t = 0 for the first event e1). The approach does not impose a specific format or syntax on W except that it must be reproducible. In other words, it must be specified in a way that allows for some mechanism to replay the workload. In the current implementation of the approach (Section 4.2), the RERAN tool [18] is used to record and then replay a workload description, so the exact syntax and format of W is dictated by that tool.

Given W and A, the approach captures screenshots of the different UIs displayed on a device’s screen during the execution of W on A. The general process is as follows. The replay mechanism executes each event at its specified time. A monitor mechanism executes in the background of the device and captures a screenshot of the display every time it changes. This is done by hooking into the refresh and repaint events of the underlying device. The execution of the workload continues until all event tuples have been executed. Once the screenshots have been captured, developers may manually analyze the screenshots to identify areas (in addition to the EC areas that are automatically identified) that should be excluded from the DEH identification. The output of the second step is a sequence of tuples S, in which each tuple is of the form ⟨s, t⟩, where s is a screenshot and t is the time at which the screenshot was taken (ie, when the display changed), and F, which contains the EC information collected via the mechanisms described as part of step 1 and the areas marked by the developer.

To capture screenshots, the implementation uses a modified version of an existing tool called Ashot. Ashot periodically captures screenshots of the currently displayed UIs. Ashot has a maximum sampling frequency that is fast enough to catch user speed events (eg, clicks) but will not sample videos or animations at their full refresh rate. This sampling frequency does not affect the accuracy of the approach; it only reduces the overall number of screenshots captured. Furthermore, to reduce storage overhead, Ashot drops consecutive screenshots that are identical. The use of Ashot did not introduce any observable delay in the execution of W. Note that a necessary condition of both the replay and screenshot capture mechanisms is that their use does not alter the functionality of the app or the UI’s appearance when it is rendered on the device. Both of these conditions were met by RERAN and Ashot.
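The drop-identical behavior can be sketched as a simple filter over the sampled ⟨s, t⟩ stream; hashing the raw frame bytes is one plausible way to compare frames (dedup_screenshots and the byte-string frames below are illustrative, not Ashot's actual implementation).

```python
import hashlib

def dedup_screenshots(samples):
    """Keep a sampled frame only when it differs from the previously
    kept frame, yielding the ⟨s, t⟩ tuples where the display changed."""
    kept, prev_digest = [], None
    for frame_bytes, t in samples:
        digest = hashlib.sha256(frame_bytes).digest()
        if digest != prev_digest:
            kept.append((frame_bytes, t))
            prev_digest = digest
    return kept

samples = [(b"ui-A", 0.0), (b"ui-A", 0.1), (b"ui-B", 0.4), (b"ui-A", 0.9)]
screenshots = dedup_screenshots(samples)
# The repeated ui-A sample at t = 0.1 is dropped; the later ui-A frame
# at t = 0.9 is kept because the display changed in between.
```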


2.3 Step 3: generate energy-efficient alternative UIs

The goal of the third step is to generate an optimized version of each screenshot tuple in S so that the fourth step can calculate estimates of the energy consumption for each screenshot and its optimized version. However, this optimized alternative does not exist, so the approach must first generate a reasonable approximation of what such an alternative would look like. A guiding insight is that prior work has shown that darker colors on OLED screens are more energy efficient than lighter colors [9]. To take advantage of this insight, one could invert the colors of UIs with a white-colored background or systematically shift colors to make them darker and then use this transformation as the optimized version. However, these approaches neglect the fact that both color inversion and linear color shifts do not maintain color difference, which is the visual relationship that humans perceive when they look at a colored display [6]. Therefore, although the color-adjusted UIs would be more energy efficient, they would not represent a reasonable approximation of optimized UIs as the resulting UIs would not be aesthetically pleasing.

To address this challenge, the approach leverages a color transformation technique, Nyx, that was developed in prior work [6,19]. A key aspect of Nyx is that the color scheme it generates represents a reasonably aesthetically pleasing new color scheme. Nyx statically analyzes the structure of the HTML pages of a web application and generates a color transformation scheme (CTS) that represents a more energy-efficient color scheme for the web application. Nyx does this by first creating a color conflict graph (CCG), where each node in the graph is a color that appears in a web page and each edge represents the type of visual relationship (eg, “next to”, “enclosing”, or “not touching”) that any two colors in the CCG have. The edges in the CCG are weighted by the type of visual relationship, with higher weights given to edges so that “enclosing” > “next to” > “not touching”. Then, Nyx solves for a recoloring of the CCG that is energy efficient and also maintains, as much as possible, the color distances between colors in the original page that have a visual relationship. The weighting allows Nyx to prioritize maintaining certain types of color distances over others. Empirical studies show that the resulting color schemes can reduce display power consumption of web apps by over 40%. Additionally, user studies of the UIs generated by Nyx and other similar color transformation techniques [6,9,10,20] have shown that the transformed UIs have high end-user and developer acceptance while only minimally affecting the resulting UIs’ aesthetics.

The approach adapts the CTS generation process of Nyx. There are three primary challenges to be addressed to conduct this adaptation—generation of the CCG, accounting for areas in the screenshots occupied by EC, and scalability. Nyx generates the CCG by statically analyzing the server-side code that dynamically generates web pages. In contrast, the approach only has screenshots available; therefore, the adapted CCG models the color relationships between adjacent pixels, which are identified by analyzing each pixel of each screenshot and identifying its color and the colors of its surrounding pixels. To handle ECs, the approach only builds a CCG for the pixels of the screenshot that are not in an area defined in F. Entries in F can be matched to screenshots by matching the time intervals specified by the timestamps. Using the screenshot to construct the CCG leads to the third challenge, scalability. The recoloring of the CCG is an NP-hard problem with respect to the number of colors. The rendering kits of mobile devices use antialiasing and color shading to smooth lines and curves. This means that even a simple image, such as a black circle over a white background, would be rendered with many additional colors, such as grays, to smooth out transitions between adjacent colors. Because of these extra colors, the time needed to generate a CTS would make the approach’s analysis time impractical.
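A pixel-adjacency CCG of the kind described here can be sketched as follows. This minimal version records only a single "next to" relationship between horizontally or vertically adjacent colors, whereas the full approach also weights edge types and skips pixels inside EC areas.

```python
def build_ccg(pixels):
    """pixels: 2-D grid (list of rows) of color values.
    Returns the CCG as (nodes, edges): nodes are the colors present,
    and an undirected edge joins two distinct colors that appear in
    adjacent pixels."""
    nodes, edges = set(), set()
    height, width = len(pixels), len(pixels[0])
    for y in range(height):
        for x in range(width):
            color = pixels[y][x]
            nodes.add(color)
            for dy, dx in ((0, 1), (1, 0)):   # right and down neighbors
                ny, nx = y + dy, x + dx
                if ny < height and nx < width and pixels[ny][nx] != color:
                    edges.add(frozenset((color, pixels[ny][nx])))
    return nodes, edges

# A fragment of a black shape over white with an antialiasing gray pixel:
grid = [["white", "white", "gray"],
        ["white", "gray",  "black"]]
nodes, edges = build_ccg(grid)
```

Note how the antialiasing gray introduces extra nodes and edges even in this tiny image, which is exactly the color-explosion problem described above.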

To address this scalability challenge, the approach maps each color in a screenshot to the closest of the 140 standardized UI colors [21] and then uses the resulting reduced set of colors to create the CCG. The approach then converts the color with the largest cluster to black, which is the most energy-efficient color, and solves the CCG recoloring problem using a simulated-annealing algorithm to find the CTS [6]. Guided by the newly generated CTS, the approach recolors the original screenshot, except for the areas in F, so that every color in the cluster is replaced with its corresponding color in the CTS.
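The clustering step can be sketched as a nearest-color lookup. Here a four-entry palette stands in for the 140 standardized UI colors, and squared Euclidean distance in RGB space is one plausible distance metric; the paper does not specify which metric its implementation uses.

```python
# Stand-in for the 140 standardized UI colors.
PALETTE = {
    "black": (0, 0, 0),
    "gray":  (128, 128, 128),
    "white": (255, 255, 255),
    "red":   (255, 0, 0),
}

def nearest_palette_color(rgb):
    """Map an arbitrary RGB value onto the closest palette color."""
    return min(PALETTE,
               key=lambda name: sum((a - b) ** 2
                                    for a, b in zip(rgb, PALETTE[name])))

# Antialiasing near-blacks collapse into the "black" cluster, and
# slightly off-red pixels join the "red" cluster:
nearest_palette_color((10, 12, 9))
nearest_palette_color((200, 30, 20))
```

Collapsing the antialiasing shades this way is what shrinks the CCG enough for the simulated-annealing recoloring to be tractable.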

This process is repeated for every screenshot tuple ⟨s, t⟩ ∈ S. For each such s, the approach generates an s′, which is the alternate version of the screenshot recolored as described above. The output of this step is a function O that maps each s to its corresponding s′.

Since the approach uses an approximation algorithm, the generated CTS may not reflect the optimal recoloring. Instead, the recolored UI represents a lower bound on the potential savings a color optimization could achieve. Additionally, the use of clustering means the approach performs its analysis on simpler versions of the screenshots with fewer colors. This can also introduce inaccuracy into the power estimation of the color-optimized screenshot. However, unless the screenshots differ significantly in the amount of antialiasing used, this inaccuracy is small. To confirm this, the power consumption between a set of screenshots and the versions of the screenshots using the clustered colors was compared, and the average difference was found to be below 2%, indicating that the simplified version was a reasonable proxy for the full-color version.

2.4 Step 4: display energy prediction

The fourth step of the approach computes the display power and energy of the screenshots and their energy-efficient alternatives. The approach does this by analyzing each screenshot obtained in the second step and its optimized version generated in the third step with cost functions that estimate the energy consumption based on the colors used in the screenshot. The inputs to this step are F, populated in the second step; the screenshot tuples, S, generated by the second step; O, generated by the third step; and the cost function, C, provided by the DEP. (The development of the cost function provided by the DEP is explained in Section 3, and an evaluation of its accuracy is in Section 4.3.) The outputs of this step are two functions that map each screenshot tuple in S or O to its power (P) and energy (E).

P(s, t) = Σ_{k ∈ s} C(R_k, G_k, B_k) − Σ_{a ∈ F(t)} Σ_{k ∈ a} C(R_k, G_k, B_k),   (1)

E(s, t_s, t_e) = P(s) · (t_e − t_s).   (2)


The formulas for calculating the output functions are shown in Equations 1 and 2. Here, s can be replaced with O(s) as needed. To calculate P(s, t) for all ⟨s, t⟩ ∈ S, the approach first sums the power cost of each pixel in s, which is calculated by the cost function C that takes the values associated with the red (R), green (G), and blue (B) values of the pixel’s color. From the calculated power value, the approach subtracts the power values calculated for each of the EC areas contained in s. As in step 3, the approach identifies the EC areas corresponding to the screenshots using the timestamp information, represented as F(t), and then uses C to calculate the power of each pixel in each EC area a. The sum of the power for all of the EC areas is subtracted from the screenshot’s overall power value. The value returned by P is in Watts.

Recall that energy is equal to power multiplied by time. Therefore, E is equal to the power associated with the screenshot (P(s)) multiplied by the amount of time the screenshot is displayed. The display time is calculated by subtracting the time the screenshot is displayed (t_s) from the time the next screenshot is displayed (t_e); in other words, subtracting the timestamp associated with screenshot s_i from the timestamp associated with screenshot s_{i+1}, which would be of the form E(s_i, t_i, t_{i+1}). The value returned by E is in Joules.

2.5 Step 5: prioritizing the UIs

The goal of the fifth step is to rank the UIs in order of their potential power and energy reduction. To do this, the approach calculates the power and energy of each color-transformed screenshot and compares it to the power and energy of the original screenshot. The inputs to the fifth step are S, P, E, and O.

DP(s) = P(s) − P(O(s)),   (3)

DE(s, t_s, t_e) = E(s, t_s, t_e) − E(O(s), t_s, t_e).   (4)

Given these inputs, the approach calculates the power and energy

difference according to the formulas shown in Equations 3 and 4. For

the difference in power (DP), the approach subtracts the power of the

corresponding O(si) from that associated with each si ∈ S. The resulting

number is in Watts and represents the power that could be saved by

using s′i, the color-optimized version of si. For the difference in energy
(DE), the approach subtracts the energy of the corresponding O(si) from
that associated with each si ∈ S. The resulting number is in Joules and
represents the energy that could be saved by using s′i instead of si.

The output of the fifth step is two sequences, RP and RE. Each

sequence is composed of tuples ⟨s, D⟩, where s is the screenshot and

D is either the difference in power (DP) or the difference in energy (DE).

By choosing between DP and DE, the developers can decide whether to take
the time each screenshot is displayed into consideration. The

sequences are ordered by each tuple’s D value from highest to lowest.

This ranking is the output of the approach and represents a prioritiza-

tion of the screenshots that appear during the workload’s execution in

order of their potential power and energy reduction if they were to be

color optimized.
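The fifth step can be sketched as follows; the function name rank_hotspots and the dictionary-based inputs are illustrative simplifications, assuming the power values have already been computed by step 4.

```python
def rank_hotspots(screenshots, P, O, times):
    """Rank screenshots by potential power (DP, Equation 3) and energy
    (DE, Equation 4) savings. P maps a screenshot to its power in Watts,
    O maps it to its color-transformed version, and times maps it to its
    (t_start, t_end) display interval in seconds."""
    RP, RE = [], []
    for s in screenshots:
        dp = P[s] - P[O[s]]        # Equation 3: power saving, Watts
        ts, te = times[s]
        de = dp * (te - ts)        # Equation 4: energy saving, Joules
        RP.append((s, dp))
        RE.append((s, de))
    # Both sequences are ordered by their D value, highest first.
    RP.sort(key=lambda pair: pair[1], reverse=True)
    RE.sort(key=lambda pair: pair[1], reverse=True)
    return RP, RE

# s1 draws more power than s2, but s2 is displayed four times longer,
# so the two metrics rank the screenshots differently:
P = {"s1": 0.9, "s2": 0.5, "s1'": 0.4, "s2'": 0.3}
O = {"s1": "s1'", "s2": "s2'"}
times = {"s1": (0.0, 1.0), "s2": (1.0, 5.0)}
RP, RE = rank_hotspots(["s1", "s2"], P, O, times)
```

In this example s1 tops the power ranking RP while s2 tops the energy ranking RE, which is exactly the distinction a developer makes by choosing DP or DE.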

2.6 Discussion of usage scenarios

The output of the approach allows developers to identify the UIs of

their app that could save the most energy with color optimization.

Although the technique provides developers with a CTS, it does not

automatically transform the app to use these colors. Nonetheless, the

color mapping information can be useful. Developers may choose to use

this CTS, build on it as a starting point for graphic designers to create

a new palette, or leverage other automated techniques for identifying

energy efficient and aesthetically pleasing color schemes [20]. To close

the loop and use the new color scheme, developers must modify points

in the code where colors are defined and/or modify the Android UI lay-

out XML file color specifications so that the app uses the new colors in

places where the old colors would have been used.

3 THE DISPLAY ENERGY PROFILE

The DEP provides a pixel-based power cost function for a target mobile

device. The use of the DEP allows the approach to analyze display

power for multiple devices by simply providing a different DEP as input. It

is expected that, in the future, a DEP will be developed and provided as

part of a device’s software development kit. However, this is currently

not common in practice, so this section discusses the steps required to

develop a DEP.

At a high level, the DEP provides a cost function that can predict how

much power an OLED screen will consume when displaying a particu-

lar UI. Prior research work has shown that the power consumption of a

pixel in an OLED screen is based on its color [22]. Therefore, the input

to the cost function is the RGB value that defines a pixel’s color. The out-

put of the cost function is the power, in Watts, consumed by displaying
the pixel on the target device.

C(R,G,B) = rR + gG + bB + c. (5)

The general form of the cost function is shown in Equation 5. R,G,

and B represent the red, green, and blue components of a pixel’s color,

respectively. The coefficients r, g, b, and c represent empirically deter-

mined constants. The value for each constant varies by mobile device.

Note that the power model does not account for screen brightness. This

is generally controlled by the user or OS, not the software developer.

Furthermore, savings incurred by adjusting brightness would apply uni-

formly across all UIs. Display energy profiles for four mobile devices, a

2.83" 𝜇OLED-32028-P1T display (𝜇OLED) from 4D Systems, a Sam-

sung Galaxy SII (S2), a Samsung Galaxy Nexus (Nexus), and a Samsung

Galaxy S5 (S5) were constructed. For all of these displays, power con-

sumption was measured using the Monsoon Power Monitor (MPM)

from Monsoon Solutions Inc. [23]. The MPM allows voltage to be held

constant while supplying a current that may be varied from its positive

and negative terminals. The MPM samples the voltage and the current

supplied and outputs the power consumption at a frequency of 5 kHz.

This sampling frequency is sufficient for the development of the DEP,

since the average duration of screenshots is on the order of seconds.

Each power model was built by roughly following the process out-

lined by Dong and colleagues [22]. First, the power consumption of

a completely black screen was measured to define a baseline power

usage for an active screen. To determine the parameters for each RGB

component, the power consumption of the screen was measured while

displaying solid-colored pages. The intensity of each color component

was varied while holding the other two components at zero, and data


points for 16 intensities of each component (R, G, and B) were collected.

In total, 48 data points for each device were obtained.

After taking measurements for each color component, the baseline

power usage was subtracted from these measurements to isolate the

power consumption of each R, G, and B component. The relationship

between the power consumption and the RGB value is non-linear due

to a gamma encoding of the screen. Gamma encoding is a digital image

editing process that defines the relationship between pixel values and

the colors’ luminance. It allows for human eyes to correctly perceive the

shades of color of images that are captured digitally and displayed on

monitors. To account for this encoding, the RGB values were raised to

the 2.2 power to decode the image. While the gamma value can vary

between 1.8 and 2.6, 2.2 is the standard image gamma of screens adopted

by industry. After gamma decoding of the RGB values, linear regres-

sion was used to determine the coefficients for Equation 5. The linear

relationship between the RGB values and the power consumption was

very strong. The average R² value for the four models was 0.99288. The

detailed coefficients for each device can be found in the project web

page [24].
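The profile construction can be sketched as a gamma decode followed by an ordinary least-squares fit. The synthetic measurements below stand in for the 48 calibration points, and the coefficient values are invented for illustration.

```python
import numpy as np

GAMMA = 2.2  # the standard image gamma adopted by industry

def fit_dep(samples):
    """Fit the coefficients of C(R, G, B) = rR + gG + bB + c
    (Equation 5) from (R, G, B, measured_watts) tuples, gamma-decoding
    the RGB values before the linear regression."""
    X = np.array([[(r / 255.0) ** GAMMA,
                   (g / 255.0) ** GAMMA,
                   (b / 255.0) ** GAMMA,
                   1.0] for r, g, b, _ in samples])
    y = np.array([w for *_, w in samples])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # (r, g, b, c)

# Noiseless synthetic data mimicking the solid-color calibration pages:
# 16 intensities per channel, one channel varied at a time.
true = np.array([0.30, 0.20, 0.45, 0.05])  # invented coefficients
samples = []
for level in range(0, 256, 17):
    for channel in range(3):
        rgb = [0, 0, 0]
        rgb[channel] = level
        x = [(v / 255.0) ** GAMMA for v in rgb] + [1.0]
        samples.append((*rgb, float(np.dot(true, x))))
coef = fit_dep(samples)  # recovers the true coefficients
```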

One particular problem with the S2, Nexus, and S5 was that the

measured energy also included the energy consumed by background

processes and other hardware components of the smartphone. To prop-

erly isolate the display energy, two measurements were taken. For the

first, the flex cable that provided power, as well as data and signals,

between the display and the CPU was disconnected. This offered a

baseline measurement of the power consumption of the phone with-

out the display. With the cable disconnected, the phone still maintains

its background processes instead of suspending them and going into

sleep mode. This baseline value was subtracted from the second mea-

surement, the power of the phone with the display cable attached, to

calculate the display power for each of the colored pages.

4 EVALUATION

This section presents the results of an evaluation of the approach. The

approach was implemented in a tool called dLens, and it was used to

answer the following research questions:

RQ 1: How accurate is the dLens analysis?

RQ 2: How generalizable are the dLens results across devices?

RQ 3: What is the impact of ads on the rankings?

RQ 4: How long does it take to perform the dLens analysis?

RQ 5: What is the potential impact of the dLens analysis?

4.1 Subject applications

The top ten most popular free Android market apps in the United

States, as of August 2014, were selected as subject applications. How-

ever, since these apps either did not contain ads or their ad invocations

could not be instrumented due to obfuscation, another five apps that

used mobile ads without any obfuscation were added to the evalua-

tion. The subject applications are from different developers and have

different features. For each of the subjects, a workload was manually

generated by exercising the primary features and functions of each

TABLE 1 Subject application information

Name Size, MB Screenshots Time, s

Facebook 23.7 116 554

Facebook Messenger 12.9 55 268

FaceQ 17.9 96 470

Instagram 9.7 93 429

Pandora internet radio 8.0 75 278

Skype 19.9 65 254

Snapchat 8.8 142 465

Super-Bright LED Flashlight 5.1 20 51

Twitter 13.7 101 388

WhatsApp Messenger 15.3 65 242

Arcus Weather 4.0 36 143

Drudge Report 2.8 25 105

English for kids learning free 7.9 43 181

Retro Camera 29.0 29 106

Restaurant Finder 5.3 41 148

app. The average duration of the workloads was 272 seconds, and each

workload resulted in an average of 67 captured screenshots. Informa-

tion about the apps is listed in Table 1. For each app, the size of its APK

file, the number of screenshots captured as part of its workload, and

the time duration (in seconds) of the recorded workload are reported.

The first ten rows contain information about the top ten apps, and the

remaining five rows contain information about the unobfuscated apps.

4.2 Implementation

The approach was implemented in a prototype tool called dLens. The

implementation of dLens leveraged several other libraries and tools. To

gather the mobile ad UI layout information, apktool [25] and the dex2jar

[26] tool were used to reverse engineer the APK files. The apps were

instrumented using the ASM library [27]. Workloads were recorded

and replayed using the RERAN tool [18]. The Ashot [28] tool was used

to record screenshots of the different UIs displayed. Ashot was also

modified to associate a timestamp with each generated screenshot. As

described in Section 2.3, the CTS generation was based on the code

developed in the Nyx project [6,19]. Nyx was adapted to build CCGs

using color information obtained from screenshots instead of static

analysis. The energy consumption of each screenshot was measured

on an MPM. Finally, the experiments were performed on four different

platforms: a 2.83" 𝜇OLED-32028-P1T display (𝜇OLED) from 4D Sys-

tems, a Samsung Galaxy SII smartphone (S2), a Samsung Galaxy Nexus

smartphone (Nexus), and a Samsung Galaxy S5 smartphone (S5).

4.3 RQ1: accuracy

This research question deals with the accuracy of the dLens approach.

The accuracy of dLens was evaluated with two metrics. First, the error

estimation rate (EER), which is the accuracy of the power estimate pro-

duced by dLens for a given screenshot, was calculated. Note that the

EER is different from the accuracy reported in Section 3, which is the

Pearson coefficient that expresses the closeness of the fit between

the power measurements of the solid-color screenshots and the

regression-based model. In contrast, the EER evaluates the closeness
of dLens’s estimates to the ground truth measurements of the screenshots
from the subject applications.

FIGURE 2 The error estimation rate of the power model

The second accuracy metric was Device Ranking Accuracy (DRA), which is
the accuracy of the UI screenshot ranking

provided by dLens compared to the UI screenshot ranking provided by

the ground truth power measurements. Note that if the EER was 0, the

ranking of the screenshots would be 100% accurate. However, since the

approach is dealing with physical systems, a certain amount of estima-

tion error is to be expected. Therefore, the measurement of the DRA

reflects the ability of the dLens approach to rank the UI screenshots

accurately despite the underlying EER of the approach. The accuracy

metrics were only calculated for power since the corresponding energy

measurements could simply be obtained by multiplying the power esti-

mates by the length of time the screenshot was displayed.

The EER was calculated by the following process. First, dLens was run

on each of the subject applications to generate the list of screenshots

ranked by power. To compute the EER, five screenshots and their cor-

responding color-transformed versions were randomly selected from

each app. For each pair, their ground truths were measured using the

MPM on each of the four devices. The screenshots were sampled

instead of completely measured because the process for isolating a

screenshot’s power and energy (see Section 3) was manual and, there-

fore, time-intensive. Then the power values estimated by dLens were

compared to the ground truths for the selected screenshots. Figure 2

shows, for each of the subject applications and for each of the four

devices, the average EER for the five screenshots. Across all of the appli-

cations, the average EER for the 𝜇OLED, S2, Nexus, and S5 was 3%,

5%, 8%, and 8%, respectively. The accuracy for the corresponding color

transformed versions was 5%, 7%, 6%, and 8%. Overall, the 𝜇OLED had

a lower EER than the other three devices. A possible reason for this is

that the 𝜇OLED is only a display device and therefore does not have

any noise introduced into its power measurements and models by back-

ground components or processes that would be experienced by the

three smartphones.
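As a concrete reading of the metric (the exact formula is not given above, so this interpretation is an assumption), the EER can be computed as the relative error of each estimate against its measured ground truth, averaged over the sampled screenshots:

```python
def error_estimation_rate(estimated, measured):
    """Relative error of a dLens power estimate against the ground
    truth measured on the MPM. (Assumed formula; the text does not
    spell it out.)"""
    return abs(estimated - measured) / measured

# Hypothetical (estimate, ground truth) pairs in Watts for one app:
pairs = [(0.52, 0.50), (0.40, 0.42), (0.61, 0.60)]
avg_eer = sum(error_estimation_rate(est, gt) for est, gt in pairs) / len(pairs)
```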

To compute the DRA, a process similar to that of calculating the EER

was followed. However, instead of a random sample, the top-five ranked

screenshots were chosen. The top five were used because this rep-

resented a group of screenshots that would likely be similar in terms

of power consumption, and therefore, their relative ranking accuracy

would be more sensitive to the underlying EER than a random subset.

Also, the top five represented the set most likely to be used to guide

the developers and therefore is the most representative of the accu-

racy that an end user of dLens would experience in real usage. For these

top five, the ground truths of each screenshot and its color-transformed

version were calculated and then ranked by their power consump-

tion. Then this ranking was compared against the ranking computed

by the dLens tool. To compare the rankings, each ranking of the five

screenshots was treated as a vector and Spearman’s rank correlation

coefficient was used to compute the closeness of the vectors. The coef-

ficient gives a −1 or 1 when the two rankings are perfectly negatively

or positively correlated and is 0 when the two rankings are not corre-

lated. Across all of the applications, the average DRA for the 𝜇OLED, S2,
Nexus, and S5 was 0.83, 0.72, 0.60, and 0.89, respectively. These results

indicate that the rankings were not an exact match. The cases where

the ranking was not an exact match were investigated and were found

to be likely caused by the closeness of the power consumption mea-

surements of the screenshots in the top five. For example, it was typical

to see the top five separated by about a 2% difference in their overall

power consumption. This was well within the possible error range that

was measured for the EER for each device and was likely the reason for

this ranking variation.
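For rankings without ties, Spearman's coefficient has a simple closed form over rank differences; the sketch below assumes both rankings are permutations of the same screenshots:

```python
def spearman_rho(ranking_a, ranking_b):
    """Spearman's rank correlation for two tie-free rankings of the
    same items: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(ranking_a)
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(ranking_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

truth = ["s1", "s2", "s3", "s4", "s5"]
assert spearman_rho(truth, truth) == 1.0         # perfect agreement
assert spearman_rho(truth, truth[::-1]) == -1.0  # perfect disagreement
```

Note that swapping only the top two entries of a five-item ranking still yields a coefficient of 0.9, which is why coefficients well above zero indicate broadly consistent rankings.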

Overall, the results for RQ1 show that the dLens tool is very accu-

rate. For the EER, the estimated power was within 14% of the ground

truth for all devices. Even with a non-zero EER, dLens was able to accu-

rately rank the UIs as verified by the ground truth measurements. The

minor variations seen in the rankings could be attributed to the small
but non-zero EER. Both of these accuracy measurements are important

for developers, as they indicate that their design changes can be made

with a high degree of confidence in both the actual estimates and the

relative ranking of the UI screenshots.

4.4 RQ2: generalizability

This research question evaluates the generalizability of the results

computed by dLens. This research question addresses a potential limi-

tation to the usage of the approach. Namely, the screenshots captured

by the approach and the corresponding DEP reflect the power and

energy usage of UIs displayed on a particular set of devices. In prac-

tice, this set would be the device(s) the developer has available for


testing purposes. However, other devices are likely to vary in terms

of screen resolution and power consumption characteristics. To better

understand this potential limitation, this research question evaluates

how well the rankings for one device match the rankings that would

be computed for other devices using their own DEP. If the results of

the dLens approach are generalizable across mobile devices, develop-

ers can use the results from their own devices as a proxy for other or

similar devices.

To answer this research question, the similarity of the rankings gen-

erated by dLens for each of the mobile devices was compared. For

this experiment, dLens was run for each device (ie, using its own DEP)

on the set of all screenshots for each app. For each app, its screen-

shot rankings were compared against those computed for the other

devices. To compare the rankings, Spearman’s rank correlation coef-

ficient, explained in Section 4.3, was used. A pair-wise comparison of

the rankings generated for each of the four devices was performed. The

average Spearman’s correlation coefficient across all apps for each of

the devices is shown in Table 2.

The similarity of the top n entries of each ranking was also investi-

gated, since developers may only check the top entries instead of the

entire list. To do this, the top n entries of each device’s ranking were
treated as a set, and the overlap with the sets of the other devices was computed.

For each such comparison, the cardinality of the two
sets’ intersection was computed. This result is shown in Table 3 for n

equal to 5 and 10. The results show a high similarity in the top n of the

rankings. The average overlap between rankings is over 4 for the top 5

and over 9 for the top 10, demonstrating that this similarity applies to

the portion of the ranking most likely to be used by developers.
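The top-n overlap metric of Table 3 can be computed directly as a set intersection; the rankings below are invented for illustration:

```python
def top_n_overlap(ranking_a, ranking_b, n):
    """Number of screenshots common to the top-n entries of two
    device rankings (the overlap metric of Table 3)."""
    return len(set(ranking_a[:n]) & set(ranking_b[:n]))

# Two hypothetical device rankings that agree on 4 of their top 5:
oled_rank = ["s3", "s1", "s4", "s2", "s7", "s5", "s6"]
s2_rank = ["s1", "s3", "s2", "s4", "s9", "s7", "s5"]
overlap = top_n_overlap(oled_rank, s2_rank, 5)  # 4
```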

Overall, the results show that the rankings generated by dLens for

each of the mobile devices were, in fact, highly similar. The implica-

tions of this finding are that the results of running dLens on one device

are similar to those for other devices. More broadly, this indicates that

energy-reducing redesigns undertaken by developers based on results

from one device are likely to also reduce energy on other devices.

TABLE 2 Average Spearman’s correlation coefficient of rankings between devices

Base Device 𝜇OLED S2 Nexus S5

𝜇OLED - 0.9874 0.9849 0.9888

S2 0.9874 - 0.9985 0.9990

Nexus 0.9849 0.9985 - 0.9942

S5 0.9888 0.9990 0.9942 -

TABLE 3 Average common screenshots in top 5 and top 10 between devices

Top 5 Top 10

𝜇OLED vs S2 4.33 9.47

𝜇OLED vs Nexus 4.6 9.67

𝜇OLED vs S5 4.47 9.2

S2 vs Nexus 4.47 9.33

S2 vs S5 4.07 9

Nexus vs S5 4.27 9

TABLE 4 The differences between rankings with and without excluding ads

Top 5 overlap  Top 10 overlap  Rank correlation

Arcus Weather 4 6 0.8263

Drudge Report 5 9 0.9687

English for kids learning free 3 9 0.9822

Retro Camera 2 7 0.3908

Restaurant Finder 4 8 0.9745

4.5 RQ3: ad impact

This research question evaluates the impact of mobile ads on the rank-

ings reported by the dLens approach. Essentially, it is measuring the

impact on the rankings of the mechanism to exclude advertisements

described in step 1 (Section 2.1). To measure this impact, the dLens

approach was run twice, the first time excluding the ad portions of the

screenshots and the second time including the ad portions. The rank-

ings generated by the two variations of the dLens approach were then

compared. The results of this analysis are shown in Table 4. First, the

rankings of all of the screenshots were compared using Spearman’s

rank correlation coefficient. This is shown in the table as “Rank Cor-

relation.” The amount of overlap in the top n of the rankings was also

computed by treating the top n screenshots as a set and computing the

cardinality of the intersection of the two rankings. The results of this

comparison are shown in the columns labeled “Top N Overlap” where

N was set to 5 and 10. The results show that for all apps and com-

parisons except for one (top 5 for Drudge Report), ads affected the

rankings. The magnitude of this impact varied significantly. For Arcus

Weather and Retro Camera, the impact was significantly higher than

for the other apps, such as Drudge Report and Restaurant Finder. The

results differed due to a number of reasons, including the prevalence

of ads and the appearance of the rest of the UI. These results indicate

that although ads account for a small portion of the UI display, for some

apps, they can have a significant impact on the rankings of the DEHs

and thus it is useful to have a mechanism to exclude them from the

DEH calculations.

4.6 RQ4: analysis time

This research question addresses the time needed to analyze an app

using the dLens approach. The overall time needed to run the dLens

approach and the time for three of the steps (instrumentation, power

estimation, and color transformation) was measured. Note that for this

RQ, the time to replay the workload for each app was not included as

this time is under developer control.

The analysis times are shown in Table 5. Only the analysis time for

the S2 DEP was reported because the results were very similar for all

four devices. The time for the color transformation is shown as “TC”

and the time for power estimation is shown as “TE.” For the five apps

with ads, which required instrumentation, the required time is shown

as “TI.” The total time is shown as “All UIs”, which also includes time for file

processing, ranking, etc. Since each app varies in terms of the number

of screenshots in its workload, the time measurement was also normal-

ized by dividing the total time by the number of screenshots captured


TABLE 5 Analysis time of the dLens approach

Name  TC, s  TE, s  TI, s  All UIs, s  Per UI, s

Facebook 1470 7 - 1477 12

Facebook Messenger 997 3 - 1001 18

FaceQ 1145 5 - 1151 12

Instagram 2799 6 - 2806 30

Pandora internet radio 1418 4 - 1423 19

Skype 871 3 - 875 13

Snapchat 1444 8 - 1453 10

Super-Bright LED Flashlight 863 1 - 865 43

Twitter 1316 6 - 1323 13

WhatsApp Messenger 897 3 - 901 13

Arcus Weather 879 2 1 883 25

Drudge Report 1192 2 1 1195 48

English for kids learning free 1377 4 1 1382 32

Retro Camera 1899 3 1 1903 66

Restaurant Finder 1320 3 1 1324 32

Abbreviation: UI, user interface.

in each app’s workload. This value is shown in the column labeled “Per

UI.” All time measurements are shown in seconds.

The average time for dLens to analyze an app was 22 minutes and

ranged from 14 to 46 minutes. Although this amount can be considered

high, it was directly dependent on the overall size of the set of screen-

shots, which ranged from 20 to 142. Therefore, the per screenshot

number, which ranged from 12 to 66 seconds, is informative. For each

screenshot, it was observed that most of the time was taken by the

generation of the CTS. The runtime of the CTS algorithm is expo-

nential with respect to the number of colors present in a screenshot.

The Nyx approach uses an approximation algorithm to solve this

problem. Therefore, this aspect of the approach can be sped up,

if needed, by accepting lower quality approximations. However, this

may have a trade-off in terms of accuracy of the computed results

and rankings.

4.7 RQ5: potential impact

This research question investigated the potential impact of dLens in

two ways. The first was by determining how many market apps con-

tained DEHs, and the second was by computing the amount of energy

savings that could be realized by transforming the apps that contained

DEHs. This analysis was performed on both the subject apps listed in

Table 1 and on a much larger sample of 1082 random Android mar-

ket apps.

Because of the large number of apps, the evaluation process was

fully automated. The execution of the apps and the capture of their

initial home page was automated by using the adb [29] tool from the

Android software development kit. One challenge was to automatically

detect when the initial page of an app was valid (ie, finished loading).

This problem was solved with a simple heuristic: the screenshot was cap-

tured five seconds after the app started. For almost all of the apps, this

was sufficient time for the initial UI to load and display. To ensure that all

the apps had been executed successfully, all screenshots were manually

checked, and invalid screenshots, such as those representing crashed

apps, were removed. In total, screenshots of 962 apps were valid and

dLens was run on this set.

For the subject apps, the potential savings are shown in Figure 3. As

the figure shows, the savings varied from 0% to 28%. Flashlight’s poten-

tial savings were low because it has an almost all-black background,

which means it was already close to optimal and there were only minus-

cule improvements to be realized. For the market apps, dLens found

that 398 of the 962 apps contained DEHs. That means that for 41% of

the examined apps, their main page consumes more display power than

the optimized version. On average, the optimized versions would con-

sume 30% less energy than the original versions. For some apps, this

number was as large as 50%. For the apps with DEHs, Figure 4 shows

how much energy could be saved by using the optimized version. In this

figure, a point (X,Y) on the line means that there are X apps that could

consume at least Y% less energy than their original version.

The app with the largest potential energy saving was “Bible Study.”

This app’s original design and optimized design are shown in Figure 5.

In Bible Study, the original design uses a significant amount of white as

the background color, which consumes more energy than other colors.

However, by using black as the background color, as shown in Figure 5B,

a significant amount of energy could be saved. Information about the

top 10 most energy-inefficient apps is shown in Table 6.

The category distribution of the apps, per their app store category,

was also analyzed. The categories that involved extensive reading or

presented textual information, such as Communication and News &

Magazines, tended to have a higher ratio of apps with DEHs (above

43%). The categories that contained more video and graphical infor-

mation tended to have fewer apps with DEHs (below 20%). A possi-

ble explanation for this is that developers of text or reading oriented

apps prefer to use light colored backgrounds (eg, white) to mimic the

traditional print media colors (eg, newspapers) or because this color

combination is considered more readable. The increased readability of

light-colored apps is supported by evidence from user studies in prior

work [6], but this same study also showed that users would prefer

energy savings over a small decrease in readability. This suggests that

developers of these types of apps might improve user satisfaction by

optimizing the color of their apps’ UIs and explaining to end users the

energy related benefits of the revamped design.

The UI color choices for the apps were also investigated. The apps

that did not contain DEHs tended to use black (#000000) as the main

color. Here, a color is considered the main color of a screenshot if it cov-

ers the most pixels in the screenshot. On average, black covered 79.3%

of the pixels of the screenshots of those apps. Dimgray (#696969), dark-

slategray (#2F4F4F), darkgray (#A9A9A9), and gray (#808080) were

also commonly used as secondary colors in these apps. These colors

covered 8%, 5%, 3%, and 2% of the pixels, respectively, of the screen-

shots. The color combination black:darkgray:gray:white:dimgray with

the ratio 794:32:13:10:6 was the most commonly used combination

in apps without DEHs. It was used in 23% of apps without DEHs and

on average comprised 86% of the screenshots. In the apps with DEHs,

white, dimgray, and whitesmoke (#F5F5F5) were the three most pop-

ular main colors. They were used as the main colors in 42%, 12%, and

10% of the apps, respectively. On average, each of these colors covered

65%, 56%, and 55% of the pixels on the screen.
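The main-color analysis described above can be sketched by counting pixel values; the tiny synthetic screenshot is illustrative:

```python
from collections import Counter

def main_color(pixels):
    """Return the main color of a screenshot (the color covering the
    most pixels) and the fraction of pixels it covers."""
    counts = Counter(pixels)
    color, n = counts.most_common(1)[0]
    return color, n / len(pixels)

# A mostly black screenshot with a small dimgray (#696969) region:
pixels = [(0x00, 0x00, 0x00)] * 80 + [(0x69, 0x69, 0x69)] * 20
color, coverage = main_color(pixels)  # black, covering 80% of pixels
```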

The results of this RQ showed that many apps in the Android market

are not optimized in terms of display energy efficiency and that signif-

icant savings could be realized through their optimization. The results


FIGURE 3 The average estimated power savings of the subject apps

FIGURE 4 The number of apps with display energy hotspots


FIGURE 5 Transformed and original screenshots of the most energy-inefficient app


TABLE 6 The ten apps with the largest DEHs

App Package Name  Potential Power Savings (%)

biblereader.olivetree 50

com.amirnaor.happybday 49

com.adp.run.mobile 49

com.airbnb.android 49

com.darkdroiddevs.blondJokes 49

com.chapslife.nyc.traffic.free 49

com.al.braces 48

appinventor.ai_vishnuelectric.notextwhiledriving2_checkpoint1 48

appinventor.ai_freebies_freesamples_coupons.StoreCoupons 48

bazinga.emoticon 48

Abbreviation: DEH, display energy hotspot.

of this RQ also revealed basic trends and UI design patterns in both

apps with and without DEHs. These results show that dLens can gen-

erate useful and actionable information that can have a large potential

impact in terms of helping developers optimize the display energy for

their apps above and beyond the detection of DEHs.

4.8 Threats to validity

A possible threat to the external validity of the results is the selec-

tion of only the most popular apps and the workload generation. For

app selection, although the most popular apps may not be represen-

tative of all apps, choosing subjects in this way eliminated the threat

of selection bias for the subjects and made it possible to argue that

the approach is useful for even well-engineered apps. Furthermore, the

results of RQ5 show the general applicability of the approach to a wide

range of other apps. Even though the workloads were generated by this

paper’s authors, the workloads used the primary features of the apps,

which were easy to identify in all cases. It is important to note that the

usefulness of the approach does not depend on accurately identifying

typical or representative workloads, as the approach can be used for

any workload of interest.

A threat to the internal validity of the results could exist if the

mechanism of establishing ground truth or the approximation of the

CTS generated by the Nyx-based method was inaccurate. To ensure

the accuracy of the ground truth measurements, the protocols were

developed and tested to ensure that they reliably and accurately cap-

tured power measurements. The protocols are also based on prior

approaches [6,22,30]. The accuracy of the Nyx-based CTS has been

established in prior work [6].

A threat to internal validity is that the screenshots may contain

additional ECs whose colors should not be optimized. To determine the

magnitude of this threat to validity, a study was performed to quan-

tify the potential impact of not excluding dynamic images and text.

In this study, the screenshots of nine apps (representing 298 screen-

shots) were manually modified to remove the display area associated

with EC from the display energy calculation. Then the difference in

power between the original and modified versions of the screenshots

was calculated. The median power difference was 0.81% and the aver-

age difference was 1.8%. Although these numbers are small, they are

high enough to affect the rankings of some apps. Therefore, a failure to

properly exclude EC could lead to misranked screenshots.

A final threat to internal validity is that the approach assumes that

the CTSs generated by Nyx represent an aesthetically acceptable ver-

sion of the UIs. This assumption is reasonable based on prior studies

that measured end users’ assessment of the aesthetics of the generated

color schemes. In an end user study of the color schemes generated

by Nyx, it was found that although the readability and attractiveness

of the transformed UIs were rated slightly lower than the originals,

users overwhelmingly preferred the new color schemes when made

aware of the energy trade-offs [6]. More recent approaches that incor-

porate additional aesthetic constraints into the generation of color

schemes report even better preference results [20]. Taken together,

these results indicate that the automatically generated color schemes

represent reasonable proxies for an aesthetically acceptable version of

the analyzed UIs.

For RQ1, a threat to validity is that the accuracy of only the top-five

ranked screenshots and a random sample of the remaining screenshots

were measured. Sampling was used since the screenshot isolation pro-

cess described in Section 3 is very time consuming and it was not feasi-

ble to measure the power and energy of every screenshot. Nonetheless,

the experimental evaluation included over 1800 such measurements to

evaluate the accuracy of the four devices for all of the experiments

required by the RQs. The subset was chosen at random to eliminate any

potential selection bias, and there is no reason to believe that the

results would differ with a larger subset. The raw data is available via

the dLens project page [31].

For RQ2, there are two threats to validity. The first of these is a

threat to criterion validity for the selection of ranking as the metric

that was used to evaluate consistency across devices. An alterna-

tive would have been to use power measurements. However, rank-

ing is the more appropriate metric since developers would be likely

to prioritize their work based on rank instead of actual power dif-

ferences. Second, a threat to external validity for RQ2 is that only

four power models were used, and three of the corresponding devices are

manufactured by Samsung. Despite the results showing generalizability across

devices, it is likely that some phones may have DEP cost functions

that will result in different rankings due to different underlying hard-

ware mechanisms. Therefore, the results will not always generalize

to all devices. Future investigations into what enables more reliable


result generalizability will enable stronger conclusions to be made on

this aspect.

For RQ3, a threat to validity is that only the ranking differences for

five apps were computed. This was due to the limitations of reverse

engineering tools and the use of obfuscation techniques in the apps.

However, these five apps cover different app categories, and it is likely

that the results will be similar when considering a larger set of apps

containing ads.

5 RELATED WORK

This paper extends prior work [32] by the authors in the following ways:

(1) dLens was enhanced to exclude dynamic content or content

that should not be considered in the identification of the DEHs. For

advertisements, a fully automated mechanism was developed for iden-

tifying and excluding the portion of the screen they occupy. This mecha-

nism also allows developers to manually specify other areas to exclude

from the analysis. (2) The size of the evaluation was expanded by adding

five additional apps and adding an additional mobile phone platform,

the Samsung Galaxy S5, to evaluate the approach. Overall, these exten-

sions improve the accuracy of the technique and the generalizability of

the results.

The approach to building the DEP is based on research work per-

formed by Dong and colleagues [22,30]. In their work they constructed

a power model for a commercial QVGA OLED display module. In

their power model, they demonstrated the linear relationship between

power consumption and sRGB value and achieved 90% accuracy in

the display power estimation. Other power models for OLED screens

have also been proposed and could also be used to implement the DEP.

Zhang [11] built a quadratic model for OLED screens, which increased

estimation accuracy. Kim and colleagues [33] modified Dong and col-

leagues’ model by considering the brightness and sum of RGB values

for AMOLED screens. Mittal and colleagues [34] refined a power model

based on threshold of RGB component values. However, these works

only focused on developing new techniques for modeling the power

consumption of OLED screens and did not detect DEHs or give

optimization guidance to developers.
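As a minimal sketch of this family of linear models, the following code estimates average display power from a screenshot's pixels. The channel coefficients and base term are placeholders, not the fitted values from any of the cited works; real coefficients are obtained by regressing measured power against solid-color screens on a specific device:

```python
# Sketch of a linear, per-channel OLED display power model in the style
# of Dong and colleagues. Coefficients are hypothetical placeholders.

C_R, C_G, C_B = 0.30, 0.42, 0.68  # watts for a full screen of one channel
C_BASE = 0.12                      # static display overhead in watts

def pixel_power(r, g, b):
    """Estimated power contribution of one pixel, channels in [0, 255]."""
    return C_R * (r / 255) + C_G * (g / 255) + C_B * (b / 255)

def screen_power(pixels):
    """Average power (W) for a screenshot given an iterable of (r, g, b).

    The per-pixel sum is normalized by the pixel count, so the channel
    coefficients represent a full screen of a single solid color.
    """
    n = 0
    total = 0.0
    for r, g, b in pixels:
        total += pixel_power(r, g, b)
        n += 1
    return C_BASE + (total / n if n else 0.0)

# On OLED, a white screen draws more than a black screen under this model:
white = screen_power([(255, 255, 255)] * 4)
black = screen_power([(0, 0, 0)] * 4)
assert white > black
```

A quadratic model such as Zhang's [11] would replace the linear per-channel terms with fitted polynomial terms; the surrounding structure stays the same.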

Many approaches have been proposed to help developers reduce

the energy consumption of UIs by manipulating their colors. The dLens

approach builds on prior work in the area, Nyx [6,19,35], to conduct

step 3 of the approach. Linares-Vásquez and colleagues [20] showed

that the same problem could be solved using a multi-objective genetic

algorithm–based approach. Prior to Nyx, Dong and colleagues [22,36]

proposed a method to generate CTSs and then later used this to build

a color transforming browser that could reduce the energy of the web

pages it displayed [9,10]. On the basis of a screen space variant energy

model, Chuang and colleagues [37] presented an approach to gener-

ate energy-efficient color designs by using iso-lightness colors. Wang

and colleagues [38] proposed a technique to find an energy-saving

color scheme for sequential data on OLED screens. Kamijoh and col-

leagues reduced the display energy of the IBM Wristwatch by reducing

the number of white pixels [39]. All of these techniques assume that a

DEH has been located. As such, these techniques can be seen as com-

plementary to the approach described in this paper. Once a DEH has

been identified by dLens, these techniques can help guide developers

to choose new color schemes that can reduce energy while maintaining

the aesthetics of the app’s UI.

Other proposed ways to save display energy include darkening the

unimportant parts of a screen. Iyer and colleagues [40] proposed a

method to reduce the energy consumption of OLED screens by dark-

ening the user-unfocused areas. Wee and colleagues [41] designed an

approach to reduce the power consumption of gaming on OLED dis-

plays by dimming noninteresting parts. Tan and colleagues [42] also

proposed a tool called Focus to reduce OLED display power by dimming

less important regions. Chen and colleagues [43] reduced OLED dis-

play power by using a dimming scheme to eliminate undesired details.

Lin and colleagues [44] provided an OLED power saving technique by

distorting image regions according to visual attention level. For LCD

displays, energy optimization is mainly achieved by reducing external

brightness [45,46] or refresh rates [47] and adapting the content to the

brightness change [48].

Sampson and colleagues [49] developed a tool called WebChar to

evaluate browser performance and the energy consumption of differ-

ent code features in HTML and CSS, which guides developers to opti-

mize browsers or web applications. This work focused on the impact of

HTML and CSS code structure on energy consumption, whereas dLens

focuses on UI color design.

Many approaches have also been proposed to model the power

consumption of other components in mobile devices (ie, nondisplay

components). Shye and colleagues [50] obtained power models for

all components of an Android G1 based on measurements of a log-

ger they developed. Negri and colleagues [51] treated applications as

Finite-state Machines (FSMs) and built power models through mea-

surement of selected states. Several approaches [11,52,53] acquired

power models automatically based on the battery behavior of mobile

devices. Other techniques estimate energy consumption based on dif-

ferent models. Previous work [14,15,54] estimated energy consump-

tion at the source line level. Tiwari and colleagues [55,56] modeled the

CPU energy of hardware instructions. Eprof [57] modeled energy with

a state machine. Wang and colleagues [58] estimated the power con-

sumption of mobile applications with profile-based battery traces. Li

and colleagues [59] proposed Bugu, an application level power profiler

and analyzer for mobile phones. Tsao and colleagues [60] estimated the

energy consumption of I/O requests in application processes. Xiao and

colleagues [61] proposed a methodology to build system-level power

models for different components of mobile devices without measure-

ments.

In addition, many other researchers have empirically investigated

the energy consumption of Android apps [4], and the energy impacts

of mobile advertisement [62,63], software changes [64,65,66,67], user

choice of applications [68], blocking advertisements [69], refactor-

ing [70,71], obfuscation [72,73], and test suite selection [74]. Other

researchers have proposed new techniques to detect different kinds

of energy bugs. Pathak and colleagues defined energy bugs [75] and

proposed an automatic technique to detect energy bugs on smart-

phones [7]. However, they only focused on identifying “wakelock bugs.”

Linares-Vásquez and colleagues [76] studied energy-greedy API usage in Android

apps. Liu and colleagues [77] proposed an automated approach to

detect energy bugs that did not deactivate sensors and misused sensor


data. Other researchers have focused on optimization of mobile apps.

Kwon and colleagues [78] optimized the energy consumption by

offloading partial app functionality to the cloud. Li and colleagues

[79,80] used a proxy server to optimize the HTTP requests of mobile

apps. For Green Mining, Romansky and colleagues [81] introduced a

search-based approximation method to accelerate harvesting energy

profiles while minimizing the accuracy loss. In addition, Hindle and col-

leagues [82] proposed a dedicated hardware Mining Software Repos-

itories (MSR)-based test harness called Green Miner to facilitate the

research of Green Mining.

6 CONCLUSIONS

This paper presented a new technique for detecting DEHs in mobile

apps. A DEH is defined as a UI of a mobile app whose energy con-

sumption is higher than that of an energy-optimized but functionally

equivalent UI. The approach detects DEHs with five steps. First, the

approach processes the target app and instruments mobile ads so that

their location can be identified at runtime. Second, the approach exe-

cutes a developer specified workload for the target app on a mobile

device and captures screenshots of the display and ad related informa-

tion. Third, the approach generates a CTS for each captured screen-

shot and produces an approximation of an energy-efficient version.

Fourth, it analyzes each screenshot and its alternative version to deter-

mine their expected display energy. Finally, the approach calculates the

power and energy difference between the two versions of screenshots

and provides developers with a list of DEHs. The approach was imple-

mented in a prototype tool, dLens, and evaluated on real-world popular

mobile apps. The results of the evaluation showed that the approach

was able to accurately estimate display power consumption and rank

screenshots with DEHs. Furthermore, the technique was able to detect

DEHs in 398 of the 962 Android market apps. Overall, the results are very

promising and indicate that the approach is a viable and potentially

impactful technique for helping developers to reduce the energy

consumption of their mobile apps.
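The final step of the pipeline described above can be sketched as a simple ranking over per-screenshot power estimates. The function name, input format, and savings threshold here are illustrative assumptions, not dLens's actual interface; the power values would come from a device-specific display power model applied to each original screenshot and its CTS-transformed counterpart:

```python
# Hypothetical sketch of the DEH ranking step: given estimated power
# for each captured screenshot and its color-transformed (CTS) version,
# report screenshots whose potential savings exceed a threshold,
# ordered so developers can prioritize the worst offenders.

def rank_hotspots(screens, min_savings=0.05):
    """screens: iterable of (name, original_watts, transformed_watts)."""
    hotspots = []
    for name, orig_w, cts_w in screens:
        savings = (orig_w - cts_w) / orig_w  # fraction of power recoverable
        if savings >= min_savings:
            hotspots.append((name, savings))
    # Largest potential savings first.
    return sorted(hotspots, key=lambda h: h[1], reverse=True)

report = rank_hotspots([
    ("login", 1.20, 0.70),     # ~42% savings -> hotspot
    ("settings", 0.90, 0.88),  # ~2% savings -> below threshold
    ("feed", 1.00, 0.80),      # 20% savings -> hotspot
])
# hotspots ranked: "login" first (~42% savings), then "feed" (20%)
```

Ranking by relative savings rather than absolute watts matches the paper's observation that developers prioritize by rank, and it is what made the rankings largely consistent across the four evaluated devices.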

ACKNOWLEDGEMENTS

This work was supported, in part, by the National Science Foundation

under grant numbers CCF-1321141 and CCF-1619455.

REFERENCES

1. Apple Inc., Press Releases. http://www.apple.com/pr/library.

2. C. Warren, Google Play Hits 1 Million Apps. Mashable. Retrieved 4 June 2014.

3. T. Ahonen, Let's Do 2014 Numbers for the Mobile Industry: Now We Are at 100% Mobile Subscription Penetration Rate Per Capita Globally. Retrieved May 12, 2014.

4. D. Li et al., An empirical study of the energy consumption of android applications, Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on, Washington, DC, USA, 2014, pp. 121–130.

5. D. Li and W. G. J. Halfond, An investigation into energy-saving programming practices for android smartphone app development, Proceedings of the 3rd International Workshop on Green and Sustainable Software, GREENS 2014, ACM, New York, NY, USA, 2014, pp. 46–53.

6. D. Li, A. H. Tran, and W. G. J. Halfond, Making web applications more energy efficient for OLED smartphones, Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, ACM, New York, NY, USA, 2014, pp. 527–538.

7. A. Pathak et al., What is keeping my phone awake? Characterizing and detecting no-sleep energy bugs in smartphone apps, Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services, MobiSys ’12, ACM, New York, NY, USA, 2012, pp. 267–280.

8. A. Barredo, A comprehensive look at smartphone screen size statistics and trends. Medium.

9. M. Dong and L. Zhong, Chameleon: a color-adaptive web browser for mobile OLED displays, Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys ’11, ACM, New York, NY, USA, 2011, pp. 85–98.

10. M. Dong and L. Zhong, Chameleon: a color-adaptive web browser for mobile OLED displays, IEEE Trans. Mobile Comput. 11 (2012), no. 5, 724–738.

11. L. Zhang, Power, performance modeling and optimization for mobile system and applications, Ph.D. Thesis, 2013.

12. C. Sahin et al., Initial explorations on design pattern energy usage, Proceedings of the First International Workshop on Green and Sustainable Software, GREENS ’12, IEEE Press, Piscataway, NJ, USA, 2012, pp. 55–61.

13. I. Manotas, L. Pollock, and J. Clause, SEEDS: A software engineer's energy-optimization decision support framework, Proceedings of the 36th International Conference on Software Engineering, ICSE 2014, ACM, New York, NY, USA, 2014, pp. 503–514.

14. D. Li et al., Calculating source line level energy information for android applications, Proceedings of the 2013 International Symposium on Software Testing and Analysis, ISSTA 2013, ACM, New York, NY, USA, 2013, pp. 78–89.

15. S. Hao et al., Estimating mobile application energy consumption using program analysis, Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, IEEE Press, Piscataway, NJ, USA, 2013, pp. 92–101.

16. I. J. M. Ruiz et al., Impact of Ad libraries on ratings of android mobile apps, IEEE Software 31 (2014), no. 6, 86–92.

17. S. Hao et al., PUMA: Programmable UI-automation for large-scale dynamic analysis of mobile apps, Proceedings of the 12th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’14, ACM, New York, NY, USA, 2014, pp. 204–217.

18. L. Gomez et al., RERAN: Timing- and touch-sensitive record and replay for android, Proceedings of the 2013 International Conference on Software Engineering, ICSE ’13, IEEE Press, Piscataway, NJ, USA, 2013, pp. 72–81.

19. D. Li, A. H. Tran, and W. G. J. Halfond, Nyx: A Display Energy Optimizer for Mobile Web Apps, Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, ACM, New York, NY, USA, 2015, pp. 958–961.

20. M. Linares-Vásquez et al., Optimizing energy consumption of GUIs in android apps: A multi-objective approach, Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ESEC/FSE 2015, ACM, New York, NY, USA, 2015, pp. 143–154.

21. Recognized color keyword names. http://www.w3.org/TR/SVG/types.html.

22. M. Dong and L. Zhong, Power modeling and optimization for OLED displays, IEEE Trans. Mobile Comput. 11 (2012), no. 9, 1587–1599.

23. Monsoon Solutions, Inc., Power Monitor. https://www.msoon.com/LabEquipment/PowerMonitor/.

24. DEP coefficients. https://sites.google.com/site/dlensproject/dep-coefficients.

25. A tool for reverse engineering Android apk files. https://ibotpeaches.github.io/Apktool/.

26. dex2jar. https://github.com/pxb1988/dex2jar.

27. ASM. http://asm.ow2.org/.

28. Android Screenshots and Screen Capture. http://sourceforge.net/projects/ashot/.

29. Google Inc., Android Debug Bridge. http://developer.android.com/tools/help/adb.html.


30. M. Dong, Y-SK Choi, and L. Zhong, Power modeling of graphical user interfaces on OLED displays, Proceedings of the 46th Annual Design Automation Conference, DAC ’09, ACM, New York, NY, USA, 2009, pp. 652–657.

31. dLens. https://sites.google.com/site/dlensproject/.

32. M. Wan et al., Detecting display energy hotspots in android apps, 2015 IEEE 8th International Conference on Software Testing, Verification and Validation (ICST), Washington, DC, USA, 2015, pp. 1–10.

33. D. Kim, W. Jung, and H. Cha, Runtime power estimation of mobile AMOLED displays, Design, Automation and Test in Europe Conference and Exhibition (DATE), Washington, DC, USA, 2013, pp. 61–64.

34. R. Mittal, A. Kansal, and R. Chandra, Empowering developers to estimate app energy consumption, Proceedings of the 18th Annual International Conference on Mobile Computing and Networking, Mobicom ’12, ACM, New York, NY, USA, 2012, pp. 317–328.

35. D. Li, A. H. Tran, and W. G. J. Halfond, Optimizing display energy consumption for hybrid android apps (invited talk), Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015, ACM, New York, NY, USA, 2015, pp. 35–36.

36. M. Dong, Y-SK Choi, and L. Zhong, Power-saving color transformation of mobile graphical user interfaces on OLED-based displays, Proceedings of the 2009 ACM/IEEE International Symposium on Low Power Electronics and Design, ISLPED ’09, ACM, New York, NY, USA, 2009, pp. 339–342.

37. J. Chuang, D. Weiskopf, and T. Möller, Energy aware color sets, Comput. Graphics Forum 28 (2009), no. 2, 203–211.

38. J. Wang, X. Lin, and C. North, GreenVis: energy-saving color schemes for sequential data visualization on OLED displays, 2012.

39. N. Kamijoh et al., Energy trade-offs in the IBM wristwatch computer, Proceedings of the 5th IEEE International Symposium on Wearable Computers, ISWC ’01, IEEE Computer Society, Washington, DC, USA, 2001, pp. 133–140.

40. S. Iyer et al., Energy-adaptive display system designs for future mobile environments, Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, MobiSys ’03, ACM, New York, NY, USA, 2003, pp. 245–258.

41. T. K. Wee and R. K. Balan, Adaptive display power management for OLED displays, Proceedings of the First ACM International Workshop on Mobile Gaming, MobileGames ’12, ACM, New York, NY, USA, 2012, pp. 25–30.

42. K. W. Tan et al., FOCUS: A usable & effective approach to OLED display power management, Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing, UbiComp ’13, ACM, New York, NY, USA, 2013, pp. 573–582.

43. H. Chen et al., An image-space energy-saving visualization scheme for OLED displays, Comput. Graphics (2014).

44. C. H. Lin, C-K Kang, and P. C. Hsiu, Catch your attention: Quality-retaining power saving on mobile OLED displays, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), New York, NY, USA, 2014, pp. 1–6.

45. H. Shim, N. Chang, and M. Pedram, A backlight power management framework for battery-operated multimedia systems, IEEE Design Test of Comput. 21 (2004), no. 5, 388–396.

46. A. Iranli and M. Pedram, DTM: Dynamic tone mapping for backlight scaling, Proceedings of the 42nd Design Automation Conference, New York, NY, USA, 2005, pp. 612–616.

47. A. K. Bhowmik and R. J. Brennan, System-level display power reduction technologies for portable computing and communications devices, Portable Information Devices, 2007 (Portable07), IEEE International Conference on, Washington, DC, USA, 2007, pp. 1–5.

48. A. Iranli, H. Fatemi, and M. Pedram, HEBS: Histogram equalization for backlight scaling, Proceedings of the Conference on Design, Automation and Test in Europe - Volume 1, DATE ’05, IEEE Computer Society, Washington, DC, USA, 2005, pp. 346–351.

49. A. Sampson et al., Automatic discovery of performance and energy pitfalls in HTML and CSS, Workload Characterization (IISWC), 2012 IEEE International Symposium on, Washington, DC, USA, 2012, pp. 82–83.

50. A. Shye, B. Scholbrock, and G. Memik, Into the wild: Studying real user activity patterns to guide power optimizations for mobile architectures, Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 42, ACM, New York, NY, USA, 2009, pp. 168–178.

51. L. Negri, D. Barretta, and W. Fornaciari, Application-level power management in pervasive computing systems: A case study, Proceedings of the 1st Conference on Computing Frontiers, CF ’04, ACM, New York, NY, USA, 2004, pp. 78–88.

52. L. Zhang et al., Accurate online power estimation and automatic battery behavior based power model generation for smartphones, Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2010 IEEE/ACM/IFIP International Conference on, New York, NY, USA, 2010, pp. 105–114.

53. M. Dong and L. Zhong, Self-constructive high-rate system energy modeling for battery-powered mobile systems, Proceedings of the 9th International Conference on Mobile Systems, Applications, and Services, MobiSys ’11, ACM, New York, NY, USA, 2011, pp. 335–348.

54. S. Hao et al., Estimating android applications' CPU energy usage via bytecode profiling, Proceedings of the First International Workshop on Green and Sustainable Software, GREENS ’12, IEEE Press, Piscataway, NJ, USA, 2012, pp. 1–7.

55. V. Tiwari, S. Malik, and A. Wolfe, Power analysis of embedded software: A first step towards software power minimization (1994), 384–390.

56. V. Tiwari et al., Instruction level power analysis and optimization of software, VLSI Design, 1996, Proceedings, Ninth International Conference on, Washington, DC, USA, 1996, pp. 326–328.

57. A. Pathak, Y. C. Hu, and M. Zhang, Where is the energy spent inside my app?: fine grained energy accounting on smartphones with Eprof, Proceedings of the 7th ACM European Conference on Computer Systems, EuroSys ’12, ACM, New York, NY, USA, 2012, pp. 29–42.

58. C. Wang et al., Power estimation for mobile applications with profile-driven battery traces, Low Power Electronics and Design (ISLPED), 2013 IEEE International Symposium on, Washington, DC, USA, 2013, pp. 120–125.

59. Y. Li, H. Chen, and W. Shi, Power behavior analysis of mobile applications using bugu, Sustainable Comput. Inf. Syst. (2014).

60. S-L Tsao, C-K Yu, and Y-H Chang, Architecture of Computing Systems – ARCS 2013: 26th International Conference, Prague, Czech Republic, February 19-22, 2013, Proceedings, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 195–206.

61. Y. Xiao et al., A system-level model for runtime power estimation on mobile devices, Green Computing and Communications (GREENCOM), 2010 IEEE/ACM Int'l Conference on Int'l Conference on Cyber, Physical and Social Computing (CPSCom), Washington, DC, USA, 2010, pp. 27–34.

62. J. Gui et al., Truth in advertising: the hidden cost of mobile Ads for software developers, Proceedings of the 37th International Conference on Software Engineering - Volume 1, ICSE ’15, IEEE Press, Piscataway, NJ, USA, 2015, pp. 100–110.

63. J. Gui et al., Lightweight measurement and estimation of mobile Ad energy consumption, Proceedings of the 5th International Workshop on Green and Sustainable Software, GREENS ’16, ACM, New York, NY, USA, 2016, pp. 1–7.

64. A. Hindle, Green mining: Investigating power consumption across versions, 2012 34th International Conference on Software Engineering (ICSE), Washington, DC, USA, 2012, pp. 1301–1304.

65. A. Hindle, Green mining: A methodology of relating software change to power consumption, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), Washington, DC, USA, 2012, pp. 78–87.

66. K. Aggarwal et al., The power of system call traces: Predicting the software energy consumption impact of changes, Proceedings of 24th Annual International Conference on Computer Science and Software Engineering, CASCON ’14, IBM Corp., Riverton, NJ, USA, 2014, pp. 219–233.


67. A. Hindle, Green Mining: a Methodology of Relating Software Change and Configuration to Power Consumption, Empirical Software Eng. 20 (2015), no. 2, 374–409.

68. C. Zhang, A. Hindle, and D. M. German, The impact of user choice on energy consumption, IEEE Software 31 (2014), no. 3, 69–75.

69. K. Rasmussen, A. Wilson, and A. Hindle, Green mining: Energy consumption of advertisement blocking methods, Proceedings of the 3rd International Workshop on Green and Sustainable Software, GREENS 2014, ACM, New York, NY, USA, 2014, pp. 38–45.

70. C. Sahin, L. Pollock, and J. Clause, How do code refactorings affect energy usage?, Proceedings of the 8th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, ESEM ’14, ACM, New York, NY, USA, 2014, pp. 36:1–36:10.

71. W. G. da Silva et al., Evaluation of the impact of code refactoring on embedded software efficiency, Proceedings of the 1st Workshop de Sistemas Embarcados, Porto Alegre, Brazil, 2010, pp. 145–150.

72. C. Sahin et al., How Does Code Obfuscation Impact Energy Usage? Proceedings of the 2014 IEEE International Conference on Software Maintenance and Evolution, ICSME ’14, IEEE Computer Society, Washington, DC, USA, 2014, pp. 131–140.

73. C. Sahin et al., How does code obfuscation impact energy usage? J. Software: Evol. Process (2016).

74. D. Li et al., Integrated energy-directed test suite optimization, Proceedings of the 2014 International Symposium on Software Testing and Analysis, ISSTA 2014, ACM, New York, NY, USA, 2014, pp. 339–350.

75. A. Pathak, Y. C. Hu, and M. Zhang, Bootstrapping energy debugging on smartphones: A first look at energy bugs in mobile devices, Proceedings of the 10th ACM Workshop on Hot Topics in Networks, HotNets-X, ACM, New York, NY, USA, 2011, pp. 5:1–5:6.

76. M. Linares-Vásquez et al., Mining energy-greedy API usage patterns in android apps: An empirical study, Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, ACM, New York, NY, USA, 2014, pp. 2–11.

77. Y. Liu et al., GreenDroid: automated diagnosis of energy inefficiency for smartphone applications, IEEE Trans. Software Eng. 40 (2014), no. 9, 911–940.

78. Y-W Kwon and E. Tilevich, Reducing the energy consumption of mobile applications behind the scenes, Proceedings of the 2013 IEEE International Conference on Software Maintenance, ICSM ’13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 170–179.

79. D. Li and W. G. J. Halfond, Optimizing energy of HTTP requests in android applications, Proceedings of the 3rd International Workshop on Software Development Lifecycle for Mobile, DeMobile 2015, ACM, New York, NY, USA, 2015, pp. 25–28.

80. D. Li et al., Automated energy optimization of HTTP requests for mobile applications, Proceedings of the 38th International Conference on Software Engineering, ICSE ’16, ACM, New York, NY, USA, 2016, pp. 249–260.

81. S. Romansky and A. Hindle, On improving green mining for energy-aware software analysis, Proceedings of 24th Annual International Conference on Computer Science and Software Engineering, CASCON ’14, IBM Corp., Riverton, NJ, USA, 2014, pp. 234–245.

82. A. Hindle et al., GreenMiner: A hardware based mining software repositories software energy consumption framework, Proceedings of the 11th Working Conference on Mining Software Repositories, MSR 2014, ACM, New York, NY, USA, 2014, pp. 12–21.

How to cite this article: Wan M, Jin Y, Li D, Gui J,

Mahajan S, Halfond WGJ. Detecting display energy hotspots

in Android apps. Softw Test Verif Reliab. 2017;27:e1635.

https://doi.org/10.1002/stvr.1635