Smart prevention of unintended · 2020. 5. 18. · Everything Android application draw is rendered...

Attribution-NonCommercial-ShareAlike 2.0 Korea. Under the following conditions, you are free to copy, distribute, transmit, display, perform, and broadcast this work, and to produce derivative works: for any reuse or distribution, you must clearly state the license terms applied to this work; these conditions may be waived with separate permission from the copyright holder. Your rights under copyright law are not affected by the above. This is a human-readable summary of the Legal Code. Disclaimer. Attribution: you must credit the original author. NonCommercial: you may not use this work for commercial purposes. ShareAlike: if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license terms.




Master's Thesis in Engineering

Smart prevention of unintended latency on mobile devices: a machine learning based approach

A machine learning based scheduling method for improving mobile game performance

December 2019

The Graduate School, Seoul National University

Department of Computer Science and Engineering

Dongwan Kang


Smart prevention of unintended latency on mobile devices: a machine learning based approach

Advisor: Professor Chang-Gun Lee

Submitting this thesis as a Master's thesis in Engineering

December 2019

The Graduate School, Seoul National University

Department of Computer Science and Engineering

Dongwan Kang

The Master's thesis of Dongwan Kang is hereby approved.

December 2019

Chair: Soonhoi Ha (Seal)

Vice Chair: Chang-Gun Lee (Seal)

Member: Jaejin Lee (Seal)


Abstract

Smart prevention of unintended latency on mobile devices: a machine learning based approach

Dongwan Kang

Department of Computer Science and Engineering

The Graduate School

Seoul National University

Over the past few decades, we have seen tremendous improvements in the computing power of mobile devices. Multi-tasking across a variety of applications, as well as high-resolution, colorful graphic games, is no longer exclusive to desktop computers. Nevertheless, battery technology has not improved at the pace of computing power, and batteries remain a challenge that developers must keep working on. The big.LITTLE architecture and many other technologies have been developed to address the battery problem. The Linux kernel's EAS is a powerful solution that many smartphone manufacturers already apply to their products for energy-efficient operation. EAS achieves both performance and efficiency by allocating just the right amount of CPU resources to execute each task and freeing up extra CPU resources. However, the traditional approach to calculating the computing resources needed to execute each task is purely statistical and does not take all the relevant factors into account, so stable game performance cannot always be guaranteed.

In this paper, we discuss the performance degradation caused by EAS's scheduling method and demonstrate how our machine learning model can improve performance in mobile games.

keywords: Machine Learning, Prediction, Performance, Smart Phone, Scheduling

Student Number: 2018-25193

Contents

1 Introduction
2 Background and Motivation
   2.1 EAS Overview
   2.2 Android Graphic Overview
   2.3 Motivation of Our approach
3 Predictive Model and Execution
   3.1 Overview of Our Approach
   3.2 Training the Predictor
   3.3 Input features
   3.4 Runtime Deployment
   3.5 Model Overhead and Control Time
4 Experiment Setup
   4.1 Experiment Platform
   4.2 Evaluation Methodology
5 Experiment Evaluation
   5.1 Evaluation Results
   5.2 Analysis
6 Related Work
   6.1 Energy-efficient computing in HMP
   6.2 Energy reduction via DVFS technique
   6.3 Class Imbalance Learning
   6.4 LSTM networks
7 Conclusion
References

List of Figures

1 EAS building blocks in relation to Linux scheduler
2 BufferQueue communication process
3 An example of Frame Jitter and Frame Drop
4 Our approach for predicting and scheduling
5 A training process of our prediction model
6 A working process of our predictor on device
7 An Execution Cycle of Scheduler and Our Model
8 Evaluation for (a) Drop rate, (b) Jitter rate, (c) energy consumption, and (d) ESP against EAS scheduler
9 Evaluation for (a) Prediction accuracy, (b) Drop rate and (c) Jitter rate against Oracle scheduler and EAS
10 Confusion matrix in classification problems and our accuracy metric
11 Prediction accuracy for (a) Jitter and (b) Drop under default threshold, (c) Jitter and (d) Drop under threshold - 0.2
12 Correlation matrix plot of input features
13 Internal structure of LSTM network


1 Introduction

Mobile gaming is an activity that billions of mobile users perform every day, and battery life is a major concern for those users. Many game players have experienced their phone shutting down due to a low battery at the most inconvenient times. Therefore, the need for better power efficiency in mobile devices, which implies lower power consumption and higher performance, is growing. Recently, the big.LITTLE architecture was introduced to fulfill this demand, and a few solutions have been developed to exploit it. Energy Aware Scheduling (EAS) in the Linux kernel is one of the most powerful solutions for energy-efficient processing. It finds the most appropriate CPU core for the current task to maximize performance, and achieves better energy efficiency through tight integration with the big.LITTLE architecture and the frequency-scaling subsystem. For example, EAS finds the lowest CPU frequency subject to the workload constraints of the tasks to be scheduled. However, this operation considers only the amount of each task's workload, without knowledge of UI tasks' real-time constraints such as frame time (the deadline for drawing), and causes occasional frame misses for some graphic tasks. Thus, despite having the latest high-performance smartphones, many mobile users often have bad experiences while playing games.

This paper describes a machine learning based approach as an enhancement to EAS. Our machine learning based model determines whether the current scheduling can execute game UI tasks while complying with their real-time constraints. The predictor makes this decision by feeding multiple input features of the kernel scheduler, such as workload and CPU capacity, into an ML model. In other words, our model predicts whether an unintended frame drop or frame latency will occur over the next window (time interval). If we can predict it in advance, we can also prevent it. Our technique is implemented as an extension to Android's Linux kernel. For prediction, we developed a supervised machine learning model based on LSTM and trained it using the 100 most popular mobile games from the Google Play Store.

Our approach achieves about 80% of the maximum UI smoothness that an ideal predictive model can provide. Compared to the Linux EAS scheduler, it adds only 1.6% more energy consumption on average, while delivering an average 50% improvement in UI smoothness over native Android EAS.

2 Background and Motivation

2.1 EAS Overview

The Completely Fair Scheduler (CFS) implements a throughput-oriented task placement policy; EAS adds an energy-based policy on top of it. EAS optimizes energy savings while intelligently managing the CPU's spare capacity and task placement. A major part of EAS is the algorithm responsible for task placement and CPU frequency decisions. Among the various scheduling options, this algorithm selects the one with the highest energy-efficiency metric. Calculating the metric, however, is neither simple nor universal across devices, because it requires estimating the current consumption of each scheduling option the scheduler could choose to give optimal performance. For this calculation, EAS uses a hardware-dependent table called an Energy Model (EM). The EM is composed of a power cost table per performance domain, which is a group of CPUs whose performance is scaled together. EAS is therefore closely integrated with the Energy Model provided by each platform.

The scheduler is best positioned to estimate the input features of our model, such as a task's load profile, since it controls where tasks are placed and how much computing resource each task requests. So, the main part of gathering input data for the machine learning model in our approach is based on the kernel scheduler. Our model also schedules through the associated subsystems, such as SchedUtil and SchedTune, just as the EAS scheduler does.

Figure 1: EAS building blocks in relation to Linux scheduler

WALT

The load tracking mechanism was introduced to estimate the nature of a task numerically. It calculates task load as the amount of time the task was runnable during the time it was alive. Window Assisted Load Tracking (WALT) is one such load tracking mechanism. WALT uses periodic calculations, synchronized across all run queues, that attempt to track all scheduling behaviour, such as task demand and CPU utilization.

Schedutil

Schedutil is a scheduler-driven CPU frequency governor in the Linux kernel. It performs load-based dynamic voltage and frequency scaling (DVFS), letting the scheduler choose the frequency at which the CPU should run in the near future. This promotes more accurate frequency selection and therefore better servicing of the current load and utilization.

SchedTune

SchedTune enables special-case compute reservation for groups of tasks using cgroups, while also considering the energy impact. It is aimed at runtimes with high visibility of tasks' compute requirements by way of a priority task classification.

2.2 Android Graphic Overview

Everything an Android application draws is rendered onto a "surface". The surface represents the producer side of a buffer queue that is typically consumed by SurfaceFlinger, the system service that composites all visible surfaces onto the display. In the Android graphics system, each application is an image stream producer that produces graphic buffers for consumption, and the most common consumer of those image streams is SurfaceFlinger. BufferQueues provide the glue between the producer and the consumer: a pair of queues that mediate the constant cycle of buffers from producer to consumer, as shown in Figure 2. Once the producers hand off their buffers, SurfaceFlinger is responsible for compositing everything onto the display. Android graphics uses double buffering so that drawing shows no (or less) stutter and tearing.

Figure 2: BufferQueue communication process

VSYNC synchronizes the time apps wake up to start rendering, the time SurfaceFlinger wakes up to composite the screen, and the display refresh cycle. If the producer is too fast and creates buffers faster than they are drained, it blocks and waits for free buffers. If the producer does not create a buffer within the VSYNC interval, or is slower than the drain rate, the consumer has no choice but to compose the screen with old frames, or cannot compose anything onto the display for a while. The frequency at which SurfaceFlinger updates frames is called Frames Per Second (FPS), which is typically used as a metric to compare graphics performance (higher FPS means better graphics performance). Generally the refresh rate is fixed at 60 Hz or 30 Hz. If an app has a 60 Hz refresh rate, the maximum FPS is 60 and a new frame should be drawn every 16.67 ms. Under an intensive graphics workload, the frame drawing time can often exceed 16.67 ms, and FPS drops below 60.

We define a "frame drop" as the case where SurfaceFlinger does not update anything on the display, and a "jitter" as the case where an old image is drawn on the screen. We count a jitter when a UI thread fails to finish queueing a SurfaceView within the VSYNC interval, and a drop when SurfaceFlinger does not dequeue any SurfaceView.
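The counting rule above can be sketched as follows. This is an illustrative simplification, not the thesis's actual instrumentation; the function name and inputs are ours:

```python
# Simplified sketch of the jitter/drop counting rule described above:
# a frame counts as a "jitter" when the UI thread overruns the VSYNC
# deadline (an old image is shown), and as a "drop" when no buffer is
# queued at all, so SurfaceFlinger composes nothing new.

VSYNC_INTERVAL_MS = 1000.0 / 60.0  # 16.67 ms at a 60 Hz refresh rate

def classify_frame(draw_time_ms, buffer_queued):
    """Classify one VSYNC interval from UI-thread behaviour."""
    if not buffer_queued:
        return "drop"      # SurfaceFlinger has nothing to dequeue
    if draw_time_ms > VSYNC_INTERVAL_MS:
        return "jitter"    # drawing overran the frame deadline
    return "ok"

# (draw time in ms, whether a SurfaceView buffer was queued in time)
frames = [(12.0, True), (18.3, True), (25.0, False), (16.0, True)]
counts = {"ok": 0, "jitter": 0, "drop": 0}
for t, queued in frames:
    counts[classify_frame(t, queued)] += 1
```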

2.3 Motivation of Our approach

The EAS scheduler is designed to minimize energy consumption compared to other schedulers. Such biased scheduling cannot always be guaranteed to satisfy all the specified deadline constraints, such as frame time. As shown in Figure 3, there can be occasional deadline misses by a UI thread, which mean the screen is not displayed smoothly. Although EAS considers the workload of each task in advance and schedules all UI tasks so that they can be processed quickly, its intentional bias towards energy-efficient operation causes occasional frame misses for some graphic tasks. Because deadline misses in these UI threads are very rare, and resolving them is expected to cost considerable current, EAS appears to have adopted a policy of further reducing current consumption. So we considered how to solve this problem with a minimal increase in current consumption: if we can predict the frame misses caused by the current energy-efficient operation, we can minimize energy consumption while still meeting performance constraints.

Figure 3: An example of Frame Jitter and Frame Drop

3 Predictive Model and Execution

3.1 Overview of Our Approach

Figure 4: Our approach for predicting and scheduling

Figure 4 depicts our approach to scheduling with respect to tasks' time constraints. While a game application is running, our prediction model begins by extracting information from the scheduler (previous scheduling behavior). This information includes task demand, CPU utilization, CPU capacity (from the frequency value), GPU usage, etc. Then a machine learning based predictor (built offline) takes in these feature values and predicts whether the current UI task can draw the screen within a frame time. Finally, the kernel scheduler schedules the UI task with the extra information provided by our predictor. Building and using the prediction model follows the well-known three-step process for supervised machine learning: (i) generate training data, (ii) train a predictive model, (iii) use the predictor, described as follows.

3.2 Training the Predictor

Figure 5: A training process of our prediction model

Figure 5 depicts the process of building an LSTM classifier against our prediction accuracy metric.

Generating Data for training

We ran 100 games and trained our machine learning model with the data extracted from them. The games were downloaded from the Google Play Store. Before starting training, we pre-played some of the games, because we needed to extract data from a variety of gaming situations. To simplify data collection, we created a daemon program to store the data. While a game runs, the daemon records data including scheduling parameters (information used for scheduling) from the kernel scheduler and the occurrence of frame drop or jitter (the result after scheduling) from the Android framework. It collects data (47 parameters and 1 result variable) every 20 ms (the scheduling interval) and records it every 100 ms.
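The collection cadence described above (sample every 20 ms, record every 100 ms) can be sketched as follows. This is a minimal illustration of the timing only; the real daemon reads 47 kernel parameters plus one label, which we fake here:

```python
# Illustrative sketch of the logging daemon's cadence: one sample of
# scheduler parameters every 20 ms, flushed to storage in batches every
# 100 ms (i.e., every 5 samples). Sample contents are placeholders.

SAMPLE_INTERVAL_MS = 20
FLUSH_INTERVAL_MS = 100
SAMPLES_PER_FLUSH = FLUSH_INTERVAL_MS // SAMPLE_INTERVAL_MS  # 5

def run_daemon(read_sample, duration_ms):
    """Collect samples on a fixed tick; return the flushed batches."""
    buffer, storage = [], []
    for t in range(0, duration_ms, SAMPLE_INTERVAL_MS):
        buffer.append(read_sample(t))
        if len(buffer) == SAMPLES_PER_FLUSH:
            storage.append(list(buffer))   # one record per 100 ms
            buffer.clear()
    return storage

# Fake sampler: 47 parameters + 1 result variable per observation.
batches = run_daemon(lambda t: [0.0] * 48, duration_ms=1000)
```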

Building the Model

The feature values, together with the occurrence of frame drop or jitter, are fed to a supervised learning algorithm. The learning algorithm tries to find a correlation from the feature values to the occurrence of frame drop or jitter, and generates a classification model. The model's network contains four layers with weights: a normalization layer, an LSTM layer, and two fully-connected layers. We tried many kinds of networks to settle on a competent architecture (i.e., the number and type of neuronal layers and the number of neurons in each layer). Using too few neurons in the hidden layers results in underfitting; using too many can result in overfitting, and an inordinately large number of neurons also increases the time it takes to train the network. We compromised between too many and too few layers or neurons by comparing prediction accuracy and inference time. We defined this prediction problem as binary classification, so a sigmoid activation was used in the last (output) layer and binary cross-entropy (BCE) was used as the loss function.
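The output-layer setup described above can be illustrated in a few lines of numpy. This is a sketch of the sigmoid/BCE combination only, not the thesis's full Normalization + LSTM + two-dense-layer network; the logits and labels are made up:

```python
import numpy as np

# A sigmoid turns the final layer's raw score into a probability of
# "frame drop/jitter in the next window"; binary cross-entropy (BCE)
# is the loss minimized during training.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probs."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

logits = np.array([2.0, -1.0, 0.0, 3.0])  # raw scores from last layer
labels = np.array([1.0, 0.0, 0.0, 1.0])   # 1 = miss in next window
probs = sigmoid(logits)
loss = bce(labels, probs)
```

Since these logits mostly agree with the labels, the loss lands well below the ~0.693 a constant 0.5 prediction would give.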

One of the other key aspects of building a successful predictor is finding the right features to characterize the input data fed during training. This is described in the next section.

3.3 Input features

Our predictor is trained on a number of features extracted from kernel scheduling results. Initially, we collected 47 raw features from the kernel and 4 raw features from the Android framework. Table 1 lists the raw features. They were chosen based on our intuitions about which factors affect the scheduling that causes (or avoids) drawing latency. We considered as many features as possible, since the important ones are selected later.

from Kernel: total tasks demand, big task demand, cpu id of big task, cpu utilization for each cpu, cpu isolation, GPU usage, cpu frequencies for each cpu cluster, max frequencies for each cluster, min frequencies for each cluster, touch event, ...

from Framework: current FPS, stability of FPS, time to draw, number of buffers in SurfaceView

Table 1: Raw features from scheduling info and frame drawing

Feature Selection

We feed a 5-element sequence into our model (specifically into the LSTM layer), which means the number of inputs becomes 5 x 47 if we use all of the raw input features. We must minimize inference time, because the inference result has to be passed to the scheduler by the next scheduling time, so our model should use as few input features as possible. We constructed a correlation coefficient matrix to quantify the correlation among features, selecting important features and removing similar ones. We also derived new features from some raw features using intuitive ideas; for example, for some features, whether the value changed matters more than the amount of change. After comparing prediction accuracy across candidate feature sets, our feature selection process resulted in the 11 input features and 2 output features listed in Table 2. As a result, we feed 5 x 11 input data into our model.

Input: tot demand, tsk demand, capacity, util, touch, gpu load, isol change, cpu migration, prev drop, prev jitter, current FPS

Output: frame drop, frame jitter

Table 2: Selected and processed features of input/output data
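The correlation-based pruning step can be sketched as below. The threshold and synthetic data are illustrative assumptions, not the thesis's actual values:

```python
import numpy as np

# Sketch of correlation-based feature pruning: build the correlation
# coefficient matrix over the features and drop the later member of any
# pair whose absolute correlation exceeds a threshold.

def redundant_features(X, threshold=0.95):
    """Return column indices to drop (one per highly correlated pair)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    drop = set()
    n = corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if i not in drop and j not in drop and corr[i, j] > threshold:
                drop.add(j)
    return sorted(drop)

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a * 2.0 + 1e-3 * rng.normal(size=200)  # near-duplicate of a
c = rng.normal(size=200)                   # independent feature
X = np.stack([a, b, c], axis=1)
dropped = redundant_features(X)            # the duplicate column only
```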

3.4 Runtime Deployment

Our predictor uses the ML model described above to predict whether an unintended frame drop or frame latency will happen over the next window (time interval). Running as an Android service, it communicates with the EAS system continuously.

Figure 6: A working process of our predictor on device

Figure 6 illustrates the process of runtime prediction and performance control on a device with the EAS scheduler and frequency governor. Our model works in parallel with EAS. The predictor sends a request to our scheduling control unit if a frame drop or frame jitter is expected in the following frame interval, and the control unit uses SchedUtil and SchedTune to make the UI task run with more CPU resources.

3.5 Model Overhead and Control Time

The Linux EAS scheduler runs for task scheduling every 20 ms, and its own lead time is negligible. The lead time of our predictor, however, takes several milliseconds (4 ms at a 1.7 GHz CPU clock). If our model ran on the same execution cycle as the scheduler (every 20 ms), its overhead could significantly impact overall performance. And there is a more serious problem than system overhead: the point in time reflected by a prediction made from the previous window's data is in the past, not the time point we want, so even an accurately predicted drop or jitter could not be prevented. We therefore increased the execution period of our model from 20 ms to 100 ms and, in exchange, made the effect of task migration and frequency scaling last for 100 ms. As shown in Figure 7, our model runs once for every five scheduler runs, and its control effects last for five scheduler time intervals. Nevertheless, frame latency occurring during the first scheduler interval cannot be prevented even when it is predicted; we call this a structural hazard. The additional load due to our model is reduced to one fifth, which is very small compared to the entire load of the game tasks.

Figure 7: An Execution Cycle of Scheduler and Our Model
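The cadence described above can be sketched as follows. The timing constants follow the text; the tick bookkeeping itself is an illustrative assumption, not the kernel implementation:

```python
# Sketch of the execution cadence: the scheduler runs every 20 ms; our
# predictor runs on every fifth scheduler tick (every 100 ms), and the
# control effect (task migration + frequency boost) it requests is held
# for the following five scheduler intervals.

SCHED_TICK_MS = 20
MODEL_PERIOD_TICKS = 5  # model runs once per 100 ms

def boosted_ticks(total_ticks, predict):
    """Return the set of scheduler ticks during which the boost
    requested by the model's control unit is active."""
    boosted = set()
    for tick in range(total_ticks):
        if tick % MODEL_PERIOD_TICKS == 0 and predict(tick):
            # effect lasts the next five scheduler intervals
            boosted.update(range(tick + 1, tick + 1 + MODEL_PERIOD_TICKS))
    return boosted

# Predict "latency ahead" only at the second model invocation (tick 5):
# the boost then covers scheduler ticks 6 through 10.
active = boosted_ticks(15, predict=lambda t: t == 5)
```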


4 Experiment Setup

4.1 Experiment Platform

Hardware Platform

Our hardware evaluation platform is a recent Android mobile device, the LG G8. Its application processor is the Qualcomm Snapdragon 855 (SM8150), which includes three separate clusters: one high-performance core running at 2.84 GHz (Prime), three more performance cores running at 2.42 GHz (Gold), and four efficiency cores running at 1.78 GHz (Silver). The device has 6 GB of LPDDR4X SDRAM and 128 GB of UFS storage, and runs the Android 9 (Pie) OS. Our approach is expected to be more effective on low-performance multiprocessors, because game performance problems occur more often there, but we targeted a modern high-performance smartphone to minimize the impact of model overhead on the system.

Machine Learning platform

We built and trained the prediction model using TensorFlow, an end-to-end open source platform for machine learning. The biggest benefit of TensorFlow is its abstraction for machine learning development: instead of dealing with the details of implementing algorithms, or figuring out proper ways to train models and serve predictions, the developer can focus on the overall logic of the application. Another reason we use TensorFlow is that it provides a framework for deploying models on the Android mobile platform. TensorFlow Lite is TensorFlow's lightweight solution for mobile and embedded devices; it lets us run machine-learned models on mobile devices with low latency. We train a model on a higher-powered machine and then convert it to the .TFLITE format, which is loaded into a mobile interpreter.

Games

We acquired training data by playing the 100 most popular games downloaded from the Google Play Store. These games cover a wide variety of genres, including action, adventure, role-playing, MMORPG, simulation, puzzle, and shooting. Our model has a greater impact on heavy games that require a lot of computational resources for graphics processing, such as MMORPGs and action games, than on light games like simulations or puzzles. However, because depending on the game genre would make implementation and deployment of our model very complex, we acquired data from as many genres as possible.

4.2 Evaluation Methodology

Training set and Test set

Each of the 100 games was run for 20 minutes to acquire the training data. The total number of observations is 1,967,091 after excluding abnormal observations, and the shape of the data is (1967091, 47) since each observation has 47 features. The ratios of drop and jitter in the total observations are 0.044 and 0.117, respectively. We split the data into two subsets, training data and test data, fitting our model on the training data and making predictions on the test data; leave-one-out cross-validation is used to avoid overfitting and underfitting. The split ratio is 80/20: 80% of the data is the training set (size = 1,573,672), and the rest is the test set (size = 393,419). As mentioned above, we reduced the 47 raw features to 11 features through data refinement. Since we need to enter a sequence of features into the LSTM network, we use a bunch of five observation vectors for each input row, so the shape of the input data set is (1573668 x 5 x 8).
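The windowing step, turning consecutive observation rows into five-step sequences for the LSTM, can be sketched as below. The array sizes here are illustrative (20 rows, 11 features), not the full data set:

```python
import numpy as np

# Sliding-window construction: N observations with F features each
# become (N - 4, 5, F) overlapping sequences of five consecutive
# observation vectors, one sequence per input row.

def make_sequences(obs, steps=5):
    """Stack sliding windows of `steps` consecutive observations."""
    n, _ = obs.shape
    return np.stack([obs[i:i + steps] for i in range(n - steps + 1)])

obs = np.arange(20 * 11, dtype=float).reshape(20, 11)  # 20 rows, 11 feats
seqs = make_sequences(obs)
```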

Comparisons

We compare our approach to Energy Aware Scheduling (EAS), the enhancement to Linux power management in Android OS. It entered the Android mainstream with the Qualcomm Snapdragon 855, our evaluation hardware platform. We consider EAS a state-of-the-art scheduling mechanism and use it as the baseline for our model's performance evaluation. In addition, we compare our approach to an Oracle model, a hypothetical model that knows future events; comparing with the Oracle shows the limitations of our model.

Evaluation Metrics

As mentioned in Section 3, we target three evaluation metrics: graphic smoothness, energy consumption, and ESP (Energy Smoothness Product). To measure graphic smoothness, we count frame drops and frame jitters every 100 ms and use the frequency of occurrence. We use a mobile communication DC source (Agilent 66321D) to measure battery current drain: each game is run for 10 minutes and the average current draw is measured. ESP is the product of these two measurements. A lower ESP value usually implies a well-balanced result between graphic performance and energy consumption. Normally, models with lower ESP perform better, but it is difficult to use this value alone as a basis for selecting a model, because the trade-off between performance and current consumption can be judged differently depending on the situation.
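The ESP metric described above can be sketched as follows. The thesis does not spell out any weighting beyond "the product of these two measurements", so this combines them in the simplest way; the event counts are illustrative, while the 404 mA / 411 mA currents echo the averages reported in Section 5:

```python
# Sketch of ESP (Energy Smoothness Product): the product of a latency
# occurrence rate (drops/jitters counted per 100 ms window) and the
# average current draw over a run. Lower ESP = better balance.

def miss_rate(events, windows):
    """Fraction of 100 ms windows containing a drop or jitter."""
    return events / windows

def esp(rate, avg_current_ma):
    return rate * avg_current_ma

# A 10-minute run contains 6000 windows of 100 ms each.
eas = esp(miss_rate(600, 6000), 404.0)   # hypothetical EAS run
ours = esp(miss_rate(300, 6000), 411.0)  # hypothetical run with our model
```

Halving the miss rate at a ~2% current cost roughly halves ESP, which is the trade-off the metric is meant to expose.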

5 Experiment Evaluation

This section shows the results of the experiments described in the previous section. We first compare our approach against the EAS scheduler, then evaluate it against an ideal predictor with 100% prediction accuracy.

5.1 Evaluation Results

Comparison with EAS

We first evaluate our approach against EAS. Figure 8 shows the evaluation results for the three metrics. For average graphic smoothness, our approach achieves a 56.8% reduction in average frame drop rate and a 47.9% reduction in average frame jitter rate against EAS scheduling. The implications of these figures and the limits of the improvement are discussed later. For average energy consumption, our approach is just 1.6% worse than EAS (EAS: 404 mA, ours: 411 mA). This is a meaningful trade-off, because it allows us to significantly reduce graphics latency with a minimal increase in energy consumption.

Comparison with Oracle

In Figure 9, we compare our system with an ideal predictor that can always accurately predict the occurrence of Drops and Jitters; we call it Oracle. Figure 9(a) shows


Figure 8: Evaluation of (a) Drop rate, (b) Jitter rate, (c) energy consumption, and (d) ESP against the EAS scheduler

Figure 9: Evaluation of (a) prediction accuracy, (b) Drop rate, and (c) Jitter rate against the Oracle scheduler and EAS

the average prediction accuracy of our model on the test set (393,419 observations), and Figures 9(b) and 9(c) show the average Frame Drop rate and the average Frame Jitter rate compared to Oracle, respectively. In practice, it is impossible to implement the Oracle predictor and obtain experimental results from it directly. We can approximate its result by migrating all UI tasks to the big CPU cluster and running all CPUs at the highest clock frequency.


5.2 Analysis

Improvement Margin

Even though our predictor has more than 90% prediction accuracy, we can only improve graphic smoothness by about half. The reason can be seen in the comparison with Oracle (Figure 9): many frame drops and frame jitters are not eliminated despite using all available system resources. We assume that these are intended latencies caused by game activity changes or by abnormal system operation. Our method for detecting graphic latency cannot distinguish intended latencies from unintended ones; it simply monitors the number of SurfaceView buffers and the composition from SurfaceFlinger at every VSYNC.

Class-Imbalanced Data

The output classes of our data set are not balanced in distribution. The defect cases (Frame Drop or Frame Jitter) are much rarer than the non-defect cases; the collected training and test data contain far more non-defective outputs (majority) than defective ones (minority). Frame Jitter appears in less than 10% of observations and Frame Drop in less than 4%. This imbalance makes it difficult for our classifier to fit the minority class. We therefore made the classifier aware of the imbalance by incorporating class weights into the cost function (objective function), giving a higher weight to the minority class and a lower weight to the majority class. The ratio of each class is used as the weight value.
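The weighting scheme can be sketched as follows. The text above states only that "the ratio of each class" is used, so this sketch assumes the common balanced convention, weight of class c = n / (2 * n_c), which likewise gives the rare defect class the larger weight; the helper names are illustrative.

```python
import numpy as np

def balanced_class_weights(labels):
    """Per-class weights inversely proportional to class frequency
    (the 'balanced' heuristic), so the minority class weighs more."""
    labels = np.asarray(labels)
    n = labels.size
    n_pos = int(labels.sum())
    n_neg = n - n_pos
    return {1: n / (2.0 * n_pos), 0: n / (2.0 * n_neg)}

def weighted_cross_entropy(y_true, y_prob, w_pos, w_neg):
    """Binary cross entropy with per-class weights folded into the cost."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.clip(np.asarray(y_prob, dtype=float), 1e-7, 1.0 - 1e-7)
    loss = -(w_pos * y_true * np.log(y_prob)
             + w_neg * (1.0 - y_true) * np.log(1.0 - y_prob))
    return loss.mean()

# 10% positive rate, similar to the Frame Jitter share of observations.
w = balanced_class_weights([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
print(w)
```

With these weights, a misclassified defect contributes several times more to the loss than a misclassified normal frame, which is what pushes the classifier toward the minority class.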


Decision of Score Threshold

Our approach simplifies the problem into binary classification. The actual output of our predictor is a prediction score (or probability) indicating the predictor's certainty that a given observation belongs to the positive class, i.e., a Frame Jitter or a Frame Drop. To map the prediction score to a binary category, we must define a classification threshold (also called the decision threshold): a score above the threshold indicates the positive class, and a score below it indicates the negative class. The classification threshold is often assumed to be 0.5, but thresholds are problem-dependent and must therefore be tuned. Figure 11 shows our experimental results on the prediction accuracy of Total, Normal, Jitter and Drop under the 0.5 classification threshold; Figure 10 defines each accuracy metric. As Figures 11(a) and 11(b) show, our prediction model rarely predicts the positive class (Drop or Jitter) compared to the negative class (Normal). The classification threshold must be tuned to achieve higher prediction accuracy for the targeted positive class. This may produce more misclassifications of the majority (negative) class while producing more correct classifications of the minority (positive) class. Because the additional current consumed by this moderate number of misclassifications is very small compared to the improvement in graphic smoothness, we lowered the score threshold to 0.2.

Feature selection

As mentioned in Section 4, we constructed a correlation coefficient matrix to quantify the correlations among features. Figure 12 shows the correlation matrix made in the process of determining the final input features. We visualize the correlations of the input


Figure 10: Confusion matrix for classification problems and our accuracy metrics

Figure 11: Prediction accuracy for (a) Jitter and (b) Drop under the default threshold, and (c) Jitter and (d) Drop under a threshold of 0.2

features as a heatmap. In the heatmap, a stronger color indicates a larger correlation magnitude and a fainter color a smaller one. For example, we can see that little cpu num and big cpu num have a strong negative relationship. We removed features giving redundant information from the final input feature set, and we validated the result with leave-one-out cross-validation over the input features.
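The pruning step can be sketched with NumPy on synthetic data. The feature names, the generated values, and the 0.9 cutoff are all illustrative assumptions: as described above, we dropped redundant features by inspecting the heatmap rather than by a fixed rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Synthetic features: on a 4+4 big.LITTLE system the online little/big core
# counts move in opposite directions, so they are nearly perfectly
# anti-correlated; utilization is generated independently.
little_cpu_num = rng.integers(0, 5, n).astype(float)
big_cpu_num = 4.0 - little_cpu_num + rng.normal(0.0, 0.05, n)
cpu_util = rng.random(n)

names = ["little_cpu_num", "big_cpu_num", "cpu_util"]
corr = np.corrcoef(np.stack([little_cpu_num, big_cpu_num, cpu_util]))

# Drop the later feature of any pair whose |correlation| exceeds the cutoff,
# since it carries redundant information.
cutoff = 0.9
drop = {j for i in range(len(names)) for j in range(i + 1, len(names))
        if abs(corr[i, j]) > cutoff}
kept = [nm for k, nm in enumerate(names) if k not in drop]
print(kept)
```

Here `big_cpu_num` is removed because it is almost a linear function of `little_cpu_num`, while the independent `cpu_util` survives, which is the same judgment the heatmap inspection makes visually.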


Figure 12: Correlation Matrix plot of input features

6 Related Work

6.1 Energy-efficient computing in HMP

Much research has focused on effective solutions for energy-efficient computing with heterogeneous multi-cores. By running multi-threaded applications on a heterogeneous system, each thread can run on a core that matches its resource needs more closely than a one-size-fits-all solution. However, the effectiveness of a heterogeneous system depends significantly on the scheduling policy and on how efficiently we can allocate applications to the most appropriate processing core [10], [5]. The main challenge for scheduling is to effectively tune system-, architecture- and application-level parameters when running multi-threaded applications. There have been a number of studies on these core configurations [4], [8], [7]. Recent works have been able to


build models that can predict which core configurations are suitable for an application. Machine learning approaches have been used for such energy-efficiency prediction and scheduling [9], [10].

6.2 Energy reduction via DVFS technique

Dynamic voltage and frequency scaling (DVFS) has been proven to be a feasible solution for reducing processor power consumption [11], [6]. By lowering the processor clock frequency and supply voltage during certain time slots, for example idle or communication phases, large reductions in power consumption can be achieved with only modest performance losses. A DVFS-enabled cluster [1] is a compute cluster whose compute nodes can run at multiple power/performance operating points.

6.3 Class Imbalance Learning

The challenge of learning from imbalanced data is that the relatively or absolutely underrepresented class cannot draw attention from the learning algorithm equal to that of the majority class, which often leads to very specific classification rules or missing rules for the minority class, without much generalization ability for future prediction [3]. How to better recognize data from the minority class is a major research question in class imbalance learning. Its learning objective can be generally described as "obtaining a classifier that will provide high accuracy for the minority class without severely jeopardizing the accuracy of the majority class" [12]. In our project, we applied class-weighted (cost-sensitive) learning as described in [13]. To incorporate the weights of the two imbalanced classes into the cost function, a weighted cross entropy is defined as:
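In its standard form (a reconstruction; the notation may differ from the original), with $w_1$ and $w_0$ the weights of the positive (defect) and negative classes, $y_i \in \{0,1\}$ the label, and $p_i$ the predicted score for observation $i$:

$$
L = -\frac{1}{N}\sum_{i=1}^{N}\Big[\, w_1\, y_i \log p_i \;+\; w_0\,(1 - y_i)\log(1 - p_i) \,\Big]
$$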


6.4 LSTM networks

Artificial neural networks are computing systems inspired by the biological neural networks that constitute human brains. Each node of the network graph represents an artificial neuron. A Recurrent Neural Network (RNN) is a class of artificial neural networks in which connections between nodes form a directed graph along a temporal sequence. Long Short-Term Memory networks, usually just called "LSTMs", are a special kind of RNN capable of learning long-term dependencies [2]. The major feature of LSTM networks is their ability to retain and persist information over long sequences. An LSTM cell processes data sequentially and keeps its hidden state across several layers, as shown in Figure 13 (the diagram is taken from [2]). As a result, LSTM networks are well suited to our time-series prediction, as they retain the sequential information used by the EAS scheduler.

Figure 13: Internal structure of LSTM network
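The gating structure in Figure 13 can be sketched as a single LSTM cell step in NumPy. The gate ordering, weight shapes, and the toy dimensions below are the standard textbook ones; this is an illustration of the mechanism, not the model used in our experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,).
    Gate order in the stacked weights: input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0 * H:1 * H])   # input gate: how much new info to write
    f = sigmoid(z[1 * H:2 * H])   # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])   # output gate: how much state to expose
    g = np.tanh(z[3 * H:4 * H])   # candidate cell update
    c = f * c_prev + i * g        # new cell state persists across the sequence
    h = o * np.tanh(c)            # new hidden state
    return h, c

# Run a short feature sequence through the cell.
rng = np.random.default_rng(1)
D, H = 6, 4                       # e.g. 6 scheduler features, 4 hidden units
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)
```

The cell state `c` is what lets the network carry scheduler history across many VSYNC intervals, which is why this architecture fits our time-series prediction.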


7 Conclusion

This paper has described an automatic approach to optimizing Linux kernel scheduling on heterogeneous mobile platforms, using a machine learning based model to provide an accurate prediction of unintended latency in graphic drawing. Although we use a machine learning model, we focused on the overall logic of the application rather than on machine learning theory. We showed that our approach provides a significant performance improvement for graphic tasks over the state-of-the-art scheduler (EAS): it achieves 56.8% and 47.9% reductions in the average Frame Drop rate and Frame Jitter rate, respectively, against EAS scheduling, with a minimal increase in energy consumption. This is about 80% of the maximum possible improvement in UI smoothness. Our future work will improve the prediction accuracy for unintended latency, reduce the overhead of running the prediction model, and optimize the way the predicted latency is resolved. To obtain more accurate predictions, more refined training data is needed that excludes false-positive data (e.g., intended latency). To reduce the inference time of our model, research on model quantization and on computation using a DSP or GPU will be needed. And to select the most appropriate action for a predicted latency, we may consider applying a reinforcement learning model.


References

[1] Chung-hsing Hsu and Wu-chun Feng. A power-aware run-time system for high-

performance computing. In SC ’05: Proceedings of the 2005 ACM/IEEE Con-

ference on Supercomputing, pages 1–1, Nov 2005.

[2] colah’s blog. Understanding LSTM networks.

[3] H. He and E. A. Garcia. Learning from imbalanced data. IEEE Transactions on

Knowledge and Data Engineering, 21(9):1263–1284, Sep. 2009.

[4] H. Homayoun, V. Kontorinis, A. Shayan, T. Lin, and D. M. Tullsen. Dynami-

cally heterogeneous cores through 3d resource pooling. In IEEE International

Symposium on High-Performance Comp Architecture, pages 1–12, Feb 2012.

[5] Houman Homayoun. Heterogeneous chip multiprocessor architectures for big

data applications. In Proceedings of the ACM International Conference on

Computing Frontiers, CF ’16, pages 400–405, New York, NY, USA, 2016.

ACM.

[6] C. Hsu and W. Feng. A feasibility analysis of power awareness in commodity-

based high-performance clusters. In 2005 IEEE International Conference on

Cluster Computing, pages 1–10, Sep. 2005.

[7] Engin Ipek, Meyrem Kirman, Nevin Kirman, and Jose F. Martinez. Core fusion:

Accommodating software diversity in chip multiprocessors. In Proceedings of

the 34th Annual International Symposium on Computer Architecture, ISCA ’07,

pages 186–197, New York, NY, USA, 2007. ACM.


[8] A. Lukefahr, S. Padmanabha, R. Das, F. M. Sleiman, R. Dreslinski, T. F.

Wenisch, and S. Mahlke. Composite cores: Pushing heterogeneity into a core.

In 2012 45th Annual IEEE/ACM International Symposium on Microarchitec-

ture, pages 317–328, Dec 2012.

[9] H. Sayadi, N. Patel, A. Sasan, and H. Homayoun. Machine learning-based

approaches for energy-efficiency prediction and scheduling in composite cores

architectures. In 2017 IEEE International Conference on Computer Design

(ICCD), pages 129–136, Nov 2017.

[10] Kenzo Van Craeynest, Aamer Jaleel, Lieven Eeckhout, Paolo Narvaez, and Joel

Emer. Scheduling heterogeneous multi-cores through performance impact esti-

mation (pie). SIGARCH Comput. Archit. News, 40(3):213–224, June 2012.

[11] G. von Laszewski, L. Wang, A. J. Younge, and X. He. Power-aware schedul-

ing of virtual machines in dvfs-enabled clusters. In 2009 IEEE International

Conference on Cluster Computing and Workshops, pages 1–10, Aug 2009.

[12] Gary M. Weiss. Mining with rarity: A unifying framework. SIGKDD Explor.

Newsl., 6(1):7–19, June 2004.

[13] Zichen Wang’s blog. Practical tips for class imbalance in binary classification.


요약 (국문초록)

Over the past decades, mobile devices have achieved tremendous improvements in computing power. High-resolution, graphically rich games and the multitasking of diverse applications are no longer exclusive to desktop computers. However, compared with the growth in computing power, battery technology has not improved sufficiently, and the battery problem remains a challenge that developers must continue to solve. As a solution to the battery problem, the big.LITTLE architecture and many techniques built on it have been developed. Among them, EAS in the Linux kernel is a powerful solution for building energy-efficient systems and has been adopted by most smartphone manufacturers. EAS achieves both performance and efficiency by allocating to each task just the right amount of computing resources needed to run it, so that no surplus computing resources are wasted. However, because of the limitations of the method EAS currently uses to predict the amount of computing resources required, EAS cannot always guarantee smooth game performance.

In this thesis, we discuss the performance degradation caused by EAS scheduling, and we show how performance can be improved in mobile games through a machine learning model trained by supervised learning.

주요어: smartphone, artificial intelligence, performance prediction, performance improvement, scheduling

학번: 2018-25193
