Transcript of the ICSE 2013 presentation by Malik et al.
Automatic Detection of Performance Deviations in the Load
Testing of Large Scale Systems
Haroon Malik, Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada
Hadi Hemmati, Software Architecture Group (SWAG), University of Waterloo, Waterloo, Canada
Ahmed E. Hassan, Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada
Performance Team, Canada
Large scale systems need to satisfy performance constraints
LOAD TESTING
Current Practice
1. Environment setup  2. Load test execution  3. Load test analysis  4. Report generation
Load Test Execution
[Diagram: load generators 1 and 2 drive the system; a monitoring tool records counters into a system performance repository]
Challenges with Load Test Analysis
1. Limited knowledge  2. Limited time  3. Large number of counters
We Propose 4 Deviation Detection Approaches
3 unsupervised, 1 supervised
Using Performance Counters to Construct a Performance Signature
Example counters: %CPU idle, %CPU busy, byte commits, disk writes/sec, % cache faults/sec, bytes received
Performance Counters Are Highly Correlated
[Diagram: CPU, disk (IOPS), network, and memory counters all move with transactions/sec]
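This redundancy is easy to see numerically: counters driven by the same load rise and fall together. A minimal sketch with synthetic counter data (the counter names and relationships here are illustrative, not measurements from the study):

```python
import numpy as np

# Hypothetical counter samples: CPU and disk activity both track
# transactions/sec, so the counters end up highly correlated.
rng = np.random.default_rng(0)
tps = rng.uniform(100, 500, size=60)             # transactions/sec
cpu_busy = 0.15 * tps + rng.normal(0, 5, 60)     # %CPU busy tracks load
disk_writes = 2.0 * tps + rng.normal(0, 40, 60)  # disk writes/sec tracks load

counters = np.vstack([tps, cpu_busy, disk_writes])
corr = np.corrcoef(counters)                     # pairwise correlations
print(np.round(corr, 2))                         # off-diagonals near 1
```

High pairwise correlation is exactly what makes dimension reduction (clustering, PCA) effective on counter data.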
High-Level Overview of Our Approach
Input load tests (baseline test and new test) → Data preparation (sanitization, standardization) → Signature generation → Deviation detection → Performance report
Unsupervised Signature Generation
Random sampling approach: prepared load test → random sampling → signature
Clustering approach: prepared load test → data reduction (clustering) → extracting centroids → signature
PCA approach: prepared load test → dimension reduction (PCA) → identifying the top-k performance counters (mapping, ranking; the analyst tunes a weight parameter) → signature
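The PCA approach can be sketched as below. This is a minimal illustration, not the authors' implementation: counters are standardized, PCA is computed via SVD, and each counter is scored by its absolute loadings on the retained components weighted by their explained-variance ratios (one plausible reading of the mapping/ranking step; the counter names and data are made up):

```python
import numpy as np

def top_k_counters(X, names, k, n_components=2):
    """Rank counters by PCA loadings weighted by explained variance."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize counters
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    var_ratio = (s ** 2) / np.sum(s ** 2)           # explained-variance ratio
    # Weight each counter's absolute loading on the retained components
    # by the variance those components explain, then rank.
    scores = np.abs(Vt[:n_components]).T @ var_ratio[:n_components]
    order = np.argsort(scores)[::-1]
    return [names[i] for i in order[:k]]

# Synthetic data: two counters follow the load, one is unrelated noise.
rng = np.random.default_rng(1)
load = rng.uniform(0, 1, 200)
X = np.column_stack([
    load + rng.normal(0, 0.05, 200),      # %CPU busy: tracks load
    2 * load + rng.normal(0, 0.05, 200),  # disk writes/sec: tracks load
    rng.normal(0, 1, 200),                # unrelated counter
])
sig = top_k_counters(X, ["%CPU busy", "disk writes/sec", "noise"], k=2)
print(sig)
```

The two load-driven counters outrank the noise counter, so the signature keeps the informative pair.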
Supervised Signature Generation
WRAPPER approach: prepared load test → labeling (only for the baseline) → partitioning the data (SPC1, SPC2, ..., SPC10) → attribute selection (OneR, genetic search) → identifying the top-k performance counters by (i) count and (ii) % frequency → signature
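The partition-and-count idea behind WRAPPER can be sketched as below. As a stand-in for OneR or a genetic search, this illustration scores counters inside each partition by their correlation with the pass/fail label, then keeps the counters selected most frequently across partitions; the counter names and data are synthetic:

```python
import numpy as np

def select_by_frequency(X, y, names, k, n_partitions=10, per_part=2):
    """Partition the labeled baseline data, pick the best counters in each
    partition, then keep the k counters selected most frequently overall."""
    counts = {name: 0 for name in names}
    for part_X, part_y in zip(np.array_split(X, n_partitions),
                              np.array_split(y, n_partitions)):
        # Stand-in scorer for OneR / genetic search: absolute correlation
        # of each counter with the pass/fail label in this partition.
        scores = [abs(np.corrcoef(part_X[:, j], part_y)[0, 1])
                  for j in range(X.shape[1])]
        for j in np.argsort(scores)[::-1][:per_part]:
            counts[names[j]] += 1
    ranked = sorted(names, key=lambda n: counts[n], reverse=True)
    freq = {n: 100 * counts[n] / n_partitions for n in counts}  # % frequency
    return ranked[:k], freq

# Synthetic baseline: "mem" is strongly tied to failures, "cpu" weakly,
# "net" not at all.
rng = np.random.default_rng(2)
y = rng.integers(0, 2, 200).astype(float)   # pass (0) / fail (1) labels
X = np.column_stack([
    y + rng.normal(0, 0.3, 200),
    y + rng.normal(0, 2.0, 200),
    rng.normal(0, 1, 200),
])
signature, freq = select_by_frequency(X, y, ["mem", "cpu", "net"], k=1)
print(signature, freq)
```

Counting selections across partitions makes the signature robust to any single partition's noise.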
Two Deviation Detection Approaches
Using a control chart: for the clustering and random sampling approaches
Using approach-specific techniques: for the PCA and WRAPPER approaches
Control Chart
[Chart: performance counter value (y-axis) over time in minutes (x-axis) for a baseline load test and a new load test]
Control charts use control limits (CL) to represent the limits of variation that should be expected from a process.
Baseline CL
Center line (CL): the median of all values of the performance counter in a baseline load test.
[Chart: the baseline CL drawn over the counter values of the baseline and new load tests]
Baseline LCL, UCL
Upper/lower control limits (UCL/LCL): the upper/lower limits of the range of a counter under the normal behavior of the system.
[Chart: the baseline CL, LCL, and UCL drawn over the counter values of the baseline and new load tests]
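The control-chart check can be sketched as below. The study's exact control limits are not reproduced here; this sketch assumes the LCL/UCL are taken as low/high percentiles of the baseline counter, and reports the fraction of new-test observations outside the limits (a violation ratio):

```python
import numpy as np

def control_chart_violations(baseline, new_test, low_pct=5, high_pct=95):
    """Flag new-test observations outside the baseline control limits.

    CL is the median of the baseline counter values; the LCL/UCL here are
    low/high percentiles of the baseline (an assumption; any limits that
    capture 'normal' baseline variation would do)."""
    cl = np.median(baseline)
    lcl, ucl = np.percentile(baseline, [low_pct, high_pct])
    violations = (new_test < lcl) | (new_test > ucl)
    return cl, (lcl, ucl), violations.mean()    # violation ratio

# Hypothetical counter samples: the new test spikes well above baseline.
baseline = np.array([4, 5, 6, 5, 4, 6, 5, 5, 4, 6], dtype=float)
new_test = np.array([5, 6, 14, 15, 5, 16, 13, 6, 15, 14], dtype=float)
cl, (lcl, ucl), ratio = control_chart_violations(baseline, new_test)
print(cl, lcl, ucl, ratio)
```

A high violation ratio for a counter in the signature marks the new test as deviated.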
Deviation Detection
Clustering and random sampling: baseline signature + new test signature → control chart → performance report
PCA approach: comparing the PCA counter weights of the baseline and new test signatures → performance report
WRAPPER approach: baseline signature as training data, new test signature as evaluation data → logistic regression → performance report
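The WRAPPER detection step (logistic regression trained on the labeled baseline signature, then applied per observation of the new test) can be sketched as below. The tiny gradient-descent learner is a stand-in for an off-the-shelf implementation, and the counter data are synthetic:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimal logistic regression via gradient descent. Features are
    standardized for stable training; returns weights plus the scaler."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    w = np.zeros(Xs.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xs @ w))   # predicted P(fail)
        w -= lr * Xs.T @ (p - y) / len(y)   # gradient step
    return w, mu, sd

def predict_fail(model, X):
    w, mu, sd = model
    Xs = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    return 1.0 / (1.0 + np.exp(-Xs @ w)) > 0.5

# Labeled baseline signature: one counter, pass (0) vs. fail (1) regions.
rng = np.random.default_rng(3)
X_base = np.concatenate([rng.normal(5, 1, (50, 1)), rng.normal(9, 1, (50, 1))])
y_base = np.concatenate([np.zeros(50), np.ones(50)])
model = fit_logistic(X_base, y_base)

# New test: many per-observation "fail" flags suggest a deviated test.
X_new = rng.normal(9, 1, (20, 1))
flags = predict_fail(model, X_new)
print(flags.mean())
```

Because the classifier flags each observation independently, this style of detection works in real time, which is the practical advantage discussed later.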
Case Study
RQ: How effective are our signature-based approaches in detecting performance deviations in load tests?
An ideal approach should predict a minimal and correct set of performance deviations.
Evaluation using: precision, recall, and F-measure
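These metrics can be computed over the sets of predicted versus actual deviations; the flagged names below are illustrative only:

```python
def precision_recall_f(predicted, actual):
    """predicted/actual: sets of items flagged as performance deviations."""
    tp = len(predicted & actual)                       # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)               # harmonic mean
    return precision, recall, f

# Hypothetical deviations: injected faults (actual) vs. flagged (predicted).
actual = {"mem_stress", "cpu_stress", "interfering_load"}
predicted = {"mem_stress", "cpu_stress", "disk_queue"}
p, r, f = precision_recall_f(predicted, actual)
print(p, r, f)
```

High precision means few false alarms for the analyst; high recall means few missed deviations.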
Subjects of Study
System 1: open source; domain: e-commerce. Data: from our experiments with an open-source benchmark application (DVD Store).
System 2: industrial system; domain: telecom. Data: (1) a load test repository and (2) data from our experiments on the company's testing platform.
Fault Injection

Category          | Failure Trigger     | Faults
Software failure  | Resource exhaustion | CPU stress, memory stress
System overload   | Abnormal workload   | Interfering workload
Operator errors   | Procedural errors   | Unscheduled replication
Case Study Findings
Effectiveness: precision/recall/F-measure
Practical differences
Case Study Findings (Effectiveness)
[Chart: precision, recall, and F-measure for the WRAPPER, PCA, clustering, and random sampling approaches]
Random sampling has the lowest effectiveness.
On average, and in all experiments, PCA performs better than the clustering approach.
WRAPPER dominates the best unsupervised approach, i.e., PCA.
Overall, both the WRAPPER and PCA approaches achieve an excellent balance of high precision and recall (on average 0.95/0.94 and 0.82/0.84, respectively) for deviation detection.
Case Study Findings (Practical Differences)
Real-time analysis, stability, manual overhead
Real-Time Analysis
WRAPPER: detects deviations on a per-observation basis.
PCA: requires a certain number of observations (wait time).
Stability
We refer to ‘Stability’ as the ability of an approach to remain effective while its signature size is reduced.
[Chart: F-measure vs. signature size, from 45 counters down to 1, for the unsupervised (PCA) and supervised (WRAPPER) approaches]
The WRAPPER approach is more stable than the PCA approach.
Manual Overhead
The WRAPPER approach requires all observations of the baseline performance counter data to be labeled as pass/fail. Marking each observation is time-consuming.
Limitations of Our Approaches
Hardware differences, sensitivity
Hardware Differences
Large systems have many testing labs so that multiple load tests can run simultaneously, and labs may have different hardware. Our approaches may therefore interpret the same test run in different labs as deviated tests.
Recent work by K. C. Foo et al. proposes several techniques to overcome this problem: "Mining Performance Regression Testing Repositories for Automated Performance Analysis" (QSIC 2010).
Sensitivity
We can tune the sensitivity of our approaches:
PCA: the analyst tunes the 'weight' parameter.
WRAPPER: the analyst decides, in the labeling phase, how big a deviation should be in order to be flagged.
While lowering sensitivity reduces false alarms, it may overlook some important outliers.