Transcript of the ICSE 2013 presentation by Malik et al.
Automatic Detection of Performance Deviations in the Load
Testing of Large Scale Systems
Haroon Malik, Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada
Hadi Hemmati, Software Architecture Group (SWAG), University of Waterloo, Waterloo, Canada
Ahmed E. Hassan, Software Analysis and Intelligence Lab (SAIL), Queen's University, Kingston, Canada
Performance Team, Canada
Large scale systems need to satisfy performance constraints
LOAD TESTING
Current Practice
1. Environment setup  2. Load test execution  3. Load test analysis  4. Report generation
Load Test Execution
[Diagram: load generators 1 and 2 drive the system; a monitoring tool records counters into a system performance repository]
Challenges with Load Test Analysis
1. Limited knowledge  2. Limited time  3. Large number of counters
We Propose 4 Deviation Detection Approaches
3 unsupervised, 1 supervised
Using Performance Counters to Construct a Performance Signature
Example counters: %CPU idle, %CPU busy, byte commits, disk writes/sec, % cache faults/sec, bytes received
Performance Counters Are Highly Correlated
[Diagram: CPU, disk (IOPS), network, and memory counters all move with transactions/sec]
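This redundancy is easy to see numerically: counters driven by the same load rise and fall together. A minimal sketch with synthetic counter data (the counter names and relationships here are illustrative, not measurements from the study):

```python
import numpy as np

# Hypothetical counter samples: CPU and disk activity both track
# transactions/sec, so the counters end up highly correlated.
rng = np.random.default_rng(0)
tps = rng.uniform(100, 500, size=60)             # transactions/sec
cpu_busy = 0.15 * tps + rng.normal(0, 5, 60)     # %CPU busy tracks load
disk_writes = 2.0 * tps + rng.normal(0, 40, 60)  # disk writes/sec tracks load

counters = np.vstack([tps, cpu_busy, disk_writes])
corr = np.corrcoef(counters)                     # pairwise correlations
print(np.round(corr, 2))                         # off-diagonals near 1
```

High pairwise correlation is exactly what makes dimension reduction (clustering, PCA) effective on counter data.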
High-Level Overview of Our Approach
Input load tests (baseline test and new test) → Data preparation (sanitization, standardization) → Signature generation → Deviation detection → Performance report
Unsupervised Signature Generation
Random sampling approach: prepared load test → random sampling → signature
Clustering approach: prepared load test → data reduction (clustering) → extracting centroids → signature
PCA approach: prepared load test → dimension reduction (PCA) → identifying the top-k performance counters (mapping, ranking; the analyst tunes a weight parameter) → signature
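The PCA approach can be sketched as below. This is a minimal illustration, not the authors' implementation: counters are standardized, PCA is computed via SVD, and each counter is scored by its absolute loadings on the retained components weighted by their explained-variance ratios (one plausible reading of the mapping/ranking step; the counter names and data are made up):

```python
import numpy as np

def top_k_counters(X, names, k, n_components=2):
    """Rank counters by PCA loadings weighted by explained variance."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize counters
    U, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    var_ratio = (s ** 2) / np.sum(s ** 2)           # explained-variance ratio
    # Weight each counter's absolute loading on the retained components
    # by the variance those components explain, then rank.
    scores = np.abs(Vt[:n_components]).T @ var_ratio[:n_components]
    order = np.argsort(scores)[::-1]
    return [names[i] for i in order[:k]]

# Synthetic data: two counters follow the load, one is unrelated noise.
rng = np.random.default_rng(1)
load = rng.uniform(0, 1, 200)
X = np.column_stack([
    load + rng.normal(0, 0.05, 200),      # %CPU busy: tracks load
    2 * load + rng.normal(0, 0.05, 200),  # disk writes/sec: tracks load
    rng.normal(0, 1, 200),                # unrelated counter
])
sig = top_k_counters(X, ["%CPU busy", "disk writes/sec", "noise"], k=2)
print(sig)
```

The two load-driven counters outrank the noise counter, so the signature keeps the informative pair.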
Supervised Signature Generation
WRAPPER approach: prepared load test → labeling (only for the baseline) → partitioning the data (SPC1, SPC2, ..., SPC10) → attribute selection (OneR, genetic search) → identifying the top-k performance counters by (i) count and (ii) % frequency → signature
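The partition-and-count idea behind WRAPPER can be sketched as below. As a stand-in for OneR or a genetic search, this illustration scores counters inside each partition by their correlation with the pass/fail label, then keeps the counters selected most frequently across partitions; the counter names and data are synthetic:

```python
import numpy as np

def select_by_frequency(X, y, names, k, n_partitions=10, per_part=2):
    """Partition the labeled baseline data, pick the best counters in each
    partition, then keep the k counters selected most frequently overall."""
    counts = {name: 0 for name in names}
    for part_X, part_y in zip(np.array_split(X, n_partitions),
                              np.array_split(y, n_partitions)):
        # Stand-in scorer for OneR / genetic search: absolute correlation
        # of each counter with the pass/fail label in this partition.
        scores = [abs(np.corrcoef(part_X[:, j], part_y)[0, 1])
                  for j in range(X.shape[1])]
        for j in np.argsort(scores)[::-1][:per_part]:
            counts[names[j]] += 1
    ranked = sorted(names, key=lambda n: counts[n], reverse=True)
    freq = {n: 100 * counts[n] / n_partitions for n in counts}  # % frequency
    return ranked[:k], freq

# Synthetic baseline: "mem" is strongly tied to failures, "cpu" weakly,
# "net" not at all.
rng = np.random.default_rng(2)
y = rng.integers(0, 2, 200).astype(float)   # pass (0) / fail (1) labels
X = np.column_stack([
    y + rng.normal(0, 0.3, 200),
    y + rng.normal(0, 2.0, 200),
    rng.normal(0, 1, 200),
])
signature, freq = select_by_frequency(X, y, ["mem", "cpu", "net"], k=1)
print(signature, freq)
```

Counting selections across partitions makes the signature robust to any single partition's noise.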
Two Deviation Detection Approaches
Using a control chart: for the clustering and random sampling approaches
Using approach-specific techniques: for the PCA and WRAPPER approaches
Control Chart
[Chart: performance counter value (y-axis) over time in minutes (x-axis) for a baseline load test and a new load test]
Control charts use control limits (CL) to represent the limits of variation that should be expected from a process.
Baseline CL
Center line (CL): the median of all values of the performance counter in a baseline load test.
[Chart: the baseline CL drawn over the counter values of the baseline and new load tests]
Baseline LCL, UCL
Upper/lower control limits (UCL/LCL): the upper/lower limits of the range of a counter under the normal behavior of the system.
[Chart: the baseline CL, LCL, and UCL drawn over the counter values of the baseline and new load tests]
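The control-chart check can be sketched as below. The study's exact control limits are not reproduced here; this sketch assumes the LCL/UCL are taken as low/high percentiles of the baseline counter, and reports the fraction of new-test observations outside the limits (a violation ratio):

```python
import numpy as np

def control_chart_violations(baseline, new_test, low_pct=5, high_pct=95):
    """Flag new-test observations outside the baseline control limits.

    CL is the median of the baseline counter values; the LCL/UCL here are
    low/high percentiles of the baseline (an assumption; any limits that
    capture 'normal' baseline variation would do)."""
    cl = np.median(baseline)
    lcl, ucl = np.percentile(baseline, [low_pct, high_pct])
    violations = (new_test < lcl) | (new_test > ucl)
    return cl, (lcl, ucl), violations.mean()    # violation ratio

# Hypothetical counter samples: the new test spikes well above baseline.
baseline = np.array([4, 5, 6, 5, 4, 6, 5, 5, 4, 6], dtype=float)
new_test = np.array([5, 6, 14, 15, 5, 16, 13, 6, 15, 14], dtype=float)
cl, (lcl, ucl), ratio = control_chart_violations(baseline, new_test)
print(cl, lcl, ucl, ratio)
```

A high violation ratio for a counter in the signature marks the new test as deviated.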
Deviation Detection
Clustering and random sampling: baseline signature + new test signature → control chart → performance report
PCA approach: comparing the PCA counter weights of the baseline and new test signatures → performance report
WRAPPER approach: baseline signature as training data, new test signature as evaluation data → logistic regression → performance report
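The WRAPPER detection step (logistic regression trained on the labeled baseline signature, then applied per observation of the new test) can be sketched as below. The tiny gradient-descent learner is a stand-in for an off-the-shelf implementation, and the counter data are synthetic:

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimal logistic regression via gradient descent. Features are
    standardized for stable training; returns weights plus the scaler."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    w = np.zeros(Xs.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xs @ w))   # predicted P(fail)
        w -= lr * Xs.T @ (p - y) / len(y)   # gradient step
    return w, mu, sd

def predict_fail(model, X):
    w, mu, sd = model
    Xs = np.column_stack([np.ones(len(X)), (X - mu) / sd])
    return 1.0 / (1.0 + np.exp(-Xs @ w)) > 0.5

# Labeled baseline signature: one counter, pass (0) vs. fail (1) regions.
rng = np.random.default_rng(3)
X_base = np.concatenate([rng.normal(5, 1, (50, 1)), rng.normal(9, 1, (50, 1))])
y_base = np.concatenate([np.zeros(50), np.ones(50)])
model = fit_logistic(X_base, y_base)

# New test: many per-observation "fail" flags suggest a deviated test.
X_new = rng.normal(9, 1, (20, 1))
flags = predict_fail(model, X_new)
print(flags.mean())
```

Because the classifier flags each observation independently, this style of detection works in real time, which is the practical advantage discussed later.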
Case Study
RQ: How effective are our signature-based approaches in detecting performance deviations in load tests?
An ideal approach should predict a minimal and correct set of performance deviations.
Evaluation using: precision, recall, and F-measure
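These metrics can be computed over the sets of predicted versus actual deviations; the flagged names below are illustrative only:

```python
def precision_recall_f(predicted, actual):
    """predicted/actual: sets of items flagged as performance deviations."""
    tp = len(predicted & actual)                       # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)               # harmonic mean
    return precision, recall, f

# Hypothetical deviations: injected faults (actual) vs. flagged (predicted).
actual = {"mem_stress", "cpu_stress", "interfering_load"}
predicted = {"mem_stress", "cpu_stress", "disk_queue"}
p, r, f = precision_recall_f(predicted, actual)
print(p, r, f)
```

High precision means few false alarms for the analyst; high recall means few missed deviations.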
Subjects of Study
System 1: open source; domain: e-commerce. Data: from our experiments with an open-source benchmark application (DVD Store).
System 2: industrial system; domain: telecom. Data: (1) a load test repository and (2) data from our experiments on the company's testing platform.
Fault Injection

Category          | Failure Trigger     | Faults
Software failure  | Resource exhaustion | CPU stress, memory stress
System overload   | Abnormal workload   | Interfering workload
Operator errors   | Procedural errors   | Unscheduled replication
Case Study Findings
Effectiveness: precision/recall/F-measure
Practical differences
Case Study Findings (Effectiveness)
[Chart: precision, recall, and F-measure for the WRAPPER, PCA, clustering, and random sampling approaches]
Random sampling has the lowest effectiveness.
On average, and in all experiments, PCA performs better than the clustering approach.
WRAPPER dominates the best unsupervised approach, i.e., PCA.
Overall, both the WRAPPER and PCA approaches achieve an excellent balance of high precision and recall (on average 0.95/0.94 and 0.82/0.84, respectively) for deviation detection.
Case Study Findings (Practical Differences)
Real-time analysis, stability, manual overhead
Real-Time Analysis
WRAPPER: detects deviations on a per-observation basis.
PCA: requires a certain number of observations (wait time).
Stability
We refer to ‘Stability’ as the ability of an approach to remain effective while its signature size is reduced.
[Chart: F-measure vs. signature size, from 45 counters down to 1, for the unsupervised (PCA) and supervised (WRAPPER) approaches]
The WRAPPER approach is more stable than the PCA approach.
Manual Overhead
The WRAPPER approach requires all observations of the baseline performance counter data to be labeled as pass/fail. Marking each observation is time-consuming.
Limitations of Our Approaches
Hardware differences, sensitivity
Hardware Differences
Large systems have many testing labs so that multiple load tests can run simultaneously, and labs may have different hardware. Our approaches may therefore interpret the same test run in different labs as deviated tests.
Recent work by K. C. Foo et al. proposes several techniques to overcome this problem: "Mining Performance Regression Testing Repositories for Automated Performance Analysis" (QSIC 2010).
Sensitivity
We can tune the sensitivity of our approaches:
PCA: the analyst tunes the 'weight' parameter.
WRAPPER: the analyst decides, in the labeling phase, how big a deviation should be in order to be flagged.
While lowering sensitivity reduces false alarms, it may overlook some important outliers.