Adaptive Random Test Case Prioritization Speaker: Bo Jiang * Co-authors: Zhenyu Zhang *, W.K.Chan,...
1
Adaptive Random Test Case Prioritization
Speaker: Bo Jiang* Co-authors: Zhenyu Zhang*, W.K.Chan†, T.H.Tse*
*The University of Hong Kong, †City University of Hong Kong
2
Contents
Background
Motivation
Adaptive Random Test Case Prioritization
Experiments and Results Analysis
Related Works
Conclusion & Future Work
3
Regression Testing Techniques
[Figure: program P evolves into P’; test suite T becomes T’. Labeled activities: Obsolete Test Case Elimination, Test Case Augmentation, Test Case Prioritization, Test Case Reduction, Test Case Selection.]
Regression testing accounts for 50% of the cost of software maintenance.
4
Test Case Prioritization
Definition: Test case prioritization permutes a test suite T for execution to meet a chosen testing goal.
Typical testing goals: rate of code coverage, rate of fault detection, rate of requirement coverage.
Merits: no impact on the fault detection ability.
5
Coverage-based Test Case Prioritization Technique
Total-statement/function/branch: highest code coverage first; resolve tie cases randomly.
Additional-statement/function/branch: additional highest code coverage first; reset when no more coverage can be achieved; resolve tie cases randomly.
Disadvantages: hard to scale to larger programs.
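The two greedy strategies can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the names total_prioritize and additional_prioritize, the dict-of-sets coverage format, and the seed parameter are all assumptions made for the example.

```python
import random

def total_prioritize(coverage, seed=0):
    """Order tests by total coverage, highest first (ties broken randomly).

    coverage: dict mapping a test id to the set of covered elements
    (statements, functions, or branches). This format is an assumption.
    """
    rng = random.Random(seed)
    tests = sorted(coverage)
    rng.shuffle(tests)  # random start order; the stable sort keeps it for ties
    return sorted(tests, key=lambda t: len(coverage[t]), reverse=True)

def additional_prioritize(coverage, seed=0):
    """Repeatedly pick the test that adds the most not-yet-covered elements."""
    rng = random.Random(seed)
    remaining = set(coverage)
    covered = set()
    order = []
    while remaining:
        gains = {t: len(coverage[t] - covered) for t in remaining}
        best = max(gains.values())
        if best == 0 and covered:
            covered = set()  # reset: no remaining test adds coverage
            continue
        winners = sorted(t for t in remaining if gains[t] == best)
        chosen = rng.choice(winners)  # resolve tie cases randomly
        order.append(chosen)
        covered |= coverage[chosen]
        remaining.remove(chosen)
    return order

example = {"t1": {1, 2, 3}, "t2": {1, 2}, "t3": {4}}
print(total_prioritize(example))       # ['t1', 't2', 't3']
print(additional_prioritize(example))  # ['t1', 't3', 't2']
```

The additional strategy recomputes the coverage gain of every remaining test at each step, which is one way to see why it is harder to scale than the total strategy.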
7
Problem With Total (Greedy) Techniques
[Figure: APFD plots for grep and flex, from Elbaum et al. @ TSE 2002.]
Total strategy may NOT be effective for real-life programs.
9
Problems with Additional Techniques
[Figure: time used for prioritization. Series: RandomSiemens, RandomUnix, AdditionalSiemens, AdditionalUnix, TotalSiemens, TotalUnix.]
Additional techniques may NOT be efficient for real-life programs.
Can we find a prioritization technique that is both effective and efficient for real-life programs?
12
Adaptive Random Testing (ART)
A technique for test case generation.
Evenly spreads randomly generated test cases across the input domain.
In empirical studies, ART can detect failures using up to 50% fewer test cases than random testing.
13
Fixed-Sized-Candidate-Set ART Algorithm
Randomly generate a test case and execute it.
Randomly generate a set of candidate test cases.
For each candidate test case, find its nearest neighbor among the executed test cases.
Select the candidate that is farthest from its nearest neighbor and execute it.
Repeat the candidate generation and selection steps until a failure is encountered.
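The Fixed-Sized-Candidate-Set steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: the input domain is taken to be the unit square, distance is Euclidean, the candidate set size k defaults to 10, and max_tests is a safety cap added for the example. None of these values come from the slides, and fscs_art and is_failure are hypothetical names.

```python
import math
import random

def fscs_art(is_failure, k=10, max_tests=1000, seed=0):
    """Return (failing_input_or_None, number_of_tests_executed).

    is_failure: hypothetical oracle taking an (x, y) point in the unit square.
    """
    rng = random.Random(seed)

    def random_input():
        return (rng.random(), rng.random())

    executed = []
    test = random_input()  # the first test case is purely random
    while True:
        executed.append(test)  # "execute" the test case
        if is_failure(test):
            return test, len(executed)
        if len(executed) >= max_tests:
            return None, len(executed)
        # Generate a fixed-size set of random candidates.
        candidates = [random_input() for _ in range(k)]
        # Choose the candidate whose nearest executed neighbor is farthest away.
        test = max(
            candidates,
            key=lambda c: min(math.dist(c, e) for e in executed),
        )

# Example: a small square failure region in the middle of the domain.
in_region = lambda p: 0.4 <= p[0] <= 0.6 and 0.4 <= p[1] <= 0.6
print(fscs_art(in_region))
```

Each selection compares every candidate with every executed test case, so the cost per new test grows with the number of tests already run.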
25
Adaptive Random Testing (ART)
ART is based on the observation that failures tend to cluster across the input domain.
Intuitively, evenly spreading the test cases may increase the probability of exposing the first fault sooner.
In test case prioritization, we also want to increase the rate of fault detection.
26
Use ART directly for test case prioritization?
The variety of black-box input information makes it hard to define a general distance metric: video streams, images, XML, …
The white-box coverage information of the previously executed test cases is readily available: statement coverage, branch coverage, function coverage.
And…
27
Distribution of Failures in Profile Space on LilyPond
William Dickinson et al. @ FSE, 2001.
28
MDS Display of Distribution of Failures in Profile Space on LilyPond
William Dickinson et al. @ FSE, 2001.
Failures tend to cluster together.
29
MDS Display of Distribution of Failures in Profile Space on GCC
William Dickinson et al. @ FSE, 2001.
30
Distribution of Failures in Profile Space on GCC
William Dickinson et al. @ FSE, 2001.
Failures tend to cluster together.
31
Use ART directly for test case prioritization?
The variety of black-box input information makes it hard to define a uniform distance metric: video streams, images, XML, …
The white-box coverage information of the previously executed test cases is readily available: statement coverage, branch coverage, function coverage, …
Why NOT use such low-cost white-box information to evenly spread test cases across the code coverage space?
33
Adaptive Random Test Case Prioritization
Generate candidate set: randomly select a test case into the candidate set. If code coverage improves, continue; otherwise, stop. Merits: no magic number, non-parametric.
Select the farthest candidate from the prioritized set, using the distance between test cases and the distance between a candidate test case and the already prioritized test cases.
Repeat until all test cases are prioritized.
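The overall loop can be sketched as below. This is a hedged illustration rather than the authors' code: generate_candidate_set, art_prioritize, and the pick_farthest callback are hypothetical names, coverage is assumed to be a dict from test id to a set of covered elements, and the sketch assumes that the first draw that fails to improve coverage is left out of the candidate set.

```python
import random

def generate_candidate_set(untreated, coverage, rng):
    """Draw random tests while each draw still improves the set's coverage."""
    pool = list(untreated)
    rng.shuffle(pool)
    candidates, covered = [], set()
    for test in pool:
        # Stop at the first draw that adds nothing new (assumption: that
        # draw is not kept). The very first draw is always kept.
        if candidates and coverage[test] <= covered:
            break
        candidates.append(test)
        covered |= coverage[test]
    return candidates

def art_prioritize(coverage, pick_farthest, seed=0):
    """Prioritize all tests; pick_farthest(candidates, prioritized) -> test."""
    rng = random.Random(seed)
    untreated = set(coverage)
    prioritized = []
    while untreated:
        candidates = generate_candidate_set(sorted(untreated), coverage, rng)
        if prioritized:
            chosen = pick_farthest(candidates, prioritized)
        else:
            chosen = candidates[0]  # nothing to compare with yet: random pick
        prioritized.append(chosen)
        untreated.remove(chosen)
    return prioritized

example = {"a": {1}, "b": {2}, "c": {1, 2}}
# A placeholder selector; a real one would compare coverage distances.
print(art_prioritize(example, lambda cands, done: cands[0]))
```

Because the candidate set stops growing as soon as coverage stops improving, its size adapts to the test suite and no fixed candidate count has to be chosen.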
34
Adaptive Random Test Case Prioritization
How to measure the distance between test cases: Jaccard distance.
$$\mathrm{Jaccard}(t_1, t_2) = 1 - \frac{|s(t_1) \cap s(t_2)|}{|s(t_1) \cup s(t_2)|}$$
Here s(t) is taken to be the set of program elements covered by test case t.
A general distance metric for binary data; other distance metrics can be substituted.
How to select the test case from the candidate set that is farthest away from the already prioritized test cases?
Maximize the minimum distance (maxmin for short): Chen et al. @ ASIAN '04, LNCS 2004
Maximize the average distance (maxavg for short): Ciupa et al. @ ICSE 2008
Maximize the maximum distance (maxmax for short)
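The Jaccard distance and the three test set distances can be sketched as follows. The function names, the AGGREGATORS table, and the dict-of-sets coverage format are assumptions made for this illustration. Treating two tests that cover nothing as distance 0 is also an assumption, since the slide does not cover that case.

```python
def jaccard_distance(cov_a, cov_b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of covered elements."""
    union = cov_a | cov_b
    if not union:
        return 0.0  # assumption: two empty coverage sets count as identical
    return 1.0 - len(cov_a & cov_b) / len(union)

# How the distances from one candidate to all prioritized tests are combined.
AGGREGATORS = {
    "maxmin": min,
    "maxavg": lambda ds: sum(ds) / len(ds),
    "maxmax": max,
}

def select_farthest(candidates, prioritized, coverage, mode="maxmin"):
    """Return the candidate with the largest aggregated distance."""
    aggregate = AGGREGATORS[mode]

    def set_distance(candidate):
        return aggregate(
            [jaccard_distance(coverage[candidate], coverage[p]) for p in prioritized]
        )

    return max(candidates, key=set_distance)

coverage = {"p1": {1, 2}, "p2": {3, 4}, "c1": {1, 2, 3}, "c3": {1, 2}}
for mode in ("maxmin", "maxavg", "maxmax"):
    print(mode, select_farthest(["c1", "c3"], ["p1", "p2"], coverage, mode))
# maxmin c1
# maxavg c1
# maxmax c3
```

In this small example maxmax prefers c3 even though c3 covers exactly what p1 covers, because only its largest distance (to p2) counts.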
36
Research Questions
Do different levels of coverage information have significant impact on ART techniques?
Do different definitions of test set distances have significant impact on ART techniques?
Are ART techniques efficient?
37
Subject Programs
Subject         No. of Faulty Versions   LOC          Test Pool Size
tcas            41                       133–137      1608
schedule        9                        291–294      2650
schedule2       10                       261–263      2710
tot_info        23                       272–274      1052
print_tokens    7                        341–342      4130
print_tokens2   10                       350–354      4115
replace         32                       508–515      5542
flex            21                       8571–10124   567
grep            17                       8053–9089    809
gzip            55                       4081–5159    217
sed             17                       4756–9289    370
38
Techniques Studied in the Paper

Group        Name            Level of Coverage Info.   Test Set Distance (f2)
Random       random          Random prioritization
Total        total-st        statement
             total-fn        function
             total-br        branch
Additional   addtl-st        statement
             addtl-fn        function
             addtl-br        branch
ART          ART-fn-maxmin   function                  Maximize minimum distance
             ART-fn-maxavg   function                  Maximize average distance
             ART-fn-maxmax   function                  Maximize maximum distance
             ART-br-maxmin   branch                    Maximize minimum distance
             ART-br-maxavg   branch                    Maximize average distance
             ART-br-maxmax   branch                    Maximize maximum distance
             ART-st-maxmin   statement                 Maximize minimum distance
             ART-st-maxavg   statement                 Maximize average distance
             ART-st-maxmax   statement                 Maximize maximum distance
39
Experiment Setup
Dynamic coverage information collection: gcov tool.
Effectiveness metric: APFD, the weighted average of the percentage of faults detected over the life of the suite.
Process: for each of the 11 subject programs, randomly select 20 test suites, and repeat 50 times for each ART technique.
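For reference, APFD is commonly computed in the prioritization literature as 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the position of the first test that reveals fault i. The sketch below follows that formula. The function name and the fault-matrix format are assumptions for illustration, and it assumes every fault is revealed by at least one test in the order.

```python
def apfd(order, faults_detected_by):
    """Average Percentage of Faults Detected for a prioritized order.

    order: list of test ids in execution order.
    faults_detected_by: dict mapping a test id to the set of faults it reveals
    (a hypothetical input format).
    """
    n = len(order)
    first_position = {}
    for position, test in enumerate(order, start=1):
        for fault in faults_detected_by.get(test, ()):
            # Record only the earliest position that reveals each fault.
            first_position.setdefault(fault, position)
    m = len(first_position)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test and one detected fault")
    return 1.0 - sum(first_position.values()) / (n * m) + 1.0 / (2 * n)

faults = {"t1": set(), "t2": {"f1"}, "t3": {"f1", "f2"}, "t4": set()}
print(apfd(["t1", "t2", "t3", "t4"], faults))  # 0.5
print(apfd(["t3", "t2", "t1", "t4"], faults))  # 0.875
```

Moving the fault-revealing tests to the front raises the score, which is the sense in which a higher APFD means a faster rate of fault detection.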
41
Do different levels of coverage information have significant impact on ART techniques?
Fix the other variable: definitions of test set distances.
Perform multiple comparisons between each pair of coverage information levels and gather the statistics.
[Figure: results of the pairwise comparisons. Legend: branch > function, branch = function; branch > statement, branch = statement; statement > function, statement = function, statement < function.]
As confirmed by previous research: Branch > Statement > Function
43
Research Questions
Do different levels of coverage information have significant impact on ART techniques? Answer: Branch > Statement > Function
Do different definitions of test set distances have significant impact on ART techniques?
Are ART techniques efficient?
44
The Impact of Test Set Distance
Fix the other variable: definitions of coverage information.
Perform multiple comparisons between each pair of test set distances and gather the statistics.
[Figure: results of the pairwise comparisons. Legend: maxmin > maxavg, maxmin = maxavg, maxmin < maxavg; maxavg > maxmax, maxavg = maxmax, maxavg < maxmax; maxmin > maxmax, maxmin = maxmax.]
Max-Min > Max-Avg ≈ Max-Max
46
Best ART Technique
ART-br-maxmin is the best ART prioritization Technique
47
Research Questions
Do different levels of coverage information have significant impact on ART techniques? Answer: Branch > Statement > Function
Do different definitions of test set distances have significant impact on ART techniques? Answer: Max-Min > Max-Avg > Max-Max
How does ART-br-maxmin compare with greedy?
Are ART techniques efficient?
48
Multiple Comparisons for ART-br-maxmin on Siemens
[Figure: multiple comparison plot, axis from 0.8 to 0.92. Groups: ART-br-maxmax, ART-br-maxmin, ART-br-maxavg, ART-fn-maxmax, ART-fn-maxmin, ART-fn-maxavg, ART-st-maxmax, ART-st-maxmin, ART-st-maxavg, addtl-br, addtl-fn, addtl-st, total-br, total-fn, total-st, random. 6 groups have means significantly different from ART-br-maxmin.]
Only a marginal difference between ART-br-maxmin and traditional coverage-based techniques, and it is not statistically significant.
50
Multiple Comparisons for ART-br-maxmin on UNIX
[Figure: multiple comparison plot, axis from 0.4 to 1. Groups: ART-br-maxmax, ART-br-maxmin, ART-br-maxavg, ART-fn-maxmax, ART-fn-maxmin, ART-fn-maxavg, ART-st-maxmax, ART-st-maxmin, ART-st-maxavg, addtl-br, addtl-fn, addtl-st, total-br, total-fn, total-st, random. 8 groups have means significantly different from ART-br-maxmin.]
Only a marginal difference between ART-br-maxmin and traditional coverage-based techniques, and it is not statistically significant.
52
Research Questions
Do different levels of coverage information have significant impact on ART techniques? Answer: Branch > Statement > Function
Do different definitions of test set distances have significant impact on ART techniques? Answer: Max-Min > Max-Avg > Max-Max
How does ART-br-maxmin compare with greedy? Answer: ART-br-maxmin ≈ Additional > Total
Are ART techniques efficient?
53
Time Cost Analysis across All Programs
[Figure: time (s) used for prioritization by Random, Additional, Total, and ART.]
ART << Additional
ART ≈ Total
55
Research Questions
Do different levels of coverage information have significant impact on ART techniques? Answer: Branch > Statement > Function
Do different definitions of test set distances have significant impact on ART techniques? Answer: Max-Min > Max-Avg > Max-Max
Is there a best ART technique? Answer: ART-br-maxmin, with ART ≈ Additional > Total
Are ART techniques efficient? Answer: YES (<< Additional, ≈ Total)
57
Related Works
Greedy Techniques for Test Case Prioritization: Rothermel et al. @ ICSM 1999; S. Elbaum et al. @ TSE '02. Greedy algorithms.
ART Seminal Paper: Chen et al. @ ASIAN '04, LNCS 2004. ART techniques can improve the effectiveness of random test case selection by 40%-50%.
Theoretical Aspects of ART Techniques: Chen et al. @ ACM TOSEM 17, 3, 2008. No technique can improve the effectiveness of random test case selection by more than 50%.
58
Related Works
ART for Object-Oriented Software: Ciupa et al. @ ICSE 2008. Defines a metric for measuring object distance. ARTOO finds faults faster and detects faults not found by directed random testing.
Profile Guided Test Case Generation: Dickinson et al. @ FSE, 2001. Studies how failures are distributed in profile space in real software. Improves test case generation by pursuing failure regions.
60
Conclusion
Adaptive Random Test Case Prioritization can be much more effective than random prioritization.
There is only a marginal difference in effectiveness between ART-br-maxmin and the additional greedy techniques, and it is not statistically significant, yet ART-br-maxmin is much more efficient.
Compared to the total technique, ART-br-maxmin is more effective on real-life programs but slightly less efficient.
61
Future Work
Are there any better metrics to measure test case distance?
Improve greedy techniques by using ART to resolve tie cases.
Extend the ART prioritization techniques to the testing of concurrent programs and other domain-specific techniques.
62
Comments are welcome!