Adaptive Random Test Case Prioritization Speaker: Bo Jiang * Co-authors: Zhenyu Zhang *, W.K.Chan,...

1

Adaptive Random Test Case Prioritization

Speaker: Bo Jiang*
Co-authors: Zhenyu Zhang*, W.K. Chan†, T.H. Tse*

* The University of Hong Kong
† City University of Hong Kong

2

Contents
  Background
  Motivation
  Adaptive Random Test Case Prioritization
  Experiments and Results Analysis
  Related Works
  Conclusion & Future work

3

Regression Testing Techniques

[Figure: program P with test suite T, and modified program P’ with test suites T’]

Obsolete Test Case Elimination
Test Case Augmentation
Test Case Prioritization
Test Case Reduction
Test Case Selection

Regression testing accounts for 50% of the cost of software maintenance.

4

Test Case Prioritization

Definition
  Test case prioritization permutes a test suite T for execution to meet a chosen testing goal.

Typical testing goals
  Rate of code coverage
  Rate of fault detection
  Rate of requirement coverage

Merits
  No impact on the fault detection ability

5

Coverage-based Test Case Prioritization Technique

Total-statement/function/branch
  Highest code coverage first
  Resolve tie cases randomly

Additional-statement/function/branch
  Additional highest code coverage first
  Reset when no more coverage can be achieved
  Resolve tie cases randomly

Disadvantages
  Hard to scale to larger programs
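The two greedy strategies above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the `coverage` dictionary (test name to set of covered items), the function names, and the seeded random generator are all assumptions made for the example.

```python
import random


def total_prioritize(coverage, rng=None):
    """Total strategy: highest coverage first, ties resolved randomly."""
    rng = rng or random.Random(0)
    tests = sorted(coverage)
    rng.shuffle(tests)  # random order first; the stable sort keeps it for ties
    return sorted(tests, key=lambda t: len(coverage[t]), reverse=True)


def additional_prioritize(coverage, rng=None):
    """Additional strategy: highest not-yet-covered coverage first."""
    rng = rng or random.Random(0)
    all_items = set().union(*coverage.values())
    remaining = sorted(coverage)
    uncovered = set(all_items)
    order = []
    while remaining:
        best_gain = max(len(coverage[t] & uncovered) for t in remaining)
        if best_gain == 0:
            if uncovered == all_items:
                # The remaining tests cover nothing at all: order them randomly.
                rng.shuffle(remaining)
                order.extend(remaining)
                break
            uncovered = set(all_items)  # reset: no more coverage can be achieved
            continue
        ties = [t for t in remaining if len(coverage[t] & uncovered) == best_gain]
        chosen = rng.choice(ties)  # resolve tie cases randomly
        order.append(chosen)
        remaining.remove(chosen)
        uncovered -= coverage[chosen]
    return order


# Hypothetical coverage data: test name -> set of covered statement ids.
example = {"t1": {1, 2, 3, 4}, "t2": {1, 2, 3}, "t3": {5, 6}, "t4": {6}}
print(total_prioritize(example))       # ['t1', 't2', 't3', 't4']
print(additional_prioritize(example))  # ['t1', 't3', 't2', 't4']
```

The additional strategy recomputes the coverage gain of every remaining test in each round, which is one reason it becomes costly on larger programs.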

6


7

Problem With Total Techniques

[Figure: APFD results on grep and flex. Source: Elbaum et al. @ TSE 2002]

8

Problem With Total(greedy) Techniques

[Figure: APFD results on grep and flex. Source: Elbaum et al. @ TSE 2002]

Total strategy may NOT be effective for real-life programs.

9

Problems with Additional Techniques

[Figure: time used for prioritization (y-axis, 0 to 45; x-axis ticks 1 to 6). Series: Random, Additional, and Total, each on the Siemens and UNIX programs]

10


[Figure: time used for prioritization (y-axis, 0 to 45; x-axis ticks 1 to 6). Series: Random, Additional, and Total, each on the Siemens and UNIX programs]

Additional Techniques may NOT be efficient for real-life programs.

11


[Figure: time used for prioritization (y-axis, 0 to 45; x-axis ticks 1 to 6). Series: Random, Additional, and Total, each on the Siemens and UNIX programs]

Can we find a prioritization technique that is both effective and efficient for real-life programs?

12

Adaptive Random Testing (ART)

Adaptive Random Testing (ART)
  A technique for test case generation
  Evenly spreads randomly generated test cases across the input domain.
  In empirical studies, ART can detect failures using up to 50% fewer test cases than random testing.

13

Fixed-Sized-Candidate-Set ART Algorithm

Randomly generate a test case and execute it.

14

Randomly generate a set of candidate test cases.


15

For each candidate test case, find its nearest neighbor within the executed test cases.


16

Select the candidate test case that is farthest from its nearest neighbor and execute it.


17

Randomly generate a set of candidate test cases.


18



19



20



21



22



23

Select the candidate test case that is farthest from its nearest neighbor and execute it.


24

Repeat until a failure is encountered.
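The steps walked through on the preceding slides can be sketched as below. This is a rough illustration under stated assumptions: a two-dimensional numeric input domain (the unit square), Euclidean distance, a candidate set of size `k=10`, a `max_tests` cap to guarantee termination, and a hypothetical rectangular failure region. None of these specifics come from the slides.

```python
import math
import random


def fscs_art(is_failure, sample, distance, k=10, max_tests=1000, rng=None):
    """Fixed-Sized-Candidate-Set ART: returns the executed tests in order."""
    rng = rng or random.Random(0)
    executed = [sample(rng)]  # randomly generate a test case and execute it
    if is_failure(executed[0]):
        return executed
    while len(executed) < max_tests:
        # Randomly generate a set of candidate test cases.
        candidates = [sample(rng) for _ in range(k)]
        # For each candidate, find the distance to its nearest executed
        # neighbor, then keep the candidate for which that distance is largest.
        best = max(
            candidates,
            key=lambda c: min(distance(c, e) for e in executed),
        )
        executed.append(best)
        if is_failure(best):  # repeat until a failure is encountered
            break
    return executed


def sample_unit_square(rng):
    """Hypothetical input generator: a uniform point in the unit square."""
    return (rng.random(), rng.random())


def in_failure_region(point):
    """Hypothetical failure region: a small block of contiguous inputs."""
    x, y = point
    return 0.70 <= x <= 0.80 and 0.20 <= y <= 0.30


tests = fscs_art(in_failure_region, sample_unit_square, math.dist,
                 rng=random.Random(1))
print(len(tests), "test cases executed")
```

Each new test costs `k` times the number of executed tests in distance computations, so the overhead grows with the length of the run.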



25

Adaptive Random Testing (ART)

ART is based on the observation that failures tend to cluster across the input domain.

Intuitively, spreading the test cases evenly may increase the probability of exposing the first fault sooner.

In test case prioritization, we also want to increase the rate of fault detection.

26

Use ART directly for test case prioritization?

The variety of black-box input information makes it hard to define a general distance metric.
  Video streams
  Images
  XML
  …

The white-box coverage information of the previously executed test cases is readily available.
  Statement coverage
  Branch coverage
  Function coverage

And…

27

Distribution of Failures in Profile Space on LilyPond

William Dickinson et al. @ FSE, 2001.

28

MDS Display of Distribution of Failures in Profile Space on LilyPond


Failures tend to cluster together.

29

MDS Display of Distribution of Failures in Profile Space on GCC


30

Distribution of Failures in Profile Space on GCC


Failures tend to cluster together.

31

Use ART directly for test case prioritization?

The variety of black-box input information makes it hard to define a uniform distance metric.
  Video streams
  Images
  XML
  …

The white-box coverage information of the previously executed test cases is readily available.
  Statement coverage
  Branch coverage
  Function coverage
  …

Why NOT use such low-cost white-box information to evenly spread test cases across the code coverage space?

32


33

Adaptive Random Test Case Prioritization

Generate candidate set
  Randomly select a test case into the candidate set
  If code coverage improves, continue; otherwise, stop.
  Merits: no magic number, non-parametric

Select the farthest candidate from the prioritized set
  Distance between test cases
  Distance between a candidate test case and the already prioritized test cases

Repeat until all test cases are prioritized
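The procedure on this slide might look like the following sketch. Several details are assumptions made only so that the example runs: coverage is given as a dictionary from test name to a set of covered items, the coverage of the candidate set starts empty on every round, the test that fails to improve coverage is not added, the very first test is picked at random, and distances are Jaccard distances aggregated with `min` (the maxmin variant).

```python
import random


def jaccard_distance(a, b):
    """Jaccard distance between two coverage sets (0.0 if both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def generate_candidates(untreated, coverage, rng):
    """Add random tests while each one improves the candidate set's coverage."""
    pool = list(untreated)
    rng.shuffle(pool)
    candidates, covered = [], set()
    for test in pool:
        if not coverage[test] - covered:  # no coverage improvement: stop
            break
        candidates.append(test)
        covered |= coverage[test]
    return candidates or pool[:1]  # never return an empty candidate set


def art_prioritize(coverage, test_set_distance=min, rng=None):
    """Return all tests ordered by adaptive random prioritization."""
    rng = rng or random.Random(0)
    untreated = sorted(coverage)
    prioritized = []
    while untreated:
        candidates = generate_candidates(untreated, coverage, rng)
        if not prioritized:
            chosen = rng.choice(candidates)  # nothing to be far from yet
        else:
            chosen = max(
                candidates,
                key=lambda c: test_set_distance(
                    [jaccard_distance(coverage[c], coverage[p])
                     for p in prioritized]
                ),
            )
        prioritized.append(chosen)
        untreated.remove(chosen)
    return prioritized


# Hypothetical coverage data: test name -> set of covered branch ids.
example = {"t1": {1, 2, 3}, "t2": {1, 2}, "t3": {4, 5}, "t4": {6}}
print(art_prioritize(example, rng=random.Random(7)))
```

Because the candidate set stops growing as soon as coverage stops improving, its size adapts to the test suite and no fixed candidate count has to be chosen.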

34

Adaptive Random Test Case Prioritization

How to measure the distance of test cases
  Jaccard Distance
    General distance metric for binary data
    Other distance metrics can be substituted.

How to select the test case from the candidate set that is farthest away from the already prioritized test cases?
  Maximize the minimum distance (maxmin for short)
    Chen et al. @ ASIAN '04, LNCS 2004
  Maximize the average distance (maxavg for short)
    Ciupa et al. @ ICSE 2008
  Maximize the maximum distance (maxmax for short)

$\mathrm{Jaccard}(t_1, t_2) = 1 - \dfrac{|s(t_1) \cap s(t_2)|}{|s(t_1) \cup s(t_2)|}$
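A small worked example may help show how the three test set distances can pick different candidates. The coverage sets, the names `cA` and `cB`, and the helper functions are hypothetical; only the Jaccard distance and the maxmin, maxavg, and maxmax ideas come from the slide.

```python
def jaccard_distance(s1, s2):
    """1 - |s1 ∩ s2| / |s1 ∪ s2| for two coverage sets."""
    union = s1 | s2
    return 1.0 - len(s1 & s2) / len(union) if union else 0.0


# Each variant aggregates the distances from one candidate to all
# already prioritized test cases in a different way.
AGGREGATES = {
    "maxmin": min,
    "maxavg": lambda dists: sum(dists) / len(dists),
    "maxmax": max,
}


def select_farthest(candidates, prioritized, variant):
    """Return the name of the candidate with the largest aggregated distance."""
    aggregate = AGGREGATES[variant]
    return max(
        candidates,
        key=lambda name: aggregate(
            [jaccard_distance(candidates[name], p) for p in prioritized]
        ),
    )


# Hypothetical coverage sets of two already prioritized test cases.
prioritized = [{1, 2, 3, 4}, {5, 6}]
# Hypothetical candidates: cA is close to the first and disjoint from the
# second; cB overlaps a little with both.
candidates = {"cA": {1, 2, 3, 4, 7}, "cB": {1, 5}}

# Distances: cA -> about (0.2, 1.0), cB -> about (0.8, 0.667)
for variant in ("maxmin", "maxavg", "maxmax"):
    print(variant, select_farthest(candidates, prioritized, variant))
# maxmin cB
# maxavg cB
# maxmax cA
```

In this example maxmax is satisfied by a candidate that nearly duplicates an already prioritized test, as long as it is far from some other one, whereas maxmin requires the candidate to be far from every prioritized test.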

35

Contents
  Background
  Motivation
  Adaptive Random Test Case Prioritization
  Experiments and Results Analysis
  Related Works
  Conclusion & Future Work

36

Research Questions

Do different levels of coverage information have significant impact on ART techniques?

Do different definitions of test set distances have significant impacts on ART techniques?

Are ART techniques efficient?

37

Subject Programs

Subject         No. of Faulty Versions   LOC           Test Pool Size
tcas            41                       133–137       1608
schedule        9                        291–294       2650
schedule2       10                       261–263       2710
tot_info        23                       272–274       1052
print_tokens    7                        341–342       4130
print_tokens2   10                       350–354       4115
replace         32                       508–515       5542
flex            21                       8571–10124    567
grep            17                       8053–9089     809
gzip            55                       4081–5159     217
sed             17                       4756–9289     370

38

Techniques Studied in the Paper

Group        Name            Description
Random       random          Random prioritization

Group        Name            Level of Coverage Info.
Total        total-st        statement
             total-fn        function
             total-br        branch
Additional   addtl-st        statement
             addtl-fn        function
             addtl-br        branch

Group        Name            Level of Coverage Info.   Test Set Distance (f2)
ART          ART-fn-maxmin   Function                  Maximize minimum distance
             ART-fn-maxavg   Function                  Maximize average distance
             ART-fn-maxmax   Function                  Maximize maximum distance
             ART-br-maxmin   Branch                    Maximize minimum distance
             ART-br-maxavg   Branch                    Maximize average distance
             ART-br-maxmax   Branch                    Maximize maximum distance
             ART-st-maxmin   Statement                 Maximize minimum distance
             ART-st-maxavg   Statement                 Maximize average distance
             ART-st-maxmax   Statement                 Maximize maximum distance

39

Experiment Setup

Dynamic coverage information collection
  gcov tool

Effectiveness metric
  APFD: weighted average of the percentage of faults detected over the life of the suite

Process
  For each of the 11 subject programs, randomly select 20 test suites, and repeat 50 times for each ART technique.
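For readers unfamiliar with APFD, the sketch below computes it with the formula commonly used in the prioritization literature (for example by Elbaum et al.): APFD = 1 - (TF_1 + ... + TF_m) / (n * m) + 1 / (2n), where n is the number of test cases, m the number of faults, and TF_i the position of the first test that detects fault i. The slide gives only the verbal description, so the formula, the data layout, and the function name here are assumptions of this example.

```python
def apfd(order, fault_matrix):
    """APFD of a test ordering.

    order: list of test names in execution order.
    fault_matrix: dict mapping each fault to the set of tests that detect it.
    Assumes every fault is detected by at least one test in the ordering.
    """
    n = len(order)
    m = len(fault_matrix)
    position = {test: index + 1 for index, test in enumerate(order)}
    first_positions = []
    for fault, detecting_tests in fault_matrix.items():
        hits = [position[t] for t in detecting_tests if t in position]
        if not hits:
            raise ValueError(f"fault {fault!r} is not detected by any test")
        first_positions.append(min(hits))  # TF_i for this fault
    return 1 - sum(first_positions) / (n * m) + 1 / (2 * n)


# Hypothetical data: two faults, both first exposed by test C.
faults = {"f1": {"C"}, "f2": {"C", "D"}}
print(apfd(["A", "B", "C", "D"], faults))  # 0.375
print(apfd(["C", "A", "B", "D"], faults))  # 0.875
```

Moving the fault-revealing test to the front raises APFD from 0.375 to 0.875, which is the effect a prioritization technique aims for.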

40


Do different levels of coverage information have significant impact on ART techniques?

Do different definitions of test set distances have significant impacts on ART techniques?

Are ART techniques efficient?

41

Do different levels of coverage information have significant impact on ART techniques?

Fix the other variable: definitions of test set distances.
Perform multiple comparisons between each pair of coverage information levels and gather the statistics.

[Figure: outcomes of the pairwise comparisons. Legend: branch > function, branch = function; branch > statement, branch = statement; statement > function, statement = function, statement < function]

42

Do different levels of coverage information have significant impact on ART techniques?

Fix the other variable: definitions of test set distances.
Perform multiple comparisons between each pair of coverage information levels and gather the statistics.

[Figure: outcomes of the pairwise comparisons. Legend: branch > function, branch = function; branch > statement, branch = statement; statement > function, statement = function, statement < function]

As confirmed by previous research: Branch > Statement > Function

43


Do different levels of coverage information have significant impact on ART techniques? Branch > Statement > Function

Do different definitions of test set distances have significant impacts on ART techniques?

Are ART techniques efficient?

44

The Impact of Test Set Distance

Fix the other variable: definitions of coverage information.
Perform multiple comparisons between each pair of test set distances and gather the statistics.

[Figure: outcomes of the pairwise comparisons. Legend: maxmin > maxavg, maxmin = maxavg, maxmin < maxavg; maxavg > maxmax, maxavg = maxmax, maxavg < maxmax; maxmin > maxmax, maxmin = maxmax]

45

The Impact of Test Set Distance

Fix the other variable: definitions of coverage information.
Perform multiple comparisons between each pair of test set distances and gather the statistics.

[Figure: outcomes of the pairwise comparisons. Legend: maxmin > maxavg, maxmin = maxavg, maxmin < maxavg; maxavg > maxmax, maxavg = maxmax, maxavg < maxmax; maxmin > maxmax, maxmin = maxmax]

Max-Min > Max-Avg ≈ Max-Max

46

Best ART Technique

ART-br-maxmin is the best ART prioritization technique.

47



Do different definitions of test set distances have significant impacts on ART techniques? Max-Min > Max-Avg > Max-Max

How does ART-br-maxmin compare with greedy?


48

Multiple Comparisons for ART-br-maxmin on Siemens

[Figure: multiple comparison of the 16 techniques, x-axis from 0.8 to 0.92. Rows: ART-br-maxmax, ART-br-maxmin, ART-br-maxavg, ART-fn-maxmax, ART-fn-maxmin, ART-fn-maxavg, ART-st-maxmax, ART-st-maxmin, ART-st-maxavg, addtl-br, addtl-fn, addtl-st, total-br, total-fn, total-st, random]

6 groups have means significantly different from ART-br-maxmin

49

Multiple Comparisons for ART-br-maxmin on Siemens

[Figure: multiple comparison of the 16 techniques, x-axis from 0.8 to 0.92. Rows: ART-br-maxmax, ART-br-maxmin, ART-br-maxavg, ART-fn-maxmax, ART-fn-maxmin, ART-fn-maxavg, ART-st-maxmax, ART-st-maxmin, ART-st-maxavg, addtl-br, addtl-fn, addtl-st, total-br, total-fn, total-st, random]

6 groups have means significantly different from ART-br-maxmin

There is only a marginal difference between ART-br-maxmin and traditional coverage-based techniques, and it is not statistically significant.

50

Multiple Comparisons for ART-br-maxmin on UNIX

[Figure: multiple comparison of the 16 techniques, x-axis from 0.4 to 1. Rows: ART-br-maxmax, ART-br-maxmin, ART-br-maxavg, ART-fn-maxmax, ART-fn-maxmin, ART-fn-maxavg, ART-st-maxmax, ART-st-maxmin, ART-st-maxavg, addtl-br, addtl-fn, addtl-st, total-br, total-fn, total-st, random]

8 groups have means significantly different from ART-br-maxmin

51

Multiple Comparisons for ART-br-maxmin on UNIX

[Figure: multiple comparison of the 16 techniques, x-axis from 0.4 to 1. Rows: ART-br-maxmax, ART-br-maxmin, ART-br-maxavg, ART-fn-maxmax, ART-fn-maxmin, ART-fn-maxavg, ART-st-maxmax, ART-st-maxmin, ART-st-maxavg, addtl-br, addtl-fn, addtl-st, total-br, total-fn, total-st, random]

8 groups have means significantly different from ART-br-maxmin

There is only a marginal difference between ART-br-maxmin and traditional coverage-based techniques, and it is not statistically significant.

52




How does ART-br-maxmin compare with greedy? ART-br-maxmin ≈ Additional > Total


53

Time Cost Analysis across All Programs

[Figure: time cost (y-axis: Time, 0 to 25) of four groups of techniques: Random, Additional, Total, ART]

54

Time Cost Analysis across All Programs

[Figure: time cost (y-axis: Time (s), 0 to 25) of four groups of techniques: Random, Additional, Total, ART]

ART << Additional
ART ≈ Total

55




Is there a best ART technique? ART-br-maxmin
ART ≈ Additional > Total

Are ART techniques efficient? YES (<< Additional, ≈ Total)

56

Contents
  Background
  Motivating Example
  Adaptive Random Test Case Prioritization
  Experiments and Results Analysis
  Related Works
  Conclusion & Future work

57

Related Works

Greedy Techniques for Test Case Prioritization
  Rothermel et al. @ ICSM 1999, S. Elbaum et al. @ TSE’02.
  Greedy Algorithms

ART Seminal Paper
  Chen et al. @ ASIAN '04, LNCS 2004
  ART techniques can improve the effectiveness of random test case selection by 40%-50%

Theoretical Aspects of ART Techniques
  Chen et al. @ ACM TOSEM 17, 3, 2008.
  No technique can improve the effectiveness of random test case selection by more than 50%.

58

Related Works

ART for Object-Oriented Software
  Ciupa et al. @ ICSE 2008
  Defines a metric for measuring object distance
  ARTOO finds faults faster
  Detects faults not found by directed random testing.

Profile Guided Test Case Generation
  Dickinson et al. @ FSE, 2001.
  Studies how failures are distributed in profile space in real software
  Improves test case generation by pursuing failure regions

59

Contents
  Background
  Motivating Example
  Adaptive Random Test Case Prioritization
  Experiments and Results Analysis
  Related Works
  Conclusion & Future work

60

Conclusion

Adaptive Random Test Case Prioritization can be much more effective than random prioritization.

There is only a marginal difference in effectiveness between ART-br-maxmin and additional greedy techniques (not statistically significant), yet ART-br-maxmin is much more efficient.

Compared to the total technique, ART-br-maxmin is more effective on real-life programs but slightly less efficient.

61

Future Work

Are there any better metrics to measure test case distance?

Improve greedy techniques by using ART to resolve tie cases.

Extend the ART prioritization techniques to the testing of concurrent programs and other domain-specific techniques.

62

Comments are welcome!
