CS5103 Software Engineering Lecture 16: Test Coverage and Regression Testing
CS5103 Software
Engineering
Lecture 16: Test coverage
Regression Testing
2
Today’s class
Test coverage: Input combination coverage
Mutation coverage
Regression testing: Test prioritization
Mocking
3
Input Combination Coverage
Basic idea: comes from the most straightforward notion of testing
In theory, achieving 100% coverage proves 100% correctness
In practice, this works only on very trivial cases
Main problems: combinations grow exponentially
Possible values can be infinite
4
Input Combination Coverage
An example: a simple automatic sales machine
Accepts only a single $1 bill, and all beverages cost $1
Coke, Sprite, Juice, Water
Icy or normal temperature
Receipt wanted or not
All combinations = 4 * 2 * 2 = 16 combinations
Trying all 16 combinations makes sure the system works correctly
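The full-combination idea can be enumerated directly; a minimal Python sketch (the input names come from the slide's sales machine example):

```python
from itertools import product

beverages = ["Coke", "Sprite", "Juice", "Water"]
temperatures = ["Normal", "Icy"]
receipts = ["Receipt", "No-Receipt"]

# Full input-combination coverage: one test case per triple of values.
all_combinations = list(product(beverages, temperatures, receipts))
print(len(all_combinations))  # 4 * 2 * 2 = 16
```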
5
Input Combination Coverage
Sales Machine Example
Input 1 (beverage): Coke, Sprite, Juice, Water
Input 2 (temperature): Normal, Icy
Input 3 (receipt): Receipt, No-Receipt
6
Combination Explosion
Combinations are exponential in the number of inputs
Consider an annual tax report system with 50 yes/no questions that generate a customized form for you
2^50 combinations = about 10^15 test cases
Running 1,000 test cases per second -> about 30,000 years
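The arithmetic behind the 30,000-year figure can be checked directly; a rough order-of-magnitude sketch:

```python
combinations = 2 ** 50            # one yes/no choice per question
print(combinations)               # 1125899906842624, about 10**15

tests_per_second = 1000
years = combinations / tests_per_second / (365 * 24 * 3600)
print(round(years))               # about 35,700 years, the slide's order of magnitude
```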
7
Observation
When there are many inputs, a relationship among inputs usually involves only a small number of them
The previous example: maybe only the temperature interacts with the beverage choice (icy Coke vs. icy Sprite), while the receipt option is independent
8
Example of Tax Report
Input 1: Family combined report or Single report
Input 2: Home loans or not
Input 3: Receive gift or not
Input 4: Age over 60 or not
…
Input 1 is related to all other inputs
Other inputs are independent of each other
9
Studies
A long-term study from NIST (the National Institute of Standards and Technology): a combination width of 4 to 6 is enough for detecting almost all errors
10
N-wise coverage
Coverage of the N-wise combinations of the possible values of all inputs
Example: 2-wise combinations
(coke, icy), (sprite, icy), (juice, icy), (water, icy)
(coke, normal), (sprite, normal), ...
(coke, receipt), (sprite, receipt), ...
(coke, no-receipt), (sprite, no-receipt), ...
(icy, receipt), (normal, receipt)
(icy, no-receipt), (normal, no-receipt)
20 combinations in total
We had 16 3-wise combinations; now we have 20 2-wise combinations. Did things get worse?
11
N-wise coverage
Note: one test case may cover multiple N-wise combinations
E.g., (Coke, Icy, Receipt) covers three 2-wise combinations:
(Coke, Icy), (Coke, Receipt), (Icy, Receipt)
Does 100% N-wise coverage imply 100% (N-1)-wise coverage? Yes: any (N-1)-wise combination extends to some N-wise combination, and the test covering that one also covers the smaller one
For k Boolean inputs:
Full combination coverage = 2^k combinations: exponential
Full n-wise coverage = 2^n * k*(k-1)*...*(k-n+1)/n! combinations: polynomial; for 2-wise, 2*k*(k-1)
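The exponential-versus-polynomial gap is easy to see numerically; a sketch for k Boolean inputs:

```python
from math import comb

def full_combinations(k):
    # every assignment of k Boolean inputs
    return 2 ** k

def n_wise_combinations(k, n):
    # choose n of the k inputs, times 2**n value settings for them
    return comb(k, n) * 2 ** n

k = 50
print(full_combinations(k))       # 2**50: exponential
print(n_wise_combinations(k, 2))  # 2*k*(k-1) = 4900: polynomial
```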
12
N-wise coverage: Example
How many test cases for 100% 2-wise coverage of our sales machine example?
(coke, icy, receipt): covers 3 new 2-wise combinations
(sprite, icy, no-receipt): covers 3 new ...
(juice, icy, receipt): covers 2 new ...
(water, icy, receipt): covers 2 new ...
(coke, normal, no-receipt): covers 3 new ...
(sprite, normal, receipt): covers 3 new ...
(juice, normal, no-receipt): covers 2 new ...
(water, normal, no-receipt): covers 2 new ...
8 test cases cover all 20 2-wise combinations
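That these 8 test cases really cover all 20 pairwise combinations can be verified mechanically; a sketch:

```python
from itertools import combinations, product

tests = [
    ("coke", "icy", "receipt"), ("sprite", "icy", "no-receipt"),
    ("juice", "icy", "receipt"), ("water", "icy", "receipt"),
    ("coke", "normal", "no-receipt"), ("sprite", "normal", "receipt"),
    ("juice", "normal", "no-receipt"), ("water", "normal", "no-receipt"),
]
domains = [
    ["coke", "sprite", "juice", "water"],
    ["icy", "normal"],
    ["receipt", "no-receipt"],
]

# Every required 2-wise combination: two input positions plus a value pair.
required = {(i, j, a, b)
            for i, j in combinations(range(3), 2)
            for a, b in product(domains[i], domains[j])}

# Pairs actually exercised by the 8 test cases.
covered = {(i, j, t[i], t[j])
           for t in tests for i, j in combinations(range(3), 2)}

print(len(required))        # 8 + 8 + 4 = 20
print(required <= covered)  # True
```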
13
Combination Coverage in Practice
2-wise combination coverage is very widely used
Also called pair-wise testing or all-pairs testing
Mostly used in configuration testing
Example: the configuration of gcc
A lot of variables
Several options for each variable
For command-line tools: add or remove an option
14
Input model
What happens if an input has infinitely many possible values?
Integer
Float
Character
String
Note: all of these are actually finite, but the possible value set is so large that it is treated as infinite
Idea: map the infinite values to a finite set of value baskets (ranges)
15
Input model
Input partition: partition the possible value set of an input into several value ranges
Transforms numeric variables (integer, float, double, character) into enumerated variables
Example:
int exam_score => {<0}, {0-59}, {60-69}, {70-79}, {80-89}, {90-100}, {>100}
char c => {a-z}, {A-Z}, {0-9}, {other}
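A partition turns the huge integer domain into a handful of enumerated buckets; a minimal sketch using the exam-score ranges above (the bucket labels are made up here):

```python
def score_partition(score):
    """Map an exam score to its partition label."""
    if score < 0:
        return "negative"
    if score <= 59:
        return "0-59"
    if score <= 69:
        return "60-69"
    if score <= 79:
        return "70-79"
    if score <= 89:
        return "80-89"
    if score <= 100:
        return "90-100"
    return "over 100"

# One representative test input per partition stands in for infinitely many values.
print([score_partition(s) for s in (-5, 30, 65, 75, 85, 95, 120)])
```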
16
Input model
Feature extraction: for string and structured inputs
Split the possible value set by a certain feature
Example: String passwd => {contains space}, {no space}
It is possible to extract multiple features from one input
Example: String name => {capitalized first letter}, {not}
=> {contains space}, {not}
=> {length >10}, {2-10}, {1}, {0}
One test case may cover multiple features
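The three features of the String-name example can be extracted by a small function; a sketch (the feature names are illustrative, not from the slides):

```python
def name_features(name):
    """Extract the three name features: capitalization, space, length bucket."""
    if len(name) > 10:
        length = ">10"
    elif len(name) >= 2:
        length = "2-10"
    else:
        length = str(len(name))  # "0" or "1"
    return {
        "capitalized_first": name[:1].isupper(),
        "contains_space": " " in name,
        "length": length,
    }

# A single test input covers one value of each feature at once.
print(name_features("Alan Turing"))
# {'capitalized_first': True, 'contains_space': True, 'length': '>10'}
```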
17
Input model
Feature extraction: structured input
A word binary tree (the data at all nodes are strings)
Depth: integer -> partition {0, 1, 1+}
Number of leaves: integer -> partition {0, 1, <10, 10+}
Root: null / not
A node with only a left child / not
A node with only a right child / not
Null value data on any node / not
Root value: string -> further feature extraction
Value on the left-most leaf: string -> further feature extraction
...
18
Input model
Infeasible feature combinations? Example:
String name => {capitalized first letter}, {not}
=> {contains space}, {not}
=> {length >10}, {2-10}, {1}, {0}
Infeasible: length = 0 ^ contains space
length = 0 ^ capitalized first letter
length = 1 ^ contains space ^ capitalized first letter
19
Input combination coverage
Summary: try to cover the combinations of possible values of inputs
Exponential combinations: N-wise coverage; 2-wise coverage is the most popular (all-pairs testing)
Infinite possible values: input partition, input feature extraction
Coverage is usually 100% once adopted: it is easy to achieve compared with code coverage, but the models are not easy to write
20
Test coverage
So far, we have covered inputs and code
The final goal of testing: find all bugs in the software
So there should be a bug coverage
Such a coverage would represent the adequacy of a test suite
50% bug coverage = half done!
100% bug coverage = done!
21
But it is impossible
Bugs are unknown; otherwise we would not need testing
So we have the number of bugs found, but we do not know what to divide it by
One possible solution: estimation
1-10 bugs per KLOC
Depends on the type of software and the stage of development; imprecise
When you find many bugs, do you think you have found them all, or is the code really of low quality?
22
Mutation coverage
How can we know how many bugs there are in the code?
If only we had planted those bugs ourselves!
Mutation coverage checks the adequacy of a test suite by how many human-planted bugs it can expose
23
Concepts
Mutant: a software version with a planted bug
Usually each mutant contains only one planted bug. Why?
Mutant kill: given a test suite S and a mutant m, if there is a test case t in S such that execute(original, t) != execute(m, t), we say that S kills m
Basically, a test suite killing a mutant means the suite is able to detect the planted bug represented by that mutant
24
Illustration
Run the test cases (with their oracles) on the original version and on each mutant (Mutant 1, Mutant 2, ..., Mutant n), then compare the results:
Same results -> the mutant survived
Different results -> the mutant was killed
25
Concepts
Mutation coverage = # of killed mutants / # of generated mutants
26
Mutant generation
Traditional mutation operators:
Statement deletion
Replace a Boolean expression with true/false
Replace arithmetic operators (+, -, *, /, ...)
Replace comparison operators (>=, ==, <=, !=)
Replace variables
...
27
Mutation Example: Operators

Mutation operator                | In original | In mutant
Statement deletion               | z=x*y+1;    | (statement removed)
Boolean expression to true/false | if (x<y)    | if (true), if (false)
Replace arithmetic operators     | z=x*y+1;    | z=x*y-1;, z=x+y+1;
Replace comparison operators     | if (x<y)    | if (x<=y), if (x==y)
Replace variables                | z=x*y+1;    | z=z*y+1;, z=x*x+1;
28
Mutant testing tools
MILU: http://www0.cs.ucl.ac.uk/staff/Y.Jia/#tools
MuJava: http://cs.gmu.edu/~offutt/mujava/
Javalanche: https://github.com/david-schuler/javalanche/
29
Summary on all coverage measures
Code coverage Target: code
Adequacy: no -> 100% code coverage != no bugs
Approximation: dataflow, branch, method/statements
Preparation: none (instrumentation can be done automatically)
Overhead: low (instrumentation causes some overhead)
30
Summary on all coverage measures
Input combination coverage Target: inputs
Adequacy: yes -> 100% input coverage == no bugs
Approximation: n-wise coverage, input partition, input feature extraction
Preparation: hard (requires input modelling)
Overhead: none
31
Summary on all coverage measures
Mutation coverage Target: bugs
Adequacy: no -> 100% mutant coverage != no bugs
Approximation: mutation is already approximation
Preparation: none (mutation and execution can be done automatically)
Overhead: very high (execution on instrumented mutated versions)
32
Regression Testing
So far: unit testing
System testing
Test coverage
All of these are about the first round of testing
Testing is performed from time to time during the software life cycle
Test cases / oracles can be reused in all rounds
Testing during the evolution phase is called regression testing
33
Regression Testing
When we try to enhance the software, we may also introduce bugs
The software worked yesterday but not today: this is called a "regression"
Numbers from an empirical study on Eclipse (2005):
11% of commits are bug-inducing
24% of fixing commits are bug-inducing
34
Regression Testing
Run the old test cases on the new version of the software
It will cost a lot if we run the whole suite each time
Try to save time and cost in new rounds of testing:
Test prioritization
Fake objects
35
Test prioritization
Rank all the test cases
Run test cases according to the ranked sequence
Stop when resources are used up
How to rank test cases: to discover bugs sooner
Or, as an approximation: to achieve higher coverage sooner
36
APFD: Measurement of Test Prioritization
Average Percentage of Faults Detected (APFD)
Compares two test case sequences
A number of faults (bugs) is detected after each test case
Of the following two sequences, which is better?
S1: t1(2), t2(3), t3(5)
S2: t2(1), t1(3), t3(5)
APFD is the average of these numbers (normalized by the total number of faults), with 0 for the initial state
APFD(S1) = (0/5 + 2/5 + 3/5 + 5/5) / 4 = 0.5
APFD(S2) = (0/5 + 1/5 + 3/5 + 5/5) / 4 = 0.45
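The averaging above can be packaged as a small function; a sketch that reproduces both values (the cumulative-fault lists come straight from S1 and S2):

```python
def apfd(cumulative_faults, total_faults):
    """Average detected fraction over the prefix states, including the
    initial state where 0 faults have been found (the slide's formulation)."""
    states = [0] + cumulative_faults
    return sum(f / total_faults for f in states) / len(states)

print(round(apfd([2, 3, 5], 5), 2))  # 0.5  for S1
print(round(apfd([1, 3, 5], 5), 2))  # 0.45 for S2
```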
37
APFD: Illustration
APFD can be viewed as the area under the test-case/fault curve
Consider t1(f1, f2), t2(f3), t3(f3), t4(f1, f2, f3, f4)
38
Coverage-based test case prioritization
Code coverage based: requires recorded code-coverage information from previous testing
Combination coverage based: requires an input model
Mutation coverage based: requires recorded mutation-killing stats
39
Total Strategy
The simplest strategy
Always select the unselected test case that has the best coverage
40
Example
Consider code coverage of five test cases:
T1: s1, s3
T2: s2, s3, s4, s5
T3: s3, s4, s5
T4: s6, s7
T5: s3, s5, s8, s9, s10
Ranking: T5, T2, T3, T1/T4
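The total strategy is a single sort by coverage size; a sketch using the five test cases above:

```python
coverage = {
    "T1": {"s1", "s3"},
    "T2": {"s2", "s3", "s4", "s5"},
    "T3": {"s3", "s4", "s5"},
    "T4": {"s6", "s7"},
    "T5": {"s3", "s5", "s8", "s9", "s10"},
}

# Total strategy: rank by each test's own coverage, largest first.
ranking = sorted(coverage, key=lambda t: len(coverage[t]), reverse=True)
print(ranking)  # ['T5', 'T2', 'T3', 'T1', 'T4']  (T1/T4 tie kept in input order)
```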
41
Additional Strategy
An adaptation of the total strategy
Instead of always choosing the test case with the highest coverage, choose the test case that results in the most extra coverage
Starts from the test case with the highest coverage
42
Example
Consider code coverage of five test cases:
T1: s1, s3
T2: s2, s3, s4, s5
T3: s3, s4, s5
T4: s6, s7
T5: s3, s5, s8, s9, s10
Ranking: T5 (5 new), T2 (2 new: s2, s4) / T4 (2 new: s6, s7), T1 (1 new: s1), T3 (0 new)
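The additional strategy is a greedy loop over not-yet-covered statements; a sketch on the same five test cases:

```python
coverage = {
    "T1": {"s1", "s3"},
    "T2": {"s2", "s3", "s4", "s5"},
    "T3": {"s3", "s4", "s5"},
    "T4": {"s6", "s7"},
    "T5": {"s3", "s5", "s8", "s9", "s10"},
}

# Additional strategy: repeatedly pick the test adding the most NEW coverage.
ranking, covered = [], set()
remaining = dict(coverage)
while remaining:
    best = max(remaining, key=lambda t: len(remaining[t] - covered))
    ranking.append(best)
    covered |= remaining.pop(best)

print(ranking)  # ['T5', 'T2', 'T4', 'T1', 'T3']  (T2/T4 tie broken by dict order)
```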
43
Fake Objects
A resource waste in regression testing: we change the code a little bit, yet we still run all the unchanged code during test execution
Using fake objects:
For all/some of the unchanged modules, do not run the modules
Use the results of the previous test round instead
44
Fake Objects
Example: testing an expert system for finance
It has two components: a UI and an interest calculator (driven by inputs from the UI)
In the first round of testing, store the results of the interest calculator as a map: (a, b) -> 5%, (a, c) -> 10%, (d, e) -> 7.7%
In regression testing, if the change is made only to the UI, you can rerun the software with the data map
Using more fake objects means saving more time in regression testing. Should we mock every object?
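The record-and-replay idea can be sketched as a tiny fake object; the class and method names here are hypothetical, not from any real framework:

```python
class FakeInterestCalculator:
    """Replays rates recorded in an earlier test round instead of
    running the real interest calculator."""
    def __init__(self, recorded):
        self.recorded = recorded

    def rate(self, *inputs):
        # Fails fast if regression testing hits an input never recorded before.
        return self.recorded[inputs]

recorded = {("a", "b"): 0.05, ("a", "c"): 0.10, ("d", "e"): 0.077}
fake = FakeInterestCalculator(recorded)
print(fake.rate("a", "b"))  # 0.05, without executing the real module
```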
45
Pros & Cons
Pros: saves time in regression testing
Cons:
Be careful when mocking non-deterministic components
E.g., mocking getSystemTime() may conflict with another call
Recording the data maps takes a lot of time
The stored data map can be too huge
When the mocked object is changed, the data map requires updates
46
Selection of faking modules
Rules:
Use fake objects for time-consuming modules, so that you save more time
The faked module should be stable, e.g., libraries
The interface should carry a small data flow, e.g., numeric inputs and return values
47
Fake objects
Fake objects are not just useful for regression testing
They are also useful for:
UI components
Internet components
Components that affect the real world, e.g., sending an email or transferring money from credit cards
48
Next class
Debugging: test-coverage-based bug localization
Delta debugging
49
Thanks!