8/23/00 ISSTA-2000 1
Comparison of Delivered Reliability of Branch, Data Flow, and Operational Testing: A Case Study
Phyllis G. Frankl, Yuetang Deng
Polytechnic University, Brooklyn, NY
Outline
• Measures of test effectiveness
• Delivered reliability
• Experiment design
• Subject program
• Results
• Threats to validity
• Conclusions
Measures of Test Effectiveness
• Probability of detecting at least one fault [DN84,HT90,FWey93,FWei93,…]
• Expected number of failures during test [FWey93,CY96]
• Number of faults detected [HFGO94]
• Delivered reliability [FHLS98]
[Flowchart: select test cases → execute test cases → check results; if results not OK, debug the program and re-test; check test data adequacy and, if not OK, select more test cases; when both checks pass, release the program]
[Flowchart: select test cases → execute test cases → check results; if results not OK, debug the program and re-test; estimate reliability and, if not OK, continue testing; when OK, release the program]
Delivered Reliability
• Captures intuition that discovery and removal of “important” faults is more crucial
• Evaluates testing technique according to the extent to which testing will increase reliability
• Introduced and studied analytically, FHLS (FSE-97, TSE-98)
Failures, Faults, and Failure Regions
int foo(x, y)
int x, y;
{
    s1; s2;
    if (c1) { s3; s4; }
    s5; s6;
}

q_i = probability that an input selected according to the operational distribution will hit failure region i
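The quantity q_i can be estimated by Monte Carlo sampling from the operational distribution. A minimal sketch, where the operational distribution (uniform on [0, 1)) and the failure region (the interval [0, 0.001)) are invented stand-ins rather than anything from the Space program:

```python
import random

# Hypothetical failure region for illustration: the fault is triggered
# whenever the input falls in [0, 0.001), so the true q_i is 0.001.
def hits_failure_region(x):
    return 0.0 <= x < 0.001

def estimate_q(n_samples=100_000, seed=1):
    """Estimate q_i = Pr(an operational input hits failure region i)."""
    rng = random.Random(seed)
    hits = sum(hits_failure_region(rng.random()) for _ in range(n_samples))
    return hits / n_samples
```

With 100,000 samples the estimate lands close to the true 0.001; rarer failure regions need proportionally more samples.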
Failure Rate After Testing/Debugging
• Reliability after testing and debugging is determined by which failure regions are hit by test cases
• A random variable Θ represents the failure rate after testing and debugging
• Compare testing techniques by comparing statistics of their Θ's
Example

Fault set    Probability of detection    Failure rate of the set
Empty        0.94                        0.000
F1           0.02                        0.001
F2           0.03                        0.010
F1,F2        0.01                        0.011

Resulting distribution of the delivered failure rate:
Pr(Θ = 0.011) = 0.94
Pr(Θ = 0.010) = 0.02
Pr(Θ = 0.001) = 0.03
Pr(Θ = 0.000) = 0.01
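The example can be checked numerically. Reading the table as giving, for each fault set, the probability that exactly that set is detected (and removed) together with that set's combined failure rate, the delivered failure rate is the total rate 0.011 minus the rate of the removed set. A small sketch using the slide's values:

```python
# Failure rates of the two faults and the probability of detecting
# exactly each fault set during testing (values from the slide).
rate = {"F1": 0.001, "F2": 0.010}
detect = {(): 0.94, ("F1",): 0.02, ("F2",): 0.03, ("F1", "F2"): 0.01}
total_rate = sum(rate.values())  # 0.011

# Distribution of the delivered failure rate after removing detected faults.
delivered = {}
for fault_set, p in detect.items():
    remaining = round(total_rate - sum(rate[f] for f in fault_set), 3)
    delivered[remaining] = delivered.get(remaining, 0.0) + p

expected = sum(r * p for r, p in delivered.items())        # expected delivered rate
tail = sum(p for r, p in delivered.items() if r <= 0.001)  # Pr(rate <= 0.001)
```

This gives an expected delivered failure rate of about 0.0106 and a probability of 0.04 of reaching a 0.001 failure-rate target.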
Testing Criteria Considered
• Various levels of coverage of:
– decision coverage (branch testing)
– def-use coverage (all-uses data flow testing)
– grouped into quartiles and deciles
• Random testing with no coverage criterion
Questions Investigated
• How do test sets that achieve high coverage levels (of branch testing or data flow testing) compare to those achieving lower coverage, according to:
– Expected improvement in reliability: E(Θ)
– Probability of reaching a given reliability target: Pr(Θ ≤ x)
Subject Program
• “Space” program
• 10,000+ LOC C antenna design program, written by professional programmers, containing naturally occurring faults
• Test generator generates tests according to operational distribution [Pasquini et al.]
• Considered 10 relatively hard-to-detect faults
• Failure rate: 0.05564
Experiment Design
• Adapted from a design used to compare the probability of detecting at least one fault [Frankl, Weiss, et al.]
• Simulate execution of a very large number of fixed-size test sets
• For each, note the coverage achieved (branch, data flow) and the faults detected
• Compute the density function of Θ for the various coverage-level groups
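The design can be sketched as a small simulation. Everything below is an illustrative stand-in (200 candidate tests, 40 coverage requirements, two faults, random matrices), not the actual Space data: each candidate test case gets a row in a coverage matrix and a results matrix, fixed-size test sets are sampled, and the delivered failure rate is tabulated per coverage-level group.

```python
import random
from collections import defaultdict

rng = random.Random(0)

# Illustrative stand-ins for the coverage and results matrices.
N_TESTS, N_REQS = 200, 40
q = {"F1": 0.001, "F2": 0.010}                 # per-fault failure rates
covers = [{r for r in range(N_REQS) if rng.random() < 0.2}
          for _ in range(N_TESTS)]             # coverage matrix rows
detects = [{f for f, qf in q.items() if rng.random() < 50 * qf}
           for _ in range(N_TESTS)]            # results matrix rows

def simulate(n_sets=500, set_size=50):
    """Sample fixed-size test sets; group delivered failure rates by coverage."""
    total = sum(q.values())
    by_coverage = defaultdict(list)
    for _ in range(n_sets):
        picks = rng.sample(range(N_TESTS), set_size)
        coverage = len(set().union(*(covers[t] for t in picks))) / N_REQS
        removed = set().union(*(detects[t] for t in picks))
        delivered = total - sum(q[f] for f in removed)
        by_coverage[int(4 * coverage) / 4].append(delivered)  # quartile bins
    return by_coverage
```

The per-group lists of delivered failure rates approximate the density functions the experiment compares across coverage levels.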
[Diagram: coverage matrix (test cases × features), results matrix (test cases × faults), fault-detection matrix (fault sets), failure rate vector (fault sets), and the derived coverage levels]
Coverage Levels
• Considered the following groups of test sets, for test sets of size 50:
– highest decile of decision coverage
– highest decile of def-use coverage
– four quartiles of decision coverage
– four quartiles of def-use coverage
Expected Values

coverage            range        expected decrease   percentage
                                 in failure rate     decrease
all                              0.021               38%

decision coverage   0 to 26%     0.018               32%
                    26 to 51%    0.021               38%
                    51 to 77%    0.022               40%
                    77 to 100%   0.023               42%
                    88 to 100%   0.024               43%

def-use coverage    0 to 32%     0.017               31%
                    32 to 53%    0.021               38%
                    53 to 77%    0.023               40%
                    77 to 100%   0.025               44%
                    88 to 100%   0.025               46%
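As a quick arithmetic check, each percentage should be the expected decrease in failure rate divided by the program's overall failure rate of 0.05564 from the Subject Program slide. (The slide's percentages were presumably computed from unrounded decreases, so recomputing from the rounded values can land a point off.)

```python
# Percentage decrease = expected decrease in failure rate / overall rate.
# 0.05564 is the subject program's overall failure rate.
OVERALL_RATE = 0.05564

def pct_decrease(expected_decrease):
    return 100 * expected_decrease / OVERALL_RATE

print(round(pct_decrease(0.021)))  # the "all" row: 38
```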
Tail Probabilities
[Plots omitted from the transcript: slides 17 through 21 showed tail-probability plots for the coverage-level groups]
Idealized Test Generation Strategy
• Select one test case from each subdomain (independently, randomly)
• Widely studied analytically
• Results in very large test sets for this subject:
– decision coverage: 995
– def-use coverage: 4296
• Compared to large random test sets
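A minimal sketch of the one-test-per-subdomain strategy. The subdomain contents below are invented for illustration; for the Space subject the strategy yields 995 tests for decision coverage and 4296 for def-use coverage.

```python
import random

def one_per_subdomain(subdomains, seed=0):
    """Pick one test case independently and at random from each subdomain."""
    rng = random.Random(seed)
    return [rng.choice(sorted(tests)) for tests in subdomains]

# Each subdomain is the set of test cases satisfying one coverage requirement.
subdomains = [{1, 5, 9}, {2, 5}, {7}]
suite = one_per_subdomain(subdomains)
# One pick per subdomain; overlapping subdomains may repeat the same test.
```

The suite size equals the number of subdomains, which is why the resulting test sets are so large for this subject.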
Expected Values
                         size   expected decrease   percentage
                                in failure rate     decrease
100% decision coverage   995    0.055               100%
random                   995    0.054               96%
100% def-use coverage    4296   0.056               100%
random                   4296   0.056               100%
Tail Probabilities
[Plot omitted from the transcript]
Threats to Validity
• Single program
• Dependence on programmers’ characterization of the faults
• Dependence on universe
• Universe based on operational distribution
• Single test set size (50)
• Accurate estimates of expected value, but less accuracy in estimates of density function
Conclusions
• Positive:
– higher decision coverage yields lower expected failure rate
– higher def-use coverage yields lower expected failure rate
– higher coverage increases likelihood of reaching a high reliability target (low failure-rate target)
Conclusions (continued)
• Negative:
– reliability gains with increased coverage are modest
  • cost-effectiveness questionable
  • economic significance of increases depends on context
– no silver bullet for ultra-reliability