
Comparison of Delivered Reliability of Branch, Data Flow, and Operational Testing: A Case Study

Phyllis G. Frankl, Yuetang Deng

Polytechnic University, Brooklyn, NY


Outline

• Measures of test effectiveness
• Delivered reliability
• Experiment design
• Subject program
• Results
• Threats to validity
• Conclusions


Measures of Test Effectiveness

• Probability of detecting at least one fault [DN84,HT90,FWey93,FWei93,…]

• Expected number of failures during test [FWey93,CY96]

• Number of faults detected [HFGO94]

• Delivered reliability [FHLS98]


[Flowchart: the conventional testing process with an adequacy check. Select test cases → execute test cases → check results. If the results are not OK, debug the program and repeat; if OK, check test data adequacy. If adequacy is not reached, select more test cases; once it is, release the program.]


[Flowchart: the same process with reliability estimation in place of the adequacy check. Select test cases → execute test cases → check results. If the results are not OK, debug the program and repeat; if OK, estimate reliability and release the program.]


Delivered Reliability

• Captures the intuition that discovery and removal of “important” faults is more crucial

• Evaluates a testing technique according to the extent to which testing will increase reliability

• Introduced and studied analytically by FHLS (FSE-97, TSE-98)


Failures, Faults, and Failure Regions

int foo(int x, int y) {
    s1; s2;
    if (c1) { s3; s4; }
    s5; s6;
}

q_i = probability that an input selected according to the operational distribution will hit failure region i
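As an illustration (mine, not from the talk), q_i can be estimated by Monte Carlo sampling: draw inputs from the operational distribution and count how often they fall in failure region i. The input distribution and the region predicate below are toy stand-ins.

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-ins: an operational input distribution and a predicate
 * telling whether an input falls in failure region i. */
static int draw_input(void)          { return rand() % 1000; }
static int in_failure_region(int x)  { return x % 97 == 0; }

int main(void) {
    const long N = 1000000;
    long hits = 0;
    for (long n = 0; n < N; n++)
        if (in_failure_region(draw_input()))
            hits++;
    /* q_i is approximated by the observed hit frequency. */
    printf("estimated q_i = %f\n", (double)hits / N);
    return 0;
}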


Failure Rate After Testing/Debugging

• Reliability after testing and debugging is determined by which failure regions are hit by test cases

• A random variable Θ represents the failure rate after testing and debugging

• Compare testing techniques by comparing statistics of their Θ distributions


Example

Fault set   Probability of detection   Failure rate of set
Empty       0.94                       0.000
F1          0.02                       0.001
F2          0.03                       0.010
F1, F2      0.01                       0.011

Resulting distribution of the delivered failure rate Θ:
Pr(Θ = 0.011) = 0.94
Pr(Θ = 0.010) = 0.02
Pr(Θ = 0.001) = 0.03
Pr(Θ = 0.000) = 0.01
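A minimal sketch (not the authors' code) of the computation behind this example: each row of the table is a possible detected fault set, and Θ is the total rate of the faults left undetected (F1 contributes 0.001, F2 contributes 0.010). Running it reproduces the four probabilities above.

#include <stdio.h>

int main(void) {
    const double rate[2]     = { 0.001, 0.010 };           /* F1, F2 */
    const double p_detect[4] = { 0.94, 0.02, 0.03, 0.01 }; /* Empty, F1, F2, {F1,F2} */

    for (int s = 0; s < 4; s++) {   /* s encodes the detected set: bit 0 = F1, bit 1 = F2 */
        double theta = 0.0;
        for (int f = 0; f < 2; f++)
            if (!(s & (1 << f)))    /* fault f not detected, so it remains */
                theta += rate[f];
        printf("Pr(Theta = %.3f) = %.2f\n", theta, p_detect[s]);
    }
    return 0;
}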


Testing Criteria Considered

• Various levels of coverage:
  – decision coverage (branch testing)
  – def-use coverage (all-uses data flow testing)
  – grouped into quartiles and deciles

• Random testing with no coverage criterion


Questions Investigated

• How do test sets that achieve high coverage levels (of branch testing or data flow testing) compare to those achieving lower coverage, according to:
  – expected improvement in reliability: E(Θ)
  – probability of reaching a given reliability target: Pr(Θ ≤ x)
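Both statistics are easy to read off a distribution of Θ; a sketch below reuses the distribution from the earlier example, with x = 0.005 as an arbitrary illustrative target.

#include <stdio.h>

int main(void) {
    /* Distribution of Θ from the earlier example. */
    const double theta[4] = { 0.011, 0.010, 0.001, 0.000 };
    const double prob[4]  = { 0.94,  0.02,  0.03,  0.01  };
    const double x = 0.005;                 /* arbitrary reliability target */

    double e_theta = 0.0, tail = 0.0;
    for (int i = 0; i < 4; i++) {
        e_theta += prob[i] * theta[i];      /* E(Θ) */
        if (theta[i] <= x) tail += prob[i]; /* Pr(Θ <= x) */
    }
    printf("E(Theta) = %.5f  Pr(Theta <= %.3f) = %.2f\n", e_theta, x, tail);
    return 0;
}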


Subject Program

• The “Space” program
• 10,000+ LOC C antenna design program, written by professional programmers, containing naturally occurring faults
• A test generator produces tests according to the operational distribution [Pasquini et al.]
• Considered 10 relatively hard-to-detect faults
• Failure rate: 0.05564


Experiment Design

• Adapted from a design used to compare probability of detecting at least one fault [Frankl, Weiss, et al.]
• Simulate execution of a very large number of fixed-size test sets
• For each, note the coverage achieved (branch, data flow) and the faults detected
• Compute the density function of Θ for various coverage-level groups (a simulation sketch follows the diagram below)


[Diagram of the experiment's data structures: a coverage matrix (test cases × coverage features), a results matrix (test cases × faults), a failure rate vector and a fault-detection matrix indexed by fault sets, and the derived coverage levels.]
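A rough sketch (with synthetic stand-in data) of the simulation loop these structures support: draw many fixed-size test sets from the universe, use the results matrix to see which faults each set detects, and emit one sample of Θ per set; binning the samples estimates the density function. The matrix shapes and fill-in values here are assumptions, not the experiment's actual data.

#include <stdio.h>
#include <stdlib.h>

#define N_TESTS   1000    /* hypothetical test-case universe */
#define N_FAULTS  10
#define SET_SIZE  50
#define N_SETS    10000

int    detects[N_TESTS][N_FAULTS];  /* results matrix: does test t detect fault f? */
double rate[N_FAULTS];              /* failure rate contributed by each fault */

int main(void) {
    /* Synthetic fill-in; in the real experiment these come from running
     * the faulty versions of the program on the test universe. */
    for (int f = 0; f < N_FAULTS; f++) {
        rate[f] = 0.005 * (f + 1) / N_FAULTS;
        for (int t = 0; t < N_TESTS; t++)
            detects[t][f] = (rand() % 100) < 2;   /* ~2% of tests detect f */
    }

    for (int s = 0; s < N_SETS; s++) {            /* simulate many test sets */
        int found[N_FAULTS] = { 0 };
        for (int i = 0; i < SET_SIZE; i++) {
            int t = rand() % N_TESTS;             /* random fixed-size test set */
            for (int f = 0; f < N_FAULTS; f++)
                if (detects[t][f]) found[f] = 1;
        }
        double theta = 0.0;                       /* residual failure rate Θ */
        for (int f = 0; f < N_FAULTS; f++)
            if (!found[f]) theta += rate[f];
        printf("%f\n", theta);                    /* one sample of Θ */
    }
    return 0;
}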


Coverage Levels

• Considered the following groups of test sets, for test sets of size 50:
  – highest decile of decision coverage
  – highest decile of def-use coverage
  – four quartiles of decision coverage
  – four quartiles of def-use coverage


Expected Values

coverage            range         expected decrease   percentage
                                  in failure rate     decrease

all                               0.021               38%

decision coverage   0 to 26%      0.018               32%
                    26 to 51%     0.021               38%
                    51 to 77%     0.022               40%
                    77 to 100%    0.023               42%
                    88 to 100%    0.024               43%

def-use coverage    0 to 32%      0.017               31%
                    32 to 53%     0.021               38%
                    53 to 77%     0.023               40%
                    77 to 100%    0.025               44%
                    88 to 100%    0.025               46%
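(The percentage column is the expected decrease divided by the program's overall failure rate of 0.05564; for example, 0.021 / 0.05564 ≈ 38%.)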


Tail Probabilities

[Plots omitted: tail probability curves Pr(Θ ≤ x) for the coverage-level groups.]

Idealized Test Generation Strategy

• Select one test case from each subdomain (independently, randomly); a toy sketch follows below
• Widely studied analytically
• Results in very large test sets for this subject:
  – decision coverage: 995
  – def-use coverage: 4296
• Compared to large random test sets
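A toy sketch of the selection step, assuming the subdomains have already been computed; here each subdomain is just a contiguous block of test-case indices, which is simpler than the real coverage-induced subdomains (those are sets of universe test cases covering a given decision or def-use pair, and may overlap).

#include <stdio.h>
#include <stdlib.h>

#define N_SUBDOMAINS 995     /* e.g., decision coverage induced 995 subdomains */
#define N_TESTS      100000  /* hypothetical test-case universe */

/* Toy stand-in: the k-th test case of subdomain d. */
static int subdomain_member(int d, int k) {
    int size = N_TESTS / N_SUBDOMAINS;
    return d * size + k % size;
}

int main(void) {
    /* Idealized strategy: one test case chosen independently and
     * uniformly at random from each subdomain. */
    for (int d = 0; d < N_SUBDOMAINS; d++)
        printf("test %d\n", subdomain_member(d, rand()));
    return 0;
}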


Expected Values

                          size    expected decrease   percentage
                                  in failure rate     decrease

100% decision coverage    995     0.055               100%
random                    995     0.054               96%

100% def-use coverage     4296    0.056               100%
random                    4296    0.056               100%


Tail Probabilities

[Plot omitted: tail probabilities for the 100%-coverage versus same-size random test set comparison.]


Threats to Validity

• Single program
• Dependence on the programmers’ characterization of the faults
• Dependence on the test-case universe
• Universe based on the operational distribution
• Single test set size (50)
• Accurate estimates of expected value, but less accuracy in estimates of the density function


Conclusions

• Positive:
  – higher decision coverage yields lower expected failure rate
  – higher def-use coverage yields lower expected failure rate
  – higher coverage increases the likelihood of reaching a high reliability target (a low failure-rate target)


Conclusions (continued)

• Negative:
  – reliability gains with increased coverage are modest
    • cost-effectiveness is questionable
    • economic significance of the increases depends on context
  – no silver bullet for ultra-reliability