IEEE TRANSACTIONS ON RELIABILITY, VOL. 44, NO. 1, 1995 MARCH

Effect of Testing Techniques on Software Reliability Estimates Obtained Using A Time-Domain Model

Mei-Hwa Chen, Member IEEE

Aditya P. Mathur, Member IEEE

Vernon J. Rego, Member IEEE

Purdue University, West Lafayette

Purdue University, West Lafayette

Purdue University, West Lafayette

Key Words - Software reliability estimation, software testing, coverage measures, random testing.

Reader Aids -

General purpose: Show that testing technique affects software-reliability estimates
Special math needed for derivations: Probability
Special math needed to use results: Same
Results useful to: Software reliability engineers.

Summary & Conclusions - Since the early 1970s, researchers have proposed several models to estimate software-reliability as testing progresses. Among these, the time-domain models are the most common. We present empirical evidence to show that the testing method does affect the reliability estimates using one of these models, viz, the Musa basic execution-time model. The evidence suggests that: 1) reliability models need to consider additional data, generated during testing, such as some form of code coverage, to obtain accurate reliability estimates; and 2) further research is necessary to determine which testing method, or combination thereof, leads to higher reliability.

1. INTRODUCTION

The reliability of a program is defined as the probability that the program does not fail in a given environment during a given time interval. This metric is important in deciding when to release software, and can also be a useful predictor of the reliability of software currently in operation. Several analytic models have been proposed to estimate software reliability. Among these, the time-domain models are the most widely researched [4]. These models use the failure history of a program obtained during testing. Model parameters are estimated using data collected during testing, and the reliability is estimated from the fitted model.

In applying these models, it is assumed that the program is tested using an operational profile. The implication¹ is that the failures induced by such tests reveal errors in code that are most likely to be exercised in an operational environment. As these errors are found & removed, the operational reliability of the program improves. A further implication is that even though some errors might remain in the program, if they are not in code that will often be used in an operational environment, then they have only a marginal effect on the operational reliability. Similar arguments, and data obtained from some software products in operation, have been used by some researchers to conclude that random testing, as opposed to selective or structural testing [1], tends to find errors in the order of their seriousness. However, it is far from clear which way of testing a program makes it most reliable. To compare the reliability estimates obtained using failure data generated when different testing methods are used, we conducted an experiment with one program. The remainder of this paper describes the results of this experiment, our conclusions, and suggestions for future research. Section 2 introduces the 3 testing methods used in our experiments. Section 3 describes the steps taken to obtain the failure data. Section 4 shows the plots of failure data obtained for the 3 methods; the reliability estimates from these failure data are also presented & compared in that section. One may argue that while testing to improve coverage, one does not use the operational profile. We discuss this in section 5.

Notation

t, t'            [execution, exposure] time
B                fault-reduction factor
φ                per-fault hazard rate
t_i              time of failure i
R_r, R_fi, R_b   reliability estimates using failure data from [random testing, functional-testing-i (i = 1,2), block testing]
LCB & UCB        lower & upper 2-sided symmetric 95% s-confidence bounds.

Other, standard notation is given in "Information for Readers & Authors" at the rear of each issue.

Assumptions

0. All times are CPU time.
1. Testing methods used are: random testing, functional testing, and block testing.
2. Debugging is imperfect.
3. Times between failures are s-independent.
4. Faults are uniformly distributed.
5. All inputs are equally likely for random testing.
6. Time is used as a basis for the failure rate.

¹The implications listed in this paper are strictly those of the authors and are not necessarily corroborated in the literature or by other researchers.

2. OVERVIEW OF TESTING METHODS

We experimented with three testing methods:



random [2], functional [3], and block [1].

Random & functional testing are two forms of black-box testing. Test cases constructed using functional testing are based on the functional specifications of program-requirements [3]. For ex- ample, when the -i option is used in the command line of the Unix sort program, it causes sort to ignore characters outside the ASCII range 040-0176 (octal) in non-numeric comparisons. One can combine the options supported by sort in a variety of ways to obtain a large set of functions.

Given these possibilities, two individuals are likely to construct different test sets for testing the same program using functional testing. Thus, it is also likely that the two test sets will cause the program under test to exhibit different failure sequences. As explained in section 3, this observation led us to experiment with two variations of functional testing.

When using random testing, an input test case is selected randomly from the input domain of the program, and with each test case one can associate a probability of selection. In our experiment, random testing is carried out by randomly selecting one or more of the several functions in the specification of the sort program, according to predefined occurrence probabilities, and then randomly selecting a suitable test case to exercise the selected specification. Appendix B lists the algorithm used to generate test cases based on the various options to the sort program.
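As an illustration of this style of test-case generation, the following C sketch builds random sort command lines in the spirit of the algorithm in appendix B. The option list and the 0.2 inclusion probability follow that appendix; the helper rand_unit, the fixed input file name input.txt, and the command-string construction are illustrative assumptions and not part of the original harness.

```c
/* Sketch of a random test-case generator for the Unix sort program.
 * The option list and the 0.2 inclusion probability follow the
 * algorithm in appendix B; everything else (helper names, the input
 * file name, the command-string layout) is illustrative only.       */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

static const char *options[] = { "-m", "-u", "-c", "-b", "-d", "-f",
                                 "-i", "-n", "-r", "-t", "-o", "-T" };
enum { NOPTIONS = sizeof options / sizeof options[0] };

static double rand_unit(void)             /* uniform random number in [0,1) */
{
    return rand() / (RAND_MAX + 1.0);
}

/* Build one sort command line with a randomly chosen set of options. */
static void make_test_case(char *cmd, size_t len)
{
    int wanted = rand() % 10;             /* r in [0,9]: options in this test case */
    int chosen = 0, next = 0;
    int used[NOPTIONS] = { 0 };

    snprintf(cmd, len, "sort");
    while (chosen < wanted) {
        /* consider the next option in the list; include it w.p. 0.2 */
        if (!used[next] && rand_unit() < 0.2) {
            strncat(cmd, " ", len - strlen(cmd) - 1);
            strncat(cmd, options[next], len - strlen(cmd) - 1);
            used[next] = 1;
            chosen++;
        }
        next = (next + 1) % NOPTIONS;     /* wrap around the option list */
    }
    strncat(cmd, " input.txt", len - strlen(cmd) - 1);
}

int main(void)
{
    char cmd[256];
    srand((unsigned) time(NULL));
    for (int i = 0; i < 5; i++) {         /* print a few sample test cases */
        make_test_case(cmd, sizeof cmd);
        puts(cmd);
    }
    return 0;
}
```

In a full generator, each selected option would also carry any argument it requires (for example, a field-separator character for -t), and the randomly generated input file itself is described at the end of appendix B.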

A sequence of statements with a single entry point and a single exit point is a block. A block in program P is covered if it is executed by at least one test case while P is being tested. Obtaining 100% block coverage is perhaps the simplest form of structural testing and is hereafter referred to as block testing. When using block testing, a tester begins by constructing a test case C using the functional specification of the program and executes the program on C. If C does not cover all the blocks, as is often the case for non-trivial programs, then at least one of the uncovered blocks is examined to determine new test cases. The program is then executed on the new test cases. This process of constructing test cases and determining which blocks remain uncovered continues until either 100% block coverage has been obtained or the program fails². In the latter case, we assume that the error is corrected and one begins the testing process anew, reusing the test cases already generated. We used a data-flow testing tool, ATAC³, to measure block coverage.
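To make the notion of a block concrete, here is a small C function (an illustration only, not taken from the sort program); the comments give a rough partition into blocks, each a straight-line stretch of code with one entry and one exit.

```c
/* Illustrative only: a rough partition of a small function into blocks. */
int count_negatives(const int *a, int n)
{
    int count = 0;                  /* block: function entry                */
    for (int i = 0; i < n; i++) {   /* block: loop test (i < n)             */
        if (a[i] < 0)               /* block: loop body up to the test      */
            count++;                /* block: executed only when a[i] < 0   */
    }
    return count;                   /* block: function exit                 */
}
```

A test set achieves 100% block coverage of this function as soon as it includes one call with n > 0 and at least one negative element; block testing applies the same idea to the roughly 1000-line sort program, with ATAC reporting which blocks each test case executed.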

While using block or functional testing as described above, one does not use the operational profile. Instead, one develops test cases based on which blocks have not been covered. One may argue that it is incorrect to apply time-domain models, which assume testing based on an operational profile, to estimate reliability using failure data obtained from block testing. We counter this argument in section 5.

²Often one finds blocks in a program that are unreachable. Such blocks can arise due to defensive coding. We exclude such blocks from the coverage computation.

³ATAC is a proprietary data-flow testing tool developed at Bell Communications Research, Morristown, New Jersey.

3. DESCRIPTION OF OUR EXPERIMENT

We describe an experiment conducted to obtain failure data using the 3 testing methods of section 2. All parts of this experiment were conducted with the Unix sort program or with variations of this program obtained by injecting faults. We defined two different sequences of specifications and their combinations for functional testing. The testing so carried out is referred to as functional-testing-1 and functional-testing-2 in the following discussion. For random testing, we started with 19 different seeds to get 19 different sequences of failure data: s_i, i = 1, 2, ..., 19. Sequence s_i was obtained by executing sort on a total of N_i test cases. Testing was stopped after all the 9 faults in sort were removed.

Notation

s_i   {t_{i1}, t_{i2}, ..., t_{i,k_i}}, 1 ≤ i ≤ 19
k_i   total number of failures required before all the 9 faults in sort are revealed
ŝ     sequence of the average failure intervals, used to obtain reliability estimates
f_i   fault i, i ∈ [1,9].

Renumber the sequences such that k_1 ≥ k_2 ≥ ... ≥ k_19, to obtain the sequence S_1, S_2, ..., S_19. Then,

ŝ = {t̂_1, t̂_2, ..., t̂_k},   t̂_j = (1/l) Σ_{i=1}^{l} t_{ij},   l = max{m : k_m ≥ j}.

The failure data obtained during testing are used to compute reliability estimates from the Musa basic execution-time model [5], as described in section 4. The parameters in the Musa model were estimated using the failure data. We used assumption #2, implying that even when the program failed on a test case, testing continued without repairing the fault.

Nine faults were introduced into the Unix sort program; table 1 enumerates them. The fault types are a sample from the most commonly occurring fault types as reported by other researchers. The sort program itself is about 1000 lines of executable C code. The failure data were generated by executing the following sequence of steps.

1. Ten faulty versions of sort were constructed as follows:

sort-0: sort containing all 9 faults.
sort-1: sort containing faults f_2, ..., f_9.
sort-i (2 ≤ i ≤ 8): sort containing faults f_1, ..., f_{i-1}, f_{i+1}, ..., f_9.
sort-9: sort containing faults f_1, ..., f_8.

TABLE 1
List of Faults Introduced in the sort Program

No.  Line  Fault  Original statement              Replaced by
1    76    f_1    fold[240] = 0132                fold[250] = 0172
2    145   f_2    dict[192] = 1                   dict[192] = 0
3    345   f_3    qsort((char **)lspace, lp);     Deleted
4    453   f_4    if (j > 0)                      if (i < 0)
5    812   f_5    tabchar = *++s;                 tabchar = *s++;
6    634   f_6    sa = -sa;                       Deleted
7    704   f_7    *pa=='\n' ? fields[0].rflg      *pa=='\n' ? -fields[0].rflg
8    817   f_8    p->rflg = -1                    p->rflg = 0
9    891   f_9    if (c<0)                        if (c<0)
                  *pb=='\n' ? -fields[0].rflg     *pb=='\n' ? fields[0].rflg

2. Repeat this step for each of the 3 testing methods until all faults are found.

a. [Initialization]. Initialize faultsfound to 0.

b. [Test-case construction]. Construct a test case C. This is done manually for functional & block testing, and by a program for random testing. In either case, output correctness is checked automatically by an oracle.

c. [Program execution]. Determine whether test case C causes at least one fault to be revealed. Execute sort and sort-0 against C. If the output of sort-0 is correct then C is a success⁴; repeat step 2b. If the output of sort-0 is incorrect then a failure has occurred; record the total execution time between the previous failure and this one.

d. [Fault removal]. Simulate imperfect debugging. Execute sort-1 through sort-9 on the test case C constructed in step 2b. The output of each sort-i is examined in sequence, starting with i = 1. The first correct sort-i, i ∈ [1,9], implies that f_i is the fault responsible for the failure. In this case, f_i is considered removed. Replace sort-0 by sort-i, increment faultsfound by 1, and go to step 2e.

If none of the outputs from the 9 versions of sort is correct, then assume that the fault is not corrected and repeat the procedure from step 2b.

e. [Check for termination of the experiment]. If faultsfound < 9, then repeat the procedure from step 2b; otherwise this procedure terminates for one testing method.

⁴Correctness is determined with respect to the Unix sort program.

Step 2d implements the imperfect-debugging assumption #2 by examining the outputs of all the variants of the sort program. Since sort-0 contains all 9 faults, correct behavior of this program implies that C cannot reveal any of these faults. However, if sort-0 fails then we need to determine which fault, or combination thereof, results in the failure. To do so, we execute the remaining 9 versions of sort. If exactly 1 of these variants, say sort-i, generates correct output on C, we conclude that: a) the fault which caused sort to fail is f_i, and b) the fault has been discovered and corrected.

If none of the 9 versions of sort generates correct output, then we conclude that this failure of sort on C is due to a combination of the 9 faults and not due to any single fault. In this case no fault is corrected.

The above method is just one of several for implementing the imperfect debugging assumption. Other methods can be devised and investigated. However, it is not clear how imperfect debugging actually works in practice.
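The following self-contained C sketch simulates the bookkeeping of steps 2a-2e. It is an illustration under simplifying assumptions of our own: a test case is modeled abstractly as the set of still-present faults it happens to reveal (each with an assumed probability of 1/20), rather than as an actual run of the sort variants, and the execution-time recording is omitted.

```c
/* Simulation sketch of steps 2a-2e (fault isolation with imperfect
 * debugging).  An illustration only, not the authors' harness.      */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NFAULTS 9

static bool removed[NFAULTS + 1];        /* removed[i]: fault f_i corrected */

/* Model of step 2b/2c: which still-present faults does test case C reveal? */
static int reveal_mask(void)
{
    int mask = 0;
    for (int i = 1; i <= NFAULTS; i++)
        if (!removed[i] && rand() % 20 == 0)   /* assumed: revealed w.p. 1/20 */
            mask |= 1 << i;
    return mask;
}

int main(void)
{
    int faults_found = 0, tests = 0;
    srand((unsigned) time(NULL));

    while (faults_found < NFAULTS) {     /* step 2e: stop after all 9 faults */
        int mask = reveal_mask();        /* step 2b: construct test case C   */
        tests++;
        if (mask == 0)                   /* step 2c: sort-0 output correct   */
            continue;                    /* success, construct the next test */

        /* Step 2d: sort-i is correct on C exactly when C reveals f_i and   */
        /* nothing else, since sort-i lacks only f_i.                       */
        for (int i = 1; i <= NFAULTS; i++) {
            if (mask == (1 << i)) {
                removed[i] = true;       /* f_i identified and "corrected"   */
                faults_found++;
                printf("test %d: fault f%d removed\n", tests, i);
                break;
            }
        }
        /* If C reveals more than one fault, no single variant is correct   */
        /* and no fault is removed (imperfect debugging, assumption 2).     */
    }
    printf("all %d faults found after %d test cases\n", NFAULTS, tests);
    return 0;
}
```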

4. ANALYSIS OF FAILURE DATA

Data collected from the above experiments were used for estimating reliability. All experiments were conducted on a Sun Sparc machine. An analysis of the data and the corresponding reliability estimates is given here. Figure 1 shows the growth of cumulative failures as the execution time increases. The plots labeled functional-testing-1 and functional-testing-2 are from failure data obtained using the two different sets of functional test data generated as described in section 3.

Figure 1. Cumulative Failures vs Execution Time for Functional, Random, and Block Testing [plot omitted; axes: execution time, 0-50 units, vs cumulative failures]

We observe the following from the 4 plots in figure 1:

1. Block testing converges faster than functional or random testing; ie, the total execution time required to find & fix all 9 faults is less for block testing than for random or functional testing.

2. There is a noticeable difference in the behavior of the failure data between the two versions of functional testing. It takes 48 time units for functional-testing-2 to find all failures, as against only 28 time units for functional-testing-1.

3. Random testing requires the largest amount of execution time (50 time units) and also the largest number of cumulative failures (24) to find & fix all 9 faults.


These failure data were used to compute reliability estimates from the Musa basic execution-time model. Reliability estimates were obtained (a) after failure k, and (b) after t units of execution time, to compare the performance of the different testing methods. We selected k = 10 and t = 12.3 because they were the smallest values at which B·φ > 0.

Exposure time refers to the total execution time expended since the software was installed in its intended operational environment.
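For reference, the standard textbook form of the Musa basic execution-time model [5], written in the notation above, is sketched below; the exact parameter-estimation equations behind the figures are not reproduced in this paper.

```latex
% Musa basic execution-time model (standard form), in the notation above:
%   nu_0       : total expected number of failures
%   mu(t)      : expected failures experienced after t units of execution time
%   lambda(t)  : failure intensity after t units of execution time
\mu(t) = \nu_0 \left( 1 - e^{-B\phi t} \right), \qquad
\lambda(t) = \lambda_0 \, e^{-B\phi t}, \qquad
\lambda_0 = \nu_0 B \phi .
% With testing stopped at execution time t, and assuming the failure
% intensity stays at lambda(t) during operation (no repairs in the field),
% the reliability over an exposure period t' is
R(t' \mid t) = e^{-\lambda(t)\, t'} .
```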

Notation

f_l, f_u   [LCB, UCB] for reliability estimates from functional-testing-2
r_l, r_u   [LCB, UCB] for reliability estimates from random testing.

4.1 Reliability After a Given Number of Failures

Figure 2. Reliability Estimates After Failure #10 [plot omitted; reliability vs exposure time, for functional, random, and block testing, using the Musa basic execution-time model. Dotted lines represent 95% LCB & UCB]

Figure 2 shows a set of reliability estimates, with their LCB & UCB, as a function of increasing exposure time after failure #10. At this point the number of remaining faults is 6, 1, 7, 1, respectively, for random testing, functional-testing-1, functional-testing-2, and block testing. Since block testing produces the most conservative estimates, we compare the other three estimates to those obtained using block testing. The overlapping region indicates that functional-testing-2 produces results comparable to those from random testing. As anticipated, all the estimates decline with increasing exposure time. However, the estimates differ noticeably. For example:

R_r/R_b increases from 1.40 to 10.53 as the exposure time increases from 1 to 10 units. This implies that random testing provides increasingly optimistic reliability estimates with increasing exposure time.

R_f1/R_b increases from 1.08 to 2.03 as the exposure time increases from 1 to 10 units.

R_f2/R_b increases from 1.41 to 11.36.

These ratios indicate that functional testing provides increasingly optimistic estimates.
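The growth of these ratios with exposure time is consistent with the exponential form of the model. Under the simplifying assumption (ours, for illustration; not a derivation given in the paper) that each estimate behaves as R(t') = e^{-λ t'} with its own fitted failure intensity, the ratio of two estimates is

```latex
\frac{R_r(t')}{R_b(t')}
  = \frac{e^{-\lambda_r t'}}{e^{-\lambda_b t'}}
  = e^{(\lambda_b - \lambda_r)\, t'} ,
```

so even a modest gap between the fitted failure intensities is amplified roughly exponentially as the exposure time t' grows; the reported ratios grow somewhat more slowly than a pure exponential, but the qualitative effect is the same.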

4.2 Reliability After a Given Execution Time

Figure 3. Reliability Estimates After 12.3 Units of Execution Time [plot omitted; reliability vs exposure time, for functional, random, and block testing, using the Musa basic execution-time model. Dotted lines represent 95% LCB & UCB]

Reliability estimates were obtained from the available data after 12.3 units of execution time; the numbers of remaining faults were 6, 2, 6, 0, respectively, for random testing, functional-testing-1, functional-testing-2, and block testing. Figure 3 shows plots of the estimates as a function of exposure time; they are similar to the ones in figure 2.

The ratio R_r/R_b varies from 1.31 to 5.84 as the exposure time increases from 1 to 10 units. This has the same implication for random testing as mentioned earlier.


The ratio R_f2/R_b varies from 0.97 to 0.39. This implies that block testing provides an optimistic estimate as compared to functional-testing-2. This appears to be contrary to the relationship between the two estimates obtained after failure #10. However, the difference in the two relationships can be explained by examining the plots for block testing & functional-testing-2 in figure 1.

From these plots we notice that:

1. The cumulative failure growth of the two methods is similar up to 11.9 units of execution time. The two plots diverge when the execution time increases beyond 11.9 units. This is because block testing has revealed all 9 faults within the first 11.9 units, whereas functional-testing-2 has revealed only 3 faults in the same time.

2. The cumulative number of failures found after 12.3 units of execution time is less for block testing (11 failures) than for functional-testing-2 (12 failures).

These observations explain the close-to-1 ratios of the two reliability estimates for low values of exposure time. The ratio R_f1/R_b varies from 1.08 to 2.43, indicating optimistic reliability estimates from functional-testing-1; these ratios are similar to those obtained after failure #10.

4.3 Effect of Test-Case Sequencing

We also conducted three variants of the experiment in section 3, whose goal was to study the following questions:

1. How does the failure history vary with the execution sequence of the fault-finding versions of sort?

2. Is there any difference in the reliability estimates if, for functional testing, the same test cases are executed in a different order?

3. Is there any difference in the reliability estimates if, for block testing, the coverage sequence is varied?

The results obtained from our experiments on these questions are plotted in figures 4 - 6 in appendix A. Based on these data we conclude that, for the sort program:

The program's failure behavior is not sensitive to the order in which the fault-finding versions of the sort program are executed. The order of test cases or coverage sequences is important, however, and affects the program's failure behavior. This is consistent with our conclusion from comparing the behaviors of functional-testing-1 and functional-testing-2.

5. DISCUSSION

Our results indicate that random testing is likely to lead to optimistic (higher) reliability estimates when compared with the estimates obtained using block or functional testing. This happens primarily because, in our experiments, random testing finds faults slowly, viz, after a larger execution time, compared to the other methods. This observation could be dismissed as unimportant in the light of the oft-made recommendation that testing be carried out using the operational profile of the program under test. When this is done, the failures that occur, and the faults that are removed, are the ones that matter. Thus, in experiments like the one reported here, where we remove all faults, random testing is being used in an impractical manner. We offer the following arguments in support of the claim that our experimental observations and conclusions are independent of the operational profile.

1. The set of inputs that we used in our experiments can be considered to be drawn from an operational profile. The entire option set of sort was constrained by assigning arbitrary probability values to each option. For this operational profile, random testing leads to an optimistic reliability estimate.

2. An operational profile constrains the input domain. It is considered important by reliability researchers & testers alike for two reasons:

a. it helps in reducing the number of test cases that one needs to examine, and

b. it enables the tester to test only those parts of the program that are most likely to be exercised in practice, and hence lead to an estimate of the operational reliability.

The problem with the use of an operational profile, however, is that it is difficult or impossible to obtain. Further, upgrades to software might invalidate any existing estimate of the operational profile, because the new features can change the ways in which the software is used. Any inaccuracy in the operational profile directly affects the reliability estimate, most likely resulting in optimistic (too high) estimates when random testing is used.

One might argue that while using functional & block testing in the experiments described in sections 2 - 4, testing was not done using an operational profile, and hence reliability estimation models, such as Musa [5], cannot be used. A counter-argument is that the test sequence that was generated in each case could have been a sample from an operational profile. Hence, in each case we have satisfied the assumption that testing uses an operational profile. As a consequence, one might argue further that, because the operational profiles were different in each of our three testing methods, the reliability estimates are anticipated to be different. On the contrary, we contend that large variations in the reliability estimates for different operational profiles, as shown in our experiments, can be severely detrimental to the marketability & usability of a software product.

Our experiments show that the testing method can affect the reliability estimates. A testing method that leads to a high reliability estimate need not necessarily lead to a program with fewer faults. These observations, and the possibility that a program might be used by widely differing user groups, lead us to suggest that reliability modeling should strive towards estimation procedures that are relatively insensitive to variations in the operational profile. We believe that this goal could be achieved by accounting for various structural coverages in reliability estimation models.



APPENDIX A

Supplementary Experiments



This appendix describes the variants of the experiments in section 3.

A.1 Objectives

Experiment 1. Determine the effect of permuting the execution order of fault-finding versions of the sort program.

Experiment 2. Determine the effect of reordering test cases for functional-testing-1.

Experiment 3. Determine the effect of different coverage sequences of block testing.

A.2 Descriptions of Experiments

Experiment 1. After each failure, the fault-finding versions of the sort program are randomly re-ordered.

Experiment 2. The test cases in the original experiment are randomly re-ordered.

Experiment 3. Two test-case sequences are used in order to obtain different sequences of blocks covered.

A.3 Observations

Random Testing. The mean values of the failure data for both experiments are almost identical in the first 32 time units. The difference between the results of the two experiments is that the one with random-order sort-programs revealed all faults earlier than the one with fixed order; see figure 4.

Figure 4. Random Testing with Fixed & Random Order Sort-Programs [plot omitted; axes: execution time, 0-50 units, vs cumulative failures]

Functional-Testing-1. The differences between fixed-order and random-order sort-programs are small, but the differences between fixed-order and random-order options are much larger; see figure 5.

Figure 5. Functional Testing with Fixed & Random Order Functions and Sort-Programs [plot omitted; axes: execution time vs cumulative failures]

Block Testing. The results from fixed-order and random-order sort-programs are identical, but there are differences between two different coverage sequences; see figure 6.

Figure 6. Block Testing with 2 Coverage Sequences and with Fixed & Random Order Sort-Programs [plot omitted; axes: execution time, 0-40 units, vs cumulative failures]

APPENDIX B

Algorithm for Random Testing on the Unix Sort Program

sort has the options: m, u, c, b, d, f, i, n, r, t, o, T.

Selection of options for sort

Step 1. The total number of options for each test case is a random number r, r ∈ [0,9].

Step 2. NumberOfOptions = 0
While (NumberOfOptions < r) do {
    select a random number r';
    if r' < 0.2 then {
        add the next option listed above to the current test case;
        NumberOfOptions = NumberOfOptions + 1;
    }
}

The file to be sorted by sort contains: 10^4 lines; a random number, c ∈ [0,80], of characters per line; each character is a randomly selected ASCII character.

REFERENCES

[1] B. Beizer, Software Testing Techniques, 1990; Van Nostrand Reinhold.
[2] J.W. Duran, S.C. Ntafos, "An evaluation of random testing", IEEE Trans. Software Engineering, vol SE-10, 1984 Jul, pp 438-444.
[3] W.E. Howden, "Functional testing", IEEE Trans. Software Engineering, vol SE-6, 1980 Mar, pp 162-169.
[4] Y.K. Malaiya, P.K. Srimani, Software Reliability Models: Theoretical Developments, Evaluation & Applications, 1990; IEEE Computer Society Press.
[5] J.D. Musa, A. Iannino, K. Okumoto, Software Reliability: Measurement, Prediction, Application, 1987; McGraw-Hill.
[6] C.V. Ramamoorthy, F.B. Bastani, "Software reliability - Status and perspectives", IEEE Trans. Software Engineering, vol SE-8, 1982 Jul, pp 354-371.
[7] M.L. Shooman, "Software reliability: An historical perspective", IEEE Trans. Reliability, vol R-33, 1984 Apr, pp 48-55.
[8] S. Yamada, S. Osaki, "Optimal software release policies with simultaneous cost and reliability requirements", European J. Operational Research, vol 31, 1987, pp 46-51.

AUTHORS

Dr. Mei-Hwa Chen; Dept. of Computer Sciences; SUNY; Albany, New York 12222 USA. Internet (e-mail): [email protected]

Mei-Hwa Chen received the BS (1983) in Mathematics from National Taiwan Normal University, Taipei, and the MS (1987) in Applied Mathematics and PhD (1994) in Computer Science from Purdue University. She has been a researcher at the Software Engineering Research Center site at Purdue University, where her research has focused on tools & techniques for testing-based software reliability estimation. Her research interests are software verification, testing & debugging, software reliability modeling, and user interfaces. Dr. Chen is a member of the Association for Computing Machinery and the IEEE Computer Society.

Dr. Aditya P. Mathur; Dept. of Computer Sciences; Purdue Univ; West Lafayette, Indiana 47907-1398 USA. Internet (e-mail): [email protected]

Aditya Mathur is an Associate Professor in the Department of Computer Science at Purdue University. He has been an active researcher with the NSF-sponsored Software Engineering Research Center site at Purdue University. He received his BS (1970), MS (1972), and PhD (1977) in Electrical Engineering from the Birla Institute of Technology and Science, Pilani. He has published two books and over 60 research articles. His main interests lie in software testing and reliability, computing with sound, and music composition. He is a member of the IEEE and the Association for Computing Machinery.

Dr. Vernon Rego; Dept. of Computer Sciences; Purdue Univ; West Lafayette, Indiana 47907-1398 USA. Internet (e-mail): [email protected]

Vernon Rego is an Associate Professor of Computer Sciences at Purdue University. His research interests include probabilistic system modeling, parallel software systems and simulation, and software engineering. He was a codeveloper of the EcliPSe toolkit for parallel simulation, which he used in a large-scale simulation that was awarded the 1992 Gordon Bell Supercomputing Prize. He also received a 1988 DFG award from Germany for research in computer networks. He has authored over 70 papers in refereed journals & conferences. He is a member of the IEEE and the Society for Computer Simulation.

Manuscript received 1994 April 25.

IEEE Log Number 94-10821