Systematic reliability analysis of a class of application...

13
Systematic Reliability Analysis of a Class of Application-Specific Embedded Software Frameworks Sung Kim, Farokh B. Bastani, Member, IEEE, I-Ling Yen, Member, IEEE, and Ing-Ray Chen, Member, IEEE Abstract—Dramatic advances in computer and communication technologies have made it economically feasible to extend the use of embedded computer systems to more and more critical applications. At the same time, these embedded computer systems are becoming more complex and distributed. As the bulk of the complex application-specific logic of these systems is realized by software, the need for certifying software systems has grown substantially. While relatively mature techniques exist for certifying hardware systems, methods of rigorously certifying software systems are still being actively researched. Possible certification methods for embedded software systems range from formal verification to statistical testing. These methods have different strengths and weaknesses and can be used to complement each other. One potentially useful approach is to decompose the specification into distinct aspects that can be independently certified using the method that is most effective for it. Even though substantial research has been carried out to reduce the complexity of the software system through decomposition, one major hurdle is the need to certify the overall system on the basis of the aspect properties. One way to address this issue is to focus on architectures in which the aspects are relatively independent of each other. However, complex embedded systems are typically comprised of multiple architectures. In this paper, we present an alternative approach based on the use of application-oriented frameworks for implementing embedded systems. We show that it is possible to design such frameworks for embedded applications and derive expressions for determining the system reliability from the reliabilities of the framework and the aspects. The method is illustrated using a distributed multimedia collaboration system. Index Terms—Distributed embedded systems, software composition, application-oriented frameworks, software reliability assessment. æ 1 INTRODUCTION D RAMATIC advances in computer and communication technologies have made it economically feasible to extend the use of embedded computer systems to more and more critical applications, such as process-control systems, manufacturing systems, avionics, railway systems, etc. At the same time, these embedded computer systems are becoming more complex and distributed. A distributed embedded system consists of an array of sensors, actuators, displays, and control logic. The sensors acquire information regarding the state of the system and the environment and send these to the control logic components and display (monitoring) stations. The control components, typically realized in software, embody all the intelligence in the system. They perform control-related computations driven by the specified control goals and then send commands to the actuators to cause desired transitions in the state space. Distributed embedded systems are typically used in early warning systems, distributed power management systems, traffic monitoring and control systems, defense command- and-control systems, and other emerging applications. For these real-time critical applications, it is necessary to be able to rigorously certify the quality (reliability, safety, perfor- mance) of the system. While relatively mature techniques exist for certifying hardware systems, including the use of “overdesign” techniques for dealing with worst-case situations, methods of rigorously certifying software systems are still being actively researched. In addition, most of the hardware system failures are caused by some random process (i.e., heat, physical wear and tear, etc.) in the normal course of hardware usage whereas software failures are due to design flaws. Therefore, certifying a software system becomes a very difficult task. Furthermore, as the bulk of the complex system application logic is realized by software, one must be able to deal with extremely large and complex programs. Com- pounding this difficult situation even further is the fact that these systems are typically long-lived systems that must be continually updated in the field to enhance their function- ality. This further exacerbates the complexity of these systems and requires frequent expensive recertification. Possible certification methods for embedded software systems range from formal verification to statistical testing. Verification is especially effective for certifying logical properties of the system, i.e., properties that either hold or do not hold. Logical properties are generally application- specific, but also include more general properties such as the absence of deadlocks, absence of race conditions, 218 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 30, NO. 4, APRIL 2004 . S. Kim, F.B. Bastani, and I.-L. Yen are with the Computer Science Department, University of Texas at Dallas, MS EC-31, Box 830688, Richardson, TX 75038-0688. E-mail: {sungkim, bastani, ilyen}@utdallas.edu. . I.-R. Chen is with the Department of Computer Science, Virginia Polytechnic Institute and State University, 7054 Haycock Road, Falls Church, VA 22043. E-mail: [email protected]. Manuscript received 2 Sept. 2003; revised 4 Feb. 2004; accepted 4 Feb. 2004. Recommended for acceptance by J. Bechta Dugan. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TSE-0132-0903. 0098-5589/04/$20.00 ß 2004 IEEE Published by the IEEE Computer Society

Transcript of Systematic reliability analysis of a class of application...

Page 1: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

Systematic Reliability Analysis of a Class ofApplication-Specific Embedded

Software FrameworksSung Kim, Farokh B. Bastani, Member, IEEE, I-Ling Yen, Member, IEEE, and

Ing-Ray Chen, Member, IEEE

Abstract—Dramatic advances in computer and communication technologies have made it economically feasible to extend the use of

embedded computer systems to more and more critical applications. At the same time, these embedded computer systems are

becoming more complex and distributed. As the bulk of the complex application-specific logic of these systems is realized by software,

the need for certifying software systems has grown substantially. While relatively mature techniques exist for certifying hardware

systems, methods of rigorously certifying software systems are still being actively researched. Possible certification methods for

embedded software systems range from formal verification to statistical testing. These methods have different strengths and

weaknesses and can be used to complement each other. One potentially useful approach is to decompose the specification into

distinct aspects that can be independently certified using the method that is most effective for it. Even though substantial research has

been carried out to reduce the complexity of the software system through decomposition, one major hurdle is the need to certify the

overall system on the basis of the aspect properties. One way to address this issue is to focus on architectures in which the aspects are

relatively independent of each other. However, complex embedded systems are typically comprised of multiple architectures. In this

paper, we present an alternative approach based on the use of application-oriented frameworks for implementing embedded systems.

We show that it is possible to design such frameworks for embedded applications and derive expressions for determining the system

reliability from the reliabilities of the framework and the aspects. The method is illustrated using a distributed multimedia collaboration

system.

Index Terms—Distributed embedded systems, software composition, application-oriented frameworks, software reliability

assessment.

1 INTRODUCTION

DRAMATIC advances in computer and communicationtechnologies have made it economically feasible to

extend the use of embedded computer systems to more andmore critical applications, such as process-control systems,manufacturing systems, avionics, railway systems, etc. Atthe same time, these embedded computer systems arebecoming more complex and distributed. A distributedembedded system consists of an array of sensors, actuators,displays, and control logic. The sensors acquire informationregarding the state of the system and the environment andsend these to the control logic components and display(monitoring) stations. The control components, typicallyrealized in software, embody all the intelligence in thesystem. They perform control-related computations drivenby the specified control goals and then send commands tothe actuators to cause desired transitions in the state space.Distributed embedded systems are typically used in earlywarning systems, distributed power management systems,

traffic monitoring and control systems, defense command-and-control systems, and other emerging applications. Forthese real-time critical applications, it is necessary to be ableto rigorously certify the quality (reliability, safety, perfor-mance) of the system.

While relatively mature techniques exist for certifyinghardware systems, including the use of “overdesign”techniques for dealing with worst-case situations, methodsof rigorously certifying software systems are still beingactively researched. In addition,most of the hardware systemfailures are caused by some random process (i.e., heat,physicalwear and tear, etc.) in the normal course of hardwareusage whereas software failures are due to design flaws.Therefore, certifying a software system becomes a verydifficult task. Furthermore, as the bulk of the complex systemapplication logic is realized by software, one must be able todeal with extremely large and complex programs. Com-pounding this difficult situation even further is the fact thatthese systems are typically long-lived systems that must becontinually updated in the field to enhance their function-ality. This further exacerbates the complexity of these systemsand requires frequent expensive recertification.

Possible certification methods for embedded softwaresystems range from formal verification to statistical testing.Verification is especially effective for certifying logicalproperties of the system, i.e., properties that either hold ordo not hold. Logical properties are generally application-specific, but also include more general properties such asthe absence of deadlocks, absence of race conditions,

218 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 30, NO. 4, APRIL 2004

. S. Kim, F.B. Bastani, and I.-L. Yen are with the Computer ScienceDepartment, University of Texas at Dallas, MS EC-31, Box 830688,Richardson, TX 75038-0688.E-mail: {sungkim, bastani, ilyen}@utdallas.edu.

. I.-R. Chen is with the Department of Computer Science, VirginiaPolytechnic Institute and State University, 7054 Haycock Road, FallsChurch, VA 22043. E-mail: [email protected].

Manuscript received 2 Sept. 2003; revised 4 Feb. 2004; accepted 4 Feb. 2004.Recommended for acceptance by J. Bechta Dugan.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TSE-0132-0903.

0098-5589/04/$20.00 � 2004 IEEE Published by the IEEE Computer Society

Page 2: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

assurance that exceptions will not be generated, etc.Verification is less effective for quantitative properties thatrequire domain-specific analysis, such as the averageperformance, probability of security violation, degree offault coverage, etc. For these properties, statistical testingand quantitative modeling and analysis techniques aremore practical.

It is clear that neither formal verification nor quantitativeanalysis and testing are by themselves adequate forcertifying complex embedded software systems. Thesemethods have different strengths and weaknesses and canbe used to complement each other. One potentially usefulapproach is to decompose the specification into distinctaspects that can be independently certified using themethod that is most effective for it. This approach isespecially attractive if each aspect corresponds to a disjointportion of the program since, then, a large verification ortesting problem is reduced to a set of substantially simplerverification or testing problems. It also enables the use ofverification or testing on a per aspect basis rather than usingthe same method across the entire program.

One of the earliest works related to software decomposi-tion is [33] where a requirements specification is decom-posed into multiple views, each of which captures somebehavior of the system. The concept of multiple views hasalso been used in StateCharts [18], separation using rely-guarantee assertions [24], behavioral inheritance [2], andAspect-Oriented Programming [23]. One major hurdle,however, is the need to certify the overall system on thebasis of the aspect properties [15], [26], [27], [28], [31]. Eventhough these decompositions reduce the complexity of thesystem, two different views are not necessarily indepen-dent. This complicates the reliability analysis. For example,if the reliabilities of components f and g are 1.0 and 0.9999,respectively, then the reliability of a system consisting of fand g can range from 1.0 to 0.0 depending on how f and ginteract. This uncertainty means that considerable effort hasto be expended to reason about global properties of thesystem, rather than by simple deductions from thecomponent properties, especially when there are hierarch-ical or circular dependencies among the aspects. In suchcases, system certification still requires the assessment ofthe reliability of one complex monolithic program.

One way around this problem is to focus on architecturesin which the aspects are relatively independent of eachother, as in the shared repository architecture, pipes-and-filters architecture, and their variations [7]. However,complex embedded systems are typically comprised ofmultiple architectures. Hence, each system needs detailedindividual analysis which has to be repeated after eachupgrade to the system.

In this paper, we explore an alternative approach basedon the use of application-oriented frameworks for imple-menting embedded systems. Each framework is designed tohave a stable portion that provides scheduling andtransport functionalities and mechanisms for coping withsecurity threats and component failures. The frameworksupports “plug-in” aspects that can be added, upgraded, orremoved dynamically without having to stop the system.Each plug-in aspect must be designed to be certifiableindependently of other aspects or the framework. Theframework is classified according to how different classes ofapplications are composed to make up the framework. Theclass includes applications that can be decomposed intoIndependently-Developable End-user Assessable Logical(IDEAL) aspects. Hence, we show that it is possible to

design such frameworks for embedded applications andderive expressions for determining the system reliabilityfrom the reliabilities of the framework and the aspects. Theapproach is illustrated using a detailed example involvingdistributed multimedia collaboration.

The rest of this paper is organized as follows: Section 2presents a model of a class of application-oriented frame-works that enable aspects to be dynamically plugged intothe framework. Sections 3, 4, and 5 present the reliabilityassessments of systems that are composed from differentclasses of aspects. Section 6 applies the approach to a casestudy of a distributed multimedia collaboration system.Section 7 reviews the related work while Section 8summarizes the paper and outlines some future researchdirections.

2 SYSTEM MODEL

We consider a framework consisting of two sets ofindependent aspects and one composition component,fS;A; Cg, where S denotes the set of fixed (static) frame-work aspects, A denotes the set of “plug-and-play”application aspects, and C denotes the composition com-ponent that ties everything together. The set S can containaspects that address functional as well as nonfunctionalrequirements. These include user interface aspects, com-munication aspects, scheduling/security/fault-toleranceaspects, etc. Essentially, the static framework aspectsprovide “services” that may enhance the framework’scomputing platform. Let S ¼ fs1; s2; . . . ; snS

g, where s1, s2,. . . , snS

represent the static framework aspects. A require-ment imposed on these aspects is that each one should beindependent of the other aspects. That is, aspect si does notinvoke or depend on any other aspect sj during itsexecution and its correctness can be certified independentlyof other aspects. However, an aspect may coordinate theexecution of other aspects and its output can be visible tothe composition component.

The setA ¼ fa1; a2; . . . ; anAg contains application-specific

plug-in aspects that can be dynamically added to orremoved from the framework. Each aspect must bedesigned to operate independently given its inputs and itmust also be possible to certify it independently of theframework or other aspects. Generally, these aspectsimplement specific features or functional aspects of theapplication.

Whether it is a static framework aspect or an applicationspecific plug-in aspect, an aspect can be separated fromother aspects depending on how it accesses the shared statespace. The notion of “access” here includes read and writeoperations (i.e., an operation that affects the shared statespace). Aspects can affect different regions of the sharedstate space or may affect at least one common point in theshared state space. For the cases where there is a clearseparation in the regions they affect, the separation mayoccur statically (i.e., deterministically) or dynamically. If theaccesses can be known statically (i.e., at compile-time), theseparation is denoted as compile-time separation; however,if the accesses are dynamic in nature (i.e., the access point inthe state space is determined at runtime), the separation isdenoted as runtime separation. In both cases, aspects canaffect the shared state space at the same time or at differenttimes. Fig. 1 shows the classification of such separation.

. Compile-Time Spatially Separable.

. Compile-Time Temporally Separable.

KIM ET AL.: SYSTEMATIC RELIABILITY ANALYSIS OF A CLASS OF APPLICATION-SPECIFIC EMBEDDED SOFTWARE FRAMEWORKS 219

Page 3: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

. Runtime Spatially Separable.

. Runtime Temporally Separable.

. Spatially Inseparable and Temporally Inseparable.

A given framework can incorporate a combination of theabove aspects. For example, consider a simple frameworkfor a VoIP communication management system. The frame-work consists of a slot for data capture on the transmissionside and a slot for data presentation on the receiving side. Itallows a varying number of components to be added ateither end. Potential aspects include error handler, eventreporting, time manager, call record keeper, resourcechecker as well as complementary pairs of components oneach side, such as compression/decompression and en-cryption/decryption components. On the transmission side,there is also a component for data integrity management(i.e., connection management) that assembles packets usingthe output of the pipeline stage as well as checksumcomputation, sequence number assignment, to/from

address headers, and data descriptors (e.g., length andformat). The sequence number and data to be transmittedare determined based on the fusion of several aspects,including the transmission window size, elapsed time,pending acknowledgments, etc. On the receiving side, asimilar component (i.e., connection management) is used toparse the message and divide it into the checksum, sequencenumber, to/from address, and data portions. The checksumcomponent checks whether any failures have occurred. Thesequencing component checks whether the packet wasalready received. The acknowledgment component pre-pares and sends an acknowledgment if needed. If the packetpasses the checks, it is then passed to the inverses of thecomponents that are responsible for decryption anddecompression as well as appropriate handler componentsand, finally, to the presentation component. Fig. 2 shows theUML package diagram of a simple framework for VoIPcommunication.

220 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 30, NO. 4, APRIL 2004

Fig. 1. Classification of accesses to a shared state space.

Fig. 2. Specification of application-specific plug-and-play framework using UML package diagram.

Page 4: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

The components in the above VoIP communicationmanagement framework can be classified into differentaspects as mentioned earlier. For example, the errorhandler, the various input handlers (i.e., Event Reporting,Time Manager, and Call Record Keeper), and the resourcechecker are compile-time temporally separable aspects asthey can be positioned in a pipes-and-filters fashion atcompile time. In addition, the paired components are alsocompile-time temporally separable aspects for the samereason. The only difference is that the former is the propertypreserving aspects and the latter is the property restoringtype. However, the various input handlers are compile-timespatially separable aspects as each looks at different regionsof the data passed. Therefore, Event Reporting, TimeManager, and Call Record Keeper can execute in parallelwithout affecting each other.

Considering the above example of the VoIP communica-tion management system containing the set of aspects datacapture, error handler, event reporting, time manager, callrecord keeper, resource checker, compression, encryption,transmission on the sending side, and receiving component,decryption, decompression, resource checker, call recordkeeper, time manager, event reporting, error handler, datapresentation on the receiving side, the reliability of theseaspects can be assessed independently:

1. Data capture and presentation. These aspects can betested and evaluated as one unit. The presentationroutine can be analyzed independently by usingstandard preassembled data, but it is very difficult totest the data capture module independently. Also,these aspects involve quantitative quality assertions,such as fidelity of the data capture or presentation.Hence, data capture and presentation are easier totest and analyze as one unit.

2. Error handler, event reporting, time manager, call recordkeeper, and resource checker aspects. These can be testedindependently using standard test patterns or inconjunction with the presentation component.Again, these components involve quantitative qual-ity assertions that make them more amenable totesting and analysis.

3. Compression and decompression aspects. Assuminglossless compression, these components can betested easily as one unit. A quality measure is thedegree of compression achieved, which can bechecked via testing and analysis.

4. Encryption and decryption aspects. These componentscan also be tested as one unit. A key quality factor ishow difficult it is for a third party to decipher thedata. This is difficult to test, since it is impossible toenumerate all possible strategies that a third partymight use. However, the protection provided by theencryption can be analyzed in general terms.

5. Communication management. This is a part of theframework, but can be tested and evaluated inde-pendently. Issues such as congestion and packet lossneed to be analyzed using standard performanceanalysis techniques.

Component C denotes the composition component. Itreceives the outputs of the aspects in A as well as someaspects in S and assembles the final output of the system.The fixed part of the framework provides the frameworkservices to meet other nonfunctional requirements for theapplication. For example, hardware failures can be dealt

with by using a fault-tolerance service, coordination ofmultiple accesses to the system resources is provided bythe concurrency control service, finite memory constraints areaddressed by the garbage collection service, etc.

Each of these aspects is simpler and more focused thanthe overall framework and its code can be evaluatedindependently. The user plug-in aspects can be added,removed, or updated dynamically at runtime—the frame-work must include coordination code to ensure that correctmatching components are used on both the sender andreceiver sides in spite of these dynamic updates. Anotheradvantage is that, by observing the behavior of the system,it is possible to trace defects to the individual components.For example,

1. abnormal input error handling and ignored eventscan be traced to the error handling and eventreporting aspects,

2. abnormally large packet size may signal a problemwith the compression routine,

3. poor security will require changes to the encryp-tion/decryption routines, and

4. out of order packets implies a connection manage-ment problem.

This type of dynamic fault diagnosis is more difficult toachieve in monolithic systems where there can be manyinterdependencies among the components.

3 RELIABILITY ASSESSMENT OF COMPILE-TIME

SEPARABLE ASPECTS

3.1 Compile-Time Spatially Separable Systems

As mentioned above, the event reporting, time manager,and call record keeper aspects of the VoIP communicationmanagement system are compile-time spatially separableaspects (i.e., they affect different regions of the state space).In particular, two aspects, A and B, of the system, R, arecompile-time spatially separable if the system postcondi-tion, PCðSÞ, can be decomposed into P ðS0; S1Þ ^QðS0; S2Þ,where S1 \ S2 ¼ �, S0 � S1 � S2 ¼ S, and neither P nor Qmodifies S0. If A implements P and B implements Q, thenA and B together implement R. A and B have noninterfer-ing failure aspects if the system is designed to guaranteethat it is physically impossible for A to update S2 and for Bto update S1. It is also necessary in this as well as the othercases to ensure that one aspect does not “walk” overanother aspect, i.e., update the code or data of anotheraspect. For example, the time manager aspect records thecall time, and the call record keeper simply notes anysignificant call related events for future billing purposes;thus, neither of these aspects updates the code or data of theother aspect. Under these conditions, we have the followinglemma.

Lemma 1. For independent aspectsA andB—i.e., they access andmodify disjoint regions in the state space (i.e., S0 ¼ � )—theprobability of the correct outcome produced by A and B is theproduct of the probability thatA produces the correct output andthe probability that B produces the correct output.

Proof. For aspects A and B, assume that S0 ¼ � and let S1 ¼fa1; a2; � � � ; ang and S2 ¼ fb1; b2; � � � ; bmg denote the statespaces of A and B, respectivley. Let S0

1 ¼ fa01; a02; � � � ; a0jgand S0

2 ¼ fb01; b02; � � � ; b0kg be the sets of points for which AandB are correct, respectivley. Then, the probability of the

KIM ET AL.: SYSTEMATIC RELIABILITY ANALYSIS OF A CLASS OF APPLICATION-SPECIFIC EMBEDDED SOFTWARE FRAMEWORKS 221

Page 5: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

correct outcomeproducedbyA andB ¼ ðjS01j � jS0

2jÞ=ðn�mÞ since A and B are both correct for the combination ofany a0h where 1 � h � j and any b0i where 1 � i � k. Then,the probability of the correct outcome produced by A andB equals theprobability thatAproduces the correct output� the probability that B produces the correct output sincethe probability that A produces the correct output equaljS0

1j=n and the probability that B produces the correctoutput equal jS0

2j=m. tuLemma 2. For compile-time spatially separable aspects A and B

of R, Reliability of R equals reliability of A� Reliability of Bif A and B are independent (i.e., S0 ¼ �).

Proof. Since A and B are independent components, i.e.,S0 ¼ �, therefore, the system reliability, ReliabilitySystemðtÞ ¼ probability (system performs correctly in ½0; t�jsystem is up at time 0) ¼ probability (P ðS0; S1Þ ^ QðS0,S2Þ holds for every input satisfying preconditionðÞ ofsystem in ½0; t�j system is up at time 0). Since probabilityðA ^BjCÞ¼ probability ðAjCÞ � probability ðBjCÞwhenAandB are independent and using Lemma 1, we then haveRSystemðtÞ ¼ probability (P ðS0; S1Þ holds for every inputsatisfying preconditionðÞ of system in ½0; t�j system is up attime 0) � probability (QðS0; S2Þ holds for every inputsatisfying preconditionðÞ of system in ½0; t�j system is up attime 0) ¼ RAðtÞ �RBðtÞ. tu

Lemma 3. For compile-time spatially separable aspects A and Bof R, the reliability of R is bounded by the followingexpression:

maxf0; ReliabilityA þReliabilityB � 1g � ReliabilitySystem

� minfReliabilityA;ReliabilityBg

if the input spaces of A and B are not disjoint nordecomposable.

Proof. Input space is shared by both A and B and we do notknow exactly how the input is shared and accessed. Let nbe the total number of discrete points in the input space.Then, if ReliabilityA is the reliability of A, n �ReliabilityAis the number of inputs for which A is correct, and ð1�ReliabilityAÞn is the number of inputs for which A fails.Likewise, n �ReliabilityB is the number of inputs forwhich B is correct, and ð1�ReliabilityBÞn is the numberof inputs for which B fails. The maximum reliability ofthe system is obtained when the system works correctlyfor the maximum number of inputs. Since the systemworks correctly if and only if both A and B workcorrectly, the maximum reliability is generated whenthere is the largest overlap of the inputs on which both Aand B work correctly. With complete overlap, thenumber of inputs for which the system works correctlyis min (maximum number of inputs for which A workscorrectly, maximum number of inputs for which Bworkscorrectly). Therefore, the maximum reliability of thesystem is minðn � ReliabilityA=n; n �ReliabilityB=nÞ. Onthe other hand, the minimum reliability of the system isobtained when there is the smallest overlap of the inputson which both A and B work correctly. In the case wherethere is minimal overlap, the number of inputs for whichthe system works correctly is max(0, total number of

inputs—(number of inputs for which A fails + number ofinputs for which B fails)). Therefore, the minimumreliability of the system is maxð0; ðn� ð1�ReliabilityAÞn� ð1�ReliabilityBÞnÞ=nÞ which is maxð0, ReliabilityAþReliabilityB � 1Þ. tu

As seen in Lemmas 2 and 3, the input space for theaspects of the system may be overlapping or disjoint. Let usconsider the same example mentioned above where twoaspects A and B of the system access and modify the statespaces ðS0; S1Þ and ðS0; S2Þ, respectively, where S1 \ S2 ¼ �,S0 � S1 � S2 ¼ S, state space for the system, and neither Anor B modifies S0. Disjoint input space (i.e., S0 ¼ �) causesno complications; however, in the case where the inputspace is overlapping (i.e., S0 6¼ �), we do not know exactlywhat each aspect would do with a random input (i.e., fail orsucceed) since aspects can be developed by different peopleat different times. Therefore, we can only assume that thereis no correlated input failures among aspects for the giveninput. In addition, it is not possible in practice to know theexact behavior of each aspect since it is very difficult totabulate the input-output relationships for infinite inputspace for each aspect. With these assumptions, we cangenerally say that the reliability of the system is the productof the reliabilities of the aspects when aspects areindependently tested with sample points from the inputspace and we have the average reliabilities of the aspects.

3.2 Montage Composition

In general, the output of each aspect influences thecorrectness of the overall output in the system that isdecomposed into compile-time spatially separable aspects.This is especially true for a serial composition of compile-time spatially separable aspects. However, users may besatisfied with a partial output where the system output iscomposed of outputs from a subset of the aspects. This is aspecial case where the compile-time spatially separableaspects can exist simultaneously. In such a situation, theoutput of individual aspects are composed into the systemoutput via the montage composition.

In montage composition, the output of each componentis evident in the final system output which forms a specifiedmontage of the outputs of all the aspects. The aspects in thismodel may process fully overlapping, partially overlap-ping, or fully disjoint parts of the input set. Examplesinclude HTTP (Hypertext Transfer Protocol) and SIP(Session Initiation Protocol) message processing, GUI(Graphical User Interface), and visualization applications.For example, consider a Web page composed of differenttext, images, links, and mpeg clips. The Web page is amontage of these different components. Each output ofthese different components of the page is evident in theoverall display of the page.

LetRðtÞ ¼ fr1ðtÞ; r2ðtÞ; . . . ; rnAðtÞg represent the reliability

vector for the aspects. The montage composition makes thesuccess or failure of each aspect directly visible to the user.Further, the success or failure of aspect ai hasno impact on thesuccess or failure of other aspects aj, j 6¼ i. From the user’sperspective, the system does not necessarily fail the momentone aspect fails. Instead, the systemmay degrade in the sensethat the quality of the output becomes lower and lower asmore and more aspects fail. (Here, the “quality” of an aspectrefers to thevalueprovided to theuserby the correctoperationof that aspect. It is similar to a reward function. By definition,

222 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 30, NO. 4, APRIL 2004

Page 6: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

the quality of an aspect that has failed is 0.) Let Q ¼fq1; q2; . . . ; qnA

g represent the quality vector for the aspects.LetCðQÞ represent the composition expression for the systemquality. Then, the average quality of the system output isCðQRðtÞÞ, where QRðtÞ ¼ fq1 � r1ðtÞ; q2 � r2ðtÞ; . . . ; qnA

�rnA

ðtÞg.As an example, consider a system that controls the

button/light display panel of an elevator. If each button/light is controlled by a separate aspect, then the quality andreliability of each button/light is distinct from those of theother buttons/lights. Hence, we have a montage composi-tion with CðQÞ ¼

PnA

i¼1 qi. Hence, the average qualityqðtÞ ¼

Pni¼1 riðtÞ � qi.

As another example, let us expand our VoIP system a bitfurther. In addition to the voice, the system can now offervideo display. Assume that the video display consists of twovideo streams (brightness and color frames). Also assumethat the color stream adds value to the display in addition tothe brightness, but the system works if we have at least theaudio stream and the brightness stream. The added colorcapability is a “nice to have” feature, and users are partiallysatisfied with just the voice and the brightness stream. Then,the montage composition expression is CðQÞ ¼ qb � qa�ð1þ qcÞ, where “b” denotes brightness, “a” denotes audio,and “c” denotes color. Hence, the average quality qðtÞ ¼ qb�rbðtÞ � qa � raðtÞ � ð1þ qc � rcðtÞÞ.

3.3 Compile-Time Temporally Separable Systems

The case considered in the previous section is conceptuallysimple. In every cycle, each aspect looks at its portion of thestate space, performs its computation, and updates itsportion of the state space. Thus, there is no direct functionaldependency between the aspects. The class of compile-timetemporally separable systems is a richer class than compile-time spatially separable systems. Since actions can beseparated in time, it is now possible to have functionaldependency among the aspects. However, the key propertyof this class is that temporal separation is achieved atcompile time which limits the architecture to a subset of thepipes-and-filters architecture. That is, aspect A processesthe input that is then forwarded to aspect B for furtherprocessing. (We denote this by A ! B.) In the VoIPcommunication management framework, the error handler,the various input handlers, resource checker, compression,encryption, and connection management aspects are com-pile-time temporally separable. For example, the output ofthe compression aspect (i.e., compressed data) is passeddown to the encryption aspect for further processing. Theerror handler may also “filter out” the captured data fromany further processing. In the following sections, weidentify two subclasses for which the system reliabilitycan be inferred from the aspect reliabilities.

3.3.1 Visible Intermediate Result

In this case, it is assumed that the output of A is visible tothe end-user and A is functional; thus, the specifications forA and B can be derived directly from the requirementsspecification. This is the case with compression andencryption aspects. The output of the compression aspect(i.e., compressed data) is visible to the end-user and thecompression aspect is functional. To estimate the reliabilityof A ! B from the individual reliabilities of A and B, it isnecessary to be able to accurately determine the operationalprofile of the input to B. This requires the specification of A

to define a function (rather than a relation). Under thecondition that the specification of A is functional, we have:

Lemma 4. The reliability of A ! B is equal to the reliability ofA� the reliability of B if A is functional.

Proof. Since the operational profile of A, OPA, is knownand A is functional, then OPB can be generated fromOPA and the function F that is A. That is,OPB ¼ F ðOPAÞ. Given the state space S, A is correctif preconditionA ) wpðA; postconditionAÞ, and B iscorrect if preconditionB ) wpðB; postconditionBÞ andpostconditionA ) preconditionB. Then, the correctnessof A and B can be estimated by sampling OPA andOPB and comparing the generated output withpostconditionA and postconditionB, respectively. Fromthis observation, we can conclude that we are dealingwith a system composed of two serial and independentaspects (A and B). Therefore, the reliability of thesystem (i.e., A ! B) = probability(A is correct in [0, t] ^B is correct in [0, t]) = probability(A is correct in [0, t])� probability(B is correct in [0, t]) = the reliability ofA� the reliability of B. tu

The restriction that A should be a functional componentcan be removed by having filters that establish somespecific property and, at the same time, preserve all otherproperties. We call such filters Property Preserving Filters.Suppose P ðSÞ is the precondition of the system and RðS00Þ isthe postcondition. Further, suppose that RðS00Þ can bedecomposed into two parts, namely, R1ðS00Þ ^R2ðS00Þ. Let Aestablish R1 while B preserves R1 and establishes R2. Forverification purposes, assume that the postcondition of A isQðS0Þ ^R1ðS0Þ which is also the precondition of B.

To assess the reliability of A ! B from the reliabilities ofA and B, it is necessary to know the operational profile of B.Since A establishes a partial result, it is not practical torequire the specification of A to be functional. Instead, werequire the implementation of A to be relational, that is, Agenerates all possible values that satisfy its requirements.This effectively makes the input to B invariant of the actualimplementation of A and, hence, the operational profile ofB can be determined from the specification of A. Under thiscondition, we have:

Lemma 5. The reliability of A ! B is equal to the reliability ofA� the reliability of B if A is relational.

Proof. The proof is similar to the proof of Lemma 4. Thedifference is that A is relational instead of functional.However, A generates all possible values that satisfy itsrequirements. This effectivelymakesA “functional” since,for a fixed input, different correct implementations of Amust produce the same output set. Therefore, B’soperational profile can be determined from the specifica-tion of A since the input to B is invariant of the actualimplementation of A. That is, given IA (the actual inputspace for A), A generates a set of possible outputs(A : IA ! 2IB ) that will be passed on to B. The inputselector for B, FB, generates the actual input for B from aset of possible inputs for B (FB : 2IB ! IB). Then,FB �A : IA ! IB. Therefore, we can generate B’s opera-tional profile from the specification Spec of A and thespecification of FB. Then, given the state space S, A is

KIM ET AL.: SYSTEMATIC RELIABILITY ANALYSIS OF A CLASS OF APPLICATION-SPECIFIC EMBEDDED SOFTWARE FRAMEWORKS 223

Page 7: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

correct if preconditionA ) wpðA; postconditionAÞ, andB iscorrect if preconditionB ) wpðB; postconditionBÞ. Then,the correctness of A and B can be estimated by samplingOPA and OPB and comparing the generated output withpostconditionA and postconditionB, respectively. Hence,we can conclude that we are dealing with a systemcomposed of two serial and independent aspects (A andB). Therefore, the reliability of the system (i.e., A ! B) =the reliability of A� the reliability of B. tu

All the aspects in property-preserving filters must workcorrectly for the system output to be correct. Hence, thereliability of the system is

QnA

i¼1 riðtÞ. The quality of the finaloutput is additive if the composition preserves the effect ofeach aspect on the data stream. In this case, the averagequality is given by ð

PnA

i¼1 qiÞ �QnA

i¼1 riðtÞ, where qi is theaverage quality of aspect ai.

Filtering composition, whether it is composed of prop-erty-preserving or property restoring filters, occurs in signalprocessing applications, text processing applications, andalso certain types of process control applications. In theVoIP communication management system, the error hand-ler “filters out” abnormal data from any further processing.Aspects in filtering composition are domain-specific and,hence, require testing or domain-specific analysis forcertification.

3.3.2 Nonvisible Intermediate Result

In this case, each aspect is implemented by a pair of filters,one (say, A) that satisfies a specific nonfunctional require-ment and another (say, A0) that restores the information, i.e.,A0ðAðIÞÞ ¼ I. We call such filters Property RestoringFilters. The verification of the system is similar to themethods in the previous case for the functional aspect of therequirements. This essentially means verifying thatA0ðAðIÞÞ ¼ I. However, nonfunctional aspects, such asperformance and security properties of the system, can bemore difficult to verify and are impossible to do so if theproperties are achieved only statistically. Statistical relia-bility assessment works well for this type of system. First,since the inverse of Amust exist, the specification of A mustbe functional. Hence, the operational profile at the output ofA can be determined accurately from the operational profileat the input of A and the specification of A. Second, the testoracle is obvious if A and A0 are tested as one unit. Thissubstantially reduces the cost of testing. Third, thenonfunctional aspects can be validated statistically as partof the reliability assessment procedure. Hence, we have:

Lemma 6. The reliability of A ! B = reliability of A�reliability of B.

Proof. (The proof follows the same line as Lemma 4 andLemma 5.) Since the inverse of A must exist, thespecification of A must be functional. Therefore, OPA

and SpecA can be used to determine OPB much like in theproof of Lemma 5. Then, given the state space S, A iscorrect if preconditionA ) wpðA; postconditionAÞ, andB iscorrect if preconditionB ) wpðB; postconditionBÞ. Then,the correctness of A and B can be estimated by samplingOPA and OPB and comparing the generated outputs withpostconditionA and postconditionB, respectively. Hence,we can conclude that we are dealing with a systemcomposed of two serial and independent aspects (A and

B). Therefore, the reliability of the system (i.e., A ! B) =the reliability of A� the reliability of B. tu

Since any faulty aspect in property-restoring filters cancorrupt the output, the reliability of this composition isQnA

i¼1 riðtÞ. Each aspect can be viewed as adding to thequality of the system. Hence, the overall system quality isðPnA

i¼1 qiÞ �QnA

i¼1 riðtÞ.Property restoring filters occur in communication sys-

tems. For example, as discussed in Section 2, for a VoIPcommunication management system, aspects pairs caninclude (capture, presentation), (compression, decompres-sion), (encryption, decryption), and (disassemble, reassem-ble). Each aspect must work correctly for the output of thesystem to be correct. However, each aspect achieves specificobjectives, such as reducing the bandwidth requirement,increasing the security, enhancing the fault tolerance, andso on. Aspects in property restoring filters are amenable torigorous formal verification with respect to their composi-tional correctness, i.e., verification that 8x; ai2ðai1ðxÞÞ ¼ x.Once this assertion has been verified for each aspect, thereliability of the system is 1.0 for all time t. Formalverification can be replaced by runtime checking at someextra cost. For example, let x be the input to aspect ai1. Thecode for aspect ai1 can be augmented to apply aspect ai2 andto pass x to the next stage in case ai2ðai1ðxÞÞ 6¼ x; otherwise,it passes ai1ðxÞ to the next stage. Though the reliability(correctness) is 1, the quality (such as bandwidth required,security achieved, etc.) is affected by the behaviors of theaspects; with checking, the average quality is given byPnA

i¼1ðqi � riðtÞÞ.

4 RELIABILITY ASSESSMENT OF RUNTIME

SEPARABLE ASPECTS

4.1 Runtime Spatially Separable Systems

A typical example of this type of system is a delegationdesign pattern where a coordinator monitors events orinputs and selects an appropriate handler for each event orportions of the state space. For a two aspect runtimespatially separable system, the coordinator C scans the statespace S and decomposes it into orthogonal sets S0, S1, andS2. It then invokes AðS0; S1Þ and BðS0; S2Þ that can operatein parallel. The outputs of A and B are passed to modify S1

and S2, respectively. The verification of the system is doneby verifying that the coordinator identifies S0, S1, S2

correctly and verifying that A and B are correct. Statisticalreliability assessment for these systems is complicated bythe fact that the operational profiles induced at A and Bmust be accurately estimated. This can be done indepen-dently of C only if the specification of C is functional;otherwise, the internal logic of the moderator has to beanalyzed in order to determine the correct operationalprofile, which makes the reliability estimates of A and Bdependent on the implementation of C. Once the opera-tional profiles have been estimated, the reliability of thesystem is assessed by first determining the reliabilities of A,B, and C separately. The reliability of the system is equal tothe product of the reliabilities of C, A, and B provided that

1. the decomposition specification is derived from therequirements specification,

224 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 30, NO. 4, APRIL 2004

Page 8: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

2. a defect in one aspect cannot modify the code or dataof another aspect,

3. A and B only modify their own portions of the statespace containing S1 and S2, respectively, and

4. A, B, and C operate in parallel.

4.1.1 Selection Composition

Selection composition falls into the runtime spatially separ-able category. In the selection composition method, at anygiven instant of time, one of the aspects is selected. Variouscriteria can be used to select the appropriate output, such asthe earliest response, the response having the smallest orlargest value, etc. Examples include pattern matching inimage recognition and bidding systems in E-commerceapplications.

Let P ¼ fp1; p2; . . . ; pNAg be a vector such that pi is the

probability that aspect ai is selected at any time t. Then, thesystem reliability is given by

PnA

i¼1 riðtÞ � pi, where riðtÞ isthe reliability for the aspect ai for a random input. Theaverage quality of the system for a random input isPnA

i¼1 pi � riðtÞ � qi, where qi is the average quality of theoutput of aspect ai. As an example, consider the VoIPcommunication management system using different tech-niques to optimize its performance. We have seen thecompression/decompression aspects, encryption/decryp-tion aspects, and other aspects that may use differenttechniques to achieve satisfactory solutions. For example,the compression/decompression aspects must considercompression level, compressed audio quality, and compres-sion/decompression speed to select appropriate techniquesto meet the user requirements. The reliability of the systemincreases if a reliable technique is plugged that meets theuser requirements into the framework and decreases if sucha technique is unavailable, or the chosen technique isunreliable. Since the system is correct as long as one aspectproduces the correct output, the probability that the outputis correct is 1�

QnA

i¼1ð1� riðtÞÞ. The average quality of theoutput is ½

PnA

i¼1 pi � qi� � ½1�QnA

i¼1ð1� riðtÞÞ�, where pi isthe probability that the ith aspect is selected.

4.2 Runtime Temporally Separable Systems

One way of preventing two spatially inseparable aspects, Aand B, from interfering with each other when they access ashared state space is to coordinate their execution to ensurethat their accesses are serializable. Usually, this type ofcoordination logic is interspersed within the code thatimplements the functional aspects of the process, but it isalso possible to separate out the coordination details fromthe implementation of functional aspects. Two cases ofcoordination logic are presented here, namely, sequencecoordinators and concurrency control.

4.2.1 Sequence Coordinators

In the case of Sequence Coordinators, the coordinator Cimplements a schedule of activation patterns for aspectsthat perform specific tasks. For a system consisting ofaspects A and B, the coordinator alternates the activation ofA and B. The verification of C consists of proving that itcorrectly enforces the invocation schedule. The reliabilityassessment again requires the operational profile to bedetermined for A and B. Since these aspects do not run inparallel, it is also necessary to deduce the proportion of timefor which each one is active. This is impacted by the service

times of A, B, and C as well as the coordination policyimplemented by C. To make the analysis independent ofthe implementation of C, the specification of the coordi-nator must be functional. Let p denote the proportion oftime that A is active and let q denote the proportion of timethat B is active. (pþ q can be less than 1 if there is a delaybetween the completion of an aspect and the selection of thenext aspect.) Then, the reliability of the system is thereliability of C � (p� reliability of Aþ q � reliability of B).

4.2.2 Concurrency Control

In Concurrency Control, a number of independent tasks viefor access to the shared state space. The coordinatorprocesses the current state of the system as well as allevents and requests. Based on specified concurrency controlpolicies, it enqueues or activates certain tasks. In addition topolicies that impact the functional behavior of the system,the coordinator may need to meet other constraints, such asperformance constraints, absence of deadlock, absence ofstarvation, etc. The system can be verified by verifying eachindividual task separately. The verification of the coordi-nator consists of proving that it enforces the specifiedcoordination policies. Model checking can be used to verifyother properties, such as absence of deadlocks and absenceof starvation. These proofs together imply the correctness ofthe overall system. Statistical reliability assessment isextremely complex for such systems due to the nondeter-minism inherent in the coordination process. It is difficult todetermine the operational profile and equally difficult torecreate the operational profile during testing.

5 RELIABILITY ASSESSMENT OF SPATIALLY

INSEPARABLE AND TEMPORALLY INSEPARABLEASPECTS

In this case, it is neither possible to separate the accesses ofthe aspects in different dimensions of the state space nor isit possible to separate their accesses at different times.Instead, the aspects have to operate in parallel and have toupdate the state space at the same time. One way ofenabling the code for the programs to be developedindependently and to then be composed together is towrite the programs so that they generate all possible outputvalues corresponding to an input rather than simply onevalue. The programs can be composed together at theoutput level by taking the intersection and union of theiroutput sets as needed.

Suppose the postcondition is P ^Q and the preconditionis T for a 2-aspect system consisting of aspects A and B,respectively. Let the output set of A be pðA; T Þ and theoutput set of B be qðB; T Þ. Then, verification consists ofshowing that T ) wpðA;P Þ and T ) wpðB;QÞ. This impliesthat pðA; T Þ satisfies P and qðB; T Þ satisfies Q. Hence, theintersection of the outputs, if it is nonnull, satisfies P ^Q. Ifthe intersection is null, then it means that P andQ cannot besatisfied simultaneously for this input.

The reliability of the system can be estimated from thereliabilities of the aspects. Each of the aspects accesses theshared state space directly and, hence, it sees the sameoperational profile. Testing using random sampling is usedto validate each aspect independently. The test oracle mustaccept a set of values and check whether all the valuessatisfy the postcondition.

In the following, we discuss fusion composition, wherethe outputs of the different aspects are merged together to

KIM ET AL.: SYSTEMATIC RELIABILITY ANALYSIS OF A CLASS OF APPLICATION-SPECIFIC EMBEDDED SOFTWARE FRAMEWORKS 225

Page 9: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

obtain the output of the system. Hence, even though theaspects are independent, a failure in one aspect can cause afailure of the whole system. In particular, there are threetypes of fusion composition, namely, cooperative fusion [1],competitive fusion [21], and relational fusion. In all thesecases, the composition component processes all the outputsand generates the final system output. The processing caninvolve set intersection and union operations, arithmeticand Boolean operations, and information fusion techniques.Examples include deciding which actuator(s) to activatedepending on parallel processing of different goals,composing features in a telephone switching system,determining which packet to transmit next based on faulttolerance and performance constraints, etc.

In a cooperative fusion, the output from two or moreindependent aspects are fused together to achieve anoutcome that could not have been achieved by theindividual outputs of the aspects. For example, a set ofthermometers over a region can produce a gradient oftemperature difference within the entire region [1]. In acompetitive fusion, the outputs from the individual aspectscompete for the overall output of the system. However, thisis different from the selection type where one output from aset of individual outputs is selected as the overall systemoutput. In a competitive fusion, the individual outputs“influence” the overall system output rather than determinethe outcome [21]. Statistical mean, majority voting, and n-version voting are some of the examples of competitivefusions. The cooperative and competitive fusions are well-known in the sensor network area. In the following section,relational fusion is discussed in detail.

5.1 Relational Fusion

In relational fusion, the top level specification, g, is firstdecomposed into a conjunction of predicates, g ¼ g1 ^ g2^ � � � ^ gn. The individual predicates, gi, are further decom-posed into a disjunction of predicates, i.e., gi ¼ gi1 _ gi2 _� � � _ gini

. Each predicate gij represents one way of achievinggi. Note that conjunctive and disjunctive decompositionscan be applied to the specification iteratively as necessary.Let Pij be a program that achieves gij. In conventionalprograms, Pij is viewed as a function. There is no obviousmathematical model for merging independently developedPijs into the overall system program P since the output ofthe Pijs may be incompatible with each other. In ourapproach, Pij is viewed as a general relation and, hence, itreturns the set of all output values for each input. P can beobtained by simply forming the intersection of the outputsets of Pis, P � P1 \ P2 \ � � � \Pn, where Pi is the programfor achieving gi. Similarly, each Pi can be obtained via asystematic union operation from its components, i.e.,Pi � Pi1 [ Pi2 [ � � � [ Pini

.Since each relational program, Pij, returns the set of all

possible output values for each input, Pij is correct if andonly if there are no extra values or missing values in theoutput set. Having an extra value in the output set causesthe system to work incorrectly when the extra value ischosen as the final selection for the system output. On theother hand, missing values pose a serious problem in thecase where the system output set is obtained from anintersection of the output sets of the components since themissing values would be carried to the system output.Hence, the incorrect output sets can be categorized into two

types—one that is missing the necessary output values andthe other that contains extra values since we do not knowhow each output set will be used. Note also that anyincorrect value in the output set is considered as an extravalue. It is also reasonable to assume that the probability ofhaving missing values in the output set and the probabilityof having extra values in the output set are independent. LetProbmðPijÞ denote the probability that Pij has some missingvalues and ProbeðPijÞ denote the probability that Pij hassome extra values. Consequently, let ProbneðPijÞ be theprobability that Pij has no extra values and ProbnmðP Þ be theprobability that Pij has no missing values in the output set.

Lemma 7. For a relational program P , the reliability of P ¼ProbneðP Þ � ProbnmðP Þ.

Proof. Since a relational program returns a set of all possibleoutput values for a given input set, it is consideredcorrect if and only if there are no extra values nor anymissing values in the output set. The reliablity of P ,ReliabilityP , is then ProbfP has no extra value and P has nomissing valuesg ¼ ð1� ProbeðP ÞÞ � ð1� ProbmðP ÞÞ =ProbneðP Þ � ProbnmðP Þ. tu

The system level reliability then follows from therelational aspect level reliability in a simple way.

Lemma 8. If P1 and P2 are two relational programs, the reliabilityof the system,ReliabilityðP1 \ P2Þ, is bounded by the followingexpression: fProbneðP1Þ � ProbnmðP1Þg � fProbneðP2Þ �ProbnmðP2Þg � ReliabilityðP1 \ P2Þ provided P1 and P2 haveindependent failure processes.

Proof. If both P1 and P2 work correctly, then the system,P1 \ P2, works correctly. As seen in Lemma 7, a relationalprogramworks correctly if and only if it contains no extravalues nor missing values in its output set. Therefore, ifP1’s output set contains no extra values normissing valuesAND P2’s output set contains no extra values nor missingvalues, then the system works correctly. Then, thereliability of the system, ReliabilityðP1 \ P2Þ, becomes

fProbneðP1Þ � ProbnmðP1Þg � fProbneðP2Þ � ProbnmðP2Þg:However, there are some cases where the reliability of thesystemmaybehigher than the expressionmentioned. Thishappenswhen one relational component’s extra values donot fall within the range of the other relational compo-nent’s output set and/or one relational component’smissing values do not fall within the range of the otherrelational component’s output set. In such a situation,those extra or missing values get discarded duringintersection process. Therefore, the reliability of thesystem, ReliabilityðP1 \ P2Þ, is bounded by:

fProbneðP1Þ � ProbnmðP1Þg � fProbneðP2Þ � ProbnmðP2Þg

� ReliabilityðP1 \ P2Þ:ut

Lemma 9. If P1 and P2 are two relational programs, the reliabilityof the system,ReliabilityðP1 [ P2Þ, is bounded by the followingexpression: fProbneðP1Þ � ProbnmðP1Þg � fProbneðP2Þ �ProbnmðP2Þg � ReliabilityðP1 [ P2Þ provided P1 and P2 haveindependent failure processes.

226 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 30, NO. 4, APRIL 2004

Page 10: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

Proof. Theproof is similar to theproof ofLemma8except thatthe system is correct when one relational component’sextra values do fall within the range of the other relationalcomponent’s correct output set and/or one relationalcomponent’smissing values do fall within the range of theother relational component’s correct output set. Therefore,the reliability of the system, ReliabilityðP1 \ P2Þ, isbounded by: fProbneðP1Þ � ProbnmðP1Þg � fProbneðP2Þ �ProbnmðP2Þg � ReliabilityðP1 \ P2Þ. tu

6 CASE STUDY

In this section, we illustrate the framework-based reliabilityassessment approach using a multimedia collaboration tool.The tool not only provides voice chat capability usingVoIP asshown in Section 2, but also allows users to carry out videoconferencing aswell as file transfers. A high level descriptionof selected functionalities is as follows: Consider a multi-media collaboration tool that supports up to three users indoing voice and video conferencing. Each participant of acollaboration session can hear and see all participants of theconference. The tool also allows users to choose differentaudio codec techniques (i.e.,G.711,G.723.1, andG.729) for thevoice conferencing. Users of the tool can also monitor overallnetwork usages of a collaboration session as well as theaverage system resource usages. These usage statistics can beused for future billing purposes and resource optimizations.

The multimedia collaboration system enables a user to

1. invite up to two people to participate in a three-person voice/video conference and hear and see allparticipants,

2. select different audio codec technique that worksbest for a particular situation,

3. monitor overall network usages in terms of sessionrelated packets exchanged among all participants,and

4. select up to three session related resources tomonitor average resource usage.

Also, the framework presents the user with a set of toolsand menus that will enable the user to create a customizedway of viewing the readings of different usage readings.These are upgradable and extensible “visual aspects” andinclude time plots, use of different colors, use of differentgeometric shapes or icons, use of sound, moving bars orother objects along different scales (vertical, horizontal, 2D,along a curve, etc.), circular gauges, etc. The capabilityprovided to the user enables grouping and placement ofdifferent usage displays.

Let RF ðtÞ denote the overall reliability of the frameworkexcluding the reliabilities of the usage data acquisitionaspect, the visual aspect, the chosen audio codec technique,and the video displays. Assume that a user is monitoringthree resource usages, RS1, RS2, and RS3, two networkusages, N1 and N2 (i.e., network usages between two otherparticipants and himself or herself). To enhance the audioquality for a voice chatting session, the user is able to selecta codec scheme from three different options, G1, G2, and G3

using the ITU-T recommendations for codecs G.711,G.723.1, and G.729, respectively. In addition, there are threevideo output windows, V1, V2, and V3 to see all threeparticipants including himself or herself.

Assume that

1. the resource usage is combined together usingcomponent C1 and their average is displayed as amoving bar in display region D1,

2. the network usage is combined together usingcomponent C2 and the aggregate value is displayedas a color in display region D2,

3. the audio codec technique selection is displayed inregion D3(the user can only select one scheme at atime), and

4. the outputs of the Web-cams of the three participantsare displayed in regions D4, D5, and D6.

The overall display is a montage of six display objects.Hence, the system quality is

X6i¼1

qi �RDiðtÞ

!�RF ðtÞ:

Since D1 is a fusion of RS1, RS2, and RS3,

RD1ðtÞ ¼

Y3i¼1

RdataAcquisition�RSiðtÞ

!�Rcomposition�C1

ðtÞ�

Rvisualization�D1ðtÞ;

where RdataAcquisition�RSiðtÞ is the reliability of the plug-in

resource usage monitoring aspect RSi, 1 � i � 3,

Rcomposition�C1ðtÞ

is the reliability of the composition routine that computesthe average of RS1, RS2, and RS3, and Rvisualization�D1

ðtÞ isthe reliability of the visual aspect for display region D1.

Similarly, the composition for region D2 is of the fusiontype, so

RD2ðtÞ ¼

Y2i¼1

RdataAcquisition�NiðtÞ

!�Rcomposition�C2

ðtÞ�

Rvisualization�D2ðtÞ;

where RdataAcquisition�NiðtÞ is the reliability of the plug-in

network usagemonitoring aspectNi, 1 � i � 2,Rcomposition�C2

ðtÞ is the reliability of the composition routine that computesthe effective network usage from N1 and N2, andRvisualization�D2

ðtÞ is the reliability of the visual aspect fordisplay region D2. Since the composition for region D3 is ofthe selection type,

RD3ðtÞ ¼

X3i¼1

Pselect�Gi�RdataAcquisition�Gi

ðtÞ !

Rvisualization�D3ðtÞÞ;

where Pselect�Giis the probability that the ith codec

technique is selected, 1 � i � 3. The video camera displaysconform to the selection composition. Hence, for 4 � i � 6,

RDiðtÞ ¼ RdataAcquisition�Vi�3

ðtÞÞ �Rvisualization�DiðtÞ:

Hence, the overall reliability of the system is

X6i¼1

RDiðtÞ

!�RF ðtÞ:

KIM ET AL.: SYSTEMATIC RELIABILITY ANALYSIS OF A CLASS OF APPLICATION-SPECIFIC EMBEDDED SOFTWARE FRAMEWORKS 227

Page 11: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

The main benefit of the framework-based assessmentapproach is that the reliability of the system can be easilyrecomputed as aspects are modified or as new aspects areadded to the system since the system reliability is mathema-tically inferred from the aspect reliabilities. For example,consider a system of two compile-time temporally separableaspectsA andB. Since the system reliability,Reliabilitysystem,is ReliabilityA �ReliabilityB, if the aspect A is modified to orreplaced with A0, the system reliability can easily berecomputed using the reliability of A0 (i.e., Reliabilitysystemsimply becomes ReliabilityA0 �ReliabilityB).

7 RELATED WORK

Designing a software system as a framework that cansupport plug-in aspects is an effective way of simplifyingthe system and assuring high-quality by making thespecification of each aspect as well as the compositioncomponent more amenable to rigorous analysis. It alsoenables the framework to deal with generic application-independent issues, such as scheduling, buffering, commu-nication, security, fault tolerance, naming services, etc.

The basis for plug-in aspects can be traced to extensivework in the area of requirements decomposition. One of theearliest works is reported in [33] where a requirementsspecification is decomposed into multiple views, each ofwhich captures some behavior of the system. Each view isrepresented by a sequence diagram. This decompositionreduces the complexity of the system. However, twodifferent views are not necessarily independent, e.g., theycan interact via aliases in order to react in a compatible wayto a given input. The concept of multiple views has alsobeen used in Statecharts [17], Objectcharts [11], and otherrelated methods. It has also been applied to existinglanguages, e.g., Z [20]. The primary motivation for theseviews (achieved by grouping multiple states into one superstate) is to reduce the complexity of the underlying FiniteState Machine specification of the system. Again, interac-tions between machines (e.g., via synchronous events) canintroduce dependencies between different machines. RSML[25] is a significant extension to Statecharts with the goal ofachieving more easily understandable and reviewablespecifications. It also has a more intuitive step semantics,but the objective is to assure analyzability of complexspecifications rather than to identify plug-in softwareaspects.

Decomposition methods that persist over the life-cycleinclude separation using rely-guarantee assertions [24],behavioral inheritance [2], IDEAL components [7], andAspect-Oriented Programming [23]. These methods resultin distinct pieces of code that can be analyzed separatelyand are then formally composed together to form thesystem. The rely-guarantee-based approach achieves se-paration between different components by using a commoninterface language between two components with a precisespecification of rely and guarantee conditions [22] for eachseparate component. However, components are not re-quired to be observable by the end-user who may not evenbe aware of some interfaces, especially interfaces with innercomponents. Behavioral inheritance is an elegant way ofseparating out synchronization concerns from functionalconcerns in object-oriented languages [2]. The approachproposed in [2] uses multiple inheritance, by inheriting onefunctional component and one behavioral component. Itsatisfies independent assessability, but does not guarantee

an implementation-invariant state space (so, the systemproperties cannot be inferred from the component proper-ties). In the IDEAL systems approach [7], separate aspectsare used to ensure the invariance of the components. Inaddition, this approach enables the decomposition of abehavioral component into more than one component (e.g.,by having separate components for assuring mutualexclusion, priority, FIFO access, enforcement of precedence,etc.). Aspect-Oriented Programming (AOP) is a more recenttechnique [23] that strives for separation of concerns inimplementing object-oriented programs. Features that canbe used for more than one object, such as error detection,exception handling, and synchronization code, are sepa-rated from the main functionality of the objects. The codefor these features are written once along with identificationof the objects that will need the code and the positions/situations that will activate the code. Then, a preprocessor isused to “weave” the code for the features with the code forthe objects. There is substantial overlap between thephilosophy of Aspect Oriented Programming and frame-work plug-in aspects. However, there are also someimportant differences. For example, framework plug-inaspects can be executed as separate processes, while aspectsin AOP have to be statically “woven” together to form theprogram. Dynamic composition allows each process to beevaluated separately using model checking [19] andoperational profile testing [29]. Also, each aspect can bemade fault-tolerant more cost-effectively using designdiversity [14], exception handling, and other methods.

In summary, a lot of work has been done in decomposingspecifications into multiple views. The key feature of theapproach discussed in this paper is that the framework andcomposition method can be used to infer the systemreliability from the aspect reliabilities.

A problem with decomposition of a specification intosimpler components is how to compose the components toobtain a system with assessable properties. One difficultproblem is how to assure the consistency of the differentviews [3]. Nonmonotonic logic [9], especially paraconsistentnonmonotonic logic [10], provides some support, but itcannot handle all types of inconsistencies. Formal techni-ques have been developed to tag inconsistent specificationsand remove them either manually or by using rule-basedmethods [16]. This difficulty is compounded when differentspecification methods are used to specify different views[34]. Such multiparadigm methods can result in simplerspecifications by allowing the use of notations that enableeasy expression of specific aspects, e.g., Z, specification forabstract data types and Statecharts for reactive components.Inconsistencies are resolved by automatically translating allthe specifications into a uniform framework, typically a firstorder predicate calculus specification. These approaches donot address execution-time concerns, such as striving forend-user assessable components or ensuring that thereliability of the system can be inferred from the reliabilityof its components.

In the plug-in approach, the system requirementsspecification is decomposed based on conjunctive anddisjunctive connectives and directly mapped to simplecomposition operations, including fusion, nesting, selection,etc. Each aspect can be implemented and evaluatedindependently and the different aspects can be composeddynamically by the framework. Each aspect is directlyassessable at the system level and can be traced back to therequirements specification. This property facilitates fault-confinement and isolation. Also, inconsistencies can be

228 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 30, NO. 4, APRIL 2004

Page 12: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

detected during some composition methods, for example,set intersection in fusion composition, and can be resolvedby assigning priorities to components. Another nice featureis that aspect reliability estimates can be statisticallycombined to obtain the system reliability. Further, eachaspect has relatively few states and transitions, whichmakes it feasible to develop highly reliable components andassess their reliabilities to a high degree of confidence.

8 SUMMARY

In this paper, we have presented and evaluated a frame-work-based approach for decomposing and implementingcomplex software systems that 1) enable the systemreliability to be rigorously inferred from the aspectreliabilities based on the composition method, 2) allow theaspects to be upgraded or removed, and 3) allow newaspects to be added dynamically. In both the latter twocases, the reliability of the system can be inferred withouthaving to test the entire system again.

The approach uses orthogonal decomposition methodsto partition a complex software system into a fixed part,consisting of the framework and composition components,and slots in which application aspects can be addeddynamically as needed. Since each aspect is independentof other aspects and is substantially simpler than the wholesystem, the verification and analysis of each aspect is muchsimpler than attempting to verify the entire system as onesingle monolithic entity. Further, it is possible to selectivelyuse verification or analysis and testing depending on thecharacteristics of each aspect in order to maximize theconfidence in the correctness of the system.

In addition to the above advantages, the approach alsoallows aspect-level hardening for redundancy, customiza-tion, etc. With end-user observability of aspects, each aspectcan be tested, validated, and verified by the end-user (notthe developers). Furthermore, interactions among aspectsare straight-forward, and users do not have to worry aboutnew glue code synthesis when changing aspects. Theapproach also supports multiple architectures in imple-mentation of different parts of the system.

The approach currently has some limitations, the mainone being that the application must be decomposable into aset of IDEAL aspects. The need for relational compositionfor some of the aspects implies that the approach is mostsuitable for applications where the output sets can berepresented concisely, such as continuous process-controlsystems, or where the composition can be done statically.The approach has been successfully applied to process-control systems [8], a high-assurance measurement reposi-tory system [6], and telecommunications systems. Thecurrent version of the approach is not optimized, andmethods of improving the performance of the frameworkneeds more investigation. The framework currently en-compasses two types of architectures, namely, the pipes-and-filters architecture and the shared repository architec-ture. A future research direction is to investigate theapplicability of the approach to other types of softwarearchitectures. It is also interesting to explore the possibilityof other composition methods that support plug-in applica-tion aspects. Other research issues include incorporatingquality of service specifications by associating a quality ofservice measure with each output, decomposition of a givenspecification into finer aspects, and the automated identi-fication of such decompositions.

ACKNOWLEDGMENTS

This research was supported in part by the US NationalScience Foundation under grant numbers CCR-9900922 andEIA-0103709 and by the Texas ATP under grant number009741-0143-1999.

REFERENCES

[1] J. Agre and L. Clare, “An Integrated Architecture for CooperativeSensing Networks,” Computer, vol. 33, no. 5, pp. 106-108, May 2000.

[2] C. Atkinson, Object-Oriented Reuse, Concurrency and Distribution.New York: Addison-Wesley & ACM Press, 1991.

[3] R. Balzer, “Tolerating Inconsistency,” Proc. 13th Int’l Conf. SoftwareEng., pp. 158-165, May 1991.

[4] F.B. Bastani, B. Cukic, V. Hilford, and A. Jamoussi, “TowardDependable Safety-Critical Software,” Proc. Second WorkshopObject-Oriented Real-Time Dependable Systems, Feb. 1996.

[5] F.B. Bastani, “Relational Programs: Architecture for RobustProcess-Control Programs,” Annals, 1999.

[6] F.B. Bastani, S. Ntafos, I.-L. Yen, D.E. Harris, R.R. Morrow, and R.Paul, “High-Assurance Measurement Repository System,” Proc.Fifth IEEE Int’l Symp. High Assurance Systems Eng., Nov. 2000.

[7] F.B. Bastani, I.-L. Yen, S. Kim, J. Linn, and K. Rao, “Reliability ofSystems of Independently Developable End-User AssessableLogical (IDEAL) Programs,” Proc. IEEE Int’l Symp. SoftwareReliability Eng., Nov. 2001.

[8] F.B. Bastani, I.-L. Yen, and S. Kim, “Highly Reliable RelationalControl Programs for Robust Rapid Transit Systems,” Proc. SixthIEEE Int’l Symp. High Assurance Systems Eng., Oct. 2001.

[9] J. Bell, “Non-Monotonic Reasoning, Non-Monotonic Logics, andReasoning about Change,” Artificial Intelligence Rev., vol. 4, pp. 79-108, 1990.

[10] H. Blair and V.S. Subrahmanian, “Paraconsistent Logic Program-ming,” Theoretical Computer Science, vol. 68, pp. 135-154, 1989.

[11] D. Coleman, F. Hayes, and S. Bear, “Introducing Objectcharts orHow to Use Statecharts in Object-Oriented Design,” IEEE Trans.Software Eng., vol. 18, no. 1, pp. 9-18, Jan. 1992.

[12] B. Cukic and F.B. Bastani, “On Reducing the Sensitivity ofSoftware Reliability to Variations in the Operational Profile,” Proc.IEEE Int’l Symp. Software Reliability Eng., Oct. 1996.

[13] J.B. Dugan and R. Van Buren, “Reliability Evaluation of Fly-by-Wire Computer Systems,” J. Systems and Safety, June 1993.

[14] J.B. Dugan and M.R. Lyu, “System Reliability Analysis of an N-Version Programming Application,” IEEE Trans. Reliability,pp. 513-519, Dec. 1994.

[15] W.W. Everett, “Software Reliability Component Analysis,” Proc.Ninth Ann. Software Reliability Engineering Workshop, July 1998.

[16] A.C.W. Finkelstein, D. Gabbay, A. Hunter, J. Kramer, and B.Nuseibeh, “Inconsistency Handling in Multiperspective Specifica-tions,” IEEE Trans. Software Eng., vol. 20, no. 8, pp. 569-578, Aug.1994.

[17] D. Harel, “Statecharts: A Visual Formalism for Complex Systems,”Science of Computer Programming, vol. 8, pp. 231-274, 1987.

[18] D. Harel, H. Lachover, A. Naamad, A. Pnueli, M. Politi, R.Sherman, A. Shtull-Trauring, and M. Trakhtenbrot, “Statemate: AWorking Environment for the Development of Complex ReactiveSystems,” IEEE Trans. Software Eng., vol. 16, no. 4, pp. 403-414,Apr. 1990.

[19] C.L. Heitmeyer, B.L. Labaw, and D. Kiskis, “Consistency Check-ing of SCR-Style Requirements Specifications,” Proc. Second IEEEInt’l Symp. Reliability Eng., pp. 56-63, Mar. 1995.

[20] D. Jackson, “Structuring Z Specifications with Views,” ACM Trans.Software Eng. and Methodology, vol. 4, no. 4, pp. 365-389, Oct. 1995.

[21] D.N. Jayasimha, S.S. Iyengar, and R.L. Kashyap, “InformationIntegration and Synchronization in Distributed Sensor Networks,”IEEE Trans. Systems, Man, and Cybernetics, vol. 21, no. 5, pp. 1032-1043, Sept./Oct. 1991.

[22] C.B. Jones, “Tentative Steps Towards a Development Method forInterfering Programs,” ACM Trans. Programming Languages andSystems, vol. 5, no. 4, pp. 596-619 Oct. 1983.

[23] G. Kiczales, J. Lamping, A. Mendhekar, C. Maeda, C.V. Lopes, J.-M. Loigtier, and J. Irwin, “Aspect-Oriented Programming,” Proc.European Conf. Object-Oriented Programming, June 1997.

[24] S.S. Lam and A.U. Shankar, “A Theory of Interfaces and Modules:I—Composition Theorem,” IEEE Trans. Software Eng., vol. 20, no. 1,pp. 55-71, Jan. 1994.

KIM ET AL.: SYSTEMATIC RELIABILITY ANALYSIS OF A CLASS OF APPLICATION-SPECIFIC EMBEDDED SOFTWARE FRAMEWORKS 229

Page 13: Systematic reliability analysis of a class of application ...people.cs.vt.edu/~irchen/ps/tse04-final.pdfembedded system consists of an array of sensors, actuators, displays, and control

[25] N.G. Leveson, M.P.E. Heimdahl, H. Hildreth, and J.D. Reese,“Requirements Specification for Process-Control Systems,” IEEETrans. Software Eng., vol. 20, no. 9, pp. 684-707, Sept. 1994.

[26] Handbook of Software Reliability Eng.,M. Lyu, ed., McGraw-Hill andIEEE Press, 1996.

[27] D. Mason and D. Woit, “Software System Reliability fromComponent Reliability,” Proc. Ninth Ann. Software Reliability Eng.Workshop, July 1998.

[28] J.D. Musa, A. Iannino, and K. Okumoto, Software Reliability:Measurement, Prediction, Applications, professional ed., McGraw-Hill, pp. 178-180, 1990.

[29] J.D. Musa, “Operational Profiles in Software Reliability Engineer-ing,” IEEE Software, pp. 14-32, Mar. 1993.

[30] D.A. Russo, Private Communication, May 2000.[31] C. Smidts, D. Sova, and G.K. Mandela, “An Architectural Model

for Software Reliability Quantification,” Proc. Eighth Int’l Symp.Software Reliability Eng., pp. 324-335, Nov. 1997.

[32] V.L. Winter, Private Comm., July 1998.[33] P. Zave, “A Distributed Alternative to Finite-State-Machine

Specifications,” ACM Trans. Programming Languages and Systems,vol. 7, no. 1, pp. 10-36, Jan. 1985.

[34] P. Zave and M. Jackson, “Where Do Operations Come From? AMultiparadigm Specification Technique,” IEEE Trans. SoftwareEng., vol. 22, no. 7, pp. 508-528, July 1996.

Sung Kim received the BA degree from theUniversity of Texas at Austin and the MS degreein computer science from the University of Texasat Dallas. He is currently working on the PhDdegree in computer science at the University ofTexas at Dallas, while working as a member ofScientific Staff at Nortel Networks, Inc. Hisresearch interests are in the areas of softwaredecomposition, high-assurance software sys-tems engineering, software reliability assess-

ment, component-based software security, and certifications.

Farokh B. Bastani received the BTech degree in electrical engineeringfrom the Indian Institute of Technology, Bombay, and the MS and PhDdegrees in computer science from the University of California, Berkeley.He is currently a professor of computer science at the University ofTexas at Dallas. His research interests are in the areas of relationalprograms, high-assurance hardware/software systems engineering,hardware/software reliability assessment, self-stabilizing systems, in-herent fault tolerance, and high-performance modular parallel programs.Dr. Bastani was the Editor-in-Chief of the IEEE Transactions onKnowledge and Data Engineering and is on the editorial boards of theInternational Journal of Artificial Intelligence Tools, the Journal ofKnowledge and Information Systems (KAIS), and the Springer-Verlagbook series on Knowledge and Information Management (KAIM). Hewas the program cochair of the 1997 Symposium on Reliable DistributedSystems and the 1998 International Symposium on Software ReliabilityEngineering, the program vice-chair of the 1999 Symposium onAutonomous Decentralized Systems, and the program chair for the1995 International Conference on Tools with Artificial Intelligence. Hehas been on the program committees of numerous conferences andworkshops and on the steering committee of the IEEE Symposium onHigh-Assurance Systems Engineering, the IEEE Symposium onApplication-Specific Software Engineering and Technology, and theIEEE Knowledge and Data Engineering Workshop. He was on theeditorial board of the IEEE Transactions on Software Engineering andthe High Integrity Systems Journal. He was a guest editor of the April1992 special issue of IEEE Transactions on Knowledge and DataEngineering devoted to self-organizing systems and the December1985, January 1986, and November 1993 special issues of IEEETransactions on Software Engineering devoted to software reliability. Heis a guest editor of a special issue of the International Journal ofSoftware Engineering and Knowledge Engineering (JSEKE) devoted toembedded software systems. He is a member of ACM and the IEEE.

I-Ling Yen received the BS degree from Tsing-Hua University, Taiwan, and the MS and PhDdegrees in computer science from the Universityof Houston. She is currently an associateprofessor of computer science at the Universityof Texas at Dallas. Her research interestsinclude fault-tolerant computing, security sys-tems and algorithms, distributed systems, self-stabilizing systems, and Internet technologies.She has developed many efficient fault tolerance

methods, including inherent fault tolerance, multilevel robust datastructures, and scalable, adaptive replication protocols. Dr. Yen andher students have developed a mobile agent system, PeAgent, thatsupports enhanced security and fine-grained access control. Thesystem is used in a Weblet infrastructure to achieve better securityand performance for Web services. In addition, she has developed anovel secure computation algorithm which is designed for commercialdatabase systems. Dr. Yen has served as a program committeemember for many conferences and program chair or cochair for theIEEE Symposium on Application-Specific Software and System En-gineering and Technology, the IEEE High Assurance SystemsEngineering Symposium, the IEEE International Computer Softwareand Applications Conference, and the IEEE International Symposium onAutonomous Decentralized Systems. Dr. Yen is a member of the IEEEand the IEEE Computer Society.

Ing-Ray Chen received the BS degree from theNational Taiwan University, Taipei, Taiwan, andthe MS and PhD degrees in computer sciencefrom the University of Houston, Texas. He iscurrently an associate professor in the Depart-ment of Computer Science at Virginia Tech. Hisresearch interests include mobile computing,pervasive computing, multimedia, distributedsystems, real-time intelligent systems, andreliability and performance analysis. Dr. Chen

has served on the program committees of numerous conferences,including being the program chair of the 14th IEEE InternationalConference on Tools with Artificial Intelligence in 2002, and the ThirdIEEE Symposium on Application-Specific Systems and SoftwareEngineering Technology in 2000. Dr. Chen currently serves as anassociate editor for the IEEE Transactions on Knowledge and DataEngineering, the Computer Journal, and the International Journal onArtificial Intelligence Tools. He is a member of the IEEE, the IEEEComputer Society, and the ACM.

. For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.

230 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 30, NO. 4, APRIL 2004