Post on 06-Jan-2018

SENG521 (Fall 2002) far@enel.ucalgary.ca 1

SENG 521 Software Reliability & Testing

Overview of Software Reliability Engineering

Department of Electrical & Computer Engineering, University of Calgary

B.H. Far (far@enel.ucalgary.ca)
http://www.enel.ucalgary.ca/~far/Lectures/SENG521/01/

Contents

About this course.
What is software reliability?
What factors affect software quality?
What is software reliability engineering?
Software reliability engineering process.

Section 1: Basic Concepts & Definitions

Realities …

Software development is a very high-risk task.
About 20% of software projects are canceled (missed schedules, etc.).
About 84% of software projects are incomplete when released (need patches, etc.).
Almost all software projects exceed their initial cost estimates (cost overrun).

Software Engineering /1

Business software has a large number of parts that have many interactions (i.e., complexity).
Software engineering paradigms provide models and techniques that make it easier to handle complexity.
A number of contemporary software engineering paradigms have been proposed:
  Object-orientation
  Component-ware
  Design patterns
  Software architectures
  etc.

Software Engineering /2

Evolution of software engineering paradigms:
  Assembly languages
  Procedural and structured programming
  Object-oriented programming
  Component-ware
  Design patterns
  Software architectures
  ...
  Software agents

[Figure: paradigms ordered over time, with complexity increasing. Earlier languages have their conceptual basis determined by machine architecture; later languages have their key abstractions rooted in the problem domain.]

What Affects Software?

Timeliness:
  Meeting the project deadline.
  Reaching the market at the right time.
Cost:
  Meeting the anticipated project costs.
Reliability:
  Working fine for the designated period on the designated system.

Definition: Failure & Availability

Failure: Any departure of system behavior in execution from user needs.
Failure intensity: The number of failures per natural unit or time unit. Failure intensity is a way of expressing reliability.
Availability: The probability at any given time that a system, or a capability of a system, functions satisfactorily in a specified environment. If you are given an average downtime per failure, availability implies a certain reliability.
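
The link between availability and failure intensity can be made concrete. The sketch below assumes a steady state in which failures occur at a constant failure intensity (failures per operating hour) and each failure costs a fixed mean downtime; under that assumption, availability = 1 / (1 + failure intensity × mean downtime). The function names and the example numbers are illustrative and do not come from the slides.

```python
def availability(failure_intensity: float, mean_downtime: float) -> float:
    """Availability implied by a failure intensity (failures per operating hour)
    and a mean downtime per failure (hours). Assumes a steady state."""
    return 1.0 / (1.0 + failure_intensity * mean_downtime)


def failure_intensity_for(target_availability: float, mean_downtime: float) -> float:
    """Failure intensity that yields the target availability (inverse of the above)."""
    return (1.0 - target_availability) / (target_availability * mean_downtime)


if __name__ == "__main__":
    # Hypothetical requirement: 99.9% availability, 6 minutes (0.1 h) downtime per failure.
    lam = failure_intensity_for(0.999, mean_downtime=0.1)
    print(f"required failure intensity: {lam:.5f} failures/hour")
    print(f"availability check: {availability(lam, 0.1):.4f}")
```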

Definition: Verification & Validation

Verification: For each development phase or for each module, are the outputs and inputs generated correctly, and do they match correctly?
Validation: Does the software meet its requirements?

Definition: Reliability

Reliability is the probability that a system or a capability of a system functions without failure for a “specified time” or “number of natural units” in a specified environment. (Musa, et al.)
A recent survey of software consumers revealed that reliability was the most important quality attribute of the application software.
This course is concerned with the engineering of reliable software products.
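
As a small illustration of the reliability definition, the sketch below converts between reliability and failure intensity. It assumes a constant failure intensity over the period of interest, in which case reliability for a duration is exp(-failure intensity × duration). That assumption, the function names, and the example figures are not taken from the slides.

```python
import math


def reliability(failure_intensity: float, duration: float) -> float:
    """Probability of failure-free operation for `duration`,
    assuming a constant failure intensity (same time or natural unit)."""
    return math.exp(-failure_intensity * duration)


def failure_intensity_for(reliability_target: float, duration: float) -> float:
    """Constant failure intensity that gives the target reliability over `duration`."""
    return -math.log(reliability_target) / duration


if __name__ == "__main__":
    # Hypothetical target: 99% chance of surviving an 8-hour shift without failure.
    lam = failure_intensity_for(0.99, 8.0)
    print(f"failure intensity: {lam:.6f} failures/hour")
    print(f"reliability check: {reliability(lam, 8.0):.2f}")
```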

About This Course …

The topics discussed include:
  Concepts and relationships;
  analytical models and supporting tools;
  techniques for software reliability improvement, including: fault avoidance, fault elimination, fault tolerance, error detection and repair, failure detection and retraction;
  risk management.

Section 2: Reliability

Reliability: Natural System

Natural system life cycle.
Aging effect: Life span of a natural system is limited by the maximum reproduction rate of the cells.

Reliability: Hardware

Hardware life cycle.
Useful life span of a hardware system is limited by the age (wear out) of the system.

Reliability: Software

Software life cycle.
Software systems are changed (updated) many times during their life cycle.
Each update adds to the structural deterioration of the software system.

Software vs. Hardware

Software reliability doesn’t decrease with time.
Hardware faults are mostly physical faults.
Software faults are mostly design faults, which are harder to measure, model, detect and correct.

Reliability: Science

Exploring ways of implementing “reliability” in software products.
Reliability Science’s goals:
  Developing “models” and “techniques” to build reliable software.
  Testing such models and techniques for adequacy, soundness and completeness.

Section 3: Reliability Engineering

What is Engineering?

Engineering = Analysis + Design + Construction + Verification + Management

What is the problem to be solved?
What characteristics of the entity are used to solve the problem?
How will the entity be realized?
How is it constructed?
What approach is used to uncover errors in design and construction?
How will the entity be supported in the long term?

Reliability: Engineering /1

Engineering of “reliability” in software products.
Reliability Engineering’s goal: developing software to reach the market
  with “minimum” development time,
  with “minimum” development cost,
  with “maximum” reliability.

[Figure: Software Quality]

Reliability: Engineering /2

Pick quantitative representations for the 3 factors (cost, time and reliability) and measure them!
Software quality means getting the right balance among development cost, development time and reliability.

[Figure labels: SRE; Minimum & Maximum; Cost, Time, Reliability; Optimum]

What is SRE? /1

Software Reliability Engineering (SRE) is a multi-faceted discipline covering the software product lifecycle.
It involves both technical and management activities in three basic areas:
  Software development and maintenance,
  Measurement and analysis of reliability data,
  Feedback of reliability information into the software lifecycle activities.

What is SRE? /2

SRE is a practice for quantitatively planning and guiding software development and test, with emphasis on reliability and availability.
SRE simultaneously does three things:
  It ensures that product reliability and availability meet user needs.
  It delivers the product to market faster.
  It increases productivity, lowering product life-cycle cost.
In applying SRE, one can vary the relative emphasis placed on these three factors.

Section 4: Software Reliability Engineering (SRE) Process

SRE: Process /1

There are 5 steps in the SRE process (for each system to test):
1) Define necessary reliability
2) Develop operational profiles
3) Prepare for test
4) Execute test
5) Apply failure data to guide decisions

SRE: Process /2

The Define Necessary Reliability, Develop Operational Profiles, and Prepare for Test activities all start during the Requirements and Architecture phases of the software development process.
They all extend to varying degrees into the Design and Implementation phase, as they can be affected by it.
The Execute Test and Guide Test activities coincide with the Test phase.

SRE: Necessary Reliability

Define what “failure” means for the product.
Choose a common measure for all failure intensities, either failures per some natural unit or failures per hour.
Set the total system failure intensity objective (FIO).
Compute a developed software FIO by subtracting the total of the FIOs of all hardware and acquired software components from the system FIO.
Use the developed software FIO to track the reliability growth during system test.
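
The subtraction step above can be sketched as follows. All FIOs must be expressed in the same measure; the function name and the example values (failures per million transactions) are hypothetical.

```python
from typing import Iterable


def developed_software_fio(system_fio: float, component_fios: Iterable[float]) -> float:
    """FIO left for the developed software after subtracting the FIOs of all
    hardware and acquired software components from the system FIO.
    All values must use the same measure (e.g., failures per hour)."""
    remaining = system_fio - sum(component_fios)
    if remaining <= 0:
        raise ValueError("component FIOs already use up the whole system FIO")
    return remaining


if __name__ == "__main__":
    # Hypothetical budget: system FIO of 100, hardware 10, acquired operating system 20.
    print(developed_software_fio(100.0, [10.0, 20.0]))  # prints 70.0
```

A non-positive result means the system objective cannot be met with the chosen components, so the sketch raises an error instead of returning a meaningless budget.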

SRE: Operational Profile /1

An operation is a major system logical task, which returns control to the system when complete.
An operational profile is a complete set of operations with their probabilities of occurrence.

SRE: Operational Profile /2

There are four principal steps in developing an operational profile:
1) Identify the operation initiators
2) List the operations invoked by each initiator
3) Determine the occurrence rates
4) Determine the occurrence probabilities by dividing the occurrence rates by the total occurrence rate
There are three kinds of initiators: user types, external systems, and the system itself.
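
Steps 3 and 4 amount to a normalization. A minimal sketch, with made-up operation names and occurrence rates (operations per hour):

```python
from typing import Dict


def operational_profile(occurrence_rates: Dict[str, float]) -> Dict[str, float]:
    """Divide each operation's occurrence rate by the total occurrence rate."""
    total = sum(occurrence_rates.values())
    if total <= 0:
        raise ValueError("total occurrence rate must be positive")
    return {op: rate / total for op, rate in occurrence_rates.items()}


if __name__ == "__main__":
    # Hypothetical operations and rates, for illustration only.
    rates = {"process call": 9000.0, "generate bill": 900.0, "audit account": 100.0}
    for operation, probability in operational_profile(rates).items():
        print(f"{operation:15s} {probability:.2f}")
```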

SRE: Operational Profile /3

Review the operational profile:
  Review the functionality to be implemented to remove operations that are not likely to be worth their cost.
  Suggest operations where opportunities for reuse will be most cost-effective.
  Plan a more competitive release strategy using operational development. With operational development, development proceeds operation by operation, ordered by the operational profile. This makes it possible to deliver the most used, most critical capabilities to customers earlier than scheduled.
  Allocate resources for requirements, design, and code reviews among operations to cut schedules and costs.
  Allocate system engineering, architectural design, development, and code resources among operations to cut schedules and costs.
  Allocate development, code, and test resources among modules to cut schedules and costs.

SRE: Prepare for Test

The Prepare for Test activity uses the operational profiles to prepare test cases and test procedures.
Test cases are allocated in accordance with the operational profile.
Test cases are assigned to the operations by selecting from all the possible intra-operation choices with equal probability.
The test procedure is the controller that invokes test cases during execution.
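
One way to allocate a fixed budget of test cases in proportion to the operational profile is sketched below. The rounding rule (largest remainder) is an assumption of this sketch, not something the slides prescribe, and the profile values are hypothetical.

```python
import math
from typing import Dict


def allocate_test_cases(profile: Dict[str, float], total_cases: int) -> Dict[str, int]:
    """Split `total_cases` among operations in proportion to their probabilities.
    Fractions are rounded down, then leftover cases go to the operations
    with the largest fractional remainders."""
    exact = {op: p * total_cases for op, p in profile.items()}
    allocation = {op: math.floor(x) for op, x in exact.items()}
    leftover = total_cases - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda op: exact[op] - allocation[op], reverse=True)
    for op in by_remainder[:leftover]:
        allocation[op] += 1
    return allocation


if __name__ == "__main__":
    # Hypothetical profile, for illustration only.
    profile = {"process call": 0.90, "generate bill": 0.09, "audit account": 0.01}
    print(allocate_test_cases(profile, 500))
```

A purely proportional split can leave rare operations with very few test cases, so a real plan may add a minimum per operation.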

SRE: Execute Test

Allocate test time among the associated systems and types of test (feature, load, regression, etc.).
Invoke the test cases at random times, choosing operations randomly in accordance with the operational profile.
Identify failures, along with when they occur.
This information will be used in Apply Failure Data and Guide Test.
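
Choosing operations randomly in accordance with the operational profile is weighted random sampling. A minimal sketch using the standard library; the profile and the function name are hypothetical, and the seed is there only to make a run repeatable.

```python
import random
from typing import Dict, List, Optional


def pick_operations(profile: Dict[str, float], count: int,
                    seed: Optional[int] = None) -> List[str]:
    """Draw `count` operations, each chosen with its operational-profile probability."""
    rng = random.Random(seed)
    operations = list(profile)
    weights = [profile[op] for op in operations]
    return rng.choices(operations, weights=weights, k=count)


if __name__ == "__main__":
    # Hypothetical profile, for illustration only.
    profile = {"process call": 0.90, "generate bill": 0.09, "audit account": 0.01}
    run = pick_operations(profile, 1000, seed=1)
    for op in profile:
        print(f"{op:15s} selected {run.count(op)} times")
```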

Types of Test

Reliability Growth Test
Certification Test

SRE: Apply Failure Data

Plot each new failure as it occurs on a reliability demonstration chart.
Accept or reject software (operations) using the reliability demonstration chart.
Track reliability growth as faults are removed.
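
The accept/reject decision on a reliability demonstration chart can be sketched as a sequential probability ratio test. The slides do not give the chart's construction, so everything here is an assumption: failures are taken to follow a Poisson process, time is normalized by multiplying the failure time by the failure intensity objective, `alpha` and `beta` are the risks of a wrong reject and a wrong accept, and `gamma` is the discrimination ratio between the objective and the unacceptable failure intensity.

```python
import math


def demonstration_decision(failure_number: int, normalized_time: float,
                           alpha: float = 0.1, beta: float = 0.1,
                           gamma: float = 2.0) -> str:
    """Classify the n-th failure, observed at `normalized_time`
    (failure time x failure intensity objective), as accept, reject or continue."""
    a = math.log(beta / (1.0 - alpha))
    b = math.log((1.0 - beta) / alpha)
    slope_term = failure_number * math.log(gamma)
    accept_boundary = (a - slope_term) / (1.0 - gamma)  # enough failure-free time
    reject_boundary = (b - slope_term) / (1.0 - gamma)  # failures arriving too fast
    if normalized_time >= accept_boundary:
        return "accept"
    if normalized_time <= reject_boundary:
        return "reject"
    return "continue"


if __name__ == "__main__":
    # Hypothetical observations: (failure number, normalized failure time).
    for n, t in [(1, 0.4), (2, 1.0), (3, 1.3)]:
        print(n, t, demonstration_decision(n, t))
```

Points between the two boundaries call for continued testing; smaller risk levels or a smaller discrimination ratio widen that region.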

Collect Field Data

SRE for the software product lifecycle.
Collect field data to use in succeeding releases, either using automatic reporting routines or manual collection, using a random sample of field sites.
Collect data on failure intensity and on customer satisfaction and use this information in setting the failure intensity objective for the next release.
Measure operational profiles in the field and use this information to correct the operational profiles we estimated.
Collect information to refine the process of choosing reliability strategies in future projects.

Section 5: Error & Failure

Definition: Fault

A fault is a cause for either a failure of the program or an internal error (e.g., an incorrect state, incorrect timing).
A fault must be detected and then removed.
A fault can be removed without execution (e.g., code inspection, design review).
Fault removal due to execution depends on the occurrence of the associated “failure”.
Occurrence depends on the length of execution time and the operational profile.

Definition: Error

Error has two meanings:
  A discrepancy between a computed, observed or measured value or condition and the true, specified or theoretically correct value or condition.
  A human action that results in software containing a fault.
Human errors are the hardest to detect.

More Definitions

Defect: refers to either fault (cause) or failure (effect).
Service: expected behavior of a software system.
Availability: system uptime divided by the sum of system uptime and downtime.

$$\text{Availability} = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}}$$

Failure Specification /1

1) Time of failure
2) Time interval between failures
3) Cumulative failure up to a given time
4) Failures experienced in a time interval

Failure no.   Failure time (hours)   Failure interval (hours)
1             10                     10
2             19                     9
3             32                     13
4             43                     11
5             58                     15
6             70                     12
7             88                     18
8             103                    15
9             125                    22
10            150                    25
11            169                    19
12            199                    30
13            231                    32
14            256                    25
15            296                    40

Time based failure specification

Failure Specification /2

1) Time of failure
2) Time interval between failures
3) Cumulative failure up to a given time
4) Failures experienced in a time interval

Time(s)   Cumulative failures   Failures in interval
30        2                     2
60        5                     3
90        7                     2
120       8                     1
150       10                    2
180       11                    1
210       12                    1
240       13                    1
270       14                    1

Failure based failure specification
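
The four ways of specifying failures carry the same information and can be derived from one another. A minimal sketch, starting from the failure times in the time-based table; the function names are illustrative, and the 30-unit period is assumed to use the same time unit as the failure times.

```python
from typing import List

# Failure times copied from the time-based table.
FAILURE_TIMES = [10, 19, 32, 43, 58, 70, 88, 103, 125, 150, 169, 199, 231, 256, 296]


def failure_intervals(times: List[int]) -> List[int]:
    """Time between consecutive failures (the first interval starts at 0)."""
    return [t - prev for prev, t in zip([0] + times[:-1], times)]


def cumulative_failures(times: List[int], period: int, end: int) -> List[int]:
    """Number of failures observed up to each multiple of `period`, through `end`."""
    return [sum(1 for t in times if t <= boundary)
            for boundary in range(period, end + 1, period)]


def failures_per_period(cumulative: List[int]) -> List[int]:
    """Failures experienced in each period, from the cumulative counts."""
    return [c - prev for prev, c in zip([0] + cumulative[:-1], cumulative)]


if __name__ == "__main__":
    cumulative = cumulative_failures(FAILURE_TIMES, period=30, end=270)
    print(failure_intervals(FAILURE_TIMES))
    print(cumulative)                       # [2, 5, 7, 8, 10, 11, 12, 13, 14]
    print(failures_per_period(cumulative))  # [2, 3, 2, 1, 2, 1, 1, 1, 1]
```

With a period of 30, the counts match the failure-based table.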

Failure Specification /3

Many reliability modeling programs and tools based on them (e.g., SMERFS and CASRE) have the capability to estimate model parameters from either “failure count” or “time interval between failures” data.

Failure Functions /1

Cumulative Failure Function (mean value function) denotes the average cumulative failures associated with each time point.

Failures in time period   Probability   Value × Probability
0                         0.10          0.00
1                         0.18          0.18
2                         0.22          0.44
3                         0.16          0.48
4                         0.11          0.44
5                         0.08          0.40
6                         0.05          0.30
7                         0.04          0.28
8                         0.03          0.24
9                         0.02          0.18
10                        0.01          0.10
Cumulative failure                      3.04

Failure distribution
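
The value in the last row is the expected number of failures, i.e., the sum of each failure count multiplied by its probability. A short check using the probabilities from the table (the names are illustrative):

```python
from typing import List

# Probability of observing 0, 1, ..., 10 failures in the time period (from the table).
PROBABILITIES = [0.10, 0.18, 0.22, 0.16, 0.11, 0.08, 0.05, 0.04, 0.03, 0.02, 0.01]


def mean_failures(probabilities: List[float]) -> float:
    """Expected number of failures: sum of (failure count x probability)."""
    return sum(count * p for count, p in enumerate(probabilities))


if __name__ == "__main__":
    print(f"total probability: {sum(PROBABILITIES):.2f}")
    print(f"mean cumulative failures: {mean_failures(PROBABILITIES):.2f}")
```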

Failure Functions /2

Failure Intensity Function (FIF) represents the rate of change of the cumulative failure function.
As faults are removed, failure intensity tends to drop and reliability tends to increase.

Failure Functions /3

Mean Time to Failure (MTTF): expected time at which the next failure will be observed.

$$\text{MTTF} = \int_{0}^{\infty} R(x)\,dx$$

R(x) is the reliability.

Mean Time to Repair (MTTR): expected time until the system will be repaired.
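
The MTTF integral can be checked numerically. The sketch below assumes an exponential reliability function R(x) = exp(-λx), for which the integral equals 1/λ. The choice of R(x), the value of λ, the truncation point and the trapezoidal rule are all assumptions made for illustration.

```python
import math
from typing import Callable


def mttf(reliability: Callable[[float], float], upper: float,
         steps: int = 100_000) -> float:
    """Trapezoidal approximation of the integral of R(x) from 0 to `upper`.
    `upper` should be large enough that R(upper) is negligible."""
    h = upper / steps
    total = 0.5 * (reliability(0.0) + reliability(upper))
    for i in range(1, steps):
        total += reliability(i * h)
    return total * h


if __name__ == "__main__":
    failure_rate = 0.02  # hypothetical: 0.02 failures per hour
    estimate = mttf(lambda x: math.exp(-failure_rate * x), upper=2000.0)
    print(f"numerical MTTF: {estimate:.3f} hours, closed form: {1 / failure_rate:.3f} hours")
```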

Failure Functions /4

Failure Rate Function: the probability per unit time that a failure occurs in the interval [t, t+Δt], given that no failure has occurred before t.
Mean Time Between Failures (MTBF): MTBF = MTTF + MTTR
Availability can also be defined as:

$$\text{Availability} = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}} = \frac{\text{MTTF}}{\text{MTBF}}$$
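
A small worked instance of the two formulas; the MTTF and MTTR values are hypothetical.

```python
def mtbf(mttf: float, mttr: float) -> float:
    """Mean time between failures: MTBF = MTTF + MTTR."""
    return mttf + mttr


def availability(mttf: float, mttr: float) -> float:
    """Availability = MTTF / (MTTF + MTTR) = MTTF / MTBF."""
    return mttf / mtbf(mttf, mttr)


if __name__ == "__main__":
    # Hypothetical system: fails on average after 98 hours, takes 2 hours to repair.
    print(f"MTBF: {mtbf(98.0, 2.0):.0f} hours")
    print(f"availability: {availability(98.0, 2.0):.2f}")
```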

Failure Functions /5

Failure(s) in time period   Probability, elapsed time (1 hour)   Probability, elapsed time (5 hours)
0                           0.10                                 0.01
1                           0.18                                 0.02
2                           0.22                                 0.03
3                           0.16                                 0.04
4                           0.11                                 0.05
5                           0.08                                 0.07
6                           0.05                                 0.09
7                           0.04                                 0.12
8                           0.03                                 0.16
9                           0.02                                 0.13
10                          0.01                                 0.10
11                          0                                    0.07
12                          0                                    0.05
13                          0                                    0.03
14                          0                                    0.02
15                          0                                    0.01
Mean                        3.04                                 7.77
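
The two means are points on the cumulative failure function, so their difference gives a rough estimate of failure intensity. The sketch below uses a simple finite difference; treating time 0 as having 0 expected failures is an assumption of this sketch.

```python
def average_failure_intensity(mean_start: float, mean_end: float,
                              t_start: float, t_end: float) -> float:
    """Average rate of change of the cumulative failure function
    between two time points (failures per hour)."""
    return (mean_end - mean_start) / (t_end - t_start)


if __name__ == "__main__":
    # Means from the table: 3.04 failures after 1 hour, 7.77 after 5 hours.
    first_hour = average_failure_intensity(0.0, 3.04, 0.0, 1.0)
    next_four = average_failure_intensity(3.04, 7.77, 1.0, 5.0)
    print(f"hours 0 to 1: {first_hour:.2f} failures/hour")
    print(f"hours 1 to 5: {next_four:.4f} failures/hour")
```

On these figures the average failure intensity is lower after the first hour, the pattern expected when faults are being removed.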

Reliability Model

Fault introduction:
  Characteristics of the product (e.g., program size)
  Development process (e.g., SE tools and techniques, staff experience, etc.)
Fault removal:
  Failure discovery (e.g., extent of execution, operational profile)
  Quality of repair activity
Environment

Conclusion

Software Reliability Engineering (SRE) can offer metrics to help elevate a software development organization to the upper levels of software development maturity.