Lecture 11 : Testing, Verification, Validation


Transcript of Lecture 11 : Testing, Verification, Validation

Lecture 11 Testing, Verification, Validation and Certification

CS 540 – Quantitative Software Engineering

You can’t test in quality

Independent system testers

Software Quality vs. Software Testing

Software Quality Management (SQM) refers to processes designed to engineer value and functional conformance and to minimize faults, failures, and defects.

• Includes processes throughout the software life cycle (inspections, reviews, audits, validations, etc.)

Software testing is an activity performed for evaluating quality (and improving it) by identifying defects and problems (SWEBOK)

SWEBOK Software Testing

“Software testing consists of dynamic verification of the behavior of a program on a finite set of test cases, suitably selected from the usually infinite executions domain, against the expected behavior.”

• Dynamic: requires software execution (vs. static inspections, reviews, etc.)

• Finite: a trade-off of resources

• Selected: techniques vary in how they select tests (purpose)

• Expected behavior: functional and operational

SWEBOK Software Testing Topics

Fundamentals:
• Definitions, standards, terminology, etc.
• Key issues: looking for defects vs. verifying and validating

Test levels:
• Unit test through beta test
• Objectives: conformance, functional, acceptance, installation, performance/stress, reliability, usability, etc.

Test techniques:
• Ad-hoc, exploratory, specification-based, boundary-value

SWEBOK Software Testing Topics

Test techniques:
• Ad-hoc
• Exploratory
• Specification-based
• Boundary-value analysis
• Decision table
• Finite state/model
• Random generation
• Code-based (control flow vs. data flow)
• Application/technology-based: GUI, OO, protocol, safety, certification

SWEBOK Software Testing Topics

Test Effectiveness Metrics
• Fault types and categorization

• Fault density

• Statistical estimates of find/fix rates

• Reliability modeling (failure occurrences)

• Coverage measures

• Fault seeding

Testing Metrics

Test Case Execution Metrics
• Percent planned, executed, passed

Defect Rates
• Defect rates based on NCLOC (non-comment lines of code)
• Predicted defect detection (upper/lower control limits)
• Fault density/fault criticality (software control board)

Fault types, classification, and root cause analysis
Fault on fault, breakage, regression test failures
Reliability and performance impact
Field faults / prediction of deficiencies
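As a concrete illustration (not from the lecture), the sketch below computes the execution-status and defect-density metrics named above for a hypothetical test campaign; all counts and the KNCLOC normalization are invented for the example.

```python
# Sketch of the test-campaign metrics listed above; every number here
# is hypothetical, for illustration only.

planned, executed, passed = 120, 104, 97   # test-case counts
defects_found = 31
ncloc = 12_400                             # non-comment lines of code

pct_executed = 100.0 * executed / planned
pct_passed = 100.0 * passed / executed
defect_density = 1000.0 * defects_found / ncloc  # defects per KNCLOC

print(f"Executed: {pct_executed:.1f}% of plan")
print(f"Passed:   {pct_passed:.1f}% of executed")
print(f"Defect density: {defect_density:.2f} defects/KNCLOC")
```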

Software Testing Axioms

Dijkstra: “Testing can show the presence of bugs, but not their absence!”

Independent testing is a necessary but not sufficient condition for trustworthiness.

Good testing is hard and occupies 20% of the schedule.

Poor testing can dominate 40% of the schedule.

Test to assure confidence in operation, not to find bugs.

Software Quality and Testing Axioms

It is impossible to completely test software.
Software testing is a risk-based exercise.
All software contains faults and defects.
The more bugs you find, the more there are.
“A relatively small number of causes will typically produce a large majority of the problems or defects (80/20 Rule).” --Pareto principle

Types of Tests

Unit
Interface
Integration
System
Scenario
Reliability
Stress
Verification
Validation
Certification

When to Test

Boehm: errors discovered in the operational phase incur costs 10 to 90 times higher than in the design phase.
• Over 60% of the errors were introduced during design
• 2/3 of these were not discovered until operations

Test requirements specifications, architectures, and designs.

Testing Approaches

Coverage based - all statements must be executed at least once
Fault based - detect faults; artificially seed faults and determine whether tests catch at least X% of them
Error based - focus on typical errors such as boundary values (off by 1) or the maximum number of elements in a list (see the sketch below)
Black box - function/specification based; test cases derived from the specification
White box - structure/program based; testing considers the internal logical structure of the software
Stress based - no load, impulse, uniform, linear growth, exponential growth by doubling
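To make the error-based idea concrete, here is a minimal boundary-value sketch; the function and its limits are invented for illustration. Off-by-one faults cluster at the edges of a valid range, so the tests target each boundary, its neighbors, and points just outside.

```python
# Hypothetical function under test: accepts list sizes 1..100 inclusive.
def accepts(n: int, max_elements: int = 100) -> bool:
    return 1 <= n <= max_elements

# Error-based (boundary-value) test points: each boundary, one value
# inside, one outside -- where off-by-one faults typically hide.
cases = {0: False, 1: True, 2: True, 99: True, 100: True, 101: False}

for n, expected in cases.items():
    assert accepts(n) == expected, f"boundary failure at n={n}"
print("all boundary cases pass")
```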

Testing Vocabulary

Error - a human action producing an incorrect result
Fault - a manifestation of an error in the code
Failure - a system anomaly; executing a fault induces a failure

Verification - “The process of evaluating a system or component to determine whether the products of a given development phase satisfy conditions imposed at the start of the phase,” e.g., ensuring the software correctly implements a certain function. Have we built the system right?

Validation - “The process of evaluating a system or component during or at the end of the development process to determine whether it satisfies specified requirements.” Have we built the right system?

Certification - “The process of assuring that the solution solves the problem.”

IEEE 829: IEEE Standard for Software Test Documentation

Test Case Specification
Test Suite
Test Scripts
Test Scenarios
Test Plans
Test Logs
Test Incident Report
Test Item Transmittal Report
Test Summary Report

Test Process

[Diagram: the test process. A test strategy selects a subset of inputs from the program or document under test. The subset is run against a prototype or model to obtain the expected output, and executed against the program to obtain the actual output; the two are compared to produce the test results.]

Fault Detection vs. Confidence Building

Testing provokes failure behavior - a good strategy for fault detection, but one that does not inspire confidence.

The user wants failure-free behavior - high reliability. Automatic recovery minimizes user doubts.

Test team results can demoralize end users, so report only those that impact them.

A project with no problems is in deep trouble.

Cleanroom

The developer does not execute the code; correctness is established through static analysis.

Modules are integrated and tested by independent testers using traffic-based input profiles.

Goal: achieve a given reliability level, considering expected use.

Testing requirements

Review or inspection to check that all aspects of the system have been described
• Scenarios with prospective users, resulting in functional tests

Common errors in a specification:
• Missing information
• Wrong information
• Extra information

Boehm’s specification criteria

Completeness - all components are present and described completely; nothing is pending

Consistency - components do not conflict with each other, and the specification does not conflict with external specifications (internal and external consistency). Each component must be traceable.

Feasibility - benefits must outweigh costs; risk analysis (e.g., safety in robotics)

Testability - the system does what is described

These criteria are the roots of ICED-T.

Traceability Tables

Features - requirements relate to observable system/product features
Source - the source for each requirement
Dependency - the relation of requirements to each other
Subsystem - requirements by subsystem
Interface - requirements' relation to internal and external interfaces

Traceability Table: Pressman

REQUIREMENTS (rows) vs. SUBSYSTEM (columns):

        S01   S02   S03 …
R01      X
R02      X     X
R03 …          X
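A traceability table like Pressman's is easy to maintain mechanically. The sketch below (the requirement and subsystem names are invented) builds the same kind of requirements-by-subsystem matrix from a simple mapping.

```python
# Build a Pressman-style traceability matrix from a requirement ->
# subsystem mapping. All names are hypothetical, for illustration only.
trace = {
    "R01": ["S01"],
    "R02": ["S01", "S02"],
    "R03": ["S02"],
}
subsystems = ["S01", "S02", "S03"]

# Print the matrix: one row per requirement, an X where it traces
# to the subsystem in that column.
print("     " + "  ".join(subsystems))
for req, targets in trace.items():
    row = "  ".join(" X " if s in targets else "   " for s in subsystems)
    print(f"{req}  {row}")
```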

Maintenance Testing

More than 50% of the project life is spent in maintenance.
Modifications induce another round of tests.
Regression tests:

• Library of previous tests, plus added tests (especially if the fix was for a fault not uncovered by previous tests)

• The issue is whether to retest everything vs. selective retest - an expense-related decision (and one related to the state of the architecture/design: when entropy sets in, test thoroughly!)

• A maintained regression library cuts the testing interval in half.

V&V planning and documentation

IEEE 1012 specifies what should be in a Test Plan.
Test Design Document - specifies, for each software feature, the details of the test approach and lists the associated tests
Test Case Document - lists inputs, expected outputs, and execution conditions
Test Procedure Document - lists the sequence of actions in the testing process
Test Report - states what happened for each test case; sometimes these are required as part of the contract for system delivery
In small projects many of these can be combined.

IEEE 1012

Purpose
Referenced Documents
Definitions
V&V Overview
• Organization
• Master schedule
• Resources summary
• Responsibilities
• Tools, techniques, and methodologies
Life Cycle V&V
1. Management of V&V
2. Requirements phase V&V
3. Design phase V&V
4. Implementation V&V
5. Test phase V&V
6. Installation and checkout phase V&V
7. O&M V&V
Software V&V Reporting
V&V Administrative Procedures
1. Anomaly reporting and resolution
2. Task iteration policy
3. Deviation policy
4. Control procedures
5. Standard practices and conventions

Human static testing

Reading - peer reviews (best and worst technique)
Walkthroughs and inspections
Scenario-based evaluation (SAAM)
Correctness proofs
Stepwise abstraction from code to spec

Inspections

Sometimes referred to as Fagan inspections.
Basically, a team of about four people examines the code, statement by statement:
• Code is read before the meeting
• The meeting is run by a moderator
• Two inspectors or readers paraphrase the code
• The author is a silent observer
• Code is analyzed using a checklist of fault classes: wrongful use of data, declarations, computation, relational expressions, control flow, interfaces

The result is a list of identified problems that the author corrects and the moderator reinspects.
A constructive attitude is essential; do not use inspections for programmers' performance reviews.

Walkthroughs

Guided reading of code, using test data to run a “simulation.”

Generally less formal.
A learning situation for new developers.
Parnas advocates a review with specialized roles, where the roles define the questions asked - proven to be very effective (active reviews).

Non-directive listening

The Value of Inspections/Walk-Throughs (Humphrey 1989)

Inspections can be 20 times more efficient than testing.

Code reading detects twice as many defects per hour as testing.

80% of development errors were found by inspections.

Inspections resulted in a 10x reduction in cost of finding errors

Beware: bureaucratic code reviews drive away gurus.

SAAM

Software Architecture Analysis Method.
Scenarios describe both current and future behavior.
Classify the scenarios by whether the current architecture supports them directly (full support) or indirectly.
Develop a list of changes to the architecture/high-level design - if semantically different scenarios require a change in the same component, this may indicate flaws in the architecture.

• Cohesion - the glue that keeps a module together; low = bad
  » Functional cohesion - all components contribute to the single function of the module
  » Data cohesion - encapsulate abstract data types

• Coupling - the strength of inter-module connections; loosely coupled modules are easier to comprehend and adapt; low = good

Coverage Based Techniques (unit testing)

Adequacy of testing is based on coverage: the percent of statements executed, the percent of functional requirements tested.

All-paths coverage is exhaustive testing of the code.
Control flow coverage:
• All-nodes coverage (all-statements coverage) - recall cyclomatic complexity graphs
• All-edges coverage (branch coverage) - every branch is taken at least once
• Multiple-condition coverage (extended branch coverage) - covers all combinations of elementary predicates
• Cyclomatic number criterion - tests all linearly independent paths (see the sketch below)
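A small sketch of the gap between these criteria, using a function invented for illustration: two test inputs take every branch of the function below, yet leave two of its four paths unexecuted, which is why branch coverage is weaker than path coverage.

```python
# Hypothetical function with two independent decisions -> 4 paths.
def classify(x: int, y: int) -> str:
    label = "pos" if x > 0 else "neg"            # decision 1
    suffix = "-even" if y % 2 == 0 else "-odd"   # decision 2
    return label + suffix

# Branch (all-edges) coverage with only 2 tests: each branch of each
# decision is taken once.
assert classify(1, 2) == "pos-even"    # x>0 true,  y even
assert classify(-1, 3) == "neg-odd"    # x>0 false, y odd
# All 4 branches executed, but only 2 of the 4 paths:
# "pos-odd" and "neg-even" remain untested.
```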

Coverage Based Techniques -2

Data flow coverage considers the definitions and uses of variables (illustrated below):
• A variable is defined if it is assigned a value in a statement
• A definition is live if the variable is not reassigned at an intermediate statement; such a path is definition-clear
• Variable uses: P-use (in a predicate) and C-use (anything else)
• Testing each possible use of each definition is all-uses coverage
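An annotated toy function (invented for illustration) showing the vocabulary: each comment marks a definition, a P-use, or a C-use. An all-uses-adequate test set must exercise a definition-clear path from every definition to each of its uses.

```python
def scale(values, limit):
    total = 0                 # definition of total
    for v in values:          # definition of v
        if v > limit:         # P-use of v and limit (in a predicate)
            v = limit         # redefinition of v (kills prior definition)
        total = total + v     # C-use of total and v; redefines total
    return total              # C-use of total

# All-uses coverage needs tests that reach both uses of each definition,
# e.g. one input where v > limit holds and one where it does not:
assert scale([1, 2], limit=5) == 3
assert scale([7], limit=5) == 5
```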

Requirements coverage

Transform the requirements into a graph:
• nodes denote elementary requirements
• edges denote relations between elementary requirements
Derive test cases and apply control flow coverage, as sketched below.
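A minimal sketch of the idea, with an invented requirements graph: every edge (a relation between elementary requirements) must be exercised by at least one derived test case, mirroring all-edges control flow coverage.

```python
# Hypothetical requirements graph: nodes are elementary requirements,
# edges are relations between them.
edges = {("R1", "R2"), ("R1", "R3"), ("R2", "R4"), ("R3", "R4")}

# Each derived test case is the set of requirement relations it exercises.
tests = {
    "T1": {("R1", "R2"), ("R2", "R4")},
    "T2": {("R1", "R3"), ("R3", "R4")},
}

covered = set().union(*tests.values())
missing = edges - covered
print("all-edges requirements coverage achieved" if not missing
      else f"uncovered relations: {missing}")
```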

Fault Seeding to estimate faults in a program

Artificially seed faults, then test to discover both seeded and new faults:

    total faults ≈ (total faults found − seeded faults found) × (total seeded faults / seeded faults found)

This assumes real and seeded faults have the same distribution, but manually generated faults may not be realistic.

Alternative: use two groups; real faults found by group X become the seeded faults for group Y.

Trust the results when most of the faults found are seeded ones. Finding many real faults is a negative signal: redesign the module. The probability of more faults in a module is proportional to the number of errors already found!
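A minimal sketch of the estimate above, with invented counts: the fraction of seeded faults recovered calibrates how many real faults the same testing effort has likely missed.

```python
# Fault-seeding estimate as given above; all counts are hypothetical.
seeded_total = 20        # faults deliberately injected
seeded_found = 16        # seeded faults the tests recovered
real_found = 12          # genuine (non-seeded) faults the tests found

# total faults ~= real_found * (seeded_total / seeded_found)
estimated_total = real_found * seeded_total / seeded_found
estimated_latent = estimated_total - real_found

print(f"estimated real faults: {estimated_total:.0f}")   # -> 15
print(f"estimated still latent: {estimated_latent:.0f}") # -> 3
```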

Orthogonal Array Testing

Intelligent selection of test cases.
The fault model being tested is that simple (pairwise) interactions are a major source of defects.

• Identify the independent variables (factors) and the number of values each can take. With four variables, each of which can take 3 values, exhaustive testing would require 81 tests (3 × 3 × 3 × 3), whereas the OATS technique requires only 9 tests yet covers all pairwise interactions, as the sketch below verifies.
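The standard L9(3^4) orthogonal array realizes exactly this 9-test design. The array below is the standard one; how the four factors and three levels map onto a real system is up to the tester. The check confirms that every pair of factors takes all nine of its value combinations.

```python
from itertools import combinations

# Standard L9(3^4) orthogonal array: 9 tests, 4 factors, 3 levels each.
L9 = [
    (0, 0, 0, 0), (0, 1, 1, 1), (0, 2, 2, 2),
    (1, 0, 1, 2), (1, 1, 2, 0), (1, 2, 0, 1),
    (2, 0, 2, 1), (2, 1, 0, 2), (2, 2, 1, 0),
]

# Verify pairwise coverage: every pair of columns exhibits all
# 3 x 3 = 9 value combinations, so 9 tests cover what 81 exhaustive
# tests would cover at the two-way interaction level.
for c1, c2 in combinations(range(4), 2):
    pairs = {(row[c1], row[c2]) for row in L9}
    assert len(pairs) == 9, f"columns {c1},{c2} not pairwise-complete"
print("all pairwise interactions covered in 9 tests")
```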

Top-down and Bottom-up

Major features
• Bottom-up: allows early testing; modules can be integrated in various clusters as desired; major emphasis is on module functionality and performance
• Top-down: the control program is tested first; modules are integrated one at a time; major emphasis is on interface testing

Advantages
• Bottom-up: no test stubs are needed; it is easier to adjust staffing needs; errors in critical modules are found early
• Top-down: no test drivers are needed; the control program plus a few modules forms a basic early prototype; interface errors are discovered early; modular features aid debugging

Disadvantages
• Bottom-up: test drivers and a harness are needed; many modules must be integrated before a working program is available; interface errors are discovered late
• Top-down: test stubs are needed; the extended early phases dictate a slow staff buildup; errors in critical modules at low levels are found late

Humphrey, 1989

Some Specialized Tests

Testing GUIs
Testing with client/server architectures
Testing documentation and help facilities
Testing real-time systems
Acceptance tests
Conformance tests

Software Testing Footprint

[Chart: test status - tests completed vs. time, comparing the planned curve against tests run successfully; a widening gap indicates poor module quality, with a marked rejection point.]

Customer Interests

Before installation, the customer cares about:
• Features
• Price
• Schedule

After installation, the customer cares about:
• Reliability
• Response time
• Throughput

A cautionary scenario:
• Customer buys off-the-shelf
• System works with 40-60% flow-through
• Developer complies with enhancement requests

BUT

• Customer refuses the critical billing module
• Customer demands 33 enhancements and tinkers with the database
• Unintended system consequences follow

Why bad things happen to good systems

Mindset

Move from a culture of minimal change to one of maximal change.

Move to a "make it work, make it work right, make it work better" philosophy through prototyping and delayed code optimization.

Give the test teams the "right of refusal" for any code that was not reasonably tested by the developers.

Productivity

Productivity = F{people, system nature, customer relations, capital investment}

Software Testing Summary

The software testing body of knowledge is very advanced (in terms of standards, literature, etc.).

Software testing is very expensive; statistical risk analysis must be used:

• Cost of field faults vs. schedule slips
• Release readiness criteria and procedures are required

Testing techniques vary according to the operational environment and application functionality:

• There are no magic methods

There is an organizational conflict of interest between development and test, and between project management and test.
Involve testers throughout the project.
The hardest PM decision is ship/don't ship due to quality.