Software Quality Assurance

William W. McMillan
16 March 2013

Quality Assurance = Testing?

Meaning of Quality

• Error-free
  – How do we define an error?
• Client is happy (we get paid!).
• User is happy (we are loved!).
• Stable (we won’t be bothered).
• Doesn’t fail when needed (we aren’t sued).
• Long-lasting (we can modify & add stuff later).
• … ??

How would you define software quality?

Common Measures

• Meets specifications
• Safe
• Testable
• Maintainable
• User-friendly
• Users can be efficient
• Meets standards
• Portable
• Learnable
• Secure
• Meets real requirements
• Modular, reusable
• Well-designed
• Powerful (throughput)
• Reliable

Categories of SQA

• Verification
  – Meets specifications?
  – “Did we build the product correctly?”
• Validation
  – Meets real needs of client and users?
  – “Did we build the correct product?”
• Livability Assessment
  – Can we stand to fix, read, update, reuse, port… this thing?

Verification

• We’ve developed user and client requirements (even if just in our heads).
• We’ve developed a design and technical specifications (even if just in our heads).
• We build a partial or whole system.
• Verification determines if what we’ve built matches requirements and technical specs.

Give an example of how you’ve verified some of your own code.

Validation

• We might verify a system to our satisfaction and deliver it.
• But then the client and users find it to be:
  – Hard to use
  – Incomplete
  – Incorrect

What went wrong?

Validation

• The goal is to determine whether we’ve met the real needs of the client and users.
• Requirements can be incorrect. Why? Some reasons:
  – Ambiguity of natural language
  – Changes in needs
  – We asked the wrong questions
  – Something lost in translation to design and specs

Give an example of when your code was verified, but not valid.

Livability

What would you like in a piece of software that you were going to be “married” to?

• Broad category (previously discussed in course)
  – Modular
  – Reusable
  – Well documented
  – Portable (e.g., hardware-specific stuff separated)
  – Modifiable (e.g., UI separate from “business” code)
  – Meets code conventions
  – …?

Software Testing

• For verification, validation, or livability?
  – At least for V & V.
• “Testing” implies executing code.
  – I.e., it’s dynamic.
• Most agree that testing is necessary to assure software quality.
• But the reliance on testing has been challenged (to be discussed later).

Software Testing

• Is it feasible to exhaustively test a program?
• Say we have only 6 inputs and each one can take on one of 20 values.
  – That’s 20^6 = 64 million possible input vectors.
• How about if we have dozens or hundreds of inputs and the ranges of values are much wider?
• Testing almost always has to sample the input space.
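The arithmetic for 6 inputs with 20 values each can be checked directly, and the same sketch shows what sampling the input space looks like in practice. The function names and the sample size of 1,000 are illustrative assumptions, not part of the original slides.

```python
import random

NUM_INPUTS = 6
VALUES_PER_INPUT = 20


def input_space_size(num_inputs: int, values_per_input: int) -> int:
    """Number of distinct input vectors when the inputs are independent."""
    return values_per_input ** num_inputs


def sample_input_vectors(num_inputs, values_per_input, how_many, seed=0):
    """Draw a random sample of input vectors instead of enumerating them all."""
    rng = random.Random(seed)
    return [
        tuple(rng.randrange(values_per_input) for _ in range(num_inputs))
        for _ in range(how_many)
    ]


total = input_space_size(NUM_INPUTS, VALUES_PER_INPUT)
sample = sample_input_vectors(NUM_INPUTS, VALUES_PER_INPUT, how_many=1000)
print(total)                # 64000000
print(len(sample) / total)  # fraction of the input space a 1,000-case suite can touch
```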

Test-Driven Development

• Used in agile methods.
• Testing is part of development.
  – Define a new function or operation.
  – Define test cases for that component.
  – Develop the code until the tests are passed.
• Future changes to system involve re-running previous tests to see what’s been broken.
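The test-first cycle can be sketched with Python's standard `unittest` module. The function `apply_discount` and its rules are hypothetical, chosen only to illustrate the order of work: tests are written before the code they exercise, and the whole suite is re-run after later changes.

```python
import unittest


# Steps 1 and 2: name the new operation and write its test cases first.
class ApplyDiscountTest(unittest.TestCase):
    def test_no_discount(self):
        self.assertEqual(apply_discount(100.0, 0), 100.0)

    def test_half_price(self):
        self.assertEqual(apply_discount(80.0, 50), 40.0)

    def test_rejects_invalid_percentage(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 120)


# Step 3: write just enough code to make the tests pass.
def apply_discount(price: float, percent: float) -> float:
    """Hypothetical operation under development."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


# Step 4: re-run the whole suite after every later change to the system.
def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
print(result.wasSuccessful())
```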

Test-Driven Development

• Advantages of this approach?
• Is this a kind of exhaustive testing?
• In what sense is the testing process part of specification?
• What if the test cases defined don’t cover the input domain?
• What if the functionality defined is not really what the client and users want?

Partition Testing

• Divide up input space into equivalence classes.
  – I.e., classes in which behavior of program is essentially the same.
• Depends on domain knowledge and system requirements.
• Sample each partition.
• Special attention to boundaries between partitions.

Partition Testing

[Figure: input space partitioned along two axes, Age (boundary at 18) and Family Income (Low, Medium, High).]
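A minimal sketch of partition testing over the two axes named in the figure, Age (boundary at 18) and Family Income (Low, Medium, High). The income thresholds and the `partition` function are assumptions made for illustration; the slides do not give numeric income limits.

```python
ADULT_AGE = 18               # boundary shown on the age axis
LOW_INCOME_LIMIT = 30_000    # hypothetical threshold
HIGH_INCOME_LIMIT = 90_000   # hypothetical threshold


def partition(age: int, income: float) -> tuple:
    """Map an input to the equivalence class it falls in."""
    age_class = "minor" if age < ADULT_AGE else "adult"
    if income < LOW_INCOME_LIMIT:
        income_class = "low"
    elif income < HIGH_INCOME_LIMIT:
        income_class = "medium"
    else:
        income_class = "high"
    return age_class, income_class


# One representative per class, plus values on either side of each boundary.
ages = [10, ADULT_AGE - 1, ADULT_AGE, 40]
incomes = [
    10_000, LOW_INCOME_LIMIT - 1, LOW_INCOME_LIMIT,
    50_000, HIGH_INCOME_LIMIT - 1, HIGH_INCOME_LIMIT, 150_000,
]
test_cases = [(a, i) for a in ages for i in incomes]
covered = {partition(a, i) for a, i in test_cases}
print(len(covered))  # 6 partitions: 2 age classes x 3 income classes
```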

Exercise

Suppose a system is being developed to produce tuition bills from students’ class registrations. Each input object is one student’s class schedule for a single term. Each output is a bill sent to the student.

What useful partitions of the input domain would you define? Within individual domains, in what ways is functioning uniform? Define some data values that would put cases on “boundaries” between partitions.

Random Testing

• Sample randomly from input domain.
• Uniform probability distribution is implied.
• Many tests can be run automatically.
• Can be combined with partition testing.

What are the advantages and disadvantages?
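A sketch of automated random testing, assuming a hypothetical function under test (`clamp`) and an oracle expressed as properties that must hold for every input. Inputs are drawn uniformly from fixed ranges; the ranges and trial count are arbitrary choices.

```python
import random


def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical function under test: restrict value to [low, high]."""
    return max(low, min(value, high))


def random_test(trials: int = 10_000, seed: int = 42) -> int:
    """Run many uniformly sampled cases and count property violations."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        low = rng.randint(-1000, 1000)
        high = rng.randint(low, 1000)
        value = rng.randint(-5000, 5000)
        result = clamp(value, low, high)
        # Oracle: the result lies in range, and in-range values are unchanged.
        in_range = low <= result <= high
        unchanged_inside = result == value if low <= value <= high else True
        if not (in_range and unchanged_inside):
            failures += 1
    return failures


print(random_test())  # 0
```

Because no expected outputs are written by hand, the approach depends on having an oracle that can judge arbitrary inputs.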

Statistical Testing

• Sample probabilistically, but not uniformly.
• Have probability distribution(s) from:
  – Theoretical model
  – Past data
• Sample test cases according to expected input distributions.
  – “Operational Profile”

Statistical Testing

Say we’ve developed a new web site that delivers instruction to automobile technicians.

• From past interactions with such services, we expect:
  – 65% of the user actions to be straight progression through the lessons
  – 20% to be answering self-test questions
  – 8% to be questions asked of the help system
  – 7% to be unexplained or confused

Statistical Testing

We generate test cases for the system in proportion to these expectations.
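Generating test actions in proportion to the expected percentages might look like the sketch below. The action names are invented labels for the four categories in the example, and the seed and count are arbitrary.

```python
import random
from collections import Counter

# Operational profile from the example: expected share of each user action.
OPERATIONAL_PROFILE = {
    "progress_through_lesson": 0.65,
    "answer_self_test": 0.20,
    "ask_help_system": 0.08,
    "unexplained_or_confused": 0.07,
}


def generate_actions(count: int, seed: int = 1) -> list:
    """Sample test actions according to the operational profile."""
    rng = random.Random(seed)
    actions = list(OPERATIONAL_PROFILE)
    weights = list(OPERATIONAL_PROFILE.values())
    return rng.choices(actions, weights=weights, k=count)


test_actions = generate_actions(10_000)
frequencies = Counter(test_actions)
for action, expected in OPERATIONAL_PROFILE.items():
    observed = frequencies[action] / len(test_actions)
    print(f"{action}: expected {expected:.2f}, generated {observed:.3f}")
```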

What measure sometimes defined under non-functional requirements might this kind of testing yield?

Think of a low-level network-traffic function that might be addressed through statistical testing.

Stress Testing

• Used for systems that require
  – Heavy data transmission
  – Many transactions (DB access, user events, etc.)
  – Heavy-duty computation
• Usually is statistical testing (automatically generated data).
• Try to break the system via heavy loads.
• Monitor performance and bottlenecks.
• Improve where necessary.

What real system would benefit from stress testing?

What non-functional requirements measures might be addressed through stress testing?

Regression Testing

• After code is added to a system under development…
• … or a change is made to a deployed system…
• We re-run previous test suites to see if an unintended side effect has broken something.
• Some firms do a daily system build and regression testing to see what came off the rails.
• Used constantly in agile methods.

If regression testing frequently uncovers faults, what advice would you give the developers?

Unit vs. Integration Testing

• Some testing is aimed at single methods or one class.
• Whole systems or large increments?
• Have to do both
  – Top-down (use stubs of lower-level)
  – Bottom-up (use drivers in lieu of higher-level)
• Integration issues:
  – Regression testing
  – Interfaces between parts
  – Coupling

What code integration problems have you encountered in the past?

Mutation Testing

• Aimed at getting adequate test data set.
  – If program works with these data then you have confidence the program is correct.
• To see if data set is adequate, try it with intentional mutations of the program.
• Test should fail.
  – If not, you don’t have an adequate data set.

Mutation Testing

Definitions of propositions:
  D: The data set adequately tests the program.
  R: The program runs correctly.
  M: The program used in testing is a mutation of the target program.

The line of reasoning:
  (M and D) → R'      The program should run incorrectly.
  R → (M and D)'
  R → (M' or D')      If the program does run, we must not be using a mutation or the test data are inadequate.
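The reasoning can be made concrete with a small sketch. The program `is_adult`, its mutant, and both data sets are hypothetical; the mutation changes `>=` to `>`, a typical single-operator change.

```python
def is_adult(age: int) -> bool:
    """Original program."""
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    """Mutant: the comparison >= has been changed to >."""
    return age > 18


def suite_passes(program, test_data) -> bool:
    """True if the program gives the expected result on every test case."""
    return all(program(age) == expected for age, expected in test_data)


weak_data = [(5, False), (40, True)]                  # no boundary cases
strong_data = weak_data + [(17, False), (18, True)]   # adds the boundary

for name, data in [("weak", weak_data), ("strong", strong_data)]:
    assert suite_passes(is_adult, data)  # the original must pass
    killed = not suite_passes(is_adult_mutant, data)
    print(name, "data set kills the mutant:", killed)
# weak data set kills the mutant: False
# strong data set kills the mutant: True
```

The weak data set lets the mutant run "correctly," so by the argument above it is inadequate (D is false).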

Coverage Testing

• Test cases developed to maximally cover the code in some sense.
  – (Partition testing “covers” input partitions, but here it’s code coverage that is the goal.)
• Systems have failed because some instructions were never executed in tests.
• Might want to try to execute as many statements as possible in testing.

Coverage Testing

Might want to ensure that every decision statement (if, switch, while, etc.) is executed.

Or that every pair consisting of a variable definition and its use in computation is covered.

Or that every pair consisting of a variable definition and its use in a decision is covered.

Use software tools for this kind of testing.

What other kind of code coverage could be defined?

McCabe Metric

• Graph theoretic measure of code complexity.
• Has implications for code coverage.
• Turn all decisions into binary decisions.
  – if (x > 0) is a binary decision.
  – if (x > 0 && y < 10) needs to be broken into two decision “nodes”.
• All straight-line blocks of statements are made into single nodes.

McCabe Metric

• “Cyclomatic Complexity” = # edges - # nodes + 2
  – = number of enclosed regions plus background
• Example graph: 9 edges, 8 nodes, so McCabe metric is 9 - 8 + 2 = 3.
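The formula can be applied to a control-flow graph stored as an adjacency list. The graph below is an assumed example (an if/else followed by a while loop) built to have 9 edges and 8 nodes; it is not necessarily the graph drawn on the original slide.

```python
# Hypothetical control-flow graph: node -> list of successor nodes.
CFG = {
    "entry": ["if"],
    "if": ["then", "else"],      # binary decision
    "then": ["while"],
    "else": ["while"],
    "while": ["body", "after"],  # binary decision
    "body": ["while"],
    "after": ["exit"],
    "exit": [],
}


def cyclomatic_complexity(graph: dict) -> int:
    """McCabe metric: number of edges - number of nodes + 2."""
    nodes = len(graph)
    edges = sum(len(successors) for successors in graph.values())
    return edges - nodes + 2


def binary_decisions(graph: dict) -> int:
    """Count nodes with exactly two outgoing edges."""
    return sum(1 for successors in graph.values() if len(successors) == 2)


print(cyclomatic_complexity(CFG))  # 3
print(binary_decisions(CFG) + 1)   # 3, the same value for binary decisions
```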

McCabe Metric

Gives an upper bound on the number of test cases needed to execute all statements (it equals the number of linearly independent paths through the graph).

What else is this measure good for?

User Testing

• Goals include determining:
  – Usability
  – Learnability
  – Correctness (verification)
  – Match with needs (validation)
  – Requirements (from prototypes)
• Participants:
  – Real potential users
  – Handy stand-ins

User Testing

• Can be done
  – Throughout development
  – At delivery (part of acceptance testing)
  – After deployment

Users use an executable prototype, a functioning increment, or a complete system.

Higher investment in time and money allows more formal tests.

Even very casual user testing can be very beneficial.

User Testing

• Exploratory (“playing” with the system)
  – Requirements discovery
  – Style, likability
• Semi-formal
  – Ask user to accomplish some task
  – Note questions, confusions, etc.
• Formal
  – Controlled setting and method
  – Times and actions recorded

User Testing

Formal testing should have clear research questions.

What are some examples of formal research questions you might ask about using a web content management system?

What measurements would you want to have?

How would you run a study in a controlled setting (say a usability lab)?

Beta Testing

When have you had contact with this kind of testing?

Why do you think firms employ it?

What difficulties might be associated with this kind of testing?

If you were putting a product into beta testing, what would you do to make the effort pay off?

Static Evaluation

• Testing is dynamic, i.e., code is executed.
• Static techniques do not involve running code.
• The main approaches we’ll look at are:
  – Formal verification of code correctness
  – Code inspections and walkthroughs
  – Static code analysis
  – Automatic model checking

What do we mean when we say that a technique, tool, or language is “formal”?

Formal Verification

• Code is a mathematical or logical entity.
• It has well-defined syntax and semantics.
• If specifications are formally stated, why can’t we use a formal (proof-based) method to determine whether the code will work to specification?

In this view, reliance on testing is seen as an embarrassment or as a sign of an insufficiently educated computer scientist.

Formal Verification

• State formal preconditions and post-conditions.
• Basic strategy:
  – preconditions → code → post-conditions
• Need formal definitions of code semantics.
• Simple example:

  // pre: x > 10
  x = x + 5;
  // post: x > 15 (by addition, assignment, substitution)
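The "addition, assignment, substitution" steps in the simple example can be written out in Hoare-logic notation. This is a sketch that assumes x is an unbounded integer (no overflow); the rule names are the standard ones from Hoare logic rather than anything stated on the slide.

```latex
\begin{align*}
&\{\, x + 5 > 15 \,\}\;\; x := x + 5 \;\;\{\, x > 15 \,\}
  && \text{assignment axiom: substitute } x + 5 \text{ for } x \text{ in the post-condition} \\
&x > 10 \;\Longrightarrow\; x + 5 > 15
  && \text{arithmetic (add 5 to both sides)} \\
&\{\, x > 10 \,\}\;\; x := x + 5 \;\;\{\, x > 15 \,\}
  && \text{rule of consequence: strengthen the precondition}
\end{align*}
```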

Formal Verification

• Requires sophisticated person to do.
• Can be time consuming.
• Proofs can contain human errors.
• Doesn’t take into account things like data transmission times, sensor glitches, disk faults, etc.

Where would this be most valuable?

Formal Verification vs. Unit Testing

• Unit testing (say with JUnit) employs pre and post conditions (or assertions):
  – With these inputs, I should get so-and-so results.

But these are assertions about specific test cases, not general assertions.

A test case can succeed, but the code can be wrong.

In theory, formal verification proves correctness for all cases.
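A short sketch of a test case succeeding while the code is wrong. The function `absolute_value` is deliberately buggy and hypothetical. Checking the general post-condition over a range of inputs is still testing, not proof, but it shows how assertions about specific cases can miss a fault that a general assertion exposes.

```python
def absolute_value(x: int) -> int:
    """Buggy on purpose: returns the wrong result for negative inputs."""
    return x


# Assertions about specific test cases: both succeed.
assert absolute_value(3) == 3
assert absolute_value(0) == 0

# The general post-condition (result == max(x, -x)), checked on more inputs,
# reveals that the code is wrong.
counterexamples = [x for x in range(-5, 6) if absolute_value(x) != max(x, -x)]
print(counterexamples)  # [-5, -4, -3, -2, -1]
```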

Code Inspections & Walkthroughs

• Not exactly the same things, but we’ll combine.
• Procedure:
  – Code is written and distributed.
  – At the inspection meeting, a moderator leads, a presenter (not the author) presents the code, a scribe records, and inspectors comment on correctness and other features of the code.
  – Defects and places that need improvement are noted and the author reworks the code.

Code Inspections & Walkthroughs

• Similarity to formal verification.
  – Specifications and the code semantics are central.
  – Presenter and/or author are trying to “prove” that the code is correct.
  – Inspectors are evaluating whether the details of the code will lead to desired outcomes.
• Can be very effective in finding errors, but is not so formal that one needs special skills.

What would you include in a form to be used in code inspections?

Automatic Static Analysis

• Modern compilers do some of this.
• Method-call signatures
• Uninitialized variables
• Variables that won’t be initialized if an exception is thrown
• Common programmer errors like:
  – if (x = 1) …
• Possible overflows
• Possible memory leaks
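A toy static check can be built on Python's standard `ast` module: the source is parsed and inspected but never run. This sketch flags names that a function reads but never binds, a rough stand-in for the uninitialized-variable check above. It is a deliberately simplified, flow-insensitive assumption of how such a tool might work, not how any particular compiler does it; the analyzed source and the function `unbound_names` are hypothetical.

```python
import ast
import builtins

# Code to analyze (note the misspelled variable in the return statement).
SOURCE = '''
def total_price(prices):
    subtotal = 0
    for p in prices:
        subtotal = subtotal + p
    return subtotl
'''


def unbound_names(source: str) -> list:
    """Report (function, name, line) for names read but never assigned."""
    tree = ast.parse(source)
    module_level = {
        node.name for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.ClassDef))
    }
    known = set(dir(builtins)) | module_level
    findings = []
    for func in ast.walk(tree):
        if not isinstance(func, ast.FunctionDef):
            continue
        # Names bound anywhere in the function: parameters and assignments.
        bound = {arg.arg for arg in func.args.args}
        bound |= {
            n.id for n in ast.walk(func)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)
        }
        allowed = bound | known
        for n in ast.walk(func):
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load):
                if n.id not in allowed:
                    findings.append((func.name, n.id, n.lineno))
    return findings


print(unbound_names(SOURCE))  # [('total_price', 'subtotl', 6)]
```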

What other errors could a static analysis tool find?

Automatic Model Checking

• From source code, derive state transition model.
• Write logical propositions about properties of nodes.
• Automatic model checker searches for paths that make the propositions false.
• Concurrency makes this interesting.
• Computationally expensive.
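A tiny explicit-state search illustrates the idea. The state transition model here is written by hand rather than derived from source code, which is a simplifying assumption: two threads, A and B, each increment a shared counter non-atomically (read, then write). The proposition is "when both threads are done, the counter equals 2," and a breadth-first search looks for a path that makes it false.

```python
from collections import deque

# State: (pc_a, pc_b, local_a, local_b, counter)
# pc values: 0 = about to read, 1 = about to write, 2 = done
INITIAL = (0, 0, 0, 0, 0)


def successors(state):
    """Yield (label, next_state) for every thread step enabled in this state."""
    pc_a, pc_b, loc_a, loc_b, counter = state
    if pc_a == 0:
        yield "A reads", (1, pc_b, counter, loc_b, counter)
    elif pc_a == 1:
        yield "A writes", (2, pc_b, loc_a, loc_b, loc_a + 1)
    if pc_b == 0:
        yield "B reads", (pc_a, 1, loc_a, counter, counter)
    elif pc_b == 1:
        yield "B writes", (pc_a, 2, loc_a, loc_b, loc_b + 1)


def holds(state) -> bool:
    """Proposition: if both threads are done, the counter must be 2."""
    pc_a, pc_b, _, _, counter = state
    return not (pc_a == 2 and pc_b == 2) or counter == 2


def find_counterexample(initial):
    """Breadth-first search over all interleavings for a violating path."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, path = queue.popleft()
        if not holds(state):
            return path
        for label, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [label]))
    return None


print(find_counterexample(INITIAL))  # an interleaving that loses an update
```

Even this toy model has to enumerate every interleaving, which hints at why concurrency makes model checking both interesting and computationally expensive.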

How are model checkers similar to formal proofs of correctness?

How do they use “brute force” computation to overcome a major deficiency of formal proofs?

Software Quality Standards

• Often aimed at quality of process.
• Underlying philosophy:
  – If good process then good product.
  – Good process is “conscious,” or self-aware.
  – Documenting experiences is critical.
• Many bodies define standards.
  – NATO, IEEE, EU, NIST, ISO,…

Client may require adherence to standard.

Software Quality Standards

• Continuous improvement of processes is central to most.
• Management commitment is critical.
  – Resources have to be supplied.
• Need personnel at all levels to participate in definition.
• Has to be understood and supported at all levels.

ISO 9001

• International Organization for Standardization (ISO)
• A general quality-management standard that can be applied to software development.
• Part of ISO 9000, a family of quality standards.
• Does not define specific processes.
• Organizations define own methods that conform to key principles.
• Documenting conformance is critical.

ISO 9001

• Core processes:
  – Product creation & delivery: Business acquisition, Design and development, Test, Production & delivery, Service & support
  – Management of: Business, Suppliers, Inventory, Configuration
• Show how own processes control above.
• Need procedures to enforce conformance.
• Evidence supports value of ISO 9001.

Advantages and disadvantages of seeking quality certification?

Code Metrics

• Cyclomatic complexity (see McCabe metric).
• Lines of code in module.
• Degree of nesting (ifs, loops).
• Fan in/out: calls how many, is called by how many?
• Number of parameters of function.
• Dependence on common structures (files, data ports, UI components, static variables,…)
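Several of these metrics can be computed mechanically from a syntax tree. The sketch below uses Python's standard `ast` module on a hypothetical function; counting only `if`, `for`, and `while` as nesting constructs is a simplifying assumption.

```python
import ast

# Hypothetical code to measure.
SOURCE = '''
def classify(values, limit):
    result = []
    for v in values:
        if v > limit:
            while v > limit:
                v -= 1
        result.append(v)
    return result
'''

NESTING_NODES = (ast.If, ast.For, ast.While)


def max_nesting(node: ast.AST, depth: int = 0) -> int:
    """Deepest level of nested ifs and loops below the given node."""
    deepest = depth
    for child in ast.iter_child_nodes(node):
        child_depth = depth + 1 if isinstance(child, NESTING_NODES) else depth
        deepest = max(deepest, max_nesting(child, child_depth))
    return deepest


func = ast.parse(SOURCE).body[0]
metrics = {
    "parameters": len(func.args.args),
    "lines_of_code": func.end_lineno - func.lineno + 1,
    "max_nesting": max_nesting(func),
}
print(metrics)  # {'parameters': 2, 'lines_of_code': 8, 'max_nesting': 3}
```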

Object-Oriented Code Metrics

• Number of classes.
• Number of API imports, API method calls.
• Number of methods or data members per class (can weight by complexity).
• Depth of class hierarchy.
• Number of subclasses of a class.
• Complexity of graph of dependencies between classes.

What other code metrics could be defined?
