
    Chap. 3 Software Quality Metrics

    1. Software Metrics Overview

    2. Product Quality Metrics

    3. In-Process Quality Metrics

    4. Effort/Outcome Model


    1. Software Metrics Overview

    Software Measurement and Metrics

    -Software measurement is concerned with deriving a numeric value for some
    attribute of a software product or a software process.

    Comparing these values to each other and to relevant standards allows drawing
    conclusions about the quality of a software product or the effectiveness of
    software processes.

    Examples of software metrics: program size, defect count for a given program,
    head count required to deliver a system component, development effort and time,
    product cost, etc.

    Measurement gives a snapshot of specific parameters represented by numbers,
    weights, or binary statements. Taking measurements over time and comparing
    them with a specific baseline generates metrics.


    Classes of Software Metrics

    -There are 3 classes of entities of interest in software measurement:
    processes, products, and resources.

    Processes: any software-related activities which take place over time.

    Products: any artifacts, deliverables, or documents which arise out of the processes.

    Resources: the items which are inputs to processes.

    -These result in 3 broad classes of software metrics:
    project (i.e., resource), product, and process metrics.


    Process Metrics

    Measure the efficacy of processes

    Focus on quality achieved as a consequence of a repeatable

    or managed process

    Reuse historical data (from past projects) for prediction

    Examples:

    - Defect removal efficiency
    - Productivity

    Product Metrics

    Measure predefined product attributes

    Focus on the quality of deliverables

    Examples of product measures include:

    - Code or design complexity

    - Program size

    - Defect count


    Project Metrics

    Main goals of project metrics:
    - Assess the status of projects
    - Track risk
    - Identify problem areas
    - Adjust workflow

    Examples of project measurement targets include:

    - Effort/time per SE task
    - Defects detected per review hour

    - Scheduled vs. actual milestone dates

    - Changes (number) and their characteristics

    - Distribution of effort on SE tasks


    Software Quality Metrics

    -Software quality metrics are a subset of software metrics that focus
    on the quality aspects of the product, process, and project.

    In general, software quality metrics are more closely associated with process
    and product metrics than with project metrics.

    Nonetheless, project parameters such as the number of developers and their skill
    levels, the schedule, the size, and the organization certainly affect product quality.

    -Software quality metrics can be further divided into end-product
    quality metrics and in-process quality metrics.

    The ultimate goal of software quality engineering is to investigate the relationships
    among in-process metrics, project characteristics, and end-product quality, and based
    on these findings to engineer improvements in both process and product quality.


    2. Product Quality Metrics

    -The de facto definition of software quality consists of two levels:
    intrinsic product quality and customer satisfaction.

    Intrinsic product quality is usually measured by the number of bugs in the
    software or by how long the software can run before encountering a crash.

    In operational definitions, the two metrics used for intrinsic product quality are:
    - Defect density (rate)
    - Mean Time To Failure (MTTF)

    -The two metrics are correlated but are different both in purpose and usage.

    MTTF:
    - Most often used with special-purpose systems such as safety-critical systems
    - Measures the time between failures

    Defect density:
    - Mostly used in general-purpose systems or commercial-use software
    - Measures the defects relative to the software size (e.g., lines of code, function points)
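    A minimal sketch of how the two metrics are computed; the failure and size
    figures below are hypothetical and only illustrate the operational definitions:

        # Minimal sketch (hypothetical data): MTTF vs. defect density.

        def mean_time_to_failure(operating_hours, failure_count):
            """MTTF = total operating time / number of failures observed."""
            return operating_hours / failure_count

        def defect_density(defect_count, size_kloc):
            """Defects per KLOC (could equally be per function point)."""
            return defect_count / size_kloc

        # Example: 5 failures over 1,000 hours of operation; 120 defects in 60 KLOC.
        print(mean_time_to_failure(1000, 5))   # 200.0 hours between failures
        print(defect_density(120, 60))         # 2.0 defects/KLOC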


    Size may be estimated:

    -Directly, using size-oriented metrics: lines of code (LOC)

    -Indirectly, using function-oriented metrics: function points (FP)


    Size Oriented Metrics

    Normalizing quality/productivity metrics by size (size usually LOC, KLOC, or
    pages of documentation):

    - Defects per KLOC
    - $ per LOC
    - Pages of documentation per KLOC
    - LOC per person-month
    - $ per page of documentation

    Lines of Code (LOC)

    -The lines of code metric is anything but simple:

    The major difficulty comes from the ambiguity of the operational definition,
    the actual counting.

    In the early days of Assembler programming, in which one physical line was
    the same as one instruction, the LOC definition was clear.

    With high-level languages the one-to-one correspondence doesn't work.


    Differences between physical lines and instruction statements (or logical lines
    of code), and differences among languages, contribute to the huge variations in
    counting LOC.

    Possible approaches used include:

    -Count only executable lines

    -Count executable lines plus data definitions

    -Count executable lines, data definitions, and comments

    -Count executable lines, data definitions, comments, and job control language

    -Count lines as physical lines on an input screen
    -Count lines as terminated by logical delimiters

    -In any case, when any data on the size of program products and their
    quality are presented, the method for LOC counting should be
    described: whether it is based on physical or logical LOC.

    When a straight LOC count is used, size and defect rate comparisons across
    (different) languages are often invalid.

    Example: the approach used at IBM consists of counting source instructions, including
    executable lines and data definitions but excluding comments and program prologue.
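    To make the counting ambiguity concrete, here is an illustrative Python sketch
    (not any organization's actual counting tool) that contrasts physical lines with
    a simplified notion of logical, non-comment lines; the classification rules are
    assumptions for illustration only:

        # Illustrative sketch only: physical vs. simplified "logical" LOC for Python source.
        # Real counters define executable lines, data definitions, and comments per language.

        def count_loc(source_text):
            physical = 0      # every line as it appears on the screen
            logical = 0       # non-blank, non-comment lines (a simplification)
            comments = 0
            for line in source_text.splitlines():
                physical += 1
                stripped = line.strip()
                if not stripped:
                    continue
                if stripped.startswith("#"):
                    comments += 1
                else:
                    logical += 1
            return {"physical": physical, "logical": logical, "comments": comments}

        sample = """# compute area
        width = 3          # data definition
        height = 4
        print(width * height)
        """
        print(count_loc(sample))  # {'physical': 5, 'logical': 3, 'comments': 1}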


    Defect Density Metrics

    -Defects after the release of the product are tracked. Defects can be
    field defects (found by customers) or internal defects (found internally).

    -At IBM, the following LOC count metrics, based on logical LOC, are used:

    Shipped Source Instructions (SSI): LOC count for the total product.

    Changed Source Instructions (CSI): LOC count for the new and changed
    code of the new release.

    -The relationship between SSI and CSI is given by:

    SSI (current release) = SSI (previous release)
                          + CSI (new and changed code instructions for current release)
                          - deleted code (usually very small)
                          - changed code (to avoid double counting in both SSI and CSI)


    -Postrelease defect rate metrics can be computed per thousand SSI (KSSI)
    or per thousand CSI (KCSI):

    1. Total defects per KSSI: a measure of code quality of the total product.
    2. Field defects per KSSI: a measure of the defect rate in the field.
    3. Release-origin defects (field and internal) per KCSI: a measure of
       development quality.
    4. Release-origin field defects per KCSI: a measure of development quality
       per defects found by customers.

    Metrics (1) and (3) are the same for the initial release, where the entire product is new;
    thereafter, metric (1) is affected by aging and the improvement (or deterioration) of metric (3).

    Metrics (1) and (3) are process measures; their field counterparts, metrics (2) and (4),
    represent the customer's perspective.

    Given an estimated defect rate (per KCSI or KSSI), software developers can minimize the
    impact to customers by finding and fixing the defects before customers encounter them.

    From the customer's point of view, the defect rate is not as relevant as the total number of
    defects that might affect their business. Therefore, a good defect rate target should lead to a
    release-to-release reduction in the total number of defects, regardless of size.


    Example 3.1: We are about to launch the third release of an inventory management
    software system.

    -The size of the initial release was 50 KLOC, with a defect density of 2.0 def/KSSI.

    -In the second release, 20 KLOC of code was newly added or modified (20% of
    which are changed lines of code, with no deleted code).

    -In the third release, 30 KLOC of code was newly added or modified (20% of
    which are changed lines of code, with no deleted code).

    -Assume that the LOC counts are based on source instructions (SSI/CSI).

    a. Considering that the quality goal of the company is to achieve a 10%
    improvement in overall defect rates from release to release, calculate the
    total number of additional defects for the third release.

    b. What should be the maximum (overall) defect rate target for the third release to
    ensure that the number of new defects does not exceed that of the second release?
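    A hedged worked sketch of one way to read this example follows; interpreting
    "additional defects" as the increase in total defects over the second release,
    and "new defects" as each release's total defect count, is an assumption rather
    than something stated on the slide:

        # Sketch of one possible reading of Example 3.1 (interpretation is an assumption).

        def next_ssi(prev_ssi, csi, changed_fraction=0.2, deleted=0.0):
            """SSI(current) = SSI(previous) + CSI - deleted - changed (per the slide)."""
            changed = csi * changed_fraction
            return prev_ssi + csi - deleted - changed

        ssi1 = 50.0                      # KSSI, release 1
        ssi2 = next_ssi(ssi1, 20.0)      # 66 KSSI
        ssi3 = next_ssi(ssi2, 30.0)      # 90 KSSI

        rate1 = 2.0                      # total defects per KSSI, release 1
        rate2 = rate1 * 0.9              # 10% improvement goal -> 1.8
        rate3 = rate2 * 0.9              # 10% improvement goal -> 1.62

        total1, total2, total3 = rate1 * ssi1, rate2 * ssi2, rate3 * ssi3
        print(total1, total2, total3)    # ~100.0, 118.8, 145.8 defects

        # (a) Additional defects in release 3 relative to release 2 under the 10% goal:
        print(total3 - total2)           # ~27.0 extra defects despite the better rate

        # (b) Maximum overall rate so release 3 does not exceed release 2's defect count:
        print(total2 / ssi3)             # ~1.32 defects/KSSI

    This illustrates the point of the previous slide: because the product grows,
    a rate that improves by 10% per release can still mean more total defects,
    so the rate target for release 3 would have to be roughly 1.32 def/KSSI or better.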


    Customer Problems Metric

    -Measure the problems customers encounter when using the product.

    For the defect rate metric, the numerator is the number of valid defects.

    However, from the customer's standpoint, all problems they encounter while
    using the product, not just the valid defects, are problems with the software.

    These include:
    - Usability problems
    - Unclear documentation or information
    - User errors, etc.

    -The problems metric is usually expressed in terms of problems per
    user month (PUM):

    PUM = Total problems that customers reported (true defects and non-defect-oriented
          problems) for a time period / Total number of license-months of the software
          during the period

    Where:

    Number of license-months = Number of installed licenses of the software
                               x Number of months in the calculation period
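    A small illustration of the PUM calculation with hypothetical numbers:

        # Hypothetical illustration of the PUM (problems per user-month) metric.

        def problems_per_user_month(total_problems, installed_licenses, months):
            license_months = installed_licenses * months   # licenses x months in the period
            return total_problems / license_months

        # e.g., 1,200 reported problems (valid defects plus usability, documentation,
        # and user-error issues) over 3 months across 10,000 installed licenses:
        print(problems_per_user_month(1200, 10_000, 3))    # 0.04 problems per user-month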


    Customer Satisfaction Metrics

    -Customer satisfaction is often measured by customer survey data
    via the five-point scale: very satisfied, satisfied, neutral, dissatisfied,
    very dissatisfied.

    -Based on the five-point scale, various metrics may be computed.

    For example:

    (1) Percent of completely satisfied customers
    (2) Percent of satisfied customers: satisfied and completely satisfied
    (3) Percent of dissatisfied customers: dissatisfied and completely dissatisfied
    (4) Percent of nonsatisfied customers: neutral, dissatisfied, and completely dissatisfied

    -The parameters used, for instance, by IBM to monitor customer
    satisfaction include the CUPRIMDSO categories:
    capability (functionality), usability, performance, reliability, installability,
    maintainability, documentation/information, service, and overall.
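    As a small illustration (the response counts below are hypothetical), the four
    percentages above can be computed from survey tallies on the five-point scale:

        # Hypothetical survey tallies on the five-point scale.
        counts = {
            "very satisfied": 260, "satisfied": 480, "neutral": 140,
            "dissatisfied": 80, "very dissatisfied": 40,
        }
        total = sum(counts.values())                                    # 1000 responses

        def pct(*levels):
            return 100.0 * sum(counts[level] for level in levels) / total

        print(pct("very satisfied"))                                    # (1) 26.0
        print(pct("very satisfied", "satisfied"))                       # (2) 74.0
        print(pct("dissatisfied", "very dissatisfied"))                 # (3) 12.0
        print(pct("neutral", "dissatisfied", "very dissatisfied"))      # (4) 26.0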


    Methods of Survey Data Collection

    -Various methods can be used to gather customer survey data, including
    face-to-face interviews, phone interviews, and mailed questionnaires.

    -When the customer base is large, it is too costly to survey all
    customers. Estimating the satisfaction level through a representative
    sample is more efficient.

    -In general, probability sampling methods such as simple random
    sampling are used to obtain representative samples.


    Sample Size

    -For each probability sampling method, specific formulas are available

    to calculate sample size, some of which are quite complicated.

    -For simple random sampling, the sample size n required to estimate
    a population proportion p (e.g., percent satisfied) is given by:

    n = N * Z^2 * p * (1 - p) / (N * B^2 + Z^2 * p * (1 - p))

    Where:

    - N: population size
    - Z: Z statistic from the normal distribution
         for 80% confidence level, Z = 1.28; for 85% confidence level, Z = 1.45
         for 90% confidence level, Z = 1.65; for 95% confidence level, Z = 1.96
    - p: estimated satisfaction level
    - B: margin of error

    Example: Given a software product, estimate the sample size required for an
    expected satisfaction level of 90% among 10,000 customers, for a given level of
    confidence, with a 5% margin of error.
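    A minimal sketch of this calculation; since the slide's example leaves the
    confidence level open, the result is shown for each Z value listed above:

        # Sketch of the simple-random-sampling sample size formula from the slide.

        def sample_size(population, z, p, margin):
            """n = N*Z^2*p*(1-p) / (N*B^2 + Z^2*p*(1-p))"""
            num = population * z**2 * p * (1 - p)
            den = population * margin**2 + z**2 * p * (1 - p)
            return num / den

        N, p, B = 10_000, 0.90, 0.05
        for conf, z in [("80%", 1.28), ("85%", 1.45), ("90%", 1.65), ("95%", 1.96)]:
            print(conf, round(sample_size(N, z, p, B)))
        # e.g., at 90% confidence: n is about 97 customers; at 95%: about 136.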


    3. In-Process Quality Metrics

    -The defect rate during formal machine testing (after code is integrated
    into the system library) is actually positively correlated with the defect
    rate in the field.

    Higher defect rates found during testing are often an indicator that the software has
    experienced higher error injection during its development process.

    The reason is simply that software defect density never follows a uniform distribution:
    if a piece of code has higher testing defects, it is a result of more effective testing or it is
    because of latent defects in the code.

    -Hence, the simple metric of defects per KLOC or function point is
    a good indicator of quality while the software is still being tested.

    -Overall, defect density during testing is a summary indicator; more
    information is actually given by the pattern of defect arrivals
    (which is based on the times between failures).

    Even with the same overall defect rate during testing, different patterns of defect
    arrivals indicate different quality levels in the field.

    Defect Arrival Pattern During Machine Testing


    [Figures: defect arrival patterns for two different projects (test scenario 1 and
    test scenario 2), each shown as the test defect arrival rate per week and the
    cumulative test defect rate per week.]


    The objective is always to look for defect arrivals that stabilize at

    a very low level, or times between failures that are far apart, before

    stopping the testing effort and releasing the software to the field.

    -The time unit for observing the arrival pattern is usually weeks and occasionally months.

    -For models that require execution-time data, the time intervals are in units of CPU time.
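    To make the notion of an arrival pattern concrete, here is a small sketch with
    hypothetical weekly defect counts; two test efforts with the same total defect
    count show very different patterns:

        # Hypothetical weekly defect arrivals for two test efforts with the same total
        # defect count but different arrival patterns.

        from itertools import accumulate

        scenario_1 = [5, 12, 20, 18, 10, 6, 3, 1]    # rises, then tails off to a low level
        scenario_2 = [10, 9, 9, 8, 10, 9, 10, 10]    # stays flat: testing stopped too early?

        for name, weekly in [("scenario 1", scenario_1), ("scenario 2", scenario_2)]:
            print(name, "total =", sum(weekly), "cumulative =", list(accumulate(weekly)))
        # Both total 75 defects, but only scenario 1 shows arrivals stabilizing at a
        # very low level before release.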


    Defect Removal Effectiveness

    Defect Injection and Removal

    -Defects are injected into the software product or intermediate

    deliverables of the product at various phases.

    Development phase     Defect injection                  Defect removal
    -----------------     ----------------                  --------------
    Requirements          Requirements gathering,           Requirements analysis and review
                          functional specifications
    High-level design     Design work                       High-level design inspections
    Low-level design      Design work                       Low-level design inspections
    Code implementation   Coding                            Code inspections
    Integration/build     Integration and build process     Build verification testing
    Unit test             Bad fixes                         Testing itself
    Component test        Bad fixes                         Testing itself
    System test           Bad fixes                         Testing itself

    -The total number of defects at the end of a phase is given by:

    Defects at exit of a development step = defects escaped from previous step
                                          + defects injected in current step
                                          - defects removed in current step
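    A small sketch of this phase-by-phase balance, with purely hypothetical
    injection and removal counts:

        # Hypothetical phase-by-phase defect balance using the slide's relationship:
        # defects at exit = escaped from previous step + injected in step - removed in step.

        phases = [            # (phase, injected, removed) -- illustrative numbers only
            ("High-level design", 120, 70),
            ("Low-level design",  110, 90),
            ("Coding",            200, 180),
            ("Unit test",          20, 60),
            ("Component test",     10, 40),
            ("System test",         5, 20),
        ]

        escaped = 0
        for phase, injected, removed in phases:
            escaped = escaped + injected - removed   # defects latent at exit of this phase
            print(f"{phase:18s} latent at exit = {escaped}")
        # The remainder after system test approximates the defects shipped to the field.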


    Defect Removal Effectiveness (DRE) Metrics

    -Measures the overall defect removal ability of the development process.

    It can be calculated for the entire development process, for the front end (before
    code integration), or for each phase:

    -When used for the front end, it is referred to as early defect removal.
    -When used for a specific phase, it is referred to as phase effectiveness.

    The higher the value of the metric, the more effective the development process
    and the fewer defects escape to the next phase or to the field.

    -Defined as follows:

    DRE = (Defects removed during a development phase
           / Defects latent in the product) x 100%

    -The number of defects latent in the product at any given phase
    is usually obtained through estimation as:

    Defects latent = Defects removed during the phase + defects found later
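    A minimal sketch of the phase-effectiveness calculation, using hypothetical
    defect counts for a code inspection:

        # Sketch of phase effectiveness (DRE per phase), using the slide's estimate:
        # defects latent = defects removed during the phase + defects found later.

        def phase_effectiveness(removed_in_phase, found_later):
            latent = removed_in_phase + found_later
            return 100.0 * removed_in_phase / latent

        # Hypothetical counts: a code inspection removes 90 defects; 60 more defects
        # present at inspection time are found in later phases or in the field.
        print(round(phase_effectiveness(90, 60), 1))   # 60.0 % code-inspection effectiveness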


    Example: Phase Effectiveness of a Software Project

    [Chart: phase effectiveness (%) for each development phase: high-level design
    review (DR), low-level design review (LR), code inspection (CI), unit test (UT),
    component test (CT), and system test (ST).]

    Weakest phases are UT, CI, and CT:
    based on the phase effectiveness metrics, action plans to improve the effectiveness
    of these phases would be established and deployed.



    4. Effort/Outcome Model

    -One major problem with the defect removal model is that the

    interpretation of the rates could be flawed:

    When tracking the defect removal rates against the model, lower actual defect
    removal could be the result of lower error injection or of poor reviews and
    inspections. Conversely, higher actual defect removal could be the result of
    higher error injection or of better reviews and inspections.

    -To solve this problem, additional indicators must be incorporated
    into the context of the model for better interpretation of the data.

    -One such additional indicator is the quality of process execution.

    For instance, the metric of inspection effort (operationalized as the number of
    hours the team spent on design and code inspections, normalized per thousand
    lines of code inspected) is used as an indicator of how rigorously the inspection
    process is executed.

    This metric, combined with the defect rate, can provide a useful interpretation
    of the defect model.


    -Specifically, a 2x2 matrix, named the effort/outcome matrix, can be used.

    The matrix is formed by combining the scenarios of an effort indicator and an
    outcome indicator.

    The model can be applied to any phase of the development process with any
    pair of meaningful indicators.

    The high-low comparisons are between actual data and the model, or between the
    current and previous releases of the product.

    Effort/Outcome Model for Inspection

                                     Defect Rate (Outcome)
                                     Higher              Lower
    Inspection       Higher          Cell 1:             Cell 2:
    Effort                           Good/Not Bad        Best Case
                     Lower           Cell 3:             Cell 4:
                                     Worst Case          Unsure


    Best-case scenario (Cell 2): the design/code was cleaner before inspections, and yet the
    team spent enough effort in design review/code inspection that good quality was ensured.

    Good/not bad scenario (Cell 1): error injection may be higher, but the higher effort spent is a
    positive sign and may be why more defects were removed.

    Worst-case scenario (Cell 3): high error injection, but inspections were not rigorous enough.
    Chances are that more defects remained in the design or code at the end of the inspection process.

    Unsure scenario (Cell 4): one cannot ascertain whether the design and code were better and
    therefore less time was needed for inspection, or whether inspections were hastily done, so
    fewer defects were found.
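    As an illustrative sketch (the baseline values and sample data are assumptions;
    in practice the high/low comparison is against the model or the previous
    release), the four cells can be assigned programmatically:

        # Sketch: classifying an inspection's (effort, defect rate) pair against
        # baseline values into the four effort/outcome cells.

        def classify(effort, defect_rate, baseline_effort, baseline_rate):
            high_effort = effort >= baseline_effort
            high_rate = defect_rate >= baseline_rate
            if high_effort and high_rate:
                return "Cell 1: Good/Not Bad"
            if high_effort and not high_rate:
                return "Cell 2: Best Case"
            if not high_effort and high_rate:
                return "Cell 3: Worst Case"
            return "Cell 4: Unsure"

        # e.g., 12 inspection hours per KLOC and 0.4 defects/KLOC found, against
        # baselines of 10 hours/KLOC and 0.6 defects/KLOC:
        print(classify(12, 0.4, 10, 0.6))   # Cell 2: Best Case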