Software Measurement UCLA Computer Science Department CS130 Winter, 2002.

Software Measurement

UCLA Computer Science Department

CS130

Winter, 2002

2

Reference

• Material in this lecture is taken from chapters 1-3 of Software Metrics: A Rigorous and Practical Approach (2nd ed.), Norman E. Fenton and Shari Lawrence Pfleeger, 1997, PWS Publishing Company, Boston, MA, ISBN 0534954251

3

Overview

1. Measurement – what is it and why do we do it?

2. Measurement basics

3. A goal-based software measurement framework

4

Measurement – What Is It and Why Do We Do It?

1. Measurement in Everyday Life

2. Measurement in Software Engineering

3. The Scope of Software Metrics

5

Measurement in Everyday Life

• Measurement governs many aspects of everyday life:
  – Economic indicators determine prices and pay raises
  – Medical system measurements enable diagnosis of specific illnesses
  – Measurements in atmospheric systems are the basis of weather prediction

6

Measurement in Everyday Life
• How do we use measurement in our lives?
  – In a shop, price is a measure of the value of an item, and we calculate the bill to make sure we get the correct change.
  – Height and size measurements ensure clothing will fit correctly.
  – When traveling, we calculate distance, choose a route, measure speed, and predict when we’ll arrive.
• Measurement helps us to:
  – Understand our world
  – Interact with our surroundings
  – Improve our lives

7


• What is Measurement?
  – Common thread in previous examples: some aspect of a thing is assigned a descriptor that allows us to compare it with other things.
  – More formally, measurement is the process by which
    • Numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them
    • According to clearly defined rules.

8

Measurement in Everyday Life
• The definition of the measurement process is far from clear cut.
• To understand measurement, we must ask questions that are difficult to answer:

– In a room with blue walls, is “blue” a measure of the color of the room?

– A person’s height is a commonly understood attribute that can be easily measured. What about other attributes of people, such as intelligence?

– Some measurements (e.g., intelligence, wine quality) may have wide error margins – is this a reason to reject them?

– How do we decide which error margins are acceptable and which are not?

– When is a measurement scale acceptable for the purpose to which it is put (e.g., is it appropriate to measure a person’s height in kilometers)?

– What types of manipulations can we apply to the results of measurement?

• Material in next section (Measurement Basics) will allow us to answer these questions.

9

Measurement in Everyday Life
• Making Things Measurable

– “What is not measurable, make measurable” (Galileo Galilei)

• One aim of science is to find ways of measuring attributes of things we’re interested in.

• Measurement makes concepts more visible, therefore more understandable and controllable.

• Attributes previously thought to be unmeasurable now form basis for decisions affecting our lives (e.g., air quality, inflation index).

– Measuring the unmeasurable improves understanding of particular entities, attributes

• Act of proposing a particular measure can open discussion that will lead to greater understanding

• Making a new measurement may require modifying the environment or practices (e.g., using a new tool, adding a step in a process)

10


• Measurement in Software Engineering
  – In many instances, measurement is considered a luxury. For many projects:
    • Measurable targets are not set (e.g., products are supposed to be user-friendly, reliable, and maintainable, but we don’t quantify what that means).
    • The component costs of projects are not quantified or understood.
    • Product quality is not quantified.
    • There is too much reliance on anecdotal evidence (e.g., “try our product and you’ll improve your productivity by 50%!”). Most of the time, there’s no measurable basis for the claims.

11


• Measurement in Software Engineering (cont’d)
  – When measurements are made, they tend to be:
    • Incomplete
    • Inconsistent
    • Infrequent
  – Most of the time, we’re not told anything about:
    • How experiments were designed
    • What was measured and how
    • Realistic error margins
  – Without this information, we can’t decide whether to apply results to a development effort, and can’t do an objective study to repeat the measurements.
  – Lack of measurement in SW engineering is compounded by lack of a rigorous approach.

12


• Software Measurement Objectives
  – Assessing status
    • Projects
    • Products for a specific project or projects
    • Processes
    • Resources
  – Identifying trends
    • Need to be able to differentiate between a healthy project and one that’s in trouble
  – Determining corrective action
    • Measurements should indicate the appropriate corrective action, if any is required.

13


• Types of information required to understand, control, and improve projects:
  – Managers
    • What does the process cost?
    • How productive is the staff?
    • How good is the code?
    • Will the customer/user be satisfied?
    • How can we improve?
  – Engineers
    • Are the requirements testable?
    • Have all the faults been found?
    • Have the product or process goals been met?
    • What will happen in the future?

14


• The Scope of Software Metrics
  – Cost and effort estimation
  – Productivity measures and models
  – Data collection
  – Quality models and measures
  – Reliability models
  – Performance evaluation and models
  – Structural and complexity metrics
  – Capability-maturity assessment
  – Management by metrics
  – Evaluation of methods and tools

15


• The Scope of Software Metrics – some details
  – Cost and effort estimation
    • Motivation – accurately predict costs early in the development life cycle.
    • Numerous empirical cost models have been developed
      – COCOMO, COCOMO 2
      – Putnam’s model (see Pressman Ch 3)
      – ...

16


• The Scope of Software Metrics – some details
  – Productivity models and measures
    • Estimate staff productivity to determine how much specified changes will cost
    • Naive measure – size divided by effort. Doesn’t take into account things like defects, functionality, reliability.
    • More comprehensive models have been developed – next slide illustrates a possible model.

17


• The Scope of Software Metrics – some details
  – Possible productivity model (figure: decomposition tree)
    • Productivity
      – Value
        » Quality: Reliability, Defects
        » Quantity: Size, Functionality
      – Cost
        » Personnel: Time, Money
        » Resources: HW, SW
        » Complexity: Environmental constraints, Problem difficulty

18

Measurement in Everyday Life
• The Scope of Software Metrics – some details
  – Software quality model (figure with four columns, left to right: Use, Factor, Criteria, Metrics)
    • Use: Product Operation, Product Revision
    • Factor: Usability, Reliability, Efficiency, Reusability, Maintainability, Portability, Testability
    • Criteria: Communicativeness, Accuracy, Consistency, Device Efficiency, Completeness, Structuredness, Conciseness, Device Independence, Legibility, Self-descriptiveness, Traceability, Accessibility
    • Metrics

19

Overview




20

Measurement Basics

1. Overview

2. The representational theory of measurement

3. Measurement and models

4. Measurement scales and scale types

5. Meaningfulness in measurement

21

Measurement Basics
• Overview

– Understanding of software attributes not as deep as understanding of non-software entities (e.g., length, weight, temperature)

– Questions that are relatively easy to answer for non-software entities are difficult for software:

• How much must we know about an attribute before it’s reasonable to consider measuring (e.g., program complexity)?

• How do we know if we’ve really measured the attribute we want to measure? Does a count of the number of defects found in a system measure its quality, or does it measure something else?

• Using measurement, what meaningful statements can we make about an attribute and the entities that possess it (e.g., can we talk about doubling a design’s quality)?

• What meaningful operations can we perform on measures (e.g., can we compute the average productivity of a group of developers, or the average quality of a set of modules)?

– Answering these questions requires developing a theory of measurement

22

Measurement Basics

• The representational theory of measurement
  – Developed as a classical discipline from the physical sciences
  – Provides rules for:
    • Making consistent measurements
    • Interpreting data resulting from measurement
  – Representational theory of measurement formalizes intuition about the way the world works.

23

Measurement Basics
• Empirical relations
  – Data obtained as measures should represent attributes of observed entities
  – Manipulating data should preserve observed relationships
  – Example – “Taller than”
    • Binary relation defined on the set of pairs of people. Either
      – A is taller than B, or
      – B is taller than A
  – Empirical relations are not restricted to binary relations – can be unary (e.g., A is tall), ternary (A sitting on B’s shoulders is taller than C), etc.

24

Measurement Basics
• Empirical relations (cont’d)
  – Empirical relations are mappings from the empirical, real world to a formal mathematical world.
    • Height – maps a set of people to the set of real numbers
    • Greater functionality (from survey results). Each cell is the percentage of respondents who rated the row product as having greater functionality than the column product:

             A      B      C      D
        A    -     80%    10%    80%
        B   20%     -      5%    50%
        C   90%    95%     -     96%
        D   20%    50%     4%     -

    • x has greater functionality than y if (x,y) > 60%. Relation is (C,A), (C,B), (C,D), (A,B), (A,D).
    • Surveys can help gain preliminary understanding of relationships.
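The derivation of the "greater functionality" relation from the survey table can be sketched in a few lines. This is only an illustration: the dictionary layout and the function name `greater_functionality` are assumptions, while the percentages and the 60% threshold come from the slide.

```python
# Survey results from the slide. preference[x][y] is the percentage of
# respondents who rated x as having greater functionality than y.
preference = {
    "A": {"B": 80, "C": 10, "D": 80},
    "B": {"A": 20, "C": 5, "D": 50},
    "C": {"A": 90, "B": 95, "D": 96},
    "D": {"A": 20, "B": 50, "C": 4},
}


def greater_functionality(prefs, threshold=60):
    """Return every pair (x, y) for which more than `threshold` percent preferred x."""
    return {
        (x, y)
        for x, row in prefs.items()
        for y, pct in row.items()
        if pct > threshold
    }


relation = greater_functionality(preference)
print(sorted(relation))
```

Raising or lowering the threshold changes the empirical relation, which is one reason the rule has to be stated explicitly before any measure is defined on top of it.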

25

Measurement Basics

• Empirical relations (cont’d)
  – Definitions
    • Measurement – a mapping from the empirical world to the formal, relational world.
    • Measure – number or symbol assigned to an entity by the mapping in order to characterize an attribute.

26

Measurement Basics
• Rules of Mapping
  – Measures must specify domain and range as well as the rule for performing the mapping
    • Domain – the real world is the domain of the mapping that defines the measurement
    • Range – the mathematical world into which real-world attributes are mapped
  – Examples
    • Measuring height:
      – Is height measured in inches, centimeters, feet?
      – Are people measured sitting or standing?
      – Are shoes allowed to be worn during the measurement?
    • Measuring lines of code
      – Are lines of code reused without change counted?
      – Are non-executable lines counted?
        » Declarations
        » Compiler Directives
        » Comments
        » Blank lines

27

Measurement Basics

• The representation condition
  – Behavior of measures in the number system needs to be the same as that of the corresponding elements in the real world.
  – Formally, a measurement mapping M must map entities into numbers and empirical relations into numerical relations in such a way that:
    • Empirical relations preserve numerical relations
    • Empirical relations are preserved by numerical relations

28

Measurement Basics

• The representation condition – example
  – Taller than:
    • A is taller than B iff M(A) > M(B), where M is a mapping from the empirical world to the real numbers.
  – Whenever Joe is taller than Frank, then M(Joe) must be a bigger number than M(Frank)
  – Jane can be mapped to a bigger number than John only if Jane is taller than John.
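The "iff" in the representation condition can be checked mechanically for a small set of people. The sketch below is illustrative only: the observations in `taller_than`, the height values, and the function name are hypothetical and not part of the lecture.

```python
from itertools import permutations


def satisfies_representation(entities, taller_than, measure):
    """Check that x is observed taller than y exactly when measure[x] > measure[y]."""
    return all(
        ((x, y) in taller_than) == (measure[x] > measure[y])
        for x, y in permutations(entities, 2)
    )


people = ["Joe", "Frank", "Jane", "John"]

# Hypothetical observations: (x, y) means "x is taller than y".
taller_than = {
    ("Joe", "Jane"), ("Joe", "Frank"), ("Joe", "John"),
    ("Jane", "Frank"), ("Jane", "John"),
    ("Frank", "John"),
}

# Hypothetical measurement in centimetres, and the same heights in inches.
height_cm = {"Joe": 188, "Jane": 175, "Frank": 172, "John": 168}
height_in = {name: cm / 2.54 for name, cm in height_cm.items()}

# A mapping that ranks Frank above Jane contradicts the observations.
bad_mapping = {"Joe": 188, "Jane": 175, "Frank": 180, "John": 168}

print(satisfies_representation(people, taller_than, height_cm))
print(satisfies_representation(people, taller_than, height_in))
print(satisfies_representation(people, taller_than, bad_mapping))
```

Both the centimetre and inch mappings pass because they order the people identically; the third mapping fails in both directions of the "iff" for the pair Jane and Frank.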

29

Measurement Basics

• The representation condition – example 2
  – Software failure criticality
  – Three types of failures examined:
    • Delayed response
    • Incorrect output
    • Data loss
  – At this point, we have a relation system consisting of 3 unary relations
    • R1 for delayed response
    • R2 for incorrect output
    • R3 for data loss
  – With this information, we can’t yet judge the relative criticality of these types of failures.

30

Measurement Basics
• The representation condition – example 2 (cont’d)
  – We can find a representation in the set of real numbers by choosing three distinct numbers:
    • M(delayed response) = 6
    • M(incorrect output) = 4
    • M(data loss) = 50
  – Further investigation of criticality reveals that data loss is more critical than incorrect output, which in turn is more critical than a delayed response.
  – To develop a real-number representation for this enriched relation, we must be more careful in assigning numbers.
  – Using “>” to mean “more critical than”, data-loss failures must be mapped to a higher number than incorrect-output failures, which in turn must be mapped to a higher number than delayed responses.
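As a quick check of the point above, the first mapping (6, 4, 50) distinguishes the three failure classes but does not respect the enriched "more critical than" relation. The sketch below assumes a simple list ordered from least to most critical; the function name and the revised values (1, 2, 3) are hypothetical.

```python
def preserves_criticality(order, measure):
    """`order` lists failure types from least to most critical.

    The mapping respects the relation if each type maps to a strictly
    smaller number than the next, more critical, type.
    """
    return all(measure[lo] < measure[hi] for lo, hi in zip(order, order[1:]))


criticality_order = ["delayed response", "incorrect output", "data loss"]

# Mapping from the slide: three distinct numbers, chosen before the ordering was known.
first_mapping = {"delayed response": 6, "incorrect output": 4, "data loss": 50}

# Hypothetical revised mapping that respects the ordering.
revised_mapping = {"delayed response": 1, "incorrect output": 2, "data loss": 3}

print(preserves_criticality(criticality_order, first_mapping))
print(preserves_criticality(criticality_order, revised_mapping))
```

The first mapping fails because incorrect output (4) is mapped below delayed response (6), even though it is the more critical failure.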

31

Measurement Basics

• The representation condition (cont’d)
  – There may be many different measures for a given attribute (e.g., in., cm., furlongs).
    • Any measure satisfying the representation condition is a valid measure
  – The richer the empirical relation system, the fewer the valid measures
    • Relational systems are rich if they have a large number of relations that can be defined.
    • As the number of empirical relations increases, so does the number of conditions a measurement mapping must satisfy in its representation condition.

32

Measurement Basics

• Measurement and models
  – Model – an abstraction of reality allowing us to:
    • Strip away unnecessary detail
    • View an entity or concept from a particular perspective
  – Representation condition requires every measure to be associated with a model of how the measure maps real world entities and attributes to elements of a numerical system. These models are essential in:
    • Understanding how the measure is derived
    • Interpreting behavior of numerical elements when we return to the real world.

33

Measurement Basics

• Defining Attributes
  – There is always a temptation to focus too much on the formal, mathematical system, rather than on the empirical system.
  – Before we set out to measure something (e.g., program complexity), we need to:
    • Identify a set of characteristics of the thing we’re trying to measure
    • Define a model that associates the characteristics
  – We can then define measures for each characteristic, and use the representation condition to help understand the relationships.

34

Measurement Basics

• Direct and Indirect Measurement
  – Direct measure – relates an attribute to a number or symbol without reference to any other object or attribute (e.g., height).
  – Indirect measure
    • Used when an attribute must be measured by combining several of its aspects (e.g., density)
    • Requires a model of how measures are related to each other

35

Measurement Basics

• Direct and Indirect Measures for Software – examples
  – Direct
    • Length of source code (lines of code)
    • Duration of testing process
    • Number of defects discovered during test
    • Time a developer spends on a project
  – Indirect
    • Programmer productivity (LOC/workmonths of effort)
    • Module defect density (number of defects/module size)
    • Defect detection efficiency (# defects detected/total defects)
    • Requirements stability (initial # requirements/total # requirements)
    • Test effectiveness ratio (number of items covered/total number of items)
    • System spoilage (effort spent fixing faults/total project effort)
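Several of the indirect measures listed above are simple ratios of direct measures. The sketch below computes four of them; all input numbers are hypothetical, the helper `ratio` is an assumption, and module size is expressed in KLOC purely as an illustrative choice of unit.

```python
def ratio(numerator, denominator):
    """Divide two direct measures, refusing a zero denominator."""
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return numerator / denominator


# Hypothetical direct measures for one project.
lines_of_code = 12_000
effort_workmonths = 24
defects_detected = 90
total_defects = 120
fix_effort_workmonths = 3

# Indirect measures built from the direct ones, following the slide's definitions.
productivity = ratio(lines_of_code, effort_workmonths)            # LOC per workmonth
defect_density = ratio(defects_detected, lines_of_code / 1000)    # defects per KLOC
detection_efficiency = ratio(defects_detected, total_defects)     # fraction detected
system_spoilage = ratio(fix_effort_workmonths, effort_workmonths) # fraction of effort

print(productivity, defect_density, detection_efficiency, system_spoilage)
```

Each result is only as trustworthy as the counting rules behind its inputs, for example whether blank lines and comments were included in `lines_of_code`.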

36

Measurement Basics
• Measurement for prediction
  – So far we’ve talked about measuring some entity that already exists
    • Useful for assessing the current situation or understanding what has happened in the past
  – In many cases, we want to predict an attribute of an entity that doesn’t yet exist (e.g., project cost, reliability of fielded system).
    • Requires a model relating measurements that can be taken now to the attributes that will be predicted
      – Empirical cost models
      – Software reliability models
    • A model is not sufficient by itself to perform the required prediction. We need a prediction system including:
      – A model relating the measurements to the desired attribute
      – A procedure for determining model parameters
      – Procedures for interpreting model results
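The three parts of a prediction system can be illustrated with a deliberately small sketch. It assumes a power-law model, effort = a · size^b, which is a common shape for empirical cost models but is not prescribed by the lecture. The parameter procedure here is an ordinary least-squares fit on logarithms, and the past-project data are invented for illustration.

```python
import math


def fit_power_law(sizes, efforts):
    """Parameter procedure: least-squares fit of log(effort) = log(a) + b * log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_effort(a, b, size):
    """Model: predicted effort for a project of the given size."""
    return a * size ** b


# Hypothetical measurements of past projects: size in KLOC, effort in workmonths.
past_sizes = [10, 25, 50, 120]
past_efforts = [28, 80, 170, 460]

a, b = fit_power_law(past_sizes, past_efforts)
estimate = predict_effort(a, b, 75)

# Interpretation step: report the estimate as a rough figure, not an exact value.
print(f"a={a:.2f}, b={b:.2f}, estimated effort for 75 KLOC is about {estimate:.0f} workmonths")
```

The fit depends entirely on the quality of the past measurements, so the estimate should be read together with whatever error margin those data support.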

37

Measurement Basics

• Measurement for prediction
  – Accurate predictive measurement is always based on measurement in the assessment sense
  – Everyone wants to predict key determinants of success (e.g., effort to build a new system, operational reliability), but...
  – There are no magic models. They all depend on:
    • High-quality measurements of past projects
    • High-quality measurements of current project

38

Measurement Basics

• Measurement scales and scale types
  – A measurement scale is our mapping, M, together with the empirical and numerical relation systems.
    • If the relation systems (domain and range) are obvious from context, sometimes M alone is referred to as the scale.
  – Three important questions concerning representations and scales:
    • How do we determine when one numerical relation system is preferable to another?
    • How do we know if a particular empirical relation system has a representation in a given numerical relation system?
    • What do we do when we have several different possible representations (and hence many scales) in the same numerical relation system?

39

Measurement Basics
• Measurement scales and scale types (cont’d)
  – Three questions:
    • How do we determine when one numerical relation system is preferable to another?
      – Answer: We could map into a symbolic relation system instead, but in practice symbolic manipulation is unwieldy compared with numerical manipulation. We try to use real numbers whenever possible.
    • How do we know if a particular empirical relation system has a representation in a given numerical relation system?
      – Answer: This is known as the representation problem, one of the basic problems of measurement theory. This is a solved problem for various types of relation systems characterized by specific axioms. Discussion is beyond the scope of this course, but solutions can be found in texts on measurement theory.
    • What do we do when we have several different possible representations (and hence many scales) in the same numerical relation system?
      – Answer: This is the uniqueness problem. Following slides address this question.

40

Measurement Basics
• Measurement scale types
  – Nominal
  – Ordinal
  – Interval
  – Ratio
  – Absolute
• One relational system is richer than another if all relationships in the second system are contained in the first.
  – Scale types above are listed in order of increasing richness.

41

Measurement Basics
• Measurement scale types (cont’d)
• Why is this important?
  – If we have a satisfactory measure for an attribute with respect to an empirical relation system, we want to know what other measures exist that are acceptable.
  – Mapping from one acceptable measure to another is called an admissible transformation.
    • Example – when considering length, admissible transformations are of the form M’ = aM, where a > 0. Transformations of the form M’ = b + aM (b ≠ 0) or M’ = aM^b (b ≠ 1) are not admissible.
    • The more restrictive the class of admissible transformations, the more sophisticated the measurement scale.
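The length example can be checked numerically: a pure rescaling keeps ratio statements such as "x is twice as long as y" true, while adding an offset does not. In the sketch below the lengths and the offset of 10 are hypothetical; only the forms M’ = aM and M’ = aM + b come from the slide.

```python
import math


def to_centimetres(inches):
    """Admissible for length: a rescaling M' = aM with a = 2.54."""
    return 2.54 * inches


def shifted(inches, offset=10.0):
    """Not admissible for length: M' = aM + b with b != 0."""
    return 2.54 * inches + offset


x_in, y_in = 72.0, 36.0  # hypothetical lengths in inches; x is twice as long as y

ratio_inches = x_in / y_in
ratio_cm = to_centimetres(x_in) / to_centimetres(y_in)
ratio_shifted = shifted(x_in) / shifted(y_in)

print(ratio_inches, ratio_cm, ratio_shifted)
```

The first two ratios agree, so the statement survives the change of unit. The third differs, which is why the shifted mapping is not an acceptable measure of length.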

42

Measurement Basics
  – Nominal scale
    • Most primitive form of measurement – define classes or categories, and place each entity in a particular class or category
    • Two major characteristics
      – Empirical relation system consists only of different classes – no notion of ordering
      – Any distinct numbering or symbolic representation of the classes is an acceptable measure – no notion of magnitude associated with numbers or symbols.
    • Any two mappings, M and M’, will be related to each other in that M’ can be obtained from M by a one-to-one mapping
    • Example – software faults can belong to one of the following classes, according to where they were first introduced during development:
      – Specification
      – Design
      – Code

43

Measurement Basics
• Measurement types and scale
  – Ordinal scale
    • Augments nominal scale with ordering information.
    • Three major characteristics
      – Empirical relation system consists of classes that are ordered with respect to the attribute
      – Any mapping preserving the ordering (i.e., a monotonic function) is acceptable
      – Numbers represent ranking only, so arithmetic operations have no meaning
    • Set of admissible transformations is the set of all monotonic mappings
    • Example – software “complexity” – two valid measures

        Meaning             First measure   Second measure
        Trivial             1               2
        Simple              2               4
        Moderate            3               6
        Complex             4               9
        Incomprehensible    5               12
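The two complexity measures above order the classes identically, which is all an ordinal scale requires. The sketch below checks that, and also shows why arithmetic on ordinal values is unsafe: a comparison of means gives different answers under the two measures, while a comparison of medians does not. The two groups of ratings are hypothetical.

```python
from statistics import mean, median

LABELS = ["Trivial", "Simple", "Moderate", "Complex", "Incomprehensible"]

# The two valid ordinal measures from the slide.
first_measure = dict(zip(LABELS, [1, 2, 3, 4, 5]))
second_measure = dict(zip(LABELS, [2, 4, 6, 9, 12]))


def same_ordering(m1, m2):
    """True if both measures rank the classes in the same order."""
    return sorted(m1, key=m1.get) == sorted(m2, key=m2.get)


# Hypothetical complexity ratings for two groups of modules.
group_x = ["Trivial", "Moderate", "Incomprehensible"]
group_y = ["Moderate", "Moderate", "Moderate"]


def groups_equal(statistic, measure):
    """Does the statistic take the same value for both groups under this measure?"""
    return statistic([measure[c] for c in group_x]) == statistic([measure[c] for c in group_y])


print(same_ordering(first_measure, second_measure))
print(groups_equal(median, first_measure), groups_equal(median, second_measure))
print(groups_equal(mean, first_measure), groups_equal(mean, second_measure))
```

Under the first measure the two groups have equal means, under the second they do not, so "the groups have the same average complexity" is not a statement that survives a change between valid ordinal measures.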

44

Measurement Basics
• Measurement type and scale
  – Interval scale
    • Captures information about size of intervals that separate classes.
    • Three characteristics
      – Preserves order
      – Preserves differences, but not ratios
      – Addition and subtraction are acceptable, but not multiplication and division
    • Class of admissible transformations is the set of affine transformations: M’ = aM + b, where a > 0.
    • Example – software complexity – suppose the difference in complexity between a trivial and a simple system is the same as that between a simple and a moderate system. Where this equal step applies to each class, we have an attribute measurable on an interval scale. Three valid measures:

        Meaning             First measure   Second measure   Third measure
        Trivial             1               0                1.1
        Simple              2               2                2.2
        Moderate            3               4                3.3
        Complex             4               6                4.4
        Incomprehensible    5               8                5.5
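One way to confirm that the interval-scale measures above are related by affine transformations is to recover a and b from two classes and check the remaining classes. The sketch below does this; the function name `affine_fit` is an assumption, and the last mapping (2, 4, 6, 9, 12) is the second ordinal measure from the ordinal-scale example, included as a contrast.

```python
import math

LABELS = ["Trivial", "Simple", "Moderate", "Complex", "Incomprehensible"]

first = dict(zip(LABELS, [1, 2, 3, 4, 5]))
second = dict(zip(LABELS, [0, 2, 4, 6, 8]))
third = dict(zip(LABELS, [1.1, 2.2, 3.3, 4.4, 5.5]))
ordinal_only = dict(zip(LABELS, [2, 4, 6, 9, 12]))  # valid on the ordinal scale only


def affine_fit(source, target):
    """Return (a, b) with target = a * source + b on every class and a > 0, else None."""
    lo, hi = LABELS[0], LABELS[1]
    a = (target[hi] - target[lo]) / (source[hi] - source[lo])
    b = target[lo] - a * source[lo]
    fits = all(
        math.isclose(target[c], a * source[c] + b, abs_tol=1e-9) for c in LABELS
    )
    return (a, b) if fits and a > 0 else None


print(affine_fit(first, second))        # second = 2 * first - 2
print(affine_fit(first, third))         # third is approximately 1.1 * first
print(affine_fit(first, ordinal_only))  # no affine relation exists
```

The ordinal-only mapping preserves order but not equal steps, so it cannot be obtained from the first measure by any transformation of the form M’ = aM + b.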

45


  – Ratio scale
    • Most useful scale, common in physical sciences – captures information about ratios
    • 4 characteristics
      – Preserves ordering, size of intervals between entities, and ratios between entities
      – There is a zero element, representing total lack of the attribute
      – Measurement mapping must start at 0 and increase at equal intervals (units)
      – All arithmetic can be meaningfully applied to classes in the range of the mapping.
    • Acceptable transformations are ratio transformations – M’ = aM, where a is a positive scalar.
    • Example – program length can be measured by lines of code, number of characters, etc. Number of characters may be obtained by multiplying the number of lines by the average number of characters per line.

46


  – Absolute scale
    • Most restrictive in terms of admissible transformations
    • For any two measures, M and M’, there’s only one admissible transformation (identity transformation), since there’s only one way to make the measurement.
    • 4 characteristics
      – Measurement is made simply by counting the number of elements in the entity set.
      – Attribute always takes the form of “number of occurrences of x in the entity”
      – Only one possible measurement mapping, namely the actual count
      – All arithmetic analysis of the resulting count is meaningful.
    • Example – lines of code in a module is an absolute scale measure.

47

Measurement Basics
• Measurement type and scale - summary

    Scale type   Admissible transformations      Examples
    Nominal      1-1 mapping                     Labeling, classifying entities
    Ordinal      Monotonic increasing function   Preference, hardness, air quality, intelligence tests (raw scores)
    Interval     M’ = aM + b, a > 0              Relative time, temperature (Fahrenheit, Celsius), intelligence tests (standardized scores)
    Ratio        M’ = aM, a > 0                  Time interval, length, temperature (Kelvin)
    Absolute     M’ = M                          Counting entities

48

Measurement Basics

• Meaningfulness in measurement
  – After making measurements, key question is “can we deduce meaningful statements about entities being measured?”
  – Harder to answer than it first appears – consider these statements:
    1. The number of errors discovered during the integration testing of a program X was at least 100
    2. The cost of fixing each error in program X is at least 100
    3. A semantic error takes twice as long to fix as a syntactic error
    4. A semantic error is twice as complex as a syntactic error

49

Measurement Basics
• Meaningfulness in measurement (cont’d)
  – First statement seems to make sense
  – Second statement doesn’t make sense – the number of errors may be specified without reference to a particular scale, but the cost to fix them must be stated with reference to one (a unit is needed)
  – Statement 3 seems sensible – the ratio of time taken is the same, whether time is measured in seconds, hours, or fortnights
  – Statement 4 does not appear to be meaningful and requires clarification:
    • If complexity means time to understand the error, then it makes sense
    • Other definitions of complexity may not admit measurement on a ratio scale (e.g., examples in previous slides), in which case statement 4 is meaningless.

50

Measurement Basics

• Meaningfulness in measurement
  – Definition: a statement involving measurement is meaningful if its truth value is invariant under transformations of allowable scales (i.e., under all admissible transformations).

51

Measurement Basics

• Meaningfulness in measurement – examples
  – John is twice as tall as Fred
    • Implies measures are at least on the ratio scale. It’s meaningful because no matter what transformation we use (and all we have is ratio transformations), the truth or falsity of the statement remains constant.
  – Temperature in Tokyo today is twice that in London
    • Implies a ratio scale, but is not meaningful. We measure in °F and °C, which are interval scales. If Tokyo is 40 °C and London is 20 °C, then the statement is true, but the same temperatures are 104 °F and 68 °F, for which the statement is no longer true.
  – Failure x is twice as critical as failure y
    • Not meaningful if we only have an ordinal scale for criticality (common scale for software failures is catastrophic, significant, moderate, minor, and insignificant).
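The Tokyo and London example can be reproduced directly. The sketch below uses the standard Celsius-to-Fahrenheit conversion, which is an admissible (affine) transformation for an interval scale, and evaluates the "twice as hot" statement in both units; the variable names are illustrative.

```python
def celsius_to_fahrenheit(c):
    """Affine transformation M' = aM + b with a = 9/5 and b = 32."""
    return c * 9 / 5 + 32


tokyo_c, london_c = 40.0, 20.0
tokyo_f = celsius_to_fahrenheit(tokyo_c)
london_f = celsius_to_fahrenheit(london_c)

# "Tokyo is twice as hot as London": truth value depends on the unit.
twice_in_celsius = tokyo_c == 2 * london_c
twice_in_fahrenheit = tokyo_f == 2 * london_f

# "Tokyo is hotter than London": truth value is the same in both units.
hotter_in_celsius = tokyo_c > london_c
hotter_in_fahrenheit = tokyo_f > london_f

print(tokyo_f, london_f)
print(twice_in_celsius, twice_in_fahrenheit)
print(hotter_in_celsius, hotter_in_fahrenheit)
```

The ratio statement changes truth value under an admissible transformation, so it is not meaningful on this scale, whereas the ordering statement is preserved.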

52

Measurement Basics

• Meaningfulness in measurement
  – Note that our notion of meaningfulness says nothing about
    • Usefulness
    • Practicality
    • Whether the measurement is worthwhile
    • Ease of measurement

53

Measurement Basics
• Statistical operations on measures
  – Analyses don’t have to be sophisticated, but we want to know something about how a set of data is distributed.
  – What types of statistical analysis are relevant to a given measurement scale?

    Scale type   Defining relations                      Examples of appropriate statistics
    Nominal      Equivalence                             Mode, frequency
    Ordinal      Equivalence, greater than               Median, percentile, Spearman r_S, Kendall τ, Kendall W
    Interval     Equivalence, greater than,              Mean, standard deviation, Pearson product-moment
                 known ratio of any intervals            correlation, multiple product-moment correlation
    Ratio        Equivalence, greater than,              Geometric mean, coefficient of variation
                 known ratio of any intervals,
                 known ratio of any two scale values
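The table can be turned into a small lookup that answers "which statistics may I use for data on this scale?". The sketch below assumes, as is usual for this kind of table, that statistics appropriate for a weaker scale remain appropriate for every stronger one; the names `STATISTICS` and `appropriate_statistics` are hypothetical.

```python
# Scale types from weakest to strongest, as in the table.
SCALE_ORDER = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become appropriate at each scale type.
STATISTICS = {
    "nominal": ["mode", "frequency"],
    "ordinal": ["median", "percentile", "Spearman rank correlation",
                "Kendall tau", "Kendall W"],
    "interval": ["mean", "standard deviation",
                 "Pearson product-moment correlation",
                 "multiple product-moment correlation"],
    "ratio": ["geometric mean", "coefficient of variation"],
}


def appropriate_statistics(scale_type):
    """List statistics usable on `scale_type`, including those of weaker scales.

    Assumption: a statistic valid on a weaker scale stays valid on a stronger one.
    """
    position = SCALE_ORDER.index(scale_type)  # raises ValueError for unknown names
    allowed = []
    for scale in SCALE_ORDER[: position + 1]:
        allowed.extend(STATISTICS[scale])
    return allowed


print(appropriate_statistics("ordinal"))
```

For an ordinal criticality rating, for example, the lookup offers the median but not the mean, in line with the table.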

54

Measurement Basics

• Indirect measurement and meaningfulness
  – Done when measuring a complex attribute in terms of simpler sub-attributes
  – Scale type for an indirect measure M is generally no stronger than the weakest of the scale types of the sub-attributes
    • Example – testing efficiency = defects/effort
      – Defects is on the absolute scale, while effort is on the ratio scale. Therefore testing efficiency is on the ratio scale.
    • What is the scale type of E = 2.7v + 121w + 26x + 12y + 22z − 497, where
      » v is the number of program instructions
      » x and y are the number of internal and external documents
      » z is the program size in words
      » w is a subjective measure of complexity
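The "weakest sub-attribute" rule can be sketched as a minimum over an ordering of scale types. The example below is an assumption-laden illustration: the counts v, x, y, and z are treated as absolute-scale measures and the subjective rating w as ordinal, which the slide does not state explicitly.

```python
# Scale types ranked from weakest to strongest.
SCALE_STRENGTH = {"nominal": 0, "ordinal": 1, "interval": 2, "ratio": 3, "absolute": 4}


def indirect_scale_bound(*sub_scales):
    """Return the weakest scale type among the sub-attributes.

    By the rule on the slide, an indirect measure is generally no stronger than this.
    """
    return min(sub_scales, key=SCALE_STRENGTH.__getitem__)


# Testing efficiency = defects / effort (absolute and ratio sub-attributes).
testing_efficiency_scale = indirect_scale_bound("absolute", "ratio")

# E = 2.7v + 121w + 26x + 12y + 22z - 497, with assumed sub-attribute scales:
# v, x, y, z are counts (assumed absolute); w is a subjective rating (assumed ordinal).
e_scale_bound = indirect_scale_bound(
    "absolute",  # v
    "ordinal",   # w
    "absolute",  # x
    "absolute",  # y
    "absolute",  # z
)

print(testing_efficiency_scale, e_scale_bound)
```

Under these assumptions the subjective complexity term limits E to at most an ordinal scale, however precise the other terms are.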

55

Overview




56

A Goal-Based Software Measurement Framework

1. Classifying software measures

2. Determining what to measure

57

A Goal-Based Software Measurement Framework
• Classifying software measures
  – Three types of software entities to measure
    • Processes – collections of software related activities
    • Products
    • Resources – entities required by a process activity
  – Within each class, we have
    • Internal attributes – measured purely in terms of the entity itself
    • External attributes – measured with respect to how the entity relates to its environment. Behavior of the entity is important
  – Managers want to be able to measure and predict external attributes
    • However, external attributes are more difficult to measure than internal ones, and are measured late in the development process
    • Desire is to predict external attributes in terms of more easily measured internal attributes

58


• Determining what to measure
  – Measurement is useful only if it helps us understand the underlying process or one of its resultant products
  – Goal-Question-Metric (GQM) has proven effective in selecting and implementing metrics
    • List the major goals of the development project
    • Derive from each goal the questions that must be answered to determine if goals are being met
    • Decide what must be measured in order to be able to answer the questions adequately

59

• GQM example – goal is to evaluate effectiveness of coding standard (figure: tree with three levels)
  – Goal
    • Evaluate effectiveness of coding standard
  – Questions
    • Who is using standard?
    • What is coder productivity?
    • What is code quality?
  – Metrics
    • Proportion of coders
      – Using standard
      – Using language
    • Experience of coders
      – With standard
      – With language
      – With environment, etc.
    • Code size (lines of code, function points, etc.)
    • Effort
    • Errors
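A GQM tree maps naturally onto a small nested data structure. The sketch below encodes the coding-standard example; the class names are hypothetical, and the assignment of each metric to a particular question is an assumed reading of the figure, since the flattened slide does not preserve the tree's edges.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)


@dataclass
class Goal:
    text: str
    questions: list = field(default_factory=list)

    def metrics_to_collect(self):
        """All metrics needed for this goal, in order, without duplicates."""
        collected = []
        for question in self.questions:
            for metric in question.metrics:
                if metric not in collected:
                    collected.append(metric)
        return collected


# Assumed assignment of metrics to questions (not recoverable from the slide text).
goal = Goal(
    "Evaluate effectiveness of coding standard",
    [
        Question("Who is using standard?",
                 ["Proportion of coders", "Experience of coders"]),
        Question("What is coder productivity?",
                 ["Code size", "Effort"]),
        Question("What is code quality?",
                 ["Errors"]),
    ],
)

print(goal.metrics_to_collect())
```

Working top-down from the goal keeps every collected metric tied to a question someone actually needs answered.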

60

• GQM example 2 – AT&T goals, questions, metrics
  – Goal: Plan
    • Questions
      – How much does the inspection process cost?
      – How much calendar time does the inspection process take?
    • Metrics
      – Average effort per KLOC
      – Percentage of reinspections
      – Total KLOC inspected
  – Goal: Monitor and control
    • Questions
      – What is the quality of the inspected software?
      – To what degree did the staff conform to the procedures?
      – What is the status of the inspection process?
    • Metrics
      – Average faults detected per KLOC
      – Average inspection rate
      – Average preparation rate
      – Average lines of code inspected
      – Total KLOC inspected
  – Goal: Improve
    • Questions
      – How effective is the inspection process?
      – What is the productivity of the inspection process?
    • Metrics
      – Defect removal efficiency
      – Average number of faults detected per KLOC
      – Average effort per fault detected




61

• Templates for goal definition
  – Purpose – To (characterize, evaluate, predict, motivate, etc.) the (process, product, model, metric, etc.) in order to (understand, assess, manage, engineer, learn, improve, etc.) it.
    • Example – To evaluate the maintenance process in order to improve it.
  – Perspective – Examine the (cost, effectiveness, correctness, defects, changes, product measures, etc.) from the viewpoint of the (developer, manager, customer, user, etc.)
    • Example – Examine the cost from the viewpoint of the manager.
  – Environment – The environment consists mainly of the following: process factors, people factors, problem factors, methods, tools, constraints, etc.
    • Example – The maintenance staff are poorly motivated programmers who have limited access to tools.

