Transcript of INFO 631 (Prof. Glenn Booker), Week 3 – Complexity Metrics and Models, www.ischool.drexel.edu

Page 1:

www.ischool.drexel.edu

INFO 631 Prof. Glenn Booker

Week 3 – Complexity Metrics and Models


Page 2:

Origin

• Complexity metrics were developed by computer scientists and software engineers

• Strongly based on empirical (real world) measurement, with little theory

• Primarily broken into internal and external measures

Page 3:

Internal versus External

• Internal measures describe the complexity within a module (number of decisions, loops, calculations, etc.)

• External measures describe relationships among modules (program or function calls, external file activities, input/output, etc.)

Page 4:

Internal Measures

Page 5:

Internal Product Attributes

• Size measures
– Input to prediction models
– Normalizing factor for cost, productivity, etc.
– Progress during development

• Typically use lines of code (LOC) or function point counts
– LOC is a better measure for predicting cost and schedule

Page 6:

Lines of Code

• A simple complexity metric, often based on the number of executable statements or instruction statements (see the counting sketch below)
– The highest defect rates often occur in small modules
– Larger modules tend to have a lower defect rate (when defects exist at all), until they become too cumbersome
– Optimum module size is roughly 250 lines
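Counting executable statements requires a real parser for the target language; as a rough illustration of the idea, here is a minimal Python sketch that counts non-blank, non-comment lines (it assumes '//' line comments, and the sample function is made up):

# Rough LOC counter: counts non-blank, non-comment source lines.
# Minimal sketch only; it assumes '//' line comments and ignores
# block comments, so a real tool would need a proper parser.
def count_loc(source_text: str) -> int:
    count = 0
    for line in source_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            count += 1
    return count

example = """
int add(int a, int b) {
    // return the sum
    return a + b;
}
"""
print(count_loc(example))  # -> 3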

Page 7:

Function Points

• Function points help avoid biases due to the programming language(s) used

• Provide a more “fair” basis for comparing different environments

• Focus on how much work the program accomplishes, not how concisely it is expressed

Page 8:

Halstead Metrics

• Also known as Software Science (1977)
• Examine the program as compilable “tokens”
• Tokens are either operators (+, -) or operands (variables)
• Derive metrics such as Vocabulary, Length, Volume, Difficulty, etc. (see the sketch below)
• Not widely used
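The standard Halstead formulas can be computed directly from the token counts; a minimal Python sketch (the counts in the example call are made up, and extracting them from real source would need a language-specific scanner):

# Halstead "Software Science" measures from token counts.
# n1/n2 = distinct operators/operands; N1/N2 = total occurrences.
import math

def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    vocabulary = n1 + n2                      # n = n1 + n2
    length = N1 + N2                          # N = N1 + N2
    volume = length * math.log2(vocabulary)   # V = N log2(n)
    difficulty = (n1 / 2) * (N2 / n2)         # D = (n1/2)(N2/n2)
    effort = difficulty * volume              # E = D * V
    return {"vocabulary": vocabulary, "length": length, "volume": volume,
            "difficulty": difficulty, "effort": effort}

# e.g. 10 distinct operators, 15 distinct operands,
# 40 operator occurrences, 60 operand occurrences:
print(halstead(n1=10, n2=15, N1=40, N2=60))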

Page 9:

Data Structure (Halstead)

• Halstead’s η2 = the number of distinct operands in a module
– Operands include the number of variables, the number of unique constants, and the number of labels

• Operand usage (OU)
– OU = η2 / N2, where N2 is the total number of operand references
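As a made-up illustration: a module with η2 = 12 distinct operands that are referenced N2 = 60 times in total has OU = 12 / 60 = 0.2, i.e. each operand is used about five times on average.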

Page 10:

Software Complexity

• A characteristic of software that influences the resources needed to build and maintain it

• Many different characteristics of software relate to complexity

• These complexity characteristics revolve around the structure of the software

Page 11:

Types of Structural Measures

• Control flow
– Addresses the sequence in which instructions are executed
– Iteration and looping

• Data flow
– Follows the trail of data as it is created and handled
– Depicts the behavior of data as it interacts with the program

Page 12:

Types of Structural Measures

• Data structure
– Concerned with the organization of the data itself
– Provides information about difficulties in handling data and in defining test cases

Page 13:

Control Flow

• Modeled by directed graphs (control flow graphs)
– Each node corresponds to a single program statement
– Arcs (directed edges) indicate the flow of control from one statement to another

Page 14:

Control Flow

• Control flow graphs are useful for:
– Analysis (estimating the number of defects)
– Expressing complexity as a single value
– Assessing testability and test coverage

Page 15:

Basic Control Constructs

[Flow-graph diagrams of the basic control constructs: If A then X else Y; If A then X; While A do X; Repeat X until A; Case A of a1: X1, ..., an: Xn. Edge labels: t = true, f = false.]

Page 16:

Cyclomatic Complexity

• McCabe, 1976
• Based on a program’s control flow graph
• Related to the number of separate regions in the graph, or the number of linearly independent paths through the program
• Complexity MC = edges - nodes + 2P, where P is the number of connected components (P = 1 for a single module); see the sketch below
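To make the formula concrete, here is a minimal Python sketch; the dictionary-of-successors graph representation and the node names are only illustrative:

# Cyclomatic complexity MC = edges - nodes + 2P from a control flow graph.
# 'graph' maps each node to the list of nodes it can branch to;
# 'components' is P, the number of connected components (1 for one module).
def cyclomatic_complexity(graph: dict, components: int = 1) -> int:
    nodes = len(graph)
    edges = sum(len(targets) for targets in graph.values())
    return edges - nodes + 2 * components

# A single if/else: entry -> (then | else) -> exit.
flow = {
    "entry": ["then", "else"],
    "then": ["exit"],
    "else": ["exit"],
    "exit": [],
}
print(cyclomatic_complexity(flow))  # -> 2 (one decision, two independent paths)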

Page 17:

Cyclomatic Complexity

• Complexity under 10 is generally desired
• Can also find MC as the number of binary (yes/no) decisions plus one (worked example below)
– Multiple-choice decisions with ‘n’ choices count as (n-1) binary decisions
• Ignores differences among specific types of control structures
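For example, in a made-up module containing three if statements and one four-way case statement, MC = 3 + (4 - 1) + 1 = 7, so seven linearly independent paths would have to be covered to exercise every decision outcome.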

Page 18:

Cyclomatic Complexity

• Uses of the complexity metric:
– Identify complex modules needing detailed inspection or redesign
– Identify simple modules needing minimal inspection and/or testing
– Estimate programming, testing, and maintenance effort
– Identify potentially troublesome code

Page 19:

Control Flow Representation of Programs

• Software programs can be represented by linear directed segments combined with the basic control flow constructs

• Control flow constructs may be nested, e.g. an IF statement can be inside of a WHILE loop

Page 20:

Control Flow Representation of Programs

• Example:

[Figure: example control flow graph, with its elements numbered 1 through 14.]

McCabe cyclomatic complexity (MC) counts the number of linearly independent paths through a program:

MC = # of edges - # of nodes + 2

Linearly independent paths for this example:
<2, 11>
<2, 10, 12, 14>
<2, 10, 12, 13, 12, 14>
<1, 3, 5, 6, 9>
<1, 4, 6, 9>
<1, 4, 6, 7, 8, 9>

Page 21:

Control Flow--Linearly Independent Paths

[Figure: control flow graph with seven nodes labeled a through g and ten edges.]

MC = edges - nodes + 2 = 10 - 7 + 2 = 5

Set of linearly independent paths:

b1: abcg

b2: abcbcg

b3: abefg

b4: adefg

b5: adfg

Any arbitrary path is equal to a linear combination of the linearly independent paths listed above

For example, path abcbefg is equal to:

b2 + b3 - b1
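One way to check that combination is to represent each path by how many times it traverses each directed edge (consecutive node pairs); a minimal Python sketch, using the node labels from the figure:

# Verify that path abcbefg = b2 + b3 - b1 using edge-traversal counts.
from collections import Counter

def edge_vector(path: str) -> Counter:
    # consecutive node pairs along the path are the edges it uses
    return Counter(zip(path, path[1:]))

b1, b2, b3 = edge_vector("abcg"), edge_vector("abcbcg"), edge_vector("abefg")
combo = Counter()
for edge in set(b1) | set(b2) | set(b3):
    combo[edge] = b2[edge] + b3[edge] - b1[edge]

print(+combo == edge_vector("abcbefg"))  # -> True (+combo drops zero entries)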

Page 22:

Knots - Control Flow Crossovers

• Knot measure -- total number of points at which control flow lines cross

   IF (TIME) 30,30,10
10 CALL TEMP1
   IF (X1) 20,20,40
20 Y1=Y+1
   Y2=0
   CALL TEMP2
   GO TO 50
30 Z1=1
40 CALL TEMP3
   Z2=Z2+1
50 CALL TEMP4

How many are here?

Page 23:

Syntactic Constructs

• Examine the effect of using specific control structures on the defect rate
• Is, by definition, language-specific
• Can result in statistically significant relationships
– e.g., used by Lo to show that DO WHILE should be avoided in COBOL

Page 24:

External Measures

Page 25:

Computational Complexity

• Examines algorithmic efficiency and use of machine resources (memory, I/O, storage)

• Studies quantitative aspects of solutions to computational problems

• Examples may include sorting efficiency for a database, managing I/O constraints across a large scale network, etc.

Page 26:

Psychological Complexity

• Concerned with characteristics of software that affect human performance
– Injection of defects (when and why does a programmer make errors?)
– Ease of building the software (effort required)
– Ease of maintenance (effort required)

Page 27:

Data Structure (Database)

• Database size per program size (DBSPPS)
– DBSPPS = DBS / PS
• where DBS is the database size in bytes or characters
• and PS is the program size in source instructions
– Used in the COCOMO model as a cost driver
• An ordinal-scale rating is derived from DBSPPS (illustrated below)
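A sketch of how such an ordinal rating could be derived from the ratio; note that the threshold values below are the commonly cited COCOMO II DATA cost-driver boundaries, assumed here for illustration rather than taken from the slides:

# Ordinal DATA cost-driver rating from the DBSPPS (D/P) ratio.
# Assumption: thresholds follow the commonly cited COCOMO II DATA boundaries.
def data_rating(db_bytes: float, program_sloc: float) -> str:
    dp = db_bytes / program_sloc          # DBSPPS = DBS / PS
    if dp < 10:
        return "Low"
    elif dp < 100:
        return "Nominal"
    elif dp < 1000:
        return "High"
    return "Very High"

print(data_rating(db_bytes=2_000_000, program_sloc=50_000))  # D/P = 40 -> Nominal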

Page 28:

Fan-in and Fan-out

• Focus is the interaction among code modules
– Fan-in = # of modules which call a given module
– Fan-out = # of modules which are called by a given module
• Or, more formally...

Page 29:

Fan-in and Fan-out

• Fan-in of a module is the number of local flows terminating at the module, plus the number of data structures from which info is retrieved by the module

• Fan-out of a module is the number of local flows that emanate from the module, plus the number of data structures (tables, arrays) that are updated by the module
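A minimal Python sketch of the simpler, call-based counts (it ignores the data-structure retrievals and updates mentioned above, and the module names are borrowed from the call-graph example later in the deck):

# Call-based fan-in / fan-out per module.
# 'calls' maps each module to the list of modules it calls.
def fan_in_out(calls: dict) -> dict:
    metrics = {m: {"fan_in": 0, "fan_out": len(callees)}
               for m, callees in calls.items()}
    for caller, callees in calls.items():
        for callee in callees:
            metrics[callee]["fan_in"] += 1
    return metrics

calls = {
    "Main": ["Read_Scores", "Find_Ave", "Print_Ave"],
    "Read_Scores": [],
    "Find_Ave": [],
    "Print_Ave": [],
}
print(fan_in_out(calls))
# Main: fan_in 0, fan_out 3; each called module: fan_in 1, fan_out 0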

Page 30:

Fan-in and Fan-out

• Do fan-in and fan-out affect software quality?
– Modules with large fan-in may be interpolation or look-up routines, and show no defect correlation
– Large fan-out often relates to a high defect rate, and shows a high defect correlation
• Are large fan-in and fan-out bad?

Page 31:

Fan-in and Fan-out

• Information flow complexity
– Henry and Kafura: Size × (fan-in × fan-out)^2
– Shepperd: (fan-in × fan-out)^2
• The Henry and Kafura measure helps predict the number of software maintenance problems

Henry, S. and D. Kafura, IEEE Transactions on Software Engineering, SE-7(5), 1981, pp. 510-518.
Shepperd, M., Software Engineering Journal, 5(1), January 1990, pp. 3-10.

Page 32:

Structure Metrics

• The Shepperd measure correlates with software development time
• Information flow metric (Henry & Selig): HC = C × (fan-in × fan-out)^2
– where C is the cyclomatic complexity of the module
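These structure metrics are simple to compute once the per-module counts are known; a minimal Python sketch with made-up numbers:

# Information-flow structure metrics for one module.
# size = module length (e.g. LOC), cc = its cyclomatic complexity.
def henry_kafura(size: int, fan_in: int, fan_out: int) -> int:
    return size * (fan_in * fan_out) ** 2

def shepperd(fan_in: int, fan_out: int) -> int:
    return (fan_in * fan_out) ** 2

def henry_selig(cc: int, fan_in: int, fan_out: int) -> int:
    return cc * (fan_in * fan_out) ** 2

# A 120-LOC module with fan-in 3, fan-out 4, cyclomatic complexity 6:
print(henry_kafura(120, 3, 4))  # 120 * 144 = 17280
print(shepperd(3, 4))           # 144
print(henry_selig(6, 3, 4))     # 6 * 144 = 864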

Page 33:

Structure Metrics

• System complexity (Card & Glass)
– Based on structural complexity (average fan-out squared) and data complexity (based on the number of I/O variables and fan-out)
– Quantified the effect of complexity on error rate

Page 34:

Module Call Graph

• Module: a contiguous sequence of program statements, bounded by boundary elements, having an aggregate identifier
– Or, a distinct, named group of LOC

• The module call graph shows which modules call each other, and what key information is passed among them

Page 35:

Module Call Graph example

[Figure: module call graph for an averaging program. Main calls Read_Scores, Find_Ave, and Print_Ave; the data items passed among the modules are scores, eof, and average.]

Page 36:

Module Coupling Measures

• Average number of calls per module (ANCPM)
– ANCPM = (number of interconnections) / (number of modules)

• Fraction of modules that make calls (FMC)
– FMC = (number of modules that make calls) / (number of modules)
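For instance, in a made-up system of 20 modules with 45 call interconnections, where 12 modules make at least one call: ANCPM = 45 / 20 = 2.25 and FMC = 12 / 20 = 0.6.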

Page 37:

Information Flow Measures

• Types of information flows
– Local direct flow
• A module invokes a second module and passes info to it
• The invoked module returns a result to the caller
– Local indirect flow
• An invoked module returns info that is subsequently passed to a second invoked module
– Global flow
• Info flows from one module to another via a global data structure

Page 38:

IEEE-STD-982

• Number of Entries and Exits per Module, ‘m’
– Like fan-in and fan-out
– m = entries + exits
• Software Science measures

Page 39:

IEEE-STD-982

• Graph-Theoretic Complexity
– Static Complexity
• C = edges - nodes + 1
– Generalized Static Complexity
• Based on summing the resources needed for each module (e.g., storage, access time, etc.)
– Dynamic Complexity
• Complexity as it changes over time across a network

Page 40:

IEEE-STD-982

• Cyclomatic complexity
• Minimal Unit Test Case Determination
– Determine the number of independent paths through a module, to get the minimum number of test cases for unit testing
• Data or information flow complexity
– Fan-in and fan-out of variables

Page 41:

IEEE-STD-982

• Design Structure adds weighted (%) average of six parameters:

1. Whether designed top down (Y/N)

2. Module inter-dependence

3. Module dependence on prior processing

4. Database size (# of elements)

5. Database compartmentalization

6. Module single entrance and exit (Y/N)

– Weighting is chosen to meet project needs
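A rough sketch of how such a weighted average could be computed; the parameter scores and equal weights below are invented for illustration, and IEEE Std 982 defines the actual derivatives:

# Weighted-average design structure measure.
# Each parameter is scored 0..1 (1 = worst for that attribute) and
# the weights, chosen to meet project needs, sum to 1.
def design_structure(scores: dict, weights: dict) -> float:
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[name] * scores[name] for name in scores)

scores = {"not_top_down": 0, "module_interdependence": 0.3,
          "depends_on_prior_processing": 0.2, "database_size": 0.5,
          "database_compartmentalization": 0.4, "not_single_entry_exit": 0}
weights = {name: 1 / 6 for name in scores}   # equal weighting for the example
print(round(design_structure(scores, weights), 3))  # -> 0.233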

Page 42:

Other Measures

• Compiler measures
– Size (bytes of compiled code)
– Number of symbols and variables
– Cross-reference of all labels
– Statement count

Page 43:

Other Measures

• Configuration Management Library Measures
– Number of code modules
– Number of versions of each module
– History of change dates for each module
– Module size
– Number of related documents for each module

Page 44:

Availability Metrics

• Most information systems are critical to day-to-day operations
– Witness how Google or Blackberry being offline for mere minutes makes the news
• Availability depends on 1) how often the system goes down, and 2) how long it takes to restore it after a crash
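In the standard formulation (not spelled out on the slide), those two factors are the mean time to failure (MTTF) and the mean time to repair (MTTR), and availability A = MTTF / (MTTF + MTTR). For example, MTTF = 1,000 hours and MTTR = 1 hour give A = 1000 / 1001 ≈ 99.9%.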

Page 45:

Availability Metrics

• Perfect availability (100%) is nice to dream of, but realistically, higher reliability is more expensive
• Availability is often measured by the number of 9’s in the desired level of availability
– Two nines is 99%, three nines is 99.9%, four nines is 99.99%, etc.
– How many nines can you afford?

Page 46:

Availability Metrics

No. of 9’s   Availability   Downtime per year
2            99%            87.6 hours
3            99.9%          8.8 hours
4            99.99%         53 minutes
5            99.999%        5.3 minutes
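The table values follow directly from downtime = (1 - availability) × hours per year; a minimal Python sketch using a 365-day (8,760-hour) year:

# Downtime per year implied by an "N nines" availability level.
HOURS_PER_YEAR = 365 * 24     # 8,760 hours, matching the table above

def downtime_per_year(nines: int) -> float:
    availability = 1 - 10 ** (-nines)            # e.g. 3 nines -> 0.999
    return (1 - availability) * HOURS_PER_YEAR   # hours of downtime

for n in range(2, 6):
    hours = downtime_per_year(n)
    print(n, "nines:", round(hours, 1), "hours =", round(hours * 60, 1), "minutes")
# 2 -> 87.6 h; 3 -> 8.8 h; 4 -> 52.6 min (~53); 5 -> 5.3 min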

Page 47:

Achieving High Availability

• Many techniques are used to help ensure that high levels of availability are possible
– Duplicate systems (clustering)
– RAID data duplication
– Duplicate power supplies
– Independent power supplies
– Uninterruptible power supplies (UPSs)

Page 48:

Availability and Code Quality

• Capers Jones demonstrated a clear connection between code quality (defect rate) and the corresponding mean time to failure (MTTF), which is a key aspect of availability
– Consistent measurement methods and definitions of terms are needed for further refinement

Page 49:

Customer Outage Data

• To determine availability, the actual customer-visible system outage time needs to be collected
– To get this data, the customer must place a very high priority on availability
– This data could be used to identify the software components which most reduce availability

Page 50:

Availability

• We also expect that availability for a new system should increase over its first couple of years of use

• Defect causal analysis can help address the root causes of defects, thereby improving availability