Measurement in Science, Engineering, and Software

Page 1

Measurement in Science, Engineering, and Software

A. Mockus

[email protected]

Avaya Labs Research

Basking Ridge, NJ 07920

http://mockus.org/

Page 2

The requirements

2 A. Mockus, Measurement in Science and Software Engineering, Monte Verita, 2010

Page 3

Topics for the day’s discussions

Topics on software engineering

Page 4

The process

Page 5

The complication

✦ The only thing I know for sure is that I don’t know anything

✧ How can I provide topics worthy of discussion?

✧ How can I avoid embarrassing organizers?

Page 6

The way out

Tell what I do not know

Page 7

But how can I tell what I do not know?

Page 8

by asking questions?!

Page 9

The proposed solution

✦ I will ask questions, you will provide answers

✧ “One fool can ask more questions than one hundred wise (wo)men can answer”

✧ so please don’t get frustrated by silly questions...

Page 10

Basic questions

Page 11

The purpose of human endeavor?

✦ If the work succeeds beyond the wildest dreams, will the results be useful?

✧ For me?

✧ For anyone else?

✧ For people who do what?

✧ How many?

✧ For how long?

Page 12

What are we doing in this room?

✦ What is MSA?

✦ Is it different from MSR?

✦ Is it data mining?

✦ Is it software engineering?

✦ Is it measurement?

✦ Is it science?

Page 13

Data Mining?

✦ “the process of extracting patterns from data”

✦ What to do with patterns?

✧ Use them to accomplish specific tasks?

✧ Direct benefits: more revenue/less cost

✧ Recommend a movie?

✧ Pick the advertisement or advertiser?

✧ In software - static analysis (e.g., Klocwork), test generation?

✧ Indirect: more reputation/trust?

✧ Provide relevant information/service (search, news, reviews, people, ...)?

✧ In software?

Page 14

Statistics?

“the science of making effective use of numerical data, including not only the collection, analysis and interpretation of such data, but also the planning of the collection of data.”

Page 15

Measurement?

✦ Why measure? Because without data, you only have opinions? or

✧ to characterize, or gain understanding of your processes, products, resources, and environments?

✧ to evaluate, to determine your status with respect to your plans?

✧ to predict, by understanding relationships among processes and products so the values you observe for some attributes can be used to predict others?

✧ to improve, by identifying roadblocks, root causes, inefficiencies, and other opportunities for improvement?

✦ Why analyze? Because the data you collect can’t help you if you don’t understand it and use it to shape your decisions?

Page 16

Software Engineering?

✦ Characterize, understand, and improve software practice?

✧ Inform and predict: (quantitatively) trade-offs among schedule, quality, cost?

✧ Where effort is spent, where defects are introduced?

✧ What is the impact of technologies/organization/individuals?

✧ Act:

✧ Introduce technologies?

✧ Change organization?

✧ Train individuals?

Page 17

Science?

✦ Fundamental questions about human and collective nature?

✧ X is the study of past human events and activities

✧ Y is the study of human cultures through the recovery, documentation and analysis of material remains

✧ Z is the study of developer cultures and behaviors through the recovery, documentation and analysis of digital remains

✦ Is it X, Y, or Z?

✧ Tomography is image reconstruction from multiple projections

✧ What is the reconstruction of developer behavior from the digital traces they leave in the code and elsewhere?

Page 18

Method: Software Tomography?

Page 19

Software change?

✦ Developers create software by changes?

✦ All changes are recorded?

Before:

    int i = n;
    while(i++)
        printf(" %d", i--);

After:

    //print n integers
    int i = n;
    while(i++ && i > 0)
        printf(" %d", i--);

✦ one line deleted

✦ two lines added

✦ two lines unchanged

✦ Many other attributes: date, developer, defect number, . . .
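The added/deleted/unchanged counts above can be reproduced by treating the unchanged lines as a longest common subsequence (LCS) of the two versions, which is one standard way line-based diff tools align files. The C sketch below is a minimal illustration under that assumption; the function name `count_line_changes` is hypothetical, and a real diff tool may choose a different alignment of the same length. Applied to the Before/After lines of this slide, it reports 2 added, 1 deleted, and 2 unchanged lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Counts lines added, deleted, and unchanged between two versions of a file.
 * Unchanged lines are a longest common subsequence (LCS) of the two line
 * sequences; every other line of `before` is deleted and every other line of
 * `after` is added. Returns 0 on success, -1 if memory allocation fails. */
int count_line_changes(const char *const *before, size_t nb,
                       const char *const *after, size_t na,
                       size_t *added, size_t *deleted, size_t *unchanged)
{
    assert(added != NULL && deleted != NULL && unchanged != NULL);

    /* lcs[i * (na + 1) + j] holds the LCS length of before[i..] and after[j..]. */
    size_t *lcs = calloc((nb + 1) * (na + 1), sizeof *lcs);
    if (lcs == NULL)
        return -1;

    for (size_t i = nb; i-- > 0;) {
        for (size_t j = na; j-- > 0;) {
            if (strcmp(before[i], after[j]) == 0) {
                lcs[i * (na + 1) + j] = 1 + lcs[(i + 1) * (na + 1) + (j + 1)];
            } else {
                size_t skip_before = lcs[(i + 1) * (na + 1) + j];
                size_t skip_after = lcs[i * (na + 1) + (j + 1)];
                lcs[i * (na + 1) + j] = skip_before > skip_after ? skip_before : skip_after;
            }
        }
    }

    *unchanged = lcs[0];
    *deleted = nb - lcs[0];
    *added = na - lcs[0];
    free(lcs);
    return 0;
}
```

The same counts per file and per change are what a Level 1 extraction would store next to the other change attributes (date, developer, defect number).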

Page 20

Uniform Theory of Everything?

✦ Sales/Marketing: customer information, rating, purchase patterns, needs: features and quality

✦ Accounting: customer/system/software/billing

✦ Maintenance support: installed system, support level, warranty

✦ Field support: dispatching repairmen, replacement parts

✦ Call center support: customer/agent/problem tracking

✦ Development field support: software related customer problem tracking, installed patch tracking

✦ Development: feature and development, testing, and field defects, software change and software build, process WIKI

Page 21

Context, data, and software — D-Ware?

✧ Data have meaning without context?

✧ Data have meaning without knowing how it was obtained?

✧ Data have meaning without knowing how it was processed?

✧ Data have “bugs” beyond bugs in the analysis software?

Page 22

SW tomography: all D-Ware has bugs?

✦ Bugs in the phenomena under study — randomness?

✦ Bugs in data recording — people (longitudinal), process, tool interface and schema (bias)?

✦ Bugs in data processing — software, schema, no “classical” randomness?

✦ Bugs in interpretation — method?

Page 23

Any bugs here?

Priority   Project A  Project B  Project C  A (%)  B (%)  C (%)
Critical          10         62          0      0      0      0
High             201       1642         16      5     13      5
Medium          3233       9920        659     84     76     85
Low              384        344          1     10      3      1
Total           3828      12968        676    100    100    100
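One way to look for bugs in such a table is to check it against simple invariants: the per-priority counts should add up to the column total, and each percentage should match its count. The C sketch below assumes percentages are computed relative to the reported total and allows one percentage point of slack for rounding; `check_column` is a hypothetical helper, not part of any tool named in these slides. On the numbers as transcribed here, it passes Project A, flags the Project B counts (they add up to 11968, not 12968), and flags the High and Medium percentages of Project C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Checks one column of a frequency table against two invariants:
 *   1. the category counts add up to the reported total;
 *   2. each reported percentage is within `tol_pp` percentage points of
 *      100 * count / reported_total.
 * Prints every violation and returns how many were found. */
int check_column(const char *name, const long *counts, const int *pct,
                 size_t n, long reported_total, long tol_pp)
{
    assert(reported_total > 0);
    int violations = 0;

    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += counts[i];
    if (sum != reported_total) {
        printf("%s: counts add up to %ld, reported total is %ld\n",
               name, sum, reported_total);
        violations++;
    }

    for (size_t i = 0; i < n; i++) {
        /* |100*c/T - p| <= tol is the same as |100*c - p*T| <= tol*T,
         * which keeps the comparison in exact integer arithmetic. */
        long gap = labs(100 * counts[i] - (long)pct[i] * reported_total);
        if (gap > tol_pp * reported_total) {
            printf("%s: row %zu reports %d%%, counts give %.1f%%\n",
                   name, i, pct[i], 100.0 * (double)counts[i] / (double)reported_total);
            violations++;
        }
    }
    return violations;
}
```

Rows are passed in table order (Critical, High, Medium, Low), so a message about row 2 refers to Medium. Invariant checks like these catch arithmetic and transcription bugs; they say nothing about whether Medium is a genuine rating or simply the default.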

Page 24

Any bugs here?

✦ Question: Reliability of SD Flash cards (used to boot the system)?

✦ Answers:

✧ Let’s count the number of cases where customer tickets mention flash card and divide by the number of all systems/run-time?

✧ Let’s count the number of flash card replacement shipments?
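The first answer can be sketched as a keyword count divided by exposure. In the C sketch below, exposure is assumed to be measured in system-years (number of systems times years in service), and the match is a plain case-insensitive substring search; both are illustrative assumptions, and `naive_failure_rate` is a hypothetical name. The example tickets in the usage are made up.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive (ASCII) substring test. */
static int contains_ci(const char *text, const char *needle)
{
    size_t m = strlen(needle);
    if (m == 0)
        return 1;
    for (const char *p = text; *p != '\0'; p++) {
        size_t k = 0;
        while (k < m && p[k] != '\0' &&
               tolower((unsigned char)p[k]) == tolower((unsigned char)needle[k]))
            k++;
        if (k == m)
            return 1;
    }
    return 0;
}

/* Naive reliability estimate: number of tickets whose text mentions
 * `keyword`, divided by exposure in system-years. Every mention counts,
 * whether or not the card actually failed. */
double naive_failure_rate(const char *const *tickets, size_t n_tickets,
                          const char *keyword, double system_years)
{
    assert(system_years > 0.0);
    size_t hits = 0;
    for (size_t i = 0; i < n_tickets; i++)
        if (contains_ci(tickets[i], keyword))
            hits++;
    return (double)hits / system_years;
}
```

For instance, with the tickets "Replaced Flash Card, system boots again", "Customer asked where the flash card slot is; no fault found", and "Reboot loop after power outage", and 4 system-years of exposure, the estimate is 2 / 4 = 0.5 tickets per system-year. The second ticket only mentions the card, yet it is counted.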

Page 25

Any bugs here?

✧ Let’s count the number of flash card replacement shipments?

✧ Unneeded replacements (the card was fine)?

✧ Missed replacements (the card was obtained through other sources)?

Page 26

Any bugs here?

✧ Let’s count the number of cases where customer tickets mention flash card?

✧ What if the ticket just mentions the flash card, but there is no problem with it?

Page 27

Any bugs here?

✧ Let’s count the number of cases where customer tickets mention flash card?

✧ What if the ticket just mentions the flash card, but there is no problem with it?

✧ If we eliminate these false matches, what about the rest?

Page 28

Any bugs here?

✧ Let’s count the number of cases where customer tickets mention flash card?

✧ What if the ticket just mentions the flash card, but there is no problem with it?

✧ If we eliminate these false matches, what about the rest?

✧ Interview people who worked on the problem — (ground truth - Terra Verita)?

Page 29

Any bugs here?

✧ Let’s count the number of cases where customer tickets mention flash card?

✧ What if the ticket just mentions the flash card, but there is no problem with it?

✧ If we eliminate these false matches, what about the rest?

✧ Interview people who worked on the problem — (ground truth - Terra Verita)?

✧ What if we can’t trust them? E.g., “the first action in a case of a problem with a reboot is to replace the card.”

Page 30

SW tomography: bugs in the phenomena?

“We get the notions theories are right because we keep talking about them. Not only are most theories wrong, but most data are also wrong at first, subject to glaring uncertainties. The recent history of X is full of promising discoveries that disappeared because they could not be repeated.”

✦ Statistical methods take variability into account to support making informed decisions based on quantitative studies designed to answer specific questions.

✦ Visual displays and summary statistics condense the information in data sets into usable knowledge.

✦ Randomness is the foundation for using statistics to draw conclusions when testing a claim or estimating plausible values for a population characteristic.

✦ The design of an experiment or sample survey is of critical importance to analyzing the data and drawing conclusions.
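As a small illustration of using randomness to test a claim, the C sketch below computes an exact binomial tail probability: if each of n customers independently reports a problem with probability p, how likely is it to see k or more reports? The independence assumption and the function name `binom_upper_tail` are illustrative, not taken from the slides. For example, with n = 10 and p = 0.5, seeing 9 or more reports has probability 11/1024, about 0.011.

```c
#include <assert.h>

/* Upper tail P(X >= k) for X ~ Binomial(n, p), computed by summing the
 * probability mass function f(i) with the recurrence
 *   f(0) = (1 - p)^n,   f(i + 1) = f(i) * (n - i) / (i + 1) * p / (1 - p).
 * Adequate for moderate n: for large n, (1 - p)^n can underflow to zero,
 * and a log-space computation would be needed instead. */
double binom_upper_tail(unsigned n, unsigned k, double p)
{
    assert(p >= 0.0 && p <= 1.0);
    if (k == 0)
        return 1.0;            /* X >= 0 always holds */
    if (k > n || p == 0.0)
        return 0.0;
    if (p == 1.0)
        return 1.0;            /* X == n >= k */

    double f = 1.0;
    for (unsigned i = 0; i < n; i++)
        f *= 1.0 - p;          /* f = f(0) */

    double ratio = p / (1.0 - p), tail = 0.0;
    for (unsigned i = 0; i <= n; i++) {
        if (i >= k)
            tail += f;         /* add f(i) */
        if (i < n)
            f = f * (double)(n - i) / (double)(i + 1) * ratio;
    }
    return tail;
}
```

A small tail probability says the observation would be surprising under the assumed p; it does not say which of the D-Ware bugs (phenomenon, recording, processing, interpretation) produced it.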

Page 31

SW tomography: Debugging

✦ Learn the real process

✧ Interview key people: architect, developer, tester, field support, project manager

✧ Go over recent change(s) the person was involved with

✧ To illustrate the actual process (What is the nature of this work item, why/where it came to you, who (if any) reviewed it, ...)

✧ To understand what the various field values mean: (When was the work done in relation to recorded fields, ...)

✧ To ask additional questions: effort spent, information exchange with other project participants, ...

✧ To add experimental questions

✧ Apply relevant models

✧ Validate and clean recorded and modeled data

✧ Iterate

Page 32

SW tomography: Levels [0-2]

✦ Level 0 — actual project. Learn about the project, make copies of its systems

✦ Level 1 — Extract raw data

✧ change table, developer table (SCCS: prs, ClearCase: cleartool-lsh, CVS: cvs log), write/modify drivers for other CM/VCS/Directory systems

✧ Interview the tool support person (especially for home-grown tools)

✦ Level 2 — Do basic cleaning

✧ Eliminate administrative and automatic artifacts

✧ Eliminate post-preprocessor artifacts
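Level 2 cleaning often comes down to filtering rules. The C sketch below assumes two hypothetical heuristics: a change is dropped if its author is a known tool or administrative account, or if it touches more files than a threshold (as bulk reformatting or branch merges tend to). The account names, the threshold, and the `struct change` fields are placeholders, to be replaced with whatever the tool support person and other interviews reveal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One change (delta) as extracted at Level 1; the fields are illustrative. */
struct change {
    const char *author;
    size_t files_touched;
};

/* Drops changes that look administrative or automatic under two
 * hypothetical heuristics: the author is a known tool/administrative
 * account, or the change touches more than `max_files` files (bulk
 * reformatting, branch merges, and similar). Kept changes are moved,
 * in order, to the front of `changes`; the return value is how many
 * were kept. */
size_t drop_automatic(struct change *changes, size_t n,
                      const char *const *tool_accounts, size_t n_accounts,
                      size_t max_files)
{
    assert(changes != NULL || n == 0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        bool automatic = changes[i].files_touched > max_files;
        for (size_t j = 0; j < n_accounts && !automatic; j++)
            if (strcmp(changes[i].author, tool_accounts[j]) == 0)
                automatic = true;
        if (!automatic)
            changes[kept++] = changes[i];
    }
    return kept;
}
```

Filtering in place keeps the sketch short; in a levels-and-pipes setup one would more likely write the kept changes to a new Level 2 table so that each dropped record stays traceable to its source.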

Page 33

SW Tomography: Testing/Debugging

Takes up 9[5-9]% of all effort

✦ Use levels and pipes, à la satellite image processing

✦ Validation tools (regression, interactive) for each level/transition

✧ Traceability to sources from each level

✧ Multiple operationalizations within/across levels

✧ Comparison against invariants

✧ Detecting default values

✧ Handling missing values

Version control D-Ware to aid “data debugging”?

***Keep raw data/systems and version control processing scripts?***
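The “detecting default values” and “handling missing values” checks can start from a simple profile of each field. The C sketch below counts missing entries (NULL or empty strings) and finds the most frequent remaining value; a field where one value covers most entries, as Medium does for priority in the earlier table, is a candidate default. The threshold is an assumption to tune per field, and `profile_field` and `looks_like_default` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct field_profile {
    size_t present;     /* non-missing entries */
    size_t missing;     /* NULL or empty entries */
    const char *mode;   /* most frequent non-missing value (NULL if none) */
    size_t mode_count;  /* how often `mode` occurs */
};

static int is_missing(const char *v) { return v == NULL || v[0] == '\0'; }

/* Profiles one field across n records. The frequency count is quadratic in n,
 * which keeps the sketch short; a hash table would be used at scale. */
struct field_profile profile_field(const char *const *values, size_t n)
{
    struct field_profile p = {0, 0, NULL, 0};
    for (size_t i = 0; i < n; i++) {
        if (is_missing(values[i])) {
            p.missing++;
            continue;
        }
        p.present++;
        size_t count = 0;
        for (size_t j = 0; j < n; j++)
            if (!is_missing(values[j]) && strcmp(values[i], values[j]) == 0)
                count++;
        if (count > p.mode_count) {
            p.mode_count = count;
            p.mode = values[i];
        }
    }
    return p;
}

/* Flags a possible default value when one value covers more than
 * `threshold` (a fraction between 0 and 1) of the non-missing entries. */
int looks_like_default(struct field_profile p, double threshold)
{
    assert(threshold > 0.0 && threshold < 1.0);
    return p.present > 0 && (double)p.mode_count > threshold * (double)p.present;
}
```

A flag is only a prompt for the “learn the real process” step: asking whoever fills in the field whether Medium is a judgment or just what the form shows first.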

Page 34

Why software tomography?

✧ Non-intrusive, minimizes overhead? What about in-depth understanding of the project’s development process?

✧ Historic calibration, immediate diagnosis? It takes time and effort to get to that point?

✧ Fine-grain, at the delta level? But aren’t links to more sensible attributes like features and releases often tenuous?

✧ Everything is recorded? What about entries that are inconsistently or rarely filled in?

✧ Uniform over time? But the process may have changed?

✧ Small effects can be detected with a lot of data? Are the relevant quantities extractable?

✧ No observer effect? Even when such data are used widely in organizational measurement?

Page 35

Why not Software Tomography?

✦ Apples and oranges:

✧ Do projects use the same rules to divide work (MRs)?

✧ How to compare data from: CVS, ClearCase, SCCS, svn, git, hg, bzr?

✧ Does every project use the same tool in the same way: under what circumstances is the change submitted, when is the MR created?

✦ Easy to get lost analyzing irrelevant things?

✦ Are there change back-projection models of key software engineering problems?

Page 36

Software Tomography: reconstructing the image

✦ Predicting the quality of a patch [16]

✦ Globalization: move development where the resources are:

✧ What parts of the code can be independently maintained [17]

✧ Who are the experts to contact about any section of the code [13]

✧ Mentorship and learning [11, 20]

✦ Effort: estimate MR effort and benchmark process

✧ What makes some changes hard [7, 6, 10]

✧ What processes/tools work [1, 2, 4, 14]

✧ What are OSS/Commercial process differences [12]

✦ Project models

✧ Release schedule [8, 18, 5]

✧ Release quality/availability [3, 15, 9, 19]

Page 37

Questions of style and productivity

Page 38

Why do easy things?

✦ Counts, trends, patterns?

✦ Open source, popular projects, VCS?

✦ Topics that are well formulated?

✧ Which modules will get defects?

Page 39

Patterns: Developer changes over 24 hours — isn’t it beautiful?

Page 40

Fascination with defects

✦ How to not introduce defects?

✧ Improve requirements and other processes?

✧ Improve modularity, increase language level, smarter static type-checking, LINT-type heuristics, . . . ?

✧ Verification of software models?

✦ How to find/eliminate defects?

✧ Inspections?

✧ Testing?

✧ Debugging?

✦ How to predict defects?

✧ When to stop testing and release?

✧ What files, changes will have defects?

✧ How will customers be affected?

Page 41

Where faults will occur?

✦ Assume the best possible outcome, i.e., we can predict exactly!

✦ Does it help?

✧ “We look at defects for features, not for files”

✧ Most defects discovered by static-analysis tools are not fixed?

✧ “often it’s better to leave a known defect unresolved, than fix it and [by doing that] introduce a defect you don’t know about”

✧ Effort needed to investigate predictions exceeds all QA resources?

Page 42

Can bugs be predicted reliably?

[Figure: normalized defects per week (0 to 30) over time, with the GA dates of releases V 5.6 and V 6.0 marked]

Why such a huge improvement in quality?

Page 43

How many customers got each release?

[Figure: normalized defects per week (0 to 30) over time, with the GA dates of releases V 5.6 and V 6.0 marked]

V 5.6 ≈ 300, V 6.0 ≈ 0

Page 44

Questions of practice

Page 45

Practice: how to compare software releases?

“we tried to improve quality: get most experienced team members to test, code inspections, root cause analysis, ...”

“Did it work? I.e., is this release better than previous one?”

Everyone uses defect density (e.g., customer-reported defects per 1000 changes or lines of code), but “it does not reflect feedback from customers.”

OK, then let’s measure the probability that a customer will report a software defect
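The two measures contrasted on this slide can be sketched as follows. Defect density is customer-reported defects per 1000 size units (changes or lines of code). For the customer-centered measure, the C sketch assumes each installation has an installation day and the day of its first reported software defect, and counts only installations observed for the whole window, so that recent installations are not mistaken for defect-free ones. The record layout and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Customer-reported defects per 1000 units of size (changes or lines of code). */
double defect_density(long defects, long size_units)
{
    assert(size_units > 0);
    return 1000.0 * (double)defects / (double)size_units;
}

/* One installation; days are counted from a common day 0.
 * first_report_day < 0 means no software defect has been reported yet. */
struct install {
    long install_day;
    long first_report_day;
};

/* Fraction of installations that reported a software defect within
 * `window_days` of installation. Only installations observed for the whole
 * window by `today` are counted; newer ones are skipped rather than treated
 * as defect-free. Returns -1.0 if no installation qualifies. */
double prob_report_within(const struct install *ins, size_t n,
                          long window_days, long today)
{
    size_t eligible = 0, reported = 0;
    for (size_t i = 0; i < n; i++) {
        if (today - ins[i].install_day < window_days)
            continue;
        eligible++;
        long first = ins[i].first_report_day;
        if (first >= ins[i].install_day && first - ins[i].install_day <= window_days)
            reported++;
    }
    return eligible > 0 ? (double)reported / (double)eligible : -1.0;
}
```

The denominators differ: defect density divides by code size, while the probability divides by the number of customers who have had the release long enough to report. A release that few customers install can look good on one measure and bad on the other.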

Page 46

A paradox: large telecom software

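[Figure: defect density (DefPerKLOC/100, marked DL) and probability of a customer reporting a software defect (Probability 1m., marked F1) for releases r1.1, r1.2, r1.3, r2.0, r2.1, and r2.2; y-axis: Quantity, 0.000 to 0.015. The release-to-release changes are annotated Up/Down, Down/Down, Down/Up, Up/Down, Up/Down.]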

Does the increase in defect density make customers more satisfied, and the decrease less satisfied?

Page 47

Is the paradox unique to this product?

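[Figure: defect density (DefPerKLOC, marked D) and probability of a customer reporting a software defect (Probability 3m, marked F) for releases R4.5, R5.0, R5.5, and R6.0; y-axis: Quantity, 0.0 to 1.5]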

A large product from another company:

Why does the increase in defect density make customers satisfied?

Page 48

What fraction of customers are affected (IQ)?

[Figure: post-installation MR rates as of the current date (0.000 to 0.025) for releases 1.1, 1.3, 2.0, 2.1, 2.2, 3.0, and 3.1, measured 0 to 1, 0 to 3, and 0 to 6 months after installation; asterisks mark significant differences]

✦ Fraction of customers reporting software failures within months of installation

✦ Significant differences from prior releases marked by “*”

✦ “We live or die by this measure.”

— executive for product quality
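The slide does not say which test produced the “*” marks. One common choice for comparing the fraction of customers reporting failures in two releases is the pooled two-proportion test; the C sketch below computes its statistic in squared form (equivalently, the Pearson chi-square statistic of the 2×2 table without continuity correction), which avoids a square root, and compares it with 3.841, the 95% critical value for one degree of freedom. The function names and example counts are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Squared pooled two-proportion z statistic for x1 of n1 customers in one
 * release versus x2 of n2 in another. It equals the Pearson chi-square
 * statistic (without continuity correction) of the 2x2 table, and is
 * returned squared so that no square root is needed. */
double two_prop_chi2(long x1, long n1, long x2, long n2)
{
    assert(n1 > 0 && n2 > 0 && x1 >= 0 && x2 >= 0 && x1 <= n1 && x2 <= n2);
    double p1 = (double)x1 / (double)n1;
    double p2 = (double)x2 / (double)n2;
    double p = (double)(x1 + x2) / (double)(n1 + n2);   /* pooled fraction */
    double var = p * (1.0 - p) * (1.0 / (double)n1 + 1.0 / (double)n2);
    if (var == 0.0)
        return 0.0;   /* nobody (or everybody) reported: no evidence of a difference */
    return (p1 - p2) * (p1 - p2) / var;
}

/* 3.841 is the 95th percentile of the chi-square distribution with one
 * degree of freedom (about 1.96 squared), i.e., a two-sided 5% test. */
bool differs_at_5pct(long x1, long n1, long x2, long n2)
{
    return two_prop_chi2(x1, n1, x2, n2) > 3.841;
}
```

For example, 30 of 1000 customers versus 15 of 1000 gives a statistic of about 5.1, which is significant at the 5% level, while 20 of 1000 versus 15 of 1000 gives about 0.73, which is not. The normal approximation behind the test is rough when few customers report, which is common for early windows such as 0 to 1 months.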

Page 49

Can we move software production to the cheapest location?

Offshoring/Outsourcing/Retirement

Developer Churn

[Figure: fraction of developers (0.0 to 1.0) who had been in the project less than 1 year (marked 1) and less than 3 years (marked 3), by year, 1998 to 2008]
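Curves like these can be computed from each developer’s first and last change dates. The C sketch below assumes a developer counts as active at time t if t falls between those two dates, and as a newcomer if the first change was less than a given number of years before t; both definitions, and the name `fraction_newcomers`, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fraction of developers active at time t (in years, e.g. 2004.5) who joined
 * less than `years` before t. A developer counts as active at t if t lies
 * between their first and last change; the join date is the first change. */
double fraction_newcomers(const double *first_change, const double *last_change,
                          size_t n, double t, double years)
{
    assert(years > 0.0);
    size_t active = 0, newcomers = 0;
    for (size_t i = 0; i < n; i++) {
        if (first_change[i] > t || last_change[i] < t)
            continue;                          /* not active at t */
        active++;
        if (t - first_change[i] < years)
            newcomers++;
    }
    return active > 0 ? (double)newcomers / (double)active : 0.0;
}
```

Evaluating it once per year with `years` set to 1.0 and 3.0 gives the two series of the plot. Near the end of the data the “last change” dates are censored, so developers who are still active but happen not to have committed recently drop out of the denominator.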

Page 50

A plateau?

“developers reach full productivity in few months.”

— a common response from managers and developers

[Figure: fitted model log Modifications ~ ID + Tenure; modifications per month for the average developer (y-axis 0.5 to 1.5) versus tenure, 1 to 35 months]

Modifications per month versus Tenure

Page 51

Fully productive, but...

“We do not assign important tasks for developers that have been less than three years on a project.”

“We tried to do that after two years, but it did not work well.”

— Senior architect

Page 52

Task’s importance keeps increasing?

[Figure: fitted model log(Centrality) ~ ID + Tenure; centrality for the average developer (y-axis 7.0 to 9.5) versus tenure, 1 to 35 months]

Average task’s centrality (average centrality of modules modified by the task) versus Tenure
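The caption defines a task’s centrality as the average centrality of the modules it modified. The slide does not say which centrality measure was used; the C sketch below substitutes degree centrality in an undirected module-dependency graph, stored as an adjacency matrix, purely for illustration. The function names and the example star-shaped graph in the usage are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Degree of module m in an undirected module graph with n modules,
 * given as a row-major n-by-n 0/1 adjacency matrix. */
static unsigned module_degree(const unsigned char *adj, size_t n, size_t m)
{
    unsigned degree = 0;
    for (size_t j = 0; j < n; j++)
        if (adj[m * n + j])
            degree++;
    return degree;
}

/* A task's centrality: the average centrality of the modules it modified,
 * with degree centrality standing in for the unspecified measure. */
double task_centrality(const unsigned char *adj, size_t n,
                       const size_t *modules, size_t k)
{
    if (k == 0)
        return 0.0;
    double sum = 0.0;
    for (size_t i = 0; i < k; i++) {
        assert(modules[i] < n);
        sum += module_degree(adj, n, modules[i]);
    }
    return sum / (double)k;
}
```

In a four-module star graph whose hub is module 0, a task touching modules {0, 1} scores (3 + 1) / 2 = 2.0, while a task touching only leaves {2, 3} scores 1.0. Swapping in a different centrality measure changes only `module_degree`.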

Page 53

Social learning

[Figure: fitted model log CumLogins ~ ID + startY + popY + Tenure; y-axis 0 to 80; tenure axis from 0.08 to 8]

Accelerates after three to four years?

Page 54

Discussion

✦ Entire social and business life is digitally recorded: infinite resources and opportunities for Software Tomography?

✧ Multiple dimensions of human activity are recorded?

✧ Multiple models (reconstructions) from various fields are (re)invented?

✦ D-Ware bugs: phenomena, UI, software, data processing, and interpretation?

✧ Statistical and software “randomness/bugs”?

✦ But how to use these digital projections of human endeavor to get results relevant to

✧ Yourself?

✧ Someone else?

✧ Many people?

✧ For eternity...

Page 55

References

[1] D. Atkins, T. Ball, T. Graves, and A. Mockus. Using version control data to evaluate the impact of software tools: A case study of the version editor. IEEE Transactions on Software Engineering, 28(7):625–637, July 2002.

[2] D. Atkins, A. Mockus, and H. Siy. Measuring technology effects on software change cost. Bell Labs Technical Journal, 5(2):7–18, April–June 2000.

[3] Marcelo Cataldo, Audris Mockus, Jeffrey A. Roberts, and James D. Herbsleb. Software dependencies, the structure of work dependencies and their impact on failures. IEEE Transactions on Software Engineering, 2009.

[4] Birgit Geppert, Audris Mockus, and Frank Roßler. Refactoring for changeability: A way to go? In Metrics 2005: 11th International Symposium on Software Metrics, Como, September 2005. IEEE CS Press.

[5] J. D. Herbsleb and A. Mockus. An empirical study of speed and communication in globally-distributed software development. IEEE Transactions on Software Engineering, 29(6):481–494, June 2003.

[6] James Herbsleb and Audris Mockus. Formulation and preliminary test of an empirical theory of coordination in software engineering. In 2003 International Conference on Foundations of Software Engineering, Helsinki, Finland, October 2003. ACM Press.

[7] James D. Herbsleb, Audris Mockus, Thomas A. Finholt, and Rebecca E. Grinter. An empirical study of global software development: Distance and speed. In 23rd International Conference on Software Engineering, pages 81–90, Toronto, Canada, May 12-19 2001.

[8] Audris Mockus. Analogy based prediction of work item flow in software projects: a case study. In 2003 International Symposium on Empirical Software Engineering, pages 110–119, Rome, Italy, October 2003. ACM Press.

[9] Audris Mockus. Empirical estimates of software availability of deployed systems. In 2006 International Symposium on Empirical Software Engineering, pages 222–231, Rio de Janeiro, Brazil, September 21–22, 2006. ACM Press.

[10] Audris Mockus. Organizational volatility and developer productivity. In ICSE Workshop on Socio-Technical Congruence, Vancouver, Canada, May 19, 2009.

[11] Audris Mockus. Succession: Measuring transfer of code and developer productivity. In 2009 International Conference on Software Engineering, Vancouver, Canada, May 12–22, 2009. ACM Press.

[12] Audris Mockus, Roy T. Fielding, and James Herbsleb. Two case studies of open source software development: Apache and Mozilla. ACM Transactions on Software Engineering and Methodology, 11(3):1–38, July 2002.

[13] Audris Mockus and James Herbsleb. Expertise browser: A quantitative approach to identifying expertise. In 2002 International Conference on Software Engineering, pages 503–512, Orlando, Florida, May 19–25, 2002. ACM Press.

[14] Audris Mockus, Nachiappan Nagappan, and Trung T. Dinh-Trong. Test coverage and post-verification defects: A multiple case study. In International Conference on Empirical Software Engineering and Measurement, Lake Buena Vista, Florida, USA, October 2009. ACM.

[15] Audris Mockus and David Weiss. Interval quality: Relating customer-perceived quality to process quality. In 2008 International Conference on Software Engineering, pages 733–740, Leipzig, Germany, May 10–18, 2008. ACM Press.

[16] Audris Mockus and David M. Weiss. Predicting risk of software changes. Bell Labs Technical Journal, 5(2):169–180, April–June 2000.

[17] Audris Mockus and David M. Weiss. Globalization by chunking: a quantitative approach. IEEE Software, 18(2):30–37, March 2001.

[18] Audris Mockus, David M. Weiss, and Ping Zhang. Understanding and predicting effort in software projects. In 2003 International Conference on Software Engineering, pages 274–284, Portland, Oregon, May 3–10, 2003. ACM Press.

[19] Audris Mockus, Ping Zhang, and Paul Li. Drivers for customer perceived software quality. In ICSE 2005, pages 225–233, St. Louis, Missouri, May 2005. ACM Press.

[20] Minghui Zhou, Audris Mockus, and David Weiss. Learning in offshored and legacy software projects: How product structure shapes organization. In ICSE Workshop on Socio-Technical Congruence, Vancouver, Canada, May 19, 2009.

Abstract

Measurement is the essence of science: “To measure is to know.” In engineering, data cannot help unless you understand it and use it to make decisions. As many professional and social activities move online and come to rely on software tools, vast amounts of data are becoming available. Practical applications in business intelligence and the sciences have demonstrated how various models and methods can solve particular problems in their respective domains. It is therefore tempting to apply these techniques to software engineering data, often without adequately adapting them to a domain with completely different needs. Furthermore, as Computer Science matures, it requires more rigorous empirical approaches, and the same can be said about the rapidly maturing field of Mining Software Archives/Repositories. We therefore discuss common issues facing researchers with a Computer Science background as they move into empirical areas that require several fundamentally different concepts: variation, reproducibility, and human factors. In addition to methodological issues, we look at future challenges posed by the need to integrate increasingly disparate sources of data, the tradeoff between using the most easily available measures and the more meaningful ones, and the need to address core software engineering concerns.

Bio

Audris Mockus

Avaya Labs Research

233 Mt. Airy Road

Basking Ridge, NJ 07920

ph: +1 908 696 5608, fax: +1 908 696 5402

http://mockus.org, mailto:[email protected],

picture:http://mockus.org/images/small.gif

Audris Mockus is interested in quantifying, modeling, and improving software development. He designs data mining methods to summarize and augment software change data, interactive visualization techniques to inspect, present, and control the development process, and statistical models and optimization techniques to understand the relationships among people, organizations, and characteristics of a software product. He received B.S. and M.S. degrees in Applied Mathematics from the Moscow Institute of Physics and Technology in 1988, and an M.S. (1991) and a Ph.D. (1994) in Statistics from Carnegie Mellon University. He works at Avaya Labs Research. Previously he worked in the Software Production Research Department of Bell Labs.
