Domain-Driven Software Cost Estimation


Transcript of Domain-Driven Software Cost Estimation

Page 1: Domain-Driven Software Cost Estimation

University of Southern California

Center for Systems and Software Engineering

Domain-Driven Software Cost Estimation

Wilson Rosa (Air Force Cost Analysis Agency)
Barry Boehm (USC)
Brad Clark (USC)
Thomas Tan (USC)
Ray Madachy (Naval Postgraduate School)

27th International Forum on COCOMO® and Systems/Software Cost Modeling

October 16, 2012

This material is based upon work supported, in whole or in part, by the U.S. Department of Defense through the Systems Engineering Research Center (SERC) under Contract H98230-08-D-0171. The SERC is a federally funded University Affiliated Research Center (UARC) managed by Stevens Institute of Technology consisting of a collaborative network of over 20 universities. More information is available at www.SERCuarc.org

Page 2: Domain-Driven Software Cost Estimation


Data Preparation and Analysis

Cost (Effort) = a * Size^b

Research Objectives
• Make collected data useful to oversight and management entities
  – Provide guidance on how to condition data to address challenges
  – Segment data into different Application Domains and Operating Environments
  – Analyze data for simple Cost Estimating Relationships (CER) and Schedule-Cost Estimating Relationships (SCER) within each domain
  – Develop rules-of-thumb for missing data

Data Records for one Domain

Schedule = a * Size^b * Staff^c

Domain CER/SER
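As a rough illustration of these two model forms, the sketch below evaluates a power-law effort CER and a size-and-staff schedule relationship. The coefficients a, b, and c are placeholder values for illustration only, not results from this study.

```python
# Minimal sketch of the two model forms shown on this slide.
# The coefficients below are illustrative placeholders, NOT study results.

def effort_pm(size_kesloc: float, a: float = 3.0, b: float = 1.2) -> float:
    """Cost (Effort) = a * Size^b, with effort in person-months."""
    return a * size_kesloc ** b

def schedule_months(size_kesloc: float, staff_fte: float,
                    a: float = 4.0, b: float = 0.8, c: float = -0.5) -> float:
    """Schedule = a * Size^b * Staff^c (a negative c means more staff shortens the schedule)."""
    return a * size_kesloc ** b * staff_fte ** c

if __name__ == "__main__":
    print(effort_pm(50.0))              # effort for a 50-KESLOC project
    print(schedule_months(50.0, 10.0))  # duration with 10 full-time-equivalent staff
```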

Page 3: Domain-Driven Software Cost Estimation


Stakeholder Community
• Research is collaborative across heterogeneous stakeholder communities, who have helped us refine our data definition framework and taxonomy, and have provided data and funding

Project has evolved into a Joint Government Software Study

[Slide graphic: Funding Sources and Data Sources]

Page 4: Domain-Driven Software Cost Estimation


Topics
• Data Preparation Workflow
  – Data Segmentation
• Analysis Workflow
• Software Productivity Benchmarks
• Cost Estimating Relationships
• Schedule Estimating Relationships
• Conclusion
• Future Work

Page 5: Domain-Driven Software Cost Estimation


Data Preparation

Page 6: Domain-Driven Software Cost Estimation


Current Dataset
• Multiple data formats (SRDR, SEER, COCOMO)
• SRDR (377 records) + Other (143 records) = 522 total records

[Sample SRDR form: Software Resources Data Report, Final Developer Report, DD Form 2630-3, Page 1 of 2: Report Context, Project Description and Size. The form captures the development organization, certified CMM level (or equivalent), lead evaluator and affiliation, precedents (similar systems by the same organization or team), primary application type and primary language, COTS/GOTS applications used, peak staff (maximum team size in FTE), personnel experience mix (highly experienced / nominally experienced / entry level), number of software requirements and external interface requirements, requirements volatility (1 = Very Low to 5 = Very High), and delivered code size for new, modified, and unmodified/reused code, with the SLOC counting convention (physical SLOC, noncomment SLOC, or logical statements) noted.]

Multiple Sources

Page 7: Domain-Driven Software Cost Estimation


The Need for Data Preparation
• Issues found in dataset:
  – Inadequate information on modified code (size provided)
  – Inadequate information on size change or growth
  – Size measured inconsistently
  – Inadequate information on average staffing or peak staffing
  – Inadequate information on personnel experience
  – Inaccurate effort data in multi-build components
  – Missing effort data
  – Replicated duration (start and end dates) across components
  – Inadequate information on schedule compression
  – Missing schedule data
  – No quality data

Page 8: Domain-Driven Software Cost Estimation


Data Preparation Workflow


Workflow: Start with SRDR submissions → Inspect each data point → Correct missing or questionable data (if no resolution, exclude from analysis) → Determine data quality levels → Normalize data → Segment data
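A minimal sketch of this workflow is shown below, assuming records arrive as dictionaries. The field names, quality rule, and correction mechanism are hypothetical stand-ins for the study's actual procedures.

```python
# Hypothetical sketch of the data preparation workflow above.
# Field names, the quality rule, and the correction step are illustrative only.

REQUIRED = ("esloc", "effort_pm", "start", "end")

def prepare(srdr_submissions, corrections):
    """corrections: analyst-supplied fixes keyed by record id, resolving questionable data."""
    prepared = []
    for rec in srdr_submissions:                          # start with SRDR submissions
        rec = {**rec, **corrections.get(rec["id"], {})}   # inspect and correct missing or questionable data
        if any(rec.get(field) is None for field in REQUIRED):
            continue                                      # no resolution -> exclude from analysis
        rec["quality"] = "good" if rec.get("size_basis") else "fair"   # crude data-quality level
        rec["kesloc"] = rec["esloc"] / 1000.0             # normalize size units
        prepared.append(rec)
    segments = {}                                         # segment by OE and Productivity Type
    for rec in prepared:
        segments.setdefault((rec.get("oe"), rec.get("pt")), []).append(rec)
    return segments
```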

Page 9: Domain-Driven Software Cost Estimation


Segment Data by Operating Environments (OE)

Page 10: Domain-Driven Software Cost Estimation


Segment Data by Productivity Type (PT)

1. Sensor Control and Signal Processing (SCP)
2. Vehicle Control (VC)
3. Real Time Embedded (RTE)
4. Vehicle Payload (VP)
5. Mission Processing (MP)
6. System Software (SS)
7. Telecommunications (TEL)
8. Process Control (PC)
9. Scientific Systems (SCI)
10. Planning Systems (PLN)
11. Training (TRN)
12. Test Software (TST)
13. Software Tools (TUL)
14. Intelligence & Information Systems (IIS)

• Different productivities have been observed for different software application types.

• The SRDR dataset was segmented into 14 productivity types to increase the accuracy of estimating cost and schedule.

Page 11: Domain-Driven Software Cost Estimation


Example: Finding Productivity Type
Finding the Productivity Type (PT) using the Aircraft MIL-STD-881 WBS:

The highest level element represents the environment. In the MAV environment there are the Avionics subsystem, Fire-Control sub-subsystem, and the sensor, navigation, air data, display, bombing computer and safety domains. Each domain has an associated productivity type.

Env (Level 1) | Subsys (Level 2) | Sub-subsystem (Level 3) | Domains (Level 4) | PT
MAV | Avionics | Fire Control | Search, target, tracking sensors | SCP
MAV | Avionics | Fire Control | Self-contained navigation | RTE
MAV | Avionics | Fire Control | Self-contained air data systems | RTE
MAV | Avionics | Fire Control | Displays, scopes, or sights | RTE
MAV | Avionics | Fire Control | Bombing computer | MP
MAV | Avionics | Fire Control | Safety devices | RTE
MAV | Avionics | Data Display and Controls | Multi-function display | RTE
MAV | Avionics | Data Display and Controls | Control display units | RTE
MAV | Avionics | Data Display and Controls | Display processors | MP
MAV | Avionics | Data Display and Controls | On-board mission planning | TRN
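Read as a lookup, the table above amounts to a mapping from Level-4 WBS domains to productivity types. A small sketch of that lookup, using only the MAV / Avionics entries listed here, might look like this:

```python
# Productivity Type lookup for the MAV / Avionics WBS domains in the table above.
# Entries are copied from the example; other environments would extend the map.

DOMAIN_TO_PT = {
    "Search, target, tracking sensors": "SCP",
    "Self-contained navigation": "RTE",
    "Self-contained air data systems": "RTE",
    "Displays, scopes, or sights": "RTE",
    "Bombing computer": "MP",
    "Safety devices": "RTE",
    "Multi-function display": "RTE",
    "Control display units": "RTE",
    "Display processors": "MP",
    "On-board mission planning": "TRN",
}

def productivity_type(domain: str) -> str:
    """Map a Level-4 WBS domain name to its Productivity Type (PT)."""
    return DOMAIN_TO_PT[domain]

print(productivity_type("Bombing computer"))   # -> MP
```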

Page 12: Domain-Driven Software Cost Estimation


Operating Environment & Productivity Type

[Matrix of Productivity Type versus Operating Environment: rows SCP, VC, RTE, VP, MP, SS, TEL, PC, SCI, PLN, TRN, TST, TUL, IIS; columns GSF, GSM, GVM, GVU, MVM, MVU, AVM, AVU, OVU, SVM, SVU. Cells mark which PT/OE combinations occur in the dataset; the individual cell marks are not reproduced here.]

When the dataset is segmented by Productivity Type and Operating Environment, the impact accounted for by many COCOMO II model drivers is considered.

Page 13: Domain-Driven Software Cost Estimation


Data Analysis

Page 14: Domain-Driven Software Cost Estimation


Analysis Workflow


Workflow: Prepared, Normalized & Segmented Data → Derive CER Model Form → Derive Final CER & reference data subset → Publish CER results and Productivity Benchmarks by Productivity Type & Size Group → Derive SCER → Publish SCER

CER: Cost Estimating Relationship
PR: Productivity Ratio
SER: Schedule Estimating Relationship
SCER: Schedule Compression / Expansion Relationship

Page 15: Domain-Driven Software Cost Estimation


Software Productivity Benchmarks
• Productivity-based CER
• Software productivity refers to the ability of an organization to generate outputs using the resources that it currently has as inputs. Inputs typically include facilities, people, experience, processes, equipment, and tools. Outputs generated include software applications and the documentation used to describe them.
• The metric used to express software productivity is equivalent source lines of code (ESLOC) per person-month (PM) of effort. While many other measures exist, ESLOC/PM is used because most of the data collected by the Department of Defense (DoD) on past projects is captured using these two measures. While controversy exists over whether or not ESLOC/PM is a good measure, consistent use of this metric (see Metric Definitions) provides for meaningful comparisons of productivity.
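To make the metric concrete, the sketch below computes ESLOC/PM for a hypothetical project. The adaptation weights for modified and reused code are placeholders, since the study's own counting rules live in its Metric Definitions and are not reproduced here.

```python
# Hypothetical ESLOC/PM calculation. The adaptation weights (w_mod, w_reuse)
# are illustrative placeholders, not the study's Metric Definitions.

def esloc(new: float, modified: float, reused: float,
          w_mod: float = 0.5, w_reuse: float = 0.1) -> float:
    """Equivalent SLOC: new code counts fully, adapted code is down-weighted."""
    return new + w_mod * modified + w_reuse * reused

def productivity(new: float, modified: float, reused: float, effort_pm: float) -> float:
    """Productivity in ESLOC per person-month (ESLOC/PM)."""
    return esloc(new, modified, reused) / effort_pm

# Example: 40,000 new + 20,000 modified + 50,000 reused SLOC delivered with 300 PM of effort
print(round(productivity(40_000, 20_000, 50_000, 300)))   # ~183 ESLOC/PM
```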


Page 16: Domain-Driven Software Cost Estimation


Software Productivity Benchmarks

PT | MIN (ESLOC/PM) | MEAN (ESLOC/PM) | MAX (ESLOC/PM) | Obs. | Std. Dev. | CV | KESLOC MIN | KESLOC MAX
SCP | 10 | 50 | 80 | 38 | 19 | 39% | 1 | 162
VP | 28 | 82 | 202 | 16 | 43 | 52% | 5 | 120
RTE | 33 | 136 | 443 | 52 | 73 | 54% | 1 | 167
MP | 34 | 189 | 717 | 47 | 110 | 58% | 1 | 207
SCI | 9 | 221 | 431 | 39 | 119 | 54% | 1 | 171
SYS | 61 | 225 | 421 | 60 | 78 | 35% | 2 | 215
IIS | 169 | 442 | 1039 | 36 | 192 | 43% | 1 | 180

Benchmarks by PT, across all operating environments**

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft

Preliminary Results – More Records to be added

Page 17: Domain-Driven Software Cost Estimation


Software Productivity Benchmarks

PT | OE | MIN (ESLOC/PM) | MEAN (ESLOC/PM) | MAX (ESLOC/PM) | Obs. | Std. Dev. | CV | KESLOC MIN | KESLOC MAX
SCP | GSM | 27 | 56 | 80 | 13 | 17 | 30% | 1 | 76
RTE | GSM | 51 | 129 | 239 | 22 | 46 | 36% | 9 | 89
MP | GSM | 87 | 162 | 243 | 6 | 52 | 32% | 15 | 91
SYS | GSM | 115 | 240 | 421 | 28 | 64 | 26% | 5 | 215
SCI | GSM | 9 | 243 | 410 | 24 | 108 | 44% | 5 | 171
IIS | GSM | 236 | 376 | 581 | 23 | 85 | 23% | 15 | 180

Benchmarks by PT, Ground System Manned Only

CV: Coefficient of Variation (Std. Dev. / Mean)
ESLOC: Equivalent SLOC
KESLOC: Equivalent SLOC in Thousands
MAD: Mean Absolute Deviation
MAX: Maximum
MIN: Minimum
PM: Effort in Person-Months
PT: Productivity Type
OE: Operating Environment

Preliminary Results – More Records to be added

Page 18: Domain-Driven Software Cost Estimation


Cost Estimating Relationships

Preliminary Results – More Records to be added

Page 19: Domain-Driven Software Cost Estimation


CER Model Forms
• Effort = a * Size
• Effort = a * Size + b
• Effort = a * Size^b + c
• Effort = a * ln(Size) + b
• Effort = a * Size^b * Duration^c
• Effort = a * Size^b * c1..cn

where a is the production cost (cost per unit), b is the scaling factor, and c1..cn are % adjustment factors.

Log-log transform: ln(Effort) = b0 + (b1 * ln(Size)) + (b2 * ln(c1)) + (b3 * ln(c2)) + …

Anti-log transform: Effort = e^b0 * Size^b1 * c1^b2 * c2^b3 * …

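A minimal sketch of this log-log fit is shown below: regress ln(Effort) on ln(Size) by ordinary least squares, then take the anti-log of the intercept to recover Effort = a * Size^b. Additional drivers c1, c2, ... would simply add ln(ci) columns. The data values are made up for illustration.

```python
# Sketch of the log-log / anti-log procedure above, using made-up data.
import numpy as np

kesloc = np.array([5.0, 12.0, 30.0, 75.0, 160.0])       # illustrative sizes (KESLOC)
pm     = np.array([40.0, 110.0, 300.0, 900.0, 2100.0])  # illustrative efforts (PM)

# ln(Effort) = b0 + b1 * ln(Size), fit by ordinary least squares
X = np.column_stack([np.ones_like(kesloc), np.log(kesloc)])
b0, b1 = np.linalg.lstsq(X, np.log(pm), rcond=None)[0]

a = np.exp(b0)   # anti-log transform of the intercept
print(f"PM = {a:.3f} * KESLOC^{b1:.3f}")
```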

Page 20: Domain-Driven Software Cost Estimation


Software CERs by Productivity Type (PT)

PT | Equation Form | Obs. | R2 (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
IIS | PM = 1.266 * KESLOC^1.179 | 37 | 90% | 35% | 65 | 1 | 180
MP | PM = 3.477 * KESLOC^1.172 | 48 | 88% | 49% | 58 | 1 | 207
RTE | PM = 34.32 + KESLOC^1.515 | 52 | 68% | 61% | 46 | 1 | 167
SCI | PM = 21.09 + KESLOC^1.356 | 39 | 61% | 65% | 18 | 1 | 171
SCP | PM = 74.37 + KESLOC^1.714 | 36 | 67% | 69% | 31 | 1 | 162
SYS | PM = 16.01 + KESLOC^1.369 | 60 | 85% | 37% | 53 | 2 | 215
VP | PM = 3.153 * KESLOC^1.382 | 16 | 86% | 27% | 50 | 5 | 120

CERs by PT, across all operating environments**

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft

Preliminary Results – More Records to be added
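As a usage illustration, the preliminary IIS CER from the table above (PM = 1.266 * KESLOC^1.179) can be evaluated directly; the 100-KESLOC input is an arbitrary example, and the result carries the same preliminary caveat as the table.

```python
# Worked example with the preliminary IIS CER from the table above.
def iis_effort_pm(kesloc: float) -> float:
    return 1.266 * kesloc ** 1.179

print(round(iis_effort_pm(100.0)))   # roughly 289 person-months for a 100-KESLOC IIS project
```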

Page 21: Domain-Driven Software Cost Estimation


Software CERs for Aerial Vehicle Manned (AVM)

PT | OE | Equation Form | Obs. | R2 (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
MP | MAV | PM = 3.098 * KESLOC^1.236 | 31 | 88% | 50% | 59 | 1 | 207
RTE | MAV | PM = 5.611 * KESLOC^1.126 | 9 | 89% | 50% | 33 | 1 | 167
SCP | MAV | PM = 115.8 + KESLOC^1.614 | 8 | 88% | 27% | 62 | 6 | 162

CERs by Productivity Type, AVM Only

CERs: Cost Estimating Relationships
ESLOC: Equivalent SLOC
KESLOC: Equivalent SLOC in Thousands
MAD: Mean Absolute Deviation
MAX: Maximum
MIN: Minimum
PM: Effort in Person-Months
PRED: Prediction (Level)
PT: Productivity Type
OE: Operating Environment

Preliminary Results – More Records to be added

Page 22: Domain-Driven Software Cost Estimation


Software CERs for Ground Systems Manned (GSM)
CERs by Productivity Type

PT | OE | Equation Form | Obs. | R2 (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
IIS | MGS | PM = 30.83 + 1.381 * KESLOC^1.103 | 23 | – | 16% | 91 | 15 | 180
MP | MGS | PM = 3.201 * KESLOC^1.188 | 6 | 86% | 24% | 83 | 15 | 91
RTE | MGS | PM = 84.42 + KESLOC^1.451 | 22 | – | 24% | 73 | 9 | 89
SCI | MGS | PM = 34.26 + KESLOC^1.286 | 24 | – | 37% | 56 | 5 | 171
SCP | MGS | PM = 135.5 + KESLOC^1.597 | 13 | – | 39% | 31 | 1 | 76
SYS | MGS | PM = 20.86 + 2.347 * KESLOC^1.115 | 28 | – | 19% | 82 | 5 | 215

CERs: Cost Estimating Relationships
ESLOC: Equivalent SLOC
KESLOC: Equivalent SLOC in Thousands
MAD: Mean Absolute Deviation
MAX: Maximum
MIN: Minimum
PM: Effort in Person-Months
PT: Productivity Type
OE: Operating Environment

Preliminary Results – More Records to be added

Page 23: Domain-Driven Software Cost Estimation


Software CERs for Space Vehicle Unmanned

PT | OE | Equation Form | Obs. | R2 (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
VP | SVU | PM = 3.153 * KESLOC^1.382 | 16 | 86% | 27% | 50 | 5 | 120

CERs by Productivity Type (PT) - SVU Only

CERs: Cost Estimating Relationships
ESLOC: Equivalent SLOC
KESLOC: Equivalent SLOC in Thousands
MAD: Mean Absolute Deviation
MAX: Maximum
MIN: Minimum
PM: Effort in Person-Months
PRED: Prediction (Level)
PT: Productivity Type
OE: Operating Environment

Preliminary Results – More Records to be added

Page 24: Domain-Driven Software Cost Estimation


Schedule Estimating Relationships

Preliminary Results – More Records to be added

Page 25: Domain-Driven Software Cost Estimation


Schedule Estimation Relationships (SERs)
• SERs by Productivity Type (PT), across operating environments**

PT | Equation Form | Obs. | R2 (adj) | MAD | PRED(30) | KESLOC MIN | KESLOC MAX
IIS | TDEV = 3.176 * KESLOC^0.7209 / FTE^0.4476 | 35 | 65 | 25 | 68 | 1 | 180
MP | TDEV = 3.945 * KESLOC^0.968 / FTE^0.7505 | 43 | 77 | 39 | 52 | 1 | 207
RTE | TDEV = 11.69 * KESLOC^0.7982 / FTE^0.8256 | 49 | 70 | 36 | 55 | 1 | 167
SYS | TDEV = 5.781 * KESLOC^0.8272 / FTE^0.7682 | 56 | 71 | 27 | 62 | 2 | 215
SCP | TDEV = 34.76 * KESLOC^0.5309 / FTE^0.5799 | 35 | 62 | 26 | 64 | 1 | 165

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft

Preliminary Results – More Records to be added
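As a usage illustration, the preliminary IIS SER above (TDEV = 3.176 * KESLOC^0.7209 / FTE^0.4476) can be evaluated for an arbitrary size and staffing level; the inputs below are illustrative only. Raising FTE shortens the predicted duration, which is the size-people-schedule tradeoff pictured on a later slide.

```python
# Worked example with the preliminary IIS SER from the table above.
def iis_tdev_months(kesloc: float, fte: float) -> float:
    return 3.176 * kesloc ** 0.7209 / fte ** 0.4476

print(round(iis_tdev_months(100.0, 20.0)))   # roughly 23 calendar months for 100 KESLOC with 20 FTE
```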

Page 26: Domain-Driven Software Cost Estimation


Size – People – Schedule Tradeoff

Page 27: Domain-Driven Software Cost Estimation


COCOMO 81 vs. New Schedule Equations
• Model Comparisons

PT | Obs. | New Schedule Equation | COCOMO 81 Equation
IIS | 35 | TDEV = 3.176 * KESLOC^0.7209 * FTE^-0.4476 | TDEV = 2.5 * PM^0.38
MP | 43 | TDEV = 3.945 * KESLOC^0.968 * FTE^-0.7505 | TDEV = 2.5 * PM^0.35
RTE | 49 | TDEV = 11.69 * KESLOC^0.7982 * FTE^-0.8256 | TDEV = 2.5 * PM^0.32
SYS | 56 | TDEV = 5.781 * KESLOC^0.8272 * FTE^-0.7682 | TDEV = 2.5 * PM^0.35
SCP | 35 | TDEV = 34.76 * KESLOC^0.5309 * FTE^-0.5799 | TDEV = 2.5 * PM^0.32

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft

Preliminary Results – More Records to be added

Page 28: Domain-Driven Software Cost Estimation


COCOMO 81 vs. New Schedule Equations
• Model Comparisons using PRED (30%)

PT | Obs. | New Schedule Equations PRED(30) | COCOMO 81 Equations PRED(30)
IIS | 35 | 68 | 28
MP | 43 | 52 | 23
RTE | 49 | 55 | 16
SYS | 56 | 62 | 5
SCP | 35 | 64 | 8

Preliminary Results – More Records to be added

** The following operating environments were included in the analysis: Ground Surface Vehicles, Sea Systems, Aircraft, Missile / Ordnance (M/O), Spacecraft

Page 29: Domain-Driven Software Cost Estimation


Conclusions


Page 30: Domain-Driven Software Cost Estimation


Conclusion
• Developing CERs and Benchmarks by grouping appears to account for some of the variability in estimating relationships.
• Grouping software applications by Operating Environment and Productivity Type appears to have promise – but needs refinement.
• Analyses shown in this presentation are preliminary, as more data is available for analysis
  – It requires preparation first

Page 31: Domain-Driven Software Cost Estimation


Future Work
• Productivity Benchmarks need to be segregated by size groups
• More data is available to fill in missing cells in the OE-PT table
• Workshop recommendations will be implemented
  – New data grouping strategy
• Data repository that provides drill-down to source data
  – Presents the data to the analyst
  – If there is a question, it is possible to navigate to the source document, e.g. data collection form, project notes, EVM data, Gantt charts, etc.
• Final results will be published online

http://csse.usc.edu/afcaawiki