Achieving High Software Reliability Using a Faster, Easier and Cheaper Method
NASA OSMA SAS '01
September 5-7, 2001
Taghi M. Khoshgoftaar
The Software Measurement Analysis and Reliability Toolkit
Outline
Introduction
Overview of SMART
Data Analysis and Modeling Features
Current Utilization of SMART
Case-Based Reasoning
Empirical Study
Conclusions
Introduction
Necessity of an integrated tool for efficient empirical software quality research
Commercial packages are available but expensive and don’t always match our exact needs
In-house development gives us availability, flexibility, and the possibility to evolve the tool
Overview of SMART
First version in 1998
Current version: 2.0
Written in Microsoft Visual C++
Runs on Microsoft Windows-based platforms
User-friendly GUI
SMART GUI (screenshots)
Features included in SMART
Data management
Multiple Linear Regression (MLR)
Case-based reasoning (CBR)
Case-based reasoning with two-group clustering
Case-based reasoning with three-group clustering
Module-order modeling (MOM)
Organizational Flowchart
Current Utilization of SMART at the ESEL Laboratory
Empirical research:
– Comparative studies of software quality models
– Case studies based on real-world systems
Case-Based Reasoning
Based on automated reasoning processes
Easy to use
Results are easy to understand and interpret
Looks at past cases that are similar to the present case in order to predict or classify a new instance (a retrieval sketch follows)
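As a rough illustration of that retrieval step, here is a minimal Python sketch; the names and the (metrics, outcome) case representation are hypothetical, not SMART's actual C++ API:

def nearest_neighbors(library, new_case, distance, n):
    """Return the n past cases most similar to new_case.

    library: list of (metrics, outcome) pairs from past projects.
    distance: a similarity function such as those defined later in this talk.
    """
    ranked = sorted(library, key=lambda case: distance(case[0], new_case))
    return ranked[:n]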
Case-Based Reasoning: Additional Advantages
The ability to alert users when a new case is outside the bounds of current experience
The ability to interpret the automated classification through the detailed description of the most similar case
The ability to take advantage of new or revised information as it becomes available
The ability to retrieve cases quickly as the case library scales up
Case-Based Reasoning
Working hypothesis for software quality modeling:
– A case currently in development will more than likely be fault-prone if past cases with similar attributes were fault-prone
Case-Based Reasoning: Comparing the Cases
The cases most similar to a new module (its nearest neighbors) are determined by similarity functions:
– Absolute Distance
– Euclidean Distance
– Mahalanobis Distance
Absolute Distance:
$d_{ij} = \sum_{k=1}^{m} w_k \,\lvert c_{jk} - x_{ik} \rvert$

Euclidean Distance:
$d_{ij} = \left( \sum_{k=1}^{m} \bigl( w_k (c_{jk} - x_{ik}) \bigr)^{2} \right)^{1/2}$

Mahalanobis Distance:
$d_{ij} = (c_j - x_i)'\, S^{-1}\, (c_j - x_i)$
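A minimal sketch of the three functions, assuming NumPy and reading the symbols as follows (inferred from context, not stated on the slide): c_j is the metric vector of library case j, x_i is the metric vector of new module i, w holds per-metric weights, and S is the covariance matrix of the library metrics.

import numpy as np

def absolute_distance(c_j, x_i, w):
    # d_ij = sum_k w_k * |c_jk - x_ik|
    return float(np.sum(w * np.abs(c_j - x_i)))

def euclidean_distance(c_j, x_i, w):
    # d_ij = ( sum_k (w_k * (c_jk - x_ik))^2 )^(1/2)
    return float(np.sqrt(np.sum((w * (c_j - x_i)) ** 2)))

def mahalanobis_distance(c_j, x_i, S):
    # d_ij = (c_j - x_i)' S^{-1} (c_j - x_i)
    diff = c_j - x_i
    return float(diff @ np.linalg.inv(S) @ diff)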
Case-Based Reasoning: Prediction Methods
The value of the dependent variable for a new module is estimated from the dependent-variable values of its nearest neighbors using a solution algorithm:
– Unweighted Average
– Inverse-Distance Weighted Average
$\hat{y}_i = \sum_{j \in N} \alpha_{ij}\, y_j$

For the inverse-distance weighted average:
$\alpha_{ij} = \dfrac{1/d_{ij}}{\sum_{j \in N} 1/d_{ij}}$
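A sketch of both solution algorithms, assuming neighbor_ys and distances are parallel NumPy arrays over the neighbor set N of module i (the names are illustrative):

import numpy as np

def unweighted_average(neighbor_ys):
    # Every nearest neighbor contributes equally.
    return float(np.mean(neighbor_ys))

def inverse_distance_weighted_average(neighbor_ys, distances, eps=1e-12):
    # alpha_ij = (1/d_ij) / sum_j (1/d_ij); eps is an added guard against
    # a zero distance and is not part of the slide's formula.
    inv = 1.0 / (np.asarray(distances, dtype=float) + eps)
    weights = inv / inv.sum()
    return float(np.sum(weights * np.asarray(neighbor_ys, dtype=float)))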
Case-Based Reasoning: Classification Methods
Used to classify a software module into a particular class (fault-prone, not fault-prone).
The types of classification methods include (see the sketch below):
– Majority Voting
– Data Clustering
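A sketch of the majority-voting variant over the nearest neighbors' known classes; the tie-breaking behavior here is an assumption, as the slides do not specify one.

from collections import Counter

def majority_vote(neighbor_classes):
    # neighbor_classes: e.g. ['fp', 'nfp', 'fp'] for the nearest neighbors.
    # Counter.most_common breaks ties by first occurrence, i.e. by
    # neighbor rank here; that choice is ours, not SMART's.
    return Counter(neighbor_classes).most_common(1)[0][0]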
Case Study: System Description
Two data sets were obtained from two large Windows-based applications used primarily for customizing the configuration of wireless products. Both data sets come from the initial release of these applications. The applications are written in C++ and provide similar functionality.
Case Study: System Description
Applications              Configuration of wireless products
Environment               Windows
Language                  C++

                          Application 1    Application 2
AENCSL*                   320 million      300 million
Actual Lines of Code      29 million       27.5 million
Case Study: Data Collection Effort
Data collected by engineers over several months using the available information in:
– Configuration Management Systems
– Problem Reporting Systems
Case Study: Independent Variables
Product Metrics: Statement Metrics
BASE_LOC: Number of lines of code for the source file version just prior to the coding phase. This represents auto-generated code.
SYST_LOC: Number of lines of code for the source file version delivered to system test.
BASE_COM: Number of lines of commented code for the source file version just prior to the coding phase. This represents auto-generated code.
SYST_COM: Number of lines of commented code for the source file version delivered to system test.

Process Metrics
INSP: Number of times the source file was inspected prior to system test.
Case Study: Accuracy Evaluation
Average Absolute Error:
$\mathrm{AAE} = \dfrac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$

Average Relative Error:
$\mathrm{ARE} = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{\lvert y_i - \hat{y}_i \rvert}{y_i + 1}$
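Both measures translate directly to code; a sketch assuming NumPy arrays of actual (y) and predicted (y_hat) fault counts:

import numpy as np

def aae(y, y_hat):
    # AAE = (1/n) * sum |y_i - y_hat_i|
    return float(np.mean(np.abs(y - y_hat)))

def are(y, y_hat):
    # ARE = (1/n) * sum |y_i - y_hat_i| / (y_i + 1)
    # The +1 presumably guards against division by zero for fault-free modules.
    return float(np.mean(np.abs(y - y_hat) / (y + 1)))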
Case Study: Prediction Results
Cross-Validation (Entire Data Set):       AAE 1.072   ARE 0.279
Data Splitting:
  Cross-Validation on Fit Data Set:       AAE 1.098   ARE 0.297
  Test Data Set:                          AAE 1.115   ARE 0.29
Average # of cases (N_n): 10.24
Case Study: Classification Evaluation
                       Actual class
Predicted class    nfp              fp
nfp                correct          Type II error
fp                 Type I error     correct
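In code, the two misclassification rates defined by this table could be computed as follows (a sketch; the label strings are illustrative, and both classes are assumed to occur in actual):

def error_rates(actual, predicted):
    # Type I: an actually-nfp module predicted fp.
    # Type II: an actually-fp module predicted nfp.
    pairs = list(zip(actual, predicted))
    n_nfp = sum(1 for a, _ in pairs if a == 'nfp')
    n_fp = sum(1 for a, _ in pairs if a == 'fp')
    type_i = sum(1 for a, p in pairs if a == 'nfp' and p == 'fp') / n_nfp
    type_ii = sum(1 for a, p in pairs if a == 'fp' and p == 'nfp') / n_fp
    return type_i, type_ii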
Case Study: Classification Results
Entire data set:

Threshold   Type I    Type II   ROI (Debug Costs)   Prior (%) Fault-Prone   Remaining (%) Fault-Prone
1           14.34%    13.68%    1041:463            33.20%                  4.54%
2           13.91%    14.50%    721:356             21.64%                  3.13%
3           14.74%    14.00%    602:321             16.52%                  2.31%
Fit and Test data set:

            Cross-Validation    Data Splitting                                       Average
            (Entire Data Set)   Fit Data Set (Cross-Validation)   Test Data Set
Threshold   Type I   Type II    Type I   Type II                  Type I   Type II   # of Cases (N_n)   Cost Ratio (C_I/C_II)
1.00        0.14     0.14       0.16     0.16                     0.16     0.16      2.28               0.57
2.00        0.14     0.15       0.15     0.15                     0.15     0.16      2.10               0.40
3.00        0.15     0.14       0.16     0.15                     0.15     0.17      3.16               0.37
Case Study: Return on Investment
Classification using CBR
Threshold   ROI (Debug Costs)
1           1041:463
2           721:356
3           602:321
Conclusion
A tool that matches our needs
Used for our extensive empirical work
Proved useful on a large-scale case study:
– Faster
– Easier
– Cheaper
Ready for future enhancements
Reminder
We will be presenting the tool on Friday. Please feel free to visit us!