Achieving High Software Reliability Using a Faster, Easier and Cheaper Method
NASA OSMA SAS '01
September 5-7, 2001
Taghi M. Khoshgoftaar
The Software Measurement Analysis and Reliability Toolkit
Outline
Introduction
Overview of SMART
Data Analysis and Modeling Features
Current Utilization of SMART
Case-Based Reasoning
Empirical Study
Conclusions
Introduction
Necessity of an integrated tool for efficient empirical software quality research
Commercial packages are available but expensive and don’t always match our exact needs
In-house development gives us availability, flexibility, and the possibility to evolve the tool
Overview of SMART
First version in 1998
Current version: 2.0
Written in Microsoft Visual C++
Runs on Microsoft Windows-based platforms
User-friendly GUI
SMART GUI (screenshots)
Features included in SMART
Data management
Multiple Linear Regression (MLR)
Case-based reasoning (CBR)
Case-based reasoning with two-group clustering
Case-based reasoning with three-group clustering
Module-order modeling (MOM)
Organizational Flowchart
Current Utilization of SMART at the ESEL Laboratory
Empirical research:
– Comparative studies of software quality models
– Case studies based on real-world systems
Case-Based Reasoning
Based on automated reasoning processes
Easy to use
Results are easy to understand and interpret
Looks at past cases that are similar to the present case in order to predict or classify a new instance (a retrieval sketch follows)
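As a rough illustration of that retrieval step, here is a minimal Python sketch; the names and the (metrics, outcome) case representation are hypothetical, not SMART's actual C++ API:

def nearest_neighbors(library, new_case, distance, n):
    """Return the n past cases most similar to new_case.

    library: list of (metrics, outcome) pairs from past projects.
    distance: a similarity function such as those defined later in this talk.
    """
    ranked = sorted(library, key=lambda case: distance(case[0], new_case))
    return ranked[:n]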
Case-Based Reasoning: Additional Advantages
The ability to alert users when a new case is outside the bounds of current experience
The ability to interpret the automated classification through the detailed description of the most similar case
The ability to take advantage of new or revised information as it becomes available
The ability to retrieve cases quickly as the case library scales up
Case-Based Reasoning
Working hypothesis for software quality modeling:
– A case currently in development will more than likely be fault-prone if past cases with similar attributes were fault-prone
Case-Based Reasoning: Comparing the Cases
The cases most similar to a new module (its nearest neighbors) are determined by similarity functions:
– Absolute Distance
– Euclidean Distance
– Mahalanobis Distance
Absolute Distance:
$d_{ij} = \sum_{k=1}^{m} w_k \,\lvert c_{jk} - x_{ik} \rvert$

Euclidean Distance:
$d_{ij} = \left( \sum_{k=1}^{m} \bigl( w_k (c_{jk} - x_{ik}) \bigr)^{2} \right)^{1/2}$

Mahalanobis Distance:
$d_{ij} = (c_j - x_i)'\, S^{-1}\, (c_j - x_i)$
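A minimal sketch of the three functions, assuming NumPy and reading the symbols as follows (inferred from context, not stated on the slide): c_j is the metric vector of library case j, x_i is the metric vector of new module i, w holds per-metric weights, and S is the covariance matrix of the library metrics.

import numpy as np

def absolute_distance(c_j, x_i, w):
    # d_ij = sum_k w_k * |c_jk - x_ik|
    return float(np.sum(w * np.abs(c_j - x_i)))

def euclidean_distance(c_j, x_i, w):
    # d_ij = ( sum_k (w_k * (c_jk - x_ik))^2 )^(1/2)
    return float(np.sqrt(np.sum((w * (c_j - x_i)) ** 2)))

def mahalanobis_distance(c_j, x_i, S):
    # d_ij = (c_j - x_i)' S^{-1} (c_j - x_i)
    diff = c_j - x_i
    return float(diff @ np.linalg.inv(S) @ diff)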
Case-Based Reasoning: Prediction Methods
The value of the dependent variable for a new module is estimated from the dependent-variable values of its nearest neighbors using a solution algorithm:
– Unweighted Average
– Inverse-Distance Weighted Average
$\hat{y}_i = \sum_{j \in N} \alpha_{ij}\, y_j$

For the inverse-distance weighted average:
$\alpha_{ij} = \dfrac{1/d_{ij}}{\sum_{j \in N} 1/d_{ij}}$
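A sketch of both solution algorithms, assuming neighbor_ys and distances are parallel NumPy arrays over the neighbor set N of module i (the names are illustrative):

import numpy as np

def unweighted_average(neighbor_ys):
    # Every nearest neighbor contributes equally.
    return float(np.mean(neighbor_ys))

def inverse_distance_weighted_average(neighbor_ys, distances, eps=1e-12):
    # alpha_ij = (1/d_ij) / sum_j (1/d_ij); eps is an added guard against
    # a zero distance and is not part of the slide's formula.
    inv = 1.0 / (np.asarray(distances, dtype=float) + eps)
    weights = inv / inv.sum()
    return float(np.sum(weights * np.asarray(neighbor_ys, dtype=float)))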
Case-Based Reasoning: Classification Methods
Used to classify a software module into a particular class (fault-prone, not fault-prone).
The types of classification methods include (see the sketch below):
– Majority Voting
– Data Clustering
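A sketch of the majority-voting variant over the nearest neighbors' known classes; the tie-breaking behavior here is an assumption, as the slides do not specify one.

from collections import Counter

def majority_vote(neighbor_classes):
    # neighbor_classes: e.g. ['fp', 'nfp', 'fp'] for the nearest neighbors.
    # Counter.most_common breaks ties by first occurrence, i.e. by
    # neighbor rank here; that choice is ours, not SMART's.
    return Counter(neighbor_classes).most_common(1)[0][0]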
Case Study: System Description
Two data sets were obtained from two large Windows-based applications used primarily for customizing the configuration of wireless products. Both data sets come from the initial release of these applications. The applications are written in C++ and provide similar functionality.
Case Study: System Description
Applications              Configuration of wireless products
Environment               Windows
Language                  C++

                          Application 1    Application 2
AENCSL*                   320 million      300 million
Actual Lines of Code      29 million       27.5 million
Case Study: Data Collection Effort
Data collected by engineers over several months using the available information in:
– Configuration Management Systems
– Problem Reporting Systems
Case Study: Independent Variables
Product Metrics: Statement Metrics
BASE_LOC: Number of lines of code for the source file version just prior to the coding phase. This represents auto-generated code.
SYST_LOC: Number of lines of code for the source file version delivered to system test.
BASE_COM: Number of lines of commented code for the source file version just prior to the coding phase. This represents auto-generated code.
SYST_COM: Number of lines of commented code for the source file version delivered to system test.

Process Metrics
INSP: Number of times the source file was inspected prior to system test.
Case Study: Accuracy Evaluation
Average Absolute Error:
$\mathrm{AAE} = \dfrac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$

Average Relative Error:
$\mathrm{ARE} = \dfrac{1}{n} \sum_{i=1}^{n} \dfrac{\lvert y_i - \hat{y}_i \rvert}{y_i + 1}$
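Both measures translate directly to code; a sketch assuming NumPy arrays of actual (y) and predicted (y_hat) fault counts:

import numpy as np

def aae(y, y_hat):
    # AAE = (1/n) * sum |y_i - y_hat_i|
    return float(np.mean(np.abs(y - y_hat)))

def are(y, y_hat):
    # ARE = (1/n) * sum |y_i - y_hat_i| / (y_i + 1)
    # The +1 presumably guards against division by zero for fault-free modules.
    return float(np.mean(np.abs(y - y_hat) / (y + 1)))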
Case Study: Prediction Results
Cross-Validation (Entire Data Set):       AAE 1.072   ARE 0.279
Data Splitting:
  Cross-Validation on Fit Data Set:       AAE 1.098   ARE 0.297
  Test Data Set:                          AAE 1.115   ARE 0.29
Average # of cases (N_n): 10.24
Case Study: Classification Evaluation
                       Actual class
Predicted class    nfp              fp
nfp                correct          Type II error
fp                 Type I error     correct
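In code, the two misclassification rates defined by this table could be computed as follows (a sketch; the label strings are illustrative, and both classes are assumed to occur in actual):

def error_rates(actual, predicted):
    # Type I: an actually-nfp module predicted fp.
    # Type II: an actually-fp module predicted nfp.
    pairs = list(zip(actual, predicted))
    n_nfp = sum(1 for a, _ in pairs if a == 'nfp')
    n_fp = sum(1 for a, _ in pairs if a == 'fp')
    type_i = sum(1 for a, p in pairs if a == 'nfp' and p == 'fp') / n_nfp
    type_ii = sum(1 for a, p in pairs if a == 'fp' and p == 'nfp') / n_fp
    return type_i, type_ii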
Case Study: Classification Results
Entire data set:

Threshold   Type I    Type II   ROI (Debug Costs)   Prior (%) Fault-Prone   Remaining (%) Fault-Prone
1           14.34%    13.68%    1041:463            33.20%                  4.54%
2           13.91%    14.50%    721:356             21.64%                  3.13%
3           14.74%    14.00%    602:321             16.52%                  2.31%
Fit and Test data set:

            Cross-Validation    Data Splitting                                       Average
            (Entire Data Set)   Fit Data Set (Cross-Validation)   Test Data Set
Threshold   Type I   Type II    Type I   Type II                  Type I   Type II   # of Cases (N_n)   Cost Ratio (C_I/C_II)
1.00        0.14     0.14       0.16     0.16                     0.16     0.16      2.28               0.57
2.00        0.14     0.15       0.15     0.15                     0.15     0.16      2.10               0.40
3.00        0.15     0.14       0.16     0.15                     0.15     0.17      3.16               0.37
Case Study: Return on Investment
Classification using CBR
Threshold   ROI (Debug Costs)
1           1041:463
2           721:356
3           602:321
Conclusion
A tool that matches our needs
Used for our extensive empirical work
Proved useful on a large-scale case study:
– Faster
– Easier
– Cheaper
Ready for future enhancements
Reminder
We will be presenting the tool on Friday. Please feel free to visit us!