
Software Quality Ranking: Bringing Order to Software Modules in Testing

Fei Xing

Michael R. Lyu

Ping Guo

Outline

Background
Support Vector Machine
  Basic theory
  Ranking SVM
  Other types of SVM
Our proposed framework
Experiments
Conclusions

Background

Modern society is fast becoming dependent on software products and systems.

Achieving high reliability is one of the most important challenges facing the software industry.

Software quality models are urgently needed.

Background

Software quality model
  A software quality model is a tool for focusing software enhancement efforts.
  Such a model yields timely predictions on a module-by-module basis, enabling one to target high-risk modules.

Background

Software complexity metrics
  A quantitative description of program attributes.
  Closely related to the distribution of faults in program modules.
  Play a critical role in predicting the quality of the resulting software.

Background

Software quality prediction
  Software quality prediction aims to evaluate the software quality level periodically and to indicate software quality problems early.
  It investigates the relationship between the number of faults in a program and its software complexity metrics.

Related work

Several different techniques have been proposed to develop predictive software metrics for the classification of software program modules into fault-prone and non-fault-prone categories:
  Discriminant analysis
  Factor analysis
  Classification trees
  Pattern recognition
  EM algorithm
  Feedforward neural networks
  Random forests

The limitation of current models

Two categories cannot fully reflect the characteristics of the modules. Because resources (human, time, equipment, etc.) are limited, some of the fault-prone modules should be tested with higher priority.

An ideal approach is to rank all the modules according to their fault-prone level.

Research Objectives

Search for a well-accepted mathematical model for software quality ranking.

Lay out an integrated solution for software quality prediction in real-world projects.

Perform experimental comparison for the assessment of the proposed model.

Support Vector Machine

Introduced by Vapnik in the late 1960s on the foundation of statistical learning theory

Traced back to the classical structural risk minimization (SRM) approach

Generalizes well even in high-dimensional spaces under small training sample conditions

Basic theory of SVM

The current state-of-the-art classifier

[Figure: decision plane, support vectors, and margin]

Basic theory of SVM

The Optimal Separating Hyperplane
  Place a linear boundary between the two different classes, and orient the boundary in such a way that the margin is maximized.
  The separating hyperplane is

  $g(x) = w \cdot x + b = 0$

  The optimal hyperplane is required to satisfy the following constrained minimization:

  $\min_{w, b} \; \frac{1}{2} \|w\|^2$

  $\text{s.t.} \;\; y_i [(w \cdot x_i) + b] - 1 \ge 0, \quad i = 1, \dots, n$
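As a small illustration of the hard-margin constraints, the sketch below checks whether a candidate hyperplane satisfies y_i[(w . x_i) + b] - 1 >= 0 for every sample and reports the margin width 2/||w||. The toy samples and the values of w and b are hypothetical and hand-picked; this is not the authors' code, and it does not solve the minimization itself.

```python
import math

def decision(w, b, x):
    """Evaluate g(x) = w . x + b for one sample."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def satisfies_constraints(w, b, samples):
    """True if y_i * (w . x_i + b) - 1 >= 0 holds for every (x_i, y_i)."""
    return all(y * decision(w, b, x) - 1 >= 0 for x, y in samples)

def margin_width(w):
    """Distance between the two margin planes, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical, linearly separable toy data: (features, label in {+1, -1}).
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]
w, b = (0.5, 0.5), -1.0  # hand-picked candidate hyperplane (assumption)

print(satisfies_constraints(w, b, samples), margin_width(w))
```

Samples for which the constraint holds with equality lie exactly on the margin and play the role of support vectors.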

Basic theory of SVM

The Generalized Optimal Separating Hyperplane
  For the linearly non-separable case, positive slack variables $\xi_i$ are introduced:

  $\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$

  $\text{s.t.} \;\; y_i [(w \cdot x_i) + b] - 1 + \xi_i \ge 0, \quad \xi_i \ge 0$

  $C$ is used to weight the penalizing variables $\xi_i$, and a larger $C$ corresponds to assigning a higher penalty to errors.
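To make the role of C concrete, the following sketch evaluates the soft-margin objective for a fixed hyperplane. It uses the smallest slack that satisfies the constraints, xi_i = max(0, 1 - y_i(w . x_i + b)). The data, w, and b are hypothetical values chosen for illustration, and no optimization is performed.

```python
def slack(w, b, x, y):
    """Smallest xi_i >= 0 with y_i * (w . x_i + b) - 1 + xi_i >= 0."""
    g = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * g)

def soft_margin_objective(w, b, samples, C):
    """0.5 * ||w||^2 + C * sum of slack variables."""
    penalty = sum(slack(w, b, x, y) for x, y in samples)
    return 0.5 * sum(wi * wi for wi in w) + C * penalty

# Hypothetical toy data; the third sample falls on the wrong side of the plane.
samples = [((2.0, 0.0), 1), ((0.0, 0.0), -1), ((0.5, 0.0), 1)]
w, b = (1.0, 0.0), -1.0

for C in (1.0, 10.0):
    print(C, soft_margin_objective(w, b, samples, C))
```

With the same hyperplane, raising C from 1 to 10 multiplies the cost of the one violating sample by ten, which is the sense in which a larger C penalizes errors more heavily.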

Ranking SVM

Rank each sample to an appropriate position.
For the linear case, find a weight vector $w$ which makes the maximum number of the following inequalities hold:

[inequalities not preserved in the transcript]

Constrained optimization problem:

[formulation not preserved in the transcript]
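The slide's formulas did not survive extraction, so the sketch below relies on an assumption: the common pairwise reading of Ranking SVM, in which a module with more faults should receive a higher linear score w . x than a module with fewer faults. It builds the preference pairs and measures what fraction of the resulting inequalities a given w satisfies. The feature vectors, fault counts, and function names are hypothetical.

```python
from itertools import combinations

def preference_pairs(modules):
    """Return (x_hi, x_lo) pairs where x_hi belongs to the module with more faults.

    modules: list of (feature_vector, fault_count). Ties produce no pair.
    """
    pairs = []
    for (xa, fa), (xb, fb) in combinations(modules, 2):
        if fa > fb:
            pairs.append((xa, xb))
        elif fb > fa:
            pairs.append((xb, xa))
    return pairs

def score(w, x):
    """Linear ranking score w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def satisfied_fraction(w, pairs):
    """Fraction of pairs with score(x_hi) > score(x_lo)."""
    ok = sum(1 for hi, lo in pairs if score(w, hi) > score(w, lo))
    return ok / len(pairs)

# Hypothetical modules: ((metric_1, metric_2), number_of_faults).
modules = [((100, 5), 7), ((40, 2), 1), ((60, 9), 3), ((40, 3), 1)]
pairs = preference_pairs(modules)
print(len(pairs), satisfied_fraction((1, 0), pairs), satisfied_fraction((0, 1), pairs))
```

A Ranking SVM would search for the w that satisfies as many of these pairwise inequalities as possible, with slack for the ones it cannot satisfy; the sketch only evaluates candidate weight vectors.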

Other types of SVM

SVM with risk control
Transductive Support Vector Machines
Support Vector Regression

Our framework

Experiments

Data Description
  Medical Imaging System (MIS) data set.
  11 software complexity metrics were measured for each of the modules.
  Change Reports (CRs) represent faults detected.

Metrics of MIS data

Total lines of code including comments (LOC)
Total code lines (CL)
Total character count (TChar)
Total comments (TComm)
Number of comment characters (MChar)
Number of code characters (DChar)
Halstead's program length ($N$)
Halstead's estimated program length ($\hat{N}$)
Jensen's estimator of program length ($N_F$)
McCabe's cyclomatic complexity ($v(G)$)
Belady's bandwidth metric (BW)
...

Experiments on Model Selection

The later the errors are found, the higher the risk will be

Risk increases as time goes by

e.g. $r(t) = b t^2$, $r(t) = a e^{bt}$
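The two example risk functions can be written down directly. In this minimal sketch the parameters a and b come from the slide, while the numeric values passed in are hypothetical and chosen only to show that both curves grow with t.

```python
import math

def quadratic_risk(t, b):
    """r(t) = b * t^2"""
    return b * t ** 2

def exponential_risk(t, a, b):
    """r(t) = a * e^(b * t)"""
    return a * math.exp(b * t)

# Hypothetical parameter values, for illustration only.
for t in range(1, 5):
    print(t, quadratic_risk(t, b=0.5), round(exponential_risk(t, a=1.0, b=0.5), 3))
```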

Experiments on Model Selection

Measure of risk

Experiments on Model Selection

Software Development Process Simulation, Case 1
  The number of developed software modules increases by 40 at each time step.
  10 percent of all the modules have fault data available.
  The modules with fault data are used to train the model.
  The 40 newly developed modules are used for testing.
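One way to read the Case 1 setup is sketched below. It assumes that, at each time step, a random 10 percent of the modules developed so far carry fault data and form the training set, while the next 40 modules form the test set. The function name, the random sampling, the seed, and the total of 200 modules are all assumptions made for illustration, not details taken from the slides.

```python
import random

def simulate_case1(total_modules, step=40, labeled_fraction=0.10, seed=0):
    """Yield (train_ids, test_ids) for each time step of the simulation."""
    rng = random.Random(seed)
    for start in range(step, total_modules, step):
        developed = list(range(start))  # modules finished before this step
        n_labeled = max(1, int(round(labeled_fraction * len(developed))))
        train = sorted(rng.sample(developed, n_labeled))  # modules with fault data
        test = list(range(start, min(start + step, total_modules)))  # new modules
        yield train, test

# Hypothetical project size of 200 modules.
for train, test in simulate_case1(200):
    print(len(train), "training modules;", "test modules", test[0], "to", test[-1])
```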

Experiments on Model Selection

Software Development Process Simulation, Case 2
  The number of developed software modules increases by 40 at each time step.
  The fault data of all the previous modules can be obtained.
  The modules with fault data are used to train the model.
  The 40 newly developed modules are used for testing.
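Under the Case 2 assumption that fault data exist for every previously developed module, the split at each time step reduces to "train on everything so far, test on the next 40". The sketch below illustrates that reading; the function name and the total of 120 modules are hypothetical.

```python
def simulate_case2(total_modules, step=40):
    """Yield (train_ids, test_ids); all earlier modules carry fault data."""
    for start in range(step, total_modules, step):
        train = list(range(start))  # every previously developed module
        test = list(range(start, min(start + step, total_modules)))  # new modules
        yield train, test

# Hypothetical project size of 120 modules.
for train, test in simulate_case2(120):
    print(len(train), "training modules;", len(test), "test modules")
```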

Comparison of ranking models

Applied models
  LOC: Lines of code
  PCA: Principal Component Analysis
  Regression tree
  SVR: Support Vector Regression
  Ranking SVM

Evaluation criteria
  Normalized Discounted Cumulative Gain (nDCG)
  Average Distance Measure (ADM)

Normalized Discounted Cumulative Gain (nDCG)

The Gain (G) of each software module is its fault-prone score
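The slide's nDCG formula is not preserved in the transcript, so the sketch below uses one common formulation as an assumption: the gain at 1-based rank i is discounted by log2(i + 1), and the total is normalized by the DCG of the ideal ordering. The gains stand in for fault-prone scores, and the numbers are hypothetical.

```python
import math

def dcg(gains, k=None):
    """Discounted cumulative gain of gains listed in ranked order."""
    if k is not None:
        gains = gains[:k]
    # enumerate is 0-based, so rank i (1-based) is discounted by log2(i + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains_in_predicted_order, k=None):
    """DCG of the predicted ordering divided by DCG of the ideal ordering."""
    ideal = sorted(gains_in_predicted_order, reverse=True)
    ideal_dcg = dcg(ideal, k)
    return dcg(gains_in_predicted_order, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical fault-prone scores, listed in the order a model ranked the modules.
print(ndcg([3, 2, 1]), ndcg([1, 2, 3]))
```

A perfect ordering scores 1, and placing the most fault-prone modules low in the list reduces the score, which matches the goal of testing high-risk modules first.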

Comparison on nDCG measure

Average Distance Measure (ADM)
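The ADM definition is not preserved in the transcript. The sketch below assumes the commonly cited form, one minus the mean absolute difference between the system's score and the true score of each item, with both scores scaled to [0, 1]. How the authors mapped fault counts onto that range is not stated here, so the inputs are hypothetical.

```python
def adm(system_scores, true_scores):
    """Average Distance Measure: 1 - mean(|system - true|), scores in [0, 1]."""
    if not system_scores or len(system_scores) != len(true_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    total = sum(abs(s - t) for s, t in zip(system_scores, true_scores))
    return 1.0 - total / len(system_scores)

# Hypothetical normalized scores for three modules.
print(adm([0.9, 0.5, 0.1], [1.0, 0.5, 0.0]))
```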

Comparison on ADM measure

Discussion

Features of this work
  Introduces a ranking model, instead of a classification model, into software quality prediction.
  Proposes an integrated framework for software quality prediction on real-world projects.

Conclusions

Ranking SVM offers a promising technique in software module ranking.

The ranking model is more efficient than the classification model when enough fault data are available.

When fault data are limited, the classification model is better than the ranking model.

The end

Thanks

Q&A