Predicting Faults from Cached History

Post on 25-Jun-2015


29th International Conference on Software Engineering (ICSE 2007), ACM SIGSOFT Distinguished Paper Award winner.


BugCache
Predicting Defects

Sung Kim • MIT
Tom Zimmermann • Saarland University
Jim Whitehead • UC Santa Cruz
Andreas Zeller • Saarland University

The Problem

How should we allocate our resources for quality assurance?

Which files should we focus on?

Which files are most bug-prone?

The Problem

Where are bugs?

Temporal locality: Defective files are likely to have more defects soon. [Ostrand, Weyuker]

In modified files! [Nagappan et al.]

In new files! [Graves et al.]

Spatial locality: Near other bugs! [Zimmermann et al.]

Our Solution

• List of most bug-prone files
• Combine all bug occurrence models

[Diagram: the Bug Cache holds the 10% of all files that are most defect-prone. Operations: load, pre-fetch, replacement. Nearby files: co-changes.]

Outline

• BugCache Model
• Cache update
• Replacement Policies
• Pre-fetch
• Evaluation
• 7 open source projects
• Related Work
• Summary

[Diagram: the Bug Cache over a change history of files A, B, C. Changes are labeled as fix changes or non-fix changes. A file is loaded if missed; nearby files are pre-fetched.]

Cache Model

Cache size: 2

[Diagram: change history over files A, B, C; one access is marked Miss.]

Cache Update

Parameter: Block size (neighborhood size)

• Load missed files
• Load nearby files (spatial locality)

[Table: for each of the files A, B, C, D, the number of common changes with a given file.]
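The cache update can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `cochange_counts` and `block_for` and the sample commits are hypothetical, "nearby" is taken to mean "most common changes with the missed file", and the block size is assumed to include the missed file itself.

```python
from collections import Counter

def cochange_counts(commits):
    """Count how often each pair of files changed together in one commit."""
    counts = {}
    for files in commits:
        unique = sorted(set(files))
        for f in unique:
            for g in unique:
                if f != g:
                    counts.setdefault(f, Counter())[g] += 1
    return counts

def block_for(missed, counts, block_size):
    """Return the missed file plus its closest co-changed neighbors.

    Assumption: block_size counts the missed file, so block_size - 1
    neighbors are loaded alongside it.
    """
    neighbors = counts.get(missed, Counter())
    # Most common changes first; ties broken by name for determinism.
    ranked = sorted(neighbors, key=lambda f: (-neighbors[f], f))
    return [missed] + ranked[:max(block_size - 1, 0)]

# Hypothetical change history: each inner list is one commit.
commits = [["A", "B"], ["A", "B", "C"], ["A", "D"], ["B", "C"]]
counts = cochange_counts(commits)
block = block_for("A", counts, block_size=2)
```

Here `B` shares two changes with `A`, while `C` and `D` share one each, so a miss on `A` with block size 2 loads `A` and `B`.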

Cache Model

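Cache size: 2
Block size: 2

[Diagram: change history over files A, B, C, D; two accesses are marked Hit and two are marked Miss. The cache is full when the next file must be loaded.]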

Which one should be replaced?

Replacement Policies

• Least recently used (LRU): Unload the files that have the least recently found defect.

• Least frequently changed (CHANGE): Unload the files that have the fewest changes.

• Least frequent defects (BUG): Unload the files that have the fewest defects.

Parameter: Replacement Policy
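The three policies differ only in which per-file statistic they minimize when choosing a file to unload. A minimal sketch, assuming each cached file carries the three statistics below; the `FileStats` class, the `pick_victim` function, and the sample values are hypothetical and not taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    name: str
    last_defect: int  # index of the most recent change that revealed a defect
    changes: int      # number of changes so far
    defects: int      # number of defects found so far

def pick_victim(cached, policy):
    """Return the name of the file to unload under the given policy."""
    keys = {
        "LRU": lambda s: s.last_defect,   # least recently found defect
        "CHANGE": lambda s: s.changes,    # fewest changes
        "BUG": lambda s: s.defects,       # fewest defects
    }
    return min(cached, key=keys[policy]).name

# Hypothetical cache contents.
cached = [
    FileStats("X", last_defect=10, changes=2, defects=3),
    FileStats("Y", last_defect=12, changes=5, defects=1),
    FileStats("Z", last_defect=7, changes=4, defects=2),
]
victims = {p: pick_victim(cached, p) for p in ("LRU", "CHANGE", "BUG")}
```

With these values each policy unloads a different file: LRU picks `Z`, CHANGE picks `X`, and BUG picks `Y`.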

Cache Model

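Cache size: 2
Block size: 2
Replacement: BUG

[Diagram: change history over files A, B, C, D; two accesses are marked Hit and two are marked Miss.]

[Table, labeled Block size: 1, Cache size: 2: the cached files B and C with columns LRU, CHANGE, BUG. The rows read -5, 2, 2 and -3, 3, 1. Under BUG, the file with the fewest defects (BUG = 1) is replaced.]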

Pre-fill and pre-fetch

• Pre-fill
  • Fill cache with largest files (LOC)

• Pre-fetch
  • Load changed files
  • Load added files
  • Unload deleted files

Parameter: Pre-fetch size

Cache Model

Cache size: 2
Block size: 2
Replacement: BUG
Pre-fetch size: 1

[Diagram: change history over files A, B, C, D with pre-fill and pre-fetch steps; one access is marked Hit and three are marked Miss.]

Hit rate = #Hits / #Defects = 25%
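The hit-rate computation can be illustrated with a deliberately simplified replay. This sketch assumes LRU replacement, a block size of 1, and no pre-fetch; the function name `hit_rate` and the defect sequence are hypothetical and do not reproduce the history shown on the slide.

```python
def hit_rate(defects, cache_size, prefill=()):
    """Replay defect events through an LRU cache; return #hits / #defects.

    `defects` lists the file hit by each defect, in chronological order.
    `prefill` optionally seeds the cache before the replay starts.
    """
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    cache = list(prefill)[:cache_size]  # least recently used file first
    hits = 0
    for f in defects:
        if f in cache:
            hits += 1
            cache.remove(f)             # will be re-appended as most recent
        elif len(cache) >= cache_size:
            cache.pop(0)                # unload the least recently used file
        cache.append(f)                 # f now has the most recent defect
    return hits / len(defects) if defects else 0.0

# Hypothetical sequence: A misses, B misses, A hits, C misses.
rate = hit_rate(["A", "B", "A", "C"], cache_size=2)
```

One of the four defects falls in a cached file, so `rate` is 0.25. Pre-filling the cache with `A` would turn the first miss into a hit.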

Evaluation

Hit Rates

Cache size = 10%
Block/pre-fetch size = 50% of the cache size
Replacement policy = LRU

Project       File   Function
Apache 1.3    71%    60%
Columba       79%    67%
Eclipse       93%    69%
JEdit         83%    46%
Mozilla       85%    55%
PostgreSQL    76%    59%
Subversion    67%    43%

Exhaustive Evaluation

• Cache size: fixed to 10%

• Vary block size: 0% to 100% of cache size

• Vary pre-fetch size: 0% to 100% of cache size

• Vary replacement: LRU, CHANGE, BUG
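An exhaustive search over these options amounts to a grid sweep. The sketch below assumes a 25-point step for the two size parameters, which the slides do not specify; `sweep`, `toy_evaluate`, and the scoring formula are hypothetical stand-ins for running the real cache simulation on a project history.

```python
from itertools import product

def sweep(evaluate, step=25):
    """Try every (block %, pre-fetch %, policy) combination.

    `evaluate(block, prefetch, policy)` must return a hit rate; sizes are
    percentages of the cache size. Returns the best setting, its score,
    and all results.
    """
    sizes = range(0, 101, step)
    policies = ("LRU", "CHANGE", "BUG")
    results = {
        (block, prefetch, policy): evaluate(block, prefetch, policy)
        for block, prefetch, policy in product(sizes, sizes, policies)
    }
    best = max(results, key=results.get)
    return best, results[best], results

def toy_evaluate(block, prefetch, policy):
    """Stand-in score that peaks at block=50, prefetch=25, policy=BUG."""
    return -abs(block - 50) - abs(prefetch - 25) + (1 if policy == "BUG" else 0)

best, best_score, results = sweep(toy_evaluate)
```

With a step of 25 the grid has 5 × 5 × 3 = 75 settings, and the toy score selects `(50, 25, "BUG")`.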

Function Level Default vs Optimal Options

Cache size = 10% of all functions/methods

Project       Default   Optimal
Apache 1.3    60%       62%
Columba       67%       68%
Eclipse       69%       72%
JEdit         46%       49%
Mozilla       55%       55%
PostgreSQL    59%       59%
Subversion    43%       46%

Function Level Optimal Hit Rates

Cache size = 10% of all functions/methods

Project       Functions   Hit rate   Block   Pre-fetch   Replace
Apache 1.3    2,113       62%        15%     17%         BUG
Columba       8,428       68%        57%     20%         BUG
Eclipse       33,214      72%        20%     4%          BUG
JEdit         5,489       49%        85%     8%          BUG
Mozilla       8,203       55%        41%     14%         LRU
PostgreSQL    8,659       59%        29%     17%         LRU
Subversion    3,693       46%        71%     14%         BUG

File Level Default vs Optimal Options

Cache size = 10% of all files

Project       Default   Optimal
Apache 1.3    71%       82%
Columba       79%       83%
Eclipse       93%       95%
JEdit         83%       85%
Mozilla       85%       88%
PostgreSQL    76%       79%
Subversion    67%       73%

File Level Optimal Hit Rates

Cache size = 10% of all files

Project       Files   Hit rate   Block   Pre-fetch   Replace
Apache 1.3    154     82%        50%     0%          LRU
Columba       1,428   83%        59%     0%          BUG
Eclipse       3,330   95%        20%     0%          LRU
JEdit         420     85%        23%     0%          LRU
Mozilla       396     88%        23%     0%          LRU
PostgreSQL    598     79%        22%     0%          LRU
Subversion    255     73%        42%     0%          LRU

Related Work

[Chart: bars on a 0 to 100 scale for BugCache (Top 10%), Hassan et al. (Top 10%), Ostrand et al. (Top 20%), Khoshgoftaar et al. (Top 20%), and Khoshgoftaar et al. (Top 10%).]

In previous work, the top 10% predicts 44% to 78% and the top 20% predicts 71% to 93%.

BugCache at 10% predicts 73% to 95%.

Summary

BugCache
Predicting Defects

Sung Kim • MIT
Tom Zimmermann • Saarland University
Jim Whitehead • UC Santa Cruz
Andreas Zeller • Saarland University