Empirical Evaluations in Software Engineering Research: A Personal Perspective

112
1 Empirical Evaluations in Software Engineering Research: A Personal Perspective Ahmed E. Hassan Canada Research Chair in Software Analytics Blackberry Ultra Large Scale Systems Chair School of Computing, Queen’s University Kingston, Canada

Transcript of Empirical Evaluations in Software Engineering Research: A Personal Perspective

Page 1: Empirical Evaluations in Software Engineering Research: A Personal Perspective

1

Empirical Evaluations in Software Engineering Research: A Personal Perspective

Ahmed E. Hassan Canada Research Chair in Software Analytics Blackberry Ultra Large Scale Systems Chair

School of Computing, Queen’s University Kingston, Canada

Page 2: Empirical Evaluations in Software Engineering Research: A Personal Perspective

2

Today’s goal:

Give concrete ideas Inspire change

Page 3: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Kingston Home of the 1,000 Islands

3

Page 4: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Kingston Home of the 1,000 Islands

3

Page 5: Empirical Evaluations in Software Engineering Research: A Personal Perspective

4

Page 6: Empirical Evaluations in Software Engineering Research: A Personal Perspective

5

Intelligence throughout the lifetime of a software system

from inception to production http://sail.cs.queensu.ca

Page 7: Empirical Evaluations in Software Engineering Research: A Personal Perspective

1851

6

SAIL’s new home: A 5,000 sq. ft. historical building

Page 8: Empirical Evaluations in Software Engineering Research: A Personal Perspective

1851

6

SAIL’s new home: A 5,000 sq. ft. historical building

Page 9: Empirical Evaluations in Software Engineering Research: A Personal Perspective

1851

6

SAIL’s new home: A 5,000 sq. ft. historical building

Page 10: Empirical Evaluations in Software Engineering Research: A Personal Perspective

~15 years of Mining Software Repositories

7http://msrconf.org

Page 11: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Early MSR Days

8

Page 12: Empirical Evaluations in Software Engineering Research: A Personal Perspective

9

Early Day Miner

Page 13: Empirical Evaluations in Software Engineering Research: A Personal Perspective

MSR Today: Easy Peasy!

10

Page 14: Empirical Evaluations in Software Engineering Research: A Personal Perspective

11

Data track GHTorrent

Lots of data!

Page 15: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Then

12

Now

Page 16: Empirical Evaluations in Software Engineering Research: A Personal Perspective

13

Thanks to a large scale community

effort!

Page 17: Empirical Evaluations in Software Engineering Research: A Personal Perspective

14

Yet low practitioner interest!!

Page 18: Empirical Evaluations in Software Engineering Research: A Personal Perspective

15

Even though our friendly reviewers are pleased!

Page 19: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Why?!?

16

Page 20: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Why?!?

16

We continue to raise the bar for studies

Page 21: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Empirical evaluations continues to mature and evolve

17

Page 22: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Empirical evaluations continues to mature and evolve

17

No validation

Single System

Multiple Systems

GitHub Scale

+ Practitioner Feedback

Page 23: Empirical Evaluations in Software Engineering Research: A Personal Perspective

18

No validation

Single System

Multiple Systems

GitHub Scale

+ Practitioner Feedback

Page 24: Empirical Evaluations in Software Engineering Research: A Personal Perspective

18

No validation

Single System

Multiple Systems

GitHub Scale

+ Practitioner Feedback

Are things this bad? What can we do to prevent this?

Page 25: Empirical Evaluations in Software Engineering Research: A Personal Perspective

We are killing

research areas

19

Software Visualization

Program Understand.

Page 26: Empirical Evaluations in Software Engineering Research: A Personal Perspective

We are following industry instead of leading!!

Web Apps (e.g, AJAX) Data Analytics

Scripting

Infrastructure Mobile Apps IoT

Page 27: Empirical Evaluations in Software Engineering Research: A Personal Perspective

We are doing this to ourselves!!

21

Page 28: Empirical Evaluations in Software Engineering Research: A Personal Perspective

22

Page 29: Empirical Evaluations in Software Engineering Research: A Personal Perspective

22

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 30: Empirical Evaluations in Software Engineering Research: A Personal Perspective

23

Practitioner buy-in

More studies (N+1)

Replication data

Page 31: Empirical Evaluations in Software Engineering Research: A Personal Perspective

23

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 32: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not Software Enginering

24

Page 33: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not Software Enginering

24

Mobile AppsLogs

Build SystemsLoad Testing

NoSQL Systems

Page 34: Empirical Evaluations in Software Engineering Research: A Personal Perspective

25

Page 35: Empirical Evaluations in Software Engineering Research: A Personal Perspective

25

"I found the overall presentation disjointed…. This needs to focus more

on the IR issues and less on web analysis.” SIGIR Rejection 98

Page 36: Empirical Evaluations in Software Engineering Research: A Personal Perspective

26

Practitioner buy-in

Replication data

Not Soft. Eng!

Page 37: Empirical Evaluations in Software Engineering Research: A Personal Perspective

26

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 38: Empirical Evaluations in Software Engineering Research: A Personal Perspective

27

We can early-detect

brain cancer!!!

Page 39: Empirical Evaluations in Software Engineering Research: A Personal Perspective

28

How about skin/lung cancer?!!!

Page 40: Empirical Evaluations in Software Engineering Research: A Personal Perspective

The “N+1” Critique encourages

29

Case Study Fishing

Shallow Analysis

Page 41: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Generalization of approaches instead of generalization of results

30

Page 42: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

31

Page 43: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

31

Page 44: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

31

Page 45: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

31

N=15..241

Total N= 2,659

Page 46: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

32

Page 47: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Encourage Meta Analysis

32

1972

Page 48: Empirical Evaluations in Software Engineering Research: A Personal Perspective

More is less!

33

Page 49: Empirical Evaluations in Software Engineering Research: A Personal Perspective

34

More studies (N+1)

Replication data

Not Soft. Eng!

Page 50: Empirical Evaluations in Software Engineering Research: A Personal Perspective

34

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 51: Empirical Evaluations in Software Engineering Research: A Personal Perspective

35

Smoking kills!

Page 52: Empirical Evaluations in Software Engineering Research: A Personal Perspective

36

What do smokers think?!

Page 53: Empirical Evaluations in Software Engineering Research: A Personal Perspective

37

54.9% of the time patients did not receive the recommended care for their condition

[New England Journal of Medicine 2003]

Page 54: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Less than half of developers have

a CS degree [stack overflow survey 2016]

38

Page 55: Empirical Evaluations in Software Engineering Research: A Personal Perspective

“A couple billion lines of code later: static checking in the real world” Dawson Engler

39

Page 56: Empirical Evaluations in Software Engineering Research: A Personal Perspective

“A couple billion lines of code later: static checking in the real world” Dawson Engler

39

Page 57: Empirical Evaluations in Software Engineering Research: A Personal Perspective

“A couple billion lines of code later: static checking in the real world” Dawson Engler

39

Page 58: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Do practitioners agree with

each other?

Page 59: Empirical Evaluations in Software Engineering Research: A Personal Perspective

4141

Page 60: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Page 61: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Semi-structured interviews

(15 Key engineers)

–---------–-------–---------–-------–---------–-------–---------–-------

Interviewee’s list

Page 62: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Semi-structured interviews

(15 Key engineers)

Validation survey Interviews

(25 Senior engineers)

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------–---------–-------–---------–-------

Interviewee’s list

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Implications

Page 63: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Semi-structured interviews

(15 Key engineers)

Validation survey Interviews

(25 Senior engineers)

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------–---------–-------–---------–-------

Interviewee’s list

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Implications

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Verified implications

Page 64: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Practitioners do not agree with each other

42

Presentation

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

Survey (93 stakeholders)

Semi-structured interviews

(15 Key engineers)

Validation survey Interviews

(25 Senior engineers)

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------

–---------–-------–---------–-------–---------–-------–---------–-------

Interviewee’s list

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Implications

–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------–---------–-------

Verified implications

As low as 68% agreement

Page 65: Empirical Evaluations in Software Engineering Research: A Personal Perspective

43

Page 66: Empirical Evaluations in Software Engineering Research: A Personal Perspective

43

even within the same system

Page 67: Empirical Evaluations in Software Engineering Research: A Personal Perspective

44

Practitioner buy-in

More studies (N+1)

Not Soft. Eng!

Page 68: Empirical Evaluations in Software Engineering Research: A Personal Perspective

44

Practitioner buy-in

More studies (N+1)

Replication data

Not Soft. Eng!

Page 69: Empirical Evaluations in Software Engineering Research: A Personal Perspective

45

Replication data is not sufficient!

Page 70: Empirical Evaluations in Software Engineering Research: A Personal Perspective

MSR Today: Easy Peasy!

46

Page 71: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Limited knowledge of machine learning is a serious risk

47

Page 72: Empirical Evaluations in Software Engineering Research: A Personal Perspective

A Simple Example

Page 73: Empirical Evaluations in Software Engineering Research: A Personal Perspective

A Simple Example

Logistic Regression

Page 74: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Empirical Hypothesis Testing (is complex code more risky?)

49

Bugs ~ f(LOC, CC_max)

NOT about predicting bugs

Page 75: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Including correlated metrics

50

Model M1: Bugs ~ f(CC_max, CC_avg, PAR_max, FOUT_max) Model M2: Bugs ~ f(CC_avg, CC_max, PAR_max, FOUT_max)

Page 76: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Including correlated metrics

51

Model M1: Bugs ~ f(CC_max, CC_avg, PAR_max, FOUT_max) Model M2: Bugs ~ f(CC_avg, CC_max, PAR_max, FOUT_max)

Page 77: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Including correlated metrics

51

Model M1: Bugs ~ f(CC_max, CC_avg, PAR_max, FOUT_max) Model M2: Bugs ~ f(CC_avg, CC_max, PAR_max, FOUT_max)

Page 78: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Including correlated metrics

51

Model M1: Bugs ~ f(CC_max, CC_avg, PAR_max, FOUT_max) Model M2: Bugs ~ f(CC_avg, CC_max, PAR_max, FOUT_max)

60% of 2000-2011 studies do not handle correlated metrics [Shihab 2012]

Page 79: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Using F-measure in studies

52

Threshold = 0.2 Threshold = 0.5 Threshold = 0.8

Page 80: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Resampling imbalanced data

53

Page 81: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Resampling imbalanced data

53

Page 82: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Resampling imbalanced data

54

Page 83: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Resampling imbalanced data

54

Page 84: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Variable importance: Original vs under sampled data

55

Page 85: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Variable importance: Original vs under sampled data

55

Page 86: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Variable importance: Original vs under sampled data

55

Page 87: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not including control metrics

56

Model M1: Bugs ~ f( , CC_max, PAR_max, FOUT_max) Model M2: Bugs ~ f(TLOC, CC_max, PAR_max, FOUT_max)

Page 88: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not including control metrics

57

Model M1: Bugs ~ f( , CC_max, PAR_max, FOUT_max) Model M2: Bugs ~ f(TLOC, CC_max, PAR_max, FOUT_max)

Page 89: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Not including control metrics

57

Model M1: Bugs ~ f( , CC_max, PAR_max, FOUT_max) Model M2: Bugs ~ f(TLOC, CC_max, PAR_max, FOUT_max)

Page 90: Empirical Evaluations in Software Engineering Research: A Personal Perspective

58

Unseen100

out of sample bootstrap

10-fold cross validation

10x10 Cross Validation

Using 10 fold cross validation

Page 91: Empirical Evaluations in Software Engineering Research: A Personal Perspective

59

Unseen25

out of sample bootstrap

100 out of sample

bootstrap

10-fold cross validation

10x10 cross validation

Using 10 fold cross validation

Page 92: Empirical Evaluations in Software Engineering Research: A Personal Perspective

60

Unseen25

out of sample bootstrap

100 out of sample

bootstrap

10-fold cross validation

10x10 cross validation

Using 10 fold cross validation

Page 93: Empirical Evaluations in Software Engineering Research: A Personal Perspective

61

Unseen25

out of sample bootstrap

100 out of sample

bootstrap

10-fold cross validation

10x10 cross validation

Using 10 fold cross validation

Page 94: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Using default ANOVA

62

Model M1: Bugs ~ f( TLOC, ….. ) Model M2: Bugs ~ f( ……,TLOC )

Page 95: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Using default ANOVA

63

Model M1: Bugs ~ f( TLOC, ….. ) Model M2: Bugs ~ f( ……,TLOC )

Page 96: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Using default ANOVA

63

Model M1: Bugs ~ f( TLOC, ….. ) Model M2: Bugs ~ f( ……,TLOC )

Page 97: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Pitfalls in Analytical Modelling

Page 98: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Pitfalls in Analytical Modelling

Joint work with Kla Tantihamthavorn

Page 99: Empirical Evaluations in Software Engineering Research: A Personal Perspective

65

Pitfalls in Analytical Modelling

Page 100: Empirical Evaluations in Software Engineering Research: A Personal Perspective

66

Pitfalls in Analytical Modelling

Page 101: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Open Scripts

67

Page 102: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Open Scripts

67

Encourages community mentorship

Page 103: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Open Scripts

67

Encourages community mentorship

Works even for industrial data

Page 104: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Open Scripts

67

Encourages community mentorship

Works even for industrial data

Ensures research transparency

Page 105: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 106: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 107: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 108: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 109: Empirical Evaluations in Software Engineering Research: A Personal Perspective

68

Page 110: Empirical Evaluations in Software Engineering Research: A Personal Perspective

N

69

Where do we go from

here?

Page 111: Empirical Evaluations in Software Engineering Research: A Personal Perspective

• Trailblazing research should be encouraged and defended

• Inaccurate analysis is a serious and growing threat

• Deeper analysis is more valuable than large scale studies

• Generalization is not feasible nor desired by practitioners

• Industry is a partner not an idol

Parting thoughts

70

http://sail.cs.queensu.ca

Page 112: Empirical Evaluations in Software Engineering Research: A Personal Perspective

Think Different!