Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source...

Is the Pareto Principle Applicable to the Core Teams of GitHub Projects? Kazuhiro Yamashita Yasutaka Kamei Shane McIntosh Naoyasu Ubayashi Ahmed E. Hassan

Transcript of Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source...

Page 1: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Is the Pareto Principle Applicable to the Core Teams of GitHub Projects?

Kazuhiro Yamashita, Yasutaka Kamei, Shane McIntosh, Naoyasu Ubayashi, Ahmed E. Hassan

Page 2: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Core developers play a critical role in software development

Core developers are responsible for guiding and coordinating the development of an OSS project. (Nakakoji)

Core developers are the most productive developers, who have made roughly 80% of the total contributions. (Mockus)

Page 3: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

In fact, some argue that core developers in OSS projects follow the Pareto Principle

[Figure: Effort and Result bars, each split 80% / 20%, illustrating that 20% of the effort yields 80% of the result]

Page 4: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Pareto Principle in Software Development

[Figure: 20% of a project's developers produce 80% of its artifacts]

Page 5: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Prior studies have arrived at mixed conclusions about core teams and the Pareto Principle

Findings grouped as: Pareto, Non-Pareto, Other

Prior studies: Goeminne (IWSQM), Robles (RAMSS), Mockus (TOSEM), Geldenhuys (ECSEAA), Koch (ISJ), Dinh-Trong (TSE)

The results depend on a small number of case study systems

Page 6: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Prior studies have arrived at mixed conclusions about core teams and the Pareto Principle

Core team sizes grouped as: < 10 or 15, Other

Prior studies: Goeminne (IWSQM), Robles (RAMSS), Mockus (TOSEM), Geldenhuys (ECSEAA), Koch (ISJ), Dinh-Trong (TSE)

Page 7: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Overview of our study of core teams on GitHub

Applicability of the Pareto Principle

Number of Core Developers

Page 8: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Overview of our study of core teams on GitHub

Core and Non-Core Developer Activities

Applicability of the Pareto Principle

Number of Core Developers

Page 9: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Collecting and analyzing GitHub data to study core team activity

[Figure: study pipeline with the stages Projects, Filter, Heuristics (Core / Non-Core), Calc Prop, and Classify Commits, producing Core Team Size and Activity]

Page 10: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Collecting and analyzing GitHub data to study core team activity

[Figure: study pipeline with the stages Projects, Filter, Heuristics (Core / Non-Core), Calc Prop, and Classify Commits, producing Core Team Size and Activity]

Page 11: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Preprocessing GitHub data to handle forks, duplicates, and to remove immature projects

8,510,504 repositories -> 2,496 repositories

Page 12: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Collecting and analyzing GitHub data to study core team activity

[Figure: study pipeline with the stages Projects, Filter, Heuristics (Core / Non-Core), Calc Prop, and Classify Commits, producing Core Team Size and Activity]

Page 13: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Using heuristics to identify core team members

Three heuristics, each identifying a set of Core developers: Commit-based, LOC-based, Access-based

Page 14: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Our commit-based core contributor heuristic

[Figure: number of commits per contributor for contributors A, B, C, and D; each mark represents one commit]

Page 15: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Step 1: Sort contributors by their number of commits

[Figure: contributors A, B, C, and D ordered by number of commits]

Page 16: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Step 2: Compute the proportion of commits that each contributor made

Commit ratio: A = 60%, B = 20%, C = 10%, D = 10%

Page 17: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Step 3: Core contributors are those developers at or below the 0.8 cumulative contribution cutoff

Cumulative ratio: A = 0.6, B = 0.8, C = 0.9, D = 1.0

Num. CoreDev = 2 (A and B)

Pct. CoreDev = 2/4 * 100 = 50%
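
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `core_contributors`, the input format (one author name per commit), and the alphabetical tie-break between contributors with equal commit counts are all assumptions made for the example.

```python
from collections import Counter


def core_contributors(commit_authors, cutoff=0.8):
    """Return the contributors whose commits reach the cumulative cutoff."""
    counts = Counter(commit_authors)
    total = sum(counts.values())
    # Step 1: sort by number of commits, largest first.
    # Breaking ties by name is an assumption of this sketch.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    core, covered = [], 0
    for author, n_commits in ranked:
        # Step 3: stop once the contributors selected so far reach the cutoff.
        if covered / total >= cutoff:
            break
        # Step 2: this contributor's share is n_commits / total.
        core.append(author)
        covered += n_commits
    return core


# The slide example: A, B, C, D with 60%, 20%, 10%, 10% of the commits.
commits = ["A"] * 6 + ["B"] * 2 + ["C"] + ["D"]
core = core_contributors(commits)
print(core, f"{100 * len(core) / 4:.0f}%")
```

On the slide example this selects A and B, so the core team is 2 of 4 contributors (50%).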

Page 18: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Collecting and analyzing GitHub data to study core team activity

[Figure: study pipeline with the stages Projects, Filter, Heuristics (Core / Non-Core), Calc Prop, and Classify Commits, producing Core Team Size and Activity]

Page 19: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Overview of our study of core teams on GitHub

Core and Non-Core Developer Activities

Applicability of the Pareto Principle

Number of Core Developers

Page 20: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Overview of our study of core teams on GitHub

Core and Non-Core Developer Activities

Applicability of the Pareto Principle

Number of Core Developers

Page 21: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Collecting and analyzing GitHub data to study core team activity

[Figure: study pipeline with the stages Projects, Filter, Heuristics (Core / Non-Core), Calc Prop, and Classify Commits, producing Core Team Size and Activity]

Page 22: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Our approach to studying core team size

Compliance with the Pareto Principle is judged by the Percentage of Core Devs (axis marks: 10%, 20%, 30%)

Stratify projects along the confounding factors: LOC, Total Authors, and Age, each divided into Small, Medium, and Large

The example project does not follow the Pareto Principle
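
The stratification step can be illustrated with a short Python sketch. The slides do not say how the Small / Medium / Large boundaries are chosen, so the rank-based tercile split below is purely an assumption, as are the function name `stratify` and the sample values.

```python
def stratify(values):
    """Label each value Small, Medium, or Large by rank tercile.

    The tercile split is an assumption of this sketch; the slides do not
    state where the group boundaries lie.
    """
    labels = ("Small", "Medium", "Large")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [None] * len(values)
    for rank, index in enumerate(order):
        # rank * 3 // n maps the sorted positions onto 0, 1, 2.
        result[index] = labels[rank * 3 // len(values)]
    return result


# Hypothetical LOC values for six projects.
loc = [1200, 450000, 30000, 8000, 95000, 2500]
print(list(zip(loc, stratify(loc))))
```

The same function could be applied to Total Authors or Age so that the percentage of core developers is compared within each group rather than across all projects.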

Page 23: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Core team proportions vary widely

[Figure: distribution of core team proportions under the commit-based heuristic, with projects divided by LOC]

Page 24: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Often, there are 15 or fewer core developers in a project

Number of core developers in projects

Commit-Based: 88%, LOC-Based: 98%, Access-Based: 96%

Page 25: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Overview of our study of core teams on GitHub

Core and Non-Core Developer Activities

Applicability of the Pareto Principle

Number of Core Developers

More than half of the projects do not follow the Pareto Principle

Most projects have 15 or fewer core developers

Page 26: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Overview of our study of core teams on GitHub

Core and Non-Core Developer Activities

Applicability of the Pareto Principle

Number of Core Developers

More than half of the projects do not follow the Pareto Principle

Most projects have 15 or fewer core developers

Page 27: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Collecting and analyzing GitHub data to study core team activity

[Figure: study pipeline with the stages Projects, Filter, Heuristics (Core / Non-Core), Calc Prop, and Classify Commits, producing Core Team Size and Activity]

Page 28: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Our approach to studying activity

We classify the commits by using keywords.

Activity Type: Keywords

Development

Forward Engineering: implement, add, request

Maintenance

Reengineering: optimiz, adjust

Corrective Engineering: bug, fix, issue, error

Management: license, formatting, TODO
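
A keyword classifier along these lines can be sketched as follows. The keyword lists come from the slide; everything else is an assumption for illustration: the function name `classify_commit`, case-insensitive substring matching, and returning every matching activity type when a message matches more than one.

```python
# Keywords from the slide; "optimiz" is a stem that matches optimize/optimization.
KEYWORDS = {
    "Forward Engineering": ("implement", "add", "request"),
    "Reengineering": ("optimiz", "adjust"),
    "Corrective Engineering": ("bug", "fix", "issue", "error"),
    "Management": ("license", "formatting", "todo"),
}


def classify_commit(message):
    """Return the set of activity types whose keywords occur in the message.

    Substring matching and multi-label output are assumptions of this sketch.
    """
    text = message.lower()
    return {
        activity
        for activity, words in KEYWORDS.items()
        if any(word in text for word in words)
    }


for msg in ("Fix crash in parser", "Optimize inner loop", "Update README"):
    print(msg, "->", sorted(classify_commit(msg)))
```

Plain substring matching is deliberately simple and will over-match (for example, "add" also occurs in "address"); a word-boundary match would be stricter but would need the stems expanded.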

Page 29: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

No big differences in proportions of development activities

[Figure: proportions of development activities under the Commit-Based, LOC-Based, and Access-Based heuristics]

Page 30: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Overview of our study of core teams on GitHub

Core and Non-Core Developer Activities

Applicability of the Pareto Principle

Number of Core Developers

More than half of the projects do not follow the Pareto Principle

Most projects have 15 or fewer core developers

There are no big differences between core and non-core activities

Page 31: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Overview of our study of core teams on GitHub

Core and Non-Core Developer Activities

Applicability of the Pareto Principle

Number of Core Developers

More than half of the projects do not follow the Pareto Principle

Most projects have 15 or fewer core developers

There are no big differences between core and non-core activities

Page 32: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Extremely large core teams may be interesting

Number of projects by core team size:

Heuristic      -15    16-20  21-50  51-100  101-
Commit-Based   2,197  98     137    17      47
LOC-Based      2,454  15     13     4       10
Access-Based   1,164  24     24     0       0

Page 33: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Many projects face a bus factor risk

Commit-Based: 43% (Core=1: 8%)

LOC-Based: 81% (Core=1: 24%)

Access-Based: 54% (Core=1: 21%)

In fact, most projects have fewer than 5 core developers

Page 34: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Conclusion

Page 35: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Page 36: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Core Developer: additional slides

Page 37: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Additional description of our definition

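[Figure: cumulative ratio for contributors A, B, C, D, and E, with the 0.8 cutoff and the 1.0 total marked; annotation: "Depend on Name"]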

Page 38: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Commit-based

[Figure: panels for Age and Total Author]

Page 39: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

LOC-based

[Figure: panels for Age, Total Author, and LOC]

Page 40: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Access-based

[Figure: panels for Age, Total Author, and LOC]

Page 41: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

8,510,504 repositories -> 4,618 repositories

Page 42: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

Page 43: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(1) Filter projects using GHTorrent

Filter out forked repositories.

Page 44: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Fork

One of the features of GitHub

[Figure: an Original Repository is forked (cloned) into a Fork Repository, which sends a Pull Request back to the Original Repository]

Page 45: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(1) Filter projects using GHTorrent

Filter out forked repositories.

Filter out repositories with fewer than 10 developers.

Page 46: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(1) Filter projects using GHTorrent

Filter out forked repositories.

Filter out repositories with fewer than 10 developers.

Filter out repositories that are developed outside of GitHub.

Page 47: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(1) Filter projects using GHTorrent

Filter out forked repositories.

Filter out repositories with fewer than 10 developers.

Filter out repositories that are developed outside of GitHub.

8,510,504 repositories -> 4,618 repositories

Page 48: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

Page 49: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(2) Clone repositories to a local server

4,618 repositories -> 4,154 repositories

Page 50: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

Page 51: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(3) Filter duplicate projects

[Figure: Project A has a GitHub fork (Fork of A); Project A is also cloned and registered as Project B, which is a clone of A]

Page 52: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(3) Filter duplicate projects

4,618 repositories -> 3,533 repositories

Project A Project B

Compare SHAs

c87cce1e1a7260f40ccb5455e44c8b67f28651fa5e

655b8be757dd93a4cf3718145880cf484e34e63bde
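
The SHA comparison can be sketched as below. This is only an illustration under stated assumptions: the slides do not say which commits are compared or how much overlap counts as a duplicate, so the sketch treats two repositories as duplicates when they share at least one commit SHA and keeps the first one seen. The function name `filter_duplicates` and the shortened SHA strings are hypothetical.

```python
def filter_duplicates(repo_shas):
    """Keep only repositories that share no commit SHA with an earlier one.

    repo_shas maps a repository name to the set of its commit SHAs.
    "Any shared SHA means duplicate" is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for name, shas in repo_shas.items():
        if seen.isdisjoint(shas):
            kept.append(name)
        # Record the SHAs either way so later clones of a clone are caught.
        seen.update(shas)
    return kept


# Hypothetical, shortened SHAs for illustration only.
repos = {
    "project-a": {"a1f3", "b72c", "c9d0"},
    "project-b": {"a1f3", "b72c", "e411"},  # registered clone of project-a
    "project-c": {"f00d", "beef"},
}
print(filter_duplicates(repos))
```

Because the first repository encountered is the one kept, the iteration order decides which copy survives; choosing the older or more active copy would need extra metadata.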

Page 53: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

Page 54: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(4) Calculate metrics for each repository

LOC, Total Commits, Total Authors, Age

Page 55: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

Page 56: Revisiting the Applicability of the Pareto Principle to Core Development Teams in Open Source Software Projects

Data Extraction

(5) Filter projects by metrics

4,618 repositories -> 2,496 repositories

Filter out repositories with fewer than 10 developers.

Filter out repositories with less than 1,000 LOC.
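
The metric filter can be written as a simple predicate. The two thresholds (10 developers, 1,000 LOC) are taken from the slide; the `Repo` record, its field names, and the function name `filter_by_metrics` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    # Hypothetical record holding the metrics computed in step (4).
    name: str
    developers: int
    loc: int


def filter_by_metrics(repos, min_developers=10, min_loc=1000):
    """Drop repositories with fewer developers or less code than the thresholds."""
    return [
        repo
        for repo in repos
        if repo.developers >= min_developers and repo.loc >= min_loc
    ]


# Hypothetical repositories for illustration only.
candidates = [
    Repo("kept", developers=12, loc=5000),
    Repo("too-few-devs", developers=9, loc=5000),
    Repo("too-small", developers=12, loc=999),
]
print([repo.name for repo in filter_by_metrics(candidates)])
```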