Situated learning among open source software developers
A Master Thesis Presentation
(Dartington Pottery Training Workshop, 1978)
Author:
Josef Hardi, European Master in Software Engineering
Supervisors:
Prof. Barbara Russo, Dr. Richard Torkar
Situated Learning in Open Source Software Developers:
The Case of Google Chrome Project
Thursday, August 4, 2011
Introduction
• Situated learning is learning that occurs in the workplace [Brown et al., 1989].
• There is no separation between ‘knowing’ and ‘doing’.
• Situated learning is primarily practiced within a community of practitioners.
Existing Findings
• Learning curve effect.
• “That the more times a task has been performed, the less time will be required on each subsequent iteration.” [T.P. Wright, 1936]
• [Huntley, 2003]: Mozilla is reported to exhibit a stronger learning curve than Apache.
• [Au et al., 2009]: Learning is universally present in OSS projects.
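Wright's observation is commonly written as a power law in the cumulative number of repetitions. The sketch below assumes that form, T(x) = a · x^b with b = log2(r), where a is the time of the first repetition and r is the learning rate (the fraction of time left each time cumulative output doubles). The names and the values of a and r are illustrative, not taken from the thesis.

```python
import math

def wright_time(x, first_time, learning_rate):
    """Time needed for the x-th repetition under Wright's power law (x >= 1)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    b = math.log2(learning_rate)  # negative whenever learning_rate < 1
    return first_time * x ** b

if __name__ == "__main__":
    # Illustrative values: the first repetition takes 10 hours, 80% learning rate.
    for x in (1, 2, 4, 8):
        print(x, round(wright_time(x, 10.0, 0.8), 3))
```

Each doubling of x multiplies the time by the learning rate, so the loop gives 10.0, 8.0, 6.4 and 5.12 hours for x = 1, 2, 4, 8.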
Distinctions in this Thesis
• Data are taken from each individual developer instead of from an aggregate of individuals.
• This gives more insight into individual characteristics.
• For example, knowledge depreciation and team roles are examined as factors that affect the learning process.
Research Question 1: Is learning present among OSS developers?
Hypothesis 1: There is a relation between accumulated experience and performance.
Hypothesis 2: Knowledge depreciates over time among OSS developers.
Hypothesis 3: Core developers resolve issues faster.
Research Question 2: What are the factors that affect learning?
Case Study
• Google Chrome project.
• Duration: 10 months, about 10 releases (December 2008 to October 2009).
Research Methodology
1. Data Collection: issue report data and review interaction data
2. Data Exploration: experience, performance, and team role
3. Construct Input Data
4. Identification of Learning Curve Models and Data Fitting
Research Methodology:
Data Collection
Issue Report Data (5,160 entries)
1. Unrelated project areas
2. Invalid issue status
3. Empty owner name
Issue Report = [ID, Type, Area, Status, Owner, Open date, Assigned date, Started date, Close date]
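The issue report record and the three criteria listed above can be sketched as a small data class plus a filter. This assumes the three items are exclusion rules applied to the 5,160 raw entries; the area and status values below are hypothetical placeholders, since the thesis's actual lists are not given.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class IssueReport:
    """One issue record with the fields listed on the slide."""
    issue_id: int
    issue_type: str
    area: str
    status: str
    owner: str
    open_date: date
    assigned_date: Optional[date]
    started_date: Optional[date]
    close_date: Optional[date]

# Hypothetical placeholder values: the actual area and status lists are not given.
RELEVANT_AREAS = {"AreaA", "AreaB"}
VALID_STATUSES = {"Fixed", "Verified"}

def keep(issue: IssueReport) -> bool:
    """Assumed reading of the three criteria as exclusion rules."""
    return (
        issue.area in RELEVANT_AREAS        # 1. drop unrelated project areas
        and issue.status in VALID_STATUSES  # 2. drop invalid issue statuses
        and issue.owner.strip() != ""       # 3. drop entries with an empty owner name
    )

def clean(issues):
    """Return only the issue reports that pass all three filters."""
    return [issue for issue in issues if keep(issue)]
```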
Research Methodology:
Data Collection
Review Interaction Data (12,037 entries)
Interaction = [Owner, Reviewer, Comment date]
"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
"sgk","tony",1226874776
"phajdan.jr","deanm",1227808551
"phajdan.jr","deanm",1227809341
"phajdan.jr","mark",1228496086
...
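The sample records above can be read with Python's standard csv module. The sketch assumes the comment date is a Unix timestamp in seconds (UTC) and counts comments per (owner, reviewer) pair, which gives weighted ties for an interaction network; the function names are illustrative.

```python
import csv
import io
from collections import Counter
from datetime import datetime, timezone

# The eight sample records shown on the slide.
SAMPLE = """\
"ben","sky",1226700214
"ben","sky",1226706864
"ben","pkasting",1226707765
"mal","tony",1226809276
"sgk","tony",1226874776
"phajdan.jr","deanm",1227808551
"phajdan.jr","deanm",1227809341
"phajdan.jr","mark",1228496086
"""

def parse_interactions(text):
    """Return (owner, reviewer, comment_time) tuples; timestamps assumed to be Unix seconds."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        owner, reviewer, stamp = record
        rows.append((owner, reviewer, datetime.fromtimestamp(int(stamp), tz=timezone.utc)))
    return rows

def pair_counts(rows):
    """Number of review comments per (owner, reviewer) pair, i.e. weighted ties."""
    return Counter((owner, reviewer) for owner, reviewer, _ in rows)

if __name__ == "__main__":
    rows = parse_interactions(SAMPLE)
    print(rows[0][2].isoformat())
    for pair, n in pair_counts(rows).most_common(2):
        print(pair, n)
```

Under the Unix-timestamp assumption, the first record decodes to 14 November 2008, 22:03:34 UTC.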
Research Methodology:
Data Exploration
(Figure: issue report data arranged as Developers × Releases matrices for Experience and Performance)
• Measure experience: number of resolved issues.
• Measure performance: average issue resolution time.
Sample = 274 developers
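The two measures can be sketched as a group-by over (developer, release). This assumes resolution time is the close date minus the open date (the slide does not say whether the open, assigned, or started date is used) and computes accumulated experience as a running total across releases; the function names and the toy records are hypothetical.

```python
from collections import defaultdict
from datetime import date

def explore(issues):
    """Per (owner, release): experience = number of resolved issues,
    performance = mean resolution time in days (close minus open date, assumed)."""
    counts = defaultdict(int)
    total_days = defaultdict(float)
    for owner, release, opened, closed in issues:
        key = (owner, release)
        counts[key] += 1
        total_days[key] += (closed - opened).days
    experience = dict(counts)
    performance = {key: total_days[key] / counts[key] for key in counts}
    return experience, performance

def accumulated_experience(experience, owner, releases):
    """Running total of resolved issues for one owner across ordered releases."""
    total, series = 0, []
    for release in releases:
        total += experience.get((owner, release), 0)
        series.append(total)
    return series

if __name__ == "__main__":
    # Toy records: (owner, release, open date, close date).
    toy = [
        ("ben", 1, date(2009, 1, 1), date(2009, 1, 5)),
        ("ben", 1, date(2009, 1, 2), date(2009, 1, 4)),
        ("ben", 2, date(2009, 2, 1), date(2009, 2, 11)),
    ]
    exp, perf = explore(toy)
    print(exp, perf)
    print(accumulated_experience(exp, "ben", [1, 2, 3]))
```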
Research Methodology:
Data Exploration
(Figure: review interaction data arranged as a Developers × Releases matrix for Team Role)
Estimate team role: core and periphery structure model [Borgatti, 1999]
Sample = 274 developers
• The core is a dense, cohesive structure; the periphery is a sparse, loosely connected structure.
• The estimation is performed with UCINET.
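The core/periphery idea can be illustrated with a small sketch. It is not UCINET's routine: it scores a candidate core by the Pearson correlation between the observed ties and one variant of the ideal pattern (core-core pairs tied, all other pairs untied), and it searches all candidate cores exhaustively, which is only feasible for tiny networks. Treating review interactions as undirected, binary ties is an assumption.

```python
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation, or None when either vector is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5

def core_periphery(nodes, edges):
    """Exhaustively find the core whose ideal pattern best matches the observed ties."""
    nodes = list(nodes)
    ties = {frozenset(edge) for edge in edges}  # undirected, binary ties
    pairs = list(combinations(nodes, 2))
    observed = [1 if frozenset(pair) in ties else 0 for pair in pairs]
    best_core, best_fit = None, None
    for size in range(2, len(nodes)):  # cores of size 0, 1 or n give a constant pattern
        for core in combinations(nodes, size):
            core_set = set(core)
            ideal = [1 if a in core_set and b in core_set else 0 for a, b in pairs]
            fit = pearson(observed, ideal)
            if fit is not None and (best_fit is None or fit > best_fit):
                best_core, best_fit = core_set, fit
    return best_core, best_fit

if __name__ == "__main__":
    # Toy network: 0, 1, 2 review each other; 3 and 4 each touch one of them.
    edges = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 4)]
    print(core_periphery(range(5), edges))
```

On the toy network, the densely connected trio {0, 1, 2} is returned as the core with a fit of about 0.65.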
Research Methodology:
Construct Input Data
274 developers: not all of them work on the project long-term.
Criterion: participation in at least 8 releases, which leaves 38 long-term contributors.
Refine: build new longitudinal data sets for these 38 contributors.
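The refinement step amounts to a threshold filter followed by a reshape into one series per developer. The sketch assumes a developer "participates" in a release when they appear in that release's data; the function names and toy identifiers are hypothetical.

```python
def long_term_contributors(activity, min_releases=8):
    """Developers who appear in at least `min_releases` distinct releases.

    activity maps a developer to the releases in which they appear in the data
    (the exact participation criterion is an assumption)."""
    return sorted(dev for dev, releases in activity.items()
                  if len(set(releases)) >= min_releases)

def longitudinal(table, developers, releases):
    """Reshape a {(developer, release): value} table into one series per developer;
    releases without data become None."""
    return {dev: [table.get((dev, r)) for r in releases] for dev in developers}

if __name__ == "__main__":
    activity = {"dev_a": range(1, 11), "dev_b": [1, 2, 3], "dev_c": range(1, 9)}
    selected = long_term_contributors(activity)
    print(selected)
    print(longitudinal({("dev_a", 1): 2, ("dev_c", 2): 5}, selected, [1, 2]))
```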
Input data set:
Performance
(Figure: the data distribution in the group of long-term developers; x-axis: releases; y-axis: average time of resolving issues, in log days)
Input data set:
Experience
(Figure: the data distribution in the group of long-term developers; x-axis: releases; y-axis: number of resolved issues, N)
Input data set:
Team Role
The team composition in the group of long-term developers, shown as the split between the two team roles (core and periphery) in each release:
R1: 46% / 54%
R2: 39% / 61%
R3: 39% / 61%
R4: 45% / 55%
R5: 53% / 47%
R6: 47% / 53%
R7: 47% / 53%
R8: 42% / 58%
R9: 42% / 58%
R10: 39% / 61%
Research Methodology:
Identification of Learning Curve Models and Data Fitting
Model 1:
Model 2:
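The transcript does not preserve the Model 1 and Model 2 equations. The KnowledgeStock and Lambda variables in the results suggest a knowledge-depreciation learning curve, and the sketch below assumes one common form of it: a knowledge stock K_t = λ·K_(t-1) + x_t built from per-release experience x_t, and log resolution time ln y_t = α + β·K_t. It fits a single series by grid search over λ, with ordinary least squares for α and β at each λ. The functional form, the estimation procedure, and the single-series setup are assumptions, not necessarily the thesis's models.

```python
def knowledge_stock(experience, lam):
    """K_t = lam * K_{t-1} + x_t, starting from K_0 = 0 (assumed)."""
    stock, k = [], 0.0
    for x in experience:
        k = lam * k + x
        stock.append(k)
    return stock

def ols(xs, ys):
    """Simple least squares y = alpha + beta * x; returns (alpha, beta, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = my - beta * mx
    sse = sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))
    return alpha, beta, sse

def fit_depreciation(experience, log_times, grid):
    """Grid search over lam, keeping the value with the smallest squared error."""
    best = None
    for lam in grid:
        alpha, beta, sse = ols(knowledge_stock(experience, lam), log_times)
        if best is None or sse < best[3]:
            best = (lam, alpha, beta, sse)
    return best

if __name__ == "__main__":
    # Synthetic series generated with lam = 0.9, alpha = 2.0, beta = -0.01.
    x = [10, 20, 15, 30, 25, 10, 20, 30, 15, 25]
    y = [2.0 - 0.01 * k for k in knowledge_stock(x, 0.9)]
    lam, alpha, beta, sse = fit_depreciation(x, y, [i / 100 for i in range(50, 100)])
    print(round(lam, 2), round(alpha, 3), round(beta, 4))
```

On the synthetic series, the search recovers the generating values (λ = 0.9, α = 2.0, β = -0.01); with real data the fit would be pooled across developers, which this sketch does not attempt.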
Result Summary
Hypothesis  Variable        Model 1   Model 2   Supported?
H1          KnowledgeStock  -0.01***  -0.01***  Yes
H2          Lambda          0.94***   0.94***   Yes
H3          TeamRole        NA        0.18      No
*** Statistically significant at p < 0.001
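If Lambda is read as the fraction of the knowledge stock carried over from one release to the next (an assumption, since the model equations are not preserved), the estimate λ = 0.94 can be turned into more tangible numbers: the share retained after n releases is λ^n, and the "half-life" is ln 0.5 / ln λ releases.

```python
import math

def retained(lam, releases):
    """Share of an initial knowledge stock left after `releases` periods."""
    return lam ** releases

def half_life(lam):
    """Number of periods until half of the knowledge stock remains (0 < lam < 1)."""
    if not 0 < lam < 1:
        raise ValueError("lam must lie strictly between 0 and 1")
    return math.log(0.5) / math.log(lam)

if __name__ == "__main__":
    lam = 0.94  # estimate reported for both models
    print(f"retained after 10 releases: {retained(lam, 10):.3f}")
    print(f"half-life: {half_life(lam):.1f} releases")
```

Under this reading, about 54% of the stock survives the ten releases studied, and half of it is lost after roughly 11 releases.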
Threats to Validity
Internal Validity
• The improvement in solving issues might be caused by improvements in the system design rather than by learning.
• Some of the issue data are incomplete.
Construct Validity
• The estimated core and periphery structure might not reflect the real situation. However, the communication pattern is the best available indicator.
External Validity
• Both models have very low statistical predictive power (less than 5%).
Conclusion
• I confirmed that learning is present among open source software developers.
• Knowledge does not depreciate substantially in the Google Chrome team (λ = 0.94).
• It is inconclusive whether core developers work faster than those in the periphery.
• Methodological contribution: a method to harvest and analyze data from code review.
Thank you!
Bolzano, 8 October 2010