Sattose talk
Transcript of Sattose talk
![Page 1: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/1.jpg)
Human Aspects of Software Engineering
Alexander Serebrenik
Eindhoven University of Technology, NL
![Page 2: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/2.jpg)
![Page 3: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/3.jpg)
1970s
…
1998 2008
2014
SocialCom, SocInfo, SBP, CHASE
MSR
books, conferences
![Page 4: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/4.jpg)
MS Windows post-release fault prediction

Precision = (predicted & correct) / predicted
Recall = (predicted & correct) / correct

| Predictor | Precision | Recall |
| --- | --- | --- |
| Code churn | 78.6% | 79.9% |
| Code complexity | 79.3% | 66.0% |
| Code coverage | 83.8% | 54.5% |
| Code dependencies | 74.4% | 69.9% |
| Organizational structure | 86.2% | 84.0% |
| Socio-technical network | 76.9% | 70.5% |
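
The two definitions above can be sketched in a few lines. This is a minimal illustration, not the setup of the Windows study: the module names are hypothetical and each prediction is simply treated as a set of modules flagged as fault-prone.

```python
def precision_recall(predicted, actual):
    """Precision and recall for a set of flagged items vs. the truly faulty ones."""
    predicted, actual = set(predicted), set(actual)
    hits = predicted & actual  # predicted & correct
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical module names, for illustration only
predicted_faulty = {"a.dll", "b.dll", "c.dll", "d.dll"}
actually_faulty = {"b.dll", "c.dll", "d.dll", "e.dll", "f.dll"}

p, r = precision_recall(predicted_faulty, actually_faulty)
print(f"precision={p:.2f} recall={r:.2f}")
```

Three of the four flagged modules are faulty (precision 3/4), and three of the five faulty modules are flagged (recall 3/5).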
![Page 5: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/5.jpg)
![Page 6: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/6.jpg)
![Page 7: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/7.jpg)
![Page 8: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/8.jpg)
MEN
• > 90% in WordPress & Drupal
• > 95% in FLOSS surveys
• > 87% in GNOME
• > 70% in software-related jobs (NSF)
![Page 9: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/9.jpg)
FLOSS 2013
YOUNG(?)
median
Is Programming Knowledge Related to Age? An Exploration of Stack Overflow. Morrison, P., Murphy-Hill, E. MSR 2013.
FLOSS 2013: A survey dataset about free software contributors: challenges for curating, sharing and combining. Robles, G., Arjona-Reina, L., Vasilescu, B., Serebrenik, A., Gonzalez-Barahona, J.M. MSR 2014.
![Page 10: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/10.jpg)
FLOSS 2013
![Page 11: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/11.jpg)
FLOSS 2013
![Page 12: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/12.jpg)
FLOSS 2013
Europe, US, CA, AU
Brazil/Argentina
![Page 13: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/13.jpg)
![Page 14: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/14.jpg)
![Page 15: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/15.jpg)
PAGE 15 08/07/14
.cpp .po
.jpg
/test/
/library/ .doc
makefile .sql .conf
![Page 16: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/16.jpg)
Occasional contributors
Frequent contributors
On the variation and specialization of workload: A case study of the Gnome ecosystem community. Vasilescu, B., Serebrenik, A., Goeminne, M., Mens, T. Empirical Software Engineering.
![Page 17: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/17.jpg)
For which of those activities would Joe Average be more active than Jane Average?
For which activities would no differences be observed?
![Page 18: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/18.jpg)
![Page 19: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/19.jpg)
![Page 20: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/20.jpg)
.cpp .po
.jpg
/test/
/library/ .doc
makefile .sql .conf
No differences in preferences for activities; differences in the number of commits
![Page 21: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/21.jpg)
sample
Engage for longer
Ask more questions
No diff in #answers
Women can contribute to SO
but choose not to!
Gender, representation and online participation: A quantitative study, Vasilescu, B., Capiluppi, A., and Serebrenik, A., Interacting with Computers
![Page 22: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/22.jpg)
sample
No significant differences in #questions, #answers, length of engagement
Disengagement of women is activity/platform specific!
![Page 23: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/23.jpg)
![Page 24: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/24.jpg)
• More active committers ask more on SO
• More active askers commit more on GitHub
StackOverflow and GitHub: Associations between software development and crowdsourced knowledge, Vasilescu, B., Filkov, V. and Serebrenik, A., In Social Computing, 2013, IEEE.
![Page 25: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/25.jpg)
StackOverflow and GitHub: Associations between software development and crowdsourced knowledge, Vasilescu, B., Filkov, V. and Serebrenik, A., In Social Computing, 2013, IEEE.
![Page 26: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/26.jpg)
• Contributors to both communities (more likely developers than non-developers) are more active than those who focus on just one.
How Social Q&A sites are changing knowledge sharing in open source software communities, Vasilescu, B., Serebrenik, A., Devanbu, P.T. and Filkov, V., In CSCW 2014, ACM.
help
![Page 27: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/27.jpg)
How Social Q&A sites are changing knowledge sharing in open source software communities, Vasilescu, B., Serebrenik, A., Devanbu, P.T. and Filkov, V., In CSCW 2014, ACM.
help
![Page 28: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/28.jpg)
![Page 29: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/29.jpg)
Tools and Techniques
![Page 31: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/31.jpg)
| Source of variation | Alias | Alias |
| --- | --- | --- |
| Ordering | Rajesh Sola | Sola Rajesh |
| Spelling: misspelling, diacritics, punctuation | Rene Engelhard | Fene Engelhard |
| | Démurget | Demurget |
| | J. A. M. Carneiro | J A M Carneiro |
| Middle initials, patronyms, nicknames, additional surnames, incomplete names | Daniel M. Mueth | Daniel Mueth |
| | Alexander Alexandrov Shopov | Alexander Shopov |
| | Carlos Garnacho Parro | Carlos Garnacho |
| | Jacob “Ulysses” Berkman | Jacob Berkman |
| | A S Alam | Amanpreet Singh Alam |
| Name variants: transliteration, diminutives | Γιωργοσ | Georgios |
| | Mike Gratton | Michael Gratton |
| Software-specific: usernames, projects, tooling artefacts | mrhappypants | Aaron Brown |
| | Arturo Tena/libole2 | Arturo Tena |
| | (16:06) Alex Roberts | Alex Roberts |
| Mix | Any combination of those | |
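
Some of these variations (ordering, diacritics, punctuation, tooling artefacts such as timestamps) can be removed by normalizing names before comparing them. The sketch below is only an illustration under that assumption; `normalize_name` is a hypothetical helper, and it does not address misspellings, nicknames, transliteration or usernames.

```python
import re
import unicodedata


def normalize_name(name):
    """Lower-case, strip diacritics, digits and punctuation, and sort the name parts."""
    # Decompose accented characters and drop the combining marks (é -> e)
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace punctuation, digits and underscores by spaces
    cleaned = re.sub(r"[^\w\s]|\d|_", " ", without_marks.lower())
    # Sorting the parts makes "Rajesh Sola" and "Sola Rajesh" compare equal
    return " ".join(sorted(cleaned.split()))


for left, right in [
    ("Rajesh Sola", "Sola Rajesh"),
    ("Démurget", "Demurget"),
    ("J. A. M. Carneiro", "J A M Carneiro"),
    ("(16:06) Alex Roberts", "Alex Roberts"),
    ("Rene Engelhard", "Fene Engelhard"),  # misspelling: not caught by normalization
]:
    print(left, "|", right, "|", normalize_name(left) == normalize_name(right))
```

Sorting the parts is a deliberate simplification: it handles reordering but also makes "Smith John" and "John Smith" indistinguishable, which may or may not be desirable.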
![Page 32: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/32.jpg)
Jane Smith
Jane Alice Smith, jsmith@...
J. Smith, jane.smith@...
Jane Smith, jane@...
identity merging / identity reconciliation / developer matching
![Page 33: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/33.jpg)
<Jane, [email protected]> <Jane Smith, [email protected]>
• identical mail addresses ⇒ the same person
• also useful for MD5 hashes in Stack Overflow
<Jane Smith, [email protected]> <Jane Smith, [email protected]>
• likely to be the same person, but might give false positives if these are different Janes…
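
A minimal sketch of these two simple rules, assuming aliases are `(name, email)` pairs. The addresses are hypothetical, and `merge_identities` with its `match_names` switch is an illustration, not the tooling used in the talk. Matching on names is the riskier rule, so it can be turned off.

```python
from collections import defaultdict


def merge_identities(aliases, match_names=True):
    """Group (name, email) aliases that share an e-mail address or, optionally, a full name."""
    parent = list(range(len(aliases)))

    def find(i):
        # Union-find lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for i, (name, email) in enumerate(aliases):
        keys = [("email", email.lower())]
        if match_names:
            keys.append(("name", name.lower()))
        for key in keys:
            if key in first_seen:
                parent[find(i)] = find(first_seen[key])  # same key: same person
            else:
                first_seen[key] = i

    groups = defaultdict(list)
    for i, alias in enumerate(aliases):
        groups[find(i)].append(alias)
    return list(groups.values())


# Hypothetical aliases, for illustration only
aliases = [
    ("Jane", "jane@example.org"),
    ("Jane Smith", "jane@example.org"),
    ("Jane Smith", "jsmith@example.com"),
    ("John Doe", "jdoe@example.org"),
]
for group in merge_identities(aliases):
    print(group)
```

With `match_names=False` only the first two aliases are merged, which avoids the "different Janes" false positive at the cost of missing the third alias.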
![Page 34: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/34.jpg)
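Latent Semantic Analysis
<John Doe, johnd@domainA> <John Joseph Doe, johnd@domainA>
johnd@domainA: {john, johnd, joseph, doe}

Document-term matrix

| Term | johnd@domainA | .. | .. | .. |
| --- | --- | --- | --- | --- |
| john | 1 | .. | .. | .. |
| johnd | 1 | .. | .. | .. |
| joseph | 1 | .. | .. | .. |
| jdoe | ? | .. | .. | .. |
| doe | 1 | .. | .. | .. |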
![Page 35: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/35.jpg)
Latent Semantic Analysis
<John Doe, johnd@domainA> <John Joseph Doe, johnd@domainA>
johnd@domainA: {john, johnd, joseph, doe}

Document-term matrix

| Term | johnd@domainA | .. | .. | .. |
| --- | --- | --- | --- | --- |
| john | 1 | .. | .. | .. |
| johnd | 1 | .. | .. | .. |
| joseph | 1 | .. | .. | .. |
| jdoe | 3/4 | .. | .. | .. |
| doe | 1 | .. | .. | .. |

max similarity(jdoe, {john, johnd, joseph, doe}) = similarity(jdoe, doe)
= 1 – Levenshtein(jdoe, doe) / max(length(jdoe), length(doe)) = 1 – 1/4 = 3/4
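
The 3/4 entry can be reproduced with a standard dynamic-programming Levenshtein distance. This is a sketch of the formula on the slide; the function names are chosen here for illustration.

```python
def levenshtein(a, b):
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """1 - Levenshtein(a, b) / max(length(a), length(b))."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def max_similarity(term, bag):
    """Best similarity between a term and any term in the bag."""
    return max(similarity(term, other) for other in bag)


print(max_similarity("jdoe", {"john", "johnd", "joseph", "doe"}))  # 0.75
```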
![Page 36: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/36.jpg)
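Latent Semantic Analysis

| Term | johnd@domainA | .. | .. | .. |
| --- | --- | --- | --- | --- |
| john | 1 | .. | .. | .. |
| johnd | 1 | .. | .. | .. |
| joseph | 1 | .. | .. | .. |
| jdoe | 3/4 | .. | .. | .. |
| doe | 1 | .. | .. | .. |

• Inverse document frequency
• (Singular value decomposition)
• Rank (noise) reduction
• Cosine between documents
• Merge similar documents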
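
The steps listed above could be sketched with NumPy as follows. This is an illustration under several assumptions, not the implementation behind the talk: documents are rows and terms are columns, the IDF variant is one common choice, the matrix values beyond the johnd@domainA row, the extra aliases and terms, and the 0.9 merge threshold are all hypothetical.

```python
import numpy as np


def lsa_document_similarity(doc_term, rank):
    """Cosine similarity between documents (rows) in a rank-reduced LSA space."""
    doc_term = np.asarray(doc_term, dtype=float)
    n_docs = doc_term.shape[0]
    # Inverse document frequency: down-weight terms that occur in many documents
    doc_freq = np.count_nonzero(doc_term, axis=0)
    idf = np.log(n_docs / np.maximum(doc_freq, 1)) + 1.0
    weighted = doc_term * idf
    # Singular value decomposition; keep only the `rank` largest singular values
    u, s, _ = np.linalg.svd(weighted, full_matrices=False)
    reduced = u[:, :rank] * s[:rank]
    # Cosine between the reduced document vectors
    norms = np.linalg.norm(reduced, axis=1, keepdims=True)
    unit = reduced / np.maximum(norms, 1e-12)
    return unit @ unit.T


# Columns: john, johnd, joseph, jdoe, doe, alice, smith (last two are hypothetical)
doc_term = [
    [1, 1.0, 1, 0.75, 1, 0, 0],  # johnd@domainA, values as on the slide
    [1, 0.8, 0, 1.00, 1, 0, 0],  # hypothetical second alias
    [0, 0.0, 0, 0.00, 0, 1, 1],  # hypothetical unrelated alias
]
sim = lsa_document_similarity(doc_term, rank=2)

# Merge candidates: document pairs whose cosine exceeds a (hypothetical) threshold
threshold = 0.9
candidates = [(i, j) for i in range(len(doc_term))
              for j in range(i + 1, len(doc_term)) if sim[i, j] >= threshold]
print(candidates)
```

Both the rank and the threshold are tuning knobs; too low a rank merges unrelated aliases, too high a rank keeps the noise the reduction is meant to remove.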
![Page 37: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/37.jpg)
http://uberpython.wordpress.com/2012/01/01/precision-recall-sensitivity-and-specificity/
Which technique is better?
![Page 38: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/38.jpg)
http://uberpython.wordpress.com/2012/01/01/precision-recall-sensitivity-and-specificity/
![Page 39: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/39.jpg)
Most contributors have only one “identity”: all techniques are good enough for the average case.
Advanced techniques are better in the presence of noise. Choose your weapons wisely!
![Page 40: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/40.jpg)
What is your gender?
![Page 41: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/41.jpg)
![Page 42: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/42.jpg)
![Page 43: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/43.jpg)
![Page 44: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/44.jpg)
Name + Location = Gender
![Page 45: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/45.jpg)
Lonzo ⇒ Alonzo
w35l3y ⇒ wesley
Name + Location = Gender
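
A rough sketch of the idea, with loud caveats: the digit-to-letter mapping and the tiny name table are hypothetical and only show why the location matters (the same first name can be typically male in one country and typically female in another). A real tool would need a large per-country name list, and nickname expansion such as Lonzo ⇒ Alonzo is not covered here.

```python
# Hypothetical mapping for "leet" spellings such as w35l3y
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})

# Hypothetical toy table: (first name, country code) -> gender
TOY_NAMES = {
    ("andrea", "IT"): "M",
    ("andrea", "DE"): "F",
    ("wesley", "US"): "M",
}


def deleet(username):
    """Undo common digit-for-letter substitutions in a username."""
    return username.lower().translate(LEET)


def infer_gender(first_name, country):
    """Look up (name, country); '?' means the gender could not be resolved."""
    return TOY_NAMES.get((deleet(first_name), country), "?")


print(deleet("w35l3y"))              # wesley
print(infer_gender("Andrea", "IT"))  # M
print(infer_gender("Andrea", "DE"))  # F
print(infer_gender("Andrea", "BR"))  # ?
```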
![Page 46: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/46.jpg)
<title>Ben Kamens</title> … <h1>We’re willing to be embarrassed about what we <em>haven’t</em> done…</h1>
Heuristics: title + first h1
Ben Kamens We’re willing to be embarrassed about what we haven’t done…
Stanford Named Entity Tagger
<PERSON>Ben Kamens</PERSON> We’re willing to be embarrassed about what we haven’t done…
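
The "title + first h1" heuristic can be sketched with Python's standard `html.parser`. The class and function names are made up for this illustration, and the named entity tagging step is not shown; the extracted text would be passed to a tagger afterwards.

```python
from html.parser import HTMLParser


class TitleAndFirstH1(HTMLParser):
    """Collect the text of <title> and of the first <h1>, ignoring nested markup."""

    def __init__(self):
        super().__init__()
        self.parts = {"title": [], "h1": []}
        self.current = None  # tag we are currently collecting text for
        self.done = set()    # tags that have already been collected once

    def handle_starttag(self, tag, attrs):
        if tag in self.parts and tag not in self.done and self.current is None:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.done.add(tag)
            self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.parts[self.current].append(data)


def candidate_text(html):
    """Return '<title text> <first h1 text>' as input for a named entity tagger."""
    parser = TitleAndFirstH1()
    parser.feed(html)
    parser.close()
    title = "".join(parser.parts["title"]).strip()
    h1 = "".join(parser.parts["h1"]).strip()
    return f"{title} {h1}".strip()


page = ("<html><head><title>Ben Kamens</title></head><body>"
        "<h1>We're willing to be embarrassed about what we <em>haven't</em> done...</h1>"
        "<h1>A later heading that is ignored</h1></body></html>")
print(candidate_text(page))
```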
![Page 47: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/47.jpg)
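Quality of gender resolution: Survey

| Self-identification | Inferred M | Inferred F | Inferred ? | Total |
| --- | --- | --- | --- | --- |
| M | 60 | 3 | 43 | 106 |
| F | 2 | 5 | 4 | 11 |

| Self-identification | Inferred M | Inferred F | Inferred ? | Total |
| --- | --- | --- | --- | --- |
| M | 90 | 3 | 13 | 106 |
| F | 2 | 9 | 0 | 11 |
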
+ avatars, other social media sites (manually)
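
The two survey tables can be summarized with a few ratios. The sketch below assumes that rows are self-identified genders and columns are inferred labels, and that the second table is the one obtained after adding avatars and other social media sites; the variable and metric names are chosen here for illustration.

```python
def resolution_quality(table):
    """table maps self-identified gender -> {inferred label: count}; '?' = unresolved."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(row.get(gender, 0) for gender, row in table.items())
    unresolved = sum(row.get("?", 0) for row in table.values())
    resolved = total - unresolved
    return {
        "resolved": resolved / total,
        "correct_overall": correct / total,
        "correct_among_resolved": correct / resolved,
    }


first_table = {"M": {"M": 60, "F": 3, "?": 43}, "F": {"M": 2, "F": 5, "?": 4}}
second_table = {"M": {"M": 90, "F": 3, "?": 13}, "F": {"M": 2, "F": 9, "?": 0}}

for label, table in [("first", first_table), ("second", second_table)]:
    quality = resolution_quality(table)
    print(label, {key: round(value, 3) for key, value in quality.items()})
```

In both tables the inferred gender is rarely wrong once a label is assigned (65 of 70 and 99 of 104 resolved respondents); the main difference is how many respondents remain unresolved (47 versus 13 of 117).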
![Page 48: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/48.jpg)
![Page 49: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/49.jpg)
![Page 50: Sattose talk](https://reader034.fdocuments.in/reader034/viewer/2022052505/555a385dd8b42a83368b497d/html5/thumbnails/50.jpg)
SANER 2015
22nd International Conference on Software Analysis, Evolution and Reengineering
• Abstracts: November 7, 2015
• Papers: November 14, 2015