Sattose talk

Human Aspects of Software Engineering. Alexander Serebrenik, Eindhoven University of Technology, NL

description

Software engineering is inherently a collaborative venture, involving many stakeholders who coordinate their efforts to produce large software systems. While the importance of human aspects in software engineering was recognised as early as the 1970s, the emergence of open source software (late 1990s) and of platforms such as Stack Overflow and GitHub (late 2000s) enabled the application of empirical methods to the study of these aspects. In the first part of the talk we present a selection of recent results pertaining to two main questions: who software developers are, and what kinds of activities they engage in. The second part of the talk focuses on the tools and techniques that have been used to obtain these results.

Transcript of Sattose talk

Page 1: Sattose talk

Human Aspects of Software Engineering

Alexander Serebrenik
Eindhoven University of Technology, NL

Page 2: Sattose talk
Page 3: Sattose talk

1970s  

…  

1998  2008  

2014  

SocialCom,  SocInfo,  SBP,  CHASE  

MSR  

books,  conferences  

Page 4: Sattose talk

MS Windows post-release fault prediction

Precision = predicted & correct / predicted
Recall    = predicted & correct / correct

Predictors                 Precision   Recall
Code churn                 78.6%       79.9%
Code complexity            79.3%       66.0%
Code coverage              83.8%       54.5%
Code dependencies          74.4%       69.9%
Organizational structure   86.2%       84.0%
Socio-technical network    76.9%       70.5%
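The precision and recall definitions on this slide can be computed directly from two sets: the items predicted as faulty and the items that really were faulty. The following is a minimal sketch; the module names are made up for illustration and do not come from the study.

```python
def precision_recall(predicted, correct):
    """Return (precision, recall) for two collections of item identifiers.

    precision = |predicted & correct| / |predicted|
    recall    = |predicted & correct| / |correct|
    """
    predicted, correct = set(predicted), set(correct)
    hits = len(predicted & correct)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Hypothetical module names, for illustration only.
predicted_faulty = {"kernel.dll", "net.dll", "ui.dll", "gfx.dll"}
actually_faulty = {"kernel.dll", "net.dll", "audio.dll"}

p, r = precision_recall(predicted_faulty, actually_faulty)
print(f"precision={p:.2f} recall={r:.2f}")
```

A predictor with high precision but low recall (such as code coverage in the table) rarely raises false alarms but misses many faulty items.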

Page 5: Sattose talk
Page 6: Sattose talk
Page 7: Sattose talk
Page 8: Sattose talk

> 90% in WordPress & Drupal
> 95% in FLOSS surveys
> 87% in GNOME
> 70% in software-related jobs (NSF)

MEN

Page 9: Sattose talk

FLOSS  2013  

YOUNG(?)  

median  

Is Programming Knowledge Related to Age? An Exploration of Stack Overflow. Morrison, P., Murphy-Hill, E. MSR 2013.

FLOSS 2013: A survey dataset about free software contributors: challenges for curating, sharing and combining. Robles, G., Arjona-Reina, L., Vasilescu, B., Serebrenik, A., Gonzalez-Barahona, J.M. MSR 2014.

Page 10: Sattose talk

FLOSS  2013  

Page 11: Sattose talk

FLOSS  2013  

Page 12: Sattose talk

FLOSS  2013  

Europe, US, CA, AU
Brazil/Argentina

Page 13: Sattose talk
Page 14: Sattose talk
Page 15: Sattose talk

PAGE 15 08/07/14

.cpp .po

.jpg

/test/

/library/ .doc

makefile .sql .conf
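The patterns on this slide suggest that files touched by a contributor are mapped to activity types by extension or path. A minimal sketch of such a mapping follows; the patterns are the ones shown on the slide, while the activity labels, the rule order (directory rules before extension rules) and the function names are assumptions made for illustration.

```python
import re

# Pattern -> activity label. The patterns come from the slide; the labels
# and the rule order are assumptions added for illustration.
RULES = [
    (re.compile(r"/test/"), "test"),
    (re.compile(r"/library/"), "library"),
    (re.compile(r"\.po$"), "translation"),
    (re.compile(r"\.jpg$"), "image"),
    (re.compile(r"\.doc$"), "documentation"),
    (re.compile(r"(^|/)makefile$"), "build"),
    (re.compile(r"\.sql$"), "database"),
    (re.compile(r"\.conf$"), "configuration"),
    (re.compile(r"\.cpp$"), "code"),
]


def classify(path):
    """Return the first matching activity label, or 'unknown'."""
    lowered = path.lower()
    for pattern, label in RULES:
        if pattern.search(lowered):
            return label
    return "unknown"


def workload(paths):
    """Count touched files per activity type for one contributor."""
    counts = {}
    for path in paths:
        label = classify(path)
        counts[label] = counts.get(label, 0) + 1
    return counts


print(workload(["src/main.cpp", "po/nl.po", "src/test/util.cpp", "Makefile"]))
```

Counting per activity type in this way gives one workload profile per contributor, which is what allows occasional and frequent contributors to be compared.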

Page 16: Sattose talk

Occasional contributors

Frequent contributors

On the variation and specialization of workload: a case study of the Gnome ecosystem community. Vasilescu, B., Serebrenik, A., Goeminne, M., Mens, T. Empirical Software Engineering.

Page 17: Sattose talk

For which of those activities would Joe Average be more active than Jane Average?

For which activities would no differences be observed?

Page 18: Sattose talk


Page 19: Sattose talk


Page 20: Sattose talk


.cpp .po

.jpg

/test/

/library/ .doc

makefile .sql .conf

No differences in preferences for activities; differences in the number of commits

Page 21: Sattose talk

sample

Engage for longer

Ask more questions

No diff in #answers

Women can contribute to SO

but choose not to!

Gender, representation and online participation: A quantitative study, Vasilescu, B., Capiluppi, A., and Serebrenik, A., Interacting with Computers

Page 22: Sattose talk

sample

No significant differences in #questions, #answers, length of engagement

Disengagement of women is activity/platform specific!

Page 23: Sattose talk
Page 24: Sattose talk

• More active committers ask more on SO
• More active askers commit more on GitHub

StackOverflow and GitHub: Associations between software development and crowdsourced knowledge, Vasilescu, B., Filkov, V. and Serebrenik, A., In Social Computing, 2013, IEEE.

Page 25: Sattose talk

StackOverflow and GitHub: Associations between software development and crowdsourced knowledge, Vasilescu, B., Filkov, V. and Serebrenik, A., In Social Computing, 2013, IEEE.

Page 26: Sattose talk

• Contributors to both communities (more likely developers than non-developers) are more active than those who focus on just one.

How Social Q&A sites are changing knowledge sharing in open source software communities, Vasilescu, B., Serebrenik, A., Devanbu, P.T. and Filkov, V., In CSCW 2014, ACM.

help  

Page 27: Sattose talk

How Social Q&A sites are changing knowledge sharing in open source software communities, Vasilescu, B., Serebrenik, A., Devanbu, P.T. and Filkov, V., In CSCW 2014, ACM.

help  

Page 28: Sattose talk
Page 29: Sattose talk

Tools  and  Techniques  

Page 30: Sattose talk


Euphegenia Doubtfire, [email protected]

Robin Williams, [email protected]

Page 31: Sattose talk


Ordering
    Rajesh Sola                 |  Sola Rajesh

Spelling: misspelling, diacritics, punctuation
    Rene Engelhard              |  Fene Engelhard
    Démurget                    |  Demurget
    J. A. M. Carneiro           |  J A M Carneiro

Middle initials, patronyms, nicknames, additional surnames, incomplete names
    Daniel M. Mueth             |  Daniel Mueth
    Alexander Alexandrov Shopov |  Alexander Shopov
    Carlos Garnacho Parro       |  Carlos Garnacho
    Jacob “Ulysses” Berkman     |  Jacob Berkman
    A S Alam                    |  Amanpreet Singh Alam

Name variants: transliteration, diminutives
    Γιωργοσ                     |  Georgios
    Mike Gratton                |  Michael Gratton

Software-specific: usernames, projects, tooling artefacts
    mrhappypants                |  Aaron Brown
    Arturo Tena/libole2         |  Arturo Tena
    (16:06) Alex Roberts        |  Alex Roberts

Mix
    Any combination of those
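Some of the variation listed on this slide (word order, diacritics, punctuation) can be removed by normalising names before comparing them. The sketch below is one possible normalisation, not the one used in the talk; the function name `normalise` is hypothetical. It does not handle misspellings, nicknames or transliteration, which need approximate matching.

```python
import re
import unicodedata


def normalise(name):
    """Reduce a name to a canonical key: no diacritics, no punctuation,
    lower case, tokens sorted so that word order does not matter."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace every character that is neither a word character nor
    # whitespace by a space.
    cleaned = re.sub(r"[^\w\s]", " ", stripped.lower())
    return " ".join(sorted(cleaned.split()))


pairs = [
    ("Rajesh Sola", "Sola Rajesh"),
    ("Démurget", "Demurget"),
    ("J. A. M. Carneiro", "J A M Carneiro"),
    ("Rene Engelhard", "Fene Engelhard"),  # misspelling: not caught here
]
for left, right in pairs:
    print(left, "|", right, "->", normalise(left) == normalise(right))
```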

Page 32: Sattose talk

Jane  Smith  

Jane  Alice  Smith,  

jsmith@...  

J.  Smith,  jane.smith@...  

Jane  Smith,  jane@...  

identity merging / identity reconciliation / developer matching

Page 33: Sattose talk

<Jane,  [email protected]>    <Jane  Smith,  [email protected]>    

• identical mail addresses ⇒ the same person
• also useful for MD5 hashes in Stack Overflow

<Jane  Smith,  [email protected]>  <Jane  Smith,  [email protected]>  

• likely to be the same person, but might give false positives if these are different Janes…
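The two simple rules on this slide can be sketched as follows: aliases with an identical address are merged, and aliases with an identical name but different addresses are only reported as candidates. The function names and the example.org addresses are placeholders, and the lower-casing and trimming applied before hashing are an assumption about how the published hashes were produced.

```python
import hashlib
from collections import defaultdict


def merge_by_email(aliases):
    """Group (name, email) aliases that share an identical e-mail address."""
    groups = defaultdict(set)
    for name, email in aliases:
        groups[email.strip().lower()].add(name)
    return dict(groups)


def same_name_candidates(aliases):
    """Return names that occur with more than one address.

    These are likely, but not certain, to belong to one person."""
    addresses = defaultdict(set)
    for name, email in aliases:
        addresses[name.strip().lower()].add(email.strip().lower())
    return {name: mails for name, mails in addresses.items() if len(mails) > 1}


def email_md5(email):
    """MD5 hex digest of a normalised address, for matching against
    data sets that publish only hashed addresses."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


# Hypothetical aliases; the addresses are placeholders.
aliases = [
    ("Jane", "jane@example.org"),
    ("Jane Smith", "jane@example.org"),
    ("Jane Smith", "jsmith@example.com"),
]
print(merge_by_email(aliases))
print(same_name_candidates(aliases))
```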

Page 34: Sattose talk

Latent Semantic Analysis

<John Doe, johnd@domainA>
<John Joseph Doe, johnd@domainA>

johnd@domainA: {john, johnd, joseph, doe}

Document-term matrix

          johnd@domainA   ..   ..   ..
john      1               ..   ..   ..
johnd     1               ..   ..   ..
joseph    1               ..   ..   ..
jdoe      ?               ..   ..   ..
doe       1               ..   ..   ..

Page 35: Sattose talk

Latent Semantic Analysis

<John Doe, johnd@domainA>
<John Joseph Doe, johnd@domainA>

johnd@domainA: {john, johnd, joseph, doe}

Document-term matrix

          johnd@domainA   ..   ..   ..
john      1               ..   ..   ..
johnd     1               ..   ..   ..
joseph    1               ..   ..   ..
jdoe      3/4             ..   ..   ..
doe       1               ..   ..   ..

max similarity(jdoe, {john, johnd, joseph, doe})
    = similarity(jdoe, doe)
    = 1 - Levenshtein(jdoe, doe) / max(length(jdoe), length(doe))
    = 1 - 1/4
    = 3/4
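The similarity used to fill the jdoe cell can be reproduced with a plain dynamic-programming Levenshtein distance. This is a minimal sketch of the formula on the slide; the function names are chosen here for illustration.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute or match
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """1 - Levenshtein(a, b) / max(length(a), length(b))."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def max_similarity(term, terms):
    """Highest similarity between `term` and any element of `terms`."""
    return max(similarity(term, other) for other in terms)


print(max_similarity("jdoe", ["john", "johnd", "joseph", "doe"]))  # 0.75
```

The maximum is reached for doe: one deletion turns jdoe into doe, and the longer string has four characters.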

Page 36: Sattose talk

Latent Semantic Analysis

Document-term matrix

          johnd@domainA   ..   ..   ..
john      1               ..   ..   ..
johnd     1               ..   ..   ..
joseph    1               ..   ..   ..
jdoe      3/4             ..   ..   ..
doe       1               ..   ..   ..

• Inverse document frequency
• (Singular value decomposition)
• Rank (noise) reduction
• Cosine between documents
• Merge similar documents
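Part of the pipeline listed on this slide can be sketched with the standard library alone: inverse document frequency weighting, cosine between documents, and selection of pairs to merge. The singular value decomposition and rank reduction steps are deliberately left out here, so this is not full Latent Semantic Analysis. Apart from the johnd@domainA column, the documents, weights and threshold are hypothetical.

```python
import math
from itertools import combinations


def idf_weight(docs):
    """Multiply each term weight by log(N / document frequency)."""
    n = len(docs)
    df = {}
    for weights in docs.values():
        for term in weights:
            df[term] = df.get(term, 0) + 1
    return {
        doc: {t: w * math.log(n / df[t]) for t, w in weights.items()}
        for doc, weights in docs.items()
    }


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def merge_candidates(docs, threshold):
    """Return document pairs whose weighted cosine reaches the threshold."""
    weighted = idf_weight(docs)
    return [
        (a, b)
        for a, b in combinations(sorted(weighted), 2)
        if cosine(weighted[a], weighted[b]) >= threshold
    ]


# Columns of a document-term matrix as {document: {term: weight}}.
# Only the first column follows the slide; the rest is made up.
docs = {
    "johnd@domainA": {"john": 1, "johnd": 1, "joseph": 1, "jdoe": 0.75, "doe": 1},
    "jdoe@domainB": {"john": 1, "jdoe": 1, "doe": 1, "johnd": 0.8},
    "asmith@domainC": {"alice": 1, "smith": 1, "asmith": 1},
}
print(merge_candidates(docs, threshold=0.5))
```

Note that a term occurring in every document gets weight zero under this weighting, which is the intended effect: very common name parts should not drive a merge.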

Page 37: Sattose talk

http://uberpython.wordpress.com/2012/01/01/precision-recall-sensitivity-and-specificity/

Which technique is better?

Page 38: Sattose talk

http://uberpython.wordpress.com/2012/01/01/precision-recall-sensitivity-and-specificity/

Page 39: Sattose talk

Most contributors have only one “identity”: all techniques are good enough for the average case.

Advanced techniques are better in the presence of noise.

Choose your weapons wisely!

Page 40: Sattose talk

/  SET  /  W&I   PAGE  40   08/07/14  

What  is  your  gender?  

Page 41: Sattose talk
Page 42: Sattose talk
Page 43: Sattose talk
Page 44: Sattose talk

Name + Location = Gender

Page 45: Sattose talk

Lonzo ⇒ Alonzo
w35l3y ⇒ wesley

Name + Location = Gender
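The two rewrites on this slide (expanding a nickname, undoing digit-for-letter substitutions) can be sketched as a small cleaning step applied before a name is looked up. The substitution table and the nickname table below are assumptions; only the two examples themselves come from the slide.

```python
# Digit-to-letter substitutions; this table is an assumption, chosen so that
# the slide's example "w35l3y" decodes to "wesley".
LEET = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})

# Hypothetical nickname table; only the Lonzo entry comes from the slide.
NICKNAMES = {"lonzo": "alonzo"}


def clean_first_name(raw):
    """Lower-case a name, undo leet substitutions, expand known nicknames."""
    decoded = raw.strip().lower().translate(LEET)
    return NICKNAMES.get(decoded, decoded)


for raw in ["Lonzo", "w35l3y"]:
    print(raw, "=>", clean_first_name(raw))
```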

Page 46: Sattose talk

<title>Ben Kamens</title> … <h1>We&#8217;re willing to be embarrassed about what we <em>haven&#8217;t</em> done&#8230;</h1>

Heuristics: title + first h1

Ben Kamens We’re willing to be embarrassed about what we haven’t done…

<PERSON>Ben Kamens</PERSON> We’re willing to be embarrassed about what we haven’t done…

Stanford Named Entity Tagger
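The "title + first h1" heuristic can be sketched with Python's standard `html.parser`. The class and function names are hypothetical, and the named entity tagging step is not reproduced here: the sketch only produces the plain text that would be handed to the tagger.

```python
from html.parser import HTMLParser


class TitleAndH1(HTMLParser):
    """Collect the text of <title> and of the first <h1> element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = {"title": [], "h1": []}
        self.current = None   # tag currently being collected
        self.done = set()     # tags that were already collected once

    def handle_starttag(self, tag, attrs):
        if tag in self.parts and tag not in self.done and self.current is None:
            self.current = tag

    def handle_endtag(self, tag):
        if tag == self.current:
            self.done.add(tag)
            self.current = None

    def handle_data(self, data):
        # Nested tags such as <em> do not interrupt collection.
        if self.current is not None:
            self.parts[self.current].append(data)


def squash(chunks):
    """Join text chunks and collapse runs of whitespace."""
    return " ".join("".join(chunks).split())


def title_and_first_h1(html):
    parser = TitleAndH1()
    parser.feed(html)
    parser.close()
    return squash(parser.parts["title"]), squash(parser.parts["h1"])


page = ("<title>Ben Kamens</title>"
        "<h1>We&#8217;re willing to be embarrassed about what we "
        "<em>haven&#8217;t</em> done&#8230;</h1>")
print(title_and_first_h1(page))
```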

Page 47: Sattose talk

Quality of gender resolution: Survey

                        As inferred
Self-identification     M     F     ?     Total
M                       60    3     43    106
F                       2     5     4     11

                        As inferred
Self-identification     M     F     ?     Total
M                       90    3     13    106
F                       2     9     0     11

+ avatars, other social media sites (manually)
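The two tables can be summarised by how many respondents received an inferred gender at all and how often that inference matched the self-identification. The sketch below uses the counts from the slide; the measure names (coverage, accuracy when resolved) and the variable names are chosen here and are not taken from the talk.

```python
def resolution_quality(table):
    """Summarise a {self_identified: {inferred: count}} table.

    Returns (coverage, accuracy_when_resolved):
    coverage = share of respondents with an inferred gender ('M' or 'F'),
    accuracy_when_resolved = share of those that were inferred correctly.
    """
    total = sum(sum(row.values()) for row in table.values())
    resolved = sum(n for row in table.values()
                   for inferred, n in row.items() if inferred != "?")
    correct = sum(row.get(actual, 0) for actual, row in table.items())
    return resolved / total, correct / resolved


# Counts copied from the two tables on the slide.
first_table = {"M": {"M": 60, "F": 3, "?": 43}, "F": {"M": 2, "F": 5, "?": 4}}
second_table = {"M": {"M": 90, "F": 3, "?": 13}, "F": {"M": 2, "F": 9, "?": 0}}

for label, table in [("first", first_table), ("second", second_table)]:
    coverage, accuracy = resolution_quality(table)
    print(f"{label}: coverage={coverage:.3f} accuracy={accuracy:.3f}")
```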

Page 48: Sattose talk
Page 49: Sattose talk
Page 50: Sattose talk

SANER 2015
22nd International Conference on Software Analysis, Evolution and Reengineering

• Abstracts: November 7, 2015
• Papers: November 14, 2015