Dynamic Social Network Analysis (and more!) with eResearch Tools

Dynamic Social Network Analysis (and more!) with eResearch Tools Andrea Wiggins iSchool @ Syracuse University 21 July, 2008


A presentation for the OSS Watch Expert Workshop on Profiling Communities, demonstrating eResearch methodology applied to replicating research on open source software development.


Page 1: Dynamic Social Network Analysis (and more!) with eResearch Tools

Dynamic Social Network Analysis (and more!) with eResearch Tools

Andrea Wiggins

iSchool @ Syracuse University

21 July, 2008

Page 2

eResearch for FLOSS

• An approach to research using cyberinfrastructure

• Collaborative and transparent, like FLOSS

• Large-scale shared data sets

– FLOSSmole

– Notre Dame SourceForge dumps

– CVSanalY

– Etc…

• Uses tools and analyses that allow sharing among researchers to support open science ideals

– Taverna Workbench

– MyExperiment.org

Page 3

Using Taverna

• Scientific analysis workflow tool

– Open source development led by myGrid team

– Target users are UK life sciences community

• Create analysis workflows by connecting modular components through input/output ports

– Produces (rigorous) analyses that are replicable, self-documenting, and easy to share

– Components include WSDL SOAP web services, Beanshell, RShell, and local Java shims

• Collaboratively developing our workflows

Page 4

Replicating FLOSS Research as eResearch

• Replicating a selection of FLOSS papers and presentations, currently in progress

• Demonstrating utility and viability of eResearch approaches for FLOSS and social science

• Building reusable, customizable analysis components specific to FLOSS research, e.g. for data selection, sociomatrix generation for SNA, etc.

• Extending the original research analysis by parameterization (inputs, thresholds) and implementing “future work” suggestions of authors (plus our own ideas, of course)
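As an illustration of what a reusable sociomatrix-generation component might do, here is a minimal Python sketch. It assumes each message is a record with hypothetical `id`, `sender`, and `reply_to` fields; the actual Taverna components (Beanshell, RShell, or Java) and their data formats are not described in these slides.

```python
def build_sociomatrix(messages):
    """Build a directed reply-count matrix from threaded messages.

    messages: list of dicts with hypothetical keys 'id', 'sender' and
    'reply_to' (the id of the parent message, or None for a thread starter).
    Returns (actors, matrix), where matrix[i][j] counts replies sent by
    actors[i] to messages written by actors[j].
    """
    sender_of = {m["id"]: m["sender"] for m in messages}
    actors = sorted({m["sender"] for m in messages})
    index = {actor: i for i, actor in enumerate(actors)}
    matrix = [[0] * len(actors) for _ in actors]

    for m in messages:
        parent = m.get("reply_to")
        if parent is None or parent not in sender_of:
            continue  # thread starter, or parent missing from the data
        src, dst = m["sender"], sender_of[parent]
        if src != dst:  # ignore replies to oneself
            matrix[index[src]][index[dst]] += 1
    return actors, matrix
```

Treating a reply as a directed tie from replier to original poster is one common choice for threaded venues; other tie definitions (for example, linking everyone in the same thread) would give a different matrix.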

Page 5

Social dynamics of FLOSS communications

• Replication of Howison, Inoue & Crowston, 2006

– Compute dynamic network centrality of projects from trackers for 120 projects

• Extension

– Added exponentially-decayed edge weighting function (needs sensitivity testing)

– Made sliding window adjustable

– Can apply to any threaded communication venue for which data is available

– Completed: all venues for 2 projects; queued: 216 projects with 635 venues!
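The two extensions above (decayed edge weights and an adjustable sliding window) can be sketched roughly as follows. This is an assumption-laden Python illustration, not the workflow's actual code: the half-life parameterization, the default window length, and the use of Freeman degree centralization on the binarized network are all choices made for the example.

```python
import math


def decayed_edge_weights(events, window_end, window_days=90.0, half_life_days=30.0):
    """Sum exponentially decayed interaction weights inside one window.

    events: iterable of (sender, recipient, day) tuples, day as a number.
    Only events with 0 <= window_end - day <= window_days are counted.
    window_days and half_life_days are illustrative defaults, not values
    taken from the slides.
    """
    decay = math.log(2) / half_life_days
    weights = {}
    for sender, recipient, day in events:
        age = window_end - day
        if sender == recipient or age < 0 or age > window_days:
            continue
        pair = tuple(sorted((sender, recipient)))  # undirected tie
        weights[pair] = weights.get(pair, 0.0) + math.exp(-decay * age)
    return weights


def degree_centralization(weights):
    """Freeman degree centralization of the binarized, undirected network."""
    degree = {}
    for a, b in weights:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    n = len(degree)
    if n < 3:
        return 0.0  # not meaningful for fewer than three actors
    top = max(degree.values())
    return sum(top - d for d in degree.values()) / ((n - 1) * (n - 2))
```

Calling `decayed_edge_weights` repeatedly with an advancing `window_end` gives one network snapshot per step; computing `degree_centralization` on each snapshot yields a centralization time series. Sensitivity testing would then amount to re-running with different `window_days` and `half_life_days` values.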

Page 6

Workflow for Dynamic SNA

Page 7

Dynamic SNA Across FLOSS Communication Channels

• Clearly a lot of variation across channels (user, developer & trackers), no easily observed patterns except overall trend toward decentralization

• Implications: carefully match theoretical constructs to data sampling, as different venues are very likely to yield different results, which significantly impacts interpretations

Page 8

“Do the Rich Get Richer?”

• Replication of OSCon 2004 presentation by Conklin

– Demonstrate scale-free distribution of developers among projects

• Almost there

– A little more analysis to replicate

• Hoping to extend to dynamic analysis of preferential attachment

– Showing change to project sizes over time

– Comparing evolution and growth across repositories

Page 9

Workflow for Rich Get Richer

Page 10

Scale-free Developer Distribution in FLOSS

• Using a single FLOSSmole summary statistic

• Very simple workflow, can expand analysis considerably

• Analysis of over 65K projects completes in under 3 minutes!
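A rough check for a scale-free developer distribution can be sketched in Python as below. It assumes the input is simply a list of developer counts, one per project (a hypothetical stand-in for the FLOSSmole summary statistic), and it uses an ordinary least-squares fit on log-log axes, which is a quick visual-style check rather than a rigorous power-law test.

```python
import math
from collections import Counter


def size_distribution(developers_per_project):
    """Return sorted (team_size, number_of_projects) pairs, skipping size 0."""
    return sorted(Counter(n for n in developers_per_project if n > 0).items())


def loglog_slope(distribution):
    """Least-squares slope of log(project count) against log(team size)."""
    xs = [math.log(size) for size, _ in distribution]
    ys = [math.log(count) for _, count in distribution]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct team sizes")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

A roughly straight line with a negative slope on log-log axes is consistent with a heavy-tailed distribution, where most projects have very few developers and a small number have many.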

Page 11

“Identifying success and tragedy of FLOSS Commons”

• Replication of English & Schweik, 2007

– Classification of project success by stage of growth for 110K projects as of August 2006

– Requires data from 2 repositories, FLOSSmole & ND

• Extension

– Parameterized all thresholds, making sensitivity analysis possible

– Added 2 additional options for a criterion test, one suggested by authors in article

– Limitation: slightly less available data in FLOSSmole, 94K projects as of April 2005
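To show what parameterizing all thresholds can look like in practice, here is a hedged Python sketch. The threshold values, the rule ordering, and the two-letter labels for the initiation stage are placeholders invented for illustration; they are not the criteria from English & Schweik, and the real workflow's inputs are not shown in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All defaults are hypothetical placeholders, not published criteria.
    min_releases: int = 3
    min_downloads: int = 1
    max_days_inactive: int = 365
    min_release_span_days: int = 180


def classify(releases, downloads, days_inactive, release_span_days, t=None):
    """Return a stage/outcome label for one project.

    SG = success in growth, TG = tragedy in growth, IG = indeterminate
    in growth, TI = tragedy in initiation, II = indeterminate in initiation.
    """
    t = t or Thresholds()
    abandoned = days_inactive > t.max_days_inactive
    if releases == 0:  # no release yet: initiation stage
        return "TI" if abandoned else "II"
    enough_releases = releases >= t.min_releases
    ok_release_rate = release_span_days >= t.min_release_span_days
    if enough_releases and ok_release_rate and downloads >= t.min_downloads:
        return "SG"
    return "TG" if abandoned else "IG"
```

Because every cutoff lives in `Thresholds`, a sensitivity analysis is just a loop that re-classifies the same projects with different threshold objects, for example `classify(5, 100, 10, 400, Thresholds(min_releases=6))`, and compares the resulting class proportions.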

Page 12

Workflow for Success-Abandonment Classification

Page 13

Classifying FLOSS Projects

• Very complex data requirements; meshing across repositories

• Difficult to scale and resource intensive

• Already using this workflow for project sampling

• For small (non-random) sample of 54 projects:

– 64% growth, 17% initiation, 19% null (i.e. missing data)

• Indeterminate Growth: 18.9%

• Success Growth: 39.6%

• Tragedy Growth: 7.5%

• Other: 34%

amsn,downloaded,growth,enough.releases,active,ok.release.rate,true,SG
anjuta,downloaded,growth,enough.releases,active,ok.release.rate,false,SG
anon,downloaded,growth,enough.releases,inactive,fast.release.rate,false,TG
etc…
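The sample output above appears to be comma-separated, with the project name first and a class label (SG, TG) last. Under that assumption, class percentages like those on this slide could be tallied with a short Python sketch such as the following; the column interpretation is a guess from the three rows shown, and the function name is hypothetical.

```python
from collections import Counter

# The three rows shown on the slide; the remaining rows are elided there.
SAMPLE_ROWS = """\
amsn,downloaded,growth,enough.releases,active,ok.release.rate,true,SG
anjuta,downloaded,growth,enough.releases,active,ok.release.rate,false,SG
anon,downloaded,growth,enough.releases,inactive,fast.release.rate,false,TG
"""


def tally_classes(text):
    """Return {label: percentage}, taking the last field as the class label."""
    labels = [line.split(",")[-1].strip()
              for line in text.splitlines() if line.strip()]
    total = len(labels)
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in Counter(labels).items()}
```

Running `tally_classes(SAMPLE_ROWS)` covers only the three visible rows, so its percentages will not match the 54-project figures reported above.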

Page 14

Future Directions

• Replication of “Evolution & Growth in Large Libre Software Projects” by Robles et al., 2005

• Prototyping OWL ontology of FLOSS communication data, already in use with RDF & SPARQL

• Cross-linking data, analyses, and papers

• Increasing scale of analyses to thousands of projects

• Extending analyses, sensitivity testing to strengthen findings

• Building reusable analysis components to share, enabling cumulative research

Page 15

Thanks!

• More at floss.syr.edu/publications/