Dynamic Social Network Analysis (and more!) with eResearch Tools
Andrea Wiggins
iSchool @ Syracuse University
21 July, 2008
eResearch for FLOSS
• An approach to research using cyberinfrastructure
• Collaborative and transparent, like FLOSS
• Large-scale shared data sets
– FLOSSmole
– Notre Dame SourceForge dumps
– CVSanalY
– Etc…
• Uses tools and analyses that allow sharing among researchers to support open science ideals
– Taverna Workbench
– MyExperiment.org
Using Taverna
• Scientific analysis workflow tool
– Open source development led by myGrid team
– Target users are the UK life sciences community
• Create analysis workflows by connecting modular components through input/output ports
– Produces (rigorous) analyses that are replicable, self-documenting, and easy to share
– Components include WSDL SOAP web services, Beanshell, RShell, and local Java shims
• Collaboratively developing our workflows
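As a rough illustration of what "connecting modular components through input/output ports" means, here is a minimal sketch in plain Python. It is not Taverna's API; the function names, port names, and sample data are all hypothetical.

```python
def run_workflow(steps, inputs):
    """Run components in order, wiring them together through named ports.

    steps: list of (function, input_port_names, output_port_name).
    inputs: dict of workflow-level input ports.
    Returns every port value, so intermediate results stay inspectable.
    """
    ports = dict(inputs)
    for func, in_ports, out_port in steps:
        ports[out_port] = func(*(ports[name] for name in in_ports))
    return ports


# Hypothetical components: each is a small function with explicit inputs.
def select_projects(projects, min_developers):
    return [p for p in projects if p["developers"] >= min_developers]


def project_names(projects):
    return sorted(p["name"] for p in projects)


# The workflow is just data: which component reads which ports.
workflow = [
    (select_projects, ("projects", "min_developers"), "selected"),
    (project_names, ("selected",), "names"),
]

sample = [
    {"name": "alpha", "developers": 1},
    {"name": "beta", "developers": 4},
    {"name": "gamma", "developers": 2},
]
result = run_workflow(workflow, {"projects": sample, "min_developers": 2})
print(result["names"])
```

Because the wiring is plain data and every intermediate port is kept, a run like this can be shared and re-executed with different inputs, which is the property the slide attributes to workflow tools.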
Replicating FLOSS Research as eResearch
• Replicating a selection of FLOSS papers and presentations, currently in progress
• Demonstrating utility and viability of eResearch approaches for FLOSS and social science
• Building reusable, customizable analysis components specific to FLOSS research, e.g. for data selection, sociomatrix generation for SNA, etc.
• Extending the original research analysis by parameterization (inputs, thresholds) and implementing “future work” suggestions of authors (plus our own ideas, of course)
Social dynamics of FLOSS communications
• Replication of Howison, Inoue & Crowston, 2006
– Compute dynamic network centrality of projects from trackers for 120 projects
• Extension
– Added exponentially decayed edge weighting function (needs sensitivity testing)
– Made sliding window adjustable
– Can apply to any threaded communication venue for which data is available
– Completed: all venues for 2 projects; queued: 216 projects with 635 venues!
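The two extensions above (exponentially decayed edge weights and an adjustable sliding window) could be sketched roughly as follows. This is not the workflow's actual code. The `(sender, recipient, day)` message format, the half-life parameterization, and the use of Freeman out-degree centralization as the centrality measure are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def decayed_edge_weights(messages, window_end, window_days=90.0, half_life_days=30.0):
    """Sum exponentially decayed weights per directed (sender, recipient) pair.

    messages: iterable of (sender, recipient, day), with day as a float (assumed format).
    Only messages with 0 <= age < window_days, measured back from window_end, count.
    """
    decay = math.log(2) / half_life_days  # a weight halves every half_life_days
    weights = defaultdict(float)
    for sender, recipient, day in messages:
        age = window_end - day
        if 0 <= age < window_days and sender != recipient:
            weights[(sender, recipient)] += math.exp(-decay * age)
    return dict(weights)


def outdegree_centralization(weights, threshold=0.0):
    """Freeman out-degree centralization of the network dichotomized at threshold."""
    nodes = {node for edge in weights for node in edge}
    n = len(nodes)
    if n < 3:
        return 0.0
    targets = {node: set() for node in nodes}
    for (sender, recipient), weight in weights.items():
        if weight > threshold:
            targets[sender].add(recipient)
    degrees = [len(t) for t in targets.values()]
    max_degree = max(degrees)
    # (n - 1)^2 is the maximum possible sum, reached by a directed star.
    return sum(max_degree - d for d in degrees) / ((n - 1) ** 2)


def sliding_centralization(messages, start, stop, step_days=30.0, **window_args):
    """Centralization at each window end from start to stop (inclusive)."""
    series = []
    t = start
    while t <= stop:
        weights = decayed_edge_weights(messages, t, **window_args)
        series.append((t, outdegree_centralization(weights)))
        t += step_days
    return series
```

Sensitivity testing would then amount to re-running `sliding_centralization` over a grid of `window_days`, `half_life_days`, and `threshold` values and comparing the resulting series.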
Workflow for Dynamic SNA
Dynamic SNA Across FLOSS Communication Channels
• Clearly a lot of variation across channels (user, developer & trackers), no easily observed patterns except overall trend toward decentralization
• Implications: carefully match theoretical constructs to data sampling, as different venues are very likely to yield different results, which significantly impacts interpretations
“Do the Rich Get Richer?”
• Replication of OSCon 2004 presentation by Conklin
– Demonstrate scale-free distribution of developers among projects
• Almost there
– A little more analysis to replicate
• Hoping to extend to dynamic analysis of preferential attachment
– Showing change to project sizes over time
– Comparing evolution and growth across repositories
Workflow for Rich Get Richer
• Using a single FLOSSmole summary statistic
• Very simple workflow, can expand analysis considerably
• Analysis of over 65K projects completes in under 3 minutes!
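A first-pass check for a scale-free distribution can be sketched as below, assuming the summary statistic is the number of developers per project (an assumption; the slide does not name the field). The log-log least-squares slope is only a rough indicator of a power law, not a rigorous fit, and the function names are hypothetical.

```python
import math
from collections import Counter


def size_distribution(team_sizes):
    """Return sorted (team_size, number_of_projects) pairs, ignoring empty teams."""
    counts = Counter(size for size in team_sizes if size > 0)
    return sorted(counts.items())


def loglog_slope(distribution):
    """Least-squares slope of log(count) against log(size).

    A roughly straight line with negative slope on log-log axes is
    consistent with (but does not prove) a power-law distribution.
    """
    xs = [math.log(size) for size, _ in distribution]
    ys = [math.log(count) for _, count in distribution]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct team sizes")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic data following count = 1000 / size^2, so the slope should be close to -2.
synthetic_sizes = [1] * 1000 + [2] * 250 + [5] * 40 + [10] * 10
slope = loglog_slope(size_distribution(synthetic_sizes))
print(round(slope, 3))
```

Both functions are single passes over the data, which fits the slide's point that the workflow is very simple and fast at this scale.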
Scale-free Developer Distribution in FLOSS
“Identifying success and tragedy of FLOSS Commons”
• Replication of English & Schweik, 2007
– Classification of project success by stage of growth for 110K projects as of August 2006
– Requires data from 2 repositories, FLOSSmole & ND
• Extension
– Parameterized all thresholds, makes sensitivity analysis possible
– Added 2 additional options for a criterion test, one suggested by authors in article
– Limitation: slightly less available data in FLOSSmole, 94K projects as of April 2005
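To show why parameterized thresholds make sensitivity analysis possible, here is a minimal sketch. The rules and default values are hypothetical and are not the criteria from English & Schweik. The labels SG and TG follow the Success Growth and Tragedy Growth classes used later in the deck, and IG for Indeterminate Growth is an assumed abbreviation.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Thresholds:
    """All cut-offs in one place so they can be varied (values are hypothetical)."""
    min_releases: int = 3      # releases needed to count as "enough releases"
    max_idle_days: int = 365   # days since last release before a project is "inactive"


def classify_growth(releases, idle_days, t=Thresholds()):
    """Classify a growth-stage project as 'SG', 'TG', or 'IG' (illustrative rules)."""
    if idle_days > t.max_idle_days:
        return "TG"  # inactive: treated as abandoned
    if releases >= t.min_releases:
        return "SG"  # active with enough releases
    return "IG"      # active, but too few releases to decide


def sweep_max_idle_days(projects, candidates, base=Thresholds()):
    """Re-classify all projects for each candidate value of max_idle_days.

    projects: iterable of (releases, idle_days) pairs.
    Returns {candidate: Counter of class labels}.
    """
    projects = list(projects)
    results = {}
    for days in candidates:
        t = replace(base, max_idle_days=days)
        results[days] = Counter(classify_growth(r, d, t) for r, d in projects)
    return results
```

Comparing the counters across candidate values shows how many projects change class when one threshold moves, which is the basic question a sensitivity analysis asks.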
Workflow for Success-Abandonment Classification
Classifying FLOSS Projects
• Very complex data requirements; meshing across repositories
• Difficult to scale and resource intensive
• Already using this workflow for project sampling
• For small (non-random) sample of 54 projects:
– 64% growth, 17% initiation, 19% null (i.e. missing data)
• Indeterminate Growth: 18.9%
• Success Growth: 39.6%
• Tragedy Growth: 7.5%
• Other: 34%
amsn,downloaded,growth,enough.releases,active,ok.release.rate,true,SG
anjuta,downloaded,growth,enough.releases,active,ok.release.rate,false,SG
anon,downloaded,growth,enough.releases,inactive,fast.release.rate,false,TG
etc…
Future Directions
• Replication of “Evolution & Growth in Large Libre Software Projects” by Robles et al., 2005
• Prototyping OWL ontology of FLOSS communication data, already in use with RDF & SPARQL
• Cross-linking data, analyses, and papers
• Increasing scale of analyses to thousands of projects
• Extending analyses, sensitivity testing to strengthen findings
• Building reusable analysis components to share, enabling cumulative research
Thanks!
• More at floss.syr.edu/publications/