Building communities around open-source scientific software

Building communities around open-source scientific software Karen Cranston National Evolutionary Synthesis Center (NESCent) @kcranstn http://www.slideshare.net/kcranstn

description

A talk given at the RTP 180 Open Source All of the Things meeting: https://trianglewiki.org/RTP_180%3A_Open_Source_All_things

Transcript of Building communities around open-source scientific software

Page 1: Building communities around open-source scientific software

Building communities around open-source scientific software

Karen Cranston
National Evolutionary Synthesis Center (NESCent)

@kcranstn
http://www.slideshare.net/kcranstn

Page 2: Building communities around open-source scientific software

NESCent: National Evolutionary Synthesis Center

www.nescent.org

fieldwork
labwork

method development

meta-analysis
data synthesis

Page 3: Building communities around open-source scientific software
Page 4: Building communities around open-source scientific software

Species              A (mm^2)      F (mm^2/mm^2)   N (mm^-2)   S (mm^4)
Abelia biflora       0.002375829   0.924197654     389.0       6.11E-06
Abelia dielsii       0.00115375    0.357418211     331.0       3.49E-06
Abelia integrifolia  0.001134115   0.240432369     212.0       5.35E-06
Abelia mosanensis    0.000855299   0.632065665     739.0       1.16E-06
Abelia serrata       0.000706858   0.206402637     292.0       2.42E-06
Abelia spathulata    0.000804248   0.230819095     287.0       2.80E-06
Abutilon fruticosum  0.001452201   0.137959114     95.0        1.53E-05
Abutilon pannosum    0.003117245   0.124689812     40.0        7.79E-05
Acacia albida        0.012271846   0.049087385     4.0         0.003067962
Acacia ataxacantha   0.013069811   0.169907541     13.0        0.00100537
Acacia borleae       0.004071504   0.061072561     15.0        0.000271434
Acacia burkei        0.008992024   0.053952141     6.0         0.001498671
Acacia caffra        0.010207035   0.214347725     21.0        0.000486049

trait data about species + evolutionary trees

Page 5: Building communities around open-source scientific software

Outcomes: Community

Brian O'Meara, Michael Alfaro, Charles Bell, Ben Bolker, Marguerite Butler, Peter Cowan, Damien de Vienne, Richard Desper, Joe Felsenstein, Luke Harmon, Christoph Heibl, Andrew Hipp, Gene Hunt, Thibaut Jombart, Steve Kembel, Hilmar Lapp, Scott Loarie, Wayne Maddison, Peter Midford, David Orme, Emmanuel Paradis, Sam Price, Dan Rabosky, Brian Sidlauskas, Stacey Smith, Dave Swofford, Todd Vision, Peter Waddell, Amy Zanne, Derrick Zwickl [bold indicates organizer]

Comparative methods in R hackathon

Rationale

The R statistical analysis package has emerged as a popular platform for implementation of powerful comparative phylogenetic methods to understand the evolution of organismal traits and diversification. It includes methods such as independent contrasts, ancestral state estimation, various models of continuous and discrete trait evolution, lineage through time plots, diversification tests, generalized estimating equations, tree plotting, and more. This event was designed to bring together active R developers as well as end-users working on the integration of comparative phylogenetic methods within R to actively address issues of data exchange standards, code interoperability, usability, documentation quality, and the breadth of functionality for comparative methods available within R. The idea originated from a whitepaper submitted by NESCent postdocs Amy Zanne and Sam Price.

Work at hackathon (Dec. 10-14, 2007)

•30 developers and users worked on programming & writing documentation
•Split into subgroups on diversification, divergence times, documentation, class design, Mesquite-R interaction, input/output, and trait evolution
•Package source code stored on a shared repository hosted at R-Forge (“PhyloConductor”)
•Designed and began implementing a new S4 class for data and trees
•Ran “bootcamps” for developers on numerical optimization and S4 coding
•Used the Nexus Class Library (Lewis & Holder) and Rcpp (Samperi) for reading and interpreting Nexus tree and data files
•Began work on R tutorials
•Tested existing methods in R, identifying errors
•Developed ways for R to call Mesquite and Mesquite to call R

Hackathon participants (red were flown to NESCent, purple participated remotely). Map from Google Maps

[Figure: commits by date, 12/10 to 12/15 (y-axis: Commits, 0 to 300)]

•R-Phylo Wiki (http://www.r-phylo.org): Tutorials and overview of available analyses and packages from the hackathon have been placed on a public website for all to use and improve. It’s had >7,000 page visits from >30 countries and >600 edits since it went live in March 2008.
•R-sig-phylo mailing list (https://stat.ethz.ch/mailman/listinfo/r-sig-phylo): A mailing list for users of R for comparative methods and phylogenetics. Over 100 messages in its first four months.
•Comparative methods in R user tutorials planned for 2009 Society for Integrative and Comparative Biology and Evolution meetings.
•Addition of R track to NESCent summer course in phyloinformatics, featuring software developed at hackathon and taught by hackathon participant Marguerite Butler.
•Proposal to NSF for summer course in R for phyloinformatics.
•Ongoing collaborations between hackathon participants.
•Two Google Summer of Code projects to sponsor student developers:
    •Peter Cowan: Tree and data plotting in the phylobase project (see right)
    •Matthew Helmus: Enhancing the representation of ecophylogenetic tools in R in the picante project

NESCent informatics

Incompatible tree formats are used in different R packages

Package          Function
geiger 1.0-9.1   sim.char
ouch 1.2-4       brown.dev
picante          evolve.brownian
ape 2.01         evolve.phylo

Redundancy (at least four functions to evolve traits up the tree using simple Brownian motion)

Can be intimidating to beginners
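
For readers unfamiliar with what the four redundant functions compute, here is a minimal sketch of evolving a trait along a tree under simple Brownian motion. It is written in Python rather than R and does not reproduce any of the packages listed. The nested-tuple tree layout, the function name `simulate_bm`, and the rate parameter `sigma2` are assumptions made only for this illustration.

```python
import random

# Hypothetical tree layout for this sketch: (name, branch_length, children).
def simulate_bm(node, parent_value=0.0, sigma2=1.0, rng=random):
    """Return {tip_name: trait_value} after Brownian motion along the tree."""
    name, length, children = node
    # The change along a branch is drawn from Normal(0, sigma2 * branch_length).
    value = parent_value + rng.gauss(0.0, (sigma2 * length) ** 0.5)
    if not children:
        return {name: value}
    tips = {}
    for child in children:
        # Each child starts from the value reached at this node.
        tips.update(simulate_bm(child, value, sigma2, rng))
    return tips

# A three-tip example tree: A is sister to the clade (B, C).
tree = ("root", 0.0, [
    ("A", 1.0, []),
    ("anc", 0.5, [("B", 0.5, []), ("C", 0.5, [])]),
])
traits = simulate_bm(tree, rng=random.Random(42))
print(traits)
```

Because B and C share the branch leading to `anc`, their simulated values are correlated, which is the property comparative methods have to account for.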

Coding at hackathon

The US National Evolutionary Synthesis Center (http://www.nescent.org) encourages synthetic, interdisciplinary, and transformative research in evolutionary biology. NESCent, a collaborative effort of Duke, NC State University and UNC Chapel Hill, is located in Durham NC and is supported by the National Science Foundation (EF-0423641).

A major goal of NESCent's Informatics branch is to promote community-driven, collaborative open-source software development. This is achieved through hackathons, internships (such as the Google Summer of Code), summer courses, conference workshops, and by externally funded collaborations for the development and support of important cyberinfrastructure resources. NESCent accepts whitepapers that provide suggestions for future informatics activities from anyone at any time. See the website or the NESCent booth in the exhibit hall for more information.

Outcomes: Software

•Phylobase (http://r-forge.r-project.org/projects/phylobase/): New package for phylogenetic trees and data. Can load trees and data from Nexus files, output to other tree formats, coordinate pruning of taxa from data and tree, traverse tree, handle DNA, morphological, and continuous data types. Work is ongoing (below) to enhance tree plotting and other functions. As with all hackathon products, new developers are welcome to join to further improve the code (one already has).

URL: http://hackathon.nescent.org/R_Hackathon_1 email: [email protected]

[Figure: commits by date, 12/16 to 6/1 (y-axis: Commits, 0 to 200)]

•Movement of existing packages to source code repositories, allowing more collaborative development (e.g., the picante package has a new Google Summer of Code 2008 developer, Matthew Helmus)
•R-Mesquite interaction: Code written to allow Mesquite (Maddison & Maddison, 2007) to call R packages (such as OUCH (Butler & King 2004) and APE (Paradis et al. 2004)), and for R to call headless Mesquite, although an easier installation process still needs to be created.
•Continuing improvement and release of packages by hackathon participants (GEIGER, LASER, ape).
•See http://hackathon.nescent.org/R_Hackathon_1 for more info.
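
The "coordinate pruning of taxa from data and tree" mentioned for Phylobase can be illustrated with a small sketch. It is written in Python, not R, and is not phylobase's API. The tree representation (nested tuples with taxon names as leaves) and the function names `prune_tree` and `prune_together` are hypothetical, and branch lengths are ignored for brevity.

```python
def prune_tree(node, drop):
    """Remove the taxa in `drop` from a nested-tuple tree; None if nothing is left."""
    if isinstance(node, str):
        # A leaf is a taxon name.
        return None if node in drop else node
    kept = [c for c in (prune_tree(child, drop) for child in node) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse an internal node that is left with a single child.
        return kept[0]
    return tuple(kept)

def prune_together(tree, data, drop):
    """Drop the same taxa from the tree and from a {taxon: value} table."""
    drop = set(drop)
    pruned_data = {taxon: v for taxon, v in data.items() if taxon not in drop}
    return prune_tree(tree, drop), pruned_data

tree = (("A", "B"), ("C", "D"))
data = {"A": 1.2, "B": 0.7, "C": 3.1, "D": 2.4}
new_tree, new_data = prune_together(tree, data, ["B"])
print(new_tree)  # ('A', ('C', 'D'))
print(new_data)  # {'A': 1.2, 'C': 3.1, 'D': 2.4}
```

Keeping tree and data in one operation means the two can never disagree about which taxa are present, which is the point of storing them in a single class.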

Coding for PhyloBase

Nature Precedings : doi:10.1038/npre.2008.2126.1 : Posted 28 Jul 2008

O’Meara et al. Nature Precedings. 2008 http://dx.doi.org/10.1038/npre.2008.2126.1

Page 6: Building communities around open-source scientific software

R-sig-phylo mailing list

Page 7: Building communities around open-source scientific software

32 R packages for comparative biology; maintained by a hackathon participant

Page 8: Building communities around open-source scientific software

Informatics team

Evolutionary biologists

computational skills

domain knowledge

NESCent: National Evolutionary Synthesis Center

www.nescent.org

Page 9: Building communities around open-source scientific software

short bootcamps teaching computational skills to domain scientists

bringing students into open-source programming communities

Page 10: Building communities around open-source scientific software
Page 11: Building communities around open-source scientific software

A grassroots approach to software sustainability. Karen Cranston, Todd Vision, Brian O'Meara, Hilmar Lapp. http://dx.doi.org/10.6084/m9.figshare.790739