Open sharing and maintenance of scientific code
Jordan S Read; Luke A Winslow, 2013-08-20


Background

• Who I am
  – USGS-CIDA
  – 2012 PhD in physical limnology (UW-Madison)
  – Civil Engineer

• My experience with code and model development
  – Lake Analyzer
  – CLM
  – rGDP; rGLM
  – Numerous collaborations

Background

My philosophy on science code: “Code created for the pursuit of science questions should be open, accessible, and designed to enable others to build from”

• Kind of like your scientific publications, right?
• That means I shouldn’t be able to build my scientific livelihood around a piece of “black-box” code

Background

• My responsibility as a member of the science community: “Methods used to obtain published results should be clear, transparent and repeatable”

• My responsibility as a federal employee: “Provide public access to all elements of publicly funded research”

Road map

Part I
• My experiences with science code development
• Motivation to open up your scientific code

Part II
• Maintaining and modifying code
• Code collaboration

Lake Analyzer

• GLEON background
  – Hanson & Hamilton collaboration and student exchange
  – Physics & Climate working group

• Requirements
  – Easy to use
  – Provide access to complex derived physical quantities
  – Handle dataset irregularities
    • Errors, gaps, intermittent sampling frequencies, etc.
  – Rapid processing of large datasets

Lake Analyzer

• I took on the role of primary coder
  – Why? GLEON had paid my travel to two meetings…including NZ!
• I did the work in MATLAB, because that is what I was most familiar with
• Side project during grad school
• Built from feedback from the GLEON Physics & Climate group

Lake Analyzer

• Repeatable – the .lke file acts as metadata for each analysis

• Visualizations (plotting options for outputs)

• Easy to use

Lake Analyzer

• Software publication

• Open codebase

• Platform/language independence

• Useful and citable
  – 19 citations in ~20 months

Opening up scientific code

• Publishing your code
  – Would a simple paper of physical derivations be cited at this rate?
  – Would a methods paper be as popular if the code wasn’t available/open?
  – Additional motivation for creation of code

• Writing open code
  – More use
  – Ease of collaboration
  – Integrity/transparency

Opening up scientific code

• Reasons many choose not to open code
  – Too much work
  – Code is too messy
  – Potential for criticism
  – Code as scientific livelihood
  – Has known errors…
  – Others?

Opening up scientific code

• When to put in the effort
  – Collaborations
  – When you are doing it “right”
  – When you will use it in the future
  – When you are publishing something
  – When you have to
  – Others?

Part II: Maintaining code

So…the code works, what’s next?
• How do I take risks with code?
  – i.e., changing the way a function works
  – What if I make a mistake? (undo+undo+undo…?)

• How do multiple people collaborate on a single set of scripts?
  – In serial?
  – Google Docs vs. Word for writing a paper

Maintaining code

• Risky modifications
  – Metabolism_modelv28.R?
  – Metabolism_model_NEW.R?
  – Metabolism_model_NEWsecondTRY.R?
  – Metabolism_model_NEWEST.R?

Maintaining code

• When we publish, we use track changes
  – Can we do the same for code?

• Version management
  – AKA: version control, revision control, source control
  – How it works
  – Why you should know what it means
  – Benefits of using version management
    • Historical record of code evolution
    • Easy to “roll back” to a previous working version
    • The code has only one home
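“Track changes” for code is usually shown as a line-by-line diff between two versions. The sketch below uses Python’s standard-library `difflib.unified_diff` to compare two hypothetical versions of a function in Metabolism_model.R; the R lines are invented for illustration and are treated purely as text. Version control tools such as Git report changes in a similar unified-diff layout.

```python
import difflib

# Two hypothetical versions of one function from Metabolism_model.R.
# The R code is invented for illustration; difflib only sees lines of text.
version_1 = [
    "gpp <- function(light, k) {\n",
    "  k * light\n",
    "}\n",
]
version_2 = [
    "gpp <- function(light, k, max_rate) {\n",
    "  min(k * light, max_rate)\n",
    "}\n",
]

# unified_diff marks removed lines with '-', added lines with '+',
# and unchanged context lines with a leading space.
diff_lines = list(
    difflib.unified_diff(
        version_1,
        version_2,
        fromfile="Metabolism_model.R (v1)",
        tofile="Metabolism_model.R (v2)",
    )
)
print("".join(diff_lines), end="")
```

Unlike keeping Metabolism_model_NEW.R next to Metabolism_model_NEWEST.R, a diff states exactly which lines changed, the same way tracked changes do in a manuscript.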

Maintaining code

How it works
  – Creates a “life history of code”

“Hey, nice sweater.”
“Thanks. I travel a lot. Want to start a project?”
“Sure! I have some modeling code.”
“So do I! Let’s combine our efforts.”

Maintaining code

[Figure: the shared project’s history as numbered revisions]
Revision 1
Revision 2: “Here is a new set of methods”
Revision 3: “I made some improvements”
Revision 4: “Whoops! Fixed a bug”
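The “life history of code” can be sketched as a list of numbered snapshots, each saved with a short message; keeping every snapshot is what makes rolling back possible. This is a toy, in-memory illustration under that simplifying assumption: the `History` and `Revision` classes, the file names, the file contents, and the message for revision 1 are all hypothetical, and real tools such as Git add content hashing, branching, and merging on top of this idea.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    """One entry in the code's life history: a snapshot plus a message."""
    number: int
    message: str
    files: dict


@dataclass
class History:
    """Toy, in-memory version history (hypothetical; not a real VCS API)."""
    revisions: list = field(default_factory=list)

    def commit(self, files, message):
        # Store a copy so later edits to `files` cannot rewrite history.
        rev = Revision(len(self.revisions) + 1, message, dict(files))
        self.revisions.append(rev)
        return rev.number

    def checkout(self, number):
        # Return a copy of the files exactly as they were at revision `number`.
        return dict(self.revisions[number - 1].files)

    def log(self):
        # List (revision number, message) pairs, oldest first.
        return [(r.number, r.message) for r in self.revisions]


history = History()
code = {"model.R": "run_model <- function() {}\n"}
history.commit(code, "Combine our modeling code")  # hypothetical message

code["methods.R"] = "new_method <- function() {}\n"
history.commit(code, "Here is a new set of methods")

code["model.R"] = "run_model <- function(fast = TRUE) {}\n"
history.commit(code, "I made some improvements")

code["model.R"] = "run_model <- function(fast = FALSE) {}\n"
history.commit(code, "Whoops! Fixed a bug")

for number, message in history.log():
    print(number, message)

# Roll back: recover revision 2 without keeping any "_NEWEST" file copies.
restored = history.checkout(2)
print(restored["model.R"])
```

Each commit stores its own copy of the files, so the working copy can change freely while every earlier revision stays recoverable from one place.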

Conclusions

• Code as if it will be seen and used by others
  – You may be that “other” in 3 years

• Decide if creating publicly usable code makes sense for your research

• Make your code accessible to collaborators
• Consider the concepts embedded in version management

Jordan S Read
USGS Center for Integrated Data Analytics
608-821-3922

Questions?

Thanks GLEON FP & TLS!