Software Carpentry for the Geophysical Sciences

56
ESIP Winter Meeting, January 2014 Software Carpentry for the Geophysical Sciences Aron Ahmadia Software Carpentry and US Army ERDC-CHL

Transcript of Software Carpentry for the Geophysical Sciences

Page 1: Software Carpentry for the Geophysical Sciences

ESIP Winter Meeting, January 2014 !

Software Carpentry for the Geophysical Sciences

!Aron Ahmadia

Software Carpentry and US Army ERDC-CHL

Page 2: Software Carpentry for the Geophysical Sciences

Dark Matter, Public Health,

and Scientific Computing

!Greg Wilson

This Talk is a Fork http://www.slideshare.net/gvwilson/dark-matter-public-

health-and-scientific-computing

Page 3: Software Carpentry for the Geophysical Sciences

ESIP Winter Meeting, January 2014 !

Software Carpentry for the Geophysical Sciences

!Aron Ahmadia

Software Carpentry and US Army ERDC-CHL

Page 4: Software Carpentry for the Geophysical Sciences

Computation in the Geophysical Sciences !

Aron Ahmadia Coastal and Hydraulics Lab, US Army ERDC

Page 5: Software Carpentry for the Geophysical Sciences

!5http://farm8.staticflickr.com/7407/11340654005_16dd86b6ab_o.png

Page 6: Software Carpentry for the Geophysical Sciences

This talk is not about GIS

!6

I still brought some maps to this talk for later…

Page 7: Software Carpentry for the Geophysical Sciences

Computational Modeling within ERDC

!7

Page 8: Software Carpentry for the Geophysical Sciences

Computational Modeling within ERDC

!8

Page 9: Software Carpentry for the Geophysical Sciences

Computational Modeling within ERDC

!9

Page 10: Software Carpentry for the Geophysical Sciences

!10

This talk is not about computational modeling

Page 11: Software Carpentry for the Geophysical Sciences

!11

This talk is about geoscientists…constructing software

Page 12: Software Carpentry for the Geophysical Sciences

!12

“Dark Matter Developers”

Scott Hanselman (March 2012) ![We] hypothesize that there is another kind of developer than the ones we meet all the time. We call them Dark Matter Developers. They don't read a lot of blogs, they never write blogs, they don't go to user groups, they don't tweet or facebook, and you don't often see them at large conferences... [They] aren't chasing the latest beta or pushing...limits, they are just producing. !!http://www.hanselman.com/blog/DarkMatterDevelopersTheUnseen99.aspx

Page 13: Software Carpentry for the Geophysical Sciences

!13

We’re Not That Optimistic

l 90% of scientists have their heads down − Producing science and solving problems,

not worrying about the latest in semantic interoperability or standardized metadata

l Not because they don't want to do all that Star Trek stuff

l But because the majority of the field isn’t there yet.

Page 14: Software Carpentry for the Geophysical Sciences

!14

Shiny Toys

Page 15: Software Carpentry for the Geophysical Sciences

!15

Grimy Reality

Page 16: Software Carpentry for the Geophysical Sciences

!16

So Here We Are

Us Them

Page 17: Software Carpentry for the Geophysical Sciences

!17

Surely You're Exaggerating

1. How many graduate students write shell scripts to analyze each new data set instead of running those analyses by hand?

Page 18: Software Carpentry for the Geophysical Sciences

!18

Surely You're Exaggerating

2. How many of them use version control to keep track of their work and collaborate with colleagues?

Page 19: Software Carpentry for the Geophysical Sciences

!19

Surely You're Exaggerating

3. How many routinely break large problems into pieces small enough to be - comprehensible, - testable, and - reusable?

Page 20: Software Carpentry for the Geophysical Sciences

!20

Surely You're Exaggerating

3. How many routinely break large problems into pieces small enough to be - comprehensible, - testable, and - reusable? And how many know those are the same things?

Page 21: Software Carpentry for the Geophysical Sciences

!21

We've left the majority behind

Page 22: Software Carpentry for the Geophysical Sciences

!22

Other than Googling for things, the majority of scientists do not use computers more effectively today than they did 28 years ago.

We've left the majority behind

Page 23: Software Carpentry for the Geophysical Sciences

!23

“Not My Problem”Actually your biggest problem

How can we do better?

Applications Yesterday’s models and tools are solving today’s million dollar problems, because nobody knows how to update them.

Collaboration Hundreds of person-years of effort are redundantly combining to create a year’s worth of result

Credibility Many published results can’t be trusted, because the scientific code isn’t modular, can’t be tested, and sometimes isn’t even released

Page 24: Software Carpentry for the Geophysical Sciences

Your code will outlive your supercomputer. Most

successful supercomputers lasts 5 years. The most

successful codes may reach 5 decades. There are currently 40 year-old modeling codes

still being used.

!24

Jeff Hammond Argonne Leadership

Computing Facility

Page 25: Software Carpentry for the Geophysical Sciences

!25

Where Are Your Goalposts?An Earth Scientist is computationally competent if she can build, use, validate, and share software to:

• Manage software and data

• Tell if it's working correctly

• Find and fix problems when it isn’t

• Keep track of what she’s done

• Share work with others

• Do all of these things efficiently

Page 26: Software Carpentry for the Geophysical Sciences

!26

A Driver's License Exam

l Developed with the Software Sustainability Institute for the DiRAC Consortium

l Formative assessment − Do you know what you need to know in order to

get the most out of this gear? l Pencils ready?

Note: actual exam allows for several different programming languages, version control systems, etc.

Page 27: Software Carpentry for the Geophysical Sciences

!27

A Driver's License Exam

1. Check out a working copy of the exam 2. materials from a git repository. 3. 4. 5. 6. 7.

Could do it easily 1

Could struggle through 0.5

Wouldn’t know where to start 0

Don’t understand the question -1

Page 28: Software Carpentry for the Geophysical Sciences

!28

A Driver's License Exam

1. Use find and grep in a pipe to create a 2. list of all .dat files in the working copy, 3. redirect the output to a file, and add that 4. file to the repository. 5. 6. 7.

Could do it easily 1

Could struggle through 0.5

Wouldn’t know where to start 0

Don’t understand the question -1

Page 29: Software Carpentry for the Geophysical Sciences

!29

A Driver's License Exam

1. Write a shell script that runs a legacy 2. program for each parameter in a set. 3. 4. 5. 6. 7.

Could do it easily 1

Could struggle through 0.5

Wouldn’t know where to start 0

Don’t understand the question -1

Page 30: Software Carpentry for the Geophysical Sciences

!30

A Driver's License Exam

1. Edit the Makefile provided so that if any 2. .dat file changes, analyze.py is re-run to 3. create the corresponding .out file. 4. 5. 6. 7.

Could do it easily 1

Could struggle through 0.5

Wouldn’t know where to start 0

Don’t understand the question -1

Page 31: Software Carpentry for the Geophysical Sciences

!31

A Driver's License Exam

1. Write four unit tests to exercise a function 2. that calculates running sums. Explain why 3. your four tests are most likely to uncover 4. bugs in the function. 5. 6. 7.

Could do it easily 1

Could struggle through 0.5

Wouldn’t know where to start 0

Don’t understand the question -1

Page 32: Software Carpentry for the Geophysical Sciences

!32

A Driver's License Exam

1. Explain when and how a function could 2. pass your tests, but still fail on real data. 3. 4. 5. 6. 7.

Could do it easily 1

Could struggle through 0.5

Wouldn’t know where to start 0

Don’t understand the question -1

Page 33: Software Carpentry for the Geophysical Sciences

!33

A Driver's License Exam

1. Do a code review of the legacy program 2. from Q3 (about 50 lines long) and list 3. the four most important improvements 4. you would make. 5. 6. 7.

Could do it easily 1

Could struggle through 0.5

Wouldn’t know where to start 0

Don’t understand the question -1

Page 34: Software Carpentry for the Geophysical Sciences

!34

How Are We Doing?

A. How well did you do?

Page 35: Software Carpentry for the Geophysical Sciences

!35

How Are We Doing?

A. How well did you do?

B. How well do you think most earth scientists would do?

Page 36: Software Carpentry for the Geophysical Sciences

!36

How Are We Doing?

A. How well did you do?

B. How well do you think most earth scientists would do?

C. Do you think an earth scientist could use your

software without having these skills?

Page 37: Software Carpentry for the Geophysical Sciences

!37

How Are We Doing?

A. How well did you do?

B. How well do you think most earth scientists would do?

C. Do you think an earth scientist could use your

software without having these skills?

D. Or a grasp of the principles behind them?

Page 38: Software Carpentry for the Geophysical Sciences

!38

So Why Is This Your Problem?

If you're only helping the (small) minority lucky enough to have acquired the base skills that use of your tools depends on... !!!!!then your potential user base is many times smaller than it could be.

Page 39: Software Carpentry for the Geophysical Sciences

!39

It Is Therefore Obvious That...

l Put more computing courses in the curriculum! − Except the curriculum is already full

Page 40: Software Carpentry for the Geophysical Sciences

!40

It Is Therefore Obvious That...

l Put a little computing in every course! − Still adds up: 5 minutes/lecture = 4 courses/degree − First thing cut when running late − The blind leading the blind

Page 41: Software Carpentry for the Geophysical Sciences

!41

What Has Worked

Target graduate students Intensive short courses Focus on practical skills

Page 42: Software Carpentry for the Geophysical Sciences

!42

What Has Worked

http://software-carpentry.org

Page 43: Software Carpentry for the Geophysical Sciences

!43

Page 44: Software Carpentry for the Geophysical Sciences

!44

What We Teach (Core Materials)l Unix shell

automate repetitive tasks l Git and GitHub

manage, collaborate, and share work with distributed version control systems

l Python or R grow scripts, programs, and libraries in structured,

modular, testable, and reusable ways l Databases

distinguish between structured and unstructured data, and know when to use each

Page 45: Software Carpentry for the Geophysical Sciences

How We Teach

• Openly licensed content • Managed as a public project on GitHub • IPython Notebooks • RStudio and RMarkdown • Continuum Anaconda and Enthought Canopy • Practical, exercise-driven lessons and lectures • Teams of volunteer instructors

!45

Page 46: Software Carpentry for the Geophysical Sciences

!46

Page 47: Software Carpentry for the Geophysical Sciences

!47

“That's Merely Useful”

l None of this is publishable any longer − Which means it's ineffective career-wise for

academic computer scientists

l But it works − Two independent assessments have

validated the impact of what we do

Page 48: Software Carpentry for the Geophysical Sciences

!48

Software Carpentry is International

Page 49: Software Carpentry for the Geophysical Sciences

Get Involved!

!49

Page 50: Software Carpentry for the Geophysical Sciences

!50

Shine Some Light

l Include a few lines about your software stack and working practices in your scientific papers

l Ask about them when reviewing − Where's the repository containing the code? − What's the coverage of your unit tests? − How did you track provenance?

Page 51: Software Carpentry for the Geophysical Sciences

Attend a Workshop

!51

Page 52: Software Carpentry for the Geophysical Sciences

Host a Workshop

!52

??

???

Page 53: Software Carpentry for the Geophysical Sciences

Host a Workshop

!53

?? ? ?

??

Page 54: Software Carpentry for the Geophysical Sciences

!54

Teach a Workshop

l All our materials are CC-BY licensed l We will help train you

???

Page 55: Software Carpentry for the Geophysical Sciences

!55

Earth Science Workshops

l We are developing geophysical and environmental sciences course material that extends the Software Carpentry training material

l Randal Olson and Aron Ahmadia will be teaching Software Carpentry to Earth Scientists at MIT and ERDC this month

l Are you interested in hosting? Teaching? Contributing material? Get in touch!

Page 56: Software Carpentry for the Geophysical Sciences

!56

Thanks!

Web: http://aron.ahmadia.net Email: [email protected] Twitter: @ahmadia !Software Carpentry: http://software-carpentry.org !US Army Engineer Research and Development Center: http://www.erdc.usace.army.mil/ !ERDC Proteus (high resolution models): https://github.com/erdc-cm/proteus