Data Management: Tips & Tools

Post on 18-Aug-2015

Data Management

Stephanie Wright
University of Washington
swright@uw.edu

SPATIAL / IsoCamp
June 2015

Tips & Tools

Who Am I?

• Computing Trainer
• Cruise Ship Lecturer (Love Boat)
• Library Merger Manager
• Atmospheric Sciences Librarian
• Assessment Librarian
• Data Services Coordinator

HTTP://GUIDES.LIB.WASHINGTON.EDU/SWRIGHT

Disclaimer: I am not a scientist. I am a librarian …

Disclaimer: I am not a scientist. More like this…

What Do I Do?

• Data Management Plans (DMPs)
• Courses
• Consultations
• Research Projects
• DataONE, RDA, eScience Institute
• Institutional Data Repository (DRUW)

Why?

THEN NOW

THEN

NOW

THEN NOW

A Real Life Example

[Slide images: a spreadsheet ("my spreadsheet") annotated with its problems: many tables, no headings, embedded figures.]

One More Example

https://www.youtube.com/watch?v=66oNv_DJuPc

Data Sharing and Management Snafu in 3 Short Acts 

Why Does It Matter?

From Flickr by tomhilton

HTTP://WWW.SPARC.ARL.ORG/ISSUES/OPEN-DATA/DATA-SHARING-INITIATIVE/POLICIES

… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”

“The best thing to do with your data will be thought of by someone else.”

“We need open data because we don’t just want to use a car we want to poke around in the engine, see how it works and then rebuild it.”

~ Rufus Pollock, Founder and President of Open Knowledge Foundation (www.okfn.org)

From Flickr by cogdog

Wicherts JM, Bakker M, Molenaar D (2011) Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828. doi:10.1371/journal.pone.0026828

How To Do It?

Data planning is more efficient than data forensics.

DATA MANAGEMENT PLANNING
• What will be collected
• Methods
• Standards
• Sharing/access
• Long-term storage

COLLECTING
• Keep raw data raw
• Use scripts to process data
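One way to act on both points at once is a short script that reads the raw file and writes its results somewhere else. The sketch below is only an illustration: the file names, the "count" column, and the rule for dropping rows are all made up, not taken from the slides.

```python
import csv
import tempfile
from pathlib import Path


def process_raw(raw_path, out_path):
    """Read the raw file, write cleaned rows to a new file, leave the raw file untouched."""
    raw_path, out_path = Path(raw_path), Path(out_path)
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw file")
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Hypothetical cleaning rule: drop rows with an empty "count" field.
            if row["count"].strip() == "":
                continue
            writer.writerow(row)
            kept += 1
    return kept


# Small demonstration with made-up data in a temporary folder.
workdir = Path(tempfile.mkdtemp())
raw_file = workdir / "raw_counts.csv"
raw_file.write_text("site,count\nA,12\nB,\nC,7\n")
raw_before = raw_file.read_text()
rows_kept = process_raw(raw_file, workdir / "processed_counts.csv")
```

Because every change lives in the script, the processing can be rerun or corrected later without touching the original measurements.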

ORGANIZING
• Machine readable
• Human readable
• Works well with default ordering

AVOID
• spaces
• punctuation
• special characters
• case sensitivity

20130503_DOEProject_DesignDocument_Smith_v2-01.docx
20130709_DOEProject_MasterData_Jones_v1-00.xlsx
20130825_DOEProject_Ex1Test1_Data_Gonzalez_v3-03.xlsx
20130825_DOEProject_Ex1Test1_Documentation_Gonzalez_v3-03.xlsx
20131002_DOEProject_Ex1Test2_Data_Gonzalez_v1-01.xlsx
20141023_DOEProject_ProjectMeetingNotes_Kramer_v1-00.docx

Eaffinis_nanaimo_2010_counts.xls
• Eaffinis: study organism
• nanaimo: site name
• 2010: year
• counts: what was measured

YYYYMMDD
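The naming pattern in the DOEProject examples can be generated and checked by a small helper. This is a sketch under assumptions: the function name, its parameters, and the exact set of allowed characters are illustrative choices, not part of the talk.

```python
import re
from datetime import date

# Letters, digits, underscores and hyphens, then one dot and an extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")


def build_name(day, project, description, author, major, minor, ext):
    """Assemble YYYYMMDD_Project_Description_Author_vM-mm.ext (hypothetical helper)."""
    name = f"{day:%Y%m%d}_{project}_{description}_{author}_v{major}-{minor:02d}.{ext}"
    if not SAFE_NAME.match(name):
        raise ValueError(f"unsafe characters in file name: {name!r}")
    return name


names = [
    build_name(date(2013, 8, 25), "DOEProject", "Ex1Test1_Data", "Gonzalez", 3, 3, "xlsx"),
    build_name(date(2013, 5, 3), "DOEProject", "DesignDocument", "Smith", 2, 1, "docx"),
]

# With the date first in YYYYMMDD form, plain alphabetical sorting is also
# chronological sorting, which is what "works well with default ordering" means.
ordered = sorted(names)
```

A name containing a space, such as a project called "DOE Project", makes the helper raise an error instead of quietly producing a file that is awkward to handle in scripts.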

Noble, William S. (2009) "A Quick Guide to Organizing Computational Biology Projects." PLoS Computational Biology 5(7). doi:10.1371/journal.pcbi.1000424

• Pick a method that works for you and stick to it
• DOCUMENT IT!

METADATA
• Who?
• What?
• Where?
• When?
• How?
• Why?

Digital context

• Name of the data set

• The name(s) of the data file(s) in the data set

• Date the data set was last modified

• Example data file records for each data type file

• Pertinent companion files

• List of related or ancillary data sets

• Software (including version number) used to prepare/read the data set

• Data processing that was performed

Personnel & stakeholders

• Who collected

• Who to contact with questions

• Funders

Scientific context

• Scientific reason why the data were collected

• What data were collected

• What instruments (including model & serial number) were used

• Environmental conditions during collection

• Temporal & spatial resolution

• Standards or calibrations used

Information about parameters

• How each was measured or produced

• Units of measure

• Format used in the data set

• Precision & accuracy if known

Information about data

• Definitions of codes used

• Quality assurance & control measures

• Known problems that limit data use (e.g. uncertainty, sampling problems)
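The checklists above can be kept as a small machine-readable record stored next to the data. The sketch below assumes a handful of field names chosen for illustration and made-up values; only the file name comes from the earlier example. It is not a formal metadata standard.

```python
import json

# Hypothetical subset of the checklist, used here as required fields.
REQUIRED_FIELDS = [
    "dataset_name", "file_names", "last_modified",
    "collected_by", "contact", "units", "code_definitions",
]


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "dataset_name": "Example counts 2010",          # made-up value
    "file_names": ["Eaffinis_nanaimo_2010_counts.xls"],
    "last_modified": "20101115",                    # made-up value, YYYYMMDD
    "collected_by": "A. Researcher",                # made-up value
    "contact": "researcher@example.org",            # made-up value
    "units": {"count": "individuals per sample"},   # made-up value
    "code_definitions": {},                         # left empty on purpose
}

gaps = missing_fields(record)
readme_text = json.dumps(record, indent=2, sort_keys=True)
```

Here `gaps` lists `code_definitions`, a reminder that codes used in the data still need to be defined before the record is complete.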

WORKFLOW

Simple: Flow chart

Temperature data + Salinity data → Data import into Excel → Data in spreadsheet → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production

Simple: Commented script

Resulting output
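A commented script for the temperature and salinity workflow might look like the sketch below. The readings, the -999.0 missing-value sentinel, and the plausibility limits are all assumptions made for illustration.

```python
import statistics

# Step 1: import. In practice these rows would be read from the raw file;
# the values here are made up.
readings = [
    {"temperature": 11.2, "salinity": 30.1},
    {"temperature": 11.8, "salinity": 29.9},
    {"temperature": -999.0, "salinity": 30.4},  # assumed sentinel for a missing value
    {"temperature": 12.4, "salinity": 30.0},
]


# Step 2: quality control. Drop rows outside plausible ranges (assumed limits).
def is_plausible(row):
    return -2.0 <= row["temperature"] <= 40.0 and 0.0 <= row["salinity"] <= 45.0


clean = [row for row in readings if is_plausible(row)]

# Step 3: analysis. Mean and sample standard deviation for each variable.
summary = {}
for variable in ("temperature", "salinity"):
    values = [row[variable] for row in clean]
    summary[variable] = {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }

# Step 4: output. Print a small table; a plotting step would follow here.
for variable, stats in summary.items():
    print(f"{variable}: n={stats['n']} mean={stats['mean']:.2f} sd={stats['sd']:.2f}")
```

The comments double as workflow documentation: each numbered step corresponds to one box in a flow chart of the same process.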

More Fancy: Kepler, Taverna

From Flickr by cogdog

BACKING UP: 3 places, 3 ways

From Flickr by lippo

From Flickr by see phar

Original

Near

Far

• What software?
• What hardware?
• What personnel?

• How often?
• Set up reminders!

Test system
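One simple way to test a backup is to compare checksums of the original and the copy. The sketch below uses temporary folders to stand in for the "original" and "near" locations; the file name and contents are made up.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original, copy):
    """True when the backup copy is byte-for-byte identical to the original."""
    return sha256_of(original) == sha256_of(copy)


# Demonstration: copy a made-up file from the "original" to the "near" folder.
original_dir = Path(tempfile.mkdtemp())
near_dir = Path(tempfile.mkdtemp())
original = original_dir / "MasterData.csv"
original.write_text("site,count\nA,12\n")
near_copy = Path(shutil.copy2(original, near_dir))
ok_after_copy = backup_matches(original, near_copy)

# Simulate silent corruption of the backup, then test again.
near_copy.write_text("site,count\nA,13\n")
ok_after_damage = backup_matches(original, near_copy)
```

A check like this, run on a schedule, catches a copy that exists but no longer matches, which a glance at the folder would miss.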

SHARING

Repositories
• Institutional
• Disciplinary
• Journal
• re3data.org

Sustainable formats
• Open, non-proprietary
• Commonly used in your discipline
• Not encrypted or compressed

Review your DMP
• Did you do what you said you would?

Photo credit Michael Ham

How Do I Learn More?

• Funding Mandates:
  http://chronicle.com/article/Where-Should-You-Keep-Your/231065/
  http://datapub.cdlib.org/2013/02/28/the-new-ostp-policy-what-it-means/

• File Naming Conventions: http://www.exadox.com/en/articles/file-naming-convention-ten-rules-best-practice

• Folder Structures: http://www.damlearningcenter.com/resources/articles/best-practices-for-folder-organization/

• Metadata: http://www.dcc.ac.uk/resources/metadata-standards

• DataONE Primer: https://www.dataone.org/best-practices

• Software Carpentry: http://software-carpentry.org/

• Research Data Alliance: https://rd-alliance.org/

• Your Library: http://guides.lib.washington.edu/dmg

Tools

• Data Mgmt Planning:
  DMPTool https://dmptool.org/

• Metadata:
  Morpho https://www.dataone.org/software-tools/morpho
  NOAA MERMaid http://www.ncddc.noaa.gov/metadata-standards/mermaid/

• Workflows:
  Kepler https://kepler-project.org/
  Taverna http://www.taverna.org.uk/

• Sharing:
  re3data http://www.re3data.org/
  GitHub https://github.com/

• Miscellaneous:
  EZID http://ezid.cdlib.org/
  ImpactStory https://impactstory.org/
  ORCID http://orcid.org/

Any Other Questions?

Stephanie Wright

Web data.blogspot.com

Twitter @UWLibsData

Email swright@uw.edu