Data Stewardship for Scientists, for CLIR Postdoc Workshop
Data Stewardship for Researchers
Carly Strasser, PhD California Digital Library
@carlystrasser [email protected]
31 July 2013 CLIR Symposium
From Calisphere, Courtesy of UC Riverside, California Museum of Photography
Tips, Tools, & Guidance
From Calisphere, Courtesy of Thousand Oaks Library
Roadmap
1. Background
2. Why you should care
3. Best practices
4. Toolbox
NSF funded DataNet Project Office of Cyberinfrastructure
Two main goals:
1. Build a network for data repositories
2. Build community around data
Focus on Earth | environmental | ecological | oceanographic data
Why don’t people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?
What barriers to sharing can we eliminate?
What role can libraries play in data education?
Why is data management a hot topic?
From Flickr by Velo Steve
Back in the day…
Da Vinci
Curie
Newton
classicalschool.blogspot.com
Darwin
Digital data
From Flickr by Flickmor
From Flickr by US Army Environmental Command
From Flickr by DW0825
C. Strasser
Courtesy of WHOI
From Flickr by deltaMike
Digital data +
Complex workflows
From Flickr by ~Minnea~
Data management Documentation Reproducibility
From Flickr by iowa_spirit_walker
• Cost
• Confusion about standards
• Lack of training
• Fear of lost rights or benefits
• No incentives
THE TRUTH
From sandierpastures.com
RESEARCHERS NEED TO KNOW ABOUT
Data management
Metadata
Data repositories
Data sharing
From Flickr by johntrainor
Who cares?
From Flickr by hyperion327
From Flickr by Redden-McAllister
Back in February:
… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
1. Maximize free public access
2. Ensure researchers create data management plans
3. Allow costs for data preservation and access in proposal budgets
4. Ensure evaluation of data management plan merits
5. Ensure researchers comply with their data management plans
6. Promote data deposition into public repositories
7. Develop approaches for identification and attribution of datasets
8. Educate folks about data stewardship
From Flickr by Joe Crimmings Photography
From Flickr by twm1340
Culture Shift Ahead
science source notebook content access data government knowledge
From Flickr by cdsessums
flowingdata.com
Map of Scientific Collaborations
From Flickr by ~shorts and longs
Publications & Their Citation & data availability
Data are being recognized as first class products of research
From Flickr by Richard Moross
Data management plans
Data sharing mandates
Data publications
Data citation
From Flickr by torkildr
What should researchers be doing?
From Flickr by whatthefeed
NOT V
C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
Stable Isotope Data Sheet
Wash Cresc Lake
Peter's lab
Don't use - old data
Algal Washed Rocks
Dec. 16
Tray 004
Sampling Site / Identifier:
Sample Type:
Date:
Tray ID and Sequence:
Reference statistics:
SD for delta 13C = 0.07
SD for delta 15N = 0.15

Position  SampleID       Weight (mg)  %C     delta 13C  delta 13C_ca  %N    delta 15N  delta 15N_ca  Spec. No.  (unlabeled cells)
A1        ref            0.98         38.27  -25.05     -24.59        1.96  4.12       3.47          25354
A2        ref            0.98         39.78  -25.00     -24.54        2.03  4.01       3.36          25356
A3        ref            0.98         40.37  -24.99     -24.53        2.04  4.09       3.44          25358
A4        ref            1.01         42.23  -25.06     -24.60        2.17  4.20       3.55          25360      Shore Avg Con
A5        ALG01          3.05         1.88   -24.34     -23.88        0.17  -1.65      -2.30         25362      c -1.26 -27.22
A6        Lk Outlet Alg  3.06         31.55  -30.17     -29.71        0.92  0.87       0.22          25364      1.26 0.32
A7        ALG03          2.91         6.85   -21.11     -20.65        0.48  -0.97      -1.62         25366      c
A8        ALG05          2.91         35.56  -28.05     -27.59        2.30  0.59       -0.06         25368
A9        ALG07          3.04         33.49  -29.56     -29.10        1.68  0.79       0.14          25370
A10       ALG06          2.95         41.17  -27.32     -26.86        1.97  2.71       2.06          25372
B1        ALG04          3.01         43.74  -27.50     -27.04        1.36  0.99       0.34          25374      c
B2        ALG02          3            4.51   -22.68     -22.22        0.34  4.31       3.66          25376
B3        ALG01          2.99         1.59   -24.58     -24.12        0.15  -1.69      -2.34         25378      c
B4        ALG03          2.92         4.37   -21.06     -20.60        0.34  -1.52      -2.17         25380      c
B5        ALG07          2.9          33.58  -29.44     -28.98        1.74  0.62       -0.03         25382
B6        ref            1.01         44.94  -25.00     -24.54        2.59  3.96       3.31          25384
B7        ref            0.99         42.28  -24.87     -24.41        2.37  4.33       3.68          25386
B8        Lk Outlet Alg  3.04         31.43  -29.69     -29.23        1.07  0.95       0.30          25388
B9        ALG06          3.09         35.57  -27.26     -26.80        1.96  2.79       2.14          25390
B10       ALG02          3.05         5.52   -22.31     -21.85        0.45  4.72       4.07          25392
C1        ALG04          2.98         37.90  -27.42     -26.96        1.36  1.21       0.56          25394      c
C2        ALG05          3.04         31.74  -27.93     -27.47        2.40  0.73       0.08          25396
C3        ref            0.99         38.46  -25.09     -24.63        2.40  4.37       3.72          25398
                                                                     23.78 1.17
From Stephanie Hampton (2010) ESA Workshop on Best Practices
2 tables Random notes
From Stephanie Hampton
Wash Cres Lake Dec 15 Dont_Use.xls
From Stephanie Hampton
[Same data sheet as above, now with regression output pasted beside the data rows]
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.283158
R Square            0.080178
Adjusted R Square   -0.022024
Standard Error      1.906378
Observations        11
ANOVA
            df  SS        MS        F         Significance F
Regression  1   2.851116  2.851116  0.784507  0.398813
Residual    9   32.7085   3.634278
Total       10  35.55962
              Coefficients  Standard Error  t Stat     P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
Intercept     -4.297428     4.671099        -0.920003  0.381568  -14.8642   6.269341   -14.8642     6.269341
X Variable 1  -0.158022     0.17841         -0.885724  0.398813  -0.561612  0.245569   -0.561612    0.245569
Random stats output
From Stephanie Hampton
[Same sheet again, now with a transposed copy of some rows and an embedded chart]
SampleID      ALG03   ALG05   ALG07   ALG06   ALG04   ALG02   ALG01   ALG03   ALG07
Weight (mg)   2.91    2.91    3.04    2.95    3.01    3       2.99    2.92    2.9
%C            6.85    35.56   33.49   41.17   43.74   4.51    1.59    4.37    33.58
delta 13C     -21.11  -28.05  -29.56  -27.32  -27.50  -22.68  -24.58  -21.06  -29.44
delta 13C_ca  -20.65  -27.59  -29.10  -26.86  -27.04  -22.22  -24.12  -20.60  -28.98
%N            0.48    2.30    1.68    1.97    1.36    0.34    0.15    0.34    1.74
delta 15N     -0.97   0.59    0.79    2.71    0.99    4.31    -1.69   -1.52   0.62
delta 15N_ca  -1.62   -0.06   0.14    2.06    0.34    3.66    -2.34   -2.17   -0.03
[Scatter plot, single series "Series1", no axis titles; horizontal axis ticks from -35.00 to 0.00, vertical axis ticks from -3.00 to 4.00]
From Stephanie Hampton
From Flickr by whatthefeed
What should researchers be doing?
Best Practices
data management
1. Planning
2. Data collection & organization
3. Quality control & assurance
4. Metadata
5. Workflows
6. Data stewardship & reuse
From Flickr by Big Swede Guy
2. Data collection & organization
Create unique identifiers
• Decide on naming scheme early
• Create a key
• Different for each sample
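One way to act on the unique-identifier advice is to generate IDs from a fixed scheme and check them for repeats. The sketch below is only an illustration: the scheme (site code, collection date, running number) and both function names are assumptions, not something the talk prescribes.

```python
from collections import Counter


def make_sample_id(site: str, date: str, number: int) -> str:
    """Build an ID such as 'WCL-20091216-004' (hypothetical naming scheme)."""
    return f"{site.upper()}-{date}-{number:03d}"


def find_duplicates(sample_ids):
    """Return the IDs that occur more than once, in sorted order."""
    counts = Counter(sample_ids)
    return sorted(sid for sid, n in counts.items() if n > 1)


# Made-up example: three samples plus one accidental repeat.
ids = [make_sample_id("wcl", "20091216", n) for n in (1, 2, 3)]
ids.append(make_sample_id("wcl", "20091216", 2))
print(find_duplicates(ids))
```

Deciding the scheme in one function keeps every ID in the same shape, and the duplicate check can run each time new samples are logged.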
From Flickr by sjbresnahan
From Flickr by zebbie
Standardize
• Consistent within columns: only numbers, dates, or text
• Consistent names, codes, formats
Modified from K. Vanderbilt
From Pink Floyd, The Wall, themurkyfringe.com
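A quick automated check can flag columns that mix numbers with text. This is a minimal sketch using made-up rows; the column names and the helper functions are hypothetical.

```python
def is_number(value: str) -> bool:
    """True if the text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def mixed_columns(rows):
    """Return names of columns that mix numeric and non-numeric entries."""
    mixed = []
    for name in rows[0]:
        # Blank cells are ignored here; missing values are a separate check.
        kinds = {is_number(row[name]) for row in rows if row[name] != ""}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


# Made-up rows, as they might come out of a spreadsheet export.
rows = [
    {"sample_id": "ALG01", "weight_mg": "3.05"},
    {"sample_id": "ALG02", "weight_mg": "3"},
    {"sample_id": "ALG03", "weight_mg": "approx 3"},  # text in a numeric column
]
print(mixed_columns(rows))
```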
Google Docs Forms
Standardize
• Reduce possibility of manual error by constraining entry choices
Modified from K. Vanderbilt
Excel lists
Data validation
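The same idea as form pick lists and spreadsheet data validation can be sketched in a script: entries are checked against a fixed set of allowed choices before they are accepted. The allowed values and field names below are hypothetical.

```python
ALLOWED_SAMPLE_TYPES = {"algae", "sediment", "water"}  # hypothetical pick list


def validate_entry(record):
    """Return a list of problems with one data-entry record (empty if none)."""
    errors = []
    if record.get("sample_type") not in ALLOWED_SAMPLE_TYPES:
        allowed = ", ".join(sorted(ALLOWED_SAMPLE_TYPES))
        errors.append("sample_type must be one of: " + allowed)
    try:
        weight = float(record.get("weight_mg", ""))
    except (TypeError, ValueError):
        errors.append("weight_mg must be a number")
    else:
        if weight <= 0:
            errors.append("weight_mg must be greater than zero")
    return errors


print(validate_entry({"sample_type": "algae", "weight_mg": "3.05"}))
print(validate_entry({"sample_type": "Algea", "weight_mg": "three"}))
```

Rejecting a misspelled category at entry time is cheaper than finding it later during analysis.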
Create a parameter table (from doi:10.3334/ORNLDAAC/777)
Create a site table (from doi:10.3334/ORNLDAAC/777)
From R Cook, ESA Best Practices Workshop 2010
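A parameter table and a site table let each observation carry only short codes, with the details stored once. The sketch below uses invented codes, names, and coordinates purely for illustration; none of it is taken from the cited data set.

```python
# Parameter table: one entry per measured variable (illustrative values).
parameters = {
    "temp": {"description": "Water temperature", "units": "degrees C"},
    "sal": {"description": "Salinity", "units": "PSU"},
}

# Site table: one entry per sampling site (illustrative values).
sites = {
    "S1": {"name": "Site one", "latitude": 49.2, "longitude": -123.9},
}

# Observations refer to the tables above by code only.
observations = [
    {"site": "S1", "parameter": "temp", "value": 11.4},
    {"site": "S1", "parameter": "sal", "value": 28.0},
]


def describe(obs):
    """Expand one observation using the site and parameter tables."""
    site = sites[obs["site"]]
    param = parameters[obs["parameter"]]
    return f"{param['description']} at {site['name']}: {obs['value']} {param['units']}"


for obs in observations:
    print(describe(obs))
```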
Use descriptive file names
• Unique
• Reflect contents
From R Cook, ESA Best Practices Workshop 2010
Bad: Mydata.xls, 2001_data.csv, best version.txt
Better: Eaffinis_nanaimo_2010_counts.xls
  Eaffinis = study organism
  nanaimo = site name
  2010 = year
  counts = what was measured
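A naming convention like the "Better" example can be checked mechanically. The sketch below assumes the four-part pattern organism_site_year_measurement; the regular expression and the function name are illustrative, not part of the original slides.

```python
import re

# Assumed pattern: organism_site_year_measurement.extension
FILENAME_PATTERN = re.compile(
    r"^(?P<organism>[A-Za-z]+)_(?P<site>[A-Za-z]+)_"
    r"(?P<year>\d{4})_(?P<measured>[A-Za-z]+)\.(?P<ext>\w+)$"
)


def parse_filename(name):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(name)
    return match.groupdict() if match else None


for name in ["Eaffinis_nanaimo_2010_counts.xls", "Mydata.xls", "best version.txt"]:
    print(name, "->", parse_filename(name))
```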
*Not for everyone
*
Organize files logically
Biodiversity
  Lake
    Experiments
      Biodiv_H20_heatExp_2005to2008.csv
      Biodiv_H20_predatorExp_2001to2003.csv
      …
    Field work
      Biodiv_H20_PlanktonCount_2001toActive.csv
      Biodiv_H20_ChlAprofiles_2003.csv
      …
  Grassland
From S. Hampton
Preserve information
• Keep raw data raw
• Use scripts to process data & save them with data
Raw data as .csv
R script for processing & analysis
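The slide pairs raw .csv data with an R script; the sketch below shows the same pattern in Python instead, as an assumption-laden illustration. The raw file is only ever read, and the cleaned result goes to a separate file. File names, the `value` column, and the cleaning rule are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def process_raw(raw_path: Path, out_path: Path) -> int:
    """Write rows with a numeric 'value' to out_path; the raw file is only read."""
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            try:
                float(row["value"])
            except (TypeError, ValueError):
                continue  # skipped here, but still present in the raw file
            writer.writerow(row)
            kept += 1
    return kept


# Demonstration in a temporary folder with made-up data.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw_counts.csv"
raw.write_text("sample_id,value\nALG01,3.05\nALG02,n/a\nALG03,2.91\n")
raw_before = raw.read_text()
kept_rows = process_raw(raw, workdir / "processed_counts.csv")
print(kept_rows, raw.read_text() == raw_before)
```

Because every change lives in the script, the processed file can be regenerated from the untouched raw file at any time.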
3. Quality control and quality assurance
Before data collection
• Define & enforce standards
• Assign responsibility for data quality
From Flickr by StacieBee
After data entry
• Check for missing, impossible, anomalous values
• Perform statistical summaries
• Look for outliers
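The post-entry checks listed above can be sketched as one small report function. The plausible range, the 1.5 × IQR outlier rule, and the example values are assumptions chosen for illustration; other rules may suit a given data set better.

```python
import statistics


def quality_report(values, lower, upper):
    """Summarize a column; None marks a missing entry, lower/upper bound the plausible range."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    return {
        "missing": len(values) - len(present),
        "impossible": [v for v in present if not lower <= v <= upper],
        "outliers": [v for v in present
                     if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr],
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),
    }


# Made-up percent-carbon values: one missing entry and one impossible value.
percent_carbon = [38.27, 39.78, 40.37, None, 42.23, 140.0, 41.17]
report = quality_report(percent_carbon, lower=0.0, upper=100.0)
for key, value in report.items():
    print(key, value)
```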
Why are you promoting Excel?
4. Metadata basics
What is metadata?
• Digital context
  • Name of the data set
  • The name(s) of the data file(s) in the data set
  • Date the data set was last modified
  • Example data file records for each data type file
  • Pertinent companion files
  • List of related or ancillary data sets
  • Software (including version number) used to prepare/read the data set
  • Data processing that was performed
• Personnel & stakeholders
  • Who collected
  • Who to contact with questions
  • Funders
• Scientific context
  • Scientific reason why the data were collected
  • What data were collected
  • What instruments (including model & serial number) were used
  • Environmental conditions during collection
  • Where collected & spatial resolution
  • When collected & temporal resolution
  • Standards or calibrations used
• Information about parameters
  • How each was measured or produced
  • Units of measure
  • Format used in the data set
  • Precision & accuracy if known
• Information about data
  • Definitions of codes used
  • Quality assurance & control measures
  • Known problems that limit data use (e.g. uncertainty, sampling problems)
  • How to cite the data set
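A few of the items above can be captured in a simple machine-readable record and checked for gaps. This is a minimal sketch only: plain JSON with hypothetical field names and made-up values, not a formal metadata standard.

```python
import json

# Hypothetical subset of the checklist above that must be filled in.
REQUIRED_FIELDS = [
    "title", "files", "last_modified", "collected_by",
    "contact", "units", "how_to_cite",
]


def missing_fields(metadata):
    """Return required fields that are absent or left empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


# Made-up record with two fields still to be completed.
metadata = {
    "title": "Example stable isotope data set",
    "files": ["isotopes_2010.csv"],
    "last_modified": "2010-12-16",
    "collected_by": "A. Researcher",
    "contact": "",
    "units": {"weight": "mg", "delta_13C": "per mil"},
    "how_to_cite": "",
}
print(json.dumps(metadata, indent=2))
print(missing_fields(metadata))
```

Saving such a record next to the data file keeps the description and the data together.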
• Provides structure to describe data
Common terms | definitions | language | structure
• Lots of different standards: EML, FGDC, ISO 19115, Darwin Core, …
• Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
Select the appropriate standard
5. Workflows
Workflow: how you get from the raw data to the final products of your research
Simple workflows: flow charts
Temperature data, Salinity data → Data import into R → Data in R format → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics, Graph production
Simple workflows: commented scripts
• R, SAS, MATLAB
• Well-documented code is…
  Easier to review
  Easier to share
  Easier to repeat analysis
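A commented script for a temperature and salinity workflow might look like the sketch below. It is written in Python rather than the R, SAS, or MATLAB named on the slide, and the data values, column names, and the -999 missing-value code are all assumptions made for illustration.

```python
import csv
import io
import statistics

# Stand-in for a raw file on disk; values are made up.
RAW_CSV = """temperature_c,salinity_psu
11.2,28.1
11.6,27.9
,28.4
12.1,-999
"""
MISSING_CODE = -999.0  # assumed code for a failed reading

# Step 1: import the raw temperature and salinity data.
raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))

# Step 2: quality control and cleaning. Keep only rows where both values
# are present and neither is the missing-value code.
clean_rows = []
for row in raw_rows:
    try:
        t = float(row["temperature_c"])
        s = float(row["salinity_psu"])
    except ValueError:
        continue
    if MISSING_CODE in (t, s):
        continue
    clean_rows.append((t, s))

# Step 3: analysis (mean and SD) on the cleaned data.
temps = [t for t, _ in clean_rows]
sals = [s for _, s in clean_rows]
summary = {
    "temperature_c": (statistics.mean(temps), statistics.stdev(temps)),
    "salinity_psu": (statistics.mean(sals), statistics.stdev(sals)),
}

# Step 4: report summary statistics (a graph would be produced here too).
for name, (mean, sd) in summary.items():
    print(f"{name}: mean={mean:.2f}, sd={sd:.2f}")
```

Each comment marks one step of the workflow, so a reader can follow the path from raw data to summary statistics without running anything.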
Fancy Schmancy workflows: Kepler
Resulting output
https://kepler-project.org
Workflows enable…
Reproducibility: can someone independently validate findings?
Transparency: others can understand how you arrived at your results
Executability: others can re-run or re-use your analysis
From Flickr by merlinprincesse
Coming Soon: workflow sharing requirements!
6. Data stewardship & reuse
Use stable formats: csv, txt, tiff
Create back-up copies: original, near, far
Periodically test ability to restore information
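Testing a back-up can be as simple as comparing checksums of the original and the copy. The sketch below assumes a local folder stands in for the "near" copy; the file names and function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(original: Path, backup_dir: Path) -> bool:
    """Copy a file into backup_dir and confirm the copy matches the original."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy = backup_dir / original.name
    shutil.copy2(original, copy)
    return sha256_of(copy) == sha256_of(original)


# Demonstration in a temporary folder with a made-up data file.
workdir = Path(tempfile.mkdtemp())
original = workdir / "raw_counts.csv"
original.write_text("sample_id,count\nALG01,12\n")
restored_ok = backup_and_verify(original, workdir / "near_copy")
print(restored_ok)
```

Storing the checksum alongside the data also allows the same comparison to be repeated later, when a restore is tested.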
Modified from R. Cook
Store your data in a repository
Institutional archive
Discipline/specialty archive
From Flickr by torkildr
Ask a librarian
Repos of repos:
databib.org
re3data.org
Practice Data Citation
• Allows readers to find data products
• Get credit for data and publications
• Promotes reproducibility
• Better measure of research impact
Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Persistent Unique Identifier: doi:10.5061/dryad.20
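The parts of a data citation can be assembled consistently with a small helper. The sketch below reproduces the layout of the Dryad example; the function names are hypothetical, and the only outside fact used is that a DOI resolves through https://doi.org/.

```python
def format_citation(author, year, title, repository, doi):
    """Assemble a data citation in the same layout as the Dryad example."""
    return f"{author} {year}. {title}. {repository}. doi:{doi}"


def doi_url(doi):
    """Turn a DOI into a resolvable link."""
    return f"https://doi.org/{doi}"


citation = format_citation(
    author="Sidlauskas, B.",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological "
           "diversification in the absence of a detailed phylogeny: "
           "a case study from characiform fishes"),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(citation)
print(doi_url("10.5061/dryad.20"))
```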
What is a data management plan?
A document that describes what you will do with your data throughout the research project
From Flickr by Barbies Land
DMP for funders: A short plan submitted alongside grant applications
But they all have different requirements and express them in different ways
From Flickr by 401(K) 2013
An outline of
– what will be collected
– methods
– standards
– metadata
– sharing/access
– long-term storage
Includes how and why
NSF DMP Requirements
From Grant Proposal Guidelines:
DMP supplement may include:
1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. policies and provisions for re-use, re-distribution, and the production of derivatives
5. plans for archiving data, samples, and other research products, and for preservation of access to them
1. Types of data & other information
• Types of data
• Existing data
• How/when/where created?
• How processed?
• Quality control
• Security
• Who is responsible
biology.kenyon.edu
C. Strasser
From Flickr by Lazurite
Wired.com
2. Data & metadata standards
• Metadata needed
• How captured
• Standards
3. Policies for access & sharing
4. Policies for re-use & re-distribution
• Obligation to share
• How/when/where available
• Getting access
• Copyright / IP
• Permission restrictions
• Embargo periods
• Ethics/privacy
• How cited
From Flickr by maryfrancesmain
5. Plans for archiving & preservation
• What & where
• Metadata
• Who’s responsible
From Flickr by theManWhoSurfedTooMuch
Don’t forget the budget
dorrvs.com
NSF’s Vision*
DMPs and their evaluation will grow & change over time
Peer review will determine next steps
Community-driven guidelines
Evaluation will vary with directorate, division, & program officer
*Unofficially
From Flickr by celikins
Where to start?
From Flickr by Andy Graulund
Make a resolution
• Triage on current projects
• Get advisor, lab mates, collaborators on board
• Do better next time
Start working online
From Flickr by karindalziel
E-notebooks
Online science
http://datapub.cdlib.org/software-for-reproducibility-part-2-the-tools/
Reproducibility
From Flickr by dipster1
Toolbox
Step-by-step wizard for generating DMP
create | edit | re-use | share
Free & open to community
dmptool.org Write a DMP
databib.org
Where should I put my data?
Find a repository
Get help
From Flickr by thewmatt
Get help from your library
From Flickr by North Carolina Digital Heritage Center
From Flickr by Madison Guy
NSF funded DataNet Project Office of Cyberinfrastructure
www.dataone.org
Get help
• Data Education Tutorials
• Database of best practices & software tools
• Primer on data management
• Investigator Toolkit
www.dataone.org
From Flickr by Skakerman
A word about Metrics…
Articles are the butterfly pinned on the wall. Pretty but not very useful. They are only the advertisements for scholarship. – A. Levi, U. Maryland College of Information Studies
From Flickr by LisaW123
How to incentivize good data stewardship?
Data Citation
Altmetrics (Alternative Metrics)
From Flickr by chriscook04
From Flickr by dotpolka
Doing science is a privilege – not a right
There is a social contract of science: we have an obligation to ensure dissemination, validation, & advancement.
To not do so is science malpractice.
Who's responsible? Researchers, publishers, libraries, repositories…
– Brian Hole, Ubiquity Press at UCL
From Flickr by mikerosebery
From Flickr by Michael Tinkler
Data Pub Blog: datapub.cdlib.org
My website: carlystrasser.net
Email me: [email protected]
Tweet me: @carlystrasser
My slides: slideshare.net/carlystrasser