Data and Donuts: The Impact of Data Management
The Impact of Data Management
C. Tobin Magle, PhD
Sept. 29, 2016, 9:00-10:00 a.m.
Morgan Library Computer Classroom 173
Data management != data sharing, but the same principles apply to both
Why should I care about data management?
Rinehart, A. K. “Getting emotional about data.” College & Research Libraries News 76, no. 8 (September 2015): 437-440.
Everything* is digital
• Needs new skills
• Data are ephemeral
• Facilitates sharing
*OK, not everything, but most things
More researchers
https://www.nsf.gov/statistics/2016/nsf16300/digest/nsf16300.pdf
[Figure panels: Working email; Response (if email working); Status of data (if response); Data are extant (if status known)]
doi:10.1016/j.cub.2013.11.014
We are losing vast amounts of data
Research funding is tight
http://www.bu.edu/research/articles/funding-for-scientific-research/
Funders want to do more with less
http://figshare.com/blog/2015_The_year_of_open_data_mandates/143
White House Office of Science and Technology Policy (OSTP), 2013
“The Obama Administration is committed to the proposition that citizens deserve easy access to the results of research their tax dollars have paid for. That’s why, in a policy memorandum released today, OSTP Director John Holdren has directed Federal agencies with more than $100M in R&D expenditures to develop plans to make the results of federally funded research freely available to the public—generally within one year of publication.”
http://www.whitehouse.gov/blog/2013/02/22/expanding-public-access-results-federally-funded-research
NSF post-award requirements
“Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing.”
http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/aag_6.jsp#VID4
In other words…
It’s good for science
• Improves research reproducibility
• Improves efficiency
• Spurs innovation
It’s good for you
• You are the future data user
• Your data get used (and cited)
• Exposure to collaborators
• More competitive grants
But wait…
Barriers to data sharing
“But it’s mine, I don’t want to share!”
• Usually funded by public money
• See the White House statement
• If you work for CSU, the university actually owns your data
• You are the steward
• CSU promotes open data
“But my data are too small to be useful”
“But I work with sensitive/private data”
• CAN share deidentified data
• CAN share summary data
  • https://clinicaltrials.gov/
• Controlled access
  • See dbGaP @ NCBI re: NIH genomic data sharing policy
• Release metadata so people know the data exist and can ask for it
• “Identifying personal genomes by surname inference”
  • https://www.ncbi.nlm.nih.gov/pubmed/23329047
“But I’m planning to apply for a patent!”
• OK, data sharing isn’t right for you
• But good data management practices have benefits even if you don’t share!
• Can share later
What is data management?
The policies, practices and procedures needed to manage the storage, access and preservation of data produced from a research project
Where does data management fit into research?
Throughout the whole research cycle
The research cycle
[Diagram labels: Hypothesis; Experimental design; Data Management Plans; Raw data; Cleaning; Tidy Data; Analysis; Results; Article; Sharing; Open Data; Closed Data; Archiving; Code; Reproducible Research; Reuse]