Developing data services: a tale from two Oregon universities
NN/LM, Pacific Northwest Region, PNR Rendezvous | 18 June 2014
Melissa Haendel, OHSU Library
Amanda Whitmire, OSU Libraries
About Amanda…
Not a librarian.
B.S. in Aquatic Biology, 2000. Worked in a bioluminescence laboratory
Ph.D. in Oceanography (emphasis in biological oceanography), 2008. Dissertation study area: bio-optics, using optical tools to study ocean ecology (N. California Current)
Post-doc in Oceanography (emphasis in biological oceanography), 2008-2012. Study area: bio-optics, using optical tools to study ocean ecology in low oxygen zones (N. Chile)
Assistant Professor, Data Management Specialist, Sept. 2012 – present
About Melissa…
Not a librarian.
B.A. in Chemistry, 1990. Modeled drug-receptor ligand binding
Ph.D. in Neuroscience, 1999. Dissertation study area: identification of novel genes involved in neural development in the mouse
Post-doc, 2000-2002. Study area: role of thyroid hormone during neural cell death in zebrafish
Post-doc, 2002-2004. Study area: toxic effects of biocides in zebrafish and salmon
Post-doc, 2002-2004. Study area: ontologies, data models, gene nomenclature, biocuration
Assistant Professor, Library, 2010 – present. Leads a semantic research team
Questions
Do you have any data-related tasks or responsibilities in your job description or duties? [Yes/No]
What role do you believe metadata plays in the modern research cycle? [big, small, none, other]
Why data management?
The researcher perspective
Why libraries? Why bring in non-librarians?
Amanda & Melissa share their experiences
Wrap-up
image credit: http://www.flickr.com/photos/54803625@N08/8296296949/
What is research data?
Research data is:
“…the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”
U.S. Office of Management and Budget, Circular A-110
“Unlike other types of information, research data are collected, observed, or created, for the purposes of analysis to produce and validate original research results.”
University of Edinburgh, MANTRA Research Data Management Training, ‘Research Data Explained’
Why data management?
Data management: actions that contribute to effective storage, use, preservation, and reuse of data and documentation throughout the research lifecycle.
Images collected by DataONE.org
Photo courtesy of www.carboafrica.net
Data deluge
Data is collected from sensors, sensor networks, remote sensing, observations, and more; this calls for increased attention to data management and stewardship.
Photo courtesy of http://modis.gsfc.nasa.gov/
Photo courtesy of http://www.futurlec.com
CC image by tajai on Flickr
CC image by CIMMYT on Flickr
Image collected by Viv Hutchinson
Slide credit: http://www.dataone.org/education-modules
Federal movement toward open data
1985: National Research Council
1999: OMB Circular A-110 revisions
2003: NIH Data Sharing Policy
2008: NIH Public Access Policy
2011: NSF DMP requirement
2012: NEH, Office of Digital Humanities DMP requirement
2013: NSF bio-sketch change
2013: OSTP memo on public access to the results of federally funded research
More funder mandates are coming
22 Feb. 2013
The memorandum states that “digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” To this end, federal agencies must create a public access plan that includes the following mandates:
• Maximize public access to data while protecting personal privacy and confidentiality, intellectual property, and balancing costs with long-term benefits;
• Ensure that investigators create data management plans that describe strategies for long-term preservation of and access to data;
• Allow costs of data management to be included in proposal budgets;
• Ensure that the merits of data management plans are properly evaluated;
• Implement mechanisms to ensure that investigators comply with their data management plans and policies;
• Promote deposition of data into publicly accessible repositories;
• Encourage private and public cooperation to improve data access and interoperability;
• Develop and standardize approaches to data citation/attribution;
• Support training in data management best practices;
• Assess needs and strategies for the long-term preservation of data.
Journal data policies
Information propagation tales: The researcher’s perspective
Data isn’t always what it seems
Assertion:
“β amyloid, known for its role in injuring brain in Alzheimer’s disease, is also produced by and injures skeletal muscle fibres in the muscle disease sporadic inclusion body myositis.”
Greenberg 2009, BMJ 2009;339:b2680, doi:10.1136/bmj.b2680
All 242 papers trace back to 4 papers from the same lab, and very few cite the papers with negative results.
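Greenberg's finding is essentially a statement about paths in a citation graph. The sketch below is a hypothetical illustration, not Greenberg's method or data: the paper names and edges are invented, and any paper that cites nothing in the graph is assumed to be a primary-data paper. It follows each paper's citations back to those primary papers and counts how often each one is reached; the uncited negative result never shows up.

```python
from collections import Counter

# Hypothetical toy citation graph (NOT Greenberg's data): paper -> papers it cites.
# Papers that cite nothing here stand in for primary-data papers.
CITES: dict[str, list[str]] = {
    "review_A": ["paper_1", "paper_2"],
    "review_B": ["review_A", "paper_3"],
    "paper_4": ["review_B"],
    "paper_5": ["paper_1"],
    "paper_1": [],
    "paper_2": [],
    "paper_3": [],
    "negative_result": [],  # primary paper that nothing in the graph cites
}


def primary_sources(paper: str, graph: dict[str, list[str]]) -> set[str]:
    """Follow citations from `paper` and return the primary papers it ultimately rests on."""
    seen: set[str] = set()
    roots: set[str] = set()
    stack = [paper]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # skip papers already visited (also guards against cycles)
        seen.add(current)
        cited = graph.get(current, [])
        if not cited and current != paper:
            roots.add(current)  # cites nothing: treated as primary data
        stack.extend(cited)
    return roots


def root_counts(graph: dict[str, list[str]]) -> Counter:
    """Count how many papers in the graph rest on each primary paper."""
    counts: Counter = Counter()
    for paper in graph:
        counts.update(primary_sources(paper, graph))
    return counts


print(root_counts(CITES).most_common())
# [('paper_1', 4), ('paper_2', 3), ('paper_3', 2)]
```

In a real analysis the graph would come from a citation database, and whether a paper reports primary data would have to be judged paper by paper rather than inferred from the graph's shape.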
How do we believe what we think we know?
Is it true or do we just believe it because everyone else does?
How do we transcend “follow the leader”? What tools can we build to help us?
How reproducible is science?
Let’s start simple.
Do we know what the ingredients were?
Journal guidelines for methods are often poor, and space is limited.
“All companies from which materials were obtained should be listed.” - A well-known journal
Reproducibility depends, at a minimum, on using the same resources. But…
How identifiable are resources in the published literature?
An experiment in reproducibility
Gather journal articles
5 domains: Immunology, Cell biology, Neuroscience, Developmental biology, General biology
3 impact factors: High, Medium, Low
84 Journals
248 papers
707 antibodies
104 cell lines
258 constructs
210 knockdown reagents
437 model organisms
Only ~50% of resources were identifiable (Vasilevsky et al., 2013, PeerJ)
There is no correlation between impact factor and resource identification
Figure: fraction of resources identified (0.0 to 1.0) versus journal impact factor (0 to 40), by resource type: antibodies, cell lines, constructs, knockdown reagents, organisms.
Maybe labs are just disorganized?
Meet the Urban Lab
A+ organization!
The Urban lab antibodies
Of 9 antibodies published in 5 articles, only 44% were identifiable
Figure: percent identifiable (0% to 100%) for the Urban lab antibodies, by criterion: commercial antibody identifiable, catalog number reported, source organism reported, target uniquely identifiable.
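The 44% figure can be checked directly: with only 9 antibodies, each one is worth about 11 percentage points, so 44% corresponds to 4 identifiable antibodies (3 would give 33%, and 5 would give 56%).

```latex
\frac{4}{9} \approx 0.444 \approx 44\%, \qquad \frac{3}{9} \approx 33\%, \qquad \frac{5}{9} \approx 56\%
```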
Resource information is not adequately getting into the literature, EVEN THOUGH IT IS READILY AVAILABLE.
The problem is a lack of standards, review, and tools.
LIBRARIES CAN HELP!
http://www.force11.org/Resource_Identification_Initiative
Numerous endorsers: https://www.force11.org/RII/SignUp
Implementation of the new standard: http://biosharing.org/bsg-000532
Resource Identification Portal
Sample citation: Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114
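Because the sample citation follows a fixed order (description, vendor, catalog number, RRID), it can be split into fields mechanically. A minimal sketch, assuming every citation uses exactly the comma-separated `Cat#` and `RRID:` layout of the sample; the regular expression is a simplification written for this example, not an official RRID grammar, and real RRID prefixes vary by resource type.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simplified pattern modeled on the sample citation (an assumption, not an
# official RRID grammar): <description>, <vendor>, Cat# <catalog>, RRID:<identifier>
CITATION_RE = re.compile(
    r"^(?P<description>.+?),\s*(?P<vendor>[^,]+),\s*Cat#\s*(?P<catalog>[^,]+),\s*"
    r"RRID:\s*(?P<rrid>[A-Za-z]+_[A-Za-z0-9_:]+)\s*$"
)


@dataclass
class ResourceCitation:
    description: str
    vendor: str
    catalog: str
    rrid: str


def parse_citation(text: str) -> Optional[ResourceCitation]:
    """Split a resource citation into its parts; return None if any part is missing."""
    match = CITATION_RE.match(text.strip())
    if match is None:
        return None
    return ResourceCitation(**{key: value.strip() for key, value in match.groupdict().items()})


sample = "Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114"
print(parse_citation(sample))
# ResourceCitation(description='Polyclonal rabbit anti-MAPK3 antibody', vendor='Abgent', catalog='AP7251E', rrid='AB_2140114')
```

A citation missing the catalog number or the RRID simply fails to match, which is the kind of gap a reviewer would want flagged.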
Publishing Workflow
1. Researcher submits a manuscript for publication.
2. Editor or publisher, OR LIBRARIAN!, asks for inclusion of RRIDs.
3. Author goes to the Resource Identification Portal to locate the RRID.
4. RRID is included in the Methods section and as a keyword.
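Steps 2 and 4 are mechanical enough that an editor, publisher, or librarian could script a first pass. A minimal sketch, assuming resource citations are available one per line and that RRIDs follow the simplified `RRID:<prefix>_<id>` shape of the sample citation; the function names are hypothetical, and the second reagent line is a made-up example that lacks an RRID.

```python
import re

# Simplified RRID pattern (assumption): "RRID:" + letter prefix + "_" + identifier.
RRID_RE = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_:]+)")


def missing_rrids(resource_lines: list[str]) -> list[str]:
    """Step 2: list the resource citations that still need an RRID."""
    return [line for line in resource_lines if not RRID_RE.search(line)]


def rrid_keywords(methods_text: str) -> list[str]:
    """Step 4: collect each RRID once, in order of first appearance, for the keyword list."""
    return list(dict.fromkeys(f"RRID:{rrid}" for rrid in RRID_RE.findall(methods_text)))


methods = [
    "Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114",
    "Mouse anti-tubulin antibody, ExampleVendor, Cat# X-123",  # hypothetical, no RRID yet
]
print(missing_rrids(methods))  # ['Mouse anti-tubulin antibody, ExampleVendor, Cat# X-123']
print(rrid_keywords("\n".join(methods)))  # ['RRID:AB_2140114']
```

A script like this can only check that an identifier is present; step 3, looking up the correct RRID in the portal, still needs the author.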
http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble
$1.3 million grant from the Laura and John Arnold Foundation to validate 50 landmark cancer biology studies
Partnership between Science Exchange, PLOS, figshare, Mendeley, and some of us scientists
Librarians can help researchers understand:
How to be critical of data and where it came from
Data provenance and how to meet data standards
That data may need to be reinterpreted when new information comes to light
That reproducibility depends on many things, including very basic things
Why both retrospective and prospective efforts are needed to ensure data quality, consistency, and utility
Amanda’s dissertation: The spectral backscattering properties of marine particles
Observations: ship-based sampling & moored instruments
Simulation results: scattering & absorption of light
Experimental: optical properties of phytoplankton cultures
Derived variables: endless things
Compiled observations: global oceanic bio-optical observations [self + from peers]
Reference: global oceanic bio-optical observations [NASA]
Why libraries?
OSU Libraries Digital Collections | http://oregondigital.org/u?/archives,31
image: http://www.beautiful-libraries.com/7200-1.html
Diagram: OSU colleges (Agricultural Sciences; Engineering; Education; Business; Liberal Arts; Public Health & Human Sciences; Veterinary Medicine; Science; Pharmacy; Forestry; Earth, Ocean & Atmospheric Sci.) and the Libraries
http://www.ala.org/acrl/sites/ala.org.acrl/files/content/publications/whitepapers/Tenopir_Birch_Allard.pdf
“Only a small minority of academic libraries in the United States and Canada currently offer research data services (RDS), but a quarter to a third of all academic libraries are planning to offer some services within the next two years.”
“Few academic libraries are responsible for developing research data policies. Being able to serve as a clearinghouse of ideas and to provide expertise to build these policies is an opportunity for libraries to be members of the knowledge creation process.”
“Reassigning existing library staff is the most common tactic for offering RDS.”
Our experiences
http://clubads.com/photos/custom/fish-OutOfWAter.jpg
Timeline of data services at OSU
Late 2011: UL & library admin. recognize the need for an RDS role on campus that requires a dedicated FTE
Sept. 2012: Data Management Specialist starts
Jan. 2013: Strategic Agenda in place*
Oct. 2013: Data survey launches
Jan. 2014: GRAD 521 launches
*Sutton, Shan; Barber, David; Whitmire, Amanda L. (2013): Oregon State University Libraries and Press Strategic Agenda for Research Data Services. Oregon State University Libraries. http://hdl.handle.net/1957/38794.
ESI
OSU Data stewardship survey
Interview by Sarah Abraham from The Noun Project
Responses to the question, “Please indicate whether or not you generate each of the following data format(s) as a part of your research process. Select Yes or No for each.” Color scale indicates what percentage of respondents in each college or unit selected ‘Yes’ for each data type. The number in each tile shows the number of faculty responses for that data type and college/unit.
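The heatmap's color value is a per-tile percentage. A minimal sketch of that calculation, assuming each survey answer is stored as a (college, data type, answer) record and that the tile number counts all responses for that college and data type; the records below are invented placeholders, not OSU survey results.

```python
from collections import defaultdict

# Hypothetical responses: (college or unit, data type, answer). NOT real survey data.
RESPONSES = [
    ("Science", "Images", "Yes"),
    ("Science", "Images", "No"),
    ("Science", "Spreadsheets", "Yes"),
    ("Forestry", "Images", "Yes"),
    ("Forestry", "Spreadsheets", "Yes"),
    ("Forestry", "Spreadsheets", "No"),
]


def heatmap_cells(responses):
    """Return {(college, data_type): (n_responses, percent_yes)} for each heatmap tile."""
    totals = defaultdict(int)
    yes = defaultdict(int)
    for college, data_type, answer in responses:
        key = (college, data_type)
        totals[key] += 1  # tile number: all responses for this college and data type
        if answer == "Yes":
            yes[key] += 1  # numerator for the color scale
    return {key: (n, 100.0 * yes[key] / n) for key, n in totals.items()}


for (college, data_type), (n, pct) in sorted(heatmap_cells(RESPONSES).items()):
    print(f"{college:10} {data_type:12} n={n} yes={pct:.0f}%")
```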
Scope of Data Services at OSU
Research
Analysis of data management plans as a means to inform and empower academic librarians in providing research data support. National Leadership Grant LG-07-13-0328, Oct 2014 – Sept 2015
Data management plans As a Research Tool: The DART Project
Consultations
Teaching: GRAD 521
Logistical Details
• http://bit.ly/GRAD521
• All course materials on figshare
• 2 credits
• Discipline-agnostic
• Offered annually, winter quarter
Topics covered
• Overview of RDM
• Types, formats & stages of data
• RDM planning
• Storage, backup & security
• Documentation & metadata
• Legal & ethical considerations
• Sharing & reuse
• Archive and preservation
Timeline of data activities at OHSU
Late 2009: OHSU library awarded eagle-i
Sept. 2012: Monarch Initiative awarded
April 2013: Beyond the PDF 1K challenge award
Oct. 2013: Data survey launches
Now: OHSU hiring CRIO position
ESI
NIH BD2K program
OHSU Data stewardship survey
Interview by Sarah Abraham from The Noun Project
Figure (axis 0% to 60%): responses to the question, “How do you reference your data when you publish, either in the context of a journal publication, or by direct publication of data sets?”
Are there any professional community standards in your research area regarding data management, sharing, storage, archiving, and/or producing metadata or other descriptive information that would apply to your research data?
Answer | Instructor | Assistant Professor, Research Assistant Professor, or Assistant Scientist | Associate Professor or Associate Scientist | Professor or Senior Scientist | Director, Division Head, Department Head | PostDoc/ResAssoc/PhD
Yes | 1 | 9 | 5 | 16 | 6 | 13
No | 1 | 8 | 9 | 15 | 1 | 10
I don't know | 1 | 19 | 13 | 14 | 4 | 19
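Summing the Yes / No / I don't know counts reported above across all positions gives a quick overall picture. A small sketch, assuming (as the answer options suggest) that these counts answer the community-standards question; the numbers are transcribed from the slide, and only the totals and percentages are computed here.

```python
# Counts transcribed from the OHSU survey slide, one list per answer. Columns:
# Instructor; Assistant Professor etc.; Associate Professor etc.;
# Professor or Senior Scientist; Director etc.; PostDoc/ResAssoc/PhD.
COUNTS = {
    "Yes": [1, 9, 5, 16, 6, 13],
    "No": [1, 8, 9, 15, 1, 10],
    "I don't know": [1, 19, 13, 14, 4, 19],
}


def answer_shares(counts: dict[str, list[int]]) -> dict[str, tuple[int, float]]:
    """Sum each answer across positions; return (total, percent of all responses)."""
    totals = {answer: sum(row) for answer, row in counts.items()}
    grand_total = sum(totals.values())
    return {answer: (t, round(100 * t / grand_total, 1)) for answer, t in totals.items()}


for answer, (total, pct) in answer_shares(COUNTS).items():
    print(f"{answer}: {total} ({pct}%)")
# Yes: 50 (30.5%)
# No: 44 (26.8%)
# I don't know: 70 (42.7%)
```

Across all 164 responses, “I don't know” is the most common answer, which fits the later point that libraries can train researchers to use metadata standards.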
Scope of Data Services at OHSU
Open houses, LibGuides, NIH proposals to improve data education, hosting fellows
New IR, research profiling tools
Participation in national efforts: BD2K, Force11, Galaxy, Biocuration Society
Data consults, collaborations
Consultations
Libraries, in summary…
1 | Can facilitate the creation of a smarter body of literature for future research
2 | Train researchers to utilize metadata standards to enable data reuse
3 | Facilitate researchers' understanding of available resources
PNW Research Data Geeks Group
Members from: Oregon Health & Science University, Oregon State University, University of Oregon, University of Idaho, University of Washington, Portland State University, Reed College
Join us @ bit.ly/pnwdatalibs
Also, we need a logo: free data science training for good suggestions!
http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg
How do you think libraries can best facilitate best practices in data management?