Developing data services: a tale from two Oregon universities

Developing data services: A tale from two Oregon universities. NN/LM, Pacific Northwest Region, PNR Rendezvous | 18 June. Melissa Haendel, Amanda Whitmire.

description

While the generation or collection of large, complex research datasets is becoming easier and less expensive all the time, researchers often lack the knowledge and skills that are necessary to properly manage them. Having these skills is paramount in ensuring data quality, integrity, discoverability, integration, reproducibility, and reuse over time. Librarians have been preserving, managing and disseminating information for thousands of years. As scholarly research is increasingly carried out digitally, and products of research have expanded from primarily text-based manuscripts to include datasets, metadata, maps, software code etc., it is a natural expansion of scope for libraries to be involved in the stewardship of these materials as well. This kind of evolution requires that libraries bring in faculty with new skills and collaborate more intimately with researchers during the research data lifecycle, and this is exactly what is happening in academic libraries across the country. In this webinar, two researchers-turned-data-specialists, both based in academic libraries, will share their experiences and perspectives on the development of research data services at their respective institutions. Each will share their perspective on the important role that libraries can play in helping researchers manage, preserve, and share their data.

Transcript of Developing data services: a tale from two Oregon universities

Page 1: Developing data services: a tale from two Oregon universities

Developing data services: A tale from two Oregon universities

NN/LM, Pacific Northwest Region, PNR Rendezvous | 18 June 2014

Melissa Haendel, OHSU Library

Amanda Whitmire, OSU Libraries

Page 2: Developing data services: a tale from two Oregon universities

B.S. in Aquatic Biology, 2000. Worked in a bioluminescence laboratory

Ph.D. in Oceanography, emphasis in biological oceanography, 2008. Dissertation study area: bio-optics; using optical tools to study ocean ecology (N. California Current)

Post-doc in Oceanography, emphasis in biological oceanography, 2008-2012. Study area: bio-optics; using optical tools to study ocean ecology in low oxygen zones (N. Chile)

Assistant Professor, Data Management Specialist, Sept. 2012 - present

About Amanda…

Not a librarian.

Page 3: Developing data services: a tale from two Oregon universities

B.A. in Chemistry, 1990. Modeled drug-receptor ligand binding

Ph.D. in Neuroscience, 1999. Dissertation study area: Identification of novel genes involved in neural development in the mouse

Post-doc, 2002-2004. Study area: Toxic effects of biocides in zebrafish and salmon

Assistant Professor, Library, 2010 – present. Lead semantic research team

About Melissa…

Not a librarian.

Post-doc, 2000-2002. Study area: Role of thyroid hormone during neural cell death in zebrafish

Post-doc, 2002-2004. Study area: Ontologies, data models, gene nomenclature, biocuration

?

Page 4: Developing data services: a tale from two Oregon universities

Do you have any data-related tasks or responsibilities in your job description or duties? [Yes/No]

What role do you believe metadata plays in the modern research cycle? [big, small, none, other]

Questions

Page 5: Developing data services: a tale from two Oregon universities

Why data management?

The researcher perspective

Why libraries?

Why bring in non-librarians?

Amanda & Melissa share their experiences

Wrap-up

image credit: http://www.flickr.com/photos/54803625@N08/8296296949/

Page 6: Developing data services: a tale from two Oregon universities

Research data is:

“…the recorded factual material commonly accepted in the scientific community as necessary to validate research findings.”

U.S. Office of Management and Budget, Circular A-110

Page 7: Developing data services: a tale from two Oregon universities

What is research data?

“Unlike other types of information, research data are collected, observed, or created, for the purposes of analysis to produce and validate original research results.”

University of Edinburgh, MANTRA Research Data Management Training, ‘Research Data Explained’

Page 8: Developing data services: a tale from two Oregon universities

Actions that contribute to effective storage, use, preservation, and reuse of data and documentation throughout the research lifecycle.

Data management:

Page 9: Developing data services: a tale from two Oregon universities

Why data management?

Page 10: Developing data services: a tale from two Oregon universities

Images collected by DataONE.org

Page 11: Developing data services: a tale from two Oregon universities

Photo courtesy of www.carboafrica.net

Data is collected from sensors, sensor networks, remote sensing, observations, and more - this calls for increased attention to data management and stewardship

Data deluge

Photo courtesy of http://modis.gsfc.nasa.gov/

Photo courtesy of http://www.futurlec.com

CC image by tajai on Flickr

CC image by CIMMYT on Flickr

Image collected by Viv Hutchinson

Slide credit: http://www.dataone.org/education-modules

Page 12: Developing data services: a tale from two Oregon universities

Federal movement toward open data

1985: National Research Council

1999: OMB Circular A-110 revisions

2003: NIH Data Sharing Policy

2008: NIH Public Access Policy

2011: NSF DMP requirement

2012: NEH, Office of Digital Humanities DMP requirement

2013: NSF bio-sketch change

2013: OSTP memo on public access to results of federally funded data

Page 13: Developing data services: a tale from two Oregon universities

More funder mandates are coming

22 Feb. 2013

Page 14: Developing data services: a tale from two Oregon universities

The memorandum states that, “digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding should be stored and publicly accessible to search, retrieve, and analyze.” To this end, federal agencies must create a public access plan that includes the following mandates:

• Maximize public access to data while protecting personal privacy and confidentiality, intellectual property, and balancing costs with long-term benefits;

• Ensure that investigators create data management plans that describe strategies for long-term preservation of and access to data;

• Costs of data management are included in proposal budgets;

• Ensure that the merits of data management plans are properly evaluated;

• Implement mechanisms to ensure that investigators comply with their data management plans and policies;

• Promote deposition of data into publicly accessible repositories;

• Encourage private and public cooperation to improve data access and interoperability;

• Develop and standardize approaches to data citation/attribution;

• Support training in data management best practices;

• Assess needs and strategies for the long-term preservation of data.

Page 15: Developing data services: a tale from two Oregon universities

Journal data policies

Page 16: Developing data services: a tale from two Oregon universities

Information propagation tales: The researcher’s perspective

Page 17: Developing data services: a tale from two Oregon universities

Data isn’t always what it seems

Page 18: Developing data services: a tale from two Oregon universities

Assertion:

“β amyloid, known for its role in injuring brain in Alzheimer’s disease, is also produced by and injures skeletal muscle fibres in the muscle disease sporadic inclusion body myositis.”

Greenberg 2009

Page 19: Developing data services: a tale from two Oregon universities

BMJ 2009;339:b2680 doi:10.1136/bmj.b2680

All 242 papers point to 4 from same lab, and very few to the ones with negative results

Greenberg, 2009

Page 20: Developing data services: a tale from two Oregon universities

How do we believe what we think we know?

Is it true or do we just believe it because everyone else does?

How do we transcend “follow the leader”? What tools can we build to help us?

Page 21: Developing data services: a tale from two Oregon universities

How reproducible is science?

Let’s start simple.

Do we know what the ingredients were?

Page 22: Developing data services: a tale from two Oregon universities

Journal guidelines for methods are often poor and space is limited

“All companies from which materials were obtained should be listed.” - A well-known journal

Reproducibility depends, at a minimum, on using the same resources. But…

Page 23: Developing data services: a tale from two Oregon universities

How identifiable are resources in the published literature?

An experiment in reproducibility

Gather journal articles

5 domains: Immunology, Cell biology, Neuroscience, Developmental biology, General biology

3 impact factors: High, Medium, Low

84 Journals

248 papers

707 antibodies

104 cell lines

258 constructs

210 knockdown reagents

437 model organisms

Page 25: Developing data services: a tale from two Oregon universities

There is no correlation between impact factor and resource identification

[Figure: fraction of resources identified (0.0 to 1.0) plotted against journal impact factor (0 to 40), with separate series for antibodies, cell lines, constructs, knockdown reagents, and organisms]

Page 26: Developing data services: a tale from two Oregon universities

Maybe labs are just disorganized?

Page 27: Developing data services: a tale from two Oregon universities

Meet the Urban Lab


Page 28: Developing data services: a tale from two Oregon universities

A+ organization!

The Urban lab antibodies

Page 29: Developing data services: a tale from two Oregon universities

Of 9 antibodies published in 5 articles, only 44% were identifiable

[Figure: percent identifiable (0% to 100%) for four criteria: commercial Ab identifiable, catalog number reported, source organism reported, target uniquely identifiable]

Page 30: Developing data services: a tale from two Oregon universities

Resource information is not adequately getting into the literature, EVEN THOUGH IT IS READILY AVAILABLE

The problem is a lack of standards, review, and tools

LIBRARIES CAN HELP!!!!!!

Page 31: Developing data services: a tale from two Oregon universities

http://www.force11.org/Resource_Identification_Initiative

Numerous endorsers: https://www.force11.org/RII/SignUp

Implementation of the new standard: http://biosharing.org/bsg-000532

Page 32: Developing data services: a tale from two Oregon universities

Resource Identification Portal

Sample citation: Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114

Publishing Workflow

1. Researcher submits a manuscript for publication

2. Editor or Publisher OR LIBRARIAN! asks for inclusion of RRID

3. Author goes to Resource Identification Portal to locate RRID

4. RRID is included in Methods section and as Keyword
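The sample citation above suggests a simple automated check that a librarian or editor could run on a methods section before asking authors for missing RRIDs. The sketch below is illustrative only: the regular expression is an assumed simplification of the RRID syntax, modeled on the sample citation, and `find_rrids` and `lines_missing_rrid` are hypothetical helper names, not part of any Resource Identification Initiative tool.

```python
import re

# Assumption: an RRID is written as "RRID:" followed by a registry prefix,
# an underscore, and a local identifier (e.g. "RRID:AB_2140114").
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_]+)")


def find_rrids(text):
    """Return every RRID-style identifier found in a block of text."""
    return RRID_PATTERN.findall(text)


def lines_missing_rrid(resource_lines):
    """Return the resource descriptions that carry no RRID at all."""
    return [line for line in resource_lines if not find_rrids(line)]


if __name__ == "__main__":
    methods = [
        "Polyclonal rabbit anti-MAPK3 antibody, Abgent, Cat# AP7251E, RRID:AB_2140114",
        "Anti-GFP antibody, made in-house",  # hypothetical entry with no RRID
    ]
    print(find_rrids(methods[0]))
    print(lines_missing_rrid(methods))
```

A check like this only flags the absence of an identifier; confirming that the RRID matches the resource described still requires a lookup in the portal.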

Page 33: Developing data services: a tale from two Oregon universities

http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble

Page 34: Developing data services: a tale from two Oregon universities

$1.3 million grant from the Laura and John Arnold Foundation to validate 50 landmark cancer biology studies

Partnership between Science Exchange, PLoS, FigShare, Mendeley, and some of us scientists

Page 35: Developing data services: a tale from two Oregon universities

Librarians can help researchers understand:

How to be critical of data and where it came from

Data provenance and meeting data standards

That there is a need to reinterpret data when new information comes to light

That reproducibility depends on many things, including very basic things

Why both retrospective and prospective efforts are needed to ensure data quality, consistency, and utility

Page 36: Developing data services: a tale from two Oregon universities

Amanda’s dissertation: The spectral backscattering properties of marine particles

Observations: ship-based sampling & moored instruments

Simulation results: scattering & absorption of light

Experimental: optical properties of phytoplankton cultures

Derived variables: endless things

Compiled observations: global oceanic bio-optical observations [self + from peers]

Reference: global oceanic bio-optical observations [NASA]

Page 37: Developing data services: a tale from two Oregon universities

Why libraries?

OSU Libraries Digital Collections | http://oregondigital.org/u?/archives,31

Page 38: Developing data services: a tale from two Oregon universities

image: http://www.beautiful-libraries.com/7200-1.html

Page 39: Developing data services: a tale from two Oregon universities

Agricultural Sciences

Engineering

Education

Business

Liberal Arts

Public Health & Human Sciences

Veterinary Medicine

Science

Pharmacy

Forestry

Earth, Ocean & Atmospheric Sci.

Libraries

Page 40: Developing data services: a tale from two Oregon universities

Libraries

Page 41: Developing data services: a tale from two Oregon universities

http://www.ala.org/acrl/sites/ala.org.acrl/files/content/publications/whitepapers/Tenopir_Birch_Allard.pdf

“Only a small minority of academic libraries in the United States and Canada currently offer research data services (RDS), but a quarter to a third of all academic libraries are planning to offer some services within the next two years.”

“Few academic libraries are responsible for developing research data policies. Being able to serve as a clearinghouse of ideas and to provide expertise to build these policies is an opportunity for libraries to be members of the knowledge creation process.”

“Reassigning existing library staff is the most common tactic for offering RDS.”

Page 42: Developing data services: a tale from two Oregon universities

Our experiences

http://clubads.com/photos/custom/fish-OutOfWAter.jpg

Page 43: Developing data services: a tale from two Oregon universities

Timeline of data services at OSU

UL & library admin. recognize need for role of RDS on campus that requires a dedicated FTE

late 2011

Sept. 2012

Data Management Specialist starts

Oct. 2013

Data survey launches

Strategic Agenda in place*

Jan. 2013

GRAD 521 launches

Jan. 2014

*Sutton, Shan; Barber, David; Whitmire, Amanda L. (2013): Oregon State University Libraries and Press Strategic Agenda for Research Data Services. Oregon State University Libraries. http://hdl.handle.net/1957/38794.

ESI

Page 44: Developing data services: a tale from two Oregon universities

OSU Data stewardship survey

Interview by Sarah Abraham from The Noun Project

Page 45: Developing data services: a tale from two Oregon universities

Responses to the question, “Please indicate whether or not you generate each of the following data format(s) as a part of your research process. Select Yes or No for each.” Color scale indicates what percentage of respondents in each college or unit selected ‘Yes’ for each data type. The number in each tile shows the number of faculty responses for that data type and college/unit.

Page 46: Developing data services: a tale from two Oregon universities

Scope of Data Services at OSU

Page 47: Developing data services: a tale from two Oregon universities

Research

Analysis of data management plans as a means to inform and empower academic librarians in providing research data support. National Leadership Grant LG-07-13-0328, Oct 2014 – Sept 2015

Data management plans As a Research Tool: The DART Project

Page 48: Developing data services: a tale from two Oregon universities

Consultations

Page 49: Developing data services: a tale from two Oregon universities

Teaching: GRAD 521

Logistical Details

• http://bit.ly/GRAD521

• All course materials on figshare

• 2 credits

• Discipline-agnostic

• Offered annually, winter quarter

Topics covered

• Overview of RDM

• Types, formats & stages of data

• RDM planning

• Storage, backup & security

• Documentation & metadata

• Legal & ethical considerations

• Sharing & reuse

• Archive and preservation

Page 50: Developing data services: a tale from two Oregon universities

Timeline of data activities at OHSU

OHSU library awarded eagle-i

late 2009

Sept. 2012

Monarch Initiative awarded

Oct. 2013

Data survey launches

Beyond the PDF 1K challenge award

April 2013

OHSU hiring CRIO position

Now

ESI

NIH BD2K program

Page 51: Developing data services: a tale from two Oregon universities

OHSU Data stewardship survey

Interview by Sarah Abraham from The Noun Project

Page 52: Developing data services: a tale from two Oregon universities

How do you reference your data when you publish, either in the context of a journal publication, or by direct publication of data sets?

[Bar chart: percentage of respondents, axis from 0% to 60%]

Page 53: Developing data services: a tale from two Oregon universities

Are there any professional community standards in your research area regarding data management, sharing, storage, archiving, and/or producing metadata or other descriptive information that would apply to your research data?

Number of responses by position (Yes / No / I don't know):

Instructor: 1 / 1 / 1

Assistant Professor, Research Assistant Professor, or Assistant Scientist: 9 / 8 / 19

Associate Professor or Associate Scientist: 5 / 9 / 13

Professor or Senior Scientist: 16 / 15 / 14

Director, Division Head, Department Head: 6 / 1 / 4

PostDoc/ResAssoc/PhD: 13 / 10 / 19

Page 54: Developing data services: a tale from two Oregon universities

Scope of Data Services at OHSU

Open houses, Lib Guides, NIH proposals to improve data education, hosting fellows

New IR, research profiling tools

Participation in national efforts: BD2K, Force11, Galaxy, Biocuration Society

Data consults, collaborations

Page 55: Developing data services: a tale from two Oregon universities

Consultations

Page 56: Developing data services: a tale from two Oregon universities

NIH Big Data to Knowledge Initiative

http://bd2k.nih.gov/

Page 57: Developing data services: a tale from two Oregon universities

1 | Can facilitate the creation of a smarter body of literature for future research

2 | Train researchers to utilize metadata standards to enable data reuse

3 | Facilitate researchers’ understanding of available resources

Libraries, in summary…

Page 58: Developing data services: a tale from two Oregon universities

Members from: Oregon Health & Science University, Oregon State University, University of Oregon, University of Idaho, University of Washington, Portland State University, Reed College

Join us @ bit.ly/pnwdatalibs

Also we need a logo: Free data science training for good suggestions!

PNW Research Data Geeks Group

http://commons.wikimedia.org/wiki/File:DARPA_Big_Data.jpg

Page 59: Developing data services: a tale from two Oregon universities

How do you think libraries can best facilitate best practices in data management?