Meeting Federal Research Requirements

From Data Sharing to Data Stewardship: Meeting Federal Data Sharing Requirements ACRL 2015 Thursday, March 26, 2015 ICPSR – University of Michigan Hashtag: #icpsr

Transcript of Meeting Federal Research Requirements

Page 1: Meeting Federal Research Requirements

From Data Sharing to Data Stewardship: Meeting Federal Data Sharing Requirements

ACRL 2015

Thursday, March 26, 2015

ICPSR – University of Michigan

Hashtag: #icpsr

Page 2: Meeting Federal Research Requirements

https://www.flickr.com/photos/29261037@N02/8896766525

Page 3: Meeting Federal Research Requirements

https://www.flickr.com/photos/shawnhoke/6040690284

Page 4: Meeting Federal Research Requirements
Page 5: Meeting Federal Research Requirements

Direct identifiers

• Addresses, including ZIP and other postal codes

• Telephone numbers, including area codes

Indirect identifiers

• Exact dates of events (birth, death, marriage)

• Detailed income

• Detailed geographic information (e.g., county)
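The two categories above can be turned into a rough first-pass screen of a dataset's variable names. The sketch below is illustrative only: the keyword lists and the function name flag_identifiers are hypothetical, drawn from the examples on this slide rather than from any ICPSR tool, and a keyword match is no substitute for a curator's disclosure review.

```python
# Hypothetical keyword lists based on the examples on this slide.
# A real disclosure review needs a fuller list and a human curator.
DIRECT_KEYWORDS = ("address", "zip", "postal", "phone", "telephone")
INDIRECT_KEYWORDS = ("birth", "death", "marriage", "date", "income", "county")


def flag_identifiers(variable_names):
    """Sort variable names into direct, indirect, and unflagged groups."""
    flagged = {"direct": [], "indirect": [], "unflagged": []}
    for name in variable_names:
        lowered = name.lower()
        # Direct identifiers are checked first because they carry the most risk.
        if any(key in lowered for key in DIRECT_KEYWORDS):
            flagged["direct"].append(name)
        elif any(key in lowered for key in INDIRECT_KEYWORDS):
            flagged["indirect"].append(name)
        else:
            flagged["unflagged"].append(name)
    return flagged


# Made-up variable names, for illustration only.
example = flag_identifiers(
    ["HOME_ADDRESS", "PHONE_NUM", "BIRTH_DATE", "INCOME_EXACT", "Q12_SATISFACTION"]
)
for group, names in example.items():
    print(group, names)
```

Variables that land in the "direct" group would normally be removed before sharing, while "indirect" ones might be coarsened (for example, year instead of exact date) or shared only in a restricted-use setting.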

Page 6: Meeting Federal Research Requirements

“The study is composed of about 180,000 autopsy x-ray image files taken of 58 corpses. The images originally arrived on DVD and are formatted to comply with the Digital Imaging and Communications in Medicine (DICOM) standard…. The images are the data of the study, the images files themselves contain metadata (metadata on the images) scrubbed of identifiers but there isn't much in terms of documentation.”

Page 7: Meeting Federal Research Requirements

http://www.wired.com/wp-content/uploads/2014/04/480815249-660x672.jpg

Page 8: Meeting Federal Research Requirements

Today

• History (brief!) of federal data sharing requirements

• What is good data sharing? How do you achieve data stewardship?

• Public data sharing services – tours & take-away tips

• Resources for creating data management plans and funding quotes

Page 9: Meeting Federal Research Requirements

You should leave this session with -

• Keen understanding of several sustainable data sharing models

• Ability to assess data sharing services

– Through review of several services

– Walk-away tips for evaluating

• Knowledge (a portal) of resources for creating data management plans for grant applications

Page 10: Meeting Federal Research Requirements

• 50+ years of experience

• Data stewardship

• Data management

• Data curation

• Data preservation

ICPSR

Page 12: Meeting Federal Research Requirements

Recent Federal Data Sharing Initiatives

• NIH: 2003 – data sharing plans

• NSF: 2011 – data management plans

• OSTP: 2013 – Memo with subject “Increasing Access to the Results of Federally Funded Scientific Research”

Page 13: Meeting Federal Research Requirements
Page 14: Meeting Federal Research Requirements

https://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

Page 15: Meeting Federal Research Requirements

http://sites.nationalacademies.org/DBASSE/CurrentProjects/DBASSE_082378

Page 16: Meeting Federal Research Requirements

http://www.icpsr.umich.edu/files/ICPSR/ICPSRComments.pdf

Page 17: Meeting Federal Research Requirements

http://guides.library.oregonstate.edu/federaloa

Page 18: Meeting Federal Research Requirements

http://bit.ly/FedOASummary

Page 19: Meeting Federal Research Requirements

Data Portion of Memo - 13 Elements

• The elements are also summarized online within ICPSR’s Web site: http://icpsr.umich.edu/content/datamanagement/ostp.html

Page 20: Meeting Federal Research Requirements

1. Maximize access

2. Protect confidentiality and privacy

3. Appropriate attribution

4. Long term preservation and sustainability

5. Data management planning

Page 21: Meeting Federal Research Requirements

UK results on data sharing attitudes

• In 2011 survey, 85% of researchers said they thought their data would be of interest to others.

• Only 41% said they would be happy to make their data available.

• Only a third had previously published data.

Source: DaMaRO Project, University of Oxford http://www.slideshare.net/DigCurv/15-meriel-patrick

Page 22: Meeting Federal Research Requirements

Data Sharing Status

Federal Agency | Shared Formally, Archived (n=111) | Shared Informally, Not Archived (n=415) | Not Shared (n=409)
NSF (27.3%) | 22.4% | 43.7% | 33.9%
NIH (72.7%) | 7.4% | 45.0% | 47.6%
Total | 11.5% | 44.6% | 43.9%

Pienta, Gutmann, & Lyle (2009). “Research Data in The Social Sciences: How Much is Being Shared?” http://ori.hhs.gov/content/research-research-integrity-rri-conference-2009

See also: Pienta, Gutmann, Hoelter, Lyle, & Donakowski (2008). “The LEADS Database at ICPSR: Identifying Important ‘At Risk’ Social Science Data.” http://www.data-pass.org/sites/default/files/Pienta_et_al_2008.pdf

Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data.” http://hdl.handle.net/2027.42/78307
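The Total row in the data sharing status table can be checked with a little arithmetic. The sketch below assumes that the percentages in parentheses after NSF and NIH are each agency's share of the sample, and that the Total row is the share-weighted average of the two agency rows; both are readings of the table, not statements from the cited paper. All variable names are illustrative.

```python
# Assumption: the parenthesized figures are each agency's share of the sample.
AGENCY_SHARE = {"NSF": 0.273, "NIH": 0.727}

# Percentages as printed in the table on this slide.
RATES = {
    "NSF": {"formal": 22.4, "informal": 43.7, "not_shared": 33.9},
    "NIH": {"formal": 7.4, "informal": 45.0, "not_shared": 47.6},
}
REPORTED_TOTAL = {"formal": 11.5, "informal": 44.6, "not_shared": 43.9}


def weighted_totals(shares, rates):
    """Return the share-weighted average rate for each sharing category."""
    categories = next(iter(rates.values())).keys()
    return {
        category: sum(shares[agency] * rates[agency][category] for agency in shares)
        for category in categories
    }


totals = weighted_totals(AGENCY_SHARE, RATES)
for category, value in totals.items():
    print(f"{category}: computed {value:.2f}%, reported {REPORTED_TOTAL[category]}%")
```

Under these assumptions each computed value falls within a tenth of a percentage point of the reported Total, which suggests the table was read correctly.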

Page 23: Meeting Federal Research Requirements

What is good data sharing - the basis of data stewardship?

1. Maximize access

2. Protect confidentiality and privacy

3. Appropriate attribution

4. Long term preservation and sustainability

5. Data management planning

Page 24: Meeting Federal Research Requirements

Maximize Access (Data Curation)

Page 25: Meeting Federal Research Requirements

Discoverable

http://www.flickr.com/photos/papertrix/38028138/

Page 26: Meeting Federal Research Requirements

Accessible

http://www.guardian.co.uk/science/grrlscientist/2012/mar/29/1

Page 27: Meeting Federal Research Requirements

A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.

Do no harm.

Page 28: Meeting Federal Research Requirements

Protect confidentiality and privacy

• It is critically important to protect the identities of research subjects

• Disclosure risk is a term that is often used for the possibility that a data record from a study could be linked to a specific person

• Data with these risks can be shared via a secured virtual environment

• Data concerning very sensitive topics can also be shared via a secured environment

Page 29: Meeting Federal Research Requirements

Appropriate Attribution

• Properly citing data encourages the replication of scientific results, improves research standards, guarantees persistent reference, and gives proper credit to data producers.

• Citing data is straightforward. Each citation must include the basic elements that allow a unique dataset to be identified over time: title, author, date, version, and persistent identifier.

• Resources: ICPSR's Data Citations page, IASSIST's Quick Guide to Data Citation, DataCite.
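A minimal sketch of assembling a citation from the five basic elements named above. The class name, the ordering and punctuation of the output, and every value in the example are hypothetical placeholders; this is not an official ICPSR, IASSIST, or DataCite citation style.

```python
from dataclasses import dataclass, fields


@dataclass
class DataCitation:
    """The five basic elements named on this slide."""
    author: str
    date: str
    title: str
    version: str
    persistent_id: str

    def format(self) -> str:
        """Join the elements into one citation string (illustrative layout)."""
        # Refuse to build a citation if any basic element is missing.
        for field in fields(self):
            if not getattr(self, field.name).strip():
                raise ValueError(f"missing citation element: {field.name}")
        return (
            f"{self.author} ({self.date}). {self.title}. "
            f"{self.version}. {self.persistent_id}"
        )


# Placeholder values only; the DOI below is not a real identifier.
citation = DataCitation(
    author="Doe, Jane",
    date="2015",
    title="Example Household Survey",
    version="Version 1",
    persistent_id="https://doi.org/10.0000/example",
)
print(citation.format())
```

Requiring every element up front reflects the point of the slide: a citation missing its version or persistent identifier may not pin down a unique dataset over time.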

Page 30: Meeting Federal Research Requirements

Long term preservation and sustainability

Page 31: Meeting Federal Research Requirements

“Digital information lasts forever or five years, whichever comes first”.

-Jeff Rothenberg

Page 32: Meeting Federal Research Requirements

https://flic.kr/p/arHsh4

Page 33: Meeting Federal Research Requirements

http://www.flickr.com/photos/blude/2665906010/

Page 34: Meeting Federal Research Requirements
Page 35: Meeting Federal Research Requirements

Data Management Planning

• Data management plans describe how researchers will provide for long-term preservation of, and access to, scientific data in digital formats.

• Data management plans provide opportunities for researchers to manage and curate their data more actively from project inception to completion.

• See ICPSR's resource: Guidelines for Effective Data Management Plans

Page 36: Meeting Federal Research Requirements

The Status of Data Sharing

– Good data sharing exists!

– Good data sharing requires funding - sustainable funding!

– Sustainable funding for free public access remains a challenge

Page 37: Meeting Federal Research Requirements

Sustainable Data Sharing Models – Three to Explore

• Fee for access model (subscription model)

• Agency model (agency or foundation funds public access)

• Fee for deposit model (researcher writes fee into grant and pays at deposit to fund public access)

Page 38: Meeting Federal Research Requirements

I. Fee-for-Access Data Sharing

• Funding is maintained by annual subscription fees charged to institutions; individuals at subscribing institutions have free (open) access to data

• Pooled (ongoing) subscriber fees are used to acquire, curate, and maintain the service

• The service, open to everyone, is thus sustained by subscribers, but agencies indicate these models are not ‘open enough’ because of the access fees

Page 39: Meeting Federal Research Requirements

II. Agency-funded Data Sharing

• Agency sponsors/funds (ongoing) data curation & sharing enabling the public to access without charge

• The archive is hosted with a curation entity like ICPSR where the public can easily discover and access data and restricted-use data can also be securely shared

• Agency directs data selection and compliance policies

Page 40: Meeting Federal Research Requirements

III. Fee-for-Deposit Data Sharing

• Depositor (individual or entity) pays for data to be curated and stored – a fee at deposit

• Deposit fees should be written into the grant application

• Incoming deposit fees sustain the service and the professionals behind it

• Sustainability risk fairly high in this model as it depends upon:

– Continuous influx of deposit fees

– Depositors to put allocated fees towards curation & sharing

• Data tends to be bit-level (not curated): WIDIWYG

Page 41: Meeting Federal Research Requirements

Fee for Deposit Services Arriving Daily! (tips for evaluating coming shortly)

Page 42: Meeting Federal Research Requirements

First: A Side-Note on Sharing Restricted-Use Data

What is Restricted-Use Data?

• Data with disclosure risk – potential to identify a research subject

• Data with highly sensitive personal information

Page 43: Meeting Federal Research Requirements

Common Objection/Misperception: “My data are too sensitive to share. . .”

• ICPSR has been sharing restricted-use data for over a decade. Three methods are used:

– Secure Download

– Virtual Data Enclave

– Physical Enclave

• ICPSR stores & shares over 6,400 restricted-use datasets associated with over 2,000 ‘active’ restricted-use data agreements

Page 44: Meeting Federal Research Requirements

Reality: Restricted-use data can be effectively shared with the public

• Through the use of a virtual data enclave where the data never leave the server

• Where there is a process (and understanding!) to garner IRB approval from the requesting scientist’s university

• Where there is a system, technology, data professionals, and collaboration space in place to disseminate (expensive to build!)

• Because agencies do allow for an incremental charge to the data requestor to offset marginal costs

Page 45: Meeting Federal Research Requirements

Review of Public Data Sharing Services

• Overview of public data sharing services we have reviewed

– Some key strengths of each

• Disclaimer: ICPSR has recently launched a public access service (hosted)

– You’ll likely notice some bias when we talk about the strengths of openICPSR

– And because we built the service, we know much more about it

– Still, ICPSR’s public access service isn’t for everyone – more on that shortly

Page 46: Meeting Federal Research Requirements

Public Data Sharing Services

Page 47: Meeting Federal Research Requirements

openICPSR – www.openicpsr.org

Page 48: Meeting Federal Research Requirements

How is openICPSR unique?

openICPSR is a public data-sharing service:

• Where the deposit is reviewed by professional data curators who are experts in developing metadata (tags) for the social and behavioral sciences = discoverable

• With an immediate distribution network of over 750 institutions looking for research data, powerful search tools, and a data catalog indexed by major search engines = usage

• Sustained by a respected organization with over 50 years of experience in reliably protecting research data = sustainable

• Prepared to accept and disseminate sensitive and/or restricted-use data in the public-access environment = protection of research subjects

Page 49: Meeting Federal Research Requirements

How will openICPSR disseminate sensitive data to the public?

• The deposit of sensitive (restricted-use) data is similar to the deposit of non-sensitive data except that the depositor will indicate that the data should be for restricted-use only

• Dissemination of sensitive data will be through ICPSR’s virtual data enclave; in this environment, data never leave the secure server and analysis takes place in the virtual space

• Scientists desiring to access the data will need to apply for the data and will pay an access fee

• openICPSR has already received sensitive (restricted-use) data, and dissemination of these data has begun

Page 50: Meeting Federal Research Requirements

openICPSR for Institutions and Journals

• Uses openICPSR platform

• Fully hosted in the ICPSR cloud – no tech or patches needed

• Branded with a logo and colors

• Deposits incorporated into ICPSR’s data catalog

• On-demand administrative usage tools

Page 51: Meeting Federal Research Requirements

A final note: openICPSR accepts research data from a wide array of disciplines/fields, but not all

Page 52: Meeting Federal Research Requirements

Tips for Evaluating a Data Sharing Service

Questions to consider when selecting a data sharing service:

• How will the service sustain itself? Does it have a long term funding stream?

• How will the service care for my data in the long term should the service fail? Is there a plan? A safety net?

• Can the service quickly maximize discoverability of my data? Does it explain how it will do so?

• Does the service have a network of interested researchers & students seeking data? Will my data get used?

• Does the service have knowledge of international archiving standards?

• Does the service provide a DOI, data citation, and version control should I need to update my files?

• I have sensitive data or data with some disclosure risk to deposit. Does the service understand how to secure it upon intake and when sharing? Does it have experience in this area?

Page 53: Meeting Federal Research Requirements

Resources for Creating Data Management Plans for Grant Applications

Page 54: Meeting Federal Research Requirements

ICPSR’s Data Management & Curation Site

http://www.icpsr.umich.edu/datamanagement/

Page 55: Meeting Federal Research Requirements

Purpose of Data Management Plans

• Data management plans describe how researchers will provide for long-term preservation of, and access to, scientific data in digital formats.

• Data management plans provide opportunities for researchers to manage and curate their data more actively from project inception to completion.

Page 56: Meeting Federal Research Requirements

Data Management Plan Resources

Page 57: Meeting Federal Research Requirements

DMP Template Tool to Get You Started!

Page 59: Meeting Federal Research Requirements

And still more guidelines after the project is awarded:

• Guide emphasizes preparation for data sharing throughout the project

• Available online and via download (pdf)

Page 60: Meeting Federal Research Requirements

ICPSR Data Curation Training Workshops

• 1-5 day workshops on data curation/data repository management decisions

– Participants learn about best practices and tools for data curation, from selecting and preparing data for archiving to optimizing and promoting data for reuse

• Available via ICPSR Summer Program (Ann Arbor – July 27-31, 2015) or onsite at your institution

Page 61: Meeting Federal Research Requirements

Copies of these Slides & Use

• Feel free to share it; present it; cite it!

• Find copies of these slides on Slideshare.net

– Several notes and additional links are found in the notes view

Page 62: Meeting Federal Research Requirements

Get More Information

• Visit ICPSR’s Data Management & Curation site: http://www.icpsr.umich.edu/datamanagement/index.jsp

• Contact us:

[email protected]

– (734) 647-2200

• More on Assuring Access to Scientific Data: white paper – “Sustaining Domain Repositories for Digital Data”