Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research

Building Communities and Services in Support of Data-Intensive Research Stephen Abrams University of California Curation Center California Digital Library August 20, 2013

description

Information technology and resources are an integral and indispensable part of the contemporary academic enterprise. In particular, technological advances have nurtured a new paradigm of data-intensive research. However, far too much of this activity still takes place in silos, to the detriment of open scholarly inquiry, integrity, and advancement. To counteract this tendency, the University of California Curation Center (UC3) has been developing and deploying a comprehensive suite of curation services that facilitate widespread data management, preservation, publication, sharing, and reuse. Through these services UC3 is engaging with new communities of use: in addition to its traditional stakeholders in cultural heritage memory organizations, e.g., libraries, museums, and archives, the UC3 service suite is now attracting significant adoption by research projects, laboratories, and individual faculty researchers. This webinar will present an introduction to five specific services – DMPTool, DataUp, EZID, Merritt, Web Archiving Service (WAS) – applicable to data curation throughout the scholarly lifecycle, two recent initiatives in collaboration with UC campuses, UC Berkeley Research Hub and UC San Francisco DataShare, and the ways in which they encourage and promote new communities of practice and greater transparency in scholarly research.

Transcript of Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research

Page 1: Uc3 pasig-asis&t-2013-08-20-support-of-data-intensive-research

Building Communities and Services in Support of Data-Intensive Research

Stephen Abrams

University of California Curation Center

California Digital Library

August 20, 2013

Page 2

Topics

Data curation

UC3 services: DMPTool, DataUp, EZID, Merritt, WAS

Collaborative initiatives: DataShare, Research Hub

Conclusions

Page 3

Why is data curation important?

Integrity: Enabling appropriate scrutiny, debate, reproduction, and verification of results

Efficiency: Avoiding needless duplication of effort

Policy: Complying with institutional policies, publication requirements, and funder mandates

“[Data] is a valuable national asset whose value is multiplied when it is made easily accessible to the public”

– Office of Science and Technology Policy

Page 4

Why is data curation important?

Catalyzing: Promoting progress through new collaborations and creative (re)use of data

“If I have seen further it is by standing on the shoulders of giants” – Isaac Newton, 1676

Page 5

What is the library’s role?

A continuation of its long-standing mission and practice to connect patrons with content of interest in meaningful ways across barriers of space and time

Cf. Tenopir et al. (2012), “Academic librarians and research data services: Preparation and attitudes,” 78th IFLA General Conference and Assembly, Helsinki, http://conference.ifla.org/past/ifla78/116-tenopir-en.pdf

Offering solutions that enhance the natural points of alignment between the scholarly research and information lifecycles

[Figure: the scholarly lifecycle (Research) and the information lifecycle (Curation) shown side by side, with stages including Create, Share, Publish, Reuse, Discover, Collect, Preserve, and Access]

Page 6

Why is data curation hard?

Ever-increasing number, size, and diversity of content

Inevitability of disruptive change

Resources not keeping pace with growth

Stakeholders outside of traditional cultural heritage domains, with lots of questions:

Who can give me advice on what I should do?

How should I describe and package my data?

How can I cite my data in order to receive credit for it?

How can I share my data?

What can I do with web-published data?

Page 7

DMPTool – guidance and resources

Finalist, 2012 DPC Award for Research and Innovation

http://dmptool.org/

Create, edit, and share data management plans

Meet funder requirements

Provide institutional guidance

Links to local resources
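As a rough illustration of what "meet funder requirements" can involve, the sketch below checks a draft plan for required sections that are missing or left blank. The section names loosely paraphrase the topics in NSF's data management plan guidance of the period; the `REQUIRED_SECTIONS` list and the `missing_sections` helper are hypothetical and are not DMPTool code.

```python
# Hypothetical sketch, not DMPTool code. The section names loosely paraphrase
# the topics in NSF's data management plan guidance of the period.
REQUIRED_SECTIONS = [
    "Types of data",
    "Data and metadata standards",
    "Policies for access and sharing",
    "Policies for re-use and redistribution",
    "Plans for archiving and preservation",
]


def missing_sections(plan: dict[str, str]) -> list[str]:
    """Return the required sections that are absent or blank in a draft plan."""
    return [name for name in REQUIRED_SECTIONS if not plan.get(name, "").strip()]


if __name__ == "__main__":
    draft = {
        "Types of data": "Tabular field measurements and images.",
        "Data and metadata standards": "CSV files described with DataCite metadata.",
        "Policies for access and sharing": "   ",  # started but left blank
    }
    for name in missing_sections(draft):
        print("Missing:", name)
```

A real tool would attach institutional guidance and links to local resources to each section; the check itself stays this simple.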

Page 9

DMPTool – guidance and resources

Two recently funded projects:

Functional enhancements and open source community development (Sloan Foundation)

Training and outreach (IMLS)

http://dmptool.org/

New options for DMP collaboration and formal and ad hoc review

Stronger administrative control and customization

Page 10

DataUp – description and packaging

http://dataup.cdlib.org/ http://www.dataup.org/

“It’s easier to augment systems than it is to change behavior”

Curation for tabular datasets

Excel add-in

Azure cloud service

Page 11

DataUp – description and packaging

http://dataup.cdlib.org/ http://www.dataup.org/

Best practices check

Data description

Identifier and citation generation

Repository submission to ONEShare

Curation for tabular datasets

Excel add-in

Azure cloud service
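DataUp's best practices check inspects a spreadsheet before it is described and submitted. The sketch below shows the general idea for a CSV export using three hypothetical rules (every column has a name, column names are unique, every row has as many cells as the header); DataUp's actual rule set is not reproduced here.

```python
import csv
import io


def check_table(text: str) -> list[str]:
    """Return human-readable problems found in CSV text (hypothetical rules)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["table is empty"]
    problems = []
    header = rows[0]
    # Rule 1: every column should have a non-empty name.
    for i, name in enumerate(header):
        if not name.strip():
            problems.append(f"column {i + 1} has no header")
    # Rule 2: non-empty column names should be unique.
    seen = set()
    for name in header:
        if name.strip() and name in seen:
            problems.append(f"duplicate column name: {name!r}")
        seen.add(name)
    # Rule 3: each data row should have the same number of cells as the header.
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {line_no} has {len(row)} cells, expected {len(header)}")
    return problems


if __name__ == "__main__":
    sample = "site,date,temp\nA,2013-05-01,12.5\nB,2013-05-01\n"
    for problem in check_table(sample):
        print(problem)
```

Reporting problems as plain sentences, rather than failing outright, matches the slides' emphasis on guiding researchers without changing how they already work.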

Page 12

DataUp – description and packaging

http://dataup.cdlib.org/ http://www.dataup.org/

What researchers don’t need to know:

Schema definition and XML syntax

Identifier registration procedures

Citation format

Repository packaging and submission

Harvesting for aggregation

2013 Innovation Award winner

Recently funded project: Functional enhancements and open source community development (NSF)
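Citation format is one of the details DataUp keeps out of the researcher's way. A minimal sketch of assembling a data citation follows; it is modeled loosely on DataCite's recommended citation form (Creator (PublicationYear): Title. Publisher. Identifier), omits the optional version and resource-type elements, and uses a hypothetical function name and placeholder DOIs under the 10.5072 test prefix.

```python
def format_citation(creators: list[str], year: int, title: str,
                    publisher: str, doi: str) -> str:
    """Build a data citation in a DataCite-like pattern (illustrative sketch)."""
    # Multiple creators are joined with semicolons ("Last, First; Last, First").
    creator_part = "; ".join(creators)
    # The identifier is written in the "doi:" form used elsewhere in these slides.
    return f"{creator_part} ({year}): {title}. {publisher}. doi:{doi}"


if __name__ == "__main__":
    print(format_citation(["Doe, Jane", "Roe, Richard"], 2013,
                          "Example tabular dataset", "ONEShare",
                          "10.5072/FK2EXAMPLE"))
```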

Page 13

EZID – identification and citation

http://n2t.net/ezid/

UC3 is a founding member of the DataCite consortium

Mint DOI and ARK

Add descriptive metadata

Receive QR code

Global resolution

Aggregated discovery

Updatable resolution URLs

Establish and maintain persistent two-way linkages between the literature and the data that underlies its results
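When an identifier is minted with descriptive metadata, EZID's API takes that metadata as plain-text ANVL, one "name: value" pair per line. The sketch below only serializes and parses such a record and does not contact EZID. The percent-escaping of `%`, `:`, CR, and LF follows our reading of the EZID API documentation; the element names (`_target`, `datacite.creator`, and so on) and the target URL are illustrative assumptions to check against that documentation.

```python
import re


def _escape(s: str) -> str:
    # Percent-encode characters that would break ANVL's "name: value" lines.
    return re.sub(r"[%:\r\n]", lambda m: "%%%02X" % ord(m.group(0)), s)


def _unescape(s: str) -> str:
    # Reverse _escape by decoding every %XX sequence.
    return re.sub(r"%([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), s)


def to_anvl(metadata: dict[str, str]) -> str:
    """Serialize a metadata dict into ANVL text, one escaped pair per line."""
    return "\n".join(f"{_escape(k)}: {_escape(v)}" for k, v in metadata.items())


def from_anvl(text: str) -> dict[str, str]:
    """Parse ANVL text produced by to_anvl back into a dict."""
    result = {}
    for line in text.split("\n"):
        key, value = line.split(": ", 1)  # keys never contain a raw colon
        result[_unescape(key)] = _unescape(value)
    return result


if __name__ == "__main__":
    record = {
        "_target": "http://example.org/dataset/123",  # hypothetical landing page
        "datacite.creator": "Doe, Jane",
        "datacite.title": "Example dataset",
        "datacite.publisher": "UC Curation Center",
        "datacite.publicationyear": "2013",
        "datacite.resourcetype": "Dataset",
    }
    print(to_anvl(record))
```

Because colons are escaped in names and values, the first ": " on each line is always the separator, which keeps parsing trivial.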

Page 14

EZID – identification and citation

UC3 is a founding member of the DataCite consortium

Mint DOI and ARK

Add descriptive metadata

Receive QR code

Global resolution

Updatable resolution URLs

Link to dataset in repository

http://n2t.net/ezid/

Page 15

EZID – identification and citation

UC3 is a founding member of the DataCite consortium

Mint DOI and ARK

Add descriptive metadata

Receive QR code

Global resolution

Updatable resolution URLs

Link from dataset landing page to article citing the data

Page 16

EZID – identification and citation

UC3 is a founding member of the DataCite consortium

Mint DOI and ARK

Add descriptive metadata

Receive QR code

Global resolution

Updatable resolution URLs

Link from article back to dataset
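The indirection behind global resolution and updatable resolution URLs, together with the article-to-dataset links on these slides, can be sketched as a small registry: the identifier cited in an article never changes, the location it resolves to can be updated (for example, when a dataset moves between repositories), and relations are recorded in both directions. The `LinkRegistry` class and all identifiers and URLs below are hypothetical; this is not how EZID or the DOI system is implemented.

```python
class LinkRegistry:
    """Toy persistent-identifier registry with updatable targets and two-way links."""

    def __init__(self) -> None:
        self._targets: dict[str, str] = {}
        self._links: dict[str, set[str]] = {}

    def register(self, identifier: str, target: str) -> None:
        # Create the identifier, or point an existing one at a new location.
        self._targets[identifier] = target

    def resolve(self, identifier: str) -> str:
        # Readers cite the identifier; resolution returns the current location.
        return self._targets[identifier]

    def link(self, a: str, b: str) -> None:
        # Record the relation both ways (article <-> dataset).
        self._links.setdefault(a, set()).add(b)
        self._links.setdefault(b, set()).add(a)

    def related(self, identifier: str) -> set[str]:
        return set(self._links.get(identifier, set()))


if __name__ == "__main__":
    reg = LinkRegistry()
    reg.register("doi:10.5072/FK2DATA", "http://repository-a.example.org/ds/1")
    reg.register("doi:10.5072/FK2ARTICLE", "http://journal.example.org/article/7")
    reg.link("doi:10.5072/FK2ARTICLE", "doi:10.5072/FK2DATA")
    # The dataset moves; citations in the article keep working unchanged.
    reg.register("doi:10.5072/FK2DATA", "http://repository-b.example.org/ds/1")
    print(reg.resolve("doi:10.5072/FK2DATA"))
```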

Page 17

EZID – identification and citation

UC3 is a founding member of the DataCite consortium

Aggregated discovery via DataShare and Ex Libris Primo

Later this year, aggregation via the Thomson Reuters Data Citation Index

Page 18

EZID – identification and citation

UC3 is a founding member of the DataCite consortium

SEI for public visibility in leading search engines

Page 19

Merritt – preservation and access

Content agnostic, model free

Micro-service architecture

UI and RESTful API

26 curatorial units, 271 collections, 325,000 objects, 450,000 versions, 4,500,000 files, 13 TB

http://merritt.cdlib.org/

Enforceable Data Use Agreements (DUAs) in response to concerns over potential loss of control over dissemination and reuse

Open to the UC community and external partners

Dark archive for long-term assurance

Bright archive for sharing

Integration with preservation grids

Integration with public access portals

Integration with CMS

Page 20

Merritt – preservation and access

Content agnostic, model free

Micro-service architecture

UI and RESTful API

26 curatorial units, 271 collections, 325,000 objects, 450,000 versions, 4,500,000 files, 13 TB

For curatorially-designated collections and objects, a download request triggers …

Open to the UC community and external partners

Dark archive for long-term assurance

Bright archive for sharing

Integration with preservation grids

Integration with public access portals

Integration with CMS

Page 21

Merritt – preservation and access

Content agnostic, model free

Micro-service architecture

UI and RESTful API

26 curatorial units, 271 collections, 325,000 objects, 450,000 versions, 4,500,000 files, 13 TB

Open to the UC community and external partners

Dark archive for assurance

Bright archive for sharing

Integration with preservation grids

Integration with public access portals

Integration with CMS

Click-through DUA; acceptance of terms of use triggers …

Page 22

Merritt – preservation and access

Content agnostic, model free

Micro-service architecture

UI and RESTful API

26 curatorial units, 271 collections, 325,000 objects, 450,000 versions, 4,500,000 files, 13 TB

Open to the UC community and external partners

Dark archive for assurance

Bright archive for sharing

Integration with preservation grids

Integration with public access portals

Integration with CMS

From: [email protected]

Subject: Merritt DUA acceptance

Name: Stephen Abrams

Affiliation: California Digital Library

Collection: UCSF DataShare

Object: Frontotemporal Lobar Degeneration (FTLD)

Date: 2013-05-31 09:50:34 PDT

Terms of use: As part of this agreement, Consumer submits to the following statements:

(1) I will receive access to de-identified data and will not attempt to establish the identity of any of the study subjects.

(2) I will share these data only with my immediate co-workers, and I will not transfer these data to other research groups. I understand that these data are available to other research groups through the process by which I obtain them.

(3) I will require anyone in my group who utilizes these data, or anyone with whom I share these data, to comply with this data use agreement

...

Email notification to consumer and curator

Delivery of requested content
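The click-through DUA sequence on these slides can be summarized as follows: a download request for a curatorially designated object is held until the requester accepts the terms of use, and acceptance then produces email notifications to the consumer and curator plus delivery of the content. The sketch below models that sequence only; the `DownloadRequest` and `Outcome` types, the `handle_request` function, and the message strings are hypothetical and do not reflect Merritt's actual services or API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DownloadRequest:
    """A hypothetical request for one object in a preservation repository."""
    user_email: str
    collection: str
    object_title: str
    accepted_terms: bool = False


@dataclass
class Outcome:
    delivered: bool
    notifications: list[str] = field(default_factory=list)
    prompt: Optional[str] = None  # terms the user must accept, if any


def handle_request(req: DownloadRequest, dua_collections: dict[str, str],
                   curator_email: str) -> Outcome:
    """Gate delivery on acceptance of a click-through DUA (illustrative only)."""
    terms = dua_collections.get(req.collection)
    if terms is None:
        # Collection is not DUA-designated: deliver directly.
        return Outcome(delivered=True)
    if not req.accepted_terms:
        # Show the terms; nothing is delivered until they are accepted.
        return Outcome(delivered=False, prompt=terms)
    # Acceptance: notify both consumer and curator, then deliver.
    note = f"DUA accepted by {req.user_email} for '{req.object_title}' in {req.collection}"
    return Outcome(delivered=True,
                   notifications=[f"to {req.user_email}: {note}",
                                  f"to {curator_email}: {note}"])
```

Keeping the curator in the notification loop is what turns a click-through into an auditable agreement rather than a bare checkbox.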

Page 23

Web Archiving Service

http://was.cdlib.org/

Collect, describe, manage, preserve, and provide access to web sites

Analysis tools

Full-text search

27 curatorial units, 185 collections, 10,772 web sites, 97,121 captures, 64 TB

“You can’t study life in our time without the Internet, so we must preserve it”

– René Voorburg, KB

Initially developed as part of the NDIIPP-funded Web at Risk project

The web has become the publication platform of choice

Source of important primary and secondary research data
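Full-text search over archived web captures is commonly built on an inverted index. The sketch below indexes a few captures keyed by URL and capture timestamp and answers all-terms queries; the tokenizer, data layout, and example URLs are simplifying assumptions, not a description of how WAS implements its search.

```python
import re
from collections import defaultdict

Capture = tuple[str, str]  # (url, capture timestamp)


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(captures: dict[Capture, str]) -> dict[str, set[Capture]]:
    """Map each term to the set of captures whose text contains it."""
    index: dict[str, set[Capture]] = defaultdict(set)
    for capture, text in captures.items():
        for term in tokenize(text):
            index[term].add(capture)
    return index


def search(index: dict[str, set[Capture]], query: str) -> set[Capture]:
    """Return captures containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Keying entries by (URL, timestamp) rather than URL alone lets a query distinguish successive captures of the same site, which matters when studying how a page changed over time.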

Page 24

Web Archiving Service

http://was.cdlib.org/

Collect, describe, manage, preserve, and provide access to web sites

Analysis tools

Full-text search

27 curatorial units, 185 collections, 10,772 web sites, 97,121 captures, 64 TB

“You can’t study life in our time without the Internet, so we must preserve it”

– René Voorburg, KB

Initially developed as part of the NDIIPP-funded Web at Risk project

For example, California water district web sites supplement UC Davis source water assessment and protection (SWAP) Merritt collections

Page 25

Connecting to communities of practice

Engage with new user communities where and how they already work

Shifting user roles, shifting expectations

From institutional to individual researcher

Behavioral expectations set by the commercial/mobile web

Page 26

DataShare – catalyzing science

UCSF Clinical and Translational Science Institute, http://ctsi.ucsf.edu/

UCSF Library, http://www.library.ucsf.edu/

UCSF Center for Imaging of Neurodegenerative Disease, http://www.radiology.ucsf.edu/cind/

http://datashare.ucsf.edu/

“Making data transparent and available is going to accelerate all of science; it's a relatively inexpensive way to get more value out of all of the work that we do”

– Michael Weiner, UCSF

Pilot project in biomedical imaging

“The goal is to catalyze widespread sharing of scientific research data”

Prepare, Describe, Upload, Curate, Discover, Share

Page 27

DataShare – catalyzing science

UCSF-developed submission client, supporting intuitive drag & drop operation and metadata entry

EZID for DOIs; Merritt for preservation

XTF-based faceted search/browse portal

http://xtf.cdlib.org/

http://datashare.ucsf.edu/

“Making data transparent and available is going to accelerate all of science; it's a relatively inexpensive way to get more value out of all of the work that we do”

– Michael Weiner, UCSF

Pilot project in biomedical imaging

“The goal is to catalyze widespread sharing of scientific research data”

Prepare, Describe, Upload, Curate, Discover, Share
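DataShare's faceted search/browse portal is built on XTF. Independently of XTF, the core idea of faceting can be sketched in a few lines: count the values of selected metadata fields across the records that match the currently active filters, so users can narrow results by choosing a value. The record fields and sample values are hypothetical.

```python
from collections import Counter
from typing import Optional


def facet_counts(records: list[dict[str, str]],
                 facet_fields: list[str],
                 filters: Optional[dict[str, str]] = None) -> dict[str, Counter]:
    """Count facet values over the records that match every active filter."""
    filters = filters or {}
    # Keep only records whose fields equal every selected filter value.
    matching = [
        r for r in records
        if all(r.get(name) == value for name, value in filters.items())
    ]
    # For each facet field, tally the values present in the matching records.
    return {
        name: Counter(r[name] for r in matching if name in r)
        for name in facet_fields
    }


if __name__ == "__main__":
    recs = [
        {"subject": "neuroimaging", "year": "2012"},
        {"subject": "neuroimaging", "year": "2013"},
        {"subject": "genomics", "year": "2013"},
    ]
    print(facet_counts(recs, ["subject"], {"year": "2013"}))
```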

Page 28

Research Hub – content mgmt and collaboration

3,900 users, 770 projects

Alfresco CMS

Desktop sync

Mobile apps

Adobe Creative Suite

Personal file management

Project collaboration

Departmental resource pooling

Research data sharing

“Powerful tools for management and collaboration”

Create, Organize and enrich, Keep safe, Share

http://hub.berkeley.edu/

UC Berkeley Information Services & Technologies, http://ist.berkeley.edu/

Page 29

Research Hub – content mgmt and collaboration

3,900 users, 770 projects

Alfresco CMS

Desktop sync

Mobile apps

Adobe Creative Suite

Personal file management

Project collaboration

Departmental resource pooling

Research data sharing

“Powerful tools for management and collaboration”

Create, Organize and enrich, Keep safe, Share

http://hub.berkeley.edu/

Primary discovery and access via Research Hub

EZID for DOIs; Merritt for preservation

Merritt access called for in succession plans

Page 30

Data curation

“Access to and sharing of data are essential for the conduct and advancement of science”

— Arzberger et al. (2004), “Promoting access to public research data for scientific, economic, and social development,” Data Science Journal 3: 135-52, doi:10.2481/dsj.3.135

Pro-active curation of research outputs is necessary to ensure their ongoing viability and use

Good for research; good for researchers

Quicker, more innovative science; higher impact factor

Increasingly necessary for conformance to institutional policies, publication requirements, and funder mandates

Page 31

Data curation

Widespread adoption is dependent on outreach, education, and minimal intrusion into existing disciplinary workflows and common community practices

The most effective – and sustainable – curation services are composed from best-of-breed components

Libraries are a natural curation partner for the research community

Page 32

For more information

UC Curation Center

http://www.cdlib.org/uc3/

[email protected]

Stephen Abrams, Patricia Cruse, Shirin Faenza, Scott Fisher, Erik Hetzner, Joshua Hubbard, Greg Janée, John Kunze, Rosalie Lack, David Loy, Mark Reyes, Joan Starr, Carly Strasser, Marisa Strong, Bhavitavya Vedula, Kenneth Weiss, Perry Willett

DataShare, http://datashare.ucsf.edu/

Geoffrey Boushey, Anirvan Chatterjee, Maninder Kahlon, Julia Kochi, Megan Laurance, Angela Rizk-Jackson, Michael Weiner

Research Hub, http://hub.berkeley.edu/

Ian Crew, Michael McCarthy, Patrick McGrath, Noah Wittman