(Toward) Making Data Management Easy

Making Data Management Easy ALA Annual 2011 Joan Starr University of California Curation Center California Digital Library Toward…

Description: Data management presentation at ALA Annual for the ACRL STS Hot Topics meeting

Transcript of (Toward) Making Data Management Easy

Page 1: (Toward) Making Data Management Easy

(Toward) Making Data Management Easy

ALA Annual 2011

Joan Starr

University of California Curation Center

California Digital Library

Page 2: (Toward) Making Data Management Easy

STS Programs are sponsored by:

HOT TOPICS DISCUSSION GROUP

Page 3: (Toward) Making Data Management Easy

• Introductions
• The research life cycle
• Some examples from CDL/UC3 (curation micro-services and more!)
• …with a focus on EZID
• Discussion/Questions

Page 4: (Toward) Making Data Management Easy

California Digital Library (CDL)

Page 5: (Toward) Making Data Management Easy

University of California Curation Center, California Digital Library

Page 6: (Toward) Making Data Management Easy

Research has a life cycle.

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

[Life cycle diagram: DISCOVER, SHARE, CREATE, GATHER, PUBLISH, PRESERVE, ACCESS, COLLECT]

Page 7: (Toward) Making Data Management Easy

Librarians can jump in at any point.

Ims.photo: http://www.flickr.com/photos/bigblackbox/4805557065/

Page 8: (Toward) Making Data Management Easy

What this means for Data Management

TOOLS & SERVICES
• To enable data preservation
• To bake data curation into data creation
• To enhance data sharing, collecting and gathering
• To facilitate data publication

PARTNERSHIPS
• To promote data discovery and access
• To help researchers comply with new requirements

[Life cycle diagram: DISCOVER, SHARE, CREATE, GATHER, PUBLISH, PRESERVE, ACCESS, COLLECT]

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Page 9: (Toward) Making Data Management Easy

Examples from CDL & UC3

TOOLS & SERVICES
• Micro-services & Merritt
• DCXL
• WAS
• Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

[Life cycle diagram: DISCOVER, SHARE, CREATE, GATHER, PUBLISH, PRESERVE, ACCESS, COLLECT]

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Page 10: (Toward) Making Data Management Easy

Curation Micro-services

Individual small & self-contained components in custom combinations can solve complex problems.

photo by Joan Starr

Page 11: (Toward) Making Data Management Easy

Building blocks
https://confluence.ucop.edu/display/Curation/Home

• Persistent identifiers
• Persistent storage
• Fixity
• Replication
• Characterization
• Discovery
• Transformation
• Notification
• Annotation

Windell Oskay: http://www.flickr.com/photos/oskay/265899811
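
As an illustration of one building block from the list, fixity, here is a minimal sketch. It assumes fixity checking means comparing a digest recorded at deposit time with a freshly computed one. The function names, the SHA-256 default, and the sample file are illustrative assumptions and do not describe how the CDL micro-services are implemented.

```python
import hashlib
import os
import tempfile


def compute_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path, recorded_digest, algorithm="sha256"):
    """True when the file still matches the digest recorded at deposit time."""
    return compute_digest(path, algorithm) == recorded_digest


def demo():
    """Write a small file, record its digest, then alter it and re-check."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "observations.csv")  # invented sample file
        with open(path, "wb") as handle:
            handle.write(b"site,count\nA,3\n")
        recorded = compute_digest(path)
        before = fixity_ok(path, recorded)
        with open(path, "ab") as handle:
            handle.write(b"B,7\n")  # simulate an unintended change
        after = fixity_ok(path, recorded)
    return before, after
```

Calling `demo()` checks the file once before and once after the simulated change, so the two results show the check passing and then failing.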

Page 12: (Toward) Making Data Management Easy

Merritt is: Micro-services “Off the Shelf”
http://www.cdlib.org/services/uc3/merritt

• Persistent identifiers
• Persistent storage
• Fixity
• Replication
• Characterization
• Discovery
• Transformation
• Notification
• Annotation

EZID
CAN/Pairtree/Dflat/ReDD
Fixity
Replication
JHOVE2
XTF

Version 2

Page 13: (Toward) Making Data Management Easy

Merritt repository

Preservation back-end for existing discovery services

Dark archive for preservation masters

Integration with distributed data grids

Bright archive for preservation and end-user access

Page 14: (Toward) Making Data Management Easy

TOOLS & SERVICES
Micro-services & Merritt
• DCXL
• WAS
• Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

PRESERVE

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

From CDL/UC3

Page 15: (Toward) Making Data Management Easy

WHY EXCEL?

• CON: poor feature set and scalability compared to DBMSs

• PRO: ubiquity, familiarity, ease-of-use

DCXL: Data Curation Excel

Cody Simms: http://www.flickr.com/photos/jcodysimms/246023851

Page 16: (Toward) Making Data Management Easy

What an Excel add-in could do

• Permit standardized column headers
• Versioning and standard date formats
• Auto-archiving and persistent id assignment
• “Speed bumps” to discourage macros et al.

• NOTE: This will be released as OPEN SOURCE!
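
Two of the ideas above, standardized column headers and standard date formats, can be sketched outside Excel. The sketch below is not DCXL code. The function names, the snake_case header convention, the list of accepted input formats, and the choice of ISO 8601 output are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical input formats a spreadsheet author might have typed.
# Note the assumption that slash dates use US month/day/year ordering.
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y")


def standardize_header(header):
    """Lower-case a column header and join its words with underscores."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in header)
    return "_".join(cleaned.lower().split())


def standardize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

Returning `None` for an unrecognized date, rather than guessing, leaves the decision with the researcher, in the spirit of the "speed bumps" bullet.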

Page 17: (Toward) Making Data Management Easy

TOOLS & SERVICES
Micro-services & Merritt
DCXL
• WAS
• Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

From CDL/UC3

[Life cycle diagram: CREATE, PRESERVE]

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Page 18: (Toward) Making Data Management Easy

Web Archiving Service snapshot

Stats: Since January 2007
• 21 organizations using service
• 4,681 sites captured
• 44,468 captures run
• 26.4 terabytes
• 100+ archives under construction
• 35 archives published

In partnership with the IIPC consortium of national libraries.

Page 19: (Toward) Making Data Management Easy

Archiving the Gulf oil spill

Improving support for collaboration

• 946 sites
• 8,400+ captures
• 1.3 TB
• Began May 5

Page 20: (Toward) Making Data Management Easy

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
• Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

From CDL/UC3

[Life cycle diagram: SHARE, CREATE, GATHER, PRESERVE, COLLECT]

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Page 21: (Toward) Making Data Management Easy

The Data Paper Model

• Minimal: a cover sheet and a set of links to archived artifacts

• Best practice: citation elements (including persistent identifier)

Kevin Steele: http://www.flickr.com/photos/kevinsteele/20631162/

Page 22: (Toward) Making Data Management Easy

The Data Paper Model

Cover sheet with citation data: title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
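
The cover sheet elements named above can be pictured as a small record. This is a sketch only: the class name, field names, citation layout, and sample values (including the identifier) are invented for illustration and are not a format defined by the presentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CoverSheet:
    """Citation elements listed on the slide; field names are illustrative."""
    title: str
    date: str  # assumed to be an ISO 8601 date string, e.g. "2011-06-01"
    authors: List[str]
    abstract: str
    identifier: str  # persistent identifier such as a DOI or ARK

    def citation(self):
        """Format a one-line citation; this layout is one possible style."""
        names = "; ".join(self.authors)
        year = self.date[:4]
        return f"{names} ({year}). {self.title}. {self.identifier}"


# Invented sample values, used only to show the record in use.
example = CoverSheet(
    title="Example stream temperature readings",
    date="2011-06-01",
    authors=["Doe, J.", "Roe, R."],
    abstract="Placeholder abstract for an invented dataset.",
    identifier="ark:/99999/fk4example",
)
```

Because the persistent identifier is part of the record, any citation built from it keeps pointing at the dataset even if the files are moved.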

Page 23: (Toward) Making Data Management Easy

The Data Paper Model

• A data journal
– Incorporation of elements to enrich discovery, re-use, and archiving
– Discipline specific
– Peer reviewed

Page 24: (Toward) Making Data Management Easy

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
Data Paper model
• EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

From CDL/UC3

[Life cycle diagram: SHARE, CREATE, GATHER, PUBLISH, PRESERVE, COLLECT]

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Page 25: (Toward) Making Data Management Easy

An article about data, but no data

Page 26: (Toward) Making Data Management Easy

FTP site

And then the hunt for the data…

Page 27: (Toward) Making Data Management Easy

The EZID difference: data linked…

Page 28: (Toward) Making Data Management Easy

…to the scholarly publication

Page 29: (Toward) Making Data Management Easy

• Create a persistent identifier: DOI or ARK
• Add object location
• Add metadata
• Update object location
• Update object metadata
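
The five operations above can be modeled with a toy in-memory registry. This sketch is not the EZID API and makes no claim about how EZID works internally. The class, its method names, the identifier string, and the example URLs are hypothetical and chosen only to show the identifier staying fixed while its location and metadata change.

```python
class IdentifierRegistry:
    """Toy in-memory stand-in for an identifier service (not the EZID API)."""

    def __init__(self):
        self._records = {}

    def create(self, identifier, location=None):
        """Register a new identifier, optionally with its object location."""
        if identifier in self._records:
            raise ValueError(f"{identifier} already exists")
        self._records[identifier] = {"location": location, "metadata": {}}

    def set_location(self, identifier, location):
        """Add or update where the identifier points."""
        self._records[identifier]["location"] = location

    def set_metadata(self, identifier, **fields):
        """Add or update descriptive metadata fields."""
        self._records[identifier]["metadata"].update(fields)

    def resolve(self, identifier):
        """Return the current location for an identifier."""
        return self._records[identifier]["location"]

    def metadata(self, identifier):
        """Return a copy of the metadata recorded for an identifier."""
        return dict(self._records[identifier]["metadata"])


def demo():
    """Walk through the operations: create, locate, describe, then move."""
    registry = IdentifierRegistry()
    ark = "ark:/99999/fk4demo"  # invented identifier for illustration
    registry.create(ark)
    registry.set_location(ark, "http://old-server.example.org/data.csv")
    registry.set_metadata(ark, title="Demo dataset", creator="Doe, J.")
    first = registry.resolve(ark)
    registry.set_location(ark, "http://new-server.example.org/data.csv")
    registry.set_metadata(ark, title="Demo dataset, revised")
    return first, registry.resolve(ark), registry.metadata(ark)
```

In `demo()`, anything that cited the identifier before the move resolves to the new location afterward without the citation itself changing.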

Page 30: (Toward) Making Data Management Easy

Meeting researcher needs

• Early in the research life cycle
• Working on a federated team
• Making a career move
• Meeting funder requirements

Page 31: (Toward) Making Data Management Easy

Early in the research life cycle

Data-intensive research + Writing up the results

Where’s the data? What if I move it?

With EZID: all your references, citations, links, etc. will be stable!

by Dave Rogers http://www.flickr.com/photos/dave-rogers/2815036285/

Page 32: (Toward) Making Data Management Easy

Working on a federated team

©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5405812887

Data-intensive research + Regional research center + Aging infrastructure

Where’s the data? We have to move it!

With EZID: all your references, citations, links, etc. will be stable!

Page 33: (Toward) Making Data Management Easy

Making a career move

• Data-intensive research + • Researcher(s) on the move

I know where my data is and I’m taking it with me!

©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5406308654

With EZID: all your references, citations, links, etc. will be stable!

Page 34: (Toward) Making Data Management Easy

Meeting funder requirements

• Data-intensive research + • Grantor requirements for data management plan

How do we track the data?

What do we put here?

With EZID: track your data from capture to publication and beyond.

By David Mellis, http://www.flickr.com/photos/mellis/7675610/

Page 35: (Toward) Making Data Management Easy

Working with Libraries & Data Centers

• Libraries
– Extending an historic role
• Data Centers & Publishers
– Providing workflows and standards

Page 36: (Toward) Making Data Management Easy

EZID: Meeting library needs

• New kinds of scholarly output + • Continued need to build collections

©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5098256828

With EZID: you can extend your historic activities & preserve your institution’s research investment.

How do we keep track of all this new stuff?

Page 37: (Toward) Making Data Management Easy

EZID: Meeting data center needs

©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5325618610

• New demands for storage + • Changing landscape

With EZID: use simple tools, and easy workflows. Work with international standards.

They want what?

When?

Page 38: (Toward) Making Data Management Easy

http://n2t.net/ezid/

Page 39: (Toward) Making Data Management Easy

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
Data Paper model
EZID

PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool

Examples from CDL/UC3

[Life cycle diagram: SHARE, CREATE, GATHER, PUBLISH, PRESERVE, COLLECT]

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Page 40: (Toward) Making Data Management Easy

Working at the Network Level

enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it

1. Build on existing cyberinfrastructure
2. Create new cyberinfrastructure
3. Create new communities of practice

Page 41: (Toward) Making Data Management Easy

DataONE’s new infrastructure
https://www.dataone.org/

Page 42: (Toward) Making Data Management Easy

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
Data Paper model
EZID

PARTNERSHIPS
DataONE & DataCite
• Data Management Plan Tool

From CDL/UC3

[Life cycle diagram: DISCOVER, SHARE, CREATE, GATHER, PUBLISH, PRESERVE, ACCESS, COLLECT]

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Page 43: (Toward) Making Data Management Easy

Data Management Plan Tool
https://bitbucket.org/dmptool/main/wiki/Home

• Collaborative effort
• Funders’ data mgmt/sharing policies
• Journals’ (Nature, Science, and PLoS) data sharing requirements
• Researchers
– Distributing research results leads to increased citations (Piwowar et al., 2007)
– A shared, common data set may help researchers collaborate and accelerate discoveries (NY Times, 2010)
– Better organization, leading to easier preservation
– Cultivate quality and efficiency

Thanks to Jeffrey Loo, Chemical Informatics Librarian, UCB

Page 44: (Toward) Making Data Management Easy

Home screen: once logged in, the user is presented with a view of their work and options

[Screenshot with numbered callouts 1 to 3; University of California Libraries logo]

Page 45: (Toward) Making Data Management Easy

[Screenshot with numbered callouts 1 to 3; University of California Libraries logo]

Page 46: (Toward) Making Data Management Easy

TOOLS & SERVICES
Micro-services & Merritt
DCXL
WAS
Data Paper model
EZID

PARTNERSHIPS
DataONE & DataCite
Data Management Plan Tool

From CDL/UC3

[Life cycle diagram: DISCOVER, SHARE, CREATE, GATHER, PUBLISH, PRESERVE, ACCESS, COLLECT]

Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/

Page 47: (Toward) Making Data Management Easy

Summary: Just how easy is it for you?

• Build your own (Curation micro-services)
– specs
– code
• Open source tools
– DCXL
– Data Management Plan tool
• Off the shelf options
– Merritt
– EZID
– WAS

liquidnight: http://www.flickr.com/photos/liquidnight/3101493460/

Page 48: (Toward) Making Data Management Easy

& how easy is it for researchers?

• For organizing their data
– DCXL, EZID
• To keep their data safe
– Merritt, Micro-services
• To help them get grants
– Data Management Plan tool
• To help get their work noticed
– EZID, Data Papers
• To help them find other data
– EZID, Data Papers

liquidnight: http://www.flickr.com/photos/liquidnight/3101493460/

TOOLS!

Page 49: (Toward) Making Data Management Easy

• CURATECamp: unconference events connecting practitioners & technologists interested in digital curation and data management.

• Next f2f event: August 15 – 16, 2011, Stanford University, Palo Alto, California

• http://www.regonline.com/Register/Checkin.aspx?EventID=953543

• http://groups.google.com/group/digital-curation

• http://curatecamp.org/

But wait, there’s more: Community!

courtesy of Oxnard Public Library, http://content.cdlib.org/ark:/13030/kt6c600758

Page 50: (Toward) Making Data Management Easy

and more information!

UC Curation Center
http://www.cdlib.org/[email protected]

EZID
http://n2t.net/ezid/

Micro-services
http://www.cdlib.org/uc3/curation
http://groups.google.com/group/digital-curation

UC3/CDL: Stephen Abrams, David Loy, Patricia Cruse, Lisa Colvin, Scott Fisher, Mark Reyes, Erik Hetzner, Tracy Seneca, Greg Janée, Joan Starr, John Kunze, Marisa Strong, Margaret Low, Perry Willett

Page 51: (Toward) Making Data Management Easy

…and here’s how to find me.

Joan [email protected]

@joan_starr
http://www.slideshare.net/joanstarr

Page 52: (Toward) Making Data Management Easy

Image credits for Opening Slide

Optical Shop, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379477315
Streetcar, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379459127
Jazz Gumbo, Adam Reeder, http://www.flickr.com/photos/adamreeder/5380083448
Streetcar, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379459127
Boat, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379429155
Garden, ncpttmedia, http://www.flickr.com/photos/ncpttmedia/4008605841
Shutters, OZinOH, http://www.flickr.com/photos/75905404@N00/379444291
