(Toward) Making Data Management Easy
Toward… Making Data Management Easy
ALA Annual 2011
Joan Starr
University of California Curation Center
California Digital Library
STS Programs are sponsored by:
HOT TOPICS DISCUSSION GROUP
• Introductions
• The research life cycle
• Some examples from CDL/UC3 (curation micro-services and more!)
• …with a focus on EZID
• Discussion/Questions
California Digital Library (CDL)
University of California Curation Center, California Digital Library
Research has a life cycle.
Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/
DISCOVER
SHARE
CREATE
GATHER
PUBLISH
PRESERVE
ACCESS
COLLECT
Librarians can jump in at any point.
Ims.photo: http://www.flickr.com/photos/bigblackbox/4805557065/
TOOLS & SERVICES
• To enable data preservation
• To bake data curation into data creation
• To enhance data sharing, collecting and gathering
• To facilitate data publication
PARTNERSHIPS
• To promote data discovery and access
• To help researchers comply with new requirements
What this means for Data Management
TOOLS & SERVICES
• Micro-services & Merritt
• DCXL
• WAS
• Data Paper model
• EZID
PARTNERSHIPS
• DataONE & DataCite
• Data Management Plan Tool
Examples from CDL & UC3
Curation Micro-services
Individual small & self-contained components in custom combinations can solve complex problems.
photo by Joan Starr
• Persistent identifiers
• Persistent storage
• Fixity
• Replication
• Characterization
• Discovery
• Transformation
• Notification
• Annotation
Building blocks
https://confluence.ucop.edu/display/Curation/Home
Windell Oskay: http://www.flickr.com/photos/oskay/265899811
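Of the building blocks listed above, fixity is perhaps the easiest to illustrate in a few lines. The following is a minimal sketch, not the UC3 implementation: it assumes that a fixity check means recomputing a file's digest and comparing it with the value recorded when the file was first stored. The function names `compute_digest` and `check_fixity` and the default of SHA-256 are assumptions made for this example.

```python
import hashlib
from pathlib import Path


def compute_digest(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, recorded_digest, algorithm="sha256"):
    """Return True if the file still matches the digest recorded earlier."""
    return compute_digest(path, algorithm) == recorded_digest.lower()
```

Keeping the check this small reflects the micro-services idea: a separate replication or notification component could call `check_fixity` without knowing how digests are computed.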
• Persistent identifiers
• Persistent storage
• Fixity
• Replication
• Characterization
• Discovery
• Transformation
• Notification
• Annotation
Version 2
Merritt is: Micro-services “Off the Shelf”
http://www.cdlib.org/services/uc3/merritt
EZID
CAN/Pairtree/Dflat/ReDD
Fixity
Replication
JHOVE2
XTF
Merritt repository
Preservation back-end for existing discovery services
Dark archive for preservation masters
Integration with distributed data grids
Bright archive for preservation and end-user access
PRESERVE
Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/
From CDL/UC3
WHY EXCEL?
• CON: poor feature set and scalability compared to DBMSs
• PRO: ubiquity, familiarity, ease-of-use
DCXL: Data Curation Excel
Cody Simms: http://www.flickr.com/photos/jcodysimms/246023851
What an Excel add-in could do
• Permit standardized column headers
• Versioning and standard date formats
• Auto-archiving and persistent id assignment
• “Speed bumps” to discourage macros et al.
• NOTE: This will be released as OPEN SOURCE!
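Two of the ideas above, standardized column headers and standard date formats, can be sketched outside Excel. The example below is not DCXL code and does not use any DCXL API; it is a small Python illustration under stated assumptions. The function names, the header-cleaning rule, and the list of accepted input date formats are all hypothetical choices made for this sketch.

```python
import re
from datetime import datetime

# Input date formats this sketch accepts (an assumption, not a DCXL list).
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y")


def standardize_header(header):
    """Lower-case a header and replace runs of non-alphanumerics with '_'."""
    cleaned = re.sub(r"[^0-9a-z]+", "_", header.strip().lower())
    return cleaned.strip("_")


def standardize_date(text):
    """Return the date in ISO 8601 form (YYYY-MM-DD), or raise ValueError."""
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Try the next candidate format.
    raise ValueError(f"unrecognized date format: {text!r}")
```

Under these assumptions, a header such as `Water Temp. (C)` becomes `water_temp_c`, and `07/04/2011` (read as month/day/year) becomes `2011-07-04`.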
From CDL/UC3
CREATE
PRESERVE
Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/
Web Archiving Service snapshot
Stats: Since January 2007
21 organizations using service
4,681 sites captured
44,468 captures run
26.4 terabytes
100+ archives under construction
35 archives published
In partnership with the IIPC consortium of national libraries.
Archiving the Gulf oil spill
Improving support for collaboration
946 sites
8,400+ captures
1.3 TB
Began May 5
From CDL/UC3
SHARE
CREATE
GATHER
PRESERVE
COLLECT
Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/
The Data Paper Model
• Minimal: a cover sheet and a set of links to archived artifacts
• Best practice: citation elements (including persistent identifier)
Kevin Steele: http://www.flickr.com/photos/kevinsteele/20631162/
The Data Paper Model
1. Cover sheet with citation data
2. title, date, authors, abstract, and persistent identifier (DOI, ARK, etc.)
• A data journal
– Incorporation of elements to enrich discovery, re-use, and archiving
– Discipline specific
– Peer reviewed
The Data Paper Model
From CDL/UC3
SHARE
CREATE
GATHER
PUBLISH
PRESERVE
COLLECT
Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/
An article about data, but no data
FTP site
And then the hunt for the data…
The EZID difference: data linked…
…to the scholarly publication
• Create a persistent identifier: DOI or ARK
• Add object location
• Add metadata
• Update object location
• Update object metadata
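The operations listed above can be pictured with a toy in-memory registry. This is not the EZID API and makes no network calls; it is a sketch showing why a citation stays stable when data moves: the identifier string never changes, only the location stored behind it. The class name `IdentifierRegistry`, its method names, and the identifier prefix are all hypothetical and chosen for this example.

```python
import itertools


class IdentifierRegistry:
    """Toy registry mapping identifier -> {"target": url, "metadata": dict}."""

    def __init__(self, prefix="ark:/99999/fk4"):
        # The prefix is a placeholder chosen for this sketch.
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._records = {}

    def mint(self, target, metadata=None):
        """Create a new identifier with an object location and metadata."""
        identifier = f"{self._prefix}{next(self._counter):04d}"
        self._records[identifier] = {
            "target": target,
            "metadata": dict(metadata or {}),
        }
        return identifier

    def update_target(self, identifier, new_target):
        """Point an existing identifier at the object's new location."""
        self._records[identifier]["target"] = new_target

    def update_metadata(self, identifier, **changes):
        """Merge changed metadata fields into the existing record."""
        self._records[identifier]["metadata"].update(changes)

    def resolve(self, identifier):
        """Return the current location registered for an identifier."""
        return self._records[identifier]["target"]
```

In this model, a paper that cites the identifier returned by `mint` keeps working after `update_target` is called, because `resolve` always returns the most recently registered location.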
Meeting researcher needs
• Early in the research life cycle
• Working on a federated team
• Making a career move
• Meeting funder requirements
Data-intensive research + Writing up the results
Where’s the data? What if I move it?
Early in the research life cycle
With EZID: all your references, citations, links, etc. will be stable!
by Dave Rogers http://www.flickr.com/photos/dave-rogers/2815036285/
Working on a federated team
©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5405812887
Data-intensive research + Regional research center + Aging infrastructure
Where’s the data? We have to move it!
With EZID: all your references, citations, links, etc. will be stable!
Making a career move
• Data-intensive research + • Researcher(s) on the move
I know where my data is and I’m taking it with me!
©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5406308654
With EZID: all your references, citations, links, etc. will be stable!
Meeting funder requirements
• Data-intensive research + • Grantor requirements for data management plan
How do we track the data?
What do we put here?
With EZID: track your data from capture to publication and beyond.
By David Mellis, http://www.flickr.com/photos/mellis/7675610/
Working with Libraries & Data Centers
• Libraries
– Extending an historic role
• Data Centers & Publishers
– Providing workflows and standards
EZID: Meeting library needs
• New kinds of scholarly output + • Continued need to build collections
©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5098256828
With EZID: you can extend your historic activities & preserve your institution’s research investment.
How do we keep track of all this new stuff?
EZID: Meeting data center needs
©All rights reserved by University of California, http://www.flickr.com/photos/universityofcalifornia/5325618610
• New demands for storage + • Changing landscape
With EZID: use simple tools and easy workflows. Work with international standards.
They want what?
When?
http://n2t.net/ezid/
Examples from CDL/UC3
SHARE
CREATE
GATHER
PUBLISH
PRESERVE
COLLECT
Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/
enable new science and knowledge creation through universal access to data about life on earth and the environment that sustains it
1. Build on existing cyberinfrastructure
2. Create new cyberinfrastructure
3. Create new communities of practice
Working at the Network Level
DataONE’s new infrastructure
https://www.dataone.org/
From CDL/UC3
DISCOVER
SHARE
CREATE
GATHER
PUBLISH
PRESERVE
ACCESS
COLLECT
Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/
Data Management Plan Tool
https://bitbucket.org/dmptool/main/wiki/Home
• Collaborative effort
• Funders’ data mgmt/sharing policies
• Journals’ (Nature, Science, and PLoS) data sharing requirements
• Researchers
– Distributing research results leads to increased citations (Piwowar et al., 2007)
– A shared, common data set may help researchers collaborate and accelerate discoveries (NY Times, 2010)
– Better organization, leading to easier preservation
– Cultivate quality and efficiency
Thanks to Jeffrey Loo, Chemical Informatics Librarian, UCB
Home screen: once logged in, the user is presented with a view of their work and options.
From CDL/UC3
DISCOVER
SHARE
CREATE
GATHER
PUBLISH
PRESERVE
ACCESS
COLLECT
Judy Baxter, http://www.flickr.com/photos/judybaxter/9825836/
Summary: Just how easy is it for you?
• Build your own (Curation micro-services)
– specs
– code
• Open source tools
– DCXL
– Data Management Plan tool
• Off the shelf options
– Merritt
– EZID
– WAS
liquidnight: http://www.flickr.com/photos/liquidnight/3101493460/
& how easy is it for researchers?
• For organizing their data
– DCXL, EZID
• To keep their data safe
– Merritt, Micro-services
• To help them get grants
– Data Management Plan tool
• To help get their work noticed
– EZID, Data Papers
• To help them find other data
– EZID, Data Papers
liquidnight: http://www.flickr.com/photos/liquidnight/3101493460/
TOOLS!
• CURATECamp: unconference events connecting practitioners & technologists interested in digital curation and data management.
• Next f2f event: August 15 – 16, 2011, Stanford University, Palo Alto, California
• http://www.regonline.com/Register/Checkin.aspx?EventID=953543
• http://groups.google.com/group/digital-curation
• http://curatecamp.org/
But wait, there’s more: Community!
courtesy of Oxnard Public Library, http://content.cdlib.org/ark:/13030/kt6c600758
and more information!
UC Curation Center
http://www.cdlib.org/[email protected]
EZID
http://n2t.net/ezid/
Micro-services
http://www.cdlib.org/uc3/curation
http://groups.google.com/group/digital-curation
UC3/CDL
Stephen Abrams, David Loy, Patricia Cruse, Lisa Colvin, Scott Fisher, Mark Reyes, Erik Hetzner, Tracy Seneca, Greg Janée, Joan Starr, John Kunze, Marisa Strong, Margaret Low, Perry Willett
Image credits for Opening Slide
Optical Shop, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379477315
Streetcar, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379459127
Jazz Gumbo, Adam Reeder, http://www.flickr.com/photos/adamreeder/5380083448
Boat, Adam Reeder, http://www.flickr.com/photos/adamreeder/5379429155
Garden, ncpttmedia, http://www.flickr.com/photos/ncpttmedia/4008605841
Shutters, OZinOH, http://www.flickr.com/photos/75905404@N00/379444291