Final data presentation_clir_july2014

42
Data Curation and Data Curation Services in Academic Libraries - an Overview Patricia Hswe | Pennsylvania State University Libraries Digital Content Strategist Head, ScholarSphere User Services CLIR Postdoctoral Fellowship Program Summer Seminar Bryn Mawr College | 30 July 2014 1

description

This presentation was given at Bryn Mawr College to the 2014-2016 cohort of CLIR postdoctoral fellows.

Transcript of Final data presentation_clir_july2014

Page 1: Final data presentation_clir_july2014

Data Curation and Data Curation Services in Academic Libraries - an Overview

Patricia Hswe | Pennsylvania State University Libraries Digital Content Strategist Head, ScholarSphere User Services

CLIR Postdoctoral Fellowship Program Summer Seminar Bryn Mawr College | 30 July 2014

1

Page 2: Final data presentation_clir_july2014

My ContextHow I Got Here

2

“Cubicle wall” - by sidewalk flying CC BY 2.0

Page 3: Final data presentation_clir_july2014

First CLIR postdoc cohortU. Illinois at Urbana-Champaign: Slavic & East European Library, 2004-2006

Slavic Digital Humanities Fellow

Digital projects, reference services, workshops Summer 2004

3

Page 4: Final data presentation_clir_july2014

Library and information science school - GSLIS

Digital Library track

Document Modeling; Use and Users; Interfaces to Information Systems; Digital Libraries; Information Modeling; Metadata; Information Transfer & Collaboration in Science; Ontology Development

Graduate and research assistantships - Digital preservation projects, Engineering and Mathematics libraries

4

Page 5: Final data presentation_clir_july2014

National Digital Information & Infrastructure Preservation

Program

Publishing and Curation Services Penn State University Libraries

{ Since 2008 }

5

Page 6: Final data presentation_clir_july2014

Template for the Talk

What is data curation?

Curation is not a solo undertaking.

Developing a repository service - one piece of the data curation puzzle.

Suggested essentials for getting started.

6

Page 7: Final data presentation_clir_july2014

What is data curation?What is data? What isn’t data curation? Purposes.

7

“20080528-002corrupted”- by nvshn CC BY SA 2.0

Page 8: Final data presentation_clir_july2014

Research data may be -

Observational (e.g., sensor data, data from surveys)

Experimental (e.g., gene sequencing data)

Simulation (e.g., climate modeling data)

Derived or compiled (e.g., text mining, 3D models)

Reference or canonical (e.g., static, peer-reviewed data sets, likely published and curated)

(from U. of Edinburgh, Information Services)

8

Page 9: Final data presentation_clir_july2014

Examples of data formats -

Text

Numerical

Multimedia

Models

9

Data specific to a discipline (e.g., Chrystallographic Information File, or CIF)

Data specific to an instrument (e.g., images from a digital microscope)

from U. Edinburgh, Information Services

Page 10: Final data presentation_clir_july2014

10“Walt Whitman’s Leaves of Grass” by dbking CC BY 2.0

Page 11: Final data presentation_clir_july2014

11

Page 12: Final data presentation_clir_july2014

What isn’t data curation?

Only backing up data

Not equal to data management, though overlaps

Digital preservation

Vs. digital curation

12

“Data” - by UWW ResNET CC BY-NC-SA 2.0

Page 13: Final data presentation_clir_july2014

13

Transformation (migration, derivatives)

Collection of data (selection and appraisal)

Documentation / description (metadata)

Ingest / preservation / storage

Dissemination / sharing access / use / reuse

Standards

Policies, rights,

conditions

I N T E R O P E R A B I L I T Y

“Gulliver’s collections” by Yohan Creemers CC BY-SA 2.0

INFRASTRUCTURE

Page 14: Final data presentation_clir_july2014

Examples of data curation activities -

Organizing data

Reformatting, compressing (e.g., .tar, .zip), normalizing

Planning for long-term security

Protecting, redacting, anonymizing data

Applying licenses to data

Providing data citation guidance

14

“Clean up, or you’re out! :Brooklyn Street Sign” - by emilydickinsonridesabmx CC BY 2.0

Page 15: Final data presentation_clir_july2014

Purposes: proper curation of research data should enable -

Verifiability

Understanding of data use

Identification of responsibility

“Findability”

New research

Improved guidelines, policies

15

“Ecosystem” - by moizissimo CC BY-NC-ND 2.0

Page 16: Final data presentation_clir_july2014

Keep in mind: use cases

Researchers’ goals

Current research workflows and desired workflows

Role / context of “data keeper”

Intended uses

16

Primary users of the data

Types of access

Frequency of access / use

Potential outcomes / impacts

Page 17: Final data presentation_clir_july2014

Curation is not a solo undertakingThinking beyond models

17“Soloist” - by Chris JL CC BY-NC-ND 2.0

Page 18: Final data presentation_clir_july2014

18

Plan research/ design methodology

Collect/Generate data

Document/Describe

Process/Analyze data

Preserve (Prepare for ingest)

Store data

Share/Access data

Create derivatives / reuse

A mashup of a few models

Page 19: Final data presentation_clir_july2014

19

Plan research/ design methodology

Collect/Generate data

Document/Describe

MANAGE DATA

Process/Analyze data

Preserve (Prepare for ingest)

MANAGE DATA

Store data MANAGE DATA

Share/Access data MANAGE DATA

Create derivatives / reuse

MANAGE DATA

Page 20: Final data presentation_clir_july2014

20

Plan research/ design methodology

Collect/Generate data

Document/Describe

MANAGE DATA

Process/Analyze data

Preserve (Prepare for ingest)

MANAGE DATA

Store data MANAGE DATA

Share/Access data MANAGE DATA

Create derivatives / reuse

MANAGE DATA C O L L A B O R A T I O N

C O L L A B O R A T I O N

Researchers

Researchers

Researchers

ResearchersData curators /

librarians, ITData curators /

librarians, IT

Data curators / librarians, IT

Data curators / librarians, IT

Page 21: Final data presentation_clir_july2014

21

Plan research/ design methodology

Collect/Generate data

Document/Describe

MANAGE DATA

Process/Analyze data

Preserve (Prepare for ingest)

MANAGE DATA

Store data MANAGE DATA

Share/Access data MANAGE DATA

Create derivatives / reuse

MANAGE DATA C O L L A B O R A T I O N

C O L L A B O R A T I O N

Researchers

Researchers

Researchers

ResearchersData curators /

librarians, IT

Publishers

Data curators / librarians, IT

Data curators / librarians, IT

Data curators / librarians, IT

Page 22: Final data presentation_clir_july2014

22

Plan research/ design methodology

Collect/Generate data

Document/Describe

MANAGE DATA

Process/Analyze data

Preserve (Prepare for ingest)

MANAGE DATA

Store data MANAGE DATA

Share/Access data MANAGE DATA

Create derivatives / reuse

MANAGE DATA C O L L A B O R A T I O N

C O L L A B O R A T I O N

Researchers

Researchers

Researchers

ResearchersData curators /

librarians, IT

Publishers

Data curators / librarians, IT

Data curators / librarians, IT

Data curators / librarians, IT

Users of the

data!

Page 23: Final data presentation_clir_july2014

Curation involves building and sustaining -

Community Partnerships Integration

toward solutions

23

“Relationships” - by beatnic CC BY 2.0

Page 24: Final data presentation_clir_july2014

Examples of libraries leading in data curation services

Purdue University Libraries

The Library - UC San Diego

NYPL Labs

University of Minnesota Libraries

24

Page 25: Final data presentation_clir_july2014

Developing a Repository ServiceOne piece of the data curation puzzle

25

Page 26: Final data presentation_clir_july2014

ScholarSphere didn’t happen overnight.

26

Page 27: Final data presentation_clir_july2014

Text

Platform review, pilot project, prototype2010 through 2011

27

Building a Community of Practice: The Curation Architecture Prototype Services (CAPS) Project at Penn State University

!BACKGROUND:!Review!of!Legacy!Pla:orms! !Myriad,!unconnected!workflows!

 !Two!pla6orms!effec9vely!moribund!

 !Another!pla6orm!without!upgrades!since!2007!

 !Impossible!to!search!content!across!pla6orms!

 !No!support!for!eErecords,!liFle!for!research!data!sets! !Lack!of!unifying!architecture!

Co#authors:+Michelle+Belden,+Kevin+Clair,+Daniel+Coughlin,++Michael+Giarlo,+Patricia+Hswe,+and+Linda+Klimczyk++

CAPS%architecture%–%our%point%of%departure%

Dashboard%interface%giving%view%of%objects%in%system%%

Wiki%where%stakeholders%documented%testing%of%CAPS%

How!Prototyping!a!CuraCon!Service!Led!to!Building!a!Community!of!PracCce!–!The!Story!of!CAPS!

 !Aggressive!Cme!line!–!January!to!March!2011!

 !Key!project!roles:!project!manager,!digital!library!architect,!programmer!analyst,!metadata!

librarian,!access!archivist,!and!digital!collec9ons!curator!

 !Use!cases!kickEstarted!prototyping! !Daily!meeCngs!allowed!team!to!check!in!briefly!for!updates!&!discussion!

 !Weekly!mee9ngs!with!stakeholders!(archivists,!subject!experts,!digi9za9on!staff)!!

 !Stakeholders!were!consulted!early!on!for!metadata!needs,!with!concerted!aFen9on!to!crea9ng!a!framework!allowing!for!annota9on!of!objects!throughout!their!lifecycle!!

 !Followed!agile!methods!to!drive!development,!collabora9on,!feedback,!itera9ons!

 !Listened!to!&!learned!from!stakeholders,!being!as!responsive!as!possible! !User!requirements!were!prioriCzed,!but!all!of!them!were!documented;!no!need!was!dismissed!–!both!to!inform!future!development!but!also!to!assure!stakeholders!their!

perspec9ves!counted!and!were!crucial!to!future!development!and!itera9ons!

 !Survey!of!stakeholders!at!project’s!end!revealed!they!felt!listened!to!by!the!project!team,!

and!the!resul9ng!prototype!reflected!their!requirements!and!deliverables!

 !Next!steps:!chart!a!produc9on!path!and!priori9ze!pilot!projects!for!further!tes9ng!

Prototype%of%public%interface%for%curated%collections%

Ingest%tool%interface%to%upload%and%describe%digital%object%

Digital!Library!Technologies!

CAPS Poster - https://scholarsphere.psu.edu/files/r781wj67c

Page 28: Final data presentation_clir_july2014

In 2012: We did 9 months of development before beta release . . .

28

Page 29: Final data presentation_clir_july2014

. . . 9 months during which we prioritized user engagement

Enlisted liaison librarians

Held bi-weekly “stakeholder” meetings to discuss use cases

Met with researchers to gauge their needs, learn their use cases

Demoed the evolving tool, got feedback

Frequent communications about progress

Usability testing w/ librarians, faculty & students; developers present

29

“011 Researcher by Equipment” - by IPASadelaide CC BY 2.0

Page 30: Final data presentation_clir_july2014

30

Page 31: Final data presentation_clir_july2014

31

Page 32: Final data presentation_clir_july2014

32

Page 33: Final data presentation_clir_july2014

Uses of ScholarSphere include -

Supplemental files for electronic theses and dissertations (ETDs)

Compliance with NSF, NIH, and other funding agency requirements

“Legacy” data sets - i.e., “I need to get these off my hard drive” data files

Data sets linked to journal articles (publisher requirement)

Student research (“scholarship as pedagogy”)

Service / platform to help diversify what Libraries collect33

Page 34: Final data presentation_clir_july2014

34

Page 35: Final data presentation_clir_july2014

35

Page 36: Final data presentation_clir_july2014

We aimed for breadth, rather than depth, initially.Because we knew we’d be evolving and improving the service continually. And we are.

36

Page 37: Final data presentation_clir_july2014

37

!!!- 10 releases since 2012 - Next release will be ScholarSphere 2.0 in fall 2014: new UI/UX - Expectation management is a thing!

Page 38: Final data presentation_clir_july2014

How engagement is paying off

Inquiries from researchers looking for help with managing their data >> new partners.

Subject specialist inquiries about the service >> more faculty and student users.

Inquiries from departments to use service for student scholarship >> more content.

Feature requests from users >> improved service.

Librarians + researchers understand better the importance of data & data curation >> opportunities for infrastructure, collections, publishing, partnerships, outreach, new roles.

38

Page 39: Final data presentation_clir_july2014

Current work, next stepsCurrent: ScholarSphere Users Group and incremental feedback on UI/UX work

Usage statistics

More integration with cloud services

Marketing and promotion / outreach (ongoing)

Next: mediated deposit service, “work” data model, different metadata templates, citation support (DOIs), notification framework, Zotero

39

Page 40: Final data presentation_clir_july2014

Suggested essentials- 1

Find out who your collaborators will be.

Who do you want to collaborate with?

Learn the technology infrastructure at your library and institution.

Be your own use case.

Learn the basics of project management.

Aim to work in partnership with researchers. 40

Page 41: Final data presentation_clir_july2014

Suggested essentials - 2

Discover new mentors.

Definitely seek out training that skills you up - problem-based.

Also: MANTRA, online data mgmt course.

Looking for examples of digital curation in practice? There’s an active Google group for this.

Cultivate a sense of experimentation & of play by trying out tools and applications new to you.

41

Page 42: Final data presentation_clir_july2014

Text

Thank you!Keep in touch: [email protected]

42