Data Library Services In The Data Stewardship Lifecycle

Data Library Services in the Data Stewardship Lifecycle Charles (Chuck) Humphrey University of Alberta

Transcript of Data Library Services In The Data Stewardship Lifecycle

Page 1: Data Library Services In The Data Stewardship Lifecycle

Data Library Services in the Data Stewardship Lifecycle

Charles (Chuck) Humphrey, University of Alberta

Page 2: Data Library Services In The Data Stewardship Lifecycle

Outline

• Canada as a case study for data library services: a twenty-year experiment

• Lessons learned from Canada

• General observations about forces shaping data library services

• Data and other digital collections

• The data “continuum of access” in collection development

• Data reference and technical services

• Planning levels of service for data libraries

• Applying a data stewardship lifecycle model

Page 3: Data Library Services In The Data Stewardship Lifecycle

The Canadian experience

Timeline (1970 to 2010):

Introduction of public use data products from the 1971 Census in digital format

A set of the 1981 Census data products cost ~$12,000

The 1986 Census data products cost more than $200,000

CARL Census Data Consortium was formed in 1989

The “Modern” Census Era

Page 4: Data Library Services In The Data Stewardship Lifecycle

The Canadian experience

• 1989 is a benchmark year in the development of data library services in Canada, which arose out of a response to Statistics Canada’s new pricing policy mandated by the Conservative government in power.

Timeline (1970 to 2010): CARL Census Data Consortium was formed in 1989

Page 5: Data Library Services In The Data Stewardship Lifecycle

Data Library Context in 1989

• 8 data libraries: 3 in the west, 5 in the east

• 3 in libraries, 2 in academic computing centres, 2 in research centres, 1 library hybrid

Page 6: Data Library Services In The Data Stewardship Lifecycle

Data Library Context in 1989

• 8 data libraries: 3 in the west, 5 in the east

• 3 in libraries, 2 in academic computing centres, 2 in research centres, 1 library hybrid

The closest data library to my service was 1,200 km away.

Page 7: Data Library Services In The Data Stewardship Lifecycle

Data Library Context in 2009

• 75 data library services: 25 in the west, 50 in the east

• All 75 are located in libraries

Page 8: Data Library Services In The Data Stewardship Lifecycle

Changes between 1989 and 1998

Timeline of events:

CARL Census Data Consortium, 1989

CARL General Social Survey Microdata Consortium, 1991

Annual COPPUL Data Service Training Workshops, 1992

COPPUL/ICPSR Federation, 1993

CARL Data Consortium for the 1991 Census, 1994

Ontario-Quebec/ICPSR Federation, 1994

Data Liberation Initiative Pilot Launched, 1996

DLI Regional Training Workshops, 1997

Page 9: Data Library Services In The Data Stewardship Lifecycle

Changes between 1999 and 2009

Timeline of events:

Research Data Centre Network, 2000

National Data Archive Consultation, 2001-2002

DLI Train the Trainers Workshop, 2004

Consultation on Access to Scientific Research Data, 2005

Canadian Digital Information Strategy, 2007

Research Data Strategy Working Group, 2008

Page 10: Data Library Services In The Data Stewardship Lifecycle

General lessons from Canada

• Collections were a driving force behind libraries introducing data library services.

• Institutions working through cooperative arrangements helped introduce data as a library resource.

• Collection development at the local level was largely driven by the general availability of data.

• Training has been an ongoing factor in the continued participation of libraries in data collection initiatives.

Page 11: Data Library Services In The Data Stewardship Lifecycle

General lessons from Canada

• Peer-to-peer training has been an effective method in DLI, using the general rule that as you learn, you teach others.

• Training has allowed for differences in data needs and data cultures across institutions and regions in the country.

• Training opportunities have been continuous through annual regional workshops. DLI workshops have become an expectation.

• IASSIST conferences provide an immersion in data services and should be viewed as a training opportunity.

Page 12: Data Library Services In The Data Stewardship Lifecycle

General lessons from Canada

• National consultations and international pressures have made data a well-discussed topic in this decade but have failed to make data a political priority.

• While everyone seems to be talking about data, few are actually doing anything to address concerns about data access and preservation.

• Part of the inability to mobilise a collective response to data access and preservation in Canada is the absence of a forum for data people to plan and coordinate work together. The Research Data Strategy Working Group is a first attempt at this.

Page 13: Data Library Services In The Data Stewardship Lifecycle

Collection development

• Data collections are part of a growing number of digital collections being managed in today’s libraries.

• Libraries face buying or leasing these collections, producing their own through digitisation projects, or serving as stewards for collections that are entrusted to them.

• In Canada, data collections have tended to be leased. Most data licenses require that the producer’s data be destroyed once a lease is terminated. One result is that these leases have become long-term commitments by libraries.

Page 14: Data Library Services In The Data Stewardship Lifecycle

Collection development

• With leased data, the role of a data librarian becomes one of managing the contractual relationship between data producers and her or his local institution.

• Collection development consists of choosing data producers that have data corresponding to patron needs on campus. Often these omnibus collections have a mix of data that will support a variety of research interests. One can characterise these as “collections of access.”

• One strategy in Canada has been to select data collections that support a continuum of access to products.

Page 15: Data Library Services In The Data Stewardship Lifecycle

Continuum of access

Open access: free access; published statistics

Conditional access: fees for access; anonymised data and aggregate databases

Restricted access: expensive access; confidential data

Page 16: Data Library Services In The Data Stewardship Lifecycle

Continuum of access

Open access: free access; published statistics; web access

Conditional access: fees for access; anonymised data and aggregate databases; web and in-person access

Restricted access: expensive access; confidential data; data enclave access

Page 17: Data Library Services In The Data Stewardship Lifecycle

Collection preservation

• Unlike many other countries that have national data archives, Canada is without an institution providing stewardship for the long-term preservation of and access to data.

• This institutional gap in Canada is now being addressed by proposals to establish trusted data repositories in some universities. The goal is to build a network of repositories nationally to preserve data collections.

• Data libraries, to a limited extent, have helped fill the gap in the absence of a national data archive.

Page 18: Data Library Services In The Data Stewardship Lifecycle

Data reference

• Data reference is dependent on the level of service being supported, which can include:

  • locating data that has been requested by title,

  • finding data to support a line of research enquiry,

  • interpreting data documentation,

  • extracting subsets of data and providing the data in a format directly useable by a patron,

  • merging and manipulating data files to produce new data products,

  • providing advice to researchers throughout a project on metadata and data management.
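As a rough illustration of the subsetting activity in this list, the sketch below pulls selected variables and cases out of a delimited file and returns them as CSV text a patron could open directly. It uses only Python's standard `csv` module. The function name `extract_subset`, the sample records and the variable names are hypothetical and stand in for whatever a real data file contains.

```python
import csv
import io


def extract_subset(source, columns, keep_row):
    """Return CSV text with only `columns`, for rows where keep_row(row) is true."""
    reader = csv.DictReader(source)
    out = io.StringIO()
    # extrasaction="ignore" drops every variable that was not requested.
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if keep_row(row):
            writer.writerow(row)
    return out.getvalue()


# Hypothetical survey extract; these variables are invented for the example.
SAMPLE = "id,province,age,income\n1,AB,34,52000\n2,ON,51,61000\n3,AB,29,48000\n"

subset = extract_subset(
    io.StringIO(SAMPLE),
    columns=["id", "age"],
    keep_row=lambda row: row["province"] == "AB",
)
print(subset)
```

In practice the same request is often handled in a statistical package. The point is only that the patron receives the cases and variables needed rather than the whole file.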

Page 19: Data Library Services In The Data Stewardship Lifecycle

Technical services

• Metadata for data collections should include (i) a general item description and (ii) a detailed content description that documents the data for machine processing as well as human understanding.

• A general item description in MARC format is typically produced for online catalogues and may be generated by the data producer or local bibliographic services.

• The detailed content metadata is generated by the data producer and can be delivered in a variety of formats and in conventions that often are not based on standards.

Page 20: Data Library Services In The Data Stewardship Lifecycle

Technical services

• The computing support will depend on the level of service being provided, just as collection development and data reference services depend on a defined level of service.

• Web 2.0 services for data delivery are tempered by the license agreements with data producers. Typically, institutions are required to use IDs and passwords to access data holdings.

• As federated authentication systems become more widely shared across institutions, redundant storage of data collections will lessen as institutions share the physical storage of data.

Page 21: Data Library Services In The Data Stewardship Lifecycle

Planning levels of service

• The importance of levels of service has been mentioned repeatedly in the context of data collections, data reference and technical services.

• What are the options for levels of service? How does one go about planning for levels of service? What co-dependencies must be included in data service plans?

• These questions can be addressed using a new framework based on the data stewardship lifecycle.

Page 22: Data Library Services In The Data Stewardship Lifecycle

A framework for planning data services

• The concept of data stewardship in combination with a lifecycle model of data provides a useful tool for planning data library services.

• Data stewardship identifies the roles and responsibilities of all individuals and groups engaged in the production, access and preservation of data throughout its lifecycle.

• A data service plan should clearly state the roles and responsibilities identified with the level of service to be supported.

Page 23: Data Library Services In The Data Stewardship Lifecycle

A framework for planning data services

• The lifecycle of data is a representation of the various stages through which data flow from production to use to preservation to new uses.

• Each stage consists of a set of related activities that culminate in a significant product, which is then passed to a subsequent stage.

• By linking together a series of stages in logical sequence, the processes of data production and use are described.

Page 24: Data Library Services In The Data Stewardship Lifecycle

Lifecycle of data

• As with any project management operation, the views of a project vary depending on the granularity at which activities are described.

• Similarly, the stages in the data’s lifecycle can be aggregated or disaggregated into larger or smaller groupings, depending on the viewpoint one desires.

• Keep these points in mind while examining a couple of lifecycle representations.

• The first model is the widely circulated DCC curation lifecycle.

Page 25: Data Library Services In The Data Stewardship Lifecycle

http://www.dcc.ac.uk/lifecycle-model/

Page 26: Data Library Services In The Data Stewardship Lifecycle

http://www.dcc.ac.uk/lifecycle-model/

Page 27: Data Library Services In The Data Stewardship Lifecycle
Page 28: Data Library Services In The Data Stewardship Lifecycle

This table lists changes to the stages in the DCC model, re-aggregating activities in the lifecycle to create a data library viewpoint.

DCC                              Data Lib
create or receive                data production
appraisal and select             dissemination
ingest, store, access and use    data repository
(none)                           discovery
transform                        repurpose
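The re-aggregation in this table can also be written down as a simple lookup, which may help when tagging service activities by stage. This is only a sketch: the names `DCC_TO_DATA_LIBRARY`, `DATA_LIBRARY_STAGES` and `dcc_stages_for` are hypothetical, and the row labels are copied from the table as given.

```python
# DCC stage grouping -> data library stage, following the rows of the table.
DCC_TO_DATA_LIBRARY = {
    "create or receive": "data production",
    "appraisal and select": "dissemination",
    "ingest, store, access and use": "data repository",
    "transform": "repurpose",
}

# "discovery" appears only on the data library side of the table.
DATA_LIBRARY_STAGES = [
    "data production",
    "dissemination",
    "data repository",
    "discovery",
    "repurpose",
]


def dcc_stages_for(library_stage):
    """List the DCC groupings that were folded into one data library stage."""
    return [
        dcc for dcc, lib in DCC_TO_DATA_LIBRARY.items() if lib == library_stage
    ]


for stage in DATA_LIBRARY_STAGES:
    print(stage, "<-", dcc_stages_for(stage) or "no DCC counterpart listed")
```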

Page 29: Data Library Services In The Data Stewardship Lifecycle

Data lifecycle for data libraries

[Figure: lifecycle diagram with five stages: Data Production, Data Dissemination, Data Repository, Data Discovery, Data Repurposing]

Page 30: Data Library Services In The Data Stewardship Lifecycle

Data production

• Stewardship role:

  • Responsible for the terms of data use specified in the license of the data producer;

  • Serve as lifecycle advisor to local data producers.

• Potential data services activities:

  • Help local project managers develop data plans incorporating a lifecycle perspective;

  • Provide researchers who are collecting data on human subjects with support statements for their ethics approval applications;

  • Provide researchers with support statements on data management in grant applications;

Page 31: Data Library Services In The Data Stewardship Lifecycle

Data production

• Provide feedback to data producers about data in demand on campus;

• Provide data producers with usage statistics on their data;

• Assist with literature and data searches in the study design stage;

• Consult with local data producers on metadata standards for data documentation;

• Organise training on the DDI metadata standard;

• Provide data preservation services throughout the data production stage.

Page 32: Data Library Services In The Data Stewardship Lifecycle

Data dissemination

• Stewardship role:

  • Responsible for communicating the terms of the license with patrons;

  • Ensure the data products that are delivered are complete, documented and machine readable;

  • Ensure the appropriate level of security is maintained for the data.

• Potential data services activities:

  • Monitor the release dates of data from producers;

  • Acquire data and metadata from data producers;

  • Prepare catalogue records for data titles;

Page 33: Data Library Services In The Data Stewardship Lifecycle

Data dissemination

• Develop and maintain local access to data, providing formats appropriate for local needs;

• Support infrastructure that provides online access to data;

• Provide data dissemination services for researchers on campus;

• Provide data anonymisation services for human-subject data collected on campus;

• Coordinate the deposit of local research data with a data archive or repository.

Page 34: Data Library Services In The Data Stewardship Lifecycle

Data repository

• Stewardship role:

  • Responsible for the data collection in local repository;

  • Responsible for service plan and operation of local repository;

  • Responsible for metadata practices in local repository.

• Potential data services activities:

  • Prepare and implement a data collection development plan;

  • Acquire and ingest data from producers;

  • Ensure data product authenticity;

Page 35: Data Library Services In The Data Stewardship Lifecycle

Data repository

• Appraise, select and ingest data originating on your campus into the repository;

• Provide services to support the use of data, including help with data extractions, reformatting and subsetting;

• Manage the collection of data and metadata, including refreshing digital media and migrating data to new digital media;

• Coordinate activities with the local Institutional Repository;

• Achieve and maintain “trusted” repository status.
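One common way to support the authenticity and media-migration activities listed for the repository stage is a fixity check: record a checksum when a file is ingested and recompute it after every copy or migration. The sketch below assumes SHA-256 and uses Python's standard `hashlib`. The function names `sha256_of` and `verify_fixity` are hypothetical, and the slides do not prescribe any particular method.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path, recorded_checksum):
    """Return True if the file still matches the checksum recorded at ingest."""
    return sha256_of(path) == recorded_checksum
```

A repository would store the checksum from `sha256_of` alongside the file's metadata at ingest and call `verify_fixity` after each media refresh or migration. A mismatch signals that the copy should not be trusted.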

Page 36: Data Library Services In The Data Stewardship Lifecycle

Data discovery

• Stewardship role:

  • Responsible for providing each patron with a comprehensive data reference interview or, when appropriate, for making an informed referral.

• Potential data services activities:

  • Provide reference services to assist patrons in their search for data;

  • Produce and maintain metadata services to help find data (may involve metadata production, loading records into local OPAC, supporting Nesstar or Dataverse service, etc.);

Page 37: Data Library Services In The Data Stewardship Lifecycle

Data discovery

• Conduct data literacy training for library colleagues and establish the grounds for informed referrals to data services;

• Conduct data literacy training for patrons;

• Promote data citation practices on your campus as part of the ethics of academic integrity;

• Contribute to the development of new tools to exploit metadata.

Page 38: Data Library Services In The Data Stewardship Lifecycle

Data repurposing

• Stewardship role:

  • Responsible for ensuring permissions are in place for repurposing data;

  • Ensure the deposit of new data in a repository.

• Potential data services activities:

  • Help patrons mine metadata to discover data for repurposing;

  • Provide technical support to help patrons merge data from multiple sources for new purposes;

Page 39: Data Library Services In The Data Stewardship Lifecycle

Data repurposing

• Participate in projects seeking new ways of exploiting metadata;

• Integrate the production of metadata into data repurposing practices;

• Engage in tools development to support data mining and data visualisation.

Page 40: Data Library Services In The Data Stewardship Lifecycle

Across all stages of the lifecycle

• Work with data producers on campus to establish roles and responsibilities for the long-term access and preservation of data by clarifying stewardship roles.

• Work to ensure that information gaps do not occur throughout the lifecycle, such as files getting stranded on hard drives.

Page 41: Data Library Services In The Data Stewardship Lifecycle

Start a service appropriate for today

• A data library does not have to begin with a large mandate but can be tailored to the level of support that can be maintained and staffed today.

• Having said this, plan for expansion! The services offered by a data library have a way of generating new expectations that will require further resources, spanning staff, training and infrastructure.

• The returns on the service will surprise you!