ARL/CNI e-Science Fall Forum 2008 Data Curation Panel Pam Bjornson, Director General October 15, 2008

ARL/CNI e-Science Fall Forum 2008

Data Curation Panel

Pam Bjornson, Director General

October 15, 2008

Introduction

Some common elements from yesterday’s presentations:

• E-science is a new way of working for scientists that requires new organizational support

• There is a series of layers from the individual scientist or research team layer through e-dbases, archives and collections, to the tools and services layer, to the cyberinfrastructure layer of supercomputers, HPC and national networks

• Beyond collaboration to engagement with scientists and researchers – new way of working for libraries across and within institutions

All of the above = Disruptive change? Transition? Is this an adoption of our earlier practices or real disruptive change?

Two Canadian Data Initiatives

• Research Data Strategy Working Group
  – Multiple agencies, cross-disciplinary
  – Policy level

• <odesi>
  – Ontario Council of University Libraries (OCUL)
  – Data documentation project building on the successful Scholars’ Portal initiative

Research Data Strategy Working Group

About the Research Data Strategy WG

What it is

A collaborative effort to address the challenges and issues surrounding access to and preservation of data arising from Canadian research.

Who it is

Multi-disciplinary group with representation from university research libraries and CIOs, national institutions, federal granting agencies, federal research institutes, and individual researchers e.g. CARL, CUCCIO, LAC, NRC-CISTI, CANARIE, NSERC, SSHRC, CIHR, CFI, CODATA Canada

Activities to Date

Stewardship of Research Data in Canada: Gap Analysis

• Analysis of current state versus ideal state via 10 indicators

• Identification of gaps

• Final Report to be posted shortly

Three Task Groups formed:

• Policies, Funding and Research
  – Team Lead - Walter Stewart, CANARIE

• Infrastructure and Services
  – Team Lead - Chuck Humphrey, U. Alberta Data Centre

• Capacity (Skills, Training, Rewards System)
  – Team Lead - Margaret Haines, Carleton University Librarian

RDS Canada: Current State - Gaps

Indicator                              Gap level
Policies                               Moderate
Funding                                Large
Roles and responsibilities             Large
[Trusted digital] data repositories    Moderate
Standards                              Moderate
Skills and training                    Large
Reward and recognition systems         Large
Research and Development               Moderate
Access                                 Moderate
Preservation                           Large

Ontario Data Documentation, Extraction Service and Infrastructure

<odesi> background

An intuitive data portal for researchers, teachers and students; inspiring, developing and supporting research excellence.

• includes Statistics Canada and public opinion poll data

• jointly funded project between the Ontario Council of University Libraries (OCUL) and OntarioBuys (BPS Supply Chain Secretariat, Ontario Ministry of Finance)

• <odesi> is a centralised, standardised web-based data exploration/extraction system delivered through the OCUL Scholars Portal

• only provincially available tool that allows the user to search multiple datasets at the variable level

<odesi> next steps

• seek new datasets

• create a suite of tutorials and other training materials

• investigate providing access to <odesi> to the wider education sector

• work toward creating a national co-ordination committee for DDI projects in Canada

• investigate using <odesi> as a depository for Ontario research data

• explore links with CARL and CISTI with the aim of creating a national data archive

• explore international links e.g. CESSDA, IFDO

Comments/Questions - Engagement with researchers and their workflow

Right now, many science projects that collect data are curating their own data, without the help of the library (e.g. astronomy and some big science projects are well advanced in data management).

In fact, not all are convinced that the library, or librarians, have a role to play in helping them manage data. We have heard about the skills gap – technical, team and even partnership (heedful interaction).

My own institution, the National Research Council of Canada, is very diverse, as are your own institutions. We are planning a project to assess needs, but it is in the early stages at present. The Data Audit noted by Liz Lyon is a great tool.

Comments/Questions - Resources

• Universities and research organizations, and the agencies that fund them, have to consider the usual questions when we direct resources (people, operational $ and attention) to new activities:
  – Will new funding be found for this activity, and if so from where?
  – What will we stop doing or do less of?
  – How will we re-allocate funds to support this rather than that?

• At the granting agency level, there is concern that funding data could erode the support available to new research. Alternatively, there is project money for start-up or transformational projects but no ongoing resources for sustainability.

• Engagement – have to define the problem of data stewardship as compelling, urgent (risks and opportunities) with economic and competitive consequences

Comments/Questions - Models and Issue of Coherence/Linkages

There are very different models (note examples) for data stewardship emerging in different countries and in different disciplines:

• Distributed versus centralized

• Disciplinary and international versus institutional and local (e.g. European Bioinformatics Institute, MARS in meteorology)

• Major national funding to non-existent funding

Question: Can you speak about some of the advantages and disadvantages of these models, and whether you see some as positive or others perhaps impeding the type of collaboration that seems to be demanded in order to realize the full value of access to data at web scale?

Comments/Questions - Policy Issues

• Funding agency policies – will they begin to include and mandate access to data as well as to published literature?

• Some data retention is already regulated and mandated by granting agencies, but there is not always the capacity to actually conform to the policy

• What is the readiness of our institutions to handle that if it were to happen?

Long term roles – looking forward

• There is a lifecycle to data creation, curation, dissemination and preservation. Is there also a lifecycle to what we are experiencing now as we work with researchers to integrate data into the arena of accessible knowledge?

• For example, data curation is labour-intensive, team-based (domain, computer, information skills), and particular to domains. Are we in an early “pioneer” stage for data? Will data curation evolve to be web-scaled, accomplished through network-enabled protocols and standards? Or will there always be a continuum?

Comments/Questions

Themes emerging from papers and recent reports:

In the flow – question of how libraries co-evolve with user behaviours, researcher workflows and also tie in to the network environment and networked flows. Partners in research.
  - questions around resources, skills, long term commitment

In the cloud – if libraries use web-scaled tools and protocols, how much data curation would be simplified, how much work could be done collaboratively, in a federated way, collectively?
  - questions around coherence and linkages - whether that would be a self-organizing model that will emerge or one that will be guided/created? Role of private sector versus public sector?

URLs

CARL

Research Data Strategy Working Group http://data-donnees.gc.ca/eng/index.html

ODESI www.odesi.ca

CISTI http://cisti-icist.nrc-cnrc.gc.ca/main_e.html