British Library Datasets Programme Feb 2011
Transcript of British Library Datasets Programme Feb 2011
British Library Datasets Programme
JISC RSP Winter School
February 2011
Max Wilkinson
2
Today’s Talk
1. The British Library
2. Data in scholarly communication
3. The problem with data
4. The Datasets Programme: vision, strategy, activity (DataCite)
5. Other Projects
3
The British Library
Exists for everyone who wants to do research – for academic, personal, and commercial purposes.
Covers all subject areas – sciences, technology, medicine, arts, humanities, social sciences…
Receives a copy of every item published in the UK.
Holds over 150 million items, with 3 million items added each year.
Used by over 16,000 people each day (on site and online).
The British Library: some facts and figures
Helping people advance knowledge to enrich lives
Grant-in-aid (GIA) funding 08/09: £94.8m operational, £12m capital
Other funding secured 07/08: c.£33m
National library of the UK.
Serves researchers, business, libraries, education & the general public
Collection includes over 2m sound recordings, 5m reports, theses and conference papers, the world’s largest patents collection (c.50m)
Three main sites in London and Yorkshire; c. 2,000 staff
Business and IP Centre: providing inspiration and enabling the protection of creative capital and business development
Each year generates value for the UK economy of 4.4 times its public funding
Collection fills over 600km of shelving and grows at 11km per year
70 TB of digital material through voluntary deposit
British Library Act 1972: national centre for reference, study, bibliographical and other information services, in relation both to scientific and technological matters, and to the humanities.
Science and Innovation Investment Framework 2004-2014, H.M. Treasury (2004): UK research base must have ready and efficient access to information of all kinds – such as experimental data sets, journals, theses, conference proceedings and patents. This is the life blood of research and innovation.
The largest document supply service in the world. Secure e-delivery and ‘just in time’ digitisation enable desktop delivery within 2 hours.
5
Who do we serve?
The Researcher – We provide access to research level materials to all sectors including academia, industry, government, charities and NGOs.
Business – The British Library also has a critical role in supporting businesses of all sizes, from individual entrepreneurs through to major organisations.
The Learner - We have an important role to play in supporting education from primary schools to developing future researchers of any age.
The Library Community – We play a key role in supporting the wider UK Library Community and information network.
The General Public – The services we offer include exhibitions and events, tours, and web services that digitally showcase our collection.
6
Modern science relies on good data
7
Scholarly record
[Diagram: the scholarly record surrounded by Discovery, Access, Record permanence, Citation, Metadata exposure, Trust fabrics and Copyright]
8
The Foundation for Research
Data is a crucial component of the scholarly record.
Re-acquisition may be impossible
Datasets are essential to the British Library’s mission to advance the world’s knowledge.
9
Current Situation
No effective way to link between datasets and articles;
No widely used method to identify datasets;
No widely used method to cite datasets.
10
As a result…
Datasets are:
Difficult to discover
Difficult to access
In danger of being lost
11
Difficult to Discover. Good luck finding the data!
Source: Committee on Climate Change
12
Data are diverse in the Digital Landscape
Seismic measurements taken by a geologist.
An audio archive of birdsong created by an ornithologist.
Genetic data collected by a medical researcher.
A survey of public opinions collected by a sociologist.
13
Re-join the gap…
(No) effective way to link between articles and datasets
(No) widely used method to identify datasets
(No) widely used method to cite datasets
[Diagram: a gap between Articles and Underlying data]
14
Datasets – first-class citizens?
Data is difficult to manage after project funding ceases
Informal networks provide the primary means of sharing
Only 21% use a national or international facility
Datasets are not included in impact analysis
Good luck finding it or getting permission to use it (your discipline may vary)
Source: UKRDS study, The Data Imperative: Managing the UK’s research data for future use (Feb 2009)
15
Scholarly record
[Diagram: the scholarly record surrounded by Discovery, Access, Record permanence, Citation, Metadata exposure, Trust fabrics and Copyright]
16
Research training based on scholarly communication
[Diagram: the scholarly record surrounded by Discovery, Access, Record permanence, Citation, Metadata exposure, Trust fabrics and Copyright]
Rarely includes data
17
Scholarly communication requires intellectual exchanges
[Diagram: the scholarly record surrounded by Discovery, Access, Record permanence, Citation, Metadata exposure, Trust fabrics and Copyright]
No such data fabric
18
Scholarly discourse requires a record and provenance
[Diagram: the scholarly record surrounded by Discovery, Access, Record permanence, Citation, Metadata exposure, Trust fabrics and Copyright]
Almost non-existent for data
19
The Datasets Programme
We envision a future where researchers can:
Discover, access, reuse, and reference datasets.
Track the impact of the data that they generate and receive appropriate credit.
Our approach is to:
Provide a focus for the community to establish needs, requirements and agreement.
Explore novel technology and creative solutions.
20
Two key concepts
INCENTIVE
SUSTAINABILITY
21
Projects and activities
www.bl.uk/datasets
Follow us on Twitter @datasetsBL
22
A Key Component for Many Goals
[Diagram: persistent identification (marked "?") as the key component linking the goals Make visible, Find, Access, Track impact, Verify, Reuse and Cite]
23
Citation using Digital Object Identifiers (DOIs)
Dataset: G. Yancheva, N. R. Nowaczyk et al (2007) Rock magnetism and X-ray flourescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA
Article citation: G. Yancheva, N. R. Nowaczyk et al (2007) Influence of the intertropical convergence zone on the East Asian monsoon. Nature 445, 74-77
How to reference
Published article (abstract or full text)
The DOI system offers an easy, Internet-actionable way to connect the article citation with the underlying publication.
But a complete scholarly record would also link to the evidential datasets and their location, e.g. PANGAEA.
doi:10.1038/nature05431
24
doi:10.1038/nature05431 leads to a landing page
25
Digital Object Identifiers (DOIs) offer a solution
Most widely used identifier for scientific articles
Researchers, authors, publishers know how to use them
Put datasets on the same playing field as articles
Connecting an Article with the Underlying Data
Dataset: Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840
URLs are commonly used but can decay
(e.g. Wren JD: URL decay in MEDLINE – a 4-year follow-up study. Bioinformatics 2008, Jun 1;24(11):1381-5).
26
doi:10.1594/PANGAEA.587840
27
Dataset citation using Digital Object Identifiers (DOIs)
Dataset: G. Yancheva, N. R. Nowaczyk et al (2007) Rock magnetism and X-ray flourescence spectrometry analyses on sediment cores of the Lake Huguang Maar, Southeast China, PANGAEA
doi:10.1594/PANGAEA.587840
Article: G. Yancheva, N. R. Nowaczyk et al (2007) Influence of the intertropical convergence zone on the East Asian monsoon. Nature 445, 74-77
doi:10.1038/nature05431
Data citation: scholarly record is complete
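A dataset citation carries the same elements as an article citation: creators, year, title, publisher (here the data centre) and a DOI. As a rough Python sketch, the short form used earlier for the Lake Maar dataset can be assembled from those fields. `DatasetRecord` and `format_citation` are hypothetical names, and the field order simply copies the slide rather than any formal DataCite citation style.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata needed to cite a dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str        # the data centre holding the dataset, e.g. PANGAEA
    doi: str              # bare DOI, without the "doi:" label
    et_al: bool = False   # True if further creators are omitted


def format_citation(record: DatasetRecord) -> str:
    """Return 'Creators (year). Title. Publisher. doi:DOI'."""
    authors = ", ".join(record.creators)
    if record.et_al:
        authors += " et al"
    return f"{authors} ({record.year}). {record.title}. {record.publisher}. doi:{record.doi}"


record = DatasetRecord(
    creators=["Yancheva"],
    year=2007,
    title="Analyses on sediment of Lake Maar",
    publisher="PANGAEA",
    doi="10.1594/PANGAEA.587840",
    et_al=True,
)
print(format_citation(record))
# Yancheva et al (2007). Analyses on sediment of Lake Maar. PANGAEA. doi:10.1594/PANGAEA.587840
```

Keeping the DOI as a separate field means the same record can feed both the printed citation and an actionable link.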
28
Projects – DataCite
DataCite is an international consortium which aims to:
Establish easier access to scientific research data on the Internet
Increase acceptance of research data as legitimate, citable contributions to the scientific record
Support data archiving that will permit results to be verified and re-purposed for future study.
29
DataCite
Support researchers by enabling them to locate, identify, and cite research datasets with confidence
Support data centres by providing persistent identifiers for datasets, workflows and standards for data publication
Support publishers by enabling research articles to be linked to the underlying data
DataCite : Data Centres :: CrossRef : Publishers
30
Digital Object Identifier (DOI)
doi:10.4124/0003.569 (prefix: 10.4124; suffix: 0003.569)
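The prefix/suffix split is mechanical: the prefix runs up to the first slash and always starts with "10.", and the suffix is everything after it (a suffix may itself contain further slashes). A minimal Python sketch, assuming the optional "doi:" label used on these slides; `split_doi` is a hypothetical helper, and its check is deliberately looser than the full DOI syntax rules.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Split e.g. 'doi:10.4124/0003.569' into ('10.4124', '0003.569')."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]  # drop the optional "doi:" label
    # The prefix ends at the first "/"; any later "/" belongs to the suffix
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


print(split_doi("doi:10.4124/0003.569"))  # ('10.4124', '0003.569')
```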
31
DOI prefix
doi:10.4124/0003.569 (prefix: 10.4124; suffix: 0003.569)
The British Library provides data centres with a unique prefix for DataCite DOIs
For example, the Archaeology Data Service uses 10.5284
32
DOI suffix
doi:10.4124/0003.569 (prefix: 10.4124; suffix: 0003.569)
Suffix generated by the data centre
Guidelines for DOI syntax are being developed
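Because suffix design is left to each data centre, one possible scheme is sketched below in Python under stated assumptions. The data centre holds an externally assigned prefix (the example prefix 10.4124 from these slides), groups its datasets under a hypothetical local namespace code, and either reuses its own internal identifier or falls back to a running number. `SuffixMinter` and its namespace convention are illustrative inventions, not DataCite's API. The one rule it enforces is that a DOI is never issued twice, compared case-insensitively because DOI names are case-insensitive.

```python
import itertools
from typing import Optional, Set


class SuffixMinter:
    """Hypothetical data-centre-side DOI minter.

    The prefix is assigned externally; the suffix is generated locally
    as '<namespace>.<local id>'.
    """

    def __init__(self, prefix: str, namespace: str) -> None:
        if not prefix.startswith("10."):
            raise ValueError("a DOI prefix starts with '10.'")
        self.prefix = prefix
        self.namespace = namespace
        self._counter = itertools.count(1)  # running number for datasets without an internal id
        self._issued: Set[str] = set()      # upper-cased DOIs already handed out

    def mint(self, local_id: Optional[str] = None) -> str:
        """Return a new DOI, reusing local_id if given, else the next number."""
        if local_id is None:
            local_id = str(next(self._counter))
        doi = f"{self.prefix}/{self.namespace}.{local_id}"
        key = doi.upper()  # DOI names compare case-insensitively
        if key in self._issued:
            raise ValueError(f"DOI already issued: {doi}")
        self._issued.add(key)
        return doi


minter = SuffixMinter("10.4124", "DS")
print(minter.mint())               # 10.4124/DS.1
print(minter.mint("survey-2010"))  # 10.4124/DS.survey-2010
```

If an explicit identifier happens to equal a future running number, the duplicate check raises instead of reissuing, and the next call moves on to the following number.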
33
Resolving a DOI
doi:10.4124/0003.569 (prefix: 10.4124; suffix: 0003.569)
Resolving the DOI:
http://dx.doi.org/10.4124/0003.569
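Resolution is a plain URL lookup: the resolver answers with an HTTP redirect to the landing page registered for the DOI. A minimal Python sketch of building that URL, assuming only the resolver address shown on this slide; `doi_to_url` is a hypothetical helper, and actually following the redirect (which needs network access) is left out.

```python
from urllib.parse import quote

RESOLVER = "http://dx.doi.org/"  # resolver address used on the slide


def doi_to_url(doi: str, resolver: str = RESOLVER) -> str:
    """Build the resolver URL for a DOI such as 'doi:10.4124/0003.569'."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]
    # Percent-encode characters that are unsafe in a URL path (such as
    # "#", "?" or spaces) while leaving "/" as the prefix/suffix separator
    return resolver + quote(doi, safe="/")


print(doi_to_url("doi:10.4124/0003.569"))  # http://dx.doi.org/10.4124/0003.569
```

The resolver base is a parameter so the same DOI can be turned into a link against a different resolver without changing the identifier itself.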
34
DOIs resolve to an open landing page
35
DataCite Service
Built a service for data centres to mint DOIs for datasets and store associated metadata (http://api.datacite.org)
The British Library is trialling the service with several UK data centres, including:
36
Projects and activities
www.bl.uk/datasets
37
SageCite: Data citation in bioinformatics workflow
• Sage Bionetworks data capture and analysis workflow (Taverna: myExperiment)
• Data citation service integration points and citation targets (e.g. data models)
• Recommendations
• Benefits analysis
SageCite: Integration of data citation services into multi-contributor bioinformatics workflows. Establishing data attribution and credit mechanisms.
► INCENTIVE
Sage Bionetworks: Aggregating datasets from contributors to create massive coherent datasets that can be used for systems-level analysis of disease
38
Dryad UK: Repository sustainability
• Expand publisher base
• Seamless integration into publisher workflow
• Sustainability models for datasets supplementary to publication
Dryad UK: Define a business case and pilot service integrating DataCite DOIs and dataset archiving into publisher workflows
► SUSTAINABILITY
Leveraging the Dryad Consortium, which is addressing the acquisition and storage of long-tail supplementary data
39
For more information on the BL Datasets Programme
Max Wilkinson: Programme Manager, Datasets
Email: [email protected]
Email: [email protected]
Website: www.bl.uk/datasets
Follow us on Twitter @datasetsBL