Cross-linking and Referencing Data and Publications in CLADDIER


Brian Matthews, E-Science Centre, STFC Rutherford Appleton Laboratory


About CLADDIER

Bryan Lawrence (PI, BADC)
Sam Pepler (Project Manager, BADC)
Sue Latham (BADC)
Pauline Simpson (NOCS)
Jessie Hey (Southampton)
Brian Matthews (STFC)
Catherine Jones (STFC)
Alistair Miles (STFC)
Katie Portwin (STFC)
Shoaib Sufi (STFC)
Kevin O’Neil (STFC)
Katherine Bouton (Reading, NCAS)

CLADDIER: Citation, Location and Deposition in Discipline and Institutional Repositories

Funded by a JISC grant through the Digital Repositories programme, July 2005 to October 2007.


Citation and linking in repositories

To achieve this scenario we need to provide a set of key mechanisms:

• Publishing of data
– Conventions for the citation of data
– Can then treat data citation in a similar way to publications

• Browsing and searching
– across different repositories
– across data and publications

• Cross-citation of data and publications
– forward and backward citation
– need to maintain currency of citation links
– a simple mechanism to push citation information between repositories

A practical look at citation of data and how repositories could communicate citation information.


Data Publication

In this context, “publication” is defined as the process through which data is fixed and made retrievable over the long term; it may imply that there has been some quality control process.

– Defining data : fixing and encapsulating a “meaningful” data set

– Quality Control : Publishers, Data Centres

Example data citation:

Natural Environment Research Council, Mesosphere-Stratosphere-Troposphere Radar Facility [Thomas, L.; Vaughan, G.]. Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth [Internet]. Version 2, Cartesian products. British Atmospheric Data Centre (BADC), 1990- [cited 2006 Apr 25]. Available from http://badc.nerc.ac.uk/data/mst.
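
As a rough illustration of treating a data set like a publication for citation purposes, the sketch below assembles a citation string from structured fields, following the layout of the BADC citation on this slide. The class name DataCitation and its field names are assumptions made for this illustration; they are not taken from CLADDIER.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Hypothetical record holding the parts of a data citation."""
    author: str               # corporate author or facility
    investigators: List[str]  # named investigators, shown in brackets
    title: str
    version: str              # version and product description
    publisher: str            # the data centre acting as publisher
    start_year: str           # open-ended date range, e.g. "1990"
    cited: str                # date the data set was accessed
    url: str

    def format(self) -> str:
        """Render the fields in the same order as the slide's example."""
        people = "; ".join(self.investigators)
        return (
            f"{self.author} [{people}]. {self.title} [Internet]. "
            f"{self.version}. {self.publisher}, {self.start_year}- "
            f"[cited {self.cited}]. Available from {self.url}."
        )


example = DataCitation(
    author=("Natural Environment Research Council, "
            "Mesosphere-Stratosphere-Troposphere Radar Facility"),
    investigators=["Thomas, L.", "Vaughan, G."],
    title="Mesosphere-Stratosphere-Troposphere Radar Facility at Aberystwyth",
    version="Version 2, Cartesian products",
    publisher="British Atmospheric Data Centre (BADC)",
    start_year="1990",
    cited="2006 Apr 25",
    url="http://badc.nerc.ac.uk/data/mst",
)
print(example.format())
```

Keeping the version and the access date as explicit fields reflects the point that a published data set has to be fixed before it can be cited reliably.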


Browsing and Searching

• across different repositories
• across data and publications

CLADDIER has provided a harvesting and search tool to support cross-repository searching.


Discovery Service

The Discovery Service gives a broad-brush search:

• Gives you both publications and data sets
• Indexed by keyword

In effect, a Google across repositories.

Uses OAI-PMH, a conventional approach:

• Simple, but it works!
• Simple keyword searching
• Three participating repositories in the pilot: BADC, STFC ePubs, SOTON ePrints
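
To make the harvest-then-search idea concrete, here is a minimal sketch of an OAI-PMH harvester feeding a keyword index. It uses only the standard ListRecords verb and the oai_dc metadata format from the OAI-PMH specification. The base URL, the sample response and the function names are assumptions for illustration, and the sketch parses a canned response instead of contacting a real repository; it is not the CLADDIER Discovery Service code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from urllib.parse import urlencode

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Canned ListRecords response standing in for a real repository (hypothetical).
SAMPLE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:data/1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>MST radar wind profiles</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:example.org:pubs/7</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Radar observations of the troposphere</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url):
    """Build a ListRecords request asking for simple Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract identifier and title from each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"id": ident, "title": title})
    return records


def build_keyword_index(records):
    """Map each lower-cased title word to the records that contain it."""
    index = defaultdict(set)
    for rec in records:
        for word in rec["title"].lower().split():
            index[word].add(rec["id"])
    return index


index = build_keyword_index(parse_records(SAMPLE))
print(sorted(index["radar"]))
```

Because data sets and publications are harvested through the same verb and metadata format, a single keyword lookup can return both kinds of record, which is the behaviour the slide describes.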


Adding Cross-Citations

Traditional Citation

Cross Citation

Cannot tell whether the data and publication are actually related:

– what data and publications inspire a piece of work (generating a new data set)
– what publications arise from a data set

We need to exploit the concept of cross-citation to see whether items are actually related.


Maintaining Links

Ideally, the archives holding the datasets and publications would be notified that a paper citing them had been submitted.

– Metadata associated with those records would be updated to reflect the citations.
– The metadata in the publication repository should also link to the metadata in the data archives, and vice versa.
– It would be great if this notification could be done automatically.

• Tedious to enter citations
• “Forward citations” (“cited-by”) are hard to track

We adapted a protocol from the world of blogging:

– Trackback
– Designed to allow cross-referencing of blog articles
– Extended to allow richer metadata
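
For readers unfamiliar with Trackback, the sketch below shows roughly what a ping looks like. In the standard blogging protocol the sender makes a form-encoded HTTP POST to the receiver's trackback URI, with the fields url, title, excerpt and blog_name, and the receiver answers with a small XML document whose error element is 0 on success. The slides do not give the field names CLADDIER used for its richer metadata, so the extra parameter and the dc_creator field below are purely hypothetical, as are the example URLs. The request is built but not sent.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_ping(trackback_uri, citing_url, title, excerpt="", repository="", extra=None):
    """Build (but do not send) a Trackback ping as a form-encoded POST.

    `extra` carries additional citation metadata; its keys are
    hypothetical, since the slides do not list the extended fields.
    """
    fields = {
        "url": citing_url,        # the citing item (required by Trackback)
        "title": title,
        "excerpt": excerpt,
        "blog_name": repository,  # reused here for the sending repository
    }
    fields.update(extra or {})
    return urllib.request.Request(
        trackback_uri,
        data=urlencode(fields).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
    )


def parse_response(xml_bytes):
    """Return (ok, message) from a Trackback XML response."""
    root = ET.fromstring(xml_bytes)
    ok = root.findtext("error", default="1").strip() == "0"
    return ok, root.findtext("message", default="")


request = build_ping(
    "http://data.example.org/trackback/mst",  # hypothetical receiver URI
    "http://pubs.example.org/record/42",      # hypothetical citing paper
    title="Wind profiles from MST radar",
    repository="Example ePubs",
    extra={"dc_creator": "A. Author"},        # hypothetical extended field
)
print(request.get_method(), request.full_url)
print(parse_response(b"<response><error>0</error></response>"))
```

Actually sending the ping would be a call such as urllib.request.urlopen(request); on success the receiving repository can add a cited-by link to its own record.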


Trackback Protocol

[Screenshots of a trackback exchange; only the callouts are recoverable]

• Sender publication: this publication has a citation to a technical report
• Adds citation: sends a trackback call to the trackback URI
• Embedded metadata: trackback URI, formats accepted
• After trackback: a cited-by link has been added to the receiver publication


Notes on Trackback

• A simple existing protocol
– P2P: loosely federates repositories
– Extended to carry metadata of the citation
– To add “cited-by” links
• Can also indicate which metadata is expected
– Simple Dublin Core
– ePrints Application Profile
• Can also use the metadata of the receiver
– Improves the citation metadata
• Implemented in ePubs
– Also partially in BADC
– Receiver only: send email to admin

Some problems or extensions are under consideration:

• Link to metadata
– not full text
• Spamming: anyone could send trackbacks
– Whitelists
– Administrator intervention
• Multiple entries
– Same citation multiple times
– Same citation in different repositories
• Retraction of citation
– A delete protocol
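
The open problems listed above can be pictured on the receiving side with a small in-memory sketch: a whitelist check against spam, suppression of a repeated citation, and a retraction call standing in for a delete protocol. Everything here (the class CitedByStore, its method names, the host names) is hypothetical and only illustrates the ideas; the slides describe these as extensions under consideration, not as implemented features.

```python
from urllib.parse import urlparse


class CitedByStore:
    """Hypothetical receiver-side store of cited-by links."""

    def __init__(self, whitelist):
        self.whitelist = set(whitelist)  # sender hosts trusted by the admin
        self.links = {}                  # target id -> set of citing URLs

    def receive(self, target_id, citing_url):
        """Handle one incoming trackback and report what happened."""
        host = urlparse(citing_url).hostname
        if host not in self.whitelist:
            # Unknown sender: a real system might queue this for an administrator.
            return "rejected"
        cited_by = self.links.setdefault(target_id, set())
        if citing_url in cited_by:
            return "duplicate"           # same citation sent more than once
        cited_by.add(citing_url)
        return "added"

    def retract(self, target_id, citing_url):
        """Remove a cited-by link; returns True if one was removed."""
        cited_by = self.links.get(target_id, set())
        if citing_url in cited_by:
            cited_by.remove(citing_url)
            return True
        return False


store = CitedByStore(whitelist=["pubs.example.org"])
paper = "http://pubs.example.org/record/42"
print(store.receive("mst", paper))                        # first ping
print(store.receive("mst", paper))                        # repeated ping
print(store.receive("mst", "http://spam.example.net/x"))  # unlisted sender
print(store.retract("mst", paper))                        # citation withdrawn
```

Matching on the citing URL only catches exact repeats. The same paper deposited in two repositories would arrive under two URLs, so handling that case would need a shared identifier, which this sketch does not attempt.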


Conclusions

CLADDIER supports the scientific process with federated repositories.

This requires a cross-linking network of information objects, which needs to be stored, maintained and searched.

Now doing some user testing.

Tools and ideas are relatively straightforward: lots of gluing of existing components.

Keep it simple, so it will get used.

http://claddier.badc.ac.uk/