Data and Publication Discovery

Brian Matthews, Information Management Group,

STFC Rutherford Appleton Laboratory

CLADDIER workshop, Chilworth, Southampton, UK 15th May 2007

Microsoft’s Science 2020 Report

Modern scientific communication relies on both journals and databases. At present these are not integrated.

By 2020 mutual linking will be commonplace and publications just containing peer-reviewed data will become available.

http://research.microsoft.com/towards2020science/downloads.htm

The Use Case

Joanna, at the University of Southampton, has done some work on the biology of seawater off the coast of Cornwall. As part of her analysis she needs (from a number of locations):

• Publications and data describing prior or similar work.
• Oceanic profiles of salinity and temperature from the closest cruise in time and space.
• Meteorological data to accompany both her own sampling and the oceanic data.
• Remotely sensed ocean colour imagery (to add additional information on the biota).

She will then publish a paper that cites the datasets, lodge the paper in her own institutional repository and also deposit her datasets in one or more appropriate data repositories (e.g. both the NOCS data archive and the BODC).

The work Joanna has done is of interest in calibrating a global earth system model to compare simulations of oceanic CO2 production with the scenarios used in the model.

Fred, at Reading University, needs to be able to find Joanna’s paper and data either via citations or directly from publication repositories. Having found the paper, the data should be obtainable via the citation and the data archive.

As part of his work, and before he uses Joanna’s data, he checks back through the other datasets used and cited as inputs to it, because he suspects Joanna’s work could be recalibrated using better-quality meteorological re-analyses.

What does that need?

1. Joanna’s own data acquisition
2. Location and acquisition of prior publications and data
3. Location and acquisition of remote datasets required as part of the analysis
4. Creation of personal metadata for new data
5. Data analysis and paper writing
6. Citation of remote papers and datasets
7. Paper submission to a journal and acceptance
8. Repository submission of paper (maybe a preprint)
9. Repository submission of data
10. Further metadata creation for the data (at the data repository)
11. Further metadata creation for the publication (at the institutional repository)
12. Linking between institutional repositories and the data held at the discipline repository
13. All the datasets and publications cited need to be annotated with the citation information

1. Discovery of Joanna’s work by Fred (either from Joanna’s publication or datasets or citations thereof)
2. Acquisition of all the relevant publications and datasets by Fred
3. Analysis and publication by Fred (and all the same steps from 5 as required by Joanna)
4. External adjudicators need to be able to find and acquire citation information

So what services do we need?

In order to achieve this scenario we need to provide a set of key services:

• Publishing of data
• Browsing and searching
  – across different repositories
  – across data and publication
• Cross-citation of data and publication
  – forward and backward citation
  – need to maintain currency of citation links

Browsing and Searching

• Browsing and searching
  – across different repositories
  – across data and publication

CLADDIER has provided a harvesting and search tool to support cross-repository searching.

• Uses OAI-PMH, a conventional approach
  – Simple, but it works!
  – Simple keyword searching
  – Three participating repositories in the pilot: BADC, STFC ePubs, e-Prints Soton
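The harvest-then-search approach can be illustrated with a minimal Python sketch. It is not the CLADDIER tool: the sample response, the repository label and the identifier below are made up for illustration, and no network request is made. Only the `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url):
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return base_url + "?" + query


def parse_records(xml_text, repository):
    """Extract identifier, title and description from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "repository": repository,
            "identifier": rec.findtext(
                "oai:header/oai:identifier", default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            "description": rec.findtext(
                ".//dc:description", default="", namespaces=NS),
        })
    return records


def keyword_search(index, keyword):
    """Case-insensitive keyword match over harvested titles and descriptions."""
    needle = keyword.lower()
    return [r for r in index
            if needle in r["title"].lower() or needle in r["description"].lower()]


# Hypothetical response from a data repository (illustrative content only).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:data.example.org:42</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Salinity profiles off Cornwall</dc:title>
          <dc:description>Oceanic salinity and temperature data</dc:description>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

# One merged index would hold records harvested from every repository.
index = parse_records(SAMPLE_RESPONSE, "example-data-archive")
hits = keyword_search(index, "salinity")
```

In a real harvester the XML would be fetched from each repository's OAI-PMH endpoint using the URL built by `list_records_url`, and the records from all repositories would be merged into one index before searching.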

Adding cross-citation

The Discovery Service gives a broad-brush search

• Gives you both publications and data sets, indexed by keyword
• A Google across repositories
• Currently, cannot tell whether the data and publications are actually related
  – what data and publications inspire a piece of work (generating a new data set)
  – what publications arise from a data set

We need to exploit the concept of citation to see whether data and publications are actually related.

Traditional Citations

Cross-citation

Adding Citations to the Metadata Model

Adding citations has been considered in standard metadata models.

• e.g. Scholarly Works Application Profile
  – JISC-funded initiative
  – Dublin Core Application Profile
  – Describing Scholarly Publications (ePrints)
  – Based on the FRBR model
  – Does consider citations
  – But breaks citations up into small components
  – This is highly labour-intensive to enter
  – Does not have a notion of back citation

FRBR Model

ePubs and Cross-Citations

STFC ePubs has a metadata model based on FRBR

• Need to extend this to support cross-citation
• Keep it simple
• Can support forward and back links
• Have developed a simple model for citations
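As an illustration of what a simple citation record with links in both directions might look like, here is a short Python sketch. It is an assumption made for this write-up, not the ePubs citation model itself: the `Citation` and `CitationIndex` names and the identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """One citation kept whole, not broken into small components."""
    citing: str  # identifier of the paper or dataset making the citation
    cited: str   # identifier of the paper or dataset being cited


class CitationIndex:
    """Stores each citation once and answers lookups in both directions."""

    def __init__(self):
        self._cites = defaultdict(set)     # citing -> items it cites
        self._cited_by = defaultdict(set)  # cited  -> items that cite it

    def add(self, citation):
        self._cites[citation.citing].add(citation.cited)
        self._cited_by[citation.cited].add(citation.citing)

    def cites(self, identifier):
        """Everything that `identifier` cites."""
        return sorted(self._cites.get(identifier, ()))

    def cited_by(self, identifier):
        """Everything that cites `identifier` (the back links)."""
        return sorted(self._cited_by.get(identifier, ()))


# Hypothetical identifiers following the use case.
idx = CitationIndex()
idx.add(Citation("paper:joanna", "data:cruise"))
idx.add(Citation("paper:fred", "paper:joanna"))
idx.add(Citation("paper:fred", "data:cruise"))
```

With this structure, Fred can follow `cites("paper:joanna")` to reach the input datasets, and a data archive can use `cited_by("data:cruise")` to list the papers that used a dataset.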

Citation Model

Maintaining Links

Ideally the archives holding the datasets and publications would be notified that a paper citing them had been submitted.

Metadata associated with those records would be updated to reflect the citations.

The metadata in the publication repository should also link to the data in the data archives and vice versa.

It would be great if this notification could be done automatically.

Notification Services

To support this, we need to provide a notification service.

• Federated repositories register with the service
• Repositories notify the service of citations
• The service informs (via broadcasting or targeting) repositories of citations
• Service provides sufficient information to update metadata
• Still under development.
• Note blogging software.
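The register, notify and inform steps can be sketched in a few lines of Python. This is only a sketch of the broadcast variant under stated assumptions, not the service under development: the `Repository` and `NotificationService` classes, their methods and the identifiers are all hypothetical.

```python
class Repository:
    """A repository that records which items cite each of its holdings."""

    def __init__(self, name, holdings):
        self.name = name
        self.cited_by = {item: [] for item in holdings}

    def receive(self, citing, cited):
        """Update metadata if the cited item is held here; report whether it changed."""
        if cited in self.cited_by and citing not in self.cited_by[cited]:
            self.cited_by[cited].append(citing)
            return True
        return False


class NotificationService:
    """Broadcasts each reported citation to every registered repository."""

    def __init__(self):
        self._repositories = []

    def register(self, repository):
        self._repositories.append(repository)

    def notify(self, citing, cited):
        """Return the names of the repositories that updated their metadata."""
        return [r.name for r in self._repositories if r.receive(citing, cited)]


# Hypothetical federation of one data archive and one publication repository.
service = NotificationService()
data_archive = Repository("data-archive", ["data:cruise"])
publications = Repository("publications", ["paper:joanna"])
service.register(data_archive)
service.register(publications)

updated = service.notify("paper:joanna", "data:cruise")
```

Broadcasting keeps the service simple because it needs no knowledge of where an item is held. A targeted variant would need a lookup from identifier to repository, which this sketch leaves out.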

Conclusions

The Use Case supports the scientific process with repositories

This requires a cross-linked network of information objects

Which needs to be stored, maintained and searched

Tools and ideas are relatively straightforward

Lots of gluing of existing components

Keep it simple – so it will get used