Page 1:

© S.J. Coles 2006

Institutional Data Repositories for Chemistry

Simon Coles

School of Chemistry, University of Southampton, U.K.

s.j.coles@soton.ac.uk

Page 2:

Why? Funding Body Viewpoint

Page 3:

Why? Curation in the Laboratory

“Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant”

“Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits”

“To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data”

“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”

‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)

Page 4:

Why? Publishing and the Data Deluge

[Figure: chemical structure diagrams (atom labels Cl, O, N, N+), shown with the counts 30,000,000, 1,500,000 and 450,000]

Page 5:

Why? Publishing Data and Information Loss

Page 6:

Separating Data from Interpretations

Underlying data (Institutional data repository)

Intellect & Interpretation (Journal article, report, etc)

Page 7:

Data capture and curation at the point of generation in the laboratory

The Repository for the Laboratory – R4L

Page 8:

Laboratory IRs and Information Management

Page 9:

The R4L Repository

Deposit

Search / Browse

Create new compound

Add experiment data and metadata

Page 10:

Data dissemination and curation by the scientist and host institution

eBank-UK and the eCrystals Repository

Page 11:

The eCrystals Data Archive

http://ecrystals.chem.soton.ac.uk

Page 12:

Metadata Publication

• Using simple Dublin Core
  • Crystal structure
  • Title (Systematic IUPAC Name)
  • Authors
  • Affiliation
  • Creation Date

• Additional chemical information through Qualified Dublin Core
  • Empirical formula
  • International Chemical Identifier (InChI)
  • Compound Class & Keywords

• Specifies which ‘datasets’ are present in an entry

• DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145

• Rights & Citation http://ecrystals.chem.soton.ac.uk/rights.html

• Application Profile http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
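As a rough illustration of the metadata fields listed above, the following Python sketch assembles a small Dublin Core style XML record. It is not the eBank-UK application profile itself: the `dc` namespace is the standard simple Dublin Core one, but the `chem` namespace, the element names under it, the mapping of affiliation to `dc:publisher`, the `build_record` helper and all sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Standard namespace for simple Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace standing in for the chemistry-specific
# qualified terms; the real application profile is at the URL on the slide.
CHEM_NS = "http://example.org/chem-terms/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("chem", CHEM_NS)


def build_record(title, authors, affiliation, created, identifier,
                 formula, inchi, datasets):
    """Return an XML element holding the fields named on the slide."""
    record = ET.Element("record")

    def add(namespace, tag, text):
        element = ET.SubElement(record, f"{{{namespace}}}{tag}")
        element.text = text

    # Simple Dublin Core fields.
    add(DC_NS, "type", "Crystal structure")
    add(DC_NS, "title", title)
    for author in authors:
        add(DC_NS, "creator", author)
    add(DC_NS, "publisher", affiliation)  # assumed mapping for "Affiliation"
    add(DC_NS, "date", created)
    add(DC_NS, "identifier", identifier)

    # Additional chemical information (hypothetical element names).
    add(CHEM_NS, "empiricalFormula", formula)
    add(CHEM_NS, "inchi", inchi)
    for dataset in datasets:
        add(CHEM_NS, "dataset", dataset)
    return record


# Illustrative values only; they do not describe a real repository entry.
record = build_record(
    title="benzene",
    authors=["A. Author", "B. Author"],
    affiliation="Example University",
    created="2006-01-01",
    identifier="doi:10.9999/example/1",
    formula="C6H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    datasets=["CIF", "structure factors"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the simple Dublin Core fields separate from the chemistry-specific ones mirrors the split on the slide: a generic harvester can read the `dc` elements alone, while a chemistry-aware service can also use the formula and InChI.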

Page 13:

Metadata and Data Quality Control

Data manipulation toolbox

Associated Metadata

Value added

Format conversion

Page 14:

Laboratory Data Management and Archive

Page 15:

Institutional data repositories and harvesting, aggregation and curation by data centres and third party services

eBank-UK Phase 3 – The eCrystals Federation

Page 16:

The eCrystals ‘Global Federation’ Model

Aggregator services

Institutional data repositories

Validation

Deposit

Publication

Validation

Data analysis, transformation, mining, modelling

Search, harvest

Presentation services / portals

Data discovery, linking, citation

Laboratory repository

Deposit

Publishers: peer-review journals, conference proceedings, etc

Curation

Preservation

Subject Repository

Institution Library & Information Services

Page 17:

Exploring the heterogeneous landscape of data repositories

• Different software platforms

• Different administrative domains

• Different institutional structures

• Institutional vs Subject repositories

• Data Repository Interoperability ORE

Page 18:

Preservation and curation by data centres & Institutions

G bytes

M bytes

k bytes

Institution Library & Information Services

Page 19:

Harvesting, aggregation, value addition and curation by data centres
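The slides do not name a harvesting protocol. As a minimal sketch, assuming the repository exposes an OAI-PMH interface (a common choice for repositories publishing Dublin Core metadata), a data centre could build a harvesting request as below. The base URL and the `list_records_url` helper are hypothetical; `ListRecords`, `metadataPrefix`, `set` and `from` are standard OAI-PMH request parameters.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict to one collection
    if from_date:
        params["from"] = from_date    # only records changed since this date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint, used only for illustration.
url = list_records_url("http://repository.example.org/oai",
                       from_date="2006-01-01")
print(url)
```

Passing a `from` date lets an aggregator harvest incrementally, fetching only records added or changed since its last visit rather than the whole repository.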

Page 20:

The relationship with (conventional?) publication protocols and procedures

• Discipline-based publication

• Domain-based publication

• Open Access publication

Page 21:

Aggregation, linking and information provision by third party services

• Indexing and aggregating with other datasets

• Aggregating and linking between datasets and articles

• Integration into information portals