The CLARION Project for the “Infrastructure for Integration in Structural Sciences” (I2S2) meeting, Rutherford Labs, 11th February 2010
CLARION – Chemical Laboratory Repository In/Organic Notebooks
Principal Investigator: Peter Murray-Rust
Co-Investigator: Jim Downing
Project Team: Nick Day, Sam Adams, Brian Brooks
Unilever Centre, Department of Chemistry, University of Cambridge
CLARION overview
[Diagram labels: Internal Scientist; External Scientist; ELN (IDBS); Crystallography files (CIF); NMR files; Publications database; EmMa Embargo Mgr; EmMa user interface; Data Releaser; Data Loader; JUMBO converters; CML, RDF; CHEM-1 repository; CHEM-0 repository; RDF triplestores; SPARQL interface; CLARION query app. The numbers 1 to 7 on the diagram correspond to the steps below.]
1. Scientist collects data & stores it in a variety of locations
2. EmMa is notified about the new content
3. Scientist specifies the release conditions for the data
4. Timer waits until release conditions are met
5. Data is moved into CHEM-1 repository...
6. ... and (at some time) into CHEM-0 repository
7. Repository queried by scientists
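The seven steps above amount to a small lifecycle for each data record. A minimal Python sketch of that lifecycle follows, assuming the release condition is a simple embargo date. The class name `DataRecord`, the state names, and the functions `notify`, `set_release_condition`, `tick` and `promote` are hypothetical and are not taken from CLARION, whose components the slides describe as Java-based.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical state names mirroring steps 1-6 of the overview.
COLLECTED, NOTIFIED, EMBARGOED, IN_CHEM1, IN_CHEM0 = (
    "collected", "notified", "embargoed", "chem-1", "chem-0")


@dataclass
class DataRecord:
    url: str                            # where the scientist stored the data (step 1)
    state: str = COLLECTED
    release_on: Optional[date] = None   # release condition (assumed here: a date)


def notify(record: DataRecord) -> None:
    """Step 2: EmMa learns about the new content."""
    record.state = NOTIFIED


def set_release_condition(record: DataRecord, release_on: date) -> None:
    """Step 3: the scientist specifies when the data may be released."""
    record.release_on = release_on
    record.state = EMBARGOED


def tick(record: DataRecord, today: date) -> None:
    """Steps 4-5: called periodically by a timer; releases once the date is reached."""
    due = record.release_on is not None and today >= record.release_on
    if record.state == EMBARGOED and due:
        record.state = IN_CHEM1


def promote(record: DataRecord) -> None:
    """Step 6: at some later time the record moves on to CHEM-0."""
    if record.state == IN_CHEM1:
        record.state = IN_CHEM0


record = DataRecord(url="http://example.org/data/nmr-001")  # made-up URL
notify(record)
set_release_condition(record, date(2010, 6, 1))
tick(record, date(2010, 5, 31))   # too early: still embargoed
state_before = record.state
tick(record, date(2010, 6, 1))    # condition met: moved to CHEM-1
state_after = record.state
```

Keeping the timer check in one small function means a cron job, which the architecture slide lists, could call it on every record without knowing anything else about the workflow.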
ELN server
File Feed
ELN Feed
Lensfield Loader
ELN
Data Files
CHEM-0/1 repository
• Jetty webserver
• cron jobs
• Java
(the three items above appear on two boxes in the diagram)
GUI client
Design principles used:
• Decoupling through standard web interfaces (http, Atom)
• Avoid data duplication (by using http references unless a copy is required)
• Don’t do manually that which can be done automatically
• Manual semantification as early as possible
• Automatic semantification as late as possible
• Give ability to undo an action during a grace period rather than getting confirmation
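The last principle, undo during a grace period instead of a confirmation prompt, can be pictured as a queue of pending actions that only take effect once the grace period has elapsed. The following is an illustrative Python sketch, not CLARION code; `PendingActions`, its methods and the 24-hour default are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Tuple


class PendingActions:
    """Holds actions for a grace period during which they can still be undone."""

    def __init__(self, grace: timedelta = timedelta(hours=24)) -> None:
        self.grace = grace
        self._pending: Dict[str, Tuple[datetime, Callable[[], None]]] = {}

    def request(self, action_id: str, action: Callable[[], None], now: datetime) -> None:
        # No confirmation dialog: the action is accepted at once but deferred.
        self._pending[action_id] = (now + self.grace, action)

    def undo(self, action_id: str) -> bool:
        # True if the action was still pending and has now been cancelled.
        return self._pending.pop(action_id, None) is not None

    def run_due(self, now: datetime) -> List[str]:
        # Execute every action whose grace period has expired.
        due = [aid for aid, (deadline, _) in self._pending.items() if now >= deadline]
        for aid in due:
            _, action = self._pending.pop(aid)
            action()
        return due


released: List[str] = []
queue = PendingActions(grace=timedelta(hours=1))
start = datetime(2010, 2, 11, 9, 0)
queue.request("release-42", lambda: released.append("42"), now=start)
queue.request("release-43", lambda: released.append("43"), now=start)
queue.undo("release-43")                          # cancelled within the grace period
ran = queue.run_due(start + timedelta(hours=2))   # only release-42 is executed
```

The user is never interrupted with "are you sure?", yet a mistaken request costs nothing as long as it is withdrawn before `run_due` picks it up.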
• Jetty webserver
• Java
• H2db for metadata
• JUMBO converters
• Ontologies: ChemAxiom, ORE, ORE Chem Expt
• Jetty webserver
• Java & Clojure
CML
RDF
RDF Triplestore
Chemical Structure index
• Jetty webserver
• Java
• SPARQL
Blue boxes indicate logical machine environments
CLARION architecture
• SOAP
CLARION repository
• Sesame
• Chemicx
EmMa’s role:
• Adds metadata
• Defines embargo release conditions
• Is the gatekeeper for metadata quality
• Is the gatekeeper for security (trust, authentication, authorisation)
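EmMa's gatekeeper role for metadata quality could look something like the check below, which refuses to pass a record on until the required fields are filled in. This is only a sketch in Python: the field names (`title`, `creator`, `release_on`) and both helper functions are invented for illustration, since the slides do not say which metadata EmMa requires.

```python
from typing import Dict, List

# Hypothetical required fields; the slides do not list the real ones.
REQUIRED_FIELDS = ("title", "creator", "release_on")


def missing_metadata(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]


def may_pass_gate(record: Dict[str, str]) -> bool:
    """A record passes the gate only when no required field is missing."""
    return not missing_metadata(record)


# Made-up example records.
incomplete = {"title": "Crystal structure run 7", "creator": " "}
complete = {"title": "Crystal structure run 7", "creator": "A. Scientist",
            "release_on": "2010-06-01"}
```

Reporting the list of missing fields, rather than a bare yes or no, lets a review tool tell the scientist exactly what still needs to be added.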
Embargo Manager (EmMa)
Query System
Scientists are presented with data records, to which they add metadata and for which they then set embargo release conditions.
CLARION development stages & timings
[Diagram labels: Sources; EmMa; Data Loader; Repository; Stage 1, Stage 2, Stage 3, with milestones numbered 1 to 3.]
Stage 1: First data-feed into EmMa
• Atom-feeds from file stores
• EmMa feed-readers
• EmMa user review tool
• EmMa output atom-feeds
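Stage 1 hinges on feed-readers that consume Atom feeds from the file stores. As a rough illustration of what such a reader does, the Python sketch below parses an Atom document with the standard library and pulls out each entry's id, title and link, so the data itself can stay at its http location. The sample feed, its URLs and the `read_entries` helper are made up for the example and do not come from the project.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

ATOM = "{http://www.w3.org/2005/Atom}"   # Atom namespace (RFC 4287)

# Made-up sample feed standing in for a file store's output.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>NMR file store</title>
  <id>urn:example:feed:nmr</id>
  <updated>2010-02-11T09:00:00Z</updated>
  <entry>
    <id>urn:example:nmr:001</id>
    <title>Sample 001 proton spectrum</title>
    <updated>2010-02-10T16:30:00Z</updated>
    <link rel="alternate" href="http://example.org/files/nmr/001.jdx"/>
  </entry>
</feed>
"""


def read_entries(feed_xml: str) -> List[Dict[str, str]]:
    """Extract id, title and link href from every entry of an Atom feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        entries.append({
            "id": entry.findtext(ATOM + "id", default=""),
            "title": entry.findtext(ATOM + "title", default=""),
            # Keep only a reference to the data, not a copy of it.
            "href": link.get("href", "") if link is not None else "",
        })
    return entries


entries = read_entries(SAMPLE_FEED)
```

Because the reader keeps only the `href`, it follows the "avoid data duplication" principle: the file is fetched later, and only if a copy is really needed.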
Stage 2: Basic functionality to store first data-type into repository
• Lensfield reads EmMa feeds
• Process data to CML
• Process CML to RDF
• Store triples into triple-store
• Indexing of chemical structures
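The Stage 2 chain (CML to RDF, triples into a store) can be illustrated with a toy example. The Python sketch below reads a tiny CML fragment with the standard library and emits subject-predicate-object tuples into an in-memory set that stands in for the triple-store. It does not reproduce how the JUMBO converters or the ontologies named in the slides work; the `http://example.org/` predicate URIs, the sample molecule and the `cml_to_triples` function are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Set, Tuple

CML = "{http://www.xml-cml.org/schema}"   # CML namespace
EX = "http://example.org/clarion/"        # hypothetical vocabulary base

Triple = Tuple[str, str, str]

# Made-up CML fragment.
SAMPLE_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="ethanol">
  <formula concise="C 2 H 6 O 1"/>
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="O"/>
  </atomArray>
</molecule>
"""


def cml_to_triples(cml_xml: str) -> Set[Triple]:
    """Turn one CML molecule into a small set of (subject, predicate, object) triples."""
    mol = ET.fromstring(cml_xml)
    subject = EX + "molecule/" + mol.get("id", "unknown")
    triples: Set[Triple] = {(subject, EX + "type", EX + "Molecule")}
    formula = mol.find(CML + "formula")
    if formula is not None and formula.get("concise"):
        triples.add((subject, EX + "formula", formula.get("concise")))
    for atom in mol.iter(CML + "atom"):
        # A set collapses repeats, so each element is recorded once.
        triples.add((subject, EX + "hasElement", atom.get("elementType", "")))
    return triples


store: Set[Triple] = set()            # stand-in for a real triple-store
store |= cml_to_triples(SAMPLE_CML)
```

A real deployment would hand the triples to a store such as the Sesame triplestore listed on the architecture slide; the set is used here only to keep the example self-contained.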
Stage 3: Basic querying functionality
• Authentication & authorisation
• Pilot users loading data
• V1 query tool
Milestones:
• Data stored in RDF and chemical structures indexed
• System in use by pilot users & simple query interface for SSS & RDF queries
• Querying by outside users
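For the simple RDF queries mentioned above, a SPARQL basic graph pattern boils down to matching triples against a template that contains variables. The Python sketch below shows that idea on an in-memory list, with `None` acting as a wildcard. It is a teaching aid under that assumption, not the CLARION query tool, and the data, URIs and `match` function are made up. Chemical structure searching is not attempted here.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]

EX = "http://example.org/clarion/"   # hypothetical vocabulary base

# Made-up triples.
TRIPLES: List[Triple] = [
    (EX + "molecule/ethanol", EX + "formula", "C 2 H 6 O 1"),
    (EX + "molecule/ethanol", EX + "measuredBy", EX + "experiment/nmr-001"),
    (EX + "molecule/benzene", EX + "formula", "C 6 H 6"),
]


def match(triples: List[Triple], pattern: Pattern) -> List[Triple]:
    """Return the triples that agree with the pattern; None matches anything."""
    return [t for t in triples
            if all(p is None or p == v for p, v in zip(pattern, t))]


# Roughly equivalent to: SELECT ?s ?o WHERE { ?s <.../formula> ?o }
formulas = match(TRIPLES, (None, EX + "formula", None))
```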
EmMa: A general tool for controlling data release between systems?
[Diagram labels: EmMa; ISIS, ELN, XRay, NMR, etc.; PubChem, PDB, Chem-1, Chem-0, NCS eCrystals; Atom feed (five instances); Public Atom feed; Private Atom feed; Pump (two instances); Fully semantified data (RDF); Original data plus basic metadata.]
How EmMa could facilitate data release in collaborating institutions
[Diagram labels: Institution A (EmMa); Institution B (EmMa); Rutherford neutron; Private repository; Public repository. The numbers 1 to 7 on the diagram correspond to the events below.]
Events:
1. Scientist sends sample to Rutherford
2. Rutherford stores data locally and sends copy back to scientist
3. Institution’s EmMa is informed about new data
4. Scientist specifies data release conditions
5. Release conditions reached, data released to public repository
6. Rutherford monitors institution’s atom feed, detects data is released
7. Rutherford makes data visible in their own public-access repository
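Events 6 and 7 on this slide describe Rutherford watching an institution's Atom feed and reacting when data is released. One simple way to do that is to remember which entry ids have already been seen and act only on new ones, as in the Python sketch below. `FeedMonitor`, `poll` and the entry ids are hypothetical, and fetching and parsing the feed are left out.

```python
from typing import Iterable, List, Set


class FeedMonitor:
    """Remembers which feed entry ids have been seen and reports new ones."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def poll(self, entry_ids: Iterable[str]) -> List[str]:
        # entry_ids: the ids currently listed in the institution's public feed
        new: List[str] = []
        for eid in entry_ids:
            if eid not in self._seen:
                self._seen.add(eid)
                new.append(eid)
        return new


monitor = FeedMonitor()
public_repository: List[str] = []   # stand-in for Rutherford's public-access repository

first = monitor.poll(["urn:example:run:1"])                        # event 6: release detected
public_repository.extend(first)                                    # event 7: made visible
second = monitor.poll(["urn:example:run:1", "urn:example:run:2"])  # only run:2 is new
public_repository.extend(second)
```

Since the monitor only reads a public feed, neither institution needs access to the other's internal systems, which fits the decoupling principle stated earlier in the slides.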