EBI is an Outstation of the European Molecular Biology Laboratory.
Chemical classification for the Semantic Web
Janna Hastings, EBI Cheminformatics and Metabolism
ACS Skolnik Symposium, Philadelphia,
21 August 2012
Classification conveys the type for data
The Semantic Web makes data of all types
available, open and interlinked
Classification using OWL ontologies
dramatically enhances the potential of the
chemical Semantic Web
21.08.2012 2
Why classify for the Semantic Web?
RDF “triples”:
?subject ?relationship ?object
rdf:type
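The triple pattern above can be sketched in plain Python, assuming triples are stored as 3-tuples of strings and that names beginning with "?" are variables. The "ex:" identifiers and the match helper are hypothetical and only illustrate how rdf:type triples stand out from the rest of the data; this is not a real RDF library.

```python
from typing import Iterator

Triple = tuple[str, str, str]

# Hypothetical example data; the "ex:" identifiers are made up for illustration.
TRIPLES: list[Triple] = [
    ("ex:sample42", "rdf:type", "ex:caffeine"),
    ("ex:sample42", "ex:measuredIn", "ex:coffee"),
    ("ex:sample43", "rdf:type", "ex:aspirin"),
]


def match(pattern: Triple, triples: list[Triple]) -> Iterator[dict[str, str]]:
    """Yield variable bindings for every triple that fits the pattern."""
    for triple in triples:
        bindings: dict[str, str] = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A variable repeated within the pattern must bind consistently.
                if bindings.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            yield bindings


# Ask only for the classification triples: ?subject rdf:type ?object
typed = list(match(("?subject", "rdf:type", "?object"), TRIPLES))
```

Fixing the relationship to rdf:type and leaving the other two positions open returns exactly the statements that say what kind of thing each subject is.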
Chemicals, classes and information
This is not a molecule
Molecules are small
They are three-dimensional
Their structures can vary according to their environment
We say they have the same type
when they share important properties
All caffeine molecules have type caffeine
There are many different ways to
represent molecules
InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
SMILES=Cn1cnc2n(C)c(=O)n(C)c(=O)c12
Name=caffeine
Name=1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione
Identifier=KEGG COMPOUND:C07481
None of these are (themselves) molecules
They describe and approximate
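As a minimal sketch of how such representations describe rather than are the molecule, the record below holds the caffeine strings from this slide and reads the atom counts out of the formula layer of the InChI. The dictionary layout and the formula_from_inchi helper are assumptions for illustration, and the parsing handles only a simple single-component formula.

```python
import re
from collections import Counter

# Representations of caffeine copied from the slide; none of them is the molecule.
caffeine = {
    "inchi": "InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3",
    "smiles": "Cn1cnc2n(C)c(=O)n(C)c(=O)c12",
    "names": ["caffeine", "1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione"],
    "kegg": "C07481",
}


def formula_from_inchi(inchi: str) -> Counter:
    """Count atoms in the formula layer of a standard InChI.

    The formula layer is the second '/'-separated field.
    Simplification: only a single-component formula is handled.
    """
    formula = inchi.split("/")[1]
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


atoms = formula_from_inchi(caffeine["inchi"])
```

The atom counts are one approximation among several; the connectivity, the names and the database identifier each capture something different about the same type of molecule.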
?subject ?relationship ?object
Science aims to discover general rules
about the things that the data are about
Classification puts the scientific
knowledge into the data
RDF is a technology for data representation,
OWL is a technology for classification
Hastings et al. Journal of Cheminformatics 2012
4:8 doi:10.1186/1758-2946-4-8
The Web Ontology Language (OWL)
Hierarchical organisation (from root to leaves)
Synonyms
Cross-references
Logical definitions
Can be re-used across data sources
Hastings et al. Journal of Cheminformatics 2012
4:8 doi:10.1186/1758-2946-4-8
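A minimal sketch of why synonyms and cross-references make a class re-usable across data sources: any name or foreign identifier resolves to the same class. The OntologyClass record, the build_index helper and the "ex:caffeine" identifier are hypothetical and do not reflect how OWL or any particular ontology stores these annotations.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """Hypothetical record for one ontology class and its annotations."""
    identifier: str
    label: str
    synonyms: set[str] = field(default_factory=set)
    xrefs: set[str] = field(default_factory=set)


def build_index(classes: list[OntologyClass]) -> dict[str, OntologyClass]:
    """Map every identifier, label, synonym and cross-reference to its class."""
    index: dict[str, OntologyClass] = {}
    for cls in classes:
        for key in {cls.identifier, cls.label, *cls.synonyms, *cls.xrefs}:
            index[key.lower()] = cls  # case-insensitive lookup
    return index


caffeine = OntologyClass(
    identifier="ex:caffeine",  # made-up identifier for illustration
    label="caffeine",
    synonyms={"1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione"},
    xrefs={"KEGG COMPOUND:C07481"},
)
index = build_index([caffeine])

# A data source that only knows the KEGG identifier still reaches the class.
found = index["KEGG COMPOUND:C07481".lower()]
```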
ChEBI classification hierarchy (excerpt):
Chemical entity
  Molecular entity
    organic molecular entity
      carboxylic acid
        acetylsalicylic acid (aspirin)
      organophosphorus compound
        chlorfenvinfos
      aldehyde
        pyridoxal (vitamin B6)
    inorganic molecular entity
      sodium chloride
  Group
    hydroxy group
  Chemical substance
InChI=1S/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3
SMILES=Cn1cnc2n(C)c(=O)n(C)c(=O)c12
Name=caffeine
Name=1,3,7-trimethyl-3,7-dihydro-1H-purine-2,6-dione
Identifier=KEGG COMPOUND:C07481
…
…
rdf:type
rdfs:subClassOf
Your data, your favourite identifier
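A minimal sketch of what the two links buy you, assuming triples are plain Python tuples: once a data record is typed with rdf:type, every superclass reachable through the subclass relation (rdfs:subClassOf in RDF Schema) applies to it as well. The "kegg:C07481" subject and the subclass links are illustrative simplifications with intermediate classes omitted, not an excerpt of the actual ontology.

```python
# Hypothetical triples: one data record typed as caffeine, plus a class hierarchy.
# The subclass links are simplified; intermediate classes are omitted.
TRIPLES = {
    ("kegg:C07481", "rdf:type", "caffeine"),
    ("caffeine", "rdfs:subClassOf", "organic molecular entity"),
    ("organic molecular entity", "rdfs:subClassOf", "molecular entity"),
    ("molecular entity", "rdfs:subClassOf", "chemical entity"),
}


def superclasses(cls: str, triples: set) -> set:
    """Return every class reachable from cls via rdfs:subClassOf."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def inferred_types(subject: str, triples: set) -> set:
    """Asserted rdf:type classes of subject plus all of their superclasses."""
    types = set()
    for s, p, o in triples:
        if s == subject and p == "rdf:type":
            types.add(o)
            types |= superclasses(o, triples)
    return types


types = inferred_types("kegg:C07481", TRIPLES)
```

Only one rdf:type statement was asserted for the record; the remaining types follow from the hierarchy, which is how a query for all organic molecular entities can find data that was annotated only as caffeine.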
Thanks
Christoph Steinbeck
Marcus Ennis, Gareth Owen, Steve Turner, Adriano Dekker,
Venkatesh Muthukrishnan, ChEBI users
Leonid Chepelev, Michel Dumontier, Colin Batchelor, Evan
Bolton, Nico Adams, Egon Willighagen, Despoina Magka,
Robert Stevens, Andrew Dalke
Funding: BBSRC, EU
Questions?