Data and Ontologies (bulletin.acscinf.org/PDFs/247nmACS03.pdf)

Data and Ontologies Colin Batchelor, Royal Society of Chemistry Leah McEwen, Cornell University


Overview

Thinking about chemical safety

Representing experiments

Ontologies and the IUPAC Colour books

Gaps

What can ontologies do for my use case?

Provide a controlled vocabulary for referents – what your data describes, where it came from.

Provide a shared vocabulary for integrating with other people’s data.

Safety

Recall the noun classes in Dyirbal (a language spoken in Queensland):

(1) Men, most animate objects
(2) Women, fire and dangerous things
(3) Edible fruit and vegetables
(4) Things not mentioned in the first three classes

What dangerous things should we be identifying?

Caution! Dangerous things!

“Although we have encountered no problems in handling Cu-azido during this work, it should be treated with great caution owing to their potential explosive nature. Thus, it should only be handled in small amounts.” doi:10.1039/b702988h

“Whilst no problems were encountered in the course of this work, perchlorate mixtures are potentially explosive and should therefore be handled with appropriate care.” doi:10.1039/b304841a

“Thallium compounds are highly toxic; they should therefore be handled with extreme caution and all operations must be carried out in an efficient fume hood.” doi:10.1039/b102192n

“we have not seen deflagration or detonation of any unconfined samples in the ignition experiments, some salts with high-oxygen and high-nitrogen content are known to be explosives, so appropriate precautions are advisable with new compounds” doi:10.1039/b602086k

“Metal azide complexes are potentially explosive. Only a small amount of material should be prepared and should be handled with caution” doi:10.1039/b106314f

“perchlorate salts of metal complexes are potentially explosive” doi:10.1039/b005671p

“Special caution was taken in the handling of fluoranthene and in the preparation of the MIPs.” doi:10.1039/b502706c

“anhydrous HF is an extremely corrosive and low boiling gas (19.5 °C) and should be handled in a well ventilated hood with protective gloves, face mask and clothing.” doi:10.1039/b206168f

A ‘simple’ taxonomy of dangerous things

Dangerous chemicals: perchlorates, thallium compounds, polonium, azides

… which have the disposition to take part in…

Dangerous processes: explosion, decarbonylation

… under certain experimental conditions…

… and these processes can be prevented (blocking the dispositions) or mitigated by…

Safety measures

Fume hoods mitigate carbon monoxide emission (part of decarbonylation)

Protective clothing prevents burning.

Greener solvents improve waste handling.

How do we express all this in an ontological framework?
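
One way to see what such a framework has to capture is to write the taxonomy down as plain data first. The sketch below is only an illustration: the relation names (`DISPOSED_TO`, `PART_OF`, `MEASURES`) are hypothetical and come from no published ontology, and the chemical-to-process links are read off the cautionary quotations rather than from a hazard database.

```python
# Minimal sketch of the "dangerous things" taxonomy as plain Python data.
# All relation names here are hypothetical, chosen for illustration only.

# Chemicals and the dangerous processes they are disposed to take part in
# (links taken from the cautionary quotations in the slides).
DISPOSED_TO = {
    "perchlorates": {"explosion"},
    "azides": {"explosion"},
}

# Some hazards are parts of larger processes.
PART_OF = {"carbon monoxide emission": "decarbonylation"}

# (safety measure, effect, process or process part it acts on)
MEASURES = [
    ("fume hood", "mitigates", "carbon monoxide emission"),
    ("protective clothing", "prevents", "burning"),
]


def hazards_of(chemical):
    """Return the dangerous processes a chemical is disposed to take part in."""
    return sorted(DISPOSED_TO.get(chemical, set()))


def measures_for(process):
    """Return (measure, effect) pairs acting on a process or on one of its parts."""
    targets = {process} | {part for part, whole in PART_OF.items() if whole == process}
    return sorted(
        (measure, effect) for measure, effect, target in MEASURES if target in targets
    )


print(hazards_of("azides"))
print(measures_for("decarbonylation"))
```

Asking for the measures that apply to decarbonylation finds the fume hood only because the part-of link is followed, which is the kind of inference an ontology is meant to make explicit.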

Representing processes (1)

RDF is based on binary relations.

A process: “Brutus stabbed Caesar”

Simplest RDF form:

ORCID:Marcus_Junius_Brutus  ONT:stabbed  ORCID:Gaius_Julius_Caesar .

Unsafe workplace practices, 44 BC

The Death of Caesar, Vincenzo Camuccini (1771–1844), via Wikipedia.

Representing processes (1)

A naïve representation of: “Brutus stabbed Caesar”:

Subject:    ORCID:Marcus_Junius_Brutus
Predicate:  ONT:stabbed
Object:     ORCID:Gaius_Julius_Caesar

Representing processes (2)

But what about “Brutus stabbed Caesar in the Senate on the Ides of March”?

Make the focus of the RDF the process rather than the participants.

Hence (next page):

Representing processes (2)

_:e1  a  ONT:stabbing ;
    ONT:has_agent  ORCID:Marcus_Junius_Brutus ;
    ONT:has_patient  ORCID:Gaius_Julius_Caesar ;
    ONT:has_location  ONT:Senate ;
    ONT:at_time  "-0043-03-15T14:00:00"^^xsd:dateTime .

We can now add arbitrarily many facts about this event without minting too many new predicates; better for OWL; better for risk assessment.
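
To make the contrast concrete, here is a small sketch that holds both representations as plain (subject, predicate, object) tuples, without an RDF library. The `ORCID:` and `ONT:` prefixes are the slides' illustrative ones; `ONT:has_instrument` and `ONT:dagger` are hypothetical additions, used only to show a new fact attaching to the event node.

```python
# Sketch: binary-relation triples as plain tuples (no RDF library assumed).

# Participant-centred form: one triple, nowhere to hang place or time.
naive = [
    ("ORCID:Marcus_Junius_Brutus", "ONT:stabbed", "ORCID:Gaius_Julius_Caesar"),
]

# Event-centred form: the blank node _:e1 stands for the stabbing itself.
event = [
    ("_:e1", "rdf:type", "ONT:stabbing"),
    ("_:e1", "ONT:has_agent", "ORCID:Marcus_Junius_Brutus"),
    ("_:e1", "ONT:has_patient", "ORCID:Gaius_Julius_Caesar"),
    ("_:e1", "ONT:has_location", "ONT:Senate"),
]


def describe(triples, subject):
    """Collect every predicate/object pair recorded for one subject."""
    return {p: o for s, p, o in triples if s == subject}


def add_fact(triples, subject, predicate, obj):
    """Return a new triple list with one more fact about the subject."""
    return triples + [(subject, predicate, obj)]


# Hypothetical predicate and object, for illustration only.
extended = add_fact(event, "_:e1", "ONT:has_instrument", "ONT:dagger")

for predicate, obj in describe(extended, "_:e1").items():
    print(predicate, obj)
```

The new fact reuses the existing event node. In the participant-centred form, recording the same information would need a fresh, more specific predicate for every combination of circumstances.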

Example synthesis

[Figure: flow diagram of an example synthesis. Processes: purging, addition, stirring and heating, addition with stirring, cooling, precipitation, washing, drying. Materials: dinitrogen, pivaloyl chloride, tert-butyl alcohol, trifluoromethanesulfonic acid, diethyl ether, air. Apparatus: flask, heating mantle, ice bath, filter. Output: PRODUCT.]

Chemical processes: OreChem

OreChem has a planning/enactment split.

From OreChem’s perspective, the account of planning is prospective and the account of enactment is retrospective.

Planning relates processes to other processes. Such-and-such a process follows another.

Enactment relates the products of processes to other products of processes. Such-and-such an artefact is produced from another.
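
The two views can be sketched as two separate relations. This is an illustration only and does not use the OreChem vocabulary: the dictionary names, step names and artefact names are all hypothetical, and both relations are assumed to be acyclic.

```python
# Sketch of the planning/enactment split as two relations.
# All names are hypothetical; this is not the OreChem vocabulary.

# Planning view: each process maps to the process it follows.
FOLLOWS = {
    "cooling": "addition",
    "precipitation": "cooling",
    "washing": "precipitation",
}

# Enactment view: each artefact maps to the artefact it was produced from.
PRODUCED_FROM = {
    "crude solid": "reaction mixture",
    "washed solid": "crude solid",
}


def chain(relation, start):
    """Walk a relation back from `start` to its origin (assumes no cycles)."""
    out = [start]
    while out[-1] in relation:
        out.append(relation[out[-1]])
    return out


print(chain(FOLLOWS, "washing"))             # plan, latest step first
print(chain(PRODUCED_FROM, "washed solid"))  # provenance, latest artefact first
```

Keeping the relations apart means a plan can be stated before anything is made, and a provenance trail can be recorded even where the plan was not followed exactly.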

Method signatures

A sample preparation step takes a material entity and converts it into some other. (m → m)

Detection methods and measurement methods take material entities and produce data. (m → d)

Data transformation methods take data and transform it into other data (d → d)
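
The three signatures can be treated as a small type system. In the sketch below, "m" stands for a material entity and "d" for data, as in the list above. The `Method` class and the three example methods are hypothetical and are not taken from CHMO or any other ontology.

```python
# Sketch: method signatures as a tiny type check on a workflow.
# The class and the example methods are hypothetical illustrations.
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    name: str
    takes: str  # "m" (material entity) or "d" (data)
    gives: str  # "m" or "d"


def signature(method):
    """Render the signature in the slides' notation, e.g. 'm → d'."""
    return f"{method.takes} → {method.gives}"


def can_follow(first, second):
    """A step can follow another only if it consumes what the other produces."""
    return first.gives == second.takes


filtration = Method("filtration", "m", "m")        # sample preparation
mass_spec = Method("mass spectrometry", "m", "d")  # measurement
baseline = Method("baseline correction", "d", "d") # data transformation

print(signature(mass_spec))
print(can_follow(filtration, mass_spec))  # material in, material out, then measured
print(can_follow(baseline, filtration))   # data cannot be filtered
```

A check like `can_follow` is enough to reject workflows that feed data into a sample preparation step, which is the practical use of assigning each method a signature.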

Ontology scope

                     Scale
Process signature    Molecular    Laboratory
m → m                RXNO         CHMO
m → d                CHMO         CHMO
d → d                CHEMINF

Chemical processes: CHMO

CHMO is a Chemical Methods Ontology that provides classes that describe both the processes (in a planning view) and the artefacts (in an enactment view).

From the safety discussion: where are the gaps?

•  Connections to chemical hazard information resources (GHS and Bretherick’s)
•  More thorough description of common lab apparatus
•  Combined processes (addition while stirring)

Other chemical hazard and safety concepts

Ontologies and the IUPAC Colour Books: an important question

Is an ontology the best tool for codifying a colour book? If it contains terminological recommendations, then these definitions can be used for the ontology. Examples follow.

Red Book: counterexample

This specifies an algorithm for relating names to structures rather than a definition. It could be part of a recipe for generating definitions for an identified set of hydride molecules.

Gold Book example

“Phototransistor = A bipolar transistor with its base-collector junction acting as a photodiode, which, if irradiated, controls the response of the device.”

This is a classical Aristotelian (genus–differentia) definition and is well suited to an ontology.
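
A definition of this shape splits cleanly into the two fields an ontology needs: a parent class (the genus) and the condition that sets the term apart (the differentia). The sketch below shows that split for the Gold Book wording quoted above. The `Definition` class is a hypothetical illustration, not an ontology API.

```python
# Sketch: a genus-differentia definition as structured data.
# The Definition class is a hypothetical illustration.
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str
    genus: str        # the parent class
    differentia: str  # what distinguishes the term within the genus

    def text(self):
        """Reassemble the textual definition from its two parts."""
        return f"{self.term} = A {self.genus} {self.differentia}."


phototransistor = Definition(
    term="Phototransistor",
    genus="bipolar transistor",
    differentia=(
        "with its base-collector junction acting as a photodiode, "
        "which, if irradiated, controls the response of the device"
    ),
)

# The genus gives the subclass link directly.
subclass_of = {phototransistor.term: phototransistor.genus}

print(phototransistor.text())
print(subclass_of)
```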

Green Book examples

Definitions of fundamental constants would fit well into an appropriate ontology.

Notational recommendations would fit better into an automated article checker or writing assistant.

Ontologies and the IUPAC Colour Books: an overview

Ontologies and the IUPAC Colour Books: where are the gaps?

[Figure: coverage of the IUPAC Colour Books by existing resources. Labels: ChEBI, CHMO, OPSIN, OBO ontologies; five areas are marked "gap".]

What if nothing already exists?

Are there vocabulary recommendations of the right sort?

(Suited to an ontology rather than some other sort of tool.)

If not, and in any case these will be incomplete, try…

a Gedankenexperiment

… following the event-based approach described earlier to divide your domain into processes and their participants.

www.irampp.org/blog