Linking Open Drug Data to Cheminformatics and Proteochemometrics

Egon Willighagen <http://chem-bla-ics.blogspot.com/>, Bioclipse & Proteochemometric Group (Prof. Wikberg), Department of Pharmaceutical Biosciences, Uppsala University, 2009-11-20

My talk at SWAT4LS 2009 in Amsterdam.

Transcript of Linking Open Drug Data to Cheminformatics and Proteochemometrics

Page 1: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Linking Open Drug Data to Cheminformatics and Proteochemometrics

Egon Willighagen <http://chem-bla-ics.blogspot.com/>

Bioclipse & Proteochemometric Group (Prof. Wikberg)
Department of Pharmaceutical Biosciences

Uppsala University

2009-11-20

Page 2: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Knowledge...

Solanum lycopersicum...

We model our world, but ...
  Life is not uni- or bivariate
  Knowledge is not either
  But we think of it as such
Information Loss!

2009-11-20 Bioclipse & Proteochemometric Group - 2 - Egon Willighagen | chem-bla-ics.blogspot.com

Page 3: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Names...

benzene
3-[4-[3-(1-methyl-7-oxo-3-propyl-4H-pyrazolo[4,3-d]pyrimidin-5-yl)-4-propoxyphenyl]sulfonylpiperazin-1-yl]propanoic acid
InChI=1S/C25H34N6O6S/c1-4-6-19-22-23(29(3)28-19)25(34)27-24(26-22)18-16-17(7-8-20(18)37-15-5-2)38(35,36)31-13-11-30(12-14-31)10-9-21(32)33/h7-8,16H,4-6,9-15H2,1-3H3,(H,32,33)(H,26,27,34)

Page 4: Linking Open Drug Data to Cheminformatics and Proteochemometrics

... Molecular reality...

1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000

Page 5: Linking Open Drug Data to Cheminformatics and Proteochemometrics

... and Numbers

Page 6: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Knowledge Representation: Information Loss

Page 7: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Data Analysis

Page 8: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Proteochemometrics

Page 9: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Main Theme

How do we navigate dimensionality space?
How do we include prior knowledge?
While minimizing information loss?
With optimal knowledge extraction?
And maximizing interpretability?
Without ending up in random correlation?

Page 10: Linking Open Drug Data to Cheminformatics and Proteochemometrics

OpenMolecules RDF: dereferenceable URI

http://rdf.openmolecules.net/
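
The URI scheme can be illustrated with a minimal Python sketch. It assumes the pattern visible in the owl:sameAs link on the "CDK as RDF" slide, where the InChI is appended as the query string of http://rdf.openmolecules.net/. The function name `openmolecules_uri` is hypothetical, and the sketch only builds the URI; it does not dereference it.

```python
# Base address taken from the slide.
BASE = "http://rdf.openmolecules.net/"


def openmolecules_uri(inchi: str) -> str:
    """Build a URI for a molecule from its InChI (hypothetical helper).

    Assumption: the InChI is appended unchanged after '?', as in the
    owl:sameAs link shown on the "CDK as RDF" slide.
    """
    if not inchi.startswith("InChI="):
        raise ValueError("expected a string starting with 'InChI='")
    return BASE + "?" + inchi


# Methanol, using the InChI that appears later in the deck.
methanol = "InChI=1/CH4O/c1-2/h2H,1H3"
uri = openmolecules_uri(methanol)
print(uri)
```

Because the InChI is derived from the structure itself, two data sets that describe the same molecule arrive at the same URI without having to agree on a name.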

Page 11: Linking Open Drug Data to Cheminformatics and Proteochemometrics

OpenMolecules RDF: linked data

http://rdf.openmolecules.net/

Page 12: Linking Open Drug Data to Cheminformatics and Proteochemometrics

The Chemistry Development Kit

A Family of Projects:
  CDK-Taverna (chemoinformatics workflows)
  JChemPaint (semantic 2D editor)
  ChemoJava (GPL-ed extension)

Goals:
  library of cheminformatics algorithms
  educational

Usage:
  CDK: cited 100+ times in the scientific literature
  Bioclipse, KNIME, Jumbo (CML), AMBIT, ...

C. Steinbeck et al., J. Chem. Inf. Comput. Sci., 2003
C. Steinbeck et al., Curr. Pharm. Design, 2006

Page 13: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Bioclipse

O. Spjuth et al., BMC Bioinformatics 2007, 8:59

Page 14: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Integration

Services:
  databases: PubChem
  web services
  Google Spreadsheets
  MyExperiment.org: Bioclipse Scripting Language
  Twitter, ...
  journals, ...

Techniques:
  SOAP, REST, XMPP, ...
  Resource Description Framework
  dedicated APIs

Page 15: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Bioclipse-RDF

local RDF storage
read/write RDF/XML, N3
run SPARQL queries (local and remote)
extract RDF from XHTML/RDFa

Thanks to Jena and Pellet.
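
To make "remote SPARQL queries" concrete, here is a small Python sketch of what such a request looks like on the wire. It is not the Bioclipse-RDF API, which the slide does not show. It only follows the SPARQL Protocol convention of sending the query text in a `query` parameter of an HTTP GET. The endpoint address and the helper name `sparql_get_url` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only; substitute a real one.
ENDPOINT = "http://example.org/sparql"

# A simple query: list up to ten owl:sameAs links.
QUERY = """\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?s ?o WHERE { ?s owl:sameAs ?o } LIMIT 10
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return the GET URL for a SPARQL Protocol query request.

    The query text is URL-encoded and passed as the 'query' parameter.
    """
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

The sketch stops at building the URL, so it can be inspected without network access; an HTTP client would then fetch it and parse the result set.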

Page 16: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Quote of the Day

"There are too many people doing data integration, this is a waste of a lot of smart people’s time"

@alanruttenberg at #swat4ls2009 via dullhunk - twitter

Page 17: Linking Open Drug Data to Cheminformatics and Proteochemometrics

SPARQL end points

GNU FDL:
  NMRShiftDB data (also available via Bio2RDF)

CC0:
  ChemPedia
  Open Notebook Science Solubility

Page 18: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Names 2 Graphs 2 Numbers...

Page 19: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Disease 2 PDB

Page 20: Linking Open Drug Data to Cheminformatics and Proteochemometrics

CDK as RDF

model1:atom1 a cdk:Atom ;
    cdk:hasFormalCharge "1" ;
    cdk:symbol "O" .

model1:atom2 a cdk:Atom ;
    cdk:symbol "C" .

model1:mol1 a cdk:Molecule ;
    dc:title "Methanol" ;
    owl:sameAs <http://rdf.openmolecules.net/?InChI=1/CH4O/c1-2/h2H,1H3> ;
    cdk:hasAtom model1:atom2 ,
        model1:atom1 ;
    cdk:hasBond model1:bond1 .
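
As a rough illustration of how such a graph turns into numbers, the Python sketch below restates the slide's triples as plain tuples and counts the element symbols of the atoms attached to the molecule. It is a toy under stated assumptions, not CDK or Bioclipse code: the helper names `objects` and `element_counts` are hypothetical, the Turtle keyword `a` is written out as `rdf:type`, and only the two atoms listed on the slide are counted (no hydrogens appear there).

```python
# The slide's statements as (subject, predicate, object) tuples.
triples = [
    ("model1:atom1", "rdf:type", "cdk:Atom"),
    ("model1:atom1", "cdk:hasFormalCharge", "1"),
    ("model1:atom1", "cdk:symbol", "O"),
    ("model1:atom2", "rdf:type", "cdk:Atom"),
    ("model1:atom2", "cdk:symbol", "C"),
    ("model1:mol1", "rdf:type", "cdk:Molecule"),
    ("model1:mol1", "dc:title", "Methanol"),
    ("model1:mol1", "owl:sameAs",
     "http://rdf.openmolecules.net/?InChI=1/CH4O/c1-2/h2H,1H3"),
    ("model1:mol1", "cdk:hasAtom", "model1:atom2"),
    ("model1:mol1", "cdk:hasAtom", "model1:atom1"),
    ("model1:mol1", "cdk:hasBond", "model1:bond1"),
]


def objects(subject, predicate):
    """Return all objects of triples matching subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def element_counts(molecule):
    """Count element symbols over the atoms linked via cdk:hasAtom."""
    counts = {}
    for atom in objects(molecule, "cdk:hasAtom"):
        for symbol in objects(atom, "cdk:symbol"):
            counts[symbol] = counts.get(symbol, 0) + 1
    return counts


counts = element_counts("model1:mol1")
print(objects("model1:mol1", "dc:title"), counts)
```

A count per element is about the simplest numerical descriptor one can derive; the point is only that the nominal graph and the numerical vector describe the same resource.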

Page 21: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Proteochemometrics

Page 22: Linking Open Drug Data to Cheminformatics and Proteochemometrics

OWL for Descriptors

Used for model and data.

Page 23: Linking Open Drug Data to Cheminformatics and Proteochemometrics

MyExperiment: Bioclipse Scripting Language

Page 24: Linking Open Drug Data to Cheminformatics and Proteochemometrics

What does this bring us?

Platform to integrate RDF with the computational world
Bioclipse as single point of access
Scripting, sharing of scripts with MyExperiment.org
Bridge the nominal with the numerical world

Page 25: Linking Open Drug Data to Cheminformatics and Proteochemometrics

Where next?

Framework:
  Triple generation on demand (XMPP, SADI, ...)
  Ontology alignments
  Semantic MediaWiki integration

Proteochemometrics:
  Knowledge discovery
  Data set aggregation
  Automated model validation

Page 26: Linking Open Drug Data to Cheminformatics and Proteochemometrics

The Details

http://www.citeulike.org/user/egonw/tag/papers
http://chem-bla-ics.blogspot.com
http://egonw.github.com
waveto: [email protected]
