Open Data, Open Source, and Open Standards in Drug Discovery, Metabolomics, Toxicology
ODOSOS in Life Sciences
Egon Willighagen <http://chem-bla-ics.blogspot.com/>
Prof. Peter Murray-Rust Group
Unilever Center for Molecular Informatics
University of Cambridge
2010-12-13
Setting
Problems
Building Blocks
Chemometrics
Conclusion
ODOSOS in Life Sciences
Open Data
Open Source
Open Standards (Specifications)
Drug Discovery (pharmaceutical biosciences)
Metabolomics
Predictive Toxicology
ODOSOS in chemometrics!
2010-12-13 University of Cambridge - 2 - Egon Willighagen | chem-bla-ics.blogspot.com
Knowledge...
Solanum lycopersicum...
We model our world, but ...
Life is not a Latin name
Transformations are needed
Knowledge is hidden in PDFs
Methods are hidden in proprietary software
Information Loss!
Knowledge Representation: Information Loss
Not paying attention?
EL Willighagen, J. Chem. Inf. Model. 2006, 46:487-494
... Molecular reality...
1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000
... and that is just the chemical graphs ...
Uncertainty
Metabolomics
92.0938 m/z, glycerol?
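The ambiguity in such an assignment can be made concrete with a small mass-matching sketch. Everything beyond the observed value and the glycerol candidate is an assumption here: the comparison uses the neutral formula C3H8O3 with no adduct correction, the atomic masses are rounded standard values, and the helper names are hypothetical.

```python
import re

# Rounded atomic masses (assumed values for illustration only).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
AVERAGE = {"C": 12.0107, "H": 1.00794, "O": 15.9994}


def formula_mass(formula, table):
    """Sum atomic masses for a simple formula such as 'C3H8O3'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += table[element] * (int(count) if count else 1)
    return total


def ppm_error(observed, theoretical):
    """Relative deviation of an observed value, in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


observed = 92.0938   # value from the slide
glycerol = "C3H8O3"  # candidate from the slide

mono = formula_mass(glycerol, MONOISOTOPIC)
avg = formula_mass(glycerol, AVERAGE)

print(f"monoisotopic mass: {mono:.4f}, error {ppm_error(observed, mono):.1f} ppm")
print(f"average mass:      {avg:.4f}, error {ppm_error(observed, avg):.1f} ppm")
```

Whether a candidate counts as a match depends on which mass definition, which adduct, and which tolerance were used, and that is exactly the context that tends to get lost.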
Underlying problems...
scientists are sloppy
context is lost, causing confusion
Building Blocks
Statistics
Molecular Representation (cheminformatics / semantics)
HPC / eScience
Reproducibility needs ODOSOS
Open Data
No Intellectual Monopoly
Open Source
algorithms are complex
implementations even more
strong interaction with representation
Open Standards
Semantic Web
formats
unique identifiers
http://en.wikipedia.org/wiki/Glyn_Moody
Open Data? Linking Data?
http://rdf.openmolecules.net/
W3C Health Care and Life Sciences Working Group
M. Samwald, Linked Open Drug Data for Pharmaceutical Research and
Development, submitted.
But what about similarity?
identity: owl:sameAs
stereochemistry: rdfs:seeAlso ?
similar molecules: rdfs:seeAlso, chem:hasHighTanimoto ?
has spectrum like ?
E.L. Willighagen, et al. Linking the Resource Description Framework to
Cheminformatics and Proteochemometrics, J. Biomed. Sem., in print.
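One way to picture the "similar molecules" question is to compute a Tanimoto coefficient between fingerprints and emit a link only above a cutoff. The sketch below is an illustration under stated assumptions, not the approach of the cited paper: the fingerprints are made-up sets of "on" bit positions, the `ex:` identifiers and the 0.7 threshold are hypothetical, and `chem:hasHighTanimoto` is the predicate the slide itself only floats as a question.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def similarity_triples(fingerprints, threshold=0.7):
    """Return (subject, predicate, object) triples for pairs above the threshold."""
    triples = []
    names = sorted(fingerprints)
    for i, subject in enumerate(names):
        for obj in names[i + 1:]:
            if tanimoto(fingerprints[subject], fingerprints[obj]) >= threshold:
                triples.append((subject, "chem:hasHighTanimoto", obj))
    return triples


# Toy fingerprints: hypothetical bit positions, not derived from real molecules.
toy_fingerprints = {
    "ex:molA": {1, 2, 3, 4},
    "ex:molB": {1, 2, 3, 4, 5},
    "ex:molC": {7, 8},
}

for triple in similarity_triples(toy_fingerprints):
    print(" ".join(triple), ".")
```

Unlike owl:sameAs, a link like this depends on the fingerprint and the cutoff, so both would need to travel with the triple for it to be reproducible.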
Open Source: The Chemistry Development Kit
A Family of Projects
CDK-Taverna (chemoinformatics workflows)
JChemPaint (semantic 2D editor)
ChemoJava (GPL-ed extension)
Goals
library of cheminformatics algorithms
educational
Usage
CDK: 100+ times cited in scientific literature
Bioclipse, KNIME, Jumbo (CML), AMBIT, ...
C. Steinbeck et al., J.Chem.Inf.Comput.Sci, 2003
C. Steinbeck et al., Curr.Pharm.Design, 2006
Bioclipse
O. Spjuth et al., BMC Bioinformatics 2007, 8:59
QSAR Wizards
Substructure Mining
A. Andersson, M.Sc. Report
OpenTox
E.L. Willighagen, in preparation
Chemical Translation Service
G. Wolgemuth, et al., Bioinformatics. 2010 Oct 15;26(20):2647-8
Reference Databases?
A. Williams, Community Views and Trust in Public Domain Chemistry
Resources, 2010.
How do we use these in Chemometrics?
It is clearer where the sources of error are
We can validate with far more data
We can aggregate new data to make better models
Visualization: Self-Organizing Maps
[Figure: self-organizing map; each cell is labelled with the class numbers (1 to 12) of the objects mapped to it]
Non-Linear Mapping
Similar objects are grouped together
Similar classes are grouped together
EL Willighagen, Crystal Growth & Design 2007, 7, 1738-1745.
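How a self-organizing map ends up grouping similar objects can be sketched in a few lines of plain Python. This is a generic textbook-style SOM, not the implementation behind the cited paper; the grid size, learning rate, linear decay schedule, and toy data are all assumptions made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows, cols, epochs=30, lr0=0.5, seed=0):
    """Train a rows x cols map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))
                for k in range(dim):
                    # Pull the node towards x, more strongly near the BMU.
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy data: two made-up groups of 2D points.
toy_data = [
    [0.05, 0.10], [0.10, 0.05], [0.08, 0.12],
    [0.90, 0.95], [0.95, 0.88], [0.92, 0.91],
]
som = train_som(toy_data, rows=3, cols=3)
for point in toy_data:
    print(point, "->", best_matching_unit(som, point))
```

Because each update also moves the neighbours of the winning node, nearby cells come to represent nearby regions of the input space, which is what produces the grouping of similar objects on the map.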
What about the n spaces you showed earlier?
Bayesian Statistics
E.L. Willighagen, et al. Linking the Resource Description Framework to
Cheminformatics and Proteochemometrics, J. Biomed. Sem., in print.
Blue Obelisk
R Guha et al., J.Chem.Inf.Model., 2006
Changes are coming!
The Details
http://www.citeulike.org/user/egonw/tag/papers
http://chem-bla-ics.blogspot.com
http://egonw.github.com