Representation of chemical data in QSAR and Crystallography
Egon Willighagen, Lunteren 2005
Computer representation of molecular structures
• Connection Table
• Dietz Representation
• Physical Properties
• Molecular Invariants
• Schrödinger Equation
• Spectra (NMR, IR, ...)
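A connection table, the first representation in the list, records which atoms a molecule contains and which pairs of atoms are bonded. The sketch below uses a minimal, hypothetical in-memory layout in Python (an atom list plus a bond list with bond orders). It is not any particular file format or toolkit API. The example molecule (ethanol) and the helper names `neighbours` and `formula` are illustrative assumptions.

```python
from collections import Counter

# Hypothetical connection table for ethanol (CH3-CH2-OH), explicit hydrogens,
# 0-based atom indices. Atom list: one element symbol per atom.
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]

# Bond list: (first atom index, second atom index, bond order).
bonds = [
    (0, 1, 1),                        # C-C
    (1, 2, 1),                        # C-O
    (0, 3, 1), (0, 4, 1), (0, 5, 1),  # methyl hydrogens
    (1, 6, 1), (1, 7, 1),             # methylene hydrogens
    (2, 8, 1),                        # hydroxyl hydrogen
]

def neighbours(n_atoms, bonds):
    """Derive an adjacency list (bonded partners per atom) from the bond list."""
    adj = [[] for _ in range(n_atoms)]
    for i, j, _order in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj

def formula(atoms):
    """Hill-order formula: C then H first if carbon is present, else all alphabetical."""
    counts = Counter(atoms)
    order = []
    if "C" in counts:
        order += ["C"] + (["H"] if "H" in counts else [])
    order += sorted(e for e in counts if e not in order)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)

adj = neighbours(len(atoms), bonds)
print(formula(atoms))  # C2H6O
print(adj[1])          # [0, 2, 6, 7]
```

Everything else in the list (descriptors, spectra, invariants) can be computed from such a table, which is why it is the usual starting point.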
Representing relations between descriptors
• Descriptor ontology: explicit definition of descriptor types and descriptor properties
• C. Steinbeck, C. Hoppe, S. Kuhn, M. Floris, R. Guha, E.L. Willighagen, Recent Developments of the Chemistry Development Kit (CDK) - An Open-Source Java Library for Chemo- and Bioinformatics, Current Pharmaceutical Design, accepted
Representation does make a difference
• Use of NMR spectra in Quantitative Structure Activity Relationship (QSAR) modeling
• Three representations: simulated 1H NMR spectra, simulated 13C NMR spectra, and theoretical descriptors
• Three data sets:
– water solubility of 431 compounds (WS)
– boiling points of 277 compounds (BP)
– LogP values of 154 compounds (LogP)
• E.L. Willighagen, H.M.G.W. Denissen, R. Wehrens, L.M.C. Buydens, On the use of 1H and 13C NMR spectra as QSAR descriptors, submitted
How the experiment is performed
• Partial Least Squares (PLS) regression
– 220 NMR bins
– 220 randomly selected theoretical descriptors (Dragon)
– five random divisions into training and test sets
– number of latent variables chosen with leave-one-out cross-validation
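The 220 NMR bins turn each simulated spectrum into a fixed-length vector that a regression method such as PLS can use. The slides do not give the binning scheme, so this Python sketch assumes equal-width bins over a 0 to 220 ppm window (1 ppm per bin, a plausible choice for 13C spectra). Each peak counts with unit intensity unless intensities are supplied. The function name `bin_spectrum` is hypothetical. The shifts in the example are approximate, illustrative values for ethanol, not data from the study. A 1H spectrum would need a different ppm window.

```python
def bin_spectrum(shifts, n_bins=220, low=0.0, high=220.0, intensities=None):
    """Convert peak positions (ppm) into a fixed-length intensity vector.

    Assumed scheme: equal-width bins over [low, high); each peak adds its
    intensity (default 1.0) to the bin containing its shift, and peaks
    outside the window are ignored.
    """
    if intensities is None:
        intensities = [1.0] * len(shifts)
    width = (high - low) / n_bins
    vector = [0.0] * n_bins
    for shift, intensity in zip(shifts, intensities):
        if low <= shift < high:
            # min() guards against rounding pushing a peak past the last bin
            index = min(int((shift - low) / width), n_bins - 1)
            vector[index] += intensity
    return vector

# Approximate 13C shifts of ethanol (CH3 and CH2), for illustration only.
ethanol = bin_spectrum([18.1, 57.8])
print(ethanol[18], ethanol[57], sum(ethanol))  # 1.0 1.0 2.0
```

Binning trades resolution for a common vector length. Two peaks that fall into the same bin become indistinguishable, and a small shift across a bin boundary moves intensity to a different variable.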
Number of latent variables
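The models use PLS regression, with the number of latent variables chosen by leave-one-out cross-validation. The slides do not say which software was used. The following is therefore only a pure-Python sketch of that procedure, assuming a single response (PLS1) fitted with the NIPALS algorithm. The names `fit_pls1`, `predict_pls1` and `choose_n_components` are hypothetical. The sketch keeps the number of latent variables with the lowest prediction error sum of squares (PRESS) over the left-out samples.

```python
def _mean(values):
    return sum(values) / len(values)

def fit_pls1(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with the NIPALS algorithm.

    X is a list of rows (samples x descriptors), y a list of responses.
    Returns the centring values plus, per latent variable, the weight
    vector w, the X-loading p and the y-loading q.
    """
    n, m = len(X), len(X[0])
    x_mean = [_mean([row[k] for row in X]) for k in range(m)]
    y_mean = _mean(y)
    E = [[row[k] - x_mean[k] for k in range(m)] for row in X]  # centred X residuals
    f = [value - y_mean for value in y]                       # centred y residuals
    components = []
    for _ in range(n_components):
        w = [sum(E[i][k] * f[i] for i in range(n)) for k in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:  # (numerically) no covariance left to model
            break
        w = [v / norm for v in w]
        t = [sum(E[i][k] * w[k] for k in range(m)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][k] * t[i] for i in range(n)) / tt for k in range(m)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove the part of X and y explained by this latent variable.
        E = [[E[i][k] - t[i] * p[k] for k in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def predict_pls1(model, x):
    """Predict the response for one sample by repeating the deflation steps."""
    x_mean, y_mean, components = model
    e = [x[k] - x_mean[k] for k in range(len(x))]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(e[k] * w[k] for k in range(len(e)))
        e = [e[k] - t * p[k] for k in range(len(e))]
        y_hat += q * t
    return y_hat

def choose_n_components(X, y, max_components):
    """Return the number of latent variables with the lowest leave-one-out PRESS."""
    best, best_press = 1, float("inf")
    for a in range(1, max_components + 1):
        press = 0.0
        for i in range(len(X)):
            model = fit_pls1(X[:i] + X[i + 1:], y[:i] + y[i + 1:], a)
            press += (y[i] - predict_pls1(model, X[i])) ** 2
        if press < best_press:  # ties keep the smaller, simpler model
            best, best_press = a, press
    return best
```

Pure-Python loops keep the sketch dependency-free. For 220 variables and hundreds of compounds, a linear-algebra library would be the practical choice.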
1H and 13C NMR versus Dragon Descriptors
Prediction Errors
Model interpretation?
What can we conclude?
• The choice of representation has a large effect on the models
• 1H NMR models have no predictive power
• 13C NMR models have some predictive power, but offer no advantage over the Dragon descriptors
Finding a representation for crystal structures
Electronic Radial Distribution Function (ReDF)
• Describes patterns in atom interactions in and around the unit cell
• E.L. Willighagen, R. Wehrens, P. Verwer, R. de Gelder, L.M.C. Buydens, A Method for the Computational Comparison of Crystal Structures, Acta Cryst., 2005, B61, 29-36
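A radial distribution function of this kind can be approximated as a histogram over interatomic distances. The slides give no formula, so this Python sketch rests on stated assumptions. Each pair made of an atom i in the unit cell and an atom j in the cell or one of its periodic images contributes q_i q_j / r_ij at distance r_ij, with q being atomic partial charges. The cell is assumed orthorhombic, with Cartesian coordinates inside it. The peak smoothing a real implementation would apply is omitted. The names `redf`, `r_max` and `bin_width` are hypothetical.

```python
import itertools
import math

def redf(atoms, cell, r_max=10.0, bin_width=0.1):
    """Histogram approximation of an electronic radial distribution function.

    atoms: list of (partial_charge, (x, y, z)) with Cartesian coordinates in Å
           lying inside an orthorhombic cell (an assumption made for brevity).
    cell:  (a, b, c) cell edge lengths in Å.
    Returns bin values for distances 0 <= r < r_max.
    """
    n_bins = round(r_max / bin_width)
    g = [0.0] * n_bins
    # Periodic images needed along each axis so that no pair within r_max is missed.
    reach = [math.ceil(r_max / length) for length in cell]
    shifts = list(itertools.product(*(range(-n, n + 1) for n in reach)))
    for q_i, pos_i in atoms:              # atoms in the reference unit cell
        for q_j, pos_j in atoms:          # partners: the same atoms in this and surrounding cells
            for shift in shifts:
                image = [pos_j[k] + shift[k] * cell[k] for k in range(3)]
                r = math.dist(pos_i, image)
                if 0.0 < r < r_max:       # r == 0 is an atom paired with itself
                    index = min(int(r / bin_width), n_bins - 1)
                    g[index] += q_i * q_j / r
    return g
```

Take an ion pair 1 Å apart in a 5 Å cubic cell, with r_max = 2 Å and 0.5 Å bins. The only contribution is -2.0, in the bin starting at 1 Å, because each pair is counted once from each of its atoms.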
Quantifying Similarities
ReDF 1 + ReDF 2 → weighted cross-correlation → similarity in [0, 1]
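The comparison step turns two ReDFs into one similarity score by weighted cross-correlation (WCC). This Python sketch assumes the normalised form S = Σ_s w(s) c12(s) / sqrt(Σ_s w(s) c11(s) · Σ_s w(s) c22(s)). Here c_fg(s) is the cross-correlation of the two binned functions at a shift of s bins, and w(s) = 1 − |s|/width is a triangular weight. The triangular weighting and the name `weighted_cross_correlation` are assumptions, not taken from the slides. For non-negative patterns the score lies in [0, 1]. Functions that take negative values can produce negative scores.

```python
import math

def cross_correlation(f, g, shift):
    """c_fg(shift) = sum over x of f[x] * g[x + shift], restricted to valid indices."""
    n = len(f)
    return sum(f[x] * g[x + shift] for x in range(max(0, -shift), min(n, n - shift)))

def weighted_cross_correlation(f, g, width):
    """Normalised WCC similarity of two equal-length binned functions.

    Assumed triangular weight: w(s) = 1 - |s| / width for |s| < width, else 0.
    width = 1 reduces to the cosine similarity of the two vectors.
    """
    def weighted_sum(a, b):
        return sum((1.0 - abs(s) / width) * cross_correlation(a, b, s)
                   for s in range(-width + 1, width))
    return weighted_sum(f, g) / math.sqrt(weighted_sum(f, f) * weighted_sum(g, g))

peak_a = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
peak_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]   # same peak, shifted by three bins
print(weighted_cross_correlation(peak_a, peak_a, 4))  # 1.0
print(weighted_cross_correlation(peak_a, peak_b, 4))  # 0.25
print(weighted_cross_correlation(peak_a, peak_b, 3))  # 0.0
```

A wider window gives partial credit to features that are slightly shifted between two functions, instead of comparing bins strictly one-to-one.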
Cephalosporin crystal structures
Polymorph Prediction
• Polymorphs: different crystal structures for the same molecular compound
• Polymorph Prediction: computational methods to predict the polymorphs of a given molecular structure
Polymorphic estrone crystal structures
• Better trend in similarity going from identical to different structures with ReDF + WCC (weighted cross-correlation) than with Cerius2
Conclusions
• It is important to pick a proper representation
• 1H and 13C NMR spectra are not good representations for QSAR models
• A new crystal structure descriptor (ReDF) gives similarities that are easier to interpret chemically
Acknowledgments
• Ron Wehrens, Lutgarde Buydens (supervisors)
• René de Gelder, Paul Verwer (crystal structures)
• Harm Denissen (QSAR)
• Peter Murray-Rust (Cambridge University, UK)
• Christoph Steinbeck (Cologne University, DE)
• NWO (for financial support)