Advances in Cheminformatics

Advances in Cheminformatics: Applications in Biotechnology, Drug Design and Bioseparations. Curt M. Breneman, Department of Chemistry and Chemical Biology/Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute. Presented at Siena College, NY 4/29/05

Transcript of Advances in Cheminformatics

Page 1: Advances in Cheminformatics

Advances in Cheminformatics

Applications in Biotechnology, Drug Design and Bioseparations

Curt M. Breneman

Department of Chemistry and Chemical Biology/Center for Biotechnology and Interdisciplinary Studies

Rensselaer Polytechnic Institute

Presented at Siena College, NY 4/29/05

Page 2: Advances in Cheminformatics

The Informatics Process

[Diagram labels: DATA, INFORMATION, KNOWLEDGE, UNDERSTANDING, WISDOM]

Page 3: Advances in Cheminformatics

Representing Molecules

[Figure: chemical structure drawing; only the atom labels O, N, H3C and CH3 survive extraction]

Page 4: Advances in Cheminformatics

Descriptors: Quantifying Molecular Properties

Page 5: Advances in Cheminformatics

Molecular Surface Properties

Electronic Properties

– Electrostatic Potential: $EP(r) = \sum_{\alpha} \frac{Z_\alpha}{|r - R_\alpha|} - \int \frac{\rho(r')\,dr'}{|r - r'|}$

– Electronic Kinetic Energy Density: $K(r) = -(\psi^* \nabla^2 \psi + \psi \nabla^2 \psi^*)$ and $G(r) = -\nabla\psi^* \cdot \nabla\psi$

– Electron Density Gradients: $\nabla\rho \cdot N$

– Laplacian of the Electron Density: $L(r) = -\nabla^2 \rho(r) = K(r) - G(r)$

– Local Average Ionization Potential: $PIP(r) = \sum_i \frac{\rho_i(r)\,|\varepsilon_i|}{\rho(r)}$

– Bare Nuclear Potential (BNP)

– Fukui function: $F^+(r) = \rho_{HOMO}(r)$

Page 6: Advances in Cheminformatics

Why use Electron Density-Derived Molecular Descriptors?

Motivations

– Electron Density Distributions represent molecular properties that are key to biological activities

Enabling Technologies

– Fast methods (TAE/RECON) for obtaining electron density-derived properties

Encoding schemes

– Surface Property distributions (Histograms, Wavelets, Dixels)

– Shape/Property hybrid distributions (PEST)

Synergies

– Complementary to topological descriptors

Page 7: Advances in Cheminformatics

Surface Property Distribution Histograms (RECON/TAE) Descriptors

Molecular surface property distributions can be represented as RECON/TAE histogram bin descriptors
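As a rough illustration of how a surface property distribution might be turned into histogram bin descriptors, the sketch below bins sampled surface values into equal-width bins. It is not the RECON/TAE implementation: the function name, the bin count, the value range and the sample electrostatic potential values are all hypothetical.

```python
def histogram_descriptors(values, lo, hi, n_bins):
    """Bin surface property samples into n_bins equal-width bins.

    Returns the fraction of samples in each bin, so the descriptor
    vector sums to 1 however many surface points were sampled.
    """
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        # Clamp out-of-range samples into the first or last bin.
        idx = min(max(idx, 0), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins


# Hypothetical electrostatic potential samples on a molecular surface.
ep_samples = [-0.08, -0.04, -0.01, 0.01, 0.02, 0.03, 0.07, 0.09]
descriptors = histogram_descriptors(ep_samples, lo=-0.10, hi=0.10, n_bins=4)
print(descriptors)
```

Normalizing by the number of samples is a design choice made here so that molecules with different surface areas give comparable vectors; an area-weighted count would be an equally plausible alternative.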

Page 8: Advances in Cheminformatics

Surface Property Encoding

Page 9: Advances in Cheminformatics

Molecular Surface Properties: Wavelet Coefficient Descriptors (WCD)

Wavelet Surface Property Reconstruction:

– 1024 raw wavelet coefficients capture PIP distribution on molecular surface.

– 16 coefficients from S7 and D7 portions of the WCD vector represent surface property densities with >95% accuracy.

Wavelet Decomposition:

– Creates a set of coefficients that represent a waveform.

– Small coefficients may be omitted to compress data.
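The compression idea can be sketched with a minimal orthonormal Haar transform. Haar is swapped in here only because it fits in a few lines; the slide's S7/D7 coefficients come from a different wavelet family and a different code. The function names and the toy step-shaped profile are hypothetical.

```python
import math


def haar_forward(signal):
    """Full orthonormal Haar decomposition; length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    coeffs = list(signal)
    s = math.sqrt(2.0)
    length = n
    while length > 1:
        half = length // 2
        approx = [(coeffs[2 * i] + coeffs[2 * i + 1]) / s for i in range(half)]
        detail = [(coeffs[2 * i] - coeffs[2 * i + 1]) / s for i in range(half)]
        # Layout after each pass: [approximation | detail] in the first `length` slots.
        coeffs[:length] = approx + detail
        length = half
    return coeffs


def haar_inverse(coeffs):
    """Invert haar_forward."""
    out = list(coeffs)
    s = math.sqrt(2.0)
    length = 1
    while length < len(out):
        merged = []
        for a, d in zip(out[:length], out[length:2 * length]):
            merged.append((a + d) / s)
            merged.append((a - d) / s)
        out[:2 * length] = merged
        length *= 2
    return out


def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients (the compression step)."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


# Toy property-density profile: a single step.
profile = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
coeffs = haar_forward(profile)
restored = haar_inverse(keep_largest(coeffs, 2))
print([round(x, 6) for x in restored])
```

For this step-shaped profile only two Haar coefficients are nonzero, so keeping two of the eight coefficients already reproduces the profile; smoother real distributions would need more.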

Page 10: Advances in Cheminformatics

Wavelet Representations of High-Resolution Molecular Surface Property Densities.

(1, 2, 10 and 20 Coefficient Decompositions)

Page 11: Advances in Cheminformatics

Wavelet Representations of High-Resolution Molecular Surface Property Densities.

Page 12: Advances in Cheminformatics

Molecular Shape Encoding

Karthigeyan Nagarajan, Randy Zauhar, and William J. Welsh, “Enrichment of Ligands for the Serotonin Receptor Using the Shape Signatures Approach” J. Chem. Inf. Model., 45, 49-57 (2005)

Curt M. Breneman, C. Matthew Sundling, N. Sukumar, Lingling Shen, William P. Katt and Mark J. Embrechts, “New developments in PEST shape/property hybrid descriptors” J. Computer-Aided Mol. Design, 17, 231–240, (2003)

Page 13: Advances in Cheminformatics

PEST: Molecular Shape/Property Hybrid Encoding

PEST (Property-Encoded Surface Translation)

– Adds shape information to encode the spatial relationships of surface properties

Page 14: Advances in Cheminformatics

PEST Molecular Ray Tracing Algorithm

Page 15: Advances in Cheminformatics

PEST Property-Encoded Rays

Page 16: Advances in Cheminformatics

PEST Hybrid Shape/Property Histogram Convergence: Four sets of initial conditions

Page 17: Advances in Cheminformatics

Machine Learning and Model Building

Page 18: Advances in Cheminformatics

Model Building and Validation

[Flow diagram labels: DATASET; Training set; Test set; Bootstrap sample k; Training; Validation; Learning Model; Tuning / Prediction; Predictive Model; Prediction]

Y-scrambling model validation!
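A minimal sketch of the two validation ideas named on this slide, bootstrap resampling and Y-scrambling, is given below. It assumes a single-descriptor least-squares line as a stand-in for the actual learning model, and the data, noise level and function names are all hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept (stand-in learning model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def r_squared(xs, ys, model):
    """1 - SS_res / SS_tot for the given (slope, intercept) on (xs, ys)."""
    slope, intercept = model
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0


def bootstrap_r2(xs, ys, n_rounds, rng):
    """Train on each bootstrap sample, validate on the cases it left out."""
    idx = list(range(len(xs)))
    scores = []
    for _ in range(n_rounds):
        train = [rng.choice(idx) for _ in idx]  # sample with replacement
        chosen = set(train)
        held_out = [i for i in idx if i not in chosen]
        if len(held_out) < 2:
            continue
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            r_squared([xs[i] for i in held_out], [ys[i] for i in held_out], model)
        )
    return sum(scores) / len(scores) if scores else float("nan")


def y_scrambled_scores(xs, ys, n_scrambles, rng):
    """Repeat the bootstrap validation after shuffling the responses."""
    scores = []
    for _ in range(n_scrambles):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # breaks the descriptor/response link
        scores.append(bootstrap_r2(xs, shuffled, 20, rng))
    return scores


rng = random.Random(0)
xs = [i / 10 for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.3) for x in xs]
real_score = bootstrap_r2(xs, ys, 20, rng)
scrambled_scores = y_scrambled_scores(xs, ys, 10, rng)
print(real_score, max(scrambled_scores))
```

If the model were only fitting chance correlations, the scrambled scores would be comparable to the real one; a clear gap between them is the evidence Y-scrambling is meant to provide.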

Page 19: Advances in Cheminformatics

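Support Vector Regression

Linear hypotheses: $f(x) = wx + b$

[Figure: ε-tube around the regression line, bounded by $f(x) = wx + b + \varepsilon$ and $f(x) = wx + b - \varepsilon$, with slack variables $\xi$ and $\xi^*$]

Minimize: Empirical error + Complexity

$$\min \; C \sum_{i=1}^{m} L_\varepsilon(x_i) + \|w\|_1$$

Empirical error, ε-insensitive loss function:

$$L_\varepsilon(x) = \max(0, |y - f(x)| - \varepsilon)$$

Complexity control, l1-norm weight vector:

$$\|w\|_1 = \sum_{i=1}^{n} |w_i|$$

Linear programming form:

$$\min \; \sum_{i=1}^{n} (u_i + v_i) + C \frac{1}{l} \sum_{j=1}^{l} (\xi_j + \xi_j^*) + C \nu \varepsilon$$

$$\text{s.t.} \quad y_j - \sum_{i=1}^{n} (u_i - v_i) x_{ij} - b \le \varepsilon + \xi_j$$

$$\sum_{i=1}^{n} (u_i - v_i) x_{ij} + b - y_j \le \varepsilon + \xi_j^*$$

$$u_i, v_i, \xi_j, \xi_j^*, \varepsilon \ge 0, \quad j = 1, \ldots, l, \quad i = 1, \ldots, n$$

[Figure: l1-norm and l2-norm]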
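The objective on this slide, empirical error plus complexity, can be evaluated directly for a given weight vector. The sketch below only computes the ε-insensitive loss and the l1-norm penalty for fixed w and b; it does not solve the linear program, and the function names and example numbers are hypothetical.

```python
def predict(w, b, x):
    """Linear hypothesis f(x) = w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def eps_insensitive_loss(y, f_x, eps):
    """L_eps = max(0, |y - f(x)| - eps): errors inside the tube cost nothing."""
    return max(0.0, abs(y - f_x) - eps)


def l1_norm(w):
    """Complexity term: sum of absolute weights."""
    return sum(abs(wi) for wi in w)


def objective(w, b, X, y, C, eps):
    """C * (sum of eps-insensitive losses) + ||w||_1."""
    empirical = sum(
        eps_insensitive_loss(yi, predict(w, b, xi), eps) for xi, yi in zip(X, y)
    )
    return C * empirical + l1_norm(w)


# Hypothetical two-descriptor example; the second weight is zero (unused descriptor).
w, b = [2.0, 0.0], 1.0
X = [[1.0, 5.0], [2.0, -3.0], [3.0, 0.5]]
y = [3.25, 5.0, 9.0]
losses = [eps_insensitive_loss(yi, predict(w, b, xi), 0.5) for xi, yi in zip(X, y)]
total = objective(w, b, X, y, C=10.0, eps=0.5)
print(losses, total)
```

Only the third point lies outside the ε-tube, so it alone contributes to the empirical error. The l1 penalty counts absolute weights, which is why it tends to drive weights of unhelpful descriptors to exactly zero.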

Page 20: Advances in Cheminformatics

hERG: ROC Curve Comparisons

Leave-one-out results from different models

Before Feature Selection After Feature Selection

Page 21: Advances in Cheminformatics

[Figure: KPLS Test; numeric labels 45, 109, 36]

Page 22: Advances in Cheminformatics

3D QSAR: PEST vs. CoMFA

Criterion | PEST | CoMFA
Align. independent | Yes | No
Unsupervised | Yes | No
Preparation | Generate property isosurfaces | Grid resolution, and fields
Computation Runtime | ~10 minutes / mol. | Depends…
Model Building | PLS, k-PLS, any induction learners (NN, decision trees) | PLS
Model Interpretation | Difficult | Intuitive

Page 23: Advances in Cheminformatics

3D QSAR: CoMFA (Comparative Molecular Field Analysis)

Standard in 3D QSAR methods

Requires alignment

Alignment rules do not perform well in unsupervised operations!

Page 24: Advances in Cheminformatics

3D QSAR: PEST vs. CoMFA

RESULTS: trypsin

Method | q2 (training) | # components | r2 (testing)
PLS | 0.61 | 7 | 0.65
PLS | 0.76 | 9 | 0.85
PLS | 0.87 | 4 | 0.75
Bootstrap-PLS | 0.88 | 4 | 0.79
k-PLS | 0.96 | 4 | 0.79

Row groups: CoMFA, then PEST.

Page 25: Advances in Cheminformatics

3D QSAR: PEST vs. CoMFA

RESULTS

CoMFA can do this, …

What can PEST do?

Page 26: Advances in Cheminformatics

3D QSAR: PEST vs. CoMFA

RESULTS

PLS indicates which descriptors are most important in the model

Graphical Analysis can localize PEST descriptor contributions, for example…

Page 27: Advances in Cheminformatics

PEST Descriptor EP(6,1)

Page 28: Advances in Cheminformatics

PEST Descriptor EP(2,5)

Page 29: Advances in Cheminformatics

PEST for Protein Characterization

Page 30: Advances in Cheminformatics

1AO6 and 135L Protein Surfaces (EP)

1AO6 135L

Page 31: Advances in Cheminformatics

PPEST 1AO6.ep

Page 32: Advances in Cheminformatics

PPEST 135L.ep

Page 33: Advances in Cheminformatics

PPEST 1AO6.mlp2

Page 34: Advances in Cheminformatics

PPEST 135L.mlp2

Page 35: Advances in Cheminformatics

P.PHENYL (RECON+MOE)

Page 36: Advances in Cheminformatics

P.PHENYL (RECON+MOE)

Page 37: Advances in Cheminformatics

P.Phenyl (RECON+PEST+MOE)

Page 38: Advances in Cheminformatics

Summary

Electron Density-Derived molecular property descriptors contain valuable physicochemical information

TAE descriptors are useful for building virtual high-throughput screening models (ADME, bioassay)

Predictive models can be built using TAE and PEST descriptors

Proteins (or protein binding sites) may be characterized using Protein PEST techniques

Page 39: Advances in Cheminformatics

Current Software

RECON 5.8 + Analyze w/Outlier detection

– RAD

– Fast KPLS test set mode with low memory footprint

RECON for MOE

– Drop-in interactive or batch RECON 5.8 for MOE 2003

RECON 2001 for protein characterization

– Property moment descriptors (Cramer)

– Binding site/ligand scoring using Universal Descriptor Space (Tropsha)

TAE/DIXEL

– DNA Characterization and bioinformatics (Lawrence)

PEST (Compatible with Gaussian or Jaguar 5.0)

– PAD

– WSAD

– Wavelets

Page 40: Advances in Cheminformatics

ACKNOWLEDGMENTS

Members of the DDASSL group

– Breneman Research Group (RPI Chemistry): N. Sukumar, M. Sundling, C. Whitehead (Pfizer), L. Shen, L. Lockwood (Albany Molecular), M. Song, D. Zhuang, W. Katt, Q. Luo

– Embrechts Research Group (RPI DSES)

– Tropsha Research Group (UNC Chapel Hill)

– Bennett Research Group (RPI Mathematics): Jinbo Bi

Collaborators:

– Lawrence Research Group (NYS Wadsworth Labs): Inna Vitol

– Cramer Research Group (RPI Chemical Engineering)

Funding

– NIH (GM047372-07)

– NSF (BES-0214183, BES-0079436, IIS-9979860)

– GE Corporate R&D Center

– Millennium Pharmaceuticals

– Concurrent Pharmaceuticals

– Pfizer Pharmaceuticals

– ICAGEN Pharmaceuticals

– Eastman Kodak Company