Molecular Similarity Searching Using Atom Environments and Surface Point...

46
Molecular Similarity Searching Using Atom Environments and Surface Point Environments Andreas Bender [email protected] Unilever Centre for Molecular Informatics Cambridge University, UK

Transcript of Molecular Similarity Searching Using Atom Environments and Surface Point...

Page 1: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Molecular Similarity Searching Using Atom Environments and Surface

Point Environments

Andreas [email protected]

Unilever Centre for Molecular Informatics Cambridge University, UK

Page 2: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Outline• Objective: More efficient searching of

chemical databases• New methods developed to detect molecules

with similar biology: One is based on connectivity (2D), the other on surface points (3D)

• Details of the algorithms presented here, starting with the 2D type

• Results: Lead Discovery – finding new drugs, finding new chemotypes

• Feature: Discovering Binding Patterns

Page 3: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Descriptor Choice

Page 4: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

2D: Environment around an atom

• E.g. 6-aminoquinoline Assign Sybyl mol2 atom typesfind connectionsfind connections to connectionscreate a tree down to n levels‘bin’ the atom types for each levelcreate a ‘fingerprint’ for this atom

N2

Car Car

Level 0

Level 1

Level 2Car, Car, Car

1 21 1These features are created for every (heavy) atom in the molecule

Page 5: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Feature Selection

• E.g. comparing faces first requires the identification of key features.

• How do we identify these?

• The same applies to molecules.

Page 6: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

B) Information-Gain Feature Selection

• We wish to select the important features.• To do this we calculate the entropy of the

data as a whole and for each class.• This is used to select those features with the

highest discrimination, e.g. active and inactive molecules. ppS ∑−= 2log

vv

v SSS

SI ∑−=Entropy of the

whole setEntropy of subsets

Information gain (to be maximized)

Page 7: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Classification

• The next step is to identify which molecules belong to which class.

• To do this we use a Naïve Bayesian Classiferusing the features (atom environments) we have identified as being important.

Page 8: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

C) Naïve Bayesian Classifier (“classification by presumptive evidence”)

• Include all selected features fi in calculation of

• Ratio > 1: Class membership 1• Ratio < 1: Class membership 2• F: feature vector• fi:feature elements

∏=i i

i

CLfPCLfP

CLPCLP

FCLPFCLP

)|()|(

)()(

)|()|(

2

1

2

1

2

1

Page 9: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Application: lead discovery• Database: MDL Drug Data Report (MDDR)• 957 ligands selected from MDDR

49 5HT3 Receptor antagonists,40 Angiotensin Converting Enzyme inhib. (ACE),111 HMG-Co-Reductase inhibitors (HMG),134 PAF antagonists and49 Thromboxane A2 antagonists (TXA2)574 “inactives”[Briem and Lessel, Perspect Drug Discov Des 2000, 20, 245-264.]

• Calculated Hit rate among ten nearest neighbours for each molecule

Page 10: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Comparison

Briem and Lessel, Perspectives in Drug Discovery and Design 2000, 20, 245-264.

0%

10%

20%

30%

40%

50%

60%

70%

80%

AE - Tan

imoto

Featur

e Tree

sAE - B

ayes

ISIS M

OLSKEYS

DAYLIGHT Fing

erprin

ts

SYBYL Holo

gram Q

SARFLE

XSIM-X

FLEXSIM

-S

DOCKSIM

Aver

age

Hit

Rat

e

Using BayesianUsing Tanimoto Coefficient

Page 11: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Combining Information in Molecules

• In this method, we can extend the approach by extracting from a set of molecules those features having the best information gain

• This can describe patterns in molecules much better than individual cases

Page 12: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Combining Information of 5 “Actives”

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

AE - Bay

es, 5

Mol

AE - Tan

imoto

Featur

e Tree

sAE - B

ayes

ISIS MOLS

KEYS

DAYLIGHT Fing

erprin

ts

SYBYL Holo

gram Q

SARFLE

XSIM-X

FLEXSIM

-SDOCKSIM

Ave

rage

Hit

Rat

e

Page 13: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Comparison using Large Data Set *• 102,000 structures from the MDDR• 11 Sets of Active Compounds, ranging in size from

349 to 1246 entries – large and diverse data set• Performance Measure: Fraction of Active Structures

retrieved in Top 5% of sorted library• Atom Environments were compared to Unity

Fingerprints in Combination with Data Fusion (MAX) and Binary Kernel Discrimination

• In case of Binary Kernel Discrimination and the Bayes Classifier 10 actives and 100 inactives used for training

* Hert et al., J. Chem. Inf. Comput. Sci. 2004 (ASAP Article)

Page 14: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Comparison of Methods

0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

5HT3

5HT1A

5HTReu

pD2 A

nta

Renin

IAng

II Anta

Throm I

Sub P

Anta

HIVProt

I

Cox I

Prot K

in C I

Averag

e

Unity SingleAE SingleData-Fusion MAXBKD,10a100i,k=100AE 10a100i

Page 15: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Conclusions 2D Method

• Atom Environments suitable descriptor, perform well with Tanimoto

• Atom Environments / Bayesian Classifier outperform Unity Fingerprints in combination with Data Fusion and Binary Kernel Discrimination on a Large Dataset -> information fusion prior to screening superior

• Average Hit Rate ~ 10% higher (65% vs. 57%) than the second best method

• Results on diverse targets may imply that method is generally applicable at high performance levels

Page 16: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Transformation to 3D

• Idea: To develop an analogous translationally and rotationally invariant (TRI) descriptor based on surface points

• Advantage: Switching from element atom types to interaction energies gives more general model -> scaffold hopping?

• In Addition: Local Description hopefully less conformationally dependent

• Approach to Fingerprint Surfaces; Tanimoto and other methods become applicable (until now mainly used for 2D fingerprints)

Page 17: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Transformation to 3D

• Two parts: Interaction fingerprint and shape description; here results using only interaction fingerprints are shown, shape description under development

• Information was merged from multiple molecules by using information-gain feature selection and the Naïve Bayesian Classifier

Page 18: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

3D: Environment around a surface point: solvent accessible surface

Central Point (“Layer 0”)

Points in Layer 1

Points in Layer 2

Etc.

Page 19: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Algorithm

Interaction Energies at Surface Points, one Probe at a time

00010000 – 01100010 - 011101100

-0.35 EU

Binning Scheme-1.0

-0.45-0.4-0.3-0.10.00.2

Surface Point Environment

-0.35

Page 20: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Relation to other algorithms• Surface Autocorrelation: Averaging of

interaction energies – Here a favourable and unfavourable interaction in a given layer will both remain in the fingerprint

• GRIND: continuous variables from GRID; entire field of interaction energies; simplified; only maximum product enters descriptor

• MaP: categorical variables, counts are kept –size description

• (In addition the feature selection and scoring are handled differently)

Page 21: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Algorithm Flow

Binning, number of bins, threshold levels

Perl script

Transformation of interaction energies into descriptors

Probe (and various others)

GRIDCalculation of Interaction Energies

Sphere radius, probe size, triangulation density

msmsCalculation of Surface Points

ConcordGeneration of 3D coordinates

ParametersProgram used

Step

Page 22: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Standard Parameters

• MSMS: Probe radius 1.5 Å, Density 0.5-2.0 Points/ Å2, double Van-der-Waals radii for atoms, giving effectively solvent accessible surface

• GRID: DRY, C3, N1+, N2, O, O- probes, otherwise standard parameters

• Binning: Using variable number of layers, 8 bits, cutoffs were set that equal frequencies are observed

Page 23: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Effect of Probe and Depth for Descriptor Generation on Overall Performance

0.00

1.00

2.00

3.00

4.00

5.00

6.00

7.00

ALL C3 DRY N1+ N2 O O-Probe

Aver

age

Hit

Rat

e (o

ut o

f 10)

Random

Layer0

L0-1

L0-5

L0-2

L0-3

L0-4

Parameterisation – Effect of Probe Type and Number of Layers (Briem Dataset, 5 Actives)

Page 24: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Surface Fingerprints & Tanimoto

• Tanimoto coefficient used for 2D fingerprints in combination with a variety of descriptors, here applied to surfaces

• Random Selection of single active compounds from MDDR dataset

• Calculation of average hit rates of Top 10 list for whole dataset (5HT3, ACE, HMG, PAF, TXA2)

• Question: Is “scaffold hopping” observed?• Examples: ACE, TXA2

Page 25: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Overall Performance Comparable to 2D methods

0%

10%

20%

30%

40%

50%

60%

70%

80%

AE - Tan

imoto

Featur

e Tree

sAE - B

ayes

ISIS

MOLS

KEYS

Surfac

e FP, T

anim

oto, 1

Mol

DAYLIGHT Fing

erprin

ts

SYBYL Holo

gram Q

SARFLE

XSIM-X

FLEXSIM

-SDOCKSIM

Ave

rage

Hit

Rat

e

Page 26: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Example: ACE, Query, Actives Found in Top 10, sorted

NO

O

N

O

O

O OO

N

O

O

O

N

OO

N

O N

OO

OO

NO

O

N O

O O

O O

NO

O

O

N

O

O

S

N OO

O

O

Page 27: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Example: ACE, Query, Actives Found in Top 10

NO

O

N

O

O

O OO

N

O

N

O

O

OO

NN

O

O

O

N

O

O NO

S

O

OO

N

O O

OO

PO

O

N

N

N

O

O

O

O

N

O

O

H

HH

Page 28: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

TXA2, 10 Hits among Top 10 (Sorted)

SN

OO

Br

O

O

N

O

O

NS

O

O

Cl

SN

OO

Cl

O

O O

N

S

O

N

O O

S

O

N

O O

S

N O

N

N

N

O

N

1

2

3

45

Stereoisomers

May be Cl

Para-Halide Sulfonamide

Page 29: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

TXA2, 10 Hits among Top 10 (Sorted)

SN

OO

Br

O

O

N

O

OO

O

O

O

NS OO

NO

O

O

Cl

N

O

N O

O

O

S

O

Br

N

O OS

O

O

O

N O

N

O

OO

6 7

8

9

10

Page 30: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Surface Environments – Merging Information

0%

10%

20%

30%

40%

50%

60%

70%

80%

AE - Tan

imoto

Featur

e Tree

sAE - B

ayes

ISIS

MOLS

KEYS

Surfac

e FP, B

ayes

, 5 M

ol

Surfac

e FP, T

anim

oto, 1

Mol

DAYLIGHT Fing

erprin

ts

SYBYL Holo

gram Q

SARFLE

XSIM-X

FLEXSIM

-SDOCKSIM

Ave

rage

Hit

Rat

e

Page 31: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Conformational Variance

• MDDR Dataset (5HT3, ACE, HMG, PAF, TXA2)

• 10 Randomly selected compounds each• 10 Conformations generated by GA search

with large window (10° for rigid 5HT3, 100°for ACE, HMG, PAF, TXA2), giving diverse conformations

• One force field optimized conformation (Concord-generated) used to find other conformations of the same molecule in whole database of 937 structures, using Tanimoto Coefficient

Page 32: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Overall findings• 64% of conformations found at the top 10

positions -> 2/3 of compounds identified as being most similar (among list of > 900 structures and 40-134 structures of same active dataset)

• >90% of conformations found in Top 5% of sorted database

• Conclusion: If molecules with the right features are present in the database, they will not be missed (in most cases) because they are represented by a particular conformation

Page 33: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Example: 5HT3-0 (Rigid) – all 10 Conformations identified as identical

O

N

NO

OCl

N

Page 34: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

ACE-7 – 9 Conf. identified as identical

N

O

O

ON

OOO

NO

O

N

H

H

NO

10th hit

Page 35: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Which features are selected for classification?

• Even if your classifier works, do the selected features make sense?

• Set of active vs. inactive molecules• Information Gain calculated for each feature,

those which are much more frequent among actives are “suspicious” and might constitute the pharmacophore

• Look at features from ACE, HMG and TXA2

Page 36: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Selected Features - HMG

• Binding Site: HMG + rigid lipophilic ring

Page 37: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

HMG-15

Page 38: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

HMG-19

Page 39: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

ACE – Binding Site

Snake venom peptide analog with putative binding motif to angiotensin used in early compound design(Cushman et al., Biochemistry (1977), 16, 5484-5491.)

O

NH

CH2 O

NH CH

CH3 O

N O

O

X X XH X: S1 Zn++ H S1’ H S2’ +

O

NH

CH2 O

NH CH

CH3 O

N O

O

X X XH X: S1 Zn++ H S1’ H S2’ +

Page 40: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Selected Features – ACE-31

Page 41: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Selected Features – ACE 39

Page 42: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

TXA2

• Yamamoto et al., J. Med. Chem. 1993 (36) 820

Yellow: lipophilic side chains

Page 43: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

TXA2-44

Page 44: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

TXA2-7

Page 45: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Summary• 2D Method: Performs about as other 2D methods

for single molecule searches, outperforms them by a large margin when combining information from multiple molecules (published in J. Chem. Inf. Comput. Sci. (2004) 44, 170-178)

• 3D Method: TR invariant, conformationally tolerant; combines high enrichment factors with scaffold hopping – discovery of new chemotypes

• Features shown to correlate with binding patterns• Performance (at least in part) due to Bayesian

Classifier, which is able to take multiple structures and active and inactive information into account

Page 46: Molecular Similarity Searching Using Atom Environments and Surface Point Environmentscisrg.shef.ac.uk/shef2004/talks/ABender.pdf · 2004-04-27 · Molecular Similarity Searching Using

Acknowledgements

• Robert C Glen (Unilever Centre, Cambridge, UK)• Hamse Y. Mussa (Unilever Centre, Cambridge, UK)• Stephan Reiling (Aventis, Bridgewater, USA)• David Patterson (Tripos)

Software• GRID, CACTVS, gOpenMol; many, many others

Funding• The Gates Cambridge Trust, Unilever, Tripos