Http:// eHiTS Score Darryl Reid, Zsolt Zsoldos, Bashir S. Sadjad, Aniko Simon, The next stage in...


http://www.simbiosys.ca

eHiTS Score

Darryl Reid, Zsolt Zsoldos, Bashir S. Sadjad, Aniko Simon

The next stage in scoring function evolution: a new statistically derived empirical scoring function.


Overview

● eHiTS_Score: a new scoring function that takes advantage of the temperature factors in PDB files to better capture the interaction geometries between ligands and receptors.

● An "empirical" function is fitted to represent the statistical interaction data and is trained using experimentally derived binding affinities.

● This novel scoring function has the additional benefit of family-based training, which relies on automatic clustering of the input receptor structures.

● Very good correlation to known binding affinities on a very large and diverse test set of 884 PDB structures.


eHiTS Algorithm

● Ligands are divided into rigid fragments and flexible connecting chains.

● Rigid Dock: each fragment is docked INDEPENDENTLY everywhere in the receptor.

● Pose Match: a fast graph matching algorithm finds all matching solutions to reconstruct the original molecule.

● Local Energy Optimization: the reconstructed structure is optimized within the receptor.

● Ranking: structures are ranked by the scoring function.

[Figure: ligand fragments docked independently in the receptor, and the resulting reconnected ligand pose]
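The fragmentation step in the first bullet can be illustrated with a small graph routine. This is a minimal sketch under a simplified rule that is an assumption, not the eHiTS rule: a bond is cut if it is an acyclic single bond between two non-terminal atoms, and the rigid fragments are the connected components that remain. The bond tuple layout `(i, j, order, in_ring)` and all function names are hypothetical.

```python
from collections import defaultdict

def is_rotatable(bond, degree):
    """Hypothetical rule: acyclic single bond between two non-terminal atoms."""
    i, j, order, in_ring = bond
    return order == 1 and not in_ring and degree[i] > 1 and degree[j] > 1

def rigid_fragments(n_atoms, bonds):
    """Split a ligand graph into rigid fragments (sorted atom-index lists).

    Every rotatable bond is cut and the remaining connected components are
    returned. Single atoms left between cuts play the role of flexible-chain
    atoms in this simplified picture.
    """
    degree = defaultdict(int)
    for i, j, _, _ in bonds:
        degree[i] += 1
        degree[j] += 1
    adjacency = defaultdict(list)
    for bond in bonds:
        if not is_rotatable(bond, degree):
            i, j = bond[0], bond[1]
            adjacency[i].append(j)
            adjacency[j].append(i)
    seen, fragments = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # iterative depth-first search over non-rotatable bonds
            atom = stack.pop()
            component.append(atom)
            for nb in adjacency[atom]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        fragments.append(sorted(component))
    return fragments

# Toy ligand (heavy atoms only): six-membered ring 0-5, a linker atom 6,
# and a carbonyl carbon 7 carrying a double-bonded O (8) and a terminal atom (9).
EXAMPLE_BONDS = [(k, (k + 1) % 6, 1, True) for k in range(6)] + [
    (0, 6, 1, False), (6, 7, 1, False), (7, 8, 2, False), (7, 9, 1, False),
]

print(rigid_fragments(10, EXAMPLE_BONDS))
```

For the toy ligand this yields three pieces: the ring, the linker atom and the carbonyl group. A real implementation would also need aromaticity perception and special cases such as amide bonds.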


Novel Approach to Scoring

● In PDB files, the given coordinates are derived from space and time averages of the observed atom positions.

● Each atom also has a temperature factor that describes the three-dimensional probability density of the atom's displacement from the specified coordinates (its thermal motion).

● Therefore, rather than using the PDB coordinates directly, we used these probability functions to create a continuous function for interactions.
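One way to turn temperature factors into a probability for an interaction distance is sketched below under stated assumptions: isotropic B-factors, independent Gaussian displacements of the two atoms, and only the distance descriptor. The relation B = 8π²U, with U the mean-square displacement along one axis, is the standard crystallographic one. When the per-axis variances of the two atoms add, the interatomic distance follows a three-dimensional non-central chi distribution whose CDF has a closed form. The function names are hypothetical, and this is not necessarily how eHiTS evaluates its volumetric integrals.

```python
import math

def sigma_from_b(b_factor):
    """Per-axis RMS displacement (Å) from an isotropic B-factor (Å^2).

    Standard relation B = 8 * pi^2 * U_iso, where U_iso is the mean-square
    displacement along one axis.
    """
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))

def _phi(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _Phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def distance_cdf(r, mu, sigma):
    """P(distance <= r) for a 3D isotropic Gaussian separation vector.

    mu: distance between the mean positions (> 0); sigma: per-axis standard
    deviation of the separation vector. Closed-form CDF of the non-central
    chi distribution with 3 degrees of freedom.
    """
    if r <= 0.0:
        return 0.0
    a, b = (r - mu) / sigma, (r + mu) / sigma
    return _Phi(a) + _Phi(b) - 1.0 - (sigma / mu) * (_phi(a) - _phi(b))

def distance_bin_probability(d_lo, d_hi, mu, b1, b2):
    """Probability that the interatomic distance falls in [d_lo, d_hi]."""
    # Independent displacements: the per-axis variances of both atoms add.
    sigma = math.sqrt(sigma_from_b(b1) ** 2 + sigma_from_b(b2) ** 2)
    return distance_cdf(d_hi, mu, sigma) - distance_cdf(d_lo, mu, sigma)

# Two atoms 2.9 Å apart with B = 20 Å^2 each: probability of 2.75-3.0 Å.
print(distance_bin_probability(2.75, 3.0, 2.9, 20.0, 20.0))
```

Summing such bin probabilities over many complexes, instead of counting a single distance per contact, is what turns the raw statistics into a smooth, continuous distribution.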


Interaction Surface Point (ISP) Types

● Interactions cannot be described by distance (d) alone: the angles to the surface points (α, β), shown as LP and H, as well as the torsion between them (δ), must also be considered.

[Figure: an H surface point and an LP surface point, with distance d, angles α and β, and torsion δ]

23 surface point types:

METAL, CHARGED_HPLUS, PRIMARY_AMINE_HLP, HDONOR, WEAK_HDONOR, CHARGED_LONEPAIR, ACID_LONEPAIR, LONEPAIR

HYDROPHOB, H_AROM_EDGE, WS_LIPO, NEUTRAL, PI_AROMATIC, PI_RESON_POLAR, PI_RESON_CARBON

AMBIVALENT_HLP, ROTATABLE_H, ROTATABLE_LP, WEAK_LONEPAIR, PI_SP2_POLAR, PI_SP2_CARBON, HALOGEN, SULFUR
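The four geometric descriptors can be computed from coordinates as in the sketch below. The exact definitions are not given on the slide, so this uses a hypothetical convention: d is the distance between the ligand atom and the receptor atom, α and β are the angles at each atom between its surface point and the partner atom, and δ is the torsion surface point, atom, atom, surface point. All function names are hypothetical.

```python
import math

def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def _cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def _norm(p):
    return math.sqrt(_dot(p, p))

def angle_deg(u, v):
    """Angle between vectors u and v, in degrees."""
    c = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, c))))

def dihedral_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3, in degrees."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = _norm(b1)
    b1 = tuple(x / n1 for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def isp_descriptors(lig_atom, lig_sp, rec_atom, rec_sp):
    """Return (d, alpha, beta, delta) for one ligand/receptor surface-point pair."""
    d = _norm(_sub(rec_atom, lig_atom))
    alpha = angle_deg(_sub(lig_sp, lig_atom), _sub(rec_atom, lig_atom))
    beta = angle_deg(_sub(rec_sp, rec_atom), _sub(lig_atom, rec_atom))
    delta = dihedral_deg(lig_sp, lig_atom, rec_atom, rec_sp)
    return d, alpha, beta, delta

# Donor atom with its H point, acceptor atom 2.9 Å away with one LP point.
print(isp_descriptors((0.0, 0.0, 0.0), (0.8, 0.6, 0.0),
                      (2.9, 0.0, 0.0), (2.5, 0.3, 0.0)))
```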



Statistically derived empirical scoring function

● Interaction statistics were gathered from 2500 PDB structures (Gold-Astex/PDBbind, high resolution < 2.5 Å).

● The probability of the geometric descriptors (d, α, β, δ) falling into specific ranges is computed from the temperature factors using volumetric integrals.

● The integral values for all observed interactions in the complexes are summed and deposited into a 4D data array.

● Four-variable analytic functions are fitted to the 4D data array.

● These functions form the terms of the new scoring function.
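The accumulation into a 4D data array can be sketched as a sparse histogram. This sketch simplifies the volumetric integrals described above: it assumes each descriptor (d, α, β, δ) has an independent Gaussian uncertainty, which in practice would be derived from the temperature factors, so each 4D cell receives the product of the per-axis bin masses. Bin widths, standard deviations and function names are hypothetical, and the periodicity of the torsion is ignored.

```python
import math
from collections import defaultdict
from itertools import product

def normal_cdf(x, mean, sd):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def bin_masses(mean, sd, edges):
    """Probability mass of N(mean, sd^2) in each interval [edges[k], edges[k+1])."""
    cdf = [normal_cdf(e, mean, sd) for e in edges]
    return [cdf[k + 1] - cdf[k] for k in range(len(edges) - 1)]

def make_edges(lo, hi, step):
    n = round((hi - lo) / step)
    return [lo + k * step for k in range(n + 1)]

def deposit(grid, means, sds, edges_per_dim, min_mass=1e-6):
    """Add one observed interaction to a sparse 4D histogram.

    means, sds: (d, alpha, beta, delta) and their assumed standard deviations.
    Independence assumption: a cell's mass is the product of per-axis masses.
    """
    per_axis = [bin_masses(m, s, e) for m, s, e in zip(means, sds, edges_per_dim)]
    for cell in product(*(range(len(p)) for p in per_axis)):
        mass = 1.0
        for axis, k in enumerate(cell):
            mass *= per_axis[axis][k]
        if mass >= min_mass:  # skip negligible cells to keep the array sparse
            grid[cell] += mass

EDGES = [make_edges(2.0, 5.0, 0.25),       # d (Å)
         make_edges(0.0, 180.0, 15.0),     # alpha (degrees)
         make_edges(0.0, 180.0, 15.0),     # beta (degrees)
         make_edges(-180.0, 180.0, 30.0)]  # delta (degrees)

grid = defaultdict(float)
deposit(grid, (3.0, 30.0, 40.0, 10.0), (0.2, 10.0, 10.0, 20.0), EDGES)
print(len(grid), round(sum(grid.values()), 4))
```

Calling `deposit` once per observed interaction over all complexes produces the summed array; the analytic four-variable functions would then be fitted to `grid`.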


Family-based Training

[Figure: flowchart from 1420 PDB complexes through eHiTS Training to eHiTS scoring functions]

1. 2500 PDB complexes are chosen to represent a wide range of protein families.

2. The complexes are clustered automatically into 97 protein families, plus one default, global set.

3. The eHiTS training utility optimizes the scoring functions (weights) for each family.

4. The scoring functions for each family are output and used as the default scoring functions of eHiTS.
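The slides do not say how the complexes are clustered. As a hypothetical illustration only, the sketch below groups complexes by single-linkage clustering on a precomputed pairwise similarity (for example, receptor sequence identity), keeps groups with at least `min_size` members as families (the results slide lists families with 5+ complexes), and pools the rest into a default set. The threshold, the similarity measure, the identifiers and the function name are all assumptions.

```python
def cluster_families(ids, similarity, threshold=0.9, min_size=5):
    """Single-linkage clustering of complexes into families (hypothetical).

    ids: complex identifiers; similarity: {(id_a, id_b): value in [0, 1]}.
    Returns (families, default_set): sorted member lists for groups with at
    least min_size members, and the remaining complexes pooled together.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), s in similarity.items():
        if s >= threshold:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra
    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    families, default_set = [], []
    for members in groups.values():
        if len(members) >= min_size:
            families.append(sorted(members))
        else:
            default_set.extend(members)
    return sorted(families), sorted(default_set)

ids = [f"c{k}" for k in range(1, 8)]
similarity = {("c1", "c2"): 0.95, ("c2", "c3"): 0.93, ("c3", "c4"): 0.97,
              ("c4", "c5"): 0.91, ("c5", "c6"): 0.40, ("c6", "c7"): 0.96}
families, default_set = cluster_families(ids, similarity)
print(families, default_set)
```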


Additional scoring terms

The 276 interaction functions are mapped to 6 weighting factors, which are varied during the family-based training. In addition, the weights of the following terms are also optimized on a per-family basis:

● steric clash (quadratic penalty function)

● depth within the binding pocket

● solvation

● family-coverage

● conformational strain energy of the ligand

● intramolecular interactions within the ligand

● entropy loss due to frozen rotatable bonds
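The number 276 equals the number of unordered pairs drawn from the 23 surface point types with self-pairs included (23 × 24 / 2 = 276), which is consistent with one interaction function per type pair. The sketch below checks that count and shows a per-family weighted sum that falls back to the default weights for unknown families. The term names, the `family_17` label and all weight values are hypothetical placeholders, and the six interaction weights are collapsed into a single `interaction` term for brevity.

```python
from itertools import combinations_with_replacement

N_SPT = 23  # number of surface point types listed on the ISP slide
PAIRS = list(combinations_with_replacement(range(N_SPT), 2))  # unordered type pairs

TERM_NAMES = ["interaction", "steric_clash", "depth", "solvation",
              "family_coverage", "strain", "intramolecular", "rotor_entropy"]

DEFAULT_WEIGHTS = {name: 1.0 for name in TERM_NAMES}  # placeholder values
FAMILY_WEIGHTS = {
    "default": DEFAULT_WEIGHTS,
    "family_17": dict(DEFAULT_WEIGHTS, interaction=1.5, steric_clash=2.0),
}

def total_score(terms, family):
    """Weighted sum of score terms using the family's weights (or the default)."""
    weights = FAMILY_WEIGHTS.get(family, FAMILY_WEIGHTS["default"])
    return sum(weights[name] * terms[name] for name in TERM_NAMES)

terms = {name: 1.0 for name in TERM_NAMES}
print(len(PAIRS), total_score(terms, "family_17"), total_score(terms, "family_99"))
```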


Tuning the component weights

● Goal function combines 4 terms:

– Convergence of local minimisation (funnel shape)

– Solution pose ranking (identify low RMSD as best)

– Correlation to experimental binding energy

– Separation of actives from decoys (enrichment)

● Optimization engine: stochastic search (simulated annealing) combined with Powell's method

● Overfitting test: tune on one half of the data set, test on the other half
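A minimal sketch of the stochastic stage and the half/half overfitting test. The goal function here is a toy quadratic standing in for the four-term goal, the Powell refinement stage is not shown, and the annealing schedule (step size, starting temperature, cooling factor) is an arbitrary assumption. Function names are hypothetical.

```python
import math
import random

def simulated_annealing(goal, x0, step=0.5, t0=1.0, cooling=0.995,
                        n_iter=5000, seed=0):
    """Minimize goal(x) over a weight vector x; returns (best_x, best_value)."""
    rng = random.Random(seed)
    x, fx = list(x0), goal(x0)
    best, fbest = list(x), fx
    t = t0
    for _ in range(n_iter):
        cand = [xi + rng.gauss(0.0, step) for xi in x]
        fc = goal(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if fc <= fx or rng.random() < math.exp(-(fc - fx) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = list(x), fx
        t *= cooling  # geometric cooling schedule
    return best, fbest

def split_half(items, seed=0):
    """Random split into a tuning half and a test half."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

def toy_goal(w):
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2

best, fbest = simulated_annealing(toy_goal, [5.0, 5.0])
train, test = split_half(range(10))
print(best, fbest, train, test)
```

In the overfitting test, the weights tuned on `train` would then be evaluated on `test`; a large gap between the two goal values would indicate overfitting.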


Results: Docking 1568 complexes

[Figure: percentage of complexes (y-axis) vs. RMSD from the X-ray pose in Å (x-axis, 0 to 2.5 Å) after each stage: RigiDock, Cluster, PoseMatch, DockOptim and TopRank]

● Resolution ≤ 2.5 Å

● 97 protein families (5+)

● 349 singletons

● PDBbind 2004

● Astex-GOLD validation set

Closest average: 0.73 Å; TopRank average: 1.10 Å
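The RMSD values on this plot, and the success rates quoted on the next slide, can be computed as below. The sketch assumes the usual docking convention: RMSD is measured in the receptor frame without superposition, over atoms listed in the same order, ignoring molecular symmetry. Function names are hypothetical.

```python
import math

def pose_rmsd(pose, reference):
    """RMSD (Å) between two poses with identical atom ordering, no superposition."""
    if len(pose) != len(reference):
        raise ValueError("poses must have the same number of atoms")
    sq = sum((a - b) ** 2 for p, q in zip(pose, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(pose))

def percent_within(rmsds, cutoff):
    """Percentage of complexes whose RMSD is at most cutoff."""
    return 100.0 * sum(r <= cutoff for r in rmsds) / len(rmsds)

def cumulative_curve(rmsds, cutoffs):
    """(cutoff, percent) pairs, i.e. the points of a success-rate curve."""
    return [(c, percent_within(rmsds, c)) for c in cutoffs]

print(pose_rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]))
```

For example, 59 of 69 complexes within 1.5 Å corresponds to about 85.5%.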


Docking accuracy comparison

● eHiTS (far right) docked 59 of the 69 complexes within 1.5 Å of the X-ray pose and 67 of 69 within 3.5 Å, outperforming the published [1] results of the other five docking tools on this set of proteins.

[1] Maria Kontoyianni, Laura M. McClellan, and Glenn S. Sokol. Evaluation of Docking Performance: Comparative Data on Docking Algorithms. J. Med. Chem. 2004, 47, 558-565.


Correlation to binding affinity

884 PDB complexes: R = 0.75, q = 1.61
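Assuming R on this slide is the Pearson correlation coefficient between predicted scores and experimental binding affinities, it can be computed as follows. The slide does not define q, so it is not reproduced here. The function name is hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```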


VHTS Filter: eHiTS Filter

The eHiTS Filter is based on ligand surface points. All chemically interesting points on the surface of the ligand are assigned surface point types (SPT), indicated by triangles on the histidine ring shown. Each SPT has associated chemical properties (indicated by its color), such as H-bond donor, H-bond acceptor, hydrophobic, π-stacking, etc. The counts of each of the 23 surface point types form the feature vector for that ligand. The Filter is based on the assumption that ligands with similar feature vectors have similar activity.

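[Figure: eHiTS Filter workflow in three stages. (1) Training eHiTS Filter: feature vectors of active and inactive ligands from a ligand DB are used to train a neural network, producing a trained network file. (2) Screening with eHiTS Filter: ligand feature vectors are scored with the trained network (example scores 0.9999 and 0.0000) to give a ranked list. (3) Docking: eHiTS docking returns a score and pose for each ligand, and the docked poses are re-ranked.]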
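The feature vector described above is a count of each surface point type. The sketch below builds it from a list of type labels, using the 23 type names from the ISP slide, and adds cosine similarity as one simple, assumed measure of "similar feature vectors". The eHiTS Filter itself uses a trained neural network, which is not shown; function names are hypothetical.

```python
import math
from collections import Counter

SPT_TYPES = [
    "METAL", "CHARGED_HPLUS", "PRIMARY_AMINE_HLP", "HDONOR", "WEAK_HDONOR",
    "CHARGED_LONEPAIR", "ACID_LONEPAIR", "LONEPAIR", "HYDROPHOB", "H_AROM_EDGE",
    "WS_LIPO", "NEUTRAL", "PI_AROMATIC", "PI_RESON_POLAR", "PI_RESON_CARBON",
    "AMBIVALENT_HLP", "ROTATABLE_H", "ROTATABLE_LP", "WEAK_LONEPAIR",
    "PI_SP2_POLAR", "PI_SP2_CARBON", "HALOGEN", "SULFUR",
]

def feature_vector(surface_points):
    """Count of each surface point type, in the fixed SPT_TYPES order."""
    counts = Counter(surface_points)
    unknown = set(counts) - set(SPT_TYPES)
    if unknown:
        raise ValueError(f"unknown surface point types: {sorted(unknown)}")
    return [counts[t] for t in SPT_TYPES]

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is empty)."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

fv = feature_vector(["HDONOR", "HDONOR", "PI_AROMATIC"])
print(fv, cosine_similarity(fv, feature_vector(["HDONOR", "PI_AROMATIC"])))
```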


Diversity of Actives and Decoys

● For each set of actives, the average feature vector was calculated (represented by the blue star).

● The RMSD from this average feature vector was calculated for each active and decoy. The plot below shows the average RMSD for the actives and the decoys, as well as the maximum RMSD for the actives.

● For 15 of the 18 codes, even the maximum RMSD of the actives is less than the average RMSD of the decoys.

[Figure: RMS deviations from the average feature vector of actives, per family label; series: Max RMSD Active, Ave RMSD Active, Ave RMSD decoy; y-axis: RMSD, 0 to 4]
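The diversity analysis can be sketched as follows, assuming "RMSD" means the root-mean-square difference over the feature-vector components. Function names and dictionary keys are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise average of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def rms_deviation(v, ref):
    """Root-mean-square difference between a feature vector and a reference."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, ref)) / len(ref))

def diversity_summary(actives, decoys):
    """Max and average RMSD of actives, and average RMSD of decoys, from the active centroid."""
    c = centroid(actives)
    act = [rms_deviation(v, c) for v in actives]
    dec = [rms_deviation(v, c) for v in decoys]
    return {"max_active": max(act),
            "ave_active": sum(act) / len(act),
            "ave_decoy": sum(dec) / len(dec)}

summary = diversity_summary([[1.0, 0.0], [3.0, 0.0]], [[2.0, 4.0]])
print(summary)
```

A target behaves as described in the third bullet when `max_active < ave_decoy`.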


Enrichment results of eHiTS_Filter

eHiTS_Filter was used to screen a dataset of 869 decoys plus 5 to 20 actives per target. The results show strong enrichment across a wide range of receptor families: on average, ~80% of the actives were recovered in the top 10% of the ranked database.

Pham, T.A. and Jain, A.N. Parameter Estimation for Scoring Protein-Ligand Interactions Using Negative Training Data. J. Med. Chem., 2005, 10.1021

[Figure: 20 codes out of the 29 Surflex set, screened with eHiTS_Filter: fraction of actives recovered in the top 10%, top 5% and top 2% of the ranked database for 1ajq, 1bzh, 1c4v, 1e66, 1f4g, 1fjs, 1fmo, 1gj7, 1pro, 1qhc, 1rnt, 2qwg, 2xis, 3pcj, 3std, 4tmn, 7cpa, 7tim, er and tk]
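Enrichment as quoted here, the fraction of actives recovered in the top x% of the ranked database, can be computed as below. This is a minimal sketch assuming that higher scores rank first and that the top x% is rounded to the nearest whole ligand. The function name is hypothetical.

```python
def fraction_recovered(scores, is_active, top_fraction):
    """Fraction of all actives found in the top `top_fraction` of the ranked list."""
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    found = sum(active for _, active in ranked[:n_top])
    return found / sum(is_active)

scores = [0.99, 0.95, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02, 0.01]
is_active = [True, True] + [False] * 8
print(fraction_recovered(scores, is_active, 0.1),
      fraction_recovered(scores, is_active, 0.2))
```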


Scoring Function

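Let's define some helper functions:

$$
\begin{aligned}
a(x) &= P_0\,x + P_1\,x^2 + P_2\sqrt{x} + P_3 \\
b(x) &= P_4\,x + P_5\,x^2 + P_6\sqrt{x} + P_7 \\
g(x) &= P_8\,(x - P_9) \\
c(x) &= \begin{cases} \cos g(x) & \text{if } -\pi < g(x) < \pi \\ -1 & \text{otherwise} \end{cases} \\
d(x) &= P_{10}\,x + P_{11}\,x^2 + P_{12}\,x^3 + P_{13}\,g(x)^2 + P_{14}\,c(x) + P_{15} \\
t(x) &= P_{16}\,x + P_{17}\,x^2 + P_{18}\sqrt{x} + P_{19}
\end{aligned}
$$

Then the scoring function is:

$$
f(\alpha, \beta, \mathrm{dist}, \delta) = a(\alpha)\, b(\beta)\, d(\mathrm{dist})\, t(\delta)
$$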
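A direct transcription of the helper functions into Python follows. The Greek-letter arguments of f were lost in the source; the mapping of a to α, b to β and t to the torsion δ is inferred from the descriptor names. The slide does not give units or ranges for the angles, and because a, b and t take √x, inputs are assumed non-negative here. The parameter values in the example are arbitrary placeholders, not fitted eHiTS parameters, and `make_pair_function` is a hypothetical name.

```python
import math

def _poly_sqrt(x, p):
    """p[0]*x + p[1]*x**2 + p[2]*sqrt(x) + p[3]: the shape shared by a, b and t."""
    return p[0] * x + p[1] * x * x + p[2] * math.sqrt(x) + p[3]

def make_pair_function(P):
    """Build f(alpha, beta, dist, delta) from the 20 parameters P0..P19."""
    if len(P) != 20:
        raise ValueError("expected 20 parameters P0..P19")

    def a(x):
        return _poly_sqrt(x, P[0:4])

    def b(x):
        return _poly_sqrt(x, P[4:8])

    def g(x):
        return P[8] * (x - P[9])

    def c(x):
        gx = g(x)
        return math.cos(gx) if -math.pi < gx < math.pi else -1.0

    def d(x):
        gx = g(x)
        return (P[10] * x + P[11] * x ** 2 + P[12] * x ** 3
                + P[13] * gx * gx + P[14] * c(x) + P[15])

    def t(x):
        return _poly_sqrt(x, P[16:20])

    def f(alpha, beta, dist, delta):
        return a(alpha) * b(beta) * d(dist) * t(delta)

    return f

# Placeholder parameters: a = b = t = 1 and d(dist) = c(dist) centred at 3 Å.
P = [0.0] * 20
P[3] = P[7] = P[19] = 1.0
P[8], P[9], P[14] = 1.0, 3.0, 1.0
f = make_pair_function(P)
print(f(0.0, 0.0, 3.0, 0.0), f(0.0, 0.0, 10.0, 0.0))
```

With these placeholders the distance term peaks at 3 Å and falls to the constant −1 once |g(x)| ≥ π, which shows how c(x) confines the cosine well to one period.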