Shuxing Zhang, Alexander Golbraikh and Alex Tropsha The Laboratory for Molecular Modeling
Development of Novel Geometrical Chemical Descriptors and Their Application to the
Prediction of Ligand-Protein Binding Affinity
Shuxing Zhang, Alexander Golbraikh and Alex Tropsha
The Laboratory for Molecular Modeling
School of Pharmacy
University of North Carolina at Chapel Hill
April 19, 2023
Problem
Given a protein-ligand complex, predict ligand binding affinity.
Knowledge-based (Statistical) Potentials
• Two-body potentials
PMF: Muegge, I.; Martin, Y. C. J. Med. Chem. 1999, 42, 791-804
BLEEP: Mitchell, J. B.; Laskowski, R. A.; Alex, A.; Thornton, J. M. J. Comp. Chem. 1999, 20, 1165-1176
DrugScore: Gohlke, H.; Hendlich, M.; Klebe, G. J. Mol. Biol. 2000, 295, 337-356
SMoG: DeWitte, R. S.; Shakhnovich, E. I. J. Am. Chem. Soc. 1996, 118, 11733-11744
SMoG2001: Ishchenko, A. V.; Shakhnovich, E. I. J. Med. Chem. 2002, 45, 2770-2780
• Four-body contact potential (by Jun Feng)
Full Atom-based Delaunay tessellation of Protein-ligand Interface (5HVP)
RRRL: formed by 3 receptor atoms and 1 ligand atom
RRLL: formed by 2 receptor atoms and 2 ligand atoms
RLLL: formed by 1 receptor atom and 3 ligand atoms
Three Types of Tetrahedra at Protein-ligand Interface
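The three interfacial classes can be illustrated with a minimal Python sketch. It assumes the tetrahedra have already been produced by some Delaunay tessellation routine and are given as quadruples of atom indices; the function name, the `side` mapping and the toy input are hypothetical and not taken from the slides.

```python
from collections import Counter

def tetrahedron_class(vertices, side):
    """Return 'RRRL', 'RRLL' or 'RLLL' for an interfacial tetrahedron, else None.

    vertices: four atom indices; side: mapping atom index -> 'R' or 'L'.
    """
    n_ligand = sum(1 for v in vertices if side[v] == "L")
    if n_ligand in (1, 2, 3):
        return "R" * (4 - n_ligand) + "L" * n_ligand
    return None  # all-receptor or all-ligand tetrahedra are not interfacial

# Hypothetical toy input: atoms 0-3 belong to the receptor, 4-6 to the ligand.
side = {0: "R", 1: "R", 2: "R", 3: "R", 4: "L", 5: "L", 6: "L"}
tetrahedra = [(0, 1, 2, 4), (0, 1, 4, 5), (3, 4, 5, 6), (0, 1, 2, 3)]

class_counts = Counter(
    c for c in (tetrahedron_class(t, side) for t in tetrahedra) if c is not None
)
print(class_counts)
```

Only tetrahedra that mix receptor and ligand atoms are counted; the last toy tetrahedron is skipped because all four vertices are receptor atoms.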
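For each tetrahedron type, an energy term E is defined from the natural logarithm (ln) of the type frequencies f:
E_RRRL, from f_RRRL
E_RRLL, from f_RRLL
E_RLLL, from f_RLLL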
Earlier work: Four-Body Statistical Contact Scoring Function Based on Delaunay
Tessellation
[Figure: scatter plot of calculated vs. experimental DDG, both axes from -100 to 0; R2 = 0.4678; labels: E, E_RRRL, E_RRLL, E_RLLL]
Correlation between experimental and calculated binding free energy for PMF dataset using four-body scoring function
Scoring function   Training set size   Test set size   Test set R2
BLEEP              351                 90              0.53
PMF                697                 77              0.61
SMoG96             120                 46              0.42
SMoG2001           725                 111             0.436
DT2001             319                 67              0.71
DT2002             319                 107             0.54
Comparison of Current Scoring Functions
Multiple CG descriptors of protein-ligand interface and correlation with ligand affinity
• Define the ligand-receptor interface by means of Delaunay tessellation (DT)
• Calculate chemical descriptors for nearest-neighbor atom quadruplets
• Use a statistical data modeling approach to correlate descriptors with affinity
µ: Electronegativity (chemical potentials) of atoms
Q: Partial charges on atoms
Η: Hardness kernel
Descriptors derived from atomic electronegativity
Ligand Atom Types
O: EN = 3.4
N: EN = 3.0
C: EN = 2.5
S: EN = 2.4
X: P and halogens, EN = 2.0 ~ 2.4, 4.0
M: Metal and all other unexpected atom types, EN = 0.6 ~ 1.6
Receptor Atom Types
O: EN = 3.4
N: EN = 3.0
C: EN = 2.5
S: EN = 2.4
There are 554 possible interfacial quadruplet composition types. After processing 517 complexes, 100 are found to occur with high frequency (at least 50 times).
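The figure of 554 can be checked by enumeration, assuming a composition type is an unordered choice of receptor atom types (O, N, C, S) plus an unordered choice of ligand atom types (O, N, C, S, X, M) for each of the three classes RRRL, RRLL and RLLL. The sketch below is illustrative; the function name is hypothetical.

```python
from itertools import combinations_with_replacement

LIGAND_TYPES = ["O", "N", "C", "S", "X", "M"]
RECEPTOR_TYPES = ["O", "N", "C", "S"]

def composition_types():
    """Enumerate unordered (receptor types, ligand types) quadruplet compositions."""
    types = []
    for n_ligand in (1, 2, 3):  # RRRL, RRLL, RLLL
        n_receptor = 4 - n_ligand
        for rec in combinations_with_replacement(RECEPTOR_TYPES, n_receptor):
            for lig in combinations_with_replacement(LIGAND_TYPES, n_ligand):
                types.append((rec, lig))
    return types

all_types = composition_types()
print(len(all_types))  # 120 (RRRL) + 210 (RRLL) + 224 (RLLL) = 554
```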
Atom Type Definition based on EN values
m: m-th tetrahedral composition type
j: Vertex of a tetrahedron
n: Number of tetrahedra of the m-th composition type
Thus, there are 100 descriptors for each protein-ligand complex
Descriptor Calculation
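[Figure: example interfacial tetrahedron with vertices S_L (EN = 2.4), C_R (EN = 2.5), O_L (EN = 3.4) and N_R (EN = 3.0)]
EN_m = \sum_{i=1}^{n} \sum_{j=1}^{4} EN_{ij}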
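A minimal sketch of the descriptor calculation, assuming each descriptor is the sum of vertex EN values over all tetrahedra of one composition type. Only the four atom types with a single EN value on the slides (O, N, C, S) are included; the function names and the key format are hypothetical.

```python
from collections import defaultdict

# EN values for atom types that have a single value in the atom type definition.
EN = {"O": 3.4, "N": 3.0, "C": 2.5, "S": 2.4}

def composition_key(tetra):
    """Order-independent label, e.g. ('C_R', 'N_R', 'O_L', 'S_L')."""
    return tuple(sorted(f"{element}_{side}" for element, side in tetra))

def en_descriptors(tetrahedra):
    """Sum the vertex EN values over all tetrahedra of each composition type."""
    descriptors = defaultdict(float)
    for tetra in tetrahedra:
        descriptors[composition_key(tetra)] += sum(EN[el] for el, _ in tetra)
    return dict(descriptors)

# The example tetrahedron: S_L, C_R, O_L, N_R (2.4 + 2.5 + 3.4 + 3.0 = 11.3).
example = [("S", "L"), ("C", "R"), ("O", "L"), ("N", "R")]
descriptors = en_descriptors([example, example])  # the same type occurring twice
print(descriptors)
```

With two occurrences of the example tetrahedron, the descriptor for that composition type is about 22.6, that is, twice the single-tetrahedron sum of 11.3.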
Flowchart of Novel Descriptor Generation
264 complexes
Process files and assign atom types based on EN value
Define interaction interface with DT and record all interfacial tetrahedra
Classify interfacial tetrahedra into different composition types and calculate their EN values (descriptors)
Correlate with binding
Data Modeling
Structure     Binding     CG Descriptors
Comp.1        Value1      D1  D2  D3  D4
Comp.2        Value2      "   "   "   "
Comp.3        Value3      "   "   "   "
- - -         - - -       - - - - - - - -
Comp.N-264    Value264    "   "   "   "
Goal: Establish correlations between descriptors and the binding affinity capable of predicting binding of novel complexes
{Binding affinity} = K{descriptor diversity}^
[Figure: bar chart of the number of complexes (axis 0 to 30) per complex family]
Diversity of the dataset: 264 Complexes, 33 families
Data Modeling Workflow
264 complexes
Randomly exclude 24 complexes as external set
Split 240 into multiple training and test sets
Variable selection kNN to build models
Only accept models that have q2 > 0.6, R2 > 0.6, etc.
Y-randomization
Validate predictive models with randomly selected external sets (24)
Binding prediction
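The external-set exclusion and the Y-randomization check can be sketched as follows. This is an illustration only: the function names, the fixed seeds and the use of integer IDs for complexes are assumptions, and the slides do not say how the random selection was actually implemented.

```python
import random

def split_external(complex_ids, n_external=24, seed=0):
    """Randomly hold out n_external complexes; return (modeling_set, external_set)."""
    rng = random.Random(seed)
    shuffled = list(complex_ids)
    rng.shuffle(shuffled)
    return shuffled[n_external:], shuffled[:n_external]

def y_randomize(affinities, seed=0):
    """Return a shuffled copy of the affinities for a Y-randomization control.

    Models rebuilt on shuffled affinities should fail the acceptance thresholds
    if the original models are not chance correlations.
    """
    rng = random.Random(seed)
    shuffled = list(affinities)
    rng.shuffle(shuffled)
    return shuffled

modeling_set, external_set = split_external(range(264))
print(len(modeling_set), len(external_set))  # 240 24
```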
Leave out one complex from the training set and calculate the distance between the eliminated complex and all remaining complexes
(in the original 100-descriptor space)
k Nearest Neighbor (kNN) with Variable Selection
• Randomly select a subset of descriptors (a hypothetical descriptor pharmacophore)
• Leave out a complex
• Find k nearest neighbors in the training set
• Predict the binding affinity of the eliminated complex by weighted kNN using the identified k nearest neighbors
• Calculate the predictive ability (q2) of the model
• Select acceptable models (with q2 > 0.6)
(Flowchart loop labels: N times; N times; SA)
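The leave-one-out evaluation of one descriptor subset can be sketched as below. The inverse-distance weighting, the Euclidean distance and the toy data are assumptions made for illustration, since the slides do not specify the weighting scheme; the search over descriptor subsets is not shown.

```python
import math

def loo_q2(X, y, selected, k=3):
    """Leave-one-out q2 of a distance-weighted kNN model on the selected descriptor columns."""
    def dist(a, b):
        return math.sqrt(sum((a[j] - b[j]) ** 2 for j in selected))

    predictions = []
    for i in range(len(X)):
        # k nearest neighbors of complex i among all other complexes.
        neighbors = sorted(
            (dist(X[i], X[m]), y[m]) for m in range(len(X)) if m != i
        )[:k]
        # Assumed weighting: inverse distance (small constant avoids division by zero).
        weights = [1.0 / (d + 1e-9) for d, _ in neighbors]
        predictions.append(
            sum(w * value for w, (_, value) in zip(weights, neighbors)) / sum(weights)
        )
    y_mean = sum(y) / len(y)
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    total = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - press / total

# Hypothetical toy data: column 0 tracks the affinity, column 1 does not.
X = [[1, 9], [2, 1], [3, 7], [4, 2], [5, 8], [6, 3], [7, 6], [8, 0]]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

q2_informative = loo_q2(X, y, selected=[0], k=2)
q2_uninformative = loo_q2(X, y, selected=[1], k=2)
print(q2_informative, q2_uninformative)
```

On this toy data the subset containing the informative column passes the q2 > 0.6 threshold, while the other subset scores lower, which is the signal a variable selection search would use to keep or reject a subset.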
[Figure: scatter plot of predicted vs. actual pKi, both axes from 0 to 12]
Correlation of Actual ~ Predicted Binding Affinity for 49 Test Set Complexes
[Figure: scatter plot of predicted vs. actual pKi, both axes from 0 to 12]
Correlation of Actual ~ Predicted Binding Affinity for 24 Complexes with Best Model
Scoring function   Training set size   Test set size   Test set R2
BLEEP              351                 90              0.53
PMF                697                 77              0.61
SMoG96             120                 46              0.42
SMoG2001           725                 111             0.436
DT2001             319                 67              0.71
DT2002             319                 107             0.54
CG                 191                 49              0.78
Comparison of Current Scoring Functions
• Novel geometrical chemical descriptors have been developed
• These simple yet fundamental descriptors can be used to predict binding affinity using correlation approaches, and they have high predictive power for diverse ligand-protein structures
• The statistical models can be used for fast and accurate scoring of complexes resulting from docking studies
Conclusions