Download - ligand Docking - Tel Aviv Universitybioinfo3d.cs.tau.ac.il/Education/CS0304/Lecture 7...Structural Bioinformatics 2004 Prof. Haim J. Wolfson 15 Docking graph The docking graph is motivated

Structural Bioinformatics 2004 Prof. Haim J. Wolfson

1

Protein-protein and Protein-ligand Docking

The geometric filtering


2

The Docking Problem� Input : A pair of molecules represented by

their 3D structures.� Tasks :

– Decide whether the molecules will form a complex (interact/bind).

– Determine the binding affinity.– Predict the 3D structure of the complex.– Deduce function.


3

Significance/ Applications/Motivation

� Computer aided drug design – detection of lead compounds. Prediction of side-effects.

� Elucidation of biochemical pathways -governed by biomolecular recognition and interactions.

� Huge no. of potential complexes – requires computational screening.


4

Forces governing biomolecular recognition

Depend on the molecules involved and the solvent.

� Van der Waals.� Electrostatics.� Hydrophobic contacts.� Hydrogen bonds� Salt bridges .. etc.All interactions act at short ranges.Implies that a necessary condition for tight

binding is surface complementarity.


5

Necessary condition for Docking

� Given two molecules find significant surface complementarity :

+

=

Recep

tor Ligand

T

Complex


6

Geometric Docking Algorithms

� Based on the assumption of shape complementarity between the participating molecules.

� Molecular surface complementarity - protein-protein, protein-ligand, (protein - drug).

� Hydrogen donor/acceptor complementarity -protein-drug.

Remark : usually “protein” here can be replaced by “DNA” or “RNA” as well.


7

Issues to be examined when evaluating docking methods

� Rigid docking vs Flexible docking :– If the method allows flexibility:

• Is flexibility allowed for ligand only, receptor only or both ?• No. of flexible bonds allowed and the cost of adding additional flexibility.

� Does the method require prior knowledge of the active site ?� Performance in “unbound” docking experiments.� Speed - ability to explore large libraries.


8

Bound Docking

� In the bound docking we are given a complex of 2 molecules.

� After artificial separation the goal is to reconstruct the native complex.

� No conformational changes are involved.� Used as a first test of the validity of the algorithm.

Docking Algorithm


9

Unbound Docking

� In the unbound docking we are given 2 molecules in their native conformation.

� The goal is to find the correct association.

� Problems: conformational changes (side-chain and backbone movements), experimental errors in the structures.


10

Bound vs. Unbound10 highly penetrating residues

Kallikrein A/trypsin inhibitor complex (PDB codes 2KAI,6PTI)

Receptor surface

Ligand


11

Rigid and Flexible Docking

PDBFiles

MolecularMolecularSurfaceSurface

Focusing OnFocusing OnActive SitesActive Sites

CriticalCriticalFeaturesFeatures

GeometricGeometricMatchingMatching Verification &Verification &

Geometric Geometric ScoringScoring

EnergyEnergyCalculationsCalculations

OtherInputs

Active Sites& Rotationand TranslationPossibilities

Another Flow PossibilityAnother Flow Possibility

One Flow PossibilityOne Flow Possibility


12

Recommended Literature – survey papers(see references of major methods therein)� I. Halperin, B. Ma, H. Wolfson & R. Nussinov,

Principles of Docking: An overview of Search Algorithms and a Guide to Scoring Functions, PROTEINS, 47, 409—443, (2002).

� T. Lengauer & M. Rarey, Computational Methods for Biomolecular Docking, Curr. Opin. Struct. Bio., 6, 402—406, (1996).

� G.R. Smith & M. J.E. Sternberg, Prediction of protein-protein interactions by docking methods, Curr. Opin. Struct. Bio., 12, 28—35, (2002).


13

Dock - I. Kuntz (pioneering system)Protein - ligand :� Compute Connolly’s MS of the receptor.� Detect cavities by SPHGEN.� Match SPHGEN pseudo-atom centers

with ligand atom centers by detecting cliques in the “docking graph”.

� “Grid based” verification of candidate transformations.

� Calculate energy of binding.


14

Dock general scheme


15

Docking graph� The docking graph is motivated by the

inner distance preservation in rigid bodies.� Vertices - pairs of points (Ri,Lj) belonging to

the receptor and ligand respectively.� Edges btwn a pair of vertices iff the

corresponding inner distances are ‘equal’. � A maximal clique in the graph = maximal

matching subset btwn the receptor and the ligand points.


16

Clique detection� Detection of all cliques of size < k is NP-

complete (exp in k).� Apply an “interpretation tree” type method

to detect all the cliques of size 4 - O(n8).


17

References

� I. Kuntz et al., A geometric approach to macromolecule-ligand interactions, J. Mol. Bio., 161, 269—288, (1982).

� T.J.A. Ewing, S. Makino, A.G. Skillman, I.D. Kuntz, Dock 4.0: Search strategies for automated molecular docking of flexible molecule databases, J. Comp. Aided Mol. Design, 15, 411—428, (2001).


18

Katchalski-Katzir et al.


19

Molecular Volume Representation

Given two molecules A and B, projected on a 3D grid of N× N× N size, define the discrete functions a and b as:

=−

molecule theoutside ,0 molecule theinside ,

molecule of surface on the ,1,, ρnmla

=−

molecule theoutside ,0 molecule theinside ,

molecule of surface on the ,1,, δnmlb

l,m,n = 1, 2, ... , N

Goal : Compute a score which favors surface complementarityand penalizes penetration.


20

γβαγβα +++= = =∑∑∑= nmlnml

N

l

N

m

N

nbac ,,,,

1 1 1,, .

αααα, ββββ, γγγγ - translation in the grid.

What should be the parameters ρρρρ and δδδδ in order to achievethe original goal ?


21


22

Algorithm


23

Katzir et al - summary� Exhaustive enumeration of the rotational and

translational degrees of freedom.� Since the scoring function is represented as a

convolution, the enumeration of the translational degrees of freedom is made efficient exploiting FFT - O(N3 logN).

� Allows introduction of electrostatics as the “complex number” component of the convolution.

� Time consuming, yet conceptually simple and appealing.

� Volumetric – does not use MS representation.


24

References

� Katchalski-Katzir et al., Molecular surface recognition: Determination of geometric fit between protein and their ligands by correlation techniques, Proc. Nat. Acad. Sci. USA, 89, 2195—2199, (1992).

� H. Gabb, R. Jackson & M. Sternberg, Modellingprotein docking using shape complementarity, electrostatics and biochemical information, J. Mol. Bio, 272, 106—120, (1997).


25

Geometric Hashing (Fischer, Lin, Nussinov, Wolfson)

� Connolly’s MS.� SPHGEN or other cavity detection method

(optional).� Caps and pits with “reasonable” patch

support +normals.� Geometric Hashing on points+normals.� Penetration filter.� Energy evaluation.


26


27

Norel, Nussinov, Wolfson (inspired by Connolly)

� Connolly’s MS.� Knobs and holes with associated normals.� Matching of complementary point+normal

pairs .� Geometric score function with a 3 layered

molecule volume representation.� Performed well both in unbound cases and in

protein/DNA - drug docking.� Current variation – PatchDock (Dina

Schneidman-Duhovny et al.)


28

Shape feature and signature (Norel et al.)


29

Unbound docking examples


30

PatchDock Algorithm

� Based on local shape feature matching.� Focuses on local surface patches divided

into three shape types: concave, convex and flat.

� The geometric surface complementarity scoring employs advanced data structures for molecular representation: Distance Transform Grid and Multi-resolution Surface.


31

Docking Algorithm Scheme� Part 1: Molecular surface representation

� Part 2: Features selection

� Part 3: Matching of critical features

� Part 4: Filtering and scoring of candidate transformations


32

1. Surface Representation

� Dense MS surface (Connolly)

� Sparse surface (Shuo Lin et al.)


33

1. Surface Representation

� Dense MS surface (Connolly)

� Sparse surface (Shuo Lin et al.)

82,500 points 4,100 points


34

Sparse Surface Graph -Gtop

� Caps (yellow), pits (green), belts (red):

� Gtop – Surface topology graph:

V=surface pointsE={(u,v)| u,v belong to the

same atom}


35

Docking Algorithm Scheme� Part 1: Molecular

surface representation




2.1 Coarse Curvature calculation2.2 Division to surface patches of similar curvature


36

2.1 Curvature Calculation• Shape function is a measure of local curvature.

• ‘knobs’ and ‘holes’ are local minima and maxima (<1/3 or >2/3), ‘flats’ –the rest of the points (70%).

knobs flats holes• Problems: sensitivity to molecular movements, 3 sets of points with different sizes.

• Solution: divide the values of the shape function to 3 equal sized sets: ‘knobs’, ‘flats’ and ‘holes’.

knob

hole

flat


37

2.2 Patch Detection

Goal: divide the surface into connected, non-intersecting, equal sized patches of critical points with similar curvature.

� connected – the points of the patch correspond to a connected sub-graph of Gtop.

� similar curvature – all the points of the patch correspond to only one type: knobs, flats or holes.

� equal sized – to assure better matching we want shape features of almost the same size.


38

� Construct a sub-graph for each type of points: knobs, holes, flats. For example Gknob will include all surface points that are knobs and an edge between two ‘knobs’ if they belong to the same atom.

� Compute connected components of every sub-graph.

� Problem: the sizes of the connected components can vary.

� Solution: apply ‘split’ and ‘merge’ routines.

Patch Detection by Segmentation Technique


39

Split and Merge� Geodesic distance between two nodes is a

weight of the shortest path between them in surface topology graph. The weight of each edge is equal to the Euclidean distance between the corresponding surface points.

� Diameter of the component – is the largest geodesic distance between the nodes of the component. Nodes s and t that give the diameter are called diameter nodes.

st


40

Split and Merge (cont.)� The diameter of every connected component is

computed using the APSP (All pairs shortest paths) algorithm.

1. low_patch_thr ≤ diam ≤ high_patch_thr � valid patch2. diam > high_patch_thr � split3. diam < low_patch_thr � merge

low_patch_thr = 10Åhigh_patch_thr = 20Å


41

Split and Merge (cont.)� Split routine: compute Voronoi cells of the

diameter nodes s,t. Points closer to s belong to new component S, points closer to t belong to new component T. The split is applied until the new component has a valid diameter.

� Merge routine: compute the geodesic distance of every component point to all the patches. Merge with the patch with closest distance.

� Note: the merge routine may merge point with patch of different curvature type. s

t


42

Examples of Patches for trypsin and trypsin inhibitor

Yellow – knob patches, cyan – hole patches, green – flat patches, the proteins are in blue.


43

Complementarity of the Patches:

Interface knob patches of the ligand

Interface hole patches of the receptor


44

Cox-2 cavity represented by a single hole patch:

Indomethacininside the COX-2 hole patch

Indomethacininside its knob patches


45

Shape Representation Part

We focus on sparse surface features, preserving the quality of shape representation.The sparse features reduce the complexity of the matching step.


46

Active Site FocusingThere are major differences in the interactions of different types of molecules (protease-inhibitor, antibody-antigen, protein drug). Studies have shown the presence of energetic hot spots in the active sites of the molecules.

Protease/inhibitor – select patches with high enrichment of hot spot residues (Ser,Gly,Asp and His for protease; and Arg,Lys,Leu,Cys and Pro for protease inhibitor).

Antibody/antigen – 1.detect CDRs of the antibody. 2. select hot spot patches (Tyr,Asp,Asn,Glu,Ser and Trp for antibody; and Arg,Lys,Asn and Asp for antigen)

Protein/drug – select largest protein cavity (highest value of average shape function for the patch)


47

Active Site Focusing

surfacesurfaceresidue

patchpatchresiduei areaarea

areaareapatchresiduepropensity

i

i

//

),(,

,=

• The enrichment of hot spot residue in patch is measured by propensity. Propensity is a ratio of residue frequency in patch and residue frequency in surface.

• The CDRs are detected by aligning the sequence of the given antibody to the consensus sequence of the library of the antibodies.


48

Docking Algorithm Scheme• Part 1: Molecular surface

representation

• Part 2: Features selection

• Part 3: Matching of critical features

• Part 4: Filtering and scoring of candidate transformations


49

3. Matching of patchesThe aim is to align knob patches with hole patches, and flat patches with any patch. We use two types of matching:

• Single Patch Matching – one patch from the receptor is matched with one patch from the ligand. Used in protein-drug cases.

• Patch-Pair Matching – two patches from the receptor are matched with two patches from the ligand. Used in protein-protein cases.


50

Single Patch Matching

� Base: a pair of critical points with their normals from one patch.

� Match every base from a receptor patch with all the bases from complementary ligand patches.

� Compute the transformation for each pair of matched bases.

Receptor hole patch Ligand knob patch

Transformation


51

Patch-Pair MatchingLigand patches

Transformation

Receptor patches

� Base: 1 critical point with its normal from one patch and 1 critical point with its normal from a neighboring patch.

� Match every base from the receptor patches with all the bases from complementary ligand patches.

� Compute the transformation for each pair of matched bases.


52

Base Compatibility

The signature of the base is defined as follows:

1. Euclidean and geodesic distances between the points: dE, dG

2. The angles α, β between the [a,b] segment and the normals

3. The torsion angle ω between the planes

Two bases are compatible if their signatures match.

dE, dG, α, β, ω


53

Geometric Hashing

� Preprocessing: the bases are built for all ligand patches (single or pairs) and stored in hash table according to base signature.

� Recognition: for each receptor base access the hash-table with base signature. The transformations set is computed for all compatible bases.


54

Clustering

� Since local features are matched, we may have multiple instances of “almost”the same transformation.

� We combine 2 clustering techniques:1.Clustering transformation parameters – coarse but very fast.

2.RMSD clustering – accurate but slow. (according to FLEXX, Rarey et al., 1996)


55

Docking Algorithm Scheme� Part 1: Molecular surface representation





56

Distance Transform GridDense MS surface

(Connolly)

0+1

-1


57

Filtering Transformations with Steric Clashes

• Since the transformations were computed by local shape features matching they may include unacceptable stericclashes.

• Candidate complexes with slight penetrations are retained due to molecular flexibility.

Steric clash test: For each candidate ligand transformation

transform ligand surface points For each transformed point

access Distance Transform Grid and check distance valueIf it is more than max_penetrationDisqualify transformation


58

Scoring Shape Complementarity

• The scoring is necessary to rank the remaining solutions.

• The surface of the receptor is divided into five shells according to the distance function: S1-S5

[-5.0,-3.6), [-3.6,-2.2), [-2.2, -1.0), [-1.0,1.0), [1.0�).

• The number of ligand surface points in every shell is counted.

• Each shell is given a weight: W1-W5

-10, -6, -2, 1, 0.

• The geometric score is a weighted sum of the number of ligand surface points Ninside every shell:

iS ii

score N W=∑


59

Filtering and ScoringPerformance Problem: the number of surface points for high resolution MS surface may reach 100,000. For each candidate transformation, for each surface point we apply the transformation and access distance transform grid.

We develop multi-resolution surface data structure that supports fast queries for penetrations and geometric score.

119,000 points 16,000 points 4,100 points 1,000 points


60

Multi-resolution surface (Oct-tree structure)

Level 0: Connolly Surface points

Level 1:

Level 2:

� point� radius� number of leaves� low-level pointers

Node:


61

Queries in Multi-resolution surface data structure

� The queries are: isPenetrating(trans, threshold), maxPenetration(trans), score(trans), interface(trans).

� All the searches are performed by DFS.� We check every node from highest level and go

down if it is in interface.� For each node we check distance transform value

and radius. If they are within the threshold we don’t check the children.

� Worst case complexity of each query: O(interface size + highest level size)


62

Scoring with Specified Binding Site: Antibody-Antigen Example

• Although only the patches including CDRs are used in the matching stage, the results may still include transformations where most of the interface doesn’t belong to CDRs.

• In addition to regular score, we compute the percentage of the interface included in the CDRs. All the transformations with less than 70% of CDRs are disqualified.


63

ResultsDatasets:

Protein-Protein docking:� Enzyme-inhibitor – 22 cases� Antibody-antigen – 13 casesProtein-DNA docking: 2 unbound-bound casesProtein-drug docking: tens of bound cases (Estrogen

receptor, HIV protease, COX)Performance:

Several minutes for large protein molecules and seconds for small drug molecules on standard PC computer.


64

Enzyme-inhibitor cases


65

Enzyme-inhibitor results (unbound)


66

Antibody-antigen cases


67

Antibody-antigen results (unbound)


68

Example 1: Enzyme-inhibitor docking (unbound case)

Α-chymotrypsin (5CHA) with Eglin C (1CSE(I)). RMSD 1.46Å, rank 10

trypsininhibitor from complexdocking solution


69

Example 2: Antibody-antigen docking (unbound case)

Antibody Fab 5G9 (1FGN) with tissue factor (1BOY). RMSD 2.27Å, rank 8

antibodytissue factor from complexdocking solution


70

Example 3: Protein-DNA docking (semi-unbound case)

Endonuclease I-PpoI (1EVX) with DNA (1A73). RMSD 0.87Å, rank 2

DNA strandendonucleasedocking solution


71

Example 4: Protein-drug docking (bound case)

Estrogen receptor with estradiol(1A52). RMSD 0.9Å, rank 1, running time: 11 seconds

Estrogen receptorEstradiol from complexdocking solution


72

Example 5: Using information from Multiple Structural Alignment

Hemagglutinin molecule

Structurally conserved residues


73

Example 5 – final results

RMSD 3.10, Rank 6 (instead of 30), running time: 5 min

The original complex antibody is in red and our solution is in green.

Virus capsidprotein

hemagglutininantibody from complexdocking solution


74

Docking Algorithm Scheme� Part 1: Molecular

surface representation




WANTED

Biochemical Scoring function !!!

The correct solution is found in 90% of the cases with RMSD under 5A.

The rank of the correct solution can be in the range of 1 – 1000.


75

Factors that influence the rank of the correct solution

� Shape complementarity� Interface shape – in the concave/convex

interfaces (enzyme-inhibitor, receptor-drug), shape complementarity is easier to detect comparing to flat interfaces (antibody-antigen).

� Sizes of molecules – the larger the molecules the higher the number of the false-positive results.


76

Flexible Docking - general methodology

Three major approaches :

� Rigid subpart docking :– Split the flexible molecule into rigid subparts.– Dock independently each subpart.– Pair the top hypotheses for each subpart to detect

hinge consistency.Example : Des Jarlais, Sheridan, Dixon, Kuntz,

Venkatraghavan (1986).


77

� Anchor fragment method :– Position a ‘preferred’ anchor fragment. – Rotate sequentially the flexible bonds to position the

other fragments.Example: Leach & Kuntz (1992); Lengauer et al. - FLEXX.

� Hinge scoring method:– Incorporate bond information already in the initial

filtering steps by accumulating information at the hinges. – No preference for specific parts. Reminds the first

method yet exploits the consistency of neighboring part placement in the initial stages.

Example : Sandak, Nussinov, Wolfson (1995).


78

GGH based flexible docking

Applies either to flexible ligands or to flexible receptors.


79

General Algorithm outline

Can be applied either to a dataset of ligands vs a receptor or a dataset of receptors vs. a ligand.

� Calculate the molecular surface of the receptor and the ligands and their interest points (+ normals).

� Match the interest points and recover candidate multi-transformations.

� Check for inter-molecule and intra-molecule penetrations and score the amount of contact.

� Rank by energies.


80

Point Matching algorithm- prepr.

� For each database molecule :

– Define r.f.’s at every hinge.– For each minimal feature (e.g. triplet) compute an r.f.

and shape signature.– For each (triplet based) reference frame compute the

transformation btwn that frame and the hinge based frame and store (molec., part, r.f., transf.) in a hash (lookup) table at an entry addressed by the r.f. shape signature.


81

Point Matching algorithm- recognition

� For the target molecule :– For each minimal feature compute an r.f. and shape

signature.– Access the table by the shape signature, and for

each transformation appearing there :• transform the r.f. to ‘hypothesized’ hinge position;• advance the counter of that hinge location for the appropriate

molecule and part.

– Check highest scoring hinges .– Verify the resulting transformations .


82

Flexible DockingCalmodulin with M13 ligand


83

Flexible Docking HIV Protease Inhibitor