Protein-Protein Docking
Center for Bioinformatics, Saarbrücken, Germany
Celera Genomics, Rockville, MD, USA
Protein-Protein Docking: Basics and New Applications
Oliver Kohlbacher
Hans-Peter Lenhof
Overview
• Introduction
• Basic Algorithms + Scoring Functions
• Integration of Protein Flexibility
– Flexibility in Proteins
– Algorithms for Semi-Flexible Docking
• Integration of Experimental Data
– NMR Spectroscopy
– NMR-Based Protein Docking
• Outlook + Summary
Protein Docking: What, How, Why?
Protein Docking: Introduction
Protein-Protein Docking
• Given two proteins A and B
• Predict complex structure AB
Protein Docking: Introduction
Protein-Protein Docking
• Uniform chemistry
• Understanding protein interactions
• Speed up structure elucidation
• Predict protein interactions
Protein Docking: Introduction
Protein-Ligand Docking
• Small, flexible ligand
• Ligands are chemically diverse
• Crucial for drug design
• Virtual screening
Protein Docking: Introduction
Lock-and-Key Principle
Emil Fischer, 1894: “To use an image, I would say that enzyme and glycoside have to fit into each other like a lock and a key, in order to exert a chemical effect on each other.”
[Figure: the lock-and-key principle applies to both geometry and chemistry (complementary charges + and -)]
Algorithms for Protein Docking: Basics
Protein Docking: How?
• Structure generation
• Filtering
• Evaluation
Algorithms for Protein Docking: Basics
Overview of Docking Techniques
• Connolly Surface: Connolly 1986; Bacon, Moult 1992; Fischer et al. 1995; Lin et al. 1994; Norel et al. 1995; Sandak et al. 1995; Campbell et al. 1996; Ackermann et al. 1995
• Fuzzy Logic: Exner, Brickmann 1997
• Cube Representation: Jiang, Kim 1991
• Graph Representation: Shoichet et al. 1992; Shoichet, Kuntz 1996; Kasinos et al. 1992
• Correlation: Katchalski-Katzir et al. 1992; Vakser, Aflalo 1994; Vakser 1996; Gabb et al. 1997; Meyer et al. 1996
• Monte Carlo Approach: Cherfils et al. 1991; Totrov, Abagyan 1994
• Slices Representation: Walls, Sternberg 1992; Helmer-Citterich et al. 1994; Ausiello et al. 1997
• Genetic Algorithms: Levine et al. 1997
• Database: Ester et al. 1995
Algorithms for Protein Docking: Basics
What are the differences?
• Method for determining tentative structures
– Based on surface complementarity
– Based on triangles
– …
• Scoring functions employed
– Contact surface area
– Electrostatics
– Hydrogen bonds
– …
Algorithms for Protein Docking: Basics
Structure Generation
• Assumption: Proteins are rigid bodies!
• Three-dimensional “puzzle”
• Six degrees of freedom
– Rotation
– Translation
• Identify rigid transformations bringing B in contact with A
• Discretization
– Of proteins, protein surface
– Rotational/translational space
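The six degrees of freedom of a rigid body can be illustrated with a minimal sketch: three rotation angles and a translation vector applied to the atom coordinates of B. The function names and the Euler-angle convention (R = Rz · Ry · Rx) are assumptions made for this illustration, not part of any docking program discussed here.

```python
import math

def rotation_matrix(rx, ry, rz):
    """Rotation matrix R = Rz(rz) * Ry(ry) * Rx(rx); angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]

def transform(points, angles, translation):
    """Apply a rigid transformation (3 rotational + 3 translational DOF)."""
    R = rotation_matrix(*angles)
    return [
        tuple(R[i][0] * x + R[i][1] * y + R[i][2] * z + translation[i]
              for i in range(3))
        for (x, y, z) in points
    ]

# Rotate two hypothetical atoms of B by 90 degrees about z and shift along z.
atoms_b = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
moved = transform(atoms_b, (0.0, 0.0, math.pi / 2), (0.0, 0.0, 5.0))
print(moved)
```

Because the transformation is rigid, all interatomic distances within B are preserved; only its pose relative to A changes.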
Algorithms for Protein Docking: Basics
Structure Generation
• Katchalski-Katzir et al., Proc. Natl. Acad. Sci. USA, 1992
– Grid-based correlation of A and B
• Lenhof, RECOMB 97
– Mapping of triangles
Lenhof, RECOMB 1997
Structure Generation
• Calculate “critical points” on surface of A
• Calculate triangles defined by critical points
• Calculate triangles defined by atoms of B
• Search for triangles with similar side lengths (hash table)
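The hash-table lookup for triangles with similar side lengths can be sketched as follows. This is only an illustration of the lookup idea under simplifying assumptions: the critical points are taken as given, side lengths are rounded to a hypothetical bin width, and all names are invented for this example. It is not the published algorithm, which may treat tolerances and point selection differently.

```python
import math
from collections import defaultdict
from itertools import combinations

def side_key(p, q, r, bin_width=1.0):
    """Sorted, binned side lengths of triangle pqr, used as hash key."""
    sides = sorted((math.dist(p, q), math.dist(q, r), math.dist(p, r)))
    return tuple(round(s / bin_width) for s in sides)

def build_table(points, bin_width=1.0):
    """Hash all triangles spanned by the given points by their side lengths."""
    table = defaultdict(list)
    for tri in combinations(range(len(points)), 3):
        key = side_key(*(points[i] for i in tri), bin_width=bin_width)
        table[key].append(tri)
    return table

def matching_triangles(critical_points_a, atoms_b, bin_width=1.0):
    """Pairs (triangle of A, triangle of B) whose binned side lengths agree."""
    table = build_table(critical_points_a, bin_width)
    matches = []
    for tri_b in combinations(range(len(atoms_b)), 3):
        key = side_key(*(atoms_b[i] for i in tri_b), bin_width=bin_width)
        for tri_a in table.get(key, []):
            matches.append((tri_a, tri_b))
    return matches

# Toy data: A contains a 3-4-5 triangle, B contains a translated copy of it.
points_a = [(0, 0, 0), (3, 0, 0), (0, 4, 0), (0, 0, 7)]
atoms_b = [(10, 10, 10), (13, 10, 10), (10, 14, 10)]
print(matching_triangles(points_a, atoms_b))
```

Each matching pair of triangles defines a candidate rigid transformation. Plain binning can miss pairs whose side lengths fall on different sides of a bin boundary, so a real implementation would also have to probe neighboring bins.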
Scoring Functions for Protein Docking
Scoring Functions
• What distinguishes the true complex structure from “false positives”?
• Physical chemistry: Complex structure with the lowest binding free energy is the one observed in nature.
• Caveat: relies on sufficiently complete sampling of conformation space
Scoring Functions for Protein Docking
Prediction of Free Energy
• Also hardest part in related problems:
– Ligand docking
– Protein structure prediction
• Many methods neglect
– entropic contributions
– solvent effects
Scoring Functions for Protein Docking
Scoring Functions
• Simplest functions: Geometry
– Lock-and-key principle
– Large contact areas are favorable
– Overlaps between A and B are unfavorable
• More sophisticated: “Chemistry”
– Models based on physicochemistry
– Compromise between complexity and accuracy
Scoring Functions for Protein Docking
Scoring Function
[Figure: the scoring function combines geometry and chemistry (complementary charges + and -)]
Scoring Functions for Protein Docking
Geometric Scoring Function
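Determine the distance between all pairs (a, b) with a ∈ A, b ∈ B:
$\mathrm{contact}(AB) = |\{(a, b) \mid 2.75 < \mathrm{dist}(a, b) < 4.0\}|$
$\mathrm{overlap}(AB) = |\{(a, b) \mid \mathrm{dist}(a, b) < 2.75\}|$
$\mathrm{geom\_score}(AB) = k_1 \cdot \mathrm{contact}(AB) - k_2 \cdot \mathrm{overlap}(AB)$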
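A minimal sketch of the geometric score, counting contact and overlap pairs with the thresholds 2.75 and 4.0 from the definition above. The weights k1 and k2 are not given on the slide; the defaults below are placeholders chosen for illustration, as are the function name and the toy coordinates.

```python
import math

def geom_score(atoms_a, atoms_b, k1=1.0, k2=10.0,
               overlap_cut=2.75, contact_cut=4.0):
    """Return (score, contact, overlap) for two lists of (x, y, z) positions.

    contact: pairs with overlap_cut < dist < contact_cut
    overlap: pairs with dist < overlap_cut
    score  : k1 * contact - k2 * overlap (k1, k2 are assumed weights)
    """
    contact = 0
    overlap = 0
    for a in atoms_a:
        for b in atoms_b:
            d = math.dist(a, b)
            if d < overlap_cut:
                overlap += 1
            elif overlap_cut < d < contact_cut:
                contact += 1
    return k1 * contact - k2 * overlap, contact, overlap

# One atom of A against three atoms of B: one contact, one overlap, one far away.
print(geom_score([(0, 0, 0)], [(3, 0, 0), (2, 0, 0), (10, 0, 0)]))
```

The quadratic loop over all atom pairs is kept for clarity; a grid or cell list would be the usual way to restrict the search to nearby atoms.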
Scoring Functions for Protein Docking
Chemical Scoring
$\mathrm{chem\_score}(AB) = \sum_{(a, b) \in \mathrm{contact}(AB)} M(\mathrm{type}(a), \mathrm{type}(b))$
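The chemical score sums a type-dependent weight M(type(a), type(b)) over all contact pairs. The sketch below assumes the contact definition 2.75 < dist < 4.0 from the geometric score; the atom types and the entries of M are invented for illustration and do not reproduce the matrix used in the talk.

```python
import math

def chem_score(atoms_a, atoms_b, M, lower=2.75, upper=4.0):
    """Sum M[(type_a, type_b)] over all contact pairs.

    atoms_a, atoms_b: lists of (type, (x, y, z)).
    M: dict mapping (type_a, type_b) to a weight (hypothetical values).
    """
    score = 0.0
    for type_a, pos_a in atoms_a:
        for type_b, pos_b in atoms_b:
            if lower < math.dist(pos_a, pos_b) < upper:
                score += M[(type_a, type_b)]
    return score

# Hypothetical type matrix: opposite charges are rewarded, equal charges penalized.
M = {("+", "-"): 1.0, ("-", "+"): 1.0, ("+", "+"): -1.0, ("-", "-"): -1.0}
atoms_a = [("+", (0.0, 0.0, 0.0))]
atoms_b = [("-", (3.0, 0.0, 0.0)), ("+", (0.0, 3.5, 0.0)), ("-", (9.0, 0.0, 0.0))]
print(chem_score(atoms_a, atoms_b, M))
```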
[Figure: docking of 2PTC (proteins A and B, complex AB); candidate No. 1 has RMSD = 1.94 Å]
Scoring Functions for Protein Docking
Results: Docking of 2PTC
[Plot: RMSD [Å] versus score]
Katchalski-Katzir et al., PNAS 1992
Basic Ideas
• Protein on grid
• Assign values
– a_{i,j,k} = 1 at the surface of A, ρ << 0 inside A, 0 outside A
– b_{i,j,k} = 1 at the surface of B, δ > 0 inside B, 0 outside B
Product a · b per grid cell (rows: A, columns: B):
A \ B      outside   surface   inside
outside    0         0         0
surface    0         1         δ > 0
inside     0         ρ < 0     ρ · δ < 0
Katchalski-Katzir et al., PNAS 1992
Calculate Correlation c_{α,β,γ}
• Correlation of a and b identifies those translations where A and B are in contact:
$c_{\alpha,\beta,\gamma} = \sum_{i,j,k} a_{i,j,k} \cdot b_{i+\alpha,\, j+\beta,\, k+\gamma}$
• Repeat for all translation vectors (α, β, γ)
→ Run time O(N^6)!
Katchalski-Katzir et al., PNAS 1992
[Figure: cross section c_{α=0,β,γ}; from Katchalski-Katzir et al., PNAS 1992, 2195]
Katchalski-Katzir et al., PNAS 1992
Algorithm
• Calculate a_{i,j,k} and A* = [FFT(a)]*
• For all rotations of B:
– Calculate b_{i,j,k} and B = FFT(b)
– Calculate C = A* · B (element-wise product)
– Calculate c_{i,j,k} = IFT(C)
– Identify tentative transformations (α, β, γ) as strongly positive peaks in c
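The translational scan for one fixed rotation of B can be sketched with NumPy. The sketch compares the direct O(N^6) correlation with the FFT route C = A* · B; both assume periodic (wrap-around) shifts, which is why real implementations pad the grids. Function names and the toy grids are assumptions for illustration only.

```python
import numpy as np

def correlate_direct(a, b):
    """c[al, be, ga] = sum_ijk a[i, j, k] * b[i+al, j+be, k+ga] (periodic), O(N^6)."""
    n = a.shape[0]
    c = np.zeros(a.shape, dtype=float)
    for al in range(n):
        for be in range(n):
            for ga in range(n):
                # np.roll with negative shifts yields b[i+al, j+be, k+ga]
                shifted = np.roll(b, shift=(-al, -be, -ga), axis=(0, 1, 2))
                c[al, be, ga] = np.sum(a * shifted)
    return c

def correlate_fft(a, b):
    """Same correlation via the correlation theorem: c = IFT(conj(FFT(a)) * FFT(b))."""
    a_conj = np.conj(np.fft.fftn(a))
    b_hat = np.fft.fftn(b)
    return np.real(np.fft.ifftn(a_conj * b_hat))

# Toy grids with a single marked cell each; the peak of c gives the shift
# (al, be, ga) that places the marked cell of b onto the marked cell of a.
a = np.zeros((8, 8, 8))
b = np.zeros((8, 8, 8))
a[2, 2, 2] = 1.0
b[5, 3, 2] = 1.0
c = correlate_fft(a, b)
print(np.unravel_index(np.argmax(c), c.shape))
```

In a full docking run this scan is repeated for every sampled rotation of B, while the transform of a is computed only once.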
From Rigid to Flexible Docking
13122114TPI
131225521TGS
11221802TGP
18166951TEC
15133812SNI
11317272136914SGB
13124924935182SEC
111221612PTC
18122492MHB
1912291256722KAI
11012224HVP
15>125106104176373HFM
128252467922HFL
171233345529001243091FDL
151331064CPA
1312221CHO
L97 AHP+96 MWS96 NLW+95 NLW+95 NLW+95 PDB ID
Comparison of Algorithms
[Plot: RMSD versus Fit = GeomScore + ChemScore (2PTC_E, 2PTC_I)]
One PS in the Life of PTI
[Plot: RMSD versus Fit = GeomScore + ChemScore (1TPO, 4PTI)]
[Figure: 2PTC, 1TPO, 4PTI]
From Rigid to Flexible Docking
Protein Flexibility
• Domain movements
– Large scale movements of protein domains
– Mainly backbone movements
– Highly flexible “hinge” regions: hinge bending
• Side-chain flexibility
– Rather rigid backbone
– Local rearrangements of side chains
– Flexibility based on side-chain torsions
From Rigid to Flexible Docking
Semi-Flexible Docking
• Assumptions
– Rigid backbone
– Flexible side chains
• Basic Algorithm
– Start with rigid docking candidates
– Demangle overlapping side chains
– Calculate energy
Good approximation for Trypsin/BPTI!
Protein Docking with Flexible Side Chains
• Apply the rigid-body docking (RBD) algorithm
• Optimize the side chains in the docking site of the best RBD candidates w.r.t. the potential energy (AMBER), using a rotamer library
From Rigid to Flexible Docking
Torsion angle distribution
[Figures: torsion angle distributions for LYS and SER]
Rotamer Library
• Side-chain conformational space is adequately represented by a discrete set of rotamers
(1) Determine all side chains in the docking site!
(2) Build sets of potential side chain orientations
(3) Determine the rotamer combination with minimal energy!
[Figure: side chains S1 and S2, each with rotamers R1 and R2]
ILP formulation for side chains S1, S2 with rotamers R1, R2
• Binary variable X_{ij} ∈ {0, 1} for rotamer j of side chain i: (X11, X12, X21, X22), e.g. (0, 1, 1, 0)
• Exactly one rotamer per side chain:
$\sum_{j} X_{ij} = 1 \quad \forall i$
• Objective:
$\min\; E_T + \sum_{ij} X_{ij} E_{ij} + \sum_{ij} \sum_{kl,\, k \neq i} X_{ij} X_{kl} E_{ijkl}$
• Shift the pairwise energies by E_max:
$\min\; E_T + \sum_{ij} X_{ij} E_{ij} + \sum_{ij} \sum_{kl,\, k \neq i} X_{ij} X_{kl} (E_{ijkl} - E_{max})$
• Linearize the products with Y_{ijkl} ∈ {0, 1}, Y_{ijkl} ≤ X_{ij}, Y_{ijkl} ≤ X_{kl}:
$\min\; E_T + \sum_{ij} X_{ij} E_{ij} + \sum_{ij} \sum_{kl,\, k \neq i} Y_{ijkl} (E_{ijkl} - E_{max})$
Assuming E_max is an upper bound on the pairwise energies, every coefficient E_{ijkl} - E_max is ≤ 0, so the minimization sets Y_{ijkl} = 1 whenever X_{ij} = X_{kl} = 1.
Solve ILP
$\min\; E_T + \sum_{ij} X_{ij} E_{ij} + \sum_{ij} \sum_{kl,\, k \neq i} Y_{ijkl} (E_{ijkl} - E_{max})$
$\text{s.t.}\quad \sum_{j} X_{ij} = 1 \;\; \forall i, \quad Y_{ijkl} \le X_{ij}, \quad Y_{ijkl} \le X_{kl}, \quad X_{ij}, Y_{ijkl} \in \{0, 1\}$
• Solve LP: relax the integrality constraints to X_{ij} ∈ [0, 1], Y_{ijkl} ∈ [0, 1]
• LP solution feasible for the ILP: done
• LP solution infeasible for the ILP: search for a cutting plane, add it to the LP, and solve the LP again
• If some 0 < X_{ij} < 1 remains: branch into “Solve ILP, X_{ij} = 0” and “Solve ILP, X_{ij} = 1”
From Rigid to Flexible Docking
Test Case: Trypsin/BPTI
BPTI
Trypsin
first approximation: rank 3
(docking of native structures)
From Rigid to Flexible Docking
What’s the problem?
• LYS 15 moves on docking
• Overlapping atoms
• Rigid docking must fail
From Rigid to Flexible Docking
Semi-Flexible Protein Docking
• Structure generation
• Filtering
• Side-chain demangling
• Final energetic evaluation
From Rigid to Flexible Docking
Side-Chain Flexibility
• Bond distances and bond angles constant
• Torsion angles variable
From Rigid to Flexible Docking
Torsion Angle Space
• Instead of 10 - 20 atoms with 3 coordinates
• Up to 4 torsion angles
• Typical binding site:
– 50 amino acids, 600 atoms
– 200 instead of 1800 degrees of freedom
From Rigid to Flexible Docking
Torsion angle distribution
[Plot: histogram of counts versus χ1 [°]]
[Plot: χ2 [°] versus χ1 [°]]
Flexible Docking: Combinatorial Problem
Combinatorial Problem
• Identify set of rotamers with lowest energy
• Typical binding site: 50 side-chains
• Binding site defined via distance between A/B
• Number of possible combinations: ~10^60
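The order of magnitude can be checked with a short calculation. The slide gives only the number of side chains and the total; the figure of 16 rotamers per side chain used below is an assumption chosen because it reproduces roughly 10^60 combinations for 50 side chains.

```python
import math

def count_combinations(rotamers_per_side_chain):
    """Number of rotamer combinations = product of the per-side-chain counts."""
    return math.prod(rotamers_per_side_chain)

# 50 side chains with an assumed average of 16 rotamers each.
n = count_combinations([16] * 50)
print(f"about 10^{math.log10(n):.1f} combinations")
```

Exhaustive enumeration is therefore out of the question, which motivates both the heuristic and the ILP approach.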
Flexible Docking: Combinatorial Problem
Scoring Function
• No sophisticated energetic evaluation feasible
• Simple and fast scoring function: AMBER
• Decomposition into three components
$E_{total} = E^{tpl} + \sum_{i} E^{tpl}_{i_r} + \sum_{i} \sum_{i<j} E^{pw}_{i_r, j_s}$
Flexible Docking: Combinatorial Problem
Earlier Approaches
• Leach (1994): A* algorithm for side-chain optimization (ligand docking)
• Desmet et al. (1992): Dead End Elimination
– Simple inequality
– Iteratively applied
Flexible Docking: Combinatorial Problem
How to solve the problem?
• Multi Greedy method
– Fast and simple heuristic
– Suboptimal solutions
• ILP formulation
– Optimal solution
Flexible Docking: Multi Greedy Approach
Multi Greedy Method
• Search tree: the root node branches into the rotamers of side chain 1 (nodes 11, 12); each of these branches into the rotamers of side chain 2 (nodes 21, 22, 23, 24)
• Node energies: E_tpl at the root, E_1 and E_2 on the first level, E_{1/1} … E_{1/4} and E_{2/1} … E_{2/4} on the second level
$E_{total} = E^{tpl} + \sum_{i} E^{tpl}_{i_r} + \sum_{i} \sum_{i<j} E^{pw}_{i_r, j_s}$
$E_{1} = E^{tpl}_{1_1}$
$E_{1/1} = E^{tpl}_{1_1} + E^{tpl}_{2_1} + E^{pw}_{1_1, 2_1}$
Multi Greedy Approach
• Fast
• Correctly demangles side chains for test set
• May yield suboptimal solutions
• How to determine the quality of the solution?
• Optimal algorithm: based on polyhedral optimization, ILP formulation
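The trade-off between speed and optimality can be illustrated with a small level-by-level tree search that keeps only the best few partial solutions per level. Treating the multi greedy method as this kind of bounded-width search is an assumption made for the sketch; the slides do not spell out the pruning rule. All names and energy values are hypothetical.

```python
from itertools import product

def total_energy(choice, e_tpl, self_e, pair_e):
    """E_tpl + sum_i self_e[i][r_i] + sum_{i<j} pair_e[(i, r_i, j, r_j)]."""
    energy = e_tpl
    for j, s in enumerate(choice):
        energy += self_e[j][s]
        for i in range(j):
            energy += pair_e.get((i, choice[i], j, s), 0.0)
    return energy

def multi_greedy(e_tpl, self_e, pair_e, width=2):
    """Extend partial solutions side chain by side chain, keeping the `width` best."""
    partial = [((), e_tpl)]
    for i in range(len(self_e)):
        extended = []
        for choice, _ in partial:
            for r in range(len(self_e[i])):
                new_choice = choice + (r,)
                extended.append(
                    (new_choice, total_energy(new_choice, e_tpl, self_e, pair_e)))
        extended.sort(key=lambda item: item[1])
        partial = extended[:width]
    return partial[0]

def brute_force(e_tpl, self_e, pair_e):
    """Reference optimum by full enumeration (feasible only for toy instances)."""
    best = min(product(*(range(len(s)) for s in self_e)),
               key=lambda c: total_energy(c, e_tpl, self_e, pair_e))
    return best, total_energy(best, e_tpl, self_e, pair_e)

# Toy instance: two side chains with two rotamers each (hypothetical energies).
self_e = [[0.0, 1.0], [0.0, 0.0]]
pair_e = {(0, 0, 1, 0): 5.0, (0, 0, 1, 1): 4.0, (0, 1, 1, 0): 0.0, (0, 1, 1, 1): 2.0}
print(multi_greedy(0.0, self_e, pair_e, width=1))  # greedy choice, may be suboptimal
print(multi_greedy(0.0, self_e, pair_e, width=2))
print(brute_force(0.0, self_e, pair_e))
```

On this instance a width of 1 commits to the locally best rotamer of side chain 1 and misses the optimum, while a width of 2 recovers it. Without a reference solution the quality of the heuristic result cannot be judged, which is the motivation for the exact ILP approach.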
ILP-Based Algorithm
Graph Problem
• For each rotamer r of side chain i, create a node v_{i_r} with weight $E(v_{i_r}) = E^{tpl}_{i_r}$
• A k-partite graph with partitions V1 … Vk is constructed
• Pairwise interactions are represented by edges uv with weight $E(uv) = E^{pw}_{i_r, j_s}$
• Total energy:
$E_{total} = E^{tpl} + \sum_{i} E^{tpl}_{i_r} + \sum_{i} \sum_{i<j} E^{pw}_{i_r, j_s}$
• Energy of a rotamer graph RG:
$E(RG) = \sum_{v \in RG} E(v) + \sum_{uv \in RG} E(uv)$
• Binary decision variables for nodes and edges
– x_v for each node v
– x_{uv} for each edge uv
• Node is selected if x_v = 1
• Edge is selected if x_{uv} = 1
$E = \sum_{v \in V} x_v E(v) + \sum_{uv \in E} x_{uv} E(uv)$
ILP-Based Algorithm
Integer Linear Program
$E = \sum_{v \in V} x_v E(v) + \sum_{uv \in E} x_{uv} E(uv)$
Determine x_v, x_{uv} that minimize E
ILP: Basic Constraint System
$\min \left( \sum_{v \in V} x_v E(v) + \sum_{uv \in E} x_{uv} E(uv) \right)$
$\text{s.t.}\quad \sum_{v \in V_i} x_v = 1 \quad \text{for all } i \in \{1 \dots k\}$
$x_{uv} \le x_u \quad \text{for all } uv \in E$
$x_{uv} \le x_v \quad \text{for all } uv \in E$
With energies shifted by E_max, the objective becomes
$\min \left( \sum_{v \in V} x_v (E(v) - E_{max}) + \sum_{uv \in E} x_{uv} (E(uv) - E_{max}) \right)$
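The constraint system can be made concrete on a toy rotamer graph by enumerating all feasible 0/1 assignments. This is a brute-force check of the formulation, not a branch-and-cut solver; node names, energies, and the value of E_max are hypothetical. The edge variables are set to x_uv = x_u · x_v, which satisfies x_uv ≤ x_u and x_uv ≤ x_v.

```python
from itertools import product

def ilp_objective(x_v, x_uv, node_e, edge_e, e_max=0.0):
    """sum_v x_v (E(v) - E_max) + sum_uv x_uv (E(uv) - E_max)."""
    value = sum(x * (node_e[v] - e_max) for v, x in x_v.items())
    value += sum(x * (edge_e[uv] - e_max) for uv, x in x_uv.items())
    return value

def solve_by_enumeration(partitions, node_e, edge_e, e_max=0.0):
    """Try every selection with exactly one node per partition V_i."""
    best = None
    for selection in product(*partitions):
        chosen = set(selection)
        x_v = {v: int(v in chosen) for part in partitions for v in part}
        # An edge is selected exactly when both of its end nodes are selected.
        x_uv = {(u, v): x_v[u] * x_v[v] for (u, v) in edge_e}
        value = ilp_objective(x_v, x_uv, node_e, edge_e, e_max)
        if best is None or value < best[0]:
            best = (value, selection)
    return best

# Toy 2-partite rotamer graph (hypothetical energies).
partitions = [["a1", "a2"], ["b1", "b2"]]
node_e = {"a1": 0.0, "a2": 1.0, "b1": 0.0, "b2": 0.0}
edge_e = {("a1", "b1"): 5.0, ("a1", "b2"): 4.0, ("a2", "b1"): 0.0, ("a2", "b2"): 2.0}
print(solve_by_enumeration(partitions, node_e, edge_e))
print(solve_by_enumeration(partitions, node_e, edge_e, e_max=5.0))
```

On a complete k-partite graph every feasible selection contains k nodes and k(k-1)/2 edges, so the shift by E_max changes the objective only by a constant and leaves the minimizer unchanged. Its purpose is to make all coefficients non-positive, so that a solver working only with the ≤ constraints has no incentive to leave an edge variable at 0 when both end nodes are selected.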
ILP-Based Algorithm: Branch & Cut
Branch & Cut
• Solve ILP: start by solving the LP relaxation (Solve LP)
• LP solution feasible for the ILP: done
• LP solution infeasible for the ILP: search for a cutting plane, add it to the LP, and solve the LP again
• Otherwise branch on a variable x_{ij}: Solve ILP with x_{ij} = 0 and Solve ILP with x_{ij} = 1
Semi-Flexible Docking: Results
Energetic Evaluation
• Based on Jackson & Sternberg
• Energetic evaluation has to be extended:
– inclusion of internal energies
• AMBER torsion energies:
$\Delta G_{bind} = \Delta G_{ele} + \Delta G_{cav} + \Delta G_{tors}$
ILP-Based Algorithm
Implementation
• Molecular data structures, rotamers, energies: BALL
• Branch-&-Cut: LEDA and ABACUS
• LP solver: CPLEX and SOPLEX
ILP-Based Algorithm: Results
Docking Trypsin/BPTI - Results
Semi-Flexible Docking: Results
Running Times (UltraSparc II, 333 MHz, per candidate)
Avg. time [min]
Stage                      Multi Greedy   ILP
Calculation of energies    30             30
Combinatorial problem      3              14
Side-chain optimization    5              5
Energetic evaluation       70             70
Total                      108            119
Semi-Flexible Docking: Results
Results I
Example      RMSD [Å]
1TPO/4PTI    1.4
1SBC/2CI2    1.8
5CHA/2OVO    3.2
Semi-Flexible Docking: Results
Results II
BPTI (LYS 15)
Semi-Flexible Docking: Results
Results III
[Plot: RMSD [10^-10 m] versus ∆G_bind [kJ/mol] for Trypsin/BPTI]
• Holm & Sander 1992: Monte Carlo
• Bruccoleri & Novotny 1992: Exhaustive Search
• Desmet et al. 1992: Dead-End Elimination
• Totrov & Abagyan 1994a: Monte Carlo
• Totrov & Abagyan 1994b: Monte Carlo
• Laughton 1994: Local Homology Modeling
• Leach 1994: A* Algorithm
• Weng et al. 1996: Exhaustive Search
• Leach & Lemon 1998: A* Algorithm
Integration of Experimental Data
• Main problem in Docking: Energy Function
• Avoid the energy calculation!
• Include experimental data
– Containing geometric information
– Derived from simple and inexpensive experiments
→ Nuclear Magnetic Resonance (NMR)
NMR-Based Protein Docking
Overview
• What is NMR spectroscopy?
• Defining an NMR-based scoring function
– Predicting chemical shifts
– Synthesizing spectra
– Comparing NMR spectra
• Results
NMR - Basics
• ¹H nuclei possess spin angular momentum
• In an external magnetic field B0, each nucleus can assume one of two spin states: α or β
• Supplying energy ∆E = h · ν enables transition
• ∆E depends on
– magnitude of external field
– electronic structure surrounding the nucleus (chemical shift δ)
NMR - Basics
NMR: The Hardware
NMR - Basics
1D ¹H-NMR spectra
191NMR - Basics
Structural Information in Spectra
• Chemical shift in proteins depends on– Topology (chemical environment)
– Geometry (conformation)
• Certain experiments can– identify which proton causes which peak
(assignment)
– yield distance information (e.g., NOE constraints)
NMR-Based Protein Docking
NMR-based Protein Docking
[Figure: the spectrum from the experiment is compared with the spectra of the docking candidates; candidates are ordered by the difference ∆]
NMR-Based Protein Docking
Predicting chemical shifts, one contribution at a time:
• Random coil shift: $\delta = \delta_{RC}$
• Electrostatic field E: $\delta = \delta_{RC} + \delta_{ES}$
• Ring current effect: $\delta = \delta_{RC} + \delta_{ES} + \delta_{RS}$
• Magnetic anisotropy: $\delta = \delta_{RC} + \delta_{ES} + \delta_{RS} + \delta_{Aniso}$
Spectrum Synthesis/Comparison
• Removal of rapidly exchanging protons
• Lorentzian peaks of equal width
�−+
=i
Wi
S2
)(1
1)(δδ
δ
• Spectrum comparison via difference area ∆
−=−∆ δδδ dSSSS expcalcexp.calc |)()(|)( ...
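Spectrum synthesis and comparison can be sketched in a few lines. The sketch assumes the Lorentzian form S(δ) = Σ_i 1 / (1 + (δ - δ_i)² / W) with a single width parameter W and approximates the difference area with the trapezoidal rule on a regular grid. The shift values, the value of W, and the function names are invented for illustration.

```python
def synthesize(shifts, grid, width=0.001):
    """Spectrum as a sum of Lorentzian peaks of equal width at the given shifts."""
    return [sum(1.0 / (1.0 + (d - s) ** 2 / width) for s in shifts) for d in grid]

def difference_area(s_calc, s_exp, grid):
    """Integral of |S_calc - S_exp| over the grid (trapezoidal rule)."""
    area = 0.0
    for k in range(len(grid) - 1):
        f0 = abs(s_calc[k] - s_exp[k])
        f1 = abs(s_calc[k + 1] - s_exp[k + 1])
        area += 0.5 * (f0 + f1) * (grid[k + 1] - grid[k])
    return area

# Chemical shift axis from 0 to 10 ppm in steps of 0.01 ppm.
grid = [i * 0.01 for i in range(1001)]
s_exp = synthesize([1.0, 4.5, 8.2], grid)      # hypothetical experimental shifts
s_near = synthesize([1.05, 4.5, 8.1], grid)    # candidate with similar shifts
s_far = synthesize([2.0, 6.0, 9.0], grid)      # candidate with different shifts
print(difference_area(s_near, s_exp, grid), difference_area(s_far, s_exp, grid))
```

A candidate whose predicted shifts lie close to the experimental ones gives a smaller difference area and would be ranked higher.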
NMR-Based Protein Docking
Synthesized Spectra
NMR-Based Protein Docking
Results – Test Set
• Bound complex structures (PDB)
– 1DT7 A/B
– 1DT7 A/X
– 1CFF
– 1CKK
• NMR assignments (BMRB)
– Reconstruction of exp. spectra
NMR-Based Protein Docking
Results - Overview
Complex     # false positives among top 10   Rank of 1st true positive
1DT7 A/B    2                                1
1CKK        2                                1
1CFF        4                                2
1DT7 A/X    0                                1
NMR-Based Protein Docking
1DT7 A/X (NMR-based)
NMR-Based Protein Docking
1DT7 A/X (ACE)
NMR-Based Protein Docking
Open Problems
• Improving the shift model
– H-bonds
– Solvent effects
• Modeling peak width, couplings?
• Blind predictions of protein complexes
• Validation with larger data set and exp. data
Summary
• Protein Docking is an interesting problem!
• Key difficulties
– Energetic evaluation
– Protein flexibility
• Integration of experimental evidence improves scoring
Scoring Functions for Protein Docking
Free Energy Functions
• Contact-based
– Atomic contact potential
– Residue contact potentials
• Molecular Mechanics
• Surface-based potentials
• …..
Scoring Functions for Protein Docking
Contributions to Free Energy I
• Hydrogen bonds
• Salt bridges
• Entropic contributions
– Solvent effects
– Loss of side-chain entropy
– Loss of DOF on binding
Scoring Functions for Protein Docking
Contributions to Free Energy II
• Electrostatic interactions
– Coulomb-based
– Continuum models (solvent effects)
• Hydrophobic interactions
• Van-der-Waals interactions
• ….
Scoring Functions for Protein Docking
Energetic Evaluation
Jackson, Sternberg (1994):
– continuum electrostatics
• solution of the Poisson-Boltzmann equation
• includes electrostatic solvent effects
– cavitation free energy
• measure for the hydrophobic interaction
$\Delta G_{bind} = \Delta G_{ele} + \Delta G_{cav} + \Delta G_{conf} + \Delta G_{vdW}$
$\Delta G_{bind} = \Delta G_{ele} + \Delta G_{cav}$
Scoring Functions for Protein Docking
Energetic Evaluation
Jackson, Sternberg (1994):
$\Delta G_{bind} = \Delta G_{ele} + \Delta G_{cav}$
$\Delta G_{ele} = \Delta\Delta G^{A}_{sol} + \Delta\Delta G^{B}_{sol} + \Delta G^{A-B}_{int}$
Poisson-Boltzmann Equation
• Spatially dependent dielectric constant ε(r)
– ~2-4 inside protein
– 78 outside (water)
• Charge distribution ρ → potential φ
• Solve differential equation on grid
• Yields total electrostatic free energy ∆G_ES
$\nabla \cdot \left[ \varepsilon(\mathbf{r}) \nabla \varphi(\mathbf{r}) \right] - \kappa^2(\mathbf{r}) \sinh\left( \frac{e_0 \varphi(\mathbf{r})}{kT} \right) = -\frac{\rho(\mathbf{r})}{\varepsilon_0}$
Scoring Functions for Protein Docking
Continuum Electrostatics
[Figure: contributions $\Delta\Delta G^{A}_{sol}$, $\Delta\Delta G^{B}_{sol}$, and $\Delta G^{A-B}_{int}$]
Scoring Functions for Protein Docking
Energetic Evaluation
Jackson, Sternberg (1994):
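$\Delta G_{bind} = \Delta G_{ele} + \Delta G_{cav}$
$\Delta G_{ele} = \Delta\Delta G^{A}_{sol} + \Delta\Delta G^{B}_{sol} + \Delta G^{A-B}_{int}$
$\Delta G_{cav} = \gamma \, \Delta A_{MS} = \gamma \, (A^{AB}_{MS} - A^{A}_{MS} - A^{B}_{MS})$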
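How the two terms combine can be shown with a small worked example. The cavitation term follows ∆G_cav = γ (A_AB - A_A - A_B) for the molecular surface areas; the surface areas, the value of γ, and the electrostatic contributions below are made-up numbers for illustration, not values from the talk.

```python
def cavitation_energy(area_complex, area_a, area_b, gamma):
    """dG_cav = gamma * (A_AB - A_A - A_B), with molecular surface areas."""
    return gamma * (area_complex - area_a - area_b)

def binding_energy(ddg_sol_a, ddg_sol_b, dg_int, dg_cav):
    """dG_bind = dG_ele + dG_cav with dG_ele = ddG_sol^A + ddG_sol^B + dG_int^(A-B)."""
    dg_ele = ddg_sol_a + ddg_sol_b + dg_int
    return dg_ele + dg_cav

# Hypothetical values: complex formation buries 1000 area units of surface.
dg_cav = cavitation_energy(9000.0, 5000.0, 5000.0, gamma=0.25)
print(dg_cav)
print(binding_energy(40.0, 35.0, -120.0, dg_cav))
```

Since the complex exposes less surface than the two separate proteins, the area difference is negative and, for positive γ, the cavitation term favors binding.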
From Rigid to Flexible Docking
Docking of Unbound Structures
• All previous examples are “easy”
– Structures A and B derived from true complex structure (bound structures)
– Geometric complementarity guaranteed!
• More interesting problem:
– Use unbound structures
– Problem: few test sets - A, B, and AB have to be known!
Protein Docking: Introduction
Why Docking?
Helmer-Citterich, Tramontano, J. Mol. Biol. 1994
Decomposition of A/B into Disks
• Calculate protein surface
• Decompose into disks along z-axis
• Decompose disk into segments of equal area
• Differential surface: (r2 – r1), (r3 – r2), …
[Figure after Helmer-Citterich, Tramontano, JMB 1994: differential surface r2 – r1, r3 – r2, r4 – r3, r5 – r4, r6 – r5, …, r1 – rn]
Helmer-Citterich, Tramontano, J. Mol. Biol. 1994
Matching of Disk Segments
• Try to match windows, i.e. stretches of successive radii in a disk, from A and B
• Matching windows define tentative transformations
• Aggregate overlapping windows to regions
Helmer-Citterich, Tramontano, J. Mol. Biol. 1994
Horizontal Window Growing
[Figure: windows on A and B]
Helmer-Citterich, Tramontano, J. Mol. Biol. 1994
Overall Algorithm
• Determine surface of A and B
• Decompose A into disks along z-axis
• For all rotations of B:
– Decompose B into disks
– Find matching windows on differential surface
– Grow windows horizontally
– Grow windows vertically (→ regions)
– Each region defines a transformation