Alchemical Free Energy Calculation With Gromacs


Page 1: Alchemical Free Energy Calculation With Gromacs

Alchemical free energy calculations with gromacs

gromacs workshop - Stanford, CA - Apr 7-8, 2008

John D. Chodera1, David L. Mobley2 , Michael R. Shirts3

1 Department of Chemistry, Stanford University
2 Department of Pharmaceutical Chemistry, University of California, San Francisco
3 Department of Chemistry, Columbia University

Building upon the work of many, many others...


Page 2: Alchemical Free Energy Calculation With Gromacs

Many questions in biology and pharmaceutical chemistry involve free energy differences

How does compound X partition between different environments?
e.g. how well does Lipitor partition between octanol and water?

How tightly does compound X bind protein Y? Protein Z?
e.g. how well does clomifene bind and discriminate between ERα and ERβ?

How do I affect the binding affinity if I modify compound X? Or if I mutate or modify the protein?
e.g. what is the difference in binding between pseudoephedrine and phenylephrine to the adrenergic receptor?

How do one or more mutations at a protein-protein interface affect binding?
e.g. which residues at the interface contribute most to binding, and how can mutations cause disease?


Page 3: Alchemical Free Energy Calculation With Gromacs

Free energy differences are often easier to compute through alchemical routes

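Absolute free energies of binding

[Diagram: thermodynamic cycle. Physical leg: P + L → PL (ΔG_bind). Decoupled-ligand leg: P + ∅ → P∅. The alchemical legs are labeled ΔG_1 and ΔG_2.]

Relative free energies of binding

[Diagram: thermodynamic cycle connecting P + L1, PL1, P + L2, and PL2, with legs labeled ΔG_a and ΔG_b.]

ΔΔG = ΔG_a − ΔG_b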


Page 4: Alchemical Free Energy Calculation With Gromacs

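Alchemical intermediates can facilitate convergence

$Z_n = \int dx \, e^{-\beta U_n(x)}$

$Z = \int e^{-\beta H(x)} \, dx$ (1)

$F = -\beta^{-1} \ln Z$ (2)

$P(x) = \frac{e^{-\beta H(x)}}{Z} = e^{\beta (F - H(x))}$ (3)

$\frac{P_A(x)}{P_B(x)} = \frac{\exp \beta (F_A - H_A(x))}{\exp \beta (F_B - H_B(x))} = e^{\beta (\Delta F - \Delta H(x))}$ (4)

$\frac{P_A(x)}{P_B(x)} = \exp \left( \beta (\Delta F - \Delta E(x)) \right)$ (5)

$\frac{P_A(\Delta E)}{P_B(-\Delta E)} = \exp \left( \beta (\Delta F - \Delta E) \right)$ (6)

$P_A(x) = P_B(x) \, e^{\beta \Delta F - \beta \Delta E(x)}$ (7)

$1 = \left\langle e^{\beta (\Delta F - \Delta E(x))} \right\rangle_B$ (8)

$\Delta F = -\beta^{-1} \ln \left\langle e^{-\beta \Delta E(x)} \right\rangle_B$ (9)

$\Delta F_{1 \to N} = -\beta^{-1} \ln \frac{Z_N}{Z_1} = -\beta^{-1} \ln \left( \frac{Z_2}{Z_1} \cdot \frac{Z_3}{Z_2} \cdots \frac{Z_N}{Z_{N-1}} \right) = \sum_{n=1}^{N-1} \Delta F_{n \to n+1}$ (10)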

D. Wu and D. A. Kofke. "Phase-space overlap measures. I. Fail-safe bias detection...", J. Chem. Phys. 123: 054103 (2005).

[Figure: bias (kT) versus number of samples]

Error increases exponentially with diminishing phase space overlap

Instead, introduce intermediate states to ensure a contiguous chain of good overlap

e.g. two harmonic oscillators separated by various distances
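The harmonic oscillator example can be tried numerically. The sketch below is an illustration under stated assumptions, not the calculation behind the slide: both oscillators have unit spring constant and kT = 1 (so the exact free energy difference is zero), samples are drawn exactly from each Gaussian, and the function names, separation, and sample counts are arbitrary choices.

```python
import math
import random

def exp_estimate(delta_u):
    """EXP estimate in kT: -ln < exp(-dU) >, computed with a stable log-sum-exp."""
    m = max(-du for du in delta_u)
    s = sum(math.exp(-du - m) for du in delta_u)
    return -(m + math.log(s / len(delta_u)))

def staged_exp(distance, n_stages, n_samples, rng):
    """Sum EXP estimates along a chain of oscillators with evenly spaced centres."""
    centres = [distance * i / n_stages for i in range(n_stages + 1)]
    total = 0.0
    for a, b in zip(centres[:-1], centres[1:]):
        # Exact samples from state a, where U_a(x) = (x - a)^2 / 2 and kT = 1.
        xs = [rng.gauss(a, 1.0) for _ in range(n_samples)]
        du = [0.5 * (x - b) ** 2 - 0.5 * (x - a) ** 2 for x in xs]
        total += exp_estimate(du)
    return total

rng = random.Random(2008)
single = staged_exp(6.0, 1, 1000, rng)    # one jump between poorly overlapping states
staged = staged_exp(6.0, 12, 1000, rng)   # twelve short hops with good overlap
print(f"exact +0.000 kT   single-step {single:+.3f} kT   staged {staged:+.3f} kT")
```

With these settings the single-step estimate is usually off by several kT, while the staged sum should land close to zero, at the price of twelve times as many samples.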


Page 5: Alchemical Free Energy Calculation With Gromacs

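Multiple ways to extract free energy differences from simulations of intermediate states

TI (thermodynamic integration): quadrature error hard to quantify

$\Delta F = \int_{\lambda_1}^{\lambda_2} d\lambda \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda} \approx \frac{\Delta \lambda}{2} \left[ \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda_1} + \left\langle \frac{\partial H}{\partial \lambda} \right\rangle_{\lambda_2} \right]$

EXP (exponential reweighting): suffers from large bias and variance

$\Delta F = -\beta^{-1} \ln \left\langle e^{-\beta (U_2 - U_1)} \right\rangle_1 = +\beta^{-1} \ln \left\langle e^{-\beta (U_1 - U_2)} \right\rangle_2$

BAR (Bennett acceptance ratio)

$\Delta F = -\beta^{-1} \ln \frac{\left\langle f(U_2 - U_1) \right\rangle_{\lambda_1}}{\left\langle f(U_1 - U_2) \exp[-\beta (U_2 - U_1)] \right\rangle_{\lambda_2}}$

[Figure: two histogram panels; axis labels not recoverable]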
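The three estimators can be compared on a toy problem with a known answer. The sketch below assumes two harmonic oscillators with spring constants 1 and 4 at kT = 1, linearly interpolated in λ, so the exact result is ½ ln 4. The BAR routine uses the commonly quoted equal-sample-size form with the Fermi function, solved by bisection; that form and all function names are assumptions for illustration rather than the formulation on the slide.

```python
import math
import random

def fermi(x):
    """Fermi function f(x) = 1 / (1 + e^x), with energies in units of kT."""
    return 1.0 / (1.0 + math.exp(x))

def exp_estimate(w):
    """EXP: -ln < exp(-w) > over samples of the energy difference w."""
    return -math.log(sum(math.exp(-wi) for wi in w) / len(w))

def bar_estimate(w_fwd, w_rev, lo=-20.0, hi=20.0):
    """BAR for equal sample sizes: find dF where the two Fermi sums balance."""
    def g(df):
        return (sum(fermi(w - df) for w in w_fwd)
                - sum(fermi(w + df) for w in w_rev))
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:   # g increases monotonically with dF
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

k1, k2, n = 1.0, 4.0, 5000
rng = random.Random(7)
x1 = [rng.gauss(0.0, 1.0 / math.sqrt(k1)) for _ in range(n)]
x2 = [rng.gauss(0.0, 1.0 / math.sqrt(k2)) for _ in range(n)]
w_fwd = [0.5 * (k2 - k1) * x * x for x in x1]   # U2 - U1 sampled in state 1
w_rev = [0.5 * (k1 - k2) * x * x for x in x2]   # U1 - U2 sampled in state 2

exact = 0.5 * math.log(k2 / k1)
# Two-point trapezoid rule: dU/dlambda = U2 - U1 averaged at lambda = 0 and 1.
ti = 0.5 * (sum(w_fwd) / n - sum(w_rev) / n)
exp_fwd = exp_estimate(w_fwd)
bar = bar_estimate(w_fwd, w_rev)
print(f"exact {exact:.3f}  TI {ti:.3f}  EXP {exp_fwd:.3f}  BAR {bar:.3f}")
```

The two-point TI value converges to about 0.94 kT however many samples are drawn, which illustrates the quadrature error; EXP and BAR both approach the exact 0.693 kT in this easy case, where overlap is good.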


Page 6: Alchemical Free Energy Calculation With Gromacs

http://arxiv.org/abs/0801.1426

Statistically optimal analysis of samples from multiple equilibrium states

Multistate Bennett acceptance ratio (MBAR) estimator extracts optimal estimate from multiple states

Michael R. Shirts (Department of Chemistry, Columbia University), John D. Chodera (Department of Chemistry, Stanford University)

https://simtk.org/home/pymbar

- optimal estimator for free energy differences from equilibrium data
- robust estimates of uncertainties
- combine data from multiple temperatures, pressures, bias potentials
- freely available Python implementation
- provides tools to subsample correlated data to extract independent data
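To show what the estimator computes, here is a minimal sketch of the MBAR self-consistent equations solved by direct iteration. It is not the pymbar implementation and does not use its API: the `mbar` function, the three-oscillator test system, and the convergence settings are assumptions for illustration, and a production code would work in log space and also return uncertainties.

```python
import math
import random

def mbar(u_kn, n_k, tol=1e-7, max_iter=500):
    """Solve the MBAR self-consistent equations by direct iteration.

    u_kn[k][n] is the reduced potential of pooled sample n evaluated in
    state k; n_k[k] is the number of samples drawn from state k.
    Returns reduced free energies f_k, shifted so that f_0 = 0.
    """
    n_states, n_samples = len(u_kn), len(u_kn[0])
    boltz = [[math.exp(-u) for u in row] for row in u_kn]
    f = [0.0] * n_states
    for _ in range(max_iter):
        weights = [n_k[k] * math.exp(f[k]) for k in range(n_states)]
        # Mixture denominator: sum_k N_k exp(f_k - u_k(x_n)) for every sample.
        denom = [sum(weights[k] * boltz[k][n] for k in range(n_states))
                 for n in range(n_samples)]
        new_f = [-math.log(sum(boltz[i][n] / denom[n] for n in range(n_samples)))
                 for i in range(n_states)]
        new_f = [fi - new_f[0] for fi in new_f]
        converged = max(abs(a - b) for a, b in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    return f

# Toy system: harmonic oscillators U_k(x) = k x^2 / 2 at kT = 1.
spring = [1.0, 2.0, 4.0]
n_k = [300, 300, 300]
rng = random.Random(42)
samples = [rng.gauss(0.0, 1.0 / math.sqrt(k))
           for k, n in zip(spring, n_k) for _ in range(n)]
u_kn = [[0.5 * k * x * x for x in samples] for k in spring]

f = mbar(u_kn, n_k)
exact = [0.5 * math.log(k / spring[0]) for k in spring]
print("MBAR :", [round(v, 3) for v in f])
print("exact:", [round(v, 3) for v in exact])
```

Every sample contributes to every free energy, whichever state it was drawn from, which is how the estimator combines data from multiple states.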


Page 7: Alchemical Free Energy Calculation With Gromacs

Examples: Free energies of solvation

• Example: Solvation of 3-methyl indole

• 5 ns explicit water MD at 9 states

Method Value

EXP (fwd) 3.80 ± 0.10

EXP (rev) 2.94 ± 0.14

EXP (avg) 3.37 ± 0.17

BAR 3.29 ± 0.06

MBAR 3.23 ± 0.07


Page 8: Alchemical Free Energy Calculation With Gromacs

Optimal alchemical intermediates depend on problem

Goals in choice of intermediates:
- few intermediates
- good overlap between intermediates
- minimize correlation times
- minimize variance and size of space to be sampled

For example: 128 intermediates for extracting toluene from water (probably overkill)

T. Steinbrecher, D. L. Mobley, and D. A. Case. "Non-linear scaling schemes for Lennard-Jones interactions in free energy calculations", J. Chem. Phys. 127: 214108 (2007).


Page 9: Alchemical Free Energy Calculation With Gromacs

Hydration free energies

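[Diagram: thermodynamic cycle for hydration of a solute S (structure drawing omitted). The solute in gas, S_gas, is transferred to water, S_aq, with free energy ΔG. The alchemical path passes through the uncharged states S_gas,neutral and S_aq,neutral and the noninteracting state ∅.]

M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. J. Chem. Phys. 119:5740 (2003)
D. L. Mobley, E. Dumont, J. D. Chodera and K. A. Dill. J. Phys. Chem. B. 111:2242-2254 (2007)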


Page 10: Alchemical Free Energy Calculation With Gromacs

Choice of intermediates for hydration free energy

Two ways to remove the solute's interactions:
- annihilation: no interactions within annihilated molecule retained
- decoupling: interactions within annihilated molecule retained

Electrostatics
- annihilation: potentially large ∆G; short correlation times; requires vacuum recharging calculation
- decoupling: smaller ∆G; longer correlation times due to naked charges; no vacuum recharging calculation required; 3 PME evaluations per timestep, tricky to use

Lennard-Jones
- annihilation: requires LJ annihilation vacuum simulation
- decoupling: no vacuum simulation required; eliminates potentially unphysical conformations

Also possible to turn off both electrostatics and LJ simultaneously (e.g. pulling in 4th dimension), but much more tricky.


Page 11: Alchemical Free Energy Calculation With Gromacs

Hydration free energies: Example workflow

M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. J. Chem. Phys. 119:5740 (2003)
D. L. Mobley, E. Dumont, J. D. Chodera and K. A. Dill. J. Phys. Chem. B. 111:2242-2254 (2007)

IUPAC name
→ molecule: Lexichem (OpenEye toolkit), http://www.eyesopen.com
→ conformations (mol2): Omega (OpenEye toolkit), http://www.eyesopen.com
→ AMBER molecule parameters: GAFF + AM1-BCC via Antechamber (AmberTools), http://ambermd.org/#AmberTools
→ gromacs molecule topology: amb2gmx, http://www.alchemistry.org


Page 12: Alchemical Free Energy Calculation With Gromacs

Hydration free energies: Example workflow

M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. J. Chem. Phys. 119:5740 (2003)
D. L. Mobley, E. Dumont, J. D. Chodera and K. A. Dill. J. Phys. Chem. B. 111:2242-2254 (2007)

gromacs molecule topology
→ gromacs solvated topology and coordinates (many painful steps)
→ custom topology modification scripts, one set per leg:

discharging in water
- zero charges in 'B state'
- 5 intermediates

decoupling LJ in water
- zero charges in 'A state' and 'B state'
- [nonbond_params] to retain intramolecular LJ throughout
- 16 intermediates

recharging in vacuum
- zero charges in 'B state'
- 5 intermediates

at least one simulation per intermediate


Page 13: Alchemical Free Energy Calculation With Gromacs

Hydration free energies: Example workflow

M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. J. Chem. Phys. 119:5740 (2003)
D. L. Mobley, E. Dumont, J. D. Chodera and K. A. Dill. J. Phys. Chem. B. 111:2242-2254 (2007)

Each simulation requires postprocessing to compute the potential of each configuration x at all intermediates: U_1(x), U_2(x), U_3(x), ..., U_K(x)

Each leg of the thermodynamic cycle (discharging, LJ decoupling, recharging) is analyzed separately

Errors are propagated to final result:

$\delta^2 \Delta G = \delta^2 \Delta G_{\mathrm{discharge}} + \delta^2 \Delta G_{\mathrm{decouple}} + \delta^2 \Delta G_{\mathrm{recharge}}$
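Propagating the errors of the three legs amounts to adding the leg free energies and combining their uncertainties in quadrature. The sketch below assumes the legs are statistically independent. The numerical values are hypothetical, and the sign convention (the three legs summed take the solute from water to gas, so the hydration free energy is the negative of the sum) is an assumption to check against the cycle actually simulated.

```python
import math

def combine_legs(legs):
    """Sum leg free energies and add their uncertainties in quadrature."""
    total = sum(dg for dg, _ in legs)
    sigma = math.sqrt(sum(err ** 2 for _, err in legs))
    return total, sigma

# Hypothetical (value, uncertainty) pairs in kcal/mol, for illustration only:
# discharging in water, decoupling LJ in water, recharging in vacuum.
legs = [(8.10, 0.03), (-2.40, 0.04), (-1.20, 0.01)]

transfer, sigma = combine_legs(legs)
hydration = -transfer   # assumed sign convention, see the note above
print(f"hydration free energy: {hydration:.2f} +/- {sigma:.2f} kcal/mol")
```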


Page 14: Alchemical Free Energy Calculation With Gromacs

Simulation parameters

M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. J. Chem. Phys. 119:5740 (2003)
D. L. Mobley, E. Dumont, J. D. Chodera and K. A. Dill. J. Phys. Chem. B. 111:2242-2254 (2007)

Nonbonded parameters

; NEIGHBORSEARCHING PARAMETERS
nstlist = 10 ; nblist update frequency
ns_type = grid ; ns algorithm (simple or grid)
pbc = xyz ; Periodic boundary conditions: xyz or no
rlist = 1.0 ; nblist cut-off

; OPTIONS FOR ELECTROSTATICS AND VDW
; Method for doing electrostatics
coulombtype = PME ; particle-mesh Ewald
rcoulomb = 0.9 ; direct-space cutoff
vdw-type = switch ; use switch function with LJ for best energy conservation
rvdw-switch = 0.8 ; switch on
rvdw = 0.9 ; switch off (cutoff)
DispCorr = AllEnerPres ; apply long-range dispersion correction for proper densities

; PME parameters determined by regression to ensure error is < 0.01 kcal/mol
fourierspacing = 0.10 ; spacing for FFT grid
pme_order = 6 ; be wary of bug with earlier versions of gromacs where pme_order = 6 gives garbage
ewald_rtol = 1e-06 ; relative tolerance
ewald_geometry = 3d
epsilon_surface = 0

Thermostat and barostat

; RUN CONTROL PARAMETERS
integrator = sd ; use Langevin thermostat with weak viscosity for best thermostatting

; OPTIONS FOR TEMPERATURE COUPLING
tc_grps = System ; thermostat entire system
tau_t = 1.0 ; weak viscosity to not slow down phase space diffusion too much
ref_t = 300 ; desired temperature

; OPTIONS FOR PRESSURE COUPLING
Pcoupl = No ; is Rahman-Parrinello still broken? Currently, we equilibrate each intermediate with Berendsen independently,
            ; then fix volume for production simulation


Page 15: Alchemical Free Energy Calculation With Gromacs

Comparison of charge models for fixed-charge forcefields: Small molecule hydration free energies in explicit solvent

Élise Dumont†, David L. Mobley‡, John D. Chodera§, and Ken A. Dill*‡
† Laboratoire de Chimie Théorique, Université Pierre et Marie Curie-CNRS, Paris
‡ Department of Pharmaceutical Chemistry and § Graduate Group in Biophysics, University of California, San Francisco
*E-mail: [email protected]

Doctoral thesis of the Université Pierre et Marie Curie

Specialty: Computational and Theoretical Chemistry

Doctoral school: Chimie physique et analytique Paris Centre

presented by Ms. Élise Dumont

for the degree of Doctor of the Université Pierre et Marie Curie

Thesis subject:

Use of fictitious nuclear charges to study the electronic effects of substituents: the H* method. Application to the study of pure inductive effects and to the comparison of inductive and mesomeric effects on spectroscopic or reactivity quantities.

Defended on 14 June 2006 before a jury composed of:

Prof. Patrick Chaquin, Université Paris VI, thesis supervisor
Dr. Frank De Proft, Université Libre de Bruxelles, reviewer (rapporteur)
Dr. Philippe Hiberty, CNRS – Université Paris-Sud, examiner
Prof. Ludovic Jullien, ENS – Université Paris VI, examiner
Prof. Jean-Louis Rivail, Université Nancy I, reviewer (rapporteur)


TABLE 1: Experimental (a) and computed (b) hydration free energies for molecules in this study (in kcal/mol); summary statistics only. Charge models other than AM1-BCC and AM1-CM2 are RESP fits to the ab initio electrostatic potential. TZ denotes cc-pVTZ.

                        TIP4P-Ew                      TIP3P
Charge model            AUE   RMSE  R2    Slope       AUE   RMSE  R2    Slope
AM1-BCC                 1.24  1.51  0.93  1.22        0.92  1.10  0.94  1.03
AM1-CM2                 1.55  2.27  0.78  1.22        1.38  1.97  0.75  1.04
SCF/6-31G*              1.27  1.58  0.88  1.15        0.82  1.04  0.94  0.98
SCF/6-31G* SCRF         2.43  3.69  0.90  1.72        1.62  2.17  0.96  1.39
B3LYP/6-31G*            2.03  2.40  0.81  0.89        1.62  1.97  0.92  0.79
B3LYP/6-31G* SCRF       1.65  2.53  0.89  1.47        0.91  1.29  0.95  1.21
B3LYP/TZ                1.93  2.32  0.85  0.86        1.58  1.90  0.94  0.79
B3LYP/TZ SCRF           1.45  1.96  0.90  1.34        0.78  1.00  0.95  1.13
MP2/TZ                  1.46  1.79  0.84  1.07        1.00  1.29  0.91  0.93
MP2/TZ SCRF             2.46  3.97  0.88  1.76        1.66  2.35  0.94  1.40

(a) Experimental hydration free energies were taken from Abraham et al., 1990, and have an uncertainty of 0.2 kcal/mol.
(b) All computed free energies have a computed uncertainty (one standard deviation of the mean) of less than 0.1 kcal/mol.
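The four summary statistics in Table 1 can be computed from paired computed and experimental values as sketched below. The definitions are assumptions, since the poster does not spell them out: AUE is the mean absolute difference, R2 is the squared Pearson correlation, and the slope is the least-squares slope of computed against experimental values. The data are made up for illustration.

```python
import math

def fit_statistics(computed, experiment):
    """Return (AUE, RMSE, R^2, slope) for computed versus experimental values."""
    n = len(computed)
    diffs = [c - e for c, e in zip(computed, experiment)]
    aue = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    mean_c = sum(computed) / n
    mean_e = sum(experiment) / n
    sxx = sum((e - mean_e) ** 2 for e in experiment)
    syy = sum((c - mean_c) ** 2 for c in computed)
    sxy = sum((e - mean_e) * (c - mean_c) for c, e in zip(computed, experiment))
    slope = sxy / sxx              # least-squares slope of computed vs experiment
    r2 = sxy * sxy / (sxx * syy)   # squared Pearson correlation
    return aue, rmse, r2, slope

# Made-up values in kcal/mol; computed = 1.25 * experiment + 0.25 exactly.
experiment = [-5.0, -3.0, -1.0, 1.0]
computed = [-6.0, -3.5, -1.0, 1.5]
aue, rmse, r2, slope = fit_statistics(computed, experiment)
print(f"AUE {aue:.2f}  RMSE {rmse:.2f}  R2 {r2:.2f}  slope {slope:.2f}")
```

In this made-up case R2 is 1 while the AUE is 0.5 kcal/mol and the slope is 1.25, which is why the table reports all four numbers rather than the correlation alone.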

Error across all solutes for all charge and water models:

Thick vertical line shows the experimental measurement from Abraham et al., while thin lines show the 95% confidence interval for agreement with experiment, factoring in both experimental uncertainties (0.2 kcal/mol) and computed uncertainties (less than 0.1 kcal/mol).

Computed and experimental hydration free energies for TIP3P

[Figure: two panels showing computed and experimental hydration free energies for each solute, with structure drawings. Horizontal axis: Hydration Free Energy (kcal/mol), from -20 to 5.]

Legend: experiment, AM1-BCC, AM1-CM2, SCF/6-31G*, SCF/6-31G* SCRF, B3LYP/6-31G*, B3LYP/6-31G* SCRF, B3LYP/cc-pVTZ, B3LYP/cc-pVTZ SCRF, MP2/cc-pVTZ, MP2/cc-pVTZ SCRF

Solutes, first panel: methane, propane, n-butane, isobutane, n-pentane, cyclopentane, pent-1-ene, butadiene, 1-chloro-pentane, 1-bromo-pentane, methanethiol, propanethiol, ethylmethylsulfide, diethylether, methanol, ethanol, propan-1-ol, 1,1,1-trifluoropropan-2-ol, ethanamide, propionamide, penta-2-one, diethylamine

Solutes, second panel: triethylphosphate, benzene, toluene, ethylbenzene, p-xylene, indane, naphthalene, biphenyl, fluorobenzene, benzonitrile, nitrobenzene, aniline, quinone, phenol, p-cresol, o-cresol, benzamide, pyridine, pyrrole, thiophene, 3-methylindole, 4-methylimidazole

How well do SASA and cavity volume correlate with the nonpolar component of explicit solvent hydration free energies?

Implicit solvent hydration free energies and atom-type dependent SASA model from:

R. C. Rizzo, T. Aynechi, D. A. Case, and I. D. Kuntz. Estimation of absolute free energies of hydration using continuum methods: Accuracy of partial charge models and optimization of nonpolar contributions. J. Chem. Theor. Comput., 2:128–139, 2006.

[Figure: three scatter plots (TIP3P), each against the LJ component of hydration free energy (kcal/mol) on the horizontal axis, from -3.0 to 0.0:
- solvent-accessible surface area: Surface area (nm^2)
- cavity volume: Volume (nm^3)
- atom type dependent SASA of Rizzo et al.: Sum of gamma_i*SA_i]

Poor correlation suggests a more sophisticated treatment may be beneficial to implicit solvent models.

Acknowledgments

Funding:
ED gratefully acknowledges a graduate fellowship from the French Research Ministry.
JDC acknowledges support by HHMI and IBM predoctoral fellowships.
DLM and KAD acknowledge NIH grants GM34993 and GM06392 and Pfizer.

Many great people for helpful discussions: William Swope, Jed Pitera, Julia Rice (IBM Almaden Research Center), Terry Lang and Matt Jacobson (UCSF), Christopher Bayly (Merck Frosst), Michael Shirts (Columbia), Vijay Pande (Stanford), and Chris Oostenbrink (Vrije Universiteit, Amsterdam).

Misc materials: Eric Sorin and Vijay Pande (Stanford) for an AMBER-to-gromacs converter; Kaushik Raha (UCSF) for AM1-BCC charges; Rob Rizzo for implicit solvent data.

Computational resources:
This work was performed in part on the UCSF QB3 Shared Computing Facility.
Quantum chemical calculations were performed on CCR clusters (University Paris VI).

Charge models evaluated:
- AM1-CM2
- AM1-BCC
- HF/6-31G*, B3LYP/6-31G*, B3LYP/cc-pVTZ, MP2/cc-pVTZ: each with and without SCRF; charges derived using RESP

Solvent models:
- TIP3P
- TIP4P-Ew

Figure 7. Sample partial charges for several small compounds, AM1BCC and RESP from B3LYP/cc-pVTZ+SCRF and SCF/6-31G* electrostatic potentials. For these four compounds, ab initio charges are quite different from empirical ones. The oxygen charge is more negative with ab initio methods. For ethanol and 1-chloro-pentane, we got slightly counter-intuitive charges, with positive carbons and negative hydrogens, but the effect was relatively small.

some charges for ethanol

Does inclusion of a polarization correction aid agreement with experiment?

                        TIP4P-Ew                      TIP3P
Charge model            AUE   RMSE  R2    Slope       AUE   RMSE  R2    Slope
SCF/6-31G* SCRF         1.73  2.67  0.92  1.42        0.93  1.26  0.97  1.16
B3LYP/6-31G* SCRF       1.41  1.86  0.92  1.23        0.85  1.04  0.96  1.03
B3LYP/TZ SCRF           1.41  1.69  0.92  1.14        0.92  1.13  0.97  0.96
MP2/TZ SCRF             1.88  3.09  0.90  1.47        1.02  1.44  0.95  1.17

TABLE 5: Statistics for corrected SCRF hydration free energies that include the quantum mechanical energy cost of polarizing the charge distribution from the vacuum distribution to that computed using SCRF. The computed errors are not significantly better than those in Table 1, although they are more consistent across the levels of theory.

                        TIP4P-Ew                      TIP3P
Charge model            AUE   RMSE  R2    Slope       AUE   RMSE  R2    Slope
SCF/6-31G*              1.55  1.89  0.90  0.93        1.43  1.83  0.94  0.81
SCF/6-31G* SCRF         1.73  2.67  0.92  1.42        0.93  1.26  0.97  1.16
B3LYP/6-31G*            2.76  3.24  0.85  0.72        2.48  2.96  0.93  0.66
B3LYP/6-31G* SCRF       1.41  1.86  0.92  1.23        0.85  1.04  0.96  1.03
B3LYP/TZ                2.67  3.18  0.85  0.73        2.43  2.91  0.94  0.66
B3LYP/TZ SCRF           1.41  1.69  0.92  1.14        0.92  1.13  0.97  0.96
MP2/TZ                  1.85  2.26  0.84  0.89        1.65  2.10  0.91  0.77
MP2/TZ SCRF             1.88  3.09  0.90  1.47        1.02  1.44  0.95  1.17

TABLE 4: Comparison of experimental with computed hydration free energies that include an estimate of polarizing the electronic charge distribution from gas to condensed phase, computed from the B3LYP/cc-pVTZ quantum chemical calculations with and without SCRF.

When transferring a solute from gas to water, there is an energetic cost in polarizing the electronic density in response to the environment (before accounting for favorable interactions of the polarized charge density with the environment). Fixed-charge forcefields cannot capture this effect. Does attempting to add in the cost of polarizing the electronic wavefunction (computed by QM to be up to 2.5 kcal/mol in magnitude) improve correlation with experiment?

vacuum charge distribution, in vacuum

polarized charge distribution, in vacuum

polarized charge distribution, in condensed phase

B3LYP/TZ used to estimate polarization energy for all charge models

polarization energy estimated for each pair of calculations with and without SCRF used individually

Correlation with experiment does not improve significantly when polarization corrections are included.

Alchemical hydration free energy computation:
- GAFF parameters for solute assigned by ANTECHAMBER
- Absolute hydration free energies computed by annihilating solute electrostatic interactions and decoupling Lennard-Jones interactions with solvent

Steps and their free energy components:
- annihilation of solute electrostatics: ΔG_elec,wat
- decoupling of solute LJ interactions: ΔG_nonpolar
- restoration of electrostatics in vacuum: ΔG_elec,gas

All simulations conducted with gromacs. Detailed protocol with references available upon request.

R2 = 0.01 R2 = 0.11

                                                                Subset by functionalities
Water model        Charge model                All compounds   Alkanes (6)  Arenes (6)  Alcohols (6)
Implicit [13]      AM1BCC                      1.35            0.26         0.29        1.41
                   AMSOL                       2.46            0.31         1.34        2.08
                   RESP (SCF/6-31G*)           1.28            0.49         0.23        2.00
Explicit TIP3P     AM1BCC                      0.92            0.51         0.30        1.43
                   AMSOL                       1.38            0.41         0.73        1.49
                   RESP (SCF/6-31G*)           0.82            0.47         1.05        0.84
                   RESP (B3LYP/cc-pVTZ+SCRF)   0.78            0.47         0.60        0.46
Explicit TIP4P-Ew  AM1BCC                      1.24            1.09         0.75        1.06
                   AMSOL                       1.55            1.09         0.99        1.41
                   RESP (SCF/6-31G*)           1.27            0.89         0.85        1.41
                   RESP (B3LYP/cc-pVTZ+SCRF)   1.45            0.87         1.08        1.04

TABLE 3: Influence of water model on hydration free energy error. The average unsigned errors (AUE) for several charge sets are shown. These were computed from our data, and are compared to results for the same set of compounds in implicit solvent from the work of Rizzo et al. [13] (data provided by R. Rizzo). Overall results are compared, as well as several subfamilies of compounds. Values are given in kcal/mol. Surprisingly, implicit solvent performs almost as well as explicit solvent. It is also important to note that there seem to be significant performance differences between AM1-BCC and RESP charges for alcohols in explicit solvent. This is probably because implicit solvent does not capture the explicit hydrogen bonds that are important for such molecules.

Questions:

Is a particular model for computing fixed partial atomic charges for small molecules clearly superior?

How well can fixed-charge forcefields reproduce experimental hydration free energies?

Can we gain insight into what is necessary to improve correlation with experiment?

Motivation:

Alchemical free energy methods now allow the calculation of absolute binding free energies of small molecules to proteins with sufficient precision to potentially be useful. But how accurate can we expect these computed binding free energies to be? Can they be predictive?

A comprehensive study of computed binding free energies would currently be too computationally costly, but small molecule hydration free energies are now rapidly computable.

The best charge models give discrepancies with experimental hydration free energies of only ~ 1 kcal/mol.

Despite superior bulk properties of TIP4P-Ew, the older TIP3P gives superior hydration free energies for this forcefield.

The inexpensive AM1-BCC method does nearly as well as the expensive B3LYP/cc-pVTZ + SCRF + RESP.

[Figure: two scatter plots, one for TIP3P and one for TIP4P-Ew. Horizontal axis: Experimental Hydration Free Energy (kcal/mol). Vertical axis: Computed Hydration Free Energy (kcal/mol). Legend: AM1-BCC, AM1-CM2, SCF/6-31G*, SCF/6-31G* SCRF, B3LYP/6-31G*, B3LYP/6-31G* SCRF, B3LYP/cc-pVTZ, B3LYP/cc-pVTZ SCRF, MP2/cc-pVTZ, MP2/cc-pVTZ SCRF.]

SCF/6-31G* charges yield hydration free energies that are too positive, while AM1-BCC charges tend to give hydration free energies that are too negative.

Amides, 4-methylimidazole, and triethylphosphate appear to be particularly problematic.

How do explicit solvent hydration free energies compare with implicit solvent?

Implicit solvent appears to perform nearly as well (1.3 kcal/mol error from experiment) for small molecules at much less computational cost.

Average unsigned error from experimental hydration free energies (from Abraham et al.) in kcal/mol.

Graphics generated by PyMOL.

D. L. Mobley, E. Dumont, J. D. Chodera and K. A. Dill. J. Phys. Chem. B. 111:2242-2254 (2007)

Page 16: Alchemical Free Energy Calculation With Gromacs

of complementarity remains a major challenge in molecular docking and has been subject to intense investigation (recently reviewed by Gohlke & Klebe [1]). With adequate sampling, an ideal scoring function should correctly rank a large set of dissimilar molecules and predict the correct modes of binding. Unfortunately, genuinely adequate sampling, as for instance might be available through thermodynamic integration, free energy perturbation and related methods [2–5], would be much too slow to be used in docking screens of compound databases that can include over a quarter of a million molecules. Investigators have thus turned to faster, less accurate scoring functions for docking screens. Encouragingly, several of these have been used to predict novel ligands [6–12].

As important as these successes are, they are not thought to represent a general solution to the scoring problem in molecular docking, which continues to be plagued by false negatives and false positives. There remains much interest in developing better scoring functions.

In developing new scoring functions, the field is confronted with the problem of testing the effect of a new method. Typically, new scoring functions are evaluated for their ability to reproduce known ligand-binding patterns for well-studied receptors. This can take the form of determining if the known ligands are ranked favorably in a screen of a database that contains mostly decoys, or testing if the experimental geometries of ligand–receptor complexes are reproduced. Recently, several investigators have compiled databases containing thermodynamic data of binding and, in some cases, structural information of ligand–receptor complexes to facilitate this effort [13,14]. Such studies are certainly useful for testing the reliability of existing scoring functions as well as designing and parameterizing new ones. However, because of the entanglement of various energetic contributions in ligand–receptor binding (e.g. desolvation of the ligand and the binding site, and conformational accommodation on binding), isolating the effect of particular changes in scoring functions can be difficult on the basis of retrospective analysis alone. It would be useful to have model systems that allow one to experimentally test prospective predictions from new docking algorithms. Ideally, such a model system would be simple enough to allow one to isolate the modification introduced in the new scoring function from other aspects of the docking calculation.

Here we use a cavity created in the core of T4 lysozyme as a prototype model binding site to test a modification to a docking scoring function. This cavity site was created by substituting Leu99 with Ala (L99A) in the core of the enzyme [15]. It is completely buried from solvent, small (volume about 150 Å^3), uniformly hydrophobic, and contains no ordered water molecules; it comes close to being a naked binding site [16] (Figure 1). Binding has been shown for 57 mostly apolar small molecules [16,17] (listed in Supplementary Material); X-ray crystal structures have been determined for nine of these molecules in complex with this site. In contrast, some polar isosteres of the known ligands are not found to bind to the cavity site. For instance, although toluene binds to L99A with a Ka value of 9.8 × 10^3 M^-1, phenol and aniline are not observed to bind. Thus, although the L99A site is simple, its ligand preferences nevertheless capture the subtleties found in more complicated systems. This is a requirement for a model system. Experimentally, L99A is very accessible. Acquiring new possible ligands, most of which are expected to be commercially available, determining their binding energies, and determining the structures of their complexes with L99A, are all relatively straightforward. These features make testing predictions for L99A practical, which is also important for a model system.

We use this cavity site to investigate the effect of different ligand partial atomic charges and solvation energies on docking calculations. The docking scoring function we use includes electrostatic and van der Waals interaction energies, and is corrected for ligand desolvation energies

Figure 1. (a) The molecular surface (yellow) of the cavity in T4 lysozyme mutant L99A. Carbon atoms are in gray, oxygen atoms in red, nitrogen atoms in blue and sulfur atoms in yellow. For clarity, only the protein atoms that surround the cavity are shown. (b) A cut-away view of the cavity reveals the bound benzene (PDB entry 181L). Residue Met102 is labeled. All molecular graphics were rendered with NEON in Midas-Plus [65]; all molecular surfaces were calculated with MS [66].

340 A Model Binding Site for Molecular Docking

Energetics of Ligand Binding in a Nonpolar Cavity

[Figure 2: structure drawings omitted]

Class I - "Isophobic" Ligands: propylbenzene; p-, m-, o-ethyltoluene; ethylbenzene; p-, m-, o-xylene

Class II - "Isosteric" Ligands: indene, indole, benzofuran, thianaphthene

Class III - Phenylalkanes: benzene, toluene, ethyl-, propyl-, butyl-, and iso-butylbenzene

FIGURE 2: Three classes of ligands used for analysis of cavity binding. For reference, one ligand from each class is shown with atom labels. Substituent atoms are labeled α, β, etc. by analogy with protein side chains.

[Figure 3: titration trace omitted; horizontal axis: Injection]

FIGURE 3: Representative titration profile for isobutylbenzene (~0.1 mM) titrated with L99A (4 mM) in 50 mM sodium acetate, pH 5.5. The offset upper trace shows L99A titrated into buffer without ligand. Injections of 10 µL of the protein solution were made every 2.5 min into the 1.4 mL reaction cell. After subtraction of blank runs, titrations were fit as described under Experimental Procedures to obtain the data in Table 2.

includes structural isomers of ethylbenzene and propylbenzene. These isomers have transfer free energies similar to their respective parents and were chosen to probe various regions of the cavity wall for possible differences in steric constraints in the presence of equivalent hydrophobic effects. Class II, the "isosterics", contains four isosteric molecules of varying hydrophobicity. Because these molecules are expected to have similar steric interactions with the cavity, they should, in principle, allow a direct analysis of the hydrophobic contribution to the binding energy. Class III is comprised of monosubstituted alkylbenzenes of increasing side-chain length. These were chosen to assess the relation between ligand size and binding energy in a manner similar to the protein mutagenesis studies of the type Gly → Ala → Val → Leu, etc.

Titration calorimetry was used to determine quantitative association constants at 29 °C for the binding to L99A of the ligands described above. A representative titration is shown in Figure 3. The binding energetics are presented in Table 2. Dissociation constants for the various ligands range from 14 to 500 µM, comparable to many enzyme-substrate

Biochemistry, Vol. 34, No. 27,1995 8567

Table 2: Calorimetric Analysis of Ligand Binding (a)

ligand            Ka x 10^-3 (M^-1)   -ΔH (kcal/mol)   -RT ln Ka (kcal/mol)
benzene           5.7 ± 1.7           6.32 ± 0.37      -5.19 ± 0.16
ethylbenzene      14.8 ± 1.7          6.76 ± 0.87      -5.76 ± 0.07
o-xylene          2.13 ± 0.22         8.45 ± 0.96      -4.6 ± 0.06
m-xylene          2.75 ± 0.8          6.04 ± 0.03      -4.75 ± 0.15
p-xylene          2.37 ± 0.25         6.97 ± 0.98      -4.61 ± 0.06
propylbenzene     55.2 ± 2.0          9.97 ± 0.05      -6.55 ± 0.02
2-ethyltoluene    1.98 ± 0.20         7.71 ± 0.74      -4.56 ± 0.06
3-ethyltoluene    5.05 ± 0.15         7.84 ± 0.02      -5.12 ± 0.02
4-ethyltoluene    8.33 ± 0.08         8.44 ± 0.03      -5.42 ± 0.01
benzofuran        8.9 ± 0.5           8.04 ± 0.44      -5.46 ± 0.03
indene            5.17 ± 0.09         8.31 ± 0.48      -5.13 ± 0.01
indole            3.45 ± 0.38         11.23 ± 0.94     -4.89 ± 0.06
thianaphthene     13.6 ± 1.2          7.03 ± 0.04      -5.71 ± 0.05
toluene           9.8 ± 0.6           6.53 ± 0.73      -5.52 ± 0.04
isobutylbenzene   51.0 ± 4.9          7.09 ± 0.35      -6.51 ± 0.06
n-butylbenzene    69.8 ± 2.9          8.06 ± 0.98      -6.7 ± 0.02

(a) Ka is the association constant and ΔH the molar enthalpy of binding of the ligand to L99A lysozyme. Errors are given as the standard deviation of the mean calculated from multiple runs except in the cases of m-xylene, 2-ethyltoluene, propylbenzene, and thianaphthene, where the errors given are based on the goodness of the fit to the data.
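The last column of Table 2 follows from the first through -RT ln Ka. The sketch below recomputes it for three ligands, assuming the stated measurement temperature of 29 °C and the gas constant in kcal/(mol K); the function name is an illustrative choice.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def binding_free_energy(ka_per_molar, temp_celsius=29.0):
    """Return -RT ln(Ka) in kcal/mol for an association constant in M^-1."""
    temp_kelvin = temp_celsius + 273.15
    return -R_KCAL * temp_kelvin * math.log(ka_per_molar)

for name, ka in [("benzene", 5.7e3), ("toluene", 9.8e3), ("n-butylbenzene", 69.8e3)]:
    print(f"{name:15s} {binding_free_energy(ka):6.2f} kcal/mol")
```

The results are about -5.19, -5.52, and -6.70 kcal/mol, matching the tabulated values to rounding.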

binding constants. For all ligands, the molar enthalpy of

binding is large and negative, unlike the enthalpy of transfer

of liquid hydrocarbons from water to the neat organic phase,

which is typically very close to zero at room temperature

(Gill et al., 1976).
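As a quick consistency check on Table 2, the last column can be recomputed from the first. The sketch below assumes T = 29 °C (302.15 K), R = 1.987204e-3 kcal/(mol K), and a 1 M standard state; the function names are illustrative and do not come from the paper.

```python
import math

R_KCAL = 1.987204e-3       # gas constant in kcal/(mol K)
T_KELVIN = 29.0 + 273.15   # the titrations were run at 29 °C


def binding_free_energy(ka_per_molar, temperature_k=T_KELVIN):
    """Return -RT ln(Ka) in kcal/mol for an association constant given in M^-1."""
    return -R_KCAL * temperature_k * math.log(ka_per_molar)


def dissociation_constant_um(ka_per_molar):
    """Return Kd = 1/Ka, converted from M to micromolar."""
    return 1.0e6 / ka_per_molar


if __name__ == "__main__":
    # Ka values (M^-1) copied from Table 2.
    for name, ka in [("benzene", 5.7e3),
                     ("2-ethyltoluene", 1.98e3),
                     ("n-butylbenzene", 69.8e3)]:
        print(f"{name}: dG = {binding_free_energy(ka):.2f} kcal/mol, "
              f"Kd = {dissociation_constant_um(ka):.0f} uM")
```

With these inputs benzene comes out near -5.19 kcal/mol, and the weakest and tightest binders give Kd of roughly 505 and 14 µM, in line with the quoted 14 to 500 µM range.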

The observed binding energies of the ligands are compared with their free energies of transfer from water to vapor in Figure 4 and from water to octanol in Figure 5. Among the class I molecules, the xylenes and ethyltoluenes bind much more poorly than ethyl- and propylbenzene, respectively, and their binding free energies do not correlate well with transfer free energies (Figures 4a, 5a). Furthermore, there is no agreement between the binding free energies of the different structural isomers of ethylbenzene and those of propylbenzene. This indicates that the binding energetics of the class I molecules are strongly influenced by steric factors.

The class II molecules show only a rough correlation between their binding and transfer free energy, despite their nearly identical shapes and sizes (Figures 4b, 5b).

Among the class III molecules, binding becomes tighter as the side-chain becomes longer, up to a maximum of four carbons. This is in accord with the expectation that the hydrophobic effect provides a large contribution to binding free energy. Figure 5c shows the relation between binding energy and free energy of solvation as reflected in water-octanol partition coefficients (slope = 0.56; R = 0.97).

Entropic Consequences of Binding. To directly compare the observed binding free energies with solvent-transfer free energies, we must account for those contributions to the binding free energy that arise from purely statistical sources and which differ between the binding and solvent-transfer processes. One such contribution arises from the entropic cost of constraining a ligand to occupy a single conformation in the binding site, relative to its translational, rotational, and internal degrees of freedom in solution. In the present analysis, we assume that the ligands lose all rotational and translational freedom in the bound state and that their vibrational partition function does not change upon binding. For each such mode the cost of constraining the ligand is given by ΔG ≈ ΔA = -RT ln q, where q is the partition

Ligand binding free energies

Wei BQ, Baase WA, Weaver LH, Matthews BW, and Shoichet BK. JMB 322:339, 2002.

model hydrophobic binding site in T4 lysozyme L99A

16

Page 17: Alchemical Free Energy Calculation With Gromacs

Ligand binding free energies

Alchemical thermodynamic cycle provides alternative route to binding free energy

Alchemical transformation progresses through a number of intermediates

[Figure: thermodynamic cycle connecting P + L and PL (ΔGbind) through the intermediates P + ø and Pø. Graphics from David Mobley.]

17

Page 18: Alchemical Free Energy Calculation With Gromacs

Restraints are used to aid convergence

Without restraining the ligand in the binding pocket, we would need to sample the entire simulation box at each discharging/decoupling intermediate.

Choice of atoms to restrain is arbitrary in principle, with minor practical differences among choices.

Absolute Binding Free Energies: A Quantitative Approach for Their Calculation. Boresch, S.; Tettinger, F.; Leitgeb, M.; Karplus, M. J. Phys. Chem. B 2003, 107(35), 9535-9551. DOI: 10.1021/jp0217839

18

Page 19: Alchemical Free Energy Calculation With Gromacs

The standard state for binding free energies

Association constant is not unitless:

$$K_a = \frac{[PL]}{[P][L]} = \frac{e^{-\beta \Delta G_a}}{(1\ \mathrm{M})}$$

Free energy of binding is defined with respect to a reference state.
Standard reference state is 1 M ligand (1 ligand / 1660 Å^3).

Luckily, if you use Boresch’s ligand restraints to aid convergence, he has already computed the free energy correction for you!

Absolute Binding Free Energies: A Quantitative Approach for Their Calculation. Boresch, S.; Tettinger, F.; Leitgeb, M.; Karplus, M. J. Phys. Chem. B 2003, 107(35), 9535-9551. DOI: 10.1021/jp0217839
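To make the standard-state bookkeeping concrete, here is a minimal sketch that recovers the 1660 Å^3 per ligand figure and evaluates the analytical restraint correction in the form commonly quoted from Boresch et al. (2003). The function name, argument names, and units (kcal/mol, Å, radians, harmonic restraints U = ½K(x - x0)²) are assumptions made for illustration; check the expression and its sign convention against the paper before relying on it.

```python
import math

AVOGADRO = 6.02214076e23   # 1/mol
KB_KCAL = 1.987204e-3      # Boltzmann constant in kcal/(mol K)

# Volume available to one molecule at 1 M, in cubic angstroms (1 L = 1e27 Å^3).
STANDARD_VOLUME_A3 = 1.0e27 / AVOGADRO


def boresch_correction(k_r, k_theta_a, k_theta_b, k_phi_a, k_phi_b, k_phi_c,
                       r0, theta_a0, theta_b0, temperature_k=300.0):
    """Analytical standard-state restraint term (kcal/mol), assumed form:

    dG = -kT ln[ 8 pi^2 V0 sqrt(K_r K_thA K_thB K_phA K_phB K_phC)
                 / (r0^2 sin(thA0) sin(thB0) (2 pi kT)^3) ]
    """
    kt = KB_KCAL * temperature_k
    force_term = math.sqrt(k_r * k_theta_a * k_theta_b * k_phi_a * k_phi_b * k_phi_c)
    numerator = 8.0 * math.pi ** 2 * STANDARD_VOLUME_A3 * force_term
    denominator = (r0 ** 2 * math.sin(theta_a0) * math.sin(theta_b0)
                   * (2.0 * math.pi * kt) ** 3)
    return -kt * math.log(numerator / denominator)


if __name__ == "__main__":
    print(f"standard volume: {STANDARD_VOLUME_A3:.1f} cubic angstroms per ligand")
    # Hypothetical restraint parameters, chosen only to exercise the function.
    dg = boresch_correction(10.0, 10.0, 10.0, 10.0, 10.0, 10.0,
                            r0=5.0, theta_a0=math.pi / 2, theta_b0=math.pi / 2)
    print(f"restraint correction: {dg:.2f} kcal/mol")
```

The correction depends only on the restraint parameters and temperature, which is why it can be evaluated once, outside the simulation.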

19

Page 20: Alchemical Free Energy Calculation With Gromacs

Anisotropic long-range dispersion correction

Simulations in solvent must be run with long-range dispersion correction to ensure results are not sensitive to choice of cutoff.

; Apply long range dispersion corrections for Energy and Pressure
DispCorr = AllEnerPres

M. R. Shirts*, D. L. Mobley*, J. D. Chodera, and V. S. Pande. "Accurate and efficient corrections for missing dispersion interactions in molecular simulations", J. Phys. Chem. B 111:13052-13063 (2007).

This correction assumes isotropic distribution of Lennard-Jones sites throughout system.

[Figure: two panels labeled "isotropic assumption holds" and "isotropic assumption fails", with cutoff radii of 9 Å and 25 Å marked.]

Instead, we have to enlarge cutoff so that isotropic assumption holds.

[Figure: the binding process P + L → PL evaluated with the 9 Å cutoff and again with the enlarged cutoff, where the isotropic assumption holds.]

An explicit postprocessing step recomputes energies with large cutoff and estimates perturbation free energies using EXP.

Can make a difference of 3 kcal/mol, depending on number of ligand atoms.
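The EXP postprocessing step can be sketched as follows. This is a minimal illustration of exponential averaging (the Zwanzig relation), assuming the inputs are reduced energy differences Δu = β[U_large-cutoff(x) - U_9Å(x)] evaluated on snapshots from the short-cutoff simulation; the function name is hypothetical and this is not the workshop's actual script.

```python
import math


def exp_free_energy(reduced_energy_differences):
    """Return the EXP estimate -ln <exp(-du)> in units of kT.

    The smallest du is factored out before exponentiating so the
    average stays numerically stable for large energy differences.
    """
    du = list(reduced_energy_differences)
    shift = min(du)
    mean_boltzmann = sum(math.exp(-(x - shift)) for x in du) / len(du)
    return shift - math.log(mean_boltzmann)


if __name__ == "__main__":
    # A constant shift in energy gives back exactly that shift.
    print(exp_free_energy([2.5, 2.5, 2.5]))
```

Applying this to each end state of the cycle gives the perturbation free energies that correct the short-cutoff result to the large-cutoff one.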

20

Page 21: Alchemical Free Energy Calculation With Gromacs

Protein workflow

M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. J. Chem. Phys. 119:5740 (2003)
D. L. Mobley, E. Dumont, J. D. Chodera and K. A. Dill. J. Phys. Chem. B. 111:2242-2254 (2007)

PDB file(s) + sequence
  -> MODELLER (http://www.salilab.org/modeller)
full-chain heavy-atom models
  -> MCCE (http://www.sci.ccny.cuny.edu/~mcce)
pH-appropriate all-atom models
  -> tleap / sleap (http://ambermd.org/#AmberTools)
AMBER solvated system
  -> amb2gmx (http://www.alchemistry.org)
gromacs solvated system topology and coordinates

~ or ~ pdb2gmx and friends

inject ligand topology

21

Page 22: Alchemical Free Energy Calculation With Gromacs

Protein workflow

M. R. Shirts, J. W. Pitera, W. C. Swope, and V. S. Pande. J. Chem. Phys. 119:5740 (2003)
D. L. Mobley, E. Dumont, J. D. Chodera and K. A. Dill. J. Phys. Chem. B. 111:2242-2254 (2007)

gromacs solvated topology and coordinates

gromacs solvated topology and coordinates

add ligand restraints

restraining ligand in complex

decoupling ligand LJ in complex

recharging ligand in vacuum

custom topology modification scripts

- restraints defined
- zero ligand charges
- [nonbond_params] to retain intramolecular LJ throughout

- restraints defined
- zero charges in ‘B state’

discharging ligand in complex

- restraints defined
- zero ligand charges in ‘B state’

22

Page 23: Alchemical Free Energy Calculation With Gromacs

Multiple ligand conformations can contribute

Y88. The simulations were initiated from the orientation of Fig. 4(a), since it is most similar to the orientation in the cocrystal structure.

The computed free energy (see Table I) is 5.95±0.09 kcal/mol with 1 ns at each λ value, and is 5.39±0.09 with 5 ns at each λ value (for every λ value, not just for the restraining step). No conformations were discarded in calculating these values. If computed transfer free energies are converged, they should be the same whether orientational restraints or only distance restraints are used. This is not the case here. Even with substantially increased sampling during the discharging and Lennard-Jones decoupling steps, this is an error of approximately 4 kcal/mol. This suggests that the use of orientational restraints greatly improves the ease of convergence of these calculations.

Interestingly, we find that, even without orientational re-

FIG. 6. Stable orientations of catechol observed from unrestrained simulations initiated from docking clusters. From the unrestrained simulations, we identify two main stable orientations for catechol, shown in (a) and (b), between which we see no transitions, so we conduct separate simulations restraining to each of these orientations, as for phenol. Shown are final snapshots from the 5 ns molecular dynamics trajectories run with full restraints on the ligand.

FIG. 7. Time series and histogram of in-plane rotation for weakly restrained catechol. Time series (left) and histograms (right) for the degree of freedom which describes in-plane rotation of the aromatic ring in the binding site, for simulations at the weakest restraints, λ=0.01. (a) shows a 5 ns simulation beginning from orientation 1 and (b) shows a 5 ns simulation beginning from orientation 2. Clearly, there is no interchange between the two orientations here, as would be required for convergence.

084902-12 Mobley, Chodera, and Dill, J. Chem. Phys. 125, 084902 (2006)

Difference is only 0.7 kcal/mol!

D. L. Mobley, J. D. Chodera, K. A. Dill. J. of Chem. Phys. 125:084902, 2006.
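When separate calculations are run with the ligand restrained to each stable orientation, the per-orientation results can be combined by summing Boltzmann factors, ΔG = -kT ln Σ_i exp(-β ΔG_i). The sketch below assumes each ΔG_i is a standard binding free energy in kcal/mol for one non-interconverting orientation; the function name and the input values in the example are placeholders, not results from the paper.

```python
import math

KB_KCAL = 1.987204e-3  # Boltzmann constant in kcal/(mol K)


def combine_orientations(binding_free_energies, temperature_k=300.0):
    """Combine per-orientation binding free energies (kcal/mol) into one value."""
    kt = KB_KCAL * temperature_k
    lowest = min(binding_free_energies)
    # Factor out the most favorable orientation for numerical stability.
    total = sum(math.exp(-(dg - lowest) / kt) for dg in binding_free_energies)
    return lowest - kt * math.log(total)


if __name__ == "__main__":
    # Two equally favorable orientations lower the total by kT ln 2.
    print(combine_orientations([-5.0, -5.0]))
```

A second orientation can never make the combined value less favorable than the best single orientation; at most it contributes kT ln 2 when both are equally stable.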

23

Page 27: Alchemical Free Energy Calculation With Gromacs

Multiple protein conformations can contribute

[Figure: PMF (kcal/mol) versus dihedral angle (degrees) for Val111 χ1 in the apo structure; the apo and holo conformations are marked, with confinement free energies G = 4.24, 2.82, and 0.01 labeled.]

binding free energy          -7.3 kcal/mol    -3.0 kcal/mol
confinement free energy       4.2 kcal/mol     0.0 kcal/mol
net                          -3.1 kcal/mol    -3.0 kcal/mol
release                      -0.3 kcal/mol    -0.6 kcal/mol
total binding free energy    -3.4 ± 0.3   ≈   -3.6 ± 0.3

D. L. Mobley, J. D. Chodera, and K. A. Dill, J. of Chem. Theory and Comput. 3(4):1231-1235 (2007).
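The bookkeeping on this slide is a plain sum of components, which the sketch below reproduces from the numbers shown. The column labels "first" and "second" are placeholders, since the transcript does not preserve the column headings.

```python
# Free energy components in kcal/mol, copied from the slide.
COLUMNS = {
    "first": {"binding": -7.3, "confinement": 4.2, "release": -0.3},
    "second": {"binding": -3.0, "confinement": 0.0, "release": -0.6},
}


def net_free_energy(components):
    """Binding plus confinement: the restrained-conformation result."""
    return components["binding"] + components["confinement"]


def total_free_energy(components):
    """Net value plus the free energy of releasing the restraint."""
    return net_free_energy(components) + components["release"]


if __name__ == "__main__":
    for label, components in COLUMNS.items():
        print(f"{label}: net = {net_free_energy(components):.1f}, "
              f"total = {total_free_energy(components):.1f} kcal/mol")
```

Both columns land within the quoted ±0.3 kcal/mol of each other, which is the point of the slide: the two starting conformations agree once confinement and release are accounted for.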

24

Page 28: Alchemical Free Energy Calculation With Gromacs

Table 4. Novel ligands for which predictions were made.

Ligand                    DOCK score    Prediction (1)   ΔG°calc (2)     ΔTm      Experiment   ΔG°expt
                          (kcal/mol)                     (kcal/mol)      (°C)                  (kcal/mol)
1,2-dichlorobenzene       -19.99        Binder           -5.66 ± 0.15    2.90     Binder       -6.37
n-methylaniline           -17.29        Binder           -5.37 ± 0.11    1.00     Binder       -4.70
1-methylpyrrole           -15.27        Binder           -4.32 ± 0.08    2.20     Binder       -4.44
1,2-benzenedithiol        -18.51        Binder           -2.79 ± 0.13    2.50     Binder       N.D.
thieno-[2,3-c]pyridine    -18.81        Nonbinder        -2.56 ± 0.07    -0.40    Nonbinder    N.D.

DOCK scores, shown, suggested all five should bind. Binding free energy calculations were initially used to predict whether or not these molecules would bind, then ΔTm values were found experimentally to test these predictions; results are in the Experiment column. Following this, final binding free energy predictions (ΔG°calc) were tested experimentally with isothermal titration calorimetry; results are as shown (ΔG°expt). The RMS difference between predicted ΔG° and experiment for the three compounds tested with ITC is 0.57 kcal/mol. (1) Initial predictions were made using AM1-CM2 charges. (2) Before doing ITC, predictions were refined using AM1-BCC charges, which testing had indicated gave higher accuracy.

Free energy calculations can be predictive

Remarkable prediction accuracy of 0.5 kcal/mol!

i.e. “We got really lucky on a small test set.”

D. L. Mobley, A. P. Graves, J. D. Chodera, A. C. McReynolds, B. K. Shoichet and K. A. Dill. J. of Mol. Biol. 371(4):1118-1134 (2007).
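The RMS difference quoted in the Table 4 caption can be recomputed from the three ligands that have ITC measurements. This is a small sketch using the table values; the dictionary and function names are illustrative.

```python
import math

# Predicted and measured binding free energies (kcal/mol) from Table 4,
# restricted to the three ligands measured by ITC.
PREDICTED = {"1,2-dichlorobenzene": -5.66, "n-methylaniline": -5.37, "1-methylpyrrole": -4.32}
MEASURED = {"1,2-dichlorobenzene": -6.37, "n-methylaniline": -4.70, "1-methylpyrrole": -4.44}


def rms_error(predicted, measured):
    """Root-mean-square difference over the ligands present in both sets."""
    shared = [name for name in predicted if name in measured]
    squared = [(predicted[name] - measured[name]) ** 2 for name in shared]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    print(f"RMS error: {rms_error(PREDICTED, MEASURED):.2f} kcal/mol")
```

The result rounds to 0.57 kcal/mol, matching the caption; with only three data points, the slide's caution about a small test set applies.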

25

Page 29: Alchemical Free Energy Calculation With Gromacs

Checklist of potential concerns in binding calculations

[Figure: chemical structures illustrating each concern; salts shown: NaCl, MgCl2.]

Protein conformation
- Which conformation is most likely?
- Conformational change upon binding
- Multiple conformations contributing to binding

Post-translational modifications
- Phosphorylation, glycosylation, acylation, alkylation

Protein protonation state
- Appropriate choice of protonation state
- Change in protonation state upon binding
- Mixture of protonation states relevant to binding

Ligand protonation/tautomeric state
- Appropriate choice of protonation/tautomeric state
- Change in protonation/tautomeric state upon binding
- Mixture of protonation/tautomeric states relevant to binding

Salt environment
- Salt required for function
- Appropriate salt parameters
- Other cosalts, cosolvents, and chelators

26

Page 30: Alchemical Free Energy Calculation With Gromacs

Checklist of potential concerns in binding calculations

Ligand parameter assignment
- Anecdotal reports of Antechamber issues
- Antechamber: http://amber.scripps.edu/antechamber/

Protein forcefield choice
- parm96 deprecated; parm03 unvalidated for free energies
- ffAMBER (ffamber96, ffamber99sb, ffamber03): http://chemistry.csulb.edu/ffamber/

Modified amino acid parameters
- Don’t have time to rederive appropriate parameters
- Only found parameters for parm99
- AMBER parameter database: http://www.pharmacy.manchester.ac.uk/bryce/amber

Simulation timescales
- Can we converge estimates for even a single conformation state?

Cofactors or other peptides bound?

Image by Leo Reynolds

27

Page 46: Alchemical Free Energy Calculation With Gromacs

Relative binding free energy calculations

GAFF-parameterized ligands -> modified topology files

Maximum common substructure search (OpenEye toolkit, http://www.eyesopen.com) identifies the common substructures.

[Figure: each ligand is divided into a common substructure and a variable region.]

[Figure: four alchemical simulations connect L1 with C1, PL1 with PC1, L2 with C2, and PL2 with PC2.]

With four simulations, bonded terms cancel out

Avoid bridging rings in variable part
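The way the four simulations combine can be sketched as below. The sign convention is an assumption: each leg is taken as the free energy of transforming the full ligand into its common-substructure state (L → C in solvent, PL → PC in complex), and the C1 and C2 end states are assumed to bind identically because the leftover bonded terms cancel between complex and solvent. The function and argument names are hypothetical.

```python
def relative_binding_free_energy(dg_l1_to_c1, dg_pl1_to_pc1,
                                 dg_l2_to_c2, dg_pl2_to_pc2):
    """Return dG_bind(L2) - dG_bind(L1) from four alchemical legs.

    For each ligand, dG_bind(L) = dG_bind(C) - [dG(PL -> PC) - dG(L -> C)].
    With dG_bind(C1) = dG_bind(C2), the common-substructure term drops out.
    """
    offset_1 = dg_pl1_to_pc1 - dg_l1_to_c1
    offset_2 = dg_pl2_to_pc2 - dg_l2_to_c2
    return offset_1 - offset_2


if __name__ == "__main__":
    # Placeholder leg values in kcal/mol, used only to show the arithmetic.
    print(relative_binding_free_energy(2.0, 5.0, 1.0, 2.0))
```

Swapping the roles of the two ligands flips the sign of the result, which is a useful sanity check on any script that assembles the cycle.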

28

Page 47: Alchemical Free Energy Calculation With Gromacs

How accurately can we compute free energies?

Legend: retrospective RMS error [sample size]; prospective RMS error [sample size] (not to scale)

hydration free energies of small neutral molecules
- 1.23±0.01 kcal/mol [502] (Mobley et al., in preparation)
- 1.33±0.05 kcal/mol [17] (Nicholls and Mobley et al., J Med Chem)

small apolar ligands, T4 lysozyme L99A
- 1.89±0.04 kcal/mol [13] (Mobley and Graves et al., JMB 2007)
- 0.6±0.2 kcal/mol [3] (Mobley and Graves et al., JMB 2007)

polar ligands, FKBP12
- 1.42 kcal/mol [9]
- 0.94 kcal/mol [7] (Shirts et al., in preparation)

...

JNK3 kinase
- 6.3 kcal/mol [44] (Haque, Chodera, Shirts, Mobley, Pande)

29

Page 48: Alchemical Free Energy Calculation With Gromacs

(Proteins to scale)

JNK3 kinase, FKBP12, T4 lysozyme L99A

30

Page 49: Alchemical Free Energy Calculation With Gromacs

Complexity of problems can explain varying accuracy

Ordered from easy to hard (not to scale):

hydration free energies of small neutral molecules
- solvent only
- small, neutral molecules
- fixed protonation states

small apolar ligands, T4 lysozyme L99A
- small, rigid protein
- small, neutral ligands
- fixed protonation states
- multiple sidechain orientations
- multiple ligand binding modes

polar ligands, FKBP12
- small, rigid protein
- fixed protonation states
- larger drug-like ligands, rotatable bonds

...

JNK3 kinase
- large protein, multiple conformations
- large drug-like ligands, rotatable bonds
- multiple protonation states? tautomers?
- phosphorylation and activation
- peptide substrate?
- MgCl2 salt effects?

31

Page 50: Alchemical Free Energy Calculation With Gromacs

The method of expanded ensembles

Form an expanded ensemble by allowing transitions between thermodynamic states:

$$p(x, k) = Z^{-1} \exp[-u_k(x) + g_k]$$

with partition function

$$Z = \sum_{k=1}^{K} Z_k \exp[g_k]$$

where we have introduced log weights g_k to bias sampling of states.

Current configuration now consists of (x, k) pair.

Lyubartsev et al. New approach to Monte Carlo calculations of the free energy: Method of expanded ensembles. JCP 96:1776, 1992.

Specific realizations of expanded ensemble methods are more familiar:

simulated tempering (exchanges among temperatures): Marinari and Parisi. Europhys. Lett. 19:451, 1992; Mitsutake and Okamoto. Chem. Phys. Lett. 332:131, 2000.

simulated scaling (exchanges among potential functions): Wei Yang and friends. JCP 126:024106, 2007.
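Given a fixed configuration x, the joint distribution above implies a conditional distribution over states, p(k|x) ∝ exp[-u_k(x) + g_k]. The sketch below evaluates that conditional and draws a new state from it (a plain Gibbs move, not the Metropolized variant). The function names are illustrative, and the reduced potentials are assumed to have been computed elsewhere.

```python
import math
import random


def state_probabilities(reduced_potentials, log_weights):
    """Return p(k|x) for each state k, proportional to exp[-u_k(x) + g_k]."""
    exponents = [g - u for u, g in zip(reduced_potentials, log_weights)]
    top = max(exponents)  # subtract the maximum to avoid overflow
    unnormalized = [math.exp(e - top) for e in exponents]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def gibbs_state_move(reduced_potentials, log_weights, rng):
    """Draw a new state index directly from p(k|x)."""
    probabilities = state_probabilities(reduced_potentials, log_weights)
    return rng.choices(range(len(probabilities)), weights=probabilities)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    u = [1.0, 2.0, 3.0]   # placeholder reduced potentials at the current x
    g = [1.0, 2.0, 3.0]   # weights that exactly offset them
    print(state_probabilities(u, g))
    print(gibbs_state_move(u, g, rng))
```

When the weights offset the reduced potentials, as in the example, every state is equally likely; that is the situation the log weights g_k are meant to approach on average.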

32

Page 51: Alchemical Free Energy Calculation With Gromacs

The reduced potential

Define the reduced potential for a state k as a combination of terms

$$u_k(x) = \beta_k \left[ U_k(x) + p_k V(x) + \mu_k^T N(x) \right]$$

with thermodynamic parameters for each state

- β_k: inverse temperature
- U_k: potential energy function
- p_k: external pressure
- µ_k: chemical potential of exchangeable species

The distribution function is given by

$$p_k(x) = Z_k^{-1} \exp[-u_k(x)], \qquad Z_k = \int dx \, \exp[-u_k(x)]$$

where

- x: microstate or configuration
- V(x): volume of simulation box
- N(x): number of each chemical species in system

Covers many common thermodynamic ensembles: NVT, NPT, µVT, µPT

33
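For the NVT and NPT cases, the reduced potential is a one-line calculation once units are fixed. The sketch below assumes gromacs-style units (kJ/mol, bar, nm^3) and omits the chemical potential term; the function name and constants are introduced here for illustration.

```python
KB_KJ = 0.0083144626              # Boltzmann constant in kJ/(mol K)
BAR_NM3_TO_KJ_MOL = 0.0602214076  # 1 bar * 1 nm^3 expressed in kJ/mol


def reduced_potential(potential_kj_mol, temperature_k,
                      pressure_bar=None, volume_nm3=None):
    """Return u = beta * (U + p V), dropping the p V term when no pressure is given."""
    beta = 1.0 / (KB_KJ * temperature_k)
    energy = potential_kj_mol
    if pressure_bar is not None:
        if volume_nm3 is None:
            raise ValueError("volume_nm3 is required when pressure_bar is set")
        energy += pressure_bar * volume_nm3 * BAR_NM3_TO_KJ_MOL
    return beta * energy


if __name__ == "__main__":
    # NVT: only the potential energy contributes.
    print(reduced_potential(-100.0, 300.0))
    # NPT: a 27 nm^3 box at 1 bar adds a small p V term.
    print(reduced_potential(-100.0, 300.0, pressure_bar=1.0, volume_nm3=27.0))
```

Working in reduced units means the same estimator code can be reused across temperatures, pressures, and alchemical states.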

Page 52: Alchemical Free Energy Calculation With Gromacs

Expanded ensemble methods allow multiple ligand binding modes to be accessed in a single simulation

One iteration:
- p(x|k): MD or MC
- p(k|x): MC state change attempt
- weight/state adaptation

Monte Carlo moves allow alchemical state to change during simulation

Weights g_k must be updated iteratively during simulation to ensure all states are visited a sufficient number of times

[Figure: alchemical state versus simulation time (ps) for jnk.aff-57.]

[Figure: free energy (kT) versus iteration (ps).]

JNK3 kinase

Imran Haque
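The weight adaptation step can be illustrated with a toy Wang-Landau loop. This sketch makes several simplifying assumptions: the states have known reduced free energies f_k, the state index is drawn directly from its marginal π(k) ∝ exp(g_k - f_k) as a stand-in for MD plus a state move, and the flatness test (every count within a fraction of the mean) is one reading of the flatness criterion, not necessarily what the modified gromacs code does. All names are hypothetical.

```python
import math
import random


def is_flat(histogram, flatness):
    """True when every count lies within flatness * mean of the mean count."""
    if min(histogram) == 0:
        return False
    mean = sum(histogram) / len(histogram)
    return all(abs(count - mean) <= flatness * mean for count in histogram)


def wang_landau(free_energies, n_iterations, delta=0.25, scale=0.5,
                flatness=0.2, seed=0):
    """Adapt log weights g_k so that all states are visited about equally."""
    rng = random.Random(seed)
    n_states = len(free_energies)
    weights = [0.0] * n_states
    histogram = [0] * n_states
    for _ in range(n_iterations):
        # Toy state move: sample k from pi(k), proportional to exp(g_k - f_k).
        exponents = [g - f for g, f in zip(weights, free_energies)]
        top = max(exponents)
        probabilities = [math.exp(e - top) for e in exponents]
        k = rng.choices(range(n_states), weights=probabilities)[0]
        # Penalize the visited state so it becomes less likely next time.
        weights[k] -= delta
        histogram[k] += 1
        if is_flat(histogram, flatness):
            delta *= scale
            histogram = [0] * n_states
    return weights, delta


if __name__ == "__main__":
    g, final_delta = wang_landau([0.0, 2.0, 4.0, 6.0], n_iterations=5000)
    print([round(x - g[0], 2) for x in g], final_delta)
```

As the increment shrinks, the differences between the weights settle toward the differences between the free energies, which is why the converged weights double as a free energy estimate.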

34

Page 53: Alchemical Free Energy Calculation With Gromacs

Expanded ensembles code considerably simplifies free energy calculations

Only need to define A and B perturbation topologies in a single topology file -- no need for specially-prepared multiple topology files!
No reprocessing to compute energies at ‘foreign lambdas’.

; OPTIONS FOR EXPANDED ENSEMBLE SIMULATIONS
; Free energy control stuff
free-energy = decouple ; decouple or annihilate electrostatics and Lennard-Jones
nstfep = 50 ; 0.1 ps between weight updates (must be integer multiple of nstlist)
nstdgdl = 50 ; 0.1 ps between writing energies (must be same as nstfep for analysis scripts)

; weight update scheme
lambda-mc = gibbs-wang-landau ; Wang-Landau with waste recycling
mc-wldelta = 0.25 ; initial delta factor for Wang-Landau (in kT)
mc-wlscale = 0.5 ; scalar by which delta is scaled for Wang-Landau
mc-nratio = 0.2 ; flatness criterion -- histograms are reset after all states are sampled within mc-nratio factor of the mean

; state transition probability
move-mc = metropolized-gibbs ; Metropolized Gibbs for fastest mixing of states

; starting and stopping
mc-nstart = 0 ; number of updates to perform per state for driving through each state
mc-nequil = 500000 ; number of steps before freezing weights (1 ns)

init-lambda = 1 ; initial state

; schedule for switching off lambdas
; first, restraints are turned on as charges are switched off
; next, vdw and torsions are switched off
fep-lambda = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.00 0.0 0.00 0.0 0.0 ; for global scaling (don't need)
coul-lambda = 0.0 0.1 0.2 0.3 0.5 0.7 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.00 1.0 1.00 1.0 1.00 1.0 1.0 ; for scaling electrostatics
restraint-lambda = 0.0 0.1 0.2 0.3 0.5 0.7 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.00 1.0 1.00 1.0 1.00 1.0 1.0 ; for scaling restraints
vdw-lambda = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.65 0.7 0.75 0.8 0.85 0.9 1.0 ; for scaling vdw interactions
bonded-lambda = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.65 0.7 0.75 0.8 0.85 0.9 1.0 ; for scaling torsions

sc-alpha = 0.5 ; soft-core factor
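Because every lambda array must describe the same set of alchemical states, it is worth checking the schedule before launching a run. The sketch below parses the lambda lines from the listing and verifies that all arrays have equal length, stay within [0, 1], and never decrease. The helper names are hypothetical and this check is not part of gromacs.

```python
SCHEDULE = """
fep-lambda = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.0 0.00 0.0 0.00 0.0 0.0 ; global scaling
coul-lambda = 0.0 0.1 0.2 0.3 0.5 0.7 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.00 1.0 1.00 1.0 1.00 1.0 1.0 ; electrostatics
restraint-lambda = 0.0 0.1 0.2 0.3 0.5 0.7 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.00 1.0 1.00 1.0 1.00 1.0 1.0 ; restraints
vdw-lambda = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.65 0.7 0.75 0.8 0.85 0.9 1.0 ; vdw
bonded-lambda = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.65 0.7 0.75 0.8 0.85 0.9 1.0 ; torsions
"""


def parse_schedule(text):
    """Parse 'name = v1 v2 ... ; comment' lines into a dict of float lists."""
    schedule = {}
    for line in text.strip().splitlines():
        line = line.split(";", 1)[0]  # drop trailing comments
        if "=" not in line:
            continue
        name, values = line.split("=", 1)
        schedule[name.strip()] = [float(v) for v in values.split()]
    return schedule


def check_schedule(schedule):
    """Return the number of states, raising ValueError on an inconsistent schedule."""
    lengths = {len(values) for values in schedule.values()}
    if len(lengths) != 1:
        raise ValueError(f"lambda arrays differ in length: {sorted(lengths)}")
    for name, values in schedule.items():
        if any(v < 0.0 or v > 1.0 for v in values):
            raise ValueError(f"{name} has values outside [0, 1]")
        if any(later < earlier for earlier, later in zip(values, values[1:])):
            raise ValueError(f"{name} is not non-decreasing")
    return lengths.pop()


if __name__ == "__main__":
    print(check_schedule(parse_schedule(SCHEDULE)), "alchemical states")
```

For the listing above the check reports 20 states, with charges and restraints changing over the first seven and the vdw and bonded terms over the rest.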

http://www.simtk.org/home/gromacs_dg - Heavily modified gromacs 3.1.4 from Michael R. Shirts

35

Page 54: Alchemical Free Energy Calculation With Gromacs

Visit alchemistry.org for more information

36

Page 55: Alchemical Free Energy Calculation With Gromacs

Recent review of note:

M. R. Shirts, D. L. Mobley, and J. D. Chodera. "Alchemical free energy calculations: Ready for prime time?", Annual Reports in Computational Chemistry 3:41-59 (2007).

Many thanks to

- Michael R. Shirts
- David L. Mobley
- Imran Haque
- The gromacs developers

This talk (with expanded references and example input files) will be available on alchemistry.org

37