
ISSN 0012-5008, Doklady Chemistry, 2008, Vol. 419, Part 1, pp. 57–61. © Pleiades Publishing, Ltd., 2008. Original Russian Text © D.A. Shulga, A.A. Oliferenko, S.A. Pisarev, V.A. Palyulin, N.S. Zefirov, 2008, published in Doklady Akademii Nauk, 2008, Vol. 419, No. 2, pp. 214–218.

CHEMISTRY

Parameterization of Empirical Schemes of Partial Atomic Charge Calculation for Reproducing the Molecular Electrostatic Potential

D. A. Shulga, A. A. Oliferenko, S. A. Pisarev, V. A. Palyulin, and Academician N. S. Zefirov

Received October 26, 2007

DOI: 10.1134/S001250080803004X

Moscow State University, Vorob’evy gory, Moscow, 119992 Russia
Institute of Physiologically Active Compounds, Russian Academy of Sciences, Chernogolovka, Moscow oblast, 142432 Russia

The importance and universality of electrostatic interactions in molecular modeling [1], whether in molecular mechanics force fields, scoring functions, quantitative structure–property and structure–activity relationships (QSPR/QSAR), or calculations of the solvation energy of molecules, make it necessary to describe these interactions both rapidly and adequately. This is often achieved by calculating the Coulomb interaction energy of a system of atom-centered point charges. For these charges to be used adequately in molecular modeling, they should reproduce the molecular electrostatic potential (MEP) well [2–4]. The value of the MEP at a given point in space equals the interaction energy of a unit positive charge placed at this point with the unperturbed charge density of the molecule. Quantum-chemical calculation of the MEP is impractical for generating charges for a large number of new structures, for example, in virtual screening of databases or QSPR/QSAR studies, and for modeling large molecules.

In this work, we propose using previously developed charge schemes capable of rapid charge generation [5, 6], which, after parameter optimization, adequately reproduce the quantum-chemical (HF/6-31G*) MEP of diverse organic structures. These schemes are compared with known charge schemes (MK-ESP [7], RESP [8, 9], MMFF94 [10], AM1-BCC [11], and Mulliken population analysis) with respect to how well they describe the MEP and the dipole moment components.

The two previously developed empirical charge schemes are based on the principle of partial electronegativity equalization over bonds; they differ in the number of parameters and in how detailed a representation of the chemical structure they use [5, 6]. In the molecular graph (MG) model, the electronegativities of covalently bonded atoms are equalized, whereas in the orbital graph (OG) model, the electronegativities of valence orbitals are equalized. The latter makes it possible to describe subtler features of the charge distribution in a structure, at the cost of a larger number of parameters. Empirical schemes based on the MG and OG models have several qualitative advantages: high calculation speed, dependence of atomic charges on their chemical environment, topological symmetry of the charges, and reliance on atom connectivity alone, without the spatial coordinates of the atoms.

Preliminary studies demonstrated that empirical charge schemes can reproduce the MEP correctly once their parameters are refined. To optimize the MG and OG model parameters for reproducing the MEP, we used an approach similar to that used in [12] for reproducing RESP charges. A training set of 227 diverse organic compounds was prepared. It contains representatives of the main classes of organic compounds: alcohols, amines, thiols, carboxylic acids, halo derivatives, amides, nitro and nitroso compounds, amino alcohols, thioamides, hydroxycarboxylic acids, enols, diazines, phosphorus and sulfur compounds in various oxidation states, and others. For each structure, after geometry optimization, the MEP was calculated from the quantum-chemical wave function on a grid of points [7] with a density of three points per Å², located on four surfaces formed by spheres whose radii were obtained by scaling the atomic van der Waals radii by factors of 1.4, 1.6, 1.8, and 2.0. In total, the MEP was calculated at 471632 grid points around the molecules of the set. All quantum-chemical calculations were performed at the HF/6-31G* level of theory with the PC GAMESS software [13]. This combination of level of theory and basis set makes it possible to use the calculated charges directly in molecular modeling with classical molecular mechanics force fields, in particular, AMBER and MMFF94.

To optimize the MG and OG scheme parameters, we used the target function

$$
F(\boldsymbol{\chi}, \boldsymbol{\eta}, \ldots) = \sum_{i=1}^{N} \sum_{j=1}^{N_{\mathrm{grid},i}} \left( V_{ij} - \sum_{l=1}^{N_{\mathrm{at},i}} \frac{q_{il}}{r_{jl}} \right)^{2}, \qquad q_{il} = q_{il}(\boldsymbol{\chi}, \boldsymbol{\eta}, \ldots). \tag{1}
$$

To assess the final quality of the reproduction of the MEP for a set of structures, we used the target function

$$
\Delta V = \left( \frac{1}{N} \sum_{i=1}^{N} \Delta_i^{2} \right)^{1/2}, \tag{2}
$$

$$
\Delta_i = \left( \frac{1}{N_{\mathrm{grid},i}} \sum_{j=1}^{N_{\mathrm{grid},i}} \left( V_{ij} - \sum_{l=1}^{N_{\mathrm{at},i}} \frac{q_{il}}{r_{jl}} \right)^{2} \right)^{1/2}, \tag{3}
$$

where $N$ is the number of structures in the set; $N_{\mathrm{grid},i}$ is the number of grid points at which the MEP was calculated by a quantum-chemical method and compared with the classical potential for the $i$th structure; $V_{ij}$ is the quantum-chemical electrostatic potential at the $j$th grid point for the $i$th structure; $N_{\mathrm{at},i}$ is the number of atoms in the $i$th molecule; $r_{jl}$ is the distance between the $l$th atom and the $j$th grid point; $q_{il}$ is the charge on the $l$th atom in the $i$th molecule (in the MG and OG schemes, this charge depends on the electronegativity vector $\boldsymbol{\chi}$ and the hardness vector $\boldsymbol{\eta}$ for the atom types used); $\Delta_i$ is the root-mean-square error of the description of the quantum-chemical MEP on the grid around the $i$th structure; and $\Delta V$ is the root-mean-square error of the simultaneous description of the MEP for the entire set of structures. To minimize target function (1), we used the simplex annealing method [14], a hybrid of stochastic global and local parameter optimization.
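Equations (1)–(3) are easy to evaluate once a grid exists. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: points are placed on each scaled van der Waals sphere with a Fibonacci lattice (an assumed placement rule; the rule actually used in [7] may differ), points buried inside another atom's sphere of the same layer are discarded, and the Coulomb sum is multiplied by an assumed constant of 1389.35 kJ mol⁻¹ Å e⁻² so that Δ_i comes out in kJ/mol per unit charge, as in Table 1. All function names, the geometry, and the charges are hypothetical.

```python
import math

# Assumed unit convention: k * q / r with q in e and r in Å gives kJ/mol per unit charge.
COULOMB_KJ = 1389.35457
SCALES = (1.4, 1.6, 1.8, 2.0)  # vdW radius scale factors quoted in the text
DENSITY = 3.0                  # grid points per Å^2, as quoted in the text


def unit_sphere(n):
    """n quasi-uniform unit vectors (Fibonacci lattice; an assumed placement rule)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - 2.0 * (k + 0.5) / n
        rho = math.sqrt(1.0 - z * z)
        points.append((rho * math.cos(golden_angle * k), rho * math.sin(golden_angle * k), z))
    return points


def mep_grid(coords, radii, scales=SCALES, density=DENSITY):
    """Points on each scaled vdW sphere lying outside every other atom's sphere of the same layer."""
    grid = []
    for s in scales:
        for a, (center, r) in enumerate(zip(coords, radii)):
            big_r = s * r
            n = max(1, round(density * 4.0 * math.pi * big_r ** 2))
            for u in unit_sphere(n):
                p = tuple(c + big_r * x for c, x in zip(center, u))
                buried = any(math.dist(p, coords[b]) < s * radii[b] - 1e-9
                             for b in range(len(coords)) if b != a)
                if not buried:
                    grid.append(p)
    return grid


def point_charge_mep(p, coords, charges):
    """Classical MEP at point p: k * sum_l q_l / r_jl."""
    return COULOMB_KJ * sum(q / math.dist(p, c) for q, c in zip(charges, coords))


def residuals(v_ref, grid, coords, charges):
    return [v - point_charge_mep(p, coords, charges) for v, p in zip(v_ref, grid)]


def target_f(structures):
    """Eq. (1): squared residuals summed over all structures; items are (v_ref, grid, coords, charges)."""
    return sum(d * d for s in structures for d in residuals(*s))


def delta_i(v_ref, grid, coords, charges):
    """Eq. (3): RMS MEP error on the grid of one structure."""
    res = residuals(v_ref, grid, coords, charges)
    return math.sqrt(sum(d * d for d in res) / len(res))


def delta_v(deltas):
    """Eq. (2): root of the mean squared per-structure error."""
    return math.sqrt(sum(d * d for d in deltas) / len(deltas))


if __name__ == "__main__":
    # Hypothetical water-like geometry (Å), radii, and charges; the "reference" MEP here is a
    # stand-in generated from point charges, not an HF/6-31G* calculation.
    coords = [(0.0, 0.0, 0.0), (0.957, 0.0, 0.0), (-0.240, 0.927, 0.0)]
    radii = [1.52, 1.20, 1.20]
    grid = mep_grid(coords, radii)
    v_ref = [point_charge_mep(p, coords, [-0.8, 0.4, 0.4]) for p in grid]
    print(len(grid), delta_i(v_ref, grid, coords, [-0.7, 0.35, 0.35]))
```

In a real parameterization, `v_ref` would come from the quantum-chemical wave function and `charges` from the MG or OG scheme with the current (χ, η) vector, so an optimizer would call `target_f` once per trial parameter set.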

Because the parameters of the MG and OG models are strongly linearly dependent, their optimization is complicated and its course has to be corrected as it proceeds. In this work, we used the same classification of atoms in both schemes, the one originally proposed for the OG method [5], namely, a classification based on the valence state of the chemical element.
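The actual MG and OG equations are given in [5, 6] and are not reproduced here. As a hedged illustration of how a topology-only, bond-wise equalization scheme behaves, and of one way its parameters can become linearly dependent, the sketch below uses a deliberately simple, hypothetical rule: across every bond a–b, a charge (χ_b − χ_a)/(η_a + η_b) moves toward the more electronegative atom. The valence-state type labels and the (χ, η) values are invented for the example. Because only differences of χ and ratios of χ to η enter, shifting all χ by a constant, or scaling all χ and η by a common factor, leaves every charge unchanged, so a target function such as (1) is flat along those directions.

```python
import math

# Hypothetical (chi, eta) per valence-state atom type; NOT the published MG/OG parameters.
TOY_PARAMS = {"C.sp3": (2.5, 5.0), "O.sp3": (3.5, 6.0), "H": (2.2, 6.5)}


def bond_equalization_charges(types, bonds, params):
    """Toy bond-wise partial equalization: charge moves along each bond toward the higher chi.

    Only atom types and connectivity are used (no coordinates), so topologically
    equivalent atoms get identical charges and the total charge stays zero.
    """
    charges = [0.0] * len(types)
    for a, b in bonds:
        chi_a, eta_a = params[types[a]]
        chi_b, eta_b = params[types[b]]
        transfer = (chi_b - chi_a) / (eta_a + eta_b)  # > 0 when atom b is more electronegative
        charges[a] += transfer                        # atom a loses electron density
        charges[b] -= transfer                        # atom b gains it
    return charges


def shifted(params, dchi):
    """Add the same constant to every electronegativity."""
    return {t: (chi + dchi, eta) for t, (chi, eta) in params.items()}


def scaled(params, factor):
    """Multiply every electronegativity and hardness by the same factor."""
    return {t: (factor * chi, factor * eta) for t, (chi, eta) in params.items()}


# Methanol, CH3-OH: atom 0 = C, 1 = O, 2-4 = methyl H, 5 = hydroxyl H.
METHANOL_TYPES = ["C.sp3", "O.sp3", "H", "H", "H", "H"]
METHANOL_BONDS = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]

if __name__ == "__main__":
    q = bond_equalization_charges(METHANOL_TYPES, METHANOL_BONDS, TOY_PARAMS)
    q_shift = bond_equalization_charges(METHANOL_TYPES, METHANOL_BONDS, shifted(TOY_PARAMS, 1.7))
    print([round(x, 4) for x in q])
    print(all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(q, q_shift)))
```

In this toy, an optimizer would see the same F for infinitely many parameter vectors, which is one concrete way such degeneracies arise; whether it matches the specific dependence encountered for MG and OG is not stated in the text.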

The quality with which the charges obtained by different methods reproduce the calculated MEP is characterized by standard deviation (2) over the entire set of structures (Table 1). More detailed information can be extracted from the distribution of the per-structure root-mean-square MEP error $\Delta_i$ (3) over the set. It is convenient to approximate the actual error distribution by a normalized normal (Gaussian) curve, because the parameters of such curves can be compared directly (Table 1).
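A minimal sketch of the Gaussian approximation, assuming the curve is obtained by matching the sample mean and population standard deviation of the Δ_i values, which is also the maximum-likelihood normal fit; the text does not say which fitting procedure was used. The error values in the usage lines are hypothetical.

```python
import math
import statistics


def fit_normal(errors):
    """Maximum-likelihood normal fit: sample mean and population standard deviation."""
    mu = statistics.fmean(errors)
    sigma = statistics.pstdev(errors, mu)
    return mu, sigma


def normal_pdf(x, mu, sigma):
    """Normalized Gaussian density, the form used to draw error-distribution curves."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    deltas = [9.8, 12.1, 14.0, 15.3, 11.7, 18.4, 13.2]  # hypothetical per-structure Delta_i, kJ/mol
    mu, sigma = fit_normal(deltas)
    print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, peak density = {normal_pdf(mu, mu, sigma):.3f}")
```

Once μ and σ are known, a curve like those in the figure is simply `normal_pdf` evaluated over the 0–50 kJ/mol range, and two schemes can be compared through their (μ, σ) pairs alone.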

Table 1. Reproduction of the quantum-chemical (6-31G*) MEP by charges of different schemes, kJ/mol

Charge scheme          ΔV      Δi,min   Δi,max   μ       σ
MK-ESP                 6.76    1.31     14.07    6.19    2.72
RESP                   7.26    1.69     14.53    6.71    2.77
AM1-BCC                11.36   3.86     28.62    10.73   3.74
MMFF94                 14.51   4.34     39.56    13.59   5.11
Mulliken analysis      21.47   2.05     45.90    19.77   8.40
MG (optimal)           14.79   4.47     43.02    13.99   4.79
OG (optimal)           11.95   2.59     34.18    11.25   3.54
MG (original)          30.25   3.91     74.53    27.89   11.74
OG (original)          31.85   2.24     69.81    28.94   13.32

Note: Values are given per unit proton charge. ΔV is the value of target function (2); μ and σ are the parameters of the normalized normal distribution that approximates the distribution of the root-mean-square error Δi (3) over the structures of the set. The calculated MEP of the entire set ranges from –253.30 to +231.86 and is well approximated by a Gaussian distribution with μ = 0.63 and σ = 46.24. Here, in Tables 2 and 3, and in the figure, "optimal" denotes the optimized parameters and "original" the parameters suggested in [5, 6].

Let us first consider how well the charges of the known schemes reproduce the MEP. The figure shows the distributions of the root-mean-square MEP error obtained with the charges of these schemes. The MK-ESP and RESP charges fit the quantum-chemical MEP of each structure best and thus provide the lower limit of the error with which atom-centered charges can describe the MEP. The MMFF94 charges are calculated from a table of bond charge increments, whose total number is proportional to the square of the number of atom types used. The MEP error obtained with MMFF94 charges lies between the errors obtained with RESP and Mulliken charges. The AM1-BCC charge model is analogous to the MMFF94 model, except that the initial charges are generated by the semiempirical AM1 method after structure optimization and then corrected with charge increments. An advantage of this method is the automatic generation of qualitatively correct initial charges for conjugated, aromatic, and formally charged structures. The MEP error obtained with AM1-BCC charges is considerably smaller than with MMFF94 charges. The disadvantages of the AM1-BCC method are the need to define an initial structure and perform a semiempirical calculation, and the violation of the topological symmetry of the resulting charges, since they are calculated from the geometry of one of the possible conformers found by AM1 optimization. Molecular modeling with classical force fields that actively explores conformational space (conformational search, molecular dynamics, the Monte Carlo method) can give incorrect results when topologically unsymmetrical charges are used. The Mulliken charges were calculated from the same wave functions as the MEP; nevertheless, they reproduce the MEP poorly.
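The statement that MK-ESP charges set the lower limit can be made concrete: MK-type charges solve a linear least-squares problem, minimizing the MEP residual on the grid subject to the molecule's total charge. The sketch below solves that problem with a Lagrange multiplier and a small Gaussian-elimination routine in pure Python. It omits the hyperbolic restraints that distinguish RESP [8, 9]; the Coulomb constant is an assumed unit convention (kJ/mol per unit charge, Å), and the function names and test geometry are hypothetical.

```python
import math

COULOMB_KJ = 1389.35457  # assumed units: kJ mol^-1 Å e^-2


def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for an n x n system."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_esp_charges(grid, v_ref, coords, total_charge=0.0):
    """Charges minimizing sum_j (V_j - k sum_l q_l / r_jl)^2 subject to sum_l q_l = total_charge."""
    n = len(coords)
    inv_r = [[1.0 / math.dist(p, c) for c in coords] for p in grid]
    v = [value / COULOMB_KJ for value in v_ref]  # reduced potential, e/Å
    # Normal equations A q + lam * 1 = b, bordered by the total-charge constraint row.
    matrix = [[sum(row[k] * row[l] for row in inv_r) for l in range(n)] + [1.0] for k in range(n)]
    matrix.append([1.0] * n + [0.0])
    rhs = [sum(row[k] * vj for row, vj in zip(inv_r, v)) for k in range(n)] + [total_charge]
    return solve_linear(matrix, rhs)[:n]


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (0.957, 0.0, 0.0), (-0.240, 0.927, 0.0)]  # hypothetical geometry, Å
    true_q = [-0.8, 0.4, 0.4]
    grid = [(3.0 * math.cos(0.7 * k), 3.0 * math.sin(0.7 * k), 0.5 * (k % 5) - 1.0) for k in range(40)]
    v_ref = [COULOMB_KJ * sum(q / math.dist(p, c) for q, c in zip(true_q, coords)) for p in grid]
    print([round(q, 6) for q in fit_esp_charges(grid, v_ref, coords)])
```

Because the fit minimizes exactly the per-structure residual in (3), no scheme that assigns atom-centered charges by other means can obtain a smaller Δ_i on the same grid, which is why MK-ESP appears as the best row of Table 1.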

The figure also shows the root-mean-square error distributions of MEP reproduction with the charges of the MG and OG schemes using the original parameters [5] (dashed lines). After optimization, the MG and OG charge schemes reproduce the MEP noticeably better, as the figure shows (thick solid curves correspond to the optimized parameters). Both the mean error μ and the scatter around it σ decrease (Table 1). With optimized effective electronegativities and hardnesses, both methods give charges that fit the MEP better than the Mulliken charges, and the OG model reproduces the calculated potential better than the MG model. This indicates that describing the charge distribution in a molecule with the orbital representation of the structure, which underlies the OG model, is indeed better than with the minimal MG model, which uses only one parameter per atom, the effective electronegativity of its atom type. In the OG model, the number of parameters per atom type ranges from three to seven, depending on the number of valence orbitals of the atom. In both cases, however, the number of effective parameters is proportional to the number of atom types, unlike the bond-increment methods (MMFF94 and AM1-BCC), where it grows as the square of the number of atom types. After parameter optimization, the MEP error distribution obtained with MG or OG charges becomes very close to that obtained with MMFF94 or AM1-BCC charges, respectively (figure). The MG and OG models achieve this with tens of parameters, whereas the bond-increment methods require hundreds.
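The difference in parameter scaling can be illustrated with a toy bond-charge-increment (BCI) scheme in the spirit of MMFF94 [10]: each atom starts from a formal charge and receives an antisymmetric increment ω(A, B) = −ω(B, A) for every bond, so one parameter is needed per pair of bonded atom types. The increment values and type labels below are hypothetical, and the sharing of formal charges between neighbors used in MMFF94 is omitted. The counting helper uses T(T − 1)/2 as an upper bound on distinct increments for T atom types, against T parameters for MG and 3T to 7T for OG, as stated in the text.

```python
import math

# Hypothetical increments: BCI[(A, B)] is the charge change of the type-B atom in an A-B bond.
BCI = {("O.sp3", "H"): 0.40, ("C.sp3", "O.sp3"): -0.28, ("C.sp3", "H"): 0.00}


def increment(type_a, type_b, table):
    """Antisymmetric lookup omega(A, B) = -omega(B, A); bonds between identical types give 0."""
    if type_a == type_b:
        return 0.0
    if (type_a, type_b) in table:
        return table[(type_a, type_b)]
    return -table[(type_b, type_a)]  # raises KeyError if the pair is not parameterized


def bci_charges(types, bonds, table, formal=None):
    """q_i = formal_i + sum over bonded neighbours k of omega(type_k, type_i)."""
    charges = list(formal) if formal is not None else [0.0] * len(types)
    for a, b in bonds:
        charges[b] += increment(types[a], types[b], table)
        charges[a] += increment(types[b], types[a], table)
    return charges


def parameter_counts(n_types, og_per_type=(3, 7)):
    """MG: one per type; OG: 3-7 per type; BCI: at most one per unordered pair of distinct types."""
    return {
        "MG": n_types,
        "OG": (og_per_type[0] * n_types, og_per_type[1] * n_types),
        "BCI (upper bound)": n_types * (n_types - 1) // 2,
    }


if __name__ == "__main__":
    print(bci_charges(["O.sp3", "H", "H"], [(0, 1), (0, 2)], BCI))
    print(parameter_counts(20))
```

For a hypothetical set of 20 atom types, `parameter_counts` gives 20 for MG, 60 to 140 for OG, and up to 190 increments for BCI; the quadratic term is what drives bond-increment tables into the hundreds.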

Figure. Distribution of the root-mean-square error in reproduction of the HF/6-31G* potential by different charges over the set of structures (x axis: MEP error, kJ/mol; y axis: probability density; curves: MK-ESP, RESP, AM1-BCC, MMFF94, Mulliken analysis, MG (original), MG (optimal), OG (original), OG (optimal)). Values are given per unit proton charge. "Optimal" and "original" denote the optimized parameters and those suggested in [5, 6], respectively.

As a test of the applicability of the resulting charges, we used the reproduction of quantum-chemically calculated dipole moments. The dipole moment components were calculated for all structures of the set from the charges obtained by the MG and OG methods and compared with the results of the other methods. This test can be considered fairly independent, since the dipole moment components were not included in the optimization over the set. The results are presented in Tables 2 and 3, which give the root-mean-square and unsigned mean errors in reproducing the x component and the magnitude of the quantum-chemically calculated dipole moment. The ranking of the charge schemes by the quality of dipole moment reproduction is the same as the ranking based on the MEP. Likewise, the MG and OG charge schemes reproduce the dipole moment considerably better after parameter optimization.
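For reference, the dipole moment of a set of point charges is μ = Σ q_l r_l, which does not depend on the choice of origin for a neutral molecule. The sketch below computes it in the unit of Tables 2 and 3 (3.33564 × 10⁻³⁰ C m, i.e., one debye) and summarizes errors the way the tables do: the root-mean-square error (the tables' "standard deviation" from zero), the unsigned mean error, and the extreme signed errors. The sign convention (model minus reference), the function names, and the geometry are assumptions for illustration.

```python
import math

# 1 e·Å = 1.602176634e-29 C m; the tables' unit is 3.33564e-30 C m (one debye).
E_ANGSTROM_IN_DEBYE = 1.602176634e-29 / 3.33564e-30


def dipole_debye(coords, charges):
    """Dipole vector (D) of point charges q (e) at positions r (Å): mu = sum q * r."""
    return tuple(E_ANGSTROM_IN_DEBYE * sum(q * r[k] for q, r in zip(charges, coords)) for k in range(3))


def magnitude(vec):
    return math.sqrt(sum(x * x for x in vec))


def error_summary(model, reference):
    """RMS error, unsigned mean error, and min/max signed errors (model minus reference)."""
    errors = [m - r for m, r in zip(model, reference)]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    unsigned_mean = sum(abs(e) for e in errors) / len(errors)
    return rms, unsigned_mean, min(errors), max(errors)


if __name__ == "__main__":
    coords = [(0.0, 0.0, 0.0), (0.957, 0.0, 0.0), (-0.240, 0.927, 0.0)]  # hypothetical water, Å
    mu = dipole_debye(coords, [-0.8, 0.4, 0.4])
    print(mu, magnitude(mu))
```

Applied over a set, `error_summary` on the x components and on the magnitudes would give the four columns of Tables 2 and 3, respectively.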

Thus, our findings show that the empirical MG and OG charge schemes based on the electronegativity equalization principle can be parameterized to reproduce the MEP directly, an important criterion of the applicability of charges in molecular modeling. With optimized parameters, the MG and OG charges reproduce the MEP about as well as the MMFF94 charges, and the OG scheme comes close to the AM1-BCC method in this respect. At the same time, the MG and OG schemes require considerably fewer computational resources. Unlike the RESP scheme, the MG and OG methods automatically generate topologically symmetric charges, which makes them candidates for convenient and fast calculation of atomic charges for molecular modeling.

Table 2. Errors in reproducing the projection onto the x axis of the dipole moment calculated by a quantum-chemical method (6-31G*), in units of 3.33564 × 10⁻³⁰ C m (debye)

Charge scheme        Standard deviation   Unsigned mean error   Minimal value   Maximal value
MK-ESP               0.06                 0.04                  –0.23           0.16
RESP                 0.09                 0.06                  –0.60           0.31
AM1-BCC              0.36                 0.25                  –1.04           1.15
MMFF94               0.49                 0.36                  –1.65           1.57
Mulliken analysis    0.86                 0.65                  –2.45           2.82
MG (optimal)         0.43                 0.29                  –1.91           1.52
OG (optimal)         0.34                 0.22                  –1.29           1.27
MG (original)        1.37                 0.95                  –5.26           4.65
OG (original)        1.55                 1.05                  –4.90           6.02

Note: The calculated projection of the dipole moment onto the x axis for the entire set ranges from –7.20 to +5.54; its standard deviation (from zero) is 2.14, and its unsigned mean deviation (from zero) is 1.62.

Table 3. Errors in reproducing the magnitude of the dipole moment calculated by a quantum-chemical method (6-31G*), in units of 3.33564 × 10⁻³⁰ C m (debye)

Charge scheme        Standard deviation   Unsigned mean error   Minimal value   Maximal value
MK-ESP               0.05                 0.04                  –0.15           0.24
RESP                 0.10                 0.07                  –0.18           0.56
AM1-BCC              0.42                 0.30                  –1.67           1.50
MMFF94               0.58                 0.43                  –1.89           2.21
Mulliken analysis    0.96                 0.78                  –2.02           3.49
MG (optimal)         0.54                 0.39                  –1.74           2.28
OG (optimal)         0.43                 0.29                  –1.32           1.73
MG (original)        1.24                 0.96                  –6.02           1.53
OG (original)        1.62                 1.22                  –5.61           4.41

Note: The calculated dipole moment magnitude for the entire set ranges from 0.00 to 7.23; its standard deviation (from zero) is 1.56, and its unsigned mean deviation (from zero) is 1.26.

REFERENCES

1. Hobza, P. and Zahradnik, R., Chem. Rev., 1988, vol. 88, pp. 871–897.

2. Politzer, P. and Murray, J.S., Theor. Chem. Accounts, 2002, vol. 108, pp. 134–142.

3. Ruiz, J., Lopez, M., Mila, J., et al., J. Comput. Aided Mol. Design, 1993, vol. 7, pp. 183–198.

4. Hernandez, B., Luque, F.J., and Orozco, M., J. Comput. Aided Mol. Design, 2000, vol. 14, pp. 329–339.

5. Oliferenko, A.A., Palyulin, V.A., Pisarev, S.A., Neiman, A.V., and Zefirov, N.S., J. Phys. Org. Chem., 2001, vol. 14, pp. 355–369.

6. Oliferenko, A.A., Palyulin, V.A., and Zefirov, N.S., Dokl. Chem., 1999, vol. 368, nos. 1–3, pp. 209–212 [Dokl. Akad. Nauk, 1999, vol. 368, no. 1, pp. 63–67].

7. Singh, U.C. and Kollman, P.A., J. Comput. Chem., 1984, vol. 5, pp. 129–145.

8. Bayly, C.I., Cieplak, P., Cornell, W.D., and Kollman, P.A., J. Phys. Chem., 1993, vol. 97, pp. 10269–10280.

9. Cornell, W.D., Cieplak, P., Bayly, C.I., and Kollman, P.A., J. Am. Chem. Soc., 1993, vol. 115, pp. 9620–9631.

10. Halgren, T.A., J. Comput. Chem., 1996, vol. 17, pp. 490–519.

11. Jakalian, A., Jack, D.B., and Bayly, C.I., J. Comput. Chem., 2002, vol. 23, pp. 1623–1641.

12. Shul’ga, D.A., Oliferenko, A.A., Pisarev, S.A., Palyulin, V.A., and Zefirov, N.S., Dokl. Chem., 2006, vol. 408, part 1, pp. 76–79 [Dokl. Akad. Nauk, 2006, vol. 408, no. 3, pp. 340–343].

13. Granovsky, A.A., PC GAMESS, version 7.0, http://classic.chem.msu.su/gran/gamess/index.html

14. Press, W.H., Flannery, B.P., Teukolsky, S.A., and Vetterling, W.T., Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge: Cambridge Univ. Press, 1992.