Prediction of the Preferable Mechanism of Nucleophilic Substitution at Saturated Carbon Atom and Prognosis of SN1 Rate Constants by Means of QSPR

ISSN 0012-5008, Doklady Chemistry, 2011, Vol. 441, Part 1, pp. 314–317. © Pleiades Publishing, Ltd., 2011. Original Russian Text © A.A. Kravtsov, P.V. Karpov, I.I. Baskin, V.A. Palyulin, N.S. Zefirov, 2011, published in Doklady Akademii Nauk, 2011, Vol. 441, No. 1, pp. 57–60.

Nucleophilic substitution reactions constitute the best-studied and an important class of organic reactions. The ratio of the products of such reactions depends on the competition between the mono- and bimolecular mechanisms. Therefore, prediction of the preferable reaction mechanism and a priori evaluation of the corresponding reaction rates are important tasks. Models for predicting nucleophilic substitution rate constants have been constructed for many reaction series. However, each of these models considers the effect of different parameters of reagents belonging to the same class of compounds. Such models show high correlation coefficients but cannot be regarded as universal [1, 2]. It is therefore necessary to construct a unified model that would make it possible to adequately calculate nucleophilic substitution rate constants regardless of the class of compounds to which the reagents belong, and to predict the mechanism of such reactions (a problem that has so far remained beyond the scope of machine learning). Previously [3], we successfully applied the multicomponent QSPR method to predict the rate constants of SN2 reactions. The aims of the present work are to create a classification model for determining the preferable mechanism of nucleophilic substitution reactions (SN1 or SN2) and to construct a universal model for predicting SN1 rate constants.

To solve these problems, we used both the database employed in [3], which contains extensive information on SN2 reactions (the structures of the nucleofuges and electrophilic moieties of the substrates, the structures and solvatochromic parameters of the solvents, reaction temperatures, and reaction rate constants), and a database containing information on 3901 SN1 reactions [5]. Hereinafter, only the reactions that occur in individual solvents rather than in mixtures of several solvents were used for constructing the models. The databases contain 1661 such reactions of the SN1 type and 1924 such reactions of the SN2 type.

We constructed the classification model (SN1/SN2) using the support vector machine method [6], which solves classification problems by constructing a hyperplane that maximally separates the points belonging to different classes. To find the best hyperplane, the input data space was mapped into a higher-dimensional space. The optimal characteristics were obtained with the C-SVC algorithm implemented in the LIBSVM software [7], with a radial basis function as the nonlinear kernel. The input parameters for reaction classification were the degree of substitution of the reaction site of the electrophile (primary, secondary, tertiary, or quaternary type; the existence of a double bond in the nearest environment; involvement of the reaction site in a cyclic fragment), as well as the dielectric constant, polarity, polarizability, total acidity, and basicity of the solvent [4], and the reaction temperature. These data were combined into descriptor set 1.
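The radial basis function kernel mentioned above compares two descriptor vectors through their squared Euclidean distance, K(x, y) = exp(-γ‖x − y‖²). The following minimal sketch illustrates this on a hypothetical numeric encoding of descriptor set 1. The function names, the order of the descriptors, and all solvent numbers are assumptions made for illustration only; they are not taken from the paper, from [4], or from LIBSVM.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def descriptor_set_1(degree, near_double_bond, in_ring, solvent, temperature_c):
    """Hypothetical encoding of descriptor set 1 (the layout is an assumption)."""
    return [
        float(degree),                      # 1 = primary ... 4 = quaternary
        1.0 if near_double_bond else 0.0,   # double bond in the nearest environment
        1.0 if in_ring else 0.0,            # reaction site is part of a cyclic fragment
        solvent["dielectric_constant"],
        solvent["polarity"],
        solvent["polarizability"],
        solvent["acidity"],
        solvent["basicity"],
        float(temperature_c),
    ]


# Placeholder solvent parameters: made-up numbers, NOT values from reference [4].
solvent_a = {"dielectric_constant": 30.0, "polarity": 0.9, "polarizability": 0.2,
             "acidity": 15.0, "basicity": 200.0}
solvent_b = {"dielectric_constant": 80.0, "polarity": 1.0, "polarizability": 0.2,
             "acidity": 21.0, "basicity": 150.0}

x = descriptor_set_1(3, False, False, solvent_a, 25.0)
y = descriptor_set_1(1, True, False, solvent_b, 50.0)

print(rbf_kernel(x, x, gamma=0.01))  # identical vectors give exactly 1.0
print(rbf_kernel(x, y, gamma=0.01))  # distinct vectors give a value between 0 and 1
```

Because the descriptors in such a vector have very different numeric ranges, they would normally be scaled before the kernel is applied; the sketch omits this step.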

To improve the classification accuracy, the solvation free energies (SFEs) of the substrate/solvent pairs were calculated as additional descriptors, with fragmental descriptors used as the input data [8]. The resulting SFEs were combined with descriptor set 1 to form set 2.

A. A. Kravtsov, P. V. Karpov, I. I. Baskin, V. A. Palyulin, and Academician N. S. Zefirov

Received May 31, 2011

DOI: 10.1134/S0012500811110048

Moscow State University, Moscow, 119991 Russia

The classification accuracy was determined by the cross-validation procedure. The validation set contained information on every third reaction. Altogether, 3585 reactions were considered (1661 SN1 and 1924 SN2 reactions). Figure 1 shows the plot of the classification accuracy versus the inner model parameters (C and γ) for both sets of descriptors. The best accuracy was achieved with set 2 (~98% at lnC = 4–5 and lnγ = 4); without inclusion of the SFE values, the best accuracy was 90% (6 < lnC < 8, 4 < lnγ < 5).
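The validation scheme and the scan over C and γ described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the choice of which reaction in each triple goes to the validation set is arbitrary, and the grid steps are taken from the axis ticks of Fig. 1 rather than from any statement in the text. No SVM is trained here; the sketch only prepares the split and the parameter grid.

```python
import math


def split_every_third(items):
    """Send every third item to the validation part and the rest to training."""
    training = [item for i, item in enumerate(items) if i % 3 != 2]
    validation = [item for i, item in enumerate(items) if i % 3 == 2]
    return training, validation


def log_grid(ln_c_values, ln_gamma_values):
    """Convert a grid given as (ln C, ln gamma) into (C, gamma) pairs."""
    return [(math.exp(ln_c), math.exp(ln_g))
            for ln_c in ln_c_values
            for ln_g in ln_gamma_values]


# Stand-ins for the 3585 reactions (1661 SN1 + 1924 SN2).
reactions = list(range(1661 + 1924))
training, validation = split_every_third(reactions)
print(len(training), len(validation))  # 2390 1195

# Assumed grid: ln C = 2, 4, ..., 14 and ln gamma = -4, -2, ..., 4.
grid = log_grid(range(2, 15, 2), range(-4, 5, 2))
print(len(grid))  # 35 (C, gamma) combinations
```

Each (C, gamma) pair would then be passed to a C-SVC trainer, and the accuracy on the validation part would be recorded to produce a surface like the one in Fig. 1.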

For prediction of the rate constants, the reagent structures were described using the FRAGMENT [9, 10] and LFA [3, 8] fragmental descriptors. They were calculated for the substrate molecules as a whole and separately for the nucleofuge and the electrophile. To enhance the model quality, we also calculated descriptors based on partial atomic charges [11, 12] and the Fukui indices [13]. The resulting descriptor vector was augmented with the solvatochromic and Palm parameters of the solvent used in the reaction and with the reaction temperature. The dimension of the vector was then reduced by selecting significant descriptors with the fast stepwise linear regression procedure built into the Nasawin program package [14]. A total of 135 descriptors were selected.
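The idea of stepwise descriptor selection can be illustrated with a simplified greedy scheme: at each step the descriptor most correlated with the current residual is added, and its linear contribution is removed from the residual. This is a stand-in written for illustration, not the procedure implemented in the program package cited above; the function names, the stopping threshold, and the toy data are all assumptions.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def forward_select(columns, target, max_descriptors, min_abs_corr=0.1):
    """Greedy forward selection by correlation with the current residual."""
    target_mean = mean(target)
    residual = [t - target_mean for t in target]
    selected = []
    while len(selected) < max_descriptors:
        candidates = [j for j in range(len(columns)) if j not in selected]
        if not candidates:
            break
        best = max(candidates, key=lambda j: abs(correlation(columns[j], residual)))
        best_corr = abs(correlation(columns[best], residual))
        if best_corr == 0.0 or best_corr < min_abs_corr:
            break  # no remaining descriptor explains the residual
        x = columns[best]
        mx = mean(x)
        sxx = sum((a - mx) ** 2 for a in x)
        slope = sum((a - mx) * r for a, r in zip(x, residual)) / sxx
        # Remove the part of the residual explained by the chosen descriptor.
        residual = [r - slope * (a - mx) for a, r in zip(x, residual)]
        selected.append(best)
    return selected


# Toy data: three descriptor columns; the target depends only on columns 0 and 1.
columns = [[1, 2, 3, 4, 5], [2, 1, 2, 1, 2], [5, 3, 4, 1, 2]]
target = [c0 + 3 * c1 for c0, c1 in zip(columns[0], columns[1])]
print(forward_select(columns, target, max_descriptors=2))
```

A full stepwise regression would refit all selected coefficients jointly and apply a statistical significance test at each step; the greedy residual update used here is a deliberate simplification.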

Structure–property relationships were determined with artificial neural networks, a mathematical technique that makes it possible to model nonlinear dependences when the character of the influence of the input data on the output is not known a priori. We chose a three-layer neural network architecture with eight neurons in the hidden layer. To construct the model, three reaction sets were used: the training set (1329 reactions), the validation set (166 reactions), and the set for independent prediction (166 reactions). The resulting models are characterized by a rather high correlation coefficient, R2 = 0.96. The statistical quality of the constructed models was confirmed by the double cross-validation procedure. The mean and best R2 values for the different sets, the q2 values, and the root-mean-square errors (RMSE) are listed in Table 1. A typical scatter diagram of the predicted and experimental values is shown in Fig. 2.
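The architecture described above (135 selected descriptors as inputs, eight hidden neurons, one output for log k) can be sketched as a plain forward pass. The sigmoid hidden activation, the linear output unit, the random initial weights, and all function names are assumptions made for illustration; the text does not specify them, and no training is performed here.

```python
import math
import random

N_INPUTS = 135   # number of selected descriptors
N_HIDDEN = 8     # neurons in the hidden layer


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden, seed=0):
    """Small random weights; the last entry of each weight list is the bias."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.1, 0.1) for _ in range(n_inputs + 1)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden + 1)]
    return hidden, output


def predict(network, descriptors):
    """Forward pass: sigmoid hidden layer followed by a linear output unit."""
    hidden, output = network
    activations = [
        sigmoid(sum(w * x for w, x in zip(row[:-1], descriptors)) + row[-1])
        for row in hidden
    ]
    return sum(w * a for w, a in zip(output[:-1], activations)) + output[-1]


def count_parameters(network):
    hidden, output = network
    return sum(len(row) for row in hidden) + len(output)


network = init_network(N_INPUTS, N_HIDDEN)
print(count_parameters(network))          # 8 * (135 + 1) + (8 + 1) = 1097
print(predict(network, [0.0] * N_INPUTS))  # untrained output for a zero vector

# The three reaction sets together cover all single-solvent SN1 reactions.
print(1329 + 166 + 166)                   # 1661
```

With about 1100 adjustable weights and 1329 training reactions, a separate validation set for stopping the training early is a natural safeguard against overfitting, which is consistent with the three-set design described in the text.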

Although the constructed models provide acceptable prediction accuracy for the vast majority of reactions, there are significant discrepancies between the predicted and experimental rate constants in some cases. Table 2 summarizes the ten reactions for which the difference between the predicted and experimental rate constants is maximal. These reactions allow us to elucidate the reasons for such deviations. The three strongest outliers are substitution reactions in structures with the norbornane skeleton (reactions 1–3), which is presumably caused by the fact that the model ignores the spatial structure of molecules, as well as by the known peculiarities of such structures. Low prediction accuracy is typical of structures containing nucleofuges rarely encountered in the database, such as thiocyanate or the [Me2SCl]– complex anion, or electrophiles unique to the database, such as MeSCH2CH2. Although the –CH2CH2SC– fragment is repeatedly encountered in the database, almost all compounds containing this fragment are phenyl thioethers.

Another important source of errors is inaccuracies in the experimental data used for constructing the model. In particular, for reactions 5 and 8, which are identical and were carried out under identical conditions, different researchers have reported rate constants that differ by more than three orders of magnitude (log k = –3.18 and –6.73). For reactions 7 and 10, the experimental rate constant of benzyl chloride is about one order of magnitude lower than in the other analogous entries of the database, even though no special reaction conditions were used. It is worth noting that, in some cases, strong deviations of the predicted rate constants from the known experimental ones could be grounds for re-examining the available experimental data.

Fig. 1. Accuracy of classification of the nucleophilic substitution mechanism (%) as a function of the inner C-SVM parameters (lnC and lnγ) for descriptor sets 1 and 2.

Fig. 2. Scatter plot of the predicted and experimental values of SN1 reaction rate constants (log k) for the training, validation, and test sets.

Table 1. Statistical parameters of the models for prediction of SN1 reaction rates

Set         Number of reactions   R2 (best)   R2 (mean)   q2     RMSE
Training    1329                  0.967       0.920       –      0.34
Validation  166                   0.839       0.773       0.77   0.58
Test        166                   0.874       0.748       0.75   0.61

Table 2. Examples of reactions with maximal rate constant prediction errors (dashed lines symbolize a fragment–nucleofuge bond). Electrophiles that were given as structure drawings are marked [structure drawing].

Reaction  Electrophile         Nucleofuge   Temperature, °C  Solvent  log k (experiment)  log k (prediction)  Error
1         [structure drawing]  Cl           170              MeOH     –3.35               –0.54               2.81
2         [structure drawing]  Br           50               H2O      –5.38               –2.63               2.75
3         [structure drawing]  4-Me–PhSO3   60               MeOH     –7.52               –4.8                2.72
4         (3-Cl–Ph)CHPh        SCN          70               Me2CO    –6.5                –4.45               2.05
5         [structure drawing]  Cl           0                MeOH     –3.18               –5.17               1.99
6         t-Bu                 SMe2Cl       50.4             MeCOOH   –5.08               –6.89               1.81
7         Bn                   Cl           84               H2O      –4.73               –3.17               1.56
8         [structure drawing]  Cl           0                MeOH     –6.73               –5.17               1.56
9         MeSCH2CH2            Cl           50               H2O      –0.85               –2.33               1.48
10        Bn                   Cl           92               H2O      –4.42               –3.06               1.36
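The Error column of Table 2 is the absolute difference between the predicted and experimental log k values, and the RMSE and R2 statistics of Table 1 are built from the same kind of differences. The sketch below recomputes the Table 2 errors from the tabulated log k values and shows generic definitions of RMSE and of the coefficient of determination. The function names are hypothetical, and the R2 definition used here (1 − SS_res/SS_tot) is an assumption, since the text does not say whether R2 denotes this quantity or a squared correlation coefficient.

```python
import math


def rmse(experimental, predicted):
    """Root-mean-square error between two equally long sequences."""
    squared = [(p - e) ** 2 for e, p in zip(experimental, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(experimental, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_exp = sum(experimental) / len(experimental)
    ss_res = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
    ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
    return 1.0 - ss_res / ss_tot


# (log k experiment, log k prediction) for reactions 1-10 of Table 2.
table_2 = [
    (-3.35, -0.54), (-5.38, -2.63), (-7.52, -4.80), (-6.50, -4.45),
    (-3.18, -5.17), (-5.08, -6.89), (-4.73, -3.17), (-6.73, -5.17),
    (-0.85, -2.33), (-4.42, -3.06),
]

errors = [round(abs(pred - exp), 2) for exp, pred in table_2]
print(errors)
# [2.81, 2.75, 2.72, 2.05, 1.99, 1.81, 1.56, 1.56, 1.48, 1.36]

# Reactions 5 and 8 are the same reaction reported by different researchers.
gap = round(abs(table_2[4][0] - table_2[7][0]), 2)
print(gap)  # 3.55 log units between the two experimental values
```

The recomputed errors match the tabulated ones, and the gap between the two experimental values for reactions 5 and 8 exceeds every prediction error in the table, which supports the point that part of the discrepancy originates in the data rather than in the model.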

Thus, we have created a classification model that makes it possible to determine, with high accuracy, the type of nucleophilic substitution reaction at a saturated carbon atom (SN1 or SN2) carried out in a specified system. The quality of this model largely depends on the use of the solvation free energies of the substrate/solvent pairs; the prediction accuracy for SFEs was discussed in our previous paper [8].

The model proposed for predicting SN1 reaction rate constants has satisfactory predictive ability and is the most universal among the available models.

REFERENCES

1. Halberstam, N.M., Baskin, I.I., Palyulin, V.A., and Zefirov, N.S., Mendeleev Commun., 2002, vol. 12, no. 5, pp. 185–186.

2. Hiob, R. and Karelson, M., Comput. Chem., 2002, vol. 26, pp. 237–243.

3. Kravtsov, A.A., Karpov, P.V., Baskin, I.I., Palyulin, V.A., and Zefirov, N.S., Dokl. Chem., 2011, vol. 440, part 2, pp. 770–772.

4. Palm, V.A., Osnovy kolichestvennoi teorii organicheskikh reaktsii (Fundamentals of Quantitative Theory of Organic Reactions), Leningrad: Khimiya, 1977.

5. Itogi Nauki Tekh., Ser. Obshch. Vopr. Org. Khim., 1977, vol. 3, part 1.

6. Vapnik, V., The Nature of Statistical Learning Theory, Berlin: Springer, 1995.

7. Chang, C.C. and Lin, C.-J., LIBSVM: a Library for Support Vector Machines, 2001. http://www.csie.ntu.edu.tw/~cjlin/libsvm.

8. Kravtsov, A.A., Karpov, P.V., Baskin, I.I., Palyulin, V.A., and Zefirov, N.S., Dokl. Chem., 2007, vol. 414, part 1, pp. 128–131.

9. Artemenko, N.V., Baskin, I.I., Palyulin, V.A., and Zefirov, N.S., Dokl. Chem., 2001, vol. 381, nos. 1–3, pp. 317–320.

10. Zefirov, N.S. and Palyulin, V.A., J. Chem. Inf. Comput. Sci., 2002, vol. 42, pp. 1112–1122.

11. Mortier, W.J., Genechten, K.V., and Gasteiger, J., J. Am. Chem. Soc., 1985, vol. 107, pp. 829–835.

12. Sanderson, R.T., J. Am. Chem. Soc., 1983, vol. 105, pp. 2259–2261.

13. Fukui, K., Yonezawa, T., and Shingu, H., J. Chem. Phys., 1952, vol. 20, pp. 722–725.

14. Baskin, I.I., Halberstam, N.M., Artemenko, N.V., Palyulin, V.A., and Zefirov, N.S., in EuroQSAR 2002 Designing Drugs and Crop Protectants: Processes, Problems and Solutions, Ford, M. et al., Eds., Melbourne: Blackwell, 2003, pp. 260–263.