Indian Journal of Chemistry, Vol. 42A, June 2003, pp. 1436-1441
Equalized electronegativity and topological indices: Application for modeling toxicity of nitrobenzene derivatives
Padmakar V Khadikar1,*, Istvan Lukovits2, Vijay K. Agrawal3, Shachi Shrivastava3, Mona Jaiswal4, Ivan Gutman5, Sneha Karmarkar1 & Anjali Shrivastava4
1 Research Division, Laxmi Fumigation & Pest Control (P), 3, Khatipura, Indore 452 007, India. *E-mail: pvkhadikar@rediffmail.com
2 Chemical Research Center, Hungarian Academy of Sciences, H-1525 Budapest, P. O. Box 17, Hungary
3 QSAR & Computer Chemical Laboratories, A.P.S. University, Rewa 486 003, India
4 Department of Chemistry, Govt. Model and Autonomous Holkar College, Indore 452 001, India
5 Faculty of Science, University of Kragujevac, P. O. Box 60, YU-34000 Kragujevac, Serbia & Montenegro
Received 6 January 2003
Use of equalized electronegativity (Xeq) in modeling toxicity of nitrobenzene derivatives is discussed. Results show that more reliable models can be obtained when Xeq is combined with topological indices. The results are discussed by using multiple regression and cross-validation procedures.
An important field of predictive toxicology is the development of quantitative structure-toxicity relationships (QSTR) for modeling toxicity of chemicals [1,2]. QSTRs can be used for the preliminary evaluation of the potential hazards of chemical compounds. The number of such mathematical models is increasing and there are numerous applications. For QSTR analysis the toxicological data and pertinent structure-descriptors of the chemicals are used.
Recently the toxicity of soluble organic ions has been investigated by using topological indices [3-8]. Other applications of topological indices in QSAR/QSTR studies have also been reported [9,10]. A topological index is a numerical descriptor of molecular structure and is sensitive to such key constitutional features as size, shape, symmetry, branching, and heterogeneity of the molecule [11,12].
Of course, for modeling toxicity of chemicals one may also employ molecular descriptors other than topological indices [9,10]. The present work investigates the equalized electronegativity (Xeq) [13,14] for modeling toxicity of nitrobenzene derivatives (Table 1). Index Xeq has already been used for modeling physiological activity [15-18]. As discussed below, we have observed that Xeq can also be used for modeling toxicity and that reasonably good results are obtained when Xeq is combined with topological indices, including an indicator parameter. The topological indices used in combination with Xeq are the Wiener [19] (W), hyper-Wiener [20-26] (WW), Balaban [27] (J), first-order valence connectivity [28] (1Xv), and logRB [29] indices. The indicator parameter, IP, is equal to 1 if a halogen is present, and is equal to zero if there is no halogen substituent. The results indicate that Xeq is the most important parameter.
Methodology
Dataset
The data for 43 nitrobenzene derivatives, with experimental toxicity against Tetrahymena pyriformis, have been reported in the literature [30,31]. The toxicities are the 50% growth inhibitory concentrations of nitrobenzenes against the Tetrahymena pyriformis strain GL-C, given as log(IGC50^-1). Details concerning the experiment are given in the work of Cronin et al. [30]. The experimental values of log(IGC50^-1) are listed in Table 1.
Equalized electronegativity (Xeq)
The charge conservation equation leads to the following expression for equalized electronegativity (Xeq) [13,14]:
$$X_{eq} = \frac{N}{\sum (v / X)} \qquad (1)$$
where N = \sum v is the total number of atoms in the molecule, v denotes the number of atoms of a particular element in the molecule, and X is the electronegativity of that particular atom.
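As a check on Eq. (1), the following minimal sketch evaluates Xeq from a molecular formula. The electronegativity values in the dictionary (C 2.5, H 2.1, N 3.0, O 3.5, Cl 3.0) are an assumption: the paper does not list the values it used, but these rounded Pauling values appear to reproduce the Xeq entries of Table 2 (for example 2.4623 for nitrobenzene). The function and variable names are illustrative only.

```python
# Assumed rounded Pauling electronegativities (not stated in the paper).
ELECTRONEGATIVITY = {"C": 2.5, "H": 2.1, "N": 3.0, "O": 3.5, "Cl": 3.0}


def equalized_electronegativity(composition):
    """Eq. (1): Xeq = N / sum(v / X), with N the total number of atoms.

    composition maps an element symbol to the number of its atoms (v).
    """
    n_atoms = sum(composition.values())
    denominator = sum(
        count / ELECTRONEGATIVITY[element]
        for element, count in composition.items()
    )
    return n_atoms / denominator


# Hydrogens are counted here: Eq. (1) uses the full molecular formula.
nitrobenzene = {"C": 6, "H": 5, "N": 1, "O": 2}
methylnitrobenzene = {"C": 7, "H": 7, "N": 1, "O": 2}
dichloronitrobenzene = {"C": 6, "H": 3, "N": 1, "O": 2, "Cl": 2}

for name, formula in [
    ("nitrobenzene", nitrobenzene),
    ("methylnitrobenzene", methylnitrobenzene),
    ("dichloronitrobenzene", dichloronitrobenzene),
]:
    print(f"{name}: Xeq = {equalized_electronegativity(formula):.4f}")
```

Because Eq. (1) depends only on the elemental composition, positional isomers (for example the 2-, 3- and 4-methyl derivatives) necessarily share the same Xeq.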
Topological indices
All topological indices used (W, WW, J, 1Xv, and logRB) were calculated from hydrogen-depleted graphs. Their definitions and methods of calculation are described elsewhere [19-29]. Within the present investigation they were calculated using a computer program developed by one of us (I.L.).
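To illustrate what "calculated from hydrogen-depleted graphs" involves, the sketch below computes W, WW and J for nitrobenzene from its nine heavy atoms. It is not the authors' program. It assumes the common definitions W = sum of all pairwise graph distances, WW = (1/2) sum of (d + d^2) over all pairs, and the unweighted Balaban index J = m/(mu + 1) times the sum over edges of (s_u s_v)^(-1/2), where s is the distance sum of a vertex and mu the number of rings. Under these assumptions the values agree with the nitrobenzene entries of Table 2 (W = 88, WW = 176, J = 2.2284). The vertex numbering is arbitrary.

```python
from collections import deque
from math import sqrt


def distance_matrix(adjacency):
    """All-pairs shortest path lengths by breadth-first search."""
    n = len(adjacency)
    matrix = []
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        matrix.append(dist)
    return matrix


def wiener_type_indices(edges, n_vertices):
    """Return (W, WW, J) for a connected hydrogen-depleted graph."""
    adjacency = [[] for _ in range(n_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    dist = distance_matrix(adjacency)
    pairs = [dist[i][j] for i in range(n_vertices) for j in range(i + 1, n_vertices)]
    wiener = sum(pairs)
    # d * (d + 1) is always even, so the integer division is exact.
    hyper_wiener = sum(d * d + d for d in pairs) // 2
    n_edges = len(edges)
    n_rings = n_edges - n_vertices + 1  # cyclomatic number (mu)
    dist_sums = [sum(row) for row in dist]
    balaban = n_edges / (n_rings + 1) * sum(
        1.0 / sqrt(dist_sums[u] * dist_sums[v]) for u, v in edges
    )
    return wiener, hyper_wiener, balaban


# Nitrobenzene: ring carbons 0-5, N = 6 (bonded to carbon 0), O = 7 and 8.
NITROBENZENE_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
    (0, 6), (6, 7), (6, 8),
]

W, WW, J = wiener_type_indices(NITROBENZENE_EDGES, 9)
print(f"W = {W}, WW = {WW}, J = {J:.4f}")
```

Heteroatoms are treated like carbon here, which is why, for example, 2-methyl- and 2-chloronitrobenzene share the same W in Table 2.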
Indicator parameter (IP)
It is a dummy parameter used to indicate the presence (IP=1) of a halogen atom in the respective nitrobenzene derivative. If no halogen atom is present, then IP=0.
Statistics
Regression analysis was carried out using a computer program, adapted to a PC by one of us (I.L.). In order to obtain appropriate models, we used the maximum R^2 method [31,32]. In addition, we also calculated the quality factor [33], Q, as the ratio of the correlation coefficient (R) and the standard error of estimation (Se), i.e., Q = R/Se. Finally, the cross-validation method was used to establish the predictive potential of our models.
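The quality factor is simple enough to verify by hand. The short sketch below recomputes Q = R/Se from the r and Se values reported with Eqs (2)-(6); the dictionary layout and function name are illustrative, not part of the original work.

```python
# (r, Se) as reported with each regression equation in the text.
REPORTED = {
    "Eq. (2)": (0.8823, 0.3341),
    "Eq. (3)": (0.9249, 0.2741),
    "Eq. (4)": (0.9277, 0.2730),
    "Eq. (5)": (0.9413, 0.2504),
    "Eq. (6)": (0.9500, 0.2351),
}


def quality_factor(r, se):
    """Q = R / Se: higher correlation and lower standard error both raise Q."""
    return r / se


for label, (r, se) in REPORTED.items():
    print(f"{label}: Q = {quality_factor(r, se):.4f}")
```

Since Q rewards a larger r and a smaller Se at the same time, it rises steadily from the one-parameter to the five-parameter model.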
Results and Discussion
A preliminary analysis revealed that out of the original set of 43 nitrobenzene derivatives [30,34], four compounds (Comp. nos. 2, 24, 27, 43 in Table 1) are serious outliers. They were, therefore, deleted from the present study and we considered only 39 compounds (Table 1). The toxicity of these compounds and the value of the indicator parameter are also given in Table 1. In addition, values of indices Xeq, W, WW, J, 1Xv, and logRB are listed in Table 2.
The intercorrelation of the aforementioned topological indices and their relationship with toxicity (log(IGC50^-1)) is presented in Table 3.
A perusal of Table 3 shows that Xeq is the most suitable molecular descriptor for modeling toxicity of the compounds studied (r = 0.88), and that the topological indices W, WW, J, and logRB have mutually similar, though lower, potential (r between 0.73 and 0.75). On the other hand, the first-order valence connectivity index (1Xv) does not correlate with the activity (r = 0.24).
The data presented in Table 3 show that W, WW, and logRB are highly correlated with each other. A comparatively smaller colinearity is observed for the J index.
Table 1 - Set of 39 compounds used in the present study.
No.  Compound                                  log(IGC50^-1)  IP
1    2,6-Dimethylnitrobenzene                  0.30           0
2    2,3-Dimethylnitrobenzene                  0.56           0
3    2-Methyl-3-chloronitrobenzene             0.68           0
4    2-Methylnitrobenzene                      0.052          0
5    2-Chloronitrobenzene                      0.68           0
6    2-Methyl-5-chloronitrobenzene             0.82           0
7    2,4,5-Trichloronitrobenzene               1.53           0
8    2,5-Dichloronitrobenzene                  1.13           0
9    6-Chloro-1,3-dinitrobenzene               1.98           1
10   Nitrobenzene                              0.14           0
11   3-Methylnitrobenzene                      0.054          0
12   1,3-Dinitrobenzene                        0.89           0
13   3,4-Dichloronitrobenzene                  1.16           0
14   4-Methylnitrobenzene                      0.17           0
15   1,4-Dinitrobenzene                        1.30           0
16   4-Chloronitrobenzene                      0.43           0
17   2,3,5,6-Tetrachloronitrobenzene           1.82           0
18   6-Methyl-1,3-dinitrobenzene               0.87           0
19   3-Chloronitrobenzene                      0.73           0
20   1,2-Dinitrobenzene                        1.25           0
21   2-Bromonitrobenzene                       0.75           0
22   6-Bromo-1,3-dinitrobenzene                2.31           1
23   4-Bromonitrobenzene                       0.38           0
24   2,4,6-Trimethylnitrobenzene               0.86           0
25   2,4-Dichloronitrobenzene                  0.99           0
26   3,5-Dichloronitrobenzene                  1.13           0
27   6-Iodo-1,3-dinitrobenzene                 2.12           1
28   2,3,4,5-Tetrachloronitrobenzene           1.78           0
29   2,3-Dichloronitrobenzene                  1.07           0
30   2,5-Dibromonitrobenzene                   1.37           0
31   1,2-Dichloro-4,5-dinitrobenzene           2.21           1
32   3-Methyl-4-bromonitrobenzene              1.16           0
33   2,3,4-Trichloronitrobenzene               1.51           0
34   2,4,6-Trichloronitrobenzene               1.43           0
35   4,6-Dichloro-1,2-dinitrobenzene           2.42           1
36   3,5-Dinitrobenzyl alcohol                 0.53           0
37   3,4-Dinitrobenzyl alcohol                 1.09           0
38   2,4,6-Trichloro-1,3-dinitrobenzene        2.19
39   2,3,5,6-Tetrachloro-1,4-dinitrobenzene    2.74
1-Parameter model
We obtained fairly significant equations for all six parameters (Xeq, W, WW, J, logRB and IP), accounting for toxicity (log(IGC50^-1)), but the best results were obtained with Xeq:
$$\log(\mathrm{IGC}_{50}^{-1}) = -11.7495 + 4.9989\,(\pm 0.4374)\,X_{eq} \qquad (2)$$
n = 39, Se = 0.3341, r = 0.8823, F = 130.631, Q = 2.6408
Table 2 - Topological indices used in the present study
No.  W    1X      J       Sz   logRB     WW   Xeq     0Xv     1Xv
1    144  5.1259  2.5512  224  46.7308   296  2.3836  6.4958  3.3328
2    146  5.1259  2.5085  228  47.1771   306  2.3836  6.4958  3.3328
3    146  5.1259  2.5085  228  47.1771   306  2.4655  5.8738  3.0217
4    114  4.7152  2.3960  180  36.5035   231  2.4154  5.5732  2.9161
5    114  4.7152  2.3960  180  36.5035   231  2.5257  4.9511  2.6051
6    148  5.1090  2.4705  232  47.6879   315  2.4655  5.8738  3.0158
7    187  5.5197  2.5514  292  60.3356   417  2.6630  5.5523  2.8104
8    148  5.1091  2.4705  232  47.6879   315  2.5926  5.2517  2.7047
9    240  6.0197  2.5260  360  76.8953   576  2.6794  6.1374  3.1045
10   88   4.3045  2.2284  142  27.6625   176  2.4623  4.6505  2.4994
11   117  4.6984  2.3199  186  37.2374   245  2.4154  5.5731  2.9101
12   197  5.6091  2.4024  296  62.8613   464  2.6168  5.8368  2.9989
13   152  5.1091  2.3956  240  48.4988   337  2.5926  5.2517  2.7048
14   120  4.6984  2.2599  192  37.8252   262  2.4154  5.5731  2.9101
15   206  5.6091  2.2951  314  64.4304   521  2.6168  5.8368  2.9989
16   120  4.6984  2.2599  192  37.8252   262  2.5258  4.9511  2.5991
17   222  5.9473  2.7557  342  72.6831   486  2.7374  5.8529  2.9220
18   240  6.0197  2.5260  360  76.8953   576  2.5446  6.7595  3.4155
19   117  4.6984  2.3199  186  37.2374   245  2.5258  4.9511  2.5991
20   188  5.6259  2.5409  278  60.9518   416  2.6168  5.8368  3.0049
21   114  4.7152  2.3960  180  36.5035   231  2.5150  4.9511  2.6051
22   240  6.0197  2.5260  360  76.8953   576  2.6688  6.1374  3.1045
23   120  4.6984  2.2599  192  37.8252   262  2.5150  4.9511  2.5991
24   184  5.5197  2.6021  286  59.6662   402  2.3607  7.4184  3.7435
25   150  5.1091  2.4372  236  48.0526   327  2.5926  5.2517  2.7047
26   150  5.0922  2.4273  236  48.1987   324  2.5926  5.2517  2.6988
27   240  6.0197  2.5260  360  76.8953   576  2.6629  6.1374  3.1045
28   226  5.9473  2.7007  350  73.4941   508  2.7374  5.8529  2.9220
29   146  5.1259  2.5085  228  47.1771   306  2.5926  2.2517  2.7107
30   148  5.1091  2.4705  232  47.6879   315  2.5699  2.2517  2.3268
31   283  6.4304  2.6976  422  91.5862   674  2.7451  6.4380  3.2102
32   84   4.1984  2.3462  144  26.4585   160  2.4570  5.8737  3.0158
33   185  5.5366  2.5842  288  59.8248   408  2.6630  5.5523  2.8164
34   184  5.5197  2.6021  286  59.6662   402  2.6630  5.5523  2.8104
35   278  6.4304  2.7521  412  90.5522   647  2.7451  6.4380  3.2102
36   296  6.5409  2.5643  437  94.6997   738  2.5798  6.9138  3.5793
37   293  6.5577  2.6048  431  93.8686   727  2.5798  6.9138  3.5853
38   332  6.8579  2.8403  494  107.8003  799  2.8141  6.7387  3.3218
39   390  7.2855  2.9443  582  126.4037  963  2.8866  7.8558  3.4334
Table 3 - Correlation matrix
               log(IGC50^-1)  W       WW      J       logRB   1Xv     Xeq     IP
log(IGC50^-1)  1.0000
W              0.7463         1.0000
WW             0.7253         0.9970  1.0000
J              0.7484         0.8238  0.7794  1.0000
logRB          0.7514         0.9997  0.9949  0.8366  1.0000
1Xv            0.2435         0.6620  0.6621  0.5663  0.6611  1.0000
Xeq            0.8828         0.7630  0.7492  0.6902  0.7662  0.0812  1.0000
IP             0.7687         0.6770  0.6741  0.5452  0.6777  0.3479  0.6273  1.0000
Note that Xeq is an electronic parameter. The positive coefficient of Xeq in Eq. (2) indicates that electronegativity of the substituents of the benzene ring plays an important role in toxicity. This leads to the conclusion that the toxic effect can be minimized by the attachment of a less electronegative group to the benzene ring.
The main objective of the present investigation was to examine the role of Xeq in explaining toxicity of the nitrobenzene derivatives. Therefore, in what follows, all the multi-parameter models examined contain Xeq.
2-Parameter model
Successive regression analysis resulted in several binary combinations of Xeq with the topological indices used. The best 2-parametric model contained Xeq and IP:
$$\log(\mathrm{IGC}_{50}^{-1}) = -8.6181 + 3.7403\,(\pm 0.4608)\,X_{eq} + 0.6394\,(\pm 0.1469)\,IP \qquad (3)$$
n = 39, Se = 0.2741, r = 0.9249, F = 106.487, Q = 3.3743
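To make the use of such a regression concrete, the following sketch applies the coefficients of Eq. (3) to three compounds, taking Xeq from Table 2 and IP and the observed log(IGC50^-1) from Table 1. The function name and the choice of compounds are illustrative only.

```python
def predict_eq3(x_eq, ip):
    """Predicted log(IGC50^-1) from Eq. (3)."""
    return -8.6181 + 3.7403 * x_eq + 0.6394 * ip


# (compound, Xeq from Table 2, IP from Table 1, observed log(IGC50^-1))
EXAMPLES = [
    ("Nitrobenzene", 2.4623, 0, 0.14),
    ("2,5-Dichloronitrobenzene", 2.5926, 0, 1.13),
    ("6-Chloro-1,3-dinitrobenzene", 2.6794, 1, 1.98),
]

for name, x_eq, ip, observed in EXAMPLES:
    predicted = predict_eq3(x_eq, ip)
    print(f"{name}: predicted {predicted:.2f}, observed {observed:.2f}, "
          f"residual {observed - predicted:+.2f}")
```

Individual residuals can be noticeably larger than Se = 0.2741, as for nitrobenzene here, since Se summarizes the scatter over all 39 compounds.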
Here, both parameters have positive coefficients, and therefore, with increasing value of Xeq and IP, toxicity also increases.
The positive coefficient of IP in Eq. (3) indicates that halogenation of nitrobenzene derivatives causes higher toxicity.
The regression parameters and quality of the model expressed by Eq. (3) indicate that addition of IP significantly improves the correlation: r increases from 0.8823 to 0.9249, and the quality factor Q increases from 2.6408 to 3.3743. This improvement is the result of taking into account the halogenation.
3-Parameter model
The best equation contains the following independent variables: Xeq, 1Xv and IP:
$$\log(\mathrm{IGC}_{50}^{-1}) = -9.3597 + 3.8399\,(\pm 0.4672)\,X_{eq} + 0.1672\,(\pm 0.1467)\,{}^{1}X^{v} + 0.5704\,(\pm 0.1583)\,IP \qquad (4)$$
n = 39, Se = 0.2730, r = 0.9277, F = 72.015, Q = 3.3981
All regression coefficients (except the intercept) are positive. Note that the slight improvement (relative to Eq. (3)) is due to the addition of the term 1Xv.
The first-order valence connectivity index (1Xv) distinguishes the degree of unsaturation and the presence of a heteroatom. Hence, the positive coefficient of the 1Xv term in model 3 (Eq. (4)) implies that unsaturation and the presence of heteroatoms lead to increased toxicity.
4-Parameter model
The best four-parametric model, containing indices Xeq, W, J and IP, is as follows:
$$\log(\mathrm{IGC}_{50}^{-1}) = -10.7796 + 3.4295\,(\pm 0.5301)\,X_{eq} - 0.0022\,(\pm 0.0012)\,W + 1.3473\,(\pm 0.4455)\,J + 0.6671\,(\pm 0.1463)\,IP \qquad (5)$$
n = 39, Se = 0.2504, r = 0.9413, F = 66.106, Q = 3.760
The coefficients pertaining to the indices Xeq, J, and IP are positive. The negative coefficient of index W may be due to colinearity. Note also, that the Balaban index J is a highly discriminating descriptor, whose value does not substantially increase with the molecular size and the number of rings present. The positive coefficient of J is in accordance with the negative coefficient of index W.
5-Parameter model
The best five-parameter equation found is
$$\log(\mathrm{IGC}_{50}^{-1}) = -12.2443 + 3.2043\,(\pm 0.5082)\,X_{eq} + 0.5812\,(\pm 0.1896)\,W - 0.0573\,(\pm 0.0171)\,WW - 1.3193\,(\pm 0.4449)\,\log RB + 0.7478\,(\pm 0.1411)\,IP \qquad (6)$$
n = 39, Se = 0.2351, r = 0.9500, F = 61.071, Q = 4.0408
The negative signs of the coefficients of the indices WW and logRB might be caused by colinearity.
It is worth mentioning that W, WW, and logRB are highly intercorrelated. Therefore, this model may suffer from a defect due to colinearity. However, this defect is not that serious because the coefficients of all these terms in the above model are appreciably larger than their respective standard deviations.
A similar problem was recently investigated by Randic [26]. He stated that one should particularly be aware of a common pitfall in all regression analysis when the descriptors are highly interrelated. He further maintained that by discarding one of the descriptors that duplicates another (in one domain) we may discard a descriptor that may carry useful structural information in another domain, in which it does not parallel the other descriptors. Bearing this in mind, we conclude that the highly correlated topological indices W, WW, and logRB may be retained and that their simultaneous presence (as in Eq. (6)) need not be unjustified.
Table 4 - Cross-validation parameters
Model  Parameters              PRESS   SSY      PRESS/SSY  r^2_cv  R^2_A   SPRESS  PSE
1      Xeq (Eq. 2)             4.1289  14.5774  0.2832     0.7168          0.3341  0.3254
2      Xeq, IP (Eq. 3)         2.7048  16.0015  0.1690     0.8310  0.8474  0.2741  0.2634
3      Xeq, 1Xv, IP (Eq. 4)    2.6080  16.0983  0.1620     0.8380  0.8486  0.2730  0.2586
4      Xeq, W, J, IP (Eq. 5)   2.1312  16.5751  0.1286     0.8714  0.8727  0.2504  0.2338
5      Xeq, W, WW, logRB, IP   1.8244  16.8819  0.1080     0.8919  0.8877  0.2351  0.2163
       (Eq. 6)
Predictive potential
It is worth noting that regression equations with excellent correlation coefficients need not have excellent predictive potential. Hence, one has to determine the predictive potential of the models, too. Cross-validation is the most appropriate method for this [31,32].
In the present case we have estimated cross-validation parameters [31,32] for all the five proposed models and these are recorded in Table 4.
PRESS (predicted residual sum of squares) is a good estimate of the real predictive error of the model. If PRESS is smaller than the sum of the squares of the response values (SSY), then the model predicts better than chance and can be considered statistically significant.
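The paper does not spell out how PRESS was obtained. A common choice, assumed here, is leave-one-out cross-validation: each compound is predicted from a model fitted to all the others, and the squared prediction errors are summed. The sketch below does this for a one-descriptor model on six (Xeq, log(IGC50^-1)) pairs taken from Tables 1 and 2; with such a small subset the numbers are purely illustrative and will not match Table 4.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def press_leave_one_out(xs, ys):
    """Sum of squared errors when each point is predicted without itself."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return press


# Compounds 1, 5, 8, 9, 10 and 12: Xeq (Table 2) and log(IGC50^-1) (Table 1).
xeq = [2.3836, 2.5257, 2.5926, 2.6794, 2.4623, 2.6168]
tox = [0.30, 0.68, 1.13, 1.98, 0.14, 0.89]

a, b = fit_line(xeq, tox)
rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xeq, tox))
press = press_leave_one_out(xeq, tox)
print(f"residual sum of squares = {rss:.4f}, PRESS = {press:.4f}")
```

PRESS is never smaller than the ordinary residual sum of squares of the full fit, which is why it gives a more cautious estimate of predictive error.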
A perusal of Table 4 shows that in all the five cases PRESS << SSY, indicating them to be statistically significant. This provides additional support for retaining highly correlated parameters in the proposed models.
The ratio PRESS/SSY can be used to estimate the confidence interval of the predicted toxicity [32,33]. To have a reliable QSAR or QSTR model, this ratio should be smaller than 0.4. A value of this ratio less than 0.1 indicates that we have a reasonably good model. For Eq. (6) the value of PRESS/SSY is about 0.1. In the other cases PRESS/SSY was found to be less than 0.4 and, therefore, these equations are also reasonable QSAR/QSTR models.
The overall predictive ability is judged from the highest r^2_cv (cross-validation correlation coefficient) value. In our case r^2_cv is found to be the highest for model 5 (Eq. (6)), indicating again that it has an excellent predictive potential.
Another cross-validation parameter used for deciding predictive potential is the uncertainty of prediction, SPRESS. However, in the present case (Table 4), SPRESS is found to be the same as the standard error of estimation (Se).
Perusal of Table 4 shows that PSE, i.e. the predictive squared error, can be used successfully for deciding uncertainty of the prediction. The PSE is found to be the lowest for model 5 (Eq. (6)), showing that this model has excellent correlation as well as predictive potential.
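The derived columns of Table 4 can be recomputed from PRESS and SSY alone. The sketch below assumes r^2_cv = 1 - PRESS/SSY and PSE = sqrt(PRESS/n) with n = 39; the paper does not state these formulas, but under these assumptions the tabulated values are recovered to within rounding. Names are illustrative.

```python
from math import sqrt

N_COMPOUNDS = 39

# (PRESS, SSY) for models 1-5 as listed in Table 4.
TABLE4 = {
    1: (4.1289, 14.5774),
    2: (2.7048, 16.0015),
    3: (2.6080, 16.0983),
    4: (2.1312, 16.5751),
    5: (1.8244, 16.8819),
}


def cross_validation_summary(press, ssy, n=N_COMPOUNDS):
    """Derived cross-validation statistics (assumed definitions, see text)."""
    ratio = press / ssy
    return {"PRESS/SSY": ratio, "r2cv": 1.0 - ratio, "PSE": sqrt(press / n)}


for model, (press, ssy) in TABLE4.items():
    summary = cross_validation_summary(press, ssy)
    print(model, {key: round(value, 4) for key, value in summary.items()})
```

All three statistics improve monotonically from model 1 to model 5, in line with the discussion above.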
At this stage it is interesting to comment upon the adjusted R^2 (R^2_A) coefficients. R^2_A is a measure of the percentage of explained variation in the dependent variable that takes into account the relationship between the number of cases and the number of independent variables in the regression model. Whereas R^2 never decreases when an independent variable is added, R^2_A will decrease if the added variable does not reduce the unexplained variation enough to offset the loss of degrees of freedom. Therefore, if a variable is added that does not contribute its fair share, R^2_A will actually decline. In our case (Table 4), R^2_A increases with increasing number of parameters. This indicates that each new parameter has a fair share in the proposed model. It should be noted that the coefficient R^2 may appear artificially high if the number of variables is high compared to the sample size.
Hence, both R^2 and R^2_A indicate that the derived equations of toxicity of nitrobenzene derivatives are significant at confidence level p < 0.05.
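The behaviour of R^2_A described above follows from its usual definition, R^2_A = 1 - (1 - R^2)(n - 1)/(n - p - 1), where p is the number of independent variables. The paper does not give the formula, so this is an assumption; applied to the reported r values with n = 39 it reproduces the R^2_A column of Table 4 to within rounding.

```python
def adjusted_r2(r, n_cases, n_predictors):
    """Adjusted R^2 from the multiple correlation coefficient r."""
    unexplained = 1.0 - r * r
    return 1.0 - unexplained * (n_cases - 1) / (n_cases - n_predictors - 1)


# (r reported in the text, number of independent variables) for Eqs (3)-(6).
MODELS = {
    "Eq. (3)": (0.9249, 2),
    "Eq. (4)": (0.9277, 3),
    "Eq. (5)": (0.9413, 4),
    "Eq. (6)": (0.9500, 5),
}

for label, (r, p) in MODELS.items():
    print(f"{label}: R2 = {r * r:.4f}, adjusted R2 = {adjusted_r2(r, 39, p):.4f}")
```

The gain from Eq. (3) to Eq. (4) is only about 0.001 in R^2_A, which matches the "slight improvement" noted for the 1Xv term.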
References
1 Karcher W & Devillers J, Practical applications of quantitative structure-activity relationships (QSAR) in environmental chemistry and toxicology, (Kluwer Academic, Dordrecht), 1990.
2 Lewis D F V, Computer-assisted methods in the evaluation of chemical toxicity, in Rev comp Chem, Vol. III, (VCH, New York).
3 Khadikar P V, Phadnis A & Shrivastava A, Bioorg med Chem, 10 (2002) 1181.
4 Khadikar P V, Mathur K C, Singh S, Phadnis A, Shrivastava A & Mandloi M, Bioorg med Chem, 10 (2002) 1761.
5 Khadikar P V, Karmarkar S, Singh S & Shrivastava A, Bioorg med Chem, 10 (2002) 3163.
6 Agrawal V K & Khadikar P V, Bioorg med Chem, 10 (2002) 3517.
7 Agrawal V K & Khadikar P V, Bioorg med Chem, 9 (2001) 3035.
8 Karmarkar S, Saxena A, Varma R G, Mathur K C, Mathur S, Singh S & Khadikar P V, Poll Res, 19 (2000) 337.
9 Karelson M, Molecular descriptors in QSAR/QSPR, (Wiley, New York) 2000.
10 Cronin M T D & Schultz T W, Chem Res Toxicol, 14 (2001) 1284.
11 Trinajstic N, Chemical graph theory, (CRC, Boca Raton), 1992.
12 Gutman I & Polansky O E, Mathematical concepts in organic chemistry, (Springer-Verlag, Berlin), 1986.
13 Pauling L, The nature of the chemical bond, (Cornell Univ Press, Ithaca, New York) 1969.
14 Wells P R, Progress in physical organic chemistry, Vol. 6, (Interscience, New York), 1968.
15 Agrawal V K, Joseph S, Khadikar P V & Karmarkar S, Acta Pharm, 50 (2000) 329.
16 Agrawal V K & Khadikar P V, Oxid Commun, 25 (2002) 184.
17 Agrawal V K & Khadikar P V, Bulg chem Commun (In press).
18 Agrawal V K, Joseph S & Khadikar P V, Nat Acad Sci Letter, 23 (2000) 57.
19 Wiener H, J Am chem Soc, 69 (1947) 17.
20 Randic M, Chem Phys Lett, 211 (1993) 478.
21 Lukovits I, J chem Inf Comput Sci, 34 (1994) 1079.
22 Lukovits I, A formula for the hyper-Wiener index, in QSAR and molecular modeling: Concepts, computational tools and biological applications, edited by F Sanz, J Giraldo & F Manaut (Prous Science, Barcelona) 1995.
23 Lukovits I, Comput Chem, 19 (1995) 27.
24 Gutman I, Linert W, Lukovits I & Dobrynin A A, J chem Inf Comput Sci, 37 (1997) 349.
25 Lukovits I & Linert W, J chem Inf Comput Sci, 34 (1994) 899.
26 Randic M, Croat chem Acta, 66 (1993) 289.
27 Balaban A T, Chem Phys Lett, 89 (1982) 399.
28 Kier L B & Hall L H, Molecular connectivity in chemistry and drug research, (Academic Press, New York), 1976.
29 Todeschini R & Consonni V, Handbook of molecular descriptors, (Wiley-VCH, Weinheim), 2000.
30 Cronin M T D, Gregory B W & Schultz T W, Chem Res Toxicol, 11 (1998) 902.
31 Box G E P, Hunter W G & Hunter J S, Statistics for experimenters, (Wiley, New York), 1978.
32 Chatterjee S, Hadi A S & Price B, Regression analysis by example, (Wiley, New York), 2000.
33 Pogliani L, Amino acids, 6 (1994) 141.
34 Estrada E & Uriarte E, SAR QSAR environ Res, 12 (2001) 309.