
ORIGINAL RESEARCH
QSAR study of some CCR5 antagonists as anti-HIV agents using radial basis function neural network and general regression neural network on the basis of principal components
Mohsen Shahlaei • Armin Madadkar-Sobhani •
Afshin Fassihi • Lotfollah Saghaie • Elham Arkan
Received: 2 August 2011 / Accepted: 31 October 2011 / Published online: 16 November 2011
© Springer Science+Business Media, LLC 2011
Abstract Quantitative relationships between the molecular structures and bioactivities of a set of CCR5 inhibitor derivatives were established. We demonstrate the detailed application of two efficient nonlinear methods, general regression and radial basis function neural networks, for evaluating the quantitative structure–activity relationships of the studied compounds. Components produced by principal component analysis were used as inputs of the developed nonlinear models. A comparison of the predictive ability of PC-GRNNs and PC-RBFNNs indicated that the latter method predicts the activity of the studied molecules better. Because the experimental and calculated activities of the molecules used in the model development step showed a good correlation, the developed PC-RBFNNs QSAR model was used to calculate the inhibitory activities of some suggested compounds, with the aim of designing novel inhibitor derivatives with high activity and low side effects.
Keywords QSAR • CCR5 antagonists • Principal components analysis • Radial basis function neural networks • General regression neural networks
Introduction
Acquired immunodeficiency syndrome (AIDS) is a fatal disease for which no complete and successful chemotherapy has been found so far. Human immunodeficiency virus subtype 1 (HIV-1), a retrovirus of the lentivirus family, is the common cause of this disorder. HIV-1 produces a progressive immunosuppression by destroying CD4+ T lymphocytes, the helper cells that act directly against infections, which results in opportunistic infections and death (Campiani et al., 2002). To decrease HIV-1 replication, existing medications use a combination of protease and reverse transcriptase inhibitors (Pomerantz, 1999). While suppression of viral replication through therapy with protease and reverse transcriptase inhibitors delays the emergence of AIDS, the virus is not eliminated and the immune system finally succumbs to infection (Chun and Fauci, 1999; Finzi et al., 1999; Furtado et al., 1999).
In 1996, it was shown that infection of macrophages, monocytes, and T cells by HIV-1 is mediated by interaction not only with the cell surface molecule CD4 but also with the β-chemokine receptor CCR5, which belongs to the G-protein coupled receptor (GPCR) family (Dragic et al., 1996).
This finding prompted intensive research efforts that resulted in the development of CCR5 antagonists as potential anti-HIV therapeutic compounds.
M. Shahlaei (✉)
Department of Medicinal Chemistry, Faculty of Pharmacy,
Kermanshah University of Medical Sciences, Kermanshah, Iran
e-mail: [email protected]
M. Shahlaei • A. Fassihi • L. Saghaie
Department of Medicinal Chemistry, Faculty of Pharmacy,
Isfahan University of Medical Sciences, 81746-73461 Isfahan,
Iran
e-mail: [email protected]
A. Madadkar-Sobhani
Department of Life Sciences, Barcelona Supercomputing Center,
C/ Jordi Girona 31, Edificio Nexus II, 08028 Barcelona, Spain
A. Madadkar-Sobhani
Department of Bioinformatics, Institute of Biochemistry and
Biophysics, University of Tehran, Tehran, Iran
E. Arkan
Department of Medical Nanotechnology, School of Advance
Medical Technologies, Tehran University of Medical Sciences,
Tehran, Iran
Med Chem Res (2012) 21:3246–3262
DOI 10.1007/s00044-011-9863-2
MEDICINAL CHEMISTRY RESEARCH

In addition to its function in the development of HIV infection, many studies show roles for CCR5 and its ligands in other disorders, such as rheumatoid arthritis (Pipitone and Pitzalis, 2000), multiple sclerosis (Sellebjerg et al., 2000), transplant rejection (Fischereder et al., 2001), and inflammatory bowel syndrome (Andres et al., 2000). These studies suggest that CCR5 receptor modulators could be beneficial in a wide variety of disorders.
In the medical sciences, many researchers are continuously searching for new drug-like compounds with high potency against AIDS. Computer-aided drug-design methods, such as quantitative structure–activity relationships (QSARs), can guide and accelerate the development of such ligands. The drug discovery process often faces situations in which a ligand must be discovered for a target protein for which no experimentally verified structure is yet available. The best-known examples are possibly the GPCRs, which play a key role in many physiological and pathophysiological processes. Today, 50% of all newly launched drugs are targeted against GPCRs (Klabunde and Hessler, 2002). Hence, an approach for suggesting new agonists and/or antagonists of GPCRs is highly important. Because of the lack of knowledge about the 3D structure of CCR5 and the way its antagonists interact with the binding site, ligand-based methods are among the best ways to identify initial lead compounds. Pharmacophore identification, structure–activity relationships (SARs), and QSARs are some examples of ligand-based methods.
More than one hundred years after Fischer's proposal of the lock-and-key analogy (Fischer, 1894) and about half a century after the reports of Hansch, Fujita, Free, and Wilson (Free and Wilson, 1964; Hansch and Fujita, 1964), QSARs have become established as an extensively used approach that contributes significantly to the drug-design process. QSAR studies provide medicinal chemists with valuable information for drug design and for the prediction of drug activity (Hansch et al., 1996, 2001; Schmidli, 1997). They are frequently applied to correlate various physicochemical properties of potential drug candidates, including quantum, geometrical, and topological properties, with their binding affinity towards a common biological target.
Several data mining methodologies have been developed to evaluate quantitative relationships between structure and activity. These relationships may be expressed as linear or nonlinear regression models. Multiple linear regression (MLR), partial least squares (PLS), and principal component regression (PCR) are the most common linear regression methods (Arkan et al., 2010; Saghaie et al., 1111; Saghaie et al., 2010; Shahlaei et al., 2010b). Artificial neural networks (ANNs) and support vector machines (SVMs) are examples of nonlinear regression approaches for building mathematical relationships between selected descriptors and the activity of the compounds (Arkan et al., 2010; Saghaie et al., 2010; Shahlaei et al., 2010a, b).
Nonlinear regression methods are often employed to
model the SARs because of the complexity of such rela-
tionships in many bioactive molecules. The development of
these methods also opened up the field to the simultaneous
analysis of a wider diversity of structures with potentially
varying modes of action and noncongeneric compounds
(Pompe et al., 1997).
Artificial neural systems emulate the function of the brain, where a very large number of information-processing neurons are interconnected. These systems are known for their capability to model a broad set of functions, both linear and nonlinear, without knowing the analytic forms in advance. The mathematical flexibility of ANNs makes them an efficient method for pattern recognition, regression, and the construction of predictive models. A particular benefit of ANNs is their intrinsic ability to represent nonlinear relationships between the dependent and independent variables without using an explicit mathematical equation. Although there are a number of different neural network models, the type most frequently used in QSAR is the feed-forward back-propagation network.
This algorithm has some disadvantages, such as becoming trapped in local minima during the learning phase, a slow convergence rate, time-consuming procedures, and difficulty in determining the optimum network configuration (Walczak and Massart, 2000).
Radial basis function neural networks (RBFNNs) allow a regression model between independent and dependent variables to be constructed using a fast linear approach. RBFNNs have the advantages of a short training time and of reaching the unique optimal solution by attaining the global minimum of the error surface during training. The topology and parameters of RBFNNs are straightforward to optimize (Specht, 1991; Tetteh et al., 1998).
The generalized regression neural network (GRNN) is a special kind of normalized radial basis function network, in which the sigmoid activation functions often used in neural networks are replaced by radial basis functions (Chtioui et al., 1999).
In contrast to back-propagation neural networks, the architecture of a GRNN, namely the numbers of layers and units, is defined by the numbers of objects and predictors in the training dataset. In this algorithm, the only training parameter to be optimized is the width (spread) of the radial basis functions. Owing to the simplicity of the network structure and its implementation, it has been used extensively in many fields (Celikoglu, 2006; Klocker et al., 2002; Mosier and Jurs, 2002; Niwa, 2003).
Thanks to modern software and hardware, numerous descriptors (variables in QSAR) can be obtained easily in a short time. To extract the most relevant and significant information from such raw data, multivariate regression methods together with data compression and/or variable selection are necessary. Multivariate regression methods are used to develop a quantitative relation, i.e., a model, between the calculated descriptors, stored in a data matrix X, and the activities of the compounds, stored in a data matrix Y. One of the most common problems in multivariate regression is multicollinearity between variables. Multicollinearity occurs when the ratio of descriptors to compounds is large, and it makes the developed model unstable. Reducing the dimensionality of the original descriptors is therefore necessary. One of the most common ways to reduce the number of variables without losing useful information is principal component analysis (PCA).
PCA is an approach suitable for overcoming the instability of a model caused by multicollinear descriptors. PCA compresses a pool of descriptors into principal components (PCs) that serve as new variables. The PCA approach assumes that, despite the large number of descriptors, the QSAR model is governed by a comparatively small number of latent variables, namely the PCs.
In the present study, two different ANN algorithms are used to correlate the anti-HIV activity of 48 drug-like molecules with the PCs extracted from calculated structural descriptors. These algorithms are (i) RBFNNs and (ii) GRNNs.
The information obtained from the QSAR is then used to design novel drug-like compounds. The key goal was to model potentially active anti-HIV molecules using two nonlinear models, namely PC-RBFNNs and PC-GRNNs.
Methods
Descriptor generation and assignment of calibration and test sets
The in vitro biological activity data used in this study were the CCR5 inhibitory activities (in terms of log IC50) of a set of 48 compounds selected from the literature (Dorn et al., 2001; Finke et al., 2001). The general chemical structures and structural details of these compounds, as well as the biological activity data, are given in Table 1.
To calculate the theoretical molecular descriptors, the molecular structures were built with Hyperchem version 7.0 and optimized using the AM1 method. A gradient cutoff of 0.01 was used for all geometry optimizations. The resulting geometries were transferred into the Dragon program (developed by the Milano Chemometrics and QSAR Group) (Todeschini et al., 2002), which was used to calculate a large number of descriptors belonging to 16 different groups. Descriptors with a constant value for all compounds were eliminated. The names and numbers of the calculated descriptors are reported in Table 2. In addition to these groups, a number of quantum-electronic descriptors, such as frontier orbital energies (HOMO and LUMO), most positive charge, lowest negative charge, and indices of electronegativity, electrophilicity, hardness, and softness, were calculated according to the method proposed by Thanikaivelan et al. (2000). Further descriptors, such as Log P, surface area, and polarizability, were calculated with Hyperchem for each molecule.
To test the performance of the final model, about 20% of the molecules (10 out of 48) were selected as external test set molecules (Table 1). Ideally, the dataset is divided so that both the training and test sets individually cover the total space occupied by the original data set, and each object in the test set is close to at least one object in the training set. Various methods have been used for splitting an original data set into training and test sets. According to Tropsha, the best models are built when the Kennard and Stone algorithm is used (Tropsha et al., 2003), so this algorithm was applied in this study (Kennard and Stone, 1969). This method has two advantages: the training set molecules map the measured region of the input variable space completely with respect to the induced metric, and the test molecules all fall inside the measured region.
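The Kennard and Stone selection can be illustrated with a minimal Python sketch. It is not the authors' MATLAB code: the function name `kennard_stone`, the use of Euclidean distance, and the random placeholder data are assumptions made only for illustration. The routine starts from the two most distant samples and then repeatedly adds the sample that lies farthest from the already selected set.

```python
import numpy as np

def kennard_stone(X, n_train):
    """Return (training indices, test indices) chosen by a Kennard-Stone style scheme."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all samples (assumed metric).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_train:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

# Placeholder data only: 48 random "molecules" in a 2-D score space.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(48, 2))
train_idx, test_idx = kennard_stone(X_demo, n_train=38)
```

Because the most extreme samples are picked first, the samples left over for testing tend to lie inside the region spanned by the training set, which matches the second advantage described above.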
All calculations were performed in the MATLAB
(version 7.1, MathWorks, Inc.) environment.
Model development
PCA was used to compress a pool of descriptors into PCs as new variables. The first step in PCA is preprocessing of the calculated descriptors by mean centering and autoscaling. Let Xi,j be the column mean-centered and autoscaled matrix of descriptors for i samples and j descriptors, and yi,1 the vector of activities (pIC50). PCA of the matrix Xi,j yields a new matrix containing the scores of the PCs, and these scores are then used as new variables for regression. Scores as new variables possess two useful properties:
(i) They are sorted by decreasing information content (explained variance) from the first PC to the last one. As a result, the last PCs can be deleted because they carry little useful information.
(ii) PCs are orthogonal, which resolves the correlation problem that exists in the pool of descriptors.
As a matter of fact, PCR is MLR using the scores matrix as new variables for model building.
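The preprocessing and score calculation described above can be sketched in Python as follows. This is a sketch under assumptions, not the authors' implementation: the function name `pca_scores`, the SVD route to the PCs, and the random placeholder matrix are all illustrative choices.

```python
import numpy as np

def pca_scores(X, n_components):
    """Autoscale X, then return PC scores, eigenvalues, and explained-variance fractions."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    keep = std > 0                      # drop constant (zero-variance) descriptors
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / std[keep]   # mean centering + autoscaling
    # SVD of the autoscaled matrix: Z = U * diag(s) * Vt
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]          # scores on the first PCs
    eigenvalues = s ** 2 / (Z.shape[0] - 1)                   # variance carried by each PC
    explained = eigenvalues / eigenvalues.sum()
    return scores, eigenvalues[:n_components], explained[:n_components]

# Placeholder data only: 48 samples, 30 random descriptors plus one constant column.
rng = np.random.default_rng(0)
X_demo = np.column_stack([rng.normal(size=(48, 30)), np.ones(48)])
scores, eig, frac = pca_scores(X_demo, n_components=3)
```

The returned score columns are mutually orthogonal and ordered by decreasing explained variance, which are the two properties (i) and (ii) listed above.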
The use of the calculated PCs as input variables for GRNNs is referred to as PC-GRNNs. For selecting the optimum

Table 1 General structures and details of the compounds used in this study

[Structure drawing: general scaffold bearing the variable groups n and R2]

Compound   n   R2
1          2   2-Thienyl
2          1   NMe2
3          1   Benzyl
4          1   Methyl
5          1   n-Octyl
6          1   Cyclopentyl
7          1   Cyclohexyl
8          1   2-Cl-Phenyl
9          1   3-Cl-Phenyl
10         1   4-Cl-Phenyl
11         1   4-MeO-Phenyl
12         1   4-Ph-Phenyl
13         1   Naphth-1-yl
14         1   Naphth-2-yl
15         1   Indan-5-yl
16         1   Pyridin-3-yl
17         1   Quinolin-8-yl
18         1   Quinolin-3-yl
19         1   1-Me-Imidazol-4-yl
20         2   3-NO2-Phenyl
21         2   4-NO2-Phenyl
22         0   Phenyl
23         1   Phenyl
24a        2   Phenyl
25         1   2-Thienyl

Table 1 continued

[Structure drawing: general scaffold bearing the variable groups X and Y–Z]

Compound   X       Y–Z
26a        –       –SCH2–
27         –       –S(O)CH2–
28a        –       –S(O)2CH2–
29a        –       –CH2CH2–
30         –       –NHCH2–
31         –       –C(O)CH2–
32a        –       –C(O)NH–
33         –       –C(O)N(Me)–
34a        –       –C(O)NHCH2–
35         –       –NHC(O)CH2–
36         –       –CH(OH)CH2–
37         –CH2–   –O–

[Structure drawing: general scaffold bearing the variable groups R and Y]

Compound   R                   Y
38a        C6H5–               –CH–
39         C6H5–               –N–
40a        2-MeC6H4–           –N–
41         2-MeC6H4–           –CH–
42         2-MeOC6H4–          –CH–
43         3-CF3C6H4–          –CH–
44a        4-ClC6H4–           –CH–
45         4-FC6H4–            –CH–
46         C6H5CH2–            –CH–
47         C6H5CH2CH2–         –CH–
48         C6H5CH2CH2CH2–      –CH–

a Molecules selected as test set

number of PCs to use as model input, the root mean square error of cross validation (RMSECV) was used. The main advantage of GRNNs is that they do not require iterative learning (Esbensen et al., 1994). This property makes the method very attractive for model building, and GRNNs are therefore much faster than the well-known back-propagation neural network (Specht, 1991).
A GRNN consists of four neuron layers: input, pattern (radial basis), summation, and output layers, shown schematically in Fig. 1. The input layer receives the input X block and distributes it to the neurons in the next layer, the pattern (radial basis) layer, for processing. Thus, the number of neurons in the first layer is equal to the number of columns, n, in the input X block. Each neuron in the pattern layer then generates an intermediate output using a Gaussian radial basis function. The neurons in the subsequent summation layer sum the outputs of the pattern layer. This layer consists of two types of neurons, numerator and denominator neurons. The number of numerator neurons is equal to the number of elements of the output y vector; each computes a weighted sum of the outputs from the previous layer. The single denominator neuron computes the simple (unweighted) sum of the outputs of the previous layer. The neurons in the output layer then divide the sums computed by the numerator neurons by the sum computed by the denominator neuron.
A GRNN is able to approximate any relationship between an X block and its y vector. This approximation is established during training of the network; once the relationship between input and output is known, it is used to compute the output of the network. In the GRNN model, approximating an output vector with respect to an X block can be regarded as finding the expected value of y conditional upon the X block.
The predicted value of the y vector, $\hat{y}(X)$, is the expected value $E[y \mid X]$, which is given by the following equation:
$$\hat{y}(X) = E[y \mid X] = \frac{\int_{-\infty}^{+\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{+\infty} f(X, y)\, dy}$$
where $\hat{y}(X)$ is the output value estimated by the GRNN, X is the input vector for the estimation of y, and f(X, y) is the joint probability density function of X and y, which can be estimated by Parzen's nonparametric estimator. Substituting Parzen's nonparametric estimator for the joint probability density function and evaluating the integrals leads to the following equation of GRNNs:
$$\hat{y}(X) = \frac{\sum_{i=1}^{n} y_i \exp\left[-\frac{D_i^2}{2\sigma^2}\right]}{\sum_{i=1}^{n} \exp\left[-\frac{D_i^2}{2\sigma^2}\right]}$$
where the sums run over the training samples $(X_i, y_i)$, $\sigma$ is the spread of the network (smoothing parameter), and $D_i^2 = (X - X_i)^T (X - X_i)$. Training the GRNN involves finding the optimal value of the $\sigma$ parameter in the above equation.
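The GRNN estimate is a Gaussian-weighted average of the training activities, which a short Python sketch makes concrete. The function name `grnn_predict` and the toy data are assumptions for illustration; the arithmetic follows the summation and division layers described in the text, with the spread as the only tunable parameter.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_new, spread):
    """Predict activities for X_new as a Gaussian-weighted average of y_train."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    # Squared Euclidean distance between every new sample and every training sample.
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Pattern layer: Gaussian radial basis outputs.
    w = np.exp(-d2 / (2.0 * spread ** 2))
    # Summation layer (numerator and denominator) followed by the output division.
    return (w @ y_train) / w.sum(axis=1)

# Toy data only: three 1-D training samples.
X_toy = np.array([[0.0], [1.0], [2.0]])
y_toy = np.array([1.0, 3.0, 2.0])
y_hat = grnn_predict(X_toy, y_toy, np.array([[0.5]]), spread=0.5)
```

A very small spread makes the network reproduce the nearest training value, while a very large spread pulls every prediction towards the mean of the training activities, which is why the spread is usually tuned by cross validation.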
In the second ANN model, the combination of the PCs of the X matrix with radial basis function neural networks is referred to as PC-RBFNNs.
In PC-RBFNNs, as in PC-GRNNs, the original data matrix is reduced to orthogonal PCs and their scores are used as inputs for the RBFNNs. In the PC-RBFNNs model, the use
Table 2 Calculated theoretical groups of descriptors used in this study and the number of descriptors remaining after removal of constant descriptors

No.   Type of descriptors                  No. of descriptors
1     2D autocorrelations                  98
2     3D-MoRSE descriptors                 160
3     Aromaticity indices                  4
4     Atom-centered fragments              38
5     BCUT                                 64
6     Charge descriptors                   14
7     Constitutional                       36
8     Functional groups                    18
9     Galvez topological charge indices    21
10    Geometrical descriptors              36
11    GETAWAY                              196
12    Molecular walk counts                19
13    Randic molecular profiles            31
14    RDF descriptors                      138
15    Topological                          214
16    WHIM descriptors                     93
17    Quantum chemical                     16
18    Chemical descriptors                 6
19    All calculated descriptors           1156
Fig. 1 General GRNNs construct

of scores instead of the original descriptors reduces the number of input nodes, so the training time of the network is shortened. In addition, noisy information and random error in the original data are largely excluded, so using PCs can give a more accurate RBFNN model. Here the number of independent variables is very large, so data reduction is necessary.
The main advantage of radial basis function neural networks is that they do not require iterative learning. This property makes the method very attractive for model building, and RBFNNs are therefore much faster than the well-known back-propagation neural network. The theory of radial basis function neural networks is described in detail elsewhere (Shi et al., 2006; Yao et al., 2004); only a brief description is presented here.
RBFNNs include three layers: an input layer, a hidden layer, and an output layer, as shown schematically in Fig. 2. Each neuron in a layer is fully connected to the next layer, but there are no connections between neurons within a given layer. No processing occurs in the input layer, whose only duty is to distribute the input to the hidden layer. The hidden layer of an RBFNN contains a number of radial basis function units ($n_h$) and a bias ($b_k$). Each hidden layer unit represents a single radial basis function with an associated center position and width, and applies it as a nonlinear transfer function to the input coming from the previous layer. The most often used RBF is the Gaussian function, which is characterized by a center ($c_j$) and a width ($\sigma_j$). By measuring the Euclidean distance between the input vector (x) and the radial basis function center ($c_j$), the RBF performs the nonlinear transformation in the hidden layer using the following equation:
$$h_j(x) = \exp\left(-\frac{\lVert x - c_j \rVert^2}{\sigma_j^2}\right)$$
where $h_j$ is the output of the jth RBF unit, and $c_j$ and $\sigma_j$ are the center and width of the jth RBF, respectively. The operation of the output layer is linear and is given by:
$$y_k(x) = \sum_{j=1}^{n_h} w_{kj}\, h_j(x) + b_k$$
where $y_k$ is the kth output unit for the input vector x, $w_{kj}$ is the connection weight between the kth output unit and the jth hidden layer unit, and $b_k$ is the bias.
To optimize an RBFNN, the centers, the number of hidden layer units, the widths, and the weights must be selected. Random subset selection, K-means clustering, the orthogonal least-squares learning algorithm, and RBF-PLS are various ways of choosing the centers. Either the same width for all units or a different width for each unit can be selected.
In this article, Gaussian functions with a constant width, the same for all units, were selected. The centers were optimized on the training set molecules by a forward subset selection routine. After selection of the optimum centers and width of the radial basis functions, the connection weights between the hidden layer and the output layer were adjusted using a least-squares solution.
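A minimal Python sketch of an RBF network with one constant width and least-squares output weights is given below. It simplifies the procedure in one important respect: the centers are simply passed in (here, every third training point), whereas the study selected them by forward subset selection. The function names and toy data are assumptions for illustration.

```python
import numpy as np

def rbf_design(X, centers, width):
    """Hidden-layer outputs exp(-||x - c_j||^2 / width^2) plus a bias column of ones."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    H = np.exp(-d2 / width ** 2)
    return np.hstack([H, np.ones((X.shape[0], 1))])

def rbf_fit(X, y, centers, width):
    """Solve for the output weights (and bias) by linear least squares."""
    H = rbf_design(X, centers, width)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

def rbf_predict(X, centers, width, w):
    """Linear output layer applied to the hidden-layer outputs."""
    return rbf_design(X, centers, width) @ w

# Toy data only: a smooth 1-D function sampled at 10 points.
X_toy = np.linspace(0.0, 3.0, 10).reshape(-1, 1)
y_toy = np.sin(X_toy[:, 0])
centers_toy = X_toy[::3]          # assumed centers, not forward subset selection
w_toy = rbf_fit(X_toy, y_toy, centers_toy, width=1.0)
y_fit = rbf_predict(X_toy, centers_toy, 1.0, w_toy)
```

Once the centers and width are fixed, the model is linear in the weights, so the fit is a single least-squares solve with no iterative training.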
The overall performance of RBFNNs is evaluated in terms of the root mean square error of cross validation (RMSECV) according to the following equation:
$$\mathrm{RMSECV} = \sqrt{\frac{\sum_{i=1}^{n_s} (y_i - \hat{y}_i)^2}{n_s}}$$
where $y_i$ is the experimental value of the biological activity of compound i, $\hat{y}_i$ is the activity predicted by the network in cross validation, and $n_s$ is the number of compounds in the analyzed set.
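As a hedged illustration of how RMSECV can be obtained, the sketch below runs a leave-one-out loop around an arbitrary model. The helper `loo_rmsecv`, the callback signature `fit_predict(X_train, y_train, X_new)`, and the trivial `mean_model` are hypothetical names introduced here; leave-one-out is assumed as the cross-validation scheme.

```python
import numpy as np

def loo_rmsecv(X, y, fit_predict):
    """Leave-one-out cross validation; returns (RMSECV, cross-validated predictions)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    y_cv = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                      # leave sample i out
        y_cv[i] = fit_predict(X[mask], y[mask], X[i:i + 1])[0]
    rmsecv = np.sqrt(np.mean((y - y_cv) ** 2))
    return rmsecv, y_cv

def mean_model(X_train, y_train, X_new):
    """Trivial stand-in model: always predicts the training mean."""
    return np.full(len(X_new), y_train.mean())

rmsecv_demo, y_cv_demo = loo_rmsecv(np.zeros((3, 1)), [1.0, 2.0, 3.0], mean_model)
```

Any model that fits on one set and predicts on another can be passed as `fit_predict`, so the same loop could wrap an RBF or GRNN predictor.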
Model validation, predictability, and robustness
of model
To demonstrate that the resulting models have good predictability and reliability, several methods for evaluating model performance were used. $R^2$, which represents the explained variance for a given set, was used to determine the goodness of fit of the model. In addition, the prediction performance of the generated models must be estimated in order to build a successful QSAR model. In this study, we evaluated the prediction performance of the developed models using the root mean square error (RMSE).
Fig. 2 The architecture of PC-RBFNNs

Cross validation is a technique used to explore the reliability of statistical models. The root mean square error of cross validation (RMSECV), a standard index of the accuracy of a modeling method based on the cross validation technique, and $R^2_{LOO}$, another criterion of the predictability of the developed models, were applied.
According to Tropsha, a high $R^2_{LOO}$ does not automatically mean a high predictability of the developed model (Tropsha et al., 2003). Thus, a high value of $R^2_{LOO}$ is a necessary but not a sufficient condition for the developed model to have high predictability. In addition to a high $R^2_{LOO}$, a reliable model should also be characterized by a high $R^2$ between the calculated and experimental values of compounds from a test set (Afantitis et al., 2006).
Tropsha also suggested a set of criteria; if they are satisfied, the model can be considered predictive (Tropsha et al., 2003). These criteria are:
$$R^2_{LOO} > 0.5$$
$$R^2 > 0.6$$
$$\frac{R^2 - R_0^2}{R^2} < 0.1$$
$$\frac{R^2 - R_0'^2}{R^2} < 0.1$$
$$0.85 < k < 1.15 \quad \text{or} \quad 0.85 < k' < 1.15$$
$R^2$ is the correlation coefficient of the regression between the predicted and observed activities of compounds in the training and test sets. $R_0^2$ is the correlation coefficient for the regression of predicted versus observed activities through the origin, $R_0'^2$ is the correlation coefficient for the regression of observed versus predicted activities through the origin, and $k$ and $k'$ are the slopes of the corresponding regression lines through the origin. Detailed definitions of $R_0^2$, $R_0'^2$, $k$, and $k'$ are given in the literature and are not repeated here for brevity (Tropsha et al., 2003).
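The checks above can be sketched in Python as follows. The exact formulas for the through-origin quantities are not given in this text, so the sketch assumes one common convention (slope = Σab/Σb², and a coefficient of determination computed against the mean of the regressed variable); conventions differ between papers, and the definitions in Tropsha et al. (2003) should be consulted before relying on the numbers. The function names and toy activities are hypothetical.

```python
import numpy as np

def origin_regression(a, b):
    """Regress a on b through the origin; return (slope, R0^2) under the assumed convention."""
    slope = np.sum(a * b) / np.sum(b * b)
    r0_sq = 1.0 - np.sum((a - slope * b) ** 2) / np.sum((a - a.mean()) ** 2)
    return slope, r0_sq

def tropsha_report(y_obs, y_pred, r2_loo):
    """Evaluate each listed criterion and return a dict of booleans."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y_obs, y_pred)[0, 1] ** 2
    k, r0_sq = origin_regression(y_pred, y_obs)          # predicted versus observed
    k_prime, r0p_sq = origin_regression(y_obs, y_pred)   # observed versus predicted
    return {
        "R2_LOO > 0.5": bool(r2_loo > 0.5),
        "R2 > 0.6": bool(r2 > 0.6),
        "(R2 - R0^2)/R2 < 0.1": bool((r2 - r0_sq) / r2 < 0.1),
        "(R2 - R0'^2)/R2 < 0.1": bool((r2 - r0p_sq) / r2 < 0.1),
        "k or k' in (0.85, 1.15)": bool(0.85 < k < 1.15 or 0.85 < k_prime < 1.15),
    }

# Toy activities only (not data from this study).
y_toy = np.array([5.1, 6.0, 6.8, 7.5, 8.2])
report_good = tropsha_report(y_toy, y_toy, r2_loo=0.7)
report_bad = tropsha_report(y_toy, y_toy[::-1], r2_loo=0.7)
```

The second toy case shows why the through-origin checks matter: reversed predictions still give a high squared correlation, yet they fail the ratio criteria.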
In addition, according to Roy and Roy (2008), the difference between the values of $R_0^2$ and $R_0'^2$ must be studied and given importance. They suggested the following modified $R^2$:
$$R_m^2 = R^2 \left(1 - \left|\sqrt{R^2 - R_0^2}\right|\right)$$
An $R_m^2$ value greater than 0.5 for a given model indicates good external predictability.
The actual predictability of each model developed on the training set is confirmed on an external test set (Roy and Roy, 2008) and is calculated from
$$R_p^2 = 1 - \frac{\mathrm{PRESS}}{\mathrm{SD}}$$
where PRESS is the sum of squared differences between the measured activity and the predicted value for each compound in the test set, and SD is the sum of squared deviations between the measured activity of each molecule in the test set and the mean measured value of the training set.
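Both external-validation statistics reduce to a few lines of arithmetic, sketched here in Python. The function names are hypothetical, and one assumption is made explicit: the absolute value is taken inside the square root so that the expression stays real when $R_0^2$ slightly exceeds $R^2$.

```python
import numpy as np

def r2_m(r2, r0_sq):
    """Modified R^2: r2 * (1 - sqrt(|r2 - r0_sq|)); abs taken inside the root (assumption)."""
    return r2 * (1.0 - np.sqrt(abs(r2 - r0_sq)))

def r2_pred(y_test, y_test_pred, y_train_mean):
    """External predictive R^2 = 1 - PRESS / SD, with SD taken about the training-set mean."""
    y_test = np.asarray(y_test, dtype=float)
    y_test_pred = np.asarray(y_test_pred, dtype=float)
    press = np.sum((y_test - y_test_pred) ** 2)
    sd = np.sum((y_test - y_train_mean) ** 2)
    return 1.0 - press / sd

# Toy numbers only.
rm_demo = r2_m(0.81, 0.80)                                  # 0.81 * (1 - 0.1)
rp_demo = r2_pred([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], 0.0)    # 1 - 1/14
```

Note that SD uses the training-set mean rather than the test-set mean, so the statistic measures improvement over simply predicting the average training activity.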
The developed models are also tested for reliability and robustness by Y-randomization: new models are recalculated for a randomly reordered response. We provided evidence that the proposed models are well founded, and not just the result of chance correlation, by obtaining new models on the randomized response with significantly lower $R^2$ than the original models. If the randomized models instead show a high $R^2$, an acceptable QSAR model cannot be obtained for the data set.
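A Y-randomization test can be sketched generically in Python. An ordinary least-squares fit stands in for the neural network models purely to keep the example short; `ols_r2`, `y_randomization`, the number of trials, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def ols_r2(X, y):
    """R^2 of an ordinary least-squares fit with intercept (stand-in for the real model)."""
    A = np.column_stack([X, np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, fit_r2, n_trials=20, seed=0):
    """Refit the model on randomly permuted responses and collect the resulting R^2 values."""
    rng = np.random.default_rng(seed)
    r2_true = fit_r2(X, y)
    r2_perm = np.array([fit_r2(X, rng.permutation(y)) for _ in range(n_trials)])
    return r2_true, r2_perm

# Synthetic data only: a response that truly depends on two inputs.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(38, 2))
y_demo = X_demo @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=38)
r2_true, r2_perm = y_randomization(X_demo, y_demo, ols_r2)
```

A model that captures a real relationship should give an `r2_true` well above every value in `r2_perm`; comparable values would point to chance correlation.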
Results and discussion
First, a large number of descriptors (columns of the X block) were calculated for each molecule using the optimized molecular structures.
The logarithms of the inverse of the biological activity (Log 1/IC50) of the 48 molecules were used as the dependent variable.
Before model development, a pretreatment of the original data was necessary. Thus, autoscaling and deletion of columns with zero variance were performed.
After deletion of the zero-variance columns of the X block, PCA was performed on the pool of descriptors. PCA is a valuable multivariate statistical approach in which new orthogonal variables, called PCs, are derived as linear combinations of the original variables. These new variables are sorted on the basis of information content (i.e., explained variance of the original dataset). The earlier PCs account for a larger share of the explained variance, so most of the information is retained in the first few PCs. A main characteristic of PCA is that the generated PCs are uncorrelated. PCs can be used to obtain scores that represent most of the variation in the original data set in a smaller number of dimensions.
Among the generated PCs, the 16 highest-ranked PCs by eigenvalue (eigenvalues > 1) were selected for subsequent model building (Table 3). These PCs explain 86.11% of the variance in the original descriptor data matrix. Therefore, we restricted the subsequent model building to these 16 PCs.
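The selection rule and the cumulative variance can be checked with a few lines of Python using the values listed in Table 3. Only the 16 tabulated PCs are available here, so the eigenvalue filter is applied to that list rather than to the full spectrum; the variable names are illustrative.

```python
import numpy as np

# Eigenvalues and % variance explained, copied from Table 3.
eigenvalues = np.array([370.59, 190.13, 96.32, 78.74, 62.97, 44.34, 36.76, 27.75,
                        24.70, 22.54, 21.24, 18.43, 18.25, 15.08, 14.17, 12.85])
pct_variance = np.array([30.25, 15.52, 7.86, 6.43, 5.14, 3.62, 3.00, 2.27,
                         2.02, 1.84, 1.73, 1.50, 1.49, 1.23, 1.16, 1.05])

# Retain PCs whose eigenvalue exceeds 1 (the criterion stated in the text).
n_retained = int(np.sum(eigenvalues > 1.0))

# Running total of explained variance across the retained PCs.
cumulative = np.cumsum(pct_variance)
```

Summing the rounded percentages reproduces the reported totals to within rounding: about 53.6% for the first three PCs and about 86.1% for all 16.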
To determine the degree of homogeneity in the original data set and to recognize potential outliers and clusters among the studied molecules, PCA was performed within the calculated descriptor space for all the molecules. The distribution of the molecules over the first three PCs, which explain 53.63% of the variation in the data (30.25, 15.52, and 7.86%, respectively), is shown in Fig. 3. As can be seen, no molecule is an outlier and no cluster is observed.
After evaluating outliers and dividing the molecules into calibration and validation sets on the basis of the Kennard and Stone algorithm, model building was performed using the calibration set. PCR, as a linear model, showed poor results. Hence, the nonlinear models PC-RBFNNs and PC-GRNNs were applied as regression methods. The developed models were used to predict the activity of the molecules in the test set in order to evaluate their performance.
PC-RBFNNs
As discussed above, a radial basis function neural network was chosen for constructing a nonlinear model. To avoid the problem of collinearity, the calculated PCs of the original descriptors were used. One of the most important factors determining the quality of the generated model is the number of PCs selected for model building. If the selected number of PCs is lower than the optimum, the derived model is underfitted and may not calculate the true activity of the molecules. On the other hand, if too many PCs are used, the network is overfitted. Thus, for the initial training of the network, we chose ten hidden nodes and a spread of 1.0, and these values were used to find the optimum number of PCs for the PC-RBFNNs. This optimization was performed by jointly analyzing RMSEC (root mean square error of calibration) and RMSECV. As shown in Fig. 4a, two PCs were selected as the optimum number. The performance of the PC-RBFNNs model is significantly influenced by the network parameters, namely the number of radial basis functions $n_h$ and the spread.
With two PCs, a response surface methodology was used to optimize $n_h$ and the spread. Fig. 5 shows a contour surface plot of RMSECV as a function of $n_h$ and spread. $n_h$ was varied from 1 to 50 and the spread from 0.1 to 3 in increments of 0.1. These ranges were selected according to previous studies. The results show that a PC-RBFNN with 13 nodes in the hidden layer and a spread of 2.3 gave the optimum network performance.
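The scan over $n_h$ and spread amounts to an exhaustive grid search, sketched below in Python. The grids match the ranges stated in the text, but the objective is a deliberately artificial placeholder with an arbitrary minimum; in practice it would be replaced by the RMSECV of a network trained with the given $n_h$ and spread. All names here are hypothetical.

```python
import numpy as np

def grid_search(objective, nh_values, spread_values):
    """Evaluate objective(nh, spread) on the full grid and return the best (score, nh, spread)."""
    best = (np.inf, None, None)
    for nh in nh_values:
        for spread in spread_values:
            score = objective(nh, spread)
            if score < best[0]:
                best = (score, int(nh), float(spread))
    return best

nh_grid = np.arange(1, 51)              # 1 to 50 hidden units
spread_grid = np.arange(1, 31) / 10.0   # 0.1 to 3.0 in steps of 0.1

def placeholder_rmsecv(nh, spread):
    """Artificial stand-in surface (NOT the study's RMSECV); minimum placed arbitrarily."""
    return 0.3 + 0.001 * (nh - 5) ** 2 + 0.05 * (spread - 1.0) ** 2

best_score, best_nh, best_spread = grid_search(placeholder_rmsecv, nh_grid, spread_grid)
```

With 50 values of $n_h$ and 30 values of spread, the grid contains 1500 candidate networks, each of which would need its own cross-validation run.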
$R^2_{CV}$, the 'leave-one-out' (LOO) cross-validated coefficient, is a practical and reliable measure of the predictive performance and stability of a regression model. The LOO approach involves developing a number of models with one molecule deleted at a time. After each model is developed, the deleted molecule is predicted and the differences between the experimental and predicted activity values are
Table 3 Eigenvalues of calculated PCs, % of explained variance, and cumulative variance
PC No.  Eigenvalue  % Variance explained  Cumulative variance
1 370.59 30.25 30.25
2 190.13 15.52 45.77
3 96.32 7.86 53.64
4 78.74 6.43 60.06
5 62.97 5.14 65.20
6 44.34 3.62 68.82
7 36.76 3.00 71.82
8 27.75 2.27 74.09
9 24.70 2.02 76.11
10 22.54 1.84 77.95
11 21.24 1.73 79.68
12 18.43 1.50 81.19
13 18.25 1.49 82.68
14 15.08 1.23 83.91
15 14.17 1.16 85.06
16 12.85 1.05 86.11
Fig. 3 Principal components analysis of the calculated descriptors of all molecules in data set
3254 Med Chem Res (2012) 21:3246–3262
calculated. R2CV values are then calculated according to the
following formula:
$$R^2_{CV} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where $y_i$ is the actual experimental activity, $\bar{y}$ the average
actual experimental activity and $\hat{y}_i$ is the predicted activity
of compound i computed by the new regression equation
obtained each time after leaving out one datum point
(No. i).
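The leave-one-out calculation of R2CV can be sketched as below. The sketch assumes a generic `fit_predict` callable standing in for the network; an ordinary least-squares model and synthetic data are used purely for illustration and are not the models or data of this study.

```python
import numpy as np

def r2_cv_loo(X, y, fit_predict):
    """Leave-one-out cross-validated R2: 1 - PRESS / total sum of squares."""
    y = np.asarray(y, dtype=float)
    y_hat = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i          # leave compound i out
        y_hat[i] = fit_predict(X[keep], y[keep], X[i:i + 1])[0]
    press = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot

def ols_fit_predict(X_tr, y_tr, X_te):
    """Stand-in model (assumption): linear least squares with an intercept."""
    A = np.hstack([X_tr, np.ones((len(X_tr), 1))])
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    return np.hstack([X_te, np.ones((len(X_te), 1))]) @ coef

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.1, size=40)
q2 = r2_cv_loo(X, y, ols_fit_predict)
print(round(q2, 3))
```

A model that merely predicts the mean of the remaining compounds gives a slightly negative R2CV, so values close to 1 indicate genuine predictive ability.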
The developed model was trained using the data of the
training set and evaluated on the test compounds. For a
QSAR model, internal validation, although essential,
does not adequately assure the predictability of
a model. In fact, previous experience has persuaded us
that models with high apparent predictability,
supported only by internal validation, can be
unpredictive when applied to new compounds not
used in developing the model. Thus, for a stronger
assessment of model applicability to new
compounds, external validation of the generated models
should always be carried out.
In the present study, the generated models were validated
externally with an additional test set.
The inhibitory activities of the studied compounds
predicted by the optimized PC-RBFNNs
procedure are reported in Table 4, together with the
relative error of prediction (REP). The plots of predicted
activity versus experimental activity and of the residuals
(predicted activity − experimental activity) versus experimental
activity, obtained by the PC-RBFNNs
Fig. 4 Optimization of the number of PCs using the root mean square error of
the training set (RMSET) and the root mean square error of cross validation
(RMSECV) in a PC-RBFNNs and b PC-GRNNs
Fig. 5 Optimization of number of nodes in hidden layer and spread in PC-RBFNNs using RMSECV
Table 4 The experimental pIC50 and the predicted values of the data set and relative error of prediction (REP)
Compound  Activity  Predicted activity (PC-RBFNNs)  REP (PC-RBFNNs)  Predicted activity (PC-GRNNs)  REP (PC-GRNNs)
1 5.921 5.889 -0.005 5.681 -0.041
2 5.569 5.792 0.040 5.382 -0.033
3 7.000 7.168 0.024 6.984 -0.002
4 6.824 7.157 0.049 6.982 0.023
5 5.745 5.888 0.025 5.981 0.041
6 6.301 6.199 -0.016 5.982 -0.051
7 7.301 7.181 -0.016 7.001 -0.041
8 6.347 6.482 0.021 6.683 0.053
9 6.000 5.843 -0.026 6.185 0.031
10 6.456 6.727 0.042 6.585 0.020
11 6.456 6.514 0.009 6.535 0.012
12 6.000 6.182 0.030 6.185 0.031
13 5.585 5.807 0.040 5.222 -0.065
14 5.469 5.262 -0.038 5.279 -0.035
15 5.229 5.455 0.043 5.082 -0.028
16 5.071 5.267 0.039 5.079 0.002
17 4.854 4.722 -0.027 4.981 0.026
18 6.000 5.821 -0.030 5.983 -0.003
19 6.000 6.106 0.018 6.079 0.013
20 6.097 6.299 0.033 5.979 -0.019
21 6.155 6.410 0.041 6.284 0.021
22 6.398 6.452 0.008 6.389 -0.001
23 6.398 6.180 -0.034 6.230 -0.026
24 6.398 6.221 -0.028 6.182 -0.034
25 5.444 5.184 -0.048 5.284 -0.029
26 6.222 6.404 0.029 5.986 -0.038
27 6.155 6.270 0.019 6.280 0.020
28 6.000 6.098 0.016 5.982 -0.003
29 6.046 6.271 0.037 6.074 0.005
30 5.921 5.800 -0.020 6.086 0.028
31 5.469 5.231 -0.043 5.166 -0.055
32 6.523 6.311 -0.032 6.775 0.039
33 5.824 5.833 0.002 5.982 0.027
34 6.222 6.334 0.018 6.385 0.026
35 5.155 4.864 -0.056 4.986 -0.033
36 4.620 4.867 0.054 4.985 0.079
37 5.398 5.635 0.044 5.287 -0.020
38 5.000 4.749 -0.050 5.089 0.018
39 6.155 6.297 0.023 6.088 -0.011
40 6.456 6.538 0.013 6.388 -0.010
41 5.921 5.702 -0.037 5.856 -0.011
42 6.000 5.987 -0.002 5.889 -0.019
43 5.699 5.401 -0.052 5.889 0.033
44 6.602 6.409 -0.029 6.485 -0.018
45 5.602 5.893 0.052 5.782 0.032
46 6.187 6.328 0.023 5.886 -0.049
47 6.222 6.323 0.016 6.386 0.026
48 7.301 7.195 -0.014 6.982 -0.044
modeling, and the random distribution of residuals about
zero mean are shown in Figs. 6a and 7a, respectively.
Residuals for both the training and test sets are distributed
normally around zero (the mean value); therefore, the
nonlinear correlation between activity and the selected PCs is
reliable. The plot of calculated versus experimental activity
tells the same story, adding the information that, visually,
the calculated values capture the experimental
values very well.
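As a worked illustration of the residuals and REP values, the short sketch below recomputes them for compounds 1 to 5 of Table 4. It assumes that REP is defined as (predicted − experimental) / experimental, which reproduces the tabulated values to three decimals.

```python
import numpy as np

# Experimental pIC50 and PC-RBFNNs predictions for compounds 1-5 of Table 4
experimental = np.array([5.921, 5.569, 7.000, 6.824, 5.745])
predicted = np.array([5.889, 5.792, 7.168, 7.157, 5.888])

residuals = predicted - experimental
rep = residuals / experimental   # assumed definition of REP

for i, (r, e) in enumerate(zip(residuals, rep), start=1):
    print(f"compound {i}: residual = {r:+.3f}, REP = {e:+.3f}")
```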
The statistical parameters and figures of merit, as well as the
Tropsha and Roy parameters for determining the predictability
of the developed model, are presented for the best-fitted
model in Table 5. As presented in Table 5, the model
gave an RMSE of 0.180 for the training set and 0.216 for
the test set, with corresponding R2 values
of 0.901 and 0.916, respectively. Furthermore, on the basis
of the criteria recommended by Tropsha and the R2m of Roy,
the obtained model is predictive.
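The slopes k and k′ and the R2m statistic can be computed as in the following sketch. The definitions used here are assumptions based on the commonly cited forms: k and k′ are the slopes of regressions through the origin, R0^2 is taken from the regression of observed on predicted values through the origin, and R2m = R2 (1 − sqrt(|R2 − R0^2|)). The five data points are compounds 1 to 5 of Table 4 and serve only as an illustration, not as a recalculation of Table 5.

```python
import numpy as np

def tropsha_roy(y_obs, y_pred):
    """Slopes k, k' and Roy's R2m under the assumed definitions stated above."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r2 = np.corrcoef(y_obs, y_pred)[0, 1] ** 2
    k = np.sum(y_obs * y_pred) / np.sum(y_pred ** 2)
    k_prime = np.sum(y_obs * y_pred) / np.sum(y_obs ** 2)
    # R0^2: regression of observed on predicted values forced through the origin
    r0_2 = 1.0 - np.sum((y_obs - k * y_pred) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    r2m = r2 * (1.0 - np.sqrt(abs(r2 - r0_2)))
    return {"R2": r2, "k": k, "k_prime": k_prime, "R0_2": r0_2, "R2m": r2m}

y_obs = [5.921, 5.569, 7.000, 6.824, 5.745]
y_pred = [5.889, 5.792, 7.168, 7.157, 5.888]
stats = tropsha_roy(y_obs, y_pred)
print({name: round(float(value), 3) for name, value in stats.items()})
```

Under the Tropsha criteria, slopes close to 1 (conventionally between 0.85 and 1.15) are taken as one indication of a predictive model.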
The PC-RBFNNs model was further validated by applying the
Y-randomization test. In particular, 10,000 random shuffles
of the Y-vector gave low R2 values. This shows that
the developed PC-RBFNNs model was not obtained by
chance.
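The Y-randomization test can be sketched as follows. The sketch assumes a linear least-squares model as a stand-in for the network, synthetic data, and 1,000 shuffles to keep the run short; `r2_fit` and `y_randomization` are hypothetical helper names.

```python
import numpy as np

def r2_fit(X, y):
    """R2 of a least-squares fit with intercept (stand-in for the network)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def y_randomization(X, y, n_shuffles, seed=0):
    """Refit the model on shuffled activities and collect the resulting R2 values."""
    rng = np.random.default_rng(seed)
    return np.array([r2_fit(X, rng.permutation(y)) for _ in range(n_shuffles)])

# Synthetic scores on two PCs and a synthetic activity (not the study data)
rng = np.random.default_rng(3)
X = rng.normal(size=(40, 2))
y = 1.2 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=0.2, size=40)

r2_true = r2_fit(X, y)
r2_random = y_randomization(X, y, n_shuffles=1000)
print(round(r2_true, 3), round(float(r2_random.mean()), 3), round(float(r2_random.max()), 3))
```

If the R2 of the true model lies well above the whole distribution of shuffled R2 values, chance correlation is an unlikely explanation for the fit.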
PC-GRNNs
In another model, a GRNN was employed. As with
PC-RBFNNs, the inputs of the network are PCs. A leave-one-out
cross validation procedure was used to choose the optimum
number of PCs for model building. The number of PCs
that produced the lowest RMSECV was selected as the optimum
value. A plot of RMSECV and RMSET for the PC-GRNNs
model as a function of the number of PCs is shown in
Fig. 4b. Based on this figure, four PCs were selected as the
optimum number. The GRNN was trained to obtain the
relationship between the PCs and the activity of the molecules. As
explained above, GRNNs have only one
parameter that must be optimized: the 'spread', which is the
width of the radial basis functions. The RMSECV of the training
set was used to choose the optimized value of the spread. Figure 8
plots RMSECV versus spread; the minimum was
taken as the optimal value of the spread, which was 1.5.
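The optimization of the GRNN spread can be sketched in a few lines, since a GRNN prediction is a kernel-weighted average of the training activities. The sketch assumes a Gaussian kernel of the form exp(−d²/(2·spread²)), which may differ by a constant factor from the convention of a particular neural network toolbox, and it uses synthetic data on four PCs.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_new, spread):
    """GRNN output: Gaussian-weighted average of the training activities."""
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * spread ** 2))
    # The floor on the denominator avoids division by zero if all weights underflow
    return (weights @ y_train) / np.maximum(weights.sum(axis=1), 1e-300)

def rmsecv_loo(X, y, spread):
    """Leave-one-out root mean square error of cross validation."""
    errors = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        errors.append(grnn_predict(X[keep], y[keep], X[i:i + 1], spread)[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Synthetic scores on four PCs and a synthetic activity (not the study data)
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 4))
y = np.tanh(X[:, 0]) + 0.3 * X[:, 1] + rng.normal(scale=0.05, size=40)

spreads = np.round(np.arange(0.1, 3.01, 0.1), 1)
cv_errors = np.array([rmsecv_loo(X, y, s) for s in spreads])
best_spread = float(spreads[np.argmin(cv_errors)])
print(best_spread, round(float(cv_errors.min()), 3))
```

Plotting `cv_errors` against `spreads` gives a curve of the kind shown in Fig. 8, with the minimum marking the selected spread.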
After determining the optimum value of the spread, the
network was trained using the training data and the model was
evaluated by predicting the molecules in the test set. The
activities predicted by this model are listed
in Table 4 and plotted in Fig. 6b. As with
PC-RBFNNs, the calculated values are in good agreement
Fig. 6 Plots of predicted activity vs. actual concentration activity for a PC-RBFNNs and b PC-GRNNs
Fig. 7 Scatter plots of the residuals vs. experimental activity for a PC-RBFNNs and b PC-GRNNs
with the experimental values. To check for any
systematic error in the developed PC-GRNNs
model, the residuals of the calculated activities were
plotted versus experimental activity and are shown in
Fig. 7b. As can be seen, the scatter of residuals on both
sides of zero shows that there is no systematic error in
the developed model. The figures of merit and the criteria for
determining the predictability of the GRNNs model
are reported in Table 5. In this table, R2, which is a criterion of
goodness of fit, was obtained for the two sets, and the high value
of this parameter indicates a good fit between the network inputs
and the predicted activity values of the compounds.
As a result, with respect to the developed GRNNs
model, it was found that a correctly selected and trained neural
Table 5 Models and their validation and predictive ability parameters
Model  $R^2$  $R^2_{LOOCV}$  RMSET  RMSECV  $R^2_p$  RMSEP  PRESS  $(R^2 - R_0^2)/R^2$  $(R^2 - R_0'^2)/R^2$  $k$  $k'$  $R^2_m$
PC-RBFNNs 0.901 0.881 0.180 0.213 0.916 0.215 1.244 -0.106 -0.107 0.995 1.003 0.621
PC-GRNNs 0.883 0.867 0.188 0.223 0.916 0.216 1.350 -0.132 -0.131 1.007 0.998 0.581
$R^2_{LOOCV}$ = square regression coefficient for leave-one-out cross validation
RMSET = root mean square error of training set
RMSECV = root mean square error of cross validation
$R^2_p$ = square regression coefficient for prediction set
RMSEP = root mean square error of prediction set
PRESS = predicted error sum of square for training set
$k = \sum y_i \hat{y}_i / \sum \hat{y}_i^2$
$k' = \sum y_i \hat{y}_i / \sum y_i^2$
Fig. 8 Optimization of spread using RMSECV in PC-GRNNs
Fig. 9 The overlaid 3D structures of some of the derivatives used in this study
Fig. 10 General structure and highlighted subunits
network could practically represent the dependence of the
activity of CCR5 inhibitors on the PCs extracted from the
calculated descriptors of the molecules.
To evaluate the PC-GRNNs model, a leave-one-out cross
validation technique, similar to that used for the PC-RBFNNs
model, was performed. The results are summarized in
Table 5.
Inspection of the results reveals the stability of the generated
model. To show that the PC-GRNNs model
did not result from chance, Y-randomization, a widely used
method for determining model robustness, was applied.
It consists of shuffling the experimental
activities in such a way that they no longer correspond to the
respective molecules. After analyzing 1,000 cases of
Y-randomization, the R2 values obtained using this procedure were very
small compared with the one found for the true
calibration (R2 = 0.88). In this way, the robustness of the
developed PC-GRNNs model could be evaluated, indicating
that the regression was not a consequence of chance correlation
and therefore reflects a true SAR.
Table 5 compares the statistical parameters of
the two models for the same set of
compounds. The RMS errors of the PC-RBFNNs model for
the training set, the test set, and the cross validation procedure
were lower than those of the PC-GRNNs
model. The correlation coefficient (R2) given
by the PC-RBFNNs model was higher than that of the
PC-GRNNs model. Table 5 thus shows that
the PC-RBFNNs model gives the higher R2
and the lower error values, so this model gives the more
satisfactory results compared with the
PC-GRNNs method.
Suggestions of new CCR5 inhibitors
As a final point, one could ask how researchers can
interpret models developed using PCs, or how such
models can be applied to propose novel compounds
with improved activity. In other words, what do the
developed models mean to medicinal chemists? As
discussed above, the calculated PCs have no direct
physicochemical meaning, but they may be employed for building
statistical models which help the medicinal chemist limit
the number of compounds to be synthesized. For instance,
a medicinal chemist can propose a training set composed of
molecules which have the characteristics of two or more
chemical classes with the smallest amount of similarity,
and then use the developed models to predict the
activity of the proposed molecules. This practice may
lead to the introduction of biologically active molecules.
Table 6 Structures and details of the proposed molecules as novel CCR5 inhibitors
[Structure drawings: general scaffolds (A) and (B) bearing the substituents R1 and R2]
Compound R1 R2 Type Activity
S1 Cl Benzyl A 7.182
S2 Cl Methyl A 7.167
S3 Cl Cyclohexyl A 7.172
S4 H Methyl A 7.174
S5 H Cyclohexyl A 7.186
S6 H Benzyl B 7.187
S7 H Methyl B 7.183
S8 H Cyclohexyl B 7.183
In order to investigate the electronic requirements of
active CCR5 antagonists, the molecular structures
of all the studied derivatives were built with Hyperchem.
Full gas-phase geometry optimization of the studied
drug-like molecules was performed. The structures were
optimized with an ab initio method using the hybrid functional
B3LYP (Becke’s three-parameter functional (Becke, 1993)) and the
large basis set 6-311G**. All bond
lengths and angles were fully optimized. Because the calculated
values of the electronic descriptors of the drug-like
compounds are influenced by the geometry used, in the
current investigation we tried to employ the most stable
conformations of the studied molecules. To avoid
becoming trapped in local minima during geometry optimization,
the procedure was run many times with different starting
points for each compound, and for each molecule the
conformation with the lowest energy was selected for use in the next
steps of evaluation. The overlaid 3D structures of some of
the studied molecules (from the benzyl subunit) are shown
in Fig. 9. As can be seen, the orientation of the sulphonyl
subunit of the molecules remains essentially unchanged when
the structural pattern of the –R group or the number of
chlorine atoms on the phenyl ring of the benzyl
subunit changes. To include the effects of the electronic descriptors
of the molecules on their CCR5 inhibitory activity, some
quantum chemical parameters such as local charges, most
positive charge, most negative charge, dipole moments and
HOMO and LUMO energies were calculated and used in
the calculation of the PCs. To evaluate the SAR of the
compounds and to use the extracted information for in silico
suggestion of new CCR5 antagonists, the general structure of
the studied compounds and the highlighted subunits are shown in
Fig. 10. It is clear from Table 4 that compounds 48, 7,
3, and 4 have the highest bioactivities. In all of these
compounds, the R group in the sulphonyl
subunit is lipophilic, such as phenyl (in
compound 48) and cyclohexyl (in compound 7), and all
of them have at least one electron-withdrawing
atom on the phenyl ring of the benzyl subunit. In all of
them except 48, the sulfoxide moiety in the piperidine
subunit also appears to contribute favourably to binding
affinity, which suggests a hydrogen bond acceptor interaction
or a simple polar effect.
In order to design novel derivatives with a high inhibitory
effect on CCR5, and because the experimental and computed
activities of the compounds used in the model development
step showed a good correlation, the developed PC-RBFNNs
QSAR model was used to calculate the inhibitory activities of
suggested compounds. Structures of novel antagonists of
CCR5 may thus be suggested and their bioactivities
evaluated using the generated model. Novel compounds
were suggested according to the following strategy.
Compounds sharing the general structure of the investigated
compounds but bearing various substituents may
yield novel compounds. The structures of these novel
ligands were generated and their PCs were then
calculated. Hence, using the calculated PCs and the developed
PC-RBFNNs model, the bioactivities of the proposed ligands were
calculated.
The general structures of the eight suggested compounds,
their details and their calculated activities are reported
in Table 6. The suggested compounds are combinations of
the most potent compounds of Table 1, especially
compounds 48, 7, 3, and 4. The relatively high predicted
activities of the suggested compounds indicate that further study, such as
synthesis, should be performed on such chemical structures.
Conclusion
The aim of the present study was to rationalize the bioactivities
of some experimentally studied CCR5 antagonists
through the use of developed QSAR models, in order
ultimately to aid the design and suggestion of novel CCR5
antagonists and the computational evaluation of the bioactivity
of these new compounds.
Two nonlinear methods, radial basis function neural
networks and GRNNs, were employed to obtain predictive
QSAR models for the CCR5 inhibitory activity of a set of 48
compounds using different calculated descriptors. The results
obtained by the two methods for the same set of compounds
were compared. The RMSE of the
PC-RBFNNs model for the training and test sets, and in
the cross validation procedure, was lower than the RMSE of
the model developed by the PC-GRNNs method. The
correlation coefficient produced by the PC-RBFNNs model
was higher than that of the PC-GRNNs method. In order to
design novel derivatives with a high inhibitory effect on
CCR5, and because the experimental and computed activities
of the molecules used in the model development step showed a
good correlation, the QSAR model developed by PC-RBFNNs
was employed to calculate the inhibitory activities of some
suggested compounds.
References
Afantitis A, Melagraki G, Sarimveis H, Koutentis PA, Markopoulos J,
Igglessi-Markopoulou O (2006) A novel QSAR model for
predicting induction of apoptosis by 4-aryl-4H-chromenes.
Bioorg Med Chem 14:6686–6694
Andres PG, Beck PL, Mizoguchi E, Mizoguchi A, Bhan AK, Dawson
T, Kuziel WA, Maeda N, MacDermott RP, Podolsky DK,
Reinecker HC (2000) Mice with a selective deletion of the CC
chemokine receptors 5 or 2 are protected from dextran sodium
sulfate-mediated colitis: Lack of CC chemokine receptor 5
expression results in a NK1.1+ lymphocyte-associated Th2-type
immune response in the intestine. J Immunol 164:6303–6312
Arkan E, Shahlaei M, Pourhossein A, Fakhri K, Fassihi A (2010)
Validated QSAR analysis of some diaryl substituted pyrazoles as
CCR2 inhibitors. Eur J Med Chem 45:3394–3406
Becke A (1993) Density-functional thermochemistry. III. The role of
exact exchange. J Chem Phys 98:5648–5652
Campiani G, Ramunno A, Maga G, Nacci V, Fattorusso C,
Catalanotti B, Morelli E, Novellino E (2002) Non-nucleoside
HIV-1 reverse transcriptase (RT) inhibitors: past present and
future perspectives. Curr Pharm Des 8:615–657
Celikoglu HB (2006) Application of radial basis function and
generalized regression neural networks in non-linear utility
function specification for travel mode choice modelling. Math
Comput Modell 44:640–658
Chtioui Y, Panigrahi S, Francl L (1999) A generalized regression
neural network and its application for leaf wetness prediction to
forecast plant disease. Chemometr Intell Lab 48:47–58
Chun T, Fauci A (1999) Latent reservoirs of HIV: obstacles to the
eradication of virus. Proc Natl Acad Sci USA 96:10958–10961
Dorn CP, Finke PE, Oates B, Budhu RJ, Mills SG, MacCoss M,
Malkowitz L, Springer MS, Daugherty BL, Gould SL, DeMar-
tino JA, Siciliano SJ, Carella A, Carver G, Holmes K, Danzeisen
R, Hazuda D, Kessler J, Lineberger J, Miller M, Schleif WA,
Emini EA (2001) Antagonists of the human CCR5 receptor as
anti-HIV-1 agents. Part 1: discovery and initial structure-activity
relationships for 1-amino-2-phenyl-4-(piperidin-1-yl)butanes.
Bioorg Med Chem Lett 11:259–264
Dragic T, Litwin V, Allaway GP, Martin SR, Huang Y, Nagashima
KA, Cayanan C, Maddon PJ, Koup RA, Moore JP, Paxton WA
(1996) HIV-1 entry into CD4+ cells is mediated by the
chemokine receptor CC-CKR-5. Nature 381:667–673
Esbensen K, Schonkopf S, Midtgaard T (1994) Multivariate analysis
in practice. CAMO AS, Trondheim, Norway
Finke PE, Meurer LC, Oates B, Shah SK, Loebach JL, Mills SG,
MacCoss M, Castonguay L, Malkowitz L, Springer MS, Gould
SL, DeMartino JA (2001) Antagonists of the human CCR5
receptor as anti-HIV-1 agents. Part 3: a proposed pharmacophore
model for 1-[N-(methyl)-N-(phenylsulfonyl)amino]-2-(phenyl)-
4-[4-(substituted)piperidin-1-yl]butanes. Bioorg Med Chem Lett
11:2469–2473
Finzi D, Blankson J, Siliciano JD, Margolick JB, Chadwick K,
Pierson T, Smith K, Lisziewicz J, Lori F, Flexner C, Quinn TC,
Chaisson RE, Rosenberg E, Walker B, Gange S, Gallant J,
Siliciano RF (1999) Latent infection of CD4+ T cells provides a
mechanism for lifelong persistence of HIV-1 even in patients on
effective combination therapy. Nat Med 5:512–517
Fischer E (1894) Einfluss der Configuration auf die Wirkung der
Enzyme. Berichte der deutschen chemischen Gesellschaft
27:2985–2993
Fischereder M, Luckow B, Hocher B, Wüthrich RP, Rothenpieler U,
Schneeberger H, Panzer U, Stahl RAK, Hauser IA, Budde K,
Neumayer HH, Krämer BK, Land W, Schlöndorff D (2001) CC
chemokine receptor 5 and renal-transplant survival. Lancet
357:1758–1761
Free SM Jr, Wilson J (1964) A mathematical contribution to
structure-activity studies. J Med Chem 7:395–399
Furtado MR, Callaway DS, Phair JP, Kunstman KJ, Stanton JL,
Macken CA, Perelson AS, Wolinsky SM (1999) Persistence of
HIV-1 transcription in peripheral-blood mononuclear cells in
patients receiving potent antiretroviral therapy. New Engl J Med
340:1614–1622
Hansch C, Fujita T (1964) Erratum: ρ-σ-π analysis. A method for the
correlation of biological activity and chemical structure. J Am
Chem Soc 86:5710 (J Am Chem Soc 86:1625)
Hansch C, Hoekman D, Gao H (1996) Comparative QSAR: toward a
deeper understanding of chemicobiological interactions. Chem
Rev 96:1045–1076
Hansch C, Kurup A, Garg R, Gao H (2001) Chem-bioinformatics and
QSAR: a review of QSAR lacking positive hydrophobic terms.
Chem Rev 101:619–672
Kennard R, Stone L (1969) Computer aided design of experiments.
Technometrics 11:137–148
Klabunde T, Hessler G (2002) Drug design strategies for targeting
G-protein-coupled receptors. ChemBioChem 3:928–944
Klocker J, Wailzer B, Buchbauer G, Wolschann P (2002) Bayesian
neural networks for aroma classification. J Chem Inf Comput Sci
42:1443–1449
Mosier P, Jurs P (2002) QSAR/QSPR studies using probabilistic
neural networks and generalized regression neural networks.
J Chem Inf Comput Sci 42:1460–1470
Niwa T (2003) Using general regression and probabilistic neural
networks to predict human intestinal absorption with topological
descriptors derived from two-dimensional chemical structures.
J Chem Inf Comput Sci 43:113–119
Pipitone N, Pitzalis C (2000) The role of chemokines in inflammation
and rheumatoid arthritis. Curr Opin Anti-inflamm Immunomodul
Invest Drugs 2:9–15
Pomerantz R (1999) Primary HIV-1 resistance: a new phase in the
epidemic? JAMA 282:1177–1179
Pompe M, Razinger M, Novic M, Veber M (1997) Modelling of gas
chromatographic retention indices using counterpropagation
neural networks. Anal Chim Acta 348:215–221
Roy P, Roy PK (2008) On some aspects of variable selection for
partial least squares regression models. QSAR Comb Sci 27:
302–313
Saghaie L, Shahlaei M, Fassihi A, Madadkar-Sobhani A, Gholivand
M, Pourhossein A (2011) QSAR analysis for some diaryl-substituted
pyrazoles as CCR2 inhibitors by GA-stepwise MLR.
Chem Biol Drug Des 77:75–85
Saghaie L, Shahlaei M, Madadkar-Sobhani A, Fassihi A (2010)
Application of partial least squares and radial basis function
neural networks in multivariate imaging analysis-quantitative
structure activity relationship: study of cyclin dependent kinase 4
inhibitors. J Mol Graph Model 29:518–528
Schmidli H (1997) Multivariate prediction for QSAR. Chemometr
Intell Lab 37:125–134
Sellebjerg F, Madsen H, Jensen C, Jensen J, Garred P (2000) CCR5
delta32 matrix metalloproteinase-9 and disease activity in
multiple sclerosis. J Neuroimmunol 102:98–106
Shahlaei M, Fassihi A, Saghaie L (2010a) Application of PC-ANN
and PC-LS-SVM in QSAR of CCR1 antagonist compounds: a
comparative study. Eur J Med Chem 45:1572–1582
Shahlaei M, Sabet R, Ziari M, Moeinifard B, Fassihi A, Karbakhsh R
(2010b) QSAR study of anthranilic acid sulfonamides as inhibitors
of methionine aminopeptidase-2 using LS-SVM and GRNN based
on principal components. Eur J Med Chem 45:4499–4508
Shi J, Luan F, Zhang H, Liu M, Guo Q, Hu Z, Fan B (2006) QSPR
study of fluorescence wavelengths (λex/λem) based on the
heuristic method and radial basis function neural networks.
QSAR Comb Sci 25:147–155
Specht D (1991) A general regression neural network. IEEE Trans
Neural Netw 2:568–576
Tetteh J, Howells S, Metcalfe E, Suzuki T (1998) Optimisation of
radial basis function neural networks using biharmonic spline
interpolation. Chemometr Intell Lab 41:17–29
Thanikaivelan P, Subramanian V, Raghava Rao J, Unni Nair B (2000)
Application of quantum chemical descriptor in quantitative
structure activity and structure property relationship. Chem Phys
Lett 323:59–70
Todeschini R, Consonni V, Mauri A, Pavan M (2002) DRAGON
software version 2.1. Milano, Italy. http://disat.unimib.it/chm/
Dragon.html
Tropsha A, Gramatica P, Gombar V (2003) The importance of being
earnest: validation is the absolute essential for successful
application and interpretation of QSPR models. QSAR Comb
Sci 22:69–77
Walczak B, Massart DL (2000) Local modelling with radial basis
function networks. Chemometr Intell Lab 50:179–198
Yao X, Panaye A, Doucet J, Zhang R, Chen H, Liu M, Hu Z, Fan B
(2004) Comparative study of QSAR/QSPR correlations using
support vector machines radial basis function neural networks
and multiple linear regression. J Chem Inf Comput Sci
44:1257–1266