Granular Kernel Tree
270 Int. J. Data Mining and Bioinformatics, Vol. 1, No. 3, 2007
Copyright 2007 Inderscience Enterprises Ltd.
Granular Kernel Trees with parallel Genetic Algorithms for drug activity comparisons
Bo Jin* and Yan-Qing Zhang
Department of Computer Science,
Georgia State University, Atlanta, GA 30302, USA
E-mail: [email protected] E-mail: [email protected]
*Corresponding author
Binghe Wang
Department of Chemistry and
Center for Biotechnology and Drug Design,
Georgia State University, Atlanta, GA 30302-4098, USA
E-mail: [email protected]
Abstract: With the growing interest in biological data prediction and chemical data prediction, more powerful and flexible kernels need to be designed so that the prior knowledge and relationships within data can be expressed effectively in kernel functions. In this paper, Granular Kernel Trees (GKTs) are proposed and parallel Genetic Algorithms (GAs) are used to optimise the parameters of GKTs. In applications, SVMs with the new kernel trees are employed for drug activity comparisons. The experimental results show that GKTs and evolutionary GKTs can achieve better performances than traditional RBF kernels in terms of prediction accuracy.
Keywords: kernel design; Support Vector Machines; SVMs; Granular Kernel Trees; GKTs; Genetic Algorithms; GAs; drug activity comparisons; data mining; bioinformatics.
Reference to this paper should be made as follows: Jin, B., Zhang, Y-Q. and Wang, B. (2007) 'Granular Kernel Trees with parallel Genetic Algorithms for drug activity comparisons', Int. J. Data Mining and Bioinformatics, Vol. 1, No. 3, pp.270-285.
Biographical notes: Bo Jin is a PhD student in the Computer Science Department at Georgia State University. He received his BE Degree from the University of Electronic Science and Technology of China. His research interests are in the areas of machine learning, data mining, chemical informatics and biomedical informatics.
Yan-Qing Zhang is currently an Associate Professor in the Computer Science Department at Georgia State University. He received a PhD Degree in Computer Science and Engineering from the University of South Florida in 1997. His research interests include hybrid intelligent systems, computational intelligence, granular computing, kernel machines, bioinformatics, data mining and computational web intelligence. He has published 3 books, 12 book chapters, 49 journal papers and over 100 conference papers. He has served as a reviewer for 37 international journals, and a committee member in over 70 international conferences. He is a program co-chair of IEEE-GrC2006.
Binghe Wang is Professor of Chemistry at Georgia State University, Georgia Research Alliance Eminent Scholar in Drug Discovery, and Georgia Cancer Coalition Distinguished Cancer Scientist. He obtained his BS Degree from Beijing Medical College in 1982 and his PhD Degree in Medicinal Chemistry from the University of Kansas, School of Pharmacy in 1991. He is Editor-in-Chief of Medicinal Research Reviews published by John Wiley and Sons and the Series Editor of A Wiley Series in Drug Discovery and Development. His research expertise includes drug delivery, drug design and synthesis, bioorganic chemistry, fluorescent sensors, and combinatorial chemistry.
1 Introduction
Kernel methods, specifically Support Vector Machines (SVMs) (Boser et al., 1992; Cortes and Vapnik, 1995; Shawe-Taylor and Cristianini, 2004; Vapnik, 1998), have been widely used in many fields such as bioinformatics (Schölkopf et al., 2004) and chemical informatics (Burbidge et al., 2001; Weston et al., 2003) for data classification and pattern recognition. With the help of kernels' nonlinear mappings, input data are transformed into a high-dimensional feature space where it is easy for SVMs to find a hyperplane to separate the data. SVMs' performance is mainly affected by the kernel functions. However, traditional kernels, such as RBF kernels and polynomial kernels, do not take into consideration the relationships and structure within each data item, but simply treat each data vector as one unit in operations. With the growing interest in biological and chemical data prediction, such as structure-property based molecule comparison, protein structure prediction and long DNA sequence comparison, more complicated kernels have been designed to integrate data structures, such as string kernels (Cristianini and Shawe-Taylor, 1999; Lodhi et al., 2001), tree kernels (Collins and Duffy, 2002; Kashima and Koyanagi, 2002) and graph kernels (Gärtner et al., 2003; Kashima and Inokuchi, 2002), based on the kernel decomposition concept. For a detailed review, please see Gärtner (2003). One common characteristic of these kernels is that feature transformations are implemented according to objects' structures without a step of input feature generation. Many of them directly implement inner product operations with some kind of iterative calculation. These transformations are very efficient when objects contain large amounts of structured information. However, for many challenging problems, objects are not structured, or some relationships within objects are not easily described directly. Furthermore, essential optimisations are needed once kernel functions are defined. It should be mentioned that Haussler (1999) first introduced decomposition-based kernel design in detail and proposed convolution kernels.
In this paper, we use granular computing concepts to redescribe decomposition-based kernel design and propose an evolutionary hierarchical approach to integrate prior knowledge, such as data structures and feature relationships, into the kernel design.
Features within an input vector are grouped into feature granules according to the
composition and structure of each data item. Each feature granule captures a particular
aspect of data items. For two input vectors, the similarity between a pair of feature
granules is measured by using a kernel function called granular kernel. Granular kernels
for different kinds of feature granules are fused together by hierarchical trees, called
GKTs. Parallel GAs are used to optimise GKTs and select an effective SVMs model.
In applications, SVMs with the new kernel trees are employed for the comparison of drug activities, which is a problem in Quantitative Structure-Activity Relationship (QSAR) analysis. QSAR is an important technique used in drug design, which describes the relationships between compound structures and their activities. In QSAR analysis, compounds with different activities are discriminated, and then predictive rules are constructed. In this study, inhibitors of E. coli dihydrofolate reductase (DHFR) are analysed. These inhibitors are potential therapeutic agents for the treatment of malaria, bacterial infection, toxoplasmosis, and cancer. Experimental results show that SVMs with both GKTs and EGKTs can achieve much better performance than SVMs with traditional RBF kernels in terms of prediction accuracy.
The rest of the paper is organised as follows. Granular kernels, kernel tree design and evolutionary optimisation are proposed in Section 2. Section 3 describes the experiments on drug activity comparisons. Finally, Section 4 gives conclusions and outlines future work.
2 Granular Kernel and Kernel tree design
2.1 Definitions
Definition 1 (Cristianini and Shawe-Taylor, 1999): A kernel is a function $K$ that for all $x, z \in X$ satisfies

$K(x, z) = \langle \phi(x), \phi(z) \rangle$ (1)

where $\phi$ is a mapping from the input space $X = R^n$ to an inner product feature space $F = R^N$:

$\phi : x \mapsto \phi(x) \in F.$ (2)

Definition 2: A feature granule space $G$ of input space $X = R^n$ is a subspace of $X$, where $G = R^m$ and $1 \le m \le n$.

From the input space we may generate many feature granule spaces, and some of them may overlap on some feature dimensions.

Definition 3: A feature granule $g \in G$ is a vector defined in the feature granule space $G$.

Definition 4: A granular kernel $gK$ is a kernel that for all $g, g' \in G$ satisfies

$gK(g, g') = \langle \psi(g), \psi(g') \rangle$ (3)

where $\psi$ is a mapping from the feature granule space $G = R^m$ to an inner product feature space $R^E$:

$\psi : g \mapsto \psi(g) \in R^E.$ (4)
2.2 Granular Kernel properties

Property 1: Granular kernels inherit the properties of traditional kernels, such as closure under sum, product, and multiplication by a positive constant, over the granular feature spaces.

Let $G$ be a feature granule space and $g, g' \in G$. Let $gK_1$ and $gK_2$ be two granular kernels operating over the same space $G \times G$. The following $gK(g, g')$ are also granular kernels:

$gK(g, g') = c \, gK_1(g, g'), \quad c \in R^+$ (5)

$gK(g, g') = gK_1(g, g') + c, \quad c \in R^+$ (6)

$gK(g, g') = gK_1(g, g') + gK_2(g, g')$ (7)

$gK(g, g') = gK_1(g, g') \, gK_2(g, g')$ (8)

$gK(g, g') = f(g) f(g'), \quad f : X \to R$ (9)

$gK(g, g') = \dfrac{gK_1(g, g')}{\sqrt{gK_1(g, g) \, gK_1(g', g')}}.$ (10)

These properties follow directly from the traditional kernel properties.
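As a sketch (ours, not from the paper), the closure properties (5)-(10) can be exercised numerically with simple granular kernels; the helper names below are illustrative, and the base kernel is an RBF over one feature granule.

```python
import math

def rbf_gk(g, gp, gamma=0.5):
    """A simple granular kernel: RBF over one feature granule."""
    d2 = sum((a - b) ** 2 for a, b in zip(g, gp))
    return math.exp(-gamma * d2)

def scale(gk, c):   # property (5): multiplication by a positive constant
    return lambda g, gp: c * gk(g, gp)

def shift(gk, c):   # property (6): adding a positive constant
    return lambda g, gp: gk(g, gp) + c

def add(gk1, gk2):  # property (7): sum closure
    return lambda g, gp: gk1(g, gp) + gk2(g, gp)

def mul(gk1, gk2):  # property (8): product closure
    return lambda g, gp: gk1(g, gp) * gk2(g, gp)

def normalise(gk):  # property (10): cosine normalisation
    return lambda g, gp: gk(g, gp) / math.sqrt(gk(g, g) * gk(gp, gp))

g, gp = (0.1, 0.4), (0.3, 0.2)
# Compose a new granular kernel from the closure operations above.
k = normalise(add(scale(rbf_gk, 2.0), mul(rbf_gk, rbf_gk)))
```

The composed `k` stays symmetric and, after normalisation, satisfies $k(g, g) = 1$, as expected from property (10).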
Property 2 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two granular kernels defined over different granular feature spaces under the sum operation.

To prove it, let $gK_1(g_1, g_1')$ and $gK_2(g_2, g_2')$ be two granular kernels, where $g_1, g_1' \in G_1$, $g_2, g_2' \in G_2$ and $G_1 \ne G_2$. We may define new kernels like this:

$gK((g_1, g_2), (g_1', g_2')) = gK_1(g_1, g_1')$

$gK'((g_1, g_2), (g_1', g_2')) = gK_2(g_2, g_2').$

$gK$ and $gK'$ can operate over the same feature space $(G_1 \times G_2) \times (G_1 \times G_2)$. We get

$gK_1(g_1, g_1') + gK_2(g_2, g_2') = gK((g_1, g_2), (g_1', g_2')) + gK'((g_1, g_2), (g_1', g_2')).$

According to the sum closure property of kernels (Cristianini and Shawe-Taylor, 1999), $gK_1(g_1, g_1') + gK_2(g_2, g_2')$ is a kernel over $(G_1 \times G_2) \times (G_1 \times G_2)$.
Property 3 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two granular kernels defined over different granular feature spaces under the product operation.

To prove it, let $gK_1(g_1, g_1')$ and $gK_2(g_2, g_2')$ be two granular kernels, where $g_1, g_1' \in G_1$, $g_2, g_2' \in G_2$ and $G_1 \ne G_2$. We may define new kernels like this:

$gK((g_1, g_2), (g_1', g_2')) = gK_1(g_1, g_1')$

$gK'((g_1, g_2), (g_1', g_2')) = gK_2(g_2, g_2').$
So $gK$ and $gK'$ can operate over the same feature space $(G_1 \times G_2) \times (G_1 \times G_2)$. We get

$gK_1(g_1, g_1') \, gK_2(g_2, g_2') = gK((g_1, g_2), (g_1', g_2')) \, gK'((g_1, g_2), (g_1', g_2')).$

According to the product closure property of kernels (Cristianini and Shawe-Taylor, 1999), $gK_1(g_1, g_1') \, gK_2(g_2, g_2')$ is a kernel over $(G_1 \times G_2) \times (G_1 \times G_2)$.
2.3 GKTs and EGKTs
An easy and effective way to construct new kernel functions is to combine a group of granular kernels via some simple operations such as sum and product. The new kernel functions can be naturally expressed as tree structures. The following are the main steps in GKT design.
Step 1: Features are bundled into feature granules according to some prior knowledge, such as object structures and feature relationships, or with an automatic learning algorithm.
Step 2: A tree structure is constructed with a suitable number of layers, nodes and connections. As in the first step, we can construct trees according to some prior knowledge or with an automatic learning algorithm. Figure 1 shows a kind of GKT with $m$ basic granular kernels $gK_t$ and $m$ pairs of feature granules $g_t$ and $g_t'$, where $1 \le t \le m$.
Step 3: Granular kernels are selected from the candidate kernel set. Some popular traditional kernels, such as RBF kernels and polynomial kernels, can be chosen as granular kernels, since these kernels have proved successful in many real problems. Some special kernels designed for particular problems could also be selected as granular kernels if they are good at measuring the similarities of the corresponding feature granules.
Step 4: Parameters of granular kernels and operations of connection nodes are selected. Each connection operation in GKTs can be a sum or a product. A positive connection weight may be associated with each edge in the tree, and a granular kernel may belong to one or more subtrees.
In this paper, GAs are used to find the optimum parameter settings of GKTs. We use EGKTs to denote such evolutionary GKTs. The following are the basic definitions and operations used in optimising EGKTs.

Chromosome: Let $P_i$ denote the population in generation $G_i$, where $i = 1, \ldots, m$ and $m$ is the total number of generations. Each population $P_i$ has $p$ chromosomes $c_{ij}$, $j = 1, \ldots, p$. Each chromosome $c_{ij}$ has $q$ genes $g_t(c_{ij})$, where $t = 1, \ldots, q$. Here each gene is a parameter of GKTs, and we use GKTs($c_{ij}$) to represent the GKTs configured with genes $g_t(c_{ij})$, $t = 1, \ldots, q$.
Fitness: There are several methods to evaluate SVMs' performance. One is k-fold cross-validation, which is a popular technique for performance evaluation. Others evaluate theoretical bounds on the generalisation error, such as the Xi-Alpha bound (Joachims, 2000), the VC bound (Vapnik, 1998), the radius-margin bound and the VC span bound (Vapnik and Chapelle, 2000). A detailed review can be found in Duan et al. (2003). In this paper, we use k-fold cross-validation to evaluate SVMs' performance in the training phase.
Figure 1 An example of GKTs
In k-fold cross-validation, the training data set $S$ is separated into $k$ mutually exclusive subsets $S_v$:

$S = \bigcup_{v=1}^{k} S_v, \quad v = 1, \ldots, k.$ (11)

For $v = 1, \ldots, k$, the data set $S - S_v$ is used to train the SVMs with GKTs($c_{ij}$), and $S_v$ is used to evaluate the SVM model. After $k$ rounds of training and testing on all the different subsets, we get $k$ prediction accuracies. The fitness $f_{ij}$ of chromosome $c_{ij}$ is calculated by

$f_{ij} = \frac{1}{k} \sum_{v=1}^{k} Acc_v$ (12)

where $Acc_v$ is the prediction accuracy of GKTs($c_{ij}$) on $S_v$.
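The fitness computation of equations (11)-(12) can be sketched as follows; this is our illustration, and the toy `majority_scorer` merely stands in for training and evaluating one SVM with GKTs($c_{ij}$).

```python
def k_fold_fitness(samples, k, train_and_score):
    """Fitness of one chromosome: average accuracy over k mutually
    exclusive folds, as in equation (12)."""
    folds = [samples[v::k] for v in range(k)]        # k disjoint subsets S_v
    accs = []
    for v in range(k):
        held_out = folds[v]                          # S_v: evaluation fold
        train = [s for u in range(k) if u != v for s in folds[u]]
        accs.append(train_and_score(train, held_out))  # Acc_v
    return sum(accs) / k                             # f_ij

# Toy stand-in scorer: predicts the majority training label.
def majority_scorer(train, test):
    labels = [y for _, y in train]
    pred = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == pred) / len(test)

data = [((float(i),), 1 if i % 2 else -1) for i in range(20)]
fitness = k_fold_fitness(data, 5, majority_scorer)
```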
Selection: In the algorithm, the roulette wheel method described in Michalewicz (1996)
is used to select individuals for the new population.
Crossover: Two chromosomes are first selected randomly from the current generation as parents, and then a crossover point is randomly selected to separate the chromosomes. Parts of the chromosomes are exchanged between the parents to generate two children.
Mutation: Some chromosomes are randomly selected and some genes are randomly
chosen from each selected chromosome for mutation. The values of mutated genes are
replaced by random values.
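A minimal sketch of the three operators described above (roulette-wheel selection, one-point crossover, and random-reset mutation). This is our illustration; the mutation rate and gene range are placeholders, not the paper's settings.

```python
import random

def roulette_select(population, fitnesses):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for chrom, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return chrom
    return population[-1]

def crossover(parent1, parent2):
    """One-point crossover: exchange gene tails at a random point."""
    point = random.randrange(1, len(parent1))
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(chrom, rate=0.5, low=0.0001, high=1.0):
    """Replace randomly chosen genes with fresh random values."""
    return [random.uniform(low, high) if random.random() < rate else g
            for g in chrom]

pop = [[random.uniform(0.0001, 1.0) for _ in range(6)] for _ in range(4)]
fits = [0.8, 0.9, 0.7, 0.85]
c1, c2 = crossover(roulette_select(pop, fits), roulette_select(pop, fits))
child = mutate(c1)
```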
2.4 Parallel GAs
We use parallel GAs to speed up SVMs model selection and parameter optimisation.In the literature, some parallel algorithms are designed for SVMs. In Dong et al. (2003), a
parallelisation approach is proposed where SVMs kernel matrix is approximated by
block diagonal matrices so an original optimisation problem can be rewritten into
hundreds of sub-problems. In Zanghirati et al. (2003) and Serafini et al. (2004), a
Gradient Projection Method (GPM) is presented and implemented for parallel
computation in SVMs. The decomposition technique is used to split the SVM Quadratic
Programming (QP) problem into smaller QP sub-problems (each sub-problem is solved
by GPM). The related SVMs software can be used in both scalar and distributed memory
parallel environments. Graf et al. (2005) develop a kind of parallel SVMs called Cascade
of SVMs on a distributed environment, where smaller optimisations are solved
independently. The partial results are combined and filtered again in a Cascade of SVMs,
until the global optimum is reached. Convergence to the global optimum is guaranteed
with multiple passes through the Cascade.
Besides the works mentioned above, Runarsson and Sigurdsson (2004) use a parallel method to speed up evolutionary model selection for SVMs. The algorithm is implemented on a multi-processor computer in C++ using standard Posix threads.

In GKT optimisation, all parameters and operations to be optimised are independent in each generation, so it is well suited to a parallel GA-based system to speed up GKT optimisation. Parallel GAs (Cantú-Paz, 1998; Adamidis, 1994; Lin et al., 1997) have been well studied in recent years. There are three common types of parallel GA models:

single population master-slave models
single population fine-grained models
multiple population coarse-grained models.
In this paper, the parallel GA system is designed based on the first type of model. In the system, one processor is chosen as the master, which stores the population, performs selection, crossover and mutation, and then distributes individuals to slave processors on the cluster. Each single SVM model is trained and evaluated on one of the slave processors with the received individual (parameters). After fitness evaluation, each slave sends the fitness value back to the master. The architecture of the parallel GAs is shown in Figure 2. The parallel GAs-SVMs system has several characteristics. First, it is a global GAs-SVMs system, since all evaluations and operations are performed on the entire
population. Second, the implementation is easy, clear, practical, and especially suitable for SVM model selection. Third, the system can easily be moved to a large distributed computing environment, such as a grid-computing system.
Figure 2 Parallel GAs model
QP decomposition based parallel computing can also speed up SVM model selection in a distributed system, but if the training data set is large, the communication costs for transferring sub-QP intermediate results will be very high. On the other hand, in SVM model selection, each SVM model spends most of its time on the QP calculation, which generally has a running time of a higher order of magnitude than the GA operations. In the master-slave based parallel GAs-SVMs system, only parameters and fitness values need to be transferred between the master and the slaves, so the communication costs are low.

Figure 3 shows an example of running time and speedup with parallel GAs on a cluster system. The cluster is a shared-disk, distributed-memory platform. In the example, the size of the dataset is 314, RBF is chosen as the kernel function, the population size is set to 300 and the number of generations is set to 50. For this example, we can see that the speedup can reach 10 with 14 nodes. Here each node is a processor. The system architecture of SVMs with EGKTs is shown in Figure 4. In practice, the regularisation parameter $C$ of SVMs is also optimised by the parallel GAs.
Figure 3 An example of running time and speedup with parallel GAs: (a) running time and (b) speedup
Figure 4 System Architecture of SVMs with EGKTs
3 Experiments
Since RBF kernels (equation (13)) usually perform best among traditional kernels, we compare GKTs and EGKTs with RBF kernels. To make a fair comparison with EGKTs, the traditional RBF kernels are also optimised by using GAs. Here we use E-RBF to denote the GA-based RBF kernels.

$K(x, z) = \exp(-\gamma \|x - z\|^2).$ (13)
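Equation (13) as code (a straightforward sketch): identical inputs give a kernel value of 1, and the value decays with squared distance at a rate set by $\gamma$.

```python
import math

def rbf_kernel(x, z, gamma):
    """Equation (13): K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

same = rbf_kernel([1.0, 2.0], [1.0, 2.0], 0.5)   # identical inputs
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], 0.5)    # distant inputs
```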
3.1 Drug sets
The drug datasets used in the experiments are pyrimidines and triazines, which are described in Hirst et al. (1994a, 1994b) and available at the UCI Repository of machine learning databases (Newman et al., 1998). The pyrimidines dataset contains 55 drugs, and each drug has three possible substitution positions (R3, R4 and R5, see Figure 5(a)). Each substituent is characterised by nine chemical property features: polarity, size, flexibility, hydrogen-bond donor, hydrogen-bond acceptor, π donor, π acceptor, polarisability, and σ effect. Drug activities are identified by the substituents. If no substituent occupies a possible position, the features are indicated by nine 1s. Each input vector includes the features of two drugs in a fixed feature order. In one vector, if the activity of the first drug is higher than that of the second one, the vector is labelled positive; otherwise it is labelled negative. So the number of features in one vector is 54.
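The pairwise encoding can be sketched as follows; this is our illustration, where each toy drug is a (features, activity) tuple with 27 feature values (3 positions × 9 features).

```python
def make_pair_vector(drug_a, drug_b):
    """Concatenate two drugs' features in fixed order and label the pair
    by which activity is higher (+1 first drug, -1 second drug)."""
    fa, act_a = drug_a
    fb, act_b = drug_b
    if act_a == act_b:
        return None          # pairs with equal activity are dropped
    label = 1 if act_a > act_b else -1
    return fa + fb, label

# Toy drugs: (27 features, activity), sizes per the pyrimidines encoding.
d1 = ([0.1] * 27, 6.8)
d2 = ([0.2] * 27, 6.1)
vec, label = make_pair_vector(d1, d2)
```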
Figure 5 Drug structures: (a) pyrimidines and (b) triazines
The pyrimidines dataset is randomly shuffled and split into two parts in the proportion 4:1. One part is used as the training set, which contains the pairs among 44 compounds. The other part is chosen as the unseen testing set, which contains the pairs among the remaining compounds and the pairs between the remaining compounds and the training compounds. So the size of the training set should be 44 × 43 = 1892 and the size of the testing set should be 44 × 11 × 2 + 11 × 10 = 1078. Due to the deletion of some pairs with the same activities, the data sets are actually a little smaller than those figures.
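The set sizes quoted above follow from counting ordered pairs; a quick check of the arithmetic (before equal-activity pairs are removed):

```python
n_train_drugs, n_test_drugs = 44, 11

# Training pairs: all ordered pairs among the 44 training compounds.
train_pairs = n_train_drugs * (n_train_drugs - 1)

# Testing pairs: ordered cross pairs between the 11 held-out compounds
# and the 44 training compounds, plus ordered pairs among the held-out 11.
test_pairs = (n_train_drugs * n_test_drugs * 2
              + n_test_drugs * (n_test_drugs - 1))
```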
The structure of triazines is described in Figure 5(b). In the triazines dataset, each compound has six possible substitution positions: the positions R3 and R4; if the substituent at R3 contains a ring itself, then R3 and R4 of this second ring; and similarly, if the substituent at R4 contains a ring itself, then R3 and R4 of this third ring. Ten features are used to characterise each position: a structure branching feature and the other nine features, which are the same as those used for each substituent of the pyrimidines. If no substituent occupies a possible position, the features are indicated by ten 1s. So each vector has 120 features. We randomly select 60 drugs from the triazines dataset and then randomly shuffle and split them into two parts in the proportion 4:1 based on drug pairs.
3.2 Feature granules and GKTs design
In the experiments, the input vectors are decomposed according to the possible substituent locations. Each feature granule includes all the features of one substituent (see Figure 6). For pyrimidines, each drug pair has six feature granules and each feature granule has nine features. For triazines, each drug pair has twelve feature granules of size 10.
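Slicing a pair vector into feature granules can be sketched as follows (granule sizes per the text: 9 features × 6 granules for pyrimidines, 10 × 12 for triazines):

```python
def split_into_granules(vector, granule_size):
    """Cut a flat feature vector into equal-size feature granules."""
    assert len(vector) % granule_size == 0
    return [vector[i:i + granule_size]
            for i in range(0, len(vector), granule_size)]

pyr_granules = split_into_granules(list(range(54)), 9)    # 6 granules of 9
tri_granules = split_into_granules(list(range(120)), 10)  # 12 granules of 10
```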
Figure 6 Feature granules: (a) pyrimidines and (b) triazines
We design two kinds of GKTs for each dataset, which are shown in Figure 7. GKTs-1 and GKTs-2 are used for pyrimidines; GKTs-3 and GKTs-4 are used for triazines. GKTs-1 and GKTs-3 are two-layer kernel trees in which each granular kernel's importance is controlled by its outgoing connection weight. GKTs-2 and GKTs-4 are three-layer kernel trees in which each drug of a pair is represented by a two-layer subtree; the two subtrees are combined by a product operation at the top of the tree.
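The three-layer variants can be sketched as two weighted-sum subtrees, one per drug in the pair, joined by a product at the root (our illustration; granule boundaries, weights and gamma values are placeholders):

```python
import math

def rbf(u, v, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def subtree(x, z, granules, gammas, weights):
    """Two-layer subtree: weighted sum of granular RBF kernels."""
    return sum(w * rbf(x[lo:hi], z[lo:hi], g)
               for (lo, hi), g, w in zip(granules, gammas, weights))

def gkt3(x, z, drug_len, granules, gammas, weights):
    """Three-layer GKT: product of the two drugs' subtree kernels.
    The product closure (Property 3) keeps this a valid kernel."""
    k1 = subtree(x[:drug_len], z[:drug_len], granules, gammas, weights)
    k2 = subtree(x[drug_len:], z[drug_len:], granules, gammas, weights)
    return k1 * k2

# Pyrimidines-style sizes: 2 drugs x 3 granules x 9 features.
granules = [(0, 9), (9, 18), (18, 27)]
x, z = [0.1] * 54, [0.3] * 54
val = gkt3(x, z, 27, granules, [0.5] * 3, [1.0] * 3)
```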
3.3 Experimental setup
RBF kernel functions are also chosen as the granular kernel functions in each GKT, and therefore each granular kernel $gK_i$ has an RBF parameter $\gamma_i$. The initial ranges of all RBF parameters $\gamma$ and $\gamma_i$ are set to [0.0001, 1]. The initial range of the regularisation parameter $C$ is [1, 256]. The probability of crossover is 0.7 and the mutation ratio is 0.5. The range of the connection weights is [0.001, 1]. 5-fold cross-validation is used on the pyrimidines training dataset and 8-fold cross-validation is used on the triazines training dataset. In cross-validation, the training data are split in the same way as described in subsection 3.1. The population size is set to 500 and the number of generations is set to 30 for both datasets. The SVM software package used in the experiments is LibSVM (Chang and Lin, 2001).
Figure 7 Granular Kernel Trees: (a) GKTs-1 and (b) GKTs-2 for pyrimidines; (c) GKTs-3 and (d) GKTs-4 for triazines
Figure 7 Granular Kernel Trees (continued): (d) GKTs-4
3.4 Experimental results and comparisons
Table 1 shows the performances of the three GA-based kernels on the pyrimidines dataset. EGKTs-1 is the evolutionary GKTs-1 and EGKTs-2 is the evolutionary GKTs-2. From Table 1, we can see that SVMs with the two kinds of EGKTs outperform SVMs with E-RBF by 3.3% and 3.0%, respectively, in terms of prediction accuracy on the unseen testing dataset. The fitness values and training accuracies of SVMs with EGKTs are also higher than those of SVMs with the E-RBF kernels. It is also shown that the testing accuracy of SVMs with EGKTs-1 is a little higher than that of SVMs with EGKTs-2 on pyrimidines.
Table 1 Prediction accuracies on pyrimidines dataset
E-RBF(%) EGKTs-1 (%) EGKTs-2 (%)
Fitness 84.5 86.6 88.5
Training accuracy 96.8 96.8 98.8
Testing accuracy 88.4 91.7 91.4
The performances of the three GA-based kernels on the triazines dataset are shown in Table 2. On testing accuracy, SVMs with EGKTs-3 (evolutionary GKTs-3) and EGKTs-4 (evolutionary GKTs-4) are better than SVMs with E-RBF by 3.7% and 4.9%, respectively. We find that the training accuracies are much higher than both the testing accuracies and the fitness values for all three kernels on both datasets, especially on the triazines dataset. The reason could be that the data are complicated and the SVMs easily overfit the training dataset.
Table 2 Prediction accuracies on triazines dataset
E-RBF(%) EGKTs-3 (%) EGKTs-4 (%)
Fitness 73.8 74.6 75.8
Training accuracy 93.4 97.2 98.7
Testing accuracy 79.6 83.3 84.5
The comparisons between RBF kernels and GKTs are made by using a large number of kernel parameter samples. We randomly generate 2000 $C$ values from [1, 256] for the SVMs and 2000 groups of kernel parameters for each kernel. SVMs are trained and tested with these random parameters. For each dataset, the prediction accuracy curves of the three kernels are drawn in one figure (Figures 8 and 9), and each of them is ordered by $C$ value. From Figures 8 and 9, it is easy to see that the performances of the GKTs are better than those of the RBF kernels. Quartiles and the mean are also used to summarise each kernel's performance in terms of testing accuracy. The results are listed in Tables 3 and 4. Based on the differences in Q1 (25th percentile), Q2 (median), Q3 (75th percentile) and mean values, we can conclude that the performances of the two GKTs are better than those of the RBF kernels by about 2.3-3.4% on pyrimidines and 3.6-4.5% on triazines.
Table 3 Testing accuracies on pyrimidines dataset with 2000 groups of random parameters
RBF(%) GKTs-1 (%) GKTs-2 (%)
Maximum 91.0 93.2 93.0
75th percentile 88.4 91.7 91.0
Median 88.0 91.3 90.6
25th percentile 87.5 90.9 90.1
Minimum 83.5 87.0 87.2
Mean 88.2 91.2 90.5
Table 4 Testing accuracies on triazines dataset with 2000 groups of random parameters
RBF(%) GKTs-3 (%) GKTs-4 (%)
Maximum 83.9 88.2 88.2
75th percentile 79.9 83.7 84.1
Median 78.5 82.6 83.0
25th percentile 77.9 81.5 82
Minimum 72.2 77.8 76.2
Mean 78.9 82.6 83.0
We can see that almost all the testing accuracies of EGKTs in Tables 1 and 2 are better than the maximum testing accuracies of the RBF kernels in Tables 3 and 4. We can also find that the testing accuracies of the GA-based kernel methods stabilise at around the Q3 level.
Figure 8 Testing accuracy comparisons on pyrimidines
Figure 9 Testing accuracy comparisons on triazines
4 Conclusions and future work
This paper has proposed an approach to constructing GKTs according to the granular kernel concept and its properties. The experimental results have shown that GKTs and EGKTs perform better than traditional RBF kernels in drug activity comparisons. It is promising to construct more powerful and suitable kernels by using this kind of evolutionary hierarchical kernel design. In the future, we will continue our research on evolutionary granular kernel tree design for other problems. How to generate feature granules could be one issue in cases where the relationships among features are complex.
Acknowledgements
This work is supported in part by the NIH under P20 GM065762. Bo Jin is supported by the Molecular Basis for Disease (MBD) Doctoral Fellowship Program.
References
Adamidis, P. (1994) 'Review of parallel genetic algorithms bibliography', Internal T.R., Aristotle University of Thessaloniki, Greece.
Berg, C., Christensen, J.P.R. and Ressel, P. (1984) Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions, Springer-Verlag, New York, USA.
Boser, B., Guyon, I. and Vapnik, V.N. (1992) 'A training algorithm for optimal margin classifiers', Proc. Fifth Annual Workshop on Computational Learning Theory, ACM Press, USA, pp.144-152.
Burbidge, R., Trotter, M., Buxton, B. and Holden, S. (2001) 'Drug design by machine learning: support vector machines for pharmaceutical data analysis', Computers and Chemistry, Vol. 26, No. 1, pp.4-15.
Cantú-Paz, E. (1998) 'A survey of parallel genetic algorithms', Calculateurs Paralleles, Hermes, Paris, Vol. 10, No. 2, pp.141-171.
Chang, C-C. and Lin, C-J. (2001) LIBSVM: A Library for Support Vector Machines, Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Collins, M. and Duffy, N. (2002) 'Convolution kernels for natural language', in Dietterich, T.G., Becker, S. and Ghahramani, Z. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 14, pp.625-632.
Cortes, C. and Vapnik, V.N. (1995) 'Support-vector networks', Machine Learning, Vol. 20, pp.273-297.
Cristianini, N. and Shawe-Taylor, J. (1999) An Introduction to Support Vector Machines: And Other Kernel-based Learning Methods, Cambridge University Press, NY.
Dong, J.X., Krzyzak, A. and Suen, C.Y. (2003) 'A fast parallel optimization for training support vector machine', in Perner, P. and Rosenfeld, A. (Eds.): Proceedings of 3rd International Conference on Machine Learning and Data Mining, Springer Lecture Notes in Artificial Intelligence (LNAI 2734), Leipzig, Germany, pp.96-105.
Duan, K., Keerthi, S.S. and Poo, A.N. (2003) 'Evaluation of simple performance measures for tuning SVM hyperparameters', Neurocomputing, Vol. 51, pp.41-59.
Gärtner, T. (2003) 'A survey of kernels for structured data', ACM SIGKDD Explorations Newsletter, Vol. 5, pp.49-58.
Gärtner, T., Flach, P.A. and Wrobel, S. (2003) 'On graph kernels: hardness results and efficient alternatives', Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop.
Graf, H-P., Cosatto, E., Bottou, L., Dourdanovic, I. and Vapnik, V.N. (2005) 'Parallel support vector machines: the cascade SVM', in Saul, L., Weiss, Y. and Bottou, L. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 17, pp.513-520.
Haussler, D. (1999) 'Convolution kernels on discrete structures', Technical report UCSC-CRL-99-10, Department of Computer Science, University of California at Santa Cruz.
Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994a) 'Quantitative structure-activity relationships by neural networks and inductive logic programming. I. The inhibition of dihydrofolate reductase by pyrimidines', Journal of Computer-Aided Molecular Design, Vol. 8, No. 4, pp.405-420.
Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994b) 'Quantitative structure-activity relationships by neural networks and inductive logic programming. II. The inhibition of dihydrofolate reductase by triazines', Journal of Computer-Aided Molecular Design, Vol. 8, No. 4, pp.421-432.
Joachims, T. (2000) 'Estimating the generalization performance of a SVM efficiently', Proceedings of the International Conference on Machine Learning, Morgan Kaufmann.
Kashima, H. and Inokuchi, A. (2002) 'Kernels for graph classification', Proc. 1st ICDM Workshop on Active Mining (AM-2002), Maebashi, Japan.
Kashima, H. and Koyanagi, T. (2002) 'Kernels for semi-structured data', Proceedings of the Nineteenth International Conference on Machine Learning, pp.291-298.
Lin, S-H., Goodman, E.D. and Punch III, W.F. (1997) 'Investigating parallel genetic algorithms on job shop scheduling problem', Proceedings of the 6th International Conference on Evolutionary Programming VI.
Lodhi, H., Shawe-Taylor, J., Christianini, N. and Watkins, C. (2001) 'Text classification using string kernels', in Leen, T., Dietterich, T. and Tresp, V. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 13, pp.563-569.
Michalewicz, Z. (1996) Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Berlin.
Newman, D.J., Hettich, S., Blake, C.L. and Merz, C.J. (1998) UCI Repository of Machine Learning Databases, [http://www.ics.uci.edu/~mlearn/MLRepository.html], University of California, Department of Information and Computer Science, Irvine, CA.
Runarsson, T.P. and Sigurdsson, S. (2004) 'Asynchronous parallel evolutionary model selection for support vector machines', Neural Information Processing - Letters and Reviews, Vol. 3, No. 3, pp.59-67.
Schölkopf, B., Tsuda, K. and Vert, J-P. (2004) Kernel Methods in Computational Biology, MIT Press, Cambridge, MA.
Serafini, T., Zanni, L. and Zanghirati, G. (2004) Parallel GPDT: A Parallel Gradient Projection-based Decomposition Technique for Support Vector Machines, http://www.dm.unife.it/gpdt.
Shawe-Taylor, J. and Cristianini, N. (2004) Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge.
Vapnik, V.N. (1998) Statistical Learning Theory, John Wiley and Sons, New York.
Vapnik, V.N. and Chapelle, O. (2000) 'Bounds on error expectation for support vector machine', in Smola, A., Bartlett, P., Schölkopf, B. and Schuurmans, D. (Eds.): Advances in Large Margin Classifiers, MIT Press, Cambridge, MA, pp.261-280.
Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A. and Schölkopf, B. (2003) 'Feature selection and transduction for prediction of molecular bioactivity for drug design', Bioinformatics, Vol. 19, No. 6, pp.764-771.
Zanghirati, G. and Zanni, L. (2003) 'Parallel solver for large quadratic programs in training support vector machines', Parallel Computing, Vol. 29, pp.535-551.