Granular Kernel Tree
270 Int. J. Data Mining and Bioinformatics, Vol. 1, No. 3, 2007
Copyright 2007 Inderscience Enterprises Ltd.
Granular Kernel Trees with parallel Genetic Algorithms for drug activity comparisons
Bo Jin* and Yan-Qing Zhang
Department of Computer Science,
Georgia State University, Atlanta, GA 30302, USA
E-mail: [email protected] E-mail: [email protected]
*Corresponding author
Binghe Wang
Department of Chemistry and
Center for Biotechnology and Drug Design,
Georgia State University, Atlanta, GA 30302-4098, USA
E-mail: [email protected]
Abstract: With the growing interest in biological data prediction and chemical data prediction, more powerful and flexible kernels need to be designed so that the prior knowledge and relationships within data can be expressed effectively in kernel functions. In this paper, Granular Kernel Trees (GKTs) are proposed and parallel Genetic Algorithms (GAs) are used to optimise the parameters of GKTs. In applications, SVMs with the new kernel trees are employed for drug activity comparisons. The experimental results show that GKTs and evolutionary GKTs can achieve better performances than traditional RBF kernels in terms of prediction accuracy.
Keywords: kernel design; Support Vector Machines; SVMs; Granular Kernel Trees; GKTs; Genetic Algorithms; GAs; drug activity comparisons; data mining; bioinformatics.
Reference to this paper should be made as follows: Jin, B., Zhang, Y-Q. and Wang, B. (2007) 'Granular Kernel Trees with parallel Genetic Algorithms for drug activity comparisons', Int. J. Data Mining and Bioinformatics, Vol. 1, No. 3, pp.270-285.
Biographical notes: Bo Jin is a PhD student in the Computer Science Department at Georgia State University. He received his BE Degree from the University of Electronic Science and Technology of China. His research interests are in the areas of machine learning, data mining, chemical informatics and biomedical informatics.
Yan-Qing Zhang is currently an Associate Professor in the Computer Science Department at Georgia State University. He received a PhD Degree in Computer Science and Engineering from the University of South Florida in 1997. His research interests include hybrid intelligent systems, computational intelligence, granular computing, kernel machines, bioinformatics, data mining and computational web intelligence. He has published 3 books, 12 book chapters, 49 journal papers and over 100 conference papers. He has served as a reviewer for 37 international journals, and a committee member in over 70 international conferences. He is a program co-chair of IEEE-GrC2006.
Binghe Wang is Professor of Chemistry at Georgia State University, Georgia Research Alliance Eminent Scholar in Drug Discovery, and Georgia Cancer Coalition Distinguished Cancer Scientist. He obtained his BS Degree from Beijing Medical College in 1982 and his PhD Degree in Medicinal Chemistry from the University of Kansas, School of Pharmacy in 1991. He is Editor-in-Chief of Medicinal Research Reviews published by John Wiley and Sons and the Series Editor of A Wiley Series in Drug Discovery and Development. His research expertise includes drug delivery, drug design and synthesis, bioorganic chemistry, fluorescent sensors, and combinatorial chemistry.
1 Introduction
Kernel methods, specifically Support Vector Machines (SVMs) (Boser et al., 1992; Cortes and Vapnik, 1995; Shawe-Taylor and Cristianini, 2004; Vapnik, 1998), have been widely used in many fields such as bioinformatics (Schölkopf et al., 2004) and chemical informatics (Burbidge et al., 2001; Weston et al., 2003) for data classification and pattern recognition. With the help of kernels' nonlinear mappings, input data are transformed into a high-dimensional feature space where it is easy for SVMs to find a hyperplane to separate the data. SVMs' performance is mainly affected by the kernel functions. However, traditional kernels, such as RBF kernels and polynomial kernels, do not take into consideration the relationships and structure within each data item, but simply treat each data vector as one unit in operations. With the growing interest in biological and chemical data prediction, such as structure-property based molecule comparison, protein structure prediction and long DNA sequence comparison, more complicated kernels have been designed to integrate data structures, such as string kernels (Cristianini and Shawe-Taylor, 1999; Lodhi et al., 2001), tree kernels (Collins and Duffy, 2002; Kashima and Koyanagi, 2002) and graph kernels (Gärtner et al., 2003; Kashima and Inokuchi, 2002), based on the kernel decomposition concept. For a detailed review, please see Gärtner (2003). One common characteristic of these kernels is that feature transformations are implemented according to objects' structures without a step of input feature generation. Many of them directly implement inner product operations with some kind of iterative calculation. These transformations are very efficient when objects contain large amounts of structured information. However, for many challenging problems, objects are not structured, or some relationships within objects are not easily described directly. Furthermore, essential optimisations are needed once kernel functions are defined. It should be mentioned that Haussler (1999) first introduced decomposition-based kernel design in detail and proposed convolution kernels.
In this paper, we use granular computing concepts to redescribe decomposition-based kernel design and propose an evolutionary hierarchical approach to integrate prior knowledge, such as data structures and feature relationships, into the kernel design.
Features within an input vector are grouped into feature granules according to the
composition and structure of each data item. Each feature granule captures a particular
aspect of data items. For two input vectors, the similarity between a pair of feature
granules is measured by using a kernel function called granular kernel. Granular kernels
for different kinds of feature granules are fused together by hierarchical trees, called
GKTs. Parallel GAs are used to optimise GKTs and select an effective SVMs model.
In applications, SVMs with the new kernel trees are employed for the comparison of drug activities, which is a problem in Quantitative Structure-Activity Relationship (QSAR) analysis. QSAR is an important technique used in drug design, which describes the relationships between compound structures and their activities. In QSAR analysis, compounds with different activities are discriminated, and then predictive rules are constructed. In this study, inhibitors of E. coli dihydrofolate reductase (DHFR) are analysed. These inhibitors are potential therapeutic agents for the treatment of malaria, bacterial infection, toxoplasmosis, and cancer. Experimental results show that SVMs with both GKTs and EGKTs can achieve much better performance than SVMs with traditional RBF kernels in terms of prediction accuracy.
The rest of the paper is organised as follows. Granular kernels, kernel tree design and evolutionary optimisation are proposed in Section 2. Section 3 describes the experiments on drug activity comparisons. Finally, Section 4 gives conclusions and outlines future work.
2 Granular Kernel and Kernel tree design
2.1 Definitions
Definition 1 (Cristianini and Shawe-Taylor, 1999): A kernel is a function $K$ that for all $x, z \in X$ satisfies

$K(x, z) = \langle \phi(x), \phi(z) \rangle$ (1)

where $\phi$ is a mapping from the input space $X = R^n$ to an inner product feature space $F = R^N$:

$\phi : x \mapsto \phi(x) \in F.$ (2)

Definition 2: A feature granule space $G$ of input space $X = R^n$ is a subspace of $X$, where $G = R^m$ and $1 \le m \le n$.

From the input space we may generate many feature granule spaces, and some of them may overlap on some feature dimensions.

Definition 3: A feature granule $g \in G$ is a vector defined in the feature granule space $G$.

Definition 4: A granular kernel $gK$ is a kernel that for all $g, g' \in G$ satisfies

$gK(g, g') = \langle \psi(g), \psi(g') \rangle$ (3)

where $\psi$ is a mapping from the feature granule space $G = R^m$ to an inner product feature space $R^E$:

$\psi : g \mapsto \psi(g) \in R^E.$ (4)
2.2 Granular Kernel properties

Property 1: Granular kernels inherit the properties of traditional kernels, such as closure under sum, product, and multiplication by a positive constant, over the granular feature spaces.

Let $G$ be a feature granule space and $g, g' \in G$. Let $gK_1$ and $gK_2$ be two granular kernels operating over the same space $G \times G$. The following $gK(g, g')$ are also granular kernels:

$gK(g, g') = c \, gK_1(g, g'), \quad c \in R^+$ (5)

$gK(g, g') = gK_1(g, g') + c, \quad c \in R^+$ (6)

$gK(g, g') = gK_1(g, g') + gK_2(g, g')$ (7)

$gK(g, g') = gK_1(g, g') \, gK_2(g, g')$ (8)

$gK(g, g') = f(g) f(g'), \quad f : X \to R$ (9)

$gK(g, g') = \dfrac{gK_1(g, g')}{\sqrt{gK_1(g, g) \, gK_1(g', g')}}.$ (10)

These properties follow directly from the traditional kernel properties.
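As a sketch (ours, not from the paper), the closure properties (5)-(10) can be exercised numerically with simple granular kernels; the helper names below are illustrative, and the base kernel is an RBF over one feature granule.

```python
import math

def rbf_gk(g, gp, gamma=0.5):
    """A simple granular kernel: RBF over one feature granule."""
    d2 = sum((a - b) ** 2 for a, b in zip(g, gp))
    return math.exp(-gamma * d2)

def scale(gk, c):   # property (5): multiplication by a positive constant
    return lambda g, gp: c * gk(g, gp)

def shift(gk, c):   # property (6): adding a positive constant
    return lambda g, gp: gk(g, gp) + c

def add(gk1, gk2):  # property (7): sum closure
    return lambda g, gp: gk1(g, gp) + gk2(g, gp)

def mul(gk1, gk2):  # property (8): product closure
    return lambda g, gp: gk1(g, gp) * gk2(g, gp)

def normalise(gk):  # property (10): cosine normalisation
    return lambda g, gp: gk(g, gp) / math.sqrt(gk(g, g) * gk(gp, gp))

g, gp = (0.1, 0.4), (0.3, 0.2)
# Compose a new granular kernel from the closure operations above.
k = normalise(add(scale(rbf_gk, 2.0), mul(rbf_gk, rbf_gk)))
```

The composed `k` stays symmetric and, after normalisation, satisfies $k(g, g) = 1$, as expected from property (10).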
Property 2 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two granular kernels defined over different granular feature spaces under the sum operation.

To prove it, let $gK_1(g_1, g_1')$ and $gK_2(g_2, g_2')$ be two granular kernels, where $g_1, g_1' \in G_1$, $g_2, g_2' \in G_2$ and $G_1 \ne G_2$. We may define new kernels like this:

$gK((g_1, g_2), (g_1', g_2')) = gK_1(g_1, g_1')$

$gK'((g_1, g_2), (g_1', g_2')) = gK_2(g_2, g_2').$

$gK$ and $gK'$ can operate over the same feature space $(G_1 \times G_2) \times (G_1 \times G_2)$. We get

$gK_1(g_1, g_1') + gK_2(g_2, g_2') = gK((g_1, g_2), (g_1', g_2')) + gK'((g_1, g_2), (g_1', g_2')).$

According to the sum closure property of kernels (Cristianini and Shawe-Taylor, 1999), $gK_1(g_1, g_1') + gK_2(g_2, g_2')$ is a kernel over $(G_1 \times G_2) \times (G_1 \times G_2)$.
Property 3 (Berg et al., 1984; Haussler, 1999): A kernel can be constructed from two granular kernels defined over different granular feature spaces under the product operation.

To prove it, let $gK_1(g_1, g_1')$ and $gK_2(g_2, g_2')$ be two granular kernels, where $g_1, g_1' \in G_1$, $g_2, g_2' \in G_2$ and $G_1 \ne G_2$. We may define new kernels like this:

$gK((g_1, g_2), (g_1', g_2')) = gK_1(g_1, g_1')$

$gK'((g_1, g_2), (g_1', g_2')) = gK_2(g_2, g_2').$
So $gK$ and $gK'$ can operate over the same feature space $(G_1 \times G_2) \times (G_1 \times G_2)$. We get

$gK_1(g_1, g_1') \, gK_2(g_2, g_2') = gK((g_1, g_2), (g_1', g_2')) \, gK'((g_1, g_2), (g_1', g_2')).$

According to the product closure property of kernels (Cristianini and Shawe-Taylor, 1999), $gK_1(g_1, g_1') \, gK_2(g_2, g_2')$ is a kernel over $(G_1 \times G_2) \times (G_1 \times G_2)$.
2.3 GKTs and EGKTs
An easy and effective way to construct new kernel functions is to combine a group of granular kernels via some simple operations such as sum and product. The new kernel functions can be naturally expressed as tree structures. The following are the main steps in GKT design.
Step 1: Features are bundled into feature granules according to some prior knowledge, such as object structures and feature relationships, or with an automatic learning algorithm.
Step 2: A tree structure is constructed with a suitable number of layers, nodes and connections. As in the first step, we can construct trees according to some prior knowledge or with an automatic learning algorithm. Figure 1 shows a kind of GKT with $m$ basic granular kernels $gK_t$ and $m$ pairs of feature granules $g_t$ and $g_t'$, where $1 \le t \le m$.
Step 3: Granular kernels are selected from the candidate kernel set. Some popular traditional kernels, such as RBF kernels and polynomial kernels, can be chosen as granular kernels, since these kernels have proved successful in many real problems. Some special kernels designed for particular problems could also be selected as granular kernels if they are good at measuring the similarities of the corresponding feature granules.
Step 4: Parameters of granular kernels and operations of connection nodes are selected. Each connection operation in GKTs can be a sum or a product. A positive connection weight may be associated with each edge in the tree, and a granular kernel may belong to one or more subtrees.
In this paper, GAs are used to find the optimum parameter settings of GKTs. We use EGKTs to denote such evolutionary GKTs. The following are the basic definitions and operations used in optimising EGKTs.

Chromosome: Let $P_i$ denote the population in generation $G_i$, where $i = 1, \ldots, m$ and $m$ is the total number of generations. Each population $P_i$ has $p$ chromosomes $c_{ij}$, $j = 1, \ldots, p$. Each chromosome $c_{ij}$ has $q$ genes $g_t(c_{ij})$, where $t = 1, \ldots, q$. Here each gene is a parameter of GKTs, and we use GKTs($c_{ij}$) to represent the GKTs configured with genes $g_t(c_{ij})$, $t = 1, \ldots, q$.
Fitness: There are several methods to evaluate SVMs' performance. One is k-fold cross-validation, which is a popular technique for performance evaluation. Others evaluate theoretical bounds on the generalisation error, such as the Xi-Alpha bound (Joachims, 2000), the VC bound (Vapnik, 1998), the radius-margin bound and the VC span bound (Vapnik and Chapelle, 2000). A detailed review can be found in Duan et al. (2003). In this paper, we use k-fold cross-validation to evaluate SVMs' performance in the training phase.
Figure 1 An example of GKTs
In k-fold cross-validation, the training data set $S$ is separated into $k$ mutually exclusive subsets $S_v$:

$S = \bigcup_{v=1}^{k} S_v, \quad v = 1, \ldots, k.$ (11)

For $v = 1, \ldots, k$, the data set $S - S_v$ is used to train the SVMs with GKTs($c_{ij}$), and $S_v$ is used to evaluate the SVM model. After $k$ rounds of training and testing on all the different subsets, we get $k$ prediction accuracies. The fitness $f_{ij}$ of chromosome $c_{ij}$ is calculated by

$f_{ij} = \frac{1}{k} \sum_{v=1}^{k} Acc_v$ (12)

where $Acc_v$ is the prediction accuracy of GKTs($c_{ij}$) on $S_v$.
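The fitness computation of equations (11)-(12) can be sketched as follows; this is our illustration, and the toy `majority_scorer` merely stands in for training and evaluating one SVM with GKTs($c_{ij}$).

```python
def k_fold_fitness(samples, k, train_and_score):
    """Fitness of one chromosome: average accuracy over k mutually
    exclusive folds, as in equation (12)."""
    folds = [samples[v::k] for v in range(k)]        # k disjoint subsets S_v
    accs = []
    for v in range(k):
        held_out = folds[v]                          # S_v: evaluation fold
        train = [s for u in range(k) if u != v for s in folds[u]]
        accs.append(train_and_score(train, held_out))  # Acc_v
    return sum(accs) / k                             # f_ij

# Toy stand-in scorer: predicts the majority training label.
def majority_scorer(train, test):
    labels = [y for _, y in train]
    pred = max(set(labels), key=labels.count)
    return sum(1 for _, y in test if y == pred) / len(test)

data = [((float(i),), 1 if i % 2 else -1) for i in range(20)]
fitness = k_fold_fitness(data, 5, majority_scorer)
```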
Selection: In the algorithm, the roulette wheel method described in Michalewicz (1996)
is used to select individuals for the new population.
Crossover: Two chromosomes are first selected randomly from the current generation as parents, and then a crossover point is randomly selected to separate the chromosomes. Parts of the chromosomes are exchanged between the parents to generate two children.
Mutation: Some chromosomes are randomly selected and some genes are randomly
chosen from each selected chromosome for mutation. The values of mutated genes are
replaced by random values.
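A minimal sketch of the three operators described above (roulette-wheel selection, one-point crossover, and random-reset mutation). This is our illustration; the mutation rate and gene range are placeholders, not the paper's settings.

```python
import random

def roulette_select(population, fitnesses):
    """Pick one chromosome with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = random.uniform(0, total)
    acc = 0.0
    for chrom, f in zip(population, fitnesses):
        acc += f
        if acc >= r:
            return chrom
    return population[-1]

def crossover(parent1, parent2):
    """One-point crossover: exchange gene tails at a random point."""
    point = random.randrange(1, len(parent1))
    return (parent1[:point] + parent2[point:],
            parent2[:point] + parent1[point:])

def mutate(chrom, rate=0.5, low=0.0001, high=1.0):
    """Replace randomly chosen genes with fresh random values."""
    return [random.uniform(low, high) if random.random() < rate else g
            for g in chrom]

pop = [[random.uniform(0.0001, 1.0) for _ in range(6)] for _ in range(4)]
fits = [0.8, 0.9, 0.7, 0.85]
c1, c2 = crossover(roulette_select(pop, fits), roulette_select(pop, fits))
child = mutate(c1)
```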
2.4 Parallel GAs
We use parallel GAs to speed up SVMs model selection and parameter optimisation.In the literature, some parallel algorithms are designed for SVMs. In Dong et al. (2003), a
parallelisation approach is proposed where SVMs kernel matrix is approximated by
block diagonal matrices so an original optimisation problem can be rewritten into
hundreds of sub-problems. In Zanghirati et al. (2003) and Serafini et al. (2004), a
Gradient Projection Method (GPM) is presented and implemented for parallel
computation in SVMs. The decomposition technique is used to split the SVM Quadratic
Programming (QP) problem into smaller QP sub-problems (each sub-problem is solved
by GPM). The related SVMs software can be used in both scalar and distributed memory
parallel environments. Graf et al. (2005) develop a kind of parallel SVMs called Cascade
of SVMs on a distributed environment, where smaller optimisations are solved
independently. The partial results are combined and filtered again in a Cascade of SVMs,
until the global optimum is reached. Convergence to the global optimum is guaranteed
with multiple passes through the Cascade.
Besides the works mentioned above, Runarsson and Sigurdsson (2004) use a parallel method to speed up evolutionary model selection for SVMs. The algorithm is implemented on a multi-processor computer in C++ using standard Posix threads.

In GKT optimisation, all parameters and operations to be optimised are independent in each generation, so it is well suited to a parallel GA-based system to speed up GKT optimisation. Parallel GAs (Cantú-Paz, 1998; Adamidis, 1994; Lin et al., 1997) have been well studied in recent years. There are three common types of parallel GA models:

single population master-slave models
single population fine-grained models
multiple population coarse-grained models.
In this paper, the parallel GA system is designed based on the first type of model. In the system, one processor is chosen as the master, which stores the population, performs selection, crossover and mutation, and then distributes individuals to slave processors on the cluster. Each single SVM model is trained and evaluated on one of the slave processors with the received individual (parameters). After fitness evaluation, each slave sends the fitness value back to the master. The architecture of the parallel GAs is shown in Figure 2. The parallel GAs-SVMs system has several characteristics. First, it is a global GAs-SVMs system, since all evaluations and operations are performed on the entire
population. Second, the implementation is easy, clear, practical, and especially suitable for SVM model selection. Third, the system can easily be moved to a large distributed computing environment, such as a grid-computing system.
Figure 2 Parallel GAs model
QP decomposition based parallel computing can also speed up SVM model selection in a distributed system, but if the training data set is large, the communication costs for transferring sub-QP intermediate results will be very high. On the other hand, in SVM model selection, each SVM model spends most of its time on the QP calculation, which generally has a running time of a higher order of magnitude than the GA operations. In the master-slave based parallel GAs-SVMs system, only parameters and fitness values need to be transferred between the master and the slaves, so the communication costs are low.

Figure 3 shows an example of running time and speedup with parallel GAs on a cluster system. The cluster is a shared-disk, distributed-memory platform. In the example, the size of the dataset is 314, RBF is chosen as the kernel function, the population size is set to 300 and the number of generations is set to 50. For this example, we can see that the speedup can reach 10 with 14 nodes. Here each node is a processor. The system architecture of SVMs with EGKTs is shown in Figure 4. In practice, the regularisation parameter $C$ of SVMs is also optimised by the parallel GAs.
Figure 3 An example of running time and speedup with parallel GAs: (a) running time and (b) speedup
Figure 4 System Architecture of SVMs with EGKTs
3 Experiments
Since RBF kernels (equation (13)) usually perform best among traditional kernels, we compare GKTs and EGKTs with RBF kernels. To make a fair comparison with EGKTs, the traditional RBF kernels are also optimised by using GAs. Here we use E-RBF to denote the GA-based RBF kernels.

$K(x, z) = \exp(-\gamma \|x - z\|^2).$ (13)
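Equation (13) as code (a straightforward sketch): identical inputs give a kernel value of 1, and the value decays with squared distance at a rate set by $\gamma$.

```python
import math

def rbf_kernel(x, z, gamma):
    """Equation (13): K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

same = rbf_kernel([1.0, 2.0], [1.0, 2.0], 0.5)   # identical inputs
far = rbf_kernel([0.0, 0.0], [3.0, 4.0], 0.5)    # distant inputs
```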
3.1 Drug sets
The drug datasets used in the experiments are pyrimidines and triazines, which are described in Hirst et al. (1994a, 1994b) and available at the UCI Repository of machine learning databases (Newman et al., 1998). The pyrimidines dataset contains 55 drugs, and each drug has three possible substitution positions (R3, R4 and R5, see Figure 5(a)). Each substituent is characterised by nine chemical property features: polarity, size, flexibility, hydrogen-bond donor, hydrogen-bond acceptor, π donor, π acceptor, polarisability, and σ effect. Drug activities are identified by the substituents. If no substituent occupies a possible position, the features are indicated by nine 1s. Each input vector includes the features of two drugs in a fixed feature order. In one vector, if the activity of the first drug is higher than that of the second one, the vector is labelled positive; otherwise it is labelled negative. So the number of features in one vector is 54.
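The pairwise encoding can be sketched as follows; this is our illustration, where each toy drug is a (features, activity) tuple with 27 feature values (3 positions × 9 features).

```python
def make_pair_vector(drug_a, drug_b):
    """Concatenate two drugs' features in fixed order and label the pair
    by which activity is higher (+1 first drug, -1 second drug)."""
    fa, act_a = drug_a
    fb, act_b = drug_b
    if act_a == act_b:
        return None          # pairs with equal activity are dropped
    label = 1 if act_a > act_b else -1
    return fa + fb, label

# Toy drugs: (27 features, activity), sizes per the pyrimidines encoding.
d1 = ([0.1] * 27, 6.8)
d2 = ([0.2] * 27, 6.1)
vec, label = make_pair_vector(d1, d2)
```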
Figure 5 Drug structures: (a) pyrimidines and (b) triazines
The pyrimidines dataset is randomly shuffled and split into two parts in the proportion 4:1. One part is used as the training set, which contains the pairs among 44 compounds. The other part is chosen as the unseen testing set, which contains the pairs among the remaining compounds and the pairs between the remaining compounds and the training compounds. So the size of the training set should be 44 × 43 = 1892 and the size of the testing set should be 44 × 11 × 2 + 11 × 10 = 1078. Due to the deletion of some pairs with the same activities, the data sets are actually a little smaller than those figures.
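The set sizes quoted above follow from counting ordered pairs; a quick check of the arithmetic (before equal-activity pairs are removed):

```python
n_train_drugs, n_test_drugs = 44, 11

# Training pairs: all ordered pairs among the 44 training compounds.
train_pairs = n_train_drugs * (n_train_drugs - 1)

# Testing pairs: ordered cross pairs between the 11 held-out compounds
# and the 44 training compounds, plus ordered pairs among the held-out 11.
test_pairs = (n_train_drugs * n_test_drugs * 2
              + n_test_drugs * (n_test_drugs - 1))
```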
The structure of triazines is described in Figure 5(b). In the triazines dataset, each compound has six possible substitution positions: the positions R3 and R4; if the substituent at R3 contains a ring itself, then R3 and R4 of this second ring; and similarly, if the substituent at R4 contains a ring itself, then R3 and R4 of this third ring. Ten features are used to characterise each position: a structure branching feature and the other nine features, which are the same as those used for each substituent of the pyrimidines. If no substituent occupies a possible position, the features are indicated by ten 1s. So each vector has 120 features. We randomly select 60 drugs from the triazines dataset and then randomly shuffle and split them into two parts in the proportion 4:1 based on drug pairs.
3.2 Feature granules and GKTs design
In the experiments, the input vectors are decomposed according to the possible substituent locations. Each feature granule includes all the features of one substituent (see Figure 6). For pyrimidines, each drug pair has six feature granules and each feature granule has nine features. For triazines, each drug pair has twelve feature granules of size 10.
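Slicing a pair vector into feature granules can be sketched as follows (granule sizes per the text: 9 features × 6 granules for pyrimidines, 10 × 12 for triazines):

```python
def split_into_granules(vector, granule_size):
    """Cut a flat feature vector into equal-size feature granules."""
    assert len(vector) % granule_size == 0
    return [vector[i:i + granule_size]
            for i in range(0, len(vector), granule_size)]

pyr_granules = split_into_granules(list(range(54)), 9)    # 6 granules of 9
tri_granules = split_into_granules(list(range(120)), 10)  # 12 granules of 10
```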
Figure 6 Feature granules: (a) pyrimidines and (b) triazines
We design two kinds of GKTs for each dataset, which are shown in Figure 7. GKTs-1 and GKTs-2 are used for pyrimidines; GKTs-3 and GKTs-4 are used for triazines. GKTs-1 and GKTs-3 are two-layer kernel trees in which each granular kernel's importance is controlled by its outgoing connection weight. GKTs-2 and GKTs-4 are three-layer kernel trees in which each drug of a pair is represented by a two-layer subtree; the two subtrees are combined by a product operation at the top of the tree.
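The three-layer variants can be sketched as two weighted-sum subtrees, one per drug in the pair, joined by a product at the root (our illustration; granule boundaries, weights and gamma values are placeholders):

```python
import math

def rbf(u, v, gamma):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def subtree(x, z, granules, gammas, weights):
    """Two-layer subtree: weighted sum of granular RBF kernels."""
    return sum(w * rbf(x[lo:hi], z[lo:hi], g)
               for (lo, hi), g, w in zip(granules, gammas, weights))

def gkt3(x, z, drug_len, granules, gammas, weights):
    """Three-layer GKT: product of the two drugs' subtree kernels.
    The product closure (Property 3) keeps this a valid kernel."""
    k1 = subtree(x[:drug_len], z[:drug_len], granules, gammas, weights)
    k2 = subtree(x[drug_len:], z[drug_len:], granules, gammas, weights)
    return k1 * k2

# Pyrimidines-style sizes: 2 drugs x 3 granules x 9 features.
granules = [(0, 9), (9, 18), (18, 27)]
x, z = [0.1] * 54, [0.3] * 54
val = gkt3(x, z, 27, granules, [0.5] * 3, [1.0] * 3)
```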
3.3 Experimental setup
RBF kernel functions are also chosen as the granular kernel functions in each GKT, and therefore each granular kernel $gK_i$ has an RBF parameter $\gamma_i$. The initial ranges of all RBF parameters $\gamma$ and $\gamma_i$ are set to [0.0001, 1]. The initial range of the regularisation parameter $C$ is [1, 256]. The probability of crossover is 0.7 and the mutation ratio is 0.5. The range of the connection weights is [0.001, 1]. 5-fold cross-validation is used on the pyrimidines training dataset and 8-fold cross-validation is used on the triazines training dataset. In cross-validation, the training data are split in the same way as described in subsection 3.1. The population size is set to 500 and the number of generations is set to 30 for both datasets. The SVM software package used in the experiments is LibSVM (Chang and Lin, 2001).
Figure 7 Granular Kernel Trees: (a) GKTs-1 and (b) GKTs-2 for pyrimidines; (c) GKTs-3 and (d) GKTs-4 for triazines
Figure 7 Granular Kernel Trees (continued): (d) GKTs-4
3.4 Experimental results and comparisons
Table 1 shows the performances of the three GA-based kernels on the pyrimidines dataset. EGKTs-1 is the evolutionary GKTs-1 and EGKTs-2 is the evolutionary GKTs-2. From Table 1, we can see that SVMs with the two kinds of EGKTs outperform SVMs with E-RBF by 3.3% and 3.0%, respectively, in terms of prediction accuracy on the unseen testing dataset. The fitness values and training accuracies of SVMs with EGKTs are also higher than those of SVMs with the E-RBF kernels. It is also shown that the testing accuracy of SVMs with EGKTs-1 is a little higher than that of SVMs with EGKTs-2 on pyrimidines.
Table 1 Prediction accuracies on pyrimidines dataset
E-RBF(%) EGKTs-1 (%) EGKTs-2 (%)
Fitness 84.5 86.6 88.5
Training accuracy 96.8 96.8 98.8
Testing accuracy 88.4 91.7 91.4
The performances of the three GA-based kernels on the triazines dataset are shown in Table 2. On testing accuracy, SVMs with EGKTs-3 (evolutionary GKTs-3) and EGKTs-4 (evolutionary GKTs-4) are better than SVMs with E-RBF by 3.7% and 4.9%, respectively. We find that the training accuracies are much higher than both the testing accuracies and the fitness values for all three kernels on both datasets, especially on the triazines dataset. The reason could be that the data are complicated and the SVMs easily overfit the training dataset.
Table 2 Prediction accuracies on triazines dataset
E-RBF(%) EGKTs-3 (%) EGKTs-4 (%)
Fitness 73.8 74.6 75.8
Training accuracy 93.4 97.2 98.7
Testing accuracy 79.6 83.3 84.5
The comparisons between RBF kernels and GKTs are made by using a large number of kernel parameter samples. We randomly generate 2000 $C$ values from [1, 256] for the SVMs and 2000 groups of kernel parameters for each kernel. SVMs are trained and tested with these random parameters. For each dataset, the prediction accuracy curves of the three kernels are drawn in one figure (Figures 8 and 9), and each of them is ordered by $C$ value. From Figures 8 and 9, it is easy to see that the performances of the GKTs are better than those of the RBF kernels. Quartiles and the mean are also used to summarise each kernel's performance in terms of testing accuracy. The results are listed in Tables 3 and 4. Based on the differences in Q1 (25th percentile), Q2 (median), Q3 (75th percentile) and mean values, we can conclude that the performances of the two GKTs are better than those of the RBF kernels by about 2.3-3.4% on pyrimidines and 3.6-4.5% on triazines.
Table 3 Testing accuracies on pyrimidines dataset with 2000 groups of random parameters
RBF(%) GKTs-1 (%) GKTs-2 (%)
Maximum 91.0 93.2 93.0
75th percentile 88.4 91.7 91.0
Median 88.0 91.3 90.6
25th percentile 87.5 90.9 90.1
Minimum 83.5 87.0 87.2
Mean 88.2 91.2 90.5
Table 4 Testing accuracies on triazines dataset with 2000 groups of random parameters
RBF(%) GKTs-3 (%) GKTs-4 (%)
Maximum 83.9 88.2 88.2
75th percentile 79.9 83.7 84.1
Median 78.5 82.6 83.0
25th percentile 77.9 81.5 82
Minimum 72.2 77.8 76.2
Mean 78.9 82.6 83.0
We can see that almost all the testing accuracies of EGKTs in Tables 1 and 2 are better than the maximum testing accuracies of the RBF kernels in Tables 3 and 4. We can also find that the testing accuracies of the GA-based kernel methods stabilise at around the Q3 level.
Figure 8 Testing accuracy comparisons on pyrimidines
Figure 9 Testing accuracy comparisons on triazines
4 Conclusions and future work
This paper has proposed an approach to constructing GKTs according to the granular kernel concept and its properties. The experimental results have shown that GKTs and EGKTs perform better than traditional RBF kernels in drug activity comparisons. It is promising to construct more powerful and suitable kernels by using this kind of evolutionary hierarchical kernel design. In the future, we will continue our research on evolutionary granular kernel tree design for other problems. How to generate feature granules could be one issue in cases where the relationships among features are complex.
Acknowledgements
This work is supported in part by the NIH under P20 GM065762. Bo Jin is supported by the Molecular Basis for Disease (MBD) Doctoral Fellowship Program.
References
Adamidis, P. (1994) 'Review of parallel genetic algorithms bibliography', Internal T.R., Aristotle University of Thessaloniki, Greece.
Berg, C., Christensen, J.P.R. and Ressel, P. (1984) Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions, Springer-Verlag, New York, USA.
Boser, B., Guyon, I. and Vapnik, V.N. (1992) 'A training algorithm for optimal margin classifiers', Proc. Fifth Annual Workshop on Computational Learning Theory, ACM Press, USA, pp.144-152.
Burbidge, R., Trotter, M., Buxton, B. and Holden, S. (2001) 'Drug design by machine learning: support vector machines for pharmaceutical data analysis', Computers and Chemistry, Vol. 26, No. 1, pp.4-15.
Cantú-Paz, E. (1998) 'A survey of parallel genetic algorithms', Calculateurs Paralleles, Hermes, Paris, Vol. 10, No. 2, pp.141-171.
Chang, C-C. and Lin, C-J. (2001) LIBSVM: A Library for Support Vector Machines, Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Collins, M. and Duffy, N. (2002) 'Convolution kernels for natural language', in Dietterich, T.G., Becker, S. and Ghahramani, Z. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 14, pp.625-632.
Cortes, C. and Vapnik, V.N. (1995) 'Support-vector networks', Machine Learning, Vol. 20, pp.273-297.
Cristianini, N. and Shawe-Taylor, J. (1999) An Introduction to Support Vector Machines: And Other Kernel-based Learning Methods, Cambridge University Press, NY.
Dong, J.X., Krzyzak, A. and Suen, C.Y. (2003) 'A fast parallel optimization for training support vector machine', in Perner, P. and Rosenfeld, A. (Eds.): Proceedings of 3rd International Conference on Machine Learning and Data Mining, Springer Lecture Notes in Artificial Intelligence (LNAI 2734), Leipzig, Germany, pp.96-105.
Duan, K., Keerthi, S.S. and Poo, A.N. (2003) 'Evaluation of simple performance measures for tuning SVM hyperparameters', Neurocomputing, Vol. 51, pp.41-59.
Gärtner, T. (2003) 'A survey of kernels for structured data', ACM SIGKDD Explorations Newsletter, Vol. 5, pp.49-58.
Gärtner, T., Flach, P.A. and Wrobel, S. (2003) 'On graph kernels: hardness results and efficient alternatives', Proceedings of the 16th Annual Conference on Computational Learning Theory and the 7th Kernel Workshop.
Graf, H-P., Cosatto, E., Bottou, L., Dourdanovic, I. and Vapnik, V.N. (2005) 'Parallel support vector machines: the cascade SVM', in Saul, L., Weiss, Y. and Bottou, L. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 17, pp.513-520.
Haussler, D. (1999) 'Convolution kernels on discrete structures', Technical report UCSC-CRL-99-10, Department of Computer Science, University of California at Santa Cruz.
Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994a) 'Quantitative structure-activity relationships by neural networks and inductive logic programming. I. The inhibition of dihydrofolate reductase by pyrimidines', Journal of Computer-Aided Molecular Design, Vol. 8, No. 4, pp.405-420.
Hirst, J.D., King, R.D. and Sternberg, M.J.E. (1994b) 'Quantitative structure-activity relationships by neural networks and inductive logic programming. II. The inhibition of dihydrofolate reductase by triazines', Journal of Computer-Aided Molecular Design, Vol. 8, No. 4, pp.421-432.
Joachims, T. (2000) 'Estimating the generalization performance of a SVM efficiently', Proceedings of the International Conference on Machine Learning, Morgan Kaufmann.
Kashima, H. and Inokuchi, A. (2002) 'Kernels for graph classification', Proc. 1st ICDM Workshop on Active Mining (AM-2002), Maebashi, Japan.
Kashima, H. and Koyanagi, T. (2002) 'Kernels for semi-structured data', Proceedings of the Nineteenth International Conference on Machine Learning, pp.291-298.
Lin, S-H., Goodman, E.D. and Punch III, W.F. (1997) 'Investigating parallel genetic algorithms on job shop scheduling problem', Proceedings of the 6th International Conference on Evolutionary Programming VI.
Lodhi, H., Shawe-Taylor, J., Christianini, N. and Watkins, C. (2001) 'Text classification using string kernels', in Leen, T., Dietterich, T. and Tresp, V. (Eds.): Advances in Neural Information Processing Systems, MIT Press, Cambridge, MA, Vol. 13, pp.563-569.
Michalewicz, Z. (1996) Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag, Berlin.
Newman, D.J., Hettich, S., Blake, C.L. and Merz, C.J. (1998) UCI Repository of Machine Learning Databases, [http://www.ics.uci.edu/~mlearn/MLRepository.html], University of California, Department of Information and Computer Science, Irvine, CA.
Runarsson, T.P. and Sigurdsson, S. (2004) 'Asynchronous parallel evolutionary model selection for support vector machines', Neural Information Processing - Letters and Reviews, Vol. 3, No. 3, pp.59-67.
Schölkopf, B., Tsuda, K. and Vert, J-P. (2004) Kernel Methods in Computational Biology, MIT Press, Cambridge, MA.
Serafini, T., Zanni, L. and Zanghirati, G. (2004) Parallel GPDT: A Parallel Gradient Projection-based Decomposition Technique for Support Vector Machines, http://www.dm.unife.it/gpdt.
Shawe-Taylor, J. and Cristianini, N. (2004) Kernel Methods for Pattern Analysis, Cambridge University Press, Cambridge.
Vapnik, V.N. (1998) Statistical Learning Theory, John Wiley and Sons, New York.
Vapnik, V.N. and Chapelle, O. (2000) 'Bounds on error expectation for support vector machine', in Smola, A., Bartlett, P., Schölkopf, B. and Schuurmans, D. (Eds.): Advances in Large Margin Classifiers, MIT Press, Cambridge, MA, pp.261-280.
Weston, J., Perez-Cruz, F., Bousquet, O., Chapelle, O., Elisseeff, A. and Schölkopf, B. (2003) 'Feature selection and transduction for prediction of molecular bioactivity for drug design', Bioinformatics, Vol. 19, No. 6, pp.764-771.
Zanghirati, G. and Zanni, L. (2003) 'Parallel solver for large quadratic programs in training support vector machines', Parallel Computing, Vol. 29, pp.535-551.