Research Article
Efficient Multiple Kernel Learning Algorithms Using Low-Rank Representation

Wenjia Niu,1,2 Kewen Xia,1,2 Baokai Zu,1,2 and Jianchuan Bai1,2

1 School of Electronic and Information Engineering, Hebei University of Technology, Tianjin 300401, China
2 Key Lab of Big Data Computation of Hebei Province, Tianjin 300401, China

Correspondence should be addressed to Kewen Xia; kwxia@hebut.edu.cn

Received 23 February 2017; Revised 20 June 2017; Accepted 5 July 2017; Published 22 August 2017

Academic Editor: Cheng-Jian Lin

Copyright © 2017 Wenjia Niu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Unlike Support Vector Machine (SVM), Multiple Kernel Learning (MKL) allows datasets to choose the useful kernels based on their distribution characteristics rather than relying on a single prescribed kernel. It has been shown in the literature that MKL achieves superior recognition accuracy compared with SVM, however, at the expense of time-consuming computations. This creates analytical and computational difficulties in solving MKL algorithms. To overcome this issue, we first develop a novel kernel approximation approach for MKL and then propose an efficient Low-Rank MKL (LR-MKL) algorithm by using the Low-Rank Representation (LRR). It is well acknowledged that LRR can reduce dimension while retaining the data features under a global low-rank constraint. Furthermore, we redesign the binary-class MKL as the multiclass MKL based on the pairwise strategy. Finally, the recognition effect and efficiency of LR-MKL are verified on the datasets Yale, ORL, LSVT, and Digit. Experimental results show that the proposed LR-MKL algorithm is an efficient kernel weights allocation method in MKL and greatly boosts the performance of MKL.
1. Introduction
Support Vector Machine (SVM) is an important machine learning method [1], which trains a linear learner in the feature space derived by the kernel function and utilizes generalization theory to avoid overfitting. Recently, the Multiple Kernel Learning (MKL) method has received intensive attention due to its more desirable recognition effect over the classical SVM [2, 3]. However, parameter optimization of multiple kernels introduces a high computing cost in searching the entire feature space and solving a large number of convex quadratic optimization problems. Hassan et al. [4] utilize the Genetic Algorithm (GA) to improve the search efficiency in MKL, but the effectiveness of GA remains to be proved and its search direction is too complex to be determined. Besides, with the data volume increasing exponentially in the real world, it is intractable to solve large scale problems by using conventional optimization methods. Therefore, many approaches have been put forward to improve MKL. For example, the Sequential Minimal Optimization (SMO) algorithm [5] is a typical decomposition approach that updates one or two Lagrange multipliers at every training step to get the iterative solutions. Some online algorithms [6, 7] refine predictors through an online-to-batch conversion scheme, although it should be noted that the convergence rate of such decomposition approaches is unstable. Another approach is to approximate the kernel matrix, for example by Cholesky decomposition [8, 9], which reduces the computational cost, however, at the cost of recognition accuracy due to lost information.
Generally, when we set $n$ as the sample size and $N$ as the number of kernels, the complexities of solving convex quadratic optimization problems in SVM and MKL are $O(n^3)$ and $O(Nn^{3.5})$ [10], respectively. It can be observed that the computing scale depends on the size of the training set rather than the kernel space dimension [8]. In this big data era, it is imperative to find an approach that can minimize the computing scale while capturing the global data structure to improve SVM or MKL. Low-Rank Representation (LRR) [11] has recently attracted great interest in many research fields, such as image processing [12, 13], computer vision [14], and data mining [15]. LRR, as a compressed sensing approach, aims to find the lowest-rank linear combination of all training samples for reconstructing test samples under a global low-rank constraint. When the training samples are sufficiently complete, the process of representing data with low rank will augment the similarities among the intraclass samples and the differences among the interclass samples. Meanwhile, if the data is corrupted, since the rank of the coefficient matrix will be largely increased, the lowest-rank criterion can enforce noise correction. LRR integrates data clustering and noise correction into a unified framework, which can greatly improve the recognition accuracy and robustness in the preprocessing stage. In this sense, the recognition of SVM or MKL can be more accurate and faster when they are combined with LRR.

Hindawi, Computational Intelligence and Neuroscience, Volume 2017, Article ID 3678487, 9 pages, https://doi.org/10.1155/2017/3678487
In this paper, combining LRR and MKL, we develop a novel recognition approach and construct a Low-Rank MKL (LR-MKL) algorithm. In the proposed algorithm, the combined Low-Rank SVM (LR-SVM) is also utilized as the reference. We conduct extensive experiments on public databases to show that our proposed LR-MKL algorithm can achieve better performance than the original SVM and MKL.

The remainder of the paper is organized as follows. We start with a brief review of SVM in the next section. In Section 3, we describe some existing MKL algorithms and their structure frames. Section 4 is devoted to introducing efficient MKL algorithms using LRR, which we present and call LR-MKL. Experiments, which demonstrate the utility of the suggested algorithm on real data, are presented in Section 5. Section 6 gives the conclusions.
2. Overview of SVM

Given input space $\mathbf{X} \subseteq \mathbb{R}^D$ and label vector $\mathbf{Y}$, $\{\mathbf{X}, \mathbf{Y}\}$ meets the independent and identically distributed conditions, so the training set can be denoted as $\{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$ (containing $n$ samples). According to the theory of structural risk minimization [1], SVM can find the classification hyperplane with the maximum margin in the mapping space $\mathbb{R}^P$. Hence, the SVM training with $l_1$-norm soft margin is a quadratic optimization problem:

$$
\begin{aligned}
\min \quad & \frac{1}{2}\langle \mathbf{w}, \mathbf{w}\rangle + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.} \quad & y_i\left(\langle \mathbf{w}, \mathbf{x}_i\rangle + b\right) \ge 1 - \xi_i, \\
& \mathbf{w} \in \mathbb{R}^P, \quad \boldsymbol{\xi} \in \mathbb{R}^n_+, \quad i = 1, \ldots, n.
\end{aligned}
\tag{1}
$$
Here $\mathbf{w}$ is the weight coefficient vector, $C$ is the penalty factor, $\xi_i$ is the slack variable, and $b$ is the bias term of the classification hyperplane. The optimization problem can be transformed into its dual form by introducing the Lagrangian multipliers $\alpha_i$, $\alpha_j$, and the data $\mathbf{X}$ can be implicitly mapped to the feature space by utilizing the kernel function $K$, so formula (1) changes into

$$
\begin{aligned}
\min \quad & \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) - \sum_{i=1}^{n}\alpha_i \\
\text{s.t.} \quad & \sum_{i=1}^{n}\alpha_i y_i = 0, \\
& C \ge \alpha_i \ge 0, \quad i = 1, \ldots, n.
\end{aligned}
\tag{2}
$$
Writing the objective function of formula (2) in vector form gives

$$
\begin{aligned}
\min \quad & \frac{1}{2}\boldsymbol{\alpha}^{T}\mathbf{Q}\boldsymbol{\alpha} - \mathbf{1}^{T}\boldsymbol{\alpha} \\
\text{s.t.} \quad & \mathbf{y}^{T}\boldsymbol{\alpha} = 0, \\
& C \ge \boldsymbol{\alpha} \ge 0,
\end{aligned}
\tag{3}
$$

where $\boldsymbol{\alpha} \in \mathbb{R}^n$ is the vector of Lagrangian multipliers, $\mathbf{y} \in \mathbf{Y}^n$ is the label vector, $\mathbf{1}$ is an $n \times 1$ vector of ones, and $Q_{ij} = y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$. If the solution of the optimization problem is $\alpha_i^{*}$, $i = 1, \ldots, n$, the discriminant function can be represented as

$$
f(\mathbf{x}) = \sum_{i=1}^{n}\alpha_i^{*} y_i K(\mathbf{x}_i, \mathbf{x}) + b. \tag{4}
$$
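As a small numerical illustration of formulas (2) to (4), the following sketch evaluates the dual objective and the discriminant function with NumPy. The function names `dual_objective` and `decision_function`, and the convention that training samples index the rows of the kernel matrix, are assumptions made here for illustration; they are not taken from the paper or from LIBSVM.

```python
import numpy as np

def dual_objective(alpha, y, K):
    """Dual objective 0.5 * a^T Q a - 1^T a with Q_ij = y_i y_j K_ij."""
    Q = np.outer(y, y) * K
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

def decision_function(alpha, y, b, K_train_test):
    """Discriminant f(x) = sum_i alpha_i y_i K(x_i, x) + b.

    K_train_test has shape (n_train, n_test); one value is returned
    per test sample (column).
    """
    return (alpha * y) @ K_train_test + b
```

The sign of each returned value gives the predicted class in the binary case.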
The kernel functions commonly used in SVM are the linear kernel, polynomial kernel, radial basis function kernel, and sigmoid kernel, respectively denoted as

$$
\begin{aligned}
K_{\mathrm{LIN}}(\mathbf{x}_i, \mathbf{x}_j) &= \langle \mathbf{x}_i, \mathbf{x}_j\rangle, \\
K_{\mathrm{POL}}(\mathbf{x}_i, \mathbf{x}_j) &= \left(\gamma \langle \mathbf{x}_i, \mathbf{x}_j\rangle + 1\right)^{q}, \\
K_{\mathrm{RBF}}(\mathbf{x}_i, \mathbf{x}_j) &= \exp\left(-\gamma \left\|\mathbf{x}_i - \mathbf{x}_j\right\|_2^2\right), \\
K_{\mathrm{SIG}}(\mathbf{x}_i, \mathbf{x}_j) &= \tanh\left(\gamma \langle \mathbf{x}_i, \mathbf{x}_j\rangle + 1\right).
\end{aligned}
\tag{5}
$$
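The four kernels of formula (5) can be written compactly for whole sample matrices. The sketch below assumes that samples are stored as rows; the function names are hypothetical and chosen only to mirror the subscripts LIN, POL, RBF, and SIG.

```python
import numpy as np

def k_lin(X1, X2):
    """Linear kernel <x_i, x_j> for all pairs of rows."""
    return X1 @ X2.T

def k_pol(X1, X2, gamma, q=3):
    """Polynomial kernel (gamma * <x_i, x_j> + 1)^q."""
    return (gamma * (X1 @ X2.T) + 1.0) ** q

def k_rbf(X1, X2, gamma):
    """RBF kernel exp(-gamma * ||x_i - x_j||_2^2)."""
    sq = (X1 ** 2).sum(axis=1)[:, None] + (X2 ** 2).sum(axis=1)[None, :] \
        - 2.0 * (X1 @ X2.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq, 0.0))

def k_sig(X1, X2, gamma):
    """Sigmoid kernel tanh(gamma * <x_i, x_j> + 1)."""
    return np.tanh(gamma * (X1 @ X2.T) + 1.0)
```

With `X` holding the training samples, `k_rbf(X, X, gamma)` returns the training kernel matrix, and `k_rbf(X, X_test, gamma)` returns the train-by-test matrix needed to evaluate the discriminant function.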
To obtain high recognition accuracy in monokernel SVM, we need to discern what kind of kernel distribution characteristics the test data will obey. Nevertheless, it is impractical and wasteful of resources to try different distribution characteristics one by one. In this sense, we need MKL to allocate the kernel weights based on the data structure automatically.

3. Multiple Kernel Learning (MKL) Algorithms

To improve the universal applicability of the SVM algorithm, MKL is applied instead of one specific kernel function:

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = f_{\mu}\left(\left\{K_m(\mathbf{x}_i, \mathbf{x}_j)\right\}_{m=1}^{M} \mid \boldsymbol{\mu}\right), \tag{6}
$$
where $K_m$ is the monokernel function. The multiple kernel $K_{\mu}$ can be obtained by the function $f_{\mu}: \mathbb{R}^D \to \mathbb{R}^P$ combining $M$ different $K_m$, and $\boldsymbol{\mu}$ is the proportion parameter of the kernels. There are many different methods to assign kernel weights.

Pavlidis et al. [16] propose a simple combination mode using an unweighted sum or product of heterogeneous kernels. The combining function of this Unweighted Multiple Kernel Learning (UMKL) method is

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M} K_m(\mathbf{x}_i, \mathbf{x}_j)
\quad \text{or} \quad
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \prod_{m=1}^{M} K_m(\mathbf{x}_i, \mathbf{x}_j). \tag{7}
$$
In follow-up studies, the allocation of $\boldsymbol{\mu}$ in MKL becomes a vital limiting factor of applicability. Chapelle and Rakotomamonjy [17] report that the optimization problem can be solved by a projected gradient method in two alternating steps: first, solving a primal SVM with the given $\boldsymbol{\mu}$; second, updating $\boldsymbol{\mu}$ through the gradient function with $\boldsymbol{\alpha}$ calculated in the first step. The kernel combining function, objective function, and gradient function of this Alternative Multiple Kernel Learning (AMKL) method are

$$
\begin{aligned}
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) &= \sum_{m=1}^{M}\mu_m K_m(\mathbf{x}_i, \mathbf{x}_j), \\
J(\boldsymbol{\mu}) &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \left(\sum_{m=1}^{M}\mu_m K_m(\mathbf{x}_i, \mathbf{x}_j)\right) - \sum_{i=1}^{n}\alpha_i, \\
\frac{\partial J(\boldsymbol{\mu})}{\partial \mu_m} &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \frac{\partial K_{\mu}(\mathbf{x}_i, \mathbf{x}_j)}{\partial \mu_m}
= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_m(\mathbf{x}_i, \mathbf{x}_j), \quad \forall m.
\end{aligned}
\tag{8}
$$
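The three expressions of formula (8) can be checked numerically. The sketch below treats $\boldsymbol{\alpha}$ as fixed (the second of the two alternating steps) and assumes the base kernel matrices are stacked in an array of shape (M, n, n); the names `combine`, `amkl_objective`, and `amkl_gradient` are illustrative and not part of any library. It does not include the SVM solver of the first step or the projection of $\boldsymbol{\mu}$.

```python
import numpy as np

def combine(mu, kernels):
    """Weighted sum K_mu = sum_m mu_m K_m for kernels of shape (M, n, n)."""
    return np.tensordot(mu, kernels, axes=1)

def amkl_objective(mu, kernels, alpha, y):
    """J(mu) for fixed alpha: 0.5 * (alpha*y)^T K_mu (alpha*y) - sum(alpha)."""
    ay = alpha * y
    return 0.5 * ay @ combine(mu, kernels) @ ay - alpha.sum()

def amkl_gradient(kernels, alpha, y):
    """dJ/dmu_m = 0.5 * (alpha*y)^T K_m (alpha*y) for every m."""
    ay = alpha * y
    return np.array([0.5 * ay @ K @ ay for K in kernels])
```

Because $J$ is linear in $\boldsymbol{\mu}$ once $\boldsymbol{\alpha}$ is fixed, the gradient does not depend on the current weights, which is why `amkl_gradient` takes no `mu` argument.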
The Generalized Multiple Kernel Learning (GMKL) method [18] also employs the gradient tool to approach the solution, but it regards the kernel weights as a regularization item $r(\boldsymbol{\mu})$, which is taken as $\frac{1}{2}\left(\boldsymbol{\mu} - \frac{1}{M}\right)^{T}\left(\boldsymbol{\mu} - \frac{1}{M}\right)$. So the objective function and gradient function can be transformed into

$$
\begin{aligned}
J(\boldsymbol{\mu}) &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) - \sum_{i=1}^{n}\alpha_i - r(\boldsymbol{\mu}), \\
\frac{\partial J(\boldsymbol{\mu})}{\partial \mu_m} &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \frac{\partial K_{\mu}(\mathbf{x}_i, \mathbf{x}_j)}{\partial \mu_m} - \frac{\partial r(\boldsymbol{\mu})}{\partial \mu_m}, \quad \forall m.
\end{aligned}
\tag{9}
$$

And the kernel combining function is

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \prod_{m=1}^{M}\exp\left(-\mu_m\left(x_i^{m} - x_j^{m}\right)^2\right)
= \exp\left(\sum_{m=1}^{M} -\mu_m\left(x_i^{m} - x_j^{m}\right)^2\right). \tag{10}
$$
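The equality in formula (10) means the GMKL kernel can be computed as a single exponential of a weighted squared distance. The sketch below assumes that $x^{m}$ denotes the $m$th feature of a sample and that samples are stored as rows; `gmkl_kernel` and `gmkl_regularizer` are hypothetical helper names.

```python
import numpy as np

def gmkl_kernel(X1, X2, mu):
    """K_mu(x_i, x_j) = exp(-sum_m mu_m * (x_i^m - x_j^m)^2)."""
    diff2 = (X1[:, None, :] - X2[None, :, :]) ** 2   # shape (n1, n2, M)
    return np.exp(-(diff2 @ mu))

def gmkl_regularizer(mu):
    """r(mu) = 0.5 * (mu - 1/M)^T (mu - 1/M) and its gradient mu - 1/M."""
    d = mu - 1.0 / mu.size
    return 0.5 * d @ d, d
```

Setting every weight to the same value recovers an ordinary RBF kernel with that value as $\gamma$.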
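There is another two-step alternating method using a gating model, called the Localized Multiple Kernel Learning (LMKL) method [19]. The formula of the locally combined kernel is represented as

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M}\mu_m(\mathbf{x}_i)\left\langle \Phi_m(\mathbf{x}_i), \Phi_m(\mathbf{x}_j)\right\rangle \mu_m(\mathbf{x}_j), \tag{11}
$$

where $\Phi_m(\mathbf{x})$ is the mapping of $\mathbf{x}$ into the feature space of the $m$th kernel. To ensure nonnegativity, kernels can be composed in competitive or cooperative mode by using the softmax form and the sigmoid form [25], respectively:

$$
\begin{aligned}
\text{softmax:} \quad \mu_m(\mathbf{x}) &= \frac{\exp\left(\langle \mathbf{v}_m, \mathbf{x}\rangle + v_{m0}\right)}{\sum_{h=1}^{M}\exp\left(\langle \mathbf{v}_h, \mathbf{x}\rangle + v_{h0}\right)}, \quad \forall m, \\
\text{sigmoid:} \quad \mu_m(\mathbf{x}) &= \frac{1}{1 + \exp\left(-\langle \mathbf{v}_m, \mathbf{x}\rangle - v_{m0}\right)}, \quad \forall m,
\end{aligned}
\tag{12}
$$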
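where $\mathbf{V} = \{\mathbf{v}_m, v_{m0}\}_{m=1}^{M}$ denotes the parameters of the gating model. On the other hand, Qiu and Lane [20] quantify the fitness between kernel and accuracy in a Heuristic Multiple Kernel Learning (HMKL) way by exploiting the relationship between the kernel matrix $\mathbf{K}$ and the sample label $\mathbf{y}$. The relationship can be expressed by kernel alignment:

$$
F\left(\mathbf{K}, \mathbf{y}\mathbf{y}^{T}\right)
= \frac{\left\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{T}\right\rangle_F}{\sqrt{\langle \mathbf{K}, \mathbf{K}\rangle_F \left\langle \mathbf{y}\mathbf{y}^{T}, \mathbf{y}\mathbf{y}^{T}\right\rangle_F}}
= \frac{\left\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{T}\right\rangle_F}{n\sqrt{\langle \mathbf{K}, \mathbf{K}\rangle_F}}, \tag{13}
$$

where $\left\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{T}\right\rangle_F = \sum_{i=1}^{n}\sum_{j=1}^{n} K(\mathbf{x}_i, \mathbf{x}_j)\, y_i y_j$ and $\langle \cdot, \cdot\rangle_F$ is the Frobenius inner product. The kernel alignment is then used to weigh the proportion of the multikernels:

$$
\mu_m = \frac{F\left(K_m, \mathbf{y}\mathbf{y}^{T}\right)}{\sum_{h=1}^{M} F\left(K_h, \mathbf{y}\mathbf{y}^{T}\right)}, \quad \forall m. \tag{14}
$$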
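Formulas (13) and (14) translate almost directly into code. The sketch below assumes binary labels in {-1, +1}, which is what makes the denominator simplify to $n\sqrt{\langle \mathbf{K}, \mathbf{K}\rangle_F}$; the function names are illustrative.

```python
import numpy as np

def alignment(K, y):
    """Kernel alignment F(K, yy^T) for labels y in {-1, +1}."""
    yyT = np.outer(y, y)
    return np.sum(K * yyT) / (len(y) * np.sqrt(np.sum(K * K)))

def hmkl_weights(kernels, y):
    """Heuristic weights: each alignment divided by the sum of alignments."""
    a = np.array([alignment(K, y) for K in kernels])
    return a / a.sum()
```

A kernel equal to the ideal target $\mathbf{y}\mathbf{y}^{T}$ has alignment 1, so kernels that agree more closely with the labels receive larger weights.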
Then the concentration bound is added in kernel alignment by Cortes et al. [21] to form the centering kernel:

$$
\left[K^{c}\right]_{ij} = K_{ij} - \frac{1}{n}\sum_{i=1}^{n} K_{ij} - \frac{1}{n}\sum_{j=1}^{n} K_{ij} + \frac{1}{n^{2}}\sum_{i,j=1}^{n} K_{ij}. \tag{15}
$$

Accordingly, the multikernel weights of this Centering Multiple Kernel Learning (CMKL) method are

$$
\boldsymbol{\mu} = \frac{\mathbf{C}^{-1}\mathbf{a}}{\left\|\mathbf{C}^{-1}\mathbf{a}\right\|_2}, \tag{16}
$$

where $\mathbf{C} = \left[\left\langle K_m^{c}, K_h^{c}\right\rangle_F\right]_{m,h=1}^{M}$ and $\mathbf{a} = \left[\left\langle K_m^{c}, \mathbf{y}\mathbf{y}^{T}\right\rangle_F\right]_{m=1}^{M}$.
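A minimal sketch of formulas (15) and (16) follows. It assumes the base kernel matrices are linearly independent after centering, so that $\mathbf{C}$ is invertible, and it uses a linear solve instead of forming $\mathbf{C}^{-1}$ explicitly; `center` and `cmkl_weights` are hypothetical names.

```python
import numpy as np

def center(K):
    """Centered kernel: subtract column and row means, add the grand mean."""
    return (K - K.mean(axis=0, keepdims=True)
              - K.mean(axis=1, keepdims=True) + K.mean())

def cmkl_weights(kernels, y):
    """mu = C^{-1} a / ||C^{-1} a||_2 with centered-kernel inner products."""
    Kc = [center(K) for K in kernels]
    yyT = np.outer(y, y)
    C = np.array([[np.sum(A * B) for B in Kc] for A in Kc])
    a = np.array([np.sum(A * yyT) for A in Kc])
    v = np.linalg.solve(C, a)
    return v / np.linalg.norm(v)
```

Centering is equivalent to multiplying the kernel matrix on both sides by $\mathbf{I} - \frac{1}{n}\mathbf{1}\mathbf{1}^{T}$, so every row and column of the centered matrix sums to zero.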
Later, Cortes et al. [22] studied a Polynomial Multiple Kernel Learning (PMKL) method, which utilizes the polynomial combination of the base kernels with higher degree ($d \ge 1$) based on the Kernel Ridge Regression (KRR) theory:

$$
\begin{aligned}
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) &= \sum \mu_{k_1 k_2 \cdots k_M} K_1\left(\mathbf{x}_i^{1}, \mathbf{x}_j^{1}\right)^{k_1} K_2\left(\mathbf{x}_i^{2}, \mathbf{x}_j^{2}\right)^{k_2} \cdots K_M\left(\mathbf{x}_i^{M}, \mathbf{x}_j^{M}\right)^{k_M}, \\
& k_m \ge 0, \quad k_1 + k_2 + \cdots + k_M \le d, \quad \mu_{k_1 k_2 \cdots k_M} \ge 0.
\end{aligned}
\tag{17}
$$

However, the computational complexity of the coefficients $\mu_{k_1 k_2 \cdots k_M}$ is $O(M^{d})$, which is too large to apply in practice. So $\mu_{k_1 k_2 \cdots k_M}$ can be simplified as a product of nonnegative coefficients $\mu_1^{k_1}\mu_2^{k_2}\cdots\mu_M^{k_M}$, and the special case ($d = 2$) can be expressed as

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M}\sum_{h=1}^{M}\mu_m \mu_h K_m\left(\mathbf{x}_i^{m}, \mathbf{x}_j^{m}\right) K_h\left(\mathbf{x}_i^{h}, \mathbf{x}_j^{h}\right). \tag{18}
$$
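One consequence of the product-form simplification is worth spelling out: for $d = 2$, the double sum in formula (18) equals the elementwise square of the weighted sum of base kernels, so it costs $O(M)$ matrix operations rather than $O(M^2)$. The sketch below compares both forms; the function names are illustrative and the kernels are assumed to be stacked in an array of shape (M, n, n).

```python
import numpy as np

def pmkl_degree2(mu, kernels):
    """Direct double sum over all pairs (m, h) of base kernels."""
    M = len(mu)
    K = np.zeros(kernels[0].shape, dtype=float)
    for m in range(M):
        for h in range(M):
            K += mu[m] * mu[h] * kernels[m] * kernels[h]
    return K

def pmkl_degree2_fast(mu, kernels):
    """Same kernel as the elementwise square of sum_m mu_m K_m."""
    S = np.tensordot(mu, kernels, axes=1)
    return S * S
```

Both functions use elementwise (Hadamard) products, not matrix products, because formula (18) multiplies kernel values entry by entry.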
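Here the related optimization of learning $K_{\mu}$ can be formulated as the following min-max form:

$$
\min_{\boldsymbol{\mu} \in \varphi}\ \max_{\boldsymbol{\alpha} \in \mathbb{R}^n}\ -\boldsymbol{\alpha}^{T}\left(K_{\mu} + \lambda \mathbf{I}\right)\boldsymbol{\alpha} + 2\boldsymbol{\alpha}^{T}\mathbf{y}, \tag{19}
$$

where $\varphi$ is a positive, bounded, and convex set. Two bounded sets, based on the $l_1$-norm and the $l_2$-norm, are the appropriate choices to construct $\varphi$:

$$
\begin{aligned}
\varphi_1 &= \left\{\boldsymbol{\mu} \mid \boldsymbol{\mu} \ge 0,\ \left\|\boldsymbol{\mu} - \boldsymbol{\mu}_0\right\|_1 \le \Lambda\right\}, \\
\varphi_2 &= \left\{\boldsymbol{\mu} \mid \boldsymbol{\mu} \ge 0,\ \left\|\boldsymbol{\mu} - \boldsymbol{\mu}_0\right\|_2 \le \Lambda\right\}.
\end{aligned}
\tag{20}
$$

Here $\boldsymbol{\mu}_0$ and $\Lambda$ are model parameters, and $\boldsymbol{\mu}_0$ is generally chosen so that $\boldsymbol{\mu}_0 = 0$ or $\boldsymbol{\mu}_0 / \left\|\boldsymbol{\mu}_0\right\| = 1$.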
Other than the approaches described above, inspired by the consistency between group Lasso and MKL [26], Xu et al. [23] and Kloft et al. [24] propose an MKL iterative method in a generalized $l_p$-norm ($p \ge 1$) form. They are collectively called the Arbitrary Norms Multiple Kernel Learning (ANMKL) method. On the basis of the duality condition $\left\|\mathbf{w}_m\right\|_2^2 = \mu_m^2\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_m\left(\mathbf{x}_i^{m}, \mathbf{x}_j^{m}\right)$, the update formula of the kernel weights is

$$
\mu_m = \frac{\left\|\mathbf{w}_m\right\|_2^{2/(p+1)}}{\left(\sum_{h=1}^{M}\left\|\mathbf{w}_h\right\|_2^{2p/(p+1)}\right)^{1/p}}, \quad \forall m. \tag{21}
$$
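The update of formula (21) can be sketched as a single function. It assumes positive semidefinite base kernels (so that each $\left\|\mathbf{w}_m\right\|_2^2$ is nonnegative) and a fixed $\boldsymbol{\alpha}$ from the current SVM solution; `lp_weight_update` is a hypothetical name, and the SVM step that alternates with it is not shown.

```python
import numpy as np

def lp_weight_update(mu, kernels, alpha, y, p):
    """One l_p-norm weight update from the current weights mu and alpha."""
    ay = alpha * y
    # ||w_m||_2^2 = mu_m^2 * (alpha*y)^T K_m (alpha*y)
    w_sq = np.array([m ** 2 * (ay @ K @ ay) for m, K in zip(mu, kernels)])
    w_norm = np.sqrt(np.maximum(w_sq, 0.0))
    num = w_norm ** (2.0 / (p + 1.0))
    den = np.sum(w_norm ** (2.0 * p / (p + 1.0))) ** (1.0 / p)
    return num / den
```

By construction, the returned weights have unit $l_p$-norm, which can serve as a quick sanity check on an implementation.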
It can be seen from the formulas in this section that the operation complexity of MKL is mainly decided by $\mathbf{x}_i$. So trying to simplify the feature space is an efficient way to improve the performance of MKL. Through the optimization of basis vectors, LRR can reduce dimension while retaining the data features, which is ideal for improving MKL.
4. MKL Using Low-Rank Representation

4.1. Low-Rank Representation (LRR). The theoretical advances on LRR enable us to use the latent low-rank structure in data [27, 28]. LRR simultaneously obtains the representation of all samples under a global low-rank constraint. Meanwhile, the LRR procedure can operate in a relatively short time with guaranteed performance.
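Let the input sample space $\mathbf{X}$ be represented by a linear combination in the dictionary $\mathbf{A}$:

$$
\mathbf{X} = \mathbf{A}\mathbf{Z}, \tag{22}
$$

where $\mathbf{Z} = [\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n]$ is the coefficient matrix and each $\mathbf{z}_i$ is the representation coefficient vector of $\mathbf{x}_i$. When the samples are sufficient, $\mathbf{X}$ itself serves as the dictionary $\mathbf{A}$. Considering the noise or tainted data in practical applications, LRR aims at approximating $\mathbf{X}$ by $\mathbf{A}\mathbf{Z} + \mathbf{E}$ by minimizing the rank of the coefficient matrix $\mathbf{Z}$ while reducing the $l_0$-norm of $\mathbf{E}$, in which $\mathbf{A}\mathbf{Z}$ is the low-rank component and $\mathbf{E}$ is the associated sparse error. It can be generally formulated as

$$
\min_{\mathbf{Z}, \mathbf{E}}\ \operatorname{rank}(\mathbf{Z}) + \lambda\left\|\mathbf{E}\right\|_0 \quad \text{s.t.} \quad \mathbf{X} = \mathbf{A}\mathbf{Z} + \mathbf{E}. \tag{23}
$$

Here $\lambda$ is used to balance the effect of the low-rank term and the error term. Since minimizing the $l_0$-norm is an NP-hard problem, it can be replaced by the $l_1$-norm or the $l_{2,1}$-norm. We choose the $l_{2,1}$-norm as the error term measurement here, which is defined as $\left\|\mathbf{E}\right\|_{2,1} = \sum_{j=1}^{n}\sqrt{\sum_{i=1}^{D}\left([\mathbf{E}]_{ij}\right)^2}$, that is, the sum of the Euclidean norms of the columns of $\mathbf{E}$. Meanwhile, $\operatorname{rank}(\mathbf{Z})$ can be relaxed into the nuclear norm $\left\|\cdot\right\|_{*}$ [29]. Consequently, the convex relaxation of formula (23) is

$$
\min_{\mathbf{Z}, \mathbf{E}}\ \left\|\mathbf{Z}\right\|_{*} + \lambda\left\|\mathbf{E}\right\|_{2,1} \quad \text{s.t.} \quad \mathbf{X} = \mathbf{A}\mathbf{Z} + \mathbf{E}. \tag{24}
$$

The optimal solution $\mathbf{Z}^{*}$ can be obtained via the Augmented Lagrange Multipliers (ALM) method [11].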
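To make the ALM step more concrete, the following is a minimal sketch of an inexact ALM iteration for formula (24), written with NumPy. It is not the authors' implementation: the auxiliary variable `J` (with constraint Z = J), the penalty schedule (`penalty`, `rho`, `penalty_max`), the tolerance, and all function names are assumptions chosen for illustration. Columns of `X` are samples, matching the definition of the $l_{2,1}$-norm above.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def l21_shrink(M, tau):
    """Column-wise shrinkage: proximal operator of tau * l_{2,1} norm."""
    norms = np.linalg.norm(M, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.maximum(norms, 1e-12)
    return M * scale

def lrr(X, A=None, lam=0.1, rho=1.1, penalty=1e-6, penalty_max=1e10,
        tol=1e-6, max_iter=1000):
    """Approximately solve min ||Z||_* + lam*||E||_{2,1} s.t. X = A Z + E."""
    A = X if A is None else A          # use the data itself as dictionary
    d, n = X.shape
    m = A.shape[1]
    Z = np.zeros((m, n)); J = np.zeros((m, n)); E = np.zeros((d, n))
    Y1 = np.zeros((d, n)); Y2 = np.zeros((m, n))   # Lagrange multipliers
    inv = np.linalg.inv(np.eye(m) + A.T @ A)       # reused in every Z update
    for _ in range(max_iter):
        J = svt(Z + Y2 / penalty, 1.0 / penalty)
        Z = inv @ (A.T @ (X - E) + J + (A.T @ Y1 - Y2) / penalty)
        E = l21_shrink(X - A @ Z + Y1 / penalty, lam / penalty)
        R1 = X - A @ Z - E             # residual of X = A Z + E
        R2 = Z - J                     # residual of Z = J
        Y1 += penalty * R1
        Y2 += penalty * R2
        penalty = min(rho * penalty, penalty_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E
```

Calling `Z, E = lrr(X, lam=0.5)` on a feature-by-sample matrix returns the coefficient matrix whose columns play the role of the new features $\mathbf{z}_i$. The value of `lam` here is arbitrary and would need tuning for real data.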
4.2. Efficient SVM and MKL Using LRR. The kernel matrix remarkably impacts the computational efficiency and accuracy of SVM and MKL. How to find an appropriate variant of the kernel matrix that contains both the initial label and the data geometry structure for recognition is a crucial task. Since LRR has been theoretically proved to be superior, in the sequel we adopt LRR to transform the kernel for augmenting the similarities among the intraclass samples and the differences among the interclass samples. Moreover, a representation of all samples under a global low-rank constraint can be attained, which is more conducive to capturing the global data structure [30]. So LR-SVM and LR-MKL are two alternative techniques that we propose to improve the performance of SVM and MKL.

Firstly, based on the LRR theory, we improve the monokernel SVM as the reference item, from which the improvement brought by LRR can be displayed visually. The specific procedure of the efficient LR-SVM is presented in Algorithm 1.
Algorithm 1 (efficient SVM using LRR (LR-SVM)).

Input. This includes the whole training set $\{\mathbf{X}, \mathbf{Y}\}$, the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$, the parameters $t$, $\gamma$, $C$, $q$ of SVM, and the parameter $\lambda$ of LRR.

Step 1. Normalize $\mathbf{X}$, $\mathbf{X}_S$.

Step 2. Perform the procedure of formula (24) on the normalized $\mathbf{X}$, $\mathbf{X}_S$ to project them onto the coefficient feature spaces $\mathbf{Z}$, $\mathbf{Z}_S$, respectively.

Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into SVM for training the classification model.

Step 4. Utilize the obtained classification model to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$; the discriminant function is $f(\mathbf{z}) = \sum_{i=1}^{n}\alpha_i^{*} y_i K(\mathbf{z}_i, \mathbf{z}) + b$.

Output. Compare the actual label vector of the test set $\mathbf{Y}_S$ and the prediction label vector $\mathbf{Y}_P$ to obtain the recognition results.
It is well known that SVM suffers from instability across various data structures. Thus, MKL recognition becomes the development trend. Next, we combine LRR with the MKL algorithms mentioned in Section 3 and change the binary-classification model into a multiclassification model by the pairwise (one-versus-one) strategy, through which a classifier between any two categories of samples can be designed, giving $k(k-1)/2$ classifiers in total ($k$ is the number of categories). Then we adopt the voting method and assign each sample to the category that obtains the most votes. All the combined algorithms can be summarized into a frame, which is given in Algorithm 2, and we refer to it collectively as LR-MKL.
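The pairwise voting described above can be sketched independently of the underlying binary classifier. In the sketch below, `binary_predict(a, b, Z_test)` is a hypothetical callable standing in for a trained binary SVM or MKL model for the class pair (a, b); it must return one label from {a, b} per test sample (rows of `Z_test`). Ties are resolved in favor of the class listed first, which is an arbitrary choice made here.

```python
import numpy as np
from itertools import combinations

def one_vs_one_predict(classes, binary_predict, Z_test):
    """Combine k(k-1)/2 pairwise classifiers by majority voting."""
    index = {c: i for i, c in enumerate(classes)}
    votes = np.zeros((Z_test.shape[0], len(classes)), dtype=int)
    for a, b in combinations(classes, 2):
        pred = np.asarray(binary_predict(a, b, Z_test))
        # Each pairwise classifier gives one vote per sample.
        votes[:, index[a]] += (pred == a)
        votes[:, index[b]] += (pred == b)
    # argmax returns the first class in case of a tie.
    return np.asarray(classes)[votes.argmax(axis=1)]
```

For $k$ categories the loop runs $k(k-1)/2$ times, for example 105 times for the 15 individuals of the Yale database.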
Algorithm 2 (efficient MKL using LRR (LR-MKL)).

Input. This includes the whole training set $\{\mathbf{X}, \mathbf{Y}\}$, the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$, and the parameter $\lambda$ of LRR.

Steps 1 and 2. They are the same as in the LR-SVM algorithm.

Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into MKL to train $k(k-1)/2$ classifiers with the pairwise strategy.

Step 4. Utilize each one of the binary MKL classifiers to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$.

Step 5. According to the prediction label vectors $\mathbf{Y}_{P1}, \mathbf{Y}_{P2}, \ldots, \mathbf{Y}_{P\,k(k-1)/2}$, vote for the category of each sample to get the multilabels $\mathbf{Y}_P$.

Output. Compare the actual label vector of the test set $\mathbf{Y}_S$ and the prediction label vector $\mathbf{Y}_P$ to obtain the recognition results and the kernel weight vector $\boldsymbol{\mu}$.

5. Experiments and Analysis
In this section, we conduct extensive experiments to examine the efficiency of the proposed LR-SVM and LR-MKL algorithms. The operating environment is MATLAB (R2013a) on an Intel Core i5 CPU at 2.53 GHz. The SVM toolbox used in this paper is LIBSVM [31], which can be easily applied and is shown to be fast on large scale databases.

The simulations are performed on diverse datasets to ensure a universal recognition effect. The test datasets range over the frequently used face databases and the standard test data of the UCI repository. In the simulations, all the samples are normalized first.
(1) Yale face database (http://vision.ucsd.edu/content/yale-face-database): it contains 165 grayscale images of 15 individuals with different facial expressions or configurations, and each image is resized to 64 × 64 pixels with 256 grey levels.

(2) ORL face database (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.htm): it contains 400 images of 40 distinct subjects taken at different times, with varying light, facial expressions, and details. We resize them to 64 × 64 pixels with 256 grey levels per pixel.

(3) LSVT Voice Rehabilitation dataset (http://archive.ics.uci.edu/ml/datasets/LSVT+Voice+Rehabilitation) [32]: it is composed of 126 speech signals from 14 people with 309 features, divided into two categories.

(4) Multiple Features Digit dataset (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): it includes 2000 digitized handwritten numerals 0–9 with 649 features.
5.1. Experiments on LR-SVM. In order to demonstrate the recognition performance of SVM improved by the presented LR-SVM, we carry out numerous experiments on the Yale and ORL face databases. According to the different rates of training samples (20%, 30%, 40%, 50%, 60%, 70%, and 80%), we implement seven groups of experiments on each database. To ensure a stable and reliable test, each group has ten different random divisions, and we average them as the final results. The kernel functions are $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, $K_{\mathrm{SIG}}$ ($q = 3$), and $\gamma = 1/g$ ($g$ is the dimension of the feature space).

The classification accuracy and run time on the Yale database by using SVM and LR-SVM are shown in Figures 1 and 2, respectively. Similarly, the classification accuracy and run time on the ORL database are shown in Figures 3 and 4. The solid lines depict the results of SVM with different kernels, while the patterned lines with the corresponding colour depict those of LR-SVM. As can be seen from Figures 1 and 3, the proposed LR-SVM method consistently achieves an obvious improvement in classification accuracy compared to the original SVM method. In most cases, the classification accuracy increases with the rise in training sample rate. This shows that the more complete the training set, the better the classification accuracy. But it is impossible for the training set to include so many samples in reality. LR-SVM has a high accuracy even under a low training sample rate, which is suitable for real applications. Meanwhile, Figures 2 and 4 show that, through the LRR conversion, the run time can be reduced by more than an order of magnitude, which is reasonable for the real-time requirements of data processing in the big data era.
Figure 1: Classification accuracy of Yale by using SVM and LR-SVM. (Plot of accuracy (%) against training sample rate (%), from 20 to 80; curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG.)

Figure 2: Run time of Yale by using SVM and LR-SVM. (Plot of run time (s) against training sample rate (%), from 20 to 80; curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG.)

5.2. Experiments on LR-MKL. In this section, we compare the performance of the MKL algorithms involved in Section 3 and their corresponding LR-MKL algorithms. The multikernel is composed of $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, $K_{\mathrm{SIG}}$ ($q = 3$). The proportion parameter vector of the kernels is $\boldsymbol{\mu} = [\mu_1, \mu_2, \mu_3, \mu_4]$. The comparative algorithms are listed below.
(i) Unweighted MKL (UMKL) [16] and LR-UMKL: (+) indicates the sum form and (∗) indicates the product form.

(ii) Alternative MKL (AMKL) [17] and LR-AMKL.

(iii) Generalized MKL (GMKL) [18] and LR-GMKL.

(iv) Localized MKL (LMKL) [19] and LR-LMKL: (sof) distributes $\boldsymbol{\mu}$ in softmax mode and (sig) distributes $\boldsymbol{\mu}$ in sigmoid mode.

(v) Heuristic MKL (HMKL) [20] and LR-HMKL.

(vi) Centering MKL (CMKL) [21] and LR-CMKL.

(vii) Polynomial MKL (PMKL) [22] and LR-PMKL: (1) adopts the bounded set $\varphi_1$ with $l_1$-norm and (2) adopts the bounded set $\varphi_2$ with $l_2$-norm.

(viii) Arbitrary Norm MKL (ANMKL) [23, 24] and LR-ANMKL: (1) iterates $\boldsymbol{\mu}$ with $l_1$-norm and (2) iterates $\boldsymbol{\mu}$ with $l_2$-norm.

(ix) Besides, the highest accuracy among the four monokernel SVMs is selected as the reference item, which is referred to as SVM(best).

Figure 3: Classification accuracy of ORL by using SVM and LR-SVM. (Plot of accuracy (%) against training sample rate (%), from 20 to 80; curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG.)

Figure 4: Run time of ORL by using SVM and LR-SVM. (Plot of run time (s) against training sample rate (%), from 20 to 80; curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG.)
Table 1: The performances of MKL algorithms and LR-MKL algorithms on the datasets Yale, ORL, LSVT, and Digit.

Algorithm           Yale                 ORL                  LSVT                 Digit
                    Acc       Time       Acc       Time       Acc       Time       Acc       Time
SVM(best)           69.3231   0.2981     80.0002   1.6798     76.1905   0.0142     95.9302   0.9195
LR-SVM(best)        79.8667   0.0109     87.0015   0.1575     77.6637   0.0043     96.3304   0.0972
UMKL(+) [16]        87.5114   2.5981     93.5798   18.2352    78.5714   0.0229     97.3571   4.7412
LR-UMKL(+)          95.8936   0.3018     83.4813   1.0817     73.8095   0.0152     98.1595   2.0278
UMKL(∗) [16]        58.3248   2.7244     77.5022   20.5507    66.6935   0.0281     96.0904   7.1661
LR-UMKL(∗)          93.5739   0.2636     93.4063   2.1836     67.0033   0.0176     98.4618   3.6392
AMKL [17]           85.7753   3.7236     93.8741   4.6854     80.9524   0.0452     97.4725   11.2138
LR-AMKL             94.3799   0.3869     96.9417   0.4592     88.0952   0.0085     98.6952   6.8804
GMKL [18]           86.2989   4.5330     96.2057   5.0070     85.7143   0.0565     99.1499   8.0774
LR-GMKL             95.8015   0.6253     98.5833   0.7761     88.9286   0.0183     99.4673   4.5972
LMKL(sof) [19]      87.9077   215.4055   97.0003   220.3122   85.0090   5.1989     99.8898   166.7978
LR-LMKL(sof)        97.9352   22.7204    98.9724   17.3791    86.7429   1.1933     98.3591   98.1812
LMKL(sig) [19]      88.0145   106.7552   97.0108   107.0911   88.7541   0.7238     99.3750   48.5914
LR-LMKL(sig)        98.0667   15.9711    97.9979   11.7240    92.6627   0.4594     99.5625   24.5927
HMKL [20]           63.6037   92.4410    93.5109   118.2340   80.5998   0.0915     97.6258   10.3559
LR-HMKL             91.9611   8.5972     98.6893   10.9368    85.1625   0.0352     98.3479   6.3959
CMKL [21]           86.4166   95.0618    96.0308   107.6940   79.9503   0.0874     96.5014   10.6074
LR-CMKL             94.0083   10.6380    98.4799   12.9746    93.9024   0.0347     98.9113   6.2696
PMKL(1) [22]        89.0035   6.1842     98.4901   6.9065     92.8571   0.1079     99.5881   24.8702
LR-PMKL(1)          99.1429   0.9053     99.5712   0.9828     95.9386   0.0573     100.0000  12.9831
PMKL(2) [22]        89.0261   5.3893     98.7533   6.5450     92.4662   0.1295     99.5046   21.1679
LR-PMKL(2)          98.8968   0.8494     99.5828   0.8911     95.5145   0.0651     99.7941   13.5108
ANMKL(1) [23, 24]   86.7210   6.4856     98.4396   20.4564    91.9827   0.1167     98.0007   10.3979
LR-ANMKL(1)         96.8667   0.7041     99.4643   2.8519     92.2247   0.0247     99.2850   6.9070
ANMKL(2) [23, 24]   86.6998   7.0664     98.2204   21.1615    93.0035   0.1194     98.0039   9.7753
LR-ANMKL(2)         97.2917   0.8863     99.2857   3.0597     92.5391   0.0224     99.2497   5.0374
We conduct experiments on the test datasets Yale, ORL, LSVT Voice Rehabilitation (LSVT for short), and Multiple Features Digit (Digit for short). 60% of the samples of each dataset are drawn out randomly to train the classification model, and the remaining samples serve as the test set. From the results of optimizing $\gamma$ and the penalty factor $C$ by the grid search method, we find that the classification accuracy does not vary much when $\gamma$ and $C$ range within a certain interval. So there is no need to search the whole parameter space, which would inevitably increase the computational cost. The penalty factor $C$ is therefore chosen from the trial values 0.01, 0.1, 1, 10, and 100, and $\gamma$ is fixed at $1/g$. We assign to $C$ the value that has the highest average accuracy on the 5 × 2 cross validation sets. Each algorithm is conducted with 10 independent runs, and we average them as the final results. The bold numbers represent the preferable recognition effect between the original algorithms and their LRR combined algorithms. The numbers in italic font denote the algorithms whose recognition precision is inferior to SVM(best). The recognition performance of the algorithms is measured by the classification accuracy and run time, as illustrated in Table 1.

In most cases, our proposed LR-MKL methods achieve superior results to the original MKL, with higher classification accuracy and shorter operation time. This indicates that LRR can augment the similarities among the intraclass samples and the differences among the interclass samples while simplifying the kernel matrix. Note that UMKL(∗) fails to achieve the ideal recognition effects in many cases and is even less accurate than SVM(best). However, combining it with LRR improves its effects to a large extent. This illustrates that simply combining kernels without regard to the data structure is infeasible and that LRR can offset part of the consequences of an irrational weight distribution. In general, PMKL, ANMKL, and their improved algorithms have the preferable recognition effects, especially the improved algorithms, whose accuracy is over 90 percent all the time. In terms of run time, it is clearly observed that the real-time performance of MKL is much worse than that of SVM, because MKL has a process of allocating kernel weights and the process can be very time consuming. Among them, LMKL is the worst and fails to satisfy the real-time requirement. Our combined LR-MKL can reduce the run time manyfold, in some cases by more than one order of magnitude, so it can speed up high-precision MKL to satisfy the real-time requirement. In brief, the proposed LR-MKL can boost the performance of MKL to a great extent.
6. Conclusion

The complexity of solving the convex quadratic optimization problem in MKL is $O(Nn^{3.5})$, so it is infeasible to apply MKL to large scale problems because of its large computational cost. Our effort has been made on decreasing the dimension of the training set. We have given a review of several existing MKL algorithms and noted that LRR can capture the global structure of data in relatively few dimensions. Based on this point, we have proposed a novel combined LR-MKL, which largely improves the performance of MKL. A large number of experiments have been carried out on four real world datasets to contrast the recognition effects of various kinds of MKL and LR-MKL algorithms. It has been shown that in most cases the recognition effects of MKL algorithms are better than SVM(best), except for UMKL(∗). Our proposed LR-MKL methods have achieved superior results to the original MKL in most cases. Among them, PMKL, ANMKL, and their improved algorithms have shown the preferable recognition effects.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 51208168), Hebei Province Natural Science Foundation (no. E2016202341), and Tianjin Natural Science Foundation (no. 13JCYBJC37700).
References
[1] V. N. Vapnik, "An overview of statistical learning theory," IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.
[2] M. Hu, Y. Chen, and J. T.-Y. Kwok, "Building sparse multiple-kernel SVM classifiers," IEEE Transactions on Neural Networks, vol. 20, no. 5, pp. 827–839, 2009.
[3] X. Wang, X. Liu, N. Japkowicz, and S. Matwin, "Ensemble of multiple kernel SVM classifiers," Advances in Artificial Intelligence, vol. 8436, pp. 239–250, 2014.
[4] E. Hassan, S. Chaudhury, N. Yadav, P. Kalra, and M. Gopal, "Off-line handwritten input based identity determination using multi kernel feature combination," Pattern Recognition Letters, vol. 35, no. 1, pp. 113–119, 2014.
[5] F. Cai and V. Cherkassky, "Generalized SMO algorithm for SVM-based multitask learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 997–1003, 2012.
[6] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.
[7] N. Cesa-Bianchi, A. Conconi, and C. Gentile, "On the generalization ability of on-line learning algorithms," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 50, no. 9, pp. 2050–2057, 2004.
[8] S. Fine and K. Scheinberg, "Efficient SVM training using low-rank kernel representations," Journal of Machine Learning Research, vol. 2, no. 2, pp. 243–264, 2002.
[9] S. Zhou, "Sparse LSSVM in primal using cholesky factorization for large-scale problems," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 783–795, 2016.
[10] L. Jia, S.-Z. Liao, and L.-Z. Ding, "Learning with uncertain kernel matrix set," Journal of Computer Science and Technology, vol. 25, no. 4, pp. 709–727, 2010.
[11] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, "Robust recovery of subspace structures by low-rank representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
[12] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, "RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233–2246, 2012.
[13] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, "Multi-task low-rank affinity pursuit for image segmentation," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 2439–2446, IEEE, Barcelona, Spain, November 2011.
[14] Y. Mu, J. Dong, X. Yuan, and S. Yan, "Accelerated low-rank visual recovery by random projection," in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, pp. 2609–2616, Colorado Springs, Colo, USA, June 2011.
[15] J. Chen, J. Zhou, and J. Ye, "Integrating low-rank and group-sparse structures for robust multi-task learning," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'11, pp. 42–50, San Diego, Calif, USA, August 2011.
[16] P. Pavlidis, J. Cai, J. Weston, and W. N. Grundy, "Gene functional classification from heterogeneous data," in Proceedings of the 5th Annual International Conference on Computational Biology, pp. 249–255, Montreal, Canada, May 2001.
[17] O. Chapelle and A. Rakotomamonjy, "Second order optimization of kernel parameters," NIPS Workshop on Kernel Learning, 2008.
[18] M. Varma and B. R. Babu, "More generality in efficient multiple kernel learning," in Proceedings of the 26th International Conference on Machine Learning, ICML 2009, pp. 1065–1072, Montreal, Canada, June 2009.
[19] M. Gonen and E. Alpaydin, "Localized multiple kernel learning," in Proceedings of the 25th International Conference, pp. 352–359, Helsinki, Finland, July 2008.
[20] S. Qiu and T. Lane, "A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 190–199, 2009.
[21] C. Cortes, M. Mohri, and A. Rostamizadeh, "Two-stage learning kernel algorithms," in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 239–246, Haifa, Israel, June 2010.
[22] C. Cortes, M. Mohri, and A. Rostamizadeh, "Learning non-linear combinations of kernels," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 396–404, December 2009.
[23] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu, "Simple and efficient multiple kernel learning by group lasso," in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 1175–1182, Haifa, Israel, June 2010.
[24] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, "Non-sparse regularization and efficient training with multiple kernels," arXiv preprint, https://arxiv.org/abs/1003.0079.
[25] M. Gonen and E. Alpaydın, "Multiple kernel learning algorithms," Journal of Machine Learning Research, vol. 12, pp. 2211–2268, 2011.
[26] F. R. Bach, "Consistency of the group lasso and multiple kernel learning," Journal of Machine Learning Research, vol. 9, no. 2, pp. 1179–1225, 2008.
[27] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis," Journal of the ACM, vol. 58, no. 3, 2011.
[28] C.-F. Chen, C.-P. Wei, and Y.-C. F. Wang, "Low-rank matrix recovery with structural incoherence for robust face recognition," in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2618–2625, Providence, RI, USA, June 2012.
[29] J.-F. Cai, E. J. Candes, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
[30] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, "Non-negative low rank and sparse graph for semi-supervised learning," in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2328–2335, USA, June 2012.
[31] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[32] A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig, "Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 181–190, 2014.
data mining [15]. LRR, as a compressed sensing approach, aims to find the lowest-rank linear combination of all training samples for reconstructing test samples under a global low-rank constraint. When the training samples are sufficiently complete, the process of representing data with low rank will augment the similarities among the intraclass samples and the differences among the interclass samples. Meanwhile, if the data is corrupted, the rank of the coefficient matrix will be largely increased, so the lowest-rank criterion can enforce noise correction. LRR integrates data clustering and noise correction into a unified framework, which can greatly improve the recognition accuracy and robustness in the preprocessing stage. In this sense, the recognition of SVM or MKL can become more accurate and faster when they are combined with LRR.
In this paper, combining LRR and MKL, we develop a novel recognition approach and construct a Low-Rank MKL (LR-MKL) algorithm. In the proposed algorithm, the combined Low-Rank SVM (LR-SVM) is also utilized as the reference. We conduct extensive experiments on public databases to show that our proposed LR-MKL algorithm can achieve better performance than the original SVM and MKL.
The remainder of the paper is organized as follows. We start with a brief review of SVM in the next section. In Section 3, we describe some existing MKL algorithms and their structural frameworks. Section 4 introduces the efficient MKL algorithms using LRR, which we call LR-MKL. Experiments that demonstrate the utility of the suggested algorithm on real data are presented in Section 5. Section 6 gives the conclusions.
2 Overview of SVM
Given the input space $\mathbf{X} \subseteq \mathbb{R}^{D}$ and the label vector $\mathbf{Y}$, where $\{\mathbf{X}, \mathbf{Y}\}$ satisfies the independent and identically distributed condition, the training set can be denoted as $\{\mathbf{x}_i, y_i\}_{i=1}^{n}$ (containing $n$ samples). According to the theory of structural risk minimization [1], SVM can find the classification hyperplane with the maximum margin in the mapping space $\mathbb{R}^{P}$. Hence, SVM training with the $l_1$-norm soft margin is a quadratic optimization problem:

$$
\begin{aligned}
\min\quad & \frac{1}{2}\langle \mathbf{w}, \mathbf{w}\rangle + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\left(\langle \mathbf{w}, \mathbf{x}_i\rangle + b\right) \ge 1 - \xi_i, \\
& \mathbf{w} \in \mathbb{R}^{P},\quad \boldsymbol{\xi} \in \mathbb{R}^{n}_{+},\quad i = 1, \ldots, n.
\end{aligned}
\tag{1}
$$
Here $\mathbf{w}$ is the weight coefficient vector, $C$ is the penalty factor, $\xi_i$ is the slack variable, and $b$ is the bias term of the classification hyperplane. The optimization problem can be transformed into its dual form by introducing the Lagrangian multipliers $\alpha_i, \alpha_j$, and the data $\mathbf{X}$ can be implicitly mapped to the feature space by utilizing the kernel function $K$, so formula (1) changes into

$$
\begin{aligned}
\min\quad & \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) - \sum_{i=1}^{n}\alpha_i \\
\text{s.t.}\quad & \sum_{i=1}^{n}\alpha_i y_i = 0, \\
& C \ge \alpha_i \ge 0,\quad i = 1, \ldots, n.
\end{aligned}
\tag{2}
$$
The objective function of formula (2) can be simplified into vector form:

$$
\begin{aligned}
\min\quad & \frac{1}{2}\boldsymbol{\alpha}^{T}\mathbf{Q}\boldsymbol{\alpha} - \mathbf{1}^{T}\boldsymbol{\alpha} \\
\text{s.t.}\quad & \mathbf{y}^{T}\boldsymbol{\alpha} = 0, \\
& C \ge \boldsymbol{\alpha} \ge 0,
\end{aligned}
\tag{3}
$$

where $\boldsymbol{\alpha} \in \mathbb{R}^{n}$ is the vector of Lagrangian multipliers, $\mathbf{y} \in \mathbf{Y}^{n}$ is the label vector, $\mathbf{1}$ is an $n \times 1$ vector of ones, and $Q_{ij} = y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$. If the solution of the optimization problem is $\alpha_i^{*}$, $i = 1, \ldots, n$, the discriminant function can be represented as

$$
f(\mathbf{x}) = \sum_{i=1}^{n}\alpha_i^{*} y_i K(\mathbf{x}_i, \mathbf{x}) + b. \tag{4}
$$
The kernel functions commonly used in SVM are the linear kernel, polynomial kernel, radial basis function kernel, and sigmoid kernel, respectively denoted as

$$
\begin{aligned}
K_{\mathrm{LIN}}(\mathbf{x}_i, \mathbf{x}_j) &= \langle \mathbf{x}_i, \mathbf{x}_j\rangle, \\
K_{\mathrm{POL}}(\mathbf{x}_i, \mathbf{x}_j) &= \left(\gamma\langle \mathbf{x}_i, \mathbf{x}_j\rangle + 1\right)^{q}, \\
K_{\mathrm{RBF}}(\mathbf{x}_i, \mathbf{x}_j) &= \exp\left(-\gamma\left\|\mathbf{x}_i - \mathbf{x}_j\right\|_2^2\right), \\
K_{\mathrm{SIG}}(\mathbf{x}_i, \mathbf{x}_j) &= \tanh\left(\gamma\langle \mathbf{x}_i, \mathbf{x}_j\rangle + 1\right).
\end{aligned}
\tag{5}
$$
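The four kernels in formula (5) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names (`dot`, `k_lin`, `k_pol`, `k_rbf`, `k_sig`, `gram`) are hypothetical and are not taken from the paper or from LIBSVM.

```python
import math

def dot(u, v):
    """Inner product <u, v> of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def k_lin(u, v):
    """Linear kernel of formula (5)."""
    return dot(u, v)

def k_pol(u, v, gamma, q=3):
    """Polynomial kernel (gamma * <u, v> + 1)^q."""
    return (gamma * dot(u, v) + 1.0) ** q

def k_rbf(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||_2^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def k_sig(u, v, gamma):
    """Sigmoid kernel tanh(gamma * <u, v> + 1)."""
    return math.tanh(gamma * dot(u, v) + 1.0)

def gram(kernel, X, **params):
    """Kernel (Gram) matrix of a list of samples X under the given kernel."""
    return [[kernel(xi, xj, **params) for xj in X] for xi in X]
```

For example, `gram(k_rbf, X, gamma=0.5)` builds the RBF kernel matrix of a sample list `X`; its diagonal entries are 1 because each sample has zero distance to itself.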
To obtain high recognition accuracy with a monokernel SVM, we need to discern which kernel suits the distribution characteristics of the test data. Nevertheless, it is impractical and wasteful of resources to try the different kernels one by one. In this sense, we need MKL to allocate the kernel weights automatically based on the data structure.
3 Multiple Kernel Learning (MKL) Algorithms
To improve the universal applicability of the SVM algorithm, MKL is applied instead of one specific kernel function:

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = f_{\mu}\left(\left\{K_m(\mathbf{x}_i, \mathbf{x}_j)\right\}_{m=1}^{M} \mid \boldsymbol{\mu}\right), \tag{6}
$$

where $K_m$ is the monokernel function. The multiple kernel $K_{\mu}$ can be obtained by the function $f_{\mu}\colon \mathbb{R}^{D} \to \mathbb{R}^{P}$ combining $M$ different $K_m$, and $\boldsymbol{\mu}$ is the proportion parameter of the kernels. There are many different methods to assign kernel weights.
Pavlidis et al. [16] propose a simple combination mode using an unweighted sum or product of heterogeneous kernels. The combining function of this Unweighted Multiple Kernel Learning (UMKL) method is

$$
\begin{aligned}
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) &= \sum_{m=1}^{M} K_m(\mathbf{x}_i, \mathbf{x}_j), \\
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) &= \prod_{m=1}^{M} K_m(\mathbf{x}_i, \mathbf{x}_j).
\end{aligned}
\tag{7}
$$
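As a rough illustration of formula (7), the sum and product forms can be applied entrywise to precomputed kernel matrices. The function names `umkl_sum` and `umkl_product` are hypothetical, and the kernel matrices are assumed to be given as equally sized nested lists.

```python
def umkl_sum(grams):
    """Entrywise sum of M kernel matrices (sum form of formula (7))."""
    n = len(grams[0])
    return [[sum(K[i][j] for K in grams) for j in range(n)] for i in range(n)]

def umkl_product(grams):
    """Entrywise product of M kernel matrices (product form of formula (7))."""
    n = len(grams[0])
    out = [[1.0] * n for _ in range(n)]
    for K in grams:
        for i in range(n):
            for j in range(n):
                out[i][j] *= K[i][j]
    return out
```

Neither form has a weight to learn, which is why this combination is cheap but cannot adapt to the data.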
In follow-up studies, the distribution of $\boldsymbol{\mu}$ in MKL became a vital limiting factor of applicability. Chapelle and Rakotomamonjy [17] report that the optimization problem can be solved by a projected gradient method in two alternating steps: first, solving a primal SVM with the given $\boldsymbol{\mu}$; second, updating $\boldsymbol{\mu}$ through the gradient function with $\boldsymbol{\alpha}$ calculated in the first step. The kernel combining function, objective function, and gradient function of this Alternative Multiple Kernel Learning (AMKL) method are

$$
\begin{aligned}
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) &= \sum_{m=1}^{M}\mu_m K_m(\mathbf{x}_i, \mathbf{x}_j), \\
J(\boldsymbol{\mu}) &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j\left(\sum_{m=1}^{M}\mu_m K_m(\mathbf{x}_i, \mathbf{x}_j)\right) - \sum_{i=1}^{n}\alpha_i, \\
\frac{\partial J(\boldsymbol{\mu})}{\partial \mu_m} &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \frac{\partial K_{\mu}(\mathbf{x}_i, \mathbf{x}_j)}{\partial \mu_m}
= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_m(\mathbf{x}_i, \mathbf{x}_j), \quad \forall m.
\end{aligned}
\tag{8}
$$
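The objective and gradient of formula (8) can be checked numerically with a small sketch. It covers only the second of the two alternating steps, assuming that $\boldsymbol{\alpha}$ has already been obtained from an SVM solver; the names `combine`, `objective`, and `gradient` are hypothetical, and the projection and step-size rule of [17] are not reproduced here.

```python
def combine(mu, grams):
    """Weighted sum of kernel matrices: K_mu = sum_m mu_m * K_m."""
    n = len(grams[0])
    return [[sum(m * K[i][j] for m, K in zip(mu, grams)) for j in range(n)]
            for i in range(n)]

def objective(mu, grams, alpha, y):
    """J(mu) of formula (8) for fixed dual variables alpha."""
    K = combine(mu, grams)
    n = len(y)
    quad = sum(y[i] * y[j] * alpha[i] * alpha[j] * K[i][j]
               for i in range(n) for j in range(n))
    return 0.5 * quad - sum(alpha)

def gradient(grams, alpha, y):
    """dJ/dmu_m of formula (8): one entry per base kernel."""
    n = len(y)
    return [0.5 * sum(y[i] * y[j] * alpha[i] * alpha[j] * K[i][j]
                      for i in range(n) for j in range(n))
            for K in grams]
```

Because $J$ is linear in $\boldsymbol{\mu}$ for fixed $\boldsymbol{\alpha}$, the gradient does not depend on the current weights, which is why `gradient` takes no `mu` argument.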
The Generalized Multiple Kernel Learning (GMKL) method [18] also employs the gradient tool to approach the solution, but it regards the kernel weights as a regularization term $r(\boldsymbol{\mu})$, which is taken as $\frac{1}{2}(\boldsymbol{\mu} - 1/M)^{T}(\boldsymbol{\mu} - 1/M)$. So the objective function and gradient function can be transformed into

$$
\begin{aligned}
J(\boldsymbol{\mu}) &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) - \sum_{i=1}^{n}\alpha_i - r(\boldsymbol{\mu}), \\
\frac{\partial J(\boldsymbol{\mu})}{\partial \mu_m} &= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \frac{\partial K_{\mu}(\mathbf{x}_i, \mathbf{x}_j)}{\partial \mu_m} - \frac{\partial r(\boldsymbol{\mu})}{\partial \mu_m}, \quad \forall m.
\end{aligned}
\tag{9}
$$

And the kernel combining function is

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \prod_{m=1}^{M}\exp\left(-\mu_m\left(x_i^{m} - x_j^{m}\right)^2\right)
= \exp\left(\sum_{m=1}^{M} -\mu_m\left(x_i^{m} - x_j^{m}\right)^2\right). \tag{10}
$$
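A minimal sketch of the GMKL kernel in formula (10) and of the regularizer $r(\boldsymbol{\mu})$ follows. It assumes that $x^{m}$ denotes the $m$-th scalar feature of a sample, so there is one weight per feature; the function names are hypothetical.

```python
import math

def gmkl_kernel(xi, xj, mu):
    """Product of per-feature RBF kernels, written as one exponential (formula (10))."""
    return math.exp(-sum(m * (a - b) ** 2 for m, a, b in zip(mu, xi, xj)))

def regularizer(mu):
    """r(mu) = 0.5 * (mu - 1/M)^T (mu - 1/M)."""
    M = len(mu)
    return 0.5 * sum((m - 1.0 / M) ** 2 for m in mu)

def regularizer_grad(mu):
    """Gradient of r(mu): the m-th entry is mu_m - 1/M."""
    M = len(mu)
    return [m - 1.0 / M for m in mu]
```

The regularizer vanishes at the uniform weighting $\mu_m = 1/M$ and grows as the weights move away from it.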
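There is another two-step alternating method that uses a gating model, called the Localized Multiple Kernel Learning (LMKL) method [19]. The locally combined kernel is represented as

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M}\mu_m(\mathbf{x}_i)\left\langle \Phi_m(\mathbf{x}_i), \Phi_m(\mathbf{x}_j)\right\rangle \mu_m(\mathbf{x}_j), \tag{11}
$$

where $\Phi_m(\mathbf{x})$ is the mapping of $\mathbf{x}$ into the feature space of the $m$-th kernel. To ensure nonnegativity, kernels can be composed in competitive or cooperative mode by using the softmax form and the sigmoid form [25], respectively:

$$
\begin{aligned}
\text{softmax:}\quad \mu_m(\mathbf{x}) &= \frac{\exp\left(\langle \mathbf{v}_m, \mathbf{x}\rangle + v_{m0}\right)}{\sum_{h=1}^{M}\exp\left(\langle \mathbf{v}_h, \mathbf{x}\rangle + v_{h0}\right)}, \quad \forall m, \\
\text{sigmoid:}\quad \mu_m(\mathbf{x}) &= \frac{1}{1 + \exp\left(-\langle \mathbf{v}_m, \mathbf{x}\rangle - v_{m0}\right)}, \quad \forall m,
\end{aligned}
\tag{12}
$$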
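The two gating modes can be sketched as follows. This is a hedged illustration: the sigmoid gate is written in the standard logistic form, which is an assumption about the intended formula, and the names `softmax_gate`, `sigmoid_gate`, and `localized_kernel_entry` are hypothetical.

```python
import math

def _scores(x, V, v0):
    """Gating scores <v_m, x> + v_m0 for every kernel m."""
    return [sum(a * b for a, b in zip(v, x)) + b0 for v, b0 in zip(V, v0)]

def softmax_gate(x, V, v0):
    """Competitive gating: weights are nonnegative and sum to 1."""
    scores = _scores(x, V, v0)
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid_gate(x, V, v0):
    """Cooperative gating: each weight lies in (0, 1) independently."""
    return [1.0 / (1.0 + math.exp(-s)) for s in _scores(x, V, v0)]

def localized_kernel_entry(gates_i, gates_j, base_values):
    """Formula (11): sum_m mu_m(x_i) * K_m(x_i, x_j) * mu_m(x_j)."""
    return sum(gi * k * gj for gi, k, gj in zip(gates_i, base_values, gates_j))
```

Here `base_values` holds the $M$ base kernel values $K_m(\mathbf{x}_i, \mathbf{x}_j)$ for one pair of samples, so the gates of both samples scale each base kernel before the sum.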
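where $\mathbf{V} = \{\mathbf{v}_m, v_{m0}\}_{m=1}^{M}$ denotes the parameters of the gating model. On the other hand, Qiu and Lane [20] quantify the fitness between kernel and accuracy in a Heuristic Multiple Kernel Learning (HMKL) way by exploiting the relationship between the kernel matrix $\mathbf{K}$ and the sample label $\mathbf{y}$. The relationship can be expressed by kernel alignment:

$$
F(\mathbf{K}, \mathbf{y}\mathbf{y}^{T}) = \frac{\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{T}\rangle_F}{\sqrt{\langle \mathbf{K}, \mathbf{K}\rangle_F\,\langle \mathbf{y}\mathbf{y}^{T}, \mathbf{y}\mathbf{y}^{T}\rangle_F}}
= \frac{\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{T}\rangle_F}{n\sqrt{\langle \mathbf{K}, \mathbf{K}\rangle_F}}, \tag{13}
$$

where $\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{T}\rangle_F = \sum_{i=1}^{n}\sum_{j=1}^{n} K(\mathbf{x}_i, \mathbf{x}_j)\,[\mathbf{y}\mathbf{y}^{T}]_{ij}$ and $\langle \cdot, \cdot\rangle_F$ is the Frobenius inner product. Kernel alignment is then used to weigh the proportion of each kernel in the multikernel:

$$
\mu_m = \frac{F(K_m, \mathbf{y}\mathbf{y}^{T})}{\sum_{h=1}^{M} F(K_h, \mathbf{y}\mathbf{y}^{T})}, \quad \forall m. \tag{14}
$$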
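Formulas (13) and (14) translate into a short sketch. It assumes labels in $\{-1, +1\}$, which is what makes $\langle \mathbf{y}\mathbf{y}^{T}, \mathbf{y}\mathbf{y}^{T}\rangle_F = n^2$ and justifies the factor $n$ in the denominator; the names `frobenius`, `alignment`, and `hmkl_weights` are hypothetical.

```python
import math

def frobenius(A, B):
    """Frobenius inner product <A, B>_F of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def alignment(K, y):
    """Kernel alignment F(K, y y^T) of formula (13), for labels in {-1, +1}."""
    n = len(y)
    yyT = [[yi * yj for yj in y] for yi in y]
    return frobenius(K, yyT) / (n * math.sqrt(frobenius(K, K)))

def hmkl_weights(grams, y):
    """Kernel weights of formula (14): alignments normalized to sum to 1."""
    scores = [alignment(K, y) for K in grams]
    total = sum(scores)
    return [s / total for s in scores]
```

A kernel matrix equal to $\mathbf{y}\mathbf{y}^{T}$ reaches the maximum alignment of 1, so under this rule kernels that agree better with the labels receive larger weights.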
Cortes et al. [21] then add a concentration bound to kernel alignment to form the centered kernel:

$$
[K^{c}]_{ij} = K_{ij} - \frac{1}{n}\sum_{i=1}^{n}K_{ij} - \frac{1}{n}\sum_{j=1}^{n}K_{ij} + \frac{1}{n^{2}}\sum_{i,j=1}^{n}K_{ij}. \tag{15}
$$
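The centering in formula (15) can be sketched as below; `center_kernel` is a hypothetical helper name, and the kernel matrix is assumed to be a square nested list.

```python
def center_kernel(K):
    """Centered kernel of formula (15): subtract row and column means, add the grand mean."""
    n = len(K)
    row_mean = [sum(K[i]) / n for i in range(n)]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [[K[i][j] - col_mean[j] - row_mean[i] + grand_mean for j in range(n)]
            for i in range(n)]
```

A quick sanity check on any output is that every row and every column of the centered matrix sums to zero.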
Accordingly, the multikernel weights of this Centering Multiple Kernel Learning (CMKL) method are

$$
\boldsymbol{\mu} = \frac{\mathbf{C}^{-1}\mathbf{a}}{\left\|\mathbf{C}^{-1}\mathbf{a}\right\|_2}, \tag{16}
$$

where $\mathbf{C} = \left\{\langle K^{c}_{m}, K^{c}_{h}\rangle_F\right\}_{m,h=1}^{M}$ and $\mathbf{a} = \left\{\langle K^{c}_{m}, \mathbf{y}\mathbf{y}^{T}\rangle_F\right\}_{m=1}^{M}$.
Later, Cortes et al. [22] studied a Polynomial Multiple Kernel Learning (PMKL) method, which utilizes a polynomial combination of the base kernels of higher degree ($d \ge 1$) based on the Kernel Ridge Regression (KRR) theory:

$$
\begin{aligned}
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) &= \sum \mu_{k_1 k_2 \cdots k_M}\, K_1(\mathbf{x}^{1}_i, \mathbf{x}^{1}_j)^{k_1} K_2(\mathbf{x}^{2}_i, \mathbf{x}^{2}_j)^{k_2} \cdots K_M(\mathbf{x}^{M}_i, \mathbf{x}^{M}_j)^{k_M}, \\
& k_m \ge 0,\quad k_1 + k_2 + \cdots + k_M \le d,\quad \mu_{k_1 k_2 \cdots k_M} \ge 0.
\end{aligned}
\tag{17}
$$

However, the computational complexity of the coefficients $\mu_{k_1 k_2 \cdots k_M}$ is $O(M^{d})$, which is too large to apply in practice. So $\mu_{k_1 k_2 \cdots k_M}$ can be simplified as a product of nonnegative coefficients $\mu_1^{k_1}\mu_2^{k_2}\cdots\mu_M^{k_M}$, and the special case ($d = 2$) can be expressed as

$$
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M}\sum_{h=1}^{M}\mu_m \mu_h K_m(\mathbf{x}^{m}_i, \mathbf{x}^{m}_j) K_h(\mathbf{x}^{h}_i, \mathbf{x}^{h}_j). \tag{18}
$$
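The degree-2 case in formula (18) can be illustrated with a direct double sum over precomputed kernel matrices. The function name `pmkl_quadratic` is hypothetical.

```python
def pmkl_quadratic(mu, grams):
    """Formula (18): sum over m, h of mu_m * mu_h * K_m[i][j] * K_h[i][j]."""
    n = len(grams[0])
    M = len(grams)
    return [[sum(mu[m] * mu[h] * grams[m][i][j] * grams[h][i][j]
                 for m in range(M) for h in range(M))
             for j in range(n)]
            for i in range(n)]
```

Each entry of the result equals the square of the linearly combined entry $\sum_m \mu_m K_m(\mathbf{x}_i, \mathbf{x}_j)$, which gives a cheaper way to compute the same matrix and a convenient check on the double sum.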
Here the related optimization for learning $K_{\mu}$ can be formulated in the following min-max form:

$$
\min_{\boldsymbol{\mu} \in \varphi}\ \max_{\boldsymbol{\alpha} \in \mathbb{R}^{n}}\ -\boldsymbol{\alpha}^{T}\left(K_{\mu} + \lambda\mathbf{I}\right)\boldsymbol{\alpha} + 2\boldsymbol{\alpha}^{T}\mathbf{y}, \tag{19}
$$

where $\varphi$ is a positive, bounded, and convex set. Two bounded sets, defined with the $l_1$-norm and the $l_2$-norm, are the appropriate choices to construct $\varphi$:

$$
\begin{aligned}
\varphi_1 &= \left\{\boldsymbol{\mu} \mid \boldsymbol{\mu} \ge 0,\ \left\|\boldsymbol{\mu} - \boldsymbol{\mu}_0\right\|_1 \le \Lambda\right\}, \\
\varphi_2 &= \left\{\boldsymbol{\mu} \mid \boldsymbol{\mu} \ge 0,\ \left\|\boldsymbol{\mu} - \boldsymbol{\mu}_0\right\|_2 \le \Lambda\right\}.
\end{aligned}
\tag{20}
$$

Here $\boldsymbol{\mu}_0$ and $\Lambda$ are model parameters, and $\boldsymbol{\mu}_0$ is generally equal to $0$ or satisfies $\boldsymbol{\mu}_0/\|\boldsymbol{\mu}_0\| = 1$.
Unlike the approaches described above, and inspired by the consistency between group Lasso and MKL [26], Xu et al. [23] and Kloft et al. [24] propose an iterative MKL method in a generalized $l_p$-norm ($p \ge 1$) form. They are collectively called the Arbitrary Norms Multiple Kernel Learning (ANMKL) method. On the basis of the duality condition $\|\mathbf{w}_m\|_2^2 = \mu_m^{2}\sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_m(\mathbf{x}^{m}_i, \mathbf{x}^{m}_j)$, the update formula of the kernel weights is

$$
\mu_m = \frac{\left\|\mathbf{w}_m\right\|_2^{2/(p+1)}}{\left(\sum_{h=1}^{M}\left\|\mathbf{w}_h\right\|_2^{2p/(p+1)}\right)^{1/p}}, \quad \forall m. \tag{21}
$$
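The weight update of formula (21) can be sketched as a single function. It assumes the norms $\|\mathbf{w}_m\|_2$ have already been computed from the duality condition; the name `anmkl_update` is hypothetical.

```python
def anmkl_update(w_norms, p):
    """Formula (21): mu_m = ||w_m||^(2/(p+1)) / (sum_h ||w_h||^(2p/(p+1)))^(1/p)."""
    numer = [w ** (2.0 / (p + 1)) for w in w_norms]
    denom = sum(w ** (2.0 * p / (p + 1)) for w in w_norms) ** (1.0 / p)
    return [v / denom for v in numer]
```

By construction the updated weights have unit $l_p$-norm, since $\sum_m \mu_m^{p}$ reduces to the denominator sum divided by itself. For $p = 1$ the rule simplifies to $\mu_m = \|\mathbf{w}_m\|_2 / \sum_h \|\mathbf{w}_h\|_2$.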
It can be seen from the formulas in this section that the operation complexity of MKL is mainly decided by $\mathbf{x}_i$. So simplifying the feature space is an efficient way to improve the performance of MKL. Through the optimization of basis vectors, LRR can reduce dimension while retaining the data features, which is ideal for improving MKL.
4 MKL Using Low-Rank Representation
4.1 Low-Rank Representation (LRR). The theoretical advances on LRR enable us to use the latent low-rank structure in data [27, 28]. LRR simultaneously obtains the representation of all samples under a global low-rank constraint. Meanwhile, the LRR procedure can operate in a relatively short time with guaranteed performance.

Let the input sample space $\mathbf{X}$ be represented by a linear combination in the dictionary $\mathbf{A}$:

$$
\mathbf{X} = \mathbf{A}\mathbf{Z}, \tag{22}
$$

where $\mathbf{Z} = [\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n]$ is the coefficient matrix and each $\mathbf{z}_i$ is the representation coefficient vector of $\mathbf{x}_i$. When the samples are sufficient, $\mathbf{X}$ itself serves as the dictionary $\mathbf{A}$. Considering the noise or tainted data in practical applications, LRR aims at approximating $\mathbf{X}$ by $\mathbf{A}\mathbf{Z} + \mathbf{E}$ by minimizing the rank of the coefficient matrix $\mathbf{Z}$ while reducing the $l_0$-norm of $\mathbf{E}$, in which $\mathbf{A}\mathbf{Z}$ is the low-rank component and $\mathbf{E}$ is the associated sparse error. It can be generally formulated as

$$
\min_{\mathbf{Z}, \mathbf{E}}\ \operatorname{rank}(\mathbf{Z}) + \lambda\|\mathbf{E}\|_0 \quad \text{s.t.}\quad \mathbf{X} = \mathbf{A}\mathbf{Z} + \mathbf{E}. \tag{23}
$$

Here $\lambda$ is used to balance the effect of the low-rank term and the error term. The $l_0$-norm, which leads to an NP-hard problem, can be replaced by the $l_1$-norm or the $l_{2,1}$-norm. We choose the $l_{2,1}$-norm as the error term measurement here, which is defined as $\|\mathbf{E}\|_{2,1} = \sum_{j=1}^{n}\sqrt{\sum_{i=1}^{D}([\mathbf{E}]_{ij})^{2}}$. Meanwhile, the rank term can be relaxed into the nuclear norm $\|\cdot\|_{*}$ [29]. Consequently, the convex relaxation of formula (23) is

$$
\min_{\mathbf{Z}, \mathbf{E}}\ \|\mathbf{Z}\|_{*} + \lambda\|\mathbf{E}\|_{2,1} \quad \text{s.t.}\quad \mathbf{X} = \mathbf{A}\mathbf{Z} + \mathbf{E}. \tag{24}
$$

The optimal solution $\mathbf{Z}^{*}$ can be obtained via the Augmented Lagrange Multipliers (ALM) method [11].
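The $l_{2,1}$-norm used as the error measure can be sketched directly. The text does not spell out the ALM iterations, so the column-wise shrinkage below is only an assumption about how the $\mathbf{E}$-subproblem is commonly handled (it is the standard proximal operator of the $l_{2,1}$-norm), and both function names are hypothetical.

```python
import math

def l21_norm(E):
    """Sum of the Euclidean norms of the columns of E (one column per sample)."""
    rows, cols = len(E), len(E[0])
    return sum(math.sqrt(sum(E[i][j] ** 2 for i in range(rows)))
               for j in range(cols))

def l21_shrink(Q, tau):
    """Column-wise shrinkage: minimizer of tau * ||E||_{2,1} + 0.5 * ||E - Q||_F^2."""
    rows, cols = len(Q), len(Q[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        norm = math.sqrt(sum(Q[i][j] ** 2 for i in range(rows)))
        if norm > tau:
            # Columns with large norm are scaled down; the rest are set to zero.
            scale = (norm - tau) / norm
            for i in range(rows):
                out[i][j] = scale * Q[i][j]
    return out
```

Because whole columns are zeroed together, this norm favors errors that are concentrated in a few corrupted samples rather than spread over all of them.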
4.2 Efficient SVM and MKL Using LRR. The kernel matrix remarkably impacts the computational efficiency and accuracy of SVM and MKL. How to find an appropriate variant of the kernel matrix that contains both the initial label and the data geometry structure for recognition is a crucial task. Since LRR has been theoretically proved to be superior, in the sequel we adopt LRR to transform the kernel for augmenting the similarities among the intraclass samples and the differences among the interclass samples. Moreover, a representation of all samples under a global low-rank constraint can be attained, which is more conducive to capturing the global data structure [30]. So LR-SVM and LR-MKL are two alternative techniques that we propose to improve the performance of SVM and MKL.

Firstly, based on the LRR theory, we improve the monokernel SVM as the reference item, from which the improvement brought by LRR can be displayed visually. The specific procedure of the efficient LR-SVM is presented in Algorithm 1.
Algorithm 1 (efficient SVM using LRR (LR-SVM)).

Input: The whole training set $\{\mathbf{X}, \mathbf{Y}\}$, the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$, the parameters $t$, $\gamma$, $C$, $q$ of SVM, and the parameter $\lambda$ of LRR.

Step 1. Normalize $\mathbf{X}$, $\mathbf{X}_S$.

Step 2. Perform the procedure of (24) on the normalized $\mathbf{X}$, $\mathbf{X}_S$ to project them onto the coefficient feature spaces $\mathbf{Z}$, $\mathbf{Z}_S$, respectively.

Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into SVM to train the classification model.

Step 4. Utilize the obtained classification model to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$; the discriminant function is $f(\mathbf{z}) = \sum_{i=1}^{n}\alpha_i^{*} y_i K(\mathbf{z}_i, \mathbf{z}) + b$.

Output: Compare the actual label vector of the test set $\mathbf{Y}_S$ and the prediction label vector $\mathbf{Y}_P$ to obtain the recognition results.
It is well known that SVM suffers from instability across various data structures. Thus, MKL recognition has become the development trend. Next, we combine LRR with the MKL algorithms mentioned in Section 3 and change the binary-classification model into a multiclassification model by the pairwise (one-versus-one) strategy, through which a classifier is designed between any two of the $k$ categories of samples. Then we adopt a voting method and assign each sample to the category that obtains the most votes. All the combined algorithms can be summarized into one framework, which is given in Algorithm 2, and we refer to them collectively as LR-MKL.
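The pairwise strategy and the voting step can be sketched as follows. The binary classifiers themselves are outside this sketch: their predictions are assumed to be given, the tie-breaking rule (smallest label wins) is an assumption that the text does not specify, and the names `class_pairs` and `vote` are hypothetical.

```python
from collections import Counter
from itertools import combinations

def class_pairs(labels):
    """All k(k-1)/2 unordered pairs of categories, one binary classifier per pair."""
    return list(combinations(sorted(set(labels)), 2))

def vote(pairwise_predictions):
    """Majority vote over binary classifiers.

    pairwise_predictions[c][s] is the label that classifier c predicts for sample s.
    """
    n_samples = len(pairwise_predictions[0])
    result = []
    for s in range(n_samples):
        counts = Counter(pred[s] for pred in pairwise_predictions)
        best = max(counts.values())
        # Assumed tie-break: pick the smallest label among the top-voted ones.
        result.append(min(label for label, c in counts.items() if c == best))
    return result
```

With $k = 3$ categories there are three pairwise classifiers, and a sample is assigned to whichever category wins at least two of the three comparisons.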
Algorithm 2 (efficient MKL using LRR (LR-MKL)).

Input: The whole training set $\{\mathbf{X}, \mathbf{Y}\}$, the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$, and the parameter $\lambda$ of LRR.

Steps 1 and 2. They are the same as in the LR-SVM algorithm.

Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into MKL to train $k(k-1)/2$ classifiers with the pairwise strategy.

Step 4. Utilize each one of the binary MKL classifiers to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$.

Step 5. According to the prediction label vectors $\mathbf{Y}_{P_1}, \mathbf{Y}_{P_2}, \ldots, \mathbf{Y}_{P_{k(k-1)/2}}$, vote for the category of each sample to get the multilabels $\mathbf{Y}_P$.

Output: Compare the actual label vector of the test set $\mathbf{Y}_S$ and the prediction label vector $\mathbf{Y}_P$ to obtain the recognition results and the kernel weight vector $\boldsymbol{\mu}$.

5 Experiments and Analysis
In this section, we conduct extensive experiments to examine the efficiency of the proposed LR-SVM and LR-MKL algorithms. The operating environment is MATLAB (R2013a) on an Intel Core i5 CPU at 2.53 GHz. The SVM toolbox used in this paper is LIBSVM [31], which can be easily applied and is shown to be fast on large-scale databases.

The simulations are performed on diverse datasets to ensure a universal recognition effect. The test datasets range over the frequently used face databases and the standard test data of the UCI repository. In the simulations, all the samples are normalized first.

(1) Yale face database (http://vision.ucsd.edu/content/yale-face-database): it contains 165 grayscale images of 15 individuals with different facial expressions or configurations, and each image is resized to 64 × 64 pixels with 256 grey levels.

(2) ORL face database (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.htm): it contains 400 images of 40 distinct subjects taken at different times, with varying light, facial expressions, and details. We resize them to 64 × 64 pixels with 256 grey levels per pixel.

(3) LSVT Voice Rehabilitation dataset (http://archive.ics.uci.edu/ml/datasets/LSVT+Voice+Rehabilitation) [32]: it is composed of 126 speech signals from 14 people with 309 features, divided into two categories.

(4) Multiple Features Digit dataset (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): it includes 2000 digitized handwritten numerals 0–9 with 649 features.
5.1 Experiments on LR-SVM. In order to demonstrate the recognition performance of SVM improved by the presented LR-SVM, we carry out numerous experiments on the Yale and ORL face databases. According to the rate of training samples (20%, 30%, 40%, 50%, 60%, 70%, and 80%), we implement seven groups of experiments on each database. To ensure a stable and reliable test, each group has ten different random divisions, and we average them as the final results. The kernel functions are $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, $K_{\mathrm{SIG}}$ ($q = 3$), and $\gamma = 1/g$ ($g$ is the dimension of the feature space).
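The evaluation protocol described above (a fixed training rate, ten random divisions, averaged results) might be organized as in the sketch below. It is not the authors' MATLAB code: the `evaluate` callback that trains and scores a classifier is a hypothetical placeholder, the splits are plain random splits with no class stratification (the text does not say whether stratification was used), and all function names are assumptions.

```python
import random

def default_gamma(g):
    """Kernel parameter gamma = 1/g, where g is the feature dimension."""
    return 1.0 / g

def random_split(n_samples, train_rate, rng):
    """Shuffle sample indices and cut them into a training and a test part."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(n_samples * train_rate)
    return indices[:n_train], indices[n_train:]

def average_over_divisions(evaluate, n_samples, train_rate, n_divisions=10, seed=0):
    """Average the score returned by evaluate(train_idx, test_idx) over random divisions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_divisions):
        train_idx, test_idx = random_split(n_samples, train_rate, rng)
        scores.append(evaluate(train_idx, test_idx))
    return sum(scores) / len(scores)
```

For the 64 × 64 face images the feature dimension is 4096, so `default_gamma(4096)` gives the value of $\gamma$ implied by the text.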
The classification accuracy and run time on the Yale database by using SVM and LR-SVM are shown in Figures 1 and 2, respectively. Similarly, the classification accuracy and run time on the ORL database are shown in Figures 3 and 4. The solid lines depict the results of SVM with different kernels, while the patterned lines with the corresponding colour depict those of LR-SVM. As can be seen from Figures 1 and 3, the proposed LR-SVM method consistently achieves an obvious improvement in classification accuracy compared to the original SVM method. In most cases, the classification accuracy increases with the rise in the training sample rate, which shows that the more complete the training set, the better the classification accuracy. But it is impossible for the training set to include so many samples in reality. LR-SVM has a high accuracy even under a low training sample rate, which is suitable for real applications. Meanwhile, Figures 2 and 4 show that, through LRR conversion, the run time can be reduced by more than an order of magnitude, which suits the real-time requirements of data processing in the big data era.
Figure 1: Classification accuracy of Yale by using SVM and LR-SVM (x-axis: training sample rate (%); y-axis: accuracy (%); curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG).

Figure 2: Run time of Yale by using SVM and LR-SVM (x-axis: training sample rate (%); y-axis: run time (s); curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG).

Figure 3: Classification accuracy of ORL by using SVM and LR-SVM (x-axis: training sample rate (%); y-axis: accuracy (%); curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG).

Figure 4: Run time of ORL by using SVM and LR-SVM (x-axis: training sample rate (%); y-axis: run time (s); curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG).

5.2 Experiments on LR-MKL. In this section, we compare the performance of the MKL algorithms involved in Section 3 and their corresponding LR-MKL algorithms. The multikernel is composed of $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, $K_{\mathrm{SIG}}$ ($q = 3$). The proportion parameter vector of the kernels is $\boldsymbol{\mu} = [\mu_1, \mu_2, \mu_3, \mu_4]$. The comparative algorithms are listed below:

(i) Unweighted MKL (UMKL) [16] and LR-UMKL; (+) indicates the sum form and (*) indicates the product form.

(ii) Alternative MKL (AMKL) [17] and LR-AMKL.

(iii) Generalized MKL (GMKL) [18] and LR-GMKL.

(iv) Localized MKL (LMKL) [19] and LR-LMKL; (sof) distributes $\boldsymbol{\mu}$ in softmax mode and (sig) distributes $\boldsymbol{\mu}$ in sigmoid mode.

(v) Heuristic MKL (HMKL) [20] and LR-HMKL.

(vi) Centering MKL (CMKL) [21] and LR-CMKL.

(vii) Polynomial MKL (PMKL) [22] and LR-PMKL; (1) adopts the bounded set $\varphi_1$ with the $l_1$-norm and (2) adopts the bounded set $\varphi_2$ with the $l_2$-norm.

(viii) Arbitrary Norm MKL (ANMKL) [23, 24] and LR-ANMKL; (1) iterates $\boldsymbol{\mu}$ with the $l_1$-norm and (2) iterates $\boldsymbol{\mu}$ with the $l_2$-norm.

(ix) Besides, the highest accuracy among the four monokernel SVMs is selected as the reference item, which is referred to as SVM(best).
Table 1: The performances of MKL algorithms and LR-MKL algorithms on the datasets Yale, ORL, LSVT, and Digit. Columns: Acc and Time for each of Yale, ORL, LSVT, and Digit. Rows: SVM(best), LR-SVM(best), UMKL(+) [16], LR-UMKL(+), UMKL(*) [16], LR-UMKL(*), AMKL [17], LR-AMKL, GMKL [18], LR-GMKL, LMKL(sof) [19], LR-LMKL(sof), LMKL(sig) [19], LR-LMKL(sig), HMKL [20], LR-HMKL, CMKL [21], LR-CMKL, PMKL(1) [22], LR-PMKL(1), PMKL(2) [22], LR-PMKL(2), ANMKL(1) [23, 24], LR-ANMKL(1), ANMKL(2) [23, 24], LR-ANMKL(2).
We conduct experiments on the test datasets Yale, ORL, LSVT Voice Rehabilitation (LSVT for short), and Multiple Features Digit (Digit for short). For each dataset, 60% of the samples are drawn at random to train the classification model, and the remaining samples serve as the test set. From the optimized results of $\gamma$ and the penalty factor $C$ obtained by the grid search method, we find that the classification accuracy varies little when $\gamma$ and $C$ range within a certain interval. So there is no need to search the whole parameter space, which inevitably increases the computational cost. The penalty factor $C$ is chosen from the trial values 0.01, 0.1, 1, 10, and 100, and $\gamma$ is fixed at $1/g$. We then assign to $C$ the value that has the highest average accuracy on the 5 × 2 cross validation sets. Each algorithm is conducted with 10 independent runs, and we average them as the final results. The bold numbers represent the preferable recognition effect between the original algorithms and their LRR combined algorithms. The numbers in italic font denote the algorithms whose recognition precision is inferior to that of SVM(best). The recognition performance of the algorithms is measured by the classification accuracy and run time, as illustrated in Table 1.
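The selection protocol described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the random seed, and the `toy_fold_accuracy` scorer are hypothetical stand-ins (the paper trains LIBSVM models in MATLAB), and the sketch shows only the 60/40 split, the 5 × 2 cross validation folds, and the choice of $C$ by the highest mean accuracy.

```python
import math
import random

def split_train_test(n_samples, train_rate=0.6, seed=0):
    """Randomly draw train_rate of the sample indices for training."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(train_rate * n_samples)
    return indices[:n_train], indices[n_train:]

def five_by_two_folds(train_indices, seed=0):
    """Yield (fit, validate) index pairs for 5 repetitions of 2-fold CV."""
    rng = random.Random(seed)
    for _ in range(5):
        shuffled = train_indices[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        first, second = shuffled[:half], shuffled[half:]
        yield first, second
        yield second, first

def select_penalty(candidates, fold_accuracy, train_indices):
    """Return the candidate C with the highest mean accuracy over the folds."""
    def mean_accuracy(C):
        scores = [fold_accuracy(C, fit, val)
                  for fit, val in five_by_two_folds(train_indices)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_accuracy)

def toy_fold_accuracy(C, fit, val):
    """Hypothetical scorer standing in for 'train with C on fit, test on val'."""
    return 1.0 - 0.1 * abs(math.log10(C) - 1.0)

train_idx, test_idx = split_train_test(165)  # Yale contains 165 images
best_C = select_penalty([0.01, 0.1, 1, 10, 100], toy_fold_accuracy, train_idx)
```

In a real run, `toy_fold_accuracy` would be replaced by a routine that fits the classifier on the `fit` indices and returns its accuracy on the `val` indices.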
In most cases, our proposed LR-MKL methods achieve superior results to the original MKL, with higher classification accuracy and shorter operation time. This indicates that LRR can augment the similarities among the intraclass samples and the differences among the interclass samples while simplifying the kernel matrix. Note that UMKL(*) fails to achieve the ideal recognition effects in many cases and is even less accurate than SVM(best). However, combining it with LRR improves its effects to a large extent. This illustrates that simply combining kernels without regard to the data structure is infeasible and that LRR can offset part of the consequences of irrational distribution. In general, PMKL, ANMKL, and their improved algorithms have the preferable recognition effects, especially the improved algorithms, whose accuracy is over 90 percent all the time. In terms of run time, it is clearly observed that the real-time performance of MKL is much worse than that of SVM, because MKL has a process of allocating kernel weights and this process can be very time consuming. Among them, LMKL is the worst and fails to satisfy the real-time requirement. Our combined LR-MKL can reduce the run time manifold, even by more than one order of magnitude, so it can speed up high-precision MKL to satisfy the real-time requirement. In brief, the proposed LR-MKL can boost the performance of MKL to a great extent.
6. Conclusion
The complexity of solving the convex quadratic optimization problem in MKL is $O(Nn^{3.5})$, so it is infeasible to apply it to large scale problems because of its large computational cost. Our effort has been directed at decreasing the dimension of the training set. Note that LRR can capture the global structure of data in relatively few dimensions. Therefore, we have given a review of several existing MKL algorithms. Based on this point, we have proposed a novel combined LR-MKL, which largely improves the performance of MKL. A large number of experiments have been carried out on four real world datasets to contrast the recognition effects of various kinds of MKL and LR-MKL algorithms. It has been shown that, in most cases, the recognition effects of MKL algorithms are better than SVM(best), except for UMKL(*). In most cases, our proposed LR-MKL methods have achieved superior results to the original MKL. Among them, PMKL, ANMKL, and their improved algorithms have shown the preferable recognition effects.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 51208168), Hebei Province Natural Science Foundation (no. E2016202341), and Tianjin Natural Science Foundation (no. 13JCYBJC37700).
References
[1] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.
[2] M. Hu, Y. Chen, and J. T.-Y. Kwok, “Building sparse multiple-kernel SVM classifiers,” IEEE Transactions on Neural Networks, vol. 20, no. 5, pp. 827–839, 2009.
[3] X. Wang, X. Liu, N. Japkowicz, and S. Matwin, “Ensemble of multiple kernel SVM classifiers,” Advances in Artificial Intelligence, vol. 8436, pp. 239–250, 2014.
[4] E. Hassan, S. Chaudhury, N. Yadav, P. Kalra, and M. Gopal, “Off-line handwritten input based identity determination using multi kernel feature combination,” Pattern Recognition Letters, vol. 35, no. 1, pp. 113–119, 2014.
[5] F. Cai and V. Cherkassky, “Generalized SMO algorithm for SVM-based multitask learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 997–1003, 2012.
[6] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, “Online passive-aggressive algorithms,” Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.
[7] N. Cesa-Bianchi, A. Conconi, and C. Gentile, “On the generalization ability of on-line learning algorithms,” IEEE Transactions on Information Theory, vol. 50, no. 9, pp. 2050–2057, 2004.
[8] S. Fine and K. Scheinberg, “Efficient SVM training using low-rank kernel representations,” Journal of Machine Learning Research, vol. 2, no. 2, pp. 243–264, 2002.
[9] S. Zhou, “Sparse LSSVM in primal using Cholesky factorization for large-scale problems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 783–795, 2016.
[10] L. Jia, S.-Z. Liao, and L.-Z. Ding, “Learning with uncertain kernel matrix set,” Journal of Computer Science and Technology, vol. 25, no. 4, pp. 709–727, 2010.
[11] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
[12] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233–2246, 2012.
[13] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, “Multi-task low-rank affinity pursuit for image segmentation,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV ’11), pp. 2439–2446, IEEE, Barcelona, Spain, November 2011.
[14] Y. Mu, J. Dong, X. Yuan, and S. Yan, “Accelerated low-rank visual recovery by random projection,” in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, pp. 2609–2616, Colorado Springs, Colo, USA, June 2011.
[15] J. Chen, J. Zhou, and J. Ye, “Integrating low-rank and group-sparse structures for robust multi-task learning,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’11, pp. 42–50, San Diego, Calif, USA, August 2011.
[16] P. Pavlidis, J. Cai, J. Weston, and W. N. Grundy, “Gene functional classification from heterogeneous data,” in Proceedings of the 5th Annual International Conference on Computational Biology, pp. 249–255, Montreal, Canada, May 2001.
[17] O. Chapelle and A. Rakotomamonjy, “Second order optimization of kernel parameters,” NIPS Workshop on Kernel Learning, 2008.
[18] M. Varma and B. R. Babu, “More generality in efficient multiple kernel learning,” in Proceedings of the 26th International Conference on Machine Learning, ICML 2009, pp. 1065–1072, Montreal, Canada, June 2009.
[19] M. Gonen and E. Alpaydin, “Localized multiple kernel learning,” in Proceedings of the 25th international conference, pp. 352–359, Helsinki, Finland, July 2008.
[20] S. Qiu and T. Lane, “A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 190–199, 2009.
[21] C. Cortes, M. Mohri, and A. Rostamizadeh, “Two-stage learning kernel algorithms,” in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 239–246, Haifa, Israel, June 2010.
[22] C. Cortes, M. Mohri, and A. Rostamizadeh, “Learning non-linear combinations of kernels,” in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS ’09), pp. 396–404, December 2009.
[23] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu, “Simple and efficient multiple kernel learning by group lasso,” in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 1175–1182, Haifa, Israel, June 2010.
[24] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, “Non-sparse regularization and efficient training with multiple kernels,” arXiv preprint abs/1003.0079, https://arxiv.org/abs/1003.0079.
[25] M. Gonen and E. Alpaydın, “Multiple kernel learning algorithms,” Journal of Machine Learning Research, vol. 12, pp. 2211–2268, 2011.
[26] F. R. Bach, “Consistency of the group lasso and multiple kernel learning,” Journal of Machine Learning Research, vol. 9, no. 2, pp. 1179–1225, 2008.
[27] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis,” Journal of the ACM, vol. 58, no. 3, 2011.
[28] C.-F. Chen, C.-P. Wei, and Y.-C. F. Wang, “Low-rank matrix recovery with structural incoherence for robust face recognition,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2618–2625, Providence, RI, USA, June 2012.
[29] J.-F. Cai, E. J. Candes, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
[30] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, “Non-negative low rank and sparse graph for semi-supervised learning,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2328–2335, USA, June 2012.
[31] C. Chang and C. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[32] A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig, “Objective automatic assessment of rehabilitative speech treatment in Parkinson’s disease,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 181–190, 2014.
where $K_m$ is the monokernel function. The multiple kernel $K_{\mu}$ can be obtained by a function $f_{\mu}: \mathbb{R}^{D} \rightarrow \mathbb{R}^{P}$ combining $M$ different $K_m$, and $\boldsymbol{\mu}$ is the proportion parameter of the kernels. There are many different methods to assign kernel weights.
Pavlidis et al. [16] propose a simple combination mode using an unweighted sum or product of heterogeneous kernels. The combining function of this Unweighted Multiple Kernel Learning (UMKL) method is

$$K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M} K_m(\mathbf{x}_i, \mathbf{x}_j), \qquad K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \prod_{m=1}^{M} K_m(\mathbf{x}_i, \mathbf{x}_j). \tag{7}$$
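The two unweighted combinations in (7) can be illustrated on precomputed Gram matrices. The following is a minimal sketch, not the authors' implementation: the helper names, the toy data, and the choice of a linear and an RBF base kernel with `gamma=0.5` are assumptions made only for the example.

```python
import math

def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def gram(kernel, X):
    """Gram matrix of one base kernel on the samples X."""
    return [[kernel(xi, xj) for xj in X] for xi in X]

def umkl_sum(grams):
    """Entrywise sum of the base Gram matrices (first form in (7))."""
    n = len(grams[0])
    return [[sum(K[i][j] for K in grams) for j in range(n)] for i in range(n)]

def umkl_product(grams):
    """Entrywise product of the base Gram matrices (second form in (7))."""
    n = len(grams[0])
    return [[math.prod(K[i][j] for K in grams) for j in range(n)]
            for i in range(n)]

X = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # toy samples
K_lin = gram(linear_kernel, X)
K_rbf = gram(lambda x, z: rbf_kernel(x, z, gamma=0.5), X)
K_sum = umkl_sum([K_lin, K_rbf])
K_prod = umkl_product([K_lin, K_rbf])
```

Both combinations need no weight learning at all, which is why UMKL serves as the simplest baseline among the MKL variants.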
In a follow-up study, the distribution of $\boldsymbol{\mu}$ in MKL becomes a vital limiting factor of availability. Chapelle and Rakotomamonjy [17] report that the optimization problem can be solved by a projected gradient method in two alternating steps: first, solving a primal SVM with the given $\boldsymbol{\mu}$; second, updating $\boldsymbol{\mu}$ through the gradient function with $\boldsymbol{\alpha}$ calculated in the first step. The kernel combining function, objective function, and gradient function of this Alternative Multiple Kernel Learning (AMKL) method are

$$\begin{aligned}
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) &= \sum_{m=1}^{M} \mu_m K_m(\mathbf{x}_i, \mathbf{x}_j), \\
J(\boldsymbol{\mu}) &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \left( \sum_{m=1}^{M} \mu_m K_m(\mathbf{x}_i, \mathbf{x}_j) \right) - \sum_{i=1}^{n} \alpha_i, \\
\frac{\partial J(\boldsymbol{\mu})}{\partial \mu_m} &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \frac{\partial K_{\mu}(\mathbf{x}_i, \mathbf{x}_j)}{\partial \mu_m} = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_m(\mathbf{x}_i, \mathbf{x}_j), \quad \forall m.
\end{aligned} \tag{8}$$
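To make the second step of AMKL concrete, the objective and gradient in (8) can be evaluated directly once $\boldsymbol{\alpha}$ is fixed. The sketch below assumes precomputed base Gram matrices and uses hypothetical function names and toy values; it does not include the SVM solver of the first step or the projection of $\boldsymbol{\mu}$.

```python
def amkl_objective(mu, grams, alpha, y):
    """J(mu) in (8) for fixed alpha: 0.5 * sum_ij y_i y_j a_i a_j K_mu(i, j) - sum_i a_i."""
    n = len(y)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            k_mu = sum(w * K[i][j] for w, K in zip(mu, grams))
            quad += y[i] * y[j] * alpha[i] * alpha[j] * k_mu
    return 0.5 * quad - sum(alpha)

def amkl_gradient(grams, alpha, y):
    """dJ/dmu_m in (8): one entry per base kernel, independent of mu."""
    n = len(y)
    return [
        0.5 * sum(y[i] * y[j] * alpha[i] * alpha[j] * K[i][j]
                  for i in range(n) for j in range(n))
        for K in grams
    ]

# Toy values (assumed for illustration only).
grams = [[[1.0, 0.0], [0.0, 1.0]],
         [[1.0, 0.5], [0.5, 1.0]]]
alpha = [0.3, 0.7]
y = [1, -1]
grad = amkl_gradient(grams, alpha, y)
```

Because $K_{\mu}$ is linear in $\boldsymbol{\mu}$, the gradient does not depend on the current weights, so a finite difference of `amkl_objective` along one coordinate reproduces the corresponding entry of `grad`.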
The Generalized Multiple Kernel Learning (GMKL) method [18] also employs the gradient tool to approach the solution, but it regards the kernel weights as a regularization item $r(\boldsymbol{\mu})$, which is taken as $\frac{1}{2}(\boldsymbol{\mu} - 1/M)^{\mathrm{T}}(\boldsymbol{\mu} - 1/M)$. So the objective function and gradient function can be transformed into

$$\begin{aligned}
J(\boldsymbol{\mu}) &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) - \sum_{i=1}^{n} \alpha_i - r(\boldsymbol{\mu}), \\
\frac{\partial J(\boldsymbol{\mu})}{\partial \mu_m} &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j \frac{\partial K_{\mu}(\mathbf{x}_i, \mathbf{x}_j)}{\partial \mu_m} - \frac{\partial r(\boldsymbol{\mu})}{\partial \mu_m}, \quad \forall m.
\end{aligned} \tag{9}$$

And the kernel combined function is

$$K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \prod_{m=1}^{M} \exp\left(-\mu_m \left(\mathbf{x}_i^m - \mathbf{x}_j^m\right)^2\right) = \exp\left(\sum_{m=1}^{M} -\mu_m \left(\mathbf{x}_i^m - \mathbf{x}_j^m\right)^2\right). \tag{10}$$
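As a worked step, the two derivatives that the GMKL gradient in (9) requires follow directly from the regularizer $r(\boldsymbol{\mu}) = \frac{1}{2}(\boldsymbol{\mu} - 1/M)^{\mathrm{T}}(\boldsymbol{\mu} - 1/M)$ and the kernel in (10). The derivation below assumes that $1/M$ denotes the constant vector whose entries all equal $1/M$.

```latex
\frac{\partial r(\boldsymbol{\mu})}{\partial \mu_m} = \mu_m - \frac{1}{M},
\qquad
\frac{\partial K_{\mu}(\mathbf{x}_i, \mathbf{x}_j)}{\partial \mu_m}
  = -\left(\mathbf{x}_i^m - \mathbf{x}_j^m\right)^2 K_{\mu}(\mathbf{x}_i, \mathbf{x}_j).
```

Substituting both expressions into (9) gives a gradient that, unlike the AMKL gradient, depends on the current $\boldsymbol{\mu}$ through $K_{\mu}$.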
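There is another two-step alternating method using a gating model, called the Localized Multiple Kernel Learning (LMKL) method [19]. The formula of the locally combined kernel is represented as

$$K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M} \mu_m(\mathbf{x}_i) \left\langle \Phi_m(\mathbf{x}_i), \Phi_m(\mathbf{x}_j) \right\rangle \mu_m(\mathbf{x}_j), \tag{11}$$

where $\Phi_m(\mathbf{x})$ is the mapping of $\mathbf{x}$ into the feature space of the $m$th kernel. To ensure nonnegativity, kernels can be composed in competitive or cooperative mode by using the softmax form and the sigmoid form [25], respectively:

$$\begin{aligned}
\text{softmax:} \quad \mu_m(\mathbf{x}) &= \frac{\exp\left(\langle \mathbf{v}_m, \mathbf{x} \rangle + v_{m0}\right)}{\sum_{h=1}^{M} \exp\left(\langle \mathbf{v}_h, \mathbf{x} \rangle + v_{h0}\right)}, \quad \forall m, \\
\text{sigmoid:} \quad \mu_m(\mathbf{x}) &= \frac{1}{1 + \exp\left(-\langle \mathbf{v}_m, \mathbf{x} \rangle - v_{m0}\right)}, \quad \forall m,
\end{aligned} \tag{12}$$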
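where $\mathbf{V} = \{\mathbf{v}_m, v_{m0}\}_{m=1}^{M}$ denotes the parameters of the gating model. On the other hand, Qiu and Lane [20] quantify the fitness between kernel and accuracy in a Heuristic Multiple Kernel Learning (HMKL) way by exploiting the relationship between the kernel matrix $\mathbf{K}$ and the sample label vector $\mathbf{y}$. The relationship can be expressed by the kernel alignment

$$F(\mathbf{K}, \mathbf{y}\mathbf{y}^{\mathrm{T}}) = \frac{\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{\mathrm{T}} \rangle_F}{\sqrt{\langle \mathbf{K}, \mathbf{K} \rangle_F \langle \mathbf{y}\mathbf{y}^{\mathrm{T}}, \mathbf{y}\mathbf{y}^{\mathrm{T}} \rangle_F}} = \frac{\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{\mathrm{T}} \rangle_F}{n \sqrt{\langle \mathbf{K}, \mathbf{K} \rangle_F}}, \tag{13}$$

where $\langle \mathbf{K}, \mathbf{y}\mathbf{y}^{\mathrm{T}} \rangle_F = \sum_{i=1}^{n} \sum_{j=1}^{n} K(\mathbf{x}_i, \mathbf{x}_j) y_i y_j$ and $\langle \cdot, \cdot \rangle_F$ is the Frobenius inner product. The kernel alignment is then used to weigh the proportion of the multikernels:

$$\mu_m = \frac{F(K_m, \mathbf{y}\mathbf{y}^{\mathrm{T}})}{\sum_{h=1}^{M} F(K_h, \mathbf{y}\mathbf{y}^{\mathrm{T}})}, \quad \forall m. \tag{14}$$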
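The alignment-based weighting in (13) and (14) can be sketched as follows. The function names and toy matrices are illustrative assumptions; the sketch also assumes labels in $\{-1, +1\}$ (so that $\langle \mathbf{y}\mathbf{y}^{\mathrm{T}}, \mathbf{y}\mathbf{y}^{\mathrm{T}} \rangle_F = n^2$) and positive alignment scores, so that the normalized weights are meaningful.

```python
import math

def frobenius(A, B):
    """Frobenius inner product <A, B>_F of two equally sized matrices."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def alignment(K, y):
    """Kernel alignment F(K, y y^T) of (13), assuming labels in {-1, +1}."""
    n = len(y)
    yyT = [[yi * yj for yj in y] for yi in y]
    return frobenius(K, yyT) / (n * math.sqrt(frobenius(K, K)))

def hmkl_weights(grams, y):
    """Kernel weights of (14): alignments normalized to sum to one."""
    scores = [alignment(K, y) for K in grams]
    total = sum(scores)
    return [s / total for s in scores]

y = [1, 1, -1, -1]
K_ideal = [[yi * yj for yj in y] for yi in y]            # perfectly aligned kernel
K_identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = hmkl_weights([K_ideal, K_identity], y)
```

In this toy case the ideal kernel has alignment 1 and the identity kernel has alignment 0.5, so the better-aligned kernel receives twice the weight.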
Then the concentration bound is added to the kernel alignment by Cortes et al. [21] to form the centering kernel:

$$[K^c]_{ij} = K_{ij} - \frac{1}{n} \sum_{i=1}^{n} K_{ij} - \frac{1}{n} \sum_{j=1}^{n} K_{ij} + \frac{1}{n^2} \sum_{i,j=1}^{n} K_{ij}. \tag{15}$$
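The centering operation in (15) subtracts the column mean and the row mean from each entry and adds back the overall mean. A minimal sketch on a plain list-of-lists Gram matrix (the function name and the toy matrix are assumptions for illustration):

```python
def center_kernel(K):
    """Centered kernel of (15): K_ij - column mean - row mean + overall mean."""
    n = len(K)
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    row_mean = [sum(K[i][j] for j in range(n)) / n for i in range(n)]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - col_mean[j] - row_mean[i] + total_mean
             for j in range(n)] for i in range(n)]

K = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 4.0]]
K_c = center_kernel(K)
```

A quick sanity check of the result is that every row and every column of the centered matrix sums to zero.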
Accordingly, the multikernel weights of this Centering Multiple Kernel Learning (CMKL) method are

$$\boldsymbol{\mu} = \frac{\mathbf{C}^{-1} \mathbf{a}}{\left\| \mathbf{C}^{-1} \mathbf{a} \right\|_2}, \tag{16}$$

where $\mathbf{C} = \left\{ \langle K_m^c, K_h^c \rangle_F \right\}_{m,h=1}^{M}$ and $\mathbf{a} = \left\{ \langle K_m^c, \mathbf{y}\mathbf{y}^{\mathrm{T}} \rangle_F \right\}_{m=1}^{M}$.
Later, Cortes et al. [22] studied a Polynomial Multiple Kernel Learning (PMKL) method, which utilized the polynomial combination of the base kernels with higher degree ($d \ge 1$) based on the Kernel Ridge Regression (KRR) theory:

$$\begin{aligned}
K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) &= \sum \mu_{k_1 k_2 \cdots k_M} K_1(\mathbf{x}_i^1, \mathbf{x}_j^1)^{k_1} \cdot K_2(\mathbf{x}_i^2, \mathbf{x}_j^2)^{k_2} \cdots K_M(\mathbf{x}_i^M, \mathbf{x}_j^M)^{k_M}, \\
& k_m \ge 0, \quad k_1 + k_2 + \cdots + k_M \le d, \quad \mu_{k_1 k_2 \cdots k_M} \ge 0.
\end{aligned} \tag{17}$$
However, the computational complexity of the coefficients $\mu_{k_1 k_2 \cdots k_M}$ is $O(M^d)$, which is too large to apply in practice. So $\mu_{k_1 k_2 \cdots k_M}$ can be simplified into a product form of nonnegative coefficients, $\mu_1^{k_1} \mu_2^{k_2} \cdots \mu_M^{k_M}$, and the special case ($d = 2$) can be expressed as

$$K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M} \sum_{h=1}^{M} \mu_m \mu_h K_m(\mathbf{x}_i^m, \mathbf{x}_j^m) K_h(\mathbf{x}_i^h, \mathbf{x}_j^h). \tag{18}$$
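The quadratic combination in (18) can be computed either by the double sum as written or, equivalently, as the entrywise square of the weighted sum of the base kernels. The sketch below checks this equivalence on precomputed Gram matrices; the function names and toy values are assumptions for illustration.

```python
def pmkl_quadratic(mu, grams):
    """Double sum of (18), evaluated entry by entry."""
    n, M = len(grams[0]), len(grams)
    return [[sum(mu[m] * mu[h] * grams[m][i][j] * grams[h][i][j]
                 for m in range(M) for h in range(M))
             for j in range(n)] for i in range(n)]

def pmkl_quadratic_fast(mu, grams):
    """Same kernel as the entrywise square of sum_m mu_m K_m."""
    n = len(grams[0])
    return [[sum(w * K[i][j] for w, K in zip(mu, grams)) ** 2
             for j in range(n)] for i in range(n)]

mu = [0.25, 0.75]
grams = [[[1.0, 0.5], [0.5, 1.0]],
         [[1.0, 0.2], [0.2, 1.0]]]
K_slow = pmkl_quadratic(mu, grams)
K_fast = pmkl_quadratic_fast(mu, grams)
```

The second form needs $M$ multiplications per entry instead of $M^2$, which matters when many base kernels are combined.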
Here, the related optimization of learning $K_{\mu}$ can be formulated as the following min-max form:

$$\min_{\boldsymbol{\mu} \in \varphi} \max_{\boldsymbol{\alpha} \in \mathbb{R}^n} \; -\boldsymbol{\alpha}^{\mathrm{T}} \left( K_{\mu} + \lambda \mathbf{I} \right) \boldsymbol{\alpha} + 2 \boldsymbol{\alpha}^{\mathrm{T}} \mathbf{y}, \tag{19}$$

where $\varphi$ is a positive, bounded, and convex set. Two bounded sets, based on the $\ell_1$-norm and the $\ell_2$-norm, are the appropriate choices to construct $\varphi$:

$$\varphi_1 = \left\{ \boldsymbol{\mu} \mid \boldsymbol{\mu} \ge 0, \left\| \boldsymbol{\mu} - \boldsymbol{\mu}_0 \right\|_1 \le \Lambda \right\}, \qquad \varphi_2 = \left\{ \boldsymbol{\mu} \mid \boldsymbol{\mu} \ge 0, \left\| \boldsymbol{\mu} - \boldsymbol{\mu}_0 \right\|_2 \le \Lambda \right\}. \tag{20}$$

Here, $\boldsymbol{\mu}_0$ and $\Lambda$ are model parameters, and $\boldsymbol{\mu}_0$ is generally equal to 0 or $\boldsymbol{\mu}_0 / \|\boldsymbol{\mu}_0\| = 1$.
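As a worked step, the inner maximization over $\boldsymbol{\alpha}$ in (19) has a closed form for any fixed $\boldsymbol{\mu}$. The derivation below assumes $\lambda > 0$ and a symmetric positive semidefinite Gram matrix $K_{\mu}$, so that $K_{\mu} + \lambda \mathbf{I}$ is invertible and the objective is concave in $\boldsymbol{\alpha}$.

```latex
\nabla_{\boldsymbol{\alpha}} \left[ -\boldsymbol{\alpha}^{\mathrm{T}} (K_{\mu} + \lambda \mathbf{I}) \boldsymbol{\alpha} + 2 \boldsymbol{\alpha}^{\mathrm{T}} \mathbf{y} \right]
  = -2 (K_{\mu} + \lambda \mathbf{I}) \boldsymbol{\alpha} + 2 \mathbf{y} = 0
\;\Longrightarrow\;
\boldsymbol{\alpha}^{*} = (K_{\mu} + \lambda \mathbf{I})^{-1} \mathbf{y},
\qquad
\max_{\boldsymbol{\alpha}} \left[ -\boldsymbol{\alpha}^{\mathrm{T}} (K_{\mu} + \lambda \mathbf{I}) \boldsymbol{\alpha} + 2 \boldsymbol{\alpha}^{\mathrm{T}} \mathbf{y} \right]
  = \mathbf{y}^{\mathrm{T}} (K_{\mu} + \lambda \mathbf{I})^{-1} \mathbf{y}.
```

Under these assumptions, (19) reduces to minimizing $\mathbf{y}^{\mathrm{T}} (K_{\mu} + \lambda \mathbf{I})^{-1} \mathbf{y}$ over $\boldsymbol{\mu} \in \varphi$.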
Other than the approaches described above, inspired by the consistency between group Lasso and MKL [26], Xu et al. [23] and Kloft et al. [24] propose an MKL iterative method in a generalized $\ell_p$-norm ($p \ge 1$) form. They are collectively called the Arbitrary Norms Multiple Kernel Learning (ANMKL) method. On the basis of the duality condition $\|\mathbf{w}_m\|_2^2 = \mu_m^2 \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_m(\mathbf{x}_i^m, \mathbf{x}_j^m)$, the update formula of the kernel weights is

$$\mu_m = \frac{\left\| \mathbf{w}_m \right\|_2^{2/(p+1)}}{\left( \sum_{h=1}^{M} \left\| \mathbf{w}_h \right\|_2^{2p/(p+1)} \right)^{1/p}}, \quad \forall m. \tag{21}$$
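The update in (21) only needs the current norms $\|\mathbf{w}_m\|_2$. The sketch below takes those norms as given (computing them from $\boldsymbol{\alpha}$ through the duality condition is omitted); the function name and the toy norms are assumptions for illustration.

```python
def anmkl_update(w_norms, p):
    """Kernel weights of (21) from the norms ||w_m||_2 for a given p >= 1."""
    numerators = [w ** (2.0 / (p + 1)) for w in w_norms]
    denominator = sum(w ** (2.0 * p / (p + 1)) for w in w_norms) ** (1.0 / p)
    return [v / denominator for v in numerators]

w_norms = [3.0, 1.0]                 # toy values of ||w_1||_2 and ||w_2||_2
mu_l1 = anmkl_update(w_norms, p=1)   # variant (1): l1-norm
mu_l2 = anmkl_update(w_norms, p=2)   # variant (2): l2-norm
```

By construction the updated weights satisfy $\sum_m \mu_m^p = 1$, so for $p = 1$ they reduce to the norms divided by their sum, and for $p = 2$ they have unit Euclidean length.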
It can be seen from the formulas in this section that the operation complexity of MKL is mainly decided by $\mathbf{x}_i$. So trying to simplify the feature space is an efficient way to improve the performance of MKL. Through the optimization of basis vectors, LRR can reduce dimension while retaining the data features, which is ideal for improving MKL.
4. MKL Using Low-Rank Representation

4.1. Low-Rank Representation (LRR). The theoretical advances on LRR enable us to use the latent low-rank structure in data [27, 28]. LRR simultaneously obtains the representation of all samples under a global low-rank constraint. Meanwhile, the LRR procedure can operate in a relatively short time with guaranteed performance.

Let the input sample space $\mathbf{X}$ be represented by a linear combination in the dictionary $\mathbf{A}$:

$$\mathbf{X} = \mathbf{A}\mathbf{Z}, \tag{22}$$
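where $\mathbf{Z} = [\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n]$ is the coefficient matrix and each $\mathbf{z}_i$ is a representation coefficient vector of $\mathbf{x}_i$. When the samples are sufficient, $\mathbf{X}$ serves as the dictionary $\mathbf{A}$. By considering the noise or tainted data in practical applications, LRR aims at approximating $\mathbf{X}$ by $\mathbf{A}\mathbf{Z} + \mathbf{E}$ by means of minimizing the rank of the coefficient matrix $\mathbf{Z}$ while reducing the $\ell_0$-norm of $\mathbf{E}$, in which $\mathbf{A}\mathbf{Z}$ is the low-rank component and $\mathbf{E}$ is the associated sparse error. It can be generally formulated as

$$\min_{\mathbf{Z}, \mathbf{E}} \; \operatorname{rank}(\mathbf{Z}) + \lambda \left\| \mathbf{E} \right\|_0 \quad \text{s.t.} \quad \mathbf{X} = \mathbf{A}\mathbf{Z} + \mathbf{E}. \tag{23}$$

Here, $\lambda$ is used to balance the effect of the low-rank term and the error term. Since minimizing the $\ell_0$-norm is an NP-hard problem, it can be replaced by the $\ell_1$-norm or the $\ell_{2,1}$-norm. We choose the $\ell_{2,1}$-norm as the error term measurement here, which is defined as $\|\mathbf{E}\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{n} ([\mathbf{E}]_{ij})^2}$. Meanwhile, the rank function can be relaxed into the nuclear norm $\|\cdot\|_*$ [29]. Consequently, the convex relaxation of formula (23) is

$$\min_{\mathbf{Z}, \mathbf{E}} \; \left\| \mathbf{Z} \right\|_* + \lambda \left\| \mathbf{E} \right\|_{2,1} \quad \text{s.t.} \quad \mathbf{X} = \mathbf{A}\mathbf{Z} + \mathbf{E}. \tag{24}$$

The optimal solution $\mathbf{Z}^*$ can be obtained via the Augmented Lagrange Multipliers (ALM) method [11].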
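To illustrate the error term in (24), the sketch below computes the $\ell_{2,1}$-norm of a matrix and the column-wise shrinkage that solves $\min_{\mathbf{E}} \tau \|\mathbf{E}\|_{2,1} + \frac{1}{2}\|\mathbf{E} - \mathbf{Q}\|_F^2$, which is the kind of closed-form step used for the $\mathbf{E}$ update in ALM solvers for LRR [11]. This is an assumption-laden fragment, not the full solver: the function names and toy matrix are hypothetical, and the update of $\mathbf{Z}$ (singular value thresholding) is omitted.

```python
import math

def l21_norm(E):
    """Sum over columns of the Euclidean norm of each column."""
    rows, cols = len(E), len(E[0])
    return sum(math.sqrt(sum(E[i][j] ** 2 for i in range(rows)))
               for j in range(cols))

def l21_shrink(Q, tau):
    """Column-wise shrinkage: scale each column by max(0, 1 - tau / ||q_j||)."""
    rows, cols = len(Q), len(Q[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        norm = math.sqrt(sum(Q[i][j] ** 2 for i in range(rows)))
        if norm > tau:
            scale = (norm - tau) / norm
            for i in range(rows):
                out[i][j] = scale * Q[i][j]
    return out

Q = [[3.0, 0.1],
     [4.0, 0.0]]
E = l21_shrink(Q, tau=1.0)
```

Columns whose norm is below the threshold are set to zero entirely, which is why the $\ell_{2,1}$-norm encourages errors that are concentrated in a few samples (columns).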
4.2. Efficient SVM and MKL Using LRR. The kernel matrix remarkably impacts the computational efficiency and accuracy of SVM and MKL. How to find an appropriate variant of the kernel matrix that contains both the initial label and the data geometry structure for recognition is a crucial task. Since LRR has been theoretically proved to be superior, in the sequel we adopt LRR to transform the kernel for augmenting the similarities among the intraclass samples and the differences among the interclass samples. Moreover, a representation of all samples under a global low-rank constraint can be attained, which is more conducive to capturing the global data structure [30]. So LR-SVM and LR-MKL are two alternative techniques that we propose to improve the performance of SVM and MKL.
Firstly, based on the LRR theory, we improve the monokernel SVM as the reference item, from which the improvement brought by LRR can be displayed visually. The specific procedure of the efficient LR-SVM is presented in Algorithm 1.

Algorithm 1 (efficient SVM using LRR (LR-SVM)).

Input. This includes the whole training set $\{\mathbf{X}, \mathbf{Y}\}$, the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$, the parameters $t$, $\gamma$, $C$, $q$ of SVM, and the parameter $\lambda$ of LRR.

Step 1. Normalize $\mathbf{X}$, $\mathbf{X}_S$.

Step 2. Perform the procedure (24) on the normalized $\mathbf{X}$, $\mathbf{X}_S$ to project them onto the coefficient feature spaces $\mathbf{Z}$, $\mathbf{Z}_S$, respectively.

Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into SVM for training the classification model.

Step 4. Utilize the obtained classification model to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$; the discriminant function is $f(\mathbf{z}) = \sum_{i=1}^{n} \alpha_i^* y_i K(\mathbf{z}_i, \mathbf{z}) + b$.

Output. Compare the actual label vector of the test set $\mathbf{Y}_S$ and the prediction label vector $\mathbf{Y}_P$ to obtain the recognition results.
It is well known that SVM suffers from instability across various data structures. Thus, MKL recognition becomes the development trend. Next, we combine LRR and the MKL algorithms mentioned in Section 3 and change the binary-classification model into a multiclassification model by the pairwise (one-versus-one) strategy, through which a classifier between any two categories of samples ($k$ is the number of categories) can be designed. Then we adopt the voting method and assign each sample to the category that obtains the most votes. All the combined algorithms can be summarized into a framework, which is given in Algorithm 2, and we refer to it collectively as LR-MKL.
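The pairwise strategy with voting can be sketched independently of the underlying binary classifier. In the sketch below, `binary_predict` is a hypothetical callback standing in for a trained binary MKL classifier between two categories, and the nearest-center rule on one-dimensional samples is a toy assumption used only to make the example runnable.

```python
from collections import Counter
from itertools import combinations

def pairwise_vote(classes, binary_predict, sample):
    """Run one binary classifier per pair of classes and return the majority label."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[binary_predict(a, b, sample)] += 1
    return votes.most_common(1)[0][0]

# Toy binary rule (assumed): pick whichever class center is nearer to the sample.
centers = {0: 0.0, 1: 5.0, 2: 10.0}

def nearest_center(a, b, sample):
    return a if abs(sample - centers[a]) <= abs(sample - centers[b]) else b

label = pairwise_vote(sorted(centers), nearest_center, 4.2)
n_classifiers = len(list(combinations(range(15), 2)))  # k = 15 categories, as in Yale
```

For $k$ categories this requires $k(k-1)/2$ binary classifiers, for example 105 for the 15 individuals of the Yale database.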
Algorithm 2 (efficient MKL using LRR (LR-MKL)).

Input. This includes the whole training set $\{\mathbf{X}, \mathbf{Y}\}$, the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$, and the parameter $\lambda$ of LRR.

Steps 1 and 2. They are the same as in the LR-SVM algorithm.

Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into MKL to train $k(k-1)/2$ classifiers with the pairwise strategy.

Step 4. Utilize each one of the binary MKL classifiers to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$.

Step 5. According to the prediction label vectors $\mathbf{Y}_{P1}, \mathbf{Y}_{P2}, \ldots, \mathbf{Y}_{P\,k(k-1)/2}$, vote for the category of each sample to get the multilabels $\mathbf{Y}_P$.

Output. Compare the actual label vector of the test set $\mathbf{Y}_S$ and the prediction label vector $\mathbf{Y}_P$ to obtain the recognition results and the kernel weight vector $\boldsymbol{\mu}$.

5. Experiments and Analysis
In this section, we conduct extensive experiments to examine the efficiency of the proposed LR-SVM and LR-MKL algorithms. The operating environment is MATLAB (R2013a) on an Intel Core i5 CPU with a frequency of 2.53 GHz. The SVM toolbox used in this paper is LIBSVM [31], which can be easily applied and is shown to be fast on large scale databases.

The simulations are performed on diverse datasets to ensure the universal recognition effect. The test datasets range over the frequently used face databases and the standard test data of the UCI repository. In the simulations, all the samples are normalized first.
(1) Yale face database (http://vision.ucsd.edu/content/yale-face-database): it contains 165 grayscale images of 15 individuals with different facial expressions or configurations, and each image is resized to 64 × 64 pixels with 256 grey levels.
(2) ORL face database (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.htm): it contains 400 images of 40 distinct subjects taken at different times with varying light, facial expressions, and details. We resize them to 64 × 64 pixels with 256 grey levels per pixel.
(3) LSVT Voice Rehabilitation dataset (http://archive.ics.uci.edu/ml/datasets/LSVT+Voice+Rehabilitation) [32]: it is composed of 126 speech signals from 14 people with 309 features, divided into two categories.
(4) Multiple Features Digit dataset (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): it includes 2000 digitized handwritten numerals 0–9 with 649 features.
5.1. Experiments on LR-SVM. In order to demonstrate the recognition performance of SVM improved by the presented LR-SVM, we carry out numerous experiments on the Yale and ORL face databases. According to the different rates of training samples (20%, 30%, 40%, 50%, 60%, 70%, and 80%), we implement seven groups of experiments on each database. To ensure a stable and reliable test, each group has ten different random divisions, and we average them as the final results. The kernel functions are $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, and $K_{\mathrm{SIG}}$, with $q = 3$ and $\gamma = 1/g$ ($g$ is the dimension of the feature space).
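For orientation, the four base kernels can be written out as plain functions. The forms below follow the LIBSVM conventions for the linear, polynomial, RBF, and sigmoid kernels, with the offset `coef0` left at 0; whether these match the exact definitions used in the paper is an assumption, and the function names and toy vectors are illustrative only.

```python
import math

def dot(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))

def k_lin(u, v):
    """Linear kernel: <u, v>."""
    return dot(u, v)

def k_pol(u, v, gamma, q=3, coef0=0.0):
    """Polynomial kernel of degree q: (gamma * <u, v> + coef0) ** q."""
    return (gamma * dot(u, v) + coef0) ** q

def k_rbf(u, v, gamma):
    """RBF kernel: exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def k_sig(u, v, gamma, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <u, v> + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)

u, v = [1.0, 2.0], [3.0, 4.0]
gamma = 1.0 / len(u)  # gamma = 1/g, with g the feature dimension
values = (k_lin(u, v), k_pol(u, v, gamma), k_rbf(u, v, gamma), k_sig(u, v, gamma))
```

With $\gamma = 1/g$ the kernel arguments are scaled by the feature dimension, which keeps the polynomial and sigmoid kernels in a comparable range when the samples are normalized.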
The classification accuracy and run time of the Yale database by using SVM and LR-SVM are shown in Figures 1 and 2, respectively. Similarly, the classification accuracy and run time of the ORL database are shown in Figures 3 and 4. The solid lines depict the results of SVM with different kernels, while the patterned lines with the corresponding colour depict those of LR-SVM. As can be seen from Figures 1 and 3, the proposed LR-SVM method consistently achieves an obvious improvement in classification accuracy compared to the original SVM method. In most cases, the classification accuracy increases with the rise in training sample rate, which shows that the more complete the training set, the better the classification accuracy. But it is impossible for the training set to include so many samples in reality. LR-SVM has a high accuracy even under a low training sample rate, which is suitable for real applications. Meanwhile, Figures 2 and 4 show that, through the LRR conversion, the run time can be reduced by more than an order of magnitude, which meets the real-time requirements of data processing in the big data era.
Figure 1: Classification accuracy (%) of Yale versus training sample rate (%) by using SVM and LR-SVM (curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG).
Figure 2: Run time (s) of Yale versus training sample rate (%) by using SVM and LR-SVM (curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, LR-SIG).

5.2. Experiments on LR-MKL. In this section, we compare the performance of the MKL algorithms involved in Section 3
and their corresponding LR-MKL algorithms The multik-ernel is composed of 119870LIN 119870POL 119870RBF 119870SIG (119902 = 3) Theproportion parameter vector of kernel is 120583 = [1205831 1205832 1205833 1205834]The comparative algorithms are listed below
(i) Unweighted MKL (UMKL) [16] and LR-UMKL (+)indicates sum form and (lowast) indicates product form
(ii) Alternative MKL (AMKL) [17] and LR-AMKL(iii) Generalized MKL (GMKL) [18] and LR-GMKL(iv) Localized MKL (LMKL) [19] and LR-LMKL (sof)
distribute 120583 into softmax mode and (sig) distribute120583 into sigmoid mode
30 40 50 60 70 8020Training sample rate ()
65
70
75
80
85
90
Accu
racy
()
LINLR-LINRBFLR-RBF
POLLR-POLSIGLR-SIG
Figure 3 Classification accuracy of ORL by using SVM and LR-SVM
0
05
1
15
2
25
3
Run
time (
s)
30 40 50 60 70 8020Training sample rate ()
LINLR-LINRBFLR-RBF
POLLR-POLSIGLR-SIG
Figure 4 Run time of ORL by using SVM and LR-SVM
(v) Heuristic MKL (HMKL) [20] and LR-HMKL(vi) Centering MKL (CMKL) [21] and LR-CMKL(vii) Polynomial MKL (PMKL) [22] and LR-PMKL (1)
adopts the bounded set 1205931 with 1198971-norm and (2)adopts the bounded set 1205932 with 1198972-norm
(viii) Arbitrary Norm MKL (ANMKL) [23 24] and LR-ANMKL (1) iterates 120583 with 1198971-norm and (2) iterates120583 with 1198972-norm
(ix) Besides the highest accuracy among the four monok-ernel SVM selected as the reference item which isreferred to as SVM(best)
Table 1: The performances of MKL algorithms and LR-MKL algorithms on the datasets Yale, ORL, LSVT, and Digit (Acc: classification accuracy in %; Time: run time in s).
Algorithm           Yale Acc   Yale Time   ORL Acc    ORL Time   LSVT Acc   LSVT Time   Digit Acc   Digit Time
SVM(best)           69.3231    0.2981      80.0002    1.6798     76.1905    0.0142      95.9302     0.9195
LR-SVM(best)        79.8667    0.0109      87.0015    0.1575     77.6637    0.0043      96.3304     0.0972
UMKL(+) [16]        87.5114    2.5981      93.5798    18.2352    78.5714    0.0229      97.3571     4.7412
LR-UMKL(+)          95.8936    0.3018      83.4813    1.0817     73.8095    0.0152      98.1595     2.0278
UMKL(*) [16]        58.3248    2.7244      77.5022    20.5507    66.6935    0.0281      96.0904     7.1661
LR-UMKL(*)          93.5739    0.2636      93.4063    2.1836     67.0033    0.0176      98.4618     3.6392
AMKL [17]           85.7753    3.7236      93.8741    4.6854     80.9524    0.0452      97.4725     11.2138
LR-AMKL             94.3799    0.3869      96.9417    0.4592     88.0952    0.0085      98.6952     6.8804
GMKL [18]           86.2989    4.5330      96.2057    5.0070     85.7143    0.0565      99.1499     8.0774
LR-GMKL             95.8015    0.6253      98.5833    0.7761     88.9286    0.0183      99.4673     4.5972
LMKL(sof) [19]      87.9077    215.4055    97.0003    220.3122   85.0090    5.1989      99.8898     166.7978
LR-LMKL(sof)        97.9352    22.7204     98.9724    17.3791    86.7429    1.1933      98.3591     98.1812
LMKL(sig) [19]      88.0145    106.7552    97.0108    107.0911   88.7541    0.7238      99.3750     48.5914
LR-LMKL(sig)        98.0667    15.9711     97.9979    11.7240    92.6627    0.4594      99.5625     24.5927
HMKL [20]           63.6037    92.4410     93.5109    118.2340   80.5998    0.0915      97.6258     10.3559
LR-HMKL             91.9611    8.5972      98.6893    10.9368    85.1625    0.0352      98.3479     6.3959
CMKL [21]           86.4166    95.0618     96.0308    107.6940   79.9503    0.0874      96.5014     10.6074
LR-CMKL             94.0083    10.6380     98.4799    12.9746    93.9024    0.0347      98.9113     6.2696
PMKL(1) [22]        89.0035    6.1842      98.4901    6.9065     92.8571    0.1079      99.5881     24.8702
LR-PMKL(1)          99.1429    0.9053      99.5712    0.9828     95.9386    0.0573      100.0000    12.9831
PMKL(2) [22]        89.0261    5.3893      98.7533    6.5450     92.4662    0.1295      99.5046     21.1679
LR-PMKL(2)          98.8968    0.8494      99.5828    0.8911     95.5145    0.0651      99.7941     13.5108
ANMKL(1) [23, 24]   86.7210    6.4856      98.4396    20.4564    91.9827    0.1167      98.0007     10.3979
LR-ANMKL(1)         96.8667    0.7041      99.4643    2.8519     92.2247    0.0247      99.2850     6.9070
ANMKL(2) [23, 24]   86.6998    7.0664      98.2204    21.1615    93.0035    0.1194      98.0039     9.7753
LR-ANMKL(2)         97.2917    0.8863      99.2857    3.0597     92.5391    0.0224      99.2497     5.0374
We conduct experiments on the test datasets Yale, ORL, LSVT Voice Rehabilitation (LSVT for short), and Multiple Features Digit (Digit for short). For each dataset, 60% of the samples are drawn at random to train the classification model, and the remaining samples serve as the test set. From the grid-search results for $\gamma$ and the penalty factor $C$, we find that the classification accuracy varies little when $\gamma$ and $C$ range over a certain interval, so there is no need to search the whole parameter space, which would inevitably increase the computational cost. We therefore fix $\gamma$ at $1/g$ and try the values 0.01, 0.1, 1, 10, and 100 for $C$, assigning to $C$ the value with the highest average accuracy on the $5 \times 2$ cross-validation sets. Each algorithm is run 10 times independently, and the averages are reported as the final results. The recognition performance of the algorithms, measured by classification accuracy and run time, is given in Table 1. Bold numbers mark the better result between each original algorithm and its LRR-combined counterpart, and numbers in italic font mark the algorithms whose recognition precision is inferior to SVM(best).
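The selection of $C$ described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `five_by_two_splits`, `select_C`, and the `evaluate` callback are hypothetical names, and the callback stands in for training a classifier with a given $C$ and returning its validation accuracy.

```python
import random

C_GRID = [0.01, 0.1, 1, 10, 100]  # candidate penalty factors named in the text


def five_by_two_splits(n_samples, seed=0):
    """Yield (train_idx, test_idx) pairs for 5 repetitions of 2-fold CV."""
    rng = random.Random(seed)
    for _ in range(5):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        half = n_samples // 2
        fold_a, fold_b = idx[:half], idx[half:]
        # Each half serves once for training and once for validation.
        yield fold_a, fold_b
        yield fold_b, fold_a


def select_C(n_samples, evaluate, grid=C_GRID, seed=0):
    """Return (best_C, mean_accuracy) over the 10 splits.

    `evaluate(C, train_idx, test_idx)` is a hypothetical callback that
    trains with penalty C on train_idx and returns accuracy on test_idx.
    """
    splits = list(five_by_two_splits(n_samples, seed))
    best_C, best_acc = None, -1.0
    for C in grid:
        acc = sum(evaluate(C, tr, te) for tr, te in splits) / len(splits)
        if acc > best_acc:
            best_C, best_acc = C, acc
    return best_C, best_acc
```

Because the same ten splits are reused for every candidate, differences in mean accuracy reflect the choice of $C$ rather than the random partition.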
In most cases, the proposed LR-MKL methods achieve better results than the original MKL methods, with higher classification accuracy and shorter run time. This indicates that LRR can augment the similarities among intraclass samples and the differences among interclass samples while simplifying the kernel matrix. Note that UMKL(*) fails to achieve satisfactory recognition in many cases and is even less accurate than SVM(best); however, combining it with LRR improves its results to a large extent. This illustrates that simply combining kernels without regard to the data structure is ineffective and that LRR can offset part of the consequences of an irrational weight distribution. In general, PMKL, ANMKL, and their improved algorithms give the best recognition results; in particular, the improved algorithms keep the accuracy above 90 percent on all four datasets. In terms of run time, MKL is clearly much slower than SVM, because MKL must allocate the kernel weights and this process can be very time consuming. Among the MKL algorithms, LMKL is the slowest and fails to satisfy the real-time requirement. The combined LR-MKL reduces the run time severalfold, in some cases by more than one order of magnitude, so it can speed up high-precision MKL toward the real-time requirement. In brief, the proposed LR-MKL boosts the performance of MKL to a great extent.
6. Conclusion
The complexity of solving the convex quadratic optimization problem in MKL is $O(Nn^{3.5})$, so MKL is infeasible for large-scale problems because of its computational cost. Our effort has therefore been directed at decreasing the dimension of the training set, noting that LRR can capture the global structure of the data in relatively few dimensions. We have given a review of several existing MKL algorithms and, on this basis, proposed a novel combined LR-MKL, which largely improves the performance of MKL. A large number of experiments have been carried out on four real-world datasets to contrast the recognition effects of various MKL and LR-MKL algorithms. The results show that, in most cases, the MKL algorithms recognize better than SVM(best), with the exception of UMKL(*), and that the proposed LR-MKL methods achieve better results than the original MKL methods. Among them, PMKL, ANMKL, and their improved algorithms show the best recognition results.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 51208168), Hebei Province Natural Science Foundation (no. E2016202341), and Tianjin Natural Science Foundation (no. 13JCYBJC37700).
References
[1] V. N. Vapnik, "An overview of statistical learning theory," IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988-999, 1999.
[2] M. Hu, Y. Chen, and J. T.-Y. Kwok, "Building sparse multiple-kernel SVM classifiers," IEEE Transactions on Neural Networks, vol. 20, no. 5, pp. 827-839, 2009.
[3] X. Wang, X. Liu, N. Japkowicz, and S. Matwin, "Ensemble of multiple kernel SVM classifiers," Advances in Artificial Intelligence, vol. 8436, pp. 239-250, 2014.
[4] E. Hassan, S. Chaudhury, N. Yadav, P. Kalra, and M. Gopal, "Off-line handwritten input based identity determination using multi kernel feature combination," Pattern Recognition Letters, vol. 35, no. 1, pp. 113-119, 2014.
[5] F. Cai and V. Cherkassky, "Generalized SMO algorithm for SVM-based multitask learning," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 997-1003, 2012.
[6] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, "Online passive-aggressive algorithms," Journal of Machine Learning Research, vol. 7, pp. 551-585, 2006.
[7] N. Cesa-Bianchi, A. Conconi, and C. Gentile, "On the generalization ability of on-line learning algorithms," Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 50, no. 9, pp. 2050-2057, 2004.
[8] S. Fine and K. Scheinberg, "Efficient SVM training using low-rank kernel representations," Journal of Machine Learning Research, vol. 2, no. 2, pp. 243-264, 2002.
[9] S. Zhou, "Sparse LSSVM in primal using Cholesky factorization for large-scale problems," IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 783-795, 2016.
[10] L. Jia, S.-Z. Liao, and L.-Z. Ding, "Learning with uncertain kernel matrix set," Journal of Computer Science and Technology, vol. 25, no. 4, pp. 709-727, 2010.
[11] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, "Robust recovery of subspace structures by low-rank representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171-184, 2013.
[12] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, "RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233-2246, 2012.
[13] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, "Multi-task low-rank affinity pursuit for image segmentation," in Proceedings of the IEEE International Conference on Computer Vision (ICCV '11), pp. 2439-2446, IEEE, Barcelona, Spain, November 2011.
[14] Y. Mu, J. Dong, X. Yuan, and S. Yan, "Accelerated low-rank visual recovery by random projection," in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, pp. 2609-2616, Colorado Springs, Colo, USA, June 2011.
[15] J. Chen, J. Zhou, and J. Ye, "Integrating low-rank and group-sparse structures for robust multi-task learning," in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'11, pp. 42-50, San Diego, Calif, USA, August 2011.
[16] P. Pavlidis, J. Cai, J. Weston, and W. N. Grundy, "Gene functional classification from heterogeneous data," in Proceedings of the 5th Annual International Conference on Computational Biology, pp. 249-255, Montreal, Canada, May 2001.
[17] O. Chapelle and A. Rakotomamonjy, "Second order optimization of kernel parameters," NIPS Workshop on Kernel Learning, 2008.
[18] M. Varma and B. R. Babu, "More generality in efficient multiple kernel learning," in Proceedings of the 26th International Conference on Machine Learning, ICML 2009, pp. 1065-1072, Montreal, Canada, June 2009.
[19] M. Gonen and E. Alpaydin, "Localized multiple kernel learning," in Proceedings of the 25th International Conference on Machine Learning, pp. 352-359, Helsinki, Finland, July 2008.
[20] S. Qiu and T. Lane, "A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 190-199, 2009.
[21] C. Cortes, M. Mohri, and A. Rostamizadeh, "Two-stage learning kernel algorithms," in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 239-246, Haifa, Israel, June 2010.
[22] C. Cortes, M. Mohri, and A. Rostamizadeh, "Learning non-linear combinations of kernels," in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS '09), pp. 396-404, December 2009.
[23] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu, "Simple and efficient multiple kernel learning by group lasso," in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 1175-1182, Haifa, Israel, June 2010.
[24] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, "Non-sparse regularization and efficient training with multiple kernels," arXiv preprint, https://arxiv.org/abs/1003.0079.
[25] M. Gonen and E. Alpaydın, "Multiple kernel learning algorithms," Journal of Machine Learning Research, vol. 12, pp. 2211-2268, 2011.
[26] F. R. Bach, "Consistency of the group lasso and multiple kernel learning," Journal of Machine Learning Research, vol. 9, no. 2, pp. 1179-1225, 2008.
[27] E. J. Candes, X. Li, Y. Ma, and J. Wright, "Robust principal component analysis," Journal of the ACM, vol. 58, no. 3, 2011.
[28] C.-F. Chen, C.-P. Wei, and Y.-C. F. Wang, "Low-rank matrix recovery with structural incoherence for robust face recognition," in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2618-2625, Providence, RI, USA, June 2012.
[29] J.-F. Cai, E. J. Candes, and Z. Shen, "A singular value thresholding algorithm for matrix completion," SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956-1982, 2010.
[30] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, "Non-negative low rank and sparse graph for semi-supervised learning," in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2328-2335, USA, June 2012.
[31] C. Chang and C. Lin, "LIBSVM: a library for support vector machines," ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[32] A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig, "Objective automatic assessment of rehabilitative speech treatment in Parkinson's disease," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 181-190, 2014.
Later, Cortes et al. [22] studied a Polynomial Multiple Kernel Learning (PMKL) method, which utilizes a polynomial combination of the base kernels of higher degree ($d \ge 1$) based on Kernel Ridge Regression (KRR) theory:
$$K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum \mu_{k_1 k_2 \cdots k_M} K_1(\mathbf{x}_i^1, \mathbf{x}_j^1)^{k_1} K_2(\mathbf{x}_i^2, \mathbf{x}_j^2)^{k_2} \cdots K_M(\mathbf{x}_i^M, \mathbf{x}_j^M)^{k_M}, \quad k_m \ge 0,\ k_1 + k_2 + \cdots + k_M \le d,\ \mu_{k_1 k_2 \cdots k_M} \ge 0. \tag{17}$$
However, the computational complexity of the coefficients $\mu_{k_1 k_2 \cdots k_M}$ is $O(M^d)$, which is too large to apply in practice. So $\mu_{k_1 k_2 \cdots k_M}$ can be simplified into a product of nonnegative coefficients $\mu_1^{k_1} \mu_2^{k_2} \cdots \mu_M^{k_M}$, and the special case $d = 2$ can be expressed as
$$K_{\mu}(\mathbf{x}_i, \mathbf{x}_j) = \sum_{m=1}^{M} \sum_{h=1}^{M} \mu_m \mu_h K_m(\mathbf{x}_i^m, \mathbf{x}_j^m) K_h(\mathbf{x}_i^h, \mathbf{x}_j^h). \tag{18}$$
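As an illustration of the degree-2 combination in (18), the following sketch builds the combined Gram matrix from precomputed base Gram matrices. It is a minimal example under stated assumptions: the function names are hypothetical, and Gram matrices are plain nested lists. The second function uses the fact that the double sum factorizes into the square of the weighted sum of base kernels, entry by entry.

```python
def pmkl_degree2_kernel(kernels, mu):
    """Combine M base Gram matrices as in Eq. (18), the degree d = 2 case.

    kernels: list of M Gram matrices, each an n-by-n list of lists.
    mu:      list of M nonnegative kernel weights.
    """
    if any(w < 0 for w in mu):
        raise ValueError("kernel weights must be nonnegative")
    n = len(kernels[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Double sum over m and h of mu_m * mu_h * K_m[i][j] * K_h[i][j].
            K[i][j] = sum(
                mu[m] * mu[h] * kernels[m][i][j] * kernels[h][i][j]
                for m in range(len(mu))
                for h in range(len(mu))
            )
    return K


def pmkl_degree2_kernel_fast(kernels, mu):
    """Same kernel via the factorization (sum_m mu_m * K_m[i][j]) ** 2."""
    n = len(kernels[0])
    return [
        [sum(w * Km[i][j] for w, Km in zip(mu, kernels)) ** 2 for j in range(n)]
        for i in range(n)
    ]
```

For two 2-by-2 base matrices [[1, 2], [2, 1]] and [[0, 1], [1, 0]] with weights 1 and 2, both functions return [[1, 16], [16, 1]]; the factorized form needs $O(M)$ work per entry instead of $O(M^2)$.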
Here the related optimization for learning $K_{\mu}$ can be formulated in the following min-max form:
$$\min_{\mu \in \varphi} \max_{\alpha \in \mathbb{R}^n} \; -\alpha^{\mathrm{T}} (K_{\mu} + \lambda \mathbf{I}) \alpha + 2 \alpha^{\mathrm{T}} \mathbf{y}, \tag{19}$$
where $\varphi$ is a positive, bounded, and convex set. Two bounded sets, defined by the $l_1$-norm and the $l_2$-norm, are appropriate choices for constructing $\varphi$:
$$\varphi_1 = \{\mu \mid \mu \ge 0,\ \|\mu - \mu_0\|_1 \le \Lambda\}, \qquad \varphi_2 = \{\mu \mid \mu \ge 0,\ \|\mu - \mu_0\|_2 \le \Lambda\}. \tag{20}$$
Here $\mu_0$ and $\Lambda$ are model parameters, and $\mu_0$ is generally chosen so that $\mu_0 = 0$ or $\mu_0 / \|\mu_0\| = 1$.
In addition to the approaches described above, and inspired by the consistency between group Lasso and MKL [26], Xu et al. [23] and Kloft et al. [24] proposed an iterative MKL method in a generalized $l_p$-norm ($p \ge 1$) form. These methods are collectively called the Arbitrary Norms Multiple Kernel Learning (ANMKL) method. On the basis of the duality condition $\|\mathbf{w}_m\|_2^2 = \mu_m^2 \sum_{i=1}^{n} \sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K_m(\mathbf{x}_i^m, \mathbf{x}_j^m)$, the update formula for the kernel weights is
$$\mu_m = \frac{\|\mathbf{w}_m\|_2^{2/(p+1)}}{\left( \sum_{h=1}^{M} \|\mathbf{w}_h\|_2^{2p/(p+1)} \right)^{1/p}}, \quad \forall m. \tag{21}$$
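The weight update in (21) can be sketched in a few lines. This is an illustrative example rather than the implementation of [23, 24]: `squared_norms` and `update_weights` are hypothetical names, and the dual variables $\alpha$ are assumed to come from an SVM solver that is not shown here.

```python
def squared_norms(kernels, mu, alpha, y):
    """||w_m||_2^2 = mu_m^2 * sum_ij y_i y_j alpha_i alpha_j K_m[i][j]."""
    n = len(y)
    ay = [alpha[i] * y[i] for i in range(n)]
    return [
        mu[m] ** 2
        * sum(ay[i] * ay[j] * Km[i][j] for i in range(n) for j in range(n))
        for m, Km in enumerate(kernels)
    ]


def update_weights(sq_norms, p):
    """Closed-form l_p-norm update of Eq. (21).

    The returned weights are nonnegative and have unit l_p norm.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    norms = [max(s, 0.0) ** 0.5 for s in sq_norms]
    denom = sum(v ** (2.0 * p / (p + 1)) for v in norms) ** (1.0 / p)
    if denom == 0.0:
        raise ValueError("all ||w_m|| are zero; weights are undefined")
    return [v ** (2.0 / (p + 1)) / denom for v in norms]
```

In an iterative scheme, these two functions would alternate with an SVM solve on the combined kernel until the weights stop changing; the stopping rule is left out of this sketch.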
It can be seen from the formulas in this section that the computational complexity of MKL is mainly determined by $\mathbf{x}_i$. Simplifying the feature space is therefore an efficient way to improve the performance of MKL. By optimizing the basis vectors, LRR can reduce the dimension while retaining the data features, which is ideal for improving MKL.
4. MKL Using Low-Rank Representation
4.1. Low-Rank Representation (LRR). Theoretical advances on LRR enable us to exploit the latent low-rank structure in data [27, 28]. LRR simultaneously obtains the representation of all samples under a global low-rank constraint. Meanwhile, the LRR procedure can run in a relatively short time with guaranteed performance.
Let the input sample space $\mathbf{X}$ be represented by a linear combination in the dictionary $\mathbf{A}$:
$$\mathbf{X} = \mathbf{A}\mathbf{Z}, \tag{22}$$
where $\mathbf{Z} = [\mathbf{z}_1, \mathbf{z}_2, \ldots, \mathbf{z}_n]$ is the coefficient matrix and each $\mathbf{z}_i$ is the representation coefficient vector of $\mathbf{x}_i$. When the samples are sufficient, $\mathbf{X}$ itself serves as the dictionary $\mathbf{A}$. To account for noise or tainted data in practical applications, LRR approximates $\mathbf{X}$ by $\mathbf{A}\mathbf{Z} + \mathbf{E}$, minimizing the rank of the coefficient matrix $\mathbf{Z}$ while reducing the $l_0$-norm of $\mathbf{E}$, where $\mathbf{A}\mathbf{Z}$ is the low-rank component and $\mathbf{E}$ is the associated sparse error. It can be generally formulated as
$$\min_{\mathbf{Z}, \mathbf{E}} \ \operatorname{rank}(\mathbf{Z}) + \lambda \|\mathbf{E}\|_0 \quad \text{s.t.} \quad \mathbf{X} = \mathbf{A}\mathbf{Z} + \mathbf{E}. \tag{23}$$
Here $\lambda$ balances the low-rank term against the error term. Because minimizing the $l_0$-norm is NP-hard, it can be replaced by the $l_1$-norm or the $l_{2,1}$-norm. We choose the $l_{2,1}$-norm as the error term measurement here, defined as $\|\mathbf{E}\|_{2,1} = \sum_{j=1}^{n} \sqrt{\sum_{i=1}^{n} ([\mathbf{E}]_{ij})^2}$. Meanwhile, the rank function can be relaxed into the nuclear norm $\|\cdot\|_*$ [29]. Consequently, the convex relaxation of formula (23) is
$$\min_{\mathbf{Z}, \mathbf{E}} \ \|\mathbf{Z}\|_* + \lambda \|\mathbf{E}\|_{2,1} \quad \text{s.t.} \quad \mathbf{X} = \mathbf{A}\mathbf{Z} + \mathbf{E}. \tag{24}$$
The optimal solution $\mathbf{Z}^*$ can be obtained via the Augmented Lagrange Multipliers (ALM) method [11].
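The relaxed problem (24) can be illustrated with a short solver. The sketch below follows the inexact ALM scheme as it is commonly described for LRR, with an auxiliary variable J constrained to equal Z; it is not the authors' implementation. The function name `lrr_inexact_alm` and the defaults for `rho`, `penalty`, `penalty_max`, `tol`, and `max_iter` are assumptions chosen for illustration, NumPy is assumed to be available, and the dictionary defaults to A = X as in the text.

```python
import numpy as np


def lrr_inexact_alm(X, lam=0.1, A=None, rho=1.1, penalty=1e-6,
                    penalty_max=1e6, tol=1e-6, max_iter=1000):
    """Approximately solve min ||Z||_* + lam * ||E||_{2,1} s.t. X = A Z + E.

    X is d-by-n with one sample per column; A defaults to X itself.
    Returns the coefficient matrix Z and the column-sparse error E.
    """
    X = np.asarray(X, dtype=float)
    A = X if A is None else np.asarray(A, dtype=float)
    d, n = X.shape
    m = A.shape[1]
    Z = np.zeros((m, n))
    J = np.zeros((m, n))
    E = np.zeros((d, n))
    Y1 = np.zeros((d, n))   # multiplier for X = A Z + E
    Y2 = np.zeros((m, n))   # multiplier for Z = J
    inv = np.linalg.inv(np.eye(m) + A.T @ A)
    for _ in range(max_iter):
        # J-step: singular value thresholding with threshold 1 / penalty.
        U, s, Vt = np.linalg.svd(Z + Y2 / penalty, full_matrices=False)
        J = (U * np.maximum(s - 1.0 / penalty, 0.0)) @ Vt
        # Z-step: closed-form least-squares update.
        Z = inv @ (A.T @ (X - E) + J + (A.T @ Y1 - Y2) / penalty)
        # E-step: column-wise shrinkage, the proximal map of the l_{2,1} norm.
        Q = X - A @ Z + Y1 / penalty
        col_norms = np.linalg.norm(Q, axis=0)
        scale = np.maximum(col_norms - lam / penalty, 0.0) / np.maximum(col_norms, 1e-12)
        E = Q * scale
        # Multiplier and penalty updates.
        R1 = X - A @ Z - E
        R2 = Z - J
        Y1 += penalty * R1
        Y2 += penalty * R2
        penalty = min(rho * penalty, penalty_max)
        if max(np.abs(R1).max(), np.abs(R2).max()) < tol:
            break
    return Z, E
```

The matrix inverse is computed once because it does not change between iterations. In the LR-SVM and LR-MKL procedures, the columns of the returned Z would play the role of the coefficient features passed on to the classifier.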
4.2. Efficient SVM and MKL Using LRR. The kernel matrix strongly affects the computational efficiency and accuracy of SVM and MKL. Finding an appropriate variant of the kernel matrix that contains both the initial label information and the geometric structure of the data is therefore a crucial task for recognition. Since LRR has been theoretically proved to be superior, in the sequel we adopt LRR to transform the kernel so as to augment the similarities among intraclass samples and the differences among interclass samples. Moreover, a representation of all samples under a global low-rank constraint can be attained, which is more conducive to capturing the global data structure [30]. We therefore propose LR-SVM and LR-MKL as two techniques for improving the performance of SVM and MKL.
Firstly, based on LRR theory, we improve the monokernel SVM as the reference item, from which the improvement brought by LRR can be seen directly. The specific procedure of the efficient LR-SVM is presented in Algorithm 1.
Algorithm 1 (efficient SVM using LRR (LR-SVM)).
Input: The whole training set $\mathbf{X}$, $\mathbf{Y}$; the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$; the parameters $t$, $\gamma$, $C$, $q$ of SVM; and the parameter $\lambda$ of LRR.
Step 1. Normalize $\mathbf{X}$ and $\mathbf{X}_S$.
Step 2. Perform procedure (24) on the normalized $\mathbf{X}$ and $\mathbf{X}_S$ to project them onto the coefficient feature spaces $\mathbf{Z}$ and $\mathbf{Z}_S$, respectively.
Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into SVM to train the classification model.
Step 4. Use the obtained classification model to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$; the discriminant function is $f(\mathbf{z}) = \sum_{i=1}^{n} \alpha_i^{*} y_i K(\mathbf{z}_i, \mathbf{z}) + b$.
Output: Compare the actual label vector $\mathbf{Y}_S$ of the test set with the predicted label vector $\mathbf{Y}_P$ to obtain the recognition results.
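Step 4 of Algorithm 1 can be illustrated by evaluating the discriminant function directly. The sketch assumes that the dual coefficients and the bias have already been produced by an SVM solver such as LIBSVM; the function names and the linear kernel used as a default are hypothetical choices made to keep the example short.

```python
def linear_kernel(u, v):
    """Plain inner product, used here only to keep the example short."""
    return sum(a * b for a, b in zip(u, v))


def decision_value(z, train_points, labels, alphas, b, kernel=linear_kernel):
    """Evaluate f(z) = sum_i alpha_i* y_i K(z_i, z) + b."""
    return sum(
        alpha * y * kernel(z_i, z)
        for alpha, y, z_i in zip(alphas, labels, train_points)
    ) + b


def predict(z, train_points, labels, alphas, b, kernel=linear_kernel):
    """Binary label in {+1, -1}, taken as the sign of f(z) (0 maps to +1)."""
    value = decision_value(z, train_points, labels, alphas, b, kernel)
    return 1 if value >= 0 else -1
```

Here `train_points` holds the LRR coefficient vectors of the training samples, so the classifier works in the coefficient feature space rather than on the raw features.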
It is well known that SVM is unstable across different data structures, so MKL recognition has become the development trend. Next, we combine LRR with the MKL algorithms mentioned in Section 3 and turn the binary-classification model into a multiclassification model by the pairwise (one-versus-one) strategy, in which a classifier is designed between every two categories of samples, giving $k(k-1)/2$ classifiers for $k$ categories. We then adopt the voting method and assign each sample to the category that obtains the most votes. All the combined algorithms can be summarized in a single framework, given in Algorithm 2, which we refer to collectively as LR-MKL.
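The pairwise strategy and the vote can be sketched as follows. The helper names are hypothetical, the binary classifiers themselves are not shown, and the tie-breaking rule (smallest class label wins) is an assumption, since the text does not specify one.

```python
from collections import Counter
from itertools import combinations


def class_pairs(classes):
    """All k(k-1)/2 unordered class pairs used by the one-versus-one strategy."""
    return list(combinations(sorted(classes), 2))


def vote(pairwise_predictions):
    """Assign each sample to the class that wins the most pairwise votes.

    pairwise_predictions maps a class pair (a, b) to a list holding one
    predicted label (a or b) per test sample.
    Ties go to the smallest class label (an assumption for this sketch).
    """
    n_samples = len(next(iter(pairwise_predictions.values())))
    labels = []
    for s in range(n_samples):
        counts = Counter(preds[s] for preds in pairwise_predictions.values())
        top = max(counts.values())
        labels.append(min(c for c, v in counts.items() if v == top))
    return labels
```

For the 15 subjects of the Yale database this gives 105 binary classifiers, and for the 40 subjects of ORL it gives 780, which is one reason the run time of each binary MKL solve matters.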
Algorithm 2 (efficient MKL using LRR (LR-MKL)).
Input: The whole training set $\mathbf{X}$, $\mathbf{Y}$; the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$; and the parameter $\lambda$ of LRR.
Steps 1-2. They are the same as in the LR-SVM algorithm.
Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into MKL to train $k(k-1)/2$ classifiers with the pairwise strategy.
Step 4. Use each of the binary MKL classifiers to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$.
Step 5. According to the predicted label vectors $\mathbf{Y}_{P_1}, \mathbf{Y}_{P_2}, \ldots, \mathbf{Y}_{P_{k(k-1)/2}}$, vote for the category of each sample to get the multiclass labels $\mathbf{Y}_P$.
Output: Compare the actual label vector $\mathbf{Y}_S$ of the test set with the predicted label vector $\mathbf{Y}_P$ to obtain the recognition results and the kernel weight vector $\mu$.
5. Experiments and Analysis
In this section, we conduct extensive experiments to examine the efficiency of the proposed LR-SVM and LR-MKL algorithms. The experiments run in MATLAB (R2013a) on an Intel Core i5 CPU at 2.53 GHz. The SVM toolbox used in this paper is LIBSVM [31], which is easy to apply and has been shown to be fast on large-scale databases.
The simulations are performed on diverse datasets to ensure that the recognition results are general. The test datasets range over frequently used face databases and standard test data from the UCI repository. In the simulations, all the samples are normalized first.
(1) Yale face database (http://vision.ucsd.edu/content/yale-face-database): it contains 165 grayscale images of 15 individuals with different facial expressions or configurations, and each image is resized to 64 × 64 pixels with 256 grey levels.
(2) ORL face database (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html): it contains 400 images of 40 distinct subjects taken at different times with varying lighting, facial expressions, and details. We resize them to 64 × 64 pixels with 256 grey levels per pixel.
(3) LSVT Voice Rehabilitation dataset (http://archive.ics.uci.edu/ml/datasets/LSVT+Voice+Rehabilitation) [32]: it is composed of 126 speech signals from 14 people with 309 features, divided into two categories.
(4) Multiple Features Digit dataset (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): it includes 2000 digitized handwritten numerals 0-9 with 649 features.
5.1. Experiments on LR-SVM. To demonstrate how the presented LR-SVM improves the recognition performance of SVM, we carry out experiments on the Yale and ORL face databases. According to the training sample rate (20%, 30%, 40%, 50%, 60%, 70%, and 80%), we run seven groups of experiments on each database. To ensure stable and reliable tests, each group uses ten different random divisions, and we average them as the final results. The kernel functions are $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, and $K_{\mathrm{SIG}}$, with $q = 3$ and $\gamma = 1/g$ ($g$ is the dimension of the feature space).
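The four base kernels can be written out as below. The exact forms are an assumption: they follow the parameterization used by LIBSVM (linear, polynomial, RBF, and sigmoid kernels with `coef0` defaulting to 0), and $q$ is taken to be the polynomial degree. The helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def k_lin(u, v):
    """Linear kernel: u . v"""
    return dot(u, v)


def k_pol(u, v, gamma, q=3, coef0=0.0):
    """Polynomial kernel: (gamma * u . v + coef0) ** q"""
    return (gamma * dot(u, v) + coef0) ** q


def k_rbf(u, v, gamma):
    """RBF kernel: exp(-gamma * ||u - v||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def k_sig(u, v, gamma, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * u . v + coef0)"""
    return math.tanh(gamma * dot(u, v) + coef0)


def gram(kernel, samples, **params):
    """Gram matrix of one kernel over a list of samples."""
    return [[kernel(a, b, **params) for b in samples] for a in samples]
```

With $\gamma = 1/g$, a call such as `gram(k_rbf, samples, gamma=1.0 / len(samples[0]))` produces the RBF Gram matrix for samples of dimension $g$.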
The classification accuracy and run time on the Yale database using SVM and LR-SVM are shown in Figures 1 and 2, respectively. Similarly, the classification accuracy and run time on the ORL database are shown in Figures 3 and 4. The solid lines depict the results of SVM with different kernels, while the patterned lines of the corresponding colour depict those of LR-SVM. As Figures 1 and 3 show, the proposed LR-SVM method consistently achieves an obvious improvement in classification accuracy over the original SVM method. In most cases, the classification accuracy increases with the training sample rate: the more complete the training set, the better the classification accuracy. In reality, however, the training set cannot include so many samples. LR-SVM keeps a high accuracy even at low training sample rates, which makes it suitable for real applications. Meanwhile, Figures 2 and 4 show that the LRR conversion reduces the run time by more than an order of magnitude, which suits the real-time requirements of data processing in the big data era.
Figure 1: Classification accuracy of Yale by using SVM and LR-SVM [accuracy (%) versus training sample rate (%); curves LIN, RBF, POL, SIG and LR-LIN, LR-RBF, LR-POL, LR-SIG].
Figure 2: Run time of Yale by using SVM and LR-SVM [run time (s) versus training sample rate (%); same curves as Figure 1].
5.2. Experiments on LR-MKL. In this section, we compare the performance of the MKL algorithms involved in Section 3 and their corresponding LR-MKL algorithms. The multikernel is composed of $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, and $K_{\mathrm{SIG}}$ ($q = 3$). The proportion parameter vector of the kernels is $\mu = [\mu_1, \mu_2, \mu_3, \mu_4]$. The comparative algorithms are UMKL, AMKL, GMKL, LMKL, HMKL, CMKL, PMKL, and ANMKL, each with its LR-MKL counterpart, together with SVM(best) as the reference item.
References
[1] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.
[2] M. Hu, Y. Chen, and J. T.-Y. Kwok, “Building sparse multiple-kernel SVM classifiers,” IEEE Transactions on Neural Networks, vol. 20, no. 5, pp. 827–839, 2009.
[3] X. Wang, X. Liu, N. Japkowicz, and S. Matwin, “Ensemble of multiple kernel SVM classifiers,” Advances in Artificial Intelligence, vol. 8436, pp. 239–250, 2014.
[4] E. Hassan, S. Chaudhury, N. Yadav, P. Kalra, and M. Gopal, “Off-line handwritten input based identity determination using multi kernel feature combination,” Pattern Recognition Letters, vol. 35, no. 1, pp. 113–119, 2014.
[5] F. Cai and V. Cherkassky, “Generalized SMO algorithm for SVM-based multitask learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 997–1003, 2012.
[6] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, “Online passive-aggressive algorithms,” Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.
[7] N. Cesa-Bianchi, A. Conconi, and C. Gentile, “On the generalization ability of on-line learning algorithms,” Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 50, no. 9, pp. 2050–2057, 2004.
[8] S. Fine and K. Scheinberg, “Efficient SVM training using low-rank kernel representations,” Journal of Machine Learning Research, vol. 2, no. 2, pp. 243–264, 2002.
[9] S. Zhou, “Sparse LSSVM in primal using cholesky factorization for large-scale problems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 783–795, 2016.
[10] L. Jia, S.-Z. Liao, and L.-Z. Ding, “Learning with uncertain kernel matrix set,” Journal of Computer Science and Technology, vol. 25, no. 4, pp. 709–727, 2010.
[11] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
[12] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233–2246, 2012.
[13] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, “Multi-task low-rank affinity pursuit for image segmentation,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV ’11), pp. 2439–2446, IEEE, Barcelona, Spain, November 2011.
[14] Y. Mu, J. Dong, X. Yuan, and S. Yan, “Accelerated low-rank visual recovery by random projection,” in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, pp. 2609–2616, Colorado Springs, Colo, USA, June 2011.
[15] J. Chen, J. Zhou, and J. Ye, “Integrating low-rank and group-sparse structures for robust multi-task learning,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’11, pp. 42–50, San Diego, Calif, USA, August 2011.
[16] P. Pavlidis, J. Cai, J. Weston, and W. N. Grundy, “Gene functional classification from heterogeneous data,” in Proceedings of the 5th Annual International Conference on Computational Biology, pp. 249–255, Montreal, Canada, May 2001.
[17] O. Chapelle and A. Rakotomamonjy, “Second order optimization of kernel parameters,” NIPS Workshop on Kernel Learning, 2008.
[18] M. Varma and B. R. Babu, “More generality in efficient multiple kernel learning,” in Proceedings of the 26th International Conference on Machine Learning, ICML 2009, pp. 1065–1072, Montreal, Canada, June 2009.
[19] M. Gonen and E. Alpaydin, “Localized multiple kernel learning,” in Proceedings of the 25th international conference, pp. 352–359, Helsinki, Finland, July 2008.
[20] S. Qiu and T. Lane, “A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 190–199, 2009.
[21] C. Cortes, M. Mohri, and A. Rostamizadeh, “Two-stage learning kernel algorithms,” in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 239–246, Haifa, Israel, June 2010.
[22] C. Cortes, M. Mohri, and A. Rostamizadeh, “Learning non-linear combinations of kernels,” in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS ’09), pp. 396–404, December 2009.
[23] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu, “Simple and efficient multiple kernel learning by group lasso,” in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 1175–1182, Haifa, Israel, June 2010.
[24] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, “Non-sparse regularization and efficient training with multiple kernels,” Arxiv Preprint abs/1003, https://arxiv.org/abs/1003.0079.
[25] M. Gonen and E. Alpaydın, “Multiple kernel learning algorithms,” Journal of Machine Learning Research, vol. 12, pp. 2211–2268, 2011.
[26] F. R. Bach, “Consistency of the group lasso and multiple kernel learning,” Journal of Machine Learning Research, vol. 9, no. 2, pp. 1179–1225, 2008.
[27] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis,” Journal of the ACM, vol. 58, no. 3, 2011.
[28] C.-F. Chen, C.-P. Wei, and Y.-C. F. Wang, “Low-rank matrix recovery with structural incoherence for robust face recognition,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2618–2625, Providence, RI, USA, June 2012.
[29] J.-F. Cai, E. J. Candes, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
[30] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, “Non-negative low rank and sparse graph for semi-supervised learning,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2328–2335, USA, June 2012.
[31] C. Chang and C. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[32] A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig, “Objective automatic assessment of rehabilitative speech treatment in parkinson’s disease,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 181–190, 2014.
Step 1. Normalize $\mathbf{X}$, $\mathbf{X}_S$.
Step 2. Perform the procedure (24) on the normalized $\mathbf{X}$, $\mathbf{X}_S$ to project them onto the coefficient feature spaces $\mathbf{Z}$, $\mathbf{Z}_S$, respectively.
Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into SVM to train the classification model.
Step 4. Utilize the obtained classification model to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$; the discriminant function is $f(\mathbf{z}) = \sum_{i=1}^{n} \alpha_i^{*} y_i K(\mathbf{z}_i, \mathbf{z}) + b$.
Output. Compare the actual label vector of the test set $\mathbf{Y}_S$ and the prediction label vector $\mathbf{Y}_P$ to obtain the recognition results.
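The discriminant function in Step 4 can be sketched in a few lines. The following is a minimal illustration, not the authors' MATLAB/LIBSVM code: the multipliers, labels, bias, and the RBF kernel with `gamma=0.5` are hand-picked toy values (assumptions made only for this example), whereas in practice they would come from a trained SVM.

```python
import math


def rbf_kernel(u, v, gamma):
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2), one possible choice of K."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(z, train_points, labels, alphas, bias, kernel):
    """Evaluate f(z) = sum_i alpha_i * y_i * K(z_i, z) + b."""
    return sum(a * y * kernel(zi, z)
               for zi, y, a in zip(train_points, labels, alphas)) + bias


def predict(z, train_points, labels, alphas, bias, kernel):
    """Return the sign of f(z); a value of exactly 0 is mapped to +1."""
    value = decision_value(z, train_points, labels, alphas, bias, kernel)
    return 1 if value >= 0 else -1


# Toy values chosen by hand for illustration; they are not trained multipliers.
Z_TRAIN = [(0.0, 0.0), (2.0, 2.0)]
Y_TRAIN = [-1, 1]
ALPHAS = [1.0, 1.0]
BIAS = 0.0


def toy_kernel(u, v):
    """RBF kernel with an arbitrary illustrative width."""
    return rbf_kernel(u, v, gamma=0.5)


print(predict((1.8, 2.1), Z_TRAIN, Y_TRAIN, ALPHAS, BIAS, toy_kernel))
print(predict((0.2, -0.1), Z_TRAIN, Y_TRAIN, ALPHAS, BIAS, toy_kernel))
```

A test point is assigned to the class whose training points dominate the kernel-weighted sum, which is why points near (2, 2) receive the positive label here.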
It is well known that SVM suffers from instability across various data structures. Thus, MKL recognition has become the development trend. Next, we combine LRR with the MKL algorithms mentioned in Section 3 and change the binary classification model into a multiclass model by the pairwise (one-versus-one) strategy, through which a classifier is designed between every two categories of samples ($k$ is the number of categories). Then we adopt the voting method and assign each sample to the category that obtains the most votes. All the combined algorithms can be summarized into a framework, which is given in Algorithm 2, and we refer to them collectively as LR-MKL.
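The pairwise strategy with voting can be sketched as follows. This is a hedged illustration only: `nearest_mean_trainer` is a hypothetical stand-in for a binary MKL or SVM trainer (it is not the paper's method), and breaking ties toward the smallest class label is an arbitrary convention adopted for the example.

```python
from collections import Counter
from itertools import combinations


def nearest_mean_trainer(xs, ts):
    """Toy stand-in for a binary trainer; returns a scoring function.

    The score is positive when x is closer to the mean of the +1 samples.
    """
    def mean(points):
        return [sum(c) / len(points) for c in zip(*points)]

    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    pos = mean([x for x, t in zip(xs, ts) if t == 1])
    neg = mean([x for x, t in zip(xs, ts) if t == -1])
    return lambda x: dist2(x, neg) - dist2(x, pos)


def train_pairwise(samples, labels, train_binary):
    """Train one binary classifier per unordered pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(labels)), 2):
        pair = [(x, 1 if y == a else -1)
                for x, y in zip(samples, labels) if y in (a, b)]
        models[(a, b)] = train_binary([x for x, _ in pair], [t for _, t in pair])
    return models


def predict_by_vote(models, x):
    """Each pairwise classifier casts one vote; the most voted class wins."""
    votes = Counter()
    for (a, b), clf in sorted(models.items()):
        votes[a if clf(x) >= 0 else b] += 1
    best = max(votes.values())
    # Ties are broken toward the smallest label (an arbitrary choice).
    return min(c for c, v in votes.items() if v == best)


SAMPLES = [(0, 0), (0, 1), (5, 0), (5, 1), (0, 5), (1, 5)]
LABELS = [0, 0, 1, 1, 2, 2]
MODELS = train_pairwise(SAMPLES, LABELS, nearest_mean_trainer)
print(len(MODELS), predict_by_vote(MODELS, (4.5, 0.5)))
```

For $k$ classes the dictionary holds $k(k-1)/2$ classifiers, so the cost of the pairwise scheme grows quadratically with the number of categories.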
Algorithm 2 (efficient MKL using LRR (LR-MKL)).
Input. This includes the whole training set $\mathbf{X}$, $\mathbf{Y}$, the feature space of the testing set $\mathbf{X}_S = [\mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots, \mathbf{x}_{n+s}]$, and the parameter $\lambda$ of LRR.
Steps 1 and 2. They are the same as in the LR-SVM algorithm.
Step 3. Plug $\mathbf{Z}$ and the label vector $\mathbf{Y}$ into MKL to train $k(k-1)/2$ classifiers with the pairwise strategy.
Step 4. Utilize each one of the binary MKL classifiers to classify the coefficient feature $\mathbf{Z}_S$ of the testing set $\mathbf{X}_S$.
Step 5. According to the prediction label vectors $\mathbf{Y}_{P_1}, \mathbf{Y}_{P_2}, \ldots, \mathbf{Y}_{P_{k(k-1)/2}}$, vote for the category of each sample to get the multilabels $\mathbf{Y}_P$.
Output. Compare the actual label vector of the test set $\mathbf{Y}_S$ and the prediction label vector $\mathbf{Y}_P$ to obtain the recognition results and the kernel weight vector $\mu$.
5 Experiments and Analysis
In this section, we conduct extensive experiments to examine the efficiency of the proposed LR-SVM and LR-MKL algorithms. The operating environment is MATLAB (R2013a) on an Intel Core i5 CPU at 2.53 GHz. The SVM toolbox used in this paper is LIBSVM [31], which can be easily applied and has been shown to be fast on large scale databases.
The simulations are performed on diverse datasets to ensure a universal recognition effect. The test datasets range over the frequently used face databases and the standard test data of the UCI repository. In the simulations, all the samples are normalized first.
(1) Yale face database (http://vision.ucsd.edu/content/yale-face-database): it contains 165 grayscale images of 15 individuals with different facial expressions or configurations, and each image is resized to 64 × 64 pixels with 256 grey levels.
(2) ORL face database (http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.htm): it contains 400 images of 40 distinct subjects taken at different times, with varying light, facial expressions, and details. We resize them to 64 × 64 pixels with 256 grey levels per pixel.
(3) LSVT Voice Rehabilitation dataset (http://archive.ics.uci.edu/ml/datasets/LSVT+Voice+Rehabilitation) [32]: it is composed of 126 speech signals from 14 people with 309 features, divided into two categories.
(4) Multiple Features Digit dataset (http://archive.ics.uci.edu/ml/datasets/Multiple+Features): it includes 2000 digitized handwritten numerals 0–9 with 649 features.
5.1 Experiments on LR-SVM. In order to demonstrate the recognition performance of SVM improved by the presented LR-SVM, we carry out numerous experiments on the Yale and ORL face databases. According to the different rates of training samples (20%, 30%, 40%, 50%, 60%, 70%, and 80%), we implement seven groups of experiments on each database. To ensure a stable and reliable test, each group has ten different random divisions, and we average them as the final results. The kernel functions are $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, and $K_{\mathrm{SIG}}$ ($q = 3$) and $\gamma = 1/g$ ($g$ is the dimension of the feature space).
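For concreteness, the four kernels can be sketched as below. The exact kernel definitions belong to Section 3 and are not restated here, so the forms used are an assumption: they follow the common LIBSVM-style parameterization with `coef0 = 0`, polynomial degree `q = 3`, and `gamma = 1/g`. The function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def k_lin(u, v):
    """Linear kernel u . v."""
    return dot(u, v)


def k_pol(u, v, gamma, q=3, coef0=0.0):
    """Polynomial kernel (gamma * u . v + coef0)^q."""
    return (gamma * dot(u, v) + coef0) ** q


def k_rbf(u, v, gamma):
    """Gaussian RBF kernel exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def k_sig(u, v, gamma, coef0=0.0):
    """Sigmoid kernel tanh(gamma * u . v + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)


def gram_matrix(kernel, rows):
    """Kernel (Gram) matrix of a list of samples."""
    return [[kernel(u, v) for v in rows] for u in rows]


def four_gram_matrices(rows):
    """Build the LIN, POL, RBF, and SIG Gram matrices with gamma = 1/g."""
    gamma = 1.0 / len(rows[0])  # g is the feature dimension
    return {
        "LIN": gram_matrix(k_lin, rows),
        "POL": gram_matrix(lambda u, v: k_pol(u, v, gamma), rows),
        "RBF": gram_matrix(lambda u, v: k_rbf(u, v, gamma), rows),
        "SIG": gram_matrix(lambda u, v: k_sig(u, v, gamma), rows),
    }


ROWS = [(1.0, 2.0), (3.0, 4.0)]
MATS = four_gram_matrices(ROWS)
print(MATS["LIN"][0][1], MATS["POL"][0][1])
```

Tying `gamma` to the feature dimension keeps the kernel arguments on a comparable scale when the dimension changes, for example before and after the low-rank projection.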
The classification accuracy and run time on the Yale database using SVM and LR-SVM are shown in Figures 1 and 2, respectively. Similarly, the classification accuracy and run time on the ORL database are shown in Figures 3 and 4. The solid lines depict the results of SVM with different kernels, while the patterned lines with the corresponding colour depict those of LR-SVM. As can be seen from Figures 1 and 3, the proposed LR-SVM method consistently achieves an obvious improvement in classification accuracy compared to the original SVM method. In most cases, the classification accuracy increases with the rise in training sample rate. This shows that the more complete the training set is, the better the classification accuracy is. In reality, however, it is impossible for the training set to include so many samples. LR-MKL has a high accuracy even under a low training sample rate, which makes it suitable for real applications. Meanwhile, Figures 2 and 4 show that, through the LRR conversion, the run time can be reduced by more than an order of magnitude, which is reasonable for the real-time requirements of data processing in the big data era.
Figure 1: Classification accuracy of Yale by using SVM and LR-SVM (accuracy (%) versus training sample rate (%); curves: LIN, LR-LIN, RBF, LR-RBF, POL, LR-POL, SIG, and LR-SIG).
Figure 2: Run time of Yale by using SVM and LR-SVM (run time (s) versus training sample rate (%); same curves).
Figure 3: Classification accuracy of ORL by using SVM and LR-SVM (accuracy (%) versus training sample rate (%); same curves).
Figure 4: Run time of ORL by using SVM and LR-SVM (run time (s) versus training sample rate (%); same curves).

5.2 Experiments on LR-MKL. In this section, we compare the performance of the MKL algorithms involved in Section 3 and their corresponding LR-MKL algorithms. The multikernel is composed of $K_{\mathrm{LIN}}$, $K_{\mathrm{POL}}$, $K_{\mathrm{RBF}}$, and $K_{\mathrm{SIG}}$ ($q = 3$). The proportion parameter vector of the kernels is $\mu = [\mu_1, \mu_2, \mu_3, \mu_4]$. The comparative algorithms are listed below:
(i) Unweighted MKL (UMKL) [16] and LR-UMKL: (+) indicates the sum form and (∗) indicates the product form.
(ii) Alternative MKL (AMKL) [17] and LR-AMKL.
(iii) Generalized MKL (GMKL) [18] and LR-GMKL.
(iv) Localized MKL (LMKL) [19] and LR-LMKL: (sof) distributes $\mu$ in softmax mode and (sig) distributes $\mu$ in sigmoid mode.
(v) Heuristic MKL (HMKL) [20] and LR-HMKL.
(vi) Centering MKL (CMKL) [21] and LR-CMKL.
(vii) Polynomial MKL (PMKL) [22] and LR-PMKL: (1) adopts the bounded set $\varphi_1$ with the $l_1$-norm and (2) adopts the bounded set $\varphi_2$ with the $l_2$-norm.
(viii) Arbitrary Norm MKL (ANMKL) [23, 24] and LR-ANMKL: (1) iterates $\mu$ with the $l_1$-norm and (2) iterates $\mu$ with the $l_2$-norm.
(ix) Besides, the highest accuracy among the four monokernel SVMs is selected as the reference item, which is referred to as SVM(best).
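To make the roles of the sum form, the product form, and the weight vector $\mu$ concrete, the sketch below combines precomputed kernel matrices entrywise. It is an illustration under assumptions: the function names are hypothetical, the product is taken entry by entry, and the weighted linear sum is only the simplest way in which $\mu$ can enter (several of the listed algorithms use nonlinear or data-dependent combinations instead).

```python
def combine_sum(grams):
    """Entrywise sum of equally sized kernel matrices (the '(+)' form)."""
    n = len(grams[0])
    return [[sum(g[i][j] for g in grams) for j in range(n)] for i in range(n)]


def combine_product(grams):
    """Entrywise product of equally sized kernel matrices (the '(*)' form)."""
    n = len(grams[0])
    out = [[1.0] * n for _ in range(n)]
    for g in grams:
        for i in range(n):
            for j in range(n):
                out[i][j] *= g[i][j]
    return out


def combine_weighted(grams, mu):
    """Weighted linear combination sum_m mu_m * K_m."""
    if len(grams) != len(mu):
        raise ValueError("one weight per kernel matrix is required")
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(mu, grams)) for j in range(n)]
            for i in range(n)]


K1 = [[1.0, 0.5], [0.5, 1.0]]
K2 = [[2.0, 1.0], [1.0, 2.0]]
print(combine_sum([K1, K2]))
print(combine_product([K1, K2]))
print(combine_weighted([K1, K2], [0.25, 0.75]))
```

The unweighted forms need no optimization over $\mu$, which is consistent with their short run times but also means the combination cannot adapt to the data.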
Table 1: The performances of MKL algorithms and LR-MKL algorithms on the datasets Yale, ORL, LSVT, and Digit.

Algorithm            Yale Acc   Yale Time   ORL Acc   ORL Time   LSVT Acc   LSVT Time   Digit Acc   Digit Time
SVM(best)            69.3231    0.2981      80.0002   1.6798     76.1905    0.0142      95.9302     0.9195
LR-SVM(best)         79.8667    0.0109      87.0015   0.1575     77.6637    0.0043      96.3304     0.0972
UMKL(+) [16]         87.5114    2.5981      93.5798   18.2352    78.5714    0.0229      97.3571     4.7412
LR-UMKL(+)           95.8936    0.3018      83.4813   1.0817     73.8095    0.0152      98.1595     2.0278
UMKL(∗) [16]         58.3248    2.7244      77.5022   20.5507    66.6935    0.0281      96.0904     7.1661
LR-UMKL(∗)           93.5739    0.2636      93.4063   2.1836     67.0033    0.0176      98.4618     3.6392
AMKL [17]            85.7753    3.7236      93.8741   4.6854     80.9524    0.0452      97.4725     11.2138
LR-AMKL              94.3799    0.3869      96.9417   0.4592     88.0952    0.0085      98.6952     6.8804
GMKL [18]            86.2989    4.5330      96.2057   5.0070     85.7143    0.0565      99.1499     8.0774
LR-GMKL              95.8015    0.6253      98.5833   0.7761     88.9286    0.0183      99.4673     4.5972
LMKL(sof) [19]       87.9077    215.4055    97.0003   220.3122   85.0090    5.1989      99.8898     166.7978
LR-LMKL(sof)         97.9352    22.7204     98.9724   17.3791    86.7429    1.1933      98.3591     98.1812
LMKL(sig) [19]       88.0145    106.7552    97.0108   107.0911   88.7541    0.7238      99.3750     48.5914
LR-LMKL(sig)         98.0667    15.9711     97.9979   11.7240    92.6627    0.4594      99.5625     24.5927
HMKL [20]            63.6037    92.4410     93.5109   118.2340   80.5998    0.0915      97.6258     10.3559
LR-HMKL              91.9611    8.5972      98.6893   10.9368    85.1625    0.0352      98.3479     6.3959
CMKL [21]            86.4166    95.0618     96.0308   107.6940   79.9503    0.0874      96.5014     10.6074
LR-CMKL              94.0083    10.6380     98.4799   12.9746    93.9024    0.0347      98.9113     6.2696
PMKL(1) [22]         89.0035    6.1842      98.4901   6.9065     92.8571    0.1079      99.5881     24.8702
LR-PMKL(1)           99.1429    0.9053      99.5712   0.9828     95.9386    0.0573      100.0000    12.9831
PMKL(2) [22]         89.0261    5.3893      98.7533   6.5450     92.4662    0.1295      99.5046     21.1679
LR-PMKL(2)           98.8968    0.8494      99.5828   0.8911     95.5145    0.0651      99.7941     13.5108
ANMKL(1) [23, 24]    86.7210    6.4856      98.4396   20.4564    91.9827    0.1167      98.0007     10.3979
LR-ANMKL(1)          96.8667    0.7041      99.4643   2.8519     92.2247    0.0247      99.2850     6.9070
ANMKL(2) [23, 24]    86.6998    7.0664      98.2204   21.1615    93.0035    0.1194      98.0039     9.7753
LR-ANMKL(2)          97.2917    0.8863      99.2857   3.0597     92.5391    0.0224      99.2497     5.0374
We conduct experiments on the test datasets Yale, ORL, LSVT Voice Rehabilitation (LSVT for short), and Multiple Features Digit (Digit for short). For each dataset, 60% of the samples are drawn out randomly to train the classification model, and the remaining samples serve as the test set. From the results of optimizing $\gamma$ and the penalty factor $C$ by the grid search method, we find that the classification accuracy does not vary much when $\gamma$ and $C$ range within a certain interval. So there is no need to search the whole parameter space, which would inevitably increase the computational cost. The penalty factor $C$ is chosen by trying the values 0.01, 0.1, 1, 10, and 100, and $\gamma$ is fixed at $1/g$. Then we assign to $C$ the value which has the highest average accuracy on the 5 × 2 cross validation sets. Each algorithm is conducted with 10 independent runs, and we average them as the final results. The bold numbers represent the preferable recognition effect between the original algorithms and their LRR combined algorithms. The numbers in italic font denote the algorithms whose recognition precision is inferior to SVM(best). The recognition performance of the algorithms is measured by the classification accuracy and run time, as listed in Table 1.
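The selection of $C$ described above can be sketched as follows. This is a minimal illustration under assumptions: `evaluate` is a hypothetical callback that trains on one half and returns the accuracy on the other half, `toy_evaluate` is a made-up scoring rule used only to make the example runnable, and the splits are plain random halves without stratification.

```python
import math
import random


def five_by_two_splits(n, seed=0):
    """Return ten (train, test) index pairs: 5 random halvings, each used both ways."""
    rng = random.Random(seed)
    splits = []
    for _ in range(5):
        idx = list(range(n))
        rng.shuffle(idx)
        first, second = idx[: n // 2], idx[n // 2:]
        splits.append((first, second))
        splits.append((second, first))
    return splits


def select_c(n, evaluate, grid=(0.01, 0.1, 1, 10, 100), seed=0):
    """Pick the grid value of C with the highest mean accuracy over the splits."""
    splits = five_by_two_splits(n, seed)

    def mean_accuracy(c):
        return sum(evaluate(c, train, test) for train, test in splits) / len(splits)

    return max(grid, key=mean_accuracy)


def toy_evaluate(c, train_idx, test_idx):
    """Made-up accuracy that peaks at C = 1; replace with real training code."""
    return 1.0 - 0.1 * abs(math.log10(c))


print(select_c(20, toy_evaluate))
```

Restricting the search to five values of $C$ with $\gamma$ fixed keeps the number of training runs at 50 per algorithm and dataset, instead of a full two-dimensional grid.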
6 Computational Intelligence and Neuroscience
45
50
55
60
65
70
75
80
85Ac
cura
cy (
)
30 40 50 60 70 8020Training sample rate ()
LINLR-LINRBFLR-RBF
POLLR-POLSIGLR-SIG
Figure 1 Classification accuracy of Yale by using SVM and LR-SVM
6040 50 70 803020Training sample rate ()
0
005
01
015
02
025
03
035
04
045
05
Run
time (
s)
LINLR-LINRBFLR-RBF
POLLR-POLSIGLR-SIG
Figure 2 Run time of Yale by using SVM and LR-SVM
and their corresponding LR-MKL algorithms The multik-ernel is composed of 119870LIN 119870POL 119870RBF 119870SIG (119902 = 3) Theproportion parameter vector of kernel is 120583 = [1205831 1205832 1205833 1205834]The comparative algorithms are listed below
(i) Unweighted MKL (UMKL) [16] and LR-UMKL (+)indicates sum form and (lowast) indicates product form
(ii) Alternative MKL (AMKL) [17] and LR-AMKL(iii) Generalized MKL (GMKL) [18] and LR-GMKL(iv) Localized MKL (LMKL) [19] and LR-LMKL (sof)
distribute 120583 into softmax mode and (sig) distribute120583 into sigmoid mode
30 40 50 60 70 8020Training sample rate ()
65
70
75
80
85
90
Accu
racy
()
LINLR-LINRBFLR-RBF
POLLR-POLSIGLR-SIG
Figure 3 Classification accuracy of ORL by using SVM and LR-SVM
0
05
1
15
2
25
3
Run
time (
s)
30 40 50 60 70 8020Training sample rate ()
LINLR-LINRBFLR-RBF
POLLR-POLSIGLR-SIG
Figure 4 Run time of ORL by using SVM and LR-SVM
(v) Heuristic MKL (HMKL) [20] and LR-HMKL(vi) Centering MKL (CMKL) [21] and LR-CMKL(vii) Polynomial MKL (PMKL) [22] and LR-PMKL (1)
adopts the bounded set 1205931 with 1198971-norm and (2)adopts the bounded set 1205932 with 1198972-norm
(viii) Arbitrary Norm MKL (ANMKL) [23 24] and LR-ANMKL (1) iterates 120583 with 1198971-norm and (2) iterates120583 with 1198972-norm
(ix) Besides the highest accuracy among the four monok-ernel SVM selected as the reference item which isreferred to as SVM(best)
Computational Intelligence and Neuroscience 7
Table 1: The performances of MKL algorithms and LR-MKL algorithms on the datasets Yale, ORL, LSVT, and Digit.

                          Yale                ORL                 LSVT                Digit
Algorithm              Acc (%)  Time (s)   Acc (%)  Time (s)   Acc (%)  Time (s)   Acc (%)  Time (s)
SVM(best)              69.3231    0.2981   80.0002    1.6798   76.1905    0.0142   95.9302    0.9195
LR-SVM(best)           79.8667    0.0109   87.0015    0.1575   77.6637    0.0043   96.3304    0.0972
UMKL(+) [16]           87.5114    2.5981   93.5798   18.2352   78.5714    0.0229   97.3571    4.7412
LR-UMKL(+)             95.8936    0.3018   83.4813    1.0817   73.8095    0.0152   98.1595    2.0278
UMKL(*) [16]           58.3248    2.7244   77.5022   20.5507   66.6935    0.0281   96.0904    7.1661
LR-UMKL(*)             93.5739    0.2636   93.4063    2.1836   67.0033    0.0176   98.4618    3.6392
AMKL [17]              85.7753    3.7236   93.8741    4.6854   80.9524    0.0452   97.4725   11.2138
LR-AMKL                94.3799    0.3869   96.9417    0.4592   88.0952    0.0085   98.6952    6.8804
GMKL [18]              86.2989    4.5330   96.2057    5.0070   85.7143    0.0565   99.1499    8.0774
LR-GMKL                95.8015    0.6253   98.5833    0.7761   88.9286    0.0183   99.4673    4.5972
LMKL(sof) [19]         87.9077  215.4055   97.0003  220.3122   85.0090    5.1989   99.8898  166.7978
LR-LMKL(sof)           97.9352   22.7204   98.9724   17.3791   86.7429    1.1933   98.3591   98.1812
LMKL(sig) [19]         88.0145  106.7552   97.0108  107.0911   88.7541    0.7238   99.3750   48.5914
LR-LMKL(sig)           98.0667   15.9711   97.9979   11.7240   92.6627    0.4594   99.5625   24.5927
HMKL [20]              63.6037   92.4410   93.5109  118.2340   80.5998    0.0915   97.6258   10.3559
LR-HMKL                91.9611    8.5972   98.6893   10.9368   85.1625    0.0352   98.3479    6.3959
CMKL [21]              86.4166   95.0618   96.0308  107.6940   79.9503    0.0874   96.5014   10.6074
LR-CMKL                94.0083   10.6380   98.4799   12.9746   93.9024    0.0347   98.9113    6.2696
PMKL(1) [22]           89.0035    6.1842   98.4901    6.9065   92.8571    0.1079   99.5881   24.8702
LR-PMKL(1)             99.1429    0.9053   99.5712    0.9828   95.9386    0.0573  100.0000   12.9831
PMKL(2) [22]           89.0261    5.3893   98.7533    6.5450   92.4662    0.1295   99.5046   21.1679
LR-PMKL(2)             98.8968    0.8494   99.5828    0.8911   95.5145    0.0651   99.7941   13.5108
ANMKL(1) [23, 24]      86.7210    6.4856   98.4396   20.4564   91.9827    0.1167   98.0007   10.3979
LR-ANMKL(1)            96.8667    0.7041   99.4643    2.8519   92.2247    0.0247   99.2850    6.9070
ANMKL(2) [23, 24]      86.6998    7.0664   98.2204   21.1615   93.0035    0.1194   98.0039    9.7753
LR-ANMKL(2)            97.2917    0.8863   99.2857    3.0597   92.5391    0.0224   99.2497    5.0374
We conduct experiments on the test datasets Yale, ORL, LSVT Voice Rehabilitation (LSVT for short), and Multiple Features Digit (Digit for short). For each dataset, 60% of the samples are drawn at random to train the classification model, and the remaining samples serve as the test set. Optimizing $\gamma$ and the penalty factor $C$ by grid search shows that the classification accuracy varies little while $\gamma$ and $C$ range within a certain interval, so there is no need to search the whole parameter space, which would inevitably increase the computational cost. The penalty factor $C$ is chosen from the candidate values 0.01, 0.1, 1, 10, and 100, and $\gamma$ is fixed at $1/g$. We assign to $C$ the value that has the highest average accuracy on the 5 × 2 cross-validation sets. Each algorithm is run 10 times independently, and the runs are averaged to give the final results. The bold numbers mark the better recognition result between each original algorithm and its LRR-combined counterpart. The numbers in italic font mark the algorithms whose recognition accuracy is inferior to that of SVM(best). The recognition performance of the algorithms is measured by classification accuracy and run time, as reported in Table 1.
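The selection protocol described above (a random 60/40 split, then choosing $C$ by average accuracy over 5 × 2 cross-validation) can be sketched as follows. This is a hedged illustration rather than the authors' code: `split_60_40`, `five_by_two_splits`, `select_C`, and `toy_score` are hypothetical names, and `toy_score` is a stand-in for training and scoring a real classifier on each fold.

```python
import math
import random

def split_60_40(n_samples, rng):
    # Randomly assign 60% of the sample indices to training, the rest to test.
    idx = list(range(n_samples))
    rng.shuffle(idx)
    cut = (n_samples * 6) // 10
    return idx[:cut], idx[cut:]

def five_by_two_splits(indices, rng):
    # 5 repetitions of 2-fold CV: each repetition halves the shuffled data
    # and uses each half once for fitting and once for validation (10 pairs).
    pool = list(indices)
    pairs = []
    for _ in range(5):
        rng.shuffle(pool)
        half = len(pool) // 2
        a, b = pool[:half], pool[half:]
        pairs.append((a, b))
        pairs.append((b, a))
    return pairs

def select_C(candidates, train_idx, score_fn, rng):
    # Return the candidate C with the highest mean validation accuracy.
    pairs = five_by_two_splits(train_idx, rng)
    best_C, best_acc = None, -1.0
    for C in candidates:
        acc = sum(score_fn(C, fit, val) for fit, val in pairs) / len(pairs)
        if acc > best_acc:
            best_C, best_acc = C, acc
    return best_C, best_acc

def toy_score(C, fit_idx, val_idx):
    # Hypothetical stand-in for "train with penalty C, return accuracy";
    # it simply peaks at C = 1 so the example is deterministic.
    return 1.0 / (1.0 + abs(math.log10(C)))

rng = random.Random(0)
train_idx, test_idx = split_60_40(50, rng)
cv_pairs = five_by_two_splits(train_idx, rng)
best_C, best_acc = select_C([0.01, 0.1, 1, 10, 100], train_idx, toy_score, rng)
```

Restricting the search to five candidate values of $C$ keeps the number of model fits at 50 per algorithm (5 candidates × 10 folds), which is the cost saving the paragraph argues for.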
In most cases our proposed LR-MKL methods achieve better results than the original MKL methods, with higher classification accuracy and shorter run time. This indicates that LRR can strengthen the similarities among intraclass samples and the differences among interclass samples while simplifying the kernel matrix. Note that UMKL(*) fails to achieve good recognition results in many cases and is sometimes even less accurate than SVM(best); combining it with LRR, however, improves its results to a large extent. This illustrates that simply combining kernels without regard to the data structure is infeasible, and that LRR can offset part of the consequences of an irrational weight distribution. In general, PMKL, ANMKL, and their improved algorithms give the best recognition results, and the improved algorithms keep the accuracy above 90 percent throughout. In terms of run time, MKL is clearly much slower than SVM, because MKL includes a kernel weight allocation process that can be very time consuming. Among the MKL algorithms, LMKL is the slowest and fails to satisfy the real-time requirement. Our combined LR-MKL reduces the run time severalfold, in some cases by more than one order of magnitude, so it can speed up high-precision MKL enough to satisfy the real-time requirement. In brief, the proposed LR-MKL can boost the performance of MKL to a great extent.
6 Conclusion
The complexity of solving the convex quadratic optimization problem in MKL is $O(Nn^{3.5})$, so MKL is infeasible for large-scale problems because of its computational cost. Our effort has therefore focused on decreasing the dimension of the training set. Note that LRR can capture the global structure of the data in relatively few dimensions. We have reviewed several existing MKL algorithms and, on this basis, proposed a novel combined LR-MKL, which largely improves the performance of MKL. A large number of experiments have been carried out on four real-world datasets to compare the recognition results of various MKL and LR-MKL algorithms. In most cases, the recognition results of the MKL algorithms are better than those of SVM(best), the exception being UMKL(*). Our proposed LR-MKL methods have in most cases achieved better results than the original MKL. Among them, PMKL, ANMKL, and their improved algorithms have shown the best recognition results.
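To see why reducing the problem size pays off so strongly under the $O(Nn^{3.5})$ bound quoted in this section, consider the ratio of the bound before and after $n$ shrinks by a factor $k$. This is a back-of-the-envelope illustration that assumes $N$ stays fixed and that the bound is tight up to a constant; the factor $k$ is a hypothetical symbol introduced here, not a quantity from the experiments.

```latex
\[
\frac{N\,n^{3.5}}{N\,(n/k)^{3.5}} = k^{3.5},
\qquad k = 2 \;\Rightarrow\; 2^{3.5} = 8\sqrt{2} \approx 11.3,
\qquad k = 4 \;\Rightarrow\; 4^{3.5} = 2^{7} = 128 .
\]
```

Under these assumptions, halving $n$ already lowers the bound by roughly an order of magnitude.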
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (no. 51208168), Hebei Province Natural Science Foundation (no. E2016202341), and Tianjin Natural Science Foundation (no. 13JCYBJC37700).
References
[1] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.
[2] M. Hu, Y. Chen, and J. T.-Y. Kwok, “Building sparse multiple-kernel SVM classifiers,” IEEE Transactions on Neural Networks, vol. 20, no. 5, pp. 827–839, 2009.
[3] X. Wang, X. Liu, N. Japkowicz, and S. Matwin, “Ensemble of multiple kernel SVM classifiers,” Advances in Artificial Intelligence, vol. 8436, pp. 239–250, 2014.
[4] E. Hassan, S. Chaudhury, N. Yadav, P. Kalra, and M. Gopal, “Off-line handwritten input based identity determination using multi kernel feature combination,” Pattern Recognition Letters, vol. 35, no. 1, pp. 113–119, 2014.
[5] F. Cai and V. Cherkassky, “Generalized SMO algorithm for SVM-based multitask learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 997–1003, 2012.
[6] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, “Online passive-aggressive algorithms,” Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.
[7] N. Cesa-Bianchi, A. Conconi, and C. Gentile, “On the generalization ability of on-line learning algorithms,” Institute of Electrical and Electronics Engineers Transactions on Information Theory, vol. 50, no. 9, pp. 2050–2057, 2004.
[8] S. Fine and K. Scheinberg, “Efficient SVM training using low-rank kernel representations,” Journal of Machine Learning Research, vol. 2, no. 2, pp. 243–264, 2002.
[9] S. Zhou, “Sparse LSSVM in primal using Cholesky factorization for large-scale problems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 783–795, 2016.
[10] L. Jia, S.-Z. Liao, and L.-Z. Ding, “Learning with uncertain kernel matrix set,” Journal of Computer Science and Technology, vol. 25, no. 4, pp. 709–727, 2010.
[11] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
[12] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233–2246, 2012.
[13] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, “Multi-task low-rank affinity pursuit for image segmentation,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV ’11), pp. 2439–2446, IEEE, Barcelona, Spain, November 2011.
[14] Y. Mu, J. Dong, X. Yuan, and S. Yan, “Accelerated low-rank visual recovery by random projection,” in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, pp. 2609–2616, Colorado Springs, Colo, USA, June 2011.
[15] J. Chen, J. Zhou, and J. Ye, “Integrating low-rank and group-sparse structures for robust multi-task learning,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’11, pp. 42–50, San Diego, Calif, USA, August 2011.
[16] P. Pavlidis, J. Cai, J. Weston, and W. N. Grundy, “Gene functional classification from heterogeneous data,” in Proceedings of the 5th Annual International Conference on Computational Biology, pp. 249–255, Montreal, Canada, May 2001.
[17] O. Chapelle and A. Rakotomamonjy, “Second order optimization of kernel parameters,” NIPS Workshop on Kernel Learning, 2008.
[18] M. Varma and B. R. Babu, “More generality in efficient multiple kernel learning,” in Proceedings of the 26th International Conference on Machine Learning, ICML 2009, pp. 1065–1072, Montreal, Canada, June 2009.
[19] M. Gonen and E. Alpaydin, “Localized multiple kernel learning,” in Proceedings of the 25th international conference, pp. 352–359, Helsinki, Finland, July 2008.
[20] S. Qiu and T. Lane, “A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 190–199, 2009.
[21] C. Cortes, M. Mohri, and A. Rostamizadeh, “Two-stage learning kernel algorithms,” in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 239–246, Haifa, Israel, June 2010.
[22] C. Cortes, M. Mohri, and A. Rostamizadeh, “Learning non-linear combinations of kernels,” in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS ’09), pp. 396–404, December 2009.
[23] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu, “Simple and efficient multiple kernel learning by group lasso,” in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 1175–1182, Haifa, Israel, June 2010.
[24] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, “Non-sparse regularization and efficient training with multiple kernels,” arXiv preprint abs/1003.0079, https://arxiv.org/abs/1003.0079.
[25] M. Gonen and E. Alpaydın, “Multiple kernel learning algorithms,” Journal of Machine Learning Research, vol. 12, pp. 2211–2268, 2011.
[26] F. R. Bach, “Consistency of the group lasso and multiple kernel learning,” Journal of Machine Learning Research, vol. 9, no. 2, pp. 1179–1225, 2008.
[27] E. J. Candes, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis,” Journal of the ACM, vol. 58, no. 3, 2011.
[28] C.-F. Chen, C.-P. Wei, and Y.-C. F. Wang, “Low-rank matrix recovery with structural incoherence for robust face recognition,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2618–2625, Providence, RI, USA, June 2012.
[29] J.-F. Cai, E. J. Candes, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
[30] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, “Non-negative low rank and sparse graph for semi-supervised learning,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2328–2335, USA, June 2012.
[31] C. Chang and C. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[32] A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig, “Objective automatic assessment of rehabilitative speech treatment in Parkinson’s disease,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 181–190, 2014.
Computational Intelligence and Neuroscience 7
Table1Th
eperform
anceso
fMKL
algorithm
sand
LR-M
KLalgorithm
sonthed
atasetsY
aleORL
LSV
TandDigit
Yale
ORL
LSVT
Digit
Acc
Time
Acc
Time
Acc
Time
Acc
Time
SVM(best)
6932
3102981
800002
16798
7619
0500142
959302
09195
LR-SVM(best)
79866
700109
870015
01575
776637
000
4396
330
4009
72UMKL
(+)[16]
875114
25981
93579
8182352
785714
00229
9735
7147412
LR-U
MKL
(+)
95893
6030
18834813
10817
738095
00152
9815
9520278
UMKL
(lowast)[16]
583248
27244
775022
205507
666935
00281
960904
71661
LR-U
MKL
(lowast)935739
026
3693406
321836
670033
00176
984618
363
92AMKL
[17]
857753
37236
938741
46854
809524
00452
974725
112138
LR-AMKL
94379
9038
6996
9417
045
9288
0952
000
8598
6952
688
04GMKL
[18]
862989
45330
962057
50070
857143
00565
9914
9980774
LR-G
MKL
958015
062
5398
5833
07761
88928
600183
99467
3459
72LM
KL(sof)[19]
879077
21540
55970003
2203122
850090
51989
99889
81667978
LR-LMKL
(sof)
979352
22720
498
9724
17379
186
742
911933
983591
9818
12LM
KL(sig)[19]
880145
1067552
970108
1070
911
887541
07238
9937
50485914
LR-LMKL
(sig)
98066
7159711
97997
911724
092
662
7045
9499
562
524
592
7HMKL
[20]
636037
924410
935109
1182340
805998
00915
976258
1035
59LR
-HMKL
919611
859
7298
689
310936
88516
2500352
98347
963959
CMKL
[21]
864166
950618
960308
1076
940
799503
00874
965014
106074
LR-C
MKL
94008
310638
098
479
9129746
939024
003
4798
9113
626
96PM
KL(1)[
22]
890035
61842
984901
69065
928571
01079
995881
248702
LR-PMKL
(1)99
1429
090
5399
5712
098
2895
938
6005
7310000
00129831
PMKL
(2)[22]
890261
53893
987533
65450
924662
01295
99504
62116
79LR
-PMKL
(2)98
896
8084
9499
582
8089
1195
5145
006
5199
794
1135108
ANMKL
(1)[2324]
867210
64856
984396
204564
919827
01167
980007
1039
79LR
-ANMKL
(1)96
866
7070
4199
464
328519
92224
700247
992850
690
70ANMKL
(2)[2324]
866998
7066
4982204
2116
15930035
0119
4980039
97753
LR-ANMKL
(2)972917
088
6399
2857
305
979253
91002
2499
249
750374
8 Computational Intelligence and Neuroscience
We conduct experiments on the test datasets YaleORL LSVTVoice Rehabilitation (LSVT for short) andMultiple FeaturesDigit (Digit for short)The 60 samples of dataset are drawnout randomly to train classification model and the remainingsamples serve as the test set Through the optimized resultsof 120574 and penalty factor 119862 by grid search method we findthat the classification accuracy varies not too much with 120574and 119862 ranging in a certain interval So there is no need tosearch the whole parameter space which inevitably increasesthe computational costThe penalty factor119862 can be given thattrying values 001 01 1 10 100 and 120574 are fixed on 1119892 Thenwe assign a value which has the highest average accuracyon the 5 lowast 2 cross validation sets to 119862 Each algorithm isconducted with 10 independent runs and we average them asthe final results The bold numbers represent the preferablerecognition effect between the original algorithms and theirLRR combined algorithmsThe numbers in italic font denotethe algorithms whose recognition precision is inferior tothe SVM(best) The recognition performance of algorithmsis measured by the classification accuracy and run timeillustrated by Table 1
In most cases our proposed LR-MKL methods consis-tently achieve superior results to the original MKL whichverifies the higher classification accuracy and shorter opera-tion time It is indicated that LRR can augment the similaritiesamong the intraclass samples and the differences among theinterclass samples while simplifying the kernel matrix Notethat UMKL(lowast) fails to achieve the ideal recognition effectsin many cases even less accurate than SVM(best) Howevercombiningwith LRR improves its effects to a large extentThisillustrates that simply combining kernels without accordingdata structure is infeasible and LRR can offset part of theconsequences of irrational distribution In general PMKLANMKL and their improved algorithms have the preferablerecognition effects especially the improved algorithms withthe accuracy over 90 percent all the time In terms of runtime it is clearly observed that the real-time performance ofMKL is much worse than SVM because MKL has a processof allocating kernel weights and the process can be very timeconsuming Among them LMKL is the worst and fails tosatisfy the real-time requirement Obviously our combinedLR-MKL can reduce the run time manifold even more thanone order of magnitude so it can speed high-precision MKLup to satisfy the real-time requirement In brief the proposedLR-MKL can boost the performance ofMKL to a great extent
6 Conclusion
The complexity of solving convex quadratic optimizationproblem in MKL is 119874(11987311989935) so it is infeasible to apply inlarge scale problems for its large computational cost Oureffort has beenmade on decreasing the dimension of trainingset Note that LRR just can capture the global structure ofdata in relatively few dimensions Therefore we have givena review of several existing MKL algorithms Based on thispoint we have proposed a novel combined LR-MKL whichlargely improves the performance ofMKL A large number ofexperiments have been carried on four real world datasets tocontrast the recognition effects of various kinds of MKL and
LR-MKL algorithms It has been shown that in most casesthe recognition effects of MKL algorithms are better thanSVM(best) except UMKL(lowast) And our proposed LR-MKLmethods have consistently achieved the superior results tothe original MKL Among them PMKL ANMKL and theirimproved algorithms have shown possessing the preferablerecognition effects
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (no 51208168) Hebei Province NaturalScience Foundation (no E2016202341) and Tianjin NaturalScience Foundation (no 13JCYBJC37700)
References
[1] V N Vapnik ldquoAn overview of statistical learning theoryrdquo IEEETransactions on Neural Networks vol 10 no 5 pp 988ndash9991999
[2] M Hu Y Chen and J T-Y Kwok ldquoBuilding sparse multiple-kernel SVM classifiersrdquo IEEE Transactions on Neural Networksvol 20 no 5 pp 827ndash839 2009
[3] X Wang X Liu N Japkowicz and S Matwin ldquoEnsemble ofmultiple kernel SVM classifiersrdquo Advances in Artificial Intelli-gence vol 8436 pp 239ndash250 2014
[4] E Hassan S Chaudhury N Yadav P Kalra and M GopalldquoOff-line handwritten input based identity determination usingmulti kernel feature combinationrdquo Pattern Recognition Lettersvol 35 no 1 pp 113ndash119 2014
[5] F Cai and V Cherkassky ldquoGeneralized SMO algorithm forSVM-based multitask learningrdquo IEEE Transactions on NeuralNetworks and Learning Systems vol 23 no 6 pp 997ndash10032012
[6] K Crammer O Dekel J Keshet S Shalev-Shwartz andY Singer ldquoOnline passive-aggressive algorithmsrdquo Journal ofMachine Learning Research vol 7 pp 551ndash585 2006
[7] N Cesa-Bianchi A Conconi and C Gentile ldquoOn the gen-eralization ability of on-line learning algorithmsrdquo Institute ofElectrical and Electronics Engineers Transactions on InformationTheory vol 50 no 9 pp 2050ndash2057 2004
[8] S Fine and K Scheinberg ldquoEfficient SVM training usinglow-rank kernel representationsrdquo Journal of Machine LearningResearch vol 2 no 2 pp 243ndash264 2002
[9] S Zhou ldquoSparse LSSVM in primal using cholesky factorizationfor large-scale problemsrdquo IEEETransactions onNeuralNetworksand Learning Systems vol 27 no 4 pp 783ndash795 2016
[10] L Jia S-Z Liao and L-Z Ding ldquoLearning with uncertainkernel matrix setrdquo Journal of Computer Science and Technologyvol 25 no 4 pp 709ndash727 2010
[11] G Liu Z Lin S Yan J Sun Y Yu and Y Ma ldquoRobust recov-ery of subspace structures by low-rank representationrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol35 no 1 pp 171ndash184 2013
[12] Y Peng A Ganesh J Wright W Xu and Y Ma ldquoRASL robustalignment by sparse and low-rank decomposition for linearly
Computational Intelligence and Neuroscience 9
correlated imagesrdquo IEEE Transactions on Pattern Analysis andMachine Intelligence vol 34 no 11 pp 2233ndash2246 2012
[13] B Cheng G Liu J Wang Z Huang and S Yan ldquoMulti-tasklow-rank affinity pursuit for image segmentationrdquo in Proceed-ings of the IEEE International Conference on Computer Vision(ICCV rsquo11) pp 2439ndash2446 IEEE Barcelona Spain November2011
[14] YMu JDongX Yuan and S Yan ldquoAccelerated low-rank visualrecovery by random projectionrdquo in Proceedings of the 2011 IEEEConference on Computer Vision and Pattern Recognition CVPR2011 pp 2609ndash2616 Colorado Springs Colo USA June 2011
[15] J Chen J Zhou and J Ye ldquoIntegrating low-rank and group-sparse structures for robust multi-task learningrdquo in Proceedingsof the 17th ACM SIGKDD International Conference on Knowl-edge Discovery and Data Mining KDDrsquo11 pp 42ndash50 San DiegoCalif USA August 2011
[16] P Pavlidis J Cai JWeston andWNGrundy ldquoGene functionalclassification fromheterogeneous datardquo inProceedings of the 5thAnnual Internatinal Conference on Computational Biology pp249ndash255 Montreal Canada May 2001
[17] O Chapelle and A Rakotomamonjy ldquoSecond order optimiza-tion of kernel parametersrdquo Nips Workshop on Kernel Learning2008
[18] M Varma and B R Babu ldquoMore generality in efficient mul-tiple kernel learningrdquo in Proceedings of the 26th InternationalConference On Machine Learning ICML 2009 pp 1065ndash1072Montreal Canada June 2009
[19] M Gonen and E Alpaydin ldquoLocalized multiple kernel learn-ingrdquo in Proceedings of the the 25th international conference pp352ndash359 Helsinki Finland July 2008
[20] S Qiu and T Lane ldquoA framework for multiple kernel supportvector regression and its applications to siRNA efficacy predic-tionrdquo IEEEACM Transactions on Computational Biology andBioinformatics vol 6 no 2 pp 190ndash199 2009
[21] CCortesMMohri andARostamizadeh ldquoTwo-stage learningkernel algorithmsrdquo in Proceedings of the 27th InternationalConference on Machine Learning ICML 2010 pp 239ndash246Haifa Israel June 2010
[22] C Cortes M Mohri and A Rostamizadeh ldquoLearning non-linear combinations of kernelsrdquo in Proceedings of the 23rdAnnual Conference on Neural Information Processing Systems(NIPS rsquo09) pp 396ndash404 December 2009
[23] Z Xu R Jin H Yang I King and M R Lyu ldquoSimple andefficient multiple kernel learning by group lassordquo in Proceedingsof the 27th International Conference onMachine Learning ICML2010 pp 1175ndash1182 Haifa Israel June 2010
[24] M Kloft U Brefeld S Sonnenburg and A Zien ldquoNon-sparseregularization and efficient training with multiple kernelsArxiv Preprint abs1003rdquo httpsarxivorgabs10030079
[25] M Gonen and E Alpaydın ldquoMultiple kernel learning algo-rithmsrdquo Journal of Machine Learning Research vol 12 pp 2211ndash2268 2011
[26] F R Bach ldquoConsistency of the group lasso and multiple kernellearningrdquo Journal ofMachine Learning Research vol 9 no 2 pp1179ndash1225 2008
[27] E J Candes X Li Y Ma and JWright ldquoRobust principal com-ponent analysisrdquo Journal of the ACM vol 58 no 3 2011
[28] C-F Chen C-P Wei and Y-C F Wang ldquoLow-rank matrixrecovery with structural incoherence for robust face recogni-tionrdquo in Proceedings of the 2012 IEEE Conference on ComputerVision and Pattern Recognition CVPR 2012 pp 2618ndash2625Providence RI USA June 2012
[29] J-F Cai E J Candes and Z Shen ldquoA singular value thresh-olding algorithm for matrix completionrdquo SIAM Journal onOptimization vol 20 no 4 pp 1956ndash1982 2010
[30] L Zhuang H Gao Z Lin Y Ma X Zhang and N Yu ldquoNon-negative low rank and sparse graph for semi-supervised learn-ingrdquo in Proceedings of the 2012 IEEE Conference on ComputerVision and Pattern Recognition CVPR 2012 pp 2328ndash2335 usaJune 2012
[31] C Chang and C Lin ldquoLIBSVM a Library for support vectormachinesrdquo ACM Transactions on Intelligent Systems and Tech-nology vol 2 no 3 article 27 2011
[32] A Tsanas M A Little C Fox and L O Ramig ldquoObjectiveautomatic assessment of rehabilitative speech treatment inparkinsonrsquos diseaserdquo IEEE Transactions on Neural Systems andRehabilitation Engineering vol 22 no 1 pp 181ndash190 2014
Submit your manuscripts athttpswwwhindawicom
Computer Games Technology
International Journal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Distributed Sensor Networks
International Journal of
Advances in
FuzzySystems
Hindawi Publishing Corporationhttpwwwhindawicom
Volume 2014
International Journal of
ReconfigurableComputing
Hindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
Applied Computational Intelligence and Soft Computing
thinspAdvancesthinspinthinsp
Artificial Intelligence
HindawithinspPublishingthinspCorporationhttpwwwhindawicom Volumethinsp2014
Advances inSoftware EngineeringHindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Electrical and Computer Engineering
Journal of
Hindawi Publishing Corporation
httpwwwhindawicom Volume 2014
Advances in
Multimedia
International Journal of
Biomedical Imaging
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Advances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 201
RoboticsJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Computational Intelligence and Neuroscience
Industrial EngineeringJournal of
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014
The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
Human-ComputerInteraction
Advances in
Computer EngineeringAdvances in
Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014
8 Computational Intelligence and Neuroscience
We conduct experiments on the test datasets YaleORL LSVTVoice Rehabilitation (LSVT for short) andMultiple FeaturesDigit (Digit for short)The 60 samples of dataset are drawnout randomly to train classification model and the remainingsamples serve as the test set Through the optimized resultsof 120574 and penalty factor 119862 by grid search method we findthat the classification accuracy varies not too much with 120574and 119862 ranging in a certain interval So there is no need tosearch the whole parameter space which inevitably increasesthe computational costThe penalty factor119862 can be given thattrying values 001 01 1 10 100 and 120574 are fixed on 1119892 Thenwe assign a value which has the highest average accuracyon the 5 lowast 2 cross validation sets to 119862 Each algorithm isconducted with 10 independent runs and we average them asthe final results The bold numbers represent the preferablerecognition effect between the original algorithms and theirLRR combined algorithmsThe numbers in italic font denotethe algorithms whose recognition precision is inferior tothe SVM(best) The recognition performance of algorithmsis measured by the classification accuracy and run timeillustrated by Table 1
In most cases our proposed LR-MKL methods consis-tently achieve superior results to the original MKL whichverifies the higher classification accuracy and shorter opera-tion time It is indicated that LRR can augment the similaritiesamong the intraclass samples and the differences among theinterclass samples while simplifying the kernel matrix Notethat UMKL(lowast) fails to achieve the ideal recognition effectsin many cases even less accurate than SVM(best) Howevercombiningwith LRR improves its effects to a large extentThisillustrates that simply combining kernels without accordingdata structure is infeasible and LRR can offset part of theconsequences of irrational distribution In general PMKLANMKL and their improved algorithms have the preferablerecognition effects especially the improved algorithms withthe accuracy over 90 percent all the time In terms of runtime it is clearly observed that the real-time performance ofMKL is much worse than SVM because MKL has a processof allocating kernel weights and the process can be very timeconsuming Among them LMKL is the worst and fails tosatisfy the real-time requirement Obviously our combinedLR-MKL can reduce the run time manifold even more thanone order of magnitude so it can speed high-precision MKLup to satisfy the real-time requirement In brief the proposedLR-MKL can boost the performance ofMKL to a great extent
6 Conclusion
The complexity of solving convex quadratic optimizationproblem in MKL is 119874(11987311989935) so it is infeasible to apply inlarge scale problems for its large computational cost Oureffort has beenmade on decreasing the dimension of trainingset Note that LRR just can capture the global structure ofdata in relatively few dimensions Therefore we have givena review of several existing MKL algorithms Based on thispoint we have proposed a novel combined LR-MKL whichlargely improves the performance ofMKL A large number ofexperiments have been carried on four real world datasets tocontrast the recognition effects of various kinds of MKL and
LR-MKL algorithms It has been shown that in most casesthe recognition effects of MKL algorithms are better thanSVM(best) except UMKL(lowast) And our proposed LR-MKLmethods have consistently achieved the superior results tothe original MKL Among them PMKL ANMKL and theirimproved algorithms have shown possessing the preferablerecognition effects
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This work was supported by the National Natural ScienceFoundation of China (no 51208168) Hebei Province NaturalScience Foundation (no E2016202341) and Tianjin NaturalScience Foundation (no 13JCYBJC37700)
References
[1] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Transactions on Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.
[2] M. Hu, Y. Chen, and J. T.-Y. Kwok, “Building sparse multiple-kernel SVM classifiers,” IEEE Transactions on Neural Networks, vol. 20, no. 5, pp. 827–839, 2009.
[3] X. Wang, X. Liu, N. Japkowicz, and S. Matwin, “Ensemble of multiple kernel SVM classifiers,” Advances in Artificial Intelligence, vol. 8436, pp. 239–250, 2014.
[4] E. Hassan, S. Chaudhury, N. Yadav, P. Kalra, and M. Gopal, “Off-line handwritten input based identity determination using multi kernel feature combination,” Pattern Recognition Letters, vol. 35, no. 1, pp. 113–119, 2014.
[5] F. Cai and V. Cherkassky, “Generalized SMO algorithm for SVM-based multitask learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 6, pp. 997–1003, 2012.
[6] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer, “Online passive-aggressive algorithms,” Journal of Machine Learning Research, vol. 7, pp. 551–585, 2006.
[7] N. Cesa-Bianchi, A. Conconi, and C. Gentile, “On the generalization ability of on-line learning algorithms,” IEEE Transactions on Information Theory, vol. 50, no. 9, pp. 2050–2057, 2004.
[8] S. Fine and K. Scheinberg, “Efficient SVM training using low-rank kernel representations,” Journal of Machine Learning Research, vol. 2, no. 2, pp. 243–264, 2002.
[9] S. Zhou, “Sparse LSSVM in primal using Cholesky factorization for large-scale problems,” IEEE Transactions on Neural Networks and Learning Systems, vol. 27, no. 4, pp. 783–795, 2016.
[10] L. Jia, S.-Z. Liao, and L.-Z. Ding, “Learning with uncertain kernel matrix set,” Journal of Computer Science and Technology, vol. 25, no. 4, pp. 709–727, 2010.
[11] G. Liu, Z. Lin, S. Yan, J. Sun, Y. Yu, and Y. Ma, “Robust recovery of subspace structures by low-rank representation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 1, pp. 171–184, 2013.
[12] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: robust alignment by sparse and low-rank decomposition for linearly correlated images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 11, pp. 2233–2246, 2012.
[13] B. Cheng, G. Liu, J. Wang, Z. Huang, and S. Yan, “Multi-task low-rank affinity pursuit for image segmentation,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV ’11), pp. 2439–2446, IEEE, Barcelona, Spain, November 2011.
[14] Y. Mu, J. Dong, X. Yuan, and S. Yan, “Accelerated low-rank visual recovery by random projection,” in Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2011, pp. 2609–2616, Colorado Springs, Colo, USA, June 2011.
[15] J. Chen, J. Zhou, and J. Ye, “Integrating low-rank and group-sparse structures for robust multi-task learning,” in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’11, pp. 42–50, San Diego, Calif, USA, August 2011.
[16] P. Pavlidis, J. Cai, J. Weston, and W. N. Grundy, “Gene functional classification from heterogeneous data,” in Proceedings of the 5th Annual International Conference on Computational Biology, pp. 249–255, Montreal, Canada, May 2001.
[17] O. Chapelle and A. Rakotomamonjy, “Second order optimization of kernel parameters,” NIPS Workshop on Kernel Learning, 2008.
[18] M. Varma and B. R. Babu, “More generality in efficient multiple kernel learning,” in Proceedings of the 26th International Conference on Machine Learning, ICML 2009, pp. 1065–1072, Montreal, Canada, June 2009.
[19] M. Gönen and E. Alpaydın, “Localized multiple kernel learning,” in Proceedings of the 25th International Conference on Machine Learning, pp. 352–359, Helsinki, Finland, July 2008.
[20] S. Qiu and T. Lane, “A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 6, no. 2, pp. 190–199, 2009.
[21] C. Cortes, M. Mohri, and A. Rostamizadeh, “Two-stage learning kernel algorithms,” in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 239–246, Haifa, Israel, June 2010.
[22] C. Cortes, M. Mohri, and A. Rostamizadeh, “Learning non-linear combinations of kernels,” in Proceedings of the 23rd Annual Conference on Neural Information Processing Systems (NIPS ’09), pp. 396–404, December 2009.
[23] Z. Xu, R. Jin, H. Yang, I. King, and M. R. Lyu, “Simple and efficient multiple kernel learning by group lasso,” in Proceedings of the 27th International Conference on Machine Learning, ICML 2010, pp. 1175–1182, Haifa, Israel, June 2010.
[24] M. Kloft, U. Brefeld, S. Sonnenburg, and A. Zien, “Non-sparse regularization and efficient training with multiple kernels,” arXiv preprint, https://arxiv.org/abs/1003.0079.
[25] M. Gönen and E. Alpaydın, “Multiple kernel learning algorithms,” Journal of Machine Learning Research, vol. 12, pp. 2211–2268, 2011.
[26] F. R. Bach, “Consistency of the group lasso and multiple kernel learning,” Journal of Machine Learning Research, vol. 9, no. 2, pp. 1179–1225, 2008.
[27] E. J. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis,” Journal of the ACM, vol. 58, no. 3, 2011.
[28] C.-F. Chen, C.-P. Wei, and Y.-C. F. Wang, “Low-rank matrix recovery with structural incoherence for robust face recognition,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2618–2625, Providence, RI, USA, June 2012.
[29] J.-F. Cai, E. J. Candès, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM Journal on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010.
[30] L. Zhuang, H. Gao, Z. Lin, Y. Ma, X. Zhang, and N. Yu, “Non-negative low rank and sparse graph for semi-supervised learning,” in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012, pp. 2328–2335, USA, June 2012.
[31] C. Chang and C. Lin, “LIBSVM: a library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, article 27, 2011.
[32] A. Tsanas, M. A. Little, C. Fox, and L. O. Ramig, “Objective automatic assessment of rehabilitative speech treatment in Parkinson’s disease,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 22, no. 1, pp. 181–190, 2014.