Difficulties with Nonlinear SVM for Large Problems

The nonlinear kernel $K(A, A') \in \mathbb{R}^{m \times m}$ is fully dense.
Computational complexity depends on $m$.
The separating surface depends on almost the entire dataset.
Complexity of nonlinear SSVM $\approx O((m+1)^3)$.
Runs out of memory while storing the kernel matrix.
Long CPU time to compute the dense kernel matrix: $O(m^2)$ entries must be generated and stored.
The entire dataset must be stored even after the problem is solved.
Reduced Support Vector Machine

(i) Choose a random subset matrix $\bar{A} \in \mathbb{R}^{\bar{m} \times n}$ of the entire data matrix $A \in \mathbb{R}^{m \times n}$, with $\bar{m} \ll m$.
(ii) Solve the following problem by the Newton method, with $\bar{D} \subset D$ the corresponding diagonal label matrix:
$$\min_{(\bar{u}, \gamma) \in \mathbb{R}^{\bar{m}+1}} \frac{\nu}{2}\,\|p(e - D(K(A, \bar{A}')\bar{D}\bar{u} - e\gamma),\ \alpha)\|_2^2 + \frac{1}{2}\|(\bar{u}, \gamma)\|_2^2$$
(iii) The nonlinear classifier is defined by the optimal solution $(\bar{u}, \gamma)$ of step (ii):
$$K(x', \bar{A}')\bar{D}\bar{u} = \gamma$$
Note: using only the small square kernel $K(\bar{A}, \bar{A}')$ gives lousy results!
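A minimal NumPy/SciPy sketch of steps (i)-(iii), assuming labels in $\{-1, +1\}$, a Gaussian kernel, and SciPy's BFGS as a stand-in for the Newton method of step (ii); all function names here are illustrative, not from the original work.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, mu):
    """K(A, B')_{ij} = exp(-mu * ||A_i - B_j||^2)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * sq)

def p(x, alpha):
    """Smooth plus function p(x, alpha) = x + (1/alpha) log(1 + exp(-alpha x))."""
    return np.logaddexp(0.0, alpha * x) / alpha

def rsvm_fit(A, y, m_bar, nu=50.0, mu=1.0, alpha=5.0, seed=0):
    """Steps (i)-(ii): pick a random subset Abar, then minimize the smooth objective."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(A), size=m_bar, replace=False)
    Abar, ybar = A[idx], y[idx]              # reduced subset; labels in {-1, +1}
    K = gaussian_kernel(A, Abar, mu)         # rectangular kernel K(A, Abar') in R^{m x m_bar}
    def objective(z):
        u, gamma = z[:-1], z[-1]
        margin = 1.0 - y * (K @ (ybar * u) - gamma)   # e - D(K(A, Abar') Dbar u - e gamma)
        return 0.5 * nu * np.sum(p(margin, alpha) ** 2) + 0.5 * (u @ u + gamma ** 2)
    z = minimize(objective, np.zeros(m_bar + 1), method="BFGS").x
    return Abar, ybar, z[:-1], z[-1]

def rsvm_predict(X, Abar, ybar, u, gamma, mu=1.0):
    """Step (iii): classifier sign(K(x', Abar') Dbar u - gamma)."""
    return np.sign(gaussian_kernel(np.atleast_2d(X), Abar, mu) @ (ybar * u) - gamma)
```

The key design point is that the kernel is rectangular: each of the $m$ training points is evaluated against only the $\bar{m}$ subset points, so memory and time scale with $m\bar{m}$ rather than $m^2$.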
A Nonlinear Kernel Application
Checkerboard training set: 1000 points in $\mathbb{R}^2$; separate 486 asterisks from 514 dots.

Conventional SVM Result on Checkerboard
Using 50 randomly selected points out of 1000: $K(\bar{A}, \bar{A}') \in \mathbb{R}^{50 \times 50}$.

RSVM Result on Checkerboard
Using the SAME 50 random points out of 1000: $K(A, \bar{A}') \in \mathbb{R}^{1000 \times 50}$.
RSVM on Moderate Sized Problems (Best Test Set Correctness % / CPU seconds)

| Dataset ($m \times n$, $\bar{m}$) | RSVM $K(A, \bar{A}')_{m \times \bar{m}}$ | SVM $K(A, A')_{m \times m}$ | SVM $K(\bar{A}, \bar{A}')_{\bar{m} \times \bar{m}}$ |
|---|---|---|---|
| Cleveland Heart (297 x 13, 30) | 86.47 / 3.04 | 85.92 / 32.42 | 76.88 / 1.58 |
| BUPA Liver (345 x 6, 35) | 74.86 / 2.68 | 73.62 / 32.61 | 68.95 / 2.04 |
| Ionosphere (351 x 34, 35) | 95.19 / 5.02 | 94.35 / 59.88 | 88.70 / 2.13 |
| Pima Indians (768 x 8, 50) | 78.64 / 5.72 | 76.59 / 328.3 | 57.32 / 4.64 |
| Tic-Tac-Toe (958 x 9, 96) | 98.75 / 14.56 | 98.43 / 1033.5 | 88.24 / 8.87 |
| Mushroom (8124 x 22, 215) | 89.04 / 466.20 | N/A / N/A | 83.90 / 221.50 |
RSVM on Large UCI Adult Dataset ($A \in \mathbb{R}^{m \times 123}$)
Average Test Set Correctness % and Standard Deviation over 50 Runs

| Dataset Size (Train, Test) | $K(A, \bar{A}')_{m \times \bar{m}}$ Testing % | Std. Dev. | $K(\bar{A}, \bar{A}')_{\bar{m} \times \bar{m}}$ Testing % | Std. Dev. | $\bar{m}$ | $\bar{m}/m$ |
|---|---|---|---|---|---|---|
| (6414, 26148) | 84.47 | 0.001 | 77.03 | 0.014 | 210 | 3.2% |
| (11221, 21341) | 84.71 | 0.001 | 75.96 | 0.016 | 225 | 2.0% |
| (16101, 16461) | 84.90 | 0.001 | 75.45 | 0.017 | 242 | 1.5% |
| (22697, 9865) | 85.31 | 0.001 | 76.73 | 0.018 | 284 | 1.2% |
| (32562, 16282) | 85.07 | 0.001 | 76.95 | 0.013 | 326 | 1.0% |
[Figure: Time (CPU sec.) vs. training set size, comparing RSVM, SMO, and PCGC]
Support Vector Regression (Linear Case: $f(x) = x'w + b$)

Given the training set $S = \{(x_i, y_i) \mid x_i \in \mathbb{R}^n,\ y_i \in \mathbb{R},\ i = 1, \dots, l\}$, find a linear function $f(x) = x'w + b$, where $(w, b)$ is determined by solving a minimization problem that guarantees the smallest overall experimental error made by $f(x) = x'w + b$.
Motivated by SVM: $\|w\|_2$ should be as small as possible.
Tiny errors (within $\varepsilon$) should be discarded.
$\varepsilon$-Insensitive Loss Function

The loss made by the estimation function $f$ at the data point $(x_i, y_i)$ is
$$|y_i - f(x_i)|_\varepsilon = \max\{0,\ |y_i - f(x_i)| - \varepsilon\}$$
In general,
$$|\xi|_\varepsilon = \max\{0, |\xi| - \varepsilon\} = \begin{cases} 0 & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon & \text{otherwise} \end{cases}$$
If $\xi \in \mathbb{R}^n$, then $|\xi|_\varepsilon \in \mathbb{R}^n$ is defined componentwise: $(|\xi|_\varepsilon)_i = |\xi_i|_\varepsilon$, $i = 1, \dots, n$.
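The componentwise definition translates directly into code; a small NumPy illustration (the function name is mine):

```python
import numpy as np

def eps_insensitive(residual, eps):
    """|r|_eps = max(0, |r| - eps), applied componentwise."""
    return np.maximum(0.0, np.abs(residual) - eps)

# Errors smaller than eps cost nothing; larger ones are charged linearly.
r = np.array([-0.3, -0.05, 0.0, 0.08, 0.5])
print(eps_insensitive(r, eps=0.1))   # [0.2  0.   0.   0.   0.4]
```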
[Figure: data points scattered around a linear fit $f(x) = x'w + b$ with an $\varepsilon$-insensitive tube]

$\varepsilon$-Insensitive Linear Regression

$f(x) = x'w + b$. Points outside the tube incur errors $y_j - f(x_j) - \varepsilon$ (above the tube) and $f(x_k) - y_k - \varepsilon$ (below the tube).
Find $(w, b)$ with the smallest overall error.
$\varepsilon$-Insensitive Support Vector Regression Model

Motivated by SVM: $\|w\|_2$ should be as small as possible, and tiny errors (within $\varepsilon$) should be discarded:
$$\min_{(w, b, \xi) \in \mathbb{R}^{n+1+m}} \frac{1}{2}\|w\|_2^2 + C e'|\xi|_\varepsilon$$
where $|\xi|_\varepsilon \in \mathbb{R}^m$ and $(|\xi|_\varepsilon)_i = \max(0,\ |A_i w + b - y_i| - \varepsilon)$.
Reformulated $\varepsilon$-SVR as a Constrained Minimization Problem

$$\min_{(w, b, \xi, \xi^*) \in \mathbb{R}^{n+1+2m}} \frac{1}{2}w'w + C e'(\xi + \xi^*)$$
subject to
$$y - Aw - eb \le e\varepsilon + \xi, \qquad Aw + eb - y \le e\varepsilon + \xi^*, \qquad \xi, \xi^* \ge 0$$
This is a minimization problem with $n+1+2m$ variables and $2m$ inequality constraints, which enlarges the problem size and the computational complexity of solving it.
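For reference, this constrained formulation can be handed to a general-purpose QP solver. A minimal sketch using CVXPY, which is a convenience assumption here (the slides do not prescribe a solver), with exactly the $n+1+2m$ variables and $2m$ tube constraints above:

```python
import numpy as np
import cvxpy as cp

def eps_svr_qp(A, y, C=1.0, eps=0.1):
    """Solve the constrained eps-SVR QP: n+1+2m variables, 2m tube constraints."""
    m, n = A.shape
    w, b = cp.Variable(n), cp.Variable()
    xi = cp.Variable(m, nonneg=True)        # slack above the eps-tube
    xi_star = cp.Variable(m, nonneg=True)   # slack below the eps-tube
    objective = 0.5 * cp.sum_squares(w) + C * cp.sum(xi + xi_star)
    constraints = [y - A @ w - b <= eps + xi,
                   A @ w + b - y <= eps + xi_star]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return w.value, b.value
```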
SV Regression by Minimizing Quadratic $\varepsilon$-Insensitive Loss

We minimize $\frac{1}{2}\|(w, b)\|_2^2$ at the same time (Occam's razor: the simplest is the best). We have the following (nonsmooth) problem:
$$\min_{(w, b, \xi) \in \mathbb{R}^{n+1+l}} \frac{1}{2}(\|w\|_2^2 + b^2) + \frac{C}{2}\,\||\xi|_\varepsilon\|_2^2$$
where $(|\xi|_\varepsilon)_i = |y_i - (w'x_i + b)|_\varepsilon$. This formulation has the strong convexity property.
$\varepsilon$-Insensitive Loss Function
$$|x|_\varepsilon = (x - \varepsilon)_+ + (-x - \varepsilon)_+$$

Quadratic $\varepsilon$-Insensitive Loss Function
$$|x|_\varepsilon^2 = ((x - \varepsilon)_+ + (-x - \varepsilon)_+)^2 = (x - \varepsilon)_+^2 + (-x - \varepsilon)_+^2$$
since $(x - \varepsilon)_+ \cdot (-x - \varepsilon)_+ = 0$.

Use the $p_\varepsilon^2$-function to replace the quadratic $\varepsilon$-insensitive function:
$$p_\varepsilon^2(x, \beta) = (p(x - \varepsilon, \beta))^2 + (p(-x - \varepsilon, \beta))^2$$
where $p$ is defined by
$$p(x, \beta) = x + \frac{1}{\beta}\log(1 + e^{-\beta x})$$
[Figures: the $p$-function with $\beta = 10$, $p(x, 10)$ for $x \in [-3, 3]$; and $|x|_\varepsilon^2$ vs. $p_\varepsilon^2(x, \beta)$ with $\varepsilon = 1$, $\beta = 5$]
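A direct NumPy transcription of these definitions (function names are mine), with a quick numeric check that $p_\varepsilon^2$ approaches $|x|_\varepsilon^2$ for moderate $\beta$:

```python
import numpy as np

def p(x, beta):
    """p(x, beta) = x + (1/beta) log(1 + exp(-beta x)), in a numerically stable form."""
    return np.logaddexp(0.0, beta * x) / beta

def p2_eps(x, eps, beta):
    """Smooth quadratic eps-insensitive loss: p(x - eps)^2 + p(-x - eps)^2."""
    return p(x - eps, beta) ** 2 + p(-x - eps, beta) ** 2

def abs2_eps(x, eps):
    """Exact quadratic eps-insensitive loss |x|_eps^2."""
    return np.maximum(0.0, np.abs(x) - eps) ** 2

x = np.linspace(-3.0, 3.0, 601)
gap = np.max(np.abs(p2_eps(x, 1.0, 5.0) - abs2_eps(x, 1.0)))
print(gap)   # the approximation gap shrinks as beta grows
```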
$\varepsilon$-Insensitive Smooth Support Vector Regression

Replacing the quadratic $\varepsilon$-insensitive loss by its smooth approximation gives
$$\min_{(w, b) \in \mathbb{R}^{n+1}} \Phi_{\varepsilon, \beta}(w, b) := \frac{1}{2}(w'w + b^2) + \frac{C}{2}\sum_{i=1}^{m} p_\varepsilon^2(A_i w + b - y_i,\ \beta)$$
in place of
$$\min_{(w, b) \in \mathbb{R}^{n+1}} \frac{1}{2}(w'w + b^2) + \frac{C}{2}\sum_{i=1}^{m} |A_i w + b - y_i|_\varepsilon^2$$
This is a strongly convex minimization problem without any constraints. The objective function is twice differentiable, so we can use a fast Newton-Armijo method to solve it.
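Since $\Phi_{\varepsilon, \beta}$ is unconstrained and smooth, any second-order or quasi-Newton routine can minimize it. A sketch using SciPy's BFGS as a stand-in for the Newton-Armijo method of the slides (helper names are mine):

```python
import numpy as np
from scipy.optimize import minimize

def p2_eps(x, eps, beta):
    """Smooth quadratic eps-insensitive loss p_eps^2(x, beta)."""
    p = lambda z: np.logaddexp(0.0, beta * z) / beta   # smooth plus function
    return p(x - eps) ** 2 + p(-x - eps) ** 2

def ssvr_linear(A, y, C=1.0, eps=0.1, beta=5.0):
    """Minimize Phi_{eps,beta}(w, b) = (w'w + b^2)/2 + (C/2) sum_i p_eps^2(A_i w + b - y_i)."""
    m, n = A.shape
    def phi(z):
        w, b = z[:-1], z[-1]
        r = A @ w + b - y                              # residuals A_i w + b - y_i
        return 0.5 * (w @ w + b * b) + 0.5 * C * np.sum(p2_eps(r, eps, beta))
    z = minimize(phi, np.zeros(n + 1), method="BFGS").x
    return z[:-1], z[-1]                               # (w, b)
```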
Nonlinear $\varepsilon$-SVR

Based on the duality theorem and the KKT optimality conditions, $w = A'\alpha$ with $\alpha \in \mathbb{R}^m$. In the nonlinear case:
$$y \approx Aw + eb \;\Rightarrow\; y \approx AA'\alpha + eb \;\Rightarrow\; y \approx K(A, A')\alpha + eb$$

Nonlinear SVR

Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times l}$; the kernel maps $\mathbb{R}^{m \times n} \times \mathbb{R}^{n \times l} \Rightarrow \mathbb{R}^{m \times l}$, so $K(A_i, A') \in \mathbb{R}^{1 \times m}$. The nonlinear SVR problem is
$$\min_{(\alpha, b) \in \mathbb{R}^{m+1}} \frac{1}{2}\|\alpha\|_2^2 + C \sum_{i=1}^{m} |K(A_i, A')\alpha + b - y_i|_\varepsilon$$
Nonlinear regression function: $f(x) = K(x, A')\alpha + b$.
Nonlinear Smooth Support Vector $\varepsilon$-Insensitive Regression

Smoothing the quadratic $\varepsilon$-insensitive loss as before gives
$$\min_{(\alpha, b) \in \mathbb{R}^{m+1}} \frac{1}{2}(\alpha'\alpha + b^2) + \frac{C}{2}\sum_{i=1}^{m} p_\varepsilon^2(K(A_i, A')\alpha + b - y_i,\ \beta)$$
in place of
$$\min_{(\alpha, b) \in \mathbb{R}^{m+1}} \frac{1}{2}(\alpha'\alpha + b^2) + \frac{C}{2}\sum_{i=1}^{m} |K(A_i, A')\alpha + b - y_i|_\varepsilon^2$$
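The kernel version only swaps $A_i w$ for $K(A_i, A')\alpha$. A sketch under the same assumptions as the linear one, returning the regression function $f(x) = K(x, A')\alpha + b$:

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(A, B, mu):
    """K(A, B')_{ij} = exp(-mu * ||A_i - B_j||^2)."""
    return np.exp(-mu * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))

def ssvr_nonlinear(A, y, C=1.0, eps=0.1, beta=5.0, mu=1.0):
    """Minimize (a'a + b^2)/2 + (C/2) sum_i p_eps^2(K(A_i, A') a + b - y_i, beta)."""
    p = lambda z: np.logaddexp(0.0, beta * z) / beta   # smooth plus function
    K = gaussian_kernel(A, A, mu)                      # full m x m kernel
    def phi(z):
        a, b = z[:-1], z[-1]
        r = K @ a + b - y
        return 0.5 * (a @ a + b * b) + 0.5 * C * np.sum(p(r - eps) ** 2 + p(-r - eps) ** 2)
    z = minimize(phi, np.zeros(len(A) + 1), method="BFGS").x
    a, b = z[:-1], z[-1]
    return lambda X: gaussian_kernel(np.atleast_2d(X), A, mu) @ a + b   # f(x) = K(x, A') a + b
```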
Numerical Results

Training and testing sets are split by the slice method. A Gaussian kernel is used to generate the nonlinear $\varepsilon$-SVR in all experiments. The reduced kernel technique is utilized when the training dataset is bigger than 1000 points. Error measure: 2-norm relative error
$$\frac{\|y - \hat{y}\|_2}{\|y\|_2}, \qquad y: \text{observations}, \quad \hat{y}: \text{predicted values}$$
First Artificial Dataset: 101 Data Points in $\mathbb{R} \times \mathbb{R}$

$f(x) = 0.5\,\mathrm{sinc}\!\left(\frac{\pi}{10}x\right) + \text{noise}$, $x \in [-1, 1]$, 101 points
Noise: mean $= 0$, $\sigma = 0.04$
Nonlinear SSVR with Gaussian kernel $e^{-\mu\|x_i - x_j\|_2^2}$
Parameters: $\nu = 50$, $\mu = 5$, $\varepsilon = 0.02$
Training time: 0.3 sec.
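A sketch reproducing this setup, assuming the $\mathrm{sinc}(\frac{\pi}{10}x)$ reading of the target function, $C$ standing in for $\nu$, and the `ssvr_nonlinear` helper sketched earlier:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 101).reshape(-1, 1)        # 101 points in [-1, 1]
y = (0.5 * np.sinc(x / 10)).ravel() \
    + rng.normal(0.0, 0.04, size=101)                 # noise: mean 0, sigma = 0.04
# np.sinc(z) = sin(pi z) / (pi z), so np.sinc(x / 10) = sinc(pi x / 10)

f_hat = ssvr_nonlinear(x, y, C=50.0, eps=0.02, beta=5.0, mu=5.0)
rel_err = np.linalg.norm(y - f_hat(x)) / np.linalg.norm(y)   # 2-norm relative error
```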
Comparison with LIBSVM on $f(x) = 0.5\,\frac{\sin(\frac{\pi}{30}x)}{\frac{\pi}{30}x} + \text{random noise}$ (mean $= 0$, standard deviation $0.04$):

$\varepsilon$-SSVR: training time 0.016 sec., error 0.059
LIBSVM: training time 0.015 sec., error 0.068
481 Data Points in $\mathbb{R}^2 \times \mathbb{R}$

[Figure: original function vs. estimated function]
Noise: mean $= 0$, $\sigma = 0.4$
Parameters: $\nu = 50$, $\mu = 1$, $\varepsilon = 0.5$
Training time: 9.61 sec.; Mean Absolute Error (MAE) over 49x49 mesh points: 0.1761
Estimated Function Using Reduced Kernel: $K(A, \bar{A}') \in \mathbb{R}^{28900 \times 300}$

[Figure: original function vs. estimated function]
Noise: mean $= 0$, $\sigma = 0.4$
Parameters: $C = 10000$, $\mu = 1$, $\varepsilon = 0.2$
Training time: 22.58 sec.; MAE over 49x49 mesh points: 0.0513
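The reduced kernel variant used here replaces the square kernel with a rectangular one built from a random subset, exactly as in RSVM. A minimal adaptation of the earlier nonlinear sketch (names and defaults are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def ssvr_reduced(A, y, m_bar=300, C=10000.0, eps=0.2, beta=5.0, mu=1.0, seed=0):
    """Nonlinear SSVR with reduced rectangular kernel K(A, Abar') in R^{m x m_bar}."""
    rng = np.random.default_rng(seed)
    Abar = A[rng.choice(len(A), size=m_bar, replace=False)]
    kern = lambda X: np.exp(-mu * ((np.atleast_2d(X)[:, None, :]
                                    - Abar[None, :, :]) ** 2).sum(-1))
    K = kern(A)                                       # m x m_bar reduced kernel
    p = lambda z: np.logaddexp(0.0, beta * z) / beta  # smooth plus function
    def phi(z):
        a, b = z[:-1], z[-1]
        r = K @ a + b - y                             # residuals
        return 0.5 * (a @ a + b * b) + 0.5 * C * np.sum(p(r - eps) ** 2 + p(-r - eps) ** 2)
    z = minimize(phi, np.zeros(m_bar + 1), method="BFGS").x
    a, b = z[:-1], z[-1]
    return lambda X: kern(X) @ a + b                  # f(x) = K(x, Abar') a + b
```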
Real Datasets

Linear $\varepsilon$-SSVR: Tenfold Numerical Results
Nonlinear $\varepsilon$-SSVR: Tenfold Numerical Results (1/2)
Nonlinear $\varepsilon$-SSVR: Tenfold Numerical Results (2/2)
[Tables: tenfold cross-validation results on the real datasets]