Knowledge extraction from support vector machines
Knowledge Extraction from
Support Vector Machines:
A Fuzzy Logic Approach
“A certain class of SVMs is mathematically equivalent to a FARB”
What is SVM
What does it do?
Learns a hyperplane that classifies data into two classes.
What is a hyperplane?
A hyperplane is a function like the equation for a line,
𝑦 = 𝑚𝑥 + 𝑏
In fact, for a simple classification task with just 2 features, the hyperplane can be a line.
SVM finds the optimal solution.
Support Vector Machine
SVM attempts to maximize the margin, so that the hyperplane is just as far from the red balls as from the blue balls. In this way, it decreases the chance of misclassification.
More Formally
Input:
set of (input, output) training pair samples.
Output:
set of weights w (or 𝑤𝑖), one for each feature, whose linear combination predicts the value of y.
We use the optimization of maximizing the margin (‘street width’) to reduce the number of nonzero weights to just a few, corresponding to the important features that ‘matter’ in deciding the separating line (hyperplane). These nonzero weights correspond to the support vectors (because they ‘support’ the separating hyperplane).
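As a sketch of this idea (assuming scikit-learn's SVC and an illustrative toy dataset), the fitted weight vector w is a linear combination of the few support vectors alone:

```python
# Sketch: train a linear SVM and inspect which training points "support"
# the separating hyperplane. Data and parameter values are illustrative.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two linearly separable blobs in 2-D (two features).
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)),
               rng.normal(+2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Only a few points end up with nonzero dual weights: the support vectors.
print("support vectors:", len(clf.support_vectors_), "of", len(X))

# The weight vector is a combination of the support vectors alone:
# w = sum_i (alpha_i * y_i) * s_i, where dual_coef_ holds alpha_i * y_i.
w = clf.dual_coef_ @ clf.support_vectors_
assert np.allclose(w, clf.coef_)
```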
The optimization problem
minimize f(w) ≡ (1/2)‖w‖²
subject to g_i(w, b) ≡ −y_i(w ⋅ x_i + b) + 1 ≤ 0,  i = 1…m
We use Lagrange multipliers to get this problem into a form that can be solved analytically.
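A minimal numeric sketch of that route, assuming a tiny hand-made dataset and SciPy's SLSQP solver in place of a dedicated QP/SMO routine: we minimize the negated Wolfe dual of the problem above and recover w and b from the multipliers:

```python
# Sketch: solve the dual of the hard-margin SVM problem numerically for a
# tiny 2-D dataset (toy data; real SVM libraries use specialized solvers).
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T   # G_ij = y_i y_j (x_i . x_j)

def neg_dual(a):
    # Negated dual objective: minimize -(sum_i a_i - 1/2 a^T G a).
    return -(a.sum() - 0.5 * a @ G @ a)

res = minimize(neg_dual, x0=np.zeros(4), method="SLSQP",
               bounds=[(0, None)] * 4,
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x

# Recover the primal solution: w = sum_i alpha_i y_i x_i,
# and b from a support vector (alpha_i > 0): b = y_i - w . x_i.
w = (alpha * y) @ X
i = int(np.argmax(alpha))
b = y[i] - w @ X[i]
print("w =", w, "b =", b)
assert np.all(np.sign(X @ w + b) == y)   # the recovered hyperplane separates the data
```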
What if things get more complicated?
Throw the balls into the air. If they are thrown up in just the right way, you can use a large sheet of paper to divide them while they are in the air.
mapping data to a high dimensional space
Kernel
polynomial: K(x_i, x_j) = (x_i ⋅ x_j + c)^p
Gaussian radial basis function: K(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²))
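The two kernels can be written down directly; a small sketch with illustrative values for c, p and σ:

```python
# Sketch: the polynomial and Gaussian (RBF) kernels as plain functions.
# The values of c, p and sigma below are illustrative defaults.
import numpy as np

def poly_kernel(xi, xj, c=1.0, p=2):
    # (xi . xj + c)^p
    return (np.dot(xi, xj) + c) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    # exp(-||xi - xj||^2 / (2 sigma^2))
    d = np.asarray(xi) - np.asarray(xj)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

x1, x2 = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(poly_kernel(x1, x2))   # (1*0 + 2*1 + 1)^2 = 9.0
print(rbf_kernel(x1, x1))    # identical points -> 1.0
```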
SVM does its thing, maps them into a higher dimension and then finds the hyperplane to separate the classes.
Where does SVM get its name from?
• The decision function is fully specified by a (usually very small) subset of training samples, the support vectors.
• Support vectors are the data points that lie closest to the decision surface (or hyperplane)
• They are the data points most difficult to classify
• They have direct bearing on the optimum location of the decision surface
• they ‘support’ the separating hyperplane
Knowledge Extraction
Extracting the knowledge learned by a black-box classifier and representing it in a comprehensible form.
Knowledge Extraction
Benefits :
• Validation
• Feature extraction
• Knowledge refinement and improvement
• Knowledge acquisition for symbolic AI systems
• Scientific discovery
Knowledge Extraction: Rule Extraction
• Methods for RE from ANNs have been classified into three categories:
Decompositional
Pedagogical
Eclectic
decompositional approach for KE
SVM: The I/O mapping of the trained SVM, f : Rⁿ → {−1, 1}, is given by f(x) = sign(h(x)), with h(x) = b* + Σ_{i=1}^{Nsv} α*_i y_i K(x, s^i).
decompositional approach for KE
“A certain class of SVMs is mathematically equivalent to a FARB”
What is FARB !?
Let’s take an example first !
Example:
Input: q ∈ R,
Output: O ∈ R,
and a0, a1, k ∈ R, with k > 0.
Rules:
R1: If q is equal to k Then O = a0 + a1,
R2: If q is equal to −k Then O = a0 − a1,
- Linguistic terms: equal to k, equal to −k
- To express fuzziness, Gaussian membership functions are used:
μ_{=k}(q) = exp(−(q − k)² / (2σ²)),  μ_{=−k}(q) = exp(−(q + k)² / (2σ²))
These functions satisfy:
(μ_{=k}(q) − μ_{=−k}(q)) / (μ_{=k}(q) + μ_{=−k}(q)) = tanh(qk/σ²)
Applying the singleton fuzzifier and center-of-gravity defuzzifier yields:
O(q) = a0 + a1 tanh(qk/σ²)
But what does this output mean?!
Take a deeper look!
It is a feedforward ANN with a single neuron, employing the activation function tanh()!
So: this FRB is equivalent to an ANN.
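A numeric check of this equivalence, assuming Gaussian MFs μ_{=±k}(q) = exp(−(q ∓ k)²/(2σ²)) and illustrative values for a0, a1, k, σ: center-of-gravity inference over the two rules collapses to a single tanh neuron.

```python
# Numeric check: with Gaussian membership functions for "equal to k" and
# "equal to -k", center-of-gravity inference collapses to a tanh neuron.
# Assumed MF shape: mu_{=k}(q) = exp(-(q - k)^2 / (2 sigma^2)).
import numpy as np

k, sigma, a0, a1 = 2.0, 1.5, 0.3, 0.7   # example values
q = np.linspace(-5, 5, 101)

mu_pos = np.exp(-(q - k) ** 2 / (2 * sigma ** 2))
mu_neg = np.exp(-(q + k) ** 2 / (2 * sigma ** 2))

# COG over the two rules (O = a0 + a1 and O = a0 - a1),
# weighted by each rule's degree of firing:
O = ((a0 + a1) * mu_pos + (a0 - a1) * mu_neg) / (mu_pos + mu_neg)

# The same output as a single tanh "neuron":
O_tanh = a0 + a1 * np.tanh(q * k / sigma ** 2)
assert np.allclose(O, O_tanh)
```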
This FRB, in particular, satisfies the definition of a FARB, which is:
To get the same output, apply the same steps as in the example:
But how is this output at all close to the one in the example?!
Many other MFs satisfy this output, given specific values of z, u, v, r and g, such as the logistic function and others.
Apply: zi = ui = 1, vi = ri = 0, and gi(x) = tanh(x).
Result
Kolman and Margaliot: Every standard ANN has a corresponding FARB.
There’s a transformation T:
This work extends that result: a certain class of SVMs satisfies the transformation P:
“A certain class of SVMs is mathematically equivalent to a FARB”
The SVM-FARB Equivalence
h(x) = b* + Σ_{i=1}^{Nsv} α*_i y_i K(x, s^i)   (2)

O(q) = a0 + Σ_{i=1}^{m} r_i a_i + Σ_{i=1}^{m} z_i a_i g_i(u_i q_i + v_i)   (8)
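The expansion h(x) = b* + Σ α*_i y_i K(x, s^i) can be checked against a fitted classifier; a sketch assuming scikit-learn's SVC, whose dual_coef_ stores α*_i y_i and intercept_ stores b* (toy data, RBF kernel):

```python
# Sketch: reproduce the SVM output h(x) = b* + sum_i alpha*_i y_i K(x, s^i)
# directly from a fitted classifier's support vectors (RBF kernel, toy data).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 0.6, (25, 2)), rng.normal(1.5, 0.6, (25, 2))])
y = np.array([-1] * 25 + [1] * 25)

gamma = 0.5
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)

def h(x):
    # dual_coef_[0, i] stores alpha*_i y_i; support_vectors_ are the s^i.
    K = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    return clf.dual_coef_[0] @ K + clf.intercept_[0]

x_test = np.array([0.7, -0.2])
assert np.isclose(h(x_test), clf.decision_function([x_test])[0])
```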
Theorem 2 (SVM-FARB equivalence). Find a FARB with m = Nsv and parameters q_i, a0, a_i, z_i, r_i, u_i, v_i, g_i so that these conditions hold:

a0 + Σ_{i=1}^{m} r_i a_i = b*,
z_i a_i = α*_i y_i,
g_i(u_i q_i + v_i) = K(x, s^i),  i = 1…m   (15)
Pause and Think
• Let’s say we have a FARB, with rules of the form:
If q_1 is … and … and q_m is …,
Then O = a0 ± a1 ± … ± a_m   (7)
• How many rules have we got? Each q_i has two linguistic terms, so there are 2^m rules, one for every combination.
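A quick enumeration (with placeholder linguistic terms) confirms the count:

```python
# Each q_i takes one of two linguistic terms ("equal to k_i" / "equal to -k_i"),
# so a FARB over m inputs has 2^m rules. Enumerate them for m = 3:
from itertools import product

m = 3
rules = list(product(["=k", "=-k"], repeat=m))
assert len(rules) == 2 ** m   # 8 rules
for terms in rules:
    print("If " + " and ".join(f"q{i+1} is {t}" for i, t in enumerate(terms)))
```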
Famous SVM Kernels
K(x, y) = x^T y   (linear kernel)
K(x, y) = (1 + x^T y / c)^d,  c > 0, d ∈ N   (polynomial kernel)
K(x, y) = tanh(ρ x^T y − δ),  ρ > 0, δ ≥ 0   (MLP kernel)
K(x, y) = exp(−‖x − y‖² / (2σ̂²)),  σ̂ ∈ R   (RBF or Gaussian kernel)
Corollary 1 {MLP kernel}
For the MLP kernel, the SVM output (2) becomes:

h(x) = b* + Σ_{i=1}^{Nsv} α*_i y_i tanh(ρ x^T s^i − δ)   (17)

The Gaussian membership functions satisfy:

(μ_{=k}(q) − μ_{=−k}(q)) / (μ_{=k}(q) + μ_{=−k}(q)) = tanh(qk/σ²)

These parameters will satisfy the (15) conditions:

q_i = x^T s^i,
z_i = 1,  r_i = 0,
u_i = ρ,  v_i = −δ,
g_i(x) = tanh(x),
a0 = b*,  a_i = α*_i y_i

With this choice, the FARB output (8) coincides with h(x) in (17).
Pause and Think
The q_i appear in the FARB if-part. What could this mean?

q_i = x^T s^i = ‖x‖ ‖s^i‖ cos θ_i

so q_i = cos θ_i when x and s^i are normalized.
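A small check with example vectors: for unit-norm x and s^i, the inner product is exactly the cosine of the angle between them:

```python
# Check: for unit-norm x and s, q = x^T s is the cosine of the angle
# between the input and the support vector (example vectors below).
import numpy as np

x = np.array([3.0, 4.0]); x /= np.linalg.norm(x)   # normalize to unit length
s = np.array([1.0, 0.0])                           # already unit norm

q = x @ s
angle = np.arccos(np.clip(q, -1.0, 1.0))
assert np.isclose(q, np.cos(angle))
print(q)   # here q = cos(theta) = 0.6
```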
Corollary 2 {Gaussian kernel}
For the Gaussian (RBF) kernel, the SVM output (2) becomes:

h(x) = b* + Σ_{i=1}^{Nsv} α*_i y_i exp(−‖x − s^i‖² / (2σ̂²))   (18)

The membership functions satisfy:

(μ_{=k}(q) − μ_{≠k}(q)) / (μ_{=k}(q) + μ_{≠k}(q)) = 2 exp(−(q − k)² / (2σ²)) − 1

These parameters will satisfy the (15) conditions:

q_i = ‖x − s^i‖,  k_i = 0,  σ_i = σ̂ > 0,
z_i = 2,  r_i = −1,
u_i = 1,  v_i = 0,
g_i(x) = exp(−x² / (2σ̂²)),
a0 = b* + (1/2) Σ_{i=1}^{Nsv} α*_i y_i,
a_i = α*_i y_i / 2

With this choice, the FARB output (8) coincides with h(x) in (18).
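A numeric sketch of the Gaussian-kernel case, assuming the FARB parameter choice z_i = 2, r_i = −1, u_i = 1, v_i = 0, g_i(x) = exp(−x²/(2σ̂²)), a_i = α*_i y_i/2, a0 = b* + (1/2) Σ α*_i y_i (one choice satisfying the equivalence conditions), and scikit-learn's SVC with gamma = 1/(2σ̂²):

```python
# Sketch: build FARB parameters from a trained RBF SVM and check that the
# FARB output O(q) reproduces the SVM output h(x) on toy data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.5, 0.5, (20, 2)), rng.normal(1.5, 0.5, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

sigma_hat = 1.0
clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma_hat ** 2)).fit(X, y)

ay = clf.dual_coef_[0]          # alpha*_i y_i
b_star = clf.intercept_[0]      # b*

# Assumed FARB parameters: z_i = 2, r_i = -1, u_i = 1, v_i = 0,
# g_i(t) = exp(-t^2 / (2 sigma_hat^2)), a_i = alpha*_i y_i / 2,
# a0 = b* + (1/2) sum_i alpha*_i y_i.
a = ay / 2.0
a0 = b_star + ay.sum() / 2.0
z, r = 2.0, -1.0

def O(x):
    q = np.linalg.norm(clf.support_vectors_ - x, axis=1)   # q_i = ||x - s^i||
    g = np.exp(-q ** 2 / (2 * sigma_hat ** 2))
    return a0 + (r * a).sum() + (z * a * g).sum()

x_test = np.array([0.3, -0.8])
assert np.isclose(O(x_test), clf.decision_function([x_test])[0])
```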
Experimentation {Iris data set}
• 150 examples
• sepal length
• sepal width
• petal length
• petal width
• 3 classes
• Setosa
• Versicolor
• Virginica
SVM
Results {SVM1}
Results {SVM2}