Knowledge Extraction from Support Vector Machines: A Fuzzy Logic Approach

Transcript of the slides

Page 1: Knowledge extraction from support vector machines

Knowledge Extraction from Support Vector Machines: A Fuzzy Logic Approach

Page 2: Knowledge extraction from support vector machines

“A certain class of SVMs is mathematically equivalent to a FARB.”

Page 3: Knowledge extraction from support vector machines

What is an SVM?

Page 4: Knowledge extraction from support vector machines

What does it do?

It learns a hyperplane that classifies data into two classes.

Page 5: Knowledge extraction from support vector machines

What is a hyperplane?

A hyperplane is defined by an equation, much like the equation of a line,

$y = mx + b$

In fact, for a simple classification task with just two features, the hyperplane can be a line.

Page 6: Knowledge extraction from support vector machines

SVM finds the optimal separating hyperplane.

Page 7: Knowledge extraction from support vector machines

Support Vector Machine

SVM attempts to maximize the margin, so that the hyperplane is just as far from the nearest red ball as from the nearest blue ball. In this way, it decreases the chance of misclassification.

Page 8: Knowledge extraction from support vector machines

More Formally

Input:

set of (input, output) training pair samples.

Output:

a set of weights $w$ (one $w_i$ per feature) whose linear combination predicts the value of $y$.
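To make this input/output contract concrete, here is a minimal sketch (ours; the slides do not prescribe a library) that fits a linear SVM with scikit-learn and reads back the learned weights $w$ and bias $b$:

```python
# A minimal sketch (not from the slides): fitting a linear SVM and
# reading back the learned weight vector w and bias b.
import numpy as np
from sklearn.svm import SVC

# Toy training pairs: two features, two classes (+1 / -1).
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],        # class +1
              [-1.0, -1.0], [-2.0, -1.0], [-3.0, -2.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]        # one weight per feature
b = clf.intercept_[0]   # bias term
print("w =", w, "b =", b)

# Prediction is sign(w . x + b), the linear combination from the slide.
print(clf.predict([[2.0, 2.0], [-2.0, -2.0]]))  # -> [ 1 -1]
```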

Page 9: Knowledge extraction from support vector machines

We use the optimization of maximizing the margin (‘street width’) to reduce the number of nonzero weights to just the few that correspond to the important features that ‘matter’ in deciding the separating line (hyperplane). These nonzero weights correspond to the support vectors (because they ‘support’ the separating hyperplane).

Page 10: Knowledge extraction from support vector machines

The optimization problem

minimize $f(\boldsymbol{w}) \equiv \tfrac{1}{2}\lVert \boldsymbol{w} \rVert^2$

subject to $g(\boldsymbol{w}, b) \equiv -y_i(\boldsymbol{w} \cdot \boldsymbol{x}_i + b) + 1 \le 0, \quad i = 1, \dots, m$

We use Lagrange multipliers to bring this problem into a form that can be solved analytically.
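For reference, here is the Lagrangian step the slide alludes to, written out (a textbook reconstruction, not taken verbatim from the deck):

```latex
% Lagrangian of the primal problem, one multiplier per constraint:
L(\boldsymbol{w}, b, \alpha) = \tfrac{1}{2}\lVert\boldsymbol{w}\rVert^2
  - \sum_{i=1}^{m} \alpha_i \left[ y_i(\boldsymbol{w}\cdot\boldsymbol{x}_i + b) - 1 \right],
\qquad \alpha_i \ge 0.

% Setting the derivatives with respect to w and b to zero gives
\boldsymbol{w} = \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i,
\qquad \sum_{i=1}^{m} \alpha_i y_i = 0,

% and substituting back yields the dual problem:
\max_{\alpha} \; \sum_{i=1}^{m} \alpha_i
  - \tfrac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}
    \alpha_i \alpha_j y_i y_j \, \boldsymbol{x}_i \cdot \boldsymbol{x}_j .
```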

Page 11: Knowledge extraction from support vector machines

What if things get more complicated?

Page 12: Knowledge extraction from support vector machines

Throw the balls in the air. While the balls are in the air, thrown up in just the right way, you use a large sheet of paper to divide them.

This is mapping the data to a high-dimensional space.

Page 13: Knowledge extraction from support vector machines

Kernel

polynomial: $K(\boldsymbol{x}_i, \boldsymbol{x}_j) = (\boldsymbol{x}_i \cdot \boldsymbol{x}_j + c)^p$

Gaussian radial basis function: $K(\boldsymbol{x}_i, \boldsymbol{x}_j) = \exp\!\left(-\lVert \boldsymbol{x}_i - \boldsymbol{x}_j \rVert^2 / (2\sigma^2)\right)$

SVM does its thing: it maps the data into a higher dimension and then finds the hyperplane that separates the classes.
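A small illustration of why this matters (our example, not from the slides): concentric-circle data has no separating line in the plane, but an RBF-kernel SVM separates it easily through the implicit high-dimensional mapping:

```python
# Sketch of the "throw the balls in the air" idea: data that is not
# linearly separable in the plane becomes separable after the implicit
# mapping performed by an RBF kernel.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)  # gamma = 1/(2*sigma^2)

print("linear kernel accuracy:", linear.score(X, y))  # poor: no separating line exists
print("RBF kernel accuracy:   ", rbf.score(X, y))     # near 1.0
```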

Page 14: Knowledge extraction from support vector machines
Page 15: Knowledge extraction from support vector machines

Where does SVM get its name from?

• The decision function is fully specified by a (usually very small) subset of training samples, the support vectors.

• Support vectors are the data points that lie closest to the decision surface (or hyperplane)

• They are the data points most difficult to classify

• They have direct bearing on the optimum location of the decision surface

• They ‘support’ the separating hyperplane; the sketch below shows how to read them off a fitted model
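A minimal sketch (ours) confirming that a fitted scikit-learn model keeps only this small subset of the training points:

```python
# The fitted classifier keeps only the support vectors, the points that
# lie closest to the decision surface.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, random_state=6)
clf = SVC(kernel="linear", C=1000).fit(X, y)

print("training points: ", len(X))
print("support vectors: ", len(clf.support_vectors_))  # usually a small subset
print(clf.support_vectors_)  # the points that 'support' the hyperplane
```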

Page 16: Knowledge extraction from support vector machines

Knowledge Extraction

Extracting the knowledge learned by a black-box classifier and representing it in a comprehensible form.

Page 17: Knowledge extraction from support vector machines

Knowledge Extraction

Benefits:

• Validation

• Feature extraction

• Knowledge refinement and improvement

• Knowledge acquisition for symbolic AI systems

• Scientific discovery

Page 18: Knowledge extraction from support vector machines

Knowledge Extraction: Rule Extraction

• Methods for rule extraction (RE) from ANNs have been classified into three categories:

Decompositional

Pedagogical

Eclectic

Page 19: Knowledge extraction from support vector machines

Decompositional approach for KE

SVM: the IO mapping of the trained SVM, $f : \mathbb{R}^n \to \{-1, 1\}$, is given by

$f(\boldsymbol{x}) = \operatorname{sign}(h(\boldsymbol{x})), \qquad h(\boldsymbol{x}) = b^* + \sum_{i=1}^{N_{sv}} \alpha_i^* y_i K(\boldsymbol{x}, \boldsymbol{s}_i) \qquad (2)$

where the $\boldsymbol{s}_i$ are the support vectors, the $\alpha_i^*$ the optimal Lagrange multipliers, and $b^*$ the bias.
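To see this mapping concretely, here is a sketch (ours) that rebuilds $h(\boldsymbol{x})$ by hand from a fitted scikit-learn SVC, using the support vectors $\boldsymbol{s}_i$, the signed multipliers $\alpha_i^* y_i$ (exposed as `dual_coef_`), and $b^*$ (`intercept_`):

```python
# Rebuilding h(x) = b* + sum_i alpha_i* y_i K(x, s_i) by hand from a fitted
# SVC, and checking it against the library's own decision_function.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)

S = clf.support_vectors_          # s_i
coef = clf.dual_coef_[0]          # alpha_i* y_i (signed multipliers)
b = clf.intercept_[0]             # b*

K = rbf_kernel(X, S, gamma=0.5)   # K(x, s_i) for every training point
h = K @ coef + b                  # h(x) = b* + sum_i alpha_i* y_i K(x, s_i)

assert np.allclose(h, clf.decision_function(X))
print("manual h(x) matches decision_function")
```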

Page 20: Knowledge extraction from support vector machines

Decompositional approach for KE

Page 21: Knowledge extraction from support vector machines
Page 22: Knowledge extraction from support vector machines
Page 23: Knowledge extraction from support vector machines

“A certain class of SVMs is mathematically equivalent to a FARB.”

Page 24: Knowledge extraction from support vector machines

What is a FARB!?

Let’s take an example first!

Page 25: Knowledge extraction from support vector machines

Example:

Input: $q \in \mathbb{R}$,

Output: $O \in \mathbb{R}$,

And: $a_0, a_1, k \in \mathbb{R}$, with $k > 0$.

Rules:

R1: If $q$ is equal to $k$, Then $O = a_0 + a_1$,

R2: If $q$ is equal to $-k$, Then $O = a_0 - a_1$.

Page 26: Knowledge extraction from support vector machines

- Linguistic terms: equal to $k$, equal to $-k$

- To express fuzziness, Gaussian membership functions are used:

$\mu_{=k}(q) = \exp\!\left(-\frac{(q-k)^2}{2\sigma^2}\right), \qquad \mu_{=-k}(q) = \exp\!\left(-\frac{(q+k)^2}{2\sigma^2}\right)$

Page 27: Knowledge extraction from support vector machines

These functions satisfy:

$\dfrac{\mu_{=k}(q) - \mu_{=-k}(q)}{\mu_{=k}(q) + \mu_{=-k}(q)} = \tanh\!\left(\frac{kq}{\sigma^2}\right)$

Page 28: Knowledge extraction from support vector machines

Applying the singleton fuzzifier and centre-of-gravity defuzzifier yields:

$O = \dfrac{\mu_{=k}(q)(a_0 + a_1) + \mu_{=-k}(q)(a_0 - a_1)}{\mu_{=k}(q) + \mu_{=-k}(q)} = a_0 + a_1 \tanh\!\left(\frac{kq}{\sigma^2}\right)$

But what does this output mean!?

Page 29: Knowledge extraction from support vector machines

Take a deeper look!

It is a feedforward ANN with a single neuron, employing the activation function $\tanh(\cdot)$!

So: this FRB is equivalent to an ANN.
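A quick numerical check (ours; the parameter values are arbitrary test choices) that the centre-of-gravity output of R1 and R2 really collapses to a single tanh neuron:

```python
# Verify: with Gaussian MFs mu_{=k}(q) = exp(-(q-k)^2 / (2 sigma^2)), the
# singleton-fuzzifier / COG-defuzzifier output of rules R1, R2 equals
# a0 + a1 * tanh(k*q / sigma^2) -- a single tanh neuron.
import numpy as np

a0, a1, k, sigma2 = 0.3, 1.7, 2.0, 1.5   # arbitrary test values, k > 0

def mu(q, center):
    return np.exp(-(q - center) ** 2 / (2 * sigma2))

q = np.linspace(-5, 5, 101)
w1, w2 = mu(q, k), mu(q, -k)             # firing degrees of R1 and R2
cog = (w1 * (a0 + a1) + w2 * (a0 - a1)) / (w1 + w2)
neuron = a0 + a1 * np.tanh(k * q / sigma2)

assert np.allclose(cog, neuron)
print("COG output == a0 + a1*tanh(k*q/sigma^2)")
```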

Page 30: Knowledge extraction from support vector machines

This FRB, in particular, satisfies the definition of a FARB (fuzzy all-permutations rule base), which is:

Page 31: Knowledge extraction from support vector machines
Page 32: Knowledge extraction from support vector machines
Page 33: Knowledge extraction from support vector machines

To get the same output, apply the same steps as in the example:

But how is this output anywhere close to the one in the example?!

Page 34: Knowledge extraction from support vector machines

Many other MFs satisfy this output, given specific values of $z, u, v, r$, and $g$, such as the logistic function and others.

Apply: $z_i = u_i = 1$, $v_i = r_i = 0$, and $g_i(x) = \tanh(x)$.

Page 35: Knowledge extraction from support vector machines

Result

Kolman and Margaliot: every standard ANN has a corresponding FARB. There is a transformation T:

This work extends that to: a certain class of SVMs satisfies the transformation P:

Page 36: Knowledge extraction from support vector machines

“A certain class of SVMs is mathematically equivalent to a FARB.”

Page 37: Knowledge extraction from support vector machines

The SVM-FARB Equivalence

$h(\boldsymbol{x}) = b^* + \sum_{i=1}^{N_{sv}} \alpha_i^* y_i K(\boldsymbol{x}, \boldsymbol{s}_i) \qquad (2)$

$O(q) = a_0 + \sum_{i=1}^{m} r_i a_i + \sum_{i=1}^{m} z_i a_i\, g_i(u_i q_i + v_i) \qquad (8)$
Page 38: Knowledge extraction from support vector machines

Theorem 2 (SVM-FARB equivalence). Given the SVM (2), construct a FARB with $m = N_{sv}$ inputs $q_i$ and parameters $a_0, a_i$ such that the following conditions hold:

$a_0 + \sum_{i=1}^{m} r_i a_i = b^*,$

$z_i a_i = \alpha_i^* y_i,$

$g_i(u_i q_i + v_i) = K(\boldsymbol{x}, \boldsymbol{s}_i), \qquad i = 1, \dots, m. \qquad (15)$

Then the FARB’s output $O$ coincides with the SVM’s $h(\boldsymbol{x})$.

Page 39: Knowledge extraction from support vector machines

Pause and Think

• Let’s say we have a FARB.

• How many rules have we got?

If $q_1$ is equal to $\pm k_1$ and … and $q_m$ is equal to $\pm k_m$,
Then $O = a_0 \pm a_1 \pm \dots \pm a_m \qquad (7)$

One rule per sign pattern, so a FARB over $m$ inputs has $2^m$ rules.
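A tiny sketch (ours) that enumerates the rules and confirms the count:

```python
# A FARB over m inputs has one rule per sign pattern: 2^m rules in total.
from itertools import product

m = 3
a = [0.5, 1.0, -2.0, 0.25]  # a0, a1, ..., am (arbitrary demo values)

rules = []
for signs in product([+1, -1], repeat=m):  # every term permutation
    then_part = a[0] + sum(s * a[i + 1] for i, s in enumerate(signs))
    rules.append((signs, then_part))

print(len(rules))  # -> 8 == 2**m
```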

Page 40: Knowledge extraction from support vector machines

Famous SVM Kernels

$K(\boldsymbol{x}, \boldsymbol{y}) = \boldsymbol{x}^T \boldsymbol{y}$ (linear kernel)

$K(\boldsymbol{x}, \boldsymbol{y}) = (1 + \boldsymbol{x}^T \boldsymbol{y}/c)^d, \quad c > 0,\ d \in \mathbb{N}$ (polynomial kernel)

$K(\boldsymbol{x}, \boldsymbol{y}) = \tanh(\rho\, \boldsymbol{x}^T \boldsymbol{y} - \delta), \quad \rho > 0,\ \delta > 0$ (MLP kernel)

$K(\boldsymbol{x}, \boldsymbol{y}) = \exp\!\left(-\lVert \boldsymbol{x} - \boldsymbol{y} \rVert^2 / (2\hat{\sigma}^2)\right)$ (RBF or Gaussian kernel)
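The same four formulas as plain code (our transcription; the parameter names are ours):

```python
# The four kernels above, written as plain numpy functions on vectors x, y.
import numpy as np

def linear(x, y):
    return x @ y

def polynomial(x, y, c=1.0, d=3):
    return (1 + x @ y / c) ** d

def mlp(x, y, rho=0.5, delta=0.1):          # a.k.a. sigmoid/tanh kernel
    return np.tanh(rho * (x @ y) - delta)

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear(x, y), polynomial(x, y), mlp(x, y), rbf(x, y))
```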

Page 41: Knowledge extraction from support vector machines

Corollary 1 {MLP kernel}

For the MLP kernel, the SVM mapping (2) becomes

$h(\boldsymbol{x}) = b^* + \sum_{i=1}^{N_{sv}} \alpha_i^* y_i \tanh(\rho\, \boldsymbol{x}^T \boldsymbol{s}_i - \delta) \qquad (17)$

Recall the Gaussian-MF identity

$\dfrac{\mu_{=k}(q) - \mu_{=-k}(q)}{\mu_{=k}(q) + \mu_{=-k}(q)} = \tanh\!\left(\frac{kq}{\sigma^2}\right)$

and the FARB output

$O(q) = a_0 + \sum_{i=1}^{m} r_i a_i + \sum_{i=1}^{m} z_i a_i\, g_i(u_i q_i + v_i)$

These parameters will satisfy the conditions of (15):

$q_i = \boldsymbol{x}^T \boldsymbol{s}_i$, with linguistic terms equal to $\pm k_i$ and MF widths chosen so that $k_i/\sigma_i^2 = \rho$,

$z_i = 1, \quad r_i = 0, \quad u_i = \rho, \quad v_i = -\delta, \quad g_i(x) = \tanh(x),$

$a_0 = b^*, \quad a_i = \alpha_i^* y_i.$
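scikit-learn’s ‘sigmoid’ kernel is exactly this MLP kernel with $\rho$ = gamma and $\delta$ = −coef0, which lets us check the Corollary 1 mapping numerically (a sketch, ours):

```python
# Corollary 1 in code: sklearn's 'sigmoid' kernel is tanh(gamma*x^T y + coef0),
# i.e. the MLP kernel with rho = gamma and delta = -coef0. Map it onto FARB
# parameters and check O against the SVM decision function.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

gamma, coef0 = 0.01, -1.0
X, y = make_blobs(n_samples=100, centers=2, random_state=2)
clf = SVC(kernel="sigmoid", gamma=gamma, coef0=coef0).fit(X, y)

S, ay, b = clf.support_vectors_, clf.dual_coef_[0], clf.intercept_[0]

# FARB parameters per Corollary 1: z_i = 1, r_i = 0, u_i = rho, v_i = -delta,
# g_i = tanh, a_0 = b*, a_i = alpha_i* y_i.
a0, a = b, ay
for x in X[:5]:
    q = S @ x                                  # q_i = x^T s_i
    O = a0 + np.sum(a * np.tanh(gamma * q + coef0))
    assert np.isclose(O, clf.decision_function([x])[0])
print("FARB output matches SVM decision function (MLP kernel)")
```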

Page 42: Knowledge extraction from support vector machines

Pause and Think

$q_i = \boldsymbol{x}^T \boldsymbol{s}_i$ appears in the FARB if-part. What could this mean?

If $\boldsymbol{x}$ and $\boldsymbol{s}_i$ are normalized, then $q_i = \boldsymbol{x}^T \boldsymbol{s}_i = \cos\theta_i$: each rule effectively asks how closely the input points in the direction of the $i$-th support vector.

Page 43: Knowledge extraction from support vector machines

Corollary 2 {RBF kernel}

For the RBF kernel, the SVM mapping (2) becomes

$h(\boldsymbol{x}) = b^* + \sum_{i=1}^{N_{sv}} \alpha_i^* y_i \exp\!\left(-\frac{\lVert \boldsymbol{x} - \boldsymbol{s}_i \rVert^2}{2\hat{\sigma}^2}\right) \qquad (18)$

Here the MF identity used is

$\dfrac{\mu_{=k}(q) - \mu_{\ne k}(q)}{\mu_{=k}(q) + \mu_{\ne k}(q)} = 2\exp\!\left(-\frac{(q-k)^2}{2\sigma^2}\right) - 1$

(with $\mu_{\ne k} = 1 - \mu_{=k}$), together with the FARB output

$O(q) = a_0 + \sum_{i=1}^{m} r_i a_i + \sum_{i=1}^{m} z_i a_i\, g_i(u_i q_i + v_i)$

These parameters will satisfy the conditions of (15):

$q_i = \lVert \boldsymbol{x} - \boldsymbol{s}_i \rVert, \quad k_i = 0, \quad \sigma_i = \hat{\sigma},$

$z_i = 2, \quad r_i = -1, \quad u_i = \dfrac{1}{\sqrt{2}\,\hat{\sigma}}, \quad v_i = 0, \quad g_i(x) = \exp(-x^2),$

$a_0 = b^* + \dfrac{1}{2}\sum_{i=1}^{N_{sv}} \alpha_i^* y_i, \quad a_i = \dfrac{\alpha_i^* y_i}{2}.$
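Putting Corollary 2 to work: a sketch (ours) that reads the FARB parameters off a fitted RBF-kernel SVC and checks that the FARB output $O$ reproduces the SVM’s decision function:

```python
# Corollary 2 in code: map a fitted RBF-kernel SVC onto FARB parameters and
# verify that O = a0 + sum_i r_i a_i + sum_i z_i a_i g_i(u_i q_i + v_i)
# reproduces h(x) = decision_function(x).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

gamma = 0.5                                  # sklearn's gamma = 1/(2*sigma_hat^2)
sigma_hat = np.sqrt(1 / (2 * gamma))

X, y = make_blobs(n_samples=100, centers=2, random_state=1)
clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)

S = clf.support_vectors_
ay = clf.dual_coef_[0]                       # alpha_i* y_i
b = clf.intercept_[0]                        # b*

# FARB parameters per Corollary 2 (as reconstructed above):
z, r = 2.0, -1.0
a = ay / 2                                   # a_i = alpha_i* y_i / 2
a0 = b + np.sum(ay) / 2                      # a0 = b* + sum_i alpha_i* y_i / 2
g = lambda t: np.exp(-t ** 2)                # g_i(x) = exp(-x^2)
u, v = 1 / (np.sqrt(2) * sigma_hat), 0.0

for x in X[:5]:
    q = np.linalg.norm(x - S, axis=1)        # q_i = ||x - s_i||
    O = a0 + np.sum(r * a) + np.sum(z * a * g(u * q + v))
    assert np.isclose(O, clf.decision_function([x])[0])
print("FARB output matches SVM decision function (RBF kernel)")
```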

Page 44: Knowledge extraction from support vector machines

Experimentation {Iris data set}

• 150 examples

• 4 features: sepal length, sepal width, petal length, petal width

• 3 classes: Setosa, Versicolor, Virginica
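The transcript shows the experimental results only as images, so the exact setup below is our assumption: SVM1 separates Setosa from the rest, and SVM2 separates Versicolor from Virginica:

```python
# Sketch of the experimental setup (our assumption about how SVM1/SVM2
# were split; the slides' result tables are images in the transcript).
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target          # 150 examples, 4 features, 3 classes

svm1 = SVC(kernel="rbf").fit(X, (y == 0).astype(int))   # Setosa vs. rest
mask = y > 0
svm2 = SVC(kernel="rbf").fit(X[mask], y[mask])          # Versicolor vs. Virginica

print("SVM1 training accuracy:", svm1.score(X, (y == 0).astype(int)))
print("SVM2 training accuracy:", svm2.score(X[mask], y[mask]))
```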

Page 45: Knowledge extraction from support vector machines

SVM

Page 46: Knowledge extraction from support vector machines

Results {SVM1}

Page 47: Knowledge extraction from support vector machines
Page 48: Knowledge extraction from support vector machines
Page 49: Knowledge extraction from support vector machines

Results {SVM2}

Page 50: Knowledge extraction from support vector machines
Page 51: Knowledge extraction from support vector machines