
Support Vector Machines Project

Submitted by: Gil Tal and Oren Agam

Supervisor: Miki Elad

November 1999

Technion – Israel Institute of Technology, Faculty of Electrical Engineering

The Image Processing and Analysis Laboratory

Introduction

• SVM is an emerging technique for supervised learning problems, which might replace neural networks.

• Main features:
– Good generalization error: maximal margin.
– Convex optimization problem.
– Linear and non-linear decision surfaces.

• Proposed initially by Vapnik ('82).

Project Objectives

• Learn the theory of SVM,

• Design an efficient training algorithm,

• Create a 2D demo in order to explain the features of the SVM, and the parameters involved, and

• Create a C++ software package which can serve as a platform for learning problems.

Supervised Learning

Input: $(x_1, y_1), (x_2, y_2), \ldots, (x_L, y_L)$

where: $\{x_k\}_{k=1}^{L} \subset \mathbb{R}^{n}$ are the input vectors,

$\{y_k\}_{k=1}^{L} \subset \{-1, +1\}$ are the classification values.

Purpose: Find a machine I(z) that classifies the training data correctly and generalizes well to other inputs.

Neural Networks

1. Training involves the solution of a non-convex optimization problem.

2. Generalization error is typically not satisfactory.

3. Hard to choose the structure of the net.

Support Vector Machine (SVM)

• Input Vectors are mapped to a high dimensional feature space Z. (1. How ?)

• In this space a hyper-plane decision surface is constructed. (2. How ?)

• This decision surface has special properties that ensure high generalization. (3. How ?)

• Training is done in a numerically feasible way. (4. How ?)

1. Mapping to Higher Dimension

• Map the vectors from $\mathbb{R}^{n}$ to a higher dimension $\mathbb{R}^{N}$ (N > n) using a non-linear mapping function $\Phi : \mathbb{R}^{n} \to \mathbb{R}^{N}$, chosen a priori.

• Basic idea: a linear separation in the N-dim. space is a non-linear separating surface in the n-dim. space.

Example: Non-Linear Mapping

As a different example, if the input vectors have n = 200 and we use a 5th-order polynomial, the mapped vectors have BILLIONS OF ENTRIES.

There is a computational problem that must be taken care of.
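To make the size concrete: the number of monomials of degree up to d in n variables is the binomial coefficient C(n + d, d). A minimal C++ sketch (ours, not part of the project's package; the function name is assumed) that evaluates it:

```cpp
// Minimal sketch: the dimension N of the polynomial feature space of
// degree <= d over n input entries is the binomial coefficient C(n + d, d).
#include <cstdio>

double feature_dim(int n, int d)
{
    double dim = 1.0;
    for (int i = 1; i <= d; ++i)        // C(n + d, d) = prod_{i=1..d} (n + i) / i
        dim = dim * (n + i) / i;
    return dim;
}

int main()
{
    // n = 200 input entries, 5th-order polynomial.
    std::printf("N = %.0f\n", feature_dim(200, 5));
    return 0;
}
```

For n = 200 and d = 5 this gives roughly 2.87e9 features, matching the "billions of entries" above and motivating the kernel trick introduced later.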

2. Separating Hyper-plane

Input: $(x_1, y_1), (x_2, y_2), \ldots, (x_L, y_L)$

The input is linearly separable if there exists a vector W and a scalar b such that:

$W \cdot x_k + b \geq +1 \;\; \text{for} \;\; y_k = +1$

$W \cdot x_k + b \leq -1 \;\; \text{for} \;\; y_k = -1$

or, equivalently, $y_k \left( W \cdot x_k + b \right) \geq 1$.

The separating hyper-plane is given by $W \cdot x + b = 0$.
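As a small illustration (a sketch with assumed names and data layout, not the project's actual C++ code), checking whether a candidate pair (W, b) satisfies these separability constraints:

```cpp
// Minimal sketch: verify that a candidate hyper-plane (W, b) satisfies
// y_k (W . x_k + b) >= 1 for every training sample.
#include <vector>
#include <cstddef>

bool separates(const std::vector<std::vector<double>>& X,  // L input vectors in R^n
               const std::vector<int>& y,                  // labels in {-1, +1}
               const std::vector<double>& W, double b)
{
    for (std::size_t k = 0; k < X.size(); ++k) {
        double s = b;
        for (std::size_t j = 0; j < W.size(); ++j)
            s += W[j] * X[k][j];
        if (y[k] * s < 1.0)                                 // constraint violated
            return false;
    }
    return true;
}
```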

3. Optimal Hyper-plane

1. SVM defines the optimal hyper-plane as the one with maximal margin.

2. It can be shown that the margin is given by $\dfrac{2}{\|W\|}$, so the optimal hyper-plane is obtained by solving:

Minimize $\quad \dfrac{\|W\|^{2}}{2}$

Subject to $\quad y_k \left( W \cdot x_k + b \right) \geq 1, \qquad k = 1, 2, \ldots, L$

QP Problem
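A short worked step behind the margin formula above: the two marginal hyper-planes $W \cdot x + b = +1$ and $W \cdot x + b = -1$ are parallel, and their distance along the unit normal $W / \|W\|$ is

```latex
% Take x_+ on the hyper-plane W·x + b = +1 and x_- on W·x + b = -1,
% so that W·(x_+ - x_-) = 2. Projecting (x_+ - x_-) onto the unit normal
% W/||W|| gives the margin:
\[
  \text{margin} \;=\; \frac{W \cdot (x_+ - x_-)}{\|W\|} \;=\; \frac{2}{\|W\|},
\]
% hence maximizing the margin is equivalent to minimizing ||W||^2 / 2.
```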

Lagrange Multipliers

To do so we construct a Lagrangian:

$L(W, b, A) = \dfrac{1}{2}\, W^{t} W - \sum_{i=1}^{L} \alpha_i \left[\, y_i \left( W \cdot X_i + b \right) - 1 \,\right]$

At the point of minimum we get:

$\dfrac{\partial L(W, b, A)}{\partial b} = 0 \;\;\Rightarrow\;\; \sum_{i=1}^{L} \alpha_i\, y_i = 0$

$\dfrac{\partial L(W, b, A)}{\partial W} = 0 \;\;\Rightarrow\;\; W = \sum_{i=1}^{L} \alpha_i\, y_i\, X_i$
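Substituting these two conditions back into the Lagrangian eliminates W and b and yields the dual problem that appears on the QP slide below; the substitution in brief:

```latex
% Plug W = sum_i alpha_i y_i X_i and sum_i alpha_i y_i = 0 into L(W, b, A):
\begin{align*}
L(W,b,A)
  &= \tfrac{1}{2}\, W^{t} W
     - \sum_{i=1}^{L} \alpha_i \bigl[\, y_i (W \cdot X_i + b) - 1 \,\bigr] \\
  &= \sum_{i=1}^{L} \alpha_i
     - \tfrac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L}
       \alpha_i \alpha_j\, y_i y_j\, X_i \cdot X_j
   \;=\; \mathbf{1}^{T} A - \tfrac{1}{2}\, A^{T} D A ,
\end{align*}
% to be maximized over alpha_i >= 0 subject to sum_i alpha_i y_i = 0.
```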

• Most of the α’s are zeros.

• The non-zero α’s correspond to the points satisfying the inequalities as equalities.

These points are called the SUPPORT VECTORS.

$W_0 = \sum_{\text{Support Vectors}} \alpha_i\, y_i\, X_i$

Decision Law:

$I(Z) = \operatorname{sign}\!\left( \sum_{\text{Support Vectors}} \alpha_i\, y_i\, X_i \cdot Z + b_0 \right) = \operatorname{sign}\left( W_0 \cdot Z + b_0 \right)$
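A minimal C++ sketch of this decision law (container layout and names are assumed for illustration; this is not the project's actual package): W0 is accumulated from the support vectors only, and a new point Z is classified by the sign of W0·Z + b0.

```cpp
// Minimal sketch: linear decision law I(Z) = sign(W0 . Z + b0),
// with W0 = sum over support vectors of alpha_i * y_i * X_i.
#include <vector>
#include <cstddef>

int classify_linear(const std::vector<std::vector<double>>& X, // training inputs
                    const std::vector<int>& y,                 // labels in {-1, +1}
                    const std::vector<double>& alpha,          // Lagrange multipliers
                    double b0,
                    const std::vector<double>& Z)              // point to classify
{
    std::vector<double> W0(Z.size(), 0.0);
    for (std::size_t i = 0; i < X.size(); ++i) {
        if (alpha[i] == 0.0) continue;                         // non-support vectors drop out
        for (std::size_t j = 0; j < W0.size(); ++j)
            W0[j] += alpha[i] * y[i] * X[i][j];
    }
    double s = b0;
    for (std::size_t j = 0; j < W0.size(); ++j)
        s += W0[j] * Z[j];
    return s >= 0.0 ? +1 : -1;
}
```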

Classification by SVM

4. Using Kernel functions

Let us restrict the kind of mapping functions $\Phi : \mathbb{R}^{n} \to \mathbb{R}^{N}$ to those satisfying

$\Phi(Z_1) \cdot \Phi(Z_2) = K(Z_1, Z_2)$

Examples:

$K(Z_1, Z_2) = \exp\left( -\dfrac{\|Z_1 - Z_2\|^{2}}{2 \sigma^{2}} \right) \qquad \text{or} \qquad K(Z_1, Z_2) = \left( 1 + Z_1 \cdot Z_2 \right)^{d}$
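The two example kernels above, written as a small C++ sketch (the function names and parameters are ours, for illustration only):

```cpp
// Minimal sketch of the two kernels shown above.
#include <vector>
#include <cmath>
#include <cstddef>

static double dot(const std::vector<double>& a, const std::vector<double>& b)
{
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Gaussian kernel: K(Z1, Z2) = exp(-||Z1 - Z2||^2 / (2 sigma^2))
double kernel_gauss(const std::vector<double>& z1, const std::vector<double>& z2,
                    double sigma)
{
    double d2 = 0.0;
    for (std::size_t i = 0; i < z1.size(); ++i) {
        const double diff = z1[i] - z2[i];
        d2 += diff * diff;
    }
    return std::exp(-d2 / (2.0 * sigma * sigma));
}

// Polynomial kernel: K(Z1, Z2) = (1 + Z1 . Z2)^d
double kernel_poly(const std::vector<double>& z1, const std::vector<double>& z2, int d)
{
    return std::pow(1.0 + dot(z1, z2), d);
}
```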

The QP Problem

Using kernel functions, the overall problem remains QP:

Maximize $\quad \mathbf{1}^{T} A - \dfrac{1}{2}\, A^{T} D A$

Subject to $\quad A^{T} Y = 0, \quad A \geq 0$

where $A$ is the vector of weights (Lagrange multipliers), $\mathbf{1}$ is a vector of ones, and $D$ is a matrix with the entries:

$D_{ij} = y_i\, y_j\, X_i \cdot X_j \;\;\Rightarrow\;\; D_{ij} = y_i\, y_j\, K(X_i, X_j)$
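A sketch of how the matrix D could be assembled from the training set and a chosen kernel (assumed names, reusing the kernel sketches above; not the project's actual code):

```cpp
// Minimal sketch: build the L x L matrix D with D_ij = y_i * y_j * K(X_i, X_j),
// for any kernel K such as kernel_gauss or kernel_poly above.
#include <vector>
#include <functional>
#include <cstddef>

std::vector<std::vector<double>> build_D(
    const std::vector<std::vector<double>>& X,
    const std::vector<int>& y,
    const std::function<double(const std::vector<double>&,
                               const std::vector<double>&)>& K)
{
    const std::size_t L = X.size();
    std::vector<std::vector<double>> D(L, std::vector<double>(L, 0.0));
    for (std::size_t i = 0; i < L; ++i)
        for (std::size_t j = 0; j < L; ++j)
            D[i][j] = y[i] * y[j] * K(X[i], X[j]);   // symmetric by construction
    return D;
}
```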

The Decision Rule

$I(Z) = \operatorname{sign}\!\left( \sum_{\text{Support Vectors}} \alpha_i\, y_i\, K(X_i, Z) + b \right)$

• Using kernel functions, we are required to perform inner products in the lower (n) dimension only, both for training and for applying the machine to input patterns.

• By solving for the optimal $A$ we actually find the support vectors.
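For completeness, a sketch of the kernel decision rule itself (again with assumed names and data layout, not the project's actual implementation):

```cpp
// Minimal sketch of the kernel decision rule
// I(Z) = sign( sum over support vectors of alpha_i * y_i * K(X_i, Z) + b ).
#include <vector>
#include <functional>
#include <cstddef>

int classify_kernel(const std::vector<std::vector<double>>& X,
                    const std::vector<int>& y,
                    const std::vector<double>& alpha,
                    double b,
                    const std::function<double(const std::vector<double>&,
                                               const std::vector<double>&)>& K,
                    const std::vector<double>& Z)
{
    double s = b;
    for (std::size_t i = 0; i < X.size(); ++i)
        if (alpha[i] != 0.0)                  // only support vectors contribute
            s += alpha[i] * y[i] * K(X[i], Z);
    return s >= 0.0 ? +1 : -1;
}
```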

Results

1. Write here about the software that you developed

2. Cut and paste an image which will show the application window

3. Add more examples (for example - show how the same non-linear problem is treated with growing d - the polynomial degree)

4. Say something about the algorithm that you have implemented (main features)

Example 1: Linear Classification

Example 2: Non-Linear Separation

Conclusions