Principles of Support Vector Machine

7/25/2019 Principles of Support Vector Machine

Principles of Support Vector Machine (SVM) classification

SVM is a pattern recognition method that is used widely in data mining applications, and

provides a means of supervised classification, as do SIMCA and LDA. SVM was originally

developed for the linear classification of separable data, but is applicable to nonlinear data

with the use of kernel functions. SVM are used in machine learning, optimization,

statistics, bioinformatics, and other fields that use pattern recognition. The algorithm

used within The Unscrambler® is based on code developed and released under a

modified BSD license by Chih-Chung Chang and Chih-Jen Lin of the National Taiwan

University (Hsu et al., 2009).

What is SVM classification?

SVM is a classification method based on statistical learning wherein a function that

describes a hyperplane for optimal separation of classes is determined. As the linear

function is not always able to model such a separation, data are mapped into a new

feature space and a dual representation is used with the data objects represented by their

dot product. A kernel function is used to map from the original space to the feature space,

and can be of many forms, thus providing the ability to handle nonlinear classification

cases. The kernels can be viewed as a mapping of nonlinear data to a higher dimensional

feature space, while providing a computation shortcut by allowing linear algorithms to

work with higher dimensional feature space. The support vectors are defined as the reduced

set of training samples retained by the model. The figure below illustrates the principle of applying a

kernel function to achieve separability.
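
The idea of mapping to a feature space can be illustrated with a minimal sketch. The feature map phi, the kernel poly2_kernel and the toy data are hypothetical and chosen only for illustration; they are not taken from The Unscrambler®. The sketch shows one-dimensional data that no single threshold can separate becoming linearly separable after mapping, and a kernel returning the same value as the dot product in the mapped space without building that space.

```python
def phi(x):
    """Hypothetical explicit feature map: 1-D value -> 2-D point (x, x^2)."""
    return (x, x * x)


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Kernel equal to dot(phi(x), phi(z)), computed without building phi."""
    return x * z + (x * z) ** 2


# Toy data (assumed): class B lies between the two halves of class A,
# so no single threshold on x separates the classes.
class_a = [-3.0, -2.5, 2.5, 3.0]
class_b = [-1.0, -0.5, 0.5, 1.0]

# In the mapped space the second coordinate (x^2) separates the classes
# with the straight line x^2 = 4.
separable = (all(phi(x)[1] > 4 for x in class_a)
             and all(phi(x)[1] < 4 for x in class_b))
print("separable after mapping:", separable)

# Kernel shortcut: the same number as the explicit dot product.
print(dot(phi(2.0), phi(3.0)), poly2_kernel(2.0, 3.0))
```

Because the training algorithm only needs dot products between samples, replacing each dot product with a kernel value is enough to work in the mapped space.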

In this new space SVM will search for the samples that lie on the borderline between the

classes, i.e. to find the samples that are ideal for separating the classes; these samples are

named support vectors. The figure below illustrates this in that only the samples marked

with + for the two classes are used to generate the rule for classifying new samples.

A situation where SVM will perform well is when some classes are inhomogeneous and

partly overlapping, and thus, building local PCA models with all samples will not be

successful because one class may encompass other classes if all samples are used.

SVM will in this case find a set of the most relevant samples in terms of discriminating

between the classes and is invariant to samples far from the discrimination line.

SVM has advantages over classification methods such as neural networks, as it has a

unique solution, and has less tendency of overfitting when compared to other nonlinear

classification methodologies. Of course, the model validation is the critical aspect in

avoiding overfitting for any method. SVMs are effective for modeling of nonlinear data,

and are relatively insensitive to variation in parameters. SVM uses an iterative training

algorithm to achieve separation of different classes.

Two SVM classification types are available in The Unscrambler® which are based on

different means of minimizing the error function of the classification.

c-SVC: also known as Classification SVM Type 1.

nu-SVC: also known as Classification SVM Type 2.

In the c-SVM classification, a capacity factor, C, can be defined. The value of C should be

chosen based on knowledge of the noise in the data being modeled. Its value can be

optimized through cross-validation procedures. When using nu-SVM classification, the nu

value must be defined (default value = 0.5). Nu serves as the upper bound of the fraction

of errors and is the lower bound for the fraction of support vectors.

Increasing nu will allow more errors, while increasing the margin of class separation.

The kernel type to be used for the separation of classes can be chosen from the following

four options:

Linear

Polynomial

Radial basis function

Sigmoid

The linear kernel is set as the default option. If the number of variables is very large, the

data do not need to be mapped to a higher dimensional space and the linear kernel function is

preferred. The radial basis function is also a simple function and can model systems of

varying complexity. It is an extension of the linear kernel.
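
The four kernel types can be sketched as follows. The formulas are those given in the LIBSVM documentation; it is an assumption that The Unscrambler® uses them unchanged, and the default values for gamma, coef0 and degree in the function signatures are illustrative only.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def linear(u, v):
    """Linear kernel: u'v."""
    return dot(u, v)


def polynomial(u, v, gamma=1.0, coef0=0.0, degree=3):
    """Polynomial kernel: (gamma * u'v + coef0) ** degree."""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf(u, v, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * |u - v|^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def sigmoid(u, v, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * u'v + coef0)."""
    return math.tanh(gamma * dot(u, v) + coef0)


u, v = [1.0, 2.0], [3.0, 4.0]
print(linear(u, v), polynomial(u, v, degree=2), rbf(u, v), sigmoid(u, v))
```

The linear and sigmoid kernels depend on the dot product of the two samples, whereas the radial basis function depends only on the distance between them and equals 1 for identical samples.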

If a polynomial kernel is chosen, the order of the polynomial must also be given. In SVM

classification, the best value for C is often not known a priori. Through a grid search and

applying cross validation to reduce the chance of overfit, one can identify an optimal value

of C so that unknowns can be properly classified using the SVM model.

Data suitable for SVM classification

SVM classification is a supervised method of classification. The data used for SVM must

have a data matrix which includes a single category variable defining which classes are to

be discriminated by the model. The X and Y matrices must have the same number of rows

(samples) for SVM classification, and not have any missing data. The Y matrix must

contain a single column of category variables. The X data must be numerical, and not

contain any missing data.
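
The data requirements above can be expressed as a small pre-check. This is a minimal sketch, assuming the X data are held as a list of rows and the category variable as a list of labels; the helper name check_svm_inputs is hypothetical and is not part of The Unscrambler®.

```python
import math


def check_svm_inputs(X, y):
    """Return a list of problems found; an empty list means the inputs pass."""
    problems = []
    if len(X) != len(y):
        problems.append("X and Y must have the same number of rows")
    for i, row in enumerate(X):
        for value in row:
            is_number = (isinstance(value, (int, float))
                         and not isinstance(value, bool))
            # None, text, or NaN all count as missing or non-numeric.
            if not is_number or math.isnan(value):
                problems.append(f"row {i} of X has a missing or non-numeric value")
                break
    if any(label is None for label in y):
        problems.append("Y has missing class labels")
    if len({label for label in y if label is not None}) < 2:
        problems.append("at least two classes are required")
    return problems


X = [[5.1, 3.5], [6.2, float("nan")], [5.9, 3.0]]
y = ["Setosa", "Versicolor", "Virginica"]
print(check_svm_inputs(X, y))
```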

SVM have been used in drug discovery to identify compounds that may have efficacy, and

also to identify toxicity issues with drugs. They have been used in classification problems

such as that of classifying plastics from their FTIR spectra, meat and bone meal in feed

from NIR imaging spectroscopy, teas from HPLC chromatograms, and many other areas in

pattern recognition and data mining.

Main results of SVM classification

When an SVM model is created a new node is added in the project navigator with a folder

for the data used in the model, and the results folder. The results folder has the following

matrices:

Support vectors

Confusion matrix

Parameters

Probabilities

Prediction

The main result of the SVM is the confusion matrix, which indicates how many

samples were classified in each class, and the prediction matrix, which indicates the

classification determined for each sample in the training set.

More details about SVM Classification

It is advised to start with the RBF kernel with various settings of C for C-SVM and select

10-segment cross validation. If all samples are correctly classified, which means the

confusion matrix has no values outside the diagonal, one may select this model as

suitable for classifying future samples. Of course, some data will not classify all samples

in the correct class during training.

If the data are expected to be nonlinear, e.g. from looking at the classes in a scores plot

from PCA or PLS-DA, one may try other kernels and change the settings for C or nu.

SVM classification application examples

SVM were used as a multivariate classification tool for the identification of meat and bone

meal in animal feed in response to legislation banning such substances following the

outbreak of mad cow disease (Fernandez Pierna et al., 2004). NIR imaging spectroscopy is

able to detect differences in feeds based on the chemical composition. SVM can be used to

classify feed samples, reducing the need for constant expert analysis of data, thus

providing a rapid tool for analysis that can be utilized for certification of animal feed.

SVM were applied for the classification of plastics in a recycling system (Belousov et al.,

2002). A remote FTIR spectrometer was mounted on a conveyor where plastics were being

sorted for recycling. A two-tiered classification model was developed where at the first

level samples were divided into the classes of “important” plastics (ABS, PC, PC/ABS, SB

and PVC) and reject plastics (PA, PP and PE). The “important” plastics were then further

categorized into each individual type of plastic.

More details regarding Support Vector Machine classification are given in the method

reference.

Tasks – Analyze – Support Vector Machine

classification

The sections that follow list menu options, dialogs and results while using Support Vector

Machine classification in practice accessible from the menu Tasks-Analyze-Support Vector

Machine classification….

Model input

First the input data for the classification is defined in the Support Vector Machine dialog.

Choose the data matrix which contains the data to be used for the classification as the

first matrix. This matrix of predictors should contain only numerical values, with no

missing values. The second matrix to define is that containing the category, and must

have a single column only. The SVM training requires at least two classes. This

classification information may be from the same matrix or another, but must have the

same number of rows as the first, and have only a single column of category data.

Support Vector Machine Model Inputs

If the appropriate selection is not made for the classifier, the following warning will be

displayed. To build the SVM model go to the column drop-down list, select a single

column containing category variables.

Support Vector Machine Model Inputs Warnings

Options

Here one can choose the SVM type of classification to use, either C-SVC or nu-SVC, from

the drop-down list next to SVM type. The kernel type to be used to determine the

hyperplane that best separates the classes can be selected from the following types from

the drop-down list. The default setting of Radial basis function is the simplest, and can

model complex data.

Support Vector Machine Options

The kernel types are:

Linear

Polynomial

Radial basis function

Sigmoid

For a polynomial kernel type, the degree of the polynomial should be defined. The C-SVM

has an input parameter named C, which is a capacity factor (also called penalty factor), a

measure of the robustness of the model. C must be greater than 0.

When using nu-SVM classification the nu value must be defined (default value = 0.5). Nu

serves as the upper bound of the fraction of errors and is the lower bound for the fraction

of support vectors.

Support Vector Machine Options for nu-SVM

Support Vector Machine Options for C-SVM

Grid Search

In the options tab the Grid Search button is available. Clicking on the Grid

Search button will open a dialog for grid search. The figure below shows the grid search

dialog after a grid search has been performed.

The dialog asks for input for the parameters Gamma and C in the case of C-SVMC and

Gamma and Nu in the case of nu-SVMC. It has been reported in the literature that an

exponentially growing sequence of the parameters is good as a first coarse grid search.

This is why the inputs Gamma and C are given on the log scale, but not the nu since it is

between 0 and 1. However, in the grid table above the actual values are given. It is

recommended to use cross-validation in grid search to avoid overfitting when many

combinations of the parameters are tried. After an initial grid search it may be refined with

smaller ranges for the parameters once the best range has been found. Click on the Start

button for the calculations to commence. Note that it is possible to click on Stop during

the computations so that if the results become worse for higher values for the parameters

one may stop to save time. The default is to start with five levels of each parameter. Click

on one (the “best”) value for the Validation accuracy in the grid after completion to see

detailed results. The SVs entry lists how many samples were selected as support vectors;

this number should be judged in relation to the number of samples in the data.
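
The coarse grid search described above can be sketched as follows. The base of the log scale (10), the exponent ranges and the toy_score function are assumptions made for illustration; in practice the score for each grid point would be the cross-validation accuracy of an SVM trained with that C and Gamma.

```python
import math
from itertools import product


def exponential_grid(low_exp, high_exp, levels=5, base=10.0):
    """Return `levels` values evenly spaced on a log scale."""
    if levels == 1:
        return [base ** low_exp]
    step = (high_exp - low_exp) / (levels - 1)
    return [base ** (low_exp + i * step) for i in range(levels)]


def grid_search(c_values, gamma_values, score):
    """Evaluate score(C, gamma) on every grid point; return the best triple."""
    best = None
    for c, gamma in product(c_values, gamma_values):
        s = score(c, gamma)
        if best is None or s > best[0]:
            best = (s, c, gamma)
    return best


def toy_score(c, gamma):
    """Stand-in for validation accuracy; peaks at C = 10, gamma = 0.01."""
    return -((math.log10(c) - 1) ** 2 + (math.log10(gamma) + 2) ** 2)


c_values = exponential_grid(-2, 2)       # 0.01 ... 100, five levels
gamma_values = exponential_grid(-4, 0)   # 0.0001 ... 1, five levels
best_score, best_c, best_gamma = grid_search(c_values, gamma_values, toy_score)
print(best_c, best_gamma)
```

A refined search can then call exponential_grid again with a narrower exponent range centred on the best values found.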

Click on Use setting to return to the previous dialog and run the SVMC again with

these parameter settings. Notice that since the cross validation is random, the validation

accuracy may be different in the second run. This again is a function

of the distribution of the samples.

To understand more in detail how SVMC selects the support vectors (samples that are

lying on the boundary between the classes) one may run a PCA on the same data and

make use of the Sample Grouping option in the score plot to visualize the support vectors.

Weights

If the analysis calls for variables to be weighted for making realistic comparisons to each

other (particularly useful for process and sensory data), click on the Weights tab and the

following dialog box will appear.

Support Vector Machine Weights

Individual variables can be selected from the variable list table provided in this dialog by

holding down the control (Ctrl) key and selecting variables. Alternatively, the variable

numbers can be manually entered into the text dialog box. The Select button can be used

(which will bring up the Define Range dialog), or every variable in the table can be selected

by simply clicking on All.

Once the variables have been selected, to weight them, use the options in the Change

Selected Variable(s) dialog box, under the Select tab. The options include:

A / (SDev + B)

This is a standard deviation weighting process where the parameters A and B can

be defined. The default is A = 1 and B = 0.

Constant

This allows the weighting of selected variables by predefined constant values.

Downweight

This allows the multiplication of selected variables by a very small number, such

that the variables do not participate in the model calculation, but their correlation

structure can still be observed in the scores and loadings plots and in particular,

the correlation loadings plot.

Block weighting

This option is useful for weighting various blocks of variables prior to analysis so

that they have the same weight in the model. Check the Divide by SDev box

to weight the variables with standard deviation in addition to the block weighting.
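
The standard deviation weighting option listed above can be sketched as follows. This is a minimal illustration, assuming the data are held as one list per variable and that SDev is the sample standard deviation; the function names are hypothetical.

```python
from statistics import stdev


def sdev_weights(columns, a=1.0, b=0.0):
    """Return one weight per variable, computed as a / (SDev + b)."""
    return [a / (stdev(col) + b) for col in columns]


def apply_weights(columns, weights):
    """Multiply every value of each variable by that variable's weight."""
    return [[value * w for value in col] for col, w in zip(columns, weights)]


# Two variables measured on very different scales (assumed toy data).
columns = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
weights = sdev_weights(columns)            # defaults: A = 1, B = 0
weighted = apply_weights(columns, weights)
print(weights)
print([stdev(col) for col in weighted])    # both variables now have unit spread
```

With B = 0 a constant variable has SDev = 0 and the weight is undefined, so a small positive B is one way to keep the weights finite.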

Use the Advanced tab in the Weights dialog to apply predetermined weights to each

variable. To use this option, set up a row in the data set containing the weights (or create

a separate row matrix in the project navigator). Select the Advanced tab in the Weights

dialog and select the matrix containing the weights from the drop-down list. Use the Rows

option to define the row containing the weights and click on Update to apply the new

weights.

Another feature of the advanced tab is the ability to use the results matrix of another

analysis as weights, using the Select Results Matrix button. This option provides

an internal project navigator for selecting the appropriate results matrix to use as a

weight.

The dialog box for the Advanced option is provided below.

SVM Advanced Weights Option

Once the weighting and variables have been selected, click Update to apply them.

Validation

Validation is an important part of any method applied in modeling data. Settings for the

Validation of the SVM are set under the Validation tab as shown below. First select to cross

validate the model by checking the check box. The number of segments to use can be

chosen in the segments entry. Cross validation is helpful in model development but

should not be a replacement for full model validation using a test set.
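
Cross validation with a chosen number of segments can be sketched as follows. This is a minimal illustration of random segmentation, assuming samples are addressed by index; how The Unscrambler® assigns samples to segments internally is not described here, and the function names are hypothetical.

```python
import random


def random_segments(n_samples, n_segments, seed=None):
    """Shuffle sample indices and deal them into nearly equal segments."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_segments] for i in range(n_segments)]


def train_test_splits(n_samples, n_segments, seed=None):
    """Yield (train, test) index lists, holding out one segment at a time."""
    for held_out in random_segments(n_samples, n_segments, seed):
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        yield train, sorted(held_out)


for train, test in train_test_splits(n_samples=25, n_segments=10, seed=0):
    print(len(train), test)
```

Without a fixed seed the segments differ from run to run, which is why cross-validation results can change slightly when a model is recalculated.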

Support Vector Machine Validation

There are six result matrices generated after creating an SVM model:

Support vectors

Confusion matrix

Parameters

Probabilities

Prediction

Accuracy

There is only one matrix generated when predicting with an SVM model: Classified range

SVM node

Support vectors

The support vector matrix is comprised of the support vectors which are a subset of the

original samples that are closest to the boundary between classes and define the optimal

separation between classes.
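
How support vectors define the separation can be sketched with the decision function used by LIBSVM-style models for two classes, f(x) = sum_i coef_i * K(sv_i, x) - rho, where the sign of f(x) gives the class. The support vectors, coefficients, rho and gamma below are hypothetical values chosen for illustration, not output from The Unscrambler®.

```python
import math


def rbf(u, v, gamma):
    """Radial basis function kernel: exp(-gamma * |u - v|^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def decision_value(x, support_vectors, coefs, rho, gamma):
    """f(x) = sum_i coef_i * K(sv_i, x) - rho; the sign picks the class."""
    return sum(c * rbf(sv, x, gamma)
               for sv, c in zip(support_vectors, coefs)) - rho


# Hypothetical two-class model with one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
coefs = [1.0, -1.0]   # illustrative signed weights of the support vectors
rho = 0.0
gamma = 0.5

for x in [(0.1, 0.1), (1.9, 1.9), (1.0, 1.0)]:
    print(x, decision_value(x, support_vectors, coefs, rho, gamma))
```

Only the support vectors enter the sum, so training samples far from the boundary do not affect the classification of new samples.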

Confusion matrix

The confusion matrix is a matrix used for visualization of classification results from

supervised methods such as support vector machine classification or linear discriminant

analysis classification. It carries information about the predicted and actual classifications

of samples, with each row showing the instances in a predicted class, and each column

representing the instances in an actual class.

In the below confusion matrix, all the “Setosa” samples are nicely attributed to the “Setosa”

group.

Two samples with actual value “Virginica” are predicted as “Versicolor”.

In the same way two samples with actual value “Versicolor” are predicted as “Virginica”.

Confusion matrix
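
The layout of the confusion matrix (rows for predicted classes, columns for actual classes) can be reproduced with a short sketch. The count of 50 samples per class is an assumption made for illustration; the four misclassified samples follow the description above.

```python
def confusion_matrix(actual, predicted, classes):
    """Rows are predicted classes, columns are actual classes."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[p]][index[a]] += 1
    return matrix


classes = ["Setosa", "Versicolor", "Virginica"]
# Assumed 50 samples per class.
actual = ["Setosa"] * 50 + ["Versicolor"] * 50 + ["Virginica"] * 50
predicted = (["Setosa"] * 50
             + ["Versicolor"] * 48 + ["Virginica"] * 2    # actual Versicolor
             + ["Versicolor"] * 2 + ["Virginica"] * 48)   # actual Virginica

matrix = confusion_matrix(actual, predicted, classes)
for name, row in zip(classes, matrix):
    print(f"{name:<11}", row)
```

Correctly classified samples fall on the diagonal; every off-diagonal entry counts samples of the column's class that were assigned to the row's class.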

Parameters

The parameters matrix carries information on the following parameters for all the

identified classes:

SVM type

Kernel type - as defined in the options for the SVM learning step

Degree - as defined in the options for the SVM learning step

Gamma - related to the C values set in the options

Coef0

Classes - the number of classes identified by the SVM model

SV Count - the number of support vectors needed for the classification of the data

Labels - the labels of the corresponding classes, given as numerical values starting

with 0

Numbers - the number of samples classified in a given class

Parameters matrix

Probabilities

The probabilities matrix has three rows, for the Rho, and probabilities A and B for each of

the identified classes.

Probabilities matrix
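
The role of probabilities A and B can be sketched under the assumption that they have the same meaning as in LIBSVM, where Rho is the offset subtracted in the decision function and A and B are the coefficients of a sigmoid that maps a decision value f to a class probability, 1 / (1 + exp(A*f + B)). The sketch covers a single pair of classes only, and the parameter values used are illustrative.

```python
import math


def platt_probability(decision_value, prob_a, prob_b):
    """Map a decision value f to the probability 1 / (1 + exp(A*f + B))."""
    z = decision_value * prob_a + prob_b
    # Two algebraically equal forms; choosing by sign avoids overflow in exp().
    if z >= 0:
        return math.exp(-z) / (1.0 + math.exp(-z))
    return 1.0 / (1.0 + math.exp(z))


# Illustrative parameters: a negative A makes larger decision values
# correspond to higher probabilities.
prob_a, prob_b = -2.0, 0.0
for f in (-3.0, 0.0, 3.0):
    print(f, round(platt_probability(f, prob_a, prob_b), 4))
```

With more than two classes, LIBSVM fits one such sigmoid per pair of classes and combines the pairwise estimates, which this sketch does not attempt.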

Prediction

The prediction matrix exhibits the predicted class for each sample in the training set.

Prediction

Accuracy

Accuracy holds the % correctly classified samples from calibration and validation. If cross

validation was not chosen it leaves this field blank. However, cross validation is highly

recommended to avoid overfitting. See the Confusion Matrix regarding details for false

positives and false negatives.
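
The relation between the accuracy value and the confusion matrix can be sketched as follows. The matrix values are assumed for illustration (rows are predicted classes, columns are actual classes), and the function names are hypothetical.

```python
def accuracy_percent(matrix):
    """Percentage of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


def false_counts(matrix, k):
    """Return (false positives, false negatives) for class index k."""
    false_pos = sum(matrix[k]) - matrix[k][k]                 # rest of row k
    false_neg = sum(row[k] for row in matrix) - matrix[k][k]  # rest of column k
    return false_pos, false_neg


# Assumed example: three classes, four misclassified samples out of 150.
matrix = [[50, 0, 0],
          [0, 48, 2],
          [0, 2, 48]]
print(round(accuracy_percent(matrix), 2))
print([false_counts(matrix, k) for k in range(3)])
```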

Plot of classification results

This plot shows the various classes as they were classified for a 2D scatter plot of the

original variables. Use the arrows or drop-down list to choose which of the original

variables to show. This is useful to see for which combinations of pairs of variables there

is good separation between the classes. Alternatively perform PCA on the same data and

visualize the support vectors with the sample grouping option in the score plot and

interpret the loading plot to find the most important variables. The Act and Pre buttons can

be used to toggle whether one or both of them should be shown; the predicted are shown with a

smaller marker size. If the predicted class differs from the actual this is shown with a small

symbol with the color for the wrongly assigned class inside the larger marker for the

actual class. In the illustration below two samples (Batch19 and Batch21) are predicted to

belong to class Asia although the actual class is Europe.

Classified range

After an SVM model has been applied to new data to classify them, a new matrix with the

results is added to the project navigator. The Classified_Range matrix contains a category

variable giving the category predicted by the model for each sample.

Classified range

Autopretreatment may be used with SVM. This allows a user to automatically apply the

transforms used with the data in developing the SVM model to data used in the

classification of new samples with this model.

Support Vector Machine Autopretreatment

When all of the parameters have been defined, the SVM is run by clicking OK. A new node,

SVM, is added to the project navigator with a folder for Data, and another for Results.

More details regarding Support Vector Machine classification are given in the section SVM

Classify or in the link given under License.

Tasks – Predict – Classification – SVM…

After an SVM classification model has been developed, it can be used to classify new

samples by going to Tasks-Predict-Classification-SVM…. In the dialog box, one first

chooses which SVM model to apply from the drop-down list. This requires a valid SVM

model in the current project. One then defines which samples to classify by selecting

samples from the appropriate data matrix, along with the X variables that are to be used

for the classification. The X-variables must contain only numerical data and have the same

number of variables as were used to develop the SVM model.

Classify Using SVM Model

The SVM classification results are given in a new matrix in the project navigator named

Classified_Range. The matrix has the predicted class for each sample.

Interpreting SVM Classification results

There are six result matrices generated after creating an SVM model:

Support vectors

Confusion matrix

Parameters

Probabilities

Prediction

Accuracy

There is only one matrix generated when predicting with an SVM model: Classified range

SVM node

Support vectors

The support vector matrix is comprised of the support vectors which are a subset of the

original samples that are closest to the boundary between classes and define the optimal

separation between classes.

Confusion matrix

The confusion matrix is a matrix used for visualization of classification results from

supervised methods such as support vector machine classification or linear discriminant

analysis classification. It carries information about the predicted and actual classifications

of samples, with each row showing the instances in a predicted class, and each column

representing the instances in an actual class.

In the below confusion matrix, all the “Setosa” samples are nicely attributed to the “Setosa”

group.

Two samples with actual value “Virginica” are predicted as “Versicolor”.

In the same way two samples with actual value “Versicolor” are predicted as “Virginica”.

Confusion matrix

Parameters

The parameters matrix carries information on the following parameters for all the

identified classes:

SVM type

Kernel type - as defined in the options for the SVM learning step

Degree - as defined in the options for the SVM learning step

Gamma - related to the C values set in the options

Coef0

Classes - the number of classes identified by the SVM model

SV Count - the number of support vectors needed for the classification of the data

Labels - the labels of the corresponding classes, given as numerical values starting

with 0

Numbers - the number of samples classified in a given class

Parameters matrix

Probabilities

The probabilities matrix has three rows, for the Rho, and probabilities A and B for each of