Transcript of "A introduction to svm.pdf" (8/18/2019, 48 slides)

Practical Guide to Support Vector Machines

Tingfan Wu
MPLAB, UCSD


    Outline

• Data Classification
• High-level Concepts of SVM
• Interpretation of SVM Model/Result
• Use Case Study


    What does it mean to learn?

• Acquire new skills?
• Make predictions about the world?


Making predictions is fundamental to survival

• Will that bear eat me?
• Is there water in that canyon?
• Is that person a good mate?

These are all examples of classification problems.


    Boot Camp Related

Face recognition / speaker identification

Motion classification

Brain-computer interface / spike classification


    Driver Fatigue Detection from Facial Expression


    Data Classification

• Given training data (class labels known), predict test data (class labels unknown)
• Not just fitting → generalization!

[Pipeline diagram: Sensor → Data → Preprocessing → features → Classifier (SVM / AdaBoost / Neural Network) → Prediction]


    Generalization


Many possible classification models. Which one generalizes better?


    Generalization


    Why SVM ? (my opinion)

• With careful data preprocessing and proper use, SVM and NN achieve similar performance.
• SVM is easier to use properly.
• SVM provides a reasonably good baseline performance.


    Outline

• Data Classification
• High-level Concepts of SVM
• Interpretation of SVM Model/Result
• Use Case Study


     A Simple Dilemma

Who do I invite to my birthday party?


    Problem Formulation

• Training data as vectors: x_i
• Binary labels: y_i ∈ {+1, -1}

Name   Gift?   Income   Fondness
John   Yes     3k       3/5
Mary   No      5k       1/5

class       feature vector
y_1 = +1    x_1 = [3000, 0.6]
y_2 = -1    x_2 = [5000, 0.2]
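The formulation above can be sketched in a few lines of code (a minimal illustration of the data layout; the variable names are mine, not from the slides):

```python
# Each training example is a (label, feature_vector) pair.
# Features: [income, fondness]; labels: +1 (invite) / -1 (don't invite).
training_data = [
    (+1, [3000, 0.6]),   # John: gift, 3k income, fondness 3/5
    (-1, [5000, 0.2]),   # Mary: no gift, 5k income, fondness 1/5
]

# Separate the labels y_i from the feature vectors x_i.
labels = [y for y, _ in training_data]
features = [x for _, x in training_data]
print(labels)    # [1, -1]
print(features)  # [[3000, 0.6], [5000, 0.2]]
```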


Vector space

[Scatter plot: x1 = Fondness on the horizontal axis, x2 = Disposable Income on the vertical axis; the '+' points (Gift) cluster apart from the '-' points (No Gift)]


A Line

w^T x + b = 0

[Plot: x1 (first feature) vs. x2 (second feature); the line w^T x + b = 0, with normal vector w, separates the '+' points from the '-' points]

The line: a "hyperplane" in high-dimensional space.


The inequalities and regions

The hyperplane w^T x + b = 0 divides the space into two regions:
  w^T x_i + b > 0 on one side, w^T x_i + b < 0 on the other.

Decision function (the model): f(x) = sign(w^T x_new + b)

[Plot: '+' points in the positive region, '-' points in the negative region]
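The decision function above can be sketched directly (a toy illustration; the weight vector and points are made-up values, not a trained model):

```python
def decision_value(w, x, b):
    """Compute w^T x + b, the signed distance proxy to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, x, b):
    """Decision function f(x) = sign(w^T x + b); returns +1 or -1."""
    return 1 if decision_value(w, x, b) > 0 else -1

# Hypothetical hyperplane with normal w = [1, -1] and b = 0:
# it classifies points by the sign of x1 - x2.
w, b = [1.0, -1.0], 0.0
print(predict(w, [2.0, 1.0], b))  # 1  (positive region)
print(predict(w, [1.0, 2.0], b))  # -1 (negative region)
```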


    Large Margin


    Maximal Margin
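The figures for these two slides are lost; for reference, the standard hard-margin formulation (stated here from the textbook definition, not reproduced from the slides) maximizes the margin width 2/‖w‖ by solving:

```latex
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i\,(w^T x_i + b) \ge 1, \qquad i = 1,\dots,n
```

Minimizing ‖w‖ is equivalent to maximizing the margin between the two classes.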


    Data not linearly separable

    Case 1 Case 2


Trick 1: Soft Margin

Penalty for violating data points.

These points are usually outliers; the hyperplane should not be biased too much by them.
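In symbols, the standard soft-margin formulation (stated here for reference; the slide's equation image is lost) adds a slack variable ξ_i per point and a penalty weight C:

```latex
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i\,(w^T x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

A larger C penalizes violations more heavily (a harder margin); a smaller C tolerates outliers.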


    Soft-margin


    [Ben-Hur & Weston 2005]


Support vectors

The more important data points that support (define) the hyperplane.


Trick 2: Map to a Higher Dimension

Mapping: φ

[Plot: 1-D data on the x1 axis mapped by φ(x1) = (x1, x1^2) into the plane; along the parabola x2 = x1^2 the two classes become linearly separable]


Mapping to Infinite Dimension

• Is it possible to create a universal mapping?
• What if we could map to infinite dimension? Every problem becomes separable!
• Consider the "Radial Basis Function (RBF)" kernel:
  φ(x)^T φ(y) = exp(-γ ||x - y||^2) = Kernel(x, y)
• w: an infinite number of variables!
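The RBF kernel itself is a finite computation even though the implicit mapping φ is infinite-dimensional. A minimal sketch (function name is mine):

```python
import math

def rbf_kernel(x, y, gamma):
    """K(x, y) = exp(-gamma * ||x - y||^2), the Gaussian/RBF kernel."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Identical points: K = exp(0) = 1.0 (maximum similarity).
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0
# Distance 2: K = exp(-0.5 * 4) = exp(-2) ≈ 0.135
print(rbf_kernel([0.0], [2.0], gamma=0.5))
```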


Dual Problem

• Primal: optimize over w, which may have infinitely many variables.
• Dual: optimize over one α_i per training example; every quantity is evaluated through the kernel → a finite calculation.


Gaussian/RBF Kernel

Small γ: behaves like a linear kernel. Large γ: overfitting (approaches nearest-neighbor behavior?).



    [Ben-Hur & Weston 2005]


Recap

Softness of the margin (C) and nonlinearity (the kernel).


Check out the SVM Toy

• http://www.csie.ntu.edu.tw/~cjlin/libsvm/
• -c (cost): controls the softness of the margin / number of SVs
• -g (gamma): controls the curvature of the hyperplane


    Cross Validation

[Diagram: training data split into pseudo "training" and "test" subsets; the real test set is held out]

• What is the best (C, γ)? → Data dependent
• Needs to be determined by "testing performance"
• Split the training data into pseudo "training" and "testing" sets
• Exhaustive grid search to determine the best (C, γ)
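The split-and-search procedure above can be sketched in plain Python (a schematic, not libsvm's implementation; `evaluate` is a hypothetical user-supplied scorer that would train and test an SVM):

```python
import itertools

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_size, folds, start = n // k, [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < n % k else 0)
        folds.append(list(range(start, end)))
        start = end
    return folds

def grid_search(evaluate, Cs, gammas, n, k=5):
    """Exhaustive search over (C, gamma) by mean k-fold CV accuracy.
    evaluate(C, gamma, train_idx, test_idx) returns an accuracy in [0, 1]."""
    folds = k_fold_indices(n, k)
    best, best_acc = None, -1.0
    for C, gamma in itertools.product(Cs, gammas):
        accs = []
        for i, test_idx in enumerate(folds):
            # Train on all folds except fold i, validate on fold i.
            train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
            accs.append(evaluate(C, gamma, train_idx, test_idx))
        mean_acc = sum(accs) / len(accs)
        if mean_acc > best_acc:
            best, best_acc = (C, gamma), mean_acc
    return best, best_acc

# Toy scorer that happens to prefer (C, gamma) = (1, 0.5);
# a real scorer would train an SVM on train_idx and score test_idx.
scorer = lambda C, g, tr, te: 1.0 if (C, g) == (1, 0.5) else 0.5
print(grid_search(scorer, [0.1, 1, 10], [0.5, 2], n=10))  # ((1, 0.5), 1.0)
```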


    Outline

• Machine Learning → Classification
• High-level Concepts of SVM
• Interpretation of SVM Model/Result
• Use Case Study


(1) Decision value as strength

Decision function: f(x) = sign(w^T x_new + b)

[Figure: points on the '+' and '-' sides of the hyperplane; the magnitude of w^T x_new + b reflects distance from the boundary]


    Facial Movement Classification

• Classes: brow up (+) or down (-)
• Features: pixels of the Gabor-filtered image


    Decision value as strength

Probability estimates from decision values are also available.


    (2)Weight as feature importance

• Magnitude of weight: feature importance
• Similar to regression



    (3)Weights as profiles

Fluorescent images of cells at various dosages of a certain drug.

Various image-based features.

Clustering the weights shows the primary and secondary effects of the drug.


    Outline

• Machine Learning → Classification
• High-level Concepts of SVM
• Interpretation of SVM Model/Result
• Use Case Study


    The Software

• SVM requires a constrained quadratic optimization solver → not easy to implement.
• Off-the-shelf software:
  – LIBSVM by Chih-Jen Lin et al.
  – SVMlight by Thorsten Joachims
• Incorporated into many ML packages:
  – MATLAB / PyML / R …


    Beginners may…

1. Convert their data into the format of an SVM package.
2. Skip data scaling.
3. Randomly try a few parameters, without cross validation.
4. Get good results on training data, but poor results on testing data.


    Data scaling

Without scaling:
 – A feature with a large dynamic range may dominate the separating hyperplane.

label    x     Height   Gender
y1 = 0   x1    150      2
y2 = 1   x2    180      1
y3 = 1   x3    185      1

[Plot: Height vs. Gender; Height dominates the axis]
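A common fix, as the next slides suggest, is to scale every feature to a common range such as [0, 1]. A minimal sketch (libsvm ships `svm-scale` for this; the function below is my illustration, not libsvm code):

```python
def minmax_scale(column):
    """Linearly scale a list of values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant feature: nothing to scale
    return [(v - lo) / (hi - lo) for v in column]

# The Height/Gender table above, scaled per feature so neither dominates.
heights = [150, 180, 185]
genders = [2, 1, 1]
print(minmax_scale(heights))  # [0.0, 0.857..., 1.0]
print(minmax_scale(genders))  # [1.0, 0.0, 0.0]
```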


Parameter Selection

[Contour plot of cross-validation accuracy over (C, γ), with a marked "good area"]


User case: Astroparticle scientist

• User: "I am using libsvm in an astroparticle physics application ... First, let me congratulate you on a really easy to use and nice package. Unfortunately, it gives me astonishingly bad test results ..."
• Developers: "OK. Please send us your data. ... We are able to get 97% test accuracy. Is that good enough for you?"
• User: "You earned a copy of my PhD thesis."


    Dynamic Range Mismatch

• A problem from astroparticle physics:

1 1:2.6173e+01 2:5.88670e+01 3:-1.89469e-01 4:1.25122e+02
1 1:5.7073e+01 2:2.21404e+02 3:8.60795e-02 4:1.22911e+02
1 1:1.7259e+01 2:1.73436e+02 3:-1.29805e-01 4:1.25031e+02
1 1:2.1779e+01 2:1.24953e+02 3:1.53885e-01 4:1.52715e+02
1 1:9.1339e+01 2:2.93569e+02 3:1.42391e-01 4:1.60540e+02
1 1:5.5375e+01 2:1.79222e+02 3:1.65495e-01 4:1.11227e+02
1 1:2.9562e+01 2:1.91357e+02 3:9.90143e-02 4:1.03407e+02

• #Training set: 3,089; #testing set: 4,000
• Large dynamic range of some features.


Overfitting

• Training:
  $ ./svm-train train.1   (default parameters used)
  optimization finished, #iter = 6131
  nSV = 3053, nBSV = 724
  Total nSV = 3053
• Training accuracy:
  $ ./svm-predict train.1 train.1.model o
  Accuracy = 99.7734% (3082/3089)
• Testing accuracy:
  $ ./svm-predict test.1 train.1.model test.1.out
  Accuracy = 66.925% (2677/4000)

nSV and nBSV: number of SVs and bounded SVs (α_i = C). Without scaling, one feature may dominate the decision value → overfitting.
• 3053/3089 training data become support vectors → overfitting
• High training accuracy but low testing accuracy → overfitting


    Suggested Procedure

• Data pre-scaling: scale to range [0, 1] or unit variance
• Use the (default) Gaussian (RBF) kernel
• Use cross-validation to find the best parameters (C, γ)
• Train your model with the best parameters
• Test!

All of the above is done automatically by the "easy.py" script provided with libsvm.


    Large Scale SVM

• (#training data >> #features) and linear kernel:
  – Use primal solvers (e.g. liblinear)
• To get an approximate result in a short time:
  – Allow an inaccurate stopping condition: svm-train -e 0.01
  – Use stochastic gradient descent solvers


    Resources

• LIBSVM: http://www.csie.ntu.edu.tw/~cjlin/libsvm
• LIBSVM Tools: http://www.csie.ntu.edu.tw/~cjlin/libsvmtools
• Kernel Machines Forum: http://www.kernel-machines.org
• Hsu, Chang, and Lin: A Practical Guide to Support Vector Classification
• My email: [email protected]
• Acknowledgement:
  – Many slides from Dr. Chih-Jen Lin, NTU
