Lecture 6: Classification – Boosting and SVMs
CAP 5415, Fall 2006

Transcript of the lecture slides.

Page 1

Lecture 6: Classification – Boosting and SVMs
CAP 5415

Fall 2006

Page 2

Course Project

Basic Requirement: Implement a vision algorithm

How complex? The experiments/implementation details should be interesting enough for a 4-5 page write-up. If you choose a relatively simple algorithm, then you should do interesting experiments to test the algorithm's limits.

Page 3

Groups

I encourage you to work in groups: you can do more interesting projects, and the resulting projects should be more interesting.

Come talk to me if you would like to work in a group, but don't know anyone

Group write-up: 6-8 pages
Possible goal: CVPR07 Submission (Dec 4)

~20% acceptance rate, don't plan on submitting second-rate work

Page 4

How do I pick a project?

Strategy #1: Pick a topic that you think is interesting
Read three papers on that topic
Implement one, or implement your own solution (which could be original research)

Lots of opportunity in the area of computational photography

Come talk to me!!! I can point you to interesting papers that have come out recently.

Page 5

Strategy #2

I have a few original research ideas:
Computational Photography
Surveillance
Object Segmentation

Come talk to me to see what you're interested in and if you need help finding partners for a group project

No advantage in terms of grading

Page 6

Q: I work in one of the vision groups, can I just turn in my CVPR07 submission?

A: No

Page 7

Well, actually

Your project may be related, but should not just be your current research project

Examples:
A related side project that you haven't had time to pursue in depth
Application of algorithms that you have developed for one problem to a different problem
The project should have interesting experiments

Page 8

Getting it done

Write-ups due Dec 2
Brief proposal due Nov 7th

I would prefer Oct 18th or 25th

Whatever you work on, keep me updated!!!! I am here to help!

Page 9

Grading

I will give you feedback on your proposal
The earlier you touch base with me, the better
Once we agree, if you do what your proposal stated and turn in a good-quality write-up, you will get an “A”

What if it doesn't work?
It happens a lot!
Turn in a good write-up explaining what went wrong, what you think the underlying problems are, and how you would fix them if you were to keep working on this project
I'm not talking about “I didn't understand the math” or “My code kept crashing”
You can still get an “A”

Page 10

One last thing about projects

I will be scheduling project meetings to meet with each group at the end of November

Class will be canceled on November 21
That class will be your project meeting

Page 11

What's wrong with this decision boundary?

(Assume this is the training data)

Page 12

What's wrong with this decision boundary?

What if you then tested on this data?

This decision boundary over-fit the training data
Over-fitting is hard to do with a linear classifier, but easy with a non-linear classifier

Page 13

How to tell if your classifier is overfitting

Strategy #1: Hold out part of your data as a test set
What if data is hard to come by?

Strategy #2: k-fold cross-validation
Break the data set into k parts
For each part, hold that part out, train the classifier on the rest, and use the held-out part as a test set (see the sketch below)
Slower than the test-set method, but a more efficient use of limited data
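As a concrete illustration of Strategy #2, here is a minimal sketch of k-fold cross-validation; the `train_fn` and `accuracy_fn` callables are placeholders standing in for whatever classifier you are testing, not anything defined in the lecture:

```python
import numpy as np

def k_fold_cross_validation(X, y, k, train_fn, accuracy_fn):
    """Estimate generalization accuracy by k-fold cross-validation.

    train_fn(X_train, y_train) -> trained classifier
    accuracy_fn(classifier, X_test, y_test) -> fraction classified correctly
    """
    n = len(X)
    indices = np.random.permutation(n)      # shuffle the examples once
    folds = np.array_split(indices, k)      # k roughly equal parts
    scores = []
    for i in range(k):
        test_idx = folds[i]                 # this fold is held out
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        clf = train_fn(X[train_idx], y[train_idx])            # train on the rest
        scores.append(accuracy_fn(clf, X[test_idx], y[test_idx]))
    return float(np.mean(scores))           # average held-out accuracy
```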

Page 14

Basic Set-up for Boosting

We want to learn a classifier

We will assume that F(x) has the form

Basic Idea: Iteratively choose weak learners and set their weights
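In the standard boosting notation, the form assumed for F(x) is a weighted vote over weak learners h_t(x) with weights α_t:

$$F(x) = \operatorname{sign}\!\left( \sum_{t=1}^{T} \alpha_t \, h_t(x) \right)$$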

Page 15

AdaBoost

Initialization:

D is a distribution over the training examples
It can also be thought of as a weight on each example

From “A short introduction to boosting” by Freund and Schapire
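In AdaBoost, this distribution is initialized uniformly over the m training examples:

$$D_1(i) = \frac{1}{m}, \qquad i = 1, \dots, m$$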

Page 16

Next Step: Get Weak Learner

The weak learner is trained to do as well as possible on the weighted training set
It must have better than 50% accuracy on that weighted set

From “A short introduction to boosting” by Freund and Schapire
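Concretely, the weak learner h_t is chosen to make the weighted error small, and that error must stay below one half:

$$\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i) \;<\; \frac{1}{2}$$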

Page 17

Next: Update the Weights

From “A short introduction to boosting” by Freund and Schapire
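In the notation of Freund and Schapire's AdaBoost, this step sets the weak learner's weight from its error and then re-weights each example, down-weighting examples the weak learner got right and up-weighting those it got wrong:

$$\alpha_t = \frac{1}{2}\ln\!\left(\frac{1-\epsilon_t}{\epsilon_t}\right), \qquad D_{t+1}(i) = \frac{D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}$$

where Z_t is the normalizer that makes D_{t+1} a distribution.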

Page 18

Demo

Page 19

Demo

In this demo, each weak learner is a stump of the form (ax+by)>c

Page 20

Demo

Pages 21-27

Looking at the algorithm again

From “A short introduction to boosting” by Freund and Schapire
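Pulling the whole algorithm together, here is a minimal sketch of AdaBoost using axis-aligned decision stumps as the weak learners (a simplification of the (ax+by)>c stumps in the demo); the function names and the brute-force stump search are illustrative, not from the lecture:

```python
import numpy as np

def train_stump(X, y, D):
    """Pick the axis-aligned stump (feature, threshold, sign) with lowest weighted error."""
    n, d = X.shape
    best = None
    for f in range(d):
        for thresh in np.unique(X[:, f]):
            for sign in (1, -1):
                pred = np.where(X[:, f] > thresh, sign, -sign)
                err = np.sum(D[pred != y])
                if best is None or err < best[0]:
                    best = (err, f, thresh, sign)
    return best  # (weighted error, feature index, threshold, sign)

def stump_predict(stump, X):
    _, f, thresh, sign = stump
    return np.where(X[:, f] > thresh, sign, -sign)

def adaboost(X, y, T):
    """X: (n, d) array; y: (n,) array of labels in {-1, +1}. Returns stumps and weights."""
    n = len(y)
    D = np.full(n, 1.0 / n)                  # start with a uniform distribution
    stumps, alphas = [], []
    for _ in range(T):
        stump = train_stump(X, y, D)
        eps = max(stump[0], 1e-10)           # weighted error (clamped to avoid log(0))
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        pred = stump_predict(stump, X)
        D = D * np.exp(-alpha * y * pred)    # shrink weights of correct examples, grow wrong ones
        D = D / D.sum()                      # renormalize to a distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted vote of the weak learners, thresholded at zero."""
    score = sum(a * stump_predict(s, X) for s, a in zip(stumps, alphas))
    return np.sign(score)
```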

Page 28

Advantages

A simple algorithm for learning robust classifiers
  Freund & Schapire, 1995; Friedman, Hastie, Tibshirani, 1998
Provides an efficient algorithm for sparse visual feature selection
  Tieu & Viola, 2000; Viola & Jones, 2003
Easy to implement, does not require external optimization tools

(From the Tutorial on Object Detection by Torralba, Fergus, and Li – ICCV 2005)

Page 29

Where do the weak learners come from?

Any classifier can be a weak learner. Common ones:
Stump: r(x) > c
Decision tree (another kind of classifier)
  Combined with AdaBoost, the decision tree has been dubbed the “best off-the-shelf classifier” (Friedman, Hastie, and Tibshirani)

Page 30

Application: Face Detection (Viola and Jones 2001)

Page 31

Features

Threshold on the response to simple features
(Figures copied from “Robust Real-time Object Detection” by Viola and Jones, 2001)

Page 32

Why?

Viola and Jones introduce a trick, called the integral image, that lets them compute the response to these features very quickly

First step: compute a running, cumulative sum across the image

Page 33

Integral Image

Can compute the response over a rectangular region very easily
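A minimal sketch of the idea, assuming the image is a NumPy 2-D array of intensities (the function names are illustrative): the integral image stores cumulative sums, and the sum over any axis-aligned rectangle then takes at most four lookups.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of img[0:r+1, 0:c+1] (cumulative sums over rows, then columns)."""
    return np.asarray(img, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom+1, left:right+1] from four lookups in the integral image
    (inclusion-exclusion on the rectangle's corners)."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]
    if left > 0:
        total -= ii[bottom, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

# A two-rectangle, Viola-Jones-style feature response would then be a difference of
# two such sums, e.g. rect_sum(ii, r0, c0, r1, c_mid) - rect_sum(ii, r0, c_mid + 1, r1, c1)
# (the row/column indices here are purely illustrative).
```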

Page 34

These simple features also capture important structure in faces

Page 35

How well does it work?

95% Detection Rate with a false positive rate of 1 in 14084

Page 36

Is it fast?

In 2001: one 384x288 image every 0.7 seconds
Not real-time
How can we make it faster?

Page 37

Use a cascade

A classifier with 2 weak learners can detect 100% of the faces with a 40% false positive rate
That eliminates 60% of the non-face examples with very little computation
Can now train a slightly more complicated classifier to eliminate even more examples

Page 38

The implementation

32 layers
Layer 1 – two weak learners (rejects 60% of non-faces)
Layer 2 – five weak learners (rejects 80% of non-faces)
Layers 3-5 – 20 weak learners
Layers 6-7 – 50 weak learners
Layers 8-12 – 100 weak learners
Layers 13-32 – 200 weak learners
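To see why this is fast, consider the first two layers alone (assuming layer 2's rejection rate applies to the windows that survive layer 1):

$$(1 - 0.6)\times(1 - 0.8) = 0.4 \times 0.2 = 0.08$$

so only about 8% of non-face windows ever reach layer 3, and the layers they pass through contain just 2 and 5 weak learners.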

Page 39

Computation

On average, 8 features out of 4297 possible features are evaluated at each pixel

On a 700 MHz Pentium III, it can process a 384x288 image in 0.067 seconds

Almost as accurate as without a cascade

Page 40

The Support Vector Machine

Boosted classifiers and SVMs are probably the two most popular classifiers today
I won't get into the math behind SVMs; if you are interested, you should take the pattern recognition course (highly recommended)

Page 41

The Support Vector Machine

Last time, we considered the problem of linear classification

We used probabilities to fit the line

Page 42

The Support Vector Machine

Consider a different criterion, called the margin

Page 43

The Support Vector Machine

Margin – minimum distance from a data point to the decision boundary

Page 44

The Support Vector Machine

The SVM finds the decision boundary that maximizes the margin

Page 45

The Support Vector Machine

The data points lying closest to the decision boundary (those on the margin) are known as support vectors

Page 46

Non-Linear Classification in SVMs

Last time, I showed how you could do non-linear classification by using non-linear transformations of the features

This is the decision boundary from x² + 8xy + y² > 0

This is the same as making a new set of features, then doing linear classification (see the worked example below)
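Concretely, as a worked version of that claim, define the new features

$$(u,\; v,\; w) = (x^2,\; xy,\; y^2)$$

Then the quadratic rule x² + 8xy + y² > 0 is just the linear rule u + 8v + w > 0 in the new feature space.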

Page 47

Non-Linear Classification in SVMs

The decision function can be expressed in terms of dot-products

Each α will be zero unless the corresponding training vector is a support vector
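In the usual SVM notation, with training points x_i, labels y_i in {-1, +1}, learned coefficients α_i, and bias b, the decision function referred to above is

$$f(x) = \operatorname{sign}\!\left( \sum_{i} \alpha_i\, y_i\, (x_i \cdot x) + b \right)$$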

Page 48

Non-Linear Classification in SVMs

What if we wanted to do non-linear classification?

We could transform the features and compute the dot product of the transformed features.

But there may be an easier way!

Page 49

The Kernel Trick

Let Φ(x) be a function that transforms x into a different space

A kernel function K is a function such that
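In symbols, the property is

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$

so the kernel computes, in the original space, the dot product that Φ induces in the transformed space.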

Page 50

Example (Burges 98)

If the feature mapping Φ takes a particular polynomial form, then the dot product Φ(x)·Φ(y) can be computed directly from the original dot product x·y.

This is called the polynomial kernel.
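As a worked degree-2 example of such a mapping (of the kind used in Burges' tutorial): if

$$\Phi(x) = \big(x_1^2,\; \sqrt{2}\, x_1 x_2,\; x_2^2\big)$$

then

$$\Phi(x)\cdot\Phi(y) = (x \cdot y)^2 = K(x, y)$$

so the kernel can be evaluated directly from the original dot product, without ever forming Φ. More generally, the degree-d polynomial kernel is usually written K(x, y) = (x · y + 1)^d.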

Page 51

Gaussian RBF Kernel

One of the most commonly used kernels

Equivalent to doing a dot-product in an infinite dimensional space
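The Gaussian RBF kernel is commonly written as (with bandwidth parameter σ):

$$K(x_i, x_j) = \exp\!\left( -\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2} \right)$$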

Page 52

The Kernel Trick

So, with a kernel function K, the new classification rule replaces each dot product with a kernel evaluation (written out below)

Basic idea: computing the kernel function should be easier than computing a dot-product in the transformed space

Other algorithms, like logistic regression, can also be “kernelized”
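In the same notation as before, the kernelized decision rule referred to above is

$$f(x) = \operatorname{sign}\!\left( \sum_{i} \alpha_i\, y_i\, K(x_i, x) + b \right)$$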

Page 53

So what if I want to use an SVM?

There are well-developed packages with Python and MATLAB interfaces:
libSVM
SVMLight
SVMTorch
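As a rough illustration of what using one of these looks like, here is a minimal sketch with libSVM's bundled Python wrapper (assuming the svmutil module that ships with libSVM; exact option strings and module layout may differ between versions):

```python
# Minimal libSVM usage sketch (assumes libSVM's bundled Python wrapper, svmutil).
from svmutil import svm_train, svm_predict

# Toy 2-D training data: labels in {-1, +1}, features as lists of floats.
y_train = [+1, +1, -1, -1]
x_train = [[1.0, 1.0], [1.2, 0.8], [-1.0, -1.0], [-0.8, -1.2]]

# '-t 2' selects the Gaussian RBF kernel, '-c 1' sets the soft-margin penalty C.
model = svm_train(y_train, x_train, '-t 2 -c 1')

# Predict on new points; svm_predict returns labels, accuracy stats, and decision values.
y_test = [+1, -1]
x_test = [[0.9, 1.1], [-1.1, -0.9]]
labels, accuracy, values = svm_predict(y_test, x_test, model)
print(labels)
```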