CSSE463: Image Recognition, Day 15
Announcements: Lab 5 posted, due Weds, Jan 13. Sunset detector posted, due Weds, Jan 20.
Today: project intro; wrap up SVMs and do a demo.
Friday: Lab 5 (SVM).
Next week: Monday: guest lecturer! Tuesday: lightning talks. Thursday: mid-term exam. Friday: sunset detector lab.
Review: SVMs: “best” decision boundary
The “best” hyperplane is the one that maximizes the margin between the classes. Equivalent to:
Solve using quadratic programming:

$$\min_w \; \Phi(w) = \tfrac{1}{2}\, w^T w \quad \text{subject to} \quad d_i(w^T x_i + b) \ge 1, \;\; i = 1, 2, \ldots, N$$

[Figure: separating hyperplane with the margin between the two classes]
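To make the quadratic-programming step concrete, here is a minimal MATLAB sketch. The variable names X and labels, and the use of quadprog, are assumptions for illustration; this is not the lab's code.

```matlab
% Minimal sketch (not the lab's toolkit): solve the hard-margin primal
% above with quadprog from the Optimization Toolbox. Assumes X is an
% N-by-d feature matrix, labels is N-by-1 in {-1,+1}, and the data is
% linearly separable (otherwise quadprog reports infeasibility).
[N, d] = size(X);
H = blkdiag(eye(d), 0);              % quadratic term: penalize w, not the bias
f = zeros(d+1, 1);                   % no linear term
A = -(labels .* [X, ones(N,1)]);     % d_i*(w'*x_i + bias) >= 1, as A*z <= rhs
rhs = -ones(N, 1);
z = quadprog(H, f, A, rhs);          % z = [w; bias]
w = z(1:d);
bias = z(d+1);
margin = 2 / norm(w);                % the quantity the SVM maximizes
```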
Non-separable data
Allow data points to be misclassified, but assign a cost to each misclassified point.
The cost is bounded by the parameter C (which you can set).
You can set different bounds for each class. Why? You can weigh false positives and false negatives differently, as sketched below.
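As an illustration of C and per-class costs, here is a sketch assuming MATLAB's fitcsvm (Statistics and Machine Learning Toolbox) rather than the course's downloaded SVM functions:

```matlab
% Sketch, assuming fitcsvm; X and Y are a feature matrix and label
% vector as before. BoxConstraint plays the role of C, and the Cost
% matrix weighs the two error types differently.
cost = [0 1;     % hypothetical: misclassifying class 1 as class 2 costs 1,
        5 0];    % class 2 as class 1 costs 5 (e.g., costly false negatives)
model = fitcsvm(X, Y, 'BoxConstraint', 10, ...  % the parameter C
                'Cost', cost);                  % per-class misclassification costs
```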
Can we do better?
Cover’s Theorem from information theory says that we can map nonseparable data in the input space to a feature space where the data is separable, with high probability, if:
The mapping is nonlinear.
The feature space has a higher dimension.
The mapping is called a kernel function. Replace every instance of $x_i^T x_j$ in the derivation with $K(x_i, x_j)$; lots of math would follow here to show it works.
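In code, the substitution looks like this. This is a sketch, assuming N, X, and sigma are already defined; K here is the Gaussian RBF introduced on the next slide.

```matlab
% Sketch of the substitution in code: wherever the derivation uses the
% dot product x_i'*x_j, evaluate K(x_i, x_j) instead.
K = @(xi, xj) exp(-sum((xi - xj).^2) / (2*sigma^2));
G = zeros(N, N);                       % the kernel (Gram) matrix
for i = 1:N
    for j = 1:N
        G(i,j) = K(X(i,:), X(j,:));    % replaces X(i,:)*X(j,:)'
    end
end
```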
Most common kernel functions
Polynomial: $K(x, x_i) = (x^T x_i + 1)^p$
Gaussian radial-basis function (RBF): $K(x, x_i) = \exp\!\left(-\frac{1}{2\sigma^2}\,\|x - x_i\|^2\right)$
Two-layer perceptron: $K(x, x_i) = \tanh(\beta_0\, x^T x_i + \beta_1)$
You choose p, σ, or βᵢ.
My experience with real data: use the Gaussian RBF!
[Figure: which kernel to use vs. difficulty of problem, from easy to hard: p = 1, p = 2, higher p, RBF]
Q5
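For instance, with fitcsvm (again an assumed stand-in for the demo software), p and σ map onto named parameters:

```matlab
% Sketch: p maps to 'PolynomialOrder' and sigma to 'KernelScale'.
% MATLAB has no built-in tanh/perceptron kernel; that one would
% require a custom kernel function.
polyModel = fitcsvm(X, Y, 'KernelFunction', 'polynomial', 'PolynomialOrder', 2);
rbfModel  = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', 1.0);
```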
Demo
Software courtesy of http://ida.first.fraunhofer.de/~anton/software.html (GNU public license)
Preview of Lab 5 (posted):
Download the Matlab functions that train and apply the SVM.
The demo script contains examples of how to call the system.
Write a similar script to classify data in another toy problem (a sketch follows below).
Directly applicable to the sunset detector.
Q1-2
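A sketch of the shape such a script might take. The file name and the fitcsvm/predict calls are assumptions; the posted demo script shows the actual calls for the downloaded toolkit.

```matlab
% Sketch of a Lab 5-style script: train on a toy problem, apply, score.
load toyProblem.mat                    % hypothetical file: features X, numeric labels Y
model = fitcsvm(X, Y, 'KernelFunction', 'rbf');
predY = predict(model, X);             % apply the trained SVM
fprintf('Training accuracy: %.1f%%\n', 100 * mean(predY == Y));
```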
Kernel functions
Note that a hyperplane (which by definition is linear) in the feature space = a nonlinear boundary in the input space.
Recall the RBFs: note how the choice of σ affects the classifier.
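A quick way to see this is to sweep σ and watch the model change. This sketch uses fitcsvm's 'KernelScale' as an assumed stand-in for σ in the demo software.

```matlab
% Small sigma gives a wiggly boundary that can overfit;
% large sigma gives a smoother, more linear-looking one.
for sigma = [0.1 1 10]
    m = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', sigma);
    fprintf('sigma = %4.1f -> %d support vectors\n', ...
            sigma, sum(m.IsSupportVector));
end
```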
Comparison with neural nets
Expensive: training can take a long time with large data sets. Consider that you’ll want to experiment with parameters…
But the classification runtime and space are O(sd), where s is the number of support vectors and d is the dimensionality of the feature vectors.
In the worst case, s = the size of the whole training set (like nearest neighbor).
But no worse than implementing a neural net with s perceptrons in the hidden layer.
Empirically shown to have good generalizability even with relatively small training sets and no domain knowledge.
Q3
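For example, s and d can be read off a trained model; the SupportVectors property of a fitcsvm model is the assumption here.

```matlab
% Sketch: gauge the O(sd) per-prediction cost from a trained model.
s = size(model.SupportVectors, 1);     % number of support vectors
d = size(model.SupportVectors, 2);     % feature dimensionality
fprintf('s = %d, d = %d: roughly %d multiplies per prediction\n', s, d, s*d);
```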
Speaking of neural nets:
Back to a demo of matlabNeuralNetDemo.m
Project discussion?