CSSE463: Image Recognition, Day 15
Announcements: Lab 5 posted, due Weds, Jan 13. Sunset detector posted, due Weds, Jan 20.
Today: project intro; wrap up SVMs and do a demo.
Friday: Lab 5 (SVM).
Next week: Monday: guest lecturer! Tuesday: lightning talks. Thursday: mid-term exam. Friday: sunset detector lab.
Review: SVMs: “best” decision boundary
The “best” hyperplane is the one that maximizes the margin between the classes. Equivalent to:
Solve using quadratic programming:

$$\min_w \; \Phi(w) = \tfrac{1}{2}\, w^T w \quad \text{subject to} \quad d_i(w^T x_i + b) \ge 1, \;\; i = 1, 2, \ldots, N$$

[Figure: separating hyperplane with the margin between the two classes]
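To make the quadratic-programming step concrete, here is a minimal MATLAB sketch. The variable names X and labels, and the use of quadprog, are assumptions for illustration; this is not the lab's code.

```matlab
% Minimal sketch (not the lab's toolkit): solve the hard-margin primal
% above with quadprog from the Optimization Toolbox. Assumes X is an
% N-by-d feature matrix, labels is N-by-1 in {-1,+1}, and the data is
% linearly separable (otherwise quadprog reports infeasibility).
[N, d] = size(X);
H = blkdiag(eye(d), 0);              % quadratic term: penalize w, not the bias
f = zeros(d+1, 1);                   % no linear term
A = -(labels .* [X, ones(N,1)]);     % d_i*(w'*x_i + bias) >= 1, as A*z <= rhs
rhs = -ones(N, 1);
z = quadprog(H, f, A, rhs);          % z = [w; bias]
w = z(1:d);
bias = z(d+1);
margin = 2 / norm(w);                % the quantity the SVM maximizes
```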
Non-separable data
Allow data points to be misclassified, but assign a cost to each misclassified point.
The cost is bounded by the parameter C (which you can set).
You can set different bounds for each class. Why? You can weigh false positives and false negatives differently, as sketched below.
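As an illustration of C and per-class costs, here is a sketch assuming MATLAB's fitcsvm (Statistics and Machine Learning Toolbox) rather than the course's downloaded SVM functions:

```matlab
% Sketch, assuming fitcsvm; X and Y are a feature matrix and label
% vector as before. BoxConstraint plays the role of C, and the Cost
% matrix weighs the two error types differently.
cost = [0 1;     % hypothetical: misclassifying class 1 as class 2 costs 1,
        5 0];    % class 2 as class 1 costs 5 (e.g., costly false negatives)
model = fitcsvm(X, Y, 'BoxConstraint', 10, ...  % the parameter C
                'Cost', cost);                  % per-class misclassification costs
```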
Can we do better?
Cover’s Theorem from information theory says that we can map nonseparable data in the input space to a feature space where the data is separable, with high probability, if:
The mapping is nonlinear.
The feature space has a higher dimension.
The mapping is called a kernel function. Replace every instance of $x_i^T x_j$ in the derivation with $K(x_i, x_j)$; lots of math would follow here to show it works.
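In code, the substitution looks like this. This is a sketch, assuming N, X, and sigma are already defined; K here is the Gaussian RBF introduced on the next slide.

```matlab
% Sketch of the substitution in code: wherever the derivation uses the
% dot product x_i'*x_j, evaluate K(x_i, x_j) instead.
K = @(xi, xj) exp(-sum((xi - xj).^2) / (2*sigma^2));
G = zeros(N, N);                       % the kernel (Gram) matrix
for i = 1:N
    for j = 1:N
        G(i,j) = K(X(i,:), X(j,:));    % replaces X(i,:)*X(j,:)'
    end
end
```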
Most common kernel functions
Polynomial: $K(x, x_i) = (x^T x_i + 1)^p$
Gaussian radial-basis function (RBF): $K(x, x_i) = \exp\!\left(-\frac{1}{2\sigma^2}\,\|x - x_i\|^2\right)$
Two-layer perceptron: $K(x, x_i) = \tanh(\beta_0\, x^T x_i + \beta_1)$
You choose p, σ, or βᵢ.
My experience with real data: use the Gaussian RBF!
[Figure: which kernel to use vs. difficulty of problem, from easy to hard: p = 1, p = 2, higher p, RBF]
Q5
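For instance, with fitcsvm (again an assumed stand-in for the demo software), p and σ map onto named parameters:

```matlab
% Sketch: p maps to 'PolynomialOrder' and sigma to 'KernelScale'.
% MATLAB has no built-in tanh/perceptron kernel; that one would
% require a custom kernel function.
polyModel = fitcsvm(X, Y, 'KernelFunction', 'polynomial', 'PolynomialOrder', 2);
rbfModel  = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', 1.0);
```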
Demo
Software courtesy of http://ida.first.fraunhofer.de/~anton/software.html (GNU public license)
Preview of Lab 5 (posted):
Download the Matlab functions that train and apply the SVM.
The demo script contains examples of how to call the system.
Write a similar script to classify data in another toy problem (a sketch follows below).
Directly applicable to the sunset detector.
Q1-2
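A sketch of the shape such a script might take. The file name and the fitcsvm/predict calls are assumptions; the posted demo script shows the actual calls for the downloaded toolkit.

```matlab
% Sketch of a Lab 5-style script: train on a toy problem, apply, score.
load toyProblem.mat                    % hypothetical file: features X, numeric labels Y
model = fitcsvm(X, Y, 'KernelFunction', 'rbf');
predY = predict(model, X);             % apply the trained SVM
fprintf('Training accuracy: %.1f%%\n', 100 * mean(predY == Y));
```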
Kernel functions
Note that a hyperplane (which by definition is linear) in the feature space = a nonlinear boundary in the input space.
Recall the RBFs: note how the choice of σ affects the classifier.
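A quick way to see this is to sweep σ and watch the model change. This sketch uses fitcsvm's 'KernelScale' as an assumed stand-in for σ in the demo software.

```matlab
% Small sigma gives a wiggly boundary that can overfit;
% large sigma gives a smoother, more linear-looking one.
for sigma = [0.1 1 10]
    m = fitcsvm(X, Y, 'KernelFunction', 'rbf', 'KernelScale', sigma);
    fprintf('sigma = %4.1f -> %d support vectors\n', ...
            sigma, sum(m.IsSupportVector));
end
```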
Comparison with neural nets
Expensive: training can take a long time with large data sets. Consider that you’ll want to experiment with parameters…
But the classification runtime and space are O(sd), where s is the number of support vectors and d is the dimensionality of the feature vectors.
In the worst case, s = the size of the whole training set (like nearest neighbor).
But no worse than implementing a neural net with s perceptrons in the hidden layer.
Empirically shown to have good generalizability even with relatively small training sets and no domain knowledge.
Q3
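For example, s and d can be read off a trained model; the SupportVectors property of a fitcsvm model is the assumption here.

```matlab
% Sketch: gauge the O(sd) per-prediction cost from a trained model.
s = size(model.SupportVectors, 1);     % number of support vectors
d = size(model.SupportVectors, 2);     % feature dimensionality
fprintf('s = %d, d = %d: roughly %d multiplies per prediction\n', s, d, s*d);
```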
Speaking of neural nets:
Back to a demo of matlabNeuralNetDemo.m
Project discussion?