Computational Learning Theory
Pardis Noorzad
Department of Computer Engineering and IT, Amirkabir University of Technology
Ordibehesht 1390 (April-May 2011)
Introduction
• For the analysis of data structures and algorithms and their limits we have:
  – computational complexity theory
  – analysis of time and space complexity
  – e.g. should we use Dijkstra's algorithm or Bellman-Ford?
• For the analysis of ML algorithms, there are other questions we need to answer:
  – computational learning theory
  – statistical learning theory
Computational learning theory (Wikipedia)
• Probably approximately correct learning (PAC learning) -- Leslie Valiant
  – inspired boosting
• VC theory -- Vladimir Vapnik
  – led to SVMs
• Bayesian inference -- Thomas Bayes
• Algorithmic learning theory -- E. M. Gold
• Online machine learning -- Nick Littlestone
Today
• PAC model of learning
  – sample complexity
  – computational complexity
• VC dimension
  – hypothesis space complexity
• SRM
  – model estimation
• examples throughout … yay!
• We will try to cover these topics while avoiding difficult formalizations
PAC learning (L. Valiant, 1984)
PAC learning: measures of error
PAC-learnability: definition
Sample complexity
Sample complexity: example
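The body of the example slide is not reproduced in this text. As a sketch, the standard sample-complexity bound for a consistent learner over a finite hypothesis space H (chapter 7 of the Mitchell textbook listed in the references) can be evaluated directly; the function name and the boolean-conjunctions example below are illustrative, not from the slides:

```python
import math

def pac_sample_complexity(hypothesis_space_size, epsilon, delta):
    """Classic PAC bound for a consistent learner over a finite
    hypothesis space H (Mitchell, ch. 7):
        m >= (1/epsilon) * (ln|H| + ln(1/delta))
    Any consistent hypothesis is then probably (1 - delta)
    approximately (error < epsilon) correct."""
    return math.ceil((1.0 / epsilon) *
                     (math.log(hypothesis_space_size) + math.log(1.0 / delta)))

# Example: conjunctions over n = 10 boolean literals, so |H| = 3**10
# (each literal appears positive, negated, or not at all).
m = pac_sample_complexity(3 ** 10, epsilon=0.1, delta=0.05)
print(m)  # 140
```

Note that the bound grows only logarithmically in |H|, which is why even exponentially large finite hypothesis spaces can be PAC-learnable from modest samples.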
VC theory (V. Vapnik and A. Chervonenkis, 1960-1990)
• VC dimension
• Structural risk minimization
VC dimension (V. Vapnik and A. Chervonenkis, 1968, 1971)
• For our purposes, it is enough to find one set of three points that can be shattered
• It is not necessary to be able to shatter every possible set of three points in 2 dimensions
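The shattering claim can be checked mechanically. This sketch (the helper names are ours, not from the slides) runs a perceptron over all 2^3 labelings of one set of three non-collinear points; since every labeling is linearly separable, the perceptron convergence theorem guarantees each run terminates with a separating oriented line:

```python
import itertools

def perceptron(points, labels, max_sweeps=10000):
    """Look for a line w.x + b = 0 with t*(w.x + b) > 0 for all points.
    Returns (w, b) on success, None if no separator was found."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_sweeps):
        converged = True
        for (x, y), t in zip(points, labels):
            if t * (w[0] * x + w[1] * y + b) <= 0:  # misclassified point
                w[0] += t * x
                w[1] += t * y
                b += t
                converged = False
        if converged:
            return w, b
    return None

# One set of three non-collinear points: all 2^3 = 8 labelings are
# realizable, so oriented lines shatter this set and VC dimension >= 3.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
shattered = all(perceptron(pts, labeling) is not None
                for labeling in itertools.product([-1, 1], repeat=3))
print(shattered)  # True
```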
VC dimension: continued
VC dimension: intuition
• High capacity:
  – "not a tree, b/c different from any tree I have seen (e.g. different number of leaves)"
• Low capacity:
  – "if it's green then it's a tree"
Low VC dimension
• VC dimension is pessimistic
• If using oriented lines as our hypothesis class, we can only learn datasets with 3 points!
• This is b/c VC dimension is computed independent of the distribution of instances
  – In reality, near neighbors have the same labels
  – So no need to worry about all possible labelings
• There are a lot of datasets containing more points that are learnable by a line
Infinite VC dimension
• A lookup table has infinite VC dimension
• But so does the nearest neighbor classifier
  – b/c any number of points, labeled arbitrarily (w/o overlaps), will be successfully learned
• But it performs well...
• So infinite capacity does not guarantee poor generalization performance
Examples of calculating VC dimension
Examples of calculating VC dimension: continued
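The worked examples on these slides are not reproduced in this text. One standard example that can be verified by brute force (the function names below are illustrative) is that closed intervals [a, b] on the real line have VC dimension 2: every labeling of two points is realizable, but the labeling (+, -, +) of three points never is. For a finite point set it suffices to try interval endpoints drawn from the points themselves, plus one empty interval:

```python
def interval_labels(points, a, b):
    """Labeling induced by the interval [a, b]: +1 inside, -1 outside."""
    return tuple(1 if a <= x <= b else -1 for x in points)

def shatters(points):
    """Can intervals [a, b] realize every labeling of `points`?
    Snapping endpoints to the points (plus an empty interval) covers
    every labeling an interval can induce on this finite set."""
    candidates = sorted(points) + [min(points) - 1.0]
    realizable = {interval_labels(points, a, b)
                  for a in candidates for b in candidates}
    return len(realizable) == 2 ** len(points)

print(shatters([0.0, 1.0]))        # True: two points are shattered
print(shatters([0.0, 1.0, 2.0]))  # False: (+, -, +) is impossible
```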
VC dimension of SVMs with polynomial kernels
Sample complexity and the VC dimension
(Blumer et al., 1989)
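The formula on this slide is not reproduced in the text. As a sketch, the Blumer et al. (1989) upper bound on the number of examples sufficient for PAC learning, with the constants as stated in Mitchell's textbook (the function name is ours), is m >= (1/epsilon) * (4 log2(2/delta) + 8 VC(H) log2(13/epsilon)):

```python
import math

def vc_sample_complexity(vc_dim, epsilon, delta):
    """Sufficient sample size in terms of VC dimension
    (Blumer et al., 1989; constants as in Mitchell, ch. 7):
        m >= (1/epsilon) * (4*log2(2/delta) + 8*VC(H)*log2(13/epsilon))
    Works for infinite hypothesis spaces, unlike the ln|H| bound."""
    return math.ceil((1.0 / epsilon) *
                     (4 * math.log2(2.0 / delta) +
                      8 * vc_dim * math.log2(13.0 / epsilon)))

# Example: oriented lines in the plane have VC dimension 3.
print(vc_sample_complexity(3, epsilon=0.1, delta=0.05))  # 1899
```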
Structural risk minimization (V. Vapnik and A. Chervonenkis, 1974)
• A function that fits the training data and minimizes VC dimension will generalize well
• SRM provides a quantitative characterization of the tradeoff between
  – the complexity of the approximating functions,
  – and the quality of fitting the training data
• First, we will talk about a certain bound...
A bound
• With probability 1 - η, the true mean error (actual risk) R(α) is bounded by the empirical risk R_emp(α) plus a confidence term (Vapnik; see the Burges tutorial in the references):
    R(α) ≤ R_emp(α) + sqrt( ( h(ln(2m/h) + 1) - ln(η/4) ) / m )
  – h is the VC dimension, m the number of training samples
A bound: continued
• The square-root term is the VC confidence
• The whole right-hand side is the risk bound
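The VC confidence term of Vapnik's risk bound, as given in the Burges tutorial cited in the references, can be evaluated numerically. This sketch (the function name is ours) shows how the term grows with the VC dimension h at a fixed sample size m, which is the tradeoff SRM exploits:

```python
import math

def vc_confidence(h, m, eta):
    """VC confidence term from Vapnik's risk bound (Burges tutorial):
        sqrt( (h*(ln(2m/h) + 1) - ln(eta/4)) / m )
    h: VC dimension, m: number of training samples,
    1 - eta: probability with which the bound holds."""
    return math.sqrt((h * (math.log(2.0 * m / h) + 1.0)
                      - math.log(eta / 4.0)) / m)

# For fixed m, the bound loosens as the hypothesis class gets richer:
for h in (5, 50, 500):
    print(h, round(vc_confidence(h, m=10000, eta=0.05), 3))
```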
SRM: continued
• To minimize true error (actual risk), both the empirical risk and the VC confidence term should be minimized
• The VC confidence term depends on the chosen class of functions
• Whereas empirical risk and actual risk depend on the one particular function chosen for training
SRM: continued
SRM: continued
• For a given data set, optimal model estimation with SRM consists of the following steps:
  1. select an element of a structure (model selection)
  2. estimate the model from this element (statistical parameter estimation)
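The two steps above can be illustrated with a toy nested structure of polynomial hypothesis classes S_1 ⊂ S_2 ⊂ … ordered by degree. Everything in this sketch is illustrative (the data, the penalty weight, and the names are ours), and the additive penalty is only a crude stand-in for the VC confidence term, not Vapnik's formula:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(40)  # noisy target

best = None
for degree in range(1, 10):
    # Step 1 (model selection): pick element S_degree of the structure.
    # Step 2 (parameter estimation): minimize empirical risk within it.
    coeffs = np.polyfit(x, y, degree)
    emp_risk = float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    # Crude complexity penalty standing in for the VC confidence term.
    penalized_risk = emp_risk + 0.01 * (degree + 1)
    if best is None or penalized_risk < best[0]:
        best = (penalized_risk, degree)

print("selected degree:", best[1])
```

Minimizing the penalized risk, rather than the empirical risk alone, is what distinguishes SRM from plain ERM: the latter would always prefer the highest degree.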
SRM: continued
Support vector machine (Vapnik, 1982)
Support vector machine (Vapnik, 1995)
What we said today
• PAC bounds
  – only apply to finite hypothesis spaces
• VC dimension is a measure of complexity
  – can be applied to infinite hypothesis spaces
• Training neural networks uses only ERM, whereas SVMs use SRM
• Large margins imply small VC dimension
References
1. Machine Learning by T. M. Mitchell
2. A Tutorial on Support Vector Machines for Pattern Recognition by C. J. C. Burges
3. http://www.svms.org/
4. Conversations with Dr. S. Shiry and M. Milanifard
Thanks! Questions?