Computational Learning Theory


Pardis Noorzad

Department of Computer Engineering and IT, Amirkabir University of Technology

Ordibehesht 1390 (April–May 2011)

Introduction

• For the analysis of data structures and algorithms and their limits we have:
  – Computational complexity theory
  – and analysis of time and space complexity
  – e.g. Dijkstra's algorithm or Bellman-Ford?

• For the analysis of ML algorithms, there are other questions we need to answer:
  – Computational learning theory
  – Statistical learning theory

Computational learning theory (Wikipedia)

• Probably approximately correct learning (PAC learning) -- Leslie Valiant
  – inspired boosting

• VC theory -- Vladimir Vapnik
  – led to SVMs

• Bayesian inference -- Thomas Bayes

• Algorithmic learning theory -- E. M. Gold

• Online machine learning -- Nick Littlestone

Today

• PAC model of learning
  – sample complexity
  – computational complexity

• VC dimension
  – hypothesis space complexity

• SRM
  – model estimation

• examples throughout... yay!

• We will try to cover these topics while avoiding difficult formalizations

PAC learning (L. Valiant, 1984)

PAC learning: measures of error
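For reference, the two error measures usually defined at this point (following Mitchell, ref. 1) are the true error of a hypothesis h with respect to the target concept c and instance distribution D, and the empirical (training) error over a sample S:

  $\mathrm{error}_{\mathcal{D}}(h) = \Pr_{x \sim \mathcal{D}}\big[\, c(x) \neq h(x) \,\big]$

  $\mathrm{error}_{S}(h) = \frac{1}{|S|} \sum_{x \in S} \mathbf{1}\big[\, c(x) \neq h(x) \,\big]$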

PAC-learnability: definition
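The standard definition (again following Mitchell, ref. 1): a concept class $C$ is PAC-learnable by learner $L$ using hypothesis space $H$ if, for every $c \in C$, every distribution $\mathcal{D}$ over the instance space $X$, and every $0 < \epsilon < 1/2$ and $0 < \delta < 1/2$, $L$ outputs with probability at least $1 - \delta$ a hypothesis $h \in H$ with $\mathrm{error}_{\mathcal{D}}(h) \le \epsilon$, in time polynomial in $1/\epsilon$, $1/\delta$, $n$ (the instance size), and $\mathrm{size}(c)$.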

Sample complexity
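The key fact usually quoted here (Mitchell, ref. 1): for a finite hypothesis space $H$ and a consistent learner (one that outputs a hypothesis with zero training error), any

  $m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)$

training examples suffice to guarantee that, with probability at least $1 - \delta$, every consistent hypothesis has true error at most $\epsilon$.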

Sample complexity: example
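A classic worked example (the numbers below are illustrative, not taken from the slide) is learning conjunctions of boolean literals: with n variables each literal can appear positively, negatively, or not at all, so |H| = 3^n, and the bound above can be evaluated directly:

```python
# A minimal sketch: sample-complexity bound for a consistent learner over
# a finite hypothesis space, m >= (1/eps) * (ln|H| + ln(1/delta)),
# applied to conjunctions of up to n boolean literals, where |H| = 3^n.
import math

n, eps, delta = 10, 0.1, 0.05           # hypothetical problem parameters
ln_H = n * math.log(3)                   # ln|H| for conjunctions of literals
m = math.ceil((ln_H + math.log(1 / delta)) / eps)
print(m)                                 # -> 140 training examples suffice
```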

VC theory (V. Vapnik and A. Chervonenkis, 1960-1990)

• VC dimension
• Structural risk minimization

VC dimension (V. Vapnik and A. Chervonenkis, 1968, 1971)

• For our purposes, it is enough to find one set of three points that can be shattered

• It is not necessary to be able to shatter every possible set of three points in two dimensions (a small numerical check of this claim is sketched below)
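As a sanity check of the "one set of three points" claim, the sketch below enumerates all labelings of one non-collinear triple and tests each for linear separability; scikit-learn's LinearSVC is just a convenient separability test here, and the specific points are an arbitrary choice, not from the slides:

```python
# Check that one particular set of three non-collinear points in R^2
# is shattered by oriented lines (linear classifiers with a bias term).
import itertools
import numpy as np
from sklearn.svm import LinearSVC

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # non-collinear triple

shattered = True
for labels in itertools.product([0, 1], repeat=3):
    if len(set(labels)) == 1:
        continue  # an all-0 or all-1 labeling is trivially realizable
    clf = LinearSVC(C=1e6, max_iter=100_000).fit(points, list(labels))
    if clf.score(points, list(labels)) < 1.0:
        shattered = False
        print("labeling", labels, "is not linearly separable")

print("shattered:", shattered)  # expect True -> VC dimension of lines in R^2 is >= 3
# (No set of four points can be shattered by a line, so the VC dimension is exactly 3.)
```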

VC dimension: continued

VC dimension: intuition

• High capacity:
  – "not a tree, because it is different from any tree I have seen (e.g. a different number of leaves)"

• Low capacity:
  – "if it's green then it's a tree"

Low VC dimension

• VC dimension is pessimistic

• If we use oriented lines as our hypothesis class
  – we are only guaranteed to learn datasets of 3 points!

• This is because the VC dimension is computed independently of the distribution of instances
  – In reality, near neighbors tend to have the same labels
  – So there is no need to worry about all possible labelings

• There are a lot of datasets containing more points that are learnable by a line

Infinite VC dimension

• A lookup table has infinite VC dimension

• But so does the nearest-neighbor classifier
  – because any number of points, labeled arbitrarily (without overlaps), will be learned successfully

• Yet it performs well in practice...

• So infinite capacity does not guarantee poor generalization performance

Examples of calculating VC dimension

Examples of calculating VC dimension: continued
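Some standard examples of this kind of calculation (the particular examples on the original slides may differ):

• Intervals [a, b] on the real line: VC dimension 2. Any two points can be shattered, but for three points x1 < x2 < x3 the labeling (+, −, +) cannot be produced by a single interval.

• Axis-aligned rectangles in R^2: VC dimension 4.

• Oriented hyperplanes in R^d: VC dimension d + 1 (so oriented lines in the plane have VC dimension 3, matching the earlier example).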

VC dimension of SVMs with polynomial kernels
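Following Burges's tutorial (ref. 2), the usual calculation: a homogeneous polynomial kernel $K(x, x') = (x \cdot x')^p$ on data in $\mathbb{R}^d$ corresponds to a feature space whose dimension equals the number of degree-$p$ monomials, and oriented hyperplanes in an $N$-dimensional space have VC dimension $N + 1$, so

  $\mathrm{VCdim} \le \binom{d + p - 1}{p} + 1$

Burges further argues that this value is attained for such kernels; it grows very quickly with $p$, even though SVMs with polynomial kernels can still generalize well — another illustration that a large capacity by itself does not doom a learner.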

Sample complexity and the VC dimension

(Blumer et al., 1989)
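The upper bound from Blumer et al. (1989), as stated in Mitchell (ref. 1): for a hypothesis space $H$ of finite VC dimension, with probability at least $1 - \delta$ a consistent learner outputs a hypothesis with error at most $\epsilon$ once the number of training examples satisfies

  $m \ge \frac{1}{\epsilon}\left( 4\log_2\frac{2}{\delta} + 8\,\mathrm{VC}(H)\log_2\frac{13}{\epsilon} \right)$

so the sample complexity grows only linearly in the VC dimension, which replaces the $\ln|H|$ term of the finite-hypothesis-space bound.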

Structural risk minimization (V. Vapnik and A. Chervonenkis, 1974)

• A function that fits the training data and minimizes the VC dimension
  – will generalize well

• SRM provides a quantitative characterization of the tradeoff between
  – the complexity of the approximating functions,
  – and the quality of fitting the training data

• First, we will talk about a certain bound...

A bound
(terms labeled on the slide: true mean error / actual risk; empirical risk)

A bound: continued
(terms labeled on the slide: VC confidence; risk bound)
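The bound referred to here is presumably Vapnik's VC bound, in the form given in Burges's tutorial (ref. 2): for a machine with VC dimension $h$ trained on $l$ examples, with probability at least $1 - \eta$,

  $R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha) \;+\; \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}$

where $R(\alpha)$ is the actual risk (true mean error), $R_{\mathrm{emp}}(\alpha)$ is the empirical risk, the square-root term is the VC confidence, and the whole right-hand side is the risk bound.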

SRM: continued

• To minimize the true error (actual risk), both the empirical risk and the VC confidence term should be minimized

• The VC confidence term depends on the chosen class of functions

• Whereas the empirical risk and actual risk depend on the one particular function chosen during training

SRM: continued

SRM: continued

• For a given data set, optimal model estimation with SRM consists of the following steps (the notion of a structure is sketched below):
  1. select an element of the structure (model selection)
  2. estimate the model from this element (statistical parameter estimation)
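As a brief sketch of what the "structure" is (following the usual SRM presentation, e.g. Burges, ref. 2): one fixes a nested family of function classes of increasing capacity,

  $S_1 \subset S_2 \subset \cdots \subset S_n \subset \cdots, \qquad h_1 \le h_2 \le \cdots \le h_n \le \cdots$

where $h_k$ is the VC dimension of $S_k$. Step 1 selects the element $S_k$ whose empirical-risk minimizer gives the smallest value of the risk bound (empirical risk plus VC confidence), and step 2 fits the parameters of a function within that element by minimizing the empirical risk.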

SRM: continued

Support vector machine (Vapnik, 1982)

Support vector machine (Vapnik, 1995)
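A plausible reading of these two slides: the 1982 formulation is the optimal (hard-margin) separating hyperplane, and the 1995 formulation is the soft-margin SVM with slack variables:

  $\min_{w,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \dots, l$

  $\min_{w,b,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} \xi_i \quad \text{s.t.}\quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$

Maximizing the margin $2/\|w\|$ keeps the effective capacity (the VC dimension of gap-tolerant classifiers) small, which is the SRM connection behind "large margins imply a small VC dimension" in the summary below.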

What we said today

• PAC bounds
  – only apply to finite hypothesis spaces

• VC dimension is a measure of complexity
  – can be applied to infinite hypothesis spaces

• Training neural networks uses only ERM, whereas SVMs use SRM

• Large margins imply a small VC dimension

References

1. Machine Learning by T. M. Mitchell
2. A Tutorial on Support Vector Machines for Pattern Recognition by C. J. C. Burges
3. http://www.svms.org/
4. Conversations with Dr. S. Shiry and M. Milanifard

Thanks! Questions?