Machine Learning Algorithms for Classification
Rob Schapire
Princeton University

Machine Learning
• studies how to automatically learn to make accurate predictions based on past observations
• classification problems:
  • classify examples into given set of categories
[Figure: labeled training examples → machine learning algorithm → classification rule; new example → classification rule → predicted classification]

Examples of Classification Problems
• text categorization (e.g., spam filtering)
• fraud detection
• optical character recognition
• machine vision (e.g., face detection)
• natural-language processing (e.g., spoken language understanding)
• market segmentation (e.g., predict if customer will respond to promotion)
• bioinformatics (e.g., classify proteins according to their function)
• ...

Characteristics of Modern Machine Learning
• primary goal: highly accurate predictions on test data
  • goal is not to uncover underlying “truth”
• methods should be general purpose, fully automatic and “off-the-shelf”
• however, in practice, incorporation of prior, human knowledge is crucial
• rich interplay between theory and practice
• emphasis on methods that can handle large datasets

Why Use Machine Learning?
• advantages:
  • often much more accurate than human-crafted rules (since data driven)
  • humans often incapable of expressing what they know (e.g., rules of English, or how to recognize letters), but can easily classify examples
  • don’t need a human expert or programmer
  • automatic method to search for hypotheses explaining data
  • cheap and flexible; can apply to any learning task
• disadvantages:
  • need a lot of labeled data
  • error prone; usually impossible to get perfect accuracy

This Talk
• machine learning algorithms:
  • decision trees
    • conditions for successful learning
  • boosting
  • support-vector machines
  • others not covered:
    • neural networks
    • nearest neighbor algorithms
    • Naive Bayes
    • bagging
    • random forests
    • ...
• practicalities of using machine learning algorithms

Decision Trees

Example: Good versus Evil
• problem: identify people as good or bad from their appearance

| | sex | mask | cape | tie | ears | smokes | class |
|---|---|---|---|---|---|---|---|
| training data | | | | | | | |
| batman | male | yes | yes | no | yes | no | Good |
| robin | male | yes | yes | no | no | no | Good |
| alfred | male | no | no | yes | no | no | Good |
| penguin | male | no | no | yes | no | yes | Bad |
| catwoman | female | yes | no | no | yes | no | Bad |
| joker | male | no | no | no | no | no | Bad |
| test data | | | | | | | |
| batgirl | female | yes | yes | no | yes | no | ?? |
| riddler | male | yes | no | no | no | no | ?? |

A Decision Tree Classifier
[Figure: decision tree]
tie?
  no → cape?
    no → bad
    yes → good
  yes → smokes?
    no → good
    yes → bad
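Read as code, a tree like this is just a set of nested tests. The sketch below assumes the tree on this slide tests tie at the root, then cape on the "no" branch and smokes on the "yes" branch; that is one reading of the figure, chosen because it agrees with all six training rows. The function and dictionary names are illustrative, not from the talk.

```python
def classify(person):
    """Walk the assumed tree: tie at the root, then cape or smokes."""
    if person["tie"] == "yes":
        return "bad" if person["smokes"] == "yes" else "good"
    return "good" if person["cape"] == "yes" else "bad"

# Only the three attributes the tree tests are listed; values come from the table.
training = {
    "batman":   ({"cape": "yes", "tie": "no",  "smokes": "no"},  "good"),
    "robin":    ({"cape": "yes", "tie": "no",  "smokes": "no"},  "good"),
    "alfred":   ({"cape": "no",  "tie": "yes", "smokes": "no"},  "good"),
    "penguin":  ({"cape": "no",  "tie": "yes", "smokes": "yes"}, "bad"),
    "catwoman": ({"cape": "no",  "tie": "no",  "smokes": "no"},  "bad"),
    "joker":    ({"cape": "no",  "tie": "no",  "smokes": "no"},  "bad"),
}
test = {
    "batgirl": {"cape": "yes", "tie": "no", "smokes": "no"},
    "riddler": {"cape": "no",  "tie": "no", "smokes": "no"},
}

# The assumed tree reproduces every training label.
for name, (person, label) in training.items():
    assert classify(person) == label

predictions = {name: classify(person) for name, person in test.items()}
print(predictions)
```

Under this reading, batgirl (no tie, cape) lands in a "good" leaf and riddler (no tie, no cape) in a "bad" leaf.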

How to Build Decision Trees
• choose rule to split on
• divide data using splitting rule into disjoint subsets
• repeat recursively for each subset
• stop when leaves are (almost) “pure”
[Figure: the training set {batman, robin, alfred, penguin, catwoman, joker} is split on tie into no → {batman, robin, catwoman, joker} and yes → {alfred, penguin}; ⇒ tie becomes the root of the tree, with a no branch and a yes branch]
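The four steps above can be sketched as a short recursive procedure. This is a minimal illustration rather than C4.5 or CART: it assumes the Gini index p(1 − p) as the impurity measure, scores a split by the size-weighted impurity of its subsets, and stops only at fully pure leaves. The names `build`, `split`, and `predict` are made up for the example; the data is the training table from the Good versus Evil slide.

```python
FEATURES = ["sex", "mask", "cape", "tie", "ears", "smokes"]
TRAIN = [  # (name, values in FEATURES order, class), copied from the training table
    ("batman",   ["male",   "yes", "yes", "no",  "yes", "no"],  "good"),
    ("robin",    ["male",   "yes", "yes", "no",  "no",  "no"],  "good"),
    ("alfred",   ["male",   "no",  "no",  "yes", "no",  "no"],  "good"),
    ("penguin",  ["male",   "no",  "no",  "yes", "no",  "yes"], "bad"),
    ("catwoman", ["female", "yes", "no",  "no",  "yes", "no"],  "bad"),
    ("joker",    ["male",   "no",  "no",  "no",  "no",  "no"],  "bad"),
]

def gini(rows):
    """Gini index p(1 - p), where p is the fraction of 'good' rows."""
    p = sum(1 for r in rows if r[2] == "good") / len(rows)
    return p * (1 - p)

def split(rows, feature):
    """Divide rows into disjoint subsets, one per value of the feature."""
    i = FEATURES.index(feature)
    parts = {}
    for r in rows:
        parts.setdefault(r[1][i], []).append(r)
    return parts

def build(rows):
    """Return a class label (leaf) or a (feature, {value: subtree}) node."""
    labels = sorted({r[2] for r in rows})
    if len(labels) == 1:                      # pure leaf: stop
        return labels[0]
    best = None
    for f in FEATURES:                        # choose rule to split on
        parts = split(rows, f)
        if len(parts) < 2:                    # feature does not divide these rows
            continue
        score = sum(len(p) * gini(p) for p in parts.values()) / len(rows)
        if best is None or score < best[0]:
            best = (score, f, parts)
    if best is None:                          # identical rows, mixed labels: majority leaf
        return max(labels, key=lambda c: sum(r[2] == c for r in rows))
    _, f, parts = best
    return (f, {value: build(subset) for value, subset in parts.items()})  # recurse

def predict(tree, values):
    """Follow the branches matching the example's feature values down to a leaf."""
    while isinstance(tree, tuple):
        feature, children = tree
        tree = children[values[FEATURES.index(feature)]]
    return tree

tree = build(TRAIN)
print(tree)
```

With this particular scoring rule the greedy procedure puts cape at the root, then tie, then smokes, so it fits the training data with a differently shaped tree than the tie-rooted one drawn on the slides.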

How to Choose the Splitting Rule
• key problem: choosing best rule to split on:
[Figure: two candidate splits of the training set. tie: no → {batman, robin, catwoman, joker}, yes → {alfred, penguin}. cape: no → {alfred, penguin, catwoman, joker}, yes → {batman, robin}]
• idea: choose rule that leads to greatest increase in “purity”

How to Measure Purity
• want (im)purity function to look like this: (p = fraction of positive examples)
[Figure: impurity versus p on the interval from 0 to 1; zero at p = 0 and p = 1, largest at p = 1/2]
• commonly used impurity measures:
  • entropy: −p ln p − (1 − p) ln(1 − p)
  • Gini index: p(1 − p)
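Both measures are easy to evaluate on the two candidate splits from the Good versus Evil training data. The sketch below assumes that the "increase in purity" of a split is the parent's impurity minus the size-weighted average impurity of its subsets, a common convention that the slides do not spell out. Class counts are read off the training table; `purity_gain` is an illustrative name.

```python
import math

def entropy(p):
    """-p ln p - (1 - p) ln(1 - p), taking 0 ln 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

def gini(p):
    """Gini index p(1 - p)."""
    return p * (1 - p)

def purity_gain(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the children.

    Each node is a (n_good, n_bad) pair of class counts.
    """
    def node_impurity(node):
        good, bad = node
        return impurity(good / (good + bad))
    total = sum(parent)
    after = sum((g + b) / total * node_impurity((g, b)) for g, b in children)
    return node_impurity(parent) - after

PARENT = (3, 3)            # 3 good, 3 bad in the training table
TIE = [(2, 2), (1, 1)]     # tie = no, tie = yes
CAPE = [(1, 3), (2, 0)]    # cape = no, cape = yes

for name, children in (("tie", TIE), ("cape", CAPE)):
    print(name, purity_gain(gini, PARENT, children), purity_gain(entropy, PARENT, children))
```

Under either measure, splitting on tie leaves the impurity unchanged (both subsets are still half good, half bad), while splitting on cape reduces it, because the cape = yes subset is pure.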

Kinds of Error Rates
• training error = fraction of training examples misclassified
• test error = fraction of test examples misclassified
• generalization error = probability of misclassifying new random example

A Possible Classifier
[Figure: a large decision tree whose internal nodes test sex, mask, cape, tie, ears, and smokes, with leaves labeled good or bad]
• perfectly classifies training data
• BUT: intuitively, overly complex

Another Possible Classifier
[Figure: a one-node decision tree testing mask: no → bad, yes → good]
• overly simple
• doesn’t even fit available data
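To make "doesn't even fit available data" concrete, the training error defined earlier in the talk can be computed for this one-node rule. The sketch assumes the rule reads mask = yes → good and mask = no → bad, and uses the mask column and class labels from the training table; `error_rate` and `mask_rule` are illustrative names.

```python
def error_rate(classifier, examples):
    """Fraction of (features, label) examples the classifier gets wrong."""
    wrong = sum(1 for features, label in examples if classifier(features) != label)
    return wrong / len(examples)

def mask_rule(features):
    """Assumed one-node tree: predict good exactly when the person wears a mask."""
    return "good" if features["mask"] == "yes" else "bad"

TRAINING = [  # mask value and class for each training row
    ({"name": "batman",   "mask": "yes"}, "good"),
    ({"name": "robin",    "mask": "yes"}, "good"),
    ({"name": "alfred",   "mask": "no"},  "good"),
    ({"name": "penguin",  "mask": "no"},  "bad"),
    ({"name": "catwoman", "mask": "yes"}, "bad"),
    ({"name": "joker",    "mask": "no"},  "bad"),
]

missed = [f["name"] for f, label in TRAINING if mask_rule(f) != label]
print(error_rate(mask_rule, TRAINING), missed)
```

The rule misclassifies alfred and catwoman, so its training error is 2 out of 6.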

Tree Size versus Accuracy
[Figure: accuracy (0.5 to 0.9) versus tree size (0 to 100), with one curve for training data and one for test data]
[Figure: error (%) versus tree size (0 to 100), with train and test curves]
• trees must be big enough to fit training data (so that “true” patterns are fully captured)
• BUT: trees that are too big may overfit (capture noise or spurious patterns in the data)
• significant problem: can’t tell best tree size from training error

Overfitting Example
• fitting points with a polynomial
[Figure: three polynomial fits to the same points: underfit (degree = 1), ideal fit (degree = 3), overfit (degree = 20)]

Building an Accurate Classifier
• for good test performance, need:
  • enough training examples
  • good performance on training set
  • classifier that is not too “complex” (“Occam’s razor”)
• classifiers should be “as simple as possible, but no simpler”
• “simplicity” closely related to prior expectations
• measure “complexity” by:
  • number of bits needed to write down
  • number of parameters
  • VC-dimension

Example
Training data:

Good and Bad Classifiers
Good:
[Figure: sufficient data, low training error, simple classifier]
Bad:
[Figure: three panels: insufficient data; training error too high; classifier too complex]

Theory
• can prove, with high probability:

$$(\text{generalization error}) \le (\text{training error}) + O\!\left(\sqrt{\frac{d}{m}}\right)$$

  • d = VC-dimension
  • m = number training examples
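A quick numeric look at the complexity term shows how the bound trades off d against m. The sketch below only evaluates the quantity inside the O(·); the constant hidden by the O-notation is not given in the talk, so the printed values indicate scaling, not actual error bounds. The values of d and m are arbitrary choices for illustration.

```python
import math

def complexity_term(d, m):
    """sqrt(d / m): the quantity inside O(.) in the bound, constant omitted."""
    return math.sqrt(d / m)

# Arbitrary illustrative values: VC-dimension 10, growing training sets.
for m in (100, 400, 1600, 6400):
    print(m, round(complexity_term(10, m), 4))
```

Because the term scales as the square root of d/m, quadrupling the number of training examples halves it, and a more complex classifier (larger d) needs proportionally more data to reach the same value.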

Controlling Tree Size
• typical approach: build very large tree that fully fits training data, then prune back
• pruning strategies:
  • grow on just part of training data, then find pruning with minimum error on held out part
  • find pruning that minimizes (training error) + constant · (tree size)
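The second strategy can be sketched as a selection over candidate prunings. The candidate list below is hypothetical (sizes and training errors invented for illustration), and the sketch assumes the prunings have already been enumerated; it does not show how a real tree learner generates them.

```python
def best_pruning(candidates, constant):
    """Pick the (tree_size, training_error) pair minimizing error + constant * size."""
    return min(candidates, key=lambda c: c[1] + constant * c[0])

# Hypothetical prunings of one large tree: (number of nodes, training error).
CANDIDATES = [(1, 0.40), (5, 0.20), (15, 0.10), (50, 0.02), (100, 0.00)]

for constant in (0.0, 0.005, 0.1):
    print(constant, best_pruning(CANDIDATES, constant))
```

With a constant of zero the full tree wins because only training error counts; as the constant grows, the penalty on size pushes the choice toward smaller trees.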

Decision Trees
• best known:
  • C4.5 (Quinlan)
  • CART (Breiman, Friedman, Olshen & Stone)
• very fast to train and evaluate
• relatively easy to interpret
• but: accuracy often not state-of-the-art

Boosting

Example: Spam Filtering
• problem: filter out spam (junk email)
• gather large collection of examples of spam and non-spam:

From: [email protected]   Rob, can you review a paper...   non-spam
From: [email protected]   Earn money without working!!!! ...   spam
...

• goal: have computer learn from examples to distinguish spam from non-spam
• main observation:
  • easy to find “rules of thumb” that are “often” correct
    • If ‘v1agr@’ occurs in message, then predict ‘spam’
  • hard to find single rule that is very highly accurate

The Boosting Approach
• devise computer program for deriving rough rules of thumb
• apply procedure to subset of emails
• obtain rule of thumb
• apply to 2nd subset of emails
• obtain 2nd rule of thumb
• repeat T times

Details
• how to choose examples on each round?
  • concentrate on “hardest” examples (those most often misclassified by previous rules of thumb)
• how to combine rules of thumb into single prediction rule?
  • take (weighted) majority vote of rules of thumb
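A weighted majority vote is simple to write down. In the sketch below each rule of thumb returns +1 for spam and −1 for non-spam, and the prediction is the sign of the weighted sum. The three rules and their weights are hypothetical; only the ‘v1agr@’ rule and the two sample messages come from the talk.

```python
def weighted_vote(rules, weights, message):
    """Return +1 (spam) or -1 (non-spam) by weighted majority vote."""
    total = sum(w * rule(message) for rule, w in zip(rules, weights))
    return 1 if total > 0 else -1

# Hypothetical rules of thumb: each returns +1 for spam, -1 for non-spam.
rules = [
    lambda msg: 1 if "v1agr@" in msg else -1,
    lambda msg: 1 if "!!!!" in msg else -1,
    lambda msg: 1 if "money" in msg.lower() else -1,
]
weights = [0.4, 0.9, 0.6]  # hypothetical vote weights

print(weighted_vote(rules, weights, "Earn money without working!!!! ..."))
print(weighted_vote(rules, weights, "Rob, can you review a paper..."))
```

No single rule needs to be right on every message; the second message is voted non-spam by all three rules, while the first is flagged as spam because the two rules that fire outweigh the one that does not.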

Boosting
• boosting = general method of converting rough rules of thumb into highly accurate prediction rule
• technically:
  • assume given “weak” learning algorithm that can consistently find classifiers (“rules of thumb”) at least slightly better than random, say, accuracy ≥ 55%
  • given sufficient data, a boosting algorithm can provably construct single classifier with very high accuracy, say, 99%
![Page 37: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/37.jpg)
AdaBoost
• given training examples $(x_i, y_i)$ where $y_i \in \{-1,+1\}$
• initialize $D_1$ = uniform distribution on training examples
• for $t = 1, \ldots, T$:
  • train weak classifier (“rule of thumb”) $h_t$ on $D_t$
  • choose $\alpha_t > 0$
  • compute new distribution $D_{t+1}$:
    • for each example $i$: multiply $D_t(x_i)$ by
      $e^{-\alpha_t}$ (< 1) if $y_i = h_t(x_i)$
      $e^{\alpha_t}$ (> 1) if $y_i \ne h_t(x_i)$
    • renormalize
• output final classifier $H_{\text{final}}(x) = \operatorname{sign}\left(\sum_t \alpha_t h_t(x)\right)$
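The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the weak learner is assumed to be an axis-aligned threshold rule (a "decision stump"), the names `train_stump`, `stump_predict`, `adaboost` and `predict` are hypothetical, and $\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ is the usual AdaBoost choice, whereas the slide only says "choose $\alpha_t > 0$".

```python
import math

def stump_predict(stump, x):
    """A stump (j, thr, s) predicts s if x[j] > thr, else -s."""
    j, thr, s = stump
    return s if x[j] > thr else -s

def train_stump(X, y, D):
    """Assumed weak learner: exhaustive search for the stump with the
    lowest weighted error under the distribution D."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(x[j] for x in X))
        # Thresholds: one below all values, then midpoints between neighbours.
        thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
        for thr in thresholds:
            for s in (+1, -1):
                stump = (j, thr, s)
                err = sum(d for x, yi, d in zip(X, y, D) if stump_predict(stump, x) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best  # (weighted error eps_t, stump h_t)

def adaboost(X, y, T):
    m = len(X)
    D = [1.0 / m] * m                            # D_1 = uniform distribution
    ensemble = []                                # list of (alpha_t, h_t)
    for _ in range(T):
        eps, h = train_stump(X, y, D)            # train weak classifier on D_t
        eps = min(max(eps, 1e-10), 1 - 1e-10)    # guard against log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)  # assumed (standard) choice of alpha_t
        # y * h(x) is +1 when correct, -1 when wrong, so this multiplies
        # by e^{-alpha} or e^{+alpha} exactly as in the update rule.
        D = [d * math.exp(-alpha * yi * stump_predict(h, x)) for x, yi, d in zip(X, y, D)]
        Z = sum(D)
        D = [d / Z for d in D]                   # renormalize
        ensemble.append((alpha, h))
    return ensemble

def predict(ensemble, x):
    """H_final(x) = sign(sum_t alpha_t h_t(x)); ties are broken toward +1."""
    score = sum(alpha * stump_predict(h, x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical 1-D data: positives on both ends, negatives in the middle,
# so no single threshold rule classifies everything correctly.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [+1, +1, -1, -1, +1, +1]
ensemble = adaboost(X, y, T=3)
print([predict(ensemble, x) for x in X])
```

The exhaustive stump search keeps the sketch short; any learner that accepts a weighting over examples could be substituted for `train_stump`.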
![Page 38: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/38.jpg)
Toy Example
[figure: training points under the initial distribution $D_1$]
weak classifiers = vertical or horizontal half-planes
![Page 39: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/39.jpg)
Round 1
[figure: weak classifier $h_1$ and the reweighted distribution $D_2$]
• $\epsilon_1 = 0.30$
• $\alpha_1 = 0.42$
![Page 40: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/40.jpg)
Round 2
[figure: weak classifier $h_2$ and the reweighted distribution $D_3$]
• $\epsilon_2 = 0.21$
• $\alpha_2 = 0.65$
![Page 41: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/41.jpg)
Round 3
[figure: weak classifier $h_3$]
• $\epsilon_3 = 0.14$
• $\alpha_3 = 0.92$
![Page 42: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/42.jpg)
Final Classifier
$H_{\text{final}} = \operatorname{sign}(0.42\, h_1 + 0.65\, h_2 + 0.92\, h_3)$
[figure: the three weighted half-plane classifiers and the combined decision region]
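As a quick arithmetic check on the toy run: the three weights are consistent with the usual AdaBoost rule $\alpha_t = \tfrac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$. That rule is an assumption here, since the slides only require $\alpha_t > 0$, and the helper name `alpha_from_error` is hypothetical. Because the errors are shown rounded to two decimals, agreement is only expected to within about 0.01 to 0.02.

```python
import math

def alpha_from_error(eps):
    """Standard AdaBoost weight for a weak classifier with weighted error eps."""
    return 0.5 * math.log((1.0 - eps) / eps)

# (epsilon_t, alpha_t) pairs as printed on the Round 1-3 slides.
toy_rounds = [(0.30, 0.42), (0.21, 0.65), (0.14, 0.92)]
for t, (eps, alpha_on_slide) in enumerate(toy_rounds, start=1):
    print(t, eps, alpha_on_slide, alpha_from_error(eps))
```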
![Page 43: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/43.jpg)
Theory: Training Error
• weak learning assumption: each weak classifier at least slightly better than random
  • i.e., (error of $h_t$ on $D_t$) $\le 1/2 - \gamma$ for some $\gamma > 0$
• given this assumption, can prove:
  training error($H_{\text{final}}$) $\le e^{-2\gamma^2 T}$
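To get a feel for how fast this bound shrinks, the sketch below evaluates it and finds the first round at which it drops below $1/m$ for $m$ training examples. At that point the training error, which is a multiple of $1/m$, must be exactly zero. The function names and the values $\gamma = 0.1$, $m = 1000$ are illustrative assumptions.

```python
import math

def training_error_bound(gamma, T):
    """Upper bound exp(-2 * gamma^2 * T) on the training error after T rounds."""
    return math.exp(-2.0 * gamma ** 2 * T)

def rounds_until_bound_below_one_over_m(gamma, m):
    """Smallest integer T with exp(-2 gamma^2 T) < 1/m, i.e. T > ln(m) / (2 gamma^2)."""
    return math.floor(math.log(m) / (2.0 * gamma ** 2)) + 1

T = rounds_until_bound_below_one_over_m(gamma=0.1, m=1000)
print(T, training_error_bound(0.1, T))
```

The number of rounds grows only logarithmically in $m$, but quadratically in $1/\gamma$.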
![Page 44: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/44.jpg)
How Will Test Error Behave? (A First Guess)
[figure: error vs. # of rounds (T), with train and test curves]
expect:
• training error to continue to drop (or reach zero)
• test error to increase when $H_{\text{final}}$ becomes “too complex”
  • “Occam’s razor”
  • overfitting
• hard to know when to stop training
![Page 45: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/45.jpg)
Actual Typical Run
[figure: train and test error (%) vs. # of rounds (T) on a log scale, with C4.5 test error marked for reference]
(boosting C4.5 on “letter” dataset)
• test error does not increase, even after 1000 rounds
  • (total size > 2,000,000 nodes)
• test error continues to drop even after training error is zero!

| # rounds | 5 | 100 | 1000 |
|---|---|---|---|
| train error (%) | 0.0 | 0.0 | 0.0 |
| test error (%) | 8.4 | 3.3 | 3.1 |

• Occam’s razor wrongly predicts “simpler” rule is better
![Page 46: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/46.jpg)
![Page 48: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/48.jpg)
The Margins Explanation
• key idea:
  • training error only measures whether classifications are right or wrong
  • should also consider confidence of classifications
• recall: $H_{\text{final}}$ is weighted majority vote of weak classifiers
• measure confidence by margin = strength of the vote
• empirical evidence and mathematical proof that:
  • large margins ⇒ better generalization error (regardless of number of rounds)
  • boosting tends to increase margins of training examples (given weak learning assumption)
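One common way to make "strength of the vote" concrete is the normalized margin $y \sum_t \alpha_t h_t(x) / \sum_t \alpha_t$, which lies in $[-1, +1]$ and is positive exactly when the weighted vote is correct. The slide does not give a formula, so treat this normalization as an assumption. The function name `margin` is hypothetical, and the weights below are reused from the toy example purely for illustration.

```python
def margin(alphas, votes, y):
    """Normalized margin of one example.

    alphas: weights alpha_t of the weak classifiers
    votes:  their predictions h_t(x), each in {-1, +1}
    y:      the true label in {-1, +1}
    """
    return y * sum(a * v for a, v in zip(alphas, votes)) / sum(alphas)

alphas = [0.42, 0.65, 0.92]
print(margin(alphas, [+1, +1, +1], +1))  # unanimous and correct: most confident
print(margin(alphas, [+1, -1, +1], +1))  # correct, but with a dissenting vote
print(margin(alphas, [-1, -1, +1], +1))  # weighted vote is wrong: negative margin
```

Two examples can both be classified correctly and still differ a great deal in margin, which is the information that training error alone discards.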
![Page 49: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/49.jpg)
Application: Human-computer Spoken Dialogue
[with Rahim, Di Fabbrizio, Dutton, Gupta, Hollister & Riccardi]
• application: automatic “store front” or “help desk” for AT&T Labs’ Natural Voices business
• caller can request demo, pricing information, technical support, sales agent, etc.
• interactive dialogue
![Page 50: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/50.jpg)
How It Works
[figure: dialogue loop. Human → raw utterance → automatic speech recognizer → text → natural language understanding → predicted category → dialogue manager → text response → text-to-speech → computer speech → Human]
• NLU’s job: classify caller utterances into 24 categories (demo, sales rep, pricing info, yes, no, etc.)
• weak classifiers: test for presence of word or phrase
![Page 51: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/51.jpg)
Application: Detecting Faces
[Viola & Jones]
• problem: find faces in photograph or movie
• weak classifiers: detect light/dark rectangles in image
• many clever tricks to make extremely fast and accurate
![Page 52: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/52.jpg)
Boosting
• fast (but not quite as fast as other methods)
• simple and easy to program
• flexible: can combine with any learning algorithm, e.g.
  • C4.5
  • very simple rules of thumb
• provable guarantees
• state-of-the-art accuracy
• tends not to overfit (but occasionally does)
• many applications
![Page 53: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/53.jpg)
Support-Vector Machines
![Page 54: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/54.jpg)
![Page 55: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/55.jpg)
Geometry of SVM’s
[figure: separating hyperplane with margin $\delta$ on each side]
• given linearly separable data
• margin = distance to separating hyperplane
• choose hyperplane that maximizes minimum margin
• intuitively:
  • want to separate +’s from −’s as much as possible
  • margin = measure of confidence
![Page 56: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/56.jpg)
Theoretical Justification
• let $\delta$ = minimum margin, $R$ = radius of enclosing sphere
• then
  VC-dim $\le \left(\dfrac{R}{\delta}\right)^2$
• so larger margins ⇒ lower “complexity”
  • independent of number of dimensions
• in contrast, unconstrained hyperplanes in $\mathbb{R}^n$ have
  VC-dim = (# parameters) = $n + 1$
![Page 57: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/57.jpg)
![Page 60: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/60.jpg)
Finding the Maximum Margin Hyperplane
• examples $x_i, y_i$ where $x_i \in \mathbb{R}^n$, $y_i \in \{-1,+1\}$
• find hyperplane $v \cdot x = 0$ with $\|v\| = 1$
• margin $= y(v \cdot x)$
• maximize: $\delta$
  subject to: $y_i(v \cdot x_i) \ge \delta$ and $\|v\| = 1$
• set $w = v/\delta \Rightarrow \|w\| = 1/\delta$
• minimize: $\tfrac{1}{2}\|w\|^2$
  subject to: $y_i(w \cdot x_i) \ge 1$
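The change of variables in the last two steps can be spelled out as follows, assuming the data are linearly separable so that $\delta > 0$. Dividing each constraint by $\delta$ turns it into a constraint on $w$, and maximizing $\delta$ becomes minimizing $\|w\|$, which has the same minimizer as $\tfrac{1}{2}\|w\|^2$:

```latex
\begin{aligned}
y_i (v \cdot x_i) \ge \delta
  &\iff y_i \left( \tfrac{v}{\delta} \cdot x_i \right) \ge 1
  \iff y_i (w \cdot x_i) \ge 1, \\
\|w\| = \frac{\|v\|}{\delta} = \frac{1}{\delta}
  &\implies \Bigl( \max \delta \iff \min \|w\| \iff \min \tfrac{1}{2}\|w\|^2 \Bigr).
\end{aligned}
```

The squared form is preferred because it is differentiable everywhere and gives a quadratic objective with linear constraints.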
![Page 61: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/61.jpg)
Convex Dual
• form Lagrangian, set $\partial/\partial w = 0$
• $w = \sum_i \alpha_i y_i x_i$
• get quadratic program:
  • maximize $\sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
    subject to: $\alpha_i \ge 0$
• $\alpha_i$ = Lagrange multiplier; $\alpha_i > 0 \Rightarrow$ support vector
• key points:
  • optimal $w$ is linear combination of support vectors
  • dependence on $x_i$’s only through inner products
  • maximization problem is convex with no local maxima
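The intermediate algebra, which the slide skips, is short. Starting from the primal problem (minimize $\tfrac{1}{2}\|w\|^2$ subject to $y_i(w \cdot x_i) \ge 1$) with one multiplier $\alpha_i \ge 0$ per constraint:

```latex
\begin{aligned}
L(w, \alpha) &= \tfrac{1}{2}\|w\|^2 - \sum_i \alpha_i \bigl[ y_i (w \cdot x_i) - 1 \bigr], \qquad \alpha_i \ge 0, \\
\frac{\partial L}{\partial w} &= w - \sum_i \alpha_i y_i x_i = 0
  \;\Longrightarrow\; w = \sum_i \alpha_i y_i x_i, \\
L &= \tfrac{1}{2}\|w\|^2 - w \cdot \sum_i \alpha_i y_i x_i + \sum_i \alpha_i
   = \sum_i \alpha_i - \tfrac{1}{2}\|w\|^2 \\
  &= \sum_i \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j .
\end{aligned}
```

Because the hyperplane here passes through the origin (there is no bias term), no equality constraint on the $\alpha_i$ appears.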
![Page 62: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/62.jpg)
What If Not Linearly Separable?
• answer #1: penalize each point by distance from margin 1, i.e., minimize:
  $\tfrac{1}{2}\|w\|^2 + \text{constant} \cdot \sum_i \max\{0,\ 1 - y_i(w \cdot x_i)\}$
• answer #2: map into higher dimensional space in which data becomes linearly separable
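The penalized objective in answer #1 is easy to evaluate directly. The sketch below only computes the objective for a given $w$ and does not optimize it. The name `soft_margin_objective`, the parameter `C` (standing in for the slide's unnamed "constant") and the three data points are illustrative assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def soft_margin_objective(w, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i))."""
    hinge = sum(max(0.0, 1.0 - yi * dot(w, xi)) for xi, yi in zip(X, y))
    return 0.5 * dot(w, w) + C * hinge

# Two points sit comfortably beyond margin 1; the third lies on the wrong side.
X = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
y = [+1, -1, -1]
w = (1.0, 0.0)
print(soft_margin_objective(w, X, y, C=1.0))
```

Only the third point contributes to the penalty: its value $y_i(w \cdot x_i) = -0.5$ falls short of 1 by 1.5, while the other two have value 2 and incur no cost.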
![Page 63: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/63.jpg)
![Page 65: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/65.jpg)
Example
• not linearly separable
• map $\mathbf{x} = (x_1, x_2) \mapsto \Phi(\mathbf{x}) = (1, x_1, x_2, x_1 x_2, x_1^2, x_2^2)$
• hyperplane in mapped space has form
$$a + b x_1 + c x_2 + d x_1 x_2 + e x_1^2 + f x_2^2 = 0$$
  = conic in original space
• linearly separable in mapped space
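To make the mapping concrete, here is a small sketch on made-up data: points inside the unit circle are labeled -1 and points outside are labeled +1. One negative point (the origin) lies inside the triangle formed by the three positive points, so no line separates the classes in the original space, yet the hyperplane with coefficients (a, b, c, d, e, f) = (-1, 0, 0, 0, 1, 1) separates them after applying Φ. The data and the chosen coefficients are assumptions for illustration.

```python
def phi(x):
    """Quadratic feature map (1, x1, x2, x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    return (1.0, x1, x2, x1 * x2, x1 * x1, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical data: -1 inside the unit circle, +1 outside it.
points = [((0.0, 0.0), -1), ((0.5, -0.5), -1), ((-0.3, 0.6), -1),
          ((1.5, 0.0), +1), ((-1.0, 1.0), +1), ((0.0, -2.0), +1)]

# Hyperplane in mapped space: -1 + x1^2 + x2^2 = 0, a circle (conic) in the original space.
w = (-1.0, 0.0, 0.0, 0.0, 1.0, 1.0)

separated = all(y * dot(w, phi(x)) > 0 for x, y in points)
print(separated)
```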
![Page 67: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/67.jpg)
Why Mapping to High Dimensions Is Dumb
• can carry idea further
  • e.g., add all terms up to degree $d$
  • then $n$ dimensions mapped to $O(n^d)$ dimensions
  • huge blow-up in dimensionality
• statistical problem: amount of data needed often proportional to number of dimensions (“curse of dimensionality”)
• computational problem: very expensive in time and memory to work in high dimensions
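The blow-up can be quantified with a short sketch. The number of distinct monomials of degree at most d in n variables is the binomial coefficient C(n + d, d), which grows like n^d for fixed d. The function name and the choice n = 256 are assumptions for illustration.

```python
from math import comb


def num_monomials(n, d):
    """Number of monomials of degree at most d in n variables (constant term included)."""
    return comb(n + d, d)


# n = 2, d = 2 gives the six features 1, x1, x2, x1*x2, x1^2, x2^2.
print(num_monomials(2, 2))

# Growth with the degree for a hypothetical 256-dimensional input.
for d in range(1, 6):
    print(d, num_monomials(256, d))
```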
![Page 68: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/68.jpg)
How SVM’s Avoid Both Problems
• statistically, may not hurt since VC-dimension is independent of number of dimensions ($\approx (R/\delta)^2$)
• computationally, only need to be able to compute inner products
$$\Phi(\mathbf{x}) \cdot \Phi(\mathbf{z})$$
• sometimes can do very efficiently using kernels
![Page 72: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/72.jpg)
Example (continued)
• modify Φ slightly:
$$\mathbf{x} = (x_1, x_2) \mapsto \Phi(\mathbf{x}) = (1, \sqrt{2}\,x_1, \sqrt{2}\,x_2, \sqrt{2}\,x_1 x_2, x_1^2, x_2^2)$$
• then
$$\begin{aligned}
\Phi(\mathbf{x}) \cdot \Phi(\mathbf{z}) &= 1 + 2x_1 z_1 + 2x_2 z_2 + 2x_1 x_2 z_1 z_2 + x_1^2 z_1^2 + x_2^2 z_2^2 \\
&= (1 + x_1 z_1 + x_2 z_2)^2 \\
&= (1 + \mathbf{x} \cdot \mathbf{z})^2
\end{aligned}$$
• simply use in place of usual inner product
• in general, for polynomial of degree $d$, use $(1 + \mathbf{x} \cdot \mathbf{z})^d$
  • very efficient, even though finding hyperplane in $O(n^d)$ dimensions
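The identity can be checked numerically. This is a minimal sketch with two arbitrarily chosen points; the function names are assumptions, and the check covers only the degree-2, two-dimensional case worked out on the slide.

```python
import math


def phi(x):
    """Modified quadratic map (1, sqrt2*x1, sqrt2*x2, sqrt2*x1*x2, x1^2, x2^2)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (1.0, r2 * x1, r2 * x2, r2 * x1 * x2, x1 * x1, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z, d=2):
    """Polynomial kernel (1 + x . z)^d, computed in the original space."""
    return (1.0 + dot(x, z)) ** d


x = (0.5, -1.5)
z = (2.0, 0.25)
explicit = dot(phi(x), phi(z))      # inner product in the 6-dimensional mapped space
via_kernel = poly_kernel(x, z)      # same value from a 2-dimensional inner product
print(explicit, via_kernel)
```

The kernel evaluation needs one inner product in the original space plus a power, regardless of how many dimensions the mapped space has.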
![Page 73: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/73.jpg)
Kernels
• kernel = function $K$ for computing
$$K(\mathbf{x}, \mathbf{z}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{z})$$
• permits efficient computation of SVM’s in very high dimensions
• $K$ can be any symmetric, positive semi-definite function (Mercer’s theorem)
• some kernels:
  • polynomials
  • Gaussian $\exp\left(-\|\mathbf{x} - \mathbf{z}\|^2 / 2\sigma\right)$
  • defined over structures (trees, strings, sequences, etc.)
• evaluation:
$$\mathbf{w} \cdot \Phi(\mathbf{x}) = \sum_i \alpha_i y_i \, \Phi(\mathbf{x}_i) \cdot \Phi(\mathbf{x}) = \sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x})$$
  • time depends on # support vectors
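The evaluation formula translates into a short loop over the support vectors. In this sketch the multipliers, labels, and support vectors are made-up values rather than the output of a trained SVM, and the Gaussian kernel follows the slide's parameterization with 2σ in the denominator (many references use 2σ² instead).

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma)), as written on the slide."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma))


def decision_value(x, support_vectors, kernel):
    """Compute sum_i alpha_i * y_i * K(x_i, x); cost is linear in # support vectors."""
    return sum(alpha * y * kernel(xi, x) for alpha, y, xi in support_vectors)


# Hypothetical (alpha_i, y_i, x_i) triples, not obtained from real training.
support_vectors = [(0.8, +1, (1.0, 1.0)), (0.8, -1, (-1.0, -1.0))]

score_pos = decision_value((0.9, 1.2), support_vectors, gaussian_kernel)
score_neg = decision_value((-1.1, -0.7), support_vectors, gaussian_kernel)
print(score_pos, score_neg)
```

The sign of the returned value gives the predicted class; Φ is never computed explicitly.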
![Page 74: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/74.jpg)
SVM’s versus Boosting
• both are large-margin classifiers (although with slightly different definitions of margin)
• both work in very high dimensional spaces (in boosting, dimensions correspond to weak classifiers)
• but different tricks are used:
  • SVM’s use kernel trick
  • boosting relies on weak learner to select one dimension (i.e., weak classifier) to add to combined classifier
![Page 75: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/75.jpg)
Application: Text Categorization [Joachims]
• goal: classify text documents
  • e.g.: spam filtering
  • e.g.: categorize news articles by topic
• need to represent text documents as vectors in $\mathbb{R}^n$:
  • one dimension for each word in vocabulary
  • value = # times word occurred in particular document
  • (many variations)
• kernels don’t help much
• performance state of the art
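The word-count representation described above can be sketched in a few lines. The whitespace tokenization, lowercasing, and the two sample documents are simplifying assumptions; real systems typically use one of the "many variations".

```python
from collections import Counter


def build_vocabulary(documents):
    """Assign one dimension (index) to each distinct word, in sorted order."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_count_vector(document, vocabulary):
    """Vector whose entry for each word is the number of times it occurs."""
    vector = [0] * len(vocabulary)
    for word, count in Counter(document.lower().split()).items():
        if word in vocabulary:          # words outside the vocabulary are ignored
            vector[vocabulary[word]] = count
    return vector


# Hypothetical two-document corpus.
documents = ["cheap pills cheap offer", "meeting agenda for monday"]
vocabulary = build_vocabulary(documents)
vectors = [to_count_vector(doc, vocabulary) for doc in documents]
print(vocabulary)
print(vectors)
```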
![Page 76: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/76.jpg)
Application: Recognizing Handwritten Characters [Cortes & Vapnik]
• examples are 16 × 16 pixel images, viewed as vectors in $\mathbb{R}^{256}$
[figure: sample digit images with labels 7 7 4 8 0 1 4]
• kernels help:

| degree | error (%) | dimensions |
|---|---|---|
| 1 | 12.0 | 256 |
| 2 | 4.7 | ≈ 33000 |
| 3 | 4.4 | ≈ 10^6 |
| 4 | 4.3 | ≈ 10^9 |
| 5 | 4.3 | ≈ 10^12 |
| 6 | 4.2 | ≈ 10^14 |
| 7 | 4.3 | ≈ 10^16 |
| human | 2.5 | |

• to choose best degree:
  • train SVM for each degree
  • choose one with minimum VC-dimension $\approx (R/\delta)^2$
![Page 77: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/77.jpg)
SVM’s
• fast algorithms now available, but not so simple to program (but good packages available)
• state-of-the-art accuracy
• power and flexibility from kernels
• theoretical justification
• many applications
![Page 78: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/78.jpg)
Other Machine Learning Problem Areas
• supervised learning
  • classification
  • regression – predict real-valued labels
  • rare class / cost-sensitive learning
• unsupervised – no labels
  • clustering
  • density estimation
• semi-supervised
  • in practice, unlabeled examples much cheaper than labeled examples
  • how to take advantage of both labeled and unlabeled examples
  • active learning – how to carefully select which unlabeled examples to have labeled
• on-line learning – getting one example at a time
![Page 79: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/79.jpg)
Practicalities
![Page 80: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/80.jpg)
Getting Data
• more is more
• want training data to be like test data
• use your knowledge of problem to know where to get training data, and what to expect test data to be like
![Page 81: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/81.jpg)
Choosing Features
• use your knowledge to know what features would be helpful for learning
• redundancy in features is okay, and often helpful
  • most modern algorithms do not require independent features
• too many features?
  • could use feature selection methods
  • usually preferable to use algorithm designed to handle large feature sets
![Page 83: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/83.jpg)
Choosing an Algorithm
• first step: identify appropriate learning paradigm
  • classification? regression?
  • labeled, unlabeled or a mix?
  • class proportions heavily skewed?
  • goal to predict probabilities? rank instances?
  • is interpretability of the results important? (keep in mind, no guarantees)
• in general, no learning algorithm dominates all others on all problems
  • SVM’s and boosting decision trees (as well as other tree ensemble methods) seem to be best off-the-shelf algorithms
  • even so, for some problems, difference in performance among these can be large, and sometimes, much simpler methods do better
![Page 84: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/84.jpg)
Choosing an Algorithm (cont.)
• sometimes, one particular algorithm seems to naturally fit problem, but often, best approach is to try many algorithms
  • use knowledge of problem and algorithms to guide decisions
    • e.g., in choice of weak learner, kernel, etc.
  • usually, don’t know what will work until you try
  • be sure to try simple stuff!
  • some packages (e.g. Weka) make it easy to try many algorithms, though implementations are not always optimal
![Page 86: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/86.jpg)
Testing Performance
• does it work? which algorithm is best?
• train on part of available data, and test on rest
  • if dataset large (say, in 1000’s), can simply set aside ≈ 1000 random examples as test
  • otherwise, use 10-fold cross validation
    • break dataset randomly into 10 parts
    • in turn, use each block as a test set, training on other 9 blocks
  • repeat many times
  • use same train/test splits for each algorithm
  • might be natural split (e.g., train on data from 2004-06, test on data from 2007)
    • however, can confound results: bad performance because of algorithm, or change of distribution?
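The cross-validation procedure, including the advice to reuse the same splits for every algorithm, can be sketched as follows. The two "algorithms" (a majority-class baseline and a one-dimensional threshold rule) and the 50-point dataset are toy assumptions chosen only so the sketch runs without any external library.

```python
import random


def k_fold_splits(n_examples, k=10, seed=0):
    """Randomly break indices into k blocks; return (train, test) index lists."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((train, folds[i]))
    return splits


def cross_val_error(train_fn, data, splits):
    """Fraction of examples misclassified when each block is held out in turn."""
    errors = 0
    for train_idx, test_idx in splits:
        predict = train_fn([data[i] for i in train_idx])
        errors += sum(predict(data[i][0]) != data[i][1] for i in test_idx)
    return errors / len(data)


def train_majority(train):
    """Baseline: always predict the most common training label."""
    positives = sum(1 for _, y in train if y == 1)
    label = 1 if 2 * positives >= len(train) else -1
    return lambda x: label


def train_threshold(train):
    """Threshold halfway between the two class means (1-D inputs)."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == -1]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else -1


# Hypothetical 1-D dataset: label +1 exactly when x >= 0.5.
data = [(i / 50, 1 if i >= 25 else -1) for i in range(50)]
splits = k_fold_splits(len(data), k=10, seed=0)   # computed once, shared by both

err_majority = cross_val_error(train_majority, data, splits)
err_threshold = cross_val_error(train_threshold, data, splits)
print(err_majority, err_threshold)
```

Because `splits` is built once and passed to both calls, any difference in the two error estimates comes from the algorithms rather than from different partitions of the data. Repeating with several seeds gives the "repeat many times" step.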
![Page 87: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/87.jpg)
Selecting Parameters
• sometimes, theory can guide setting of parameters, possibly based on statistics measurable on training set
• other times, need to use trial and test, as before
• danger: trying too many combinations can lead to overfitting test set
  • break data into train, validation and test sets
    • set parameters using validation set
    • measure performance on test set for selected parameter settings
  • or do cross-validation within cross-validation
• trying many parameter settings is also very computationally expensive
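A train/validation/test protocol can be sketched as below. The learner (a one-dimensional k-nearest-neighbor rule with parameter k), the synthetic noisy data, the split sizes, and the candidate values are all assumptions for illustration; the point is only that the test set is touched once, after the parameter has been chosen on the validation set.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x (1-D inputs)."""
    neighbors = sorted(train, key=lambda example: abs(example[0] - x))[:k]
    votes = sum(y for _, y in neighbors)
    return 1 if votes >= 0 else -1


def error_rate(train, examples, k):
    wrong = sum(knn_predict(train, x, k) != y for x, y in examples)
    return wrong / len(examples)


# Hypothetical data: label is +1 when x > 0.5, flipped with probability 0.1.
rng = random.Random(0)
data = []
for _ in range(300):
    x = rng.random()
    y = 1 if x > 0.5 else -1
    if rng.random() < 0.1:
        y = -y
    data.append((x, y))

train, validation, test = data[:200], data[200:250], data[250:]

candidates = [1, 3, 5, 9, 15]
# Choose the parameter using the validation set only.
best_k = min(candidates, key=lambda k: error_rate(train, validation, k))
# Report performance on the untouched test set for the selected setting.
test_error = error_rate(train, test, best_k)
print(best_k, test_error)
```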
![Page 89: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/89.jpg)
Running Experiments
• automate everything!
  • write one script that does everything at the push of a single button
    • fewer errors
    • easy to re-run (for instance, if computer crashes in middle of experiment)
    • have explicit, scientific record in script of exact experiments that were executed
• if running many experiments:
  • put result of each experiment in a separate file
  • use script to scan for next experiment to run based on which files have or have not already been created
    • makes very easy to re-start if computer crashes
    • easy to run many experiments in parallel if have multiple processors/computers
  • also need script to automatically gather and compile results
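One possible shape for such a script is sketched below. The experiment names, the JSON file layout, and `run_experiment` (a placeholder that returns no real measurements) are hypothetical; only the pattern of one result file per experiment, scanning for missing files, and a separate gathering step comes from the slide.

```python
import json
import tempfile
from pathlib import Path


def result_path(results_dir, name):
    return Path(results_dir) / f"{name}.json"


def pending_experiments(results_dir, names):
    """Experiments whose result file has not been created yet."""
    return [n for n in names if not result_path(results_dir, n).exists()]


def run_experiment(name):
    # Hypothetical placeholder: a real script would train and evaluate here.
    return {"experiment": name, "error": None}


def run_all(results_dir, names):
    Path(results_dir).mkdir(parents=True, exist_ok=True)
    for name in pending_experiments(results_dir, names):
        result = run_experiment(name)
        final = result_path(results_dir, name)
        # Write to a temporary file first so a crash never leaves a
        # half-written result that would be mistaken for a finished one.
        tmp = final.with_suffix(".tmp")
        tmp.write_text(json.dumps(result))
        tmp.replace(final)


def gather(results_dir):
    """Collect all finished results into one list."""
    return [json.loads(p.read_text()) for p in sorted(Path(results_dir).glob("*.json"))]


names = ["svm_degree1", "svm_degree2", "boosting_stumps"]
with tempfile.TemporaryDirectory() as results_dir:
    run_all(results_dir, names[:1])                 # simulate a crash after one experiment
    remaining = pending_experiments(results_dir, names)
    run_all(results_dir, names)                     # re-start: only missing ones are run
    collected = gather(results_dir)
print(remaining, len(collected))
```

Several copies of the script could share the results directory to run experiments in parallel, though this sketch does not guard against two processes picking the same experiment at the same moment.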
![Page 90: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/90.jpg)
If Writing Your Own Code
• R and MATLAB are great for easy coding, but for speed, may need C or Java
• debugging machine learning algorithms is very tricky!
  • hard to tell if working, since don’t know what to expect
  • run on small cases where can figure out answer by hand
  • test each module/subroutine separately
  • compare to other implementations (written by others, or written in different language)
  • compare to theory or published results
![Page 91: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/91.jpg)
Summary
• central issues in machine learning:
  • avoidance of overfitting
  • balance between simplicity and fit to data
• machine learning algorithms:
  • decision trees
  • boosting
  • SVM’s
  • many not covered
• looked at practicalities of using machine learning methods (will see more in lab)
![Page 92: Rob Schapire Princeton Universityschapire/talks/picasso-minicourse.pdfCharacteristics of Modern Machine Learning • primary goal: highly accurate predictions on test data • goal](https://reader033.fdocuments.in/reader033/viewer/2022041511/5e28387ae999873f8a1fced3/html5/thumbnails/92.jpg)
Further reading on machine learning in general:
Ethem Alpaydin. Introduction to machine learning. MIT Press, 2004.
Christopher M. Bishop. Pattern recognition and machine learning. Springer, 2006.
Richard O. Duda, Peter E. Hart and David G. Stork. Pattern Classification (2nd ed.). Wiley, 2000.
Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
Tom M. Mitchell. Machine Learning. McGraw Hill, 1997.
Vladimir N. Vapnik. Statistical Learning Theory. Wiley, 1998.
Decision trees:
Leo Breiman, Jerome H. Friedman, Richard A. Olshen and Charles J. Stone. Classification and Regression Trees. Wadsworth & Brooks, 1984.
J. Ross Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
Boosting:
Ron Meir and Gunnar Rätsch. An Introduction to Boosting and Leveraging. In Advanced Lectures on Machine Learning (LNAI 2600), 2003. www-ee.technion.ac.il/~rmeir/Publications/MeiRae03.pdf
Robert E. Schapire. The boosting approach to machine learning: An overview. In Nonlinear Estimation and Classification, Springer, 2003. www.cs.princeton.edu/~schapire/boost.html
Support-vector machines:
Christopher J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998. research.microsoft.com/~cburges/papers/SVMTutorial.pdf
Nello Cristianini and John Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000. www.support-vector.net