![Page 1: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/1.jpg)
Machine Learning, Decision Trees, Overfitting
Reading: Mitchell, Chapter 3
Machine Learning 10-601
Tom M. Mitchell
Machine Learning Department
Carnegie Mellon University
January 14, 2008
![Page 2: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/2.jpg)
Machine Learning 10-601
Instructors
• William Cohen
• Tom Mitchell
TA’s
• Andrew Arnold
• Mary McGlohon
Course assistant
• Sharon Cavlovich
See webpage for
• Office hours
• Grading policy
• Final exam date
• Late homework policy
• Syllabus details
• ...
webpage: www.cs.cmu.edu/~tom/10601
![Page 3: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/3.jpg)
Machine Learning:
Study of algorithms that
• improve their performance P
• at some task T
• with experience E
well-defined learning task: <P,T,E>
![Page 4: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/4.jpg)
Learning to Predict Emergency C-Sections
[Sims et al., 2000]
9714 patient records, each with 215 features
![Page 5: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/5.jpg)
Learning to detect objects in images
(Prof. H. Schneiderman)
Example training images for each orientation
![Page 6: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/6.jpg)
Learning to classify text documents
Company home page
vs
Personal home page
vs
University home page
vs
…
![Page 7: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/7.jpg)
Reading a noun (vs verb)
[Rustandi et al., 2005]
![Page 8: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/8.jpg)
Machine Learning - Practice
Speech Recognition
Object recognition
Mining Databases
Control learning
Text analysis

• Supervised learning
• Bayesian networks
• Hidden Markov models
• Unsupervised clustering
• Reinforcement learning
• ....
![Page 9: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/9.jpg)
Machine Learning - Theory
PAC Learning Theory (supervised concept learning)
• # examples (m)
• representational complexity (H)
• error rate (ε)
• failure probability (δ)

Other theories for
• Reinforcement skill learning
• Semi-supervised learning
• Active student querying
• …

… also relating:
• # of mistakes during learning
• learner’s query strategy
• convergence rate
• asymptotic performance
• bias, variance
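One standard way the four PAC quantities are tied together is the textbook bound m ≥ (1/ε)(ln|H| + ln(1/δ)), which holds for a learner that outputs a hypothesis consistent with the training data over a finite hypothesis space. The transcript does not show which bound the slide used, so the sketch below is an assumption, and the function name `pac_sample_bound` is hypothetical.

```python
import math

def pac_sample_bound(h_size: int, epsilon: float, delta: float) -> int:
    """Number of examples sufficient for a consistent learner over a finite H.

    Assumed bound: m >= (1/epsilon) * (ln|H| + ln(1/delta)).
    """
    if h_size < 1 or not 0 < epsilon < 1 or not 0 < delta < 1:
        raise ValueError("need |H| >= 1 and 0 < epsilon, delta < 1")
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Conjunctions over 10 boolean attributes: each attribute appears positive,
# negated, or not at all, so |H| = 3**10.
m = pac_sample_bound(h_size=3**10, epsilon=0.1, delta=0.05)
print(m)  # 140
```

The bound grows only logarithmically in |H| and 1/δ but linearly in 1/ε, which is why tightening the error rate is the expensive direction.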
![Page 10: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/10.jpg)
Growth of Machine Learning
• Machine learning already the preferred approach to
– Speech recognition, Natural language processing
– Computer vision
– Medical outcomes analysis
– Robot control
– …
[Figure: ML apps. shown as a niche within all software apps.]
• This ML niche is growing
– Improved machine learning algorithms
– Increased data capture, networking
– Software too complex to write by hand
– New sensors / IO devices
– Demand for self-customization to user, environment
![Page 11: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/11.jpg)
Function Approximation and Decision tree learning
![Page 12: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/12.jpg)
Function approximation
Setting:
• Set of possible instances X
• Unknown target function f: X → Y
• Set of function hypotheses H = { h | h: X → Y }
Given:
• Training examples {<xi,yi>} of unknown target function f
Determine:
• Hypothesis h ∈ H that best approximates f
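The setting above can be sketched in a few lines of Python. Everything concrete here is an assumption chosen for illustration: instances are integers, H is a small set of threshold rules, and "best approximates" is read as "lowest training error".

```python
from typing import Callable, List, Tuple

Example = Tuple[int, bool]          # one training pair <x_i, y_i>
Hypothesis = Callable[[int], bool]  # h: X -> Y

def training_error(h: Hypothesis, examples: List[Example]) -> float:
    """Fraction of training examples that h labels incorrectly (0-1 loss)."""
    return sum(h(x) != y for x, y in examples) / len(examples)

def best_hypothesis(hypotheses: List[Hypothesis], examples: List[Example]) -> Hypothesis:
    """Pick the h in H with the lowest training error (first one on ties)."""
    return min(hypotheses, key=lambda h: training_error(h, examples))

def make_threshold(t: int) -> Hypothesis:
    """Hypothetical hypothesis h_t(x) = (x >= t)."""
    return lambda x: x >= t

H = [make_threshold(t) for t in range(11)]
train = [(1, False), (2, False), (4, False), (6, True), (7, True), (9, True)]

h = best_hypothesis(H, train)
print(training_error(h, train))  # 0.0
```

Enumerating H only works because this H is tiny; decision tree learners search a much larger space greedily instead.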
![Page 13: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/13.jpg)
How would you represent AB ∨ CD(¬E)?
Each internal node: test one attribute Xi
Each branch from a node: selects one value for Xi
Each leaf node: predict Y (or P(Y|X ∈ leaf))
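As a sketch of one possible answer, the tree below tests A, then B, and falls back to a C, D, E subtree whenever AB fails. The nested-tuple encoding `(attribute, {value: subtree})` with boolean leaves is an assumption made for this example, not a representation taken from the slides.

```python
from itertools import product

def cde_subtree():
    """Subtree for C AND D AND (NOT E)."""
    return ("C", {
        True: ("D", {
            True: ("E", {True: False, False: True}),
            False: False,
        }),
        False: False,
    })

# Internal node: (attribute, {value: subtree}); leaf: the predicted boolean Y.
TREE = ("A", {
    True: ("B", {True: True, False: cde_subtree()}),
    False: cde_subtree(),
})

def predict(tree, x):
    """Follow one branch per tested attribute until a leaf is reached."""
    while not isinstance(tree, bool):
        attribute, branches = tree
        tree = branches[x[attribute]]
    return tree

def target(x):
    """The formula AB v CD(not E)."""
    return (x["A"] and x["B"]) or (x["C"] and x["D"] and not x["E"])

def agrees_everywhere():
    """Check the tree against the formula on all 32 assignments."""
    return all(
        predict(TREE, dict(zip("ABCDE", bits))) == target(dict(zip("ABCDE", bits)))
        for bits in product([False, True], repeat=5)
    )
```

Note that the C, D, E subtree has to be duplicated under two branches: disjunctions are representable, but they cost replicated subtrees.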
![Page 14: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/14.jpg)
![Page 15: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/15.jpg)
[ID3, C4.5, …]
node = Root
![Page 16: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/16.jpg)
Entropy
Entropy H(X) of a random variable X:
$$H(X) = -\sum_{i=1}^{n} P(X=i)\,\log_2 P(X=i)$$
where n is the # of possible values for X.
H(X) is the expected number of bits needed to encode a randomly drawn value of X (under most efficient code).
Why? Information theory:
• Most efficient code assigns $-\log_2 P(X=i)$ bits to encode the message X=i
• So, expected number of bits to code one random X is:
$$\sum_{i=1}^{n} P(X=i)\,\bigl(-\log_2 P(X=i)\bigr)$$
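A minimal sketch of the entropy computation, assuming the distribution of X is given as a list of probabilities. Terms with P(X=i) = 0 are skipped, following the usual convention that 0 · log 0 = 0.

```python
import math
from typing import Sequence

def entropy(probs: Sequence[float]) -> float:
    """H(X) in bits: expected value of -log2 P(X=i) under P."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0  (a fair coin needs one bit)
print(entropy([0.25] * 4))   # 2.0  (four equally likely values need two bits)
print(entropy([1.0]))        # 0.0  (a certain outcome needs no bits)
```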
![Page 17: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/17.jpg)
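Entropy
Entropy H(X) of a random variable X:
$$H(X) = -\sum_{i=1}^{n} P(X=i)\,\log_2 P(X=i)$$
Specific conditional entropy H(X|Y=v) of X given Y=v:
$$H(X|Y=v) = -\sum_{i=1}^{n} P(X=i|Y=v)\,\log_2 P(X=i|Y=v)$$
Conditional entropy H(X|Y) of X given Y:
$$H(X|Y) = \sum_{v \in values(Y)} P(Y=v)\,H(X|Y=v)$$
Mutual information (aka information gain) of X and Y:
$$I(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$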
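The three quantities on this slide can be computed from a joint distribution. The sketch below assumes the joint is given as a dict mapping `(x, y)` pairs to probabilities; that encoding and the function names are choices made for this example.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Entropy in bits of a discrete distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal of X (axis=0) or Y (axis=1) from {(x, y): p}."""
    m = defaultdict(float)
    for pair, p in joint.items():
        m[pair[axis]] += p
    return dict(m)

def specific_conditional_entropy(joint, v):
    """H(X | Y=v): entropy of X restricted to the cases where Y=v."""
    p_v = sum(p for (_, y), p in joint.items() if y == v)
    return entropy([p / p_v for (_, y), p in joint.items() if y == v])

def conditional_entropy(joint):
    """H(X | Y) = sum over v of P(Y=v) * H(X | Y=v)."""
    return sum(
        p_v * specific_conditional_entropy(joint, v)
        for v, p_v in marginal(joint, 1).items()
        if p_v > 0
    )

def mutual_information(joint):
    """I(X, Y) = H(X) - H(X | Y)."""
    return entropy(marginal(joint, 0).values()) - conditional_entropy(joint)

copies = {(0, 0): 0.5, (1, 1): 0.5}                     # X always equals Y
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(mutual_information(copies))       # 1.0: Y removes all uncertainty about X
print(mutual_information(independent))  # 0.0: Y says nothing about X
```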
![Page 18: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/18.jpg)
Sample Entropy
![Page 19: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/19.jpg)
Gain(S,A) = mutual information between A and target class variable over sample S
$$Gain(S,A) = H(S) - \sum_{v \in values(A)} \frac{|S_v|}{|S|}\,H(S_v)$$
where $S_v$ is the subset of S for which A=v
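Over a finite sample, the probabilities in the entropy formulas are replaced by observed proportions. The sketch below estimates Gain(S,A) as the sample entropy of the labels minus the size-weighted entropy of each subset S_v. The toy rows and the attribute names `wind`, `humid`, and `play` are made up for illustration.

```python
import math
from collections import Counter

def sample_entropy(labels):
    """Entropy in bits of the empirical label distribution in a sample."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute, target):
    """Gain(S, A) = H(S) - sum over v of |S_v|/|S| * H(S_v)."""
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]  # labels in S_v
        remainder += len(subset) / len(examples) * sample_entropy(subset)
    return sample_entropy([e[target] for e in examples]) - remainder

S = [
    {"wind": "weak",   "humid": "high",   "play": "yes"},
    {"wind": "weak",   "humid": "normal", "play": "yes"},
    {"wind": "strong", "humid": "high",   "play": "no"},
    {"wind": "strong", "humid": "normal", "play": "no"},
]
print(gain(S, "wind", "play"))   # 1.0: wind separates the classes perfectly
print(gain(S, "humid", "play"))  # 0.0: humid leaves each subset evenly mixed
```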
![Page 20: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/20.jpg)
![Page 21: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/21.jpg)
![Page 22: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/22.jpg)
![Page 23: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/23.jpg)
Decision Tree Learning Applet
• http://www.cs.ualberta.ca/%7Eaixplore/learning/DecisionTrees/Applet/DecisionTreeApplet.html
![Page 24: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/24.jpg)
Which Tree Should We Output?
• ID3 performs heuristic search through space of decision trees
• It stops at smallest acceptable tree. Why?
Occam’s razor: prefer the simplest hypothesis that fits the data
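The heuristic search can be sketched as a greedy, top-down recursion: at each node pick the attribute with the highest information gain, split, and stop when the labels are pure or no attributes remain. This is a simplified illustration in the spirit of ID3, not a faithful reproduction of it; the tuple encoding, the tie-breaking (first attribute wins), and the toy data are all assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target):
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder

def grow_tree(rows, attributes, target):
    """Return a leaf label, or (attribute, {value: subtree})."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]  # majority-label leaf
    best = max(attributes, key=lambda a: gain(rows, a, target))  # greedy choice
    remaining = [a for a in attributes if a != best]
    branches = {}
    for v in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == v]
        branches[v] = grow_tree(subset, remaining, target)
    return (best, branches)

def classify(tree, row):
    """Walk the tree; raises KeyError on an attribute value unseen in training."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree

# Toy data: y is "yes" exactly when a and b are both 1; c is a distractor.
rows = [
    {"a": 0, "b": 0, "c": 0, "y": "no"},
    {"a": 0, "b": 1, "c": 1, "y": "no"},
    {"a": 1, "b": 0, "c": 1, "y": "no"},
    {"a": 1, "b": 1, "c": 0, "y": "yes"},
]
tree = grow_tree(rows, ["a", "b", "c"], "y")
```

Because each split is chosen once and never revisited, the search never backtracks; it returns the first tree it reaches that fits the training data, not a provably smallest one.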
![Page 25: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/25.jpg)
Why Prefer Short Hypotheses? (Occam’s Razor)
Argument in favor:
• Fewer short hypotheses than long ones
– a short hypothesis that fits the data is less likely to be a statistical coincidence
– highly probable that a sufficiently complex hypothesis will fit the data
Argument opposed:
• Also fewer hypotheses with prime number of nodes and attributes beginning with “Z”
• What’s so special about “short” hypotheses?
![Page 26: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/26.jpg)
![Page 27: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/27.jpg)
![Page 28: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/28.jpg)
![Page 29: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/29.jpg)
![Page 30: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/30.jpg)
Split data into training and validation set
Create tree that classifies training set correctly
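One way to use the held-out validation set is to prune the fully grown tree after training. The sketch below is a simplified bottom-up variant of reduced-error pruning: a subtree is replaced by its majority-label leaf whenever that does not lower accuracy on the validation rows that reach it. The dict encoding with a stored `majority` field, the hand-built tree, and the toy validation rows are all assumptions for illustration.

```python
def classify(tree, row):
    """Internal node: {"attr", "majority", "branches"}; leaf: a label string."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["majority"])
    return tree

def accuracy(tree, rows, target):
    return sum(classify(tree, r) == r[target] for r in rows) / len(rows)

def prune(tree, validation, target):
    """Prune children first, then decide whether this node should become a leaf."""
    if not isinstance(tree, dict) or not validation:
        return tree  # a leaf, or no validation evidence reaches this node
    for v in list(tree["branches"]):
        reaching = [r for r in validation if r[tree["attr"]] == v]
        tree["branches"][v] = prune(tree["branches"][v], reaching, target)
    correct_as_leaf = sum(r[target] == tree["majority"] for r in validation)
    correct_as_tree = sum(classify(tree, r) == r[target] for r in validation)
    return tree["majority"] if correct_as_leaf >= correct_as_tree else tree

# A tree that fit noise in training: under a=1 it also splits on "noise".
tree = {"attr": "a", "majority": "no", "branches": {
    0: "no",
    1: {"attr": "noise", "majority": "yes", "branches": {0: "yes", 1: "no"}},
}}
validation = [
    {"a": 0, "noise": 0, "y": "no"},
    {"a": 0, "noise": 1, "y": "no"},
    {"a": 1, "noise": 0, "y": "yes"},
    {"a": 1, "noise": 1, "y": "yes"},
    {"a": 1, "noise": 1, "y": "yes"},
]
before = accuracy(tree, validation, "y")   # measured before pruning
pruned = prune(tree, validation, "y")      # note: modifies the tree in place
after = accuracy(pruned, validation, "y")
```

In this toy case the split on `noise` is removed and the split on `a` is kept, so validation accuracy rises from 0.6 to 1.0.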
![Page 31: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/31.jpg)
![Page 32: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/32.jpg)
![Page 33: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/33.jpg)
![Page 34: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/34.jpg)
![Page 35: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/35.jpg)
![Page 36: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/36.jpg)
![Page 37: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/37.jpg)
What you should know:
• Well posed function approximation problems:
– Instance space, X
– Sample of labeled training data { <xi, yi> }
– Hypothesis space, H = { f: X → Y }
• Learning is a search/optimization problem over H
– Various objective functions
  • minimize training error (0-1 loss)
  • among hypotheses that minimize training error, select shortest
• Decision tree learning
– Greedy top-down learning of decision trees (ID3, C4.5, ...)
– Overfitting and tree/rule post-pruning
– Extensions…
![Page 38: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/38.jpg)
Questions to think about (1)
• Why use Information Gain to select attributes in decision trees? What other criteria seem reasonable, and what are the tradeoffs in making this choice?
![Page 39: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/39.jpg)
Questions to think about (2)
• ID3 and C4.5 are heuristic algorithms that search through the space of decision trees. Why not just do an exhaustive search?
![Page 40: Machine Learning, Decision Trees OverfittingDecision Trees ...tom/10601_sp08/slides/DTreesAndOverfitting-1-1… · 14/01/2008 · Machine Learning, Decision Trees OverfittingDecision](https://reader034.fdocuments.in/reader034/viewer/2022052004/601738e193cf682734576e33/html5/thumbnails/40.jpg)
Questions to think about (3)
• Consider target function f: <x1,x2> → y, where x1 and x2 are real-valued, y is boolean. What is the set of decision surfaces describable with decision trees that use each attribute at most once?