1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich...
-
Upload
mildred-neal -
Category
Documents
-
view
217 -
download
0
Transcript of 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich...
![Page 1: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/1.jpg)
1
CSCI 3202: Introduction to AI
Decision Trees
Greg Grudic
(Notes borrowed from Thomas G. Dietterich and Tom Mitchell)
Intro AI Decision Trees
![Page 2: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/2.jpg)
2
Outline
• Decision Tree Representations– ID3 and C4.5 learning algorithms (Quinlan
1986)– CART learning algorithm (Breiman et al. 1985)
• Entropy, Information Gain
• Overfitting
Intro AI Decision Trees
![Page 3: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/3.jpg)
3
Training Data Example: Goal is to Predict When This Player Will Play Tennis?
Intro AI Decision Trees
![Page 4: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/4.jpg)
4Intro AI Decision Trees
![Page 5: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/5.jpg)
5Intro AI Decision Trees
![Page 6: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/6.jpg)
6Intro AI Decision Trees
![Page 7: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/7.jpg)
7Intro AI Decision Trees
![Page 8: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/8.jpg)
8
( ) ( ){ }1 1, ,..., ,N NS y y= x x
Learning Algorithm for Decision Trees
( ){ }
1,...,
, 0,1
d
j
x x
x y
=
Î
x
What happens if features are not binary? What about regression?Intro AI Decision Trees
![Page 9: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/9.jpg)
9
Choosing the Best Attribute
Intro AI Decision Trees
- Many different frameworks for choosing BEST have been proposed! - We will look at Entropy Gain.
Number +and – examplesbefore and after
a split.
A1 and A2 are “attributes” (i.e. features or inputs).
![Page 10: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/10.jpg)
10
Entropy
Intro AI Decision Trees
![Page 11: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/11.jpg)
11Intro AI Decision Trees
Entropy is like a measure of purity…
![Page 12: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/12.jpg)
12
Entropy
Intro AI Decision Trees
![Page 13: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/13.jpg)
13
Information Gain
Intro AI Decision Trees
![Page 14: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/14.jpg)
14
Training Example
Intro AI Decision Trees
![Page 15: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/15.jpg)
15
Selecting the Next Attribute
Intro AI Decision Trees
![Page 16: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/16.jpg)
16Intro AI Decision Trees
![Page 17: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/17.jpg)
17
Non-Boolean Features
• Features with multiple discrete values– Multi-way splits– Test for one value versus the rest– Group values into disjoint sets
• Real-valued features– Use thresholds
• Regression– Splits based on mean squared error metric
Intro AI Decision Trees
![Page 18: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/18.jpg)
18
Hypothesis Space Search
Intro AI Decision Trees
You do not get the globally optimal tree!- Search space is exponential.
![Page 19: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/19.jpg)
19
Overfitting
Intro AI Decision Trees
![Page 20: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/20.jpg)
20
Overfitting in Decision Trees
Intro AI Decision Trees
![Page 21: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/21.jpg)
21
Validation Data is Used to Control Overfitting
• Prune tree to reduce error on validation set
Intro AI Decision Trees
![Page 22: 1 CSCI 3202: Introduction to AI Decision Trees Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AIDecision Trees.](https://reader035.fdocuments.in/reader035/viewer/2022062520/5697bf9f1a28abf838c94bec/html5/thumbnails/22.jpg)
Quiz 7
1. What is overfitting?2. Are decision trees universal approximators?3. What do decision tree boundaries look like?4. Why are decision trees not globally optimal?5. Do Support Vector Machines give the globally optimal
solution (given the SVN optimization criteria and a specific kernel)?
6. Does the K Nearest Neighbor algorithm give the globally optimal solution if K is allowed to be as large as the number of training examples (N), N fold cross validation is used, and the distance metric is fixed?
Intro AI Decision Trees 22