MODULE 11
Decision Trees
LESSON 24
Learning Decision Trees
Keywords: Learning, Training Data, Axis-Parallel Decision Tree
Money   Has-exams   Weather   Goes-to-movie
25      no          fine      no
200     no          hot       yes
100     no          rainy     no
125     yes         rainy     no
30      yes         rainy     no
300     yes         fine      yes
55      yes         hot       no
140     no          hot       no
20      yes         fine      no
175     yes         fine      yes
110     no          fine      yes
90      yes         fine      no

Table 1: Example training data set for induction of a decision tree
Decision Tree Construction from Training Data - an Example
Let us take a training set and induce a decision tree from it.
Table 1 gives a training data set with four patterns having the class Goes-to-movie=yes and eight patterns having the class Goes-to-movie=no. The impurity of this set is

$$Im(n) = -\frac{4}{12}\log_2\frac{4}{12} - \frac{8}{12}\log_2\frac{8}{12} = 0.9183$$
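This impurity is just the entropy of the class proportions at the node. The following minimal Python sketch (not part of the original lecture; the function name `entropy` is mine) reproduces the value:

```python
from math import log2

def entropy(counts):
    """Entropy impurity of a node from the per-class pattern counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Root node of Table 1: 4 patterns of Goes-to-movie=yes and 8 of no.
print(entropy([4, 8]))   # approximately 0.9183
```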
We now need to consider all three attributes for the first split and choose the one with the highest information gain.
Money
Let us divide the feature values of money into three parts: < 50, between 50 and 150, and > 150.
1. Money < 50 has 3 patterns belonging to goes-to-movie=no and 0 patterns belonging to goes-to-movie=yes. The entropy for Money < 50 is

$$Im(\text{Money} < 50) = 0$$
2. Money 50-150 has 5 patterns belonging to goes-to-movie=no and 1 pattern belonging to goes-to-movie=yes. The entropy for Money 50-150 is

$$Im(\text{Money } 50\text{--}150) = -\frac{1}{6}\log_2\frac{1}{6} - \frac{5}{6}\log_2\frac{5}{6} = 0.65$$
3. Money > 150 has 3 patterns belonging to goes-to-movie=yes and 0 patterns belonging to goes-to-movie=no. The entropy for Money > 150 is

$$Im(\text{Money} > 150) = 0$$
4. Gain for money

$$Gain(\text{money}) = 0.9183 - \frac{3}{12}\cdot 0 - \frac{6}{12}\cdot 0.65 - \frac{3}{12}\cdot 0 = 0.5933$$
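The gain computation above is just the parent impurity minus the size-weighted impurities of the child nodes. Here is a hedged Python sketch of it for the Money split; the helper names (`entropy`, `info_gain`) and the [yes, no] encoding of the bins are my own, not from the lecture:

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(parent_counts, children_counts):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * entropy(child) for child in children_counts)
    return entropy(parent_counts) - weighted

# Money split from Table 1, each child given as [yes, no] counts.
children = [[0, 3],   # Money < 50
            [1, 5],   # Money 50-150
            [3, 0]]   # Money > 150
print(info_gain([4, 8], children))   # approximately 0.5933
```

The same two helpers can be reused verbatim for the has-exams and weather splits considered below.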
Has-exams
1. (Has-exams=yes)
Has a total of 7 patterns, with 2 patterns belonging to goes-to-movie=yes and 5 patterns belonging to goes-to-movie=no. The entropy for has-exams=yes is

$$Im(\text{has-exams} = \text{yes}) = -\frac{2}{7}\log_2\frac{2}{7} - \frac{5}{7}\log_2\frac{5}{7} = 0.8631$$
2. (Has-exams=no)

Has a total of 5 patterns, with 2 patterns belonging to goes-to-movie=yes and 3 patterns belonging to goes-to-movie=no. The entropy for has-exams=no is

$$Im(\text{has-exams} = \text{no}) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{3}{5}\log_2\frac{3}{5} = 0.9710$$
3. Gain for has-exams
$$Gain(\text{has-exams}) = 0.9183 - \frac{7}{12}\cdot 0.8631 - \frac{5}{12}\cdot 0.9710 = 0.0102$$
Weather
1. (Weather=hot)
Has a total of 3 patterns, with 1 pattern belonging to goes-to-movie=yes and 2 patterns belonging to goes-to-movie=no. The entropy for weather=hot is

$$Im(\text{weather} = \text{hot}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.9183$$
2. (Weather=fine)
Has a total of 6 patterns, with 3 patterns belonging to goes-to-movie=yes and 3 patterns belonging to goes-to-movie=no. The entropy for weather=fine is
$$Im(\text{weather} = \text{fine}) = -\frac{3}{6}\log_2\frac{3}{6} - \frac{3}{6}\log_2\frac{3}{6} = 1.0$$
3. (Weather=rainy)
Has a total of 3 patterns, with 0 patterns belonging to goes-to-movie=yes and 3 patterns belonging to goes-to-movie=no. The entropy for weather=rainy is

$$Im(\text{weather} = \text{rainy}) = -\frac{0}{3}\log_2\frac{0}{3} - \frac{3}{3}\log_2\frac{3}{3} = 0$$
4. Gain for weather
$$Gain(\text{weather}) = 0.9183 - \frac{3}{12}\cdot 0.9183 - \frac{6}{12}\cdot 1.0 - \frac{3}{12}\cdot 0 = 0.1887$$
All three attributes have been investigated, and here are the gain values:

Gain(money) = 0.5933
Gain(has-exams) = 0.0102
Gain(weather) = 0.1887

Since Gain(money) has the maximum value, money is taken as the first attribute.
When we take money as the first decision node, the training data gets split into three portions for money < 50, money 50-150, and money > 150. There are 3 patterns along the outcome money < 50, 6 patterns along the outcome money 50-150, and 3 patterns along the outcome money > 150. We will consider each of these three branches and treat the choice of the next decision node as a new decision tree problem.
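This recursion can be sketched in a few lines of Python. The following is an illustrative ID3-style sketch under my own assumptions (attributes treated as categorical, with money pre-binned into the three intervals above; all names are mine), not the exact procedure of any particular package:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Drop in impurity obtained by splitting the rows on attribute `attr`."""
    n = len(labels)
    gain = entropy(labels)
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        gain -= len(subset) / n * entropy(subset)
    return gain

def build_tree(rows, labels, attrs):
    """Recursively choose the attribute with the highest gain at each node."""
    if len(set(labels)) == 1 or not attrs:            # pure node or no attributes left
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[(best, value)] = build_tree([rows[i] for i in idx],
                                             [labels[i] for i in idx],
                                             [a for a in attrs if a != best])
    return branches

def bin_money(m):
    """Pre-bin the numeric Money attribute into the three intervals used above."""
    return "<50" if m < 50 else ("50-150" if m <= 150 else ">150")

# Table 1 as (money, has-exams, weather, goes-to-movie) tuples.
data = [(25, "no", "fine", "no"), (200, "no", "hot", "yes"), (100, "no", "rainy", "no"),
        (125, "yes", "rainy", "no"), (30, "yes", "rainy", "no"), (300, "yes", "fine", "yes"),
        (55, "yes", "hot", "no"), (140, "no", "hot", "no"), (20, "yes", "fine", "no"),
        (175, "yes", "fine", "yes"), (110, "no", "fine", "yes"), (90, "yes", "fine", "no")]
rows = [{"money": bin_money(m), "has-exams": e, "weather": w} for m, e, w, _ in data]
labels = [cls for *_, cls in data]
print(build_tree(rows, labels, ["money", "has-exams", "weather"]))
```

Run on Table 1, this sketch selects money at the root and then weather and has-exams inside the 50-150 branch, matching the tree of Figure 1.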
Money < 50
Out of the 3 patterns along this branch, all the patterns belong to goes-to-movie=no, so this is a leaf node and need not be investigated further.
Money > 150
Out of the 3 patterns along this branch, all the patterns belong to goes-to-movie=yes, so this is a leaf node and need not be investigated further.
Money 50-150

This has a total of 6 patterns, with 1 pattern belonging to goes-to-movie=yes and 5 patterns belonging to goes-to-movie=no. So the information in this branch is

$$Im(n) = -\frac{1}{6}\log_2\frac{1}{6} - \frac{5}{6}\log_2\frac{5}{6} = 0.65$$
Now we need to check the attributes has-exams and weather to see which is the next attribute to be chosen.
Has-exams
1. (has-exams=yes)
There are a total of 3 patterns with has-exams=yes out of the 6 patterns along this branch. Out of these 3 patterns, 3 patterns belong to goes-to-movie=no and 0 patterns belong to goes-to-movie=yes. So the entropy of has-exams=yes is

$$Im(\text{has-exams} = \text{yes}) = -\frac{0}{3}\log_2\frac{0}{3} - \frac{3}{3}\log_2\frac{3}{3} = 0$$
2. (has-exams=no)
There are a total of 3 patterns with has-exams=no out of the 6 patterns along this branch. Out of these 3 patterns, 2 patterns have goes-to-movie=no and 1 pattern has goes-to-movie=yes. The entropy of has-exams=no is

$$Im(\text{has-exams} = \text{no}) = -\frac{1}{3}\log_2\frac{1}{3} - \frac{2}{3}\log_2\frac{2}{3} = 0.9183$$
$$Gain(\text{has-exams}) = 0.65 - \frac{3}{6}\cdot 0 - \frac{3}{6}\cdot 0.9183 = 0.1909$$
Weather
1. (weather=hot)
There are two patterns out of six which belong to weather=hot, and both of them belong to goes-to-movie=no. The entropy for weather=hot is

$$Im(\text{weather} = \text{hot}) = 0$$
2. (weather=fine)
There are two patterns out of six which belong to weather=fine; 1 pattern belongs to goes-to-movie=yes and 1 pattern belongs to goes-to-movie=no. So, the entropy of weather=fine is

$$Im(\text{weather} = \text{fine}) = -\frac{1}{2}\log_2\frac{1}{2} - \frac{1}{2}\log_2\frac{1}{2} = 0.5 + 0.5 = 1.0$$
3. (weather=rainy)
There are two patterns out of six which belong to weather=rainy, and both of them belong to goes-to-movie=no. The entropy for weather=rainy is

$$Im(\text{weather} = \text{rainy}) = 0$$
4. Gain for weather is
$$Gain(\text{weather}) = 0.65 - \frac{2}{6}\cdot 1.0 = 0.3167$$
The values of the gain for has-exams and weather are

Gain(has-exams) = 0.1909
Gain(weather) = 0.3167
Since weather has the higher gain value, it is the attribute chosen at this node. For weather=hot and weather=rainy, all the patterns belong to goes-to-movie=no, so these branches end in leaf nodes. Only the branch weather=fine needs to be expanded further, using the remaining attribute has-exams. The entire decision tree is given in Figure 1.
The following points may be noted after going through the example:

• The decision tree can be used effectively to choose among several courses of action.

• The way in which the decision tree comes up with a decision can be easily explained: each path in the decision tree corresponds to a simple rule (see the sketch after Figure 1).

• At each node, the attribute chosen to make the split is the one with the highest drop in impurity or, equivalently, the highest gain.
Figure 1: The decision tree induced from Table 1. [Figure: the root node tests "has money"; the branch < 50 leads to goes to a movie = false, the branch > 150 leads to goes to a movie = true, and the branch 50-150 leads to a "weather?" node. Under "weather?", hot and rainy both lead to goes to a movie = false, while fine leads to a "has exams" node whose "y" branch gives goes to a movie = false and whose "n" branch gives goes to a movie = true.]
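To illustrate the point that each path of Figure 1 corresponds to a simple rule, here is a small Python sketch (my own rendering, with the 50/150 thresholds taken from the money bins above) of the induced tree written out as nested if/else statements:

```python
def goes_to_movie(money, has_exams, weather):
    """The tree of Figure 1 as rules; each root-to-leaf path reads as one rule."""
    if money < 50:
        return "no"
    if money > 150:
        return "yes"
    # 50 <= money <= 150: test weather
    if weather in ("hot", "rainy"):
        return "no"
    # weather == "fine": test has-exams
    return "no" if has_exams == "yes" else "yes"

# Example: the pattern (110, no, fine) from Table 1 is classified as yes.
print(goes_to_movie(110, "no", "fine"))
```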
Assignment
1. There are four coins 1, 2, 3, 4, out of which three coins are of equal weight and one coin is heavier. Use a decision tree to identify the heavier coin.
2. Consider the three-class problem characterized by the training data given in the following table. Obtain the axis-parallel decision tree for the data.
Professor   Number of Students   Funding   Research Output
Sam         3                    Low       Medium
Sam         5                    Low       Medium
Sam         1                    High      High
Pam         1                    High      Low
Pam         5                    Low       Low
Pam         5                    High      Low
Ram         1                    Low       Low
Ram         3                    High      Medium
Ram         5                    Low       High
3. Consider a data set of 10 patterns which is split into 3 classes at a node in a decision tree, where 4 patterns are from class 1, 5 from class 2, and 1 from class 3. Compute the Entropy Impurity. What is the Variance Impurity? (A sketch of these impurity measures is given after this problem list.)
4. For the data in problem 3, compute the Gini Impurity. How about the Misclassification Impurity?
5. Consider the data given in the following table for a three-class problem. If the set of 100 training patterns is split using a variable X into two parts, represented by the left and right subtrees below the decision node as shown in the table, compute the entropy impurity.
6. Compute the variance impurity for the data given in problem 5. How about the misclassification impurity?
Class Label   Total No. in Class   No. in Left Subtree   No. in Right Subtree
1             40                   40                    0
2             30                   10                    20
3             30                   10                    20
7. Consider the two-class problem in a two-dimensional space characterized by the following training data. Obtain an axis-parallel decision tree for the data.

Class 1: $(1, 1)^t$, $(2, 2)^t$, $(6, 7)^t$, $(7, 7)^t$
Class 2: $(6, 1)^t$, $(6, 2)^t$, $(7, 1)^t$, $(7, 2)^t$
8. Consider the data given in problem 7. Suggest an oblique decision tree.
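For problems 3-6, the following Python sketch lists the impurity measures in their commonly used forms. It is only an aid, not part of the original lecture: the function names are mine, the example counts are neutral (not the assignment data), and the variance impurity shown is the usual two-class form P(class 1)·P(class 2).

```python
from math import log2

def entropy_impurity(counts):
    """Entropy impurity: -sum_i p_i log2 p_i over the class proportions."""
    n = sum(counts)
    return -sum((c / n) * log2(c / n) for c in counts if c > 0)

def gini_impurity(counts):
    """Gini impurity: 1 - sum_i p_i^2."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def misclassification_impurity(counts):
    """Misclassification impurity: 1 - max_i p_i."""
    n = sum(counts)
    return 1 - max(counts) / n

def variance_impurity(count1, count2):
    """Two-class variance impurity: P(class 1) * P(class 2)."""
    n = count1 + count2
    return (count1 / n) * (count2 / n)

# Neutral example: a node with class counts 2, 3 and 5.
counts = [2, 3, 5]
print(entropy_impurity(counts), gini_impurity(counts), misclassification_impurity(counts))
print(variance_impurity(4, 6))   # e.g. a two-class node with 4 and 6 patterns -> 0.24
```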