
MODULE 11

Decision Trees

LESSON 24

Learning Decision Trees

Keywords: Learning, Training Data, Axis-Parallel Decision Tree


Money   Has-exams   Weather   Goes-to-movie
25      no          fine      no
200     no          hot       yes
100     no          rainy     no
125     yes         rainy     no
30      yes         rainy     no
300     yes         fine      yes
55      yes         hot       no
140     no          hot       no
20      yes         fine      no
175     yes         fine      yes
110     no          fine      yes
90      yes         fine      no

Table 1: Example training data set for Induction of Decision Tree

Decision Tree Construction from Training Data - an Example

Let us take a training set and induce a decision tree using the training set.

Table 1 gives a training data set with four patterns having the class Goes-to-movie=yes and eight patterns having the class Goes-to-movie=no. The impurity of this set is

Im(n) = -(4/12) log2(4/12) - (8/12) log2(8/12) = 0.9183
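This impurity is simply the entropy of the class distribution at the node. As a minimal Python sketch (the helper name entropy and its layout are illustrative, not from the lecture), the calculation above can be reproduced as:

```python
from math import log2

def entropy(counts):
    """Entropy impurity of a node, given the class counts at that node."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Root node of Table 1: 4 patterns with Goes-to-movie=yes and 8 with no.
print(round(entropy([4, 8]), 4))  # 0.9183
```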

We now need to consider all three attributes for the first split and choose the one with the highest information gain.

Money

Let us divide the feature values of money into three parts: < 50, between 50-150, and > 150.


1. Money < 50 has 3 patterns belonging to goes-to-movie=no and 0 patterns belonging to goes-to-movie=yes. The entropy for Money < 50 is

Im(Money < 50) = 0

2. Money 50-150 has 5 patterns belonging to goes-to-movie=no and 1 pattern belonging to goes-to-movie=yes. The entropy for Money 50-150 is

Im(Money 50-150) = -(1/6) log2(1/6) - (5/6) log2(5/6) = 0.65

3. Money > 150 has 3 patterns belonging to goes-to-movie=yes and 0 patterns belonging to goes-to-movie=no. The entropy for Money > 150 is

Im(Money>150) = 0

4. Gain(Money)

Gain(money) = 0.9183 - (3/12) * 0 - (6/12) * 0.65 - (3/12) * 0 = 0.5933
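The same bookkeeping can be scripted. The sketch below (the names DATA, money_bin and gain are mine, not the lecture's; the entropy helper is re-defined here to work on a list of labels so the block runs on its own) encodes Table 1 and recomputes Gain(Money) with the same three ranges:

```python
from math import log2

# Table 1, one tuple per pattern: (money, has-exams, weather, goes-to-movie).
DATA = [
    (25,  "no",  "fine",  "no"),  (200, "no",  "hot",   "yes"),
    (100, "no",  "rainy", "no"),  (125, "yes", "rainy", "no"),
    (30,  "yes", "rainy", "no"),  (300, "yes", "fine",  "yes"),
    (55,  "yes", "hot",   "no"),  (140, "no",  "hot",   "no"),
    (20,  "yes", "fine",  "no"),  (175, "yes", "fine",  "yes"),
    (110, "no",  "fine",  "yes"), (90,  "yes", "fine",  "no"),
]

def entropy(labels):
    """Entropy impurity of a list of class labels."""
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def money_bin(row):
    """The three ranges used in the lecture: < 50, 50-150, > 150."""
    money = row[0]
    if money < 50:
        return "<50"
    return "50-150" if money <= 150 else ">150"

def gain(data, branch_of):
    """Information gain of splitting data by the value returned by branch_of(row)."""
    parent = entropy([row[-1] for row in data])
    branches = {}
    for row in data:
        branches.setdefault(branch_of(row), []).append(row[-1])
    weighted = sum(len(lab) / len(data) * entropy(lab) for lab in branches.values())
    return parent - weighted

print(round(gain(DATA, money_bin), 4))  # 0.5933
```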

Has-exams

1. (Has-exams=yes)

Has a total of 7 patterns with 2 patterns belonging to goes-to-movie=yes and 5 patterns belonging to goes-to-movie=no. The entropy for has-exams=yes is

Im(has-exams = yes) = -(2/7) log2(2/7) - (5/7) log2(5/7) = 0.8631


2. (Has-exams=no)

Has a total of 5 patterns with 2 patterns belonging to goes-to-movie=yes and 3 patterns belonging to goes-to-movie=no. The entropy for has-exams=no is

Im(has-exams = no) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.9710

3. Gain for has-exams

Gain(has-exams) = 0.9183 - (7/12) * 0.8631 - (5/12) * 0.9710 = 0.0102

Weather

1. (Weather=hot)

Has a total of 3 patterns with 1 pattern belonging to goes-to-movie=yes and 2 patterns belonging to goes-to-movie=no. The entropy for weather=hot is

Im(weather = hot) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.9183

2. (Weather=fine)

Has a total of 6 patterns with 3 patterns belonging to goes-to-movie=yes and 3 patterns belonging to goes-to-movie=no. The entropy for weather=fine is


Im(weather = fine) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.0

3. (Weather=rainy)

Has a total of 3 patterns with 0 patterns belonging to goes-to-movie=yes and 3 patterns belonging to goes-to-movie=no. The entropy for weather=rainy is

Im(weather = rainy) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0

4. Gain for weather

Gain(weather) = 0.9183 - (3/12) * 0.9183 - (6/12) * 1.0 - (3/12) * 0 = 0.1887

All three attributes have been investigated and the gain values are:

Gain(money) = 0.5933
Gain(has-exams) = 0.0102
Gain(weather) = 0.1887

Since Gain(money) has the maximum value, money is taken as the first attribute.
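Continuing the earlier sketch (reusing DATA, gain and money_bin defined there; the lambdas selecting the categorical columns are illustrative), this comparison can be checked in a few lines:

```python
# Recompute the three gains on the full training set and pick the largest.
gains = {
    "money": gain(DATA, money_bin),               # 0.5933
    "has-exams": gain(DATA, lambda row: row[1]),  # 0.0102
    "weather": gain(DATA, lambda row: row[2]),    # 0.1887
}
print(max(gains, key=gains.get))  # money
```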

When we take money as the first decision node, the training data gets split into three portions for money < 50, money 50-150, and money > 150. There are 3 patterns along the outcome money < 50, 6 patterns along the outcome money 50-150 and 3 patterns along the outcome money > 150. We will consider each of these three branches and think of the next decision node as a new decision tree.
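Treating each branch as a new, smaller decision-tree problem is naturally expressed as recursion. The rough sketch below reuses DATA, gain and money_bin from the earlier sketch; induce_tree and the splits dictionary are my own names, not part of the lecture:

```python
def induce_tree(data, splits):
    """splits maps an attribute name to a function row -> branch value."""
    labels = [row[-1] for row in data]
    if len(set(labels)) == 1:                 # pure node: make a leaf
        return labels[0]
    if not splits:                            # no attributes left: majority leaf
        return max(set(labels), key=labels.count)
    # choose the attribute with the largest information gain at this node
    best = max(splits, key=lambda name: gain(data, splits[name]))
    branches = {}
    for row in data:
        branches.setdefault(splits[best](row), []).append(row)
    rest = {name: f for name, f in splits.items() if name != best}
    return (best, {value: induce_tree(rows, rest) for value, rows in branches.items()})

tree = induce_tree(DATA, {
    "money": money_bin,
    "has-exams": lambda row: row[1],
    "weather": lambda row: row[2],
})
print(tree)
```

Run on Table 1, this reproduces the structure derived below: money at the root, weather under the 50-150 branch, and has-exams under weather=fine.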


Money < 50

Out of the 3 patterns along this branch, all the patterns belong to goes-to-movie=no, so this is a leaf node and need not be investigated further.

Money > 150

Out of the 3 patterns along this branch, all the patterns belong to goes-to-movie=yes, so this is a leaf node and need not be investigated further.

Money 50-150

This has a total of 6 patterns with 1 pattern belonging to goes-to-movie=yes and 5 patterns belonging to goes-to-movie=no. So the information in this branch is

Im(n) = -(1/6) log2(1/6) - (5/6) log2(5/6) = 0.65

Now we need to check the attributes has-exams and weather to see which is the next attribute to be chosen.

Has-exams

1. (has-exams=yes)

There are a total of 3 patterns with has-exams=yes out of the 6 patterns along this branch. Out of these 3 patterns, 3 patterns belong to goes-to-movie=no and 0 patterns belong to goes-to-movie=yes. So the entropy of has-exams=yes is

Im(has-exams = yes) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0

2. (has-exams=no)


There are a total of 3 patterns with has-exams=no out of the 6 patterns along this branch. Out of these 3 patterns, 2 patterns have goes-to-movie=no and 1 pattern has goes-to-movie=yes. The entropy of has-exams=no is

Im(has-exams = no) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.9183

Gain(has-exams) = 0.65 - (3/6) * 0 - (3/6) * 0.9183 = 0.1909

Weather

1. (weather=hot)

There are two patterns out of six which belong to weather=hot and both of them belong to goes-to-movie=no. The entropy for weather=hot is

Im(weather=hot) = 0

2. (weather=fine)

There are two patterns out of six which belong to weather=fine; 1 pattern belongs to goes-to-movie=yes and 1 pattern belongs to goes-to-movie=no. So, the entropy of weather=fine is

Im(weather = fine) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 0.5 + 0.5 = 1.0

3. (weather=rainy)


There are two patterns out of six which belong to weather=rainy and both of them belong to goes-to-movie=no. The entropy for weather=rainy is

Im(weather = rainy) = 0

4. Gain for weather is

Gain(weather) = 0.65 - (2/6) * 1.0 = 0.3167

The values for gain for has-exams and weather are

Gain(has-exams) = 0.1909
Gain(weather) = 0.3167
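These two values can be checked by restricting DATA (from the earlier sketch) to the six patterns of this branch:

```python
# Restrict the training set to the money 50-150 branch and recompute the gains.
BRANCH = [row for row in DATA if 50 <= row[0] <= 150]
print(round(gain(BRANCH, lambda row: row[1]), 4))  # has-exams: 0.1909
print(round(gain(BRANCH, lambda row: row[2]), 4))  # weather:   0.3167
```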

Since weather has the higher gain value, it is the attribute chosen at this node. For weather=hot and weather=rainy, all the patterns belong to goes-to-movie=no, so these are leaf nodes. Only the branch weather=fine needs to be expanded further, using the remaining attribute has-exams. The entire decision tree is given in Figure 1.
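Since every path of the tree is a simple rule, the finished tree of Figure 1 can be written out directly as nested conditions. This is a hand-translated sketch of the figure, not code from the lecture:

```python
def goes_to_movie(money, has_exams, weather):
    """Classify one pattern with the tree of Figure 1."""
    if money < 50:
        return "no"
    if money > 150:
        return "yes"
    # money in 50-150: the split is on weather
    if weather in ("hot", "rainy"):
        return "no"
    # weather == "fine": the split is on has-exams
    return "no" if has_exams == "yes" else "yes"

print(goes_to_movie(110, "no", "fine"))  # "yes", as in Table 1
```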

The following points may be noted after going through the example:

• The decision tree can be used effectively to choose among several courses of action.

• How the decision tree arrives at a decision can be easily explained: each path in the decision tree corresponds to a simple rule.

• At each node, the attribute chosen to make the split is the one with the highest drop in impurity or highest increase in gain.


Figure 1: The decision tree induced from Table 1. The root node tests has money, with branches < 50, 50-150 and > 150. The branch money < 50 ends in the leaf goes to a movie = false and the branch money > 150 in the leaf goes to a movie = true. The branch money 50-150 leads to the node weather?; the outcomes hot and rainy end in leaves with goes to a movie = false, while fine leads to the node has exams, whose outcome y ends in goes to a movie = false and outcome n in goes to a movie = true.


Assignment

1. There are four coins 1, 2, 3, 4, out of which three coins are of equal weight and one coin is heavier. Use a decision tree to identify the heavier coin.

2. Consider the three-class problem characterized by the training data given in the following table. Obtain the axis-parallel decision tree for the data.

Professor   Number of Students   Funding   Research Output
Sam         3                    Low       Medium
Sam         5                    Low       Medium
Sam         1                    High      High
Pam         1                    High      Low
Pam         5                    Low       Low
Pam         5                    High      Low
Ram         1                    Low       Low
Ram         3                    High      Medium
Ram         5                    Low       High

3. Consider a data set of 10 patterns which is split into 3 classes at a node in a decision tree, where 4 patterns are from class 1, 5 from class 2, and 1 from class 3. Compute the Entropy Impurity. What is the Variance Impurity?

4. For the data in problem 3, compute the Gini Impurity. How about the Misclassification Impurity?

5. Consider the data given in the following table for a three-class problem. If the set of 100 training patterns is split using a variable X into two parts, represented by the left and right subtrees below the decision node as shown in the table, compute the entropy impurity.

6. Compute the variance impurity for the data given in problem 5. How about the misclassification impurity?


Class Label   Total No. in Class   No. in Left Subtree   No. in Right Subtree
1             40                   40                    0
2             30                   10                    20
3             30                   10                    20

7. Consider the two-class problem in a two-dimensional space characterized by the following training data. Obtain an axis-parallel decision tree for the data.

Class 1: (1, 1)t, (2, 2)t, (6, 7)t, (7, 7)t
Class 2: (6, 1)t, (6, 2)t, (7, 1)t, (7, 2)t

8. Consider the data given in problem 7. Suggest an oblique decision tree.

