
Decision Tree (Rule Induction)

Poll: Which data mining technique..?

Classification Process with 10 records

Step 1: Model Construction with 6 records

Training Data:

NAME     RANK            YEARS  TENURED
Mike     Assistant Prof  3      no
Mary     Assistant Prof  7      yes
Bill     Professor       2      yes
Jim      Associate Prof  7      yes
Dave     Assistant Prof  6      no
Anne     Associate Prof  3      no

The classification algorithm constructs a classifier (model) from the training data; here the model is the rule:

IF rank = ‘professor’ OR years > 6 THEN tenured = ‘yes’

Step 2: Test the model with the remaining 4 records & use the model in prediction

Testing Data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Jeff, Professor, 4) → Tenured?
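
The two steps can be reproduced with any off-the-shelf classifier. The following is a minimal Python sketch (assuming scikit-learn and pandas; the one-hot encoding and variable names are illustrative, not part of the slide) that builds a model from the six training records, checks it on the four test records, and then labels the unseen record (Jeff, Professor, 4).

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Step 1: model construction from the 6 training records
train = pd.DataFrame(
    [["Assistant Prof", 3, "no"], ["Assistant Prof", 7, "yes"],
     ["Professor", 2, "yes"], ["Associate Prof", 7, "yes"],
     ["Assistant Prof", 6, "no"], ["Associate Prof", 3, "no"]],
    columns=["rank", "years", "tenured"])
X_train = pd.get_dummies(train[["rank", "years"]])        # one-hot encode 'rank'
model = DecisionTreeClassifier(random_state=0).fit(X_train, train["tenured"])

# Step 2a: test the model on the 4 held-out records
test = pd.DataFrame(
    [["Assistant Prof", 2, "no"], ["Associate Prof", 7, "no"],
     ["Professor", 5, "yes"], ["Assistant Prof", 7, "yes"]],
    columns=["rank", "years", "tenured"])
X_test = pd.get_dummies(test[["rank", "years"]]).reindex(columns=X_train.columns, fill_value=0)
print("test accuracy:", model.score(X_test, test["tenured"]))

# Step 2b: use the model in prediction for the unseen record (Jeff, Professor, 4)
jeff = pd.get_dummies(pd.DataFrame([["Professor", 4]], columns=["rank", "years"]))
jeff = jeff.reindex(columns=X_train.columns, fill_value=0)
print("Tenured?", model.predict(jeff)[0])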

Who buys a notebook computer? The training dataset is given below; this follows an example from Quinlan’s ID3. The first four columns are inputs and buys_computer is the output.

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31…40   high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31…40   low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31…40   medium   no       excellent      yes
31…40   high     yes      fair           yes
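
A minimal Python sketch of fitting a tree to these 13 records (assuming scikit-learn and pandas; note that scikit-learn implements CART, so criterion="entropy" only approximates ID3's information-gain splits, and the one-hot encoding of the categorical inputs is my own choice):

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rows = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
]
data = pd.DataFrame(rows, columns=["age", "income", "student", "credit_rating", "buys_computer"])

X = pd.get_dummies(data.drop(columns="buys_computer"))   # one-hot encode the categorical inputs
y = data["buys_computer"]
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print("accuracy on the 13 training records:", tree.score(X, y))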

Tree Output: A Decision Tree for Credit Approval (the ‘buys_computer’ data above)

[Figure: the induced decision tree. The root tests "age?". The "<=30" branch leads to a "student?" test (no → no, yes → yes); the "31..40" branch leads directly to the leaf "yes"; the ">40" branch leads to a "credit rating?" test (excellent → yes, fair → no).]

Extracting Classification Rules from Trees

– Represent the knowledge in the form of IF-THEN rules
– One rule is created for each path from the root to a leaf
– Each attribute-value pair along a path forms a conjunction
– The leaf node holds the class prediction
– Rules are easier for humans to understand

Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “no”
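
Continuing the sketch above, the IF-THEN rules can be read off the fitted tree mechanically; scikit-learn's export_text prints one indented branch per root-to-leaf path (the exact splits will differ from the slide's tree because of the one-hot encoding and the CART algorithm):

from sklearn.tree import export_text

# Each printed root-to-leaf path corresponds to one
# IF ... AND ... THEN buys_computer = ... rule.
print(export_text(tree, feature_names=list(X.columns)))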

An Example of ‘Car Buyers’ – Who buys Lexton?

No.  Job  M/F  Area  Age  Buys (Y/N)
1    NJ   M    N     35   N
2    NJ   F    N     51   N
3    OW   F    N     31   Y
4    EM   M    N     38   Y
5    EM   F    S     33   Y
6    EM   M    S     54   Y
7    OW   F    S     49   Y
8    NJ   F    N     32   N
9    NJ   M    N     32   Y
10   EM   M    S     35   Y
11   NJ   F    S     54   Y
12   OW   M    N     50   Y
13   OW   F    S     36   Y
14   EM   M    N     49   N

(Job: NJ = No Job, OW = Owner, EM = Employee; Area: N = North, S = South)

[Figure: the induced decision tree for the ‘Lexton’ data. The root splits on Job(14,5,9): the Owner(4,0,4) branch is the leaf Y; the Employee(5,2,3) branch splits on Age, with Below 43(3,0,3) → Y and Above 43(2,2,0) → N; the No Job(5,3,2) branch splits on Res. Area, with South(2,0,2) → Y and North(3,3,0) → N.]

* (a,b,c) means a: total # of records, b: ‘N’ counts, c: ‘Y’ counts
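
The (total, N, Y) bookkeeping behind the figure is the first step of rule induction: tally the class counts for each value of a candidate split and keep the split that separates ‘Y’ from ‘N’ best. Below is a small pure-Python sketch of that tally for the three splits used above (the record layout and names are mine, not part of the slide):

records = [
    ("NJ", "M", "N", 35, "N"), ("NJ", "F", "N", 51, "N"), ("OW", "F", "N", 31, "Y"),
    ("EM", "M", "N", 38, "Y"), ("EM", "F", "S", 33, "Y"), ("EM", "M", "S", 54, "Y"),
    ("OW", "F", "S", 49, "Y"), ("NJ", "F", "N", 32, "N"), ("NJ", "M", "N", 32, "Y"),
    ("EM", "M", "S", 35, "Y"), ("NJ", "F", "S", 54, "Y"), ("OW", "M", "N", 50, "Y"),
    ("OW", "F", "S", 36, "Y"), ("EM", "M", "N", 49, "N"),
]  # (Job, M/F, Area, Age, buys Lexton)

def tally(split):
    # Return {split value: (total, #N, #Y)} over all 14 records.
    counts = {}
    for rec in records:
        key = split(rec)
        total, n, y = counts.get(key, (0, 0, 0))
        counts[key] = (total + 1, n + (rec[4] == "N"), y + (rec[4] == "Y"))
    return counts

print(tally(lambda r: r[0]))                                      # split on Job
print(tally(lambda r: "below 43" if r[3] < 43 else "above 43"))   # split on Age
print(tally(lambda r: r[2]))                                      # split on Res. Area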

Lab on Decision Tree(1)

– SPSS Clementine, SAS Enterprise Miner, See5/C5.0
– Download See5/C5.0 2.02 Evaluation from http://www.rulequest.com

Lab on Decision Tree(2)

From the initial screen shown below, choose File – Locate Data.

Lab on Decision Tree(3)

Select housing.data from the Samples folder and click Open.

Lab on Decision Tree(4)

This data set is about predicting house prices in the Boston area. It has 350 cases and 13 variables.

Lab on Decision Tree(5)

Input variables
– crime rate
– proportion large lots: residential space
– proportion industrial: ratio of commercial area
– CHAS: dummy variable
– nitric oxides ppm: pollution rate in ppm
– av rooms per dwelling: # of rooms per dwelling
– proportion pre-1940
– distance to employment centers: distance to the center of the city
– accessibility to radial highways: accessibility to highways
– property tax rate per $10,000
– pupil-teacher ratio: ratio of pupils to teachers
– B: racial statistics
– percentage low income earners: ratio of low-income people

Decision variable
– Top 20%, Bottom 80%
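
A minimal sketch of how such a Top 20% / Bottom 80% decision variable can be derived from a continuous price column (the column name "price" and the toy values are hypothetical; the See5 sample file already ships with the label):

import pandas as pd

prices = pd.Series([24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1], name="price")  # toy values
cutoff = prices.quantile(0.80)                          # 80th-percentile price
label = (prices > cutoff).map({True: "top20", False: "bottom80"})
print(label.value_counts())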

Lab on Decision Tree(6)

To run the analysis, click the Construct Classifier button, or choose Construct Classifier from the File menu.

Lab on Decision Tree(7)

Check the Global pruning option, then click OK.

Lab on Decision Tree(8)

Decision Tree

Evaluation with Training data

Evaluation with Test data
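
The two evaluation screens compare accuracy on the data the tree was grown from with accuracy on held-out data. A minimal scikit-learn sketch of the same comparison (synthetic data stands in for housing.data; nothing here is See5 output):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=350, n_features=13, random_state=0)  # stand-in data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("evaluation with training data:", clf.score(X_tr, y_tr))  # usually optimistic
print("evaluation with test data:    ", clf.score(X_te, y_te))  # more honest estimate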

Lab on Decision Tree(9)

Understanding the picture
– We can see that "av rooms per dwelling" is the most important variable in deciding the house price.
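
In scikit-learn the same reading comes from the fitted tree's feature_importances_ attribute; a short sketch continuing the synthetic example above (so the reported features are placeholders, not the actual See5 ranking where "av rooms per dwelling" comes out on top):

import numpy as np

importance = clf.feature_importances_          # one score per input variable
for idx in np.argsort(importance)[::-1][:5]:   # five most important variables
    print(f"feature {idx}: importance {importance[idx]:.3f}")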

Lab on Decision Tree(11)

It is hard to read the rules from the decision tree diagram alone. To view the rules, close the current screen and click Construct Classifier again, or choose Construct Classifier from the File menu.

Lab on Decision Tree(12)

Check the Rulesets option, then click OK.

Lab on Decision Tree(13)