Page 1:

COMP3740 CR32: Knowledge Management and Adaptive Systems

Supervised ML to learn Classifiers: Decision Trees and Classification Rules

Eric Atwell, School of Computing, University of Leeds
(including re-use of teaching resources from other sources, esp. Knowledge Management by Stuart Roberts, School of Computing, University of Leeds)

Page 2:

Reminder: Objectives of data mining

• Data mining aims to find useful patterns in data.

• For this we need:
– Data mining techniques, algorithms, tools, e.g. WEKA
– A methodological framework to guide us in collecting data and applying the best algorithms: CRISP-DM

• TODAY’S objective: learn how to learn classifiers

• Decision Trees and Classification Rules

• Supervised Machine Learning: training set has the “answer” (class) for each example (instance)

Page 3:

Reminder: Concepts that can be “learnt”

The types of concepts we try to ‘learn’ include:

• Clusters or ‘natural’ partitions
– E.g. we might cluster customers according to their shopping habits.

• Rules for classifying examples into pre-defined classes
– E.g. “Mature students studying information systems with a high grade for General Studies A-level are likely to get a 1st class degree”

• General associations
– E.g. “People who buy nappies are in general likely also to buy beer”

• Numerical prediction
– E.g. Salary = a*A-level + b*Age + c*Gender + d*Prog + e*Degree
(but are Gender and Programme really numbers???)
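As a hedged aside (not from the slides): a common way round this is one-hot encoding, which represents a categorical attribute as 0/1 indicator values instead of pretending it is a number. A minimal Python sketch with illustrative category names:

    # One-hot encoding: a categorical value becomes a 0/1 vector, so a linear
    # model never treats "IS" as numerically bigger than "CS".
    def one_hot(value, categories):
        return [1 if value == c else 0 for c in categories]

    programmes = ["CS", "IS", "AI"]          # illustrative, not from the slides
    print(one_hot("IS", programmes))         # [0, 1, 0] -- no spurious ordering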

Page 4:

Page 5:

Output: decision tree

Outlook = sunny:
  Humidity = high:   Play = ‘no’
  Humidity = normal: Play = ‘yes’
Outlook = rainy:
  Windy = true:  Play = ‘no’
  Windy = false: Play = ‘yes’

Page 6:

Page 7:

Page 8:

Page 9:

Page 10:

Decision Tree Analysis

• Example instance set:

Shares files  Uses scanner  Infected before  Risk
Yes           Yes           No               High
Yes           No            No               High
No            No            Yes              Medium
Yes           Yes           Yes              Low
Yes           Yes           No               High
No            Yes           No               Low
Yes           No            Yes              High

Can we predict, from the first 3 columns, the risk of getting a virus?

For convenience later: F = ‘shares Files’, S = ‘uses Scanner’, I = ‘Infected before’
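For concreteness, here is the same instance set as a minimal Python structure (a sketch; the dict keys follow the slide's F/S/I abbreviations), with a quick check of the class distribution:

    from collections import Counter

    # The virus-risk training set above, one dict per instance.
    instances = [
        {"F": "Yes", "S": "Yes", "I": "No",  "Risk": "High"},
        {"F": "Yes", "S": "No",  "I": "No",  "Risk": "High"},
        {"F": "No",  "S": "No",  "I": "Yes", "Risk": "Medium"},
        {"F": "Yes", "S": "Yes", "I": "Yes", "Risk": "Low"},
        {"F": "Yes", "S": "Yes", "I": "No",  "Risk": "High"},
        {"F": "No",  "S": "Yes", "I": "No",  "Risk": "Low"},
        {"F": "Yes", "S": "No",  "I": "Yes", "Risk": "High"},
    ]

    print(Counter(row["Risk"] for row in instances))  # High: 4, Low: 2, Medium: 1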

Page 11:

Decision tree building method

• Forms a decision tree
– tries for a small tree covering all or most of the training set
– internal nodes represent a test on an attribute value
– branches represent outcomes of the test

• Decides which attribute to test at each node
– this is based on a measure of ‘entropy’

• Must avoid ‘over-fitting’
– if the tree is complex enough it might describe the training set exactly, but be no good for prediction

• May leave some ‘exceptions’

Page 12:

Building a decision tree (DT)

The algorithm is recursive. At any step:

T = set of (remaining) training instances,
{C1, …, Ck} = set of classes

• If all instances in T belong to a single class Ci, then DT is a leaf node identifying class Ci. (done!)

…continued

Page 13:

Building a decision tree (DT) …continued

• If T contains instances belonging to mixed classes, then choose a test based on a single attribute that will partition T into subsets {T1, …, Tn} according to the n outcomes of the test. The DT for T comprises a root node identifying the test and one branch for each outcome of the test.

• The branches are formed by applying the rules above recursively to each of the subsets {T1, …, Tn}.
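A minimal Python sketch of this recursive procedure, assuming instances are dicts as in the earlier data sketch. The attribute choice is deliberately left as a plug-in function, since the entropy-based criterion only appears later in the lecture:

    from collections import Counter

    def build_tree(T, attributes, target, choose_attribute):
        classes = [row[target] for row in T]
        if len(set(classes)) == 1:                 # single class: leaf node (done!)
            return classes[0]
        if not attributes:                         # no tests left: majority-class leaf
            return Counter(classes).most_common(1)[0][0]
        a = choose_attribute(T, attributes, target)
        branches = {}                              # root node identifies the test on a
        for value in set(row[a] for row in T):     # one branch per outcome of the test
            subset = [row for row in T if row[a] == value]
            rest = [x for x in attributes if x != a]
            branches[value] = build_tree(subset, rest, target, choose_attribute)
        return {a: branches}

    # e.g., naively always splitting on the first remaining attribute:
    #   build_tree(instances, ["F", "S", "I"], "Risk", lambda T, A, t: A[0])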

Page 14:

Tree Building example

T =
F    S    I    Risk
Yes  Yes  No   High
Yes  No   No   High
No   No   Yes  Medium
Yes  Yes  Yes  Low
Yes  Yes  No   High
No   Yes  No   Low
Yes  No   Yes  High

Classes = {High, Medium, Low}

Choose a test based on F; number of outcomes n = 2 (Yes or No).

F = yes → T1:
F    S    I    Risk
Yes  Yes  No   High
Yes  No   No   High
Yes  Yes  Yes  Low
Yes  Yes  No   High
Yes  No   Yes  High

F = no → T2:
F    S    I    Risk
No   No   Yes  Medium
No   Yes  No   Low

Page 15:

Tree Building example

T1 (F = yes) =
F    S    I    Risk
Yes  Yes  No   High
Yes  No   No   High
Yes  Yes  Yes  Low
Yes  Yes  No   High
Yes  No   Yes  High

Classes = {High, Medium, Low}

Choose a test based on I; number of outcomes n = 2 (Yes or No).

I = yes → T3:
F    S    I    Risk
Yes  Yes  Yes  Low
Yes  No   Yes  High

I = no → T4:
F    S    I    Risk
Yes  Yes  No   High
Yes  No   No   High
Yes  Yes  No   High

Page 16:

Tree Building example

Classes = {High, Medium, Low}

All instances in T4 (the I = no branch) are High, so that branch becomes a leaf:

I = no → Risk = ‘High’

I = yes → T3:
F    S    I    Risk
Yes  Yes  Yes  Low
Yes  No   Yes  High

Page 17:

Tree Building example

Classes = {High, Medium, Low}

Choose a test based on S; number of outcomes n = 2 (Yes or No). Splitting T3:

S = yes:
F    S    I    Risk
Yes  Yes  Yes  Low

S = no:
F    S    I    Risk
Yes  No   Yes  High

Page 18:

Tree Building example

Classes = {High, Medium, Low}

Both outcomes of the S test on T3 are single-class, giving two leaves:

S = yes → Risk = ‘Low’
S = no → Risk = ‘High’

Page 19:

Tree Building example

Classes = {High, Medium, Low}

Returning to T2 (the F = no branch):
F    S    I    Risk
No   No   Yes  Medium
No   Yes  No   Low

Choose a test based on S; number of outcomes n = 2 (Yes or No).

S = yes:
F    S    I    Risk
No   Yes  No   Low

S = no:
F    S    I    Risk
No   No   Yes  Medium

Page 20:

Tree Building example

Classes = {High, Medium, Low}

Both outcomes of the S test on T2 are single-class, giving two leaves:

S = yes → Risk = ‘Low’
S = no → Risk = ‘Medium’

All branches now end in leaves; the complete tree is shown on the next slide.

Page 21:

Example Decision Tree

Shares files?
  no  → Uses scanner?
          no  → medium
          yes → low
  yes → Infected before?
          no  → high
          yes → Uses scanner?
                  no  → high
                  yes → low
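Written out as nested conditionals, the finished tree is just a classifier a program could apply to a new instance (a sketch; function and argument names are illustrative):

    # The decision tree above as nested if/else tests.
    def predict_risk(shares_files, uses_scanner, infected_before):
        if shares_files == "No":
            return "Low" if uses_scanner == "Yes" else "Medium"
        # shares_files == "Yes"
        if infected_before == "No":
            return "High"
        return "Low" if uses_scanner == "Yes" else "High"

    print(predict_risk("Yes", "No", "Yes"))   # High, matching the last training row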

Page 22:

Which attribute to test?

• The ROOT could be S or I instead of F, leading to a different decision tree.

• The best DT is the “smallest”, most concise model.

• The search space is in general too large to find the smallest tree by exhaustive search (trying them all).

• Instead we look for the attribute which splits the training set into the most homogeneous subsets.

• The measure used for ‘homogeneity’ is based on entropy.
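The usual measure here is Shannon entropy over the class proportions, H(T) = −Σ p_i log2 p_i, as used in ID3/C4.5. A small Python sketch, checked against the 4 High / 1 Medium / 2 Low training set above:

    from math import log2
    from collections import Counter

    def entropy(labels):
        # Shannon entropy of the class distribution, in bits.
        counts = Counter(labels)
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    print(entropy(["High"] * 4 + ["Medium"] + ["Low"] * 2))  # ~1.38 bits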

Page 23:

Tree Building example (modified)

T =
F    S    I    High Risk?
Yes  Yes  No   Yes
Yes  No   No   Yes
No   No   Yes  No
Yes  Yes  Yes  No
Yes  Yes  No   Yes
No   Yes  No   No
Yes  No   Yes  Yes

Classes = {Yes, No}

Choose a test based on F; number of outcomes n = 2 (Yes or No).

F = yes:
F    S    I    High Risk?
Yes  Yes  No   Yes
Yes  No   No   Yes
Yes  Yes  Yes  No
Yes  Yes  No   Yes
Yes  No   Yes  Yes

F = no:
F    S    I    High Risk?
No   No   Yes  No
No   Yes  No   No

Page 24:

Tree Building example (modified)

T = (as above), Classes = {Yes, No}

Choose a test based on F; number of outcomes n = 2 (Yes or No).

F = yes → High Risk = ‘yes’ (5 instances, 1 error)
F = no → High Risk = ‘no’ (2 instances, 0 errors)

Page 25:

Tree Building example (modified)

T = (as above), Classes = {Yes, No}

Choose a test based on S; number of outcomes n = 2 (Yes or No).

S = yes:
F    S    I    High Risk?
Yes  Yes  No   Yes
Yes  Yes  Yes  No
Yes  Yes  No   Yes
No   Yes  No   No

S = no:
F    S    I    High Risk?
Yes  No   No   Yes
No   No   Yes  No
Yes  No   Yes  Yes

Page 26:

Tree Building example (modified)

T = (as above), Classes = {Yes, No}

Choose a test based on S; number of outcomes n = 2 (Yes or No).

S = yes → High Risk = ‘no’ (4 instances, 2 errors)
S = no → High Risk = ‘yes’ (3 instances, 1 error)

Page 27:

Tree Building example (modified)

T = (as above), Classes = {Yes, No}

Choose a test based on I; number of outcomes n = 2 (Yes or No).

I = yes:
F    S    I    High Risk?
No   No   Yes  No
Yes  Yes  Yes  No
Yes  No   Yes  Yes

I = no:
F    S    I    High Risk?
Yes  Yes  No   Yes
Yes  No   No   Yes
Yes  Yes  No   Yes
No   Yes  No   No

Page 28:

Tree Building example (modified)

T = (as above), Classes = {Yes, No}

Choose a test based on I; number of outcomes n = 2 (Yes or No).

I = yes → High Risk = ‘no’ (3 instances, 1 error)
I = no → High Risk = ‘yes’ (4 instances, 1 error)
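Putting the leaf counts from the last few slides together, we can compute the information gain for each candidate root test (a sketch; the yes/no counts come from the slides, and the gain formula is the standard ID3 criterion):

    from math import log2

    def H(yes, no):
        # Entropy of a yes/no class distribution; empty categories contribute 0.
        n = yes + no
        h = 0.0
        for c in (yes, no):
            if c:
                h -= (c / n) * log2(c / n)
        return h

    parent = H(4, 3)                                  # 4 'yes', 3 'no' overall
    splits = {"F": ((4, 1), (0, 2)),                  # F=yes: 4/1, F=no: 0/2
              "S": ((2, 2), (2, 1)),                  # S=yes: 2/2, S=no: 2/1
              "I": ((1, 2), (3, 1))}                  # I=yes: 1/2, I=no: 3/1
    for name, (left, right) in splits.items():
        nl, nr = sum(left), sum(right)
        remainder = (nl * H(*left) + nr * H(*right)) / (nl + nr)
        print(name, round(parent - remainder, 3))
    # F gives the largest gain (~0.47 vs ~0.13 for I and ~0.02 for S),
    # so F is chosen as the root test.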

Page 29:

Decision tree building algorithm

• For each decision point:
– If the remaining examples are all +ve or all -ve, stop.
– Else if there are some +ve and some -ve examples left and some attributes left, pick the remaining attribute with the largest information gain.
– Else if there are no examples left, no such example has been observed; return a default.
– Else if there are no attributes left, examples with the same description have different classifications: noise, insufficient attributes, or a nondeterministic domain.

Page 30:

Evaluation of decision trees

• At the leaf nodes two numbers are given:
– N: the coverage for that node: how many instances it covers
– E: the error rate: how many wrongly classified instances

• The whole tree can be evaluated in terms of its size (number of nodes) and overall error rate, expressed as the number and percentage of cases wrongly classified.

• We seek small trees that have low error rates.

Page 31:

Evaluation of decision trees

• The error rate for the whole tree can also be displayed in terms of a confusion matrix:

(A)  (B)  (C)   ← Classified as
35   2    1     Class (A) = high
4    41   5     Class (B) = medium
2    5    68    Class (C) = low
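A quick worked check of the overall error rate implied by this matrix (rows are the true class, columns the predicted class):

    # Off-diagonal entries are misclassifications.
    matrix = [[35, 2, 1],
              [4, 41, 5],
              [2, 5, 68]]
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(3))
    print(total - correct, "errors out of", total)   # 19 of 163, ~11.7%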

Page 32:

Evaluation of decision trees

• The error rates mentioned on previous slides are normally computed using:
a. The training set of instances.
b. A test set of instances – some different examples!

• If the decision tree algorithm has ‘over-fitted’ the data, then the error rate based on the training set will be far less than that based on the test set.

Page 33:

Evaluation of decision trees

• 10-fold cross-validation can be used when the training set is limited in size:
– Divide the data set randomly into 10 subsets.
– Build a tree from 9 of the subsets and test using the 10th.
– Repeat the experiment 9 more times, using a different test subset each time.
– The overall error rate is the average of the 10 experiments.

• 10-fold cross-validation will lead to up to 10 different decision trees being built. The method for selecting or constructing the best tree is not clear.
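A minimal sketch of that procedure in Python; learn and error_rate are placeholders for any tree learner and evaluator, not calls to a specific library:

    import random

    def cross_validate(instances, learn, error_rate, k=10, seed=0):
        data = instances[:]
        random.Random(seed).shuffle(data)          # random division into k folds
        folds = [data[i::k] for i in range(k)]
        rates = []
        for i in range(k):                         # each fold is the test set once
            test = folds[i]
            train = [row for j, fold in enumerate(folds) if j != i for row in fold]
            rates.append(error_rate(learn(train), test))
        return sum(rates) / k                      # overall rate = average of k runs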

Page 34:

From decision trees to rules

• Decision trees may not be easy to interpret:
– tests associated with lower nodes have to be read in the context of tests further up the tree
– ‘sub-concepts’ may sometimes be split up and distributed to different parts of the tree (see next slide)
– Computer Scientists may prefer “if … then …” rules!

Page 35:

DT for “F = G = 1 or J = K = 1”

F = 0:
  J = 0: no
  J = 1:
    K = 0: no
    K = 1: yes
F = 1:
  G = 1: yes
  G = 0:
    J = 0: no
    J = 1:
      K = 0: no
      K = 1: yes

J = K = 1 is split across two subtrees.

Page 36:

Converting DT to rules

• Step 1: Every path from root to leaf represents a rule. For the tree on the previous slide:

If F = 0 and J = 0 then class no
If F = 0 and J = 1 and K = 0 then class no
If F = 0 and J = 1 and K = 1 then class yes

….

If F = 1 and G = 0 and J = 1 and K = 1 then class yes
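Step 1 is mechanical: walk every root-to-leaf path, collecting the tests along the way. A sketch, assuming the nested-dict tree shape used in the earlier build_tree sketch:

    def tree_to_rules(tree, conditions=()):
        if not isinstance(tree, dict):             # leaf: emit the finished rule
            return [(list(conditions), tree)]
        (attr, branches), = tree.items()           # internal node: one attribute test
        rules = []
        for value, subtree in branches.items():
            rules += tree_to_rules(subtree, conditions + ((attr, value),))
        return rules

    # e.g. for the F/G/J/K tree this yields rules such as
    #   ([("F", 0), ("J", 1), ("K", 1)], "yes")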

Page 37:

Generalising rules

If F = 0 and J = 1 and K = 1 then class yes
If F = 1 and G = 0 and J = 1 and K = 1 then class yes

These (together with the tree’s third ‘yes’ path, “If F = 1 and G = 1 then class yes”) can be generalised by dropping conditions to:

If G = 1 then class yes
If J = 1 and K = 1 then class yes

Page 38:

Tidying up rule sets

• Generalisation leads to 2 problems (both steps are sketched below):

• Rules are no longer mutually exclusive
– Order the rules and use the first matching rule as the operative rule.
– Ordering is based on how many false-positive errors each rule makes.

• The rule set is no longer exhaustive
– Choose a default value for the class when no rule applies.
– The default class is the one containing the most training cases not covered by any rule.
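A sketch of both tidy-up steps, assuming rules are (conditions, class) pairs as produced by the tree_to_rules sketch above:

    from collections import Counter

    def matches(rule_conditions, row):
        return all(row.get(a) == v for a, v in rule_conditions)

    def tidy(rules, train, target):
        def false_positives(rule):
            conds, cls = rule
            return sum(1 for r in train if matches(conds, r) and r[target] != cls)
        ordered = sorted(rules, key=false_positives)       # fewest errors first
        uncovered = [r for r in train
                     if not any(matches(c, r) for c, _ in ordered)]
        pool = uncovered if uncovered else train           # default = majority class
        default = Counter(r[target] for r in pool).most_common(1)[0][0]
        return ordered, default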

Page 39:

Decision Tree - Revision

The decision tree builder algorithm discovers rules for classifying instances.

At each step, it needs to decide which attribute to test at that point in the tree; a measure of ‘information gain’ can be used.

The output is a decision tree based on the ‘training’ instances, evaluated with separate “test” instances.

Leaf nodes which have small coverage may be pruned, if the error rate of the pruned tree remains small.

Page 40:

Pruning example (from W & F)

Subtree testing ‘Health plan contribution’, with class counts at each leaf:

none → 4 bad, 2 good
half → 1 bad, 1 good
full → 4 bad, 2 good

We replace the subtree with the single leaf: Bad (14, 5)
(14 = number of instances covered, 5 = number of errors)

The three leaves together also misclassify 2 + 1 + 2 = 5 of the 14 instances, so the single ‘Bad’ leaf is no less accurate and the tree is smaller.

Page 41:

Decision trees v classification rules

• Decision trees can be used for prediction or interpretation.
– Prediction: compare an unclassified instance against the tree and predict what class it is in (with an error estimate).
– Interpretation: examine the tree and try to understand why instances end up in the class they are in.

• Rule sets are often better for interpretation.
– ‘Small’, accurate rules can be examined individually, even if the overall accuracy of the rule set is poor.

Page 42:

Self Check

• You should be able to:
– Describe how decision trees are built from a set of instances.
– Build a decision tree based on a given attribute.
– Explain what the ‘training’ and ‘test’ sets are for.
– Explain what “supervised” means, and why classification is an example of supervised ML.