Decision Trees


Transcript of Decision Trees

Page 1: Decision Trees


DECISION TREES

Page 2: Decision Trees


Resources

Artificial Intelligence, 3rd Edition, Patrick Henry Winston, Ch. 21: http://www.cse.unr.edu/~sushil/class/games/notes/ch21.pdf

Artificial Intelligence: A Modern Approach, 3rd Edition, Russell, Norvig, Ch. 18.3, pp. 531-554

Page 3: Decision Trees


Identification Tree

A type of decision tree. The Winston book calls its methods SPROUTER and PRUNER, but they are basically a simplified example of an algorithm called ID3.

Page 4: Decision Trees


Identification Tree

Name (Sample ID)   Hair     Height    Weight    Lotion   Result
Sarah              Blonde   Average   Light     No       Sunburned
Dana               Blonde   Tall      Average   Yes      None
Alex               Brown    Short     Average   Yes      None
Annie              Blonde   Short     Average   No       Sunburned
Emily              Red      Average   Heavy     No       Sunburned
Pete               Brown    Tall      Heavy     No       None
John               Brown    Average   Heavy     No       None
Katie              Blonde   Short     Light     Yes      None

The sunburn dataset. Select one attribute to be predicted/identified; all other attributes are used to identify the selected target attribute, or classification.
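For the sketches on the following slides, here is the dataset expressed in Python (the names SAMPLES and ATTRIBUTES are mine, not from the slides):

```python
# The sunburn dataset from the table above.  Each sample maps
# attribute names to values; "Result" is the target classification.
SAMPLES = [
    {"Name": "Sarah", "Hair": "Blonde", "Height": "Average",
     "Weight": "Light", "Lotion": "No", "Result": "Sunburned"},
    {"Name": "Dana", "Hair": "Blonde", "Height": "Tall",
     "Weight": "Average", "Lotion": "Yes", "Result": "None"},
    {"Name": "Alex", "Hair": "Brown", "Height": "Short",
     "Weight": "Average", "Lotion": "Yes", "Result": "None"},
    {"Name": "Annie", "Hair": "Blonde", "Height": "Short",
     "Weight": "Average", "Lotion": "No", "Result": "Sunburned"},
    {"Name": "Emily", "Hair": "Red", "Height": "Average",
     "Weight": "Heavy", "Lotion": "No", "Result": "Sunburned"},
    {"Name": "Pete", "Hair": "Brown", "Height": "Tall",
     "Weight": "Heavy", "Lotion": "No", "Result": "None"},
    {"Name": "John", "Hair": "Brown", "Height": "Average",
     "Weight": "Heavy", "Lotion": "No", "Result": "None"},
    {"Name": "Katie", "Hair": "Blonde", "Height": "Short",
     "Weight": "Light", "Lotion": "Yes", "Result": "None"},
]
ATTRIBUTES = ["Hair", "Height", "Weight", "Lotion"]
```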

Page 5: Decision Trees


Identification Tree: Predict Sunburns

More than one tree can correctly identify the dataset; some trees generalize better than others. Smaller trees tend to be better (Occam's Razor): the smallest identification tree consistent with the samples is the one most likely to identify unknown objects correctly. How do we construct the smallest/'best' tree?

Page 6: Decision Trees


Identification Tree

It is computationally impractical to find the smallest tree when many tests are required. Instead, use a procedure that builds small trees, but is NOT guaranteed to build the SMALLEST possible tree.

Page 7: Decision Trees


Identification Tree

Split the samples based on the best attribute: the single attribute that comes closest to correctly grouping the samples based on the target classification. One measure is the number of samples that end up in homogeneous sets. On the sunburn data this gives Hair: 4, Height: 2, Weight: 0, Lotion: 3, so Hair is the best first split.
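As a sketch of this measure (the helper name is mine; SAMPLES is from the dataset snippet above):

```python
from collections import defaultdict

def homogeneous_count(samples, attribute, target="Result"):
    """Count the samples that land in homogeneous (single-class)
    subsets when `samples` are split on `attribute`."""
    subsets = defaultdict(list)
    for s in samples:
        subsets[s[attribute]].append(s[target])
    return sum(len(labels) for labels in subsets.values()
               if len(set(labels)) == 1)

# Reproduces the counts above on the sunburn data:
# Hair -> 4, Height -> 2, Weight -> 0, Lotion -> 3
```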

Page 8: Decision Trees


Identification Tree

Select the best attribute, then repeat with the remaining attributes. You must repeat for each heterogeneous branch, splitting only the samples that went down that branch. The next attribute you select for one branch may be different from the attribute you select for another branch, even if they share the same parent node.

Page 9: Decision Trees


Identification Tree

In real data, you are unlikely to get ANY homogeneous branches, so you need a measure of inhomogeneity/disorder/entropy. Minimize disorder/entropy (or, equivalently, maximize information gain). Many different measurements/calculations can be used. Example: the entropy of a sample set S,

    Entropy(S) = -Σ_i p_i · log2(p_i)

where p_i is the proportion of samples in S belonging to class i.
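A minimal implementation of this entropy measure, reusing the SAMPLES structure from earlier:

```python
import math

def entropy(samples, target="Result"):
    """Entropy(S) = -sum_i p_i * log2(p_i), where p_i is the
    proportion of samples in the i-th target class."""
    counts = {}
    for s in samples:
        counts[s[target]] = counts.get(s[target], 0) + 1
    total = len(samples)
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())

# entropy(SAMPLES) is about 0.954 bits: 3 of the 8 samples
# are sunburned and 5 are not.
```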

Page 10: Decision Trees


Identification Tree

Results using the new disorder measurement. (The slide shows these as figures: the Hair attribute disorder calculation, all disorder calculations for the first node, and all disorder calculations for the second node.)
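A sketch of the calculation behind those figures: the disorder of a split is the entropy of each subset, weighted by the fraction of samples that fall into it (the function name is mine; entropy() is the sketch above):

```python
from collections import defaultdict

def avg_disorder(samples, attribute, target="Result"):
    """Weighted average entropy of the subsets produced by
    splitting `samples` on `attribute`; lower is better."""
    subsets = defaultdict(list)
    for s in samples:
        subsets[s[attribute]].append(s)
    total = len(samples)
    return sum(len(sub) / total * entropy(sub, target)
               for sub in subsets.values())

# First node, all 8 samples (rounded):
#   Hair 0.50, Height 0.69, Weight 0.94, Lotion 0.61
# Hair has the least disorder, so it is tested first.  The second
# node repeats the calculation on just the four blonde samples,
# where Lotion gives disorder 0 and is tested next.
```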

Page 11: Decision Trees


Identification Tree

Information gain: the expected reduction in entropy due to sorting sample set S on attribute A,

    Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which attribute A takes value v.
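In code, the gain is just the parent entropy minus the weighted disorder computed above, so maximizing gain and minimizing disorder select the same attribute:

```python
def information_gain(samples, attribute, target="Result"):
    """Gain(S, A) = Entropy(S) minus the weighted entropy of the
    subsets produced by splitting S on A."""
    return entropy(samples, target) - avg_disorder(samples, attribute, target)

# information_gain(SAMPLES, "Hair") is about 0.954 - 0.50 = 0.45 bits.
```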

Page 12: Decision Trees


Identification Tree

The SPROUTER algorithm (shown on the slide as a figure; a sketch follows below).
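Here is a minimal recursive sketch in the spirit of SPROUTER/ID3, built from the helpers above (the dict-based tree representation is my own choice, not from the slides):

```python
def sprouter(samples, attributes, target="Result"):
    """Grow an identification tree greedily: make a leaf when the
    set is homogeneous, otherwise split on the attribute with the
    least disorder and recurse down each branch."""
    labels = {s[target] for s in samples}
    if len(labels) == 1:              # homogeneous set: leaf
        return labels.pop()
    if not attributes:                # no tests left: majority leaf
        return max(labels,
                   key=lambda l: sum(s[target] == l for s in samples))
    best = min(attributes,
               key=lambda a: avg_disorder(samples, a, target))
    tree = {best: {}}
    for value in {s[best] for s in samples}:
        subset = [s for s in samples if s[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = sprouter(subset, rest, target)
    return tree

# sprouter(SAMPLES, ATTRIBUTES) gives (up to branch ordering):
# {'Hair': {'Blonde': {'Lotion': {'No': 'Sunburned', 'Yes': 'None'}},
#           'Brown': 'None',
#           'Red': 'Sunburned'}}
```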

Page 13: Decision Trees


Tree to Rules

Each path from root to leaf is a rule: the values of the attribute nodes along the path are the antecedents, and the leaf value is the consequent.
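A sketch of that extraction over the dict-based tree built above:

```python
def tree_to_rules(tree, antecedents=()):
    """Walk every root-to-leaf path; the (attribute, value) pairs
    along the path are the antecedents, the leaf is the consequent."""
    if not isinstance(tree, dict):        # reached a leaf
        return [(list(antecedents), tree)]
    (attribute, branches), = tree.items()
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree,
                               antecedents + ((attribute, value),))
    return rules

# For the sunburn tree this yields rules such as:
#   IF Hair = Blonde AND Lotion = No THEN Sunburned
#   IF Hair = Brown THEN None
```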

Page 14: Decision Trees


Simplify Rules

For each rule, drop an antecedent if removing it does not change what the rule does on any of the samples.
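A sketch of that check (names mine): tentatively remove each antecedent and keep the shorter rule only if every sample the shortened rule covers still has the rule's consequent:

```python
def simplify_rule(antecedents, consequent, samples, target="Result"):
    """Drop antecedents whose removal leaves the rule's behaviour
    unchanged on every sample."""
    def covers(ants, s):
        return all(s[a] == v for a, v in ants)

    kept = list(antecedents)
    for ant in list(kept):
        trial = [a for a in kept if a != ant]
        if all(s[target] == consequent
               for s in samples if covers(trial, s)):
            kept = trial      # dropping `ant` changed nothing
    return kept, consequent

# Example: "IF Hair = Blonde AND Lotion = Yes THEN None" simplifies
# to "IF Lotion = Yes THEN None", since every lotion user in the
# dataset is unburned.
```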

Page 15: Decision Trees


Eliminate Rules

Once all individual rules have been simplified, you can eliminate unnecessary rules. Create a "default rule" that eliminates the most rules. In the event of a tie, make up some metric to break the tie. Examples: the default covers the most common consequent in the sample set, or it leaves the simplest rules.

Page 16: Decision Trees


Eliminate Rules

Page 17: Decision Trees


Decision Tree Algorithms

ID3 (Iterative Dichotomiser 3): greedy, so it gets stuck in local optima; not good with continuous-valued attributes.

C4.5/J4.8: an extension of ID3. Better handling of continuous-valued attributes; can handle training data where some attribute values are missing/unknown; can handle attributes with different costs; prunes the tree after creation.

C5.0/See5.0: commercial, closed-source. Not covered here, but it exists.

Page 18: Decision Trees


C4.5

Pruning helps avoid overfitting.

Prepruning: deciding not to split a set of samples any further, based on some heuristic, during tree construction. Usually based on a statistical test such as chi-squared.

Postpruning: subtree replacement and subtree raising.

Page 19: Decision Trees


C4.5

Continuous values: for an attribute with continuous values, sort all samples based on that attribute. Mark a "split point" between samples where the classification changes. Calculate the information gain at all split points, then select the split point with the highest information gain and use it for that attribute. A sketch of this procedure follows below.
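A sketch of that procedure, assuming a numeric attribute and reusing the entropy() helper from earlier (the function name is mine):

```python
def best_split_point(samples, attribute, target="Result"):
    """Sort on the attribute, place candidate thresholds between
    consecutive samples where the class changes, and return the
    threshold with the highest information gain."""
    ordered = sorted(samples, key=lambda s: s[attribute])
    candidates = []
    for a, b in zip(ordered, ordered[1:]):
        if a[target] != b[target] and a[attribute] != b[attribute]:
            candidates.append((a[attribute] + b[attribute]) / 2)

    def gain_at(threshold):
        below = [s for s in samples if s[attribute] <= threshold]
        above = [s for s in samples if s[attribute] > threshold]
        weighted = (len(below) * entropy(below, target) +
                    len(above) * entropy(above, target)) / len(samples)
        return entropy(samples, target) - weighted

    return max(candidates, key=gain_at) if candidates else None
```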