Decision Tree Learning
Richard McAllister
February 4, 2008
1 Overview
2 Tree Construction
3 Case Study: Determinants of House Price
Decision Tree Learning
Widely used
Used for approximating discrete-valued functions
Robust to noisy data
Capable of learning disjunctive expressions
Algorithms
ID3
Assistant
C4.5
Uses information entropy / information gain
Inductive Bias
Preference for small trees over large
Occam's Razor: prefer the simplest hypothesis that fits the data.
Tree
Outlook
  Sunny → Humidity
    High → No
    Normal → Yes
  Overcast → Yes
  Rain → Wind
    Strong → No
    Weak → Yes
Leftmost branch: 〈Outlook = Sunny, Temperature = Hot, Humidity = High, Wind = Strong〉
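The tree above (the standard PlayTennis example from Mitchell's textbook, which these slides follow) can be written directly as nested conditionals. A minimal sketch; the function name is ours:

```python
# A direct translation of the decision tree into nested conditionals.
# The function returns "Yes"/"No" for the PlayTennis concept.
def classify(outlook, humidity, wind):
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "No" if wind == "Strong" else "Yes"
    raise ValueError(f"unknown Outlook value: {outlook}")

# The leftmost branch: Sunny/High. Note that Temperature is never
# tested by this tree, so Hot plays no role in the outcome.
print(classify("Sunny", "High", "Strong"))  # -> No
```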
Disjunction of Conjunctions
(Outlook = Sunny ∧ Humidity = Normal) ∨ (Outlook = Overcast) ∨ (Outlook = Rain ∧ Wind = Weak)
Appropriate Problems
Instances represented by attribute-value pairs
Target function has discrete output values (e.g. yes or no)
Disjunctive descriptions may be required
Training data may contain errors
Training data may contain missing attribute values
For Example...
Classifying medical patients by disease
Loan applications
Equipment malfunctions
Classification problems
ID3
Learns by constructing decision trees top-down: "What should be tested first?"
Greedy: the locally best attribute is chosen at each node.
Descendant nodes are then created for the possible values of the chosen attribute.
How do you choose what’s next?
Information Gain
Measures how well a given attribute separates the training examples according to their target classifications.
Information Gain
Entropy
Characterizes the (im)purity of an arbitrary collection of examples:

Entropy(S) ≡ −p⊕ log₂ p⊕ − p⊖ log₂ p⊖

S : the collection of positive and negative examples of some target concept
p⊕ : the proportion of positive examples in S
p⊖ : the proportion of negative examples in S

Information gain then measures the expected reduction in entropy caused by knowing the value of an attribute A.

Incidentally, the general form for c classes is

Entropy(S) ≡ ∑_{i=1}^{c} −p_i log₂ p_i
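A minimal two-class entropy sketch, using the 9-positive/5-negative counts of Mitchell's 14-example weather data that the slides' gain figures are computed from:

```python
import math

def entropy(pos, neg):
    """Entropy(S) for a two-class collection with `pos` positive and
    `neg` negative examples; 0 * log2(0) is taken as 0."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        p = count / total
        if p > 0:
            result -= p * math.log2(p)
    return result

# 9 positive and 5 negative examples:
print(round(entropy(9, 5), 3))  # -> 0.94
```

Entropy is 0 for a pure collection and reaches its maximum of 1 when the two classes are equally represented.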
Information Gain (Cont’d)
Information Gain

Gain(S, A) ≡ Entropy(S) − ∑_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

S : the collection of positive and negative examples of some target concept
A : a particular attribute
S_v : the collection of positive and negative examples of the same target concept after A has partitioned S: S_v = {s ∈ S | A(s) = v}
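The definition above translates almost line-for-line into code. A sketch; the example records are hypothetical dicts mapping attribute names to values:

```python
import math
from collections import Counter

def entropy(labels):
    """General-form entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(examples, attribute, target):
    """Gain(S, A): entropy of S minus the weighted entropy of the
    partitions S_v induced by each value v of `attribute`."""
    labels = [e[target] for e in examples]
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        sv = [e[target] for e in examples if e[attribute] == v]
        remainder += len(sv) / len(examples) * entropy(sv)
    return entropy(labels) - remainder

# An attribute that separates the classes perfectly recovers all of
# the collection's entropy:
exs = [{"A": "x", "y": "+"}, {"A": "z", "y": "-"}]
print(gain(exs, "A", "y"))  # -> 1.0
```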
Use of Information Gain
Information gain is calculated for each attribute:

Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029

The attribute with the maximum gain is chosen at each level.
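The four gain figures above can be reproduced from Mitchell's 14-example PlayTennis training set (a self-contained sketch, assuming that dataset):

```python
import math
from collections import Counter

FIELDS = ("Outlook", "Temperature", "Humidity", "Wind", "PlayTennis")
DATA = [dict(zip(FIELDS, row)) for row in [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(examples, attribute):
    labels = [e["PlayTennis"] for e in examples]
    g = entropy(labels)
    for v in {e[attribute] for e in examples}:
        sv = [e["PlayTennis"] for e in examples if e[attribute] == v]
        g -= len(sv) / len(examples) * entropy(sv)
    return g

gains = {a: gain(DATA, a) for a in FIELDS[:-1]}
best = max(gains, key=gains.get)
print(best)  # -> Outlook  (maximum gain, so it becomes the root test)
```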
Continues until...
1. Every attribute has already been included along the tree's path.
2. Training examples associated with leaf nodes all have the same target attribute value (i.e. entropy is zero).
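Putting the pieces together, the top-down construction with these two stopping conditions can be sketched as a short recursion (the toy data and names are ours, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def id3(examples, attributes, target):
    """Recursive ID3 sketch: grow the tree top-down, choosing the
    highest-gain attribute at each node. A subtree is a dict
    {attribute: {value: subtree}}; a leaf is a class label."""
    labels = [e[target] for e in examples]
    # Stop when the node is pure or no attributes remain (the two
    # conditions above); label leaves with the majority class.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        g = entropy(labels)
        for v in {e[a] for e in examples}:
            sv = [e[target] for e in examples if e[a] == v]
            g -= len(sv) / len(examples) * entropy(sv)
        return g

    best = max(attributes, key=gain)
    rest = [a for a in attributes if a != best]
    return {best: {v: id3([e for e in examples if e[best] == v], rest, target)
                   for v in {e[best] for e in examples}}}

# Toy data: attribute A determines y perfectly, B is irrelevant,
# so the learned tree tests only A.
data = [{"A": "x", "B": "p", "y": "+"}, {"A": "x", "B": "q", "y": "+"},
        {"A": "z", "B": "p", "y": "-"}, {"A": "z", "B": "q", "y": "-"}]
print(id3(data, ["A", "B"], "y"))
```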
Hypothesis Space Search in Decision Tree Learning
Search proceeds by hill-climbing.
The information gain measure guides the hill-climbing.
Capabilities and Limitations
ID3's hypothesis space is a complete space of finite, discrete-valued functions.
Maintains only one current hypothesis as it searches through the space of trees.
No backtracking: it can reach a local optimum that is not global.
Uses all training examples at each step in its statistics-based decisions.
Issues
Overfitting the Data
Incorporating Continuous-Valued Attributes
Alternative Measures for Selecting Attributes
Handling Training Examples with Missing Attribute Values
Handling Attributes with Differing Costs
Overfitting the Data
Overfitting: when some other hypothesis that fits the training examples less well actually performs better over the entire distribution of instances.
Addition of incorrect training data will cause ID3 to create a more complex tree.
Addition of noise can also cause overfitting.
Prevention:
Reduced-error pruning: replace a subtree with a single node, using the most common classification of the associated training examples to label it.
Rule post-pruning: convert the tree to rules, then prune each rule by removing preconditions whose removal improves the rule's estimated accuracy.
Incorporating Continuous-Valued Attributes
Discretizing: dynamically defining new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals.
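A common way to discretize (the one used by C4.5 for boolean splits): sort the examples by the continuous value, take as candidate thresholds the midpoints between adjacent examples whose labels differ, and keep the candidate with the highest information gain. A sketch; the temperature values follow Mitchell's example:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_threshold(values, labels):
    """Pick the boolean split `value < t` with the highest
    information gain. Candidate thresholds are midpoints between
    adjacent sorted values whose labels differ."""
    pairs = sorted(zip(values, labels))
    candidates = [(pairs[i][0] + pairs[i + 1][0]) / 2
                  for i in range(len(pairs) - 1)
                  if pairs[i][1] != pairs[i + 1][1]]

    def gain(t):
        left = [l for v, l in pairs if v < t]
        right = [l for v, l in pairs if v >= t]
        return (entropy(labels)
                - len(left) / len(labels) * entropy(left)
                - len(right) / len(labels) * entropy(right))

    return max(candidates, key=gain)

temps = [40, 48, 60, 72, 80, 90]
play  = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temps, play))  # -> 54.0
```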
Alternative Measures for Selecting Attributes
SplitInformation(S, A): how broadly and uniformly the attribute A splits the data.
Distance-based measures
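SplitInformation is the entropy of S with respect to the attribute's values (rather than the target concept); dividing Gain by it yields the gain ratio, which penalizes attributes with many distinct values. A sketch with hypothetical data:

```python
import math
from collections import Counter

def split_information(examples, attribute):
    """SplitInformation(S, A) = -sum_i |S_i|/|S| * log2(|S_i|/|S|)
    over the partitions S_i induced by A's values."""
    total = len(examples)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(e[attribute] for e in examples).values())

# An attribute splitting 14 examples into two equal halves has split
# information 1.0; one with a distinct value per example (think of a
# Date attribute) has log2(14) ~ 3.807, so GainRatio = Gain /
# SplitInformation penalizes it heavily.
half = [{"A": "x"}] * 7 + [{"A": "y"}] * 7
unique = [{"A": i} for i in range(14)]
print(round(split_information(half, "A"), 3))    # -> 1.0
print(round(split_information(unique, "A"), 3))  # -> 3.807
```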
Handling Training Examples with Missing Attribute Values
Assign the missing attribute the most common value among the training examples.
Assign a probability to each of the possible values.
Estimate the value of the missing attribute based on the observed frequencies of the values among the training examples.
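The first strategy is the simplest to sketch (function name and data are ours; `None` marks a missing value):

```python
from collections import Counter

def impute_most_common(examples, attribute):
    """Fill a missing attribute (None) with the most common
    observed value for that attribute."""
    observed = [e[attribute] for e in examples if e[attribute] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**e, attribute: mode if e[attribute] is None else e[attribute]}
            for e in examples]

data = [{"Humidity": "High"}, {"Humidity": "High"},
        {"Humidity": "Normal"}, {"Humidity": None}]
print(impute_most_common(data, "Humidity")[-1])  # -> {'Humidity': 'High'}
```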
Handling Attributes with Differing Costs
Introduce a cost term into the attribute selection measure.
Example:

Gain²(S, A) / Cost(A)
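As a minimal sketch (function name ours), the example measure trades gain against measurement cost, so a cheap low-gain test can outrank an expensive high-gain one:

```python
def cost_adjusted(gain, cost):
    """The slide's example measure: Gain^2(S, A) / Cost(A)."""
    return gain ** 2 / cost

# gain 0.2 at cost 1 beats gain 0.4 at cost 10:
print(cost_adjusted(0.2, 1.0) > cost_adjusted(0.4, 10.0))  # -> True
```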
Case Study: Determinants of House Price
The house purchasing decision depends on household preference or priority orderings for dwelling attributes or services; the attributes or services therefore probably have different importance in the determination of housing prices.
Decision Tree Application
CART Algorithm
Makes use of an impurity measure.
Impurity:
low value: indicates the predomination of a single class within a set of cases
high value: indicates an even distribution of classes
Gini Impurity Measure
I_G(i) = 1 − ∑_{j=1}^{m} f(i, j)² = ∑_{j≠k} f(i, j) f(i, k)

f(i, j) : the probability of getting value j in node i
Gini impurity is based on squared probabilities ofmembership for each target category in the node. Itreaches its minimum (zero) when all cases in the nodefall into a single target category.
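The measure is a one-liner in code (a sketch; the function name is ours):

```python
def gini(probabilities):
    """I_G(i) = 1 - sum_j f(i, j)^2 for the class probabilities
    f(i, j) observed in node i."""
    return 1.0 - sum(p * p for p in probabilities)

print(gini([1.0, 0.0]))  # -> 0.0  (pure node: minimum impurity)
print(gini([0.5, 0.5]))  # -> 0.5  (evenly mixed two-class node)
```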