Decision Trees

Some concepts on decision trees

Page 1: Decision Trees

Decision Trees

Page 2: Decision Trees

What is a tree in CS?

• A tree is a non-linear data structure (a minimal code sketch follows this list)
• It has a unique node called the root
• Every non-trivial tree has one or more leaf nodes, arranged in different levels
• Trees are always drawn with the root at the top or on the left
• Nodes at a level are connected to nodes at the higher (parent) level or lower (child) level
• There are no loops in a tree
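A minimal sketch of such a tree in Python (the Node class and labels are illustrative, not from the slides):

```python
# Minimal sketch of a rooted tree. Each node keeps a list of children;
# a node with no children is a leaf.
class Node:
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def is_leaf(self):
        return not self.children

# A root with two levels below it. Every node except the root has exactly
# one parent, so no node can be reached by two different paths (no loops).
root = Node("root", [
    Node("left", [Node("leaf1"), Node("leaf2")]),
    Node("right"),
])
```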

Page 3: Decision Trees

Decision Trees

• A decision tree (DT) is a hierarchical classification and prediction model

• It is organized as a rooted tree with 2 types of nodes called decision nodes and class nodes

• It is a supervised data mining model used for classification or prediction

Page 4: Decision Trees

An Example Data Set and Decision Tree

 #   Outlook   Company   Sailboat   Sail?
 1   sunny     big       small      yes
 2   sunny     med       small      yes
 3   sunny     med       big        yes
 4   sunny     no        small      yes
 5   sunny     big       big        yes
 6   rainy     no        small      no
 7   rainy     med       small      yes
 8   rainy     big       big        yes
 9   rainy     no        big        no
10   rainy     med       big        no

(Outlook, Company and Sailboat are the attributes; Sail? is the class.)

[Decision tree figure: the root tests outlook. outlook = sunny → yes. outlook = rainy → test company: big → yes, no → no, med → test sailboat: small → yes, big → no.]
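The tree in this figure can be written directly as nested conditionals. A sketch in Python (the function name is illustrative):

```python
# The sailing decision tree from this slide as nested conditionals.
# It classifies all 10 training instances above correctly.
def classify_sail(outlook, company, sailboat):
    if outlook == "sunny":
        return "yes"
    # outlook == "rainy": the tree next tests company
    if company == "big":
        return "yes"
    if company == "no":
        return "no"
    # company == "med": the tree finally tests sailboat
    return "yes" if sailboat == "small" else "no"
```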

Page 5: Decision Trees

Classification

• What is classification?
• What are some applications of Decision Tree Classifiers (DTC)?
• What is a BDTC?
• Misclassification errors

Page 6: Decision Trees

Classification

#   Outlook   Company   Sailboat   Sail?
1   sunny     no        big        ?
2   rainy     big       small      ?

(The class Sail? is unknown; it is predicted by passing each instance down the tree.)

[Decision tree figure: the same tree as on Page 4 (root: outlook, then company, then sailboat), used here to classify the two new instances.]
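Running the classify_sail sketch from Page 4 on these two instances resolves the "?" entries:

```python
print(classify_sail("sunny", "no", "big"))     # -> "yes" (sunny always sails)
print(classify_sail("rainy", "big", "small"))  # -> "yes" (rainy, big company)
```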

Page 7: Decision Trees

Chance nodes

• Each internal node of a DT is a decision point, where some condition is tested
• The result of this condition determines which branch of the tree is to be taken next
• Thus they are called decision nodes, chance nodes, or non-terminal nodes
• Chance nodes partition the data available at that point so as to maximize differences in the dependent variable

Page 8: Decision Trees

Terminal nodes

• The leaf nodes of a DT are called terminal nodes
• They indicate the class into which a data instance will be classified
• They have just one incoming edge
• They do not have child nodes (outgoing edges)
• There are no conditions tested at terminal nodes
• Tree traversal from the root to a leaf produces the production rule for that class

Page 9: Decision Trees

Advantages of DT

• Easy to understand and interpret
• Works for categorical and quantitative data
• A DT can grow to any depth
• Attributes can be chosen in any desired order
• Pruning a DT is very easy
• Works for missing or null values

Page 10: Decision Trees

Advantages contd.

• Can be used to identify outliers
• Production rules can be obtained directly from the built DT
• They are generally faster than other classification models
• A DT can be used even when domain experts are absent

Page 11: Decision Trees

Disadvantages

• A DT induces sequential decisions
• Class-overlap problem
• Correlated data
• Complex production rules
• A DT can be sub-optimal

Page 12: Decision Trees

Quinlan’s classical example

 #   Outlook    Temperature   Humidity   Windy   Play
 1   sunny      hot           high       no      N
 2   sunny      hot           high       yes     N
 3   overcast   hot           high       no      P
 4   rainy      moderate      high       no      P
 5   rainy      cold          normal     no      P
 6   rainy      cold          normal     yes     N
 7   overcast   cold          normal     yes     P
 8   sunny      moderate      high       no      N
 9   sunny      cold          normal     no      P
10   rainy      moderate      normal     no      P
11   sunny      moderate      normal     yes     P
12   overcast   moderate      high       yes     P
13   overcast   hot           normal     no      P
14   rainy      moderate      high       yes     N

(Outlook, Temperature, Humidity and Windy are the attributes; Play is the class: P = play, N = don't play.)

Page 13: Decision Trees

Simple Tree

[Simple tree figure: the root tests Outlook. overcast → P. sunny → test Humidity: high → N, normal → P. rainy → test Windy: yes → N, no → P.]

Page 14: Decision Trees

Complicated Tree

[Complicated tree figure: an alternative tree for the same data, rooted at Temperature (cold / moderate / hot), with Outlook, Windy and Humidity tested at deeper levels and one branch ending in a null leaf. It is far deeper than the simple tree, showing how the choice and order of attributes affect tree size.]

Page 15: Decision Trees

Production rules

• Rules abstracted by a DT can be converted into production rules

• These are obtained by traversing each branch of the DT from root to each of the leaves

• A DT can be reconstructed if all production rules are known
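A sketch of that traversal in Python, using a nested-dict encoding of the sailing tree from Page 4 (the encoding and function name are illustrative):

```python
# Extract production rules by walking every root-to-leaf path.
# A tree is a nested dict {attribute: {value: subtree-or-class}};
# anything that is not a dict is a leaf (a class label).
def production_rules(tree, conditions=()):
    if not isinstance(tree, dict):
        yield "IF " + " AND ".join(conditions) + f" THEN {tree}"
        return
    (attribute, branches), = tree.items()
    for value, subtree in branches.items():
        yield from production_rules(
            subtree, conditions + (f"{attribute}={value}",))

sail_tree = {"outlook": {
    "sunny": "yes",
    "rainy": {"company": {
        "big": "yes",
        "no": "no",
        "med": {"sailboat": {"small": "yes", "big": "no"}}}}}}

for rule in production_rules(sail_tree):
    print(rule)
# IF outlook=sunny THEN yes
# IF outlook=rainy AND company=big THEN yes
# ... one rule per leaf; the tree can be rebuilt from the full set
```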

Page 16: Decision Trees

General View of DT Induction

Page 17: Decision Trees

ID3 induction algorithm

• ID3 (Iterative Dichotomiser 3)
• Introduced in 1986 by Quinlan
• Uses a greedy tree-growing method
• Works on categorical attributes
• Uses the entropy measure
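A simplified sketch of this greedy, entropy-based growing (assumes categorical attributes; no handling of missing values, continuous attributes, or pruning):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def id3(rows, labels, attributes):
    # Stop: all examples share one class, or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Greedy step: choose the attribute with the largest information gain.
    def gain(a):
        remainder = 0.0
        for v in set(r[a] for r in rows):
            sub = [l for r, l in zip(rows, labels) if r[a] == v]
            remainder += len(sub) / len(rows) * entropy(sub)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)
    rest = [a for a in attributes if a != best]
    return {best: {
        v: id3([r for r in rows if r[best] == v],
               [l for r, l in zip(rows, labels) if r[best] == v],
               rest)
        for v in set(r[best] for r in rows)
    }}
```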

Page 18: Decision Trees

C4.5 induction algorithm

• Introduced by Quinlan in 1993
• An extension of the ID3 algorithm
• Uses a greedy tree-growing method
• Works on general (categorical and continuous) attributes
• Uses the entropy measure
• Uses multi-way splits

Page 19: Decision Trees

CART induction algorithm

• Introduced by Breiman et al. in 1984
• Uses the binary recursive partitioning method
• Works on general attributes
• Uses the Gini measure
• Uses two-way (binary) splits
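A sketch of how one candidate two-way split is scored with the Gini measure (helper names are illustrative; the labels come from the sailing data on Page 4):

```python
def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_of_split(labels_left, labels_right):
    # Weighted Gini impurity of a two-way split; CART grows the tree by
    # repeatedly choosing the split that minimizes this value.
    n = len(labels_left) + len(labels_right)
    return (len(labels_left) / n * gini(labels_left)
            + len(labels_right) / n * gini(labels_right))

# Example: splitting the sailing data on outlook (sunny vs rainy).
left = ["yes"] * 5                       # outlook = sunny: all sail
right = ["no", "yes", "yes", "no", "no"] # outlook = rainy: mixed
print(gini_of_split(left, right))        # 0.24
```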

Page 20: Decision Trees

Measures for node splitting

• Gini's Index measure
• Modified Gini Index
• Normalized, symmetric and asymmetric Gini Index measures
• Shannon's entropy measure
• Minimum classification error measure
• Chi-square statistic
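For a two-class node with class proportion p, the most common of these measures can be compared directly (a sketch; for two classes the Gini index reduces to 2p(1-p)):

```python
import math

def gini_index(p):             # Gini impurity: 2p(1-p) for two classes
    return 2 * p * (1 - p)

def entropy_measure(p):        # Shannon entropy in bits
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def misclassification_error(p):
    return min(p, 1 - p)

for p in (0.1, 0.3, 0.5):
    print(p, gini_index(p), entropy_measure(p), misclassification_error(p))
# All three vanish for a pure node (p = 0 or 1) and peak at p = 0.5,
# the point of maximum impurity; they differ in how sharply they peak.
```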

Page 21: Decision Trees

Entropy

• The average amount of information I needed to classify an object is given by the entropy measure

• For a two-class problem with class proportions p1 and p2 (p1 + p2 = 1):

  I = -p1 log2(p1) - p2 log2(p2)
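As a worked check against Quinlan's data on Page 12, which has 9 P and 5 N instances:

  I = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940 bits

```python
import math

p1, p2 = 9 / 14, 5 / 14   # class proportions of P and N from Page 12
I = -p1 * math.log2(p1) - p2 * math.log2(p2)
print(round(I, 3))        # 0.94 bits of information per instance
```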

Page 22: Decision Trees

Chi-squared Automatic Interaction Detector (CHAID)

• As the name implies, this is a statistical technique for tree induction that uses Karl Pearson's χ² test for contingency tables.

• It works for categorical variables (with 2 or more categories), and can be used as an alternative to logistic regression.

• There is no pruning step as it stops growing the DT when a certain condition is met.
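A sketch of the kind of significance test CHAID applies to a candidate split, using the Outlook-vs-Play counts from Page 12 (SciPy is assumed here purely for illustration; real CHAID also merges categories and adjusts p-values, which this omits):

```python
from scipy.stats import chi2_contingency

# Contingency table from the Page 12 data: rows = Outlook, columns = (P, N).
table = [[2, 3],   # sunny:    2 P, 3 N
         [4, 0],   # overcast: 4 P, 0 N
         [3, 2]]   # rainy:    3 P, 2 N

chi2, p_value, dof, expected = chi2_contingency(table)
print(chi2, p_value)  # chi2 ≈ 3.55, p ≈ 0.17: with only 14 instances,
                      # the Outlook/Play association is not significant
```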

Page 23: Decision Trees

Pruning DT

• Once the decision tree has been constructed, a sensitivity analysis should be performed to test the suitability of the model to variations in the data instances. The expected value of each alternative is evaluated to determine the optimal model, but the decision maker's attitude towards high-risk alternatives can negatively influence the outcome of a sensitivity analysis. Most decision tree software packages allow the user to carry out sensitivity analysis.

Page 24: Decision Trees

Pre Vs Post-pruning

• There are two approaches to prune a DT -- pre-pruning and post-pruning. In pre-pruning, the tree growing is halted when a stopping condition is met.

• Post-pruning works with a completely grown tree. In post-pruning, test cases are used to prune the DT to minimize the classification error or to adjust the tree to data changes.

• Tree pruning is usually a post-processing step intended to minimize overfitting and to remove redundancies.
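As one concrete illustration of both approaches, scikit-learn's CART-style trees expose a stopping condition for pre-pruning and minimal cost-complexity pruning for post-pruning (a sketch; the slides do not prescribe a specific tool, and the iris data stands in for real test cases):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: a stopping condition (here, max_depth) halts tree growing early.
pre_pruned = DecisionTreeClassifier(max_depth=3).fit(X, y)

# Post-pruning: grow the full tree, then cut it back. Larger ccp_alpha values
# remove more of the tree; in practice the alpha that minimizes classification
# error on held-out test cases would be chosen.
path = DecisionTreeClassifier().cost_complexity_pruning_path(X, y)
post_pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2]).fit(X, y)

print(pre_pruned.get_depth(), post_pruned.get_depth())
```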

Page 25: Decision Trees

Decision Tables

• A decision table is a hierarchical structure akin to decision trees, except that data are enumerated into a table using a pair of attributes, rather than a single attribute.

• Quantitative variables should be categorized using the discretisation technique discussed in chapter 1.

Page 26: Decision Trees

Fraud Detection

• Fraud detection is increasingly becoming a necessity due to the large number of uncaught frauds. Fraudulent financial transactions amount to billions of dollars every year throughout the world. Fraud prevention is different from fraud detection: the former is pre-transaction safety, while the latter operates during or immediately after a transaction.

Page 27: Decision Trees

Software for DT

• DTREG is a powerful statistical analysis program that generates classification and regression trees (www.dtreg.com)

• GATree (www.gatree.com)
• Weka (University of Waikato, NZ)
• TreeAge Pro (www.treeage.com)
• YaDT (www.di.unipi.it/~ruggieri/YaDT/YaDT1.2.1.zip)

Page 28: Decision Trees

THE END