Breeding Decision Trees Using Evolutionary Techniques
Written by
Athanasios Papagelis
Dimitris Kalles
Presented by Alexandre Temporel
Abstract
The idea is to use evolutionary techniques to evolve decision trees.
Can the learner:
• efficiently search both simple and complex hypothesis spaces?
• discover conditionally dependent and irrelevant attributes?
• Tests use standard concept-learning tasks and compare against two known algorithms:
– C4.5 (Quinlan, 1993)
– OneR (Holte, 1993)
GOAL: demonstrate the potential advantages of these evolutionary techniques compared to other classifiers.
Outline
1) Problem statement
2) Construction of the GATree system
- Operators
- Fitness function
- Advanced features
3) Experiments
- 1st exp: efficiently search simple & complex hypothesis spaces
- 2nd exp: conditionally dependent / irrelevant attributes
- 3rd exp: search target concepts on standard databases
4) Discussion on the search type of GATree
5) Conclusion
Related work
GAs have been widely used for classification and concept-learning tasks:
J. Bala, J. Huang and H. Vafaie (1995), "Hybrid Learning Using Genetic Algorithms and Decision Trees for Pattern Classification", School of Information and Engineering
Work on their ability as a tool to evolve decision trees:
Nathan C. Burnett (May 2001), "Evolutionary Induction of Decision Trees"
Kenneth A. De Jong, William M. Spears and Diana F. Gordon, "Using Genetic Algorithms for Concept Learning", Naval Research Laboratory
John R. Koza, "Concept Formation and Decision Tree Induction Using the Genetic Programming Paradigm"
Martijn C. J. Bot and William B. Langdon, "Application of Genetic Programming to Induction of Linear Classification Trees"
H. Kennedy, C. Chinniah, P. Bradbeer, L. Morss, "The Construction and Evaluation of Decision Trees: A Comparison of Evolutionary and Concept Learning Methods", Napier University
Problem statement (1/2)
Also, the presence of irrelevant attributes in a data set can:
- mislead the impurity functions
- produce bigger, less comprehensible, lower-performance trees
Using evolutionary techniques allows us to:
• overcome the use of greedy heuristics
• search the hypothesis space in a natural way
• evolve simple, accurate and robust decision trees
Problem statement (2/2)
Decision trees are used in many domains (e.g., pattern classification).
Finding the optimal tree is proven to be NP-complete (Murthy, 1998).
Current inductive learning algorithms use:
• information gain, gain ratio (Quinlan, 1986)
• Gini index (Breiman, 1984)
Assumption: attributes are conditionally independent
⇒ poor performance on data sets with strong conditional dependence
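For reference, the impurity measures mentioned above can be sketched in a few lines of Python (a minimal illustration, not the paper's code):

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy, the basis of information gain."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, branches):
    """Entropy reduction after splitting `labels` into `branches`."""
    n = len(labels)
    return entropy(labels) - sum(len(b) / n * entropy(b) for b in branches)
```

Greedy inducers pick, at each node, the single split that maximises such a measure, which is exactly where conditionally dependent attributes mislead them.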
GATree system(1/3): Operators
GATree program uses GALIB (Wall, 1996), a robust C++ library of Genetic Algorithm components. http://lancet.mit.edu/ga/
Initial population: we use minimal binary decision trees
Crossover Operation
Mutation Operator
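The two operators can be sketched on a simple node structure (an illustrative Python sketch; GATree itself builds on GAlib's C++ genomes, so the `Node` class and the uniform random choice of nodes here are assumptions):

```python
import random

class Node:
    """A binary decision-tree node: an internal test or a leaf class."""
    def __init__(self, payload, left=None, right=None):
        self.payload, self.left, self.right = payload, left, right

    def nodes(self):
        """Yield every node in the subtree rooted here."""
        yield self
        for child in (self.left, self.right):
            if child is not None:
                yield from child.nodes()

def crossover(a, b, rng=random):
    """Swap a random subtree of tree `a` with a random subtree of `b`."""
    na = rng.choice(list(a.nodes()))
    nb = rng.choice(list(b.nodes()))
    na.payload, nb.payload = nb.payload, na.payload
    na.left, nb.left = nb.left, na.left
    na.right, nb.right = nb.right, na.right

def mutate(tree, new_payload, rng=random):
    """Replace the payload of a random node (a test value or leaf class)."""
    rng.choice(list(tree.nodes())).payload = new_payload
```

Because both operators act on whole subtrees or single nodes, every offspring is again a syntactically valid decision tree.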
GATree system(2/3) : Fitness function
fitness(tree_i) = CorrectClassified(tree_i)^2 · x / (size(tree_i)^2 + x)

payoff(tree_i) = f(ACCURACY(tree_i), SIZE(tree_i))
fitness(tree_i) = f(ACCURACY(tree_i), SIZE(tree_i))
The size factor x is a constant (e.g., x = 1000).
• If x is small, fitness drops quickly as trees grow.
» SMALLER TREES -> BETTER FITNESS
• If x is large, size hardly affects fitness, so we search a bigger space.
» ONLY ACCURACY REALLY MATTERS
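The fitness CorrectClassified² · x / (size² + x) is a one-liner (a sketch, assuming `correct` is the number of correctly classified training instances and `size` the node count):

```python
def fitness(correct, size, x=1000):
    """GATree fitness: squared classification payoff, scaled by tree size.

    correct -- number of correctly classified training instances
    size    -- number of nodes in the tree
    x       -- the size factor (the experiments use x = 1000)
    """
    return correct ** 2 * x / (size ** 2 + x)
```

At equal accuracy the smaller tree always scores higher, and as x grows the size penalty vanishes, matching the two bullets above.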
GATree system(3/3) : Advanced features
payoff(tree_i) = f(ACCURACY(tree_i), SIZE(tree_i))
• Overcrowding problem (Goldberg)
-> use of a scaled payoff function (which reduces the fitness of similar trees in a population)
• Use of alternative crossover and mutation operators
-> more accurate sub-trees have less chance of being selected for crossover or mutation
• To speed up evolution, use of:
Limited Error Fitness (LEF) (Gathercole & Ross, 1997)
-> CPU time savings with insignificant accuracy losses
Experiments: foreword (1/2)
1st experiment: we use DataGen (Melli, 1999) to generate artificial data sets from random rules (to ensure complexity variety). The goal is to reconstruct the underlying knowledge.
2nd experiment: we use more or less complicated target concepts (XOR, parity, ...) and see how GATree performs against C4.5.
3rd experiment: we use WEKA (Witten & Frank, 2000) to test the C4.5 and OneR algorithms.
==> All experiments use 5-fold cross-validation.
Experiments: foreword (2/2)
• Evolution type: generational
• Init. population: 200
• Generations: 800
• Generation gap: 25%
• Mutation prob.: 0.005
• Crossover prob.: 0.93
• Size factor: 1000
• Random seed: 123456789
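As a sketch of how these parameters drive a generational run (illustrative Python only; the truncation selection, list genomes and `evolve` helper are assumptions, not GATree internals — GATree delegates this loop to GAlib):

```python
import random

# Run parameters from the slide above
POP_SIZE, GENERATIONS = 200, 800
GEN_GAP, P_MUT, P_CROSS, SEED = 0.25, 0.005, 0.93, 123456789

def evolve(init, fitness, crossover, mutate, generations=GENERATIONS):
    """Generational GA: each generation, offspring of fitter parents
    replace the worst GEN_GAP fraction of the population."""
    rng = random.Random(SEED)
    pop = [init(rng) for _ in range(POP_SIZE)]
    n_replace = int(POP_SIZE * GEN_GAP)
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)    # best individuals first
        parents = pop[: POP_SIZE // 2]         # truncation selection (assumed)
        children = []
        for _ in range(n_replace):
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng) if rng.random() < P_CROSS else a[:]
            if rng.random() < P_MUT:
                child = mutate(child, rng)
            children.append(child)
        pop[POP_SIZE - n_replace:] = children  # generation gap: replace worst 25%
    return max(pop, key=fitness)
```

With a 25% generation gap the best individuals survive between generations, so the best fitness never decreases over a run.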
1st exp: Simple concept
3 rules to extract:
(31.0%) c1 <- B=(f or g or j) & C=(a or g or j)
(28.0%) c2 <- C=(b or e)
(41.0%) c3 <- B=(b or i) & C=(d or i)
[Plot: fitness, accuracy and size of the best individual vs. number of generations]
1st exp: Complex concept
8 rules to extract:
(15.0%) c1 <- B=(f or l or s or w) & C=(c or e or f or k)
(14.0%) c2 <- A=(a or b or t) & B=(a or h or q or k)
... etc.
[Plot: fitness, accuracy and size of the best individual vs. number of generations]
2nd exp (irrelevant attribute)

A1  A2  A3  | Class
T   F   T   | T
T   F   F   | T
F   T   F   | T
F   T   T   | T
F   F   T   | F
F   F   F   | F
T   T   T   | F
T   T   F   | F

An easy concept (the class is simply A1 xor A2), but C4.5 falsely estimates the contribution of the irrelevant attribute A3.
GATree produces the optimal solution.
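The truth table above can be reproduced in a few lines. This sketch (illustrative, not from the paper) also shows why a greedy single-attribute split struggles here: every attribute, including the irrelevant A3, leaves a 50/50 class mix in each branch, so no single split reduces impurity:

```python
from itertools import product

# The 2nd-experiment truth table: class = A1 xor A2; A3 is irrelevant.
data = [((a1, a2, a3), a1 ^ a2) for a1, a2, a3 in product([0, 1], repeat=3)]

def split_purity(attr):
    """Fraction of positives in the branch where attribute `attr` is 1."""
    branch = [cls for x, cls in data if x[attr] == 1]
    return branch.count(1) / len(branch)
```

Since each candidate split looks equally useless to an impurity measure, C4.5's tie-breaking can pick A3 just as easily as A1 or A2.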
2nd exp (conditionally dependent attributes)

Name  Attrib  Class function                             Noise            Instances  Random attributes
XOR1  10      (A1 xor A2) or (A3 xor A4)                 No               100        6
XOR2  10      (A1 xor A2) xor (A3 xor A4)                No               100        6
XOR3  10      (A1 xor A2) or (A3 and A4) or (A5 and A6)  10% class error  100        4
Par1  10      3-attribute parity problem                 No               100        7
Par2  10      4-attribute parity problem                 No               100        6
• Greedy heuristics perform poorly when attributes are conditionally dependent.
• GATree outperformed C4.5 by a wide margin.
• GATree can be disturbed by class noise.

Accuracy (%)   C4.5        GATree
XOR1           67±12.04    100±0
XOR2           53±18.57    90±17.32
XOR3           79±6.52     78±8.37
Par1           70±24.49    100±0
Par2           63±6.71     85±7.91
3rd exp (results on standard sets)
Data set       Accuracy (C4.5, OneR, GATree)        Size (C4.5, GATree)
Colic 83.84±3.41 81.37±5.36 85.01±4.55 27.4 5.84
Heart-Statlog 74.44±3.56 76.3±3.04 77.48±3.07 39.4 8.28
Diabetes 66.27±3.71 63.27±2.59 63.97±3.71 140.6 6.6
Credit 83.77±2.93 86.81±4.45 86.81±4 57.8 3
Hepatitis 77.42±6.84 84.52±6.2 80.46±5.39 19.8 5.56
Iris 92±2.98 94.67±3.8 93.8±4.02 9.6 7.48
Labor 85.26±7.98 72.73±14.37 87.27±7.24 8.6 8.72
Lymph 65.52±14.63 74.14±7.18 75.24±10.69 28.2 7.96
Breast-Cancer 71.93±5.11 68.17±7.93 71.03±8.34 35.4 6.68
Zoo 90±7.91 43.8±10.47 82.4±4.02 17 10.12
Vote 96.09±3.86 95.63±4.33 95.63±4.33 11 3
Glass 55.24±7.49 43.19±4.33 53.48±4.33 60.2 8.98
Balance-Scale 78.24±4.4 59.68±4.4 71.15±6.47 106.6 8.92
AVERAGES 78.46 72.64 79.75 43.2 7.01
3rd exp (observations)
• GATree performs as well as or slightly better than C4.5 on the standard data sets.
• But the trees produced by GATree are about 6 times smaller than those of C4.5.
• OneR is good for noisy data sets but performs substantially worse overall.
Discussion on the search type of GATree
GATree adopts a less greedy strategy than other learners.
It tries to minimise the size of the tree while maximising the accuracy.
GATree is neither a hill-climbing nor an exhaustive searcher,
but rather a type of beam search (balancing exploration / exploitation).
However, when tuned properly, it can take on the same characteristics.
Conclusion
The hypotheses derived by standard algorithms can substantially deviate from the optimum,
due to their greedy strategy.
This can be solved by using global metrics of tree quality.
Compared to greedy induction, GATree produces trees that are:
- accurate
- small
- comprehensible
GAs adapt themselves dynamically to a variety of different target concepts.
Encoding
It is important to select the proper encoding for a problem. The encoding maps the problem being solved onto one of the following representations:
• Value encoding: a chromosome is a string of values, e.g.
chromosome A: 45.9845 12.8375 102.46556 55.3857 36.39857
• Binary encoding: chromosomes are strings of 0s and 1s, e.g.
chromosome A: 10001010001111101010111
• Tree encoding: GAs may also be used to design and construct programs; chromosome genes then represent programming-language commands, mathematical operations and other program components.
We use a natural representation of the search space: actual decision trees, not binary strings.
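The three encodings can be made concrete in a short sketch (the chromosome values and the tuple-based tree representation are invented for illustration; GATree's actual tree genomes live in GAlib):

```python
import random

rng = random.Random(0)

# Value encoding: a chromosome is a vector of real-valued genes.
value_chromosome = [round(rng.uniform(0, 100), 4) for _ in range(5)]

# Binary encoding: a chromosome is a string of 0/1 genes.
binary_chromosome = [rng.randint(0, 1) for _ in range(23)]

# Tree encoding (GATree's choice): the chromosome IS a decision tree,
# here as nested (test, left_subtree, right_subtree) tuples with
# string class labels at the leaves.
tree_chromosome = ("A1=true", ("A2=true", "classA", "classB"), "classB")

def tree_size(node):
    """Count the nodes in a tuple-encoded tree chromosome."""
    if not isinstance(node, tuple):
        return 1                      # a leaf
    return 1 + tree_size(node[1]) + tree_size(node[2])
```

The point of the tree encoding is that operators manipulate subtrees directly, so no decode step from a flat string is needed.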
Bias
Without bias, we have no basis for classifying unseen examples; the best we can do is:
- memorise the training samples
- classify new examples at random
The problem with a biased algorithm is that it shrinks the hypothesis space, so we might miss the optimal hypothesis.
Preference bias is based on the learner's behaviour:
- desirable when it determines the characteristics of the produced tree.
Procedural bias is based on the learner's design:
- if inadequate, it may affect the quality of the output.
C4.5 is biased towards small and accurate trees (preference bias) but uses the gain-ratio metric / minimum-error pruning (procedural bias).
GATree has a new, weak procedural bias:
- it considers a relatively large number of hypotheses in a relatively efficient manner
- it employs global metrics of tree quality:
a set of minimal numerical performance measurements related to a goal: FITNESS