Post-processing Operators for Decision Lists


Transcript of Post-processing Operators for Decision Lists

Page 1: Post-processing Operators for Decision Lists

BioHEL System | Our approach | Results | Summary

Post-processing Operators for Decision Lists

María A. Franco

Supervisor: Jaume Bacardit

University of Nottingham, UK, ICOS Research Group, School of Computer Science

[email protected]

June 12, 2012

María A. Franco. University of Nottingham Post-processing Operators for Decision Lists 1 / 29

Page 2: Post-processing Operators for Decision Lists

Motivation

Goal of my PhD project
To enhance evolutionary learning systems based on IRL (BioHEL) to work better with large-scale datasets.

How have we been doing this?
- Analysing the weaknesses of the system in different domains [Franco et al., 2012a]
- Improving the execution time by means of GPGPUs [Franco et al., 2010]
- Developing theoretical models that allow us to adapt parameters within the system [Franco et al., 2011]
- Improving the quality of the final solutions by means of local search (memetic operators) [Franco et al., 2012b]

Page 7: Post-processing Operators for Decision Lists

Motivation

Goal of this work
To improve the quality of the decision lists by means of local search (memetic operators).

Decision lists are a widespread paradigm in rule learning, guided local search and supervised learning.

Example
- Pittsburgh Learning Classifier Systems
- Rule induction systems in mainstream machine learning (PART, CN2, JRip)

Page 9: Post-processing Operators for Decision Lists

Outline

1 BioHEL
  - Attribute List Knowledge Representation
  - Structure of the solutions
  - What is the problem?
2 Our approach: Post-processing the rules
  - Swapping
  - Pruning
  - Cleaning
3 Results
4 Summary
  - Where to go from here?

Page 10: Post-processing Operators for Decision Lists

Introduction to the BioHEL System

BIOinformatics-oriented Hierarchical Evolutionary Learning - BioHEL [Bacardit et al., 2009]

- BioHEL is an evolutionary learning system that employs the Iterative Rule Learning (IRL) paradigm
- BioHEL was especially designed to cope with large-scale datasets

Page 11: Post-processing Operators for Decision Lists

Attribute List Knowledge Representation

Meta-representation to handle large numbers of discrete and continuous attributes quickly [Bacardit and Krasnogor, 2009].

ALKR Classifier Example

[Figure: an ALKR classifier, with the fields numAtt, whichAtt, predicates, offsetPred and class.]
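The field names in the ALKR figure suggest a rule that stores only the attributes it actually expresses. The following is a minimal sketch of such a structure, not BioHEL's implementation: the class name `AlkrRule`, the exact layout of `predicates` (a flat list holding a lower and an upper bound per expressed continuous attribute) and the `matches` method are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AlkrRule:
    """Hypothetical attribute-list rule; field names follow the slide."""
    num_att: int             # number of attributes the rule expresses
    which_att: List[int]     # indices of the expressed attributes
    predicates: List[float]  # assumed layout: (lower, upper) per expressed attribute
    offset_pred: List[int]   # start of each attribute's predicate in `predicates`
    cls: int                 # class predicted by the rule

    def matches(self, instance: Sequence[float]) -> bool:
        # Only the expressed attributes are tested; all others are "don't care".
        for pos in range(self.num_att):
            att = self.which_att[pos]
            start = self.offset_pred[pos]
            lower, upper = self.predicates[start], self.predicates[start + 1]
            if not (lower <= instance[att] <= upper):
                return False
        return True


# A rule over attributes 0 and 3 of a 4-attribute problem.
rule = AlkrRule(num_att=2, which_att=[0, 3],
                predicates=[0.1, 0.3, 0.7, 0.9],
                offset_pred=[0, 2], cls=1)
```

Because matching iterates over `which_att` only, its cost grows with the number of expressed attributes rather than with the total number of attributes in the dataset.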

Page 12: Post-processing Operators for Decision Lists

Attribute List Knowledge Representation

Discrete attributes: GABIL representation

  F1    F2    F3
  100   01    1101
  ABC   DE    FGHI

F1 = A ∧ F2 = E ∧ F3 = (F ∨ G ∨ I)

Continuous attributes: Hyper-rectangle representation

C1 = [0.1,0.3] ∧ C2 = [0.7,0.9]
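The GABIL example above can be decoded mechanically: each attribute has one bit per possible value, and a set bit means the value is accepted. A small sketch, where the helper names `decode_gabil` and `gabil_matches` are hypothetical and the bit strings and domains are the ones from the slide:

```python
def decode_gabil(bits, domains):
    """Return, per attribute, the values whose bit is set to '1'."""
    return {
        name: [value for value, bit in zip(values, bits[name]) if bit == "1"]
        for name, values in domains.items()
    }


def gabil_matches(bits, domains, instance):
    """An instance matches if every attribute takes one of the accepted values."""
    allowed = decode_gabil(bits, domains)
    return all(instance[name] in allowed[name] for name in domains)


domains = {"F1": "ABC", "F2": "DE", "F3": "FGHI"}
bits = {"F1": "100", "F2": "01", "F3": "1101"}

accepted = decode_gabil(bits, domains)
```

Here `accepted` corresponds to the predicate F1 = A ∧ F2 = E ∧ F3 = (F ∨ G ∨ I).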

Page 13: Post-processing Operators for Decision Lists

Solutions generated by the BioHEL system

Since BioHEL uses IRL [Venturini, 1993], the solutions are hierarchical sets of rules ⇒ decision lists
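In a decision list the first rule that matches an instance decides its class, so the order of the rules is part of the solution. A minimal sketch of that evaluation follows; the function `classify`, the rule encoding as (condition, class) pairs and the toy rules are illustrative assumptions, not BioHEL code.

```python
def classify(decision_list, default_class, instance):
    """Return the class of the first matching rule, or the default class."""
    for condition, cls in decision_list:
        if condition(instance):
            return cls
    return default_class


# Two toy rules; the second is more general than the first.
specific = (lambda x: x["x1"] == 1 and x["x3"] == 0, 1)
general = (lambda x: x["x1"] == 1, 0)

instance = {"x1": 1, "x2": 0, "x3": 0}
in_order = classify([specific, general], 0, instance)
reversed_order = classify([general, specific], 0, instance)
```

With the specific rule first the instance is labelled 1; with the general rule first the specific rule is never reached and the same instance is labelled 0.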

Page 24: Post-processing Operators for Decision Lists

How can the rules be improved further?

We encountered the following problems:
- The rules were learned in the wrong order

Larger rulesets!

Example
[Figure: example of a ruleset learned in the wrong order]

Page 25: Post-processing Operators for Decision Lists

How can the rules be improved further?

We encountered the following problems:
- The rules did not have the correct specificity

The number of attributes expressed was rather high!

Example
Problem: x1 = 1 ∧ x3 = 0

  000 = 0    100 = 1
  001 = 0    101 = 0
  010 = 0    110 = 1
  011 = 0    111 = 0

Good
  x1 = 1 ∧ x3 = 0

Over-specific
  x1 = 1 ∧ x2 = 1 ∧ x3 = 0
  x1 = 1 ∧ x2 = 0 ∧ x3 = 0
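The example can be checked by enumerating the truth table: the single good rule and the two over-specific rules cover exactly the same instances, but the latter express three times as many attributes. A short sketch, with rules encoded as attribute-to-value dictionaries (an encoding chosen here for illustration):

```python
from itertools import product


def covers(rule, instance):
    """A rule covers an instance if every expressed attribute has the required value."""
    return all(instance[att] == val for att, val in rule.items())


def covered(rules, instances):
    """Instances covered by at least one rule, in enumeration order."""
    return [inst for inst in instances if any(covers(r, inst) for r in rules)]


instances = [dict(zip(("x1", "x2", "x3"), bits)) for bits in product((0, 1), repeat=3)]
positives = [inst for inst in instances if inst["x1"] == 1 and inst["x3"] == 0]

good = [{"x1": 1, "x3": 0}]
over_specific = [{"x1": 1, "x2": 1, "x3": 0}, {"x1": 1, "x2": 0, "x3": 0}]

expressed_good = sum(len(rule) for rule in good)
expressed_over = sum(len(rule) for rule in over_specific)
```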

Page 26: Post-processing Operators for Decision Lists

Our approach: Post-processing the rules

Ruleset-wise operators
- Rule swapping

Rule-wise operators
- Pruning
- Cleaning

Page 28: Post-processing Operators for Decision Lists

Rule Swapping

- Consists of swapping the order of the rules in the final rulesets.
- Which rules shall we swap? ⇒ Similarities

Measure of similarity

$$S(i,j) = \frac{\mathit{Dis}}{\mathit{NA}}\,\frac{\sum_{k}^{\mathit{Dis}} S_k(i,j)}{\sum_{k}^{\mathit{Dis}} \mathit{numVals}(k)} + \frac{\mathit{Real}}{\mathit{NA}}\sum_{k}^{\mathit{Real}} S_k(i,j) + \frac{\mathit{Mi}}{\mathit{NA}}$$

Measures the overlap between rules
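The slide does not define the per-attribute term S_k(i, j). One plausible reading, given that the measure captures the overlap between rules, is sketched below; both functions are assumptions made for illustration (hypothetical names, not the definitions used in the talk): for a discrete attribute, count the values active in both rules, and for a continuous attribute, take the length of the intersection of the two intervals.

```python
def discrete_overlap(bits_i, bits_j):
    """Assumed S_k for a discrete attribute: values active ('1') in both bit strings."""
    return sum(1 for a, b in zip(bits_i, bits_j) if a == "1" and b == "1")


def interval_overlap(interval_i, interval_j):
    """Assumed S_k for a continuous attribute: length of the interval intersection."""
    lower = max(interval_i[0], interval_j[0])
    upper = min(interval_i[1], interval_j[1])
    return max(0.0, upper - lower)


shared_values = discrete_overlap("1101", "0111")
shared_length = interval_overlap((0.1, 0.3), (0.2, 0.9))
disjoint_length = interval_overlap((0.1, 0.3), (0.7, 0.9))
```

Under this reading, dividing the discrete sum by the total number of values, as the formula does, keeps attributes with many values from dominating the score.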

Page 29: Post-processing Operators for Decision Lists

How does it work?

[Figure: illustration of the rule swapping operator]

Page 39: Post-processing Operators for Decision Lists

How does it work?

- Helps erase unnecessary rules
- It does not ensure the final rule set is minimal
- It has to re-evaluate the rules in the new order in each iteration
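The points above can be illustrated with a simplified swap step. This is a sketch under stated assumptions, not the operator presented in the talk: the acceptance criterion (keep a swap only if training accuracy does not drop) and the helper names `first_match`, `accuracy`, `drop_unused` and `try_swap` are hypothetical. It shows why a swap can make a rule unnecessary and why every swap forces a re-evaluation of the whole list.

```python
from itertools import product


def first_match(rules, instance):
    """Index of the first rule covering the instance, or None."""
    for idx, (condition, _cls) in enumerate(rules):
        if all(instance[att] == val for att, val in condition.items()):
            return idx
    return None


def accuracy(rules, default_class, data):
    """Training accuracy of the decision list (one full pass over the data)."""
    correct = 0
    for instance, label in data:
        idx = first_match(rules, instance)
        predicted = rules[idx][1] if idx is not None else default_class
        correct += predicted == label
    return correct / len(data)


def drop_unused(rules, data):
    """Remove rules that are never the first match for any instance."""
    used = {first_match(rules, instance) for instance, _ in data}
    return [rule for idx, rule in enumerate(rules) if idx in used]


def try_swap(rules, i, j, default_class, data):
    """Swap rules i and j; keep the swap only if accuracy does not drop."""
    candidate = list(rules)
    candidate[i], candidate[j] = candidate[j], candidate[i]
    if accuracy(candidate, default_class, data) >= accuracy(rules, default_class, data):
        return drop_unused(candidate, data)
    return rules


# Truth table of x1 = 1 AND x3 = 0, with an over-specific rule learned first.
data = [(dict(zip(("x1", "x2", "x3"), b)), int(b[0] == 1 and b[2] == 0))
        for b in product((0, 1), repeat=3)]
rules = [({"x1": 1, "x2": 1, "x3": 0}, 1), ({"x1": 1, "x3": 0}, 1)]
swapped = try_swap(rules, 0, 1, 0, data)
```

After the swap the general rule fires first, the over-specific rule is never reached, and it is dropped from the list.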

Page 42: Post-processing Operators for Decision Lists

Our approach: Post-processing the rules

Ruleset-wise operators
- Rule swapping

Rule-wise operators
- Pruning

Page 44: Post-processing Operators for Decision Lists

Rule pruning

Drops attributes that do not affect the accuracy of the rules.

Example
Problem: x1 = 1 ∧ x3 = 0

  000 = 0    100 = 1
  001 = 0    101 = 0
  010 = 0    110 = 1
  011 = 0    111 = 0

Good
  x1 = 1 ∧ x3 = 0

Over-specific
  x1 = 1 ∧ x2 = 1 ∧ x3 = 0
  x1 = 1 ∧ x2 = 0 ∧ x3 = 0
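A greedy version of this idea can be sketched as follows: try removing each expressed attribute in turn and keep the removal when the accuracy of the rule on the instances it covers does not decrease. The functions `rule_accuracy` and `prune`, the attribute visiting order and the tie-breaking are assumptions for illustration; the operator in the talk may differ in these details.

```python
from itertools import product


def rule_accuracy(condition, cls, data):
    """Fraction of the covered instances whose label equals the rule's class."""
    covered = [label for inst, label in data
               if all(inst[att] == val for att, val in condition.items())]
    return sum(label == cls for label in covered) / len(covered) if covered else 0.0


def prune(condition, cls, data):
    """Greedily drop attributes whose removal does not reduce the rule's accuracy."""
    pruned = dict(condition)
    for att in list(condition):
        trial = {a: v for a, v in pruned.items() if a != att}
        # Never produce an empty rule; accept only non-decreasing accuracy.
        if trial and rule_accuracy(trial, cls, data) >= rule_accuracy(pruned, cls, data):
            pruned = trial
    return pruned


# Truth table of the slide's problem: class 1 iff x1 = 1 AND x3 = 0.
data = [(dict(zip(("x1", "x2", "x3"), b)), int(b[0] == 1 and b[2] == 0))
        for b in product((0, 1), repeat=3)]

pruned_rule = prune({"x1": 1, "x2": 1, "x3": 0}, 1, data)
```

On the over-specific rule x1 = 1 ∧ x2 = 1 ∧ x3 = 0, dropping x1 or x3 lowers the accuracy to 0.5, while dropping x2 keeps it at 1.0, so only x2 is removed.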

Page 45: Post-processing Operators for Decision Lists

Our approach: Post-processing the rules

Ruleset-wise operators
- Rule swapping

Rule-wise operators
- Pruning ⇒ Wait! This does not work if the other attributes are not correctly specified!
- Cleaning

Page 48: Post-processing Operators for Decision Lists

Rule cleaning

In χ-ary domains it is not always possible to drop attributes if the correct attributes are misaligned.

Example
Problem:
  x1 nominal {a,b,c,d,e}
  x2 nominal {w,y,z}
  x3 nominal {m,n}

Rule 1:
  x1 = (a ∨ b) ∧ x2 = w

Generated Rule:
  x1 = (a ∨ b ∨ c) ∧ x2 = w ∧ x3 = m

We need to deactivate literals in the attributes

Page 49: Post-processing Operators for Decision Lists

How does it work?

Cleaning approaches:
- CL - Focus on the positives
- CL2 - Do not infer

Continuous
[Figure: an interval over positive (+) and negative (-) examples, (- - - - ( (+ - + + + + - + -+) ) - - -), with its bounds marked OLD, CL and CL2.]

Discrete
  OLD   1 1 1 0 1 1   (values a b c d e f)
  Values covered by positive examples: a,b,c
  Values covered by negative examples: c,e
  CL    1 1 1 0 0 0
  CL2   1 1 1 0 0 1
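The discrete example can be reproduced with a small sketch. It encodes one reading of the two approaches, which is an assumption rather than the talk's definition: CL keeps only the values seen in the positive examples the rule covers, while CL2 switches off a value only when it is seen exclusively in covered negative examples and leaves unseen values untouched. The function `clean_discrete` and its `infer` flag are hypothetical names.

```python
def clean_discrete(bits, values, positive_values, negative_values, infer=True):
    """Deactivate literals of one discrete attribute.

    infer=True  -> CL-like: keep only values covered by positive examples.
    infer=False -> CL2-like: drop only values covered by negatives and by no positive.
    """
    cleaned = []
    for bit, value in zip(bits, values):
        if bit == "1":
            if infer and value not in positive_values:
                bit = "0"
            elif (not infer and value in negative_values
                  and value not in positive_values):
                bit = "0"
        cleaned.append(bit)
    return "".join(cleaned)


old = "111011"          # values a, b, c, e, f active
values = "abcdef"
positives = {"a", "b", "c"}
negatives = {"c", "e"}

cl = clean_discrete(old, values, positives, negatives, infer=True)
cl2 = clean_discrete(old, values, positives, negatives, infer=False)
```

Value f is covered by no example at all: the CL-like variant infers that it can be switched off, whereas the CL2-like variant makes no inference and keeps it.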

Page 50: Post-processing Operators for Decision Lists

Experimental design

- We analysed the operators over final rulesets generated with 35 real-world problems
- 3 stages of experiments
  - Independent operators
  - Combinations between CL and PR
  - Combinations with the SW operator

Questions
- Where are the most significant improvements?
- Are the results significant?
- What about the computational time?

Page 52: Post-processing Operators for Decision Lists

Results of the operators independently

[Figure: % of variation per dataset, in four panels (Atts, Rules, Test_acc, Test_ensemble), for the algorithms CL, CL2, PR and SW.]

Page 53: Post-processing Operators for Decision Lists

Results of combining CL and PR

[Figure: % of variation per dataset, in three panels (Atts, Test_acc, Test_ensemble), with one column for CL and one for CL2, for the algorithms CL-PR, PR-CL, PR-CL-PR, CL2-PR, PR-CL2 and PR-CL2-PR.]

Page 54: Post-processing Operators for Decision Lists

Results of combining CL, PR and SW

[Figure: % of variation per dataset, in four panels (Atts, Rules, Test_acc, Test_ensemble), for the algorithms CL-SW, CL2-SW, PR-SW and PR-CL2-PR-SW.]

Page 55: Post-processing Operators for Decision Lists

Are the results significant?

Table: Rankings of the Friedman statistical tests. * indicates that the algorithm is significantly better (Holm test with 99% confidence).

                 Test acc   Test ensem   # Rules     # Atts
P-Values         0.708      0.962        8.9e-09     2.2e-16

Base             7.80       7.07         3.73        10.84
CL               7.73       7.86         –           10.84
CL2              7.64       7.84         –           10.84
PR               7.57       7.21         –           5.53 *
SW               7.51       6.60         2.59 *      11.30

CL-PR            6.37       7.29         –           3.97 *
PR-CL            6.67       7.31         –           5.53 *
PR-CL-PR         5.87       6.79         –           1.51 *
CL2-PR           6.59       6.79         –           5.81 *
PR-CL2           6.89       7.16         –           5.71 *
PR-CL2-PR        6.36       6.91         –           2.29 *

CL-SW            7.14       6.51         2.07 *      11.23
CL2-SW           7.46       6.83         2.40 *      11.17
PR-SW            6.94       6.29         2.14 *      5.94 *
PR-CL2-PR-SW     6.46       6.54         2.07 *      2.47 *
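The rankings in a Friedman test are average ranks: on each dataset the algorithms are ranked against each other (ties share the mean of their ranks), and the ranks are then averaged over all datasets. A minimal sketch of that computation follows; the function `average_ranks` is a hypothetical helper and the scores are toy values, not results from the talk.

```python
def average_ranks(scores, lower_is_better=True):
    """scores: {algorithm: [value per dataset]} -> {algorithm: mean rank}."""
    names = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = dict.fromkeys(names, 0.0)
    for d in range(n_datasets):
        ordered = sorted(names, key=lambda n: scores[n][d],
                         reverse=not lower_is_better)
        pos = 0
        while pos < len(ordered):
            # Find the block of algorithms tied with the one at `pos`.
            end = pos
            while (end + 1 < len(ordered)
                   and scores[ordered[end + 1]][d] == scores[ordered[pos]][d]):
                end += 1
            shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1 .. end+1
            for name in ordered[pos:end + 1]:
                totals[name] += shared
            pos = end + 1
    return {name: totals[name] / n_datasets for name in names}


# Toy numbers of expressed attributes on three datasets (illustrative only).
toy_scores = {
    "Base": [10, 12, 9],
    "PR": [7, 12, 6],
    "PR-CL-PR": [5, 8, 6],
}
ranks = average_ranks(toy_scores)
```

A lower average rank means the algorithm tended to obtain the better value across datasets, which is how the columns of the table should be read.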

Page 58: Post-processing Operators for Decision Lists

How long does the post-processing take?

Table: Execution time of each of the operators when applied independently

Prob     Ins      Rules            Atts           CL2 (s)          PR (s)           SW (s)
CN-bin   493788   38.20 ± 1.85     7.12 ± 0.73    17.44 ± 0.76     20.52 ± 0.82     157.51 ± 76.42
Adult    43960    194.24 ± 10.26   10.18 ± 2.80   49.87 ± 3.85     69.60 ± 10.22    5855.04 ± 874.14
CN       234638   253.34 ± 12.48   10.09 ± 2.78   314.02 ± 26.01   631.68 ± 70.09   43097.44 ± 5429.48
KDD      444619   188.84 ± 13.52   4.25 ± 2.99    213.95 ± 18.25   375.85 ± 59.00   23791.21 ± 5041.45
C-4      60803    316.14 ± 19.10   9.96 ± 3.23    96.49 ± 8.33     192.21 ± 24.76   18763.03 ± 2614.41
ParMX    235929   394.34 ± 19.39   9.00 ± 0.01    405.77 ± 37.05   619.20 ± 82.02   106343.70 ± 13094.78
SS1      75583    773.26 ± 30.42   11.49 ± 3.40   293.70 ± 23.26   649.51 ± 85.94   133415.03 ± 19160.27

Swapping is very slow... It depends on the number of instances and the number of rules generated.

Page 60: Post-processing Operators for Decision Lists

Summary and next steps

Summary
- The operators manage to reduce the number of rules and expressed attributes by 30% in some cases.

Next steps
- Apply the CL and PR operators during the learning process
- Investigate other measures of similarity among rules
- Apply these operators over other systems
- Different representations
- CUDA accelerated operators?

Page 62: Post-processing Operators for Decision Lists

References I

Bacardit, J., Burke, E., and Krasnogor, N. (2009). Improving the scalability of rule-based evolutionary learning. Memetic Computing, 1(1):55–67.

Bacardit, J. and Krasnogor, N. (2009). A mixed discrete-continuous attribute list representation for large scale classification domains. In GECCO ’09: Proceedings of the 11th Annual conference on Genetic and evolutionary computation, pages 1155–1162, New York, NY, USA. ACM Press.

Franco, M., Krasnogor, N., and Bacardit, J. (2012a). Analysing BioHEL using challenging boolean functions. Evolutionary Intelligence, 5:87–102. doi:10.1007/s12065-012-0080-9.

Franco, M. A., Krasnogor, N., and Bacardit, J. (2010). Speeding up the evaluation of evolutionary learning systems using GPGPUs. In GECCO ’10: Proceedings of the 12th annual conference on Genetic and evolutionary computation, pages 1039–1046, New York, NY, USA. ACM.

Franco, M. A., Krasnogor, N., and Bacardit, J. (2011). Modelling the initialisation stage of the ALKR representation for discrete domains and GABIL encoding. In Proceedings of the 13th annual conference on Genetic and evolutionary computation, GECCO ’11, pages 1291–1298, New York, NY, USA. ACM.

Page 63: Post-processing Operators for Decision Lists

References II

Franco, M. A., Krasnogor, N., and Bacardit, J. (2012b). Postprocessing operators for decision lists. In GECCO ’12: Proceedings of the 14th annual conference comp on Genetic and evolutionary computation, page to appear, New York, NY, USA. ACM Press.

Venturini, G. (1993). SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In Brazdil, P. B., editor, Machine Learning: ECML-93 - Proceedings of the European Conference on Machine Learning, pages 280–296. Springer-Verlag.

Page 64: Post-processing Operators for Decision Lists

Questions or comments?
