Post-processing Operators for Decision Lists

Posted on 07-Jul-2015


Transcript of Post-processing Operators for Decision Lists

BioHEL System | Our approach | Results | Summary

Post-processing Operators for Decision Lists

María A. Franco

Supervisor: Jaume Bacardit
University of Nottingham, UK

ICOS Research Group, School of Computer Science

mxf@cs.nott.ac.uk

June 12, 2012

María A. Franco. University of Nottingham Post-processing Operators for Decision Lists 1 / 29

BioHEL SystemOur approach

ResultsSummary

Motivation

Goal of my PhD project: to enhance evolutionary learning systems based on IRL (BioHEL) so that they work better with large-scale datasets.

How have we been doing this?

Analysing the weaknesses of the system in different domains [Franco et al., 2012a]

Improving the execution time by means of GPGPUs [Franco et al., 2010]

Developing theoretical models that allow us to adapt parameters within the system [Franco et al., 2011]

Improving the quality of the final solutions by means of local search (memetic operators) [Franco et al., 2012b]


Motivation

Goal of this work: to improve the quality of decision lists by means of local search (memetic operators).

Decision lists are a widespread paradigm in rule learning, guided local search and supervised learning.

Examples: Pittsburgh Learning Classifier Systems; rule induction systems in mainstream machine learning (PART, CN2, JRip).


Outline

1 BioHEL
   Attribute List Knowledge Representation
   Structure of the solutions
   What is the problem?

2 Our approach: Post-processing the rules
   Swapping
   Pruning
   Cleaning

3 Results

4 Summary
   Where to go from here?


BioHEL System: Attribute List Knowledge Representation | Structure of the solutions | What is the problem?

Introduction to the BioHEL System

BIOinformatics-oriented Hierarchical Evolutionary Learning (BioHEL) [Bacardit et al., 2009]

BioHEL is an evolutionary learning system that employs the Iterative Rule Learning (IRL) paradigm.

BioHEL was especially designed to cope with large-scale datasets.
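The IRL paradigm can be pictured as a sequential-covering loop: learn one rule, remove the training examples it covers, and repeat until none are left, so the rules come out in the order in which they will later be tried. The sketch below is a minimal illustration under stated assumptions, not BioHEL's code: `learn_one_rule` is a hypothetical stand-in for BioHEL's genetic search (here a trivial greedy learner, `greedy_pure_rule`), rules are dicts of attribute = value conditions, and the list is closed with a default rule predicting the majority class (an assumed default-rule policy).

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Example = Tuple[Dict[str, int], int]   # (attribute values, class)
Rule = Tuple[Dict[str, int], int]      # (conditions, predicted class)


def matches(conds: Dict[str, int], x: Dict[str, int]) -> bool:
    """A rule matches when every condition it expresses holds for x."""
    return all(x[a] == v for a, v in conds.items())


def iterative_rule_learning(
    examples: List[Example],
    learn_one_rule: Callable[[List[Example]], Optional[Rule]],
) -> List[Rule]:
    """Sequential covering: each learned rule removes the examples it covers."""
    default_class = Counter(y for _, y in examples).most_common(1)[0][0]
    remaining = list(examples)
    ruleset: List[Rule] = []
    while remaining:
        rule = learn_one_rule(remaining)
        if rule is None:
            break
        if not any(matches(rule[0], x) for x, _ in remaining):
            break  # a rule that covers nothing cannot make progress
        ruleset.append(rule)
        remaining = [(x, y) for x, y in remaining if not matches(rule[0], x)]
    ruleset.append(({}, default_class))  # empty conditions: assumed default rule
    return ruleset


def greedy_pure_rule(examples: List[Example]) -> Optional[Rule]:
    """Hypothetical toy learner: the single-condition rule covering the most
    examples while all of them share one class (None if no such rule exists)."""
    best: Optional[Tuple[int, Rule]] = None
    for x, _ in examples:
        for att, val in x.items():
            classes = [y for xx, y in examples if xx[att] == val]
            if len(set(classes)) == 1 and (best is None or len(classes) > best[0]):
                best = (len(classes), ({att: val}, classes[0]))
    return None if best is None else best[1]
```

On a toy dataset where the class equals attribute `a`, this loop learns `a = 0 ⇒ 0`, then `a = 1 ⇒ 1`, and appends the default rule.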


Attribute List Knowledge Representation

A meta-representation designed to handle large numbers of discrete and continuous attributes efficiently [Bacardit and Krasnogor, 2009].

ALKR classifier example

[Figure: an ALKR classifier with the fields numAtt, whichAtt, offsetPred, predicates and class]
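To illustrate the meta-representation idea, the sketch below stores only the expressed attributes of a rule, reusing the field names from the example (numAtt, whichAtt, offsetPred, predicates, class). The layout is an assumption made for illustration, not BioHEL's actual code: it covers continuous attributes only, each contributing a [lower, upper] pair to a flat `predicates` array, and `cls` replaces `class`, which is a Python keyword.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AlkrRule:
    """Hypothetical ALKR-style rule over continuous attributes.

    Only the expressed attributes are stored; every other attribute is "don't care".
    """
    which_att: List[int]      # indices of the expressed attributes (whichAtt)
    offset_pred: List[int]    # where each attribute's predicate starts (offsetPred)
    predicates: List[float]   # flat array: [lower, upper] per expressed attribute
    cls: int                  # predicted class

    @property
    def num_att(self) -> int:
        """numAtt: how many attributes this rule expresses."""
        return len(self.which_att)

    def matches(self, instance: Sequence[float]) -> bool:
        # The loop runs over the expressed attributes only, so the cost does
        # not grow with the total number of attributes in the dataset.
        for i in range(self.num_att):
            value = instance[self.which_att[i]]
            off = self.offset_pred[i]
            lower, upper = self.predicates[off], self.predicates[off + 1]
            if not (lower <= value <= upper):
                return False
        return True


# A rule expressing attributes 0 and 2; attribute 1 is ignored when matching.
rule = AlkrRule(which_att=[0, 2], offset_pred=[0, 2],
                predicates=[0.1, 0.3, 0.7, 0.9], cls=1)
```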


Attribute List Knowledge Representation

Discrete attributes: GABIL representation

F1    F2   F3
100   01   1101
ABC   DE   FGHI

F1 = A ∧ F2 = E ∧ F3 = (F ∨ G ∨ I)

Continuous attributes: hyper-rectangle representation

C1 = [0.1, 0.3] ∧ C2 = [0.7, 0.9]
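The GABIL example can be checked mechanically. Each discrete attribute gets one bit per value, a 1 meaning the value is accepted, and the attribute's condition is the disjunction of its active values. The sketch below decodes the bitstring 100 01 1101 and tests instances against it; the helper names `decode_gabil` and `gabil_match` are hypothetical.

```python
from typing import Dict, List


def decode_gabil(bits: Dict[str, str], values: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each attribute to the values whose bit is 1 (a disjunction)."""
    return {
        att: [val for bit, val in zip(bits[att], values[att]) if bit == "1"]
        for att in bits
    }


def gabil_match(rule: Dict[str, List[str]], instance: Dict[str, str]) -> bool:
    """Conjunction over attributes of 'the instance's value is accepted'."""
    return all(instance[att] in accepted for att, accepted in rule.items())


bits = {"F1": "100", "F2": "01", "F3": "1101"}
values = {"F1": "ABC", "F2": "DE", "F3": "FGHI"}
rule = decode_gabil(bits, values)
print(rule)  # {'F1': ['A'], 'F2': ['E'], 'F3': ['F', 'G', 'I']}
```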


Solutions generated by the BioHEL system

Since BioHEL uses IRL [Venturini, 1993], its solutions are hierarchical sets of rules, i.e. decision lists.
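In a decision list, rules are tried in order and the first rule that matches an instance decides its class. This is why the order in which IRL produces the rules matters. Here is a minimal sketch, assuming rules are (conditions, class) pairs over discrete attributes and that the list ends with a default rule with no conditions:

```python
from typing import Dict, List, Tuple

Rule = Tuple[Dict[str, int], int]  # (attribute -> required value, predicted class)


def classify(decision_list: List[Rule], instance: Dict[str, int]) -> int:
    """Return the class of the first rule whose conditions all hold."""
    for conditions, cls in decision_list:
        if all(instance[att] == val for att, val in conditions.items()):
            return cls
    raise ValueError("no rule matched; the list should end with a default rule")


# The specific rule is tried first; the empty default rule catches everything else.
dl = [({"x1": 1, "x3": 0}, 1), ({}, 0)]
```

Placing the default rule first would shadow every later rule, which is the extreme case of rules being in the wrong order.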


How can the rules be improved further?

We encountered the following problems: the rules were learned in the wrong order, which leads to larger rulesets!


How can the rules be improved further?

We encountered the following problems: the rules did not have the correct specificity, i.e. the number of attributes expressed was rather high!

Example problem: class = 1 iff x1 = 1 ∧ x3 = 0

x1x2x3 → class
000 → 0    100 → 1
001 → 0    101 → 0
010 → 0    110 → 1
011 → 0    111 → 0

Good rule: x1 = 1 ∧ x3 = 0

Over-specific rules: x1 = 1 ∧ x2 = 1 ∧ x3 = 0 and x1 = 1 ∧ x2 = 0 ∧ x3 = 0
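Enumerating the 3-bit truth table confirms that the two over-specific rules together cover exactly the instances covered by the single good rule. They describe the same concept, but use one extra rule and one extra expressed attribute each. Here is a small brute-force check, with rules written as hypothetical attribute-to-value dicts:

```python
from itertools import product
from typing import Dict, List


def covers(rule: Dict[str, int], x: Dict[str, int]) -> bool:
    """True when every condition of the rule holds for instance x."""
    return all(x[a] == v for a, v in rule.items())


# All 8 instances of the truth table, in the order 000, 001, ..., 111.
instances = [dict(zip(("x1", "x2", "x3"), bits)) for bits in product((0, 1), repeat=3)]

good = {"x1": 1, "x3": 0}
over_specific: List[Dict[str, int]] = [
    {"x1": 1, "x2": 1, "x3": 0},
    {"x1": 1, "x2": 0, "x3": 0},
]

covered_good = [x for x in instances if covers(good, x)]
covered_over = [x for x in instances if any(covers(r, x) for r in over_specific)]
print(covered_good == covered_over)  # True
```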


Swapping | Pruning | Cleaning

Our approach: Post-processing the rules

Ruleset-wise operators: rule swapping

Rule-wise operators: pruning, cleaning


Rule Swapping

Consists of swapping the order of the rules in the final rulesets. Which rules shall we swap? ⇒ Those that are similar.

Measure of similarity

$$S(i,j) = \frac{Dis}{NA}\cdot\frac{\sum_{k}^{Dis} S_k(i,j)}{\sum_{k}^{Dis} numVals(k)} + \frac{Real}{NA}\sum_{k}^{Real} S_k(i,j) + \frac{Mi}{NA}$$

It measures the overlap between rules.


How does it work?

It helps to erase unnecessary rules.

It does not ensure that the final rule set is minimal.

It has to re-evaluate the rules in the new order in each iteration.
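The swapping loop can be sketched as a greedy search over the order of the final ruleset. This is a hypothetical reconstruction, not the authors' algorithm. Candidate position pairs are ranked by a caller-supplied similarity function; the toy `shared_conditions` is a stand-in for S(i, j), not that measure. A swap is kept only if training accuracy does not drop, and the whole list is re-evaluated after every swap, which is why the operator is expensive. After reordering, rules that no training instance reaches any more are erased.

```python
from typing import Callable, Dict, List, Tuple

Rule = Tuple[Dict[str, int], int]      # (conditions, class); {} marks the default rule
Example = Tuple[Dict[str, int], int]   # (attribute values, class)


def first_match(rules: List[Rule], x: Dict[str, int]) -> int:
    """Index of the first rule whose conditions all hold for x (-1 if none)."""
    for i, (conds, _) in enumerate(rules):
        if all(x[a] == v for a, v in conds.items()):
            return i
    return -1


def accuracy(rules: List[Rule], data: List[Example]) -> float:
    """Training accuracy of the decision list (first matching rule decides)."""
    hits = 0
    for x, y in data:
        i = first_match(rules, x)
        if i >= 0 and rules[i][1] == y:
            hits += 1
    return hits / len(data)


def shared_conditions(r1: Rule, r2: Rule) -> float:
    """Hypothetical stand-in for S(i, j): number of identical attribute = value conditions."""
    return float(len(set(r1[0].items()) & set(r2[0].items())))


def swap_and_erase(rules: List[Rule], data: List[Example],
                   similarity: Callable[[Rule, Rule], float]) -> List[Rule]:
    rules = list(rules)
    best = accuracy(rules, data)
    n = len(rules) - 1  # the last (default) rule is never moved
    # Candidate position pairs, most similar rules first (ranked once, up front).
    pairs = sorted(((i, j) for i in range(n) for j in range(i + 1, n)),
                   key=lambda p: similarity(rules[p[0]], rules[p[1]]), reverse=True)
    for i, j in pairs:
        candidate = list(rules)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        acc = accuracy(candidate, data)  # the whole list is re-evaluated in the new order
        if acc >= best:                  # ties are accepted: they may free rules to erase
            rules, best = candidate, acc
    # Erase non-default rules that no training instance reaches any more.
    used = {first_match(rules, x) for x, _ in data}
    last = len(rules) - 1
    return [r for k, r in enumerate(rules) if k in used or k == last]
```

For example, with the list [x1 = 1 ∧ x2 = 1 ⇒ 1, x1 = 1 ⇒ 1, default ⇒ 0] on data where the class equals x1, moving the general rule first leaves the specific one unreachable, so it is erased. As the slide warns, nothing here guarantees that the result is minimal.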


Rule pruning

Drops attributes that do not affect the accuracy of the rules.

Example: in the problem x1 = 1 ∧ x3 = 0, dropping x2 from either over-specific rule (x1 = 1 ∧ x2 = 1 ∧ x3 = 0 or x1 = 1 ∧ x2 = 0 ∧ x3 = 0) does not change its accuracy and yields the good rule x1 = 1 ∧ x3 = 0.
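Rule-wise pruning can be sketched as a greedy pass over the expressed attributes. Each attribute is tentatively dropped, and the removal is kept if the rule's accuracy on the training instances it covers does not decrease. This is an illustrative sketch with two assumptions: rules are stored as attribute-to-value dicts, and accuracy is measured on the rule alone, ignoring its position in the decision list. Applied to either over-specific rule of the example, it recovers x1 = 1 ∧ x3 = 0.

```python
from itertools import product
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, int], int]


def rule_accuracy(conds: Dict[str, int], cls: int, data: List[Example]) -> float:
    """Fraction of the instances covered by the rule that belong to its class."""
    covered = [y for x, y in data if all(x[a] == v for a, v in conds.items())]
    return sum(y == cls for y in covered) / len(covered) if covered else 0.0


def prune(conds: Dict[str, int], cls: int, data: List[Example]) -> Dict[str, int]:
    """Greedily drop attributes whose removal does not lower the rule's accuracy."""
    conds = dict(conds)
    base = rule_accuracy(conds, cls, data)
    for att in list(conds):  # iterate over a snapshot because conds is replaced
        trial = {a: v for a, v in conds.items() if a != att}
        acc = rule_accuracy(trial, cls, data)
        if acc >= base:
            conds, base = trial, acc
    return conds


# Truth table of the example problem: class 1 iff x1 = 1 and x3 = 0.
data = [
    (dict(zip(("x1", "x2", "x3"), bits)), int(bits[0] == 1 and bits[2] == 0))
    for bits in product((0, 1), repeat=3)
]
print(prune({"x1": 1, "x2": 1, "x3": 0}, 1, data))  # {'x1': 1, 'x3': 0}
```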


Our approach: Post-processing the rules

Ruleset-wise operators: rule swapping

Rule-wise operators: pruning ⇒ Wait! This does not work if the other attributes are not correctly specified! ⇒ cleaning


Rule cleaning

In χ-ary domains it is not always possible to drop attributes if the correct attributes are misaligned.

Example problem: x1 nominal {a, b, c, d, e}; x2 nominal {w, y, z}; x3 nominal {m, n}

Rule 1: x1 = (a ∨ b) ∧ x2 = w

Generated rule: x1 = (a ∨ b ∨ c) ∧ x2 = w ∧ x3 = m

We need to deactivate individual literals within the attributes (here, the value c of x1).

María A. Franco. University of Nottingham Post-processing Operators for Decision Lists 18 / 29

BioHEL SystemOur approach

ResultsSummary

SwappingPruningCleaning

How does it work?

Cleaning approaches:
CL: focus on the positives
CL2: do not infer

Continuous attributes:
[Figure: an interval (OLD) shrunk by CL and CL2 around the covered positive (+) and negative (-) examples]

Discrete attributes:
Values covered by positive examples: a, b, c
Values covered by negative examples: c, e
      a b c d e f
OLD:  1 1 1 0 1 1
CL:   1 1 1 0 0 0
CL2:  1 1 1 0 0 1
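For a discrete attribute, the two cleaning policies reduce to set operations on the rule's active values. The sketch below assumes that "values covered by positive/negative examples" means the values taken in that attribute by the training instances that the rule matches and that do/do not belong to its class. CL keeps only the active values seen in positives. CL2 removes only the values seen exclusively in negatives and leaves unseen values (f in the example) untouched. The function names are hypothetical.

```python
from typing import List, Set


def clean_cl(active: List[int], values: str, pos: Set[str]) -> List[int]:
    """CL (focus on the positives): keep an active value only if a positive example uses it."""
    return [int(bit == 1 and v in pos) for bit, v in zip(active, values)]


def clean_cl2(active: List[int], values: str, pos: Set[str], neg: Set[str]) -> List[int]:
    """CL2 (do not infer): drop a value only if negatives use it and no positive does."""
    return [int(bit == 1 and not (v in neg and v not in pos))
            for bit, v in zip(active, values)]


old = [1, 1, 1, 0, 1, 1]  # OLD: active values over a..f
print(clean_cl(old, "abcdef", {"a", "b", "c"}))               # [1, 1, 1, 0, 0, 0]
print(clean_cl2(old, "abcdef", {"a", "b", "c"}, {"c", "e"}))  # [1, 1, 1, 0, 0, 1]
```

Value c is kept by both policies because positives use it, even though negatives do too.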


Experimental design

We analysed the operators on the final rulesets generated for 35 real-world problems, in 3 stages of experiments:

1. Independent operators
2. Combinations of CL and PR
3. Combinations with the SW operator

Questions

Where are the most significant improvements?

Are the results significant?

What about the computational time?


Results of the operators independently

[Figure: % of variation per dataset in Atts, Rules, Test_acc and Test_ensemble for the CL, CL2, PR and SW operators]


Results of combining CL and PR

[Figure: % of variation per dataset in Atts, Test_acc and Test_ensemble for CL-PR, PR-CL and PR-CL-PR (CL panel) and CL2-PR, PR-CL2 and PR-CL2-PR (CL2 panel)]


Results of combining CL, PR and SW

[Figure: % of variation per dataset in Atts, Rules, Test_acc and Test_ensemble for CL-SW, CL2-SW, PR-SW and PR-CL2-PR-SW]


Are the results significant?

Table: Rankings of the Friedman statistical tests. * indicates that the algorithm is significantly better (Holm test with 99% confidence).

Algorithm      Test acc  Test ensemble  # Rules   # Atts
P-values       0.708     0.962          8.9e-09   2.2e-16
Base           7.80      7.07           3.73      10.84
CL             7.73      7.86           –         10.84
CL2            7.64      7.84           –         10.84
PR             7.57      7.21           –         5.53 *
SW             7.51      6.60           2.59 *    11.30
CL-PR          6.37      7.29           –         3.97 *
PR-CL          6.67      7.31           –         5.53 *
PR-CL-PR       5.87      6.79           –         1.51 *
CL2-PR         6.59      6.79           –         5.81 *
PR-CL2         6.89      7.16           –         5.71 *
PR-CL2-PR      6.36      6.91           –         2.29 *
CL-SW          7.14      6.51           2.07 *    11.23
CL2-SW         7.46      6.83           2.40 *    11.17
PR-SW          6.94      6.29           2.14 *    5.94 *
PR-CL2-PR-SW   6.46      6.54           2.07 *    2.47 *
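The values in the table are average Friedman ranks. On every dataset the algorithms are ranked (1 = best, with tied algorithms sharing the mean of their positions), and these ranks are averaged over datasets, so lower is better. The sketch below shows only that ranking step. The scores in the usage line are made up purely to exercise the code and are not results from this study. The test statistic and the Holm post-hoc procedure are left to a statistics package.

```python
from typing import Dict, List


def average_ranks(scores: Dict[str, List[float]],
                  higher_is_better: bool = True) -> Dict[str, float]:
    """Average rank of each algorithm over datasets (rank 1 = best, ties averaged).

    scores[alg][d] is the score of algorithm `alg` on dataset d; use
    higher_is_better=False for measures such as # Rules or # Atts.
    """
    algs = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {a: 0.0 for a in algs}
    for d in range(n_datasets):
        ordered = sorted(algs, key=lambda a: scores[a][d], reverse=higher_is_better)
        pos = 0
        while pos < len(ordered):
            # Extend `end` over the algorithms tied with ordered[pos] on this dataset.
            end = pos
            while end + 1 < len(ordered) and scores[ordered[end + 1]][d] == scores[ordered[pos]][d]:
                end += 1
            shared = (pos + 1 + end + 1) / 2  # mean of ranks pos+1 .. end+1
            for a in ordered[pos:end + 1]:
                totals[a] += shared
            pos = end + 1
    return {a: totals[a] / n_datasets for a in algs}


# Made-up accuracies of three algorithms on two datasets (B ties A on the second).
example = average_ranks({"A": [0.90, 0.80], "B": [0.85, 0.80], "C": [0.70, 0.60]})
```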


How long does the post-processing take?

Table: Execution time of each operator applied independently

Prob    Ins      Rules            Atts           CL2 (s)          PR (s)          SW (s)
CN-bin  493788   38.20 ± 1.85     7.12 ± 0.73    17.44 ± 0.76     20.52 ± 0.82    157.51 ± 76.42
Adult   43960    194.24 ± 10.26   10.18 ± 2.80   49.87 ± 3.85     69.60 ± 10.22   5855.04 ± 874.14
CN      234638   253.34 ± 12.48   10.09 ± 2.78   314.02 ± 26.01   631.68 ± 70.09  43097.44 ± 5429.48
KDD     444619   188.84 ± 13.52   4.25 ± 2.99    213.95 ± 18.25   375.85 ± 59.00  23791.21 ± 5041.45
C-4     60803    316.14 ± 19.10   9.96 ± 3.23    96.49 ± 8.33     192.21 ± 24.76  18763.03 ± 2614.41
ParMX   235929   394.34 ± 19.39   9.00 ± 0.01    405.77 ± 37.05   619.20 ± 82.02  106343.70 ± 13094.78
SS1     75583    773.26 ± 30.42   11.49 ± 3.40   293.70 ± 23.26   649.51 ± 85.94  133415.03 ± 19160.27

Swapping is very slow: its cost depends on the number of instances and on the number of rules generated. On SS1 (about 773 rules) it takes 133415.03 s, more than 37 hours, compared with 293.70 s for CL2.


Where to go from here?

Summary and next steps

Summary: the operators reduce the number of rules and expressed attributes by up to 30% in some cases.

Next steps:

Apply the CL and PR operators during the learning process

Investigate other measures of similarity among rules

Apply these operators to other systems

Different representations

CUDA-accelerated operators?


References

Bacardit, J., Burke, E., and Krasnogor, N. (2009). Improving the scalability of rule-based evolutionary learning. Memetic Computing, 1(1):55–67.

Bacardit, J. and Krasnogor, N. (2009). A mixed discrete-continuous attribute list representation for large scale classification domains. In GECCO ’09: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pages 1155–1162, New York, NY, USA. ACM Press.

Franco, M. A., Krasnogor, N., and Bacardit, J. (2012a). Analysing BioHEL using challenging boolean functions. Evolutionary Intelligence, 5:87–102. doi:10.1007/s12065-012-0080-9.

Franco, M. A., Krasnogor, N., and Bacardit, J. (2010). Speeding up the evaluation of evolutionary learning systems using GPGPUs. In GECCO ’10: Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, pages 1039–1046, New York, NY, USA. ACM.

Franco, M. A., Krasnogor, N., and Bacardit, J. (2011). Modelling the initialisation stage of the ALKR representation for discrete domains and GABIL encoding. In GECCO ’11: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation, pages 1291–1298, New York, NY, USA. ACM.


Franco, M. A., Krasnogor, N., and Bacardit, J. (2012b). Postprocessing operators for decision lists. In GECCO ’12: Proceedings of the 14th Annual Conference on Genetic and Evolutionary Computation, to appear, New York, NY, USA. ACM Press.

Venturini, G. (1993). SIA: a supervised inductive algorithm with genetic search for learning attributes based concepts. In Brazdil, P. B., editor, Machine Learning: ECML-93, Proceedings of the European Conference on Machine Learning, pages 280–296. Springer-Verlag.


Questions or comments?
