Advanced classification methods - unipi.itdidawiki.cli.di.unipi.it/lib/exe/fetch.php/dm/le... ·...


Page 1

Advanced classification methods

Page 2

Ensemble Methods

•  Construct a set of classifiers from the training data

•  Predict the class label of previously unseen records by aggregating predictions made by multiple classifiers

Page 3

General Idea

[Figure: the original training data D is split into multiple data sets D1, D2, ..., Dt-1, Dt (Step 1: Create Multiple Data Sets); one classifier C1, C2, ..., Ct-1, Ct is built per data set (Step 2: Build Multiple Classifiers); the classifiers are combined into C* (Step 3: Combine Classifiers).]

Page 4

Why does it work?

•  Suppose there are 25 base classifiers
    –  Each classifier has error rate ε = 0.35
    –  Assume the classifiers are independent
    –  Probability that the ensemble classifier makes a wrong prediction (under majority voting, at least 13 of the 25 base classifiers must be wrong):

$$\sum_{i=13}^{25} \binom{25}{i} \varepsilon^{i} (1-\varepsilon)^{25-i} = 0.06$$
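The binomial sum for the ensemble error can be checked numerically. The following is a minimal sketch, assuming a simple majority vote over independent base classifiers; the function name `ensemble_error` is illustrative and does not come from the slides.

```python
from math import comb

def ensemble_error(n_classifiers: int, eps: float) -> float:
    """Probability that a majority of independent base classifiers are wrong."""
    majority = n_classifiers // 2 + 1  # 13 when there are 25 classifiers
    return sum(
        comb(n_classifiers, i) * eps**i * (1 - eps) ** (n_classifiers - i)
        for i in range(majority, n_classifiers + 1)
    )

error_25 = ensemble_error(25, 0.35)
print(f"ensemble error: {error_25:.4f}")
```

The result should land close to the 0.06 quoted on the slide, well below the 0.35 error of each base classifier. The gain depends on the independence assumption, which rarely holds exactly in practice.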

Page 5

Examples of Ensemble Methods

•  How to generate an ensemble of classifiers?
    –  Bagging
    –  Boosting

Page 6

Bagging

•  Sampling with replacement

•  Build a classifier on each bootstrap sample

•  Each record has probability 1 – (1 – 1/n)^n of being selected in a bootstrap sample of size n (about 0.632 for large n)

Original Data       1   2   3   4   5   6   7   8   9   10
Bagging (Round 1)   7   8   10  8   2   5   10  10  5   9
Bagging (Round 2)   1   4   9   1   2   3   2   7   3   2
Bagging (Round 3)   1   8   5   10  5   5   9   6   3   7
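Bootstrap sampling as used in bagging can be sketched in a few lines. Since (1 – 1/n)^n is the chance that a given record is never drawn in n draws, the chance that it appears at least once is 1 – (1 – 1/n)^n. The helper names and the seed below are assumptions made for illustration only.

```python
import random

def bootstrap_sample(records, rng):
    """Draw len(records) items uniformly at random, with replacement."""
    return [rng.choice(records) for _ in records]

def inclusion_probability(n: int) -> float:
    """Chance that a given record appears at least once in a bootstrap sample of size n."""
    return 1 - (1 - 1 / n) ** n

rng = random.Random(0)  # arbitrary fixed seed so the run is repeatable
data = list(range(1, 11))
rounds = [bootstrap_sample(data, rng) for _ in range(3)]
for number, sample in enumerate(rounds, start=1):
    print(f"Bagging (Round {number}):", sample)
print(f"inclusion probability for n=10: {inclusion_probability(10):.4f}")
```

Each round typically repeats some records and omits others, as in the table on the slide.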

Page 7

Boosting

•  An iterative procedure to adaptively change the distribution of training data by focusing more on previously misclassified records
    –  Initially, all N records are assigned equal weights
    –  Unlike bagging, weights may change at the end of each boosting round

Page 8

Boosting

•  Records that are wrongly classified will have their weights increased

•  Records that are classified correctly will have their weights decreased

Original Data        1   2   3   4   5   6   7   8   9   10
Boosting (Round 1)   7   3   2   8   7   9   4   10  6   3
Boosting (Round 2)   5   4   9   4   2   5   1   7   4   2
Boosting (Round 3)   4   4   8   10  4   5   4   6   3   4

•  Example 4 is hard to classify

•  Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds

Page 9

Example: AdaBoost

•  Base classifiers: C1, C2, …, CT

•  Error rate:

$$\varepsilon_i = \frac{1}{N} \sum_{j=1}^{N} w_j \, \delta\big(C_i(x_j) \neq y_j\big)$$

•  Importance of a classifier:

$$\alpha_i = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_i}{\varepsilon_i}\right)$$

Page 10

Example: AdaBoost

•  Weight update (here j indexes the boosting round and i the record):

$$w_i^{(j+1)} = \frac{w_i^{(j)}}{Z_j} \times \begin{cases} \exp(-\alpha_j) & \text{if } C_j(x_i) = y_i \\ \exp(\alpha_j) & \text{if } C_j(x_i) \neq y_i \end{cases}$$

where $Z_j$ is the normalization factor

•  If any intermediate round produces an error rate higher than 50%, the weights are reverted back to 1/n and the resampling procedure is repeated

•  Classification:

$$C^*(x) = \arg\max_{y} \sum_{j=1}^{T} \alpha_j \, \delta\big(C_j(x) = y\big)$$
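The error rate, importance, weight update, and final vote can be put together in a small sketch. It makes several assumptions that are not in the slides: labels are -1/+1, the base classifiers are one-dimensional decision stumps, records are reweighted rather than resampled, and the weights are kept normalized to sum to 1 so the weighted error is a plain weighted sum. The ten-point data set and all function names are hypothetical.

```python
import math

def stump(threshold, polarity, x):
    """Decision stump: predict `polarity` at or below the threshold, the opposite above."""
    return polarity if x <= threshold else -polarity

def best_stump(xs, ys, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    values = sorted(set(xs))
    best = None
    for threshold in [values[0] - 1.0] + values:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump(threshold, polarity, xi) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n                        # initially all records have equal weight
    ensemble = []                            # (alpha, threshold, polarity) per round
    for _ in range(rounds):
        eps, threshold, polarity = best_stump(xs, ys, w)
        if eps > 0.5:                        # revert the weights to 1/n and try again
            w = [1.0 / n] * n
            continue
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, threshold, polarity))
        # Decrease the weight of correctly classified records, increase the others.
        w = [wi * math.exp(-alpha if stump(threshold, polarity, xi) == yi else alpha)
             for xi, yi, wi in zip(xs, ys, w)]
        z = sum(w)                           # normalization factor Z_j
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    """With labels in {-1, +1}, the weighted vote is the sign of the alpha-weighted sum."""
    score = sum(alpha * stump(t, pol, x) for alpha, t, pol in ensemble)
    return 1 if score >= 0 else -1

xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
ys = [1, 1, 1, -1, -1, -1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
predictions = [predict(model, x) for x in xs]
print(predictions == ys)
```

No single stump separates this data, yet the weighted vote of three stumps does. Because this sketch reweights instead of resampling, its α values will generally differ from those in a resampling-based run.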

Page 11

Illustrating AdaBoost

Data points for training (Original Data):   + + + - - - - - + +
Initial weights for each data point (three shown):   0.1   0.1   0.1

Boosting Round 1 (B1):   + + + - - - - - - -
Weights shown:   0.0094   0.0094   0.4623
α = 1.9459

Page 12

Illustrating AdaBoost

                         Predictions            Weights shown              α
Boosting Round 1 (B1)    + + + - - - - - - -    0.0094  0.0094  0.4623     1.9459
Boosting Round 2 (B2)    - - - - - - - - + +    0.3037  0.0009  0.0422     2.9323
Boosting Round 3 (B3)    + + + + + + + + + +    0.0276  0.1819  0.0038     3.8744
Overall                  + + + - - - - - + +

Page 13

Rule-Based Classifier

•  Classify records by using a collection of "if…then…" rules

•  Rule: (Condition) → y
    –  where
        •  Condition is a conjunction of attributes
        •  y is the class label
    –  LHS: rule antecedent or condition
    –  RHS: rule consequent
    –  Examples of classification rules:
        •  (Blood Type = Warm) ∧ (Lay Eggs = Yes) → Birds
        •  (Taxable Income < 50K) ∧ (Refund = Yes) → Evade = No

Page 14

Rule-based Classifier (Example)

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
human           warm        yes         no       no             mammals
python          cold        no          no       no             reptiles
salmon          cold        no          no       yes            fishes
whale           warm        yes         no       yes            mammals
frog            cold        no          no       sometimes      amphibians
komodo          cold        no          no       no             reptiles
bat             warm        yes         yes      no             mammals
pigeon          warm        no          yes      no             birds
cat             warm        yes         no       no             mammals
leopard shark   cold        yes         no       yes            fishes
turtle          cold        no          no       sometimes      reptiles
penguin         warm        no          no       sometimes      birds
porcupine       warm        yes         no       no             mammals
eel             cold        no          no       yes            fishes
salamander      cold        no          no       sometimes      amphibians
gila monster    cold        no          no       no             reptiles
platypus        warm        no          no       no             mammals
owl             warm        no          yes      no             birds
dolphin         warm        yes         no       yes            mammals
eagle           warm        no          yes      no             birds

Page 15

Application of Rule-Based Classifier

•  A rule r covers an instance x if the attributes of the instance satisfy the condition of the rule

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

The rule R1 covers a hawk => Bird
The rule R3 covers the grizzly bear => Mammal

Name           Blood Type  Give Birth  Can Fly  Live in Water  Class
hawk           warm        no          yes      no             ?
grizzly bear   warm        yes         no       no             ?
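The notion of a rule covering an instance can be made concrete with a short sketch. Representing each rule as a dictionary of attribute/value conditions is an assumption made here for illustration; `covers` and `triggered` are hypothetical helper names.

```python
# Each rule: (name, conditions that must all hold, predicted class).
RULES = [
    ("R1", {"Give Birth": "no", "Can Fly": "yes"}, "Birds"),
    ("R2", {"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),
    ("R3", {"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),
    ("R4", {"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),
    ("R5", {"Live in Water": "sometimes"}, "Amphibians"),
]

def covers(conditions, instance):
    """A rule covers an instance if every condition in its antecedent is satisfied."""
    return all(instance.get(attr) == value for attr, value in conditions.items())

def triggered(instance):
    """Names of all rules whose antecedent the instance satisfies."""
    return [name for name, conditions, _ in RULES if covers(conditions, instance)]

hawk = {"Blood Type": "warm", "Give Birth": "no", "Can Fly": "yes", "Live in Water": "no"}
grizzly_bear = {"Blood Type": "warm", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "no"}

print(triggered(hawk))
print(triggered(grizzly_bear))
```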

Page 16

Rule Coverage and Accuracy

•  Coverage of a rule:
    –  Fraction of records that satisfy the antecedent of a rule

•  Accuracy of a rule:
    –  Fraction of the records satisfying the antecedent that also satisfy the consequent of the rule

Tid  Refund  Marital Status  Taxable Income  Class
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

(Status = Single) → No

Coverage = 40%, Accuracy = 50%
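The coverage and accuracy figures for (Status = Single) → No can be reproduced from the ten records. This is a minimal sketch; the tuple layout and the function name `coverage_and_accuracy` are assumptions for illustration.

```python
# (Tid, Refund, Marital Status, Taxable Income in K, Class)
RECORDS = [
    (1, "Yes", "Single", 125, "No"),
    (2, "No", "Married", 100, "No"),
    (3, "No", "Single", 70, "No"),
    (4, "Yes", "Married", 120, "No"),
    (5, "No", "Divorced", 95, "Yes"),
    (6, "No", "Married", 60, "No"),
    (7, "Yes", "Divorced", 220, "No"),
    (8, "No", "Single", 85, "Yes"),
    (9, "No", "Married", 75, "No"),
    (10, "No", "Single", 90, "Yes"),
]

def coverage_and_accuracy(records, antecedent, consequent):
    """Coverage = |covered| / |records|; accuracy = |covered and correct| / |covered|."""
    covered = [r for r in records if antecedent(r)]
    correct = [r for r in covered if r[4] == consequent]
    coverage = len(covered) / len(records)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

coverage, accuracy = coverage_and_accuracy(RECORDS, lambda r: r[2] == "Single", "No")
print(f"coverage={coverage:.0%}, accuracy={accuracy:.0%}")
```

Four of the ten records are Single, and two of those four have class No, which gives the 40% and 50% on the slide.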

Page 17

How does Rule-based Classifier Work?

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

A lemur triggers rule R3, so it is classified as a mammal
A turtle triggers both R4 and R5
A dogfish shark triggers none of the rules

Name            Blood Type  Give Birth  Can Fly  Live in Water  Class
lemur           warm        yes         no       no             ?
turtle          cold        no          no       sometimes      ?
dogfish shark   cold        yes         no       yes            ?

Page 18

Characteristics of Rule-Based Classifier

•  Mutually exclusive rules
    –  Classifier contains mutually exclusive rules if the rules are independent of each other
    –  Every record is covered by at most one rule

•  Exhaustive rules
    –  Classifier has exhaustive coverage if it accounts for every possible combination of attribute values
    –  Each record is covered by at least one rule

Page 19

From Decision Trees To Rules

[Figure: decision tree]
Refund?
    Yes → NO
    No  → Marital Status?
        {Married} → NO
        {Single, Divorced} → Taxable Income?
            < 80K → NO
            > 80K → YES

Classification Rules

(Refund=Yes) ==> No

(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No

(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes

(Refund=No, Marital Status={Married}) ==> No

Rules are mutually exclusive and exhaustive

Rule set contains as much information as the tree

Page 20

Rules Can Be Simplified

[Figure: decision tree]
Refund?
    Yes → NO
    No  → Marital Status?
        {Married} → NO
        {Single, Divorced} → Taxable Income?
            < 80K → NO
            > 80K → YES

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Initial Rule: (Refund = No) ∧ (Status = Married) → No

Simplified Rule: (Status = Married) → No

Page 21

Effect of Rule Simplification

•  Rules are no longer mutually exclusive
    –  A record may trigger more than one rule
    –  Solution?
        •  Ordered rule set
        •  Unordered rule set – use voting schemes

•  Rules are no longer exhaustive
    –  A record may not trigger any rules
    –  Solution?
        •  Use a default class

Page 22

Ordered Rule Set

•  Rules are rank ordered according to their priority
    –  An ordered rule set is known as a decision list

•  When a test record is presented to the classifier
    –  It is assigned to the class label of the highest ranked rule it has triggered
    –  If none of the rules fired, it is assigned to the default class

R1: (Give Birth = no) ∧ (Can Fly = yes) → Birds
R2: (Give Birth = no) ∧ (Live in Water = yes) → Fishes
R3: (Give Birth = yes) ∧ (Blood Type = warm) → Mammals
R4: (Give Birth = no) ∧ (Can Fly = no) → Reptiles
R5: (Live in Water = sometimes) → Amphibians

Name     Blood Type  Give Birth  Can Fly  Live in Water  Class
turtle   cold        no          no       sometimes      ?
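A decision list can be sketched as a loop that returns the class of the first rule that fires and falls back to a default class otherwise. The rule representation and the default label "Unknown" are assumptions for illustration; the slides do not name a default class for this rule set.

```python
# Rules in priority order: (conditions that must all hold, predicted class).
DECISION_LIST = [
    ({"Give Birth": "no", "Can Fly": "yes"}, "Birds"),            # R1
    ({"Give Birth": "no", "Live in Water": "yes"}, "Fishes"),     # R2
    ({"Give Birth": "yes", "Blood Type": "warm"}, "Mammals"),     # R3
    ({"Give Birth": "no", "Can Fly": "no"}, "Reptiles"),          # R4
    ({"Live in Water": "sometimes"}, "Amphibians"),               # R5
]

def classify(instance, rules=DECISION_LIST, default="Unknown"):
    """Return the class of the highest-ranked rule that fires, else the default class."""
    for conditions, label in rules:
        if all(instance.get(attr) == value for attr, value in conditions.items()):
            return label
    return default

turtle = {"Blood Type": "cold", "Give Birth": "no", "Can Fly": "no", "Live in Water": "sometimes"}
dogfish_shark = {"Blood Type": "cold", "Give Birth": "yes", "Can Fly": "no", "Live in Water": "yes"}

print(classify(turtle))
print(classify(dogfish_shark))
```

The turtle satisfies both R4 and R5, but R4 ranks higher, so the ordering resolves the conflict in favour of Reptiles. The dogfish shark fires no rule and receives the default class.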

Page 23

Rule Ordering Schemes

•  Rule-based ordering
    –  Individual rules are ranked based on their quality

•  Class-based ordering
    –  Rules that belong to the same class appear together

Rule-based Ordering

(Refund=Yes) ==> No

(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No

(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes

(Refund=No, Marital Status={Married}) ==> No

Class-based Ordering

(Refund=Yes) ==> No

(Refund=No, Marital Status={Single,Divorced}, Taxable Income<80K) ==> No

(Refund=No, Marital Status={Married}) ==> No

(Refund=No, Marital Status={Single,Divorced}, Taxable Income>80K) ==> Yes

Page 24

Building Classification Rules

•  Direct Method:
    •  Extract rules directly from data
    •  e.g.: RIPPER, CN2, Holte's 1R

•  Indirect Method:
    •  Extract rules from other classification models (e.g. decision trees, neural networks, etc.)
    •  e.g.: C4.5rules

Page 25

Direct Method: Sequential Covering

1.  Start from an empty rule
2.  Grow a rule using the Learn-One-Rule function
3.  Remove training records covered by the rule
4.  Repeat Steps (2) and (3) until the stopping criterion is met
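The four steps can be sketched as follows. This is only one possible instantiation: the greedy `learn_one_rule` below scores candidate conjuncts by plain accuracy and stops when the rule covers no negatives or no attribute is left, and the loop stops when no positive records remain. Those choices, the function names, and the reduced two-attribute data set are all assumptions for illustration.

```python
def covers(conditions, record):
    """True if the record satisfies every attribute = value conjunct."""
    return all(record[attr] == value for attr, value in conditions.items())

def learn_one_rule(records, target, attributes):
    """Grow one rule general-to-specific, adding the most accurate conjunct each time."""
    conditions, covered = {}, records
    while any(r["class"] != target for r in covered):
        best = None
        for attr in attributes:
            if attr in conditions:
                continue
            for value in sorted({r[attr] for r in covered}):
                subset = [r for r in covered if r[attr] == value]
                positives = sum(r["class"] == target for r in subset)
                if positives == 0:
                    continue
                score = (positives / len(subset), positives)  # accuracy, then support
                if best is None or score > best[0]:
                    best = (score, attr, value, subset)
        if best is None:          # no attribute left to add
            break
        _, attr, value, covered = best
        conditions[attr] = value
    return conditions

def sequential_covering(records, target, attributes):
    rules, remaining = [], list(records)
    while any(r["class"] == target for r in remaining):       # stopping criterion
        conditions = learn_one_rule(remaining, target, attributes)   # step 2
        rules.append(conditions)
        remaining = [r for r in remaining if not covers(conditions, r)]  # step 3
    return rules

rows = [("Yes", "Single", "No"), ("No", "Married", "No"), ("No", "Single", "No"),
        ("Yes", "Married", "No"), ("No", "Divorced", "Yes"), ("No", "Married", "No"),
        ("Yes", "Divorced", "No"), ("No", "Single", "Yes"), ("No", "Married", "No"),
        ("No", "Single", "Yes")]
data = [{"Refund": a, "Status": b, "class": c} for a, b, c in rows]
rules = sequential_covering(data, "Yes", ["Refund", "Status"])
print(rules)
```

Every learned rule covers at least one remaining positive record, so the set of uncovered records shrinks on each pass and the loop terminates.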

Page 26

Example of Sequential Covering

[Figure: (i) Original Data; (ii) Step 1]

Page 27

Example of Sequential Covering…

[Figure: (iii) Step 2, showing rule R1; (iv) Step 3, showing rules R1 and R2]

Page 28

Aspects of Sequential Covering

•  Rule Growing

•  Instance Elimination

•  Rule Evaluation

•  Stopping Criterion

•  Rule Pruning

Page 29

Rule Growing

•  Two common strategies

[Figure (a) General-to-specific: starting from the empty rule { } (Yes: 3, No: 4), candidate conjuncts Refund = No, Status = Single, Status = Divorced, Status = Married, Income > 80K, ... are added, each annotated with the class counts of the records it covers. Counts shown in the figure: Yes: 0 / No: 3; Yes: 3 / No: 4; Yes: 2 / No: 1; Yes: 1 / No: 0; Yes: 3 / No: 1.]

[Figure (b) Specific-to-general: the rules (Refund = No, Status = Single, Income = 85K) → Class = Yes and (Refund = No, Status = Single, Income = 90K) → Class = Yes are generalized to (Refund = No, Status = Single) → Class = Yes.]

Page 30

Rule Growing (Examples)

•  CN2 Algorithm:
    –  Start from an empty conjunct: {}
    –  Add conjuncts that minimize the entropy measure: {A}, {A,B}, …
    –  Determine the rule consequent by taking the majority class of instances covered by the rule

•  RIPPER Algorithm:
    –  Start from an empty rule: {} => class
    –  Add conjuncts that maximize FOIL's information gain measure:
        •  R0: {} => class (initial rule)
        •  R1: {A} => class (rule after adding conjunct)
        •  Gain(R0, R1) = t [ log(p1/(p1+n1)) – log(p0/(p0+n0)) ]
        •  where
            t: number of positive instances covered by both R0 and R1
            p0: number of positive instances covered by R0
            n0: number of negative instances covered by R0
            p1: number of positive instances covered by R1
            n1: number of negative instances covered by R1
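FOIL's information gain is straightforward to compute once the counts are known. The sketch below assumes base-2 logarithms (the slide writes only "log") and, when `t` is omitted, assumes R1 specializes R0 so that t equals p1. The counts in the example are hypothetical.

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t=None):
    """Gain(R0, R1) = t * [log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0))].

    Requires p0 > 0 and p1 > 0. If t is not given, it is taken to be p1,
    which holds when R1 is R0 with one extra conjunct (an assumption).
    """
    if t is None:
        t = p1
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# Hypothetical counts: R0 covers 3 positives and 4 negatives; after adding
# a conjunct, R1 still covers the 3 positives but only 1 negative.
gain = foil_gain(p0=3, n0=4, p1=3, n1=1)
print(f"FOIL gain: {gain:.3f}")
```

The factor t rewards conjuncts that keep many positive instances, so a conjunct that is perfectly pure but covers a single positive can score lower than a slightly impure one with broad coverage.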

Page 31

Instance Elimination

•  Why do we need to eliminate instances?
    –  Otherwise, the next rule is identical to the previous rule
•  Why do we remove positive instances?
    –  Ensure that the next rule is different
•  Why do we remove negative instances?
    –  Prevent underestimating the accuracy of the rule
    –  Compare rules R2 and R3 in the diagram

[Figure: scatter plot of class = + and class = - instances with the regions covered by rules R1, R2, and R3]

Page 32

Rule Evaluation

•  Metrics:
    –  Accuracy

$$\text{Accuracy} = \frac{n_c}{n}$$

    –  Laplace

$$\text{Laplace} = \frac{n_c + 1}{n + k}$$

    –  M-estimate

$$\text{M-estimate} = \frac{n_c + k p}{n + k}$$

$n$: number of instances covered by the rule

$n_c$: number of instances covered by the rule that belong to the class the rule predicts

$k$: number of classes

$p$: prior probability of the class the rule predicts
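The three metrics differ only in how they smooth the raw ratio. A minimal sketch follows, taking nc as the number of covered instances that belong to the class the rule predicts; the function names and example counts are illustrative assumptions.

```python
def accuracy(nc: int, n: int) -> float:
    """Raw fraction of covered instances that the rule classifies correctly."""
    return nc / n

def laplace(nc: int, n: int, k: int) -> float:
    """Laplace estimate: (nc + 1) / (n + k), where k is the number of classes."""
    return (nc + 1) / (n + k)

def m_estimate(nc: int, n: int, k: int, p: float) -> float:
    """M-estimate: (nc + k * p) / (n + k), where p is the prior of the predicted class."""
    return (nc + k * p) / (n + k)

# Hypothetical rule covering 4 instances, 3 of them correctly, in a 2-class problem.
print(accuracy(3, 4), laplace(3, 4, 2), m_estimate(3, 4, 2, 0.5))
# A rule that covers a single instance correctly has accuracy 1.0,
# but the smoothed estimates stay well below 1.
print(accuracy(1, 1), laplace(1, 1, 2))
```

With p = 1/k the M-estimate reduces to the Laplace estimate. Both pull rules with low coverage toward the prior, which guards against overrating rules that cover only a handful of records.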

Page 33

Stopping Criterion and Rule Pruning

•  Stopping criterion
    –  Compute the gain
    –  If gain is not significant, discard the new rule

•  Rule Pruning
    –  Similar to post-pruning of decision trees
    –  Reduced Error Pruning:
        •  Remove one of the conjuncts in the rule
        •  Compare error rate on validation set before and after pruning
        •  If error improves, prune the conjunct

Page 34

Summary of Direct Method

•  Grow a single rule

•  Remove the instances covered by the rule

•  Prune the rule (if necessary)

•  Add rule to Current Rule Set

•  Repeat

Page 35

Direct Method: RIPPER

•  For a 2-class problem, choose one of the classes as the positive class, and the other as the negative class
    –  Learn rules for the positive class
    –  The negative class will be the default class

•  For a multi-class problem
    –  Order the classes according to increasing class prevalence (fraction of instances that belong to a particular class)
    –  Learn the rule set for the smallest class first, treat the rest as the negative class
    –  Repeat with the next smallest class as the positive class

Page 36

Direct Method: RIPPER

•  Growing a rule:
    –  Start from an empty rule
    –  Add conjuncts as long as they improve FOIL's information gain
    –  Stop when the rule no longer covers negative examples
    –  Prune the rule immediately using incremental reduced error pruning
    –  Measure for pruning: v = (p – n)/(p + n)
        •  p: number of positive examples covered by the rule in the validation set
        •  n: number of negative examples covered by the rule in the validation set
    –  Pruning method: delete any final sequence of conditions that maximizes v
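The pruning step can be sketched as a search over prefixes of the rule: each candidate deletes a final sequence of conditions, and the prefix with the highest v on the validation set is kept. The list-of-pairs rule representation, the tie-breaking in favour of the longer rule, and the tiny validation set are assumptions for illustration.

```python
def prune_metric(p: int, n: int) -> float:
    """v = (p - n) / (p + n); a rule covering nothing gets the worst possible score."""
    return (p - n) / (p + n) if p + n else float("-inf")

def prune_rule(conditions, validation, target):
    """Keep the prefix of `conditions` (ordered as they were added) that maximizes v."""
    best_length, best_v = len(conditions), None
    for keep in range(len(conditions), 0, -1):   # try deleting ever longer final sequences
        prefix = conditions[:keep]
        covered = [r for r in validation
                   if all(r[attr] == value for attr, value in prefix)]
        p = sum(r["class"] == target for r in covered)
        n = len(covered) - p
        v = prune_metric(p, n)
        if best_v is None or v > best_v:         # ties keep the longer rule
            best_length, best_v = keep, v
    return conditions[:best_length]

# Hypothetical validation set and rule grown for the class "Yes".
validation = [
    {"Refund": "No", "Status": "Single", "class": "Yes"},
    {"Refund": "No", "Status": "Single", "class": "No"},
    {"Refund": "No", "Status": "Divorced", "class": "Yes"},
    {"Refund": "No", "Status": "Married", "class": "Yes"},
    {"Refund": "Yes", "Status": "Single", "class": "No"},
]
rule = [("Refund", "No"), ("Status", "Single")]
pruned = prune_rule(rule, validation, "Yes")
print(pruned)
```

Here the full rule scores v = 0 (one positive, one negative), while dropping the last condition gives v = 0.5 (three positives, one negative), so the shorter rule is kept.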

Page 37

Direct Method: RIPPER

•  Building a Rule Set:
    –  Use sequential covering algorithm
        •  Finds the best rule that covers the current set of positive examples
        •  Eliminate both positive and negative examples covered by the rule
    –  Each time a rule is added to the rule set, compute the new description length
        •  Stop adding new rules when the new description length is d bits longer than the smallest description length obtained so far

Page 38

Direct Method: RIPPER

•  Optimize the rule set:
    –  For each rule r in the rule set R
        •  Consider 2 alternative rules:
            –  Replacement rule (r*): grow a new rule from scratch
            –  Revised rule (r'): add conjuncts to extend the rule r
        •  Compare the rule set for r against the rule sets for r* and r'
        •  Choose the rule set with the minimum description length (MDL principle)
    –  Repeat rule generation and rule optimization for the remaining positive examples

Page 39

Indirect Methods

[Figure: decision tree]
P?
    No  → Q?
        No  → -
        Yes → +
    Yes → R?
        No  → +
        Yes → Q?
            No  → -
            Yes → +

Rule Set

r1: (P=No, Q=No) ==> -
r2: (P=No, Q=Yes) ==> +
r3: (P=Yes, R=No) ==> +
r4: (P=Yes, R=Yes, Q=No) ==> -
r5: (P=Yes, R=Yes, Q=Yes) ==> +

Page 40

Indirect Method: C4.5rules

•  Extract rules from an unpruned decision tree
•  For each rule, r: A → y,
    –  Consider an alternative rule r': A' → y where A' is obtained by removing one of the conjuncts in A
    –  Compare the pessimistic error rate for r against that of every alternative r'
    –  Prune if one of the alternatives r' has a lower pessimistic error rate
    –  Repeat until we can no longer improve the generalization error

Page 41

Indirect Method: C4.5rules

•  Instead of ordering the rules, order subsets of rules (class ordering)
    –  Each subset is a collection of rules with the same rule consequent (class)
    –  Compute the description length of each subset
        •  Description length = L(error) + g L(model)
        •  g is a parameter that takes into account the presence of redundant attributes in a rule set (default value = 0.5)

Page 42

Example

Name            Give Birth  Lay Eggs  Can Fly  Live in Water  Have Legs  Class
human           yes         no        no       no             yes        mammals
python          no          yes       no       no             no         reptiles
salmon          no          yes       no       yes            no         fishes
whale           yes         no        no       yes            no         mammals
frog            no          yes       no       sometimes      yes        amphibians
komodo          no          yes       no       no             yes        reptiles
bat             yes         no        yes      no             yes        mammals
pigeon          no          yes       yes      no             yes        birds
cat             yes         no        no       no             yes        mammals
leopard shark   yes         no        no       yes            no         fishes
turtle          no          yes       no       sometimes      yes        reptiles
penguin         no          yes       no       sometimes      yes        birds
porcupine       yes         no        no       no             yes        mammals
eel             no          yes       no       yes            no         fishes
salamander      no          yes       no       sometimes      yes        amphibians
gila monster    no          yes       no       no             yes        reptiles
platypus        no          yes       no       no             yes        mammals
owl             no          yes       yes      no             yes        birds
dolphin         yes         no        no       yes            no         mammals
eagle           no          yes       yes      no             yes        birds

Page 43

C4.5 versus C4.5rules versus RIPPER

[Figure: C4.5 decision tree]
Give Birth?
    Yes → Mammals
    No  → Live In Water?
        Yes       → Fishes
        Sometimes → Amphibians
        No        → Can Fly?
            Yes → Birds
            No  → Reptiles

C4.5rules:

(Give Birth=No, Can Fly=Yes) → Birds

(Give Birth=No, Live in Water=Yes) → Fishes

(Give Birth=Yes) → Mammals

(Give Birth=No, Can Fly=No, Live in Water=No) → Reptiles

( ) → Amphibians

RIPPER:

(Live in Water=Yes) → Fishes

(Have Legs=No) → Reptiles

(Give Birth=No, Can Fly=No, Live In Water=No) → Reptiles

(Can Fly=Yes, Give Birth=No) → Birds

( ) → Mammals

Page 44

C4.5 versus C4.5rules versus RIPPER

C4.5 and C4.5rules:

                        PREDICTED CLASS
                        Amphibians  Fishes  Reptiles  Birds  Mammals
ACTUAL    Amphibians    2           0       0         0      0
CLASS     Fishes        0           2       0         0      1
          Reptiles      1           0       3         0      0
          Birds         1           0       0         3      0
          Mammals       0           0       1         0      6

RIPPER:

                        PREDICTED CLASS
                        Amphibians  Fishes  Reptiles  Birds  Mammals
ACTUAL    Amphibians    0           0       0         0      2
CLASS     Fishes        0           3       0         0      0
          Reptiles      0           0       3         0      1
          Birds         0           0       1         2      1
          Mammals       0           2       1         0      4

Page 45

Advantages of Rule-Based Classifiers

•  As highly expressive as decision trees
•  Easy to interpret
•  Easy to generate
•  Can classify new instances rapidly
•  Performance comparable to decision trees