A cost-sensitive measure to quantify the performance of an imprecise classifier
Joaquín Abellán, Andrés R. Masegosa
Department of Computer Science and Artificial Intelligence, University of Granada
Granada, Spain
Oviedo (Asturias), December 2012
ERCIM’12 Oviedo (Spain) 1/30
Outline
1 Introduction
2 Previous Knowledge
3 Imprecise classification with credal decision trees
4 Cost-sensitive imprecise classification
5 Experimental Evaluation
6 Conclusions & Future Works
Introduction
Part I
Introduction
Introduction
Supervised Classification
A probabilistic classifier [5] models the conditional probability of the class variable C given a set of predictive variables X = {X1, ..., XN}:
p(C | X1, ..., XN)
Under supervised classification settings, this conditional probability is estimated from a set of n labelled samples:
D = {(c1, x1), ..., (cn, xn)}
How this conditional probability is learnt depends on the particular classification model: decision trees, Bayesian network classifiers, AODE, ensembles of decision trees...
The prediction c* of a new data case xnew is the class with the highest posterior probability:
c* = arg max_{c ∈ Val(C)} P(c | xnew)
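The maximum-a-posteriori prediction rule above can be sketched in a few lines of Python; the posterior values below are purely illustrative, not taken from any real model.

```python
# A minimal sketch of prediction by maximum posterior probability.
# The posterior dictionary is an illustrative stand-in for P(c | xnew).
def predict(posterior):
    """Return the class value with the highest posterior probability."""
    return max(posterior, key=posterior.get)

posterior = {"c1": 0.2, "c2": 0.5, "c3": 0.3}
print(predict(posterior))  # -> c2
```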
Introduction
Imprecise Classification
An imprecise probabilistic classifier [8] learns a set of conditional probabilities P, not a single conditional probability, from the same set of labelled data samples:
P_D(C | X1, ..., XN)
How this set of conditional probabilities is learnt depends on the particular imprecise classification model: credal decision trees [1], naive credal classifier [4]...
They are based on the so-called imprecise probability theory [6].
The prediction for a new data case xnew is not always a single class; it can be a set of different classes: the set of non-dominated classes, U.
We have a set of different conditional probability distributions:
{p(C | xnew) : p ∈ P}
Introduction
Measuring the performance of an imprecise classifier
Imprecise classifiers produce set-valued predictions, U.
Descriptive measures [4]:
Determinacy: the percentage of instances for which the imprecise classifier returns a unique state.
Single Accuracy: the accuracy achieved by the imprecise classifier on the instances classified determinately.
Set Accuracy: the accuracy achieved by the imprecise classifier on the instances classified indeterminately.
Indeterminacy Size: the average size of the set of non-dominated classes.
Discounted Accuracy measure [4]:
DACC = (1/nTest) Σ_i (accurate)_i / |U_i|
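The DACC measure can be sketched directly from its definition: each test instance contributes 1/|U_i| when its true class lies in the predicted non-dominated set U_i, and 0 otherwise. The predictions and labels below are illustrative.

```python
# Sketch of the discounted accuracy (DACC) measure [4].
def dacc(predictions, truths):
    """predictions: list of sets of non-dominated classes U_i;
    truths: list of true class labels, same length."""
    score = sum((1.0 / len(U)) if c in U else 0.0
                for U, c in zip(predictions, truths))
    return score / len(truths)

preds = [{"a"}, {"a", "b"}, {"b"}]          # one precise hit, one set hit, one miss
print(dacc(preds, ["a", "a", "a"]))         # (1 + 0.5 + 0) / 3 -> 0.5
```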
Introduction
A cost-sensitive measure to quantify the performance of an imprecise classifier
We present a tentative method for quantifying the performance of an imprecise classifier in the presence of misclassification costs.
Misclassification costs capture the weight that an expert would give to each type of error. E.g.:
Whether an email is spam or not.
Whether a mushroom is edible or poisonous.
The set of non-dominated classes is chosen differently in the presence of misclassification costs.
A new cost-sensitive measure for imprecise classifiers is presented.
Previous Knowledge
Part II
Previous Knowledge
Previous Knowledge
Imprecise Dirichlet Model
The imprecise Dirichlet model (IDM) [7] was introduced by Walley for inference about the probability distribution of a categorical variable.
p(ci) = θi, θ = {θ1, ..., θK}, and D a data set of n i.i.d. samples.
Assuming a Dirichlet prior, π(θ) ∝ Π_j θj^(s·tj − 1), the posterior is:
π(θ | D) ∝ Π_{j=1..K} θj^(nj + s·tj − 1)
s is the equivalent sample size or number of hidden samples.
tj is the proportion of hidden samples in cj (e.g. uniform tj = 1/K).
Expected value:
E(θj | D) = (nj + s·tj) / (n + s)
Previous Knowledge
Imprecise Dirichlet Model
The IDM assumes prior ignorance and considers a set of Dirichlet priors by varying the parameters tj.
This is a vacuous model: p(cj) ∈ (0, 1) ∀j.
Inferences are made by computing lower and upper probabilities:
p_low(cj | D) = inf_{0 < tj < 1} (nj + s·tj)/(n + s) = nj/(n + s)
p_up(cj | D) = sup_{0 < tj < 1} (nj + s·tj)/(n + s) = (nj + s)/(n + s)
We have a credal set (a convex and closed set) of posterior probabilities for C.
The parameter s determines how quickly the lower and upper probabilities converge as more data become available.
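The IDM interval [nj/(n+s), (nj+s)/(n+s)] is simple to compute from class counts; a minimal sketch, with illustrative counts:

```python
# Sketch of the IDM probability intervals of Walley [7]:
# lower = n_j / (n + s), upper = (n_j + s) / (n + s).
def idm_intervals(counts, s=1.0):
    """counts: {class: observed count n_j}. Returns {class: (lower, upper)}."""
    n = sum(counts.values())
    return {c: (nj / (n + s), (nj + s) / (n + s)) for c, nj in counts.items()}

counts = {"c1": 6, "c2": 3, "c3": 1}  # n = 10 illustrative samples
for c, (lo, hi) in idm_intervals(counts, s=1.0).items():
    print(c, round(lo, 3), round(hi, 3))
```

Note how a larger s widens every interval, while the intervals shrink toward the relative frequencies as n grows.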
Previous Knowledge
Stochastic and credal dominance on probability intervals
How do we make predictions with imprecise classifiers?
Select the class values that are not defeated (non-dominated) in probability terms.
Use a dominance criterion.
Stochastic Dominance:
ci and cj are associated with the intervals [li, ui] and [lj, uj], respectively.
There is stochastic dominance of cj over ci iff lj ≥ ui.
Credal Dominance:
The probability of each state of C is expressed by a non-empty credal set P.
There is credal dominance of ci over cj iff p(ci) ≥ p(cj) ∀p ∈ P.
For the IDM, both previous definitions are equivalent [3].
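Computing the set of non-dominated classes under stochastic dominance is a direct pairwise check; a sketch with illustrative intervals:

```python
# Sketch: under stochastic dominance, class c_i is dominated if some
# other class c_j has lower bound l_j >= u_i (the upper bound of c_i).
def non_dominated(intervals):
    """intervals: {class: (lower, upper)}. Returns the non-dominated set U."""
    return {ci for ci, (li, ui) in intervals.items()
            if not any(lj >= ui
                       for cj, (lj, uj) in intervals.items() if cj != ci)}

iv = {"c1": (0.5, 0.7), "c2": (0.2, 0.45), "c3": (0.1, 0.3)}
print(non_dominated(iv))  # c1's lower bound 0.5 beats the uppers of c2, c3
```

When intervals overlap heavily (e.g. under a strong IDM prior), several classes survive and the prediction is a set, which is exactly the imprecise behaviour described above.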
Imprecise classification with credal decision trees
Part III
Imprecise classification withcredal decision trees
Imprecise classification with credal decision trees
Decision Trees
A decision tree, also called a classification tree, is a simple structure that can be used as a classifier.
Each node represents an attribute variable.
Each branch represents one of the states of this variable.
Each tree leaf specifies an expected value of the class variable.
Example of a decision tree for three attribute variables Xi (i = 1, 2, 3), each with two possible values (0, 1), and a class variable C with cases or states c1, c2, c3:
Imprecise classification with credal decision trees
Building Decision Trees from data
Imprecise Credal Decision Trees (ICDT)
Associate to each node the most informative variable about the class C.
Imprecise Information Gain measure based on the maximum entropy of a credal set [2].
Probability intervals at the leaves are estimated using the IDM:
pσ(c) ∈ [ nσ(c)/(nσ + s), (nσ(c) + s)/(nσ + s) ]
Cost-sensitive imprecise classification
Part IV
Cost-sensitive impreciseclassification
Cost-sensitive imprecise classification
Cost-sensitive supervised classification
We assume the existence of the following misclassification cost matrix:

M = | m11  m12  ...  m1K |
    | m21  m22  ...  m2K |
    | ...       ...      |
    | mK1  ...       mKK |

mij is the cost of predicting ci when the true class is cj.
Using the Bayes decision rule [5], we predict the class with minimum expected posterior risk:
ct = arg min_{ci ∈ Val(C)} R(ci | xnew)
R(ci | xnew) = Σ_{j=1..K} mij P(cj | xnew)
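The minimum-expected-risk rule can be sketched as follows; the cost matrix and posterior are illustrative, and show how costs can overturn the plain maximum-posterior choice:

```python
# Sketch of the cost-sensitive Bayes decision rule: predict the class
# minimizing R(c_i | x) = sum_j m_ij * P(c_j | x).
def min_risk_class(M, posterior, classes):
    """M[i][j]: cost of predicting classes[i] when truth is classes[j]."""
    risks = {ci: sum(M[i][j] * posterior[cj] for j, cj in enumerate(classes))
             for i, ci in enumerate(classes)}
    return min(risks, key=risks.get), risks

classes = ["c1", "c2"]
M = [[0, 5],   # predicting c1 when the truth is c2 is expensive
     [1, 0]]   # predicting c2 when the truth is c1 is cheap
posterior = {"c1": 0.7, "c2": 0.3}
best, risks = min_risk_class(M, posterior, classes)
print(best)  # -> c2: the safer prediction despite P(c1) = 0.7
```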
Cost-sensitive imprecise classification
Cost-sensitive imprecise classification
We define the lower and upper posterior risks as follows (the right-hand sides of the equations are valid for credal decision trees):
R_low(ci | xnew) = Σ_{j=1..K} mij p_low(cj | xnew) = Σ_{j=1..K} mij nσ(cj)/(nσ + s)
R_up(ci | xnew) = Σ_{j=1..K} mij p_up(cj | xnew) = Σ_{j=1..K} mij (nσ(cj) + s)/(nσ + s)
Cost-based dominance criterion (stochastic dominance):
ci dominates cj iff R_up(ci | xnew) ≤ R_low(cj | xnew).
This cost-sensitive method also returns a set of non-dominated classes, UM.
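Putting the two pieces together, a sketch of the cost-based dominance step for a credal-tree leaf; the counts, costs, and the reading of the dominance condition (a class dominates when even its worst risk does not exceed the other's best risk) are reconstructed from the slide, and the numbers are illustrative:

```python
# Sketch of cost-based dominance: lower/upper risks from IDM leaf counts,
# R_low(c_i) = sum_j m_ij n(c_j)/(n+s), R_up uses (n(c_j)+s)/(n+s);
# c_i dominates c_j when R_up(c_i) <= R_low(c_j).
def risk_bounds(M, counts, classes, s=1.0):
    n = sum(counts.values())
    low = {ci: sum(M[i][j] * counts[cj] / (n + s)
                   for j, cj in enumerate(classes))
           for i, ci in enumerate(classes)}
    up = {ci: sum(M[i][j] * (counts[cj] + s) / (n + s)
                  for j, cj in enumerate(classes))
          for i, ci in enumerate(classes)}
    return low, up

def non_dominated_by_cost(M, counts, classes, s=1.0):
    low, up = risk_bounds(M, counts, classes, s)
    return {cj for cj in classes
            if not any(up[ci] <= low[cj] for ci in classes if ci != cj)}

classes = ["c1", "c2"]
M = [[0, 2], [1, 0]]           # illustrative misclassification costs
counts = {"c1": 8, "c2": 2}    # illustrative leaf counts, n = 10
print(non_dominated_by_cost(M, counts, classes))  # -> {'c1'}
```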
Cost-sensitive imprecise classification
Cost-sensitive supervised classification
A cost-sensitive measure for imprecise classification
MIC = (1/N) ( − Σ_{i: Success} ln(|Ui|/K) − (1/(K−1)) Σ_{i: Error} α_{t,i} ln K )
Success is quantified based on the number of non-dominated states produced.
Classification errors take the misclassification costs into account: we consider the worst-case error,
α_{t,i} = max_{cj ∈ Ui} m_{j,t}
Completely imprecise (vacuous) classifier: MIC = 0.
Classifier with perfect accuracy: MIC = ln K.
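A sketch of MIC as reconstructed from the formula above: successes contribute −ln(|U_i|/K), errors contribute −(α_{t,i}/(K−1))·ln K. The inputs below are illustrative; note that the two boundary cases quoted above (vacuous classifier → 0, perfect precise classifier → ln K) fall out of the formula.

```python
import math

# Illustrative sketch of the MIC measure. Each result is a pair:
# (True, |U_i|) for a success, or (False, alpha_i) for an error,
# where alpha_i = max_{c_j in U_i} m_{j,t} is the worst-case cost.
def mic(results, K):
    total = 0.0
    for success, val in results:
        if success:
            total += -math.log(val / K)              # val = |U_i|
        else:
            total += -(val / (K - 1)) * math.log(K)  # val = alpha_{t,i}
    return total / len(results)

K = 3
# two correct precise predictions and one error with worst-case cost 2
print(round(mic([(True, 1), (True, 1), (False, 2)], K), 3))  # -> 0.366
```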
Experimental Evaluation
Part V
Experimental Evaluation
Experimental Evaluation
Experimental Set-up
The Aim
The presence of misclassification costs affects the performance of an imprecise classifier.
We compare two imprecise classifiers:
Imprecise credal decision trees (ICDT) and the Naive Credal Classifier (NCC) [4].
We employ two evaluation measures:
The DACC measure and the MIC measure with different cost matrices.
Experiments on 40 UCI data sets using a 10-fold cross-validation scheme.
Statistical tests for the comparison:
Corrected paired t-test at the 5% significance level.
Wilcoxon test at the 5% significance level.
Experimental Evaluation
Experimental Set-up
Cost-Matrix (I)

      c1  c2  c3  c4
c1    0   1   1   1
c2    2   0   2   2
c3    3   3   0   3
c4    4   4   4   0

Rows → real classes. Columns → predicted classes. Classes are ordered from high to low frequency.
This cost matrix assumes that misclassification costs depend only on the real class:
A diagnosis problem where the patient can be healthy or can suffer several diseases, from mild to more severe.
The cost of an error in the diagnosis mainly depends on the actual disease.
Experimental Evaluation
Experimental Set-up
Cost-Matrix (II)

      c1  c2  c3  c4
c1    0   2   3   4
c2    1   0   3   4
c3    1   2   0   4
c4    1   2   3   0

This cost matrix assumes that misclassification costs depend only on the classifier's prediction:
In the textile industry, different kinds of jeans have to be automatically classified into different quality levels.
The cost depends only on the action triggered by the prediction of the classifier.
Experimental Evaluation
Experimental Set-up
Cost-Matrix (III)

      c4  c3  c2  c1
c4    0   1   1   1
c3    2   0   2   2
c2    3   3   0   3
c1    4   4   4   0

Cost-Matrix (IV)

      c4  c3  c2  c1
c4    0   2   3   4
c3    1   0   3   4
c2    1   2   0   4
c1    1   2   3   0

Same principles as Cost-Matrix (I) and Cost-Matrix (II), respectively.
The highest costs are associated with the classes with the highest frequency.
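The two cost-matrix patterns above can be generated mechanically; a sketch, assuming the convention of these slides (rows = real classes, columns = predicted classes) and a zero diagonal:

```python
# Sketch of the two experimental cost-matrix patterns: type I/III,
# where the cost depends only on the true class (the row), and
# type II/IV, where it depends only on the predicted class (the column).
def cost_matrix(K, depends_on="true"):
    M = [[0] * K for _ in range(K)]
    for i in range(K):        # true class index (row)
        for j in range(K):    # predicted class index (column)
            if i != j:
                M[i][j] = (i + 1) if depends_on == "true" else (j + 1)
    return M

for row in cost_matrix(4, "true"):   # reproduces Cost-Matrix (I)
    print(row)
```

Matrices (III) and (IV) are the same constructions with the class-frequency order reversed.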
Experimental Evaluation
Experimental Evaluation I

                      ICDT    NCC
Determinacy: Av.      94.7%   58.0%
Single Acc.: Av.      79.3%   84.4%
Set Acc.: Av.         89.0%   93.3%
Indeterm. Size: Av.   5.1     5.5

ICDT and NCC perform differently as imprecise classifiers:
ICDT has a higher determinacy than NCC, but with lower accuracy on those cases.
ICDT returns a slightly smaller number of non-dominated classes, but with slightly lower accuracy.
Experimental Evaluation
Experimental Evaluation II

                 ICDT    NCC
DACC: Av.        0.768   0.6035
Wilcoxon test    *
Paired t-test    22      4

ICDT is a better imprecise classifier than NCC.
Strong differences in favor of ICDT.
Experimental Evaluation
Experimental Evaluation II
                              ICDT    NCC
MIC: Cost Matrix (I): Av.     0.796   0.790
Wilcoxon test                 =       =
Paired t-test                 13      10

MIC: Cost Matrix (II): Av.    1.066   0.808
Wilcoxon test                 *
Paired t-test                 17      9

MIC: Cost Matrix (III): Av.   0.936   1.492
Wilcoxon test                 =       =
Paired t-test                 15      11

MIC: Cost Matrix (IV): Av.    0.771   0.773
Wilcoxon test                 =       =
Paired t-test                 12      13

NCC can be competitive w.r.t. ICDT for some cost matrices.
Conclusions and Future Works
Part VI
Conclusions and Future Works
Conclusions and Future Works
Conclusions and Future Works
A dominance criterion which considers misclassification costs when selecting the non-dominated classes.
A cost-sensitive measure to evaluate imprecise classifiers which takes misclassification costs into account.
The performance of an imprecise classifier is affected by misclassification costs:
ICDT outperforms NCC in terms of DACC (0/1 error cost).
NCC performs competitively for some cost matrices.
Future Works:
Apply this approach to a real example.
Conclusions and Future Works
Bibliography
J. Abellán and S. Moral, “Building classification trees using the total uncertainty criterion", Int. J. of Intelligent Systems, vol. 18, no. 12, pp. 1215–1225, 2003.
J. Abellán and S. Moral, “Upper entropy of credal sets. Applications to credal classification", Int. J. of Approximate Reasoning, vol. 39, no. 2-3, pp. 235–255, 2005.
J. Abellán, “Equivalence relations among dominance concepts on probability intervals and general credal sets", Int. J. of General Systems, vol. 41, no. 2, pp. 109–122, 2012.
G. Corani and M. Zaffalon, “Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2", J. of Machine Learning Research, vol. 9, pp. 581–621, 2008.
R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. John Wiley and Sons, New York, 1973.
P. Walley, Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, New York, 1991.
P. Walley, “Inferences from multinomial data: learning about a bag of marbles", J. Roy. Statist. Soc. B, vol. 58, pp. 3–57, 1996.
M. Zaffalon, “The naive credal classifier", J. of Statistical Planning and Inference, vol. 105, pp. 5–21, 2002.
Conclusions and Future Works
Thanks for your attention!
Any questions?