A cost-sensitive measure to quantify the performance of an imprecise classifier
Joaquín Abellán, Andrés R. Masegosa
Department of Computer Science and Artificial Intelligence, University of Granada
Granada, Spain
Oviedo (Asturias), December 2012
ERCIM’12 Oviedo (Spain) 1/30
Outline
1 Introduction
2 Previous Knowledge
3 Imprecise classification with credal decision trees
4 Cost-sensitive imprecise classification
5 Experimental Evaluation
6 Conclusions & Future Works
Introduction
Part I
Introduction
Introduction
Supervised Classification
A probabilistic classifier [5] models the conditional probability of the class variable C given a set of predictive variables X = {X1, ..., XN}:
p(C | X1, ..., XN)
Under supervised classification settings, this conditional probability is estimated from a set of n labelled samples:
D = {(c1, x1), ..., (cn, xn)}
How this conditional probability is learnt depends on the particular classification model: decision trees, Bayesian network classifiers, AODE, ensembles of decision trees...
The prediction c* of a new data case xnew is the class with the highest posterior probability:
c* = arg max_{c ∈ Val(C)} P(c | xnew)
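The maximum-a-posteriori prediction rule above can be sketched in a few lines of Python; the posterior values below are purely illustrative, not taken from any real model.

```python
# A minimal sketch of prediction by maximum posterior probability.
# The posterior dictionary is an illustrative stand-in for P(c | xnew).
def predict(posterior):
    """Return the class value with the highest posterior probability."""
    return max(posterior, key=posterior.get)

posterior = {"c1": 0.2, "c2": 0.5, "c3": 0.3}
print(predict(posterior))  # -> c2
```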
Introduction
Imprecise Classification
An imprecise probabilistic classifier [8] learns a set of conditional probabilities P, not a single conditional probability, from the same set of labelled data samples:
P_D(C | X1, ..., XN)
How this set of conditional probabilities is learnt depends on the particular imprecise classification model: credal decision trees [1], naive credal classifier [4]...
They are based on the so-called imprecise probability theory [6].
The prediction for a new data case xnew is not always a single class; it can be a set of different classes: the set of non-dominated classes, U.
We have a set of different conditional probability distributions:
{p(C | xnew) : p ∈ P}
Introduction
Measuring the performance of an imprecise classifier
Imprecise classifiers produce set-valued predictions, U.
Descriptive measures [4]:
Determinacy: the percentage of instances for which the imprecise classifier returns a unique state.
Single Accuracy: the accuracy achieved by the imprecise classifier on the instances classified determinately.
Set Accuracy: the accuracy achieved by the imprecise classifier on the instances classified indeterminately.
Indeterminacy Size: the average size of the set of non-dominated classes.
Discounted Accuracy measure [4]:
DACC = (1/nTest) Σ_i (accurate)_i / |U_i|
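The DACC measure can be sketched directly from its definition: each test instance contributes 1/|U_i| when its true class lies in the predicted non-dominated set U_i, and 0 otherwise. The predictions and labels below are illustrative.

```python
# Sketch of the discounted accuracy (DACC) measure [4].
def dacc(predictions, truths):
    """predictions: list of sets of non-dominated classes U_i;
    truths: list of true class labels, same length."""
    score = sum((1.0 / len(U)) if c in U else 0.0
                for U, c in zip(predictions, truths))
    return score / len(truths)

preds = [{"a"}, {"a", "b"}, {"b"}]          # one precise hit, one set hit, one miss
print(dacc(preds, ["a", "a", "a"]))         # (1 + 0.5 + 0) / 3 -> 0.5
```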
Introduction
A cost-sensitive measure to quantify the performance of an imprecise classifier
We present a tentative method for quantifying the performance of an imprecise classifier in the presence of misclassification costs.
Misclassification costs capture the weight that an expert would give to each type of error. E.g.:
Whether an email is spam or not.
Whether a mushroom is edible or poisonous.
The set of non-dominated classes is chosen differently in the presence of misclassification costs.
A new cost-sensitive measure for imprecise classifiers is presented.
Previous Knowledge
Part II
Previous Knowledge
Previous Knowledge
Imprecise Dirichlet Model
The imprecise Dirichlet model (IDM) [7] was introduced by Walley for inference about the probability distribution of a categorical variable.
p(ci) = θi, θ = {θ1, ..., θK}, and D a data set of n i.i.d. samples.
Assuming a Dirichlet prior, π(θ) ∝ Π_j θj^(s·tj − 1), the posterior is:
π(θ | D) ∝ Π_{j=1..K} θj^(nj + s·tj − 1)
s is the equivalent sample size or number of hidden samples.
tj is the proportion of hidden samples in cj (e.g. uniform tj = 1/K).
Expected value:
E(θj | D) = (nj + s·tj) / (n + s)
Previous Knowledge
Imprecise Dirichlet Model
The IDM assumes prior ignorance and considers a set of Dirichlet priors by varying the parameters tj.
This is a vacuous model: p(cj) ∈ (0, 1) ∀j.
Inferences are made by computing lower and upper probabilities:
p_low(cj | D) = inf_{0 < tj < 1} (nj + s·tj)/(n + s) = nj/(n + s)
p_up(cj | D) = sup_{0 < tj < 1} (nj + s·tj)/(n + s) = (nj + s)/(n + s)
We have a credal set (a convex and closed set) of posterior probabilities for C.
The parameter s determines how quickly the lower and upper probabilities converge as more data become available.
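The IDM interval [nj/(n+s), (nj+s)/(n+s)] is simple to compute from class counts; a minimal sketch, with illustrative counts:

```python
# Sketch of the IDM probability intervals of Walley [7]:
# lower = n_j / (n + s), upper = (n_j + s) / (n + s).
def idm_intervals(counts, s=1.0):
    """counts: {class: observed count n_j}. Returns {class: (lower, upper)}."""
    n = sum(counts.values())
    return {c: (nj / (n + s), (nj + s) / (n + s)) for c, nj in counts.items()}

counts = {"c1": 6, "c2": 3, "c3": 1}  # n = 10 illustrative samples
for c, (lo, hi) in idm_intervals(counts, s=1.0).items():
    print(c, round(lo, 3), round(hi, 3))
```

Note how a larger s widens every interval, while the intervals shrink toward the relative frequencies as n grows.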
Previous Knowledge
Stochastic and credal dominance on probability intervals
How do we make predictions with imprecise classifiers?
Select the class values that are not defeated (non-dominated) in probability terms.
Use a dominance criterion.
Stochastic Dominance:
ci and cj are associated with the intervals [li, ui] and [lj, uj], respectively.
There is stochastic dominance of cj over ci iff lj ≥ ui.
Credal Dominance:
The probability of each state of C is expressed by a non-empty credal set P.
There is credal dominance of ci over cj iff p(ci) ≥ p(cj) ∀p ∈ P.
For the IDM, both previous definitions are equivalent [3].
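Computing the set of non-dominated classes under stochastic dominance is a direct pairwise check; a sketch with illustrative intervals:

```python
# Sketch: under stochastic dominance, class c_i is dominated if some
# other class c_j has lower bound l_j >= u_i (the upper bound of c_i).
def non_dominated(intervals):
    """intervals: {class: (lower, upper)}. Returns the non-dominated set U."""
    return {ci for ci, (li, ui) in intervals.items()
            if not any(lj >= ui
                       for cj, (lj, uj) in intervals.items() if cj != ci)}

iv = {"c1": (0.5, 0.7), "c2": (0.2, 0.45), "c3": (0.1, 0.3)}
print(non_dominated(iv))  # c1's lower bound 0.5 beats the uppers of c2, c3
```

When intervals overlap heavily (e.g. under a strong IDM prior), several classes survive and the prediction is a set, which is exactly the imprecise behaviour described above.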
Imprecise classification with credal decision trees
Part III
Imprecise classification withcredal decision trees
Imprecise classification with credal decision trees
Decision Trees
A decision tree, also called a classification tree, is a simple structure that can be used as a classifier.
Each node represents an attribute variable.
Each branch represents one of the states of this variable.
Each tree leaf specifies an expected value of the class variable.
Example of a decision tree for three attribute variables Xi (i = 1, 2, 3), each with two possible values (0, 1), and a class variable C with cases or states c1, c2, c3:
Imprecise classification with credal decision trees
Building Decision Trees from data
Imprecise Credal Decision Trees (ICDT)
Associate to each node the most informative variable about the class C.
Imprecise Information Gain measure based on the maximum entropy of a credal set [2].
Probability intervals at the leaves are estimated using the IDM:
pσ(c) ∈ [ nσ(c)/(nσ + s), (nσ(c) + s)/(nσ + s) ]
Cost-sensitive imprecise classification
Part IV
Cost-sensitive impreciseclassification
Cost-sensitive imprecise classification
Cost-sensitive supervised classification
We assume the existence of the following misclassification cost matrix:

M = | m11  m12  ...  m1K |
    | m21  m22  ...  m2K |
    | ...       ...      |
    | mK1  ...       mKK |

mij is the cost of predicting ci when the true class is cj.
Using the Bayes decision rule [5], we predict the class with minimum expected posterior risk:
ct = arg min_{ci ∈ Val(C)} R(ci | xnew)
R(ci | xnew) = Σ_{j=1..K} mij P(cj | xnew)
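The minimum-expected-risk rule can be sketched as follows; the cost matrix and posterior are illustrative, and show how costs can overturn the plain maximum-posterior choice:

```python
# Sketch of the cost-sensitive Bayes decision rule: predict the class
# minimizing R(c_i | x) = sum_j m_ij * P(c_j | x).
def min_risk_class(M, posterior, classes):
    """M[i][j]: cost of predicting classes[i] when truth is classes[j]."""
    risks = {ci: sum(M[i][j] * posterior[cj] for j, cj in enumerate(classes))
             for i, ci in enumerate(classes)}
    return min(risks, key=risks.get), risks

classes = ["c1", "c2"]
M = [[0, 5],   # predicting c1 when the truth is c2 is expensive
     [1, 0]]   # predicting c2 when the truth is c1 is cheap
posterior = {"c1": 0.7, "c2": 0.3}
best, risks = min_risk_class(M, posterior, classes)
print(best)  # -> c2: the safer prediction despite P(c1) = 0.7
```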
Cost-sensitive imprecise classification
Cost-sensitive imprecise classification
We define the lower and upper posterior risks as follows (the right-hand sides of the equations are valid for credal decision trees):
R_low(ci | xnew) = Σ_{j=1..K} mij p_low(cj | xnew) = Σ_{j=1..K} mij nσ(cj)/(nσ + s)
R_up(ci | xnew) = Σ_{j=1..K} mij p_up(cj | xnew) = Σ_{j=1..K} mij (nσ(cj) + s)/(nσ + s)
Cost-based dominance criterion (stochastic dominance):
ci dominates cj iff R_up(ci | xnew) ≤ R_low(cj | xnew).
This cost-sensitive method also returns a set of non-dominated classes, UM.
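Putting the two pieces together, a sketch of the cost-based dominance step for a credal-tree leaf; the counts, costs, and the reading of the dominance condition (a class dominates when even its worst risk does not exceed the other's best risk) are reconstructed from the slide, and the numbers are illustrative:

```python
# Sketch of cost-based dominance: lower/upper risks from IDM leaf counts,
# R_low(c_i) = sum_j m_ij n(c_j)/(n+s), R_up uses (n(c_j)+s)/(n+s);
# c_i dominates c_j when R_up(c_i) <= R_low(c_j).
def risk_bounds(M, counts, classes, s=1.0):
    n = sum(counts.values())
    low = {ci: sum(M[i][j] * counts[cj] / (n + s)
                   for j, cj in enumerate(classes))
           for i, ci in enumerate(classes)}
    up = {ci: sum(M[i][j] * (counts[cj] + s) / (n + s)
                  for j, cj in enumerate(classes))
          for i, ci in enumerate(classes)}
    return low, up

def non_dominated_by_cost(M, counts, classes, s=1.0):
    low, up = risk_bounds(M, counts, classes, s)
    return {cj for cj in classes
            if not any(up[ci] <= low[cj] for ci in classes if ci != cj)}

classes = ["c1", "c2"]
M = [[0, 2], [1, 0]]           # illustrative misclassification costs
counts = {"c1": 8, "c2": 2}    # illustrative leaf counts, n = 10
print(non_dominated_by_cost(M, counts, classes))  # -> {'c1'}
```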
Cost-sensitive imprecise classification
Cost-sensitive supervised classification
A cost-sensitive measure for imprecise classification
MIC = (1/N) ( − Σ_{i: Success} ln(|Ui|/K) − (1/(K−1)) Σ_{i: Error} α_{t,i} ln K )
Success is quantified based on the number of non-dominated states produced.
Classification errors take the misclassification costs into account: we consider the worst-case error,
α_{t,i} = max_{cj ∈ Ui} m_{j,t}
Completely imprecise (vacuous) classifier: MIC = 0.
Classifier with perfect accuracy: MIC = ln K.
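A sketch of MIC as reconstructed from the formula above: successes contribute −ln(|U_i|/K), errors contribute −(α_{t,i}/(K−1))·ln K. The inputs below are illustrative; note that the two boundary cases quoted above (vacuous classifier → 0, perfect precise classifier → ln K) fall out of the formula.

```python
import math

# Illustrative sketch of the MIC measure. Each result is a pair:
# (True, |U_i|) for a success, or (False, alpha_i) for an error,
# where alpha_i = max_{c_j in U_i} m_{j,t} is the worst-case cost.
def mic(results, K):
    total = 0.0
    for success, val in results:
        if success:
            total += -math.log(val / K)              # val = |U_i|
        else:
            total += -(val / (K - 1)) * math.log(K)  # val = alpha_{t,i}
    return total / len(results)

K = 3
# two correct precise predictions and one error with worst-case cost 2
print(round(mic([(True, 1), (True, 1), (False, 2)], K), 3))  # -> 0.366
```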
Experimental Evaluation
Part V
Experimental Evaluation
Experimental Evaluation
Experimental Set-up
The Aim
The presence of misclassification costs affects the performance of an imprecise classifier.
We compare two imprecise classifiers:
Imprecise credal decision trees (ICDT) and the Naive Credal Classifier (NCC) [4].
We employ two evaluation measures:
The DACC measure and the MIC measure with different cost matrices.
Experiments on 40 UCI data sets using a 10-fold cross-validation scheme.
Statistical tests for the comparison:
Corrected paired t-test at the 5% significance level.
Wilcoxon test at the 5% significance level.
Experimental Evaluation
Experimental Set-up
Cost-Matrix (I)

      c1  c2  c3  c4
c1    0   1   1   1
c2    2   0   2   2
c3    3   3   0   3
c4    4   4   4   0

Rows → real classes. Columns → predicted classes. Classes are ordered from high to low frequency.
This cost matrix assumes that misclassification costs depend only on the real class:
A diagnosis problem where the patient can be healthy or can suffer several diseases, from mild to more severe.
The cost of an error in the diagnosis mainly depends on the actual disease.
Experimental Evaluation
Experimental Set-up
Cost-Matrix (II)

      c1  c2  c3  c4
c1    0   2   3   4
c2    1   0   3   4
c3    1   2   0   4
c4    1   2   3   0

This cost matrix assumes that misclassification costs depend only on the classifier's prediction:
In the textile industry, different kinds of jeans have to be automatically classified into different quality levels.
The cost depends only on the action triggered by the prediction of the classifier.
Experimental Evaluation
Experimental Set-up
Cost-Matrix (III)

      c4  c3  c2  c1
c4    0   1   1   1
c3    2   0   2   2
c2    3   3   0   3
c1    4   4   4   0

Cost-Matrix (IV)

      c4  c3  c2  c1
c4    0   2   3   4
c3    1   0   3   4
c2    1   2   0   4
c1    1   2   3   0

Same principles as Cost-Matrix (I) and Cost-Matrix (II), respectively.
The highest costs are associated with the classes with the highest frequency.
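The two cost-matrix patterns above can be generated mechanically; a sketch, assuming the convention of these slides (rows = real classes, columns = predicted classes) and a zero diagonal:

```python
# Sketch of the two experimental cost-matrix patterns: type I/III,
# where the cost depends only on the true class (the row), and
# type II/IV, where it depends only on the predicted class (the column).
def cost_matrix(K, depends_on="true"):
    M = [[0] * K for _ in range(K)]
    for i in range(K):        # true class index (row)
        for j in range(K):    # predicted class index (column)
            if i != j:
                M[i][j] = (i + 1) if depends_on == "true" else (j + 1)
    return M

for row in cost_matrix(4, "true"):   # reproduces Cost-Matrix (I)
    print(row)
```

Matrices (III) and (IV) are the same constructions with the class-frequency order reversed.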
Experimental Evaluation
Experimental Evaluation I

                      ICDT    NCC
Determinacy: Av.      94.7%   58.0%
Single Acc.: Av.      79.3%   84.4%
Set Acc.: Av.         89.0%   93.3%
Indeterm. Size: Av.   5.1     5.5

ICDT and NCC perform differently as imprecise classifiers:
ICDT has a higher determinacy than NCC, but with lower accuracy on those cases.
ICDT returns a slightly smaller number of non-dominated classes, but with slightly lower accuracy.
Experimental Evaluation
Experimental Evaluation II

                 ICDT    NCC
DACC: Av.        0.768   0.6035
Wilcoxon test    *
Paired t-test    22      4

ICDT is a better imprecise classifier than NCC.
Strong differences in favor of ICDT.
Experimental Evaluation
Experimental Evaluation II
                              ICDT    NCC
MIC: Cost Matrix (I): Av.     0.796   0.790
Wilcoxon test                 =       =
Paired t-test                 13      10

MIC: Cost Matrix (II): Av.    1.066   0.808
Wilcoxon test                 *
Paired t-test                 17      9

MIC: Cost Matrix (III): Av.   0.936   1.492
Wilcoxon test                 =       =
Paired t-test                 15      11

MIC: Cost Matrix (IV): Av.    0.771   0.773
Wilcoxon test                 =       =
Paired t-test                 12      13

NCC can be competitive w.r.t. ICDT for some cost matrices.
Conclusions and Future Works
Part VI
Conclusions and Future Works
Conclusions and Future Works
Conclusions and Future Works
A dominance criterion which considers misclassification costs when selecting the non-dominated classes.
A cost-sensitive measure to evaluate imprecise classifiers which takes misclassification costs into account.
The performance of an imprecise classifier is affected by misclassification costs:
ICDT outperforms NCC in terms of DACC (0/1 error cost).
NCC performs competitively for some cost matrices.
Future Works:
Apply this approach to a real example.
Conclusions and Future Works
Bibliography
J. Abellán and S. Moral, “Building classification trees using the total uncertainty criterion", Int. J. of Intelligent Systems, vol. 18, no. 12, pp. 1215–1225, 2003.
J. Abellán and S. Moral, “Upper entropy of credal sets. Applications to credal classification", Int. J. of Approximate Reasoning, vol. 39, no. 2-3, pp. 235–255, 2005.
J. Abellán, “Equivalence relations among dominance concepts on probability intervals and general credal sets", Int. J. of General Systems, vol. 41, no. 2, pp. 109–122, 2012.
G. Corani and M. Zaffalon, “Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2", J. of Machine Learning Research, vol. 9, pp. 581–621, 2008.
R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis. John Wiley and Sons, New York, 1973.
P. Walley, Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, New York, 1991.
P. Walley, “Inferences from multinomial data: learning about a bag of marbles", J. Roy. Statist. Soc. B, vol. 58, pp. 3–57, 1996.
M. Zaffalon, “The naive credal classifier", J. of Statistical Planning and Inference, vol. 105, pp. 5–21, 2002.
Conclusions and Future Works
Thanks for your attention!
Any questions?