Test-Cost Sensitive Naïve Bayes Classification
1
Test-Cost Sensitive Naïve Bayes Classification
X. Chai, L. Deng, Q. Yang
Dept. of Computer Science, The Hong Kong University of Science and Technology
C. Ling
Dept. of Computer Science, The University of Western Ontario
2
Example – Medical Diagnosis
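[Figure: a patient record with five attributes: temperature, pressure, blood test, cardiogram, essay. Only the temperature is known (39 °C); the other four values are unknown (?).]
Is the patient healthy?
Which test should be taken first?
Which test to perform next?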
Concern: cost the patient as little as possible while maintaining low mis-diagnosis risk
3
Test-Cost Sensitive Learning
Traditional inductive learning techniques (decision trees, NB) have had great success, but they:
– do not handle different types of costs during classification
– neglect the possibility of obtaining missing values in a test case by performing attribute tests
Misclassification costs (Cmc): the costs incurred by classification errors
– distinguish different types of classification errors
Test costs (Ctest): the costs incurred by obtaining missing values of attributes
Goal: minimize the total cost Ctotal = Cmc + Ctest
4
Some Related Work
MDP-based cost-sensitive learning (Zubek and Dietterich 2002)
– Cast as a Markov decision process
– Solutions are given in terms of optimal policies
– Very high computational cost to conduct the search
Decision trees with minimal cost (Ling et al 2004)
– Consider both misclassification and test costs in tree building
– Splitting criterion: minimal total cost instead of InfoGain
– Attributes not appearing on the testing branch are ignored, although they are still informative for classification
– Not suitable for batch tests due to its sequential nature
5
Decision trees with minimal cost (Ling et al 2004)
Attribute selection criterion: minimal total cost (Ctotal = Cmc + Ctest) instead of minimal entropy as in C4.5
If growing the tree yields a smaller total cost, choose the attribute with the minimal total cost. Otherwise, stop and form a leaf.
Label each leaf according to minimal total cost as well:
– Suppose the leaf has P positive examples and N negative examples
– FP denotes the cost of a false positive example and FN the cost of a false negative
– If (P×FN ≥ N×FP) THEN label = positive ELSE label = negative
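The leaf-labeling rule can be illustrated with a minimal Python sketch. The function name and the example counts are hypothetical; only the comparison P×FN ≥ N×FP comes from the slide.

```python
def label_leaf(p, n, fp_cost, fn_cost):
    """Label a leaf holding p positive and n negative examples by minimal cost.

    Returns (label, misclassification cost of that label).
    """
    cost_if_positive = n * fp_cost  # every negative example becomes a false positive
    cost_if_negative = p * fn_cost  # every positive example becomes a false negative
    if cost_if_negative >= cost_if_positive:
        return "positive", cost_if_positive
    return "negative", cost_if_negative


# Hypothetical leaf: 10 positives, 20 negatives, FP = 150, FN = 450.
label, cost = label_leaf(p=10, n=20, fp_cost=150, fn_cost=450)
```

Because a false negative is three times as expensive as a false positive in this made-up setting, the leaf is labeled positive even though negatives are the majority.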
6
A Tree Building Example
[Figure: a node holding P:N examples is split on attribute A (test cost C) into two children: A = v1 with P1:N1 examples and A = v2 with P2:N2 examples.]
Before splitting (the node is treated as a leaf):
Cmc = min(P×FN, N×FP)
Ctest = 0
Ctotal = Cmc + Ctest
Consider attribute A as a potential splitting attribute:
C’mc = min(P1×FN, N1×FP) + min(P2×FN, N2×FP)
C’test = (P1 + N1 + P2 + N2) × C
C’total = C’mc + C’test
• If C’total < Ctotal, splitting on A would reduce the total cost. Choose the attribute with the minimal total cost for splitting.
• If C’total ≥ Ctotal for all remaining attributes, no further sub-tree will be built, and the set becomes a leaf.
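The comparison between Ctotal and C’total above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names and all numbers are assumptions, and the split is generalized to any number of branches.

```python
def leaf_cost(p, n, fp_cost, fn_cost):
    """Cmc of a leaf with p positives and n negatives (Ctest = 0 for a leaf)."""
    return min(p * fn_cost, n * fp_cost)


def split_total_cost(branches, test_cost, fp_cost, fn_cost):
    """C'total of splitting on an attribute.

    branches: list of (P_k, N_k) counts, one pair per attribute value.
    test_cost: cost C of testing the attribute, paid once per example.
    """
    c_mc = sum(leaf_cost(p, n, fp_cost, fn_cost) for p, n in branches)
    c_test = sum(p + n for p, n in branches) * test_cost
    return c_mc + c_test


def should_split(p, n, branches, test_cost, fp_cost, fn_cost):
    """Split only if it strictly lowers the total cost."""
    return split_total_cost(branches, test_cost, fp_cost, fn_cost) < leaf_cost(
        p, n, fp_cost, fn_cost
    )


# Hypothetical node with 30 positives and 70 negatives, FP = 150, FN = 450.
branches = [(28, 5), (2, 65)]
cheap = should_split(30, 70, branches, test_cost=20, fp_cost=150, fn_cost=450)
costly = should_split(30, 70, branches, test_cost=100, fp_cost=150, fn_cost=450)
```

With these made-up counts the same split is accepted at a test cost of 20 and rejected at a test cost of 100, which is the trade-off the criterion is meant to capture.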
7
Sequential Test Strategy
Optimal Sequential Test (OST): each test example goes down the tree until it reaches an attribute whose value is unknown in the test example. The test is then performed and the missing value is revealed. The process continues until the example falls into a leaf node, whose label is used as the prediction. The total cost is the sum of the misclassification cost and the test cost.
Problems with the OST strategy:
– The algorithm chooses a locally optimal attribute without backtracking, so the OST strategy is not globally optimal.
– Attributes not appearing on the testing branch are ignored, although they are still informative for classification.
– Not suitable for batch tests due to its sequential nature.
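The OST traversal described above might look like the following sketch. The nested-dictionary tree layout, the `reveal` callback, and the toy tree are assumptions made for illustration; the slides do not specify a data structure.

```python
def ost_predict(tree, example, reveal, test_costs):
    """Walk a decision tree, paying for a test whenever a needed value is unknown.

    tree: internal nodes are {"attr": name, "children": {value: subtree}},
          leaves are {"label": class_label}.
    example: dict of attribute -> value, with None for unknown values.
    reveal: callable returning the true value of an attribute (the "test").
    Returns (predicted label, total test cost paid).
    """
    known = dict(example)
    paid = 0
    node = tree
    while "label" not in node:
        attr = node["attr"]
        if known.get(attr) is None:      # value unknown: perform the test
            known[attr] = reveal(attr)
            paid += test_costs[attr]
        node = node["children"][known[attr]]
    return node["label"], paid


# Toy tree over two attributes; structure and costs are made up.
tree = {
    "attr": "ascites",
    "children": {
        "pos": {"label": "positive"},
        "neg": {
            "attr": "spiders",
            "children": {"pos": {"label": "positive"}, "neg": {"label": "negative"}},
        },
    },
}
truth = {"ascites": "neg", "spiders": "pos"}
result = ost_predict(tree, {"ascites": "neg", "spiders": None}, truth.get,
                     {"ascites": 20, "spiders": 10})
```

Only attributes on the path actually followed are ever tested, which is why attributes off the testing branch contribute nothing to the prediction.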
8
Problem Formulation
Given:
– D: a training dataset of N samples {x1,…,xN} from P classes {c1,…,cP}, where each sample xi is described by M attributes (A1,…,AM), some of whose values may be missing
– C: a misclassification cost matrix. Cij = C(i,j) specifies the cost of classifying a sample from class ci as belonging to class cj
– T: a test-cost vector. Tk = T(k) specifies the cost of taking a test on attribute Ak (1 ≤ k ≤ M)
Build:
– csNB: a cost-sensitive naïve Bayes classifier
– S: a test strategy for every new case, with the aim of minimizing the sum of the misclassification cost Cmc and the test cost Ctest
9
csNB classification
Two procedures: learning and prediction
Learning a csNB classifier
– Same as learning a traditional NB classifier
– Estimate the prior probabilities P(cj) and the likelihoods P(Am=vm,k|cj) from the training dataset D
– Missing values are simply ignored in likelihood computation
Prediction
– Sequential test strategy
– Batch test strategy
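A minimal sketch of the learning step, assuming categorical attributes, `None` as the marker for a missing value, and plain relative-frequency estimates (no smoothing, which the slides do not mention either way). The function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_nb(samples, labels):
    """Estimate P(c) and P(A_m = v | c) by relative frequency.

    samples: list of tuples of attribute values; None marks a missing value.
    Returns (priors, likelihoods) where likelihoods[(m, c)][v] = P(A_m = v | c).
    """
    class_counts = Counter(labels)
    priors = {c: k / len(labels) for c, k in class_counts.items()}

    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for x, c in zip(samples, labels):
        for m, v in enumerate(x):
            if v is not None:            # missing values are simply skipped
                value_counts[(m, c)][v] += 1

    likelihoods = {}
    for key, counts in value_counts.items():
        observed = sum(counts.values())  # only the non-missing observations
        likelihoods[key] = {v: k / observed for v, k in counts.items()}
    return priors, likelihoods


# Toy training set with two attributes and some missing values.
samples = [("pos", "neg"), ("pos", None), ("neg", "neg"), (None, "pos")]
labels = ["c1", "c1", "c2", "c2"]
priors, likelihoods = learn_nb(samples, labels)
```

Each likelihood is normalized by the number of observed values for that attribute and class, so a missing entry neither adds to nor dilutes any estimate.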
10
Sequential Test Strategy vs. Batch Test Strategy
What is a sequential test strategy?
– Decisions are made sequentially on whether a further test on an unknown attribute should be performed and, if so, which attribute to select, based on the values of the attributes initially known or previously tested.
– A test strategy that is designed on the fly during classification.
What is a batch test strategy?
– The tests on unknown attributes must be selected in advance, before any test is carried out.
– A test strategy that is designed beforehand.
Both aim to minimize the sum of misclassification and test costs.
11
Example: Diagnosis of Hepatitis
Assume:
– 21% of patients are positive (c1, have hepatitis): P(c1) = 21%
– 79% of patients are negative (c2, healthy): P(c2) = 79%
– Misclassification costs: C12 = 450, C21 = 150, C11 = C22 = 0
– Four attributes describe a patient
[Table: test costs and likelihoods of each attribute]
Suppose a patient comes with all attribute values unknown: (?,?,?,?)
Sequential test:
[Figure: starting from (?,?,?,?), test ascites. The outcome, (?,?,?,pos) or (?,?,?,neg), determines the next test (test spiders or test spleen), and so on.]
Batch test:
[Figure: starting from (?,?,?,?), test {spleen, spiders, ascites} together, obtain (?,neg,neg,pos), then classify.]
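As a worked illustration of the starting point in this example, consider the expected misclassification cost before any test is taken. Writing R(cj) for the expected cost of predicting class cj from the priors alone, and taking the costs of correct predictions as zero:

$$R(c_1) = C_{21}\,P(c_2) = 150 \times 0.79 = 118.5, \qquad R(c_2) = C_{12}\,P(c_1) = 450 \times 0.21 = 94.5$$

With no attribute known, predicting negative (c2) is therefore the cheaper choice, at an expected misclassification cost of 94.5. Any test has to reduce this expected cost by more than its own price to be worth taking.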
12
Prediction with Sequential Test Strategy
Suppose x is a test example. Let $\tilde{A}$ denote the set of known attributes and $A$ the set of unknown attributes.
The utility of testing an unknown attribute $A_i \in A$ is defined as:
$$Util(A_i) = Gain(\tilde{A}, A_i) - C_{test}(A_i)$$
where:
– $C_{test}(A_i)$ is the test cost of attribute $A_i$, given by $T_i$
– $Gain(\tilde{A}, A_i)$ is the reduction in the expected misclassification cost if we know $A_i$'s true value
13
Prediction with Sequential Test Strategy
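$Gain(\tilde{A}, A_i)$ is defined as:
$$Gain(\tilde{A}, A_i) = C_{mc}(\tilde{A}) - C_{mc}(\tilde{A} \cup A_i)$$
where:
$$C_{mc}(\tilde{A}) = \min_j R(c_j \mid \tilde{A}) = \min_j \sum_{i=1}^{P} C_{ij}\, P(c_i \mid \tilde{A})$$
is the expected Cmc based on $\tilde{A}$, and
$$C_{mc}(\tilde{A} \cup A_i) = \sum_{k=1}^{|A_i|} P(A_i = v_{i,k} \mid \tilde{A})\; C_{mc}(\tilde{A} \cup \{A_i = v_{i,k}\})$$
takes the expectation over all possible values of $A_i$ ($|A_i|$ is the number of possible values).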
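The gain definition can be made concrete with a short Python sketch. It assumes a naïve Bayes model stored as nested dictionaries, so that P(Ai = v | Ã) is obtained by summing P(c | Ã) × P(Ai = v | c) over the classes. The priors and costs follow the hepatitis example; the likelihoods for "ascites" are invented purely for illustration.

```python
def posterior(priors, likelihoods, known):
    """P(c | known) under the naive Bayes independence assumption."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for attr, value in known.items():
            score *= likelihoods[attr][c][value]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def c_mc(post, cost):
    """Expected misclassification cost: min_j sum_i C_ij * P(c_i | known)."""
    return min(sum(cost[i][j] * post[i] for i in post) for j in post)


def gain(priors, likelihoods, cost, known, attr):
    """C_mc(known) minus the expected C_mc after learning the value of attr."""
    post = posterior(priors, likelihoods, known)
    values = next(iter(likelihoods[attr].values()))  # value set of attr
    expected_after = 0.0
    for v in values:
        # P(attr = v | known) = sum_c P(c | known) * P(attr = v | c)
        p_v = sum(post[c] * likelihoods[attr][c][v] for c in post)
        post_v = posterior(priors, likelihoods, {**known, attr: v})
        expected_after += p_v * c_mc(post_v, cost)
    return c_mc(post, cost) - expected_after


priors = {"c1": 0.21, "c2": 0.79}
cost = {"c1": {"c1": 0, "c2": 450}, "c2": {"c1": 150, "c2": 0}}  # cost[true][predicted]
likelihoods = {  # hypothetical values, not taken from the slides
    "ascites": {"c1": {"pos": 0.6, "neg": 0.4}, "c2": {"pos": 0.1, "neg": 0.9}},
}
g = gain(priors, likelihoods, cost, {}, "ascites")
utility = g - 20  # assuming a hypothetical test cost of 20
```

Under these made-up likelihoods the gain is about 44.85, so the test would have positive utility at any test cost below that figure.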
14
Prediction with Sequential Test Strategy
Overall, an attribute $A_i$ is worth testing if the gain it offers, $Gain(\tilde{A}, A_i)$, exceeds the cost it brings, $C_{test}(A_i)$.
By calculating the utilities of testing all unknown attributes in $A$, we can decide:
– whether a further test is needed
– which attribute to test: $A_i^* = \arg\max_i Util(A_i)$
After attribute $A_i^*$ is tested, its true value is revealed and it is moved from $A$ to $\tilde{A}$. The same procedure continues until:
– no unknown attribute is left ($A = \emptyset$), or
– the utility of testing any unknown attribute is non-positive ($\forall i,\ Util(A_i) \le 0$)
Finally, the example is predicted as class $\arg\min_j R(c_j \mid \tilde{A})$, and Ctest is the total cost of the tests performed.
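The control flow of this procedure can be sketched in Python. To keep the sketch short, the utility function, the test oracle, and the classifier are passed in as callables; their names, the toy utilities, and the toy costs are all assumptions rather than anything given in the slides.

```python
def sequential_predict(known, unknown, util, perform_test, test_cost, classify):
    """Greedy sequential test loop.

    util(known, attr): utility of testing attr given the current known values.
    perform_test(attr): returns the true value of attr.
    classify(known): returns the minimum-risk class given the known values.
    Returns (predicted class, total test cost, final known values).
    """
    known, unknown = dict(known), set(unknown)
    c_test = 0
    while unknown:
        # sorted() only makes tie-breaking deterministic
        best = max(sorted(unknown), key=lambda a: util(known, a))
        if util(known, best) <= 0:       # no test is worth its cost: stop
            break
        known[best] = perform_test(best)  # reveal the value, move attr to known
        unknown.remove(best)
        c_test += test_cost[best]
    return classify(known), c_test, known


# Toy stand-ins: utilities shrink as more values become known.
base = {"ascites": 25, "spiders": 12, "spleen": 5}
toy_util = lambda known, a: base[a] - 10 * len(known)
truth = {"ascites": "pos", "spiders": "neg", "spleen": "neg"}
costs = {"ascites": 20, "spiders": 10, "spleen": 10}
toy_classify = lambda known: "c1" if known.get("ascites") == "pos" else "c2"

label, c_test, final_known = sequential_predict(
    {}, base.keys(), toy_util, truth.get, costs, toy_classify
)
```

In a full csNB implementation the `util` callable would compute the gain from the naïve Bayes posteriors minus the test cost; the loop itself would stay the same.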
15
csNB-sequential-predict Algorithm
[Flowchart: compute the utility of testing every unknown attribute, then ask "further test?". If yes, select the unknown attribute with the highest utility to test and repeat; if no, classify.]
16
Prediction with Batch Test Strategy
A natural extension of the sequential test algorithm of csNB.
All the attributes with non-negative utility are selected. The batch of selected attributes is
$$A' = \{A_i \mid Util(A_i) \ge 0,\ A_i \in A\}$$
and the test cost is
$$C_{test} = \sum_{A_i \in A'} C_{test}(A_i)$$
After $A'$ is selected, the values of these attributes are revealed and the class label is then predicted.
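The batch selection rule reduces to a filter over precomputed utilities, as in this minimal sketch. The function name, the utilities, and the test costs are hypothetical; the utilities are assumed to have been computed once from the initially known attributes.

```python
def select_batch(utilities, test_costs):
    """Pick every unknown attribute with non-negative utility.

    utilities: dict of attribute -> Util(A_i), computed before any test is run.
    Returns (selected attributes, total test cost of the batch).
    """
    batch = sorted(a for a, u in utilities.items() if u >= 0)
    return batch, sum(test_costs[a] for a in batch)


# Made-up utilities and test costs for three unknown attributes.
utilities = {"ascites": 24.85, "spiders": 0.0, "spleen": -3.2}
test_costs = {"ascites": 20, "spiders": 10, "spleen": 10}
batch, c_test = select_batch(utilities, test_costs)
```

Unlike the sequential loop, nothing is recomputed after a value is revealed, so the whole batch can be ordered in one step.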
17
Experiments
Experiments were carried out on eight datasets from the UCI ML repository (Ecoli, Heart, Australia, Voting, Breast, …).
Four algorithms were implemented for comparison:
– csNB: the test-cost sensitive naïve Bayes
– csDT: the cost-sensitive decision trees proposed in Ling et al 2004
– LNB: lazy naïve Bayes, which predicts based only on the known attributes and requires no tests on any unknown attribute
– ENB: exacting naïve Bayes, which requires all the missing values to be obtained through tests before prediction
The performance of the algorithms is measured in terms of the total cost Ctotal = Cmc + Ctest, where Cmc can be obtained by comparing the predicted and true labels of the test examples.
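The evaluation measure can be sketched as follows, assuming the cost matrix is stored as `cost[true][predicted]` and that the test cost paid for each test example has been recorded. The function name and the three toy examples are made up.

```python
def average_total_cost(true_labels, predicted_labels, test_costs_paid, cost):
    """Average of Ctotal = Cmc + Ctest over a set of test examples.

    cost[t][p]: cost of predicting class p for an example of true class t.
    test_costs_paid: test cost spent on each example before prediction.
    """
    c_mc = sum(cost[t][p] for t, p in zip(true_labels, predicted_labels))
    c_test = sum(test_costs_paid)
    return (c_mc + c_test) / len(true_labels)


cost = {"c1": {"c1": 0, "c2": 450}, "c2": {"c1": 150, "c2": 0}}
avg = average_total_cost(
    true_labels=["c1", "c2", "c2"],
    predicted_labels=["c2", "c2", "c1"],
    test_costs_paid=[30, 0, 10],
    cost=cost,
)
```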
18
Experimental Results – Sequential Test
Average total costs comparisons on datasets: Ecoli, Breast, Heart, Thyroid
[Figure legend: LNB, ENB, csNB, csDT]
19
Experimental Results – Sequential Test
Average total costs comparisons on datasets: Australia, Cars, Voting, Mushroom
20
Experimental Results – Sequential Test
Comparison of LNB, csNB and csDT with an increasing percentage of unknown attributes on the Mushroom dataset
21
Experimental Results – Sequential Test
Compared with csDT, csNB is more effective at balancing the misclassification and test costs.
Comparison of csNB and csDT with varying test costs (missing rates are set to 20% and 60%) on the Mushroom dataset
22
Experimental Results – Batch Test
• Overall, csNB incurs 29.6% less total cost than csDT.
• csDT is inflexible in deriving batch test strategies due to the sequential nature of tree building.
• csNB has no such constraint, and all the attributes can be evaluated at the same level.
23
Conclusion and future work
We proposed a test-cost sensitive naïve Bayes algorithm for designing classifiers that minimize the sum of the misclassification cost and the test cost.
In the framework of csNB, attributes can be intelligently selected to design both sequential and batch test strategies.
In the future, we plan to develop more effective algorithms and consider more complicated situations where the test cost of an attribute may be conditional on other attributes.
It would also be interesting to consider the cost of finding the missing values for the training data.
24
THANK YOU!
Q & A