Test-Cost Sensitive Naïve Bayes Classification

1


X. Chai, L. Deng, Q. Yang
Dept. of Computer Science, The Hong Kong University of Science and Technology

C. Ling
Dept. of Computer Science, The University of Western Ontario

2

Example – Medical Diagnosis

[Figure: a patient record with five attributes: temperature (known, 39 °C), pressure, blood test, cardiogram and essay (all unknown, shown as "?").]

Is the patient healthy?

Which test should be taken first?

Which test to perform next?

Concern: cost the patient as little as possible while maintaining low mis-diagnosis risk

3

Test-Cost Sensitive Learning

Traditional inductive learning techniques (decision trees, NB) have had great success, but they:

– do not handle different types of costs during classification
– neglect the possibility of obtaining missing values in a test case by performing attribute tests

Misclassification costs (Cmc): the costs incurred by classification errors; they distinguish different types of classification errors.

Test costs (Ctest): the costs incurred by obtaining missing values of attributes.

Goal: minimize the total cost Ctotal = Cmc + Ctest

4

Some Related Work

MDP-based cost-sensitive learning (Zubek and Dietterich 2002)

– Cast as a Markov decision process
– Solutions are given in terms of optimal policies
– Very high computational cost to conduct the search

Decision trees with minimal cost (Ling et al. 2004)

– Consider both misclassification and test costs in tree building
– Splitting criterion: minimal total cost instead of InfoGain
– Attributes not appearing on the testing branch are ignored, although they are still informative for classification
– Not suitable for batch tests due to its sequential nature

5

Decision trees with minimal cost (Ling et al 2004)

Attribute selection criterion: minimal total cost (Ctotal = Cmc + Ctest) instead of minimal entropy as in C4.5.

If growing the tree gives a smaller total cost, choose the attribute with the minimal total cost. Otherwise, stop and form a leaf.

Leaves are also labelled according to minimal total cost. Suppose a leaf has P positive examples and N negative examples, FP denotes the cost of a false positive example and FN the cost of a false negative example:

If (P×FN ≥ N×FP) THEN label = positive ELSE label = negative

6

A Tree Building Example

[Figure: a node holding P:N positive and negative examples is split on attribute A, which has test cost C, into branch A = v1 with P1:N1 examples and branch A = v2 with P2:N2 examples.]

Before splitting:

Cmc = min(P×FN, N×FP)
Ctest = 0
Ctotal = Cmc + Ctest

Consider attribute A as a potential splitting attribute:

C’mc = min(P1×FN, N1×FP) + min(P2×FN, N2×FP)
C’test = (P1 + N1 + P2 + N2) × C
C’total = C’mc + C’test

• If C’total < Ctotal, splitting on A would reduce the total cost. Choose the attribute with the minimal total cost for splitting.

• If C’total ≥ Ctotal for all remaining attributes, no further sub-tree will be built, and the set becomes a leaf.
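The leaf-labelling rule and the split test can be sketched in a few lines of Python. This is only an illustration of the cost formulas on this slide, not the csDT implementation; the function names and the example counts and costs are hypothetical.

```python
def leaf_cost(p, n, fp_cost, fn_cost):
    """Misclassification cost of a leaf with p positive and n negative examples.

    Labelling the leaf positive turns the n negatives into false positives;
    labelling it negative turns the p positives into false negatives.
    """
    return min(p * fn_cost, n * fp_cost)


def leaf_label(p, n, fp_cost, fn_cost):
    """Pick the label with the smaller total misclassification cost."""
    return "positive" if p * fn_cost >= n * fp_cost else "negative"


def split_total_cost(branches, test_cost, fp_cost, fn_cost):
    """Total cost C'total after splitting on one attribute.

    branches is a list of (p_k, n_k) pairs, one per attribute value.
    """
    c_mc = sum(leaf_cost(p, n, fp_cost, fn_cost) for p, n in branches)
    c_test = sum(p + n for p, n in branches) * test_cost
    return c_mc + c_test


# Hypothetical numbers, chosen only for illustration.
FP, FN = 150, 450
before = leaf_cost(30, 70, FP, FN)                         # min(13500, 10500)
after = split_total_cost([(25, 10), (5, 60)], 20, FP, FN)  # 3750 + 2000
should_split = after < before
```

With these numbers the split lowers the total cost from 10500 to 5750, so the attribute would be a candidate for splitting.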

7

Sequential Test Strategy

Optimal Sequential Test (OST): each test example goes down the tree until it reaches an attribute whose value is unknown in the test example. The test is then performed and the missing value is revealed. The process continues until the example falls into a leaf node, whose label is used as the prediction. The total cost is the sum of the misclassification cost and the test cost.

Problems with the OST strategy:

– The algorithm chooses a locally optimal attribute without backtracking, so the OST strategy is not globally optimal.
– Attributes not appearing on the testing branch are ignored, although they are still informative for classification.
– Not suitable for batch tests due to its sequential nature.

8

Problem Formulation

Given:

– D: a training dataset of N samples {x1,…,xN} from P classes {c1,…,cP}, where each sample xi is described by M attributes (A1,…,AM), among which there can be missing values.
– C: a misclassification cost matrix. Cij = C(i,j) specifies the cost of classifying a sample from class ci as belonging to class cj.
– T: a test-cost vector. Tk = T(k) specifies the cost of taking a test on attribute Ak (1 ≤ k ≤ M).

Build:

– csNB: a cost-sensitive naïve Bayes classifier
– S: a test strategy for every new case, with the aim of minimizing the sum of the misclassification cost Cmc and the test cost Ctest

9

csNB classification

Two procedures: learning and prediction.

Learning a csNB classifier:

– Same as learning a traditional NB classifier
– Estimate the prior probabilities P(cj) and the likelihoods P(Am=vm,k|cj) from the training dataset D
– Missing values are simply ignored in likelihood computation

Prediction:

– Sequential test strategy
– Batch test strategy
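The learning step can be sketched as plain counting. The snippet below is a minimal illustration, assuming missing values are marked with None and that no smoothing is applied (the slides do not say whether smoothing is used); the function name and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

MISSING = None  # assumed marker for an unknown attribute value


def learn_nb(rows, labels):
    """Estimate P(c_j) and P(A_m = v | c_j) by counting.

    rows: list of tuples of attribute values (MISSING where unknown).
    Missing values are skipped, so each likelihood is normalised only over
    the examples in which that attribute was observed.
    """
    class_counts = Counter(labels)
    priors = {c: k / len(labels) for c, k in class_counts.items()}

    # value_counts[(m, c)][v] = number of class-c rows with A_m = v
    value_counts = defaultdict(Counter)
    for row, c in zip(rows, labels):
        for m, v in enumerate(row):
            if v is not MISSING:
                value_counts[(m, c)][v] += 1

    likelihoods = {}
    for (m, c), counts in value_counts.items():
        observed = sum(counts.values())
        for v, k in counts.items():
            likelihoods[(m, v, c)] = k / observed
    return priors, likelihoods


# Toy data: two attributes, one missing value in the second row.
rows = [("pos", "neg"), ("pos", MISSING), ("neg", "neg"), ("neg", "pos")]
labels = ["c1", "c1", "c2", "c2"]
priors, likelihoods = learn_nb(rows, labels)
```

In the toy data the second attribute is observed only once for class c1, so its likelihood is estimated from that single example.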

10

Sequential Test Strategy vs. Batch Test Strategy

What is a sequential test strategy?

– Decisions are made sequentially on whether a further test on an unknown attribute should be performed and, if so, which attribute to select, based on the values of the attributes initially known or previously tested.
– A test strategy that is designed on the fly during classification.

What is a batch test strategy?

– The selection of tests on unknown attributes must be determined in advance, before any test is carried out.
– A test strategy that is designed beforehand.

Both aim to minimize the sum of misclassification and test costs.

11

Example: Diagnosis of Hepatitis

Assume:

– 21% of patients are positive (c1, have hepatitis): P(c1) = 21%
– 79% of patients are negative (c2, healthy): P(c2) = 79%
– Classification costs: C12 = 450, C21 = 150, C11 = C22 = 0
– Four attributes describe a patient

Test costs and likelihoods of each attribute: [table not preserved in this transcript]

Suppose a patient comes with all attribute values unknown: (?,?,?,?)

Sequential test:

(?,?,?,?) → test ascites
    (?,?,?,pos) → test spiders → ……
    (?,?,?,neg) → test spleen → ……

Batch test:

(?,?,?,?) → Test{spleen, spiders, ascites} → (?,neg,neg,pos) → classify
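As a worked illustration with the priors and costs of this example, and assuming that correct classifications cost nothing, the expected misclassification cost of each possible prediction before any test is taken follows from the priors alone. Writing R(c_j) for the expected cost of predicting class c_j:

$$R(c_1) = C_{21}\,P(c_2) = 150 \times 0.79 = 118.5, \qquad R(c_2) = C_{12}\,P(c_1) = 450 \times 0.21 = 94.5$$

With no test performed, predicting negative (c2) is therefore the cheaper choice, at an expected cost of 94.5. A test is worth taking only if it is expected to reduce this figure by more than the test itself costs.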

12

Prediction with Sequential Test Strategy

Suppose x is a test example. Let $\tilde{A}$ denote the set of known attributes and $\bar{A}$ the set of unknown attributes.

The utility of testing an unknown attribute $A_i \in \bar{A}$ is defined as:

$\mathrm{Util}(A_i) = \mathrm{Gain}(\tilde{A}, A_i) - C_{test}(A_i)$

Where:

– $C_{test}(A_i)$ is the test cost of attribute $A_i$, given by $T_i$
– $\mathrm{Gain}(\tilde{A}, A_i)$ is the reduction in the expected misclassification cost if we know $A_i$'s true value

13


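$\mathrm{Gain}(\tilde{A}, A_i)$ is defined as:

$\mathrm{Gain}(\tilde{A}, A_i) = C_{mc}(\tilde{A}) - C_{mc}(\tilde{A} \cup A_i)$

Where:

– $C_{mc}(\tilde{A}) = \min_j R(c_j \mid \tilde{A}) = \min_j \sum_{i=1}^{P} C_{ij}\, P(c_i \mid \tilde{A})$ is the expected Cmc based on $\tilde{A}$
– $C_{mc}(\tilde{A} \cup A_i) = \sum_{k=1}^{|A_i|} P(A_i = v_{i,k} \mid \tilde{A})\; C_{mc}(\tilde{A} \cup \{A_i = v_{i,k}\})$ takes the expectation over all possible values of $A_i$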
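The gain of testing one attribute can be computed directly from a naïve Bayes model. The sketch below is a minimal illustration under stated assumptions: the dictionary layout, the function names, the likelihoods for "ascites" and the test cost of 10 are all hypothetical (the likelihood table of the hepatitis example is not available here); only the priors and the misclassification costs come from that example.

```python
def posterior(priors, likelihoods, known):
    """P(c | known) under the naive Bayes independence assumption.

    known maps attribute name -> observed value.
    likelihoods[attr][value][c] = P(attr = value | c).
    """
    scores = {}
    for c, p in priors.items():
        for attr, value in known.items():
            p *= likelihoods[attr][value][c]
        scores[c] = p
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def expected_mc_cost(post, cost):
    """C_mc = min_j sum_i cost[i][j] * P(c_i | known)."""
    return min(sum(cost[i][j] * post[i] for i in post) for j in post)


def gain(priors, likelihoods, cost, known, attr):
    """Gain(known, attr) = C_mc(known) - E_v[C_mc(known + {attr = v})]."""
    post = posterior(priors, likelihoods, known)
    expected_after = 0.0
    for value, per_class in likelihoods[attr].items():
        # P(attr = value | known) = sum_c P(attr = value | c) P(c | known)
        p_value = sum(per_class[c] * post[c] for c in post)
        if p_value > 0:
            new_post = posterior(priors, likelihoods, {**known, attr: value})
            expected_after += p_value * expected_mc_cost(new_post, cost)
    return expected_mc_cost(post, cost) - expected_after


priors = {"c1": 0.21, "c2": 0.79}
cost = {"c1": {"c1": 0, "c2": 450}, "c2": {"c1": 150, "c2": 0}}
# Hypothetical likelihoods for a single attribute.
likelihoods = {"ascites": {"pos": {"c1": 0.4, "c2": 0.05},
                           "neg": {"c1": 0.6, "c2": 0.95}}}
g = gain(priors, likelihoods, cost, {}, "ascites")
util = g - 10.0  # hypothetical test cost of 10
```

With these assumed likelihoods the expected misclassification cost drops from 94.5 to 62.625, a gain of 31.875, so the utility of the test is positive.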

14


Overall, an attribute $A_i$ is worth testing if testing it offers more gain, $\mathrm{Gain}(\tilde{A}, A_i)$, than the cost $C_{test}(A_i)$ it brings.

By calculating the utilities of testing all unknown attributes in $\bar{A}$, we can decide:

– Whether a further test is needed
– Which attribute to test: $A_i^* = \arg\max_i \mathrm{Util}(A_i)$

After attribute $A_i^*$ is tested, its true value is revealed and it is moved from $\bar{A}$ to $\tilde{A}$. The same procedure continues until:

– no unknown attribute is left ($\bar{A} = \emptyset$), or
– the utility of testing any unknown attribute is non-positive ($\forall i,\ \mathrm{Util}(A_i) \le 0$)

Finally, the example is predicted as class $\arg\min_j R(c_j \mid \tilde{A})$, and Ctest is the total cost of the tests performed.
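The control flow of this greedy procedure can be sketched as a short loop. This is an illustration rather than the authors' code: gain, classify and reveal are passed in as callables, and the stub versions used in the example (including their numbers) are hypothetical stand-ins for a trained csNB model and for real tests.

```python
def sequential_predict(known, unknown, test_cost, gain, classify, reveal):
    """Greedy sequential test strategy.

    known: dict attr -> value; unknown: iterable of attribute names.
    gain(known, attr) -> expected reduction in misclassification cost.
    classify(known) -> predicted class (argmin_j R(c_j | known)).
    reveal(attr) -> true value of attr, obtained by performing the test.
    Returns (predicted class, total test cost, ordered list of tests).
    """
    known = dict(known)
    unknown = set(unknown)
    spent, performed = 0.0, []
    while unknown:
        utils = {a: gain(known, a) - test_cost[a] for a in unknown}
        best = max(utils, key=utils.get)
        if utils[best] <= 0:        # no remaining test is worth its cost
            break
        known[best] = reveal(best)  # perform the test
        unknown.remove(best)
        spent += test_cost[best]
        performed.append(best)
    return classify(known), spent, performed


# Hypothetical stubs: gains shrink as more attribute values become known.
base_gain = {"ascites": 30.0, "spiders": 12.0, "spleen": 5.0}
test_cost = {"ascites": 10.0, "spiders": 5.0, "spleen": 8.0}
truth = {"ascites": "pos", "spiders": "neg", "spleen": "neg"}

label, spent, performed = sequential_predict(
    known={},
    unknown=base_gain,
    test_cost=test_cost,
    gain=lambda known, a: base_gain[a] / (1 + len(known)),
    classify=lambda known: "c1" if known.get("ascites") == "pos" else "c2",
    reveal=lambda a: truth[a],
)
```

In this toy run the loop tests ascites, then spiders, and stops because the remaining test (spleen) has negative utility.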

15

csNB-sequential-predict Algorithm

[Flowchart:]

1. Compute the utility of testing every unknown attribute.
2. Further test?
   Yes: select the unknown attribute with the highest utility to test, then return to step 1.
   No: classify.

16

Prediction with Batch Test Strategy

A natural extension of the sequential test algorithm of csNB.

All the attributes with non-negative utility are selected. The batch of attributes selected is

$A' = \{A_i \mid \mathrm{Util}(A_i) \ge 0,\ A_i \in \bar{A}\}$

and the test cost is

$C_{test} = \sum_{A_i \in A'} C_{test}(A_i)$

After $A'$ is selected, the values of these attributes are revealed and the class label is then predicted.
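The batch selection amounts to a single filtering pass, since every utility is evaluated against the same initially known attributes. A minimal sketch, with a hypothetical function name and hypothetical gains and test costs:

```python
def batch_select(known, unknown, test_cost, gain):
    """Select every unknown attribute whose utility is non-negative.

    All utilities are computed from the initially known attributes only,
    because no test result is available before the batch is chosen.
    Returns (selected attributes, total test cost of the batch).
    """
    chosen = [a for a in unknown if gain(known, a) - test_cost[a] >= 0]
    return chosen, sum(test_cost[a] for a in chosen)


# Hypothetical gains and test costs, for illustration only.
base_gain = {"ascites": 30.0, "spiders": 12.0, "spleen": 5.0}
test_cost = {"ascites": 10.0, "spiders": 5.0, "spleen": 8.0}

batch, batch_cost = batch_select(
    known={},
    unknown=["ascites", "spiders", "spleen"],
    test_cost=test_cost,
    gain=lambda known, a: base_gain[a],
)
```

Here spleen is left out because its assumed gain of 5 does not cover its test cost of 8.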

17

Experiments

Experiments were carried out on eight datasets from the UCI ML repository (Ecoli, Heart, Australia, Voting, Breast, …).

Four algorithms were implemented for comparison:

– csNB: the test-cost sensitive naïve Bayes
– csDT: the cost-sensitive decision trees proposed in Ling et al. 2004
– LNB: lazy naïve Bayes, which predicts based only on the known attributes and requires no tests to be done on any unknown attribute
– ENB: exacting naïve Bayes, which requires all the missing values to be made up before prediction

The performance of the algorithms is measured in terms of the total cost Ctotal = Cmc + Ctest, where Cmc can be obtained by comparing the predicted and true labels of the test examples.
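The evaluation measure can be sketched as follows. The function name, the data layout and the three example cases are hypothetical; only the two misclassification costs are borrowed from the hepatitis example.

```python
def average_total_cost(true_labels, predicted_labels, test_costs_paid, cost):
    """Mean of C_mc + C_test over a set of test examples.

    cost[i][j] is the cost of classifying a class-i example as class j;
    test_costs_paid[n] is the sum of test costs spent on example n.
    """
    total = 0.0
    for truth, pred, paid in zip(true_labels, predicted_labels, test_costs_paid):
        total += cost[truth][pred] + paid
    return total / len(true_labels)


cost = {"c1": {"c1": 0, "c2": 450}, "c2": {"c1": 150, "c2": 0}}
avg = average_total_cost(
    true_labels=["c1", "c2", "c2"],
    predicted_labels=["c2", "c2", "c1"],
    test_costs_paid=[15, 0, 10],
    cost=cost,
)
```

The three hypothetical cases cost 465, 0 and 160 respectively, so the average total cost is 625 / 3.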

18

Experimental Results – Sequential Test

[Figure: average total cost comparison of LNB, ENB, csNB and csDT on the Ecoli, Breast, Heart and Thyroid datasets.]

19


Average total costs comparisons on datasets: Australia, Cars, Voting, Mushroom

20


Comparison of LNB, csNB and csDT with increasing percentage of unknown attributes

Mushroom dataset

21


Compared with csDT, csNB is more effective at balancing the misclassification and test costs.

Comparison of csNB and csDT with varying test costs (missing rates are set to 20% and 60%) on the Mushroom dataset

22

Experimental Results – Batch Test

• Overall, csNB incurs 29.6% less total cost than csDT.

• csDT is inflexible in deriving batch test strategies due to its sequential nature in tree building.

• csNB has no such constraints, and all the attributes can be evaluated at the same level.

23

Conclusion and future work

We proposed a test-cost sensitive naïve Bayes algorithm for designing classifiers that minimize the sum of the misclassification cost and test costs

In the framework of csNB, attributes can be intelligently selected to design both sequential and batch test strategies.

In the future, we plan to develop more effective algorithms and consider more complicated situations where the test cost of an attribute may be conditional on other attributes.

It is also interesting to consider the cost of finding the missing values for training data

24

THANK YOU!

Q & A
