Transcript of "Information Geometry on Classification: Logistic, AdaBoost, Area under ROC curve" by Shinto Eguchi

Page 1

Information Geometry on Classification

  Logistic, AdaBoost, Area under ROC curve

Shinto Eguchi

ISM seminar on 17/1/2001

This talk is based on joint work with Dr J. Copas.

Page 2

Outline

Problem setting for classification

overview of classification methods

Dw classification

Dw divergence of discriminant functions

definition from the NP Lemma; expected and observed expressions

examples of Dw: logistic regression, AdaBoost, area under ROC curve, hit rate, credit scoring, medical screening

structure of Dw risk functions

optimal Dw under a near-logistic assumption, implemented by cross-validation

Risk scores of skin cancer

area under ROC curve; comparison and discussion of other methods

[ http://juban.ism.ac.jp/ ]

Page 3

Standard methods

  Fisher linear discriminant analysis [4]

  Logistic regression [Cornfield, 1962]

  Multilayer perceptron

[http://juban.ism.ac.jp/file_ppt/公開講座(ニューラル).ppt]

New approaches

  Boosting – combining weak learners –

  AdaBoost

[http://juban.ism.ac.jp/file_ppt/公開講座(Boost).ppt]

  Support vector machine – VC dimension –

[http://juban.ism.ac.jp/file_ppt/open-svm12-21.ppt]

  Kernel method – Mercer theorem –

[http://juban.ism.ac.jp/file_ppt/主成分発表原稿.ppt]

Page 4

Problem setting

input vector x ∈ R^p

output variable y ∈ {1, ..., g}

Definition: a map C : R^p → {1, ..., g} is a classifier if C is onto.

R^p = R_1 ⊕ ... ⊕ R_g (direct sum), where R_k = C^{-1}(k) is the k-th decision space.

Page 5

Joint distribution of x, y:

P(x, y) = π(y) p(x | y)

where π(y) is the prior distribution of y and p(x | y) is the conditional distribution of x given y.

Probabilistic model

Misclassification

error rate: P(C(x) ≠ y)

hit rate

Page 6

discriminant function

classifier

Bayes rule: given P(x, y), assign x to the class with the largest posterior probability P(y | x).

Training data (examples): (x_1, y_1), ..., (x_n, y_n), where x_i is the i-th input and y_i is the i-th output.
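For reference, the Bayes rule in this notation can be written as follows (the standard form implied by "given P(x, y)", together with its usual optimality property):

```latex
C_{\mathrm{Bayes}}(x)
  = \operatorname*{arg\,max}_{k}\, P(y = k \mid x)
  = \operatorname*{arg\,max}_{k}\, \pi(k)\, p(x \mid k),
\qquad
P\bigl(C_{\mathrm{Bayes}}(x) \neq y\bigr) \;\le\; P\bigl(C(x) \neq y\bigr)
\ \text{for every classifier } C.
```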

Page 7

output variable y ∈ {0, 1}

Reduction of our problem to binary classification

log-likelihood ratio: λ(x) = log{ p(x | y = 1) / p(x | y = 0) }

discriminant function F(x)

classifier: predict y = 1 when the discriminant function exceeds a threshold, y = 0 otherwise

error rate
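To make these definitions concrete, here is a minimal sketch (not from the slides) assuming univariate Gaussian class-conditional densities with known parameters, so the log-likelihood ratio has a closed form; the classifier thresholds it at zero and the error rate is estimated by simulation:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Class-conditional densities p(x | y): two Gaussians with equal variance.
mu0, mu1, sigma = 0.0, 2.0, 1.0
pi1 = 0.5  # prior P(y = 1)

def log_likelihood_ratio(x):
    """lambda(x) = log p(x | y=1) - log p(x | y=0)."""
    return norm.logpdf(x, mu1, sigma) - norm.logpdf(x, mu0, sigma)

def classify(x, threshold=0.0):
    """Predict y = 1 when the discriminant function exceeds the threshold."""
    return (log_likelihood_ratio(x) > threshold).astype(int)

# Simulate data from the joint distribution P(x, y) = pi(y) p(x | y).
n = 10_000
y = rng.binomial(1, pi1, size=n)
x = np.where(y == 1, rng.normal(mu1, sigma, n), rng.normal(mu0, sigma, n))

error_rate = np.mean(classify(x) != y)
print(f"estimated error rate: {error_rate:.3f}")
```

With equal priors the zero threshold is the Bayes rule for this pair of densities, so the printed rate approximates the Bayes error (about 0.16 for this configuration).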

Page 8

Other loss functions for classification

Credit scoring [5]

A cost model: a profit if y = 1, a loss if y = 0.

General setting

Let c(k, l) be the cost of classifying an example whose true class is y = k as class l.

The expected cost is E[c(y, C(x))] = Σ_k Σ_l c(k, l) P(y = k, C(x) = l).
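A small numerical sketch of this expected-cost calculation; the cost matrix, labels and predictions below are invented for illustration:

```python
import numpy as np

# cost[k, l] = cost of classifying a true class-k example as class l (illustrative values).
# Row 0: true y = 0 (e.g. a bad credit risk), row 1: true y = 1 (a good risk).
cost = np.array([[0.0, 5.0],    # false positive: lend to a bad risk -> loss
                 [1.0, 0.0]])   # false negative: refuse a good risk -> foregone profit

def expected_cost(y_true, y_pred, cost):
    """Empirical version of E[c(y, C(x))]: average cost over the sample."""
    return cost[y_true, y_pred].mean()

# Toy data: true labels and the decisions of some classifier C.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 1])
print(expected_cost(y_true, y_pred, cost))   # (5 + 1 + 5) / 8 = 1.375
```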

Page 9

The four outcomes of a binary decision:

                 classified as 1      classified as 0
  true y = 1     hit                  false negative
  true y = 0     false positive       correct rejection

ROC (Receiver Operating Characteristic) curve: the hit rate P(classified as 1 | y = 1) plotted against the false-positive rate P(classified as 1 | y = 0) as the classification threshold varies.
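A minimal sketch of computing the ROC curve and the area under it from an arbitrary score; the scores and labels are invented, and the code does not involve the Dw machinery of the talk:

```python
import numpy as np

def roc_curve(scores, labels):
    """Hit rate (TPR) and false-positive rate (FPR) as the threshold sweeps the scores."""
    order = np.argsort(-scores)            # sort by decreasing score
    labels = labels[order]
    tps = np.cumsum(labels == 1)           # hits when the threshold sits just below each score
    fps = np.cumsum(labels == 0)
    tpr = tps / max(tps[-1], 1)            # guard against a class being absent
    fpr = fps / max(fps[-1], 1)
    return np.r_[0.0, fpr], np.r_[0.0, tpr]

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

scores = np.array([0.9, 0.8, 0.35, 0.7, 0.2, 0.6, 0.4, 0.1])
labels = np.array([1,   1,   0,    1,   0,   0,   1,   0  ])
fpr, tpr = roc_curve(scores, labels)
print(auc(fpr, tpr))    # 0.9375: with no ties this equals P(score of a positive > score of a negative)
```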

Page 10

Main story

linear discriminant function

Given training data (x_1, y_1), ..., (x_n, y_n)

objective function

proposed estimator

What is (U, V)?

Logistic is OK (logistic regression corresponds to one admissible choice of (U, V)).
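The objective function and proposed estimator are only named above, so the following is a rough sketch of how a (U, V)-type criterion for a linear discriminant function could be minimised numerically. The function names and the two (U, V) pairs shown (logistic and exponential) are assumptions for illustration, not the definitions used in the talk:

```python
import numpy as np
from scipy.optimize import minimize

# A criterion of the (U, V) type for a linear discriminant F(x) = a @ x + b:
#   L(a, b) = sum over {i : y_i = 1} of U(F(x_i)) + sum over {i : y_i = 0} of V(F(x_i)).
# The two pairs below are the usual logistic and exponential (AdaBoost-style) losses;
# they are illustrative stand-ins, not necessarily the (U, V) defined in the talk.
UV_PAIRS = {
    "logistic":    (lambda t: np.logaddexp(0.0, -t), lambda t: np.logaddexp(0.0, t)),
    "exponential": (lambda t: np.exp(-t),            lambda t: np.exp(t)),
}

def fit_linear_discriminant(x, y, pair="logistic"):
    """Minimise the (U, V) criterion over (a, b) by a generic quasi-Newton method."""
    U, V = UV_PAIRS[pair]
    x1, x0 = x[y == 1], x[y == 0]

    def objective(theta):
        a, b = theta[:-1], theta[-1]
        return U(x1 @ a + b).sum() + V(x0 @ a + b).sum()

    theta0 = np.zeros(x.shape[1] + 1)
    return minimize(objective, theta0, method="BFGS").x

# Toy data: two Gaussian classes in the plane.
rng = np.random.default_rng(1)
x = np.r_[rng.normal(0.0, 1.0, (100, 2)), rng.normal(1.5, 1.0, (100, 2))]
y = np.r_[np.zeros(100, dtype=int), np.ones(100, dtype=int)]
print(fit_linear_discriminant(x, y, "logistic"))   # fitted (a_1, a_2, b)
```

With the logistic pair this criterion is exactly the negative log-likelihood of linear logistic regression; with the exponential pair it is the exponential loss that AdaBoost is known to minimise in function space.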

Page 11

log-likelihood ratio

discriminant function

A reinterpretation of the Neyman–Pearson Lemma

Proposition

Remark
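For reference, the classical Neyman–Pearson property of the log-likelihood ratio λ(x), written in the notation above (the textbook form, which the slide reinterprets, not necessarily the slide's own Proposition):

```latex
\text{If } P\bigl(F(x) > c_F \mid y = 0\bigr) = P\bigl(\lambda(x) > c_\lambda \mid y = 0\bigr)
\quad\text{(equal false-positive rates),}
\\[2pt]
\text{then}\quad
P\bigl(F(x) > c_F \mid y = 1\bigr) \;\le\; P\bigl(\lambda(x) > c_\lambda \mid y = 1\bigr)
\quad\text{(the hit rate of } \lambda \text{ is maximal).}
```

Equivalently, the ROC curve of λ(x) lies above the ROC curve of every other discriminant function F(x).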

Page 12

Proof of Proposition

Page 13

Divergence Dw of a discriminant function

Def.

Expectation expression

Page 14

Proof

Page 15

Sample expression given a set of training data

Minimum Dw method

for a statistical model F

Page 16

Examples of Dw divergence

(1) logistic regression

(2) Hit rate, credit scoring, medical screening

Page 17

(3) Area under ROC curve

(4) AdaBoost

This Dw is the loss function of AdaBoost, cf. [7], [8].
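For concreteness, here is a minimal textbook AdaBoost with decision stumps. This is the standard algorithm, included only to make the link with the exponential loss tangible; it is not the Dw-based formulation of the talk, and all function names are illustrative:

```python
import numpy as np

def stump_predict(x, feature, threshold, sign):
    """A decision stump: sign * (+1 if x[:, feature] > threshold else -1)."""
    return sign * np.where(x[:, feature] > threshold, 1, -1)

def fit_stump(x, y, w):
    """Pick the stump with the smallest weighted error; y in {-1, +1}, w sums to 1."""
    best = None
    for feature in range(x.shape[1]):
        for threshold in np.unique(x[:, feature]):
            for sign in (+1, -1):
                err = np.sum(w * (stump_predict(x, feature, threshold, sign) != y))
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best

def adaboost(x, y, n_rounds=20):
    """Returns a list of (alpha, feature, threshold, sign); y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, sign = fit_stump(x, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = stump_predict(x, feature, threshold, sign)
        w *= np.exp(-alpha * y * pred)          # reweight: mistakes get heavier
        w /= w.sum()
        ensemble.append((alpha, feature, threshold, sign))
    return ensemble

def predict(ensemble, x):
    """Sign of the weighted vote F(x) = sum of alpha * stump(x)."""
    F = sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble)
    return np.sign(F)

# Toy check on two Gaussian clouds, labels in {-1, +1}.
rng = np.random.default_rng(2)
x = np.r_[rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))]
y = np.r_[-np.ones(100), np.ones(100)]
model = adaboost(x, y, n_rounds=20)
print("training error:", np.mean(predict(model, x) != y))
```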

Page 18

Structure of Dw risk functions

optimal Dw under a near-logistic assumption, implemented by cross-validation

Logistic (linear) parametric model

model distribution of x, y:
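The heading suggests the standard logistic-linear specification; written out explicitly (this form, with parameters α and β, is an assumption about the slide's model):

```latex
P(y = 1 \mid x) = \frac{\exp(\alpha^{\top} x + \beta)}{1 + \exp(\alpha^{\top} x + \beta)},
\qquad
P(y = 0 \mid x) = \frac{1}{1 + \exp(\alpha^{\top} x + \beta)}.
```

Under this model the log-likelihood ratio λ(x) is linear in x up to an additive constant (the log prior odds).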

Page 19

Estimating equation of the minimum Dw method

Remark

Page 20

Cauchy–Schwarz inequality

Parametric assumption

Page 21

Near-parametric assumption

Page 22

Our risk function of an estimator is

But our situation is

Let

Cross-validated risk estimate: bias term + variance term

the bias term is

where

the variance term is

where the estimate is obtained from the training data by leaving the i-th example out.
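A sketch of the leave-one-out idea behind such a cross-validated risk estimate; it uses a deliberately simple nearest-class-mean classifier and zero-one loss, since the talk's Dw-based risk and its bias and variance terms are not spelled out above:

```python
import numpy as np

def fit_nearest_mean(x, y):
    """A very simple classifier: remember the two class means."""
    return x[y == 0].mean(axis=0), x[y == 1].mean(axis=0)

def predict_nearest_mean(model, x):
    """Assign each point to the class with the nearer mean."""
    m0, m1 = model
    d0 = np.sum((x - m0) ** 2, axis=1)
    d1 = np.sum((x - m1) ** 2, axis=1)
    return (d1 < d0).astype(int)

def loo_risk(x, y, fit, predict, loss):
    """Leave-one-out estimate of E[loss]: refit n times, each time leaving one example out."""
    n = len(y)
    losses = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = fit(x[keep], y[keep])
        losses[i] = loss(y[i], predict(model, x[i:i + 1])[0])
    return losses.mean()

zero_one = lambda y_true, y_pred: float(y_true != y_pred)

rng = np.random.default_rng(3)
x = np.r_[rng.normal(0.0, 1.0, (50, 2)), rng.normal(1.5, 1.0, (50, 2))]
y = np.r_[np.zeros(50, dtype=int), np.ones(50, dtype=int)]
print("LOO error-rate estimate:", loo_risk(x, y, fit_nearest_mean, predict_nearest_mean, zero_one))
```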

Page 23

Page 24

Outlier

For

Page 25

Note:

where

Page 26

Page 27

Page 28

Page 29

Page 30

Page 31

Page 32

Page 33

Page 34

References

[1] Begg, C. B., Satagopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc. 93, 415-426.

[2] Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst. 88, 17-23.

[3] Eguchi, S. and Copas, J. (2000). A class of logistic-type discriminant functions. Technical Report, Department of Statistics, University of Warwick.

[4] Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179-188.

[5] Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc., A, 160, 523-541.

[6] McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley: New York.

[7] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26, 1651-1686.

[8] Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer: New York.