

1

Information Geometry on Classification

  Logistic, AdaBoost, Area under ROC curve

Shinto Eguchi

– –

ISM seminar on 17/1/2001

This talk is based on joint work with Dr J. Copas

2

Outline

Problem setting for classification

overview of classification methods

Dw classification

Dw divergence of discriminant functions

definition from NP Lemma, expected and observed expressions

examples of Dw: logistic regression, AdaBoost, area under ROC curve, hit rate, credit scoring, medical screening

structure of Dw risk functions

optimal Dw under near-logistic; implemented by cross-validation

Risk scores of skin cancer

area under ROC curve, comparison; discussion of other methods

[ http://juban.ism.ac.jp/ ]

3

Standard methods

  Fisher linear discriminant analysis [4]

  Logistic regression [ Cornfield, 1962]

  Multilayer perceptron

[http://juban.ism.ac.jp/file_ppt/公開講座(ニューラル).ppt]

New approaches

  Boosting – combining weak learners –

  AdaBoost

[http://juban.ism.ac.jp/file_ppt/公開講座(Boost).ppt]

  Support vector machine – VC dimension –

[http://juban.ism.ac.jp/file_ppt/open-svm12-21.ppt]

  Kernel method – Mercer theorem –

[http://juban.ism.ac.jp/file_ppt/主成分発表原稿.ppt]

4

Problem setting

input vector

output variable

Definition: a map C from the input space to the label set is a classifier if C is onto.

    (direct sum)

the k-th decision space
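The formulas on this slide were lost in transcription. A standard reconstruction of the setup — the symbols (X for the input space, C for the classifier, R_k for the decision regions) are my notation, not necessarily the slide's:

```latex
% Classification setup (notation reconstructed, not verbatim from the slide)
x \in \mathcal{X} \subseteq \mathbb{R}^p \quad \text{(input vector)}, \qquad
y \in \{1, \dots, K\} \quad \text{(output variable)}.

% A map C is a classifier if it is onto:
C : \mathcal{X} \to \{1, \dots, K\}.

% The decision regions partition the input space (direct sum):
\mathcal{X} = R_1 \oplus \cdots \oplus R_K, \qquad
R_k = C^{-1}(k) \quad \text{(the $k$-th decision space)}.
```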

5

Joint distribution of (x, y):

where prior distribution

conditional distribution of x given y

Probabilistic model

Misclassification

error rate

hit rate
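The distributional formulas on this slide were stripped out; the standard expressions they state, with π_y and p(x | y) as assumed notation:

```latex
% Joint distribution factored into prior and conditional
P(x, y) = \pi_y \, p(x \mid y), \qquad
\pi_y = P(y) \ \text{(prior)}, \quad
p(x \mid y) \ \text{(conditional distribution of } x \text{ given } y\text{)}.

% Misclassification
\mathrm{err}(C) = P\{\, C(x) \neq y \,\} \quad \text{(error rate)}, \qquad
\mathrm{hit}(C) = P\{\, C(x) = y \,\} = 1 - \mathrm{err}(C) \quad \text{(hit rate)}.
```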

6

discriminant function

classifier

Bayes rule: given P(x, y),

Training data (examples)

i-th input, i-th output
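The statement of the Bayes rule was lost with the slide's images; the standard form, in reconstructed notation:

```latex
% Bayes rule: assign x to the class of maximal posterior probability
C_B(x) = \operatorname*{arg\,max}_{k \in \{1, \dots, K\}} \pi_k \, p(x \mid k).

% It minimizes the error rate over all classifiers:
\mathrm{err}(C_B) \le \mathrm{err}(C) \quad \text{for every classifier } C.

% Training data (examples): n input--output pairs
(x_1, y_1), \dots, (x_n, y_n).
```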

7

output variable

Reduction of our problem to binary classification

log-likelihood ratio

discriminant function

classifier

error rate
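For the binary case, the stripped formulas are the standard ones; a reconstruction (λ and s are assumed notation):

```latex
% Binary labels
y \in \{0, 1\}.

% Log-likelihood ratio
\lambda(x) = \log \frac{p(x \mid y = 1)}{p(x \mid y = 0)}.

% Discriminant function and induced classifier
s(x) = \lambda(x) + \log \frac{\pi_1}{\pi_0}, \qquad
C(x) = \begin{cases} 1 & \text{if } s(x) > 0, \\ 0 & \text{otherwise.} \end{cases}

% Error rate as a mixture of the two conditional errors
\mathrm{err}(C) = \pi_0 \, P\{\, C(x) = 1 \mid y = 0 \,\}
               + \pi_1 \, P\{\, C(x) = 0 \mid y = 1 \,\}.
```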

8

Other loss functions for classification

Credit scoring [5]

A cost model : a profit if y = 1; loss if y = 0.

General setting

Let c(y, ŷ) be the cost of classifying y as ŷ.

The expected cost is
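Spelling out the general cost setting — the cost function c(j, k) is my notation for the symbol lost in transcription:

```latex
% c(j, k): cost of classifying an example with true label y = j as k
\mathrm{E\,cost}(C)
  = \sum_{j \in \{0,1\}} \sum_{k \in \{0,1\}}
    c(j, k)\, \pi_j \, P\{\, C(x) = k \mid y = j \,\}.

% The error rate is the special case of unit misclassification costs:
c(j, k) = \mathbf{1}\{ j \neq k \}
  \;\Longrightarrow\; \mathrm{E\,cost}(C) = \mathrm{err}(C).
```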

9

hit

correct rejection false negative

false positive

ROC (Receiver Operating Characteristic) curve
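As a concrete illustration of the ROC curve and the area under it (the code and function names are mine, not from the talk), a minimal sketch that sweeps a binary score through all thresholds, accumulating hit rate against false positive rate:

```python
# Minimal ROC/AUC sketch (illustrative, not the talk's implementation).
# Assumes distinct scores; tied scores would need to be grouped.

def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs,
    one per threshold, sweeping from the highest score down."""
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for s, y in pairs:
        if y == 1:
            tp += 1   # hit
        else:
            fp += 1   # false positive
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1,   1,   0,   1,   0,    1,   0,   0]
pts = roc_points(scores, labels)
print(auc(pts))  # 13 of the 16 positive-negative pairs are concordant
```

With no tied scores, the trapezoidal area equals the Mann–Whitney statistic: the probability that a randomly chosen y = 1 example scores higher than a randomly chosen y = 0 example.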

10

Main story

linear discriminant function

Given training data

objective function

proposed estimator

What should (U, V) be?

Logistic is OK.
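The objective function on this slide did not survive transcription. As an illustration only — this exact form is my reconstruction of the logistic-type family of [3], not a quote from the slide — an objective built from a pair of functions (U, V) applied to a linear score:

```latex
% Linear discriminant function
s_\beta(x) = \beta^{\top} x + \beta_0.

% Illustrative (U, V)-objective over the training data
L_{U,V}(\beta) = \frac{1}{n} \sum_{i :\, y_i = 1} U\!\big( -s_\beta(x_i) \big)
              + \frac{1}{n} \sum_{i :\, y_i = 0} V\!\big( s_\beta(x_i) \big).

% "Logistic is OK": U(t) = V(t) = \log(1 + e^{t}) recovers the negative
% logistic log-likelihood, since \log(1 + e^{-s}) = -\log \sigma(s) and
% \log(1 + e^{s}) = -\log(1 - \sigma(s)) for \sigma(s) = 1/(1 + e^{-s}).
```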

11

log-likelihood ratio

discriminant function

A reinterpretation of Neyman-Pearson Lemma

Proposition

Remark
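The Proposition's statement was lost with the slide images; the classical fact being reinterpreted is presumably this form of the Neyman-Pearson Lemma:

```latex
% Among all classifiers whose false positive rate does not exceed that of
% a likelihood-ratio rule, the likelihood-ratio rule has the largest hit rate.
C_c(x) = \mathbf{1}\{ \lambda(x) > c \}, \qquad
\lambda(x) = \log \frac{p(x \mid y = 1)}{p(x \mid y = 0)}.

P\{\, C(x) = 1 \mid y = 0 \,\} \le P\{\, C_c(x) = 1 \mid y = 0 \,\}
\;\Longrightarrow\;
P\{\, C(x) = 1 \mid y = 1 \,\} \le P\{\, C_c(x) = 1 \mid y = 1 \,\}.
```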

12

Proof of Proposition

13

Divergence Dw of discriminant function

Def.

Expectation expression

14

Proof

15

Sample expression given a set of training data

Minimum Dw method

for a statistical model F

16

Examples of Dw divergence

(1) logistic regression

(2) Hit rate, Credit scoring, medical screening

17

(3) Area under ROC curve

(4) AdaBoost

This Dw is the loss function of AdaBoost; cf. [7], [8].
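Written as margin losses with the labels recoded to y ∈ {-1, +1} and F the discriminant function, the two best-known losses here take their standard forms:

```latex
% Logistic regression: log-loss on the margin y F(x)
L_{\text{logistic}}(F) = \sum_{i=1}^{n} \log\!\big( 1 + e^{-y_i F(x_i)} \big).

% AdaBoost: exponential loss on the same margin, cf. [7], [8]
L_{\text{exp}}(F) = \sum_{i=1}^{n} e^{-y_i F(x_i)}.
```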

18

Structure of Dw risk functions

optimal Dw under near-logistic; implemented by cross-validation

Logistic(linear)-parametric model

model distribution of (x, y):

19

Estimating equation of minimum Dw methods

Remark

20

Cauchy-Schwarz inequality

Parametric assumption

21

Near-parametric assumption

22

Our risk function for an estimator is

But our situation is

Let

Cross-validated risk estimate

the bias term is

where

the variance term is

where the estimate is obtained from the training data by leaving the i-th example out.
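As a small runnable sketch of the leave-one-out idea (the dataset and the nearest-centroid classifier are illustrative stand-ins, not the estimator from the talk):

```python
# Leave-one-out cross-validation sketch (illustrative classifier and data):
# estimate the error rate of a nearest-centroid rule on 1-D inputs by
# refitting with each example held out in turn.

def fit_centroids(xs, ys):
    """Return (mean of class-0 inputs, mean of class-1 inputs)."""
    c0 = [x for x, y in zip(xs, ys) if y == 0]
    c1 = [x for x, y in zip(xs, ys) if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)

def classify(x, centroids):
    """Assign x to the class with the nearer centroid (ties go to 0)."""
    m0, m1 = centroids
    return 1 if abs(x - m1) < abs(x - m0) else 0

def loo_error(xs, ys):
    """Average 0-1 loss, each prediction made with the i-th example left out."""
    errors = 0
    for i in range(len(xs)):
        xs_i = xs[:i] + xs[i + 1:]
        ys_i = ys[:i] + ys[i + 1:]
        centroids = fit_centroids(xs_i, ys_i)
        errors += classify(xs[i], centroids) != ys[i]
    return errors / len(xs)

xs = [0.1, 0.3, 0.2, 0.9, 1.1, 1.0]
ys = [0,   0,   0,   1,   1,   1]
print(loo_error(xs, ys))
```

Each example is predicted by a model fitted without it, so the averaged 0-1 loss is a nearly unbiased estimate of the error rate, at the price of n refits.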

23

24

Outlier

For

25

Note :

where


References

[1] Begg, C. B., Satagopan, J. M. and Berwick, M. (1998). A new strategy for evaluating the impact of epidemiologic risk factors for cancer with applications to melanoma. J. Amer. Statist. Assoc. 93, 415-426.

[2] Berwick, M., Begg, C. B., Fine, J. A., Roush, G. C. and Barnhill, R. L. (1996). Screening for cutaneous melanoma by self skin examination. J. National Cancer Inst. 88, 17-23.

[3] Eguchi, S and Copas, J. (2000). A Class of Logistic-type Discriminant Functions. Technical Report of Department of Statistics, University of Warwick.

[4] Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179-188.

[5] Hand, D. J. and Henley, W. E. (1997). Statistical classification methods in consumer credit scoring: a review. J. Roy. Statist. Soc., A, 160, 523-541.

[6] McLachlan, G. J. (1992). Discriminant Analysis and Statistical Pattern Recognition. Wiley: New York.

[7] Schapire, R., Freund, Y., Bartlett, P. and Lee, W. S. (1998). Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Statist. 26, 1651-1686.

[8] Vapnik, V. N. (1999). The Nature of Statistical Learning Theory. Springer: New York.