A Bootstrap Interval Estimator for Bayes’ Classification Error
Chad M. Hawes (a,b), Carey E. Priebe (a)
(a) The Johns Hopkins University, Dept. of Applied Mathematics & Statistics
(b) The Johns Hopkins University Applied Physics Laboratory
Abstract
• Given a finite-length classifier training set, we propose a new estimation approach that provides an interval estimate of the Bayes-optimal classification error L*, by:
  – Assuming power-law decay for the unconditional error rate of the k-nearest neighbor (kNN) classifier
  – Constructing bootstrap-sampled training sets of varying size
  – Evaluating the kNN classifier on the bootstrap training sets to estimate the unconditional error rate
  – Fitting the resulting kNN error-rate decay, as a function of training set size, to the assumed power-law form
• The standard kNN rule provides an upper bound on L*
• Hellman's (k,k') nearest neighbor rule with reject option provides a lower bound on L*
• The result is an asymptotic interval estimate of L* using a finite sample
• We apply this L* interval estimator to two classification datasets
Motivation
• Knowledge of the Bayes-optimal classification error L* tells us the best any classification rule could do on a given classification problem:
  – The difference between your classifier's error rate Ln and L* indicates how much improvement is possible through changes to your classifier, for a fixed feature set
  – If L* is small and |Ln − L*| is large, then it is worth spending time and money to improve your classifier
• Knowledge of L* also indicates how good our features are for discriminating between our (two) classes:
  – If L* is large and |Ln − L*| is small, then it is better to spend time and money finding better features (changing FXY) than improving your classifier
• An estimate of the Bayes error L* is therefore useful for guiding where to invest time and money: classification-rule improvement or feature development
Theory
Model & Notation
We have training data: Dn = {(X1, Y1), ..., (Xn, Yn)}
• Feature vector: Xi ∈ R^d
• Class label: Yi ∈ {0, 1}
We have testing data: Tm, of size m
We build a k-nearest neighbor (kNN) classification rule from Dn, denoted gn
Conditional probability of error for the kNN rule:
• Finite sample: Ln = P(gn(X) ≠ Y | Dn)
• Asymptotic: L∞ = lim_{n→∞} Ln
Unconditional probability of error for the kNN rule:
• Finite sample: E[Ln]
• Asymptotic: lim_{n→∞} E[Ln] = L∞
The empirical distribution puts mass 1/n on each of the n training samples (the distribution used for bootstrap resampling)
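The notation above can be made concrete with a short, numpy-only sketch of the kNN rule gn and its conditional error estimated on a test set; the function names (`knn_predict`, `conditional_error`) are illustrative, not from the poster.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """kNN rule g_n: majority vote among the k training points
    closest (Euclidean distance) to each test point."""
    # Pairwise squared distances, shape (m_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]       # indices of the k nearest neighbors
    votes = y_train[nn]                      # their class labels (0/1)
    return (votes.mean(axis=1) > 0.5).astype(int)  # majority vote

def conditional_error(X_train, y_train, X_test, y_test, k=5):
    """Estimate of the conditional error L_n = P(g_n(X) != Y | D_n),
    evaluated empirically on the held-out test set T_m."""
    return float(np.mean(knn_predict(X_train, y_train, X_test, k) != y_test))
```

Averaging `conditional_error` over many independent (or, below, bootstrap) training sets of size n estimates the unconditional error E[Ln].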
• No approach to estimate the Bayes error can work for all joint distributions FXY:
  – Devroye 1982: for any fixed integer n, any ε > 0, and any classification rule gn, there exists a distribution FXY with Bayes error L* = 0 such that E[Ln] > 1/2 − ε
  ⇒ there exist conditions on FXY for which our technique applies; it cannot be distribution-free
• Asymptotic kNN-rule error rates form an interval bound on L*:
  – Devijver 1979: for fixed k, L∞(k,k') ≤ L* ≤ L∞(k), where the lower bound is the asymptotic error rate of the kNN rule with reject option (Hellman 1970)
  ⇒ if we estimate the asymptotic rates with a finite sample, we have an L* interval estimate
• The kNN rule's unconditional error follows a known form for a class of distributions FXY:
  – Snapp & Venkatesh 1998: under regularity conditions on FXY, the finite-sample unconditional error rate of the kNN rule, for fixed k, follows the asymptotic expansion E[Ln] ~ L∞ + Σ_{j=2..N} cj n^(−j/d)
  ⇒ there exists a known parametric form for the kNN rule's error-rate decay
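Hellman's (k,k') rule referenced above can be sketched as follows: take the k nearest neighbors and output a label only if at least k' of them agree, otherwise reject. This is a minimal sketch; the function names are illustrative, and counting errors only over accepted (non-rejected) points is one common convention, assumed here.

```python
import numpy as np

def knn_reject_predict(X_train, y_train, X_test, k=5, k_prime=4):
    """Hellman's (k, k') rule: majority label among the k nearest
    neighbors, but only if at least k' neighbors agree; else reject (-1)."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    ones = y_train[nn].sum(axis=1)        # votes for class 1 among k neighbors
    pred = np.full(len(X_test), -1)       # -1 marks "reject"
    pred[ones >= k_prime] = 1             # at least k' neighbors vote 1
    pred[(k - ones) >= k_prime] = 0       # at least k' neighbors vote 0
    return pred

def reject_rule_error(X_train, y_train, X_test, y_test, k=5, k_prime=4):
    """Error rate of the (k, k') rule, counted over accepted points only."""
    pred = knn_reject_predict(X_train, y_train, X_test, k, k_prime)
    accepted = pred != -1
    if not accepted.any():
        return 0.0
    return float(np.mean(pred[accepted] != y_test[accepted]))
```

The reject option discards ambiguous points, which is why this rule's asymptotic error rate sits below L* and can serve as the lower end of the interval.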
Approach: Part 1
1. Construct B bootstrap-sampled training datasets of size nj from Dn using the empirical distribution:
   • For each bootstrap-constructed training dataset, estimate the kNN-rule conditional error rate on the test set Tm
2. Estimate the mean and variance of these conditional error rates for training sample size nj:
   • The mean provides an estimate of the unconditional error rate E[L_nj]
   • The variance is used for weighted fitting of the error-rate decay curve
3. Repeat steps 1 and 2 for each desired training sample size n1 < n2 < ... < nJ
4. Construct the estimated unconditional error-rate decay curve versus training sample size n
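The four steps above can be sketched in a self-contained, numpy-only form; the function names and defaults (B, k, seed) are illustrative assumptions, not values from the poster.

```python
import numpy as np

def _knn_error(X_tr, y_tr, X_te, y_te, k):
    # Conditional error of the kNN rule built on (X_tr, y_tr), scored on T_m.
    d2 = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :k]
    pred = (y_tr[nn].mean(axis=1) > 0.5).astype(int)
    return float(np.mean(pred != y_te))

def bootstrap_error_curve(X, y, X_test, y_test, sizes, B=20, k=5, seed=0):
    """Steps 1-4: for each size n_j, draw B bootstrap training sets of
    size n_j from the empirical distribution of D_n (sampling with
    replacement), score the kNN rule on T_m, and record the mean
    (unconditional-error estimate) and variance (later used as weights)."""
    rng = np.random.default_rng(seed)
    means, variances = [], []
    for nj in sizes:
        errs = []
        for _ in range(B):
            idx = rng.integers(0, len(X), size=nj)   # bootstrap sample of D_n
            errs.append(_knn_error(X[idx], y[idx], X_test, y_test, k))
        means.append(float(np.mean(errs)))
        variances.append(float(np.var(errs, ddof=1)))
    return np.array(means), np.array(variances)
```

The returned `means` trace the error-rate decay curve over the chosen sizes; the `variances` become the weights in the Part 2 fit.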
Approach: Part 2
1. Assume the kNN-rule error rates decay according to a simple power-law form: E[Ln] ≈ L∞ + c·n^(−α)
2. Perform a weighted nonlinear least-squares fit to the constructed error-rate curve:
   • Use the variance of the bootstrapped conditional error-rate estimates as weights
3. The resulting fitted L∞ forms the upper bound for L*:
   • The strong assumption on the form of the error-rate decay enables estimation of the asymptotic error rate using only a finite sample
4. Repeat the entire procedure using Hellman's (k,k') nearest neighbor rule with reject option to form the lower bound estimate for L*:
   • This yields the interval estimate for the Bayes classification error
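The weighted power-law fit can be sketched without scipy: for a fixed exponent α the model L∞ + c·n^(−α) is linear in (L∞, c), so we solve a weighted least-squares problem per candidate α and keep the best one. This grid-search formulation, the function name, and the α grid are assumptions for illustration; the poster does not specify the fitting algorithm.

```python
import numpy as np

def fit_power_law(sizes, err_means, err_vars, alphas=np.linspace(0.1, 2.0, 96)):
    """Weighted fit of E[L_n] ~ L_inf + c * n**(-alpha).
    Weights are inverse bootstrap variances; for each candidate alpha the
    model is linear in (L_inf, c), so weighted least squares is closed-form."""
    n = np.asarray(sizes, dtype=float)
    y = np.asarray(err_means, dtype=float)
    w = 1.0 / np.maximum(np.asarray(err_vars, dtype=float), 1e-12)
    W = np.sqrt(w)                                  # row scaling for weighted LS
    best = None
    for a in alphas:
        X = np.column_stack([np.ones_like(n), n ** (-a)])   # [1, n^-alpha]
        beta, *_ = np.linalg.lstsq(X * W[:, None], y * W, rcond=None)
        resid = w @ (y - X @ beta) ** 2             # weighted residual sum
        if best is None or resid < best[0]:
            best = (resid, beta[0], beta[1], a)
    _, L_inf, c, alpha = best
    return L_inf, c, alpha                          # L_inf is the L* bound
```

Running this on the standard-kNN curve gives the upper bound for L*, and on the (k,k') reject-rule curve gives the lower bound, yielding the interval estimate.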
PMH Distribution
The Priebe, Marchette, Healy (PMH) distribution [4] has known L* = 0.0653, d = 6:
• Training set size n = 2000
• Test set size m = 2000
• Symbols are bootstrap estimates of the unconditional error rate
• Interval estimate:
Pima Indians
The UCI Pima Indian Diabetes dataset has unknown L*, d = 8:
• Training set size n = 500
• Test set size m = 268
• Symbols are bootstrap estimates of the unconditional error rate
• Interval estimate:
References
[1] Devijver, P. "New error bounds with the nearest neighbor rule," IEEE Trans. on Information Theory, 25, 1979.
[2] Devroye, L. “Any discrimination rule can have an arbitrarily bad probability of error for finite sample size,” IEEE Trans. on Pattern Analysis & Machine Intelligence, 4, 1982.
[3] Hellman, M. “The nearest neighbor classification rule with a reject option,” IEEE Trans. on Systems Science & Cybernetics, 6, 1970.
[4] Priebe, C., D. Marchette, & D. Healy. “Integrated sensing and processing decision trees,” IEEE Trans. on Pattern Analysis & Machine Intelligence, 26, 2004.
[5] Snapp, R. & S. Venkatesh. “Asymptotic expansions of the k nearest neighbor risk,” Annals of Statistics, 26, 1998.