Page 1:

Near-Minimax Optimal Learning with Decision Trees

University of Wisconsin-Madison and Rice University

Rob Nowak and Clay Scott

Supported by the NSF and the ONR

[email protected]

Page 2:

Basic Problem

Classification: build a decision rule based on labeled training data

Given n training points, how well can we do?

Page 3:

Smooth Decision Boundaries

Suppose that the Bayes decision boundary behaves locally like a Lipschitz function

Mammen & Tsybakov ‘99
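The rate on this slide did not survive extraction. A reconstruction of the standard statement for this class (constants and the exact class definition are in Mammen & Tsybakov '99) is:

```latex
% Minimax excess risk over distributions whose Bayes decision boundary
% is locally Lipschitz in [0,1]^d (no margin assumption):
\inf_{\hat f_n}\ \sup_{P}\ \mathbb{E}\,R(\hat f_n) - R(f^*) \ \asymp\ n^{-1/d}
```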

Page 4:

Dyadic Thinking about Classification Trees

recursive dyadic partition

Page 5:

Pruned dyadic partition

Pruned dyadic tree

Dyadic Thinking about Classification Trees

Hierarchical structure facilitates optimization
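To make "recursive dyadic partition" concrete, here is a minimal sketch (all names are illustrative, not from the slides) that splits the unit hypercube at cell midpoints, cycling through the coordinates:

```python
# Minimal sketch of a complete recursive dyadic partition (RDP) of [0,1]^d.
# A pruned RDP would simply stop recursing early in some cells.

def rdp(cell, depth, max_depth):
    """Recursively split `cell` (a list of (lo, hi) intervals, one per
    coordinate) at the midpoint of coordinate depth % d, down to max_depth.
    Returns the list of leaf cells."""
    if depth == max_depth:
        return [cell]
    d = len(cell)
    axis = depth % d                      # cycle through the coordinates
    lo, hi = cell[axis]
    mid = (lo + hi) / 2
    left = list(cell);  left[axis] = (lo, mid)
    right = list(cell); right[axis] = (mid, hi)
    return rdp(left, depth + 1, max_depth) + rdp(right, depth + 1, max_depth)

# A depth-4 complete RDP of the unit square has 2^4 = 16 cells.
leaves = rdp([(0.0, 1.0), (0.0, 1.0)], 0, 4)
print(len(leaves))  # 16
```

The hierarchy is exactly what the slide points at: each cell's children are obtained by one dyadic split, so pruning decisions can be optimized bottom-up over the tree.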

Page 6:

The Classification Problem

Problem:

Page 7:

Classifiers

The Bayes Classifier:

Minimum Empirical Risk Classifier:
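The formulas on this slide did not survive extraction; the standard definitions, consistent with the setup, are:

```latex
% Bayes classifier (minimizes the true risk R(f) = P(f(X) \neq Y)):
f^*(x) = \mathbf{1}\{\eta(x) \ge 1/2\}, \qquad \eta(x) = \mathbb{P}(Y = 1 \mid X = x)

% Minimum empirical risk classifier over a class \mathcal{F}:
\hat f_n = \arg\min_{f \in \mathcal{F}} \widehat{R}_n(f), \qquad
\widehat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{f(X_i) \neq Y_i\}
```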

Page 8:

Generalization Error Bounds

Page 9:

Generalization Error Bounds

Page 10:

Generalization Error Bounds
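The bounds on these three slides were lost in extraction. A hedged reconstruction of the standard codelength-based (Occam-style) bound used in this line of work, not necessarily the slides' exact statement:

```latex
% Assign each classifier f a prefix code-length |f| satisfying Kraft's
% inequality \sum_f 2^{-|f|} \le 1. By Hoeffding's inequality and a union
% bound, with probability at least 1 - \delta, simultaneously for all f:
R(f) \ \le\ \widehat{R}_n(f) \ +\ \sqrt{\frac{|f|\log 2 + \log(1/\delta)}{2n}}
```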

Page 11:

Selecting a good h

Page 12:

Convergence to Bayes Error

Page 13:

Ex. Dyadic Classification Trees

[Figure: labeled training data; Bayes decision boundary; complete RDP; pruned RDP; the resulting dyadic classification tree]

Page 14:

Codes for DCTs

[Figure: a pruned dyadic tree with nodes labeled by code bits, 0 for an internal node and 1 for a leaf]

code-lengths: a tree with |T| leaves needs roughly 2|T| bits for its structure (one bit per node) plus one bit per leaf for its class label

ex: code 0001001111 + 6 bits for leaf labels
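The bit string in the example records a preorder traversal of the pruned tree, 0 for an internal node and 1 for a leaf, with leaf labels appended afterward. A sketch of such an encoder (names are illustrative):

```python
# Sketch: encode a pruned binary tree by preorder traversal,
# 0 = internal node, 1 = leaf; leaf labels appended as one bit each.

class Node:
    def __init__(self, left=None, right=None, label=None):
        self.left, self.right, self.label = left, right, label

    @property
    def is_leaf(self):
        return self.left is None and self.right is None

def encode(tree):
    """Return (structure_bits, label_bits) for a pruned tree."""
    structure, labels = [], []
    def visit(node):
        if node.is_leaf:
            structure.append("1")
            labels.append(str(node.label))
        else:
            structure.append("0")
            visit(node.left)
            visit(node.right)
    visit(tree)
    return "".join(structure), "".join(labels)

# A tree with |T| leaves uses 2|T| - 1 structure bits + |T| label bits.
leaf = lambda y: Node(label=y)
t = Node(Node(leaf(0), leaf(1)), leaf(1))   # 3 leaves
print(encode(t))  # ('00111', '011')
```

Such a code satisfies Kraft's inequality, which is what lets code-lengths play the role of |f| in a codelength-based generalization bound.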

Page 15:

Error Bounds for DCTs

Compare with CART:

Page 16:

Rate of Convergence

Suppose that the Bayes decision boundary behaves locally like a Lipschitz function

Mammen & Tsybakov ‘99 C. Scott & RN ‘02

Page 17:

Why too slow?

because the Bayes boundary is a (d-1)-dimensional manifold, "good" trees are unbalanced

yet all |T|-leaf trees are equally favored

Page 18:

Local Error Bounds in Classification

Spatial Error Decomposition: Mansour & McAllester ‘00

Page 19:

Relative Chernoff Bound

Page 20:

Relative Chernoff Bound
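The statement on these two slides was lost in extraction. The bound usually meant by this name (a hedged reconstruction) is the multiplicative Chernoff lower-tail bound:

```latex
% For \hat p the empirical mean of n i.i.d. Bernoulli(p) variables:
\mathbb{P}\big( \hat p \le (1 - \epsilon)\, p \big) \ \le\ e^{-n p \epsilon^2 / 2}

% Equivalently, setting the right side to \delta: with prob. at least 1-\delta,
p \ \le\ \hat p + \sqrt{\frac{2\, p \log(1/\delta)}{n}}
```

The key feature is the factor of p under the square root: cells with small probability mass incur small deviation penalties, which is what makes a spatially local error analysis pay off.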

Page 21:

Local Error Bounds in Classification

Page 22:

Bounded Densities

Page 23:

Global vs. Local

Key: local complexity is offset by small volumes!

Page 24:

Local Bounds for DCTs

Page 25:

Unbalanced Tree

J leaves, depth J-1

Global bound:

Local bound:
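The two bounds on this slide were lost in extraction. A hedged sketch of the comparison they illustrate: in the unbalanced tree with J leaves, the leaf at depth j covers a dyadic cell of volume 2^{-j}, so under a bounded density its probability mass is on the order of 2^{-j}. The global bound charges every leaf the full tree complexity, while the local bound weights each leaf's complexity by its small mass:

```latex
% Global: complexity ~ J charged uniformly,
\text{penalty}_{\text{global}} \ \sim\ \sqrt{\frac{J}{n}}

% Local: leaf at depth j has mass ~ 2^{-j} and code-length ~ j,
\text{penalty}_{\text{local}} \ \sim\ \sum_{j=1}^{J} \sqrt{\frac{2^{-j}\, j}{n}}
\ =\ O\!\left(\sqrt{\frac{1}{n}}\right)
```

The geometric series converges, so the local penalty is bounded independently of J: exactly the sense in which local complexity is offset by small volumes.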

Page 26:

Convergence to Bayes Error

Mammen & Tsybakov ‘99 C. Scott & RN ‘03
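The rates on this slide were lost in extraction; a hedged reconstruction, consistent with the talk's title and with Mammen & Tsybakov's minimax lower bound of order n^{-1/d} for Lipschitz decision boundaries:

```latex
% DCTs selected with spatially adaptive (local) penalties attain the
% minimax rate to within a logarithmic factor (C. Scott & R. Nowak '03):
\mathbb{E}\,R(\hat f_n) - R(f^*) \ = \ O\!\left( \left( \frac{\log n}{n} \right)^{1/d} \right)
```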

Page 27:

Concluding Remarks

data-dependent bound

Neural Information Processing Systems 2002, 2003

[email protected]