Page 1:

Near-Minimax Optimal Learning with Decision Trees

University of Wisconsin-Madison and Rice University

Rob Nowak and Clay Scott

Supported by the NSF and the ONR

[email protected]

Page 2:

Basic Problem

Classification: build a decision rule based on labeled training data

Given n training points, how well can we do?

Page 3:

Smooth Decision Boundaries

Suppose that the Bayes decision boundary behaves locally like a Lipschitz function

Mammen & Tsybakov ‘99
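The rate on this slide did not survive extraction. A reconstruction of the standard statement for this class (constants and the exact class definition are in Mammen & Tsybakov '99) is:

```latex
% Minimax excess risk over distributions whose Bayes decision boundary
% is locally Lipschitz in [0,1]^d (no margin assumption):
\inf_{\hat f_n}\ \sup_{P}\ \mathbb{E}\,R(\hat f_n) - R(f^*) \ \asymp\ n^{-1/d}
```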

Page 4:

Dyadic Thinking about Classification Trees

recursive dyadic partition

Page 5:

Pruned dyadic partition

Pruned dyadic tree

Dyadic Thinking about Classification Trees

Hierarchical structure facilitates optimization
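To make "recursive dyadic partition" concrete, here is a minimal sketch (all names are illustrative, not from the slides) that splits the unit hypercube at cell midpoints, cycling through the coordinates:

```python
# Minimal sketch of a complete recursive dyadic partition (RDP) of [0,1]^d.
# A pruned RDP would simply stop recursing early in some cells.

def rdp(cell, depth, max_depth):
    """Recursively split `cell` (a list of (lo, hi) intervals, one per
    coordinate) at the midpoint of coordinate depth % d, down to max_depth.
    Returns the list of leaf cells."""
    if depth == max_depth:
        return [cell]
    d = len(cell)
    axis = depth % d                      # cycle through the coordinates
    lo, hi = cell[axis]
    mid = (lo + hi) / 2
    left = list(cell);  left[axis] = (lo, mid)
    right = list(cell); right[axis] = (mid, hi)
    return rdp(left, depth + 1, max_depth) + rdp(right, depth + 1, max_depth)

# A depth-4 complete RDP of the unit square has 2^4 = 16 cells.
leaves = rdp([(0.0, 1.0), (0.0, 1.0)], 0, 4)
print(len(leaves))  # 16
```

The hierarchy is exactly what the slide points at: each cell's children are obtained by one dyadic split, so pruning decisions can be optimized bottom-up over the tree.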

Page 6:

The Classification Problem

Problem:

Page 7:

Classifiers

The Bayes Classifier:

Minimum Empirical Risk Classifier:
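The formulas on this slide did not survive extraction; the standard definitions, consistent with the setup, are:

```latex
% Bayes classifier (minimizes the true risk R(f) = P(f(X) \neq Y)):
f^*(x) = \mathbf{1}\{\eta(x) \ge 1/2\}, \qquad \eta(x) = \mathbb{P}(Y = 1 \mid X = x)

% Minimum empirical risk classifier over a class \mathcal{F}:
\hat f_n = \arg\min_{f \in \mathcal{F}} \widehat{R}_n(f), \qquad
\widehat{R}_n(f) = \frac{1}{n}\sum_{i=1}^n \mathbf{1}\{f(X_i) \neq Y_i\}
```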

Page 8:

Generalization Error Bounds

Page 9:

Generalization Error Bounds

Page 10:

Generalization Error Bounds
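The bounds on these three slides were lost in extraction. A hedged reconstruction of the standard codelength-based (Occam-style) bound used in this line of work, not necessarily the slides' exact statement:

```latex
% Assign each classifier f a prefix code-length |f| satisfying Kraft's
% inequality \sum_f 2^{-|f|} \le 1. By Hoeffding's inequality and a union
% bound, with probability at least 1 - \delta, simultaneously for all f:
R(f) \ \le\ \widehat{R}_n(f) \ +\ \sqrt{\frac{|f|\log 2 + \log(1/\delta)}{2n}}
```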

Page 11:

Selecting a good h

Page 12:

Convergence to Bayes Error

Page 13:

Ex. Dyadic Classification Trees

[Figure: labeled training data; Bayes decision boundary; complete RDP; pruned RDP; the resulting dyadic classification tree]

Page 14:

Codes for DCTs

[Figure: a pruned dyadic tree with nodes labeled by code bits, 0 for an internal node and 1 for a leaf]

code-lengths: a tree with |T| leaves needs roughly 2|T| bits for its structure (one bit per node) plus one bit per leaf for its class label

ex: code 0001001111 + 6 bits for leaf labels
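The bit string in the example records a preorder traversal of the pruned tree, 0 for an internal node and 1 for a leaf, with leaf labels appended afterward. A sketch of such an encoder (names are illustrative):

```python
# Sketch: encode a pruned binary tree by preorder traversal,
# 0 = internal node, 1 = leaf; leaf labels appended as one bit each.

class Node:
    def __init__(self, left=None, right=None, label=None):
        self.left, self.right, self.label = left, right, label

    @property
    def is_leaf(self):
        return self.left is None and self.right is None

def encode(tree):
    """Return (structure_bits, label_bits) for a pruned tree."""
    structure, labels = [], []
    def visit(node):
        if node.is_leaf:
            structure.append("1")
            labels.append(str(node.label))
        else:
            structure.append("0")
            visit(node.left)
            visit(node.right)
    visit(tree)
    return "".join(structure), "".join(labels)

# A tree with |T| leaves uses 2|T| - 1 structure bits + |T| label bits.
leaf = lambda y: Node(label=y)
t = Node(Node(leaf(0), leaf(1)), leaf(1))   # 3 leaves
print(encode(t))  # ('00111', '011')
```

Such a code satisfies Kraft's inequality, which is what lets code-lengths play the role of |f| in a codelength-based generalization bound.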

Page 15:

Error Bounds for DCTs

Compare with CART:

Page 16:

Rate of Convergence

Suppose that the Bayes decision boundary behaves locally like a Lipschitz function

Mammen & Tsybakov ‘99 C. Scott & RN ‘02

Page 17:

Why too slow?

because the Bayes boundary is a (d-1)-dimensional manifold, "good" trees are unbalanced

yet all |T|-leaf trees are equally favored

Page 18:

Local Error Bounds in Classification

Spatial Error Decomposition: Mansour & McAllester ‘00

Page 19:

Relative Chernoff Bound

Page 20:

Relative Chernoff Bound
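The statement on these two slides was lost in extraction. The bound usually meant by this name (a hedged reconstruction) is the multiplicative Chernoff lower-tail bound:

```latex
% For \hat p the empirical mean of n i.i.d. Bernoulli(p) variables:
\mathbb{P}\big( \hat p \le (1 - \epsilon)\, p \big) \ \le\ e^{-n p \epsilon^2 / 2}

% Equivalently, setting the right side to \delta: with prob. at least 1-\delta,
p \ \le\ \hat p + \sqrt{\frac{2\, p \log(1/\delta)}{n}}
```

The key feature is the factor of p under the square root: cells with small probability mass incur small deviation penalties, which is what makes a spatially local error analysis pay off.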

Page 21:

Local Error Bounds in Classification

Page 22:

Bounded Densities

Page 23:

Global vs. Local

Key: local complexity is offset by small volumes!

Page 24:

Local Bounds for DCTs

Page 25:

Unbalanced Tree

J leaves, depth J-1

Global bound:

Local bound:
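The two bounds on this slide were lost in extraction. A hedged sketch of the comparison they illustrate: in the unbalanced tree with J leaves, the leaf at depth j covers a dyadic cell of volume 2^{-j}, so under a bounded density its probability mass is on the order of 2^{-j}. The global bound charges every leaf the full tree complexity, while the local bound weights each leaf's complexity by its small mass:

```latex
% Global: complexity ~ J charged uniformly,
\text{penalty}_{\text{global}} \ \sim\ \sqrt{\frac{J}{n}}

% Local: leaf at depth j has mass ~ 2^{-j} and code-length ~ j,
\text{penalty}_{\text{local}} \ \sim\ \sum_{j=1}^{J} \sqrt{\frac{2^{-j}\, j}{n}}
\ =\ O\!\left(\sqrt{\frac{1}{n}}\right)
```

The geometric series converges, so the local penalty is bounded independently of J: exactly the sense in which local complexity is offset by small volumes.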

Page 26:

Convergence to Bayes Error

Mammen & Tsybakov ‘99 C. Scott & RN ‘03
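The rates on this slide were lost in extraction; a hedged reconstruction, consistent with the talk's title and with Mammen & Tsybakov's minimax lower bound of order n^{-1/d} for Lipschitz decision boundaries:

```latex
% DCTs selected with spatially adaptive (local) penalties attain the
% minimax rate to within a logarithmic factor (C. Scott & R. Nowak '03):
\mathbb{E}\,R(\hat f_n) - R(f^*) \ = \ O\!\left( \left( \frac{\log n}{n} \right)^{1/d} \right)
```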

Page 27:

Concluding Remarks

data-dependent bound

Neural Information Processing Systems 2002, 2003

[email protected]