
Transcript of A PAC-Bayes Risk Bound for General Loss Functions

Page 1: A PAC-Bayes Risk Bound for General Loss Functions


A PAC-Bayes Risk Bound for General Loss Functions

NIPS 2006

Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand
Université Laval, Québec, Canada

Page 2: A PAC-Bayes Risk Bound for General Loss Functions


Summary

We provide a (tight) PAC-Bayesian bound for the expected loss of convex combinations of classifiers under a wide class of loss functions, such as the exponential loss and the logistic loss.

Experiments with Adaboost indicate that the upper bound (computed on the training set) behaves very similarly to the true loss (estimated on the testing set).

Page 3: A PAC-Bayes Risk Bound for General Loss Functions


Convex Combinations of Classifiers

Consider any set H of {-1, +1}-valued classifiers and any posterior Q on H.

For any input example x, the [-1,+1]-valued output fQ(x) of a convex combination of classifiers is given by the Q-weighted average shown below.
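Presumably the usual Q-weighted average of the individual outputs (standard notation assumed here):

\[
  f_Q(x) \;=\; \mathop{\mathbf{E}}_{h\sim Q} h(x) \;=\; \sum_{h\in H} Q(h)\,h(x) \;\in\; [-1,+1].
\]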

Page 4: A PAC-Bayes Risk Bound for General Loss Functions


The Margin and WQ(x,y)

WQ(x,y) is the fraction, under measure Q, of classifiers that err on example (x,y).

It is related to the margin y fQ(x) as shown below.
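Writing this fraction as a probability under Q, the standard identity relating it to the margin is:

\[
  W_Q(x,y) \;=\; \Pr_{h\sim Q}\bigl(h(x)\neq y\bigr) \;=\; \frac{1 - y\,f_Q(x)}{2},
  \qquad\text{i.e.}\qquad
  y\,f_Q(x) \;=\; 1 - 2\,W_Q(x,y).
\]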

Page 5: A PAC-Bayes Risk Bound for General Loss Functions


General Loss Functions ζQ(x,y)

Hence, we consider any loss function ζQ(x,y) that can be written as a Taylor series around WQ = 1/2, as sketched below,

and our task is to provide tight bounds for the expected loss ζQ that depend on the empirical loss measured on a training set of m examples.
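A sketch of the assumed form: the expansion is in powers of the margin y fQ(x) = 1 − 2WQ(x,y), with coefficients g(k) and a constant c that reappears in the bound; the indexing and the definition of c are assumptions here. The expected and empirical losses are then:

\[
  \zeta_Q(x,y) \;=\; g(0) \;+\; \sum_{k=1}^{\infty} g(k)\,\bigl(y\,f_Q(x)\bigr)^{k},
  \qquad
  c \;\stackrel{\mathrm{def}}{=}\; \sum_{k=1}^{\infty}\bigl|g(k)\bigr|,
\]
\[
  \zeta_Q \;\stackrel{\mathrm{def}}{=}\; \mathop{\mathbf{E}}_{(x,y)\sim D}\,\zeta_Q(x,y),
  \qquad
  \widehat{\zeta}_Q \;\stackrel{\mathrm{def}}{=}\; \frac{1}{m}\sum_{i=1}^{m}\zeta_Q(x_i,y_i).
\]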

Page 6: A PAC-Bayes Risk Bound for General Loss Functions


Bounds for the Majority Vote

A bound on ζQ also provides a bound on the majority vote, as sketched below.
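One standard way to obtain such a bound (the slide's exact inequality is assumed): the majority vote BQ errs only when WQ(x,y) ≥ 1/2, so for any loss that is non-increasing in the margin and positive at zero margin,

\[
  R(B_Q) \;=\; \Pr_{(x,y)\sim D}\bigl(W_Q(x,y)\ge\tfrac12\bigr)
  \;\le\; \frac{1}{\zeta_0}\;\mathop{\mathbf{E}}_{(x,y)\sim D}\zeta_Q(x,y),
  \qquad \zeta_0 \;=\; \text{loss value at } W_Q=\tfrac12 .
\]

For the exponential loss, ζ0 = 1, so R(BQ) ≤ ζQ; for the sigmoid loss, ζ0 = 1/2, giving the familiar factor of 2.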

Page 7: A PAC-Bayes Risk Bound for General Loss Functions


A PAC-Bayes Bound on ζQ
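A sketch of the form such a bound can take, assembled from the proof outline on the following slides; the notation (g(k), c = Σ_{k≥1}|g(k)|, and the distributions Q̄, P̄ over products of classifiers) is assumed here, and the exact constants may differ from the paper's statement. With probability at least 1 − δ over the draw of a training set S of m examples, simultaneously for every posterior Q on H:

\[
  \zeta_Q \;\le\; g(0) \;+\; c\,\Bigl(1 \;-\; 2\,\inf\Bigl\{\varepsilon\in[0,1] :\;
  \operatorname{kl}\bigl(R_S(G_{\bar Q})\,\big\|\,\varepsilon\bigr)
  \;\le\; \tfrac{1}{m}\Bigl(\operatorname{KL}(\bar Q\,\|\,\bar P) + \ln\tfrac{m+1}{\delta}\Bigr)\Bigr\}\Bigr),
\]

where R_S(G_Q̄) is the empirical risk, on S, of the Gibbs classifier over products of classifiers constructed in the proof; it is an affine function of the empirical loss, so the right-hand side is computable from the training set.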

Page 8: A PAC-Bayes Risk Bound for General Loss Functions


Proof

Write the k-th power of the margin, (y fQ(x))^k, as an expectation over k classifiers drawn independently from Q (see below), where h1..k denotes the product of k classifiers. Hence ζQ(x,y) itself becomes an expectation over such product classifiers.
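Assuming the expansion above, the first step presumably is (using the independence of the k draws from Q):

\[
  \bigl(y\,f_Q(x)\bigr)^{k}
  \;=\; \Bigl(\mathop{\mathbf{E}}_{h\sim Q}\, y\,h(x)\Bigr)^{k}
  \;=\; \mathop{\mathbf{E}}_{h_1\sim Q}\!\cdots\mathop{\mathbf{E}}_{h_k\sim Q}\;
        y^{k}\,h_1(x)\cdots h_k(x)
  \;=\; \mathop{\mathbf{E}}_{h_{1..k}\sim Q^{k}}\; y^{k}\,h_{1..k}(x),
\]

and hence

\[
  \zeta_Q(x,y) \;=\; g(0) \;+\; \sum_{k=1}^{\infty} g(k)\;
  \mathop{\mathbf{E}}_{h_{1..k}\sim Q^{k}}\; y^{k}\,h_{1..k}(x).
\]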

Page 9: A PAC-Bayes Risk Bound for General Loss Functions


Proof (cont.)

Let us define the "error rate" R(h1..k) as below, to relate ζQ to the error rate of a new Gibbs classifier.
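A sketch, assuming the error rate is taken with respect to the label y^k (so that the expected product is an affine function of it); the slide may instead fold the sign of g(k) into the classifier, which matters later when some g(k) are negative:

\[
  R(h_{1..k}) \;\stackrel{\mathrm{def}}{=}\;
  \Pr_{(x,y)\sim D}\bigl(h_{1..k}(x)\neq y^{k}\bigr)
  \qquad\Longrightarrow\qquad
  \mathop{\mathbf{E}}_{(x,y)\sim D}\, y^{k}\,h_{1..k}(x) \;=\; 1 - 2\,R(h_{1..k}),
\]

so that

\[
  \zeta_Q \;=\; g(0) \;+\; \sum_{k=1}^{\infty} g(k)\,
  \mathop{\mathbf{E}}_{h_{1..k}\sim Q^{k}}\bigl(1 - 2\,R(h_{1..k})\bigr).
\]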

Page 10: A PAC-Bayes Risk Bound for General Loss Functions


Proof (cont.)

Here Q̄ is a distribution over products of classifiers that works as follows:

A number k is chosen according to |g(k)|/c;

k classifiers in H are then chosen according to Q^k.

R(GQ̄) denotes the risk of this Gibbs classifier, written out below.
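Writing Q̄ for this two-stage distribution (the bar notation is assumed here) and folding the sign of g(k) into the product classifier when g(k) < 0 (an assumed but necessary detail), the Gibbs risk and the resulting linear relation are:

\[
  R(G_{\bar Q}) \;=\; \sum_{k\ge 1}\frac{|g(k)|}{c}\;
  \mathop{\mathbf{E}}_{h_{1..k}\sim Q^{k}}\,
  R\bigl(\operatorname{sgn}(g(k))\,h_{1..k}\bigr),
  \qquad
  \zeta_Q \;=\; g(0) \;+\; c\,\bigl(1 - 2\,R(G_{\bar Q})\bigr).
\]

The same algebra applied to the empirical distribution on S gives the empirical counterpart, with R_S(G_Q̄) in place of R(G_Q̄).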

Page 11: A PAC-Bayes Risk Bound for General Loss Functions


Proof (cont.)

The standard PAC-Bayes theorem implies that for any prior P̄ on H* = ∪_{k∈N+} H^k, we have the bound written below.

Our theorem follows for any P̄ having the same structure as Q̄ (i.e., k is first chosen according to |g(k)|/c, then k classifiers are chosen according to P^k), since in that case KL(Q̄‖P̄) reduces to a multiple of KL(Q‖P), as shown below.
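In one standard form (a sketch; the constant inside the logarithm varies slightly across versions of the theorem), the invoked PAC-Bayes theorem reads: with probability at least 1 − δ over the draw of S ~ D^m, simultaneously for all Q̄,

\[
  \operatorname{kl}\bigl(R_S(G_{\bar Q})\,\big\|\,R(G_{\bar Q})\bigr)
  \;\le\; \frac{\operatorname{KL}(\bar Q\,\|\,\bar P) \;+\; \ln\frac{m+1}{\delta}}{m}\,,
\]

where kl(q‖p) = q ln(q/p) + (1−q) ln((1−q)/(1−p)). When P̄ is built from a prior P on H exactly as Q̄ is built from Q, the chain rule for KL gives

\[
  \operatorname{KL}(\bar Q\,\|\,\bar P)
  \;=\; \sum_{k\ge 1} \frac{|g(k)|}{c}\; k\,\operatorname{KL}(Q\,\|\,P)\,.
\]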

Page 12: A PAC-Bayes Risk Bound for General Loss Functions


Remark

Since ζQ and R(GQ̄) are related linearly, with a slope proportional to c, any looseness in the bound for R(GQ̄) will be amplified by c in the bound for ζQ.

Hence, the bound on ζQ can be tight only for small c.

This is the case for ζQ(x,y) = |fQ(x) − y|^r, since we have c = 1 for r = 1 and c = 3 for r = 2 (see the check below).
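A quick check of these values under the assumed expansion (y ∈ {−1, +1}, so |fQ(x) − y| = 1 − y fQ(x)):

\[
  r=1:\quad |f_Q(x)-y| \;=\; 1 - y\,f_Q(x)
  \;\;\Longrightarrow\;\; c = |-1| = 1,
\]
\[
  r=2:\quad |f_Q(x)-y|^{2} \;=\; 1 - 2\,y\,f_Q(x) + \bigl(y\,f_Q(x)\bigr)^{2}
  \;\;\Longrightarrow\;\; c = |-2| + |1| = 3 .
\]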

Page 13: A PAC-Bayes Risk Bound for General Loss Functions


Bound Behavior During Adaboost

Here H is the set of decision stumps. The output h(x) of a decision stump h on attribute x with threshold t is given by h(x) = sgn(x − t).

If P(h) = 1/|H| for all h in H, then KL(Q‖P) = ln|H| − H(Q), where H(Q) denotes the entropy of Q.

H(Q) generally increases at each boosting round
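Two pieces of bookkeeping behind this slide (the AdaBoost-to-(Q, β) conversion is an assumption about how the experiments are set up, not something stated on the slide): with a uniform prior the KL term is a log-cardinality minus an entropy, and AdaBoost's weighted vote can be rescaled into a convex combination:

\[
  \operatorname{KL}(Q\,\|\,P) \;=\; \ln|H| \;-\; H(Q),
  \qquad
  H(Q) \;=\; -\sum_{h\in H} Q(h)\ln Q(h),
\]
\[
  \sum_{t}\alpha_t\,h_t(x) \;=\; \beta\, f_Q(x),
  \qquad
  \beta \;=\; \sum_{t}\alpha_t,
  \qquad
  Q(h) \;=\; \frac{1}{\beta}\sum_{t:\,h_t=h}\alpha_t .
\]

So a growing entropy H(Q) means a shrinking KL term; the growth of the bound observed on the next slides comes instead from c, which depends on β.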

Page 14: A PAC-Bayes Risk Bound for General Loss Functions


Results for the Exponential Loss

For this loss function, we have the form sketched below.
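A plausible concrete form, assuming the loss carries a scale parameter β > 0 (the same β that limits the sigmoid loss on a later slide); under the expansion used throughout, the coefficients and c follow directly:

\[
  \zeta_Q(x,y) \;=\; e^{-\beta\, y f_Q(x)}
  \;=\; \sum_{k=0}^{\infty} \frac{(-\beta)^k}{k!}\bigl(y\,f_Q(x)\bigr)^k,
  \qquad
  c \;=\; \sum_{k\ge 1}\frac{\beta^k}{k!} \;=\; e^{\beta}-1 .
\]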

Since c increases exponentially rapidly with β, so will the risk bound.

Page 15: A PAC-Bayes Risk Bound for General Loss Functions


Exponential Loss Results (cont.)

Page 16: A PAC-Bayes Risk Bound for General Loss Functions


Exponential Loss Results (cont.)

Page 17: A PAC-Bayes Risk Bound for General Loss Functions


Results for the Sigmoid Loss

For this loss function, we have the form sketched below.
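A plausible concrete form (an assumption here), chosen so that the relevant Taylor series is that of tanh; note that 1/(1 + e^{2β y fQ(x)}) is the same function:

\[
  \zeta_Q(x,y) \;=\; \tfrac12\Bigl(1 - \tanh\bigl(\beta\, y f_Q(x)\bigr)\Bigr),
  \qquad
  \tanh(z) \;=\; z - \tfrac{z^3}{3} + \tfrac{2z^5}{15} - \cdots
  \quad \bigl(|z| < \tfrac{\pi}{2}\bigr).
\]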

The Taylor series for tanh(x) converges only for |x| < π/2. We are thus limited to β < π/2.

Page 18: A PAC-Bayes Risk Bound for General Loss Functions


Sigmoid Loss Results (cont.)

Page 19: A PAC-Bayes Risk Bound for General Loss Functions


Conclusion

We have obtained PAC-Bayesian risk bounds for any loss function ζQ having a convergent Taylor expansion around WQ = ½.

The bound is tight only for small c.

In the Adaboost experiments, the loss bound is essentially parallel to the true loss.