A PAC-Bayes Risk Bound for General Loss Functions


1

A PAC-Bayes Risk Bound for General Loss Functions

NIPS 2006

Pascal Germain, Alexandre Lacasse, François Laviolette, Mario Marchand Université Laval, Québec, Canada

2

Summary

We provide a (tight) PAC-Bayesian bound for the expected loss of convex combinations of classifiers under a wide class of loss functions like the exponential loss and the logistic loss.

Experiments with AdaBoost indicate that the upper bound (computed on the training set) behaves very similarly to the true loss (estimated on the test set).

3

Convex Combinations of Classifiers

Consider any set H of {-1, +1}-valued classifiers and any posterior Q on H.

For any input example x, the [-1,+1]-valued output f_Q(x) of a convex combination of classifiers is given by
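That is (reconstructed from the standard definition of a Q-weighted vote),

    f_Q(x) = E_{h∼Q}[ h(x) ] = Σ_{h∈H} Q(h) · h(x).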

4

The Margin and W_Q(x,y)

W_Q(x,y) is the fraction, under measure Q, of classifiers that err on example (x,y).

It is related to the margin y f_Q(x) as follows.
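Explicitly, since y·h(x) equals +1 when h classifies (x,y) correctly and −1 when it errs, averaging over Q gives

    y f_Q(x) = E_{h∼Q}[ y·h(x) ] = 1 − 2 W_Q(x,y),   i.e.,   W_Q(x,y) = (1 − y f_Q(x)) / 2.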

5

General Loss Functions ζ_Q(x,y)

Hence, we consider any loss function ζ_Q(x,y) that can be written as a Taylor series around W_Q = ½,

and our task is to provide tight bounds on the expected loss ζ_Q that depend on the empirical loss measured on a training set of m examples; both quantities are written out below.
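Reconstructed in the notation introduced above (the symbol ζ_Q^S for the empirical loss is chosen here and may differ from the slide's):

    ζ_Q(x,y) = Σ_{k≥0} g(k) · (1 − 2 W_Q(x,y))^k = Σ_{k≥0} g(k) · (y f_Q(x))^k,

    ζ_Q = E_{(x,y)∼D}[ ζ_Q(x,y) ],     ζ_Q^S = (1/m) Σ_{i=1}^m ζ_Q(x_i, y_i).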

6

Bounds for the Majority Vote

A bound on ζ_Q also provides a bound on the risk of the majority vote, as sketched below.
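One plausible reconstruction, assuming (as holds for the losses considered later) that ζ_Q(x,y) is non-negative and non-decreasing in W_Q(x,y): the majority vote B_Q errs on (x,y) only if W_Q(x,y) ≥ 1/2, and on that event ζ_Q(x,y) ≥ g(0) (its value at W_Q = 1/2), so a Markov-type argument gives

    R(B_Q) ≤ Pr_{(x,y)∼D}[ W_Q(x,y) ≥ 1/2 ] ≤ ζ_Q / g(0).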

7

A PAC-Bayes Bound on ζ_Q

8

Proof

Each power (1 − 2W_Q)^k in the Taylor series can be rewritten as an expectation over products of classifiers, where h_{1-k} denotes the product of k classifiers drawn independently from Q. Hence ζ_Q becomes an expectation over such products, as reconstructed below.
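The step being taken here, reconstructed from the definitions above, is

    (1 − 2 W_Q(x,y))^k = (y f_Q(x))^k = E_{h_1,…,h_k ∼ Q^k}[ y^k · h_1(x) ⋯ h_k(x) ] = E_{h_{1-k} ∼ Q^k}[ y^k · h_{1-k}(x) ],

hence

    ζ_Q = E_{(x,y)∼D}[ ζ_Q(x,y) ] = Σ_{k≥0} g(k) · E_{h_{1-k} ∼ Q^k} E_{(x,y)∼D}[ y^k · h_{1-k}(x) ].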

9

Proof (ctn.)

Let us define the "error rate" R(h_{1-k}) as below, in order to relate ζ_Q to the error rate of a new Gibbs classifier.
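A natural reconstruction of these two steps:

    R(h_{1-k}) := Pr_{(x,y)∼D}[ h_{1-k}(x) ≠ y^k ] = E_{(x,y)∼D}[ (1 − y^k · h_{1-k}(x)) / 2 ],

so that

    ζ_Q = Σ_{k≥0} g(k) · E_{h_{1-k} ∼ Q^k}[ 1 − 2 R(h_{1-k}) ].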

10

Proof (ctn.)

Here Q̄ is a distribution over products of classifiers that works as follows:

a number k is chosen according to |g(k)|/c;

k classifiers in H are then chosen according to Q^k.

R(G_Q̄) denotes the risk of this Gibbs classifier, written out below.
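Spelled out under this reconstruction: Q̄ gives weight (|g(k)|/c)·Q(h_1)⋯Q(h_k) to each product h_{1-k}, with c = Σ_{k≥1} |g(k)| (that the sum starts at k = 1 is an assumption, consistent with the values of c quoted on later slides), and, as a further bookkeeping assumption, the output of h_{1-k} is flipped whenever g(k) < 0 so that the signs of the g(k) are absorbed into the classifiers. The Gibbs risk is then

    R(G_Q̄) = Σ_{k≥1} (|g(k)|/c) · E_{h_{1-k} ∼ Q^k}[ R(h_{1-k}) ]   (with h_{1-k} flipped when g(k) < 0).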

11

Proof (ctn.)

The standard PAC-Bayes theorem implies that, for any prior P̄ on H* = ∪_{k ∈ ℕ+} H^k, we have the bound reconstructed below.
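One standard form of this theorem (the kl version; the exact logarithmic term on the slide may differ) is: with probability at least 1 − δ over the draw of the m training examples, simultaneously for all posteriors Q̄,

    kl( R_S(G_Q̄) ‖ R(G_Q̄) ) ≤ [ KL(Q̄‖P̄) + ln((m+1)/δ) ] / m,

where R_S(G_Q̄) is the empirical Gibbs risk on the sample S and kl(q‖p) = q ln(q/p) + (1−q) ln((1−q)/(1−p)).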

Our theorem follows for any P̄ having the same structure as Q̄ (i.e., k is first chosen according to |g(k)|/c, then k classifiers are chosen according to P^k), since in that case we have the identity below.
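The missing identity is presumably the chain-rule decomposition of the KL divergence; since Q̄ and P̄ place the same mass |g(k)|/c on each k,

    KL(Q̄‖P̄) = Σ_{k≥1} (|g(k)|/c) · KL(Q^k‖P^k) = ( Σ_{k≥1} k·|g(k)| / c ) · KL(Q‖P),

so the complexity term for the compound Gibbs classifier is just a constant multiple of KL(Q‖P).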

12

Remark

Since ζ_Q and R(G_Q̄) are linked by the affine relation reconstructed below, any looseness in the bound for R(G_Q̄) will be amplified by c in the bound for ζ_Q.
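Under the sign convention and the definition c = Σ_{k≥1} |g(k)| used in the proof sketch above, the relation is

    ζ_Q = g(0) + c · (1 − 2 R(G_Q̄)),

so an error of ε on R(G_Q̄) translates into an error of 2cε on ζ_Q. The values of c quoted below (c = 1 for r = 1, c = 3 for r = 2) are consistent with this convention.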

Hence, the bound on ζ_Q can be tight only for small c.

This is the case for ζ_Q(x,y) = |f_Q(x) − y|^r, since we have c = 1 for r = 1 and c = 3 for r = 2.

13

Bound Behavior During AdaBoost

Here H is the set of decision stumps. The output h(x) of decision stump h on attribute x with threshold t is given by h(x) = sgn(x-t) .

If P(h) = 1/|H| for all h ∈ H, then the KL term of the bound takes the form reconstructed below.

H(Q) generally increases at each boosting round
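Reading H(Q) as the Shannon entropy of Q, the identity for a uniform prior is

    KL(Q‖P) = Σ_{h∈H} Q(h) ln( Q(h) / P(h) ) = ln|H| − H(Q),   with   H(Q) = −Σ_{h∈H} Q(h) ln Q(h).

So an increase in H(Q) at each boosting round means the KL term of the bound decreases.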

14

Results for the Exponential Loss

For this loss function, the coefficients g(k) and the constant c take the form reconstructed below.

Since c increases exponentially rapidly with the scale parameter (written β below), so will the risk bound.
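A plausible reconstruction of the missing expressions, writing β for the scale parameter of the loss (in the AdaBoost setting, presumably the total unnormalized weight of the voters):

    ζ_Q(x,y) = exp(−β · y f_Q(x)) = exp(β · (2 W_Q(x,y) − 1)) = Σ_{k≥0} ((−β)^k / k!) · (y f_Q(x))^k,

so that g(k) = (−β)^k / k! and c = Σ_{k≥1} β^k / k! = e^β − 1.

The following minimal Python sketch (not the authors' code; all names are illustrative) shows how the empirical quantities entering the bound can be computed from the stump outputs and the normalized AdaBoost weights, under the reconstructed definitions above:

import numpy as np

def bound_quantities(H_pred, q, y, beta):
    """Empirical quantities of the bound, under the reconstructed definitions.

    H_pred : (n_examples, n_stumps) array of {-1, +1} stump outputs
    q      : (n_stumps,) normalized posterior weights (sums to 1)
    y      : (n_examples,) labels in {-1, +1}
    beta   : assumed scale of the combination (e.g. the sum of AdaBoost's alphas)
    """
    f_Q = H_pred @ q                    # convex combination f_Q(x), in [-1, 1]
    margin = y * f_Q                    # y f_Q(x) = 1 - 2 W_Q(x, y)
    W_Q = (1.0 - margin) / 2.0          # fraction of voters that err on (x, y)
    exp_loss = np.exp(-beta * margin)   # zeta_Q(x, y) for the exponential loss
    c = np.expm1(beta)                  # c = e^beta - 1 = sum_{k>=1} beta^k / k!
    # KL(Q || P) for a uniform prior P, treating the columns of H_pred as H:
    # KL(Q || P) = ln|H| - H(Q)
    kl_q_p = np.log(len(q)) + np.sum(q * np.log(np.clip(q, 1e-12, None)))
    return exp_loss.mean(), W_Q.mean(), c, kl_q_p

# Toy usage with random stumps, just to show the shapes involved.
rng = np.random.default_rng(0)
H_pred = rng.choice([-1, 1], size=(100, 20))
q = rng.dirichlet(np.ones(20))
y = rng.choice([-1, 1], size=100)
print(bound_quantities(H_pred, q, y, beta=2.0))

These are the quantities one would plug into the PAC-Bayes bound above to track it across boosting rounds.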

15

Exponential Loss Results (ctn.)

16

Exponential Loss Results (ctn.)

17

Results for the Sigmoid Loss

For this loss function, the coefficients take the form sketched below.

The Taylor series of tanh(x) converges only for |x| < π/2. We are thus limited to β < π/2.
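A plausible form of this loss, consistent with the tanh expansion mentioned above (the exact constants on the slide may differ), with β again the scale parameter:

    ζ_Q(x,y) = (1 − tanh(β · y f_Q(x))) / 2 = 1/2 − (1/2) Σ_{k odd} t_k · (β · y f_Q(x))^k,

where the t_k are the Taylor coefficients of tanh at 0 (t_1 = 1, t_3 = −1/3, t_5 = 2/15, …). Under this form, c = (1/2) Σ_{k≥1} |t_k| β^k = tan(β)/2, which is finite only when β < π/2; hence the restriction.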

18

Sigmoid Loss Results (ctn.)

19

Conclusion

We have obtained PAC-Bayesian risk bounds for any loss function ζ_Q having a convergent Taylor expansion around W_Q = ½.

The bound is tight only for small c.

In the AdaBoost experiments, the loss bound is essentially parallel to the true loss.