Predictive Automatic Relevance Determination by Expectation Propagation


T.P. Minka

R.W. Picard

Z. Ghahramani

Motivation

Task 1: Classify high-dimensional datasets with many irrelevant features, e.g., normal vs. cancer microarray data.

Task 2: Sparse Bayesian kernel classifiers for fast prediction at test time

Automatic Relevance Determination (ARD)

• Give the feature weights independent Gaussian priors whose variance, $1/\alpha_i$, controls how far from zero each weight is allowed to move.

• Maximize the marginal likelihood of the model with respect to $\alpha$.

• Outcome: many elements of $\alpha$ go to infinity, which pins the corresponding weights at zero and so naturally prunes irrelevant features in the data.
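To illustrate the pruning behaviour, here is a minimal sketch of evidence-maximizing ARD. It assumes linear regression with known Gaussian noise, a stand-in chosen because its marginal likelihood is exact (the classifiers on these slides need EP instead). It uses MacKay's fixed-point update $\alpha_i \leftarrow \gamma_i / \mu_i^2$ with $\gamma_i = 1 - \alpha_i \Sigma_{ii}$; the function name, the cap `alpha_max`, and the fixed `noise_var` are illustrative assumptions.

```python
import numpy as np

def ard_regression(X, y, noise_var=0.01, n_iter=200, alpha_max=1e8):
    """Type-II maximum-likelihood ARD for linear regression (illustrative stand-in).

    Each weight w_i ~ N(0, 1/alpha_i). The noise variance is held fixed and
    alpha_i is re-estimated with MacKay's fixed-point rule.
    """
    n, d = X.shape
    alpha = np.ones(d)  # prior precisions; prior variance is 1/alpha
    mu = np.zeros(d)
    for _ in range(n_iter):
        # Gaussian posterior over weights given the current alpha
        Sigma = np.linalg.inv(np.diag(alpha) + X.T @ X / noise_var)
        mu = Sigma @ X.T @ y / noise_var
        # gamma_i in (0, 1]: how strongly the data determine weight i
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        # MacKay update; capped so that pruned precisions stay finite
        alpha = np.minimum(gamma / (mu ** 2 + 1e-30), alpha_max)
    return alpha, mu
```

On synthetic data where only the first two of ten features matter, the precisions of the other eight should end up orders of magnitude larger than those of the relevant two, which is the pruning effect described above.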

Risk of Optimizing $\alpha$

• Even a simple (sparse) model can overfit when it is chosen by maximizing the model marginal likelihood.

• In particular, when the dimension of $\alpha$ (the number of features) is large, maximizing the marginal likelihood carries a risk of overfitting.

Predictive-ARD

• Choosing the model with the best estimated predictive performance instead of the most probable model.

• Expectation propagation (EP) estimates the leave-one-out predictive performance without performing any expensive cross-validation.
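The selection rule itself is simple. The sketch below contrasts it with evidence-based selection, assuming each candidate model (for example, each setting of $\alpha$ visited during training) has already been summarized by its log evidence and its EP-estimated leave-one-out error probability. The `Candidate` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    """Hypothetical summary of one trained model, e.g. one setting of alpha."""
    name: str
    log_evidence: float    # log marginal likelihood (what evidence-ARD maximizes)
    loo_error_prob: float  # EP estimate of the leave-one-out error probability

def select_by_evidence(cands: List[Candidate]) -> Candidate:
    # Evidence-ARD: pick the most probable model
    return max(cands, key=lambda c: c.log_evidence)

def select_by_prediction(cands: List[Candidate]) -> Candidate:
    # Predictive-ARD: pick the model with the lowest estimated predictive error
    return min(cands, key=lambda c: c.loo_error_prob)
```

The two rules can disagree: with `Candidate("a", -10.0, 0.20)` and `Candidate("b", -8.0, 0.30)` (made-up values), the evidence picks `b` while the predictive criterion picks `a`. Ties are broken by list order here, which is an arbitrary choice.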

Estimate Predictive Performance

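• Predictive posterior given a test data point $\mathbf{x}_*$:

$$p(t_* \mid \mathbf{x}_*, D) = \int p(t_* \mid \mathbf{x}_*, \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w}$$

• EP estimate of predictive leave-one-out error probability:

$$\frac{1}{N}\sum_{i=1}^{N}\Big(1 - \int p(t_i \mid \mathbf{x}_i, \mathbf{w})\, q(\mathbf{w} \mid D^{\setminus i})\, d\mathbf{w}\Big) = \frac{1}{N}\sum_{i=1}^{N}\big(1 - p(t_i \mid \mathbf{x}_i, D^{\setminus i})\big)$$

• EP estimate of predictive leave-one-out error count:

$$\frac{1}{N}\sum_{i=1}^{N} \mathrm{I}\big(p(t_i \mid \mathbf{x}_i, D^{\setminus i}) < \tfrac{1}{2}\big)$$

Here $q(\mathbf{w} \mid D^{\setminus i})$ is the EP approximate posterior with the $i$-th virtual observation removed, so no retraining is needed, and $\mathrm{I}(\cdot)$ is the indicator function.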
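Both estimates can be computed directly from the EP solution. Below is a minimal sketch under several assumptions. It treats the model as a linear classifier $a_i = \mathbf{x}_i^\top \mathbf{w}$ with a probit likelihood $\Phi(t_i a_i)$, which is an assumption because the slides do not give the likelihood. It takes the EP posterior to be $q(\mathbf{w}) = \mathcal{N}(\mathbf{m}, V)$, and it stores each virtual observation as a Gaussian in $a_i$ with precision `site_prec[i]` and precision-times-mean `site_natmean[i]`. Dividing that site out of the posterior marginal of $a_i$ gives the leave-one-out (cavity) marginal $\mathcal{N}(m_{\setminus i}, v_{\setminus i})$. Integrating the probit against it gives $\Phi\big(t_i m_{\setminus i} / \sqrt{1 + v_{\setminus i}}\big)$. All function and parameter names are illustrative.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Phi(z) computed elementwise via math.erf."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def loo_predictive_probs(X, t, m, V, site_prec, site_natmean):
    """p(t_i | x_i, D without i) under EP, assuming a probit likelihood Phi(t * x.w)."""
    h = X @ m                               # posterior mean of a_i = x_i . w
    s2 = np.einsum("ij,jk,ik->i", X, V, X)  # posterior variance of a_i
    # Remove site i: precisions and precision-weighted means subtract
    cav_prec = 1.0 / s2 - site_prec
    cav_var = 1.0 / cav_prec
    cav_mean = cav_var * (h / s2 - site_natmean)
    # Integrate the probit likelihood against the cavity marginal
    return std_normal_cdf(t * cav_mean / np.sqrt(1.0 + cav_var))

def loo_error_estimates(p_loo):
    """Return (error probability, error count / N) from leave-one-out predictive probs."""
    p = np.asarray(p_loo, dtype=float)
    error_prob = float(np.mean(1.0 - p))
    error_count = float(np.mean(p < 0.5))
    return error_prob, error_count
```

As a sanity check, take a one-dimensional model with prior $\mathcal{N}(0, 1)$ and a single site of precision 1 and precision-times-mean 0.5. The posterior is $\mathcal{N}(0.25, 0.5)$, and removing the site recovers the prior, so the leave-one-out probability is exactly 0.5.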

Comparison of different model selection criteria for ARD training

• 1st row: Test error

• 2nd row: Estimated leave-one-out error probability

• 3rd row: Estimated leave-one-out error count

• 4th row: Evidence (model marginal likelihood)

• 5th row: Fraction of selected features

Gene Expression Classification

Task: Classify gene expression datasets into different categories, e.g., normal vs. cancer.

Challenge: Thousands of genes are measured in the microarray data, but probably only a small subset of them is correlated with the classification task.

Classifying Leukemia Data

• The task: distinguish acute myeloid leukemia (AML) from acute lymphoblastic leukemia (ALL).

• The dataset: 47 ALL and 25 AML samples, with 7129 features per sample.

• The dataset was randomly split 100 times into 36 training and 36 testing samples.
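The evaluation protocol of repeated random train/test splits can be sketched as follows. This assumes plain, unstratified random permutations and a fixed seed. The slides do not say whether the splits were stratified by class, and `random_splits` is a hypothetical helper.

```python
import numpy as np

def random_splits(n_samples, n_train, n_splits=100, seed=0):
    """Yield (train_idx, test_idx) pairs, each from an independent random permutation."""
    rng = np.random.default_rng(seed)
    for _ in range(n_splits):
        perm = rng.permutation(n_samples)
        # First n_train shuffled indices for training, the rest for testing
        yield perm[:n_train], perm[n_train:]
```

For the leukemia data this would be `random_splits(72, 36)`, giving 36 training and 36 test samples per split. For the colon cancer data below, `random_splits(62, 50)` gives 50 training and 12 test samples. Test errors would then presumably be averaged over the splits.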

Classifying Colon Cancer Data

• The task: distinguish normal from cancer samples.

• The dataset: 22 normal and 40 cancer samples with 2000 features per sample.

• The dataset was randomly split 100 times into 50 training and 12 testing samples.

• SVM results from Li et al. 2002

Summary

• ARD is an excellent Bayesian feature selection and sparse learning method.

• However, maximizing marginal likelihood can lead to overfitting if there are a lot of features.

• We propose Predictive ARD based on EP.

• In practice it works very well.

Sequential Update

• EP approximates true observations by simpler virtual observations.

• Using these virtual observations, we can perform efficient sequential updates without maintaining and updating a full covariance matrix.

Bayesian Sparse Classifiers

• The trained classifier is defined by a small subset of the training set.

• Fast prediction at test time.
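The speed at test time follows from the form of a kernel classifier. The decision function sums over the retained training points only, so the cost of each prediction grows with the number of relevance (or support) vectors rather than with the full training set. Below is a minimal sketch. It assumes the Gaussian kernel convention $k(\mathbf{x}, \mathbf{x}') = \exp(-\lVert \mathbf{x} - \mathbf{x}' \rVert^2 / (2\sigma^2))$, no bias term, and the width $\sigma = 5$ quoted in the experiments below. The slides do not define how the width enters the kernel, and both function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, width=5.0):
    """Kernel matrix with k(a, b) = exp(-||a - b||^2 / (2 * width^2)) (assumed convention)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * width ** 2))

def predict_sparse(X_test, relevance_vectors, weights, width=5.0):
    """Labels in {-1, +1} from the sign of sum_j weights[j] * k(x, relevance_vectors[j])."""
    K = gaussian_kernel(X_test, relevance_vectors, width)  # shape (n_test, n_relevance)
    return np.where(K @ np.asarray(weights, dtype=float) >= 0.0, 1, -1)
```

Only `relevance_vectors` and `weights` need to be kept after training. Pruned training points never enter the sum.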

Test error rates and numbers of relevance or support vectors on the breast cancer dataset.

50 partitionings of the data were used. All these methods use the same Gaussian kernel with kernel width 5. The trade-off parameter C in the SVM is chosen via 10-fold cross-validation for each partition.

Test error rates on the diabetes dataset.

100 partitionings of the data were used. Both the evidence-based and the predictive ARD-EP methods use the Gaussian kernel with kernel width 5.
