Sparse Gaussian Process Classification
With Multiple Classes
Matthias W. Seeger, Michael I. Jordan
University of California, Berkeley
www.cs.berkeley.edu/~mseeger
Gaussian Processes Are Different
Kernel machines: estimate a single “best” function to solve the problem
Bayesian Gaussian processes: inference over random functions, giving mean predictions and uncertainty estimates
Gives a posterior distribution over functions
More expressive
Powerful empirical Bayesian model selection
Can be combined into larger probabilistic structures
Harder to run, but worth it!
The Need for Linear Time
“So Gaussian processes aim for more than kernel machines. Do they run much slower, then?”
Not necessarily (anymore)!
GP multi-way classification:
Linear in the number of datapoints n
Linear in the number of classes C
No artificial “output coding”
Predictive uncertainties
Empirical Bayesian model selection
Sparse GP Approximations
Lawrence, Seeger, Herbrich: IVM (NIPS 2002)
Home in on an active set I of size d much smaller than n
Replace the likelihood by a likelihood approximation: a Gaussian function of the active-set variables only
Use information criteria to find I greedily
Restricted to models with one process only (like other sparse GP methods)
Multi-Class Models
Multinomial likelihood (“softmax”): P(y = c | u) = exp(u_c) / sum_c' exp(u_c')
Use one process u_c(·) for each class
Processes independent a priori
Different kernels K^(c) for each class
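The softmax likelihood over the C per-class process values can be sketched numerically; the latent values below are made up for illustration, and the function name is mine:

```python
import numpy as np

def softmax_likelihood(u):
    """Multinomial ("softmax") likelihood P(y = c | u) at one input,
    given the C latent process values u = (u_1(x), ..., u_C(x)).
    Shifting by max(u) keeps the exponentials numerically stable."""
    z = u - np.max(u)
    e = np.exp(z)
    return e / e.sum()

# One input, C = 3 classes: each u_c is a draw from its own GP prior,
# possibly with a different kernel K^(c); the values here are made up.
u = np.array([1.2, -0.3, 0.5])
p = softmax_likelihood(u)   # class probabilities summing to 1
```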
“But That’s Easy…”
… we thought back then, but:
Posterior covariance A^(-1), with A = K^(-1) + W: the prior covariance K is block-diagonal across classes (processes independent a priori), while the likelihood Hessian W is block-diagonal across datapoints
Both are block-diagonal, but in different systems! Together, A has no simple structure!
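A tiny numerical illustration of the clash; the sizes, kernel blocks, and Hessian blocks below are made-up toy values, and A = K^(-1) + W is the usual posterior precision of a Gaussian approximation:

```python
import numpy as np

n, C = 2, 2      # toy sizes; variables in class-major order:
                 # (c=0,i=0), (c=0,i=1), (c=1,i=0), (c=1,i=1)

# Prior covariance K: processes independent across classes, so K is
# block-diagonal with one n x n kernel block per class (toy PSD blocks).
K = np.zeros((n * C, n * C))
K[0:2, 0:2] = np.array([[1.0, 0.5], [0.5, 1.0]])   # kernel block K^(0)
K[2:4, 2:4] = np.array([[2.0, 0.3], [0.3, 2.0]])   # kernel block K^(1)

# Likelihood Hessian W: couples the C latents of the SAME datapoint, so
# it is block-diagonal only under datapoint-major ordering; in the
# class-major ordering used here its couplings land off the class blocks.
W = np.zeros((n * C, n * C))
for i in range(n):
    idx = [c * n + i for c in range(C)]   # the C latents of datapoint i
    W[np.ix_(idx, idx)] = np.array([[0.2, -0.1], [-0.1, 0.2]])

# Posterior precision: neither ordering makes A block-diagonal,
# so A has no simple structure.
A = np.linalg.inv(K) + W
```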
Second Order Approximation
The u^(c) should be coupled a posteriori; a diagonal approximation is not useful
The Hessian of the log likelihood has a simple form
Allow the likelihood coupling to be represented exactly up to second order:
W_i = diag(pi_i) - pi_i pi_i^T, diagonal minus rank 1
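The diagonal-minus-rank-1 form is the standard Hessian identity for the log-softmax, and it can be checked by numerical differentiation (the latent values below are toy inputs):

```python
import numpy as np

def softmax(u):
    e = np.exp(u - u.max())
    return e / e.sum()

def neg_hessian_log_softmax(u):
    """Negative Hessian of log P(y=c|u) for the softmax likelihood:
    diag(pi) - pi pi^T, i.e. "diagonal minus rank 1".
    Note it does not depend on the observed class c."""
    pi = softmax(u)
    return np.diag(pi) - np.outer(pi, pi)

def grad_log_p(u, c=0):
    # gradient of log P(y=c|u): e_c - pi
    return np.eye(len(u))[c] - softmax(u)

# central-difference Jacobian of the gradient, to check the closed form
u = np.array([0.4, -1.0, 0.7])
eps = 1e-6
J = np.column_stack([(grad_log_p(u + eps * d) - grad_log_p(u - eps * d)) / (2 * eps)
                     for d in np.eye(3)])
# J should equal minus the diagonal-minus-rank-1 matrix above
```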
Subproblems
Efficient representation exploiting the prior independence and the constrained form
ADF projections onto constrained Gaussians to compute the site precision blocks
Forward selection of I
Extensions of the simple myopic scheme
Model selection based on conditional inference
Representation
Exploits block-diagonal matrix structures
Nontrivial to get the numerics right (Cholesky factors)
Dominating cost: “stub” buffers used to compute the marginal moments
Stubs updated after each inclusion
Restricted ADF Projection
Hard (non-convex) because constrained
Use a double-loop scheme: outer loop analytic, inner loop convex, so very fast
Initialization matters; our choice can be motivated from the second order approximation (once more)
Information Gain Criterion
The selection score measures the “informativeness” of a candidate i, given the current belief, after tentative inclusion of i
Prefers points close to, or on the wrong side of, class boundaries
Requires marginals computed from the stubs
Candidates are scored prior to each inclusion
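As a sketch of this flavour of score: IVM-style information gain can be measured as the differential-entropy reduction of a candidate's posterior marginal. The exact multi-class criterion in the slides is not reproduced here, and the variances below are invented:

```python
import numpy as np

def entropy_gain(var_before, var_after):
    """Differential-entropy reduction of a 1-D Gaussian posterior
    marginal whose variance would shrink from var_before to var_after
    if the candidate were included.  Larger gain = more informative."""
    return 0.5 * np.log(var_before / var_after)

# invented variances: uncertain points (near or beyond a class boundary)
# shrink most on inclusion, so they score highest in greedy selection
var_before = np.array([1.0, 1.0, 1.0])
var_after  = np.array([0.9, 0.4, 0.7])
scores = entropy_gain(var_before, var_after)
winner = int(np.argmax(scores))
```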
Extensions of Myopic Scheme
The active set I is split into a solid set and a liquid set:
Solid set: growing; site parameters fixed (for efficiency)
Liquid set: fixed size; site parameters iteratively updated using EP
On inclusion, a new point i enters the liquid set; freezing moves the oldest liquid point into the solid set
Overview of Inference Algorithm
Inclusion phase: include the pattern; move the oldest liquid point to the solid active set
EP phase: run EP updates iteratively on the liquid-set site parameters
Selection phase: compute marginals, score O(n/C) candidates, select the winner
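The three phases above can be sketched as a loop. Everything GP-specific (marginal computation, EP site updates) is omitted or replaced by a fixed score table, so only the solid/liquid bookkeeping is faithful to the slides:

```python
from collections import deque

def run_selection(scores, d_final, L):
    """Skeleton of the three-phase loop.  scores[i] stands in for the
    information-gain score of candidate i; in the real algorithm it
    would be recomputed from the current posterior before each pick."""
    remaining = dict(enumerate(scores))
    solid, liquid = [], deque()
    while len(solid) + len(liquid) < d_final:
        # Selection phase: score the remaining candidates, pick the winner.
        i = max(remaining, key=remaining.get)
        del remaining[i]
        # Inclusion phase: the winner joins the liquid set; once it is
        # full, its oldest member freezes into the solid set (its site
        # parameters stay fixed from then on).
        liquid.append(i)
        if len(liquid) > L:
            solid.append(liquid.popleft())
        # EP phase: iterate EP updates on the liquid-set site parameters
        # (omitted here; this is where the site precisions get refit).
    return solid, list(liquid)

solid, liquid = run_selection(scores=[0.1, 0.9, 0.5, 0.7, 0.2], d_final=4, L=2)
```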
Model Selection
Use a variational bound on the marginal likelihood, based on the inference approximation
Computing the gradient costs one inference pass plus additional overhead
Minimize using quasi-Newton, reselecting I and the site parameters for new search directions (a non-standard optimization problem)
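As a runnable stand-in for this step (the slides' variational classification bound is not reproduced here), the sketch below minimizes the exact GP-regression negative log marginal likelihood over a single RBF lengthscale, using a crude backtracking gradient descent in place of quasi-Newton:

```python
import numpy as np

def neg_log_marginal_likelihood(log_ell, X, y):
    """Negative log marginal likelihood of exact GP regression with an
    RBF kernel, as a function of one log-lengthscale hyperparameter
    (stand-in objective for the slides' variational bound)."""
    ell = np.exp(log_ell)
    K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / ell**2)
    K += 1e-6 * np.eye(len(X))                  # jitter for stability
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
X = np.linspace(0.0, 5.0, 20)
y = np.sin(X) + 0.1 * rng.standard_normal(20)
nll = lambda le: neg_log_marginal_likelihood(le, X, y)

# crude backtracking gradient descent on log(lengthscale); the slides
# use quasi-Newton and re-select the active set along the way
log_ell, lr, eps = np.log(0.1), 0.1, 1e-5
for _ in range(100):
    g = (nll(log_ell + eps) - nll(log_ell - eps)) / (2 * eps)
    cand = log_ell - lr * g
    if nll(cand) < nll(log_ell):
        log_ell = cand                          # accept the step
    else:
        lr *= 0.5                               # backtrack
```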
Preliminary Experiments
Small part of MNIST (even digits, C = 5, n = 800)
No model selection (MS not yet tested); all kernels K^(c) the same
d_final = 150, liquid set size L = 25
Preliminary Experiments (2)
Preliminary Experiments (3)
Preliminary Experiments (4)
Future Experiments
Much larger experiments are in preparation, including model selection
They use a novel, powerful object-oriented Matlab/C++ interface:
Control over very large persistent C++ objects from Matlab
Faster transition from prototype (Matlab) to product (C++)
Powerful matrix classes (masking, LAPACK/BLAS)
Optimization code
Will be released into the public domain
Future Work
Experiments on much larger tasks
Model selection with independent, heavily parameterized kernels (ARD, …)
The present scheme cannot be used for large C
Future Work (2)
Gaussian process priors in large structured networks
Gaussian process conditional random fields, …
Previous work addresses function “point estimation”; we aim for GP inference including uncertainty estimates
Have to deal with a huge random field: correlations not only between datapoints, but also along time
Automatic factorizations will be crucial
The multi-class scheme will be a major building block