Page 1

Scalable Gaussian Process Methods

James Hensman

Oxford, February 2015

Page 2

Joint work with

Nicolo Fusi, Alex Matthews, Neil Lawrence, Zoubin Ghahramani

Page 3

Overview

Background: Gaussian processes

Sparse Gaussian Processes

Stochastic Variational Inference

Gaussian Likelihoods

Non-Gaussian likelihoods

Bonus: Deep GPs

Page 4

GPs

[Figure: a function f plotted against t ∈ [0, 15]]

Page 5

GPs - prior over functions

f ∼ GP(0, k(t, t′))

[Figure: sample functions f drawn from the GP prior, plotted over t ∈ [0, 15]]
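
To make the prior concrete, here is a minimal sketch of drawing sample functions from f ∼ GP(0, k(t, t′)), assuming a squared-exponential kernel and NumPy; all names below are illustrative rather than taken from the slides.

```python
import numpy as np

def rbf_kernel(t1, t2, variance=1.0, lengthscale=1.0):
    """Squared-exponential covariance k(t, t') for 1-D inputs."""
    sqdist = (t1[:, None] - t2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

t = np.linspace(0, 15, 200)                    # plotting grid, as in the figure
K = rbf_kernel(t, t) + 1e-8 * np.eye(len(t))   # jitter for numerical stability
L = np.linalg.cholesky(K)
samples = L @ np.random.randn(len(t), 3)       # three draws from the zero-mean prior
```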

Page 6

GPs - posterior over functions

[Figure: the posterior over functions after conditioning on observed data, plotted over t ∈ [0, 15]]

Page 7

GPs - analytical solution

f | y ∼ GP( k(x, X) K⁻¹ y,  k(x, x′) − k(x, X) K⁻¹ k(X, x′) )

[Figure: the analytical posterior mean and credible intervals, plotted over t ∈ [0, 15]]

Page 8

What about noise?

Gaussian noise is tractable - additional parameter σn

[Figure: GP regression with Gaussian observation noise, plotted over t ∈ [0, 15]]
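
As a concrete sketch of the exact solution with Gaussian noise (a toy example, not the slides' code; the data, kernel and σn below are placeholders):

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

x = np.array([1.0, 3.0, 5.0, 9.0, 13.0])       # toy training inputs
y = np.sin(x)                                   # toy targets
sigma_n = 0.1                                   # observation noise std deviation
x_star = np.linspace(0, 15, 200)                # prediction grid

K = rbf_kernel(x, x) + sigma_n ** 2 * np.eye(len(x))
K_s = rbf_kernel(x, x_star)
K_ss = rbf_kernel(x_star, x_star)

# Posterior mean and covariance; the O(n^3) cost is the factorisation of K.
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
mean = K_s.T @ alpha
V = np.linalg.solve(L, K_s)
cov = K_ss - V.T @ V
```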

Page 9

Parameters

[Figure: panels showing GP fits under different covariance parameter settings, plotted over t ∈ [0, 15]]

Page 10

Overview

Background: Gaussian processes

Sparse Gaussian Processes

Stochastic Variational Inference

Gaussian Likelihoods

Non-Gaussian likelihoods

Bonus: Deep GPs

Page 11

Motivation

Inference in a GP has the following demands:

Complexity: O(n³)    Storage: O(n²)

Inference in a sparse GP has the following demands:

Complexity: O(nm²)    Storage: O(nm)

where we get to pick m!

Page 12

Still not good enough!

Big Data

- In parametric models, stochastic optimisation is used.
- This allows for application to Big Data.

This work

- Show how to use Stochastic Variational Inference in GPs
- Stochastic optimisation scheme: each step requires O(m³)

Page 13

Overview

Background: Gaussian processes

Sparse Gaussian Processes

Stochastic Variational Inference

Gaussian Likelihoods

Non-Gaussian likelihoods

Bonus: Deep GPs

Page 14

Incomplete bibliography

(With apologies for 27,000 omissions)

- Csato and Opper, 2002
- Seeger 2003, Lawrence and Seeger 2003
- Snelson and Ghahramani 2006
- Quinonero-Candela and Rasmussen 2005
- Titsias 2009
- Alvarez and Lawrence 2011

Page 15

Computational savings

Knn ≈ Qnn = Knm Kmm⁻¹ Kmn

Instead of inverting Knn, we make a low-rank (or Nyström) approximation, and invert Kmm instead.
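
A minimal sketch of this low-rank construction (toy RBF kernel and illustrative inducing inputs Z; none of these names come from the slides):

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

x = np.linspace(0, 15, 500)        # n = 500 training inputs
Z = np.linspace(0, 15, 20)         # m = 20 inducing inputs: we get to pick m

Knm = rbf_kernel(x, Z)
Kmm = rbf_kernel(Z, Z) + 1e-8 * np.eye(len(Z))

# Rank-m Nystrom approximation: Qnn = Knm Kmm^{-1} Kmn.
Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)
```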

Page 16

Information capture

Everything we want to do with a GP involves marginalising f

- Predictions
- Marginal likelihood
- Estimating covariance parameters

The posterior of f is the central object. This means inverting Knn.

Page 17

[Figure: data X, y plotted as function values against the input space (X)]

Page 18

[Figure: as above, with a function f(x) ∼ GP drawn through the data]

Page 19

[Figure: as above, annotated with the prior p(f) = N(0, Knn)]

Page 20

[Figure: as above, annotated with the posterior p(f | y, X)]

Page 21

Introducing u

Take an extra M points on the function, u = f(Z).

p(y, f, u) = p(y | f) p(f | u) p(u)

Page 22

Introducing u

Page 23

Introducing u

Take an extra M points on the function, u = f(Z).

p(y, f, u) = p(y | f) p(f | u) p(u)

p(y | f) = N(y | f, σ²I)
p(f | u) = N(f | Knm Kmm⁻¹ u, Knn − Knm Kmm⁻¹ Kmn)
p(u) = N(u | 0, Kmm)

Page 24

[Figure: as above, with inducing points Z, u and their prior p(u) = N(0, Kmm)]

Page 25

[Figure: as above, annotated with the inducing-point posterior p(u | y, X)]

Page 26

The alternative posterior

Instead of doing

p(f | y, X) = p(y | f) p(f | X) / ∫ p(y | f) p(f | X) df

We'll do

p(u | y, Z) = p(y | u) p(u | Z) / ∫ p(y | u) p(u | Z) du

but p(y | u) involves inverting Knn

Page 28

Variational marginalisation of f

p(y | u) = p(y | f) p(f | u) / p(f | y, u)

ln p(y | u) = ln p(y | f) + ln [ p(f | u) / p(f | y, u) ]

ln p(y | u) = E_p(f | u)[ ln p(y | f) ] + E_p(f | u)[ ln ( p(f | u) / p(f | y, u) ) ]

ln p(y | u) = E_p(f | u)[ ln p(y | f) ] + KL[ p(f | u) ‖ p(f | y, u) ]

No inversion of Knn required

Page 32

An approximate likelihood

p(y | u) ≈ ∏_{i=1}^{n} N(yᵢ | kmnᵀ Kmm⁻¹ u, σ²) exp{ −(1/2σ²)(knn − kmnᵀ Kmm⁻¹ kmn) }

A straightforward likelihood approximation, and a penalty term
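
As a sketch of how these per-datapoint terms are formed in code (again a toy kernel and made-up names; u here is just an arbitrary vector for illustration):

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

x = np.linspace(0, 15, 500)
Z = np.linspace(0, 15, 20)
u = np.random.randn(len(Z))                     # an arbitrary value of u
sigma = 0.1

Knm = rbf_kernel(x, Z)
Kmm = rbf_kernel(Z, Z) + 1e-8 * np.eye(len(Z))
knn_diag = np.ones(len(x))                      # diag of Knn (RBF variance = 1 here)

A = np.linalg.solve(Kmm, Knm.T).T               # rows are k_i^T Kmm^{-1}
mean = A @ u                                     # per-datapoint means k_i^T Kmm^{-1} u
penalty = -(knn_diag - np.einsum('ij,ij->i', A, Knm)) / (2 * sigma ** 2)
# log of the approximate likelihood = sum_i [ log N(y_i | mean[i], sigma^2) + penalty[i] ]
```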

Page 33

Now we can marginalise u

p(u | y, Z) = p(y | u) p(u | Z) / ∫ p(y | u) p(u | Z) du

- Computing the (approximate) posterior costs O(nm²) (see the sketch below)
- We also get a lower bound of the marginal likelihood
- This is the standard variational sparse GP (Titsias, 2009).
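
For the Gaussian case this posterior is available in closed form; the sketch below follows the standard expressions from Titsias (2009), with function and argument names of my own choosing.

```python
import numpy as np

def approximate_posterior(Knm, Kmm, y, sigma):
    """Closed-form q(u) = N(mu, Sigma) for a Gaussian likelihood; cost is O(n m^2).

    Knm: (n, m) cross-covariance, Kmm: (m, m) inducing covariance,
    y: (n,) targets, sigma: noise standard deviation.
    """
    A = Kmm + Knm.T @ Knm / sigma ** 2             # Kmm + sigma^{-2} Kmn Knm
    Sigma = Kmm @ np.linalg.solve(A, Kmm)           # Kmm A^{-1} Kmm
    mu = Kmm @ np.linalg.solve(A, Knm.T @ y) / sigma ** 2
    return mu, Sigma
```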

Page 34

Overview

Background: Gaussian processes

Sparse Gaussian Processes

Stochastic Variational Inference

Gaussian Likelihoods

Non-Gaussian likelihoods

Bonus: Deep GPs

Page 35

Stochastic Variational Inference

- Combine the ideas of stochastic optimisation with variational inference
- Example: apply Latent Dirichlet Allocation to Project Gutenberg
- Can apply variational techniques to Big Data
- How could this work in GPs?

Page 36

But GPs are not factorizing models?

- The variational marginalisation of f introduced factorisation across the datapoints (conditioned on u)
- Marginalising u re-introduced dependencies between the data
- Solution: a variational treatment of u

Page 37

Variational Bayes

p(u | y) = p(y | u) p(u) / p(y)

p(y) = p(y | u) p(u) / p(u | y)

ln p(y) = ln [ p(y | u) p(u) / q(u) ] + ln [ q(u) / p(u | y) ]

ln p(y) = E_q(u)[ ln ( p(y | u) p(u) / q(u) ) ] + E_q(u)[ ln ( q(u) / p(u | y) ) ]

ln p(y) = L + KL( q(u) ‖ p(u | y) )

Page 42

The objective

L = E_q(f)[ log p(y | f) ] − KL( q(u) ‖ p(u) )

- Tractable for Gaussian likelihoods (see the sketch below)
- Numerical integration (1D) for intractable likelihoods (classification)
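
For a Gaussian likelihood the expectation has a closed form, so L can be estimated on a minibatch; the sketch below shows one way to do this (all function and variable names are illustrative, with q(u) = N(m, S) and S parameterised as L_S L_Sᵀ):

```python
import numpy as np
from scipy.stats import norm

def minibatch_bound(y_b, Knm_b, knn_diag_b, Kmm, m, L_S, sigma, n_total):
    """Sketch of the bound L evaluated on one minibatch (Gaussian likelihood).

    q(u) = N(m, S) with S = L_S L_S^T; Knm_b and knn_diag_b are the
    cross-covariance and prior variances for the minibatch points.
    """
    S = L_S @ L_S.T
    A = np.linalg.solve(Kmm, Knm_b.T).T                    # rows: k_i^T Kmm^{-1}
    mu_f = A @ m                                            # means of q(f_i)
    var_f = (knn_diag_b
             - np.einsum('ij,ij->i', A, Knm_b)              # - k_i^T Kmm^{-1} k_i
             + np.einsum('ij,jk,ik->i', A, S, A))           # + k_i^T Kmm^{-1} S Kmm^{-1} k_i

    # Expected log-likelihood term, rescaled from the minibatch to all n points.
    ell = norm.logpdf(y_b, loc=mu_f, scale=sigma) - var_f / (2 * sigma ** 2)
    ell = n_total / len(y_b) * ell.sum()

    # KL( N(m, S) || N(0, Kmm) ) between the variational posterior and the prior.
    kl = 0.5 * (np.trace(np.linalg.solve(Kmm, S))
                + m @ np.linalg.solve(Kmm, m)
                - len(m)
                + np.linalg.slogdet(Kmm)[1] - np.linalg.slogdet(S)[1])
    return ell - kl
```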

Page 43

Optimisation

The variational objective L is a function of

- the parameters of the covariance function
- the parameters of q(u)
- the inducing inputs, Z

Original strategy: set Z. Take the data in small minibatches, take stochastic gradient steps in the covariance function parameters, and stochastic natural gradient steps in the parameters of q(u).

New strategy: represent S as LLᵀ (unconstrained). Throw m, L, Z, θ at Adagrad.
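
A bare-bones sketch of the kind of update the second strategy uses: an illustrative Adagrad ascent step on a flat parameter vector stacking m, the Cholesky factor of S, Z and θ; in practice the gradient of L would come from automatic differentiation.

```python
import numpy as np

def adagrad_step(params, grad, hist, learning_rate=0.05, eps=1e-8):
    """One Adagrad ascent step on the bound L (we are maximising, hence +)."""
    hist = hist + grad ** 2                       # accumulate squared gradients
    params = params + learning_rate * grad / (np.sqrt(hist) + eps)
    return params, hist
```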

Page 45

Overview

Background: Gaussian processes

Sparse Gaussian Processes

Stochastic Variational Inference

Gaussian Likelihoods

Non-Gaussian likelihoods

Bonus: Deep GPs

Page 46

Page 47

UK apartment prices

- Monthly price paid data for February to October 2012 (England and Wales)
- from http://data.gov.uk/dataset/land-registry-monthly-price-paid-data/
- 75,000 entries
- Cross referenced against a postcode database to get latitude and longitude
- Regressed the normalised logarithm of the apartment prices

Page 48

Page 49

Page 50

Airline data

- Flight delays for every commercial flight in the USA from January to April 2008.
- Average delay was 30 minutes.
- We randomly selected 800,000 datapoints (we have limited memory!)
- 700,000 train, 100,000 test

[Figure: learned inverse lengthscales for each input: Month, DayOfMonth, DayOfWeek, DepTime, ArrTime, AirTime, Distance, PlaneAge]

Page 51

[Figure: RMSE of GPs trained on subsets (N = 800, 1000, 1200) compared with the SVI GP over optimisation iterations]

Page 52

Page 53

Overview

Background: Gaussian processes

Sparse Gaussian Processes

Stochastic Variational Inference

Gaussian Likelihoods

Non-Gaussian likelihoods

Bonus: Deep GPs

Page 54

What about Classification?

- Some of the above integrals are intractable, because of the non-Gaussian likelihood
- Since the likelihood factorizes, we need to do N one-dimensional integrals
- Gauss-Hermite quadrature works well! For derivatives also (sketched below)
- Paper in AISTATS 2015
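
As a sketch of one such one-dimensional integral: a Bernoulli/probit likelihood is assumed here purely for illustration, with quadrature nodes and weights from NumPy.

```python
import numpy as np
from scipy.stats import norm

def expected_log_probit(y, mu, var, n_points=20):
    """E_{N(f | mu, var)}[ log Phi(y * f) ] for y in {-1, +1}, by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    f = mu + np.sqrt(2.0 * var) * nodes          # change of variables absorbs the Gaussian
    return np.sum(weights * norm.logcdf(y * f)) / np.sqrt(np.pi)
```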

Page 55

Does it work?

[Figure: comparison of the KLSP, MF and FITC approximations for M = 4, 8, 16, 32, 64 inducing points, against the full model]

Page 56

Is it competitive?

[Figure: holdout negative log probability against time (seconds, log scale) for KLSp, MFSp and EPFitc with M = 4, 50, 200]

Page 57

Big data?

[Figure: classification error (%) and holdout −log p(y) against time (seconds, log scale)]

Page 58

Overview

Background: Gaussian processes

Sparse Gaussian Processes

Stochastic Variational Inference

Gaussian Likelihoods

Non-Gaussian likelihoods

Bonus: Deep GPs

Page 59

Bonus! Deep GPs

Deep models using compound functions:

y = f1(f2(... fn(x)))
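
A toy sketch of what such a composition means, drawing one sample of a two-layer compound function on a finite grid (toy RBF kernel; this illustrates the construction, not the slides' inference scheme):

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale ** 2)

def sample_gp(inputs):
    """One draw from a zero-mean GP with an RBF kernel, evaluated at `inputs`."""
    K = rbf_kernel(inputs, inputs) + 1e-8 * np.eye(len(inputs))
    return np.linalg.cholesky(K) @ np.random.randn(len(inputs))

x = np.linspace(0, 1, 200)
h = sample_gp(x)       # hidden layer: h = f2(x)
y = sample_gp(h)       # output layer: y = f1(h) = f1(f2(x))
```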

Page 60

[Figure: graphical model of a deep GP: inputs xn feed hidden layers h(1), h(2) and outputs yn, with inducing variables u(1), u(2), u(3) at inducing inputs Z0, Z1, Z2; plates over d = 1...D1, d = 1...D2, d = 1...D and n = 1...N]

Page 61


Page 62


Page 63


Page 64


Page 65
