Kernel adaptive Sequential Monte Carlo · 11/3/2015  · Applied problem: infer locations of S = 3...


Kernel adaptive Sequential Monte Carlo

Ingmar Schuster (Paris Dauphine)

Heiko Strathmann (University College London)

Brooks Paige (Oxford)

Dino Sejdinovic (Oxford)

December 7, 2015


Section 1

Outline


1 Introduction

2 Kernel Adaptive SMC (KASS)

3 Implementation Details

4 Evaluation

5 Conclusion


Section 2

Introduction


Sequential Monte Carlo Samplers

Approximate integrals with respect to target distribution π_T

Build upon Importance Sampling: approximate the integral of h with respect to density π_T using samples following density q (under certain conditions):

\[ \int h(x) \, d\pi_T(x) = \int h(x) \, \frac{\pi_T(x)}{q(x)} \, dq(x) \]

Given prior π_0, build sequence π_0, ..., π_i, ..., π_T such that

  π_{i+1} is closer to π_T than π_i (δ(π_{i+1}, π_T) < δ(π_i, π_T) for some divergence δ)

  a sample from π_i can approximate π_{i+1} well using importance weight function w(·) = π_{i+1}(·)/π_i(·)
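The importance sampling identity above can be checked numerically. The following is a minimal sketch, not taken from the slides: the target N(1, 1), the proposal N(0, 2²), the test function h(x) = x² and all function names are assumptions chosen for illustration.

```python
import numpy as np

def log_normal_pdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def importance_estimate(h, log_target, log_proposal, samples):
    """Average of h(x) * pi_T(x) / q(x) over samples drawn from q."""
    log_w = log_target(samples) - log_proposal(samples)
    return np.mean(np.exp(log_w) * h(samples))

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 2.0, size=200_000)            # draws from q = N(0, 2^2)
estimate = importance_estimate(
    h=lambda x: x ** 2,
    log_target=lambda x: log_normal_pdf(x, 1.0, 1.0),    # assumed target N(1, 1)
    log_proposal=lambda x: log_normal_pdf(x, 0.0, 2.0),
    samples=samples,
)
print(estimate)  # should be close to E[x^2] = 2 under N(1, 1)
```

The proposal is deliberately wider than the target here, so the weights have finite variance; a proposal with lighter tails than the target would make the estimate unreliable.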


Sequential Monte Carlo Samplers

At i = 0

  Using proposal density q_0, generate particles {(w_{0,j}, X_{0,j})}_{j=1}^N where w_{0,j} = π_0(X_{0,j})/q_0(X_{0,j})

  importance resampling, resulting in N equally weighted particles {(1/N, X̄_{0,j})}_{j=1}^N

  rejuvenation move for each X̄_{0,j} by Markov kernel leaving π_0 invariant

At i > 0

  approximate π_i by {(π_i(X_{i−1,j})/π_{i−1}(X_{i−1,j}), X_{i−1,j})}_{j=1}^N

  resampling

  rejuvenation leaving π_i invariant

  if π_i ≠ π_T, repeat
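The reweight, resample, rejuvenate loop can be sketched as follows. This is an illustrative sketch under several assumptions that are not from the slides: a one-dimensional state, q_0 = π_0 (so the initial weights are equal and the rejuvenation at i = 0 is skipped), multinomial resampling, a plain random-walk Metropolis-Hastings rejuvenation, and an artificial sequence of Gaussians. The names `smc_sampler`, `log_pis` and `sample_pi0` are hypothetical.

```python
import numpy as np

def smc_sampler(log_pis, sample_pi0, n_particles, step, rng):
    """Minimal SMC sampler over unnormalized log densities log_pis[0..T] (1-D state)."""
    x = sample_pi0(n_particles, rng)              # i = 0 with q_0 = pi_0: equal weights
    for log_prev, log_cur in zip(log_pis[:-1], log_pis[1:]):
        # importance weights pi_i(x) / pi_{i-1}(x), normalized in a stable way
        log_w = log_cur(x) - log_prev(x)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        # multinomial resampling yields N equally weighted particles
        x = x[rng.choice(n_particles, size=n_particles, p=w)]
        # rejuvenation: one random-walk MH step that leaves pi_i invariant
        proposal = x + step * rng.standard_normal(n_particles)
        accept = np.log(rng.uniform(size=n_particles)) < log_cur(proposal) - log_cur(x)
        x = np.where(accept, proposal, x)
    return x

# Illustrative sequence of Gaussians moving from N(0, 3^2) to N(2, 0.5^2)
means = np.linspace(0.0, 2.0, 21)
stds = np.linspace(3.0, 0.5, 21)
log_pis = [lambda x, m=m, s=s: -0.5 * ((x - m) / s) ** 2 for m, s in zip(means, stds)]
rng = np.random.default_rng(1)
particles = smc_sampler(log_pis, lambda n, r: r.normal(0.0, 3.0, size=n), 2000, 0.5, rng)
print(particles.mean(), particles.std())
```

The final particle set should roughly match the last density in the sequence, here a Gaussian with mean 2 and standard deviation 0.5.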


Sequential Monte Carlo Samplers

estimate evidence Z_T of π_T (aka normalizing constant, marginal likelihood) by

\[ Z_T \approx Z_0 \prod_{i=1}^{T} \frac{1}{N} \sum_{j=1}^{N} w_{i,j} \]

Can be adaptive in rejuvenation steps without the diminishing adaptation required in adaptive MCMC

Will construct rejuvenation using RKHS embedding of particles
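In practice the evidence product is usually accumulated in log space to avoid underflow. A minimal sketch, assuming the w_{i,j} are the unnormalized incremental weights of equally weighted (resampled) particles; the function name `log_evidence` is hypothetical.

```python
import numpy as np

def log_evidence(log_z0, log_incremental_weights):
    """log Z_T ~= log Z_0 + sum_i log((1/N) sum_j w_{i,j}), computed stably."""
    total = log_z0
    for log_w in log_incremental_weights:      # one array of N log-weights per step i
        m = np.max(log_w)
        total += m + np.log(np.mean(np.exp(log_w - m)))
    return total

# Two steps with N = 2 particles each: mean weights are 2 and 2, so Z_T ~= 4 * Z_0
steps = [np.log(np.array([1.0, 3.0])), np.log(np.array([2.0, 2.0]))]
print(np.exp(log_evidence(0.0, steps)))
```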


Intractable Likelihoods and Evidence

in nonconjugate latent variable models, intractable likelihoods arise

when the likelihood can be estimated unbiasedly, SMC is still valid

simple case: estimate likelihood using IS or SMC, leads to IS² (Tran et al., 2013) and SMC² (Chopin et al., 2011)

results in noisy importance weights, but evidence approximation is still valid (Tran et al., 2013, Lemma 3)


Nonlinear proposals based on positive definite Kernels

Kernel Adaptive Metropolis-Hastings (KAMH) was introduced in Sejdinovic et al. (2014)

Given previous samples from target distribution π, draw new ones more efficiently

Each sample mapped to a functional in the Reproducing Kernel Hilbert Space (RKHS) H_k using pd kernel k(·, ·)

Fit Gaussian q_k in H_k with

\[ \mu = \int k(\cdot, x) \, d\pi(x) \approx \frac{1}{n} \sum_{i=1}^{n} k(\cdot, X_i) \]

\[ \Sigma = \int k(\cdot, x) \otimes k(\cdot, x) \, d\pi(x) - \mu \otimes \mu \]


Nonlinear proposals based on positive definite Kernels

Draw sample from q_k and project back into original space, use as proposal in MH

KAMH is set in adaptive MCMC, using vanishing adaptation (e.g. vanishing probability to use new samples for computing adaptive proposal)

Depending on the positive definite kernel used, can adapt to nonlinear targets


Section 3

Kernel Adaptive SMC (KASS)


Adaptive SMC Sampler

SMC works on a sequence of targets, so we use an artificial sequence of distributions leading from prior π_0 to posterior π_T

parameters of rejuvenation kernel can be adapted before rejuvenation

Fearnhead and Taylor (2013) used global Gaussian approximation as proposal in Metropolis-Hastings rejuvenation

resulting in adaptive SMC sampler (ASMC)


Kernel adaptive rejuvenation

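instead, we use RKHS-proposal projected into input space (in closed form)

given unweighted particles {X̄_i}_{i=1}^N, proposal at X̄_j is

\[ q_{\mathrm{KAMH}}(\cdot \mid \bar{X}_j) = \mathcal{N}\left(\cdot \mid \bar{X}_j, \; \nu^2 M_{X,\bar{X}_j} C M_{X,\bar{X}_j}^\top + \gamma^2 I\right) \]

where C = I − (1/n) 1 1^⊤ is the centering matrix and

\[ M_{X,\bar{X}_j} = 2 \left[ \nabla_x k(x, \bar{X}_1)|_{x=\bar{X}_j}, \dots, \nabla_x k(x, \bar{X}_N)|_{x=\bar{X}_j} \right] \]

results in ASMC using linear kernel

\[ k(X, X') = X^\top X' \]

locally adaptive fit using Gaussian RBF

\[ k(X, X') = \exp\left( -\frac{\|X - X'\|^2}{2\sigma^2} \right) \]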
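For the Gaussian RBF kernel, the gradient of k(x, X̄_i) with respect to x is k(x, X̄_i)(X̄_i − x)/σ², so the proposal covariance has a short closed form. The sketch below illustrates that formula only; the function name `kamh_proposal_cov` and the parameter values in the example are assumptions, and the centering matrix uses n = N particles.

```python
import numpy as np

def kamh_proposal_cov(particles, j, sigma, nu2, gamma2):
    """Covariance nu^2 M C M^T + gamma^2 I of the proposal at particle j (RBF kernel)."""
    n, d = particles.shape
    diff = particles - particles[j]                      # row i holds X_i - X_j
    k = np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * sigma ** 2))
    # gradient of k(x, X_i) w.r.t. x at x = X_j is k * (X_i - X_j) / sigma^2
    M = 2.0 * (diff * k[:, None]).T / sigma ** 2         # shape (d, n)
    C = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
    return nu2 * M @ C @ M.T + gamma2 * np.eye(d)

rng = np.random.default_rng(0)
pts = rng.standard_normal((50, 3))                       # hypothetical particle set
cov = kamh_proposal_cov(pts, j=0, sigma=1.0, nu2=1.0, gamma2=0.1)
proposal = rng.multivariate_normal(pts[0], cov)          # one proposed move from pts[0]
```

Because C is a projection, M C M^⊤ is positive semidefinite and the γ² I term keeps the covariance strictly positive definite. With the linear kernel the columns of M are 2 X̄_i regardless of j, so M C M^⊤ is proportional to the empirical covariance of the particles, which is the global Gaussian fit of ASMC.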


KASS versus ASMC

green: ASMC / KASS with linear kernel

red: KASS with Gaussian RBF kernel


Related Work

Most direct relation to ASMC (which is a special case)

All SMC samplers are related to Annealed Importance Sampling, which however does not use resampling (Neal, 1998)

Local Adaptive Importance Sampling (Givens and Raftery, 1996, LAIS) has similar locally adaptive effect

  at each iteration compute pairwise distances between importance samples

  use k nearest neighbors for fitting local Gaussian proposal

  no resampling steps mean decrease in sampling efficiency which is exponential in dimensionality of problem


Section 4

Implementation Details


Construction of Target Sequence

For artificial distribution sequence we used geometric bridge

\[ \pi_i \propto \pi_0^{1-\rho_i} \, \pi_T^{\rho_i} \]

where (ρ_i)_{i=1}^T is an increasing sequence satisfying ρ_T = 1

another standard choice in Bayesian inference is adding datapoints one after another

\[ \pi_i(X) = \pi(X \mid d_1, \dots, d_{\lfloor \rho_i D \rfloor}) \]

resulting in Iterated Batch Importance Sampling (Chopin, 2002, IBIS)
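In log space the geometric bridge is a convex combination of the two log densities. A minimal sketch, assuming unnormalized log densities are available as callables; the helper name `geometric_bridge` and the example densities are hypothetical.

```python
import numpy as np

def geometric_bridge(log_prior, log_target, rhos):
    """Unnormalized log densities log pi_i = (1 - rho_i) log pi_0 + rho_i log pi_T."""
    rhos = np.asarray(rhos, dtype=float)
    if np.any(np.diff(rhos) <= 0) or not np.isclose(rhos[-1], 1.0):
        raise ValueError("rhos must be increasing and end at 1")
    # bind each rho as a default argument so every closure keeps its own value
    return [lambda x, r=r: (1.0 - r) * log_prior(x) + r * log_target(x) for r in rhos]

log_prior = lambda x: -0.5 * x ** 2 / 9.0            # assumed prior N(0, 3^2)
log_target = lambda x: -0.5 * (x - 2.0) ** 2 / 0.25  # assumed target N(2, 0.5^2)
bridge = geometric_bridge(log_prior, log_target, np.linspace(0.05, 1.0, 20))
```

The last element of `bridge` coincides with the target because ρ_T = 1.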


Stochastic approximation tuning of ν²

KASS' free scaling parameter ν² can be tuned for optimal scaling

Fearnhead and Taylor (2013) use auxiliary variable approach with ESJD criterion

We used stochastic approximation framework of Andrieu and Thoms (2008) instead

  asymptotically optimal acceptance rate for Random Walk proposals is α_opt = 0.234 (Rosenthal, 2011)

  after rejuvenation, Rao-Blackwellized estimator α̂_i available by averaging MH acceptance probabilities

  tune ν² by

\[ \nu^2_{i+1} = \nu^2_i + \lambda_i (\hat{\alpha}_i - \alpha_{\mathrm{opt}}) \]

  for non-increasing λ_1, ..., λ_T
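One update of the scale parameter can be sketched as below. The function name `update_scale` is hypothetical, and the small positive floor on ν² is an added safeguard that is not part of the slides.

```python
import numpy as np

def update_scale(nu2, acceptance_probs, learning_rate, alpha_opt=0.234, floor=1e-8):
    """One stochastic-approximation step: nu2 + lambda * (alpha_hat - alpha_opt)."""
    # Rao-Blackwellized estimate: average the MH acceptance probabilities themselves
    alpha_hat = np.mean(np.minimum(1.0, acceptance_probs))
    # floor (an assumption of this sketch) keeps the scale strictly positive
    return max(nu2 + learning_rate * (alpha_hat - alpha_opt), floor)

nu2_up = update_scale(1.0, np.array([0.5, 0.5]), learning_rate=0.1)    # accepting too often
nu2_down = update_scale(1.0, np.array([0.1, 0.1]), learning_rate=0.1)  # accepting too rarely
```

When the average acceptance probability exceeds 0.234 the scale grows, which makes proposals bolder; when it falls below, the scale shrinks.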


Section 5

Evaluation


Synthetic nonlinear target (Banana)

Synthetic target: Banana distribution in 8 dimensions, i.e. Gaussian with twisted second dimension

[Figure: plot of the Banana target]
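A common way to construct such a target is to shear a Gaussian along its second coordinate. The sketch below assumes the parametrization x_2 ← x_2 + b(x_1² − v) with first-coordinate variance v; the values b = 0.1 and v = 100 and both function names are assumptions, since the slides do not state the exact form.

```python
import numpy as np

def sample_banana(n, rng, dim=8, b=0.1, v=100.0):
    """Draw n samples: Gaussian with variance v in x_1, then twist x_2 by b * (x_1^2 - v)."""
    x = rng.standard_normal((n, dim))
    x[:, 0] *= np.sqrt(v)
    x[:, 1] += b * (x[:, 0] ** 2 - v)
    return x

def log_banana(x, b=0.1, v=100.0):
    """Unnormalized log density: undo the twist, then evaluate the Gaussian."""
    y1 = x[:, 1] - b * (x[:, 0] ** 2 - v)
    return -0.5 * (x[:, 0] ** 2 / v + y1 ** 2 + np.sum(x[:, 2:] ** 2, axis=1))

rng = np.random.default_rng(0)
draws = sample_banana(1000, rng)
mode = np.zeros((1, 8))
mode[0, 1] = -0.1 * 100.0          # the twist moves the mode of x_2 to -b * v
```

The twist is a shear with unit Jacobian, so the density of the twisted variable is the Gaussian density evaluated at the untwisted point.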


Synthetic nonlinear target (Banana)

Compare performance of Random-Walk rejuvenation with asymptotically optimal scaling (ν = 2.38/√d), ASMC and KASS with Gaussian RBF kernel

Fixed learning rate of λ = 0.1 to adapt scale parameter using stochastic approximation

Geometric bridge of length 20

30 Monte Carlo runs

Report Maximum Mean Discrepancy (MMD) using polynomial kernel of order 3: distance of moments up to order 3 between ground truth samples and samples produced by each method
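A squared MMD between two sample sets can be estimated from kernel matrix averages. The sketch below uses the biased V-statistic and the kernel (x^⊤y + 1)³; the offset of 1 and the choice of the biased estimator are assumptions, as the slides only name a polynomial kernel of order 3.

```python
import numpy as np

def poly_kernel(a, b, degree=3, offset=1.0):
    """Polynomial kernel matrix (a b^T + offset)^degree between rows of a and b."""
    return (a @ b.T + offset) ** degree

def mmd2(x, y, degree=3):
    """Biased (V-statistic) estimate of squared MMD between sample sets x and y."""
    return (poly_kernel(x, x, degree).mean()
            + poly_kernel(y, y, degree).mean()
            - 2.0 * poly_kernel(x, y, degree).mean())

rng = np.random.default_rng(0)
reference = rng.standard_normal((500, 2))       # stand-in for ground truth samples
same = rng.standard_normal((500, 2))            # samples from the same distribution
shifted = rng.standard_normal((500, 2)) + 3.0   # samples with a shifted mean
print(mmd2(reference, same), mmd2(reference, shifted))
```

The shifted sample set differs in its first moment, so its MMD to the reference should be far larger than that of the matching set.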


Synthetic nonlinear target (Banana)

[Figure: MMD to benchmark sample (×10^7) versus population size (0 to 600) for KASS, RW-SMC and ASMC]

Figure: Improved convergence of all mixed moments up to order 3 of KASS compared to ASMC and RW-SMC.


Sensor network localization

Applied problem: infer locations of S = 3 sensors in a sensor network measuring distance to each other

Known position for B = 2 base sensors

Measurements successful with probability decaying exponentially in squared distance (otherwise unobserved)

\[ Z_{i,j} \sim \mathrm{Binom}\left(1, \exp\left(-\frac{\|x_i - x_j\|_2^2}{2 \cdot 0.3^2}\right)\right) \]

Measurements corrupted by Gaussian noise

\[ \begin{cases} Y_{i,j} \sim \mathcal{N}(\|x_i - x_j\|, 0.02) & \text{if } Z_{i,j} = 1 \\ Y_{i,j} = 0 & \text{else} \end{cases} \]
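The observation model can be simulated directly. In this sketch the value 0.02 is treated as the noise standard deviation and each unordered pair of sensors yields one shared measurement; both choices are assumptions, and the function name `simulate_observations` is hypothetical.

```python
import numpy as np

def simulate_observations(locations, rng, range_scale=0.3, noise_std=0.02):
    """Simulate indicators Z and noisy distances Y for all sensor pairs."""
    n = locations.shape[0]
    dist = np.linalg.norm(locations[:, None, :] - locations[None, :, :], axis=-1)
    p_obs = np.exp(-dist ** 2 / (2.0 * range_scale ** 2))   # success probability
    iu = np.triu_indices(n, k=1)                            # one draw per unordered pair
    z = np.zeros((n, n), dtype=int)
    z[iu] = rng.uniform(size=len(iu[0])) < p_obs[iu]
    z = z + z.T
    noise = np.zeros((n, n))
    noise[iu] = noise_std * rng.standard_normal(len(iu[0]))
    noise = noise + noise.T
    y = np.where(z == 1, dist + noise, 0.0)                 # unobserved pairs are set to 0
    return z, y

rng = np.random.default_rng(0)
locs = rng.uniform(size=(5, 2))    # 3 unknown sensors plus 2 base sensors, placed at random
z, y = simulate_observations(locs, rng)
```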


Sensor network localization

run KASS and ASMC with geometric bridge of length 50 and 10,000 particles, fixed learning rate λ_i = 1

run KAMH for 50 · 10,000 iterations, discard first half as burn-in, diminishing adaptation λ_i = 1/√i

initialize both algorithms with samples from prior

qualitative comparison of KASS and closest adaptive MCMC algorithm KAMH


Sensor network localization: KAMH adaptive MCMC

[Figure: scatter plot titled "MCMC (KAMH)"]

Figure: Posterior samples of unknown sensor locations (in color) by KAMH. Set-up of the true sensor locations (black dots) and base sensors (black stars) causes uncertainty in posterior.


Sensor network localization: KASS adaptive SMC

[Figure: scatter plot titled "SMC (KASS)"]

Figure: Posterior samples of unknown sensor locations (in color) by KASS. Set-up of the true sensor locations (black dots) and base sensors (black stars) causes uncertainty in posterior.


Sensor network localization

MCMC algorithm not able to traverse all the modes without special care (e.g. Wormhole HMC by Lan et al., 2014)

KASS and ASMC perform similarly in this setup

with S = 2 (higher uncertainty), 1000 particles, MMD of

  0.76 ± 0.4 for KASS

  0.94 ± 0.7 for ASMC


Evidence approximation for intractable likelihoods

in classification using Gaussian Processes (GP), logistic transformation renders likelihood intractable

likelihood can be unbiasedly estimated using Importance Sampling from EP approximation

estimate model evidence when using ARD kernel in the GP

particularly hard because noisy likelihoods mean noisy importance weights

ground truth by averaging evidence estimate over 20 long-running SMC algorithms


Evidence approximation for intractable likelihoods

Figure: Ground truth in red, KASS in blue, ASMC in green.


Section 6

Conclusion


Conclusion (1)

Developed Kernel Adaptive SMC sampler for static models

KASS exploits local covariance of target through RKHS-informed rejuvenation proposals

combines these with general SMC advantages for multimodal targets and evidence estimation

especially attractive when likelihoods are intractable


Conclusion (2)

evaluated on a strongly twisted Banana where it was clearly better than ASMC

KASS enables exploring multiple modes in nonlinear sensor

KASS exhibits less variance than ASMC in evidence estimation for GP classification

evidence approximation even in case of intractable likelihoods


Thanks!


Literature I

Andrieu, C. and Thoms, J. (2008). A tutorial on adaptive MCMC. Statistics and Computing, 18(November):343–373.

Chopin, N. (2002). A sequential particle filter method for static models. Biometrika, 89(3):539–552.

Chopin, N., Jacob, P. E., and Papaspiliopoulos, O. (2011). SMC^2: an efficient algorithm for sequential analysis of state-space models. 0(1):1–27.

Fearnhead, P. and Taylor, B. M. (2013). An Adaptive Sequential Monte Carlo Sampler. Bayesian Analysis, (2):411–438.

Givens, G. H. and Raftery, A. E. (1996). Local Adaptive Importance Sampling for Multivariate Densities with Strong Nonlinear Relationships. Journal of the American Statistical Association, 91(433):132–141.


Lan, S., Streets, J., and Shahbaba, B. (2014). Wormhole Hamiltonian Monte Carlo. In Twenty-Eighth AAAI Conference on Artificial Intelligence.

Neal, R. (1998). Annealed Importance Sampling. Technical report, University of Toronto.

Rosenthal, J. S. (2011). Optimal Proposal Distributions and Adaptive MCMC. In Handbook of Markov Chain Monte Carlo, chapter 4, pages 93–112. Chapman & Hall.

Sejdinovic, D., Strathmann, H., Lomeli, M. G., Andrieu, C., and Gretton, A. (2014). Kernel Adaptive Metropolis-Hastings. In International Conference on Machine Learning (ICML), pages 1665–1673.


Tran, M.-N., Scharth, M., Pitt, M. K., and Kohn, R. (2013). Importance sampling squared for Bayesian inference in latent variable models. pages 1–39.
