Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas...

42
Vecchia-Laplace Approximations for Generalized Gaussian Processes Matthias Katzfuss Department of Statistics Texas A&M University Joint work with Daniel Zilber September 28, 2018 Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 1 / 27

Transcript of Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas...

Page 1: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace Approximations for GeneralizedGaussian Processes

Matthias Katzfuss

Department of StatisticsTexas A&M University

Joint work with Daniel Zilber

September 28, 2018

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 1 / 27

Page 2: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Outline

Outline

1 Gaussian processes

2 The general Vecchia frameworkOverview and basic ideaGaussian noise

3 Vecchia-Laplace approximations for generalized GPs

4 Conclusions

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27

Page 3: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Outline

1 Gaussian processes

2 The general Vecchia frameworkOverview and basic ideaGaussian noise

3 Vecchia-Laplace approximations for generalized GPs

4 Conclusions

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 3 / 27

Page 4: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Function estimation

Consider a function , observed incompletely , and with noise/error

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 4 / 27

Page 5: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Function estimation

●●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●●

●●

Consider a function , observed incompletely , and with noise/error

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 4 / 27

Page 6: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Function estimation

●●

●●

●●

●●

●●

Consider a function , observed incompletely , and with noise/error

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 4 / 27

Page 7: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Gaussian processes (GPs): Probabilistic function estimators

●●

●●

●●

●●

●●

GPs provide an optimal function estimate under the assumption of aninfinite-dimensional normal distributionand quantify uncertainty in the form of a joint probability distribution

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 5 / 27

Page 8: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Gaussian processes (GPs): Probabilistic function estimators

●●

●●

●●

●●

●●

GPs provide an optimal function estimate under the assumption of aninfinite-dimensional normal distributionand quantify uncertainty in the form of a joint probability distribution

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 5 / 27

Page 9: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Gaussian processes (GPs): Probabilistic function estimators

●●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●●

●●

GPs provide an optimal function estimate under the assumption of aninfinite-dimensional normal distributionand quantify uncertainty in the form of a joint probability distribution

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 5 / 27

Page 10: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Challenge: GPs are not scalable

For n data points, need to work with n × n covariance matrix→ Direct inference has O(n3) time and O(n2) memory complexity

0 20 40 60 80 100

010

2030

4050

n (thousands)

hour

scubic

Want methods/approximations that scale linearly in nMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 6 / 27

Page 11: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Challenge: GPs are not scalable

For n data points, need to work with n × n covariance matrix→ Direct inference has O(n3) time and O(n2) memory complexity

0 20 40 60 80 100

010

2030

4050

n (thousands)

hour

scubiclinear

Want methods/approximations that scale linearly in nMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 6 / 27

Page 12: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Gaussian processes

Existing approaches for computational feasibility

Let Σ be the n × n data covariance matrix.

Existing approaches include:

• Sparse Σ (e.g., Furrer et al., 2006; Kaufman et al., 2008)

• Sparse Σ−1 (e.g., Rue and Held, 2005; Lindgren et al., 2011; Nychkaet al., 2015)

• Low-rank Σ (e.g., Higdon, 1998; Wikle and Cressie, 1999;Quinonero-Candela and Rasmussen, 2005; Banerjee et al., 2008;Cressie and Johannesson, 2008)

• Sparse Cholesky factor of Σ−1: Vecchia

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 7 / 27

Page 13: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework

Outline

1 Gaussian processes

2 The general Vecchia frameworkOverview and basic ideaGaussian noise

3 Vecchia-Laplace approximations for generalized GPs

4 Conclusions

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 8 / 27

Page 14: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Overview and basic idea

Outline

1 Gaussian processes

2 The general Vecchia frameworkOverview and basic ideaGaussian noise

3 Vecchia-Laplace approximations for generalized GPs

4 Conclusions

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 9 / 27

Page 15: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Overview and basic idea

Vecchia approximation

Assume x = (x1, . . . , xN) is multivariate Gaussian. Density function can befactorized as

f (x) =N∏i=1

f (xi |x1:i−1),

where x1:i−1 := (x1, . . . , xi−1).

This factorization motivates the Vecchia (1988) approximation:

f (x) =N∏i=1

f (xi |xg(i)),

where g(i) ⊂ {1, . . . , i − 1} is the conditioning set of size |g(i)| ≤ m.

If screening effect holds, can get good approximation for m� N, whichcan lead to enormous computational savings.

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 10 / 27

Page 16: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Overview and basic idea

Vecchia approximation

Assume x = (x1, . . . , xN) is multivariate Gaussian. Density function can befactorized as

f (x) =N∏i=1

f (xi |x1:i−1),

where x1:i−1 := (x1, . . . , xi−1).

This factorization motivates the Vecchia (1988) approximation:

f (x) =N∏i=1

f (xi |xg(i)),

where g(i) ⊂ {1, . . . , i − 1} is the conditioning set of size |g(i)| ≤ m.

If screening effect holds, can get good approximation for m� N, whichcan lead to enormous computational savings.

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 10 / 27

Page 17: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Overview and basic idea

Vecchia approximation

Assume x = (x1, . . . , xN) is multivariate Gaussian. Density function can befactorized as

f (x) =N∏i=1

f (xi |x1:i−1),

where x1:i−1 := (x1, . . . , xi−1).

This factorization motivates the Vecchia (1988) approximation:

f (x) =N∏i=1

f (xi |xg(i)),

where g(i) ⊂ {1, . . . , i − 1} is the conditioning set of size |g(i)| ≤ m.

If screening effect holds, can get good approximation for m� N, whichcan lead to enormous computational savings.

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 10 / 27

Page 18: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Overview and basic idea

General Vecchia framework (Katzfuss and Guinness, 2017)

If screening is weak for data/responses, might be able to includeunobserved (latent) variables in x for stronger screening effect.

Three choices for general Vecchia:

1. Which variables to include in x?

2. How to order the variables in x?

3. How to choose the conditioning sets g(i)?

Choices can result in tremendous differences in terms of approximationaccuracy and computational speed.

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 11 / 27

Page 19: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Overview and basic idea

Special cases

• Standard/response Vecchia: Noisy responses in x (e.g., Vecchia,1988; Stein et al., 2004)

• Nearest-neighbor GP: Latent GP in x (Datta et al., 2016)

• Multi-resolution approximation (Katzfuss, 2017):Ordering/conditioning based on iterative domain partitioning

• Special cases: Modified predictive process (Finley et al., 2009),FSA-block (Snelson and Ghahramani, 2007; Sang et al., 2011)

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 12 / 27

Page 20: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Overview and basic idea

Ordering and conditioning

Coordinate

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

●●

●●●

●●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

● ● ● ● ●

Max-min distance

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ●

MRA

● ●

● ●

● ● ● ●

● ● ● ●

● ● ● ●

● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ●

● ●

● ●

● ●

● ●

● ●

● ● ● ●

● ● ● ●

● ● ● ●

● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ●

● ● ● ●

● ● ● ●

• Max-min distance ordering can be much more accurate thancoordinate ordering (Guinness, 2018)

• Conditioning usually on m nearest previously ordered locations, butmore complicated schemes possible

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 13 / 27

Page 21: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Gaussian noise

Outline

1 Gaussian processes

2 The general Vecchia frameworkOverview and basic ideaGaussian noise

3 Vecchia-Laplace approximations for generalized GPs

4 Conclusions

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 14 / 27

Page 22: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Gaussian noise

Response Vecchia (Vecchia, 1988)

Vecchia approximation is applied to data/responses directly

Exact GP vs. response Vecchia with m = 4

●●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●●

●●

Works well for data without noiseWorks very poorly if data are noisyMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 15 / 27

Page 23: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Gaussian noise

Response Vecchia (Vecchia, 1988)

Vecchia approximation is applied to data/responses directly

Exact GP vs. response Vecchia with m = 4

●●

●●

●● ●

●●

●●●

●●

●●

●●

●●

●●●

●●

Works well for data without noiseWorks very poorly if data are noisyMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 15 / 27

Page 24: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Gaussian noise

Response Vecchia (Vecchia, 1988)

Vecchia approximation is applied to data/responses directly

Exact GP vs. response Vecchia with m = 4

●●

●●

●●

●●

●●

Works well for data without noiseWorks very poorly if data are noisyMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 15 / 27

Page 25: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Gaussian noise

General Vecchia (Katzfuss and Guinness, 2017)

x consists of noisy data and (unknown/latent) GP realizations

●●

●●

●●

●●

●●

Works well even if data are noisy (m = 4)

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 16 / 27

Page 26: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Gaussian noise

Specific methods

• Sparse general Vecchia (SGV) approximations to the likelihood: xcontains noisy data and latent GP realizations (Katzfuss andGuinness, 2017)

• SGV predictions: x contains noisy data, latent GP realizations, andGP at prediction locations (Katzfuss et al., 2018)

• Multi-level Vecchia: Automatically tailors different approximations todifferent scales (Zhang & Katzfuss, in prep.)

Crucial details:

• Include variables and order them such that strong screening effectholds

• Integrate out latent/nuisance variables• Study sparsity and guarantee linear scaling using directed acyclic

graph (DAG) resultsMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 17 / 27

Page 27: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

The general Vecchia framework Gaussian noise

Specific methods

• Sparse general Vecchia (SGV) approximations to the likelihood: xcontains noisy data and latent GP realizations (Katzfuss andGuinness, 2017)

• SGV predictions: x contains noisy data, latent GP realizations, andGP at prediction locations (Katzfuss et al., 2018)

• Multi-level Vecchia: Automatically tailors different approximations todifferent scales (Zhang & Katzfuss, in prep.)

Crucial details:

• Include variables and order them such that strong screening effectholds

• Integrate out latent/nuisance variables• Study sparsity and guarantee linear scaling using directed acyclic

graph (DAG) resultsMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 17 / 27

Page 28: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Outline

1 Gaussian processes

2 The general Vecchia frameworkOverview and basic ideaGaussian noise

3 Vecchia-Laplace approximations for generalized GPs

4 Conclusions

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 18 / 27

Page 29: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Non-Gaussian spatial data: Generalized GP (GGP)

Conditional on GP, data are independent from exponential family:binary, categorical, counts, . . .

Example: Binary classification using logistic GGP:• Take GP function• Transform into probability using logistic link, then draw from

Bernoulli distribution

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 19 / 27

Page 30: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Non-Gaussian spatial data: Generalized GP (GGP)

Conditional on GP, data are independent from exponential family:binary, categorical, counts, . . .

Example: Binary classification using logistic GGP:• Take GP function• Transform into probability using logistic link, then draw from

Bernoulli distribution

−2

−1

01

2

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 19 / 27

Page 31: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Non-Gaussian spatial data: Generalized GP (GGP)

Conditional on GP, data are independent from exponential family:binary, categorical, counts, . . .

Example: Binary classification using logistic GGP:• Take GP function• Transform into probability using logistic link, then draw from

Bernoulli distribution

−2

−1

01

2

0.0

0.2

0.4

0.6

0.8

1.0

●●

● ●

● ●

●●

●●● ●●●●●

●●

● ●

●●●●●

● ●

●●●●●●●

● ● ●●●

●●

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 19 / 27

Page 32: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Log-Gaussian Cox process (LGCP) as a GGP

• Poisson process with random intensity function λ(·) = ey(·), wherey(·) ∼ GP(µ,C )

• Approximation as GGP with Poisson likelihood:• Partition domain into grid cells A1, . . . ,An with center points a1, . . . , an• Data zi = z(Ai ): number of observed points in Ai

• Then, z1, . . . , zn|y(·) ind.∼ Poisson(µ(Ai )), where µ(Ai ) ≈ |Ai | ey(ai )

Latent

−3

−2

−1

0

1

2

●●

●●

●●

●●

●●

Observed Down−Sampled

12

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 20 / 27

Page 33: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Laplace for non-Gaussian data

For generalized GP, posterior is intractable → 2nd-order Taylor expansionat the mode (Laplace approximation).

Laplace algorithm: Iterative GP prediction using Gaussian pseudo-data

0.0

0.2

0.4

0.6

0.8

1.0

●●

● ●

● ●

●●

●●● ●●●●●

●●

● ●

●●●●●

● ●

●●●●●●●

● ● ●●●

●●

But still O(n3) → infeasible for large n

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 21 / 27

Page 34: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Laplace for non-Gaussian data

For generalized GP, posterior is intractable → 2nd-order Taylor expansionat the mode (Laplace approximation).

Laplace algorithm: Iterative GP prediction using Gaussian pseudo-data

0.0

0.2

0.4

0.6

0.8

1.0

●●

● ●

● ●

●●

●●● ●●●●●

●●

● ●

●●●●●

● ●

●●●●●●●

● ● ●●●

●●

−2

−1

01

23

●●

● ●

●●

●●●

●●●

●●

●●

●●●●●

● ●

●●●●●●

●●

●●●

●●

But still O(n3) → infeasible for large n

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 21 / 27

Page 35: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Laplace for non-Gaussian data

For generalized GP, posterior is intractable → 2nd-order Taylor expansionat the mode (Laplace approximation).

Laplace algorithm: Iterative GP prediction using Gaussian pseudo-data

0.0

0.2

0.4

0.6

0.8

1.0

●●

● ●

● ●

●●

●●● ●●●●●

●●

● ●

●●●●●

● ●

●●●●●●●

● ● ●●●

●●

−2

−1

01

23

●●

● ●

●●

●●●

●●●

●●

●●

●●●●●

● ●

●●●●●●

●●

●●●

●●

But still O(n3) → infeasible for large n

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 21 / 27

Page 36: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Vecchia-Laplace (Katzfuss & Zilber, in prep.)

Based on pseudo-data t, apply Vecchia to x = (t1, . . . , tn, y1, . . . , yn)

Comparison of MSE relative to Laplace:

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 22 / 27

Page 37: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Vecchia-Laplace (Katzfuss & Zilber, in prep.)

Based on pseudo-data t, apply Vecchia to x = (t1, . . . , tn, y1, . . . , yn)

Comparison of MSE relative to Laplace:

0 5 10 15 20 25 30

1.00

1.05

1.10

1.15

1.20

ν = 0.5 (Logistic)

Conditioning Set Size

MS

E

● ●

0 5 10 15 20 25 30

1.00

1.05

1.10

1.15

1.20

ν = 0.5 (Poisson)

Conditioning Set Size

MS

E

●● ●

0 5 10 15 20 25 30

1.00

1.05

1.10

1.15

1.20

1.25

ν = 0.5 (Gamma)

Conditioning Set Size

MS

E

●● ●

● VL Laplace Low Rank

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 22 / 27

Page 38: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Comparison for simulated LGCP

Matern covariance with range 2.5 and smoothness ν on domainD = [0, 50]2, discretized to n = 2,500 = 50× 50 grid

RMSE

ν = 0.5 ν = 1.5

0 20 40 60 0 20 40 60

1.2

1.5

1.8

1.00

1.05

1.10

1.15

1.20

1.25

m

RM

SE

Algorithm Laplace LowRank VL

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 23 / 27

Page 39: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Comparison for simulated LGCP

Matern covariance with range 2.5 and smoothness ν on domainD = [0, 50]2, discretized to n = 2,500 = 50× 50 grid

KL divergence

ν = 0.5 ν = 1.5

0 20 40 60 0 20 40 60

0

1000

2000

3000

0

200

400

600

800

m

LS

Algorithm Laplace LowRank VLMatthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 24 / 27

Page 40: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Vecchia-Laplace approximations for generalized GPs

Vecchia-Laplace algorithm

• Complexity is linear in n

• Typically converges in < 10 iterations

• Unknown hyperparameters: Approximate the integrated likelihood

• Predictions at unobserved locations

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 25 / 27

Page 41: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Conclusions

Outline

1 Gaussian processes

2 The general Vecchia frameworkOverview and basic ideaGaussian noise

3 Vecchia-Laplace approximations for generalized GPs

4 Conclusions

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 26 / 27

Page 42: Vecchia-Laplace Approximations for Generalized Gaussian ......4 Conclusions Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 2 / 27 Gaussian processes

Conclusions

Conclusions

• General Vecchia framework for GP approximations:• Accurate• Choice of x, ordering, and conditioning all important• Can guarantee linear scalability• Extension to GGPs and LGCPs

• R package: https://github.com/katzfuss-group/GPvecchia

• Papers:• Katzfuss, M. and Guinness, J. (2017). A general framework for Vecchia

approximations of Gaussian processes. arXiv:1708.06302.• Katzfuss, M., Guinness, J., and Gong, W. (2018). Vecchia

approximations of Gaussian-process predictions. arXiv:1805.03309.• Zilber, D. and Katzfuss, M. (in prep.) A Vecchia-Laplace

approximation for generalized Gaussian processes.

• Partially supported by NSF Grants DMS–1654083 and DMS–1521676

Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). Gaussianpredictive process models for large spatial data sets. Journal of theRoyal Statistical Society, Series B, 70(4):825–848.

Cressie, N. and Johannesson, G. (2008). Fixed rank kriging for very largespatial data sets. Journal of the Royal Statistical Society, Series B,70(1):209–226.

Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016).Hierarchical nearest-neighbor Gaussian process models for largegeostatistical datasets. Journal of the American Statistical Association,111(514):800–812.

Finley, A. O., Sang, H., Banerjee, S., and Gelfand, A. E. (2009).Improving the performance of predictive process modeling for largedatasets. Computational Statistics & Data Analysis, 53(8):2873–2884.

Furrer, R., Genton, M. G., and Nychka, D. (2006). Covariance tapering forinterpolation of large spatial datasets. Journal of Computational andGraphical Statistics, 15(3):502–523.

Guinness, J. (2018). Permutation and grouping methods for sharpeningGaussian process approximations. Technometrics.

Higdon, D. (1998). A process-convolution approach to modellingtemperatures in the North Atlantic Ocean. Environmental andEcological Statistics, 5(2):173–190.

Katzfuss, M. (2017). A multi-resolution approximation for massive spatialdatasets. Journal of the American Statistical Association,112(517):201–214.

Katzfuss, M. and Guinness, J. (2017). A general framework for Vecchiaapproximations of Gaussian processes. arXiv:1708.06302.

Katzfuss, M., Guinness, J., and Gong, W. (2018). Vecchia approximationsof Gaussian-process predictions. arXiv:1805.03309.

Kaufman, C. G., Schervish, M. J., and Nychka, D. W. (2008). Covariancetapering for likelihood-based estimation in large spatial data sets.Journal of the American Statistical Association, 103(484):1545–1555.

Lindgren, F., Rue, H., and Lindstrom, J. (2011). An explicit link betweengaussian fields and gaussian markov random fields: the stochastic partialdifferential equation approach. Journal of the Royal Statistical Society:Series B, 73(4):423–498.

Nychka, D. W., Bandyopadhyay, S., Hammerling, D., Lindgren, F., andSain, S. R. (2015). A multi-resolution Gaussian process model for theanalysis of large spatial data sets. Journal of Computational andGraphical Statistics, 24(2):579–599.

Quinonero-Candela, J. and Rasmussen, C. E. (2005). A unifying view ofsparse approximate Gaussian process regression. Journal of MachineLearning Research, 6:1939–1959.

Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theoryand Applications. CRC press.

Sang, H., Jun, M., and Huang, J. Z. (2011). Covariance approximation forlarge multivariate spatial datasets with an application to multipleclimate model errors. Annals of Applied Statistics, 5(4):2519–2548.

Snelson, E. and Ghahramani, Z. (2007). Local and global sparse Gaussianprocess approximations. In Artificial Intelligence and Statistics 11(AISTATS).

Stein, M. L., Chi, Z., and Welty, L. (2004). Approximating likelihoods forlarge spatial data sets. Journal of the Royal Statistical Society: SeriesB, 66(2):275–296.

Vecchia, A. (1988). Estimation and model identification for continuousspatial processes. Journal of the Royal Statistical Society, Series B,50(2):297–312.

Wikle, C. K. and Cressie, N. (1999). A dimension-reduced approach tospace-time Kalman filtering. Biometrika, 86(4):815–829.

Matthias Katzfuss (Texas A&M) Vecchia-Laplace Approximations September 28, 2018 27 / 27