Bayesian computation with INLA


Short-course about Bayesian computation with INLA given on the AS2013 conference in Ribno, Slovenia.

Transcript of Bayesian computation with INLA

Bayesian computation using INLA

Thiago G. Martins

Norwegian University of Science and Technology, Trondheim, Norway

AS 2013, Ribno, Slovenia

September, 2013

1 / 140

Part I

Latent Gaussian models and INLA methodology

2 / 140

Outline

Latent Gaussian models

Are latent Gaussian models important?

Bayesian computing

INLA method

3 / 140

Hierarchical Bayesian models

Hierarchical models are an extremely useful tool in Bayesian model building.

Three parts:

- Observations (y): encode information about the observed data, including design and collection issues.

- The latent process (x): the unobserved process. It may be the focus of the study, or may be included to reduce autocorrelation, e.g. to encode spatial and/or temporal dependence.

- The parameter model (θ): models for all of the parameters in the observation and latent processes.

4 / 140


Latent Gaussian models

A latent Gaussian model is a Bayesian hierarchical model of the following form:

- Observed data y:  y_i | x_i ∼ π(y_i | x_i, θ)

- Latent Gaussian field:  x ∼ N(·, Σ(θ))

- Hyperparameters θ ∼ π(θ), controlling
  - variability,
  - length/strength of dependence,
  - parameters in the likelihood.

π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i ∈ I} π(y_i | x_i, θ)

5 / 140


Precision matrix

The precision matrix of the latent field,

Q(θ) = Σ(θ)^{-1},

plays a key role! Two reasons:

- Building models through conditioning ("hierarchical models")

- Computational benefits

6 / 140


Building models through conditioning

If

- x ∼ N(0, Q_x^{-1})

- y | x ∼ N(x, Q_y^{-1})

then

Q(x, y) = [ Q_x + Q_y   −Q_y ]
          [   −Q_y        Q_y ]

The corresponding expressions in terms of covariance matrices are not as nice.

7 / 140
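As a quick numerical check of the block expression above, here is a tiny R sketch (the choices Q_x = I and Q_y = 2I are arbitrary): the joint precision is easy to write down, while the joint covariance obtained by inverting it is dense.

Qx <- diag(2)                         # precision of x
Qy <- 2 * diag(2)                     # precision of y | x
Q  <- rbind(cbind(Qx + Qy, -Qy),
            cbind(-Qy,      Qy))      # joint precision of (x, y)
solve(Q)                              # joint covariance: dense, no simple block pattern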

Computational benefits

- Precision matrices encode conditional independence:

  x_i ⊥ x_j | x_{−ij}  ⟺  Q_ij = 0

  We are interested in models with sparse precision matrices.

- x ∼ N(·, Σ(θ)) with sparse Q(θ) = Σ(θ)^{-1}: Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs).

- Good computational properties through numerical algorithms for sparse matrices.

8 / 140
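A small R illustration of this point (an assumed AR(1) example, not from the slides): the covariance matrix of an AR(1) process is completely dense, yet its precision matrix is tridiagonal, i.e. Q_ij = 0 whenever |i − j| > 1.

n   <- 6
phi <- 0.7                                             # arbitrary AR(1) coefficient
Sigma <- phi^abs(outer(1:n, 1:n, "-")) / (1 - phi^2)   # dense AR(1) covariance
Q <- solve(Sigma)
round(Q, 3)                                            # tridiagonal precision matrix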


Numerical algorithms for sparse matrices: scaling properties

Typical cost of factorizing the sparse precision matrix:

- Time (temporal models): O(n)

- Space (spatial models): O(n^{3/2})

- Space-time (spatio-temporal models): O(n^2)

This is to be compared with general O(n^3) algorithms for dense matrices.

9 / 140


Outline

Latent Gaussian models

Are latent Gaussian models important?

Bayesian computing

INLA method

10 / 140

Example (I): Mixed-effect model

y_ij | η_ij, θ_1 ∼ π(y_ij | η_ij, θ_1),   i = 1, ..., N,  j = 1, ..., M

η_ij = µ + c_ij β + u_i + v_j + w_ij

where u, v and w are "random effects".

If we assign Gaussian priors to µ, β, u and v, then

x | θ_2 = (µ, β, u, v, η) | θ_2

is jointly Gaussian, and θ = (θ_1, θ_2).

11 / 140
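As a preview of Part II, the linear predictor above maps directly onto an R-INLA formula; a hypothetical sketch, where i, j and c are assumed column names in the data frame:

## u_i and v_j as exchangeable ("iid") random effects; covariate c enters linearly
formula <- y ~ 1 + c + f(i, model = "iid") + f(j, model = "iid")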


Example (I) - cont.

We can reinterpret the model as

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q^{-1}(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)

- dim(x) can be large, 10^2 - 10^5

- dim(θ) is small, 1 - 5

12 / 140


Example (I) - cont.

Precision matrix of (η, u, v, µ, β), with N = 100 and M = 5.

[Figure: sparsity pattern of the precision matrix]

13 / 140

Example (II): Time-series model

Smoothing of a binary time series.

- Data is a sequence of 0s and 1s.

- The probability of a 1 at time t, p_t, depends on time:

  p_t = exp(η_t) / (1 + exp(η_t))

- Linear predictor:

  η_t = µ + β c_t + u_t + v_t,   t = 1, ..., n

14 / 140


Example (II) - cont.

Prior models:

- µ and β are Normal.

- u is an AR model, e.g.

  u_t = φ u_{t−1} + ε_t

  with parameters (φ, σ_ε^2).

- v is an unstructured term, a "random effect".

This gives that

x | θ = (µ, β, u, v, η) | θ

is jointly Gaussian, with hyperparameters θ = (φ, σ_ε^2, σ_v^2).

15 / 140


Example (II) - cont.

We can reinterpret the model as

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q^{-1}(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)

- dim(x) can be large, 10^2 - 10^5

- dim(θ) is small, 1 - 5

16 / 140


Example (II) - cont.

Precision matrix of (η, u, v, µ, β), with n = 100.

[Figure: sparsity pattern of the precision matrix]

17 / 140

Example (III): Disease mapping

- Data: y_i ∼ Poisson(E_i exp(η_i))

- Log-relative risk: η_i = µ + u_i + v_i + f(c_i)

- Structured (spatial) component u

- Unstructured component v

- Smooth effect of a covariate c

[Map: posterior log-relative risk, colour scale from −0.63 to 0.98]

18 / 140


Yet another look at Example (III)

We can reinterpret the model as

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q^{-1}(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)

- dim(x) can be large, 10^2 - 10^5

- dim(θ) is small, 1 - 5

19 / 140

Example (III) - cont.

Precision matrix of (η, u, v, µ, f).

[Figure: sparsity pattern of the precision matrix]

20 / 140

What we have learned so far

The latent Gaussian model construct

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q^{-1}(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)

occurs in many, seemingly unrelated, statistical models: GLM/GAM/GLMM/GAMM/++

21 / 140

Further Examples

- Dynamic linear models
- Stochastic volatility
- Generalized linear (mixed) models
- Generalized additive (mixed) models
- Spline smoothing
- Semi-parametric regression
- Space-varying (semi-parametric) regression models
- Disease mapping
- Log-Gaussian Cox processes
- Model-based geostatistics (*)
- Spatio-temporal models
- Survival analysis
- +++

22 / 140


Outline

Latent Gaussian models

Are latent Gaussian models important?

Bayesian computing

INLA method

23 / 140

Bayesian computing

We are interested in posterior marginal quantities like π(x_i | y) and π(θ_i | y).

This requires the evaluation of integrals of the form

π(x_i | y) ∝ ∫_{x_{−i}} ∫_θ π(y | x, θ) π(x | θ) π(θ) dθ dx_{−i}

The computation of massively high-dimensional integrals is at the core of Bayesian computing.

24 / 140


But surely we can already do this

- Markov chain Monte Carlo (MCMC) is widely used by the applied community.

- There are generic tools available for MCMC (OpenBUGS, JAGS, STAN) and others for specific classes of models, like BayesX.

- The issue of Bayesian computing is not "solved" even though MCMC is available:
  - Hierarchical models are more difficult for MCMC.
  - Strong dependencies, bad mixing.

- A main obstacle for Bayesian modeling is still the issue of "Bayesian computing".

25 / 140


So what's wrong with MCMC?

This is actually a problem with any Monte Carlo scheme.

Error in expectations

The (root-mean-square) Monte Carlo error is

sd( E(f(X)) − (1/N) ∑_{i=1}^N f(x_i) ) = O(1/√N)

In practical terms, to reduce the error to O(10^{-p}) you need O(10^{2p}) samples!

This can be optimistic!

26 / 140
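A quick R illustration of this rate (a toy example, not part of the original slides): estimating E(X) = 0 for X ∼ N(0, 1), the error shrinks by roughly a factor of 10 only when the sample size grows by a factor of 100.

set.seed(1)
for (N in 10^(2:6)) {
  est <- mean(rnorm(N))                       # Monte Carlo estimate of E(X) = 0
  cat(sprintf("N = %7d   |error| = %.5f\n", N, abs(est)))
}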

Be more narrow

MCMC

- MCMC 'works' for everything, but it is usually not optimal when we focus on a specific class of models.

- It works for latent Gaussian models, but it's too slow.

- (Unfortunately) sometimes it's the only thing we can do.

INLA

- Integrated Nested Laplace Approximations.

- A deterministic algorithm, rather than a stochastic one like MCMC.

- Specially designed for latent Gaussian models.

- Accurate results in a small fraction of the computational time, when compared to MCMC.

27 / 140


Comparing results with MCMC

- When comparing the results of R-INLA with MCMC, it is important to use the same model.

- Here we have compared the EPIL example results with those obtained using JAGS via the rjags package.

28 / 140


Intercept, 0.125 to 120 minutes

[Figures: posterior marginal densities for a0 (intercept), alpha.Age, log(tau.b1) (= log(tau.Ind)) and log(tau.b) (= log(tau.Rand)), comparing R-INLA with JAGS runs of 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64 and 120 minutes.]

29 / 140

Outline

Latent Gaussian models

Are latent Gaussian models important?

Bayesian computing

INLA method

30 / 140

Main aim

Posterior:

π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i ∈ I} π(y_i | x_i, θ)

Compute the posterior marginals:

π(x_i | y) = ∫ π(θ | y) π(x_i | θ, y) dθ

π(θ_j | y) = ∫ π(θ | y) dθ_{−j}

31 / 140



Tasks

1. Build an approximation to π(θ | y):  π̃(θ | y)

2. Build an approximation to π(x_i | θ, y):  π̃(x_i | θ, y)

   π̃(x_i | y) = ∫ π̃(θ | y) π̃(x_i | θ, y) dθ

   π̃(θ_j | y) = ∫ π̃(θ | y) dθ_{−j}

3. Do the integration with respect to θ numerically.

32 / 140

Task 1: π̃(θ | y)

The Laplace approximation for π(θ | y) is

π(θ | y) = π(x, θ | y) / π(x | θ, y)
         ∝ π(θ) π(x | θ) π(y | x, θ) / π(x | θ, y)
         ≈ π(θ) π(x | θ) π(y | x, θ) / π_G(x | θ, y) |_{x = x*(θ)}

where π_G(x | θ, y) is the Gaussian approximation of π(x | θ, y) and x*(θ) is its mode.

33 / 140

The GMRF approximation

π(x | y) ∝ exp( −(1/2) xᵀ Q x + ∑_i log π(y_i | x_i) )
         ≈ exp( −(1/2) (x − µ)ᵀ (Q + diag(c_i)) (x − µ) ) = π̃(x | y)

Constructed as follows:

- Locate the mode x*.

- Expand to second order.

Markov and computational properties are preserved.

34 / 140
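The "locate the mode, expand to second order" idea can be illustrated in one dimension with a few lines of R (a minimal sketch with a made-up Poisson observation and Gaussian prior, not the INLA internals):

y <- 7                                              # a single Poisson count
log_post <- function(x)                             # x is the log-rate
  dnorm(x, mean = 0, sd = 2, log = TRUE) + dpois(y, exp(x), log = TRUE)
mode <- optimize(log_post, c(-10, 10), maximum = TRUE)$maximum   # locate the mode
h  <- 1e-4                                          # numerical second derivative at the mode
d2 <- (log_post(mode + h) - 2 * log_post(mode) + log_post(mode - h)) / h^2
c(mode = mode, sd = sqrt(-1 / d2))                  # Gaussian approximation N(mode, -1/d2)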

Remarks

The Laplace approximation π̃(θ | y) turns out to be accurate, since x | y, θ appears almost Gaussian in most cases:

- x is a priori Gaussian.

- y is typically not very informative.

- The observational model is usually 'well-behaved'.

Note: π̃(θ | y) itself does not look Gaussian!

35 / 140


Task 2: π̃(x_i | y, θ)

This task is more challenging, since

- the dimension n of x is large, and

- there are potentially n marginals to compute, or at least O(n).

Here we present three options:

1. Gaussian approximation

2. Laplace approximation

3. Simplified Laplace approximation

There is a trade-off between accuracy and complexity.

36 / 140


π̃(x_i | y, θ) - 1. Gaussian approximation

An obviously simple and fast alternative is to use the GMRF approximation π_G(x | y, θ):

π̃(x_i | θ, y) = N(x_i; µ_i(θ), σ_i^2(θ))

- It is the fastest option; we only need to compute the diagonal of Q(θ)^{-1}.

- It can present errors in location and fail to capture asymmetry.

37 / 140


π̃(x_i | y, θ) - 2. Laplace approximation

- The Laplace approximation:

  π̃(x_i | y, θ) ≈ π(x, θ | y) / π_GG(x_{−i} | x_i, y, θ) |_{x_{−i} = x*_{−i}(x_i, θ)}

- Again, the approximation is very good, as x_{−i} | x_i, θ is 'almost Gaussian',

- but it is expensive. In order to get the n marginals we must
  - perform n optimizations, and
  - n factorizations of (n − 1) × (n − 1) matrices.

38 / 140


π̃(x_i | y, θ) - 3. Simplified Laplace approximation

Taylor expansions of the Laplace approximation for π(x_i | θ, y):

- computationally much faster;

- corrects the Gaussian approximation for errors in location and skewness:

  log π̃(x_i | θ, y) = −(1/2) x_i^2 + b x_i + (1/6) d x_i^3 + · · ·

- fit a skew-Normal density 2 φ(x) Φ(a x);

- sufficiently accurate for most applications.

39 / 140
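The skew-Normal density mentioned above has the standard form 2 φ(x) Φ(a x); a small R sketch of its shape (the value a = 3 is an arbitrary illustration):

a  <- 3                                    # shape (skewness) parameter
sn <- function(x) 2 * dnorm(x) * pnorm(a * x)
curve(sn, -4, 4, ylab = "density")         # skewed correction
curve(dnorm, add = TRUE, lty = 2)          # symmetric Gaussian for comparison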


Task 3: Numerical integration with respect to θ

Now that we know how to compute

- π̃(θ | y) - Laplace approximation

- π̃(x_i | θ, y) -
  1. Gaussian
  2. Laplace
  3. Simplified Laplace

let's see how INLA works.

40 / 140


The integrated nested Laplace approximation (INLA) I

Step I: Explore π̃(θ | y)

- Locate the mode.
- Use the Hessian to construct new variables.
- Grid search.

41 / 140


The integrated nested Laplace approximation (INLA) II

Step II: For each θ_j

- For each i, evaluate the Laplace approximation for selected values of x_i.

- Build a skew-Normal or log-spline corrected Gaussian,

  N(x_i; µ_i, σ_i^2) × exp(spline),

  to represent the conditional marginal density.

42 / 140


The integrated nested Laplace approximation (INLA) III

Step III: Sum out θ_j

- For each i, sum out θ:

  π̃(x_i | y) ∝ ∑_j π̃(x_i | y, θ_j) × π̃(θ_j | y)

- Build a log-spline corrected Gaussian,

  N(x_i; µ_i, σ_i^2) × exp(spline),

  to represent π̃(x_i | y).

43 / 140


Computing posterior marginals for θ_j (I)

Main idea:

- Use the integration points and build an interpolant.

- Use numerical integration on that interpolant.

44 / 140


How can we assess the error in the approximations?

Tool 1: Compare a sequence of improved approximations

1. Gaussian approximation

2. Simplified Laplace

3. Laplace

45 / 140

How can we assess the error in the approximations?

Tool 2: Estimate the "effective" number of parameters, as defined in the Deviance Information Criterion:

p_D(θ) = D̄(x; θ) − D(x̄; θ),

i.e. the posterior mean deviance minus the deviance at the posterior mean, and compare this with the number of observations. A low ratio is good.

This criterion has theoretical justification.

46 / 140
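In the R-INLA package (Part II), pD and the DIC are available on request; a sketch of how one would ask for them (formula, family and my.data are placeholders here):

result <- inla(formula, family = "poisson", data = my.data,
               control.compute = list(dic = TRUE))
result$dic$p.eff    # effective number of parameters pD
result$dic$dic      # deviance information criterion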

Part II

R-INLA package

47 / 140

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

48 / 140

Implementing INLA

All procedures required to perform INLA need to be carefully implemented to achieve good speed; it is easy to implement a slow version of INLA.

- The GMRFLib library
  - Basic library written in C for fast computations with GMRFs.

- The inla program
  - Defines latent Gaussian models and interfaces with the GMRFLib library.
  - Models are defined using .ini files.
  - The inla program writes all the results (E/Var/marginals) to files.

- The INLA package for R
  - R interface to the inla program. (That's why it's not on CRAN.)
  - Converts "formula" statements into ".ini" file definitions.
  - Runs the inla program.
  - Gets the results back into R.

Happily, the R package is all we need to learn!

49 / 140

The INLA package for R

[Diagram: the INLA package takes a data frame and a formula as input, produces (1) an ini file and (2) input files, (3) runs the inla program, and then collects the output back into an R object of type list, from which you can get summaries, plots etc.]

50 / 140

R-INLA

- Visit the web site

  www.r-inla.org

  and follow the instructions.

- The web site contains source code, examples, reports +++

- The first time, do

  > source("http://www.math.ntnu.no/inla/givmeINLA.R")

  Later, you can upgrade the package by doing

  > inla.upgrade()

  or, if you want the test version (which you want),

  > inla.upgrade(testing=TRUE)

- Available for Linux, Windows and Mac.

51 / 140


Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

52 / 140

The structure of an R program using INLA

There are essentially three parts to an INLA program:

1. The data organization.

2. The formula - notation inherited from R’s native glm function.

3. The call to the INLA program.

53 / 140

The inla function

This is all that's needed for a basic call:

> result <- inla(
      formula = y ~ 1 + x,        # This describes your latent field
      family = "gaussian",        # The likelihood distribution
      data = data.frame(y, x)     # A list or data frame
  )

54 / 140

The simplest case: Linear regression

library(INLA)

n = 100
x = sort(runif(n))
y = 1 + x + rnorm(n, sd = 0.1)
plot(x, y)

formula = y ~ 1 + x
result = inla(formula,
              data = data.frame(x, y),
              family = "gaussian")

summary(result)
plot(result)

55 / 140

Call:

c("inla(formula = formula, family = \"gaussian\", data = data.frame(x, ", " y))")

Time used:

Pre-processing Running inla Post-processing Total

0.08050394 0.03020334 0.01916695 0.12987423

Fixed effects:

mean sd 0.025quant 0.5quant 0.975quant kld

(Intercept) 0.9690533 0.01849785 0.9327319 0.9690531 1.005387 0

x 1.0426582 0.03126996 0.9812582 1.0426580 1.104079 0

The model has no random effects

Model hyperparameters:

                                          mean    sd 0.025quant 0.5quant 0.975quant
Precision for the Gaussian observations 127.45 18.10      95.14   126.37     166.11

Expected number of effective parameters(std dev): 2.209(0.02362)

Number of equivalent replicates : 45.27

Marginal Likelihood: 88.01

56 / 140

Likelihood functions - family argument

result = inla(formula,
              data = data.frame(x, y),
              family = "gaussian")

- "binomial"
- "coxph"
- "Exponential"
- "gaussian"
- "gev"
- "laplace"
- "sn" (skew Normal)
- "stochvol", "stochvol.nig", "stochvol.t"
- "T"
- "weibull"
- Many others: go to http://r-inla.org/

57 / 140


A more general model

Assume the following model:

y ∼ π(y | η)

η = g(λ) = β_0 + β_1 x_1 + β_2 x_2 + f(x_3)

where

- x_1, x_2 are covariates with a linear effect, β_i ∼ N(0, τ_1^{-1}), and

- x_3 can be the index of a spatial effect, random effect, etc., with {f_1, f_2, ...} ∼ N(0, Q_f^{-1}(τ_2)).

58 / 140


A more general model (cont.)

Assume the following model:

y ∼ π(y | η)

η = g(λ) = β_0 + β_1 x_1 + β_2 x_2 + f(x_3)

> formula = y ∼ x1 + x2 + f(x3, ...)

The response vector y = (y_1, ..., y_n)ᵀ is linked through g to the linear predictor η = (η_1, ..., η_n)ᵀ, with

η = β_0 (1, ..., 1)ᵀ + β_1 (x_{11}, ..., x_{1n})ᵀ + β_2 (x_{21}, ..., x_{2n})ᵀ + (f_{x_{31}}, ..., f_{x_{3n}})ᵀ

59 / 140


Model specification - INLA package

The model is specified in R through a formula, similar to glm:

> formula = y ∼ x1 + x2 + f(x3, ...)

- y is the name of your response variable in your data frame.

- An intercept is fitted automatically! Use -1 in your formula to avoid it.

- The fixed effects (β_0, β_1 and β_2) are taken as i.i.d. normal with zero mean and small precision. (This can be changed.)

- The f() function contains the random-effect specifications.

Some models:

- iid, iid1d, iid2d, iid3d: random effects

- rw1, rw2, ar1: smooth effect of covariates or time effect

- seasonal: seasonal effect

- besag: spatial effect (CAR model)

- generic: user-defined precision matrix

60 / 140


Specifying random effects

Random effects are added to the formula through the function

f(name, model="...", hyper = ...,
  replicate = ..., constr = FALSE, cyclic = FALSE)

- name - the name of the random effect. It also refers to the values in the data which are used for various things, usually indexes, e.g. for space or time.

- model - the latent model, e.g. "iid", "rw2", "ar1", etc.

- hyper - specifies the prior on the hyperparameters.

- constr - sum-to-zero constraint?

- cyclic - is the effect cyclic? (RW1, RW2 and AR1)

- There are more advanced options, which we will see later. (A usage sketch follows below.)

61 / 140
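A hypothetical sketch of the syntax (the column names region, year and id, and the graph object g, are made up for illustration):

formula <- y ~ x1 + x2 +
  f(region, model = "besag", graph = g, constr = TRUE) +   # spatial CAR effect
  f(year,   model = "rw2") +                               # smooth temporal effect
  f(id,     model = "iid")                                 # unstructured random effect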


Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

62 / 140

EPIL example

Seizure counts in a randomized trial of anti-convulsant therapy in epilepsy. From the WinBUGS manual.

Patient  y1  y2  y3  y4  Trt  Base  Age
      1   5   3   3   3    0    11   31
      2   3   5   3   3    0    11   30
    ...
     59   1   4   3   2    1    12   37

63 / 140

EPIL example (cont.)

- Mixed model with repeated Poisson counts:

  y_jk ∼ Poisson(µ_jk),   j = 1, ..., 59,  k = 1, ..., 4

  log(µ_jk) = α_0 + α_1 log(Base_j/4) + α_2 Trt_j + α_3 Trt_j log(Base_j/4) + α_4 Age_j + α_5 V4_k + Ind_j + β_jk

  α_i ∼ N(0, τ_α),  τ_α known
  Ind_j ∼ N(0, τ_Ind),  τ_Ind ∼ Gamma(a_1, b_1)
  β_jk ∼ N(0, τ_β),  τ_β ∼ Gamma(a_2, b_2)

64 / 140

EPIL example (cont.)

The Epil data frame:

y  Trt  Base  Age  V4  rand  Ind
5    0    11   31   0     1    1
3    0    11   31   0     2    1
...

Specifying the model:

formula = y ∼ log(Base/4) + Trt + I(Trt * log(Base/4)) + log(Age) + V4 +
           f(Ind, model = "iid") + f(rand, model = "iid")

η = (η_1, η_2, ..., η_{4·59})ᵀ = β_0 (1, ..., 1)ᵀ + ... + (f_Ind_1, f_Ind_1, ..., f_Ind_59)ᵀ + (f_rand_1, f_rand_2, ..., f_rand_{4·59})ᵀ

65 / 140


data(Epil)

my.center = function(x) (x - mean(x))
Epil$CTrt    = my.center(Epil$Trt)
Epil$ClBase4 = my.center(log(Epil$Base/4))
Epil$CV4     = my.center(Epil$V4)
Epil$ClAge   = my.center(log(Epil$Age))

formula = y ~ ClBase4*CTrt + ClAge + CV4 +
          f(Ind, model="iid") + f(rand, model="iid")

result = inla(formula, family="poisson", data = Epil)

summary(result)
plot(result)

66 / 140

Epil-example from Win/Open-BUGS

[Figures: posterior marginals for α_0 and for τ_β]

67 / 140

EPIL example (cont.)

Access the results:

- Summaries (mean, sd, [0.025, 0.5, 0.975]-quantiles, kld):
  - result$summary.fixed
  - result$summary.random$Ind
  - result$summary.random$rand
  - result$summary.hyperpar

- Posterior marginals (matrices with x- and y-values); see the sketch below:
  - result$marginals.fixed
  - result$marginals.random$Ind
  - result$marginals.random$rand
  - result$marginals.hyperpar

68 / 140
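A short sketch of how these are used in practice (assuming result is the Epil fit from the previous slide):

result$summary.fixed                          # posterior summaries of the fixed effects
head(result$summary.random$Ind)               # summaries of the individual-level effects
m <- result$marginals.fixed[["(Intercept)"]]  # two-column matrix (x, y)
plot(m, type = "l", xlab = "intercept", ylab = "density")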


Smoothing binary time series

[Figure: binary rainfall indicator over time]

Number of days in Tokyo with rainfall above 1 mm in 1983-84. We want to estimate the probability of rain p_t for calendar day t = 1, ..., 366.

69 / 140

Smoothing binary time series

- Model with a time-series component:

  y_t ∼ Binomial(n_t, p_t),   t = 1, ..., 366

  p_t = exp(η_t) / (1 + exp(η_t))

  η_t = f(t)

  f = {f_1, ..., f_366} ∼ cyclic RW2(τ)

  τ ∼ Gamma(1, 0.0001)

70 / 140


Smoothing binary time series

The Tokyo data frame:

y  n  time
0  2     1
0  2     2
1  2     3
...

Specifying the model:

formula = y ∼ f(time, model="rw2", cyclic=TRUE) - 1

η = (η_1, η_2, ..., η_366)ᵀ = (f_time_1, f_time_2, ..., f_time_366)ᵀ

71 / 140

data(Tokyo)

formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1

result = inla(formula, family="binomial", Ntrials=n,
              data=Tokyo)

72 / 140

Posterior for the temporal effect

[Figure: posterior mean and 0.025/0.5/0.975 quantiles of f(time) over t = 1, ..., 366]

73 / 140

Posterior for the precision

[Figure: posterior density of the precision for time]

74 / 140

Disease mapping in Germany

Larynx cancer mortality counts are observed in the 544 districts of Germany from 1986 to 1990, together with the level of smoking consumption (100 possible values).

[Maps: mortality counts (colour scale 0.63 to 2.55) and smoking consumption covariate (colour scale 26.22 to 97)]

75 / 140

For i = 1, ..., 544:

- y_i: counts of cancer mortality in region i

- E_i: known variable accounting for demographic variation in region i

- c_i: level of smoking consumption registered in region i

[Maps: as on the previous slide]

76 / 140

The model

yi ∼ Poisson{Ei exp(ηi)}, i = 1, . . . , 544
ηi = µ + f(ci) + fs(si) + fu(si)

where:

I f(ci) is a smooth effect of the covariate: f = {f1, . . . , f100} ∼ RW2(τf)

I fs(si) is a spatial effect modeled as an intrinsic GMRF:

fs(s) | fs(s′), s ≠ s′, τfs ∼ N( (1/ns) Σ_{s′∼s} fs(s′), 1/(ns τfs) )

I fu(si) is a random effect: fu = {fu(s1), . . . , fu(s544)} ∼ N(0, τfu⁻¹ I)

I µ is an intercept term, µ ∼ N(0, 0.0001)

77 / 140


For identifiability we impose a sum-to-zero constraint on all intrinsic models, so

Σs fs(s) = 0 and Σi fi = 0

78 / 140

The Germany data frame:

region E Y x

0 7.965008 8 56

1 22.836219 22 65

The model is:

ηi = µ+ f (ci ) + fs(si ) + fu(si )

I The data set has to contain one separate column for each term specified through f(), so in this case we have to add one column:
> Germany = cbind(Germany, region.struct=Germany$region)

I We also need the graph file where the neighborhood structure is specified: germany.graph

79 / 140


The new data set is:

region E Y x region.struct

0 7.965008 8 56 0

1 22.836219 22 65 1

Then the formula is

formula <- Y ~ f(region.struct, model="besag", graph="germany.graph") +
               f(x, model="rw2") + f(region)

The sum-to-zero constraint is the default in the inla function for all intrinsic models. The location of the graph file has to be provided here (the graph file cannot be loaded into R).

80 / 140


The graph file

The germany.graph file:

544
1 1 12
2 2 10 11
3 4 6 8 15 387
...

I Total number of nodes in the graph

I Identifier for the node

I Number of neighbors

I Identifiers for the neighbors

81 / 140


data(Germany)

g = system.file("demodata/germany.graph", package="INLA")

source(system.file("demodata/Bym-map.R", package="INLA"))

Germany = cbind(Germany, region.struct=Germany$region)

# standard BYM model

formula1 = Y ~ f(region.struct,model="besag",graph=g) +

f(region,model="iid")

# with linear covariate

formula2 = Y ~ f(region.struct,model="besag",graph=g) +

f(region,model="iid") + x

# with smooth covariate

formula3 = Y ~ f(region.struct,model="besag",graph=g) +

f(region,model="iid") + f(x, model="rw2")

82 / 140

result1 = inla(formula1,family="poisson",data=Germany,E=E,

control.compute=list(dic=TRUE))

result2 = inla(formula2,family="poisson",data=Germany,E=E,

control.compute=list(dic=TRUE))

result3 = inla(formula3,family="poisson",data=Germany,E=E,

control.compute=list(dic=TRUE))

83 / 140

Other graph specification

- It is also possible to define the graph structure of your model using:

I A symmetric (dense or sparse) matrix, where the non-zero pattern of the matrix defines the graph (see the sketch below).

I An inla.graph object.

See the FAQ on the webpage for more information.

84 / 140
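A minimal sketch of the matrix route, using a hypothetical 3-region neighborhood structure (the toy matrix below is an assumption, not part of the course data):

library(INLA)
library(Matrix)
## regions 1-2 and 2-3 are neighbors; the non-zero pattern defines the graph
Q = sparseMatrix(i = c(1, 2, 2, 3), j = c(2, 1, 3, 2), x = 1, dims = c(3, 3))
g = inla.read.graph(Q)
str(g)
## the resulting graph (or the matrix itself) can then be passed to f(), e.g.
## formula = Y ~ f(region.struct, model="besag", graph=g)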

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

85 / 140

Model evaluation

I Deviance Information Criterion (DIC):

result = inla(..., control.compute = list(dic = TRUE))

result$dic$dic

I Conditional predictive ordinate (CPO) and probability integral transform (PIT):

CPOi = π(yi |y−i )

PITi = Prob(Yi ≤ yi,obs | y−i)

result = inla(..., control.compute = list(cpo = TRUE))

result$cpo$cpo

result$cpo$pit

86 / 140
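A small comparison sketch building on the Germany fits above (result1-result3 were run with dic=TRUE; the cpo option is added in a refit here):

## compare the three models by DIC (smaller is better)
c(result1$dic$dic, result2$dic$dic, result3$dic$dic)

## refit one model asking for CPO/PIT
result1.cpo = inla(formula1, family="poisson", data=Germany, E=E,
                   control.compute=list(cpo=TRUE))
mean(log(result1.cpo$cpo$cpo))   # cross-validated log-score (larger is better)
hist(result1.cpo$cpo$pit)        # roughly uniform PIT values suggest adequate calibration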

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

87 / 140

Controlling θ

I We often need to set our own priors and use our own parameters in them.

I These can be set in two ways

Old style using prior=.., param=..., initial=...,

fixed=...

New style using hyper = list(prec =

list(initial=2, fixed=TRUE, ....))

The old style is there for backward compatibility only. The two styles can also be mixed.

88 / 140


Example- New style

hyper = list(

prec = list(

prior = "loggamma",

param = c(2,0.1),

initial = 3,

fixed = FALSE

)

)

formula = y ~ f(i, model="iid", hyper = hyper) + ...

- Old style

formula = y ~ f(i, model="iid", prior = "loggamma",

param = c(2,0.1), initial = 3,

fixed = FALSE) + ...

89 / 140

Internal and external scale

Hyperparameters, like the precision τ, are represented internally using a “good” transformation, like

θ1 = log(τ)

I Initial values are given on the internal scale

I The to.theta and from.theta functions can be used to map between the external and internal scales.

90 / 140
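A minimal sketch of what this means in practice (the starting value τ = 20 is just an illustrative assumption):

## initial values refer to theta = log(tau), so a starting precision of 20
## is supplied as log(20)
hyper = list(prec = list(initial = log(20), fixed = FALSE))
formula = y ~ f(i, model="iid", hyper = hyper)
## after fitting, result$summary.hyperpar reports the precision on the user
## (external) scale, while the internal.* parts of the result object (where
## available) report the log-precision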


Example: AR1 model

hyper
    theta1
        name          log precision
        short.name    prec
        prior         loggamma
        param         1 5e-05
        initial       4
        fixed         FALSE
        to.theta
        from.theta
    theta2
        name          logit lag one correlation
        short.name    rho
        prior         normal
        param         0 0.15
        initial       2
        fixed         FALSE
        to.theta
        from.theta
constr                FALSE
nrow.ncol             FALSE
augmented             FALSE
aug.factor            1
aug.constr
n.div.by
n.required            FALSE
set.default.values    FALSE
pdf                   ar1

91 / 140
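A listing like the one above can be inspected directly from R; a small sketch, assuming the inla.models() and inla.doc() helpers available in recent R-INLA versions:

library(INLA)
## default hyperparameter specification of the "ar1" latent model
str(inla.models()$latent$ar1$hyper)
## full documentation for the model, including priors and internal scales
inla.doc("ar1")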

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

92 / 140

Feature: replicate

“replicate” generates iid replicates from the same model with the same hyperparameters.

If x | θ ∼ AR(1), then nrep=3 makes

x = (x1, x2, x3)

with mutually independent xi’s, each from the same AR(1) model with the same θ

Most f()-models can be replicated

93 / 140

Example: replicate

n=100

x1 = arima.sim(n, model=list(ar=0.9)) + 1

x2 = arima.sim(n, model=list(ar=0.9)) - 1

y1 = rpois(n,exp(x1))

y2 = rpois(n,exp(x2))

y = c(y1,y2)

i = rep(1:n,2)

r = rep(1:2,each=n)

intercept = as.factor(r)

formula = y ~ f(i, model="ar1", replicate=r) + intercept -1

result = inla(formula, family = "poisson",

data = data.frame(y=y,i=i,r=r))

94 / 140

Example: replicate

i = rep(1:n,2)

r = rep(1:2,each=n)

intercept = as.factor(r)

formula = y ~ f(i, model="ar1", replicate=r) + intercept -1

(y1,1, . . . , yn,1, y1,2, . . . , yn,2)  −g→  (η1,1, . . . , ηn,1, η1,2, . . . , ηn,2)
  = (f_i,(1,1), . . . , f_i,(n,1), f_i,(1,2), . . . , f_i,(n,2)) + β0,1 · (1, . . . , 1, 0, . . . , 0) + β0,2 · (0, . . . , 0, 1, . . . , 1)

95 / 140
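A small sketch of how the replicated field comes back (the row ordering stated below is an assumption about how INLA stacks replicates, so check result$summary.random$i$ID in your own fit):

## the AR(1) field is returned as one vector of length 2*n; with the coding
## above, rows 1..n belong to replicate 1 and rows (n+1)..2n to replicate 2
x.hat = result$summary.random$i$mean
plot(x.hat[1:n], type="l", ylab="posterior mean of the latent field")
lines(x.hat[n + 1:n], lty=2)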

Feature: More than one family

Every observation could have its own likelihood!

I Response is a matrix or list

I Each “column” defines a separate “family”

I Each “family” has its own hyperparameters

96 / 140

n=100

phi = 0.9

x1 = 1 + arima.sim(n, model=list(ar=phi))

x2 = 0.5 + arima.sim(n, model=list(ar=phi))

y1 = rbinom(n,size=1, prob=exp(x1)/(1+exp(x1)))

y2 = rpois(n,exp(x2))

y = matrix(NA, 2*n, 2)

y[ 1:n, 1] = y1

y[n+1:n, 2] = y2

i = rep(1:n,2)

r = rep(1:2,each=n)

intercept = as.factor(r)

Ntrials = c(rep(1,n), rep(NA,n))

formula = y ~ f(i, model="ar1", replicate=r) + intercept -1

result = inla(formula, family = c("binomial", "poisson"),

Ntrials = Ntrials, data = data.frame(y,i,r))

97 / 140

y = matrix(NA, 2*n, 2)

y[ 1:n, 1] = y1

y[n+1:n, 2] = y2

i = rep(1:n,2)

r = rep(1:2,each=n)

intercept = as.factor(r)

Ntrials = c(rep(1,n), rep(NA,n))

formula = y ~ f(i, model="ar1", replicate=r) + intercept -1

result = inla(formula, family = c("binomial", "poisson"),

Ntrials = Ntrials, data = data.frame(y,i,r))

Y =

y1,1   NA
 ...   ...
yn,1   NA
 NA    y1,2
 ...   ...
 NA    yn,2

−g→  (η1,1, . . . , ηn,1, η1,2, . . . , ηn,2)
  = (f_i,(1,1), . . . , f_i,(n,1), f_i,(1,2), . . . , f_i,(n,2)) + β0,1 · (1, . . . , 1, 0, . . . , 0) + β0,2 · (0, . . . , 0, 1, . . . , 1)

98 / 140

More than one family - More examples

Some rather advanced examples on www.r-inla.org using this feature:

I Preferential sampling, geostatistics (marked point process)

I Weibull-survival data and “longitudinal” data

99 / 140

Feature: copy

The model

formula = y ~ f(i, ...) + ...

Only allows ONE element from each sub-model to contribute to the linear predictor for each observation.

Sometimes this is not sufficient.

100 / 140

Feature: copy

Suppose

ηi = ui + ui+1 + . . .

Then we can code this as

formula = f(i, model="iid") + f(i.plus, copy="i")

I The copy feature creates an additional sub-model which is ε-close to the target.

I Many copies allowed

I Copy with unknown scaling (default scaling is fixed to 1).

(η1, . . . , ηn)ᵀ = (u1, . . . , un)ᵀ + (u2, . . . , un+1)ᵀ

101 / 140

Feature: copy

Suppose that

ηi = ai + bi zi + . . .

where

(ai, bi) iid∼ N2(0, Σ)

- Simulate data

library(mvtnorm)   # provides rmvnorm

n = 100

Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2)

z = runif(n)

ab = rmvnorm(n, sigma = Sigma)

a = ab[, 1]

b = ab[, 2]

eta = a + b * z

s = 0.1

y = eta + rnorm(n, sd=s)

102 / 140

i = 1:n

j = 1:n + n

formula = y ~ f(i, model="iid2d", n = 2*n) + f(j, z, copy="i") -1

r = inla(formula, data = data.frame(y, i, j))

(η1, . . . , ηn)ᵀ = (a1, . . . , an)ᵀ + (b1 z1, . . . , bn zn)ᵀ

Here f(i, model="iid2d", n=2*n) represents the stacked vector (a1, . . . , an, b1, . . . , bn), and f(j, z, copy="i") with j = 1:n + n picks out the b-part and scales it by the covariate z.

103 / 140
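A quick sanity-check sketch (assuming the iid2d field is ordered as the a-block followed by the b-block, matching the indexing j = 1:n + n used above):

## compare estimated random intercepts/slopes with the simulated truth
ab.hat = r$summary.random$i$mean
plot(a, ab.hat[1:n]); abline(0, 1)          # a_i, rows 1..n
plot(b, ab.hat[n + 1:n]); abline(0, 1)      # b_i, rows (n+1)..2n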

Feature: Linear-combinations

Possible to extract extra information from the model through linear combinations of the latent field, say

v = Bx

for a k × n matrix B.

104 / 140

Feature: Linear-combinations (cont.)

Two different approaches.

1. Most “correct” is to do the computations on the enlarged field

x̃ = (x, v)

But this often leads to a denser precision matrix.

2. The second option is to compute these “offline”, as (conditionally on θ)

Var(v1) = Var(b1ᵀ x) ≈ b1ᵀ (QGMRF-approx)⁻¹ b1

and

E(v1) = b1ᵀ E(x)

Approximate density of v1 with a Normal.

105 / 140


formula = y ~ ClBase4*CTrt + ClAge + CV4 +

f(Ind, model="iid") + f(rand, model="iid")

## Now I want the posterior for

##

## 1) 2*CTrt - CV4

## 2) Ind[2] - rand[2]

##

lc1 = inla.make.lincomb( CTrt = 2, CV4 = -1)

names(lc1) = "lc1"

lc2 = inla.make.lincomb( Ind = c(NA,1), rand = c(NA,-1))

names(lc2) = "lc2"

## default is to derive the marginals from lc’s without changing the

## latent field

result1 = inla(formula,family="poisson", data = Epil,

lincomb = c(lc1, lc2))

## but the lincombs can also be additionally included into the latent

## field for increased accuracy...

result2 = inla(formula,family="poisson", data = Epil,

lincomb = c(lc1, lc2),

control.inla = list(lincomb.derived.only = FALSE))

106 / 140

- Get the results

result$summary.lincomb.derived

result$marginals.lincomb.derived # results of the

# default method

result$summary.lincomb

result$marginals.lincomb # alternative method

- Posterior correlation matrix between all the linear combinations

control.inla = list(lincomb.derived.correlation.matrix = TRUE)

result$misc$lincomb.derived.correlation.matrix

- Many linear combinations at once: use inla.make.lincombs()

107 / 140

A-matrix in the linear predictor (I)

Usual formula

η = . . .

and

yi ∼ π(yi | ηi , . . .)

108 / 140

A-matrix in the linear predictor (II)

Extended formula

η = . . .
η∗ = Aη

and

yi ∼ π(yi | η∗i , . . .)

Implemented as

A = matrix(...)

A = sparseMatrix(...)

result = inla(formula, ...,

control.predictor = list(A = A))

109 / 140


A-matrix in the linear predictor (III)

I Can really simplify model formulations

I Duplicates to some extent the “copy” feature

I Really useful for some models; the A-matrix need not be a square matrix (see the sketch below)

110 / 140
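A minimal sketch with simulated data (the toy construction and all names are assumptions): each observation responds to the average of two neighbouring elements of the latent predictor, η∗ = Aη.

library(INLA)
library(Matrix)
n = 50
idx = 1:n
## sparse n x n matrix: row i averages latent elements i and i+1 (wrapping around)
A = sparseMatrix(i = rep(1:n, 2), j = c(1:n, c(2:n, 1)), x = 0.5, dims = c(n, n))
eta.true = sin(idx / 5)
y = rpois(n, lambda = exp(as.vector(A %*% eta.true)))
formula = y ~ -1 + f(idx, model = "rw2")
result = inla(formula, family = "poisson",
              data = data.frame(y = y, idx = idx),
              control.predictor = list(A = A))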

Feature: remote computing

For large/huge models, it is more convenient to run the computations on a remote (Linux/Mac) computational server

inla(...., inla.call="remote")

using ssh (and Cygwin on Windows).

111 / 140

Control statements

The control.xxx statements control various parts of the INLA program

I control.predictor
  I A — the “A matrix” or “observational matrix” linking the latent field to the data.

I control.mode
  I x, theta, result — gives modes to INLA.
  I restart = TRUE — tells INLA to try to improve on the supplied mode.

I control.compute
  I dic, mlik, cpo — compute measures of fit.

I control.inla
  I strategy and int.strategy contain useful advanced features.

Various others — see the help pages!

112 / 140

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

113 / 140

Space-varying regression

Number of (insurance-type) losses Nkt in 431 municipalities/regions of Norway in relation to one weather covariate Wkt. The likelihood is

Nkt ∼ Poisson(Akt pkt); k = 1, . . . , 431, t = 1, . . . , 10

The model for log pkt is:

log pkt = β0 + βk Wkt

where βk is the regression coefficient for each municipality.

114 / 140

Borrow strength..

Few losses in each region; high variability in the estimates.

Borrow strength by letting {β1, . . . , β431} be smooth in space:

{β1, . . . , β431} ∼ CAR(τβ)

115 / 140


The data set:

y region W

1 0 1 0.4

2 0 1 0.4

10 0 1 0.4

11 1 2 0.2

12 0 2 0.2

20 0 2 0.2

116 / 140

The second argument in f() is the weight, which defaults to 1.

ηi = ...+ wi fi + ...

is represented as

f(i, w, ...)

No need for sum-to-zero constraint!

norway = read.table("norway.dat", header=TRUE)

formula = y ~ 1 + f(region, W, model="besag",

graph.file="norway.graph",

constr=FALSE)

result = inla(formula, family="poisson", data=norway)

117 / 140

Survival models

patient  time    event  age     sex
1        8, 16   1, 1   28, 28  0
2        23, 13  1, 0   48, 48  1
3        22, 18  1, 1   32, 32  0

I Times of infection from the time of insertion of the catheter on 38 kidney patients using portable dialysis equipment.

I 2 observations for each patient (38 patients).

I Each time can be an event (infection) or a censoring (no infection)

118 / 140

The Kidney data

The Kidney data frame

time event age sex ID

8 1 28 0 1

16 1 28 0 1

23 1 48 1 2

13 0 48 1 2

22 1 32 0 3

28 1 32 0 3

119 / 140

data(Kidney)

formula = inla.surv(time,event) ~ age + sex + f(ID,model="iid")

result1 = inla(formula, family="coxph", data=Kidney)

result2 = inla(formula, family="weibull", data=Kidney)

result3 = inla(formula, family="exponential", data=Kidney)

120 / 140

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

121 / 140

A toy-example using copy

State-space model

yt = xt + vt

xt = 2xt−1 − xt−2 + wt

Rewrite this as

yt = xt + vt

0 = xt − 2xt−1 + xt−2 + wt

and implement this as two families

1. Observations yt with precision Prec(vt)

2. Observations 0 with precision Prec(wt), or Prec=HIGH.

122 / 140


n = 100

m = n-2

y = sin((1:n)*0.2) + rnorm(n, sd=0.1)

formula = Y ~ f(i, model="iid", initial=-10, fixed=TRUE) +

f(j, w, copy="i") + f(k, copy="i") +

f(l, model ="iid") -1

Y = matrix(NA, n+m, 2)

Y[1:n, 1] = y

Y[1:m + n, 2] = 0

i = c(1:n, 3:n) # x_t

j = c(rep(NA,n), 3:n -1) # x_t-1

w = c(rep(NA,n), rep(-2,m)) # weights for j

k = c(rep(NA,n), 3:n -2) # x_t-2

l = c(rep(NA,n), 1:m) # v_t

r = inla(formula, data = data.frame(i,j,w,k,l,Y),

family = c("gaussian", "gaussian"),

control.data = list(list(), list(initial=10, fixed=TRUE)))

123 / 140
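A small follow-up sketch (plot labels are assumptions): the smoothed state xt is the “i” component of the latent field, so its posterior mean can be overlaid on the noisy data.

plot(y, pch = 19, cex = 0.5, ylab = "y and smoothed x")
lines(r$summary.random$i$mean, col = 2)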

Stochastic Volatility model

[Plot: log of the daily difference of the pound-dollar exchange rate from October 1st, 1981, to June 28th, 1985]

124 / 140

Stochastic Volatility model

Simple model

xt | x1, . . . , xt−1, τ, φ ∼ N (φxt−1, 1/τ)

where |φ| < 1 to ensure a stationary process.

Observations are taken to be

yt | x1, . . . , xt , µ ∼ N (0, exp(µ+ xt))

125 / 140
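A minimal fitting sketch (the data-frame names are assumptions, and the likelihood name "stochvol" is assumed to be available in the installed R-INLA version):

## y: the series of daily log-differences
n = length(y)
time = 1:n
formula = y ~ 1 + f(time, model = "ar1")
result = inla(formula, family = "stochvol",
              data = data.frame(y = y, time = time))
summary(result)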

Results

Using only the first 50 data points, which makes the problem much harder.

126 / 140

Results

[Plot: posterior marginal for ν = logit(2φ − 1)]

126 / 140

Results

[Plot: posterior marginal for log(κx)]

126 / 140

Using the full dataset

[Plot: The Pound-Dollar data]

127 / 140

Using the full dataset

[Plot: posterior mean of µ + xt]

128 / 140

Using the full dataset

[Plot: the posterior marginal for the precision]

129 / 140

Using the full dataset

[Plot: the posterior marginal for the lag-1 correlation]

130 / 140

Using the full dataset

[Plot: predictions for µ + xt+k]

131 / 140

New data-model: Student-tν

Now extend the model to use a Student-tν distribution

yt | x1, . . . , xt ∼ exp(µ/2 + xt/2) × Tν / √(ν/(ν − 2))

where Tν is a standard Student-t with ν degrees of freedom; the division by √(ν/(ν − 2)) scales it to unit variance.

132 / 140
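In R-INLA this amounts to swapping the likelihood; a small sketch (the exact family name differs between versions, so it is looked up rather than hard-coded):

## list the stochastic-volatility likelihoods shipped with the installed version
grep("stochvol", names(inla.models()$likelihood), value = TRUE)
## then refit with the Student-t variant, e.g. (family name assumed)
## result.t = inla(formula, family = "stochvol.t", data = data.frame(y, time))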

Student-tν

[Plot: posterior marginal for ν]

133 / 140

Student-tν

[Plot: predictions]

134 / 140

Student-tν

[Plot: comparing predictions with Student-tν and Gaussian likelihoods]

135 / 140

Student-tν

However,

I No support for Student-tν in the data
  I Bayes factor
  I Deviance Information Criterion

136 / 140

Disease mapping: The BYM-model

I Data yi ∼ Poisson(Ei exp(ηi))

I Log-relative risk ηi = ui + vi

I Structured component u

I Unstructured component v

I Log-precisions log κu and log κv

[Map over the 366 districts of Sardinia]

I A hard case: Insulin Dependent Diabetes Mellitus in 366 districts of Sardinia. Few counts.

I dim(θ) = 2.

137 / 140

Marginals for θ|y

138 / 140


Marginals for xi |y

139 / 140

THANK YOU

140 / 140