Bayesian computation with INLA


Short-course about Bayesian computation with INLA given on the AS2013 conference in Ribno, Slovenia.

Transcript of Bayesian computation with INLA

Bayesian computation using INLA

Thiago G. Martins

Norwegian University of Science and Technology, Trondheim, Norway

AS 2013, Ribno, Slovenia

September, 2013

1 / 140

Part I

Latent Gaussian models and INLA methodology

2 / 140

Outline

Latent Gaussian models

Are latent Gaussian models important?

Bayesian computing

INLA method

3 / 140

Hierarchical Bayesian models

Hierarchical models are an extremely useful tool in Bayesian model building.

Three parts:

- Observations (y): encode information about the observed data, including design and collection issues.

- The latent process (x): the unobserved process. It may be the focus of the study, or may be included to reduce autocorrelation, e.g. to encode spatial and/or temporal dependence.

- The parameter model (θ): models for all of the parameters in the observation and latent processes.

4 / 140


Latent Gaussian models

A latent Gaussian model is a Bayesian hierarchical model of the following form:

- Observed data y:  y_i | x_i ∼ π(y_i | x_i, θ)

- Latent Gaussian field:  x ∼ N(·, Σ(θ))

- Hyperparameters θ ∼ π(θ), controlling
  - variability,
  - length/strength of dependence,
  - parameters in the likelihood.

π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i ∈ I} π(y_i | x_i, θ)

5 / 140


Precision matrix

The precision matrix of the latent field,

Q(θ) = Σ(θ)^{-1},

plays a key role! Two reasons:

- Building models through conditioning ("hierarchical models")

- Computational benefits

6 / 140


Building models through conditioning

If

- x ∼ N(0, Q_x^{-1})

- y | x ∼ N(x, Q_y^{-1})

then

Q(x, y) = [ Q_x + Q_y   −Q_y ]
          [   −Q_y        Q_y ]

The corresponding expressions in terms of covariance matrices are not as nice.

7 / 140
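As a quick numerical check of the block expression above, here is a tiny R sketch (the choices Q_x = I and Q_y = 2I are arbitrary): the joint precision is easy to write down, while the joint covariance obtained by inverting it is dense.

Qx <- diag(2)                         # precision of x
Qy <- 2 * diag(2)                     # precision of y | x
Q  <- rbind(cbind(Qx + Qy, -Qy),
            cbind(-Qy,      Qy))      # joint precision of (x, y)
solve(Q)                              # joint covariance: dense, no simple block pattern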

Computational benefits

- Precision matrices encode conditional independence:

  x_i ⊥ x_j | x_{−ij}  ⟺  Q_ij = 0

  We are interested in models with sparse precision matrices.

- x ∼ N(·, Σ(θ)) with sparse Q(θ) = Σ(θ)^{-1}: Gaussians with a sparse precision matrix are called Gaussian Markov random fields (GMRFs).

- Good computational properties through numerical algorithms for sparse matrices.

8 / 140
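A small R illustration of this point (an assumed AR(1) example, not from the slides): the covariance matrix of an AR(1) process is completely dense, yet its precision matrix is tridiagonal, i.e. Q_ij = 0 whenever |i − j| > 1.

n   <- 6
phi <- 0.7                                             # arbitrary AR(1) coefficient
Sigma <- phi^abs(outer(1:n, 1:n, "-")) / (1 - phi^2)   # dense AR(1) covariance
Q <- solve(Sigma)
round(Q, 3)                                            # tridiagonal precision matrix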


Numerical algorithms for sparse matrices: scaling properties

Typical cost of factorizing the sparse precision matrix:

- Time (temporal models): O(n)

- Space (spatial models): O(n^{3/2})

- Space-time (spatio-temporal models): O(n^2)

This is to be compared with general O(n^3) algorithms for dense matrices.

9 / 140


Outline

Latent Gaussian models

Are latent Gaussian models important?

Bayesian computing

INLA method

10 / 140

Example (I): Mixed-effect model

y_ij | η_ij, θ_1 ∼ π(y_ij | η_ij, θ_1),   i = 1, ..., N,  j = 1, ..., M

η_ij = µ + c_ij β + u_i + v_j + w_ij

where u, v and w are "random effects".

If we assign Gaussian priors to µ, β, u and v, then

x | θ_2 = (µ, β, u, v, η) | θ_2

is jointly Gaussian, and θ = (θ_1, θ_2).

11 / 140
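As a preview of Part II, the linear predictor above maps directly onto an R-INLA formula; a hypothetical sketch, where i, j and c are assumed column names in the data frame:

## u_i and v_j as exchangeable ("iid") random effects; covariate c enters linearly
formula <- y ~ 1 + c + f(i, model = "iid") + f(j, model = "iid")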


Example (I) - cont.

We can reinterpret the model as

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q^{-1}(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)

- dim(x) can be large, 10^2 - 10^5

- dim(θ) is small, 1 - 5

12 / 140


Example (I) - cont.

Precision matrix of (η, u, v, µ, β), with N = 100 and M = 5.

[Figure: sparsity pattern of the precision matrix]

13 / 140

Example (II): Time-series model

Smoothing of a binary time series.

- Data is a sequence of 0s and 1s.

- The probability of a 1 at time t, p_t, depends on time:

  p_t = exp(η_t) / (1 + exp(η_t))

- Linear predictor:

  η_t = µ + β c_t + u_t + v_t,   t = 1, ..., n

14 / 140


Example (II) - cont.

Prior models:

- µ and β are Normal.

- u is an AR model, e.g.

  u_t = φ u_{t−1} + ε_t

  with parameters (φ, σ_ε^2).

- v is an unstructured term, a "random effect".

This gives that

x | θ = (µ, β, u, v, η) | θ

is jointly Gaussian, with hyperparameters θ = (φ, σ_ε^2, σ_v^2).

15 / 140


Example (II) - cont.

We can reinterpret the model as

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q^{-1}(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)

- dim(x) can be large, 10^2 - 10^5

- dim(θ) is small, 1 - 5

16 / 140


Example (II) - cont.

Precision matrix of (η, u, v, µ, β), with n = 100.

[Figure: sparsity pattern of the precision matrix]

17 / 140

Example (III): Disease mapping

- Data: y_i ∼ Poisson(E_i exp(η_i))

- Log-relative risk: η_i = µ + u_i + v_i + f(c_i)

- Structured (spatial) component u

- Unstructured component v

- Smooth effect of a covariate c

[Map: posterior log-relative risk, colour scale from −0.63 to 0.98]

18 / 140


Yet another look at Example (III)

We can reinterpret the model as

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q^{-1}(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)

- dim(x) can be large, 10^2 - 10^5

- dim(θ) is small, 1 - 5

19 / 140

Example (III) - cont.

Precision matrix of (η, u, v, µ, f).

[Figure: sparsity pattern of the precision matrix]

20 / 140

What we have learned so far

The latent Gaussian model construct

θ ∼ π(θ)
x | θ ∼ π(x | θ) = N(0, Q^{-1}(θ))
y | x, θ ∼ ∏_i π(y_i | η_i, θ)

occurs in many, seemingly unrelated, statistical models: GLM/GAM/GLMM/GAMM/++

21 / 140

Further Examples

- Dynamic linear models
- Stochastic volatility
- Generalized linear (mixed) models
- Generalized additive (mixed) models
- Spline smoothing
- Semi-parametric regression
- Space-varying (semi-parametric) regression models
- Disease mapping
- Log-Gaussian Cox processes
- Model-based geostatistics (*)
- Spatio-temporal models
- Survival analysis
- +++

22 / 140


Outline

Latent Gaussian models

Are latent Gaussian models important?

Bayesian computing

INLA method

23 / 140

Bayesian computing

We are interested in posterior marginal quantities like π(x_i | y) and π(θ_i | y).

This requires the evaluation of integrals of the form

π(x_i | y) ∝ ∫_{x_{−i}} ∫_θ π(y | x, θ) π(x | θ) π(θ) dθ dx_{−i}

The computation of massively high-dimensional integrals is at the core of Bayesian computing.

24 / 140


But surely we can already do this

- Markov chain Monte Carlo (MCMC) is widely used by the applied community.

- There are generic tools available for MCMC (OpenBUGS, JAGS, STAN) and others for specific classes of models, like BayesX.

- The issue of Bayesian computing is not "solved" even though MCMC is available:
  - Hierarchical models are more difficult for MCMC.
  - Strong dependencies, bad mixing.

- A main obstacle for Bayesian modeling is still the issue of "Bayesian computing".

25 / 140


So what's wrong with MCMC?

This is actually a problem with any Monte Carlo scheme.

Error in expectations

The (root-mean-square) Monte Carlo error is

sd( E(f(X)) − (1/N) ∑_{i=1}^N f(x_i) ) = O(1/√N)

In practical terms, to reduce the error to O(10^{-p}) you need O(10^{2p}) samples!

This can be optimistic!

26 / 140
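A quick R illustration of this rate (a toy example, not part of the original slides): estimating E(X) = 0 for X ∼ N(0, 1), the error shrinks by roughly a factor of 10 only when the sample size grows by a factor of 100.

set.seed(1)
for (N in 10^(2:6)) {
  est <- mean(rnorm(N))                       # Monte Carlo estimate of E(X) = 0
  cat(sprintf("N = %7d   |error| = %.5f\n", N, abs(est)))
}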

Be more narrow

MCMC

- MCMC 'works' for everything, but it is usually not optimal when we focus on a specific class of models.

- It works for latent Gaussian models, but it's too slow.

- (Unfortunately) sometimes it's the only thing we can do.

INLA

- Integrated Nested Laplace Approximations.

- A deterministic algorithm, rather than a stochastic one like MCMC.

- Specially designed for latent Gaussian models.

- Accurate results in a small fraction of the computational time, when compared to MCMC.

27 / 140


Comparing results with MCMC

- When comparing the results of R-INLA with MCMC, it is important to use the same model.

- Here we have compared the EPIL example results with those obtained using JAGS via the rjags package.

28 / 140


Intercept, 0.125 to 120 minutes

[Figures: posterior marginal densities for a0 (intercept), alpha.Age, log(tau.b1) (= log(tau.Ind)) and log(tau.b) (= log(tau.Rand)), comparing R-INLA with JAGS runs of 0.125, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64 and 120 minutes.]

29 / 140

Outline

Latent Gaussian models

Are latent Gaussian models important?

Bayesian computing

INLA method

30 / 140

Main aim

Posterior:

π(x, θ | y) ∝ π(θ) π(x | θ) ∏_{i ∈ I} π(y_i | x_i, θ)

Compute the posterior marginals:

π(x_i | y) = ∫ π(θ | y) π(x_i | θ, y) dθ

π(θ_j | y) = ∫ π(θ | y) dθ_{−j}

31 / 140



Tasks

1. Build an approximation to π(θ | y):  π̃(θ | y)

2. Build an approximation to π(x_i | θ, y):  π̃(x_i | θ, y)

   π̃(x_i | y) = ∫ π̃(θ | y) π̃(x_i | θ, y) dθ

   π̃(θ_j | y) = ∫ π̃(θ | y) dθ_{−j}

3. Do the integration with respect to θ numerically.

32 / 140

Task 1: π̃(θ | y)

The Laplace approximation for π(θ | y) is

π(θ | y) = π(x, θ | y) / π(x | θ, y)
         ∝ π(θ) π(x | θ) π(y | x, θ) / π(x | θ, y)
         ≈ π(θ) π(x | θ) π(y | x, θ) / π_G(x | θ, y) |_{x = x*(θ)}

where π_G(x | θ, y) is the Gaussian approximation of π(x | θ, y) and x*(θ) is its mode.

33 / 140

The GMRF approximation

π(x | y) ∝ exp( −(1/2) xᵀ Q x + ∑_i log π(y_i | x_i) )
         ≈ exp( −(1/2) (x − µ)ᵀ (Q + diag(c_i)) (x − µ) ) = π̃(x | y)

Constructed as follows:

- Locate the mode x*.

- Expand to second order.

Markov and computational properties are preserved.

34 / 140
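The "locate the mode, expand to second order" idea can be illustrated in one dimension with a few lines of R (a minimal sketch with a made-up Poisson observation and Gaussian prior, not the INLA internals):

y <- 7                                              # a single Poisson count
log_post <- function(x)                             # x is the log-rate
  dnorm(x, mean = 0, sd = 2, log = TRUE) + dpois(y, exp(x), log = TRUE)
mode <- optimize(log_post, c(-10, 10), maximum = TRUE)$maximum   # locate the mode
h  <- 1e-4                                          # numerical second derivative at the mode
d2 <- (log_post(mode + h) - 2 * log_post(mode) + log_post(mode - h)) / h^2
c(mode = mode, sd = sqrt(-1 / d2))                  # Gaussian approximation N(mode, -1/d2)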

Remarks

The Laplace approximation π̃(θ | y) turns out to be accurate, since x | y, θ appears almost Gaussian in most cases:

- x is a priori Gaussian.

- y is typically not very informative.

- The observational model is usually 'well-behaved'.

Note: π̃(θ | y) itself does not look Gaussian!

35 / 140


Task 2: π̃(x_i | y, θ)

This task is more challenging, since

- the dimension n of x is large, and

- there are potentially n marginals to compute, or at least O(n).

Here we present three options:

1. Gaussian approximation

2. Laplace approximation

3. Simplified Laplace approximation

There is a trade-off between accuracy and complexity.

36 / 140


π̃(x_i | y, θ) - 1. Gaussian approximation

An obviously simple and fast alternative is to use the GMRF approximation π_G(x | y, θ):

π̃(x_i | θ, y) = N(x_i; µ_i(θ), σ_i^2(θ))

- It is the fastest option; we only need to compute the diagonal of Q(θ)^{-1}.

- It can present errors in location and fail to capture asymmetry.

37 / 140


π̃(x_i | y, θ) - 2. Laplace approximation

- The Laplace approximation:

  π̃(x_i | y, θ) ≈ π(x, θ | y) / π_GG(x_{−i} | x_i, y, θ) |_{x_{−i} = x*_{−i}(x_i, θ)}

- Again, the approximation is very good, as x_{−i} | x_i, θ is 'almost Gaussian',

- but it is expensive. In order to get the n marginals we must
  - perform n optimizations, and
  - n factorizations of (n − 1) × (n − 1) matrices.

38 / 140


π̃(x_i | y, θ) - 3. Simplified Laplace approximation

Taylor expansions of the Laplace approximation for π(x_i | θ, y):

- computationally much faster;

- corrects the Gaussian approximation for errors in location and skewness:

  log π̃(x_i | θ, y) = −(1/2) x_i^2 + b x_i + (1/6) d x_i^3 + · · ·

- fit a skew-Normal density 2 φ(x) Φ(a x);

- sufficiently accurate for most applications.

39 / 140
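The skew-Normal density mentioned above has the standard form 2 φ(x) Φ(a x); a small R sketch of its shape (the value a = 3 is an arbitrary illustration):

a  <- 3                                    # shape (skewness) parameter
sn <- function(x) 2 * dnorm(x) * pnorm(a * x)
curve(sn, -4, 4, ylab = "density")         # skewed correction
curve(dnorm, add = TRUE, lty = 2)          # symmetric Gaussian for comparison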


Task 3: Numerical integration with respect to θ

Now that we know how to compute

- π̃(θ | y) - Laplace approximation

- π̃(x_i | θ, y) -
  1. Gaussian
  2. Laplace
  3. Simplified Laplace

let's see how INLA works.

40 / 140


The integrated nested Laplace approximation (INLA) I

Step I: Explore π̃(θ | y)

- Locate the mode.
- Use the Hessian to construct new variables.
- Grid search.

41 / 140


The integrated nested Laplace approximation (INLA) II

Step II: For each θ_j

- For each i, evaluate the Laplace approximation for selected values of x_i.

- Build a skew-Normal or log-spline corrected Gaussian,

  N(x_i; µ_i, σ_i^2) × exp(spline),

  to represent the conditional marginal density.

42 / 140


The integrated nested Laplace approximation (INLA) III

Step III: Sum out θ_j

- For each i, sum out θ:

  π̃(x_i | y) ∝ ∑_j π̃(x_i | y, θ_j) × π̃(θ_j | y)

- Build a log-spline corrected Gaussian,

  N(x_i; µ_i, σ_i^2) × exp(spline),

  to represent π̃(x_i | y).

43 / 140


Computing posterior marginals for θ_j (I)

Main idea:

- Use the integration points and build an interpolant.

- Use numerical integration on that interpolant.

44 / 140


How can we assess the error in the approximations?

Tool 1: Compare a sequence of improved approximations

1. Gaussian approximation

2. Simplified Laplace

3. Laplace

45 / 140

How can we assess the error in the approximations?

Tool 2: Estimate the "effective" number of parameters, as defined in the Deviance Information Criterion:

p_D(θ) = D̄(x; θ) − D(x̄; θ),

i.e. the posterior mean deviance minus the deviance at the posterior mean, and compare this with the number of observations. A low ratio is good.

This criterion has theoretical justification.

46 / 140
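In the R-INLA package (Part II), pD and the DIC are available on request; a sketch of how one would ask for them (formula, family and my.data are placeholders here):

result <- inla(formula, family = "poisson", data = my.data,
               control.compute = list(dic = TRUE))
result$dic$p.eff    # effective number of parameters pD
result$dic$dic      # deviance information criterion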

Part II

R-INLA package

47 / 140

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

48 / 140

Implementing INLA

All procedures required to perform INLA need to be carefully implemented to achieve good speed; it is easy to implement a slow version of INLA.

- The GMRFLib library
  - Basic library written in C for fast computations with GMRFs.

- The inla program
  - Defines latent Gaussian models and interfaces with the GMRFLib library.
  - Models are defined using .ini files.
  - The inla program writes all the results (E/Var/marginals) to files.

- The INLA package for R
  - R interface to the inla program. (That's why it's not on CRAN.)
  - Converts "formula" statements into ".ini" file definitions.
  - Runs the inla program.
  - Gets the results back into R.

Happily, the R package is all we need to learn!

49 / 140

The INLA package for R

[Diagram: the INLA package takes a data frame and a formula as input, produces (1) an ini file and (2) input files, (3) runs the inla program, and then collects the output back into an R object of type list, from which you can get summaries, plots etc.]

50 / 140

R-INLA

- Visit the web site

  www.r-inla.org

  and follow the instructions.

- The web site contains source code, examples, reports +++

- The first time, do

  > source("http://www.math.ntnu.no/inla/givmeINLA.R")

  Later, you can upgrade the package by doing

  > inla.upgrade()

  or, if you want the test version (which you want),

  > inla.upgrade(testing=TRUE)

- Available for Linux, Windows and Mac.

51 / 140


Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

52 / 140

The structure of an R program using INLA

There are essentially three parts to an INLA program:

1. The data organization.

2. The formula - notation inherited from R’s native glm function.

3. The call to the INLA program.

53 / 140

The inla function

This is all that's needed for a basic call:

> result <- inla(
      formula = y ~ 1 + x,        # This describes your latent field
      family = "gaussian",        # The likelihood distribution
      data = data.frame(y, x)     # A list or data frame
  )

54 / 140

The simplest case: Linear regression

library(INLA)

n = 100
x = sort(runif(n))
y = 1 + x + rnorm(n, sd = 0.1)
plot(x, y)

formula = y ~ 1 + x
result = inla(formula,
              data = data.frame(x, y),
              family = "gaussian")

summary(result)
plot(result)

55 / 140

Call:

c("inla(formula = formula, family = \"gaussian\", data = data.frame(x, ", " y))")

Time used:

Pre-processing Running inla Post-processing Total

0.08050394 0.03020334 0.01916695 0.12987423

Fixed effects:

mean sd 0.025quant 0.5quant 0.975quant kld

(Intercept) 0.9690533 0.01849785 0.9327319 0.9690531 1.005387 0

x 1.0426582 0.03126996 0.9812582 1.0426580 1.104079 0

The model has no random effects

Model hyperparameters:

                                          mean    sd 0.025quant 0.5quant 0.975quant
Precision for the Gaussian observations 127.45 18.10      95.14   126.37     166.11

Expected number of effective parameters(std dev): 2.209(0.02362)

Number of equivalent replicates : 45.27

Marginal Likelihood: 88.01

56 / 140

Likelihood functions - family argument

result = inla(formula,
              data = data.frame(x, y),
              family = "gaussian")

- "binomial"
- "coxph"
- "Exponential"
- "gaussian"
- "gev"
- "laplace"
- "sn" (skew Normal)
- "stochvol", "stochvol.nig", "stochvol.t"
- "T"
- "weibull"
- Many others: go to http://r-inla.org/

57 / 140


A more general model

Assume the following model:

y ∼ π(y | η)

η = g(λ) = β_0 + β_1 x_1 + β_2 x_2 + f(x_3)

where

- x_1, x_2 are covariates with a linear effect, β_i ∼ N(0, τ_1^{-1}), and

- x_3 can be the index of a spatial effect, random effect, etc., with {f_1, f_2, ...} ∼ N(0, Q_f^{-1}(τ_2)).

58 / 140


A more general model (cont.)

Assume the following model:

y ∼ π(y | η)

η = g(λ) = β_0 + β_1 x_1 + β_2 x_2 + f(x_3)

> formula = y ∼ x1 + x2 + f(x3, ...)

The response vector y = (y_1, ..., y_n)ᵀ is linked through g to the linear predictor η = (η_1, ..., η_n)ᵀ, with

η = β_0 (1, ..., 1)ᵀ + β_1 (x_{11}, ..., x_{1n})ᵀ + β_2 (x_{21}, ..., x_{2n})ᵀ + (f_{x_{31}}, ..., f_{x_{3n}})ᵀ

59 / 140


Model specification - INLA package

The model is specified in R through a formula, similar to glm:

> formula = y ∼ x1 + x2 + f(x3, ...)

- y is the name of your response variable in your data frame.

- An intercept is fitted automatically! Use -1 in your formula to avoid it.

- The fixed effects (β_0, β_1 and β_2) are taken as i.i.d. normal with zero mean and small precision. (This can be changed.)

- The f() function contains the random-effect specifications.

Some models:

- iid, iid1d, iid2d, iid3d: random effects

- rw1, rw2, ar1: smooth effect of covariates or time effect

- seasonal: seasonal effect

- besag: spatial effect (CAR model)

- generic: user-defined precision matrix

60 / 140


Specifying random effects

Random effects are added to the formula through the function

f(name, model="...", hyper = ...,
  replicate = ..., constr = FALSE, cyclic = FALSE)

- name - the name of the random effect. It also refers to the values in the data which are used for various things, usually indexes, e.g. for space or time.

- model - the latent model, e.g. "iid", "rw2", "ar1", etc.

- hyper - specifies the prior on the hyperparameters.

- constr - sum-to-zero constraint?

- cyclic - is the effect cyclic? (RW1, RW2 and AR1)

- There are more advanced options, which we will see later. (A usage sketch follows below.)

61 / 140
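A hypothetical sketch of the syntax (the column names region, year and id, and the graph object g, are made up for illustration):

formula <- y ~ x1 + x2 +
  f(region, model = "besag", graph = g, constr = TRUE) +   # spatial CAR effect
  f(year,   model = "rw2") +                               # smooth temporal effect
  f(id,     model = "iid")                                 # unstructured random effect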


Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

62 / 140

EPIL example

Seizure counts in a randomized trial of anti-convulsant therapy in epilepsy. From the WinBUGS manual.

Patient  y1  y2  y3  y4  Trt  Base  Age
      1   5   3   3   3    0    11   31
      2   3   5   3   3    0    11   30
    ...
     59   1   4   3   2    1    12   37

63 / 140

EPIL example (cont.)

- Mixed model with repeated Poisson counts:

  y_jk ∼ Poisson(µ_jk),   j = 1, ..., 59,  k = 1, ..., 4

  log(µ_jk) = α_0 + α_1 log(Base_j/4) + α_2 Trt_j + α_3 Trt_j log(Base_j/4) + α_4 Age_j + α_5 V4_k + Ind_j + β_jk

  α_i ∼ N(0, τ_α),  τ_α known
  Ind_j ∼ N(0, τ_Ind),  τ_Ind ∼ Gamma(a_1, b_1)
  β_jk ∼ N(0, τ_β),  τ_β ∼ Gamma(a_2, b_2)

64 / 140

EPIL example (cont.)

The Epil data frame:

y  Trt  Base  Age  V4  rand  Ind
5    0    11   31   0     1    1
3    0    11   31   0     2    1
...

Specifying the model:

formula = y ∼ log(Base/4) + Trt + I(Trt * log(Base/4)) + log(Age) + V4 +
           f(Ind, model = "iid") + f(rand, model = "iid")

η = (η_1, η_2, ..., η_{4·59})ᵀ = β_0 (1, ..., 1)ᵀ + ... + (f_Ind_1, f_Ind_1, ..., f_Ind_59)ᵀ + (f_rand_1, f_rand_2, ..., f_rand_{4·59})ᵀ

65 / 140


data(Epil)

my.center = function(x) (x - mean(x))
Epil$CTrt    = my.center(Epil$Trt)
Epil$ClBase4 = my.center(log(Epil$Base/4))
Epil$CV4     = my.center(Epil$V4)
Epil$ClAge   = my.center(log(Epil$Age))

formula = y ~ ClBase4*CTrt + ClAge + CV4 +
          f(Ind, model="iid") + f(rand, model="iid")

result = inla(formula, family="poisson", data = Epil)

summary(result)
plot(result)

66 / 140

Epil-example from Win/Open-BUGS

[Figures: posterior marginals for α_0 and for τ_β]

67 / 140

EPIL example (cont.)

Access the results:

- Summaries (mean, sd, [0.025, 0.5, 0.975]-quantiles, kld):
  - result$summary.fixed
  - result$summary.random$Ind
  - result$summary.random$rand
  - result$summary.hyperpar

- Posterior marginals (matrices with x- and y-values); see the sketch below:
  - result$marginals.fixed
  - result$marginals.random$Ind
  - result$marginals.random$rand
  - result$marginals.hyperpar

68 / 140
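A short sketch of how these are used in practice (assuming result is the Epil fit from the previous slide):

result$summary.fixed                          # posterior summaries of the fixed effects
head(result$summary.random$Ind)               # summaries of the individual-level effects
m <- result$marginals.fixed[["(Intercept)"]]  # two-column matrix (x, y)
plot(m, type = "l", xlab = "intercept", ylab = "density")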


Smoothing binary time series

[Figure: binary rainfall indicator over time]

Number of days in Tokyo with rainfall above 1 mm in 1983-84. We want to estimate the probability of rain p_t for calendar day t = 1, ..., 366.

69 / 140

Smoothing binary time series

- Model with a time-series component:

  y_t ∼ Binomial(n_t, p_t),   t = 1, ..., 366

  p_t = exp(η_t) / (1 + exp(η_t))

  η_t = f(t)

  f = {f_1, ..., f_366} ∼ cyclic RW2(τ)

  τ ∼ Gamma(1, 0.0001)

70 / 140


Smoothing binary time series

The Tokyo data frame:

y  n  time
0  2     1
0  2     2
1  2     3
...

Specifying the model:

formula = y ∼ f(time, model="rw2", cyclic=TRUE) - 1

η = (η_1, η_2, ..., η_366)ᵀ = (f_time_1, f_time_2, ..., f_time_366)ᵀ

71 / 140

data(Tokyo)

formula = y ~ f(time, model="rw2", cyclic=TRUE) - 1

result = inla(formula, family="binomial", Ntrials=n,
              data=Tokyo)

72 / 140

Posterior for the temporal effect

[Figure: posterior mean and 0.025/0.5/0.975 quantiles of f(time) over t = 1, ..., 366]

73 / 140

Posterior for the precision

[Figure: posterior density of the precision for time]

74 / 140

Disease mapping in Germany

Larynx cancer mortality counts are observed in the 544 districts of Germany from 1986 to 1990, together with the level of smoking consumption (100 possible values).

[Maps: mortality counts (colour scale 0.63 to 2.55) and smoking consumption covariate (colour scale 26.22 to 97)]

75 / 140

For i = 1, ..., 544:

- y_i: counts of cancer mortality in region i

- E_i: known variable accounting for demographic variation in region i

- c_i: level of smoking consumption registered in region i

[Maps: as on the previous slide]

76 / 140

The model

yi ∼ Poisson{Ei exp(ηi)}, i = 1, . . . , 544
ηi = µ + f(ci) + fs(si) + fu(si)

where:

I f(ci) is a smooth effect of the covariate: f = {f1, . . . , f100} ∼ RW2(τf)

I fs(si) is a spatial effect modeled as an intrinsic GMRF:

fs(s) | fs(s′), s ≠ s′, τfs ∼ N( (1/ns) Σ_{s′∼s} fs(s′), 1/(ns τfs) )

I fu(si) is a random effect: fu = {fu(s1), . . . , fu(s544)} ∼ N(0, τfu⁻¹ I)

I µ is an intercept term, µ ∼ N(0, 0.0001)

77 / 140


For identifiability we impose a sum-to-zero constraint on all intrinsic models, so

Σs fs(s) = 0 and Σi fi = 0

78 / 140

The Germany data frame:

region E Y x

0 7.965008 8 56

1 22.836219 22 65

The model is:

ηi = µ+ f (ci ) + fs(si ) + fu(si )

I The data set has to contain one separate column for each term specified through f(), so in this case we have to add one column:
> Germany = cbind(Germany, region.struct=Germany$region)

I We also need the graph file where the neighborhood structure is specified: germany.graph

79 / 140


The new data set is:

region E Y x region.struct

0 7.965008 8 56 0

1 22.836219 22 65 1

Then the formula is

formula <- Y ~ f(region.struct, model="besag", graph="germany.graph") +
               f(x, model="rw2") + f(region)

The sum-to-zero constraint is the default in the inla function for all intrinsic models. The location of the graph file has to be provided here (the graph file cannot be loaded into R).

80 / 140


The graph file

The germany.graph file:

544
1 1 12
2 2 10 11
3 4 6 8 15 387
...

I Total number of nodes in the graph

I Identifier for the node

I Number of neighbors

I Identifiers for the neighbors

81 / 140


data(Germany)

g = system.file("demodata/germany.graph", package="INLA")

source(system.file("demodata/Bym-map.R", package="INLA"))

Germany = cbind(Germany, region.struct=Germany$region)

# standard BYM model

formula1 = Y ~ f(region.struct,model="besag",graph=g) +

f(region,model="iid")

# with linear covariate

formula2 = Y ~ f(region.struct,model="besag",graph=g) +

f(region,model="iid") + x

# with smooth covariate

formula3 = Y ~ f(region.struct,model="besag",graph=g) +

f(region,model="iid") + f(x, model="rw2")

82 / 140

result1 = inla(formula1,family="poisson",data=Germany,E=E,

control.compute=list(dic=TRUE))

result2 = inla(formula2,family="poisson",data=Germany,E=E,

control.compute=list(dic=TRUE))

result3 = inla(formula3,family="poisson",data=Germany,E=E,

control.compute=list(dic=TRUE))

83 / 140

Other graph specification

- It is also possible to define the graph structure of your model using:

I A symmetric (dense or sparse) matrix, where the non-zero pattern of the matrix defines the graph (see the sketch below).

I An inla.graph object.

See the FAQ on the webpage for more information.

84 / 140
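A minimal sketch of the matrix route, using a hypothetical 3-region neighborhood structure (the toy matrix below is an assumption, not part of the course data):

library(INLA)
library(Matrix)
## regions 1-2 and 2-3 are neighbors; the non-zero pattern defines the graph
Q = sparseMatrix(i = c(1, 2, 2, 3), j = c(2, 1, 3, 2), x = 1, dims = c(3, 3))
g = inla.read.graph(Q)
str(g)
## the resulting graph (or the matrix itself) can then be passed to f(), e.g.
## formula = Y ~ f(region.struct, model="besag", graph=g)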

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

85 / 140

Model evaluation

I Deviance Information Criterion (DIC):

result = inla(..., control.compute = list(dic = TRUE))

result$dic$dic

I Conditional predictive ordinate (CPO) and probability integral transform (PIT):

CPOi = π(yi |y−i )

PITi = Prob(Yi ≤ yi,obs | y−i)

result = inla(..., control.compute = list(cpo = TRUE))

result$cpo$cpo

result$cpo$pit

86 / 140
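A small comparison sketch building on the Germany fits above (result1-result3 were run with dic=TRUE; the cpo option is added in a refit here):

## compare the three models by DIC (smaller is better)
c(result1$dic$dic, result2$dic$dic, result3$dic$dic)

## refit one model asking for CPO/PIT
result1.cpo = inla(formula1, family="poisson", data=Germany, E=E,
                   control.compute=list(cpo=TRUE))
mean(log(result1.cpo$cpo$cpo))   # cross-validated log-score (larger is better)
hist(result1.cpo$cpo$pit)        # roughly uniform PIT values suggest adequate calibration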

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

87 / 140

Controlling θ

I We often need to set our own priors and use our own parameters in them.

I These can be set in two ways

Old style using prior=.., param=..., initial=...,

fixed=...

New style using hyper = list(prec =

list(initial=2, fixed=TRUE, ....))

The old style is there for backward compatibility only. The two styles can also be mixed.

88 / 140


Example- New style

hyper = list(

prec = list(

prior = "loggamma",

param = c(2,0.1),

initial = 3,

fixed = FALSE

)

)

formula = y ~ f(i, model="iid", hyper = hyper) + ...

- Old style

formula = y ~ f(i, model="iid", prior = "loggamma",

param = c(2,0.1), initial = 3,

fixed = FALSE) + ...

89 / 140

Internal and external scale

Hyperparameters, like the precision τ, are represented internally using a “good” transformation, like

θ1 = log(τ)

I Initial values are given on the internal scale

I The to.theta and from.theta functions can be used to map between the external and internal scales.

90 / 140
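A minimal sketch of what this means in practice (the starting value τ = 20 is just an illustrative assumption):

## initial values refer to theta = log(tau), so a starting precision of 20
## is supplied as log(20)
hyper = list(prec = list(initial = log(20), fixed = FALSE))
formula = y ~ f(i, model="iid", hyper = hyper)
## after fitting, result$summary.hyperpar reports the precision on the user
## (external) scale, while the internal.* parts of the result object (where
## available) report the log-precision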


Example: AR1 model

hyper
    theta1
        name          log precision
        short.name    prec
        prior         loggamma
        param         1 5e-05
        initial       4
        fixed         FALSE
        to.theta
        from.theta
    theta2
        name          logit lag one correlation
        short.name    rho
        prior         normal
        param         0 0.15
        initial       2
        fixed         FALSE
        to.theta
        from.theta
constr                FALSE
nrow.ncol             FALSE
augmented             FALSE
aug.factor            1
aug.constr
n.div.by
n.required            FALSE
set.default.values    FALSE
pdf                   ar1

91 / 140
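A listing like the one above can be inspected directly from R; a small sketch, assuming the inla.models() and inla.doc() helpers available in recent R-INLA versions:

library(INLA)
## default hyperparameter specification of the "ar1" latent model
str(inla.models()$latent$ar1$hyper)
## full documentation for the model, including priors and internal scales
inla.doc("ar1")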

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

92 / 140

Feature: replicate

“replicate” generates iid replicates from the same model with the same hyperparameters.

If x | θ ∼ AR(1), then nrep=3 makes

x = (x1, x2, x3)

with mutually independent xi’s, each from the same AR(1) model with the same θ

Most f()-models can be replicated

93 / 140

Example: replicate

n=100

x1 = arima.sim(n, model=list(ar=0.9)) + 1

x2 = arima.sim(n, model=list(ar=0.9)) - 1

y1 = rpois(n,exp(x1))

y2 = rpois(n,exp(x2))

y = c(y1,y2)

i = rep(1:n,2)

r = rep(1:2,each=n)

intercept = as.factor(r)

formula = y ~ f(i, model="ar1", replicate=r) + intercept -1

result = inla(formula, family = "poisson",

data = data.frame(y=y,i=i,r=r))

94 / 140

Example: replicate

i = rep(1:n,2)

r = rep(1:2,each=n)

intercept = as.factor(r)

formula = y ~ f(i, model="ar1", replicate=r) + intercept -1

(y1,1, . . . , yn,1, y1,2, . . . , yn,2)  −g→  (η1,1, . . . , ηn,1, η1,2, . . . , ηn,2)
  = (f_i,(1,1), . . . , f_i,(n,1), f_i,(1,2), . . . , f_i,(n,2)) + β0,1 · (1, . . . , 1, 0, . . . , 0) + β0,2 · (0, . . . , 0, 1, . . . , 1)

95 / 140
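A small sketch of how the replicated field comes back (the row ordering stated below is an assumption about how INLA stacks replicates, so check result$summary.random$i$ID in your own fit):

## the AR(1) field is returned as one vector of length 2*n; with the coding
## above, rows 1..n belong to replicate 1 and rows (n+1)..2n to replicate 2
x.hat = result$summary.random$i$mean
plot(x.hat[1:n], type="l", ylab="posterior mean of the latent field")
lines(x.hat[n + 1:n], lty=2)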

Feature: More than one family

Every observation could have its own likelihood!

I Response is a matrix or list

I Each “column” defines a separate “family”

I Each “family” has its own hyperparameters

96 / 140

n=100

phi = 0.9

x1 = 1 + arima.sim(n, model=list(ar=phi))

x2 = 0.5 + arima.sim(n, model=list(ar=phi))

y1 = rbinom(n,size=1, prob=exp(x1)/(1+exp(x1)))

y2 = rpois(n,exp(x2))

y = matrix(NA, 2*n, 2)

y[ 1:n, 1] = y1

y[n+1:n, 2] = y2

i = rep(1:n,2)

r = rep(1:2,each=n)

intercept = as.factor(r)

Ntrials = c(rep(1,n), rep(NA,n))

formula = y ~ f(i, model="ar1", replicate=r) + intercept -1

result = inla(formula, family = c("binomial", "poisson"),

Ntrials = Ntrials, data = data.frame(y,i,r))

97 / 140

y = matrix(NA, 2*n, 2)

y[ 1:n, 1] = y1

y[n+1:n, 2] = y2

i = rep(1:n,2)

r = rep(1:2,each=n)

intercept = as.factor(r)

Ntrials = c(rep(1,n), rep(NA,n))

formula = y ~ f(i, model="ar1", replicate=r) + intercept -1

result = inla(formula, family = c("binomial", "poisson"),

Ntrials = Ntrials, data = data.frame(y,i,r))

Y =

y1,1   NA
 ...   ...
yn,1   NA
 NA    y1,2
 ...   ...
 NA    yn,2

−g→  (η1,1, . . . , ηn,1, η1,2, . . . , ηn,2)
  = (f_i,(1,1), . . . , f_i,(n,1), f_i,(1,2), . . . , f_i,(n,2)) + β0,1 · (1, . . . , 1, 0, . . . , 0) + β0,2 · (0, . . . , 0, 1, . . . , 1)

98 / 140

More than one family - More examples

Some rather advanced examples on www.r-inla.org using this feature:

I Preferential sampling, geostatistics (marked point process)

I Weibull-survival data and “longitudinal” data

99 / 140

Feature: copy

The model

formula = y ~ f(i, ...) + ...

Only allows ONE element from each sub-model to contribute to the linear predictor for each observation.

Sometimes this is not sufficient.

100 / 140

Feature: copy

Suppose

ηi = ui + ui+1 + . . .

Then we can code this as

formula = f(i, model="iid") + f(i.plus, copy="i")

I The copy feature creates an additional sub-model which is ε-close to the target.

I Many copies allowed

I Copy with unknown scaling (default scaling is fixed to 1).

(η1, . . . , ηn)ᵀ = (u1, . . . , un)ᵀ + (u2, . . . , un+1)ᵀ

101 / 140

Feature: copy

Suppose that

ηi = ai + bi zi + . . .

where

(ai, bi) iid∼ N2(0, Σ)

- Simulate data

library(mvtnorm)   # provides rmvnorm

n = 100

Sigma = matrix(c(1, 0.8, 0.8, 1), 2, 2)

z = runif(n)

ab = rmvnorm(n, sigma = Sigma)

a = ab[, 1]

b = ab[, 2]

eta = a + b * z

s = 0.1

y = eta + rnorm(n, sd=s)

102 / 140

i = 1:n

j = 1:n + n

formula = y ~ f(i, model="iid2d", n = 2*n) + f(j, z, copy="i") -1

r = inla(formula, data = data.frame(y, i, j))

(η1, . . . , ηn)ᵀ = (a1, . . . , an)ᵀ + (b1 z1, . . . , bn zn)ᵀ

Here f(i, model="iid2d", n=2*n) represents the stacked vector (a1, . . . , an, b1, . . . , bn), and f(j, z, copy="i") with j = 1:n + n picks out the b-part and scales it by the covariate z.

103 / 140
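A quick sanity-check sketch (assuming the iid2d field is ordered as the a-block followed by the b-block, matching the indexing j = 1:n + n used above):

## compare estimated random intercepts/slopes with the simulated truth
ab.hat = r$summary.random$i$mean
plot(a, ab.hat[1:n]); abline(0, 1)          # a_i, rows 1..n
plot(b, ab.hat[n + 1:n]); abline(0, 1)      # b_i, rows (n+1)..2n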

Feature: Linear-combinations

Possible to extract extra information from the model through linear combinations of the latent field, say

v = Bx

for a k × n matrix B.

104 / 140

Feature: Linear-combinations (cont.)

Two different approaches.

1. Most “correct” is to do the computations on the enlarged field

x̃ = (x, v)

But this often leads to a denser precision matrix.

2. The second option is to compute these “offline”, as (conditionally on θ)

Var(v1) = Var(b1ᵀ x) ≈ b1ᵀ (QGMRF-approx)⁻¹ b1

and

E(v1) = b1ᵀ E(x)

Approximate density of v1 with a Normal.

105 / 140


formula = y ~ ClBase4*CTrt + ClAge + CV4 +

f(Ind, model="iid") + f(rand, model="iid")

## Now I want the posterior for

##

## 1) 2*CTrt - CV4

## 2) Ind[2] - rand[2]

##

lc1 = inla.make.lincomb( CTrt = 2, CV4 = -1)

names(lc1) = "lc1"

lc2 = inla.make.lincomb( Ind = c(NA,1), rand = c(NA,-1))

names(lc2) = "lc2"

## default is to derive the marginals from lc’s without changing the

## latent field

result1 = inla(formula,family="poisson", data = Epil,

lincomb = c(lc1, lc2))

## but the lincombs can also be additionally included into the latent

## field for increased accuracy...

result2 = inla(formula,family="poisson", data = Epil,

lincomb = c(lc1, lc2),

control.inla = list(lincomb.derived.only = FALSE))

106 / 140

- Get the results

result$summary.lincomb.derived

result$marginals.lincomb.derived # results of the

# default method

result$summary.lincomb

result$marginals.lincomb # alternative method

- Posterior correlation matrix between all the linear combinations

control.inla = list(lincomb.derived.correlation.matrix = TRUE)

result$misc$lincomb.derived.correlation.matrix

- Many linear combinations at once: use inla.make.lincombs()

107 / 140

A-matrix in the linear predictor (I)

Usual formula

η = . . .

and

yi ∼ π(yi | ηi , . . .)

108 / 140

A-matrix in the linear predictor (II)

Extended formula

η = . . .
η∗ = Aη

and

yi ∼ π(yi | η∗i , . . .)

Implemented as

A = matrix(...)

A = sparseMatrix(...)

result = inla(formula, ...,

control.predictor = list(A = A))

109 / 140


A-matrix in the linear predictor (III)

I Can really simplify model formulations

I Duplicates to some extent the “copy” feature

I Really useful for some models; the A-matrix need not be a square matrix (see the sketch below)

110 / 140
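A minimal sketch with simulated data (the toy construction and all names are assumptions): each observation responds to the average of two neighbouring elements of the latent predictor, η∗ = Aη.

library(INLA)
library(Matrix)
n = 50
idx = 1:n
## sparse n x n matrix: row i averages latent elements i and i+1 (wrapping around)
A = sparseMatrix(i = rep(1:n, 2), j = c(1:n, c(2:n, 1)), x = 0.5, dims = c(n, n))
eta.true = sin(idx / 5)
y = rpois(n, lambda = exp(as.vector(A %*% eta.true)))
formula = y ~ -1 + f(idx, model = "rw2")
result = inla(formula, family = "poisson",
              data = data.frame(y = y, idx = idx),
              control.predictor = list(A = A))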

Feature: remote computing

For large/huge models, it is more convenient to run the computations on a remote (Linux/Mac) computational server

inla(...., inla.call="remote")

using ssh (and Cygwin on Windows).

111 / 140

Control statements

The control.xxx statements control various parts of the INLA program

I control.predictor
  I A — the “A matrix” or “observational matrix” linking the latent field to the data.

I control.mode
  I x, theta, result — gives modes to INLA.
  I restart = TRUE — tells INLA to try to improve on the supplied mode.

I control.compute
  I dic, mlik, cpo — compute measures of fit.

I control.inla
  I strategy and int.strategy contain useful advanced features.

Various others — see the help pages!

112 / 140

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

113 / 140

Space-varying regression

Number of (insurance-type) losses Nkt in 431 municipalities/regions of Norway in relation to one weather covariate Wkt. The likelihood is

Nkt ∼ Poisson(Akt pkt); k = 1, . . . , 431, t = 1, . . . , 10

The model for log pkt is:

log pkt = β0 + βk Wkt

where βk is the regression coefficient for each municipality.

114 / 140

Borrow strength..

Few losses in each region; high variability in the estimates.

Borrow strength by letting {β1, . . . , β431} be smooth in space:

{β1, . . . , β431} ∼ CAR(τβ)

115 / 140


The data set:

y region W

1 0 1 0.4

2 0 1 0.4

10 0 1 0.4

11 1 2 0.2

12 0 2 0.2

20 0 2 0.2

116 / 140

The second argument in f() is the weight, which defaults to 1.

ηi = ...+ wi fi + ...

is represented as

f(i, w, ...)

No need for sum-to-zero constraint!

norway = read.table("norway.dat", header=TRUE)

formula = y ~ 1 + f(region, W, model="besag",

graph.file="norway.graph",

constr=FALSE)

result = inla(formula, family="poisson", data=norway)

117 / 140

Survival models

patient  time    event  age     sex
1        8, 16   1, 1   28, 28  0
2        23, 13  1, 0   48, 48  1
3        22, 18  1, 1   32, 32  0

I Times of infection from the time of insertion of the catheter on 38 kidney patients using portable dialysis equipment.

I 2 observations for each patient (38 patients).

I Each time can be an event (infection) or a censoring (no infection)

118 / 140

The Kidney data

The Kidney data frame

time event age sex ID

8 1 28 0 1

16 1 28 0 1

23 1 48 1 2

13 0 48 1 2

22 1 32 0 3

28 1 32 0 3

119 / 140

data(Kidney)

formula = inla.surv(time,event) ~ age + sex + f(ID,model="iid")

result1 = inla(formula, family="coxph", data=Kidney)

result2 = inla(formula, family="weibull", data=Kidney)

result3 = inla(formula, family="exponential", data=Kidney)

120 / 140

Outline

INLA implementation

R-INLA - Model specification

Some examples

Model evaluation

Controlling hyperparameters and priors

Some more advanced features

More examples

Extras

121 / 140

A toy-example using copy

State-space model

yt = xt + vt

xt = 2xt−1 − xt−2 + wt

Rewrite this as

yt = xt + vt

0 = xt − 2xt−1 + xt−2 + wt

and implement this as two families

1. Observations yt with precision Prec(vt)

2. Observations 0 with precision Prec(wt), or Prec=HIGH.

122 / 140


n = 100

m = n-2

y = sin((1:n)*0.2) + rnorm(n, sd=0.1)

formula = Y ~ f(i, model="iid", initial=-10, fixed=TRUE) +

f(j, w, copy="i") + f(k, copy="i") +

f(l, model ="iid") -1

Y = matrix(NA, n+m, 2)

Y[1:n, 1] = y

Y[1:m + n, 2] = 0

i = c(1:n, 3:n) # x_t

j = c(rep(NA,n), 3:n -1) # x_t-1

w = c(rep(NA,n), rep(-2,m)) # weights for j

k = c(rep(NA,n), 3:n -2) # x_t-2

l = c(rep(NA,n), 1:m) # v_t

r = inla(formula, data = data.frame(i,j,w,k,l,Y),

family = c("gaussian", "gaussian"),

control.data = list(list(), list(initial=10, fixed=TRUE)))

123 / 140
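A small follow-up sketch (plot labels are assumptions): the smoothed state xt is the “i” component of the latent field, so its posterior mean can be overlaid on the noisy data.

plot(y, pch = 19, cex = 0.5, ylab = "y and smoothed x")
lines(r$summary.random$i$mean, col = 2)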

Stochastic Volatility model

[Plot: log of the daily difference of the pound-dollar exchange rate from October 1st, 1981, to June 28th, 1985]

124 / 140

Stochastic Volatility model

Simple model

xt | x1, . . . , xt−1, τ, φ ∼ N (φxt−1, 1/τ)

where |φ| < 1 to ensure a stationary process.

Observations are taken to be

yt | x1, . . . , xt , µ ∼ N (0, exp(µ+ xt))

125 / 140
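A minimal fitting sketch (the data-frame names are assumptions, and the likelihood name "stochvol" is assumed to be available in the installed R-INLA version):

## y: the series of daily log-differences
n = length(y)
time = 1:n
formula = y ~ 1 + f(time, model = "ar1")
result = inla(formula, family = "stochvol",
              data = data.frame(y = y, time = time))
summary(result)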

Results

Using only the first 50 data points, which makes the problem much harder.

126 / 140

Results

[Plot: posterior marginal for ν = logit(2φ − 1)]

126 / 140

Results

[Plot: posterior marginal for log(κx)]

126 / 140

Using the full dataset

[Plot: The Pound-Dollar data]

127 / 140

Using the full dataset

[Plot: posterior mean of µ + xt]

128 / 140

Using the full dataset

[Plot: the posterior marginal for the precision]

129 / 140

Using the full dataset

[Plot: the posterior marginal for the lag-1 correlation]

130 / 140

Using the full dataset

[Plot: predictions for µ + xt+k]

131 / 140

New data-model: Student-tν

Now extend the model to use a Student-tν distribution

yt | x1, . . . , xt ∼ exp(µ/2 + xt/2) × Tν / √(ν/(ν − 2))

where Tν is a standard Student-t with ν degrees of freedom; the division by √(ν/(ν − 2)) scales it to unit variance.

132 / 140
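In R-INLA this amounts to swapping the likelihood; a small sketch (the exact family name differs between versions, so it is looked up rather than hard-coded):

## list the stochastic-volatility likelihoods shipped with the installed version
grep("stochvol", names(inla.models()$likelihood), value = TRUE)
## then refit with the Student-t variant, e.g. (family name assumed)
## result.t = inla(formula, family = "stochvol.t", data = data.frame(y, time))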

Student-tν

[Plot: posterior marginal for ν]

133 / 140

Student-tν

[Plot: predictions]

134 / 140

Student-tν

[Plot: comparing predictions with Student-tν and Gaussian likelihoods]

135 / 140

Student-tν

However,

I No support for Student-tν in the data
  I Bayes factor
  I Deviance Information Criterion

136 / 140

Disease mapping: The BYM-model

I Data yi ∼ Poisson(Ei exp(ηi))

I Log-relative risk ηi = ui + vi

I Structured component u

I Unstructured component v

I Log-precisions log κu and log κv

[Map over the 366 districts of Sardinia]

I A hard case: Insulin Dependent Diabetes Mellitus in 366 districts of Sardinia. Few counts.

I dim(θ) = 2.

137 / 140

Marginals for θ|y

138 / 140


Marginals for xi |y

139 / 140

THANK YOU

140 / 140