Lecture 15 Generalised linear mixed models in WinBUGS.


Transcript of Lecture 15 Generalised linear mixed models in WinBUGS.

Page 1: Lecture 15 Generalised linear mixed models in WinBUGS.

Lecture 15

Generalised linear mixed models in WinBUGS

Page 2.

Lecture Contents

• Differences in WinBUGS methods

• Rejection sampling algorithm

• Adaptive rejection (AR) sampling

• Reunion Island dataset

• Hierarchical centering

• Probit formulation

Page 3.

Logistic regression model (recap)

• A standard Bayesian logistic regression model (e.g. for the rat tumour example) can be written as follows:

• Both MLwiN and WinBUGS can fit this model but can we write out the conditional posterior distributions and use Gibbs Sampling?

y_i ~ Binomial(n_i, p_i)

logit(p_i) = β0 + β1 x_i

β0 ~ N(0, m0),  β1 ~ N(0, m1)

Page 4.

Conditional distribution for β0

p(β0 | β1, y) ∝ p(β0) p(y | β0, β1)

∝ exp(−β0² / (2m0)) ∏_i [exp(β0 + β1 x_i) / (1 + exp(β0 + β1 x_i))]^(y_i) [1 / (1 + exp(β0 + β1 x_i))]^(n_i − y_i)

p(β0 | β1, y) ~ ?

This distribution is not a standard distribution and so we cannot simply simulate from a standard random number generator. However, both WinBUGS and MLwiN can fit this model using MCMC. In this lecture we will describe how WinBUGS deals with this problem.

Page 5.

Why not use Gibbs?

To use Gibbs we would need to generate samples from the distribution

Although this is not a standard distribution, so there is no readily available function to generate directly from it, other methods exist.

The method that WinBUGS uses is called adaptive rejection (AR) sampling and can be used to sample from any log concave distribution function.

Before going on to AR sampling we will consider rejection sampling in general.

p(β0 | β1, y)

Page 6.

Rejection Sampling

Assume we wish to generate from a distribution with density f(x), but it is difficult to sample from this density directly. However, there exists a density g(x) that is easy to sample from, and a constant M such that f(x) ≤ M g(x) for all x.

Note that this is often easier for distributions with a bounded range but can also be used for unbounded distributions.

Then we sample a value x from g(x) and accept it with probability f(x)/(M g(x)). If we reject, we sample again until a value is accepted. Note the algorithm works best if g(x) is close in shape to f(x) so that M can be small: the overall acceptance rate is 1/M.

Page 7.

Truncated Normal example

Let us assume that we wish to generate random variables from a standard Normal distribution truncated at -2 and +2.

The standard method would be to generate from a Normal(0,1) and reject values outside the range (-2, 2).

An alternative would be to generate from a (scaled) Uniform distribution on (-2, 2) as shown. The probability of acceptance is around 0.95 × √(2π)/4 ≈ 0.595.

5,000 random draws resulted in an acceptance rate of 0.606.
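The uniform-envelope scheme above can be sketched in a few lines of Python (an illustrative sketch, not WinBUGS's implementation; the function name is my own):

```python
import math, random

def truncated_normal_rejection(n, a=-2.0, b=2.0, seed=0):
    """Sample n draws from a standard Normal truncated to (a, b) by
    rejection sampling with a scaled Uniform(a, b) envelope."""
    rng = random.Random(seed)
    # Unnormalised target: f(x) = exp(-x^2/2) on (a, b), maximised at x = 0
    # where it equals 1, so the constant envelope M g(x) = 1 dominates f.
    draws, proposals = [], 0
    while len(draws) < n:
        x = rng.uniform(a, b)        # sample from the envelope g
        proposals += 1
        if rng.random() < math.exp(-0.5 * x * x):   # accept w.p. f(x)/(M g(x))
            draws.append(x)
    return draws, n / proposals      # draws and the empirical acceptance rate
```

Running this reproduces an acceptance rate of around 0.6, in line with the 0.606 quoted on the slide.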

Page 8.

Adaptive Rejection Sampling

AR sampling (Gilks and Wild, 1992) takes rejection sampling a step further by clever construction of the function g(x) that is used. The algorithm works for all log-concave densities f(x). A few points in the support of f(x) are chosen and the tangent to log f(x) is constructed at each point. These tangents are joined to form a piecewise linear function g(x) that is an envelope for log f(x).

When a point is sampled from the envelope it is accepted or rejected using rejection sampling. However, the tangent at a rejected point is also evaluated, and the envelope function is updated to give a closer approximation to log f(x).
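The tangent-envelope idea can be sketched as follows. This is a simplified illustration of the Gilks and Wild scheme, not WinBUGS's implementation: it omits their lower "squeezing" hull, does not handle a tangent of exactly zero slope, and all names are my own.

```python
import math, random

def ars_sample(h, h_prime, x_init, n, seed=0):
    """Sketch of adaptive rejection sampling for a log-concave density
    proportional to exp(h(x)) on the real line.  x_init must straddle
    the mode: h'(x) > 0 at the first point and h'(x) < 0 at the last."""
    rng = random.Random(seed)
    xs = sorted(x_init)
    draws = []
    while len(draws) < n:
        # Tangent to h at each abscissa x_j: t_j(x) = a[j] + b[j]*x.
        a = [h(x) - h_prime(x) * x for x in xs]
        b = [h_prime(x) for x in xs]
        # Consecutive tangents intersect at z[j]; tangent j is the upper
        # envelope on (z[j], z[j+1]), with z[0] = -inf and z[-1] = +inf.
        z = [-math.inf] + [(a[j + 1] - a[j]) / (b[j] - b[j + 1])
                           for j in range(len(xs) - 1)] + [math.inf]
        def expz(j, point):   # exp(a[j] + b[j]*point); 0.0 at an infinite tail
            return math.exp(a[j] + b[j] * point)
        # Area under exp(envelope) on each piece.
        masses = [(expz(j, z[j + 1]) - expz(j, z[j])) / b[j]
                  for j in range(len(xs))]
        # Pick a piece in proportion to its area, then invert the
        # piecewise-exponential CDF within that piece.
        u = rng.random() * sum(masses)
        j = 0
        while j < len(masses) - 1 and u > masses[j]:
            u -= masses[j]
            j += 1
        x = (math.log(expz(j, z[j]) + u * b[j]) - a[j]) / b[j]
        # Ordinary rejection step against the envelope ...
        if rng.random() < math.exp(h(x) - (a[j] + b[j] * x)):
            draws.append(x)
        else:
            # ... and the adaptive step: a rejected point becomes a new
            # abscissa, so the envelope hugs log f more closely next time.
            xs = sorted(xs + [x])
    return draws
```

For example, `ars_sample(lambda x: -0.5*x*x, lambda x: -x, [-2.0, -0.5, 0.5, 2.0], 10000)` draws from a standard Normal. Because rejections keep adding tangent points, the acceptance probability climbs towards 1 as the chain runs.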

Page 9.

An illustration of AR sampling

Page 10.

WinBUGS 1.4 New Method

• In fact, for the models we consider, WinBUGS 1.4 no longer uses AR sampling for the fixed effects.

• Instead it uses a method due to Gamerman (1997). This is essentially a Metropolis-Hastings algorithm in which, at each iteration, a multivariate Normal proposal distribution is formed by performing one iteration of Iteratively Weighted Least Squares (IWLS), starting at the current point. This routine is similar to the IGLS estimation method used in MLwiN for MQL/PQL estimates.

• To see the AR sampler in action you will need to use WinBUGS 1.3, or go to the Options/Blocking options screen and remove the tick from the box. (You can try this in the practical if you wish.)

Page 11.

Reunion Island 2 level dataset

We will here consider the 2-level Reunion Island dataset, as analysed in MLwiN, and the final model:

Page 12.

WinBUGS code for the model

model {
# Level 1 definition
for (i in 1:N) {
    fscr[i] ~ dbin(p[i], denom[i])
    logit(p[i]) <- beta[1] * cons[i] + beta[2] * ai[i] + u2[herd[i]] * cons[i]
}
# Higher level definitions
for (j in 1:n2) {
    u2[j] ~ dnorm(0, tau.u2)
}
# Priors for fixed effects
for (k in 1:2) { beta[k] ~ dflat() }
# Priors for random terms
tau.u2 ~ dgamma(0.001, 0.001)
sigma2.u2 <- 1/tau.u2
}

Here we see the code for a 2-level logistic regression model. Note the use of dbin for the binomial distribution and the logit link function.

The file also contains the data values and initial values taken from the 1st order MQL run of this model.

We are using Gamma(ε,ε) priors.

Page 13.

WinBUGS node info

On the info menu you can select node info.

This will inform you (in a funny language) what method is being used for each node.

For this model we get:

beta[1] UpdaterGLM.LogitUpdater – means the Gamerman method.

u2[1] UpdaterRejection.Logit – means the AR method. (I think!)

tau.u2 UpdaterGamma.Updater – means conjugate Gibbs sampling.

Page 14.

WinBUGS chains

The model was run for 5,000 iterations after a burn-in of 500, which took just over 2 minutes.

The chains all look reasonable.

[Trace plots of beta[1], beta[2] and sigma2.u2 against iteration (501 to 5000).]

Page 15.

Point Estimates

Estimates are similar to MLwiN:

MLwiN estimates after 50,000 iterations:

Node        Mean     SD
beta[1]     0.651    0.169
beta[2]    -1.113    0.174
sigma2.u2   0.091    0.056

Page 16.

Reunion Island 3 level dataset

We will here consider the 3-level Reunion Island dataset, as analysed in MLwiN, and the final model, which gave estimation difficulties:

Page 17.

WinBUGS chains

The model was run for 5,000 iterations after a burn-in of 500, which took 10 minutes!

Although the intercept chain looks good, and better than the MH chain, the cow-level variance has clearly not converged yet. So the WinBUGS method does not fix the problem.

[Trace plots of sigma2.u2 and beta[1] against iteration (501 to 5000).]

Page 18.

Between-cow variance after 50k iterations using WinBUGS

Page 19.

Hierarchical centering formulation

This is a method for improving the mixing of an MCMC method.

Consider the VC model

This can be written equivalently as

and the Gibbs sampling algorithm written using this parameterisation. This may improve mixing if u*_j is less correlated with β0 than u_j is. See Browne (2004) for models where this works well.

VC model:  y_ij = β0 + u_j + e_ij,  u_j ~ N(0, σ²_u),  e_ij ~ N(0, σ²_e)

Hierarchically centred form:  y_ij = u*_j + e_ij,  u*_j ~ N(β0, σ²_u),  e_ij ~ N(0, σ²_e)
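A minimal sketch of the two Gibbs parameterisations, with the variances held fixed and a flat prior on β0 for simplicity (illustrative code with my own names, not MLwiN's or WinBUGS's implementation):

```python
import math, random

def gibbs_vc(y, sigma2_u, sigma2_e, iters, centred, seed=0):
    """Gibbs sampler for y_ij = beta0 + u_j + e_ij with KNOWN variances
    and a flat prior on beta0.  y[j] lists the observations in group j.
    eff[j] holds u_j (plain) or u*_j = beta0 + u_j (centred).
    Returns the beta0 chain."""
    rng = random.Random(seed)
    J = len(y)
    N = sum(len(yj) for yj in y)
    beta0, eff = 0.0, [0.0] * J
    chain = []
    for _ in range(iters):
        for j, yj in enumerate(y):
            V = 1.0 / (1.0 / sigma2_u + len(yj) / sigma2_e)
            if centred:   # u*_j | beta0, y ~ N(V(beta0/s2u + sum(y)/s2e), V)
                m = V * (beta0 / sigma2_u + sum(yj) / sigma2_e)
            else:         # u_j | beta0, y ~ N(V sum(y - beta0)/s2e, V)
                m = V * sum(yi - beta0 for yi in yj) / sigma2_e
            eff[j] = rng.gauss(m, math.sqrt(V))
        if centred:       # beta0 | u* ~ N(mean(u*), s2u/J)
            beta0 = rng.gauss(sum(eff) / J, math.sqrt(sigma2_u / J))
        else:             # beta0 | u, y ~ N(mean(y - u), s2e/N)
            resid = sum(yi - eff[j] for j, yj in enumerate(y) for yi in yj)
            beta0 = rng.gauss(resid / N, math.sqrt(sigma2_e / N))
        chain.append(beta0)
    return chain
```

On simulated data with many observations per group, the lag-1 autocorrelation of the β0 chain is close to 1 for the plain parameterisation but close to 0 for the centred one, which is exactly the mixing gain the slide describes.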

Page 20.

Hierarchical centering in WinBUGS

model {
# Level 1 definition
for (i in 1:N) {
    fscr[i] ~ dbin(p[i], denom[i])
    logit(p[i]) <- beta[2] * ai[i] + u2[cow[i]]
}
# Higher level definitions
for (j in 1:n2) {
    u2[j] ~ dnorm(u3[herd[j]], tau.u2)
}
for (j in 1:n3) {
    u3[j] ~ dnorm(beta[1], tau.u3)
}
# Priors for fixed effects
for (k in 1:2) { beta[k] ~ dflat() }
# Priors for random terms
tau.u2 ~ dgamma(0.001, 0.001)
sigma2.u2 <- 1/tau.u2
tau.u3 ~ dgamma(0.001, 0.001)
sigma2.u3 <- 1/tau.u3
}

It is easy to modify the code in WinBUGS to fit a hierarchically centred model, as shown to the left. Note that the main difficulty is that, for a 3-level model, the mapping vector herd must now map from cows to herds rather than from observations to herds.

This means changing the datafile.
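Building the cow-to-herd mapping from the observation-level identifiers is a one-off data manipulation; a sketch (the function name and example data are my own, hypothetical):

```python
def cow_level_herd(cow, herd):
    """Collapse observation-level identifiers to a cow-level mapping:
    element j of the result is the herd of cow j+1 (WinBUGS indexes
    from 1).  Assumes each cow belongs to exactly one herd."""
    mapping = {}
    for c, h in zip(cow, herd):
        if c in mapping and mapping[c] != h:
            raise ValueError("cow %s appears in two herds" % c)
        mapping[c] = h
    return [mapping[c] for c in sorted(mapping)]

# e.g. five observations on three cows in two herds (made-up data):
# cow_level_herd([1, 1, 2, 3, 3], [1, 1, 1, 2, 2])  ->  [1, 1, 2]
```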

Page 21.

Trace plot for hierarchical centred formulation

Unfortunately this doesn’t improve things much!

The parameter expansion method also discussed in Browne (2004) may work better here and may be worth examining.

Page 22.

Estimates comparison

In the following table we compare estimates after 50k (100k for MLwiN) iterations from 3 methods and see reasonable agreement.

Parameter   MLwiN            WinBUGS          H.C.
β0           0.563 (0.125)    0.560 (0.127)    0.570 (0.131)
β1          -1.014 (0.130)   -1.007 (0.130)   -1.021 (0.131)
σ²v          0.094 (0.043)    0.094 (0.043)    0.096 (0.044)
σ²u          0.229 (0.137)    0.202 (0.139)    0.231 (0.127)

Page 23.

Probit Regression

The logistic link function is only one possible link function for Binomial data.

Any function that maps from (0,1) to the whole real line will do, and another popular choice is the probit link (the inverse of the Normal cdf).

This link, combined with the Normal distribution, can work to our advantage and allows another way of using Gibbs sampling for a binary-data model: through the use of latent variables.

Page 24.

Latent variable approach

One source of binary data is the thresholding of a continuous response; for example, in education, students take exams and, rather than a mark, we observe only whether the student passes or fails.

The latent variable approach works like the reverse of this: we observe the binary response and from it generate the underlying continuous variable.

Page 25.

Simple example

Consider the random effects probit regression model:

This model is equivalent to the following Normal response model:

where y*_ij is unobserved but is restricted to positive values when y_ij = 1 and to negative values when y_ij = 0.

Probit model:  y_ij ~ Bernoulli(p_ij),  probit(p_ij) = X_ij β + u_j,  u_j ~ N(0, σ²_u)

Latent variable form:  y*_ij = X_ij β + u_j + e_ij,  u_j ~ N(0, σ²_u),  e_ij ~ N(0, 1)

Page 26.

Gibbs Sampling algorithm

We then have four steps for our latent variable model:

1. Generate y*_ij from its truncated Normal conditional distribution for all i and j.

2. Generate β from its (multivariate) Normal conditional distribution.

3. Generate u_j from its Normal conditional distribution for each j.

4. Generate σ²_u from its inverse Gamma conditional distribution.
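Steps 1 and 2 can be sketched for a probit regression without random effects, in the style of Albert and Chib's data augmentation sampler. This is my own illustrative code with made-up names: it uses flat priors and two single-site Normal updates for the coefficients rather than one multivariate draw, and omits steps 3 and 4.

```python
import math, random
from statistics import NormalDist

def probit_gibbs(x, y, iters, seed=0):
    """Gibbs sampler for probit(p_i) = b0 + b1*x_i with flat priors,
    via latent variables ystar_i ~ N(b0 + b1*x_i, 1) truncated to be
    positive when y_i = 1 and negative when y_i = 0."""
    rng = random.Random(seed)
    nd = NormalDist()
    n, sx2 = len(x), sum(xi * xi for xi in x)
    b0, b1 = 0.0, 0.0
    chain = []
    for _ in range(iters):
        # Step 1: latent variables from truncated Normal conditionals,
        # drawn by inverting the Normal CDF on the allowed interval.
        ystar = []
        for xi, yi in zip(x, y):
            mu = b0 + b1 * xi
            lo = nd.cdf(-mu)        # P(ystar < 0 | mu)
            u = lo + (1.0 - lo) * rng.random() if yi == 1 else lo * rng.random()
            ystar.append(mu + nd.inv_cdf(u))
        # Step 2: coefficients from their Normal conditionals
        # (unit residual variance, so precisions are n and sum(x^2)).
        b0 = rng.gauss(sum(ys - b1 * xi for ys, xi in zip(ystar, x)) / n,
                       math.sqrt(1.0 / n))
        b1 = rng.gauss(sum(xi * (ys - b0) for ys, xi in zip(ystar, x)) / sx2,
                       math.sqrt(1.0 / sx2))
        chain.append((b0, b1))
    return chain
```

Every conditional here is a standard distribution, which is exactly why the probit formulation allows pure Gibbs sampling where the logit model does not.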

Page 27.

Probit model for 2 level reunion island dataset (demo)

Using Gibbs in MLwiN:

Page 28.

Trace plots for only 5k iterations

Page 29.

Probit for 3 level reunion island dataset

• Still not great, but the ESS is up to 206!

Page 30.

Introduction to the Practical

In the next practical you will have the chance to explore the possible methods from this lecture on the two datasets from the last practical:

• The Bangladesh contraceptive use dataset.

• The pig pneumonia dataset.