
Bayesian Inference

Chapter 4: Regression and Hierarchical Models

Conchi Ausín and Mike Wiper

Department of Statistics

Universidad Carlos III de Madrid

Master in Business Administration and Quantitative Methods
Master in Mathematical Engineering


Objective

[Photos: A. F. M. Smith and Dennis Lindley]

We analyze the Bayesian approach to fitting normal and generalized linear models and introduce the Bayesian hierarchical modeling approach. We also study the modeling and forecasting of time series.


Contents

1 Normal linear models

1.1. ANOVA model

1.2. Simple linear regression model

2 Generalized linear models

3 Hierarchical models

4 Dynamic models


Normal linear models

A normal linear model is of the following form:

y = Xθ + ε,

where y = (y1, . . . , yn)ᵀ is the observed data, X is a known n × k matrix, called the design matrix, θ = (θ1, . . . , θk)ᵀ is the parameter vector and ε follows a multivariate normal distribution. Usually, it is assumed that:

ε ∼ N(0_n, (1/φ) I_n).

A simple example of a normal linear model is the simple linear regression model, where X = (1 | x) is the n × 2 matrix whose first column is a vector of ones and whose second column is (x1, x2, . . . , xn)ᵀ, and θ = (α, β)ᵀ.


Normal linear models

Consider a normal linear model, y = Xθ + ε. A conjugate prior distribution is a normal-gamma distribution:

θ | φ ∼ N(m, (1/φ)V),    φ ∼ G(a/2, b/2).

Then, the posterior distribution given y is also a normal-gamma distribution with:

m* = (XᵀX + V⁻¹)⁻¹ (Xᵀy + V⁻¹m)

V* = (XᵀX + V⁻¹)⁻¹

a* = a + n

b* = b + yᵀy + mᵀV⁻¹m − m*ᵀV*⁻¹m*
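These updating formulas translate directly into a few lines of linear algebra. Below is a minimal NumPy sketch (the function name and interface are ours, not from the slides); m, V, a, b are the prior hyperparameters and the returns are their starred posterior counterparts.

```python
import numpy as np

def normal_gamma_update(X, y, m, V, a, b):
    """Posterior hyperparameters for the conjugate normal-gamma prior
    theta | phi ~ N(m, V/phi), phi ~ G(a/2, b/2)."""
    V_inv = np.linalg.inv(V)
    V_star = np.linalg.inv(X.T @ X + V_inv)
    m_star = V_star @ (X.T @ y + V_inv @ m)
    a_star = a + len(y)
    # V*^{-1} = X'X + V^{-1}, so reuse it rather than inverting again
    b_star = b + y @ y + m @ V_inv @ m - m_star @ (X.T @ X + V_inv) @ m_star
    return m_star, V_star, a_star, b_star
```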


Normal linear models

The posterior mean is given by:

E[θ | y] = (XᵀX + V⁻¹)⁻¹ (Xᵀy + V⁻¹m)
         = (XᵀX + V⁻¹)⁻¹ (XᵀX (XᵀX)⁻¹Xᵀy + V⁻¹m)
         = (XᵀX + V⁻¹)⁻¹ (XᵀX θ̂ + V⁻¹m),

where θ̂ = (XᵀX)⁻¹Xᵀy is the maximum likelihood estimator.

Thus, this expression may be interpreted as a weighted average of the prior estimator, m, and the MLE, θ̂, with weights proportional to their precisions since, conditional on φ, the prior variance is (1/φ)V and the distribution of the MLE from the classical viewpoint is θ̂ | φ ∼ N(θ, (1/φ)(XᵀX)⁻¹).


Normal linear models

Consider a normal linear model, y = Xθ + ε, and assume the limiting prior distribution:

p(θ, φ) ∝ 1/φ.

Then, we have that:

θ | y, φ ∼ N(θ̂, (1/φ)(XᵀX)⁻¹),

φ | y ∼ G((n − k)/2, (yᵀy − θ̂ᵀ(XᵀX)θ̂)/2).

Note that σ̂² = (yᵀy − θ̂ᵀ(XᵀX)θ̂)/(n − k) is the usual classical estimator of σ² = 1/φ.

In this case, Bayesian credible intervals, estimators etc. will coincide with their classical counterparts.
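Under this limiting prior, everything is available in closed form from ordinary least squares quantities. A short sketch (names are ours):

```python
import numpy as np

def reference_prior_posterior(X, y):
    """Posterior under p(theta, phi) ∝ 1/phi: returns the MLE theta_hat
    (= posterior mean), the scaled covariance (X'X)^{-1}, the two gamma
    parameters of phi | y, and the classical variance estimate."""
    n, k = X.shape
    XtX = X.T @ X
    theta_hat = np.linalg.solve(XtX, X.T @ y)     # MLE = posterior mean
    rss = y @ y - theta_hat @ XtX @ theta_hat     # residual sum of squares
    sigma2_hat = rss / (n - k)                    # classical estimate of 1/phi
    return theta_hat, np.linalg.inv(XtX), (n - k) / 2, rss / 2, sigma2_hat
```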


ANOVA model

The ANOVA model is an example of a normal linear model where:

yij = θi + εij,

with εij ∼ N(0, 1/φ), for i = 1, . . . , k and j = 1, . . . , ni.

Thus, the parameters are θ = (θ1, . . . , θk)ᵀ, the observed data are y = (y11, . . . , y1n1, y21, . . . , y2n2, . . . , yk1, . . . , yknk)ᵀ, and the design matrix is block diagonal:

X = diag(1_{n1}, 1_{n2}, . . . , 1_{nk}),

where 1_{ni} denotes a column vector of ni ones.


ANOVA model

Assume conditionally independent normal priors, θi ∼ N(mi, 1/(αiφ)), for i = 1, . . . , k, and a gamma prior φ ∼ G(a/2, b/2).

This corresponds to a normal-gamma prior distribution for (θ, φ) where m = (m1, . . . , mk)ᵀ and V = diag(1/α1, . . . , 1/αk).

Then, it is obtained that:

θ | y, φ ∼ N( ((n1ȳ1· + α1m1)/(n1 + α1), . . . , (nkȳk· + αkmk)/(nk + αk))ᵀ, (1/φ) diag(1/(α1 + n1), . . . , 1/(αk + nk)) )

and

φ | y ∼ G( (a + n)/2, (b + ∑_{i=1}^k ∑_{j=1}^{ni} (yij − ȳi·)² + ∑_{i=1}^k (niαi/(ni + αi)) (ȳi· − mi)²)/2 ).


ANOVA model

If we assume alternatively the reference prior, p(θ, φ) ∝ 1/φ, we have:

θ | y, φ ∼ N( (ȳ1·, . . . , ȳk·)ᵀ, (1/φ) diag(1/n1, . . . , 1/nk) ),

φ | y ∼ G( (n − k)/2, (n − k)σ̂²/2 ),

where σ̂² = (1/(n − k)) ∑_{i=1}^k ∑_{j=1}^{ni} (yij − ȳi·)² is the classical variance estimate for this problem.

A 95% posterior interval for θ1 − θ2 is given by:

ȳ1· − ȳ2· ± σ̂ √(1/n1 + 1/n2) t_{n−k}(0.975),

which is equal to the usual classical interval.
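A minimal sketch of this interval in NumPy/SciPy, assuming the treatment groups are held as a list of 1-d arrays (the function name is ours):

```python
import numpy as np
from scipy import stats

def anova_interval(groups, level=0.95):
    """Reference-prior interval for theta_1 - theta_2 in the one-way
    ANOVA model; groups is a list of 1-d arrays, one per treatment."""
    n, k = sum(len(g) for g in groups), len(groups)
    sigma2 = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n - k)
    diff = groups[0].mean() - groups[1].mean()
    half = (stats.t.ppf(0.5 + level / 2, n - k)
            * np.sqrt(sigma2 * (1 / len(groups[0]) + 1 / len(groups[1]))))
    return diff - half, diff + half
```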


Example: ANOVA model

Suppose that an ecologist is interested in analysing how the masses of starlings (a type of bird) vary between four locations.

Sample data on the weights of 10 starlings from each of the four locations can be downloaded from: http://arcue.botany.unimelb.edu.au/bayescode.html.

Assume a Bayesian one-way ANOVA model for these data, where a different mean is considered for each location and the variation in mass between different birds is described by a normal distribution with a common variance.

Compare the results with those obtained with classical methods.


Simple linear regression model

Another example of a normal linear model is the simple linear regression model:

yi = α + βxi + εi,

for i = 1, . . . , n, where εi ∼ N(0, 1/φ).

Suppose that we use the limiting prior:

p(α, β, φ) ∝ 1/φ.


Simple linear regression model

Then, we have that:

(α, β)ᵀ | y, φ ∼ N( (α̂, β̂)ᵀ, (1/(φ n sx)) [ ∑_{i=1}^n xi²  −nx̄ ; −nx̄  n ] ),

φ | y ∼ G( (n − 2)/2, sy(1 − r²)/2 ),

where:

α̂ = ȳ − β̂x̄,  β̂ = sxy/sx,

sx = ∑_{i=1}^n (xi − x̄)²,  sy = ∑_{i=1}^n (yi − ȳ)²,

sxy = ∑_{i=1}^n (xi − x̄)(yi − ȳ),  r = sxy/√(sx sy),  σ̂² = sy(1 − r²)/(n − 2).
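The posterior depends on the data only through these summary statistics, which is easy to make concrete in code (a sketch; the function name is ours):

```python
import numpy as np

def slr_summaries(x, y):
    """Summary statistics and reference-prior posterior quantities for the
    simple linear regression model y_i = alpha + beta * x_i + eps_i."""
    n = len(x)
    sx = ((x - x.mean()) ** 2).sum()
    sy = ((y - y.mean()) ** 2).sum()
    sxy = ((x - x.mean()) * (y - y.mean())).sum()
    beta_hat = sxy / sx
    alpha_hat = y.mean() - beta_hat * x.mean()
    r = sxy / np.sqrt(sx * sy)
    sigma2_hat = sy * (1 - r ** 2) / (n - 2)   # classical residual variance
    return alpha_hat, beta_hat, r, sigma2_hat, sx, sy
```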


Simple linear regression model

Thus, the marginal distributions of α and β are Student-t distributions:

(α − α̂) / √(σ̂² ∑_{i=1}^n xi² / (n sx)) | y ∼ t_{n−2},

(β − β̂) / √(σ̂²/sx) | y ∼ t_{n−2}.

Therefore, for example, a 95% credible interval for β is given by:

β̂ ± (σ̂/√sx) t_{n−2}(0.975),

equal to the usual classical interval.


Simple linear regression model

Suppose now that we wish to predict a future observation:

ynew = α + βxnew + εnew.

Note that:

E[ynew | φ, y] = α̂ + β̂xnew,

V[ynew | φ, y] = (1/φ) ( (∑_{i=1}^n xi² + nxnew² − 2nx̄xnew)/(n sx) + 1 )
              = (1/φ) ( (sx + nx̄² + nxnew² − 2nx̄xnew)/(n sx) + 1 ).

Therefore,

ynew | φ, y ∼ N( α̂ + β̂xnew, (1/φ) ( (x̄ − xnew)²/sx + 1/n + 1 ) ).


Simple linear regression model

And then,

(ynew − α̂ − β̂xnew) / (σ̂ √( (x̄ − xnew)²/sx + 1/n + 1 )) | y ∼ t_{n−2},

leading to the following 95% credible interval for ynew:

α̂ + β̂xnew ± σ̂ √( (x̄ − xnew)²/sx + 1/n + 1 ) t_{n−2}(0.975),

which coincides with the usual classical interval.
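A sketch of this predictive interval in NumPy/SciPy, following the formulas above (the function name is ours):

```python
import numpy as np
from scipy import stats

def predictive_interval(x, y, x_new, level=0.95):
    """Credible (= classical prediction) interval for a new observation
    y_new at x_new under the reference prior."""
    n = len(x)
    sx = ((x - x.mean()) ** 2).sum()
    beta = ((x - x.mean()) * (y - y.mean())).sum() / sx
    alpha = y.mean() - beta * x.mean()
    sigma2 = ((y - alpha - beta * x) ** 2).sum() / (n - 2)
    centre = alpha + beta * x_new
    half = (stats.t.ppf(0.5 + level / 2, n - 2)
            * np.sqrt(sigma2 * ((x.mean() - x_new) ** 2 / sx + 1 / n + 1)))
    return centre - half, centre + half
```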


Example: Simple linear regression model

Consider the data file prostate.data that can be downloaded from: http://statweb.stanford.edu/~tibs/ElemStatLearn/.

This includes, among other clinical measures, the level of prostate-specific antigen on the log scale (lpsa) and the log cancer volume (lcavol) for 97 men who were about to receive a radical prostatectomy.

Use a Bayesian linear regression model to predict lpsa in terms of lcavol.

Compare the results with a classical linear regression fit.


Generalized linear models

The generalized linear model generalizes the normal linear model by allowing the possibility of non-normal error distributions and by allowing for a non-linear relationship between y and x.

A generalized linear model is specified by two functions:

1 A conditional, exponential family density function of y given x, parameterized by a mean parameter, µ = µ(x) = E[Y | x], and (possibly) a dispersion parameter, φ > 0, that is independent of x.

2 A (one-to-one) link function, g(·), which relates the mean, µ = µ(x), to the covariate vector, x, as g(µ) = xθ.


Generalized linear models

The following are generalized linear models with the canonical link function, which is the natural parameterization that puts the exponential family density in canonical form.

A logistic regression is often used for predicting the occurrence of an event given covariates:

Yi | pi ∼ Bin(ni, pi),
log(pi/(1 − pi)) = xiθ.

A Poisson regression is used for predicting the number of events in a time period given covariates:

Yi | λi ∼ P(λi),
log λi = xiθ.


Generalized linear models

The Bayesian specification of a GLM is completed by defining (typically normal or normal-gamma) prior distributions p(θ, φ) over the unknown model parameters.

As with standard linear models, when improper priors are used, it is then important to check that these lead to valid posterior distributions.

Clearly, these models will not have conjugate posterior distributions, but, usually, they are easily handled by Gibbs sampling.

In particular, the posterior distributions from these models are usually log-concave and are thus easily sampled via adaptive rejection sampling.


Example: A logistic regression model

The O-ring data consist of 23 observations on pre-Challenger Space Shuttle launches.

On each launch, it is observed whether there is at least one O-ring failure, and the temperature at launch.

The goal is to model the probability of at least one O-ring failure as a function of temperature.

Temperatures were 53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70, 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81.

Failures occurred at 53, 57, 58, 63, 70, 70, 75.
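With a flat prior on θ = (θ0, θ1), a basic random-walk Metropolis sampler is enough to fit this logistic regression. The sketch below hard-codes the data from the slide (which of the tied temperatures failed is arbitrary here, since only the counts enter the likelihood); the proposal scales and seed are our choices.

```python
import numpy as np

temp = np.array([53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81.0])
fail = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1,
                 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0.0])  # 7 failures in total

X = np.column_stack([np.ones_like(temp), temp])

def log_post(b):
    """Bernoulli log-likelihood (flat prior), with a stable log(1 + e^eta)."""
    eta = X @ b
    return np.sum(fail * eta - np.logaddexp(0.0, eta))

rng = np.random.default_rng(42)
b, draws = np.zeros(2), []
for _ in range(20000):
    prop = b + rng.normal(0.0, [2.0, 0.03])          # random-walk proposal
    if np.log(rng.uniform()) < log_post(prop) - log_post(b):
        b = prop
    draws.append(b)
draws = np.array(draws[5000:])                       # discard burn-in
print(draws.mean(axis=0))                            # posterior means of (theta_0, theta_1)
```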


Example: A logistic regression model

The table shows the relationship, for 64 infants, between the gestational age of the infant (in weeks) at the time of birth (x) and whether the infant was breast feeding at the time of release from hospital (y).

x          28  29  30  31  32  33
#{y = 0}    4   3   2   2   4   1
#{y = 1}    2   2   7   7  16  14

Let xi represent the gestational age and ni the number of infants with this age. Then we can model the probability that yi infants were breast feeding at the time of release from hospital via a standard binomial regression model.


Hierarchical models

Suppose we have data, x, and a likelihood function f(x | θ) where the parameter values θ = (θ1, . . . , θk) are judged to be exchangeable, that is, any permutation of them has the same distribution.

In this situation, it makes sense to consider a multilevel model, assuming a prior distribution, f(θ | φ), which depends upon a further, unknown hyperparameter, φ, and to use a hyperprior distribution, f(φ).

In theory, this process could continue further, using hyper-hyperprior distributions on the parameters of the hyperprior distributions. This gives a structured way of eliciting the prior distribution.

One alternative is to estimate the hyperparameter using classical methods, which is known as empirical Bayes. A point estimate φ̂ is then plugged in to approximate the posterior distribution; however, the uncertainty about φ is ignored.


Hierarchical models

In most hierarchical models, the joint posterior distribution,

f(θ, φ | x) ∝ f(x | θ) f(θ | φ) f(φ),

will not be analytically tractable.

However, often a Gibbs sampling approach can be implemented by sampling from the conditional posterior distributions:

f(θ | x, φ) ∝ f(x | θ) f(θ | φ),

f(φ | x, θ) ∝ f(θ | φ) f(φ).

It is important to check the propriety of the posterior distribution when improper hyperprior distributions are used. An alternative (as in, for example, WinBUGS) is to use proper but high-variance hyperprior distributions.


Hierarchical models

For example, a hierarchical normal linear model is given by:

xij | θi, φ ∼ N(θi, 1/φ), i = 1, . . . , n, j = 1, . . . , m.

Assuming that the means, θi, are exchangeable, we may consider the following prior distribution:

θi | µ, ψ ∼ N(µ, 1/ψ),

where the hyperparameters are µ and ψ.
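The full conditionals of this model are all standard, so Gibbs sampling is immediate. A minimal sketch, assuming (our choice, not the slides') a flat prior on µ and G(a, b) priors on φ and ψ:

```python
import numpy as np

def gibbs_hier_normal(x, n_iter=5000, a=0.1, b=0.1, rng=None):
    """Gibbs sampler for x_ij | theta_i, phi ~ N(theta_i, 1/phi) and
    theta_i | mu, psi ~ N(mu, 1/psi); x is an (n, m) array."""
    rng = rng or np.random.default_rng(0)
    n, m = x.shape
    theta, mu, phi, psi = x.mean(axis=1), x.mean(), 1.0, 1.0
    out = []
    for _ in range(n_iter):
        prec = phi * m + psi                         # conditional precision of theta_i
        theta = rng.normal((phi * m * x.mean(axis=1) + psi * mu) / prec,
                           1 / np.sqrt(prec))
        mu = rng.normal(theta.mean(), 1 / np.sqrt(n * psi))   # flat prior on mu
        phi = rng.gamma(a + n * m / 2,
                        1 / (b + ((x - theta[:, None]) ** 2).sum() / 2))
        psi = rng.gamma(a + n / 2, 1 / (b + ((theta - mu) ** 2).sum() / 2))
        out.append((theta.copy(), mu, phi, psi))
    return out
```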


Example: A hierarchical one-way ANOVA

Suppose that 5 individuals take 3 different IQ tests developed by 3 different psychologists, obtaining the following results:

         1    2    3    4    5
Test 1  106  121  159   95   78
Test 2  108  113  158   91   80
Test 3   98  115  169   93   77

Then, we can assume that:

Xij | θi, φ ∼ N(θi, 1/φ),

θi | µ, ψ ∼ N(µ, 1/ψ),

for i = 1, . . . , 5 and j = 1, 2, 3, where θi represents the true IQ of subject i and µ the mean true IQ in the population.


Example: A hierarchical Poisson model

The number of failures, Xi, at power plant i is assumed to follow a Poisson distribution:

Xi | λi ∼ P(λi ti), for i = 1, . . . , 10,

where λi is the failure rate for pump i and ti is the length of operation time of the pump (in thousands of hours). It seems natural to assume that the failure rates are exchangeable, and thus we might assume:

λi | γ ∼ E(γ),

where γ is the prior hyperparameter. The observed data are:

Pump   1     2     3     4    5     6     7     8    9    10
ti    94.5  15.7  62.9  126   5.24  31.4  1.05  1.05  2.1  10.5
xi     5     1     5    14    3    19     1     1    4    22
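Because the exponential prior is conjugate for the Poisson rates, and a gamma hyperprior on γ (our assumption; the slides leave γ's prior unspecified) is conjugate in turn, the Gibbs sampler has two exact steps:

```python
import numpy as np

t = np.array([94.5, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5])
x = np.array([5, 1, 5, 14, 3, 19, 1, 1, 4, 22])

def gibbs_pumps(n_iter=10000, c=0.1, d=1.0, rng=None):
    """Gibbs sampler: lambda_i | gamma, x ~ G(x_i + 1, t_i + gamma) and,
    under an assumed G(c, d) hyperprior,
    gamma | lambda ~ G(c + 10, d + sum(lambda))."""
    rng = rng or np.random.default_rng(0)
    gamma, draws = 1.0, []
    for _ in range(n_iter):
        lam = rng.gamma(x + 1, 1 / (t + gamma))       # exact conditional draws
        gamma = rng.gamma(c + len(x), 1 / (d + lam.sum()))
        draws.append((lam, gamma))
    return draws
```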


Dynamic models

The univariate normal dynamic linear model (DLM) is:

yt = Ftθt + νt , νt ∼ N (0,Vt)

θt = Gtθt−1 + ωt , ωt ∼ N (0,Wt).

These models are linear state space models, where xt = Ftθt represents the signal, θt is the state vector, Ft is a regression vector and Gt is a state matrix.

The usual features of a time series, such as trend and seasonality, can be modeled within this format.

If the matrices Ft, Gt, Vt and Wt are constant, the model is said to be time invariant.


Dynamic models

One of the simplest DLMs is the random walk plus noise model, also called the first-order polynomial model. It is used to model univariate observations, and the state vector is unidimensional:

yt = θt + νt, νt ∼ N(0, Vt),

θt = θ_{t−1} + ωt, ωt ∼ N(0, Wt).

This is a slowly varying level model where the observations fluctuate around a mean which varies according to a random walk.

Assuming known variances, Vt and Wt, a straightforward Bayesian analysis can be carried out as follows.


Dynamic models

Suppose that the information at time t − 1 is y^{t−1} = {y1, y2, . . . , y_{t−1}} and assume that:

θ_{t−1} | y^{t−1} ∼ N(m_{t−1}, C_{t−1}).

Then, we have that:

The prior distribution for θt is:

θt | y^{t−1} ∼ N(m_{t−1}, Rt),

where Rt = C_{t−1} + Wt.

The one-step-ahead predictive distribution for yt is:

yt | y^{t−1} ∼ N(m_{t−1}, Qt),

where Qt = Rt + Vt.


Dynamic models

The joint distribution of θt and yt is:

(θt, yt)ᵀ | y^{t−1} ∼ N( (m_{t−1}, m_{t−1})ᵀ, [ Rt  Rt ; Rt  Qt ] ).

The posterior distribution for θt given y^t = {y^{t−1}, yt} is:

θt | y^t ∼ N(mt, Ct), where

mt = m_{t−1} + At et,

At = Rt/Qt,

et = yt − m_{t−1},

Ct = Rt − At² Qt.

Note that et is simply a prediction error term. The posterior mean formula can also be written as:

mt = (1 − At) m_{t−1} + At yt.
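These recursions are the Kalman filter for this model. A compact sketch for known V and W (the vague initial prior m0 = 0, C0 = 10⁷ is our choice):

```python
import numpy as np

def rw_plus_noise_filter(y, V, W, m0=0.0, C0=1e7):
    """Forward filtering for y_t = theta_t + nu_t, theta_t = theta_{t-1} + w_t
    with known variances V and W; returns filtered means and variances."""
    m, C = m0, C0
    ms, Cs = [], []
    for yt in y:
        R = C + W                 # prior variance of theta_t
        Q = R + V                 # predictive variance of y_t
        A = R / Q                 # adaptive coefficient
        m = m + A * (yt - m)      # m_t = (1 - A_t) m_{t-1} + A_t y_t
        C = R - A ** 2 * Q        # posterior variance C_t
        ms.append(m)
        Cs.append(C)
    return np.array(ms), np.array(Cs)
```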


Example: First order polynomial DLM

Assume a slowly varying level model for the water level in Lake Huron, with known variances Vt = 1 and Wt = 1.

1 Estimate the filtered values of the state vector based on the observations up to time t, from f(θt | y^t).

2 Estimate the predicted values of the state vector based on the observations up to time t − 1, from f(θt | y^{t−1}).

3 Estimate the predicted values of the signal based on the observations up to time t − 1, from f(yt | y^{t−1}).

4 Compare the results using, e.g.: Vt = 10 and Wt = 1; Vt = 1 and Wt = 10.


Dynamic models

When the variances are not known, Bayesian inference for the system is more complex.

One possibility is the use of MCMC algorithms, which are usually based on the so-called forward filtering backward sampling (FFBS) algorithm.

1 The forward filtering step is the standard normal linear analysis giving f(θt | y^t) at each t, for t = 1, . . . , T.

2 The backward sampling step uses the Markov property: sample θ*_T from f(θT | y^T) and then, for t = T − 1, . . . , 1, sample from f(θt | y^t, θ*_{t+1}).

Thus, a sample from the posterior parameter structure is generated, as sketched below.
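For the random walk plus noise model the backward step also has a closed form, so FFBS is a few lines. A sketch reusing rw_plus_noise_filter from the previous code block (the conditional mean and variance follow from standard normal conjugacy):

```python
def ffbs(y, V, W, rng=None):
    """Forward filtering backward sampling: draws one path theta_1..T from
    its joint posterior in the random walk plus noise model."""
    rng = rng or np.random.default_rng(0)
    ms, Cs = rw_plus_noise_filter(y, V, W)
    T = len(y)
    theta = np.empty(T)
    theta[-1] = rng.normal(ms[-1], np.sqrt(Cs[-1]))
    for t in range(T - 2, -1, -1):
        H = 1.0 / (1.0 / Cs[t] + 1.0 / W)            # var(theta_t | y^t, theta_{t+1})
        h = H * (ms[t] / Cs[t] + theta[t + 1] / W)   # corresponding mean
        theta[t] = rng.normal(h, np.sqrt(H))
    return theta
```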

However, MCMC may be computationally very expensive for on-line estimation. One possible alternative is the use of particle filters.


Dynamic models

Other examples of DLM are the following:

A dynamic linear regression model is given by:

yt = Ftθt + νt , νt ∼ N (0,Vt)

θt = θt−1 + ωt , ωt ∼ N (0,Wt).

The AR(p) model with time-varying coefficients takes the form:

yt = θ0t + θ1t y_{t−1} + . . . + θpt y_{t−p} + νt,  θit = θ_{i,t−1} + ωit.

This model can be expressed in state space form by setting θt = (θ0t, . . . , θpt)ᵀ and Ft = (1, y_{t−1}, . . . , y_{t−p}).


Dynamic models

The additive structure of DLMs makes it easy to think of an observed series as originating from the sum of different components, e.g.,

yt = y1t + . . . + yht,

where y1t might represent a trend component, y2t a seasonal component, and so on. Then, each component, yit, might be described by a different DLM:

yit = Fit θit + νit, νit ∼ N(0, Vit),

θit = Git θ_{i,t−1} + ωit, ωit ∼ N(0, Wit).

By the assumption of independence of the components, yt is also described by a DLM with:

Ft = (F1t | . . . | Fht), Vt = V1t + . . . + Vht,

and

Gt = blockdiag(G1t, . . . , Ght), Wt = blockdiag(W1t, . . . , Wht).
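The composition rule is mechanical: stack the Fit side by side, sum the observation variances, and place the Git and Wit in diagonal blocks. A small sketch with two generic placeholder components (all the matrices below are illustrative values, not from the slides):

```python
import numpy as np
from scipy.linalg import block_diag

# Placeholder components: a local linear trend and a second 2-d component.
F1, G1 = np.array([[1.0, 0.0]]), np.array([[1.0, 1.0], [0.0, 1.0]])
W1, V1 = 0.1 * np.eye(2), 1.0
F2, G2 = np.array([[1.0, 0.0]]), np.eye(2)
W2, V2 = 0.2 * np.eye(2), 0.5

F = np.hstack([F1, F2])       # F_t = (F_1t | F_2t)
G = block_diag(G1, G2)        # block-diagonal state matrix
W = block_diag(W1, W2)        # block-diagonal state noise covariance
V = V1 + V2                   # observation variances add
```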
