Applied Bayesian Inference, KSU, April 29, 2012
§❻ Hierarchical (Multi-Stage) Generalized Linear Models
Robert J. Tempelman
Introduction
• Some inferential problems require non-classical approaches, e.g.:
  – Heterogeneous variances and covariances across environments.
  – Different distributional forms (e.g., heavy-tailed or mixtures for residual/random effects).
  – High-dimensional variable selection models.
• Hierarchical Bayesian modeling provides some flexibility for such problems.
Heterogeneous variance models (Kizilkaya and Tempelman, 2005)
• Consider a study involving different subclasses (e.g., herds).
  – Mean responses are different.
  – But suppose residual variances are different too.
• Let's discuss this in the context of the LMM (linear mixed model).
Recall the linear mixed model
• Given:
    y = Xβ + Zu + e,   e ~ N(0, R(ξ)),
  so that
    p(y | β, u, ξ) = N(y | Xβ + Zu, R(ξ)).
• R(ξ) has a certain "heteroskedastic" specification.
• ξ determines the nature of the heterogeneous residual variances.
Modeling Heterogeneous Variances
• Suppose e = (e₁₁′, e₁₂′, …, e_st′)′ with
    e_kl ~ N(0, R_kl(ξ)),   R_kl(ξ) = I_{n_kl} σ²_e_kl,
    σ²_e_kl = g_k v_l σ²_e;   k = 1, 2, …, s;   l = 1, 2, …, t,
  where
  – σ²_e is a "fixed" intercept residual variance,
  – g_k > 0 is the kth fixed scaling effect,
  – v_l > 0 is the lth random scaling effect.
Subjective and Structural Priors
• "Intercept" variance σ²_e: subjective flat or conjugate vague inverted-gamma (IG) prior, σ²_e ~ p(σ²_e).
• Invoke typical constraints for "fixed effects":
  – Corner parameterization: g_s = 1.
  – Flat or vague IG prior p(g_k); k = 1, 2, …, s − 1.
• Structural prior for "random effects": v_l ~ IG(α_e, α_e − 1), i.e.,
    p(v_l | α_e) = [(α_e − 1)^α_e / Γ(α_e)] v_l^−(α_e+1) exp(−(α_e − 1)/v_l)
  – E(v_l) = 1;  Var(v_l) = 1/(α_e − 2);  CV(v_l | α_e) = 1/√(α_e − 2).
• α_e functions like a "variance component" for residual variances → hyperparameter.
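The moments of this structural prior are easy to check numerically. Below is a small Monte Carlo sketch (my own illustration, not code from the workshop): a draw from IG(α_e, α_e − 1) is the reciprocal of a Gamma(α_e, rate = α_e − 1) draw, and with α_e = 5 the theory gives E(v_l) = 1, Var(v_l) = 1/3, and CV ≈ 0.577.

```python
# Monte Carlo check of the structural prior v_l ~ IG(alpha_e, alpha_e - 1).
# If X ~ Gamma(shape = alpha_e, rate = 1), then (alpha_e - 1)/X ~ IG(alpha_e, alpha_e - 1).
import numpy as np

rng = np.random.default_rng(1)
alpha_e = 5.0
x = rng.gamma(shape=alpha_e, scale=1.0, size=1_000_000)
v = (alpha_e - 1.0) / x                       # inverted-gamma draws, scale alpha_e - 1

print(round(float(v.mean()), 2))              # theory: E(v) = 1
print(round(float(v.var()), 2))               # theory: Var(v) = 1/(alpha_e - 2) = 1/3
print(round(float(v.std() / v.mean()), 2))    # theory: CV = 1/sqrt(alpha_e - 2)
```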
Remaining priors
• "Classical" random effects: u | φ ~ p(u | φ) = N(0, G(φ)).
• "Classical" fixed effects: β ~ p(β).
• "Classical" random effects VC: φ ~ p(φ).
• Hyperparameter (Albert, 1988): p(α_e) = 1/(1 + α_e)².
• SAS PROC MCMC doesn't seem to handle this; the prior can't be written as a function of the corresponding parameter.
What was the last prior again???
• p(α_e) = 1/(1 + α_e)² corresponds to a Uniform(0,1) prior on 1/(1 + α_e).
• By contrast, p(α_e) = 1/α_e² corresponds to a Uniform(0,1) prior on 1/α_e.
• Different diffuse priors can have different impacts on posterior inferences if the data information is poor! (Rosa et al., 2004)
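The change of variables behind that hyperprior can be checked by simulation. A sketch (my own, not from the slides): draw U ~ Uniform(0,1), set α_e = 1/U − 1, and compare the empirical CDF against a/(1 + a), which is exactly the CDF whose density is 1/(1 + α_e)².

```python
# If U ~ Uniform(0,1) and alpha = 1/U - 1, then P(alpha <= a) = a/(1 + a),
# whose derivative in a is 1/(1 + a)^2 -- the Albert (1988) hyperprior.
import numpy as np

rng = np.random.default_rng(7)
u = rng.uniform(size=500_000)
alpha = 1.0 / u - 1.0                 # Uniform(0,1) placed on 1/(1 + alpha)

for a in (1.0, 5.0, 15.0):
    empirical = float(np.mean(alpha <= a))
    implied = a / (1.0 + a)           # CDF implied by p(alpha) = 1/(1 + alpha)^2
    print(f"a={a:5.1f}  empirical={empirical:.3f}  implied={implied:.3f}")
```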
Joint Posterior Density
• LMM:
    p(β, u, γ, v, φ, σ²_e, α_e | y)
      ∝ p(y | β, u, γ, v, σ²_e)
        × [∏_{k=1}^{s} p(g_k)] [∏_{l=1}^{t} p(v_l | α_e)]
        × p(α_e) p(β) p(u | φ) p(φ) p(σ²_e)
Details on FCD
• All FCDs are provided by Kizilkaya and Tempelman (2005); all are recognizable except that for α_e:
    p(α_e | β, u, φ, γ, v, σ²_e, y)
      ∝ [(α_e − 1)^α_e / Γ(α_e)]^t (∏_{l=1}^{t} v_l)^−(α_e+1) exp(−(α_e − 1) ∑_{l=1}^{t} v_l⁻¹) p(α_e)
  – Use a Metropolis-Hastings (MH) random walk on log(α_e) with a normal proposal density.
• For MH, it is generally a good idea to transform parameters so that the parameter space is the entire real line...but don't forget to include the Jacobian of the transform.
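A minimal sketch of such an MH update, assuming the IG(α_e, α_e − 1) structural prior and the p(α_e) = 1/(1 + α_e)² hyperprior from these slides. One detail is my own choice: I transform to θ = log(α_e − 1) rather than log α_e, so that the constraint α_e > 1 (needed for a positive prior scale α_e − 1) maps to the whole real line; the Jacobian α_e − 1 then appears in the log acceptance ratio.

```python
# Metropolis-Hastings random-walk sketch for the hyperparameter alpha_e.
import numpy as np
from math import lgamma, log

rng = np.random.default_rng(3)

# pretend these are the current scaling effects v_l mid-MCMC (truth: alpha_e = 5)
t = 20
v = (5.0 - 1.0) / rng.gamma(5.0, 1.0, size=t)
sum_log_v, sum_inv_v = float(np.sum(np.log(v))), float(np.sum(1.0 / v))

def log_fcd(a):
    """Log full conditional of alpha_e given v, up to an additive constant."""
    return (t * (a * log(a - 1.0) - lgamma(a))
            - (a + 1.0) * sum_log_v
            - (a - 1.0) * sum_inv_v
            - 2.0 * log(1.0 + a))                  # Albert (1988) hyperprior

alpha, theta, step, draws = 2.0, 0.0, 0.5, []
for _ in range(5000):
    theta_star = theta + step * rng.normal()       # random walk on log(alpha - 1)
    alpha_star = 1.0 + float(np.exp(theta_star))
    log_r = (log_fcd(alpha_star) + log(alpha_star - 1.0)   # Jacobian terms
             - log_fcd(alpha) - log(alpha - 1.0))
    if log(rng.uniform()) < log_r:
        alpha, theta = alpha_star, theta_star
    draws.append(alpha)

print(round(float(np.median(draws[1000:])), 2))    # posterior median for alpha_e
```

As the deck's simulation summaries show, posteriors for α_e can be quite diffuse with only t = 20 subclasses, so expect a wide spread around the truth.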
Small simulation study
• Two different levels of heterogeneity: α_e = 5 vs. α_e = 15
  – recall CV(v_l | α_e) = 1/√(α_e − 2)
  – σ²_e = 1
• Two different average random subclass sizes:
  – n_e = 10 vs. n_e = 30
  – 20 subclasses (habitats) in total
• Also modeled fixed effects:
  – Sex (2 levels) for location and dispersion (g₁ = 2, g₂ = 1).
• Additional set of random effects:
  – 30 levels (e.g., sires) cross-classified with habitats.
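The residual layout of such a simulation can be sketched as follows (my own illustration; seeds and cell sizes are arbitrary, and the location model is omitted): each sex-by-habitat cell gets residual standard deviation √(g_k v_l σ²_e).

```python
# Simulate heteroskedastic residuals with sigma2_{e,kl} = g_k * v_l * sigma2_e.
import numpy as np

rng = np.random.default_rng(11)
alpha_e, sigma2_e = 5.0, 1.0
g = np.array([2.0, 1.0])                 # fixed scaling effects (sex): g1 = 2, g2 = 1
t = 20                                   # random subclasses (habitats)
v = (alpha_e - 1.0) / rng.gamma(alpha_e, 1.0, size=t)   # v_l ~ IG(5, 4)

n_per = 10                               # records per sex-by-habitat cell
cells = []
for gk in g:
    for vl in v:
        sd = np.sqrt(gk * vl * sigma2_e)             # cell-specific residual SD
        cells.append(rng.normal(0.0, sd, size=n_per))
e_all = np.concatenate(cells)
print(e_all.shape)                       # 2 sexes x 20 habitats x 10 = 400 records
```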
PROC MIXED code
• "Fixed" effects models for residual variances:
  – REML estimates of "herd" variances expressed relative to the average.
  – Models σ²_e_kl = g_k v_l σ²_e; k = 1, 2, …, s; l = 1, 2, …, t, but treats v_l as a fixed effect.

proc mixed data=phenotype;
  class sireid habitatid sexid;
  model y = sexid;
  random intercept / subject = habitatid;
  random intercept / subject = sireid;
  repeated / local = exp(sexid habitatid);
  ods output covparms=covparms;
run;
MCMC analyses (code available online)
Posterior summaries of α_e:

α_e = 15; n_e = 10
  Mean   Median  Std Dev  1st Pctl  99th Pctl
  58.84  20.36   99.72    3.755     562.6

α_e = 5; n_e = 10
  Mean   Median  Std Dev  1st Pctl  99th Pctl
  4.531  3.416   3.428    2.073     22.24

α_e = 5; n_e = 30
  Mean   Median  Std Dev  1st Pctl  99th Pctl
  3.683  3.382   1.302    2.081     8.006

α_e = 15; n_e = 30
  Mean   Median  Std Dev  1st Pctl  99th Pctl
  67.24  41.25   85.30    7.918     487.5
MCMC (○) and REML (●) estimates of subclass residual variances vs. truth (v_l)
[Four panels: α_e = 15, n_e = 10; α_e = 5, n_e = 10; α_e = 5, n_e = 30; α_e = 15, n_e = 30; panels annotated as high- vs. low-shrinkage situations.]
Heterogeneous variances for ordinal categorical data
• Suppose we had a situation where residual variances were heterogeneous on the underlying latent scale
  – i.e., greater frequency of extreme vs. intermediate categories in some subclasses.
[Figure: liability densities for Herd 1, Herd 2, and Herd 3.]
Heterogeneous variances for ordinal categorical data?
• On the liability scale:
    ℓ = Xβ + Zu + e,   e ~ N(0, R(ξ)),
  so that
    p(ℓ | β, u, ξ) = N(ℓ | Xβ + Zu, R(ξ)).
• R(ξ) has a certain "heteroskedastic" specification.
• ξ determines the nature of the heterogeneous variances.
Cumulative probit mixed model (CPMM)
• For CPMM, the liability ℓ maps to Y:
    Y_i = 1 if τ₀ < ℓ_i ≤ τ₁,
          2 if τ₁ < ℓ_i ≤ τ₂,
          ⋮
          C if τ_{C−1} < ℓ_i ≤ τ_C;
  so that
    p(y | ℓ, τ) = ∏_{k=1}^{s} ∏_{l=1}^{t} ∏_{i=1}^{n_kl} ∏_{j=1}^{C} [1(τ_{j−1} < ℓ_{ikl} ≤ τ_j)]^{1(y_{ikl} = j)}
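The threshold mapping takes only a few lines. This sketch (mine, using the cutpoints τ₁ = −1 and τ₂ = 1.5 from the deck's later CPMM simulation) bins example liabilities into C = 3 ordinal categories.

```python
# CPMM threshold mapping: Y_i = j iff tau_{j-1} < l_i <= tau_j.
import numpy as np

tau = np.array([-np.inf, -1.0, 1.5, np.inf])   # tau_0, tau_1, tau_2, tau_C
liability = np.array([-2.3, -0.4, 0.9, 2.0])   # example liabilities l_i

# side="left" ensures a liability exactly at a cutpoint falls in the lower category
Y = np.searchsorted(tau, liability, side="left")
print(Y.tolist())                              # [1, 2, 2, 3]
```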
Modeling Heterogeneous Variances in CPMM
• Suppose e = (e₁₁′, e₁₂′, …, e_st′)′ with
    e_kl ~ N(0, R_kl(ξ)),   R_kl(ξ) = I_{n_kl} σ²_e_kl,
    σ²_e_kl = g_k v_l σ²_e;   k = 1, 2, …, s;   l = 1, 2, …, t,
  where
  – σ²_e is a "fixed" reference residual variance,
  – g_k > 0 is the kth fixed scaling effect,
  – v_l > 0 is the lth random scaling effect.
• All other priors are the same as with the LMM.
Joint Posterior Density in CPMM
• CPMM:
    p(ℓ, τ, β, u, γ, v, φ, σ²_e, α_e | y)
      ∝ p(y | ℓ, τ) p(ℓ | β, u, γ, v, σ²_e)
        × [∏_{k=1}^{s} p(g_k)] [∏_{l=1}^{t} p(v_l | α_e)]
        × p(α_e) p(β) p(u | φ) p(φ) p(σ²_e)
Another small simulation study
• Two different levels of heterogeneity:
  – α_e = 5 vs. α_e = 15
• Average random subclass size: n_e = 30
  – 20 subclasses (habitats) in total
• Also modeled fixed effects:
  – Sex (2 levels) for location and dispersion.
• Additional set of random effects:
  – 30 levels (e.g., sires) cross-classified with habitats.
• Thresholds: τ₁ = −1, τ₂ = 1.5.
Posterior summaries of α_e:

α_e = 15; n_e = 30
  Mean   Median  Std Dev  1st Pctl  99th Pctl
  49.44  23.21   75.31    5.018     404.7
  ESS = 391

α_e = 5; n_e = 30
  Mean   Median  Std Dev  1st Pctl  99th Pctl
  5.018  4.344   2.118    2.125     11.56
  ESS = 1422

[Panels: α_e = 5, n_e = 30 and α_e = 15, n_e = 30.]
Posterior means of subclass residual variances vs. truth (v_l)
[Panels: α_e = 5, n_e = 30; α_e = 15, n_e = 30.]
• No PROC GLIMMIX counterpart.
• Another alternative: heterogeneous thresholds!!! (Varona and Hernandez, 2006)
Additional extensions
• PhD work by Fernando Cardoso:
  – Heterogeneous residual variances as functions of multiple fixed effects and multiple random effects:
      σ²_e_j = (∏_{k=1}^{K} g_k^{p_jk}) (∏_{l=1}^{L} v_l^{q_jl}) σ²_e / w_j
  – Heterogeneous t-error (Cardoso et al., 2007):
      w_j | ν ~ Gamma(ν/2, ν/2)   (t-error is outlier robust)
  – Helps separate the effects of outliers from those of high-variance subclasses.
  – Other candidates for the distribution of w_j lead to alternative heavy-tailed specifications (Rosa et al., 2004).
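The scale-mixture representation behind the t-error model is easy to verify numerically. In this sketch (my own, with an illustrative ν = 6), drawing w_j ~ Gamma(ν/2, ν/2) (shape/rate) and e_j | w_j ~ N(0, σ²_e/w_j) yields marginally heavy-tailed t_ν residuals with variance ν/(ν − 2).

```python
# t-error as a scale mixture of normals: small w_j inflates that residual's
# variance, which is what downweights outliers in the fit.
import numpy as np

rng = np.random.default_rng(5)
nu, sigma2_e, n = 6.0, 1.0, 1_000_000
w = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)  # Gamma(nu/2, rate = nu/2)
e = rng.normal(0.0, np.sqrt(sigma2_e / w))             # e_j | w_j ~ N(0, sigma2_e/w_j)

print(round(float(e.var()), 1))   # theory: marginal Var = nu/(nu - 2) = 1.5 for t_6
```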
Posterior densities of breed-group heritabilities in multibreed Brazilian cattle (Fernando Cardoso)
[Panel a: Gaussian homoskedastic model. Posterior densities of heritability (0 to 0.5) for Nelore, Hereford, F1, A38; based on homogeneous residual variance (Cardoso and Tempelman, 2004).]
[Panel c: Gaussian heteroskedastic model. Posterior densities of heritability (0 to 0.5) for Nelore, Hereford, F1, A38; based on heterogeneous residual variances (fixed: breed additive & dominance, sex; random: CG) (Cardoso et al., 2005).]
• Some of the most variable herds were exclusively Herefords.
• Estimated CV of CG-specific σ²_e: 0.72 ± 0.06.
• F1 σ²_e = 0.70 ± 0.16 × purebred σ²_e.
Heterogeneous G-side scale parameters
• Could be accommodated in a similar manner.
• In fact, the borrowing of information across subclasses in estimating subclass-specific random-effects variances is even more critical.
  – Low information per subclass? REML estimates will converge to zero.
Heterogeneous bivariate G-side and R-side inferences!
Bello et al. (2010, 2012) investigated the herd-level and cow-level relationship between 305-day milk production and calving interval (CI) as a function of various factors:

  [ y_milk,j ]   [ milk fixed effects ]   [ z′_milk,j      0      ] [ u_milk ]   [ e_milk,j ]
  [ y_CI,j   ] = [ CI fixed effects   ] + [     0      z′_CI,j    ] [ u_CI   ] + [ e_CI,j   ]

  – u: CG (herd-year) effects
  – e: residual (cow) effects
Herd-Specific and Cow-Specific (Co)variances

Cow j:   R_e,j = [ σ²_e_milk,j      σ_e_milk·CI,j ]
                 [ σ_e_milk·CI,j    σ²_e_CI,j     ]

Herd k:  G_u,k = [ σ²_u_milk,k      σ_u_milk·CI,k ]
                 [ σ_u_milk·CI,k    σ²_u_CI,k     ]

Let b_u,k = σ_u_milk·CI,k / σ²_u_milk,k and b_e,j = σ_e_milk·CI,j / σ²_e_milk,j.
Rewrite this

Cow j:   R_e,j = [ σ²_e_milk,j           b_e,j σ²_e_milk,j                    ]
                 [ b_e,j σ²_e_milk,j     σ²_e_CI|milk,j + b²_e,j σ²_e_milk,j  ]

Herd k:  G_u,k = [ σ²_u_milk,k           b_u,k σ²_u_milk,k                    ]
                 [ b_u,k σ²_u_milk,k     σ²_u_CI|milk,k + b²_u,k σ²_u_milk,k  ]

Model each of these terms (the variances, the regressions b, and the conditional variances) as functions of fixed and random effects (in addition to the classical β and u)!
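This reparameterization can be checked numerically. In the sketch below (my own, with an arbitrary 2×2 (co)variance matrix for milk and CI), the matrix is rebuilt exactly from the three modeled terms: the milk variance, the regression b = cov/var_milk, and the conditional variance of CI given milk.

```python
# Verify that (var_milk, b, var_CI|milk) recovers the original 2x2 covariance.
import numpy as np

Sigma = np.array([[4.0, 1.2],      # var(milk), cov(milk, CI)
                  [1.2, 0.9]])     # cov(milk, CI), var(CI)

var_milk = Sigma[0, 0]
b = Sigma[0, 1] / var_milk                         # regression of CI on milk
var_ci_given_milk = Sigma[1, 1] - b**2 * var_milk  # conditional (residual) variance

rebuilt = np.array([[var_milk,      b * var_milk],
                    [b * var_milk,  var_ci_given_milk + b**2 * var_milk]])
print(bool(np.allclose(rebuilt, Sigma)))           # True
```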
bST effect on Herd-Level Association between Milk Yield and Calving Interval
[Bar chart of herd-level regressions b_u,k (days per 100 kg milk yield) by % of herd on bST supplementation: 0%: 0.01(a); <50%: 0.07(a); ≥50%: −1.37(b).]
a,b: P < 0.0001
bST: bovine somatotropin
Number of times milking/day on Cow-Level Association between Milk Yield and Calving Interval
[Bar chart of cow-level regressions b_e,j (days per 100 kg milk yield) by daily milking frequency: 2X: 0.57(a); 3+X: 0.45(b).]
a,b: P < 0.0001
Overall antagonism: 0.51 ± 0.01 day longer CI per 100 kg increase in cumulative 305-d milk yield.
Variability between Herds for b_e (Random effects)
• DIC_M0 − DIC_M1 = 243
• σ̂² = 0.030 ± 0.005
• Expected range between extreme herd-years (±2σ̂): 0.7 d / 100 kg (Ott and Longnecker, 2001)
[Figure: increase in # of days of CI per 100 kg herd milk yield, ranging from 0.16 to 0.86 (a 0.7 d/100 kg spread).]
Whole Genome Selection (WGS)
• Model: y_i = fixed effects + z′_i g + u_i + e_i;  i = 1, 2, …, n.
  – y_i: phenotype; fixed effects include, e.g., age and parity.
  – g = (g₁ g₂ g₃ ⋯ g_m)′: SNP allelic substitution effects (via LD, linkage disequilibrium, with QTL).
  – z_i = (z_i1 z_i2 z_i3 ⋯ z_im)′: genotypes for animal i.
  – u = {u_i}, u ~ N(0, A σ²_u): polygenic effects.
  – e = {e_i}, e ~ N(0, I σ²_e): residual effects.
• m >>> n.
Typical WGS specifications
• Random effects specifications on g (Meuwissen et al., 2001):
  – BLUP: g ~ N(0, I σ²_g)
  – BayesA/B: g ~ N(0, diag{σ²_g_j});
      σ²_g_j ~ νS²χ⁻²_ν with probability (1 − π), or 0 with probability π.
    BayesA = BayesB with π = 0.
• "Random effects/Bayes" modeling allows m >> n:
  – Borrowing of information across genes.
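One reason random-effects ("ridge"/BLUP-type) modeling handles m >> n: with g ~ N(0, I σ²_g), the point estimate needs only an n × n solve. A sketch (mine; λ, sample sizes, and genotype coding are illustrative, and the fixed/polygenic terms are omitted):

```python
# Ridge/BLUP for SNP effects with far more markers (m) than records (n):
# g_hat = Z'(ZZ' + lam * I_n)^(-1) y, where lam = sigma2_e / sigma2_g.
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 5000                                  # many more markers than records
Z = rng.choice([0.0, 1.0, 2.0], size=(n, m))     # illustrative genotype codes
g_true = np.zeros(m)
qtl = rng.choice(m, size=20, replace=False)      # a few markers with real effects
g_true[qtl] = rng.normal(0.0, 0.5, size=20)
y = Z @ g_true + rng.normal(0.0, 1.0, size=n)

lam = 100.0                                      # variance ratio, assumed known here
g_hat = Z.T @ np.linalg.solve(Z @ Z.T + lam * np.eye(n), y)   # n x n system only
print(g_hat.shape)                               # (5000,)
```

All m effects are shrunken toward zero, which is exactly the "borrowing of information across genes" the slide describes.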
First-order antedependence specifications (Yang and Tempelman, 2012)
• Instead of independence, specify first-order antedependence among SNP marker genetic effects:
    SNP 1: g₁ = δ₁,
    SNP 2: g₂ = t₂₁ g₁ + δ₂,
    SNP 3: g₃ = t₃₂ g₂ + δ₃,
    ⋮
    SNP m: g_m = t_{m,m−1} g_{m−1} + δ_m.
• t_{j,j−1} ~ N(μ_t, σ²_t)
• δ ~ N(0, diag{σ²_δ_j});
    σ²_δ_j ~ νS²χ⁻²_ν with probability (1 − π), or 0 with probability π.   (Ante-BayesB)
  – Ante-BayesA = Ante-BayesB with π = 0.
• The correlation between SNP effects is a function f(t_{j,j−1}) of the intervening antedependence parameters.
[Correlation matrix among SNP 1 through SNP 4: off-diagonal elements are products of the successive t's, decaying with distance.]
• Random effects modeling facilitates borrowing of information across SNP intervals.
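The decaying-correlation structure can be illustrated by simulating from the chain g_j = t g_{j−1} + δ_j. This is a simplification of the model above (my own sketch: a common fixed t and unit innovation variances, rather than SNP-specific t_{j,j−1} and spike-slab σ²_δ_j):

```python
# First-order antedependence: adjacent SNP effects are correlated, and the
# correlation with g_1 decays with chromosomal distance.
import numpy as np

rng = np.random.default_rng(9)
m, t_ante, n_rep = 4, 0.6, 200_000
g = np.zeros((n_rep, m))
g[:, 0] = rng.normal(size=n_rep)                 # g_1 = delta_1
for j in range(1, m):                            # g_j = t * g_{j-1} + delta_j
    g[:, j] = t_ante * g[:, j - 1] + rng.normal(size=n_rep)

corr = np.corrcoef(g, rowvar=False)
print(np.round(corr[0], 2))                      # first row of the 4x4 correlation matrix
```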
Results from a simulation study
• The advantage of Ante-BayesA/B over conventional BayesA/B increases with increasing marker density (LD = linkage disequilibrium).
[Figure: accuracy of genomic EBV (0.70 to 1.00) vs. LD level r² (0.18 to 0.32) for BayesA, Ante-BayesA, BayesB, and Ante-BayesB; P < .001 for BayesA/B vs. Ante-BayesA/B.]
Other examples of multi-stage hierarchical modeling?
• Spatial variability in agronomy using t-error (Besag and Higdon, 1999).
• Ecology (Cressie et al., 2009).
• Conceptually, one could model heterogeneous and spatially correlated overdispersion parameters in Poisson/binomial GLMMs as well!
What I haven't covered in this workshop
• Model choice criteria:
  – Bayes factors (generally too challenging to compute)
  – DIC (deviance information criterion)
• Bayesian model averaging:
  – Advantage over conditioning on one model (e.g., for multiple regression involving many covariates)
• Posterior predictive checks:
  – Great for diagnostics
• Residual diagnostics based on latent residuals for GLMM (Johnson and Albert, 1999).
Some closing comments/opinions
• Merit of Bayesian inference:
  – Marginal for LMM with classical assumptions.
    • GLS with REML seems to work fine.
  – Of greater benefit for GLMM.
    • Especially binary data with complex error structures.
  – Greatest benefit for multi-stage hierarchical models.
    • Larger datasets are nevertheless required than with more classical (homogeneous) assumptions.
Implications
• Increased programming capabilities/skills are needed.
  – Cloud/cluster computing wouldn't hurt.
• Don't go in blind with canned Bayesian software.
  – Watch the diagnostics (e.g., trace plots) like a hawk!
  – Don't go on autopilot.
• WinBUGS/PROC MCMC work nicely for the simpler stuff.
  – Highly hierarchical models require statistical/algorithmic insights; do recognize limitations in parameter identifiability (Cressie et al., 2009).
National Needs PhD Fellowships, Michigan State University
Focus: Integrated training in quantitative, statistical and molecular genetics, and breeding of food animals.
Features:
• Research in animal genetics/genomics with a collaborative faculty team
• Industry internship experience
• Public policy internship in Washington, DC
• Statistical consulting center experience
• Teaching or Extension/outreach learning opportunities
• Optional affiliation with inter-departmental programs in Quantitative Biology, Genetics, and others
Faculty Team: C. Ernst, J. Steibel, R. Tempelman, R. Bates, H. Cheng, T. Brown, B. Alston-Mills
Eligibility is open to citizens and nationals of the US. Women and underrepresented groups are encouraged to apply.