Spatially structured random effects

by Daniel A. Griffith, Ashbel Smith Professor of Geospatial Information Sciences


Page 1: Spatially  structured  random  effects

Spatially structured random effects

by Daniel A. Griffith

Ashbel Smith Professor of Geospatial Information Sciences

Page 2: Spatially  structured  random  effects

ABSTRACT

Researchers are increasingly employing random effects modeling to analyze data. When data are georeferenced, a random effect term needs to be spatially structured in order to account for spatial autocorrelation. Spatial structuring can be achieved in various ways, including the use of semivariogram, spatial autoregressive, and spatial filter models. SAS implements the semivariogram option for linear mixed models. GeoBUGS implements the spatial autoregressive option for either linear or generalized linear mixed modeling. Recently developed spatial filtering methodology can be used in either case, as well as with the SAS generalized linear mixed model procedure, and furnishes one means of estimating space-time mixed models. This presentation summarizes comparisons of these three forms of spatial structuring, illustrating implementations with selected ecological data for the municipalities of Puerto Rico.

Page 3: Spatially  structured  random  effects

From Legendre

Spatial structures in communities indicate that some process has been at work to create them. Two families of mechanisms can generate spatial structures in communities:

• Autocorrelation model: the spatial structures are generated by the species assemblages themselves (response variables).
• Induced spatial dependence model: forcing (explanatory) variables are responsible for the spatial structures found in the species assemblage. They represent environmental or biotic control of the species assemblages, or historical dynamics.

To understand the mechanisms that generate these structures, we need to explicitly incorporate the spatial community structures, at all scales, into the statistical model.

Spatial autocorrelation (SA) is technically defined as the dependence, due to geographic proximity, present in the residuals of a [regression-type] model of a response variable y which takes into account all deterministic effects due to forcing variables.

Page 4: Spatially  structured  random  effects

Spatial autocorrelation can be interpreted in different ways

• As a spatial process mechanism: the cartoon
• As a diagnostic tool: the Cliff-Ord Eire example (the model specification should be nonlinear)
• As a nuisance parameter: eliminating spatial dependency to avoid statistical complications
• As a spatial spillover effect: georeferencing of pediatric lead poisoning cases in Syracuse, NY
• As an outcome of areal unit demarcation: the modifiable areal unit problem (MAUP)
• As redundant information: spatial sampling; map interpolation
• As map pattern: spatial filtering (to be discussed in this course)
• As a missing variables indicator/surrogate: a possible implication of spatial filtering
• As self-correlation: what is discussed next

Page 5: Spatially  structured  random  effects

The magic box is a physical model of spatial autocorrelation

Page 6: Spatially  structured  random  effects

The permutation perspective

Page 7: Spatially  structured  random  effects

The SASIM game

http://www.nku.edu/~longa/cgi-bin/cgi-tcl-examples/generic/SA/SA.cgi

Page 8: Spatially  structured  random  effects
Page 9: Spatially  structured  random  effects
Page 10: Spatially  structured  random  effects
Page 11: Spatially  structured  random  effects

Measures of spatial autocorrelation

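$$\mathrm{MC} = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\,(y_i-\bar{y})(y_j-\bar{y})}{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}$$

$$\mathrm{GR} = \frac{n-1}{2\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} c_{ij}\,(y_i-y_j)^{2}}{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}$$

MC: Moran Coefficient; GR: Geary Ratio; semivariogram

$$\gamma(d_k) = \frac{\sum_{i=1}^{n_k}(y_i-y_j)^{2}}{2\,n_k}, \qquad k = 1, 2, \ldots, K$$

where n_k is the number of pairs of values (y_i, y_j) in distance class d_k, and the semivariogram values are described by a model f(d_k). Semivariogram models shown:

• Spherical
• Exponential
• Bessel function (1st order, 2nd kind)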
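The Moran Coefficient and Geary Ratio defined on this slide can be sketched numerically as follows. This is a minimal illustration, assuming a binary 0/1 connectivity matrix C for four areal units arranged along a line; the function names and the toy data are hypothetical and are not taken from the presentation.

```python
import numpy as np

def moran_coefficient(y, C):
    """MC = (n / sum c_ij) * sum c_ij (y_i - ybar)(y_j - ybar) / sum (y_i - ybar)^2."""
    y = np.asarray(y, dtype=float)
    C = np.asarray(C, dtype=float)
    z = y - y.mean()                      # centered values
    return (y.size / C.sum()) * (z @ C @ z) / (z @ z)

def geary_ratio(y, C):
    """GR = ((n - 1) / (2 sum c_ij)) * sum c_ij (y_i - y_j)^2 / sum (y_i - ybar)^2."""
    y = np.asarray(y, dtype=float)
    C = np.asarray(C, dtype=float)
    z = y - y.mean()
    sq_diff = (y[:, None] - y[None, :]) ** 2   # all pairwise squared differences
    return ((y.size - 1) / (2.0 * C.sum())) * (C * sq_diff).sum() / (z @ z)

# Hypothetical example: four units in a row, neighbors share a border.
C_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
y_trend = np.array([1.0, 2.0, 3.0, 4.0])        # similar values adjacent
y_alternating = np.array([1.0, 4.0, 1.0, 4.0])  # dissimilar values adjacent

mc_trend, gr_trend = moran_coefficient(y_trend, C_line), geary_ratio(y_trend, C_line)
mc_alt, gr_alt = moran_coefficient(y_alternating, C_line), geary_ratio(y_alternating, C_line)
print(f"trend:       MC = {mc_trend:.3f}, GR = {gr_trend:.3f}")
print(f"alternating: MC = {mc_alt:.3f}, GR = {gr_alt:.3f}")
```

For the trending values the sketch gives MC = 0.333 and GR = 0.300 (positive spatial autocorrelation: MC above 0, GR below 1); for the alternating values it gives MC = -1.000 and GR = 1.500 (negative spatial autocorrelation).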

Page 12: Spatially  structured  random  effects

Georeferenced data scatterplots

• The horizontal axis is the measurement scale for some attribute variable
• The vertical axis is the measurement scale for neighboring values (topological distance-based) of the same attribute variable

OR

• The horizontal axis is (usually) Euclidean distance between geocoded locations
• The vertical axis is the measurement scale for geographic variability
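The second kind of scatterplot (distance on the horizontal axis, geographic variability on the vertical axis) corresponds to an empirical semivariogram. The following is a minimal sketch, assuming Euclidean distance and half-open distance classes (lower edge excluded, upper edge included); the function name, the binning rule, and the toy data are hypothetical.

```python
import numpy as np

def empirical_semivariogram(coords, y, bin_edges):
    """Return gamma(d_k) = sum (y_i - y_j)^2 / (2 n_k) for each distance class k,
    where n_k is the number of location pairs whose distance falls in class k."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    i, j = np.triu_indices(len(y), k=1)              # each pair counted once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (y[i] - y[j]) ** 2
    n_classes = len(bin_edges) - 1
    gamma = np.full(n_classes, np.nan)               # NaN where a class is empty
    counts = np.zeros(n_classes, dtype=int)
    for k in range(n_classes):
        in_class = (dist > bin_edges[k]) & (dist <= bin_edges[k + 1])
        counts[k] = in_class.sum()
        if counts[k] > 0:
            gamma[k] = sq_diff[in_class].sum() / (2.0 * counts[k])
    return gamma, counts

# Hypothetical example: four locations on a line with a linear trend in y.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
gamma, counts = empirical_semivariogram(coords, y, bin_edges=[0.0, 1.0, 2.0, 3.0])
print(gamma, counts)
```

In this toy case the semivariogram rises with distance (0.5, 2.0, 4.5 for 3, 2, and 1 pairs), which is the pattern expected when nearby values are more alike than distant ones.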

Page 13: Spatially  structured  random  effects

Describing a scatterplot trend

positive relationship: High Y with High X & Medium Y with Medium X & Low Y with Low X

negative relationship: High Y with Low X & Medium Y with Medium X & Low Y with High X

Page 14: Spatially  structured  random  effects

Description of the Moran scatterplot

MC = 0.49, GR = 0.58 (2002 population density)

Positive spatial autocorrelation
- high values tend to be surrounded by nearby high values
- intermediate values tend to be surrounded by nearby intermediate values
- low values tend to be surrounded by nearby low values

Page 15: Spatially  structured  random  effects

Description of the Moran scatterplot

MC = -0.16, sMC = 0.075, GR = 1.04

Negative spatial autocorrelation
- high values tend to be surrounded by nearby low values
- intermediate values tend to be surrounded by nearby intermediate values
- low values tend to be surrounded by nearby high values

competition for space

Page 16: Spatially  structured  random  effects

Graphical portrayals of spatial autocorrelation latent in transformed As data

Page 17: Spatially  structured  random  effects

Constructing eigenfunctions for filtering spatial autocorrelation out of georeferenced variables:

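$$\mathrm{MC} = \frac{n}{\mathbf{1}^{T}\mathbf{C}\mathbf{1}} \times \frac{\mathbf{Y}^{T}(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{Y}}{\mathbf{Y}^{T}(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{Y}}$$

the eigenfunctions come from

$$(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)\,\mathbf{C}\,(\mathbf{I}-\mathbf{1}\mathbf{1}^{T}/n)$$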
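The eigenfunction construction on this slide can be sketched with NumPy. This is a minimal illustration, assuming a hypothetical 3-by-3 lattice with rook adjacency for C; the helper names are illustrative only. For any eigenvector with a nonzero eigenvalue, its Moran Coefficient equals n/(1ᵀC1) times that eigenvalue, so the eigenvectors order map patterns from strongly positive to strongly negative spatial autocorrelation.

```python
import numpy as np

def rook_adjacency(rows, cols):
    """Binary connectivity matrix C for a regular lattice (shared edges only)."""
    n = rows * cols
    C = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:                 # neighbor to the right
                C[i, i + 1] = C[i + 1, i] = 1.0
            if r + 1 < rows:                 # neighbor below
                C[i, i + cols] = C[i + cols, i] = 1.0
    return C

def mcm_eigenfunctions(C):
    """Eigenvalues and eigenvectors of (I - 11'/n) C (I - 11'/n), largest first."""
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n      # centering (projection) matrix
    eigenvalues, eigenvectors = np.linalg.eigh(M @ C @ M)
    order = np.argsort(eigenvalues)[::-1]
    return eigenvalues[order], eigenvectors[:, order]

C = rook_adjacency(3, 3)
eigenvalues, eigenvectors = mcm_eigenfunctions(C)
scale = C.shape[0] / C.sum()                 # n / (1' C 1)
# MC implied by each eigenvalue; entries for zero eigenvalues are not
# meaningful for the constant vector, whose MC is undefined.
mc_values = scale * eigenvalues
print(np.round(mc_values, 3))
```

A spatial filter would then use a selected subset of these eigenvectors as covariates; the selection step is not shown here.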

Page 18: Spatially  structured  random  effects

Random effects model

$$\mathbf{Y} = f(\mathbf{X}\boldsymbol{\beta},\ \boldsymbol{\xi} + \boldsymbol{\varepsilon})$$

• ξ is a random observation effect (differences among individual observational units)
• ε is a time-varying residual error (links to change over time)

The composite error term is the sum of the two.

Page 19: Spatially  structured  random  effects

Random effects model: normally distributed intercept term

• ξ ~ N(0, σ²_ξ) and uncorrelated with covariates
• supports inference beyond the nonrandom sample analyzed
• simplest is where intercept is allowed to vary across areal units (repeated observations are individual time series)
• the random effect variable is integrated out (with numerical methods) of the likelihood function
• accounts for missing variables & within-unit correlation (commonality across time periods)
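The statement that the random effect is integrated out of the likelihood with numerical methods can be illustrated for a single areal unit. This is a minimal sketch, assuming a binomial response with a logit link, a normally distributed random intercept, and Gauss-Hermite quadrature as the numerical method; the slides do not say which integration scheme the software uses, and the function name and parameter values are hypothetical.

```python
import numpy as np
from math import comb

def marginal_likelihood(y, m, eta, sigma, n_nodes=30):
    """Likelihood of one unit's repeated binomial counts y (out of m trials each),
    with the random intercept xi ~ N(0, sigma^2) integrated out numerically.

    eta is the fixed-effects linear predictor, taken as constant over time here.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    xi = np.sqrt(2.0) * sigma * nodes              # change of variables for N(0, sigma^2)
    p = 1.0 / (1.0 + np.exp(-(eta + xi)))          # success probability at each node
    lik = np.ones_like(p)
    for y_t, m_t in zip(y, m):                     # product over time periods
        lik *= comb(m_t, y_t) * p**y_t * (1.0 - p) ** (m_t - y_t)
    return float(weights @ lik) / np.sqrt(np.pi)

# Hypothetical unit observed in three time periods.
value = marginal_likelihood(y=[2, 3, 1], m=[10, 10, 10], eta=-1.2, sigma=0.8)
print(value)
```

Because the same ξ enters every time period for the unit, the marginal likelihood does not factor across periods, which is how the random effect captures within-unit correlation.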

Page 20: Spatially  structured  random  effects

Spatial structuring of random effects

• CAR: conditional autoregressive model
• ICAR: improper conditional autoregressive model (spatial autocorrelation set to 1, and both a spatially structured and a spatially unstructured variance component are estimated); should be specified as a convolution prior (spatially structured & unstructured random effects)
• SF: spatial filter identified with a frequentist GLM

Page 21: Spatially  structured  random  effects

Frequentist versus Bayesian

Definition of probability
• Frequentist: long-run expected frequency in repeated (actual or hypothetical) experiments (Law of Large Numbers)
• Bayesian: relative degree of belief in the state of the world

Point estimate
• Frequentist: maximum likelihood estimate
• Bayesian: mean, mode or median of the posterior probability distribution

Uncertainty intervals for parameters
• Frequentist: “confidence intervals” based on the Likelihood Ratio Test (LRT), i.e., the expected probability distribution of the maximum likelihood estimate over many experiments
• Bayesian: “credible intervals” based on the posterior probability distribution

Page 22: Spatially  structured  random  effects

Uncertainty intervals of non-parameters
• Frequentist: based on likelihood profile/LRT, or by resampling from the sampling distribution of the parameter
• Bayesian: calculated directly from the distribution of parameters

Model selection
• Frequentist: discard terms that are not significantly different from a nested (null) model at a previously set confidence level
• Bayesian: retain terms in models, on the argument that processes are not absent simply because they are not statistically significant

Difficulties
• Frequentist: confidence intervals are confusing (range that will contain the true value in a proportion α of repeated experiments); rejection of model terms for “non-significance”
• Bayesian: subjectivity; need to specify priors

Page 23: Spatially  structured  random  effects

Impact of sample size

[Figure: prior distribution and likelihood distribution]

As the sample size increases, a prior distribution has less and less impact on results; BUT note the effective sample size for spatially autocorrelated data.
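The diminishing influence of the prior can be sketched with a conjugate example. This is a minimal illustration, assuming a binomial likelihood with two hypothetical Beta priors and independent observations; it does not model spatial autocorrelation, for which the slide's caveat about effective sample size applies.

```python
def posterior_mean(a, b, successes, n):
    """Posterior mean of a binomial proportion under a Beta(a, b) prior."""
    return (a + successes) / (a + b + n)

flat_prior = (1, 1)          # hypothetical uninformative prior
strong_prior = (20, 5)       # hypothetical prior centered near 0.8

gaps = []
for n in (10, 100, 1000):
    successes = 3 * n // 10  # observed proportion fixed at 0.3
    mean_flat = posterior_mean(*flat_prior, successes, n)
    mean_strong = posterior_mean(*strong_prior, successes, n)
    gaps.append(abs(mean_flat - mean_strong))
    print(f"n = {n:4d}: flat prior {mean_flat:.4f}, strong prior {mean_strong:.4f}")
```

The gap between the two posterior means shrinks from about 0.32 at n = 10 to about 0.01 at n = 1000. With spatially autocorrelated data the information content corresponds to fewer than n independent observations, so the prior can retain more influence than the nominal n suggests.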

Page 24: Spatially  structured  random  effects

What is BUGS? Bayesian inference Using Gibbs Sampling

• BUGS is a piece of computer software for the Bayesian analysis of complex statistical models using Markov chain Monte Carlo (MCMC) methods.

• It grew from a statistical research project at the MRC Biostatistics Unit in Cambridge, but now is developed jointly with the Imperial College School of Medicine at St Mary’s, London.

Page 25: Spatially  structured  random  effects

• The Classic BUGS program uses a text-based model description and a command-line interface, and versions are available for major computer platforms (e.g., Sparc, DOS). However, it is not being further developed.

BUGS:
• Classic BUGS
• WinBUGS (Windows version)
• GeoBUGS (spatial models)
• PKBUGS (pharmacokinetic modeling)

Page 26: Spatially  structured  random  effects

What is WinBUGS?

• WinBUGS is a Windows program offering an optional graphical user interface, the standard ‘point-and-click’ Windows interface, and on-line monitoring and convergence diagnostics. It also supports batch-mode running (version 1.4).

• GeoBUGS is an add-on to WinBUGS that fits spatial models and produces a range of maps as output.

• PKBUGS is an efficient and user-friendly interface for specifying complex population pharmacokinetic and pharmacodynamic (PK/PD) models within the WinBUGS software.

Page 27: Spatially  structured  random  effects

What is GeoBUGS?

• Available via http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/geobugs.shtml

• Bayesian inference is used to spatially smooth the standardized incidence ratios using Markov chain Monte Carlo (MCMC) methods. GeoBUGS implements models for data that are collected within discrete regions (not at the individual level), and smoothing is done based on Markov random field models for the neighborhood structure of the regions relative to each other.

Page 28: Spatially  structured  random  effects

What is MCMC?

MCMC is used to simulate from some distribution p known only up to a constant factor, C:

p_i = C q_i

where q_i is known but C is unknown and too horrible to calculate.

MCMC begins with conditional (marginal) distributions, and MCMC sampling outputs a sample of parameters drawn from their posterior (joint) distribution.
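Sampling from p_i = C q_i without ever computing C can be sketched with a Metropolis sampler, which needs only ratios of q. This is a minimal illustration on a hypothetical four-state target with a symmetric random-walk proposal; BUGS itself is built around Gibbs sampling, so this stands in for the general MCMC idea rather than for the BUGS algorithm.

```python
import numpy as np

def metropolis_ring(q, n_steps, seed=0):
    """Draw states with probability proportional to q, using only ratios of q.

    States are arranged on a ring; each proposal moves one step left or right
    with equal probability, so the proposal is symmetric.
    """
    rng = np.random.default_rng(seed)
    q = np.asarray(q, dtype=float)
    m = q.size
    steps = rng.integers(0, 2, size=n_steps) * 2 - 1   # each entry is -1 or +1
    u = rng.random(n_steps)                            # uniforms for accept/reject
    state = 0
    draws = np.empty(n_steps, dtype=int)
    for t in range(n_steps):
        proposal = (state + steps[t]) % m
        # Accept with probability min(1, q[proposal] / q[state]); C cancels out.
        if u[t] < q[proposal] / q[state]:
            state = proposal
        draws[t] = state
    return draws

q = np.array([1.0, 2.0, 3.0, 4.0])      # known up to the constant C
draws = metropolis_ring(q, n_steps=100_000)
freq = np.bincount(draws, minlength=q.size) / draws.size
target = q / q.sum()                    # computable here only because the toy is tiny
print(np.round(freq, 3), target)
```

The sampled frequencies approach the normalized target even though the normalizing constant is never used inside the sampler.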

Page 29: Spatially  structured  random  effects

The geographic distribution of elevation across the island of Puerto Rico

From a USGS DEM containing 87,358,136 points. Darkness of gray scale is directly proportional to elevation.

Page 30: Spatially  structured  random  effects

SAS PROC MIXED summary results for a quadratic gradient LMM: LN(elev + 17.5)

Semivariogram model | none      | spherical | exponential | Gaussian  | power     | Bessel
Variance (nugget)   | ---       | 0.0331    | 0.2151      | 0.2210    | 0.2514    | 0.2450
Spatial correlation | ---       | < 0.0001  | 0.7643      | 0.5730    | 0.2702    | 0.6089
Residual            | 0.217     | 0.169     | < 0.0001    | 0.0253    | < 0.0001  | 0.0057
b_0                 | 6.080***  | 6.080***  | 6.157***    | 6.201***  | 6.157***  | 6.202***
b_u²                | -0.349*** | -0.349*** | -0.463***   | -0.486*** | -0.463*** | -0.486***
b_uv                | -0.263*** | -0.263*** | -0.254**    | -0.255**  | -0.254**  | -0.257**
b_v                 | -0.270*** | -0.270*** | -0.114      | -0.168    | -0.114    | -0.168
b_v²                | -0.527*** | -0.527*** | -0.529***   | -0.561*** | -0.529*** | -0.569***

Page 31: Spatially  structured  random  effects

Example MCMC chains for the average random effects term from a WinBUGS run

[Figure panels: ICAR; spatial filter (SF)]

Page 32: Spatially  structured  random  effects

WinBUGS: geographic distributions of unstructured (left) and spatially structured (right) random effects

[Map panels: WinBUGS: ICAR; spatial filter (SF)]

Page 33: Spatially  structured  random  effects

Comparative parameter estimates for a LMM quadratic gradient description of LN(elev + 17.5)

Parameter | SAS semivariogram (Bessel) model: estimate | se    | SAS SF: estimate | se    | GeoBUGS-ICAR: estimate | se    | WinBUGS-SF (100 weeded replications): estimate | se
b_0       | 6.1906                                     | 0.287 | 6.1101           | 0.055 | 6.5175                 | 0.168 | 6.1101                                         | 0.061
b_u²      | -0.5048                                    | 0.122 | -0.3881          | 0.031 | -0.7507                | 0.153 | -0.3878                                        | 0.035
b_uv      | -0.2229                                    | 0.123 | -0.2939          | 0.030 | -0.2031                | 0.101 | -0.2920                                        | 0.030
b_v²      | -0.5314                                    | 0.125 | -0.5193          | 0.032 | -0.5683                | 0.061 | -0.5190                                        | 0.037
var       | 0.0055                                     | 0.019 | 0                | ---   | 0.0049                 | 0.007 | 0.0305                                         | 0.024
varure    | 0.2856                                     | 0.091 | 0.0001           | ---   | 0.0047                 | 0.007 | 0.0318                                         | 0.025
varssre   | 0.7205                                     | 0.221 | 0.0282           | ---   | 0.4854                 | 0.093 | 0.0301                                         | ---

varure denotes the variance of the unstructured random effects; varssre denotes the variance of the spatially structured random effects.

Page 34: Spatially  structured  random  effects

binomial GLMM random effects

[Map panels: SAS SF; WinBUGS SF; WinBUGS ICAR]

Page 35: Spatially  structured  random  effects

SF GLMM

WinBUGS results are based on 100 weeded replications.

statistic  | SAS NLMIXED (SF): estimate | se     | WinBUGS SF: estimate | se     | WinBUGS ICAR: estimate | se
b_0        | -1.2867                    | 0.2624 | -1.3114              | 0.2852 | -1.5340                | 0.2419
b_elev     | -0.0111                    | 0.0013 | -0.0110              | 0.0014 | -0.0100                | 0.0013
b_E1       | 3.0646                     | 1.1559 | 3.0600               | 1.3632 | ***                    |
b_E4       | 3.2182                     | 1.3433 | 3.0116               | 1.4256 | ***                    |

statistic  | SAS NLMIXED (SF) | WinBUGS SF | WinBUGS ICAR
μ̂_ξ        | 0.0015           | 0.0054     | 0.0066
σ̂²_ξ       | 0.7045           | 0.7144     | 0.3727
σ̂²_ξSS     | 0.9787           | 0.9783     | 0.9583
P(S-W)     | < 0.0001         | < 0.0001   | < 0.0001
MC_ss      | 0.967            | 0.975      | 0.787
GR_ss      | 0.158            | 0.154      | 0.177
MC_ξ̂       | 0.119            | 0.132      | 0.036
GR_ξ̂       | 1.045            | 1.000      | 1.129
MC_ξ̂SS     | 0.356            | 0.357      | 0.388
GR_ξ̂SS     | 0.739            | 0.739      | 0.696
r_ξ̂,elev   | 0.001            | 0.001      | 0.011
r_ξ̂,E1     | -0.001           | -0.009     | ***
r_ξ̂,E4     | 0.001            | 0.022      | ***

Page 36: Spatially  structured  random  effects

Graphical diagnostics of residuals for the GLMM estimated with SAS

Page 37: Spatially  structured  random  effects

Scatterplot of the SAS and mean WinBUGS estimated spatially structured random effects terms

Page 38: Spatially  structured  random  effects

Individual GLMM estimation results for each Puerto Rican sugar cane crop year

Individual SF eigenvector #s: 1, 4, 6, 24

Crop year | Raw percentages: MC | GR    | # 0s | Point-in-time estimation: -a | -b_elev | μ̂_ξ    | σ̂²_ξ   | P(S-W)
1965/66   | 0.484               | 0.458 | 3    | 1.1959                       | 0.0064  | 0.0020 | 1.0135 | 0.0055
1966/67   | 0.490               | 0.454 | 4    | 1.3786                       | 0.0060  | 0.0021 | 1.1138 | 0.0027
1967/68   | 0.474               | 0.434 | 6    | 1.7354                       | 0.0050  | 0.0018 | 1.0856 | 0.0017

Page 39: Spatially  structured  random  effects

Space-time data: preliminaries

Random effects term is constant across time; spatial structuring changes over time

[Map panels: random effect (re); re + ss: 1965/66; re + ss: 1966/67; re + ss: 1967/68]

Page 40: Spatially  structured  random  effects

Space-time GLMM: Puerto Rican sugar cane crop years 1965/66-1967/68 when all fixed effects are year-specific

statistic | crop year 1965/66: estimate | se     | crop year 1966/67: estimate | se     | crop year 1967/68: estimate | se
b_0       | -1.2291                     | 0.2336 | -1.3122                     | 0.2336 | -1.4520                     | 0.2336
b_elev    | -0.0065                     | 0.0009 | -0.0064                     | 0.0009 | -0.0065                     | 0.0009
b_E1      | 4.5040                      | 1.1684 | 4.5785                      | 1.1684 | 4.9226                      | 1.1684
b_E4      | 4.9713                      | 1.2432 | 5.3372                      | 1.2432 | 5.8053                      | 1.2432
b_E6      | -4.5091                     | 1.1620 | -4.8209                     | 1.1620 | -5.0203                     | 1.1621
b_E24     | -4.0290                     | 1.0994 | -3.9657                     | 1.0994 | -4.0285                     | 1.0995

statistic    | crop year 1965/66 | crop year 1966/67 | crop year 1967/68
pseudo-R²    | 0.9950            | 0.9976            | 0.9929
MC_ξ̂SS       | 0.449             | 0.467             | 0.493
GR_ξ̂SS       | 0.580             | 0.562             | 0.537
MC_residuals | 0.019             | 0.042             | 0.009
GR_residuals | 0.916             | 0.839             | 0.808

Page 41: Spatially  structured  random  effects

Discussion & Implications

1. All three common specifications of spatial structuring—semivariogram, spatial autoregressive and SF models—for a random effect term in mixed statistical models perform in an equivalent fashion.

2. Matching Bayesian model priors with their implicit frequentist counterparts yields estimation results from both approaches that are essentially the same.

3. Making use of spatially structured random effects tends to furnish an alternative to quasi-likelihood estimation techniques for GLMMs.

Page 42: Spatially  structured  random  effects

4. Semivariogram models offer a geostatistical theoretical basis and have been implemented in SAS for LMMs.
• A spatial statistics practitioner with the necessary computer programming skills can employ WinBUGS in order to utilize them with GLMMs.

5. Spatial autoregressive modeling offers a theoretical basis for spatial structuring, and is available in GeoBUGS.
• This would be very difficult to trick SAS into doing.

6. Spatial filtering, which can be derived from spatial autoregressive model specifications,
• tends to be more exploratory in nature (being akin to principal components analysis)
• can be implemented in either SAS or WinBUGS for either LMMs or GLMMs, and
• can be easily extended to space-time datasets with either of these software packages.

Page 43: Spatially  structured  random  effects

7. Illustrative Puerto Rico sugar cane examples tend to have a random effect term that virtually equates to the corresponding LMM/GLMM residual variate.
• This is not always the case, as is highlighted by the extension of a GLMM specification to a space-time sugar cane dataset.

8. All of the estimated random effects terms for the various Puerto Rico examples tend to be non-normal.

Page 44: Spatially  structured  random  effects

9. Once a random effect term has been estimated with a frequentist approach, using it when calculating a deviance statistic allows its number of degrees of freedom to be approximated for GLMMs.
• Although n values are estimated, because they are correlated, the resulting number of degrees of freedom is less than n.
• This particular finding should help spatial statistics practitioners better understand the cost of employing a statistical mixed model.

Page 45: Spatially  structured  random  effects

A df aside: future research

• Spiegelhalter et al. (2002) address the df problem for complex hierarchical models in which the number of parameters is not clearly defined because, for instance, of the presence of random effects.
• An information-theoretic argument is used to approximate the effective number of parameters in a model, equivalent to the trace of the product of the Fisher information and the posterior covariance matrices.
  – this particular approximation is equivalent to the trace of the ‘hat’ matrix for linear models with a normally distributed error term.
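The hat-matrix view of the effective number of parameters can be sketched numerically. This is a minimal illustration with hypothetical data: for ordinary least squares the trace of the hat matrix equals the number of fitted coefficients, while for n observation-level effects shrunk toward zero the trace falls below n. The ridge-type penalty used here is an assumption standing in for the shrinkage a random effect receives; it is not the Spiegelhalter et al. calculation itself.

```python
import numpy as np

def hat_trace_ols(X):
    """Trace of H = X (X'X)^{-1} X' for an ordinary linear model."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return float(np.trace(H))

def hat_trace_shrunken(Z, penalty):
    """Trace of H = Z (Z'Z + penalty * I)^{-1} Z' for shrunken (penalized) effects."""
    k = Z.shape[1]
    H = Z @ np.linalg.solve(Z.T @ Z + penalty * np.eye(k), Z.T)
    return float(np.trace(H))

n = 6
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])  # intercept + one covariate
Z = np.eye(n)                                                 # one effect per observation

df_fixed = hat_trace_ols(X)                      # equals the number of columns of X
df_random = hat_trace_shrunken(Z, penalty=0.5)   # n / (1 + penalty), below n
print(df_fixed, df_random)
```

Here the two fixed coefficients cost 2 degrees of freedom, while the six shrunken effects cost 4 rather than 6, which mirrors the point that the degrees of freedom for n estimated random effect values are fewer than n.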

Page 46: Spatially  structured  random  effects

k dfs for random effects

Cases shown: Poisson, negative binomial, and binomial. In each, k is approximated from the model deviance together with n and (p + 1).