Hierarchical Models and Variance Components
Hierarchical Models and Variance Components
Will Penny
Wellcome Department of Imaging Neuroscience, University College London, UK
SPM Course, London, May 2003
Outline Random Effects Analysis
Summary statistic approach (t-tests @ 2nd level) General Framework
Multiple variance components and Hierarchical models Multiple variance components
F-tests and conjunctions @2nd levelModelling fMRI serial correlation @1st level
Hierarchical models for Bayesian InferenceSPMs versus PPMs
Outline Random Effects Analysis
Summary statistic approach (t-tests @ 2nd level) General Framework
Multiple variance components and Hierarchical models Multiple variance components
F-tests and conjunctions @2nd levelModelling fMRI serial correlation @1st level
Hierarchical models for Bayesian InferenceSPMs versus PPMs
1st Level 2nd Level
^
1^
^
2^
^
11^
^
12^
Data Design Matrix Contrast Images
)ˆ(ˆ
ˆ
craV
ct
Random Effects Analysis:Summary-Statistic Approach
SPM(t)
One-samplet-test @2nd level
Validity of approach
Gold Standard approach is EM – see later –estimates population mean effect as MEANEM
the variance of this estimate as VAREM
For N subjects, n scans per subject and equal within-subject variancewe have
VAREM = Var-between/N + Var-within/Nn
In this case, the SS approach gives the same results, on average:
Avg[MEANEM
Avg[Var()] =VAREM
In other cases, with N~12, and typical ratios of between-subject to within-subject variance found in fMRI, the SS approach will give very similar results to EM.
^^
Example: Multi-session study of auditory processing
SS results EM results
Friston et al. (2003) Mixed effects and fMRI studies, Submitted.
Two populations
Contrast images
Estimatedpopulation means
Two-samplet-test @2nd level
Outline Random Effects Analysis
Summary statistic approach (t-tests @ 2nd level) General Framework
Multiple variance components and Hierarchical models Multiple variance components
F-tests and conjunctions @2nd levelModelling fMRI serial correlation @1st level
Hierarchical models for Bayesian InferenceSPMs versus PPMs
y = X + N 1 N L L 1 N 1
2 Basic AssumptionsIdentityIndependence
The General Linear Model
IC
N
N
Error covariance
We assume ‘sphericity’
y = X + N 1 N L L 1 N 1
Multiple variance components
N
N
Error covariance
QC kk
k
Errors can now have different variances and there can be correlations
We allow for ‘nonsphericity’
Errors are independent but not identical
Errors are not independent and not identical
Error Covariance
Non-Sphericity
)()()()1(
)2()2()2()1(
)1()1()1(
nnnn X
X
Xy
General Framework
)()()( i
k
i
kk
i QC
Hierarchical Models Multiple variance componentsat each level
With hierarchical models we can define priors and make Bayesian inferences.
If we know the variance components we can compute the distributions over the parameters at each level.
)()()()1(
)2()2()2()1(
)1()1()1(
nnnn X
X
Xy
E-Step
yCXC
XCXC
T
yy
T
y
1
11
M-Stepy
Xyr
for i and j {
}{
}{}{
11
11111
CQCQtrJ
XCQCXCtrrCQCrCQtrg
ijij
i
T
yi
T
ii
}
kkQCC
gJ
1
Friston, K. et al. (2002), Neuroimage
EM algorithm
)()()( i
k
i
kk
i QC
Estimation
)()()()1(
)2()2()2()1(
)1()1()1(
nnnn X
X
Xy
)()()1()()1()1()2()1()1( nnnn XXXXXy
Hierarchical model
Single-level model
ParametricEmpiricalBayes (PEB)
RestrictedMaximimumLikelihood(ReML)
Algorithm Equivalence
EM=PEB=ReML
Outline Random Effects Analysis
Summary statistic approach (t-tests @ 2nd level) General Framework
Multiple variance components and Hierarchical models Multiple variance components
F-tests and conjunctions @2nd levelModelling fMRI serial correlation @1st level
Hierarchical models for Bayesian InferenceSPMs versus PPMs
Errors are independent but not identical
Errors are not independent and not identical
Error Covariance
Non-Sphericity
Error can be Independent but Non-Identical when…
1) One parameter but from different groups
e.g. patients and control groups
2) One parameter but design matrices differ across subjects
e.g. subsequent memory effect
Non-Sphericity
Error can be Non-Independent and Non-Identical when…
1) Several parameters per subject e.g. Repeated Measurement design
2) Conjunction over several parameters e.g. Common brain activity for different cognitive processes
3) Complete characterization of the hemodynamic response e.g. F-test combining HRF, temporal derivative and dispersion regressors
Non-Sphericity
jump touch koob
Stimuli: Auditory Presentation (SOA = 4 secs) of
(i) words and (ii) words spoken backwards
Subjects: (i) 12 control subjects(ii) 11 blind subjects
Scanning: fMRI, 250 scans per subject, block design
Example I
“click”
Q. What are the regions that activate for real words relative to reverse words in both blind and control groups?
U. Noppeney et al.
2nd Level
Controls Blinds
Independent but Non-Identical Error
1st Level
Conjunctionbetween the
2 groups
Controls and Blinds
jump touch
motion actionvisualsound
Stimuli: Auditory Presentation (SOA = 4 secs) of words
Subjects: (i) 12 control subjects
Scanning: fMRI, 250 scans per subject, block design
Example 2
“click”“jump” “click” “pink” “turn”
Q. What regions are affected by the semantic content of the words ?
U. Noppeney et al.
Non-Independent and Non-Identical Error
1st Leve visual sound hand motion
2nd Level
?=
?=
?=
F-test
Scanning: fMRI, 250 scans per subject, block design
Stimuli: (i) Sentences presented visually
(ii) False fonts (symbols)
Example III
Some of the sentences are syntactically primed
U. Noppeney et al.
Q. Which brain regions of the “sentence reading system” are affected by Priming?
1st Level Sentence > Symbols No-Priming>Priming
Orthogonal contrasts
2nd Level
Non-Independent and Non-Identical Error
Conjunction of 2 contrasts
LeftAnteriorTemporal
Modelling serial correlation in fMRI time series
Model errors for each subjectas AR(1) + white noise.
Example IV
Outline Random Effects Analysis
Summary statistic approach (t-tests @ 2nd level) General Framework
Multiple variance components and Hierarchical models Multiple variance components
F-tests and conjunctions @2nd levelModelling fMRI serial correlation @1st level
Hierarchical models for Bayesian InferenceSPMs versus PPMs
Bayes Rule
Example 2:Univariate model
Likelihood and Prior
)2()2()1(
)1()1(
y
),()(
),()|()2()2()1(
)1()1()1(
Np
Nyp
)2(
)1(
)2(
)1(
)1(
)1(
)1(
)2()1()1(
)1()1()1( ),()|(
PPm
P
PmNyp
Posterior
)2( m )1( )1(
Relative Precision Weighting
Example 2:Univariate model
Likelihood and Prior
)2()2()1(
)1()1(
y
),()(
),()|()2()2()1(
)1()1()1(
Np
Nyp
)2(
)1(
)2(
)1(
)1(
)1(
)1(
)2()1()1(
)1()1()1( ),()|(
PPm
P
PmNyp
Posterior
Similar expressions exist for posterior distributionsin multivariate models
AIM: Make inferences based onposterior distribution
But how do we compute thevariance components or ‘hyperparameters’ ?
)()()()1(
)2()2()2()1(
)1()1()1(
nnnn X
X
Xy
E-Step
yCXC
XCXC
T
yy
T
y
1
11
M-Stepy
Xyr
for i and j {
}{
}{}{
11
11111
CQCQtrJ
XCQCXCtrrCQCrCQtrg
ijij
i
T
yi
T
ii
}
kkQCC
gJ
1
Friston, K. et al. (2002), Neuroimage
EM algorithm
)()()( i
k
i
kk
i QC
Estimation
Estimating mean and variance
N
nny
N 1
1
N
nny
N 1
2)(
11
N
nny
N 1
1
Maximum Likelihood (ML), maximises p(Y|,)
Expectation-Maximisation (EM),maximises
N
nny
N 1
2)(
1
11
dpYpYp )(),|()|(
for ‘vague’ prior on
Estimating mean and variance
N
nny
N 1
2)(
11
N
nny
N 1
For a prior on with prior mean 0 and prior precision Expectation-Maximisation (EM) gives
where
10
N
N
Larger more shrinkage
Estimating mean and variance atmultiple voxels
N
nini
ii
yN 1
2
, )(11
10
i
ii N
N
N
nni
ii y
N 1,
For a prior on over voxels with prior mean 0 and prior precision Expectation-Maximisation (EM) gives at voxel i=1..V, scan n=1..N
where
V
ii
ii 1
211
Prior precision can be estimated from data. If mean activation overall voxels is 0 then these EM estimates are more accurate than ML
The Interface
WLSParameters,REMLHyperparameters
PEBParametersandHyperparameters
No Priors Shrinkagepriors
Bayesian Inference
1st level = within-voxel
2nd level = between-voxels
Likelihood
Shrinkage Prior
)2()2()2()1(
)1()1()1(
X
Xy
In the absence of evidenceto the contrary parameterswill shrink to zero
LikelihoodLikelihood PriorPriorPosteriorPosterior
SPMsSPMs
PPMsPPMs
u
)(yft
)0|( tp)|( yp
Bayesian Inference: Posterior Probability Maps
)()|()|( pypyp
SPMs and PPMsS
PM
mip
[0, 0
, 0]
<
< <
PPM2.06
rest [2.06]
SPMresults:C:\home\spm\analysis_PET
Height threshold P = 0.95
Extent threshold k = 0 voxels
Design matrix1 4 7 10 13 16 19 22
147
1013161922252831343740434649525560
contrast(s)
4
SP
Mm
ip[0
, 0, 0
]
<
< <
SPM{T39.0
}
rest
SPMresults:C:\home\spm\analysis_PET
Height threshold T = 5.50
Extent threshold k = 0 voxels
Design matrix1 4 7 10 13 16 19 22
147
1013161922252831343740434649525560
contrast(s)
3
PPMs: Show activationsof a given size
SPMs: show voxels with non-zero activations
PPMs
Advantages Disadvantages
One can infer a causeDID NOT elicit a response
SPMs conflate effect-size and effect-variability
P-values don’t change withsearch volume
For reasonable thresholds have intrinsically high specificity
Use of shrinkagepriors over voxelsis computationally demanding
Utility of Bayesian approach is yetto be established
Summary Random Effects Analysis
Summary statistic approach (t-tests @ 2nd level) Multiple variance components
F-tests and conjunctions @2nd levelModelling fMRI serial correlation @1st level
Hierarchical models for Bayesian InferenceSPMs versus PPMs
General FrameworkMultiple variance components and Hierarchical models
Top Related