Applications of Variational Bayes & DAGs in Neuroimaging
ECE 6504: Advanced Topics in Machine Learning
Rosalyn Moran
Overview
1. Dynamics in Dynamic Causal Modeling
2. Graphical Model
   - Variational Inversion
   - Statistical Inference from VB
3. Examples
   - Attention in the Human Brain
   - Synesthesia
Dynamic Causal Modelling
Friston et al 2003; Stephan et al 2008
Kiebel et al, 2006; Garrido et al, 2007
David et al, 2006; Moran et al, 2007
Time Series
[Figure: time series and its rate of change dx/dt]
• DCM is not intended for 'modelling'.
• DCM is an analysis framework for empirical data.
• DCM uses a time series to test mechanistic hypotheses.
• Hypotheses are constrained by the underlying dynamic generative (biological) model.
Neural state equation:
$$\frac{dx}{dt} = F(x, u, \theta)$$
Hemodynamic forward model: neural activity → BOLD
Electromagnetic forward model: neural activity → EEG, MEG, LFP
fMRI: simple neuronal model, complicated forward model
EEG/MEG: complicated neuronal model, simple forward model
Dynamic Causal Modelling (DCM)
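DCM for fMRI
[Figure: two regions x1, x2 with connections A(1,1), A(2,1), A(1,2), A(2,2); input u1 enters via C(1); input u2 acts via B(1,2); hemodynamic models H{1}, H{2} map each region to its output y]
$$\dot{x} = (A + uB)x + Cu$$
$$y = g(x, H) + \varepsilon$$
$$\varepsilon \sim N(0, \sigma)$$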
Neuronal model
Aim: model temporal evolution of a set of neuronal states x_t
$$\frac{dx}{dt} = F(x, u, \theta)$$
System states x_t (x1, x2, x3)
Connectivity parameters θ
Inputs u_t
State changes are dependent on:
– the current state x
– external inputs u
– its connectivity θ
Example: a linear model of interacting visual regions
Visual input in the visual field: left (LVF), right (RVF)
LG = lingual gyrus, FG = fusiform gyrus
[Figure: x1 = LG left, x2 = LG right, x3 = FG left, x4 = FG right; inputs u1 = LVF, u2 = RVF]
$$\dot{x}_1 = a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + c_{12}u_2$$
$$\dot{x}_2 = a_{21}x_1 + a_{22}x_2 + a_{24}x_4 + c_{21}u_1$$
$$\dot{x}_3 = a_{31}x_1 + a_{33}x_3 + a_{34}x_4$$
$$\dot{x}_4 = a_{42}x_2 + a_{43}x_3 + a_{44}x_4$$
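$$
\underbrace{\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\end{bmatrix}}_{\text{state changes}}
=
\underbrace{\begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0\\
a_{21} & a_{22} & 0 & a_{24}\\
a_{31} & 0 & a_{33} & a_{34}\\
0 & a_{42} & a_{43} & a_{44}
\end{bmatrix}}_{\text{effective connectivity}}
\underbrace{\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}}_{\text{system state}}
+
\underbrace{\begin{bmatrix}
0 & c_{12}\\
c_{21} & 0\\
0 & 0\\
0 & 0
\end{bmatrix}}_{\text{input parameters}}
\underbrace{\begin{bmatrix}u_1\\ u_2\end{bmatrix}}_{\text{external inputs}}
$$
$$\dot{x} = Ax + Cu$$
$$\theta = \{A, C\}$$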
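The linear state equation dx/dt = Ax + Cu can be integrated numerically to see how a visual input spreads through the four regions. The sketch below is only illustrative: the sparsity pattern of A and C follows the slide, but all numerical values, the forward-Euler scheme, the step size and the function name `simulate_linear` are assumptions made for the example.

```python
import numpy as np

# Hypothetical coupling strengths; only the zero pattern comes from the slide.
A = np.array([
    [-1.0,  0.2,  0.3,  0.0],   # a11 a12 a13 0
    [ 0.2, -1.0,  0.0,  0.3],   # a21 a22 0   a24
    [ 0.4,  0.0, -1.0,  0.2],   # a31 0   a33 a34
    [ 0.0,  0.4,  0.2, -1.0],   # 0   a42 a43 a44
])
C = np.array([
    [0.0, 1.0],   # c12: u2 drives x1
    [1.0, 0.0],   # c21: u1 drives x2
    [0.0, 0.0],
    [0.0, 0.0],
])

def simulate_linear(A, C, u, dt):
    """Forward-Euler integration of dx/dt = A x + C u, starting from x = 0."""
    x = np.zeros(A.shape[0])
    trajectory = np.empty((len(u), A.shape[0]))
    for t, u_t in enumerate(u):
        x = x + dt * (A @ x + C @ u_t)
        trajectory[t] = x
    return trajectory

dt = 0.01                      # assumed step size in seconds
u = np.zeros((2000, 2))        # columns: u1, u2
u[200:1000, 1] = 1.0           # switch on u2 only, between 2 s and 10 s
states = simulate_linear(A, C, u, dt)
```

With only u2 switched on, x1 is driven directly and the other regions respond through the off-diagonal entries of A; once the input stops, the negative self-connections pull all states back towards zero.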
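Example: a linear model of interacting visual regions, with ATTENTION as input u3
$$\dot{x} = \Big(A + \sum_{j=1}^{m} u_j B^{(j)}\Big)x + Cu$$
$$
\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\end{bmatrix}
=
\left\{
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0\\
a_{21} & a_{22} & 0 & a_{24}\\
a_{31} & 0 & a_{33} & a_{34}\\
0 & a_{42} & a_{43} & a_{44}
\end{bmatrix}
+ u_3
\begin{bmatrix}
0 & b_{12}^{(3)} & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & b_{34}^{(3)}\\
0 & 0 & 0 & 0
\end{bmatrix}
\right\}
\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}
+
\begin{bmatrix}
0 & c_{12} & 0\\
c_{21} & 0 & 0\\
0 & 0 & 0\\
0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}u_1\\ u_2\\ u_3\end{bmatrix}
$$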
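Deterministic Bilinear DCM
Bilinear state equation:
$$\frac{dx}{dt} = \Big(A + \sum_{i=1}^{m} u_i B^{(i)}\Big)x + Cu$$
The term $u_i B^{(i)}$ is the modulation; the term $Cu$ is the driving input.
Simply a two-dimensional Taylor expansion (around $x_0 = 0$, $u_0 = 0$):
$$\frac{dx}{dt} = f(x, u) \approx f(x_0, 0) + \frac{\partial f}{\partial x}x + \frac{\partial f}{\partial u}u + \frac{\partial^2 f}{\partial x\,\partial u}ux + \dots$$
$$A = \left.\frac{\partial f}{\partial x}\right|_{u=0} \qquad C = \left.\frac{\partial f}{\partial u}\right|_{x=0} \qquad B = \frac{\partial^2 f}{\partial x\,\partial u}$$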
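To make the Taylor-expansion reading of A, B and C concrete, the sketch below estimates the three matrices by central finite differences around x = 0, u = 0. The nonlinear function `f` is purely hypothetical and chosen so that the expected answer is easy to check by hand; the helper names and step sizes are likewise assumptions, not part of DCM or any toolbox.

```python
import numpy as np

def f(x, u):
    """Hypothetical nonlinear dynamics with two states and one input."""
    return np.array([
        -x[0] + np.tanh(0.5 * x[1]) + u[0],
        0.8 * x[0] * (1.0 + 0.5 * u[0]) - x[1],
    ])

def jacobian(g, z0, eps=1e-5):
    """Central-difference Jacobian of g at z0 (one column per entry of z0)."""
    z0 = np.asarray(z0, dtype=float)
    cols = []
    for k in range(z0.size):
        step = np.zeros_like(z0)
        step[k] = eps
        cols.append((g(z0 + step) - g(z0 - step)) / (2.0 * eps))
    return np.stack(cols, axis=1)

def bilinear_matrices(f, n_states, n_inputs, eps=1e-4):
    """A = df/dx at u=0, C = df/du at x=0, B[j] = d/du_j (df/dx)."""
    x0 = np.zeros(n_states)
    u0 = np.zeros(n_inputs)
    A = jacobian(lambda x: f(x, u0), x0)
    C = jacobian(lambda u: f(x0, u), u0)
    B = []
    for j in range(n_inputs):
        du = np.zeros(n_inputs)
        du[j] = eps
        dfdx_plus = jacobian(lambda x: f(x, u0 + du), x0)
        dfdx_minus = jacobian(lambda x: f(x, u0 - du), x0)
        B.append((dfdx_plus - dfdx_minus) / (2.0 * eps))
    return A, B, C

A, B, C = bilinear_matrices(f, n_states=2, n_inputs=1)
```

For this particular `f`, differentiating by hand gives A = [[-1, 0.5], [0.8, -1]], C = [[1], [0]] and a single modulatory matrix with 0.4 in the (2,1) entry, which the finite-difference estimates should reproduce up to numerical error.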
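Context-dependent enhancement
[Figure: stimulus u1 drives x1; context u2 acts on the connection a21 from x1 to x2]
$$\dot{x} = Ax + u_2 B^{(2)}x + Cu$$
$$
\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\end{bmatrix}
=
\begin{bmatrix}a_{11} & 0\\ a_{21} & a_{22}\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\end{bmatrix}
+ u_2
\begin{bmatrix}0 & 0\\ b_{21}^{(2)} & 0\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\end{bmatrix}
+
\begin{bmatrix}c_{11} & 0\\ 0 & 0\end{bmatrix}
\begin{bmatrix}u_1\\ u_2\end{bmatrix}
$$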
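A short simulation can illustrate context-dependent enhancement in this two-region bilinear model. The values of a11, a21, a22, b21 and c11 below are hypothetical, and forward-Euler integration with a 10 ms step is an assumption made to keep the sketch simple.

```python
import numpy as np

A = np.array([[-1.0,  0.0],
              [ 0.5, -1.0]])    # a11, a21, a22 (hypothetical values)
B2 = np.array([[0.0, 0.0],
               [0.8, 0.0]])     # b21^(2): context u2 strengthens x1 -> x2
C = np.array([[1.0, 0.0],
              [0.0, 0.0]])      # c11: stimulus u1 drives x1

def simulate_bilinear(A, B2, C, u, dt):
    """Forward-Euler integration of dx/dt = (A + u2*B2) x + C u from x = 0."""
    x = np.zeros(2)
    out = np.empty((len(u), 2))
    for t, u_t in enumerate(u):
        dx = (A + u_t[1] * B2) @ x + C @ u_t
        x = x + dt * dx
        out[t] = x
    return out

dt = 0.01
n_steps = 3000
u = np.zeros((n_steps, 2))
u[:, 0] = 1.0          # stimulus on throughout
u[1500:, 1] = 1.0      # context switched on halfway
x = simulate_bilinear(A, B2, C, u, dt)
```

With these numbers x1 settles at 1 in both halves, while x2 settles near a21 = 0.5 without context and near a21 + b21 = 1.3 once the context input is on: the same stimulus produces a larger downstream response.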
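DCM for fMRI: the full picture
Neural state equation:
$$\dot{x} = \Big(A + \sum_j u_j B^{(j)}\Big)x + Cu$$
endogenous connectivity: $A = \dfrac{\partial \dot{x}}{\partial x}$
modulation of connectivity: $B^{(j)} = \dfrac{\partial}{\partial u_j}\dfrac{\partial \dot{x}}{\partial x}$
direct inputs: $C = \dfrac{\partial \dot{x}}{\partial u}$
[Figure: driving input u1(t) and modulatory input u2(t) over time t; neuronal states (activity x1(t), x2(t), x3(t)); integration; hemodynamic model H mapping x to BOLD y in each region. Stephan & Friston (2007), Handbook of Brain Connectivity]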
DCM: Neuronal and hemodynamic level
• The cognitive system is modelled at its underlying neuronal level (not directly accessible for fMRI).
• The modelled neuronal dynamics (x) are transformed into area-specific BOLD signals (y) by a hemodynamic model (λ).
• This overcomes regional variability at the hemodynamic level.
• DCM is not based on temporal precedence at the measurement level.
The hemodynamic "Balloon" model
[Figure: neuronal states x, integration, hemodynamic model H, output y]
• 3 hemodynamic parameters
• Region-specific HRFs
• Important for model fitting, but of no interest
Hemodynamic model
z: neuronal activity; y: BOLD response
y represents the simulated observation of the BOLD response, including noise, i.e.
$$y = h(u, \theta) + e$$
[Figure: inputs u1, u2; neuronal activity z1, z2; BOLD responses y1, y2 (with noise added); time axis 0 to 60 seconds]
Haemodynamics: reciprocal connections
[Figure: reciprocally connected regions z1, z2 with inputs u1, u2; blue: neuronal activity, red: BOLD response; outputs y1, y2 are BOLD with noise added]
Overview
1. Dynamics in Dynamic Causal Modeling
2. Graphical Model
   - Variational Inversion
   - Bayesian Statistical Inference from VB
3. Examples
   - Attention in the Human Brain
   - Synesthesia
Parameter estimation: Bayesian inversion
[Figure: inputs u1, u2; states x1, x2; outputs y1, y2]
Estimate neural and hemodynamic parameters such that the MODELLED and MEASURED BOLD signals are similar (model evidence is optimised), using variational Bayes under a mean-field approximation: P(X, λ, A, B, C | Y)
Recall from Tuesday
Main Issues in PGMs
• Representation
  - How do we store P(X1, X2, …, XN)?
  - What does my model mean/imply/assume? (Semantics)
• Inference
  - How do I answer questions/queries with my model, such as
  - Marginal estimation: P(X5 | X1, X4)
  - Most probable explanation: argmax P(X1, X2, …, XN)
• Learning
  - How do we learn parameters and structure of P(X1, X2, …, XN) from data?
  - What is the right model for my data?
VB: a procedure to do inference that implicitly 'does double duty' in directed graphs!
Key Results for VB
• Approximate inference using constrained optimization
• Where: the approximation arises from constructing an approximating distribution over X, q(X), which is closest to p(X) "in the KL sense"
• Derived a cost function which can be maximized:
$$F = \sum_{\phi} \mathbb{E}_q[\ln \phi] + H[q]$$
• Maximizing F is equivalent to minimizing KL(q|p):
$$F = \ln Z - KL(q|p)$$
• Z: partition function; a normalization function equal to the probability of the evidence in directed graphs
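The identity F = ln Z - KL(q|p) can be checked numerically on a small discrete example. In the sketch below the unnormalised measure `p_tilde` stands in for the product of factors φ (so Σ_φ ln φ = ln p_tilde); its values, the number of states and the random seed are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalised measure over 6 joint states, standing in for the product of factors.
p_tilde = rng.uniform(0.1, 2.0, size=6)
Z = p_tilde.sum()              # partition function
p = p_tilde / Z                # normalised target distribution

def free_energy(q, p_tilde):
    """F = E_q[ln p_tilde] + H[q] for a discrete distribution q."""
    return np.sum(q * np.log(p_tilde)) - np.sum(q * np.log(q))

def kl(q, p):
    """KL(q|p) for discrete distributions with full support."""
    return np.sum(q * np.log(q / p))

q = rng.dirichlet(np.ones(6))  # an arbitrary approximating distribution
F = free_energy(q, p_tilde)
```

Because KL(q|p) is non-negative, F never exceeds ln Z, and it reaches ln Z exactly when q equals p; this is why maximizing F and minimizing KL(q|p) are the same problem.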
Key Result for Mean-Field, Structured VB
• The structured variational approach aims to optimize F over a coherent distribution q (i.e. giving a proper joint distribution), at the expense of capturing all the information in p.
• Assume the approximating or proposal density factorizes over groups of parameters, where this factorization is a relaxation (a superspace) of the space of true marginals.
• Approximate q using a factorization:
$$q(X) = \prod_i q(x_i)$$
• Found iterative update equations for q using fixed-point solutions (Z here is the normalizer of the factor $q(x_i)$):
$$q(x_i) = \frac{1}{Z}\exp[I(x_i)]$$
• F is a guaranteed lower bound on ln(Z):
$$F = \ln Z - KL(q|p)$$
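The fixed-point updates can be sketched on the smallest possible case: two binary variables with a fully factorized q(x1, x2) = q1(x1) q2(x2). Each update sets one factor proportional to exp of the expected log of the unnormalised joint under the other factor. The joint table and the starting values are hypothetical.

```python
import numpy as np

# Hypothetical unnormalised joint p_tilde(x1, x2) over two binary variables.
log_p_tilde = np.log(np.array([[4.0, 1.0],
                               [1.0, 3.0]]))
log_Z = np.log(np.exp(log_p_tilde).sum())

def normalise(log_w):
    """Turn log-weights into a probability vector."""
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def free_energy(q1, q2):
    """F = E_q[ln p_tilde] + H[q] for the factorized q = q1 * q2."""
    q = np.outer(q1, q2)
    return np.sum(q * log_p_tilde) - np.sum(q * np.log(q))

q1 = np.array([0.6, 0.4])
q2 = np.array([0.5, 0.5])
history = []
for _ in range(50):
    # I(x1) = E_{q2}[ln p_tilde(x1, x2)], then q1 is proportional to exp(I(x1))
    q1 = normalise(log_p_tilde @ q2)
    # I(x2) = E_{q1}[ln p_tilde(x1, x2)], then q2 is proportional to exp(I(x2))
    q2 = normalise(log_p_tilde.T @ q1)
    history.append(free_energy(q1, q2))
```

Each sweep can only increase F, and F stays below ln Z because a product of marginals cannot represent the correlation in this joint table.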
DCM: Probabilistic Graphical Model Representation
[Figure: two-region DCM with inputs u1, u2, states x1, x2, outputs y1, y2, and connectivity parameters a12, b12 governing the dynamics]
[Figure: the same model unrolled in time, with state nodes x1(t), x1(t+1), x1(t+2), x2(t), x2(t+1), x2(t+2) and observation nodes y1(t), y1(t+1), y1(t+2), y2(t), y2(t+1), y2(t+2)]
Causal links are expressed through implicit delays, which makes the graph a Directed Acyclic Graph (DAG).
[Figure: DAG with parameter nodes A, B, C, H and λ, states x, and observations y inside a plate of size N]
$$p(y \mid A, B, C, X, H) \to N\big(f(A, B, C, X, H),\ \lambda I\big)$$
N = time steps × number of regions
Bayes Net: Probabilistic Graphical Model (PGM)
[Figure: reduced graph in which the connectivity parameters θ and the noise parameter λ are parents of the data y]
Goal: find the set of latent variables θ, given y: p(θ|y), i.e. inference, or a query for the marginal distribution of the connectivity parameters given the data, marginalized with respect to the noise parameter.
Given this type of graph we know:
$$p(\theta, \lambda \mid y) = \frac{p(y \mid \theta, \lambda)\, p(\theta)\, p(\lambda)}{p(y)}$$
and θ and λ are not independent given y.
But employ an approximating density q, using the mean-field structure:
$$p(\theta, \lambda \mid y) \approx q(\theta \mid y)\, q(\lambda \mid y)$$
Where:
$$q(\theta \mid y) \to N(\mu, \Sigma)$$
$$q(\lambda \mid y) \to N(0, \lambda I)$$
VB with a mean-field approximation (Daunizeau et al. 2009)
Goal: find the set of latent variables θ, given y.
[Figure: true posterior p(θ, λ|y) and its factorized approximation q(θ|y) q(λ|y)]
• Assuming independence of parameters and hyperparameters
• And a Gaussian form on the PDF
• Mean-field approximation:
$$p(\theta, \lambda \mid y) \approx q(\theta \mid y)\, q(\lambda \mid y)$$
• Free-energy approximation to model evidence:
$$F = \ln p(y) - KL\big(q(\theta, \lambda \mid y),\ p(\theta, \lambda \mid y)\big)$$
• Fixed-point solutions for two factors:
$$q(\theta) \propto \exp\big(I(\theta)\big) = \exp\big[\langle \ln p(y, \theta, \lambda) \rangle_{q(\lambda)}\big]$$
$$q(\lambda) \propto \exp\big(I(\lambda)\big) = \exp\big[\langle \ln p(y, \theta, \lambda) \rangle_{q(\theta)}\big]$$
• Iterative updating of sufficient statistics of approximate posteriors by gradient ascent.
How independent are neural and hemodynamic parameter estimates?
[Figure: 40 × 40 matrix over the parameter blocks A, B, C, θh and ε, colour scale from -1 to 1, for $q(\theta \mid y) \to N(\mu, \Sigma)$. Stephan et al. (2007) NeuroImage]
Roadmap inversion
Regional responses
Specify generative forward model (with prior distributions of parameters): $p(y \mid \theta, m)$
Variational Expectation-Maximization algorithm
Iterative procedure:
1. Compute model response using current set of parameters
2. Compare model response with data
3. Improve parameters, if possible
Results:
1. Gaussian posterior distributions of parameters, with mean $\mu_{\theta|y}$
2. Model evidence $p(y \mid m)$
Inference about DCM parameters: Bayesian single subject analysis
• Gaussian assumptions about the posterior distributions of the parameters
• Posterior probability that a certain parameter (or contrast of parameters) is above a chosen threshold γ
• By default, γ is chosen as zero, the prior ("does the effect exist?").
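Under a Gaussian posterior, the probability that a parameter (or a contrast of parameters) exceeds a threshold γ is a one-line computation with the normal cumulative distribution function. The function names and the example numbers below are hypothetical.

```python
import math
import numpy as np

def prob_exceeds(mu, var, gamma=0.0):
    """P(theta > gamma) when the posterior of theta is N(mu, var)."""
    z = (mu - gamma) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def contrast_prob(c, mu, Sigma, gamma=0.0):
    """P(c'theta > gamma) for a multivariate Gaussian posterior N(mu, Sigma)."""
    c = np.asarray(c, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    return prob_exceeds(float(c @ mu), float(c @ Sigma @ c), gamma)

# Hypothetical posterior for a single coupling parameter.
p_single = prob_exceeds(mu=0.4, var=0.04)
# Hypothetical contrast "parameter 1 minus parameter 2".
p_contrast = contrast_prob([1, -1], [0.5, 0.2], [[0.04, 0.0], [0.0, 0.04]])
```

A contrast vector such as [1, -1] asks whether one parameter is larger than another; its posterior mean is c'μ and its posterior variance is c'Σc.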
Inference about DCM parameters: Bayesian parameter averaging
FFX group analysis
• Likelihood distributions from different subjects are independent
• Under Gaussian assumptions, this is easy to compute
• Simply 'weigh' each subject's contribution by your certainty of the parameter
Group posterior covariance, from the individual posterior covariances:
$$\Sigma^{-1}_{\theta \mid y_1, \dots, y_N} = \sum_{i=1}^{N} \Sigma^{-1}_{\theta \mid y_i}$$
Group posterior mean, from the individual posterior covariances and means:
$$\mu_{\theta \mid y_1, \dots, y_N} = \Sigma_{\theta \mid y_1, \dots, y_N} \left( \sum_{i=1}^{N} \Sigma^{-1}_{\theta \mid y_i}\, \mu_{\theta \mid y_i} \right)$$
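Bayesian parameter averaging amounts to a precision-weighted mean: subjects whose posteriors are tighter contribute more. A minimal NumPy sketch follows; the function name and the toy subject posteriors are hypothetical.

```python
import numpy as np

def bayesian_parameter_average(means, covs):
    """Combine per-subject Gaussian posteriors N(mu_i, Sigma_i).

    Group precision is the sum of the subject precisions; the group mean is
    the precision-weighted sum of subject means, scaled by the group covariance.
    """
    precisions = [np.linalg.inv(np.atleast_2d(S)) for S in covs]
    group_cov = np.linalg.inv(sum(precisions))
    weighted = sum(P @ np.atleast_1d(m) for P, m in zip(precisions, means))
    return group_cov @ weighted, group_cov

# Two hypothetical subjects, one parameter each: the second is less certain.
means = [np.array([0.0]), np.array([2.0])]
covs = [np.array([[1.0]]), np.array([[3.0]])]
group_mean, group_cov = bayesian_parameter_average(means, covs)
```

Here the group estimate is pulled towards the first subject (mean 0.5 rather than the plain average of 1.0), because that subject's posterior variance is three times smaller.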
Inference about DCM parameters: RFX analysis (frequentist)
• 'Summary Statistic Approach'
Separate fitting of identical models for each subject
Selection of parameters of interest
one-sample t-test: parameter > 0 ?
paired t-test: parameter 1 > parameter 2 ?
rmANOVA: e.g. in case of multiple sessions per subject
Inference about models: Bayesian model comparison
• Prior to, or instead of, inference on parameters
• Which of various mechanisms / models best explains my data?
• Use model evidence, which
  - accounts for both accuracy and complexity of the model
  - allows for inference about structure (generalisability) of the model
Fixed effects model selection via the log group Bayes factor (sum over subjects k):
$$\ln GBF_{1,2} = \sum_k \ln p(y_k \mid m_1) - \sum_k \ln p(y_k \mid m_2)$$
Random effects model selection via the model probability $p(r \mid y, \alpha)$:
$$\langle r_k \rangle_q = \frac{\alpha_k}{\alpha_1 + \dots + \alpha_K}$$
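Both group-level summaries are simple to compute once per-subject log evidences (or the Dirichlet parameters α of the random-effects posterior) are available. The sketch below uses made-up numbers; the function names are hypothetical, and the estimation of α itself is not shown.

```python
import numpy as np

def log_group_bayes_factor(log_ev_m1, log_ev_m2):
    """Fixed effects: sum of log evidences for m1 minus the sum for m2."""
    return float(np.sum(log_ev_m1) - np.sum(log_ev_m2))

def expected_model_frequencies(alpha):
    """Random effects: <r_k> = alpha_k / (alpha_1 + ... + alpha_K)."""
    alpha = np.asarray(alpha, dtype=float)
    return alpha / alpha.sum()

# Hypothetical log evidences (e.g. free energies) for three subjects.
log_ev_m1 = np.array([-100.0, -102.0, -98.0])
log_ev_m2 = np.array([-103.0, -101.0, -99.0])
log_gbf = log_group_bayes_factor(log_ev_m1, log_ev_m2)

# Hypothetical Dirichlet parameters for two models.
r_expected = expected_model_frequencies([3.0, 1.0])
```

The fixed-effects sum assumes every subject uses the same model, so one subject with a very large log-evidence difference can dominate; the random-effects summary instead describes how frequently each model is expected to occur in the population.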
Bayes factors
For a given dataset, to compare two models, we compare their evidences
$$B_{12} = \frac{p(y \mid m_1)}{p(y \mid m_2)}$$
or their log evidences
$$\ln(B_{12}) \approx F_1 - F_2$$
Kass & Raftery classification (Kass & Raftery 1995, J. Am. Stat. Assoc.):
B12          p(m1|y)    Evidence
1 to 3       50-75%     weak
3 to 20      75-95%     positive
20 to 150    95-99%     strong
≥ 150        ≥ 99%      very strong
Ketamine modulates: 1. all extrinsic connections, 2. intrinsic NMDA and 3. inhibitory / modulatory processes (one of the red arrows): use log Bayes factors
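The p(m1|y) column of the Kass & Raftery table follows from the Bayes factor when both models have equal prior probability: p(m1|y) = B12 / (1 + B12). The sketch below computes this from a log Bayes factor and attaches the verbal label; treating each range as half-open (for example, 3 ≤ B12 < 20 is "positive") and the label for B12 < 1 are assumptions, since the table does not say how boundary values are assigned.

```python
import math

def posterior_prob_m1(log_b12):
    """p(m1|y) under equal prior model probabilities: B12 / (1 + B12)."""
    return 1.0 / (1.0 + math.exp(-log_b12))

def kass_raftery_label(b12):
    """Verbal evidence category for m1 over m2 (half-open ranges assumed)."""
    if b12 < 1.0:
        return "favours m2"
    if b12 < 3.0:
        return "weak"
    if b12 < 20.0:
        return "positive"
    if b12 < 150.0:
        return "strong"
    return "very strong"

# Hypothetical free energies of two models fitted to the same data.
F1, F2 = -310.0, -313.5
log_b12 = F1 - F2
p_m1 = posterior_prob_m1(log_b12)
label = kass_raftery_label(math.exp(log_b12))
```

A log-evidence difference of about 3 corresponds to B12 of roughly 20, which is where the table switches from "positive" to "strong".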
Bayesian Model Comparison
One other way to view F!!
$$F = \log p(y \mid m) - KL\big[q(\theta),\ p(\theta \mid y, m)\big]$$
F = Accuracy - Complexity, where the complexity is
$$KL\big[q(\theta),\ p(\theta \mid m)\big] = \frac{1}{2}\ln|\Sigma_\theta| - \frac{1}{2}\ln|\Sigma_{\theta|y}| + \frac{1}{2}\big(\mu_{\theta|y} - \mu_\theta\big)^T \Sigma_\theta^{-1} \big(\mu_{\theta|y} - \mu_\theta\big)$$
The complexity term of F is higher
• the more independent the prior parameters (↑ effective DFs)
• the more dependent the posterior parameters
• the more the posterior mean deviates from the prior mean
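The three statements about the complexity term can be checked numerically. The sketch below evaluates the three-term expression from the slide (prior log-determinant, posterior log-determinant, and the prior-weighted squared distance between the means); the function name and all covariance values are hypothetical.

```python
import numpy as np

def complexity(mu_prior, S_prior, mu_post, S_post):
    """0.5*ln|S_prior| - 0.5*ln|S_post| + 0.5*d' S_prior^{-1} d, d = mu_post - mu_prior."""
    d = np.asarray(mu_post, dtype=float) - np.asarray(mu_prior, dtype=float)
    _, logdet_prior = np.linalg.slogdet(S_prior)
    _, logdet_post = np.linalg.slogdet(S_post)
    return 0.5 * logdet_prior - 0.5 * logdet_post + 0.5 * d @ np.linalg.solve(S_prior, d)

zero = np.zeros(2)
prior_indep = np.eye(2)
prior_corr = np.array([[1.0, 0.8], [0.8, 1.0]])
post_indep = 0.5 * np.eye(2)
post_corr = np.array([[0.5, 0.4], [0.4, 0.5]])

base = complexity(zero, prior_indep, zero, post_indep)
shifted_mean = complexity(zero, prior_indep, np.array([1.0, 0.0]), post_indep)
dependent_post = complexity(zero, prior_indep, zero, post_corr)
dependent_prior = complexity(zero, prior_corr, zero, post_indep)
```

With the means held equal, correlations in the posterior shrink its determinant and raise the complexity, while correlations in the prior shrink the prior determinant and lower it; moving the posterior mean away from the prior mean adds the quadratic penalty.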
Overview
1. Dynamics in Dynamic Causal Modeling
2. Graphical Model
   - Variational Inversion
   - Statistical Inference from VB
3. Examples
   - Attention in the Human Brain
   - Synesthesia
Example: Attention to motion
[Figure: inputs Photic, Motion and Attention plotted against time [s]; regions V1, V5 and SPC. Friston et al. (2003) NeuroImage]
We used this model to assess the site of attention modulation during visual motion processing in an fMRI paradigm reported by Büchel & Friston.
Attention to motion in the visual system
- fixation only
- observe static dots: + photic (V1)
- observe moving dots: + motion (V5)
- task on moving dots: + attention (V5 + parietal cortex)
Bayesian model selection
[Figure: four candidate models m1 to m4 over V1, V5 and PPC with external stimulus input, each with a different placement of the modulation by attention; comparison of the models' marginal likelihood $\ln p(y \mid m)$; estimated effective synaptic strengths for the best model (m4): 1.25, 0.13, 0.46, 0.39, 0.26, 0.26, 0.10]
Parameter inference
[Figure: best model with inputs stim, motion and attention over V1, V5 and PPC (estimates 1.25, 0.13, 0.46, 0.39, 0.26, 0.50, 0.26, 0.10), and the posterior density of one parameter with MAP = 1.25. Stephan et al. 2008, NeuroImage]
$$p\big(D^{PPC}_{V5,V1} > 0 \mid y\big) = 99.1\%$$
Data fits
[Figure: observed and fitted responses in V1, V5 and PPC for three conditions: motion & attention, motion & no attention, static dots]
Example 2: Brain Connectivity in Synesthesia
• Specific sensory stimuli lead to unusual, additional experiences
• Grapheme-color synesthesia: color
• Involuntary, automatic; stable over time, prevalence ~4%
• Potential cause: aberrant cross-activation between brain areas
  - grapheme encoding area
  - color area V4
  - superior parietal lobule (SPL)
Hubbard, 2007
Can changes in effective connectivity explain synesthesia activity in V4?
DCM of Synesthesia
Models (Hubbard, 2007; Van Leeuwen, den Ouden, Hagoort (2011) JNeurosci)
Model evidence: F ≤ ln Z
Relative model evidence predicts sensory experience
Van Leeuwen, den Ouden, Hagoort (2011) JNeurosci
DCM Roadmap
[Figure: roadmap linking neuronal dynamics, haemodynamics, state-space model, priors, fMRI data, Bayesian model inversion, posterior parameters and model comparison]
Some useful references
• 10 Simple Rules for DCM (2010). Stephan et al. NeuroImage 52.
• The first DCM paper: Dynamic Causal Modelling (2003). Friston et al. NeuroImage 19:1273-1302.
• Physiological validation of DCM for fMRI: Identifying neural drivers with functional MRI: an electrophysiological validation (2008). David et al. PLoS Biol. 6:2683-2697.
• Hemodynamic model: Comparing hemodynamic models with DCM (2007). Stephan et al. NeuroImage 38:387-401.
• Nonlinear DCM: Nonlinear Dynamic Causal Models for FMRI (2008). Stephan et al. NeuroImage 42:649-662.
• Two-state DCM: Dynamic causal modelling for fMRI: A two-state model (2008). Marreiros et al. NeuroImage 39:269-278.
• Stochastic DCM: Generalised filtering and stochastic DCM for fMRI (2011). Li et al. NeuroImage 58:442-457.
• Bayesian model comparison: Comparing families of dynamic causal models (2010). Penny et al. PLoS Comput Biol. 6(3):e1000709.