
Applications of Variational Bayes & DAGs in Neuroimaging

ECE 6504: Advanced Topics in Machine Learning

Rosalyn Moran

Overview

1.  Dynamics in Dynamic Causal Modeling

2.  Graphical Model
    - Variational Inversion
    - Statistical Inference from VB

3.  Examples
    - Attention in the Human Brain
    - Synesthesia

Dynamic Causal Modelling

Friston et al 2003; Stephan et al 2008

Kiebel et al, 2006; Garrido et al, 2007

David et al, 2006; Moran et al, 2007

[Figure: a time series generated by the dynamics dx/dt.]

•  DCM is not intended for 'modelling'.
•  DCM is an analysis framework for empirical data.
•  DCM uses a time series to test mechanistic hypotheses.
•  Hypotheses are constrained by the underlying dynamic generative (biological) model.

Dynamic Causal Modelling (DCM)

Neural state equation:

$$\frac{dx}{dt} = F(x, u, \theta)$$

•  fMRI: simple neuronal model, complicated forward model. Hemodynamic forward model: neural activity → BOLD.
•  EEG/MEG: complicated neuronal model, simple forward model. Electromagnetic forward model: neural activity → EEG, MEG, LFP.

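DCM for fMRI

$$\dot{x} = (A + uB)\,x + Cu$$

$$y = g(x, H) + \varepsilon, \qquad \varepsilon \sim N(0, \sigma)$$

[Figure: two regions x1 and x2 with connections A(1,1), A(1,2), A(2,1) and A(2,2); input u1 enters through C(1), input u2 acts through B(1,2), and each region produces a measurement y through its hemodynamic model, H{1} or H{2}.]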

Neuronal model

Aim: model the temporal evolution of a set of neuronal states $x_t$:

$$\frac{dx}{dt} = F(x, u, \theta)$$

•  System states $x_t$ (e.g. regions x1, x2, x3)
•  Connectivity parameters θ
•  Inputs $u_t$

State changes are dependent on:

-  the current state x
-  external inputs u
-  its connectivity θ

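Example: a linear model of interacting visual regions

Visual input is presented in the left (LVF) or right (RVF) visual field. LG = lingual gyrus, FG = fusiform gyrus.

[Figure: four regions, x1 = LG left, x2 = LG right, x3 = FG left, x4 = FG right; RVF input u2 drives LG left and LVF input u1 drives LG right.]

$$\dot{x}_1 = a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + c_{12}u_2$$

$$\dot{x}_2 = a_{21}x_1 + a_{22}x_2 + a_{24}x_4 + c_{21}u_1$$

$$\dot{x}_3 = a_{31}x_1 + a_{33}x_3 + a_{34}x_4$$

$$\dot{x}_4 = a_{42}x_2 + a_{43}x_3 + a_{44}x_4$$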


In matrix form:

$$
\underbrace{\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\end{bmatrix}}_{\text{state changes}}
=
\underbrace{\begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0\\
a_{21} & a_{22} & 0 & a_{24}\\
a_{31} & 0 & a_{33} & a_{34}\\
0 & a_{42} & a_{43} & a_{44}
\end{bmatrix}}_{\text{effective connectivity}}
\underbrace{\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}}_{\text{system state}}
+
\underbrace{\begin{bmatrix}
0 & c_{12}\\
c_{21} & 0\\
0 & 0\\
0 & 0
\end{bmatrix}}_{\text{input parameters}}
\underbrace{\begin{bmatrix}u_1\\ u_2\end{bmatrix}}_{\text{external inputs}}
$$

$$\dot{x} = Ax + Cu, \qquad \theta = \{A, C\}$$
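The linear state equation can be integrated numerically to see how a driving input spreads through the network. The following is a minimal sketch only: the coupling values, the forward-Euler scheme and the function name `simulate_linear` are illustrative assumptions, not values or code from the lecture.

```python
import numpy as np

# Hypothetical coupling values; the zero pattern follows the A and C matrices above.
# Negative self-connections keep the system stable.
A = np.array([
    [-1.0,  0.2,  0.3,  0.0],   # x1: a11 a12 a13 0
    [ 0.2, -1.0,  0.0,  0.3],   # x2: a21 a22 0   a24
    [ 0.4,  0.0, -1.0,  0.2],   # x3: a31 0   a33 a34
    [ 0.0,  0.4,  0.2, -1.0],   # x4: 0   a42 a43 a44
])
C = np.array([
    [0.0, 1.0],   # c12: u2 drives x1
    [1.0, 0.0],   # c21: u1 drives x2
    [0.0, 0.0],
    [0.0, 0.0],
])

def simulate_linear(A, C, u, dt=0.01):
    """Forward-Euler integration of dx/dt = A x + C u, starting from x = 0."""
    x = np.zeros(A.shape[0])
    trajectory = np.empty((len(u), A.shape[0]))
    for t, u_t in enumerate(u):
        x = x + dt * (A @ x + C @ u_t)
        trajectory[t] = x
    return trajectory

# u1 is switched on for the first 2 s (200 steps); u2 stays off.
u = np.zeros((1000, 2))
u[:200, 0] = 1.0
x = simulate_linear(A, C, u)
```

With these assumed values, x2 responds first (it receives u1 directly) and the other regions follow through the off-diagonal entries of A; once the input is switched off, all states decay back to zero.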

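[Figure: the same four-region network with an additional modulatory input, ATTENTION (u3).]

$$\dot{x} = \Big(A + \sum_{j=1}^{m} u_j B^{(j)}\Big)x + Cu$$

$$
\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\\ \dot{x}_3\\ \dot{x}_4\end{bmatrix}
=
\left(
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & 0\\
a_{21} & a_{22} & 0 & a_{24}\\
a_{31} & 0 & a_{33} & a_{34}\\
0 & a_{42} & a_{43} & a_{44}
\end{bmatrix}
+ u_3
\begin{bmatrix}
0 & b_{12}^{(3)} & 0 & 0\\
0 & 0 & 0 & 0\\
0 & 0 & 0 & b_{34}^{(3)}\\
0 & 0 & 0 & 0
\end{bmatrix}
\right)
\begin{bmatrix}x_1\\ x_2\\ x_3\\ x_4\end{bmatrix}
+
\begin{bmatrix}
0 & c_{12} & 0\\
c_{21} & 0 & 0\\
0 & 0 & 0\\
0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}u_1\\ u_2\\ u_3\end{bmatrix}
$$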

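Deterministic Bilinear DCM

Bilinear state equation:

$$\frac{dx}{dt} = \Big(A + \sum_{i=1}^{m} u_i B^{(i)}\Big)x + Cu$$

Here $B^{(i)}$ encodes the modulation by input $u_i$ and $C$ encodes the driving input.

This is simply a two-dimensional Taylor expansion (around $x_0 = 0$, $u_0 = 0$):

$$\frac{dx}{dt} = f(x, u) \approx f(x_0, 0) + \frac{\partial f}{\partial x}x + \frac{\partial f}{\partial u}u + \frac{\partial^2 f}{\partial x\,\partial u}ux + \dots$$

$$A = \left.\frac{\partial f}{\partial x}\right|_{u=0}, \qquad C = \left.\frac{\partial f}{\partial u}\right|_{x=0}, \qquad B = \frac{\partial^2 f}{\partial x\,\partial u}$$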
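To make the Taylor-expansion reading concrete, the sketch below recovers A, B and C by central finite differences from a scalar nonlinear function. The function `f`, its coefficients and the step size `h` are hypothetical choices for illustration; DCM itself does not estimate its parameters this way.

```python
import math

def f(x, u):
    """Hypothetical nonlinear dynamics, used only for illustration."""
    return -math.sin(x) + 0.5 * u + 0.3 * x * u

h = 1e-3  # finite-difference step (an assumption)

# A = df/dx at u = 0, C = df/du at x = 0, B = d2f/dxdu, all evaluated at the origin
A = (f(h, 0.0) - f(-h, 0.0)) / (2 * h)
C = (f(0.0, h) - f(0.0, -h)) / (2 * h)
B = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)

def f_bilinear(x, u):
    """Bilinear approximation dx/dt = (A + u B) x + C u."""
    return (A + u * B) * x + C * u
```

Near the origin `f_bilinear` tracks `f` closely; the two differ only through higher-order terms that the bilinear form drops (here the cubic term of the sine).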

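Context-dependent enhancement

[Figure: stimulus u1 drives region x1; x1 projects to x2 through a21; context u2 modulates this connection.]

$$\dot{x} = Ax + u_2 B^{(2)} x + Cu$$

$$
\begin{bmatrix}\dot{x}_1\\ \dot{x}_2\end{bmatrix}
=
\begin{bmatrix} a_{11} & 0\\ a_{21} & a_{22}\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\end{bmatrix}
+ u_2
\begin{bmatrix} 0 & 0\\ b_{21}^{(2)} & 0\end{bmatrix}
\begin{bmatrix}x_1\\ x_2\end{bmatrix}
+
\begin{bmatrix} c_{11} & 0\\ 0 & 0\end{bmatrix}
\begin{bmatrix}u_1\\ u_2\end{bmatrix}
$$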
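A short simulation shows what context-dependent enhancement means for this two-region system. It is a sketch under assumed parameter values (a11 = a22 = -1, a21 = 0.5, b21 = 0.5, c11 = 1) with forward-Euler integration; the function name `simulate_bilinear` is hypothetical.

```python
import numpy as np

A = np.array([[-1.0, 0.0], [0.5, -1.0]])    # a21: connection x1 -> x2
B2 = np.array([[0.0, 0.0], [0.5, 0.0]])     # b21^(2): u2 modulates x1 -> x2
C = np.array([[1.0, 0.0], [0.0, 0.0]])      # c11: u1 drives x1

def simulate_bilinear(u, dt=0.01):
    """Forward-Euler integration of dx/dt = (A + u2 B2) x + C u."""
    x = np.zeros(2)
    out = np.empty((len(u), 2))
    for t, u_t in enumerate(u):
        x = x + dt * ((A + u_t[1] * B2) @ x + C @ u_t)
        out[t] = x
    return out

n_steps = 1000                                   # 10 s at dt = 0.01
stimulus_only = np.column_stack([np.ones(n_steps), np.zeros(n_steps)])
stimulus_and_context = np.ones((n_steps, 2))

x_off = simulate_bilinear(stimulus_only)         # context absent
x_on = simulate_bilinear(stimulus_and_context)   # context present
```

With these values the steady-state response of x1 is the same in both runs, while the response of x2 doubles (from 0.5 to 1.0) when the context input is on, because the effective connection becomes a21 + b21.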

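Neural state equation:

$$\dot{x} = \Big(A + \sum_j u_j B^{(j)}\Big)x + Cu$$

•  endogenous connectivity: $A = \dfrac{\partial \dot{x}}{\partial x}$
•  modulation of connectivity: $B^{(j)} = \dfrac{\partial}{\partial u_j}\dfrac{\partial \dot{x}}{\partial x}$
•  direct inputs: $C = \dfrac{\partial \dot{x}}{\partial u}$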

DCM for fMRI: the full picture

[Figure: driving input u1(t) and modulatory input u2(t) act on the neuronal states (activity x1(t), x2(t), x3(t)); integration of the state equation gives x, and the hemodynamic model H maps each region's activity to a BOLD signal y. Stephan & Friston (2007), Handbook of Brain Connectivity.]

DCM: Neuronal and hemodynamic level

•  The cognitive system is modelled at its underlying neuronal level (not directly accessible for fMRI).
•  The modelled neuronal dynamics (x) are transformed into area-specific BOLD signals (y) by a hemodynamic model (λ).
•  This overcomes regional variability at the hemodynamic level.
•  DCM is not based on temporal precedence at the measurement level.

Hemodynamic model

[Figure: integration of the neuronal states x through the hemodynamic model H yields the measurement y.]

The hemodynamic "Balloon" model:

•  3 hemodynamic parameters
•  Region-specific HRFs
•  Important for model fitting, but of no interest

Haemodynamics: reciprocal connections

[Figure: inputs u1 and u2 act on two reciprocally connected regions z1 and z2. For each region, the panels plot neuronal activity z (blue) and the BOLD response y with noise added (red) against time in seconds (0 to 60).]

y represents the simulated observation of the BOLD response, including noise, i.e.

$$y = h(u, \theta) + e$$


Parameter estimation: Bayesian inversion

[Figure: inputs u1, u2; neuronal states x1, x2; BOLD signals y1, y2.]

Estimate the neural and hemodynamic parameters such that the MODELLED and MEASURED BOLD signals are similar (model evidence is optimised), using variational Bayes under a mean-field assumption: P(X, λ, A, B, C | Y)

Recall from Tuesday

Main Issues in PGMs

•  Representation
   -  How do we store P(X1, X2, …, XN)?
   -  What does my model mean/imply/assume? (Semantics)

•  Inference
   -  How do I answer questions/queries with my model, such as
   -  Marginal Estimation: P(X5 | X1, X4)
   -  Most Probable Explanation: argmax P(X1, X2, …, XN)

•  Learning
   -  How do we learn parameters and structure of P(X1, X2, …, XN) from data?
   -  What is the right model for my data?

VB: a procedure for inference that implicitly 'does double duty' in directed graphs!

Key Results for VB

•  Approximate inference using constrained optimization.
•  The approximation arises from constructing an approximating distribution over X, q(X), which is closest to p(X) "in the KL sense".
•  We derived a cost function F which can be maximized; maximizing F is equivalent to minimizing KL(q|p):

$$F = \ln Z - KL(q|p)$$

$$F = \sum_{\phi} \langle \ln \phi \rangle_q + H[q]$$

•  Z: partition function; a normalization function equal to the probability of the evidence in directed graphs.
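The two expressions for F can be checked numerically on a small discrete example. The numbers below are arbitrary illustrative values, with `p_tilde` standing in for the unnormalized product of factors.

```python
import numpy as np

p_tilde = np.array([2.0, 1.0, 0.5, 0.5])   # unnormalized measure (product of factors)
Z = p_tilde.sum()                          # partition function
p = p_tilde / Z                            # normalized target distribution
q = np.array([0.4, 0.3, 0.2, 0.1])         # an arbitrary approximating distribution

energy = np.sum(q * np.log(p_tilde))       # expected log factors under q
entropy = -np.sum(q * np.log(q))           # H[q]
F = energy + entropy

kl = np.sum(q * np.log(q / p))             # KL(q|p)
gap = np.log(Z) - F                        # equals KL(q|p), so F <= ln Z
```

Because KL(q|p) is non-negative, F never exceeds ln Z, and the two coincide only when q equals p.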

Key Result for Mean-Field, Structured VB

•  The structured variational approach aims to optimize F over a coherent distribution q (i.e. giving a proper joint distribution), at the expense of capturing all the information in p.
•  Assume the approximating or proposal density factorizes over groups of parameters, where this factorization is a relaxation (a superspace) of the space of true marginals.
•  Approximate q using a factorization:

$$q(X) = \prod_i q(x_i)$$

•  Found iterative update equations for q using fixed-point solutions:

$$q(x_i) = \frac{1}{Z}\exp[I(x_i)]$$

•  F is a guaranteed lower bound on ln(Z):

$$F = \ln Z - KL(q|p)$$

DCM: Probabilistic Graphical Model Representation

[Figure: the two-region DCM (inputs u1, u2; states x1, x2; connections a12, b12; measurements y1, y2) unrolled over time into state nodes x1(t), x1(t+1), x1(t+2), x2(t), x2(t+1), x2(t+2) with measurement nodes y1(t), y1(t+1), y1(t+2), y2(t), y2(t+1), y2(t+2); the edges follow the dynamics.]

Causal links are expressed through implicit delays, which makes the graph a Directed Acyclic Graph (DAG).


Bayes Net: Probabilistic Graphical Model

[Figure: DAG in which the parameters A, B, C and H enter the dynamics x, and x together with the noise parameter λ generates the data y; a plate of size N encloses the repeated nodes.]

$$p(y \mid A, B, C, X, H) \rightarrow N\big(f(A, B, C, X, H),\, \lambda I\big)$$

N = time steps × number of regions


[Figure: Bayes net with the parameters θ and the noise parameter λ as parents of the data y.]

Goal: find the set of latent variables θ given y, p(θ|y), i.e. inference: query for the marginal distribution of the connectivity parameters given the data, marginalized with respect to the noise parameter.


Given this type of graph we know:

$$p(\theta, \lambda \mid y) = \frac{p(y \mid \theta, \lambda)\, p(\theta)\, p(\lambda)}{p(y)}$$

and

$$\theta \not\perp \lambda \mid y$$

But we employ an approximating density q, using the mean-field structure:

$$p(\theta, \lambda \mid y) \approx q(\theta \mid y)\, q(\lambda \mid y)$$

where:

$$q(\theta \mid y) \rightarrow N(\mu, \Sigma)$$

$$q(\lambda \mid y) \rightarrow N(0, \lambda I)$$



VB with a mean-field approximation

Goal: find the set of latent variables θ given y (Daunizeau et al. 2009).

•  Mean-field approximation: $p(\theta, \lambda \mid y) \approx q(\theta \mid y)\, q(\lambda \mid y)$
   -  assuming independence of parameters and hyperparameters
   -  and a Gaussian form for the PDF
•  Fixed-point solutions for the two factors:

$$q(\theta) \propto \exp(I_\theta) = \exp\big[\langle \ln p(y, \theta, \lambda) \rangle_{q(\lambda)}\big]$$

$$q(\lambda) \propto \exp(I_\lambda) = \exp\big[\langle \ln p(y, \theta, \lambda) \rangle_{q(\theta)}\big]$$

•  Iterative updating of the sufficient statistics of the approximate posteriors by gradient ascent.
•  Free-energy approximation to the model evidence.

$$F = \ln p(y) - KL\big(q(\theta, \lambda \mid y),\; p(\theta, \lambda \mid y)\big)$$
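The alternating fixed-point updates are easiest to see on a toy problem. The sketch below is not the DCM inversion scheme: it swaps in the textbook conjugate example of a Gaussian with unknown mean (playing the role of θ) and unknown precision (playing the role of λ), where both mean-field factors have closed forms. The priors, their hyperparameter values and the function name `mean_field_gaussian` are assumptions made for illustration.

```python
import numpy as np

def mean_field_gaussian(x, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3, n_iter=50):
    """Mean-field VB for x_n ~ N(mu, 1/tau) with q(mu, tau) = q(mu) q(tau).

    Assumed priors: mu | tau ~ N(mu0, 1/(lam0 * tau)), tau ~ Gamma(a0, b0).
    Returns the parameters of q(mu) = N(mu_n, 1/lam_n) and q(tau) = Gamma(a_n, b_n).
    """
    n, xbar = len(x), np.mean(x)
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)   # fixed across iterations
    a_n = a0 + 0.5 * (n + 1)                      # fixed across iterations
    e_tau = 1.0                                   # initial guess for E[tau]
    for _ in range(n_iter):
        # Update q(mu) given the current expectation of tau
        lam_n = (lam0 + n) * e_tau
        # Update q(tau) given the current q(mu); 1/lam_n is the variance of q(mu)
        expected_sq = np.sum((x - mu_n) ** 2) + n / lam_n
        expected_prior = lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n)
        b_n = b0 + 0.5 * (expected_sq + expected_prior)
        e_tau = a_n / b_n
    return mu_n, lam_n, a_n, b_n

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=500)
mu_n, lam_n, a_n, b_n = mean_field_gaussian(data)
```

Each pass updates one factor while holding the expectations under the other fixed, which is the pattern the fixed-point equations for q(θ) and q(λ) describe.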


How independent are neural and hemodynamic parameter estimates?

[Figure: matrix of correlations between parameter estimates (axes 5 to 40, colour scale from -1 to 1), with blocks labelled A, B, C, θh and ε. Stephan et al. (2007) NeuroImage.]

Roadmap inversion

•  Regional responses
•  Specify generative forward model $p(y \mid \theta, m)$ (with prior distributions of parameters)

Variational Expectation-Maximization algorithm. Iterative procedure:

1.  Compute model response using current set of parameters
2.  Compare model response with data
3.  Improve parameters, if possible

Results:

1.  Gaussian posterior distributions of parameters, $q(\theta \mid y) \rightarrow N(\mu_{\theta|y}, \Sigma)$
2.  Model evidence $p(y \mid m)$

Inference about DCM parameters: Bayesian single subject analysis

•  Gaussian assumptions about the posterior distributions of the parameters
•  Posterior probability that a certain parameter (or contrast of parameters) is above a chosen threshold γ
•  By default, γ is chosen as zero, the prior ("does the effect exist?").
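Under the Gaussian assumption, the probability that a parameter or contrast exceeds γ is a normal tail probability. The sketch below uses hypothetical function names and made-up posterior moments.

```python
import math
import numpy as np

def prob_above(mu, var, gamma=0.0):
    """P(theta > gamma) for a Gaussian posterior theta ~ N(mu, var)."""
    z = (mu - gamma) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_contrast_above(c, mu, Sigma, gamma=0.0):
    """P(c' theta > gamma) for theta ~ N(mu, Sigma) and contrast vector c."""
    c = np.asarray(c, dtype=float)
    return prob_above(float(c @ mu), float(c @ Sigma @ c), gamma)

# Hypothetical posterior over two parameters
mu = np.array([0.8, 0.3])
Sigma = np.array([[0.10, 0.02],
                  [0.02, 0.05]])
p_first_positive = prob_above(mu[0], Sigma[0, 0])              # is parameter 1 > 0?
p_first_larger = prob_contrast_above([1.0, -1.0], mu, Sigma)   # is parameter 1 > parameter 2?
```

The contrast version accounts for the posterior covariance between parameters, which matters when estimates are correlated.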

Inference about DCM parameters: Bayesian parameter averaging

FFX group analysis

•  Likelihood distributions from different subjects are independent
•  Under Gaussian assumptions, this is easy to compute
•  Simply 'weigh' each subject's contribution by your certainty of the parameter

Group posterior covariance, from the individual posterior covariances:

$$\Sigma^{-1}_{\theta \mid y_1,\dots,y_N} = \sum_{i=1}^{N} \Sigma^{-1}_{\theta \mid y_i}$$

Group posterior mean, from the individual posterior covariances and means:

$$\mu_{\theta \mid y_1,\dots,y_N} = \Sigma_{\theta \mid y_1,\dots,y_N} \left( \sum_{i=1}^{N} \Sigma^{-1}_{\theta \mid y_i}\, \mu_{\theta \mid y_i} \right)$$
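The precision-weighted average can be written in a few lines. This is a sketch of the two formulas as stated, with a hypothetical function name and made-up subject posteriors.

```python
import numpy as np

def bayesian_parameter_average(means, covs):
    """Combine per-subject Gaussian posteriors N(mean_i, cov_i) into a group posterior."""
    precisions = [np.linalg.inv(S) for S in covs]
    group_cov = np.linalg.inv(sum(precisions))          # inverse of the summed precisions
    weighted = sum(P @ m for P, m in zip(precisions, means))
    group_mean = group_cov @ weighted                   # precision-weighted mean
    return group_mean, group_cov

# Two hypothetical subjects with a single parameter each
means = [np.array([1.0]), np.array([3.0])]
covs = [np.array([[1.0]]), np.array([[3.0]])]
group_mean, group_cov = bayesian_parameter_average(means, covs)
```

The first subject is three times more certain than the second, so the group mean (1.5) lies closer to that subject's estimate, and the group variance (0.75) is smaller than either individual variance.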

Inference about DCM parameters: RFX analysis (frequentist)

•  'Summary Statistic Approach'
•  Separate fitting of identical models for each subject
•  Selection of parameters of interest
•  Second-level test on the selected parameters:
   -  one-sample t-test: parameter > 0 ?
   -  paired t-test: parameter 1 > parameter 2 ?
   -  rmANOVA: e.g. in case of multiple sessions per subject

Fixed Effects Model selection via log Group Bayes factor (the sums run over subjects k):

$$\ln BF_{1,2} = \sum_k \ln p(y_k \mid m_1) - \sum_k \ln p(y_k \mid m_2)$$

•  accounts for both accuracy and complexity of the model
•  allows for inference about structure (generalisability) of the model

Random Effects Model selection via model probability $p(r \mid y, \alpha)$:

$$\langle r_k \rangle_q = \frac{\alpha_k}{\alpha_1 + \dots + \alpha_K}$$
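Both group-level summaries are simple to compute once the per-subject log evidences or the Dirichlet parameters α are available. The numbers below are invented for illustration, and the code does not estimate α; it only evaluates the two formulas.

```python
import numpy as np

# Hypothetical log evidences: rows are subjects k, columns are models m1 and m2
log_evidence = np.array([
    [-310.0, -313.0],
    [-295.5, -296.0],
    [-402.0, -399.0],
])

# Fixed effects: log group Bayes factor = difference of summed log evidences
log_gbf_12 = log_evidence[:, 0].sum() - log_evidence[:, 1].sum()

# Random effects: expected model frequencies from assumed Dirichlet parameters alpha
alpha = np.array([3.2, 1.8])
expected_r = alpha / alpha.sum()
```

In this made-up example the third subject favours m2 strongly enough to almost cancel the other two, which illustrates how a fixed-effects sum can be dominated by individual subjects.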

Inference about models: Bayesian model comparison

•  Prior to, or instead of, inference on parameters
•  Which of various mechanisms / models best explains my data?
•  Use model evidence

Bayes factors

For a given dataset, to compare two models, we compare their evidences

$$B_{12} = \frac{p(y \mid m_1)}{p(y \mid m_2)}$$

or their log evidences

$$\ln(B_{12}) \approx F_1 - F_2$$

Kass & Raftery classification (Kass & Raftery 1995, J. Am. Stat. Assoc.):

B12          p(m1|y)     Evidence
1 to 3       50-75%      weak
3 to 20      75-95%      positive
20 to 150    95-99%      strong
≥ 150        ≥ 99%       very strong

Ketamine modulates: 1. all extrinsic connections, 2. intrinsic NMDA and 3. inhibitory / modulatory processes (one of the red arrows): use log Bayes factors.
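The probability column of the Kass & Raftery table follows from the Bayes factor when both models have equal prior probability, since then p(m1|y) = B12 / (1 + B12). The sketch below computes this from free energies; the function name is hypothetical and equal model priors are an assumption.

```python
import math

def posterior_prob_m1(F1, F2):
    """p(m1 | y) from two log evidences (or free energies), assuming equal model priors."""
    return 1.0 / (1.0 + math.exp(-(F1 - F2)))

# Bayes factors at the table's boundaries, expressed as log evidence differences
p_at_3 = posterior_prob_m1(math.log(3.0), 0.0)
p_at_20 = posterior_prob_m1(math.log(20.0), 0.0)
p_at_150 = posterior_prob_m1(math.log(150.0), 0.0)
```

A log evidence difference of about 3 corresponds to a Bayes factor of about 20, the boundary between positive and strong evidence in the table.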

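Bayesian Model Comparison

One other way to view F:

$$F = \log p(y \mid m) - KL\big[q(\theta),\, p(\theta \mid y, m)\big]$$

F = Accuracy - Complexity, where the complexity is

$$KL\big[q(\theta),\, p(\theta \mid m)\big] = \frac{1}{2}\ln|\Sigma_\theta| - \frac{1}{2}\ln|\Sigma_{\theta|y}| + \frac{1}{2}\big(\mu_{\theta|y} - \mu_\theta\big)^T \Sigma_\theta^{-1} \big(\mu_{\theta|y} - \mu_\theta\big)$$

The complexity term of F is higher

•  the more independent the prior parameters (↑ effective DFs)
•  the more dependent the posterior parameters
•  the more the posterior mean deviates from the prior mean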
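The three statements about complexity can be explored numerically. The sketch below evaluates the complexity expression exactly as written on the slide (two log-determinants plus a prior-precision-weighted distance between means); the function name and test values are hypothetical.

```python
import numpy as np

def complexity(mu_prior, cov_prior, mu_post, cov_post):
    """Complexity term as written on the slide, for Gaussian prior and posterior."""
    diff = mu_post - mu_prior
    _, logdet_prior = np.linalg.slogdet(cov_prior)
    _, logdet_post = np.linalg.slogdet(cov_post)
    distance = diff @ np.linalg.solve(cov_prior, diff)   # squared distance weighted by prior precision
    return 0.5 * logdet_prior - 0.5 * logdet_post + 0.5 * distance

mu0 = np.zeros(2)
S0 = np.eye(2)

no_update = complexity(mu0, S0, mu0, S0)                          # posterior equals prior
shifted = complexity(mu0, S0, np.array([1.0, 0.0]), S0)           # posterior mean moves away
correlated_post = np.array([[1.0, 0.9], [0.9, 1.0]])
dependent = complexity(mu0, S0, mu0, correlated_post)             # posterior parameters dependent
```

With these values the complexity is zero when nothing is learned, grows when the posterior mean moves away from the prior mean, and grows when the posterior parameters become correlated (which shrinks the posterior determinant).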


Example: Attention to motion

Attention to motion in the visual system

[Figure: regions V1, V5 and SPC with inputs Photic, Motion and Attention plotted against Time [s]. Friston et al. (2003) NeuroImage.]

We used this model to assess the site of attention modulation during visual motion processing in an fMRI paradigm reported by Büchel & Friston.

Conditions:

- fixation only
- observe static dots: + photic, V1
- observe moving dots: + motion, V5
- task on moving dots: + attention, V5 + parietal cortex

Bayesian model selection

[Figure: four candidate models m1 to m4 of the V1, V5, PPC network with external stimulus input; the open question (?) is where the modulation by attention enters. The models are compared by their marginal likelihood $\ln p(y \mid m)$. For the best model (m4), the estimated effective synaptic strengths shown on the connections are 1.25, 0.13, 0.46, 0.39, 0.26, 0.26 and 0.10.]

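Parameter inference

[Figure: the V1, V5, PPC network with stim, motion and attention inputs and estimated strengths 1.25, 0.13, 0.46, 0.39, 0.26, 0.50, 0.26 and 0.10, beside the posterior density of one parameter (horizontal axis from -2 to 5) with MAP = 1.25. Stephan et al. 2008, NeuroImage.]

$$p\big(D^{PPC}_{V5,V1} > 0 \mid y\big) = 99.1\%$$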

Data fits

[Figure: observed and fitted responses in V1, V5 and PPC for three conditions: motion & attention, motion & no attention, static dots.]

Example 2: Brain Connectivity in Synesthesia

•  Specific sensory stimuli lead to unusual, additional experiences
•  Grapheme-color synesthesia: color
•  Involuntary, automatic; stable over time, prevalence ~4%
•  Potential cause: aberrant cross-activation between brain areas
   -  grapheme encoding area
   -  color area V4
   -  superior parietal lobule (SPL)

Hubbard, 2007

DCM of Synesthesia

Can changes in effective connectivity explain synesthesia activity in V4?

[Figure: Models. Hubbard, 2007; Van Leeuwen, den Ouden, Hagoort (2011) JNeurosci.]


Model Evidence: F ≤ ln Z

Relative model evidence predicts sensory experience


DCM Roadmap

[Figure: neuronal dynamics and haemodynamics form a state-space model; together with priors and fMRI data it enters Bayesian model inversion, which yields posterior parameters and supports model comparison.]

Some useful references

•  10 Simple Rules for DCM (2010). Stephan et al. NeuroImage 52.

•  The first DCM paper: Dynamic Causal Modelling (2003). Friston et al. NeuroImage 19:1273-1302.

•  Physiological validation of DCM for fMRI: Identifying neural drivers with functional MRI: an electrophysiological validation (2008). David et al. PLoS Biol. 6:2683-2697.

•  Hemodynamic model: Comparing hemodynamic models with DCM (2007). Stephan et al. NeuroImage 38:387-401.

•  Nonlinear DCM: Nonlinear Dynamic Causal Models for FMRI (2008). Stephan et al. NeuroImage 42:649-662.

•  Two-state DCM: Dynamic causal modelling for fMRI: A two-state model (2008). Marreiros et al. NeuroImage 39:269-278.

•  Stochastic DCM: Generalised filtering and stochastic DCM for fMRI (2011). Li et al. NeuroImage 58:442-457.

•  Bayesian model comparison: Comparing families of dynamic causal models (2010). Penny et al. PLoS Comput Biol. 6(3):e1000709.
