Model-based classification Kay Henning Brodersen · 2020. 5. 17. · evidence for sparse coding vs....

Model-based classification through generative embedding

Kay Henning Brodersen

Machine Learning and Pattern Recognition Group, Department of Computer Science, ETH Zurich

Computational Neuroeconomics Group, Department of Economics, University of Zurich

http://people.inf.ethz.ch/bkay/


Computational science and psychiatry

Spectrum diseases

diverse genetic basis

strong gene-environment interactions

variability in treatment response and outcome

Consequences

multiple pathophysiological mechanisms

even when symptoms are similar, causes may differ across patients

need to infer pathophysiological mechanisms in individual patients!


Model-based inference on individual pathophysiology

[Diagram: a model of neuronal (patho)physiology, applied to brain activity data from individual patients, yields model-based diagnostic tests; diagnostic classification resolves a spectrum disease into type 1, type 2, and type 3.]


Conventional vs. model-based classification

[Figure: two 3-D scatter plots of controls and patients, linked by an arrow labelled "generative embedding". Conventional classification: axes Voxel (-42,-26,10) mm, Voxel (-56,-20,10) mm, and Voxel (64,-24,4) mm. Model-based classification: axes L.MGB → L.MGB, R.HG → L.HG, and L.HG → L.HG.]


Colleagues & collaborators

Joachim M Buhmann ETH Zurich

Klaas Enno Stephan University of Zurich · University College London

Kate Lomakina University of Zurich · ETH Zurich

Alexander Leff University College London

Cheng Soon Ong ETH Zurich

Thomas Schofield University College London


Model-based classification

Can we exploit the rich discriminative information encoded in individual patterns of connection strengths?

Data representations in classification analyses

Structure-based classification

• mild traumatic brain injury
• Alzheimer’s disease
• autistic spectrum disorder
• frontotemporal dementia
• mild cognitive impairment
• schizophrenia
• aphasia

Activation-based classification

• depression
• schizophrenia
• mild cognitive impairment


Prediction versus inference

The goal of prediction is to find a highly accurate encoding or decoding function.

• predicting a cognitive state using a brain-machine interface
• predicting a subject-specific diagnostic status

The goal of inference is to decide between competing hypotheses about mechanisms or representations in the brain.

• comparing a model that links distributed neuronal activity to a cognitive state with a model that does not
• weighing the evidence for sparse coding vs. dense coding

powerful discriminative algorithms for classification

mechanistically interpretable generative models of brain function


Model-based classification through generative embedding

Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage

Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comp Biol

step 1: model inversion

measurements from an individual subject → subject-specific inverted generative model

step 2: kernel construction

subject representation in the generative score space (coordinates such as A → B, A → C, B → B, B → C)

step 3: classification

separating hyperplane to discriminate between groups

step 4: interpretation

jointly discriminative connection strengths

[Figure: a toy network with regions A, B, and C; example data plotted in the original space (axes Voxel 1 and Voxel 2) and in the generative score space (axes (1) L.MGB -> L.MGB and (14) R.HG -> L.HG).]

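Choosing a generative model: DCM for fMRI

neural state equation

$$\dot{z} = \Big(A + \sum_j u_j B^{(j)}\Big) z + C u$$

intrinsic connectivity: $A = \dfrac{\partial \dot{z}}{\partial z}$

modulation of connectivity: $B^{(j)} = \dfrac{\partial}{\partial u_j} \dfrac{\partial \dot{z}}{\partial z}$

direct inputs: $C = \dfrac{\partial \dot{z}}{\partial u}$

haemodynamic forward model

$$x = g(z, \theta_h)$$

[Figure: the driving input $u_1(t)$ and the modulatory input $u_2(t)$ act on the neuronal states (activity $z_1(t)$ and so on); the haemodynamic forward model maps the neuronal states to the observed BOLD signals $x_1(t)$, $x_2(t)$, $x_3(t)$.]

Jansen & Rit (1995) Biological Cybernetics

Friston, Harrison & Penny (2003) NeuroImage

Stephan & Friston (2007) Handbook of Brain Connectivity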
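The neural state equation can be made concrete with a small simulation. The following is a minimal sketch, not the DCM implementation: it integrates only the bilinear state equation with a forward-Euler scheme and omits the haemodynamic forward model entirely. The function name `simulate_bilinear`, the two-region network, and all parameter values are hypothetical and chosen for illustration.

```python
def simulate_bilinear(A, B, C, inputs, z0, dt):
    """Forward-Euler integration of dz/dt = (A + sum_j u_j * B[j]) z + C u.

    A: n x n matrix, B: list of n x n matrices (one per input),
    C: n x m matrix, inputs: list of length-m input vectors (one per step).
    """
    n, z = len(z0), list(z0)
    trajectory = [list(z)]
    for u in inputs:
        m = len(u)
        # Effective coupling at this step: A plus the input-weighted modulations.
        J = [[A[r][c] + sum(u[j] * B[j][r][c] for j in range(m))
              for c in range(n)] for r in range(n)]
        dz = [sum(J[r][c] * z[c] for c in range(n))
              + sum(C[r][j] * u[j] for j in range(m))
              for r in range(n)]
        z = [z[r] + dt * dz[r] for r in range(n)]
        trajectory.append(list(z))
    return trajectory


# Hypothetical two-region example (row = target region, column = source region).
A = [[-1.0, 0.0], [0.5, -1.0]]      # self-decay plus a region 1 -> region 2 connection
B = [[[0.0, 0.0], [0.0, 0.0]],      # B^(1): the driving input modulates nothing
     [[0.0, 0.0], [0.5, 0.0]]]      # B^(2): u2 strengthens region 1 -> region 2
C = [[1.0, 0.0], [0.0, 0.0]]        # only u1 enters directly, into region 1

steps, dt = 2000, 0.01
final_off = simulate_bilinear(A, B, C, [[1.0, 0.0]] * steps, [0.0, 0.0], dt)[-1]
final_on = simulate_bilinear(A, B, C, [[1.0, 1.0]] * steps, [0.0, 0.0], dt)[-1]
```

In this toy setting, region 2 settles near 0.5 when the modulatory input is off and near 1.0 when it is on, which shows how an entry of $B^{(j)}$ changes the effective strength of a connection.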


Choosing a generative model: DCM for LFP/EEG

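State equation: $\dot{x} = f(x, u)$

[Figure: neural mass model of a cortical source with three subpopulations, coupled by intrinsic connections $\gamma_1, \dots, \gamma_5$; the exogenous input u enters the excitatory spiny cells. Extrinsic connections between sources are forward, backward, or lateral. Each subpopulation combines a synaptic ‘alpha’ kernel with a sigmoid function.]

Single source, excitatory spiny cells in granular layers:

$$\dot{x}_1 = x_4$$
$$\dot{x}_4 = \kappa_e H_e \big( \gamma_1\, s(x_9 - a) + u \big) - 2 \kappa_e x_4 - \kappa_e^2 x_1$$

With extrinsic connections, excitatory spiny cells in granular layers:

$$\dot{x}_1 = x_4$$
$$\dot{x}_4 = \kappa_e H_e \big( (A^F + A^L + \gamma_1 I)\, S(x_9) + C u \big) - 2 \kappa_e x_4 - \kappa_e^2 x_1$$

Excitatory pyramidal cells in agranular layers:

$$\dot{x}_2 = x_5$$
$$\dot{x}_5 = \kappa_e H_e \big( (A^B + A^L)\, S(x_9) + \gamma_2\, S(x_1) \big) - 2 \kappa_e x_5 - \kappa_e^2 x_2$$
$$\dot{x}_3 = x_6$$
$$\dot{x}_6 = \kappa_i H_i\, \gamma_4\, S(x_{12}) - 2 \kappa_i x_6 - \kappa_i^2 x_3$$
$$\dot{x}_9 = x_5 - x_6$$

Inhibitory cells in agranular layers:

$$\dot{x}_7 = x_8$$
$$\dot{x}_8 = \kappa_e H_e\, (A^B + A^L + \gamma_3 I)\, S(x_9) - 2 \kappa_e x_8 - \kappa_e^2 x_7$$
$$\dot{x}_{10} = x_{11}$$
$$\dot{x}_{11} = \kappa_i H_i\, \gamma_5\, S(x_{12}) - 2 \kappa_i x_{11} - \kappa_i^2 x_{10}$$
$$\dot{x}_{12} = x_8 - x_{11}$$

Moran et al. (2009) NeuroImage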


Training and testing a model-based classifier

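Training a kernel-based discriminant classifier:

$$\max_{\alpha}\; L(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j c_i c_j\, k(x_i, x_j)$$

$$\text{s.t.} \quad \sum_{i=1}^{n} \alpha_i c_i = 0, \qquad 0 \le \alpha_i \le C \quad \forall\, i = 1, \dots, n$$

Using the model to make predictions:

$$f(x_{n+1}) = \sum_{i=1}^{n} \alpha_i^* c_i\, k(x_i, x_{n+1}) + b^*$$

$$\hat{c}_{n+1} := \operatorname{sgn}\big(f(x_{n+1})\big)$$

Linear SVM

In the case of generative embedding:

$$k(x_i, x_j) = x_i^T x_j$$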
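The dual problem and the prediction rule can be sketched in a few lines of plain Python. This is an illustrative sketch under a stated simplification, not the solver used in the original analysis: the bias $b$ is absorbed into the kernel by using $k(x_i, x_j) + 1$, which removes the equality constraint and allows simple coordinate ascent on each $\alpha_i$. The function names and the toy "subjects" are hypothetical.

```python
def linear_kernel(a, b):
    """k(a, b) = a^T b, the kernel used for generative embedding."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_svm_dual(X, labels, C=1.0, epochs=200):
    """Coordinate ascent on the SVM dual with box constraint 0 <= alpha_i <= C.

    Simplification: the bias is absorbed by adding 1 to the kernel, so the
    constraint sum_i alpha_i c_i = 0 does not need to be enforced.
    """
    n = len(X)
    K = [[linear_kernel(X[i], X[j]) + 1.0 for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            margin = labels[i] * sum(alpha[j] * labels[j] * K[i][j] for j in range(n))
            # Newton step on coordinate i, clipped to the box [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] + (1.0 - margin) / K[i][i]))
    return alpha


def predict(X, labels, alpha, x_new):
    """Sign of f(x_new) = sum_i alpha_i c_i (k(x_i, x_new) + 1)."""
    f = sum(alpha[i] * labels[i] * (linear_kernel(X[i], x_new) + 1.0)
            for i in range(len(X)))
    return 1 if f >= 0 else -1


# Hypothetical subject representations (two parameters per subject).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
labels = [1, 1, 1, -1, -1, -1]
alpha = train_svm_dual(X, labels)
```

Only the subjects closest to the boundary end up with non-zero $\alpha_i$; with a linear kernel the resulting hyperplane weights can be read directly in terms of the input dimensions.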

Specifying and inverting the model – how?

Definition of ROIs: are regions of interest defined anatomically or functionally?

Functional contrasts: are the functional contrasts defined across all subjects or between groups?

[Diagram: decision tree over these choices, with leaves labelled A to F and the following procedures and estimates.]

• 1 ROI definition and n model inversions: unbiased estimate
• Repeat n times: 1 ROI definition and n model inversions: unbiased estimate
• slightly optimistic estimate: voxel selection for training set and test set based on test data
• Repeat n times: 1 ROI definition and 1 model inversion: slightly optimistic estimate, voxel selection for training set based on test data and test labels
• Repeat n times: 1 ROI definition and n model inversions: unbiased estimate
• highly optimistic estimate: voxel selection for training set and test set based on test data and test labels

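Full Bayesian approach to performance evaluation

Model

We model the likelihood functions for $k^+$ positive and $k^-$ negative correct predictions as:

$$p(k^+ \mid \pi^+, n^+) = \mathrm{Bin}(k^+ \mid \pi^+, n^+)$$
$$p(k^- \mid \pi^-, n^-) = \mathrm{Bin}(k^- \mid \pi^-, n^-)$$

The class-specific accuracies $\pi^+$ and $\pi^-$ can be modelled as latent random variables with conjugate Beta priors:

$$p(\pi^+ \mid \alpha^+, \beta^+) = \mathrm{Beta}(\pi^+ \mid \alpha^+, \beta^+)$$
$$p(\pi^- \mid \alpha^-, \beta^-) = \mathrm{Beta}(\pi^- \mid \alpha^-, \beta^-)$$

This prior is uninformative when using the hyperparameters $\alpha^+ = \beta^+ = \alpha^- = \beta^- = 1$. The balanced accuracy is given by

$$\phi := \tfrac{1}{2}\,(\pi^+ + \pi^-).$$

[Figure: example densities $\mathrm{Beta}(\pi^+ \mid 1, 1)$ and $\mathrm{Beta}(\pi^+ \mid 7, 3)$; graphical model with observed counts $k^+$, $k^-$, latent accuracies $\pi^+$, $\pi^-$, hyperparameters $\alpha^+, \beta^+$ and $\alpha^-, \beta^-$, and the derived quantity $\phi = (\pi^+ + \pi^-)/2$.]

Inference

Inverting the model yields the posterior balanced classification accuracy,

$$p(\phi \mid k^+, k^-, n^+, n^-, \alpha^+, \beta^+, \alpha^-, \beta^-) \propto \int_0^1 \mathrm{Beta}\big(2(\phi - z) \mid \alpha_n^+, \beta_n^+\big)\, \mathrm{Beta}\big(2z \mid \alpha_n^-, \beta_n^-\big)\, dz,$$

where $\alpha_n^\pm = \alpha^\pm + k^\pm$ and $\beta_n^\pm = \beta^\pm + n^\pm - k^\pm$ are the parameters of the Beta posteriors of $\pi^+$ and $\pi^-$.

Brodersen, Chumbley, Mathys, Daunizeau, Ong, Buhmann & Stephan (in preparation)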
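The posterior of the balanced accuracy has no simple closed form, but it is easy to evaluate numerically. The sketch below is one possible way to do this, not the authors' code: it uses the conjugate Beta posteriors of the two class-specific accuracies and the equivalent form $p(\phi) = 2 \int_0^1 p^+(2\phi - z)\, p^-(z)\, dz$ for the density of their mean, approximated with a midpoint rule. The function names, the grid size, and the example counts are hypothetical.

```python
import math


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x; zero outside the open interval (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def balanced_accuracy_posterior(k_pos, n_pos, k_neg, n_neg, prior=(1.0, 1.0), grid=400):
    """Posterior density of phi = (pi_pos + pi_neg) / 2 on a midpoint grid."""
    # Conjugate update: Beta prior plus binomial counts gives a Beta posterior.
    a_pos, b_pos = prior[0] + k_pos, prior[1] + n_pos - k_pos
    a_neg, b_neg = prior[0] + k_neg, prior[1] + n_neg - k_neg
    h = 1.0 / grid
    points = [(i + 0.5) * h for i in range(grid)]
    p_neg = [beta_pdf(z, a_neg, b_neg) for z in points]
    density = []
    for phi in points:
        # Convolution of the two posteriors, rescaled for the factor 1/2 in phi.
        conv = sum(beta_pdf(2 * phi - z, a_pos, b_pos) * pz
                   for z, pz in zip(points, p_neg)) * h
        density.append(2.0 * conv)
    return points, density


# Hypothetical counts: 18 of 20 positive and 14 of 20 negative cases correct.
phis, dens = balanced_accuracy_posterior(18, 20, 14, 20)
step = phis[1] - phis[0]
posterior_mass = sum(dens) * step
posterior_mean = sum(p * d for p, d in zip(phis, dens)) * step
prob_above_chance = sum(d for p, d in zip(phis, dens) if p > 0.5) * step
```

From the gridded density one can read off summaries such as the posterior mean or the probability that the balanced accuracy exceeds chance (0.5).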


Summary of the analysis

pre-processing

repeat for each subject j:

1. estimation of group contrasts based on all subjects except subject j; selection of voxels for regions of interest

2. unsupervised DCM inversion for each subject

3. training the SVM on all subjects except subject j; testing the SVM on subject j

performance evaluation
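The key property of this procedure is that anything that looks at class labels (here, feature selection) is repeated inside each cross-validation fold, using the training subjects only. The sketch below illustrates that loop structure with deliberately simple stand-ins: a mean-difference ranking replaces the contrast-based voxel selection, a nearest-centroid rule replaces the SVM, and DCM inversion is skipped altogether. All names and data are hypothetical.

```python
def leave_one_out_accuracy(features, labels, n_keep):
    """Leave-one-subject-out accuracy with feature selection inside each fold."""
    n, d = len(labels), len(features[0])
    correct = 0
    for j in range(n):
        train = [i for i in range(n) if i != j]
        pos = [i for i in train if labels[i] == 1]
        neg = [i for i in train if labels[i] == -1]
        # Step 1 (stand-in): rank features by group difference, training subjects only.
        mean_pos = [sum(features[i][f] for i in pos) / len(pos) for f in range(d)]
        mean_neg = [sum(features[i][f] for i in neg) / len(neg) for f in range(d)]
        ranked = sorted(range(d), key=lambda f: -abs(mean_pos[f] - mean_neg[f]))
        keep = ranked[:n_keep]
        # Step 3 (stand-in): nearest-centroid classification of the held-out subject.
        dist_pos = sum((features[j][f] - mean_pos[f]) ** 2 for f in keep)
        dist_neg = sum((features[j][f] - mean_neg[f]) ** 2 for f in keep)
        prediction = 1 if dist_pos < dist_neg else -1
        correct += int(prediction == labels[j])
    return correct / n


# Hypothetical data: only the first feature separates the two groups.
features = [[1.0, 0.2, -0.1], [1.2, -0.1, 0.0], [0.9, 0.0, 0.1],
            [-1.0, 0.1, 0.0], [-1.1, -0.2, 0.1], [-0.8, 0.0, -0.1]]
labels = [1, 1, 1, -1, -1, -1]
accuracy = leave_one_out_accuracy(features, labels, n_keep=1)
```

Because the held-out subject never influences the selection step, the resulting accuracy is not inflated by information leaking from the test data.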


Example: diagnosis of moderate aphasia


Regions of interest

[Figure: brain sections at x = –56 mm, y = –20 mm, and z = 8 mm, with left (L) and right (R) marked.]


Neuronal model

[Diagram: six-region model comprising L.MGB, L.HG (A1), L.PT, R.MGB, R.HG (A1), and R.PT.]

Schofield, Penny, Stephan, Crinion, Thompson, Price & Leff (under review)

Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann & Stephan (2011) PLoS Comp Biol


Univariate analysis

[Figure: one panel per model parameter, comparing patients and controls; panels are listed in slide order, with the *** marks in the positions shown on the slide.]

L.MGB → L.MGB · L.MGB → L.HG · L.MGB → L.PT · L.HG → L.HG · *** · L.HG → L.PT · *** · L.HG → R.HG · L.PT → L.MGB · L.PT → L.HG

L.PT → L.PT · L.PT → R.PT · R.MGB → R.MGB · R.MGB → R.HG · R.MGB → R.PT · *** · R.HG → L.HG · *** · R.HG → R.HG · R.HG → R.PT

R.PT → L.PT · R.PT → R.MGB · R.PT → R.HG · R.PT → R.PT · input to L.MGB · input to R.MGB


Connectional fingerprints

patients controls

Classification performance

[Figure: three panels comparing all methods. Left: balanced accuracy per method. Middle: ROC curves, FPR (1 - specificity) against TPR (sensitivity). Right: precision-recall curves, TPR (recall) against PPV (precision).]

Methods compared:

activation-based: (a) anatomical, (c) contrast, (s) searchlight, (p) PCA

correlation-based: (m) means correlations, (e) eigenvariates correlations, (z) eigenvariates z-correlations

model-based: (o) gen.embed., original model, (f) gen.embed., feedforward, (l) gen.embed., left hemisphere, (r) gen.embed., right hemisphere
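The quantities on the axes of these plots all derive from the four cells of a confusion matrix. As a small illustration (the function name and the counts are hypothetical and are not results from the study):

```python
def classification_metrics(tp, fn, tn, fp):
    """Summary measures from confusion-matrix counts.

    tp/fn: patients classified correctly/incorrectly,
    tn/fp: controls classified correctly/incorrectly.
    """
    tpr = tp / (tp + fn)          # sensitivity, recall
    tnr = tn / (tn + fp)          # specificity
    return {
        "tpr": tpr,
        "fpr": 1.0 - tnr,         # 1 - specificity
        "ppv": tp / (tp + fp),    # precision
        "balanced_accuracy": 0.5 * (tpr + tnr),
    }


# Hypothetical counts for 10 patients and 20 controls.
metrics = classification_metrics(tp=8, fn=2, tn=18, fp=2)
```

Unlike the plain accuracy, the balanced accuracy averages the two class-specific accuracies, so it stays at 0.5 for a classifier that always predicts the larger class.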


Biologically implausible models perform poorly

[Diagram: three alternative models, A, B, and C, each driven by auditory stimuli; the node labels shown are L.MGB, L.HG (A1), L.PT, R.MGB, R.HG (A1), and R.PT.]


Discriminative features in model space

[Diagram: the six-region model with nodes L.MGB, L.HG (A1), L.PT, R.MGB, R.HG (A1), and R.PT.]



Illustration of the generative score space

[Figure: two 3-D scatter plots of controls and patients, linked by an arrow labelled "generative embedding". Voxel-based input space: axes Voxel (-42,-26,10) mm, Voxel (-56,-20,10) mm, and Voxel (64,-24,4) mm. Generative score space: axes L.MGB → L.MGB, R.HG → L.HG, and L.HG → L.HG.]


Summary

❶ Strong classification performance

Generative embedding exploits the rich discriminative information encoded in ‘hidden’ quantities, such as coupling parameters. It may therefore outperform conventional schemes.

❷ Creation of a low-dimensional, interpretable feature space

The approach replaces high-dimensional fMRI data by a low-dimensional subject-specific fingerprint, where each dimension has a specific biological interpretation.

❸ Domains of application

Generative embedding can be used both for trial-by-trial decoding (EEG, MEG, or LFP data) and for subject-by-subject classification analyses (fMRI data).
