Model-based classification Kay Henning Brodersen · 2020. 5. 17. · evidence for sparse coding vs....
Transcript of Model-based classification Kay Henning Brodersen · 2020. 5. 17. · evidence for sparse coding vs....
Model-based classification through generative embedding
Kay Henning Brodersen
Machine Learning and Pattern Recognition Group Department of Computer Science, ETH Zurich
Computational Neuroeconomics Group Department of Economics, University of Zurich
http://people.inf.ethz.ch/bkay/
2
Computational science and psychiatry
Spectrum diseases
diverse genetic basis
strong gene-environment interactions
variability in treatment response and outcome
Consequences
multiple pathophysiological mechanisms
even when symptoms are similar, causes may differ across patients
need to infer pathophysiological mechanisms in individual patients!
4
Model-based inference on individual pathophysiology
model-based
diagnostic tests
application to brain activity data from individual patients
model of neuronal (patho)physiology
spectrum disease
type 2
type 3
type 1
diagnostic classification
5
Conventional vs. model-based classification
-10
0
10
-0.5
0
0.5
-0.1
0
0.1
0.2
0.3
0.4
-0.4
-0.2
0 -0.5
0
0.5-0.4
-0.35
-0.3
-0.25
-0.2
-0.15
-10
0
10
-0.5
0
0.5
-0.1
0
0.1
0.2
0.3
0.4
-0.4
-0.2
0 -0.5
0
0.5-0.4
-0.35
-0.3
-0.25
-0.2
-0.15
ge
ne
rativ
e
em
be
ddin
g
L.H
G →
L.H
G
Vo
xe
l (6
4,-
24
,4)
mm
L.MGB → L.MGB Voxel (-42,-26,10) mm
Voxel (-56,-20,10) mm R.HG → L.HG
controls patients
Conventional classification Model-based classification
6
Colleagues & collaborators
Joachim M Buhmann ETH Zurich
Klaas Enno Stephan University of Zurich · University College London
Kate Lomakina University of Zurich · ETH Zurich
Alexander Leff University College London
Cheng Soon Ong ETH Zurich
Thomas Schofield University College London
7
Model-based classification
Can we exploit the rich discriminative information encoded in individual patterns of connection strengths?
Data representations in classification analyses
Structure-based classification
• mild traumatic brain injury • Alzheimer’s disease • autistic spectrum disorder • frontotemporal
dementia • mild cognitive
impairment • schizophrenia • aphasia
Activation-based classification
• depression • schizophrenia • mild cognitive
impairment
8
Prediction versus inference
The goal of prediction is to find a highly accurate encoding or decoding function.
The goal of inference is to decide between competing hypotheses about mechanisms or representations in the brain.
predicting a cognitive state using a
brain-machine interface
predicting a subject-specific
diagnostic status
comparing a model that links distributed neuronal
activity to a cognitive state with a model that does not
weighing the evidence for sparse
coding vs. dense coding
powerful discriminative algorithms for classification
mechanistically interpretable generative models of brain function
9
Model-based classification through generative embedding
Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comp Biol
step 2 — kernel construction
step 1 — model inversion
measurements from an individual
subject
subject-specific inverted generative model
subject representation in the generative score space
A → B
A → C
B → B
B → C
A
C B
step 3 — classification
separating hyperplane to discriminate between groups
A
C B
jointly discriminative connection strengths
step 4 — interpretation
-2 0 2 4 6 8-1
0
1
2
3
4
5
Voxel 1
Voxe
l 2
-0.4 -0.35 -0.3 -0.25 -0.2 -0.15-0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
(1) L.MGB -> L.MGB
(14)
R.H
G -
> L
.HG
10
activity 𝑧1(𝑡)
Choosing a generative model: DCM for fMRI
intrinsic connectivity
direct inputs
modulation of connectivity
neural state equation
CuzBuAz j
j )( )(
u
zC
z
z
uB
z
zA
j
j
)(
haemodynamic forward model
𝑥 = 𝑔(𝑧, 𝜃ℎ)
observed BOLD signal
neuronal states
t
driving input 𝑢1(𝑡)
modulatory input 𝑢2(𝑡)
t
activity 𝑧2(𝑡)
activity 𝑧3(𝑡)
signal 𝑥1(𝑡)
signal 𝑥2(𝑡)
signal 𝑥3(𝑡)
Jansen & Rit (1995) Biological Cybernetics Friston, Harrison & Penny (2003) NeuroImage
Stephan & Friston (2007), Handbook of Brain Connectivity
12
Choosing a generative model: DCM for LFP/EEG
4
4
3
3
1
1
2
2
1
2
4914
41
2))(( xxuaxsHx
xx
eeee
1
2
4914
41
2))(( xxuaxsHx
xx
eeee
Excitatory spiny cells in granular layers
Exogenous input u
4
4
3
3
1
1
2
2
Intrinsic
connections5
5
Excitatory spiny cells in granular layers
Excitatory pyramidal cells in agranular layers
Inhibitory cells in agranular layers
),( uxfx
11812
10
2
1112511
1110
7
2
8938
87
2)(
2)()(
xxx
xxxSHx
xx
xxxSAAHx
xx
iiii
ee
LB
ee
1
2
4914
41
2))()(( xxCuxSAAHx
xx
ee
LF
ee
Synaptic ‘alpha’ kernelSynaptic ‘alpha’ kernel
Sigmoid functionSigmoid function
659
3
2
61246
63
2
2
51295
52
2)(
2))()()((
xxx
xxxSHx
x
xxxSxSAAHx
xx
iiii
ee
LB
ee
Extrinsic
Connections:
Forward
Backward
Lateral
Moran et al. 2009 NeuroImage
13
Training and testing a model-based classifier
n
i
n
j
n
i ijijiji xxkcc1 1 1
),(2
1)(max L
n
i iicts1
0..
niCi ,...,10
Training a kernel-based discriminant classifier:
n
i niin bxxkxf1
*
1
*
1 ),()(
))(sgn(:ˆ11 nn xfc
Using the model to make predictions:
Linear SVM
In the case of generative embedding:
𝑘 𝑥𝑖 , 𝑥𝑗 = 𝑥𝑖𝑇𝑥𝑗
14
1 ROI definition and n model inversions
unbiased estimate
Repeat n times: 1 ROI definition and n model inversions
unbiased estimate
1 ROI definition and n model inversions
slightly optimistic estimate: voxel selection for training set and test set based on test data
Repeat n times: 1 ROI definition and 1 model inversion
slightly optimistic estimate: voxel selection for training set based on test data and test labels
Repeat n times: 1 ROI definition and n model inversions
unbiased estimate
1 ROI definition and n model inversions
highly optimistic estimate: voxel selection for training set and test set based on test data and test labels
Specifying and inverting the model – how?
F
D
E C
B
A
Definition of ROIs
Are regions of interest defined anatomically or functionally?
anatomically functionally
Functional contrasts
Are the functional contrasts defined across all subjects or between groups?
across subjects
between groups
15
Model We model the likelihood functions for 𝑘+ positive and 𝑘− negative correct predictions as:
𝑝 𝑘+ 𝜋+, 𝑛+ = Bin 𝑘+ 𝜋+, 𝑛+ 𝑝 𝑘− 𝜋−, 𝑛− = Bin(𝑘−|𝜋−, 𝑛−)
The class-specific accuracies 𝜋+ and 𝜋− can be modelled as latent random variables with conjugate Beta priors:
𝑝 𝜋+ 𝛼+, 𝛽+ = Beta 𝜋+ 𝛼+, 𝛽+
𝑝 𝜋− 𝛼−, 𝛽− = Beta 𝜋− 𝛼−, 𝛽−
This prior is uninformative when using the hyperparameters 𝛼+ = 𝛽+ = 𝛼− = 𝛽− = 1. The balanced
accuracy is given by 𝜙 ≔1
2𝜋+ + 𝜋− .
Full Bayesian approach to performance evaluation
Beta(𝜋+|1,1)
Beta(𝜋+|7,3)
Brodersen, Chumbley, Mathys, Daunizeau, Ong, Buhmann & Stephan (in preparation)
𝑘+ 𝑘−
𝜋+ 𝜋−
𝛼+, 𝛽+ 𝛼−, 𝛽−
𝜋 + 𝜋 −
~
𝜙 =𝜋 + + 𝜋 −
2
Inference Inverting the model yields the posterior balanced classification accuracy,
𝑝 𝜙 𝑘+, 𝑘−, 𝑛+, 𝑛−, 𝛼+, 𝛽+, 𝛼−, 𝛽−
= Beta 2 𝜙 − 𝑧 𝛼𝑛+, 𝛽𝑛
+ Beta 2𝑧 𝛼𝑛−, 𝛽𝑛
− 𝑑𝑧1
0.
16
Summary of the analysis
pre-processing
estimation of group contrasts based on all subjects except subject j selection of voxels for regions of interest
unsupservised DCM inversion for each subject
training the SVM on all subjects except subject j testing the SVM on subject j
performance evaluation
1 2 3
repeat for each subject
A
C B
17
Example: diagnosis of moderate aphasia
18
Regions of interest
x = –56 mm y = –20 mm z = 8 mm
L R
19
Neuronal model
Schofield, Penny, Stephan, Crinion, Thompson, Price & Leff (under review) Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann & Stephan (2011) PLoS Comp Biol
L.MGB
L.PT
L.HG (A1)
R.MGB
R.PT
R.HG (A1)
20
Univariate analysis
range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x)
range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x) range(d1$x, d2$x)
L.MGB → L.MGB L.MGB → L.HG L.MGB → L.PT L.HG → L.HG *** L.HG → L.PT *** L.HG → R.HG L.PT → L.MGB L.PT → L.HG
L.PT → L.PT L.PT → R.PT R.MGB → R.MGB R.MGB → R.HG R.MGB → R.PT *** R.HG → L.HG *** R.HG → R.HG R.HG → R.PT
R.PT → L.PT R.PT → R.MGB R.PT → R.HG R.PT → R.PT input to L.MGB input to R.MGB patients controls
21
Connectional fingerprints
patients controls
22
16218917734729133230781 2433893600.4
0.5
0.6
0.7
0.8
0.9
1
bala
nced
accu
racy
0 0.5 10
0.2
0.4
0.6
0.8
1
FPR (1 - specificity)T
PR
(sen
sitiv
ity)
0 0.5 10
0.2
0.4
0.6
0.8
1
TPR (recall)
PP
V (
pre
cis
ion
)
Classification performance
16218917734729133230781 2433893600.4
0.5
0.6
0.7
0.8
0.9
1
bala
nced
accu
racy
0 0.5 10
0.2
0.4
0.6
0.8
1
FPR (1 - specificity)T
PR
(sen
sitiv
ity)
0 0.5 10
0.2
0.4
0.6
0.8
1
TPR (recall)
PP
V (
pre
cis
ion
)
anatomical contrast searchlight PCA means correlations eigenvariates correlations eigenvariates z-correlations gen.embed., original model gen.embed., feedforward gen.embed., left hemisphere gen.embed., right hemisphere
activation- based
correlation- based
model- based
a c s p m e z o f l r
23
Biologically implausible models perform poorly
A B
L.MGB
L.PT
L.HG (A1)
R.MGB
R.PT
R.HG (A1)
C
L.MGB
L.PT
L.HG (A1)
R.MGB
R.PT
R.HG (A1)
auditory stimuli auditory stimuli auditory stimuli
24
Discriminative features in model space
L.MGB
L.PT
L.HG (A1)
R.MGB
R.PT
R.HG (A1)
25
Discriminative features in model space
L.MGB
L.PT
L.HG (A1)
R.MGB
R.PT
R.HG (A1)
26
Illustration of the generative score space
-10
0
10
-0.5
0
0.5
-0.1
0
0.1
0.2
0.3
0.4
-0.4
-0.2
0 -0.5
0
0.5-0.4
-0.35
-0.3
-0.25
-0.2
-0.15
-10
0
10
-0.5
0
0.5
-0.1
0
0.1
0.2
0.3
0.4
-0.4
-0.2
0 -0.5
0
0.5-0.4
-0.35
-0.3
-0.25
-0.2
-0.15
ge
ne
rativ
e
em
be
ddin
g
L.H
G →
L.H
G
Vo
xe
l (6
4,-
24
,4)
mm
L.MGB → L.MGB Voxel (-42,-26,10) mm
Voxel (-56,-20,10) mm R.HG → L.HG
controls patients
Voxel-based input space Generative score space
27
❶ Strong classification performance
Generative embedding exploits the rich discriminative information encoded in ‘hidden’ quantities, such as coupling parameters. It may therefore outperform conventional schemes.
❷ Creation of a low-dimensional, interpretable feature space
The approach replaces high-dimensional fMRI data by a low-dimensional subject-specific fingerprint, where each dimension has a specific biological interpretation.
❸ Domains of application
Generative embedding can be used both for trial-by-trial decoding (EEG, MEG, or LFP data) and for subject-by-subject classification analyses (fMRI data).
Summary