Introduction to classifiers for multivariate decoding of fMRI data
![Page 1: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/1.jpg)
Introduction to classifiers for multivariate decoding of fMRI data
Evelyn Eger
MMN 15/12/08
![Page 2: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/2.jpg)
Two directions of inference
1) Forward modelling: Psychological variable → Data (p-value)
2) Decoding: Data → Psychological variable (prediction accuracy)
![Page 3: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/3.jpg)
Two directions of inference
Inverse inference (decoding) is of special interest, e.g., for brain-computer interfaces, automated diagnosis, etc.
In other cases the two are in principle interchangeable: both demonstrate a statistical dependency between the experimental variable and the data
In many paradigms applying decoding to fMRI, the direction of inference is not central to the interpretation (for reviews see, e.g., Haynes & Rees, 2006; Kriegeskorte & Bandettini, 2007)
Efficient, powerful methods based on decoding exist for pattern-based (multivariate) applications
![Page 4: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/4.jpg)
Univariate versus multivariate
Univariate analysis:
Effects are analysed for a single dependent variable, e.g., t-test, F-test, ANOVA
Special case: "mass-univariate" analysis in brain imaging: effects are tested in a large number of voxels, each treated as independent
Multivariate analysis:
Effects are analysed for multiple dependent variables, e.g., Hotelling's T-squared test, Wilks' lambda, MANOVA
![Page 5: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/5.jpg)
Why go multivariate in brain imaging
Discrimination can be improved with higher dimensions
Significance of individual voxels not required
[Figure: stimulus conditions 1 and 2; adapted from Haynes et al. 2006]
![Page 6: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/6.jpg)
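Linear classification (in 2D space)
[Figure: axes Voxel 1 and Voxel 2; hyperplane with weight vector w and offset b]
Set of points $x_i$ with labels $y_i \in \{1, -1\}$
separated by a hyperplane $y = w^T x + b$
so that $y_i(w^T x_i + b) \geq 1$
For N dimensions, the hyperplane has N-1 dimensions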
![Page 7: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/7.jpg)
Linear classification (in 2D space)
[Figure: axes Voxel 1 and Voxel 2]
New data projected onto previously learned hyperplane
Assignment to classes $y_i \in \{1, -1\}$ → prediction accuracy
Which hyperplane to choose?
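A minimal sketch of the decision step, assuming a hyperplane (w, b) has already been learned. The weights, test patterns and labels below are made-up illustration values, not data from any study.

```python
import numpy as np

def predict(X, w, b):
    """Assign each row of X to class +1 or -1 according to its side of the hyperplane."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

def accuracy(y_true, y_pred):
    """Proportion of correctly assigned patterns."""
    return float(np.mean(y_true == y_pred))

# Hypothetical, previously learned hyperplane in a 2-voxel space
w = np.array([1.0, -1.0])
b = 0.0

# Hypothetical new patterns (rows: patterns, columns: Voxel 1, Voxel 2) and their true labels
X_new = np.array([[2.0, 0.5], [0.2, 1.5], [1.0, 0.1], [0.3, 0.9]])
y_new = np.array([1, -1, 1, 1])

y_pred = predict(X_new, w, b)
acc = accuracy(y_new, y_pred)
```

The classifiers on the following slides differ only in how w and b are chosen; the prediction step stays the same.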
![Page 8: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/8.jpg)
Difference between means
$w \propto m_2 - m_1$
Corresponding to a classifier based on Euclidean distance / correlation
[Figure: class means m1 and m2]
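A minimal sketch of the difference-between-means rule, assuming labels -1 and +1 for conditions 1 and 2 and a threshold at the midpoint between the two means (which makes it equivalent to assigning each pattern to the nearest mean in Euclidean distance). The toy patterns are illustrative only.

```python
import numpy as np

def fit_mean_difference(X, y):
    """Return (w, b) with w = m2 - m1 and the hyperplane through the midpoint of the means."""
    m1 = X[y == -1].mean(axis=0)
    m2 = X[y == 1].mean(axis=0)
    w = m2 - m1
    b = -w @ (m1 + m2) / 2.0
    return w, b

# Toy training patterns in a 2-voxel space (illustrative values)
X = np.array([[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]])
y = np.array([-1, -1, 1, 1])

w, b = fit_mean_difference(X, y)

# Classify two new patterns by the sign of w.x + b
X_new = np.array([[3.0, 5.0], [1.0, -3.0]])
y_pred = np.where(X_new @ w + b >= 0, 1, -1)
```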
![Page 9: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/9.jpg)
Examples: difference between means
Used to demonstrate distinct multi-voxel activity patterns for object categories in ventral visual cortex (Haxby et al., 2001), and in other recent studies on object representation, e.g. position tolerance (Schwarzlose et al., 2008) and perceived shape similarity (Op de Beeck et al., 2008)
[Figure from Haxby et al., 2001]
![Page 10: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/10.jpg)
Difference between means
$w \propto m_2 - m_1$
Corresponding to a classifier based on Euclidean distance / correlation
Not taking into account variances/covariances
[Figure: class means m1 and m2]
![Page 11: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/11.jpg)
Fisher's linear discriminant
$w \propto S^{-1}(m_2 - m_1)$
S: covariance matrix
Distance measure: Mahalanobis distance
[Figure: class means m1 and m2]
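A minimal sketch of Fisher's linear discriminant, assuming S is estimated as the pooled within-class covariance and the threshold sits at the midpoint of the means (equal priors). The simulated two-voxel data with correlated noise are purely illustrative, and the accuracy is computed on the training data only as a sanity check.

```python
import numpy as np

def fit_fisher(X, y):
    """Return (w, b) with w = S^-1 (m2 - m1), S being the pooled within-class covariance."""
    X1, X2 = X[y == -1], X[y == 1]
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    w = np.linalg.solve(S, m2 - m1)   # solve S w = m2 - m1 instead of inverting S
    b = -w @ (m1 + m2) / 2.0          # assumes equal prior probabilities
    return w, b

# Simulated patterns: two voxels with strongly correlated noise (illustrative values)
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
X = np.vstack([
    rng.multivariate_normal([0.0, 0.0], cov, size=100),
    rng.multivariate_normal([2.0, 0.0], cov, size=100),
])
y = np.repeat([-1, 1], 100)

w, b = fit_fisher(X, y)
acc = np.mean(np.where(X @ w + b >= 0, 1, -1) == y)
```

In this simulation the means differ only in the first voxel, yet the second voxel receives a non-zero weight because it helps to cancel the shared noise.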
![Page 12: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/12.jpg)
Examples: Fisher's linear discriminant
Decoding of conscious and unconscious stimulus orientation from early visual cortex activity (Haynes & Rees, 2005)
Discrimination of individual faces in anterior inferotemporal cortex (Kriegeskorte et al., 2007)
[Figures from Haynes & Rees, 2006 (review) and Kriegeskorte et al., 2007]
![Page 13: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/13.jpg)
Fisher's linear discriminant
$w \propto S^{-1}(m_2 - m_1)$
S: covariance matrix
Distance measure: Mahalanobis distance
Curse of dimensionality:
S is not invertible when the dimensionality exceeds the number of data points
[Figure: class means m1 and m2]
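The invertibility problem can be checked numerically. A small sketch with made-up sizes (20 patterns, 100 voxels) and random data: the sample covariance matrix is 100 x 100, but its rank cannot exceed the number of patterns minus one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_voxels = 20, 100          # more voxels than patterns (illustrative sizes)
X = rng.standard_normal((n_samples, n_voxels))

S = np.cov(X, rowvar=False)            # 100 x 100 sample covariance matrix
rank = np.linalg.matrix_rank(S)        # at most n_samples - 1, so S is singular
```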
![Page 14: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/14.jpg)
Support vector machines
"Hard-margin" classifier
w: weighted linear combination of support vectors
minimising $\frac{1}{2}\|w\|^2$ subject to $y_i(w^T x_i + b) \geq 1$, $i = 1, \dots, N$
[Figure: separating hyperplane with support vectors marked]
![Page 15: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/15.jpg)
Support vector machines
"Soft-margin" classifier
w: weighted linear combination of support vectors
minimising $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$
subject to $y_i(w^T x_i + b) \geq 1 - \xi_i$, $\xi_i \geq 0$, $i = 1, \dots, N$
C: regularisation parameter (trade-off between largest margin and fewest misclassifications)
[Figure: separating hyperplane with support vectors and slack variables ξ marked]
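To make the role of the slack variables and of C concrete, the sketch below evaluates the soft-margin objective for a fixed, hypothetical (w, b). It does not train an SVM. For given w and b, the smallest admissible slack is ξ_i = max(0, 1 - y_i(w·x_i + b)). All numbers are illustrative.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, xi) for the soft-margin SVM at a fixed hyperplane (w, b)."""
    margins = y * (X @ w + b)
    xi = np.maximum(0.0, 1.0 - margins)      # slack: zero for points beyond the margin
    objective = 0.5 * (w @ w) + C * xi.sum()
    return float(objective), xi

# Hypothetical hyperplane and patterns
w = np.array([1.0, 0.0])
b = 0.0
X = np.array([[2.0, 0.0], [0.5, 1.0], [-3.0, 1.0], [0.5, 0.0]])
y = np.array([1, 1, -1, -1])

obj_small_c, xi = soft_margin_objective(w, b, X, y, C=1.0)
obj_large_c, _ = soft_margin_objective(w, b, X, y, C=10.0)
```

Here the second pattern lies inside the margin (ξ = 0.5) and the fourth is misclassified (ξ = 1.5). A larger C penalises the same slack more heavily, pushing the solution towards fewer margin violations at the cost of a narrower margin.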
![Page 16: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/16.jpg)
Examples: SVM
Decoding of attended orientation and motion direction from early visual cortex activity (Kamitani & Tong, 2005, 2006)
[Figure from Kamitani & Tong, 2005]
![Page 17: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/17.jpg)
Support vector machines
Non-linear classifier
Use of non-linear kernel functions
Potential for overfitting, especially when few training examples are available
Hardly used in fMRI
[Figure: non-linear decision boundary with support vectors marked]
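As an illustration of what a non-linear kernel function computes, here is a sketch of the Gaussian (RBF) kernel, one common choice. The slides do not say which kernel is meant, so this choice and the value of gamma are assumptions.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dist = (
        (A ** 2).sum(axis=1)[:, None]
        + (B ** 2).sum(axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negative rounding errors

# Three illustrative patterns in a 2-voxel space
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, X, gamma=0.5)
```

A kernel SVM replaces every inner product between patterns by such a kernel value, which gives the classifier far more flexibility and hence more room to overfit when training examples are few.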
![Page 18: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/18.jpg)
Comparison of classifier performance
From Cox & Savoy, 2003
![Page 19: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/19.jpg)
Analysis work flow
1) ROI definition
2) Data extraction (patterns for Condition 1, Condition 2, ...)
3) Training (pattern classifier)
4) Test: object discrimination (same size), size generalisation (1 step)
![Page 21: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/21.jpg)
ROI definition – voxel selection
Regions of interest have to be defined by an orthogonal contrast (e.g., in an object exemplar discrimination experiment: LOC localiser session, all stimuli vs. baseline, etc.)
If further voxel selection is performed based on the contrast of interest, it has to use the training data only, to avoid bias
Other criteria for voxel selection can also be biased (e.g., "reproducibility" of the voxelwise response to different conditions in separate sessions; Grill-Spector et al., 2006, Nat Neurosci)
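A minimal sketch of unbiased voxel selection, assuming voxels are ranked by a two-sample t statistic for the contrast of interest. The ranking is computed from the training patterns only and then applied unchanged to the test patterns. The simulated data, the split and the number of voxels kept are illustrative assumptions.

```python
import numpy as np

def select_voxels(X_train, y_train, n_keep):
    """Indices of the n_keep voxels with the largest |t| between conditions (training data only)."""
    a = X_train[y_train == 1]
    b = X_train[y_train == -1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / se
    return np.argsort(-np.abs(t))[:n_keep]

# Simulated data: 40 patterns x 50 voxels, only the first 5 voxels carry signal (hypothetical)
rng = np.random.default_rng(0)
n_patterns, n_voxels = 40, 50
y = np.tile([1, -1], n_patterns // 2)
X = rng.standard_normal((n_patterns, n_voxels))
X[y == 1, :5] += 5.0

train = np.arange(n_patterns) < 30               # first 30 patterns for training
selected = select_voxels(X[train], y[train], n_keep=5)

X_train_sel = X[train][:, selected]              # used to train the classifier
X_test_sel = X[~train][:, selected]              # test data never influenced the selection
```

In a cross-validation scheme the selection would be repeated inside every fold, on that fold's training data.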
![Page 23: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/23.jpg)
Data extraction
Which data to use for classification?
No general rule: different studies have used beta images or raw EPI images
Ideally, as many images as possible for optimal classification performance
In typical neuroimaging studies, there is a trade-off between the number of images and their individual signal-to-noise ratio
Fewer but less noisy images are sometimes preferable (when using SVM)
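The trade-off can be illustrated with a toy simulation of a single voxel, assuming independent Gaussian noise across images (real fMRI noise is temporally correlated, so this is only a rough guide). Averaging k images divides the number of training examples by k and reduces the noise standard deviation by about the square root of k. All values are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
signal, noise_sd, n_images = 1.0, 2.0, 1200      # hypothetical single-voxel values
images = signal + noise_sd * rng.standard_normal(n_images)

results = {}
for k in (1, 4, 16):
    # Average non-overlapping groups of k images into one less noisy example
    averaged = images.reshape(-1, k).mean(axis=1)
    results[k] = (averaged.size, averaged.std(ddof=1))   # (number of examples, noise SD)
```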
![Page 25: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/25.jpg)
Crossvalidation (Training – test)
Classifier performance always has to be tested on independent data
Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
Leave-one-out crossvalidation (common with other classifiers): e.g. all but one session for training, the remaining session for test
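A minimal sketch of a split-half correlation classifier, assuming one mean pattern per condition has been computed in each half of the data. Each pattern from the second half is assigned to the condition whose first-half pattern it correlates with most strongly. This is a simplified variant for illustration, and the simulated patterns are made up.

```python
import numpy as np

def split_half_correlation_accuracy(means_a, means_b):
    """means_a, means_b: (n_conditions, n_voxels) mean patterns from two independent halves."""
    n = means_a.shape[0]
    # r[i, j]: correlation of condition i in half A with condition j in half B
    r = np.corrcoef(means_a, means_b)[:n, n:]
    predicted = r.argmax(axis=0)                 # best-matching half-A condition per half-B pattern
    return float(np.mean(predicted == np.arange(n)))

# Simulated mean patterns: a shared condition-specific pattern plus independent noise per half
rng = np.random.default_rng(0)
true_patterns = rng.standard_normal((2, 50))
half_a = true_patterns + 0.5 * rng.standard_normal((2, 50))
half_b = true_patterns + 0.5 * rng.standard_normal((2, 50))

acc = split_half_correlation_accuracy(half_a, half_b)
```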
![Page 26: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/26.jpg)
Leave-one-out crossvalidation
Leave one out with N-fold cross-validation
[Figure: patterns from blocks 1 : N for Condition 1 and Condition 2; SVM pattern classifier trained on all but one pattern per condition and tested on the remaining one]
![Page 28: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/28.jpg)
Crossvalidation (Training – test)
Classifier performance always has to be tested on independent data
Split-half crossvalidation (often used in studies employing correlation): one half of the data for training, the other for test
Leave-one-out crossvalidation (common with other classifiers): e.g. all but one session for training, the remaining session for test
Importantly, "leave-one-out" should mean leaving out one image of each condition (all of one session), to avoid biases due to session effects and unequal prior probabilities (with SVM)
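A minimal sketch of leave-one-session-out crossvalidation, in which each fold holds out all patterns of one session (one per condition), so the training set stays balanced. A nearest-mean classifier stands in for the SVM to keep the example dependency-free; the simulated patterns and session offsets are illustrative assumptions.

```python
import numpy as np

def fit_means(X, y):
    """Class means for labels -1 and +1."""
    return X[y == -1].mean(axis=0), X[y == 1].mean(axis=0)

def predict_nearest_mean(model, X):
    """Assign each pattern to the class with the nearer mean (Euclidean distance)."""
    m1, m2 = model
    d1 = ((X - m1) ** 2).sum(axis=1)
    d2 = ((X - m2) ** 2).sum(axis=1)
    return np.where(d2 < d1, 1, -1)

def leave_one_session_out(X, y, sessions):
    """Mean test accuracy over folds; each fold holds out one complete session."""
    accuracies = []
    for s in np.unique(sessions):
        test = sessions == s
        model = fit_means(X[~test], y[~test])
        accuracies.append(np.mean(predict_nearest_mean(model, X[test]) == y[test]))
    return float(np.mean(accuracies))

# Simulated data: 6 sessions, one pattern per condition and session, 20 voxels
rng = np.random.default_rng(0)
n_sessions, n_voxels = 6, 20
sessions = np.repeat(np.arange(n_sessions), 2)
y = np.tile([-1, 1], n_sessions)
X = rng.standard_normal((2 * n_sessions, n_voxels))
X[y == 1] += 2.0                                                   # condition effect
X += 0.5 * rng.standard_normal((n_sessions, n_voxels))[sessions]   # session-specific offset

acc = leave_one_session_out(X, y, sessions)
```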
![Page 29: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/29.jpg)
Implementations
General SVM implementations exist in different languages:
Matlab: SVM toolbox (University of Southampton, UK) http://www.isis.ecs.soton.ac.uk/resources/svminfo
SVM toolbox (TU Graz, Austria) http://ida.first.fraunhofer.de/~anton/software.html
C: SVM-light http://svmlight.joachims.org
Python or R
Multi-Voxel Pattern Analysis (MVPA) toolbox for fMRI data, developed at Princeton University (beta version; Matlab, Python) http://www.csbmb.princeton.edu/mvpa
![Page 30: Introduction to classifiers for multivariate decoding of fMRI data](https://reader035.fdocuments.in/reader035/viewer/2022062408/56813c94550346895da64292/html5/thumbnails/30.jpg)
Appendix: Distance measures
Given an m-by-n data matrix X, which is treated as m (1-by-n) row vectors $x_1, x_2, \dots, x_m$, the various distances between the vectors $x_r$ and $x_s$ are defined as:
Euclidean distance:
$D_{rs}^2 = (x_r - x_s)(x_r - x_s)'$
Standardised Euclidean distance:
$D_{rs}^2 = (x_r - x_s) D^{-1} (x_r - x_s)'$
D: diagonal matrix with diagonal elements given by the variance of the variable $X_i$ over the m objects
Mahalanobis distance:
$D_{rs}^2 = (x_r - x_s) S^{-1} (x_r - x_s)'$
S: sample covariance matrix
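The three squared distances can be computed directly from these definitions. A small sketch with a made-up 5-by-2 data matrix, assuming variances and covariances are estimated with the usual m - 1 denominator:

```python
import numpy as np

def squared_distances(X, r, s):
    """Squared Euclidean, standardised Euclidean and Mahalanobis distance between rows r and s."""
    d = X[r] - X[s]
    variances = X.var(axis=0, ddof=1)            # diagonal elements of D
    S = np.cov(X, rowvar=False)                  # sample covariance matrix
    return {
        "euclidean": float(d @ d),
        "standardised": float(d @ (d / variances)),
        "mahalanobis": float(d @ np.linalg.solve(S, d)),
    }

# Illustrative data matrix: m = 5 objects (rows), n = 2 variables (columns)
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [1.0, 0.0], [1.0, 2.0]])
dist = squared_distances(X, 0, 2)
```

For this matrix the variances are 0.5 and 1 and the covariance is 0.5, which gives squared distances of 8 (Euclidean), 12 (standardised Euclidean) and 8 (Mahalanobis) between the first and third rows.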