Single session analysis using FEAT
David Field
Thanks to Tom Johnstone, Jason Gledhill, and FMRIB
Single “session” or “first level” FMRI analysis
• In FMRI you begin with thousands of time courses, one for each voxel location
• Lots of "preprocessing" is applied to maximise the quality of the data
• Then, each voxel time course is modelled independently
– the model is a set of regressors (EVs) that vary over time
– the same model is applied to every voxel time course
– if (part of) the model fits the voxel time course well, the voxel is said to be "active"
[Figure: example voxel image intensity values over time]
Plan for today
1) Detailed look at the process of modelling a single voxel time course
• The input is a voxel time course of image intensity values and the design matrix, which is made up of multiple EVs
• The GLM is used to find the linear combination of EVs that explains the most variance in the voxel time course
• The output is a PE for each EV in the design matrix
2) Preprocessing of voxel time courses
• this topic probably only makes much sense if you understand 1) above, which is why I am breaking with tradition and covering this topic second rather than first
3) Implementing a single session analysis in FSL using FEAT (workshop)
• Note: there is no formal meeting in week 3 of the course, but the room will be open for you to complete worksheets from today and last week
– at least one experienced FSL user will be here to help
Preprocessing
• The BET brain extraction option refers to the 4D functional series
– this will not run BET on the structural
– normally turn this on
• Brain extraction for the structural image that will be used as the target for registration has to be performed by you, before you use FEAT to set up the processing pipeline
• On the Misc tab you can toggle balloon help
– balloon help will tell you what most of the options mean if you hover the mouse over a button or tickbox
– but it gets annoying after a while
• If progress watcher is selected then FEAT opens a web browser that shows regular updates of the stage your analysis has reached
Motion Correction: MCFLIRT
• Aims to make sure that there is a consistent mapping between voxel X,Y,Z position and actual anatomical locations
• Each image in the 4D series is registered to the reference image (by default, the series mid point) using a 6 DOF rigid body spatial transform
• This means every image apart from the reference image is moved slightly
– MCFLIRT plots the movements that were made as output
MCFLIRT output
• Head rotation
• Head translation
• Total displacement
The total displacement plot
• Relative displacement is head position at each time point relative to the previous time point
• Absolute displacement is head position at each time point relative to the reference image (by default, the volume in the middle of the time series)
• Why should you be particularly concerned about high values in the relative motion plot?
• The first thing to do is look at the range of values plotted on the y axis, because MCFLIRT auto-scales the y axis to the data range
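To make the relative motion plot concrete, here is a minimal sketch (not MCFLIRT's own code) of how a framewise displacement score can be computed from six motion parameters per volume; the column order and the 50 mm head-radius convention are assumptions you should check against your own .par files.

```python
import numpy as np

def framewise_displacement(params, radius=50.0):
    """Relative displacement between successive volumes.

    params: (T, 6) array of motion parameters per volume --
    assumed here to be three rotations in radians, then three
    translations in mm. Rotations are converted to millimetres of
    arc length on a sphere of the given radius (50 mm is a common
    approximation of head size).
    """
    diffs = np.abs(np.diff(params, axis=0))
    diffs[:, :3] *= radius          # radians -> mm of arc length
    return diffs.sum(axis=1)        # one value per volume-to-volume step

# Toy series: no motion, then a 1 mm translation in x at volume 2
params = np.zeros((4, 6))
params[2:, 3] = 1.0
fd = framewise_displacement(params)
```

A single large value in `fd` flags a sudden head jerk between two volumes, which is exactly what the relative motion plot makes visible.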
Slice timing correction
• Each slice in a functional image is acquired separately
• Acquisition is normally interleaved, which prevents blending of signal from adjacent slices
• Assuming a TR of 3 seconds, what is the time difference between the acquisition of adjacent slices?
• A single functional brain area may span two or more slices
Why are the different slice timings a problem?
• The same temporal model (design matrix) is fitted to every voxel time course
– therefore, the model assumes that all the voxel values in a single functional volume were measured simultaneously
• Given that they are not measured simultaneously, what are the implications?
• Consider two voxels from adjacent slices
– both voxels are from the same functional brain area
– this time the TR is 1.5 sec, but the slice order differed from the standard interleaved procedure, so there is a 1 sec time gap between acquisition of the two voxels in the adjacent slices that cover the functional brain area
Blue line = real HRF in response to a brief event at time 0
Blue squares: intensity values at a voxel first sampled at 0.5 sec, then every 1.5 sec thereafter (TR = 1.5)
Red circles: intensity values at a voxel from an adjacent slice first sampled at 1.5 sec, then every 1.5 sec thereafter (TR = 1.5)
These are the two voxel time courses that are submitted to the model
The model time course is yoked to the mid point of the volume acquisition (TR), so there will be a better fit for voxels in slices acquired at or near that time.
Slice timing solutions
• Any ideas based on what was covered earlier?
– Including temporal derivatives of the main regressors in the model allows the model to be shifted in time to fit voxels in slices acquired far away from the mid point of the TR
• But,
– this makes it difficult to interpret the PEs for the derivatives: do they represent slice timing, or do they represent variations in the underlying HRF?
– and, you end up having to use F contrasts instead of t contrasts (see later for why this is to be avoided if possible)
Slice timing solutions
• Any ideas based on what was covered earlier?
– Use a block design with blocks long enough that the BOLD response is summated over a long time period
• But,
– not all experimental stimuli / tasks can be presented in a block design
• So,
– another option is to shift the data from different slices in time by small amounts so that it is as if all slices were acquired at once
– this is what the preprocessing option in FEAT does
Shifting the data in time - caveats
• If the TR is 3 seconds, and you need to move the data for a given slice by, e.g., 1 sec, you don't have a real data point to use
– get round this by interpolating the missing time points, allowing whole voxel time courses to be shifted in time so that effectively they were all sampled at once at the middle of the TR
• But…
– interpolation works OK for a short TR, but it does not work well for a long TR (> 3 sec?)
– so this solution only works well when slice timing issues are relatively minor
• There is a debate about whether to do slice timing correction before or after motion correction
– FSL does motion correction first
– some people advise against any slice timing correction
– generally, if you have an event related design then use it, but make sure you check carefully with the scanner technician what order your slices were acquired in!
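The shifting-by-interpolation idea can be sketched in a few lines; this toy example uses simple linear interpolation on made-up numbers (FSL's slicetimer uses more sophisticated interpolation than this, and real edge handling is more careful):

```python
import numpy as np

# Voxel time course sampled once per TR; we want the values it "would
# have had" if this slice were acquired 1 s later than it really was.
TR = 3.0
t = np.arange(0, 30, TR)             # acquisition times for this slice
signal = np.sin(2 * np.pi * t / 30)  # stand-in for a voxel time course

shift = 1.0
# Linear interpolation between the points that were actually sampled;
# values requested beyond the last sample are clamped to the end point.
shifted = np.interp(t + shift, t, signal)
```

With a short TR the sample points are close together, so interpolation between them is accurate; with a long TR the gaps are wide and the interpolated values become unreliable, which is the caveat above.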
Temporal filtering
• Filtering in time and/or space is a long-established method in any signal detection process to help "clean up" your signal
• The idea is that if your signal and noise are present at separable frequencies in the data, you can attenuate the noise frequencies and thus increase your signal to noise ratio
[Figure: block design timeline — repeating blocks of conditions A, V, and R over time]
[Figure: raw data — force plotted against time (sec)]
[Figure: after low pass filter — force plotted against time (sec)]
[Figure: a very low frequency component remains, suggesting that a high pass filter is also needed]
[Figure: low and high frequencies removed]
Setting the high pass filter for FMRI
• The rule recommended by FSL is that the lowest setting for the high pass filter should be equal to the duration of a single cycle of the design
– in an ABCABCABC block design, it will be equal to the duration of ABC
• If you set a shorter duration than this you will remove signal that is associated with the experiment
• If you set a higher duration than this then any unsystematic variation (noise) in the data with a periodic structure lower in frequency than the experimental cycle time will remain in voxel time courses
• In complicated event related designs lacking a strongly periodic structure there is a subjective element to the setting of the high pass filter
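As a worked example of the rule, here is the arithmetic for a hypothetical ABC block design (block length is made up):

```python
# Rule of thumb from the slides: the high-pass cutoff (in seconds)
# should be no shorter than one full cycle of the design.
block_len = 30.0                    # seconds per condition block (hypothetical)
conditions = 3                      # A, B, C
cycle = block_len * conditions      # one full ABC cycle = 90 s
highpass_cutoff_s = cycle           # what you would enter in FEAT
highpass_cutoff_hz = 1.0 / cycle    # the same cutoff as a frequency
```

Anything slower than one cycle of the design (lower in frequency than `highpass_cutoff_hz`) is treated as drift and removed; anything at the design frequency or faster is kept.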
Setting the high pass filter for FMRI
• Do not use experimental designs with many conditions where the duration of a single experimental cycle is very long
– e.g. ABCDEABCDE, where ABCDE = 300 seconds
– setting the high pass to 300 sec will allow a lot of the low frequency FMRI noise to pass through to the modelling stage
– furthermore, the noise can easily become correlated with the experimental time course, because the experimental time course has a similar frequency to that of FMRI noise
– in any signal detection experiment, not just FMRI, you need to ensure that the signal of interest and the noise are present at different temporal frequencies
Low pass filter?
• As well as removing oscillations with a longer cycle time than the experiment, you can also elect to remove oscillations with a shorter cycle time than the experiment
– high frequency noise
• In theory this should enhance signal and reduce noise, and it was practised in the early days of FMRI
• However, it has now been demonstrated that because FMRI noise has temporal structure (i.e. it is not white noise), the low pass filter can actually enhance noise relative to signal
• The temporal structure in the noise is called "temporal autocorrelation" and is dealt with in FSL using FILM prewhitening instead of low pass filtering
– in another break with the traditional structure of FMRI courses, temporal autocorrelation and spatial smoothing will be covered after t and F contrasts
After preprocessing and model fitting
• You can now begin to answer the questions you set out to answer…
Which voxels "activated" in each experimental condition?
• In the auditory / visual stimulus experiment, how do you decide if a voxel was more "activated" during the visual stimulus than during the baseline?
– if the visual condition PE is > 0 then the voxel is active
– but "> 0" has to take into account the noise in the data
– we need to be confident that if you repeated the experiment many times the PE would nearly always be > 0
• How can you take the noise into account and quantify confidence that PE > 0?
– PE / residual variation in the voxel time course
– this is a t statistic, which can be converted to a p value by taking into account the degrees of freedom
– the p value is the probability of a PE as large or larger than the observed PE if the true value of the PE was 0 (null hypothesis) and the only variation present in the data was random variation
– FSL converts t statistics to z scores, simplifying interpretation because z can be converted to p without using degrees of freedom
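The PE-to-t-to-z chain can be sketched with hypothetical numbers (this illustrates the statistical logic, not FSL's implementation):

```python
from scipy import stats

# Hypothetical values at one voxel: a parameter estimate, its
# standard error, and the degrees of freedom of the fit
pe, se, dof = 2.4, 0.8, 100

t = pe / se                 # t statistic: effect size relative to noise
p = stats.t.sf(t, dof)      # one-tailed p value, needs dof
z = stats.norm.isf(p)       # equivalent z score: same p, no dof needed
```

The z score carries the same evidence as the (t, dof) pair, which is why FSL can report z statistic images without attaching degrees of freedom to every voxel.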
Why include "effects of no interest" in the model?
• Also called "nuisance regressors"
• Imagine an experiment about visual processing of faces versus other classes of object
• Why add EVs to the design matrix based on the time course of:
– head motion parameters
– image intensity spikes
– physiological variables
• galvanic skin response, heart rate, pupil diameter
• Answer: to reduce the size of the residual error term
– t will be bigger when PE is bigger, but t will also be bigger when error is smaller
– you can model residual variation that is systematic in some way, but some of the residual variation is truly random in nature, e.g. thermal noise from the scanner, and cannot be modelled out
t contrasts
• Contrast is short for Contrast of Parameter Estimates (COPE)
– it means a linear sum of PEs
– the simplest examples are implicit contrasts of individual PEs with baseline. Using the example from the interactive spreadsheet:
– visual PE * 1 + auditory PE * 0
– visual PE * 0 + auditory PE * 1
– to locate voxels where the visual PE is larger than the auditory PE:
– visual PE * 1 + auditory PE * -1
– to locate voxels where the auditory PE is larger than the visual PE:
– visual PE * -1 + auditory PE * 1
t contrasts
• The value of the contrast in each voxel is divided by an estimate of the residual variation in the voxel time course (standard error)
– produces a t statistic of the contrast
– residual variation is based on the raw time course values minus the predicted values from the fitted model
• Activation maps (red and yellow blobs superimposed on anatomical images) are produced by mapping the t value at each voxel to a colour
– crudely, thresholding is just setting a value of t below which no colour is assigned
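The contrast weights above amount to a dot product between a weight vector and the vector of PEs; the PE values here are hypothetical:

```python
import numpy as np

# Hypothetical parameter estimates at one voxel: [visual, auditory]
betas = np.array([2.0, 0.5])

contrasts = {
    "visual > baseline":   np.array([1, 0]),
    "auditory > baseline": np.array([0, 1]),
    "visual > auditory":   np.array([1, -1]),
    "auditory > visual":   np.array([-1, 1]),
}

# Each COPE is just the weighted sum of the PEs
copes = {name: float(c @ betas) for name, c in contrasts.items()}
```

Note that "visual > auditory" and "auditory > visual" give the same magnitude with opposite sign, which is why a t contrast is directional.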
What will the [1 1] contrast give you?
F contrasts
• These can be used to find voxels that are active in any one or more of a set of t contrasts, e.g. with two rows:
–  Visual  Auditory  Tactile
–    1       -1        0
–    1        0       -1
• F contrasts are bidirectional (1 -1 also implies -1 1)
– rarely a good thing in practice…
• If the response to an event was modelled with the standard HRF regressor plus its time derivative, then you can use an F contrast to view both components of the model of the response on a single activation map
– if you are relying on the time derivative to deal with slice timing then you are strongly advised to do this
Thresholding / multiple comparisons problem
Temporal autocorrelation
• The image intensity value in a voxel at time X is partially predictable from the same voxel at times X-1, X-2, X-3 etc (even in baseline scans with no experiment)
– why is this a problem?
– it makes it hard to know the number of statistically independent observations in a voxel time course
– life would be simple if the number of observations was equal to the number of functional volumes
– temporal autocorrelation means the true N is lower than this
• Why do we need to know the number of independent observations?
– because calculating a t statistic requires dividing by the standard error, and the standard error is SD / √N
– degrees of freedom are needed to convert t stats to p values
• If you use the number of time points in the voxel time course as N then p values will be too small (false positives)
Measuring autocorrelation
Measuring autocorrelation (SPSS style)
Plot the correlation against the degree of shift
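Plotting correlation against degree of shift amounts to computing lagged correlations of the series with itself; a sketch using a simulated autocorrelated series (the AR(1) generating process and its 0.6 coefficient are arbitrary choices for illustration):

```python
import numpy as np

def autocorr(x, max_lag):
    """Correlation of a series with progressively shifted copies of itself."""
    x = np.asarray(x, float)
    return [np.corrcoef(x[:-k], x[k:])[0, 1] for k in range(1, max_lag + 1)]

# Simulate a series where each value is partly predictable from the last
rng = np.random.default_rng(0)
x = np.zeros(500)
for i in range(1, 500):
    x[i] = 0.6 * x[i - 1] + rng.standard_normal()

r = autocorr(x, 5)   # correlation at lags 1..5
```

For an autocorrelated series the correlation is high at short lags and decays as the shift grows, which is the shape the lag plot reveals.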
Temporal autocorrelation
• Generally, the value of a voxel at time t is partially predictable from nearby time points about 3-6 seconds in the past
– so, if you use a very long TR, e.g. 6 sec, then you mostly avoid the problem, as the original time points will have sufficient independence from each other
• Voxel values are also predictable from more distant time points due to low frequency noise with periodic structure
– but the high pass filter should deal with this problem
Autocorrelation: FILM prewhitening
• First, fit the model (regressors of interest and no interest) to the voxel time course using the GLM
– (ignoring the autocorrelation for the moment)
• Estimate the temporal autocorrelation structure in the residuals
– note: if the model is good, residuals = noise
• The estimated structure can be inverted and used as a temporal filter to undo the autocorrelation structure in the original data
– the time points are now independent and so N = the number of time points (volumes)
– the filter is also applied to the design matrix
• Refit the GLM
– run t and F tests with valid standard error and degrees of freedom
• Prewhitening is selected on the Stats tab in FEAT
– it is computationally intensive, but with a modern PC it is manageable, and there are almost no circumstances where you would turn this option off
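The core of the idea can be sketched with the simplest possible autocorrelation model, AR(1); FILM itself estimates a richer autocorrelation structure than this, so treat the sketch as an illustration of the filter-data-and-design step only (all numbers are made up):

```python
import numpy as np

def prewhiten_ar1(y, X, phi):
    """Undo AR(1) autocorrelation with a first-difference-style filter.

    y: voxel time course, X: design matrix, phi: estimated lag-1
    autocorrelation of the residuals. The same filter is applied to
    both the data and the design matrix, as in FEAT.
    """
    y_w = y[1:] - phi * y[:-1]
    X_w = X[1:] - phi * X[:-1]
    return y_w, X_w

# Simulated AR(1) noise plus a boxcar regressor with true PE = 2.0
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.tile([1.0] * 10 + [0.0] * 10, n // 20), np.ones(n)])
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.5 * noise[i - 1] + rng.standard_normal()
y = 2.0 * X[:, 0] + noise

# Filter data and design, then refit the GLM on the whitened versions
y_w, X_w = prewhiten_ar1(y, X, 0.5)
beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
```

After whitening, the residuals are (approximately) independent, so the standard error and degrees of freedom used in the t test are valid.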
Spatial smoothing
• FMRI noise varies across space as well as time
– smoothing is a way of reducing spatial noise and thereby increasing the signal to noise ratio (SNR) in the data
• Unlike FMRI temporal noise, FMRI spatial noise is more like white noise, making it easier to deal with
– it is essentially random, essentially independent from voxel to voxel, and has a mean of about zero
– therefore if you average image intensity across several voxels, the noise tends to average towards zero, whereas signal that is common to the voxels you are averaging across will remain unchanged, dramatically improving the SNR
• A secondary benefit of smoothing is to reduce anatomical variation between participants that remains after registration to the template image
– this is because smoothing blurs the images
Spatial smoothing: FWHM
• FSL asks you to specify a Gaussian smoothing kernel defined by its Full Width at Half Maximum (FWHM) in mm
• To find the FWHM of a Gaussian:
– find the point on the y axis where the function attains half its maximum value
– then read off the corresponding x axis values; the FWHM is the distance between them
Spatial smoothing: FWHM
• The Gaussian is centred on a voxel, and the value of that voxel is averaged with the values of adjacent voxels that fall under the Gaussian
• The averaging is weighted by the y axis value of the Gaussian at the appropriate distance
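The FWHM-to-standard-deviation conversion follows directly from the definition of the Gaussian, and the weighted-averaging step can be demonstrated with scipy's 1D Gaussian filter (purely for illustration; this is not FSL's smoothing code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def fwhm_to_sigma(fwhm):
    """Convert FWHM (what FSL asks for) to the Gaussian's standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

sigma = fwhm_to_sigma(6.0)       # a 6 mm kernel, in mm

# Smoothing a single bright voxel spreads its value over its neighbours,
# weighted by the height of the Gaussian at each distance
row = np.zeros(21)
row[10] = 1.0
smoothed = gaussian_filter1d(row, sigma)
```

At a distance of FWHM/2 from the centre, the Gaussian is at exactly half its peak height, which is what "full width at half maximum" means.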
[Figure: the same image with no smoothing, 4 mm FWHM, and 9 mm FWHM]
Effects of smoothing on activations
[Figure: unsmoothed data vs smoothed data (kernel width 5 voxels)]
When should you smooth? When should you not?
• Smoothing is a good idea if:
– you're not particularly concerned with voxel-by-voxel resolution
– you're not particularly concerned with finding small (less than a handful of voxels) clusters
– you want (or need) to improve your signal-to-noise ratio
– you're averaging results over a group, in a brain region where functional anatomy and organization isn't precisely known
– you want to use p-values corrected for multiple comparisons with Gaussian field theory (as opposed to False Discovery Rate)
• this is the "Voxel" option in FSL and the "FWE" option in SPM
• The smoothing kernel should be small (or no smoothing used) if:
– you need voxel-by-voxel resolution
– you believe your activations of interest will only be a few voxels in size
– you're confident your task will generate large amounts of signal relative to noise
– you're working primarily with single-subject results
– you're mainly interested in getting region-of-interest data from very specific structures that you've drawn with high resolution on single subjects
How do you determine the size of the kernel?
• Based on functional voxel size? Or brain structure size?– A little of both, it seems.
• The matched filter theorem, from the signal processing field, tells us that if we're trying to recover a signal (like an activation) in noisy data (like FMRI), we can best do it by smoothing our data with a kernel that's about the same size as our activation.
• Trouble is, though, most of us don't know how big our activations are going to be before we run our experiment
• Even if you have a particular structure of interest (say, the hippocampus), you may not get activation over the whole region - only a part
• A lot of people set FWHM to functional voxel size * 2
Old slides beyond this point
Slice timing correction
• Each functional volume that forms part of the 4D time series is made up of slices
• Each slice is acquired at a different point in time relative to the start of the TR
– e.g., slice 1 at 100 msec, slice 2 at 200 msec, etc
• For each slice, it's the same time point relative to the start of the TR in every volume
• So, the interval between successive acquisitions is constant for every voxel
• But the actual time of acquisition is different for every slice
• The model of the time course assumes that within each volume, every slice was acquired simultaneously at the mid point of the TR
– so, the model is likely to fit better for one slice than all the others (bad)
• To use slice timing correction, you will need to tell FSL the order your slices were acquired in
– interleaved is the most common, but ask your scanner technician!
– adjustment is to the middle of the TR period
Slice timing correction
• For each voxel, slice-timing correction examines the time course and shifts it by a small amount
• This requires interpolating between the time points you actually sampled, to infer a more detailed version of the time course
• The more detailed time course can have small shifts applied to it that are slightly different for each voxel, depending on the actual order the slices were acquired in
• This allows you to make the assumption in your modelling that every voxel in each volume was acquired simultaneously
Slice timing correction
• The problem this tries to solve is more severe if you have a longer TR (e.g. 4 seconds)
– two adjacent slices in an interleaved sequence could be sampled almost 2 seconds apart
• But temporal interpolation also becomes dodgy with longer TRs
• For block designs (stimuli that are long relative to the TR, e.g. TR = 2 sec, stimulus lasts 16 sec), slice timing errors are not a significant factor influencing the fitting of a model to the data
• For event related designs (brief stimuli separated by variable pauses), slice timing correction is important
• People argue about whether to do slice timing correction before or after motion correction
– FSL does motion correction first
– some people advise against any slice timing correction
Temporal derivatives
• In the FEAT practical you will add temporal derivatives of the HRF-convolved experimental time courses to the design matrix
– what is the purpose of this?
• Each experimental time course is convolved with a model of the HRF
– this is to build the delay and blurring of the blood flow response relative to the neural response into the model
– but the delay varies between brain areas and between people
Temporal derivatives
• The green line is the first temporal derivative of the blue line
– its rate of change
– the positive max of the derivative is earlier than the normal HRF peak
– the negative max of the derivative is later than the normal HRF peak
• If fitting the model results in a positive beta weight on a derivative, this implies that the HRF peak is earlier in that voxel
• A negative beta weight for the derivative implies a later peak than "typical"
Temporal derivatives
• The basic HRF shape (blue on the previous slide) has some physiological underpinning (in visual cortex…)
• But the use of the derivative to model faster / slower responses is just a mathematical convenience
• The second temporal derivative (dispersion in time) can be used to model haemodynamic responses that are "thinner" or "fatter" in time than the basic shape
• The three functions together are sometimes called the "informed basis set" by SPM users
– the blue line is referred to as "canonical", but in fact it is only canonical for primary visual cortex
• The informed basis set can only model slight departures from the canonical response shape
• If you are interested in the prefrontal cortex of the elderly you'll need to use a more flexible basis set to model the temporal dynamics of the response
– or use a block design where timing issues are less severe
Cluster size based thresholding
• Intuitively, if a voxel with a Z statistic of 1.96 for a particular COPE is surrounded by other voxels with very low Z values, this looks suspicious
– unless you are looking for a very small brain area
• Now consider a voxel with a Z statistic of 1.96 that is surrounded by many other voxels with similar Z values, forming a large blob
• Intuitively, for such a voxel the Z of 1.96 (p = 0.05) overestimates the probability that the model fit to this voxel is a result of random, stimulus unrelated, fluctuation in the time course
• The p value we want to calculate is the probability of obtaining one or more clusters of this size or larger under a suitable null hypothesis
– "one or more" gives us control over the multiple comparisons problem by setting the family wise error rate
– the p value will be low for big clusters
– the p value will be high for small clusters
Comparison of voxel ("height based") thresholding and cluster thresholding
[Figure: with voxel thresholding, a height threshold (e.g. p = 0.001 applied voxelwise, roughly Z = 3) divides the statistic image into significant and non-significant voxels]
[Figure: with cluster thresholding, contiguous blobs above the height threshold are significant only if they contain k or more voxels]
• K is the probability of the image containing 1 or more blobs with k or more voxels (and you can control it at 0.05)
• The cluster size, in voxels, that corresponds to a particular value of K depends upon the initial value of the height threshold used to define the number of clusters in the image and their size
• It is usual to set the height threshold quite low when using cluster level thresholding, but this arbitrary choice will influence the outcome
[Figure: dependency of the number of clusters on the choice of height threshold]
• The number and size of clusters also depends upon the amount of smoothing that took place in preprocessing
• The Nyquist frequency is important to know about
– it is half the sampling rate (e.g. a TR of 2 sec is a sampling rate of 0.5 Hz, so Nyquist is 0.25 Hz, i.e. a 4 second cycle)
– no signal at a higher frequency than Nyquist can be present in the data (important for experimental design)
– but such signal could appear as an aliasing artefact at a lower frequency
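The Nyquist arithmetic for the TR = 2 sec example:

```python
TR = 2.0                            # seconds per volume
sampling_rate = 1.0 / TR            # 0.5 Hz
nyquist_hz = sampling_rate / 2.0    # 0.25 Hz
shortest_period = 1.0 / nyquist_hz  # 4 s: the fastest cycle the data can carry
```

Any experimental manipulation faster than `shortest_period` cannot be represented in the sampled data, only aliased into a lower frequency.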
Overview
• Today's practical session will cover processing of a single functional session from a single participant using FEAT
– FEAT is an umbrella program that brings together various other FSL programs into a customisable processing pipeline
• for example, it makes use of BET and FLIRT, which were programs covered in week 1
– definitions of "single session" and "multi session"
• We will also make use of an interactive spreadsheet that demonstrates how the general linear model (GLM) can be used to locate active regions of the brain given your predictions about the time course of activation
• The lecture will provide theoretical background for each processing step
• There is no formal meeting in week 3 of the course, but the room will be open for you to complete worksheets from today and last week
– at least one experienced FSL user will be here to help
Overview of single session FMRI
• The data is a 4D functional time series
– many thousands of spatial locations (voxels)
– each voxel has a time course defined by a single intensity value per TR (= per volume acquired)
• The task is to model the changes in image intensity over time separately for each voxel
– "mass univariate approach"
– begin with a set of regressors ("design matrix" / model)
– regressors usually reflect the time course of experimental conditions
– find the best linear combination of regressors to explain each voxel time course (basically, multiple regression)
• Before modelling the 4D time series a number of preprocessing steps are applied to the data
– remove unwanted sources of variation from the time series
– increase the signal to noise ratio
Voxel-wise single session modelling
• After the data has been optimised by preprocessing, you search for voxels where the time course of image intensity changes is correlated with the experimental time course
– activation
• This is achieved using the General Linear Model (GLM)
– similar to multiple regression
• The input to the GLM is the data, plus a set of explanatory variables called the "Design Matrix"
– sometimes EVs are included to model sources of variation that are of no interest to the experimenter
– this is to reduce the residual (error) variance
• The GLM is fitted independently for each voxel time course
– ignores the spatial structure in the brain
Regressor = 1 (stimulus on)
Regressor = 0 (stimulus off)
The voxel time courses are standardised so that beta weights are comparable between voxels
Experimental time course regressors are no longer square waves because they have been convolved with the HRF model
If there is structure in the residual time courses, something important has not been modelled
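The convolution that turns a square-wave experimental time course into a smooth regressor can be sketched as follows; the gamma variate used here is a common textbook approximation, not FSL's default HRF:

```python
import numpy as np

# A gamma-variate HRF model sampled at dt resolution (peaks a few
# seconds after onset), normalised so convolution preserves scale
dt = 0.1
t = np.arange(0, 30, dt)
hrf = (t / 6.0) ** 8.6 * np.exp(-(t - 6.0) / 0.547)
hrf /= hrf.sum()

# Square-wave stimulus time course: on from 10 s to 20 s of a 60 s run
stim = np.zeros(600)
stim[100:200] = 1.0

# The regressor is the stimulus convolved with the HRF: delayed and blurred
regressor = np.convolve(stim, hrf)[:len(stim)]
```

The resulting regressor rises after stimulus onset rather than at it, and decays gradually after offset, building the delay and blurring of the blood flow response into the model.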
Autocorrelation: FILM prewhitening
• First, fit the GLM
• Estimate the temporal autocorrelation structure in the residuals
• The estimated structure can be inverted and used as a temporal filter to undo the autocorrelation structure in the data
– the filter is also applied to the design matrix
• Refit the GLM
– DOF n-1 will now correctly reflect what is really free to vary in the time course
• Prewhitening is selected on the Stats tab in FEAT
– it is computationally intensive, but with a modern PC it is manageable and there are almost no circumstances where you would turn this option off
Temporal filtering
• Filtering in time and/or space is a long-established method in any signal detection process to help "clean up" your signal
• The idea is that if your signal and noise are present at separable frequencies in the data, you can attenuate the noise frequencies and thus increase your signal to noise ratio
I could illustrate this by drawing a low frequency sinusoid called noise on the board, or with Matlab. Then draw a high frequency one called signal underneath. Draw a third where they are added together, and point out that the two sinusoids could be separated mathematically, even if you did not know a priori their amplitudes and frequencies. In a second example I make noise and signal have similar frequencies and show that when added together they are "inseparable". This is the key point of FMRI data analysis and a guiding principle in experimental design.
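The board demonstration described above can be reproduced numerically: two sinusoids at well-separated frequencies remain perfectly separable in the frequency domain, even though they are summed in the time domain (the frequencies and amplitudes here are arbitrary):

```python
import numpy as np

# "Signal" at 0.05 Hz plus slow "noise" drift at 0.005 Hz, sampled
# every 2 s (as if TR = 2) over a 400 s run
t = np.arange(0, 400, 2.0)
signal = np.sin(2 * np.pi * 0.05 * t)
noise = 2.0 * np.sin(2 * np.pi * 0.005 * t)
data = signal + noise

# In the frequency domain the two components occupy distinct bins
freqs = np.fft.rfftfreq(len(t), d=2.0)
power = np.abs(np.fft.rfft(data))
peaks = freqs[np.argsort(power)[-2:]]   # the two dominant frequencies
```

If the two frequencies were close together their spectral peaks would merge and no filter could pull them apart, which is why experimental designs should place the task frequency well away from the frequencies where FMRI noise lives.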