Single session analysis using FEAT
David Field
Thanks to Tom Johnstone, Jason Gledhill, and FMRIB
Single “session” or “first level” FMRI analysis
• In FMRI you begin with thousands of time courses, one for each voxel location
• Lots of "preprocessing" is applied to maximise the quality of the data
• Then, each voxel time course is modelled independently
– the model is a set of regressors (EVs) that vary over time
– the same model is applied to every voxel time course
– if (part of) the model fits the voxel time course well, the voxel is said to be "active"
[Figure: example voxel image intensity values over time]
Plan for today
1) Detailed look at the process of modelling a single voxel time course
• The input is a voxel time course of image intensity values and the design matrix, which is made up of multiple EVs
• The GLM is used to find the linear combination of EVs that explains the most variance in the voxel time course
• The output is a PE for each EV in the design matrix
2) Preprocessing of voxel time courses
• this topic probably only makes much sense if you understand 1) above, which is why I am breaking with tradition and covering this topic second rather than first
3) Implementing a single session analysis in FSL using FEAT (workshop)
• Note: there is no formal meeting in week 3 of the course, but the room will be open for you to complete worksheets from today and last week
– at least one experienced FSL user will be here to help
Preprocessing
• The BET brain extraction option refers to the 4D functional series
– this will not run BET on the structural
– normally turn this on
• Brain extraction for the structural image that will be used as the target for registration has to be performed by you, before you use FEAT to set up the processing pipeline
• On the Misc tab you can toggle balloon help
– balloon help will tell you what most of the options mean if you hover the mouse over a button or tickbox
– but it gets annoying after a while
• If progress watcher is selected then FEAT opens a web browser that shows regular updates of the stage your analysis has reached
Motion Correction: MCFLIRT
• Aims to make sure that there is a consistent mapping between voxel X,Y,Z position and actual anatomical locations
• Each image in the 4D series is registered to the reference image (by default, the series mid point) using a 6 DOF rigid body spatial transform
• This means every image apart from the reference image is moved slightly
– MCFLIRT plots the movements that were made as output
MCFLIRT output
• Head rotation
• Head translation
• Total displacement
The total displacement plot
• Relative displacement is head position at each time point relative to the previous time point
• Absolute displacement is head position at each time point relative to the reference image (by default, the volume in the middle of the time series)
• Why should you be particularly concerned about high values in the relative motion plot?
• The first thing to do is look at the range of values plotted on the y axis, because MCFLIRT auto-scales the y axis to the data range
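To make the relative motion plot concrete, here is a minimal sketch (not MCFLIRT's own code) of how a framewise displacement score can be computed from six motion parameters per volume; the column order and the 50 mm head-radius convention are assumptions you should check against your own .par files.

```python
import numpy as np

def framewise_displacement(params, radius=50.0):
    """Relative displacement between successive volumes.

    params: (T, 6) array of motion parameters per volume --
    assumed here to be three rotations in radians, then three
    translations in mm. Rotations are converted to millimetres of
    arc length on a sphere of the given radius (50 mm is a common
    approximation of head size).
    """
    diffs = np.abs(np.diff(params, axis=0))
    diffs[:, :3] *= radius          # radians -> mm of arc length
    return diffs.sum(axis=1)        # one value per volume-to-volume step

# Toy series: no motion, then a 1 mm translation in x at volume 2
params = np.zeros((4, 6))
params[2:, 3] = 1.0
fd = framewise_displacement(params)
```

A single large value in `fd` flags a sudden head jerk between two volumes, which is exactly what the relative motion plot makes visible.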
Slice timing correction
• Each slice in a functional image is acquired separately
• Acquisition is normally interleaved, which prevents blending of signal from adjacent slices
• Assuming a TR of 3 seconds, what is the time difference between the acquisition of adjacent slices?
• A single functional brain area may span two or more slices
Why are the different slice timings a problem?
• The same temporal model (design matrix) is fitted to every voxel time course
– therefore, the model assumes that all the voxel values in a single functional volume were measured simultaneously
• Given that they are not measured simultaneously, what are the implications?
• Consider two voxels from adjacent slices
– both voxels are from the same functional brain area
– this time the TR is 1.5 sec, but the slice order differed from the standard interleaved procedure, so there is a 1 sec time gap between acquisition of the two voxels in the adjacent slices that cover the functional brain area
Blue line = real HRF in response to a brief event at time 0
Blue squares: intensity values at a voxel first sampled at 0.5 sec, then every 1.5 sec thereafter (TR = 1.5)
Red circles: intensity values at a voxel from an adjacent slice first sampled at 1.5 sec, then every 1.5 sec thereafter (TR = 1.5)
These are the two voxel time courses that are submitted to the model
The model time course is yoked to the mid point of the volume acquisition (TR), so there will be a better fit for voxels in slices acquired at or near that time.
Slice timing solutions
• Any ideas based on what was covered earlier?
– Including temporal derivatives of the main regressors in the model allows the model to be shifted in time to fit voxels in slices acquired far away from the mid point of the TR
• But,
– this makes it difficult to interpret the PEs for the derivatives: do they represent slice timing, or do they represent variations in the underlying HRF?
– and, you end up having to use F contrasts instead of t contrasts (see later for why this is to be avoided if possible)
Slice timing solutions
• Any ideas based on what was covered earlier?
– Use a block design with blocks long enough that the BOLD response is summated over a long time period
• But,
– not all experimental stimuli / tasks can be presented in a block design
• So,
– another option is to shift the data from different slices in time by small amounts so that it is as if all slices were acquired at once
– this is what the preprocessing option in FEAT does
Shifting the data in time - caveats
• If the TR is 3 seconds, and you need to move the data for a given slice by, e.g., 1 sec, you don't have a real data point to use
– get round this by interpolating the missing time points, allowing whole voxel time courses to be shifted in time so that effectively they were all sampled at once at the middle of the TR
• But…
– interpolation works OK for a short TR, but it does not work well for a long TR (> 3 sec?)
– so this solution only works well when slice timing issues are relatively minor
• There is a debate about whether to do slice timing correction before or after motion correction
– FSL does motion correction first
– some people advise against any slice timing correction
– generally, if you have an event related design then use it, but make sure you check carefully with the scanner technician what order your slices were acquired in!
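The shifting-by-interpolation idea can be sketched in a few lines; this toy example uses simple linear interpolation on made-up numbers (FSL's slicetimer uses more sophisticated interpolation than this, and real edge handling is more careful):

```python
import numpy as np

# Voxel time course sampled once per TR; we want the values it "would
# have had" if this slice were acquired 1 s later than it really was.
TR = 3.0
t = np.arange(0, 30, TR)             # acquisition times for this slice
signal = np.sin(2 * np.pi * t / 30)  # stand-in for a voxel time course

shift = 1.0
# Linear interpolation between the points that were actually sampled;
# values requested beyond the last sample are clamped to the end point.
shifted = np.interp(t + shift, t, signal)
```

With a short TR the sample points are close together, so interpolation between them is accurate; with a long TR the gaps are wide and the interpolated values become unreliable, which is the caveat above.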
Temporal filtering
• Filtering in time and/or space is a long-established method in any signal detection process to help "clean up" your signal
• The idea is that if your signal and noise are present at separable frequencies in the data, you can attenuate the noise frequencies and thus increase your signal to noise ratio
[Figure: block design timeline — repeating blocks of conditions A, V, and R over time]
[Figure: raw data — force plotted against time (sec)]
[Figure: after low pass filter — force plotted against time (sec)]
[Figure: a very low frequency component remains, suggesting that a high pass filter is also needed]
[Figure: low and high frequencies removed]
Setting the high pass filter for FMRI
• The rule recommended by FSL is that the lowest setting for the high pass filter should be equal to the duration of a single cycle of the design
– in an ABCABCABC block design, it will be equal to the duration of ABC
• If you set a shorter duration than this you will remove signal that is associated with the experiment
• If you set a higher duration than this then any unsystematic variation (noise) in the data with a periodic structure lower in frequency than the experimental cycle time will remain in voxel time courses
• In complicated event related designs lacking a strongly periodic structure there is a subjective element to the setting of the high pass filter
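As a worked example of the rule, here is the arithmetic for a hypothetical ABC block design (block length is made up):

```python
# Rule of thumb from the slides: the high-pass cutoff (in seconds)
# should be no shorter than one full cycle of the design.
block_len = 30.0                    # seconds per condition block (hypothetical)
conditions = 3                      # A, B, C
cycle = block_len * conditions      # one full ABC cycle = 90 s
highpass_cutoff_s = cycle           # what you would enter in FEAT
highpass_cutoff_hz = 1.0 / cycle    # the same cutoff as a frequency
```

Anything slower than one cycle of the design (lower in frequency than `highpass_cutoff_hz`) is treated as drift and removed; anything at the design frequency or faster is kept.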
Setting the high pass filter for FMRI
• Do not use experimental designs with many conditions where the duration of a single experimental cycle is very long
– e.g. ABCDEABCDE, where ABCDE = 300 seconds
– setting the high pass to 300 sec will allow a lot of the low frequency FMRI noise to pass through to the modelling stage
– furthermore, the noise can easily become correlated with the experimental time course, because the experimental time course has a similar frequency to that of FMRI noise
– in any signal detection experiment, not just FMRI, you need to ensure that the signal of interest and the noise are present at different temporal frequencies
Low pass filter?
• As well as removing oscillations with a longer cycle time than the experiment, you can also elect to remove oscillations with a shorter cycle time than the experiment
– high frequency noise
• In theory this should enhance signal and reduce noise, and it was practised in the early days of FMRI
• However, it has now been demonstrated that because FMRI noise has temporal structure (i.e. it is not white noise), the low pass filter can actually enhance noise relative to signal
• The temporal structure in the noise is called "temporal autocorrelation" and is dealt with in FSL using FILM prewhitening instead of low pass filtering
– in another break with the traditional structure of FMRI courses, temporal autocorrelation and spatial smoothing will be covered after t and F contrasts
After preprocessing and model fitting
• You can now begin to answer the questions you set out to answer…
Which voxels "activated" in each experimental condition?
• In the auditory / visual stimulus experiment, how do you decide if a voxel was more "activated" during the visual stimulus than during the baseline?
– if the visual condition PE is > 0 then the voxel is active
– but "> 0" has to take into account the noise in the data
– we need to be confident that if you repeated the experiment many times the PE would nearly always be > 0
• How can you take the noise into account and quantify confidence that PE > 0?
– PE / residual variation in the voxel time course
– this is a t statistic, which can be converted to a p value by taking into account the degrees of freedom
– the p value is the probability of a PE as large or larger than the observed PE if the true value of the PE was 0 (null hypothesis) and the only variation present in the data was random variation
– FSL converts t statistics to z scores, simplifying interpretation because z can be converted to p without using degrees of freedom
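The PE-to-t-to-z chain can be sketched with hypothetical numbers (this illustrates the statistical logic, not FSL's implementation):

```python
from scipy import stats

# Hypothetical values at one voxel: a parameter estimate, its
# standard error, and the degrees of freedom of the fit
pe, se, dof = 2.4, 0.8, 100

t = pe / se                 # t statistic: effect size relative to noise
p = stats.t.sf(t, dof)      # one-tailed p value, needs dof
z = stats.norm.isf(p)       # equivalent z score: same p, no dof needed
```

The z score carries the same evidence as the (t, dof) pair, which is why FSL can report z statistic images without attaching degrees of freedom to every voxel.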
Why include "effects of no interest" in the model?
• Also called "nuisance regressors"
• Imagine an experiment about visual processing of faces versus other classes of object
• Why add EVs to the design matrix based on the time course of:
– head motion parameters
– image intensity spikes
– physiological variables
• galvanic skin response, heart rate, pupil diameter
• Answer: to reduce the size of the residual error term
– t will be bigger when PE is bigger, but t will also be bigger when error is smaller
– you can model residual variation that is systematic in some way, but some of the residual variation is truly random in nature, e.g. thermal noise from the scanner, and cannot be modelled out
t contrasts
• Contrast is short for Contrast of Parameter Estimates (COPE)
– it means a linear sum of PEs
– the simplest examples are implicit contrasts of individual PEs with baseline. Using the example from the interactive spreadsheet:
– visual PE * 1 + auditory PE * 0
– visual PE * 0 + auditory PE * 1
– to locate voxels where the visual PE is larger than the auditory PE:
– visual PE * 1 + auditory PE * -1
– to locate voxels where the auditory PE is larger than the visual PE:
– visual PE * -1 + auditory PE * 1
t contrasts
• The value of the contrast in each voxel is divided by an estimate of the residual variation in the voxel time course (standard error)
– produces a t statistic of the contrast
– residual variation is based on the raw time course values minus the predicted values from the fitted model
• Activation maps (red and yellow blobs superimposed on anatomical images) are produced by mapping the t value at each voxel to a colour
– crudely, thresholding is just setting a value of t below which no colour is assigned
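The contrast weights above amount to a dot product between a weight vector and the vector of PEs; the PE values here are hypothetical:

```python
import numpy as np

# Hypothetical parameter estimates at one voxel: [visual, auditory]
betas = np.array([2.0, 0.5])

contrasts = {
    "visual > baseline":   np.array([1, 0]),
    "auditory > baseline": np.array([0, 1]),
    "visual > auditory":   np.array([1, -1]),
    "auditory > visual":   np.array([-1, 1]),
}

# Each COPE is just the weighted sum of the PEs
copes = {name: float(c @ betas) for name, c in contrasts.items()}
```

Note that "visual > auditory" and "auditory > visual" give the same magnitude with opposite sign, which is why a t contrast is directional.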
What will the [1 1] contrast give you?
F contrasts
• These can be used to find voxels that are active in any one or more of a set of t contrasts, e.g. with two rows:
–  Visual  Auditory  Tactile
–    1       -1        0
–    1        0       -1
• F contrasts are bidirectional (1 -1 also implies -1 1)
– rarely a good thing in practice…
• If the response to an event was modelled with the standard HRF regressor plus its time derivative, then you can use an F contrast to view both components of the model of the response on a single activation map
– if you are relying on the time derivative to deal with slice timing then you are strongly advised to do this
Thresholding / multiple comparisons problem
Temporal autocorrelation
• The image intensity value in a voxel at time X is partially predictable from the same voxel at times X-1, X-2, X-3 etc (even in baseline scans with no experiment)
– why is this a problem?
– it makes it hard to know the number of statistically independent observations in a voxel time course
– life would be simple if the number of observations was equal to the number of functional volumes
– temporal autocorrelation means the true N is lower than this
• Why do we need to know the number of independent observations?
– because calculating a t statistic requires dividing by the standard error, and the standard error is SD / √N
– degrees of freedom are needed to convert t stats to p values
• If you use the number of time points in the voxel time course as N then p values will be too small (false positives)
Measuring autocorrelation
Measuring autocorrelation (SPSS style)
Plot the correlation against the degree of shift
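Plotting correlation against degree of shift amounts to computing lagged correlations of the series with itself; a sketch using a simulated autocorrelated series (the AR(1) generating process and its 0.6 coefficient are arbitrary choices for illustration):

```python
import numpy as np

def autocorr(x, max_lag):
    """Correlation of a series with progressively shifted copies of itself."""
    x = np.asarray(x, float)
    return [np.corrcoef(x[:-k], x[k:])[0, 1] for k in range(1, max_lag + 1)]

# Simulate a series where each value is partly predictable from the last
rng = np.random.default_rng(0)
x = np.zeros(500)
for i in range(1, 500):
    x[i] = 0.6 * x[i - 1] + rng.standard_normal()

r = autocorr(x, 5)   # correlation at lags 1..5
```

For an autocorrelated series the correlation is high at short lags and decays as the shift grows, which is the shape the lag plot reveals.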
Temporal autocorrelation
• Generally, the value of a voxel at time t is partially predictable from nearby time points about 3-6 seconds in the past
– so, if you use a very long TR, e.g. 6 sec, then you mostly avoid the problem, as the original time points will have sufficient independence from each other
• Voxel values are also predictable from more distant time points due to low frequency noise with periodic structure
– but the high pass filter should deal with this problem
Autocorrelation: FILM prewhitening
• First, fit the model (regressors of interest and no interest) to the voxel time course using the GLM
– (ignoring the autocorrelation for the moment)
• Estimate the temporal autocorrelation structure in the residuals
– note: if the model is good, residuals = noise
• The estimated structure can be inverted and used as a temporal filter to undo the autocorrelation structure in the original data
– the time points are now independent and so N = the number of time points (volumes)
– the filter is also applied to the design matrix
• Refit the GLM
– run t and F tests with valid standard error and degrees of freedom
• Prewhitening is selected on the Stats tab in FEAT
– it is computationally intensive, but with a modern PC it is manageable, and there are almost no circumstances where you would turn this option off
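The core of the idea can be sketched with the simplest possible autocorrelation model, AR(1); FILM itself estimates a richer autocorrelation structure than this, so treat the sketch as an illustration of the filter-data-and-design step only (all numbers are made up):

```python
import numpy as np

def prewhiten_ar1(y, X, phi):
    """Undo AR(1) autocorrelation with a first-difference-style filter.

    y: voxel time course, X: design matrix, phi: estimated lag-1
    autocorrelation of the residuals. The same filter is applied to
    both the data and the design matrix, as in FEAT.
    """
    y_w = y[1:] - phi * y[:-1]
    X_w = X[1:] - phi * X[:-1]
    return y_w, X_w

# Simulated AR(1) noise plus a boxcar regressor with true PE = 2.0
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.tile([1.0] * 10 + [0.0] * 10, n // 20), np.ones(n)])
noise = np.zeros(n)
for i in range(1, n):
    noise[i] = 0.5 * noise[i - 1] + rng.standard_normal()
y = 2.0 * X[:, 0] + noise

# Filter data and design, then refit the GLM on the whitened versions
y_w, X_w = prewhiten_ar1(y, X, 0.5)
beta, *_ = np.linalg.lstsq(X_w, y_w, rcond=None)
```

After whitening, the residuals are (approximately) independent, so the standard error and degrees of freedom used in the t test are valid.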
Spatial smoothing
• FMRI noise varies across space as well as time
– smoothing is a way of reducing spatial noise and thereby increasing the signal to noise ratio (SNR) in the data
• Unlike FMRI temporal noise, FMRI spatial noise is more like white noise, making it easier to deal with
– it is essentially random, essentially independent from voxel to voxel, and has a mean of about zero
– therefore if you average image intensity across several voxels, the noise tends to average towards zero, whereas signal that is common to the voxels you are averaging across will remain unchanged, dramatically improving the SNR
• A secondary benefit of smoothing is to reduce anatomical variation between participants that remains after registration to the template image
– this is because smoothing blurs the images
Spatial smoothing: FWHM
• FSL asks you to specify a Gaussian smoothing kernel defined by its Full Width at Half Maximum (FWHM) in mm
• To find the FWHM of a Gaussian:
– find the point on the y axis where the function attains half its maximum value
– then read off the corresponding x axis values; the FWHM is the distance between them
Spatial smoothing: FWHM
• The Gaussian is centred on a voxel, and the value of that voxel is averaged with the values of adjacent voxels that fall under the Gaussian
• The averaging is weighted by the y axis value of the Gaussian at the appropriate distance
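The FWHM-to-standard-deviation conversion follows directly from the definition of the Gaussian, and the weighted-averaging step can be demonstrated with scipy's 1D Gaussian filter (purely for illustration; this is not FSL's smoothing code):

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def fwhm_to_sigma(fwhm):
    """Convert FWHM (what FSL asks for) to the Gaussian's standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

sigma = fwhm_to_sigma(6.0)       # a 6 mm kernel, in mm

# Smoothing a single bright voxel spreads its value over its neighbours,
# weighted by the height of the Gaussian at each distance
row = np.zeros(21)
row[10] = 1.0
smoothed = gaussian_filter1d(row, sigma)
```

At a distance of FWHM/2 from the centre, the Gaussian is at exactly half its peak height, which is what "full width at half maximum" means.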
[Figure: the same image with no smoothing, 4 mm FWHM, and 9 mm FWHM]
Effects of smoothing on activations
[Figure: unsmoothed data vs smoothed data (kernel width 5 voxels)]
When should you smooth? When should you not?
• Smoothing is a good idea if:
– you're not particularly concerned with voxel-by-voxel resolution
– you're not particularly concerned with finding small (less than a handful of voxels) clusters
– you want (or need) to improve your signal-to-noise ratio
– you're averaging results over a group, in a brain region where functional anatomy and organization isn't precisely known
– you want to use p-values corrected for multiple comparisons with Gaussian field theory (as opposed to False Discovery Rate)
• this is the "Voxel" option in FSL and the "FWE" option in SPM
• The smoothing kernel should be small (or no smoothing used) if:
– you need voxel-by-voxel resolution
– you believe your activations of interest will only be a few voxels in size
– you're confident your task will generate large amounts of signal relative to noise
– you're working primarily with single-subject results
– you're mainly interested in getting region-of-interest data from very specific structures that you've drawn with high resolution on single subjects
How do you determine the size of the kernel?
• Based on functional voxel size? Or brain structure size?– A little of both, it seems.
• The matched filter theorem, from the signal processing field, tells us that if we're trying to recover a signal (like an activation) in noisy data (like FMRI), we can best do it by smoothing our data with a kernel that's about the same size as our activation.
• Trouble is, though, most of us don't know how big our activations are going to be before we run our experiment
• Even if you have a particular structure of interest (say, the hippocampus), you may not get activation over the whole region - only a part
• A lot of people set FWHM to functional voxel size * 2
Old slides beyond this point
Slice timing correction
• Each functional volume that forms part of the 4D time series is made up of slices
• Each slice is acquired at a different point in time relative to the start of the TR
– e.g., slice 1 at 100 msec, slice 2 at 200 msec, etc
• For each slice, it's the same time point relative to the start of the TR in every volume
• So, the interval between successive acquisitions is constant for every voxel
• But the actual time of acquisition is different for every slice
• The model of the time course assumes that within each volume, every slice was acquired simultaneously at the mid point of the TR
– so, the model is likely to fit better for one slice than all the others (bad)
• To use slice timing correction, you will need to tell FSL the order your slices were acquired in
– interleaved is the most common, but ask your scanner technician!
– adjustment is to the middle of the TR period
Slice timing correction
• For each voxel, slice-timing correction examines the time course and shifts it by a small amount
• This requires interpolating between the time points you actually sampled, to infer a more detailed version of the time course
• The more detailed time course can have small shifts applied to it that are slightly different for each voxel, depending on the actual order the slices were acquired in
• This allows you to make the assumption in your modelling that every voxel in each volume was acquired simultaneously
Slice timing correction
• The problem this tries to solve is more severe if you have a longer TR (e.g. 4 seconds)
– two adjacent slices in an interleaved sequence could be sampled almost 2 seconds apart
• But temporal interpolation also becomes dodgy with longer TRs
• For block designs (stimuli that are long relative to the TR, e.g. TR = 2 sec, stimulus lasts 16 sec), slice timing errors are not a significant factor influencing the fitting of a model to the data
• For event related designs (brief stimuli separated by variable pauses), slice timing correction is important
• People argue about whether to do slice timing correction before or after motion correction
– FSL does motion correction first
– some people advise against any slice timing correction
Temporal derivatives
• In the FEAT practical you will add temporal derivatives of the HRF-convolved experimental time courses to the design matrix
– what is the purpose of this?
• Each experimental time course is convolved with a model of the HRF
– this is to build the delay and blurring of the blood flow response relative to the neural response into the model
– but the delay varies between brain areas and between people
Temporal derivatives
• The green line is the first temporal derivative of the blue line
– its rate of change
– the positive max of the derivative is earlier than the normal HRF peak
– the negative max of the derivative is later than the normal HRF peak
• If fitting the model results in a positive beta weight on a derivative, this implies that the HRF peak is earlier in that voxel
• A negative beta weight for the derivative implies a later peak than "typical"
Temporal derivatives
• The basic HRF shape (blue on the previous slide) has some physiological underpinning (in visual cortex…)
• But the use of the derivative to model faster / slower responses is just a mathematical convenience
• The second temporal derivative (dispersion in time) can be used to model haemodynamic responses that are "thinner" or "fatter" in time than the basic shape
• The three functions together are sometimes called the "informed basis set" by SPM users
– the blue line is referred to as "canonical", but in fact it is only canonical for primary visual cortex
• The informed basis set can only model slight departures from the canonical response shape
• If you are interested in the prefrontal cortex of the elderly you'll need to use a more flexible basis set to model the temporal dynamics of the response
– or use a block design where timing issues are less severe
Cluster size based thresholding
• Intuitively, if a voxel with a Z statistic of 1.96 for a particular COPE is surrounded by other voxels with very low Z values, this looks suspicious
– unless you are looking for a very small brain area
• Now consider a voxel with a Z statistic of 1.96 that is surrounded by many other voxels with similar Z values, forming a large blob
• Intuitively, for such a voxel the Z of 1.96 (p = 0.05) overestimates the probability that the model fit to this voxel is a result of random, stimulus unrelated, fluctuation in the time course
• The p value we want to calculate is the probability of obtaining one or more clusters of this size or larger under a suitable null hypothesis
– "one or more" gives us control over the multiple comparisons problem by setting the family wise error rate
– the p value will be low for big clusters
– the p value will be high for small clusters
Comparison of voxel ("height based") thresholding and cluster thresholding
[Figure: with voxel thresholding, a height threshold (e.g. p = 0.001 applied voxelwise, roughly Z = 3) divides the statistic image into significant and non-significant voxels]
[Figure: with cluster thresholding, contiguous blobs above the height threshold are significant only if they contain k or more voxels]
• K is the probability of the image containing 1 or more blobs with k or more voxels (and you can control it at 0.05)
• The cluster size, in voxels, that corresponds to a particular value of K depends upon the initial value of the height threshold used to define the number of clusters in the image and their size
• It is usual to set the height threshold quite low when using cluster level thresholding, but this arbitrary choice will influence the outcome
[Figure: dependency of the number of clusters on the choice of height threshold]
• The number and size of clusters also depends upon the amount of smoothing that took place in preprocessing
• The Nyquist frequency is important to know about
– it is half the sampling rate (e.g. a TR of 2 sec is a sampling rate of 0.5 Hz, so Nyquist is 0.25 Hz, i.e. a 4 second cycle)
– no signal at a higher frequency than Nyquist can be present in the data (important for experimental design)
– but such signal could appear as an aliasing artefact at a lower frequency
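The Nyquist arithmetic for the TR = 2 sec example:

```python
TR = 2.0                            # seconds per volume
sampling_rate = 1.0 / TR            # 0.5 Hz
nyquist_hz = sampling_rate / 2.0    # 0.25 Hz
shortest_period = 1.0 / nyquist_hz  # 4 s: the fastest cycle the data can carry
```

Any experimental manipulation faster than `shortest_period` cannot be represented in the sampled data, only aliased into a lower frequency.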
Overview
• Today's practical session will cover processing of a single functional session from a single participant using FEAT
– FEAT is an umbrella program that brings together various other FSL programs into a customisable processing pipeline
• for example, it makes use of BET and FLIRT, which were programs covered in week 1
– definitions of "single session" and "multi session"
• We will also make use of an interactive spreadsheet that demonstrates how the general linear model (GLM) can be used to locate active regions of the brain given your predictions about the time course of activation
• The lecture will provide theoretical background for each processing step
• There is no formal meeting in week 3 of the course, but the room will be open for you to complete worksheets from today and last week
– at least one experienced FSL user will be here to help
Overview of single session FMRI
• The data is a 4D functional time series
– many thousands of spatial locations (voxels)
– each voxel has a time course defined by a single intensity value per TR (= per volume acquired)
• The task is to model the changes in image intensity over time separately for each voxel
– "mass univariate approach"
– begin with a set of regressors ("design matrix" / model)
– regressors usually reflect the time course of experimental conditions
– find the best linear combination of regressors to explain each voxel time course (basically, multiple regression)
• Before modelling the 4D time series a number of preprocessing steps are applied to the data
– remove unwanted sources of variation from the time series
– increase the signal to noise ratio
Voxel-wise single session modelling
• After the data has been optimised by preprocessing, you search for voxels where the time course of image intensity changes is correlated with the experimental time course
– activation
• This is achieved using the General Linear Model (GLM)
– similar to multiple regression
• The input to the GLM is the data, plus a set of explanatory variables called the "Design Matrix"
– sometimes EVs are included to model sources of variation that are of no interest to the experimenter
– this is to reduce the residual (error) variance
• The GLM is fitted independently for each voxel time course
– ignores the spatial structure in the brain
Regressor = 1 (stimulus on)
Regressor = 0 (stimulus off)
The voxel time courses are standardised so that beta weights are comparable between voxels
Experimental time course regressors are no longer square waves because they have been convolved with the HRF model
If there is structure in the residual time courses, something important has not been modelled
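The convolution that turns a square-wave experimental time course into a smooth regressor can be sketched as follows; the gamma variate used here is a common textbook approximation, not FSL's default HRF:

```python
import numpy as np

# A gamma-variate HRF model sampled at dt resolution (peaks a few
# seconds after onset), normalised so convolution preserves scale
dt = 0.1
t = np.arange(0, 30, dt)
hrf = (t / 6.0) ** 8.6 * np.exp(-(t - 6.0) / 0.547)
hrf /= hrf.sum()

# Square-wave stimulus time course: on from 10 s to 20 s of a 60 s run
stim = np.zeros(600)
stim[100:200] = 1.0

# The regressor is the stimulus convolved with the HRF: delayed and blurred
regressor = np.convolve(stim, hrf)[:len(stim)]
```

The resulting regressor rises after stimulus onset rather than at it, and decays gradually after offset, building the delay and blurring of the blood flow response into the model.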
Autocorrelation: FILM prewhitening
• First, fit the GLM
• Estimate the temporal autocorrelation structure in the residuals
• The estimated structure can be inverted and used as a temporal filter to undo the autocorrelation structure in the data
– the filter is also applied to the design matrix
• Refit the GLM
– DOF n-1 will now correctly reflect what is really free to vary in the time course
• Prewhitening is selected on the Stats tab in FEAT
– it is computationally intensive, but with a modern PC it is manageable and there are almost no circumstances where you would turn this option off
Temporal filtering
• Filtering in time and/or space is a long-established method in any signal detection process to help "clean up" your signal
• The idea is that if your signal and noise are present at separable frequencies in the data, you can attenuate the noise frequencies and thus increase your signal to noise ratio
I could illustrate this by drawing a low frequency sinusoid called noise on the board, or with Matlab. Then draw a high frequency one called signal underneath. Draw a third where they are added together, and point out that the two sinusoids could be separated mathematically, even if you did not know a priori their amplitudes and frequencies. In a second example I make noise and signal have similar frequencies and show that when added together they are "inseparable". This is the key point of FMRI data analysis and a guiding principle in experimental design.
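The board demonstration described above can be reproduced numerically: two sinusoids at well-separated frequencies remain perfectly separable in the frequency domain, even though they are summed in the time domain (the frequencies and amplitudes here are arbitrary):

```python
import numpy as np

# "Signal" at 0.05 Hz plus slow "noise" drift at 0.005 Hz, sampled
# every 2 s (as if TR = 2) over a 400 s run
t = np.arange(0, 400, 2.0)
signal = np.sin(2 * np.pi * 0.05 * t)
noise = 2.0 * np.sin(2 * np.pi * 0.005 * t)
data = signal + noise

# In the frequency domain the two components occupy distinct bins
freqs = np.fft.rfftfreq(len(t), d=2.0)
power = np.abs(np.fft.rfft(data))
peaks = freqs[np.argsort(power)[-2:]]   # the two dominant frequencies
```

If the two frequencies were close together their spectral peaks would merge and no filter could pull them apart, which is why experimental designs should place the task frequency well away from the frequencies where FMRI noise lives.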