Classical (frequentist) inference


Page 1: Classical (frequentist) inference

Classical (frequentist) inference

Methods & models for fMRI data analysis in neuroeconomics, November 2010

Klaas Enno Stephan
Laboratory for Social and Neural System Research, Institute for Empirical Research in Economics, University of Zurich
Functional Imaging Laboratory (FIL), Wellcome Trust Centre for Neuroimaging, University College London

With many thanks for slides & images to: the FIL Methods group

Page 2: Classical (frequentist) inference

Overview of SPM

[Figure: the SPM analysis pipeline — image time-series → realignment → smoothing (kernel) → normalisation (template) → general linear model (design matrix) → parameter estimates → statistical parametric map (SPM) → statistical inference via Gaussian field theory, p < 0.05]

Page 3: Classical (frequentist) inference

Voxel-wise time series analysis

[Figure: the BOLD signal of a single voxel is extracted as a time series; model specification and parameter estimation feed a hypothesis test, whose statistic is assembled across voxels into an SPM]

Page 4: Classical (frequentist) inference

Overview

• A recap of model specification and parameter estimation
• Hypothesis testing
• Contrasts and estimability
• T-tests
• F-tests
• Design orthogonality
• Design efficiency

Page 5: Classical (frequentist) inference

Mass-univariate analysis: voxel-wise GLM

$y = X\beta + e$, with $e \sim N(0, \sigma^2 I_N)$

$y$: $N \times 1$ data vector; $X$: $N \times p$ design matrix; $\beta$: $p \times 1$ parameter vector; $e$: $N \times 1$ error vector
N: number of scans, p: number of regressors

The model is specified by
1. the design matrix X
2. assumptions about e

The design matrix embodies all available knowledge about experimentally controlled factors and potential confounds.
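
Below is a minimal numerical sketch of this model, assuming an invented design (one boxcar regressor plus a constant) and invented effect sizes; it is an illustration, not SPM code.

```python
# Simulate a single-voxel time series under the GLM: y = X*beta + e.
import numpy as np

rng = np.random.default_rng(0)

N = 80                                       # number of scans
boxcar = np.tile([1] * 10 + [0] * 10, 4)     # on/off blocks of 10 scans each
X = np.column_stack([boxcar, np.ones(N)])    # design matrix: [task, constant]
p = X.shape[1]                               # number of regressors

beta_true = np.array([2.0, 100.0])           # task effect and baseline (invented)
e = rng.normal(0.0, 1.0, size=N)             # i.i.d. errors, e ~ N(0, sigma^2 I_N)
y = X @ beta_true + e                        # observed time series
```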

Page 6: Classical (frequentist) inference

Parameter estimation

$y = X\beta + e$

Objective: estimate the parameters $\beta$ so as to minimize $\sum_{t=1}^{N} e_t^2$.

Ordinary least squares (OLS) estimation (assuming i.i.d. errors):

$\hat{\beta} = (X^T X)^{-1} X^T y$

Page 7: Classical (frequentist) inference

OLS parameter estimation

The ordinary least squares (OLS) estimators are:

$\hat{\beta} = (X^T X)^{-1} X^T y$

These estimators minimise $\sum_t e_t^2 = e^T e$. They are found by solving either $\frac{\partial (e^T e)}{\partial \beta} = 0$ or $X^T e = 0$.

Under i.i.d. assumptions, $e \sim N(0, \sigma^2 I)$ and $Y \sim N(X\beta, \sigma^2 I)$, and the OLS estimates correspond to the ML estimates:

$\hat{\beta} \sim N(\beta, \sigma^2 (X^T X)^{-1})$, with $\hat{\sigma}^2 = \dfrac{\hat{e}^T \hat{e}}{N - p}$

NB: the precision of our estimates depends on the design matrix!
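
Continuing the sketch above, the OLS estimates and the unbiased variance estimate follow directly:

```python
# OLS: solve the normal equations X'X beta = X'y (X assumed full rank).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)     # = (X'X)^{-1} X'y

e_hat = y - X @ beta_hat                         # residuals
sigma2_hat = (e_hat @ e_hat) / (N - p)           # sigma^2 estimate: e'e / (N - p)

# Cov(beta_hat) = sigma^2 (X'X)^{-1}: the precision of the estimates
# depends on the design matrix.
cov_beta_hat = sigma2_hat * np.linalg.inv(X.T @ X)
```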

Page 8: Classical (frequentist) inference

Maximum likelihood (ML) estimation

$f_y(y|\theta)$: probability density function ($\theta$ fixed!)
$L(\theta|y) = f_y(y|\theta)$: likelihood function ($y$ fixed!)
$\hat{\theta} = \arg\max_\theta L(\theta|y)$: ML estimator

For $\mathrm{cov}(e) = \sigma^2 V$, the ML estimator is equivalent to a weighted least squares (WLS) estimate, i.e. OLS applied to the whitened model $Wy = WX\beta + We$ with $W = V^{-1/2}$:

$\hat{\beta}_{WLS} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$

For $\mathrm{cov}(e) = \sigma^2 I$, the ML estimator is equivalent to the OLS estimator:

$\hat{\beta}_{OLS} = (X^T X)^{-1} X^T y$
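
A sketch of the whitening idea: here V is simply assumed to be an AR(1)-like correlation matrix for illustration, whereas SPM estimates V from the data via ReML (next slide).

```python
# WLS as OLS on whitened data: Wy = WX beta + We, with W = V^{-1/2}.
from scipy.linalg import sqrtm

rho = 0.3
lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
V = rho ** lags                                   # assumed V_ij = rho^|i-j|
W = np.linalg.inv(np.real(sqrtm(V)))              # whitening matrix W = V^{-1/2}

Xw, yw = W @ X, W @ y
beta_wls = np.linalg.solve(Xw.T @ Xw, Xw.T @ yw)  # ML/GLS estimate for cov = s^2 V
```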

Page 9: Classical (frequentist) inference

SPM: t-statistic based on ML estimates

Whitened model: $Wy = WX\beta + We$, with $W = V^{-1/2}$ and $V = \mathrm{Cov}(e)$

$V$ is estimated from the data as a linear combination of covariance components, $V = \sum_i \lambda_i Q_i$ (ReML estimates).

$\hat{\beta} = (WX)^{+} Wy$ (for brevity: $(WX)^{+} = ((WX)^T WX)^{-1} (WX)^T$)

$t = \dfrac{c^T \hat{\beta}}{\hat{sd}(c^T \hat{\beta})}$, e.g. for $c^T = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0\ 0]$

$\hat{sd}(c^T \hat{\beta}) = \sqrt{\hat{\sigma}^2 \, c^T (WX)^{+} ((WX)^{+})^T c}$

$\hat{\sigma}^2 = \dfrac{(Wy - WX\hat{\beta})^T (Wy - WX\hat{\beta})}{\mathrm{tr}(R)}$, with $R = I - WX(WX)^{+}$

Page 10: Classical (frequentist) inference

Statistic

• A statistic is the result of applying a mathematical function to a sample (a set of data).

• More formally, statistical theory defines a statistic as a function of a sample where the function itself is independent of the sample's distribution. The term is used both for the function and for the value of the function on a given sample.

• A statistic is distinct from an unknown statistical parameter, which is a population property and can only be estimated approximately from a sample.

• A statistic used to estimate a parameter is called an estimator. For example, the sample mean is a statistic and an estimator for the population mean, which is a parameter.

Page 11: Classical (frequentist) inference

Hypothesis testing

• “Null hypothesis” H0 = “there is no effect”, $c^T \beta = 0$. This is what we want to disprove. The “alternative hypothesis” H1 represents the outcome of interest.

• To test a hypothesis, we construct a “test statistic” T. The test statistic summarises the evidence for H0. Typically, the test statistic is small in magnitude when H0 is true and large when H0 is false. We need to know the distribution of T under the null hypothesis (the null distribution of T).

Page 12: Classical (frequentist) inference

Hypothesis testing

• Observation of a test statistic t, a realisation of T: a p-value summarises the evidence against H0. It is the probability of observing t, or a more extreme value, under the null hypothesis:

$p = P(T \geq t \mid H_0)$

• Type I error α: the acceptable false positive rate. The threshold $u_\alpha$ controls the false positive rate:

$\alpha = P(T > u_\alpha \mid H_0)$

• The conclusion about the hypothesis: we reject H0 in favour of H1 if $t > u_\alpha$.
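
As a small illustration (the observed statistic is invented; N and p come from the earlier sketch, and the null distribution is taken to be Student's t):

```python
# From test statistic to p-value and decision threshold.
from scipy import stats

t_obs = 3.1                              # illustrative observed statistic
df = N - p
p_value = stats.t.sf(t_obs, df)          # p = P(T >= t | H0), one-sided
u_alpha = stats.t.isf(0.05, df)          # alpha = P(T > u_alpha | H0) = 0.05
reject_H0 = t_obs > u_alpha
```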

Page 13: Classical (frequentist) inference

Types of error (test result × actual condition):

• Reject H0, H0 true → false positive (FP): Type I error α
• Reject H0, H0 false → true positive (TP)
• Failure to reject H0, H0 true → true negative (TN)
• Failure to reject H0, H0 false → false negative (FN): Type II error β

specificity: 1 − α = TN / (TN + FP) = proportion of actual negatives which are correctly identified

sensitivity (power): 1 − β = TP / (TP + FN) = proportion of actual positives which are correctly identified

Page 14: Classical (frequentist) inference

One cannot accept the null hypothesis(one can just fail to reject it)

Absence of evidence is not evidence of absence! If we do not reject H0, then all we can say is that there is not enough evidence in the data to reject H0. This does not mean that we can accept H0.

What does this mean for neuroimaging results based on classical statistics? A failure to find an “activation” in a particular area does not mean we can conclude that this area is not involved in the process of interest.

Page 15: Classical (frequentist) inference

Contrasts

• A contrast $c^T \beta$ selects a specific effect of interest: a contrast vector c is a vector of length p, and $c^T \beta$ is a linear combination of the regression coefficients. We are usually not interested in the whole parameter vector.

$c^T \beta = 1 \cdot \beta_1 + 0 \cdot \beta_2 + 0 \cdot \beta_3 + \ldots$ for $c^T = [1\ 0\ 0\ 0\ 0\ \ldots]$

$c^T \beta = 0 \cdot \beta_1 - 1 \cdot \beta_2 + 1 \cdot \beta_3 + 0 \cdot \beta_4 + \ldots$ for $c^T = [0\ -1\ 1\ 0\ 0\ \ldots]$

• Under i.i.d. assumptions:

$c^T \hat{\beta} \sim N(c^T \beta, \ \sigma^2 c^T (X^T X)^{-1} c)$

NB: the precision of our estimates depends on the design matrix and the chosen contrast!

Page 16: Classical (frequentist) inference

Estimability of parameters

• If X is not of full rank, then different parameters can give identical predictions, i.e. $X\beta_1 = X\beta_2$ with $\beta_1 \neq \beta_2$.

• The parameters are therefore ‘non-unique’, ‘non-identifiable’ or ‘non-estimable’.

• For such models, $X^T X$ is not invertible, so we must resort to generalised inverses (SPM uses the Moore-Penrose pseudo-inverse).

• This gives a parameter vector that has the smallest norm of all possible solutions.

• However, even when parameters are non-estimable, certain contrasts may well be!

Example: one-way ANOVA (unpaired two-sample t-test), with columns for factor 1, factor 2 and the mean:

$X = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}$, with Rank(X) = 2

[Figure: design matrix and parameter-estimability bar; parameters shown in gray are not uniquely specified]

Page 17: Classical (frequentist) inference

Estimability of contrasts

• Linear dependency: there is (at least) one contrast vector q for which $Xq = 0$.

• Thus $y = X\beta + Xq + e = X(\beta + q) + e$, so if we test $c^T \beta$ we also test $c^T(\beta + q)$; an estimable contrast therefore has to satisfy $c^T q = 0$.

• In the above example, any contrast that is orthogonal to $[1\ 1\ -1]$ is estimable:

[1 0 0], [0 1 0], [0 0 1] are not estimable.
[1 0 1], [0 1 1], [1 -1 0], [0.5 0.5 1] are estimable.
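
A small self-contained check of these statements for the rank-deficient ANOVA design above, using contrasts listed on this slide:

```python
# A contrast c is estimable iff c'q = 0 for the null-space vector q = [1, 1, -1].
import numpy as np

X_anova = np.array([[1, 0, 1]] * 4 + [[0, 1, 1]] * 4, dtype=float)
print(np.linalg.matrix_rank(X_anova))        # 2 < p = 3: rank deficient

q = np.array([1.0, 1.0, -1.0])
assert np.allclose(X_anova @ q, 0)           # q spans the null space of X

for c in ([1, 0, 0], [1, -1, 0], [0.5, 0.5, 1]):
    estimable = np.isclose(np.dot(c, q), 0.0)
    print(c, "estimable" if estimable else "not estimable")
```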

Page 18: Classical (frequentist) inference

Student's t-distribution

• first described by William Sealy Gosset, a statistician at the Guinness brewery in Dublin
• the t-statistic is a signal-to-noise measure: t = effect / standard deviation
• the t-distribution accounts for the uncertainty of the estimated variance in small samples; as the degrees of freedom grow, it approaches the normal distribution
• t-contrasts are simply linear combinations of the betas; the t-statistic does not depend on the scaling of the regressors or on the scaling of the contrast
• unilateral (one-sided) test: $H_0: c^T \beta = 0$ vs. $H_1: c^T \beta > 0$

[Figure: probability density function of Student’s t-distribution for ν = 1, 2, 5, 10, ∞]
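
A quick numerical illustration (values invented) of the t-distribution approaching the normal as the degrees of freedom ν grow:

```python
# The t pdf at zero approaches the normal pdf at zero as df grows.
from scipy import stats

for nu in (1, 2, 5, 10, 1000):
    print(nu, stats.t.pdf(0.0, nu))
print("normal:", stats.norm.pdf(0.0))    # limiting case, nu -> infinity
```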

Page 19: Classical (frequentist) inference

t-contrasts – SPM{t}

Question: is the box-car amplitude > 0, i.e. $\beta_1 > 0$? ($H_1: c^T \beta > 0$, with $c^T = [1\ 0\ 0\ 0\ 0\ 0\ 0\ 0]$)

Null hypothesis: $H_0: c^T \beta = 0$

Test statistic:

$t = \dfrac{c^T \hat{\beta}}{\hat{sd}(c^T \hat{\beta})} = \dfrac{c^T \hat{\beta}}{\sqrt{\hat{\sigma}^2 c^T (X^T X)^{-1} c}} \sim t_{N-p}$

i.e. t = contrast of estimated parameters / variance estimate, with p-value $p(t \mid c^T \beta = 0)$.
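
Continuing the earlier OLS sketch, the t-statistic for the contrast $c^T = [1\ 0]$ (the task regressor) and its one-sided p-value:

```python
# t = c'beta_hat / sd(c'beta_hat), with sd from sigma2_hat and (X'X)^{-1}.
c = np.array([1.0, 0.0])

con = c @ beta_hat                                        # contrast of estimates
sd_con = np.sqrt(sigma2_hat * (c @ np.linalg.inv(X.T @ X) @ c))
t_stat = con / sd_con                                     # ~ t_{N-p} under H0
p_val = stats.t.sf(t_stat, N - p)                         # one-sided p-value
```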

Page 20: Classical (frequentist) inference

t-contrasts in SPM

For a given contrast c, SPM computes and stores:

• beta_???? images: $\hat{\beta} = (X^T X)^{-1} X^T y$
• ResMS image: $\hat{\sigma}^2 = \dfrac{\hat{e}^T \hat{e}}{N - p}$
• con_???? image: $c^T \hat{\beta}$
• spmT_???? image: SPM{t}

Page 21: Classical (frequentist) inference

t-contrast: a simple example

Q: activation during listening? (Passive word listening versus rest.)

$c^T = [1\ 0]$; null hypothesis: $\beta_1 = 0$

$t = \dfrac{c^T \hat{\beta}}{\hat{sd}(c^T \hat{\beta})}$

SPM results: height threshold T = 3.2057 {p < 0.001}; statistics: p-values adjusted for search volume.

[Table: set-, cluster- and voxel-level statistics; the strongest effects are two large auditory clusters, e.g. kE = 520 with peak T = 13.94 (Z = Inf) at (−63, −27, 15) mm and kE = 426 with peak T = 13.72 (Z = Inf) at (57, −21, 12) mm, plus several smaller clusters]

[Figure: design matrix X with the contrast weights plotted on top, and the resulting p-value $p(t \mid c^T \beta = 0)$]

Page 22: Classical (frequentist) inference

F-test: the extra-sum-of-squares principle

Model comparison: full model ($X_0 + X_1$) or reduced model ($X_0$)?

Null hypothesis H0: the true model is $X_0$ (the reduced model).

The residual sum of squares of the full model, RSS ($\hat{\sigma}^2_{full}$), is compared with that of the reduced model, RSS0 ($\hat{\sigma}^2_{reduced}$):

$F = \dfrac{(RSS_0 - RSS)/n_1}{RSS/n_2} = \dfrac{ESS/n_1}{RSS/n_2} \sim F_{n_1, n_2}$

with $n_1 = \mathrm{rank}(X) - \mathrm{rank}(X_0)$ and $n_2 = N - \mathrm{rank}(X)$.

The F-statistic compares the variance the full model explains over and above the reduced model (ESS = RSS0 − RSS) with the variance left unexplained by the full model (RSS).
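
A sketch of this principle on the simulated data from before, taking the constant-only model as the reduced model X0 (an assumption made here for illustration):

```python
# Extra-sum-of-squares F-test: full model X vs. reduced model X0.
def rss(design, data):
    """Residual sum of squares of a least-squares fit."""
    b, *_ = np.linalg.lstsq(design, data, rcond=None)
    r = data - design @ b
    return r @ r

X0 = X[:, [1]]                                   # reduced model: constant only
RSS0, RSS = rss(X0, y), rss(X, y)

n1 = np.linalg.matrix_rank(X) - np.linalg.matrix_rank(X0)
n2 = N - np.linalg.matrix_rank(X)
F = ((RSS0 - RSS) / n1) / (RSS / n2)             # ~ F_{n1, n2} under H0
p_F = stats.f.sf(F, n1, n2)
```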

Page 23: Classical (frequentist) inference

F-test: multidimensional contrasts – SPM{F}

Tests multiple linear hypotheses, here $H_0: \beta_3 = \beta_4 = \ldots = \beta_8 = 0$, i.e. H0: the true model is $X_0$:

$c^T = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}$

Full model ($X_0$ plus the regressors of interest $X_1$) or reduced model ($X_0$ only)? Testing $H_0: c^T \beta = 0$ yields SPM{F6,322}.

Page 24: Classical (frequentist) inference

F-test: a few remarks

• F-tests can be viewed as testing for the additional variance explained by a larger model w.r.t. a simpler (nested) model → model comparison.

• F-tests are not directional: when testing a uni-dimensional contrast with an F-test, for example $\beta_1 - \beta_2$, the result will be the same as when testing $\beta_2 - \beta_1$.

• Hypotheses:
Null hypothesis $H_0: \beta_1 = \beta_2 = \ldots = \beta_p = 0$
Alternative hypothesis $H_1$: at least one $\beta_k \neq 0$

Page 25: Classical (frequentist) inference

F-contrast in SPM

SPM computes and stores:

• beta_???? images: $\hat{\beta} = (X^T X)^{-1} X^T y$
• ResMS image: $\hat{\sigma}^2 = \dfrac{\hat{e}^T \hat{e}}{N - p}$
• ess_???? images: the extra sum of squares, $RSS_0 - RSS$
• spmF_???? images: SPM{F}

Page 26: Classical (frequentist) inference

F-test example: movement related effects

To assess movement-related activation:

There is a lot of residual movement-related artifact in the data (despite spatial realignment), which tends to be concentrated near the boundaries of tissue types.

By including the realignment parameters in our design matrix, we can “regress out” linear components of subject movement, reducing the residual error, and hence improve our statistics for the effects of interest.

Page 27: Classical (frequentist) inference

Example: a suboptimal model

[Figure panels: true signal (blue, peak at 3 s) and observed signal; model (green, peak at 6 s); fit ($\beta_1 = 0.2$, mean $= 0.11$) — the test for the green regressor is not significant; residuals (still contain some signal)]

Page 28: Classical (frequentist) inference

Example: a suboptimal model

$Y = X\beta + e$

$\beta_1 = 0.22$, $\beta_2 = 0.11$, residual variance = 0.3

$p(Y \mid \beta_1 = 0)$: p-value = 0.1 (t-test)
$p(Y \mid \beta_1 = 0)$: p-value = 0.2 (F-test)

Page 29: Classical (frequentist) inference

A better model

[Figure panels: true signal and observed signal; model (green and red) and true signal (blue ---) — the red regressor is the temporal derivative of the green regressor; total fit (blue) and partial fit (green & red), adjusted and fitted signal; residuals (less variance & structure)]

t-test of the green regressor: almost significant. t-test of the red regressor: very significant. F-test: very significant.

Page 30: Classical (frequentist) inference

A better model

$Y = X\beta + e$

$\beta_1 = 0.22$, $\beta_2 = 2.15$, $\beta_3 = 0.11$, residual variance = 0.2

$p(Y \mid \beta_1 = 0)$: p-value = 0.07 (t-test)
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$: p-value = 0.000001 (F-test)

Page 31: Classical (frequentist) inference

Correlation among regressors

$y = x_1 \beta_1 + x_2 \beta_2 + e$

Correlated regressors = explained variance is shared between the regressors.

After orthogonalising $x_2$ with respect to $x_1$ (giving $x_2^*$):

$y = x_1 \beta_1^* + x_2^* \beta_2^* + e$

When $x_2$ is orthogonalised with regard to $x_1$, only the parameter estimate for $x_1$ changes, not that for $x_2$ (i.e. $\beta_2^* = \beta_2$)!

[Figure: geometric view of $y$, $x_1$, $x_2$ and the orthogonalised $x_2^*$]
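
A small simulation of this point (regressors and effect sizes are invented; `rng` as in the earlier sketch):

```python
# Orthogonalise x2 w.r.t. x1: beta_1 absorbs the shared variance,
# while beta_2 is unchanged.
x1 = rng.normal(size=100)
x2 = 0.7 * x1 + rng.normal(scale=0.5, size=100)       # correlated with x1
y2 = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.3, size=100)

x2_star = x2 - (x2 @ x1) / (x1 @ x1) * x1             # remove the x1 component

for design in (np.column_stack([x1, x2]), np.column_stack([x1, x2_star])):
    b, *_ = np.linalg.lstsq(design, y2, rcond=None)
    print(b)       # beta_1 differs between the two fits; beta_2 does not
```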

Page 32: Classical (frequentist) inference

Design orthogonality

• For each pair of columns of the design matrix, the orthogonality matrix depicts the magnitude of the cosine of the angle between them, with the range 0 to 1 mapped from white to black.

• The cosine of the angle between two vectors a and b is obtained by:

$\cos \theta = \dfrac{a^T b}{\|a\| \, \|b\|}$

• If both vectors have zero mean then the cosine of the angle between the vectors is the same as the correlation between the two variates.
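
A self-contained sketch of such an orthogonality matrix; the two-column design is invented for illustration (this is not SPM's own code):

```python
# Pairwise |cos(angle)| between design-matrix columns, as in SPM's display.
import numpy as np

def cosine_matrix(design):
    """Cosines of the angles between all pairs of columns."""
    norms = np.linalg.norm(design, axis=0)
    return (design.T @ design) / np.outer(norms, norms)

X_demo = np.column_stack([np.tile([1.0, 0.0], 40), np.ones(80)])
print(np.abs(cosine_matrix(X_demo)))   # 1 on the diagonal; off-diagonal = overlap
```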

Page 33: Classical (frequentist) inference

Correlated regressors

[Figure panels: true signal; model (green and red); fit (blue: total fit); residual]

Page 34: Classical (frequentist) inference

Correlated regressors

$Y = X\beta + e$

$\beta_1 = 0.79$, $\beta_2 = 0.85$, $\beta_3 = 0.06$, residual variance = 0.3

$p(Y \mid \beta_1 = 0)$: p-value = 0.08 (t-test)
$p(Y \mid \beta_2 = 0)$: p-value = 0.07 (t-test)
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$: p-value = 0.002 (F-test)

Page 35: Classical (frequentist) inference

After orthogonalisation

[Figure panels: true signal; model (green and red) — the red regressor has been orthogonalised with respect to the green one, i.e. everything that correlates with the green regressor has been removed from it; fit (does not change); residuals (do not change)]

Page 36: Classical (frequentist) inference

After orthogonalisation

$Y = X\beta + e$

$\beta_1 = 1.47$ (was 0.79 — does change)
$\beta_2 = 0.85$ (was 0.85 — does not change)
$\beta_3 = 0.06$ (was 0.06 — does not change)
residual variance = 0.3

$p(Y \mid \beta_1 = 0)$: p-value = 0.0003 (t-test)
$p(Y \mid \beta_2 = 0)$: p-value = 0.07 (t-test)
$p(Y \mid \beta_1 = 0, \beta_2 = 0)$: p-value = 0.002 (F-test)

Page 37: Classical (frequentist) inference

Design efficiency

• The t-statistic is $t = \dfrac{c^T \hat{\beta}}{\sqrt{\mathrm{var}(c^T \hat{\beta})}}$; the aim is to minimise the standard error of a t-contrast (i.e. the denominator of the t-statistic), where

$\mathrm{var}(c^T \hat{\beta}) = \hat{\sigma}^2 c^T (X^T X)^{-1} c$

• This is equivalent to maximizing the efficiency ε (noise variance × design variance):

$\varepsilon(\hat{\sigma}^2, c, X) = \left(\hat{\sigma}^2 c^T (X^T X)^{-1} c\right)^{-1}$

• If we assume that the noise variance is independent of the specific design:

$\varepsilon(c, X) = \left(c^T (X^T X)^{-1} c\right)^{-1}$

• This is a relative measure: all we can say is that one design is more efficient than another (for a given contrast).

NB: efficiency depends on the design matrix and the chosen contrast!

Page 38: Classical (frequentist) inference

Design efficiency

• $X^T X$ is the covariance matrix of the regressors in the design matrix
• efficiency decreases with increasing covariance
• but note that efficiency differs across contrasts, e.g. for negatively correlated regressors with

$X^T X = \begin{bmatrix} 1 & -0.9 \\ -0.9 & 1 \end{bmatrix}$

$c^T = [1\ 0] \rightarrow \varepsilon = 0.19$
$c^T = [1\ 1] \rightarrow \varepsilon = 0.05$
$c^T = [1\ -1] \rightarrow \varepsilon = 0.95$

[Figure: blue dots — noise with the covariance structure of $X^T X$]
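
These three values can be checked numerically, given the negatively correlated $X^T X$ shown above:

```python
# Efficiency e(c, X) = 1 / (c' (X'X)^{-1} c) for the three contrasts.
import numpy as np

XtX = np.array([[1.0, -0.9], [-0.9, 1.0]])
XtX_inv = np.linalg.inv(XtX)

for c in ([1.0, 0.0], [1.0, 1.0], [1.0, -1.0]):
    c = np.asarray(c)
    print(c, round(1.0 / (c @ XtX_inv @ c), 2))   # -> 0.19, 0.05, 0.95
```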

Page 39: Classical (frequentist) inference

Example: working memory

• A: response follows each stimulus with a (short) fixed delay.
• B: jittering the delay between stimuli and responses.
• C: requiring a response only for half of all trials (randomly chosen).

[Figure: stimulus and response regressors over time for designs A, B and C]

A: correlation = −.65, efficiency ([1 0]) = 29
B: correlation = +.33, efficiency ([1 0]) = 40
C: correlation = −.24, efficiency ([1 0]) = 47
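
A rough, self-contained sketch of how such efficiencies can be computed: build stimulus and response regressors by convolving onset vectors with an HRF, then evaluate $\varepsilon(c, X)$ for $c^T = [1\ 0]$. The onsets, delays and the simple double-gamma HRF are assumptions for illustration and will not reproduce the exact numbers above.

```python
import numpy as np
from scipy.stats import gamma

T = 400                                          # scan duration (s), 1 s resolution
t = np.arange(0.0, 32.0)
hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6     # crude double-gamma HRF

def regressor(onsets):
    """Stick function at the given onsets (s), convolved with the HRF."""
    u = np.zeros(T)
    u[np.asarray(onsets)] = 1.0
    return np.convolve(u, hrf)[:T]

stim = np.arange(10, 380, 20)                    # a stimulus every 20 s
resp_fixed = stim + 2                            # design A: fixed short delay
rng = np.random.default_rng(1)
resp_jitter = stim + rng.integers(1, 12, stim.size)   # design B: jittered delay

c = np.array([1.0, 0.0])
for resp in (resp_fixed, resp_jitter):
    Xwm = np.column_stack([regressor(stim), regressor(resp)])
    eff = 1.0 / (c @ np.linalg.inv(Xwm.T @ Xwm) @ c)
    print(eff)    # efficiency is typically higher for the jittered design
```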

Page 40: Classical (frequentist) inference

Bibliography

• Friston KJ et al. (2007) Statistical Parametric Mapping: The Analysis of Functional Brain Images. Elsevier.

• Christensen R (1996) Plane Answers to Complex Questions: The Theory of Linear Models. Springer.

• Friston KJ et al. (1995) Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping 2: 189-210.

• Mechelli A et al. (2003) Estimating efficiency a priori: a comparison of blocked and randomized designs. NeuroImage 18:798-805.

Page 41: Classical (frequentist) inference

Thank you

Page 42: Classical (frequentist) inference

Supplementary slides

Page 43: Classical (frequentist) inference

Differential F-contrasts

• equivalent to testing for effects that can be explained as a linear combination of the 3 differences

• useful when using informed basis functions and testing for overall shape differences in the HRF between two conditions