T-tests, ANOVAs & Regression and their application to the statistical analysis of neuroimaging...

t-tests, ANOVAs & Regression and their application to the statistical analysis of neuroimaging Carles Falcon & Suz Prejawa

t-tests, ANOVAs & Regression

and their application to the statistical analysis of neuroimaging

Carles Falcon &

Suz Prejawa

OVERVIEW

• Basics, populations and samples

• T-tests

• ANOVA

• Beware!

• Summary Part 1

• Part 2

Basics

• Hypotheses

– H0 = null hypothesis

– H1 = experimental/research hypothesis

• Descriptive vs inferential statistics

• (Gaussian) distributions

• p-value & alpha-level (probability and significance)

Example: activation in the left occipitotemporal regions, especially the visual word form area (VWFA), is greatest for written words.

Populations and samples

• Population: z-tests and distributions

• Sample (of a population): t-tests and distributions

NOTE: a sample can be 2 sets of scores, eg fMRI data from 2 conditions

Comparison between Samples

Are these groups different?

Comparison between Conditions (fMRI)

Reading aloud vs Picture naming

Reading aloud (script) vs “Reading” finger spelling (sign)

t-tests

• Compare the means of 2 samples/conditions

• If 2 samples are taken from the same population, then they should have fairly similar means

• If 2 means are statistically different, then the samples are likely to be drawn from 2 different populations, i.e. they really are different

[Figure: two error-bar plots (means with 95% CI), panels Exp. 1 and Exp. 2]

t-test in VWFA

• Exp. 1: activation patterns are similar, not significantly different, so they are similar tasks and recruit the VWFA in a similar way

• Exp. 2: activation patterns are very (and significantly) different, so reading aloud recruits the VWFA a lot more than naming

[Figure: two error-bar plots (means with 95% CI), panels Exp. 1 and Exp. 2]

Formula

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$$

Reporting convention: t= 11.456, df= 9, p< 0.001

Difference between the means divided by the pooled standard error of the mean

Formula cont.

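$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{x}_1 - \bar{x}_2}}$$

$$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$

[Figure: Cond. 1 vs Cond. 2]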
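The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `two_sample_t` and the activation values are made up, and the sample variances are assumed to use the usual n - 1 denominator.

```python
import math
import statistics


def two_sample_t(cond1, cond2):
    """t = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), as on the slide."""
    n1, n2 = len(cond1), len(cond2)
    mean_diff = statistics.mean(cond1) - statistics.mean(cond2)
    # statistics.variance is the sample variance (divides by n - 1)
    se_diff = math.sqrt(statistics.variance(cond1) / n1
                        + statistics.variance(cond2) / n2)
    return mean_diff / se_diff


# Hypothetical activation values for two conditions (made-up numbers)
reading = [10.0, 12.0, 11.0, 13.0, 14.0]
naming = [7.0, 8.0, 9.0, 8.0, 8.0]

t_value = two_sample_t(reading, naming)
print(f"t = {t_value:.3f}")
```

The larger the difference between the means relative to the standard error of that difference, the larger t becomes.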

Types of t-tests

                                Independent samples            Related samples (also called dependent means test)
Interval measures/parametric    Independent samples t-test*    Paired samples t-test**
Ordinal/non-parametric          Mann-Whitney U-test            Wilcoxon test

* 2 experimental conditions and different participants were assigned to each condition

** 2 experimental conditions and the same participants took part in both conditions of the experiment
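For the related-samples case, a minimal sketch of a paired t statistic is given below. The slides do not give this formula; the sketch assumes the standard form, in which the mean of the within-participant differences is divided by its standard error, with df = n - 1. The function name and the scores are hypothetical.

```python
import math
import statistics


def paired_t(cond_a, cond_b):
    """Paired t: mean of the differences divided by its standard error."""
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    # Standard error of the mean difference (sample SD divides by n - 1)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical scores: the same 5 participants measured in both conditions
cond_a = [5.0, 7.0, 6.0, 9.0, 8.0]
cond_b = [4.0, 5.0, 5.0, 7.0, 6.0]

t_paired, df = paired_t(cond_a, cond_b)
print(f"t = {t_paired:.3f}, df = {df}")
```

Because each participant serves as their own control, only the variability of the differences enters the denominator.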

Types of t-tests cont.

• 2-tailed tests vs one-tailed tests

• 2 sample t-tests vs 1 sample t-tests

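[Figure: 2-tailed test with a 2.5% rejection region in each tail vs one-tailed test with a 5% rejection region in one tail]

[Figure: 2 sample t-test compares a mean with a mean; 1 sample t-test compares a mean with a known value]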

Comparison of more than 2 samples

Tell me the difference

between these groups…

Thank God I have ANOVA

ANOVA in VWFA (2x2)

• Is activation in the VWFA a) different for naming and reading, b) influenced by age, and if so (a + b), how?

• H1 & H0

• H2 & H0

• H3 & H0

Reading causes significantly stronger activation in the VWFA, but only in the older group. So the VWFA is more strongly activated during reading, but this seems to be affected by age (related to reading skill?).

[Figure: two error-bar plots (means with 95% CI), Naming vs Reading]

2x2 design:

TASK: Naming, Reading aloud

AGE: Young, Old

ANOVA

• ANalysis Of VAriance (ANOVA)

– Still compares the differences in means between groups, but it uses the variance of the data to “decide” if means are different

• Terminology (factors and levels)

• F-statistic

– Magnitude of the difference between the different conditions

– p-value associated with F is the probability that differences between groups at least this large could occur by chance if the null hypothesis is correct

– Need for post-hoc testing (ANOVA can tell you if there is an effect but not where)

Reporting convention: F= 65.58, df= 4,45, p< .001
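How the F-statistic "uses the variance" can be sketched for the simplest case, a one-way ANOVA on independent groups. This is an illustrative sketch only: the slides give no formula, so the standard ratio of between-group to within-group mean squares is assumed, and the function name and data are hypothetical.

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way independent ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    # Variability of the scores around their own group mean
    ss_within = sum((v - statistics.mean(g)) ** 2
                    for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Three hypothetical groups (made-up numbers)
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_value:.2f}, df = {df_b},{df_w}")
```

A large F says that the group means differ by more than the within-group scatter would suggest, but not which groups differ, hence the need for post-hoc tests.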

Types of ANOVAs

Type: 2-way ANOVA for independent groups (between-subject design)

              Condition I            Condition II
Task I        Participant group A    Participant group B
Task II       Participant group C    Participant group D

Type: repeated measures ANOVA (within-subject design)

              Condition I            Condition II
Task I        Participant group A    Participant group A
Task II       Participant group A    Participant group A

Type: mixed ANOVA (both)

              Condition I            Condition II
Task I        Participant group A    Participant group B
Task II       Participant group A    Participant group B

NOTE: You may have more than 2 levels in each condition/task

BEWARE!

• Errors

– Type I: false positives

– Type II: false negatives

• Multiple comparison problem, especially prominent in fMRI
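Why multiple comparisons are a problem can be illustrated with a small calculation. The sketch below assumes the tests are independent (which voxels in fMRI are not, so treat it as a rough illustration only); the function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for n_tests in (1, 10, 100, 1000):
    rate = familywise_error_rate(0.05, n_tests)
    print(f"{n_tests:5d} tests at alpha = 0.05: P(at least one false positive) = {rate:.3f}")
```

With a single test the chance of a false positive is alpha itself, but with many tests at least one false positive becomes very likely.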

SUMMARY

• t-tests compare means between 2 samples and identify if they are significantly/ statistically different

• may compare two samples to each other OR one sample to a predefined value

• ANOVAs compare more than two samples, over various conditions (2x2, 2x3 or more)

• They investigate variances to establish if means are significantly different

• Common statistical problems (errors, multiple comparison problem)

PART 2

Correlation: how linear is the relationship between two variables? (descriptive)

Regression: how well does a linear model explain my data? (inferential)

Correlation:

- How much does the value of one variable depend on the value of the other one?

How to describe correlation (1):

Covariance

- The covariance is a statistic representing the degree to which 2 variables vary together

$$\mathrm{cov}(x,y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$

(note that $s_x^2 = \mathrm{cov}(x,x)$)

cov(x,y) = mean of the products of each point's deviations from the mean values

Geometrical interpretation: mean of the ‘signed’ areas of the rectangles defined by the points and the mean value lines

$$\mathrm{cov}(x,y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n}$$
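The covariance formula can be written directly as code. A minimal sketch, with a hypothetical function name and made-up data, dividing by n as in the formula above:

```python
def covariance(x, y):
    """Mean of the products of the deviations from the means (divides by n)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


# Made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_xy = covariance(x, y)
var_x = covariance(x, x)  # cov(x, x) is the variance of x
print(cov_xy, var_x)
```

Each term in the sum is one of the ‘signed’ rectangle areas: positive when a point lies above both means or below both, negative otherwise.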

sign of covariance =

sign of correlation

How to describe correlation (2):

Pearson correlation coefficient (r)

- r is a kind of ‘normalised’ (dimensionless) covariance

- r takes values from -1 (perfect negative correlation) to 1 (perfect positive correlation). r = 0 means no correlation

$$r_{xy} = \frac{\mathrm{cov}(x,y)}{s_x s_y}$$

(s = standard deviation of the sample)
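As a sketch of why r is called ‘normalised’, the snippet below computes r from the covariance and shows that rescaling x (a change of units) leaves r unchanged. Function names and data are hypothetical; standard deviations divide by n, to match cov(x,x).

```python
import math


def covariance(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def pearson_r(x, y):
    """r = cov(x, y) / (s_x * s_y), with s^2 = cov(v, v)."""
    s_x = math.sqrt(covariance(x, x))
    s_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)


# Made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# Changing the units of x changes the covariance but not r
r_rescaled = pearson_r([100.0 * xi for xi in x], y)
# A perfectly linear relationship gives r = 1
r_perfect = pearson_r(x, [2.0 * xi + 1.0 for xi in x])
print(r, r_rescaled, r_perfect)
```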

Pearson correlation coefficient (r)

Problems:

- It is sensitive to outliers

- r is an estimate from the sample, but does it represent the population parameter?

Linear regression:

- Regression: Prediction of one variable from knowledge of one or more other variables

- How well does a linear model (y = ax + b) explain the relationship between two variables?

- If there is such a relationship, we can ‘predict’ the value of y for a given x. But how large an error could we be making?

[Figure: scatter plot with fitted line; labelled point (25, 7.498)]

Preliminaries:

Linear dependence between 2 variables

Two variables are linearly dependent when the increase of one variable is proportional to the increase of the other one

[Figure: straight line on x and y axes]

Examples: - Energy needed to boil water - Money needed to buy coffeepots

The equation y = mx + n that connects both variables has two parameters:

- ‘m’ is the unit change in y (how much y increases or decreases when x increases by one unit)

- ‘n’ is the value of y when x is zero (usually zero)

Examples: ‘m’ = energy needed to boil one liter of water, ‘n’ = 0; ‘m’ = price of one coffeepot, ‘n’ = fixed tax/commission to add

$$m = \frac{y_2 - y_1}{x_2 - x_1}$$

$$m = \frac{y(1) - n}{1 - 0}$$

[Figure: straight line with slope m and intercept n]

Fitting data to a straight line (or vice versa):

Here, ŷ = ax + b

– ŷ: predicted value of y

– a: slope of regression line

– b: intercept

Residual error (εi): difference between obtained and predicted values of y (i.e. yi − ŷi)

Best fit line (values of a and b) is the one that minimises the sum of squared errors (SSerror), Σ(yi − ŷi)²

[Figure: scatter plot with regression line ŷ = ax + b; for each point, yi is the observed value, ŷi the predicted value and εi the residual]

Adjusting the straight line to data:

• Minimise Σ(yi − ŷi)², which is Σ(yi − axi − b)²

• Minimum SSerror is at the bottom of the curve, where the gradient is zero, and this can be found with calculus

• Take partial derivatives of Σ(yi − axi − b)² with respect to the parameters a and b and solve for 0 as simultaneous equations, giving:

$$a = \frac{r\, s_y}{s_x}, \qquad b = \bar{y} - a\,\bar{x}$$

• This calculation can always be done, whatever the data!!
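The closed-form solution for a and b can be sketched as follows. The function name `fit_line` and the data are hypothetical; standard deviations divide by n, consistent with the covariance slide (the choice of denominator cancels in a).

```python
import math


def fit_line(x, y):
    """Least-squares line: a = r * s_y / s_x, b = mean(y) - a * mean(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    r = cov_xy / (s_x * s_y)
    a = r * s_y / s_x
    b = mean_y - a * mean_x
    return a, b


# Made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = fit_line(x, y)
print(f"y_hat = {a:.2f} x + {b:.2f}")
```

Note that the fitted line always passes through the point of means, since b is defined as mean(y) - a * mean(x).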

How good is the model?

• We can calculate the regression line for any data, but how well does it fit the data?

• Total variance = predicted variance + error variance: $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$

• Also, it can be shown that r² is the proportion of the variance in y that is explained by our regression model:

$$r^2 = \frac{s_{\hat{y}}^2}{s_y^2}$$

• Insert $s_{\hat{y}}^2 = r^2 s_y^2$ into $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$ and rearrange to get:

$$s_{er}^2 = s_y^2 (1 - r^2)$$

• From this we can see that the greater the correlation the smaller the error variance, so the better our prediction
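The variance decomposition can be checked numerically. The sketch below fits a least-squares line to made-up data and compares the three variances; all names are hypothetical and variances divide by n.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Variance dividing by n, matching cov(x, x)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


# Made-up data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

# Least-squares line: a = cov(x, y) / s_x^2 (equivalent to r * s_y / s_x)
mean_x, mean_y = mean(x), mean(y)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / len(x)
a = cov_xy / variance(x)
b = mean_y - a * mean_x

predicted = [a * xi + b for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

s2_y = variance(y)             # total variance
s2_pred = variance(predicted)  # predicted (explained) variance
s2_err = variance(residuals)   # error variance
r_squared = s2_pred / s2_y

print(s2_y, s2_pred + s2_err, r_squared)
```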

Is the model significant?

• i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?

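• F-statistic:

$$F(df_{\hat{y}}, df_{er}) = \frac{s_{\hat{y}}^2}{s_{er}^2} = \dots = \frac{r^2 (n - 2)}{1 - r^2}$$

(the dots stand for some complicated rearranging; here each variance estimate is divided by its degrees of freedom, $df_{\hat{y}} = 1$ and $df_{er} = n - 2$)

• And it follows that:

$$t(n-2) = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$$

So all we need to know are r and n!!!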
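A short sketch of the "all we need are r and n" point: both statistics are computed from r and n alone, and t squared equals F. The function names and the example values of r and n are made up for illustration.

```python
import math


def f_from_r(r, n):
    """F(1, n - 2) = r^2 (n - 2) / (1 - r^2)."""
    return r ** 2 * (n - 2) / (1.0 - r ** 2)


def t_from_r(r, n):
    """t(n - 2) = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)


# Hypothetical correlation and sample size
r, n = 0.6, 27
f_value = f_from_r(r, n)
t_value = t_from_r(r, n)
print(f"F(1,{n - 2}) = {f_value:.4f}, t({n - 2}) = {t_value:.2f}")
```

The same r gives a larger t with a larger n, which is why a modest correlation can be significant in a big sample.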

Generalization to multiple variables

• Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y

• The different x variables are combined in a linear way and each has its own regression coefficient:

y = β0 + β1x1 + β2x2 + … + βnxn + ε

• The β parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y

• i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for

Geometric view, 2 variables:

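ŷ = β0 + β1x1 + β2x2

[Figure: 3D plot with axes x1, x2 and y, showing the regression plane and the residual ε of a sample point]

‘Plane’ of regression: plane nearest to all the sample points distributed over a 3D space:

y = β0 + β1x1 + β2x2 + ε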

Multiple regression in SPM:

y : voxel value

x1, x2, …: parameters that are supposed to explain the variation in y (regressors)

GLM: given a set of values yi (voxel value at a given position for a sample of images) and a set of explanatory variables xi (group, factors, age, TIV, … for VBM, or condition, movement parameters, … for fMRI), find the (hyper)plane nearest to all the points. The coefficients defining the plane are named β1, β2, …, βn

equation: y = β0 + β1x1 + β2x2 + … + βnxn + ε
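Finding the nearest (hyper)plane can be sketched for a single voxel with two regressors. This is a pure-Python illustration that assumes the usual least-squares solution via the normal equations; it is not SPM's implementation, and all names and data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit_glm(design, y):
    """Least-squares betas from the normal equations (X'X) beta = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical regressors and noise-free voxel values y = 1 + 2*x1 + 3*x2
x1 = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
y = [1.0 + 2.0 * u + 3.0 * v for u, v in zip(x1, x2)]

# Design matrix: a column of ones (for beta0), then x1 and x2
design = [[1.0, u, v] for u, v in zip(x1, x2)]
betas = fit_glm(design, y)
print(betas)
```

In an imaging analysis the same design matrix would be reused for every voxel, with only y changing.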

Matrix representation and results:

Last remarks:

- Correlated doesn’t mean related. E.g. any two variables increasing or decreasing over time would show a nice correlation: CO2 air concentration in Antarctica and lodging rental cost in London. Beware in longitudinal studies!!!

- A relationship between two variables doesn’t mean causality (e.g. leaves on the forest floor and hours of sun)

- Cov(x,y) = 0 doesn’t mean x and y are independent (it rules out a linear relationship, but the relationship could be quadratic, …)
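The last remark can be illustrated with a made-up example in which y is completely determined by x and yet the covariance is zero. The function name and data are hypothetical.

```python
def covariance(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


# y depends perfectly on x, but quadratically and symmetrically around 0
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]

cov_xy = covariance(x, y)
print(cov_xy)
```

The positive and negative ‘signed areas’ cancel exactly, so the covariance (and therefore r) misses the relationship entirely.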

Questions ?

Please don’t!

REFERENCES

• Field, A. (2005). Discovering Statistics Using SPSS (2nd ed). London: Sage Publications Ltd.

• Field, A. (2009). Discovering Statistics Using SPSS (3rd ed). London: Sage Publications Ltd.

• Various stats websites (google yourself happy)

• Old MfD slides, esp 2008