Jennifer Siegel. Statistical background Z-Test T-Test Anovas.


Transcript of Jennifer Siegel. Statistical background Z-Test T-Test Anovas.

Page 1

Jennifer Siegel

Page 2

Statistical background

Z-Test

T-Test

Anovas

Page 3

Science tries to predict the future. Is an observed result a genuine effect?

Attempt to strengthen predictions with stats

Use P-Value to indicate our level of certainty that result = genuine effect on whole population (more on this later…)

Page 4
Page 5

Develop an experimental hypothesis
H0 = null hypothesis
H1 = alternative hypothesis

Statistically significant result: P value < .05

Page 6

The P value is the probability that the observed result arose by chance.

Level = .05, or 5%: we can be 95% certain our experimental effect is genuine

Page 7

Type 1 = false positive
Type 2 = false negative
Confidence = 1 – probability of a Type 1 error
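The 5% level and the Type 1 error rate can be illustrated with a short simulation (not from the lecture): if the null hypothesis is true, about 5% of experiments will still come out "significant" by chance. This sketch runs many one-sample t-tests on pure noise; 2.262 is the standard two-tailed critical t for df = 9 at α = .05.

```python
import math
import random
import statistics

random.seed(0)
CRIT = 2.262   # two-tailed critical t for df = 9 at alpha = .05 (standard table value)
n, trials = 10, 5000

false_positives = 0
for _ in range(trials):
    # The null hypothesis is true by construction: samples are pure noise around 0
    sample = [random.gauss(0, 1) for _ in range(n)]
    t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
    if abs(t) > CRIT:
        false_positives += 1   # Type 1 error: a false positive

rate = false_positives / trials
print(f"false-positive rate ≈ {rate:.3f}")
```

The printed rate hovers around .05, which is exactly the Type 1 error probability the significance level controls.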

Page 8
Page 9

Let’s pretend you came up with the following theory…

Having a baby increases brain volume (associated with possible structural changes)

Page 10

Z-test

T-test

Page 11

[Figure: population distribution with an individual value x and its z score]

Page 12

Cost
Not able to include everyone
Too time consuming
Ethical right to privacy

Realistically researchers can only do sample-based studies

Page 13

T = differences between sample means / standard error of sample means

Degrees of freedom = sample size - 1

t = (x̄₁ − x̄₂) / estimated standard error of the difference between means

estimated standard error = √(s₁²/n₁ + s₂²/n₂)
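As a sketch (with made-up numbers, not the lecture's data), the independent-samples t formula can be computed directly:

```python
import math

# Hypothetical scores for two independent groups (illustrative only)
group1 = [5.1, 4.9, 6.0, 5.5, 5.3]
group2 = [4.2, 4.0, 4.8, 4.4, 4.1]

def mean(xs):
    return sum(xs) / len(xs)

def sample_var(xs):
    # sample variance with the n - 1 denominator
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# t = (x̄1 − x̄2) / √(s1²/n1 + s2²/n2)
se = math.sqrt(sample_var(group1) / len(group1) + sample_var(group2) / len(group2))
t = (mean(group1) - mean(group2)) / se
print(f"t = {t:.3f}")
```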

Page 14
Page 15

H0 = There is no difference in brain size before or after giving birth

H1 = The brain is significantly smaller or significantly larger after giving birth (difference detected)

Page 16

Before Delivery   6 Weeks After Delivery   Difference
1437.4            1494.5                   57.1
1089.2            1109.7                   20.5
1201.7            1245.4                   43.7
1371.8            1383.6                   11.8
1207.9            1237.7                   29.8
1150.7            1180.1                   29.4
1221.9            1268.8                   46.9
1208.7            1248.3                   39.6

Sum   9889.3        10168.1       278.8
Mean  1236.1625     1271.0125     34.85
SD    113.8544928   119.0413426   5.18685

t = mean difference / standard error of the mean difference = 34.85 / 5.19 ≈ 6.72

Page 17

t = 6.718914454, df = 7

http://www.danielsoper.com/statcalc/calc08.aspx

Women have a significantly larger brain after giving birth
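The paired comparison can be reproduced from the table's raw numbers. (Recomputing the SD of the differences from scratch gives a t in the same region as the slide's 6.72, small rounding differences aside.)

```python
import math
import statistics

before = [1437.4, 1089.2, 1201.7, 1371.8, 1207.9, 1150.7, 1221.9, 1208.7]
after  = [1494.5, 1109.7, 1245.4, 1383.6, 1237.7, 1180.1, 1268.8, 1248.3]

diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

# paired t = mean difference / (SD of differences / √n), df = n - 1
t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
print(f"t = {t:.2f}, df = {n - 1}")
```

With t far above the .05 critical value for df = 7, the null hypothesis of no difference is rejected.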

Page 18

One-sample (sample vs. hypothesized mean)

Independent groups (2 separate groups)

Repeated measures (same group, different measure)

Page 19
Page 20

ANalysis Of VAriance
Factor = what is being compared (type of pregnancy)
Levels = different elements of a factor (age of mother)
F-statistic
Post hoc testing

Page 21

1-Way ANOVA: 1 factor with more than 2 levels

Factorial ANOVA: more than 1 factor

Mixed-design ANOVA: some factors are independent, others are related

Page 22

A significant ANOVA tells you there is a difference somewhere between the groups,

NOT where the difference lies

Finding exactly where the difference lies requires further statistical analysis = post hoc analysis
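A minimal sketch of what the one-way ANOVA F-statistic computes (with made-up groups, not the lecture's data): variability of the group means around the grand mean, divided by variability of the scores within their own groups.

```python
# Hypothetical data: one factor with three levels (illustrative only)
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

k = len(groups)                          # number of levels
n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-groups: variability of group means around the grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
df_between = k - 1

# Within-groups: variability of scores around their own group mean
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
df_within = n_total - k

F = (ss_between / df_between) / (ss_within / df_within)
print(f"F = {F:.2f} (df = {df_between}, {df_within})")
```

A large F means the group means differ by more than within-group noise would explain; post hoc tests then localise which pairs differ.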

Page 23
Page 24

Z-tests for populations
T-tests for samples
ANOVAs compare more than 2 groups in more complicated scenarios

Page 25

Varun V. Sethi

Page 26

Objective

Correlation

Linear Regression

Take Home Points.

Page 27
Page 28

Correlation: how linear is the relationship between two variables? (descriptive)

Regression: how well does a linear model explain my data? (inferential)

Page 29
Page 30

Correlation

Correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom).

Page 31

Strength and direction of the relationship between variables

Scattergrams

[Scatterplots of Y against X illustrating each case]

Positive correlation Negative correlation No correlation

Page 32
Page 33

Measures of Correlation

1) Covariance

2) Pearson Correlation Coefficient (r)

Page 34

1) Covariance

- The covariance is a statistic representing the degree to which 2 variables vary together

{Note that Sx² = cov(x,x)}

cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / n

Page 35

A statistic representing the degree to which 2 variables vary together

Covariance formula

cf. variance formula

cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / n

Sx² = Σᵢ₌₁ⁿ (xᵢ − x̄)² / n

Page 36

2) Pearson correlation coefficient (r)

- r is a kind of ‘normalised’ (dimensionless) covariance

- r takes values from -1 (perfect negative correlation) to 1 (perfect positive correlation). r = 0 means no correlation

rxy = cov(x, y) / (sx · sy)   (s = standard deviation of the sample)
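Both measures can be sketched in a few lines (my own helper, not from the slides); on an exact straight line the coefficient hits ±1:

```python
import math

def pearson_r(x, y):
    """Pearson r = cov(x, y) / (sx · sy), all with the n - 1 denominator."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))   # perfect positive line
print(pearson_r([1, 2, 3, 4], [9, 7, 5, 3]))   # perfect negative line
```

Note that r is dimensionless: dividing the covariance by both standard deviations cancels the units of x and y.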

Page 37
Page 38

Limitations:

Sensitive to extreme values

Describes a relationship, not a prediction

Not causality

Page 39
Page 40

Regression: Prediction of one variable from knowledge of one or more other variables

Page 41

How well does a linear model (y = ax + b) explain the relationship between two variables?

- If there is such a relationship, we can 'predict' the value of y for a given x (e.g. the point (25, 7.498) on the example plot).

Page 42

Linear dependence between 2 variables

Two variables are linearly dependent when the increase of one variable is proportional to the increase of the other one

Examples: the energy needed to boil water; the money needed to buy coffee pots

Page 43
Page 44

Fitting data to a straight line (or vice versa): here, ŷ = ax + b

– ŷ: predicted value of y
– a: slope of regression line
– b: intercept

Residual error (εᵢ): difference between obtained and predicted values of y, i.e. εᵢ = yᵢ − ŷᵢ

The best-fit line (values of a and b) is the one that minimises the sum of squared errors (SSerror), Σ(yᵢ − ŷᵢ)²

Page 45
Page 46

Adjusting the straight line to the data:

• Minimise Σ(yᵢ − ŷᵢ)², which is Σ(yᵢ − (axᵢ + b))²

• The minimum SSerror is at the bottom of the curve, where the gradient is zero – and this can be found with calculus

• Take partial derivatives of Σ(yᵢ − axᵢ − b)² with respect to the parameters a and b and solve for 0 as simultaneous equations, giving:

a = r · sy / sx
b = ȳ − a · x̄

• This can always be done
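Those two formulas are enough to fit the line by hand. A sketch on made-up data generated exactly from y = 2x + 1, so the fit should recover a = 2 and b = 1:

```python
import math

x = [1, 2, 3, 4]
y = [3, 5, 7, 9]   # exactly y = 2x + 1 (hypothetical data)

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
r = cov / (sx * sy)

a = r * sy / sx    # slope
b = my - a * mx    # intercept
print(f"a = {a}, b = {b}")
```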

Page 47

We can calculate the regression line for any data, but how well does it fit the data?

Total variance = predicted variance + error variance

sy² = sŷ² + ser²

Also, it can be shown that r² is the proportion of the variance in y that is explained by our regression model:

r² = sŷ² / sy²

Insert r²sy² into sy² = sŷ² + ser² and rearrange to get:

ser² = sy² (1 − r²)

From this we can see that the greater the correlation, the smaller the error variance, so the better our prediction
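The variance decomposition can be checked numerically on a small made-up data set (using the population, divide-by-n, variances throughout so the identity holds exactly):

```python
x = [1, 2, 3, 4]
y = [2, 3, 5, 4]   # hypothetical noisy data

n = len(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
sx2 = sum((xi - mx) ** 2 for xi in x) / n
sy2 = sum((yi - my) ** 2 for yi in y) / n
r2 = cov ** 2 / (sx2 * sy2)

a = cov / sx2              # least-squares slope (equivalent to a = r·sy/sx)
b = my - a * mx
pred = [a * xi + b for xi in x]

s_pred2 = sum((p - my) ** 2 for p in pred) / n                # predicted variance
s_err2 = sum((yi - p) ** 2 for yi, p in zip(y, pred)) / n     # error variance

print(sy2, s_pred2 + s_err2)    # total = predicted + error
print(s_err2, sy2 * (1 - r2))   # error = sy²(1 − r²)
```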

Page 48

Do we get a significantly better prediction of y from our regression equation than by just predicting the mean?

F-statistic

Page 49

Prediction / forecasting

Quantify the strength of the relationship between y and Xj (X1, X2, X3)

Page 50

A General Linear Model is just any model that describes the data in terms of a straight line

Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept.

y = bx + a + ε

Page 51

Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y

The different x variables are combined in a linear way and each has its own regression coefficient:

y = b0 + b1x1+ b2x2 +…..+ bnxn + ε

The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y, i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
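A minimal sketch of the multiple-regression idea (made-up data, exact by construction): the coefficients solve the least-squares normal equations (XᵀX)b = Xᵀy. With y built from b0 = 1, b1 = 2, b2 = 3 and no noise, those values should come back out:

```python
def solve3(A, v):
    """Solve a 3x3 linear system by Gaussian elimination (no pivoting; fine here)."""
    A = [row[:] for row in A]
    v = v[:]
    for i in range(3):                      # forward elimination
        for j in range(i + 1, 3):
            f = A[j][i] / A[i][i]
            A[j] = [ajk - f * aik for ajk, aik in zip(A[j], A[i])]
            v[j] -= f * v[i]
    b = [0.0, 0.0, 0.0]                     # back substitution
    for i in (2, 1, 0):
        b[i] = (v[i] - sum(A[i][j] * b[j] for j in range(i + 1, 3))) / A[i][i]
    return b

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
x1 = [0, 1, 2, 3]
x2 = [1, 0, 1, 0]
y = [1 + 2 * a + 3 * c for a, c in zip(x1, x2)]

# Design matrix columns: intercept, x1, x2
X = [[1, a, c] for a, c in zip(x1, x2)]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)] for i in range(3)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(3)]

coeffs = solve3(XtX, Xty)
print(coeffs)
```

In practice a statistics package does this solve for you; the point is only that each coefficient is estimated while holding the other x variables fixed.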

Page 52

Take Home Points

- Correlation doesn't mean the variables are genuinely related. E.g., any two variables increasing or decreasing over time will show a nice correlation: CO2 concentration in the air over Antarctica and lodging rental cost in London. Beware in longitudinal studies!

- A relationship between two variables doesn't mean causality (e.g. leaves on the forest floor and hours of sun)

Page 53

Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y

Multiple Regression models the effect of several independent variables, x1, x2 etc, on one dependent variable, y

Both are types of General Linear Model

Page 54

Thank You