Jennifer Siegel. Statistical background Z-Test T-Test Anovas.

Jennifer Siegel

Statistical background

Z-Test

T-Test

Anovas

Science tries to predict the future

Genuine effect?

Attempt to strengthen predictions with stats

Use P-Value to indicate our level of certainty that result = genuine effect on whole population (more on this later…)

Develop an experimental hypothesis

H0 = null hypothesis

H1 = alternative hypothesis

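Statistically significant result: P-value below .05

P-value = probability of obtaining a result at least this extreme if the null hypothesis were true

Significance level = .05 or 5%

Informally read as being 95% certain that our experimental effect is genuine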

Type 1 = false positive

Type 2 = false negative

Significance level = probability of a Type 1 error (so confidence = 1 – probability of Type 1 error)

Let’s pretend you came up with the following theory…

Having a baby increases brain volume (associated with possible structural changes)

Z - test

T - test

Population

Cost

Not able to include everyone

Too time consuming

Ethical right to privacy

Realistically researchers can only do sample based studies

T = differences between sample means / standard error of sample means

Degrees of freedom = sample size - 1

$$ t = \frac{\text{difference between sample means}}{\text{estimated standard error of difference between means}} $$

H0 = There is no difference in brain size before or after giving birth

H1 = The brain is significantly smaller or significantly larger after giving birth (difference detected)

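| | Before Delivery | 6 Weeks After Delivery | Difference |
|---|---|---|---|
| | 1437.4 | 1494.5 | 57.1 |
| | 1089.2 | 1109.7 | 20.5 |
| | 1201.7 | 1245.4 | 43.7 |
| | 1371.8 | 1383.6 | 11.8 |
| | 1207.9 | 1237.7 | 29.8 |
| | 1150.7 | 1180.1 | 29.4 |
| | 1221.9 | 1268.8 | 46.9 |
| | 1208.7 | 1248.3 | 39.6 |
| Sum | 9889.3 | 10168.1 | 278.8 |
| Mean | 1236.1625 | 1271.0125 | 34.85 |
| SD | 113.8544928 | 119.0413426 | 14.80 |

For repeated measures, the standard error comes from the SD of the differences, not from the difference between the two SDs:

T = mean difference / (SD of differences / √n) = 34.85 / (14.80 / √8) ≈ 6.66

DF = 8 - 1 = 7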
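As a rough check of the repeated-measures calculation, the sketch below recomputes the statistic from the before/after values in the table, using the standard paired form (mean difference divided by the standard error of the differences). The function name `paired_t` is illustrative and not from the slides.

```python
import math
import statistics

# Brain volumes from the table above (same women, measured twice).
before = [1437.4, 1089.2, 1201.7, 1371.8, 1207.9, 1150.7, 1221.9, 1208.7]
after = [1494.5, 1109.7, 1245.4, 1383.6, 1237.7, 1180.1, 1268.8, 1248.3]


def paired_t(x, y):
    """Return (t, df) for a repeated-measures t-test of y against x."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)   # sample SD, n - 1 in the denominator
    se_diff = sd_diff / math.sqrt(n)    # standard error of the mean difference
    return mean_diff / se_diff, n - 1


t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f}, df = {df}")   # t is roughly 6.66 with 7 degrees of freedom
```

The resulting t and DF can then be entered into a p-value calculator such as the one linked below.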

http://www.danielsoper.com/statcalc/calc08.aspx

Women have a significantly larger brain after giving birth

One-sample (sample vs. hypothesized mean)

Independent groups (2 separate groups)

Repeated measures (same group, different measure)

ANalysis Of VAriance

Factor = what is being compared (type of pregnancy)

Levels = different elements of a factor (age of mother)

F-Statistic

Post hoc testing

1 Way Anova: 1 factor with more than 2 levels

Factorial Anova: More than 1 factor

Mixed Design Anovas: Some factors are independent, others are repeated measures

A significant F-statistic tells us that there is a difference somewhere between groups, NOT where the difference lies

Finding exactly where the difference lies requires further statistical analysis = post hoc analysis

Z-Tests for populations

T-Tests for samples

ANOVAS compare more than 2 groups in more complicated scenarios

Varun V. Sethi

Objective

Correlation

Linear Regression

Take Home Points.

Correlation: How linear is the relationship between two variables? (descriptive)

Regression: How well does a linear model explain my data? (inferential)

Correlation

Correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom).

Strength and direction of the relationship between variables

Scattergrams

Positive correlation, negative correlation, no correlation

Measures of Correlation

1) Covariance

2) Pearson Correlation Coefficient (r)

1) Covariance

- The covariance is a statistic representing the degree to which 2 variables vary together

Covariance formula:

$$ \operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} $$

cf. variance formula:

$$ s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} $$

{Note that $s_x^2 = \operatorname{cov}(x, x)$}

2) Pearson correlation coefficient (r)

- r is a kind of ‘normalised’ (dimensionless) covariance

- r takes values from -1 (perfect negative correlation) to 1 (perfect positive correlation). r = 0 means no correlation

$$ r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x s_y} $$

(s = st dev of sample)
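The two measures can be sketched in a few lines. The data below are made up for illustration, and the n - 1 denominator is an assumption (the sample form); r itself does not depend on that choice because the denominator cancels.

```python
import math


def covariance(x, y):
    """Sample covariance of x and y (n - 1 denominator, an assumption here)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def pearson_r(x, y):
    """Pearson r: covariance normalised by the two standard deviations."""
    s_x = math.sqrt(covariance(x, x))   # variance is cov(x, x)
    s_y = math.sqrt(covariance(y, y))
    return covariance(x, y) / (s_x * s_y)


# Illustrative data only, not from the slides.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(covariance(x, y))   # covariance carries the units of x times y
print(pearson_r(x, y))    # r is dimensionless and lies between -1 and 1
```

Rescaling x (for example multiplying every value by 10) changes the covariance but leaves r unchanged, which is what "normalised" means here.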

Limitations:

Sensitive to extreme values

Relationship not a prediction.

Not Causality

Regression: Prediction of one variable from knowledge of one or more other variables

How well does a linear model (y = ax + b) explain the relationship between two variables?

- If there is such a relationship, we can ‘predict’ the value y for a given x.

(25, 7.498)

Linear dependence between 2 variables

Two variables are linearly dependent when the increase of one variable is proportional to the increase of the other one

Examples: energy needed to boil water; money needed to buy coffeepots

Fitting data to a straight line (or vice versa): Here, ŷ = ax + b

– ŷ: predicted value of y

– a: slope of regression line

– b: intercept

Residual error (εi): Difference between obtained and predicted values of y (i.e. yi − ŷi)

Best fit line (values of a and b) is the one that minimises the sum of squared errors (SSerror), Σ(yi − ŷi)²

Figure legend: εi = residual, yi = observed value, ŷi = predicted value, on the line ŷ = ax + b

Fitting the straight line to data:

• Minimise Σ(yi − ŷi)², which is Σ(yi − axi − b)²

• Minimum SSerror is at the bottom of the curve where the gradient is zero, and this can be found with calculus

• Take partial derivatives of Σ(yi − axi − b)² with respect to the parameters a and b, set them to zero and solve as simultaneous equations, giving:

$$ a = \frac{r\, s_y}{s_x}, \qquad b = \bar{y} - a\,\bar{x} $$

• This can always be done
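The closed-form solution can be sketched as follows, assuming the usual least-squares result (slope a = r·s_y/s_x, intercept b = ȳ − a·x̄). The data and the function names `fit_line` and `ss_error` are illustrative, not from the slides.

```python
import math


def fit_line(x, y):
    """Least-squares slope a and intercept b for y-hat = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)      # Pearson correlation
    s_x = math.sqrt(sxx / (n - 1))      # sample standard deviations
    s_y = math.sqrt(syy / (n - 1))
    a = r * s_y / s_x
    b = mean_y - a * mean_x
    return a, b


def ss_error(x, y, a, b):
    """Sum of squared residuals for the line y-hat = a*x + b."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
best = ss_error(x, y, a, b)
print(f"slope a = {a:.3f}, intercept b = {b:.3f}, SSerror = {best:.3f}")

# Nudging either parameter away from the solution should not reduce the error.
print(ss_error(x, y, a + 0.1, b) >= best, ss_error(x, y, a, b - 0.1) >= best)
```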

We can calculate the regression line for any data, but how well does it fit the data?

Total variance = predicted variance + error variance

$$ s_y^2 = s_{\hat{y}}^2 + s_{er}^2 $$

Also, it can be shown that r² is the proportion of the variance in y that is explained by our regression model

$$ r^2 = \frac{s_{\hat{y}}^2}{s_y^2} $$

Insert $s_{\hat{y}}^2 = r^2 s_y^2$ into $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$ and rearrange to get:

$$ s_{er}^2 = s_y^2 (1 - r^2) $$

From this we can see that the greater the correlation the smaller the error variance, so the better our prediction
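A small numerical check of this decomposition, on made-up data, might look like the sketch below. All three variances use the same n - 1 denominator (via `statistics.variance`), which is an assumption needed for the identity to hold exactly; the variable names are illustrative.

```python
import statistics

# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares line y-hat = a*x + b.
mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
a = sxy / sxx
b = mean_y - a * mean_x

predicted = [a * xi + b for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]

var_y = statistics.variance(y)            # total variance
var_pred = statistics.variance(predicted)  # variance explained by the line
var_err = statistics.variance(residuals)   # error variance
r_squared = var_pred / var_y

print(f"total = {var_y:.3f}, predicted + error = {var_pred + var_err:.3f}")
print(f"r^2 = {r_squared:.3f}, error variance = {var_err:.3f}, "
      f"s_y^2 (1 - r^2) = {var_y * (1 - r_squared):.3f}")
```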

Do we get a significantly better prediction of y from our regression equation than by just predicting the mean?

F-statistic

Prediction / Forecasting

Quantify strength between y and Xj ( X1, X2,

A General Linear Model is just any model that describes the data in terms of a straight line

Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept (note that a and b swap roles here relative to ŷ = ax + b used earlier).

y = bx + a +ε

Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y

The different x variables are combined in a linear way and each has its own regression coefficient:

y = b0 + b1x1 + b2x2 + … + bnxn + ε

The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y, i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
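To make the multiple-regression equation concrete, here is a minimal pure-Python sketch that estimates b0…bn by solving the normal equations (XᵀX)b = Xᵀy. This is one common way to obtain least-squares coefficients, not necessarily the method used for the slides; the helper names and the noise-free data (generated from y = 1 + 2x1 + 3x2) are illustrative assumptions.

```python
def solve(A, v):
    """Solve A b = v by Gaussian elimination with partial pivoting."""
    n = len(v)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    b = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (M[i][n] - s) / M[i][i]
    return b


def fit_multiple_regression(xs, y):
    """xs: list of predictor columns. Returns [b0, b1, ..., bn]."""
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Illustrative, noise-free data generated from y = 1 + 2*x1 + 3*x2.
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [1 + 2 * u + 3 * w for u, w in zip(x1, x2)]

coeffs = fit_multiple_regression([x1, x2], y)
print([round(c, 6) for c in coeffs])   # intercept b0 first, then b1 and b2
```

With real, noisy data the recovered coefficients would only approximate the underlying values, and highly correlated predictors make the normal equations numerically unstable.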

Take Home Points

- Correlated doesn’t mean related. E.g. any two variables increasing or decreasing over time would show a nice correlation: CO2 air concentration in Antarctica and lodging rental cost in London. Beware in longitudinal studies!

- Relationship between two variables doesn’t mean causality (e.g. leaves on the forest floor and hours of sun)

Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y

Multiple Regression models the effect of several independent variables, x1, x2 etc, on one dependent variable, y

Both are types of General Linear Model

Thank You
