# Jennifer Siegel. Statistical background Z-Test T-Test Anovas

Date posted: 14-Dec-2015

### Transcript of Jennifer Siegel. Statistical background Z-Test T-Test Anovas

- Slide 1

Jennifer Siegel

- Slide 2

Statistical background. Z-Test. T-Test. ANOVAs.

- Slide 3

Science tries to predict the future. Is a result a genuine effect? We attempt to strengthen predictions with statistics, using the P value to indicate our level of certainty that the result reflects a genuine effect on the whole population (more on this later).

- Slide 4

- Slide 5

Develop an experimental hypothesis. H0 = null hypothesis. H1 = alternative hypothesis. Statistically significant result: P value ≤ .05.

- Slide 6

P value: the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. Significance level = .05 or 5%, loosely described as being 95% certain that our experimental effect is genuine.

- Slide 7

Type 1 error = false positive. Type 2 error = false negative. P = 1 Probability of Type 1 error

- Slide 8

- Slide 9

Let's pretend you came up with the following theory: having a baby increases brain volume (associated with possible structural changes).

- Slide 10

Z-test. T-test.

- Slide 11

Population

- Slide 12

Cost; not able to include everyone; too time consuming; ethical right to privacy. Realistically, researchers can only do sample-based studies.

- Slide 13

T = difference between sample means / standard error of sample means. Degrees of freedom = sample size - 1.

- Slide 14

- Slide 15

H0 = There is no difference in brain size before or after giving birth. H1 = The brain is significantly smaller or significantly larger after giving birth (difference detected).

- Slide 16

T = (1271 - 1236) / (119 - 113)

- Slide 17

http://www.danielsoper.com/statcalc/calc08.aspx

Women have a significantly larger brain after giving birth.

- Slide 18

Types of T-test: one-sample (sample vs. hypothesized mean); independent groups (2 separate groups); repeated measures (same group, different measure).

- Slide 19

- Slide 20

ANalysis Of VAriance (ANOVA). Factor = what is being compared (type of pregnancy). Levels = different elements of a factor (age of mother). F-statistic. Post hoc testing.

- Slide 21

1-way ANOVA: 1 factor with more than 2 levels. Factorial ANOVA: more than 1 factor. Mixed-design ANOVAs: some factors are independent, others are related.

- Slide 22

An ANOVA tells us that there is a significant difference somewhere between groups, NOT where the difference lies. Finding exactly where the difference lies requires further statistical analysis (post hoc analysis).

- Slide 23

- Slide 24

Z-tests are for populations. T-tests are for samples. ANOVAs compare more than 2 groups in more complicated scenarios.

- Slide 25

Varun V. Sethi

- Slide 26

Objective: Correlation. Linear Regression. Take Home Points.

- Slide 27

- Slide 28

Correlation: how linear is the relationship between two variables? (descriptive)

Regression: how well does a linear model explain my data? (inferential)

- Slide 29

- Slide 30

Correlation. Correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom).

- Slide 31

Strength and direction of the relationship between variables. [Figure: three scattergrams of Y against X, showing positive correlation, negative correlation, and no correlation.]

- Slide 32

- Slide 33

Measures of correlation: 1) Covariance. 2) Pearson correlation coefficient (r).

- Slide 34

1) Covariance: the covariance is a statistic representing the degree to which 2 variables vary together. Note that $s_x^2 = \operatorname{cov}(x, x)$.

- Slide 35

A statistic representing the degree to which 2 variables vary together. Covariance formula, cf. variance formula.

- Slide 36

2) Pearson correlation coefficient (r): r is a kind of normalised (dimensionless) covariance. r takes values from -1 (perfect negative correlation) to 1 (perfect positive correlation).
r = 0 means no correlation. (S = standard deviation of the sample.)

- Slide 37

- Slide 38

Limitations: sensitive to extreme values; describes a relationship, not a prediction; does not imply causality.

- Slide 39

- Slide 40

Regression: prediction of one variable from knowledge of one or more other variables.

- Slide 41

How well does a linear model (y = ax + b) explain the relationship between two variables? If there is such a relationship, we can predict the value of y for a given x. (25, 7.498)

- Slide 42

Linear dependence between 2 variables: two variables are linearly dependent when the increase of one variable is proportional to the increase of the other. Examples: energy needed to boil water; money needed to buy coffeepots.

- Slide 43

- Slide 44

Fitting data to a straight line (or vice versa). Here $\hat{y} = ax + b$, where $\hat{y}$ is the predicted value of y, a is the slope of the regression line, and b is the intercept.

Residual error ($\varepsilon_i$): the difference between the obtained and predicted values of y, i.e. $\varepsilon_i = y_i - \hat{y}_i$, where $y_i$ is the observed value and $\hat{y}_i = ax_i + b$ is the predicted value.

The best-fit line (values of b and a) is the one that minimises the sum of squared errors:

$$SS_{error} = \sum_i (y_i - \hat{y}_i)^2$$

- Slide 45

- Slide 46

Adjusting the straight line to the data: minimise $\sum_i (y_i - \hat{y}_i)^2$, which is $\sum_i (y_i - ax_i - b)^2$.

The minimum $SS_{error}$ is at the bottom of the curve, where the gradient is zero, and this can be found with calculus. Take the partial derivatives of $\sum_i (y_i - ax_i - b)^2$ with respect to the parameters a and b and solve for 0 as simultaneous equations, giving:

$$a = \frac{\operatorname{cov}(x, y)}{s_x^2} = r\,\frac{s_y}{s_x}, \qquad b = \bar{y} - a\bar{x}$$

This can always be done.

- Slide 47

We can calculate the regression line for any data, but how well does it fit the data?
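The least-squares fit described in Slides 44 to 46 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the function names `fit_line` and `ss_error` are hypothetical, and the closed-form solution used (slope = sum of cross-deviations / sum of squared x-deviations, intercept = mean of y minus slope times mean of x) is the standard result of setting both partial derivatives to zero.

```python
def fit_line(xs, ys):
    """Return (a, b) minimising sum((y - (a*x + b))**2) over the data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a = sxy / sxx          # slope
    b = y_bar - a * x_bar  # intercept
    return a, b


def ss_error(xs, ys, a, b):
    """Sum of squared residuals for the line y = a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
print(f"slope a = {a:.3f}, intercept b = {b:.3f}")
print(f"SS_error at the fit = {ss_error(xs, ys, a, b):.4f}")
# Moving either parameter away from the fitted value should not reduce SS_error.
print(ss_error(xs, ys, a + 0.1, b) >= ss_error(xs, ys, a, b))
print(ss_error(xs, ys, a, b - 0.1) >= ss_error(xs, ys, a, b))
```

The last two checks mirror the "bottom of the curve" picture: at the fitted values of a and b, nudging either parameter can only increase the sum of squared errors.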
Total variance = predicted variance + error variance:

$$s_y^2 = s_{\hat{y}}^2 + s_{er}^2$$

Also, it can be shown that $r^2$ is the proportion of the variance in y that is explained by our regression model:

$$r^2 = \frac{s_{\hat{y}}^2}{s_y^2}$$

Insert $s_{\hat{y}}^2 = r^2 s_y^2$ into $s_y^2 = s_{\hat{y}}^2 + s_{er}^2$ and rearrange to get:

$$s_{er}^2 = s_y^2 (1 - r^2)$$

From this we can see that the greater the correlation, the smaller the error variance, and so the better our prediction.

- Slide 48

Do we get a significantly better prediction of y from our regression equation than by just predicting the mean? F-statistic.

- Slide 49

Prediction / forecasting. Quantify the strength of the relationship between y and Xj (X1, X2, X3).

- Slide 50

A General Linear Model is just any model that describes the data in terms of a straight line. Linear regression is actually a form of the General Linear Model where the parameters are b, the slope of the line, and a, the intercept:

$$y = bx + a + \varepsilon$$

- Slide 51

Multiple regression is used to determine the effect of a number of independent variables, $x_1, x_2, x_3$ etc., on a single dependent variable, y. The different x variables are combined in a linear way and each has its own regression coefficient:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + \varepsilon$$

The b parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y, i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for.

- Slide 52

Take Home Points

Correlated doesn't mean related. For example, any two variables increasing or decreasing over time would show a nice correlation: CO2 air concentration in Antarctica and lodging rental cost in London. Beware in longitudinal studies!

A relationship between two variables doesn't mean causality (e.g. leaves on the forest floor and hours of sun).

- Slide 53

Linear regression is a GLM that models the effect of one independent variable, x, on one dependent variable, y. Multiple regression models the effect of several independent variables, $x_1, x_2$ etc., on one dependent variable, y. Both are types of General Linear Model.

- Slide 54

Thank You
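As a closing illustration of the correlation and regression slides, the sketch below computes the covariance, the Pearson coefficient r, and the error variance of a fitted line, then checks numerically that the error variance equals s_y² (1 − r²). Several things here are assumptions rather than the presenters' own material: the data are invented, the helper names are hypothetical, and the covariance divides by n (the transcript does not preserve the slides' formula, so a version dividing by n − 1 is equally plausible; the identity holds either way as long as one convention is used throughout).

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Covariance dividing by n (an assumed convention, see the lead-in)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)


def pearson_r(xs, ys):
    """Covariance normalised by both standard deviations (dimensionless)."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(xs, ys)
s_y2 = covariance(ys, ys)  # variance of y, since var(y) = cov(y, y)

# Least-squares line y = a*x + b and its residuals.
a = covariance(xs, ys) / covariance(xs, xs)
b = mean(ys) - a * mean(xs)
residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
# The residuals of a least-squares line with an intercept average to zero,
# so their variance is just the mean squared residual.
s_er2 = sum(e ** 2 for e in residuals) / len(residuals)

print(f"r = {r:.4f}")
print(f"error variance       = {s_er2:.6f}")
print(f"s_y^2 * (1 - r^2)    = {s_y2 * (1 - r ** 2):.6f}")
```

The two printed variances should agree up to floating-point rounding, which is the numerical counterpart of the statement that a stronger correlation leaves a smaller error variance.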