Statistics


Page 1: Statistics

Statistics

Correlation and regression

Page 2: Statistics

Introduction

Some methods involve one variable:
- Is Treatment A as effective in relieving arthritic pain as Treatment B?

Correlation and regression are used to investigate relationships between variables, most commonly linear relationships between two variables:
- Is BMD related to dietary calcium level?

Page 3: Statistics

Contents

Coefficients of correlation
- meaning
- values
- role
- significance

Regression
- line of best fit
- prediction
- significance

Page 4: Statistics

Introduction

Correlation measures the strength of the linear relationship between two variables

Regression analysis determines the nature of the relationship

Example: is there a relationship between the number of units of alcohol consumed and the likelihood of developing cirrhosis of the liver?

Page 5: Statistics

Pearson’s coefficient of correlation

- r measures the strength of the linear relationship between one dependent and one independent variable
  - curvilinear relationships need other techniques
- Values lie between +1 and -1
  - perfect positive correlation: r = +1
  - perfect negative correlation: r = -1
  - no linear relationship: r = 0
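
The three limiting values of r can be checked with a short calculation. The sketch below uses the usual textbook definition of Pearson's r (sum of cross-products of deviations divided by the square root of the product of the sums of squares). The function name pearson_r and the data are made up for illustration and are not from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (not from the slides)
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])      # points on a rising straight line
r_down = pearson_r(x, [10, 8, 6, 4, 2])    # points on a falling straight line
# y = x**2 on a symmetric range: a clear relationship, but not a linear one
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])
print(r_up, r_down, r_curve)
```

The last case shows why curvilinear relationships need other techniques: y is completely determined by x, yet r is zero because the relationship is not linear.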

Page 6: Statistics

Pearson’s coefficient of correlation

[Figure: example scatter plots for r = +1, r = -1, r = 0.6 and r = 0]

Page 7: Statistics

Scatter plot

[Figure: scatter plot of BMD against calcium intake]

- Dependent variable (BMD): the variable we make inferences about
- Independent variable (calcium intake): the variable we make inferences from; controlled in some cases

Page 8: Statistics

Non-Normal data

Page 9: Statistics

Normalised

Page 10: Statistics

Calculating r

The value and significance of r are calculated by SPSS
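
For readers who want to see roughly what such a package does, the significance of r is commonly assessed with a t statistic on n - 2 degrees of freedom. This is a minimal sketch of that standard test, not a reproduction of SPSS output. The sample size n = 20 is a hypothetical value chosen for illustration, and the critical value is taken from standard t tables.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether a Pearson r differs from zero."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.661          # example value of r used elsewhere in these slides
n = 20             # hypothetical sample size (not given in the slides)
t = t_statistic(r, n)

# Two-sided 5% critical value of t with n - 2 = 18 degrees of freedom,
# taken from standard t tables
critical = 2.101
significant = abs(t) > critical
print(f"t = {t:.3f}, significant at 5%: {significant}")
```

The same r with a much smaller n gives a smaller t, which is why the significance of a correlation cannot be judged from r alone.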

Page 11: Statistics

SPSS output: scatter plot

Page 12: Statistics

SPSS output: correlations

Page 13: Statistics

Interpreting correlation

A large r does not necessarily imply:
- a strong correlation
  - the significance of r depends on sample size; in a small sample a large r can arise by chance
- cause and effect
  - a strong correlation between the number of televisions sold and the number of cases of paranoid schizophrenia does not mean that watching TV causes paranoid schizophrenia
  - it may be due to an indirect relationship

Page 14: Statistics

Interpreting correlation

Variation in the dependent variable is due to:
- the relationship with the independent variable: r^2
- random factors: 1 - r^2

r^2 is the Coefficient of Determination
- e.g. r = 0.661 gives r^2 = 0.661^2 = 0.44 (to two decimal places)
- less than half of the variation in the dependent variable is due to the independent variable
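
The arithmetic in this example can be reproduced in a few lines. The variable names are illustrative only.

```python
r = 0.661                      # value quoted on the slide
r_squared = r ** 2             # coefficient of determination
unexplained = 1 - r_squared    # share attributed to random factors
print(f"r^2 = {r_squared:.2f}, 1 - r^2 = {unexplained:.2f}")
```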

Page 15: Statistics

Page 16: Statistics

Agreement

Correlation should never be used to determine the level of agreement between repeated measures:
- measuring devices
- users
- techniques

It measures the degree of linear relationship, not agreement
- 1, 2, 3 and 2, 4, 6 are perfectly positively correlated
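
The point can be checked numerically with the two series from the slide. In this sketch the idea that they are readings from two devices is an assumption made for illustration, and pearson_r is a helper written here from the usual definition of r.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# The two series from the slide, treated here as readings of the same
# three subjects by two hypothetical measuring devices
device_a = [1, 2, 3]
device_b = [2, 4, 6]

r = pearson_r(device_a, device_b)
differences = [b - a for a, b in zip(device_a, device_b)]
mean_difference = sum(differences) / len(differences)
print(f"r = {r}, differences = {differences}, mean difference = {mean_difference}")
```

The correlation is perfect, yet the second device reads twice the first on every subject, so the two do not agree at all.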

Page 17: Statistics

Assumptions

Errors are the differences between the predicted values of Y and the actual values

To ascribe significance to r:
- the distribution of errors is Normal
- the variance is the same for all values of the independent variable X

Page 18: Statistics

Non-parametric correlation

- Make no distributional assumptions
- Carried out on ranks
- Spearman’s
  - easy to calculate
- Kendall’s
  - has some advantages over Spearman’s
  - distribution has better statistical properties
  - easier to identify concordant / discordant pairs
- Usually both lead to the same conclusions
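
Both coefficients can be sketched in a few lines. The example below assumes Spearman's coefficient is computed as Pearson's r on the ranks (ties get average ranks), and uses the simplest form of Kendall's coefficient (often called tau-a), which counts concordant and discordant pairs and assumes no tied values. The function names and data are made up for illustration.

```python
import math
from itertools import combinations

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs; no ties assumed."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs

# Made-up data: two adjacent swaps relative to perfect ordering
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
rho = spearman_rho(x, y)
tau = kendall_tau_a(x, y)
print(f"Spearman rho = {rho:.2f}, Kendall tau = {tau:.2f}")
```

The two coefficients differ in value (here 0.8 and 0.6) because they measure association on different scales, but they point in the same direction.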

Page 19: Statistics

Calculation of value and significance

Computer does it!

Page 20: Statistics

Role of regression

Shows how one variable changes with another by determining the line of best fit
- linear
- curvilinear

Page 21: Statistics

Line of best fit

Simplest case: linear

Line of best fit between:
- dependent variable Y: BMD
- independent variable X: dietary intake of calcium

Y = a + bX
- a: value of Y when X = 0
- b: change in Y when X increases by 1
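
One common way to obtain a and b is ordinary least squares, which the slides do not name explicitly, so treat that choice as an assumption. The sketch below estimates the intercept and slope for a small made-up data set. The function name fit_line and the numbers are illustrative and are not real calcium or BMD measurements.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx               # change in Y when X increases by 1
    a = mean_y - b * mean_x     # value of Y when X = 0
    return a, b

# Made-up data scattered around the line Y = 1 + 2X
x = [1, 2, 3, 4, 5]
y = [2.9, 5.1, 7.0, 9.2, 10.8]
a, b = fit_line(x, y)
print(f"Y = {a:.2f} + {b:.2f} X")
```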

Page 22: Statistics

Role of regression

Used to predict the value of the dependent variable when the value of the independent variable(s) is known
- within the range of the known data
- extrapolation is risky!
  - e.g. the relation between age and bone age

Does not imply causality
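
The warning about extrapolation can be built into a prediction routine. This is one possible design, assuming a least-squares line and made-up data: the predictor remembers the range of X it was fitted on and refuses to predict outside it. The names fit_line and make_predictor are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b

def make_predictor(x, y):
    """Return a function that predicts Y only within the observed range of X."""
    a, b = fit_line(x, y)
    lowest, highest = min(x), max(x)

    def predict(new_x):
        if not lowest <= new_x <= highest:
            raise ValueError(
                f"{new_x} is outside the observed range [{lowest}, {highest}]; "
                "extrapolation is risky"
            )
        return a + b * new_x

    return predict

# Made-up data lying exactly on Y = 1 + 2X
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
predict = make_predictor(x, y)
inside = predict(3.5)    # within the range of the known data
print(inside)
```

Raising an error is a deliberately strict choice. A gentler variant could return the prediction together with a warning.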

Page 23: Statistics

SPSS output: regression

Page 24: Statistics

Assumptions

Only needed if statistical inferences are to be made about:
- the significance of the regression
- the values of the slope and intercept

Page 25: Statistics

Assumptions

If the values of the independent variable are randomly chosen, then no further assumptions are necessary

Otherwise, as in correlation, the assumptions are based on the errors:
- they balance out (mean = 0)
- their variances are equal for all values of the independent variable
- they are not related to the magnitude of the independent variable

Seek advice / help

Page 26: Statistics

Multivariate regression

More than one independent variable

BMD dependent on:
- age
- gender
- calorific intake
- etc.
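
With more than one independent variable, the line of best fit becomes Y = a + b1 X1 + b2 X2 + ... The sketch below assumes ordinary least squares, solved through the normal equations with Gaussian elimination. The function name fit_multiple and the two predictors are made up for illustration, and the data are constructed to lie exactly on a known plane so the result can be checked by eye.

```python
def fit_multiple(rows, y):
    """Least-squares coefficients [a, b1, b2, ...] for Y = a + b1*X1 + b2*X2 + ..."""
    X = [[1.0] + list(row) for row in rows]   # leading 1.0 gives the intercept
    p = len(X[0])
    # Normal equations (X^T X) beta = X^T y, stored as an augmented matrix
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gaussian elimination with partial pivoting
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        known = sum(A[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (A[i][p] - known) / A[i][i]
    return beta

# Made-up data constructed so that Y = 1 + 2*X1 + 3*X2 exactly
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
a, b1, b2 = fit_multiple(rows, y)
print(f"Y = {a:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
```

A categorical variable such as gender would need to be coded numerically (for example 0 and 1) before being used as a predictor in a routine like this.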

Page 27: Statistics

Logistic regression

The dependent variable is binary: yes / no
- predict whether a patient with Type 1 diabetes will undergo limb amputation given history of prior ulcer, time diabetic, etc.
- the result is a probability

Can be extended to more than two categories
- outcome after treatment: recovered, in remission, died
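
To show how a logistic model turns predictors into a probability, the sketch below passes a linear combination of the predictors through the logistic function. The intercept and coefficients are hypothetical numbers chosen for illustration. They are not fitted to any data and say nothing about real amputation risk.

```python
import math

def logistic_probability(intercept, coefficients, predictors):
    """Probability of a 'yes' outcome from a logistic regression model."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, predictors))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical model, NOT fitted to real data:
# predictors are prior ulcer (0 = no, 1 = yes) and years with diabetes
intercept = -4.0
coefficients = [1.5, 0.1]

p_no_ulcer = logistic_probability(intercept, coefficients, [0, 10])
p_ulcer = logistic_probability(intercept, coefficients, [1, 10])
print(f"no prior ulcer: {p_no_ulcer:.3f}, prior ulcer: {p_ulcer:.3f}")
```

Whatever the values of the predictors, the result always lies between 0 and 1, which is what allows it to be read as a probability.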

Page 28: Statistics

Summary

Correlation
- strength of the linear relationship between two variables
- Pearson’s: parametric
- Spearman’s / Kendall’s: non-parametric
- Interpret with care!

Regression
- line of best fit
- prediction
- multivariate
- logistic