SIMPLE LINEAR REGRESSION AND CORRELATION

By Mpembeni RNM, School of Public Health and Social Sciences, Dept of Epidemiology and Biostatistics, MUHAS

LEARNING OBJECTIVES

After successful completion of this session, you should be able to:

• Describe the correlation coefficient
• Describe the linear regression model
• Understand and check model assumptions
• Understand the meaning of regression coefficients

ANALYSING RELATIONSHIPS BETWEEN TWO OR MORE QUANTITATIVE VARIABLES

Commonly used methods are:

• Correlation
• Linear regression
• Multiple linear regression

CORRELATION

• The (Pearson's) correlation coefficient, r, measures the closeness (strength) of the linear association, i.e. the closeness with which the points lie along a straight line
• r is a bivariate correlation coefficient summarizing the magnitude and direction of the relationship between two variables

Characteristics of r

• Ranges between -1 and +1
• r = 0: no linear relationship
• r = 1: perfect positive linear relationship
• r = -1: perfect negative linear relationship

Interpretation of r

• If r > 0: the variables are positively correlated, i.e. as x increases, y tends to increase, while as x decreases, y tends to decrease
• If r < 0: the variables are negatively correlated, i.e. as x increases, y tends to decrease, while as x decreases, y tends to increase

Rule of thumb for r

Correlation                 Strong           Weak
Positive (up and right)     0.7 to 1.0       0.3 to 0.7
Negative (down and right)   -1.0 to -0.7     -0.7 to -0.3

Little or no correlation: -0.3 to 0.3
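The rule of thumb can be written as a small helper. This is only a sketch: the function name `describe_correlation` is made up for illustration, and because the bands in the table share their end points, it assumes that |r| = 0.7 counts as strong and |r| = 0.3 counts as weak.

```python
def describe_correlation(r: float) -> str:
    """Label r using the rule-of-thumb bands for the correlation coefficient."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.3:
        return "little or no correlation"
    # Assumption: the shared boundary 0.7 is treated as strong.
    strength = "strong" if size >= 0.7 else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


for r in (0.76, -0.45, 0.1):
    print(r, "->", describe_correlation(r))
```

The bands describe only the strength of a linear association; they say nothing about whether the association is statistically significant.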

SCATTER DIAGRAM

• First step in investigating the relationship between two variables
• The two related variables are plotted on a graph in the form of points or dots
• Each point on the diagram represents a pair of values, one read on the X-scale and the other on the Y-scale
• The X-scale refers to the explanatory (independent) variable and the Y-scale refers to the response (dependent) variable
• The diagram shows visually the shape and degree of closeness of the relationship

Head circumference and Gestational age of 100 LBW babies

Scatter Plot

• From the scatter plot, head circumference tends to increase with increasing gestational age

Strong positive correlation

Weak negative correlation

No correlation

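CORRELATION COEFFICIENT

$$r = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sqrt{\sum (x-\bar{x})^2 \, \sum (y-\bar{y})^2}} = \frac{\sum xy - (\sum x)(\sum y)/n}{\sqrt{\left[\sum x^2 - (\sum x)^2/n\right]\left[\sum y^2 - (\sum y)^2/n\right]}}$$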

Example: Association between body weight and plasma volume

Subject   Body weight (kg)   Plasma volume (l)
1         58.0               2.75
2         70.0               2.86
3         74.0               3.37
4         63.5               2.76
5         62.0               2.62
6         70.5               3.49
7         71.0               3.05
8         66.0               3.12

Calculation of r

• $\sum xy - (\sum x)(\sum y)/n = 1615.295 - 535 \times 24.02/8 = 8.96$
• $\sum x^2 - (\sum x)^2/n = 35983.5 - 535^2/8 = 205.38$
• $\sum y^2 - (\sum y)^2/n = 72.798 - 24.02^2/8 = 0.678$
• $r = 8.96 / \sqrt{205.38 \times 0.678} = 0.76$
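The hand calculation can be checked with a short Python sketch that applies the computational formula for r to the eight pairs of body weight and plasma volume. The variable names (`weight`, `plasma`, `s_xy` and so on) are chosen here for illustration.

```python
from math import sqrt

# Body weight (kg) and plasma volume (l) of the eight subjects
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]
n = len(weight)

# Raw sums used in the computational formula
sum_x = sum(weight)
sum_y = sum(plasma)
sum_xy = sum(x * y for x, y in zip(weight, plasma))
sum_x2 = sum(x * x for x in weight)
sum_y2 = sum(y * y for y in plasma)

# Corrected sums of products and squares
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n

r = s_xy / sqrt(s_xx * s_yy)
print(f"Sxy = {s_xy:.2f}, Sxx = {s_xx:.2f}, Syy = {s_yy:.3f}, r = {r:.2f}")
```

Rounded to two decimal places, the result is r = 0.76, a strong positive correlation by the rule of thumb.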

STRENGTH OF THE ASSOCIATION BETWEEN WEIGHT AND PLASMA VOLUME

• How strong is the association?

Simple Linear Regression

The two quantitative variables should be defined:

• y refers to the dependent variable (AKA response or outcome variable)

• x refers to the independent variable (AKA explanatory or predictor variable)

Simple linear regression

• The objective of the analysis is to see whether a change in an independent variable, x, is associated with a change in the dependent variable, y
• A further objective is to be able to predict the value of the dependent variable given the value of the independent variable
• E.g. age and weight of a child under five years of age

EXAMPLE

• Data on body weight and plasma volume of eight healthy men.

• The objective of the analysis is to see whether a change in body weight (x) is associated with a change in plasma volume (y).

ASSOCIATION BETWEEN QUANTITATIVE VARIABLES

Subject   Body weight (kg)   Plasma volume (l)
1         58.0               2.75
2         70.0               2.86
3         74.0               3.37
4         63.5               2.76
5         62.0               2.62
6         70.5               3.49
7         71.0               3.05
8         66.0               3.12

Scatter Diagram of Body weight and Plasma Volume

[Figure: scatter diagram; x-axis: Body weight (kg), y-axis: Plasma volume]

Body weight and plasma volume

• Plasma volume tends to increase with increasing body weight

LINEAR REGRESSION

• When a linear relationship exists, we can summarize it by a line drawn through the scatter of points.
• Any straight line drawn on a graph can be represented by the equation y = a + bx, where y refers to the values of the response (dependent) variable and x to the values of the explanatory (independent) variable.

LINEAR REGRESSION

• The constant 'a' is the intercept, the point at which the line crosses the y-axis. That is, the value of y when x = 0.
• The coefficient of the x variable ('b') is the slope of the line.
• It tells us the average change (increase or decrease) in y associated with a unit change in x.
• b is also called the regression coefficient.

METHOD OF LEAST SQUARES

• A mathematical technique to fit a straight line to a set of points by minimizing the sum of squared vertical distances between the points and the line
• i.e. it is used to estimate a and b
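As a brief sketch of what "least squares" means here: a and b are chosen to minimize the sum of squared vertical deviations of the observed points from the line. The subscripted notation $x_i$, $y_i$ and the symbol $Q$ are introduced for this derivation only and do not appear in the slides.

```latex
% Quantity minimized by the method of least squares
Q(a, b) = \sum_{i=1}^{n} \left( y_i - a - b x_i \right)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial Q}{\partial a} = -2 \sum_{i=1}^{n} \left( y_i - a - b x_i \right) = 0,
\qquad
\frac{\partial Q}{\partial b} = -2 \sum_{i=1}^{n} x_i \left( y_i - a - b x_i \right) = 0

% Solving the two equations for a and b
b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2},
\qquad
a = \bar{y} - b \bar{x}
```

The first normal equation also shows that the fitted line always passes through the point of means $(\bar{x}, \bar{y})$.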

LINEAR REGRESSION

$$b = \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^2} = \frac{S_{xy}}{S_{xx}}$$

• Numerator: $S_{xy} = \sum xy - (\sum x)(\sum y)/n$
• Denominator: $S_{xx} = \sum x^2 - (\sum x)^2/n$

LINEAR REGRESSION

$$a = \bar{y} - b\bar{x}$$

where $\bar{y} = \sum y / n$ and $\bar{x} = \sum x / n$

• The resultant line is called the regression line, which estimates the average value of y for a given value of x.

Calculating the least squares estimates

Example: data on plasma volume and body weight

$n = 8$, $\sum x = 535$, $\sum x^2 = 35983.5$, $\sum y = 24.02$, $\sum y^2 = 72.798$, $\sum xy = 1615.295$

Example

$$b = \frac{1615.295 - (535)(24.02)/8}{35983.5 - (535)^2/8} = \frac{8.96}{205.38} = 0.043615$$

and

$$a = 3.0025 - 0.043615 \times 66.875 = 0.0857$$
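The least squares estimates can be reproduced with a few lines of Python. This sketch uses the deviation form of the formulas (sums of products of deviations from the means) rather than the raw-sum shortcut. The two forms are algebraically equivalent, and the variable names are illustrative.

```python
# Body weight (kg) is x, plasma volume (l) is y
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]
n = len(weight)

mean_x = sum(weight) / n
mean_y = sum(plasma) / n

# Sxy and Sxx as sums of (products of) deviations from the means
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weight, plasma))
s_xx = sum((x - mean_x) ** 2 for x in weight)

b = s_xy / s_xx            # slope (regression coefficient)
a = mean_y - b * mean_x    # intercept

print(f"mean x = {mean_x:.3f}, mean y = {mean_y:.4f}")
print(f"b = {b:.6f}, a = {a:.4f}")
```

Up to rounding, this gives the same slope and intercept as the hand calculation.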

Example

Regression line:

Plasma volume (l) = 0.09 + 0.04 × body weight (kg)

(coefficients rounded to two decimal places; before rounding, a = 0.0857 and b = 0.0436)

ESTIMATION

• Once you have the values of a and b, you can substitute various values of x into the equation of the line and solve for the corresponding values of y.
• E.g. what would be the predicted plasma volume for an adult weighing 62 kg?
• 77 kg?
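A minimal sketch of the substitution, using the unrounded estimates from the worked example (a = 0.0857, b = 0.043615). The function name `predict_plasma_volume` is hypothetical. The observed body weights run from 58 to 74 kg, so the sketch flags any weight outside that range as an extrapolation.

```python
A = 0.0857       # intercept from the worked example
B = 0.043615     # slope from the worked example
X_MIN, X_MAX = 58.0, 74.0  # range of body weights in the eight observations


def predict_plasma_volume(weight_kg: float) -> float:
    """Estimated average plasma volume (l) for a given body weight (kg)."""
    return A + B * weight_kg


for w in (62.0, 77.0):
    inside = X_MIN <= w <= X_MAX
    note = "" if inside else " (extrapolation beyond the observed weights)"
    print(f"{w:.0f} kg -> {predict_plasma_volume(w):.2f} l{note}")
```

Predictions inside the observed range are generally more trustworthy than extrapolations, because the data give no evidence that the relationship stays linear beyond 74 kg.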

Regression line

[Figure: regression line; x-axis: Body weight (kg), y-axis: Plasma volume]

INFERENCES FOR REGRESSION COEFFICIENTS

• As with any other estimate, the standard error of the regression coefficient can be calculated.
• We can test whether b differs significantly from a hypothesized value b0 (usually 0) using a t test, t = (b - b0) / SE(b), with n - 2 degrees of freedom.
• The t value and the corresponding p-value are both shown in the output table.
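As an illustration, the standard error and t statistic of the slope can be computed by hand for the body weight and plasma volume data. This sketch assumes the usual simple linear regression formulas, which the slides do not state: residual variance s² = (Syy - b·Sxy)/(n - 2) and SE(b) = √(s²/Sxx). The null hypothesis tested is that the slope is 0.

```python
from math import sqrt

weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # x, kg
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # y, litres
n = len(weight)

mean_x = sum(weight) / n
mean_y = sum(plasma) / n
s_xx = sum((x - mean_x) ** 2 for x in weight)
s_yy = sum((y - mean_y) ** 2 for y in plasma)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weight, plasma))

b = s_xy / s_xx

# Residual sum of squares and residual variance on n - 2 degrees of freedom
residual_ss = s_yy - b * s_xy
residual_variance = residual_ss / (n - 2)

se_b = sqrt(residual_variance / s_xx)  # standard error of the slope
t = (b - 0.0) / se_b                   # H0: slope = 0

print(f"b = {b:.4f}, SE(b) = {se_b:.4f}, t = {t:.2f} on {n - 2} df")
```

The t statistic comes out at about 2.86 on 6 degrees of freedom. Standard t tables give a two-sided 5% critical value of 2.447 for 6 degrees of freedom, so the slope differs significantly from zero at that level. Statistical packages report the exact p-value in their output table.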

Evaluation of the model

• The coefficient of determination, R², which is the square of Pearson's correlation coefficient, r, is used to assess how well the model fits the data.
• It is the proportion of the variability among the observed values of y that is explained by the linear regression of y on x.

Model Evaluation

• If, for example, R² is 0.6095, it implies that almost 61% of the variation among the observed values of y is explained by its linear relationship with the independent variable
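To make the definition concrete, the sketch below computes R² for the body weight and plasma volume data used earlier. That is a different data set from the one behind the illustrative value 0.6095, so the result differs. It uses the explained-variation form, R² = 1 - (residual sum of squares)/(total sum of squares), with illustrative variable names.

```python
weight = [58.0, 70.0, 74.0, 63.5, 62.0, 70.5, 71.0, 66.0]  # x, kg
plasma = [2.75, 2.86, 3.37, 2.76, 2.62, 3.49, 3.05, 3.12]  # y, litres
n = len(weight)

mean_x = sum(weight) / n
mean_y = sum(plasma) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(weight, plasma))
s_xx = sum((x - mean_x) ** 2 for x in weight)

# Least squares fit of y on x
b = s_xy / s_xx
a = mean_y - b * mean_x
fitted = [a + b * x for x in weight]

# Total variation in y and the part left unexplained by the line
ss_total = sum((y - mean_y) ** 2 for y in plasma)
ss_residual = sum((y - f) ** 2 for y, f in zip(plasma, fitted))

r_squared = 1 - ss_residual / ss_total
print(f"R squared = {r_squared:.3f}")
```

For these data R² is about 0.58, which agrees with the square of r = 0.76. Roughly 58% of the variation in plasma volume is explained by its linear relationship with body weight.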

EXERCISE

Using the provided data set (LBW babies):

• Correlate birth weight with gestational age. What is the correlation coefficient between bweight and Gestage?
• Regress birth weight on gestational age. What is the equation of the line?
• What is the estimated birth weight for a baby with 42 weeks of gestation? With 36 weeks?
• What proportion of the variability of birth weight is explained by gestational age?

Coefficients(a)

Model            Unstandardized Coefficients   Standardized Coefficients   t        Sig.
                 B           Std. Error        Beta
1  (Constant)    -932.404    234.488                                       -3.976   .000
   gestage       70.310      8.086             .660                        8.695    .000

a. Dependent Variable: birthwt
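One way to read the coefficients table is sketched below. The unstandardized B values give the fitted line birthwt = -932.404 + 70.310 × gestage. The sketch also uses the fact that in a simple linear regression with one predictor, the standardized coefficient Beta equals the correlation coefficient r, so Beta² gives R². The function name is hypothetical. The units of birthwt are not stated in the output, so the predictions are in whatever unit the data set records.

```python
INTERCEPT = -932.404   # B for (Constant) in the coefficients table
SLOPE = 70.310         # B for gestage
BETA = 0.660           # standardized coefficient for gestage


def predict_birthwt(gestage_weeks: float) -> float:
    """Estimated average birth weight for a given gestational age in weeks."""
    return INTERCEPT + SLOPE * gestage_weeks


for weeks in (42, 36):
    print(f"{weeks} weeks -> {predict_birthwt(weeks):.1f}")

# With a single predictor, Beta equals r, so R squared is Beta squared
r_squared = BETA ** 2
print(f"r = {BETA:.3f}, R squared = {r_squared:.4f}")

# Each t value in the table is B divided by its standard error
t_gestage = SLOPE / 8.086
print(f"t for gestage = {t_gestage:.3f}")
```

The negative intercept has no practical interpretation, because a gestational age of 0 weeks lies far outside the data.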
