CBMR - Research - Correlation & Regression

46
Correlation & Regression

Transcript of CBMR - Research - Correlation & Regression

Page 1: CBMR - Research - Correlation & Regression

Correlation & Regression

Page 2: CBMR - Research - Correlation & Regression

Correlation

Finding the relationship between two quantitative variables without being able to infer causal relationships

Correlation is a statistical technique used to determine the degree to which two variables are related

Page 3: CBMR - Research - Correlation & Regression

• Rectangular coordinate• Two quantitative variables• One variable is called independent (X) and

the second is called dependent (Y)• Points are not joined • No frequency table

Scatter diagram

Y * *

*X

Page 4: CBMR - Research - Correlation & Regression

Wt. (kg)

67 69 85 83 74 81 97 92 114 85

SBP (mmHg)

120 125 140 160 130 180 150 140 200 130

Example

Page 5: CBMR - Research - Correlation & Regression

Scatter diagram of weight and systolic blood pressure

60 70 80 90 100 110 12080

100

120

140

160

180

200

220

wt (kg)

SBP(mmHg)Wt.

(kg) 67 69 85 83 74 81 97 92 114 85

SBP (mmHg)

120 125 140 160 130 180 150 140 200 130

Page 6: CBMR - Research - Correlation & Regression

60 70 80 90 100 110 12080

100

120

140

160

180

200

220

Wt (kg)

SBP(mmHg)

Scatter diagram of weight and systolic blood pressure

Page 7: CBMR - Research - Correlation & Regression

Scatter plots

The pattern of data is indicative of the type of relationship between two variables:

positive relationship negative relationship no relationship

Page 8: CBMR - Research - Correlation & Regression

Positive relationship

Page 9: CBMR - Research - Correlation & Regression

10 20 30 40 50 60 70 80 900

2

4

6

8

10

12

14

16

18

Age in Weeks

Hei

ght i

n C

M

Page 10: CBMR - Research - Correlation & Regression

Negative relationship

Reliability

Age of Car

Page 11: CBMR - Research - Correlation & Regression

No relation

Page 12: CBMR - Research - Correlation & Regression

Correlation Coefficient

Statistic showing the degree of relation between two variables

Page 13: CBMR - Research - Correlation & Regression

Simple Correlation coefficient (r)

It is also called Pearson's correlation or product moment correlationcoefficient.

It measures the nature and strength between two variables of the quantitative type.

Page 14: CBMR - Research - Correlation & Regression

The sign of r denotes the nature of association

while the value of r denotes the strength of association.

Page 15: CBMR - Research - Correlation & Regression

If the sign is +ve this means the relation is direct (an increase in one variable is associated with an increase in theother variable and a decrease in one variable is associated with adecrease in the other variable).

While if the sign is -ve this means an inverse or indirect relationship (which means an increase in one variable is associated with a decrease in the other).

Page 16: CBMR - Research - Correlation & Regression

The value of r ranges between ( -1) and ( +1) The value of r denotes the strength of the

association as illustrated by the following diagram.

-1 10-0.25-0.75 0.750.25

strong strongintermediate intermediateweak weak

no relation

perfect correlation

perfect correlation

Directindirect

Page 17: CBMR - Research - Correlation & Regression

If r = Zero this means no association or correlation between the two variables.

If 0 < r < 0.25 = weak correlation.

If 0.25 ≤ r < 0.75 = intermediate correlation.

If 0.75 ≤ r < 1 = strong correlation.

If r = l = perfect correlation.

Page 18: CBMR - Research - Correlation & Regression

ny)(

y.nx)(

x

nyx

xyr

22

22

How to compute the simple correlation coefficient (r)

Page 19: CBMR - Research - Correlation & Regression

Example: A sample of 6 children was selected, data about their

age in years and weight in kilograms was recorded as shown in the following table . It is required to find the correlation between age and weight.

Weight (Kg)

Age (years)

serial No

12 7 18 6 2

12 8 310 5 411 6 513 9 6

Page 20: CBMR - Research - Correlation & Regression

These 2 variables are of the quantitative type, one variable (Age) is called the independent and denoted as (X) variable and the other (weight)is called the dependent and denoted as (Y) variables to find the relation between age and weight compute the simple correlation coefficient using the following formula:

ny)(

y.nx)(

x

nyx

xyr

22

22

Page 21: CBMR - Research - Correlation & Regression

Y2 X2 xyWeight

(Kg)(y)

Age (years)

(x)Serial

n.

144 49 84 12 7 1

64 36 48 8 6 2

144 64 96 12 8 3

100 25 50 10 5 4

121 36 66 11 6 5

169 81 117 13 9 6

∑y2=742

∑x2=291

∑xy= 461

∑y=66

∑x=41

Total

Page 22: CBMR - Research - Correlation & Regression

r = 0.759strong direct correlation

6(66)742.

6(41)291

66641461

r22

Page 23: CBMR - Research - Correlation & Regression

EXAMPLE: Relationship between Anxiety and Test Scores

Anxiety )X(

Test score (Y)

X2 Y2 XY

10 2 100 4 208 3 64 9 242 9 4 81 181 7 1 49 75 6 25 36 306 5 36 25 30

∑X = 32 ∑Y = 32 ∑X2 = 230 ∑Y2 = 204 ∑XY=129

Page 24: CBMR - Research - Correlation & Regression

Calculating Correlation Coefficient

94.)200)(356(

102477432)204(632)230(6

)32)(32()129)(6(22

r

r = - 0.94

Indirect strong correlation

Page 25: CBMR - Research - Correlation & Regression

Regression Analyses

Regression: technique concerned with predicting some variables by knowing others

The process of predicting variable Y using variable X

Page 26: CBMR - Research - Correlation & Regression

Regression Uses a variable (x) to predict some outcome

variable (y) Tells you how values in y change as a function

of changes in values of x

Page 27: CBMR - Research - Correlation & Regression

Correlation and Regression

Correlation describes the strength of a linear relationship between two variables

Linear means “straight line”

Regression tells us how to draw the straight line described by the correlation

Page 28: CBMR - Research - Correlation & Regression

Regression Calculates the “best-fit” line for a certain set of dataThe regression line makes the sum of the squares of

the residuals smaller than for any other lineRegression minimizes residuals

60 70 80 90 100 110 12080

100

120

140

160

180

200

220

Wt (kg)

SBP(mmHg)

Page 29: CBMR - Research - Correlation & Regression

By using the least squares method (a procedure that minimizes the vertical deviations of plotted points surrounding a straight line) we areable to construct a best fitting straight line to the scatter diagram points and then formulate a regression equation in the form of:

nx)(

x

nyx

xyb 2

21)xb(xyy b

bXay

Page 30: CBMR - Research - Correlation & Regression

Regression Equation

Regression equation describes the regression line mathematically– Intercept– Slope

60 70 80 90 100 110 12080

100

120

140

160

180

200

220

Wt (kg)

SBP(mmHg)

Page 31: CBMR - Research - Correlation & Regression

Linear EquationsY

Y = bX + a

a = Y-interceptX

Changein Y

Change in Xb = Slope

bXay

Page 32: CBMR - Research - Correlation & Regression

Hours studying and grades

Page 33: CBMR - Research - Correlation & Regression

Regressing grades on hours

Linear Regression

2.00 4.00 6.00 8.00 10.00

Number of hours spent studying

70.00

80.00

90.00

Final grade in course = 59.95 + 3.17 * studyR-Square = 0.88

Predicted final grade in class =

59.95 + 3.17*(number of hours you study per week)

Page 34: CBMR - Research - Correlation & Regression

Predict the final grade of…

• Someone who studies for 12 hours• Final grade = 59.95 + (3.17*12)• Final grade = 97.99

• Someone who studies for 1 hour:• Final grade = 59.95 + (3.17*1)• Final grade = 63.12

Predicted final grade in class = 59.95 + 3.17*(hours of study)

Page 35: CBMR - Research - Correlation & Regression

Exercise

A sample of 6 persons was selected the value of their age ( x variable) and their weight is demonstrated in the following table. Find the regression equation and what is the predicted weight when age is 8.5 years.

Page 36: CBMR - Research - Correlation & Regression

Weight (y) Age (x) Serial no.128

12101113

768569

123456

Page 37: CBMR - Research - Correlation & Regression

Answer

Y2 X2 xy Weight (y) Age (x) Serial no.

14464

144100121169

493664253681

8448965066117

128

12101113

768569

123456

742 291 461 66 41 Total

Page 38: CBMR - Research - Correlation & Regression

6.83641x 11

666

y

92.0

6)41(

291

666414612

b

Regression equation

6.83)0.9(x11y (x)

Page 39: CBMR - Research - Correlation & Regression

0.92x4.675y (x)

12.50Kg8.5*0.924.675y (8.5)

Kg58.117.5*0.924.675y (7.5)

Page 40: CBMR - Research - Correlation & Regression

11.411.611.8

1212.212.412.6

7 7.5 8 8.5 9

Age (in years)

Wei

ght (

in K

g)

we create a regression line by plotting two estimated values for y against their X component,

then extending the line right and left.

Page 41: CBMR - Research - Correlation & Regression

Exercise 2

The following are the age (in years) and systolic blood pressure of 20 apparently healthy adults.

B.P (y)

Age (x)

B.P (y)

Age (x)

128136146124143130124121126123

46536020634326193123

120128141126134128136132140144

20436326533158465870

Page 42: CBMR - Research - Correlation & Regression

Find the correlation between age and blood pressure using simple and Spearman's correlation coefficients, and comment.Find the regression equation?What is the predicted blood pressure for a man aging 25 years?

Page 43: CBMR - Research - Correlation & Regression

x2 xy y x Serial400 2400 120 20 1

1849 5504 128 43 23969 8883 141 63 3676 3276 126 26 4

2809 7102 134 53 5961 3968 128 31 6

3364 7888 136 58 72116 6072 132 46 83364 8120 140 58 94900 10080 144 70 10

Page 44: CBMR - Research - Correlation & Regression

x2 xy y x Serial2116 5888 128 46 112809 7208 136 53 123600 8760 146 60 13400 2480 124 20 143969 9009 143 63 151849 5590 130 43 16676 3224 124 26 17361 2299 121 19 18961 3906 126 31 19529 2829 123 23 20

41678 114486 2630 852 Total

Page 45: CBMR - Research - Correlation & Regression

nx)(

x

nyx

xyb 2

21 4547.0

2085241678

2026308521144862

=

=112.13 + 0.4547 x

for age 25 B.P = 112.13 + 0.4547 * 25=123.49 = 123.5 mm hg

y

Page 46: CBMR - Research - Correlation & Regression

Multiple Regression

Multiple regression analysis is a straightforward extension of simple regression analysis which allows more than one independent variable.