Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It...

55
Correlation & Regression

Transcript of Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It...

Page 1: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Correlation &

Regression

Page 2: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Learning Objectives

• Define correlation

• Define simple correlation coefficient

• compute the simple correlation

coefficient (r)

• Define Spearman Rank Correlation

Coefficient (rs)

• Define Regression

Page 3: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Correlation

Finding the relationship between two

quantitative variables without being

able to infer causal relationships

Correlation is a statistical technique

used to determine the degree to which

two variables are related

Page 4: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

• Rectangular coordinate

• Two quantitative variables

• One variable is called independent (X) and

the second is called dependent (Y)

• Points are not joined

• No frequency table

Scatter diagram

Y

* *

*

X

Page 5: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Wt.

(kg)

67 69 85 83 74 81 97 92 114 85

SBP

(mmHg)

120 125 140 160 130 180 150 140 200 130

Example

Page 6: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Scatter diagram of weight and systolic blood

pressure

80

100

120

140

160

180

200

220

60 70 80 90 100 110 120

wt (kg)

SBP(mmHg)Wt.

(kg)

67 69 85 83 74 81 97 92 114 85

SBP

(mmHg)

120 125 140 160 130 180 150 140 200 130

Page 7: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

80

100

120

140

160

180

200

220

60 70 80 90 100 110 120Wt (kg)

SBP(mmHg)

Scatter diagram of weight and systolic blood pressure

Page 8: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Scatter plots

The pattern of data is indicative of the type of

relationship between your two variables:

➢ positive relationship

➢ negative relationship

➢ no relationship

Page 9: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Positive relationship

Page 10: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

0

2

4

6

8

10

12

14

16

18

0 10 20 30 40 50 60 70 80 90

Age in Weeks

Heig

ht

in C

M

Page 11: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Negative relationship

Reliability

Age of Car

Page 12: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

No relation

Page 13: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Correlation Coefficient

Statistic showing the degree of relation

between two variables

Page 14: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Simple Correlation coefficient (r)

➢ It is also called Pearson's correlation

or product moment correlation

coefficient.

➢ It measures the nature and strength

between two variables of

the quantitative type.

Page 15: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

The sign of r denotes the nature of

association

while the value of r denotes the

strength of association.

Page 16: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

➢ If the sign is +ve this means the relationis direct (an increase in one variable isassociated with an increase in theother variable and a decrease in onevariable is associated with adecrease in the other variable).

➢ While if the sign is -ve this means aninverse or indirect relationship (whichmeans an increase in one variable isassociated with a decrease in the other).

Page 17: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

➢ The value of r ranges between ( -1) and ( +1)

➢ The value of r denotes the strength of the

association as illustrated

by the following diagram.

-1 10-0.25-0.75 0.750.25

strong strongintermediate intermediateweak weak

no relation

perfect

correlation

perfect

correlation

Directindirect

Page 18: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

If r = Zero this means no association or correlation between the two variables.

If 0 < r < 0.25 = weak correlation.

If 0.25 ≤ r < 0.75 = intermediate correlation.

If 0.75 ≤ r < 1 = strong correlation.

If r = l = perfect correlation.

Page 19: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

=

n

y)(y.

n

x)(x

n

yxxy

r2

2

2

2

How to compute the simple correlation

coefficient (r)

Page 20: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Example:

A sample of 6 children was selected, data about their

age in years and weight in kilograms was recorded as

shown in the following table . It is required to find the

correlation between age and weight.

Weight

(Kg)

Age

(years)

serial

No

1271

862

1283

1054

1165

1396

Page 21: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

These 2 variables are of the quantitative type, one

variable (Age) is called the independent and

denoted as (X) variable and the other (weight)

is called the dependent and denoted as (Y)

variables to find the relation between age and

weight compute the simple correlation coefficient

using the following formula:

=

n

y)(y.

n

x)(x

n

yxxy

r2

2

2

2

Page 22: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Y2X2xy

Weight

(Kg)

(y)

Age

(years)

(x)

Serial

n.

14449841271

643648862

14464961283

10025501054

12136661165

169811171396

∑y2=

742

∑x2=

291

∑xy=

461

∑y=

66

∑x=

41

Total

Page 23: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

r = 0.759

strong direct correlation

=

6

(66)742.

6

(41)291

6

6641461

r22

Page 24: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

EXAMPLE: Relationship between Anxiety and

Test Scores

Anxiety

(X)

Test

score (Y)X2 Y2 XY

10 2 100 4 20

8 3 64 9 24

2 9 4 81 18

1 7 1 49 7

5 6 25 36 30

6 5 36 25 30

∑X = 32 ∑Y = 32 ∑X2 = 230 ∑Y2 = 204 ∑XY=129

Page 25: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Calculating Correlation Coefficient

( )( )94.

)200)(356(

1024774

32)204(632)230(6

)32)(32()129)(6(

22−=

−=

−−

−=r

r = - 0.94

Indirect strong correlation

Page 26: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Spearman Rank Correlation Coefficient (rs)

It is a non-parametric measure of correlation.

This procedure makes use of the two sets of ranks that may be assigned to the sample values of x and Y.

Spearman Rank correlation coefficient could be computed in the following cases:

Both variables are quantitative.

Both variables are qualitative ordinal.

One variable is quantitative and the other is qualitative ordinal.

Page 27: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Procedure:

1. Rank the values of X from 1 to n where n

is the numbers of pairs of values of X and

Y in the sample.

2. Rank the values of Y from 1 to n.

3. Compute the value of di for each pair of

observation by subtracting the rank of Yi

from the rank of Xi

4. Square each di and compute ∑di2 which

is the sum of the squared values.

Page 28: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

5. Apply the following formula

1)n(n

(di)61r

2

2

s−

−=

The value of rs denotes the magnitude

and nature of association giving the same

interpretation as simple r.

Page 29: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Example

In a study of the relationship between level education and income the following data was obtained. Find the relationship between them and comment.

Income

(Y)

level education

(X)

sample

numbers

25Preparatory.A

10Primary.B

8University.C

10secondaryD

15secondaryE

50illiterateF

60University.G

Page 30: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Answer:

di2diRank

Y

Rank

X(Y)(X)

423525PreparatoryA

0.250.55.5610Primary.B

30.25-5.571.58University.C

4-25.53.510secondaryD

0.25-0.543.515secondaryE

2552750illiterateF

0.250.511.560university.G

∑ di2=64

Page 31: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Comment:

There is an indirect weak correlation

between level of education and income.

1.0)48(7

6461 −=

−=sr

Page 32: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

exercise

Page 33: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Regression Analyses

Regression: technique concerned with predicting some variables by knowing others

The process of predicting variable Y using variable X

Page 34: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Regression

➢ Uses a variable (x) to predict some outcome

variable (y)

➢ Tells you how values in y change as a function

of changes in values of x

Page 35: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Correlation and Regression

➢ Correlation describes the strength of a linear

relationship between two variables

➢ Linear means “straight line”

➢ Regression tells us how to draw the straight line

described by the correlation

Page 36: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Regression

➢ Calculates the “best-fit” line for a certain set of data

The regression line makes the sum of the squares of

the residuals smaller than for any other lineRegression minimizes residuals

80

100

120

140

160

180

200

220

60 70 80 90 100 110 120Wt (kg)

SBP(mmHg)

Page 37: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

By using the least squares method (a procedure

that minimizes the vertical deviations of plotted

points surrounding a straight line) we are

able to construct a best fitting straight line to the

scatter diagram points and then formulate a

regression equation in the form of:

=

n

x)(x

n

yxxy

b2

2

1)xb(xyy −+= b

bXay +=

Page 38: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Regression Equation

➢ Regression equation

describes the

regression line

mathematically

◼ Intercept

◼ Slope

80

100

120

140

160

180

200

220

60 70 80 90 100 110 120Wt (kg)

SBP(mmHg)

Page 39: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Linear Equations

Y

Y = bX + a

a = Y-intercept

X

Change

in Y

Change in X

b = Slope

bXay +=

Page 40: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Hours studying and grades

Page 41: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Regressing grades on hours

Linear Regressi on

2.00 4.00 6.00 8.00 10.00

Number of hours spent studying

70.00

80.00

90.00

Fin

al g

rade

in c

ours

e

Final grade in cour se = 59.95 + 3.17 * study

R-Square = 0.88

Predicted final grade in class =

59.95 + 3.17*(number of hours you study per week)

Page 42: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Predict the final grade of…

◼ Someone who studies for 12 hours

◼ Final grade = 59.95 + (3.17*12)

◼ Final grade = 97.99

◼ Someone who studies for 1 hour:

◼ Final grade = 59.95 + (3.17*1)

◼ Final grade = 63.12

Predicted final grade in class = 59.95 + 3.17*(hours of study)

Page 43: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Exercise

A sample of 6 persons was selected the

value of their age ( x variable) and their

weight is demonstrated in the following

table. Find the regression equation and

what is the predicted weight when age is

8.5 years.

Page 44: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Weight (y)Age (x)Serial no.

12

8

12

10

11

13

7

6

8

5

6

9

1

2

3

4

5

6

Page 45: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Answer

Y2X2xyWeight (y)Age (x)Serial no.

144

64

144

100

121

169

49

36

64

25

36

81

84

48

96

50

66

117

12

8

12

10

11

13

7

6

8

5

6

9

1

2

3

4

5

6

7422914616641Total

Page 46: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

6.836

41x == 11

6

66==y

92.0

6

)41(291

6

6641461

2=

=b

Regression equation

6.83)0.9(x11y (x) −+=

Page 47: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

0.92x4.675y (x) +=

12.50Kg8.5*0.924.675y (8.5) =+=

Kg58.117.5*0.924.675y (7.5) =+=

Page 48: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

11.4

11.6

11.8

12

12.2

12.4

12.6

7 7.5 8 8.5 9

Age (in years)

Weig

ht

(in

Kg

)

we create a regression line by plotting two

estimated values for y against their X component,

then extending the line right and left.

Page 49: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Exercise 2

The following are the age (in years) and systolic blood pressure of 20 apparently healthy adults.

B.P

(y)

Age

(x)

B.P

(y)

Age

(x)

128

136

146

124

143

130

124

121

126

123

46

53

60

20

63

43

26

19

31

23

120

128

141

126

134

128

136

132

140

144

20

43

63

26

53

31

58

46

58

70

Page 50: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Find the correlation between age and blood pressure using simple and Spearman's correlation coefficients, and comment.

Find the regression equation?

What is the predicted blood pressure for a man aging 25 years?

Page 51: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

x2xyyxSerial

4002400120201

18495504128432

39698883141633

6763276126264

28097102134535

9613968128316

33647888136587

21166072132468

33648120140589

4900100801447010

Page 52: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

x2xyyxSerial

211658881284611

280972081365312

360087601466013

40024801242014

396990091436315

184955901304316

67632241242617

36122991211918

96139061263119

52928291232320

416781144862630852Total

Page 53: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

=

n

x)(x

n

yxxy

b2

2

1 4547.0

20

85241678

20

2630852114486

2=

=

=112.13 + 0.4547 x

for age 25

B.P = 112.13 + 0.4547 * 25=123.49 = 123.5 mm hg

y

Page 54: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two

Multiple Regression

Multiple regression analysis is a

straightforward extension of simple

regression analysis which allows more

than one independent variable.

Page 55: Correlation & Regression - National University 3-4...Spearman Rank Correlation Coefficient (r s) It is a non-parametric measure of correlation. This procedure makes use of the two