بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab...

55
م ي ح ر ل ا ن م ح ر ل ه ا ل ل م ا س ب م ي ح ر ل ا ن م ح ر ل ه ا ل ل م ا س ب

Transcript of بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab...

Page 1: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

الرحمن الله الرحمن بسم الله بسمالرحيمالرحيم

Page 2: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Correlation & Correlation & RegressionRegression

Dr. Moataza Mahmoud Abdel WahabLecturer of Biostatistics

High Institute of Public HealthUniversity of Alexandria

Page 3: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

CorrelationCorrelation

Finding the relationship between two quantitative variables without being able to infer causal relationships

Correlation is a statistical technique used to determine the degree to which two variables are related

Page 4: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

• Rectangular coordinate

• Two quantitative variables

• One variable is called independent (X) and

the second is called dependent (Y)

• Points are not joined

• No frequency table

Scatter diagram

Page 5: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Example

Page 6: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Scatter diagram of weight and systolic blood Scatter diagram of weight and systolic blood pressurepressure

80

100

120

140

160

180

200

220

60 70 80 90 100 110 120wt (kg)

SBP(mmHg)

Page 7: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

80

100

120

140

160

180

200

220

60 70 80 90 100 110 120Wt (kg)

SBP(mmHg)

Scatter diagram of weight and systolic blood pressure

Page 8: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Scatter plots

The pattern of data is indicative of the type of relationship between your two variables:

positive relationship negative relationship no relationship

Page 9: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Positive relationshipPositive relationship

Page 10: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

0

2

4

6

8

10

12

14

16

18

0 10 20 30 40 50 60 70 80 90

Age in Weeks

Hei

gh

t in

CM

Page 11: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Negative relationshipNegative relationship

Reliability

Age of Car

Page 12: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

No relationNo relation

Page 13: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Correlation CoefficientCorrelation Coefficient

Statistic showing the degree of relation between two variables

Page 14: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Simple Correlation coefficient Simple Correlation coefficient (r)(r)

It is also called Pearson's correlation It is also called Pearson's correlation or product moment correlation or product moment correlationcoefficient. coefficient.

It measures the It measures the naturenature and and strengthstrength between two variables ofbetween two variables ofthe the quantitativequantitative type. type.

Page 15: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

The The signsign of of rr denotes the nature of denotes the nature of association association

while the while the valuevalue of of rr denotes the denotes the strength of association.strength of association.

Page 16: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

If the sign is If the sign is +ve+ve this means the relation this means the relation is is direct direct (an increase in one variable is (an increase in one variable is associated with an increase in theassociated with an increase in theother variable and a decrease in one other variable and a decrease in one variable is associated with avariable is associated with adecrease in the other variable).decrease in the other variable).

While if the sign is While if the sign is -ve-ve this means an this means an inverse or indirectinverse or indirect relationship (which relationship (which means an increase in one variable is means an increase in one variable is associated with a decrease in the other).associated with a decrease in the other).

Page 17: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

The value of r ranges between ( -1) and ( +1)The value of r ranges between ( -1) and ( +1) The value of r denotes the strength of the The value of r denotes the strength of the

association as illustratedassociation as illustratedby the following diagram.by the following diagram.

-1 10-0.25-0.75 0.750.25

strong strongintermediate intermediateweak weak

no relation

perfect correlation

perfect correlation

Directindirect

Page 18: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

If If rr = Zero = Zero this means no association or this means no association or correlation between the two variables.correlation between the two variables.

If If 0 < 0 < rr < 0.25 < 0.25 = weak correlation. = weak correlation.

If If 0.25 ≤ 0.25 ≤ rr < 0.75 < 0.75 = intermediate correlation. = intermediate correlation.

If If 0.75 ≤ 0.75 ≤ rr < 1 < 1 = strong correlation. = strong correlation.

If If r r = l= l = perfect correlation. = perfect correlation.

Page 19: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

n

y)(y.

n

x)(x

n

yxxy

r2

22

2

How to compute the simple correlation coefficient (r)

Page 20: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

ExampleExample::

A sample of 6 children was selected, data about their A sample of 6 children was selected, data about their age in years and weight in kilograms was recorded as age in years and weight in kilograms was recorded as shown in the following table . It is required to find the shown in the following table . It is required to find the correlation between age and weight.correlation between age and weight.

serial No

Age (years)

Weight (Kg)

1712

268

3812

4510

5611

6913

Page 21: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

These 2 variables are of the quantitative type, one These 2 variables are of the quantitative type, one variable (Age) is called the independent and variable (Age) is called the independent and denoted as (X) variable and the other (weight)denoted as (X) variable and the other (weight)is called the dependent and denoted as (Y) is called the dependent and denoted as (Y) variables to find the relation between age and variables to find the relation between age and weight compute the simple correlation coefficient weight compute the simple correlation coefficient using the following formula:using the following formula:

n

y)(y.

n

x)(x

n

yxxy

r2

22

2

Page 22: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Serial n.

Age (years)

(x)

Weight (Kg)

(y)xyX2Y2

17128449144

268483664

38129664144

45105025100

56116636121

691311781169

Total∑x=41

∑y=66

∑xy= 461

∑x2=291

∑y2=742

Page 23: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

r = 0.759r = 0.759

strong direct correlation strong direct correlation

6

(66)742.

6

(41)291

6

6641461

r22

Page 24: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

EXAMPLE: Relationship between Anxiety and EXAMPLE: Relationship between Anxiety and Test ScoresTest Scores

AnxietyAnxiety

))XX((

Test Test score (Y)score (Y)

XX22YY22XYXY

101022100100442020

88336464992424

22994481811818

117711494977

5566252536363030

6655363625253030

∑∑X = 32X = 32∑∑Y = 32Y = 32∑∑XX22 = 230 = 230∑∑YY22 = 204 = 204∑∑XY=129XY=129

Page 25: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Calculating Correlation CoefficientCalculating Correlation Coefficient

94.)200)(356(

1024774

32)204(632)230(6

)32)(32()129)(6(22

r

r = - 0.94

Indirect strong correlation

Page 26: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Spearman Rank Correlation Coefficient Spearman Rank Correlation Coefficient (r(rss))

It is a non-parametric measure of correlation. It is a non-parametric measure of correlation. This procedure makes use of the two sets of This procedure makes use of the two sets of ranks that may be assigned to the sample ranks that may be assigned to the sample values of x and Y.values of x and Y.Spearman Rank correlation coefficient could be Spearman Rank correlation coefficient could be computed in the following cases:computed in the following cases:Both variables are quantitative.Both variables are quantitative.Both variables are qualitative ordinal.Both variables are qualitative ordinal.One variable is quantitative and the other is One variable is quantitative and the other is qualitative ordinal. qualitative ordinal.

Page 27: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

ProcedureProcedure::

1.1. Rank the values of X from 1 to n where n Rank the values of X from 1 to n where n is the numbers of pairs of values of X and is the numbers of pairs of values of X and Y in the sample.Y in the sample.

2.2. Rank the values of Y from 1 to n.Rank the values of Y from 1 to n.

3.3. Compute the value of di for each pair of Compute the value of di for each pair of observation by subtracting the rank of Yi observation by subtracting the rank of Yi from the rank of Xifrom the rank of Xi

4.4. Square each di and compute ∑di2 which Square each di and compute ∑di2 which is the sum of the squared values.is the sum of the squared values.

Page 28: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

5.5. Apply the following formula Apply the following formula

1)n(n

(di)61r

2

2

s

The value of rs denotes the magnitude and nature of association giving the same interpretation as simple r.

Page 29: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

ExampleExample

In a study of the relationship between level In a study of the relationship between level education and income the following data was education and income the following data was obtained. Find the relationship between them and obtained. Find the relationship between them and comment.comment.

samplenumbers

level education(X)

Income(Y)

APreparatory.Preparatory.25

BPrimary.Primary.10

CUniversity.University.8

Dsecondarysecondary10

Esecondarysecondary15

Fillitilliterateerate50

GUniversity.University.60

Page 30: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Answer:Answer:

(X)(Y)Rank

XRank

Ydidi2

APreparatory255324

BPrimary.1065.50.50.25

CUniversity.81.57-5.530.25

Dsecondary103.55.5-24

Esecondary153.54-0.50.25

Filliterate5072525

Guniversity.601.510.50.25

∑ di2=64

Page 31: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Comment:Comment:

There is an indirect weak correlation There is an indirect weak correlation between level of education and income.between level of education and income.

1.0)48(7

6461

sr

Page 32: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

exerciseexercise

Page 33: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Regression AnalysesRegression Analyses

Regression: technique concerned with predicting some variables by knowing others

The process of predicting variable Y using variable X

Page 34: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

RegressionRegression

Uses a variable (x) to predict some outcome Uses a variable (x) to predict some outcome variable (y)variable (y)

Tells you how values in y change as a function Tells you how values in y change as a function of changes in values of xof changes in values of x

Page 35: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Correlation and RegressionCorrelation and Regression

Correlation describes the strength of a Correlation describes the strength of a linear relationship between two variables

Linear means “straight line”

Regression tells us how to draw the straight line described by the correlation

Page 36: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Regression Calculates the “best-fit” line for a certain set of dataCalculates the “best-fit” line for a certain set of data

The regression line makes the sum of the squares of The regression line makes the sum of the squares of the residuals smaller than for any other linethe residuals smaller than for any other line

Regression minimizes residuals

80

100

120

140

160

180

200

220

60 70 80 90 100 110 120Wt (kg)

Page 37: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

By using the least squares method (a procedure By using the least squares method (a procedure that minimizes the vertical deviations of plotted that minimizes the vertical deviations of plotted points surrounding a straight line) we arepoints surrounding a straight line) we areable to construct a best fitting straight line to the able to construct a best fitting straight line to the scatter diagram points and then formulate a scatter diagram points and then formulate a regression equation in the form of:regression equation in the form of:

n

x)(x

n

yxxy

b2

2

1)xb(xyy b

bXay

Page 38: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Regression Equation

Regression equation describes the regression line mathematically Intercept Slope 80

100

120

140

160

180

200

220

60 70 80 90 100 110 120Wt (kg)

SBP(mmHg)

Page 39: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Linear EquationsLinear EquationsLinear EquationsLinear Equations

Y

Y = bX + a

a = Y-intercept

X

Changein Y

Change in X

b = Slope

bXay

Page 40: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Hours studying and Hours studying and gradesgrades

Page 41: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Regressing grades on hours grades on hours

Linear Regression

2.00 4.00 6.00 8.00 10.00

Number of hours spent studying

70.00

80.00

90.00

Final grade in course = 59.95 + 3.17 * studyR-Square = 0.88

Predicted final grade in class =

59.95 + 3.17*(number of hours you study per week)

Page 42: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Predict the final grade ofPredict the final grade of……

Someone who studies for 12 hours Final grade = 59.95 + (3.17*12) Final grade = 97.99

Someone who studies for 1 hour: Final grade = 59.95 + (3.17*1) Final grade = 63.12

Predicted final grade in class = 59.95 + 3.17*(hours of study)

Page 43: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

ExerciseExercise

A sample of 6 persons was selected the A sample of 6 persons was selected the value of their age ( x variable) and their value of their age ( x variable) and their weight is demonstrated in the following weight is demonstrated in the following table. Find the regression equation and table. Find the regression equation and what is the predicted weight when age is what is the predicted weight when age is 8.5 years8.5 years..

Page 44: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Serial no.Age (x)Weight (y)

123456

768569

128

12101113

Page 45: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

AnswerAnswer

Serial no.Age (x)Weight (y)xyX2Y2

123456

768569

128

12101113

8448965066

117

493664253681

14464

144100121169

Total4166461291742

Page 46: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

6.836

41x 11

6

66y

92.0

6

)41(291

6

6641461

2

b

Regression equation

6.83)0.9(x11y (x)

Page 47: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

0.92x4.675y (x)

12.50Kg8.5*0.924.675y (8.5)

Kg58.117.5*0.924.675y (7.5)

Page 48: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

11.411.611.8

1212.212.412.6

7 7.5 8 8.5 9

Age (in years)

Wei

ght (

in K

g)

we create a regression line by plotting two estimated values for y against their X component,

then extending the line right and left.

Page 49: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Exercise 2Exercise 2

The following are the The following are the age (in years) and age (in years) and systolic blood systolic blood pressure of 20 pressure of 20 apparently healthy apparently healthy adults.adults.

Age (x)

B.P (y)

Age (x)

B.P (y)

20436326533158465870

120128141126134128136132140144

46536020634326193123

128136146124143130124121126123

Page 50: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Find the correlation between age Find the correlation between age and blood pressure using simple and blood pressure using simple and Spearman's correlation and Spearman's correlation coefficients, and comment.coefficients, and comment.Find the regression equation?Find the regression equation?What is the predicted blood What is the predicted blood pressure for a man aging 25 years?pressure for a man aging 25 years?

Page 51: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Serialxyxyx2

1201202400400

24312855041849

36314188833969

4261263276676

55313471022809

6311283968961

75813678883364

84613260722116

95814081203364

1070144100804900

Page 52: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Serialxyxyx2

114612858882116

125313672082809

136014687603600

14201242480400

156314390093969

164313055901849

17261243224676

18191212299361

19311263906961

20231232829529

Total852263011448

641678

Page 53: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

n

x)(x

n

yxxy

b2

2

1 4547.0

20

85241678

20

2630852114486

2

=

=112.13 + 0.4547 x

for age 25

B.P = 112.13 + 0.4547 * 25=123.49 = 123.5 mm hg

y

Page 54: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.

Multiple Regression

Multiple regression analysis is a straightforward extension of simple regression analysis which allows more than one independent variable.

Page 55: بسم الله الرحمن الرحيم. Correlation & Regression Dr. Moataza Mahmoud Abdel Wahab Lecturer of Biostatistics High Institute of Public Health University.