Measures of relationship Dr. Omar Al Jadaan. Agenda Correlation – Need – meaning, simple linear...


Measures of relationship

Dr. Omar Al Jadaan


Agenda

• Correlation: need and meaning

• Simple linear regression: analysis and prediction


Correlation

• Correlation is a statistical measure of the relationship between two variables.

• Possible correlations range from –1 to +1.

• A correlation of zero indicates that there is no linear relationship between the variables.

• A correlation of –1 indicates a perfect negative correlation: as one variable goes up, the other goes down, and all points lie exactly on a straight line.

• A correlation of +1 indicates a perfect positive correlation: both variables move together in the same direction, and all points lie exactly on a straight line.


Correlation

• Why do we need correlation? To discover the pattern of interaction between the independent and dependent variables and to infer a mathematical model of the relationship.


Linear regression

• Simple linear regression is the least-squares estimator of a linear regression model with a single predictor variable. In other words, simple linear regression fits a straight line through a set of n points so that the sum of squared residuals of the model (a residual is the vertical distance between a data point and the fitted line) is as small as possible.

• The adjective "simple" refers to the fact that the model has a single predictor variable. The slope of the fitted line equals the correlation between y and x multiplied by the ratio of their standard deviations, b1 = r(s_y/s_x). The intercept is chosen so that the line passes through the center of mass (x̄, ȳ) of the data points.
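Both properties can be checked numerically. The sketch below uses the study-hours data from the worked example in this lecture; the helper name `pearson_r` is our own, not from the source. It computes the slope as r multiplied by s_y/s_x, takes the intercept from the center of mass, and should reproduce the least-squares slope SSxy/SSxx = 352.8/215.6.

```python
from math import sqrt
from statistics import mean, stdev

# Study-hours data from the worked example in this lecture
hours = [10, 15, 8, 7, 13, 15, 20, 10, 5, 5]
score = [78, 83, 75, 77, 80, 85, 95, 83, 65, 68]


def pearson_r(x, y):
    """Pearson correlation coefficient computed from centered sums."""
    mx, my = mean(x), mean(y)
    ss_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_xx = sum((a - mx) ** 2 for a in x)
    ss_yy = sum((b - my) ** 2 for b in y)
    return ss_xy / sqrt(ss_xx * ss_yy)


r = pearson_r(hours, score)
# Slope = correlation times the ratio of standard deviations (s_y / s_x)
slope = r * stdev(score) / stdev(hours)
# Intercept chosen so the line passes through the center of mass (x_bar, y_bar)
intercept = mean(score) - slope * mean(hours)

print(f"r = {r:.4f}, slope = {slope:.6f}, intercept = {intercept:.6f}")
```

The n − 1 factors inside the two standard deviations cancel in the ratio, which is why this agrees exactly with the sums-of-squares formula.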


• The purpose of the least-squares method is to find the equation of the straight line that best fits the data in the sense of least squares.

• Assumptions of regression:

– The errors are normally distributed with mean zero.

– The variation around the regression line is constant for all values of x (the errors vary by the same amount for small x as for large x).

– The errors are independent for all values of x.

– The relationship between x and y is postulated to be linear.


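• The linear model is y = β0 + β1x + ε.

• To calculate the estimates b0 and b1 of β0 and β1, we first calculate

$$SS_{xx} = \sum (x - \bar{x})^2, \qquad SS_{xy} = \sum (x - \bar{x})(y - \bar{y})$$

and then

$$b_1 = \frac{SS_{xy}}{SS_{xx}}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$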


Example (study hours and test score)

x (hours studied)  y (observed score)  x−x̄  (x−x̄)²  y−ȳ  (x−x̄)(y−ȳ)  ŷ (predicted score)  y−ŷ  (y−ŷ)²

10 78 -0.8 0.64 -0.9 0.72 77.59091 0.4090913 0.167356

15 83 4.2 17.64 4.1 17.22 85.77273 -2.772727 7.688014

8 75 -2.8 7.84 -3.9 10.92 74.31818 0.6818186 0.464877

7 77 -3.8 14.44 -1.9 7.22 72.68182 4.3181822 18.6467

13 80 2.2 4.84 1.1 2.42 82.5 -2.5 6.249998

15 85 4.2 17.64 6.1 25.62 85.77273 -0.772727 0.597107

20 95 9.2 84.64 16.1 148.12 93.95455 1.045455 1.092976

10 83 -0.8 0.64 4.1 -3.28 77.59091 5.4090913 29.25827

5 65 -5.8 33.64 -13.9 80.62 69.40909 -4.40909 19.44008

5 68 -5.8 33.64 -10.9 63.22 69.40909 -1.40909 1.985536

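Means: x̄ = 10.8, ȳ = 78.9

Sum of residuals Σ(y − ŷ) = 4.312E-06 (this should be zero; the small nonzero value is rounding error)

SSE = Σ(y − ŷ)² = 85.59091 (this is the minimum value any straight line can achieve for these data)

SSxx = 215.6

SSxy = 352.8

b1 = SSxy / SSxx = 1.636363636

b0 = ȳ − b1x̄ = 61.22727273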
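The spreadsheet computation above can be reproduced in a few lines. This is a minimal sketch in plain Python; the function name `least_squares` is our own. It computes SSxx, SSxy, b1 and b0 from the formulas given earlier, then the residuals and SSE. Computed at full floating-point precision, the residuals sum to essentially zero; the spreadsheet's 4.312E-06 presumably reflects the rounded predicted values in its table.

```python
# Study-hours data from the example table
hours = [10, 15, 8, 7, 13, 15, 20, 10, 5, 5]
score = [78, 83, 75, 77, 80, 85, 95, 83, 65, 68]


def least_squares(x, y):
    """Return (b0, b1, SSxx, SSxy) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx          # slope
    b0 = y_bar - b1 * x_bar     # intercept
    return b0, b1, ss_xx, ss_xy


b0, b1, ss_xx, ss_xy = least_squares(hours, score)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(hours, score)]
sse = sum(e * e for e in residuals)   # the minimized sum of squared residuals

print(f"SSxx = {ss_xx:.1f}, SSxy = {ss_xy:.1f}")
print(f"b1 = {b1:.9f}, b0 = {b0:.8f}")
print(f"sum of residuals = {sum(residuals):.1e}, SSE = {sse:.5f}")
```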


Exercise

• The following table shows the sales, in millions, of a prominent product for the years 1998–2003. Assuming the trend continues in 2004, predict the sales in 2004.

Year Year coded Sales in millions

1998 1 71.3

1999 2 59.5

2000 3 51.9

2001 4 41.1

2002 5 24.9

2003 6 17.5


Solution

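• Fitted line: Sales = 82.7 − 11.0 × (year coded), with coefficients rounded from b0 = 82.73 and b1 = −10.96.

• For 2004 (year coded = 7): Sales = 82.7 − 11.0(7) = 5.7 million. With the unrounded coefficients the prediction is about 6.0 million.

• From the slope of the line we conclude that sales decrease by about 11 million each year.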
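A minimal sketch of the fit behind this solution, using the exercise table and the same least-squares formulas as earlier in the lecture. It shows where the rounded coefficients come from and why the unrounded prediction (about 6.0 million) differs slightly from 5.7 million.

```python
year_coded = [1, 2, 3, 4, 5, 6]                 # 1998 -> 1, ..., 2003 -> 6
sales = [71.3, 59.5, 51.9, 41.1, 24.9, 17.5]    # millions

n = len(year_coded)
x_bar = sum(year_coded) / n
y_bar = sum(sales) / n
ss_xx = sum((x - x_bar) ** 2 for x in year_coded)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(year_coded, sales))

b1 = ss_xy / ss_xx          # change in sales per year
b0 = y_bar - b1 * x_bar     # intercept at year coded 0

pred_2004 = b0 + b1 * 7     # 2004 corresponds to year coded 7
print(f"Sales = {b0:.2f} + ({b1:.2f}) * year_coded")
print(f"Predicted 2004 sales: {pred_2004:.2f} million")
```

Rounding b1 from −10.96 to −11.0 changes the prediction by 0.04 × 7 = 0.28 million, which accounts for most of the gap between 6.0 and 5.7.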


Inference about the slope of the regression line

• The purpose of the test is to determine whether a given value c is reasonable for the slope of the population regression line (H0: β1 = c).

• The test of H0: β1 = 0 determines whether a straight line should be fit to the data. If the null hypothesis is not rejected, the data give no evidence that a straight line models the relationship between x and y.


• Assumptions: the regression model is y = β0 + β1x + ε, where β1 is the slope of the model.

• To test the null hypothesis that β1 equals some value, say c, we divide the difference (b1 − c) between the estimated slope b1 and c by the standard error of b1.


• To test H0: β1 = c against Ha: β1 ≠ c (for example, H0: β1 = 0 against Ha: β1 ≠ 0), use Student's t statistic with n − 2 degrees of freedom, where SE(b1) is the standard error of b1:

$$t = \frac{b_1 - c}{SE(b_1)}$$


Example

• The following table shows the systolic blood pressure readings and weights of 10 newly diagnosed patients with high blood pressure.

Patient Systolic (y) Weight (x, pounds)

1 145 210

2 155 245

3 160 260

4 155 230

5 130 175

6 140 185

7 135 230

8 165 249

9 150 200

10 130 190

We would like to test whether systolic blood pressure increases by one point for each additional pound of weight, that is, H0: β1 = 1.


Solution

  Coefficients Standard Error t Stat P-value

Intercept 72.56893125 19.4598404 3.729163743 0.005794692

X Variable 1 0.340069313 0.088777605 3.830575445 0.005013816

The regression equation is systolic = 72.56893 + 0.340069 weight.

The test statistic is computed as follows, with c = 1, b1 = 0.34007 and SE(b1) = 0.08878:

$$t = \frac{0.34 - 1.0}{0.089} = -7.42$$


• We calculate t = (0.34 − 1.0)/0.089 = −7.42.

• At α = 0.05, the critical t values with 8 degrees of freedom are ±2.306.

• Since |t| = 7.42 > 2.306, the data refute the null hypothesis: each additional pound increases systolic blood pressure by less than 1 point.

• The t value (3.83) shown in the table, along with its two-tailed p-value (0.005), is for the null hypothesis H0: β1 = 0 against Ha: β1 ≠ 0.
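The test can be reproduced from the raw data. This is a minimal sketch in plain Python, assuming the standard formula SE(b1) = s/√SSxx with s = √(SSE/(n − 2)); the slides take SE(b1) from software output rather than deriving it. `slope_t_test` and `T_CRIT` are our own illustrative names, and 2.306 is the critical value quoted above. With unrounded inputs the statistic for c = 1 comes out near −7.43 rather than −7.42; the conclusion is the same.

```python
from math import sqrt

weight = [210, 245, 260, 230, 175, 185, 230, 249, 200, 190]     # x, pounds
systolic = [145, 155, 160, 155, 130, 140, 135, 165, 150, 130]   # y


def slope_t_test(x, y, c=0.0):
    """Return (b1, SE(b1), t) for H0: beta1 = c in simple linear regression."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))      # estimated standard error of the model
    se_b1 = s / sqrt(ss_xx)      # standard error of the slope
    return b1, se_b1, (b1 - c) / se_b1


T_CRIT = 2.306  # two-tailed critical value for alpha = 0.05 with 8 df
results = {}
for c in (0.0, 1.0):
    b1, se_b1, t = slope_t_test(weight, systolic, c)
    results[c] = t
    print(f"H0: beta1 = {c}: b1 = {b1:.5f}, SE = {se_b1:.5f}, "
          f"t = {t:.2f}, reject = {abs(t) > T_CRIT}")
```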


The coefficient of Correlation

• The correlation coefficient is used to measure the strength of the linear relationship between two random variables.

• A measure closely related to the slope of the regression line is the Pearson correlation coefficient:

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\, SS_{yy}}}$$

where SSyy = Σ(y − ȳ)².


• The value of r lies in the range –1 to +1.

• If the points fall on a straight line with a positive slope, then r = +1.

• If the points fall on a straight line with a negative slope, then r = –1.

• If the points form a shotgun (patternless) scatter, then r ≈ 0.
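These cases can be checked on small illustrative data sets, made up for this sketch and not taken from the lecture. The helper `pearson_r` is our own and applies the formula r = SSxy/√(SSxx·SSyy). The third case shows that r measures only linear association: y is completely determined by x, yet r = 0.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ss_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_xx = sum((a - mx) ** 2 for a in x)
    ss_yy = sum((b - my) ** 2 for b in y)
    return ss_xy / sqrt(ss_xx * ss_yy)


# Illustrative (made-up) data, not from the lecture
x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2 * v + 1 for v in x])    # points on a line with positive slope
r_neg = pearson_r(x, [-3 * v + 4 for v in x])   # points on a line with negative slope
u = [-2, -1, 0, 1, 2]
r_curve = pearson_r(u, [v * v for v in u])      # exact curve y = x^2, symmetric about 0

print(f"{r_pos:.6f} {r_neg:.6f} {r_curve:.6f}")  # 1.000000 -1.000000 0.000000
```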


Example

• The correlation between systolic blood pressure and weight is as follows:

  Systolic Weight

Systolic 1

Weight 0.804464 1

• As you can see, the linear relationship is positive (r ≈ 0.80).
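The entry 0.804464 can be reproduced from the patient table with the formula r = SSxy/√(SSxx·SSyy). A minimal sketch in plain Python; the variable names are our own.

```python
from math import sqrt

weight = [210, 245, 260, 230, 175, 185, 230, 249, 200, 190]     # x, pounds
systolic = [145, 155, 160, 155, 130, 140, 135, 165, 150, 130]   # y

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(systolic) / n
ss_xx = sum((x - x_bar) ** 2 for x in weight)
ss_yy = sum((y - y_bar) ** 2 for y in systolic)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, systolic))

r = ss_xy / sqrt(ss_xx * ss_yy)   # positive: heavier patients tend to have higher pressure
print(f"r = {r:.6f}")
```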


The coefficient of determination

• The coefficient of determination is used to measure the strength of the linear relationship between the dependent and independent variables.

• The analysis of variance (ANOVA) table for simple linear regression may be represented as follows:

Explained variation (regression): d.f. = 1, sum of squares = SSR, mean square MSR = SSR/1, F-value F = MSR/MSE

Unexplained variation (residual): d.f. = n − 2, sum of squares = SSE, mean square MSE = SSE/(n − 2)

Total: d.f. = n − 1, sum of squares = SS(total)
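The table can be filled in from raw data. This is a minimal sketch using the contraceptive-prevalence example later in this lecture; it assumes the standard decomposition SS(total) = SSR + SSE and obtains SSR as SS(total) − SSE. The printed layout is our own; the values should match the ANOVA block of the lecture's Excel output.

```python
contraceptive = [69, 71, 62, 55, 46, 35, 14, 13, 10, 7]       # x
fertility = [2.3, 3.5, 3.4, 4, 5.4, 5.5, 6, 5, 4.8, 5.7]      # y

n = len(contraceptive)
x_bar = sum(contraceptive) / n
y_bar = sum(fertility) / n
ss_xx = sum((x - x_bar) ** 2 for x in contraceptive)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(contraceptive, fertility))
ss_total = sum((y - y_bar) ** 2 for y in fertility)

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(contraceptive, fertility))  # unexplained
ssr = ss_total - sse                                                            # explained

msr = ssr / 1
mse = sse / (n - 2)
f_value = msr / mse

print(f"{'Source':<12}{'df':>4}{'SS':>10}{'MS':>10}{'F':>10}")
print(f"{'Regression':<12}{1:>4}{ssr:>10.4f}{msr:>10.4f}{f_value:>10.3f}")
print(f"{'Residual':<12}{n - 2:>4}{sse:>10.4f}{mse:>10.4f}")
print(f"{'Total':<12}{n - 1:>4}{ss_total:>10.4f}")
```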


• The symbol r² represents the ratio SSR/SS(total) and is called the coefficient of determination. It measures the proportion of the variation in y that is explained by x.

• SSR may also be called the explained variation or regression variation; SSE may also be called the unexplained variation or residual variation.

Page 22: Measures of relationship Dr. Omar Al Jadaan. Agenda Correlation – Need – meaning, simple linear regression – analysis – prediction.

Example

• The following table shows the contraceptive prevalence (x) and the fertility rate (y) for ten countries.

Country Contraceptive (x) Fertility (y)

Thailand 69 2.3

Costa Rica 71 3.5

Turkey 62 3.4

Mexico 55 4

Zimbabwe 46 5.4

Jordan 35 5.5

Ghana 14 6

Pakistan 13 5

Sudan 10 4.8

Nigeria 7 5.7


SUMMARY OUTPUT

Regression Statistics

Multiple R 0.810919794

R Square 0.657590913

Adjusted R Square 0.614789777

Standard Error 0.748909931

Observations 10

ANOVA

  df SS MS F Significance F

Regression 1 8.617071323 8.617071323 15.36386592 0.004420408

Residual 8 4.486928677 0.560866085

Total 9 13.104

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%

Intercept 6.015740866 0.440476374 13.65735195 7.95434E-07 5.000000527 7.031481204 5.000000527 7.031481

contraceptive (x) -0.0381084 0.009722332 -3.919676762 0.004420408 -0.060528138 -0.015688661 -0.060528138 -0.01569


• The coefficient of determination is shown as R Square = 65.8%.

• It may be computed as

$$r^2 = \frac{SSR}{SS(\text{total})} = \frac{8.6171}{13.1040} \times 100 = 65.8\%$$

• The interpretation is that about 65.8% of the variation in fertility rates is explained by the variation in contraceptive prevalence.
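A minimal sketch of this calculation from the country table, assuming the standard identity SSR = SSxy²/SSxx for simple linear regression. It also shows that r² is the square of the Pearson r. Here r is negative, about −0.811; Excel's "Multiple R" (0.8109) reports its absolute value.

```python
from math import sqrt

contraceptive = [69, 71, 62, 55, 46, 35, 14, 13, 10, 7]       # x
fertility = [2.3, 3.5, 3.4, 4, 5.4, 5.5, 6, 5, 4.8, 5.7]      # y

n = len(contraceptive)
x_bar = sum(contraceptive) / n
y_bar = sum(fertility) / n
ss_xx = sum((x - x_bar) ** 2 for x in contraceptive)
ss_total = sum((y - y_bar) ** 2 for y in fertility)            # SS(total) = SSyy
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(contraceptive, fertility))

ssr = ss_xy ** 2 / ss_xx          # explained variation
r_squared = ssr / ss_total        # coefficient of determination
r = ss_xy / sqrt(ss_xx * ss_total)

print(f"r^2 = {r_squared:.4f} ({100 * r_squared:.1f}%), r = {r:.4f}")
```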


Using the model for estimation and prediction

• The estimated regression equation ŷ = b0 + b1x can be used to predict the value of y for a given value of x. The same equation can also be used to estimate the mean value of y at that value of x.

• Example: suppose you would like to estimate the systolic blood pressure of a patient weighing 250 pounds. Substitute the weight into the equation: systolic = 72.56893 + 0.340069(250) = 157.6.
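The substitution can be sketched in code. The coefficients are fitted from the patient table rather than typed in, so this also checks the equation quoted from the regression output. `predict` is our own illustrative helper.

```python
weight = [210, 245, 260, 230, 175, 185, 230, 249, 200, 190]     # x, pounds
systolic = [145, 155, 160, 155, 130, 140, 135, 165, 150, 130]   # y

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(systolic) / n
ss_xx = sum((x - x_bar) ** 2 for x in weight)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, systolic))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar


def predict(x0):
    """Point estimate y_hat = b0 + b1 * x0 from the fitted line."""
    return b0 + b1 * x0


print(f"systolic = {b0:.5f} + {b1:.6f} * weight")
print(f"predicted systolic at 250 lb: {predict(250):.1f}")
```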


• We expect the prediction interval to be wider than the confidence interval: for the same value of x and the same confidence level, the interval estimate of the mean value of y is narrower than the prediction interval for an individual value of y.


• A (1 − α)100% prediction interval for an individual new value of y at x = x0 is

$$\hat{y} \pm t_{\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$$

• Here ŷ = b0 + b1x0, the t value is based on n − 2 degrees of freedom, n is the sample size, x0 is the fixed value of x, SSxx = Σ(x − x̄)², and s is referred to as the estimated standard error of the regression model:

$$s = \sqrt{\frac{SSE}{n - 2}}$$


• A (1 − α)100% confidence interval for the mean value of y at x = x0 is

$$\hat{y} \pm t_{\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_{xx}}}$$


Example

• The following table shows the results of an experiment conducted on 15 diabetic patients. The independent variable x was the hemoglobin A1C value, measured at the end of a 3-month period. During that period, fasting blood glucose was measured each morning and the values were averaged; this average was the dependent variable y.

• We wish to construct a 95% prediction interval for the average glucose reading of an individual diabetic whose hemoglobin A1C value is 7.0, as well as a 95% confidence interval for the mean reading of all diabetics with a hemoglobin A1C value of 7.0.


Patient  x (hemoglobin A1C)  y (average fasting blood sugar over the 3-month period)

1 6.1 120

2 6.8 146

3 6.5 125

4 7.1 135

5 7.4 140

6 5.8 115

7 8 145

8 8.3 147

9 8 150

10 5.5 110

11 10 160

12 7.7 145

13 9 155

14 11 170

15 5.5 118


SUMMARY OUTPUT

Regression Statistics

Multiple R 0.94747

R Square 0.897699

Adjusted R Square 0.88983

Standard Error 5.8809

Observations 15

ANOVA

  df SS MS F Significance F

Regression 1 3945.328514 3945.328514 114.0763366 8.33726E-08

Residual 13 449.6048192 34.58498609

Total 14 4394.933333

  Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%

Intercept 60.55238 7.475701499 8.099892991 1.95158E-06 44.402111 76.70265334 44.402111 76.70265334

X Variable 1 10.40563 0.974250214 10.68065244 8.33726E-08 8.300888305 12.51036755 8.300888305 12.51036755


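• Here ŷ = 60.552 + 10.406(7.0) = 133.39, with x̄ = 7.5133, SSxx = 36.437, s = 5.881 (the standard error in the regression output), and t = 2.160 with 13 degrees of freedom.

• The standard error of the fitted mean at x0 = 7.0 is

$$SE_{Fit} = 5.881\sqrt{\frac{1}{15} + \frac{(7.0 - 7.5133)^2}{36.437}} \approx 1.60$$

• The 95% confidence interval is

$$95\%\ CI = 133.39 \pm 2.160 \times 1.60 \approx (129.93,\ 136.85)$$

• The 95% prediction interval is

$$95\%\ PI = 133.39 \pm 2.160 \times 5.881\sqrt{1 + \frac{1}{15} + \frac{(7.0 - 7.5133)^2}{36.437}} \approx (120.23,\ 146.56)$$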


• We are 95% confident that an individual diabetic with a hemoglobin A1C value of 7.0 had an average fasting blood sugar over the past 3 months between 120.23 and 146.56.

• We are 95% confident that, for all diabetics with a hemoglobin A1C value of 7.0, the mean of their 3-month average fasting blood sugar is between 129.93 and 136.85.
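Both intervals can be computed directly from the patient table. This is a minimal sketch in plain Python implementing the two interval formulas from this lecture. The t value 2.160 (13 degrees of freedom) is hard-coded as used in the lecture rather than computed, and `intervals_at` is our own illustrative name. Without intermediate rounding the confidence interval comes out as about (129.94, 136.84), within 0.01 of the lecture's (129.93, 136.85).

```python
from math import sqrt

a1c = [6.1, 6.8, 6.5, 7.1, 7.4, 5.8, 8, 8.3, 8, 5.5, 10, 7.7, 9, 11, 5.5]               # x
glucose = [120, 146, 125, 135, 140, 115, 145, 147, 150, 110, 160, 145, 155, 170, 118]   # y
T_CRIT = 2.160  # t value for 95% with 13 df, as used in the lecture


def intervals_at(x, y, x0, t_crit):
    """Return (y_hat, CI for the mean of y, PI for a new y) at x = x0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))                    # estimated standard error of the model
    y_hat = b0 + b1 * x0
    h = 1 / n + (x0 - x_bar) ** 2 / ss_xx      # term shared by both intervals
    ci_half = t_crit * s * sqrt(h)             # mean value of y
    pi_half = t_crit * s * sqrt(1 + h)         # individual new value of y
    return y_hat, (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)


y_hat, ci, pi = intervals_at(a1c, glucose, 7.0, T_CRIT)
print(f"y_hat = {y_hat:.2f}")
print(f"95% CI for the mean: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI for one patient: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The only difference between the two half-widths is the extra 1 under the square root, which is why the prediction interval is always the wider one.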


Exercises

1. Give the deterministic equation of the line passing through each of the following pairs of points:

a) (1, 1.5) and (3, 8.5)

b) (0, 1) and (2, −3)

c) (0, 3.1) and (1, 4.8)


Reference

• This lecture was prepared from: Larry J. Stephens, Advanced Statistics Demystified, McGraw-Hill.