Reading – Linear Regression Le (Chapter 8 through 8.1.6) C &S (Chapter 5:F,G,H)
Date posted: 19-Dec-2015
Reading – Linear Regression
• Le (Chapter 8 through 8.1.6)
• C & S (Chapter 5: F, G, H)
Issues with hypothesis testing
• Significance does not imply causality
  – Need a proper prospective experiment
• Significance does not imply practical importance
  – Differences can be statistically significant but trivial
• Run lots of tests and some will be significant by chance
  – With α = 0.05, expect about 1 in 20 tests of true null hypotheses to be significant by chance
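The multiple-testing point can be made concrete with a short Python sketch. It assumes the tests are independent and that every null hypothesis is true, which the slide does not state. The function name is illustrative only.

```python
def prob_at_least_one_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of `num_tests` independent tests is significant
    at level `alpha` when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    alpha = 0.05
    for k in (1, 5, 20, 100):
        expected = k * alpha  # expected number of chance "significant" results
        p_any = prob_at_least_one_false_positive(k, alpha)
        print(f"{k:>3} tests: expect {expected:.2f} false positives, "
              f"P(at least one) = {p_any:.3f}")
```

With 20 such tests the expected number of chance findings is 1, and the probability of at least one is about 0.64.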
Issues with hypothesis testing
• Large p-values may occur because the sample size is small
  – An effect could exist, but the sample may not be large enough to detect it
• Outliers may cause problems, especially in small samples
Issues With Hypothesis Testing
What is the population of inference?
Example: A statistics class of n = 15 women and n = 5 men yields the following exam scores:
Women: mean = 90%, SD = 10%
Men: mean = 85%, SD = 11%
Test the hypothesis that women did better on the exam than men.
Hypothesis tests and Confidence Intervals
Two-sample test statistic:
$$t = \frac{\bar{x}_a - \bar{x}_b}{s_p \sqrt{\frac{1}{n_a} + \frac{1}{n_b}}}$$
CI for difference in means:
$$(\bar{x}_a - \bar{x}_b) \pm t^* \, s_p \sqrt{\frac{1}{n_a} + \frac{1}{n_b}}$$
If the 95% CI excludes 0, then the p-value will be < 0.05.
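As an illustration, the two formulas can be applied to the exam-score example (15 women, mean 90, SD 10; 5 men, mean 85, SD 11). This is a sketch under stated assumptions. The slides do not define the pooled SD, so the usual pooled formula with n_a + n_b − 2 degrees of freedom is assumed. The critical value 2.101 is the two-sided 95% t value for 18 degrees of freedom, hard-coded so that only the standard library is needed. It also sets aside the slide's question of whether a population of inference exists for a single class.

```python
import math


def pooled_two_sample(mean_a, sd_a, n_a, mean_b, sd_b, n_b, t_star):
    """Pooled two-sample t statistic and CI for the difference in means.

    Assumes equal population variances (standard pooled-SD formula).
    `t_star` is the critical t value for the chosen confidence level.
    """
    df = n_a + n_b - 2
    s_p = math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df)
    se = s_p * math.sqrt(1 / n_a + 1 / n_b)  # standard error of the difference
    diff = mean_a - mean_b
    t_stat = diff / se
    ci = (diff - t_star * se, diff + t_star * se)
    return t_stat, df, ci


# Two-sided 95% critical value for 18 degrees of freedom (from a t table).
T_STAR_DF18 = 2.101

t_stat, df, ci = pooled_two_sample(90, 10, 15, 85, 11, 5, T_STAR_DF18)
print(f"t = {t_stat:.3f} on {df} df")
print(f"95% CI for the difference: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Here t is a little under 1 and the interval includes 0, so the two-sided p-value is above 0.05, in line with the rule on the slide.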
Linear Regression
• Investigate the relationship between two variables
  – Does blood pressure relate to age?
  – Does weight loss relate to blood pressure loss?
  – Does income relate to education?
  – Do sales relate to years of experience?
• Dependent variable – The variable that is being predicted or explained
• Independent variable – The variable that is doing the predicting or explaining
• Think of data in pairs (xi, yi)
Linear Regression - Purpose
• Is there an association between the two variables?
  – Does weight change relate to BP change?
• Estimation of impact
  – How much BP change occurs per pound of weight change?
• Prediction
  – If a person loses 10 pounds, how much of a drop in blood pressure can be expected?
Regression History
• Sir Francis Galton (1822-1911) studied the relationship between a father’s height and the son’s height.
• He found that although fathers’ and sons’ heights were related, the relationship was not perfect.
• If the father was above average in height, so (typically) was the son, but not by as much. This was called regression to the mean.
Example of Regression Equation
We know systolic BP increases with age. How much does it increase per year and is the increase constant over time?
SBP = 90 + 0.8*AGE
Interpretation: For each year of age SBP increases by 0.8 mmHg.
At age 50: SBP = 90 + 0.8*50 = 130 mmHg
At age 60: SBP = 90 + 0.8*60 = 138 mmHg
SBP: Y or Dependent Variable
AGE: X or Independent Variable
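The example equation can be evaluated directly. This minimal Python sketch uses the intercept (90) and slope (0.8) from the slide; the function name is illustrative only.

```python
def predicted_sbp(age_years: float, intercept: float = 90.0, slope: float = 0.8) -> float:
    """Predicted systolic BP (mmHg) from the slide's equation SBP = 90 + 0.8*AGE."""
    return intercept + slope * age_years


for age in (50, 60):
    print(f"Age {age}: predicted SBP = {predicted_sbp(age):.0f} mmHg")

# A 10-year age difference changes the prediction by 10 * slope = 8 mmHg.
print(f"Difference: {predicted_sbp(60) - predicted_sbp(50):.0f} mmHg")
```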
Simple Linear Regression Equation
The simple linear regression equation is:
$$E(y) = \beta_0 + \beta_1 x$$
• Graph of the regression equation is a straight line.
• $\beta_0$ is the y intercept of the regression line.
• $\beta_1$ is the slope of the regression line.
• $E(y)$ is the mean value of y for a given x value.
Simple Linear Regression Model
The equation that describes how y is related to x and an error term is called the regression model.
The simple linear regression model is:
$$y = \beta_0 + \beta_1 x + \varepsilon$$
• $\beta_0$ and $\beta_1$ are called parameters of the model.
• $\varepsilon$ is a random variable called the error term.
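To see what the error term does, one can simulate many y values at a fixed x. The sketch below is illustrative only. It borrows the intercept 90 and slope 0.8 from the SBP example, and it assumes normally distributed errors with standard deviation 10. The slides specify neither the error distribution nor its spread.

```python
import random
import statistics


def simulate_y(x, beta0, beta1, sigma, rng):
    """Draw one y from the model y = beta0 + beta1*x + epsilon,
    with epsilon ~ Normal(0, sigma) (an assumption for illustration)."""
    epsilon = rng.gauss(0.0, sigma)
    return beta0 + beta1 * x + epsilon


rng = random.Random(0)                 # fixed seed so the run is repeatable
BETA0, BETA1, SIGMA = 90.0, 0.8, 10.0  # hypothetical parameter values

draws = [simulate_y(50, BETA0, BETA1, SIGMA, rng) for _ in range(10_000)]
mean_y = statistics.fmean(draws)

print(f"Regression equation at x = 50: {BETA0 + BETA1 * 50:.1f}")
print(f"Average of simulated y values: {mean_y:.1f}")
```

Individual y values scatter around the line, but their average at a given x stays close to the value given by the regression equation.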
Simple Linear Regression Equation: Positive Linear Relationship
[Figure: regression line of $E(y)$ against $x$ with intercept $\beta_0$; slope $\beta_1$ is positive]
Simple Linear Regression Equation: Negative Linear Relationship
[Figure: regression line of $E(y)$ against $x$ with intercept $\beta_0$; slope $\beta_1$ is negative]
Simple Linear Regression Equation: No Relationship
[Figure: regression line of $E(y)$ against $x$ with intercept $\beta_0$; slope $\beta_1$ is 0]
Estimated Simple Linear Regression Equation
The estimated simple linear regression equation is:
$$\hat{y} = b_0 + b_1 x$$
• The graph is called the estimated regression line.
• $b_0$ is the y intercept of the line.
• $b_1$ is the slope of the line.
• $\hat{y}$ is the estimated value of y for a given x value.
Estimation Process
• Regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
• Regression equation: $E(y) = \beta_0 + \beta_1 x$
• Unknown parameters: $\beta_0$, $\beta_1$
• Sample data: pairs $(x_1, y_1), \ldots, (x_n, y_n)$
• Estimated regression equation: $\hat{y} = b_0 + b_1 x$
• Sample statistics: $b_0$, $b_1$
• $b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$
Least Squares Method
Least Squares Criterion: Choose $\beta_0$ and $\beta_1$ to minimize
$$S = \sum_{i} (y_i - \beta_0 - \beta_1 x_i)^2$$
where:
$y_i$ = observed value of the dependent variable for the $i$th observation
Estimation
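One way to arrive at the least squares estimates is sketched below. It is the standard calculus derivation, not necessarily the one shown on the original slide. Write $b_0$ and $b_1$ for the values of $\beta_0$ and $\beta_1$ that minimize $S$.

```latex
\begin{align*}
S(\beta_0, \beta_1) &= \sum_{i} (y_i - \beta_0 - \beta_1 x_i)^2 \\[4pt]
\frac{\partial S}{\partial \beta_0}\Big|_{(b_0, b_1)}
  &= -2 \sum_{i} (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x} \\[4pt]
\frac{\partial S}{\partial \beta_1}\Big|_{(b_0, b_1)}
  &= -2 \sum_{i} x_i \,(y_i - b_0 - b_1 x_i) = 0 \\[4pt]
\text{Substituting } b_0:\quad
  & \sum_{i} x_i \big[(y_i - \bar{y}) - b_1 (x_i - \bar{x})\big] = 0 \\[4pt]
\text{Since } \sum_{i} \bar{x}\,\big[(y_i - \bar{y}) - b_1 (x_i - \bar{x})\big] = 0:\quad
  & \sum_{i} (x_i - \bar{x})(y_i - \bar{y}) = b_1 \sum_{i} (x_i - \bar{x})^2 \\[4pt]
  & b_1 = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i} (x_i - \bar{x})^2}
\end{align*}
```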
The Least Squares Estimates
Slope:
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$
Intercept:
$$b_0 = \bar{y} - b_1 \bar{x}$$
Example
| Restaurant | Student Population (Thousands) | Quarterly Sales |
|---|---|---|
| 1 | 2 | 58 |
| 2 | 6 | 105 |
| 3 | 8 | 88 |
| 4 | 8 | 118 |
| 5 | 12 | 117 |
| 6 | 16 | 137 |
| 7 | 20 | 157 |
| 8 | 20 | 169 |
| 9 | 22 | 149 |
| 10 | 26 | 202 |
[Figure: X-Y plot of the data]
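Calculations
| Obs | Xi | Yi | Xi – XBAR | Yi – YBAR | (Xi – XBAR)*(Yi – YBAR) | (Xi – XBAR)² |
|---|---|---|---|---|---|---|
| 1 | 2 | 58 | -12 | -72 | 864 | 144 |
| 2 | 6 | 105 | -8 | -25 | 200 | 64 |
| 3 | 8 | 88 | -6 | -42 | 252 | 36 |
| 4 | 8 | 118 | -6 | -12 | 72 | 36 |
| 5 | 12 | 117 | -2 | -13 | 26 | 4 |
| 6 | 16 | 137 | 2 | 7 | 14 | 4 |
| 7 | 20 | 157 | 6 | 27 | 162 | 36 |
| 8 | 20 | 169 | 6 | 39 | 234 | 36 |
| 9 | 22 | 149 | 8 | 19 | 152 | 64 |
| 10 | 26 | 202 | 12 | 72 | 864 | 144 |
| Tot | 140 | 1300 | | | 2840 | 568 |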
Estimates for Dataset
$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{2840}{568} = 5$$
$$b_0 = \bar{y} - b_1 \bar{x} = 130 - 5 \times 14 = 60$$
Y = Sales; X = # thousands of students
Equation:
Y = 60 + 5* X
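The hand calculation can be checked with a few lines of Python. This is a sketch that applies the least squares formulas directly to the ten restaurants. The function and variable names are illustrative and do not come from the slides.

```python
def least_squares_fit(x, y):
    """Return (b0, b1, sxy, sxx) for the simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squared deviations, as in the table.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1, sxy, sxx


student_pop = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]                # thousands
quarterly_sales = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

b0, b1, sxy, sxx = least_squares_fit(student_pop, quarterly_sales)
print(f"Sxy = {sxy:g}, Sxx = {sxx:g}")  # Sxy = 2840, Sxx = 568
print(f"b1 = {b1:g}, b0 = {b0:g}")      # b1 = 5, b0 = 60
```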
* Read the ten restaurants: id, student population (thousands), quarterly sales;
DATA sales;
  INFILE DATALINES;
  INPUT restaurant studentpop quarsales;
  DATALINES;
1 2 58
2 6 105
3 8 88
4 8 118
5 12 117
6 16 137
7 20 157
8 20 169
9 22 149
10 26 202
;
RUN;

* List the data and print summary statistics;
PROC PRINT DATA=sales;
RUN;
PROC MEANS DATA=sales;
RUN;

* Regress quarterly sales on student population and plot the data;
PROC REG DATA=sales SIMPLE;
  MODEL quarsales = studentpop;
  PLOT quarsales * studentpop;
RUN;
OUTPUT FROM PROC REG
The REG Procedure
Descriptive Statistics
| Variable | Sum | Mean | Uncorrected SS | Variance | Standard Deviation |
|---|---|---|---|---|---|
| Intercept | 10.00000 | 1.00000 | 10.00000 | 0 | 0 |
| studentpop | 140.00000 | 14.00000 | 2528.00000 | 63.11111 | 7.94425 |
| quarsales | 1300.00000 | 130.00000 | 184730 | 1747.77778 | 41.80643 |
Parameter Estimates
| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 60.00000 | 9.22603 | 6.50 | 0.0002 |
| studentpop | 1 | 5.00000 | 0.58027 | 8.62 | <.0001 |

REGRESSION EQUATION:
Y = 60.0 + 5.0*X
QUARSALES = 60 + 5*STUDENTPOP
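The Standard Error and t Value columns can be reproduced by hand. The Python sketch below uses the usual textbook formulas for simple linear regression. Those formulas are not shown in the slides, so treat them as an assumption about what PROC REG reports. Variable names are illustrative.

```python
import math

x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]                # studentpop
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]    # quarsales

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual variance estimate: SSE divided by its n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
root_mse = math.sqrt(sse / (n - 2))

# Standard errors of the slope and intercept (textbook formulas).
se_b1 = root_mse / math.sqrt(sxx)
se_b0 = root_mse * math.sqrt(1 / n + x_bar ** 2 / sxx)

# t value = estimate divided by its standard error.
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

print(f"Intercept:  estimate {b0:.5f}, SE {se_b0:.5f}, t {t_b0:.2f}")
print(f"studentpop: estimate {b1:.5f}, SE {se_b1:.5f}, t {t_b1:.2f}")
```

The printed values should agree with the SAS table up to rounding.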
The Coefficient of Determination
Relationship Among SST, SSR, SSE
SST = SSR + SSE
$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
The Coefficient of Determination
The coefficient of determination is:
$$r^2 = SSR/SST$$
where:
SST = total sum of squares
SSR = sum of squares due to regression
OUTPUT FROM PROC REG
Dependent Variable: quarsales
Analysis of Variance
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 1 | 14200 (SSR) | 14200 | 74.25 | <.0001 |
| Error | 8 | 1530 (SSE) | 191.25000 | | |
| Corrected Total | 9 | 15730 (SST) | | | |

Root MSE = 13.82932; R-Square = 0.9027 (the coefficient of determination)
Dependent Mean = 130.00000; Coeff Var = 10.63794
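The sums of squares in the ANOVA table, and the R-Square value, can be recomputed from the restaurant data. This Python sketch fits the line with the least squares formulas and then splits the total variation into regression and error parts. Variable names are illustrative.

```python
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]                # studentpop
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]    # quarsales

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)               # total
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # due to regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # due to error

r_squared = ssr / sst
f_value = (ssr / 1) / (sse / (n - 2))  # mean square model / mean square error

print(f"SST = {sst:g}, SSR = {ssr:g}, SSE = {sse:g}")  # 15730, 14200, 1530
print(f"r^2 = {r_squared:.4f}, F = {f_value:.2f}")     # 0.9027, 74.25
```

About 90% of the variation in quarterly sales is accounted for by the linear relationship with student population.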
Your Turn
Find the regression equation SBP = b0 + b1*age for the data below.
| Age | SBP |
|---|---|
| 42 | 130 |
| 46 | 115 |
| 42 | 148 |
| 71 | 100 |
| 80 | 156 |
| 74 | 162 |
| 70 | 151 |
| 80 | 156 |
| 85 | 162 |
| 72 | 158 |
| 64 | 155 |
| 81 | 160 |
| 41 | 125 |
| 61 | 150 |
| 75 | 165 |