1 Simple Linear Regression Linear regression model Prediction Limitation Correlation.
Week 3: Basic regression - 2. Introduction to linear model
Transcript of Week 3: Basic regression - 2. Introduction to linear model
![Page 1: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/1.jpg)
Week 3: Basic regression
2. Introduction to linear model
Stat 140 - 04
Mount Holyoke College
Dr. Shan Shan Slides posted at http://sshanshans.github.io/stat140
![Page 2: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/2.jpg)
Outline
1. Main ideas1. Line of best fit2. Finding the least square line in R3. Prediction4. Conditions for linear regression
2. Summary
![Page 3: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/3.jpg)
Outline
1. Main ideas1. Line of best fit2. Finding the least square line in R3. Prediction4. Conditions for linear regression
2. Summary
![Page 4: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/4.jpg)
Mortality and potion
Scientists believe that water with high concentrations of calciumand magnesium is beneficial for health.
We have recordings of the mortality rate (deaths per 100,000population) and concentration of calcium in drinking water(parts per million) in 61 large towns in England and Wales.
1
![Page 5: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/5.jpg)
Review of terms
I Response variable: variable whose behavior or variationyou are trying to understand, on the y-axis (dependentvariable)
I Explanatory variables: other variables that you want touse to explain the variation in the response, on the x-axis(independent variables); these are also referred to aspredictors or features.
2
![Page 6: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/6.jpg)
Add a line
3
![Page 7: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/7.jpg)
What is the best line of fit?
What qualities are important here?
4
![Page 8: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/8.jpg)
Residuals
Residual = Observed - Predicted
1. Predicted value y: estimate made from a model2. Observed value y: value in the dataset
5
![Page 9: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/9.jpg)
The line of best fit
The line of best fit is the line for which the sum of the squaredresiduals is smallest, the least squares line.
6
![Page 10: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/10.jpg)
Outline
1. Main ideas1. Line of best fit2. Finding the least square line in R3. Prediction4. Conditions for linear regression
2. Summary
![Page 11: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/11.jpg)
Lines
The algebraic equation for a line is
Y = b0 + b1X
The use of coordinate axes to show functional relationships wasinvented by Rene Descartes (1596-1650). He was an artilleryofficer, and probably got the idea from pictures that showed thetrajectories of cannonballs.
7
![Page 12: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/12.jpg)
A new function
General form of ‘lm‘ command:
lm(y variable ∼ x variable, data = data frame)
Use this to estimate the intercept and the slope of line in theMortality/Health data.
linear fit ← lm(Mortality ∼ Calcium, data = mortality water)linear fit
Coefficients:
(Intercept) Calcium
1676.356 -3.226
8
![Page 13: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/13.jpg)
Regression line
From the coefficients, we can write the regression line in theMortality/Health data as
Mortality = 1676− 3 Calcium
Abstractly,
y = 1676− 3x
9
![Page 14: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/14.jpg)
Outline
1. Main ideas1. Line of best fit2. Finding the least square line in R3. Prediction4. Conditions for linear regression
2. Summary
![Page 15: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/15.jpg)
Predict
One of the towns in our sample had a measured Calciumconcentration of 71. What is the predicted value for themortality rate in that town?
Mortality = 1676− 3 Calcium
By hand:
Mortality = 1676− 3× 71 = 1463
The predicted value for the mortality rate in that town is 1463deaths per 100,000 population.
10
![Page 16: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/16.jpg)
Predict
One of the towns in our sample had a measured Calciumconcentration of 71. What is the predicted value for themortality rate in that town?
Mortality = 1676− 3 Calcium
By hand:
Mortality = 1676− 3× 71 = 1463
The predicted value for the mortality rate in that town is 1463deaths per 100,000 population.
10
![Page 17: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/17.jpg)
Predict
By R:The general form of ‘predict‘ command:
predict(linear model, newdata = data frame)
Code:
linear_fit <- lm(Mortality ~ Calcium, data = mortality_water)
predict_data <- data.frame( Calcium = 71)
predict(linear_fit, newdata = predict_data)
Output:
1
1447.303
The outputs by hand and by R are different because of roundingerrors.
11
![Page 18: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/18.jpg)
Predict
By R:The general form of ‘predict‘ command:
predict(linear model, newdata = data frame)
Code:
linear_fit <- lm(Mortality ~ Calcium, data = mortality_water)
predict_data <- data.frame( Calcium = 71)
predict(linear_fit, newdata = predict_data)
Output:
1
1447.303
The outputs by hand and by R are different because of roundingerrors.
11
![Page 19: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/19.jpg)
Outline
1. Main ideas1. Line of best fit2. Finding the least square line in R3. Prediction4. Conditions for linear regression
2. Summary
![Page 20: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/20.jpg)
Anscombe’s data
x1, y1
b0 = 3
b1 = 0.5
R2 = 67%
x2, y2
b0 = 3
b1 = 0.5
R2 = 67%
x3, y3
b0 = 3
b1 = 0.5
R2 = 67%
x4, y4
b0 = 3
b1 = 0.5
R2 = 67%
x5, y5
b0 = 3
b1 = 0.5
R2 = 67%
All 5 have essentially the same estimated intercept, slope, R2!
That means the five data sets should be pretty much the sameright?
12
![Page 21: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/21.jpg)
Anscombe’s data
The scatterplots tell a different story.
Words of caution: always plot your data!13
![Page 22: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/22.jpg)
Anscombe’s data
Is a linear model useful here?
14
![Page 23: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/23.jpg)
Checking conditions
Be sure to check the conditions for linear regression beforereporting or interpreting a linear model.
I From the scatterplot of y against x, check the– Straight Enough Condition Is the relationship between y and x
straight enough to proceed with a linear regression model?– Outlier Condition Are there any outliers that might
dramatically influence the fit of the least squares line?– Does the Plot Thicken? Condition Does the spread of the
data around the generally straight relationship seem to beconsistent for all values of x?
15
![Page 24: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/24.jpg)
Outline
1. Main ideas1. Line of best fit2. Finding the least square line in R3. Prediction4. Conditions for linear regression
2. Summary
![Page 25: Week 3: Basic regression - 2. Introduction to linear model](https://reader031.fdocuments.in/reader031/viewer/2022030110/621cc031323dd4143438d585/html5/thumbnails/25.jpg)
Summary of main ideas
1. Line of best fit
2. Finding the least square line in R
3. Prediction
4. Conditions for linear regression
16