Week 3: Basic regression

2. Introduction to linear model

Stat 140 - 04

Mount Holyoke College

Dr. Shan Shan

Slides posted at http://sshanshans.github.io/stat140

Outline

1. Main ideas
   1. Line of best fit
   2. Finding the least squares line in R
   3. Prediction
   4. Conditions for linear regression

2. Summary

Mortality and potion

Scientists believe that water with high concentrations of calcium and magnesium is beneficial for health.

We have recordings of the mortality rate (deaths per 100,000 population) and the concentration of calcium in drinking water (parts per million) in 61 large towns in England and Wales.

Review of terms

- Response variable: the variable whose behavior or variation you are trying to understand, on the y-axis (dependent variable)

- Explanatory variables: other variables that you want to use to explain the variation in the response, on the x-axis (independent variables); these are also referred to as predictors or features.

Add a line

What is the best line of fit?

What qualities are important here?

Residuals

Residual = Observed - Predicted

1. Predicted value ŷ: estimate made from a model
2. Observed value y: value in the dataset
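As a minimal sketch of the residual formula, the R code below computes residuals by hand and compares them with R's `residuals()` function. The data frame `toy` and its values are hypothetical, invented only for illustration; they are not the mortality data.

```r
# Hypothetical toy data, invented for illustration only.
toy <- data.frame(x = c(1, 2, 3, 4, 5),
                  y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Fit a line, then get the predicted value for each observed x.
toy_fit   <- lm(y ~ x, data = toy)
predicted <- fitted(toy_fit)

# Residual = Observed - Predicted
residual_by_hand <- toy$y - predicted

# R's built-in residuals() should agree with the by-hand calculation.
all.equal(unname(residual_by_hand), unname(residuals(toy_fit)))
```

A positive residual means the observed point lies above the line; a negative residual means it lies below.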

The line of best fit

The line of best fit is the line for which the sum of the squared residuals is smallest: the least squares line.
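One way to see what "smallest sum of squared residuals" means is to compare the least squares line with a few other candidate lines. The sketch below uses the first x/y pair of R's built-in `anscombe` data set as a stand-in (an assumption made so the code runs without the mortality data); the helper `ssr` and the two alternative lines are hypothetical.

```r
# Built-in data set: first pair of columns from Anscombe's data.
x <- anscombe$x1
y <- anscombe$y1

# Sum of squared residuals for a candidate line y = b0 + b1 * x.
ssr <- function(b0, b1) sum((y - (b0 + b1 * x))^2)

# Least squares estimates from lm().
fit    <- lm(y ~ x)
b0_hat <- unname(coef(fit)[1])
b1_hat <- unname(coef(fit)[2])

# Compare the least squares line with two hypothetical alternatives.
ssr(b0_hat, b1_hat)        # least squares line
ssr(b0_hat + 0.5, b1_hat)  # same slope, shifted intercept
ssr(b0_hat, b1_hat + 0.1)  # same intercept, steeper slope
```

Any change to the intercept or slope should give a larger sum than the first call.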

The algebraic equation for a line is

Y = b0 + b1X

The use of coordinate axes to show functional relationships was invented by Rene Descartes (1596-1650). He was an artillery officer, and probably got the idea from pictures that showed the trajectories of cannonballs.

A new function

General form of the 'lm' command:

lm(y_variable ~ x_variable, data = data_frame)

Use this to estimate the intercept and the slope of the line in the Mortality/Health data.

linear_fit <- lm(Mortality ~ Calcium, data = mortality_water)

linear_fit

Coefficients:
(Intercept)      Calcium
   1676.356       -3.226

Regression line

From the coefficients, we can write the regression line in the Mortality/Health data (with rounded coefficients) as

Mortality = 1676 − 3 × Calcium

Abstractly,

y = 1676 − 3x

Predict

One of the towns in our sample had a measured Calcium concentration of 71. What is the predicted value for the mortality rate in that town?

By hand:

Mortality = 1676 − 3 × 71 = 1463

The predicted value for the mortality rate in that town is 1463 deaths per 100,000 population.

Predict

By R:

The general form of the 'predict' command:

predict(linear_model, newdata = data_frame)

linear_fit <- lm(Mortality ~ Calcium, data = mortality_water)

predict_data <- data.frame( Calcium = 71)

predict(linear_fit, newdata = predict_data)

Output:

1447.303

The outputs by hand and by R differ because the by-hand calculation uses rounded coefficients (1676 and −3 in place of 1676.356 and −3.226).
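The size of the gap can be checked with plain arithmetic. The sketch below uses only the coefficients printed in the `lm` output (1676.356 and -3.226) and the Calcium value 71; the variable names are hypothetical. Because the printed coefficients are themselves rounded to three decimals, the result may still differ from the `predict` output in the last digits.

```r
# Coefficients as printed in the lm() output, and the Calcium value from the example.
b0      <- 1676.356
b1      <- -3.226
calcium <- 71

# Prediction with the printed (three-decimal) coefficients.
pred_full <- b0 + b1 * calcium

# Prediction with coefficients rounded to whole numbers, as in the by-hand calculation.
pred_rounded <- round(b0) + round(b1) * calcium

pred_full
pred_rounded
pred_rounded - pred_full
```

Most of the gap comes from rounding the slope: a change of 0.226 in the slope is multiplied by 71.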

Anscombe’s data

Data set   b0   b1    R²
x1, y1     3    0.5   67%
x2, y2     3    0.5   67%
x3, y3     3    0.5   67%
x4, y4     3    0.5   67%
x5, y5     3    0.5   67%

All 5 have essentially the same estimated intercept, slope, and R²!

That means the five data sets should be pretty much the same, right?

Anscombe’s data

The scatterplots tell a different story.

Words of caution: always plot your data!
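The comparison can be reproduced, at least in part, with the `anscombe` data set that ships with R. Note the assumption: the built-in data set has four x/y pairs (x1..x4, y1..y4), so this sketch covers four sets only and may not match the exact data used on the slides. The names `fits` and `summary_table` are hypothetical.

```r
# anscombe ships with base R (datasets package): columns x1..x4 and y1..y4.
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# Intercept, slope, and R-squared for each fit: nearly identical across sets.
summary_table <- t(sapply(fits, function(f) {
  c(b0 = unname(coef(f)[1]),
    b1 = unname(coef(f)[2]),
    R2 = summary(f)$r.squared)
}))
round(summary_table, 2)

# The scatterplots, each with its fitted line, look very different.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i))
  abline(fits[[i]])
}
par(op)
```

The table alone gives no hint of the differences; only the plots reveal them.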

Anscombe’s data

Is a linear model useful here?

Checking conditions

Be sure to check the conditions for linear regression before reporting or interpreting a linear model.

- From the scatterplot of y against x, check the
  - Straight Enough Condition: Is the relationship between y and x straight enough to proceed with a linear regression model?
  - Outlier Condition: Are there any outliers that might dramatically influence the fit of the least squares line?
  - Does the Plot Thicken? Condition: Does the spread of the data around the generally straight relationship seem to be consistent for all values of x?
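A rough sketch of how these checks might look in R is given below. It uses the second x/y pair of the built-in `anscombe` data as a stand-in, not the mortality data. The residuals-against-fitted plot is an addition not mentioned on the slide; it is a common companion to the scatterplot, and the name `fit2` is hypothetical.

```r
# Illustration with the built-in anscombe data (second pair), not the mortality data.
fit2 <- lm(y2 ~ x2, data = anscombe)

op <- par(mfrow = c(1, 2))

# Scatterplot of y against x with the fitted line:
# look for curvature (Straight Enough) and unusual points (Outlier).
plot(anscombe$x2, anscombe$y2, xlab = "x2", ylab = "y2")
abline(fit2)

# Residuals against fitted values: a bend suggests the relationship is not
# straight; a funnel shape suggests the spread changes with x.
plot(fitted(fit2), residuals(fit2),
     xlab = "Fitted value", ylab = "Residual")
abline(h = 0, lty = 2)

par(op)
```

These plots are judged by eye; there is no single numeric cutoff for any of the three conditions.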

Summary of main ideas

1. Line of best fit

2. Finding the least squares line in R

3. Prediction

4. Conditions for linear regression
