Linear regression models

Simple Linear Regression

History

• Developed by Sir Francis Galton (1822-1911) in his article “Regression towards mediocrity in hereditary stature”

Purposes:

• To describe the linear relationship between two continuous variables, the response variable (y-axis) and a single predictor variable (x-axis)

• To determine how much of the variation in Y can be explained by the linear relationship with X, and how much of that variation remains unexplained

• To predict new values of Y from new values of X

The linear regression model is:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

• Xi and Yi are paired observations (i = 1 to n)

• β0 = population intercept (the value of Yi when Xi = 0)

• β1 = population slope (measures the change in Yi per unit change in Xi)

• εi = the random or unexplained error associated with the i-th observation. The εi are assumed to be independent and distributed as N(0, σ²).
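To make the model concrete, here is a minimal Python sketch (the sample size, parameter values, and random seed are invented for illustration) that simulates paired observations from this model:

```python
import numpy as np

rng = np.random.default_rng(42)

n = 50                    # number of paired observations
beta0, beta1 = 2.0, 0.5   # hypothetical population intercept and slope
sigma = 1.0               # standard deviation of the errors

X = rng.uniform(0, 10, n)        # predictor values
eps = rng.normal(0, sigma, n)    # independent N(0, sigma^2) errors
Y = beta0 + beta1 * X + eps      # response: Yi = beta0 + beta1*Xi + eps_i
```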

[Figure: a straight line on Y-vs-X axes, showing the intercept β0 where the line meets the Y-axis and the slope β1 as the rise per unit (1.0) run along X.]

Linear models approximate non-linear functions over a limited domain.

[Figure: a curved function approximated by a fitted line; within the range of the data the fit is interpolation, and beyond the data on either side it is extrapolation.]

Yi = β0 + β1Xi + εi

εi ~ N(0, σ²), so E(εi) = 0 and:

E(Yi) = β0 + β1Xi

[Figure: the regression line E(Y) = β0 + β1X with normal error distributions around it, showing the expected values E(Y1) and E(Y2) at two predictor values X1 and X2.]

• For a given value of X, the sampled Y values are independent with normally distributed errors:

[Figure: at a given Xi, the observed value Yi, the fitted value Ŷi on the regression line, and the residual Yi − Ŷi = εi between them.]

Fitting data to a linear model:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

The residual:

$$d_i = Y_i - \hat{Y}_i$$

The residual sum of squares:

$$RSS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
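A minimal Python sketch of these quantities (the toy data and the assumed estimates b0_hat, b1_hat are invented for illustration):

```python
import numpy as np

# Toy data and assumed parameter estimates, for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
b0_hat, b1_hat = 1.0, 1.0

Y_hat = b0_hat + b1_hat * X   # fitted values
d = Y - Y_hat                 # residuals d_i = Y_i - Yhat_i
RSS = np.sum(d**2)            # residual sum of squares
```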

Estimating Regression Parameters

• The “best fit” estimates for the regression population parameters (β0 and β1) are the values that minimize the residual sum of squares (SSresidual), the sum of squared differences between each observed value and the value predicted by the model:

Choose $\hat{\beta}_0$ and $\hat{\beta}_1$ to minimize

$$\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \left( Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i) \right)^2$$

Sum of squares:

$$SS_Y = \sum_{i=1}^{n} (Y_i - \bar{Y})(Y_i - \bar{Y}) = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

Sum of cross products:

$$SS_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

Least-squares parameter estimates

$$SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2$$

$$\hat{\beta}_1 = \frac{SS_{XY}}{SS_X} = \frac{s_{XY}}{s_X^2}$$

where the sample variance of X is

$$s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})$$

and the sample covariance is

$$s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

Written out in full:

$$\hat{\beta}_1 = \frac{s_{XY}}{s_X^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})} = \frac{SS_{XY}}{SS_X}$$

Solving for the intercept:

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

Thus, our estimated regression equation is:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$
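A minimal Python sketch of these least-squares formulas (the toy X and Y arrays are invented for illustration), with a cross-check against NumPy's built-in fit:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

Xbar, Ybar = X.mean(), Y.mean()
SS_X  = np.sum((X - Xbar)**2)             # sum of squares of X
SS_XY = np.sum((X - Xbar) * (Y - Ybar))   # sum of cross products

b1_hat = SS_XY / SS_X          # slope estimate
b0_hat = Ybar - b1_hat * Xbar  # intercept estimate

# Cross-check against NumPy's degree-1 polynomial fit (slope first)
slope, intercept = np.polyfit(X, Y, 1)
assert np.allclose([slope, intercept], [b1_hat, b0_hat])
print(f"Yhat = {b0_hat:.3f} + {b1_hat:.3f} * X")
```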

Hypothesis Tests with Regression

• Null hypothesis is that there is no linear relationship between X and Y:

H0: β1 = 0 (Yi = β0 + εi)

HA: β1 ≠ 0 (Yi = β0 + β1Xi + εi)

• We can use an F-ratio (i.e., the ratio of variances) to test these hypotheses

Variance of the error of regression:

$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2} = \frac{SS_{residual}}{n-2}$$

NOTE: this is also referred to as residual variance, mean squared error (MSE) or residual mean square (MSresidual)

Mean square of regression:

$$MS_{regression} = \frac{SS_{regression}}{1} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$

The F-ratio is: (MSRegression)/(MSResidual)

This ratio follows the F-distribution with (1, n-2) degrees of freedom
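A sketch of this F-test in Python (scipy.stats provides the F-distribution; the toy data are invented for illustration):

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n = len(X)

# Least-squares fit
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

MS_reg = np.sum((Y_hat - Y.mean())**2) / 1   # regression mean square, df = 1
MS_res = np.sum((Y - Y_hat)**2) / (n - 2)    # residual mean square, df = n - 2

F = MS_reg / MS_res
p = stats.f.sf(F, 1, n - 2)   # upper-tail probability of F(1, n-2)
print(f"F = {F:.3f}, p = {p:.5f}")
```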

Variance components and Coefficient of determination

$$SS_Y = SS_{reg} + RSS$$

$$SS_{reg} = SS_Y - RSS$$

Coefficient of determination:

$$r^2 = \frac{SS_{reg}}{SS_Y} = \frac{SS_{reg}}{SS_{reg} + RSS}$$
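The decomposition and r² can be checked directly in Python (same invented toy data as the earlier sketches):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
Y_hat = Y.mean() + b1 * (X - X.mean())   # fitted values (b0 = Ybar - b1*Xbar)

SS_Y   = np.sum((Y - Y.mean())**2)
SS_reg = np.sum((Y_hat - Y.mean())**2)
RSS    = np.sum((Y - Y_hat)**2)

assert np.isclose(SS_Y, SS_reg + RSS)   # variance decomposition holds
r2 = SS_reg / SS_Y                      # coefficient of determination
print(f"r^2 = {r2:.3f}")
```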

ANOVA table for regression

Source     | df    | Sum of squares        | Mean square   | Expected mean square   | F ratio
Regression | 1     | SSreg = Σ(Ŷi − Ȳ)²    | SSreg / 1     | σ² + β1² Σ(Xi − X̄)²    | (SSreg / 1) / (RSS / (n − 2))
Residual   | n − 2 | RSS = Σ(Yi − Ŷi)²     | RSS / (n − 2) | σ²                     |
Total      | n − 1 | SSY = Σ(Yi − Ȳ)²      |               | σY²                    |

(All sums run from i = 1 to n.)

Product-moment correlation coefficient

$$r = \frac{SS_{XY}}{\sqrt{SS_X \, SS_Y}} = \frac{s_{XY}}{s_X s_Y}$$
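A quick Python sketch (invented toy data) confirming this formula against NumPy's built-in correlation, and its link to the regression r²:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

SS_X  = np.sum((X - X.mean())**2)
SS_Y  = np.sum((Y - Y.mean())**2)
SS_XY = np.sum((X - X.mean()) * (Y - Y.mean()))

r = SS_XY / np.sqrt(SS_X * SS_Y)          # product-moment correlation
assert np.isclose(r, np.corrcoef(X, Y)[0, 1])

SS_reg = SS_XY**2 / SS_X                  # regression sum of squares
assert np.isclose(r**2, SS_reg / SS_Y)    # r squared = coefficient of determination
```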

Parametric Confidence Intervals

• If we assume our parameter of interest has a particular sampling distribution and we have estimated its expected value and variance, we can construct a confidence interval for a given percentile.

• Example: if we assume Y is a normal random variable with unknown mean μ and variance σ², then $(\bar{Y} - \mu)/\sigma_{\bar{Y}}$ is distributed as a standard normal variable. But, since we don’t know σ, we must divide by the standard error instead: $(\bar{Y} - \mu)/s_{\bar{Y}}$, giving us a t-distribution with (n − 1) degrees of freedom.

• The 100(1 − α)% confidence interval for μ is then given by:

$$\bar{Y} - t_{(1-\alpha/2;\, n-1)} \, s_{\bar{Y}} \;\le\; \mu \;\le\; \bar{Y} + t_{(1-\alpha/2;\, n-1)} \, s_{\bar{Y}}$$

• IMPORTANT: this does not mean “There is a 100(1 − α)% chance that the true population mean μ occurs inside this interval.” It means that if we were to repeatedly sample the population in the same way, 100(1 − α)% of the confidence intervals would contain the true population mean μ.
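A minimal Python sketch of this interval (the sample values and α are invented; scipy.stats.t supplies the t quantile):

```python
import numpy as np
from scipy import stats

Y = np.array([4.1, 5.3, 4.8, 6.0, 5.2, 4.6])   # hypothetical sample
n, alpha = len(Y), 0.05

Ybar = Y.mean()
s_Ybar = Y.std(ddof=1) / np.sqrt(n)             # standard error of the mean
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)   # t quantile, n - 1 df

lo, hi = Ybar - t_crit * s_Ybar, Ybar + t_crit * s_Ybar
print(f"{100 * (1 - alpha):.0f}% CI for mu: ({lo:.3f}, {hi:.3f})")
```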

Publication form of ANOVA table for regression

Source     | Sum of Squares | df | Mean Square | F      | Sig.
Regression | 11.479         | 1  | 11.479      | 21.044 | 0.00035
Residual   | 8.182          | 15 | 0.545       |        |
Total      | 19.661         | 16 |             |        |

Variance of estimated intercept

$$\hat{\sigma}^2_{\hat{\beta}_0} = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{\bar{X}^2}{SS_X} \right]$$

$$\hat{\beta}_0 - t_{\alpha/2,\, n-2} \, \hat{\sigma}_{\hat{\beta}_0} \;\le\; \beta_0 \;\le\; \hat{\beta}_0 + t_{\alpha/2,\, n-2} \, \hat{\sigma}_{\hat{\beta}_0}$$

Variance of the slope estimator

$$\hat{\sigma}^2_{\hat{\beta}_1} = \frac{\hat{\sigma}^2}{SS_X}$$

$$\hat{\beta}_1 - t_{\alpha/2,\, n-2} \, \hat{\sigma}_{\hat{\beta}_1} \;\le\; \beta_1 \;\le\; \hat{\beta}_1 + t_{\alpha/2,\, n-2} \, \hat{\sigma}_{\hat{\beta}_1}$$
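A Python sketch computing both standard errors and their confidence intervals (invented toy data; α = 0.05 assumed):

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n, alpha = len(X), 0.05

SS_X = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / SS_X
b0 = Y.mean() - b1 * X.mean()
s2 = np.sum((Y - (b0 + b1 * X))**2) / (n - 2)   # residual variance

se_b0 = np.sqrt(s2 * (1 / n + X.mean()**2 / SS_X))   # SE of intercept
se_b1 = np.sqrt(s2 / SS_X)                           # SE of slope

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(f"beta0: {b0:.3f} +/- {t_crit * se_b0:.3f}")
print(f"beta1: {b1:.3f} +/- {t_crit * se_b1:.3f}")
```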

Variance of the fitted value

$$\hat{\sigma}^2_{\hat{Y}|X} = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(X - \bar{X})^2}{SS_X} \right]$$

$$\hat{Y} - t_{\alpha/2,\, n-2} \, \hat{\sigma}_{\hat{Y}|X} \;\le\; E(Y|X) \;\le\; \hat{Y} + t_{\alpha/2,\, n-2} \, \hat{\sigma}_{\hat{Y}|X}$$

Variance of the predicted value (Ỹ):

$$\hat{\sigma}^2_{(\tilde{Y}|\tilde{X})} = \hat{\sigma}^2 \left[ 1 + \frac{1}{n} + \frac{(\tilde{X} - \bar{X})^2}{SS_X} \right]$$

$$\hat{\tilde{Y}} - t_{\alpha/2,\, n-2} \, \hat{\sigma}_{(\tilde{Y}|\tilde{X})} \;\le\; \tilde{Y} \;\le\; \hat{\tilde{Y}} + t_{\alpha/2,\, n-2} \, \hat{\sigma}_{(\tilde{Y}|\tilde{X})}$$
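The prediction interval differs from the confidence interval for the fitted value only by the extra “1 +” term. A Python sketch of both (invented toy data; the new predictor value X_new is hypothetical):

```python
import numpy as np
from scipy import stats

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
n, alpha, X_new = len(X), 0.05, 3.5

SS_X = np.sum((X - X.mean())**2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / SS_X
b0 = Y.mean() - b1 * X.mean()
s2 = np.sum((Y - (b0 + b1 * X))**2) / (n - 2)

Y_new = b0 + b1 * X_new
se_fit  = np.sqrt(s2 * (1 / n + (X_new - X.mean())**2 / SS_X))       # mean response
se_pred = np.sqrt(s2 * (1 + 1 / n + (X_new - X.mean())**2 / SS_X))   # new observation

t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
print(f"CI for E(Y|X):       {Y_new:.3f} +/- {t_crit * se_fit:.3f}")
print(f"Prediction interval: {Y_new:.3f} +/- {t_crit * se_pred:.3f}")
```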

Regression

[Figure: scatterplot of ln(number of species) against ln(island area), with a fitted regression line; x-axis from −2 to 10, y-axis from 1 to 8.]

Assumptions of regression

• The linear model correctly describes the functional relationship between X and Y

• The X variable is measured without error

• For a given value of X, the sampled Y values are independent with normally distributed errors

• Variances are constant along the regression line

Residual plot for species-area relationship

[Figure: unstandardized residuals plotted against unstandardized predicted values (roughly 2.5 to 6.0), scattered between −1.5 and 1.5 around zero with no obvious pattern.]
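A sketch of how such a residual plot can be produced with matplotlib (the data are invented; the original species-area values are not shown here):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ln(area), ln(species) data for illustration
X = np.array([0.5, 1.8, 3.0, 4.2, 5.5, 6.8, 8.0])
Y = np.array([2.2, 3.1, 3.8, 4.6, 5.0, 6.1, 6.5])

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
b0 = Y.mean() - b1 * X.mean()
Y_hat = b0 + b1 * X

plt.scatter(Y_hat, Y - Y_hat)    # residuals vs. predicted values
plt.axhline(0, linestyle="--")   # reference line at zero
plt.xlabel("Unstandardized predicted value")
plt.ylabel("Unstandardized residual")
plt.show()
```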