Time Series Analysis – Chapter 2 Simple Regression


Page 1: Time Series Analysis – Chapter 2 Simple Regression

Time Series Analysis – Chapter 2 Simple Regression

Essentially, all models are wrong, but some are useful. - George Box

Empirical Model-Building and Response Surfaces (1987), co-authored with Norman R. Draper, p. 424, ISBN 0471810339

George Box is the son-in-law of Sir Ronald Fisher.

Page 2: Time Series Analysis – Chapter 2 Simple Regression

Time Series Analysis – Chapter 2 Simple Regression

Equation of a Line – Algebra

vs.

Simple Regression – Statistics

Page 3: Time Series Analysis – Chapter 2 Simple Regression

Equation of a Line Example

y = mx + b

wage = 3.55 educ – 33.8

y = wage in dollars per hour
x = education in years completed

Note: if I know how many years of education someone has completed I can predict their wage perfectly. Nothing else matters.

Page 4: Time Series Analysis – Chapter 2 Simple Regression

Simple Regression Example

$y = \beta_0 + \beta_1 x + u$

y = wage per hour (dollars) – dependent variable
x = education completed (years) – independent variable
$\beta_0$ = unknown intercept
$\beta_1$ = unknown slope
u = error term – factors other than x that affect y

Page 5: Time Series Analysis – Chapter 2 Simple Regression
Page 6: Time Series Analysis – Chapter 2 Simple Regression

Simple Regression Example

Need to estimate $\beta_0$ and $\beta_1$:
• Collect data
• Conduct a “regression analysis”

Page 7: Time Series Analysis – Chapter 2 Simple Regression

Algebra vs. Statistics - Summary

Algebra: wage = 3.55 educ – 33.8
Deterministic Model

Statistics: $wage = \beta_0 + \beta_1\,educ + u$
Stochastic Model

Page 8: Time Series Analysis – Chapter 2 Simple Regression

Algebra vs. Statistics - Summary

All factors affecting y (wage) other than x (education) are considered unobservable. The error term u represents the effect of these other factors.

Upshot: u is independent of x.
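A small simulation can make the contrast between the deterministic and the stochastic model concrete. This is only a sketch: the intercept and slope are borrowed from the algebra example as stand-ins for the unknown parameters, and the error spread `SIGMA` is a made-up value. Because u is drawn independently of education and averages to zero, individual wages scatter around the line while their average stays close to it.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Illustrative values only: the intercept and slope are borrowed from the
# algebra example, and SIGMA is a made-up spread for the error term u.
BETA0, BETA1, SIGMA = -33.8, 3.55, 2.0


def deterministic_wage(educ):
    """Algebra: the wage is fixed exactly by education."""
    return BETA0 + BETA1 * educ


def stochastic_wage(educ):
    """Statistics: the same line plus an error u drawn independently of educ."""
    u = random.gauss(0.0, SIGMA)
    return deterministic_wage(educ) + u


educ = 14
draws = [stochastic_wage(educ) for _ in range(20000)]
average_wage = sum(draws) / len(draws)

print("line value at educ = 14:", round(deterministic_wage(educ), 2))
print("first three simulated wages:", [round(w, 2) for w in draws[:3]])
print("average simulated wage:", round(average_wage, 2))
```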

Page 9: Time Series Analysis – Chapter 2 Simple Regression

$E(y \mid x) = \beta_0 + \beta_1 x$ if $E(u \mid x) = 0$

Page 10: Time Series Analysis – Chapter 2 Simple Regression

$E(y \mid x) = \beta_0 + \beta_1 x$ if $E(u \mid x) = 0$

• Equation tells us how the “average” value of y changes or is related to a particular x value.

• $\widehat{GPA} = 0.568 + 0.102\,ACT$

Student  GPA  ACT
1        2.8  21
2        3.4  24
3        3.0  26
4        3.5  27
5        3.6  29
6        3.0  25
7        2.7  25
8        3.7  30

Page 11: Time Series Analysis – Chapter 2 Simple Regression

$\widehat{GPA} = 0.568 + 0.102\,ACT$
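The fitted line can be reproduced from the eight (GPA, ACT) pairs in the student table. The sketch below uses the textbook closed-form least-squares formulas in plain Python; the helper name `ols_fit` is our own and is not taken from the slides.

```python
# Data from the GPA/ACT student table.
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
act = [21, 24, 26, 27, 29, 25, 25, 30]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = ols_fit(act, gpa)
print(f"GPA-hat = {b0:.3f} + {b1:.3f} ACT")  # GPA-hat = 0.568 + 0.102 ACT
```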

Page 12: Time Series Analysis – Chapter 2 Simple Regression

The Analysis of Variance Table

Analysis of Variance

Source          DF  SS       MS       F     P
Regression       1  0.59402  0.59402  8.20  0.029
Residual Error   6  0.43473  0.07245
Total            7  1.02875

Page 13: Time Series Analysis – Chapter 2 Simple Regression

ANOVA

Models can be evaluated by examining variability.

There are three types of variability that are quantified.
• Overall or total variability present in the data (SST)
• Variability explained by the regression model (SSR)
• Error variability that is unexplained (SSE)

SST = SSR + SSE
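The decomposition can be checked on the GPA/ACT data. The sketch below recomputes the three sums of squares, the mean squares, and the F statistic shown in the Analysis of Variance table; the variable names are our own. The P value is left out because it needs the F distribution, which the Python standard library does not provide.

```python
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
act = [21, 24, 26, 27, 29, 25, 25, 30]

n = len(gpa)
x_bar = sum(act) / n
y_bar = sum(gpa) / n

# Least-squares slope and intercept (closed form).
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa)) / sum(
    (x - x_bar) ** 2 for x in act
)
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in act]

sst = sum((y - y_bar) ** 2 for y in gpa)             # total variability
ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained by the model
sse = sum((y - f) ** 2 for y, f in zip(gpa, fitted)) # unexplained

msr = ssr / 1        # regression DF = 1
mse = sse / (n - 2)  # residual DF = n - 2 = 6
f_stat = msr / mse

print(f"SSR = {ssr:.5f}, SSE = {sse:.5f}, SST = {sst:.5f}")
print(f"MSR = {msr:.5f}, MSE = {mse:.5f}, F = {f_stat:.2f}")
```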

Page 14: Time Series Analysis – Chapter 2 Simple Regression

ANOVA

The larger the regression variability (SSR) is compared to the error variability (SSE), the more evidence there is that the model has explanatory power.

Analysis of Variance

Source          DF  SS       MS       F     P
Regression       1  0.59402  0.59402  8.20  0.029
Residual Error   6  0.43473  0.07245
Total            7  1.02875

Page 15: Time Series Analysis – Chapter 2 Simple Regression

ANOVA – $R^2$

$R^2$ is the Coefficient of Determination

$R^2 = SSR/SST = 1 - SSE/SST$ (TYPO on pg. 40!!)

$R^2$ is the percent of the variation in y (response variable) explained by x (explanatory variable).

R-Sq = SSR/SST = 0.59402/1.02875 = 57.7%

Page 16: Time Series Analysis – Chapter 2 Simple Regression

ANOVA – r

r is the correlation coefficient and $r = \pm\sqrt{R^2}$
• Positive if a positive relationship is present
• Negative if a negative relationship is present

For the GPA/ACT example: $r = +\sqrt{0.577} \approx 0.7596$
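The link between r and R-Sq can be checked numerically on the GPA/ACT data. The sketch below computes r two ways: as the signed square root of R-Sq (the sign taken from the slope) and directly from the sample correlation formula. Variable names are our own. Without intermediate rounding the result is about 0.7599; 0.7596 is what comes out when R-Sq is first rounded to 0.577.

```python
import math

gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
act = [21, 24, 26, 27, 29, 25, 25, 30]

n = len(gpa)
x_bar = sum(act) / n
y_bar = sum(gpa) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa))
s_xx = sum((x - x_bar) ** 2 for x in act)
s_yy = sum((y - y_bar) ** 2 for y in gpa)  # this is SST

slope = s_xy / s_xx
r_squared = s_xy ** 2 / (s_xx * s_yy)  # equals SSR/SST in simple regression

# r carries the sign of the slope.
r_from_r2 = math.copysign(math.sqrt(r_squared), slope)
# Sample correlation computed directly.
r_direct = s_xy / math.sqrt(s_xx * s_yy)

print(f"R-Sq = {r_squared:.4f}")
print(f"r from R-Sq = {r_from_r2:.4f}, r computed directly = {r_direct:.4f}")
```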

Page 17: Time Series Analysis – Chapter 2 Simple Regression

ANOVA – $R^2$ vs. r

$R^2$ always exists for simple regression and multiple regression and always has the same definition

r only exists and makes sense for simple regression

Page 18: Time Series Analysis – Chapter 2 Simple Regression

Nobel Prizes vs. # of McDonald's

• Explanatory variable is the number of McDonald's restaurants a country has

• Response variable is the number of Nobel Prizes that have been awarded to that country.

Page 19: Time Series Analysis – Chapter 2 Simple Regression

Logs

Page 20: Time Series Analysis – Chapter 2 Simple Regression

Level – Level Model
• Dependent variable: y
• Independent variable: x

• $\widehat{GPA} = 0.568 + 0.102\,ACT$ (verify)

• (interpret)

Student  GPA  ACT
1        2.8  21
2        3.4  24
3        3.0  26
4        3.5  27
5        3.6  29
6        3.0  25
7        2.7  25
8        3.7  30

Page 21: Time Series Analysis – Chapter 2 Simple Regression

Level – Log Model
• Dependent variable: y
• Independent variable: log(x)

• Not used in this chapter, discussed in future chapters.

Page 22: Time Series Analysis – Chapter 2 Simple Regression

Log – Level Model
• Dependent variable: log(y)
• Independent variable: x

• $\widehat{\log(GPA)} = 0.341 + 0.0317\,ACT$ (verify)

Student  GPA  ACT  log(GPA)
1        2.8  21   1.02962
2        3.4  24   1.22378
3        3.0  26   1.09861
4        3.5  27   1.25276
5        3.6  29   1.28093
6        3.0  25   1.09861
7        2.7  25   0.99325
8        3.7  30   1.30833

Page 23: Time Series Analysis – Chapter 2 Simple Regression

Log – Level Model
• Dependent variable: log(y)
• Independent variable: x
• $\widehat{\log(GPA)} = 0.341 + 0.0317\,ACT$

• $\%\Delta y \approx (100\,\beta_1)\,\Delta x$ (see Appendix A)

• So, for every 1-point increase in ACT score, GPA should increase by about 3.17%.
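The log-level coefficients and the "about 3.17%" reading can both be checked from the table. The sketch below regresses the natural log of GPA on ACT and compares the usual approximation, 100 times the slope, with the exact percentage change implied by the fitted equation, 100(exp(slope) − 1). The helper `ols_fit` and the variable names are our own.

```python
import math

gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
act = [21, 24, 26, 27, 29, 25, 25, 30]
log_gpa = [math.log(g) for g in gpa]  # natural log, as in the table


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = ols_fit(act, log_gpa)

approx_pct = 100 * b1                  # rule-of-thumb reading of the slope
exact_pct = 100 * (math.exp(b1) - 1)   # exact change implied by the fitted equation

print(f"log(GPA)-hat = {b0:.3f} + {b1:.4f} ACT")
print(f"approximate change per ACT point: {approx_pct:.2f}%")
print(f"exact change per ACT point:       {exact_pct:.2f}%")
```

The two readings differ only slightly here because the slope is small; the gap widens as the coefficient grows.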

Page 24: Time Series Analysis – Chapter 2 Simple Regression

Level – Level Model
• $\widehat{GPA} = 0.568 + 0.102\,ACT$

Page 25: Time Series Analysis – Chapter 2 Simple Regression

Log – Level Model
• $\widehat{\log(GPA)} = 0.341 + 0.0317\,ACT$
• $\widehat{GPA} = e^{0.341 + 0.0317\,ACT}$

Page 26: Time Series Analysis – Chapter 2 Simple Regression

Is this still linear regression?
• $\log(GPA) = \beta_0 + \beta_1\,ACT + u$, and this equation is linear in the parameters $\beta_0$ and $\beta_1$!

Page 27: Time Series Analysis – Chapter 2 Simple Regression

Log – Log Model
• Dependent variable: log(y)
• Independent variable: log(x)

• $\widehat{\log(GPA)} = -1.41 + 0.791\,\log(ACT)$ (verify)

Student  GPA  ACT  log(GPA)  log(ACT)
1        2.8  21   1.02962   3.04452
2        3.4  24   1.22378   3.17805
3        3.0  26   1.09861   3.25810
4        3.5  27   1.25276   3.29584
5        3.6  29   1.28093   3.36730
6        3.0  25   1.09861   3.21888
7        2.7  25   0.99325   3.21888
8        3.7  30   1.30833   3.40120

Page 28: Time Series Analysis – Chapter 2 Simple Regression

Log – Log or Constant Elasticity Model
• Dependent variable: log(y)
• Independent variable: log(x)
• $\widehat{\log(GPA)} = -1.41 + 0.791\,\log(ACT)$

• $\%\Delta y \approx \beta_1\,\%\Delta x$ (see Appendix A)

Page 29: Time Series Analysis – Chapter 2 Simple Regression

Log – Log or Constant Elasticity Model
• $\widehat{\log(GPA)} = -1.41 + 0.791\,\log(ACT)$

• $\%\Delta y \approx \beta_1\,\%\Delta x$ (see Appendix A)

• $\hat{\beta}_1 = 0.791$ is the estimated elasticity of GPA with respect to ACT.

• A 1% increase in ACT implies a 0.791% increase in GPA.
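The elasticity reading can be verified the same way. The sketch below fits the log-log model on the table data and then evaluates what a 1% increase in ACT does to predicted GPA under the fitted equation, which multiplies GPA by 1.01 raised to the slope. Names such as `ols_fit` and `pct_change_gpa` are our own.

```python
import math

gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
act = [21, 24, 26, 27, 29, 25, 25, 30]
log_gpa = [math.log(g) for g in gpa]  # natural logs, as in the table
log_act = [math.log(a) for a in act]


def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = ols_fit(log_act, log_gpa)

# A 1% increase in ACT multiplies predicted GPA by 1.01 ** b1.
pct_change_gpa = 100 * (1.01 ** b1 - 1)

print(f"log(GPA)-hat = {b0:.2f} + {b1:.3f} log(ACT)")
print(f"1% more ACT -> about {pct_change_gpa:.3f}% more GPA")
```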

Page 30: Time Series Analysis – Chapter 2 Simple Regression

Simple Linear Regression Assumptions

• SLR.1: The model to be estimated must be linear in the parameters $\beta_0$ and $\beta_1$.

Page 31: Time Series Analysis – Chapter 2 Simple Regression

Simple Linear Regression Assumptions

• SLR.2: The sample of size n used to estimate the model parameters is a random sample (sometimes called a simple random sample).

What is the definition of a random sample?

Page 32: Time Series Analysis – Chapter 2 Simple Regression

Simple Linear Regression Assumptions

• SLR.3: The sample x values are not all the same value.

OK            NOT OK
x    y        x  y
3.4  24       3  24
3.0  26       3  26
3.5  27       3  27
3.6  29       3  29
3.0  25       3  25
2.7  25       3  25
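Why SLR.3 matters shows up directly in the slope formula: the denominator is the sum of squared deviations of x from its mean, which is zero when every x is the same. The sketch below runs both columns from the OK / NOT OK example through a small helper of our own, `slope_or_none`, which returns `None` when the slope is undefined.

```python
def slope_or_none(x, y):
    """Return the least-squares slope, or None when all x values are equal."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        # No variation in x: the slope formula would divide by zero.
        return None
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx


y = [24, 26, 27, 29, 25, 25]
x_ok = [3.4, 3.0, 3.5, 3.6, 3.0, 2.7]
x_not_ok = [3, 3, 3, 3, 3, 3]

print("OK sample slope:    ", slope_or_none(x_ok, y))
print("NOT OK sample slope:", slope_or_none(x_not_ok, y))
```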

Page 33: Time Series Analysis – Chapter 2 Simple Regression

Simple Linear Regression Assumptions

• SLR.4: The error variable u has an expected value of zero given any value of the explanatory variable x.

Page 34: Time Series Analysis – Chapter 2 Simple Regression

Simple Linear Regression Assumptions

• SLR.5: The error term u has the same variance (variability) associated with it given any value of the explanatory variable. In other words,

$\mathrm{Var}(u \mid x) = \sigma^2$

This is called homoskedasticity.

Page 35: Time Series Analysis – Chapter 2 Simple Regression

Ordinary Least Squares Estimators

• How do we estimate the parameters in the model $y = \beta_0 + \beta_1 x + u$?

• Ordinary Least Squares gives unique estimates of $\beta_0$ and $\beta_1$.

• Recall that the mean of $u$ is zero, so we don’t need to estimate it.

Page 36: Time Series Analysis – Chapter 2 Simple Regression

Ordinary Least Squares

• Minimize the sum of the squared residuals.

Page 37: Time Series Analysis – Chapter 2 Simple Regression

Ordinary Least Squares

Definition of residual: $\hat{u}_i = y_i - \hat{y}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$

Some are positive, some negative, and they sum to zero.

Student  GPA  ACT  RESI1
1        2.8  21    0.085714
2        3.4  24    0.379121
3        3.0  26   -0.225275
4        3.5  27    0.172527
5        3.6  29    0.068132
6        3.0  25   -0.123077
7        2.7  25   -0.423077
8        3.7  30    0.065934
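The RESI1 column can be reproduced by subtracting each fitted value from the observed GPA. The sketch below does that in plain Python (variable names are our own) and also checks two standard properties of least-squares residuals when the model includes an intercept: they sum to zero, and they are uncorrelated with x in the sample.

```python
gpa = [2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7]
act = [21, 24, 26, 27, 29, 25, 25, 30]

n = len(gpa)
x_bar = sum(act) / n
y_bar = sum(gpa) / n
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(act, gpa)) / sum(
    (x - x_bar) ** 2 for x in act
)
b0 = y_bar - b1 * x_bar

# Residual = observed GPA minus fitted GPA.
residuals = [y - (b0 + b1 * x) for x, y in zip(act, gpa)]

for student, (g, a, res) in enumerate(zip(gpa, act, residuals), start=1):
    print(f"{student}  {g:.1f}  {a}  {res: .6f}")

print("sum of residuals:    ", round(sum(residuals), 10))
print("sum of x * residual: ", round(sum(x * r for x, r in zip(act, residuals)), 10))
```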

Page 38: Time Series Analysis – Chapter 2 Simple Regression

Ordinary Least Squares

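Minimize $\sum_{i=1}^{n} (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2$ with respect to $\hat{\beta}_0$ and $\hat{\beta}_1$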

(see notes in class)
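For reference, the standard calculus derivation of the minimizers is sketched below. It is the usual textbook argument and may not match the in-class notes step for step; $b_0$ and $b_1$ denote candidate values of the intercept and slope, and $Q$ is our own name for the objective.

```latex
% Objective: sum of squared residuals as a function of the candidate values
Q(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

% First-order conditions: set both partial derivatives to zero
\frac{\partial Q}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
\frac{\partial Q}{\partial b_1} = -2 \sum_{i=1}^{n} x_i \, (y_i - b_0 - b_1 x_i) = 0

% Solving the two equations gives the OLS estimators
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
                     {\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

For the GPA/ACT data, $\bar{x} = 25.875$, $\bar{y} = 3.2125$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 5.8125$ and $\sum (x_i - \bar{x})^2 = 56.875$, so $\hat{\beta}_1 = 5.8125/56.875 \approx 0.102$ and $\hat{\beta}_0 = 3.2125 - 0.1022 \times 25.875 \approx 0.568$. The denominator is nonzero only when the x values are not all the same, which is the condition in SLR.3.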