2014 02 03 and 05 econ 141 uc berkeley


Ch.4: Simple Linear Regression

Econ 141 Spring 2014

Lecture: February 03 and 05, 2014

Bart Hobijn

2/03&05/2014 Econ 141, Spring 2014 1

The views expressed in these lecture notes are solely those of the instructor and do not necessarily reflect those of UC Berkeley or other institutions with which he is affiliated.

Example: Estimate MPC

• MPC: Marginal Propensity to Consume
  Suppose households' pre-tax income increases by a dollar. What fraction of this dollar would they end up spending, versus paying in taxes or saving?

• Basic equation
  $Y_i = \beta_0 + \beta_1 X_i + u_i$
  – $i$: MSA (metropolitan statistical area), the unit of observation
  – $Y_i$: average consumption expenditures per household
  – $X_i$: average pre-tax income per household
  – $\beta_0$: average consumption level at zero income
  – $\beta_1$: marginal propensity to consume (MPC)
  – $u_i$: MSA-specific deviation from the average linear relationship between income and spending

• How can we estimate the value of the MPC, i.e. $\beta_1$?

Income and spending by MSA

MSA ($i$)               Spending ($Y_i$)   Income ($X_i$)
Chicago                 57.7               74.4
Detroit                 50.5               79.8
Minneapolis-St. Paul    56.7               66.8
Cleveland               48.0               65.9
New York                58.7               80.2
Philadelphia            53.5               71.7
Boston                  65.0               79.8
Washington, D.C.        77.9               111.9
Baltimore               62.3               96.9
Atlanta                 51.9               71.2
Miami                   40.6               58.9
Dallas-Fort Worth       57.1               71.0
Houston                 58.2               73.5
Los Angeles             55.3               69.6
San Francisco           73.6               98.2
San Diego               56.2               76.4
Seattle                 60.7               74.1
Phoenix                 53.7               63.2

Note: Spending and income are annual averages across households, in thousands of dollars.
Source: Consumer Expenditure Survey

Data scatterplot

[Figure: scatterplot "Average Income and Expenditures by major MSA", one labeled point per MSA. Horizontal axis: Income (50 to 120). Vertical axis: Expenditures (30 to 90). Annual income and expenditures by household; thousands of dollars; 2012. Source: Consumer Expenditure Survey by MSA.]

Estimate of MPC ($\beta_1$)?

What is the best estimate of the line defined by $\beta_0$ and $\beta_1$?

Ordinary Least Squares

Simple linear regression model

$Y_i = \beta_0 + \beta_1 X_i + u_i$
– $i$: observation number
– $Y_i$: dependent variable (regressand)
– $X_i$: independent (explanatory) variable (regressor)
– $\beta_0$: intercept / constant
– $\beta_1$: slope coefficient
– $u_i$: error term / residual

Population regression line / population regression function, $\beta_0 + \beta_1 X_i$
Average linear relationship between the dependent and independent variable.

Error term / residual, $u_i$
Observation-specific deviation from the average linear relationship between the dependent and independent variable.

Why we need to estimate
Observed: sample $(Y_i, X_i)$ for $i = 1, \ldots, n$.
Unobserved: parameters $\beta_0$ and $\beta_1$, as well as the error terms $u_i$ for $i = 1, \ldots, n$.

Ordinary Least Squares (OLS)

• OLS estimates: choose $\hat\beta_0$ and $\hat\beta_1$ to minimize the sum of squared residuals (SSR)

$$
(\hat\beta_0, \hat\beta_1) = \arg\min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2
$$

• Properties:
  – What is the solution for $\hat\beta_0$, $\hat\beta_1$?
  – Are $\hat\beta_0$, $\hat\beta_1$ consistent estimates of the true $\beta_0$ and $\beta_1$?
  – Are $\hat\beta_0$, $\hat\beta_1$ unbiased estimates of the true $\beta_0$ and $\beta_1$?
  – What is their asymptotic distribution?
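
To make the minimization concrete, the sketch below evaluates the SSR objective on the MSA data for three candidate lines. It is only an illustration: the first candidate uses the coefficients reported in the Excel output later in these notes, and the other two lines are arbitrary alternatives picked here for comparison. The helper name `ssr` is hypothetical.

```python
# Income (X) and spending (Y) by MSA, in thousands of dollars, from the MSA table.
income = [74.4, 79.8, 66.8, 65.9, 80.2, 71.7, 79.8, 111.9, 96.9,
          71.2, 58.9, 71.0, 73.5, 69.6, 98.2, 76.4, 74.1, 63.2]
spending = [57.7, 50.5, 56.7, 48.0, 58.7, 53.5, 65.0, 77.9, 62.3,
            51.9, 40.6, 57.1, 58.2, 55.3, 73.6, 56.2, 60.7, 53.7]


def ssr(b0, b1, xs, ys):
    """Sum of squared residuals for the candidate line b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


# Candidate (intercept, slope) pairs. Only the first comes from the notes;
# the other two are arbitrary lines chosen for comparison.
candidates = {
    "reported OLS fit": (14.69690817, 0.558872878),
    "steeper line": (0.0, 0.75),
    "flatter line": (30.0, 0.35),
}

results = {name: ssr(b0, b1, income, spending)
           for name, (b0, b1) in candidates.items()}
for name, value in results.items():
    print(f"{name}: SSR = {value:.1f}")
```

The reported OLS coefficients should give the smallest SSR of the three. Because the table values are rounded to one decimal, the SSR computed here is close to, but not exactly, the value in the Excel output.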

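Solution for $\hat\beta_0$, $\hat\beta_1$

First-order necessary condition for $\hat\beta_0$

$$
\begin{aligned}
0 &= \frac{\partial}{\partial \hat\beta_0} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2
   = \sum_{i=1}^{n} \frac{\partial}{\partial \hat\beta_0} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2 \\
  &= -2 \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)
  \quad\Rightarrow\quad
  0 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i).
\end{aligned}
$$

Solving for $\hat\beta_0$

$$
0 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = \bar Y - \hat\beta_0 - \hat\beta_1 \bar X
$$

such that

$$
\hat\beta_0 = \bar Y - \hat\beta_1 \bar X
$$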

Solution for $\hat\beta_0$, $\hat\beta_1$

First-order necessary condition for $\hat\beta_1$

$$
\begin{aligned}
0 &= \frac{\partial}{\partial \hat\beta_1} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i)^2
   = \frac{\partial}{\partial \hat\beta_1} \sum_{i=1}^{n} \left( Y_i - \bar Y - \hat\beta_1 (X_i - \bar X) \right)^2 \\
  &= -2 \sum_{i=1}^{n} (X_i - \bar X) \left( Y_i - \bar Y - \hat\beta_1 (X_i - \bar X) \right) \\
\Rightarrow\quad 0 &= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) \left( Y_i - \bar Y - \hat\beta_1 (X_i - \bar X) \right) \\
  &= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y) - \hat\beta_1 \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2 \\
  &= s_{XY} - \hat\beta_1 s_X^2 .
\end{aligned}
$$

This implies that

$$
0 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \hat\beta_0 - \hat\beta_1 X_i)
$$

and

$$
\hat\beta_1 = \frac{s_{XY}}{s_X^2}
$$

Simple linear regression with OLS

• OLS estimators of $\beta_0$ and $\beta_1$:
  – $\hat\beta_1 = \dfrac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2} = \dfrac{s_{XY}}{s_X^2}$
  – $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$

• Derived estimates for each $i = 1, \ldots, n$
  – $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$, predicted/fitted value of $Y_i$
  – $\hat u_i = Y_i - \hat Y_i$, residual
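
The estimator formulas can be applied directly to the MSA data. The following is a minimal sketch using only plain Python; the variable names are chosen here for illustration. Since the table reports values rounded to one decimal, the estimates should land close to, but not exactly on, the coefficients in the Excel output.

```python
# Income (X) and spending (Y) by MSA, in thousands of dollars, from the MSA table.
income = [74.4, 79.8, 66.8, 65.9, 80.2, 71.7, 79.8, 111.9, 96.9,
          71.2, 58.9, 71.0, 73.5, 69.6, 98.2, 76.4, 74.1, 63.2]
spending = [57.7, 50.5, 56.7, 48.0, 58.7, 53.5, 65.0, 77.9, 62.3,
            51.9, 40.6, 57.1, 58.2, 55.3, 73.6, 56.2, 60.7, 53.7]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(spending) / n

# Sample covariance and variance with the 1/n scaling used on the slide.
# The scaling cancels in the ratio, so 1/(n - 1) would give the same slope.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(income, spending)) / n
s_xx = sum((x - x_bar) ** 2 for x in income) / n

beta1_hat = s_xy / s_xx                 # slope: estimated MPC
beta0_hat = y_bar - beta1_hat * x_bar   # intercept

# Derived estimates: fitted values and residuals for every MSA.
fitted = [beta0_hat + beta1_hat * x for x in income]
residuals = [y - y_hat for y, y_hat in zip(spending, fitted)]

print(f"slope = {beta1_hat:.3f}, intercept = {beta0_hat:.2f}")
```

By construction the residuals sum to zero (up to floating-point error), which is the first-order condition for the intercept.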

Estimating the MPC

[Figure: scatterplot "Average Income and Expenditures by major MSA", with San Francisco labeled. Horizontal axis: Income (50 to 120). Vertical axis: Expenditures (30 to 90). Annual income and expenditures by household; thousands of dollars; 2012. Source: Consumer Expenditure Survey by MSA.]

Estimated MPC out of pre-tax income is 56 cents on the dollar.

SF predicted value and residual

[Figure: scatterplot "Average Income and Expenditures by major MSA", with San Francisco labeled and the values 73.6 and 69.6 marked on the Expenditures axis. Horizontal axis: Income (50 to 120). Vertical axis: Expenditures (30 to 90). Source: Consumer Expenditure Survey by MSA.]

Expenditures in SF are higher than predicted by the regression.

Residual: the gap between actual SF expenditures (73.6) and the value predicted by the regression (69.6).
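
The two numbers marked for San Francisco can be reproduced from the fitted line. This short sketch assumes the intercept and slope from the Excel output in these notes and the San Francisco row of the MSA table; the variable names are illustrative.

```python
# Coefficients as reported in the Excel regression output.
beta0_hat = 14.69690817
beta1_hat = 0.558872878

# San Francisco row of the MSA table (thousands of dollars).
sf_income = 98.2
sf_spending = 73.6

predicted = beta0_hat + beta1_hat * sf_income   # fitted value for SF
residual = sf_spending - predicted              # actual minus fitted

print(f"predicted = {predicted:.1f}, residual = {residual:.1f}")
# prints "predicted = 69.6, residual = 4.0"
```

A positive residual means the observation lies above the regression line.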

Goodness of fit

• Main question
  What fraction of the variance of the dependent variable, $Y_i$, is explained by the regression line rather than left unexplained? (Unexplained means part of the residuals.)

• Variance accounting
  – $TSS = \sum_{i=1}^{n} (Y_i - \bar Y)^2$, total sum of squares
  – $ESS = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$, explained sum of squares, a.k.a. model sum of squares
  – $SSR = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2 = \sum_{i=1}^{n} \hat u_i^2$, sum of squared residuals, a.k.a. residual sum of squares

TSS = ESS + SSR decomposition

$$
\begin{aligned}
TSS &= \sum_{i=1}^{n} (Y_i - \bar Y)^2
     = \sum_{i=1}^{n} \left( (\hat Y_i - \bar Y) + (Y_i - \hat Y_i) \right)^2
     = \sum_{i=1}^{n} \left( (\hat Y_i - \bar Y) + \hat u_i \right)^2 \\
    &= \sum_{i=1}^{n} \left( \hat\beta_0 + \hat\beta_1 X_i - \bar Y + \hat u_i \right)^2
     = \sum_{i=1}^{n} \left( \bar Y - \hat\beta_1 \bar X + \hat\beta_1 X_i - \bar Y + \hat u_i \right)^2 \\
    &= \sum_{i=1}^{n} \left( \hat\beta_1 (X_i - \bar X) + \hat u_i \right)^2 \\
    &= \sum_{i=1}^{n} \left( \hat\beta_1 (X_i - \bar X) \right)^2
     + 2 \hat\beta_1 \sum_{i=1}^{n} (X_i - \bar X) \hat u_i
     + \sum_{i=1}^{n} \hat u_i^2 \\
    &= \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2 + \sum_{i=1}^{n} \hat u_i^2
     = ESS + SSR .
\end{aligned}
$$

The last step uses two facts:
– $\sum_{i=1}^{n} (X_i - \bar X) \hat u_i = 0$, according to the first-order necessary condition for $\hat\beta_1$, so the cross term drops out.
– $\hat\beta_1 (X_i - \bar X) = \hat Y_i - \bar Y$, so the first term equals $ESS$.
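
The decomposition can be checked numerically. The sketch below refits the line on the rounded MSA table values and confirms that the cross term vanishes and that TSS equals ESS plus SSR up to floating-point error. Variable names are chosen here for illustration.

```python
# Income (X) and spending (Y) by MSA, in thousands of dollars, from the MSA table.
income = [74.4, 79.8, 66.8, 65.9, 80.2, 71.7, 79.8, 111.9, 96.9,
          71.2, 58.9, 71.0, 73.5, 69.6, 98.2, 76.4, 74.1, 63.2]
spending = [57.7, 50.5, 56.7, 48.0, 58.7, 53.5, 65.0, 77.9, 62.3,
            51.9, 40.6, 57.1, 58.2, 55.3, 73.6, 56.2, 60.7, 53.7]

n = len(income)
x_bar = sum(income) / n
y_bar = sum(spending) / n

# OLS slope and intercept from the closed-form solution.
beta1_hat = (sum((x - x_bar) * (y - y_bar) for x, y in zip(income, spending))
             / sum((x - x_bar) ** 2 for x in income))
beta0_hat = y_bar - beta1_hat * x_bar

fitted = [beta0_hat + beta1_hat * x for x in income]
residuals = [y - f for y, f in zip(spending, fitted)]

tss = sum((y - y_bar) ** 2 for y in spending)   # total sum of squares
ess = sum((f - y_bar) ** 2 for f in fitted)     # explained sum of squares
ssr = sum(u ** 2 for u in residuals)            # sum of squared residuals

# Cross term from the derivation; the first-order condition makes it zero.
cross = sum((x - x_bar) * u for x, u in zip(income, residuals))

r_squared = ess / tss
print(f"TSS = {tss:.1f}, ESS = {ess:.1f}, SSR = {ssr:.1f}, R^2 = {r_squared:.3f}")
```

The printed values should be near the ESS, SSR and TSS figures reported in the notes, with small differences due to rounding in the table.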

Goodness of fit for MPC regression

[Figure: scatterplot "Average Income and Expenditures by major MSA", with San Francisco labeled. Horizontal axis: Income (50 to 120). Vertical axis: Expenditures (30 to 90). Source: Consumer Expenditure Survey by MSA.]

$R^2$: measure of goodness of fit

Measure   Equation                                   Value    Share (percentage)
ESS       $\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$     944.0    74.9
SSR       $\sum_{i=1}^{n} \hat u_i^2$                316.4    25.1
TSS       $\sum_{i=1}^{n} (Y_i - \bar Y)^2$          1260.3   100.0

$R^2$: fraction of the variation in the dependent variable, i.e. of $TSS$, explained by the regression line. $R^2 = 0.749$.
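
As a quick worked check, the reported $R^2$ follows from the ESS, SSR and TSS values in the table. The ratio form $ESS/TSS$ is the usual way to write "fraction of TSS explained"; it is spelled out here as an aid and is not copied from the slide.

```latex
\[
R^2 = \frac{ESS}{TSS} = \frac{944.0}{1260.3} \approx 0.749,
\qquad
R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{316.4}{1260.3} \approx 0.749 .
\]
```

Both forms agree because $TSS = ESS + SSR$.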

Standard error of the regression

$SER = s_{\hat u}$, where

$$
s_{\hat u}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat u_i^2
$$

is an unbiased estimate of the variance of the residuals, $\mathrm{var}(u_i)$.

• Degrees of freedom correction
  If we had only two observations, we would be able to fit a straight line perfectly and the residuals would be zero.

• Measure of spread around the regression line
  Estimate of the standard deviation of the deviations from the regression line.
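
The SER formula can be evaluated with the numbers already reported in these notes. This is a small sketch; the function name `ser` is hypothetical, and the SSR value is the one shown in the Excel output.

```python
import math


def ser(ssr, n):
    """Standard error of the regression with the n - 2 degrees-of-freedom correction."""
    if n <= 2:
        # With two observations the line fits perfectly and n - 2 = 0.
        raise ValueError("need more than two observations")
    return math.sqrt(ssr / (n - 2))


ssr_value = 316.3737002   # sum of squared residuals from the Excel output
n_obs = 18                # number of MSAs

ser_value = ser(ssr_value, n_obs)
print(round(ser_value, 4))  # prints 4.4467
```

Spending is measured in thousands of dollars, so an SER of about 4.4 means a typical deviation from the line of roughly $4,400 per household.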

MPC regression in Excel

[Figure: scatterplot "Average Income and Expenditures by major MSA", with San Francisco labeled. Horizontal axis: Income (50 to 120). Vertical axis: Expenditures (30 to 90). Source: Consumer Expenditure Survey by MSA.]

SUMMARY OUTPUT
Dependent variable: Expenditures

Regression Statistics
Multiple R           0.865434015
R Square             0.748976034
Adjusted R Square    0.733287037
Standard Error       4.446724217
Observations         18

ANOVA
             df   SS            MS            F             Significance F
Regression   1    943.9589518   943.9589518   47.73893411   3.51406E-06
Residual     16   316.3737002   19.77335626
Total        17   1260.332652

            Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%    Lower 95.0%   Upper 95.0%
Intercept   14.69690817    6.304502449      2.331176535   0.033146025   1.331960021   28.061856    1.33196002    28.06185632
Income      0.558872878    0.080886618      6.909336734   3.51406E-06   0.387400909   0.7303448    0.38740091    0.730344847

Example Excel regression output

Reading the Excel output:
– $\hat\beta_0$, estimated intercept: the Intercept coefficient (14.697)
– $\hat\beta_1$, estimated slope: the Income coefficient (0.559)
– $n$, sample size: Observations (18)
– $ESS$, explained sum of squares: SS in the Regression row (943.96)
– $SSR$, sum of squared residuals: SS in the Residual row (316.37)
– $TSS$, total sum of squares: SS in the Total row (1260.33)
– $R^2$: R Square (0.749)
– $s_{\hat u}^2 = \frac{1}{n-2} \sum_{i=1}^{n} \hat u_i^2$: MS in the Residual row (19.77)
– $SER = s_{\hat u}$: Standard Error under Regression Statistics (4.447)
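
Several entries in the Excel output are functions of the others, which gives a simple consistency check. The sketch below recomputes them from the reported sums of squares and the Income coefficient. The formulas for adjusted $R^2$, the F statistic and the t statistic are the standard textbook ones for a regression with one regressor; they are not stated on the slides and are included here as an assumption about how the output was produced.

```python
import math

# Values copied from the Excel summary output in these notes.
n = 18
ess, ssr, tss = 943.9589518, 316.3737002, 1260.332652
beta1_hat, se_beta1 = 0.558872878, 0.080886618

r_squared = ess / tss                                     # "R Square"
multiple_r = math.sqrt(r_squared)                         # "Multiple R"
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)   # "Adjusted R Square"
ms_residual = ssr / (n - 2)                               # "MS", Residual row
f_stat = (ess / 1) / ms_residual                          # "F", one regressor
t_stat = beta1_hat / se_beta1                             # "t Stat", Income row

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
print(f"F = {f_stat:.4f}, t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}")
```

With a single regressor the F statistic equals the square of the slope's t statistic, which is why the two share the same p-value in the output.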

Why use OLS?

• Most common estimation method
  – Implemented in many different applications
  – Most common methodology, and thus important to understand

• OLS has very desirable properties
  Under relatively general conditions, OLS estimates
  – are consistent
  – are unbiased
  – have a tractable asymptotic distribution

Conditions listed in book

$Y_i = \beta_0 + \beta_1 X_i + u_i$, where $i = 1, \ldots, n$

• No information in $X_i$ about $u_i$

$$
E[u_i] = E[u_i \mid X_i] = 0
$$

  Suppose not, and instead $E[u_i \mid X_i] = \gamma X_i$. Then we can write

$$
\begin{aligned}
Y_i &= \beta_0 + \beta_1 X_i + u_i = \beta_0 + \beta_1 X_i + \gamma X_i + (u_i - \gamma X_i) \\
    &= \beta_0 + (\beta_1 + \gamma) X_i + (u_i - \gamma X_i) = \beta_0 + \tilde\beta_1 X_i + \tilde u_i ,
\end{aligned}
$$

  where $E[\tilde u_i \mid X_i] = E[u_i - \gamma X_i \mid X_i] = E[u_i \mid X_i] - \gamma X_i = 0$.
  So, in this case there is an alternative representation of the linear regression line with a different slope parameter, $\tilde\beta_1 = \beta_1 + \gamma$, that satisfies this assumption.

  Note that this condition implies

$$
E[u_i X_i] = E\left[ X_i \, E[u_i \mid X_i] \right] = 0 ,
$$

  which, given $E[u_i] = 0$, implies that

$$
\mathrm{cov}(X_i, u_i) = E[u_i X_i] - E[u_i] E[X_i] = 0 .
$$

• $(X_i, Y_i)$ well-behaved random variables
  – $(X_i, Y_i)$, $i = 1, \ldots, n$, are independently drawn from an identical joint distribution.
  – Large outliers are unlikely.

  The last two assumptions are made so that we can apply the LLN and CLT to derive properties of $\hat\beta_0$ and $\hat\beta_1$.
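
The "suppose not" argument can be illustrated with a small simulation. In the sketch below all parameter values, the seed, and the distributions are hypothetical choices made only for illustration: the error term is built so that $E[u_i \mid X_i] = \gamma X_i$, and the OLS slope should then settle near $\beta_1 + \gamma$ rather than $\beta_1$.

```python
import random

random.seed(141)  # arbitrary seed so the run is repeatable

# Hypothetical parameter values, not taken from the notes.
beta0, beta1, gamma = 2.0, 0.5, 0.3
n = 20_000

xs = [random.uniform(0.0, 10.0) for _ in range(n)]
# Error term whose conditional mean depends on X: E[u | X] = gamma * X.
us = [gamma * x + random.gauss(0.0, 1.0) for x in xs]
ys = [beta0 + beta1 * x + u for x, u in zip(xs, us)]

x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))

print(f"OLS slope = {slope:.3f}; beta1 = {beta1}; beta1 + gamma = {beta1 + gamma}")
```

OLS recovers the slope of the representation that satisfies the condition, which here is $\beta_1 + \gamma$.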

Properties of $\hat\beta_0$ and $\hat\beta_1$

Properties of $\hat\beta_0$ and $\hat\beta_1$ are derived by manipulating the first-order necessary conditions for $\hat\beta_0$ and $\hat\beta_1$ derived above:

$$
0 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat\beta_0 - \hat\beta_1 X_i) = \frac{1}{n} \sum_{i=1}^{n} \hat u_i
$$

and

$$
0 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \hat\beta_0 - \hat\beta_1 X_i) = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) \hat u_i .
$$

These are sample approximations of the condition

$$
E[u_i] = E[u_i X_i] = 0 .
$$

Consistency of $\hat\beta_1$

$$
\begin{aligned}
0 &= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(Y_i - \hat\beta_0 - \hat\beta_1 X_i) \\
  &= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) \left( Y_i - \bar Y - \hat\beta_1 (X_i - \bar X) \right) \\
  &= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) \left( \beta_0 + \beta_1 X_i + u_i - \beta_0 - \beta_1 \bar X - \bar u - \hat\beta_1 (X_i - \bar X) \right) \\
  &= \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) \left( (\beta_1 - \hat\beta_1)(X_i - \bar X) + (u_i - \bar u) \right) \\
  &= (\beta_1 - \hat\beta_1) \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2 + \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(u_i - \bar u) .
\end{aligned}
$$

As $n$ grows, the two sample moments converge in probability:

$$
\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2 \xrightarrow{p} \mathrm{var}(X_i) > 0 ,
\qquad
\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(u_i - \bar u) \xrightarrow{p} \mathrm{cov}(X_i, u_i) = 0 .
$$

So in the limit the equation reads $0 = (\beta_1 - \hat\beta_1) \, \mathrm{var}(X_i)$, such that

$$
\beta_1 - \hat\beta_1 \xrightarrow{p} 0 , \quad \text{that is} \quad \hat\beta_1 \xrightarrow{p} \beta_1 .
$$

As the sample size $n$ gets arbitrarily large, i.e. $n \to \infty$, our estimate of the slope coefficient, $\hat\beta_1$, gets arbitrarily close to the true parameter value $\beta_1$ from the population regression line.
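
Consistency can be seen in a simulation where the true slope is known. The sketch below is illustrative only: the "true" parameter values, the seed, and the distributions for $X_i$ and $u_i$ are hypothetical choices, and `ols_slope` is a helper defined here. The estimation error should shrink as $n$ grows.

```python
import random


def ols_slope(xs, ys):
    """OLS slope estimate s_XY / s_X^2 for paired samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


random.seed(2014)  # arbitrary seed so the run is repeatable

# Hypothetical "true" parameters, loosely in the range of the MPC example.
beta0, beta1 = 15.0, 0.56

errors = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    xs = [random.uniform(50.0, 120.0) for _ in range(n)]
    # Errors are drawn independently of X, so E[u | X] = 0 holds.
    ys = [beta0 + beta1 * x + random.gauss(0.0, 4.0) for x in xs]
    errors[n] = abs(ols_slope(xs, ys) - beta1)
    print(f"n = {n:>6}: |slope - beta1| = {errors[n]:.5f}")
```

The error at any single $n$ is random, so the sequence need not fall monotonically, but for large $n$ it should be very small.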

But, in real life, $n$ is finite

Small sample properties of OLS estimators

• Unbiasedness
  On average, the OLS estimate equals the true parameter value of interest.

• Asymptotic distribution
  The OLS assumptions imply we can use the CLT to derive the asymptotic normal distribution of the OLS estimates, which can be used as an approximation when $n$ is big.

Unbiasedness of $\hat\beta_1$

$$
0 = (\beta_1 - \hat\beta_1) \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2 + \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(u_i - \bar u) ,
$$

such that

$$
\hat\beta_1 = \beta_1 + \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(u_i - \bar u)}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2}
            = \beta_1 + \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) u_i}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2} .
$$

Taking expectations yields

$$
\begin{aligned}
E[\hat\beta_1] &= E[\beta_1] + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(u_i - \bar u)}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2} \right]
  = \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)(u_i - \bar u)}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2} \right] \\
 &= \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) \, E[u_i - \bar u \mid X_1, \ldots, X_n]}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2} \right] \\
 &= \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) \, E[u_i - \bar u \mid X_i]}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2} \right] \\
 &= \beta_1 + E\left[ \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X) \cdot 0}{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar X)^2} \right]
  = \beta_1 .
\end{aligned}
$$

Replacing the conditioning on $X_1, \ldots, X_n$ with conditioning on $X_i$ is where we apply the second condition listed above (independent draws from an identical joint distribution). Setting $E[u_i - \bar u \mid X_i] = 0$ is implied by the first condition, $E[u_i \mid X_i] = 0$.

Even when the sample size $n$ is not large, on average our estimate of the slope coefficient, $\hat\beta_1$, will equal the true parameter value $\beta_1$ from the population regression line: $E[\hat\beta_1] = \beta_1$. Thus, $\hat\beta_1$ is unbiased.

Note that the first condition, which gives

$$
E[u_i - \bar u \mid X_i] = 0 ,
$$

is crucial for OLS to be unbiased. If this condition is not true, the average OLS estimate will deviate from the true parameter value.
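
Unbiasedness is a statement about the average over repeated samples, which a Monte Carlo sketch can illustrate. Everything below is a hypothetical setup chosen for illustration: the "true" parameters, the seed, the distributions, and the helper `ols_slope`. Each replication draws a small sample of 18 observations, the same size as the MSA data.

```python
import random
import statistics


def ols_slope(xs, ys):
    """OLS slope estimate s_XY / s_X^2 for paired samples."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx


random.seed(59)  # arbitrary seed so the run is repeatable

beta0, beta1 = 15.0, 0.56   # hypothetical "true" parameters
n, reps = 18, 5_000         # small sample, many replications

estimates = []
for _ in range(reps):
    xs = [random.uniform(50.0, 120.0) for _ in range(n)]
    # Errors drawn independently of X, so E[u | X] = 0 holds.
    ys = [beta0 + beta1 * x + random.gauss(0.0, 4.0) for x in xs]
    estimates.append(ols_slope(xs, ys))

mean_estimate = statistics.fmean(estimates)
spread = statistics.pstdev(estimates)
print(f"mean of estimates = {mean_estimate:.4f} (true slope {beta1})")
print(f"std. dev. of estimates = {spread:.4f}")
```

Individual estimates vary noticeably at this sample size, yet their average should sit very close to the true slope.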

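Asymptotic distribution of $\hat{\beta}_1$

$$\hat{\beta}_1-\beta_1 = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)\left(u_i-\bar{u}\right)}{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)u_i}{\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)^2}$$

See slide 48

3 steps to deriving asymptotic distribution

1. Apply CLT to numerator
2. Apply LLN to denominator
3. Combine using Slutsky's theorem (S&W page 676)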


Page 65: 2014 02 03 and 05 econ 141 uc berkeley

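Apply CLT to $\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)u_i$

• Define random variable $v_i = \left(X_i-\bar{X}\right)u_i$
• Condition 1 from slide 47: $E\left[v_i\right] = 0$
• Conditions 2&3 from slide 50 imply that
  – $\mathrm{var}\left(v_i\right)$ exists and is finite.
  – CLT applies to sample mean of $v_i$:
    $$\bar{v} = \frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)u_i$$
• Note that: $\hat{\beta}_1-\beta_1 = \bar{v}/s_X^2$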
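The last identity is exact in any sample, not just in the limit, and that is easy to confirm numerically. The sketch below is illustrative only: the simulated data, parameter values, and function name are assumptions, and $s_X^2$ is computed with the $1/n$ normalisation that appears in the denominator of the slope formula.

```python
import random
import statistics


def decomposition_check(n=200, beta0=10.0, beta1=0.6, seed=2):
    """Return (beta1_hat - beta1, v_bar / s2_x) for one simulated sample."""
    rng = random.Random(seed)
    x = [rng.gauss(70.0, 8.0) for _ in range(n)]
    u = [rng.gauss(0.0, 5.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

    x_bar = statistics.fmean(x)
    y_bar = statistics.fmean(y)
    # s2_x uses the 1/n normalisation, matching the slope's denominator.
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / n
    beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n / s2_x

    # v_i = (X_i - X_bar) * u_i; its sample mean is the slope's numerator.
    v_bar = sum((xi - x_bar) * ui for xi, ui in zip(x, u)) / n
    return beta1_hat - beta1, v_bar / s2_x
```

The two returned numbers should agree up to floating-point rounding, because the term involving $\bar{u}$ drops out: $\sum_{i}(X_i-\bar{X})\bar{u} = 0$.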


Page 66: 2014 02 03 and 05 econ 141 uc berkeley

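Apply CLT to $\frac{1}{n}\sum_{i=1}^{n}\left(X_i-\bar{X}\right)u_i$

Apply the Central Limit Theorem

$$Z = \frac{\bar{v} - E\left[v_i\right]}{\sqrt{\mathrm{var}\left(v_i\right)/n}} = \frac{\bar{v}}{\sqrt{\mathrm{var}\left(v_i\right)/n}} \xrightarrow{d} N\left(0,1\right)$$

Such that

$$\bar{v} \xrightarrow{d} N\left(0, \mathrm{var}\left(v_i\right)/n\right)$$

Thus, the numerator of $\hat{\beta}_1-\beta_1 = \bar{v}/s_X^2$ has an asymptotic distribution that is normal with a mean equal to zero.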
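A simulation can make the CLT step concrete. The sketch below rests on assumptions that are not from the lecture: $X_i$ and $u_i$ are drawn as independent normals with $E[u_i] = 0$, in which case $\mathrm{var}(v_i)$ is approximately $\mathrm{var}(X_i)\,\mathrm{var}(u_i)$, and the function names are hypothetical.

```python
import random
import statistics


def z_statistics(n, reps, sigma_x=8.0, sigma_u=5.0, seed=5):
    """Return Z = v_bar / sqrt(var(v_i) / n) for `reps` simulated samples."""
    rng = random.Random(seed)
    # Assumption: u independent of X with E[u] = 0, so
    # var(v_i) is approximately var(X_i) * var(u_i).
    var_v = sigma_x ** 2 * sigma_u ** 2
    zs = []
    for _ in range(reps):
        x = [rng.gauss(70.0, sigma_x) for _ in range(n)]
        u = [rng.gauss(0.0, sigma_u) for _ in range(n)]
        x_bar = statistics.fmean(x)
        v_bar = sum((xi - x_bar) * ui for xi, ui in zip(x, u)) / n
        zs.append(v_bar / (var_v / n) ** 0.5)
    return zs


def share_within(zs, bound=1.96):
    """Fraction of Z values with absolute value at most `bound`."""
    return sum(abs(z) <= bound for z in zs) / len(zs)
```

If $Z$ is close to standard normal, `share_within(z_statistics(n=100, reps=4000))` should be roughly 0.95 and the average of the $Z$ values should be near zero.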

Page 67: 2014 02 03 and 05 econ 141 uc berkeley

Apply LLN to $s_X^2$

• Conditions 2&3 from slide 50 imply that we can apply the Law of Large Numbers to $s_X^2$ and that $s_X^2$ converges in probability to the variance of $X_i$, i.e. to $\mathrm{var}\left(X_i\right)$.
• Thus
  $$s_X^2 \xrightarrow{p} \mathrm{var}\left(X_i\right)$$
• Now that we know the asymptotic behavior of the numerator and denominator of $\hat{\beta}_1-\beta_1 = \bar{v}/s_X^2$, the only thing left is to combine them.
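The LLN step can be illustrated by computing $s_X^2$ for increasingly large samples. This is a minimal sketch under assumed values: $X_i$ is drawn from a normal distribution with standard deviation 8, so the population variance is 64, and the function name is hypothetical.

```python
import random
import statistics


def sample_variance_of_x(n, sigma_x=8.0, seed=3):
    """Compute s_X^2 = (1/n) * sum((X_i - X_bar)^2) for one simulated sample."""
    rng = random.Random(seed)
    x = [rng.gauss(70.0, sigma_x) for _ in range(n)]
    x_bar = statistics.fmean(x)
    return sum((xi - x_bar) ** 2 for xi in x) / n


# s_X^2 for growing sample sizes; values should settle near var(X) = 64.
variances_by_n = {n: sample_variance_of_x(n) for n in (10, 1000, 100000)}
```

For small $n$ the value can be far from 64, while for $n = 100000$ it should be within a fraction of a unit of it.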


Page 68: 2014 02 03 and 05 econ 141 uc berkeley

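Apply Slutsky's theorem (S&W page 676)

Slutsky's theorem implies that we can combine the asymptotic properties of

$$\bar{v} \xrightarrow{d} N\left(0, \mathrm{var}\left(v_i\right)/n\right)$$

and

$$s_X^2 \xrightarrow{p} \mathrm{var}\left(X_i\right)$$

such that

$$\hat{\beta}_1-\beta_1 = \frac{\bar{v}}{s_X^2} \xrightarrow{d} N\left(0, \frac{\mathrm{var}\left(v_i\right)/n}{\left[\mathrm{var}\left(X_i\right)\right]^2}\right)$$

and thus

$$\hat{\beta}_1 = \beta_1 + \frac{\bar{v}}{s_X^2} \xrightarrow{d} N\left(\beta_1, \frac{\mathrm{var}\left(v_i\right)/n}{\left[\mathrm{var}\left(X_i\right)\right]^2}\right)$$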
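The combined result can be compared against simulated slope estimates. The sketch below is an illustration under stated assumptions, not the lecture's own calculation: $u_i$ is drawn independently of $X_i$, so $\mathrm{var}(v_i)$ is taken to be $\mathrm{var}(X_i)\,\mathrm{var}(u_i)$, and all parameter values and function names are hypothetical.

```python
import random
import statistics


def simulate_slopes(n, reps, beta1=0.6, sigma_x=8.0, sigma_u=5.0, seed=4):
    """Return the OLS slope from each of `reps` simulated samples of size n."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(70.0, sigma_x) for _ in range(n)]
        y = [10.0 + beta1 * xi + rng.gauss(0.0, sigma_u) for xi in x]
        x_bar = statistics.fmean(x)
        y_bar = statistics.fmean(y)
        num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        den = sum((xi - x_bar) ** 2 for xi in x)
        slopes.append(num / den)
    return slopes


def asymptotic_variance(n, sigma_x=8.0, sigma_u=5.0):
    """(1/n) * var(v_i) / var(X_i)^2.

    Assumption: u is independent of X, so var(v_i) = var(X_i) * var(u_i).
    """
    var_x = sigma_x ** 2
    var_v = var_x * sigma_u ** 2
    return var_v / (n * var_x ** 2)
```

With `n=200`, the variance of the simulated slopes (for example via `statistics.pvariance`) should be close to `asymptotic_variance(200)`, and their mean close to the assumed $\beta_1$.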


Page 69: 2014 02 03 and 05 econ 141 uc berkeley

𝛽 1 has tractable asymptotic distribution

Thus as sample size 𝑛 get large then estimated

slope coefficient has approximately a normal

distribution, such that

𝛽 1~𝑁 πœ‡π›½ 1, πœŽπ›½ 1

where

πœ‡π›½ 1= 𝛽1

πœŽπ›½ 1

2 =var 𝑣𝑖 𝑛

var 𝑋𝑖

2

=1

𝑛

var 𝑣𝑖

var 𝑋𝑖2


Page 70: 2014 02 03 and 05 econ 141 uc berkeley

𝛽 1 has tractable asymptotic distribution


Remember: $E[\hat{\beta}_1] = \beta_1$. Thus, $\hat{\beta}_1$ is unbiased.

Page 71: 2014 02 03 and 05 econ 141 uc berkeley

𝛽 1 has tractable asymptotic distribution


Remember: $\sigma^2_{\hat{\beta}_1} \to 0$ as $n \to \infty$. Thus, $\hat{\beta}_1$ is consistent.

Page 72: 2014 02 03 and 05 econ 141 uc berkeley

𝛽 1 has tractable asymptotic distribution


The term $\mathrm{var}\left(v_i\right)/\left[\mathrm{var}\left(X_i\right)\right]^2$ in $\sigma^2_{\hat{\beta}_1}$ is like a noise-to-signal ratio.

The numerator is related to the variance of the residual.

The denominator is related to the variance of the explanatory variable.

The larger the variation in the explanatory variable relative to the residual, the more accurate the OLS estimate.
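A short numerical example shows how this ratio drives precision. It is a sketch under an assumption that is not part of the slide: $u_i$ is independent of $X_i$, so that $\mathrm{var}(v_i) = \mathrm{var}(X_i)\,\mathrm{var}(u_i)$ and the variance formula reduces to $\mathrm{var}(u_i)/\left(n\,\mathrm{var}(X_i)\right)$. The numbers and the function name are hypothetical.

```python
def slope_asymptotic_sd(n, var_u, var_x):
    """Asymptotic standard deviation of the OLS slope.

    Assumption: u is independent of X, so var(v_i) = var(X_i) * var(u_i).
    """
    var_v = var_x * var_u
    return (var_v / (n * var_x ** 2)) ** 0.5


# Same residual variance and sample size, different spread in X.
baseline = slope_asymptotic_sd(n=100, var_u=25.0, var_x=64.0)      # 0.0625
more_spread = slope_asymptotic_sd(n=100, var_u=25.0, var_x=256.0)  # 0.03125
```

Quadrupling the variance of the explanatory variable while holding the residual variance fixed halves the standard deviation of the slope estimate in this setup.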

Page 73: 2014 02 03 and 05 econ 141 uc berkeley

Virtues of asymptotic normality

• Asymptotic normality of OLS coefficients allows us to
  – Do hypothesis tests
  – Calculate confidence intervals
• Simple generalizations of same techniques applied to the population mean.
• Chapter 5! Next week.


Page 74: 2014 02 03 and 05 econ 141 uc berkeley

Summary

• Linear regression model
• OLS estimators
• $R^2$ and $SER$
• OLS conditions
• Consistency of $\hat{\beta}_1$
• Unbiasedness of $\hat{\beta}_1$
• Asymptotic distribution of $\hat{\beta}_1$

• Even-numbered problems: 4.2, 4.4, 4.6, 4.10, 4.12, 4.14
• Study STATA tutorial
• Empirical exercises next week.
