Economics 105: Statistics

Post on 06-Jan-2016

Economics 105: Statistics
• Go over GH 21
• GH 22 due Tuesday

Multiple Regression
• Assumption (7): No perfect multicollinearity
  – no X is an exact linear function of the other X’s
  – Venn diagram
• Other implicit assumptions
  – data are a random sample of n observations from the proper population
  – n > K; in fact, it is good to have n >> K, *much* bigger
  – the little xij’s are fixed numbers (the same in repeated samples), or they are realizations of random variables Xij that are independent of the error term, in which case inference is done CONDITIONAL on the observed values of the xij’s
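Why perfect multicollinearity breaks OLS can be seen with a tiny example. The sketch below uses made-up integer data (the names `x1`, `x2_collinear`, and `x2_ok` are illustrative, not from the course) and shows that when one X is an exact linear function of another, the matrix X'X has determinant zero, so the normal equations have no unique solution.

```python
def gram_matrix(columns):
    """Return X'X for a design matrix supplied as a list of columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical data: an intercept column and two candidate regressors.
ones = [1, 1, 1, 1, 1]
x1 = [2, 4, 5, 7, 9]
x2_collinear = [3 * v + 1 for v in x1]  # exact linear function of x1
x2_ok = [1, 0, 4, 2, 6]                 # not a linear function of x1

det_collinear = det3(gram_matrix([ones, x1, x2_collinear]))
det_ok = det3(gram_matrix([ones, x1, x2_ok]))

print("det(X'X) with perfect multicollinearity:", det_collinear)
print("det(X'X) without it:", det_ok)
```

Integer data keeps the arithmetic exact, so the collinear determinant comes out as exactly zero rather than a small rounding residue.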

Multiple Regression
• Interpretation of multiple regression coefficients
  – for a one-unit change in Xi (specify the units)
  – average change in Y
  – ceteris paribus
  – Venn diagram

Hypothesis Testing of a Single Coefficient

• For this test

• the test statistic is
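A common way to write this test is sketched below. The symbols $b_i$ (the estimated coefficient) and $s_{b_i}$ (its standard error) are assumed notation, and the degrees of freedom follow the n − K − 1 convention that appears in the F-test in these notes.

```latex
H_0:\ \beta_i = 0 \qquad \text{vs.} \qquad H_1:\ \beta_i \neq 0

t \;=\; \frac{b_i - 0}{s_{b_i}} \;\sim\; t_{\,n-K-1} \quad \text{under } H_0
```

Reject H0 when |t| exceeds the critical value of the t distribution with n − K − 1 degrees of freedom, or equivalently when the p-value is below the chosen significance level.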

Nonlinear Relationships
• The relationship between the outcome and the explanatory variable may not be linear
• Make the scatterplot to examine
• Example: Quadratic model
• Example: Log transformations
• Log always means natural log (ln) in economics

Linear vs. Nonlinear Fit

[Figure: two pairs of panels, each plotting Y against X with the residuals against X. Linear fit does not give random residuals; nonlinear fit gives random residuals.]

Quadratic Regression Model

Source: http://marginalrevolution.com/marginalrevolution/2012/04/new-cities.html

Example

Testing the Overall Model

• The “whole model” F-test
  H0: β1 = β2 = β3 = … = β15 = 0
  H1: at least one βi ≠ 0

• F-test statistic =

• Estimate the model to obtain the sample regression equation:

Testing the Overall Model

p-value = 1-F.DIST(120.145,15,430-15-1,1) ≈ 0

Critical value = F.INV(0.99,15,430-15-1) = 2.082
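The spreadsheet results above can be cross-checked without Excel. The sketch below is one way to do it with only Python's standard library: it integrates the F density numerically with Simpson's rule rather than calling a statistics package. The function names `f_pdf` and `f_cdf` and the step count are illustrative choices; the F statistic 120.145, K = 15, and n = 430 are taken from the slide.

```python
import math


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom.

    Returns 0 at x <= 0, which is the correct limit when d1 > 2.
    """
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = (0.5 * (d1 * math.log(d1 * x) + d2 * math.log(d2)
                      - (d1 + d2) * math.log(d1 * x + d2))
               - math.log(x) - log_beta)
    return math.exp(log_pdf)


def f_cdf(x, d1, d2, steps=20000):
    """P(F <= x) by composite Simpson's rule on [0, x]; steps must be even."""
    h = x / steps
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * f_pdf(i * h, d1, d2)
    return total * h / 3


N, K = 430, 15
F_STAT = 120.145
df1, df2 = K, N - K - 1

# Upper-tail probability, the analogue of 1 - F.DIST(F_STAT, df1, df2, 1).
p_value = 1 - f_cdf(F_STAT, df1, df2)
# Probability below the slide's 1% critical value; should be close to 0.99.
prob_below_critical = f_cdf(2.082, df1, df2)

print(f"p-value is about {abs(p_value):.2e}")
print(f"P(F <= 2.082) = {prob_below_critical:.4f}")
```

Since 120.145 is far above the critical value 2.082, H0 is rejected at the 1% level; the computed p-value is zero up to numerical integration error.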

Testing for Significance of just a Quadratic Term

• t-test

Example

Average Effect on Y of a change in X in Nonlinear Models

• Consider a change in X1 of ΔX1
• X2 is held constant!
• Average effect on Y is the difference in the population regression models
• Estimate of this population difference is
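One way to write this difference is sketched below, assuming a general population regression function $f(X_1, X_2)$ and an estimated function $\hat{f}$; both symbols are assumed notation. The last line specializes to a quadratic in a single X with estimated coefficients $b_1$ and $b_2$, which is the form used in the Age example in these notes.

```latex
\Delta Y \;=\; f(X_1 + \Delta X_1,\, X_2) \;-\; f(X_1,\, X_2)

\Delta \hat{Y} \;=\; \hat{f}(X_1 + \Delta X_1,\, X_2) \;-\; \hat{f}(X_1,\, X_2)

\text{Quadratic case } \hat{Y} = b_0 + b_1 X + b_2 X^2:\qquad
\Delta \hat{Y} \;=\; b_1\,\Delta X \;+\; b_2\left[(X + \Delta X)^2 - X^2\right]
```

Unlike the linear model, the estimated effect depends on the starting value of X, not only on the size of the change.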

Example

• What is the average effect of an increase in Age from 30 to 40 years? From 40 to 50 years?
  – 30 to 40: 2.03*(40-30) - .02*(1600 - 900) = 20.3 - 14 = 6.3
  – 40 to 50: 2.03*(50-40) - .02*(2500 - 1600) = 20.3 - 18 = 2.3
• Units?!
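The same arithmetic can be wrapped in a small function so that any pair of ages can be compared. This is a minimal sketch: the coefficients 2.03 (on Age) and -0.02 (on Age squared) come from the calculation above, while the function name `predicted_change` is illustrative.

```python
B_AGE = 2.03      # coefficient on Age, from the example
B_AGE_SQ = -0.02  # coefficient on Age squared, from the example


def predicted_change(age_from, age_to, b1=B_AGE, b2=B_AGE_SQ):
    """Estimated change in Y when Age moves from age_from to age_to,
    holding the other regressors constant, in a quadratic model."""
    return b1 * (age_to - age_from) + b2 * (age_to ** 2 - age_from ** 2)


change_30_to_40 = predicted_change(30, 40)
change_40_to_50 = predicted_change(40, 50)

print(f"Age 30 -> 40: {change_30_to_40:.1f}")
print(f"Age 40 -> 50: {change_40_to_50:.1f}")
```

The same ten-year increase has a smaller estimated effect at higher ages because the coefficient on the squared term is negative. The results are in the units of Y, which still need to be stated.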

http://xkcd.com/985/

Coefficient of Determination for Multiple Regression

• Reports the proportion of total variation in Y explained by all X variables taken together

• Consider this model

Multiple Coefficient of Determination (continued)

Regression Statistics
Multiple R            0.72213
R Square              0.52148
Adjusted R Square     0.44172
Standard Error       47.46341
Observations         15

ANOVA         df   SS           MS           F         Significance F
Regression     2   29460.027    14730.013    6.53861   0.01201
Residual      12   27033.306     2252.776
Total         14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept      306.52619      114.25389        2.68285   0.01993    57.58835   555.46404
Price          -24.97509       10.83213       -2.30565   0.03979   -48.57626    -1.37392
Advertising     74.13096       25.96732        2.85478   0.01449    17.55303   130.70888

52.1% of the variation in pie sales is explained by the variation in price and advertising
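Most of the summary numbers in the regression output can be recomputed from the sums of squares, the coefficients, and the standard errors. The sketch below does this using the standard definitions (R² = SSR/SST, F = MSR/MSE, t = coefficient / standard error); the variable names are illustrative and the input values are copied from the output above.

```python
import math

n, k = 15, 2  # observations and number of X variables (Price, Advertising)

ss_regression = 29460.027
ss_residual = 27033.306
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total         # proportion of variation explained
multiple_r = math.sqrt(r_squared)
ms_regression = ss_regression / k            # mean square, regression
ms_residual = ss_residual / (n - k - 1)      # mean square, residual
f_stat = ms_regression / ms_residual         # whole-model F statistic
standard_error = math.sqrt(ms_residual)      # standard error of the regression

# (coefficient, standard error) pairs from the coefficient table
coefficients = {
    "Intercept": (306.52619, 114.25389),
    "Price": (-24.97509, 10.83213),
    "Advertising": (74.13096, 25.96732),
}
t_stats = {name: b / se for name, (b, se) in coefficients.items()}

print(f"R Square       = {r_squared:.5f}")
print(f"Multiple R     = {multiple_r:.5f}")
print(f"F              = {f_stat:.5f}")
print(f"Standard Error = {standard_error:.5f}")
for name, t in t_stats.items():
    print(f"t Stat ({name}) = {t:.5f}")
```

Each recomputed value should agree with the reported output up to rounding of the inputs.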

Adjusted R²

• R² never decreases when a new X variable is added to the model
  – a disadvantage when comparing models
• What is the net effect of adding a new variable?
  – We lose a degree of freedom when a new X variable is added
  – Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
• Penalizes excessive use of unimportant variables
• Smaller than R², and can increase, decrease, or stay the same when a new X variable is added
• Useful in comparing among models, but don’t rely too heavily on it; use theory and statistical significance

44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
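The adjustment can be reproduced with the usual formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − K − 1). The sketch below applies it to the pie sales numbers (R² = 0.52148, n = 15, K = 2). The second calculation is purely hypothetical: it supposes a third, unimportant variable nudges R² up to 0.525, a made-up figure used only to show the penalty at work.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for a model with k X variables and n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Values from the pie sales regression output.
adj_pie = adjusted_r_squared(0.52148, n=15, k=2)

# Hypothetical: a third variable raises R-squared only slightly (made-up value).
adj_with_extra_variable = adjusted_r_squared(0.525, n=15, k=3)

print(f"Adjusted R-squared, 2 variables: {adj_pie:.5f}")
print(f"Adjusted R-squared, 3 variables (hypothetical): {adj_with_extra_variable:.5f}")
```

In the hypothetical case R² rises but adjusted R² falls, because the small gain in explanatory power does not offset the lost degree of freedom.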

Log Functional Forms
• Linear-Log
• Log-linear
• Log-log
• Log of a variable means interpretation is a percentage change in the variable
• (don’t forget Mark’s pet peeve)
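The three forms are commonly written as below, with the usual approximate interpretations for small changes. The single-regressor setup and the symbols $\beta_0$, $\beta_1$, and $\varepsilon$ are assumed notation for illustration.

```latex
\text{Linear-log:}\quad Y = \beta_0 + \beta_1 \ln X + \varepsilon
\;\Rightarrow\; \Delta Y \approx \beta_1 \frac{\Delta X}{X}
\quad (\text{a 1\% change in } X \text{ goes with a } 0.01\,\beta_1 \text{ unit change in } Y)

\text{Log-linear:}\quad \ln Y = \beta_0 + \beta_1 X + \varepsilon
\;\Rightarrow\; \frac{\Delta Y}{Y} \approx \beta_1 \Delta X
\quad (\text{a one-unit change in } X \text{ goes with a } 100\,\beta_1\% \text{ change in } Y)

\text{Log-log:}\quad \ln Y = \beta_0 + \beta_1 \ln X + \varepsilon
\;\Rightarrow\; \frac{\Delta Y}{Y} \approx \beta_1 \frac{\Delta X}{X}
\quad (\text{a 1\% change in } X \text{ goes with a } \beta_1\% \text{ change in } Y;\ \beta_1 \text{ is an elasticity})
```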

Log Functional Forms

• Here’s why: ln(x + Δx) – ln(x) = ln(1 + Δx/x) ≈ Δx/x
  calculus: d ln(x)/dx = 1/x
• Numerically: ln(1.01) = .00995 ≈ .01
  ln(1.10) = .0953 ≈ .10 (sort of)
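How quickly the approximation degrades can be checked directly. This short sketch compares ln(1 + change) with the proportional change itself for a few sizes; the function name `log_change` and the values beyond 1% and 10% are illustrative.

```python
import math


def log_change(proportional_change):
    """Change in ln(x) when x grows by the given proportion (0.01 means 1%)."""
    return math.log(1 + proportional_change)


for change in (0.01, 0.05, 0.10, 0.25, 0.50):
    approx_gap = change - log_change(change)
    print(f"change = {change:.2f}  ln(1 + change) = {log_change(change):.5f}  "
          f"gap = {approx_gap:.5f}")
```

The gap grows with the size of the change, which is why the percentage interpretation of logs is reliable only for small changes.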

Linear-Log Functional Form

Log-Linear Functional Form

Log-Log Functional Form

Examples
