Economics 105: Statistics
• Go over GH 21
• GH 22 due Tuesday
Multiple Regression
• Assumption (7) No perfect multicollinearity
– no X is an exact linear function of other X’s
– Venn diagram
• Other implicit assumptions
– data are a random sample of n observations from proper population
– n > K; in fact, good to have n >> K, *much* bigger
– the little xij’s are fixed numbers (the same in repeated samples), or they are realizations of random variables, Xij, that are independent of the error term, and then inference is done CONDITIONAL on the observed values of the xij’s
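To make assumption (7) concrete, here is a small sketch on hypothetical toy data (the numbers and variable names are made up for illustration, not from the course). When one X is an exact linear function of another, the matrix X'X has determinant zero, so the least squares coefficients cannot be solved for uniquely.

```python
# Hypothetical toy data: x2 is an exact linear function of x1 (x2 = 2*x1 + 3).
x1 = [1, 2, 3, 4, 5]
x2_collinear = [2 * v + 3 for v in x1]
x2_ok = [v ** 2 for v in x1]  # related to x1, but not an exact LINEAR function of it
ones = [1] * len(x1)          # intercept column


def gram(columns):
    """X'X: entry (i, j) is the dot product of column i and column j."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


det_collinear = det3(gram([ones, x1, x2_collinear]))
det_ok = det3(gram([ones, x1, x2_ok]))

print("det(X'X) with perfect multicollinearity:", det_collinear)
print("det(X'X) without it:", det_ok)
```

The data are integers on purpose, so the zero determinant is exact rather than a rounding artifact. The squared term passes the check because the assumption only rules out exact linear relationships.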
Multiple Regression
• Interpretation of multiple regression coefficients
– for one unit change in Xi … specify the units
– average change in Y
– ceteris paribus
– Venn diagram
Hypothesis Testing of a Single Coefficient
• For this test
• the test statistic is
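A standard way to write this test is sketched below. The notation is assumed here rather than taken from the slide: b_j is the estimated coefficient, s_{b_j} its standard error, and β_j* the hypothesized value (often 0). The degrees of freedom n − K − 1 follow the same counting used in the F-test formulas in these notes.

```latex
\[
H_0:\ \beta_j = \beta_j^{*}
\qquad\text{vs.}\qquad
H_1:\ \beta_j \neq \beta_j^{*}
\]
\[
t \;=\; \frac{b_j - \beta_j^{*}}{s_{b_j}} \;\sim\; t_{\,n-K-1}
\quad\text{under } H_0
\]
```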
Nonlinear Relationships
• The relationship between the outcome and the explanatory variable may not be linear
• Make the scatterplot to examine
• Example: Quadratic model
• Example: Log transformations
• Log always means natural log (ln) in economics
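Linear vs. Nonlinear Fit
[Figure: scatterplots of Y against X with a linear fit and with a nonlinear fit, each paired with a plot of residuals against X. The linear fit does not give random residuals; the nonlinear fit gives random residuals.]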
Quadratic Regression Model
Source: http://marginalrevolution.com/marginalrevolution/2012/04/new-cities.html
Example
Testing the Overall Model
• The “whole model” F-test
H0: β1 = β2 = β3 = … = β15 = 0
H1: at least 1 βi ≠ 0
• F-test statistic = MSR/MSE = (SSR/K) / (SSE/(n-K-1)), here with K = 15 and n = 430
• Estimate the model to obtain the sample regression equation:
Testing the Overall Model
p-value ≈ 0, from Excel: =1-F.DIST(120.145,15,430-15-1,1)
Critical value (α = 0.01) = 2.082, from Excel: =F.INV(0.99,15,430-15-1)
Testing for Significance of just a Quadratic Term
• t-test
Example
Average Effect on Y of a change in X in Nonlinear Models
• Consider a change in X1 of ΔX1
• X2 is held constant!
• Average effect on Y is the difference in the population regression models
• Estimate of this population difference is the same difference evaluated at the estimated coefficients
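As a sketch of that difference, assume the quadratic specification written on the first line below. This form is an assumption chosen to match the Age example in these notes; the slide's own equation is not in the transcript. Subtracting the model at X1 from the model at X1 + ΔX1, with X2 unchanged, cancels the intercept and the X2 term:

```latex
\[
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^{2} + \beta_3 X_2 + \varepsilon
\]
\[
\Delta Y
= \bigl[\beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 (X_1 + \Delta X_1)^{2} + \beta_3 X_2\bigr]
- \bigl[\beta_0 + \beta_1 X_1 + \beta_2 X_1^{2} + \beta_3 X_2\bigr]
= \beta_1\,\Delta X_1 + \beta_2\bigl[(X_1 + \Delta X_1)^{2} - X_1^{2}\bigr]
\]
\[
\widehat{\Delta Y}
= b_1\,\Delta X_1 + b_2\bigl[(X_1 + \Delta X_1)^{2} - X_1^{2}\bigr]
\]
```

Unlike the linear case, the effect depends on the starting value X1, not just on the size of ΔX1.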
Example
• What is the average effect of an increase in Age from 30 to 40 years? 40 to 50 years?
• 2.03*(40-30) - .02*(1600 - 900) = 20.3 - 14 = 6.3
• 2.03*(50-40) - .02*(2500 - 1600) = 20.3 - 18 = 2.3
• Units?!
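The arithmetic above can be checked with a short script. This is a sketch that takes the two coefficients (2.03 on Age, -.02 on Age squared) from the calculation on the slide; the function and variable names are made up for illustration.

```python
# Coefficient estimates read off the slide's arithmetic.
B_AGE = 2.03      # coefficient on Age
B_AGE_SQ = -0.02  # coefficient on Age squared


def effect_of_age_change(age_from, age_to):
    """Estimated average change in Y when Age moves from age_from to age_to,
    holding the other explanatory variables constant."""
    return B_AGE * (age_to - age_from) + B_AGE_SQ * (age_to ** 2 - age_from ** 2)


effect_30_to_40 = effect_of_age_change(30, 40)
effect_40_to_50 = effect_of_age_change(40, 50)

print("Age 30 -> 40:", round(effect_30_to_40, 2))
print("Age 40 -> 50:", round(effect_40_to_50, 2))
```

The same ten-year change in Age has a smaller estimated effect at older ages because the coefficient on the squared term is negative.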
http://xkcd.com/985/
Coefficient of Determination for Multiple Regression
• Reports the proportion of total variation in Y explained by all X variables taken together
• Consider this model
Regression Statistics
Multiple R          0.72213
R Square            0.52148
Adjusted R Square   0.44172
Standard Error      47.46341
Observations        15

ANOVA
             df   SS          MS          F         Significance F
Regression    2   29460.027   14730.013   6.53861   0.01201
Residual     12   27033.306    2252.776
Total        14   56493.333

              Coefficients   Standard Error   t Stat     P-value   Lower 95%   Upper 95%
Intercept        306.52619        114.25389    2.68285   0.01993    57.58835   555.46404
Price            -24.97509         10.83213   -2.30565   0.03979   -48.57626    -1.37392
Advertising       74.13096         25.96732    2.85478   0.01449    17.55303   130.70888
52.1% of the variation in pie sales is explained by the variation in price and advertising
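The summary numbers in the regression output can be rebuilt from the ANOVA rows. The sketch below copies the sums of squares, degrees of freedom, and the Price coefficient and standard error from the output; the variable names are made up for illustration.

```python
# Values copied from the regression output for the pie sales model.
ssr, df_regression = 29460.027, 2   # Regression row
sse, df_residual = 27033.306, 12    # Residual row
sst = 56493.333                     # Total row

r_squared = ssr / sst               # share of total variation explained
msr = ssr / df_regression           # mean square, regression
mse = sse / df_residual             # mean square, residual
f_stat = msr / mse                  # whole-model F statistic
standard_error = mse ** 0.5         # standard error of the regression

# t Stat for Price = coefficient / standard error (testing H0: coefficient = 0).
t_price = -24.97509 / 10.83213

print(f"R Square       = {r_squared:.5f}")
print(f"F              = {f_stat:.5f}")
print(f"Standard Error = {standard_error:.5f}")
print(f"t Stat (Price) = {t_price:.5f}")
```

Up to rounding in the inputs, these should agree with the R Square, F, Standard Error, and t Stat entries reported in the output.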
Multiple Coefficient of Determination (continued)
Adjusted R2
• R2 never decreases when a new X variable is added to the model
– disadvantage when comparing models
• What is the net effect of adding a new variable?
– We lose a degree of freedom when a new X variable is added
– Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
Adjusted R2 (continued)
• Penalizes excessive use of unimportant variables
• Smaller than R2 and can increase, decrease, or stay the same
• Useful in comparing among models, but don’t rely too heavily on it; use theory and statistical significance
Adjusted R2 (continued)
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables
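The adjusted R2 in the pie sales output can be reproduced with the usual formula, 1 - (1 - R2)(n - 1)/(n - K - 1). The sketch below uses n, K, and R2 from the regression output. The second calculation is purely hypothetical: it imagines a third variable that nudges R2 up to 0.525, to show the degrees-of-freedom penalty at work.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R2 for a model with k explanatory variables and n observations."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# From the regression output: 15 observations, 2 X variables, R Square = 0.52148.
adj_two_vars = adjusted_r_squared(0.52148, n=15, k=2)

# HYPOTHETICAL: a third, unimportant variable raises R2 only slightly, to 0.525.
adj_three_vars = adjusted_r_squared(0.525, n=15, k=3)

print(f"Adjusted R2, 2 variables:                {adj_two_vars:.5f}")
print(f"Adjusted R2, 3 variables (hypothetical): {adj_three_vars:.5f}")
```

In the hypothetical case R2 goes up but adjusted R2 goes down, because the small gain in explanatory power does not offset the lost degree of freedom.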
Log Functional Forms
• Linear-Log
• Log-linear
• Log-log
• Log of a variable means interpretation is a percentage change in the variable
• (don’t forget Mark’s pet peeve)
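For reference, the three forms are commonly written as below, with their approximate interpretations for small changes. This is a sketch with a single X and generic coefficient names, which are assumptions here; the slide equations are not in the transcript.

```latex
\[
\text{Linear-log:}\quad Y = \beta_0 + \beta_1 \ln X + \varepsilon
\qquad\Rightarrow\qquad
\Delta Y \approx \frac{\beta_1}{100}\,(\%\Delta X)
\]
\[
\text{Log-linear:}\quad \ln Y = \beta_0 + \beta_1 X + \varepsilon
\qquad\Rightarrow\qquad
\%\Delta Y \approx 100\,\beta_1\,\Delta X
\]
\[
\text{Log-log:}\quad \ln Y = \beta_0 + \beta_1 \ln X + \varepsilon
\qquad\Rightarrow\qquad
\%\Delta Y \approx \beta_1\,(\%\Delta X)
\]
```

In the log-log form, the slope coefficient is the elasticity of Y with respect to X.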
Log Functional Forms
• Here’s why: ln(x+Δx) - ln(x) = ln(1 + Δx/x) ≈ Δx/x for small Δx/x
calculus: d ln(x)/dx = 1/x, so Δln(x) ≈ Δx/x
• Numerically: ln(1.01) = .00995 ≈ .01
ln(1.10) = .0953 ≈ .10 (sort of)
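A quick numerical check shows how the approximation degrades as the change gets larger. This is a minimal sketch; the list of proportional changes is an arbitrary choice for illustration.

```python
import math

# Compare the exact log difference ln(1 + p) with the proportional change p.
changes = [0.01, 0.05, 0.10, 0.25]
log_diffs = {p: math.log(1 + p) for p in changes}

for p, log_diff in log_diffs.items():
    gap = p - log_diff  # how far the "percentage change" reading is off
    print(f"change = {p:.2f}   ln(1 + change) = {log_diff:.5f}   gap = {gap:.5f}")
```

The gap is negligible for a 1% change and noticeably larger for a 25% change, which is why the percentage interpretation is best kept to small changes.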
Linear-Log Functional Form
Log-Linear Functional Form
Log-Log Functional Form
Examples