Analysis of Cross Section and Panel Data
Yan Zhang
School of Economics, Fudan University
CCER, Fudan University
Introductory Econometrics: A Modern Approach
Analysis of Cross Section and Panel Data
Part 1. Regression Analysis on Cross Sectional Data
Chap 2. The Simple Regression Model: Practice for Learning Multiple Regression
Bivariate linear regression model: $y = \beta_0 + \beta_1 x + u$
$\beta_1$: the slope parameter in the relationship between y and x, holding the other factors in u fixed; it is of primary interest in applied economics.
$\beta_0$: the intercept parameter; it also has its uses, although it is rarely central to an analysis.
More Discussion
Linearity: $\Delta y = \beta_1 \Delta x$ if $\Delta u = 0$. A one-unit change in x has the same effect on y regardless of the initial value of x. This rules out increasing returns, e.g., in the wage-education relationship; such effects are handled through the choice of functional form.
Can we draw ceteris paribus conclusions about how x affects y from a random sample of data when we are ignoring all the other factors? Only if we make an assumption restricting how the unobservable random variable u is related to the explanatory variable x.
Classical Regression Assumptions
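$E(u) = 0$: a feasible assumption as long as the intercept term is included.
$E(u \mid x) = E(u) = 0$: zero conditional expectation, which is stronger than linear uncorrelatedness, $\mathrm{Cov}(x, u) = 0$.
Meaning: if this assumption fails, x is endogenous (endogeneity). Under it, the PRF (population regression function) is $E(y \mid x) = \beta_0 + \beta_1 x$: something fixed but unknown.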
OLS
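Minimize the sum of squared residuals $\sum_{i=1}^n \hat u_i^2 = \sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2$, which gives
$\hat\beta_1 = \dfrac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x$.
Sample regression function (SRF): $\hat y = \hat\beta_0 + \hat\beta_1 x$. The point $(\bar x, \bar y)$ is always on the OLS regression line.
Fitted values and residuals: $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$, $\hat u_i = y_i - \hat y_i$; the residuals satisfy $\sum_i \hat u_i = 0$ and $\sum_i x_i \hat u_i = 0$.
PRF: $E(y \mid x) = \beta_0 + \beta_1 x$; the SRF is its estimate from one particular sample.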
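A minimal NumPy sketch of the closed-form OLS formulas above, using simulated data; the true values $\beta_0 = 1$, $\beta_1 = 0.5$, the sample size and the seed are illustrative assumptions, not part of the slides. It also checks that the residuals sum to zero and that $(\bar x, \bar y)$ lies on the SRF.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(10.0, 2.0, n)
u = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.5 * x + u                      # assumed PRF: E(y|x) = 1 + 0.5 x

# Closed-form OLS estimates
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x                        # fitted values on the SRF
u_hat = y - y_hat                          # residuals

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print("sum of residuals:", u_hat.sum())                       # ~0 up to rounding
print("sum of x * residuals:", np.sum(x * u_hat))             # ~0 up to rounding
print("SRF at x_bar minus y_bar:", b0 + b1 * x_bar - y_bar)   # ~0
```

`np.polyfit(x, y, 1)` returns the same slope and intercept, which is a convenient cross-check.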
OLS
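Coefficient of determination: $R^2 = SSE/SST = 1 - SSR/SST$, where $SST = \sum_i (y_i - \bar y)^2$, $SSE = \sum_i (\hat y_i - \bar y)^2$ and $SSR = \sum_i \hat u_i^2$.
$R^2$ is the fraction of the sample variation in y that is explained by x.
$R^2$ equals the square of the sample correlation coefficient between $y_i$ and $\hat y_i$.
Low R-squareds are common in cross-sectional regressions; a low $R^2$ does not by itself mean that the OLS regression equation is useless.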
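The three equivalent expressions for $R^2$ can be checked numerically. This is a sketch on simulated data with a deliberately weak signal (the data-generating process is an assumption for illustration); all three numbers agree even though $R^2$ is small.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 2.0 + 0.3 * x + rng.normal(size=n)     # assumed DGP with a weak signal

b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)      # explained sum of squares
ssr = np.sum(u_hat ** 2)                   # residual sum of squares

r2 = sse / sst
r2_alt = 1 - ssr / sst
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2
print(r2, r2_alt, r2_corr)
```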
Units of Measurement
If the dependent variable is multiplied by a constant c (i.e., each value in the sample is multiplied by c), then the OLS intercept and slope estimates are also multiplied by c.
If the independent variable is divided or multiplied by a nonzero constant c, then its OLS slope coefficient is multiplied or divided by c, respectively; the intercept is unchanged.
The goodness-of-fit of the model, $R^2$, does not depend on the units of measurement of the variables.
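A quick numerical check of the three rescaling rules, as a sketch on made-up data; the scale factor `c = 1000` and the helper names `ols_simple` and `r_squared` are hypothetical choices for illustration.

```python
import numpy as np

def ols_simple(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

def r_squared(x, y):
    b0, b1 = ols_simple(x, y)
    resid = y - b0 - b1 * x
    return 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
x = rng.uniform(8, 20, 300)                  # hypothetical regressor
y = 3.0 + 0.8 * x + rng.normal(size=300)     # assumed DGP

c = 1000.0
b0, b1 = ols_simple(x, y)
b0_y, b1_y = ols_simple(x, c * y)            # rescale the dependent variable
b0_x, b1_x = ols_simple(c * x, y)            # rescale the independent variable

print(b0_y / b0, b1_y / b1)                  # both equal c
print(b0_x / b0, b1_x * c / b1)              # both equal 1
print(r_squared(x, y), r_squared(c * x, c * y))
```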
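Functional Form
Linear vs. nonlinear: the model must be linear in the parameters, but not necessarily in the variables.
Logarithmic dependent variable: $\log(y) = \beta_0 + \beta_1 x + u$, so $\%\Delta y \approx (100\beta_1)\Delta x$; $100\beta_1$ is the semi-elasticity (the percentage change in y per unit change in x). In the wage-education example, a constant percentage return implies an increasing return to education in levels. Other nonlinearity: the diploma effect.
Log-log (bi-logarithmic) model: $\log(y) = \beta_0 + \beta_1 \log(x) + u$, where $\beta_1$ is the constant elasticity of y with respect to x.
Change of units of measurement: if y is replaced by $c_1 y$ and x by $c_2 x$, the slope is unchanged and the intercept becomes $\beta_0^* = \beta_0 + \log(c_1) - \beta_1 \log(c_2)$ (the textbook contains an error on p. 45 here).
Be proficient at interpreting the coefficients.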
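The corrected intercept formula can be verified by re-running a log-log regression after changing units. In this sketch the constant elasticity of 0.7 and the unit factors $c_1 = 100$, $c_2 = 0.001$ are arbitrary assumptions.

```python
import numpy as np

def ols_simple(x, y):
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

rng = np.random.default_rng(3)
x = rng.uniform(1, 50, 400)
# assumed constant-elasticity DGP: log(y) = 0.5 + 0.7 log(x) + u
y = np.exp(0.5 + 0.7 * np.log(x) + rng.normal(scale=0.2, size=400))

b0, b1 = ols_simple(np.log(x), np.log(y))

c1, c2 = 100.0, 0.001            # hypothetical unit changes: y -> c1*y, x -> c2*x
b0_star, b1_star = ols_simple(np.log(c2 * x), np.log(c1 * y))

predicted = b0 + np.log(c1) - b1 * np.log(c2)
print("slope before/after:", b1, b1_star)            # unchanged
print("new intercept vs formula:", b0_star, predicted)
```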
Unbiasedness of OLS Estimators
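Statistical properties of OLS: the distributional properties of the OLS estimates across different random samples drawn from the population.
Assumptions:
Linear in parameters (functional form; more advanced methods)
Random sampling (can fail with time series data or nonrandom sampling)
Zero conditional mean, $E(u \mid x) = 0$ (if it fails, OLS is biased rather than unbiased; spurious correlation)
Sample variation in the independent variable: the $x_i$, $i = 1, \dots, n$, are not all the same value (compare collinearity in multiple regression)
Theorem (Unbiasedness): Under the four assumptions above, $E(\hat\beta_0) = \beta_0$ and $E(\hat\beta_1) = \beta_1$.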
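Unbiasedness is a statement about the estimator across repeated samples, which a small Monte Carlo makes concrete. This sketch assumes $\beta_0 = 1$, $\beta_1 = 2$, standard normal x and u, n = 50 and 5000 replications; individual estimates scatter, but their average is close to $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(4)
beta0, beta1 = 1.0, 2.0          # assumed population parameters
n, reps = 50, 5000

estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n)       # independent of x, so E(u|x) = 0
    y = beta0 + beta1 * x + u
    xd = x - x.mean()
    estimates[r] = np.sum(xd * y) / np.sum(xd ** 2)   # OLS slope

print("mean of b1 across samples:", estimates.mean())
print("std. dev. of b1 across samples:", estimates.std())
```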
Variance of OLS Estimators
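The sampling distribution of $\hat\beta_1$ is centered at $\beta_1$; the question is how far we can expect $\hat\beta_1$ to be from $\beta_1$ on average.
Additional assumption, homoskedasticity: $\mathrm{Var}(u \mid x) = \sigma^2$.
Error variance $\sigma^2$: a larger $\sigma^2$ means that the distribution of the unobservables affecting y is more spread out.
Theorem (Sampling variance of OLS estimators): Under the five assumptions above, conditional on the sample values of x,
$\mathrm{Var}(\hat\beta_1) = \dfrac{\sigma^2}{\sum_{i=1}^n (x_i - \bar x)^2} = \dfrac{\sigma^2}{SST_x}, \qquad \mathrm{Var}(\hat\beta_0) = \dfrac{\sigma^2 n^{-1} \sum_{i=1}^n x_i^2}{SST_x}$.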
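The variance formula is conditional on the sample values of x, so the simulation below holds x fixed and redraws only the errors. It is a sketch with assumed values ($\sigma = 3$, n = 40, 20000 replications); the simulated variance of $\hat\beta_1$ should be close to $\sigma^2 / SST_x$.

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, sigma = 1.0, 2.0, 3.0          # assumed values
n, reps = 40, 20000

x = rng.uniform(0, 10, n)                    # x held fixed across replications
xd = x - x.mean()
sst_x = np.sum(xd ** 2)

u = rng.normal(0.0, sigma, size=(reps, n))   # homoskedastic errors, one row per sample
y = beta0 + beta1 * x + u                    # broadcasting over rows
b1 = (y @ xd) / sst_x                        # OLS slope for every replication

print("simulated Var(b1):", b1.var())
print("formula sigma^2 / SST_x:", sigma ** 2 / sst_x)
```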
Variance of y given x
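Conditional mean and variance of y: $E(y \mid x) = \beta_0 + \beta_1 x$ and $\mathrm{Var}(y \mid x) = \sigma^2$. Heteroskedasticity: $\mathrm{Var}(u \mid x)$, and hence $\mathrm{Var}(y \mid x)$, depends on x.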
What does $\mathrm{Var}(\hat\beta_1)$ depend on?
A larger $\sigma^2$: more variation in the unobservables affecting y makes it more difficult to estimate $\beta_1$ precisely.
A larger $SST_x$: the more spread out the sample of $x_i$'s, the easier it is to find the relationship between $E(y \mid x)$ and x.
As the sample size increases, so does the total variation in the $x_i$. Therefore, a larger sample size results in a smaller variance of $\hat\beta_1$.
Estimating Error Variance
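Errors (disturbances) and residuals: the errors $u_i = y_i - \beta_0 - \beta_1 x_i$ belong to the population model and are never observed; the residuals $\hat u_i = y_i - \hat\beta_0 - \hat\beta_1 x_i$ come from the estimated function.
Theorem (Unbiased estimator of $\sigma^2$): Under the five assumptions above, $E(\hat\sigma^2) = \sigma^2$, where $\hat\sigma^2 = \dfrac{1}{n-2} \sum_{i=1}^n \hat u_i^2 = \dfrac{SSR}{n-2}$.
Standard error of the regression (SER): $\hat\sigma = \sqrt{\hat\sigma^2}$, an estimate of the standard deviation in y after the effect of x has been taken out.
Standard error of $\hat\beta_1$: $\mathrm{se}(\hat\beta_1) = \hat\sigma / \sqrt{SST_x}$.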
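A Monte Carlo sketch of why the divisor is $n - 2$ rather than n: with x fixed and errors redrawn, $SSR/(n-2)$ averages to $\sigma^2$ while $SSR/n$ falls short by the factor $(n-2)/n$. The values $\sigma = 2$, n = 15 and the seed are assumptions; the last lines compute the SER and $\mathrm{se}(\hat\beta_1)$ for one simulated sample.

```python
import numpy as np

rng = np.random.default_rng(6)
beta0, beta1, sigma = 0.5, 1.5, 2.0          # assumed values
n, reps = 15, 20000

x = rng.normal(size=n)                       # fixed regressor
xd = x - x.mean()
sst_x = np.sum(xd ** 2)

u = rng.normal(0.0, sigma, size=(reps, n))
y = beta0 + beta1 * x + u
b1 = (y @ xd) / sst_x
b0 = y.mean(axis=1) - b1 * x.mean()
resid = y - b0[:, None] - b1[:, None] * x
ssr = np.sum(resid ** 2, axis=1)

sigma2_hat = ssr / (n - 2)                   # unbiased estimator
sigma2_naive = ssr / n                       # divides by n: biased downward

# For one sample: SER and the standard error of b1
ser = np.sqrt(sigma2_hat[0])
se_b1 = ser / np.sqrt(sst_x)
print("mean of SSR/(n-2):", sigma2_hat.mean(), " true sigma^2:", sigma ** 2)
print("mean of SSR/n:    ", sigma2_naive.mean())
print("SER:", ser, " se(b1):", se_b1)
```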
Regression through the Origin
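Regression through the origin: $\tilde y = \tilde\beta_1 x$, i.e., the line is forced to pass through $(0, 0)$. E.g., income tax revenue as a function of income.
The OLS estimator: $\tilde\beta_1 = \dfrac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}$.
$\tilde\beta_1 = \hat\beta_1$ only if $\bar x = 0$. If the intercept $\beta_0 \neq 0$, then $\tilde\beta_1$ is a biased estimator of $\beta_1$.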
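A sketch comparing the through-origin slope with the usual OLS slope in two simulated cases (the DGPs are assumptions): when $\bar x = 0$ the two coincide; when $\beta_0 = 4 \neq 0$ and x has a positive mean, the through-origin slope is far from the true $\beta_1 = 2$.

```python
import numpy as np

def slope_through_origin(x, y):
    """OLS slope when the line is forced through (0, 0)."""
    return np.sum(x * y) / np.sum(x ** 2)

def slope_with_intercept(x, y):
    xd = x - x.mean()
    return np.sum(xd * y) / np.sum(xd ** 2)

rng = np.random.default_rng(7)
n = 1000

# Case 1: x has sample mean zero, so the two slopes coincide
x0 = rng.normal(size=n)
x0 = x0 - x0.mean()
y0 = 4.0 + 2.0 * x0 + rng.normal(size=n)

# Case 2: nonzero intercept and x with a positive mean, so the origin slope is biased
x1 = rng.uniform(1, 5, n)
y1 = 4.0 + 2.0 * x1 + rng.normal(size=n)

origin0, with0 = slope_through_origin(x0, y0), slope_with_intercept(x0, y0)
origin1, with1 = slope_through_origin(x1, y1), slope_with_intercept(x1, y1)
print(origin0, with0)
print(origin1, with1)
```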
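Chap 3. Multiple Regression Analysis: Estimation
Advantages of multiple regression analysis: it lets us build better models for predicting the dependent variable.
E.g., it lets us generalize the functional form: in $cons = \beta_0 + \beta_1 inc + \beta_2 inc^2 + u$, the marginal propensity to consume is $\beta_1 + 2\beta_2 inc$.
It is more amenable to ceteris paribus analysis.
Chap 3.2 Key assumption, in $wage = \beta_0 + \beta_1 educ + \beta_2 exper + u$: $E(u \mid educ, exper) = 0$. Implication: other factors affecting wage are not related on average to educ and exper.
Multiple linear regression model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$
$\beta_j$: the ceteris paribus effect of $x_j$ on y.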
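Ordinary Least Squares Estimator
SRF: $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \dots + \hat\beta_k x_k$. OLS: minimize $\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_{i1} - \dots - \hat\beta_k x_{ik})^2$.
F.O.C.: $\sum_{i=1}^n \hat u_i = 0$ and $\sum_{i=1}^n x_{ij} \hat u_i = 0$ for $j = 1, \dots, k$.
Ceteris paribus interpretation: $\Delta \hat y = \hat\beta_1 \Delta x_1 + \dots + \hat\beta_k \Delta x_k$.
Holding $x_2, \dots, x_k$ fixed, $\Delta \hat y = \hat\beta_1 \Delta x_1$. Thus, we have controlled for the variables $x_2, \dots, x_k$ when estimating the effect of $x_1$ on y.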
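A minimal sketch of multiple-regression OLS with `numpy.linalg.lstsq`, using the slide's wage example with simulated data; the coefficient values and distributions are made-up assumptions. The first-order conditions show up as residuals orthogonal to every column of the design matrix.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
educ = rng.uniform(8, 20, n)                  # hypothetical wage example
exper = rng.uniform(0, 30, n)
u = rng.normal(0, 2, n)
wage = -2.0 + 0.6 * educ + 0.1 * exper + u    # assumed population model

X = np.column_stack([np.ones(n), educ, exper])   # first column: intercept
beta_hat, *_ = np.linalg.lstsq(X, wage, rcond=None)
resid = wage - X @ beta_hat

print("estimates (b0, b_educ, b_exper):", beta_hat)
# First-order conditions: residuals orthogonal to every column of X
print("X'u_hat:", X.T @ resid)
# Ceteris paribus: predicted change in wage from one more year of educ, exper fixed
print("effect of +1 educ:", beta_hat[1])
```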
Holding Other Factors Fixed
The power of multiple regression analysis is that it provides this ceteris paribus interpretation even though the data have not been collected in a ceteris paribus fashion.
It allows us to do in non-experimental environments what natural scientists are able to do in a controlled laboratory setting: keep other factors fixed.
OLS and Ceteris Paribus Effects
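Steps of OLS (partialling out):
(1) $\hat r_{i1}$: the OLS residuals from a multiple regression of $x_1$ on $x_2, \dots, x_k$ (with an intercept).
(2) $\hat\beta_1 = \dfrac{\sum_{i=1}^n \hat r_{i1} y_i}{\sum_{i=1}^n \hat r_{i1}^2}$: the OLS estimator from a simple regression of y on $\hat r_1$. It measures the effect of $x_1$ on y after $x_2, \dots, x_k$ have been partialled or netted out.
Two special cases in which the simple regression of y on $x_1$ produces the same OLS estimate on $x_1$ as the regression of y on $x_1$ and $x_2$: (i) the OLS coefficient on $x_2$ is zero, $\hat\beta_2 = 0$; (ii) $x_1$ and $x_2$ are uncorrelated in the sample.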
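The two-step partialling-out procedure can be checked against the full regression. This sketch uses simulated regressors $x_2, x_3$ (the DGP is an assumption) and shows that step (2) reproduces the coefficient on $x_1$ from the full multiple regression.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 400
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x1 = 0.8 * x2 - 0.5 * x3 + rng.normal(size=n)   # x1 correlated with the other regressors
y = 1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

ones = np.ones(n)

# Full multiple regression of y on (1, x1, x2, x3)
X = np.column_stack([ones, x1, x2, x3])
b_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step (1): residuals r1 from regressing x1 on (1, x2, x3)
Z = np.column_stack([ones, x2, x3])
g, *_ = np.linalg.lstsq(Z, x1, rcond=None)
r1 = x1 - Z @ g

# Step (2): simple regression of y on r1 (r1 has mean zero, so no centering is needed)
b1_partial = np.sum(r1 * y) / np.sum(r1 ** 2)

print(b_full[1], b1_partial)   # identical up to rounding
```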
Goodness-of-fit
$R^2$ also equals the squared correlation coefficient between the actual and the fitted values of y.
$R^2$ never decreases, and it usually increases, when another independent variable is added to a regression.
The factor that should determine whether an explanatory variable belongs in a model is whether the explanatory variable has a nonzero partial effect on y in the population.
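A sketch illustrating that $R^2$ cannot fall when a regressor is added, even one that is pure noise; `junk` is a hypothetical irrelevant variable whose true coefficient is zero.

```python
import numpy as np

def r_squared(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(10)
n = 100
x1 = rng.normal(size=n)
junk = rng.normal(size=n)                 # irrelevant: true coefficient is zero
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

ones = np.ones(n)
r2_small = r_squared(np.column_stack([ones, x1]), y)
r2_big = r_squared(np.column_stack([ones, x1, junk]), y)
print(r2_small, r2_big)
```

The increase says nothing about whether `junk` has a nonzero partial effect in the population, which is why $R^2$ alone should not decide which variables to include.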
Regression through the origin
The properties of OLS derived earlier no longer hold for regression through the origin: the OLS residuals no longer have a zero sample average, and $R^2 = 1 - SSR/SST$ can actually be negative.
One alternative is to calculate $R^2$ as the squared correlation coefficient between $y$ and $\tilde y$.
If the intercept in the population model is different from zero, then the OLS estimators of the slope parameters will be biased.
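A small made-up data set shows both failures: y hovers around 10 with no trend, so forcing the line through the origin leaves residuals with a nonzero mean and gives $1 - SSR/SST < 0$, while the squared-correlation version stays in $[0, 1]$.

```python
import numpy as np

x = np.arange(1.0, 11.0)                          # 1, 2, ..., 10
y = 10.0 + 0.1 * np.array([1, -1] * 5)            # nearly flat around 10 (made-up data)

b_origin = np.sum(x * y) / np.sum(x ** 2)         # slope forced through (0, 0)
resid = y - b_origin * x

sst = np.sum((y - y.mean()) ** 2)
r2 = 1 - np.sum(resid ** 2) / sst                 # can be negative without an intercept
r2_corr = np.corrcoef(y, b_origin * x)[0, 1] ** 2 # alternative: squared correlation

print("mean residual:", resid.mean())
print("R^2 = 1 - SSR/SST:", r2)
print("squared correlation:", r2_corr)
```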
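The Expectation of the OLS Estimators
Assumptions (a direct generalization of the simple regression assumptions; compare them):
MLR.1 Linear in parameters
MLR.2 Random sampling
MLR.3 Zero conditional mean: $E(u \mid x_1, \dots, x_k) = 0$
MLR.4 No perfect collinearity: none of the independent variables is constant, and there are no exact linear relationships among the independent variables. In matrix form, $\mathrm{rank}(X) = K$, where $K = k + 1$ is the number of columns of X, including the constant.
Theorem (Unbiasedness): Under the four assumptions above, $E(\hat\beta_j) = \beta_j$, $j = 0, 1, \dots, k$.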
Notice 1: Zero conditional mean
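Exogenous explanatory variables satisfy $E(u \mid x_1, \dots, x_k) = 0$; an $x_j$ correlated with u is endogenous. Sources of endogeneity:
Misspecification of the functional form (Chap 9): omitting a quadratic term, or using the level of a variable when its log belongs in the model (or vice versa).
Omitting important factors that are correlated with any independent variable: if the omitted variable is correlated with an explanatory variable, the zero conditional mean assumption fails and the regression results are biased.
Measurement error (Chap 15, IV).
Simultaneously determining one or more of the x's with y (Chap 16, simultaneous equations).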
Omitted Variable Bias: The Simple Case
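Problem: excluding a relevant variable, or under-specifying the model (omitting a variable that belongs in the true population model).
Omitted variable bias (misspecification analysis):
The true population model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$.
The underspecified OLS line: $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1$, with $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$, where $\tilde\delta_1$ is the slope from the regression of $x_2$ on $x_1$.
The expectation of $\tilde\beta_1$: $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$.
The omitted variable bias: $\mathrm{Bias}(\tilde\beta_1) = E(\tilde\beta_1) - \beta_1 = \beta_2 \tilde\delta_1$.
Note that here $x_2$ is regressed on $x_1$, whereas in Section 3.2 above $x_1$ was regressed on $x_2$.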
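A sketch verifying the in-sample identity $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2 \tilde\delta_1$ on simulated data. The DGP ($\beta_1 = 1$, $\beta_2 = 0.8$, and $x_2$ positively correlated with $x_1$) is an assumption, chosen so the short regression shows an upward bias of roughly $\beta_2 \tilde\delta_1 \approx 0.4$.

```python
import numpy as np

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

rng = np.random.default_rng(11)
n = 1000
beta1, beta2 = 1.0, 0.8                        # assumed true coefficients
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)             # omitted variable, positively correlated with x1
y = 2.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=n)

ones = np.ones(n)
b_long = ols(np.column_stack([ones, x1, x2]), y)    # (b0_hat, b1_hat, b2_hat)
b_short = ols(np.column_stack([ones, x1]), y)       # underspecified: x2 omitted
delta = ols(np.column_stack([ones, x1]), x2)        # regression of x2 on x1

# Sample identity: tilde_b1 = hat_b1 + hat_b2 * tilde_delta1
print(b_short[1], b_long[1] + b_long[2] * delta[1])
print("bias term beta2 * delta1:", beta2 * delta[1])
```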
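Omitted Variable Bias: Nonexistence
Two cases where $\tilde\beta_1$ is unbiased:
The true population model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$.
(1) $\beta_2 = 0$, i.e., $x_2$ does not appear in the true model.
(2) $\tilde\delta_1 = 0$, where $\tilde\delta_1$ is the sample covariance between $x_1$ and $x_2$ over the sample variance of $x_1$.
If $\tilde\delta_1 = 0$, the unbiasedness of $\tilde\beta_1$ does not depend on $x_2$: estimation only requires adjusting the intercept, and putting $x_2$ into the error term does not violate the zero conditional mean assumption.
Summary of omitted variable bias:
The expectation of $\tilde\beta_1$: $E(\tilde\beta_1) = \beta_1 + \beta_2 \tilde\delta_1$. The omitted variable bias: $\beta_2 \tilde\delta_1$.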
The Size of Omitted Variable Bias
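Direction and size: a small bias of either sign need not be a cause for concern.
$\beta_2$ and $\tilde\delta_1$ are unknown, but we usually have some idea about them:
we usually have a pretty good idea about the direction of the partial effect of $x_2$ on y, that is, the sign of $\beta_2$;
in many cases we can make an educated guess about whether $x_1$ and $x_2$ are positively or negatively correlated.
Terminology: upward bias, $E(\tilde\beta_1) > \beta_1$; downward bias, $E(\tilde\beta_1) < \beta_1$; biased toward zero, $E(\tilde\beta_1)$ is closer to zero than $\beta_1$. With an upward bias, the effect of $x_1$ is overestimated.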
Omitted Variable Bias: More General Cases
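Suppose the true model is $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$ and we omit $x_3$, estimating $\tilde y = \tilde\beta_0 + \tilde\beta_1 x_1 + \tilde\beta_2 x_2$. Suppose $x_2$ and $x_3$ are uncorrelated, but $x_1$ is correlated with $x_3$. Both $\tilde\beta_1$ and $\tilde\beta_2$ will normally be biased. The only exception is when $x_1$ and $x_2$ are also uncorrelated.
It is difficult to obtain the direction of the bias in $\tilde\beta_1$ and $\tilde\beta_2$.
Approximation: if $x_1$ and $x_2$ are also uncorrelated, $E(\tilde\beta_1) = \beta_1 + \beta_3 \dfrac{\sum_{i=1}^n (x_{i1} - \bar x_1) x_{i3}}{\sum_{i=1}^n (x_{i1} - \bar x_1)^2}$.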
Notice 2: No Perfect Collinearity
It is an assumption only about the x's; it says nothing about the relationship between u and the x's.
Assumption MLR.4 does allow the independent variables to be correlated; they just cannot be perfectly correlated.
Significance for ceteris paribus effects: if we did not allow for any correlation among the independent variables, multiple regression would not be very useful for econometric analysis.
Cases of Perfect Collinearity
When can independent variables be perfectly collinear? Regression software then reports the problem, typically as a “singular” matrix.
Nonlinear functions of the same variable are not exact linear functions of it, so, e.g., $x$ and $x^2$ can both be included.
Do not include the same explanatory variable measured in different units in the same regression equation.
More subtle cases: one independent variable can be expressed as an exact linear function of some or all of the other independent variables (e.g., $x_3 = x_1 + x_2$). Solution: drop it.
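A sketch of how perfect collinearity shows up numerically: the design matrix has fewer linearly independent columns than columns. The variables `income_dollars`, `income_thousands` and `age` are hypothetical; including income in two units violates MLR.4, while income and its square do not.

```python
import numpy as np

rng = np.random.default_rng(12)
n = 50
income_dollars = rng.uniform(20_000, 90_000, n)
income_thousands = income_dollars / 1000         # same variable, different units
age = rng.uniform(20, 60, n)

ones = np.ones(n)
X_bad = np.column_stack([ones, income_dollars, income_thousands, age])
X_ok = np.column_stack([ones, income_thousands, income_thousands ** 2, age])

# Number of columns vs. numerical rank of the design matrix
print(X_bad.shape[1], np.linalg.matrix_rank(X_bad))   # rank < columns: MLR.4 fails
print(X_ok.shape[1], np.linalg.matrix_rank(X_ok))     # x and x^2 are not exactly collinear
```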
Key:
Notice 3: Unbiasedness
The meaning of unbiasedness: an estimate cannot be unbiased. An estimate is a fixed number, obtained from a particular sample, which usually is not equal to the population parameter.
When we say that OLS is unbiased under Assumptions MLR.1 through MLR.4, we mean that the procedure by which the OLS estimates are obtained is unbiased when we view the procedure as being applied across all possible random samples.
Notice 4: Over-Specification
Inclusion of an irrelevant variable, or over-specifying the model, does not affect the unbiasedness of the OLS estimators.
However, including irrelevant variables can have undesirable effects on the variances of the OLS estimators.
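Variance of the OLS Estimators
Adding an assumption, MLR.5 Homoskedasticity: $\mathrm{Var}(u \mid x_1, \dots, x_k) = \sigma^2$.
Error variance $\sigma^2$: a larger $\sigma^2$ means that the distribution of the unobservables affecting y is more spread out.
Gauss-Markov assumptions (for cross-sectional regression): Assumptions MLR.1 through MLR.5.
Theorem (Sampling variance of OLS estimators): Under the five assumptions above, conditional on the sample values of the independent variables,
$\mathrm{Var}(\hat\beta_j) = \dfrac{\sigma^2}{SST_j (1 - R_j^2)}, \quad j = 1, \dots, k$,
where $SST_j = \sum_{i=1}^n (x_{ij} - \bar x_j)^2$ and $R_j^2$ is the R-squared from regressing $x_j$ on all other independent variables (including an intercept).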
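The slide's formula is the same number as the familiar matrix expression $\sigma^2 [(X'X)^{-1}]_{jj}$. This sketch checks the equivalence on a simulated design (the regressors and $\sigma^2 = 4$ are assumptions); the helper `r2_aux` is a hypothetical name for the auxiliary regression that gives $R_j^2$.

```python
import numpy as np

rng = np.random.default_rng(13)
n, sigma2 = 200, 4.0                       # assumed error variance
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])

# Matrix form: Var(b | X) = sigma^2 (X'X)^{-1}
var_matrix = sigma2 * np.diag(np.linalg.inv(X.T @ X))

def r2_aux(j):
    """R_j^2 from regressing column j on all other columns (incl. the intercept)."""
    others = np.delete(X, j, axis=1)
    g, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ g
    xj = X[:, j]
    return 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)

# Slide formula: Var(b_j) = sigma^2 / [SST_j (1 - R_j^2)], j = 1..k
var_formula = np.array([
    sigma2 / (np.sum((X[:, j] - X[:, j].mean()) ** 2) * (1 - r2_aux(j)))
    for j in range(1, X.shape[1])
])
print(var_matrix[1:], var_formula)
```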
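More about $\mathrm{Var}(\hat\beta_j)$
The error variance $\sigma^2$ is a statistical property of the population distribution of y given $x = (x_1, x_2, \dots, x_k)$; it does not shrink as the sample size grows.
Error variance: for a given y, there is only one way to reduce the error variance: add more explanatory variables, which is not always possible or desirable.
The total sample variation in $x_j$, $SST_j$: the larger it is, the smaller $\mathrm{Var}(\hat\beta_j)$. One way to increase $SST_j$ is to increase the sample size.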
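Multicollinearity: linear relationships among the independent variables
$R_j^2$: the goodness of fit from regressing $x_j$ on the other explanatory variables (with an intercept), i.e., the proportion of the total variation in $x_j$ that can be explained by the other independent variables.
If k = 2: $\mathrm{Var}(\hat\beta_1) = \sigma^2 / [SST_1 (1 - R_1^2)]$, where $R_1^2$ is the R-squared from the simple regression of $x_1$ on $x_2$.
$R_j^2 = 0$ gives the smallest $\mathrm{Var}(\hat\beta_j)$; as $R_j^2 \to 1$, $\mathrm{Var}(\hat\beta_j) \to \infty$; $R_j^2 = 1$ is ruled out by MLR.4.
High (but not perfect) correlation between two or more of the independent variables is called multicollinearity.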
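The factor $1/(1 - R_j^2)$ in the variance formula is commonly called the variance inflation factor (VIF); the term is standard in the literature, though not used on this slide. The sketch below computes it for simulated regressors, where `a` and `b` are nearly collinear (an assumed setup) and `c` is unrelated to both.

```python
import numpy as np

def vif(X_no_const):
    """Variance inflation factors 1 / (1 - R_j^2) for each column.

    X_no_const: n x k array of regressors WITHOUT the constant column;
    an intercept is added in each auxiliary regression.
    """
    n, k = X_no_const.shape
    out = np.empty(k)
    for j in range(k):
        xj = X_no_const[:, j]
        others = np.column_stack([np.ones(n), np.delete(X_no_const, j, axis=1)])
        g, *_ = np.linalg.lstsq(others, xj, rcond=None)
        resid = xj - others @ g
        r2_j = 1 - resid @ resid / np.sum((xj - xj.mean()) ** 2)
        out[j] = 1 / (1 - r2_j)
    return out

rng = np.random.default_rng(14)
n = 1000
a = rng.normal(size=n)
b = a + 0.1 * rng.normal(size=n)     # nearly collinear with a
c = rng.normal(size=n)               # unrelated to a and b
v = vif(np.column_stack([a, b, c]))
print(v)                             # large for a and b, close to 1 for c
```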
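Micronumerosity: the problem of small sample size
Both a high $R_j^2$ and a low $SST_j$ increase $\mathrm{Var}(\hat\beta_j)$. One thing is clear: everything else being equal, for estimating $\beta_j$ it is better to have less correlation between $x_j$ and the other x's.
How to “solve” multicollinearity? Increase the sample size. Drop some variables? If a variable that belongs in the population model is dropped, the estimates will be biased.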
Notice: The influence of multicollinearity
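A high degree of correlation between certain independent variables can be irrelevant as to how well we can estimate other parameters in the model.
E.g., in $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$ with $x_2$ and $x_3$ highly correlated, $\mathrm{Var}(\hat\beta_2)$ and $\mathrm{Var}(\hat\beta_3)$ may be large, but the correlation between $x_2$ and $x_3$ has no direct effect on $\mathrm{Var}(\hat\beta_1)$ (see the notes).
Importance for economists: control variables. If $x_2$ and $x_3$ are controls and $x_1$ is the variable of interest, multicollinearity among the controls is not a concern.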
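A sketch of the example: hold $x_1$ and $x_2$ fixed and change only how strongly $x_3$ is correlated with $x_2$ (the correlation values 0 and 0.99 are assumptions). The sampling variance of $\hat\beta_2$ blows up, while that of $\hat\beta_1$ is unchanged; variances are computed as $\sigma^2 [(X'X)^{-1}]_{jj}$ with $\sigma^2 = 1$.

```python
import numpy as np

def ols_variances(X, sigma2=1.0):
    """Diagonal of sigma^2 (X'X)^{-1}: the sampling variances of the OLS estimators."""
    return sigma2 * np.diag(np.linalg.inv(X.T @ X))

rng = np.random.default_rng(15)
n = 5000
x1 = rng.normal(size=n)              # independent of x2 and x3
x2 = rng.normal(size=n)
e = rng.normal(size=n)
ones = np.ones(n)

results = {}
for rho in (0.0, 0.99):              # correlation between x2 and x3
    x3 = rho * x2 + np.sqrt(1 - rho ** 2) * e
    X = np.column_stack([ones, x1, x2, x3])
    results[rho] = ols_variances(X)

for rho, v in results.items():
    print(f"rho={rho}: Var(b1)={v[1]:.2e}, Var(b2)={v[2]:.2e}, Var(b3)={v[3]:.2e}")
```

Because $x_3$ only mixes $x_2$ with the same noise `e`, both designs span the same space for the controls, so $R_1^2$ and hence $\mathrm{Var}(\hat\beta_1)$ are identical.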
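Variances in Misspecified Models
True model: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u$. Let $\hat\beta_1$ come from the regression of y on $x_1$ and $x_2$, and $\tilde\beta_1$ from the simple regression of y on $x_1$. Conditional on the sample, $\mathrm{Var}(\hat\beta_1) = \sigma^2 / [SST_1 (1 - R_1^2)]$ and $\mathrm{Var}(\tilde\beta_1) = \sigma^2 / SST_1$, so $\mathrm{Var}(\tilde\beta_1) \le \mathrm{Var}(\hat\beta_1)$, with equality only when $x_1$ and $x_2$ are uncorrelated in the sample.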
Whether or Not to Include x2: Two Favorable Reasons
The choice of whether or not to include a particular variable in a regression model can be made by analyzing the tradeoff between bias and variance.
However, when $\beta_2 \neq 0$, there are two favorable reasons for including $x_2$ in the model: (1) any bias in $\tilde\beta_1$ does not shrink as the sample size grows; (2) the variances of both estimators shrink to zero as n increases.
Therefore, the multicollinearity induced by adding $x_2$ becomes less important as the sample size grows. In large samples, we would prefer $\hat\beta_1$.
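A Monte Carlo sketch of this bias-variance tradeoff, measured by mean squared error. The DGP ($\beta_1 = 1$, $\beta_2 = 0.3$, $\mathrm{corr}(x_1, x_2) = 0.9$) and the helper name `mse_long_short` are assumptions. At n = 10 the short regression can win on MSE; at n = 1000 its bias of about $\beta_2 \cdot 0.9$ dominates and the long regression is better.

```python
import numpy as np

def mse_long_short(n, reps=2000, beta2=0.3, seed=16):
    """Monte Carlo MSE of b1 from the long (with x2) and short (x2 omitted) regressions.

    Assumed DGP: y = 1 + 1*x1 + beta2*x2 + u, corr(x1, x2) about 0.9.
    """
    rng = np.random.default_rng(seed)
    err_long = np.empty(reps)
    err_short = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = 0.9 * x1 + np.sqrt(1 - 0.81) * rng.normal(size=n)
        y = 1.0 + 1.0 * x1 + beta2 * x2 + rng.normal(size=n)
        ones = np.ones(n)
        b_long, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)
        b_short, *_ = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)
        err_long[r] = b_long[1] - 1.0
        err_short[r] = b_short[1] - 1.0
    return np.mean(err_long ** 2), np.mean(err_short ** 2)

results = {n: mse_long_short(n) for n in (10, 1000)}
for n, (mse_long, mse_short) in results.items():
    print(f"n={n}: MSE long={mse_long:.4f}, MSE short={mse_short:.4f}")
```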
![Page 40: Analysis of Cross Section and Panel Data Yan Zhang School of Economics, Fudan University CCER, Fudan University.](https://reader035.fdocuments.in/reader035/viewer/2022062221/56649e015503460f94aebe25/html5/thumbnails/40.jpg)
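Estimating $\sigma^2$: Standard Errors of the OLS Estimators
$\hat\sigma^2 = \dfrac{1}{n-k-1} \sum_{i=1}^n \hat u_i^2 = \dfrac{SSR}{n-k-1}$, which is unbiased for $\sigma^2$ under MLR.1 through MLR.5; $n - k - 1$ is the degrees of freedom.
$\mathrm{se}(\hat\beta_j) = \dfrac{\hat\sigma}{\sqrt{SST_j (1 - R_j^2)}}$ (see the notes).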
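A Monte Carlo sketch of the degrees-of-freedom correction with k = 3 regressors and a fixed design (n = 20, $\sigma^2 = 4$ and the coefficient vector are assumptions): $SSR/(n-k-1)$ averages to $\sigma^2$, while $SSR/n$ understates it. Standard errors are computed in the equivalent matrix form $\sqrt{\hat\sigma^2 [(X'X)^{-1}]_{jj}}$.

```python
import numpy as np

rng = np.random.default_rng(17)
n, k, sigma2, reps = 20, 3, 4.0, 20000      # assumed sizes and error variance
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # fixed design
beta = np.array([1.0, 0.5, -0.5, 2.0])
XtX_inv = np.linalg.inv(X.T @ X)

sig_unbiased = np.empty(reps)
sig_naive = np.empty(reps)
for r in range(reps):
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), n)
    b = XtX_inv @ X.T @ y
    ssr = np.sum((y - X @ b) ** 2)
    sig_unbiased[r] = ssr / (n - k - 1)     # degrees-of-freedom correction
    sig_naive[r] = ssr / n

# Standard errors for the last simulated sample: se(b_j) = sqrt(sigma_hat^2 [(X'X)^{-1}]_jj)
se = np.sqrt(sig_unbiased[-1] * np.diag(XtX_inv))
print("mean SSR/(n-k-1):", sig_unbiased.mean(), " true:", sigma2)
print("mean SSR/n:      ", sig_naive.mean())
print("se of (b0, b1, b2, b3):", se)
```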
EFFICIENCY OF OLS: THE GAUSS-MARKOV THEOREM
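BLUE: best linear unbiased estimator. Theorem: under MLR.1 through MLR.5, the OLS estimators $\hat\beta_0, \hat\beta_1, \dots, \hat\beta_k$ are BLUE.
“Best”: smallest variance.
“Linear”: $\tilde\beta_j = \sum_{i=1}^n w_{ij} y_i$, where the weights $w_{ij}$ can be functions of the sample values of the independent variables.
“Unbiased”: $E(\tilde\beta_j) = \beta_j$.
Implications of the theorem: (1) there is no need to look for other unbiased estimators that are linear in y; (2) if any one of the Gauss-Markov assumptions fails, OLS need no longer be BLUE. For example, failure of the zero conditional mean assumption (endogeneity) makes OLS biased; heteroskedasticity does not cause bias, but the OLS variance is no longer the smallest.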
Classical Linear Model Assumptions: Inference
References for this part of the course
Jeffrey M. Wooldridge, Introductory Econometrics: A Modern Approach, Chap 2-3.